AMR/tests at 060449e23401018abaa530431ad4bf1516c3d66b - AMR

mirror of https://github.com/msberends/AMR.git synced 2026-07-17 17:10:52 +02:00

Files

Claude 060449e234 Optimise parallel as.sir(): row-batch mode when n_cols < n_cores

Previously parallel dispatch only parallelised by column, so a 6-column
dataset on a 16-core machine used at most 6 cores with the other 10 idle.
For large n this also caused memory-bandwidth saturation (each worker did
a full n-row scan of clinical_breakpoints simultaneously).

New row-batch mode (fork path, R >= 4.0, non-Windows):
  pieces_per_col = ceil(n_cores / n_cols)
  Jobs = n_cols × pieces_per_col  (>= n_cores jobs; e.g. 6 × 3 = 18 on 16 cores)
  Each job: one column × one row slice
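
As a rough illustration of the job arithmetic above, the following base R
sketch builds one job per (column, row slice) pair. The helper name
plan_row_batches() and the way rows are cut into contiguous slices are
assumptions made for this example; the commit does not show how AMR
slices the rows.

```r
# Hypothetical helper, not part of AMR: one job per (column, row slice).
plan_row_batches <- function(n_rows, n_cols, n_cores) {
  pieces_per_col <- ceiling(n_cores / n_cols)
  # Assign every row to one of pieces_per_col contiguous slices (assumed scheme)
  slice_id <- ceiling(seq_len(n_rows) * pieces_per_col / n_rows)
  row_slices <- unname(split(seq_len(n_rows), slice_id))
  jobs <- list()
  for (col in seq_len(n_cols)) {
    for (rows in row_slices) {
      jobs[[length(jobs) + 1L]] <- list(col = col, rows = rows)
    }
  }
  jobs
}

# The example from the commit message: 6 columns on a 16-core machine
jobs <- plan_row_batches(n_rows = 1000, n_cols = 6, n_cores = 16)
length(jobs)  # 18, i.e. 6 columns x 3 row slices
```

Because pieces_per_col is rounded up, the job count can slightly exceed
n_cores, which keeps every core occupied at the cost of a few extra jobs.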

Benefits:
  - All cores stay busy regardless of column count
  - Per-worker memory footprint shrinks by a factor of pieces_per_col
  - Less cache pressure per worker during breakpoint lookups

PSOCK path (Windows / R < 4.0) is unchanged: per-job serialisation
overhead makes row batching unprofitable there.
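
The choice between the two paths could be expressed roughly as below. This
is a sketch only: use_row_batches() and its arguments are hypothetical
names, and the conditions are taken from the description above (fork
available, R >= 4.0, non-Windows, fewer columns than cores), not from the
actual AMR source.

```r
# Hypothetical predicate, not AMR code: should row-batch mode be used?
use_row_batches <- function(n_cols, n_cores,
                            os_type = .Platform$OS.type,
                            r_version = getRversion()) {
  # Forking is assumed to be usable only on non-Windows with R >= 4.0
  can_fork <- os_type != "windows" && r_version >= "4.0.0"
  can_fork && n_cols < n_cores
}

use_row_batches(n_cols = 6, n_cores = 16, os_type = "unix",
                r_version = package_version("4.3.0"))     # TRUE
use_row_batches(n_cols = 6, n_cores = 16, os_type = "windows",
                r_version = package_version("4.3.0"))     # FALSE
```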

run_as_sir_column() gains an optional `rows` parameter (NULL = all rows,
backward-compatible). Results are reassembled via as.sir(c(as.character(.)))
which is safe for already-clean SIR values.
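
The optional `rows` argument and the reassembly step can be sketched in
base R as follows. Everything here is illustrative: toy_interpret() is a
made-up stand-in for the real interpretation (it is not as.sir()),
run_column() only mimics the shape of run_as_sir_column(), and lapply()
stands in for whatever forked apply the package uses.

```r
# Toy stand-in for the interpretation step (NOT AMR's as.sir()):
# values <= 2 become "S", values >= 8 become "R", the rest "I".
toy_interpret <- function(mic) {
  out <- ifelse(mic <= 2, "S", ifelse(mic >= 8, "R", "I"))
  factor(out, levels = c("S", "I", "R"))
}

# Hypothetical per-job worker: rows = NULL means "all rows"
run_column <- function(data, col, rows = NULL) {
  x <- data[[col]]
  if (!is.null(rows)) x <- x[rows]
  toy_interpret(x)
}

data <- data.frame(AMX = c(1, 4, 16, 2), CIP = c(8, 0.5, 4, 32))
row_slices <- list(1:2, 3:4)

# One job per (column, row slice); the slice index varies fastest
jobs <- expand.grid(slice = seq_along(row_slices), col = names(data),
                    stringsAsFactors = FALSE)
parts <- lapply(seq_len(nrow(jobs)), function(i) {
  run_column(data, jobs$col[i], row_slices[[jobs$slice[i]]])
})

# Reassemble per column via character, then rebuild the class once
result <- lapply(split(parts, jobs$col), function(p) {
  factor(unlist(lapply(p, as.character)), levels = c("S", "I", "R"))
})

identical(result$AMX, run_column(data, "AMX"))  # TRUE
```

Going through character before rebuilding the class avoids relying on how
c() combines classed vectors, which for plain factors only became
level-aware in R 4.1.0.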

https://claude.ai/code/session_012DXCXbZUC54Zij1z9bFiHR

2026-04-24 22:01:09 +00:00

Name             Last commit                                                        Date
testthat         Optimise parallel as.sir(): row-batch mode when n_cols < n_cores   2026-04-24 22:01:09 +00:00
testthat.R       (v2.1.1.9236) documentation                                        2025-04-12 11:46:42 +02:00
tinytest.R.old   (v2.1.1.9236) documentation                                        2025-04-12 11:46:42 +02:00