mirror of https://github.com/msberends/AMR.git synced 2026-05-31 13:01:42 +02:00
AMR/tests/testthat
Claude 060449e234 Optimise parallel as.sir(): row-batch mode when n_cols < n_cores
Previously parallel dispatch only parallelised by column, so a 6-column
dataset on a 16-core machine used at most 6 cores with the other 10 idle.
For large n this also caused memory-bandwidth saturation (each worker did
a full n-row scan of clinical_breakpoints simultaneously).

New row-batch mode (fork path, R >= 4.0, non-Windows):
  pieces_per_col = ceil(n_cores / n_cols)
  Jobs = n_cols × pieces_per_col  (≈ n_cores jobs total)
  Each job: one column × one row slice
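The job layout above can be sketched in base R as follows. This is a minimal illustration, not the code from the commit: `make_row_batch_jobs()` and its arguments are hypothetical names, and the contiguous, near-equal row slicing is an assumption about how the slices might be formed.

```r
# Hypothetical helper (not part of AMR): builds one job per
# (column, row slice) pair, following the formula in the commit message.
make_row_batch_jobs <- function(n_rows, n_cols, n_cores) {
  stopifnot(n_rows >= 1, n_cols >= 1, n_cores >= 1)

  # pieces_per_col = ceil(n_cores / n_cols), capped so no slice is empty
  pieces_per_col <- min(ceiling(n_cores / n_cols), n_rows)

  # Assign each row to one of `pieces_per_col` contiguous slices
  slice_id <- ceiling(seq_len(n_rows) * pieces_per_col / n_rows)
  row_slices <- unname(split(seq_len(n_rows), slice_id))

  # One job = one column index plus one vector of row indices
  jobs <- list()
  for (col in seq_len(n_cols)) {
    for (rows in row_slices) {
      jobs[[length(jobs) + 1L]] <- list(col = col, rows = rows)
    }
  }
  jobs
}

# The 6-column, 16-core case from the message: ceil(16 / 6) = 3 slices
# per column, so 6 * 3 = 18 jobs.
jobs <- make_row_batch_jobs(n_rows = 10, n_cols = 6, n_cores = 16)
length(jobs)  # 18
```

On the fork path, such a job list could then be handed to something like `parallel::mclapply()`; how the commit actually dispatches the jobs is not shown here.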

Benefits:
  - All cores stay busy regardless of column count
  - Per-worker memory footprint shrinks by a factor of pieces_per_col
  - Cache pressure from breakpoint lookups is reduced per worker

PSOCK path (Windows / R < 4.0) is unchanged: per-job serialisation
overhead makes row batching unprofitable there.

run_as_sir_column() gains an optional `rows` parameter (NULL = all rows,
backward-compatible). Results are reassembled via as.sir(c(as.character(.))),
which is safe for already-clean SIR values.
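The `rows` parameter and the reassembly step can be illustrated with a base R stand-in. Everything here is hypothetical: `classify_column()` and its toy cut-offs are not the AMR implementation, and a plain factor stands in for the `sir` class. The sketch only shows that classifying row slices separately and concatenating the pieces as character gives the same result as classifying all rows at once.

```r
sir_levels <- c("S", "I", "R")

# Hypothetical stand-in for a per-column worker (not AMR code).
# rows = NULL means "all rows", mirroring the backward-compatible default.
classify_column <- function(x, rows = NULL) {
  if (!is.null(rows)) {
    x <- x[rows]
  }
  # Toy cut-offs for illustration only, not real clinical breakpoints
  out <- ifelse(x <= 2, "S", ifelse(x <= 8, "I", "R"))
  factor(out, levels = sir_levels)
}

x <- c(0.5, 4, 16, 1, 32, 8)
row_slices <- list(1:3, 4:6)

# Each slice is processed independently, as a worker would do
pieces <- lapply(row_slices, function(r) classify_column(x, rows = r))

# Reassemble: convert each piece to character, concatenate, re-class.
# In the commit this last step is as.sir(); factor() plays that role here.
reassembled <- factor(unlist(lapply(pieces, as.character)), levels = sir_levels)

# Same result as a single pass over all rows
identical(reassembled, classify_column(x))  # TRUE
```

Going through character before re-classing avoids relying on how `c()` treats classed vectors, which is presumably why the commit message calls the round trip safe for values that are already clean.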

https://claude.ai/code/session_012DXCXbZUC54Zij1z9bFiHR
2026-04-24 22:01:09 +00:00