P281424/AMR - AMR - Gitea RUG

mirror of https://github.com/msberends/AMR.git synced 2026-07-02 16:58:58 +02:00

Author	SHA1	Message	Date
Matthijs Berends	19157ce718	Fix parallel computing in as.sir.data.frame (#276)
* Fix parallel computing in as.sir.data.frame
Six bugs in parallel = TRUE mode:
1. PSOCK workers (Windows / R < 4.0) never had AMR loaded, so every exported AMR function call failed. Added clusterEvalQ(cl, library(AMR)) with a graceful fallback to sequential mode when the package cannot be loaded (e.g. dev-only load_all() environments).
2. The clusterExport'd AMR_env was a frozen, serialised copy: as.sir() on the worker wrote to AMR:::AMR_env while run_as_sir_column() read from the stale copy, so the captured log was always wrong. Fixed by resolving AMR_env dynamically via get("AMR_env", envir = asNamespace("AMR")) inside the worker function, and removing AMR_env from clusterExport.
3. In the fork-based (mclapply) path, each worker inherited the parent's full sir_interpretation_history. Capturing the whole log and then combining it across workers duplicated every pre-existing entry. Fixed by recording the log row count before the as.sir() call and slicing only the new rows afterwards.
4. run_as_sir_column() used non-exported internals (%pm>%, pm_pull, as.sir.default) that are inaccessible on PSOCK workers after library(AMR). Replaced the pipe chains with direct as.mic(as.character(x[, col, drop = TRUE])) and as.disk(...) calls, and changed as.sir.default() to as.sir(), which dispatches correctly via S3.
5. With info = TRUE, forked workers printed per-column progress messages simultaneously, producing garbled, interleaved console output. Per-column messages are now suppressed inside workers (effective_info = FALSE), while the outer "Running in parallel" / "DONE" messages still appear.
6. The malformed Unicode escape \u00a (3 hex digits) in the "DONE" banner was parsed by R as U+00AD (soft hyphen) + "ONE"; corrected to .
https://claude.ai/code/session_012DXCXbZUC54Zij1z9bFiHR
* Add parallel computing tests to test-sir.R
Eight targeted tests verify correctness of the parallel as.sir() path: identical SIR output vs sequential, matching log row counts, no duplication of pre-existing history, reproducibility across runs, consistent results across max_cores values, single-column fallback, and no per-column worker messages leaking when info = TRUE. All pass when only 1 core is available (parallel silently falls back to sequential).
https://claude.ai/code/session_012DXCXbZUC54Zij1z9bFiHR
* Fix as.sir() data.frame: preserve already-<sir> columns, exclude metadata
Issue #278: two related bugs in the column-detection / type-assignment pipeline.
Bug 1: already-<sir> columns deleted on re-run. Line 886 excluded already-sir columns from the type assignment (they stayed type ""), causing the result loop to run x[, col] <- NULL and delete them. Fix: drop the !is.sir() guard so all untyped columns fall through to type "sir" and are re-processed correctly.
Bug 2: metadata columns treated as antibiotics (as.ab("patient") -> OXY, as.ab("ward") -> PRU). The column detector accepted any column whose name matched an antibiotic code, regardless of content. Fix: for name-matched columns that do not already carry an AMR class, also verify that the content looks like AMR data (all_valid_mics, all-numeric, or any SIR-like string). all_valid_disks() is intentionally avoided here because it strips letters from strings (as.disk("Pt_1") == 1).
Also adds tools/benchmark_parallel.R: a standalone script that times sequential vs parallel as.sir() across n = 20/200/2000/20000 rows and saves a ggplot2 PNG to tools/benchmark_parallel.png.
https://claude.ai/code/session_012DXCXbZUC54Zij1z9bFiHR
* Update benchmark: two-panel script with warm-up and column-count sweep
The previous single-panel benchmark was misleading: the first sequential run paid a one-time cache warm-up cost (skewing n = 20), and only 6 columns were used, so only 6 cores were ever active on a 16-core machine.
New two-panel design:
Left: vary rows with 16 fixed AB columns (shows memory-bandwidth saturation for large n)
Right: vary columns with fixed rows (shows the real speedup profile: parallel wins when n_cols >> 1)
Also adds a warm-up pass before measurements to eliminate first-call bias.
https://claude.ai/code/session_012DXCXbZUC54Zij1z9bFiHR
* Optimise parallel as.sir(): row-batch mode when n_cols < n_cores
Previously, parallel dispatch only parallelised by column, so a 6-column dataset on a 16-core machine used at most 6 cores, leaving the other 10 idle. For large n this also caused memory-bandwidth saturation (each worker did a full n-row scan of clinical_breakpoints simultaneously).
New row-batch mode (fork path, R >= 4.0, non-Windows):
pieces_per_col = ceiling(n_cores / n_cols)
jobs = n_cols × pieces_per_col (≈ n_cores jobs total)
each job: one column × one row slice
Benefits:
- All cores stay busy regardless of column count
- Per-worker memory footprint shrinks by a factor of pieces_per_col
- Breakpoint lookup cache pressure is reduced per worker
The PSOCK path (Windows / R < 4.0) is unchanged: per-job serialisation overhead makes row batching unprofitable there. run_as_sir_column() gains an optional `rows` parameter (NULL = all rows, backward-compatible). Results are reassembled via as.sir(c(as.character(.))), which is safe for already-clean SIR values.
https://claude.ai/code/session_012DXCXbZUC54Zij1z9bFiHR
* Fix info = FALSE ignored when no breakpoints found in as_sir_method
Operator-precedence bug at line 1601:
if (isTRUE(info) && nrow(df_unique) < 10 || nrow(breakpoints) == 0)
R evaluates && before ||, so this was equivalent to:
(isTRUE(info) && nrow(df_unique) < 10) || (nrow(breakpoints) == 0)
When nrow(breakpoints) == 0 (e.g. cefoxitin / flucloxacillin / mupirocin against E. coli in EUCAST), the intro message was always printed regardless of info. Fix: add parentheses so info gates both conditions:
isTRUE(info) && (nrow(df_unique) < 10 || nrow(breakpoints) == 0)
Also pass print = isTRUE(info) to progress_ticker so the progress bar (which prints intro_txt as its title) is suppressed when info = FALSE.
https://claude.ai/code/session_012DXCXbZUC54Zij1z9bFiHR
* Fix cli formatting in as.sir() messages
- stop_if for empty ab_cols: wrap as.mic() and as.disk() in {.help [{.fun ...}](...)} for clickable links in cli output
- Parallel mode message: use {.field col} formatting for column names and quotes = FALSE in vector_and(), consistent with the rest of the codebase (avoids double-quoting from both font_bold and quotes = "'")
https://claude.ai/code/session_012DXCXbZUC54Zij1z9bFiHR
* Use font_bold() inside {.field} for column names in parallel message
Convention: paste0("{.field ", font_bold(col), "}") gives bold green column names without quotation marks, consistent with the rest of the codebase (e.g. the 'Cleaning values' message in run_as_sir_column).
https://claude.ai/code/session_012DXCXbZUC54Zij1z9bFiHR
* Add collapse = NULL to font_bold() for column name vectors
Without collapse = NULL, font_bold() joins a vector with "" into a single string, breaking element-wise paste0() formatting for vectors of length > 1.
https://claude.ai/code/session_012DXCXbZUC54Zij1z9bFiHR
* Add tools/ to .Rbuildignore
Keeps the benchmark script out of the built package tarball.
https://claude.ai/code/session_012DXCXbZUC54Zij1z9bFiHR
---------
Co-authored-by: Claude <noreply@anthropic.com>	2026-04-25 00:34:38 +02:00
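Commit 19157ce718 describes two mechanisms briefly: the `&&`/`||` precedence bug in as_sir_method and the row-batch job split for parallel as.sir(). The R sketch below illustrates both under stated assumptions. It is not the AMR package's code. `plan_row_batch_jobs()` is a hypothetical helper. Row slices come from `parallel::splitIndices()` in base R. The cap of one slice per row is an added assumption that the commit does not mention.

```r
# Minimal sketch (not AMR's implementation) of two ideas from commit 19157ce718.
library(parallel)  # base R package; provides splitIndices()

# 1. Operator precedence: in R, && binds tighter than ||.
info <- FALSE
n_unique <- 50L       # pretend: many unique values
n_breakpoints <- 0L   # pretend: no breakpoints found
buggy <- isTRUE(info) && n_unique < 10 || n_breakpoints == 0    # parsed as (A && B) || C
fixed <- isTRUE(info) && (n_unique < 10 || n_breakpoints == 0)  # info now gates both conditions
print(c(buggy = buggy, fixed = fixed))  # buggy is TRUE even though info = FALSE

# 2. Row-batch job planning (hypothetical helper, illustration only).
# pieces_per_col = ceiling(n_cores / n_cols); each job = one column x one row slice.
plan_row_batch_jobs <- function(n_rows, cols, n_cores) {
  pieces_per_col <- ceiling(n_cores / length(cols))
  # Assumption: never create more slices than there are rows
  pieces_per_col <- max(1L, min(pieces_per_col, n_rows))
  # Contiguous, near-equal row slices that together cover seq_len(n_rows)
  slices <- splitIndices(n_rows, pieces_per_col)
  jobs <- list()
  for (col in cols) {
    for (rows in slices) {
      jobs[[length(jobs) + 1L]] <- list(col = col, rows = rows)
    }
  }
  jobs
}

jobs <- plan_row_batch_jobs(n_rows = 2000, cols = paste0("AB", 1:6), n_cores = 16)
length(jobs)  # 6 columns x ceiling(16 / 6) = 3 slices each = 18 jobs
```

With 16 columns on 16 cores, `pieces_per_col` is 1, so the plan falls back to one job per column. On the fork path, the commit dispatches such jobs with mclapply. Reassembling each column's slices in row order is omitted here.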
Matthijs Berends	2c21eba04c	add CLAUDE.md with project context for Claude Code (#261)
* add CLAUDE.md with project context for Claude Code
Provides development commands, architecture overview, file conventions, custom S3 classes, data files, testing setup, and versioning guidelines to help Claude Code assist effectively in this repository.
https://claude.ai/code/session_01L3fTxqsg3Gc6J1znpWN1Mx
* add CLAUDE.md to .Rbuildignore
Excludes the Claude Code context file from the R package build tarball.
https://claude.ai/code/session_01L3fTxqsg3Gc6J1znpWN1Mx
* document version-bump requirement for every PR in CLAUDE.md
Each PR must increment the .9zzz dev counter by 1 in both DESCRIPTION (Version: field) and NEWS.md (top-level heading).
https://claude.ai/code/session_01L3fTxqsg3Gc6J1znpWN1Mx
---------
Co-authored-by: Claude <noreply@anthropic.com>	2026-02-27 17:13:11 +01:00
dr. M.S. (Matthijs) Berends	3ba1b8a10a	(v3.0.0.9022) postpone new features - we like a clearly focussed bugfix release first	2025-09-03 15:39:44 +02:00
github-actions[bot]	c70ac149ff	new WISCA vignette	2025-04-30 17:33:03 +02:00
dr. M.S. (Matthijs) Berends	4a336d040c	(v2.1.1.9250) Automated README and `index.md`	2025-04-21 15:37:26 +02:00
dr. M.S. (Matthijs) Berends	15fc72fc66	(v2.1.1.9121) support tidymodels	2024-12-19 20:17:15 +01:00
dr. M.S. (Matthijs) Berends	87271d261a	add Python package to repo	2024-11-21 10:06:26 +01:00
dr. M.S. (Matthijs) Berends	5c4d8fcd2a	(v2.1.1.9095) Python support	2024-10-15 17:12:55 +02:00
dr. M.S. (Matthijs) Berends	08f7256852	include unit tests again	2023-07-13 12:45:14 +01:00
dr. M.S. (Matthijs) Berends	ddd01f9410	test to remove unit tests from build	2023-07-13 09:42:37 +01:00
dr. M.S. (Matthijs) Berends	0bcf55d3b6	improve `as.mo()`	2023-05-24 15:55:53 +02:00
dr. M.S. (Matthijs) Berends	303d61b473	new tibble export	2022-08-27 20:49:37 +02:00
dr. M.S. (Matthijs) Berends	d6676e9443	disk documentation fix	2022-08-21 16:52:09 +02:00
dr. M.S. (Matthijs) Berends	952d16de33	new, automated website	2022-08-21 16:37:20 +02:00
dr. M.S. (Matthijs) Berends	7226b70c3d	update languages	2022-08-20 20:17:14 +02:00
dr. M.S. (Matthijs) Berends	ccb09706e4	v1.8.1	2022-03-24 23:05:04 +01:00
dr. M.S. (Matthijs) Berends	f5dcf0ad58	v1.8.0 as accepted by CRAN	2022-01-07 16:27:13 +01:00
dr. M.S. (Matthijs) Berends	a2d249962f	(v1.7.1.9023) Removed filter_ functions, new set_ab_names(), ATC code update, ab selector update, fixes #46 and fixed #47	2021-08-16 21:54:34 +02:00
dr. M.S. (Matthijs) Berends	d277d58475	(v1.6.0.9002) R-3.0 installation fix	2021-04-12 14:24:40 +02:00
dr. M.S. (Matthijs) Berends	1737d56ae4	(v1.5.0.9026) vignette update, support for GISA	2021-02-25 12:31:12 +01:00
dr. M.S. (Matthijs) Berends	286eaa9699	(v1.5.0.9010) MDRO vignette update, get_episode for < day	2021-01-24 14:48:56 +01:00
dr. M.S. (Matthijs) Berends	c8bcecf232	(v1.4.0.9037) random_* functions	2020-12-12 23:17:29 +01:00
dr. M.S. (Matthijs) Berends	791bb6d33f	(v1.3.0) remove vignettes from CRAN	2020-07-31 11:39:56 +02:00
dr. M.S. (Matthijs) Berends	c5f7294381	(v1.3.0) skip more CRAN tests	2020-07-31 10:50:08 +02:00
dr. M.S. (Matthijs) Berends	76fc8e1b14	(v1.2.0.9026) move to github	2020-07-08 14:48:06 +02:00
dr. M.S. (Matthijs) Berends	e2d05cb1b0	(v0.8.0.9017) keywords update	2019-11-06 14:43:23 +01:00
dr. M.S. (Matthijs) Berends	10e6b225e7	(v0.7.1.9107) v0.8.0	2019-10-15 14:35:23 +02:00
dr. M.S. (Matthijs) Berends	00cdb498a0	(v0.7.1.9102) lintr	2019-10-11 17:21:02 +02:00
dr. M.S. (Matthijs) Berends	398c5bdc4f	(v0.7.1.9073) as.mo() self-learning algorithm	2019-09-15 22:57:30 +02:00
dr. M.S. (Matthijs) Berends	2667fff8a7	(v0.6.1.9050) support staged install	2019-06-01 20:40:49 +02:00
dr. M.S. (Matthijs) Berends	461eec9bac	cfta streptococci, codecov.yml	2019-04-09 14:59:17 +02:00
dr. M.S. (Matthijs) Berends	30b559827c	documentation update	2019-04-07 22:40:02 +02:00
dr. M.S. (Matthijs) Berends	fb1fc3686c	Catalogue of life	2019-02-20 00:04:48 +01:00
dr. M.S. (Matthijs) Berends	46dcc7e2e8	set_mo_source	2019-01-21 15:53:01 +01:00
dr. M.S. (Matthijs) Berends	b48e609afe	gitlab pkg cache	2019-01-05 09:50:22 +01:00
dr. M.S. (Matthijs) Berends	b92c392dd4	gitlab ci	2019-01-04 12:13:02 +01:00
dr. M.S. (Matthijs) Berends	eab3c9dac8	gitlab ci	2019-01-04 10:41:18 +01:00
dr. M.S. (Matthijs) Berends	6652f7d82b	fix warnings	2019-01-04 09:49:42 +01:00
dr. M.S. (Matthijs) Berends	fd646fe1fc	introduction of Packrat	2018-12-30 08:40:40 +01:00
dr. M.S. (Matthijs) Berends	92a32b62a7	new website, freq updates	2018-12-29 22:24:19 +01:00
dr. M.S. (Matthijs) Berends	456f3e8773	gitlab pages fix	2018-12-23 21:33:44 +01:00
dr. M.S. (Matthijs) Berends	92d2553dfe	gitlab pages	2018-12-23 21:26:21 +01:00
dr. M.S. (Matthijs) Berends	b937662a97	limits for scale_y_percent - Licence update	2018-12-16 22:45:12 +01:00
dr. M.S. (Matthijs) Berends	5cfa5bbfe3	v0.5.0	2018-11-30 16:16:04 +01:00
dr. M.S. (Matthijs) Berends	87757748bd	switch to gitlab	2018-10-23 16:49:40 +02:00
dr. M.S. (Matthijs) Berends	029157b3be	163 new trade names, added ab_tradenames	2018-08-29 12:27:37 +02:00
dr. M.S. (Matthijs) Berends	e5193c7749	AppVeyor	2018-08-13 23:05:53 +02:00
dr. M.S. (Matthijs) Berends	1ba7d883fe	new ggplot enhancement	2018-08-11 21:30:00 +02:00
dr. M.S. (Matthijs) Berends	fc30d3fb13	freq: support for table	2018-07-09 14:02:58 +02:00
uscloud	dcc26dd942	Update freq function	2018-05-22 16:34:22 +02:00

52 Commits