mirror of https://github.com/msberends/AMR.git synced 2026-05-14 04:30:53 +02:00

Fix emend. author bug in get_author_year() and update ref documentation

Strip emend. and everything after it so the ref column retains the
combination authority, not the emendation author. Update data.R and
mo_property.R docs to describe the correct semantics of the ref field.

https://claude.ai/code/session_01VH4Ju4Xq9aW1AHuoVbjGEo
Claude
2026-05-02 14:34:45 +00:00
parent 0af3f84655
commit 9707450b89
5 changed files with 8 additions and 5 deletions

@@ -109,7 +109,7 @@
#' - `status` \cr Status of the taxon, either `r vector_or(microorganisms$status, documentation = TRUE)`
#' - `kingdom`, `phylum`, `class`, `order`, `family`, `genus`, `species`, `subspecies`\cr Taxonomic rank of the microorganism. Note that for fungi, *phylum* is equal to their taxonomic *division*. Also, for fungi, *subkingdom* and *subdivision* were left out since they do not occur in the bacterial taxonomy.
#' - `rank`\cr Text of the taxonomic rank of the microorganism, such as `"species"` or `"genus"`
-#' - `ref`\cr Author(s) and year of related scientific publication. This contains only the *first surname* and year of the *latest* authors, e.g. "Wallis *et al.* 2006 *emend.* Smith and Jones 2018" becomes "Smith *et al.*, 2018". This field is directly retrieved from the source specified in the column `source`. Moreover, accents were removed to comply with CRAN that only allows ASCII characters.
+#' - `ref`\cr Abbreviated authority citation for the nomenclatural act that established the current name combination, following ICNP conventions. For species described in their current genus (sp. nov.), this is the original description author(s) and year. For species transferred to a different genus (comb. nov.), this is the reclassification author(s) and year. Emendations are excluded. For synonyms, this is the authority under which the synonym was originally published. Data sourced primarily from LPSN, supplemented by GBIF where LPSN coverage is absent.
#' - `oxygen_tolerance` \cr Oxygen tolerance, either `r vector_or(microorganisms$oxygen_tolerance, documentation = TRUE)`. These data were retrieved from BacDive (see *Source*). Items that contain "likely" are missing from BacDive and were extrapolated from other species within the same genus to guess the oxygen tolerance. Currently `r round(length(microorganisms$oxygen_tolerance[which(!is.na(microorganisms$oxygen_tolerance))]) / nrow(microorganisms[which(microorganisms$kingdom == "Bacteria"), ]) * 100, 1)`% of all `r format_included_data_number(nrow(microorganisms[which(microorganisms$kingdom == "Bacteria"), ]))` bacteria in the data set contain an oxygen tolerance.
#' - `source`\cr Either `r vector_or(microorganisms$source, documentation = TRUE)` (see *Source*)
#' - `lpsn`\cr Identifier ('Record number') of `r TAXONOMY_VERSION$LPSN$name`. This will be the first/highest LPSN identifier to keep one identifier per row. For example, *Acetobacter ascendens* has LPSN Record number 7864 and 11011. Only the first is available in the `microorganisms` data set. ***This is a unique identifier***, though available for only `r format_included_data_number(sum(!is.na(microorganisms$lpsn)))` records.