website update

2025-08-24 11:52:11 +02:00 · 2019-02-20 13:57:23 +01:00
parent 8dc027309e
commit 13120f465f
28 changed files with 661 additions and 684 deletions
--- a/vignettes/benchmarks.Rmd
+++ b/vignettes/benchmarks.Rmd
@@ -46,7 +46,7 @@ S.aureus <- microbenchmark(as.mo("sau"),
                           as.mo("Staphylococcus aureus"),
                           as.mo("B_STPHY_AUR"),
                           times = 10)
-print(S.aureus, unit = "ms")
+print(S.aureus, unit = "ms", signif = 3)
 ```

 In the table above, all measurements are shown in milliseconds (thousandths of seconds). A value of 10 milliseconds means it can determine 100 input values per second. In case of 50 milliseconds, this is only 20 input values per second. The more an input value resembles a full name, the faster the result will be found. In case of `as.mo("B_STPHY_AUR")`, the input is already a valid MO code, so it takes almost no time at all (`r as.integer(min(S.aureus$time, na.rm = TRUE) / 1000)` millionths of seconds).
@@ -62,7 +62,7 @@ M.leonicaptivi <- microbenchmark(as.mo("myle"),
                                 as.mo("Mycoplasma leonicaptivi"),
                                 as.mo("B_MYCPL_LEO"),
                                 times = 10)
-print(M.leonicaptivi, unit = "ms")
+print(M.leonicaptivi, unit = "ms", signif = 4)
 ```

 That takes `r round(mean(M.leonicaptivi$time, na.rm = TRUE) / mean(S.aureus$time, na.rm = TRUE), 1)` times as much time on average! A value of 100 milliseconds means it can only determine ~10 different input values per second. We can conclude that looking up arbitrary codes of less prevalent microorganisms is the worst way to go, in terms of calculation performance:
@@ -99,9 +99,9 @@ length(x)
 n_distinct(x)

 # now let's see:
-run_it <- microbenchmark(X = mo_fullname(x),
+run_it <- microbenchmark(mo_fullname(x),
                         times = 10)
-print(run_it, unit = "ms")
+print(run_it, unit = "ms", signif = 3)
 ```

 So transforming 500,000 values (!), of which only `r n_distinct(x)` are unique, takes just `r round(median(run_it$time, na.rm = TRUE) / 1e9, 2)` seconds (`r as.integer(median(run_it$time, na.rm = TRUE) / 1e6)` ms). You only lose time on your unique input values.
@@ -115,22 +115,22 @@ run_it <- microbenchmark(A = mo_fullname("B_STPHY_AUR"),
                         B = mo_fullname("S. aureus"),
                         C = mo_fullname("Staphylococcus aureus"),
                         times = 10)
-print(run_it, unit = "ms")
+print(run_it, unit = "ms", signif = 3)
 ```

 So going from `mo_fullname("Staphylococcus aureus")` to `"Staphylococcus aureus"` takes `r format(round(run_it %>% filter(expr == "C") %>% pull(time) %>% median() / 1e9, 4), scientific = FALSE)` seconds - it doesn't even start calculating *if the result would be the same as the expected resulting value*. That goes for all helper functions:

 ```{r}
-microbenchmark(A = mo_species("aureus"),
-               B = mo_genus("Staphylococcus"),
-               C = mo_fullname("Staphylococcus aureus"),
-               D = mo_family("Staphylococcaceae"),
-               E = mo_order("Bacillales"),
-               F = mo_class("Bacilli"),
-               G = mo_phylum("Firmicutes"),
-               H = mo_kingdom("Bacteria"),
-               times = 10,
-               unit = "ms")
+run_it <- microbenchmark(A = mo_species("aureus"),
+                         B = mo_genus("Staphylococcus"),
+                         C = mo_fullname("Staphylococcus aureus"),
+                         D = mo_family("Staphylococcaceae"),
+                         E = mo_order("Bacillales"),
+                         F = mo_class("Bacilli"),
+                         G = mo_phylum("Firmicutes"),
+                         H = mo_kingdom("Bacteria"),
+                         times = 10)
+print(run_it, unit = "ms", signif = 3)
 ```

 Of course, when running `mo_phylum("Firmicutes")` the function has zero knowledge about the actual microorganism, namely *S. aureus*. But since the result would be `"Firmicutes"` too, there is no point in calculating the result. And because this package 'knows' all phyla of all known bacteria (according to the Catalogue of Life), it can just return the initial value immediately.
@@ -142,17 +142,19 @@ When the system language is non-English and supported by this `AMR` package, som
 ```{r}
 mo_fullname("CoNS", language = "en") # or just mo_fullname("CoNS") on an English system

-mo_fullname("CoNS", language = "fr") # or just mo_fullname("CoNS") on a French system
+mo_fullname("CoNS", language = "es") # or just mo_fullname("CoNS") on a Spanish system

-microbenchmark(en = mo_fullname("CoNS", language = "en"),
-               de = mo_fullname("CoNS", language = "de"),
-               nl = mo_fullname("CoNS", language = "nl"),
-               es = mo_fullname("CoNS", language = "es"),
-               it = mo_fullname("CoNS", language = "it"),
-               fr = mo_fullname("CoNS", language = "fr"),
-               pt = mo_fullname("CoNS", language = "pt"),
-               times = 10,
-               unit = "ms")
+mo_fullname("CoNS", language = "nl") # or just mo_fullname("CoNS") on a Dutch system
+
+run_it <- microbenchmark(en = mo_fullname("CoNS", language = "en"),
+                         de = mo_fullname("CoNS", language = "de"),
+                         nl = mo_fullname("CoNS", language = "nl"),
+                         es = mo_fullname("CoNS", language = "es"),
+                         it = mo_fullname("CoNS", language = "it"),
+                         fr = mo_fullname("CoNS", language = "fr"),
+                         pt = mo_fullname("CoNS", language = "pt"),
+                         times = 10)
+print(run_it, unit = "ms", signif = 4)
 ```

 Currently supported are German, Dutch, Spanish, Italian, French and Portuguese.