AMR/man/g.test.Rd

109 lines
5.8 KiB
Plaintext
Raw Normal View History

2018-07-01 21:40:37 +02:00
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/g.test.R
\name{g.test}
\alias{g.test}
\title{\emph{G}-test of matrix or vector}
\usage{
g.test(x, y = NULL, alpha = 0.05, info = TRUE, minimum = 0)
}
\arguments{
\item{x}{numeric vector or matrix}
2018-07-02 11:14:20 +02:00
\item{y}{expected value of \code{x}. Leave empty to determine automatically. This can also be ratios of \code{x}, e.g. calculated with \code{\link{ratio}}.}
2018-07-01 21:40:37 +02:00
\item{alpha}{value to test the p value against}
\item{info}{logical to determine whether the analysis should be printed}
\item{minimum}{the test with fail if any of the observed values is below this value. Use \code{minimum = 30} for microbial epidemiology, to prevent calculating a p value when less than 30 isolates are available.}
}
\description{
A \emph{G}-test can be used to see whether the number of observations in each category fits a theoretical expectation (called a \strong{\emph{G}-test of goodness-of-fit}), or to see whether the proportions of one variable are different for different values of the other variable (called a \strong{\emph{G}-test of independence}).
}
\section{\emph{G}-test of goodness-of-fit (likelihood ratio test)}{
Use the \emph{G}-test of goodness-of-fit when you have one nominal variable with two or more values (such as male and female, or red, pink and white flowers). You compare the observed counts of numbers of observations in each category with the expected counts, which you calculate using some kind of theoretical expectation (such as a 1:1 sex ratio or a 1:2:1 ratio in a genetic cross).
If the expected number of observations in any category is too small, the \emph{G}-test may give inaccurate results, and you should use an exact test instead. See the web page on small sample sizes for discussion of what "small" means.
The \emph{G}-test of goodness-of-fit is an alternative to the chi-square test of goodness-of-fit; each of these tests has some advantages and some disadvantages, and the results of the two tests are usually very similar.
}
\section{\emph{G}-test of independence}{
Use the \emph{G}-test of independence when you have two nominal variables, each with two or more possible values. You want to know whether the proportions for one variable are different among values of the other variable.
It is also possible to do a \emph{G}-test of independence with more than two nominal variables. For example, Jackson et al. (2013) also had data for children under 3, so you could do an analysis of old vs. young, thigh vs. arm, and reaction vs. no reaction, all analyzed together.
Fisher's exact test is more accurate than the \emph{G}-test of independence when the expected numbers are small, so it is recommend to only use the \emph{G}-test if your total sample size is greater than 1000.
The \emph{G}-test of independence is an alternative to the chi-square test of independence, and they will give approximately the same results.
}
\section{How the test works}{
Unlike the exact test of goodness-of-fit, the \emph{G}-test does not directly calculate the probability of obtaining the observed results or something more extreme. Instead, like almost all statistical tests, the \emph{G}-test has an intermediate step; it uses the data to calculate a test statistic that measures how far the observed data are from the null expectation. You then use a mathematical relationship, in this case the chi-square distribution, to estimate the probability of obtaining that value of the test statistic.
The \emph{G}-test uses the log of the ratio of two likelihoods as the test statistic, which is why it is also called a likelihood ratio test or log-likelihood ratio test. The formula to calculate a \emph{G}-statistic is:
\code{G <- 2 * sum(x * log(x / x.expected))}
Since this is chi-square distributed, the p value can be calculated with:
\code{p <- 1 - stats::pchisq(G, df))}
where \code{df} are the degrees of freedom: \code{max(NROW(x) - 1, 1) * max(NCOL(x) - 1, 1)}.
If there are more than two categories and you want to find out which ones are significantly different from their null expectation, you can use the same method of testing each category vs. the sum of all categories, with the Bonferroni correction. You use \emph{G}-tests for each category, of course.
}
\examples{
# = EXAMPLE 1 =
# Shivrain et al. (2006) crossed clearfield rice (which are resistant
# to the herbicide imazethapyr) with red rice (which are susceptible to
# imazethapyr). They then crossed the hybrid offspring and examined the
# F2 generation, where they found 772 resistant plants, 1611 moderately
# resistant plants, and 737 susceptible plants. If resistance is controlled
# by a single gene with two co-dominant alleles, you would expect a 1:2:1
# ratio.
x <- c(772, 1611, 737)
2018-07-02 11:14:20 +02:00
x.expected <- ratio(x, "1:2:1")
2018-07-01 21:40:37 +02:00
x.expected
# 780 1560 780
g.test(x, x.expected)
# p = 0.12574.
# There is no significant difference from a 1:2:1 ratio.
# Meaning: resistance controlled by a single gene with two co-dominant
# alleles, is plausible.
# = EXAMPLE 2 =
# Red crossbills (Loxia curvirostra) have the tip of the upper bill either
# right or left of the lower bill, which helps them extract seeds from pine
# cones. Some have hypothesized that frequency-dependent selection would
# keep the number of right and left-billed birds at a 1:1 ratio. Groth (1992)
# observed 1752 right-billed and 1895 left-billed crossbills.
x <- c(1752, 1895)
2018-07-02 11:14:20 +02:00
x.expected <- ratio(x, ratio = c(1, 1))
2018-07-01 21:40:37 +02:00
x.expected
# 1823.5 1823.5
g.test(x, x.expected)
# p = 0.01787343
# There is a significant difference from a 1:1 ratio.
# Meaning: there are significantly more left-billed birds.
}
\references{
McDonald, J.H. 2014. \strong{Handbook of Biological Statistics (3rd ed.)}. Sparky House Publishing, Baltimore, Maryland. \url{http://www.biostathandbook.com/gtestgof.html}.
}
\seealso{
\code{\link{chisq.test}}
}
\keyword{chi}