<!-- Generated by pkgdown: do not edit by hand -->
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>G-test for Count Data — g.test • AMR (for R)</title>
<!-- favicons -->
<link rel="icon" type="image/png" sizes="16x16" href="../favicon-16x16.png">
<link rel="icon" type="image/png" sizes="32x32" href="../favicon-32x32.png">
<link rel="apple-touch-icon" type="image/png" sizes="180x180" href="../apple-touch-icon.png" />
<link rel="apple-touch-icon" type="image/png" sizes="120x120" href="../apple-touch-icon-120x120.png" />
<link rel="apple-touch-icon" type="image/png" sizes="76x76" href="../apple-touch-icon-76x76.png" />
<link rel="apple-touch-icon" type="image/png" sizes="60x60" href="../apple-touch-icon-60x60.png" />
<!-- jquery -->
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js" integrity="sha256-FgpCb/KJQlLNfOu91ta32o/NMZxltwRo8QtmkMRdAu8=" crossorigin="anonymous"></script>
<!-- Bootstrap -->
<link href="https://cdnjs.cloudflare.com/ajax/libs/bootswatch/3.3.7/flatly/bootstrap.min.css" rel="stylesheet" crossorigin="anonymous" />
<script src="https://cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/3.3.7/js/bootstrap.min.js" integrity="sha256-U5ZEeKfGNOja007MMD3YBI0A3OSZOQbeG6z2f2Y0hu8=" crossorigin="anonymous"></script>
<!-- Font Awesome icons -->
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.7.0/css/font-awesome.min.css" integrity="sha256-eZrrJcwDc/3uDhsdt61sL2oOBY362qM3lon1gyExkL0=" crossorigin="anonymous" />
<!-- clipboard.js -->
<script src="https://cdnjs.cloudflare.com/ajax/libs/clipboard.js/2.0.4/clipboard.min.js" integrity="sha256-FiZwavyI2V6+EXO1U+xzLG3IKldpiTFf3153ea9zikQ=" crossorigin="anonymous"></script>
<!-- sticky kit -->
<script src="https://cdnjs.cloudflare.com/ajax/libs/sticky-kit/1.1.3/sticky-kit.min.js" integrity="sha256-c4Rlo1ZozqTPE2RLuvbusY3+SU1pQaJC0TjuhygMipw=" crossorigin="anonymous"></script>
<!-- pkgdown -->
<link href="../pkgdown.css" rel="stylesheet">
<script src="../pkgdown.js"></script>
<!-- docsearch -->
<script src="../docsearch.js"></script>
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/docsearch.js/2.6.1/docsearch.min.css" integrity="sha256-QOSRU/ra9ActyXkIBbiIB144aDBdtvXBcNc3OTNuX/Q=" crossorigin="anonymous" />
<link href="../docsearch.css" rel="stylesheet">
<script src="https://cdnjs.cloudflare.com/ajax/libs/mark.js/8.11.1/jquery.mark.min.js" integrity="sha256-4HLtjeVgH0eIB3aZ9mLYF6E8oU5chNdjU6p6rrXpl9U=" crossorigin="anonymous"></script>
<link href="../extra.css" rel="stylesheet">
<script src="../extra.js"></script>
<meta property="og:title" content="G-test for Count Data — g.test" />
<meta property="og:description" content="g.test performs G-tests (likelihood ratio tests) on contingency tables and for goodness-of-fit. It is used just like chisq.test, and the G-test is reported to be more reliable [1]. A G-test can be used to see whether the number of observations in each category fits a theoretical expectation (called a G-test of goodness-of-fit), or to see whether the proportions of one variable are different for different values of the other variable (called a G-test of independence)." />
<meta property="og:image" content="https://msberends.gitlab.io/logo.png" />
<meta name="twitter:card" content="summary" />
<!-- mathjax -->
<script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/MathJax.js" integrity="sha256-nvJJv9wWKEm88qvoQl9ekL2J+k/RWIsaSScxxlsrv8k=" crossorigin="anonymous"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/config/TeX-AMS-MML_HTMLorMML.js" integrity="sha256-84DKXVJXs0/F8OTMzX4UR909+jtl4G7SPypPavF+GfA=" crossorigin="anonymous"></script>
<!--[if lt IE 9]>
<script src="https://oss.maxcdn.com/html5shiv/3.7.3/html5shiv.min.js"></script>
<script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
<![endif]-->
</head>
<body>
<div class="container template-reference-topic">
<header>
<div class="navbar navbar-default navbar-fixed-top" role="navigation">
<div class="container">
<div class="navbar-header">
<button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar" aria-expanded="false">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<span class="navbar-brand">
<a class="navbar-link" href="../index.html">AMR (for R)</a>
<span class="version label label-default" data-toggle="tooltip" data-placement="bottom" title="Released version">0.5.0.9008</span>
</span>
</div>
<div id="navbar" class="navbar-collapse collapse">
<ul class="nav navbar-nav">
<li>
<a href="../index.html">
<span class="fa fa-home"></span>
Home
</a>
</li>
<li>
<a href="../articles/AMR.html">
<span class="fa fa-directions"></span>
Get Started
</a>
</li>
<li>
<a href="../reference/">
<span class="fa fa-book-open"></span>
Manual
</a>
</li>
<li>
<a href="../authors.html">
<span class="fa fa-users"></span>
Authors
</a>
</li>
<li>
<a href="../news/">
<span class="far fa far fa-newspaper"></span>
Changelog
</a>
</li>
<li>
<a href="https://gitlab.com/msberends/AMR">
<span class="fab fa fab fa-gitlab"></span>
Source Code
</a>
</li>
</ul>
<ul class="nav navbar-nav navbar-right">
<li>
<a href="../LICENSE-text.html">
<span class="fa fa-book"></span>
Licence
</a>
</li>
</ul>
<form class="navbar-form navbar-right" role="search">
<div class="form-group">
<input type="search" class="form-control" name="search-input" id="search-input" placeholder="Search..." aria-label="Search for..." autocomplete="off">
</div>
</form>
</div><!--/.nav-collapse -->
</div><!--/.container -->
</div><!--/.navbar -->
</header>
<div class="row">
<div class="col-md-9 contents">
<div class="page-header">
<h1><em>G</em>-test for Count Data</h1>
<div class="hidden name"><code>g.test.Rd</code></div>
</div>
<div class="ref-description">
<p><code>g.test</code> performs <em>G</em>-tests (likelihood ratio tests) on contingency tables and for goodness-of-fit. It is used just like <code><a href='https://www.rdocumentation.org/packages/stats/topics/chisq.test'>chisq.test</a></code>, and the <em>G</em>-test is reported to be more reliable [1]. A <em>G</em>-test can be used to see whether the number of observations in each category fits a theoretical expectation (called a <strong><em>G</em>-test of goodness-of-fit</strong>), or to see whether the proportions of one variable are different for different values of the other variable (called a <strong><em>G</em>-test of independence</strong>).</p>
</div>
<pre class="usage"><span class='fu'>g.test</span>(<span class='no'>x</span>, <span class='kw'>y</span> <span class='kw'>=</span> <span class='kw'>NULL</span>, <span class='kw'>p</span> <span class='kw'>=</span> <span class='fu'><a href='https://www.rdocumentation.org/packages/base/topics/rep'>rep</a></span>(<span class='fl'>1</span>/<span class='fu'><a href='https://www.rdocumentation.org/packages/base/topics/length'>length</a></span>(<span class='no'>x</span>), <span class='fu'><a href='https://www.rdocumentation.org/packages/base/topics/length'>length</a></span>(<span class='no'>x</span>)),
<span class='kw'>rescale.p</span> <span class='kw'>=</span> <span class='fl'>FALSE</span>)</pre>
<h2 class="hasAnchor" id="arguments"><a class="anchor" href="#arguments"></a>Arguments</h2>
<table class="ref-arguments">
<colgroup><col class="name" /><col class="desc" /></colgroup>
<tr>
<th>x</th>
<td><p>a numeric vector or matrix. <code>x</code> and <code>y</code> can also
both be factors.</p></td>
</tr>
<tr>
<th>y</th>
<td><p>a numeric vector; ignored if <code>x</code> is a matrix. If
<code>x</code> is a factor, <code>y</code> should be a factor of the same length.</p></td>
</tr>
<tr>
<th>p</th>
<td><p>a vector of probabilities of the same length as <code>x</code>.
An error is given if any entry of <code>p</code> is negative.</p></td>
</tr>
<tr>
<th>rescale.p</th>
<td><p>a logical scalar; if TRUE then <code>p</code> is rescaled
(if necessary) to sum to 1. If <code>rescale.p</code> is FALSE, and
<code>p</code> does not sum to 1, an error is given.</p></td>
</tr>
</table>
<h2 class="hasAnchor" id="source"><a class="anchor" href="#source"></a>Source</h2>
<p>This code is almost identical to <code><a href='https://www.rdocumentation.org/packages/stats/topics/chisq.test'>chisq.test</a></code>, except that:</p><ul>
<li><p>The calculation of the statistic was changed to <code>2 * sum(x * log(x / E))</code></p></li>
<li><p>Yates' continuity correction was removed as it does not apply to a <em>G</em>-test</p></li>
<li><p>The possibility to simulate p values with <code>simulate.p.value</code> was removed</p></li>
</ul>
<h2 class="hasAnchor" id="value"><a class="anchor" href="#value"></a>Value</h2>
<p>A list with class <code>"htest"</code> containing the following
components:</p>
<dt>statistic</dt><dd><p>the value of the <em>G</em> test statistic.</p></dd>
<dt>parameter</dt><dd><p>the degrees of freedom of the approximate
chi-squared distribution of the test statistic.</p></dd>
<dt>p.value</dt><dd><p>the p-value for the test.</p></dd>
<dt>method</dt><dd><p>a character string indicating the type of test
performed.</p></dd>
<dt>data.name</dt><dd><p>a character string giving the name(s) of the data.</p></dd>
<dt>observed</dt><dd><p>the observed counts.</p></dd>
<dt>expected</dt><dd><p>the expected counts under the null hypothesis.</p></dd>
<dt>residuals</dt><dd><p>the Pearson residuals,
<code>(observed - expected) / sqrt(expected)</code>.</p></dd>
<dt>stdres</dt><dd><p>standardized residuals,
<code>(observed - expected) / sqrt(V)</code>, where <code>V</code> is the residual cell variance (Agresti, 2007,
section 2.4.5 for the case where <code>x</code> is a matrix, <code>n * p * (1 - p)</code> otherwise).</p></dd>
<h2 class="hasAnchor" id="details"><a class="anchor" href="#details"></a>Details</h2>
<p>If <code>x</code> is a matrix with one row or column, or if <code>x</code> is a vector and <code>y</code> is not given, then a <em>goodness-of-fit test</em> is performed (<code>x</code> is treated as a one-dimensional contingency table). The entries of <code>x</code> must be non-negative integers. In this case, the hypothesis tested is whether the population probabilities equal those in <code>p</code>, or are all equal if <code>p</code> is not given.</p>
<p>If <code>x</code> is a matrix with at least two rows and columns, it is taken as a two-dimensional contingency table: the entries of <code>x</code> must be non-negative integers. Otherwise, <code>x</code> and <code>y</code> must be vectors or factors of the same length; cases with missing values are removed, the objects are coerced to factors, and the contingency table is computed from these. Then a <em>G</em>-test of independence is performed on the null hypothesis that the joint distribution of the cell counts in a 2-dimensional contingency table is the product of the row and column marginals.</p>
<p>The p-value is computed from the asymptotic chi-squared distribution of the test statistic.</p>
<p>Unlike <code><a href='https://www.rdocumentation.org/packages/stats/topics/chisq.test'>chisq.test</a></code>, <code>g.test</code> has no <code>simulate.p.value</code> argument, so p-values cannot be obtained by Monte Carlo simulation; only the asymptotic chi-squared approximation is available.</p>
<h2 class="hasAnchor" id="g-test-of-goodness-of-fit-likelihood-ratio-test-"><a class="anchor" href="#g-test-of-goodness-of-fit-likelihood-ratio-test-"></a><em>G</em>-test of goodness-of-fit (likelihood ratio test)</h2>
<p>Use the <em>G</em>-test of goodness-of-fit when you have one nominal variable with two or more values (such as male and female, or red, pink and white flowers). You compare the observed counts in each category with the expected counts, which you calculate from a theoretical expectation (such as a 1:1 sex ratio or a 1:2:1 ratio in a genetic cross).</p>
<p>If the expected number of observations in any category is too small, the <em>G</em>-test may give inaccurate results, and you should use an exact test instead (<code><a href='https://www.rdocumentation.org/packages/stats/topics/fisher.test'>fisher.test</a></code>).</p>
<p>The <em>G</em>-test of goodness-of-fit is an alternative to the chi-square test of goodness-of-fit (<code><a href='https://www.rdocumentation.org/packages/stats/topics/chisq.test'>chisq.test</a></code>); each of these tests has some advantages and some disadvantages, and the results of the two tests are usually very similar.</p>
<h2 class="hasAnchor" id="g-test-of-independence"><a class="anchor" href="#g-test-of-independence"></a><em>G</em>-test of independence</h2>
<p>Use the <em>G</em>-test of independence when you have two nominal variables, each with two or more possible values. You want to know whether the proportions for one variable are different among values of the other variable.</p>
<p>It is also possible to do a <em>G</em>-test of independence with more than two nominal variables. For example, Jackson et al. (2013) had data that included children under 3, so you could analyze old vs. young, thigh vs. arm, and reaction vs. no reaction, all together.</p>
<p>Fisher's exact test (<code><a href='https://www.rdocumentation.org/packages/stats/topics/fisher.test'>fisher.test</a></code>) is more accurate than the <em>G</em>-test of independence when the expected numbers are small, so it is recommended to use the <em>G</em>-test only if your total sample size is greater than 1000.</p>
<p>The <em>G</em>-test of independence is an alternative to the chi-square test of independence (<code><a href='https://www.rdocumentation.org/packages/stats/topics/chisq.test'>chisq.test</a></code>), and they will give approximately the same results.</p>
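<p>As an illustration of the test of independence, the sketch below computes <em>G</em> for a 2 x 2 table in base R. It is not the code of <code>g.test</code>: the helper name <code>g_indep</code> is hypothetical, the counts are made up for illustration only, and treating empty cells as contributing 0 is an assumption of this sketch. Expected counts are the row total times the column total, divided by the grand total.</p>

```r
# Hypothetical helper: G-test of independence for a two-dimensional table.
g_indep <- function(tab) {
  tab <- as.matrix(tab)
  stopifnot(nrow(tab) >= 2, ncol(tab) >= 2, all(tab >= 0))
  # Expected counts under independence: row total * column total / grand total
  E <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  # Assumption of this sketch: cells with a zero count contribute 0
  terms <- ifelse(tab > 0, tab * log(tab / E), 0)
  G <- 2 * sum(terms)
  df <- (nrow(tab) - 1) * (ncol(tab) - 1)
  list(statistic = G, parameter = df,
       p.value = stats::pchisq(G, df, lower.tail = FALSE))
}

# Made-up counts, for illustration only
tab <- matrix(c(30, 20,
                15, 35), nrow = 2, byrow = TRUE,
              dimnames = list(group = c("a", "b"), outcome = c("yes", "no")))
res_indep <- g_indep(tab)
res_indep$statistic
res_indep$p.value
```

<p>Running <code>chisq.test(tab, correct = FALSE)</code> on the same table should give a similar, but not identical, statistic with the same degrees of freedom.</p>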
<h2 class="hasAnchor" id="how-the-test-works"><a class="anchor" href="#how-the-test-works"></a>How the test works</h2>
<p>Unlike the exact test of goodness-of-fit (<code><a href='https://www.rdocumentation.org/packages/stats/topics/fisher.test'>fisher.test</a></code>), the <em>G</em>-test does not directly calculate the probability of obtaining the observed results or something more extreme. Instead, like almost all statistical tests, the <em>G</em>-test has an intermediate step; it uses the data to calculate a test statistic that measures how far the observed data are from the null expectation. You then use a mathematical relationship, in this case the chi-square distribution, to estimate the probability of obtaining that value of the test statistic.</p>
<p>The <em>G</em>-test uses the log of the ratio of two likelihoods as the test statistic, which is why it is also called a likelihood ratio test or log-likelihood ratio test. The formula to calculate a <em>G</em>-statistic is:</p>
<p><code>G &lt;- 2 * sum(x * log(x / E))</code></p>
<p>where <code>x</code> are the observed counts and <code>E</code> are the expected counts. Since <code>G</code> is approximately chi-square distributed under the null hypothesis, the p value can be calculated with:</p>
<p><code>p &lt;- stats::pchisq(G, df, lower.tail = FALSE)</code></p>
<p>where <code>df</code> is the number of degrees of freedom.</p>
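<p>The two formulas can be tried out in base R. The following is a minimal sketch, not the implementation of <code>g.test</code>: the helper name <code>g_gof</code> is hypothetical, and treating categories with a zero count as contributing 0 is an assumption of this sketch. It uses the counts and the 1:2:1 ratio from Example 1 in the Examples section.</p>

```r
# Hypothetical helper: G-test of goodness-of-fit using base R only.
g_gof <- function(x, p = rep(1 / length(x), length(x))) {
  stopifnot(length(x) == length(p), all(x >= 0), abs(sum(p) - 1) < 1e-8)
  E <- sum(x) * p                            # expected counts under the null
  # Assumption of this sketch: categories with a zero count contribute 0
  terms <- ifelse(x > 0, x * log(x / E), 0)
  G <- 2 * sum(terms)
  df <- length(x) - 1
  list(statistic = G, parameter = df,
       p.value = stats::pchisq(G, df, lower.tail = FALSE))
}

# Counts from Example 1, tested against a 1:2:1 ratio
res <- g_gof(c(772, 1611, 737), p = c(1, 2, 1) / 4)
res$statistic
res$p.value
```

<p>For these counts the p value should agree with the 0.12574 quoted in Example 1.</p>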
<p>If there are more than two categories and you want to find out which ones are significantly different from their null expectation, you can test each category against the sum of all other categories and apply the Bonferroni correction for the number of tests. Each of these tests is itself a <em>G</em>-test.</p>
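<p>A possible way to run such per-category tests in base R is sketched below, again with the counts from Example 1. The helper name <code>g_one_vs_rest</code> is hypothetical and this is not part of <code>g.test</code>; each test pools all other categories into one, so it has one degree of freedom, and <code>p.adjust</code> applies the Bonferroni correction.</p>

```r
x <- c(772, 1611, 737)   # observed counts from Example 1
p <- c(1, 2, 1) / 4      # null proportions (1:2:1 ratio)

# Hypothetical helper: G-test (df = 1) of category i against all others pooled
g_one_vs_rest <- function(i, x, p) {
  obs <- c(x[i], sum(x) - x[i])
  E <- sum(x) * c(p[i], 1 - p[i])
  G <- 2 * sum(ifelse(obs > 0, obs * log(obs / E), 0))
  stats::pchisq(G, df = 1, lower.tail = FALSE)
}

p_raw <- vapply(seq_along(x), function(i) g_one_vs_rest(i, x, p), numeric(1))
# Bonferroni: multiply each p value by the number of tests, capped at 1
p_adj <- stats::p.adjust(p_raw, method = "bonferroni")
p_adj
```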
<h2 class="hasAnchor" id="references"><a class="anchor" href="#references"></a>References</h2>
<p>[1] McDonald, J.H. 2014. <strong>Handbook of Biological Statistics (3rd ed.)</strong>. Sparky House Publishing, Baltimore, Maryland. <a href='http://www.biostathandbook.com/gtestgof.html'>http://www.biostathandbook.com/gtestgof.html</a>.</p>
<h2 class="hasAnchor" id="see-also"><a class="anchor" href="#see-also"></a>See also</h2>
<div class='dont-index'><p><code><a href='https://www.rdocumentation.org/packages/stats/topics/chisq.test'>chisq.test</a></code></p></div>
<h2 class="hasAnchor" id="examples"><a class="anchor" href="#examples"></a>Examples</h2>
<pre class="examples"><div class='input'><span class='co'># = EXAMPLE 1 =</span>
<span class='co'># Shivrain et al. (2006) crossed clearfield rice (which are resistant</span>
<span class='co'># to the herbicide imazethapyr) with red rice (which are susceptible to</span>
<span class='co'># imazethapyr). They then crossed the hybrid offspring and examined the</span>
<span class='co'># F2 generation, where they found 772 resistant plants, 1611 moderately</span>
<span class='co'># resistant plants, and 737 susceptible plants. If resistance is controlled</span>
<span class='co'># by a single gene with two co-dominant alleles, you would expect a 1:2:1</span>
<span class='co'># ratio.</span>
<span class='no'>x</span> <span class='kw'>&lt;-</span> <span class='fu'><a href='https://www.rdocumentation.org/packages/base/topics/c'>c</a></span>(<span class='fl'>772</span>, <span class='fl'>1611</span>, <span class='fl'>737</span>)
<span class='no'>G</span> <span class='kw'>&lt;-</span> <span class='fu'>g.test</span>(<span class='no'>x</span>, <span class='kw'>p</span> <span class='kw'>=</span> <span class='fu'><a href='https://www.rdocumentation.org/packages/base/topics/c'>c</a></span>(<span class='fl'>1</span>, <span class='fl'>2</span>, <span class='fl'>1</span>) / <span class='fl'>4</span>)
<span class='co'># G$p.value = 0.12574.</span>
<span class='co'># There is no significant difference from a 1:2:1 ratio.</span>
<span class='co'># Meaning: resistance controlled by a single gene with two co-dominant</span>
<span class='co'># alleles is plausible.</span>
<span class='co'># = EXAMPLE 2 =</span>
<span class='co'># Red crossbills (Loxia curvirostra) have the tip of the upper bill either</span>
<span class='co'># right or left of the lower bill, which helps them extract seeds from pine</span>
<span class='co'># cones. Some have hypothesized that frequency-dependent selection would</span>
<span class='co'># keep the number of right and left-billed birds at a 1:1 ratio. Groth (1992)</span>
<span class='co'># observed 1752 right-billed and 1895 left-billed crossbills.</span>
<span class='no'>x</span> <span class='kw'>&lt;-</span> <span class='fu'><a href='https://www.rdocumentation.org/packages/base/topics/c'>c</a></span>(<span class='fl'>1752</span>, <span class='fl'>1895</span>)
<span class='fu'>g.test</span>(<span class='no'>x</span>)</div><div class='output co'>#&gt;
#&gt; G-test of goodness-of-fit (likelihood ratio test)
#&gt;
#&gt; data: x
#&gt; X-squared = 5.6085, df = 1, p-value = 0.01787
#&gt; </div><div class='input'># p = 0.01787343
# There is a significant difference from a 1:1 ratio.
# Meaning: there are significantly more left-billed birds.
</div></pre>
</div>
<div class="col-md-3 hidden-xs hidden-sm" id="sidebar">
<h2>Contents</h2>
<ul class="nav nav-pills nav-stacked">
<li><a href="#arguments">Arguments</a></li>
<li><a href="#source">Source</a></li>
<li><a href="#value">Value</a></li>
<li><a href="#details">Details</a></li>
<li><a href="#g-test-of-goodness-of-fit-likelihood-ratio-test-"><em>G</em>-test of goodness-of-fit (likelihood ratio test)</a></li>
<li><a href="#g-test-of-independence"><em>G</em>-test of independence</a></li>
<li><a href="#how-the-test-works">How the test works</a></li>
<li><a href="#references">References</a></li>
<li><a href="#see-also">See also</a></li>
<li><a href="#examples">Examples</a></li>
</ul>
</div>
</div>
<footer>
<div class="copyright">
<p>Developed by <a href='https://www.rug.nl/staff/m.s.berends/'>Matthijs S. Berends</a>, <a href='https://www.rug.nl/staff/c.f.luz/'>Christian F. Luz</a>, <a href='https://www.rug.nl/staff/c.glasner/'>Corinna Glasner</a>, <a href='https://www.rug.nl/staff/a.w.friedrich/'>Alex W. Friedrich</a>, <a href='https://www.rug.nl/staff/b.sinha/'>Bhanu N. M. Sinha</a>.</p>
</div>
<div class="pkgdown">
<p>Site built with <a href="https://pkgdown.r-lib.org/">pkgdown</a> 1.3.0.</p>
</div>
</footer>
</div>
<script src="https://cdnjs.cloudflare.com/ajax/libs/docsearch.js/2.6.1/docsearch.min.js" integrity="sha256-GKvGqXDznoRYHCwKXGnuchvKSwmx9SRMrZOTh2g4Sb0=" crossorigin="anonymous"></script>
<script>
docsearch({
apiKey: 'f737050abfd4d726c63938e18f8c496e',
indexName: 'amr',
inputSelector: 'input#search-input.form-control',
transformData: function(hits) {
return hits.map(function (hit) {
hit.url = updateHitURL(hit);
return hit;
});
}
});
</script>
</body>
</html>