Testing Independence: Chi-Squared Vs Fisher's Exact Test
The two most common tests for determining whether measurements from different groups are independent are the chi-squared test (\(\chi^2\) test) and Fisher’s exact test. Note that you should use McNemar’s test if the measurements were paired (e.g. individual looms could be identified).
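The contingency table df used throughout this post is not constructed in this excerpt. Judging from the row and column labels that appear below (wool A/B, tension L/M/H), it plausibly tallies the total number of warp breaks per combination of wool and tension from R's built-in warpbreaks data set; a sketch under that assumption:

```r
# Assumption: df is the wool-by-tension table of total warp breaks
# from R's built-in warpbreaks data set (54 runs, 9 per combination).
df <- xtabs(breaks ~ wool + tension, data = warpbreaks)
print(df)
```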
Pearson’s chi-squared test
The \(\chi^2\) test is a non-parametric test that can be applied to contingency tables of various dimensions. The name of the test derives from the \(\chi^2\) distribution, the distribution of a sum of squares of independent standard normal variables. This is the distribution of the test statistic of the \(\chi^2\) test, which is defined as the sum of the chi-square values \(\chi_{i,j}^2\) over all cells \(i,j\), where each value arises from the difference between a cell's observed value \(O_{i,j}\) and its expected value \(E_{i,j}\), normalized by \(E_{i,j}\):
\[\sum_{i,j} \chi_{i,j}^2 \quad \text{where} \quad \chi_{i,j}^2 = \frac{(O_{i,j} - E_{i,j})^2}{E_{i,j}}\]
The intuition here is that \(\sum \chi_{i,j}^2\) will be large if the observed values deviate considerably from the expected values, while \(\sum \chi_{i,j}^2\) will be close to zero if the observed values agree well with the expected values. We perform the test via chisq.test:
```r
chi.result <- chisq.test(df)
print(chi.result$p.value)
## [1] 7.900708e-07
```

Since the p-value is less than 0.05, we can reject the null hypothesis of the test (that the frequency of breaks is independent of the wool) at the 5% significance level. Based on the entries of df, one could then claim that wool B is significantly better (with respect to warp breaks) than wool A.
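The value of the statistic can also be reproduced by hand from the formula above (a sketch that assumes df is the wool-by-tension table of total warp breaks from warpbreaks):

```r
# Assumption: df is the 2x3 wool-by-tension table of warp breaks.
df <- xtabs(breaks ~ wool + tension, data = warpbreaks)

# Expected counts under independence: E_ij = (row sum * column sum) / total
expected <- outer(rowSums(df), colSums(df)) / sum(df)

# Sum of per-cell chi-square values (O - E)^2 / E
chi.sq <- sum((df - expected)^2 / expected)

# Should match the statistic reported by chisq.test(df)
print(chi.sq)
print(unname(chisq.test(df)$statistic))
```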
Investigating the Pearson residuals
Another way is to consider the chi-square values of the test. The chisq.test function provides the Pearson residuals, that is, the signed square roots \(\chi_{i,j}\) of the chi-square values. In contrast to the chi-square values, which result from squared differences, the residuals are not squared. Thus, the residuals reflect the extent to which an observed value exceeded the expected value (positive value) or fell short of it (negative value). In our data set, positive values indicate more strand breaks than expected, while negative values indicate fewer breaks:
```r
print(chi.result$residuals)
##     tension
## wool          L          M          H
##    A  2.0990516 -2.8348433  0.4082867
##    B -2.3267672  3.1423813 -0.4525797
```

The residuals show that wool B had fewer breaks than expected for low and high tensions. For medium tension, however, wool B had more breaks than expected. Again, we find that, overall, wool B is superior to wool A. The residuals also indicate that wool B performs best for low tension (residual of -2.33), well for high tension (-0.45), and badly for medium tension (3.14). The residuals thus helped us identify a problem with wool B: it does not perform well for medium tension. How would this inform further development? To obtain a wool that performs well at all tension levels, we would need to focus on improving wool B for medium tension. For this purpose, we could consider the properties that make wool A perform better at medium tension.
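The residual structure can also be inspected visually: R's mosaicplot shades cells by their Pearson residuals when shade = TRUE (a sketch, again assuming df stems from warpbreaks):

```r
# Assumption: df is the wool-by-tension table of total warp breaks.
df <- xtabs(breaks ~ wool + tension, data = warpbreaks)

pdf(NULL)  # draw to a null device so the example runs headless
# shade = TRUE colors cells with large Pearson residuals, highlighting
# the wool/tension combinations that drive the test result
mosaicplot(df, shade = TRUE, main = "Warp breaks by wool and tension")
invisible(dev.off())
```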
Fisher’s exact test
Fisher’s exact test is a non-parametric test of independence that is typically applied only to \(2 \times 2\) contingency tables. As an exact significance test, Fisher’s test meets all the assumptions on which the distribution of the test statistic is based. In practice, this means that the false rejection rate equals the significance level of the test, which is not necessarily true for approximate tests such as the \(\chi^2\) test. In short, Fisher’s exact test relies on the hypergeometric distribution: the probability of observing a particular table with cell counts \(n_{i,j}\) is computed using binomial coefficients, namely via
\[p = \frac{\binom{n_{1,1} + n_{1,2}}{n_{1,1}} \binom{n_{2,1} + n_{2,2}}{n_{2,1}}}{\binom{n_{1,1} + n_{1,2} + n_{2,1} + n_{2,2}}{n_{1,1} + n_{2,1}}}\]
The p-value is then obtained by summing these probabilities over all tables that are at least as extreme as the observed one. Since the factorials involved can become very large, Fisher's exact test may not be feasible for large sample sizes.
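For a small, made-up \(2 \times 2\) table, this computation can be reproduced directly and compared with the output of fisher.test (the counts below are hypothetical):

```r
# Hypothetical 2x2 table: n11 = 3, n12 = 1, n21 = 1, n22 = 3
tbl <- matrix(c(3, 1, 1, 3), nrow = 2)

# Probability of this exact table via binomial coefficients
p.exact <- choose(3 + 1, 3) * choose(1 + 3, 1) / choose(8, 3 + 1)
print(p.exact)  # 16/70

# One-sided p-value: sum the hypergeometric probabilities of all
# tables with n11 at least as large as observed (n11 = 3 or 4)
p.onesided <- sum(dhyper(3:4, m = 4, n = 4, k = 4))
print(p.onesided)  # 17/70

# Agrees with fisher.test
print(fisher.test(tbl, alternative = "greater")$p.value)
```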
Note that it is not possible to specify the alternative hypothesis of the test for df, since the odds ratio, which indicates the effect size, is defined only for \(2 \times 2\) matrices:
\[OR = {\frac{n_{1,1}}{n_{1,2}}}/{\frac{n_{2,1}}{n_{2,2}}}\]
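For a \(2 \times 2\) table the sample odds ratio is straightforward to compute (hypothetical counts again); note that fisher.test reports a conditional maximum-likelihood estimate of the odds ratio, which can differ slightly from this sample value:

```r
# Hypothetical 2x2 table of counts
tbl <- matrix(c(3, 1, 1, 3), nrow = 2)

# Sample odds ratio: (n11 / n12) / (n21 / n22)
or.sample <- (tbl[1, 1] / tbl[1, 2]) / (tbl[2, 1] / tbl[2, 2])
print(or.sample)  # = 9
```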
We can still perform Fisher’s exact test to obtain a p-value:
```r
fisher.result <- fisher.test(df)
print(fisher.result$p.value)
## [1] 8.162421e-07
```

The resulting p-value is similar to the one obtained from the \(\chi^2\) test, and we arrive at the same conclusion: we can reject the null hypothesis that the type of wool is independent of the number of breaks observed at the different tension levels.
Conversion to 2 by 2 matrices
To specify the alternative hypothesis and obtain the odds ratio, we could compute the test for the three \(2 \times 2\) matrices that can be constructed from df:
```r
p.values <- rep(NA, 3)
for (i in seq(ncol(df))) {
  # compare strand breaks for the tested tension level vs the others
  test.df <- cbind(df[, i], apply(df[, -i], 1, sum))
  tested.stress <- colnames(df)[i]
  colnames(test.df) <- c(tested.stress, "other") # for clarity
  test.res <- fisher.test(test.df, alternative = "greater")
  p.values[i] <- test.res$p.value
  names(p.values)[i] <- paste0(tested.stress, " vs others")
}
```

Since the alternative is set to greater, we are performing one-tailed tests in which the alternative hypothesis is that wool A is associated with a greater number of breaks than wool B (i.e. we expect \(OR > 1\)). By performing tests on \(2 \times 2\) tables, we also gain interpretability: we can now distinguish the specific conditions under which the wools differ. Before interpreting the p-values, however, we need to correct for multiple hypothesis testing. In this case, we have performed three tests. Here, we simply adjust the initial significance level of 0.05 to \(\frac{0.05}{3} = 0.01\overline{6}\) according to the Bonferroni method. Based on the adjusted threshold, the following tests were significant:
```r
print(names(p.values)[which(p.values < 0.05/3)])
## [1] "L vs others"
```

This finding indicates that wool B is significantly superior to wool A only when the tension is low. Note that we could also have taken the approach of constructing \(2 \times 2\) matrices for the \(\chi^2\) test. With the \(\chi^2\) test, however, this was not necessary because we based our analysis on the residuals.
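Instead of lowering the significance threshold, the p-values themselves can be adjusted with R's p.adjust; comparing the adjusted values against the original 0.05 level is equivalent to the Bonferroni-corrected threshold used above (the p-values below are hypothetical, since the exact values are not shown in this post):

```r
# Hypothetical p-values from three one-tailed tests
p.values <- c(0.001, 0.04, 0.30)

# The Bonferroni adjustment multiplies each p-value by the number of
# tests (capping at 1); compare the result against the original 0.05
p.adjusted <- p.adjust(p.values, method = "bonferroni")
print(p.adjusted)                # 0.003 0.120 0.900
print(which(p.adjusted < 0.05))  # only the first test remains significant
```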