Chapter 3 Effect Sizes | Doing Meta-Analysis In R - Bookdown
3.1 What Is an Effect Size?
In the terminology we use in this book, an effect size is defined as a metric quantifying the relationship between two entities. It captures the direction and magnitude of this relationship. If relationships are expressed as the same effect size, it is possible to compare them.
We want to stress here that this is just one way to define what an effect size means. Definitions of an effect size can be wider and narrower, and the term is used differently by different people (Borenstein et al. 2011, chap. 3). Some researchers only talk of effect sizes when referring to the results of intervention studies, which are usually expressed as differences between the treatment and control group (see Chapter 3.3.1). Using this conceptualization, “effect size” refers to the effect of a treatment, and how large this effect is.
In our opinion, this is quite a narrow definition. Not only treatments can have an effect on some variable; effects can also appear naturally without any direct human intervention. For example, it is possible that socio-demographic variables, such as the income and education of parents, may have an effect on the educational attainment of their children. Correlations describe how well we can predict the values of a variable through the values of another, and can also be seen as a form of effect size.
On the other hand, it might go too far to say that everything we can pool as part of a meta-analysis is automatically an effect size. As we will learn, there are measures of central tendency, such as the sample mean, which can also be used in meta-analyses. But a sample mean alone does not quantify a relationship between two phenomena, and there is no “effect”. Nevertheless, in this book, we will often use the term “effect size” as a pars pro toto, representing both estimates of an actual effect, as well as “one-variable” and central tendency measures. We do this not because it is accurate, but because it is more convenient.
Others disapprove of the term “effect size” altogether. They stress that the word “effect” in “effect size” suggests that there is a causal relationship. However, we all know that correlation is not causation, and a difference between an intervention and control group is not necessarily caused by the treatment itself. In the end, it is up to you to decide which definition you prefer, but be aware that people may have different conceptualizations in mind when they talk about effect sizes.
In mathematical notation, it is common to use the Greek letter theta (\(\theta\)) as the symbol for a true effect size. More precisely, \(\theta_k\) represents the true effect size of a study \(k\). Importantly, the true effect size is not identical with the observed effect size that we find in the published results of the study. The observed effect size is only an estimate of the true effect size. It is common to use a hat (^) symbol to clarify that the entity we refer to is only an estimate. The observed effect size in study \(k\), our estimate of the true effect size, can therefore be written as \(\hat\theta_k\).
But why does \(\hat\theta_k\) differ from \(\theta_k\)? It differs because of the sampling error, which can be symbolized as \(\epsilon_k\). In every primary study, researchers can only draw a small sample from the whole population. For example, when we want to examine the benefits of regular exercise on the cardiovascular health of primary care patients, we will only be able to include a small selection of patients, not all primary care patients in the world. The fact that a study can only take small samples from an infinitely large population means that the observed effect will differ from the true population effect.
Put simply, \(\hat\theta_k\) is, therefore, the same as \(\theta_k\) plus some sampling error \(\epsilon_k\).
\[\begin{align} \hat\theta_k = \theta_k + \epsilon_k \tag{3.1} \end{align}\]
It is obviously desirable that the effect size estimate \(\hat\theta_k\) of study \(k\) is as close as possible to the true effect size, and that \(\epsilon_k\) is minimal. All things being equal, we can assume that studies with smaller \(\epsilon\) will deliver a more precise estimate of the true effect size. Meta-analysis methods take into account how precise an effect size estimate is (see Chapter 4). When pooling the results of different studies, they give effects with a greater precision (i.e., less sampling error) a higher weight, because they are better estimators of the true effect (L. Hedges and Olkin 2014).
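To make the idea of precision-based weighting more tangible, here is a minimal sketch. It assumes inverse-variance weights (each study weighted by \(1/SE^2\)), which is one common scheme but is not spelled out in this section; the pooling models actually used are covered in Chapter 4. The three effect sizes and standard errors are made-up numbers, and all object names are hypothetical.

```r
# Hypothetical observed effect sizes and standard errors of three studies
# (made-up numbers, for illustration only)
theta_hat <- c(0.30, 0.50, 0.10)
se <- c(0.10, 0.25, 0.40)

# Inverse-variance weights (an assumed weighting scheme):
# studies with a smaller standard error receive a larger weight
w <- 1 / se^2

# Relative weight of each study (the relative weights sum to 1)
w_rel <- w / sum(w)

# Weighted average of the observed effect sizes
pooled <- sum(w * theta_hat) / sum(w)

round(w_rel, 3)
pooled
```

In this toy example, the first study has the smallest standard error and therefore dominates the weighted average.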
But how can we know how big the sampling error is? Unsurprisingly, the true effect of a study \(\theta_k\) is unknown, and so \(\epsilon_k\) is also unknown. Often, however, we can use statistical theory to approximate the sampling error. A common way to quantify \(\epsilon\) is through the standard error (\(SE\)). The standard error is defined as the standard deviation of the sampling distribution. A sampling distribution is the distribution of a metric we get when we draw random samples with the same sample size \(n\) from our population many, many times.
We can make this more concrete by simulating data in R. We can pretend that we are drawing random samples from a larger population using the rnorm function. This function allows us to draw random samples from a normal distribution, therefore the name. The rnorm function simulates a “perfect world” in which we know how values are distributed in the true population and lets us take samples.
The function takes three arguments: n, the number of observations we want to have in our sample; mean, the true mean of the population; and sd, the true standard deviation. The rnorm function has a random component, so to make results reproducible, we have to set a seed first. This can be done using the set.seed function, which we have to supply with a number. For our example, we chose to set a seed of 123. Furthermore, we want to simulate that the true mean of our population is \(\mu =\) 10, that the true standard deviation is \(\sigma =\) 2, and that our sample consists of \(n=\) 50 randomly selected observations, which we save under the name sample.
This is what our code looks like:
set.seed(123)
sample <- rnorm(n = 50, mean = 10, sd = 2)

Now, we can calculate the mean of our sample.

mean(sample)
## [1] 10.06881

We see that the mean is \(\bar{x} =\) 10.07, which is already very close to the true value in our population. The sampling distribution can now be created by repeating what we did here–taking a random sample and calculating its mean–countless times. To simulate this process for you, we conducted the steps from before 1000 times.
The histogram in Figure 3.1 displays the results. We can see that the means of the samples closely resemble a normal distribution with a mean of 10. If we were to draw even more samples, the histogram would trace the shape of the sampling distribution even more closely. That the sampling distribution of the mean approaches a normal distribution as the sample size grows is expressed in one of the most fundamental tenets of statistics, the central limit theorem (Aronow and Miller 2019, chap. 3.2.4).

Figure 3.1: “Sampling distribution” of means (1000 samples).
The standard error is defined as the standard deviation of this sampling distribution. Therefore, we calculated the standard deviation of the 1000 simulated means to get an approximation of the standard error. The result is \(SE =\) 0.267.
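The simulation described above can be sketched in a few lines of R. This is not necessarily the code that produced Figure 3.1: the seed and the object name sample_means are assumptions, so the resulting value should be close to, but not identical with, the one reported here.

```r
set.seed(123)

# Draw 1000 samples of size n = 50 from a normal population
# (mu = 10, sigma = 2) and store the mean of each sample
sample_means <- replicate(1000, mean(rnorm(n = 50, mean = 10, sd = 2)))

# Histogram of the simulated sampling distribution
hist(sample_means, breaks = 30,
     main = "Simulated sampling distribution of the mean",
     xlab = "Sample mean")

# The standard deviation of the simulated means approximates
# the standard error of the mean
sd(sample_means)
```

The replicate function simply re-runs the expression in its second argument the requested number of times, which keeps the simulation compact.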
As we mentioned before, we cannot simply calculate the standard error in real life by simulating the true sampling distribution. However, there are formulas based on statistical theory which allow us to calculate an estimate of the standard error, even when we are limited to only one observed sample–which we usually are. The formula to calculate the standard error of the mean is defined like this:
\[\begin{align} SE = \frac{s}{\sqrt{n}} \tag{3.2} \end{align}\]
It defines the standard error as the standard deviation of our sample \(s\), divided by the square root of the sample size \(n\). Using this formula, we can easily calculate the standard error of our sample object from before using R. Remember that the size of our random sample was \(n =\) 50.
sd(sample)/sqrt(50)
## [1] 0.2618756

If we compare this value to the one we found in our simulation of the sampling distribution, we see that they are nearly identical. Using the formula, we could quite accurately estimate the standard error using only the sample we have at hand.
In formula 3.2, we can see that the standard error of the mean depends on the sample size of a study. When \(n\) becomes larger, the standard error becomes smaller, meaning that a study’s estimate of the true population mean becomes more precise.
To exemplify this relationship, we conducted another simulation. Again, we used the rnorm function, and assumed a true population mean of \(\mu =\) 10 and that \(\sigma =\) 2. But this time, we varied the sample size, from \(n =\) 2 to \(n =\) 500. For each simulation, we calculated both the mean, and the standard error using formula 3.2.

Figure 3.2: Sample mean and standard error as a function of sample size.
Figure 3.2 shows the results. We can see that the means look like a funnel: as the sample size increases, the mean estimates become more and more precise, and converge towards 10. This increase in precision is represented by the standard error: with increasing sample size, the standard error becomes smaller and smaller.
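A simulation along these lines could look as follows. This is a sketch under assumptions of our own (one sample per sample size, a seed of 123, and the hypothetical object names n_values and sim); the code behind Figure 3.2 may differ.

```r
set.seed(123)

# Sample sizes from n = 2 to n = 500
n_values <- 2:500

# For each sample size, draw one sample and calculate its mean and
# the standard error of the mean using formula 3.2
sim <- t(sapply(n_values, function(n) {
  x <- rnorm(n = n, mean = 10, sd = 2)
  c(n = n, mean = mean(x), se = sd(x) / sqrt(n))
}))
sim <- as.data.frame(sim)

# Sample means scatter widely for small n and narrow in on 10 as n grows
plot(sim$n, sim$mean, xlab = "Sample size", ylab = "Sample mean")
abline(h = 10, lty = 2)

# The standard error shrinks with increasing sample size
plot(sim$n, sim$se, xlab = "Sample size", ylab = "Standard error")
```

Because only one sample is drawn per sample size, individual points are noisy, but the overall funnel shape and the declining standard error should be clearly visible.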
We have now explored the quintessential elements we need to conduct a meta-analysis: an (1) observed effect size or outcome measure, and (2) its precision, expressed as the standard error. If these two types of information can be calculated from a published study, it is usually also possible to perform a meta-analytic synthesis (see Chapter 4).
In our simulations, we used the mean of a variable as an example. It is important to understand that the properties we saw above can also be found in other outcome measures, including commonly used effect sizes. If we had calculated a mean difference in our sample instead of a mean, this mean difference would have exhibited a similarly shaped sampling distribution, and the standard error of the mean difference would also have decreased as the sample size increased (provided the standard deviation remains the same). The same is also true, for example, for (Fisher’s \(z\) transformed) correlations.
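As a quick check of this claim, the sketch below repeats the sampling-distribution simulation for a mean difference between two independent groups. The group means (10 and 9), the common standard deviation of 2, the two group sizes, and the helper name sim_md are all hypothetical choices made for illustration.

```r
set.seed(123)

# Simulate the sampling distribution of a mean difference between two
# independent groups of size n each (hypothetical population values)
sim_md <- function(n, reps = 1000) {
  replicate(reps, {
    g1 <- rnorm(n = n, mean = 10, sd = 2)
    g2 <- rnorm(n = n, mean = 9, sd = 2)
    mean(g1) - mean(g2)
  })
}

md_small <- sim_md(n = 20)
md_large <- sim_md(n = 200)

# Standard deviation of the simulated mean differences, i.e. the
# approximate standard error, for both group sizes
sd(md_small)
sd(md_large)
```

With the standard deviation held constant, the simulated mean differences for the larger groups should scatter much less around the true difference of 1 than those for the smaller groups.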
In the following sections, we will go through the most commonly used effect sizes and outcome measures in meta-analyses. One reason why these effect size metrics are used so often is that they fulfill two of the criteria we defined at the beginning of this chapter: they are reliable and computable.
In formula 3.2, we described how the standard error of a mean can be calculated, but this formula can only be readily applied to means. Different formulas to calculate the standard error are needed for other effect sizes and outcome measures. For the effect size metrics we cover here, these formulas luckily exist, and we will show you all of them. A collection of the formulas can also be found in the Appendix. Some of these formulas are somewhat complicated, but the good news is that we hardly ever have to calculate the standard error manually. There are various functions in R which do the heavy lifting for us.
In the following section, we not only want to provide a theoretical discussion of different effect size metrics. We also show you which kind of information you have to prepare in your data set so that the R meta-analysis functions we are using later can easily calculate the effect sizes for us.
We grouped effect sizes based on the type of research design in which they usually appear: observational designs (e.g. naturalistic studies or surveys), and experimental designs (e.g. controlled clinical trials). Please note that this is just a rough classification, not a strict rule. Many of the effect sizes we present are technically applicable to any type of research design, as long as the type of outcome data is suited.