Overview Of Frequentist Confidence Intervals
Common Mistakes in Using Statistics: Spotting and Avoiding Them
The General Situation:
- We are considering a random variable Y.
- We are interested in a certain parameter (e.g., a proportion, or mean, or regression coefficient, or variance) associated with the random variable Y.
- We do not know the value of the parameter.
- Goal 1: We would like to estimate the unknown parameter, using data from a sample.
- Goal 2: We would also like to get some sense of how good our estimate is.
- The mean of the random variable Y is also called the expected value or the expectation of Y. It is denoted E(Y). It is also called the population mean, often denoted µ. It is what we do not know in this example.
- A sample mean is typically denoted ȳ (read "y-bar"). It is calculated from a sample y1, y2, ... , yn of values of Y by the familiar formula ȳ = (y1+ y2+ ... + yn)/n.
- The population mean µ and a sample mean ȳ are usually not the same. Confusing them is a common mistake.
- Note that I have written "the population mean" but "a sample mean". A sample mean depends on the sample chosen. Since there are many possible samples, there are many possible sample means.
- To estimate µ, we collect a suitable sample and compute a sample mean from it. In this case, "suitable sample" turns out to be "simple random sample" (i.e., the model assumptions for the particular procedure require a simple random sample).
- So we collect a simple random sample, say of size n, consisting of observations y1, y2, ... , yn. (For example, if Y is "height of an adult American male", we take a sample random sample of n adult American males; y1, y2, ... , yn are their heights.)
- We use the sample mean ȳ = (y1+ y2+ ... + yn)/n as our estimate of µ. (This is an example of a point estimate -- a numerical estimate with no indication of how good the estimate is.)
- But to get an idea of how good our estimate is, we look at all possible simple random samples of size n from Y. (In the specific example, we consider all possible simple random samples of adult American males, and for each sample of men, the list of their heights.)
- One way we can get a sense of how good our estimate is in this situation is to consider the sample means for all possible simple random samples of size n from Y. This amounts to defining a new random variable, which we will call Ȳn (read "Y-bar sub n"). We can describe the random variable Ȳn as "sample mean of a simple random sample of size n from Y", or perhaps more clearly as: "pick a simple random sample of size n from Y and calculate its sample mean". Note that each value of Ȳn is an estimate of the population mean µ.
- This new random variable Ȳn has a distribution. This is called a sampling distribution, since it arises from considering varying samples. The distribution of Ȳn gives us information about the variability (as samples vary) of our method of estimating the population mean µ. (See the site's summary chart and picture of both distributions; see also the Rice Virtual Lab in Statistics' Sampling Distribution Simulation to visualize sampling distributions for a variety of parameters and a variety of distributions.)
- We don't know the sampling distribution (distribution of Ȳn) exactly (in particular, it will depend on µ, which we don't know), but the model assumptions will tell us enough so that it is possible to do the following:
- If we specify a probability (we'll use 0.95 to illustrate), we can find a number a so that
(*) P(µ − a < Ȳn < µ + a) = 0.95.
In other words: for 95% of all simple random samples of size n from Y, the sample mean lies within a of the population mean µ.
- A little algebraic manipulation allows us to restate (*) as
P(Ȳn − a < µ < Ȳn + a) = 0.95.
In other words: for 95% of all simple random samples of size n from Y, the interval from Ȳn − a to Ȳn + a contains the population mean µ.
- We are now faced with two possibilities (assuming the model assumptions are indeed all true): either the sample we have collected is one of the 95% of samples whose interval from Ȳn − a to Ȳn + a contains µ, or it is one of the 5% of samples whose interval does not contain µ.
- Nonetheless, we calculate the values of Ȳn − a and Ȳn + a for the sample we have, and call the resulting interval a 95% confidence interval for µ. We can say that we have obtained the confidence interval by using a procedure which, for 95% of all simple random samples from Y, of the given size, produces an interval containing the parameter we are estimating. Unfortunately, we can't know whether or not the sample we have used is one of the 95% of "good" samples that yield a confidence interval containing the true mean µ, or whether the sample we have is one of the 5% of "bad" samples that yield a confidence interval that does not contain the true mean µ. We can just say that we have used a procedure that "works" 95% of the time.
- Each type of confidence interval procedure has its own model assumptions; if the model assumptions are not true, we are not sure that the procedure does what is claimed. However, some procedures are robust to some degree to some departures from model assumptions -- i.e., the procedure works pretty closely to what is intended if the model assumption is not too far from true. Robustness depends on the particular procedure; there are no "one size fits all" rules; see Using an Inappropriate Method of Analysis for more details.
- We can decide on the "level of confidence" we want; that is, we can choose 90%, 99%, etc. rather than 95%. Just which level of confidence is appropriate depends on the circumstances.
- The confidence level determines the percentage of samples for which the procedure results in an interval containing the true parameter.
- However, a higher level of confidence will produce a wider confidence interval -- that is, less precision in our estimate. So there is a trade-off between level of confidence and precision.
- Sometimes the best we can do is a procedure that only gives approximate confidence intervals -- that is, the sampling distribution can be described only approximately.
- If the sampling distribution is not symmetric, we can't expect the confidence interval to be symmetric around the estimate. There may be slightly different procedures for calculating the endpoints of the confidence interval.
- There are variations such as "upper confidence limits" or "lower confidence limits" where we are only interested in estimating how large or how small the estimate might be.
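The distinction above between the population mean µ and the varying sample means Ȳn can be seen in a small simulation. This is a sketch, not part of the original text: the population (normal), its parameters, and the sample size below are made-up illustration values (loosely echoing the "height of an adult American male" example).

```python
# Sketch: simulating the sampling distribution of the sample mean Ȳn.
# MU, SIGMA, and N are hypothetical illustration values, not from the text.
import random
import statistics

random.seed(42)

MU, SIGMA, N = 175.0, 7.0, 50   # hypothetical population mean/SD (cm), sample size n

def sample_mean(n: int) -> float:
    """Pick a simple random sample of size n from Y and calculate its sample mean."""
    return statistics.fmean(random.gauss(MU, SIGMA) for _ in range(n))

# Many independent draws of Ȳn approximate its sampling distribution.
means = [sample_mean(N) for _ in range(10_000)]

print(f"mean of the Ȳn values ≈ {statistics.fmean(means):.2f}  (population µ = {MU})")
print(f"SD of the Ȳn values   ≈ {statistics.stdev(means):.2f}  (theory: σ/√n = {SIGMA / N ** 0.5:.2f})")
```

The printout illustrates the point of the text: each individual sample mean is an estimate of µ, and the spread of those estimates (here close to σ/√n for a normal population) is what the sampling distribution describes.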
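The claim that the 95% confidence-interval *procedure* "works" for about 95% of simple random samples can also be checked by simulation. A minimal sketch, assuming a hypothetical normal population with σ treated as known, so the half-width is a = 1.96·σ/√n; all numeric values are illustrative assumptions, not from the text.

```python
# Sketch: coverage of the 95% confidence-interval procedure for µ.
# Hypothetical population; σ treated as known for simplicity, so a = 1.96·σ/√n.
import random
import statistics

random.seed(1)

MU, SIGMA, N = 175.0, 7.0, 50          # hypothetical population parameters, sample size n
a = 1.96 * SIGMA / N ** 0.5            # half-width so that P(Ȳn − a < µ < Ȳn + a) ≈ 0.95

TRIALS = 10_000
covered = 0
for _ in range(TRIALS):
    # One simple random sample and its interval (ȳ − a, ȳ + a).
    ybar = statistics.fmean(random.gauss(MU, SIGMA) for _ in range(N))
    if ybar - a < MU < ybar + a:       # did this sample's interval contain µ?
        covered += 1

print(f"coverage ≈ {covered / TRIALS:.3f}")   # should come out close to 0.95
```

Note what the loop does and does not show: across many samples, about 95% of the intervals contain µ, but any one computed interval either contains µ or it doesn't, and nothing in the data tells us which case we are in.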