Confidence interval

The vertical lines represent 50 different realizations of a confidence interval for estimating the value of μ.

In statistics, a confidence interval is a pair (or several pairs) of numbers between which an unknown value of a population parameter is estimated to lie, with a certain confidence level. Formally, these numbers determine an interval that is calculated from sample data, and the unknown value is a population parameter. The confidence level is denoted 1 − α and represents the percentage of intervals, built from 100 different independent samples, that actually contain the unknown value. Correspondingly, α is called the random error or significance level, that is, the number of intervals out of 100 that do not contain the value.

The confidence level and the width of the interval vary together: a wider interval has a greater probability of containing the parameter (a higher confidence level), while a narrower interval, which offers a more precise estimate, has a greater probability of error.
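The trade-off between confidence and width can be illustrated with a minimal sketch. It assumes the normal-based interval for a mean with known standard deviation, which is described later in this article. The values sigma = 1.0 and n = 100 and the function name `z_interval_width` are illustrative assumptions, not part of the original text.

```python
from math import sqrt
from statistics import NormalDist


def z_interval_width(sigma, n, confidence):
    """Full width of a two-sided normal-based interval for a mean (sigma known)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # critical value z_{alpha/2}
    return 2 * z * sigma / sqrt(n)


# Illustrative values only: sigma and n are assumptions for this sketch.
widths = {
    c: z_interval_width(sigma=1.0, n=100, confidence=c)
    for c in (0.90, 0.95, 0.99)
}

for c, w in widths.items():
    print(f"confidence {c:.2f} -> width {w:.3f}")
```

With the sample size held fixed, each increase in the confidence level produces a wider interval.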

To construct a confidence interval, it is necessary to know the theoretical distribution followed by the estimator of the parameter θ to be estimated. This distribution is usually normal. Confidence intervals can also be constructed with Chebyshev's inequality.
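As a rough sketch of the Chebyshev approach, assume the population standard deviation σ is known. Chebyshev's inequality gives P(|X̄ − μ| ≥ k·σ/√n) ≤ 1/k², so choosing k = 1/√α yields an interval X̄ ± σ/√(n·α) with confidence of at least 1 − α, whatever the shape of the distribution. The function names and the numeric values below are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def chebyshev_half_width(sigma, n, alpha):
    """Half-width from Chebyshev's inequality; needs no normality assumption."""
    return sigma / sqrt(n * alpha)


def normal_half_width(sigma, n, alpha):
    """Half-width when the sample mean is assumed to be normally distributed."""
    return NormalDist().inv_cdf(1 - alpha / 2) * sigma / sqrt(n)


# Illustrative values: sigma = 2.5, n = 25, alpha = 0.05.
cheb = chebyshev_half_width(sigma=2.5, n=25, alpha=0.05)
norm = normal_half_width(sigma=2.5, n=25, alpha=0.05)
print(f"Chebyshev: +/- {cheb:.3f}, normal: +/- {norm:.3f}")
```

The Chebyshev interval is noticeably wider. That is the price of not assuming any particular distribution.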

Definition

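A $(1 - \alpha) \cdot 100\%$ confidence interval for estimating a population parameter $\theta$, whose estimator follows a certain probability distribution, is an expression of the form $[\theta_1, \theta_2]$ such that $P[\theta_1 \le \theta \le \theta_2] = 1 - \alpha$, where $P$ is the probability under that distribution and the endpoints $\theta_1$ and $\theta_2$ are computed from the sample.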

Examples

Confidence interval of the mean of a population

From a population with mean $\mu$ and standard deviation $\sigma$, samples of $n$ elements can be taken. Each of these samples has its own mean $\bar{x}$. It can be shown that the mean of all the sample means equals the population mean:

$$\mu_{\bar{x}} = \mu$$

Moreover, if the sample size is large enough, or if the population distribution is normal, the distribution of the sample means is approximately normal (Gaussian), with mean $\mu$ and a standard deviation given by the following expression:

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

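This is written as follows, where the second argument is the standard deviation:

$$\bar{X} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$$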

Standardizing, it follows that

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \sim N(0, 1)$$
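The behaviour of the sample means can be checked empirically. The following is a minimal simulation sketch, assuming a normal population with mean 250 and standard deviation 2.5 and samples of size 25. These values, the seed, and the variable names are assumptions chosen for illustration.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

rng = random.Random(0)          # fixed seed so the run is repeatable
mu, sigma, n = 250.0, 2.5, 25   # illustrative population and sample size
num_samples = 10_000

# Draw many samples of size n and keep the mean of each one.
sample_means = [
    fmean(rng.gauss(mu, sigma) for _ in range(n))
    for _ in range(num_samples)
]

mean_of_means = fmean(sample_means)
sd_of_means = pstdev(sample_means)
theoretical_sd = sigma / sqrt(n)

print(f"mean of sample means: {mean_of_means:.3f} (population mean {mu})")
print(f"sd of sample means:   {sd_of_means:.3f} (sigma/sqrt(n) = {theoretical_sd})")
```

The simulated mean of the sample means should land close to the population mean, and their standard deviation close to σ/√n.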

In a distribution $Z \sim N(0, 1)$, an interval containing a given percentage of the observations is easy to calculate; that is, it is simple to find $z_1$ and $z_2$ such that $P[z_1 \le Z \le z_2] = 1 - \alpha$, where $(1 - \alpha) \cdot 100$ is the desired percentage (see the use of tables for a normal distribution).

In this normal distribution of sample means, one can calculate the confidence interval in which the population mean will be found, with a certain confidence, when only one sample mean ($\bar{x}$) is known. Confidence values of 95 and 99 percent are commonly used. This value is called $1 - \alpha$ (because $\alpha$ is the error that will be made, the complementary quantity).

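To do this it is necessary to calculate the point $X_{\alpha/2}$ or, more precisely, its standardized version $z_{\alpha/2}$, the critical value, together with its opposite in the distribution, $X_{-\alpha/2}$. These points delimit the probability $1 - \alpha$ of the interval.

This point is the number such that:

$$P[\bar{x} \ge X_{\alpha/2}] = P[z \ge z_{\alpha/2}] = \frac{\alpha}{2}$$

In the standardized version it holds that:

$$z_{-\alpha/2} = -z_{\alpha/2}$$

Thus:

$$P\left[\bar{x} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\right] = 1 - \alpha$$

from which the confidence interval is obtained:

$$\left(\bar{x} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}},\; \bar{x} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\right)$$

Note that the confidence interval is given by the sample mean $\bar{x}$ ± the product of the critical value $z_{\alpha/2}$ and the standard error $\sigma / \sqrt{n}$.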

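If $\sigma$ is not known and $n$ is large (usually $n \ge 30$ is taken), the sample standard deviation $s$ is used in its place:

$$\left(\bar{x} - z_{\alpha/2} \frac{s}{\sqrt{n}},\; \bar{x} + z_{\alpha/2} \frac{s}{\sqrt{n}}\right)$$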

Approximate values of $z_{\alpha/2}$ for the standard confidence levels are 1.96 for $1 - \alpha = 95\%$ and 2.576 for $1 - \alpha = 99\%$.
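The critical values quoted above can be reproduced with the inverse cumulative distribution function of the standard normal. This is a small sketch using Python's standard library; the helper name `z_critical` is an assumption made for illustration.

```python
from statistics import NormalDist


def z_critical(confidence):
    """Two-sided critical value z_{alpha/2} for a confidence level 1 - alpha."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


z95 = z_critical(0.95)
z99 = z_critical(0.99)
print(round(z95, 3), round(z99, 3))  # prints: 1.96 2.576

# Sanity check: the central area between -z and z equals the confidence level.
central_area = NormalDist().cdf(z95) - NormalDist().cdf(-z95)
```

The same helper gives the critical value for any other confidence level, for example `z_critical(0.90)`.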

Confidence interval of a proportion

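The confidence interval for estimating a proportion $p$, given the sample proportion $\hat{p}$ of a sample of size $n$, at a confidence level of $(1 - \alpha) \cdot 100\%$ is:

$$\left(\hat{p} - z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}},\; \hat{p} + z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}\right)$$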

The Central Limit Theorem and the normal approximation to the binomial distribution are used in the derivation of these formulas.
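A minimal sketch of the interval for a proportion follows, based on the normal approximation mentioned above. The function name and the counts (120 successes in a sample of 400) are hypothetical values chosen for illustration. The approximation is only reasonable when the sample is large enough.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(successes, n, confidence=0.95):
    """Normal-approximation interval for a population proportion."""
    p_hat = successes / n                               # sample proportion
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # critical value
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Hypothetical data: 120 successes observed in a sample of 400.
low, high = proportion_confidence_interval(successes=120, n=400)
print(f"95% interval for p: ({low:.3f}, {high:.3f})")
```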

Practical example

A factory assembly line fills ice cream cups to the desired 250 g ± 2.5 g.

A machine fills cups with ice cream and is supposed to be set to pour 250 g. Since the machine cannot fill each cup with exactly 250 g, the content added to each individual cup varies and is modeled as a random variable X. This variable is assumed to be normally distributed around the desired mean of 250 g, with a standard deviation of 2.5 g.

To determine whether the machine is properly calibrated, a random sample of n = 25 cups of ice cream is taken and weighed. The resulting measurements are X1, ..., X25, a random sample from X.

For μ, it is sufficient to give an estimate. The appropriate estimator is the sample mean:

$$\hat{\mu} = \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$

The sample gives the actual weights x1, ..., x25, with mean:

$$\bar{x} = \frac{1}{25} \sum_{i=1}^{25} x_i = 250.2\ \text{g}$$

If another sample of 25 cups were taken, the sample mean could just as well take values such as 250.4 or 251.1 grams. A sample mean of 280 grams, on the other hand, would be extremely exceptional if the mean content of the cups is in fact close to 250 grams. There is an interval around the observed sample mean of 250.2 grams such that, if the population mean actually lies within it, the observed data would not be considered particularly unusual. Such an interval is called a confidence interval for the parameter μ.

How is such an interval calculated? Its endpoints must be computed from the sample, so they are statistics, functions of the sample X1, ..., X25, and therefore random variables themselves.

In this case, the endpoints are determined from the sample mean $\bar{X}$, which, because the sample comes from a normal distribution, is also normally distributed with the same expectation μ but with a standard error of:

$$\frac{\sigma}{\sqrt{n}} = \frac{2.5\ \text{g}}{\sqrt{25}} = 0.5\ \text{g}$$

Standardizing gives the random variable:

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{\bar{X} - \mu}{0.5}$$

which depends on the parameter μ to be estimated, but whose standard normal distribution does not depend on μ. It is therefore possible to find numbers −z and z, independent of μ, between which Z lies with probability 1 − α, a measure of how confident we want to be.

Take 1 − α = 0.95, for example. Then:

$$P(-z \le Z \le z) = 1 - \alpha = 0.95$$

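The number z is obtained from a cumulative distribution function, in this case the standard normal cumulative distribution function Φ:

$$\Phi(z) = P(Z \le z) = 1 - \frac{\alpha}{2} = 0.975$$

$$z = \Phi^{-1}(0.975) = 1.96$$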

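and we get:

$$0.95 = 1 - \alpha = P(-z \le Z \le z) = P\left(-1.96 \le \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \le 1.96\right) = P\left(\bar{X} - 1.96 \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 1.96 \frac{\sigma}{\sqrt{n}}\right)$$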

In other words, the lower limit of a 95% confidence interval is:

$$\bar{X} - 1.96 \frac{\sigma}{\sqrt{n}}$$

and the upper limit of the interval is:

$$\bar{X} + 1.96 \frac{\sigma}{\sqrt{n}}$$

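With the values in this example, the confidence interval is:

$$0.95 = P(\bar{X} - 1.96 \times 0.5 \le \mu \le \bar{X} + 1.96 \times 0.5) = P(\bar{X} - 0.98 \le \mu \le \bar{X} + 0.98)$$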

This can be interpreted as follows: with probability 0.95, the procedure yields a confidence interval for which the parameter μ lies between the stochastic limits

$$\bar{X} - 0.98$$

and

$$\bar{X} + 0.98$$

This does not imply that there is a probability of 0.95 that the parameter μ lies in the interval obtained from the value actually observed for the sample mean.

Each time the measurements are repeated, they give another value for the sample mean $\bar{X}$. In 95% of cases μ will lie between the limits calculated from that mean, but in 5% of cases it will not. The realized confidence interval is obtained by substituting the measured ice cream masses into the formula. The resulting 0.95 confidence interval is:

$$(\bar{x} - 0.98;\ \bar{x} + 0.98) = (250.2 - 0.98;\ 250.2 + 0.98) = (249.22;\ 251.18)$$

The vertical segments represent 50 realizations of a confidence interval for μ.

In other words, the 95% confidence interval is between the lower limit of 249.22 g and the upper limit of 251.18 g.

Since the desired value of μ, 250 g, lies within the resulting confidence interval, there is no reason to believe that the machine is incorrectly calibrated.
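The arithmetic of this example can be reproduced with a short sketch. It uses the values stated above (sample mean 250.2 g, σ = 2.5 g, n = 25, confidence 0.95). The function name `mean_confidence_interval` is an assumption introduced for illustration.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Interval for the mean when sigma is known: mean +/- z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    half_width = z * sigma / sqrt(n)                    # margin of error
    return sample_mean - half_width, sample_mean + half_width


lower, upper = mean_confidence_interval(sample_mean=250.2, sigma=2.5, n=25)
print(f"{lower:.2f} g to {upper:.2f} g")  # prints: 249.22 g to 251.18 g

# The target fill weight lies inside the interval.
target_inside = lower <= 250 <= upper
print(target_inside)  # prints: True
```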

The calculated interval has fixed limits, which may or may not contain μ. This event therefore has probability 0 or 1, and it is not possible to say: "with probability (1 − α) the parameter μ is in the confidence interval." We only know that, under repetition, μ will be in the calculated interval in 100(1 − α)% of cases. In 100α% of cases, however, this does not happen, and unfortunately it is not known in which cases. That is why we can say: "with confidence level 100(1 − α)%, μ is in the confidence interval."

The maximum error is 0.98 g: it is the half-width of the interval, that is, the distance from the sample mean to either the upper or the lower limit.

The figure illustrates 50 realizations of a confidence interval for a given population mean μ. If one realization is selected at random, there is a 95% probability of having chosen an interval that contains the parameter. However, the selected interval may unfortunately be one of those that does not.
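The repeated-sampling interpretation can be checked with a simulation. The sketch below assumes the machine is in fact correctly calibrated (true mean 250 g, σ = 2.5 g) and draws many samples of 25 cups. The function name `coverage`, the seed, and the number of repetitions are assumptions chosen for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def coverage(mu, sigma, n, confidence, repetitions, seed=0):
    """Fraction of simulated intervals that contain the true mean mu."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma / sqrt(n)
    hits = 0
    for _ in range(repetitions):
        # One new sample of n cups and its sample mean.
        x_bar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if x_bar - half_width <= mu <= x_bar + half_width:
            hits += 1
    return hits / repetitions


rate = coverage(mu=250.0, sigma=2.5, n=25, confidence=0.95, repetitions=20_000)
print(f"fraction of intervals containing mu: {rate:.3f}")
```

The reported fraction should be close to 0.95. Any single interval either contains μ or does not, and only the long-run proportion matches the confidence level.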
