Standard error
The standard error is the standard deviation of the sampling distribution of a sample statistic. The term also refers to an estimate of the standard deviation, derived from a particular sample used to compute the estimate.
Concept
The sample mean is the usual estimator of a population mean. However, different samples drawn from the same population will in general give different values of the sample mean. The standard error of the mean (that is, the error due to estimating the population mean from a sample mean) is the standard deviation of the sample means over all possible samples (of a given size) drawn from that population. The term can also refer to an estimate of that standard deviation, calculated from the sample of data being analyzed.
In practical applications, the true value of the standard deviation (of the error) is generally unknown. As a result, the term "standard error" is often used to refer to an estimate of this unknown quantity. In such cases it is important to be clear about how the value was calculated and to keep in mind that the standard error is only an estimate. Unfortunately, this is not always possible, and it may then be better to use an approach that avoids the standard error, for example maximum likelihood estimation or a more formal approach to deriving confidence intervals. A well-known case where the estimation can be properly allowed for is the use of Student's t-distribution to provide a confidence interval for an estimated mean or difference of means. In other cases, the standard error can be used to provide an indication of the size of the uncertainty, but its formal or semi-formal use to provide confidence intervals or tests should be avoided unless the sample size is at least moderately large. Here what counts as "large" will depend on the particular quantities being analyzed.
In regression analysis, the term standard error (or typical error) is also used for a summary measure of the differences between the least-squares estimates and the observed sample values.
Standard error of the mean
Population
The standard error of the mean (SEM) can be expressed as:
$$\mathrm{SE}_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
where $\sigma$ is the population standard deviation and $n$ is the size (number of observations) of the sample.
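As a small worked illustration of the relation SE = σ/√n, the sketch below evaluates it for a few sample sizes. The function name `sem_from_population_sd` and the value σ = 10 are illustrative assumptions, not part of the original text.

```python
import math


def sem_from_population_sd(sigma, n):
    """Standard error of the mean when the population SD sigma is known."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return sigma / math.sqrt(n)


# Quadrupling the sample size halves the standard error.
for n in (25, 100, 400):
    print(n, sem_from_population_sd(10.0, n))
```

With σ = 10 the loop prints 2.0, 1.0 and 0.5, which shows that precision improves only with the square root of the sample size.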
Estimate
Because the population standard deviation is rarely known, the standard error of the mean is usually estimated as the sample standard deviation divided by the square root of the sample size (assuming statistical independence of the sample values):
$$\mathrm{SE}_{\bar{x}} \approx \frac{s}{\sqrt{n}}$$
where $s$ is the sample standard deviation (i.e., the estimate of the population standard deviation based on the sample), and $n$ is the size (number of observations) of the sample.
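A minimal sketch of this estimate, using only the Python standard library. The function name `standard_error_of_mean` and the data values are made up for illustration; `statistics.stdev` computes the sample standard deviation with the n − 1 divisor.

```python
import math
import statistics


def standard_error_of_mean(sample):
    """Estimate the SEM as s / sqrt(n), assuming independent observations."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations to estimate s")
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    return s / math.sqrt(n)


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # illustrative values
sem = standard_error_of_mean(data)
print(round(sem, 3))
```

For these eight values the sample mean is 5, the sum of squared deviations is 32, and the estimated standard error is about 0.756.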
The formula for the standard error of the mean can be arrived at from what we already know about the variance of the sum of independent random variables.
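- If $x_1, x_2, \ldots, x_n$ are $n$ independent observations from a population that has a mean $\mu$ and a standard deviation $\sigma$, then the variance of the total $T = x_1 + x_2 + \cdots + x_n$ is $n\sigma^2$.
- The variance of $T/n$ must be $\frac{1}{n^2}\, n\sigma^2 = \frac{\sigma^2}{n}$.
- And so the standard deviation of $T/n$ will be $\frac{\sigma}{\sqrt{n}}$.
- Of course, $T/n$ is the sample mean $\bar{x}$.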
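The steps of this derivation can be checked numerically. The following is a rough simulation sketch, assuming normally distributed observations and arbitrarily chosen values of the mean, standard deviation and sample size; the derivation itself requires only independence and finite variance.

```python
import random
import statistics


def simulate_totals(n=10, mu=5.0, sigma=3.0, trials=20000, seed=1):
    """Draw `trials` samples of size n and return the total of each sample."""
    rng = random.Random(seed)
    return [sum(rng.gauss(mu, sigma) for _ in range(n)) for _ in range(trials)]


n, sigma = 10, 3.0
totals = simulate_totals(n=n, sigma=sigma)
means = [t / n for t in totals]

var_total = statistics.pvariance(totals)  # should be near n * sigma**2 = 90
var_mean = statistics.pvariance(means)    # should be near sigma**2 / n = 0.9
sd_mean = statistics.pstdev(means)        # should be near sigma / sqrt(n)

print(var_total, var_mean, sd_mean)
```

The simulated values fluctuate from run to run (and with the seed), but they should lie close to 90, 0.9 and 0.95 respectively.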
Note: For small samples, the sample standard deviation and the estimated standard error tend to systematically underestimate the population standard deviation and the true standard error: the estimated standard error of the mean is a biased estimator of the population standard error. With n = 2 the underestimation can be about 25%, but for n = 6 it is only about 5%.
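One way to see where figures of this size come from is the standard correction factor c4(n), which gives the expected value of the sample standard deviation as a fraction of σ. The sketch below assumes normally distributed data, an assumption the note itself does not state; under it, E[s] = c4(n)·σ, and the quantity 1/c4(n) − 1 is the relative correction needed to remove the bias.

```python
import math


def c4(n):
    """Expected value of s / sigma for a normal sample of size n."""
    if n < 2:
        raise ValueError("n must be at least 2")
    return math.sqrt(2.0 / (n - 1)) * math.gamma(n / 2) / math.gamma((n - 1) / 2)


for n in (2, 6, 30):
    factor = c4(n)
    # factor: expected s as a fraction of sigma
    # 1 / factor - 1: relative correction needed to undo the bias
    print(n, round(factor, 4), round(1 / factor - 1, 4))
```

Under this assumption the correction is roughly 25% for n = 2 and roughly 5% for n = 6, and it shrinks quickly as n grows.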
Assumptions and use
Assuming that the data used are normally distributed, the quantiles of the normal distribution, the sample mean, and the standard error can be used to calculate approximate confidence intervals for the mean. The following expressions can be used to calculate the upper and lower 95% confidence limits, where $\bar{x}$ equals the sample mean, $\mathrm{SE}$ equals the standard error for the sample mean, and 1.96 is the 0.975 quantile of the normal distribution:
Upper 95% limit = $\bar{x} + (\mathrm{SE} \times 1.96)$
Lower 95% limit = $\bar{x} - (\mathrm{SE} \times 1.96)$
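The 95% limits can be computed as in the following sketch. The function name `normal_ci_95` and the data are illustrative assumptions; the 0.975 quantile is taken from `statistics.NormalDist` (Python 3.8+) rather than hard-coded as 1.96.

```python
import math
import statistics


def normal_ci_95(sample):
    """Approximate 95% confidence limits for the mean (normal quantiles)."""
    n = len(sample)
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)   # estimated standard error
    z = statistics.NormalDist().inv_cdf(0.975)     # about 1.96
    return mean - z * se, mean + z * se


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # illustrative values
lower, upper = normal_ci_95(data)
print(round(lower, 2), round(upper, 2))
```

For a sample this small, an interval based on Student's t-distribution would be wider and is usually the safer choice; the normal quantile is used here only to mirror the expressions above.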
In particular, the standard error of a sample statistic (such as the sample mean) is the estimated standard deviation of the error in the process by which it was generated. In other words, it is the standard deviation of the sampling distribution of the sample statistic. The notation for standard error can be SE, SEM (for standard error of measurement or of the mean), or $S_E$.
Standard errors summarize the uncertainty of sample measurements in a single value, and are often used because:
- If the standard errors of several individual quantities are known, then the standard error of some mathematical function of those quantities can be easily calculated in many cases.
- Where the probability distribution of the value is known, it can be used to compute a good approximation of an exact confidence interval.
- Where the probability distribution is unknown, relations such as the Chebyshev inequality or the Vysochanskiï–Petunin inequality can be used to calculate conservative confidence intervals.
- As the sample size approaches infinity, the central limit theorem guarantees that the distribution of the sample mean is asymptotically normal.
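To illustrate the conservative intervals mentioned in the list, the sketch below computes how many standard errors are needed on each side of the estimate for a given confidence level. It uses the Chebyshev bound P(|X − μ| ≥ kσ) ≤ 1/k² and the Vysochanskiï–Petunin bound P(|X − μ| ≥ kσ) ≤ 4/(9k²), the latter valid for unimodal distributions and k > √(8/3). The function names are hypothetical.

```python
import math


def chebyshev_multiplier(confidence):
    """k such that 1/k**2 equals the allowed tail probability."""
    alpha = 1.0 - confidence
    return math.sqrt(1.0 / alpha)


def vysochanskij_petunin_multiplier(confidence):
    """k such that 4/(9 k**2) equals the allowed tail probability.

    Assumes a unimodal distribution; the bound holds only for k > sqrt(8/3).
    """
    alpha = 1.0 - confidence
    k = math.sqrt(4.0 / (9.0 * alpha))
    if k <= math.sqrt(8.0 / 3.0):
        raise ValueError("bound is not valid at this confidence level")
    return k


k_cheb = chebyshev_multiplier(0.95)
k_vp = vysochanskij_petunin_multiplier(0.95)
print(round(k_cheb, 3), round(k_vp, 3))
```

At 95% confidence the multipliers are about 4.47 and 2.98, compared with 1.96 under normality, which shows the price paid for making no distributional assumption.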
Standard error of the regression
The standard error of the regression is a value that summarizes the differences between the actual values and the values estimated by a regression. It is used to assess how closely the regression fits the measured values. Many authors prefer this figure to others, such as the linear correlation coefficient, because the standard error is measured in the same units as the values being studied. For a simple linear regression, where two parameters are fitted, the formula is:
$$S_e = \sqrt{\frac{\sum_{i=1}^{n} (\hat{y}_i - y_i)^2}{n - 2}}$$
where:
- $\hat{y}_i$ are the estimated values
- $y_i$ are the measured values
- $n$ is the size of the sample
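A minimal sketch of this calculation for a straight-line fit follows. It assumes simple linear regression with an intercept and slope, so the residual sum of squares is divided by n − 2; other models would use a different divisor. The function names and data are illustrative.

```python
import math
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = a + b * x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def regression_standard_error(xs, ys):
    """sqrt(sum of squared residuals / (n - 2)) for a fitted straight line."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least three points")
    slope, intercept = fit_line(xs, ys)
    ssr = sum((intercept + slope * x - y) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(ssr / (n - 2))


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]  # illustrative measurements
se_reg = regression_standard_error(xs, ys)
print(round(se_reg, 3))
```

The result, about 0.189 here, is in the same units as the measured values, which is the property the text highlights.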