Variance


In probability theory, the variance (usually represented as σ²) of a random variable is a measure of dispersion defined as the expected value of the square of the deviation of the variable from its mean. Its unit of measure is the square of the unit of measure of the variable: for example, if the variable measures a distance in metres, the variance is expressed in square metres. The variance has a minimum value of 0. The standard deviation (the positive square root of the variance) is an alternative measure of dispersion, expressed in the same units as the data of the variable under study.

Note that the variance can be greatly influenced by outliers and its use is not recommended when the distributions of random variables have heavy tails. In such cases, the use of other more robust measures of dispersion is recommended.

The term variance was coined by Ronald Fisher in an article published in 1918 under the title The Correlation Between Relatives on the Supposition of Mendelian Inheritance.

Below we review the formulas. Keep in mind that the variance formula for a population (σ²) differs from the variance formula for a sample (s²). Before looking at the formulas, note that the variance is very important in statistics: although it is a simple measure, it can provide a great deal of information about a given variable.

Formula to calculate the variance

The unit of measurement of the variance is always the unit of measurement of the data, raised to the square. The variance is always greater than or equal to zero: since the deviations are squared, it is mathematically impossible for the variance to be negative.

Let X be a random variable with mean μ = E(X). The variance of the random variable X, denoted by Var(X), σ_X² or simply σ², is defined as

\operatorname{Var}(X) = \operatorname{E}\left[(X - \mu)^2\right]

Developing the previous definition, the following alternative (and equivalent) definition is obtained:

\operatorname{Var}(X) = \operatorname{E}\left[(X - \mu)^2\right] = \operatorname{E}\left[X^2 - 2X\mu + \mu^2\right] = \operatorname{E}[X^2] - 2\mu\operatorname{E}[X] + \mu^2 = \operatorname{E}[X^2] - 2\mu^2 + \mu^2 = \operatorname{E}[X^2] - \mu^2 = \operatorname{E}[X^2] - \operatorname{E}[X]^2

If a distribution has no expected value, as is the case for the Cauchy distribution, then it has no variance either. There are other distributions that have an expected value but lack a variance; an example is the Pareto distribution when its index k satisfies 1 < k ≤ 2.

Continuous case

If the random variable X is continuous with density function f(x), then

\operatorname{Var}(X) = \int_{R_X} (x - \mu)^2 f(x)\,dx

where

\mu = \operatorname{E}[X] = \int_{R_X} x f(x)\,dx

and the integrals are taken over the support of the random variable X, that is, R_X.
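
As an illustration of these two integrals (not part of the original text), the following sketch computes the mean and variance of a continuous distribution by numerical integration with SciPy, using the uniform distribution on [0, 1] as an assumed example; its variance should come out to 1/12 ≈ 0.0833.

    from scipy.integrate import quad

    # Density of the assumed example: uniform distribution on [0, 1]
    def f(x):
        return 1.0 if 0.0 <= x <= 1.0 else 0.0

    # Mean: integral of x * f(x) over the support
    mu, _ = quad(lambda x: x * f(x), 0.0, 1.0)

    # Variance: integral of (x - mu)^2 * f(x) over the support
    var, _ = quad(lambda x: (x - mu) ** 2 * f(x), 0.0, 1.0)

    print(mu)   # 0.5
    print(var)  # 0.0833... = 1/12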

Discrete case

If the random variable X is discrete with probability function P[X = x], then

\operatorname{Var}(X) = \sum_{x \in R_X} (x - \mu)^2 \operatorname{P}[X = x]

where

\mu = \operatorname{E}[X] = \sum_{x \in R_X} x \operatorname{P}[X = x]
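
A minimal sketch of these two sums in code (the probability function used here is an assumed example, not from the original text):

    # Assumed example: a discrete random variable with support {1, 2, 3}
    pmf = {1: 0.2, 2: 0.5, 3: 0.3}

    # mu = sum over the support of x * P[X = x]
    mu = sum(x * p for x, p in pmf.items())

    # Var(X) = sum over the support of (x - mu)^2 * P[X = x]
    var = sum((x - mu) ** 2 * p for x, p in pmf.items())

    print(mu, var)   # 2.1 0.49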

Properties

Let X and Y be two random variables with finite variance and let a ∈ ℝ. Then:

  1. Var(X) ≥ 0
  2. Var(a) = 0
  3. Var(aX) = a² Var(X)
  4. Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y), where Cov(X, Y) denotes the covariance of X and Y
  5. Var(X + Y) = Var(X) + Var(Y) if X and Y are independent random variables
  6. Var(Y) = E(Var(Y | X)) + Var(E(Y | X)), the law of total variance (sometimes described as a Pythagorean decomposition of the variance), where Y | X denotes the conditional random variable Y given X.
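
The following simulation sketch (an illustration, not part of the original text) checks properties 3 and 5 empirically with NumPy; the sample variances only approximate the theoretical identities.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(loc=2.0, scale=3.0, size=100_000)   # Var(X) ≈ 9
    y = rng.normal(loc=0.0, scale=1.0, size=100_000)   # independent of X, Var(Y) ≈ 1
    a = 5.0

    # Property 3: Var(aX) = a^2 Var(X)
    print(np.var(a * x), a ** 2 * np.var(x))

    # Property 5: Var(X + Y) = Var(X) + Var(Y) for independent X and Y
    print(np.var(x + y), np.var(x) + np.var(y))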

Examples

By tossing a coin we could get Heads or Tails.

Let us assign the values Heads = 0 and Tails = 1, so that we have a random variable "X":

Using mathematical notation:

X = {0, 1}

Note: we could have chosen Heads = 100 and Tails = 150, or any other values; it is our choice. So:

  • We have an experiment (such as tossing a coin)
  • We assign a value to each outcome
  • The set of values forms the random variable
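
For this coin example with values {0, 1} and a fair coin, the definition gives μ = 0.5 and Var(X) = 0.25. A quick simulation sketch (illustrative only, assuming equal probabilities for both outcomes):

    import numpy as np

    rng = np.random.default_rng(1)
    tosses = rng.integers(0, 2, size=100_000)   # 0 = Heads, 1 = Tails, fair coin
    print(tosses.mean(), tosses.var())          # ≈ 0.5 and ≈ 0.25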

Exponential distribution

If a continuous random variable X has an exponential distribution with parameter λ, then its density function is given by

f_X(x) = \lambda e^{-\lambda x}

for x ≥ 0.

It is not hard to see that the mean of X is E[X] = 1/λ, so to find its variance we compute

\operatorname{Var}(X) = \int_{0}^{\infty} \left(x - \frac{1}{\lambda}\right)^2 \lambda e^{-\lambda x}\,dx

After integrating it can be concluded that

\operatorname{Var}(X) = \frac{1}{\lambda^2}
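
As a sanity check (an illustration assuming λ = 2, not part of the original text), the integral above can be evaluated numerically and compared with 1/λ² = 0.25:

    import math
    from scipy.integrate import quad

    lam = 2.0  # assumed rate parameter

    # Numerically evaluate the variance integral of the exponential density
    var, _ = quad(lambda x: (x - 1.0 / lam) ** 2 * lam * math.exp(-lam * x), 0.0, math.inf)
    print(var, 1.0 / lam ** 2)   # both ≈ 0.25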

Fair die

A six-sided die can be represented as a discrete random variable that takes values from 1 to 6 with probability equal to 1/6. The expected value is (1+2+3+4+5+6)/6 = 3.5. Therefore, its variance is:

\sum_{i=1}^{6} \tfrac{1}{6}(i - 3.5)^2 = \tfrac{1}{6}\left((-2.5)^2 + (-1.5)^2 + (-0.5)^2 + 0.5^2 + 1.5^2 + 2.5^2\right) = \tfrac{1}{6} \cdot 17.5 = \tfrac{35}{12} \approx 2.92.
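
The same computation in a couple of lines of Python (illustrative only):

    values = range(1, 7)
    mu = sum(values) / 6                         # 3.5
    var = sum((i - mu) ** 2 for i in values) / 6
    print(var)                                   # 2.9166... = 35/12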

Sample variance

In many situations it is necessary to estimate the population variance from a sample. If a sample with replacement (x₁, x₂, …, xₙ) of size n is taken, then among all possible estimators of the variance of the underlying population, two are in common use.

The first one is

s_n^2 = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2

which can be written as

s_n^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2

since

s_n^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{1}{n}\sum_{i=1}^{n}\left(x_i^2 - 2x_i\bar{x} + \bar{x}^2\right) = \frac{1}{n}\sum_{i=1}^{n}x_i^2 - \frac{2\bar{x}}{n}\sum_{i=1}^{n}x_i + \bar{x}^2\,\frac{1}{n}\sum_{i=1}^{n}1 = \frac{1}{n}\sum_{i=1}^{n}x_i^2 - 2\bar{x}^2 + \bar{x}^2 = \frac{1}{n}\sum_{i=1}^{n}x_i^2 - \bar{x}^2

and the second one is

s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2

which can be written as

s^2 = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n-1}

since

s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i^2 - 2x_i\bar{x} + \bar{x}^2\right) = \frac{1}{n-1}\sum_{i=1}^{n}x_i^2 - \frac{2\bar{x}}{n-1}\sum_{i=1}^{n}x_i + \frac{\bar{x}^2}{n-1}\sum_{i=1}^{n}1 = \frac{1}{n-1}\sum_{i=1}^{n}x_i^2 - \frac{2\bar{x}n}{n-1}\cdot\frac{1}{n}\sum_{i=1}^{n}x_i + \frac{\bar{x}^2 n}{n-1} = \frac{1}{n-1}\sum_{i=1}^{n}x_i^2 - \frac{2\bar{x}^2 n}{n-1} + \frac{\bar{x}^2 n}{n-1} = \frac{1}{n-1}\sum_{i=1}^{n}x_i^2 - \frac{\bar{x}^2 n}{n-1} = \frac{\sum_{i=1}^{n}x_i^2 - n\bar{x}^2}{n-1}

Both are called sample variance; they differ slightly and, for large values of n, the difference is irrelevant. The first transfers the variance of the sample directly to that of the population, while the second is an unbiased estimator of the population variance, because

\operatorname{E}[s^2] = \operatorname{E}\left[\frac{1}{n-1}\sum_{i=1}^{n}x_i^2 - \frac{n}{n-1}\bar{x}^2\right] = \frac{1}{n-1}\left(\sum_{i=1}^{n}\operatorname{E}[x_i^2] - n\operatorname{E}[\bar{x}^2]\right) = \frac{1}{n-1}\left(n\operatorname{E}[x_1^2] - n\operatorname{E}[\bar{x}^2]\right) = \frac{n}{n-1}\left(\operatorname{Var}(x_1) + \operatorname{E}[x_1]^2 - \operatorname{Var}(\bar{x}) - \operatorname{E}[\bar{x}]^2\right) = \frac{n}{n-1}\left(\operatorname{Var}(x_1) + \mu^2 - \frac{1}{n}\operatorname{Var}(x_1) - \mu^2\right) = \frac{n}{n-1}\left(\frac{n-1}{n}\operatorname{Var}(x_1)\right) = \operatorname{Var}(x_1) = \sigma^2

while

\operatorname{E}[s_n^2] = \frac{n-1}{n}\sigma^2
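
In practice, both estimators are available in standard libraries; in NumPy, for example, the ddof argument of numpy.var selects the divisor (the data set below is an assumed example, not from the original text):

    import numpy as np

    sample = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

    s_n2 = np.var(sample, ddof=0)   # divides by n     (biased estimator, s_n^2)
    s2   = np.var(sample, ddof=1)   # divides by n - 1 (unbiased estimator, s^2)
    print(s_n2, s2)                 # 4.0 and 4.571...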

Properties of the sample variance

As a result of the equality E(s²) = σ², s² is an unbiased estimator of σ². Moreover, if the conditions required by the law of large numbers are satisfied, s² is a consistent estimator of σ².

Moreover, when the samples follow a normal distribution, by Cochran's theorem the scaled sample variance has a chi-squared distribution:

n\frac{s_n^2}{\sigma^2} = (n-1)\frac{s^2}{\sigma^2} \sim \chi_{n-1}^2.
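
A simulation sketch (illustrative; assumes normal data with σ = 1) to check that (n − 1)s²/σ² behaves like a χ² variable with n − 1 degrees of freedom, whose mean is n − 1 and whose variance is 2(n − 1):

    import numpy as np

    rng = np.random.default_rng(2)
    n, sigma, reps = 10, 1.0, 50_000

    samples = rng.normal(0.0, sigma, size=(reps, n))
    s2 = samples.var(axis=1, ddof=1)          # unbiased sample variance per replicate
    stat = (n - 1) * s2 / sigma ** 2

    print(stat.mean(), stat.var())            # ≈ 9 and ≈ 18, as expected for chi-square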

Interpretations of the sample variance

We give three equivalent formulas for the computation of the sample variance s_n²:

s_n^2 = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2 = \left(\frac{1}{n}\sum_{i=1}^{n}y_i^2\right) - \bar{y}^2 = \frac{1}{n^2}\sum_{i<j}\left(y_i - y_j\right)^2

The last equality is of interest for interpreting the estimators s² and s_n². If one wants to evaluate how much the data deviate from one another, one can compute the average of the squares of the differences between each pair of data points:

2s_n^2 = \frac{\sum_{i \leq n,\, j \leq n}\left(y_i - y_j\right)^2}{n^2}. Note that the number of summands is n².

Alternatively, one can consider the average of the squares of the differences between each pair of data points without pairing each data point with itself; the number of summands is then n(n − 1):

2s^2 = \frac{\sum_{i \neq j}\left(y_i - y_j\right)^2}{n(n-1)}
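
A brief check of both pairwise-difference identities on an arbitrary (assumed) data set; pairs with i = j contribute zero, so summing over all ordered pairs gives the same total as summing over i ≠ j:

    import numpy as np
    from itertools import product

    y = np.array([1.0, 3.0, 4.0, 8.0])
    n = len(y)

    pair_sum = sum((yi - yj) ** 2 for yi, yj in product(y, y))   # n^2 ordered pairs
    print(pair_sum / n ** 2, 2 * np.var(y, ddof=0))              # both equal 2 * s_n^2
    print(pair_sum / (n * (n - 1)), 2 * np.var(y, ddof=1))       # both equal 2 * s^2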

Some applications of variance

The statistical applications of the variance concept are countless. The following are just some of the main ones:

  • Efficient estimators. These are estimators whose expected value is the true value of the parameter and which, in addition, have minimum variance. In this way we minimise the risk that what we extract from a sample deviates too far from the true value of the parameter.
  • Consistent estimators. These are estimators whose variance tends to zero as the sample size grows. Therefore, with large samples the estimate tends to deviate very little from the true value.
  • In the normal distribution, the variance (through its square root, the standard deviation) is one of the parameters. The Gaussian bell curve becomes taller and narrower as the variance decreases.
  • In regression models, we speak of homoscedasticity when the variance of the error is constant across the observations. For example, in a simple regression we see a cloud of points in which the dispersion of the points around the fitted line or curve remains constant.
  • Analysis of variance (ANOVA) makes it possible to compare different groups and to see which factors influence them.
  • Chebyshev's inequality allows us to bound the probability that a random variable deviates from its expected value by more than a given multiple of its standard deviation (the square root of the variance); a short worked example follows this list.
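
For instance, taking k = 2 in Chebyshev's inequality gives P(|X − μ| ≥ 2σ) ≤ 1/4 for any distribution with finite variance. The sketch below (illustrative, assuming an exponential distribution) compares this bound with a simulated frequency:

    import numpy as np

    rng = np.random.default_rng(3)
    x = rng.exponential(scale=1.0, size=100_000)   # mu = 1, sigma = 1 for this distribution

    k = 2.0
    mu, sigma = x.mean(), x.std()
    freq = np.mean(np.abs(x - mu) >= k * sigma)

    print(freq, 1.0 / k ** 2)   # observed frequency stays below the Chebyshev bound 0.25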

Conclusion

In the analysis of variance, the significant differences between two or more means of a sample are studied. This analysis, commonly known as ANOVA, also allows us to determine whether these means come from the same population (for example, the total set of employees of a company), or whether the means of two populations are equal.

On the other hand, both the variance and the standard deviation are very sensitive to outliers, that is, values that are very far from the mean or very different from it.

So that these measures are not so strongly affected, such outliers can be excluded from the analyses and calculations, or other measures of dispersion that are more useful in these cases can be used instead.

When analysing the risk of an investment, two important aspects are taken into account: the return obtained and the return expected from the investment made. As already mentioned, the variance can be used to analyse this risk.
