Χ² distribution
In probability theory and statistics, the chi-squared distribution (also called Pearson's distribution or $\chi^2$ distribution) with $k \in \mathbb{N}$ degrees of freedom is the distribution of the sum of the squares of $k$ independent random variables with standard normal distribution. The chi-squared distribution is a special case of the gamma distribution and is one of the most widely used probability distributions in statistical inference, mainly in hypothesis testing and in the construction of confidence intervals.
Definition
As a sum of squared standard normals
Let $Z_1, \dots, Z_k$ be independent random variables such that $Z_i \sim N(0,1)$ for $i = 1, 2, \dots, k$; then the random variable $X$ defined by
- $X = Z_1^2 + Z_2^2 + \cdots + Z_k^2 = \sum_{i=1}^{k} Z_i^2$
has a chi-squared distribution with $k$ degrees of freedom.
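As an illustrative sketch (not part of the original text), the definition can be checked by simulation: summing the squares of $k$ independent standard normals should reproduce the mean $k$ and variance $2k$ of $\chi_k^2$. The choice of $k$, the sample size, and the seed below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed
k, n_samples = 5, 100_000        # illustrative values

# Each row holds k independent N(0, 1) draws; summing their squares
# gives one realization of X = Z_1^2 + ... + Z_k^2.
z = rng.standard_normal((n_samples, k))
x = (z ** 2).sum(axis=1)

# For a chi-squared distribution with k degrees of freedom the mean is k
# and the variance is 2k, so the empirical values should be close.
print(x.mean(), x.var())   # roughly 5 and 10 for k = 5
```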
Notation
If the continuous random variable $X$ has a chi-squared distribution with $k$ degrees of freedom, then we write $X \sim \chi_k^2$ or $X \sim \chi^2(k)$.
Density Function
If $X \sim \chi_k^2$, then the density function of the random variable $X$ is
- $f_X(x) = \dfrac{\left(\frac{1}{2}\right)^{k/2}}{\Gamma\left(\frac{k}{2}\right)} \, x^{k/2 - 1} e^{-x/2}$
for $x > 0$, where $\Gamma$ is the gamma function.
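A minimal sketch (assuming SciPy is available) that evaluates this density directly from the formula and compares it with `scipy.stats.chi2.pdf`; the point `x = 3.0` and `k = 4` are arbitrary illustrative values.

```python
import numpy as np
from scipy.special import gamma
from scipy.stats import chi2

def chi2_pdf(x, k):
    """Density of chi-squared with k degrees of freedom, for x > 0,
    written directly from the formula above."""
    return (0.5) ** (k / 2) / gamma(k / 2) * x ** (k / 2 - 1) * np.exp(-x / 2)

x, k = 3.0, 4   # arbitrary evaluation point and degrees of freedom
print(chi2_pdf(x, k))      # direct evaluation of the formula
print(chi2.pdf(x, df=k))   # same value from scipy.stats
```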
Cumulative Distribution Function
If $X \sim \chi_k^2$, then its distribution function is given by
- $F_X(x) = \dfrac{\gamma\left(\frac{k}{2}, \frac{x}{2}\right)}{\Gamma\left(\frac{k}{2}\right)}$
where $\gamma(k, z)$ is the lower incomplete gamma function.
In particular, when $k = 2$ this function takes the form
- $F_X(x) = 1 - e^{-x/2}$
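As a hedged sketch, the distribution function can be evaluated with the regularized lower incomplete gamma function (`scipy.special.gammainc`), which already includes the division by $\Gamma(k/2)$; the $k = 2$ special case is checked against $1 - e^{-x/2}$. The evaluation points are arbitrary.

```python
import numpy as np
from scipy.special import gammainc   # regularized lower incomplete gamma
from scipy.stats import chi2

def chi2_cdf(x, k):
    """F_X(x) = gamma(k/2, x/2) / Gamma(k/2), via the regularized form."""
    return gammainc(k / 2, x / 2)

x = 3.0                                        # arbitrary evaluation point
print(chi2_cdf(x, 5), chi2.cdf(x, df=5))       # should agree
print(chi2_cdf(x, 2), 1 - np.exp(-x / 2))      # k = 2 special case
```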
Properties
If $X \sim \chi_k^2$, then the random variable $X$ satisfies the following properties.
Mean
The mean of the random variable $X$ is
- $\operatorname{E}[X] = k$
Variance
The variance of the random variable $X$ is
- $\operatorname{Var}(X) = 2k$
Moment generating function
The moment generating function of $X$ is
- $M_X(t) = \left(\dfrac{1}{1 - 2t}\right)^{k/2}$
for $2t < 1$.
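A short numerical check of the three properties above, assuming SciPy; the values of $k$ and $t$ are arbitrary, with $2t < 1$ so that the moment generating function exists.

```python
import numpy as np
from scipy.stats import chi2
from scipy.integrate import quad

k, t = 6, 0.2   # arbitrary degrees of freedom and a t with 2t < 1

print(chi2.mean(df=k), k)        # mean: k
print(chi2.var(df=k), 2 * k)     # variance: 2k

# Moment generating function E[e^{tX}] by numerical integration,
# compared with the closed form (1 / (1 - 2t))^{k/2}.
mgf_numeric, _ = quad(lambda x: np.exp(t * x) * chi2.pdf(x, df=k), 0, np.inf)
print(mgf_numeric, (1 / (1 - 2 * t)) ** (k / 2))
```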
Theorem
Let $X_1, \dots, X_n$ be a random sample from a population with distribution $N(\mu, \sigma^2)$; then:
- $\overline{X}$ and the vector $\left(X_1 - \overline{X}, \dots, X_n - \overline{X}\right)$ are independent.
- $\overline{X}$ and $S^2$ are independent.
- $\dfrac{(n-1)S^2}{\sigma^2} \sim \chi_{n-1}^2$.
- $\operatorname{E}[S^2] = \sigma^2$ and $\operatorname{Var}(S^2) = \dfrac{2\sigma^4}{n-1}$.
where
- $\overline{X} = \dfrac{1}{n} \sum_{i=1}^{n} X_i$
and
- $S^2 = \dfrac{1}{n-1} \sum_{i=1}^{n} \left(X_i - \overline{X}\right)^2$
are the mean and variance of the random sample respectively.
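A simulation sketch (illustrative, with arbitrary parameters mu, sigma, n, and number of replications) of the third statement: the scaled sample variance $(n-1)S^2/\sigma^2$ should follow a $\chi_{n-1}^2$ distribution.

```python
import numpy as np
from scipy.stats import chi2, kstest

rng = np.random.default_rng(1)             # arbitrary seed
mu, sigma, n, reps = 2.0, 3.0, 10, 50_000  # illustrative parameters

# For each replication, draw a sample of size n from N(mu, sigma^2)
# and compute (n - 1) * S^2 / sigma^2 with the unbiased sample variance.
samples = rng.normal(mu, sigma, size=(reps, n))
s2 = samples.var(axis=1, ddof=1)
statistic = (n - 1) * s2 / sigma ** 2

# Kolmogorov-Smirnov test against chi-squared with n - 1 degrees of
# freedom; a p-value that is not small is consistent with the theorem.
print(kstest(statistic, chi2(df=n - 1).cdf))
print(s2.mean(), sigma ** 2)   # E[S^2] = sigma^2
```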
Confidence intervals for samples from the normal distribution
Interval for the variance
Let $X_1, \dots, X_n$ be a random sample from a population with distribution $N(\mu, \sigma^2)$, where $\mu$ and $\sigma^2$ are unknown.
We have that
- $\dfrac{(n-1)S^2}{\sigma^2} \sim \chi_{n-1}^2$
Let $\chi_{n-1,\alpha/2}, \chi_{n-1,1-\alpha/2} \in \mathbb{R}$ be such that
- $\operatorname{P}\left[\chi_{n-1,\alpha/2} < Y < \chi_{n-1,1-\alpha/2}\right] = 1 - \alpha$
where $Y \sim \chi_{n-1}^2$; then
- $\begin{aligned}&\operatorname{P}\left[\chi_{n-1,\alpha/2} < \frac{(n-1)S^2}{\sigma^2} < \chi_{n-1,1-\alpha/2}\right] = 1 - \alpha\\&\operatorname{P}\left[\frac{1}{\chi_{n-1,\alpha/2}} > \frac{\sigma^2}{(n-1)S^2} > \frac{1}{\chi_{n-1,1-\alpha/2}}\right] = 1 - \alpha\\&\operatorname{P}\left[\frac{(n-1)S^2}{\chi_{n-1,1-\alpha/2}} < \sigma^2 < \frac{(n-1)S^2}{\chi_{n-1,\alpha/2}}\right] = 1 - \alpha\end{aligned}$
therefore a $(1-\alpha)100\%$ confidence interval for $\sigma^2$ is given by
- $\left(\dfrac{(n-1)S^2}{\chi_{n-1,1-\alpha/2}}, \dfrac{(n-1)S^2}{\chi_{n-1,\alpha/2}}\right)$
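A sketch of how this interval could be computed in practice, assuming SciPy; the sample, the true parameters, and the confidence level are illustrative choices, and `chi2.ppf` supplies the two quantiles $\chi_{n-1,\alpha/2}$ and $\chi_{n-1,1-\alpha/2}$.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(2)            # arbitrary seed
mu, sigma, n, alpha = 0.0, 2.0, 25, 0.05  # illustrative parameters

x = rng.normal(mu, sigma, size=n)
s2 = x.var(ddof=1)                        # unbiased sample variance S^2

# Quantiles of chi-squared with n - 1 degrees of freedom.
lo = chi2.ppf(alpha / 2, df=n - 1)        # chi_{n-1, alpha/2}
hi = chi2.ppf(1 - alpha / 2, df=n - 1)    # chi_{n-1, 1-alpha/2}

# (1 - alpha) * 100% confidence interval for sigma^2, as derived above.
ci = ((n - 1) * s2 / hi, (n - 1) * s2 / lo)
print(ci, "true sigma^2 =", sigma ** 2)
```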
Related distributions
- The $\chi^2$ distribution with $k$ degrees of freedom is a particular case of the gamma distribution, since if
- $X \sim \Gamma\left(\dfrac{k}{2}, \dfrac{1}{2}\right)$
- then $X \sim \chi_k^2$.
- When $k$ is large enough, as a consequence of the central limit theorem, it can be approximated by a normal distribution (see the sketch after this list):
- $\lim_{k \to \infty} \dfrac{\chi_k^2(x)}{k} = N\left(1, \sqrt{2/k}\right)(x)$
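An illustrative check of the normal approximation (assuming SciPy), comparing the distribution of $\chi_k^2/k$ with $N(1, \sqrt{2/k})$ at a few arbitrary points for a large $k$:

```python
import numpy as np
from scipy.stats import chi2, norm

k = 200                          # "large enough" degrees of freedom (arbitrary)
x = np.linspace(0.7, 1.3, 5)     # arbitrary evaluation points

# CDF of chi-squared(k)/k versus the CDF of N(1, sqrt(2/k)); the two
# rows of output should be close for large k.
print(chi2.cdf(k * x, df=k))
print(norm.cdf(x, loc=1, scale=np.sqrt(2 / k)))
```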
Applications
The χ² distribution has many applications in statistical inference. The best known is the so-called χ² test, used as a test of independence, as a goodness-of-fit test, and in the estimation of variances. It is also involved in the problem of estimating the mean of a normally distributed population and in the problem of estimating the slope of a linear regression line, through its role in Student's t-distribution.
It also appears in all analysis of variance problems because of its relationship with Snedecor's F distribution, which is the distribution of the ratio of two independent χ²-distributed random variables, each divided by its degrees of freedom.
See also
- Chi-square distribution tables
- Contingency table
- Contingency coefficient
- Phi coefficient
- Jean-Paul Benzécri
Computational methods
Table of χ2 values vs p-values
The p-value is the probability of observing a test statistic at least as extreme under a chi-squared distribution. Therefore, since the cumulative distribution function (CDF) for the appropriate degrees of freedom (df) gives the probability of having obtained a value less extreme than this point, subtracting the CDF value from 1 gives the p-value. A low p-value, below the chosen significance level, indicates statistical significance, that is, sufficient evidence to reject the null hypothesis. A significance level of 0.05 is often used as the cutoff between significant and non-significant results.
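As a small sketch (assuming SciPy), the p-value for an observed statistic is one minus the CDF, available directly as the survival function `chi2.sf`; the statistic and degrees of freedom below are taken from the table that follows.

```python
from scipy.stats import chi2

# p-value for an observed chi-squared statistic: 1 - CDF, i.e. the
# survival function, as described above.
statistic, df = 3.84, 1
print(chi2.sf(statistic, df=df))   # about 0.05, matching the df = 1 row below
```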
The following table gives a number of p-values matching $\chi^2$ values for the first 10 degrees of freedom.
Degrees of freedom (df) | χ² value | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
1 | 0.004 | 0.02 | 0.06 | 0.15 | 0.46 | 1.07 | 1.64 | 2.71 | 3.84 | 6.63 | 10.83 |
2 | 0.10 | 0.21 | 0.45 | 0.71 | 1.39 | 2.41 | 3.22 | 4.61 | 5.99 | 9.21 | 13.82 |
3 | 0.35 | 0.58 | 1.01 | 1.42 | 2.37 | 3.66 | 4.64 | 6.25 | 7.81 | 11.34 | 16.27 |
4 | 0.71 | 1.06 | 1.65 | 2.20 | 3.36 | 4.88 | 5.99 | 7.78 | 9.49 | 13.28 | 18.47 |
5 | 1.14 | 1.61 | 2.34 | 3.00 | 4.35 | 6.06 | 7.29 | 9.24 | 11.07 | 15.09 | 20.52 |
6 | 1.63 | 2.20 | 3.07 | 3.83 | 5.35 | 7.23 | 8.56 | 10.64 | 12.59 | 16.81 | 22.46 |
7 | 2.17 | 2.83 | 3.82 | 4.67 | 6.35 | 8.38 | 9.80 | 12.02 | 14.07 | 18.48 | 24.32 |
8 | 2.73 | 3.49 | 4.59 | 5.53 | 7.34 | 9.52 | 11.03 | 13.36 | 15.51 | 20.09 | 26.12 |
9 | 3.32 | 4.17 | 5.38 | 6.39 | 8.34 | 10.66 | 12.24 | 14.68 | 16.92 | 21.67 | 27.88 |
10 | 3.94 | 4.87 | 6.18 | 7.27 | 9.34 | 11.78 | 13.44 | 15.99 | 18.31 | 23.21 | 29.59 |
p-value (probability) | 0.95 | 0.90 | 0.80 | 0.70 | 0.50 | 0.30 | 0.20 | 0.10 | 0.05 | 0.01 | 0.001 |
These values can be calculated by evaluating the quantile function (also known as the "inverse CDF" or "ICDF") of the chi-squared distribution; for example, the χ² ICDF for p = 0.05 and df = 7 yields 2.1673 ≈ 2.17, as in the table above, noting that 1 − p is the p-value from the table.
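A one-line check of that example, assuming SciPy's quantile function `chi2.ppf`:

```python
from scipy.stats import chi2

# Quantile function (inverse CDF / ICDF): reproduces the df = 7 entry
# of the table, whose tabulated p-value 0.95 corresponds to p = 0.05 here.
print(chi2.ppf(0.05, df=7))   # about 2.1673
```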
History
This distribution was first described by the German geodesist and statistician Friedrich Robert Helmert in papers from 1875–6, where he calculated the sampling distribution of the sampling variance of a normal population. Thus, in German, this was traditionally known as the Helmert'sche ("Helmertian") or "Helmert distribution".
The distribution was independently rediscovered by the English mathematician Karl Pearson in the context of goodness of fit, for which he developed his Pearson chi-squared test, published in 1900, with a computed table of values published in (Elderton, 1902) and collected in (Pearson, 1914, Table XII). The name "chi-squared" ultimately derives from Pearson's shorthand for the exponent in a multivariate normal distribution with the Greek letter chi, writing $-\frac{1}{2}\chi^2$ for what would appear in modern notation as $-\frac{1}{2}x^{T}\Sigma^{-1}x$ ($\Sigma$ being the covariance matrix). However, the idea of a family of "chi-squared distributions" is not due to Pearson but arose as a further development due to Fisher in the 1920s.
For more information
- Hald, Anders (1998). A History of Mathematical Statistics from 1750 to 1930. New York: Wiley. ISBN 978-0-471-17912-2.
- Elderton, William Palin (1902). "Tables for Testing the Goodness of Fit of Theory to Observation". Biometrika 1 (2): 155–163. doi:10.1093/biomet/1.2.155.
- Hazewinkel, Michiel, ed. (2001). "Chi-squared distribution". Encyclopaedia of Mathematics. Springer. ISBN 978-1556080104.
- Pearson, Karl (1914). "On the probability that two independent distributions of frequency are really samples of the same population, with special reference to recent work on the identity of Trypanosome strains". Biometrika 10: 85–154. doi:10.1093/biomet/10.1.85.