Chi-squared (χ²) distribution


In probability theory and statistics, the chi-squared distribution (also called Pearson's distribution or the $\chi^2$ distribution) with $k \in \mathbb{N}$ degrees of freedom is the distribution of the sum of the squares of $k$ independent random variables with standard normal distribution. The chi-squared distribution is a special case of the gamma distribution and is one of the most widely used probability distributions in statistical inference, mainly in hypothesis testing and in the construction of confidence intervals.

Definition

As the sum of squared standard normals

Let $Z_1, \dots, Z_k$ be independent random variables such that $Z_i \sim N(0,1)$ for $i = 1, 2, \dots, k$; then the random variable $X$ defined by

$$X = Z_1^2 + Z_2^2 + \cdots + Z_k^2 = \sum_{i=1}^{k} Z_i^2$$

has a chi-squared distribution with $k$ degrees of freedom.
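
A minimal numerical sketch of this definition, assuming NumPy and SciPy are available (the choice of $k$ and the number of samples below are arbitrary, for illustration only): draw $k$ independent standard normal variables many times, sum their squares, and compare the result with a $\chi_k^2$ distribution.

```python
import numpy as np
from scipy import stats

k = 5                      # degrees of freedom, arbitrary choice for the demo
n_samples = 200_000
rng = np.random.default_rng(0)

Z = rng.standard_normal(size=(n_samples, k))   # independent Z_i ~ N(0, 1)
X = (Z ** 2).sum(axis=1)                       # X = Z_1^2 + ... + Z_k^2

print(X.mean(), X.var())                       # should be close to k and 2k
# Kolmogorov-Smirnov comparison against chi2 with k degrees of freedom
print(stats.kstest(X, stats.chi2(df=k).cdf))
```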

Notation

If the continuous random variable $X$ has a chi-squared distribution with $k$ degrees of freedom, then we write $X \sim \chi_k^2$ or $X \sim \chi^2(k)$.

Density Function

If $X \sim \chi_k^2$, then the density function of the random variable $X$ is

$$f_X(x) = \frac{\left(\tfrac{1}{2}\right)^{k/2}}{\Gamma\left(\tfrac{k}{2}\right)}\, x^{\frac{k}{2}-1} e^{-x/2}$$

for $x > 0$, where $\Gamma$ is the gamma function.
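
As a quick illustration, the explicit density can be evaluated and compared with SciPy's built-in chi-squared density; the helper `chi2_pdf` below is only a transcription of the formula above, not a library routine, and the values of $k$ and the grid are arbitrary.

```python
import numpy as np
from scipy import stats
from scipy.special import gamma as gamma_function

def chi2_pdf(x, k):
    # Transcription of the density: (1/2)^(k/2) / Gamma(k/2) * x^(k/2 - 1) * exp(-x/2)
    return 0.5 ** (k / 2) / gamma_function(k / 2) * x ** (k / 2 - 1) * np.exp(-x / 2)

k = 4
x = np.linspace(0.1, 15.0, 5)
print(chi2_pdf(x, k))            # explicit formula
print(stats.chi2.pdf(x, df=k))   # SciPy's implementation, should agree
```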

Cumulative Distribution Function

If $X \sim \chi_k^2$, then its distribution function is given by

$$F_X(x) = \frac{\gamma\left(\tfrac{k}{2}, \tfrac{x}{2}\right)}{\Gamma\left(\tfrac{k}{2}\right)}$$

where $\gamma(k, z)$ is the lower incomplete gamma function.

In particular, when $k = 2$, this function takes the form

$$F_X(x) = 1 - e^{-x/2}$$
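
A short check of this special case, assuming SciPy: the general chi-squared CDF with $k = 2$ and the closed form $1 - e^{-x/2}$ should coincide.

```python
import numpy as np
from scipy import stats

x = np.array([0.5, 1.0, 2.0, 5.0])
print(stats.chi2.cdf(x, df=2))   # general CDF evaluated at k = 2
print(1 - np.exp(-x / 2))        # closed form for k = 2, identical values
```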

Properties

If $X \sim \chi_k^2$, then the random variable $X$ satisfies the following properties.

Mean

The mean of the random variable $X$ is

$$\operatorname{E}[X] = k$$

Variance

The variance of the random variable $X$ is

$$\operatorname{Var}(X) = 2k$$

Moment generating function

The moment generating function of $X$ is

$$M_X(t) = \left(\frac{1}{1-2t}\right)^{k/2}$$

for $2t < 1$.
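
These three properties can be checked numerically. The sketch below, assuming NumPy and SciPy, compares the exact mean and variance with $k$ and $2k$, and a Monte Carlo estimate of $\operatorname{E}[e^{tX}]$ with the closed-form MGF; the values of $k$ and $t$ are arbitrary choices subject to $2t < 1$.

```python
import numpy as np
from scipy import stats

k, t = 6, 0.2                              # arbitrary; the MGF requires 2t < 1
mean, var = stats.chi2.stats(df=k, moments="mv")
print(mean, var)                           # exactly k and 2k

rng = np.random.default_rng(1)
X = stats.chi2.rvs(df=k, size=500_000, random_state=rng)
print(np.exp(t * X).mean())                # Monte Carlo estimate of E[exp(tX)]
print((1 - 2 * t) ** (-k / 2))             # closed-form M_X(t)
```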

Theorem

Let $X_1, \dots, X_n$ be a random sample from a population with distribution $N(\mu, \sigma^2)$. Then:

  1. $\overline{X}$ and the vector $\left(X_1 - \overline{X}, \dots, X_n - \overline{X}\right)$ are independent.
  2. $\overline{X}$ and $S^2$ are independent.
  3. $\dfrac{(n-1)S^2}{\sigma^2} \sim \chi_{n-1}^2$.
  4. $\operatorname{E}[S^2] = \sigma^2$ and $\operatorname{Var}(S^2) = \dfrac{2\sigma^4}{n-1}$.

where

$$\overline{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$

and

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \overline{X}\right)^2$$

are the mean and variance of the random sample respectively.
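
A simulation sketch of points 3 and 4 of the theorem, assuming NumPy and SciPy; the population parameters $\mu$, $\sigma$ and the sample size $n$ are arbitrary choices for the illustration.

```python
import numpy as np
from scipy import stats

mu, sigma, n = 3.0, 2.0, 10     # arbitrary population parameters and sample size
n_rep = 100_000
rng = np.random.default_rng(2)

samples = rng.normal(mu, sigma, size=(n_rep, n))
S2 = samples.var(axis=1, ddof=1)            # sample variance S^2 (divisor n-1)
T = (n - 1) * S2 / sigma ** 2               # point 3: T should be ~ chi2(n-1)

print(T.mean(), T.var())                    # close to n-1 = 9 and 2(n-1) = 18
print(S2.mean(), S2.var())                  # point 4: sigma^2 = 4 and 2*sigma^4/(n-1) ~ 3.56
print(stats.kstest(T, stats.chi2(df=n - 1).cdf))
```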

Confidence intervals for samples from the normal distribution

Interval for the variance

Let $X_1, \dots, X_n$ be a random sample from a population with distribution $N(\mu, \sigma^2)$, where $\mu$ and $\sigma^2$ are unknown.

We have that

$$\frac{(n-1)S^2}{\sigma^2} \sim \chi_{n-1}^2$$

Let $\chi_{n-1,\alpha/2}, \chi_{n-1,1-\alpha/2} \in \mathbb{R}$ be such that

$$\operatorname{P}\left[\chi_{n-1,\alpha/2} < Y < \chi_{n-1,1-\alpha/2}\right] = 1 - \alpha$$

where $Y \sim \chi_{n-1}^2$; then

$$\begin{aligned}
&\operatorname{P}\left[\chi_{n-1,\alpha/2} < \frac{(n-1)S^2}{\sigma^2} < \chi_{n-1,1-\alpha/2}\right] = 1-\alpha \\
&\operatorname{P}\left[\frac{1}{\chi_{n-1,\alpha/2}} > \frac{\sigma^2}{(n-1)S^2} > \frac{1}{\chi_{n-1,1-\alpha/2}}\right] = 1-\alpha \\
&\operatorname{P}\left[\frac{(n-1)S^2}{\chi_{n-1,1-\alpha/2}} < \sigma^2 < \frac{(n-1)S^2}{\chi_{n-1,\alpha/2}}\right] = 1-\alpha
\end{aligned}$$

Therefore, a $(1-\alpha)100\%$ confidence interval for $\sigma^2$ is given by

$$\left(\frac{(n-1)S^2}{\chi_{n-1,1-\alpha/2}},\; \frac{(n-1)S^2}{\chi_{n-1,\alpha/2}}\right)$$
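
A minimal sketch of this interval in code, assuming SciPy; the data vector is a hypothetical sample, and `chi2.ppf` denotes the chi-squared quantile function, so $\chi_{n-1,\alpha/2}$ and $\chi_{n-1,1-\alpha/2}$ are the $\alpha/2$ and $1-\alpha/2$ quantiles with $n-1$ degrees of freedom.

```python
import numpy as np
from scipy import stats

x = np.array([4.1, 5.3, 3.8, 4.9, 5.7, 4.4, 5.0, 4.6])   # hypothetical sample
alpha = 0.05
n = x.size
S2 = x.var(ddof=1)                                        # sample variance

lower = (n - 1) * S2 / stats.chi2.ppf(1 - alpha / 2, df=n - 1)
upper = (n - 1) * S2 / stats.chi2.ppf(alpha / 2, df=n - 1)
print(f"{(1 - alpha) * 100:.0f}% confidence interval for sigma^2: ({lower:.3f}, {upper:.3f})")
```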

Related distributions

  • The $\chi^2$ distribution with $k$ degrees of freedom is a particular case of the gamma distribution: if $X \sim \Gamma\left(\tfrac{k}{2}, \tfrac{1}{2}\right)$ (shape $k/2$ and rate $1/2$), then $X \sim \chi_k^2$.
  • When $k$ is large enough, as a consequence of the central limit theorem, it can be approximated by a normal distribution: $\dfrac{\chi_k^2}{k}$ is approximately $N\left(1, \sqrt{2/k}\right)$ for large $k$ (both relations are checked numerically in the sketch after this list).
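
A numerical sketch of both relations, assuming SciPy; the values of $k$, the evaluation grid, and the "large" degrees of freedom below are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats

# chi2 with k degrees of freedom equals Gamma(shape = k/2, scale = 2), i.e. rate 1/2
k = 7
x = np.linspace(0.5, 20.0, 4)
print(stats.chi2.pdf(x, df=k))
print(stats.gamma.pdf(x, a=k / 2, scale=2.0))       # same values

# For large k, X/k is approximately N(1, sqrt(2/k))
k_big = 500
grid = np.linspace(0.8, 1.2, 5)
print(stats.chi2.cdf(k_big * grid, df=k_big))       # exact P[X/k <= grid]
print(stats.norm.cdf(grid, loc=1.0, scale=np.sqrt(2.0 / k_big)))  # normal approximation
```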

Applications

The χ² distribution has many applications in statistical inference. The best known is the so-called χ² test, used as a test of independence, as a goodness-of-fit test, and in the estimation of variances. It is also involved in the problem of estimating the mean of a normally distributed population and in the problem of estimating the slope of a linear regression line, through its role in Student's t-distribution.

It also appears in all analysis of variance problems because of its relationship with the Snedecor F distribution, which is the distribution of the ratio of two independent chi-squared random variables, each divided by its respective degrees of freedom.

See also

  • Chi-square distribution tables
  • Contingency table
  • Contingency coefficient
  • Phi coefficient
  • Jean-Paul Benzécri

Computational methods

Table of χ² values vs p-values

The p-value is the probability of observing a test statistic at least as extreme in a chi-squared distribution. Accordingly, since the cumulative distribution function (CDF) for the appropriate degrees of freedom (df) gives the probability of having obtained a value less extreme than this point, subtracting the CDF value from 1 gives the p-value. A low p-value, below the chosen significance level, indicates statistical significance, that is, sufficient evidence to reject the null hypothesis. A significance level of 0.05 is often used as the cutoff between significant and non-significant results.

The following table gives a number of p-values and the matching $\chi^2$ values for the first 10 degrees of freedom.

χ² values by degrees of freedom (df, rows) and p-value (probability, columns)

df      0.95    0.90    0.80    0.70    0.50    0.30    0.20    0.10    0.05    0.01    0.001
 1      0.004   0.02    0.06    0.15    0.46    1.07    1.64    2.71    3.84    6.63    10.83
 2      0.10    0.21    0.45    0.71    1.39    2.41    3.22    4.61    5.99    9.21    13.82
 3      0.35    0.58    1.01    1.42    2.37    3.66    4.64    6.25    7.81    11.34   16.27
 4      0.71    1.06    1.65    2.20    3.36    4.88    5.99    7.78    9.49    13.28   18.47
 5      1.14    1.61    2.34    3.00    4.35    6.06    7.29    9.24    11.07   15.09   20.52
 6      1.63    2.20    3.07    3.83    5.35    7.23    8.56    10.64   12.59   16.81   22.46
 7      2.17    2.83    3.82    4.67    6.35    8.38    9.80    12.02   14.07   18.48   24.32
 8      2.73    3.49    4.59    5.53    7.34    9.52    11.03   13.36   15.51   20.09   26.12
 9      3.32    4.17    5.38    6.39    8.34    10.66   12.24   14.68   16.92   21.67   27.88
10      3.94    4.87    6.18    7.27    9.34    11.78   13.44   15.99   18.31   23.21   29.59

These values can be calculated by evaluating the quantile function (also known as the "inverse CDF" or "ICDF") of the chi-squared distribution; for example, the χ² ICDF for p = 0.05 and df = 7 yields 2.1673 ≈ 2.17, as in the table above, noting that 1 − p is the p-value in the table.
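
For instance, with SciPy the quantile function is `chi2.ppf` and the upper-tail probability is `chi2.sf`; a minimal sketch reproducing the entries mentioned above:

```python
from scipy import stats

# Quantile function (inverse CDF): upper-tail p = 0.05 with df = 7 -> 14.067 (table entry 14.07)
print(stats.chi2.ppf(1 - 0.05, df=7))

# CDF value 0.05 with df = 7 -> 2.1673, i.e. the table column with p = 0.95
print(stats.chi2.ppf(0.05, df=7))

# Survival function (upper-tail probability) at 2.17 with df = 7 -> about 0.95
print(stats.chi2.sf(2.17, df=7))
```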

History

This distribution was first described by the German geodesist and statistician Friedrich Robert Helmert in papers from 1875–76, where he computed the sampling distribution of the sample variance of a normal population. Thus, in German, it was traditionally known as the Helmert'sche ("Helmertian") or "Helmert distribution".

The distribution was independently rediscovered by the English mathematician Karl Pearson in the context of goodness of fit, for which he developed his Pearson chi-squared test, published in 1900, with a computed table of values published in (Elderton, 1902) and collected in (Pearson, 1914, Table XII). The name "chi-squared" ultimately derives from Pearson's shorthand for the exponent in a multivariate normal distribution with the Greek letter chi, writing −½χ², which in modern notation would appear as −½xᵀΣ⁻¹x (Σ being the covariance matrix). However, the idea of a family of "chi-squared distributions" is not due to Pearson, but arose as a further development due to Fisher in the 1920s.

For more information

  • Hald, Anders (1998). A History of Mathematical Statistics from 1750 to 1930. New York: Wiley. ISBN 978-0-471-17912-2.
  • Elderton, William Palin (1902). "Tables for Testing the Goodness of Fit of Theory to Observation". Biometrika 1 (2): 155–163. doi:10.1093/biomet/1.2.155.
  • Hazewinkel, Michiel, ed. (2001). "Chi-squared distribution". Encyclopaedia of Mathematics. Springer. ISBN 978-1556080104.
  • Pearson, Karl (1914). "On the probability that two independent distributions of frequency are really samples of the same population, with special reference to recent work on the identity of Trypanosome strains". Biometrika 10: 85–154. doi:10.1093/biomet/10.1.85.
