Discrete uniform distribution

In probability theory and statistics, the discrete uniform distribution is a symmetric discrete probability distribution that arises in equiprobable probability spaces, that is, in situations where each of $n$ different outcomes has the same chance of happening.

A simple example of the discrete uniform distribution is rolling a fair die. The possible values are 1, 2, 3, 4, 5, 6, and each time the die is rolled, the probability of a given score is 1/6. If two dice are rolled and their values are added, the resulting distribution is no longer uniform because not all sums have the same probability. Although it is convenient to describe discrete uniform distributions over integers, such as this one, one can also consider discrete uniform distributions over any finite set. For example, a random permutation is a permutation generated uniformly from the permutations of a given length, and a uniform spanning tree is a spanning tree generated uniformly from the spanning trees of a given graph.
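The two-dice claim can be checked by direct enumeration. The following is a minimal sketch, assuming two fair six-sided dice; the names `one_die` and `two_dice` are chosen here for illustration only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# One fair die: every face has probability 1/6.
one_die = {face: Fraction(1, 6) for face in faces}

# Two fair dice: count how many of the 36 equally likely ordered pairs
# produce each sum, then divide by 36.
counts = Counter(a + b for a, b in product(faces, repeat=2))
two_dice = {total: Fraction(c, 36) for total, c in sorted(counts.items())}

for total, probability in two_dice.items():
    print(total, probability)
```

The sum 7 can be formed in six ways while 2 and 12 can each be formed in only one, so the probabilities differ and the distribution of the sum is not uniform.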

The discrete uniform distribution itself is intrinsically nonparametric. It is convenient, however, to represent its values generally by all integers in an interval $[a, b]$, so that $a$ and $b$ become the main parameters of the distribution (often one simply considers the interval $[1, n]$ with the single parameter $n$). With these conventions, the cumulative distribution function (CDF) of the discrete uniform distribution can be expressed, for any $k \in [a, b]$, as

$$F(k;a,b)=\frac{\lfloor k\rfloor -a+1}{b-a+1}$$
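As an illustration, the CDF formula can be sketched in Python. The function name `discrete_uniform_cdf` is hypothetical, and the values returned outside $[a, b]$ (0 below $a$, 1 above $b$) are the usual extension of a CDF to the whole real line rather than something the formula above states.

```python
import math
from fractions import Fraction

def discrete_uniform_cdf(k, a, b):
    """CDF of the discrete uniform distribution on the integers a..b."""
    if a > b:
        raise ValueError("a must not exceed b")
    if k < a:
        return Fraction(0)  # assumed extension below the support
    if k >= b:
        return Fraction(1)  # the whole support lies at or below k
    # Inside [a, b): the formula from the text.
    return Fraction(math.floor(k) - a + 1, b - a + 1)

print(discrete_uniform_cdf(3, 1, 6))    # fair die: P(X <= 3)
print(discrete_uniform_cdf(3.7, 1, 6))  # same value, since floor(3.7) = 3
```

Exact fractions are used so that the results can be compared directly with hand calculations.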

Definition

Notation

If $X$ is a discrete random variable whose support is the set $\{x_1, x_2, \dots, x_n\}$ and $X$ has a discrete uniform distribution, then we write $X \sim \operatorname{Uniform}(x_1, x_2, \dots, x_n)$.

Probability function

The probability function of $X$ is

$$\operatorname{P}[X=x]=\frac{1}{n}$$

for $x = x_1, x_2, \dots, x_n$.
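A minimal sketch of the probability function over an arbitrary finite set follows. The function name `uniform_pmf` is hypothetical, and returning 0 for values outside the support is an assumption made here for completeness.

```python
from fractions import Fraction

def uniform_pmf(x, support):
    """Probability that a discrete uniform variable on `support` equals x."""
    values = set(support)  # the support is a set, so duplicates are ignored
    if not values:
        raise ValueError("support must be non-empty")
    # Every point of the support has probability 1/n; everything else has 0.
    return Fraction(1, len(values)) if x in values else Fraction(0)

print(uniform_pmf(4, range(1, 7)))               # fair die
print(uniform_pmf("heads", ["heads", "tails"]))  # fair coin
```

The support does not need to consist of numbers; any finite collection of distinct, hashable outcomes works in this sketch.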

Properties

If $X \sim \operatorname{Uniform}(x_1, x_2, \dots, x_n)$, then the random variable satisfies the following properties.

Mean

The mean of the random variable $X$ is

$$\operatorname{E}[X]=\frac{1}{n}\sum_{i=1}^{n}x_{i}$$

Variance

The variance of the random variable $X$ is

$$\operatorname{Var}(X)=\frac{1}{n}\sum_{i=1}^{n}\left(x_{i}-\operatorname{E}[X]\right)^{2}$$
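The mean and variance formulas can be evaluated directly for a fair die. The sketch below assumes integer support values; the helper `mean_and_variance` is a hypothetical name. For the special case of consecutive integers $a, \dots, b$, it also compares the results with the standard closed forms $(a+b)/2$ and $((b-a+1)^2-1)/12$, which are not stated in the text above.

```python
from fractions import Fraction

def mean_and_variance(values):
    """Mean and variance of a discrete uniform variable on integer `values`."""
    n = len(values)
    mean = Fraction(sum(values), n)
    variance = sum((Fraction(x) - mean) ** 2 for x in values) / n
    return mean, variance

a, b = 1, 6  # a fair six-sided die
mean, variance = mean_and_variance(list(range(a, b + 1)))

# Closed forms for consecutive integers a..b (an added assumption, see above).
closed_mean = Fraction(a + b, 2)
closed_variance = Fraction((b - a + 1) ** 2 - 1, 12)

print(mean, variance)                # 7/2 35/12
print(closed_mean, closed_variance)  # 7/2 35/12
```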

Properties

The family of uniform distributions over ranges of integers (with one or both bounds unknown) has a finite-dimensional sufficient statistic, namely the triple consisting of the sample maximum, the sample minimum, and the sample size, but it is not an exponential family of distributions, because the support varies with the parameters. For families whose support does not depend on the parameters, the Pitman-Koopman-Darmois theorem states that only exponential families have a sufficient statistic whose dimension remains bounded as the sample size increases. The uniform distribution is therefore a simple example that shows the limit of this theorem.
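The role of the triple (minimum, maximum, sample size) can be illustrated numerically. The sketch below assumes independent draws with replacement from the integers $a, \dots, b$; the function names and the two samples are hypothetical. Two samples that share the same triple yield the same likelihood for every choice of $a$ and $b$.

```python
from fractions import Fraction

def likelihood(sample, a, b):
    """Likelihood of an i.i.d. sample under the uniform distribution on a..b."""
    if min(sample) < a or max(sample) > b:
        return Fraction(0)  # some observation lies outside the support
    return Fraction(1, b - a + 1) ** len(sample)

def summary(sample):
    """The triple (sample minimum, sample maximum, sample size)."""
    return min(sample), max(sample), len(sample)

# Two different hypothetical samples with the same summary (3, 9, 4).
first = [3, 9, 5, 5]
second = [9, 3, 4, 8]

same_everywhere = all(
    likelihood(first, a, b) == likelihood(second, a, b)
    for a in range(0, 6)
    for b in range(7, 15)
)
print(summary(first), summary(second), same_everywhere)
```

This is a numerical check over a small grid of parameters, not a proof of sufficiency.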

Examples

For a fair die, each outcome has probability $1/6$.
For a fair coin, each outcome has probability $1/2$.

German tank problem

The problem of estimating the maximum in a population can be formulated as follows:

Suppose you are an intelligence analyst for the Allies during World War II and have the serial numbers of some captured German tanks. In addition, assume that all German tanks have been numbered sequentially from 1 to N. How could the total number of tanks be estimated?

For the point estimate (estimating a single value for the total), the unbiased minimum variance estimator is given by the formula:

$$\hat{N}=m+\frac{m-k}{k}=m+\frac{m}{k}-1=\frac{k+1}{k}\,m-1$$

where m is the largest observed serial number (sample maximum) and k is the observed number of tanks (sample size). The formula can be understood as

"The maximum in the sample plus the average gap in the sample"

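In the first expression, the first summand is the sample maximum and the second summand, $(m-k)/k$, is the average gap: the $m-k$ unobserved serial numbers below the maximum, divided among the $k$ observations.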
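A minimal sketch of the point estimate follows. The function name `estimate_population_max` and the serial numbers are hypothetical, chosen only to illustrate the arithmetic.

```python
from fractions import Fraction

def estimate_population_max(serials):
    """Point estimate m + (m - k) / k from distinct observed serial numbers."""
    k = len(serials)
    if k == 0:
        raise ValueError("at least one serial number is required")
    if len(set(serials)) != k:
        raise ValueError("serial numbers must be distinct (no replacement)")
    m = max(serials)
    # Sample maximum plus the average gap between observations.
    return m + Fraction(m - k, k)

observed = [19, 40, 42, 60]  # hypothetical captured serial numbers
estimate = estimate_population_max(observed)
print(estimate)  # 74
```

Here $m = 60$ and $k = 4$, so the average gap is $(60 - 4)/4 = 14$ and the estimate is $60 + 14 = 74$, which agrees with the equivalent form $\frac{5}{4}\cdot 60 - 1$.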

The estimator is called unbiased because it takes the sample maximum as a base estimate and then corrects for its bias: the sample maximum tends to underestimate the true population maximum, since it can be equal to or less than, but never greater than, the population maximum.
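The bias of the sample maximum, and its correction, can be checked exactly for a small case by enumerating every possible sample drawn without replacement. This is a sketch under assumed values ($N = 20$, $k = 4$); the function name `expected_values` is hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def expected_values(N, k):
    """Exact expectations of the sample maximum and of the corrected estimator,
    averaged over all equally likely samples of k distinct serials from 1..N."""
    samples = list(combinations(range(1, N + 1), k))
    count = len(samples)
    mean_max = Fraction(sum(max(s) for s in samples), count)
    mean_estimate = sum(Fraction(k + 1, k) * max(s) - 1 for s in samples) / count
    return mean_max, mean_estimate

mean_max, mean_estimate = expected_values(N=20, k=4)
print(mean_max)       # below 20: the raw maximum underestimates N on average
print(mean_estimate)  # exactly 20: the corrected estimator is unbiased
```

Enumeration is only practical for small $N$ and $k$, but it avoids the noise of a random simulation.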

Note that, because sampling is assumed to be without replacement, once a serial number has been observed it is no longer in the pool of possible observations and cannot be seen again.

Specific data

According to conventional Allied intelligence estimates, the Germans were producing around 1,400 tanks per month between June 1940 and September 1942. Applying the above formula to the serial numbers of captured German tanks (both those still serviceable and those partially destroyed), the resulting estimate was 256 per month. After the war, official production figures, obtained from documents seized from Albert Speer's War Office, showed the actual number to be 255.

The following estimates have been quoted for some specific months:

Month	Statistical estimate	Intelligence estimate	German records
June 1940	169	1000	122
June 1941	244	1550	271
August 1942	327	1550	342
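The gap between the two kinds of estimate can be made concrete by computing the absolute error of each against the German records. The sketch below uses only the figures from the table above; the variable names are illustrative.

```python
# Rows copied from the table above:
# (month, statistical estimate, intelligence estimate, German records).
rows = [
    ("June 1940", 169, 1000, 122),
    ("June 1941", 244, 1550, 271),
    ("August 1942", 327, 1550, 342),
]

# Absolute error of each estimate against the German records, per month.
errors = {
    month: (abs(statistical - actual), abs(intelligence - actual))
    for month, statistical, intelligence, actual in rows
}

for month, (statistical_error, intelligence_error) in errors.items():
    print(f"{month}: statistical off by {statistical_error}, "
          f"intelligence off by {intelligence_error}")
```

In each of the three months the statistical estimate misses the recorded figure by fewer than 50 tanks, while the intelligence estimate misses by more than 800.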
