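## Probability model

A probabilistic model, also called a statistical or probability model, is a mathematical model that embodies a set of statistical assumptions concerning the generation of sample data. A statistical model usually represents, in a highly idealized form, the process that generates the data…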

In statistics, **model validation** is the task of confirming that the outputs of a statistical model are acceptable with respect to the real data-generating process. In other words, model validation is the task of confirming that the outputs of a statistical model have enough fidelity to the outputs of the data-generating process that the objectives of the investigation can be achieved.

Model validation can be based on two types of data: data that was used in the construction of the model and data that was not used in the construction. Validation based on the first type usually involves analyzing the goodness of fit of the model or analyzing whether the residuals seem to be random (i.e. residual diagnostics). Validation based on the second type usually involves analyzing whether the model's predictive performance deteriorates non-negligibly when applied to pertinent new data.
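The two kinds of validation can be contrasted in a few lines of code. The following is a minimal Python sketch, assuming a hypothetical data-generating process y = 1 + 2x + Gaussian noise and using mean squared error as the (arbitrarily chosen) measure of fit. The error on the construction data reflects goodness of fit, while the error on freshly simulated data reflects predictive performance.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def mse(model, xs, ys):
    """Mean squared error of the line `model` = (a, b) on the given data."""
    a, b = model
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def simulate(n, rng):
    """Hypothetical data-generating process: y = 1 + 2x + Normal(0, 0.5) noise."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [1 + 2 * x + rng.gauss(0, 0.5) for x in xs]
    return xs, ys


rng = random.Random(0)
train_x, train_y = simulate(50, rng)  # data used to construct the model
test_x, test_y = simulate(50, rng)    # new data NOT used in construction

model = fit_line(train_x, train_y)
train_err = mse(model, train_x, train_y)  # first type: goodness of fit
test_err = mse(model, test_x, test_y)     # second type: predictive performance
print(f"training MSE = {train_err:.3f}, new-data MSE = {test_err:.3f}")
```

A new-data error much larger than the training error would signal that predictive performance deteriorates on new data; what counts as "non-negligible" depends on the objectives of the investigation.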

Validation based on only the first type (data that was used in the construction of the model) is often inadequate. An extreme example is illustrated in Figure 1. The figure displays data (black dots) that was generated via a straight line plus noise. The figure also displays a curve, which is a polynomial chosen to fit the data perfectly. The residuals for the curve are all zero. Hence validation based on only the first type of data would conclude that the curve was a good model. Yet the curve is obviously a poor model: interpolation, especially between −5 and −4, would tend to be highly misleading; moreover, any substantial extrapolation would be bad.
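The situation in Figure 1 can be imitated (not with the figure's actual data, which is not available here) by interpolating ten points from a hypothetical line with a polynomial of degree nine. In this sketch a fixed alternating perturbation of ±0.3 stands in for the noise so that every run gives the same result; the function names are illustrative.

```python
def lagrange_interpolant(xs, ys):
    """Return the polynomial of degree <= len(xs) - 1 passing through every point."""
    def poly(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            basis = 1.0  # Lagrange basis polynomial l_i evaluated at x
            for j, xj in enumerate(xs):
                if j != i:
                    basis *= (x - xj) / (xi - xj)
            total += yi * basis
        return total
    return poly


def true_line(x):
    """Hypothetical straight line underlying the data."""
    return 0.5 * x + 1.0


xs = list(range(-5, 5))  # ten inputs: -5, -4, ..., 4
# A fixed alternating +/-0.3 perturbation stands in for random noise
# so that every run gives the same result.
ys = [true_line(x) + (0.3 if i % 2 == 0 else -0.3) for i, x in enumerate(xs)]

curve = lagrange_interpolant(xs, ys)
residuals = [y - curve(x) for x, y in zip(xs, ys)]
print("largest |residual|:", max(abs(r) for r in residuals))

for x in (-4.5, 7.0):  # an interpolation point and an extrapolation point
    print(f"x = {x}: curve = {curve(x):.2f}, true line = {true_line(x):.2f}")
```

Every residual on the construction data is zero, yet at x = 7 the curve lies far below the true line: the polynomial reproduces the line itself exactly, so its departure comes entirely from having fitted the noise.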

Thus, validation is usually not based on only considering data that was used in the construction of the model; rather, validation usually also employs data that was not used in the construction. In other words, validation usually includes testing some of the model's predictions.
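One common way to use data that was not used in construction is k-fold cross-validation: the data is split into k parts, and each part is predicted by a model built from the other k − 1 parts. The sketch below assumes a straight-line least-squares model and a hypothetical noisy data set; k = 5 is an arbitrary choice.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def k_fold_mse(xs, ys, k, seed=0):
    """Average held-out MSE: each of k folds is predicted by a line
    fitted only on the other k - 1 folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    fold_errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]  # construction data for this fold
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        sq = [(ys[i] - (a + b * xs[i])) ** 2 for i in held_out]
        fold_errors.append(sum(sq) / len(sq))
    return sum(fold_errors) / k


# Hypothetical noisy data set from a straight line.
rng = random.Random(1)
xs = [rng.uniform(0, 10) for _ in range(40)]
ys = [3 + 2 * x + rng.gauss(0, 1) for x in xs]
cv_error = k_fold_mse(xs, ys, k=5)
print(f"5-fold cross-validated MSE = {cv_error:.3f}")
```

Every point is predicted exactly once by a model that never saw it, so the result estimates predictive performance rather than goodness of fit.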

A model can be validated only relative to some application area. A model that is valid for one application might be invalid for some other applications. As an example, consider the curve in Figure 1: if the application only used inputs from the interval [0, 2], then the curve might well be an acceptable model.
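Because validity depends on the application area, a validation check can be written so that it only examines the inputs the application will actually use. In this sketch both the "real" process and the model are hypothetical functions, and the tolerance of 0.1 is an assumed accuracy requirement.

```python
def max_error_on_domain(model, reference, lo, hi, n=201):
    """Largest |model(x) - reference(x)| on an evenly spaced grid over [lo, hi]."""
    step = (hi - lo) / (n - 1)
    return max(abs(model(lo + i * step) - reference(lo + i * step)) for i in range(n))


def is_valid_for(model, reference, lo, hi, tol):
    """Accept the model for an application whose inputs lie in [lo, hi]."""
    return max_error_on_domain(model, reference, lo, hi) <= tol


def reference(x):
    # Hypothetical stand-in for the real data-generating process.
    return x


def model(x):
    # Hypothetical model: nearly linear near 0, drifting away for large |x|.
    return x + 0.01 * x ** 3


print(is_valid_for(model, reference, 0.0, 2.0, tol=0.1))    # expected: True
print(is_valid_for(model, reference, -5.0, -4.0, tol=0.1))  # expected: False
```

The same model is acceptable for an application restricted to [0, 2] (largest error 0.08) and unacceptable on [−5, −4] (largest error 1.25).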

When doing a validation, there are three notable causes of potential difficulty, according to the *Encyclopedia of Statistical Sciences*. The three causes are these: lack of data; lack of control of the input variables; uncertainty about the underlying probability distributions and correlations. The usual methods for dealing with difficulties in validation include the following: checking the assumptions made in constructing the model; examining the available data and related model outputs; applying expert judgment. Note that expert judgment commonly requires expertise in the application area.

Expert judgment can sometimes be used to assess the validity of a prediction *without* obtaining real data: e.g. for the curve in Figure 1, an expert might well be able to assess that a substantial extrapolation will be invalid. Additionally, expert judgment can be used in Turing-type tests, where experts are presented with both real data and related model outputs and then asked to distinguish between the two.
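The outcome of a Turing-type test can be summarized by asking whether the expert did better than guessing. One simple way to do so, assumed here rather than prescribed by the text, is an exact one-sided binomial test against a success probability of 0.5; the panel labels below are made up for illustration.

```python
from math import comb


def turing_test_p_value(n_items, n_correct):
    """One-sided exact binomial p-value: probability of n_correct or more
    correct calls out of n_items if the expert were only guessing (p = 0.5)."""
    return sum(comb(n_items, k) for k in range(n_correct, n_items + 1)) / 2 ** n_items


def score(expert_calls, truth):
    """Count items whose 'real'/'model' label the expert got right."""
    return sum(call == t for call, t in zip(expert_calls, truth))


# Hypothetical panel: which of 8 items are real data and which are model output.
truth = ["real", "model", "model", "real", "model", "real", "real", "model"]
expert_calls = ["real", "real", "model", "model", "model", "real", "model", "model"]
correct = score(expert_calls, truth)
print(correct, turing_test_p_value(len(truth), correct))
```

Here the expert classifies 5 of 8 items correctly, and the p-value of 93/256 (about 0.36) gives no evidence that the expert can tell model outputs from real data. Such a result is weak support for the model rather than proof of validity; with so few items the test also has little power.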

For some classes of statistical models, specialized methods of performing validation are available. As an example, if the statistical model was obtained via a regression, then specialized analyses for regression model validation exist and are generally employed.

Residual diagnostics comprise analyses of the residuals to determine whether the residuals seem to be effectively random. Such analyses typically require estimates of the probability distributions for the residuals. Estimates of the residuals' distributions can often be obtained by repeatedly running the model, i.e. by using repeated stochastic simulations (employing a pseudorandom number generator for random variables in the model).
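A sketch of residual diagnostics by repeated stochastic simulation, assuming a straight-line model with independent Gaussian errors. The fitted model is re-run many times with pseudorandom noise to estimate the distribution of the number of sign runs in the residuals, and the observed residuals are compared with that distribution. The runs statistic and all function names are illustrative choices.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def residuals_of(xs, ys):
    """Residuals of a least-squares line fitted to (xs, ys)."""
    a, b = fit_line(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def count_runs(res):
    """Number of maximal runs of same-sign residuals (exact zeros are skipped)."""
    signs = [r > 0 for r in res if r != 0]
    if not signs:
        return 0
    return 1 + sum(1 for s, t in zip(signs, signs[1:]) if s != t)


def runs_p_value(xs, ys, n_sim=1000, seed=0):
    """Monte Carlo p-value for 'too few sign runs' in the residuals.

    Assumes xs is sorted and the model is a line with i.i.d. Gaussian errors.
    The fitted model is re-run n_sim times with pseudorandom noise to estimate
    the distribution of the runs count under that model.
    """
    rng = random.Random(seed)
    a, b = fit_line(xs, ys)
    res = residuals_of(xs, ys)
    sigma = (sum(r * r for r in res) / (len(xs) - 2)) ** 0.5
    observed = count_runs(res)
    hits = 0
    for _ in range(n_sim):
        sim_ys = [a + b * x + rng.gauss(0.0, sigma) for x in xs]
        if count_runs(residuals_of(xs, sim_ys)) <= observed:
            hits += 1
    return (hits + 1) / (n_sim + 1)


# Hypothetical example: a straight line fitted to curved (quadratic) data.
xs = list(range(30))
ys = [0.1 * (x - 15) ** 2 for x in xs]
print(count_runs(residuals_of(xs, ys)), runs_p_value(xs, ys))
```

Residuals that form only a few long runs, as when a straight line is fitted to curved data, yield a small p-value, suggesting the residuals are not effectively random.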

If the statistical model was obtained via a regression, then regression-residual diagnostics exist and may be used; such diagnostics have been well studied.
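One widely used regression-residual diagnostic is the Durbin-Watson statistic, which checks residuals (in time or input order) for first-order autocorrelation. A minimal sketch follows; the residual sequences in the example are hypothetical.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences of the
    residuals divided by the residual sum of squares. It lies between 0 and 4;
    values near 2 are consistent with no first-order autocorrelation."""
    num = sum((cur - prev) ** 2 for prev, cur in zip(residuals, residuals[1:]))
    den = sum(e * e for e in residuals)
    return num / den


# Hypothetical residual sequences, in time (or x) order.
alternating = [0.5, -0.4, 0.6, -0.5, 0.4, -0.6]
trending = [-0.9, -0.6, -0.2, 0.1, 0.5, 0.8]
print(durbin_watson(alternating), durbin_watson(trending))
```

A value well below 2, as for the steadily rising residuals, indicates positive autocorrelation, which can be a symptom of a misspecified regression function; a value well above 2, as for the alternating residuals, indicates negative autocorrelation.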

License: CC BY-SA 3.0. Source: Statistical model validation, Wikipedia.
