Tolerance interval

A tolerance interval (TI) is a statistical interval within which, with some confidence level, a specified proportion of a sampled population falls. "More specifically, a $100\times p\%/100\times(1-\alpha)$ tolerance interval provides limits within which at least a certain proportion (p) of the population falls with a given level of confidence (1−α)." "A (p, 1−α) tolerance interval (TI) based on a sample is constructed so that it would include at least a proportion p of the sampled population with confidence 1−α; such a TI is usually referred to as p-content − (1−α) coverage TI." "A (p, 1−α) upper tolerance limit (TL) is simply a 1−α upper confidence limit for the 100 p percentile of the population."

Definition

Assume observations or random variates $\mathbf {x} =(x_{1},\ldots ,x_{n})$ are realizations of independent random variables $\mathbf {X} =(X_{1},\ldots ,X_{n})$ with a common distribution $F_{\theta }$ and unknown parameter $\theta$. Then a tolerance interval is an interval with endpoints $(L(\mathbf {x} ),U(\mathbf {x} )]$ that has the defining property:

\inf_{\theta}\left\{\Pr_{\theta}\left(F_{\theta}(U(\mathbf{X}))-F_{\theta}(L(\mathbf{X}))\geq p\right)\right\}=1-\alpha

where $\inf\{\}$ denotes the infimum function.

This is in contrast to a prediction interval with endpoints $[l(\mathbf {x} ),u(\mathbf {x} )]$ which has the defining property:

\inf_{\theta}\left\{\Pr_{\theta}\left(X_{0}\in[l(\mathbf{X}),u(\mathbf{X})]\right)\right\}=1-\alpha

Here, $X_{0}$ is a random variable from the same distribution $F_{\theta }$ but independent of the first $n$ variables.

Note that $X_{0}$ does not appear in the definition of the tolerance interval, which involves only the original sample of size $n$.
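The defining property can be checked numerically. The following is a minimal Monte Carlo sketch, assuming a normal population with known true $\mu$ and $\sigma$ and intervals of the form $\bar{x}\pm ks$, where $\bar{x}$ is the sample mean and $s$ the sample standard deviation; the function `coverage_probability` and its parameters are hypothetical names chosen for illustration. For this location-scale form the distribution of the content $F_{\theta }(U)-F_{\theta }(L)$ does not depend on $\mu$ or $\sigma$, so the infimum over $\theta$ reduces to a single probability, which the code estimates.

```python
import random
import statistics
from statistics import NormalDist


def coverage_probability(k, n, p, trials=20_000, mu=0.0, sigma=1.0, seed=1):
    """Monte Carlo estimate of Pr(F(U) - F(L) >= p) for intervals xbar +/- k*s.

    Assumes i.i.d. normal data with known true mu and sigma, so the content
    F(U) - F(L) of each simulated interval can be computed exactly.
    Requires n >= 2 so that the sample standard deviation is defined.
    """
    rng = random.Random(seed)
    population = NormalDist(mu, sigma)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(sample)
        s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
        lower, upper = xbar - k * s, xbar + k * s
        content = population.cdf(upper) - population.cdf(lower)  # F(U) - F(L)
        if content >= p:
            hits += 1
    return hits / trials


# Naive plug-in interval xbar +/- 1.96 s from a sample of size 10.
print(coverage_probability(1.96, n=10, p=0.95))
```

With the naive factor $k=1.96$ and $n=10$, the estimated probability falls well below 0.95, which is why a tolerance interval needs a larger factor $k$.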

Calculation

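One-sided normal tolerance intervals have an exact solution in terms of the sample mean ${\hat {\mu }}$ and sample standard deviation ${\hat {\sigma }}$: the limit has the form ${\hat {\mu }}+k{\hat {\sigma }}$ (or ${\hat {\mu }}-k{\hat {\sigma }}$), where the factor $k$ is obtained from the noncentral t-distribution. Two-sided normal tolerance intervals, of the form ${\hat {\mu }}\pm k{\hat {\sigma }}$, can be approximated using the chi-squared distribution.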
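For the one-sided case, requiring $\Pr({\hat {\mu }}+k{\hat {\sigma }}\geq \mu +z_{p}\sigma )=1-\alpha$ leads to $k=t_{n-1,1-\alpha }(z_{p}{\sqrt {n}})/{\sqrt {n}}$, where $t_{n-1,1-\alpha }(\delta )$ is the $1-\alpha$ quantile of the noncentral t-distribution with $n-1$ degrees of freedom and noncentrality parameter $\delta$, and $z_{p}$ is the standard normal $p$ quantile. The sketch below computes this factor using only the Python standard library, by numerically integrating the noncentral t CDF and inverting it by bisection. The function names, integration range, and grid size are illustrative assumptions; in practice a library routine such as SciPy's `scipy.stats.nct.ppf` would usually be used instead.

```python
import math
from statistics import NormalDist


def _phi(x):
    """Standard normal CDF."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def _scaled_chi_grid(df, upper=10.0, cells=4000):
    """Midpoint-rule nodes and weights for the density of S = sqrt(V / df),
    where V ~ chi-squared(df). Truncating at `upper` is an assumption."""
    h = upper / cells
    log_const = math.log(2.0) + (df / 2) * math.log(df / 2) - math.lgamma(df / 2)
    grid = []
    for i in range(cells):
        s = (i + 0.5) * h
        log_pdf = log_const + (df - 1) * math.log(s) - df * s * s / 2
        grid.append((s, math.exp(log_pdf) * h))
    return grid


def _nct_cdf(t, delta, grid):
    """P(T <= t) for T = (Z + delta) / S, with Z ~ N(0, 1) independent of S.

    Conditioning on S gives P(T <= t) = E[Phi(t * S - delta)].
    """
    return sum(w * _phi(t * s - delta) for s, w in grid)


def one_sided_k(n, p, confidence):
    """One-sided normal tolerance factor k (n >= 2), so that xbar + k*s is an
    upper tolerance limit covering proportion p with the given confidence."""
    df = n - 1
    delta = NormalDist().inv_cdf(p) * math.sqrt(n)  # noncentrality z_p * sqrt(n)
    grid = _scaled_chi_grid(df)
    lo, hi = -1.0, 1.0
    # Widen the bracket until it contains the `confidence` quantile of T.
    while _nct_cdf(lo, delta, grid) > confidence:
        lo *= 2
    while _nct_cdf(hi, delta, grid) < confidence:
        hi *= 2
    for _ in range(60):  # bisection on the monotone CDF
        mid = (lo + hi) / 2
        if _nct_cdf(mid, delta, grid) < confidence:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2 / math.sqrt(n)


print(round(one_sided_k(10, 0.95, 0.95), 3))
```

Published one-sided tables list $k\approx 2.911$ for $n=10$, $p=0.95$ and 95% confidence, which makes a convenient sanity check. By symmetry, the lower limit uses the same factor as ${\hat {\mu }}-k{\hat {\sigma }}$.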

Relation to other intervals

"In the parameters-known case, a 95% tolerance interval and a 95% prediction interval are the same." If we knew a population's exact parameters, we would be able to compute a range within which a certain proportion of the population falls. For example, if we know a population is normally distributed with mean $\mu$ and standard deviation $\sigma$ , then the interval $\mu \pm 1.96\sigma$ includes 95% of the population (1.96 is the z-score for 95% coverage of a normally distributed population).
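The parameters-known case is plain arithmetic. A short check using Python's standard-library `statistics.NormalDist`, with hypothetical population parameters $\mu =100$ and $\sigma =15$:

```python
from statistics import NormalDist

mu, sigma = 100.0, 15.0  # hypothetical known population parameters
population = NormalDist(mu, sigma)
z = NormalDist().inv_cdf(0.975)  # two-sided 95% z-score, about 1.96
covered = population.cdf(mu + z * sigma) - population.cdf(mu - z * sigma)
print(round(z, 4), round(covered, 4))  # prints: 1.96 0.95
```

The result does not depend on the chosen $\mu$ and $\sigma$; with known parameters no confidence level is needed, because the interval is fixed rather than estimated.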

However, if we have only a sample from the population, we know only the sample mean ${\hat {\mu }}$ and sample standard deviation ${\hat {\sigma }}$, which are only estimates of the true parameters. In that case, ${\hat {\mu }}\pm 1.96{\hat {\sigma }}$ will not necessarily include 95% of the population, because these estimates are subject to sampling variability. A tolerance interval accounts for this variability by introducing a confidence level $\gamma$ (corresponding to $1-\alpha$ in the definition), which is the confidence with which the interval actually includes the specified proportion of the population. For a normally distributed population, a z-score can be transformed into a "k factor" or tolerance factor for a given $\gamma$ via lookup tables or one of several approximation formulas. "As the degrees of freedom approach infinity, the prediction and tolerance intervals become equal."
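One widely used approximation formula for the two-sided k factor is Howe's method, which relies on a chi-squared quantile: $k\approx {\sqrt {(n-1)(1+1/n)\,z_{(1+p)/2}^{2}/\chi _{1-\gamma ,n-1}^{2}}}$, where $\chi _{1-\gamma ,n-1}^{2}$ is the lower $1-\gamma$ quantile of the chi-squared distribution with $n-1$ degrees of freedom. The sketch below is a stdlib-only illustration rather than a reference implementation: the chi-squared CDF uses the power series for the regularized lower incomplete gamma function (adequate for moderate degrees of freedom), and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def chi2_cdf(x, df, max_terms=100_000):
    """Chi-squared CDF via the series for the regularized lower incomplete
    gamma function P(df/2, x/2)."""
    if x <= 0:
        return 0.0
    a, y = df / 2, x / 2
    term = 1.0 / a
    total = term
    for k in range(1, max_terms):
        term *= y / (a + k)
        total += term
        if term < total * 1e-15:
            break
    return math.exp(a * math.log(y) - y - math.lgamma(a)) * total


def chi2_ppf(q, df):
    """Lower-tail chi-squared quantile found by bisection on chi2_cdf."""
    lo, hi = 0.0, max(1.0, 2.0 * df)
    while chi2_cdf(hi, df) < q:
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_cdf(mid, df) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def two_sided_k_howe(n, p, confidence):
    """Howe's approximate two-sided normal tolerance factor (n >= 2)."""
    df = n - 1
    z = NormalDist().inv_cdf((1 + p) / 2)  # z-score for two-sided content p
    chi2_lower = chi2_ppf(1 - confidence, df)
    return math.sqrt(df * (1 + 1 / n) * z * z / chi2_lower)


for n in (10, 100, 1000):
    print(n, round(two_sided_k_howe(n, p=0.95, confidence=0.95), 3))
```

As $n$ grows, the printed factor for $p=\gamma =0.95$ decreases toward $z_{0.975}\approx 1.96$, the parameters-known value, because $(n-1)(1+1/n)/\chi _{1-\gamma ,n-1}^{2}$ tends to 1.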

The tolerance interval is less widely known than the confidence interval and prediction interval, a situation some educators have lamented, as it can lead to misuse of the other intervals where a tolerance interval is more appropriate.

The tolerance interval differs from a confidence interval in that the confidence interval bounds a single-valued population parameter (the mean or the variance, for example) with some confidence, while the tolerance interval bounds the range of data values that includes a specific proportion of the population. Whereas a confidence interval's size is entirely due to sampling error, and will approach a zero-width interval at the true population parameter as sample size increases, a tolerance interval's size is due partly to sampling error and partly to actual variance in the population, and will approach the population's probability interval as sample size increases.

The tolerance interval is related to a prediction interval in that both put bounds on variation in future samples. However, the prediction interval only bounds a single future sample, whereas a tolerance interval bounds the entire population (equivalently, an arbitrary sequence of future samples). In other words, a prediction interval covers a specified proportion of a population on average, whereas a tolerance interval covers it with a certain confidence level, making the tolerance interval more appropriate if a single interval is intended to bound multiple future samples.

Examples

gives the following example:

Another example is given by:
