In probability theory and statistics, the Dirichlet process (DP) is one of the most popular Bayesian nonparametric models. It was introduced by Thomas Ferguson as a prior over probability distributions.

A Dirichlet process is completely defined by its parameters: the *base distribution* (or *base measure*) $G_0$, which is an arbitrary probability distribution, and the *concentration parameter* $s>0$, which is a positive real number (it is often denoted as $\alpha$). A prior $\mathrm{DP}(s, G_0)$ over the unknown distribution $F$ expresses prior beliefs through $G_0$, the prior mean of $F$, and $s$, the strength of those beliefs.

The question is: how should we choose the prior parameters $(s, G_0)$ of the DP, in particular the infinite-dimensional base measure $G_0$, in case of lack of prior information?

To address this issue, the only prior that has been proposed so far is the limiting DP obtained for $s \rightarrow 0$, which coincides with the Bayesian bootstrap of Rubin. It can be verified, however, that this limiting prior is not a model of ignorance: a posteriori it assigns zero probability to any set that does not include the observations.

The imprecise Dirichlet process has been proposed to overcome these issues. The basic idea is to fix the prior strength $s>0$ and, instead of committing to a single base measure, to let $G_0$ vary in the set of all probability measures.

More precisely, the **imprecise Dirichlet process** (IDP) is defined as the following set of Dirichlet processes:

$$\{\mathrm{DP}(s, G_0) : G_0 \in \mathbb{P}\},$$

where $\mathbb{P}$ is the set of all probability measures on the space of observations: the base measure is free to span all of $\mathbb{P}$, while the prior strength $s$ is fixed.

## Inferences with the Imprecise Dirichlet Process

Let $F \sim \mathrm{DP}(s, G_0)$ and let $X_1, \dots, X_n$ be an independent and identically distributed sample from $F$.

One of the most remarkable properties of the DP priors is conjugacy: the posterior distribution of $F$ is again a Dirichlet process,

$$F \mid X_1,\dots,X_n \sim \mathrm{DP}\!\left(s+n,\; \frac{s}{s+n}\,G_0 + \frac{n}{s+n}\,\hat F_n\right),$$

where $\hat F_n = \frac{1}{n}\sum_{i=1}^n \delta_{X_i}$ is the empirical distribution of the observations ($\delta_{X_i}$ denotes the atomic measure centered at $X_i$).
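
As a quick illustration of the conjugacy property, the posterior mean of $F(x)$ is the $(s,n)$-weighted mixture $\frac{s}{s+n}G_0(x) + \frac{n}{s+n}\hat F_n(x)$. A minimal Python sketch (the function name is invented here, and the uniform base CDF in the example is only an arbitrary choice):

```python
def dp_posterior_mean_cdf(x, data, s, g0_cdf):
    """Posterior mean of F(x) under a DP(s, G0) prior: the (s, n)-weighted
    mixture of the base CDF and the empirical CDF of the data."""
    n = len(data)
    ecdf = sum(1 for xi in data if xi <= x) / n  # empirical CDF at x
    return (s * g0_cdf(x) + n * ecdf) / (s + n)


# Example with a uniform-on-[0,1] base measure (an arbitrary choice)
g0 = lambda x: min(max(x, 0.0), 1.0)
print(dp_posterior_mean_cdf(0.6, [0.2, 0.4, 0.6, 0.8], s=1.0, g0_cdf=g0))
```

With four observations the empirical CDF already dominates the base measure in the mixture, reflecting how the DP learns from data.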

In the **IDP**, the base measure $G_0$ is free to span the set $\mathbb{P}$ of all probability measures. A natural way of characterizing inferences with the **IDP** is by computing lower and upper bounds for the expectation of a real-valued bounded function $f$. A priori these bounds are

$$\underline{E}[E(f)] = \inf f, \qquad \overline{E}[E(f)] = \sup f;$$

the lower (upper) bound is obtained by a probability measure that puts all the mass on the infimum (supremum) of $f$. The prior range of $E(f)$ under the **IDP** is the same as the original range of $f$: the **IDP** imposes no constraint on the expectation a priori, and is therefore a model of prior (near)-ignorance for $E(f)$.

A posteriori, the **IDP** can learn from data. The posterior lower and upper bounds for the expectation of $f$ are

$$\underline{E}[E(f)\mid X_{1:n}] = \frac{n}{s+n}\,\frac{1}{n}\sum_{i=1}^n f(X_i) + \frac{s}{s+n}\,\inf f,$$

$$\overline{E}[E(f)\mid X_{1:n}] = \frac{n}{s+n}\,\frac{1}{n}\sum_{i=1}^n f(X_i) + \frac{s}{s+n}\,\sup f.$$

It can be observed that the posterior inferences do not depend on $G_0$: the only parameter left in the model is the prior strength $s$. This explains the adjective *near* in prior near-ignorance, because the IDP requires the modeller to elicit one parameter. However, this is a simple elicitation problem for a nonparametric prior, since only the value of a positive scalar has to be chosen (there are not infinitely many parameters left in the IDP model).
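
The posterior bounds above can be sketched in a few lines of Python (a minimal illustration, not code from the original treatment; the function name `idp_mean_bounds` is invented here):

```python
def idp_mean_bounds(f_values, f_inf, f_sup, s=1.0):
    """Posterior lower/upper bounds on E(f) under the IDP.

    f_values: f evaluated at the observations X_1..X_n
    f_inf, f_sup: infimum and supremum of f over its domain
    s: prior strength of the IDP
    """
    n = len(f_values)
    emp_mean = sum(f_values) / n  # empirical mean of f
    lower = (n * emp_mean + s * f_inf) / (s + n)
    upper = (n * emp_mean + s * f_sup) / (s + n)
    return lower, upper


# Example: f bounded in [0, 1], four observations with f(X_i) = 0.5;
# the interval is centred on the empirical mean and shrinks as n grows
print(idp_mean_bounds([0.5, 0.5, 0.5, 0.5], f_inf=0.0, f_sup=1.0, s=1.0))
```

Note how the bounds are the empirical mean pulled toward $\inf f$ and $\sup f$ with weight $s/(s+n)$, exactly mirroring the formulas above.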

Finally, observe that, for $n \rightarrow \infty$, the lower and upper posterior expectations both converge to the sample mean $\frac{1}{n}\sum_{i=1}^n f(X_i)$, since the gap between the bounds, $\frac{s}{s+n}(\sup f - \inf f)$, vanishes as the sample size grows: inferences with the IDP are consistent.

## Choice of the prior strength s

The IDP is completely specified by the prior strength $s$, which is the only parameter left in the prior model. The value of $s$ determines how quickly the lower and upper posterior expectations converge as the number of observations increases: larger values of $s$ produce wider, more cautious bounds, and $s$ can also be chosen so that the resulting inferences encompass well-known frequentist procedures (see the median test below).
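
Since $f$ enters the posterior bounds only through its sample mean, infimum and supremum, the width of the posterior interval is $\frac{s}{s+n}(\sup f - \inf f)$. A small numeric sketch (helper name invented here) of how this width behaves:

```python
def interval_width(s, n, f_range=1.0):
    """Gap between the upper and lower posterior expectations of E(f)
    for a function f with range f_range (= sup f - inf f)."""
    return s / (s + n) * f_range


# Larger s -> wider (more cautious) bounds; more data -> narrower bounds
for s in (1, 4):
    for n in (10, 100, 1000):
        print(f"s={s}, n={n}: width={interval_width(s, n):.4f}")
```

The width decays at rate $1/n$ for any fixed $s$, which is the quantitative content of the consistency remark above.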

## Example: estimate of the cumulative distribution

Let $X_1,\dots,X_n$ be independent and identically distributed observations from an unknown distribution $F$; the goal is to estimate the cumulative distribution $F(x) = P(X \le x)$.

Since $F(x) = E\!\left[\mathbb{I}_{(-\infty,x]}\right]$ is the expectation of the indicator function $\mathbb{I}_{(-\infty,x]}$, whose infimum is $0$ and supremum is $1$, the posterior bounds for the expectation give

$$\underline{F}(x \mid X_{1:n}) = \frac{n\,\hat F_n(x)}{s+n}, \qquad \overline{F}(x \mid X_{1:n}) = \frac{s + n\,\hat F_n(x)}{s+n},$$

where $\hat F_n(x) = \frac{1}{n}\sum_{i=1}^n \mathbb{I}_{(-\infty,x]}(X_i)$ is the empirical cumulative distribution.

Note that, for any precise choice of $G_0$, the posterior estimate of $F$ returned by the corresponding Dirichlet process lies between these lower and upper bounds, which can therefore be read as a robust band around the empirical estimate.
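
The lower and upper posterior CDFs are straightforward to compute; a minimal Python sketch (the function name `idp_cdf_bounds` is invented for this illustration):

```python
def idp_cdf_bounds(data, x, s=1.0):
    """Posterior lower/upper bounds on F(x) under the IDP:
    the empirical CDF shrunk toward 0 (lower) and toward 1 (upper)."""
    n = len(data)
    ecdf = sum(1 for xi in data if xi <= x) / n  # empirical CDF at x
    lower = n * ecdf / (s + n)
    upper = (s + n * ecdf) / (s + n)
    return lower, upper


# Example: four observations, bounds on F(2) with prior strength s = 1
print(idp_cdf_bounds([1, 2, 3, 4], x=2, s=1.0))
```

The distance between the two curves is constant in $x$ and equal to $s/(s+n)$, so the band tightens uniformly as more data arrive.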

## Example: median test

IDP can also be used for hypothesis testing, for instance to test the hypothesis $F(0) < 0.5$, i.e., that the median of $F$ is greater than zero. Under a Dirichlet process prior, the posterior distribution of $F(0)$ is a Beta distribution:

$$F(0) \mid X_{1:n} \sim \mathrm{Beta}\!\left(s\,G_0((-\infty,0]) + n_{\le 0},\; s\,G_0((0,\infty)) + n_{>0}\right),$$

where $n_{\le 0}$ and $n_{>0}$ denote the number of observations that are, respectively, less than or equal to zero and greater than zero.

By exploiting this property, it follows that the posterior lower and upper probabilities of the hypothesis under the IDP are

$$\underline{P}(F(0)<0.5 \mid X_{1:n}) = I_{1/2}\!\left(s+n_{\le 0},\, n_{>0}\right), \qquad \overline{P}(F(0)<0.5 \mid X_{1:n}) = I_{1/2}\!\left(n_{\le 0},\, s+n_{>0}\right),$$

where $I_x(a,b)$ denotes the regularized incomplete beta function, i.e., the cumulative distribution function of the Beta distribution evaluated at $x$. The test at level $1-\gamma$ then compares both probabilities with the threshold $1-\gamma$:

- if both the inequalities are satisfied, we can declare that $F(0) < 0.5$ with probability larger than $1-\gamma$;
- if only one of the inequalities is satisfied (which is necessarily the one for the upper probability), we are in an indeterminate situation, i.e., we cannot decide;
- if neither is satisfied, we can declare that the probability that $F(0) < 0.5$ is lower than the desired level $1-\gamma$.

IDP returns an indeterminate decision when the decision is prior dependent, that is, when the answer would change with the choice of the base measure $G_0$.

By exploiting the relationship between the cumulative distribution function of the Beta distribution and the cumulative distribution function of a random variable $Z$ from a binomial distribution, where the "probability of success" is $p$ and the sample size is $n$:

$$P(Z \ge k) = I_p(k,\, n-k+1),$$

we can show that the median test derived with the IDP for any choice of $s$ encompasses the frequentist one-sided sign test.
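
As an illustration of the decision rule above (a sketch, not code from the original treatment: the function names are invented, and $s$ is assumed to be a positive integer so that the Beta CDF at $1/2$ can be evaluated exactly through the binomial identity $P(\mathrm{Beta}(a,b) \le x) = P(\mathrm{Bin}(a+b-1, x) \ge a)$):

```python
import math

def beta_cdf_half(a, b):
    """P(Beta(a, b) < 1/2) for integer a, b >= 0, via the identity
    P(Beta(a, b) <= x) = P(Bin(a+b-1, x) >= a)."""
    if a == 0:   # degenerate Beta concentrated at 0
        return 1.0
    if b == 0:   # degenerate Beta concentrated at 1
        return 0.0
    m = a + b - 1
    return sum(math.comb(m, k) for k in range(a, m + 1)) / 2 ** m

def idp_median_test(data, s=1, gamma=0.05):
    """IDP test of the hypothesis F(0) < 0.5, i.e. median > 0."""
    n_le = sum(1 for x in data if x <= 0)
    n_gt = len(data) - n_le
    lower = beta_cdf_half(s + n_le, n_gt)  # base measure mass on (-inf, 0]
    upper = beta_cdf_half(n_le, s + n_gt)  # base measure mass on (0, inf)
    if lower > 1 - gamma:
        decision = "declare F(0) < 0.5"
    elif upper > 1 - gamma:
        decision = "indeterminate"
    else:
        decision = "cannot declare F(0) < 0.5"
    return lower, upper, decision


# Nine positive and one non-positive observation: both bounds exceed 0.95
print(idp_median_test([0.7] * 9 + [-0.2], s=1))
```

With fewer observations the upper probability can exceed the threshold while the lower does not, in which case the function returns the indeterminate outcome discussed above.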

## Applications of the Imprecise Dirichlet Process

Dirichlet processes are frequently used in Bayesian nonparametric statistics. The Imprecise Dirichlet Process can be employed instead of the Dirichlet process in any application in which prior information is lacking and it is therefore important to model this state of prior ignorance.

In this respect, the Imprecise Dirichlet Process has been used for nonparametric hypothesis testing, see the Imprecise Dirichlet Process statistical package. Based on the Imprecise Dirichlet Process, Bayesian nonparametric near-ignorance versions of the following classical nonparametric tests have been derived: the Wilcoxon rank sum test and the Wilcoxon signed-rank test.

A Bayesian nonparametric near-ignorance model presents several advantages with respect to a traditional approach to hypothesis testing.

- The Bayesian approach allows us to formulate the hypothesis test as a decision problem. This means that we can verify the evidence in favor of the null hypothesis, not only reject it, and take decisions which minimize the expected loss.
- Because of the nonparametric prior near-ignorance, IDP-based tests allow us to start the hypothesis test with very weak prior assumptions, much in the direction of letting the data speak for themselves.
- Although the IDP test shares several similarities with a standard Bayesian approach, it embodies a significant change of paradigm when it comes to taking decisions. IDP-based tests have the advantage of producing an indeterminate outcome when the decision is prior-dependent. In other words, the IDP test suspends judgment when the option that minimizes the expected loss changes depending on which Dirichlet process base measure we focus on.
- It has been empirically verified that when the IDP test is indeterminate, frequentist tests virtually behave as random guessers. This surprising result has practical consequences in hypothesis testing. Assume that we are trying to compare the effects of two medical treatments (is Y better than X?) and that, given the available data, the IDP test is indeterminate. In such a situation the frequentist test always issues a determinate response (for instance, "Y is better than X"), but it turns out that its response is completely random, as if we were tossing a coin. By contrast, the IDP test acknowledges the impossibility of making a decision in these cases. Thus, by saying "I do not know", the IDP test provides richer information to the analyst, who could for instance use it to decide to collect more data.

## Categorical variables

For categorical variables, i.e., when the unknown distribution has support on a finite set of elements, the Dirichlet process reduces to a Dirichlet distribution; in this case the Imprecise Dirichlet Process reduces to the imprecise Dirichlet model proposed by Walley as a model of prior (near)-ignorance for categorical data.