## Quick overview:

We show how to obtain the *Chinese Restaurant Process* (CRP) as the limit of a family of finite Dirichlet mixtures.
But before we start I would like to point the reader to a few references that I enjoyed and found useful.

- D. Aldous, Exchangeability and related topics (1983).
- C. Robert, G. Casella,
*Monte Carlo Statistical Methods*, Technometrics, 42 (4), pp. 430-431 - R. Neal,
*Markov Chain Sampling Methods for Dirichlet Process Mixture Models*, Journal of Computational and Graphical Statistics Vol. 9, No. 2 (2000), pp. 249-265. - T. Broderick, https://people.csail.mit.edu/tbroderick/tutorials.html

## Dirichlet Mixtures

Choose a *concentration parameter* $\alpha > 0$ and consider the following generative process for seating (or cluster) assignments $c_1,\ldots, c_t$ of $t$ customers to a fixed number of $k$ tables (or clusters) identified with $1,\ldots,k$ (I am using the terms *clusters* and *tables* interchangably, the same holds for *cluster-* and *seating assignments*):

By marginalizing over the mixture probabilities $p_i$ we compute the posterior

\[\tag{DIR} \begin{eqnarray*} p(c_{t} = i \mid c_1,\ldots,c_{t-1}; \alpha, k) = ... = \frac{\tfrac{\alpha}{k} + n_{i,t-1}}{\alpha + t - 1}, \end{eqnarray*}\]where $n_{i,t}$ denotes the size of cluster $i$ at time $t$, that is \(n_{i,t} = \vert\{ c_s : c_s = i, s\leq t \}\vert\).

## CRP as Limit of Dirichlet Mixtures

As we will see below when we consider the limit $k\to \infty$ it will be helpful to group the outcomes of chosing *any* of the empty clusters (i.e. choosing $c_t = i$ with $n_{i,t-1}=0$ or put differently $c_t \neq c_s, \text{ for all } s < t$) into a single event that marks the start of a new cluster. If we denote by $k_t$ the number of non-empty clusters up at time $t$ (after sampling $c_t$) the number of
empty clusters (those with $n_{c,t-1}=0$) is obviously $k - k_{t-1}$ and
we have

In the limit $k \to \infty$ this yields

\[\tag{CRP} p(c_{t} | c_1,\ldots,c_{t-1}; \alpha) \propto \begin{cases} n_{c_s,t-1} & \text{if $c_t = c_s$ for some $ s < t$}, \\ \alpha & \text{if $c_{t} \neq c_s$ for all $s < t$}. \end{cases}\]The normalizing constant in the equation above is \(\alpha + t-1\).
For the lack of a better term I call this probability the **CRP kernel** for the remainder of this tutorial. With this in hand we can now define a distribution over table assignments called the **Chinese Restauran Process (CRP)**

Although the CRP is defined sequentially, the order the customers sit down at their respective tables does not change the probablity of asingment — this corresponds to the exchangability of the process. This follows from rewriting the probability in terms of the table sizes $n_i$ which do not contain any sequential information: It is not hard to see that we have

\[p(c_1,\ldots,c_T \mid \alpha) = \frac{1}{\alpha \cdot (\alpha +1) \cdots (\alpha + T -1)} \cdot \alpha^{k} \cdot \prod_{i=1}^{k} (n_i -1)!.\]