The Chinese Restaurant Process - Derivation from finite Dirichlet mixtures

— 1 September 2022

Quick overview:

We show how to obtain the Chinese Restaurant Process (CRP) as the limit of a family of finite Dirichlet mixtures. But before we start I would like to point the reader to a few references that I enjoyed and found useful.

Dirichlet Mixtures

Choose a concentration parameter $\alpha > 0$ and consider the following generative process for seating (or cluster) assignments $c_1,\ldots, c_t$ of $t$ customers to a fixed number of $k$ tables (or clusters) identified with $1,\ldots,k$ (I am using the terms clusters and tables interchangably, the same holds for cluster- and seating assignments):

$\begin{eqnarray*} p = (p_1,\ldots,p_k) &\sim& \text{Dirichlet} (\tfrac{\alpha}{k}, \ldots, \tfrac{\alpha}{k}) \\ c_1, \ldots, c_t &\sim& \text{Categorical}(p_1,\ldots,p_k). \end{eqnarray*}$

By marginalizing over the mixture probabilities $p_i$ we compute the posterior

$\tag{DIR} \begin{eqnarray*} p(c_{t} = i \mid c_1,\ldots,c_{t-1}; \alpha, k) = ... = \frac{\tfrac{\alpha}{k} + n_{i,t-1}}{\alpha + t - 1}, \end{eqnarray*}$

where $n_{i,t}$ denotes the size of cluster $i$ at time $t$, that is $n_{i,t} = \vert\{ c_s : c_s = i, s\leq t \}\vert$.

Cf. my handwritten notes for the details.

CRP as Limit of Dirichlet Mixtures

As we will see below when we consider the limit $k\to \infty$ it will be helpful to group the outcomes of chosing any of the empty clusters (i.e. choosing $c_t = i$ with $n_{i,t-1}=0$ or put differently $c_t \neq c_s, \text{ for all } s < t$) into a single event that marks the start of a new cluster. If we denote by $k_t$ the number of non-empty clusters up at time $t$ (after sampling $c_t$) the number of empty clusters (those with $n_{c,t-1}=0$) is obviously $k - k_{t-1}$ and we have

$\begin{eqnarray*} p(c_{t} | c_1,\ldots,c_{t-1}; \alpha, k) \propto \begin{cases} \tfrac{\alpha}{k} + n_{c_s,t-1} & \text{if c_t = c_s for some  s < t,} \\ \tfrac{\alpha}{k} (k - k_{t-1}) & \text{if c_{t} \neq c_s for all s < t}. \end{cases} \end{eqnarray*}$

In the limit $k \to \infty$ this yields

$\tag{CRP} p(c_{t} | c_1,\ldots,c_{t-1}; \alpha) \propto \begin{cases} n_{c_s,t-1} & \text{if c_t = c_s for some  s < t}, \\ \alpha & \text{if c_{t} \neq c_s for all s < t}. \end{cases}$

The normalizing constant in the equation above is $\alpha + t-1$. For the lack of a better term I call this probability the CRP kernel for the remainder of this tutorial. With this in hand we can now define a distribution over table assignments called the Chinese Restauran Process (CRP)

$\label{eq:crp} \tag{CRP'} p(c_1,\ldots,c_T \mid \alpha) := \prod_{t=1}^T p(c_t \mid c_{t-1}, \ldots, c_{1}; \alpha).$

Although the CRP is defined sequentially, the order the customers sit down at their respective tables does not change the probablity of asingment — this corresponds to the exchangability of the process. This follows from rewriting the probability in terms of the table sizes $n_i$ which do not contain any sequential information: It is not hard to see that we have

$p(c_1,\ldots,c_T \mid \alpha) = \frac{1}{\alpha \cdot (\alpha +1) \cdots (\alpha + T -1)} \cdot \alpha^{k} \cdot \prod_{i=1}^{k} (n_i -1)!.$
The first term consists of the normalizing constants for each term in Eq. CRP'. The second term are the probabilities of starting each of the $k$ clusters, and the last term consists of all probabilities for the remaining $n_i - 1$ elements in each cluster $i$.