# Cereal box prizes and transition matrices

If you don't know what a transition matrix is, you might want to read the transition matrix post before reading this one.

Transition matrices can be used to solve some classic probability problems. For example, consider the following problem:

Suppose in each cereal box you buy there is one number in the set $\{1,2,3,4,5\}$. You get a prize if you collect all five numbers. What is the expected number of boxes you have to buy (or steal) before you get all five numbers?

I found this problem in Frederick Mosteller's book 'Fifty Challenging Problems in Probability with Solutions'. I had to think about this problem for a few minutes, and you should too before going on.

Thought about it? Great. This is the solution I came up with: with the first box, you get some number. After this, you need on average $5/4$ boxes to get the second number. After you get two numbers, you need on average $5/3$ boxes to get the third number, and so forth. Thus, the expected number of boxes to win the prize is
$$1 + 5/4 + 5/3 + 5/2 + 5/1 = 137/12.$$
It turns out that this is the same as Mosteller's solution.

But, I thought this might be a great problem to illustrate transition matrices and Markov processes. Because if you think about it, this is actually a Markov process with six states. Or, if you prefer five states after the first box: you start with one number, and then the probability of proceeding to state two from state one is 4/5. But, the probability of stating in state one with one number is 1/5, and so on. The transition matrix is as follows:
$$\begin{pmatrix}\frac{1}{5} & \frac{4}{5} & 0 & 0 & 0 \\ 0 & \frac{2}{5} & \frac{3}{5} & 0 & 0 \\ 0 & 0 & \frac{3}{5} & \frac{2}{5} & 0 \\ 0 & 0 & 0 & \frac{4}{5} & \frac{1}{5} \\ 0 & 0 & 0 & 0 & 1\end{pmatrix}$$
Remember, with the convention of five states, we are starting out already with one cereal box. The good thing is that because this matrix is upper triangular, the eigenvalues are just the diagonal entries: since they are all distinct, this matrix is definitely diagonalizable.

As you can tell, this solution is not nearly as slick and short as the first one. The advantage of it is that it gives you all sorts of closed formulas for various probabilities of reaching various states! For example, if you actually go through the work of diagonalizing the transition matrix, you can get an expression for the probability of being in state four in $n-1$ moves. Then you can multiply this by $1/5$ to get the probability of reaching state five in exactly $n$ moves. You can't just use the 1,5-entry of the $n$th power of the transition matrix because that's the probability of ending up in state 5 after $n$ moves, not the probability of needing $n$ moves to end up in state 5. Get the difference?

If you actually go through this rigmarole, you'll get that the probability of needing $n$ moves to get all five numbers is:
$$P(n) = 5/4(4/5)^n – 20/3(3/5)^n + 15(2/5)^n – 20(1/5)^n$$
for $n \geq 5$. For example, $P(15)$ is about 0.0408620 (in a simulation of this situation that ran a hundred thousand times, 15 boxes occurred 4092 times). Incidentally, I believe this formula would be rather tedious to try and work out without transition matrices!

Now you can use this to figure out the expected number of boxes needed to get the prize, which is $\sum_{k=5}^\infty kP(k)$. The good news is this is just a bunch of manipulations with geometric series. It still takes a bit of work. To simplify this work I made Sage do some of it (here the function 'exval' just evaluates the sum $\sum_{k=5}^\infty kr^k$):

Well, there you have it!