Imagine $n$ states of a system in a discrete-time stochastic system. For each pair of states $i$ and $j$, there is a probability $p_{ij}$ of moving to state $j$ in the next time step, given that the system is in state $i$. Each of these probabilities can be put in a matrix, known as the **transition matrix**.

Let’s take an example. Such a discrete time system can be easily represented by a graph (in the sense of graph theory). Here is one:

(Incidentally, this is not how I would draw this graph by hand, but not having to do so by using Graphviz is worth not drawing it exactly the way I want it.) In this system, there are two states: in each of these states, there is a 1/2 probability of leaving the state and a 1/2 probability of remaining in it. It is a system with a lot of symmetry and whose transition matrix is

$$T_1 = \begin{pmatrix}1/2 & 1/2\\1/2 & 1/2\end{pmatrix}$$

The cool thing about a transition matrix is that you can raise it to some power. What does this mean? If $T$ is a transition matrix, then it follows from the definition of matrix multiplication and induction that the $i,j$ entry is the probabiltiy of moving to state $j$ given that the system is in state $i$. But it follows pretty much from the definition of matrix multiplication that the $i,j$-entry of $T^2$ is the probability from moving to state $j$ in **two steps** from state $i$. In general, the $i,j$ entry of $T^n$ is the probability of moving to state $j$ in **$n$ steps** starting from state $i$.

Note that a transition matrix has the property that the sum of each row is one. This property obviously characterises transition matrices, and by our discussion we see that any power of a transition matrix is itself a transition matrix, because from any state, *you have to go somewhere*. The transition matrix $T_1$ with every entry 1/2 satisfies $T_1^n = T_1$ for any positive natural number $n$. It’s an idempotent.

## What’s the point of transition matrices?

The transition matrix $T$ is a concise way to represent a discrete-time system that moves between a finite number of states with given probabilities (such a process is called a **discrete-time Markov process** in case you want to look up more stuff about it). But it also gives you a way to figure out the long-term properties of a system. If you can figure out what happens to $T^n$ when $n\to\infty$, then that will tell you in the long run, how much time is spent in each state! If we take the matrix

$$T_1 = \begin{pmatrix}1/2 & 1/2\\1/2 & 1/2\end{pmatrix}$$

in our earlier example, then we already know that $\lim_{n\to\infty} T_1^n = T_1$ because $T_1$ is an idempotent matrix. But this is just a special system. Let’s consider another:

Now, the transition matrix is

$$T_2 = \begin{pmatrix}2/3 & 1/3\\1/3 & 2/3\end{pmatrix}$$

How does $T_2^n$ look in the long run? You could just compute powers of $T_2$ to get an approximate answer. But it’s easier to see the exact answer by using diagonalisation of matrices. In this case $T_2$ has two eigenvalues: $1$ and $1/3$. Thus, $T_2$ can be diagonalised to a matrix $D = {\rm diag}(1,1/3)$. (Here I am using the notation ${\rm diag}(a_1,\dots,a_n)$ for a diagonal matrix whose $k,k$-entry is $a_k$.) That is, there exists a matrix $P$ such that

$$PDP^{-1} = T_2.$$

Because the eigenvalues of $T_2$ are in the interval $[0,1]$, the limit $\lim_{n\to\infty}T_2^n$ exists and is equal to

$$P(\lim_{n\to\infty} D^n)P^{-1}.$$

Of course, it is really easy to calculate $\lim_{n\to\infty} D^n$. Just replace any number less than one appearing in $D$ with a zero. I’ll leave it as an exercise to calculate $P$, but if you do that, you get that

$$\lim_{n\to\infty} T_2 = \begin{pmatrix} 1/2 & 1/2 \\ 1/2 & 1/2\end{pmatrix}$$

So, in the long run, the average time in state one is the same as the average time in state two, even though in the short term by watching these two systems they would behave differently. That’s not really surprising though because both states in the system are the same with respect to the probabilities of transitioning to all the other states.

## A slightly more complicated example

Here is another example:

What is the corresponding transition matrix? It is

$$T_3 = \begin{pmatrix} 1/2 & 1/4 & 1/4 \\ 1/4 & 2/3 & 1/12\\ 1/8 & 1/8 & 1/4\end{pmatrix}$$

For this matrix and other systems you typically encounter in the real world, the eigenvalues might not be very pretty. But it is still a great idea to use diagonalization as we did before because computing powers of a matrix naively takes a lot more work than using diagonalisation.

Again, this matrix $T_3$ can be diagonalised and $\lim_{n\to\infty} T^n$ can be computed exactly. It is in fact

$$\lim_{n\to\infty} T_3^n = \begin{pmatrix}\frac{7}{26} & \frac{9}{26} & \frac{5}{13} \\

\frac{7}{26} & \frac{9}{26} & \frac{5}{13} \\

\frac{7}{26} & \frac{9}{26} & \frac{5}{13}\end{pmatrix}$$

These probabilities are definitely not obvious from our intial system. We also revealed some interesting properties. For example, consider state 3. The least amount of time is spent in state 3 even though it’s the hardest to leave it!

At this point you should consider some questions about transition matrices:

- Can all transition matrices be diagonalised?
- Do the transition matrices which have limits have all rows the same?
- Which real-world systems can be modeled by discrete-time systems with a finite number of states?