In the post Binomial distribution: mean and variance, I proved that if $X$ is a binomial random variable with parameters $n$ (trials) and $p$ (probability of success), then the variance of $X$ is $np(1-p)$. If you look back, you'll notice that my proof was by induction. You might ask: why would I do that? It's certainly one of the most roundabout and lengthy proofs of this fairly simple fact. Well, I think it's an interesting proof. Today, however, let's look at some shorter ones.

One shorter proof is to recall that I originally defined $X$ as

$$X = X_1 + \cdots + X_n$$ where $X_1,\dots, X_n$ are independent and identically distributed Bernoulli random variables; that is, random variables that take the value 1 with probability $p$ and the value 0 with probability $1-p$. I used this fact to calculate the mean. Well, if you square this expression and calculate the expectation, it already provides a much shorter calculation of $E(X^2)$, which is the main ingredient in showing that the variance of $X$ is $np(1-p)$.
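To sketch that shorter calculation: since $X_i^2 = X_i$ (so $E(X_i^2) = p$) and independence gives $E(X_iX_j) = p^2$ for $i \neq j$, squaring the sum yields

$$E(X^2) = \sum_{i=1}^n E(X_i^2) + \sum_{i\neq j} E(X_iX_j) = np + n(n-1)p^2,$$ and therefore the variance is $np + n(n-1)p^2 - (np)^2 = np(1-p)$.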

But there's another proof that I like even better, and it uses generating functions. Besides liking the use of generating functions, I am also introducing them because I will need them for more advanced material on stochastic processes that I will talk about in the near future.

Let's recall the probability mass function of $X$. It's given by the formula

$$P(X=k) = \binom{n}{k}p^{k}(1-p)^{n-k}.$$ Since the variance of $X$ can be calculated by $E(X^2) - E(X)^2$ and we already know that $E(X) = np$, the key value that we need to compute is $E(X^2)$. This value by definition is the sum

$$E(X^2) = \sum_{k=0}^n k^2\binom{n}{k}p^k(1-p)^{n-k}.$$ The tricky part of this sum is the $k^2$ term. There is a trick to evaluate a sum like this, and that is to consider the function

$$m(t) = \sum_{k=0}^n e^{tk}\binom{n}{k}p^k(1-p)^{n-k}$$ of the real variable $t$. We make two observations about the function $m(t)$. First, it is easier to evaluate than the sum for $E(X^2)$, as we will see soon. Second, the second derivative of this function evaluated at $t=0$ gives $E(X^2)$. In general, if $E(e^{tX})$ exists in some interval around zero, then its $n$th derivative evaluated at $t=0$ gives $E(X^n)$—also known as the $n$th moment. For this reason, $E(e^{tX})$ is called the moment generating function of the random variable $X$.
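As a quick sanity check, here is a minimal Python sketch (with hypothetical example values $n=10$, $p=0.3$) that evaluates the defining sum for $m(t)$ directly and approximates its derivatives at $t=0$ by finite differences; they should recover the moments $E(X) = np$ and $E(X^2)$:

```python
from math import comb, exp

# Moment generating function of Binomial(n, p), evaluated directly
# from its definition as a sum over the pmf.
# Example values n = 10, p = 0.3 are chosen purely for illustration.
def m(t, n=10, p=0.3):
    return sum(exp(t * k) * comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n + 1))

# Approximate the first two derivatives at t = 0 with central
# finite differences.
h = 1e-5
m1 = (m(h) - m(-h)) / (2 * h)          # ≈ E(X)   = np
m2 = (m(h) - 2 * m(0) + m(-h)) / h**2  # ≈ E(X^2) = n²p² + np(1-p)

print(m1, m2, m2 - m1**2)  # expect ≈ 3.0, 11.1, and the variance 2.1
```

The finite differences agree with the closed-form moments to several decimal places, which is all we need to trust the derivative computations below.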

Now, let us get onto evaluating the sum expression for $m(t)$. We have

$$m(t) = \sum_{k=0}^n \binom{n}{k}(pe^t)^k(1-p)^{n-k}.$$ But by the binomial theorem, this is just the expansion of $(pe^t + 1 - p)^n$, and this is a function much easier to work with. Its first derivative is

$$m'(t) = n(pe^t + 1 - p)^{n-1}pe^t,$$ which gives $E(X) = m'(0) = np$. So far so good. Now, calculate the second derivative as

$$m''(t) = n(n-1)(pe^t + 1-p)^{n-2}p^2e^{2t} + n(pe^t + 1 - p)^{n-1}pe^t.$$ This gives us the second moment

$$\begin{align*}E(X^2) &= m''(0) = n(n-1)p^2 + np\\
&= n^2p^2 + np(1-p).
\end{align*}$$ Therefore, the variance of $X$ is $E(X^2) - E(X)^2 = np(1-p).$ To me this seems like the best derivation of the variance of the binomial distribution because it avoids any fiddling with binomial coefficients and yields a function that gives you all the higher moments $E(X^n)$ should you need them.
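If you want to double-check the result numerically, here is a small simulation sketch (again with assumed values $n = 10$, $p = 0.3$) that builds $X$ as a sum of Bernoulli trials, just as in the first proof, and compares the sample mean and variance to $np$ and $np(1-p)$:

```python
import random

random.seed(0)  # fixed seed so the run is reproducible
n, p, N = 10, 0.3, 200_000  # example parameters and sample size

# Each sample of X is a sum of n independent Bernoulli(p) trials.
samples = [sum(random.random() < p for _ in range(n)) for _ in range(N)]

mean = sum(samples) / N
var = sum((x - mean) ** 2 for x in samples) / N

print(mean, var)  # expect ≈ np = 3.0 and np(1-p) = 2.1
```

With 200,000 samples the estimates land within a few hundredths of the theoretical values, matching the derivation above.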