When we observe data from a distribution of known form but with unknown parameters, our primary aim is to estimate those parameters. If the distribution is uniform on $[0,\theta]$ with $\theta$ unknown, we have already looked at two methods for estimating $\theta$ from $n$ i.i.d. observations $x_1,\dots,x_n$:

- Maximum likelihood, which maximizes the likelihood function and gives $\max\{ x_i\}$
- The method of moments, which gives $2\sum x_i/n$, twice the sample mean
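Here is a minimal Python sketch of these two estimators. It uses a small made-up sample, and the function names are illustrative rather than taken from any library. The sample is chosen so that the moment estimate falls below the largest observation, which no valid $\theta$ can do.

```python
def uniform_mle(xs):
    """Maximum likelihood estimate of theta for Uniform[0, theta]: the sample maximum."""
    return max(xs)


def uniform_moment(xs):
    """Method-of-moments estimate: E(X) = theta / 2, so use twice the sample mean."""
    return 2 * sum(xs) / len(xs)


# Illustrative, made-up observations (not real data).
sample = [0.5, 0.5, 3.0]
print(uniform_mle(sample))     # 3.0
print(uniform_moment(sample))  # 8/3, about 2.667, below the observed maximum 3.0
```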

The uniform distribution was an interesting example because maximum likelihood and the method of moments gave two different estimates. What about the Poisson distribution? It is supported on the nonnegative integers $0,1,2,\dots$, depends on a single parameter $\mu > 0$, and has probability mass function

$$f(k) = \frac{e^{-\mu}\mu^k}{k!}, \qquad k = 0, 1, 2, \dots$$
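The mass function translates directly into code. The following is a minimal standard-library sketch, and `poisson_pmf` is an illustrative name. It also checks numerically that the probabilities sum to 1 and that the mean is $\mu$. Truncating the infinite sums at $k = 100$ is an assumption that is ample for a moderate $\mu$ such as 5.

```python
import math


def poisson_pmf(k: int, mu: float) -> float:
    """P(X = k) for X ~ Poisson(mu), k = 0, 1, 2, ..."""
    if k < 0:
        return 0.0  # no mass on negative integers
    return math.exp(-mu) * mu**k / math.factorial(k)


mu = 5.0
ks = range(101)  # truncation point: assumed large enough for mu = 5
print(sum(poisson_pmf(k, mu) for k in ks))      # within rounding error of 1
print(sum(k * poisson_pmf(k, mu) for k in ks))  # within rounding error of mu
```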

How do the two methods of parameter estimation fare here? Let’s start with the method of moments. The moments of the Poisson distribution are easy to compute directly, but let’s instead write down its moment generating function.

Wait, what’s the moment generating function? For a random variable $X$, its moment generating function is $g_X(t) := E(e^{tX})$. Provided this expected value is finite for all $t$ in some open interval around zero, $g_X$ is a power series whose coefficients are the moments divided by factorials:

$$g_X(t) = \sum_{n=0}^\infty \frac{E(X^n)}{n!}t^n,$$

so that $g_X^{(n)}(0) = E(X^n)$, the $n$th moment of $X$. The moment generating function may seem like more machinery than today’s discussion needs. Its main importance is that the moment generating function of a sum of two independent random variables is the product of their moment generating functions. The density or mass function of a sum is hard to compute directly because it is a convolution, but taking products is easy.
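To see the product property concretely, here is a small Python sketch using two fair dice. The dice and the helper names `mgf` and `pmf_of_sum` are illustrative choices, not from the discussion above. The sketch computes the mass function of the sum by convolution and checks that its moment generating function equals the square of a single die’s. It also approximates $g_X'(0)$ by a central finite difference to recover $E(X)$.

```python
import math


def mgf(pmf: dict, t: float) -> float:
    """E(e^{tX}) for a finitely supported mass function given as {value: probability}."""
    return sum(p * math.exp(t * x) for x, p in pmf.items())


def pmf_of_sum(pmf_a: dict, pmf_b: dict) -> dict:
    """Mass function of A + B for independent A and B: a discrete convolution."""
    out = {}
    for a, pa in pmf_a.items():
        for b, pb in pmf_b.items():
            out[a + b] = out.get(a + b, 0.0) + pa * pb
    return out


die = {face: 1 / 6 for face in range(1, 7)}  # illustrative: one fair six-sided die
two_dice = pmf_of_sum(die, die)

t = 0.3
print(mgf(two_dice, t), mgf(die, t) ** 2)  # equal up to rounding error

# g'(0) = E(X); a central finite difference approximates the derivative.
h = 1e-5
print((mgf(die, h) - mgf(die, -h)) / (2 * h))  # close to E(X) = 3.5
```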

Back to the Poisson.

**Theorem.** The moment generating function of a Poisson random variable with parameter $\mu$ is

$$g(t) = e^{\mu(e^t-1)}.$$

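*Proof.* We compute:

$$g(t) = E(e^{tX}) = \sum_{k=0}^\infty \frac{e^{tk}\mu^k e^{-\mu}}{k!} = e^{-\mu}\sum_{k=0}^\infty \frac{(\mu e^t)^k}{k!} = e^{-\mu}e^{\mu e^t} = e^{\mu(e^t-1)},$$

using the power series $e^x = \sum_{k\ge 0} x^k/k!$ with $x = \mu e^t$.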
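As a numerical sanity check of the theorem, this Python sketch compares a truncated version of the series with the closed form. The truncation at 200 terms and the choice $\mu = 2$ are illustrative assumptions. Each term is obtained from the previous one by multiplying by $\mu e^t/(k+1)$, which avoids computing large powers and factorials.

```python
import math


def poisson_mgf_series(t: float, mu: float, terms: int = 200) -> float:
    """Truncated series: sum over k < terms of e^{tk} mu^k e^{-mu} / k!."""
    total = 0.0
    term = math.exp(-mu)  # the k = 0 term
    for k in range(terms):
        total += term
        term *= mu * math.exp(t) / (k + 1)  # ratio of term k+1 to term k
    return total


def poisson_mgf_closed(t: float, mu: float) -> float:
    """Closed form e^{mu (e^t - 1)} from the theorem."""
    return math.exp(mu * (math.exp(t) - 1))


for t in (-1.0, 0.0, 0.5):
    print(t, poisson_mgf_series(t, 2.0), poisson_mgf_closed(t, 2.0))
```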

Differentiating gives $g'(t) = \mu e^t e^{\mu(e^t-1)}$, so $g'(0) = \mu$ and hence $E(X) = \mu$. The method of moments estimator of $\mu$ from $n$ i.i.d. Poisson observations $x_1,\dots,x_n$ is therefore the sample mean $\sum x_i/n$. It turns out this is also the maximum likelihood estimator, so for the Poisson the two methods of estimation give the same result. Incidentally, the moment generating function shows immediately that the sum of two independent Poisson variables with means $\mu_1$ and $\mu_2$ is Poisson with mean $\mu_1 + \mu_2$. The product $e^{\mu_1(e^t-1)}e^{\mu_2(e^t-1)} = e^{(\mu_1+\mu_2)(e^t-1)}$ is the Poisson moment generating function with parameter $\mu_1+\mu_2$, and a moment generating function that is finite near zero determines the distribution.
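For completeness, here is where the maximum likelihood claim comes from. Writing out the Poisson mass function, the log-likelihood of observations $x_1,\dots,x_n$ is

$$\ell(\mu) = \sum_{i=1}^n \log\frac{e^{-\mu}\mu^{x_i}}{x_i!} = -n\mu + \Big(\sum_{i=1}^n x_i\Big)\log\mu - \sum_{i=1}^n \log(x_i!),$$

and setting its derivative to zero gives

$$\ell'(\mu) = -n + \frac{1}{\mu}\sum_{i=1}^n x_i = 0 \quad\Longrightarrow\quad \hat\mu = \frac{1}{n}\sum_{i=1}^n x_i.$$

When some $x_i > 0$, we have $\ell''(\mu) = -\sum x_i/\mu^2 < 0$, so this critical point is the maximum. When every $x_i = 0$, the likelihood $e^{-n\mu}$ decreases in $\mu$, so its supremum is approached as $\mu \to 0$, which is again the sample mean.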

How does this estimator work in practice? Let’s run a simulation. We draw from a Poisson distribution with $\mu = 5$ until the mean of the samples drawn so far is within 0.25 of the true mean 5, that is, until the sample mean falls in $[4.75,5.25]$. How many steps does it take to get there? Here is a histogram of the results from five thousand runs of this simulation:

This is a plot of the number of steps for each run of the simulation, sorted:

Here is the same sorted plot but with the log of the number of steps:
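The simulation described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the code behind the plots. It uses Python’s standard `random` module with an arbitrary fixed seed. It also draws Poisson variates with Knuth’s multiplication method, a swapped-in sampler that is adequate for small $\mu$, rather than a library routine. The helper names are illustrative. The sketch produces the step counts behind the histogram, the sorted counts and their logarithms, and leaves plotting to any plotting library.

```python
import collections
import math
import random


def poisson_sample(mu: float, rng: random.Random) -> int:
    """Draw from Poisson(mu) with Knuth's multiplication method (fine for small mu)."""
    threshold = math.exp(-mu)
    k, p = 0, 1.0
    while True:
        p *= rng.random()  # multiply in another uniform draw
        if p <= threshold:
            return k
        k += 1


def steps_until_close(mu: float, tol: float, rng: random.Random) -> int:
    """Number of draws until the running sample mean is within tol of mu."""
    total, n = 0, 0
    while True:
        total += poisson_sample(mu, rng)
        n += 1
        if abs(total / n - mu) <= tol:
            return n


rng = random.Random(0)  # arbitrary fixed seed so the run is reproducible
runs = [steps_until_close(5.0, 0.25, rng) for _ in range(5000)]

histogram = collections.Counter(runs)            # steps -> number of runs
sorted_steps = sorted(runs)                      # data for the sorted plot
log_steps = [math.log(s) for s in sorted_steps]  # data for the log plot
print(min(runs), max(runs), sum(runs) / len(runs))
```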