Posted by Jason Polak on 13. February 2018 · Categories: statistics

Previously we talked about the Poisson distribution. The Poisson distribution with mean $\mu \gt 0$ is a distribution on the natural numbers whose density function is
$$f(n) = \frac{e^{-\mu}\mu^n}{n!}$$
We have already seen that the Poisson distribution essentially arises from the binomial distribution as a sort of “limiting case”. In fact, the Poisson distribution is sometimes used as an approximation to the binomial distribution, even though it also arises in its own right in processes like observing the arrival of random particles from radioactive decay.

The Poisson distribution and the binomial distribution are related in another way, through conditioning on the sum of Poisson random variables.

Suppose we have two independent Poisson random variables $X_1$ and $X_2$ with means $E(X_i) = \mu_i$. Then the sum $X_1 + X_2$ also has a Poisson distribution with mean $\mu_1 + \mu_2$.
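This additivity is easy to check numerically by convolving the two Poisson densities and comparing with the density of mean $\mu_1 + \mu_2$. A quick sketch in Python (the means and the value of $n$ are arbitrary choices):

```python
import math

def poisson_pmf(k, mu):
    # Density of the Poisson distribution with mean mu, evaluated at k
    return math.exp(-mu) * mu**k / math.factorial(k)

mu1, mu2 = 2.0, 3.5
n = 4

# P(X1 + X2 = n): convolve the two densities
conv = sum(poisson_pmf(k, mu1) * poisson_pmf(n - k, mu2) for k in range(n + 1))

# Density of a single Poisson with mean mu1 + mu2, evaluated at n
direct = poisson_pmf(n, mu1 + mu2)

print(conv, direct)  # the two agree up to floating-point error
```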

On the other hand, what is the conditional density $P(X_1 = n_1, X_2 = n_2~|~ X_1 + X_2 = n)$? Here, $n_1 + n_2 = n$. By definition, it is
$$\frac{P(X_1 = n_1, X_2 = n- n_1)}{P(X_1 + X_2 = n)}$$
This is
$$\frac{\dfrac{e^{-\mu_1}\mu_1^{n_1}}{n_1!}\cdot\dfrac{e^{-\mu_2}\mu_2^{n-n_1}}{(n-n_1)!}}{\dfrac{e^{-(\mu_1+\mu_2)}(\mu_1+\mu_2)^n}{n!}} = \binom{n}{n_1}p^{n_1}(1-p)^{n-n_1}$$
where $p = \mu_1/(\mu_1+\mu_2)$. So, the joint density of two Poisson random variables conditioned on their sum being $n$ is binomial!
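This identity can also be verified numerically. A minimal sketch in Python, with arbitrary choices of the means, $n$, and $n_1$:

```python
import math

def poisson_pmf(k, mu):
    # Density of the Poisson distribution with mean mu, evaluated at k
    return math.exp(-mu) * mu**k / math.factorial(k)

mu1, mu2 = 2.0, 3.0
n, n1 = 6, 2
p = mu1 / (mu1 + mu2)

# Conditional density P(X1 = n1, X2 = n - n1 | X1 + X2 = n)
cond = poisson_pmf(n1, mu1) * poisson_pmf(n - n1, mu2) / poisson_pmf(n, mu1 + mu2)

# Binomial(n, p) density at n1, with p = mu1 / (mu1 + mu2)
binom = math.comb(n, n1) * p**n1 * (1 - p)**(n - n1)

print(cond, binom)  # the two agree up to floating-point error
```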

Suppose we have observations from a known probability distribution whose parameters are unknown. How should we estimate the parameters from our observations?

Throughout we’ll focus on a concrete example. Suppose we observe a random variable drawn from the uniform distribution on $[0,\theta]$, but we don’t know what $\theta$ is. Our one observation is the number $a$. How can we estimate $\theta$?

One method is the ubiquitous maximum likelihood estimator. With this method, we put our observation into the density function and maximize it with respect to the unknown parameter. The uniform distribution has density $f(x) = 1/\theta$ on the interval $[0,\theta]$ and zero elsewhere. As a function of $\theta$, the likelihood $1/\theta$ is decreasing, so it is maximized by taking $\theta$ as small as possible; and if $\theta$ were any smaller than $a$, then $f(a)$ would be zero. Hence the maximum likelihood estimate is $\theta = a$.

Also, it’s easy to see that if we draw $n$ samples $a_1,\dots,a_n$ from this distribution, the maximum likelihood estimator for $\theta$, which is the value of $\theta$ that maximizes the joint probability density function, is $\max_i \{a_i\}$.
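A short sketch of this estimator in Python (the true $\theta$ and the sample size are arbitrary choices, and the seed is fixed only for reproducibility). Note that since every sample lies strictly below $\theta$, the estimate $\max_i\{a_i\}$ always slightly underestimates it:

```python
import random

random.seed(0)
theta_true = 7.0  # the unknown parameter (used here only to generate data)
samples = [random.uniform(0, theta_true) for _ in range(1000)]

# Maximum likelihood estimate of theta from the samples
theta_hat = max(samples)

print(theta_hat)  # close to, but strictly less than, theta_true
```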

Posted by Jason Polak on 08. February 2018 · Categories: statistics

The Poisson distribution is a discrete probability distribution on the natural numbers $0,1,2,\dots$. Its density function depends on one parameter $\mu$ and is given by
$$d(n) = \frac{e^{-\mu}\mu^n}{n!}$$
Not surprisingly, the parameter $\mu$ is the mean, which follows from the exponential series
$$e^x = \sum_{n=0}^\infty \frac{x^n}{n!}$$
Indeed, $\sum_{n=0}^\infty n\, d(n) = \mu e^{-\mu}\sum_{n=1}^\infty \frac{\mu^{n-1}}{(n-1)!} = \mu e^{-\mu}e^{\mu} = \mu$.
Here is what the density function looks like when $\mu=5$:
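The density values at $\mu = 5$ are easy to tabulate directly; a short Python sketch (note that the mass peaks equally at $n = 4$ and $n = 5$, since $d(5)/d(4) = \mu/5 = 1$):

```python
import math

mu = 5.0
# Density values d(n); 60 terms is far more than enough for mu = 5
pmf = [math.exp(-mu) * mu**n / math.factorial(n) for n in range(60)]

for n in range(13):
    print(f"{n:2d}  {pmf[n]:.4f}")

# Sanity checks: the density sums to 1 and has mean mu
total = sum(pmf)
mean = sum(n * p for n, p in enumerate(pmf))
```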

How does the Poisson distribution actually arise?

It comes from the following process: suppose you have a fixed interval of time, and you observe the number of occurrences of some phenomenon. In practice, it might be ‘the number of buses to arrive at a given bus stop’. Whatever it is, you’re counting something.

Moreover, this process has to satisfy the important “Poisson axiom”: if you take two disjoint intervals of time that are small, then the number of occurrences in the first is independent of the number of occurrences in the second. Here, “small” means that as the size of the intervals approaches zero, the results should approach independence.
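The "limiting case of the binomial" point of view can be simulated: chop the interval into many small subintervals, give each an independent, small chance of containing an occurrence, and count the successes. A rough sketch (the rate, the number of subintervals, and the number of trials are arbitrary choices; the seed is fixed for reproducibility):

```python
import random

random.seed(1)
mu = 5.0        # expected number of occurrences in the whole interval
N = 1000        # number of small subintervals
trials = 2000

# Each subinterval independently contains an occurrence with probability mu/N,
# so each count is Binomial(N, mu/N), which approximates Poisson(mu) for large N.
counts = [sum(random.random() < mu / N for _ in range(N)) for _ in range(trials)]

sample_mean = sum(counts) / trials
print(sample_mean)  # should be close to mu
```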

Posted by Jason Polak on 30. June 2013 · Categories: statistics

The internet has enabled researchers and organisations of all kinds to make their data freely available for download, so anyone with a computer and some rudimentary R knowledge can observe and analyse trends in everything from economics to society to natural phenomena. Obviously this can provide endless hours of fun and distraction!

One such data set, available at Willmott, Matsuura, and Collaborators’ Global Climate Resource Page, was compiled by Legates and Willmott and described in their paper [1].

This dataset consists of estimated mean monthly surface air temperatures at geographical locations on a 0.5 by 0.5 degree grid, interpolated from 17986 land weather stations (most densely concentrated in the United States and Europe) and 6955 oceanic observation points. These observations came from various sources and span a period of sixty years. To give you an idea of how much data was collected, here is a map from their paper [1] showing the locations of the stations (I figure showing this map is fair use):


The details of the estimations are documented in the paper, and estimated errors are also available on the website above. Although there are some pictures in the paper, they are in black and white and I thought it would be fun to make some in colour.