Posted by Jason Polak on 18. February 2018 · 2 comments · Categories: computer-science · Tags: , ,

Let’s assume you have a polynomial over a finite field $\F_q$, defined in Sage. How can you tell whether it’s a permutation polynomial? That is, when is the corresponding function $\F_q\to\F_q$ bijective?

This is how you might have a polynomial over $\F_q$ defined in Sage:

Here, the variable $x$ refers the element $x$ in the isomorphism $\F_q \cong \F_p[x]/\alpha(x)$ and $t$ is the variable in the polynomial ring $\F_q[t]$. Is $f$ a permutation polynomial? That of course depends on what $q$ is.
More »

Let $\F_q$ be a finite field. For any function $f:\F_q\to \F_q$, there exists a polynomial $p\in \F_q[x]$ such that $f(a) = p(a)$ for all $a\in \F_q$. In other words, every function from a finite field to itself can be represented by a polynomial.

In particular, every permutation of $\F_q$ can be represented by a polynomial. This isn’t true for other finite rings, a fact which is the topic of one of my papers. At any rate, polynomials that represent permutations are called permutation polynomials.

It’s not easy to determine which polynomials are permutation polynomials. The exception is for the finite field $\F_2$, and more generally $\Z/2^w$. Then this case Ronald Rivest gave a straightforward criterion for whether a polynomial is a permutation polynomial.

However, there is a classical result that says any permutation may be represented uniquely by a polynomial whose degree is at most $q-2$. Here is an example noted by Charles Wells. Let $a,b\in\F_q$ be two distinct elements. The polynomial
$$f(x) = x + (a-b)(x-a)^{q-1} + (b-a)(x-b)^{q-1}$$
represents the transposition of $a$ and $b$. I suggest that you verify this by direct substitution, using the identity $c^{q-1} =1$ whenever $c\not=0$, which in turn follows from Fermat’s little theorem.
More »

Posted by Jason Polak on 14. February 2018 · Write a comment · Categories: homological-algebra, modules · Tags: , ,

Over a field $k$, an arbitrary product of copies of $k$ is a free module. In other words, every vector space has a basis. In particular, this means that arbitrary products of projective $k$-modules are projective.

Over the ring of integers, an arbitrary product of projective modules is not necessarily projective. In fact, a product of countably infinitely many copies of $\Z$ is not projective!

So while arbitrary direct sums of projective modules are projective, the same is not true of arbitrary products for some rings.

Besides fields, which other rings have the property that arbitrary direct products of their projective modules are projective?
More »

Posted by Jason Polak on 14. February 2018 · Write a comment · Categories: book · Tags: , , , ,

I first heard of Edward Snowden when I was still a PhD student. He became world-famous for leaking huge numbers of NSA documents on their surveillance and data collection program, whose primary aim is to indiscriminately collect as much private data from as many people as they can, regardless of whether they were suspected in any wrongdoing.

How this leak happened, and what the leak contained, is detailed in No Place to Hide by Glenn Greenwald. As Snowden entrusted Greenwald with the leaked documents, Greenwald is in a unique position to offer a detailed and accurate account of Snowden’s leaked files.

The bulk of Snowden’s documents describe the huge amounts of data are collected from pretty much anyone the NSA can get their hands on, regardless of who they are or where they live in the world. This data comes from various sources: intercepted internet transmissions, agreements with phone companies like AT&T, agreements with tech companies like Google, Facebook, and Microsoft, and other spy agencies. The data collected include the exact times and durations of phone calls, and the contents and metadata of emails. The data accumulated by the NSA is so huge that they have to build massive petabyte data centers to store it.

Greenwald’s book is frightening in its description of the NSA’s indiscriminate and often illegal data collection. That the U.S. Government has convinced enough people that this type of spying is worth it is the most bewildering aspect of all, considering that the probability of being killed in a terrorist act is so miniscule.

I also found it alarming that so many people, including journalists, denounced Greenwald and his associates for the leaks. A journalist for the New York Times went so far as to lie in an article, and that was just one of many fabrications Greenwald endured to bring the leaked documents to the world. Of course, Greenwald and Snowden also had great support as well. And lest anyone think that the leaked documents compromised national security, they should actually read this book in which it clearly shows that Snowden carefully selected his documents so that they did not reveal sensitive information that could compromise any security whatsoever.

Greenwald has done a major service to humanity in writing this book and his articles, and reporting on the Snowden leaks. Even so, I think the battle has only begun for basic human rights and existence in the face of massive data analytics in the hands of the power-hungry. Data-driven technology has only begun to mature and in the future we will see something that we’ve never seen before: a future where our lives are manipulated so quickly that most of us will be in danger of losing balance and losing control of some of the best things that make human life liveable. With the onslaught of data analytics and machine learning techniques, I fear that proponents of indiscriminate spying will only redouble their efforts to make the lives of the rest of us worse in their quest for their own power and money.

Everyone should read this book. Highly recommended.

By the way, if you’re interested in this book (and you should be), read Bruce Schneier’s Data and Goliath. Schneier, an expert in computer security, worked with Greenwald on the Snowden files. Data and Goliath is more focused and detailed on the technical aspects of indiscriminate data collection and espionage, whereas Greenwald’s book is more about the initial event and the effort it took to actually get this crucial information to the public.

Posted by Jason Polak on 13. February 2018 · Write a comment · Categories: statistics · Tags: , ,

If we observe data coming from a distribution with known form but unknown parameters, estimating those parameters is our primary aim. If the distribution is uniform on $[0,\theta]$ with $\theta$ unknown, we already looked at two methods to estimate $\theta$ given $n$ i.i.d. observations $x_1,\dots,x_n$:

  1. Maximum likelihood, which maximizes the likelihood function and gives $\max\{ x_i\}$
  2. Moment estimator $2\sum x_i/n$, or twice the mean

The uniform distribution was an interesting example because maximum likelihood and moments gave two different estimates. But what about the Poisson distribution? It is supported on the natural numbers, depends on a single parameter $\mu$, and has density function
$$f(n) = \frac{e^{-\mu}\mu^n}{n!}$$
What about the two methods of parameter estimation here? Let’s start with the method of moments. It is easy to compute the moments of the Poisson distribution directly, but let’s write down the moment generating function of the Poisson distribution.
More »

Posted by Jason Polak on 13. February 2018 · Write a comment · Categories: statistics · Tags: ,

This post is a quick introduction to the R. I learnt R when I was an undergrad and I still use it from time to time. It was one of the first major programs I compiled from source as well.

What is R? It is simply the best statistical computing environment in use today. Better yet, it’s open source. If you’re working with data, you need to learn R. This post won’t teach you how to use R. Instead, it will give a whirlwind tour so you can get appreciate R’s flavour. The only thing I don’t like much about R is searching for material on it. Its one letter name makes that hard.

I will assume that you have R installed, and have opened the interactive console. It should look something like this:

More »

Posted by Jason Polak on 13. February 2018 · Write a comment · Categories: statistics · Tags:

Previously we talked about the Poisson distribution. The Poisson distribution with mean $\mu \gt 0$ is a distribution on the natural numbers whose density function is
$$f(n) = \frac{e^{-\mu}\mu^n}{n!}$$
We have already seen that the Poisson distribution essentially arises from the binomial distribution as a sort of “limiting case”. In fact, the Poisson distribution is sometimes used as an approximation to the binomial distribution, even though it also arises in its own right in processes like observing the arrival of random particles from radioactive decay.

The Poisson distribution and the binomial distribution are related in another way, through conditioning on the sum of Poisson random variables.

Suppose we have two independent Poisson random variables $X_1$ and $X_2$ with means $E(X_i) = \mu_i$. Then the sum $X_1 + X_2$ also has a Poisson distribution with mean $\mu_1 + \mu_2$.

On the other hand, what is the conditional density $P(X_1 = n_1, X_2 = n_2~|~ X_1 + X_2 = n)$? Here, $n_1 + n_2 = n$. By definition, it is
$$\frac{P(X_1 = n_1, X_2 = n- n_1)}{P(X_1 + X_2 = n)}$$
This is
where $p = \mu_1/(\mu_1+\mu_2)$. So, the joint density of two Poisson random variables conditioning on their sum being $n$ is binomial!

Suppose we have observations from a known probability distribution whose parameters are unknown. How should we estimate the parameters from our observations?

Throughout we’ll focus on a concrete example. Suppose we observe a random variable drawn from the uniform distribution on $[0,\theta]$, but we don’t know what $\theta$ is. Our one observation is the number $a$. How can we estimate $\theta$?

One method is the ubiquitous maximum likelihood estimator. With this method, we put our observation into the density function, and maximize it with respect to the unknown parameter. The uniform distribution has density $f(x) = 1/\theta$ on the interval $[0,\theta]$ and zero elsewhere. This function is maximized when $\theta = a$. For if $\theta$ were any smaller, then $f(a)$ would be zero.

Also, it’s easy to see that if we draw $n$ samples $a_1,\dots,a_n$ from this distribution, the maximum likelihood estimator for $\theta$, which is the value of $\theta$ that maximizes the joint probability density function, is $\max_i \{a_i\}$.
More »

Posted by Jason Polak on 09. February 2018 · Write a comment · Categories: book · Tags: ,

Steve Jobs will forever be known as prime force behind Apple. ‘Steve Jobs’ by Walter Isaacson is his definitive biography, requested and authorised by Jobs himself.

Having read quite a few biographies, Isaacson’s biography stands out as excellent. Isaacson is a good storyteller, and combined with Jobs’ energetic and polarized personality, this book was never difficult to read despite its length.

Perhaps the most fascinating part of this book his how Jobs ruled Apple’s product development. Despite not being either technical or a programmer, he had an amazing sense of product usability and aesthetic, and he applied this with an amazing relentless energy to many fine details over the years of product development at Apple. His management style of yelling at people, which seems so common in tech development in various guises, is not something I would want to work under but is nonetheless fascinating to observe from a distance.

I believe the beauty of this biography is that Isaacson manages to create a perfect balance between the personal and the technological. There is great detail on Jobs as a person, which is to be expected. But what makes this biography stand out is that it describes his influence on technology with sufficient detail so that the book doesn’t become just a list of biographical facts. For instance, I really enjoyed the descriptions of how Jobs was involved with the evolution of a product from a design and usability perspective.

Isaacson also manages to give just the right amount of detail of the supporting characters so that we see Jobs’ life in its social entirety, without too much character introduction or diversions. Highly recommended.

Posted by Jason Polak on 08. February 2018 · Write a comment · Categories: statistics · Tags: ,

The Poisson distribution is a discrete probability distribution on the natural numbers $0,1,2,\dots$. Its density function depends on one parameter $\mu$ and is given by
$$d(n) = \frac{e^{-\mu}\mu^n}{n!}$$
Not surprisingly, the parameter $\mu$ is the mean, which follows from the exponential series
$$e^x = \sum_{n=0}^\infty \frac{x^n}{n!}$$
Here is what the density function looks like when $\mu=5$:

How does the Poisson distribution actually arise?

It comes from the following process: suppose you have a fixed interval of time, and you observe the number of occurrences of some phenomenon. In practice, it might be ‘the number of buses to arrive at a given bus stop’. Whatever it is, you’re counting something.

Moreover, this process has to satisfy the important “Poisson axiom”: if you take two disjoint intervals of time that are small, then the number of occurrences in the first is independent of the number of occurrences in the second. Here, “small” means that as the size of the intervals approaches zero, the results should approach independence.
More »