The Discrete Log, Part 2: Shanks' Algorithm

In a group $G$, the discrete logarithm problem is to solve for $x$ in the equation $g^x =h$ where $g,h\in G$. In Part 1, we saw that solving the discrete log problem for finite fields would imply that we could solve the Diffie-Hellman problem and crack the ElGamal encryption scheme.

Obviously, the main question at this point is: how hard is the discrete logarithm problem (DLP)? In this post, we will look at various ways to approach the problem. We'll focus on solving the DLP in the cyclic group $\F_p^\times$, though some of the ideas generalise to other groups as well.

The Brute Force Approach

So, we've got this equation: $g^x = h$ in $\F_p^\times$. Of course, we're assuming that an $x$ solving this equation exists; this is guaranteed whenever $h$ lies in the subgroup generated by $g$, which is the situation that occurs in practice.

The most naive approach is the brute-force approach: successively compute powers $g,g^2,g^3,\dots$ until you find the solution. If $g$ is a primitive element (meaning it has order $p-1$), then it will take up to $p-2$ multiplications to find $x$. That means even if all the values of $x$ are equally likely, the expected running time for this algorithm is $O(p) = O(2^n)$, where $n$ is the number of bits in $p$. Therefore, this algorithm sucks. Let's see it in action anyway. Here is a basic Python implementation:
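A minimal sketch of such a brute-force solver, using the base^x = arg interface with -1 signalling "no solution" (the function and parameter names here are illustrative):

```python
def naive_dlp(base, arg, p):
    """Solve base**x == arg (mod p) by brute force; return -1 if no solution."""
    arg %= p
    power = 1                      # base**0
    for x in range(p - 1):         # try exponents 0, 1, ..., p-2
        if power == arg:
            return x
        power = (power * base) % p
    return -1
```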

Here, it solves the equation base^x = arg, or returns -1 if there is no solution. We'll use this test function to see how long this algorithm takes:

Running this first naive algorithm three times gave an average running time of 2.247 seconds. I originally added a fourth problem to this test function: solve
$$7^x = 8458730 \pmod{18989249}.$$
However, it took over six minutes with this naive algorithm and so I just terminated the program.

Shanks' Algorithm

Shanks came up with a better algorithm for the discrete logarithm problem. It's somewhat like the naive prime-testing algorithm: to test if a number $n$ is prime, you only need to test factors up to $\sqrt{n}$ (including $\sqrt{n}$ itself when it is an integer), instead of testing factors all the way up to $n$. Shanks observed that a similar trick can be made to work in solving the discrete logarithm problem. How does it work?

Here's the basic idea: in a group $G$, we want to solve for $x$ in the equation $g^x = h$, where $x$ is an unknown integer. If $g$ is an element of order $N\geq 2$, let $n = 1+\lfloor\sqrt{N}\rfloor$. In the two sets:

1. $\{ e,g,g^2,\dots,g^n \}$
2. $\{ h, hg^{-n}, hg^{-2n},\dots, hg^{-n^2}\}$

there will be a common element if the DLP has a solution: any solution $x$ with $0\le x < N$ can be written as $x = i + jn$ with $0\le i,j\le n$ (divide $x$ by $n$), so $g^i = hg^{-jn}$ for some pair $(i,j)$. Conversely, if the match is $g^i = hg^{-jn}$, then the solution to the DLP is $x = i + jn$. Computing these two sets requires at most $O(n)$ operations. In practice, you wouldn't compute the full second set, but rather check for a match as you go. That means searching the first set every time you create an element of the second set; however, you can avoid searching if you store the first set in a hashtable whose keys are the computed values. Then checking whether a value from the second set belongs to the first takes $O(1)$ expected time. Therefore, this algorithm's running time is $O(\sqrt{N}) = O(2^{k/2})$, where $k$ is the number of bits needed to represent $N$. So, it's going to be quite a lot faster than the most naive algorithm. Here is a Python 3 implementation:
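A sketch of such an implementation follows. For simplicity it takes $N = p-1$, the order of the full group, rather than computing the exact order of $g$; the `pow(base, -n, p)` call for the modular inverse needs Python 3.8+.

```python
from math import isqrt

def shanks(base, arg, p):
    """Baby-step giant-step: solve base**x == arg (mod p), or return -1."""
    N = p - 1                     # order of F_p^x; an upper bound on ord(g)
    n = 1 + isqrt(N)
    # Baby steps: hashtable sending g^i -> i for i = 0, ..., n
    baby = {}
    value = 1
    for i in range(n + 1):
        baby.setdefault(value, i)  # keep the smallest exponent on collisions
        value = (value * base) % p
    # Giant steps: test h, h*g^{-n}, h*g^{-2n}, ... against the table
    step = pow(base, -n, p)       # g^{-n} mod p
    gamma = arg % p
    for j in range(n + 1):
        if gamma in baby:
            return baby[gamma] + j * n
        gamma = (gamma * step) % p
    return -1
```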

It uses a hashtable as mentioned to avoid searching lists. Let's see how it performs using this test function:

The average running time over three iterations of this test function was 1.563 seconds. At first, this doesn't seem that much better, but that's because these numbers are too small to give you a good idea of the enormous time savings we've achieved. Recall the problem:
$$7^x = 8458730 \pmod{18989249}.$$
If I add it to the test function, the running time becomes 6.597 seconds. Using the most naive approach, the program ran for over six minutes before I terminated it.