In a group $G$, the discrete logarithm problem is to solve for $x$ in the equation $g^x =h$ where $g,h\in G$. In Part 1, we saw that solving the discrete log problem for finite fields would imply that we could solve the Diffie-Hellman problem and crack the ElGamal encryption scheme.

Obviously, the main question at this point is: how hard *is* the discrete logarithm problem (DLP)? In this post, we will look at various ways to approach the problem. We'll focus on solving the DLP in the cyclic group $\F_p^\times$, though some of the ideas carry over to other groups as well.

## The Brute Force Approach

So, we've got this equation: $g^x = h$ in $\F_p^\times$. We'll assume a solution $x$ exists; in practice it does, since in schemes like Diffie-Hellman and ElGamal the value $h$ is produced by raising $g$ to a power in the first place.

The most naive approach is the brute-force approach: successively compute powers $g, g^2, g^3, \dots$ until you find the solution. If $g$ is a primitive element (meaning it has order $p-1$), then it can take up to $p-2$ multiplications to find $x$. Even if all values of $x$ are equally likely, we expect around $p/2$ multiplications on average, so the running time of this algorithm is $O(p) = O(2^n)$, where $n$ is the number of bits in $p$. Therefore, this algorithm sucks. Let's see it in action anyway. Here is a basic Python implementation:

```python
def discreteLog(prime, base, arg):
    result = 0
    current = 1
    while result < prime - 1:
        if (current - arg) % prime == 0:
            return result
        current = current * base % prime  # reduce mod prime to keep the numbers small
        result += 1
    return -1
```

This solves the equation **base^x = arg** for $x$, returning -1 if there is no solution. We'll use this test function to see how long the algorithm takes:

```python
def test(func):
    print("5^31 mod 37 = 24, " + str(func(37, 5, 24)))
    print("3^8412 mod 1898959 = 844405, " + str(func(1898959, 3, 844405)))
    print("7^50000 mod 5898931 = 5110965, " + str(func(5898931, 7, 5110965)))
```

Running this first naive algorithm three times gave an average running time of 2.247 seconds. I originally added a fourth problem to this test function: solve

$$7^x = 8458730 \pmod{18989249}.$$

However, it took over six minutes with this naive algorithm and so I just terminated the program.
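Incidentally, while *finding* $x$ is hard, *checking* a claimed solution is easy, because modular exponentiation is fast. Python's built-in three-argument `pow` confirms the answers baked into the test function:

```python
# Checking a discrete log is easy: three-argument pow does fast
# modular exponentiation by repeated squaring.
print(pow(5, 31, 37))          # 24
print(pow(3, 8412, 1898959))   # 844405
print(pow(7, 50000, 5898931))  # 5110965
```

This asymmetry (exponentiation is fast, its inverse is slow) is exactly what makes the DLP useful for cryptography.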

## Shanks' Algorithm

Shanks came up with a better algorithm for the discrete logarithm problem. It's somewhat like the naive primality testing trick: to test whether a number $n$ is prime, you only need to test factors up to $\sqrt{n}$ (including $\sqrt{n}$ itself if it happens to be an integer), instead of testing factors all the way up to $n$. Shanks observed that a similar trick can be made to work for the discrete logarithm problem. How does it work?
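In code, the square-root trick for trial division looks something like this (a minimal sketch):

```python
from math import isqrt

def is_prime(n):
    """Trial division. Any factor of n pairs with a cofactor, and the two
    cannot both exceed sqrt(n), so testing divisors up to isqrt(n) suffices."""
    if n < 2:
        return False
    for d in range(2, isqrt(n) + 1):
        if n % d == 0:
            return False
    return True
```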

Here's the basic idea: in a group $G$, we want to solve for $x$ in the equation $g^x = h$, where $x$ is an unknown integer. If $g$ is an element of order $N \geq 2$, let $n = 1 + \lfloor\sqrt{N}\rfloor$. In the two sets:

- $\{ e,g,g^2,\dots,g^n \}$
- $\{ h, hg^{-n}, hg^{-2n},\dots, hg^{-n^2}\}$

there will be a common element precisely when the DLP has a solution. Indeed, any solution can be taken with $0 \leq x < N \leq n^2$, so we can write $x = i + jn$ with $0 \leq i, j \leq n - 1$, and then $g^i = g^{x - jn} = hg^{-jn}$. Conversely, if the match is $g^i = hg^{-jn}$, then the solution to the DLP is $x = i + jn$. Computing these two sets requires $O(n)$ group operations. In practice, you wouldn't compute the full second set up front, but rather check for a match as each element is computed. Each element of the second set has to be looked up in the first set, but this lookup is cheap if the first set is stored in a hashtable whose keys are the computed values: since the powers $g^i$ are all distinct, each membership check takes $O(1)$ expected time. Therefore, this algorithm's running time is $O(\sqrt{N}) = O(2^{k/2})$, where $k$ is the number of bits needed to represent $N$. So it's going to be quite a lot faster than the naive algorithm. Here is a Python 3 implementation:

```python
from math import isqrt

def discreteLog2(prime, base, arg):
    N = prime - 1
    n = 1 + isqrt(N)
    # Baby steps: store base^i mod prime for i = 0, ..., n in a hashtable.
    firstList = {1: 0}
    current = 1
    for i in range(1, n + 1):
        current = current * base % prime
        if current not in firstList:
            firstList[current] = i
    if arg in firstList:
        return firstList[arg]
    # Giant steps: repeatedly multiply arg by the inverse of base^n mod prime.
    multiplier = pow(base, -n, prime)  # modular inverse (Python 3.8+)
    for i in range(1, n + 1):
        arg = arg * multiplier % prime
        if arg in firstList:
            return firstList[arg] + n * i
    return -1
```

As mentioned, it uses a hashtable to avoid searching through lists. Let's see how it performs on the same test function as before.

The average running time over three iterations of this test function was 1.563 seconds. At first, this doesn't seem that much better, but that's because these numbers are too small to give you a good idea of the enormous time savings we've achieved. Recall the problem:

$$7^x = 8458730 \pmod{18989249}.$$

If I add it to the test function, the running time becomes 6.597 seconds. Recall that with the most naive approach, this problem alone ran for over six minutes before I terminated its execution.
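To close, here is a compact, self-contained restatement of the algorithm, essentially `discreteLog2` in miniature, applied to that troublesome fourth problem:

```python
from math import isqrt

def bsgs(p, g, h):
    """Baby-step giant-step: solve g^x = h (mod p), or return None."""
    N = p - 1
    n = 1 + isqrt(N)
    # Baby steps: table of g^i mod p for i = 0, ..., n.
    table = {}
    cur = 1
    for i in range(n + 1):
        table.setdefault(cur, i)
        cur = cur * g % p
    # Giant steps: compare h * g^(-jn) against the table.
    step = pow(g, -n, p)  # modular inverse of g^n (Python 3.8+)
    cur = h % p
    for j in range(n + 1):
        if cur in table:
            return table[cur] + j * n
        cur = cur * step % p
    return None

x = bsgs(18989249, 7, 8458730)
if x is not None:
    print("x =", x, "check:", pow(7, x, 18989249) == 8458730)
```

It finds the answer near-instantly: roughly $2\sqrt{p} \approx 8700$ multiplications instead of the up-to $p - 2 \approx 1.9 \times 10^7$ needed by brute force.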