# Simple OpenMP Tutorial

Today, I’ll give you a simple and practical example of using OpenMP with gcc. What’s OpenMP? It’s a C/C++/Fortran library for writing multithreaded applications. What’s that? Most computers these days have multiple cores so that multiple tasks can be done in parallel. With multiple cores, you can edit pictures and listen to music at the same time without a noticeable slowdown. The intensive editing computations and music decoding happen on separate cores. It’s a little more complicated than that, but that’s the basic idea.

If your computer has multiple cores, it can use them in a single program. That’s where OpenMP comes in. For concreteness, let’s say you want to test whether the number 770948605618065977 is prime. If you’re sane you’d just use the Miller-Rabin test like we’ve been discussing on this blog, and be happy with a 99.99999% probability that the number is prime. However, let’s say you want to prove this number prime by trying all the numbers up to the integer part of its square root, which is 878036790.

Here is a minimal C program to do this:

#include
#include

int main(){
long int N = 770948605618065977;
long int upper = (long int) sqrt(N);
for (long int j = 2; j <= upper; j++){
if (N%j == 0){
printf("Factor found: %ld\n", j);
break;
}
}
return(0);
}


To compile this program (with filename factor.c) with gcc, run

gcc factor.c -lm -o factor


When you run factor, it will output the smallest proper factor of N if such a factor exists. If it doesn't find a factor, then the number must be prime. How long does this program take to run? The average of three runs on my ancient laptop is 10.4 seconds.

## Using OpenMP

Our prime-testing program is a perfect example of something you'd want to do on multiple threads. For simplicity, let's say you have two cores. Then we could test half the factors on one core and half the factors on the other core. The OpenMP library lets you do this. To use it, you must add the header file for the OpenMP library:

#include


Now, consider the following code:

omp_set_num_threads(2);
#pragma omp parallel
{
//stuff goes here
}


I first call the function omp_set_num_threads, setting 2 as the number of threads to use. You don't have to do this part: you could let the computer decide how many threads to use, and tailor your program based on the number of threads it gets.

Next comes the preprocessor directive #pragma omp parallel. This says that any code in the surrounding code block delimited by the braces will be executed in each thread. So, in our case, because two threads will be created, the code in the block will be executed twice, once in each thread.

Let's look at how we actually use it in our prime-testing program:

#include
#include
#include

int main(int argc, char** argv){
long int N = 770948605618065977;
printf("Number to be tested: %ld\n", N);
long int upper = (long int) sqrt(N);
omp_set_num_threads(2);
#pragma omp parallel
{
int ID = omp_get_thread_num();
int start,stop;
if (ID == 0){
start = 2;
stop = 439018395;
}
else{
start = 439018396;
stop = upper;
}
for (long int j = start; j <= stop; j++){
if (N%j == 0){
printf("Factor found: %ld\n", j);
break;
}
}
}
return(0);
}


What's going on here? Inside the code block preceded by the pragma, I wrote

int ID = omp_get_thread_num();


Each thread will have an identification number. Retrieving this will allow us to do different things depending on which thread we are in. In this case, I define two ranges by start and stop defining the numbers I want to test as factors. If the thread id is 0, then I want to test the first half of the numbers. If the thread id is 1, then I want to test the second half. Simple right?

Here is how to compile this program, assuming it is saved under factormulti.c:

gcc factormulti.c -lm -o factor -fopenmp


We must use the switch -fopenmp to enable the use of the OpenMP library. The average three-time run I got for this program is 5.3 seconds. It nearly doubled in speed.

Obviously, in real life, we would want to write the program so that it can be used on any positive integer and with a variable number of threads. But this is just a basic illustration to keep the examples minimal.

OpenMP also has many more features than what I covered here to handle much more complicated examples!

## Closing Remarks

OpenMP can speed up your program, but it certainly isn't the only way. If you want to write a faster program, first wisely choose your algorithms. For example, even with trial division, a greater improvement than using OpenMP is obtained by only testing numbers of the form
$$12k + r, r=1,5,7,11$$
after testing division by $2,3,5,7,11$.

Once you choose the best algorithm for the specific task at hand and your program is optimized, only then is it a good idea to consider OpenMP. That's because if the algorithms need modifications, you may need to write new parallelized code all over again.