If $n \geq 0$ is an integer, define the number $n!$, pronounced “$n$ factorial”, as the number of bijections from an $n$-element set to itself. Therefore, $0! = 1$ and if $n > 0$ then

$$n! = 1 \times \cdots \times n.$$

What’s the best way to calculate this quantity? The following “First Method” could possibly be the most straightforward method in the C language:

```c
int factorialN(int n){
    int result = 1;
    for (int i = 1; i <= n; i++){
        result *= i;
    }
    return(result);
}
```

On the other hand, one might be tempted to reduce the number of iterations through the loop and try the following “Second Method”:

```c
int factorialD(int n){
    int result = 1;
    int max;
    if (n%2 == 0){
        max = n/2 + 1;
    }
    else{
        max = (n+1)/2;
        result = max;
    }
    for (int i = 1; i < max; i++){
        result *= i*(n + 1 - i);
    }
    return(result);
}
```

If $n$ is even, for example, this algorithm computes $[1\cdot n]\times[2\cdot(n-1)]\times[3\cdot(n-2)]\cdots$, where each bracketed multiplication is done in one loop iteration. So it cuts the number of iterations roughly in half. Of course, the total number of multiplications stays the same.
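For instance, with $n = 6$ the pairing gives

$$[1\cdot 6]\times[2\cdot 5]\times[3\cdot 4] = 6 \times 10 \times 12 = 720 = 6!,$$

computed in three loop iterations instead of six.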

To test the two programs, I wrote a main function that computed the sum $1! + 2! + \cdots + 20!$, adding to the sum each factorial computed with one of the above functions. Of course, on today’s machines such a computation is extremely fast, so I made the program execute this summation a million times so that the running time can actually be measured. Two programs were compiled, one for each version of the factorial, using the **-O2** compilation switch in **gcc**.

Then, a Python program used the **time** program to execute the first binary and then the second, and this pair of runs was repeated five hundred times. To test whether the results differed, I decided on a Mann–Whitney–Wilcoxon two-sample test, a nonparametric statistical test of whether two samples come from the same distribution. R reports the following 95% confidence interval (in seconds) for the difference between the first and the second algorithms: [0.02996523, 0.03003752], with a p-value of essentially zero. So there seems to be a small but significant difference between the two methods — in fact, the total time saved by the second method over the five hundred runs was about 14.5 seconds.

On the other hand, one should be careful about performing this kind of “hack” optimization in code. Compiling both programs with the **-Ofast** switch gives a significant advantage to the first, simpler version, with the following 95% confidence interval (in seconds) for the difference between the first and the second: [-0.06005062, -0.05997876]. Here is a summary of the means:

| | -O2 | -Ofast |
|---|---|---|
| First Method | 0.216 | 0.136 |
| Second Method | 0.187 | 0.195 |

I don’t know enough about compilers to explain what’s going on here properly, but I’d guess it has something to do with loop optimization and vectorization that is easier for the compiler to handle with the simpler loop.