Sieves

Given $n$, what are all the primes from $1$ to $n$? You might be thinking, “That's easy! I'll just use is_prime $n$ times.”

  1. vector<int> all_primes(int n) {
  2. vector<int> primes;
  3. for (int i = 1; i <= n; i++) {
  4. if (is_prime(i)) {
  5. primes.push_back(i);
  6. }
  7. }
  8. return primes;
  9. }

You can check that this works using the following code:

  1. #include <stdio.h>
  2. #include <vector>
  3. using namespace std;
  4. ... // is_prime and all_primes definitions here
  5. int main() {
  6. int n = 100;
  7. vector<int> primes = all_primes(n);
  8. printf("the primes from 1 to %d are:", n);
  9. for (int i = 0; i < primes.size(); i++) {
  10. printf(" %d", primes[i]);
  11. }
  12. printf("\n");
  13. }

And you're right. This code works. Unfortunately, you can only use this algorithm for up to a certain $n$ before it becomes so slow. For example, when i run it with $n = 3000000$, it takes more than $4.5$ seconds in my machine. This is understandable; our best is_prime so far runs in $O(\sqrt{n})$ in the worst case, so this runs in $O(n\sqrt{n})$ time. So we want a faster solution.

Eratosthenes

Our fast solution for this problem will actually come from 2000 years in the past. (Way before computers were invented.) The sieve of Eratosthenes is a method of doing exactly this, but much faster. It was attributed to Eratosthenes of Cyrene. The idea is to start by marking all numbers from $2$ to $n$ as “prime”, and then iterating through all known primes and marking their proper multiples as “composite”.

Specifically,

The reason this works is that every composite number $x$ has a prime factor $p$ smaller than it, so when we reach $p$ in the algorithm, $x$ will be a proper multiple of $p$, so $x$ will be marked composite. (In fact, $x$ will be marked “composite” multiple times, once for each distinct prime factor.) On the other hand, primes don't have prime factors smaller than them, so they will never be marked composite.

For example, suppose we want to know all primes up to $25$. Here's how the algorithm will go:

$$\require{cancel} \require{enclose} \begin{array}{r|ccccccccccccccccccccccccccccc} \text{Initially} & 2&3&4&5&6&7&8&9&10&11&12&13&14&15&16&17&18&19&20&21&22&23&24&25\\ \text{$2$ is prime} & \enclose{circle}{2}&3&\mathbf{\cancel{4}}&5&\mathbf{\cancel{6}}&7&\mathbf{\cancel{8}}&9&\mathbf{\cancel{10}}&11&\mathbf{\cancel{12}}&13&\mathbf{\cancel{14}}&15&\mathbf{\cancel{16}}&17&\mathbf{\cancel{18}}&19&\mathbf{\cancel{20}}&21&\mathbf{\cancel{22}}&23&\mathbf{\cancel{24}}&25\\ \text{$3$ is prime} & \color{gray}{\enclose{circle}{2}}&\enclose{circle}{3}&\color{gray}{\cancel{4}}&5&\mathbf{\cancel{6}}&7&\color{gray}{\cancel{8}}&\mathbf{\cancel{9}}&\color{gray}{\cancel{10}}&11&\mathbf{\cancel{12}}&13&\color{gray}{\cancel{14}}&\mathbf{\cancel{15}}&\color{gray}{\cancel{16}}&17&\mathbf{\cancel{18}}&19&\color{gray}{\cancel{20}}&\mathbf{\cancel{21}}&\color{gray}{\cancel{22}}&23&\mathbf{\cancel{24}}&25\\ \text{$5$ is prime} & \color{gray}{\enclose{circle}{2}}&\color{gray}{\enclose{circle}{3}}&\color{gray}{\cancel{4}}&\enclose{circle}{5}&\color{gray}{\cancel{6}}&7&\color{gray}{\cancel{8}}&\color{gray}{\cancel{9}}&\mathbf{\cancel{10}}&11&\color{gray}{\cancel{12}}&13&\color{gray}{\cancel{14}}&\mathbf{\cancel{15}}&\color{gray}{\cancel{16}}&17&\color{gray}{\cancel{18}}&19&\mathbf{\cancel{20}}&\color{gray}{\cancel{21}}&\color{gray}{\cancel{22}}&23&\color{gray}{\cancel{24}}&\mathbf{\cancel{25}}\\ \text{$7$ is prime} & \color{gray}{\enclose{circle}{2}}&\color{gray}{\enclose{circle}{3}}&\color{gray}{\cancel{4}}&\color{gray}{\enclose{circle}{5}}&\color{gray}{\cancel{6}}&\enclose{circle}{7}&\color{gray}{\cancel{8}}&\color{gray}{\cancel{9}}&\color{gray}{\cancel{10}}&11&\color{gray}{\cancel{12}}&13&\mathbf{\cancel{14}}&\color{gray}{\cancel{15}}&\color{gray}{\cancel{16}}&17&\color{gray}{\cancel{18}}&19&\color{gray}{\cancel{20}}&\mathbf{\cancel{21}}&\color{gray}{\cancel{22}}&23&\color{gray}{\cancel{24}}&\color{gray}{\cancel{25}}\\ \text{$11$ is prime} & \color{gray}{\enclose{circle}{2}}&\color{gray}{\enclose{circle}{3}}&\color{gray}{\cancel{4}}&\color{gray}{\enclose{circle}{5}}&\color{gray}{\cancel{6}}&\color{gray}{\enclose{circle}{7}}&\color{gray}{\cancel{8}}&\color{gray}{\cancel{9}}&\color{gray}{\cancel{10}}&\enclose{circle}{11}&\color{gray}{\cancel{12}}&13&\color{gray}{\cancel{14}}&\color{gray}{\cancel{15}}&\color{gray}{\cancel{16}}&17&\color{gray}{\cancel{18}}&19&\color{gray}{\cancel{20}}&\color{gray}{\cancel{21}}&\mathbf{\cancel{22}}&23&\color{gray}{\cancel{24}}&\color{gray}{\cancel{25}}\\ \text{$13$ is prime} & \color{gray}{\enclose{circle}{2}}&\color{gray}{\enclose{circle}{3}}&\color{gray}{\cancel{4}}&\color{gray}{\enclose{circle}{5}}&\color{gray}{\cancel{6}}&\color{gray}{\enclose{circle}{7}}&\color{gray}{\cancel{8}}&\color{gray}{\cancel{9}}&\color{gray}{\cancel{10}}&\color{gray}{\enclose{circle}{11}}&\color{gray}{\cancel{12}}&\enclose{circle}{13}&\color{gray}{\cancel{14}}&\color{gray}{\cancel{15}}&\color{gray}{\cancel{16}}&17&\color{gray}{\cancel{18}}&19&\color{gray}{\cancel{20}}&\color{gray}{\cancel{21}}&\color{gray}{\cancel{22}}&23&\color{gray}{\cancel{24}}&\color{gray}{\cancel{25}}\\ \text{$17$ is prime} & \color{gray}{\enclose{circle}{2}}&\color{gray}{\enclose{circle}{3}}&\color{gray}{\cancel{4}}&\color{gray}{\enclose{circle}{5}}&\color{gray}{\cancel{6}}&\color{gray}{\enclose{circle}{7}}&\color{gray}{\cancel{8}}&\color{gray}{\cancel{9}}&\color{gray}{\cancel{10}}&\color{gray}{\enclose{circle}{11}}&\color{gray}{\cancel{12}}&\color{gray}{\enclose{circle}{13}}&\color{gray}{\cancel{14}}&\color{gray}{\cancel{15}}&\color{gray}{\cancel{16}}&\enclose{circle}{17}&\color{gray}{\cancel{18}}&19&\color{gray}{\cancel{20}}&\color{gray}{\cancel{21}}&\color{gray}{\cancel{22}}&23&\color{gray}{\cancel{24}}&\color{gray}{\cancel{25}}\\ \text{$19$ is prime} & \color{gray}{\enclose{circle}{2}}&\color{gray}{\enclose{circle}{3}}&\color{gray}{\cancel{4}}&\color{gray}{\enclose{circle}{5}}&\color{gray}{\cancel{6}}&\color{gray}{\enclose{circle}{7}}&\color{gray}{\cancel{8}}&\color{gray}{\cancel{9}}&\color{gray}{\cancel{10}}&\color{gray}{\enclose{circle}{11}}&\color{gray}{\cancel{12}}&\color{gray}{\enclose{circle}{13}}&\color{gray}{\cancel{14}}&\color{gray}{\cancel{15}}&\color{gray}{\cancel{16}}&\color{gray}{\enclose{circle}{17}}&\color{gray}{\cancel{18}}&\enclose{circle}{19}&\color{gray}{\cancel{20}}&\color{gray}{\cancel{21}}&\color{gray}{\cancel{22}}&23&\color{gray}{\cancel{24}}&\color{gray}{\cancel{25}}\\ \text{$23$ is prime} & \color{gray}{\enclose{circle}{2}}&\color{gray}{\enclose{circle}{3}}&\color{gray}{\cancel{4}}&\color{gray}{\enclose{circle}{5}}&\color{gray}{\cancel{6}}&\color{gray}{\enclose{circle}{7}}&\color{gray}{\cancel{8}}&\color{gray}{\cancel{9}}&\color{gray}{\cancel{10}}&\color{gray}{\enclose{circle}{11}}&\color{gray}{\cancel{12}}&\color{gray}{\enclose{circle}{13}}&\color{gray}{\cancel{14}}&\color{gray}{\cancel{15}}&\color{gray}{\cancel{16}}&\color{gray}{\enclose{circle}{17}}&\color{gray}{\cancel{18}}&\color{gray}{\enclose{circle}{19}}&\color{gray}{\cancel{20}}&\color{gray}{\cancel{21}}&\color{gray}{\cancel{22}}&\enclose{circle}{23}&\color{gray}{\cancel{24}}&\color{gray}{\cancel{25}}\\ \end{array}$$

We're simply crossing off proper multiples of primes because we know they're not primes any more, and as you can see, at the last step, we have a correct list of primes up to $25$!

Implementing this is really simple: Just create an array is_prime of length $n$, and use this array to “cross off” composite numbers. Here's a sample implementation:

  1. vector<int> all_primes(int n) {
  2. vector<int> primes;
  3. // allocate the is_prime array.
  4. // you can also preallocate this globally if you want.
  5. bool *is_prime = new bool[n+1];
  6. // initialize
  7. for (int i = 2; i <= n; i++) is_prime[i] = true;
  8. // main algorithm
  9. for (int i = 2; i <= n; i++) {
  10. if (is_prime[i]) {
  11. // 'i' is the next prime.
  12. primes.push_back(i);
  13. // proper multiples of 'i'
  14. for (int j = 2*i; j <= n; j += i) {
  15. is_prime[j] = false;
  16. }
  17. }
  18. }
  19. // free is_prime to avoid memory leak.
  20. // don't free if you allocated globally
  21. free(is_prime);
  22. return primes;
  23. }

Time complexity

The reason the sieve is better is that it's faster. Specifically, we only need to iterate through proper multiples of every prime $p$. Since there are $\lfloor n/p \rfloor$ multiples of $p$ not exceeding $n$, the running time is proportional to $$\left\lfloor \frac{n}{2} \right\rfloor + \left\lfloor \frac{n}{3} \right\rfloor + \left\lfloor \frac{n}{5} \right\rfloor + \left\lfloor \frac{n}{7} \right\rfloor + \ldots + \left\lfloor \frac{n}{p_i} \right\rfloor + \ldots$$ where the sum runs through all primes $p_i \le n$. Using some clever maths, we can derive that this is $O(n \log \log n)$, which is clearly better than $O(n\sqrt{n})$. Note that $\log n$ is a slow-growing function, so $\log \log n$ is even more slow-growing.

Try running the sieve for $n = 3000000$ and see how much faster it is! In my laptop it runs in $.06$ seconds, approximately $75$ times faster than before!

Optimizations

We can actually optimize this sieve in a few ways. First, for a prime $i$, instead of marking $2i$, $3i$, $4i$, ..., we can just mark $i^2$, $i^2+i$, $i^2+2i$, $i^2+3i$, ... . The reason is that every composite number $x$ has a prime factor $\le \sqrt{x}$ (as proven in the previous section), so for every prime $i$, we only need to mark its multiples $\ge i^2$.

A second optimization is to initially only mark odd numbers as prime, i.e. mark all even numbers (except $2$) composite. And then for every odd prime $i$, we increment by $2i$ instead of $i$, avoiding all even numbers. In fact, we can save half the memory needed by only allocating is_prime for odd numbers.

These optimizations can be seen below:

  1. vector<int> all_primes(int n) {
  2. // the old 'is_prime[i]' now corresponds to 'is_prime[i/2]'
  3. bool *is_prime = new bool[n/2+1];
  4. for (int i = 2; i <= n; i++) is_prime[i/2] = true;
  5. for (int i = 3; i*i <= n; i += 2) { // only iterate through odd 'i'
  6. if (is_prime[i/2]) {
  7. for (int j = i*i; j <= n; j += 2*i) { // odd multiples >= i^2
  8. is_prime[j/2] = false;
  9. }
  10. }
  11. }
  12. // extract the primes
  13. vector<int> primes;
  14. primes.push_back(2); // add '2' as prime
  15. for (int i = 3; i <= n; i += 2) {
  16. if (is_prime[i/2]) {
  17. primes.push_back(i);
  18. }
  19. }
  20. free(is_prime);
  21. return primes;
  22. }

If you have the courage, you can even optimize this some more by only considering numbers not divisible by $2$ or $3$! It's rare for anyone to need to do this, though.

Modified sieves

This “sieve” procedure is actually somewhat flexible. For example, it can be modified to compute one prime factor for every number up to $n$. Study the following code to understand how:

  1. // assume n <= one million. change this number if needed
  2. int pfac[1000111];
  3. void compute_prime_factors(int n) {
  4. for (int i = 2; i <= n; i++) pfac[i] = 0;
  5. for (int i = 2; i <= n; i++) {
  6. if (pfac[i] == 0) { // this means i is prime
  7. pfac[i] = i; // set i's prime factor to itself
  8. for (int j = i*i; j <= n; j += i) { // iterate through proper multiples
  9. pfac[j] = i; // set i as the prime factor
  10. }
  11. }
  12. }
  13. }

It can also be modified to compute other things, such as divisor counts, divisor sums, and even Euler's totient function (if you're familiar with it).

As a side effect, being able to compute a prime factor allows us to prime factorize every number up to $n$ quickly, using one of our older algorithms:

  1. int pfac[1000111];
  2. ... // compute_prime_factors here
  3. vector<int> prime_factorize(int n) {
  4. vector<int> primes;
  5. // extract prime factors
  6. while (n > 1) {
  7. int p = pfac[n]; // use pfac to get a prime factor
  8. primes.push_back(p);
  9. n /= p;
  10. }
  11. return primes;
  12. }

This is our original prime factorization algorithms, but much faster because we use our computed pfac to extract a prime factor quickly. This runs very quickly, though it only works for numbers $\le n$.

Don't for get to call compute_prime_factors first for it work properly!

Problems