# Sieves

Given $n$, what are all the primes from $1$ to $n$? You might be thinking, “That's easy! I'll just use is_prime $n$ times.”

vector<int> all_primes(int n) {
vector<int> primes;
for (int i = 1; i <= n; i++) {
if (is_prime(i)) {
primes.push_back(i);
}
}
return primes;
}


You can check that this works using the following code:

#include <stdio.h>
#include <vector>
using namespace std;

... // is_prime and all_primes definitions here

int main() {
int n = 100;
vector<int> primes = all_primes(n);

printf("the primes from 1 to %d are:", n);
for (int i = 0; i < primes.size(); i++) {
printf(" %d", primes[i]);
}
printf("\n");
}


And you're right. This code works. Unfortunately, you can only use this algorithm for up to a certain $n$ before it becomes so slow. For example, when i run it with $n = 3000000$, it takes more than $4.5$ seconds in my machine. This is understandable; our best is_prime so far runs in $O(\sqrt{n})$ in the worst case, so this runs in $O(n\sqrt{n})$ time. So we want a faster solution.

## Eratosthenes

Our fast solution for this problem will actually come from 2000 years in the past. (Way before computers were invented.) The sieve of Eratosthenes is a method of doing exactly this, but much faster. It was attributed to Eratosthenes of Cyrene. The idea is to start by marking all numbers from $2$ to $n$ as “prime”, and then iterating through all known primes and marking their proper multiples as “composite”.

Specifically,

• Mark all numbers from $2$ to $n$ as “prime” initially.
• Mark all proper multiples of $2$ as composite. (A proper multiple means a multiple is not equal to itself.)
• The next smallest number marked prime is now $3$, so we're sure it's a real prime. Mark all proper multiples of $3$ as composite.
• The next smallest number marked prime is now $5$, so we're sure it's a real prime. Mark all proper multiples of $5$ as composite.
• The next smallest number marked prime is now $7$, so we're sure it's a real prime. Mark all proper multiples of $7$ as composite.
• The next smallest number marked prime is now $11$, so we're sure it's a real prime. etc.

The reason this works is that every composite number $x$ has a prime factor $p$ smaller than it, so when we reach $p$ in the algorithm, $x$ will be a proper multiple of $p$, so $x$ will be marked composite. (In fact, $x$ will be marked “composite” multiple times, once for each distinct prime factor.) On the other hand, primes don't have prime factors smaller than them, so they will never be marked composite.

For example, suppose we want to know all primes up to $25$. Here's how the algorithm will go:

$$\require{cancel} \require{enclose} \begin{array}{r|ccccccccccccccccccccccccccccc} \text{Initially} & 2&3&4&5&6&7&8&9&10&11&12&13&14&15&16&17&18&19&20&21&22&23&24&25\\ \text{2 is prime} & \enclose{circle}{2}&3&\mathbf{\cancel{4}}&5&\mathbf{\cancel{6}}&7&\mathbf{\cancel{8}}&9&\mathbf{\cancel{10}}&11&\mathbf{\cancel{12}}&13&\mathbf{\cancel{14}}&15&\mathbf{\cancel{16}}&17&\mathbf{\cancel{18}}&19&\mathbf{\cancel{20}}&21&\mathbf{\cancel{22}}&23&\mathbf{\cancel{24}}&25\\ \text{3 is prime} & \color{gray}{\enclose{circle}{2}}&\enclose{circle}{3}&\color{gray}{\cancel{4}}&5&\mathbf{\cancel{6}}&7&\color{gray}{\cancel{8}}&\mathbf{\cancel{9}}&\color{gray}{\cancel{10}}&11&\mathbf{\cancel{12}}&13&\color{gray}{\cancel{14}}&\mathbf{\cancel{15}}&\color{gray}{\cancel{16}}&17&\mathbf{\cancel{18}}&19&\color{gray}{\cancel{20}}&\mathbf{\cancel{21}}&\color{gray}{\cancel{22}}&23&\mathbf{\cancel{24}}&25\\ \text{5 is prime} & \color{gray}{\enclose{circle}{2}}&\color{gray}{\enclose{circle}{3}}&\color{gray}{\cancel{4}}&\enclose{circle}{5}&\color{gray}{\cancel{6}}&7&\color{gray}{\cancel{8}}&\color{gray}{\cancel{9}}&\mathbf{\cancel{10}}&11&\color{gray}{\cancel{12}}&13&\color{gray}{\cancel{14}}&\mathbf{\cancel{15}}&\color{gray}{\cancel{16}}&17&\color{gray}{\cancel{18}}&19&\mathbf{\cancel{20}}&\color{gray}{\cancel{21}}&\color{gray}{\cancel{22}}&23&\color{gray}{\cancel{24}}&\mathbf{\cancel{25}}\\ \text{7 is prime} & \color{gray}{\enclose{circle}{2}}&\color{gray}{\enclose{circle}{3}}&\color{gray}{\cancel{4}}&\color{gray}{\enclose{circle}{5}}&\color{gray}{\cancel{6}}&\enclose{circle}{7}&\color{gray}{\cancel{8}}&\color{gray}{\cancel{9}}&\color{gray}{\cancel{10}}&11&\color{gray}{\cancel{12}}&13&\mathbf{\cancel{14}}&\color{gray}{\cancel{15}}&\color{gray}{\cancel{16}}&17&\color{gray}{\cancel{18}}&19&\color{gray}{\cancel{20}}&\mathbf{\cancel{21}}&\color{gray}{\cancel{22}}&23&\color{gray}{\cancel{24}}&\color{gray}{\cancel{25}}\\ \text{11 is prime} & \color{gray}{\enclose{circle}{2}}&\color{gray}{\enclose{circle}{3}}&\color{gray}{\cancel{4}}&\color{gray}{\enclose{circle}{5}}&\color{gray}{\cancel{6}}&\color{gray}{\enclose{circle}{7}}&\color{gray}{\cancel{8}}&\color{gray}{\cancel{9}}&\color{gray}{\cancel{10}}&\enclose{circle}{11}&\color{gray}{\cancel{12}}&13&\color{gray}{\cancel{14}}&\color{gray}{\cancel{15}}&\color{gray}{\cancel{16}}&17&\color{gray}{\cancel{18}}&19&\color{gray}{\cancel{20}}&\color{gray}{\cancel{21}}&\mathbf{\cancel{22}}&23&\color{gray}{\cancel{24}}&\color{gray}{\cancel{25}}\\ \text{13 is prime} & \color{gray}{\enclose{circle}{2}}&\color{gray}{\enclose{circle}{3}}&\color{gray}{\cancel{4}}&\color{gray}{\enclose{circle}{5}}&\color{gray}{\cancel{6}}&\color{gray}{\enclose{circle}{7}}&\color{gray}{\cancel{8}}&\color{gray}{\cancel{9}}&\color{gray}{\cancel{10}}&\color{gray}{\enclose{circle}{11}}&\color{gray}{\cancel{12}}&\enclose{circle}{13}&\color{gray}{\cancel{14}}&\color{gray}{\cancel{15}}&\color{gray}{\cancel{16}}&17&\color{gray}{\cancel{18}}&19&\color{gray}{\cancel{20}}&\color{gray}{\cancel{21}}&\color{gray}{\cancel{22}}&23&\color{gray}{\cancel{24}}&\color{gray}{\cancel{25}}\\ \text{17 is prime} & \color{gray}{\enclose{circle}{2}}&\color{gray}{\enclose{circle}{3}}&\color{gray}{\cancel{4}}&\color{gray}{\enclose{circle}{5}}&\color{gray}{\cancel{6}}&\color{gray}{\enclose{circle}{7}}&\color{gray}{\cancel{8}}&\color{gray}{\cancel{9}}&\color{gray}{\cancel{10}}&\color{gray}{\enclose{circle}{11}}&\color{gray}{\cancel{12}}&\color{gray}{\enclose{circle}{13}}&\color{gray}{\cancel{14}}&\color{gray}{\cancel{15}}&\color{gray}{\cancel{16}}&\enclose{circle}{17}&\color{gray}{\cancel{18}}&19&\color{gray}{\cancel{20}}&\color{gray}{\cancel{21}}&\color{gray}{\cancel{22}}&23&\color{gray}{\cancel{24}}&\color{gray}{\cancel{25}}\\ \text{19 is prime} & \color{gray}{\enclose{circle}{2}}&\color{gray}{\enclose{circle}{3}}&\color{gray}{\cancel{4}}&\color{gray}{\enclose{circle}{5}}&\color{gray}{\cancel{6}}&\color{gray}{\enclose{circle}{7}}&\color{gray}{\cancel{8}}&\color{gray}{\cancel{9}}&\color{gray}{\cancel{10}}&\color{gray}{\enclose{circle}{11}}&\color{gray}{\cancel{12}}&\color{gray}{\enclose{circle}{13}}&\color{gray}{\cancel{14}}&\color{gray}{\cancel{15}}&\color{gray}{\cancel{16}}&\color{gray}{\enclose{circle}{17}}&\color{gray}{\cancel{18}}&\enclose{circle}{19}&\color{gray}{\cancel{20}}&\color{gray}{\cancel{21}}&\color{gray}{\cancel{22}}&23&\color{gray}{\cancel{24}}&\color{gray}{\cancel{25}}\\ \text{23 is prime} & \color{gray}{\enclose{circle}{2}}&\color{gray}{\enclose{circle}{3}}&\color{gray}{\cancel{4}}&\color{gray}{\enclose{circle}{5}}&\color{gray}{\cancel{6}}&\color{gray}{\enclose{circle}{7}}&\color{gray}{\cancel{8}}&\color{gray}{\cancel{9}}&\color{gray}{\cancel{10}}&\color{gray}{\enclose{circle}{11}}&\color{gray}{\cancel{12}}&\color{gray}{\enclose{circle}{13}}&\color{gray}{\cancel{14}}&\color{gray}{\cancel{15}}&\color{gray}{\cancel{16}}&\color{gray}{\enclose{circle}{17}}&\color{gray}{\cancel{18}}&\color{gray}{\enclose{circle}{19}}&\color{gray}{\cancel{20}}&\color{gray}{\cancel{21}}&\color{gray}{\cancel{22}}&\enclose{circle}{23}&\color{gray}{\cancel{24}}&\color{gray}{\cancel{25}}\\ \end{array}$$

We're simply crossing off proper multiples of primes because we know they're not primes any more, and as you can see, at the last step, we have a correct list of primes up to $25$!

Implementing this is really simple: Just create an array is_prime of length $n$, and use this array to “cross off” composite numbers. Here's a sample implementation:

vector<int> all_primes(int n) {
vector<int> primes;

// allocate the is_prime array.
// you can also preallocate this globally if you want.
bool *is_prime = new bool[n+1];

// initialize
for (int i = 2; i <= n; i++) is_prime[i] = true;

// main algorithm
for (int i = 2; i <= n; i++) {
if (is_prime[i]) {
// 'i' is the next prime.
primes.push_back(i);

// proper multiples of 'i'
for (int j = 2*i; j <= n; j += i) {
is_prime[j] = false;
}
}
}

// free is_prime to avoid memory leak.
// don't free if you allocated globally
free(is_prime);

return primes;
}


## Time complexity

The reason the sieve is better is that it's faster. Specifically, we only need to iterate through proper multiples of every prime $p$. Since there are $\lfloor n/p \rfloor$ multiples of $p$ not exceeding $n$, the running time is proportional to $$\left\lfloor \frac{n}{2} \right\rfloor + \left\lfloor \frac{n}{3} \right\rfloor + \left\lfloor \frac{n}{5} \right\rfloor + \left\lfloor \frac{n}{7} \right\rfloor + \ldots + \left\lfloor \frac{n}{p_i} \right\rfloor + \ldots$$ where the sum runs through all primes $p_i \le n$. Using some clever maths, we can derive that this is $O(n \log \log n)$, which is clearly better than $O(n\sqrt{n})$. Note that $\log n$ is a slow-growing function, so $\log \log n$ is even more slow-growing.

Try running the sieve for $n = 3000000$ and see how much faster it is! In my laptop it runs in $.06$ seconds, approximately $75$ times faster than before!

## Optimizations

We can actually optimize this sieve in a few ways. First, for a prime $i$, instead of marking $2i$, $3i$, $4i$, ..., we can just mark $i^2$, $i^2+i$, $i^2+2i$, $i^2+3i$, ... . The reason is that every composite number $x$ has a prime factor $\le \sqrt{x}$ (as proven in the previous section), so for every prime $i$, we only need to mark its multiples $\ge i^2$.

A second optimization is to initially only mark odd numbers as prime, i.e. mark all even numbers (except $2$) composite. And then for every odd prime $i$, we increment by $2i$ instead of $i$, avoiding all even numbers. In fact, we can save half the memory needed by only allocating is_prime for odd numbers.

These optimizations can be seen below:

vector<int> all_primes(int n) {
// the old 'is_prime[i]' now corresponds to 'is_prime[i/2]'
bool *is_prime = new bool[n/2+1];

for (int i = 2; i <= n; i++) is_prime[i/2] = true;

for (int i = 3; i*i <= n; i += 2) {           // only iterate through odd 'i'
if (is_prime[i/2]) {
for (int j = i*i; j <= n; j += 2*i) { // odd multiples >= i^2
is_prime[j/2] = false;
}
}
}

// extract the primes
vector<int> primes;
primes.push_back(2);                          // add '2' as prime
for (int i = 3; i <= n; i += 2) {
if (is_prime[i/2]) {
primes.push_back(i);
}
}

free(is_prime);

return primes;
}


If you have the courage, you can even optimize this some more by only considering numbers not divisible by $2$ or $3$! It's rare for anyone to need to do this, though.

## Modified sieves

This “sieve” procedure is actually somewhat flexible. For example, it can be modified to compute one prime factor for every number up to $n$. Study the following code to understand how:

// assume n <= one million. change this number if needed
int pfac;

void compute_prime_factors(int n) {
for (int i = 2; i <= n; i++) pfac[i] = 0;

for (int i = 2; i <= n; i++) {
if (pfac[i] == 0) {                     // this means i is prime
pfac[i] = i;                        // set i's prime factor to itself
for (int j = i*i; j <= n; j += i) { // iterate through proper multiples
pfac[j] = i;                    // set i as the prime factor
}
}
}
}


It can also be modified to compute other things, such as divisor counts, divisor sums, and even Euler's totient function (if you're familiar with it).

As a side effect, being able to compute a prime factor allows us to prime factorize every number up to $n$ quickly, using one of our older algorithms:

int pfac;

... // compute_prime_factors here

vector<int> prime_factorize(int n) {
vector<int> primes;
// extract prime factors
while (n > 1) {
int p = pfac[n]; // use pfac to get a prime factor
primes.push_back(p);
n /= p;
}
return primes;
}


This is our original prime factorization algorithms, but much faster because we use our computed pfac to extract a prime factor quickly. This runs very quickly, though it only works for numbers $\le n$.

Don't for get to call compute_prime_factors first for it work properly!