Primes

I'm sure you know primes. You know, positive integers who have exactly two positive divisors. You're probably familiar, for example, that $3$, $11$ and $2$ are primes, while $1000$, $24$ and $1$ are not primes. Note that negative numbers are not primes by definition. (although there are ways to define primes for negative numbers, we won't deal with those for now.)

Prime numbers are a favorite of mathematicians. You'll find them even in the deepest level of their study, and one of the most famous unsolved problems in math, the Riemann hypothesis, is deeply related to primes. This is because when you list the primes in order, $$2,3,5,7,11,13,17,19,23,29,31,\ldots$$

there seems to be no discernible pattern. Yet these numbers have so many deep and interesting properties!

Here's an example of a famous and interesting theorem about primes.

Bertrand's postulate (weaker version). For every integer $n > 1$, there is a prime strictly between $n$ and $2n$.

Now, this is quite amazing, given that the primes themselves look like a random sequence. Who's to say the primes can't form a gap large enough to violate this theorem? And yet, this theorem is true, and even though the primes look random, they still exhibit some properties that don't seem to be random.

But we won't go about proving it. Don't worry!

Properties of primes

Let's begin with some properties of primes.

Proposition 1. If $p$ is a prime and $p \mid ab$, then $p \mid a$ or $p \mid b$.

This sounds intuitive enough, and if you try a couple of $p$s, $a$s and $b$s you'll find that the proposition is true. (This includes $0$ and negative numbers.) But the proof requires knowledge of GCDs so we'll probably not prove this proposition. You can try proving it later if you want, but you may proceed as long as you are already convinced that it is true.

A usual next step is to generalize it into the following:


Proposition 2. If $p$ is a prime and $p \mid a_1a_2\cdots a_k$, then $p \mid a_i$ for some $1 \le i \le k$.

Proof. We will prove this by induction on $k$. What this means is that we will prove using induction using the predicate $P(k)$ which means “If $p$ is a prime and $p \mid a_1a_2\cdots a_k$, then $p \mid a_i$ for some $1 \le i \le k$.”

For the base case, $k = 1$, and it's very easy to check this case, because $a_1a_2\cdots a_k$ is just $a_1$. So $P(1)$ is true.

Now, for the inductive case, fix $k \ge 1$ and assume the $P(k)$ is true. We will now prove $P(k + 1)$. Let $p$ be a prime dividing $p \mid a_1a_2\cdots a_{k+1}$. Note that $a_1a_2\cdots a_{k+1} = (a_1a_2\cdots a_k) a_{k+1}$, so $p \mid (a_1a_2\cdots a_k) a_{k+1}$. By proposition $1$, either $p \mid (a_1a_2\cdots a_k)$ or $p \mid a_{k+1}$. In the latter case, just set $i = k+1$ and we are done. In the former case, by the inductive hypothesis we know that $p \mid a_i$ for some $1 \le i \le k$, and we are done.

You were probably able to convince yourself that proposition 1 is true because you're familiar with the concept of prime factorization, and the following theorem.

Fundamental Theorem of Arithmetic. Every integer can be factored into a product of primes uniquely, disregarding order.

This is also so intuitive and familiar that we won't prove it here. But if you want to prove it yourself, you can use strong induction. But you also have to prove that the factorization is unique, and you can do this with a clever use of proposition 2. Anyway, I'm sure you know this theorem by heart.

Prime check

Given a number $n$, how do we check if it is a prime? The simplest way is to check if there is a factor from $2$ to $n-1$, as in the following code:

  1. bool is_prime(long long n) {
  2. if (n <= 1) {
  3. // edge cases
  4. return false;
  5. }
  6. // find a proper nontrivial divisor
  7. for (long long d = 2; d < n; d++) {
  8. if (n % d == 0) {
  9. return false;
  10. }
  11. }
  12. return true;
  13. }

Note the check n <= 1 in the beginning. This is very important, otherwise you might get some weird hard-to-fix bugs, that will eat up a lot of your time. Always check the constraints; although the judge data will surely follow the constaints, there may still be possible tricky edge cases!

Unfortunately, the code above becomes really slow once you reach, say, 12 digits. Try running the code for n = 100000000003 You might find that the code runs so slowly, and you might get tired of waiting for it to finish!

In fact, in programming contests where the typical time limits are a few seconds, the code above becomes unusable for $n$s with just 10 digits. And, if you need to perform many prime checks, then it might not even be usable for $n$ up to 6 digits! All this because of the fact that the code above runs in $O(n)$. Clearly, we need to improve it.

It turns out that all we need is a simple observation.

Proposition 3. If $n$ is composite, then it has a factor between $2$ and $\sqrt{n}$.

Proof. Since $n$ is composite, then we can factorize it into $n = ab$ where $1 < a, b < n$. Now, if $a$ and $b$ were both greater than $\sqrt{n}$, then $n = ab > \sqrt{n}\sqrt{n} = n$, which is a contradiction. Therefore, one of them must be $\le \sqrt{n}$, and we are done!

This simple observation drastically improves our algorithm! Specifically, we now have an $O(\sqrt{n})$ algorithm. See the following implementation (and notice the minimal change from the previous code):

  1. bool is_prime(long long n) {
  2. if (n <= 1) {
  3. // edge cases
  4. return false;
  5. }
  6. // find a proper nontrivial divisor
  7. for (long long d = 2; d*d <= n; d++) {
  8. if (n % d == 0) {
  9. return false;
  10. }
  11. }
  12. return true;
  13. }

With this, we can now test for primality for $n$ up to 12 digits!

We can actually improve this some more once we learn about sieves, and then later again when we learn about modular arithmetic.

Prime factorization

Let's extend the problem: Given $n$, find a prime factor of $n$. This is actually easy when you notice that the code above stops when d reaches the smallest divisor of n, and that the smallest nontrivial divisor of any number is prime! (Can you see why?) This means that the code above can be easily updated for this task. See the following:

  1. long long find_prime_factor(long long n) {
  2. if (n <= 1) {
  3. // edge cases
  4. return -1;
  5. }
  6. // find a proper nontrivial divisor
  7. for (long long d = 2; d*d <= n; d++) {
  8. if (n % d == 0) {
  9. // d is now the smallest prime factor
  10. return d;
  11. }
  12. }
  13. // n must be prime
  14. return n;
  15. }

Not only does this code returns a prime factor, but it returns the smallest prime factor! Also, this still runs in $O(\sqrt{n})$ time.

So let's extend the problem again: Given $n$, find all the prime factors of $n$. In other words, find the prime factorization of $n$.

But we can actually use our find_prime_factor code to extract each prime factor individually! See the following:

  1. #include <vector>
  2. using namespace std;
  3. ...
  4. vector<long long> prime_factorize(long long n) {
  5. vector<long long> primes;
  6. // extract prime factors
  7. while (n > 1) {
  8. long long p = find_prime_factor(n);
  9. primes.push_back(p);
  10. n /= p;
  11. }
  12. return primes;
  13. }

This simply calls find_prime_factor repeatedly until we extract all prime factors, and then returns them as a list (std::vector). This certainly works, but it's kinda inefficient, because it restarts the search of a prime factor at d = 2 every time we find one. We can improve this with a simple modification of the find_prime_factor code:

  1. vector<long long> prime_factorize(long long n) {
  2. vector<long long> primes;
  3. for (long long d = 2; d*d <= n; d++) {
  4. while (n % d == 0) {
  5. primes.push_back(d);
  6. n /= d;
  7. }
  8. }
  9. if (n > 1) {
  10. // n must be prime
  11. primes.push_back(n);
  12. }
  13. return primes;
  14. }

Notice that the if statement inside the loop is now a while loop, because a prime factor can appear multiple times in the prime factorization! With this improvement, our prime factorization code now runs in $O(\sqrt{n})$ time!

Prime factorization and “divides”

The prime factorization actually gives us another way of knowing whether $a \mid b$. For example, can you tell whether $a \mid b$ when:

These are very large numbers, so they won't fit inside a long long variable.

Now, by definition $a \mid b$ if $ac = b$ for some $c$, so let's find this $c$. In the first example, the only $c$ satisfying $ac = b$ is the integer $c = 2^{85}3^5 7^4 11$, so it's indeed true that $a \mid b$. (Verify that $ac = b$ indeed.) In the second example, the only $c$ satisfying $ac = b$ is $c = \displaystyle\frac{2^{85} 3^5 7^4 11}{5^2}$. But this isn't an integer! (Why? Hint: Try using Proposition 2.) Therefore, we have shown that $a \nmid b$ (because there is no integer $c$ such that $ac = b$).

What we learn here is that $a \mid b$ if and only if the exponent of every prime in the prime factorization of $a$ is $\le$ its exponent in the prime factorization of $b$, which you probably already know!

We can state this in a more precise way. For a prime $p$, let $\operatorname{ord}_p n$ be the exponent of $p$ in the prime factorization of $n$. For example, $\operatorname{ord}_2 60 = 4$, $\operatorname{ord}_3 60 = 1$ and $\operatorname{ord}_7 60 = 0$. Then $a \mid b$ if and only if $\operatorname{ord}_p a \le \operatorname{ord}_p b$ for every prime $p$.

Problems