Greatest common divisor

The greatest common divisor of two integers $a$ and $b$ is defined as the largest number that is a divisor of $a$ and $b$. (Well, duh.) You might know it by the name greatest common factor. We abbreviate it as gcd, and we refer to the gcd of $a$ and $b$ as $\operatorname{gcd}(a,b)$. For example, $\operatorname{gcd}(18,24) = 6$, and $\operatorname{gcd}(28,15) = 1$.

Note that gcd is defined for negative numbers and zero! For example, $\operatorname{gcd}(-18,24) = 6$, and $\operatorname{gcd}(0,-11) = 11$. Actually, you'll find that gcd will be well defined for all pairs of integers. Except one. What is $\operatorname{gcd}(0,0)$? All integers are common divisors of $0$ and $0$, so what should we define $\operatorname{gcd}(0,0)$? Should we even define it?

Well, the usual convention is to define $\operatorname{gcd}(0,0) = 0$. I know this sounds weird, but this actually makes some sense. For example, by defining $\operatorname{gcd}(0,0) = 0$, we make all the following things true without making a special case for zeroes:

Associativity law. $\operatorname{gcd}(\operatorname{gcd}(a,b),c) = \operatorname{gcd}(a,\operatorname{gcd}(b,c)) = \operatorname{gcd}(a,b,c)$ for any $a$, $b$ and $c$.

(Almost) identity law. $\operatorname{gcd}(0,a) = \operatorname{gcd}(a,0) = \left|a\right|$ for any $a$.

There are many more examples.

Properties

The gcd has many nice properties. You have already seen some, and we'll enumerate more here. To make things easier to read, let's ignore negative numbers for now, because of the following property:

Proposition 1. $\operatorname{gcd}(a,b) = \operatorname{gcd}(|a|,|b|)$ for any $a$, $b$.

Most of these properties, including those we've seen so far, are quite intuitive, so we won't prove most of them. (You can try it yourself though.) Here are some more properties:

Commutativity law. $\operatorname{gcd}(a,b) = \operatorname{gcd}(b,a)$ for any $a$, $b$.

Proposition 2. $\operatorname{gcd}(a,0) = a$ for any $a \ge 0$.

Proposition 3. $\operatorname{gcd}(a,a) = a$ for any $a \ge 0$.

You might know this next one based on how you might be computing the gcd in school:

Proposition 4. Let $a$ and $b$ have the following prime factorizations: $$a = p_1^{e_1} p_2^{e_2} p_3^{e_3} p_4^{e_4} \cdots$$ $$b = p_1^{f_1} p_2^{f_2} p_3^{f_3} p_4^{f_4} \cdots$$

where the $p_i$ are all the primes, but only finitely many of the $e_i$s or $f_i$s are nonzero (i.e. if $p_i$ doesn't appear in $a$, set $e_i = 0$.) Then $$\operatorname{gcd}(a,b) = p_1^{\min(e_1,f_1)} p_2^{\min(e_2,f_2)} p_3^{\min(e_3,f_3)} p_4^{\min(e_4,f_4)} \cdots$$

This property is actually pretty strong. Using this one, we can prove many other properties of the gcd:

Proposition 5. If $d \mid a$ and $d \mid b$, then $d \mid \operatorname{gcd}(a,b)$.

Distributivity law. $k\operatorname{gcd}(a,b) = \operatorname{gcd}(ka,kb)$ for all $k$, $a$, $b$.

We can actually restate Proposition 4 more succinctly using $\operatorname{ord}$.

Proposition 4 (restated). For any prime $p$, $\operatorname{ord}_p(\operatorname{gcd}(a,b)) = \min(\operatorname{ord}_p(a),\operatorname{ord}_p(b))$.

Two numbers are called coprime if they don't have any common divisors other than $1$. This is equivalent to saying that their gcd is equal to $1$.

Proposition 6. $\operatorname{gcd}(a,b) = 1$ and $\operatorname{gcd}(a,c) = 1$ then $\operatorname{gcd}(a,bc) = 1$

Proposition 7. If $a \mid bc$ and $\operatorname{gcd}(a,b) = 1$, then $a \mid c$.

Computing the gcd

You might have encountered gcd when you needed to simplify fractions in a previous section, so you know that one way to implement it is the following:

  1. long long gcd(long long a, long long b) {
  2. if (a == 0 && b == 0) return 0; // special case
  3. for (long long i = std::min(a,b); i > 0; i--) {
  4. if (a % i == 0 && b % i == 0) {
  5. return i;
  6. }
  7. }
  8. return 1;
  9. }

But this is actually very slow! For example, try computing gcd(18000000000000000LL,24000000000000000LL). We know the answer must be 6000000000000000LL, but this will take 18000000000000000LL - 6000000000000000LL = 12000000000000000LL iterations, so you might be waiting for a while. We want a faster solution!

Euclid

You know Euclid? Yes, the geometry guy who wrote Elements. Actually, this Elements book also had number theory in it, and it also contains stuff about how to compute the gcd of two numbers efficiently! (Obviously way before computers were even invented.)

It relies on the following fact:


Proposition 8. For any pair of positive integers $a$, $b$, we have $\operatorname{gcd}(a,b) = \operatorname{gcd}(a-b,b)$.

Proof. It suffices to prove that $\operatorname{gcd}(a,b) \mid \operatorname{gcd}(a-b,b)$ and $\operatorname{gcd}(a-b,b) \mid \operatorname{gcd}(a,b)$. (This is one of the basic things about “divides” we enumerated before.)

First, we try to show that $\operatorname{gcd}(a,b) \mid \operatorname{gcd}(a-b,b)$. Note that $\operatorname{gcd}(a,b)$ divides $a$ and $b$, so it must divide $a-b$. Thus, $\operatorname{gcd}(a,b)$ divides of $a-b$ and $b$, so by Proposition 5, $\operatorname{gcd}(a,b)$ divides $\operatorname{gcd}(a-b,b)$.

Second, we try to show that $\operatorname{gcd}(a-b,b) \mid \operatorname{gcd}(a,b)$. Note that $\operatorname{gcd}(a-b,b)$ divides $a-b$ and $b$, so it must divide $(a-b)+b = a$. Thus, $\operatorname{gcd}(a-b,b)$ divides of $a$ and $b$, so by Proposition 5, $\operatorname{gcd}(a-b,b)$ divides $\operatorname{gcd}(a,b)$.

This proposition is the key to Euclid's algorithm for computing gcds. For example, say we want to compute $\operatorname{gcd}(204,90)$. Then we just keep on using Proposition 8 until one of the numbers become zero: $$\begin{align*}\operatorname{gcd}(204,90) &= \operatorname{gcd}(204-90,90)\\ &= \operatorname{gcd}(114,90)\\ &= \operatorname{gcd}(114-90,90)\\ &= \operatorname{gcd}(24,90)\\ &= \operatorname{gcd}(24,90-24)\\ &= \operatorname{gcd}(24,66)\\ &= \operatorname{gcd}(24,66-24)\\ &= \operatorname{gcd}(24,42)\\ &= \operatorname{gcd}(24,42-24)\\ &= \operatorname{gcd}(24,18)\\ &= \operatorname{gcd}(24-18,18)\\ &= \operatorname{gcd}(6,18)\\ &= \operatorname{gcd}(6,18-6)\\ &= \operatorname{gcd}(6,12)\\ &= \operatorname{gcd}(6,12-6)\\ &= \operatorname{gcd}(6,6)\\ &= \operatorname{gcd}(6,6-6)\\ &= \operatorname{gcd}(6,0)\\ &= 6 \end{align*}$$

Here's one way to implement it in code:

  1. long long gcd(long long a, long long b) {
  2. // This assumes 'a' and 'b' are nonnegative.
  3. // just take their absolute values before passing them.
  4. while (a != 0 && b != 0) {
  5. if (a >= b) {
  6. a -= b;
  7. } else {
  8. b -= a;
  9. }
  10. }
  11. if (a == 0) {
  12. return b;
  13. } else { // if (b == 0)
  14. return a;
  15. }
  16. }

This is much better than before, because it's now able to compute gcd(18000000000000000LL,24000000000000000LL) quickly!

Unfortunately, this isn't faster in all cases. Consider for example gcd(10000000000000000LL,1). The original gcd code finishes quickly, but Euclid's algorithm will take very long because we need to subtract 1 many times from 10000000000000000LL.

Thankfully, there's a way to save Euclid's algorithm. This is by extending Proposition 8. We start with the following:


Proposition 9. For any $a$, $b$, we have $\operatorname{gcd}(a,b) = \operatorname{gcd}(b,(a \bmod b))$.

Proof. By definition, $(a \bmod b) = a - \lfloor a/b \rfloor b$, so we just apply "Proposition 8" $\lfloor a/b \rfloor$ times: $$\begin{align*} \operatorname{gcd}(a,b) &= \operatorname{gcd}(a-b,b) \\ &= \operatorname{gcd}(a-2b,b) \\ &= \operatorname{gcd}(a-3b,b) \\ &= \ldots \\ &= \operatorname{gcd}(a-\lfloor a/b \rfloor b,b) \\ &= \operatorname{gcd}((a \bmod b),b) \\ &= \operatorname{gcd}(b,(a \bmod b)) \end{align*}$$.

For example, if we want to compute $\operatorname{gcd}(204,90)$, we can use Proposition 9 many times: $$\begin{align*}\operatorname{gcd}(204,90) &= \operatorname{gcd}(90,204\bmod 90)\\ &= \operatorname{gcd}(90,24)\\ &= \operatorname{gcd}(24,90\bmod 24)\\ &= \operatorname{gcd}(24,18)\\ &= \operatorname{gcd}(18,24\bmod 18)\\ &= \operatorname{gcd}(18,6)\\ &= \operatorname{gcd}(6,18\bmod 6)\\ &= \operatorname{gcd}(6,0)\\ &= 6 \end{align*}$$

Notice that it takes fewer steps than before! In fact, if you try it on gcd(10000000000000000LL,1) you'll find that this now runs very fast. Let's try implementing it:

  1. long long gcd(long long a, long long b) {
  2. while (b != 0) {
  3. long long r = a % b;
  4. a = b;
  5. b = r;
  6. }
  7. return a;
  8. }

It's now much faster and more readable! So let's call this the real Euclid's algorithm and the old one the slow Euclid's algorithm.

We can also write a recursive one-line version:

  1. long long gcd(long long a, long long b) {
  2. return b == 0 ? a : gcd(b, a % b);
  3. }

So if you're ever planning on implementing Euclid's algorithm, you should use one of these faster versions!

Time complexity

What is the time complexity of (the fast) Euclid's algorithm? How many steps does it take? Surely, Euclid's algorithm halts because at every step, $b$ decreases in value (so it must reach $0$ at some point). But how much does it decrease? Here's a useful proposition:


Proposition 10. After two steps of applying Proposition 9, the new $b$ will be at most half the original $b$.

Proof. Let's apply it once: $\operatorname{gcd}(a,b) = \operatorname{gcd}(b,(a \bmod b))$. If $(a\bmod b) \le b/2$, then we are done (in just one step!), because have shown that the new $b$, namely $a \bmod b$, is at most half the original $b$.

So, let's assume that $(a\bmod b) > b/2$. Let $b' = (a\bmod b)$. Thus, $\operatorname{gcd}(a,b) = \operatorname{gcd}(b,b')$. Applying Proposition 9 a second time, we get $\operatorname{gcd}(b,b') = \operatorname{gcd}(b', (b\bmod b'))$. But remember that $b' > b/2$. This implies two things:

Thus, $b \bmod b' = b - b' < b/2$, and we are done, because have shown that the new $b$, namely $b \bmod b'$, is at most half the original $b$.

This gives us an upper bound for the number of steps Euclid's algorithm takes!


Proposition 11. Euclid's algorithm halts after at most $2\lfloor \log_2 b \rfloor + 3$ steps. Thus, it runs in $O(\log b)$ time.

Proof. After $2(\lfloor \log_2 b \rfloor + 1)$ steps, the new $b$ will be at most $$\frac{b}{2^{\lfloor \log_2 b \rfloor + 1}} < \frac{b}{2^{\log_2 b}} = 1.$$ But since $b$ is an integer, $b$ can only be $0$ at this point. So Euclid's algorithm halts at this point (after one more step). Thus, $2\lfloor \log_2 b \rfloor + 3$ steps.

Thus, we have shown that computing $\operatorname{gcd}(a,b)$ takes $O(\log b)$ time, and this is very fast! With a little more effort, you can prove a stronger time complexity bound: $O(\log \min(a,b))$.

Problems

Note: If you're into more advanced maths, you might hear that $-6$ is also considered a gcd of $18$ and $24$ (in fact, almost all pairs of numbers have two gcds), and you might also know why we say $\operatorname{gcd}(0,0) = 0$ for a reason that isn't arbitrary. But we'll not go into that now.