The Mystery of 248 and Cofactor DH

Yayım Tarihi: 11 May 2022 Dr. Metin Evrim Ulu

The Mystery of 248 and Cofactor DH

Introduction

I was reading the OpenSSH portable codebase trying to dig out anything interesting and saw very strange lines of code during the key generation of Ed25519 signature algorithm. Curious as always, I started digging to see whether or not it is legitimate.

Ed25519

Ed25519 is the Elliptic Curve Signature Algorithm designed by cryptographer djb; his papers on it can be found here. Let's recall the parameters of the scheme provided in Wikipedia.

  • \( q = 2^{255} - 19 \)
  • \( E/\mathbb{F}_q \) is the twisted Edwards curve defined by: \[ -x^2 + y^2 = 1 - \frac{121665}{121666} x^2 y^2 \]
  • \( \ell = 2^{252} + 27742317777372353535851937790883648493 \) and \( c = 3 \)
  • \( B \) is the unique point \( E(\mathbb{F}_q) \) whose \( y \) coordinate is \( \frac{4}{5} \) and whose \( x \) coordinate is positive.
    • "positive" coordinates are even coordinates (least significant bit is cleared)
    • "negative" coordinates are odd coordinates (least significant bit is set)

First of all, we know that we are on a prime field, so \( q \) is prime. Finding such primes is still an interesting problem in cryptography. Sage verifies our claim:

sage: is_prime(2^(255)-19)
True

The curve defined by the equation above is a twisted Edwards curve. Edwards curves are defined by the equation

\[ ax^2 + y^2 = 1 + dx^2y^2 \]

where it is untwisted when \( a = 1 \). In our case, \( a = -1 \) and \( d = \frac{121665}{121666} \). The integral representation of \( d \) is as follows:

sage: q = 2^(255)-19
sage: k = GF(q);
sage: k(121665/121666)
20800338683988658368647408995589388737092878452977063003340006470870624536394

Kind of long, isn’t it? How many digits are there?

sage: len(str(k(121665/121666)))
77
sage: log(20800338683988658368647408995589388737092878452977063003340006470870624536394,2).n()
253.523142230851

So it is 254 bits. Another note in Wikipedia tells us the following:

The curve \( E/\mathbb{F}_q \) is birationally equivalent to the Montgomery curve known as Curve25519. The equivalence is:

\[ x = \frac{u}{v} \sqrt{-486664}, \quad y = \frac{u-1}{u+1} \]

If you worry, oh there is a minus sign in sqrt, recall that we are working in a prime field and the following holds for every prime field:

\[ -1 = q - 1 \pmod{q} \]

What does birationally equivalent even mean? First, let's define what rational equivalence means. It means that there exist rational functions \( x = f(u,v), y = g(u,v) \) that map points on one curve to the other. Similarly, birationally equivalent means there are also functions \( u = f'(x,y), v = g'(x,y) \) that map points back to the original curve, and these functions are rational functions in \( \mathbb{F}_q(x,y) \), where \( \mathbb{F}_q(x,y) \) is the function field of \( \mathbb{F}_q[x,y] \). This implies that if \( f \in \mathbb{F}_q (x,y) \), then there exists \( a,b \in \mathbb{F}_q[x,y] \) such that

\[ f(x,y) = \frac{a(x,y)}{b(x,y)} \]

Of course, \( b(x,y) \) can have zeroes and \( f \) does not have to be defined everywhere, but on an open subset of \( \overline{\mathbb{F}}_q \times \overline{\mathbb{F}}_q \).

Fortunately, this is not an article on sheaves and schemes. Now, just tell me: what the heck is a cofactor?

Number of Rational Points on Curve25519

Although curves defined over \( \mathbb{C}^2 \) are continuous on a dense subset, the question of finding rational points of arbitrary curves defined over finite fields is still an open question. However, we are in luck, and for Curve25519, the group has the following order:

\[ \# E(\mathbb{F}_q) = 2^c \ell \]

where \( \ell \) is prime and \( c = 3 \), and \( 2^c \) is called the cofactor. Therefore,

\[ \left| \# E(\mathbb{F}_q) / \langle g \rangle \right| = 8 \]

where \( g \) is a generator of the subgroup of order \( \ell \). Still didn’t get it? Why do I care about this little number \( c \)?

Ed25519 Key Generation in OpenSSH-Portable

Let's look into the source code. It starts in ssh-keygen.c on line 3799:

  break;
  default:
    if ((r = sshkey_generate(type, bits, &private)) != 0)
      fatal("sshkey_generate failed");
    break;
  }

Here it calls int sshkey_generate(int type, u_int bits, struct sshkey **keyp) in sshkey.c. Relevant lines start at 1753:

 switch (type) {
 case KEY_ED25519:
   if ((k->ed25519_pk = malloc(ED25519_PK_SZ)) == NULL ||
       (k->ed25519_sk = malloc(ED25519_SK_SZ)) == NULL) {
     ret = SSH_ERR_ALLOC_FAIL;
     break;
   }
   crypto_sign_ed25519_keypair(k->ed25519_pk, k->ed25519_sk);
   ret = 0;
   break;

So basically here, it allocates enough space and passes pointers for public and private keys to int crypto_sign_ed25519_keypair(unsigned char *pk, unsigned char *sk) in ed25519.c. This is exactly where the fun begins. Let’s look into crypto_sign_ed25519_keypair.

int crypto_sign_ed25519_keypair(
    unsigned char *pk,
    unsigned char *sk
    )
{
  sc25519 scsk;
  ge25519 gepk;
  unsigned char extsk[64];
  int i;

  randombytes(sk, 32);
  crypto_hash_sha512(extsk, sk, 32);
  extsk[0] &= 248;
  extsk[31] &= 127;
  extsk[31] |= 64;

  sc25519_from32bytes(&scsk,extsk);
  
  ge25519_scalarmult_base(&gepk, &scsk);
  ge25519_pack(pk, &gepk);
  for (i = 0; i < 32; i++)
    sk[32 + i] = pk[i];
  return 0;
}

Clamping extsk[31] is basically due to the magnitude \( \ell \). We want our key to be less than \( 8\ell \). Also, OR‑ing with 64 makes sure that the secret key is large, preventing brute‑forcing.

Now, do you see the line where the least significant byte of the secret key extsk[0] is ANDed with the number 248? In binary, it is 0b11111000, so it zeroes out the least significant three bits of the secret key. I was puzzled by this and looked for the reasons behind it.

Elliptic Diffie‑Hellman KEX

Before delving into the details, let's review how classic Diffie‑Hellman Key Exchange works. Say there are two parties who want to generate keys for a symmetric algorithm. First party, say Alice, generates a random \( a \) and calculates:

\[ g^a \pmod{p} \]

and sends it to Bob. Similarly, Bob generates a random \( b \) and calculates:

\[ g^b \pmod{p} \]

Sends this information to Alice. Both then perform the final symmetric operation to generate a shared secret:

\[ (g^b)^a = g^{ab} = (g^a)^b \pmod{p} \]

The security of this scheme depends on an NP‑hard problem called the Discrete Logarithm Problem. The problem depends on the existence of a polynomial‑time algorithm to find \( a \) given \( g^a \pmod{p} \).

Anyway, no time for complexity theory. Let's look at how the elliptic version of DH works. First, Alice generates a random scalar \( a \) and calculates:

\[ aG \in E(\mathbb{F}_q) \]

Sends it to Bob. Similarly, Bob picks a random \( b \) and calculates:

\[ bG \in E(\mathbb{F}_q) \]

Sends it to Alice. Both calculate the shared secret similar to the classic version:

\[ abG \in E(\mathbb{F}_q) \]

The security of this problem is based on the Elliptic Discrete Logarithm Problem, which is basically the elliptic‑curve version of DLP on prime fields.

The Cofactor Problem

So let's look into the details of this problem. Basically, the problem states that the least significant 3 bits can be leaked during an Elliptic DH by specially crafted values. I promise you will have a better understanding by the end of this section.

Say the elliptic group \( E(\mathbb{F}_q) \) is generated by \( G \) and an adversary Eve wants to know more about Bob's private key \( b \) without solving an instance of Elliptic DLP. What Eve does is the following:

  1. Calculates: \( \ell G \in E(\mathbb{F}_q) \) and sends it to Bob.
  2. Bob picks a random \( b \) and calculates:
\[ b\ell G \in E(\mathbb{F}_q) \]

So there are only 8 possible keys for this KEX depending on the value of \( b \). When Eve gets the response, she will be able to recover

\[ b \pmod{8} \]

by comparing it to \( 0\ell G, 1\ell G, \ldots, 7\ell G \). Therefore, Eve will know the least significant 3 bits of Bob's private key \( b \). This is due to the larger group acting on the smaller cofactor group.

To prevent this kind of leak, Bob would basically check whether the order of Eve's public key is greater than the cofactor. This can be accomplished either by comparing the values to the ones on the list or by calculating \( 8 \) times the value. This can be done in \( 3 \) steps by the repeated‑squaring algorithm, so the burden is not that great.

Conclusion

In OpenSSH‑portable's implementation, the least significant bits of the secret keys are zeroed out to prevent the leak. You might ask: why pre‑define these bits? My guess is that these bits might leak information about the random‑number generator of the target, so fixing them to 000 would prevent such information from being exfiltrated by an adversary. However, this reduces the keyspace by a factor of 8.

One note is that the keys generated in OpenSSH‑portable are for signing, not DH. So, it might raise the question: why fix for DH in a signature algorithm? I'll let you know when I have the answer.

Geleceği Birlikte İnşa Edelim

Teknoloji ihtiyaçlarınız için doğru çözüm ortağı arıyorsanız, Kor Bilişim yanınızda.

Kendi Bulutunuzu Başlatın