# The Mystery of 248 and Cofactor DH

## The Key generation of Ed25519 signature algorithm

**Yayım Tarihi:**

*11 May 2022*-

**Yazar:**

*Dr. Metin Evrim Ulu*

# The Mystery of 248 and Cofactor DH

## Introduction

I was reading the OpenSSH portable codebase trying to dig out anything interesting and saw very strange lines of code during the key generation of Ed25519 signature algorithm. Curious as always started digging whether or not it is legitimate.

## Ed25519

Ed25519 is the Elliptic Curve Signature Algorithm designed by cryptographer djb and his papers on it can be found here. Let’s recall the parameters of the scheme provided in Wikipedia.

- \( q = 2^{255} - 19 \)
- \( E/\mathbb{F}_q \) is the twisted Edwards curve defined by: \[ -x^2 + y^2 = 1 - \frac{121665}{121666} x^2 y^2 \]
- \( \ell =2^{252}+27742317777372353535851937790883648493 \) and \( c = 3 \)
- \(B\) is the unique point \(E(\mathbb{F}_q) \) whose \(y\)
coordinate is \( \frac{4}{5} \) and whose \( x \) coordinate is
positive.
- “positive” coordinates are even coordinates (least significant bit is cleared)
- “negative” coordinates are odd coordinates (least significant bit is set)

First of all we know that we are on a prime field so q is prime. Finding such primes is still an interesting problem in Cryptography. Sage verifies our claim:

```
sage: is_prime(2^(255)-19)
True
```

The curve defined by the equation above is a twisted Edwards curve. Edwards curves are defined by the equation

\[ ax^2 + y^2 = 1 + dx^2y^2 \]where it is untwisted when \( a = 1 \). In our case, \( a = -1 \) and \( d = \frac{121665}{121666} \). Integral representation of \( d \) is as follows:

```
sage: q = 2^(255)-19
sage: k = GF(q);
sage: k(121665/121666)
20800338683988658368647408995589388737092878452977063003340006470870624536394
```

Kinda long, isn’t it? How many digits are there?

```
sage: len(str(k(121665/121666)))
77
sage: log(20800338683988658368647408995589388737092878452977063003340006470870624536394,2).n()
253.523142230851
```

So it is 254 bits. Another note in Wikipedia tell us the following:

The curve \( E/\mathbb{F}_q \) is birationally equivalent to the Montgomery curve known as Curve25519. The equivalence is:

\[ x = \frac{u}{v} \sqrt{-486664}, y = \frac{u-1}{u+1} \]If you worry, oh there is a minus sign in sqrt, recall that we are working in a prime field and the following holds for every prime field:

\[ -1 = q - 1 \ (mod\ q) \]What does birationally equivalent even mean? First let’s define what rational equivalence means. It means that there exist rational functions \( x = f(u,v), y = g(u,v) \) that maps points on one curve to the other. Similarly, birationally equivalent means, there are also functions \( u = f^\prime(x,y), v = g^\prime(x,y) \) that maps points back to the original curve and these functions are rational functions in \( \mathbb{F}_q(x,y) \) where \( \mathbb{F}_q(x,y) \) is the function field of \( \mathbb{F}_q[x,y] \). This implies if \( f \in \mathbb{F}_q (x,y) \) then there exists \( a,b \in \mathbb{F}_q[x,y] \)such that

\[ f(x,y) = \frac{a(x,y)}{b(x,y)} \]Of course \( b(x,y) \) can have zeroes and \( f\) should not have to be defined everywhere but an open subset of \( \overline{\mathbb{F}}_q \times \overline{\mathbb{F}}_q \).

Fortunately this is not an article on Sheaves and Schemes. Now, just tell me what the heck is a cofactor, will you?

## Number of rationals point on Curve25519

Although curves defined over \( \mathbb{C}^2 \) are continuous on a dense subset, the question of finding rational points of arbitrary curves defined over finite fields is still an open question. However, we are in luck and for Curve25519, the group has the following order:

\[ \# E(\mathbb{F}_q) = 2^c \ell \]where \( \ell \) is prime and \( c = 3 \) and \( 2^c \) is called the cofactor. Therefore,

\[ \vert \# E(\mathbb{F}_q) / \langle g \rangle \vert = 8 \]where \( g \) is a generator of the subgroup of order \( \ell \). Still did not get it. Why do I care about this little number \( c \) ?

## Ed25519 Key Generation in OpenSSH-Portable

Let’s look into the source code. It starts in
ssh-keygen.c
on line *3799*:

```
break;
default:
if ((r = sshkey_generate(type, bits, &private)) != 0)
fatal("sshkey_generate failed");
break;
}
```

Here it calls `int sshkey_generate(int type, u_int bits, struct sshkey **keyp)`

in
sshkey.c. Relevant
lines start at *1753*:

```
switch (type) {
case KEY_ED25519:
if ((k->ed25519_pk = malloc(ED25519_PK_SZ)) == NULL ||
(k->ed25519_sk = malloc(ED25519_SK_SZ)) == NULL) {
ret = SSH_ERR_ALLOC_FAIL;
break;
}
crypto_sign_ed25519_keypair(k->ed25519_pk, k->ed25519_sk);
ret = 0;
break;
```

So basically here, it allocates enough space and passes pointers for
public and private keys to `int crypto_sign_ed25519_keypair( unsigned char *pk, unsigned char *sk )`

in
ed25519.c. This
is exactly where the fun begins. Let’s look into
`crypto_sign_ed25519_keypair`

.

```
int crypto_sign_ed25519_keypair(
unsigned char *pk,
unsigned char *sk
)
{
sc25519 scsk;
ge25519 gepk;
unsigned char extsk[64];
int i;
randombytes(sk, 32);
crypto_hash_sha512(extsk, sk, 32);
extsk[0] &= 248;
extsk[31] &= 127;
extsk[31] |= 64;
sc25519_from32bytes(&scsk,extsk);
ge25519_scalarmult_base(&gepk, &scsk);
ge25519_pack(pk, &gepk);
for(i=0;i<32;i++)
sk[32 + i] = pk[i];
return 0;
}
```

Clamping `ext[31]`

is basically due to the magnitude \( \ell \). We
want our key to be less then \( 8\ell \). Also, or’ing with 64 makes
sure that the secret key is large preventing brute forcing.

Now, do you see the line where the least significant byte of the
secret key `extsk[0]`

is anded with the number 248? In binary, it is
`0b11111000`

so it zeroes out the least significant three bits of the
secret key. I was puzzled by this and look for the reasons behind it.

## Elliptic Diffie-Hellman KEX

Before delving into the details, let’s review how classic Diffie-Hellman Key Exchange works. Say there are two parties who want to generate keys for a symmetric algorithm. First party, say Alice, generates a random \( a \), and calculates the following:

\[ g^a (mod\ p)\]and sends it to Bob. Similarly, Bob generates a random \( b \) and calculates the following:

\[ g^b (mod\ p)\]Sends this information to Alice. Both then do the final symmetric operation to generate a shared secret:

\[ (g^b)^a = g^{ab} = (g^a)^b \ (mod\ p) \]The security of this scheme depends on a NP-hard problem called Discrete Logarithm Problem. The problem depends on existence of a polynomial time algorithm to find \(a \) given \( g^a (mod\ p)\).

Anyway, no time for Complexity Theory. Let’s look how elliptic version of DH works. First Alice generate a random scalar \( a \) and calculates the following:

\[ aG \in E(\mathbb{F}_q) \]Sends it to Bob. Similarly, Bob picks a random \( b\) and calculates the following:

\[ bG \in E(\mathbb{F}_q) \]Sends it to Alice. Both calculate the shared secret similar to the classic version:

\[ abG \in E(\mathbb{F}_q) \]This security of this problem is based on the Elliptic Discrete Logarithm problem which is basically the Elliptic Curve version of DLP on prime fields.

## The Cofactor Problem

So let’s look into the details of this problem. Basically, the problem states that the least significant 3 bits can be leaked during an Elliptic DH by specially crafting values. I promise you will have a better understanding by the end of this section.

Say the elliptic group \( E(\mathbb{F}_q) \) is generated by \(G\) and say an adversary Eve wants to know more about Bob’s private key, \(b\) without solving an instance of Elliptic DLP. What Eve does is the following:

- Calculates: \( \ell G \in E(\mathbb{F}_q) \) and sends it to Bob,
- Bob picks a random \( b\) and calculates:

So there are only 8 possible keys for this KEX depending on the value of \( b\). When Eve gets the response, she will be able to recover

\[ b\ (mod\ 8) \]by comparing it to \( 0\ell G, 1\ell G, \ldots , 7\ell G\). Therefore, Eve will know the least significant 3 bits of the private key of Bob, that is \( b \). This is due to the larger group acting on the smaller cofactor group.

To prevent this kind of leak, Bob would basically check whether the order of Eve’s public key is greater than the cofactor. This can be accomplished either comparing the values to the ones on the list or by calculating \( 8\) times the value. This can be accomplished in \(3\) steps by Repeated Squaring Algorithm. So the burden is not that great.

## Conclusion

In openssh-portable’s implementation, the least significant bits of
the secret keys are zeroed out to prevent the leak. You might ask why
pre-define these bits? My guess is basically, these bits might leak
information about the random number generator of the target, so fixing
them to *000* would prevent such an information to be exfiltrated by
an adversary. However, this reduces the keyspace by 8 folds.

One note is that, the keys generated in openssh-portable are for signing not DH. So, it might raise the question why fix for DH in a signature algorithm? I’ll let you know when I have the answer.