Elliptic curve discrete logarithm problem in characteristic two

Several recent preprints have discussed summation polynomial attacks on the ECDLP in characteristic 2:

  • eprint 2015/310, New algorithm for the discrete logarithm problem on elliptic curves, by Igor Semaev.
  • eprint 2015/319, Point Decomposition Problem in Binary Elliptic Curves, by Koray Karabina.
  • arxiv 1503.08001, Notes on summation polynomials, by Michiel Kosters and Sze Ling Yeo.

To recall some history, in 2004 Semaev introduced a fundamental new idea for index calculus algorithms for the ECDLP. Fix an elliptic curve E over a field K. Semaev’s summation polynomials S_m( x_1, \dots, x_m ) have the property that if S_m( a_1, \dots, a_m ) = 0 for some field elements a_1, \dots, a_m \in K then there are elliptic curve points (a_1, b_1), \dots, (a_m,b_m) on E(\overline{K}) such that (a_1, b_1) + \cdots + (a_m,b_m) = 0.
Gaudry and Diem explored how to use this idea in the context of Weil descent and the ECDLP on elliptic curves over \mathbb{F}_{2^n}. In particular, Diem showed there exists a sequence of finite fields \mathbb{F}_{2^n} (not of prime degree) such that the ECDLP along this sequence is provably subexponential. In these cases, the idea is to choose a_i in some dimension d < n subspace of \mathbb{F}_{2^n} and to write a "random point" R as a sum of m such points by finding a solution to S_{m+1}(a_1, \dots, a_m, x(R)) = 0. One chooses md \approx n to have a good chance of a relation, in which case one has to solve a system of polynomial equations involving md \approx n variables and n equations.

The main problem is that the summation polynomial S_m has degree 2^{m-2} in each variable, meaning that after Weil descent the system of equations in a large number of variables has extremely high degree. The difficult case is when n is a prime and so one cannot choose the subspace of dimension d to be a subfield. There have been a number of recent papers that exploit invariants under group actions to lower the degree, and hence improve these attacks in the case when n is prime. I mention just three of these works:

  • Jean-Charles Faugere, Pierrick Gaudry, Louise Huot and Guenael Renault, Using Symmetries in the Index Calculus for Elliptic Curves Discrete Logarithm, J. Crypt., Volume 27, Issue 4 (2014) 595-635.
  • Yun-Ju Huang, Christophe Petit, Naoyuki Shinohara and Tsuyoshi Takagi, Improvement of Faugere et al.’s Method to Solve ECDLP, in K. Sakiyama and M. Terada (eds.), IWSEC 2013, Springer LNCS 8231 (2013) 115–132.
  • Steven D. Galbraith and Shishay W. Gebregiyorgis, Summation polynomial algorithms for elliptic curves in characteristic two, in W. Meier and D. Mukhopadhyay (eds), INDOCRYPT 2014, Springer LNCS 8885 (2014) 409-427.

The challenge in solving large systems of multivariate equations is always the same: How to choose new variables that enable lowering the degree of the system of equations. The simplest of all such ideas is linearisation: replace every monomial with a new variable. The aim is to lower the degree by increasing the number of variables, but hopefully by not introducing too many new variables.

The new idea in the papers by Semaev and Karabina is elegant, natural and simple. The idea is natural when one knows that the summation polynomials are computed recursively by taking resultants. The idea is to “unroll” the resultant computation by introducing new variables for the intermediate points. Putting the idea simply, instead of trying to write R = P_1 + \cdots + P_m for points P_i in the factor base, we write
Q_1 = P_1 + P_2, Q_2 = Q_1 + P_3, \dots, R = Q_{m-2} + P_m where the Q_i are completely arbitrary points.

Hence, one now has a system of md + (m-2)n \approx (m-1)n variables and (m-1)n equations, but the equations all have total degree 4 because S_3 only has total degree 4. The big question is whether this is a good way to trade-off the number of variables versus degree from the point of view of solving systems of equations. Both the Semaev and Karabina papers give evidence that this choice of variables and equations is easier to solve than previous ones.

So far so good, but now the papers become more speculative. In particular, the papers both assume the “first fall” conjecture of Petit and Quisquater.

Karabina is relatively modest. He suggests taking m=5, with the aim of getting a O( q^{2/5} ) algorithm that would beat the O( q^{1/2} ) of Pollard rho (here q = 2^n). He gives some experimental evidence that the first fall conjecture applies in this case (his data only goes up to n=19 for the case m=5, but this is using S_4 rather than the full idea of S_3). He suggests the O( q^{2/5} ) algorithm might start to beat rho once q = 2^n > 2^{700}.

Semaev is more aggressive in his choice of parameters. He takes m growing with n , and assumes the first fall conjecture holds in this case. Experimental results are given in the paper, but the largest n appears to be 21 and the largest m is 6. In my opinion there is insufficient evidence in the paper to support the conjecture as n and m both grow in the required way. But it is interesting to speculate on how the algorithm might go. For n = 571 Semaev suggests m = 12, giving running time O( (n(m-1))^{4\omega} ) where \omega > 2.3 is the linear algebra constant. The claimed complexity is much better than Pollard rho for such a curve.

However, I note that the storage cost for the matrix representing the system of equations is { N + 3 \choose 4}^2 where N = (m-1)n is the number of binary variables in the descended polynomial equations. In the case n=571, m=12 the storage cost is at least 2^{91} bits, which is more than 10^{14} terabytes. To put this in perspective, Avogadro’s constant is about 2^{79}, so storing such a matrix seems way beyond current engineering and so there is no current reason to be concerned about such algorithms. (Francisco Rodriguez-Henriquez pointed out to me that the entire world wide web in 2013 required about 4 zettabytes which is 4 \cdot 10^9 terabytes.) I stress that this matrix needs to be stored as part of the Groebner basis computation for every single relation computation (and this cannot be effectively distributed, as far as we know). The final linear algebra stage is easier.

As an aside, Semaev’s choice of title “New algorithm for the discrete logarithm problem on elliptic curves” seems exaggerated. The title of the paper suggests there is a major new idea, rather than merely an elegant choice of variables for which to implement a known algorithm.

Finally I mention the Kosters-Yeo paper. Section 3 of the paper is about NP-completeness of an elliptic curve version of subset-sum, which is already well-known, see for example the discussion here. Section 4 is more relevant to this blog post as it discusses the first fall conjecture. The paper seems to give evidence that the first fall conjecture is false. It is also notable that this paper reports much larger computations, going as far as n = 40, while Karabina and Semaev do not go beyond n = 26. If it turns out that the first fall conjecture is false, then there is no reason to believe the claims by Semaev and Karabina that index calculus methods could beat Pollard rho for the ECDLP on binary curves.

One thing is clear, more work needs to be done to evaluate the first fall conjecture and the performance of this class of algorithms. But the current state of knowledge is that there is no serious threat to the ECDLP over binary fields of prime extension degree n.

— Steven Galbraith

PS. Semaev has drawn my attention to the revision of his paper on eprint, which now relies on a weaker assumption (Assumption 2). However, I am still of the opinion that there is currently insufficient evidence to support this assumption for the case of very large n.

PPS. This post had around 200 views in its first 17 hours. It is good to know there is a lot of interest in this topic.

This entry was posted in Uncategorized. Bookmark the permalink.

3 Responses to Elliptic curve discrete logarithm problem in characteristic two

  1. Igor Semaev says:

    Few comments on the posting by Steven Galbraith.

    1. Storage cost is significantly lower as the matrix representing the Boolean equation system which comes from summation polynomials is rather sparse: at most about n^3/ m monomials in each of the equations, where the overall number of monomials in the system is about (nm)^4/4!. For n=571 and m=12 I got 2^{70} bits to store the matrix which is lower than 2^{91}. That may have some importance if an XL family algorithm is used to solve the system. One can probably reduce the storage requirement further at the expense of the computational time as the matrix is very structured.

    2. Rather than Groebner basis algorithms F4(F5) one can use Block Wiedemann algorithm due to Coppersmith to solve the system after linearisation (XL method). The algorithm seems possible to distribute, see “Solving Quadratic Equations with XL on Parallel Architectures” by Chen-Mou Cheng, Tung Chou, Ruben Niederhagen, and Bo-Yin Yang.

    3. It would be interesting to study closely the regularity degree of the Boolean equation systems which come from summation polynomials. A first fall degree assumption says it is at most 4. I should mention the Kosters-Yeo paper does not provide strong evidence that the regularity degree is larger than 4 for n=40, m=2 as their computation was not complete. In contrast, for n=40, m=2 and 100 random systems of that sort solved by MAGMA the maximal step degree before a Groebner basis is computed was 4. A number of redundant steps of degree 5,6,7 may appear in the very end of the computation. The maximal degree 4 assumption was supported by experiments in Section 5 (for n up to 19 and m=5) of the latest version of the preprint by Karabina as well.

    4. For n=571 and m=12 the method still works faster than Pollard’s even if the regularity degree is up to 6. The linear algebra constant is here taken 3 for the complexity of computing a Groebner basis with MAGMA.

    5. The new asymptotical complexity bound 2^{c (n log n)^{1/2}}, c=1.69 still holds when the regularity degree grows as o((n/ log n)^{1/2}).

    -Igor Semaev

  2. I would like to add a comment regarding point 3 of the comment of Igor Semaev.

    With the help of the Caramel team from Nancy (France), we managed to finish the computation n=45, m=t=2. The Groebner basis computation terminated, and the system turned out to have two solutions. It indeed reached step degree 5. The total amount of RAM needed was 126 GB.

    Furthermore, we did computations with n=25, m=t=3. If the system has no solutions, then the step degree seems to be 4. If the system has solutions, the step degree seems to be 5 again (this computation did not finish after using 111 GB of RAM, but the previous step, of degree 4, only uses 11 GB of RAM).

  3. Pierrick Gaudry says:

    If someone else is limited in their experiments related to these questions by the lack of memory, we are willing to let them run on our machine with 512 GB of RAM (depending on the availability of the machine, of course). Just drop me an email.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s