### Re: [Cfrg] Curve selection revisited

Andrey Jivsov <crypto@brainhub.org> Sat, 26 July 2014 21:42 UTC

Return-Path: <crypto@brainhub.org>

X-Original-To: cfrg@ietfa.amsl.com

Delivered-To: cfrg@ietfa.amsl.com

Received: from localhost (ietfa.amsl.com [127.0.0.1])
by ietfa.amsl.com (Postfix) with ESMTP id 929CE1A040C
for <cfrg@ietfa.amsl.com>; Sat, 26 Jul 2014 14:42:14 -0700 (PDT)

X-Virus-Scanned: amavisd-new at amsl.com

X-Spam-Flag: NO

X-Spam-Score: -1.901

X-Spam-Level:

X-Spam-Status: No, score=-1.901 tagged_above=-999 required=5
tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1,
SPF_PASS=-0.001] autolearn=ham

Received: from mail.ietf.org ([4.31.198.44])
by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024)
with ESMTP id WeGnxo4L6oTo for <cfrg@ietfa.amsl.com>;
Sat, 26 Jul 2014 14:42:12 -0700 (PDT)

Received: from qmta11.emeryville.ca.mail.comcast.net
(qmta11.emeryville.ca.mail.comcast.net [IPv6:2001:558:fe2d:44:76:96:27:211])
by ietfa.amsl.com (Postfix) with ESMTP id 91E6F1A03FF
for <cfrg@irtf.org>; Sat, 26 Jul 2014 14:42:12 -0700 (PDT)

Received: from omta21.emeryville.ca.mail.comcast.net ([76.96.30.88])
by qmta11.emeryville.ca.mail.comcast.net with comcast
id X9bX1o0011u4NiLAB9iCKQ; Sat, 26 Jul 2014 21:42:12 +0000

Received: from [192.168.1.2] ([71.202.164.227])
by omta21.emeryville.ca.mail.comcast.net with comcast
id X9iB1o0044uhcbK8h9iBK3; Sat, 26 Jul 2014 21:42:12 +0000

Message-ID: <53D420B3.10707@brainhub.org>

Date: Sat, 26 Jul 2014 14:42:11 -0700

From: Andrey Jivsov <crypto@brainhub.org>

User-Agent: Mozilla/5.0 (X11; Linux x86_64;
rv:24.0) Gecko/20100101 Thunderbird/24.6.0

MIME-Version: 1.0

To: cfrg@irtf.org

References: <CA+Vbu7xroa68=HOZtbf=oz7kK2EeUv_z1okpnjxHPR0ZtHD5cA@mail.gmail.com>
<CFF7E184.28E9F%kenny.paterson@rhul.ac.uk> <53D2781B.8030605@sbcglobal.net>
<CACsn0ckqFigWoH2+OOEHSd2VWPp8y6=m8H5OsFRyjXmjK7+m4w@mail.gmail.com>
<CABqy+srxMNuG0AaQd0SaegHvZWgbW762EQq+iAHL_fbu6sOJJQ@mail.gmail.com>

In-Reply-To: <CABqy+srxMNuG0AaQd0SaegHvZWgbW762EQq+iAHL_fbu6sOJJQ@mail.gmail.com>

Content-Type: text/plain; charset=UTF-8; format=flowed

Content-Transfer-Encoding: 7bit

DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=comcast.net;
s=q20140121; t=1406410932;
bh=wp94k3uUfafgOTR/Pz0KziS8XfLFBJaWelWe4SKukJQ=;
h=Received:Received:Message-ID:Date:From:MIME-Version:To:Subject:
Content-Type;
b=vAJMwCTZUQBayVmgwKmWswknNPpKlPBghgKyq2wmCog7KGxwujE5YOHvzQ7q4Gih7
qo8jmZaT8w0es7ClGchJozx5Uqb1o+JNGjNo/j3rhE8xb0sK54EudHLI6b41x5XKdH
yV9DU885PdW8GesqBxw5xnk+Hez4Oq6XraN+9Ei5O1eIUZDBBOVs1eoUguFyzpmgBe
IYey/TlbnEEb3qbj3WvvpZSJD3Je9D/SzxSv37QbxalPA4q28EMVVgdNF5Q5pep8cx
TjsVoe31y3kexKcQ4f+O0cYEwIGQgxfw6E5/93ySAH9a1jvp0Ff2gFch3ddeTRfFK+
Q9C50eTPJzwLQ==

Archived-At: http://mailarchive.ietf.org/arch/msg/cfrg/JYqu4L6Ksnh8Xs4AI_9yTfOwUOc

Subject: Re: [Cfrg] Curve selection revisited

X-BeenThere: cfrg@irtf.org

X-Mailman-Version: 2.1.15

Precedence: list

List-Id: Crypto Forum Research Group <cfrg.irtf.org>

List-Unsubscribe: <http://www.irtf.org/mailman/options/cfrg>,
<mailto:cfrg-request@irtf.org?subject=unsubscribe>

List-Archive: <http://www.irtf.org/mail-archive/web/cfrg/>

List-Post: <mailto:cfrg@irtf.org>

List-Help: <mailto:cfrg-request@irtf.org?subject=help>

List-Subscribe: <http://www.irtf.org/mailman/listinfo/cfrg>,
<mailto:cfrg-request@irtf.org?subject=subscribe>

X-List-Received-Date: Sat, 26 Jul 2014 21:42:14 -0000

On 07/25/2014 07:25 PM, Robert Ransom wrote: > The exact value of c determines whether a portable or vectorized > implementation can delay carries until after reduction. For example, > after curve25519-donna performs a multiply or squaring, each of 19 > 64-bit limbs has absolute value less than 14*2^54. When > freduce_degree reduces a field element to 10 limbs without carrying, > the result has absolute value less than (c+1)*14*2^54 = 280*2^54, or a > bit more than 2^62. In the Microsoft field 2^255 - 765, or even 2^256 > - 189, this would not be possible. > > Carries between limbs can be performed more efficiently than in > curve25519-donna -- curve25519-donna goes to extra effort to keep > limbs signed. In the Montgomery ladder, this optimization does not > add any extra carry steps, because the Montgomery-ladder formulas > never perform more than two add/subtract operations between a > multiply/square operation. With Edwards-curve operations, this > optimization may require an additional carry step in the doubling > formula. > > The exact value of s determines whether carries can be vectorized > within a coordinate-field element. For example, 2^251 - 9 can be > represented using one 26-bit limb (in the least significant position) > and 9 25-bit limbs, but this interferes with the vectorization > strategy described in the neoncrypto paper (by Dr. Bernstein et al.). > Since c=9 is so small, this can be worked around by representing > coordinates modulo 2^252 - 18 instead, but this trick is rarely > possible without running into the bound on c. On 07/25/2014 07:25 PM, Robert Ransom wrote: > The exact value of c determines whether a portable or vectorized > implementation can delay carries until after reduction. For example, > after curve25519-donna performs a multiply or squaring, each of 19 > 64-bit limbs has absolute value less than 14*2^54. When > freduce_degree reduces a field element to 10 limbs without carrying, > the result has absolute value less than (c+1)*14*2^54 = 280*2^54, or a > bit more than 2^62. In the Microsoft field 2^255 - 765, or even 2^256 > - 189, this would not be possible. > > Carries between limbs can be performed more efficiently than in > curve25519-donna -- curve25519-donna goes to extra effort to keep > limbs signed. In the Montgomery ladder, this optimization does not > add any extra carry steps, because the Montgomery-ladder formulas > never perform more than two add/subtract operations between a > multiply/square operation. With Edwards-curve operations, this > optimization may require an additional carry step in the doubling > formula. > > The exact value of s determines whether carries can be vectorized > within a coordinate-field element. For example, 2^251 - 9 can be > represented using one 26-bit limb (in the least significant position) > and 9 25-bit limbs, but this interferes with the vectorization > strategy described in the neoncrypto paper (by Dr. Bernstein et al.). > Since c=9 is so small, this can be worked around by representing > coordinates modulo 2^252 - 18 instead, but this trick is rarely > possible without running into the bound on c. The main argument for p = 2^255-19 v.s. p = 1^256-189 is that the C=19 is smaller than C=189 and this helps to "manage" the carries. Let's assume that an F(p) integer is represented as n limbs (words). 256 bit Montgomery curve will have on the order of (5 mul + 4 squares) and about the same number F(p) additions per one scalar bit. Therefore, we can talk about one unreduced F(p) multiplication + F(p) reduction (where C plays its role) + F(p) addition. With this in mind the contribution of C can be estimated within this group. F(p) multiplications (counting squares here as well) dominate the performance of EC operations: * the result of the multiplication is 2*n limbs and the complexity of this operation is n^2 limb multiplications and O(n^2) additions. * the 2*n limbs must be reduced (mod p) to n limbs immediately after multiplication * there is on average ~1 addition/subtraction of n limbs per each multiplication due to EC formula Each multiplication needs degree reduction and settling of accumulated carries before the next multiplication. One can only postpone settling of carries while he does additions (prior to the next multiplciation). We have about n^2 limb additions, plus n limb additions due to EC formular, plus we must perform n additions of the form <some_limb>*C (for modp reduction). Assuming n=10 limbs, the entire contribution of n <some_limb>*C is bound by ~10% of all the additions. We are not talking about C=0, however, but rather about |C|=4 bits v.s. |C| = 8 bits difference. A rough estimate here is 5% penalty. One reason why I think this is a good estimate is that there may be a benefit of more fine-tuned control of carries. The highest accumulation of the carries happens in the inner limbs of the 2*n integer (imagine paper-and-pencil multiplication of two n-limb numbers: the stack is the highest in the middle). Therefore, if we have the need to control overflow within, say, multiplication, perhaps it's possible to accomplish this within a subset of n limbs. ( BTW, I don't see a clear argument why can't any F(p) be vectorized (some more efficiently than others). The vectorization using x86 SSE2 is known to speeds up RSA operations for generic F(p)s. )

- [Cfrg] Curve selection revisited Benjamin Black
- Re: [Cfrg] Curve selection revisited Yoav Nir
- Re: [Cfrg] Curve selection revisited Paterson, Kenny
- Re: [Cfrg] Curve selection revisited David Jacobson
- Re: [Cfrg] Curve selection revisited Watson Ladd
- Re: [Cfrg] Curve selection revisited Robert Ransom
- Re: [Cfrg] Curve selection revisited Watson Ladd
- Re: [Cfrg] Curve selection revisited Robert Ransom
- Re: [Cfrg] Curve selection revisited Robert Ransom
- Re: [Cfrg] Curve selection revisited Watson Ladd
- Re: [Cfrg] Curve selection revisited Andrey Jivsov
- Re: [Cfrg] Curve selection revisited Ilari Liusvaara
- Re: [Cfrg] Curve selection revisited Robert Ransom
- Re: [Cfrg] Curve selection revisited Robert Moskowitz
- Re: [Cfrg] Curve selection revisited Michael Hamburg
- Re: [Cfrg] Curve selection revisited Michael Jenkins
- Re: [Cfrg] Curve selection revisited Michael Hamburg
- Re: [Cfrg] Curve selection revisited Hannes Tschofenig
- Re: [Cfrg] Curve selection revisited Hannes Tschofenig
- Re: [Cfrg] Curve selection revisited Robert Moskowitz
- Re: [Cfrg] Curve selection revisited Stephen Farrell
- Re: [Cfrg] Curve selection revisited Michael Hamburg
- Re: [Cfrg] Curve selection revisited Robert Moskowitz
- Re: [Cfrg] Curve selection revisited Robert Moskowitz
- Re: [Cfrg] Curve selection revisited Robert Moskowitz
- Re: [Cfrg] Curve selection revisited Robert Moskowitz
- Re: [Cfrg] Curve selection revisited Michael Hamburg
- Re: [Cfrg] Curve selection revisited Robert Moskowitz
- Re: [Cfrg] Curve selection revisited Paul Lambert
- Re: [Cfrg] Curve selection revisited Paul Lambert
- Re: [Cfrg] Curve selection revisited Mike Hamburg
- Re: [Cfrg] Curve selection revisited Robert Ransom
- Re: [Cfrg] Curve selection revisited Andrey Jivsov
- Re: [Cfrg] Curve selection revisited Robert Ransom
- Re: [Cfrg] Curve selection revisited Robert Ransom
- Re: [Cfrg] Curve selection revisited Phillip Hallam-Baker
- Re: [Cfrg] Curve selection revisited Robert Moskowitz
- Re: [Cfrg] Curve selection revisited Russ Housley
- Re: [Cfrg] Curve selection revisited Salz, Rich
- Re: [Cfrg] Curve selection revisited Phillip Hallam-Baker
- Re: [Cfrg] Curve selection revisited Salz, Rich