Re: [PATCH 2/3] sha512: reduce stack usage to safe number

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Alexey Dobriyan <adobriyan@gmail.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>,
	linux-crypto@vger.kernel.org, netdev@vger.kernel.org,
	ken@codelabs.ch, Steffen Klassert <steffen.klassert@secunet.com>,
	Eric Dumazet <eric.dumazet@gmail.com>,
	security@kernel.org
Subject: Re: [PATCH 2/3] sha512: reduce stack usage to safe number
Date: Sat, 14 Jan 2012 23:41:27 +0300	[thread overview]
Message-ID: <20120114204127.GA4100@p183.telecom.by> (raw)
In-Reply-To: <CA+55aFziQSzTpfnxQ+k83Ui-65TDHu5GZFoJhhPoR504K_d7pw@mail.gmail.com>

On Sat, Jan 14, 2012 at 11:08:45AM -0800, Linus Torvalds wrote:
> On Sat, Jan 14, 2012 at 10:40 AM, Alexey Dobriyan <adobriyan@gmail.com> wrote:
> >
> > Line by line explanation:
> > * BLEND_OP
> >  array is "circular" now, all indexes have to be modulo 16.
> >  Round number is positive, so remainder operation should be
> >  without surprises.
> 
> Don't use "%" except on unsigned values. Even if it's positive, if
> it's a signed number and the compiler doesn't *see* that it is
> absolutely positive, division is nontrivial. Even when you divide by a
> constant.
> 
> For example, "% 16" on an 'int' on x86-64 will generate
> 
> 	movl	%edi, %edx
> 	sarl	$31, %edx
> 	shrl	$28, %edx
> 	leal	(%rdi,%rdx), %eax
> 	andl	$15, %eax
> 	subl	%edx, %eax
> 
> in order to get the signed case right. The fact that the end result is
> correct for unsigned numbers is irrelevant: it's still stupid and
> slow.
> 
> With an unsigned int, '% 16' will generate the obvious
> 
> 	andl	$15, %eax
> 
> instead.
> 
> Quite frankly, stop using division in the first place. Dividing by
> powers-of-two and expecting the compiler to fix things up is just
> stupid, *exactly* because of issues like these: you either have to
> think about it carefully, or the compiler may end up creating crap
> code.

For the record, it generates "andl $15" here.

> So just use "& 15" instead. That doesn't have these kinds of issues.
> It is a *good* thing when the C code is close to the end result you
> want to generate. It is *not* a good thing to write code that looks
> nothing like the end result and just expect the compiler to do the
> right thing. Even if the compiler does do the right thing, what was
> the advantage?

Here is updated patch which explicitly uses & (equally tested):
---------------------------------------------------------------

For rounds 16--79, W[i] only depends on W[i - 2], W[i - 7], W[i - 15]
and W[i - 16]. Consequently, keeping all W[80] array on stack is
unnecessary, only 16 values are really needed.

Using W[16] instead of W[80] greatly reduces stack usage
(~750 bytes to ~340 bytes on x86_64).

Line by line explanation:
* BLEND_OP
  array is "circular" now, all indexes have to be modulo 16.

* initial full message scheduling is trimmed to first 16 values which
  come from data block, the rest is calculated right before it's needed.

* original loop body is unrolled version of new SHA512_0_15 and
  SHA512_16_79 macros, unrolling was done to not do explicit variable
  renaming. Otherwise it's the very same code after preprocessing.
  See sha1_transform() code which does the same trick.

Patch survives in-tree crypto test and original bugreport test
(ping flood with hmac(sha512).

See FIPS 180-2 for SHA-512 definition
http://csrc.nist.gov/publications/fips/fips180-2/fips180-2withchangenotice.pdf

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: stable@vger.kernel.org
---

 This is patch is for stable if 750 byte stack usage is not
 considered safe.

 crypto/sha512_generic.c |   58 ++++++++++++++++++++++++++++--------------------
 1 file changed, 34 insertions(+), 24 deletions(-)

--- a/crypto/sha512_generic.c
+++ b/crypto/sha512_generic.c
@@ -78,7 +78,7 @@ static inline void LOAD_OP(int I, u64 *W, const u8 *input)
 
 static inline void BLEND_OP(int I, u64 *W)
 {
-	W[I] = s1(W[I-2]) + W[I-7] + s0(W[I-15]) + W[I-16];
+	W[I & 15] += s1(W[(I-2) & 15]) + W[(I-7) & 15] + s0(W[(I-15) & 15]);
 }
 
 static void
@@ -87,38 +87,48 @@ sha512_transform(u64 *state, const u8 *input)
 	u64 a, b, c, d, e, f, g, h, t1, t2;
 
 	int i;
-	u64 W[80];
+	u64 W[16];
 
 	/* load the input */
         for (i = 0; i < 16; i++)
                 LOAD_OP(i, W, input);
 
-        for (i = 16; i < 80; i++) {
-                BLEND_OP(i, W);
-        }
-
 	/* load the state into our registers */
 	a=state[0];   b=state[1];   c=state[2];   d=state[3];
 	e=state[4];   f=state[5];   g=state[6];   h=state[7];
 
-	/* now iterate */
-	for (i=0; i<80; i+=8) {
-		t1 = h + e1(e) + Ch(e,f,g) + sha512_K[i  ] + W[i  ];
-		t2 = e0(a) + Maj(a,b,c);    d+=t1;    h=t1+t2;
-		t1 = g + e1(d) + Ch(d,e,f) + sha512_K[i+1] + W[i+1];
-		t2 = e0(h) + Maj(h,a,b);    c+=t1;    g=t1+t2;
-		t1 = f + e1(c) + Ch(c,d,e) + sha512_K[i+2] + W[i+2];
-		t2 = e0(g) + Maj(g,h,a);    b+=t1;    f=t1+t2;
-		t1 = e + e1(b) + Ch(b,c,d) + sha512_K[i+3] + W[i+3];
-		t2 = e0(f) + Maj(f,g,h);    a+=t1;    e=t1+t2;
-		t1 = d + e1(a) + Ch(a,b,c) + sha512_K[i+4] + W[i+4];
-		t2 = e0(e) + Maj(e,f,g);    h+=t1;    d=t1+t2;
-		t1 = c + e1(h) + Ch(h,a,b) + sha512_K[i+5] + W[i+5];
-		t2 = e0(d) + Maj(d,e,f);    g+=t1;    c=t1+t2;
-		t1 = b + e1(g) + Ch(g,h,a) + sha512_K[i+6] + W[i+6];
-		t2 = e0(c) + Maj(c,d,e);    f+=t1;    b=t1+t2;
-		t1 = a + e1(f) + Ch(f,g,h) + sha512_K[i+7] + W[i+7];
-		t2 = e0(b) + Maj(b,c,d);    e+=t1;    a=t1+t2;
+#define SHA512_0_15(i, a, b, c, d, e, f, g, h)			\
+	t1 = h + e1(e) + Ch(e, f, g) + sha512_K[i] + W[i];	\
+	t2 = e0(a) + Maj(a, b, c);				\
+	d += t1;						\
+	h = t1 + t2
+
+#define SHA512_16_79(i, a, b, c, d, e, f, g, h)			\
+	BLEND_OP(i, W);						\
+	t1 = h + e1(e) + Ch(e, f, g) + sha512_K[i] + W[(i)&15];	\
+	t2 = e0(a) + Maj(a, b, c);				\
+	d += t1;						\
+	h = t1 + t2
+
+	for (i = 0; i < 16; i += 8) {
+		SHA512_0_15(i, a, b, c, d, e, f, g, h);
+		SHA512_0_15(i + 1, h, a, b, c, d, e, f, g);
+		SHA512_0_15(i + 2, g, h, a, b, c, d, e, f);
+		SHA512_0_15(i + 3, f, g, h, a, b, c, d, e);
+		SHA512_0_15(i + 4, e, f, g, h, a, b, c, d);
+		SHA512_0_15(i + 5, d, e, f, g, h, a, b, c);
+		SHA512_0_15(i + 6, c, d, e, f, g, h, a, b);
+		SHA512_0_15(i + 7, b, c, d, e, f, g, h, a);
+	}
+	for (i = 16; i < 80; i += 8) {
+		SHA512_16_79(i, a, b, c, d, e, f, g, h);
+		SHA512_16_79(i + 1, h, a, b, c, d, e, f, g);
+		SHA512_16_79(i + 2, g, h, a, b, c, d, e, f);
+		SHA512_16_79(i + 3, f, g, h, a, b, c, d, e);
+		SHA512_16_79(i + 4, e, f, g, h, a, b, c, d);
+		SHA512_16_79(i + 5, d, e, f, g, h, a, b, c);
+		SHA512_16_79(i + 6, c, d, e, f, g, h, a, b);
+		SHA512_16_79(i + 7, b, c, d, e, f, g, h, a);
 	}
 
 	state[0] += a; state[1] += b; state[2] += c; state[3] += d;

next prev parent reply	other threads:[~2012-01-14 20:41 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-01-11  0:00 sha512: make it work, undo percpu message schedule Alexey Dobriyan
2012-01-11  0:12 ` Alexey Dobriyan
2012-01-11  0:36 ` Herbert Xu
2012-01-12 23:55   ` Alexey Dobriyan
2012-01-13  0:19     ` Herbert Xu
2012-01-13  7:08     ` Herbert Xu
2012-01-13 10:35       ` Eric Dumazet
2012-01-13 10:41         ` Eric Dumazet
2012-01-13 10:57           ` Eric Dumazet
2012-01-13 11:33             ` Alexey Dobriyan
2012-01-13 12:34               ` Eric Dumazet
2012-01-14 18:20                 ` Alexey Dobriyan
2012-01-14 18:27                   ` [PATCH 1/3] " Alexey Dobriyan
2012-01-14 18:40                     ` [PATCH 2/3] sha512: reduce stack usage to safe number Alexey Dobriyan
2012-01-14 18:44                       ` [PATCH 3/3] sha512: use standard ror64() Alexey Dobriyan
2012-01-14 19:08                       ` [PATCH 2/3] sha512: reduce stack usage to safe number Linus Torvalds
2012-01-14 20:41                         ` Alexey Dobriyan [this message]
2012-01-14 21:14                           ` Linus Torvalds
2012-01-16  9:56                           ` David Laight
2012-01-16 10:20                             ` Alexey Dobriyan
2012-01-16 10:23                             ` Eric Dumazet
2012-01-16 11:37                               ` David Laight
2012-01-17 12:03                               ` Alexey Dobriyan
2012-01-18 18:02                                 ` [PATCH 4/3] sha512: reduce stack usage even on i386 Alexey Dobriyan
2012-01-26  2:35                                   ` Herbert Xu
2012-01-27 17:51                                     ` Alexey Dobriyan
2012-01-27 22:32                                       ` Herbert Xu
2012-01-30 11:10                                         ` Alexey Dobriyan
2012-02-03  3:34                                           ` Herbert Xu
2012-01-14 21:46                     ` [PATCH 1/3] sha512: make it work, undo percpu message schedule Eric Dumazet
2012-01-14 21:52                       ` Linus Torvalds
2012-01-14 22:00                         ` Eric Dumazet
2012-01-15  1:43                     ` Herbert Xu
2012-01-13 11:02         ` Steffen Klassert
2012-01-15  1:43           ` Herbert Xu
2012-01-13 11:45         ` David Laight
2012-01-13 12:35           ` Eric Dumazet
2012-01-13  6:22   ` Steffen Klassert
2012-01-13  6:46     ` Herbert Xu
2012-01-13  6:48     ` Eric Dumazet
2012-01-13  6:50       ` Herbert Xu
2012-01-13  9:45       ` David Laight
2012-01-11  1:12 ` Adrian-Ken Rueegsegger

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120114204127.GA4100@p183.telecom.by \
    --to=adobriyan@gmail.com \
    --cc=eric.dumazet@gmail.com \
    --cc=herbert@gondor.apana.org.au \
    --cc=ken@codelabs.ch \
    --cc=linux-crypto@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=security@kernel.org \
    --cc=steffen.klassert@secunet.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.