[21/28] crypto: sha512 - Avoid stack bloat on i386

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Greg KH <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: torvalds@linux-foundation.org, akpm@linux-foundation.org,
	alan@lxorguk.ukuu.org.uk,
	Herbert Xu <herbert@gondor.apana.org.au>
Subject: [21/28] crypto: sha512 - Avoid stack bloat on i386
Date: Thu, 16 Feb 2012 16:55:55 -0800	[thread overview]
Message-ID: <20120217005536.880797019@linuxfoundation.org> (raw)
In-Reply-To: <20120217005609.GA17081@kroah.com>

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Herbert Xu <herbert@gondor.apana.org.au>

commit 3a92d687c8015860a19213e3c102cad6b722f83c upstream.

Unfortunately in reducing W from 80 to 16 we ended up unrolling
the loop twice.  As gcc has issues dealing with 64-bit ops on
i386 this means that we end up using even more stack space (>1K).

This patch solves the W reduction by moving LOAD_OP/BLEND_OP
into the loop itself, thus avoiding the need to duplicate it.

While the stack space still isn't great (>0.5K) it is at least
in the same ball park as the amount of stack used for our C sha1
implementation.

Note that this patch basically reverts to the original code so
the diff looks bigger than it really is.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 crypto/sha512_generic.c |   68 ++++++++++++++++++++++--------------------------
 1 file changed, 32 insertions(+), 36 deletions(-)

--- a/crypto/sha512_generic.c
+++ b/crypto/sha512_generic.c
@@ -89,46 +89,42 @@ sha512_transform(u64 *state, const u8 *i
 	int i;
 	u64 W[16];
 
-	/* load the input */
-        for (i = 0; i < 16; i++)
-                LOAD_OP(i, W, input);
-
 	/* load the state into our registers */
 	a=state[0];   b=state[1];   c=state[2];   d=state[3];
 	e=state[4];   f=state[5];   g=state[6];   h=state[7];
 
-#define SHA512_0_15(i, a, b, c, d, e, f, g, h)			\
-	t1 = h + e1(e) + Ch(e, f, g) + sha512_K[i] + W[i];	\
-	t2 = e0(a) + Maj(a, b, c);				\
-	d += t1;						\
-	h = t1 + t2
-
-#define SHA512_16_79(i, a, b, c, d, e, f, g, h)			\
-	BLEND_OP(i, W);						\
-	t1 = h + e1(e) + Ch(e, f, g) + sha512_K[i] + W[(i)&15];	\
-	t2 = e0(a) + Maj(a, b, c);				\
-	d += t1;						\
-	h = t1 + t2
-
-	for (i = 0; i < 16; i += 8) {
-		SHA512_0_15(i, a, b, c, d, e, f, g, h);
-		SHA512_0_15(i + 1, h, a, b, c, d, e, f, g);
-		SHA512_0_15(i + 2, g, h, a, b, c, d, e, f);
-		SHA512_0_15(i + 3, f, g, h, a, b, c, d, e);
-		SHA512_0_15(i + 4, e, f, g, h, a, b, c, d);
-		SHA512_0_15(i + 5, d, e, f, g, h, a, b, c);
-		SHA512_0_15(i + 6, c, d, e, f, g, h, a, b);
-		SHA512_0_15(i + 7, b, c, d, e, f, g, h, a);
-	}
-	for (i = 16; i < 80; i += 8) {
-		SHA512_16_79(i, a, b, c, d, e, f, g, h);
-		SHA512_16_79(i + 1, h, a, b, c, d, e, f, g);
-		SHA512_16_79(i + 2, g, h, a, b, c, d, e, f);
-		SHA512_16_79(i + 3, f, g, h, a, b, c, d, e);
-		SHA512_16_79(i + 4, e, f, g, h, a, b, c, d);
-		SHA512_16_79(i + 5, d, e, f, g, h, a, b, c);
-		SHA512_16_79(i + 6, c, d, e, f, g, h, a, b);
-		SHA512_16_79(i + 7, b, c, d, e, f, g, h, a);
+	/* now iterate */
+	for (i=0; i<80; i+=8) {
+		if (!(i & 8)) {
+			int j;
+
+			if (i < 16) {
+				/* load the input */
+				for (j = 0; j < 16; j++)
+					LOAD_OP(i + j, W, input);
+			} else {
+				for (j = 0; j < 16; j++) {
+					BLEND_OP(i + j, W);
+				}
+			}
+		}
+
+		t1 = h + e1(e) + Ch(e,f,g) + sha512_K[i  ] + W[(i & 15)];
+		t2 = e0(a) + Maj(a,b,c);    d+=t1;    h=t1+t2;
+		t1 = g + e1(d) + Ch(d,e,f) + sha512_K[i+1] + W[(i & 15) + 1];
+		t2 = e0(h) + Maj(h,a,b);    c+=t1;    g=t1+t2;
+		t1 = f + e1(c) + Ch(c,d,e) + sha512_K[i+2] + W[(i & 15) + 2];
+		t2 = e0(g) + Maj(g,h,a);    b+=t1;    f=t1+t2;
+		t1 = e + e1(b) + Ch(b,c,d) + sha512_K[i+3] + W[(i & 15) + 3];
+		t2 = e0(f) + Maj(f,g,h);    a+=t1;    e=t1+t2;
+		t1 = d + e1(a) + Ch(a,b,c) + sha512_K[i+4] + W[(i & 15) + 4];
+		t2 = e0(e) + Maj(e,f,g);    h+=t1;    d=t1+t2;
+		t1 = c + e1(h) + Ch(h,a,b) + sha512_K[i+5] + W[(i & 15) + 5];
+		t2 = e0(d) + Maj(d,e,f);    g+=t1;    c=t1+t2;
+		t1 = b + e1(g) + Ch(g,h,a) + sha512_K[i+6] + W[(i & 15) + 6];
+		t2 = e0(c) + Maj(c,d,e);    f+=t1;    b=t1+t2;
+		t1 = a + e1(f) + Ch(f,g,h) + sha512_K[i+7] + W[(i & 15) + 7];
+		t2 = e0(b) + Maj(b,c,d);    e+=t1;    a=t1+t2;
 	}
 
 	state[0] += a; state[1] += b; state[2] += c; state[3] += d;

WARNING: multiple messages have this Message-ID (diff)

From: Greg KH <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: torvalds@linux-foundation.org, akpm@linux-foundation.org,
	alan@lxorguk.ukuu.org.uk,
	Herbert Xu <herbert@gondor.hengli.com.au>
Subject: [21/28] crypto: sha512 - Avoid stack bloat on i386
Date: Thu, 16 Feb 2012 16:55:55 -0800	[thread overview]
Message-ID: <20120217005536.880797019@linuxfoundation.org> (raw)
In-Reply-To: <20120217005609.GA17081@kroah.com>

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Herbert Xu <herbert@gondor.apana.org.au>

commit 3a92d687c8015860a19213e3c102cad6b722f83c upstream.

Unfortunately in reducing W from 80 to 16 we ended up unrolling
the loop twice.  As gcc has issues dealing with 64-bit ops on
i386 this means that we end up using even more stack space (>1K).

This patch solves the W reduction by moving LOAD_OP/BLEND_OP
into the loop itself, thus avoiding the need to duplicate it.

While the stack space still isn't great (>0.5K) it is at least
in the same ball park as the amount of stack used for our C sha1
implementation.

Note that this patch basically reverts to the original code so
the diff looks bigger than it really is.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 crypto/sha512_generic.c |   68 ++++++++++++++++++++++--------------------------
 1 file changed, 32 insertions(+), 36 deletions(-)

--- a/crypto/sha512_generic.c
+++ b/crypto/sha512_generic.c
@@ -89,46 +89,42 @@ sha512_transform(u64 *state, const u8 *i
 	int i;
 	u64 W[16];
 
-	/* load the input */
-        for (i = 0; i < 16; i++)
-                LOAD_OP(i, W, input);
-
 	/* load the state into our registers */
 	a=state[0];   b=state[1];   c=state[2];   d=state[3];
 	e=state[4];   f=state[5];   g=state[6];   h=state[7];
 
-#define SHA512_0_15(i, a, b, c, d, e, f, g, h)			\
-	t1 = h + e1(e) + Ch(e, f, g) + sha512_K[i] + W[i];	\
-	t2 = e0(a) + Maj(a, b, c);				\
-	d += t1;						\
-	h = t1 + t2
-
-#define SHA512_16_79(i, a, b, c, d, e, f, g, h)			\
-	BLEND_OP(i, W);						\
-	t1 = h + e1(e) + Ch(e, f, g) + sha512_K[i] + W[(i)&15];	\
-	t2 = e0(a) + Maj(a, b, c);				\
-	d += t1;						\
-	h = t1 + t2
-
-	for (i = 0; i < 16; i += 8) {
-		SHA512_0_15(i, a, b, c, d, e, f, g, h);
-		SHA512_0_15(i + 1, h, a, b, c, d, e, f, g);
-		SHA512_0_15(i + 2, g, h, a, b, c, d, e, f);
-		SHA512_0_15(i + 3, f, g, h, a, b, c, d, e);
-		SHA512_0_15(i + 4, e, f, g, h, a, b, c, d);
-		SHA512_0_15(i + 5, d, e, f, g, h, a, b, c);
-		SHA512_0_15(i + 6, c, d, e, f, g, h, a, b);
-		SHA512_0_15(i + 7, b, c, d, e, f, g, h, a);
-	}
-	for (i = 16; i < 80; i += 8) {
-		SHA512_16_79(i, a, b, c, d, e, f, g, h);
-		SHA512_16_79(i + 1, h, a, b, c, d, e, f, g);
-		SHA512_16_79(i + 2, g, h, a, b, c, d, e, f);
-		SHA512_16_79(i + 3, f, g, h, a, b, c, d, e);
-		SHA512_16_79(i + 4, e, f, g, h, a, b, c, d);
-		SHA512_16_79(i + 5, d, e, f, g, h, a, b, c);
-		SHA512_16_79(i + 6, c, d, e, f, g, h, a, b);
-		SHA512_16_79(i + 7, b, c, d, e, f, g, h, a);
+	/* now iterate */
+	for (i=0; i<80; i+=8) {
+		if (!(i & 8)) {
+			int j;
+
+			if (i < 16) {
+				/* load the input */
+				for (j = 0; j < 16; j++)
+					LOAD_OP(i + j, W, input);
+			} else {
+				for (j = 0; j < 16; j++) {
+					BLEND_OP(i + j, W);
+				}
+			}
+		}
+
+		t1 = h + e1(e) + Ch(e,f,g) + sha512_K[i  ] + W[(i & 15)];
+		t2 = e0(a) + Maj(a,b,c);    d+=t1;    h=t1+t2;
+		t1 = g + e1(d) + Ch(d,e,f) + sha512_K[i+1] + W[(i & 15) + 1];
+		t2 = e0(h) + Maj(h,a,b);    c+=t1;    g=t1+t2;
+		t1 = f + e1(c) + Ch(c,d,e) + sha512_K[i+2] + W[(i & 15) + 2];
+		t2 = e0(g) + Maj(g,h,a);    b+=t1;    f=t1+t2;
+		t1 = e + e1(b) + Ch(b,c,d) + sha512_K[i+3] + W[(i & 15) + 3];
+		t2 = e0(f) + Maj(f,g,h);    a+=t1;    e=t1+t2;
+		t1 = d + e1(a) + Ch(a,b,c) + sha512_K[i+4] + W[(i & 15) + 4];
+		t2 = e0(e) + Maj(e,f,g);    h+=t1;    d=t1+t2;
+		t1 = c + e1(h) + Ch(h,a,b) + sha512_K[i+5] + W[(i & 15) + 5];
+		t2 = e0(d) + Maj(d,e,f);    g+=t1;    c=t1+t2;
+		t1 = b + e1(g) + Ch(g,h,a) + sha512_K[i+6] + W[(i & 15) + 6];
+		t2 = e0(c) + Maj(c,d,e);    f+=t1;    b=t1+t2;
+		t1 = a + e1(f) + Ch(f,g,h) + sha512_K[i+7] + W[(i & 15) + 7];
+		t2 = e0(b) + Maj(b,c,d);    e+=t1;    a=t1+t2;
 	}
 
 	state[0] += a; state[1] += b; state[2] += c; state[3] += d;

next prev parent reply	other threads:[~2012-02-17  0:55 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-02-17  0:56 [00/28] 3.2.7-stable review Greg KH
2012-02-17  0:55 ` [01/28] ixgbe: fix vf lookup Greg KH
2012-02-17  0:55 ` [02/28] igb: " Greg KH
2012-02-17  0:55 ` [03/28] perf evsel: Fix an issue where perf report fails to show the proper percentage Greg KH
2012-02-17  0:55 ` [04/28] perf tools: Fix perf stack to non executable on x86_64 Greg KH
2012-02-17  0:55 ` [05/28] drm/i915: Force explicit bpp selection for intel_dp_link_required Greg KH
2012-02-17  0:55 ` [06/28] drm/i915: no lvds quirk for AOpen MP45 Greg KH
2012-02-17  0:55 ` [07/28] ath9k: Fix kernel panic during driver initilization Greg KH
2012-02-17  0:55 ` [08/28] ath9k: fix a WEP crypto related regression Greg KH
2012-02-17  0:55 ` [09/28] ath9k_hw: fix a RTS/CTS timeout regression Greg KH
2012-02-17  0:55 ` [10/28] hwmon: (f75375s) Fix bit shifting in f75375_write16 Greg KH
2012-02-17  0:55 ` [11/28] net: enable TC35815 for MIPS again Greg KH
2012-02-17  0:55 ` [12/28] lib: proportion: lower PROP_MAX_SHIFT to 32 on 64-bit kernel Greg KH
2012-02-17  0:55 ` [13/28] relay: prevent integer overflow in relay_open() Greg KH
2012-02-17  0:55 ` [14/28] mac80211: timeout a single frame in the rx reorder buffer Greg KH
2012-02-17  0:55 ` [15/28] writeback: fix NULL bdi->dev in trace writeback_single_inode Greg KH
2012-02-17  0:55 ` [16/28] writeback: fix dereferencing NULL bdi->dev on trace_writeback_queue Greg KH
2012-02-17  0:55 ` [17/28] hwmon: (f75375s) Fix automatic pwm mode setting for F75373 & F75375 Greg KH
2012-02-17  0:55 ` [18/28] cifs: request oplock when doing open on lookup Greg KH
2012-02-17  0:55 ` [19/28] cifs: dont return error from standard_receive3 after marking response malformed Greg KH
2012-02-17  0:55 ` [20/28] crypto: sha512 - Use binary and instead of modulus Greg KH
2012-02-17  0:55   ` Greg KH
2012-02-17  0:55 ` Greg KH [this message]
2012-02-17  0:55   ` [21/28] crypto: sha512 - Avoid stack bloat on i386 Greg KH
2012-02-17  1:52   ` David Miller
2012-02-17  2:17     ` Greg KH
2012-02-17  2:28       ` Herbert Xu
2012-02-17  2:28         ` Herbert Xu
2012-02-17  2:37         ` Greg KH
2012-02-17  2:37           ` Greg KH
2012-02-17  2:54           ` David Miller
2012-02-17  3:30             ` Greg KH
2012-02-20 20:45               ` Greg KH
2012-02-17  0:55 ` [22/28] backing-dev: fix wakeup timer races with bdi_unregister() Greg KH
2012-02-17  0:55 ` [23/28] ALSA: intel8x0: Fix default inaudible sound on Gateway M520 Greg KH
2012-02-17  0:55 ` [24/28] ALSA: hda - Fix initialization of secondary capture source on VT1705 Greg KH
2012-02-17  0:55 ` [25/28] ALSA: hda - Fix silent speaker output on Acer Aspire 6935 Greg KH
2012-02-17  0:56 ` [26/28] mmc: atmel-mci: save and restore sdioirq when soft reset is performed Greg KH
2012-02-17  0:56 ` [27/28] mmc: dw_mmc: Fix PIO mode with support of highmem Greg KH
2012-02-17  0:56 ` [28/28] xen pvhvm: do not remap pirqs onto evtchns if !xen_have_vector_callback Greg KH

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120217005536.880797019@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=akpm@linux-foundation.org \
    --cc=alan@lxorguk.ukuu.org.uk \
    --cc=herbert@gondor.apana.org.au \
    --cc=linux-kernel@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.