* [PATCH] small sha512 cleanup
@ 2004-10-01 19:31 Denis Vlasenko
2004-10-01 20:38 ` [PATCH] reduce sha512_transform() stack usage, speedup Denis Vlasenko
0 siblings, 1 reply; 4+ messages in thread
From: Denis Vlasenko @ 2004-10-01 19:31 UTC (permalink / raw)
To: jmorris, davem; +Cc: linux-kernel
[-- Attachment #1: Type: text/plain, Size: 299 bytes --]
Looks like open-coded be_to_cpu.
GCC produces rather poor code for this.
be_to_cpu produces asm()s which are ~4 times shorter.
Compile-tested only.
I am not sure whether input can be 64bit-unaligned.
If it indeed can be, replace:
((u64*)(input))[I] -> get_unaligned( ((u64*)(input))+I )
--
vda
[-- Attachment #2: sha512.c.diff --]
[-- Type: text/x-diff, Size: 1004 bytes --]
Replaces tons of GCC-produced horror code
with nice small one.
While we're at it, fix whitespace.
--- linux-2.6.9-rc3/crypto/sha512.c.org Thu Sep 30 07:09:44 2004
+++ linux-2.6.9-rc3/crypto/sha512.c Thu Sep 30 07:10:36 2004
@@ -104,27 +104,12 @@
static inline void LOAD_OP(int I, u64 *W, const u8 *input)
{
- u64 t1 = input[(8*I) ] & 0xff;
- t1 <<= 8;
- t1 |= input[(8*I)+1] & 0xff;
- t1 <<= 8;
- t1 |= input[(8*I)+2] & 0xff;
- t1 <<= 8;
- t1 |= input[(8*I)+3] & 0xff;
- t1 <<= 8;
- t1 |= input[(8*I)+4] & 0xff;
- t1 <<= 8;
- t1 |= input[(8*I)+5] & 0xff;
- t1 <<= 8;
- t1 |= input[(8*I)+6] & 0xff;
- t1 <<= 8;
- t1 |= input[(8*I)+7] & 0xff;
- W[I] = t1;
+ W[I] = __be64_to_cpu( ((u64*)(input))[I] );
}
static inline void BLEND_OP(int I, u64 *W)
{
- W[I] = s1(W[I-2]) + W[I-7] + s0(W[I-15]) + W[I-16];
+ W[I] = s1(W[I-2]) + W[I-7] + s0(W[I-15]) + W[I-16];
}
static void
^ permalink raw reply [flat|nested] 4+ messages in thread
* [PATCH] reduce sha512_transform() stack usage, speedup
2004-10-01 19:31 [PATCH] small sha512 cleanup Denis Vlasenko
@ 2004-10-01 20:38 ` Denis Vlasenko
2004-10-01 20:43 ` David S. Miller
0 siblings, 1 reply; 4+ messages in thread
From: Denis Vlasenko @ 2004-10-01 20:38 UTC (permalink / raw)
To: jmorris, davem; +Cc: linux-kernel
[-- Attachment #1: Type: text/plain, Size: 1493 bytes --]
On top of previous:
Patch moves large temporary u64 W[80]
from stack to ctx struct:
* reduces stack usage by 640 bytes
* saves one 640-byte memset() per sha512_transform()
(we still do it after *all* iterations are done)
* quite unexpectedly saves 1.6k of code on i386
because stack offsets now fit into 8bits
and many stack addressing insns got 3 bytes smaller:
# size sha512.o.org sha512.o
text data bss dec hex filename
8281 372 0 8653 21cd sha512.o.org
6649 372 0 7021 1b6d sha512.o
# objdump -d sha512.o.org | cut -b9- >sha512.d.org
# objdump -d sha512.o | cut -b9- >sha512.d
# diff -u sha512.d.org sha512.d
[snip]
: 8b 4b 28 mov 0x28(%ebx),%ecx
: 8b 5b 2c mov 0x2c(%ebx),%ebx
-: 89 8d 44 fd ff ff mov %ecx,0xfffffd44(%ebp)
-: 89 9d 48 fd ff ff mov %ebx,0xfffffd48(%ebp)
-: 89 9d f4 fc ff ff mov %ebx,0xfffffcf4(%ebp)
+: 89 4d c4 mov %ecx,0xffffffc4(%ebp)
+: 89 5d c8 mov %ebx,0xffffffc8(%ebp)
+: 89 9d 64 ff ff ff mov %ebx,0xffffff64(%ebp)
: 8b 5d 08 mov 0x8(%ebp),%ebx
-: 89 8d f0 fc ff ff mov %ecx,0xfffffcf0(%ebp)
+: 89 8d 60 ff ff ff mov %ecx,0xffffff60(%ebp)
: 8b 42 30 mov 0x30(%edx),%eax
: 8b 52 34 mov 0x34(%edx),%edx
[snip]
WARNING: compile tested only.
--
vda
[-- Attachment #2: sha512.c.W.patch --]
[-- Type: text/x-diff, Size: 1198 bytes --]
--- linux-2.6.9-rc3/crypto/sha512.c.org Fri Oct 1 22:17:14 2004
+++ linux-2.6.9-rc3/crypto/sha512.c Fri Oct 1 23:20:13 2004
@@ -30,6 +30,7 @@
u64 state[8];
u32 count[4];
u8 buf[128];
+ u64 W[80];
};
static inline u64 Ch(u64 x, u64 y, u64 z)
@@ -113,10 +114,9 @@
}
static void
-sha512_transform(u64 *state, const u8 *input)
+sha512_transform(u64 *state, u64 *W, const u8 *input)
{
u64 a, b, c, d, e, f, g, h, t1, t2;
- u64 W[80];
int i;
@@ -157,7 +157,6 @@
/* erase our data */
a = b = c = d = e = f = g = h = t1 = t2 = 0;
- memset(W, 0, 80 * sizeof(u64));
}
static void
@@ -215,10 +214,10 @@
/* Transform as many times as possible. */
if (len >= part_len) {
memcpy(&sctx->buf[index], data, part_len);
- sha512_transform(sctx->state, sctx->buf);
+ sha512_transform(sctx->state, sctx->W, sctx->buf);
for (i = part_len; i + 127 < len; i+=128)
- sha512_transform(sctx->state, &data[i]);
+ sha512_transform(sctx->state, sctx->W, &data[i]);
index = 0;
} else {
@@ -227,6 +226,9 @@
/* Buffer remaining input */
memcpy(&sctx->buf[index], &data[i], len - i);
+
+ /* erase our data */
+ memset(sctx->W, 0, sizeof(sctx->W));
}
static void
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] reduce sha512_transform() stack usage, speedup
2004-10-01 20:38 ` [PATCH] reduce sha512_transform() stack usage, speedup Denis Vlasenko
@ 2004-10-01 20:43 ` David S. Miller
2004-10-01 21:22 ` Denis Vlasenko
0 siblings, 1 reply; 4+ messages in thread
From: David S. Miller @ 2004-10-01 20:43 UTC (permalink / raw)
To: Denis Vlasenko; +Cc: jmorris, linux-kernel
On Fri, 1 Oct 2004 23:38:11 +0300
Denis Vlasenko <vda@port.imtp.ilyichevsk.odessa.ua> wrote:
> WARNING: compile tested only.
You can't claim a "speed up" if you only compile test your
changes. Neither can you expect us to apply patches in
such a case.
It's not that difficult to load the tcrypt module and make
sure all the tests for the module you're changing still
pass.
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] reduce sha512_transform() stack usage, speedup
2004-10-01 20:43 ` David S. Miller
@ 2004-10-01 21:22 ` Denis Vlasenko
0 siblings, 0 replies; 4+ messages in thread
From: Denis Vlasenko @ 2004-10-01 21:22 UTC (permalink / raw)
To: David S. Miller; +Cc: jmorris, linux-kernel
On Friday 01 October 2004 23:43, David S. Miller wrote:
> On Fri, 1 Oct 2004 23:38:11 +0300
> Denis Vlasenko <vda@port.imtp.ilyichevsk.odessa.ua> wrote:
>
> > WARNING: compile tested only.
>
> You can't claim a "speed up" if you only compile test your
> changes. Neither can you expect us to apply patches in
> such a case.
Speedup is rather tiny, most probably not measurable.
Patch optimizes out some memsets, otherwise code
practically did not change.
> It's not that difficult to load the tcrypt module and make
> sure all the tests for the module you're changing still
> pass.
Done:
testing sha384
test 1:
cb00753f45a35e8bb5a03d699ac65007272c32ab0eded1631a8b605a43ff5bed8086072ba1e7cc2358baeca134c825a7
pass
test 2:
3391fdddfc8dc7393707a65b1b4709397cf8b1d162af05abfe8f450de5f36bc6b0455a8520bc4e6f5fe95b1fe3c8452b
pass
test 3:
09330c33f71147e83d192fc782cd1b4753111b173b3b05d22fa08086e3b0f712fcc7c71a557e2db966c3e9fa91746039
pass
test 4:
3d208973ab3508dbbd7e2c2862ba290ad3010e4978c198dc4d8fd014e582823a89e16f9b2a7bbc1ac938e2d199e8bea4
pass
testing sha384 across pages
test 1:
3d208973ab3508dbbd7e2c2862ba290ad3010e4978c198dc4d8fd014e582823a89e16f9b2a7bbc1ac938e2d199e8bea4
pass
testing sha512
test 1:
ddaf35a193617abacc417349ae20413112e6fa4e89a97ea20a9eeee64b55d39a2192992a274fc1a836ba3c23a3feebbd454d4423643ce80e2a9ac94fa54ca49f
pass
test 2:
204a8fc6dda82f0a0ced7beb8e08a41657c16ef468b228a8279be331a703c33596fd15c13b1b07f9aa1d3bea57789ca031ad85c7a71dd70354ec631238ca3445
pass
test 3:
8e959b75dae313da8cf4f72814fc143f8f7779c6eb9f7fa17299aeadb6889018501d289e4900f7e4331b99dec4b5433ac7d329eeb6dd26545e96e55b874be909
pass
test 4:
930d0cefcb30ff1133b6898121f1cf3d27578afcafe8677c5257cf069911f75d8f5831b56ebfda67b278e66dff8b84fe2b2870f742a580d8edb41987232850c9
pass
testing sha512 across pages
test 1:
930d0cefcb30ff1133b6898121f1cf3d27578afcafe8677c5257cf069911f75d8f5831b56ebfda67b278e66dff8b84fe2b2870f742a580d8edb41987232850c9
pass
Please consider applying.
--
vda
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2004-10-01 21:33 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-10-01 19:31 [PATCH] small sha512 cleanup Denis Vlasenko
2004-10-01 20:38 ` [PATCH] reduce sha512_transform() stack usage, speedup Denis Vlasenko
2004-10-01 20:43 ` David S. Miller
2004-10-01 21:22 ` Denis Vlasenko
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox