From: Artur Skawina <art.08.09@gmail.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicolas Pitre <nico@cam.org>, George Spelvin <linux@horizon.com>,
Junio C Hamano <gitster@pobox.com>,
git@vger.kernel.org
Subject: Re: x86 SHA1: Faster than OpenSSL
Date: Thu, 06 Aug 2009 07:44:28 +0200
Message-ID: <4A7A6DBC.9010107@gmail.com>
In-Reply-To: <alpine.LFD.2.01.0908052120330.3390@localhost.localdomain>
Linus Torvalds wrote:
>
> On Thu, 6 Aug 2009, Artur Skawina wrote:
>>> The way it's written, I can easily make it do one or the other by just
>>> turning the macro inside a loop (and we can have a preprocessor flag to
>>> choose one or the other), but let me work on it a bit more first.
>> that's of course how i measured it.. :)
>
> Well, with my "rolling 512-bit array" I can't do that easily any more.
>
> Now it actually depends on the compiler being able to statically do that
> circular list calculation. If I were to turn it back into the chunks of
> loops, my new code would suck, because it would have all those nasty
> dynamic address calculations.
I did try (obvious patch below), and in fact the loops still win on P4:
#Initializing... Rounds: 1000000, size: 62500K, time: 1.428s, speed: 42.76MB/s
#             TIME[s] SPEED[MB/s]
rfc3174         1.437       42.47
rfc3174         1.438       42.45
linus          0.5791       105.4
linusas        0.5052       120.8
mozilla         1.525       40.01
mozillaas       1.192       51.19
artur
--- block-sha1/sha1.c 2009-08-06 06:45:03.407322970 +0200
+++ block-sha1/sha1as.c 2009-08-06 07:36:41.332318683 +0200
@@ -107,13 +107,17 @@
#define T_0_15(t) \
TEMP = htonl(data[t]); array[t] = TEMP; \
- TEMP += SHA_ROL(A,5) + (((C^D)&B)^D) + E + 0x5a827999; \
- E = D; D = C; C = SHA_ROR(B, 2); B = A; A = TEMP; \
+ TEMP += SHA_ROL(A,5) + (((C^D)&B)^D) + E; \
+ E = D; D = C; C = SHA_ROR(B, 2); B = A; A = TEMP + 0x5a827999; \

+#if UNROLL
T_0_15( 0); T_0_15( 1); T_0_15( 2); T_0_15( 3); T_0_15( 4);
T_0_15( 5); T_0_15( 6); T_0_15( 7); T_0_15( 8); T_0_15( 9);
T_0_15(10); T_0_15(11); T_0_15(12); T_0_15(13); T_0_15(14);
T_0_15(15);
+#else
+ for (int t = 0; t <= 15; t++) { T_0_15(t); }
+#endif
/* This "rolls" over the 512-bit array */
#define W(x) (array[(x)&15])
@@ -125,37 +129,53 @@
TEMP += SHA_ROL(A,5) + (((C^D)&B)^D) + E + 0x5a827999; \
E = D; D = C; C = SHA_ROR(B, 2); B = A; A = TEMP; \

+#if UNROLL
T_16_19(16); T_16_19(17); T_16_19(18); T_16_19(19);
+#else
+ for (int t = 16; t <= 19; t++) { T_16_19(t); }
+#endif
#define T_20_39(t) \
SHA_XOR(t); \
- TEMP += SHA_ROL(A,5) + (B^C^D) + E + 0x6ed9eba1; \
- E = D; D = C; C = SHA_ROR(B, 2); B = A; A = TEMP;
+ TEMP += SHA_ROL(A,5) + (B^C^D) + E; \
+ E = D; D = C; C = SHA_ROR(B, 2); B = A; A = TEMP + 0x6ed9eba1;
+#if UNROLL
T_20_39(20); T_20_39(21); T_20_39(22); T_20_39(23); T_20_39(24);
T_20_39(25); T_20_39(26); T_20_39(27); T_20_39(28); T_20_39(29);
T_20_39(30); T_20_39(31); T_20_39(32); T_20_39(33); T_20_39(34);
T_20_39(35); T_20_39(36); T_20_39(37); T_20_39(38); T_20_39(39);
+#else
+ for (int t = 20; t <= 39; t++) { T_20_39(t); }
+#endif
#define T_40_59(t) \
SHA_XOR(t); \
- TEMP += SHA_ROL(A,5) + ((B&C)|(D&(B|C))) + E + 0x8f1bbcdc; \
- E = D; D = C; C = SHA_ROR(B, 2); B = A; A = TEMP;
+ TEMP += SHA_ROL(A,5) + ((B&C)|(D&(B|C))) + E; \
+ E = D; D = C; C = SHA_ROR(B, 2); B = A; A = TEMP + 0x8f1bbcdc;
+#if UNROLL
T_40_59(40); T_40_59(41); T_40_59(42); T_40_59(43); T_40_59(44);
T_40_59(45); T_40_59(46); T_40_59(47); T_40_59(48); T_40_59(49);
T_40_59(50); T_40_59(51); T_40_59(52); T_40_59(53); T_40_59(54);
T_40_59(55); T_40_59(56); T_40_59(57); T_40_59(58); T_40_59(59);
+#else
+ for (int t = 40; t <= 59; t++) { T_40_59(t); }
+#endif
#define T_60_79(t) \
SHA_XOR(t); \
TEMP += SHA_ROL(A,5) + (B^C^D) + E + 0xca62c1d6; \
E = D; D = C; C = SHA_ROR(B, 2); B = A; A = TEMP;
+#if UNROLL
T_60_79(60); T_60_79(61); T_60_79(62); T_60_79(63); T_60_79(64);
T_60_79(65); T_60_79(66); T_60_79(67); T_60_79(68); T_60_79(69);
T_60_79(70); T_60_79(71); T_60_79(72); T_60_79(73); T_60_79(74);
T_60_79(75); T_60_79(76); T_60_79(77); T_60_79(78); T_60_79(79);
+#else
+ for (int t = 60; t <= 79; t++) { T_60_79(t); }
+#endif
ctx->H[0] += A;
ctx->H[1] += B;