git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH 0/7] block-sha1: improved SHA1 hashing
@ 2009-08-07  7:36 George Spelvin
  0 siblings, 0 replies; 30+ messages in thread
From: George Spelvin @ 2009-08-07  7:36 UTC (permalink / raw)
  To: torvalds; +Cc: art.08.09, git, linux

> Basically, older P4's will I think shift one bit at a time. So while even 
> Prescott is relatively weak in the shifter department, pre-prescott 
> (Willamette and Northwood) are _really_ weak. If your P4 is one of those, 
> you really shouldn't use it to decide on optimizations.

Thanks for the warning.  The only P4 I have a login on is a Willamette, so
I guess it's a bad optimization target:

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 1
model name      : Intel(R) Pentium(R) 4 CPU 1.60GHz
stepping        : 2
cpu MHz         : 1593.888
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pebs bts
bogomips        : 3187.77
clflush size    : 64
power management:



BTW, I'm having a bit of a problem with sha1bench.  Because the number
of rounds is multiplied by 10 until it takes more than mintime, and
the watchdog timer is set to 12*mintime, if I just barely miss the
threshold, I can hit the watchdog on any code that's more than 20% slower
than the reference rfc3174 code:

# /tmp/sha1bench 
#Initializing... Rounds: 10000000, size: 625000K, time: 6.904s, speed: 88.41MB/s
#             TIME[s] SPEED[MB/s]
rfc3174         6.912       88.31
# New hash result: 0489b02aee9fbd82b0bb0cba96f8047e42f543b8
rfc3174         6.911       88.31
linus           3.002       203.3
linusas         3.253       187.7
linusas2        3.059       199.6
mozilla         10.86       56.22
mozillaas       10.33       59.09
openssl         2.145       284.5
spelvin         1.933       315.8
spelvina        1.933       315.8
nettle          2.161       282.4

I had to bump it to 20 times to be able to get past the mozilla code.

You can still have nice round numbers with smaller incements of 2x or 2.5x:

    do {
-      rounds *= 10;
+      rounds *= 2;
+      if (rounds % 9 == 4)
+        rounds += rounds/4;     /* Step 1, 2, 5, 10, 20, 50, 100... */
       uWATCHDOG(mintime*20);
       t1 = GETTIME();
       for (i=0; i<rounds; i++) {
          SHA1Input(sc, arr512b, sizeof(arr512b));
          if (i<64) {
             SHA1Result(sc, arr512b+
                     (i+arr512b[i]+arr512b[arr512b[i]%64]+
                      arr512b[63-i]+arr512b[arr512b[63-i]%64]) %
                     (sizeof(arr512b)-20));
             SHA1Reset(sc);
             }
          }
       t2 = GETTIME(); td = t2-t1;
       }
    while (td<mintime);

And yeah, my P4 is touchy:
n# /var/tmp/sha1bench 
#Initializing... Rounds: 500000, size: 31250K, time: 0.9736s, speed: 31.35MB/s
#             TIME[s] SPEED[MB/s]
rfc3174        0.9931       30.73
# New hash result: 2872616106e163ae9c7c8d12a38bef032323c844
rfc3174        0.9491       32.16
linus          0.4906        62.2
linusas        0.5799       52.62
linusas2       0.4859       62.81
mozilla         1.302       23.44
mozillaas       1.234       24.74
openssl         0.226         135
spelvin        0.2298       132.8
spelvina       0.2494       122.4
nettle         0.3687       82.78

^ permalink raw reply	[flat|nested] 30+ messages in thread
* [PATCH 0/7] block-sha1: improved SHA1 hashing
@ 2009-08-06 15:13 Linus Torvalds
  2009-08-06 17:22 ` Artur Skawina
  0 siblings, 1 reply; 30+ messages in thread
From: Linus Torvalds @ 2009-08-06 15:13 UTC (permalink / raw)
  To: Git Mailing List, Junio C Hamano


The bulk of this is the patches I sent out yesterday, but there's a few 
added tweaks from today there, and it's now one nice series instead of a 
patch followed by a "Haa, I improved on it" etc, so you can get the whole 
thing without actually having to hunt for the pieces.

It's a series of 7 patches:

      block-sha1: add new optimized C 'block-sha1' routines
      block-sha1: try to use rol/ror appropriately
      block-sha1: make the 'ntohl()' part of the first SHA1 loop
      block-sha1: re-use the temporary array as we calculate the SHA1
      block-sha1: macroize the rounds a bit further
      block-sha1: Use '(B&C)+(D&(B^C))' instead of '(B&C)|(D&(B|C))' in round 3
      block-sha1: get rid of redundant 'lenW' context

where the thing is loosely based on the Mozilla SHA1 routines but by the 
end doesn't really resemble them all that much. The Mozilla ones suck 
donkey d*ck in so many ways - unnecessary copies, idiotic byte-at-a-time 
build-up of the hash input etc.

The end result is pretty much equivalent in performance to the OpenSSL 
SHA1 code for me on x86-64. Getting rid of OpenSSL gets rid of another 
couple of shared library loadings, and as a result my "make -j64 test" is 
now a couple of seconds faster with this than with OpenSSL in my rather 
unscientific tests.

The code isn't very big:

 Makefile          |    9 +++
 block-sha1/sha1.c |  158 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 block-sha1/sha1.h |   20 +++++++
 3 files changed, 187 insertions(+), 0 deletions(-)
 create mode 100644 block-sha1/sha1.c
 create mode 100644 block-sha1/sha1.h

and passes all my tests. It's really hard to not do the SHA1 right and 
still pass any tests at all, so it should all be good.

Enable them with BLK_SHA1=1.

		Linus

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2009-08-08 23:37 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-08-07  7:36 [PATCH 0/7] block-sha1: improved SHA1 hashing George Spelvin
  -- strict thread matches above, loose matches on Subject: below --
2009-08-06 15:13 Linus Torvalds
2009-08-06 17:22 ` Artur Skawina
2009-08-06 18:09   ` Linus Torvalds
2009-08-06 19:10     ` Artur Skawina
2009-08-06 19:41       ` Linus Torvalds
2009-08-06 20:08         ` Artur Skawina
2009-08-06 20:53           ` Linus Torvalds
2009-08-06 21:24             ` Linus Torvalds
2009-08-06 21:39             ` Artur Skawina
2009-08-06 21:52               ` Artur Skawina
2009-08-06 22:27                 ` Linus Torvalds
2009-08-06 22:33                   ` Linus Torvalds
2009-08-06 23:19                     ` Artur Skawina
2009-08-06 23:42                       ` Linus Torvalds
2009-08-06 22:55                   ` Artur Skawina
2009-08-06 23:04                     ` Linus Torvalds
2009-08-06 23:25                       ` Linus Torvalds
2009-08-07  0:13                         ` Linus Torvalds
2009-08-07  1:30                           ` Artur Skawina
2009-08-07  1:55                             ` Linus Torvalds
2009-08-07  0:53                         ` Artur Skawina
2009-08-07  2:23                   ` Linus Torvalds
2009-08-07  4:16                     ` Artur Skawina
     [not found]                     ` <alpine.LFD.2.01.0908071614310.3288@localhost.localdomain>
     [not found]                       ` <4A7CBD28.6070306@gmail.com>
     [not found]                         ` <4A7CBF47.9000903@gmail.com>
     [not found]                           ` <alpine.LFD.2.01.0908071700290.3288@localhost.localdomain>
     [not found]                             ` <4A7CC380.3070008@gmail.com>
2009-08-08  4:16                               ` Linus Torvalds
2009-08-08  5:34                                 ` Artur Skawina
2009-08-08 17:10                                   ` Linus Torvalds
2009-08-08 18:12                                     ` Artur Skawina
2009-08-08 22:58                                   ` Artur Skawina
2009-08-08 23:36                                     ` Artur Skawina

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).