From: Artur Skawina <art.08.09@gmail.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicolas Pitre <nico@cam.org>, George Spelvin <linux@horizon.com>,
	Junio C Hamano <gitster@pobox.com>,
	git@vger.kernel.org
Subject: Re: x86 SHA1: Faster than OpenSSL
Date: Thu, 06 Aug 2009 07:19:01 +0200
Message-ID: <4A7A67C5.8060109@gmail.com>
In-Reply-To: <alpine.LFD.2.01.0908052137400.3390@localhost.localdomain>

Linus Torvalds wrote:
> 
> On Thu, 6 Aug 2009, Artur Skawina wrote:
>> #             TIME[s] SPEED[MB/s]
>> rfc3174         1.357       44.99
>> rfc3174         1.352       45.13
>> mozilla         1.509       40.44
>> mozillaas       1.133       53.87
>> linus          0.5818       104.9
>>
>> so it's more than twice as fast as the mozilla implementation.
> 
> So that's some general SHA1 benchmark you have?
> 
> I hope it tests correctness too. 

Yep, sort of: I just check that all versions return the same result
when hashing some pseudorandom data.
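
A minimal sketch of that kind of cross-check, not the actual benchmark
code. Everything named here (hash_fn, struct impl, cross_check and the
toy_* stand-in hashes) is hypothetical; in a real harness the stand-ins
would be replaced by the SHA1 implementations under test.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DIGEST_LEN 20  /* SHA-1 digest size in bytes */

/* Hypothetical common signature for every implementation under test. */
typedef void (*hash_fn)(const unsigned char *data, size_t len,
                        unsigned char out[DIGEST_LEN]);

struct impl {
    const char *name;
    hash_fn fn;
};

/* Expand a 32-bit value into a DIGEST_LEN-byte toy "digest". */
void toy_emit(uint32_t h, unsigned char out[DIGEST_LEN])
{
    for (int i = 0; i < DIGEST_LEN; i++)
        out[i] = (unsigned char)(h >> (8 * (i % 4)));
}

/* Stand-in #1 (NOT SHA-1): rotate-and-add checksum, one byte at a time. */
void toy_bytewise(const unsigned char *data, size_t len,
                  unsigned char out[DIGEST_LEN])
{
    uint32_t h = 0x67452301u;
    for (size_t i = 0; i < len; i++)
        h = ((h << 5) | (h >> 27)) + data[i];
    toy_emit(h, out);
}

/* Stand-in #2: the same checksum, manually unrolled by two. */
void toy_unrolled(const unsigned char *data, size_t len,
                  unsigned char out[DIGEST_LEN])
{
    uint32_t h = 0x67452301u;
    size_t i = 0;
    for (; i + 1 < len; i += 2) {
        h = ((h << 5) | (h >> 27)) + data[i];
        h = ((h << 5) | (h >> 27)) + data[i + 1];
    }
    if (i < len)  /* odd trailing byte */
        h = ((h << 5) | (h >> 27)) + data[i];
    toy_emit(h, out);
}

/* Stand-in #3: deliberately corrupted, to show that a mismatch is caught. */
void toy_broken(const unsigned char *data, size_t len,
                unsigned char out[DIGEST_LEN])
{
    toy_bytewise(data, len, out);
    out[0] ^= 1;
}

/*
 * Hash `len` pseudorandom bytes with every implementation and return how
 * many of them disagree with impls[0]; returns -1 on allocation failure.
 */
int cross_check(const struct impl *impls, size_t n, size_t len, unsigned seed)
{
    unsigned char ref[DIGEST_LEN], got[DIGEST_LEN];
    unsigned char *buf;
    int mismatches = 0;

    assert(n > 0);
    buf = malloc(len ? len : 1);
    if (!buf)
        return -1;

    srand(seed);
    for (size_t i = 0; i < len; i++)
        buf[i] = (unsigned char)(rand() & 0xff);

    impls[0].fn(buf, len, ref);
    for (size_t k = 1; k < n; k++) {
        impls[k].fn(buf, len, got);
        if (memcmp(ref, got, DIGEST_LEN) != 0)
            mismatches++;
    }
    free(buf);
    return mismatches;
}
```

Note that agreement only shows the implementations are consistent with
each other. Comparing against a published test vector as well would
catch a bug shared by all of them.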

> As to my atom testing: my poor little atom is a sad little thing, and 
> it's almost painful to benchmark that thing. But it's worth it to look at 
> how the 32-bit code compares to the openssl asm code too:
> 
>  - BLK_SHA1:
> 	real	2m27.160s
>  - OpenSSL:
> 	real	2m12.580s
>  - Mozilla-SHA1:
> 	real	3m21.836s
> 
> As expected, the hand-tuned assembly does better (and by a bigger margin). 
> Probably partly because scheduling is important when in-order, and partly 
> because gcc will have a harder time with the small register set.
> 
> But it's still a big improvement over the mozilla one.
> 
> (This is, as always, 'git fsck --full'. It spends about 50% on that SHA1 
> calculation, so the SHA1 speedup is larger than you see from just the 
> numbers)

I'll start looking at other CPUs once I integrate the asm versions into
my benchmark. 

P4s really are "special". Even something as simple as this on top of your
version:

@@ -129,8 +133,8 @@
 
 #define T_20_39(t) \
        SHA_XOR(t); \
-       TEMP += SHA_ROL(A,5) + (B^C^D) + E + 0x6ed9eba1; \
-       E = D; D = C; C = SHA_ROR(B, 2); B = A; A = TEMP;
+       TEMP += SHA_ROL(A,5) + (B^C^D) + E; \
+       E = D; D = C; C = SHA_ROR(B, 2); B = A; A = TEMP + 0x6ed9eba1;
 
        T_20_39(20); T_20_39(21); T_20_39(22); T_20_39(23); T_20_39(24);
        T_20_39(25); T_20_39(26); T_20_39(27); T_20_39(28); T_20_39(29);
@@ -139,8 +143,8 @@
 
 #define T_40_59(t) \
        SHA_XOR(t); \
-       TEMP += SHA_ROL(A,5) + ((B&C)|(D&(B|C))) + E + 0x8f1bbcdc; \
-       E = D; D = C; C = SHA_ROR(B, 2); B = A; A = TEMP;
+       TEMP += SHA_ROL(A,5) + ((B&C)|(D&(B|C))) + E; \
+       E = D; D = C; C = SHA_ROR(B, 2); B = A; A = TEMP + 0x8f1bbcdc;
 
        T_40_59(40); T_40_59(41); T_40_59(42); T_40_59(43); T_40_59(44);
        T_40_59(45); T_40_59(46); T_40_59(47); T_40_59(48); T_40_59(49);

saves another 10% or so:

#Initializing... Rounds: 1000000, size: 62500K, time: 1.421s, speed: 42.97MB/s
#             TIME[s] SPEED[MB/s]
rfc3174         1.403        43.5
# New hash result: b747042d9f4f1fdabd2ac53076f8f830dea7fe0f
rfc3174         1.403       43.51
linus          0.5891       103.6
linusas        0.5337       114.4
mozilla         1.535       39.76
mozillaas       1.128       54.13
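
For reference, the patch above only moves the round constant out of the
long addition chain and into the final assignment to A. Since 32-bit
unsigned addition wraps modulo 2^32 and is associative, both forms
leave the same value in A. The sketch below checks that for the
T_20_39 round. It assumes TEMP holds the schedule word left by
SHA_XOR(t) and is not read again after the round; the struct, function
names and the xorshift generator are made up for this illustration.

```c
#include <stdint.h>

#define ROL32(x, n) (((x) << (n)) | ((x) >> (32 - (n))))
#define ROR32(x, n) (((x) >> (n)) | ((x) << (32 - (n))))
#define K_20_39 0x6ed9eba1u

/* Hypothetical container for the five SHA-1 working variables. */
struct sha_state { uint32_t a, b, c, d, e; };

/* Original form: the constant is part of the long addition into TEMP. */
void round_20_39_orig(struct sha_state *s, uint32_t w)
{
    uint32_t temp = w;  /* assumed: schedule word left by SHA_XOR(t) */
    temp += ROL32(s->a, 5) + (s->b ^ s->c ^ s->d) + s->e + K_20_39;
    s->e = s->d; s->d = s->c; s->c = ROR32(s->b, 2); s->b = s->a;
    s->a = temp;
}

/* Patched form: the constant is added when A is assigned. */
void round_20_39_patched(struct sha_state *s, uint32_t w)
{
    uint32_t temp = w;
    temp += ROL32(s->a, 5) + (s->b ^ s->c ^ s->d) + s->e;
    s->e = s->d; s->d = s->c; s->c = ROR32(s->b, 2); s->b = s->a;
    s->a = temp + K_20_39;
}

/* Small xorshift32 generator, used only to produce test inputs. */
uint32_t xorshift32(uint32_t *x)
{
    *x ^= *x << 13;
    *x ^= *x >> 17;
    *x ^= *x << 5;
    return *x;
}

/* Run both forms side by side; return the number of rounds that differ. */
int count_round_mismatches(unsigned iterations, uint32_t seed)
{
    uint32_t rng = seed ? seed : 1;  /* xorshift state must be nonzero */
    struct sha_state p, q;
    int mismatches = 0;

    p.a = xorshift32(&rng); p.b = xorshift32(&rng); p.c = xorshift32(&rng);
    p.d = xorshift32(&rng); p.e = xorshift32(&rng);
    q = p;

    for (unsigned i = 0; i < iterations; i++) {
        uint32_t w = xorshift32(&rng);
        round_20_39_orig(&p, w);
        round_20_39_patched(&q, w);
        if (p.a != q.a || p.b != q.b || p.c != q.c ||
            p.d != q.d || p.e != q.e)
            mismatches++;
    }
    return mismatches;
}
```

count_round_mismatches() should return 0 for any seed and iteration
count, which is why the benchmark reports the same hash for both
versions. Only the instruction ordering the compiler sees changes.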


artur
