From: Artur Skawina <art.08.09@gmail.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicolas Pitre <nico@cam.org>, George Spelvin <linux@horizon.com>,
	Junio C Hamano <gitster@pobox.com>,
	git@vger.kernel.org
Subject: Re: x86 SHA1: Faster than OpenSSL
Date: Thu, 06 Aug 2009 05:19:33 +0200
Message-ID: <4A7A4BC5.7010106@gmail.com>
In-Reply-To: <alpine.LFD.2.01.0908051902580.3390@localhost.localdomain>

Linus Torvalds wrote:
> 
> The bigger issue seems to be that it's shifter-limited, or that's what I 
> take away from my profiles. I suspect it's even _more_ shifter-limited on 
> some other micro-architectures, because gcc is being stupid, and generates
> 
> 	ror $31,%eax
> 
> from the "left shift + right shift" combination. It seems to -always- 
> generate a "ror", rather than trying to generate 'rol' if the shift count 
> would be smaller that way.
> 
> And I know _some_ old micro-architectures will literally internally loop 
> on the rol/ror counts, so "ror $31" can be _much_ more expensive than "rol 
> $1".
> 
> That isn't the case on my Nehalem, though. But I can't seem to get gcc to 
> generate better code without actually using inline asm..

The compiler does the right thing with something like this:

+#if __GNUC__>1 && defined(__i386)
+#define SHA_ROT(data,bits) ({ \
+  unsigned d = (data); \
+  if ((bits)<16) \
+    __asm__ ("roll %1,%0" : "=r" (d) : "I" (bits), "0" (d)); \
+  else \
+    __asm__ ("rorl %1,%0" : "=r" (d) : "I" (32-(bits)), "0" (d)); \
+  d; \
+  })
+#else
 #define SHA_ROT(X,n) (((X) << (n)) | ((X) >> (32-(n))))
+#endif
 
which doesn't obfuscate the code as much.
(I needed the asm on P4 anyway, as without it the Mozilla version is even
 slower than an RFC 3174 one. rol vs. ror makes no measurable difference.)
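
For builds where the inline-asm path is not available, the same selection logic can be sketched in portable C. This is only an illustration under assumptions: the helper names rol32, ror32 and sha_rot are hypothetical and do not appear in the patch, and whether a given compiler turns the shift pairs into a single rotate instruction is not guaranteed. The count must stay in the range 1..31, since shifting a 32-bit value by 32 is undefined.

```c
#include <stdint.h>

/* Rotate left by n bits; valid only for 0 < n < 32. */
static inline uint32_t rol32(uint32_t x, unsigned n)
{
	return (x << n) | (x >> (32 - n));
}

/* Rotate right by n bits; valid only for 0 < n < 32. */
static inline uint32_t ror32(uint32_t x, unsigned n)
{
	return (x >> n) | (x << (32 - n));
}

/*
 * Hypothetical helper mirroring the macro's choice: a left rotate by n
 * equals a right rotate by 32-n, so use whichever count is smaller.
 */
static inline uint32_t sha_rot(uint32_t x, unsigned n)
{
	return n < 16 ? rol32(x, n) : ror32(x, 32 - n);
}
```

With this, sha_rot(B, 30) takes the ror32(B, 2) branch, which is the same value as rol32(B, 30).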

>  static void blk_SHA1Block(blk_SHA_CTX *ctx, const unsigned int *data)
>  {
> @@ -93,7 +105,7 @@ static void blk_SHA1Block(blk_SHA_CTX *ctx, const unsigned int *data)
>  
>  	/* Unroll it? */
>  	for (t = 16; t <= 79; t++)
> -		W[t] = SHA_ROT(W[t-3] ^ W[t-8] ^ W[t-14] ^ W[t-16], 1);
> +		W[t] = SHA_ROL(W[t-3] ^ W[t-8] ^ W[t-14] ^ W[t-16], 1);

Unrolling this once (but not more) is a win, at least on P4.
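
As a sketch of what "unrolling once" could look like, here is the schedule expansion with two iterations per pass, next to the plain loop for comparison. The function names and the rol1 helper are hypothetical, and the sketch assumes W[0..15] already hold the input block. It only shows that the two forms compute the same values; it says nothing about timing.

```c
#include <stdint.h>

/* Rotate left by one bit. */
static inline uint32_t rol1(uint32_t x)
{
	return (x << 1) | (x >> 31);
}

/* Plain loop, as in the quoted code. */
static void expand_rolled(uint32_t W[80])
{
	for (int t = 16; t <= 79; t++)
		W[t] = rol1(W[t-3] ^ W[t-8] ^ W[t-14] ^ W[t-16]);
}

/*
 * Unrolled once: two words per pass. W[t+1] reads W[t-2], not W[t],
 * so the two statements do not depend on each other. The 64 words
 * (16..79) divide evenly by two, so no tail iteration is needed.
 */
static void expand_unrolled2(uint32_t W[80])
{
	for (int t = 16; t <= 79; t += 2) {
		W[t]   = rol1(W[t-3] ^ W[t-8] ^ W[t-14] ^ W[t-16]);
		W[t+1] = rol1(W[t-2] ^ W[t-7] ^ W[t-13] ^ W[t-15]);
	}
}
```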

>  #define T_0_19(t) \
> -	TEMP = SHA_ROT(A,5) + (((C^D)&B)^D)     + E + W[t] + 0x5a827999; \
> -	E = D; D = C; C = SHA_ROT(B, 30); B = A; A = TEMP;
> +	TEMP = SHA_ROL(A,5) + (((C^D)&B)^D)     + E + W[t] + 0x5a827999; \
> +	E = D; D = C; C = SHA_ROR(B, 2); B = A; A = TEMP;
>  
>  	T_0_19( 0); T_0_19( 1); T_0_19( 2); T_0_19( 3); T_0_19( 4);
>  	T_0_19( 5); T_0_19( 6); T_0_19( 7); T_0_19( 8); T_0_19( 9);

Unrolling these, OTOH, is a clear loss (IIRC ~10%).
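
For reference, a sketch of the two forms being compared for rounds 0..19: the same round macro driven by a loop, and written out twenty times. Everything here beyond the round arithmetic in the quoted T_0_19 is an assumption for illustration (the function names, the state array s[5], the do/while wrapper, the local TEMP). The sketch only checks that both forms agree; which one is faster has to be measured on the target CPU.

```c
#include <stdint.h>

#define ROL(x, n) (((x) << (n)) | ((x) >> (32 - (n))))
#define ROR(x, n) (((x) >> (n)) | ((x) << (32 - (n))))

/* One round of the first group, same arithmetic as the quoted T_0_19. */
#define T_0_19(t) do { \
	uint32_t TEMP = ROL(A, 5) + (((C ^ D) & B) ^ D) + E + W[t] + 0x5a827999; \
	E = D; D = C; C = ROR(B, 2); B = A; A = TEMP; \
} while (0)

/* Rounds 0..19 as a loop. */
static void rounds_0_19_loop(uint32_t s[5], const uint32_t W[80])
{
	uint32_t A = s[0], B = s[1], C = s[2], D = s[3], E = s[4];

	for (int t = 0; t < 20; t++)
		T_0_19(t);
	s[0] = A; s[1] = B; s[2] = C; s[3] = D; s[4] = E;
}

/* Rounds 0..19 fully unrolled. */
static void rounds_0_19_unrolled(uint32_t s[5], const uint32_t W[80])
{
	uint32_t A = s[0], B = s[1], C = s[2], D = s[3], E = s[4];

	T_0_19( 0); T_0_19( 1); T_0_19( 2); T_0_19( 3); T_0_19( 4);
	T_0_19( 5); T_0_19( 6); T_0_19( 7); T_0_19( 8); T_0_19( 9);
	T_0_19(10); T_0_19(11); T_0_19(12); T_0_19(13); T_0_19(14);
	T_0_19(15); T_0_19(16); T_0_19(17); T_0_19(18); T_0_19(19);
	s[0] = A; s[1] = B; s[2] = C; s[3] = D; s[4] = E;
}
```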

artur

