git.vger.kernel.org archive mirror
From: Artur Skawina <art.08.09@gmail.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: [PATCH 0/7] block-sha1: improved SHA1 hashing
Date: Thu, 06 Aug 2009 22:08:44 +0200
Message-ID: <4A7B384C.2020407@gmail.com>
In-Reply-To: <alpine.LFD.2.01.0908061233360.3390@localhost.localdomain>

Linus Torvalds wrote:
> 
> On Thu, 6 Aug 2009, Artur Skawina wrote:
>> Oh, i noticed that '-mtune' makes quite a difference, it can change
>> the relative performance of the functions significantly, in unobvious
>> ways; depending on which cpu gcc tunes for (build config or -mtune);
>> some implementations slow down, others become a bit faster.
> 
> That probably is mainly true for P4, although it's quite possible that it 
> has an effect for just what the register allocator does, and then for 
> spilling.
> 
> And it looks like _all_ the tweakability is in the spilling. Nothing else 
> matters.
> 
> How does this patch work for you? It avoids doing that C-level register 
> rotation, and instead rotates the register names with the preprocessor.
> 
> I realize it's ugly as hell, but it does make it easier for gcc to see 
> what's going on.
> 
> The patch is against my git patches, but I think it should apply pretty 
> much as-is to your sha1bench sources too. Does it make any difference for 
> you?

It's a bit slower (P4):

before: linus          0.6288       97.06
after:  linus          0.6604       92.42
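Read as ratios, and assuming (the output columns are not labelled here) that the first number is run time and the second is throughput:

```latex
\frac{0.6604}{0.6288} \approx 1.050, \qquad
\frac{92.42}{97.06} \approx 0.952, \qquad
0.6288 \times 97.06 \approx 0.6604 \times 92.42 \approx 61.03
```

That is roughly a 5% slowdown; the near-constant product is consistent with both runs hashing the same amount of data.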

I was trying similar things too, like the example below, but it wasn't a
win on 32-bit...

artur

[The iteration below is functionally correct, but its scheduling is most likely
 fubared: it wasn't a win. Checking how much of a difference it made on P4 gave
 roughly -20% to 0%, never faster (relative to linusas2; it _is_ faster than
 'linus'). I dropped this version when merging your new preprocessor macros.]

@@ -125,6 +127,8 @@
 #define W(x) (array[(x)&15])
 #define SHA_XOR(t) \
        TEMP = SHA_ROL(W(t+13) ^ W(t+8) ^ W(t+2) ^ W(t), 1); W(t) = TEMP;
+#define SHA_XOR2(t) \
+       SHA_ROL(W(t+13) ^ W(t+8) ^ W(t+2) ^ W(t), 1)
 
 #define T_16_19(t) \
         { unsigned TEMP;\
@@ -139,10 +143,27 @@
 #endif
 
 #define T_20_39(t) \
-        { unsigned TEMP;\
-       SHA_XOR(t); \
-       TEMP += (B^C^D) + E + 0x6ed9eba1; \
-       E = D; D = C; C = SHA_ROR(B, 2); B = A; TEMP += SHA_ROL(A,5); A = TEMP; }
+        if (t%2==0) {\
+               unsigned TEMP;\
+               unsigned TEMP2;\
+               \
+               TEMP   = SHA_XOR2(t); \
+               TEMP2  = SHA_XOR2(t+1); \
+               W(t)   = TEMP;\
+               W(t+1) = TEMP2;\
+               TEMP   += E + 0x6ed9eba1; \
+               E      = C;\
+               TEMP   += (B^E^D); \
+               TEMP2  += D + 0x6ed9eba1; \
+               D      = SHA_ROR(B, 2);\
+               B      = SHA_ROL(A, 5);\
+               B      += TEMP;\
+               C      = SHA_ROR(A, 2);\
+               A      ^= E; \
+               A      ^= D; \
+               A      += TEMP2;\
+               A      += SHA_ROL(B, 5);\
+       }
 
 #if UNROLL
        T_20_39(20); T_20_39(21); T_20_39(22); T_20_39(23); T_20_39(24);


Thread overview: 38+ messages
2009-08-06 15:13 [PATCH 0/7] block-sha1: improved SHA1 hashing Linus Torvalds
2009-08-06 15:15 ` [PATCH 1/7] block-sha1: add new optimized C 'block-sha1' routines Linus Torvalds
2009-08-06 15:16   ` [PATCH 2/7] block-sha1: try to use rol/ror appropriately Linus Torvalds
2009-08-06 15:18     ` [PATCH 3/7] block-sha1: make the 'ntohl()' part of the first SHA1 loop Linus Torvalds
2009-08-06 15:20       ` [PATCH 4/7] block-sha1: re-use the temporary array as we calculate the SHA1 Linus Torvalds
2009-08-06 15:22         ` [PATCH 5/7] block-sha1: macroize the rounds a bit further Linus Torvalds
2009-08-06 15:24           ` [PATCH 6/7] block-sha1: Use '(B&C)+(D&(B^C))' instead of '(B&C)|(D&(B|C))' in round 3 Linus Torvalds
2009-08-06 15:25             ` [PATCH 7/7] block-sha1: get rid of redundant 'lenW' context Linus Torvalds
2009-08-06 18:25     ` [PATCH 2/7] block-sha1: try to use rol/ror appropriately Bert Wesarg
2009-08-06 17:22 ` [PATCH 0/7] block-sha1: improved SHA1 hashing Artur Skawina
2009-08-06 18:09   ` Linus Torvalds
2009-08-06 19:10     ` Artur Skawina
2009-08-06 19:41       ` Linus Torvalds
2009-08-06 20:08         ` Artur Skawina [this message]
2009-08-06 20:53           ` Linus Torvalds
2009-08-06 21:24             ` Linus Torvalds
2009-08-06 21:39             ` Artur Skawina
2009-08-06 21:52               ` Artur Skawina
2009-08-06 22:27                 ` Linus Torvalds
2009-08-06 22:33                   ` Linus Torvalds
2009-08-06 23:19                     ` Artur Skawina
2009-08-06 23:42                       ` Linus Torvalds
2009-08-06 22:55                   ` Artur Skawina
2009-08-06 23:04                     ` Linus Torvalds
2009-08-06 23:25                       ` Linus Torvalds
2009-08-07  0:13                         ` Linus Torvalds
2009-08-07  1:30                           ` Artur Skawina
2009-08-07  1:55                             ` Linus Torvalds
2009-08-07  0:53                         ` Artur Skawina
2009-08-07  2:23                   ` Linus Torvalds
2009-08-07  4:16                     ` Artur Skawina
     [not found]                     ` <alpine.LFD.2.01.0908071614310.3288@localhost.localdomain>
     [not found]                       ` <4A7CBD28.6070306@gmail.com>
     [not found]                         ` <4A7CBF47.9000903@gmail.com>
     [not found]                           ` <alpine.LFD.2.01.0908071700290.3288@localhost.localdomain>
     [not found]                             ` <4A7CC380.3070008@gmail.com>
2009-08-08  4:16                               ` Linus Torvalds
2009-08-08  5:34                                 ` Artur Skawina
2009-08-08 17:10                                   ` Linus Torvalds
2009-08-08 18:12                                     ` Artur Skawina
2009-08-08 22:58                                   ` Artur Skawina
2009-08-08 23:36                                     ` Artur Skawina
  -- strict thread matches above, loose matches on Subject: below --
2009-08-07  7:36 George Spelvin
