From: Artur Skawina <art.08.09@gmail.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: [PATCH 0/7] block-sha1: improved SHA1 hashing
Date: Thu, 06 Aug 2009 22:08:44 +0200
Message-ID: <4A7B384C.2020407@gmail.com>
In-Reply-To: <alpine.LFD.2.01.0908061233360.3390@localhost.localdomain>
Linus Torvalds wrote:
>
> On Thu, 6 Aug 2009, Artur Skawina wrote:
>> Oh, i noticed that '-mtune' makes quite a difference, it can change
>> the relative performance of the functions significantly, in unobvious
>> ways; depending on which cpu gcc tunes for (build config or -mtune);
>> some implementations slow down, others become a bit faster.
>
> That probably is mainly true for P4, although it's quite possible that it
> has an effect for just what the register allocator does, and then for
> spilling.
>
> And it looks like _all_ the tweakability is in the spilling. Nothing else
> matters.
>
> How does this patch work for you? It avoids doing that C-level register
> rotation, and instead rotates the register names with the preprocessor.
>
> I realize it's ugly as hell, but it does make it easier for gcc to see
> what's going on.
>
> The patch is against my git patches, but I think it should apply pretty
> much as-is to your sha1bench sources too. Does it make any difference for
> you?
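For reference, the difference between the two rotation styles described in the quoted text can be sketched roughly as follows. This is only an illustration, not the patch under discussion: the names `round_shuffle`, `ROUND`, `five_rounds_renamed` and `rotation_styles_agree`, and the sample inputs, are all made up for this sketch, and only the rounds 20..39 mixing function (B^C^D) is shown.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ROL(x, n) (((x) << (n)) | ((x) >> (32 - (n))))

/* C-level rotation: each round computes a temporary, then moves all five values. */
static void round_shuffle(uint32_t s[5], uint32_t w, uint32_t k)
{
	uint32_t temp = w + (s[1] ^ s[2] ^ s[3]) + s[4] + k + ROL(s[0], 5);

	s[4] = s[3];
	s[3] = s[2];
	s[2] = ROL(s[1], 30);	/* same as rotating right by 2 */
	s[1] = s[0];
	s[0] = temp;
}

/* Name rotation: a round only updates E and B; the caller permutes the names. */
#define ROUND(w, k, A, B, C, D, E) do { \
	E += (w) + ((B) ^ (C) ^ (D)) + (k) + ROL(A, 5); \
	B = ROL(B, 30); \
} while (0)

static void five_rounds_renamed(uint32_t s[5], const uint32_t w[5], uint32_t k)
{
	uint32_t A = s[0], B = s[1], C = s[2], D = s[3], E = s[4];

	ROUND(w[0], k, A, B, C, D, E);
	ROUND(w[1], k, E, A, B, C, D);
	ROUND(w[2], k, D, E, A, B, C);
	ROUND(w[3], k, C, D, E, A, B);
	ROUND(w[4], k, B, C, D, E, A);
	/* after five rounds the names are back in their original roles */
	s[0] = A; s[1] = B; s[2] = C; s[3] = D; s[4] = E;
}

/* Returns 1 if both styles produce the same state after five rounds. */
int rotation_styles_agree(void)
{
	/* hypothetical sample data: SHA-1 initial state, arbitrary words */
	uint32_t x[5] = { 0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476, 0xc3d2e1f0 };
	uint32_t y[5];
	const uint32_t w[5] = { 1, 0x80000000, 0xdeadbeef, 42, 0xffffffff };
	int i;

	memcpy(y, x, sizeof(x));
	for (i = 0; i < 5; i++)
		round_shuffle(x, w[i], 0x6ed9eba1);
	five_rounds_renamed(y, w, 0x6ed9eba1);
	return memcmp(x, y, sizeof(x)) == 0;
}

/* Aborts if the two styles ever disagree on the sample data. */
void check_rotation_styles(void)
{
	assert(rotation_styles_agree());
}
```

In the renamed form nothing is copied between variables; whether that actually helps gcc's register allocation and spilling is exactly what the timings in this thread are probing.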
it's a bit slower (P4):
before: linus 0.6288 97.06
after:  linus 0.6604 92.42
i was trying similar things, like the example below, too, but it wasn't a
win on 32 bit...
artur
[the iteration below is functionally correct, but the scheduling is most likely
fubared, as it wasn't a win and i was checking how much of a difference it made
on P4 -- ~-20..~0%, but never faster (relative to linusas2; it _is_ faster
than 'linus'). Dropped this version when merging your new preprocessor macros.]
@@ -125,6 +127,8 @@
#define W(x) (array[(x)&15])
#define SHA_XOR(t) \
TEMP = SHA_ROL(W(t+13) ^ W(t+8) ^ W(t+2) ^ W(t), 1); W(t) = TEMP;
+#define SHA_XOR2(t) \
+ SHA_ROL(W(t+13) ^ W(t+8) ^ W(t+2) ^ W(t), 1)
#define T_16_19(t) \
{ unsigned TEMP;\
@@ -139,10 +143,27 @@
#endif
#define T_20_39(t) \
- { unsigned TEMP;\
- SHA_XOR(t); \
- TEMP += (B^C^D) + E + 0x6ed9eba1; \
- E = D; D = C; C = SHA_ROR(B, 2); B = A; TEMP += SHA_ROL(A,5); A = TEMP; }
+ if (t%2==0) {\
+ unsigned TEMP;\
+ unsigned TEMP2;\
+ \
+ TEMP = SHA_XOR2(t); \
+ TEMP2 = SHA_XOR2(t+1); \
+ W(t) = TEMP;\
+ W(t+1) = TEMP2;\
+ TEMP += E + 0x6ed9eba1; \
+ E = C;\
+ TEMP += (B^E^D); \
+ TEMP2 += D + 0x6ed9eba1; \
+ D = SHA_ROR(B, 2);\
+ B = SHA_ROL(A, 5);\
+ B += TEMP;\
+ C = SHA_ROR(A, 2);\
+ A ^= E; \
+ A ^= D; \
+ A += TEMP2;\
+ A += SHA_ROL(B, 5);\
+ }
#if UNROLL
T_20_39(20); T_20_39(21); T_20_39(22); T_20_39(23); T_20_39(24);
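The "functionally correct" claim for the paired iteration can be checked with a small harness along these lines. This is a sketch only: the round bodies are transcribed from the diff above, while `struct sha_state`, `single_round`, `paired_rounds`, `paired_matches_single` and the test data are assumptions made for the illustration. It says nothing about speed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SHA_ROL(x, n) (((x) << (n)) | ((x) >> (32 - (n))))
#define SHA_ROR(x, n) (((x) >> (n)) | ((x) << (32 - (n))))
#define W(x) (array[(x) & 15])
#define SHA_XOR2(t) SHA_ROL(W(t+13) ^ W(t+8) ^ W(t+2) ^ W(t), 1)

/* hypothetical container for the five working variables and the W ring */
struct sha_state {
	uint32_t A, B, C, D, E;
	uint32_t array[16];
};

/* One round per call, following the lines the diff removes. */
static void single_round(struct sha_state *s, int t)
{
	uint32_t *array = s->array;
	uint32_t A = s->A, B = s->B, C = s->C, D = s->D, E = s->E;
	uint32_t TEMP = SHA_XOR2(t);

	W(t) = TEMP;
	TEMP += (B ^ C ^ D) + E + 0x6ed9eba1;
	E = D; D = C; C = SHA_ROR(B, 2); B = A; TEMP += SHA_ROL(A, 5); A = TEMP;
	s->A = A; s->B = B; s->C = C; s->D = D; s->E = E;
}

/* Two rounds per call, following the lines the diff adds (t must be even). */
static void paired_rounds(struct sha_state *s, int t)
{
	uint32_t *array = s->array;
	uint32_t A = s->A, B = s->B, C = s->C, D = s->D, E = s->E;
	uint32_t TEMP = SHA_XOR2(t);
	uint32_t TEMP2 = SHA_XOR2(t+1);	/* does not read the new W(t) */

	W(t) = TEMP;
	W(t+1) = TEMP2;
	TEMP += E + 0x6ed9eba1;
	E = C;
	TEMP += (B ^ E ^ D);
	TEMP2 += D + 0x6ed9eba1;
	D = SHA_ROR(B, 2);
	B = SHA_ROL(A, 5);
	B += TEMP;
	C = SHA_ROR(A, 2);
	A ^= E;
	A ^= D;
	A += TEMP2;
	A += SHA_ROL(B, 5);
	s->A = A; s->B = B; s->C = C; s->D = D; s->E = E;
}

/* Returns 1 if rounds 20..39 give identical state and W ring both ways. */
int paired_matches_single(void)
{
	struct sha_state x = { 0x67452301, 0xefcdab89, 0x98badcfe,
			       0x10325476, 0xc3d2e1f0, { 0 } };
	struct sha_state y;
	int t;

	for (t = 0; t < 16; t++)	/* arbitrary, made-up message words */
		x.array[t] = 0x9e3779b9u * (uint32_t)(t + 1);
	memcpy(&y, &x, sizeof(x));

	for (t = 20; t < 40; t++)
		single_round(&x, t);
	for (t = 20; t < 40; t += 2)
		paired_rounds(&y, t);
	return memcmp(&x, &y, sizeof(x)) == 0;
}

/* Aborts if the paired form ever diverges from the single-round form. */
void check_paired_rounds(void)
{
	assert(paired_matches_single());
}
```

The harness calls the paired form only for even t, mirroring the `if (t%2==0)` guard in the macro, under which the odd-numbered invocations in the unrolled list do nothing.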