git.vger.kernel.org archive mirror
From: Artur Skawina <art.08.09@gmail.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: [PATCH 0/7] block-sha1: improved SHA1 hashing
Date: Thu, 06 Aug 2009 21:10:00 +0200
Message-ID: <4A7B2A88.2040602@gmail.com>
In-Reply-To: <alpine.LFD.2.01.0908061052320.3390@localhost.localdomain>

Linus Torvalds wrote:
> 
> On Thu, 6 Aug 2009, Artur Skawina wrote:
>> For those curious just how close the C version is to the various
>> asm and C implementations, the q&d microbenchmark is at 
>> http://www.src.multimo.pl/YDpqIo7Li27O0L0h/sha1bench.tar.gz
> 
> Hmm. That thing doesn't work at all on x86-64. Even apart from the asm 
> sources, your timing thing does some really odd things (why do you do that 
> odd "iret" in GETCYCLES and GETTIME?). You're better off using 
> lfence/mfence/cpuid, and I think you could make it work on 64-bit that 
> way too.

Yes, it's 32-bit only; i should have mentioned that. The timing
code was written more than a decade ago and it really works on p2;
i haven't updated it, it has just been c&p'ed ever since. All of it
can be safely disabled: on p2 you could account for every cycle,
but nowadays gettimeofday is more than enough.
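
A minimal sketch of gettimeofday-only timing, as an illustration of the point above and not the code in sha1bench.tar.gz. The names hash_fn, now_seconds, mb_per_sec and time_hash are hypothetical, and treating 1 MB as 2^20 bytes is an assumption about how the SPEED column might be computed.

```c
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical signature: any routine that consumes len bytes of input. */
typedef void (*hash_fn)(const unsigned char *buf, size_t len);

/* Wall-clock time in seconds, taken from gettimeofday(). */
static double now_seconds(void)
{
	struct timeval tv;

	gettimeofday(&tv, NULL);
	return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* Throughput in MB/s; assumes 1 MB = 2^20 bytes. Returns 0 for a zero interval. */
static double mb_per_sec(size_t bytes, double seconds)
{
	if (seconds <= 0.0)
		return 0.0;
	return (double)bytes / (1024.0 * 1024.0) / seconds;
}

/* Call fn on the same buffer `rounds` times and return the elapsed seconds. */
static double time_hash(hash_fn fn, const unsigned char *buf, size_t len,
			int rounds)
{
	double start = now_seconds();
	int i;

	for (i = 0; i < rounds; i++)
		fn(buf, len);
	return now_seconds() - start;
}
```

With a run that lasts a second or more, the microsecond resolution of gettimeofday is far below the few-percent measurement noise, so no cycle counter or serializing instruction is needed; mb_per_sec(len * rounds, time_hash(fn, buf, len, rounds)) would give one row's speed.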

> I just hacked it away for testing.
> 
>> In short: 88% of openssl speed on P3, 42% on P4, 66% on Atom.
> 
> I'll use this to see if I can improve the 32-bit case.
> 
> On Nehalem, with your benchmark, I get:
> 
> 	#             TIME[s] SPEED[MB/s]
> 	rfc3174         5.122       119.2
> 	# New hash result: d829b9e028e64840094ab6702f9acdf11bec3937
> 	rfc3174         5.153       118.5
> 	linus           2.092       291.8
> 	linusas         2.056       296.8
> 	linusas2        1.909       319.8
> 	mozilla         5.139       118.8
> 	mozillaas       5.775       105.7
> 	openssl         1.627       375.1
> 	spelvin         1.678       363.7
> 	spelvina        1.603       380.8
> 	nettle          1.592       383.4
> 
> And with the hacked version to get some 64-bit numbers:
> 
> 	#             TIME[s] SPEED[MB/s]
> 	rfc3174         3.992       152.9
> 	# New hash result: b78fd74c0033a4dfe0ededccb85ab00cb56880ab
> 	rfc3174         3.991       152.9
> 	linus            1.54       396.3
> 	linusas         1.533       398.1
> 	linusas2        1.603       380.9
> 	mozilla         4.352       140.3
> 	mozillaas       4.227       144.4
> 
> so as you can see, your improvements in 32-bit mode are actually 
> de-provements in 64-bit mode (ok, your first one seems to be a tiny 
> improvement, but I think it's in the noise).

Actually i didn't keep anything that wasn't a win. One reason
why linusas2 stayed was that it really surprised me: i'd have
expected gcc to do a lot worse w/ the many temporaries, and
the compiler came up w/ a 70% gain; gcc really must have improved
when i wasn't looking.

> But you're right, I need to try to improve the 32-bit case.

I never said anything like that. :) there probably isn't all that
much that can be done. I tried a few things, but never saw any 
improvement above measurement noise (a few percent). Would have
thought that overlapping the iterations a bit would be a gain, but
that didn't do much (-20%..0), maybe on 64 bit, with more registers...

Oh, i noticed that '-mtune' makes quite a difference; it can change
the relative performance of the functions significantly, in unobvious
ways. Depending on which cpu gcc tunes for (build config or -mtune),
some implementations slow down, others become a bit faster.

artur
