From: Artur Skawina <art.08.09@gmail.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: [PATCH 0/7] block-sha1: improved SHA1 hashing
Date: Thu, 06 Aug 2009 21:10:00 +0200
Message-ID: <4A7B2A88.2040602@gmail.com>
In-Reply-To: <alpine.LFD.2.01.0908061052320.3390@localhost.localdomain>

Linus Torvalds wrote:
> 
> On Thu, 6 Aug 2009, Artur Skawina wrote:
>> For those curious just how close the C version is to the various
>> asm and C implementations, the q&d microbenchmark is at 
>> http://www.src.multimo.pl/YDpqIo7Li27O0L0h/sha1bench.tar.gz
> 
> Hmm. That thing doesn't work at all on x86-64. Even apart from the asm 
> sources, your timing thing does some really odd things (why do you do that 
> odd "iret" in GETCYCLES and GETTIME?). You're better off using 
> lfence/mfence/cpuid, and I think you could make it work on 64-bit that 
> way too.
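
A minimal sketch of the kind of fenced cycle counter suggested above. It is not the benchmark's actual GETCYCLES macro. The name `cycles_now` and the `clock_gettime` fallback for non-x86 builds are assumptions made for illustration. The same inline asm builds in both 32-bit and 64-bit mode, and the sketch assumes `lfence` orders `rdtsc` against the surrounding instructions, which is the documented behaviour on Intel CPUs but can depend on configuration elsewhere.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: read a timestamp with fences around rdtsc. */
static inline uint64_t cycles_now(void)
{
#if defined(__x86_64__) || defined(__i386__)
	uint32_t lo, hi;

	/*
	 * The first lfence waits for earlier instructions to finish before
	 * rdtsc runs; the second keeps later ones from starting early.
	 * rdtsc returns the counter in edx:eax in both 32- and 64-bit mode.
	 */
	__asm__ __volatile__("lfence\n\trdtsc\n\tlfence"
			     : "=a"(lo), "=d"(hi)
			     :
			     : "memory");
	return ((uint64_t)hi << 32) | lo;
#else
	/* Assumed fallback for other architectures: monotonic nanoseconds. */
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
#endif
}
```

Taking the difference of two `cycles_now()` readings around the code under test gives an elapsed count in TSC ticks (or nanoseconds in the fallback path).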

Yes, it's 32-bit only; I should have mentioned that. The timing
code was written more than a decade ago and it really works on P2.
I haven't updated it; it's all just been c&p'ed ever since. All of
it can be safely disabled: on P2 you could account for every cycle,
but nowadays gettimeofday is more than enough.
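
As a rough sketch of what a gettimeofday-only measurement could look like (this is not code from sha1bench): the helper names, the `hash_fn` type and the `xor_fold` placeholder are all hypothetical, and treating 1 MB as 2^20 bytes is an assumption about how the SPEED[MB/s] column is computed.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>

/* Wall-clock time in seconds, taken from gettimeofday(). */
static double now_seconds(void)
{
	struct timeval tv;

	gettimeofday(&tv, NULL);
	return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* Throughput in MB/s; 1 MB = 2^20 bytes is an assumption here. */
static double speed_mb_per_s(uint64_t bytes, double seconds)
{
	if (seconds <= 0.0)
		return 0.0;
	return (double)bytes / (1024.0 * 1024.0) / seconds;
}

/* Hypothetical signature for the routine being timed. */
typedef void (*hash_fn)(const unsigned char *buf, size_t len);

/* Placeholder workload, NOT SHA-1: XORs the buffer into a sink. */
static volatile unsigned char sink;
static void xor_fold(const unsigned char *buf, size_t len)
{
	unsigned char acc = 0;

	for (size_t i = 0; i < len; i++)
		acc ^= buf[i];
	sink = acc;
}

/* Run fn over buf 'rounds' times and return the elapsed seconds. */
static double time_hash(hash_fn fn, const unsigned char *buf,
			size_t len, unsigned rounds)
{
	double start = now_seconds();

	for (unsigned i = 0; i < rounds; i++)
		fn(buf, len);
	return now_seconds() - start;
}
```

Calling `time_hash(xor_fold, buf, len, rounds)` and passing `len * rounds` with the result to `speed_mb_per_s` would give one row of a TIME[s] / SPEED[MB/s] table. With runs lasting seconds, the microsecond resolution of gettimeofday is far below the measurement noise.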

> I just hacked it away for testing.
> 
>> In short: 88% of openssl speed on P3, 42% on P4, 66% on Atom.
> 
> I'll use this to see if I can improve the 32-bit case.
> 
> On Nehalem, with your benchmark, I get:
> 
> 	#             TIME[s] SPEED[MB/s]
> 	rfc3174         5.122       119.2
> 	# New hash result: d829b9e028e64840094ab6702f9acdf11bec3937
> 	rfc3174         5.153       118.5
> 	linus           2.092       291.8
> 	linusas         2.056       296.8
> 	linusas2        1.909       319.8
> 	mozilla         5.139       118.8
> 	mozillaas       5.775       105.7
> 	openssl         1.627       375.1
> 	spelvin         1.678       363.7
> 	spelvina        1.603       380.8
> 	nettle          1.592       383.4
> 
> And with the hacked version to get some 64-bit numbers:
> 
> 	#             TIME[s] SPEED[MB/s]
> 	rfc3174         3.992       152.9
> 	# New hash result: b78fd74c0033a4dfe0ededccb85ab00cb56880ab
> 	rfc3174         3.991       152.9
> 	linus            1.54       396.3
> 	linusas         1.533       398.1
> 	linusas2        1.603       380.9
> 	mozilla         4.352       140.3
> 	mozillaas       4.227       144.4
> 
> so as you can see, your improvements in 32-bit mode are actually 
> de-provements in 64-bit mode (ok, your first one seems to be a tiny 
> improvement, but I think it's in the noise).

Actually, I didn't keep anything that wasn't a win. One reason
linusas2 stayed was that it really surprised me: I'd have
expected gcc to do a lot worse with the many temporaries, and
instead the compiler came up with a 70% gain. gcc really must
have improved when I wasn't looking.

> But you're right, I need to try to improve the 32-bit case.

I never said anything like that. :) There probably isn't all that
much that can be done. I tried a few things, but never saw any
improvement above measurement noise (a few percent). I would have
thought that overlapping the iterations a bit would be a gain, but
that didn't do much (-20%..0); maybe on 64-bit, with more registers...

Oh, I noticed that '-mtune' makes quite a difference. It can change
the relative performance of the functions significantly, in unobvious
ways: depending on which cpu gcc tunes for (build config or -mtune),
some implementations slow down, others become a bit faster.

artur
