From: Linus Torvalds <torvalds@linux-foundation.org>
To: Artur Skawina <art.08.09@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: [PATCH 0/7] block-sha1: improved SHA1 hashing
Date: Thu, 6 Aug 2009 19:23:21 -0700 (PDT)
Message-ID: <alpine.LFD.2.01.0908061909310.3390@localhost.localdomain>
In-Reply-To: <alpine.LFD.2.01.0908061502570.3390@localhost.localdomain>
On Thu, 6 Aug 2009, Linus Torvalds wrote:
>
>
> On Thu, 6 Aug 2009, Artur Skawina wrote:
> >
> > Does this make any difference for you? For me it's the best one so far
> > (the linusas2 number clearly shows that for me the register renaming does
> > nothing; other than that the functions should be very similar)
>
> Nope. If anything, it's a bit slower, but it might be in the noise. I
> generally got 330MB/s with my "cpp renaming" on Nehalem (32-bit - the
> 64-bit numbers are ~400MB/s), but with this I got 325MB/s twice in a row,
> which matches the linusas2 numbers pretty exactly.
I actually found a P4 I have access to, except that one is a Prescott.
And I can't run it in 32-bit mode, because I only have a regular user
login, and it only has the 64-bit development environment.
But I can do the hacked-for-64bit sha1bench runs, and I tested your patch.
It's horrible.
Here's the plain "linus" baseline (ie the "Do register rotation in cpp"
thing, with the fixed "E += TEMP .." thing):
#            TIME[s]  SPEED[MB/s]
rfc3174      1.648     37.03
rfc3174      1.677     36.4
linus        0.4018   151.9
linusas      0.4439   137.5
linusas2     0.4381   139.3
mozilla      0.9587    63.66
mozillaas    0.9434    64.7
and here it is with your patch:
#            TIME[s]  SPEED[MB/s]
rfc3174      1.667     36.61
rfc3174      1.644     37.12
linus        0.4653   131.2
linusas      0.4412   138.3
linusas2     0.4388   139.1
mozilla      0.9466    64.48
mozillaas    0.9449    64.59
(ok, so the numbers aren't horribly stable, but the "plain linus" thing
consistently outperforms here - and underperforms with your patch).
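
For readers unfamiliar with the "register rotation in cpp" idea behind the "linus" variant, here is a minimal sketch of the general technique. It is not the code from the patch series: the names ROUND, F1, K1, rol32, five_rounds_moves, five_rounds_renamed and rotation_matches are hypothetical, and only five rounds of the first SHA-1 round type are shown. Instead of shuffling the five working values after every round, the macro updates two of them in place and the caller permutes the argument names on the next invocation, so the "rotation" happens in the preprocessor rather than in generated moves.

```c
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (0 < n < 32). */
static uint32_t rol32(uint32_t x, unsigned n)
{
	return (x << n) | (x >> (32 - n));
}

/* Round function and constant used by SHA-1 rounds 0..19. */
#define F1(B, C, D) ((((C) ^ (D)) & (B)) ^ (D))
#define K1 0x5a827999u

/*
 * Hypothetical ROUND macro (an illustration, not the patch's macro):
 * it updates E and B in place and moves nothing else. The caller
 * rotates the roles by permuting the argument names each round.
 */
#define ROUND(w, A, B, C, D, E) do { \
	uint32_t TEMP = (w); \
	E += TEMP + rol32(A, 5) + F1(B, C, D) + K1; \
	B = rol32(B, 30); \
} while (0)

/* Textbook form: every round explicitly moves all five values. */
static void five_rounds_moves(uint32_t s[5], const uint32_t w[5])
{
	uint32_t A = s[0], B = s[1], C = s[2], D = s[3], E = s[4];
	for (int i = 0; i < 5; i++) {
		uint32_t t = E + w[i] + rol32(A, 5) + F1(B, C, D) + K1;
		E = D; D = C; C = rol32(B, 30); B = A; A = t;
	}
	s[0] = A; s[1] = B; s[2] = C; s[3] = D; s[4] = E;
}

/* Same five rounds, with the rotation expressed as argument renaming. */
static void five_rounds_renamed(uint32_t s[5], const uint32_t w[5])
{
	uint32_t A = s[0], B = s[1], C = s[2], D = s[3], E = s[4];
	ROUND(w[0], A, B, C, D, E);
	ROUND(w[1], E, A, B, C, D);
	ROUND(w[2], D, E, A, B, C);
	ROUND(w[3], C, D, E, A, B);
	ROUND(w[4], B, C, D, E, A);
	/* After five renamings every name is back in its original role. */
	s[0] = A; s[1] = B; s[2] = C; s[3] = D; s[4] = E;
}

/* Returns 1 if both forms produce the same state, 0 otherwise. */
int rotation_matches(void)
{
	/* Arbitrary input words, chosen only for this illustration. */
	const uint32_t w[5] = { 0x61626380u, 0x00000001u, 0xdeadbeefu, 0u, 0x18u };
	uint32_t s1[5] = { 0x67452301u, 0xefcdab89u, 0x98badcfeu,
			   0x10325476u, 0xc3d2e1f0u };
	uint32_t s2[5];
	for (int i = 0; i < 5; i++)
		s2[i] = s1[i];
	five_rounds_moves(s1, w);
	five_rounds_renamed(s2, w);
	for (int i = 0; i < 5; i++)
		if (s1[i] != s2[i])
			return 0;
	return 1;
}
```

Calling rotation_matches() from a test harness checks that the two forms agree. Whether the renamed form is actually faster depends on the compiler and CPU, which is exactly what the numbers in this thread are about.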
However, note that since this is the 64-bit thing, there likely aren't any
spill issues, but it's simply an issue of "just how did the array[]
accesses get scheduled" etc. And since this is a Prescott (or rather
"Xeon") P4, the shifter isn't quite as horrible as yours is. _And_ this is
a different gcc version (4.0.3).
So the numbers aren't really all that comparable. It's more an example of
"optimizing for P4 is futile, because you're just playing with total
randomness". That's like a 20MB/s difference, just from moving a few ALU
ops around a bit.
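
To put that difference in relative terms, here is a small sketch of the arithmetic using only the "linus" rows from the two tables above. The helper names percent_change, linus_speed_change and linus_time_change are made up for this illustration.

```c
/* Relative change from `before` to `after`, in percent. */
double percent_change(double before, double after)
{
	return (after - before) / before * 100.0;
}

/* "linus" throughput: 151.9 MB/s baseline, 131.2 MB/s with the patch. */
double linus_speed_change(void)
{
	return percent_change(151.9, 131.2);
}

/* "linus" run time: 0.4018 s baseline, 0.4653 s with the patch. */
double linus_time_change(void)
{
	return percent_change(0.4018, 0.4653);
}
```

The throughput drop works out to 20.7 MB/s, or roughly 13.6% of the baseline, and the run time grows by roughly 15.8%. Given how unstable the numbers are, these should be read as rough magnitudes only.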
And it's entirely possible that if I had gcc-4.4 on that machine, your
patch would magically do the right thing ;)
Sadly, that machine is just an ssh gateway, so there are no real development
tools on it at all - no way to get good profiles etc. So I can't really
say exactly what the problem pattern is :(
Linus