From: Artur Skawina <art.08.09@gmail.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: [PATCH 0/7] block-sha1: improved SHA1 hashing
Date: Fri, 07 Aug 2009 00:55:08 +0200
Message-ID: <4A7B5F4C.30102@gmail.com>
In-Reply-To: <alpine.LFD.2.01.0908061502570.3390@localhost.localdomain>
Linus Torvalds wrote:
>
> On Thu, 6 Aug 2009, Artur Skawina wrote:
>> Does this make any difference for you? For me it's the best one so far
>> (the linusas2 number clearly shows that for me the register renaming does
>> nothing; other than that the functions should be very similar)
>
> Nope. If anything, it's a bit slower, but it might be in the noise. I
> generally got 330MB/s with my "cpp renaming" on Nehalem (32-bit - the
> 64-bit numbers are ~400MB/s), but with this I got 325MB/s twice in a row,
> which matches the linusas2 numbers pretty exactly.
>
> But it seems to make a big difference for you.
It seems to do well on P2 and P4 here; if it works on Core 2 as well, this
could be a good generic candidate. It only does 62% on an Atom, but the best
C version so far beats it by only ~2%.
> Btw, _what_ P4 do you have (Northwood or Prescott)?
Northwood.
> The Intel optimization manuals very much talk about avoiding rotates. And
> they mention "with a CPUID signature corresponding to family 15 and model
> encoding of 0, 1, or 2" specifically as being longer latency. That's
> basically pre-prescott P4, I think.
cpu family : 15
model : 2
model name : Intel(R) Pentium(R) 4 CPU 2.80GHz
stepping : 5
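The "family 15, model 0, 1 or 2" test from the optimization manual can be applied to the raw CPUID leaf 1 EAX signature. The sketch below is only an illustration: `struct cpu_sig`, `decode_cpuid_sig()` and `is_early_p4()` are hypothetical names, and it assumes the usual Intel decoding rule (extended family added only when the base family is 15, extended model added only for families 6 and 15). An EAX value of 0x00000F25 decodes to exactly the family/model/stepping shown above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the decoded CPUID leaf 1 EAX signature. */
struct cpu_sig {
    unsigned family;
    unsigned model;
    unsigned stepping;
};

/* Decode family/model/stepping from CPUID leaf 1 EAX. */
struct cpu_sig decode_cpuid_sig(uint32_t eax)
{
    struct cpu_sig s;
    unsigned base_family = (eax >> 8) & 0xF;
    unsigned base_model = (eax >> 4) & 0xF;

    s.stepping = eax & 0xF;

    s.family = base_family;
    if (base_family == 0xF)
        s.family += (eax >> 20) & 0xFF;          /* extended family */

    s.model = base_model;
    if (base_family == 0x6 || base_family == 0xF)
        s.model += ((eax >> 16) & 0xF) << 4;     /* extended model */

    return s;
}

/* Family 15 with model 0, 1 or 2: the pre-Prescott P4s
 * (Willamette/Northwood) that the manual flags for slow rotates. */
bool is_early_p4(struct cpu_sig s)
{
    return s.family == 15 && s.model <= 2;
}
```

With GCC or Clang on x86, the EAX value itself can be read with `__get_cpuid(1, &eax, &ebx, &ecx, &edx)` from `<cpuid.h>`; a Prescott signature such as 0x00000F34 (model 3) would not match.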
> Anyway, on P4 I think you have two double-speed integer issue ports (ie
> max four ops per cycle), but only one of them takes a rotate, and only in
> the first half of the cycle (ie just one shift per cycle).
>
> And afaik, that is actually the _improved_ state in Prescott. The older
> P4's didn't have a full shifter unit at all, iirc: shifts were "complex
> instructions" in Northwood and weren't even single-clock.
>
> In Core 2, I think there's still just one shifter unit, but at least it's
> as fast as all the other units. So P4 really does stand out as sucking as
> far as shifts are concerned, and if you have an older P4, it will be even
> worse.
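To see why shifter throughput matters so much for SHA-1, it helps to count the rotates. Each of the 80 rounds does rol(A, 5) and rol(B, 30), and the message schedule does rol(W, 1) for W[16] through W[79], giving 80 * 2 + 64 = 224 rotates per 64-byte block. The sketch below assumes a core that accepts at most one rotate per cycle (per the description above, Northwood is worse than that), and the helper names `rol32`, `ror32`, `sha1_rotates_per_block` and `sha1_shifter_bound` are hypothetical, not the block-sha1 code.

```c
#include <stdint.h>

/* Rotate left by n bits, valid for 1 <= n <= 31. GCC and Clang
 * generally recognize this pattern and emit a single x86 rol. */
uint32_t rol32(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32 - n));
}

/* Rotate right by n bits, valid for 1 <= n <= 31.
 * Note that rol32(x, 30) and ror32(x, 2) are the same operation. */
uint32_t ror32(uint32_t x, unsigned n)
{
    return (x >> n) | (x << (32 - n));
}

/* Rotates in one SHA-1 compression of a 64-byte block:
 * two per round (rol 5 and rol 30) plus one per expanded
 * message word W[16..79]. */
unsigned sha1_rotates_per_block(void)
{
    const unsigned rounds = 80;
    const unsigned schedule_rotates = 80 - 16;
    return rounds * 2 + schedule_rotates;
}

/* Ceiling on throughput (bytes/second) if the shifter, issuing one
 * rotate per cycle, were the only limit. Everything else is ignored. */
double sha1_shifter_bound(double clock_hz)
{
    return clock_hz * 64.0 / (double)sha1_rotates_per_block();
}
```

For the 2.8 GHz part above this gives a ceiling of 800 MB/s from rotates alone (3.5 rotates per byte). That is a bound, not a measurement; it only shows that on a one-shift-per-cycle core the rotates by themselves eat a large share of the cycle budget, and a shifter slower than that lowers the ceiling further.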
Hmm, I might be able to try it on an old Willamette, but my Prescott's
motherboard died, so I can't verify that right now.
I'll upload an updated sha1bench; maybe somebody else feels like checking...
artur