From: Linus Torvalds <torvalds@linux-foundation.org>
To: Artur Skawina <art.08.09@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: [PATCH 0/7] block-sha1: improved SHA1 hashing
Date: Thu, 6 Aug 2009 16:42:01 -0700 (PDT)
Message-ID: <alpine.LFD.2.01.0908061625300.3390@localhost.localdomain>
In-Reply-To: <4A7B64F1.2000309@gmail.com>
On Fri, 7 Aug 2009, Artur Skawina wrote:
>
> Actually that's even more of a reason to make sure the code doesn't suck :)
> The difference on less perverse cpus will usually be small, but on P4 it
> can be huge.

No. First off, the things you have to do on P4 are just insane. See the
email I just sent out asking you to test whether two 1-bit rotates might
be faster than one 2-bit rotate.

So optimizing for P4 is often the wrong thing.

Secondly, P4s are going away. You may have one, but they are getting
rare. So optimizing for them is a losing proposition in the long run.
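To make the rotate question above concrete, here is a minimal sketch in C. It is not code from the thread; the helper names rol32, rol2_single and rol2_split are hypothetical. It only shows that the two forms compute the same value, which is what makes the substitution a pure scheduling question.

```c
#include <stdint.h>

/* Hypothetical helper: rotate a 32-bit word left by n bits, 0 < n < 32. */
static inline uint32_t rol32(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32 - n));
}

/* One 2-bit rotate. */
static inline uint32_t rol2_single(uint32_t x)
{
    return rol32(x, 2);
}

/* Two 1-bit rotates: same result, expressed as two separate steps. */
static inline uint32_t rol2_split(uint32_t x)
{
    return rol32(rol32(x, 1), 1);
}
```

An optimizing compiler is free to fold rol2_split back into a single rotate, so an actual timing comparison would probably need inline assembly or a look at the generated code.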
> A few years back I found my old ip checksum microbenchmark, and when I ran
> it on a P4 (prescott iirc) I didn't believe my eyes. The straightforward
> 32-bit C implementation was running circles around the in-kernel one...
> And a few tweaks to the assembler version got me another ~100% speedup.[1]

Yeah, not very surprising. The P4 is very good at the simplest possible
kind of code that does _nothing_ fancy.

But then it completely chokes on some code. I mean _totally_. It slows
down by a huge amount if there is anything but the most trivial kinds of
instructions. And by "trivial", I mean _really_ trivial. Shifts (as in
SHA1), but iirc also things like "adc" (add with carry) etc.

So it's not hard to write code that works well on other uarchs and then
totally blows up on P4. I think it doesn't rename the flags at all, so any
flag dependency (carry being the most common one) will stall things
horribly.
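As an illustration of avoiding a carry-flag dependency, here is a minimal sketch of a ones' complement sum written in plain C. It is neither the in-kernel checksum nor the benchmark code mentioned in the quote; the function name csum16 and its interface are hypothetical. The idea is to let the carries pile up in the high bits of a wide accumulator and fold them in once at the end, instead of chaining "adc" through the loop.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: ones' complement sum of n 16-bit words.
 * Returns the folded sum, not its complement.
 */
static uint16_t csum16(const uint16_t *words, size_t n)
{
    uint64_t sum = 0;   /* wide accumulator: carries collect above bit 15 */

    for (size_t i = 0; i < n; i++)
        sum += words[i];            /* plain add, no add-with-carry chain */

    /* Fold the accumulated carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)sum;
}
```

Whether the compiler actually emits carry-free code for this depends on the target and the optimization level, so the generated assembly is worth checking before drawing conclusions from a benchmark.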
There's also a very subtle store forwarding failure thing (and a lot of
other events) that causes a nasty micro-architectural replay trap, and
again you go from "running like a bat out of hell" to "slower than an i486
at a tenth the frequency".

Really. It's disgusting. Perfectly fine code can run really slowly on the
P4 just because it hits some random internal micro-architectural flaw. And
there's a _lot_ of those "glass jaw" issues.

The best way to avoid them is to use _only_ simple ALU instructions (add,
sub, and/or/not), and to be _very_ careful with loads and stores.

Linus
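One example of SHA-1 logic built purely from simple ALU operations is the round 3 "majority" function, which patch 6/7 of this series rewrites from '(B&C)|(D&(B|C))' to '(B&C)+(D&(B^C))'. The sketch below is not taken from the patch itself, and the function names maj_or and maj_add are hypothetical. It only shows why the two forms are interchangeable.

```c
#include <stdint.h>

/* Majority of three bits, textbook form. */
static inline uint32_t maj_or(uint32_t b, uint32_t c, uint32_t d)
{
    return (b & c) | (d & (b | c));
}

/*
 * Same function using '+'.  (b & c) has a bit set only where b and c
 * agree, (d & (b ^ c)) only where they differ, so the two terms never
 * share a set bit and the addition cannot produce a carry.
 */
static inline uint32_t maj_add(uint32_t b, uint32_t c, uint32_t d)
{
    return (b & c) + (d & (b ^ c));
}
```

Which form is faster on a given CPU is an empirical question; the sketch makes no claim about that.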