From: Linus Torvalds <torvalds@linux-foundation.org>
To: Artur Skawina <art.08.09@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: [PATCH 0/7] block-sha1: improved SHA1 hashing
Date: Thu, 6 Aug 2009 18:55:19 -0700 (PDT)
Message-ID: <alpine.LFD.2.01.0908061833130.3390@localhost.localdomain>
In-Reply-To: <4A7B83BC.1040606@gmail.com>
On Fri, 7 Aug 2009, Artur Skawina wrote:
>
> I also see 44 extra lea instructions, 44 less adds
add and lea (as long as the lea shift is 1) should be the same on a P4
(they are not the same on some other microarchitectures and lea can have
address generation stalls etc).
Lea, of course, gives the potential for register movement at the same time
(three-address op), and that's likely the reason for lea-vs-adds.
> and changes like:
> [...]
> mov XX(%eRX),%eRX
> xor XX(%eRX),%eRX
> - and %eRX,%eRX
> + and XX(%eRX),%eRX
Yeah, different spill patterns. That's the biggest issue, I think.
In particular, on P4, with unlucky spills, you may end up with things like
ror $2,reg
mov reg,x(%esp)
.. a few instructions ..
xor x(%esp), reg
and the above is exactly when one of the worst P4 problems hits: a store,
followed a few cycles later by a load from the same address (and "a few
cycles later" can be quite a few instructions if they are the nice, simple ones).
What can happen is that if the store data isn't ready yet (because it
comes from a long-latency op like a shift or a multiply), then you hit a
store buffer replay thing. The P4 (with its long pipeline) basically
starts the load speculatively, and if anything bad happens for the load
(L1 cache miss, TLB miss, store buffer fault, you name it), it will cause
a replay of the whole pipeline.
Which can take tens of cycles.
[ That said, it's been a long time since I did a lot of P4 worrying. So I
may mis-remember the details. But that whole store buffer forwarding had
some really nasty replay issues ]
> which could mean that gcc did a better job of register allocation
> (where "better job" might be just luck).
I suspect that's the biggest issue. Just _happening_ to get the spills so
that they don't hurt. And with unlucky scheduling, you might hit some of
the P4 replay issues every single time.
There are some P4 optimizations that are simple:
- avoid complex instructions
- don't blow the trace cache
- predictable branches
but the replay faults can really get you.
Linus