From: Linus Torvalds <torvalds@linux-foundation.org>
To: Artur Skawina <art.08.09@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: [PATCH 0/7] block-sha1: improved SHA1 hashing
Date: Thu, 6 Aug 2009 18:55:19 -0700 (PDT)
Message-ID: <alpine.LFD.2.01.0908061833130.3390@localhost.localdomain>
In-Reply-To: <4A7B83BC.1040606@gmail.com>



On Fri, 7 Aug 2009, Artur Skawina wrote:
> 
> I also see 44 extra lea instructions, 44 less adds

add and lea (as long as the lea shift is 1) should be the same on a P4 
(they are not the same on some other microarchitectures and lea can have 
address generation stalls etc).

Lea, of course, gives the potential for register movement at the same time 
(three-address op), and that's likely the reason for lea-vs-adds.
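
As a rough illustration of the three-address point, here is a minimal sketch. The function name `sum_and_mix` is hypothetical and not taken from block-sha1; the assembly in the comment only shows the two shapes a compiler could pick, and which one it actually emits depends on the compiler version and flags.

```c
#include <stdint.h>

/*
 * Hypothetical illustration (not from block-sha1): both inputs are
 * still needed after the sum, so the result wants a third register.
 *
 * Two-address form, needs a register copy first:
 *     mov  %eax,%ecx
 *     add  %ebx,%ecx
 *
 * Three-address form, copy and add in one instruction:
 *     lea  (%eax,%ebx),%ecx
 */
uint32_t sum_and_mix(uint32_t a, uint32_t b)
{
	uint32_t sum = a + b;	/* candidate for lea: a and b stay live */

	return sum ^ a ^ b;	/* a and b are used again here */
}
```

Compiling this with `gcc -O2 -S` and reading the output is the only way to see which form a given gcc version chooses.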

> and changes like:
>         [...]
>         mov    XX(%eRX),%eRX
>         xor    XX(%eRX),%eRX
> -       and    %eRX,%eRX
> +       and    XX(%eRX),%eRX

Yeah, different spill patterns. That's the biggest issue, I think.

In particular, on P4, with unlucky spills, you may end up with things like

	ror $2,reg
	mov reg,x(%esp)
	.. a few instructions ..
	xor x(%esp), reg

and the above is exactly when one of the worst P4 problems hits: a store, 
followed a few cycles later by a load from the same address (and "a few 
cycles later" can be quite a few instructions if they are the nice ones).

What can happen is that if the store data isn't ready yet (because it 
comes from a long-latency op like a shift or a multiply), then you hit a 
store buffer replay thing. The P4 (with its long pipeline) basically 
starts the load speculatively, and if anything bad happens for the load 
(L1 cache miss, TLB miss, store buffer fault, you name it), it will cause 
a replay of the whole pipeline.

Which can take tens of cycles. 

[ That said, it's been a long time since I did a lot of P4 worrying. So I 
  may mis-remember the details. But that whole store buffer forwarding had 
  some really nasty replay issues ]
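
The store-then-reload pattern above can be sketched in C. This is only an illustration under assumptions: the names `ror2`, `mix_spilled` and `mix_in_register` are made up, the constant just stands in for "a few instructions", and `volatile` is used purely to force the store and the load to happen, which is not how a real compiler spill comes about. It shows the shape of the problem and is not a benchmark.

```c
#include <stdint.h>

/* Rotate right by 2 (fixed count, so no shift-by-32 corner case). */
static inline uint32_t ror2(uint32_t x)
{
	return (x >> 2) | (x << 30);
}

/*
 * Hypothetical "unlucky spill": the rotated value is stored and then
 * reloaded from the same address shortly afterwards.
 */
uint32_t mix_spilled(uint32_t b, uint32_t c)
{
	volatile uint32_t slot;

	slot = ror2(b);		/* ror $2,reg ; mov reg,x(%esp) */
	c += 0x5a827999;	/* .. a few instructions .. */
	return c ^ slot;	/* xor x(%esp),reg */
}

/* Same arithmetic with the rotated value kept in a register. */
uint32_t mix_in_register(uint32_t b, uint32_t c)
{
	uint32_t t = ror2(b);

	c += 0x5a827999;
	return c ^ t;
}
```

Both functions return the same value; any difference would only show up in timing, and only on a microarchitecture that handles the store-to-load case badly.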

> which could mean that gcc did a better job of register allocation
> (where "better job" might be just luck).

I suspect that's the biggest issue. Just _happening_ to get the spills so 
that they don't hurt. And with unlucky scheduling, you might hit some of 
the P4 replay issues every single time.

There are some P4 optimizations that are simple:
 - avoid complex instructions
 - don't blow the trace cache
 - predictable branches
but the replay faults can really get you.

			Linus
