Re: block-sha1: improve code on large-register-set machines

From: Linus Torvalds <torvalds@linux-foundation.org>
To: Nicolas Pitre <nico@cam.org>
Cc: Git Mailing List <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: block-sha1: improve code on large-register-set machines
Date: Tue, 11 Aug 2009 16:14:19 -0700 (PDT)
Message-ID: <alpine.LFD.2.01.0908111602020.28882@localhost.localdomain>
In-Reply-To: <alpine.LFD.2.01.0908111550470.28882@localhost.localdomain>

On Tue, 11 Aug 2009, Linus Torvalds wrote:

> 
> 
> On Tue, 11 Aug 2009, Nicolas Pitre wrote:
> > 
> > Well... gcc is really strange in this case (and similar other ones) with 
> > ARM compilation.  A good indicator of the quality of the code is the 
> > size of the stack frame.  When using the "+m", gcc creates an
> > 816-byte stack frame, the generated binary grows by approx 3000 bytes,
> > and performance is almost halved (7.600s).  Looking at the assembly
> > result I just can't figure out all the crazy moves taking place.  Even
> > the version with no barrier whatsoever produces better assembly with a
> > stack frame of 560 bytes.
> 
> Ok, that's just crazy. That function has a required stack size of exactly 
> 64 bytes, and anything more than that is just spilling. And if you end up 
> with a stack frame of 560 bytes, that means that gcc is doing some _crazy_ 
> spilling.

Btw, what I think happens is:

 - gcc turns all those array accesses into pseudos (pseudo-registers)

   So now the 'array[16]' is seen as just another 16 variables rather than 
   an array.

 - gcc then turns it into SSA, where each assignment basically creates a
   new variable. So the 16 array variables (and 5 regular variables) are
   now expanded to 80 SSA assignments (one array assignment per SHA1 round)
   plus an additional 2 assignments to the "regular" variables per round
   (B and E are changed each round). So in SSA form, you actually end up
   having 240 pseudos (80 + 2*80) associated with the actual variables.
   Plus all the temporaries, of course.

 - the thing then goes crazy and tries to generate great code from that 
   internal SSA model. And since there are never more than ~25 things 
   _live_ at any particular point, it works fine with lots of registers, 
   but on small-register machines gcc just goes crazy and has to spill. 
   And it doesn't spill 'array[x]' entries - it spills the _pseudos_ it
   has created - hundreds of them.

 - End result: if the spill code doesn't share slots, it's going to create 
   a totally unholy mess of crap.
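
The shape of code this analysis assumes can be sketched with a minimal single-block SHA-1. This is not git's block-sha1 source: `sha1_oneblock`, `sched` and `ROUND` are hypothetical names, and the helper only handles messages of at most 55 bytes so that the padding fits in one 64-byte block. The schedule lives in a 16-entry rolling array with exactly one store per round, and `ROUND` rotates its argument order so each round assigns only `e` and `b`. Over 80 rounds that is 80 array stores plus 160 variable updates, the 240 SSA definitions counted above, even though only the 16 array words, the 5 chaining variables and a few temporaries are ever live at once.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Rotate a 32-bit word left by n bits (0 < n < 32). */
static uint32_t rol32(uint32_t x, unsigned n)
{
	return (x << n) | (x >> (32 - n));
}

/* SHA-1 round function for round t (FIPS 180 definitions). */
static uint32_t f(int t, uint32_t b, uint32_t c, uint32_t d)
{
	if (t < 20)
		return (b & c) | (~b & d);
	if (t < 40)
		return b ^ c ^ d;
	if (t < 60)
		return (b & c) | (b & d) | (c & d);
	return b ^ c ^ d;
}

static const uint32_t K[4] = {
	0x5A827999, 0x6ED9EBA1, 0x8F1BBCDC, 0xCA62C1D6
};

/*
 * Schedule word for round t, kept in a 16-entry rolling array.
 * Exactly one store into W[] happens per round; in SSA form each of
 * those stores becomes a new value (pseudo).
 */
static uint32_t sched(uint32_t W[16], const unsigned char *blk, int t)
{
	uint32_t w;

	if (t < 16)	/* big-endian load of message word t */
		w = (uint32_t)blk[4 * t] << 24 | (uint32_t)blk[4 * t + 1] << 16 |
		    (uint32_t)blk[4 * t + 2] << 8 | blk[4 * t + 3];
	else		/* W[t-3] ^ W[t-8] ^ W[t-14] ^ W[t-16], indices mod 16 */
		w = rol32(W[(t + 13) & 15] ^ W[(t + 8) & 15] ^
			  W[(t + 2) & 15] ^ W[t & 15], 1);
	W[t & 15] = w;
	return w;
}

/*
 * One round (uses W and blk from the enclosing function).  Instead of
 * shuffling all five variables, the caller rotates the argument order,
 * so each round assigns only e (the new value) and b (rotated by 30).
 */
#define ROUND(t, a, b, c, d, e) do {					\
	e += rol32(a, 5) + f(t, b, c, d) + K[(t) / 20] + sched(W, blk, t); \
	b = rol32(b, 30);						\
} while (0)

/* Hypothetical helper: SHA-1 of a message of at most 55 bytes. */
static void sha1_oneblock(const void *msg, size_t len, unsigned char out[20])
{
	unsigned char blk[64] = { 0 };
	uint32_t W[16];
	uint32_t H[5] = { 0x67452301, 0xEFCDAB89, 0x98BADCFE,
			  0x10325476, 0xC3D2E1F0 };
	uint32_t A = H[0], B = H[1], C = H[2], D = H[3], E = H[4];
	uint64_t bits = (uint64_t)len * 8;
	int i, t;

	assert(len <= 55);
	memcpy(blk, msg, len);
	blk[len] = 0x80;			/* padding marker */
	for (i = 0; i < 8; i++)			/* bit length, big-endian */
		blk[63 - i] = (unsigned char)(bits >> (8 * i));

	for (t = 0; t < 80; t += 5) {
		ROUND(t,     A, B, C, D, E);
		ROUND(t + 1, E, A, B, C, D);
		ROUND(t + 2, D, E, A, B, C);
		ROUND(t + 3, C, D, E, A, B);
		ROUND(t + 4, B, C, D, E, A);
	}

	H[0] += A; H[1] += B; H[2] += C; H[3] += D; H[4] += E;
	for (i = 0; i < 20; i++)
		out[i] = (unsigned char)(H[i / 4] >> (24 - 8 * (i % 4)));
}
```

After every fifth round the argument order is back to A, B, C, D, E, which is why the loop steps by 5 and 80 rounds divide evenly.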

That's what the whole 'volatile unsigned int *' game tried to avoid. But 
it really sounds like it's not working too well for you. And the _big_ 
memory barrier ends up helping just because with that in place, you end up 
being almost entirely unable to schedule _anything_ between the different 
SHA rounds, so you end up with only six or seven variables "live" in 
between those barriers, and the stupid register allocator/spill logic 
doesn't break down too badly.
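
The two store-side approaches can be sketched as follows. This is a hedged illustration, not the block-sha1 source: the macro names `SETW_VOLATILE` and `SETW_BARRIER` and the two helper functions are hypothetical, the barrier assumes the GCC/Clang empty-asm "memory" clobber (a compiler barrier, not a CPU fence), and only the round 16..79 message expansion is shown so the effect on the rolling array is easy to see. Both variants compute the same values; they differ only in what gcc is allowed to keep in registers.

```c
#include <assert.h>
#include <stdint.h>

/* The whole rolling schedule is 16 words, i.e. 64 bytes. */
static_assert(sizeof(uint32_t[16]) == 64, "rolling schedule is 64 bytes");

/* Rotate a 32-bit word left by n bits (0 < n < 32). */
static uint32_t rol32(uint32_t x, unsigned n)
{
	return (x << n) | (x >> (32 - n));
}

/*
 * (a) Store through a volatile pointer: every store to the rolling
 *     array must actually be emitted, so gcc cannot dissolve the array
 *     into per-round pseudos.  Loads are left as ordinary accesses.
 */
#define SETW_VOLATILE(W, x, val) \
	(*(volatile uint32_t *)&(W)[(x) & 15] = (val))

/*
 * (b) Plain store followed by a full compiler memory barrier: gcc must
 *     assume all memory may have changed, so memory accesses cannot be
 *     moved across it and array values cannot be carried across it in
 *     registers.
 */
#define SETW_BARRIER(W, x, val) do {			\
	(W)[(x) & 15] = (val);				\
	__asm__ __volatile__("" ::: "memory");		\
} while (0)

/* Rounds 16..79 of the message expansion using store style (a);
 * returns the XOR of the expanded words so variants can be compared. */
static uint32_t expand_volatile_store(uint32_t W[16])
{
	uint32_t acc = 0;

	for (int t = 16; t < 80; t++) {
		uint32_t w = rol32(W[(t + 13) & 15] ^ W[(t + 8) & 15] ^
				   W[(t + 2) & 15] ^ W[t & 15], 1);
		SETW_VOLATILE(W, t, w);
		acc ^= w;
	}
	return acc;
}

/* Same expansion using store style (b). */
static uint32_t expand_barrier(uint32_t W[16])
{
	uint32_t acc = 0;

	for (int t = 16; t < 80; t++) {
		uint32_t w = rol32(W[(t + 13) & 15] ^ W[(t + 8) & 15] ^
				   W[(t + 2) & 15] ^ W[t & 15], 1);
		SETW_BARRIER(W, t, w);
		acc ^= w;
	}
	return acc;
}
```

Which one wins on a given target is a matter for measurement, e.g. by comparing the stack-frame size and timings the way Nicolas did.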

The thing is, if you do full memory barriers, then you're probably better 
off making both the loads and the stores be "volatile". That should have 
similar effects.

The downside with that is that it really limits the loads: gcc can no
longer schedule or combine them freely. So (like the full memory barrier)
it's a big hammer approach. But it probably generates better code for you,
because it avoids the mental breakdown of gcc spilling its pseudos.
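
Making both the loads and the stores volatile might look like the minimal sketch below. The `GETW`/`SETW` macro names and `expand_all_volatile` are hypothetical, and again only the message expansion is shown. Every access to the rolling array becomes a real memory access, so gcc never gets to treat the 16 entries as hundreds of SSA pseudos; the price is that loads can no longer be hoisted, merged or kept in registers, which is the "big hammer" part.

```c
#include <assert.h>
#include <stdint.h>

/* The whole rolling schedule is 16 words, i.e. 64 bytes. */
static_assert(sizeof(uint32_t[16]) == 64, "rolling schedule is 64 bytes");

/* Rotate a 32-bit word left by n bits (0 < n < 32). */
static uint32_t rol32(uint32_t x, unsigned n)
{
	return (x << n) | (x >> (32 - n));
}

/* Volatile load and volatile store of rolling-array entry x (mod 16). */
#define GETW(W, x)	(*(volatile uint32_t *)&(W)[(x) & 15])
#define SETW(W, x, val)	(*(volatile uint32_t *)&(W)[(x) & 15] = (val))

/* Rounds 16..79 of the message expansion with every array access
 * volatile; returns the XOR of the expanded words. */
static uint32_t expand_all_volatile(uint32_t W[16])
{
	uint32_t acc = 0;

	for (int t = 16; t < 80; t++) {
		/* Each GETW is a fresh load from the array in memory. */
		uint32_t w = rol32(GETW(W, t + 13) ^ GETW(W, t + 8) ^
				   GETW(W, t + 2) ^ GETW(W, t), 1);
		SETW(W, t, w);
		acc ^= w;
	}
	return acc;
}
```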

			Linus

Thread overview: 16+ messages
2009-08-10 23:52 block-sha1: improve code on large-register-set machines Linus Torvalds
2009-08-11  6:15 ` Nicolas Pitre
2009-08-11 15:04   ` Linus Torvalds
2009-08-11 18:00     ` Nicolas Pitre
2009-08-11 19:31       ` Nicolas Pitre
2009-08-11 21:20         ` Brandon Casey
2009-08-11 21:36           ` Nicolas Pitre
2009-08-11 21:49             ` Brandon Casey
2009-08-11 22:57           ` Linus Torvalds
2009-08-11 23:13             ` Brandon Casey
2009-08-11 15:43   ` Linus Torvalds
2009-08-11 20:03     ` Nicolas Pitre
2009-08-11 22:53       ` Linus Torvalds
2009-08-11 23:14         ` Linus Torvalds [this message]
2009-08-12  2:26           ` Nicolas Pitre
2009-08-11 23:45         ` Artur Skawina