From: Linus Torvalds <torvalds@linux-foundation.org>
To: Nicolas Pitre <nico@cam.org>
Cc: Git Mailing List <git@vger.kernel.org>,
Junio C Hamano <gitster@pobox.com>
Subject: Re: block-sha1: improve code on large-register-set machines
Date: Tue, 11 Aug 2009 08:43:21 -0700 (PDT)
Message-ID: <alpine.LFD.2.01.0908110810410.3417@localhost.localdomain>
In-Reply-To: <alpine.LFD.2.00.0908102246210.10633@xanadu.home>
On Tue, 11 Aug 2009, Nicolas Pitre wrote:
>
> BLK_SHA1: 5.280s [original]
> BLK_SHA1: 7.410s [with SMALL_REGISTER_SET defined]
> BLK_SHA1: 7.480s [with 'W(x)=(val);asm("":"+m" (W(x)))']
> BLK_SHA1: 4.980s [with 'W(x)=(val);asm("":::"memory")']
>
> At this point the generated assembly is pretty slick. I bet the full
> memory barrier might help on x86 as well.
No, I had tested that earlier - single-word memory barrier for some reason
gets _much_ better numbers at least on x86-64. We're talking

	linus          1.46       418.2

vs

	linus         2.004       304.6

kind of differences. With the "+m" it outperforms openssl (375-380MB/s).
The "volatile unsigned int *" cast looks pretty much like the "+m" version
to me, but Artur got a speedup from whatever gcc code generation
differences on his P4.
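For reference, the three store variants being compared can be sketched roughly as below. This is a minimal illustration, not the actual block-sha1 source: only the asm strings and the "volatile unsigned int *" cast come from this thread, while the W() definition, the macro names (setW_full, setW_word, setW_volatile) and the fill_and_sum() helper are hypothetical. It assumes gcc or a compatible compiler that accepts GNU extended asm.

```c
/* Assumed indexing macro: a 16-entry circular window named array[]. */
#define W(x) (array[(x) & 15])

/* Full barrier: gcc must assume any memory may have changed. */
#define setW_full(x, val) \
	do { W(x) = (val); __asm__("" : : : "memory"); } while (0)

/* Single-word barrier: only W(x) is marked as read and written. */
#define setW_word(x, val) \
	do { W(x) = (val); __asm__("" : "+m" (W(x))); } while (0)

/* Volatile cast: the store itself has to be emitted to memory. */
#define setW_volatile(x, val) \
	(*(volatile unsigned int *)&W(x) = (val))

/*
 * Hypothetical helper: fill array[] through one variant and sum it.
 * All three variants store the same values; they differ only in how
 * much freedom gcc keeps when allocating registers around the store.
 */
unsigned int fill_and_sum(int variant)
{
	unsigned int array[16];
	unsigned int sum = 0;
	int i;

	for (i = 0; i < 16; i++) {
		unsigned int val = (unsigned int)i * 3u + 1u;

		switch (variant) {
		case 0:
			setW_full(i, val);
			break;
		case 1:
			setW_word(i, val);
			break;
		default:
			setW_volatile(i, val);
			break;
		}
	}
	for (i = 0; i < 16; i++)
		sum += array[i];
	return sum;
}
```

The result is identical in all three cases, so any timing difference between them comes purely from the code gcc generates around the stores.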
The really fundamental and basic problem with gcc on this code is that gcc
does not see _any_ difference what-so-ever between the five variables
declared with

	unsigned int A, B, C, D, E;

and the sixteen variables declared with

	unsigned int array[16];

and considers those all to be 21 local variables. It really seems to think
that they are all 100% equivalent, and gcc totally ignores me doing things
like adding "register" to the A-E ones etc.
And if you are a compiler, and think that the routine has 21 equal
register variables, you're going to do crazy reload sh*t when you have
only 7 (or 15) GP registers. So doing that full memory barrier seems to
just take that random situation, and force some random variable to be
spilled (this is all from looking at the generated code, not from looking
at gcc).
In contrast, with the _targeted_ thing ("you'd better write back into
array[]") we force gcc to spill the array[16] values, and not the A-E
ones, and that's why it seems to make such a big difference.
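To make the shape of the problem concrete, here is a rough sketch of a SHA-1 block function with five scalar state variables and a sixteen-word array whose stores go through the targeted "+m" barrier. It is an assumption-laden illustration, not the block-sha1 code: it uses a plain loop rather than unrolled rounds, uint32_t rather than unsigned int, and the names sha1_block_sketch() and sketch_matches_abc_vector() are hypothetical. It says nothing about performance; it only shows where the targeted store sits relative to A-E.

```c
#include <stdint.h>

#define ROL(x, n) (((x) << (n)) | ((x) >> (32 - (n))))
#define W(x) (array[(x) & 15])
/* Targeted store: tell gcc that this one word lives in memory. */
#define setW(x, val) do { W(x) = (val); __asm__("" : "+m" (W(x))); } while (0)

void sha1_block_sketch(uint32_t h[5], const unsigned char block[64])
{
	/* The five values we would like gcc to keep in registers. */
	uint32_t A = h[0], B = h[1], C = h[2], D = h[3], E = h[4];
	/* The sixteen values we would rather see spilled. */
	uint32_t array[16];
	int t;

	for (t = 0; t < 80; t++) {
		uint32_t w, f, k, tmp;

		if (t < 16) {
			/* Load one big-endian word from the input block. */
			w = (uint32_t)block[4 * t] << 24 |
			    (uint32_t)block[4 * t + 1] << 16 |
			    (uint32_t)block[4 * t + 2] << 8 |
			    (uint32_t)block[4 * t + 3];
		} else {
			/* Message schedule over the 16-word circular window. */
			w = W(t - 3) ^ W(t - 8) ^ W(t - 14) ^ W(t - 16);
			w = ROL(w, 1);
		}
		setW(t, w);

		if (t < 20) {
			f = (B & C) | (~B & D);
			k = 0x5a827999;
		} else if (t < 40) {
			f = B ^ C ^ D;
			k = 0x6ed9eba1;
		} else if (t < 60) {
			f = (B & C) | (B & D) | (C & D);
			k = 0x8f1bbcdc;
		} else {
			f = B ^ C ^ D;
			k = 0xca62c1d6;
		}

		tmp = ROL(A, 5) + f + E + k + w;
		E = D;
		D = C;
		C = ROL(B, 30);
		B = A;
		A = tmp;
	}

	h[0] += A;
	h[1] += B;
	h[2] += C;
	h[3] += D;
	h[4] += E;
}

/* Usage check: hash the single padded block for "abc" (24 bits long). */
int sketch_matches_abc_vector(void)
{
	uint32_t h[5] = { 0x67452301, 0xefcdab89, 0x98badcfe,
			  0x10325476, 0xc3d2e1f0 };
	unsigned char block[64] = { 'a', 'b', 'c', 0x80 };

	block[63] = 24;
	sha1_block_sketch(h, block);
	return h[0] == 0xa9993e36 && h[1] == 0x4706816a &&
	       h[2] == 0xba3e2571 && h[3] == 0x7850c26c &&
	       h[4] == 0x9cd0d89d;
}
```

Swapping the setW() body for the full "memory" clobber leaves the digest unchanged; per the explanation above, what changes is which of the 21 values gcc ends up reloading.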
And no, I'm not sure why ARM apparently doesn't show the same behavior. Or
maybe it does, but with an in-order core it doesn't matter as much which
registers you keep reloading - you'll be serialized all the time _anyway_.
Linus