All of lore.kernel.org
 help / color / mirror / Atom feed
From: Artur Skawina <art.08.09@gmail.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: [PATCH 0/7] block-sha1: improved SHA1 hashing
Date: Fri, 07 Aug 2009 03:30:36 +0200	[thread overview]
Message-ID: <4A7B83BC.1040606@gmail.com> (raw)
In-Reply-To: <alpine.LFD.2.01.0908061709400.3390@localhost.localdomain>

Linus Torvalds wrote:
> 
> On Thu, 6 Aug 2009, Linus Torvalds wrote:
>> In particular, I'm thinking about the warnign in the intel optimization 
>> manual:
>>
>> 	The rotate by immediate and rotate by register instructions are 
>> 	more expensive than a shift. The rotate by 1 instruction has the 
>> 	same latency as a shift.
>>
>> so it's very possible that "rotate by 1" is much better than other 
>> rotates.
> 
> Hmm. Probably not. Googling more seems to indicate that rotates and shifts 
> have a fixed 4-cycle latency on Northwood. I'm not seeing anything that 
> indicates that a single-bit rotate/shift would be any faster.
> 
> (And remember, if 4 cycles doesn't sound so bad: that's enough of a 
> latency to do _16_ "simple" ALU's, since they can be double-pumped in the 
> two regular ALU's).

looking at the generated code, there is a lot of ro[rl] movement, so it's
likely that contributes to the problem.

I also see 44 extra lea instructions, 44 less adds and changes like:
        [...]
        mov    XX(%eRX),%eRX
        xor    XX(%eRX),%eRX
-       and    %eRX,%eRX
+       and    XX(%eRX),%eRX
        xor    XX(%eRX),%eRX
-       add    %eRX,%eRX
-       ror    $0x2,%eRX
-       mov    %eRX,XX(%eRX)
+       lea    (%eRX,%eRX,1),%eRX
        mov    XX(%eRX),%eRX
        bswap  %eRX
        mov    %eRX,XX(%eRX)
        mov    %eRX,%eRX
+       ror    $0x2,%eRX
+       mov    %eRX,XX(%eRX)
+       mov    %eRX,%eRX
        rol    $0x5,%eRX
        mov    XX(%eRX),%eRX
-       mov    XX(%eRX),%eRX
        [...]
which could mean that gcc did a better job of register allocation
(where "better job" might be just luck).

artur

  reply	other threads:[~2009-08-07  1:30 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-08-06 15:13 [PATCH 0/7] block-sha1: improved SHA1 hashing Linus Torvalds
2009-08-06 15:15 ` [PATCH 1/7] block-sha1: add new optimized C 'block-sha1' routines Linus Torvalds
2009-08-06 15:16   ` [PATCH 2/7] block-sha1: try to use rol/ror appropriately Linus Torvalds
2009-08-06 15:18     ` [PATCH 3/7] block-sha1: make the 'ntohl()' part of the first SHA1 loop Linus Torvalds
2009-08-06 15:20       ` [PATCH 4/7] block-sha1: re-use the temporary array as we calculate the SHA1 Linus Torvalds
2009-08-06 15:22         ` [PATCH 5/7] block-sha1: macroize the rounds a bit further Linus Torvalds
2009-08-06 15:24           ` [PATCH 6/7] block-sha1: Use '(B&C)+(D&(B^C))' instead of '(B&C)|(D&(B|C))' in round 3 Linus Torvalds
2009-08-06 15:25             ` [PATCH 7/7] block-sha1: get rid of redundant 'lenW' context Linus Torvalds
2009-08-06 18:25     ` [PATCH 2/7] block-sha1: try to use rol/ror appropriately Bert Wesarg
2009-08-06 17:22 ` [PATCH 0/7] block-sha1: improved SHA1 hashing Artur Skawina
2009-08-06 18:09   ` Linus Torvalds
2009-08-06 19:10     ` Artur Skawina
2009-08-06 19:41       ` Linus Torvalds
2009-08-06 20:08         ` Artur Skawina
2009-08-06 20:53           ` Linus Torvalds
2009-08-06 21:24             ` Linus Torvalds
2009-08-06 21:39             ` Artur Skawina
2009-08-06 21:52               ` Artur Skawina
2009-08-06 22:27                 ` Linus Torvalds
2009-08-06 22:33                   ` Linus Torvalds
2009-08-06 23:19                     ` Artur Skawina
2009-08-06 23:42                       ` Linus Torvalds
2009-08-06 22:55                   ` Artur Skawina
2009-08-06 23:04                     ` Linus Torvalds
2009-08-06 23:25                       ` Linus Torvalds
2009-08-07  0:13                         ` Linus Torvalds
2009-08-07  1:30                           ` Artur Skawina [this message]
2009-08-07  1:55                             ` Linus Torvalds
2009-08-07  0:53                         ` Artur Skawina
2009-08-07  2:23                   ` Linus Torvalds
2009-08-07  4:16                     ` Artur Skawina
     [not found]                     ` <alpine.LFD.2.01.0908071614310.3288@localhost.localdomain>
     [not found]                       ` <4A7CBD28.6070306@gmail.com>
     [not found]                         ` <4A7CBF47.9000903@gmail.com>
     [not found]                           ` <alpine.LFD.2.01.0908071700290.3288@localhost.localdomain>
     [not found]                             ` <4A7CC380.3070008@gmail.com>
2009-08-08  4:16                               ` Linus Torvalds
2009-08-08  5:34                                 ` Artur Skawina
2009-08-08 17:10                                   ` Linus Torvalds
2009-08-08 18:12                                     ` Artur Skawina
2009-08-08 22:58                                   ` Artur Skawina
2009-08-08 23:36                                     ` Artur Skawina
  -- strict thread matches above, loose matches on Subject: below --
2009-08-07  7:36 George Spelvin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4A7B83BC.1040606@gmail.com \
    --to=art.08.09@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.