Re: [PATCH] PPC assembly implementation of SHA1

From: Wayne Scott <wsc9tt@gmail.com>
To: Paul Mackerras <paulus@samba.org>
Cc: linux@horizon.com, git@vger.kernel.org
Subject: Re: [PATCH] PPC assembly implementation of SHA1
Date: Sun, 24 Apr 2005 07:04:27 -0500
Message-ID: <59a6e583050424050434ae2501@mail.gmail.com>
In-Reply-To: <17003.9009.226712.220822@cargo.ozlabs.ibm.com>

_The_ book on expression rewriting tricks like this, especially for
the PPC, is "Hacker's Delight" by Henry Warren.  Great reading!!!
http://www.hackersdelight.org/

-Wayne

On 4/23/05, Paul Mackerras <paulus@samba.org> wrote:
> linux@horizon.com writes:
> 
> > I was working on the same thing, but hindered by lack of access to PPC
> > hardware.  I notice that you also took advantage of the unaligned load
> > support and native byte order to do the hash straight from the source.
> 
> Yes. :)  In previous experiments (in the context of trying different
> ways to do memcpy) I found that doing unaligned word loads is faster
> than doing aligned loads plus extra rotate and mask instructions to
> get the bytes you want together.
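
The trade-off Paul describes can be sketched in C. SHA-1 consumes its input as big-endian 32-bit words, so on a big-endian PPC a single word load, aligned or not, already yields the value the hash needs. The function names below are hypothetical and are not taken from the patch: load_be32_bytes assembles the word with shifts and ORs (the portable route), while load_word_native copies four bytes in host order, which compilers commonly lower to one load on hardware that tolerates unaligned access. The two agree only on a big-endian host.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Portable: assemble a big-endian 32-bit word from four bytes. */
uint32_t load_be32_bytes(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Host-order load from a possibly unaligned address.
 * memcpy keeps this well-defined C regardless of alignment. */
uint32_t load_word_native(const unsigned char *p)
{
    uint32_t w;
    memcpy(&w, p, sizeof w);
    return w;
}

/* Returns 1 when the host stores the most significant byte first. */
int host_is_big_endian(void)
{
    const uint32_t probe = 0x01020304u;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 0x01;
}

/* On a big-endian host both loaders must return the same word. */
void check_loaders(const unsigned char *p)
{
    if (host_is_big_endian())
        assert(load_word_native(p) == load_be32_bytes(p));
}
```

Whether the single unaligned load is actually faster depends on the CPU; the claim above is Paul's measurement on his hardware, not something this sketch demonstrates.
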
> 
> > But I came up with a few additional refinements:
> >
> > - You are using three temporaries (%r0, %r6, and RT(x)) for your
> >   round functions.  You only need one temporary (%r0) for all the functions.
> >   (Plus %r15 for k)
> 
> The reason I used more than one temporary is that I was trying to put
> dependent instructions as far apart as reasonably possible, to
> minimize the chances of pipeline stalls.  Given that the 970 does
> register renaming and out-of-order execution, I don't know how
> essential that is, but it can't hurt.
> 
> > All are three logical instructions on PPC.  The second form
> > lets you add it into the accumulator e in two pieces:
> 
> A sequence of adds into a single register is going to incur the
> 2-cycle latency between generation and use of a value; i.e. the adds
> will only issue on every second cycle.  I think we are better off
> making the dataflow more like a tree than a linear chain where
> possible.
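
The difference between a chain and a tree of adds can be illustrated in C. A SHA-1 round adds four terms into e: the rotated a, the round function value, the constant k, and a message word. The names sum_chain and sum_tree and the parameter names are hypothetical, chosen only for this sketch. Both functions return the same value because 32-bit addition is associative and commutative; what differs is the length of the longest dependency path.

```c
#include <assert.h>
#include <stdint.h>

/* Linear chain: every add waits for the previous result,
 * so the dependency path is four adds long. */
uint32_t sum_chain(uint32_t e, uint32_t r, uint32_t f, uint32_t k, uint32_t w)
{
    e += r;
    e += f;
    e += k;
    e += w;
    return e;
}

/* Tree: (e + r) and (f + k) do not depend on each other,
 * so the longest dependency path is three adds. */
uint32_t sum_tree(uint32_t e, uint32_t r, uint32_t f, uint32_t k, uint32_t w)
{
    uint32_t t0 = e + r;
    uint32_t t1 = f + k;
    uint32_t t2 = t0 + t1;
    return t2 + w;
}

/* Both shapes must agree, including when the sum wraps modulo 2^32. */
void check_sums(uint32_t e, uint32_t r, uint32_t f, uint32_t k, uint32_t w)
{
    assert(sum_chain(e, r, f, k, w) == sum_tree(e, r, f, k, w));
}
```

An optimizing C compiler is free to reassociate unsigned adds, so the C source does not pin down the schedule; in hand-written assembly the instruction order is exactly what you write, which is why the shape matters there.
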
> 
> > And the last function, majority(x,y,z), can be written as:
> > f3(x,y,z) = (x & y) | (y & z) | (z & x)
> >           = (x & y) | (z & (x | y))
> >           = (x & y) | (z & (x ^ y))
> >           = (x & y) + (z & (x ^ y))
> 
> That's cute, I hadn't thought of that.
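
The chain of rewrites for the majority function is easy to check in C. The last step works because (x & y) and (z & (x ^ y)) can never have a 1 in the same bit position: wherever x & y is set, x ^ y is clear. With no overlapping bits there are no carries, so + gives the same result as |. The function names below are hypothetical. Note that in C the parentheses around z & (x ^ y) are required, since + binds more tightly than &.

```c
#include <assert.h>
#include <stdint.h>

/* Textbook majority: a result bit is 1 when at least two inputs are 1. */
uint32_t maj_canonical(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & y) | (y & z) | (z & x);
}

/* Factor z out of the last two terms. */
uint32_t maj_or(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & y) | (z & (x | y));
}

/* Where x and y are both 1 the first term already decides the bit,
 * so x | y can be narrowed to x ^ y. */
uint32_t maj_xor(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & y) | (z & (x ^ y));
}

/* The two terms share no set bits, so addition cannot carry
 * and behaves exactly like OR. */
uint32_t maj_add(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & y) + (z & (x ^ y));
}

/* 0xF0, 0xCC, 0xAA put all eight input combinations into one byte,
 * so a single comparison covers the whole truth table. */
void check_majority_forms(void)
{
    const uint32_t x = 0xF0, y = 0xCC, z = 0xAA;
    const uint32_t want = maj_canonical(x, y, z);
    assert(maj_or(x, y, z) == want);
    assert(maj_xor(x, y, z) == want);
    assert(maj_add(x, y, z) == want);
}
```

The additive form is what allows the two terms to be accumulated into e separately instead of being combined first.
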
> 
> > - You don't need to decrement %r1 before saving registers.
> >   The PPC calling convention defines a "red zone" below the
> >   current stack pointer that is guaranteed never to be touched
> >   by signal handlers or the like.  This is specifically for
> >   leaf procedure optimization, and is at least 224 bytes.
> 
> Not in the ppc32 ELF ABI - you are not supposed to touch memory below
> the stack pointer.  The kernel is more forgiving than that, and in
> fact you can currently use the red zone without anything bad
> happening, but you really shouldn't.
> 
> > - Is that many stw/lwz instructions faster than stmw/lmw?
> >   The latter is at least more cache-friendly.
> 
> I believe the stw/lwz and the stmw/lmw will actually execute at the
> same speed on the 970, but I have seen lwz/stw go faster than lmw/stmw
> on other machines.  In any case we aren't executing the prolog and
> epilog as often as the instructions in the main loop, hopefully.
> 
> > - You can avoid saving and restoring %r15 by recycling %r5 for that
> >   purpose; it's not used after the mtctr %r5.
> 
> True.
> 
> > - The above changes actually save enough registers to cache the whole hash[5]
> >   in registers as well, eliminating *all* unnecessary load/store traffic.
> 
> That's cool.
> 
> > With all of the above changes, your sha1ppc.S file turns into:
> 
> I added a stwu and an addi to make a stack frame, and changed %r15 to
> %r5 as you mentioned in another message.  I tried it in a little test
> program I have that calls SHA1_Update 256,000 times with a buffer of
> 4096 zero bytes, i.e. it processes 1000MB.  Your version seems to be
> about 2% faster; it took 4.53 seconds compared to 4.62 for mine.  But
> it also gives the wrong answer; I haven't investigated why.
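
The figures in this paragraph can be checked with a little arithmetic, sketched here in C with hypothetical helper names. 256,000 calls on a 4096-byte buffer come to 1,048,576,000 bytes, which is exactly 1000 MiB, and going from 4.62 s to 4.53 s saves a little under 2% of the run time (roughly 216 versus 221 MiB/s). The sketch only reproduces the reported numbers; it does not run SHA1_Update.

```c
#include <assert.h>
#include <stdint.h>

/* Total bytes hashed: number of calls times buffer size. */
uint64_t total_bytes(uint64_t calls, uint64_t buf_size)
{
    return calls * buf_size;
}

/* Throughput in MiB per second for a run of the given duration. */
double mib_per_second(uint64_t bytes, double seconds)
{
    return ((double)bytes / (1024.0 * 1024.0)) / seconds;
}

/* Fraction of run time saved by the faster run, relative to the slower one. */
double time_saved_fraction(double slower_s, double faster_s)
{
    return (slower_s - faster_s) / slower_s;
}

/* The workload quoted in the message is exactly 1000 MiB. */
void check_reported_workload(void)
{
    assert(total_bytes(256000, 4096) == (uint64_t)1000 * 1024 * 1024);
}
```
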
> 
> Thanks,
> Paul.
