Re: [PATCH 2/3] [CRYPTO] Add optimized SHA-1 implementation for i486+

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Matt Mackall <mpm@selenic.com>
To: Benjamin Gilbert <bgilbert@cs.cmu.edu>
Cc: Jeff Garzik <jeff@garzik.org>,
	akpm@linux-foundation.org, herbert@gondor.apana.org.au,
	linux-crypto@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/3] [CRYPTO] Add optimized SHA-1 implementation for i486+
Date: Sun, 10 Jun 2007 08:59:56 -0500	[thread overview]
Message-ID: <20070610135956.GS11115@waste.org> (raw)
In-Reply-To: <466B46D5.1020004@cs.cmu.edu>

On Sat, Jun 09, 2007 at 08:33:25PM -0400, Benjamin Gilbert wrote:
> Jeff Garzik wrote:
> >Matt Mackall wrote:
> >>Have you benchmarked this against lib/sha1.c? Please post the results.
> >>Until then, I'm frankly skeptical that your unrolled version is faster
> >>because when I introduced lib/sha1.c the rolled version therein won by
> >>a significant margin and had 1/10th the cache footprint.
> 
> See the benchmark tables in patch 0 at the head of this thread. 
> Performance improved by at least 25% in every test, and 40-60% was more 
> common for the 32-bit version (on a Pentium IV).
> 
> It's not just the loop unrolling; it's the register allocation and 
> spilling.  For comparison, I built SHATransform() from the 
> drivers/char/random.c in 2.6.11, using gcc 3.3.5 with -O2 and 
> SHA_CODE_SIZE == 3 (i.e., fully unrolled); I'm guessing this is pretty 
> close to what you tested back then.  The resulting code is 49% MOV 
> instructions, and 80% of *those* involve memory.  gcc4 is somewhat 
> better, but it still spills a whole lot, both for the 2.6.11 unrolled 
> code and for the current lib/sha1.c.

Wait, your benchmark is comparing against the unrolled code?

> In contrast, the assembly implementation in this patch only has to go to 
> memory for data and workspace (with one small exception in the F3 
> rounds), and the workspace has a fifth of the cache footprint of the 
> default implementation.

How big is the -code- footprint?

Earlier you wrote:

> On the aforementioned Pentium IV, /dev/urandom throughput goes from
> 3.7 MB/s to 5.6 MB/s with the patches; on the Core 2, it increases
> from 5.5 MB/s to 8.1 MB/s.

Whoa. We've regressed something horrible here:

http://groups.google.com/group/linux.kernel/msg/fba056363c99d4f9?dmode=source&hl=en

In 2003, I was getting 17MB/s out of my Athlon. Now I'm getting 2.7MB/s.
Were your tests with or without the latest /dev/urandom fixes? This
one in particular:

http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.21.y.git;a=commitdiff;h=374f167dfb97c1785515a0c41e32a66b414859a8

-- 
Mathematics is the supreme nostalgia of our time.

next prev parent reply	other threads:[~2007-06-10 14:00 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-06-08 21:42 [PATCH 0/3] Add optimized SHA-1 implementations for x86 and x86_64 Benjamin Gilbert
2007-06-08 21:42 ` [PATCH 1/3] [CRYPTO] Move sha_init() into cryptohash.h Benjamin Gilbert
2007-06-08 21:42 ` [PATCH 2/3] [CRYPTO] Add optimized SHA-1 implementation for i486+ Benjamin Gilbert
2007-06-09  7:32   ` Jan Engelhardt
2007-06-10  1:15     ` Benjamin Gilbert
2007-06-11 19:47       ` Benjamin Gilbert
2007-06-11 19:50         ` [PATCH] " Benjamin Gilbert
2007-06-11 19:52         ` [PATCH] [CRYPTO] Add optimized SHA-1 implementation for x86_64 Benjamin Gilbert
2007-06-09 20:11   ` [PATCH 2/3] [CRYPTO] Add optimized SHA-1 implementation for i486+ Matt Mackall
2007-06-09 20:23     ` Jeff Garzik
2007-06-09 21:34       ` Matt Mackall
2007-06-10  0:33       ` Benjamin Gilbert
2007-06-10 13:59         ` Matt Mackall [this message]
2007-06-10 16:47           ` Benjamin Gilbert
2007-06-10 17:33             ` Matt Mackall
2007-06-11 17:39           ` Benjamin Gilbert
2007-06-11 12:04     ` Andi Kleen
2007-06-08 21:42 ` [PATCH 3/3] [CRYPTO] Add optimized SHA-1 implementation for x86_64 Benjamin Gilbert
2007-06-11 12:01   ` Andi Kleen
2007-06-11 19:45     ` Benjamin Gilbert
2007-06-11 20:30 ` [PATCH 0/3] Add optimized SHA-1 implementations for x86 and x86_64 Adrian Bunk
  -- strict thread matches above, loose matches on Subject: below --
2007-06-11  7:53 [PATCH 2/3] [CRYPTO] Add optimized SHA-1 implementation for i486+ linux
2007-06-11 19:17 ` Benjamin Gilbert
2007-06-12  5:05   ` linux
2007-06-13  5:50     ` Matt Mackall
2007-06-13  6:46       ` linux

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20070610135956.GS11115@waste.org \
    --to=mpm@selenic.com \
    --cc=akpm@linux-foundation.org \
    --cc=bgilbert@cs.cmu.edu \
    --cc=herbert@gondor.apana.org.au \
    --cc=jeff@garzik.org \
    --cc=linux-crypto@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.