From: Benjamin Gilbert <bgilbert@cs.cmu.edu>
To: Jeff Garzik <jeff@garzik.org>
Cc: Matt Mackall <mpm@selenic.com>,
akpm@linux-foundation.org, herbert@gondor.apana.org.au,
linux-crypto@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/3] [CRYPTO] Add optimized SHA-1 implementation for i486+
Date: Sat, 09 Jun 2007 20:33:25 -0400
Message-ID: <466B46D5.1020004@cs.cmu.edu>
In-Reply-To: <466B0C3F.3040300@garzik.org>

Jeff Garzik wrote:
> Matt Mackall wrote:
>> Have you benchmarked this against lib/sha1.c? Please post the results.
>> Until then, I'm frankly skeptical that your unrolled version is faster
>> because when I introduced lib/sha1.c the rolled version therein won by
>> a significant margin and had 1/10th the cache footprint.

See the benchmark tables in patch 0 at the head of this thread.
Performance improved by at least 25% in every test, and 40-60% was more
common for the 32-bit version (on a Pentium IV).

It's not just the loop unrolling; it's the register allocation and
spilling. For comparison, I built SHATransform() from the
drivers/char/random.c in 2.6.11, using gcc 3.3.5 with -O2 and
SHA_CODE_SIZE == 3 (i.e., fully unrolled); I'm guessing this is pretty
close to what you tested back then. The resulting code is 49% MOV
instructions, and 80% of *those* involve memory. gcc4 is somewhat
better, but it still spills a whole lot, both for the 2.6.11 unrolled
code and for the current lib/sha1.c.

In contrast, the assembly implementation in this patch only has to go to
memory for data and workspace (with one small exception in the F3
rounds), and the workspace has a fifth of the cache footprint of the
default implementation.
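The MOV percentages quoted above can be approximated by tallying mnemonics in a disassembly. What follows is only a minimal sketch, not the procedure used for this message: it assumes AT&T-syntax listings in the style of `objdump -d`, treats any MOV with a parenthesized operand as a memory access (so absolute-address operands are missed), and the names `mov_stats` and `SAMPLE` are invented for illustration.

```python
import re

# One disassembly line in the style of `objdump -d` (AT&T syntax):
# address, tab, hex bytes, tab, mnemonic followed by operands.
INSN_RE = re.compile(r"^\s*[0-9a-f]+:\t[0-9a-f ]+\t(\S+)\s*(.*)$")


def mov_stats(disassembly):
    """Return (total instructions, MOV count, MOVs touching memory)."""
    total = movs = mem_movs = 0
    for line in disassembly.splitlines():
        m = INSN_RE.match(line)
        if m is None:
            continue  # labels, blank lines, byte-continuation lines
        mnemonic, operands = m.groups()
        total += 1
        if mnemonic.startswith("mov"):
            movs += 1
            # Assumption: a parenthesized operand means a memory reference.
            if "(" in operands:
                mem_movs += 1
    return total, movs, mem_movs


# Hypothetical four-instruction listing, made up for this example.
SAMPLE = (
    "   0:\t8b 45 08             \tmov    0x8(%ebp),%eax\n"
    "   3:\t89 c2                \tmov    %eax,%edx\n"
    "   5:\tc1 c2 05             \trol    $0x5,%edx\n"
    "   8:\t01 d1                \tadd    %edx,%ecx\n"
)

total, movs, mem_movs = mov_stats(SAMPLE)
print(f"{100 * movs / total:.0f}% MOV, "
      f"{100 * mem_movs / movs:.0f}% of those touch memory")
```

Feeding it the disassembly of a compiled SHATransform() would give figures comparable to the ones above, subject to the stated simplifications.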
> Yes. And it also depends on the CPU as well. Testing on a server-class
> x86 CPU (often with bigger L2, and perhaps even L1, cache) will produce
> different result than from popular but less-capable "value" CPUs.

Good point. I benchmarked the 32-bit assembly code on a couple more boxes:

=== AMD Duron, average of 5 trials ===

Test#  Bytes/  Bytes/  Cyc/B  Cyc/B  Change
        block  update    (C)  (asm)
    0      16      16    104     72     31%
    1      64      16     52     36     31%
    2      64      64     45     29     36%
    3     256      16     33     23     30%
    4     256      64     27     17     37%
    5     256     256     24     14     42%
    6    1024      16     29     20     31%
    7    1024     256     20     11     45%
    8    1024    1024     19     11     42%
    9    2048      16     28     20     29%
   10    2048     256     19     11     42%
   11    2048    1024     18     10     44%
   12    2048    2048     18     10     44%
   13    4096      16     28     19     32%
   14    4096     256     18     10     44%
   15    4096    1024     18     10     44%
   16    4096    4096     18     10     44%
   17    8192      16     27     19     30%
   18    8192     256     18     10     44%
   19    8192    1024     18     10     44%
   20    8192    4096     17     10     41%
   21    8192    8192     17     10     41%

=== Classic Pentium, average of 5 trials ===

Test#  Bytes/  Bytes/  Cyc/B  Cyc/B  Change
        block  update    (C)  (asm)
    0      16      16    145    144      1%
    1      64      16     72     61     15%
    2      64      64     65     52     20%
    3     256      16     46     39     15%
    4     256      64     39     32     18%
    5     256     256     36     29     19%
    6    1024      16     40     33     18%
    7    1024     256     30     23     23%
    8    1024    1024     29     23     21%
    9    2048      16     39     32     18%
   10    2048     256     29     22     24%
   11    2048    1024     28     22     21%
   12    2048    2048     28     22     21%
   13    4096      16     38     32     16%
   14    4096     256     28     22     21%
   15    4096    1024     28     21     25%
   16    4096    4096     27     21     22%
   17    8192      16     38     32     16%
   18    8192     256     28     22     21%
   19    8192    1024     28     21     25%
   20    8192    4096     27     21     22%
   21    8192    8192     27     21     22%
The improvement isn't as good, but it's still noticeable.
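For readers checking the tables: the Change column matches the relative reduction in cycles per byte, (C - asm) / C, rounded to a whole percent. The snippet below is a small sketch of that arithmetic; the helper name `percent_change` is illustrative, and since it works from the rounded Cyc/B values shown in the tables, it could in principle differ by a point from figures computed on unrounded measurements.

```python
def percent_change(cycles_c, cycles_asm):
    """Reduction in cycles/byte of the asm version relative to C, in whole percent."""
    return round(100 * (cycles_c - cycles_asm) / cycles_c)


# (Cyc/B for C, Cyc/B for asm) copied from rows of the tables above.
print(percent_change(104, 72))   # AMD Duron, test 0        -> 31
print(percent_change(145, 144))  # Classic Pentium, test 0  -> 1
print(percent_change(28, 21))    # Classic Pentium, test 15 -> 25
```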
--Benjamin Gilbert