public inbox for linux-kernel@vger.kernel.org
From: Andi Kleen <andi@firstfloor.org>
To: Benjamin Gilbert <bgilbert@cs.cmu.edu>
Cc: akpm@linux-foundation.org, herbert@gondor.apana.org.au,
	linux-crypto@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/3] [CRYPTO] Add optimized SHA-1 implementation for x86_64
Date: 11 Jun 2007 14:01:56 +0200
Message-ID: <p73lkeqvgwb.fsf@bingen.suse.de>
In-Reply-To: <20070608214258.23949.67358.stgit@dev>

Benjamin Gilbert <bgilbert@cs.cmu.edu> writes:

> +/* push/pop wrappers that update the DWARF unwind table */
> +#define PUSH(regname)						\
> +	push			%regname;			\
> +	CFI_ADJUST_CFA_OFFSET	8;				\
> +	CFI_REL_OFFSET		regname, 0
> +
> +#define POP(regname)						\
> +	pop			%regname;			\
> +	CFI_ADJUST_CFA_OFFSET	-8;				\
> +	CFI_RESTORE		regname

Please don't do these kinds of wrappers. They just obfuscate the code.

And BTW, plain gas macros (.macro) are much nicer to read than cpp
macros.


> +#define EXPAND(i)						\
> +	movl	OFFSET(i % 16)(DATA), TMP;			\
> +	xorl	OFFSET((i + 2) % 16)(DATA), TMP;		\

Such overlapping memory accesses are somewhat dangerous, as they tend
to stall some CPUs.  It is probably better to do a quad load and then
extract.

If you care about the last cycle, I would suggest you run it at least
once through the pipeline simulator in the Linux version of AMD
CodeAnalyst, or through VTune.

I haven't checked in detail whether it's possible, but it's suspicious
that you never use quad operations for anything. You keep at least
half the CPU's bits idle all the time.
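
To illustrate the quad-load idea in C rather than assembly, here is a
minimal sketch. It assumes the two 32-bit words wanted are adjacent in
memory; load_pair is a hypothetical helper, not something from the
patch. It does not settle whether EXPAND, which reads words i and
i + 2, can actually be restructured this way.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: fetch two adjacent 32-bit words with a single
 * 64-bit load, then split the result in registers.
 */
static inline void load_pair(const uint32_t *p,
                             uint32_t *first, uint32_t *second)
{
    uint64_t q;

    /* memcpy avoids alignment and aliasing problems; compilers
     * typically turn it into one 8-byte load on x86_64. */
    memcpy(&q, p, sizeof q);

#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    *first  = (uint32_t)(q >> 32);
    *second = (uint32_t)q;
#else
    /* Little-endian (x86_64): the lower address is the low half. */
    *first  = (uint32_t)q;
    *second = (uint32_t)(q >> 32);
#endif
}
```

Calling load_pair(w, &a, &b) on an array w of two words leaves w[0] in
a and w[1] in b, using one memory access instead of two.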

> +	EXPAND(75); ROUND(SA, SB, SC, SD, SE, F2, TMP)
> +	EXPAND(76); ROUND(SE, SA, SB, SC, SD, F2, TMP)
> +	EXPAND(77); ROUND(SD, SE, SA, SB, SC, F2, TMP)
> +	EXPAND(78); ROUND(SC, SD, SE, SA, SB, F2, TMP)
> +	EXPAND(79); ROUND(SB, SC, SD, SE, SA, F2, TMP)

My gut feeling is that the unroll factor is far too large.
Have you tried a smaller one? That would save icache, which is very
important in the kernel. Unlike in your microbenchmark, caches are
normally cold when kernel code runs, so smaller code is faster.
And most kernel SHA applications don't process very much data anyway,
so startup costs are important.
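
For comparison, the opposite extreme from full unrolling can be
sketched in C: one loop over all 80 rounds, with the same 16-word
circular schedule that EXPAND indexes with i % 16. This is only an
illustration of the code-size trade-off under the standard SHA-1
definition. The name sha1_block_rolled is hypothetical, and nothing
here says which unroll factor is fastest in practice.

```c
#include <stdint.h>

static inline uint32_t rol32(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32 - n));
}

/* Hypothetical name: process one 64-byte block, updating state[5]. */
static void sha1_block_rolled(uint32_t state[5], const uint8_t block[64])
{
    uint32_t w[16];
    uint32_t a = state[0], b = state[1], c = state[2];
    uint32_t d = state[3], e = state[4];
    int i;

    /* Load the block as 16 big-endian words. */
    for (i = 0; i < 16; i++)
        w[i] = ((uint32_t)block[4 * i] << 24) |
               ((uint32_t)block[4 * i + 1] << 16) |
               ((uint32_t)block[4 * i + 2] << 8) |
               (uint32_t)block[4 * i + 3];

    for (i = 0; i < 80; i++) {
        uint32_t f, k, t;

        /* Message expansion in a 16-word circular buffer:
         * W[i] = rol(W[i-3] ^ W[i-8] ^ W[i-14] ^ W[i-16], 1). */
        if (i >= 16) {
            t = w[(i + 13) % 16] ^ w[(i + 8) % 16] ^
                w[(i + 2) % 16] ^ w[i % 16];
            w[i % 16] = rol32(t, 1);
        }

        /* Round function and constant depend on the round group. */
        if (i < 20) {
            f = (b & c) | (~b & d);
            k = 0x5a827999u;
        } else if (i < 40) {
            f = b ^ c ^ d;
            k = 0x6ed9eba1u;
        } else if (i < 60) {
            f = (b & c) | (b & d) | (c & d);
            k = 0x8f1bbcdcu;
        } else {
            f = b ^ c ^ d;
            k = 0xca62c1d6u;
        }

        t = rol32(a, 5) + f + e + k + w[i % 16];
        e = d;
        d = c;
        c = rol32(b, 30);
        b = a;
        a = t;
    }

    state[0] += a;
    state[1] += b;
    state[2] += c;
    state[3] += d;
    state[4] += e;
}
```

A middle ground, such as unrolling each group of 20 rounds by a factor
of 5 so that the register rotation disappears, sits between this loop
and the fully unrolled 80 rounds.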

> diff --git a/lib/Kconfig b/lib/Kconfig
> index 69fdb64..23a84ed 100644
> --- a/lib/Kconfig
> +++ b/lib/Kconfig
> @@ -132,9 +132,14 @@ config SHA1_X86
>  	depends on (X86 || UML_X86) && !64BIT && X86_BSWAP
>  	default y
>  
> +config SHA1_X86_64
> +	bool
> +	depends on (X86 || UML_X86) && 64BIT
> +	default y
> +
>  config SHA1_GENERIC
>  	bool
> -	depends on !SHA1_X86
> +	depends on !SHA1_X86 && !SHA1_X86_64

Better to define a SHA_ARCH_OPTIMIZED helper symbol; otherwise
this will get messy as more architectures add optimized versions.
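
One way this could look, as a sketch only: each optimized variant
selects the helper, and the generic version depends on its absence.
The "bool" line under SHA1_X86 is an assumption, since the quoted hunk
does not show it; everything else is taken from the hunk above.

```kconfig
# Helper symbol, set by any architecture-optimized SHA-1.
config SHA_ARCH_OPTIMIZED
	bool

config SHA1_X86
	bool	# assumed; not visible in the quoted hunk
	depends on (X86 || UML_X86) && !64BIT && X86_BSWAP
	default y
	select SHA_ARCH_OPTIMIZED

config SHA1_X86_64
	bool
	depends on (X86 || UML_X86) && 64BIT
	default y
	select SHA_ARCH_OPTIMIZED

config SHA1_GENERIC
	bool
	depends on !SHA_ARCH_OPTIMIZED
```

With this layout a new architecture only adds its own symbol with a
select line, and the SHA1_GENERIC dependency never has to change.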

-Andi
