From: Marek Vasut <marex@denx.de>
To: chandramouli narayanan <mouli@linux.intel.com>
Cc: herbert@gondor.apana.org.au, davem@davemloft.net, hpa@zytor.com,
ilya.albrekht@intel.com, maxim.locktyukhin@intel.com,
ronen.zohar@intel.com, wajdi.k.feghali@intel.com,
tim.c.chen@linux.intel.com, linux-crypto@vger.kernel.org
Subject: Re: [PATCH 1/2] SHA1 transform: x86_64 AVX2 optimization - assembly code-v2
Date: Fri, 14 Mar 2014 06:34:39 +0100
Message-ID: <201403140634.40045.marex@denx.de>
In-Reply-To: <1394650063.7495.133.camel@pegasus.jf.intel.com>
On Wednesday, March 12, 2014 at 07:47:43 PM, chandramouli narayanan wrote:
> This git patch adds x86_64 AVX2 optimization of SHA1 transform
> to crypto support. The patch has been tested with 3.14.0-rc1
> kernel.
>
> On a Haswell desktop, with turbo disabled and all cpus running
> at maximum frequency, tcrypt shows AVX2 performance improvement
> from 3% for 256 bytes update to 16% for 1024 bytes update over
> AVX implementation.
>
> Signed-off-by: Chandramouli Narayanan <mouli@linux.intel.com>
>
> diff --git a/arch/x86/crypto/sha1_avx2_x86_64_asm.S b/arch/x86/crypto/sha1_avx2_x86_64_asm.S
> new file mode 100644
> index 0000000..2f71294
> --- /dev/null
> +++ b/arch/x86/crypto/sha1_avx2_x86_64_asm.S
> @@ -0,0 +1,732 @@
> +/*
> + Implement fast SHA-1 with AVX2 instructions. (x86_64)
> +
> + This file is provided under a dual BSD/GPLv2 license. When using or
> + redistributing this file, you may do so under either license.
> +
> + GPL LICENSE SUMMARY
Please see Documentation/CodingStyle chapter 8 for the preferred comment style.
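For illustration, the opening of the quoted header would look roughly like this in the multi-line comment style that chapter describes (the text is taken from the lines quoted above; only the layout changes):

```asm
/*
 * Implement fast SHA-1 with AVX2 instructions. (x86_64)
 *
 * This file is provided under a dual BSD/GPLv2 license. When using or
 * redistributing this file, you may do so under either license.
 */
```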
[...]
> +*/
> +
> +#---------------------
> +#
> +#SHA-1 implementation with Intel(R) AVX2 instruction set extensions.
Ditto here.
> +#This implementation is based on the previous SSSE3 release:
> +#Visit http://software.intel.com/en-us/articles/
> +#and refer to improving-the-performance-of-the-secure-hash-algorithm-1/
> +#
> +#Updates 20-byte SHA-1 record in 'hash' for even number of
> +#'num_blocks' consecutive 64-byte blocks
> +#
> +#extern "C" void sha1_transform_avx2(
> +# int *hash, const char* input, size_t num_blocks );
> +#
> +
> +#ifdef CONFIG_AS_AVX2
I wonder, is this large #ifdef around the entire file needed here? Can you not
just handle not compiling this file in the Makefile?
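A rough sketch of what the Makefile approach could look like, assuming the kbuild as-instr helper is used to probe the assembler. The variable name avx2_supported, the probe instruction, and the existing sha1-ssse3 object list are assumptions made for this example, not taken from the patch:

```make
# Sketch only: ask the assembler whether it accepts an AVX2 instruction.
# "avx2_supported" is an assumed variable name for this example.
avx2_supported := $(call as-instr,vpbroadcastb %xmm0$(comma)%ymm1,yes,no)

# Assumed existing object list for the sha1-ssse3 module.
sha1-ssse3-y := sha1_ssse3_asm.o sha1_ssse3_glue.o

# Only build the AVX2 assembly file when the assembler can handle it.
ifeq ($(avx2_supported),yes)
sha1-ssse3-y += sha1_avx2_x86_64_asm.o
endif
```

With something along these lines the .S file would not need its own file-wide guard, although any C glue code that references sha1_transform_avx2 would presumably still need a guard of its own.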
[...]
> + push %rbx
> + push %rbp
> + push %r12
> + push %r13
> + push %r14
> + push %r15
> + #FIXME: Save rsp
> +
> + RESERVE_STACK = (W_SIZE*4 + 8+24)
> +
> + # Align stack
> + mov %rsp, %rbx
> + and $(0x1000-1), %rbx
> + sub $(8+32), %rbx
> + sub %rbx, %rsp
> + push %rbx
> + sub $RESERVE_STACK, %rsp
> +
> + avx2_zeroupper
> +
> + lea K_XMM_AR(%rip), K_BASE
Can you please use TABs for indent consistently (see the CodingStyle again) ?
[...]
> + .align 32
> + _loop:
> + # code loops through more than one block
> + # we use K_BASE value as a signal of a last block,
> + # it is set below by: cmovae BUFFER_PTR, K_BASE
> + cmp K_BASE, BUFFER_PTR
> + jne _begin
> + .align 32
> + jmp _end
> + .align 32
> + _begin:
> +
> + # Do first block
> + RR 0
> + RR 2
> + RR 4
> + RR 6
> + RR 8
> +
> + jmp _loop0
> +_loop0:
> +
> + RR 10
> + RR 12
> + RR 14
> + RR 16
> + RR 18
> +
> + RR 20
> + RR 22
> + RR 24
> + RR 26
> + RR 28
Can you not generate these repeated sequences with some of the assembler's macro
voodoo? Like .rept or somesuch?
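As a minimal sketch of the idea, the GNU as .irp directive (a sibling of .rept that iterates over a list of values) can emit the same calls. The RR stub below is a hypothetical stand-in so that the fragment assembles on its own; in the patch the real RR macro would be used instead:

```asm
/*
 * Hypothetical stand-in for the patch's RR macro, so that this
 * fragment assembles on its own. It only records the round number.
 */
.macro RR r
	.byte	\r
.endm

	.text

	/* Do first block: rounds 0, 2, 4, 6, 8 */
	.irp	i, 0, 2, 4, 6, 8
	RR	\i
	.endr

	/* Rounds 10 through 28, replacing ten hand-written RR lines */
	.irp	i, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28
	RR	\i
	.endr
```

Each pass through the .irp body substitutes the next value for \i, so the expansion is the same sequence of RR invocations as in the quoted code. A .rept loop with a counter symbol advanced by 2 on each pass would be another way to express it.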
[...]
> +.macro UPDATE_HASH hash, val
> + add \hash, \val
> + mov \val, \hash
> +.endm
This macro is defined below the point where it's used, which is a little
counter-intuitive.
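A small sketch of the ordering meant here, with the helper placed ahead of its first user. UPDATE_HASH is copied from the quoted patch; FOLD_A and the register choice are hypothetical and only there to show a caller:

```asm
/* Helper defined first, so a reader meets it before its first use. */
.macro UPDATE_HASH hash, val
	add	\hash, \val
	mov	\val, \hash
.endm

/*
 * Hypothetical caller: adds the hash word at (%rdi) into %eax and
 * stores the sum back to (%rdi).
 */
.macro FOLD_A
	UPDATE_HASH (%rdi), %eax
.endm

	.text
	FOLD_A
```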
[...]
> +
> +/* AVX2 optimized implementation:
> + * extern "C" void sha1_transform_avx2(
> + * int *hash, const char* input, size_t num_blocks );
What does this comment tell me ?
BTW, you might want to squash 1/2 and 2/2, since they are not two logically
separate changes, I think.