Re: [PATCH 1/2] SHA1 transform: x86_64 AVX2 optimization - assembly code-v2

From: Marek Vasut <marex@denx.de>
To: chandramouli narayanan <mouli@linux.intel.com>
Cc: herbert@gondor.apana.org.au, davem@davemloft.net, hpa@zytor.com,
	ilya.albrekht@intel.com, maxim.locktyukhin@intel.com,
	ronen.zohar@intel.com, wajdi.k.feghali@intel.com,
	tim.c.chen@linux.intel.com, linux-crypto@vger.kernel.org
Subject: Re: [PATCH 1/2] SHA1 transform: x86_64 AVX2 optimization - assembly code-v2
Date: Fri, 14 Mar 2014 06:34:39 +0100
Message-ID: <201403140634.40045.marex@denx.de>
In-Reply-To: <1394650063.7495.133.camel@pegasus.jf.intel.com>

On Wednesday, March 12, 2014 at 07:47:43 PM, chandramouli narayanan wrote:
> This git patch adds x86_64 AVX2 optimization of SHA1 transform
> to crypto support. The patch has been tested with 3.14.0-rc1
> kernel.
> 
> On a Haswell desktop, with turbo disabled and all cpus running
> at maximum frequency, tcrypt shows AVX2 performance improvement
> from 3% for 256 bytes update to 16% for 1024 bytes update over
> AVX implementation.
> 
> Signed-off-by: Chandramouli Narayanan <mouli@linux.intel.com>
> 
> diff --git a/arch/x86/crypto/sha1_avx2_x86_64_asm.S
> b/arch/x86/crypto/sha1_avx2_x86_64_asm.S new file mode 100644
> index 0000000..2f71294
> --- /dev/null
> +++ b/arch/x86/crypto/sha1_avx2_x86_64_asm.S
> @@ -0,0 +1,732 @@
> +/*
> +	Implement fast SHA-1 with AVX2 instructions. (x86_64)
> +
> +  This file is provided under a dual BSD/GPLv2 license.  When using or
> +  redistributing this file, you may do so under either license.
> +
> +  GPL LICENSE SUMMARY

Please see Documentation/CodingStyle chapter 8 for the preferred comment style.
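
For reference, the multi-line layout that chapter describes looks roughly like the sketch below. The text inside is just the first lines of your header reflowed, not a proposal for the final wording.

```asm
/*
 * Implement fast SHA-1 with AVX2 instructions. (x86_64)
 *
 * This file is provided under a dual BSD/GPLv2 license. When using or
 * redistributing this file, you may do so under either license.
 */
```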

[...]

> +*/
> +
> +#---------------------
> +#
> +#SHA-1 implementation with Intel(R) AVX2 instruction set extensions.

Ditto here.

> +#This implementation is based on the previous SSSE3 release:
> +#Visit http://software.intel.com/en-us/articles/
> +#and refer to improving-the-performance-of-the-secure-hash-algorithm-1/
> +#
> +#Updates 20-byte SHA-1 record in 'hash' for even number of
> +#'num_blocks' consecutive 64-byte blocks
> +#
> +#extern "C" void sha1_transform_avx2(
> +#	int *hash, const char* input, size_t num_blocks );
> +#
> +
> +#ifdef CONFIG_AS_AVX2

I wonder, is this large #ifdef around the entire file really needed? Can you
not handle this in the Makefile instead, by simply not compiling the file?

[...]

> +        push %rbx
> +        push %rbp
> +        push %r12
> +        push %r13
> +        push %r14
> +        push %r15
> +	#FIXME: Save rsp
> +
> +        RESERVE_STACK  = (W_SIZE*4 + 8+24)
> +
> +        # Align stack
> +        mov     %rsp, %rbx
> +        and     $(0x1000-1), %rbx
> +        sub     $(8+32), %rbx
> +        sub     %rbx, %rsp
> +        push    %rbx
> +        sub     $RESERVE_STACK, %rsp
> +
> +        avx2_zeroupper
> +
> +	lea	K_XMM_AR(%rip), K_BASE

Can you please use tabs for indentation consistently (see CodingStyle again)?

[...]

> +    .align 32
> +    _loop:
> +	# code loops through more than one block
> +	# we use K_BASE value as a signal of a last block,
> +	# it is set below by: cmovae BUFFER_PTR, K_BASE
> +        cmp K_BASE, BUFFER_PTR
> +        jne _begin
> +    .align 32
> +        jmp _end
> +    .align 32
> +    _begin:
> +
> +        # Do first block
> +        RR 0
> +        RR 2
> +        RR 4
> +        RR 6
> +        RR 8
> +
> +        jmp _loop0
> +_loop0:
> +
> +        RR 10
> +        RR 12
> +        RR 14
> +        RR 16
> +        RR 18
> +
> +        RR 20
> +        RR 22
> +        RR 24
> +        RR 26
> +        RR 28

Can you not generate these repeated sequences with some of the assembler's
macro voodoo, like .rept or somesuch?
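
As an untested sketch of what I mean: plain .rept only repeats its body and has no counter of its own, so .irp, which iterates over a list of values, may be the simpler fit here. The RR stub and the loop variable name "round" below are placeholders of mine so that the fragment assembles on its own; the real RR is the macro from the patch.

```asm
/* Stand-in for the patch's RR macro, only here so the fragment assembles. */
.macro RR j
	.long	\j
.endm

	/* Expands to: RR 10, RR 12, RR 14, RR 16, RR 18 */
	.irp	round, 10, 12, 14, 16, 18
	RR	\round
	.endr
```

The same pattern would cover the 20..28 group and the later ones, each with its own value list.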

[...]

> +.macro UPDATE_HASH  hash, val
> +	add	\hash, \val
> +	mov	\val, \hash
> +.endm

This macro is defined below the point where it is used, which is a little
counter-intuitive.

[...]

> +
> +/* AVX2 optimized implementation:
> + *   extern "C" void sha1_transform_avx2(
> + *	int *hash, const char* input, size_t num_blocks );

What does this comment tell me? It repeats the prototype already given in the
comment at the top of the file.

By the way, you might want to squash 1/2 and 2/2, since I do not think they
are two logically separate changes.
