All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jussi Kivilinna <jussi.kivilinna@iki.fi>
To: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: "linux-crypto@vger.kernel.org" <linux-crypto@vger.kernel.org>,
	Russell King <linux@arm.linux.org.uk>,
	Herbert Xu <herbert@gondor.apana.org.au>,
	"linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>,
	"David S. Miller" <davem@davemloft.net>
Subject: Re: [PATCH 2/2] crypto: sha1: add ARM NEON implementation
Date: Sun, 29 Jun 2014 14:25:20 +0300	[thread overview]
Message-ID: <53AFF7A0.1000306@iki.fi> (raw)
In-Reply-To: <CAKv+Gu9uWr6E2DT7Nz35k8YVG_zypktMkkdPKUKncPJ=xo2Lsw@mail.gmail.com>

On 28.06.2014 23:07, Ard Biesheuvel wrote:> Hi Jussi,
> 
> On 28 June 2014 12:40, Jussi Kivilinna <jussi.kivilinna@iki.fi> wrote:
>> This patch adds ARM NEON assembly implementation of SHA-1 algorithm.
>>
>> tcrypt benchmark results on Cortex-A8, sha1-arm-asm vs sha1-neon-asm:
>>
>> block-size      bytes/update    old-vs-new
>> 16              16              1.06x
>> 64              16              1.05x
>> 64              64              1.09x
>> 256             16              1.04x
>> 256             64              1.11x
>> 256             256             1.28x
>> 1024            16              1.04x
>> 1024            256             1.34x
>> 1024            1024            1.42x
>> 2048            16              1.04x
>> 2048            256             1.35x
>> 2048            1024            1.44x
>> 2048            2048            1.46x
>> 4096            16              1.04x
>> 4096            256             1.36x
>> 4096            1024            1.45x
>> 4096            4096            1.48x
>> 8192            16              1.04x
>> 8192            256             1.36x
>> 8192            1024            1.46x
>> 8192            4096            1.49x
>> 8192            8192            1.49x
>>
> 
> This is a nice result: about the same speedup as OpenSSL when
> comparing the ALU asm implementation with the NEON.
> 
>> Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
>> ---
>>  arch/arm/crypto/Makefile           |    2
>>  arch/arm/crypto/sha1-armv7-neon.S  |  635 ++++++++++++++++++++++++++++++++++++
>>  arch/arm/crypto/sha1_glue.c        |    8
>>  arch/arm/crypto/sha1_neon_glue.c   |  197 +++++++++++
>>  arch/arm/include/asm/crypto/sha1.h |   10 +
>>  crypto/Kconfig                     |   11 +
>>  6 files changed, 860 insertions(+), 3 deletions(-)
>>  create mode 100644 arch/arm/crypto/sha1-armv7-neon.S
>>  create mode 100644 arch/arm/crypto/sha1_neon_glue.c
>>  create mode 100644 arch/arm/include/asm/crypto/sha1.h
>>
>> diff --git a/arch/arm/crypto/Makefile b/arch/arm/crypto/Makefile
>> index 81cda39..374956d 100644
>> --- a/arch/arm/crypto/Makefile
>> +++ b/arch/arm/crypto/Makefile
>> @@ -5,10 +5,12 @@
>>  obj-$(CONFIG_CRYPTO_AES_ARM) += aes-arm.o
>>  obj-$(CONFIG_CRYPTO_AES_ARM_BS) += aes-arm-bs.o
>>  obj-$(CONFIG_CRYPTO_SHA1_ARM) += sha1-arm.o
>> +obj-$(CONFIG_CRYPTO_SHA1_ARM_NEON) += sha1-arm-neon.o
>>
>>  aes-arm-y      := aes-armv4.o aes_glue.o
>>  aes-arm-bs-y   := aesbs-core.o aesbs-glue.o
>>  sha1-arm-y     := sha1-armv4-large.o sha1_glue.o
>> +sha1-arm-neon-y        := sha1-armv7-neon.o sha1_neon_glue.o
>>
>>  quiet_cmd_perl = PERL    $@
>>        cmd_perl = $(PERL) $(<) > $(@)
>> diff --git a/arch/arm/crypto/sha1-armv7-neon.S b/arch/arm/crypto/sha1-armv7-neon.S
>> new file mode 100644
>> index 0000000..beb1ed1
>> --- /dev/null
>> +++ b/arch/arm/crypto/sha1-armv7-neon.S
>> @@ -0,0 +1,635 @@
>> +/* sha1-armv7-neon.S - ARM/NEON accelerated SHA-1 transform function
>> + *
>> + * Copyright © 2013-2014 Jussi Kivilinna <jussi.kivilinna@iki.fi>
>> + *
>> + * This program is free software; you can redistribute it and/or modify it
>> + * under the terms of the GNU General Public License as published by the Free
>> + * Software Foundation; either version 2 of the License, or (at your option)
>> + * any later version.
>> + */
>> +
>> +.syntax unified
>> +#ifdef __thumb2__
>> +.thumb
>> +#else
>> +.code   32
>> +#endif
> 
> This is all NEON code, which has no size benefit from being assembled
> as Thumb-2. (NEON instructions are 4 bytes in either case)
> If we drop the Thumb-2 versions, there's one less version to test.
> 

Ok, I'll drop the .thumb part for both SHA1 and SHA512.

>> +.fpu neon
>> +
>> +.data
>> +
>> +#define GET_DATA_POINTER(reg, name, rtmp) ldr reg, =name
>> +
> [...]
>> +.align 4
>> +.LK_VEC:
>> +.LK1:  .long K1, K1, K1, K1
>> +.LK2:  .long K2, K2, K2, K2
>> +.LK3:  .long K3, K3, K3, K3
>> +.LK4:  .long K4, K4, K4, K4
> 
> If you are going to put these constants in a different section, they
> belong in .rodata not .data.
> But why not just keep them in .text? In that case, you can replace the
> above 'ldr reg, =name' with 'adr reg ,name' (or adrl if required) and
> get rid of the .ltorg and the literal pool.
> 

Ok, I'll move these to .text.

Actually I realized that these values can be loaded to still free NEON
registers for additional speed up.

>> +/*
>> + * Transform nblks*64 bytes (nblks*16 32-bit words) at DATA.
>> + *
>> + * unsigned int
>> + * sha1_transform_neon (void *ctx, const unsigned char *data,
>> + *                      unsigned int nblks)
>> + */
>> +.align 3
>> +.globl sha1_transform_neon
>> +.type  sha1_transform_neon,%function;
>> +
>> +sha1_transform_neon:
> 
> ENTRY(sha1_transform_neon) [and matching ENDPROC() below]

Sure.

-Jussi

WARNING: multiple messages have this Message-ID (diff)
From: jussi.kivilinna@iki.fi (Jussi Kivilinna)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH 2/2] crypto: sha1: add ARM NEON implementation
Date: Sun, 29 Jun 2014 14:25:20 +0300	[thread overview]
Message-ID: <53AFF7A0.1000306@iki.fi> (raw)
In-Reply-To: <CAKv+Gu9uWr6E2DT7Nz35k8YVG_zypktMkkdPKUKncPJ=xo2Lsw@mail.gmail.com>

On 28.06.2014 23:07, Ard Biesheuvel wrote:> Hi Jussi,
> 
> On 28 June 2014 12:40, Jussi Kivilinna <jussi.kivilinna@iki.fi> wrote:
>> This patch adds ARM NEON assembly implementation of SHA-1 algorithm.
>>
>> tcrypt benchmark results on Cortex-A8, sha1-arm-asm vs sha1-neon-asm:
>>
>> block-size      bytes/update    old-vs-new
>> 16              16              1.06x
>> 64              16              1.05x
>> 64              64              1.09x
>> 256             16              1.04x
>> 256             64              1.11x
>> 256             256             1.28x
>> 1024            16              1.04x
>> 1024            256             1.34x
>> 1024            1024            1.42x
>> 2048            16              1.04x
>> 2048            256             1.35x
>> 2048            1024            1.44x
>> 2048            2048            1.46x
>> 4096            16              1.04x
>> 4096            256             1.36x
>> 4096            1024            1.45x
>> 4096            4096            1.48x
>> 8192            16              1.04x
>> 8192            256             1.36x
>> 8192            1024            1.46x
>> 8192            4096            1.49x
>> 8192            8192            1.49x
>>
> 
> This is a nice result: about the same speedup as OpenSSL when
> comparing the ALU asm implementation with the NEON.
> 
>> Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
>> ---
>>  arch/arm/crypto/Makefile           |    2
>>  arch/arm/crypto/sha1-armv7-neon.S  |  635 ++++++++++++++++++++++++++++++++++++
>>  arch/arm/crypto/sha1_glue.c        |    8
>>  arch/arm/crypto/sha1_neon_glue.c   |  197 +++++++++++
>>  arch/arm/include/asm/crypto/sha1.h |   10 +
>>  crypto/Kconfig                     |   11 +
>>  6 files changed, 860 insertions(+), 3 deletions(-)
>>  create mode 100644 arch/arm/crypto/sha1-armv7-neon.S
>>  create mode 100644 arch/arm/crypto/sha1_neon_glue.c
>>  create mode 100644 arch/arm/include/asm/crypto/sha1.h
>>
>> diff --git a/arch/arm/crypto/Makefile b/arch/arm/crypto/Makefile
>> index 81cda39..374956d 100644
>> --- a/arch/arm/crypto/Makefile
>> +++ b/arch/arm/crypto/Makefile
>> @@ -5,10 +5,12 @@
>>  obj-$(CONFIG_CRYPTO_AES_ARM) += aes-arm.o
>>  obj-$(CONFIG_CRYPTO_AES_ARM_BS) += aes-arm-bs.o
>>  obj-$(CONFIG_CRYPTO_SHA1_ARM) += sha1-arm.o
>> +obj-$(CONFIG_CRYPTO_SHA1_ARM_NEON) += sha1-arm-neon.o
>>
>>  aes-arm-y      := aes-armv4.o aes_glue.o
>>  aes-arm-bs-y   := aesbs-core.o aesbs-glue.o
>>  sha1-arm-y     := sha1-armv4-large.o sha1_glue.o
>> +sha1-arm-neon-y        := sha1-armv7-neon.o sha1_neon_glue.o
>>
>>  quiet_cmd_perl = PERL    $@
>>        cmd_perl = $(PERL) $(<) > $(@)
>> diff --git a/arch/arm/crypto/sha1-armv7-neon.S b/arch/arm/crypto/sha1-armv7-neon.S
>> new file mode 100644
>> index 0000000..beb1ed1
>> --- /dev/null
>> +++ b/arch/arm/crypto/sha1-armv7-neon.S
>> @@ -0,0 +1,635 @@
>> +/* sha1-armv7-neon.S - ARM/NEON accelerated SHA-1 transform function
>> + *
>> + * Copyright ? 2013-2014 Jussi Kivilinna <jussi.kivilinna@iki.fi>
>> + *
>> + * This program is free software; you can redistribute it and/or modify it
>> + * under the terms of the GNU General Public License as published by the Free
>> + * Software Foundation; either version 2 of the License, or (at your option)
>> + * any later version.
>> + */
>> +
>> +.syntax unified
>> +#ifdef __thumb2__
>> +.thumb
>> +#else
>> +.code   32
>> +#endif
> 
> This is all NEON code, which has no size benefit from being assembled
> as Thumb-2. (NEON instructions are 4 bytes in either case)
> If we drop the Thumb-2 versions, there's one less version to test.
> 

Ok, I'll drop the .thumb part for both SHA1 and SHA512.

>> +.fpu neon
>> +
>> +.data
>> +
>> +#define GET_DATA_POINTER(reg, name, rtmp) ldr reg, =name
>> +
> [...]
>> +.align 4
>> +.LK_VEC:
>> +.LK1:  .long K1, K1, K1, K1
>> +.LK2:  .long K2, K2, K2, K2
>> +.LK3:  .long K3, K3, K3, K3
>> +.LK4:  .long K4, K4, K4, K4
> 
> If you are going to put these constants in a different section, they
> belong in .rodata not .data.
> But why not just keep them in .text? In that case, you can replace the
> above 'ldr reg, =name' with 'adr reg ,name' (or adrl if required) and
> get rid of the .ltorg and the literal pool.
> 

Ok, I'll move these to .text.

Actually I realized that these values can be loaded to still free NEON
registers for additional speed up.

>> +/*
>> + * Transform nblks*64 bytes (nblks*16 32-bit words) at DATA.
>> + *
>> + * unsigned int
>> + * sha1_transform_neon (void *ctx, const unsigned char *data,
>> + *                      unsigned int nblks)
>> + */
>> +.align 3
>> +.globl sha1_transform_neon
>> +.type  sha1_transform_neon,%function;
>> +
>> +sha1_transform_neon:
> 
> ENTRY(sha1_transform_neon) [and matching ENDPROC() below]

Sure.

-Jussi

  reply	other threads:[~2014-06-29 11:25 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-06-28 10:39 [PATCH 1/2] crypto: sha1/ARM: make use of common SHA-1 structures Jussi Kivilinna
2014-06-28 10:39 ` Jussi Kivilinna
2014-06-28 10:40 ` [PATCH 2/2] crypto: sha1: add ARM NEON implementation Jussi Kivilinna
2014-06-28 10:40   ` Jussi Kivilinna
2014-06-28 20:07   ` Ard Biesheuvel
2014-06-28 20:07     ` Ard Biesheuvel
2014-06-29 11:25     ` Jussi Kivilinna [this message]
2014-06-29 11:25       ` Jussi Kivilinna
2014-06-28 19:47 ` [PATCH 1/2] crypto: sha1/ARM: make use of common SHA-1 structures Ard Biesheuvel
2014-06-28 19:47   ` Ard Biesheuvel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=53AFF7A0.1000306@iki.fi \
    --to=jussi.kivilinna@iki.fi \
    --cc=ard.biesheuvel@linaro.org \
    --cc=davem@davemloft.net \
    --cc=herbert@gondor.apana.org.au \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-crypto@vger.kernel.org \
    --cc=linux@arm.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.