Re: [PATCH 6/6] crypto: arm/crct10dif - Implement plain NEON variant


From: Eric Biggers <ebiggers@kernel.org>
To: Ard Biesheuvel <ardb+git@google.com>
Cc: linux-crypto@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	herbert@gondor.apana.org.au, keescook@chromium.org,
	Ard Biesheuvel <ardb@kernel.org>
Subject: Re: [PATCH 6/6] crypto: arm/crct10dif - Implement plain NEON variant
Date: Tue, 29 Oct 2024 21:33:16 -0700
Message-ID: <20241030043316.GF1489@sol.localdomain>
In-Reply-To: <20241028190207.1394367-14-ardb+git@google.com>

On Mon, Oct 28, 2024 at 08:02:14PM +0100, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <ardb@kernel.org>
> 
> The CRC-T10DIF algorithm produces a 16-bit CRC, and this is reflected in
> the folding coefficients, which are also only 16 bits wide.
> 
> This means that the polynomial multiplications involving these
> coefficients can be performed using 8-bit long polynomial multiplication
> (8x8 -> 16) in only a few steps, and this is an instruction that is part
> of the base NEON ISA, which is all most real ARMv7 cores implement. (The
> 64-bit PMULL instruction is part of the crypto extensions, which are
> only implemented by 64-bit cores.)
> 
> The final reduction is a bit more involved, but we can delegate that to
> the generic CRC-T10DIF implementation after folding the entire input
> into a 16-byte vector.
> 
> This results in a speedup of around 6.6x on Cortex-A72 running in 32-bit
> mode.
> 
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
>  arch/arm/crypto/crct10dif-ce-core.S | 50 ++++++++++++++++++--
>  arch/arm/crypto/crct10dif-ce-glue.c | 44 +++++++++++++++--
>  2 files changed, 85 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/arm/crypto/crct10dif-ce-core.S b/arch/arm/crypto/crct10dif-ce-core.S
> index 6b72167574b2..5e103a9a42dd 100644
> --- a/arch/arm/crypto/crct10dif-ce-core.S
> +++ b/arch/arm/crypto/crct10dif-ce-core.S
> @@ -112,6 +112,34 @@
>  	FOLD_CONST_L	.req	q10l
>  	FOLD_CONST_H	.req	q10h
>  
> +__pmull16x64_p8:
> +	vmull.p8	q13, d23, d24
> +	vmull.p8	q14, d23, d25
> +	vmull.p8	q15, d22, d24
> +	vmull.p8	q12, d22, d25
> +
> +	veor		q14, q14, q15
> +	veor		d24, d24, d25
> +	veor		d26, d26, d27
> +	veor		d28, d28, d29
> +	vmov.i32	d25, #0
> +	vmov.i32	d29, #0
> +	vext.8		q12, q12, q12, #14
> +	vext.8		q14, q14, q14, #15
> +	veor		d24, d24, d26
> +	bx		lr
> +ENDPROC(__pmull16x64_p8)

As in the arm64 version, a few comments here would help.
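As a possible starting point for such comments, here is a portable C model of the arithmetic the commit message describes: a 16x64-bit carryless multiply assembled only from 8x8 -> 16 polynomial products, which is what vmull.p8 computes in each byte lane. This is a sketch of the math, not of the asm: the names (clmul8, clmul16x64_p8, struct u128) are hypothetical, and it does not reproduce the register allocation or lane arrangement of __pmull16x64_p8.

```c
#include <assert.h>
#include <stdint.h>

/* 128-bit carryless product held as two 64-bit halves. */
struct u128 {
	uint64_t lo, hi;
};

/* One 8x8 -> 16 carryless multiply: a scalar model of a single vmull.p8 lane. */
static uint16_t clmul8(uint8_t a, uint8_t b)
{
	uint16_t r = 0;

	for (int i = 0; i < 8; i++)
		if (b & (1u << i))
			r ^= (uint16_t)(a << i);
	return r;
}

/* XOR a 16-bit partial product into r, starting at bit 'shift'. */
static void xor_at(struct u128 *r, uint16_t v, unsigned int shift)
{
	assert(shift % 8 == 0 && shift <= 64);
	if (shift < 64) {
		r->lo ^= (uint64_t)v << shift;
		if (shift > 48)		/* upper bits spill into the high half */
			r->hi ^= (uint64_t)v >> (64 - shift);
	} else {
		r->hi ^= v;
	}
}

/*
 * 16x64 -> 79-bit carryless multiply using only 8x8 -> 16 products:
 * a = a1*x^8 + a0 and b = sum(b_j * x^(8j)), so
 * a*b = sum(a0*b_j * x^(8j)) + sum(a1*b_j * x^(8j+8)).
 */
static struct u128 clmul16x64_p8(uint16_t a, uint64_t b)
{
	struct u128 r = { 0, 0 };
	uint8_t a0 = (uint8_t)a, a1 = (uint8_t)(a >> 8);

	for (unsigned int j = 0; j < 8; j++) {
		uint8_t bj = (uint8_t)(b >> (8 * j));

		xor_at(&r, clmul8(a0, bj), 8 * j);
		xor_at(&r, clmul8(a1, bj), 8 * j + 8);
	}
	return r;
}

/* Bit-at-a-time reference, for cross-checking the byte-wise version. */
static struct u128 clmul16x64_ref(uint16_t a, uint64_t b)
{
	struct u128 r = { 0, 0 };

	for (unsigned int i = 0; i < 16; i++) {
		if (!(a & (1u << i)))
			continue;
		r.lo ^= b << i;
		if (i)
			r.hi ^= b >> (64 - i);
	}
	return r;
}
```

Each clmul8() call corresponds to one byte lane of a vmull.p8; the NEON code computes eight lanes per instruction and appears to recombine the partial products with veor/vext instead of per-lane shifts, so the instruction-level schedule will look different from this loop.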

> diff --git a/arch/arm/crypto/crct10dif-ce-glue.c b/arch/arm/crypto/crct10dif-ce-glue.c
> index 60aa79c2fcdb..4431e4ce2dbe 100644
> --- a/arch/arm/crypto/crct10dif-ce-glue.c
> +++ b/arch/arm/crypto/crct10dif-ce-glue.c
> @@ -20,6 +20,7 @@
>  #define CRC_T10DIF_PMULL_CHUNK_SIZE	16U
>  
>  asmlinkage u16 crc_t10dif_pmull64(u16 init_crc, const u8 *buf, size_t len);
> +asmlinkage void crc_t10dif_pmull8(u16 init_crc, const u8 *buf, size_t len, u8 *out);

Maybe explicitly type 'out' as 'u8 out[16]'?
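For illustration, a sketch of the contract such a 16-byte 'out' buffer would carry: the input is folded into one 128-bit residue that is congruent to the message modulo the CRC polynomial, and a plain CRC over those 16 bytes with a zero initial value yields the final CRC-T10DIF, which is the hand-off to the generic code described in the commit message. Everything here is hypothetical: crc_t10dif_fold_model() stands in for crc_t10dif_pmull8(), the fold constants are computed at run time as x^192 mod G and x^128 mod G rather than taken from the asm, len is assumed to be a nonzero multiple of 16, and crc_t10dif_bitwise() is a bit-at-a-time stand-in for the generic implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* G(x) = x^16 + x^15 + x^11 + x^9 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1 */
#define T10DIF_POLY 0x8BB7

struct u128 {
	uint64_t lo, hi;
};

/* Multiply a 16-bit polynomial by x, reducing modulo G(x). */
static uint16_t mulx_mod(uint16_t r)
{
	return (r & 0x8000) ? (uint16_t)((r << 1) ^ T10DIF_POLY) : (uint16_t)(r << 1);
}

/* Bit-at-a-time CRC-T10DIF (MSB first, no reflection): stand-in for the generic code. */
static uint16_t crc_t10dif_bitwise(uint16_t crc, const uint8_t *p, size_t len)
{
	while (len--) {
		crc ^= (uint16_t)(*p++ << 8);
		for (int i = 0; i < 8; i++)
			crc = mulx_mod(crc);
	}
	return crc;
}

/* x^n mod G(x): always fits in 16 bits, hence the 16-bit fold constants. */
static uint16_t xpow_mod(unsigned int n)
{
	uint16_t r = 1;

	while (n--)
		r = mulx_mod(r);
	return r;
}

/* Carryless 16x64 multiply, bit at a time; the result is at most 79 bits. */
static struct u128 clmul16x64(uint16_t a, uint64_t b)
{
	struct u128 r = { 0, 0 };

	for (unsigned int i = 0; i < 16; i++) {
		if (!(a & (1u << i)))
			continue;
		r.lo ^= b << i;
		if (i)
			r.hi ^= b >> (64 - i);
	}
	return r;
}

/* Big-endian byte order: the first message byte holds the highest-degree bits. */
static struct u128 load_be128(const uint8_t *p)
{
	struct u128 v = { 0, 0 };

	for (int i = 0; i < 8; i++) {
		v.hi = (v.hi << 8) | p[i];
		v.lo = (v.lo << 8) | p[i + 8];
	}
	return v;
}

static void store_be128(uint8_t *p, struct u128 v)
{
	for (int i = 7; i >= 0; i--) {
		p[i] = (uint8_t)v.hi;
		p[i + 8] = (uint8_t)v.lo;
		v.hi >>= 8;
		v.lo >>= 8;
	}
}

/*
 * Hypothetical model of crc_t10dif_pmull8(): fold buf into a 128-bit value
 * congruent to (init_crc * x^(8*len - 16) + M(x)) modulo G(x).
 */
static void crc_t10dif_fold_model(uint16_t init_crc, const uint8_t *buf,
				  size_t len, uint8_t out[16])
{
	const uint16_t k_hi = xpow_mod(192), k_lo = xpow_mod(128);
	struct u128 acc;

	assert(len >= 16 && len % 16 == 0);
	acc = load_be128(buf);
	acc.hi ^= (uint64_t)init_crc << 48;	/* XOR the initial CRC into the first 16 bits */
	for (size_t off = 16; off < len; off += 16) {
		/* acc * x^128 = hi * x^192 + lo * x^128 == hi * k_hi + lo * k_lo (mod G) */
		struct u128 p = clmul16x64(k_hi, acc.hi);
		struct u128 q = clmul16x64(k_lo, acc.lo);
		struct u128 b = load_be128(buf + off);

		acc.hi = p.hi ^ q.hi ^ b.hi;
		acc.lo = p.lo ^ q.lo ^ b.lo;
	}
	store_be128(out, acc);
}
```

With this model, crc_t10dif_fold_model(crc, buf, len, out) followed by crc_t10dif_bitwise(0, out, 16) returns the same value as crc_t10dif_bitwise(crc, buf, len). Declaring the parameter as u8 out[16] does not change the ABI (it still decays to a pointer), but it documents the required buffer size at the prototype.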

- Eric

Thread overview: 13+ messages
2024-10-28 19:02 [PATCH 0/6] Clean up and improve ARM/arm64 CRC-T10DIF code Ard Biesheuvel
2024-10-28 19:02 ` [PATCH 1/6] crypto: arm64/crct10dif - Remove obsolete chunking logic Ard Biesheuvel
2024-10-30  3:54   ` Eric Biggers
2024-10-28 19:02 ` [PATCH 2/6] crypto: arm64/crct10dif - Use faster 16x64 bit polynomial multiply Ard Biesheuvel
2024-10-30  4:01   ` Eric Biggers
2024-10-28 19:02 ` [PATCH 3/6] crypto: arm64/crct10dif - Remove remaining 64x64 PMULL fallback code Ard Biesheuvel
2024-10-30  4:15   ` Eric Biggers
2024-10-28 19:02 ` [PATCH 4/6] crypto: arm/crct10dif - Use existing mov_l macro instead of __adrl Ard Biesheuvel
2024-10-30  4:29   ` Eric Biggers
2024-10-28 19:02 ` [PATCH 5/6] crypto: arm/crct10dif - Macroify PMULL asm code Ard Biesheuvel
2024-10-30  4:31   ` Eric Biggers
2024-10-28 19:02 ` [PATCH 6/6] crypto: arm/crct10dif - Implement plain NEON variant Ard Biesheuvel
2024-10-30  4:33   ` Eric Biggers [this message]
