From: Eric Biggers <ebiggers@kernel.org>
To: Ard Biesheuvel <ardb+git@google.com>
Cc: linux-crypto@vger.kernel.org,
linux-arm-kernel@lists.infradead.org,
herbert@gondor.apana.org.au, keescook@chromium.org,
Ard Biesheuvel <ardb@kernel.org>
Subject: Re: [PATCH 6/6] crypto: arm/crct10dif - Implement plain NEON variant
Date: Tue, 29 Oct 2024 21:33:16 -0700 [thread overview]
Message-ID: <20241030043316.GF1489@sol.localdomain> (raw)
In-Reply-To: <20241028190207.1394367-14-ardb+git@google.com>
On Mon, Oct 28, 2024 at 08:02:14PM +0100, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <ardb@kernel.org>
>
> The CRC-T10DIF algorithm produces a 16-bit CRC, and this is reflected in
> the folding coefficients, which are also only 16 bits wide.
>
> This means that the polynomial multiplications involving these
> coefficients can be performed using 8-bit long polynomial multiplication
> (8x8 -> 16) in only a few steps, and this is an instruction that is part
> of the base NEON ISA, which is all most real ARMv7 cores implement. (The
> 64-bit PMULL instruction is part of the crypto extensions, which are
> only implemented by 64-bit cores)
>
> The final reduction is a bit more involved, but we can delegate that to
> the generic CRC-T10DIF implementation after folding the entire input
> into a 16 byte vector.
>
> This results in a speedup of around 6.6x on Cortex-A72 running in 32-bit
> mode.
>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
> arch/arm/crypto/crct10dif-ce-core.S | 50 ++++++++++++++++++--
> arch/arm/crypto/crct10dif-ce-glue.c | 44 +++++++++++++++--
> 2 files changed, 85 insertions(+), 9 deletions(-)
>
> diff --git a/arch/arm/crypto/crct10dif-ce-core.S b/arch/arm/crypto/crct10dif-ce-core.S
> index 6b72167574b2..5e103a9a42dd 100644
> --- a/arch/arm/crypto/crct10dif-ce-core.S
> +++ b/arch/arm/crypto/crct10dif-ce-core.S
> @@ -112,6 +112,34 @@
> FOLD_CONST_L .req q10l
> FOLD_CONST_H .req q10h
>
> +__pmull16x64_p8:
> + vmull.p8 q13, d23, d24
> + vmull.p8 q14, d23, d25
> + vmull.p8 q15, d22, d24
> + vmull.p8 q12, d22, d25
> +
> + veor q14, q14, q15
> + veor d24, d24, d25
> + veor d26, d26, d27
> + veor d28, d28, d29
> + vmov.i32 d25, #0
> + vmov.i32 d29, #0
> + vext.8 q12, q12, q12, #14
> + vext.8 q14, q14, q14, #15
> + veor d24, d24, d26
> + bx lr
> +ENDPROC(__pmull16x64_p8)
As in the arm64 version, a few comments here would help.
> diff --git a/arch/arm/crypto/crct10dif-ce-glue.c b/arch/arm/crypto/crct10dif-ce-glue.c
> index 60aa79c2fcdb..4431e4ce2dbe 100644
> --- a/arch/arm/crypto/crct10dif-ce-glue.c
> +++ b/arch/arm/crypto/crct10dif-ce-glue.c
> @@ -20,6 +20,7 @@
> #define CRC_T10DIF_PMULL_CHUNK_SIZE 16U
>
> asmlinkage u16 crc_t10dif_pmull64(u16 init_crc, const u8 *buf, size_t len);
> +asmlinkage void crc_t10dif_pmull8(u16 init_crc, const u8 *buf, size_t len, u8 *out);
Maybe explicitly type 'out' to 'u8 out[16]'?
- Eric
prev parent reply other threads:[~2024-10-30 4:35 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-10-28 19:02 [PATCH 0/6] Clean up and improve ARM/arm64 CRC-T10DIF code Ard Biesheuvel
2024-10-28 19:02 ` [PATCH 1/6] crypto: arm64/crct10dif - Remove obsolete chunking logic Ard Biesheuvel
2024-10-30 3:54 ` Eric Biggers
2024-10-28 19:02 ` [PATCH 2/6] crypto: arm64/crct10dif - Use faster 16x64 bit polynomial multiply Ard Biesheuvel
2024-10-30 4:01 ` Eric Biggers
2024-10-28 19:02 ` [PATCH 3/6] crypto: arm64/crct10dif - Remove remaining 64x64 PMULL fallback code Ard Biesheuvel
2024-10-30 4:15 ` Eric Biggers
2024-10-28 19:02 ` [PATCH 4/6] crypto: arm/crct10dif - Use existing mov_l macro instead of __adrl Ard Biesheuvel
2024-10-30 4:29 ` Eric Biggers
2024-10-28 19:02 ` [PATCH 5/6] crypto: arm/crct10dif - Macroify PMULL asm code Ard Biesheuvel
2024-10-30 4:31 ` Eric Biggers
2024-10-28 19:02 ` [PATCH 6/6] crypto: arm/crct10dif - Implement plain NEON variant Ard Biesheuvel
2024-10-30 4:33 ` Eric Biggers [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20241030043316.GF1489@sol.localdomain \
--to=ebiggers@kernel.org \
--cc=ardb+git@google.com \
--cc=ardb@kernel.org \
--cc=herbert@gondor.apana.org.au \
--cc=keescook@chromium.org \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-crypto@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.