From: Eric Biggers <ebiggers@kernel.org>
To: Ard Biesheuvel <ardb+git@google.com>
Cc: linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-crypto@vger.kernel.org,
herbert@gondor.apana.org.au, will@kernel.org,
catalin.marinas@arm.com, Ard Biesheuvel <ardb@kernel.org>,
Kees Cook <kees@kernel.org>
Subject: Re: [PATCH v2 2/2] arm64/crc32: Implement 4-way interleave using PMULL
Date: Wed, 16 Oct 2024 14:54:01 -0700
Message-ID: <20241016215401.GC1742@sol.localdomain>
In-Reply-To: <20241016192640.406255-6-ardb+git@google.com>

On Wed, Oct 16, 2024 at 09:26:43PM +0200, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <ardb@kernel.org>
>
> Now that kernel mode NEON no longer disables preemption, using FP/SIMD
> in library code which is not obviously part of the crypto subsystem is
> no longer problematic, as it will no longer incur unexpected latencies.
>
> So accelerate the CRC-32 library code on arm64 to use a 4-way
> interleave, using PMULL instructions to implement the folding.
>
> On Apple M2, this results in a speedup of 2 - 2.8x when using input
> sizes of 1k - 8k. For smaller sizes, the overhead of preserving and
> restoring the FP/SIMD register file may not be worth it, so 1k is used
> as a threshold for choosing this code path.
>
> The coefficient tables were generated using code provided by Eric. [0]
>
> [0] https://github.com/ebiggers/libdeflate/blob/master/scripts/gen_crc32_multipliers.c
>
> Cc: Eric Biggers <ebiggers@kernel.org>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
> arch/arm64/lib/Makefile | 2 +-
> arch/arm64/lib/crc32-4way.S | 242 ++++++++++++++++++++
> arch/arm64/lib/crc32-glue.c | 48 ++++
> 3 files changed, 291 insertions(+), 1 deletion(-)

Reviewed-by: Eric Biggers <ebiggers@google.com>

> + /* Process up to 64 blocks of 64 bytes at a time */
> +.La\@: mov x3, #64
> + cmp len, #64
> + csel x3, x3, len, hi // x3 := max(len, 64)
The comment should say min(len, 64), not max(len, 64).
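
To spell out the reasoning: "hi" is the unsigned "greater than" condition, so after "cmp len, #64" the csel keeps the 64 already in x3 only when len > 64, and takes len otherwise. A minimal C sketch of that sequence follows. The function name clamp_to_64 is hypothetical and exists only for illustration; it is not part of the patch.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of the patch. It models:
 *
 *	mov	x3, #64
 *	cmp	len, #64
 *	csel	x3, x3, len, hi
 *
 * "hi" holds when len > 64 (unsigned compare), so the result is 64
 * in that case and len otherwise, i.e. min(len, 64).
 */
static size_t clamp_to_64(size_t len)
{
	size_t x3 = 64;			/* mov x3, #64 */

	return (len > 64) ? x3 : len;	/* csel x3, x3, len, hi */
}
```

With this model, an input of 1000 yields 64 and an input of 10 yields 10, which matches min() rather than max().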
- Eric