From: David Laight <david.laight.linux@gmail.com>
To: Eric Biggers <ebiggers@kernel.org>
Cc: Demian Shulhan <demyansh@gmail.com>,
linux-crypto@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, ardb@kernel.org
Subject: Re: [PATCH v3] lib/crc: arm64: add NEON accelerated CRC64-NVMe implementation
Date: Sun, 29 Mar 2026 22:57:04 +0100 [thread overview]
Message-ID: <20260329225704.0eb82966@pumpkin> (raw)
In-Reply-To: <20260329203829.GA2746@quark>
On Sun, 29 Mar 2026 13:38:29 -0700
Eric Biggers <ebiggers@kernel.org> wrote:
> On Sun, Mar 29, 2026 at 07:43:38AM +0000, Demian Shulhan wrote:
> > Implement an optimized CRC64 (NVMe) algorithm for ARM64 using NEON
> > Polynomial Multiply Long (PMULL) instructions. The generic shift-and-XOR
> > software implementation is slow, which creates a bottleneck in NVMe and
> > other storage subsystems.
> >
> > The acceleration is implemented using C intrinsics (<arm_neon.h>) rather
> > than raw assembly for better readability and maintainability.
> >
> > Key highlights of this implementation:
> > - Uses 4KB chunking inside scoped_ksimd() to avoid preemption latency
> > spikes on large buffers.
> > - Pre-calculates and loads fold constants via vld1q_u64() to minimize
> > register spilling.
> > - Benchmarks show the break-even point against the generic implementation
> > is around 128 bytes. The PMULL path is enabled only for len >= 128.
Final thought:
Is that allowing for the cost of kernel_fpu_begin()? - which I think only
affects the first call.
And the cost of the data-cache misses for the lookup table reads? - again
worse for the first call.
David
> >
> > Performance results (kunit crc_benchmark on Cortex-A72):
> > - Generic (len=4096): ~268 MB/s
> > - PMULL (len=4096): ~1556 MB/s (nearly 6x improvement)
> >
> > Signed-off-by: Demian Shulhan <demyansh@gmail.com>
>
> Applied to https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux.git/log/?h=crc-next
>
> Thanks!
>
> - Eric
>
next prev parent reply other threads:[~2026-03-29 21:57 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-17 6:54 [PATCH] lib/crc: arm64: add NEON accelerated CRC64-NVMe implementation Demian Shulhan
2026-03-19 19:09 ` Eric Biggers
2026-03-20 10:36 ` David Laight
2026-03-20 20:00 ` Eric Biggers
2026-03-22 9:29 ` Demian Shulhan
2026-03-22 14:13 ` Eric Biggers
2026-03-19 23:31 ` David Laight
2026-03-20 11:22 ` kernel test robot
2026-03-27 6:02 ` [PATCH v2] " Demian Shulhan
2026-03-27 19:38 ` Eric Biggers
2026-03-29 7:43 ` [PATCH v3] " Demian Shulhan
2026-03-29 20:38 ` Eric Biggers
2026-03-29 21:57 ` David Laight [this message]
2026-03-29 22:18 ` Eric Biggers
2026-03-30 9:31 ` David Laight
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260329225704.0eb82966@pumpkin \
--to=david.laight.linux@gmail.com \
--cc=ardb@kernel.org \
--cc=demyansh@gmail.com \
--cc=ebiggers@kernel.org \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-crypto@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.