From: "Ard Biesheuvel" <ardb@kernel.org>
To: "Robin Murphy" <robin.murphy@arm.com>,
"Demian Shulhan" <demyansh@gmail.com>
Cc: "Christoph Hellwig" <hch@lst.de>,
"Mark Rutland" <mark.rutland@arm.com>,
"Song Liu" <song@kernel.org>, "Yu Kuai" <yukuai@fnnas.com>,
"Will Deacon" <will@kernel.org>,
"Catalin Marinas" <catalin.marinas@arm.com>,
"Mark Brown" <broonie@kernel.org>,
linux-arm-kernel@lists.infradead.org,
"Li Nan" <linan122@huawei.com>,
linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] raid6: arm64: add SVE optimized implementation for syndrome generation
Date: Fri, 17 Apr 2026 16:43:06 +0200 [thread overview]
Message-ID: <c9362db6-1fef-4e70-9525-29b2936f4887@app.fastmail.com> (raw)
In-Reply-To: <8db4defe-8b5e-4cc3-880b-72d46510b034@arm.com>
On Thu, 16 Apr 2026, at 18:26, Robin Murphy wrote:
> On 16/04/2026 3:59 pm, Demian Shulhan wrote:
>> Hi Ard!
...
>>> OK, so the takeaway here is that SVE is only worth the hassle if the
>>> vector length is at least 256 bits. This is not entirely surprising,
>>> but given that Graviton4 went back to 128 bit vectors from 256, I
>>> wonder what the future expectation is here.
>>
>> I agree. The results from the SnapRAID tests are not as impressive as
>> I hoped, and the fact that Neoverse-V2 went back to 128-bit is a red
>> flag. It suggests that wide SVE registers might not be a priority in
>> future architecture versions.
>
> If you look at the Neoverse V1 software optimisation guide[1], the SVE
> instructions generally have half the throughput of their ASIMD
> equivalents (i.e. presumably the vector pipes are still only 128 bits
> wide and SVE is just using them in pairs), so indeed the total
> instruction count is largely meaningless - IPC might be somewhat more
> relevant, but I'd say the only performance number that's really
> meaningful is the end-to-end MB/s measure of how fast the function
> implementation as a whole can process data.
On arm64, kernel mode NEON is mostly used to gain access to the AES and SHA
instructions, and only to a lesser degree to speed up ordinary arithmetic,
so XOR is something of an outlier here.
Given that Neoverse V1 apparently already carves up ordinary arithmetic
performed on 256-bit vectors and operates on 128 bits at a time, I am
rather skeptical that we will see SVE implementations of the crypto
extensions that are meaningfully faster any time soon: these are
presumably much costlier to implement in terms of gate count, and
therefore likely to be split up even on SVE implementations that can
perform ordinary arithmetic on 256+ bit vectors in a single cycle. Note
that even the arm64 SIMD accelerated CRC implementations rely heavily on
64x64->128 polynomial multiplication.
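[Editor's note: for readers unfamiliar with the term, the 64x64->128
"polynomial" multiplication mentioned above is carry-less: partial
products are combined with XOR rather than addition. A plain-C model of
what the AArch64 PMULL instruction computes in a single operation, purely
for illustration (the helper name is ours, not the kernel's):]

```c
#include <assert.h>
#include <stdint.h>

/*
 * Carry-less (GF(2)[x]) 64x64->128 multiply: a bit-at-a-time model of
 * what the AArch64 PMULL instruction computes in one operation.
 * Partial products are XORed together instead of being added.
 */
static void clmul64(uint64_t a, uint64_t b, uint64_t *lo, uint64_t *hi)
{
	uint64_t l = 0, h = 0;

	for (int i = 0; i < 64; i++) {
		if ((b >> i) & 1) {
			l ^= a << i;
			if (i)
				h ^= a >> (64 - i);
		}
	}
	*lo = l;
	*hi = h;
}
```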
IOW, before we consider kernel mode SVE, I'd like to see some benchmarks
for other algorithms too.
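[Editor's note: for context, the syndrome computation the SIMD code
accelerates reduces to a byte-wise XOR for P and a GF(2^8) multiply-by-2
plus XOR for Q. A minimal standalone sketch mirroring the structure of
the generic lib/raid6 implementation (this is illustrative code, not the
kernel's actual int/NEON/SVE routines):]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Multiply a GF(2^8) element by 2 (RAID6's polynomial, 0x11d). */
static uint8_t gf2_mul2(uint8_t v)
{
	return (uint8_t)((v << 1) ^ ((v & 0x80) ? 0x1d : 0));
}

/*
 * Compute the P (XOR parity) and Q (Reed-Solomon) syndromes over
 * 'disks' data buffers of 'len' bytes each, one byte at a time.
 * SIMD implementations do the same per 16- or 32-byte vector.
 */
static void gen_syndrome(int disks, size_t len,
			 const uint8_t **data, uint8_t *p, uint8_t *q)
{
	for (size_t i = 0; i < len; i++) {
		uint8_t wp = data[disks - 1][i];
		uint8_t wq = wp;

		for (int d = disks - 2; d >= 0; d--) {
			wp ^= data[d][i];
			wq = gf2_mul2(wq) ^ data[d][i];
		}
		p[i] = wp;
		q[i] = wq;
	}
}
```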
> It's probably also worth checking whether the current NEON routines
> themselves are actually optimal for modern big CPUs - things have
> moved on quite a bit since Cortex-A57 (whose ASIMD performance could
> also be described as "esoteric" at the best of times...)
>
Some of those crypto routines could definitely be made faster, but
whether that actually helps depends highly on the context: for instance,
there was a proposal a while ago to incorporate the AES-GCM code from
the OpenSSL project (authored by ARM), but at the time, it slightly
regressed the ~1500 byte case and only gave a substantial improvement
for much larger block sizes, which aren't that common in the kernel for
this particular algorithm.
IOW, any contributions that improve the existing code (or outright
replace it with something faster, for all I care) are highly
appreciated, but they should be motivated by benchmarks that reflect
the use cases that we actually consider important for the algorithm
in question.
Thread overview: 16+ messages
[not found] <20260318150245.3080719-1-demyansh@gmail.com>
2026-03-24 7:45 ` [PATCH v2] raid6: arm64: add SVE optimized implementation for syndrome generation Christoph Hellwig
2026-03-24 8:00 ` Ard Biesheuvel
2026-03-24 10:04 ` Mark Rutland
2026-03-29 13:01 ` Demian Shulhan
2026-03-30 5:30 ` Christoph Hellwig
2026-03-30 16:39 ` Ard Biesheuvel
2026-03-31 6:36 ` Christoph Hellwig
2026-03-31 13:18 ` Demian Shulhan
2026-04-16 12:40 ` Demian Shulhan
2026-04-16 13:39 ` Ard Biesheuvel
2026-04-16 14:59 ` Demian Shulhan
2026-04-16 16:26 ` Robin Murphy
2026-04-16 16:47 ` Mark Brown
2026-04-16 17:03 ` Robin Murphy
2026-04-17 14:43 ` Ard Biesheuvel [this message]
2026-04-17 15:36 ` Mark Brown