From: Eric Biggers <ebiggers@kernel.org>
To: Christoph Hellwig <hch@lst.de>
Cc: Andrew Morton <akpm@linux-foundation.org>,
linux-kernel@vger.kernel.org, linux-crypto@vger.kernel.org,
x86@kernel.org, David Laight <david.laight.linux@gmail.com>,
linux-raid@vger.kernel.org
Subject: Re: [PATCH v3] lib/raid/xor: x86: Add AVX-512 optimized xor_gen()
Date: Wed, 17 Jun 2026 15:44:37 +0000
Message-ID: <20260617154437.GA785086@google.com>
In-Reply-To: <20260617055653.GB19218@lst.de>

On Wed, Jun 17, 2026 at 07:56:53AM +0200, Christoph Hellwig wrote:
> Can you use the xor: prefix used for all other commits to lib/raid/xor?
>
> > Benchmark on AMD Ryzen 9 9950X (Zen 5):
> >
> > src_cnt   avx          avx512       Improvement
> > =======   ==========   ==========   ===========
> >       1   56353 MB/s   75388 MB/s   33%
> >       2   54274 MB/s   68409 MB/s   26%
> >       3   44649 MB/s   64042 MB/s   43%
> >       4   41315 MB/s   55002 MB/s   33%
>
> On my Zen 5 mobile (AMD Ryzen AI 7 PRO 350), both the existing
> AVX2 and this AVX-512 code give numbers in the 200+ GB/s range. Not
> sure if it is just the different benchmarking or something else going on.

I used lib/raid/xor/xor-core.c, which measures the throughput of parity
data generated, whereas your proposed xor_benchmark() in xor_kunit
measures the throughput of source data consumed. I don't know which
makes more sense, but we should make them consistent with each other.
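
A rough sketch of how far apart the two conventions can be, assuming every
source buffer is the same length as the parity buffer: producing N bytes of
parity then consumes src_cnt * N bytes of source, so the two metrics differ
by a factor of src_cnt. Whether a destination that is also read counts as a
source depends on how each benchmark is written, which this does not check.
The helper names are hypothetical and not taken from xor-core.c or xor_kunit.

```c
#include <stdint.h>

/*
 * Hypothetical helpers for converting between the two throughput
 * conventions.  Assumes each of the src_cnt source buffers is the same
 * length as the parity buffer, so N bytes of parity consume
 * src_cnt * N bytes of source data.
 */
static inline uint64_t parity_to_source_mbps(uint64_t parity_mbps,
					     unsigned int src_cnt)
{
	return parity_mbps * src_cnt;
}

static inline uint64_t source_to_parity_mbps(uint64_t source_mbps,
					     unsigned int src_cnt)
{
	/* Integer division rounds down; src_cnt == 0 is treated as no data. */
	return src_cnt ? source_mbps / src_cnt : 0;
}
```

For example, parity_to_source_mbps(55002, 4) returns 220008, i.e. the
src_cnt 4 AVX-512 figure above is roughly 220 GB/s in source-consumed
terms. That is the same order as the 200+ GB/s you saw, though on a
different CPU, so it is not a direct comparison.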

> FYI, one or two sources are basically useless, as they are RAID5 configs
> that have no benefit over simple mirroring, and thus the numbers
> aren't too interesting.
>
> > +DO_XOR_BLOCKS(avx512_inner, xor_avx512_2, xor_avx512_3, xor_avx512_4,
> > + xor_avx512_5);
>
> Is there really much of a benefit of doing the historic DO_XOR_BLOCKS
> vs doing the loop manually? Especially as the common cases for a
> modern RAID will usually loop over more disks than this was built
> for. I.e., in practice one or two source buffers only happen at the
> end of a loop over more disks.

There's not really a way out of unrolling by source buffer count, as
otherwise the pointers would continuously have to be reloaded into
registers. That's why your proposal was so slow (see the numbers I gave
in https://lore.kernel.org/linux-crypto/20260612055933.GA6675@sol/ ).
The unrolled counts could be something other than 2-5, or the unrolling
could be open-coded instead of using the macro, if that's all you're
asking for, but at a high level, unrolling by source buffer count does
seem to be needed.
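
A simplified scalar sketch of the trade-off, using plain C over 64-bit words
rather than AVX-512, and assuming the semantics dest ^= src[0] ^ ... ^
src[n-1]. The generic loop has to fetch srcs[j] from the pointer array for
every word, while each fixed-count kernel keeps its few pointers in
registers. The dispatcher walks a long source list four at a time, so the
one-to-three source kernels only handle the tail of a wide stripe. All
names are hypothetical, and this is not necessarily how DO_XOR_BLOCKS
structures it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical; assumed semantics: dest[i] ^= srcs[0][i] ^ ... ^ srcs[n-1][i]. */

/*
 * Generic loop: an arbitrary number of source pointers cannot all stay
 * in registers, so srcs[j] generally has to be reloaded for every word.
 */
static void xor_gen_generic(uint64_t *dest, uint64_t *const *srcs,
			    unsigned int src_cnt, size_t words)
{
	for (size_t i = 0; i < words; i++) {
		uint64_t v = dest[i];

		for (unsigned int j = 0; j < src_cnt; j++)
			v ^= srcs[j][i];
		dest[i] = v;
	}
}

/* Fixed-count kernels: each keeps its few source pointers in registers. */
static void xor_1src(uint64_t *dest, const uint64_t *p1, size_t words)
{
	for (size_t i = 0; i < words; i++)
		dest[i] ^= p1[i];
}

static void xor_2src(uint64_t *dest, const uint64_t *p1,
		     const uint64_t *p2, size_t words)
{
	for (size_t i = 0; i < words; i++)
		dest[i] ^= p1[i] ^ p2[i];
}

static void xor_3src(uint64_t *dest, const uint64_t *p1,
		     const uint64_t *p2, const uint64_t *p3, size_t words)
{
	for (size_t i = 0; i < words; i++)
		dest[i] ^= p1[i] ^ p2[i] ^ p3[i];
}

static void xor_4src(uint64_t *dest, const uint64_t *p1,
		     const uint64_t *p2, const uint64_t *p3,
		     const uint64_t *p4, size_t words)
{
	for (size_t i = 0; i < words; i++)
		dest[i] ^= p1[i] ^ p2[i] ^ p3[i] ^ p4[i];
}

/*
 * Unrolled dispatch: consume sources in groups of four; only the tail
 * uses a smaller kernel.  Costs one extra read/write pass over dest per
 * group of four sources.
 */
static void xor_gen_unrolled(uint64_t *dest, uint64_t *const *srcs,
			     unsigned int src_cnt, size_t words)
{
	while (src_cnt >= 4) {
		xor_4src(dest, srcs[0], srcs[1], srcs[2], srcs[3], words);
		srcs += 4;
		src_cnt -= 4;
	}
	switch (src_cnt) {
	case 3:
		xor_3src(dest, srcs[0], srcs[1], srcs[2], words);
		break;
	case 2:
		xor_2src(dest, srcs[0], srcs[1], words);
		break;
	case 1:
		xor_1src(dest, srcs[0], words);
		break;
	}
}
```

Both functions produce the same result for any source count. The design
choice is trading some extra passes over dest for keeping every pointer the
inner loop touches in a register, which is the reloading cost described above.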

- Eric