From: Eric Biggers <ebiggers@kernel.org>
To: Ard Biesheuvel <ardb@kernel.org>
Cc: linux-crypto@vger.kernel.org, x86@kernel.org,
linux-kernel@vger.kernel.org, Andy Lutomirski <luto@kernel.org>,
"Chang S . Bae" <chang.seok.bae@intel.com>
Subject: Re: [PATCH 0/6] Faster AES-XTS on modern x86_64 CPUs
Date: Tue, 26 Mar 2024 09:47:55 -0700 [thread overview]
Message-ID: <20240326164755.GB1524@sol.localdomain> (raw)
In-Reply-To: <CAMj1kXH4fNevFzrbazJptadxh_spEY3W91FZni5eMqD+UKrSUQ@mail.gmail.com>
On Tue, Mar 26, 2024 at 10:51:48AM +0200, Ard Biesheuvel wrote:
> > Open questions:
> >
> > - Is the policy that I implemented for preferring ymm registers to zmm
> > registers the right one? arch/x86/crypto/poly1305_glue.c thinks that
> > only Skylake has the bad downclocking. My current proposal is a bit
> > more conservative; it also excludes Ice Lake and Tiger Lake. Those
> > CPUs supposedly still have some downclocking, though not as much.
> >
> > - Should the policy on the use of zmm registers be in a centralized
> > place? It probably doesn't make sense to have random different
> > policies for different crypto algorithms (AES, Poly1305, ARIA, etc.).
> >
> > - Are there any other known issues with using AVX512 in kernel mode? It
> > seems to work, and technically it's not new because Poly1305 and ARIA
> > already use AVX512, including the mask registers and zmm registers up
> > to 31. So if there was a major issue, like the new registers not
> > being properly saved and restored, it probably would have already been
> > found. But AES-XTS support would introduce a wider use of it.
> >
>
> I don't have much input here, except that I think we should just
> disable AVX512 kernel-wide on systems where there is no benefit in
> terms of throughput. I suspect this might change with algorithms that
> rely more heavily on the masking, but so far, we have been making
> quite effective use of simple permute vectors and overlapping loads
> and stores to do the same. And as Eric points out, the only relevant
> use case in the kernel is blocks of size 2^n where n is at least 9.
There are several benefits to AVX512 besides the 512-bit zmm registers.  In
addition to masking, there are twice as many SIMD registers, which makes it
possible to cache all the AES round keys.  There are also other new
instructions such as vpternlogd, which I've used in AES-XTS to XOR values
together more efficiently.  That's why this patchset adds both
xts-aes-vaes-avx10_256 and xts-aes-vaes-avx10_512.  And I've adopted the new
"AVX10" naming, maybe a bit early, to emphasize that it's not just about
512-bit...
Consider Intel Ice Lake, for example.  These are the AES-256-XTS encryption
speeds I'm seeing on 4096-byte messages, in MB/s:
    xts-aes-aesni               5136
    xts-aes-aesni-avx           5366
    xts-aes-vaes-avx2           9337
    xts-aes-vaes-avx10_256      9876
    xts-aes-vaes-avx10_512     10215
So yes, on that CPU the biggest boost comes just from VAES, staying on AVX2.
But taking advantage of AVX512 does help a bit more, first from the parts other
than 512-bit registers, then a bit more from 512-bit registers.
I do have Ice Lake on the exclusion list from xts-aes-vaes-avx10_512 anyway,
since the concern with downclocking is not really about the performance of the
code itself but rather the impact on unrelated code running on the CPU.
And I *think* the right policy is to disable just the use of the zmm
registers, as opposed to AVX512 entirely.  As originally presented, AVX512 did
tie the two together, but they don't have to be.  AVX10 (which supposedly
future x86_64 CPUs will have) explicitly moves away from that by repackaging
the existing AVX512 features and making the zmm registers optional.
- Eric
Thread overview: 19+ messages
2024-03-26 8:02 [PATCH 0/6] Faster AES-XTS on modern x86_64 CPUs Eric Biggers
2024-03-26 8:02 ` [PATCH 1/6] x86: add kconfig symbols for assembler VAES and VPCLMULQDQ support Eric Biggers
2024-03-26 8:10 ` Ingo Molnar
2024-03-26 8:18 ` Eric Biggers
2024-03-26 8:28 ` Ingo Molnar
2024-03-26 8:03 ` [PATCH 2/6] crypto: x86/aes-xts - add AES-XTS assembly macro for modern CPUs Eric Biggers
2024-03-26 8:03 ` [PATCH 3/6] crypto: x86/aes-xts - wire up AESNI + AVX implementation Eric Biggers
2024-03-26 8:03 ` [PATCH 4/6] crypto: x86/aes-xts - wire up VAES + AVX2 implementation Eric Biggers
2024-03-26 8:03 ` [PATCH 5/6] crypto: x86/aes-xts - wire up VAES + AVX10/256 implementation Eric Biggers
2024-03-26 8:03 ` [PATCH 6/6] crypto: x86/aes-xts - wire up VAES + AVX10/512 implementation Eric Biggers
2024-03-26 8:51 ` [PATCH 0/6] Faster AES-XTS on modern x86_64 CPUs Ard Biesheuvel
2024-03-26 16:47 ` Eric Biggers [this message]
2024-04-03 8:12 ` David Laight
2024-04-04 1:35 ` Eric Biggers
2024-04-04 7:53 ` David Laight
2024-04-05 19:19 ` Eric Biggers
2024-04-08 7:41 ` David Laight
2024-04-08 12:31 ` Eric Biggers
2024-04-05 7:58 ` Herbert Xu