* [PATCH 0/8] x86: Remove cpu_has_xfeatures() and add AVX-512 xor_gen()
@ 2026-06-26 4:37 Eric Biggers
2026-06-26 4:37 ` [PATCH 1/8] x86/fpu: Check for missing AVX and AVX-512 xstate bits Eric Biggers
` (7 more replies)
0 siblings, 8 replies; 16+ messages in thread
From: Eric Biggers @ 2026-06-26 4:37 UTC (permalink / raw)
To: x86
Cc: linux-um, linux-raid, linux-crypto, linux-kernel,
Christoph Hellwig, Andrew Morton, Eric Biggers
My patch "lib/raid/xor: x86: Add AVX-512 optimized xor_gen()"
(https://lore.kernel.org/r/20260615190338.26581-1-ebiggers@kernel.org/)
still seems to be blocked on a Sashiko comment about cpu_has_xfeatures()
not being called.

However, the x86-optimized RAID library code supports UML, and currently
UML doesn't implement cpu_has_xfeatures(). That's perhaps why the
existing AVX-512 optimized RAID6 code doesn't check it either.

In fact, it seems to have been getting by fine without it, which
suggests that it's not truly needed.

But to eliminate any doubts, I've had a go at fully resolving the
situation by making both native x86 and UML explicitly clear any
X86_FEATURE_* flags at boot time whose xfeatures are missing.

Then, cpu_has_xfeatures() is entirely removed from the kernel.

The last patch adds the AVX-512 optimized xor_gen(). I do still think
it would be fine to proceed with it without the rest. But if there are
any doubts, we can take this more comprehensive cleanup route.

Eric Biggers (8):
x86/fpu: Check for missing AVX and AVX-512 xstate bits
um: Check for missing AVX and AVX-512 xstate bits
crypto: x86 - Stop using cpu_has_xfeatures()
lib/crypto: x86: Stop using cpu_has_xfeatures()
lib/crc: x86: Stop using cpu_has_xfeatures()
x86/fpu: Remove cpu_has_xfeatures()
lib/raid/xor: x86: Remove redundant X86_FEATURE_OSXSAVE check
lib/raid/xor: x86: Add AVX-512 optimized xor_gen()
arch/um/kernel/um_arch.c | 78 ++++++++++++-
arch/x86/crypto/aegis128-aesni-glue.c | 3 +-
arch/x86/crypto/aesni-intel_glue.c | 7 +-
arch/x86/crypto/aria_aesni_avx2_glue.c | 11 +-
arch/x86/crypto/aria_aesni_avx_glue.c | 11 +-
arch/x86/crypto/aria_gfni_avx512_glue.c | 11 +-
arch/x86/crypto/camellia_aesni_avx2_glue.c | 11 +-
arch/x86/crypto/camellia_aesni_avx_glue.c | 11 +-
arch/x86/crypto/cast5_avx_glue.c | 7 +-
arch/x86/crypto/cast6_avx_glue.c | 7 +-
arch/x86/crypto/serpent_avx2_glue.c | 9 +-
arch/x86/crypto/serpent_avx_glue.c | 7 +-
arch/x86/crypto/sm4_aesni_avx2_glue.c | 11 +-
arch/x86/crypto/sm4_aesni_avx_glue.c | 11 +-
arch/x86/crypto/twofish_avx_glue.c | 6 +-
arch/x86/include/asm/fpu/api.h | 9 --
arch/x86/kernel/fpu/xstate.c | 63 ++++-------
lib/crc/x86/crc-pclmul-template.h | 6 +-
lib/crypto/x86/blake2s.h | 4 +-
lib/crypto/x86/chacha.h | 3 +-
lib/crypto/x86/nh.h | 4 +-
lib/crypto/x86/poly1305.h | 7 +-
lib/crypto/x86/sha1.h | 4 +-
lib/crypto/x86/sha256.h | 4 +-
lib/crypto/x86/sha512.h | 3 +-
lib/crypto/x86/sm3.h | 3 +-
lib/raid/xor/Makefile | 2 +-
lib/raid/xor/x86/xor-avx512.c | 121 +++++++++++++++++++++
lib/raid/xor/x86/xor_arch.h | 24 ++--
29 files changed, 264 insertions(+), 194 deletions(-)
create mode 100644 lib/raid/xor/x86/xor-avx512.c
base-commit: 4edcdefd4083ae04b1a5656f4be6cd83ae919ef4
--
2.54.0
* [PATCH 1/8] x86/fpu: Check for missing AVX and AVX-512 xstate bits
2026-06-26 4:37 [PATCH 0/8] x86: Remove cpu_has_xfeatures() and add AVX-512 xor_gen() Eric Biggers
@ 2026-06-26 4:37 ` Eric Biggers
2026-06-26 5:39 ` Christoph Hellwig
2026-06-26 4:37 ` [PATCH 2/8] um: " Eric Biggers
` (6 subsequent siblings)
7 siblings, 1 reply; 16+ messages in thread
From: Eric Biggers @ 2026-06-26 4:37 UTC (permalink / raw)
To: x86
Cc: linux-um, linux-raid, linux-crypto, linux-kernel,
Christoph Hellwig, Andrew Morton, Eric Biggers
If the CPU declares AVX or AVX-512 support, verify that the
corresponding xstate bits are also set. If any are missing, warn and
clear the affected X86_FEATURE_* flags.
This eliminates the perceived need for AVX and AVX-512 optimized code in
the kernel to call cpu_has_xfeatures(). That has never been universally
done, which strongly suggests that it has never really been needed in
practice, but this should remove any remaining doubt.
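
For readers who want to see the check in isolation, here is a minimal
user-space sketch of the logic described above. It is not the kernel
code: CAP_AVX, CAP_AVX512F, the XF_* names and filter_caps() are
hypothetical stand-ins for X86_FEATURE_AVX, X86_FEATURE_AVX512F, the
XFEATURE_MASK_* constants and the new helper. Only the XCR0 bit
positions are architectural. As I understand it, the real
setup_clear_cpu_cap() also clears flags that depend on the cleared one;
that propagation is not modeled here.

```c
#include <assert.h>
#include <stdint.h>

/* xstate component bits, at their architectural XCR0 positions. */
#define XF_FP        (1ULL << 0)	/* x87 state */
#define XF_SSE       (1ULL << 1)	/* XMM registers */
#define XF_YMM       (1ULL << 2)	/* upper halves of YMM registers */
#define XF_OPMASK    (1ULL << 5)	/* AVX-512 k0-k7 */
#define XF_ZMM_HI256 (1ULL << 6)	/* upper halves of ZMM0-ZMM15 */
#define XF_HI16_ZMM  (1ULL << 7)	/* ZMM16-ZMM31 */

#define XF_AVX_REQUIRED    (XF_FP | XF_SSE | XF_YMM)
#define XF_AVX512_REQUIRED \
	(XF_AVX_REQUIRED | XF_OPMASK | XF_ZMM_HI256 | XF_HI16_ZMM)

/* The same masks written as the literals used on the UML side. */
static_assert(XF_AVX_REQUIRED == 0x07, "AVX needs XCR0 bits 0-2");
static_assert(XF_AVX512_REQUIRED == 0xe7, "AVX-512 also needs bits 5-7");

/* Hypothetical capability bits, for illustration only. */
#define CAP_AVX     (1u << 0)
#define CAP_AVX512F (1u << 1)

/*
 * Return 'caps' with every flag removed whose required xstate
 * components are not all present in 'xfeatures'.
 */
static uint32_t filter_caps(uint32_t caps, uint64_t xfeatures)
{
	if ((caps & CAP_AVX) &&
	    (xfeatures & XF_AVX_REQUIRED) != XF_AVX_REQUIRED)
		caps &= ~CAP_AVX;
	if ((caps & CAP_AVX512F) &&
	    (xfeatures & XF_AVX512_REQUIRED) != XF_AVX512_REQUIRED)
		caps &= ~CAP_AVX512F;
	return caps;
}
```

With this model, a CPU that advertises both flags but only enables
xstate bits 0-2 keeps CAP_AVX and loses CAP_AVX512F, so later code can
test the capability flag alone.
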
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
---
arch/x86/kernel/fpu/xstate.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index a7b6524a9dea..7f7e62e4ebc5 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -797,10 +797,27 @@ static u64 __init guest_default_mask(void)
* for KVM guests.
*/
return ~(u64)XFEATURE_MASK_USER_DYNAMIC;
}
+/* Clear any X86_FEATURE_* used by the kernel whose xfeatures are missing. */
+static void __init clear_cpu_caps_with_missing_xfeatures(u64 xfeatures)
+{
+ u64 mask;
+
+ mask = XFEATURE_MASK_FPSSE | XFEATURE_MASK_YMM;
+ if (boot_cpu_has(X86_FEATURE_AVX) && (xfeatures & mask) != mask) {
+ pr_err("x86/fpu: Disabling AVX support due to missing xstate features\n");
+ setup_clear_cpu_cap(X86_FEATURE_AVX);
+ }
+ mask = XFEATURE_MASK_FPSSE | XFEATURE_MASK_YMM | XFEATURE_MASK_AVX512;
+ if (boot_cpu_has(X86_FEATURE_AVX512F) && (xfeatures & mask) != mask) {
+ pr_err("x86/fpu: Disabling AVX-512 support due to missing xstate features\n");
+ setup_clear_cpu_cap(X86_FEATURE_AVX512F);
+ }
+}
+
/*
* Enable and initialize the xsave feature.
* Called once per system bootup.
*/
void __init fpu__init_system_xstate(unsigned int legacy_size)
@@ -853,10 +870,12 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
pr_err("x86/fpu: Both APX/MPX present in the CPU's xstate features: 0x%llx.\n",
fpu_kernel_cfg.max_features);
goto out_disable;
}
+ clear_cpu_caps_with_missing_xfeatures(fpu_kernel_cfg.max_features);
+
fpu_kernel_cfg.independent_features = fpu_kernel_cfg.max_features &
XFEATURE_MASK_INDEPENDENT;
/*
* Clear XSAVE features that are disabled in the normal CPUID.
--
2.54.0
* [PATCH 2/8] um: Check for missing AVX and AVX-512 xstate bits
2026-06-26 4:37 [PATCH 0/8] x86: Remove cpu_has_xfeatures() and add AVX-512 xor_gen() Eric Biggers
2026-06-26 4:37 ` [PATCH 1/8] x86/fpu: Check for missing AVX and AVX-512 xstate bits Eric Biggers
@ 2026-06-26 4:37 ` Eric Biggers
2026-06-26 7:41 ` David Laight
2026-06-26 4:37 ` [PATCH 3/8] crypto: x86 - Stop using cpu_has_xfeatures() Eric Biggers
` (5 subsequent siblings)
7 siblings, 1 reply; 16+ messages in thread
From: Eric Biggers @ 2026-06-26 4:37 UTC (permalink / raw)
To: x86
Cc: linux-um, linux-raid, linux-crypto, linux-kernel,
Christoph Hellwig, Andrew Morton, Eric Biggers
If the CPU declares AVX or AVX-512 support, verify that all the
corresponding xstate bits are also set. If any are missing, warn and
don't set the corresponding X86_FEATURE_* flags.
This eliminates the perceived need for UML-supporting AVX and AVX-512
optimized code in the kernel (that is, lib/raid/ currently) to start
checking the xstate bits in addition to X86_FEATURE_AVX*.
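
The flow described above (scan the host's flags line, then accept a
flag only if its xstate bits are present) can be sketched in user space
as follows. This is an illustration under stated assumptions, not the
UML code: cap_table[] is a hypothetical, heavily reduced stand-in for
x86_cap_flags[], parse_flags() is an invented name, and the XCR0 value
is passed in rather than read with xgetbv. The substring match via
strstr() mirrors what the patched loop does.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, reduced stand-in for the kernel's flag-name table. */
struct cap_entry {
	const char *name;	/* name as it appears in the flags line */
	uint64_t required_xcr0;	/* XCR0 bits that must all be set */
};

static const struct cap_entry cap_table[] = {
	{ "sse2",     0x00 },	/* no extra xstate requirement modeled */
	{ "avx",      0x07 },	/* x87 | SSE | YMM */
	{ "avx2",     0x07 },
	{ "avx512f",  0xe7 },	/* plus opmask | ZMM_Hi256 | Hi16_ZMM */
	{ "avx512bw", 0xe7 },
};

#define NUM_CAPS (sizeof(cap_table) / sizeof(cap_table[0]))

/*
 * Return a bitmap in which bit i is set if cap_table[i].name occurs in
 * 'line' and all of its required XCR0 bits are set in 'xcr0'. Each
 * group (AVX, AVX-512) warns at most once per call.
 */
static uint32_t parse_flags(const char *line, uint64_t xcr0)
{
	int warned_avx = 0, warned_avx512 = 0;
	uint32_t caps = 0;

	assert(line != NULL);
	for (size_t i = 0; i < NUM_CAPS; i++) {
		uint64_t need = cap_table[i].required_xcr0;

		if (!strstr(line, cap_table[i].name))
			continue;
		if ((xcr0 & need) != need) {
			int is_512 = (need == 0xe7);
			int *warned = is_512 ? &warned_avx512 : &warned_avx;

			if (!*warned) {
				fprintf(stderr,
					"Disabling %s support due to missing xstate features\n",
					is_512 ? "AVX-512" : "AVX");
				*warned = 1;
			}
			continue;	/* leave the capability bit clear */
		}
		caps |= 1u << i;
	}
	return caps;
}
```

Because the validation happens once at parse time, callers such as
lib/raid/ only need to look at the resulting capability bits.
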
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
---
arch/um/kernel/um_arch.c | 78 +++++++++++++++++++++++++++++++++++++++-
1 file changed, 77 insertions(+), 1 deletion(-)
diff --git a/arch/um/kernel/um_arch.c b/arch/um/kernel/um_arch.c
index 2141f5f1f5a2..aafbaef2ae82 100644
--- a/arch/um/kernel/um_arch.c
+++ b/arch/um/kernel/um_arch.c
@@ -262,16 +262,92 @@ EXPORT_SYMBOL(task_size);
unsigned long brk_start;
#define MIN_VMALLOC (32 * 1024 * 1024)
+static u64 __init read_xcr0(void)
+{
+ u32 a, b, c, d;
+
+ asm volatile("cpuid"
+ : "=a"(a), "=b"(b), "=c"(c), "=d"(d)
+ : "a"(0), "c"(0));
+ if (a >= 1) { /* max_leaf >= 1 */
+ asm volatile("cpuid"
+ : "=a"(a), "=b"(b), "=c"(c), "=d"(d)
+ : "a"(1), "c"(0));
+ if (c & (1 << 27)) { /* XSAVE enabled by OS */
+ asm volatile("xgetbv" : "=d"(d), "=a"(a) : "c"(0));
+ return ((u64)d << 32) | a;
+ }
+ }
+ return 0;
+}
+
+static void __init validate_and_set_cpu_cap(int cap, u64 xcr0)
+{
+ /*
+ * Check for missing xstate features right away, so that there's no
+ * perceived need for all optimized code in the kernel to do so.
+ */
+ switch (cap) {
+ case X86_FEATURE_AVX:
+ case X86_FEATURE_AVX2:
+ case X86_FEATURE_AVX_VNNI:
+ case X86_FEATURE_FMA:
+ case X86_FEATURE_VAES:
+ case X86_FEATURE_VPCLMULQDQ:
+ if ((xcr0 & 0x7) != 0x7) {
+ static bool warned;
+
+ if (!warned) {
+ os_warn("Disabling AVX support due to missing xstate features\n");
+ warned = true;
+ }
+ return;
+ }
+ break;
+ case X86_FEATURE_AVX512F:
+ case X86_FEATURE_AVX512BW:
+ case X86_FEATURE_AVX512CD:
+ case X86_FEATURE_AVX512DQ:
+ case X86_FEATURE_AVX512ER:
+ case X86_FEATURE_AVX512IFMA:
+ case X86_FEATURE_AVX512PF:
+ case X86_FEATURE_AVX512VBMI:
+ case X86_FEATURE_AVX512VL:
+ case X86_FEATURE_AVX512_4FMAPS:
+ case X86_FEATURE_AVX512_4VNNIW:
+ case X86_FEATURE_AVX512_BF16:
+ case X86_FEATURE_AVX512_BITALG:
+ case X86_FEATURE_AVX512_FP16:
+ case X86_FEATURE_AVX512_VBMI2:
+ case X86_FEATURE_AVX512_VNNI:
+ case X86_FEATURE_AVX512_VP2INTERSECT:
+ case X86_FEATURE_AVX512_VPOPCNTDQ:
+ if ((xcr0 & 0xe7) != 0xe7) {
+ static bool warned;
+
+ if (!warned) {
+ os_warn("Disabling AVX-512 support due to missing xstate features\n");
+ warned = true;
+ }
+ return;
+ }
+ break;
+ }
+ set_cpu_cap(&boot_cpu_data, cap);
+}
+
static void __init parse_host_cpu_flags(char *line)
{
+ u64 xcr0 = read_xcr0();
int i;
+
for (i = 0; i < 32*NCAPINTS; i++) {
if ((x86_cap_flags[i] != NULL) && strstr(line, x86_cap_flags[i]))
- set_cpu_cap(&boot_cpu_data, i);
+ validate_and_set_cpu_cap(i, xcr0);
}
}
static void __init parse_cache_line(char *line)
{
--
2.54.0
* [PATCH 3/8] crypto: x86 - Stop using cpu_has_xfeatures()
2026-06-26 4:37 [PATCH 0/8] x86: Remove cpu_has_xfeatures() and add AVX-512 xor_gen() Eric Biggers
2026-06-26 4:37 ` [PATCH 1/8] x86/fpu: Check for missing AVX and AVX-512 xstate bits Eric Biggers
2026-06-26 4:37 ` [PATCH 2/8] um: " Eric Biggers
@ 2026-06-26 4:37 ` Eric Biggers
2026-06-26 4:37 ` [PATCH 4/8] lib/crypto: x86: " Eric Biggers
` (4 subsequent siblings)
7 siblings, 0 replies; 16+ messages in thread
From: Eric Biggers @ 2026-06-26 4:37 UTC (permalink / raw)
To: x86
Cc: linux-um, linux-raid, linux-crypto, linux-kernel,
Christoph Hellwig, Andrew Morton, Eric Biggers
Checking both boot_cpu_has(X86_FEATURE_AVX*) and cpu_has_xfeatures() has
never really been needed in practice, and it's never been universally
done (e.g., lib/raid/ omits cpu_has_xfeatures()). Nevertheless, both
x86 and UML now explicitly clear the AVX and AVX-512 flags if their
xfeatures are missing, which should remove any remaining doubts.
Thus, remove all the calls to cpu_has_xfeatures(), as well as the
related checks of boot_cpu_has(X86_FEATURE_OSXSAVE).
In a few cases there was no corresponding boot_cpu_has(X86_FEATURE_AVX*)
check, so add the missing ones.
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
---
arch/x86/crypto/aegis128-aesni-glue.c | 3 +--
arch/x86/crypto/aesni-intel_glue.c | 7 ++-----
arch/x86/crypto/aria_aesni_avx2_glue.c | 11 +----------
arch/x86/crypto/aria_aesni_avx_glue.c | 11 +----------
arch/x86/crypto/aria_gfni_avx512_glue.c | 11 +----------
arch/x86/crypto/camellia_aesni_avx2_glue.c | 11 +----------
arch/x86/crypto/camellia_aesni_avx_glue.c | 11 +----------
arch/x86/crypto/cast5_avx_glue.c | 7 ++-----
arch/x86/crypto/cast6_avx_glue.c | 7 ++-----
arch/x86/crypto/serpent_avx2_glue.c | 9 +--------
arch/x86/crypto/serpent_avx_glue.c | 7 ++-----
arch/x86/crypto/sm4_aesni_avx2_glue.c | 11 +----------
arch/x86/crypto/sm4_aesni_avx_glue.c | 11 +----------
arch/x86/crypto/twofish_avx_glue.c | 6 ++----
14 files changed, 19 insertions(+), 104 deletions(-)
diff --git a/arch/x86/crypto/aegis128-aesni-glue.c b/arch/x86/crypto/aegis128-aesni-glue.c
index f1adfba1a76e..09fc0b15b0e9 100644
--- a/arch/x86/crypto/aegis128-aesni-glue.c
+++ b/arch/x86/crypto/aegis128-aesni-glue.c
@@ -263,12 +263,11 @@ static struct aead_alg crypto_aegis128_aesni_alg = {
};
static int __init crypto_aegis128_aesni_module_init(void)
{
if (!boot_cpu_has(X86_FEATURE_XMM4_1) ||
- !boot_cpu_has(X86_FEATURE_AES) ||
- !cpu_has_xfeatures(XFEATURE_MASK_SSE, NULL))
+ !boot_cpu_has(X86_FEATURE_AES))
return -ENODEV;
return crypto_register_aead(&crypto_aegis128_aesni_alg);
}
diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index f522fff9231e..f6f899db7482 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -1546,12 +1546,11 @@ static int __init register_avx_algs(void)
* For simplicity, just always check for VAES and VPCLMULQDQ together.
*/
if (!boot_cpu_has(X86_FEATURE_AVX2) ||
!boot_cpu_has(X86_FEATURE_VAES) ||
!boot_cpu_has(X86_FEATURE_VPCLMULQDQ) ||
- !boot_cpu_has(X86_FEATURE_PCLMULQDQ) ||
- !cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL))
+ !boot_cpu_has(X86_FEATURE_PCLMULQDQ))
return 0;
err = crypto_register_skciphers(skcipher_algs_vaes_avx2,
ARRAY_SIZE(skcipher_algs_vaes_avx2));
if (err)
return err;
@@ -1560,13 +1559,11 @@ static int __init register_avx_algs(void)
if (err)
return err;
if (!boot_cpu_has(X86_FEATURE_AVX512BW) ||
!boot_cpu_has(X86_FEATURE_AVX512VL) ||
- !boot_cpu_has(X86_FEATURE_BMI2) ||
- !cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM |
- XFEATURE_MASK_AVX512, NULL))
+ !boot_cpu_has(X86_FEATURE_BMI2))
return 0;
if (boot_cpu_has(X86_FEATURE_PREFER_YMM)) {
int i;
diff --git a/arch/x86/crypto/aria_aesni_avx2_glue.c b/arch/x86/crypto/aria_aesni_avx2_glue.c
index 1487a49bfbac..371be2fb6469 100644
--- a/arch/x86/crypto/aria_aesni_avx2_glue.c
+++ b/arch/x86/crypto/aria_aesni_avx2_glue.c
@@ -193,26 +193,17 @@ static struct skcipher_alg aria_algs[] = {
}
};
static int __init aria_avx2_init(void)
{
- const char *feature_name;
-
if (!boot_cpu_has(X86_FEATURE_AVX) ||
!boot_cpu_has(X86_FEATURE_AVX2) ||
- !boot_cpu_has(X86_FEATURE_AES) ||
- !boot_cpu_has(X86_FEATURE_OSXSAVE)) {
+ !boot_cpu_has(X86_FEATURE_AES)) {
pr_info("AVX2 or AES-NI instructions are not detected.\n");
return -ENODEV;
}
- if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
- &feature_name)) {
- pr_info("CPU feature '%s' is not supported.\n", feature_name);
- return -ENODEV;
- }
-
if (boot_cpu_has(X86_FEATURE_GFNI)) {
aria_ops.aria_encrypt_16way = aria_aesni_avx_gfni_encrypt_16way;
aria_ops.aria_decrypt_16way = aria_aesni_avx_gfni_decrypt_16way;
aria_ops.aria_ctr_crypt_16way = aria_aesni_avx_gfni_ctr_crypt_16way;
aria_ops.aria_encrypt_32way = aria_aesni_avx2_gfni_encrypt_32way;
diff --git a/arch/x86/crypto/aria_aesni_avx_glue.c b/arch/x86/crypto/aria_aesni_avx_glue.c
index e4e3d78915a5..d23fc91c0ebd 100644
--- a/arch/x86/crypto/aria_aesni_avx_glue.c
+++ b/arch/x86/crypto/aria_aesni_avx_glue.c
@@ -180,25 +180,16 @@ static struct skcipher_alg aria_algs[] = {
}
};
static int __init aria_avx_init(void)
{
- const char *feature_name;
-
if (!boot_cpu_has(X86_FEATURE_AVX) ||
- !boot_cpu_has(X86_FEATURE_AES) ||
- !boot_cpu_has(X86_FEATURE_OSXSAVE)) {
+ !boot_cpu_has(X86_FEATURE_AES)) {
pr_info("AVX or AES-NI instructions are not detected.\n");
return -ENODEV;
}
- if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
- &feature_name)) {
- pr_info("CPU feature '%s' is not supported.\n", feature_name);
- return -ENODEV;
- }
-
if (boot_cpu_has(X86_FEATURE_GFNI)) {
aria_ops.aria_encrypt_16way = aria_aesni_avx_gfni_encrypt_16way;
aria_ops.aria_decrypt_16way = aria_aesni_avx_gfni_decrypt_16way;
aria_ops.aria_ctr_crypt_16way = aria_aesni_avx_gfni_ctr_crypt_16way;
} else {
diff --git a/arch/x86/crypto/aria_gfni_avx512_glue.c b/arch/x86/crypto/aria_gfni_avx512_glue.c
index 363cbf4399cc..e05bbeb22d4a 100644
--- a/arch/x86/crypto/aria_gfni_avx512_glue.c
+++ b/arch/x86/crypto/aria_gfni_avx512_glue.c
@@ -194,28 +194,19 @@ static struct skcipher_alg aria_algs[] = {
}
};
static int __init aria_avx512_init(void)
{
- const char *feature_name;
-
if (!boot_cpu_has(X86_FEATURE_AVX) ||
!boot_cpu_has(X86_FEATURE_AVX2) ||
!boot_cpu_has(X86_FEATURE_AVX512F) ||
!boot_cpu_has(X86_FEATURE_AVX512VL) ||
- !boot_cpu_has(X86_FEATURE_GFNI) ||
- !boot_cpu_has(X86_FEATURE_OSXSAVE)) {
+ !boot_cpu_has(X86_FEATURE_GFNI)) {
pr_info("AVX512/GFNI instructions are not detected.\n");
return -ENODEV;
}
- if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM |
- XFEATURE_MASK_AVX512, &feature_name)) {
- pr_info("CPU feature '%s' is not supported.\n", feature_name);
- return -ENODEV;
- }
-
aria_ops.aria_encrypt_16way = aria_aesni_avx_gfni_encrypt_16way;
aria_ops.aria_decrypt_16way = aria_aesni_avx_gfni_decrypt_16way;
aria_ops.aria_ctr_crypt_16way = aria_aesni_avx_gfni_ctr_crypt_16way;
aria_ops.aria_encrypt_32way = aria_aesni_avx2_gfni_encrypt_32way;
aria_ops.aria_decrypt_32way = aria_aesni_avx2_gfni_decrypt_32way;
diff --git a/arch/x86/crypto/camellia_aesni_avx2_glue.c b/arch/x86/crypto/camellia_aesni_avx2_glue.c
index 2d2f4e16537c..073fa3bb8388 100644
--- a/arch/x86/crypto/camellia_aesni_avx2_glue.c
+++ b/arch/x86/crypto/camellia_aesni_avx2_glue.c
@@ -95,26 +95,17 @@ static struct skcipher_alg camellia_algs[] = {
},
};
static int __init camellia_aesni_init(void)
{
- const char *feature_name;
-
if (!boot_cpu_has(X86_FEATURE_AVX) ||
!boot_cpu_has(X86_FEATURE_AVX2) ||
- !boot_cpu_has(X86_FEATURE_AES) ||
- !boot_cpu_has(X86_FEATURE_OSXSAVE)) {
+ !boot_cpu_has(X86_FEATURE_AES)) {
pr_info("AVX2 or AES-NI instructions are not detected.\n");
return -ENODEV;
}
- if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
- &feature_name)) {
- pr_info("CPU feature '%s' is not supported.\n", feature_name);
- return -ENODEV;
- }
-
return crypto_register_skciphers(camellia_algs,
ARRAY_SIZE(camellia_algs));
}
static void __exit camellia_aesni_fini(void)
diff --git a/arch/x86/crypto/camellia_aesni_avx_glue.c b/arch/x86/crypto/camellia_aesni_avx_glue.c
index 5c321f255eb7..872e5e07220f 100644
--- a/arch/x86/crypto/camellia_aesni_avx_glue.c
+++ b/arch/x86/crypto/camellia_aesni_avx_glue.c
@@ -96,25 +96,16 @@ static struct skcipher_alg camellia_algs[] = {
}
};
static int __init camellia_aesni_init(void)
{
- const char *feature_name;
-
if (!boot_cpu_has(X86_FEATURE_AVX) ||
- !boot_cpu_has(X86_FEATURE_AES) ||
- !boot_cpu_has(X86_FEATURE_OSXSAVE)) {
+ !boot_cpu_has(X86_FEATURE_AES)) {
pr_info("AVX or AES-NI instructions are not detected.\n");
return -ENODEV;
}
- if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
- &feature_name)) {
- pr_info("CPU feature '%s' is not supported.\n", feature_name);
- return -ENODEV;
- }
-
return crypto_register_skciphers(camellia_algs,
ARRAY_SIZE(camellia_algs));
}
static void __exit camellia_aesni_fini(void)
diff --git a/arch/x86/crypto/cast5_avx_glue.c b/arch/x86/crypto/cast5_avx_glue.c
index 3aca04d43b34..5de35e863370 100644
--- a/arch/x86/crypto/cast5_avx_glue.c
+++ b/arch/x86/crypto/cast5_avx_glue.c
@@ -90,15 +90,12 @@ static struct skcipher_alg cast5_algs[] = {
}
};
static int __init cast5_init(void)
{
- const char *feature_name;
-
- if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
- &feature_name)) {
- pr_info("CPU feature '%s' is not supported.\n", feature_name);
+ if (!boot_cpu_has(X86_FEATURE_AVX)) {
+ pr_info("AVX instructions are not detected.\n");
return -ENODEV;
}
return crypto_register_skciphers(cast5_algs,
ARRAY_SIZE(cast5_algs));
diff --git a/arch/x86/crypto/cast6_avx_glue.c b/arch/x86/crypto/cast6_avx_glue.c
index c4dd28c30303..3d7ea48007bc 100644
--- a/arch/x86/crypto/cast6_avx_glue.c
+++ b/arch/x86/crypto/cast6_avx_glue.c
@@ -90,15 +90,12 @@ static struct skcipher_alg cast6_algs[] = {
},
};
static int __init cast6_init(void)
{
- const char *feature_name;
-
- if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
- &feature_name)) {
- pr_info("CPU feature '%s' is not supported.\n", feature_name);
+ if (!boot_cpu_has(X86_FEATURE_AVX)) {
+ pr_info("AVX instructions are not detected.\n");
return -ENODEV;
}
return crypto_register_skciphers(cast6_algs, ARRAY_SIZE(cast6_algs));
}
diff --git a/arch/x86/crypto/serpent_avx2_glue.c b/arch/x86/crypto/serpent_avx2_glue.c
index f5f2121b7956..72a9e2b306d6 100644
--- a/arch/x86/crypto/serpent_avx2_glue.c
+++ b/arch/x86/crypto/serpent_avx2_glue.c
@@ -91,21 +91,14 @@ static struct skcipher_alg serpent_algs[] = {
},
};
static int __init serpent_avx2_init(void)
{
- const char *feature_name;
-
- if (!boot_cpu_has(X86_FEATURE_AVX2) || !boot_cpu_has(X86_FEATURE_OSXSAVE)) {
+ if (!boot_cpu_has(X86_FEATURE_AVX2)) {
pr_info("AVX2 instructions are not detected.\n");
return -ENODEV;
}
- if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
- &feature_name)) {
- pr_info("CPU feature '%s' is not supported.\n", feature_name);
- return -ENODEV;
- }
return crypto_register_skciphers(serpent_algs,
ARRAY_SIZE(serpent_algs));
}
diff --git a/arch/x86/crypto/serpent_avx_glue.c b/arch/x86/crypto/serpent_avx_glue.c
index 9c8b3a335d5c..42c4e1569674 100644
--- a/arch/x86/crypto/serpent_avx_glue.c
+++ b/arch/x86/crypto/serpent_avx_glue.c
@@ -98,15 +98,12 @@ static struct skcipher_alg serpent_algs[] = {
},
};
static int __init serpent_init(void)
{
- const char *feature_name;
-
- if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
- &feature_name)) {
- pr_info("CPU feature '%s' is not supported.\n", feature_name);
+ if (!boot_cpu_has(X86_FEATURE_AVX)) {
+ pr_info("AVX instructions are not detected.\n");
return -ENODEV;
}
return crypto_register_skciphers(serpent_algs,
ARRAY_SIZE(serpent_algs));
diff --git a/arch/x86/crypto/sm4_aesni_avx2_glue.c b/arch/x86/crypto/sm4_aesni_avx2_glue.c
index fec0ab7a63dd..eef73894e777 100644
--- a/arch/x86/crypto/sm4_aesni_avx2_glue.c
+++ b/arch/x86/crypto/sm4_aesni_avx2_glue.c
@@ -96,26 +96,17 @@ static struct skcipher_alg sm4_aesni_avx2_skciphers[] = {
}
};
static int __init sm4_init(void)
{
- const char *feature_name;
-
if (!boot_cpu_has(X86_FEATURE_AVX) ||
!boot_cpu_has(X86_FEATURE_AVX2) ||
- !boot_cpu_has(X86_FEATURE_AES) ||
- !boot_cpu_has(X86_FEATURE_OSXSAVE)) {
+ !boot_cpu_has(X86_FEATURE_AES)) {
pr_info("AVX2 or AES-NI instructions are not detected.\n");
return -ENODEV;
}
- if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
- &feature_name)) {
- pr_info("CPU feature '%s' is not supported.\n", feature_name);
- return -ENODEV;
- }
-
return crypto_register_skciphers(sm4_aesni_avx2_skciphers,
ARRAY_SIZE(sm4_aesni_avx2_skciphers));
}
static void __exit sm4_exit(void)
diff --git a/arch/x86/crypto/sm4_aesni_avx_glue.c b/arch/x86/crypto/sm4_aesni_avx_glue.c
index 88caf418a06f..ed383da5ff46 100644
--- a/arch/x86/crypto/sm4_aesni_avx_glue.c
+++ b/arch/x86/crypto/sm4_aesni_avx_glue.c
@@ -312,25 +312,16 @@ static struct skcipher_alg sm4_aesni_avx_skciphers[] = {
}
};
static int __init sm4_init(void)
{
- const char *feature_name;
-
if (!boot_cpu_has(X86_FEATURE_AVX) ||
- !boot_cpu_has(X86_FEATURE_AES) ||
- !boot_cpu_has(X86_FEATURE_OSXSAVE)) {
+ !boot_cpu_has(X86_FEATURE_AES)) {
pr_info("AVX or AES-NI instructions are not detected.\n");
return -ENODEV;
}
- if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
- &feature_name)) {
- pr_info("CPU feature '%s' is not supported.\n", feature_name);
- return -ENODEV;
- }
-
return crypto_register_skciphers(sm4_aesni_avx_skciphers,
ARRAY_SIZE(sm4_aesni_avx_skciphers));
}
static void __exit sm4_exit(void)
diff --git a/arch/x86/crypto/twofish_avx_glue.c b/arch/x86/crypto/twofish_avx_glue.c
index 9e20db013750..985bc54a2340 100644
--- a/arch/x86/crypto/twofish_avx_glue.c
+++ b/arch/x86/crypto/twofish_avx_glue.c
@@ -100,14 +100,12 @@ static struct skcipher_alg twofish_algs[] = {
},
};
static int __init twofish_init(void)
{
- const char *feature_name;
-
- if (!cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, &feature_name)) {
- pr_info("CPU feature '%s' is not supported.\n", feature_name);
+ if (!boot_cpu_has(X86_FEATURE_AVX)) {
+ pr_info("AVX instructions are not detected.\n");
return -ENODEV;
}
return crypto_register_skciphers(twofish_algs,
ARRAY_SIZE(twofish_algs));
--
2.54.0
* [PATCH 4/8] lib/crypto: x86: Stop using cpu_has_xfeatures()
2026-06-26 4:37 [PATCH 0/8] x86: Remove cpu_has_xfeatures() and add AVX-512 xor_gen() Eric Biggers
` (2 preceding siblings ...)
2026-06-26 4:37 ` [PATCH 3/8] crypto: x86 - Stop using cpu_has_xfeatures() Eric Biggers
@ 2026-06-26 4:37 ` Eric Biggers
2026-06-26 4:37 ` [PATCH 5/8] lib/crc: " Eric Biggers
` (3 subsequent siblings)
7 siblings, 0 replies; 16+ messages in thread
From: Eric Biggers @ 2026-06-26 4:37 UTC (permalink / raw)
To: x86
Cc: linux-um, linux-raid, linux-crypto, linux-kernel,
Christoph Hellwig, Andrew Morton, Eric Biggers
Checking both boot_cpu_has() and cpu_has_xfeatures() has never really
been needed in practice, and it's never been universally done (e.g.,
lib/raid/ omits cpu_has_xfeatures()). Nevertheless, both x86 and UML
now explicitly clear the AVX and AVX-512 flags if their xfeatures are
missing, which should remove any remaining doubts.
Thus, remove all the calls to cpu_has_xfeatures().
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
---
lib/crypto/x86/blake2s.h | 4 +---
lib/crypto/x86/chacha.h | 3 +--
lib/crypto/x86/nh.h | 4 +---
lib/crypto/x86/poly1305.h | 7 ++-----
lib/crypto/x86/sha1.h | 4 +---
lib/crypto/x86/sha256.h | 4 +---
lib/crypto/x86/sha512.h | 3 +--
lib/crypto/x86/sm3.h | 3 +--
8 files changed, 9 insertions(+), 23 deletions(-)
diff --git a/lib/crypto/x86/blake2s.h b/lib/crypto/x86/blake2s.h
index f8eed6cb042e..0f7c51f055c8 100644
--- a/lib/crypto/x86/blake2s.h
+++ b/lib/crypto/x86/blake2s.h
@@ -53,10 +53,8 @@ static void blake2s_mod_init_arch(void)
static_branch_enable(&blake2s_use_ssse3);
if (boot_cpu_has(X86_FEATURE_AVX) &&
boot_cpu_has(X86_FEATURE_AVX2) &&
boot_cpu_has(X86_FEATURE_AVX512F) &&
- boot_cpu_has(X86_FEATURE_AVX512VL) &&
- cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM |
- XFEATURE_MASK_AVX512, NULL))
+ boot_cpu_has(X86_FEATURE_AVX512VL))
static_branch_enable(&blake2s_use_avx512);
}
diff --git a/lib/crypto/x86/chacha.h b/lib/crypto/x86/chacha.h
index 10cf8f1c569d..c79562aac56b 100644
--- a/lib/crypto/x86/chacha.h
+++ b/lib/crypto/x86/chacha.h
@@ -163,12 +163,11 @@ static void chacha_mod_init_arch(void)
return;
static_branch_enable(&chacha_use_simd);
if (boot_cpu_has(X86_FEATURE_AVX) &&
- boot_cpu_has(X86_FEATURE_AVX2) &&
- cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL)) {
+ boot_cpu_has(X86_FEATURE_AVX2)) {
static_branch_enable(&chacha_use_avx2);
if (boot_cpu_has(X86_FEATURE_AVX512VL) &&
boot_cpu_has(X86_FEATURE_AVX512BW)) /* kmovq */
static_branch_enable(&chacha_use_avx512vl);
diff --git a/lib/crypto/x86/nh.h b/lib/crypto/x86/nh.h
index 83361c2e9783..342636dcb750 100644
--- a/lib/crypto/x86/nh.h
+++ b/lib/crypto/x86/nh.h
@@ -35,11 +35,9 @@ static bool nh_arch(const u32 *key, const u8 *message, size_t message_len,
#define nh_mod_init_arch nh_mod_init_arch
static void nh_mod_init_arch(void)
{
if (boot_cpu_has(X86_FEATURE_XMM2)) {
static_branch_enable(&have_sse2);
- if (boot_cpu_has(X86_FEATURE_AVX2) &&
- cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
- NULL))
+ if (boot_cpu_has(X86_FEATURE_AVX2))
static_branch_enable(&have_avx2);
}
}
diff --git a/lib/crypto/x86/poly1305.h b/lib/crypto/x86/poly1305.h
index ee92e3740a78..b061b9926fa5 100644
--- a/lib/crypto/x86/poly1305.h
+++ b/lib/crypto/x86/poly1305.h
@@ -141,18 +141,15 @@ static void poly1305_emit(const struct poly1305_state *ctx,
}
#define poly1305_mod_init_arch poly1305_mod_init_arch
static void poly1305_mod_init_arch(void)
{
- if (boot_cpu_has(X86_FEATURE_AVX) &&
- cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL))
+ if (boot_cpu_has(X86_FEATURE_AVX))
static_branch_enable(&poly1305_use_avx);
- if (boot_cpu_has(X86_FEATURE_AVX) && boot_cpu_has(X86_FEATURE_AVX2) &&
- cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL))
+ if (boot_cpu_has(X86_FEATURE_AVX) && boot_cpu_has(X86_FEATURE_AVX2))
static_branch_enable(&poly1305_use_avx2);
if (boot_cpu_has(X86_FEATURE_AVX) && boot_cpu_has(X86_FEATURE_AVX2) &&
boot_cpu_has(X86_FEATURE_AVX512F) &&
- cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM | XFEATURE_MASK_AVX512, NULL) &&
/* Skylake downclocks unacceptably much when using zmm, but later generations are fast. */
boot_cpu_data.x86_vfm != INTEL_SKYLAKE_X)
static_branch_enable(&poly1305_use_avx512);
}
diff --git a/lib/crypto/x86/sha1.h b/lib/crypto/x86/sha1.h
index c48a0131fd12..6aff433466e7 100644
--- a/lib/crypto/x86/sha1.h
+++ b/lib/crypto/x86/sha1.h
@@ -57,13 +57,11 @@ static void sha1_blocks(struct sha1_block_state *state,
#define sha1_mod_init_arch sha1_mod_init_arch
static void sha1_mod_init_arch(void)
{
if (boot_cpu_has(X86_FEATURE_SHA_NI)) {
static_call_update(sha1_blocks_x86, sha1_blocks_ni);
- } else if (cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
- NULL) &&
- boot_cpu_has(X86_FEATURE_AVX)) {
+ } else if (boot_cpu_has(X86_FEATURE_AVX)) {
if (boot_cpu_has(X86_FEATURE_AVX2) &&
boot_cpu_has(X86_FEATURE_BMI1) &&
boot_cpu_has(X86_FEATURE_BMI2))
static_call_update(sha1_blocks_x86, sha1_blocks_avx2);
else
diff --git a/lib/crypto/x86/sha256.h b/lib/crypto/x86/sha256.h
index 0ee69d8e39fe..e98ffdaf4b14 100644
--- a/lib/crypto/x86/sha256.h
+++ b/lib/crypto/x86/sha256.h
@@ -102,13 +102,11 @@ static void sha256_mod_init_arch(void)
static_branch_enable(&have_sha_ni);
} else if (IS_ENABLED(CONFIG_CPU_SUP_ZHAOXIN) &&
boot_cpu_has(X86_FEATURE_PHE_EN) &&
boot_cpu_data.x86 >= 0x07) {
static_call_update(sha256_blocks_x86, sha256_blocks_phe);
- } else if (cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM,
- NULL) &&
- boot_cpu_has(X86_FEATURE_AVX)) {
+ } else if (boot_cpu_has(X86_FEATURE_AVX)) {
if (boot_cpu_has(X86_FEATURE_AVX2) &&
boot_cpu_has(X86_FEATURE_BMI2))
static_call_update(sha256_blocks_x86,
sha256_blocks_avx2);
else
diff --git a/lib/crypto/x86/sha512.h b/lib/crypto/x86/sha512.h
index 0213c70cedd0..4e177b4606bd 100644
--- a/lib/crypto/x86/sha512.h
+++ b/lib/crypto/x86/sha512.h
@@ -35,12 +35,11 @@ static void sha512_blocks(struct sha512_block_state *state,
}
#define sha512_mod_init_arch sha512_mod_init_arch
static void sha512_mod_init_arch(void)
{
- if (cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL) &&
- boot_cpu_has(X86_FEATURE_AVX)) {
+ if (boot_cpu_has(X86_FEATURE_AVX)) {
if (boot_cpu_has(X86_FEATURE_AVX2) &&
boot_cpu_has(X86_FEATURE_BMI2))
static_call_update(sha512_blocks_x86,
sha512_blocks_avx2);
else
diff --git a/lib/crypto/x86/sm3.h b/lib/crypto/x86/sm3.h
index 3834780f2f6a..e06d4a22e4fa 100644
--- a/lib/crypto/x86/sm3.h
+++ b/lib/crypto/x86/sm3.h
@@ -31,9 +31,8 @@ static void sm3_blocks(struct sm3_block_state *state,
}
#define sm3_mod_init_arch sm3_mod_init_arch
static void sm3_mod_init_arch(void)
{
- if (boot_cpu_has(X86_FEATURE_AVX) && boot_cpu_has(X86_FEATURE_BMI2) &&
- cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL))
+ if (boot_cpu_has(X86_FEATURE_AVX) && boot_cpu_has(X86_FEATURE_BMI2))
static_call_update(sm3_blocks_x86, sm3_blocks_avx);
}
--
2.54.0
* [PATCH 5/8] lib/crc: x86: Stop using cpu_has_xfeatures()
2026-06-26 4:37 [PATCH 0/8] x86: Remove cpu_has_xfeatures() and add AVX-512 xor_gen() Eric Biggers
` (3 preceding siblings ...)
2026-06-26 4:37 ` [PATCH 4/8] lib/crypto: x86: " Eric Biggers
@ 2026-06-26 4:37 ` Eric Biggers
2026-06-26 4:37 ` [PATCH 6/8] x86/fpu: Remove cpu_has_xfeatures() Eric Biggers
` (2 subsequent siblings)
7 siblings, 0 replies; 16+ messages in thread
From: Eric Biggers @ 2026-06-26 4:37 UTC (permalink / raw)
To: x86
Cc: linux-um, linux-raid, linux-crypto, linux-kernel,
Christoph Hellwig, Andrew Morton, Eric Biggers
Checking both boot_cpu_has() and cpu_has_xfeatures() has never really
been needed in practice, and it's never been universally done (e.g.,
lib/raid/ omits cpu_has_xfeatures()). Nevertheless, both x86 and UML
now explicitly clear the AVX and AVX-512 flags if their xfeatures are
missing, which should remove any remaining doubts.
Thus, remove all the calls to cpu_has_xfeatures().
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
---
lib/crc/x86/crc-pclmul-template.h | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/lib/crc/x86/crc-pclmul-template.h b/lib/crc/x86/crc-pclmul-template.h
index 02744831c6fa..893119bb7c07 100644
--- a/lib/crc/x86/crc-pclmul-template.h
+++ b/lib/crc/x86/crc-pclmul-template.h
@@ -25,20 +25,18 @@ crc_t prefix##_vpclmul_avx512(crc_t crc, const u8 *p, size_t len, \
DEFINE_STATIC_CALL(prefix##_pclmul, prefix##_pclmul_sse)
static inline bool have_vpclmul(void)
{
return boot_cpu_has(X86_FEATURE_VPCLMULQDQ) &&
- boot_cpu_has(X86_FEATURE_AVX2) &&
- cpu_has_xfeatures(XFEATURE_MASK_YMM, NULL);
+ boot_cpu_has(X86_FEATURE_AVX2);
}
static inline bool have_avx512(void)
{
return boot_cpu_has(X86_FEATURE_AVX512BW) &&
boot_cpu_has(X86_FEATURE_AVX512VL) &&
- !boot_cpu_has(X86_FEATURE_PREFER_YMM) &&
- cpu_has_xfeatures(XFEATURE_MASK_AVX512, NULL);
+ !boot_cpu_has(X86_FEATURE_PREFER_YMM);
}
/*
* Call a [V]PCLMULQDQ optimized CRC function if the data length is at least 16
* bytes, the CPU has PCLMULQDQ support, and the current context may use SIMD.
--
2.54.0
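The idea behind patches 1-5 can be sketched outside the kernel: once boot code clears a CPU feature flag whose xstate components are missing, callers only need the single flag test. This is a minimal userspace model, not the kernel implementation; `FEAT_*`, `XF_*`, and `clear_unusable_features()` are hypothetical names invented for illustration. The `XF_*` bit positions follow the architectural XCR0 layout.

```c
#include <stdint.h>

/* Hypothetical feature flags, standing in for X86_FEATURE_* bits. */
enum {
	FEAT_AVX      = 1u << 0,
	FEAT_AVX2     = 1u << 1,
	FEAT_AVX512BW = 1u << 2,
};

/* Hypothetical xstate component bits, laid out like XCR0. */
enum {
	XF_SSE       = 1u << 1,
	XF_YMM       = 1u << 2,
	XF_OPMASK    = 1u << 5,
	XF_ZMM_HI256 = 1u << 6,
	XF_HI16_ZMM  = 1u << 7,
};

/*
 * Model of the boot-time check: drop any feature flag whose required
 * xstate components are not all enabled, so later code can test the
 * feature flag alone.
 */
static uint32_t clear_unusable_features(uint32_t features, uint64_t xfeatures)
{
	const uint64_t avx_need = XF_SSE | XF_YMM;
	const uint64_t avx512_need =
		avx_need | XF_OPMASK | XF_ZMM_HI256 | XF_HI16_ZMM;

	if ((xfeatures & avx_need) != avx_need)
		/* Without YMM state, nothing built on AVX is usable. */
		features &= ~(uint32_t)(FEAT_AVX | FEAT_AVX2 | FEAT_AVX512BW);
	else if ((xfeatures & avx512_need) != avx512_need)
		/* AVX still works; only the AVX-512 flag is dropped. */
		features &= ~(uint32_t)FEAT_AVX512BW;

	return features;
}
```

With the flags filtered this way, a check like have_avx512() above reduces to plain feature-flag tests, which is what the diff does.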
* [PATCH 6/8] x86/fpu: Remove cpu_has_xfeatures()
2026-06-26 4:37 [PATCH 0/8] x86: Remove cpu_has_xfeatures() and add AVX-512 xor_gen() Eric Biggers
` (4 preceding siblings ...)
2026-06-26 4:37 ` [PATCH 5/8] lib/crc: " Eric Biggers
@ 2026-06-26 4:37 ` Eric Biggers
2026-06-26 4:37 ` [PATCH 7/8] lib/raid/xor: x86: Remove redundant X86_FEATURE_OSXSAVE check Eric Biggers
2026-06-26 4:37 ` [PATCH 8/8] lib/raid/xor: x86: Add AVX-512 optimized xor_gen() Eric Biggers
7 siblings, 0 replies; 16+ messages in thread
From: Eric Biggers @ 2026-06-26 4:37 UTC (permalink / raw)
To: x86
Cc: linux-um, linux-raid, linux-crypto, linux-kernel,
Christoph Hellwig, Andrew Morton, Eric Biggers
The only remaining caller of cpu_has_xfeatures() is
print_xstate_features(), which uses it only to check and get the name of
a single feature.
Remove it and just inline the needed code into print_xstate_features().
This also makes the "unknown xstate feature" entry at index XFEATURE_MAX
of xfeature_names[] unnecessary, so remove that too.
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
---
arch/x86/include/asm/fpu/api.h | 9 -------
arch/x86/kernel/fpu/xstate.c | 44 +++-------------------------------
2 files changed, 3 insertions(+), 50 deletions(-)
diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h
index 90c63fe19c0f..cfed8b24d64f 100644
--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -97,19 +97,10 @@ static inline void fpregs_assert_state_consistent(void) { }
/*
* Load the task FPU state before returning to userspace.
*/
extern void switch_fpu_return(void);
-/*
- * Query the presence of one or more xfeatures. Works on any legacy CPU as well.
- *
- * If 'feature_name' is set then put a human-readable description of
- * the feature there as well - this can be used to print error (or success)
- * messages.
- */
-extern int cpu_has_xfeatures(u64 xfeatures_mask, const char **feature_name);
-
/* Trap handling */
extern int fpu__exception_code(struct fpu *fpu, int trap_nr);
extern void fpu_sync_fpstate(struct fpu *fpu);
extern void fpu_reset_from_exception_fixup(void);
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 7f7e62e4ebc5..c6f0264f957c 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -64,12 +64,12 @@ static const char *xfeature_names[] =
"unknown xstate feature",
"unknown xstate feature",
"AMX Tile config",
"AMX Tile data",
"APX registers",
- "unknown xstate feature",
};
+static_assert(ARRAY_SIZE(xfeature_names) == XFEATURE_MAX);
static unsigned short xsave_cpuid_features[] __initdata = {
[XFEATURE_FP] = X86_FEATURE_FPU,
[XFEATURE_SSE] = X86_FEATURE_XMM,
[XFEATURE_YMM] = X86_FEATURE_AVX,
@@ -120,48 +120,10 @@ static inline unsigned int next_xfeature_order(unsigned int i, u64 mask)
i++)
#define XSTATE_FLAG_SUPERVISOR BIT(0)
#define XSTATE_FLAG_ALIGNED64 BIT(1)
-/*
- * Return whether the system supports a given xfeature.
- *
- * Also return the name of the (most advanced) feature that the caller requested:
- */
-int cpu_has_xfeatures(u64 xfeatures_needed, const char **feature_name)
-{
- u64 xfeatures_missing = xfeatures_needed & ~fpu_kernel_cfg.max_features;
-
- if (unlikely(feature_name)) {
- long xfeature_idx, max_idx;
- u64 xfeatures_print;
- /*
- * So we use FLS here to be able to print the most advanced
- * feature that was requested but is missing. So if a driver
- * asks about "XFEATURE_MASK_SSE | XFEATURE_MASK_YMM" we'll print the
- * missing AVX feature - this is the most informative message
- * to users:
- */
- if (xfeatures_missing)
- xfeatures_print = xfeatures_missing;
- else
- xfeatures_print = xfeatures_needed;
-
- xfeature_idx = fls64(xfeatures_print)-1;
- max_idx = ARRAY_SIZE(xfeature_names)-1;
- xfeature_idx = min(xfeature_idx, max_idx);
-
- *feature_name = xfeature_names[xfeature_idx];
- }
-
- if (xfeatures_missing)
- return 0;
-
- return 1;
-}
-EXPORT_SYMBOL_GPL(cpu_has_xfeatures);
-
static bool xfeature_is_aligned64(int xfeature_nr)
{
return xstate_flags[xfeature_nr] & XSTATE_FLAG_ALIGNED64;
}
@@ -300,13 +262,13 @@ static void __init print_xstate_features(void)
{
int i;
for (i = 0; i < XFEATURE_MAX; i++) {
u64 mask = BIT_ULL(i);
- const char *name;
+ const char *name = xfeature_names[i];
- if (cpu_has_xfeatures(mask, &name))
+ if (fpu_kernel_cfg.max_features & mask)
pr_info("x86/fpu: Supporting XSAVE feature 0x%03Lx: '%s'\n", mask, name);
}
}
/*
--
2.54.0
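The simplification in print_xstate_features() can be illustrated with a small userspace model: with one name per feature index and a compile-time size check on the table, the loop only needs a bit test and a direct table lookup. This is a sketch under stated assumptions; `MODEL_XFEATURE_MAX`, `model_xfeature_names`, and `print_supported()` are hypothetical names, and the table is shortened to four illustrative entries.

```c
#include <stdint.h>
#include <stdio.h>

#define MODEL_XFEATURE_MAX 4	/* hypothetical; the kernel's XFEATURE_MAX is larger */

/* One name per feature index; no trailing "unknown" catch-all entry. */
static const char *const model_xfeature_names[] = {
	"x87",
	"SSE",
	"AVX",
	"MPX bounds",
};

/* Compile-time guarantee that every index below the maximum has a name. */
_Static_assert(sizeof(model_xfeature_names) / sizeof(model_xfeature_names[0]) ==
	       MODEL_XFEATURE_MAX, "name table must cover every feature index");

/* Print each feature whose bit is set in max_features; return the count. */
static int print_supported(uint64_t max_features)
{
	int printed = 0;

	for (int i = 0; i < MODEL_XFEATURE_MAX; i++) {
		uint64_t mask = UINT64_C(1) << i;

		if (max_features & mask) {
			printf("Supporting XSAVE feature 0x%03llx: '%s'\n",
			       (unsigned long long)mask, model_xfeature_names[i]);
			printed++;
		}
	}
	return printed;
}
```

Because the loop index never exceeds the table size, the clamping that cpu_has_xfeatures() did with min() is no longer needed in this model.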
* [PATCH 7/8] lib/raid/xor: x86: Remove redundant X86_FEATURE_OSXSAVE check
2026-06-26 4:37 [PATCH 0/8] x86: Remove cpu_has_xfeatures() and add AVX-512 xor_gen() Eric Biggers
` (5 preceding siblings ...)
2026-06-26 4:37 ` [PATCH 6/8] x86/fpu: Remove cpu_has_xfeatures() Eric Biggers
@ 2026-06-26 4:37 ` Eric Biggers
2026-06-26 5:40 ` Christoph Hellwig
2026-06-26 4:37 ` [PATCH 8/8] lib/raid/xor: x86: Add AVX-512 optimized xor_gen() Eric Biggers
7 siblings, 1 reply; 16+ messages in thread
From: Eric Biggers @ 2026-06-26 4:37 UTC (permalink / raw)
To: x86
Cc: linux-um, linux-raid, linux-crypto, linux-kernel,
Christoph Hellwig, Andrew Morton, Eric Biggers
X86_FEATURE_AVX implies X86_FEATURE_OSXSAVE already.
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
---
lib/raid/xor/x86/xor_arch.h | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/lib/raid/xor/x86/xor_arch.h b/lib/raid/xor/x86/xor_arch.h
index 99fe85a213c6..991abe3f4bbd 100644
--- a/lib/raid/xor/x86/xor_arch.h
+++ b/lib/raid/xor/x86/xor_arch.h
@@ -16,12 +16,11 @@ extern struct xor_block_template xor_block_avx;
*
* 32-bit without MMX can fall back to the generic routines.
*/
static __always_inline void __init arch_xor_init(void)
{
- if (boot_cpu_has(X86_FEATURE_AVX) &&
- boot_cpu_has(X86_FEATURE_OSXSAVE)) {
+ if (boot_cpu_has(X86_FEATURE_AVX)) {
xor_force(&xor_block_avx);
} else if (IS_ENABLED(CONFIG_X86_64) || boot_cpu_has(X86_FEATURE_XMM)) {
xor_register(&xor_block_sse);
xor_register(&xor_block_sse_pf64);
} else if (boot_cpu_has(X86_FEATURE_MMX)) {
--
2.54.0
* [PATCH 8/8] lib/raid/xor: x86: Add AVX-512 optimized xor_gen()
2026-06-26 4:37 [PATCH 0/8] x86: Remove cpu_has_xfeatures() and add AVX-512 xor_gen() Eric Biggers
` (6 preceding siblings ...)
2026-06-26 4:37 ` [PATCH 7/8] lib/raid/xor: x86: Remove redundant X86_FEATURE_OSXSAVE check Eric Biggers
@ 2026-06-26 4:37 ` Eric Biggers
2026-06-26 5:47 ` Christoph Hellwig
7 siblings, 1 reply; 16+ messages in thread
From: Eric Biggers @ 2026-06-26 4:37 UTC (permalink / raw)
To: x86
Cc: linux-um, linux-raid, linux-crypto, linux-kernel,
Christoph Hellwig, Andrew Morton, Eric Biggers, David Laight
Add an implementation of xor_gen() using AVX-512.
It uses 512-bit vectors, i.e. ZMM registers. It also uses the
vpternlogq instruction to do three-input XORs when applicable.
It's enabled on x86_64 CPUs that have AVX512F && !PREFER_YMM. In
practice that means:
- AMD Zen 4 and later (client and server)
- Intel Sapphire Rapids and later (server)
- Intel Rocket Lake (client)
- Intel Nova Lake and later (client)
The !PREFER_YMM condition excludes the older AVX-512 implementations in
Intel Skylake Server and Intel Ice Lake. They could run this code, but
they're known to have overly-eager downclocking when ZMM registers are
used. This is the same policy that the crypto and CRC code uses.
Benchmark on AMD Ryzen 9 9950X (Zen 5):
src_cnt avx avx512 Improvement
======= ========== ========== ===========
1 56353 MB/s 75388 MB/s 33%
2 54274 MB/s 68409 MB/s 26%
3 44649 MB/s 64042 MB/s 43%
4 41315 MB/s 55002 MB/s 33%
Reviewed-by: David Laight <david.laight.linux@gmail.com>
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
---
lib/raid/xor/Makefile | 2 +-
lib/raid/xor/x86/xor-avx512.c | 121 ++++++++++++++++++++++++++++++++++
lib/raid/xor/x86/xor_arch.h | 23 ++++---
3 files changed, 135 insertions(+), 11 deletions(-)
create mode 100644 lib/raid/xor/x86/xor-avx512.c
diff --git a/lib/raid/xor/Makefile b/lib/raid/xor/Makefile
index e8ecec3c09f9..4a0e5c6d8298 100644
--- a/lib/raid/xor/Makefile
+++ b/lib/raid/xor/Makefile
@@ -27,11 +27,11 @@ xor-$(CONFIG_ALTIVEC) += powerpc/xor_vmx.o powerpc/xor_vmx_glue.o
xor-$(CONFIG_RISCV_ISA_V) += riscv/xor.o riscv/xor-glue.o
xor-$(CONFIG_SPARC32) += sparc/xor-sparc32.o
xor-$(CONFIG_SPARC64) += sparc/xor-sparc64.o sparc/xor-sparc64-glue.o
xor-$(CONFIG_S390) += s390/xor.o
xor-$(CONFIG_X86_32) += x86/xor-avx.o x86/xor-sse.o x86/xor-mmx.o
-xor-$(CONFIG_X86_64) += x86/xor-avx.o x86/xor-sse.o
+xor-$(CONFIG_X86_64) += x86/xor-avx.o x86/xor-sse.o x86/xor-avx512.o
obj-y += tests/
CFLAGS_xor-neon.o += $(CC_FLAGS_FPU) -I$(src)/$(SRCARCH)
CFLAGS_REMOVE_xor-neon.o += $(CC_FLAGS_NO_FPU)
diff --git a/lib/raid/xor/x86/xor-avx512.c b/lib/raid/xor/x86/xor-avx512.c
new file mode 100644
index 000000000000..17f57900d827
--- /dev/null
+++ b/lib/raid/xor/x86/xor-avx512.c
@@ -0,0 +1,121 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * AVX-512 optimized implementation of xor_gen()
+ *
+ * Copyright 2026 Google LLC
+ */
+
+#include <linux/types.h>
+#include <asm/fpu/api.h>
+#include "xor_impl.h"
+#include "xor_arch.h"
+
+/*
+ * Implementation notes:
+ *
+ * Unrolling by the number of buffers (2-5) is very important.
+ *
+ * Unrolling by length is less important, especially when using register-indexed
+ * addressing with negative indices from the end of the buffers. That approach
+ * results in just two loop control instructions being needed per iteration,
+ * regardless of the number of buffers.
+ *
+ * In fact, benchmarks showed that the 2 and 3 buffer cases require only 2x
+ * unrolling by length, while the 4 and 5 buffer cases don't require any
+ * unrolling by length. Benchmarks also showed that the register-indexed
+ * addressing isn't a bottleneck either; i.e., we can't do any better by
+ * incrementing the pointers as we go along, even with more unrolling.
+ */
+
+static void xor_avx512_2(long bytes, u8 *p1, const u8 *p2)
+{
+ long i = -bytes;
+
+ asm volatile("1: vmovdqa64 (%1,%0), %%zmm0\n"
+ "vmovdqa64 64(%1,%0), %%zmm1\n"
+ "vpxorq (%2,%0), %%zmm0, %%zmm0\n"
+ "vpxorq 64(%2,%0), %%zmm1, %%zmm1\n"
+ "vmovdqa64 %%zmm0, (%1,%0)\n"
+ "vmovdqa64 %%zmm1, 64(%1,%0)\n"
+ "add $128, %0\n"
+ "jnz 1b\n"
+ : "+&r"(i)
+ : "r"(p1 + bytes), "r"(p2 + bytes)
+ : "memory", "cc");
+}
+
+static void xor_avx512_3(long bytes, u8 *p1, const u8 *p2, const u8 *p3)
+{
+ long i = -bytes;
+
+ asm volatile("1: vmovdqa64 (%1,%0), %%zmm0\n"
+ "vmovdqa64 64(%1,%0), %%zmm1\n"
+ "vmovdqa64 (%2,%0), %%zmm2\n"
+ "vmovdqa64 64(%2,%0), %%zmm3\n"
+ "vpternlogq $0x96, (%3,%0), %%zmm2, %%zmm0\n"
+ "vpternlogq $0x96, 64(%3,%0), %%zmm3, %%zmm1\n"
+ "vmovdqa64 %%zmm0, (%1,%0)\n"
+ "vmovdqa64 %%zmm1, 64(%1,%0)\n"
+ "add $128, %0\n"
+ "jnz 1b\n"
+ : "+&r"(i)
+ : "r"(p1 + bytes), "r"(p2 + bytes), "r"(p3 + bytes)
+ : "memory", "cc");
+}
+
+static void xor_avx512_4(long bytes, u8 *p1, const u8 *p2, const u8 *p3,
+ const u8 *p4)
+{
+ long i = -bytes;
+
+ asm volatile("1: vmovdqa64 (%1,%0), %%zmm0\n"
+ "vmovdqa64 (%2,%0), %%zmm1\n"
+ "vpxorq (%3,%0), %%zmm0, %%zmm0\n"
+ "vpternlogq $0x96, (%4,%0), %%zmm1, %%zmm0\n"
+ "vmovdqa64 %%zmm0, (%1,%0)\n"
+ "add $64, %0\n"
+ "jnz 1b\n"
+ : "+&r"(i)
+ : "r"(p1 + bytes), "r"(p2 + bytes), "r"(p3 + bytes),
+ "r"(p4 + bytes)
+ : "memory", "cc");
+}
+
+static void xor_avx512_5(long bytes, u8 *p1, const u8 *p2, const u8 *p3,
+ const u8 *p4, const u8 *p5)
+{
+ long i = -bytes;
+
+ asm volatile("1: vmovdqa64 (%1,%0), %%zmm0\n"
+ "vmovdqa64 (%2,%0), %%zmm1\n"
+ "vpternlogq $0x96, (%3,%0), %%zmm1, %%zmm0\n"
+ "vmovdqa64 (%4,%0), %%zmm1\n"
+ "vpternlogq $0x96, (%5,%0), %%zmm1, %%zmm0\n"
+ "vmovdqa64 %%zmm0, (%1,%0)\n"
+ "add $64, %0\n"
+ "jnz 1b\n"
+ : "+&r"(i)
+ : "r"(p1 + bytes), "r"(p2 + bytes), "r"(p3 + bytes),
+ "r"(p4 + bytes), "r"(p5 + bytes)
+ : "memory", "cc");
+}
+
+DO_XOR_BLOCKS(avx512_inner, xor_avx512_2, xor_avx512_3, xor_avx512_4,
+ xor_avx512_5);
+
+/*
+ * Preconditions: bytes is a nonzero multiple of 512, and all buffers are
+ * 64-byte aligned.
+ */
+static void xor_gen_avx512(void *dest, void **srcs, unsigned int src_cnt,
+ unsigned int bytes)
+{
+ kernel_fpu_begin();
+ xor_gen_avx512_inner(dest, srcs, src_cnt, bytes);
+ kernel_fpu_end();
+}
+
+struct xor_block_template xor_block_avx512 = {
+ .name = "avx512",
+ .xor_gen = xor_gen_avx512,
+};
diff --git a/lib/raid/xor/x86/xor_arch.h b/lib/raid/xor/x86/xor_arch.h
index 991abe3f4bbd..d5e192b8793f 100644
--- a/lib/raid/xor/x86/xor_arch.h
+++ b/lib/raid/xor/x86/xor_arch.h
@@ -4,25 +4,28 @@
extern struct xor_block_template xor_block_pII_mmx;
extern struct xor_block_template xor_block_p5_mmx;
extern struct xor_block_template xor_block_sse;
extern struct xor_block_template xor_block_sse_pf64;
extern struct xor_block_template xor_block_avx;
+extern struct xor_block_template xor_block_avx512;
-/*
- * When SSE is available, use it as it can write around L2. We may also be able
- * to load into the L1 only depending on how the cpu deals with a load to a line
- * that is being prefetched.
- *
- * When AVX2 is available, force using it as it is better by all measures.
- *
- * 32-bit without MMX can fall back to the generic routines.
- */
static __always_inline void __init arch_xor_init(void)
{
- if (boot_cpu_has(X86_FEATURE_AVX)) {
+ if (IS_ENABLED(CONFIG_X86_64) && boot_cpu_has(X86_FEATURE_AVX512F) &&
+ !boot_cpu_has(X86_FEATURE_PREFER_YMM)) {
+ /* AVX-512 will be the best; no need to try others. */
+ /* !PREFER_YMM excludes CPUs with overly-eager downclocking. */
+ xor_force(&xor_block_avx512);
+ } else if (boot_cpu_has(X86_FEATURE_AVX)) {
+ /* AVX will be the best; no need to try others. */
xor_force(&xor_block_avx);
} else if (IS_ENABLED(CONFIG_X86_64) || boot_cpu_has(X86_FEATURE_XMM)) {
+ /*
+ * When SSE is available, use it as it can write around L2. We
+ * may also be able to load into the L1 only depending on how
+ * the cpu deals with a load to a line that is being prefetched.
+ */
xor_register(&xor_block_sse);
xor_register(&xor_block_sse_pf64);
} else if (boot_cpu_has(X86_FEATURE_MMX)) {
xor_register(&xor_block_pII_mmx);
xor_register(&xor_block_p5_mmx);
--
2.54.0
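The immediate $0x96 passed to vpternlogq in the patch above is the truth table of a three-input XOR. The instruction treats its three operands as a 3-bit index per bit position and looks up the result bit in the 8-bit immediate. The following portable C sketch derives that constant and emulates the lookup on 64-bit scalars; `ternlog_imm_for_xor3()` and `ternlog64()` are hypothetical helper names, not kernel functions.

```c
#include <stdint.h>

/*
 * Build the 8-bit truth table for a ^ b ^ c. Bit number
 * (a << 2 | b << 1 | c) of the result holds the output for inputs a, b, c.
 */
static uint8_t ternlog_imm_for_xor3(void)
{
	uint8_t imm = 0;

	for (unsigned int idx = 0; idx < 8; idx++) {
		unsigned int a = (idx >> 2) & 1;
		unsigned int b = (idx >> 1) & 1;
		unsigned int c = idx & 1;

		if (a ^ b ^ c)
			imm |= (uint8_t)(1u << idx);
	}
	return imm;
}

/* Scalar emulation of a ternary-logic lookup applied to each of 64 bits. */
static uint64_t ternlog64(uint64_t a, uint64_t b, uint64_t c, uint8_t imm)
{
	uint64_t result = 0;

	for (int bit = 0; bit < 64; bit++) {
		unsigned int idx = (unsigned int)((((a >> bit) & 1) << 2) |
						  (((b >> bit) & 1) << 1) |
						  ((c >> bit) & 1));

		result |= (uint64_t)((imm >> idx) & 1) << bit;
	}
	return result;
}
```

Since XOR is symmetric in its inputs, the same immediate works regardless of which operand is the destination, which is why the patch can use it for both the 3-buffer and 5-buffer cases.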
^ permalink raw reply related [flat|nested] 16+ messages in thread
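The implementation notes in xor-avx512.c describe register-indexed addressing with negative indices from the end of the buffers. A scalar C sketch of that loop shape follows: the pointers are biased to the buffer ends and a single index counts up from -bytes to zero, so the index update doubles as the termination test. `xor_2_negidx()` is a hypothetical name, and this byte-at-a-time version only models the control flow, not the ZMM loads and stores.

```c
#include <stdint.h>

/*
 * XOR p2 into p1 using one negative index shared by both buffers.
 * Precondition (assumed, as in the patch): bytes is nonzero.
 */
static void xor_2_negidx(long bytes, uint8_t *p1, const uint8_t *p2)
{
	uint8_t *end1 = p1 + bytes;		/* one past the last byte of p1 */
	const uint8_t *end2 = p2 + bytes;	/* one past the last byte of p2 */
	long i = -bytes;			/* runs from -bytes up to 0 */

	do {
		end1[i] ^= end2[i];
		i++;		/* in the asm, the add also sets ZF for jnz */
	} while (i != 0);
}
```

Adding more source buffers in this scheme adds more biased pointers but no extra loop-control work, which matches the note that only two loop-control instructions are needed per iteration.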
* Re: [PATCH 1/8] x86/fpu: Check for missing AVX and AVX-512 xstate bits
2026-06-26 4:37 ` [PATCH 1/8] x86/fpu: Check for missing AVX and AVX-512 xstate bits Eric Biggers
@ 2026-06-26 5:39 ` Christoph Hellwig
0 siblings, 0 replies; 16+ messages in thread
From: Christoph Hellwig @ 2026-06-26 5:39 UTC (permalink / raw)
To: Eric Biggers
Cc: x86, linux-um, linux-raid, linux-crypto, linux-kernel,
Christoph Hellwig, Andrew Morton
On Thu, Jun 25, 2026 at 09:37:24PM -0700, Eric Biggers wrote:
> If the CPU declares AVX or AVX-512 support, verify that the
> corresponding xstate bits are also set. If not, warn and clear them.
>
> This eliminates the perceived need for AVX and AVX-512 optimized code in
> the kernel to call cpu_has_xfeatures(). That has never been universally
> done, which strongly suggests that it has never really been needed in
> practice, but this should remove any remaining doubt.
I'll leave it to the x86 experts to judge whether the low-level details are
right, but the model behind this makes life so much easier. Thanks a lot!
* Re: [PATCH 7/8] lib/raid/xor: x86: Remove redundant X86_FEATURE_OSXSAVE check
2026-06-26 4:37 ` [PATCH 7/8] lib/raid/xor: x86: Remove redundant X86_FEATURE_OSXSAVE check Eric Biggers
@ 2026-06-26 5:40 ` Christoph Hellwig
0 siblings, 0 replies; 16+ messages in thread
From: Christoph Hellwig @ 2026-06-26 5:40 UTC (permalink / raw)
To: Eric Biggers
Cc: x86, linux-um, linux-raid, linux-crypto, linux-kernel,
Christoph Hellwig, Andrew Morton
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
* Re: [PATCH 8/8] lib/raid/xor: x86: Add AVX-512 optimized xor_gen()
2026-06-26 4:37 ` [PATCH 8/8] lib/raid/xor: x86: Add AVX-512 optimized xor_gen() Eric Biggers
@ 2026-06-26 5:47 ` Christoph Hellwig
2026-06-26 5:47 ` Christoph Hellwig
0 siblings, 1 reply; 16+ messages in thread
From: Christoph Hellwig @ 2026-06-26 5:47 UTC (permalink / raw)
To: Eric Biggers
Cc: x86, linux-um, linux-raid, linux-crypto, linux-kernel,
Christoph Hellwig, Andrew Morton, David Laight
On Thu, Jun 25, 2026 at 09:37:31PM -0700, Eric Biggers wrote:
> + if (IS_ENABLED(CONFIG_X86_64) && boot_cpu_has(X86_FEATURE_AVX512F) &&
> + !boot_cpu_has(X86_FEATURE_PREFER_YMM)) {
> + /* AVX-512 will be the best; no need to try others. */
> + /* !PREFER_YMM excludes CPUs with overly-eager downclocking. */
Can you turn this into a single block comment using full sentences?
Right now the two separate comments almost feel contradictory, even
if I get what you mean. While you're at it, also throw in a blurb on
why we don't bother with AVX-512 on 32-bit (number of registers, no one in
their right mind would bother running high-performance code on modern CPUs
in 32-bit mode).
* Re: [PATCH 8/8] lib/raid/xor: x86: Add AVX-512 optimized xor_gen()
2026-06-26 5:47 ` Christoph Hellwig
@ 2026-06-26 5:47 ` Christoph Hellwig
0 siblings, 0 replies; 16+ messages in thread
From: Christoph Hellwig @ 2026-06-26 5:47 UTC (permalink / raw)
To: Eric Biggers
Cc: x86, linux-um, linux-raid, linux-crypto, linux-kernel,
Andrew Morton, David Laight
On Fri, Jun 26, 2026 at 07:47:31AM +0200, Christoph Hellwig wrote:
> On Thu, Jun 25, 2026 at 09:37:31PM -0700, Eric Biggers wrote:
> > + if (IS_ENABLED(CONFIG_X86_64) && boot_cpu_has(X86_FEATURE_AVX512F) &&
> > + !boot_cpu_has(X86_FEATURE_PREFER_YMM)) {
> > + /* AVX-512 will be the best; no need to try others. */
> > + /* !PREFER_YMM excludes CPUs with overly-eager downclocking. */
>
> Can you turn this into a single block comment using full sentences?
> Right now the two separate comments almost feel contradictory, even
> if I get what you mean. While you're at it, also throw in a blurb on
> why we don't bother with AVX-512 on 32-bit (number of registers, no one in
> their right mind would bother running high-performance code on modern CPUs
> in 32-bit mode).
Otherwise looks good, btw:
Reviewed-by: Christoph Hellwig <hch@lst.de>
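To make the requested comment concrete, here is one possible shape for the selection logic with a single block comment, written as a standalone model rather than the actual arch_xor_init(). The wording of the comment is only a suggestion and not the author's text; `struct cpu_model`, `enum xor_impl`, and `choose_xor_impl()` are hypothetical names.

```c
#include <stdbool.h>

enum xor_impl { XOR_AVX512, XOR_AVX, XOR_SSE, XOR_MMX, XOR_GENERIC };

/* Hypothetical snapshot of the conditions tested in arch_xor_init(). */
struct cpu_model {
	bool is_64bit;
	bool avx512f;
	bool prefer_ymm;
	bool avx;
	bool xmm;
	bool mmx;
};

static enum xor_impl choose_xor_impl(const struct cpu_model *c)
{
	/*
	 * Use AVX-512 when it is available and the CPU does not prefer YMM
	 * registers; in that case it is expected to be the fastest option,
	 * so nothing else needs to be tried. CPUs that set PREFER_YMM
	 * downclock too eagerly when ZMM registers are used, so they stay
	 * on AVX. AVX-512 is not offered on 32-bit kernels.
	 */
	if (c->is_64bit && c->avx512f && !c->prefer_ymm)
		return XOR_AVX512;
	if (c->avx)
		return XOR_AVX;
	if (c->is_64bit || c->xmm)
		return XOR_SSE;
	if (c->mmx)
		return XOR_MMX;
	return XOR_GENERIC;	/* 32-bit without MMX */
}
```

The model collapses xor_force() and xor_register() into a single return value; in the patch, the SSE and MMX branches register two candidates each and let the benchmark pick between them.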
* Re: [PATCH 2/8] um: Check for missing AVX and AVX-512 xstate bits
2026-06-26 4:37 ` [PATCH 2/8] um: " Eric Biggers
@ 2026-06-26 7:41 ` David Laight
2026-06-26 8:21 ` Anton Ivanov
0 siblings, 1 reply; 16+ messages in thread
From: David Laight @ 2026-06-26 7:41 UTC (permalink / raw)
To: Eric Biggers
Cc: x86, linux-um, linux-raid, linux-crypto, linux-kernel,
Christoph Hellwig, Andrew Morton
On Thu, 25 Jun 2026 21:37:25 -0700
Eric Biggers <ebiggers@kernel.org> wrote:
> If the CPU declares AVX or AVX-512 support, verify that all the
> corresponding xstate bits are also set. If any are missing, warn and
> don't set the corresponding X86_FEATURE_* flags.
>
> This eliminates the perceived need for UML-supporting AVX and AVX-512
> optimized code in the kernel (that is, lib/raid/ currently) to start
> checking the xstate bits in addition to X86_FEATURE_AVX*.
>
...
> static void __init parse_host_cpu_flags(char *line)
> {
> + u64 xcr0 = read_xcr0();
> int i;
> +
> for (i = 0; i < 32*NCAPINTS; i++) {
> if ((x86_cap_flags[i] != NULL) && strstr(line, x86_cap_flags[i]))
'line' comes from /proc/cpuinfo
Surely something would be terribly wrong if that included something the kernel
had disabled (or didn't support).
David
> - set_cpu_cap(&boot_cpu_data, i);
> + validate_and_set_cpu_cap(i, xcr0);
> }
> }
>
> static void __init parse_cache_line(char *line)
> {
* Re: [PATCH 2/8] um: Check for missing AVX and AVX-512 xstate bits
2026-06-26 7:41 ` David Laight
@ 2026-06-26 8:21 ` Anton Ivanov
2026-06-26 10:49 ` David Laight
0 siblings, 1 reply; 16+ messages in thread
From: Anton Ivanov @ 2026-06-26 8:21 UTC (permalink / raw)
To: David Laight, Eric Biggers
Cc: x86, linux-um, linux-raid, linux-crypto, linux-kernel,
Christoph Hellwig, Andrew Morton
On 26/06/2026 08:41, David Laight wrote:
> On Thu, 25 Jun 2026 21:37:25 -0700
> Eric Biggers <ebiggers@kernel.org> wrote:
>
>> If the CPU declares AVX or AVX-512 support, verify that all the
>> corresponding xstate bits are also set. If any are missing, warn and
>> don't set the corresponding X86_FEATURE_* flags.
>>
>> This eliminates the perceived need for UML-supporting AVX and AVX-512
>> optimized code in the kernel (that is, lib/raid/ currently) to start
>> checking the xstate bits in addition to X86_FEATURE_AVX*.
>>
> ...
>> static void __init parse_host_cpu_flags(char *line)
>> {
>> + u64 xcr0 = read_xcr0();
>> int i;
>> +
>> for (i = 0; i < 32*NCAPINTS; i++) {
>> if ((x86_cap_flags[i] != NULL) && strstr(line, x86_cap_flags[i]))
>
> 'line' comes from /proc/cpuinfo
> Surely something would be terribly wrong if that included something the kernel
> had disabled (or didn't support).
>
> David
>
>
>> - set_cpu_cap(&boot_cpu_data, i);
>> + validate_and_set_cpu_cap(i, xcr0);
>> }
>> }
>>
>> static void __init parse_cache_line(char *line)
>> {
>
>
>
>
Lots of other stuff will go wrong before that. Glibc, things compiled with LLVM, python, perl, etc.
Half of the userland will go belly up, because AVX is used in string operations and hashing if it is available.
UML is just another userland application from this perspective, so there is no reason for it to behave any differently from the rest of userland.
--
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661
* Re: [PATCH 2/8] um: Check for missing AVX and AVX-512 xstate bits
2026-06-26 8:21 ` Anton Ivanov
@ 2026-06-26 10:49 ` David Laight
0 siblings, 0 replies; 16+ messages in thread
From: David Laight @ 2026-06-26 10:49 UTC (permalink / raw)
To: Anton Ivanov
Cc: Eric Biggers, x86, linux-um, linux-raid, linux-crypto,
linux-kernel, Christoph Hellwig, Andrew Morton
On Fri, 26 Jun 2026 09:21:49 +0100
Anton Ivanov <anton.ivanov@cambridgegreys.com> wrote:
> On 26/06/2026 08:41, David Laight wrote:
> > On Thu, 25 Jun 2026 21:37:25 -0700
> > Eric Biggers <ebiggers@kernel.org> wrote:
> >
> >> If the CPU declares AVX or AVX-512 support, verify that all the
> >> corresponding xstate bits are also set. If any are missing, warn and
> >> don't set the corresponding X86_FEATURE_* flags.
> >>
> >> This eliminates the perceived need for UML-supporting AVX and AVX-512
> >> optimized code in the kernel (that is, lib/raid/ currently) to start
> >> checking the xstate bits in addition to X86_FEATURE_AVX*.
> >>
> > ...
> >> static void __init parse_host_cpu_flags(char *line)
> >> {
> >> + u64 xcr0 = read_xcr0();
> >> int i;
> >> +
> >> for (i = 0; i < 32*NCAPINTS; i++) {
> >> if ((x86_cap_flags[i] != NULL) && strstr(line, x86_cap_flags[i]))
> >
> > 'line' comes from /proc/cpuinfo
> > Surely something would be terribly wrong if that included something the kernel
> > had disabled (or didn't support).
> >
> > David
> >
> >
> >> - set_cpu_cap(&boot_cpu_data, i);
> >> + validate_and_set_cpu_cap(i, xcr0);
> >> }
> >> }
> >>
> >> static void __init parse_cache_line(char *line)
> >> {
> >
> >
> >
> >
> Lots of other stuff will go wrong before that. Glibc, things compiled with LLVM, python, perl, etc.
>
> Half of the userland will go belly up, because AVX is used in string operations and hashing if it is available.
And glibc will check xcr0.
>
> UML is just another userland application from this perspective, so there is no reason for it to behave any differently from the rest of userland.