* [PATCH RFC v1 00/11] x86: Support Intel Advanced Performance Extensions
@ 2025-02-27 18:44 Chang S. Bae
2025-02-27 18:44 ` [PATCH RFC v1 01/11] x86/fpu/xstate: Simplify print_xstate_features() Chang S. Bae
` (12 more replies)
0 siblings, 13 replies; 112+ messages in thread
From: Chang S. Bae @ 2025-02-27 18:44 UTC (permalink / raw)
To: linux-kernel; +Cc: x86, tglx, mingo, bp, dave.hansen, chang.seok.bae
Hi all,
This patch series introduces support for Intel's Advanced Performance
Extensions (APX). The goal is to collect early feedback on the approach
taken for APX. Below is a brief overview of the feature and key
considerations.
== Introduction ==
APX introduces a new set of general-purpose registers designed to improve
performance. Currently, these registers are expected to be used primarily
by userspace applications, with no intended use in kernel mode. More
details on its use cases can be found in the published documentation [1].
== Points to Consider ==
In terms of kernel support, some aspects need attention:
* New Register State
APX register state is managed by the XSAVE instruction set, with its
state component number assigned to 19, following XTILEDATA.
* XSAVE Buffer Offset
- In the compacted format used (for the in-kernel buffer), the APX
state will appear at a later position in the buffer.
- In the non-compacted format (used for signal, ptrace, and KVM ABIs),
APX is assigned a lower offset, occupying the space previously
reserved for the deprecated MPX state.
This should not bring any ABI changes. In the extended register state
area, each state's size and offset are determined dynamically via
CPUID.
* Kernel Assumptions:
The kernel generally assumes that higher-numbered components have
higher offsets, as discussed in the following section. Although MPX
feature usage support was removed [2], its state components (#3 and #4)
remain supported.
== Areas for Adjustment ==
With that in mind, here are a few key areas that need to be addressed to
properly handle the APX state. While these are thought to be pretty much
at this stage, I'd say it is still open to discover possible hindsights
in other areas.
1. Feature Conflict
While a valid CPU should not expose both MPX and APX, a broken or
misconfigured CPU could erroneously expose support for both. This hard
conflict should be avoided up front.
2. XSAVE Format Conversion
APX introduces an offset anomaly that requires special handling when
converting between compacted and non-compacted layouts. The kernel
relies on XSAVE instructions for context switching and signal
handling. However, xstate copy functions must correctly translate the
XSAVE format while ensuring sequential memory access, as enforced by
struct membuf.
3. XSAVE Size Calculation
The kernel calculates XSAVE buffer sizes based on the assumption that
the highest-numbered feature appears at the end. If APX is the last
bit set in the feature mask, the existing logic in
xstate_calculate_size() miscalculates the buffer size.
4. Offset Sanity Check
The kernel's boot-time sanity check in setup_xstate_cache() assumes
offsets increase with feature numbers. The APX state will conflict
with this assumption.
== Approaches ==
These two approaches are considerable:
Option 1, Consider APX as a one-off exception
Initially, I thought this approach would result in fewer changes to the
xstate code. However, it makes the code less comprehensive (or more
complicated) and introduces a risk. If another feature similar to APX
comes up, adding another exception seems to be inefficient and messy.
Option 2, Handle out-of-order offsets in general
Rather than treating APX as an exception, this approach adapts the
kernel to accommodate out-of-order offsets. By introducing a feature
order table and an accompanying macro, the traversal logic is
encapsulated cleanly, simplifying related code. This makes the kernel
more resilient to future features with non-sequential offsets.
== Series Summary ==
This patch set addresses the above issues before enabling APX support
The chosen approach (Option 2) adjusts the kernel to handle out-of-order
offsets. Here’s a breakdown of the patches:
* PART1: XSTATE Code Adjustment
- PATCH1: Clean up xstate enabling messages.
- PATCH2: Introduce the feature order table.
- PATCH3: Remove the offset sanity check, as the checked rule is no
longer true.
- PATCH4: Adjust XSAVE size calculation.
- PATCH5: Modify the xstate copy function.
* PART2: APX Enabling
- PATCH6: Remove MPX support.
- PATCH7: Enumerate the APX CPUID feature bit.
- PATCH8: Update xstate definitions to include APX.
- PATCH9: Ensure MPX and APX are mutually exclusive.
- PATCH10: Register APX in the supported xstate list.
- PATCH11: Add a self-test case for APX.
== Testing ==
The first part is agnostic to the new feature, and the changes should
preserve the same functionality for the currently supported features. The
xstate tests -- covering context switching and ABI compatibility for
signal and ptrace -- were executed to ensure there is no regression.
PATCH11 applies to the same test set and builds on the xstate selftest
rework [3]. Since no hardware implementation is available at this time,
an internal Intel emulator was primarily used to verify the test cases.
---
The patches are based on the x86/fpu branch [4], where the selftest
rework has landed (Thanks, Ingo!). The series can be found here:
git://github.com/intel/apx.git apx_rfc-v1
Thanks,
Chang
[1] https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html
[2] Commit 45fc24e89b7c ("x86/mpx: remove MPX from arch/x86")
[3] https://lore.kernel.org/lkml/20250226010731.2456-1-chang.seok.bae@intel.com/
[4] https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/log/?h=x86/fpu
Chang S. Bae (11):
x86/fpu/xstate: Simplify print_xstate_features()
x86/fpu/xstate: Introduce xstate order table and accessor macro
x86/fpu/xstate: Remove xstate offset check
x86/fpu/xstate: Adjust XSAVE buffer size calculation
x86/fpu/xstate: Adjust xstate copying logic for user ABI
x86/fpu/mpx: Remove MPX xstate component support
x86/cpufeatures: Add X86_FEATURE_APX
x86/fpu/apx: Define APX state component
x86/fpu/apx: Disallow conflicting MPX presence
x86/fpu/apx: Enable APX state support
selftests/x86/apx: Add APX test
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/fpu/types.h | 9 ++
arch/x86/include/asm/fpu/xstate.h | 5 +-
arch/x86/kernel/cpu/cpuid-deps.c | 1 +
arch/x86/kernel/cpu/scattered.c | 1 +
arch/x86/kernel/fpu/xstate.c | 131 ++++++++++++++++-----------
tools/testing/selftests/x86/Makefile | 3 +-
tools/testing/selftests/x86/apx.c | 10 ++
tools/testing/selftests/x86/xstate.c | 3 +-
tools/testing/selftests/x86/xstate.h | 1 +
10 files changed, 108 insertions(+), 57 deletions(-)
create mode 100644 tools/testing/selftests/x86/apx.c
base-commit: bd64e9d6aafd12e5437685d2a06360f86418d277
--
2.45.2
^ permalink raw reply [flat|nested] 112+ messages in thread
* [PATCH RFC v1 01/11] x86/fpu/xstate: Simplify print_xstate_features()
2025-02-27 18:44 [PATCH RFC v1 00/11] x86: Support Intel Advanced Performance Extensions Chang S. Bae
@ 2025-02-27 18:44 ` Chang S. Bae
2025-02-27 18:51 ` Dave Hansen
` (2 more replies)
2025-02-27 18:44 ` [PATCH RFC v1 02/11] x86/fpu/xstate: Introduce xstate order table and accessor macro Chang S. Bae
` (11 subsequent siblings)
12 siblings, 3 replies; 112+ messages in thread
From: Chang S. Bae @ 2025-02-27 18:44 UTC (permalink / raw)
To: linux-kernel; +Cc: x86, tglx, mingo, bp, dave.hansen, chang.seok.bae
print_xstate_features() currently invokes print_xstate_feature() multiple
times on separate lines, which can be simplified in a loop.
print_xstate_feature() already checks the feature's enabled status and is
only called within print_xstate_features(). Inline print_xstate_feature()
and iterate over features in a loop to streamline the enabling message.
No functional changes.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
I find it difficult to justify using separate lines for printing when
they can be efficiently grouped under a simple for-loop. While this
cleanup isn't directly related to APX, it felt like a necessary
improvement, especially when new xstates are introduced.
---
arch/x86/kernel/fpu/xstate.c | 30 +++++++++---------------------
1 file changed, 9 insertions(+), 21 deletions(-)
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 27417b685c1d..6a41d1610d8b 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -259,32 +259,20 @@ static void __init setup_xstate_cache(void)
}
}
-static void __init print_xstate_feature(u64 xstate_mask)
-{
- const char *feature_name;
-
- if (cpu_has_xfeatures(xstate_mask, &feature_name))
- pr_info("x86/fpu: Supporting XSAVE feature 0x%03Lx: '%s'\n", xstate_mask, feature_name);
-}
-
/*
* Print out all the supported xstate features:
*/
static void __init print_xstate_features(void)
{
- print_xstate_feature(XFEATURE_MASK_FP);
- print_xstate_feature(XFEATURE_MASK_SSE);
- print_xstate_feature(XFEATURE_MASK_YMM);
- print_xstate_feature(XFEATURE_MASK_BNDREGS);
- print_xstate_feature(XFEATURE_MASK_BNDCSR);
- print_xstate_feature(XFEATURE_MASK_OPMASK);
- print_xstate_feature(XFEATURE_MASK_ZMM_Hi256);
- print_xstate_feature(XFEATURE_MASK_Hi16_ZMM);
- print_xstate_feature(XFEATURE_MASK_PKRU);
- print_xstate_feature(XFEATURE_MASK_PASID);
- print_xstate_feature(XFEATURE_MASK_CET_USER);
- print_xstate_feature(XFEATURE_MASK_XTILE_CFG);
- print_xstate_feature(XFEATURE_MASK_XTILE_DATA);
+ int i;
+
+ for (i = 0; i < XFEATURE_MAX; i++) {
+ u64 mask = BIT_ULL(i);
+ const char *name;
+
+ if (cpu_has_xfeatures(mask, &name))
+ pr_info("x86/fpu: Supporting XSAVE feature 0x%03Lx: '%s'\n", mask, name);
+ }
}
/*
--
2.45.2
^ permalink raw reply related [flat|nested] 112+ messages in thread
* [PATCH RFC v1 02/11] x86/fpu/xstate: Introduce xstate order table and accessor macro
2025-02-27 18:44 [PATCH RFC v1 00/11] x86: Support Intel Advanced Performance Extensions Chang S. Bae
2025-02-27 18:44 ` [PATCH RFC v1 01/11] x86/fpu/xstate: Simplify print_xstate_features() Chang S. Bae
@ 2025-02-27 18:44 ` Chang S. Bae
2025-02-27 19:03 ` Dave Hansen
2025-02-27 19:05 ` Ingo Molnar
2025-02-27 18:44 ` [PATCH RFC v1 03/11] x86/fpu/xstate: Remove xstate offset check Chang S. Bae
` (10 subsequent siblings)
12 siblings, 2 replies; 112+ messages in thread
From: Chang S. Bae @ 2025-02-27 18:44 UTC (permalink / raw)
To: linux-kernel; +Cc: x86, tglx, mingo, bp, dave.hansen, chang.seok.bae
The kernel has largely assumed that higher xstate component numbers
correspond to later offsets in the buffer. However, this assumption does
not hold for the non-compacted format, where a newer state component may
have a lower offset.
When iterating over xstate components in offset order, using the feature
number as an index may be misleading. At the same time, the CPU exposes
each component’s size and offset based on its feature number, making it a
key for state information.
To provide flexibility in handling xstate ordering, add a mapping table:
feature order -> feature number. This new array stores feature numbers in
offset order, accompanied by an accessor macro.
The macro enables sequential traversal of xstate components based on
their actual buffer positions, given a feature bitmask. This will be
particularly useful for computing customized non-compacted format sizes
and sequentially accessing xstate offsets over non-compacted buffers.
Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
This lays the groundwork for handling APX, which has feature number 19
but appears immediately after FEATURE_YMM, occupying the space previously
reserved for the now-deprecated MPX state.
---
arch/x86/kernel/fpu/xstate.c | 37 ++++++++++++++++++++++++++++++++++++
1 file changed, 37 insertions(+)
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 6a41d1610d8b..cee9a1e454b7 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -91,6 +91,43 @@ static unsigned int xstate_flags[XFEATURE_MAX] __ro_after_init;
#define XSTATE_FLAG_SUPERVISOR BIT(0)
#define XSTATE_FLAG_ALIGNED64 BIT(1)
+/*
+ * Ordering of xstate components in non-compacted format: The xfeature
+ * number does not necessarily indicate its position in the XSAVE buffer.
+ * This array defines the traversal order of xstate features, included in
+ * XFEATURE_MASK_USER_SUPPORTED.
+ */
+static const enum xfeature xfeature_noncompact_order[] = {
+ XFEATURE_FP,
+ XFEATURE_SSE,
+ XFEATURE_YMM,
+ XFEATURE_BNDREGS,
+ XFEATURE_BNDCSR,
+ XFEATURE_OPMASK,
+ XFEATURE_ZMM_Hi256,
+ XFEATURE_Hi16_ZMM,
+ XFEATURE_PKRU,
+ XFEATURE_XTILE_CFG,
+ XFEATURE_XTILE_DATA,
+};
+
+static inline unsigned int next_xfeature_order(unsigned int i, u64 mask)
+{
+ for (; i < ARRAY_SIZE(xfeature_noncompact_order); i++) {
+ if (mask & BIT_ULL(xfeature_noncompact_order[i]))
+ break;
+ }
+
+ return i;
+}
+
+/* Iterate xstate features in non-compacted order */
+#define for_each_extended_xfeature_in_order(i, mask) \
+ for (i = XFEATURE_YMM; \
+ i = next_xfeature_order(i, mask), \
+ i < ARRAY_SIZE(xfeature_noncompact_order); \
+ i++)
+
/*
* Return whether the system supports a given xfeature.
*
--
2.45.2
^ permalink raw reply related [flat|nested] 112+ messages in thread
* [PATCH RFC v1 03/11] x86/fpu/xstate: Remove xstate offset check
2025-02-27 18:44 [PATCH RFC v1 00/11] x86: Support Intel Advanced Performance Extensions Chang S. Bae
2025-02-27 18:44 ` [PATCH RFC v1 01/11] x86/fpu/xstate: Simplify print_xstate_features() Chang S. Bae
2025-02-27 18:44 ` [PATCH RFC v1 02/11] x86/fpu/xstate: Introduce xstate order table and accessor macro Chang S. Bae
@ 2025-02-27 18:44 ` Chang S. Bae
2025-02-27 18:44 ` [PATCH RFC v1 04/11] x86/fpu/xstate: Adjust XSAVE buffer size calculation Chang S. Bae
` (9 subsequent siblings)
12 siblings, 0 replies; 112+ messages in thread
From: Chang S. Bae @ 2025-02-27 18:44 UTC (permalink / raw)
To: linux-kernel; +Cc: x86, tglx, mingo, bp, dave.hansen, chang.seok.bae
Traditionally, new xstate components have been assigned sequentially,
aligning feature numbers with their offsets in the XSAVE buffer. However,
this ordering is not architecturally mandated in the non-compacted
format, where a component's offset may not correspond to its feature
number.
The kernel caches CPUID-reported xstate component details, including size
and offset in the non-compacted format. As part of this process, a sanity
check is also conducted to ensure alignment between feature numbers and
offsets.
This check was likely intended as a general guideline rather than a
strict requirement. As part of supporting out-of-order offsets, remove
this unnecessary validation.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
arch/x86/kernel/fpu/xstate.c | 13 -------------
1 file changed, 13 deletions(-)
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index cee9a1e454b7..c6f825bb744b 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -253,9 +253,6 @@ static bool xfeature_enabled(enum xfeature xfeature)
static void __init setup_xstate_cache(void)
{
u32 eax, ebx, ecx, edx, i;
- /* start at the beginning of the "extended state" */
- unsigned int last_good_offset = offsetof(struct xregs_state,
- extended_state_area);
/*
* The FP xstates and SSE xstates are legacy states. They are always
* in the fixed offsets in the xsave area in either compacted form
@@ -283,16 +280,6 @@ static void __init setup_xstate_cache(void)
continue;
xstate_offsets[i] = ebx;
-
- /*
- * In our xstate size checks, we assume that the highest-numbered
- * xstate feature has the highest offset in the buffer. Ensure
- * it does.
- */
- WARN_ONCE(last_good_offset > xstate_offsets[i],
- "x86/fpu: misordered xstate at %d\n", last_good_offset);
-
- last_good_offset = xstate_offsets[i];
}
}
--
2.45.2
^ permalink raw reply related [flat|nested] 112+ messages in thread
* [PATCH RFC v1 04/11] x86/fpu/xstate: Adjust XSAVE buffer size calculation
2025-02-27 18:44 [PATCH RFC v1 00/11] x86: Support Intel Advanced Performance Extensions Chang S. Bae
` (2 preceding siblings ...)
2025-02-27 18:44 ` [PATCH RFC v1 03/11] x86/fpu/xstate: Remove xstate offset check Chang S. Bae
@ 2025-02-27 18:44 ` Chang S. Bae
2025-02-27 18:44 ` [PATCH RFC v1 05/11] x86/fpu/xstate: Adjust xstate copying logic for user ABI Chang S. Bae
` (8 subsequent siblings)
12 siblings, 0 replies; 112+ messages in thread
From: Chang S. Bae @ 2025-02-27 18:44 UTC (permalink / raw)
To: linux-kernel; +Cc: x86, tglx, mingo, bp, dave.hansen, chang.seok.bae
The current xstate size calculation assumes that the highest-numbered
xstate feature has the highest offset in the buffer, determining the size
based on the topmost bit in the feature mask. However, this assumption is
not architecturally guaranteed -- higher-numbered features may have lower
offsets.
With the introduction of the xfeature order table and its helper macro,
xstate components can now be traversed in their positional order. Update
the non-compacted format handling to iterate through the table to
determine the last-positioned feature. Then, set the offset accordingly.
Since size calculation primarily occurs during initialization or in
non-critical paths, looping to find the last feature is not expected to
have a meaningful performance impact.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
arch/x86/kernel/fpu/xstate.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index c6f825bb744b..06d2be602ea9 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -576,13 +576,20 @@ static bool __init check_xstate_against_struct(int nr)
static unsigned int xstate_calculate_size(u64 xfeatures, bool compacted)
{
unsigned int topmost = fls64(xfeatures) - 1;
- unsigned int offset = xstate_offsets[topmost];
+ unsigned int offset, i;
if (topmost <= XFEATURE_SSE)
return sizeof(struct xregs_state);
- if (compacted)
+ if (compacted) {
offset = xfeature_get_offset(xfeatures, topmost);
+ } else {
+ /* Walk through the xfeature order to pick the last */
+ for_each_extended_xfeature_in_order(i, xfeatures)
+ topmost = xfeature_noncompact_order[i];
+ offset = xstate_offsets[topmost];
+ }
+
return offset + xstate_sizes[topmost];
}
--
2.45.2
^ permalink raw reply related [flat|nested] 112+ messages in thread
* [PATCH RFC v1 05/11] x86/fpu/xstate: Adjust xstate copying logic for user ABI
2025-02-27 18:44 [PATCH RFC v1 00/11] x86: Support Intel Advanced Performance Extensions Chang S. Bae
` (3 preceding siblings ...)
2025-02-27 18:44 ` [PATCH RFC v1 04/11] x86/fpu/xstate: Adjust XSAVE buffer size calculation Chang S. Bae
@ 2025-02-27 18:44 ` Chang S. Bae
2025-02-27 18:44 ` [PATCH RFC v1 06/11] x86/fpu/mpx: Remove MPX xstate component support Chang S. Bae
` (7 subsequent siblings)
12 siblings, 0 replies; 112+ messages in thread
From: Chang S. Bae @ 2025-02-27 18:44 UTC (permalink / raw)
To: linux-kernel; +Cc: x86, tglx, mingo, bp, dave.hansen, chang.seok.bae
== Background ==
As feature positions in the userspace XSAVE buffer do not always align
with their feature numbers, the XSAVE format conversion needs to be
reconsidered to align with the revised xstate size calculation logic.
* For signal handling, XSAVE and XRSTOR are used directly to save and
restore extended registers.
* For ptrace, KVM, and signal returns (for 32-bit frame), the kernel
copies data between its internal buffer and the userspace XSAVE buffer.
If memcpy() were used for these cases, existing offset helpers — such
as __raw_xsave_addr() or xstate_offsets[] — would be sufficient to
handle the format conversion.
== Problem ==
When copying data from the compacted in-kernel buffer to the
non-compacted userspace buffer, the function follows the
user_regset_get2_fn() prototype. This means it utilizes struct membuf
helpers for the destination buffer. As defined in regset.h, these helpers
update the memory pointer during the copy process, enforcing sequential
writes within the loop.
Since xstate components are processed sequentially, any component whose
buffer position does not align with its feature number has an issue.
== Solution ==
Replace for_each_extended_xfeature() with the newly introduced
for_each_extended_xfeature_in_order(). This macro ensures xstate
components are handled in the correct order based on their actual
positions in the destination buffer, rather than their feature numbers.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
arch/x86/kernel/fpu/xstate.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 06d2be602ea9..d04221d0fa0a 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1102,10 +1102,9 @@ void __copy_xstate_to_uabi_buf(struct membuf to, struct fpstate *fpstate,
const unsigned int off_mxcsr = offsetof(struct fxregs_state, mxcsr);
struct xregs_state *xinit = &init_fpstate.regs.xsave;
struct xregs_state *xsave = &fpstate->regs.xsave;
+ unsigned int zerofrom, i, xfeature;
struct xstate_header header;
- unsigned int zerofrom;
u64 mask;
- int i;
memset(&header, 0, sizeof(header));
header.xfeatures = xsave->header.xfeatures;
@@ -1174,15 +1173,16 @@ void __copy_xstate_to_uabi_buf(struct membuf to, struct fpstate *fpstate,
*/
mask = header.xfeatures;
- for_each_extended_xfeature(i, mask) {
+ for_each_extended_xfeature_in_order(i, mask) {
+ xfeature = xfeature_noncompact_order[i];
/*
* If there was a feature or alignment gap, zero the space
* in the destination buffer.
*/
- if (zerofrom < xstate_offsets[i])
- membuf_zero(&to, xstate_offsets[i] - zerofrom);
+ if (zerofrom < xstate_offsets[xfeature])
+ membuf_zero(&to, xstate_offsets[xfeature] - zerofrom);
- if (i == XFEATURE_PKRU) {
+ if (xfeature == XFEATURE_PKRU) {
struct pkru_state pkru = {0};
/*
* PKRU is not necessarily up to date in the
@@ -1192,14 +1192,14 @@ void __copy_xstate_to_uabi_buf(struct membuf to, struct fpstate *fpstate,
membuf_write(&to, &pkru, sizeof(pkru));
} else {
membuf_write(&to,
- __raw_xsave_addr(xsave, i),
- xstate_sizes[i]);
+ __raw_xsave_addr(xsave, xfeature),
+ xstate_sizes[xfeature]);
}
/*
* Keep track of the last copied state in the non-compacted
* target buffer for gap zeroing.
*/
- zerofrom = xstate_offsets[i] + xstate_sizes[i];
+ zerofrom = xstate_offsets[xfeature] + xstate_sizes[xfeature];
}
out:
--
2.45.2
^ permalink raw reply related [flat|nested] 112+ messages in thread
* [PATCH RFC v1 06/11] x86/fpu/mpx: Remove MPX xstate component support
2025-02-27 18:44 [PATCH RFC v1 00/11] x86: Support Intel Advanced Performance Extensions Chang S. Bae
` (4 preceding siblings ...)
2025-02-27 18:44 ` [PATCH RFC v1 05/11] x86/fpu/xstate: Adjust xstate copying logic for user ABI Chang S. Bae
@ 2025-02-27 18:44 ` Chang S. Bae
2025-02-27 19:12 ` Ingo Molnar
2025-02-27 18:44 ` [PATCH RFC v1 07/11] x86/cpufeatures: Add X86_FEATURE_APX Chang S. Bae
` (6 subsequent siblings)
12 siblings, 1 reply; 112+ messages in thread
From: Chang S. Bae @ 2025-02-27 18:44 UTC (permalink / raw)
To: linux-kernel; +Cc: x86, tglx, mingo, bp, dave.hansen, chang.seok.bae
A new xstate component is set to occupy the position previously used by
MPX in the non-compacted format, then creating a fundamental conflict
between the two.
Currently, xfeature_noncompact_order[] includes MPX, but the introduction
of the new feature would cause a direct conflict there unless MPX is
removed. Fortunately, MPX support has already been deprecated and
effectively removed by commit:
45fc24e89b7c ("x86/mpx: remove MPX from arch/x86")
Explicitly disable the deprecated feature to reserve a space for the new
xstate.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
Several code references to this feature macro remain, primarily on the
KVM side. While they are likely to become obsolete after this patch,
their cleanup has been deferred at this review stage: This can be
addressed in a follow-up patch or included as an optional part of APX
enablement, I suppose.
---
arch/x86/include/asm/fpu/xstate.h | 2 --
arch/x86/kernel/fpu/xstate.c | 8 --------
2 files changed, 10 deletions(-)
diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index 7f39fe7980c5..e87d36a31ab1 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -30,8 +30,6 @@
XFEATURE_MASK_ZMM_Hi256 | \
XFEATURE_MASK_Hi16_ZMM | \
XFEATURE_MASK_PKRU | \
- XFEATURE_MASK_BNDREGS | \
- XFEATURE_MASK_BNDCSR | \
XFEATURE_MASK_XTILE)
/*
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index d04221d0fa0a..16f45fff8811 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -69,8 +69,6 @@ static unsigned short xsave_cpuid_features[] __initdata = {
[XFEATURE_FP] = X86_FEATURE_FPU,
[XFEATURE_SSE] = X86_FEATURE_XMM,
[XFEATURE_YMM] = X86_FEATURE_AVX,
- [XFEATURE_BNDREGS] = X86_FEATURE_MPX,
- [XFEATURE_BNDCSR] = X86_FEATURE_MPX,
[XFEATURE_OPMASK] = X86_FEATURE_AVX512F,
[XFEATURE_ZMM_Hi256] = X86_FEATURE_AVX512F,
[XFEATURE_Hi16_ZMM] = X86_FEATURE_AVX512F,
@@ -101,8 +99,6 @@ static const enum xfeature xfeature_noncompact_order[] = {
XFEATURE_FP,
XFEATURE_SSE,
XFEATURE_YMM,
- XFEATURE_BNDREGS,
- XFEATURE_BNDCSR,
XFEATURE_OPMASK,
XFEATURE_ZMM_Hi256,
XFEATURE_Hi16_ZMM,
@@ -360,8 +356,6 @@ static __init void os_xrstor_booting(struct xregs_state *xstate)
XFEATURE_MASK_ZMM_Hi256 | \
XFEATURE_MASK_Hi16_ZMM | \
XFEATURE_MASK_PKRU | \
- XFEATURE_MASK_BNDREGS | \
- XFEATURE_MASK_BNDCSR | \
XFEATURE_MASK_PASID | \
XFEATURE_MASK_CET_USER | \
XFEATURE_MASK_XTILE)
@@ -555,8 +549,6 @@ static bool __init check_xstate_against_struct(int nr)
*/
switch (nr) {
case XFEATURE_YMM: return XCHECK_SZ(sz, nr, struct ymmh_struct);
- case XFEATURE_BNDREGS: return XCHECK_SZ(sz, nr, struct mpx_bndreg_state);
- case XFEATURE_BNDCSR: return XCHECK_SZ(sz, nr, struct mpx_bndcsr_state);
case XFEATURE_OPMASK: return XCHECK_SZ(sz, nr, struct avx_512_opmask_state);
case XFEATURE_ZMM_Hi256: return XCHECK_SZ(sz, nr, struct avx_512_zmm_uppers_state);
case XFEATURE_Hi16_ZMM: return XCHECK_SZ(sz, nr, struct avx_512_hi16_state);
--
2.45.2
^ permalink raw reply related [flat|nested] 112+ messages in thread
* [PATCH RFC v1 07/11] x86/cpufeatures: Add X86_FEATURE_APX
2025-02-27 18:44 [PATCH RFC v1 00/11] x86: Support Intel Advanced Performance Extensions Chang S. Bae
` (5 preceding siblings ...)
2025-02-27 18:44 ` [PATCH RFC v1 06/11] x86/fpu/mpx: Remove MPX xstate component support Chang S. Bae
@ 2025-02-27 18:44 ` Chang S. Bae
2025-02-27 18:44 ` [PATCH RFC v1 08/11] x86/fpu/apx: Define APX state component Chang S. Bae
` (5 subsequent siblings)
12 siblings, 0 replies; 112+ messages in thread
From: Chang S. Bae @ 2025-02-27 18:44 UTC (permalink / raw)
To: linux-kernel; +Cc: x86, tglx, mingo, bp, dave.hansen, chang.seok.bae
Intel Advanced Performance Extensions (APX) introduce a new set of
general-purpose registers, managed as an extended state component via the
xstate management facility.
Before enabling this new xstate, define a feature flag to clarify the
dependency in xsave_cpuid_features[]. APX is enumerated under CPUID level
7 with EDX=1. Since this CPUID leaf is not yet allocated, place the flag
in a scattered feature word.
While this feature is intended only for userspace, exposing it via
/proc/cpuinfo is unnecessary. Instead, the existing arch_prctl(2)
mechanism with the ARCH_GET_XCOMP_SUPP option can be used to query the
feature availability.
Finally, clarify that APX depends on XSAVE.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
Allocating a new feature word for this bit seems excessive at this stage,
given that no other imminent features are quite known.
---
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/kernel/cpu/cpuid-deps.c | 1 +
arch/x86/kernel/cpu/scattered.c | 1 +
3 files changed, 3 insertions(+)
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 508c0dad116b..4f96515af7c3 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -483,6 +483,7 @@
#define X86_FEATURE_AMD_FAST_CPPC (21*32 + 5) /* Fast CPPC */
#define X86_FEATURE_AMD_HETEROGENEOUS_CORES (21*32 + 6) /* Heterogeneous Core Topology */
#define X86_FEATURE_AMD_WORKLOAD_CLASS (21*32 + 7) /* Workload Classification */
+#define X86_FEATURE_APX (21*32 + 8) /* Advanced Performance Extensions */
/*
* BUG word(s)
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index df838e3bdbe0..44ab6baeec42 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -84,6 +84,7 @@ static const struct cpuid_dep cpuid_deps[] = {
{ X86_FEATURE_AMX_TILE, X86_FEATURE_XFD },
{ X86_FEATURE_SHSTK, X86_FEATURE_XSAVES },
{ X86_FEATURE_FRED, X86_FEATURE_LKGS },
+ { X86_FEATURE_APX, X86_FEATURE_XSAVE },
{}
};
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index 16f3ca30626a..6c40d5af8479 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -54,6 +54,7 @@ static const struct cpuid_bit cpuid_bits[] = {
{ X86_FEATURE_AMD_LBR_V2, CPUID_EAX, 1, 0x80000022, 0 },
{ X86_FEATURE_AMD_LBR_PMC_FREEZE, CPUID_EAX, 2, 0x80000022, 0 },
{ X86_FEATURE_AMD_HETEROGENEOUS_CORES, CPUID_EAX, 30, 0x80000026, 0 },
+ { X86_FEATURE_APX, CPUID_EDX, 21, 0x00000007, 1 },
{ 0, 0, 0, 0, 0 }
};
--
2.45.2
^ permalink raw reply related [flat|nested] 112+ messages in thread
* [PATCH RFC v1 08/11] x86/fpu/apx: Define APX state component
2025-02-27 18:44 [PATCH RFC v1 00/11] x86: Support Intel Advanced Performance Extensions Chang S. Bae
` (6 preceding siblings ...)
2025-02-27 18:44 ` [PATCH RFC v1 07/11] x86/cpufeatures: Add X86_FEATURE_APX Chang S. Bae
@ 2025-02-27 18:44 ` Chang S. Bae
2025-02-27 18:44 ` [PATCH RFC v1 09/11] x86/fpu/apx: Disallow conflicting MPX presence Chang S. Bae
` (4 subsequent siblings)
12 siblings, 0 replies; 112+ messages in thread
From: Chang S. Bae @ 2025-02-27 18:44 UTC (permalink / raw)
To: linux-kernel; +Cc: x86, tglx, mingo, bp, dave.hansen, chang.seok.bae
Previously, Advanced Performance Extensions (APX) feature flag was
defined. The feature is associated with a new state component number 19.
Considering the feature support, it is essential to identify the new
xstate component and implement necessary sanity checks.
Define the new component number, the state naming, and those registers'
data type. Then, extend the size checker to ensure correct validation of
the register data type and explicitly sets the APX feature flag as a
dependency for the new component in xsave_cpuid_features[].
Include APX in xfeature_noncompact_order[]. Notably, this out-of-order
placement demonstrates the practical value of the feature order table.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
arch/x86/include/asm/fpu/types.h | 9 +++++++++
arch/x86/kernel/fpu/xstate.c | 4 ++++
2 files changed, 13 insertions(+)
diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index de16862bf230..97310df3ea13 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -125,6 +125,7 @@ enum xfeature {
XFEATURE_RSRVD_COMP_16,
XFEATURE_XTILE_CFG,
XFEATURE_XTILE_DATA,
+ XFEATURE_APX,
XFEATURE_MAX,
};
@@ -145,6 +146,7 @@ enum xfeature {
#define XFEATURE_MASK_LBR (1 << XFEATURE_LBR)
#define XFEATURE_MASK_XTILE_CFG (1 << XFEATURE_XTILE_CFG)
#define XFEATURE_MASK_XTILE_DATA (1 << XFEATURE_XTILE_DATA)
+#define XFEATURE_MASK_APX (1 << XFEATURE_APX)
#define XFEATURE_MASK_FPSSE (XFEATURE_MASK_FP | XFEATURE_MASK_SSE)
#define XFEATURE_MASK_AVX512 (XFEATURE_MASK_OPMASK \
@@ -303,6 +305,13 @@ struct xtile_data {
struct reg_1024_byte tmm;
} __packed;
+/*
+ * State component 19: 8B extended general purpose register.
+ */
+struct apx_state {
+ u64 egpr[16];
+} __packed;
+
/*
* State component 10 is supervisor state used for context-switching the
* PASID state.
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 16f45fff8811..6a6f0e78e2c3 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -62,6 +62,7 @@ static const char *xfeature_names[] =
"unknown xstate feature",
"AMX Tile config",
"AMX Tile data",
+ "APX registers",
"unknown xstate feature",
};
@@ -78,6 +79,7 @@ static unsigned short xsave_cpuid_features[] __initdata = {
[XFEATURE_CET_USER] = X86_FEATURE_SHSTK,
[XFEATURE_XTILE_CFG] = X86_FEATURE_AMX_TILE,
[XFEATURE_XTILE_DATA] = X86_FEATURE_AMX_TILE,
+ [XFEATURE_APX] = X86_FEATURE_APX,
};
static unsigned int xstate_offsets[XFEATURE_MAX] __ro_after_init =
@@ -99,6 +101,7 @@ static const enum xfeature xfeature_noncompact_order[] = {
XFEATURE_FP,
XFEATURE_SSE,
XFEATURE_YMM,
+ XFEATURE_APX, /* Out-of-order feature */
XFEATURE_OPMASK,
XFEATURE_ZMM_Hi256,
XFEATURE_Hi16_ZMM,
@@ -557,6 +560,7 @@ static bool __init check_xstate_against_struct(int nr)
case XFEATURE_XTILE_CFG: return XCHECK_SZ(sz, nr, struct xtile_cfg);
case XFEATURE_CET_USER: return XCHECK_SZ(sz, nr, struct cet_user_state);
case XFEATURE_XTILE_DATA: check_xtile_data_against_struct(sz); return true;
+ case XFEATURE_APX: return XCHECK_SZ(sz, nr, struct apx_state);
default:
XSTATE_WARN_ON(1, "No structure for xstate: %d\n", nr);
return false;
--
2.45.2
^ permalink raw reply related [flat|nested] 112+ messages in thread
* [PATCH RFC v1 09/11] x86/fpu/apx: Disallow conflicting MPX presence
2025-02-27 18:44 [PATCH RFC v1 00/11] x86: Support Intel Advanced Performance Extensions Chang S. Bae
` (7 preceding siblings ...)
2025-02-27 18:44 ` [PATCH RFC v1 08/11] x86/fpu/apx: Define APX state component Chang S. Bae
@ 2025-02-27 18:44 ` Chang S. Bae
2025-02-27 23:16 ` Dave Hansen
2025-02-27 18:44 ` [PATCH RFC v1 10/11] x86/fpu/apx: Enable APX state support Chang S. Bae
` (3 subsequent siblings)
12 siblings, 1 reply; 112+ messages in thread
From: Chang S. Bae @ 2025-02-27 18:44 UTC (permalink / raw)
To: linux-kernel; +Cc: x86, tglx, mingo, bp, dave.hansen, chang.seok.bae
APX is introduced as xstate component 19, following AMX. However, its
offset in the non-compacted format overlaps with the area previously
occupied by the now-removed MPX. As they cannot coexist, MPX support was
removed from the kernel.
Despite this, the kernel must ensure the CPU never expose both features
at the same time. If so, it indicates unreliable hardware. In such cases,
XSAVE should be disabled entirely as a precautionary measure.
Add a sanity check to detect this condition and disable XSAVE if an
invalid hardware configuration is identified.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
arch/x86/kernel/fpu/xstate.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 6a6f0e78e2c3..0f731e11c414 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -802,6 +802,17 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
goto out_disable;
}
+ if (fpu_kernel_cfg.max_features & XFEATURE_MASK_APX &&
+ fpu_kernel_cfg.max_features & XFEATURE_MASK_BNDREGS) {
+ /*
+ * This is a problematic CPU configuration where two
+ * conflicting state components are both enumerated.
+ */
+ pr_err("x86/fpu: both APX and MPX present in the CPU's xstate features: 0x%llx.\n",
+ fpu_kernel_cfg.max_features);
+ goto out_disable;
+ }
+
fpu_kernel_cfg.independent_features = fpu_kernel_cfg.max_features &
XFEATURE_MASK_INDEPENDENT;
--
2.45.2
^ permalink raw reply related [flat|nested] 112+ messages in thread
* [PATCH RFC v1 10/11] x86/fpu/apx: Enable APX state support
2025-02-27 18:44 [PATCH RFC v1 00/11] x86: Support Intel Advanced Performance Extensions Chang S. Bae
` (8 preceding siblings ...)
2025-02-27 18:44 ` [PATCH RFC v1 09/11] x86/fpu/apx: Disallow conflicting MPX presence Chang S. Bae
@ 2025-02-27 18:44 ` Chang S. Bae
2025-02-27 18:44 ` [PATCH RFC v1 11/11] selftests/x86/apx: Add APX test Chang S. Bae
` (2 subsequent siblings)
12 siblings, 0 replies; 112+ messages in thread
From: Chang S. Bae @ 2025-02-27 18:44 UTC (permalink / raw)
To: linux-kernel; +Cc: x86, tglx, mingo, bp, dave.hansen, chang.seok.bae
With securing APX against conflicting MPX, it is now ready to be enabled.
Include APX in the enabled xfeature set.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
arch/x86/include/asm/fpu/xstate.h | 3 ++-
arch/x86/kernel/fpu/xstate.c | 3 ++-
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index e87d36a31ab1..6e3cba7b0068 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -30,7 +30,8 @@
XFEATURE_MASK_ZMM_Hi256 | \
XFEATURE_MASK_Hi16_ZMM | \
XFEATURE_MASK_PKRU | \
- XFEATURE_MASK_XTILE)
+ XFEATURE_MASK_XTILE | \
+ XFEATURE_MASK_APX)
/*
* Features which are restored when returning to user space.
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 0f731e11c414..d42abad5a0d4 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -361,7 +361,8 @@ static __init void os_xrstor_booting(struct xregs_state *xstate)
XFEATURE_MASK_PKRU | \
XFEATURE_MASK_PASID | \
XFEATURE_MASK_CET_USER | \
- XFEATURE_MASK_XTILE)
+ XFEATURE_MASK_XTILE | \
+ XFEATURE_MASK_APX)
/*
* setup the xstate image representing the init state
--
2.45.2
^ permalink raw reply related [flat|nested] 112+ messages in thread
* [PATCH RFC v1 11/11] selftests/x86/apx: Add APX test
2025-02-27 18:44 [PATCH RFC v1 00/11] x86: Support Intel Advanced Performance Extensions Chang S. Bae
` (9 preceding siblings ...)
2025-02-27 18:44 ` [PATCH RFC v1 10/11] x86/fpu/apx: Enable APX state support Chang S. Bae
@ 2025-02-27 18:44 ` Chang S. Bae
2025-02-27 19:15 ` [PATCH RFC v1 00/11] x86: Support Intel Advanced Performance Extensions Ingo Molnar
2025-03-20 23:42 ` [PATCH RFC v2 0/9] " Chang S. Bae
12 siblings, 0 replies; 112+ messages in thread
From: Chang S. Bae @ 2025-02-27 18:44 UTC (permalink / raw)
To: linux-kernel; +Cc: x86, tglx, mingo, bp, dave.hansen, chang.seok.bae
The extended general-purpose registers for APX may contain random data,
which is currently assumed by the xstate testing framework. This allows
the testing of the new userspace feature using the common test code.
Invoke the test entry function from apx.c after enumerating the
state component and adding it to the support list
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
The patch depends on the selftest xstate series, which was just merged
into the x86/fpu branch:
https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/log/?h=x86/fpu
Here is the original series posting:
https://lore.kernel.org/lkml/20250226010731.2456-1-chang.seok.bae@intel.com/
---
tools/testing/selftests/x86/Makefile | 3 ++-
tools/testing/selftests/x86/apx.c | 10 ++++++++++
tools/testing/selftests/x86/xstate.c | 3 ++-
tools/testing/selftests/x86/xstate.h | 1 +
4 files changed, 15 insertions(+), 2 deletions(-)
create mode 100644 tools/testing/selftests/x86/apx.c
diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests/x86/Makefile
index 28422c32cc8f..f703fcfe9f7c 100644
--- a/tools/testing/selftests/x86/Makefile
+++ b/tools/testing/selftests/x86/Makefile
@@ -19,7 +19,7 @@ TARGETS_C_32BIT_ONLY := entry_from_vm86 test_syscall_vdso unwind_vdso \
test_FCMOV test_FCOMI test_FISTTP \
vdso_restorer
TARGETS_C_64BIT_ONLY := fsgsbase sysret_rip syscall_numbering \
- corrupt_xstate_header amx lam test_shadow_stack avx
+ corrupt_xstate_header amx lam test_shadow_stack avx apx
# Some selftests require 32bit support enabled also on 64bit systems
TARGETS_C_32BIT_NEEDED := ldt_gdt ptrace_syscall
@@ -136,3 +136,4 @@ $(OUTPUT)/nx_stack_64: CFLAGS += -Wl,-z,noexecstack
$(OUTPUT)/avx_64: CFLAGS += -mno-avx -mno-avx512f
$(OUTPUT)/amx_64: EXTRA_FILES += xstate.c
$(OUTPUT)/avx_64: EXTRA_FILES += xstate.c
+$(OUTPUT)/apx_64: EXTRA_FILES += xstate.c
diff --git a/tools/testing/selftests/x86/apx.c b/tools/testing/selftests/x86/apx.c
new file mode 100644
index 000000000000..d9c8d41b8c5a
--- /dev/null
+++ b/tools/testing/selftests/x86/apx.c
@@ -0,0 +1,10 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define _GNU_SOURCE
+
+#include "xstate.h"
+
+int main(void)
+{
+ test_xstate(XFEATURE_APX);
+}
diff --git a/tools/testing/selftests/x86/xstate.c b/tools/testing/selftests/x86/xstate.c
index 23c1d6c964ea..97fe4bd8bc77 100644
--- a/tools/testing/selftests/x86/xstate.c
+++ b/tools/testing/selftests/x86/xstate.c
@@ -31,7 +31,8 @@
(1 << XFEATURE_OPMASK) | \
(1 << XFEATURE_ZMM_Hi256) | \
(1 << XFEATURE_Hi16_ZMM) | \
- (1 << XFEATURE_XTILEDATA))
+ (1 << XFEATURE_XTILEDATA) | \
+ (1 << XFEATURE_APX))
static inline uint64_t xgetbv(uint32_t index)
{
diff --git a/tools/testing/selftests/x86/xstate.h b/tools/testing/selftests/x86/xstate.h
index 42af36ec852f..f3c25193a3be 100644
--- a/tools/testing/selftests/x86/xstate.h
+++ b/tools/testing/selftests/x86/xstate.h
@@ -33,6 +33,7 @@ enum xfeature {
XFEATURE_RSRVD_COMP_16,
XFEATURE_XTILECFG,
XFEATURE_XTILEDATA,
+ XFEATURE_APX,
XFEATURE_MAX,
};
--
2.45.2
^ permalink raw reply related [flat|nested] 112+ messages in thread
* Re: [PATCH RFC v1 01/11] x86/fpu/xstate: Simplify print_xstate_features()
2025-02-27 18:44 ` [PATCH RFC v1 01/11] x86/fpu/xstate: Simplify print_xstate_features() Chang S. Bae
@ 2025-02-27 18:51 ` Dave Hansen
2025-02-27 18:52 ` Ingo Molnar
2025-02-27 19:04 ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
2 siblings, 0 replies; 112+ messages in thread
From: Dave Hansen @ 2025-02-27 18:51 UTC (permalink / raw)
To: Chang S. Bae, linux-kernel; +Cc: x86, tglx, mingo, bp, dave.hansen
On 2/27/25 10:44, Chang S. Bae wrote:
> I find it difficult to justify using separate lines for printing when
> they can be efficiently grouped under a simple for-loop. While this
> cleanup isn't directly related to APX, it felt like a necessary
> improvement, especially when new xstates are introduced.
Yeah, this approach made some sense when the string was defined there too:
print_xstate_feature(XSTATE_OPMASK, "AVX-512 opmask");
print_xstate_feature(XSTATE_ZMM_Hi256, "AVX-512 Hi256");
...
But it doesn't now that the strings are in a separate structure.
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [PATCH RFC v1 01/11] x86/fpu/xstate: Simplify print_xstate_features()
2025-02-27 18:44 ` [PATCH RFC v1 01/11] x86/fpu/xstate: Simplify print_xstate_features() Chang S. Bae
2025-02-27 18:51 ` Dave Hansen
@ 2025-02-27 18:52 ` Ingo Molnar
2025-02-27 19:04 ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
2 siblings, 0 replies; 112+ messages in thread
From: Ingo Molnar @ 2025-02-27 18:52 UTC (permalink / raw)
To: Chang S. Bae; +Cc: linux-kernel, x86, tglx, mingo, bp, dave.hansen
* Chang S. Bae <chang.seok.bae@intel.com> wrote:
> print_xstate_features() currently invokes print_xstate_feature() multiple
> times on separate lines, which can be simplified in a loop.
>
> print_xstate_feature() already checks the feature's enabled status and is
> only called within print_xstate_features(). Inline print_xstate_feature()
> and iterate over features in a loop to streamline the enabling message.
>
> No functional changes.
>
> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
> ---
> I find it difficult to justify using separate lines for printing when
> they can be efficiently grouped under a simple for-loop. While this
> cleanup isn't directly related to APX, it felt like a necessary
> improvement, especially when new xstates are introduced.
Agreed, and I've picked up this simplification independently in
tip:x86/fpu.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [PATCH RFC v1 02/11] x86/fpu/xstate: Introduce xstate order table and accessor macro
2025-02-27 18:44 ` [PATCH RFC v1 02/11] x86/fpu/xstate: Introduce xstate order table and accessor macro Chang S. Bae
@ 2025-02-27 19:03 ` Dave Hansen
2025-02-27 19:34 ` Andrew Cooper
2025-03-20 23:46 ` Chang S. Bae
2025-02-27 19:05 ` Ingo Molnar
1 sibling, 2 replies; 112+ messages in thread
From: Dave Hansen @ 2025-02-27 19:03 UTC (permalink / raw)
To: Chang S. Bae, linux-kernel; +Cc: x86, tglx, mingo, bp, dave.hansen
On 2/27/25 10:44, Chang S. Bae wrote:
> The kernel has largely assumed that higher xstate component numbers
> correspond to later offsets in the buffer. However, this assumption does
> not hold for the non-compacted format, where a newer state component may
> have a lower offset.
Maybe "no longer holds" instead of "does not hold".
This never happened before APX, right?
...
> +/*
> + * Ordering of xstate components in non-compacted format: The xfeature
> + * number does not necessarily indicate its position in the XSAVE buffer.
> + * This array defines the traversal order of xstate features, included in
> + * XFEATURE_MASK_USER_SUPPORTED.
> + */
> +static const enum xfeature xfeature_noncompact_order[] = {
Nit: I think we call this "uncompacted" in most places.
> + XFEATURE_FP,
> + XFEATURE_SSE,
> + XFEATURE_YMM,
> + XFEATURE_BNDREGS,
> + XFEATURE_BNDCSR,
> + XFEATURE_OPMASK,
> + XFEATURE_ZMM_Hi256,
> + XFEATURE_Hi16_ZMM,
> + XFEATURE_PKRU,
> + XFEATURE_XTILE_CFG,
> + XFEATURE_XTILE_DATA,
> +};
There are two possible ways to do this:
One, hard-code it like you have here. If we do this, we probably also
need some verification debug code that it got done right.
Two, dynamically generate it. I suspect that once you have the
uncompacted offset read out of CPUID, you could generate this in fewer
lines of code than it takes to define it manually. I think it would also
not require adding new features manually.
^ permalink raw reply [flat|nested] 112+ messages in thread
* [tip: x86/fpu] x86/fpu/xstate: Simplify print_xstate_features()
2025-02-27 18:44 ` [PATCH RFC v1 01/11] x86/fpu/xstate: Simplify print_xstate_features() Chang S. Bae
2025-02-27 18:51 ` Dave Hansen
2025-02-27 18:52 ` Ingo Molnar
@ 2025-02-27 19:04 ` tip-bot2 for Chang S. Bae
2 siblings, 0 replies; 112+ messages in thread
From: tip-bot2 for Chang S. Bae @ 2025-02-27 19:04 UTC (permalink / raw)
To: linux-tip-commits
Cc: Chang S. Bae, Ingo Molnar, Dave Hansen, x86, linux-kernel
The following commit has been merged into the x86/fpu branch of tip:
Commit-ID: 69a2fdf446049ae31be4a14a0cf16f2f18f09b6c
Gitweb: https://git.kernel.org/tip/69a2fdf446049ae31be4a14a0cf16f2f18f09b6c
Author: Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate: Thu, 27 Feb 2025 10:44:46 -08:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Thu, 27 Feb 2025 19:54:41 +01:00
x86/fpu/xstate: Simplify print_xstate_features()
print_xstate_features() currently invokes print_xstate_feature() multiple
times on separate lines, which can be simplified in a loop.
print_xstate_feature() already checks the feature's enabled status and is
only called within print_xstate_features(). Inline print_xstate_feature()
and iterate over features in a loop to streamline the enabling message.
No functional changes.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/r/20250227184502.10288-2-chang.seok.bae@intel.com
---
arch/x86/kernel/fpu/xstate.c | 30 +++++++++---------------------
1 file changed, 9 insertions(+), 21 deletions(-)
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 27417b6..6a41d16 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -259,32 +259,20 @@ static void __init setup_xstate_cache(void)
}
}
-static void __init print_xstate_feature(u64 xstate_mask)
-{
- const char *feature_name;
-
- if (cpu_has_xfeatures(xstate_mask, &feature_name))
- pr_info("x86/fpu: Supporting XSAVE feature 0x%03Lx: '%s'\n", xstate_mask, feature_name);
-}
-
/*
* Print out all the supported xstate features:
*/
static void __init print_xstate_features(void)
{
- print_xstate_feature(XFEATURE_MASK_FP);
- print_xstate_feature(XFEATURE_MASK_SSE);
- print_xstate_feature(XFEATURE_MASK_YMM);
- print_xstate_feature(XFEATURE_MASK_BNDREGS);
- print_xstate_feature(XFEATURE_MASK_BNDCSR);
- print_xstate_feature(XFEATURE_MASK_OPMASK);
- print_xstate_feature(XFEATURE_MASK_ZMM_Hi256);
- print_xstate_feature(XFEATURE_MASK_Hi16_ZMM);
- print_xstate_feature(XFEATURE_MASK_PKRU);
- print_xstate_feature(XFEATURE_MASK_PASID);
- print_xstate_feature(XFEATURE_MASK_CET_USER);
- print_xstate_feature(XFEATURE_MASK_XTILE_CFG);
- print_xstate_feature(XFEATURE_MASK_XTILE_DATA);
+ int i;
+
+ for (i = 0; i < XFEATURE_MAX; i++) {
+ u64 mask = BIT_ULL(i);
+ const char *name;
+
+ if (cpu_has_xfeatures(mask, &name))
+ pr_info("x86/fpu: Supporting XSAVE feature 0x%03Lx: '%s'\n", mask, name);
+ }
}
/*
^ permalink raw reply related [flat|nested] 112+ messages in thread
* Re: [PATCH RFC v1 02/11] x86/fpu/xstate: Introduce xstate order table and accessor macro
2025-02-27 18:44 ` [PATCH RFC v1 02/11] x86/fpu/xstate: Introduce xstate order table and accessor macro Chang S. Bae
2025-02-27 19:03 ` Dave Hansen
@ 2025-02-27 19:05 ` Ingo Molnar
2025-02-27 19:32 ` Dave Hansen
1 sibling, 1 reply; 112+ messages in thread
From: Ingo Molnar @ 2025-02-27 19:05 UTC (permalink / raw)
To: Chang S. Bae; +Cc: linux-kernel, x86, tglx, mingo, bp, dave.hansen
* Chang S. Bae <chang.seok.bae@intel.com> wrote:
> The kernel has largely assumed that higher xstate component numbers
> correspond to later offsets in the buffer. However, this assumption does
> not hold for the non-compacted format, where a newer state component may
> have a lower offset.
>
> When iterating over xstate components in offset order, using the feature
> number as an index may be misleading. At the same time, the CPU exposes
> each component’s size and offset based on its feature number, making it a
> key for state information.
>
> To provide flexibility in handling xstate ordering, add a mapping table:
> feature order -> feature number. This new array stores feature numbers in
> offset order, accompanied by an accessor macro.
>
> The macro enables sequential traversal of xstate components based on
> their actual buffer positions, given a feature bitmask. This will be
> particularly useful for computing customized non-compacted format sizes
> and sequentially accessing xstate offsets over non-compacted buffers.
>
> Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
> ---
> This lays the groundwork for handling APX, which has feature number 19
> but appears immediately after FEATURE_YMM, occupying the space previously
> reserved for the now-deprecated MPX state.
So why not just reuse the MPX feature number and have some sort of
extended feature bit that tells us and userspace that it's APX? The
xsave area is getting reused anyway - causing all this indirection and
trouble ...
Since we don't really do much for MPX in the kernel, and APX is not
supposed to be used by the kernel either, it should be pretty much a
shoe-in, right?
What's the scope of MPX support in actual processors out in the wild?
Thanks,
Ingo
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [PATCH RFC v1 06/11] x86/fpu/mpx: Remove MPX xstate component support
2025-02-27 18:44 ` [PATCH RFC v1 06/11] x86/fpu/mpx: Remove MPX xstate component support Chang S. Bae
@ 2025-02-27 19:12 ` Ingo Molnar
2025-02-28 3:12 ` Chang S. Bae
0 siblings, 1 reply; 112+ messages in thread
From: Ingo Molnar @ 2025-02-27 19:12 UTC (permalink / raw)
To: Chang S. Bae; +Cc: linux-kernel, x86, tglx, mingo, bp, dave.hansen
* Chang S. Bae <chang.seok.bae@intel.com> wrote:
> A new xstate component is set to occupy the position previously used by
> MPX in the non-compacted format, then creating a fundamental conflict
> between the two.
>
> Currently, xfeature_noncompact_order[] includes MPX, but the introduction
> of the new feature would cause a direct conflict there unless MPX is
> removed. Fortunately, MPX support has already been deprecated and
> effectively removed by commit:
>
> 45fc24e89b7c ("x86/mpx: remove MPX from arch/x86")
>
> Explicitly disable the deprecated feature to reserve a space for the new
> xstate.
>
> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
> ---
> Several code references to this feature macro remain, primarily on the
> KVM side. While they are likely to become obsolete after this patch,
> their cleanup has been deferred at this review stage: This can be
> addressed in a follow-up patch or included as an optional part of APX
> enablement, I suppose.
So can this patch be moved further up in the series (without breaking
anything), to make it easier to review the impact of the APX changes?
Thanks,
Ingo
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [PATCH RFC v1 00/11] x86: Support Intel Advanced Performance Extensions
2025-02-27 18:44 [PATCH RFC v1 00/11] x86: Support Intel Advanced Performance Extensions Chang S. Bae
` (10 preceding siblings ...)
2025-02-27 18:44 ` [PATCH RFC v1 11/11] selftests/x86/apx: Add APX test Chang S. Bae
@ 2025-02-27 19:15 ` Ingo Molnar
2025-02-27 19:36 ` Dave Hansen
2025-03-20 23:42 ` [PATCH RFC v2 0/9] " Chang S. Bae
12 siblings, 1 reply; 112+ messages in thread
From: Ingo Molnar @ 2025-02-27 19:15 UTC (permalink / raw)
To: Chang S. Bae
Cc: linux-kernel, x86, tglx, mingo, bp, dave.hansen, Linus Torvalds
* Chang S. Bae <chang.seok.bae@intel.com> wrote:
> == Introduction ==
>
> APX introduces a new set of general-purpose registers designed to improve
> performance. Currently, these registers are expected to be used primarily
> by userspace applications, with no intended use in kernel mode. More
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> details on its use cases can be found in the published documentation [1].
I strongly suspect this won't remain so, unless there's some horrible
ISA limitation or other quirk that makes APX unsuitable for kernel use.
The moment one of the mainstream compilers adds support for it we'll
want to enable it for the kernel too. Since the new GP registers are
caller-saved they are fairly easy to use as scratch registers to reduce
stack pressure, which is very much present on x86-64, especially when
the frame pointer is in use.
So we need to keep that in mind when shaping these patches.
Unless I'm missing something that is. :-)
Thanks,
Ingo
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [PATCH RFC v1 02/11] x86/fpu/xstate: Introduce xstate order table and accessor macro
2025-02-27 19:05 ` Ingo Molnar
@ 2025-02-27 19:32 ` Dave Hansen
2025-02-27 19:36 ` Ingo Molnar
0 siblings, 1 reply; 112+ messages in thread
From: Dave Hansen @ 2025-02-27 19:32 UTC (permalink / raw)
To: Ingo Molnar, Chang S. Bae; +Cc: linux-kernel, x86, tglx, mingo, bp, dave.hansen
On 2/27/25 11:05, Ingo Molnar wrote:
> So why not just reuse the MPX feature number and have some sort of
> extended feature bit that tells us and userspace that it's APX? The
> xsave area is getting reused anyway - causing all this indirection and
> trouble ...
>
> Since we don't really do much for MPX in the kernel, and APX is not
> supposed to be used by the kernel either, it should be pretty much a
> shoe-in, right?
Theoretically, yes.
In practice, no.
MPX didn't have hardware enabling separate from the XSAVE enabling. So
there is software out there that assumes that if it can see the MPX bits
set in XCR0 that MPX is around. Take a look at kvm_mpx_supported() as an
example. Would you be comfortable having an old kernel which had
kvm_mpx_supported()==1 and X86_FEATURE_MPX==0? What about other VMMs?
There were basically three choices:
1. Reuse XFEATURES 3/4 (MPX)
2. Create a new out-of-order XFEATURE 19 that reuses MPX space
3. Create a n in-order XFEATURE 19 that needs XFD and an opt-in
#1 risks breaking old MPX code in weird ways.
#2 requires teaching OSes about the out-of-order XSAVE offsets.
#3 is really nasty for hardware and software
So, you can blame me for picking (or consenting to) #2. But I do still
think it's the safest option. I'm not looking at this diffstat and
regretting the decision at all.
> What's the scope of MPX support in actual processors out in the wild?
IIRC, it showed up in Skylake, roughly 2015. By 2019, it was deprecated
and was mostly fused off in the "10th gen" CPUs:
> https://www.intel.com/content/www/us/en/support/articles/000059823/processors.html
APX is not out yet. No CPU will ever have APX and MPX, obviously.
There's also a ~5 year gap between CPUs that have one or the other.
Is that what you were asking?
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [PATCH RFC v1 02/11] x86/fpu/xstate: Introduce xstate order table and accessor macro
2025-02-27 19:03 ` Dave Hansen
@ 2025-02-27 19:34 ` Andrew Cooper
2025-02-27 19:42 ` Ingo Molnar
2025-03-20 23:46 ` Chang S. Bae
1 sibling, 1 reply; 112+ messages in thread
From: Andrew Cooper @ 2025-02-27 19:34 UTC (permalink / raw)
To: dave.hansen
Cc: bp, chang.seok.bae, dave.hansen, linux-kernel, mingo, tglx, x86
On 2/27/25 19:03, Dave Hansen wrote:
> On 2/27/25 10:44, Chang S. Bae wrote:
>> The kernel has largely assumed that higher xstate component numbers
>> correspond to later offsets in the buffer. However, this assumption
>> does not hold for the non-compacted format, where a newer state
>> component may have a lower offset.
> Maybe "no longer holds" instead of "does not hold".
>
> This never happened before APX, right?
I'm afraid that AMD beat you there by a decade with LWP, index 63 but
also overlaps the MPX state.
Except LWP support never became mainstream, and it also got sacrificed
to make room for IBPB in microcode, so you can safely ignore it[1].
~Andrew
[1] Other than for competitions of who did the silly thing first.
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [PATCH RFC v1 02/11] x86/fpu/xstate: Introduce xstate order table and accessor macro
2025-02-27 19:32 ` Dave Hansen
@ 2025-02-27 19:36 ` Ingo Molnar
2025-02-27 19:49 ` Andrew Cooper
2025-02-27 19:56 ` Dave Hansen
0 siblings, 2 replies; 112+ messages in thread
From: Ingo Molnar @ 2025-02-27 19:36 UTC (permalink / raw)
To: Dave Hansen; +Cc: Chang S. Bae, linux-kernel, x86, tglx, mingo, bp, dave.hansen
* Dave Hansen <dave.hansen@intel.com> wrote:
> On 2/27/25 11:05, Ingo Molnar wrote:
> > So why not just reuse the MPX feature number and have some sort of
> > extended feature bit that tells us and userspace that it's APX? The
> > xsave area is getting reused anyway - causing all this indirection and
> > trouble ...
> >
> > Since we don't really do much for MPX in the kernel, and APX is not
> > supposed to be used by the kernel either, it should be pretty much a
> > shoe-in, right?
>
> Theoretically, yes.
>
> In practice, no.
>
> MPX didn't have hardware enabling separate from the XSAVE enabling. So
> there is software out there that assumes that if it can see the MPX bits
> set in XCR0 that MPX is around. Take a look at kvm_mpx_supported() as an
> example. Would you be comfortable having an old kernel which had
> kvm_mpx_supported()==1 and X86_FEATURE_MPX==0? What about other VMMs?
I'd absolutely be comfortable if the APX bit was only enabled by newer,
APX-aware kernels. Older kernels would see a non-MPX CPU and would
handle it properly.
I really don't see the issue:
> There were basically three choices:
>
> 1. Reuse XFEATURES 3/4 (MPX)
> 2. Create a new out-of-order XFEATURE 19 that reuses MPX space
> 3. Create a n in-order XFEATURE 19 that needs XFD and an opt-in
>
> #1 risks breaking old MPX code in weird ways.
This is a false trichotomy. ;-)
There's a 4th option:
4. Reuse XFEATURES 3/4 (MPX) only on APX-aware kernels, keep it
disabled for old kernels.
Problem solved.
> Is that what you were asking?
Yeah, thanks!
Thanks,
Ingo
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [PATCH RFC v1 00/11] x86: Support Intel Advanced Performance Extensions
2025-02-27 19:15 ` [PATCH RFC v1 00/11] x86: Support Intel Advanced Performance Extensions Ingo Molnar
@ 2025-02-27 19:36 ` Dave Hansen
2025-02-28 3:10 ` Chang S. Bae
0 siblings, 1 reply; 112+ messages in thread
From: Dave Hansen @ 2025-02-27 19:36 UTC (permalink / raw)
To: Ingo Molnar, Chang S. Bae
Cc: linux-kernel, x86, tglx, mingo, bp, dave.hansen, Linus Torvalds
On 2/27/25 11:15, Ingo Molnar wrote:
>> by userspace applications, with no intended use in kernel mode. More
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> details on its use cases can be found in the published documentation [1].
> I strongly suspect this won't remain so, unless there's some horrible
> ISA limitation or other quirk that makes APX unsuitable for kernel use.
I think Chang was trying to say that this series is completely focused
on userspace AMX and basically ignores the idea of using it in the kernel.
That said, we honestly talk all the time about using it in the kernel. A
kernel compiled to use APX everywhere obviously won't run on older CPUs
but it _should_ have some really nice advantages. I completely expect
the kernel to be able to be compiled to use APX at _some_ point.
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [PATCH RFC v1 02/11] x86/fpu/xstate: Introduce xstate order table and accessor macro
2025-02-27 19:34 ` Andrew Cooper
@ 2025-02-27 19:42 ` Ingo Molnar
0 siblings, 0 replies; 112+ messages in thread
From: Ingo Molnar @ 2025-02-27 19:42 UTC (permalink / raw)
To: Andrew Cooper
Cc: dave.hansen, bp, chang.seok.bae, dave.hansen, linux-kernel, mingo,
tglx, x86
* Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> On 2/27/25 19:03, Dave Hansen wrote:
>
> > On 2/27/25 10:44, Chang S. Bae wrote:
> >> The kernel has largely assumed that higher xstate component numbers
> >> correspond to later offsets in the buffer. However, this assumption
> >> does not hold for the non-compacted format, where a newer state
> >> component may have a lower offset.
> > Maybe "no longer holds" instead of "does not hold".
> >
> > This never happened before APX, right?
>
> I'm afraid that AMD beat you there by a decade with LWP, index 63 but
> also overlaps the MPX state.
https://giphy.com/gifs/moodman-TJawtKM6OCKkvwCIqX
> Except LWP support never became mainstream, and it also got
> sacrificed to make room for IBPB in microcode, so you can safely
> ignore it[1].
yay! CPU makers always chose the right solution, once they have
exhaused all the alternatives.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [PATCH RFC v1 02/11] x86/fpu/xstate: Introduce xstate order table and accessor macro
2025-02-27 19:36 ` Ingo Molnar
@ 2025-02-27 19:49 ` Andrew Cooper
2025-02-27 20:06 ` Ingo Molnar
2025-02-27 19:56 ` Dave Hansen
1 sibling, 1 reply; 112+ messages in thread
From: Andrew Cooper @ 2025-02-27 19:49 UTC (permalink / raw)
To: mingo
Cc: bp, chang.seok.bae, dave.hansen, dave.hansen, linux-kernel, mingo,
tglx, x86
> I really don't see the issue:
>
>> There were basically three choices: 1. Reuse XFEATURES 3/4 (MPX) 2.
>> Create a new out-of-order XFEATURE 19 that reuses MPX space 3. Create
>> a n in-order XFEATURE 19 that needs XFD and an opt-in #1 risks
>> breaking old MPX code in weird ways.
>
> This is a false trichotomy. ;-)
>
> There's a 4th option:
>
> 4. Reuse XFEATURES 3/4 (MPX) only on APX-aware kernels, keep it
> disabled for old kernels.
>
> Problem solved.
Forget breaking MPX code in weird ways; there's very little of it, and
it was distinctly of negative value.
What options #1 and #4 will cause is the virt people to come after you
with sharp implements for creating incompatibilities in an ABI. The
XFEATUREs are the tag(ish) of the union that is the xsave buffer.
That said, I don't see why option 3 would need XFD support. I can see
why Intel didn't want to cause APX-userpsace to need a ~10k legacy XSAVE
buffer, but that's not really about XFD.
~Andrew
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [PATCH RFC v1 02/11] x86/fpu/xstate: Introduce xstate order table and accessor macro
2025-02-27 19:36 ` Ingo Molnar
2025-02-27 19:49 ` Andrew Cooper
@ 2025-02-27 19:56 ` Dave Hansen
2025-02-27 20:02 ` Ingo Molnar
1 sibling, 1 reply; 112+ messages in thread
From: Dave Hansen @ 2025-02-27 19:56 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Chang S. Bae, linux-kernel, x86, tglx, mingo, bp, dave.hansen
On 2/27/25 11:36, Ingo Molnar wrote:
>
>> There were basically three choices:
>>
>> 1. Reuse XFEATURES 3/4 (MPX)
>> 2. Create a new out-of-order XFEATURE 19 that reuses MPX space
>> 3. Create a n in-order XFEATURE 19 that needs XFD and an opt-in
>>
>> #1 risks breaking old MPX code in weird ways.
> This is a false trichotomy. 😉
>
> There's a 4th option:
>
> 4. Reuse XFEATURES 3/4 (MPX) only on APX-aware kernels, keep it
> disabled for old kernels.
>
> Problem solved.
The worry boils down to code in the kernel or userspace doing this:
if (XGETBV(0) & 0x11)
do_mpx_gunk();
So, sure, we could try to make sure that new kernels don't have any
do_mpx_gunk() in them, but that doesn't help old kernels or other OSes/VMMs.
The only thing that would help them would be to hide the XFEATURES 3/4
enumeration and require some kind of opt-in for it so that it can't be
enabled except with APX-aware software. That's totally possible, of
course. But it doesn't fix the userspace problem.
If folks really look at the diffstat:
> 10 files changed, 108 insertions(+), 57 deletions(-)
and hate it with _so_ much of a passion that we should go back and re
architect this thing, just say the word. I don't know how far along the
hardware is, but this kind of stuff is usually fixable in microcode. We
can totally go back to the drawing board and try to reuse XFEATURES 3/4
and see if there are any showstoppers (like with "other" OSes).
Let's just make sure that we've got a strong consensus amongst the
maintainer team, and I'm happy to oblige!
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [PATCH RFC v1 02/11] x86/fpu/xstate: Introduce xstate order table and accessor macro
2025-02-27 19:56 ` Dave Hansen
@ 2025-02-27 20:02 ` Ingo Molnar
2025-02-27 20:19 ` Dave Hansen
0 siblings, 1 reply; 112+ messages in thread
From: Ingo Molnar @ 2025-02-27 20:02 UTC (permalink / raw)
To: Dave Hansen
Cc: Chang S. Bae, linux-kernel, x86, tglx, mingo, bp, dave.hansen,
Linus Torvalds
* Dave Hansen <dave.hansen@intel.com> wrote:
> On 2/27/25 11:36, Ingo Molnar wrote:
> >
> >> There were basically three choices:
> >>
> >> 1. Reuse XFEATURES 3/4 (MPX)
> >> 2. Create a new out-of-order XFEATURE 19 that reuses MPX space
> >> 3. Create a n in-order XFEATURE 19 that needs XFD and an opt-in
> >>
> >> #1 risks breaking old MPX code in weird ways.
> > This is a false trichotomy. 😉
> >
> > There's a 4th option:
> >
> > 4. Reuse XFEATURES 3/4 (MPX) only on APX-aware kernels, keep it
> > disabled for old kernels.
> >
> > Problem solved.
>
> The worry boils down to code in the kernel or userspace doing this:
>
> if (XGETBV(0) & 0x11)
> do_mpx_gunk();
New, APX-aware kernels wouldn't be doing this, why would they?
> So, sure, we could try to make sure that new kernels don't have any
> do_mpx_gunk() in them, but that doesn't help old kernels or other
> OSes/VMMs.
Old kernels would *never* see this bit enabled if it's disabled by
default on bootup ...
VMMs would boot with it default-disabled as well, they can enable it
themselves.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [PATCH RFC v1 02/11] x86/fpu/xstate: Introduce xstate order table and accessor macro
2025-02-27 19:49 ` Andrew Cooper
@ 2025-02-27 20:06 ` Ingo Molnar
2025-02-27 21:10 ` Andrew Cooper
0 siblings, 1 reply; 112+ messages in thread
From: Ingo Molnar @ 2025-02-27 20:06 UTC (permalink / raw)
To: Andrew Cooper
Cc: bp, chang.seok.bae, dave.hansen, dave.hansen, linux-kernel, mingo,
tglx, x86
* Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> > I really don't see the issue:
> >
> >> There were basically three choices: 1. Reuse XFEATURES 3/4 (MPX) 2.
> >> Create a new out-of-order XFEATURE 19 that reuses MPX space 3. Create
> >> a n in-order XFEATURE 19 that needs XFD and an opt-in #1 risks
> >> breaking old MPX code in weird ways.
> >
> > This is a false trichotomy. ;-)
> >
> > There's a 4th option:
> >
> > 4. Reuse XFEATURES 3/4 (MPX) only on APX-aware kernels, keep it
> > disabled for old kernels.
> >
> > Problem solved.
>
> Forget breaking MPX code in weird ways; there's very little of it,
> and it was distinctly of negative value.
>
> What options #1 and #4 will cause is the virt people to come after
> you with sharp implements for creating incompatibilities in an ABI.
> The XFEATUREs are the tag(ish) of the union that is the xsave buffer.
There's no incompatibility for a default-disabled feature that gets
enabled by an AVX-aware host kernel and by AVX-aware guest kernels.
What ABI would be broken?
Kernels need to be updated for AVX support *anyway*, so it's not like
the proposed approach is a simple shoe-in for old kernels.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [PATCH RFC v1 02/11] x86/fpu/xstate: Introduce xstate order table and accessor macro
2025-02-27 20:02 ` Ingo Molnar
@ 2025-02-27 20:19 ` Dave Hansen
2025-02-27 20:29 ` Ingo Molnar
0 siblings, 1 reply; 112+ messages in thread
From: Dave Hansen @ 2025-02-27 20:19 UTC (permalink / raw)
To: Ingo Molnar
Cc: Chang S. Bae, linux-kernel, x86, tglx, mingo, bp, dave.hansen,
Linus Torvalds
On 2/27/25 12:02, Ingo Molnar wrote:
>>> There's a 4th option:
>>>
>>> 4. Reuse XFEATURES 3/4 (MPX) only on APX-aware kernels, keep it
>>> disabled for old kernels.
>>>
>>> Problem solved.
>> The worry boils down to code in the kernel or userspace doing this:
>>
>> if (XGETBV(0) & 0x11)
>> do_mpx_gunk();
> New, APX-aware kernels wouldn't be doing this, why would they?
New, updated software is rarely the problem. It would not be a problem
in this case either.
The problem is with old OSes/VMMs and even old userspace on new kernels.
XGETBV() is unprivileged, so the concern extends to userspace and kernel.
>> So, sure, we could try to make sure that new kernels don't have any
>> do_mpx_gunk() in them, but that doesn't help old kernels or other
>> OSes/VMMs.
> Old kernels would *never* see this bit enabled if it's disabled by
> default on bootup ...
>
> VMMs would boot with it default-disabled as well, they can enable it
> themselves.
The problem is with code which is logically like this:
cpuid_count(CPUID_LEAF_XSTATE, 0, &eax, &ebx, &ecx, &edx);
fpu_kernel_cfg.max_features = eax + ((u64)edx << 32);
fpu_kernel_cfg.max_features &= SUPPORTED_FEATURE_MASK;
xsetbv(fpu_kernel_cfg.max_features);
In that code, the kernel asks the CPU which features it supports and
then it enables all those features. In an old kernel,
SUPPORTED_FEATURE_MASK would contain the two MPX XFEATURES.
So I'm not _quite_ sure what you meant when you said:
> Old kernels would *never* see this bit enabled if it's disabled by
> default on bootup ...
Because old kernel very much do see the MPX bits in CPUID and very much
do enable them by default via XSETBV.
But I suspect I'm misunderstanding what you mean by "this bit" in first
place. Could you explain in some more detail, please?
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [PATCH RFC v1 02/11] x86/fpu/xstate: Introduce xstate order table and accessor macro
2025-02-27 20:19 ` Dave Hansen
@ 2025-02-27 20:29 ` Ingo Molnar
2025-02-27 21:17 ` Dave Hansen
0 siblings, 1 reply; 112+ messages in thread
From: Ingo Molnar @ 2025-02-27 20:29 UTC (permalink / raw)
To: Dave Hansen
Cc: Chang S. Bae, linux-kernel, x86, tglx, mingo, bp, dave.hansen,
Linus Torvalds
* Dave Hansen <dave.hansen@intel.com> wrote:
> On 2/27/25 12:02, Ingo Molnar wrote:
> >>> There's a 4th option:
> >>>
> >>> 4. Reuse XFEATURES 3/4 (MPX) only on APX-aware kernels, keep it
> >>> disabled for old kernels.
> >>>
> >>> Problem solved.
> >> The worry boils down to code in the kernel or userspace doing this:
> >>
> >> if (XGETBV(0) & 0x11)
> >> do_mpx_gunk();
> > New, APX-aware kernels wouldn't be doing this, why would they?
>
> New, updated software is rarely the problem. It would not be a problem
> in this case either.
>
> The problem is with old OSes/VMMs and even old userspace on new kernels.
So this is the 3rd time I have to repeat that if the MPX/AVX bit is
disabled in the CPU by default on bootup, these old kernels will work
fine just like they do today, so I think we might be talking past each
other.
Let's try again:
> So I'm not _quite_ sure what you meant when you said:
>
> > Old kernels would *never* see this bit enabled if it's disabled by
> > default on bootup ...
>
> Because old kernel very much do see the MPX bits in CPUID and very
> much do enable them by default via XSETBV.
My '#4' suggestion:
- On bootup the CPU would not have the MPX/AVX bit enabled. So old
host kernels are fine as-is.
- On new, AVX-aware host kernels, the AVX boot code enables the APX
bit (via a ususal CPUID discovery mechanism), which is the same as
the old MPX bit.
- Likewise, the APX init code in APX-aware guest kernels would be able
to enable the APX bit as well. Old guest kernels will have the
APX/MPX bit disabled and will continue to work fine.
- [ With this approach there's no out-of-order feature bit problem,
the APX feature bit lines up with the APX xsave area. Enabling AVX
would require very little code - only the CPUID discovery and MSR
twiddling to enable it in the FPU init path, plus similar KVM
code. ]
Thanks,
Ingo
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [PATCH RFC v1 02/11] x86/fpu/xstate: Introduce xstate order table and accessor macro
2025-02-27 20:06 ` Ingo Molnar
@ 2025-02-27 21:10 ` Andrew Cooper
2025-02-27 21:28 ` Ingo Molnar
0 siblings, 1 reply; 112+ messages in thread
From: Andrew Cooper @ 2025-02-27 21:10 UTC (permalink / raw)
To: Ingo Molnar
Cc: bp, chang.seok.bae, dave.hansen, dave.hansen, linux-kernel, mingo,
tglx, x86
On 27/02/2025 8:06 pm, Ingo Molnar wrote:
> * Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> What options #1 and #4 will cause is the virt people to come after
>> you with sharp implements for creating incompatibilities in an ABI.
>> The XFEATUREs are the tag(ish) of the union that is the xsave buffer.
> There's no incompatibility for a default-disabled feature that gets
> enabled by an AVX-aware host kernel and by AVX-aware guest kernels.
> What ABI would be broken?
I don't understand your question.
XSAVE, and details about in CPUID, are a stated ABI (given in the SDM
and APM), and available in userspace, including for userpace to write
into a file/socket and interpret later (this is literally how we migrate
VMs between different hosts).
You simply redefine what %xcr0.bnd_* (a.k.a. XFEATURES 3 and 4) mean,
irrespective of what a guest kernel thinks it can get away with.
~Andrew
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [PATCH RFC v1 02/11] x86/fpu/xstate: Introduce xstate order table and accessor macro
2025-02-27 20:29 ` Ingo Molnar
@ 2025-02-27 21:17 ` Dave Hansen
2025-02-27 21:37 ` Ingo Molnar
0 siblings, 1 reply; 112+ messages in thread
From: Dave Hansen @ 2025-02-27 21:17 UTC (permalink / raw)
To: Ingo Molnar
Cc: Chang S. Bae, linux-kernel, x86, tglx, mingo, bp, dave.hansen,
Linus Torvalds
On 2/27/25 12:29, Ingo Molnar wrote:
> My '#4' suggestion:
>
> - On bootup the CPU would not have the MPX/AVX bit enabled. So old
> host kernels are fine as-is.
Could you be specific about "the MPX bit"? Because, first of all, there
are a lot more than one bit in play here:
* The MPX state enumeration in CPUID.0DH:EAX[3:4]
* The MPX state enabling in XCR0[3:4]
* The MPX ISA enumeration CPUID.07H:EBX[14]
As far as the architecture goes, those XSAVE enabling/enumeration bits
are rather independent from the _additional_ X86_FEATURE_MPX ISA CPUID bit.
Like I showed in my earlier example, the CPU enumerates which XSAVE
features are available. These enumeration bits in CPUID leaf 0xd *ARE*
set at boot independent of any other enabling or enumeration. In this
regard, XSAVE enumeration is quite independent of the other parts of the
ISA. This could, in theory, be changed to become dependent on some kind
of APX enabling. But that would be novel for an XSAVE feature.
Right now, the entirety of CPUID.0DH:EAX/EDX is static. It does not
change based on enumeration or other enabling. It can literally come out
of the microcode ROM.
So are you proposing that CPUID.0DH:EAX[3:4] would be 0x0 or 0x3 at CPU
reset? Or, that CPUID.0DH:EAX[3:4] would default to 0x0 at reset and
then flip to 0x3 based on some other kind of novel APX opt-in?
(BTW, CPUID.0DH:EAX[3:4]==0x3 on all MPX hardware)
> - On new, AVX-aware host kernels, the AVX boot code enables the APX
> bit (via a ususal CPUID discovery mechanism), which is the same as
> the old MPX bit.
I think, here, you're suggesting that the new host kernel would set
XCR0[3:4]==0x3. Right?
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [PATCH RFC v1 02/11] x86/fpu/xstate: Introduce xstate order table and accessor macro
2025-02-27 21:10 ` Andrew Cooper
@ 2025-02-27 21:28 ` Ingo Molnar
2025-02-27 21:30 ` Ingo Molnar
0 siblings, 1 reply; 112+ messages in thread
From: Ingo Molnar @ 2025-02-27 21:28 UTC (permalink / raw)
To: Andrew Cooper
Cc: bp, chang.seok.bae, dave.hansen, dave.hansen, linux-kernel, mingo,
tglx, x86
* Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> > There's no incompatibility for a default-disabled feature that gets
> > enabled by an AVX-aware host kernel and by AVX-aware guest kernels.
> > What ABI would be broken?
>
> I don't understand your question.
>
> XSAVE, and details about in CPUID, are a stated ABI (given in the SDM
> and APM), and available in userspace, including for userpace to write
> into a file/socket and interpret later (this is literally how we
> migrate VMs between different hosts).
>
> You simply redefine what %xcr0.bnd_* (a.k.a. XFEATURES 3 and 4) mean,
> irrespective of what a guest kernel thinks it can get away with.
XFEATURES bits 3 and 4 are zero by default in the CPU, so the previous
ABI promise has been fully met: MPX is unavailable and disabled.
I propose a new addition, an extension of functionality: if a new CPUID
bit indicates it, and a new MSR is written, XFEATURES bit 3 becomes
active again - and the MPX area is now used by AVX. Obviously only
AVX-aware host and guest kernels would enable AVX.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [PATCH RFC v1 02/11] x86/fpu/xstate: Introduce xstate order table and accessor macro
2025-02-27 21:28 ` Ingo Molnar
@ 2025-02-27 21:30 ` Ingo Molnar
2025-02-27 22:23 ` Andrew Cooper
2025-02-28 3:10 ` Chang S. Bae
0 siblings, 2 replies; 112+ messages in thread
From: Ingo Molnar @ 2025-02-27 21:30 UTC (permalink / raw)
To: Andrew Cooper
Cc: bp, chang.seok.bae, dave.hansen, dave.hansen, linux-kernel, mingo,
tglx, x86
* Ingo Molnar <mingo@kernel.org> wrote:
>
> * Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>
> > > There's no incompatibility for a default-disabled feature that gets
> > > enabled by an AVX-aware host kernel and by AVX-aware guest kernels.
> > > What ABI would be broken?
> >
> > I don't understand your question.
> >
> > XSAVE, and details about in CPUID, are a stated ABI (given in the SDM
> > and APM), and available in userspace, including for userpace to write
> > into a file/socket and interpret later (this is literally how we
> > migrate VMs between different hosts).
> >
> > You simply redefine what %xcr0.bnd_* (a.k.a. XFEATURES 3 and 4) mean,
> > irrespective of what a guest kernel thinks it can get away with.
>
> XFEATURES bits 3 and 4 are zero by default in the CPU, so the previous
> ABI promise has been fully met: MPX is unavailable and disabled.
>
> I propose a new addition, an extension of functionality: if a new CPUID
> bit indicates it, and a new MSR is written, XFEATURES bit 3 becomes
> active again - and the MPX area is now used by AVX. Obviously only
> AVX-aware host and guest kernels would enable AVX.
Erm, s/AVX/APX ...
Too many acronyms.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [PATCH RFC v1 02/11] x86/fpu/xstate: Introduce xstate order table and accessor macro
2025-02-27 21:17 ` Dave Hansen
@ 2025-02-27 21:37 ` Ingo Molnar
2025-02-27 23:54 ` Dave Hansen
0 siblings, 1 reply; 112+ messages in thread
From: Ingo Molnar @ 2025-02-27 21:37 UTC (permalink / raw)
To: Dave Hansen
Cc: Chang S. Bae, linux-kernel, x86, tglx, mingo, bp, dave.hansen,
Linus Torvalds
* Dave Hansen <dave.hansen@intel.com> wrote:
> On 2/27/25 12:29, Ingo Molnar wrote:
> > My '#4' suggestion:
> >
> > - On bootup the CPU would not have the MPX/AVX bit enabled. So old
> > host kernels are fine as-is.
>
> Could you be specific about "the MPX bit"? Because, first of all, there
> are a lot more than one bit in play here:
>
> * The MPX state enumeration in CPUID.0DH:EAX[3:4]
> * The MPX state enabling in XCR0[3:4]
> * The MPX ISA enumeration CPUID.07H:EBX[14]
>
> As far as the architecture goes, those XSAVE enabling/enumeration bits
> are rather independent from the _additional_ X86_FEATURE_MPX ISA CPUID bit.
>
> Like I showed in my earlier example, the CPU enumerates which XSAVE
> features are available. These enumeration bits in CPUID leaf 0xd *ARE*
> set at boot independent of any other enabling or enumeration. In this
> regard, XSAVE enumeration is quite independent of the other parts of the
> ISA. This could, in theory, be changed to become dependent on some kind
> of APX enabling. But that would be novel for an XSAVE feature.
Yeah. That would be novel for an XSAVE feature - but so is the change
in ordering. With my proposal we'd avoid the
xfeature_noncompact_order[] indirection table AFAICS.
> Right now, the entirety of CPUID.0DH:EAX/EDX is static. It does not
> change based on enumeration or other enabling. It can literally come
> out of the microcode ROM.
>
> So are you proposing that CPUID.0DH:EAX[3:4] would be 0x0 or 0x3 at
> CPU reset? Or, that CPUID.0DH:EAX[3:4] would default to 0x0 at reset
> and then flip to 0x3 based on some other kind of novel APX opt-in?
Yeah, I was thinking of a novel APX opt-in in the CPUID space,
triggered via an MSR write or so, only triggered by APX-aware
host and guest kernels.
Old kernels would see 0 in all the legacy MPX state bits.
( Assuming this actually works for APX. )
Thanks,
Ingo
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [PATCH RFC v1 02/11] x86/fpu/xstate: Introduce xstate order table and accessor macro
2025-02-27 21:30 ` Ingo Molnar
@ 2025-02-27 22:23 ` Andrew Cooper
2025-02-28 3:10 ` Chang S. Bae
1 sibling, 0 replies; 112+ messages in thread
From: Andrew Cooper @ 2025-02-27 22:23 UTC (permalink / raw)
To: Ingo Molnar
Cc: bp, chang.seok.bae, dave.hansen, dave.hansen, linux-kernel, mingo,
tglx, x86
On 27/02/2025 9:30 pm, Ingo Molnar wrote:
> * Ingo Molnar <mingo@kernel.org> wrote:
>
>> * Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>
>>>> There's no incompatibility for a default-disabled feature that gets
>>>> enabled by an AVX-aware host kernel and by AVX-aware guest kernels.
>>>> What ABI would be broken?
>>> I don't understand your question.
>>>
>>> XSAVE, and details about in CPUID, are a stated ABI (given in the SDM
>>> and APM), and available in userspace, including for userpace to write
>>> into a file/socket and interpret later (this is literally how we
>>> migrate VMs between different hosts).
>>>
>>> You simply redefine what %xcr0.bnd_* (a.k.a. XFEATURES 3 and 4) mean,
>>> irrespective of what a guest kernel thinks it can get away with.
>> XFEATURES bits 3 and 4 are zero by default in the CPU, so the previous
>> ABI promise has been fully met: MPX is unavailable and disabled.
I disagree.
XFEATURES is defined in the spec, and the spec demands that userspace
checks it.
XGETBV(0) & 0x18 in userspace has a very well defined meaning that is
MPX and not "MPX now but something unrelated in the future".
>>
>> I propose a new addition, an extension of functionality: if a new CPUID
>> bit indicates it, and a new MSR is written, XFEATURES bit 3 becomes
>> active again - and the MPX area is now used by AVX. Obviously only
>> AVX-aware host and guest kernels would enable AVX.
> Erm, s/AVX/APX ...
>
> Too many acronyms.
With semantics like that, the pipeline designers will also be coming
after you with sharp objects.
~Andrew
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [PATCH RFC v1 09/11] x86/fpu/apx: Disallow conflicting MPX presence
2025-02-27 18:44 ` [PATCH RFC v1 09/11] x86/fpu/apx: Disallow conflicting MPX presence Chang S. Bae
@ 2025-02-27 23:16 ` Dave Hansen
2025-02-27 23:58 ` Chang S. Bae
0 siblings, 1 reply; 112+ messages in thread
From: Dave Hansen @ 2025-02-27 23:16 UTC (permalink / raw)
To: Chang S. Bae, linux-kernel; +Cc: x86, tglx, mingo, bp, dave.hansen
On 2/27/25 10:44, Chang S. Bae wrote:
>
> + if (fpu_kernel_cfg.max_features & XFEATURE_MASK_APX &&
> + fpu_kernel_cfg.max_features & XFEATURE_MASK_BNDREGS) {
> + /*
Does this also need to check XFEATURE_MASK_BNDCSR for completeness?
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [PATCH RFC v1 02/11] x86/fpu/xstate: Introduce xstate order table and accessor macro
2025-02-27 21:37 ` Ingo Molnar
@ 2025-02-27 23:54 ` Dave Hansen
2025-02-28 15:20 ` Sean Christopherson
0 siblings, 1 reply; 112+ messages in thread
From: Dave Hansen @ 2025-02-27 23:54 UTC (permalink / raw)
To: Ingo Molnar
Cc: Chang S. Bae, linux-kernel, x86, tglx, mingo, bp, dave.hansen,
Linus Torvalds, Sean Christopherson, Paolo Bonzini
On 2/27/25 13:37, Ingo Molnar wrote:
...
>> Like I showed in my earlier example, the CPU enumerates which XSAVE
>> features are available. These enumeration bits in CPUID leaf 0xd *ARE*
>> set at boot independent of any other enabling or enumeration. In this
>> regard, XSAVE enumeration is quite independent of the other parts of the
>> ISA. This could, in theory, be changed to become dependent on some kind
>> of APX enabling. But that would be novel for an XSAVE feature.
>
> Yeah. That would be novel for an XSAVE feature - but so is the change
> in ordering. With my proposal we'd avoid the
> xfeature_noncompact_order[] indirection table AFAICS.
Yeah, so with your proposal, we could toss out most of this series, so
roughly 100 lines of code.
The downsides are:
1. It can still confuse userspace, arguably in an architecture
violating manner because the SDM says: "If XCR0[4:3] is 11b, the
XSAVE feature set can be used to manage MPX state and software can
execute Intel MPX instructions." (this would be for userspace)
1a. Userspace like GDB still needs code to disambiguate XCR0[3:4]
2. It would add complexity in the XSAVE enumeration microcode. CPUID
data that comes right out of a ROM today would need to check some
CPU enabling state and change the enumeration.
3. Linux would still need to go fix up KVM in the kernel, like:
https://hansen.beer/~dave/intel/mpxapx.patch . Every APX-enabling
VMM would need something similar.
KVM folks, would you have any issues if XCR0[3:4] (the old MPX bits) got
used for this new APX feature? You'd basically need to add code to make
sure that XCR0[3:4]==0x3 don't imply MPX support, but _could_ imply APX
support instead, paired with another bit of enabling.
Looking back at Chang's series, I think there's ~40 lines of code
related to xfeature_noncompact_order[]. We'd need something like:
https://hansen.beer/~dave/intel/mpxapx.patch
and, honestly, just hacking that together, I really dislike the idea of
reusing bits. Even little things like this hunk:
- "MPX bounds registers",
- "MPX CSR",
+ "MPX bounds registers or some APX registers",
+ "MPX CSR or some other APX registers",
look really funky. So, we'd need some MSR bit twiddling, some KVM fixups
and special casing of the xsave_cpuid_features[] checks. We'd save, at
most, ~20 lines of code.
Honestly, that 40 lines of code is looking pretty good.
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [PATCH RFC v1 09/11] x86/fpu/apx: Disallow conflicting MPX presence
2025-02-27 23:16 ` Dave Hansen
@ 2025-02-27 23:58 ` Chang S. Bae
0 siblings, 0 replies; 112+ messages in thread
From: Chang S. Bae @ 2025-02-27 23:58 UTC (permalink / raw)
To: Dave Hansen, linux-kernel; +Cc: x86, tglx, mingo, bp, dave.hansen
On 2/27/2025 3:16 PM, Dave Hansen wrote:
> On 2/27/25 10:44, Chang S. Bae wrote:
>>
>> + if (fpu_kernel_cfg.max_features & XFEATURE_MASK_APX &&
>> + fpu_kernel_cfg.max_features & XFEATURE_MASK_BNDREGS) {
>> + /*
>
> Does this also need to check XFEATURE_MASK_BNDCSR for completeness?
Ah, right. Sorry, it was a silly mistake. Thanks for reading through this!
Chang
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [PATCH RFC v1 02/11] x86/fpu/xstate: Introduce xstate order table and accessor macro
2025-02-27 21:30 ` Ingo Molnar
2025-02-27 22:23 ` Andrew Cooper
@ 2025-02-28 3:10 ` Chang S. Bae
2025-02-28 17:11 ` Ingo Molnar
1 sibling, 1 reply; 112+ messages in thread
From: Chang S. Bae @ 2025-02-28 3:10 UTC (permalink / raw)
To: Ingo Molnar, Andrew Cooper
Cc: bp, dave.hansen, dave.hansen, linux-kernel, mingo, tglx, x86
On 2/27/2025 1:30 PM, Ingo Molnar wrote:
>
> * Ingo Molnar <mingo@kernel.org> wrote:
>>
>> I propose a new addition, an extension of functionality: if a new CPUID
>> bit indicates it, and a new MSR is written, XFEATURES bit 3 becomes
>> active again - and the MPX area is now used by AVX. Obviously only
>> AVX-aware host and guest kernels would enable AVX.
>
> Erm, s/AVX/APX ...
Just thought of another aspect of this:
I'm curious about how core dumps should handle this. Initially, an
xfeature mask was added to the software-reserved area [1] to indicate
which xfeatures were present in the layout. More recently, a new note
[2] was introduced to expose CPUID-reported size and offset information,
helping tools like GDB. From an offline interpretation standpoint, I
think these bits will become ambiguous without further extensions.
[1] commit 5b3efd500854 ("x86, ptrace: regset extensions to support xstate")
[2] commit ba386777a30b ("x86/elf: Add a new FPU buffer layout info to
x86 core files")
Thanks,
Chang
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [PATCH RFC v1 00/11] x86: Support Intel Advanced Performance Extensions
2025-02-27 19:36 ` Dave Hansen
@ 2025-02-28 3:10 ` Chang S. Bae
0 siblings, 0 replies; 112+ messages in thread
From: Chang S. Bae @ 2025-02-28 3:10 UTC (permalink / raw)
To: Dave Hansen, Ingo Molnar
Cc: linux-kernel, x86, tglx, mingo, bp, dave.hansen, Linus Torvalds
On 2/27/2025 11:36 AM, Dave Hansen wrote:
> On 2/27/25 11:15, Ingo Molnar wrote:
>>> by userspace applications, with no intended use in kernel mode. More
>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>> details on its use cases can be found in the published documentation [1].
>> I strongly suspect this won't remain so, unless there's some horrible
>> ISA limitation or other quirk that makes APX unsuitable for kernel use.
>
> I think Chang was trying to say that this series is completely focused
> on userspace AMX and basically ignores the idea of using it in the kernel.
>
> That said, we honestly talk all the time about using it in the kernel. A
> kernel compiled to use APX everywhere obviously won't run on older CPUs
> but it _should_ have some really nice advantages. I completely expect
> the kernel to be able to be compiled to use APX at _some_ point.
Yes, exactly. My primary intent was to clarify that this posting is for
supporting userspace APX usage. At the same time, I don’t want this to
imply little or no potential for in-kernel use.
Thanks,
Chang
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [PATCH RFC v1 06/11] x86/fpu/mpx: Remove MPX xstate component support
2025-02-27 19:12 ` Ingo Molnar
@ 2025-02-28 3:12 ` Chang S. Bae
2025-03-20 23:47 ` Chang S. Bae
0 siblings, 1 reply; 112+ messages in thread
From: Chang S. Bae @ 2025-02-28 3:12 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel, x86, tglx, mingo, bp, dave.hansen
On 2/27/2025 11:12 AM, Ingo Molnar wrote:
>
> So can this patch be moved further up in the series (without breaking
> anything), to make it easier to review the impact of the APX changes?
Will do it.
Thanks,
Chang
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [PATCH RFC v1 02/11] x86/fpu/xstate: Introduce xstate order table and accessor macro
2025-02-27 23:54 ` Dave Hansen
@ 2025-02-28 15:20 ` Sean Christopherson
0 siblings, 0 replies; 112+ messages in thread
From: Sean Christopherson @ 2025-02-28 15:20 UTC (permalink / raw)
To: Dave Hansen
Cc: Ingo Molnar, Chang S. Bae, linux-kernel, x86, tglx, mingo, bp,
dave.hansen, Linus Torvalds, Paolo Bonzini
On Thu, Feb 27, 2025, Dave Hansen wrote:
> On 2/27/25 13:37, Ingo Molnar wrote:
> ...
> >> Like I showed in my earlier example, the CPU enumerates which XSAVE
> >> features are available. These enumeration bits in CPUID leaf 0xd *ARE*
> >> set at boot independent of any other enabling or enumeration. In this
> >> regard, XSAVE enumeration is quite independent of the other parts of the
> >> ISA. This could, in theory, be changed to become dependent on some kind
> >> of APX enabling. But that would be novel for an XSAVE feature.
> >
> > Yeah. That would be novel for an XSAVE feature - but so is the change
> > in ordering. With my proposal we'd avoid the
> > xfeature_noncompact_order[] indirection table AFAICS.
>
> Yeah, so with your proposal, we could toss out most of this series, so
> roughly 100 lines of code.
>
> The downsides are:
>
> 1. It can still confuse userspace, arguably in an architecture
> violating manner because the SDM says: "If XCR0[4:3] is 11b, the
> XSAVE feature set can be used to manage MPX state and software can
> execute Intel MPX instructions." (this would be for userspace)
> 1a. Userspace like GDB still needs code to disambiguate XCR0[3:4]
> 2. It would add complexity in the XSAVE enumeration microcode. CPUID
> data that comes right out of a ROM today would need to check some
> CPU enabling state and change the enumeration.
> 3. Linux would still need to go fix up KVM in the kernel, like:
> https://hansen.beer/~dave/intel/mpxapx.patch . Every APX-enabling
> VMM would need something similar.
>
> KVM folks, would you have any issues if XCR0[3:4] (the old MPX bits) got
> used for this new APX feature?
Yes. I could live with complexity in KVM code, but I agree 100% with Andrew's
take:
: XGETBV(0) & 0x18 in userspace has a very well defined meaning that is
: MPX and not "MPX now but something unrelated in the future".
The risk of breaking guest kernels and guest userspace is decidedly non-zero.
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [PATCH RFC v1 02/11] x86/fpu/xstate: Introduce xstate order table and accessor macro
2025-02-28 3:10 ` Chang S. Bae
@ 2025-02-28 17:11 ` Ingo Molnar
0 siblings, 0 replies; 112+ messages in thread
From: Ingo Molnar @ 2025-02-28 17:11 UTC (permalink / raw)
To: Chang S. Bae
Cc: Andrew Cooper, bp, dave.hansen, dave.hansen, linux-kernel, mingo,
tglx, x86, Linus Torvalds
* Chang S. Bae <chang.seok.bae@intel.com> wrote:
> > > I propose a new addition, an extension of functionality: if a new
> > > CPUID bit indicates it, and a new MSR is written, XFEATURES bit 3
> > > becomes active again - and the MPX area is now used by AVX.
> > > Obviously only AVX-aware host and guest kernels would enable AVX.
> >
> > Erm, s/AVX/APX ...
>
> Just thought of another aspect of this:
>
> I'm curious about how core dumps should handle this. Initially, an
> xfeature mask was added to the software-reserved area [1] to indicate
> which xfeatures were present in the layout. More recently, a new note
> [2] was introduced to expose CPUID-reported size and offset
> information, helping tools like GDB. From an offline interpretation
> standpoint, I think these bits will become ambiguous without further
> extensions.
>
> [1] commit 5b3efd500854 ("x86, ptrace: regset extensions to support xstate")
> [2] commit ba386777a30b ("x86/elf: Add a new FPU buffer layout info to x86
> core files")
Okay, I guess I agree and you guys are right, the MPX/APX ambiguity is
probably not worth it.
Still not happy about xfeature_noncompact_order[], but I guess that is
the price if we want to reuse the MPX area. :-/
Thanks,
Ingo
^ permalink raw reply [flat|nested] 112+ messages in thread
* [PATCH RFC v2 0/9] x86: Support Intel Advanced Performance Extensions
2025-02-27 18:44 [PATCH RFC v1 00/11] x86: Support Intel Advanced Performance Extensions Chang S. Bae
` (11 preceding siblings ...)
2025-02-27 19:15 ` [PATCH RFC v1 00/11] x86: Support Intel Advanced Performance Extensions Ingo Molnar
@ 2025-03-20 23:42 ` Chang S. Bae
2025-03-20 23:42 ` [PATCH RFC v2 1/9] x86/fpu/xstate: Remove xstate offset check Chang S. Bae
` (9 more replies)
12 siblings, 10 replies; 112+ messages in thread
From: Chang S. Bae @ 2025-03-20 23:42 UTC (permalink / raw)
To: linux-kernel; +Cc: x86, tglx, mingo, bp, dave.hansen, chang.seok.bae
Hi all,
This is a revision of the previous posting [1]. The primary goal at this
stage is to present the scope of the code changes and still gather early
feedback on the approach to enabling this feature for userspace.
== Changes in this revision ==
A major update in this version is that the xstate ordering table is now
dynamically populated at boot time, thanks to Dave’s suggestion. Each
system will now generate its own table based on the available xfeature
set.
As a result, the MPX removal patch has been dropped from this series.
Previously, a conflict arose when listing both MPX and APX in a static
table. With the new dynamic approach, each feature can be enabled
exclusively without conflict. Patch 7 ensures their mutual exclusivity,
meaning the table will now reflect either one or neither of them.
Removing MPX entirely also seemed premature, given previous discussions
on the list [4]. For guest compatibility, setting MPX bits in XCR0 is
still expected on older systems.
Additional changes since the last posting:
* Complete MPX feature bits for enforcing APX exclusivity (Dave).
* Remove the first patch, as merged into the tip tree:
69a2fdf44604 ("x86/fpu/xstate: Simplify print_xstate_features()")
Below is the original cover letter with minor updates:
== Introduction ==
APX introduces a new set of general-purpose registers designed to improve
performance. More details on its use cases can be found in the published
documentation [2].
This series primarily assumes the userspace usage only, though in-kernel
use cases may remain open for considerations.
== Points to Consider ==
Considering APX enabling by the kernel,
* New Register State
APX register state is managed by the XSAVE instruction set, with its
state component number assigned to 19, following XTILEDATA.
* XSAVE Buffer Offset
- In the compacted format used (for the in-kernel buffer), the APX
state will appear at a later position in the buffer.
- In the non-compacted format (used for signal, ptrace, and KVM ABIs),
APX is assigned a lower offset, occupying the space previously
reserved for the deprecated MPX state.
This should not bring any ABI changes. In the extended register state
area, each state's size and offset are determined dynamically via
CPUID.
* Kernel Assumptions:
The kernel generally assumes that higher-numbered components have
higher offsets, as discussed in the following section.
== Areas for Adjustment ==
With that in mind, here are a few key areas that need to be addressed to
properly handle the APX state. While these are thought to be pretty much
at this stage, I'd say it is still open to discover possible hindsights
in other areas.
1. Feature Conflict
While a valid CPU should not expose both MPX and APX, a broken or
misconfigured CPU could erroneously expose support for both. This hard
conflict should be avoided up front.
2. XSAVE Format Conversion
APX introduces an offset anomaly that requires special handling when
converting between compacted and non-compacted layouts. The kernel
relies on XSAVE instructions for context switching and signal
handling. However, xstate copy functions must correctly translate the
XSAVE format while ensuring sequential memory access, as enforced by
struct membuf.
3. XSAVE Size Calculation
The kernel calculates XSAVE buffer sizes based on the assumption that
the highest-numbered feature appears at the end. If APX is the last
bit set in the feature mask, the existing logic in
xstate_calculate_size() miscalculates the buffer size.
4. Offset Sanity Check
The kernel's boot-time sanity check in setup_xstate_cache() assumes
offsets increase with feature numbers. The APX state will conflict
with this assumption.
== Approaches ==
These two approaches are considerable:
Option 1, Consider APX as a one-off exception
Initially, I thought this approach would result in fewer changes to the
xstate code. However, it makes the code less comprehensive (or more
complicated) and introduces a risk. If another feature similar to APX
comes up, adding another exception seems to be inefficient and messy.
Option 2, Handle out-of-order offsets in general
Rather than treating APX as an exception, this approach adapts the
kernel to accommodate out-of-order offsets. By introducing a feature
order table and an accompanying macro, the traversal logic is
encapsulated cleanly, simplifying related code. This makes the kernel
more resilient to future features with non-sequential offsets.
== Series Summary ==
This patch set addresses the above issues before enabling APX support
The chosen approach (Option 2) adjusts the kernel to handle out-of-order
offsets. Here’s a breakdown of the patches:
* PART1: XSTATE Code Adjustment
- PATCH1: Remove the offset sanity check.
- PATCH2: Introduce the feature order table.
- PATCH3: Adjust XSAVE size calculation.
- PATCH4: Modify the xstate copy function.
* PART2: APX Enabling
- PATCH5: Enumerate the APX CPUID feature bit.
- PATCH6: Update xstate definitions to include APX.
- PATCH7: Ensure MPX and APX are mutually exclusive.
- PATCH8: Register APX in the supported xstate list.
- PATCH9: Add a self-test case for APX.
== Testing ==
The first part is agnostic to APX and should preserve the same
functionality for the currently supported features. The xstate tests --
covering context switching and ABI compatibility for signal and ptrace --
were executed to ensure there is no regression.
Now, with dynamic population of the ordering table, MPX and APX states
can be separately enabled in different systems:
* APX: PATCH9 applies to the same test set and builds on the xstate
selftest rework [3]. Since no hardware implementation is available at
this time, an internal Intel emulator was primarily used to verify the
test cases.
* MPX exposure in KVM guest was also validated from a Skylake machine.
---
The patches are based on the tip/master branch. The series can be found
here:
git://github.com/intel/apx.git apx_rfc-v2
Thanks,
Chang
[1] https://lore.kernel.org/lkml/20250227184502.10288-1-chang.seok.bae@intel.com/
[2] https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html
[3] https://lore.kernel.org/lkml/20250226010731.2456-1-chang.seok.bae@intel.com/
[4] https://lore.kernel.org/lkml/547a1203-0339-7ad2-9505-eab027046298@intel.com/
Chang S. Bae (9):
x86/fpu/xstate: Remove xstate offset check
x86/fpu/xstate: Introduce xfeature order table and accessor macro
x86/fpu/xstate: Adjust XSAVE buffer size calculation
x86/fpu/xstate: Adjust xstate copying logic for user ABI
x86/cpufeatures: Add X86_FEATURE_APX
x86/fpu/apx: Define APX state component
x86/fpu/apx: Disallow conflicting MPX presence
x86/fpu/apx: Enable APX state support
selftests/x86/apx: Add APX test
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/fpu/types.h | 9 +++
arch/x86/include/asm/fpu/xstate.h | 3 +-
arch/x86/kernel/cpu/cpuid-deps.c | 1 +
arch/x86/kernel/cpu/scattered.c | 1 +
arch/x86/kernel/fpu/xstate.c | 115 +++++++++++++++++++--------
tools/testing/selftests/x86/Makefile | 3 +-
tools/testing/selftests/x86/apx.c | 10 +++
tools/testing/selftests/x86/xstate.c | 3 +-
tools/testing/selftests/x86/xstate.h | 2 +
10 files changed, 113 insertions(+), 35 deletions(-)
create mode 100644 tools/testing/selftests/x86/apx.c
base-commit: 758ea9705c51865858c612f591c6e6950dcafccf
--
2.45.2
^ permalink raw reply [flat|nested] 112+ messages in thread
* [PATCH RFC v2 1/9] x86/fpu/xstate: Remove xstate offset check
2025-03-20 23:42 ` [PATCH RFC v2 0/9] " Chang S. Bae
@ 2025-03-20 23:42 ` Chang S. Bae
2025-03-25 10:18 ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
2025-04-14 7:34 ` [tip: x86/merge] " tip-bot2 for Chang S. Bae
2025-03-20 23:42 ` [PATCH RFC v2 2/9] x86/fpu/xstate: Introduce xfeature order table and accessor macro Chang S. Bae
` (8 subsequent siblings)
9 siblings, 2 replies; 112+ messages in thread
From: Chang S. Bae @ 2025-03-20 23:42 UTC (permalink / raw)
To: linux-kernel; +Cc: x86, tglx, mingo, bp, dave.hansen, chang.seok.bae
Traditionally, new xstate components have been assigned sequentially,
aligning feature numbers with their offsets in the XSAVE buffer. However,
this ordering is not architecturally mandated in the non-compacted
format, where a component's offset may not correspond to its feature
number.
The kernel caches CPUID-reported xstate component details, including size
and offset in the non-compacted format. As part of this process, a sanity
check is also conducted to ensure alignment between feature numbers and
offsets.
This check was likely intended as a general guideline rather than a
strict requirement. Upcoming changes will support out-of-order offsets.
Remove the check as becoming obsolete.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
arch/x86/kernel/fpu/xstate.c | 13 -------------
1 file changed, 13 deletions(-)
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 6a41d1610d8b..542c6981180d 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -216,9 +216,6 @@ static bool xfeature_enabled(enum xfeature xfeature)
static void __init setup_xstate_cache(void)
{
u32 eax, ebx, ecx, edx, i;
- /* start at the beginning of the "extended state" */
- unsigned int last_good_offset = offsetof(struct xregs_state,
- extended_state_area);
/*
* The FP xstates and SSE xstates are legacy states. They are always
* in the fixed offsets in the xsave area in either compacted form
@@ -246,16 +243,6 @@ static void __init setup_xstate_cache(void)
continue;
xstate_offsets[i] = ebx;
-
- /*
- * In our xstate size checks, we assume that the highest-numbered
- * xstate feature has the highest offset in the buffer. Ensure
- * it does.
- */
- WARN_ONCE(last_good_offset > xstate_offsets[i],
- "x86/fpu: misordered xstate at %d\n", last_good_offset);
-
- last_good_offset = xstate_offsets[i];
}
}
--
2.45.2
^ permalink raw reply related [flat|nested] 112+ messages in thread
* [PATCH RFC v2 2/9] x86/fpu/xstate: Introduce xfeature order table and accessor macro
2025-03-20 23:42 ` [PATCH RFC v2 0/9] " Chang S. Bae
2025-03-20 23:42 ` [PATCH RFC v2 1/9] x86/fpu/xstate: Remove xstate offset check Chang S. Bae
@ 2025-03-20 23:42 ` Chang S. Bae
2025-03-25 10:18 ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
2025-04-14 7:34 ` [tip: x86/merge] " tip-bot2 for Chang S. Bae
2025-03-20 23:42 ` [PATCH RFC v2 3/9] x86/fpu/xstate: Adjust XSAVE buffer size calculation Chang S. Bae
` (7 subsequent siblings)
9 siblings, 2 replies; 112+ messages in thread
From: Chang S. Bae @ 2025-03-20 23:42 UTC (permalink / raw)
To: linux-kernel; +Cc: x86, tglx, mingo, bp, dave.hansen, chang.seok.bae
The kernel has largely assumed that higher xstate component numbers
correspond to later offsets in the buffer. However, this assumption no
longer holds for the non-compacted format, where a newer state component
may have a lower offset.
When iterating over xstate components in offset order, using the feature
number as an index may be misleading. At the same time, the CPU exposes
each component’s size and offset based on its feature number, making it a
key for state information.
To provide flexibility in handling xstate ordering, introduce a mapping
table: feature order -> feature number. The table is dynamically
populated based on the CPU-exposed features and is sorted in offset order
at boot time.
Additionally, add an accessor macro to facilitate sequential traversal of
xstate components based on their actual buffer positions, given a feature
bitmask. This accessor macro will be particularly useful for computing
custom non-compacted format sizes and iterating over xstate offsets in
non-compacted buffers.
Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Link: https://lore.kernel.org/all/7fa02be2-0884-4702-ae73-a3620938161b@intel.com
---
RFC-V1 -> RFC-V2: Populate the order table dynamically (Dave).
This introduction lays the groundwork for handling APX, which is assigned
feature number 19 but appears immediately after FEATURE_YMM in
APX-enabled systems. Older CPUs, such as Skylake systems, previously used
this region for MPX.
Later in this series, APX and MPX will be explicitly marked as mutually
exclusive to prevent conflicts. Since the xfeature order table is
dynamically populated at boot, it will reflect the correct feature set
for each system configuration.
---
arch/x86/kernel/fpu/xstate.c | 58 +++++++++++++++++++++++++++++++-----
1 file changed, 50 insertions(+), 8 deletions(-)
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 542c6981180d..1e22103a8e17 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -14,6 +14,7 @@
#include <linux/proc_fs.h>
#include <linux/vmalloc.h>
#include <linux/coredump.h>
+#include <linux/sort.h>
#include <asm/fpu/api.h>
#include <asm/fpu/regset.h>
@@ -88,6 +89,31 @@ static unsigned int xstate_sizes[XFEATURE_MAX] __ro_after_init =
{ [ 0 ... XFEATURE_MAX - 1] = -1};
static unsigned int xstate_flags[XFEATURE_MAX] __ro_after_init;
+/*
+ * Ordering of xstate components in uncompacted format: The xfeature
+ * number does not necessarily indicate its position in the XSAVE buffer.
+ * This array defines the traversal order of xstate features.
+ */
+static unsigned int xfeature_uncompact_order[XFEATURE_MAX] __ro_after_init =
+ { [ 0 ... XFEATURE_MAX - 1] = -1};
+
+static inline unsigned int next_xfeature_order(unsigned int i, u64 mask)
+{
+ for (; xfeature_uncompact_order[i] != -1; i++) {
+ if (mask & BIT_ULL(xfeature_uncompact_order[i]))
+ break;
+ }
+
+ return i;
+}
+
+/* Iterate xstate features in uncompacted order: */
+#define for_each_extended_xfeature_in_order(i, mask) \
+ for (i = 0; \
+ i = next_xfeature_order(i, mask), \
+ xfeature_uncompact_order[i] != -1; \
+ i++)
+
#define XSTATE_FLAG_SUPERVISOR BIT(0)
#define XSTATE_FLAG_ALIGNED64 BIT(1)
@@ -209,13 +235,20 @@ static bool xfeature_enabled(enum xfeature xfeature)
return fpu_kernel_cfg.max_features & BIT_ULL(xfeature);
}
+static int compare_xstate_offsets(const void *xfeature1, const void *xfeature2)
+{
+ return xstate_offsets[*(unsigned int *)xfeature1] -
+ xstate_offsets[*(unsigned int *)xfeature2];
+}
+
/*
* Record the offsets and sizes of various xstates contained
- * in the XSAVE state memory layout.
+ * in the XSAVE state memory layout. Also, create an ordered
+ * list of xfeatures for handling out-of-order offsets.
*/
static void __init setup_xstate_cache(void)
{
- u32 eax, ebx, ecx, edx, i;
+ u32 eax, ebx, ecx, edx, xfeature, i = 0;
/*
* The FP xstates and SSE xstates are legacy states. They are always
* in the fixed offsets in the xsave area in either compacted form
@@ -229,21 +262,30 @@ static void __init setup_xstate_cache(void)
xstate_sizes[XFEATURE_SSE] = sizeof_field(struct fxregs_state,
xmm_space);
- for_each_extended_xfeature(i, fpu_kernel_cfg.max_features) {
- cpuid_count(CPUID_LEAF_XSTATE, i, &eax, &ebx, &ecx, &edx);
+ for_each_extended_xfeature(xfeature, fpu_kernel_cfg.max_features) {
+ cpuid_count(CPUID_LEAF_XSTATE, xfeature, &eax, &ebx, &ecx, &edx);
- xstate_sizes[i] = eax;
- xstate_flags[i] = ecx;
+ xstate_sizes[xfeature] = eax;
+ xstate_flags[xfeature] = ecx;
/*
* If an xfeature is supervisor state, the offset in EBX is
* invalid, leave it to -1.
*/
- if (xfeature_is_supervisor(i))
+ if (xfeature_is_supervisor(xfeature))
continue;
- xstate_offsets[i] = ebx;
+ xstate_offsets[xfeature] = ebx;
+
+ /* Populate the list of xfeatures before sorting */
+ xfeature_uncompact_order[i++] = xfeature;
}
+
+ /*
+ * Sort xfeatures by their offsets to support out-of-order
+ * offsets in the uncompacted format.
+ */
+ sort(xfeature_uncompact_order, i, sizeof(unsigned int), compare_xstate_offsets, NULL);
}
/*
--
2.45.2
^ permalink raw reply related [flat|nested] 112+ messages in thread
* [PATCH RFC v2 3/9] x86/fpu/xstate: Adjust XSAVE buffer size calculation
2025-03-20 23:42 ` [PATCH RFC v2 0/9] " Chang S. Bae
2025-03-20 23:42 ` [PATCH RFC v2 1/9] x86/fpu/xstate: Remove xstate offset check Chang S. Bae
2025-03-20 23:42 ` [PATCH RFC v2 2/9] x86/fpu/xstate: Introduce xfeature order table and accessor macro Chang S. Bae
@ 2025-03-20 23:42 ` Chang S. Bae
2025-03-25 10:28 ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
2025-04-14 7:34 ` [tip: x86/merge] " tip-bot2 for Chang S. Bae
2025-03-20 23:42 ` [PATCH RFC v2 4/9] x86/fpu/xstate: Adjust xstate copying logic for user ABI Chang S. Bae
` (6 subsequent siblings)
9 siblings, 2 replies; 112+ messages in thread
From: Chang S. Bae @ 2025-03-20 23:42 UTC (permalink / raw)
To: linux-kernel; +Cc: x86, tglx, mingo, bp, dave.hansen, chang.seok.bae
The current xstate size calculation assumes that the highest-numbered
xstate feature has the highest offset in the buffer, determining the size
based on the topmost bit in the feature mask. However, this assumption is
not architecturally guaranteed -- higher-numbered features may have lower
offsets.
With the introduction of the xfeature order table and its helper macro,
xstate components can now be traversed in their positional order. Update
the non-compacted format handling to iterate through the table to
determine the last-positioned feature. Then, set the offset accordingly.
Since size calculation primarily occurs during initialization or in
non-critical paths, looping to find the last feature is not expected to
have a meaningful performance impact.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
arch/x86/kernel/fpu/xstate.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 1e22103a8e17..93f94013b094 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -581,13 +581,20 @@ static bool __init check_xstate_against_struct(int nr)
static unsigned int xstate_calculate_size(u64 xfeatures, bool compacted)
{
unsigned int topmost = fls64(xfeatures) - 1;
- unsigned int offset = xstate_offsets[topmost];
+ unsigned int offset, i;
if (topmost <= XFEATURE_SSE)
return sizeof(struct xregs_state);
- if (compacted)
+ if (compacted) {
offset = xfeature_get_offset(xfeatures, topmost);
+ } else {
+ /* Walk through the xfeature order to pick the last */
+ for_each_extended_xfeature_in_order(i, xfeatures)
+ topmost = xfeature_uncompact_order[i];
+ offset = xstate_offsets[topmost];
+ }
+
return offset + xstate_sizes[topmost];
}
--
2.45.2
^ permalink raw reply related [flat|nested] 112+ messages in thread
* [PATCH RFC v2 4/9] x86/fpu/xstate: Adjust xstate copying logic for user ABI
2025-03-20 23:42 ` [PATCH RFC v2 0/9] " Chang S. Bae
` (2 preceding siblings ...)
2025-03-20 23:42 ` [PATCH RFC v2 3/9] x86/fpu/xstate: Adjust XSAVE buffer size calculation Chang S. Bae
@ 2025-03-20 23:42 ` Chang S. Bae
2025-03-25 10:18 ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
` (2 more replies)
2025-03-20 23:42 ` [PATCH RFC v2 5/9] x86/cpufeatures: Add X86_FEATURE_APX Chang S. Bae
` (5 subsequent siblings)
9 siblings, 3 replies; 112+ messages in thread
From: Chang S. Bae @ 2025-03-20 23:42 UTC (permalink / raw)
To: linux-kernel; +Cc: x86, tglx, mingo, bp, dave.hansen, chang.seok.bae
== Background ==
As feature positions in the userspace XSAVE buffer do not always align
with their feature numbers, the XSAVE format conversion needs to be
reconsidered to align with the revised xstate size calculation logic.
* For signal handling, XSAVE and XRSTOR are used directly to save and
restore extended registers.
* For ptrace, KVM, and signal returns (for 32-bit frame), the kernel
copies data between its internal buffer and the userspace XSAVE buffer.
If memcpy() were used for these cases, existing offset helpers — such
as __raw_xsave_addr() or xstate_offsets[] — would be sufficient to
handle the format conversion.
== Problem ==
When copying data from the compacted in-kernel buffer to the
non-compacted userspace buffer, the function follows the
user_regset_get2_fn() prototype. This means it utilizes struct membuf
helpers for the destination buffer. As defined in regset.h, these helpers
update the memory pointer during the copy process, enforcing sequential
writes within the loop.
Since xstate components are processed sequentially, any component whose
buffer position does not align with its feature number has an issue.
== Solution ==
Replace for_each_extended_xfeature() with the newly introduced
for_each_extended_xfeature_in_order(). This macro ensures xstate
components are handled in the correct order based on their actual
positions in the destination buffer, rather than their feature numbers.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
arch/x86/kernel/fpu/xstate.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 93f94013b094..46c45e2f2a5a 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1107,10 +1107,9 @@ void __copy_xstate_to_uabi_buf(struct membuf to, struct fpstate *fpstate,
const unsigned int off_mxcsr = offsetof(struct fxregs_state, mxcsr);
struct xregs_state *xinit = &init_fpstate.regs.xsave;
struct xregs_state *xsave = &fpstate->regs.xsave;
+ unsigned int zerofrom, i, xfeature;
struct xstate_header header;
- unsigned int zerofrom;
u64 mask;
- int i;
memset(&header, 0, sizeof(header));
header.xfeatures = xsave->header.xfeatures;
@@ -1179,15 +1178,16 @@ void __copy_xstate_to_uabi_buf(struct membuf to, struct fpstate *fpstate,
*/
mask = header.xfeatures;
- for_each_extended_xfeature(i, mask) {
+ for_each_extended_xfeature_in_order(i, mask) {
+ xfeature = xfeature_uncompact_order[i];
/*
* If there was a feature or alignment gap, zero the space
* in the destination buffer.
*/
- if (zerofrom < xstate_offsets[i])
- membuf_zero(&to, xstate_offsets[i] - zerofrom);
+ if (zerofrom < xstate_offsets[xfeature])
+ membuf_zero(&to, xstate_offsets[xfeature] - zerofrom);
- if (i == XFEATURE_PKRU) {
+ if (xfeature == XFEATURE_PKRU) {
struct pkru_state pkru = {0};
/*
* PKRU is not necessarily up to date in the
@@ -1197,14 +1197,14 @@ void __copy_xstate_to_uabi_buf(struct membuf to, struct fpstate *fpstate,
membuf_write(&to, &pkru, sizeof(pkru));
} else {
membuf_write(&to,
- __raw_xsave_addr(xsave, i),
- xstate_sizes[i]);
+ __raw_xsave_addr(xsave, xfeature),
+ xstate_sizes[xfeature]);
}
/*
* Keep track of the last copied state in the non-compacted
* target buffer for gap zeroing.
*/
- zerofrom = xstate_offsets[i] + xstate_sizes[i];
+ zerofrom = xstate_offsets[xfeature] + xstate_sizes[xfeature];
}
out:
--
2.45.2
^ permalink raw reply related [flat|nested] 112+ messages in thread
* [PATCH RFC v2 5/9] x86/cpufeatures: Add X86_FEATURE_APX
2025-03-20 23:42 ` [PATCH RFC v2 0/9] " Chang S. Bae
` (3 preceding siblings ...)
2025-03-20 23:42 ` [PATCH RFC v2 4/9] x86/fpu/xstate: Adjust xstate copying logic for user ABI Chang S. Bae
@ 2025-03-20 23:42 ` Chang S. Bae
2025-04-11 16:12 ` [PATCH RFC v2a " Chang S. Bae
2025-03-20 23:42 ` [PATCH RFC v2 6/9] x86/fpu/apx: Define APX state component Chang S. Bae
` (4 subsequent siblings)
9 siblings, 1 reply; 112+ messages in thread
From: Chang S. Bae @ 2025-03-20 23:42 UTC (permalink / raw)
To: linux-kernel; +Cc: x86, tglx, mingo, bp, dave.hansen, chang.seok.bae
Intel Advanced Performance Extensions (APX) introduce a new set of
general-purpose registers, managed as an extended state component via the
xstate management facility.
Before enabling this new xstate, define a feature flag to clarify the
dependency in xsave_cpuid_features[]. APX is enumerated under CPUID level
7 with EDX=1. Since this CPUID leaf is not yet allocated, place the flag
in a scattered feature word.
While this feature is intended only for userspace, exposing it via
/proc/cpuinfo is unnecessary. Instead, the existing arch_prctl(2)
mechanism with the ARCH_GET_XCOMP_SUPP option can be used to query the
feature availability.
Finally, clarify that APX depends on XSAVE.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
Allocating a new feature word for this bit seems excessive at this stage,
given that no other imminent features are quite known.
---
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/kernel/cpu/cpuid-deps.c | 1 +
arch/x86/kernel/cpu/scattered.c | 1 +
3 files changed, 3 insertions(+)
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 8b7cf13e0acb..51178d4a6308 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -479,6 +479,7 @@
#define X86_FEATURE_AMD_FAST_CPPC (21*32 + 5) /* Fast CPPC */
#define X86_FEATURE_AMD_HETEROGENEOUS_CORES (21*32 + 6) /* Heterogeneous Core Topology */
#define X86_FEATURE_AMD_WORKLOAD_CLASS (21*32 + 7) /* Workload Classification */
+#define X86_FEATURE_APX (21*32 + 8) /* Advanced Performance Extensions */
/*
* BUG word(s)
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index a2fbea0be535..d5e5013e0e9f 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -84,6 +84,7 @@ static const struct cpuid_dep cpuid_deps[] = {
{ X86_FEATURE_AMX_TILE, X86_FEATURE_XFD },
{ X86_FEATURE_SHSTK, X86_FEATURE_XSAVES },
{ X86_FEATURE_FRED, X86_FEATURE_LKGS },
+ { X86_FEATURE_APX, X86_FEATURE_XSAVE },
{}
};
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index 16f3ca30626a..6c40d5af8479 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -54,6 +54,7 @@ static const struct cpuid_bit cpuid_bits[] = {
{ X86_FEATURE_AMD_LBR_V2, CPUID_EAX, 1, 0x80000022, 0 },
{ X86_FEATURE_AMD_LBR_PMC_FREEZE, CPUID_EAX, 2, 0x80000022, 0 },
{ X86_FEATURE_AMD_HETEROGENEOUS_CORES, CPUID_EAX, 30, 0x80000026, 0 },
+ { X86_FEATURE_APX, CPUID_EDX, 21, 0x00000007, 1 },
{ 0, 0, 0, 0, 0 }
};
--
2.45.2
^ permalink raw reply related [flat|nested] 112+ messages in thread
* [PATCH RFC v2 6/9] x86/fpu/apx: Define APX state component
2025-03-20 23:42 ` [PATCH RFC v2 0/9] " Chang S. Bae
` (4 preceding siblings ...)
2025-03-20 23:42 ` [PATCH RFC v2 5/9] x86/cpufeatures: Add X86_FEATURE_APX Chang S. Bae
@ 2025-03-20 23:42 ` Chang S. Bae
2025-04-11 22:54 ` Sohil Mehta
2025-03-20 23:42 ` [PATCH RFC v2 7/9] x86/fpu/apx: Disallow conflicting MPX presence Chang S. Bae
` (3 subsequent siblings)
9 siblings, 1 reply; 112+ messages in thread
From: Chang S. Bae @ 2025-03-20 23:42 UTC (permalink / raw)
To: linux-kernel; +Cc: x86, tglx, mingo, bp, dave.hansen, chang.seok.bae
The Advanced Performance Extensions (APX) feature flag was previously
defined. This feature is associated with a new state component number 19.
To support APX, it is essential to define this xstate component and
implement the necessary sanity checks.
Define the new component number, state name, and those register data
type. Then, extend the size checker to validate the register data type
and explicitly set the APX feature flag as a dependency for the new
component in xsave_cpuid_features[].
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
RFC-V1 -> RFC-V2:
* Remove the ordering table change, as it is now dynamically populated.
---
arch/x86/include/asm/fpu/types.h | 9 +++++++++
arch/x86/kernel/fpu/xstate.c | 3 +++
2 files changed, 12 insertions(+)
diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index de16862bf230..97310df3ea13 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -125,6 +125,7 @@ enum xfeature {
XFEATURE_RSRVD_COMP_16,
XFEATURE_XTILE_CFG,
XFEATURE_XTILE_DATA,
+ XFEATURE_APX,
XFEATURE_MAX,
};
@@ -145,6 +146,7 @@ enum xfeature {
#define XFEATURE_MASK_LBR (1 << XFEATURE_LBR)
#define XFEATURE_MASK_XTILE_CFG (1 << XFEATURE_XTILE_CFG)
#define XFEATURE_MASK_XTILE_DATA (1 << XFEATURE_XTILE_DATA)
+#define XFEATURE_MASK_APX (1 << XFEATURE_APX)
#define XFEATURE_MASK_FPSSE (XFEATURE_MASK_FP | XFEATURE_MASK_SSE)
#define XFEATURE_MASK_AVX512 (XFEATURE_MASK_OPMASK \
@@ -303,6 +305,13 @@ struct xtile_data {
struct reg_1024_byte tmm;
} __packed;
+/*
+ * State component 19: 8B extended general purpose register.
+ */
+struct apx_state {
+ u64 egpr[16];
+} __packed;
+
/*
* State component 10 is supervisor state used for context-switching the
* PASID state.
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 46c45e2f2a5a..2a270683a762 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -63,6 +63,7 @@ static const char *xfeature_names[] =
"unknown xstate feature",
"AMX Tile config",
"AMX Tile data",
+ "APX registers",
"unknown xstate feature",
};
@@ -81,6 +82,7 @@ static unsigned short xsave_cpuid_features[] __initdata = {
[XFEATURE_CET_USER] = X86_FEATURE_SHSTK,
[XFEATURE_XTILE_CFG] = X86_FEATURE_AMX_TILE,
[XFEATURE_XTILE_DATA] = X86_FEATURE_AMX_TILE,
+ [XFEATURE_APX] = X86_FEATURE_APX,
};
static unsigned int xstate_offsets[XFEATURE_MAX] __ro_after_init =
@@ -570,6 +572,7 @@ static bool __init check_xstate_against_struct(int nr)
case XFEATURE_XTILE_CFG: return XCHECK_SZ(sz, nr, struct xtile_cfg);
case XFEATURE_CET_USER: return XCHECK_SZ(sz, nr, struct cet_user_state);
case XFEATURE_XTILE_DATA: check_xtile_data_against_struct(sz); return true;
+ case XFEATURE_APX: return XCHECK_SZ(sz, nr, struct apx_state);
default:
XSTATE_WARN_ON(1, "No structure for xstate: %d\n", nr);
return false;
--
2.45.2
^ permalink raw reply related [flat|nested] 112+ messages in thread
* [PATCH RFC v2 7/9] x86/fpu/apx: Disallow conflicting MPX presence
2025-03-20 23:42 ` [PATCH RFC v2 0/9] " Chang S. Bae
` (5 preceding siblings ...)
2025-03-20 23:42 ` [PATCH RFC v2 6/9] x86/fpu/apx: Define APX state component Chang S. Bae
@ 2025-03-20 23:42 ` Chang S. Bae
2025-04-14 17:09 ` Sohil Mehta
2025-03-20 23:42 ` [PATCH RFC v2 8/9] x86/fpu/apx: Enable APX state support Chang S. Bae
` (2 subsequent siblings)
9 siblings, 1 reply; 112+ messages in thread
From: Chang S. Bae @ 2025-03-20 23:42 UTC (permalink / raw)
To: linux-kernel; +Cc: x86, tglx, mingo, bp, dave.hansen, chang.seok.bae
APX is introduced as xstate component 19, following AMX. However, in the
non-compacted format, its offset overlaps with the space previously
occupied by the now-deprecated MPX:
45fc24e89b7c ("x86/mpx: remove MPX from arch/x86")
To prevent conflicts, the kernel must ensure the CPU never expose both
features at the same time. If so, it indicates unreliable hardware. In
such cases, XSAVE should be disabled entirely as a precautionary measure.
Add a sanity check to detect this condition and disable XSAVE if an
invalid hardware configuration is identified.
Note: MPX state components remain enabled on legacy systems solely for
KVM guest support.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
arch/x86/kernel/fpu/xstate.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 2a270683a762..0d68d5c4bc48 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -814,6 +814,17 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
goto out_disable;
}
+ if (fpu_kernel_cfg.max_features & XFEATURE_MASK_APX &&
+ fpu_kernel_cfg.max_features & (XFEATURE_MASK_BNDREGS | XFEATURE_MASK_BNDCSR)) {
+ /*
+ * This is a problematic CPU configuration where two
+ * conflicting state components are both enumerated.
+ */
+ pr_err("x86/fpu: both APX and MPX present in the CPU's xstate features: 0x%llx.\n",
+ fpu_kernel_cfg.max_features);
+ goto out_disable;
+ }
+
fpu_kernel_cfg.independent_features = fpu_kernel_cfg.max_features &
XFEATURE_MASK_INDEPENDENT;
--
2.45.2
^ permalink raw reply related [flat|nested] 112+ messages in thread
* [PATCH RFC v2 8/9] x86/fpu/apx: Enable APX state support
2025-03-20 23:42 ` [PATCH RFC v2 0/9] " Chang S. Bae
` (6 preceding siblings ...)
2025-03-20 23:42 ` [PATCH RFC v2 7/9] x86/fpu/apx: Disallow conflicting MPX presence Chang S. Bae
@ 2025-03-20 23:42 ` Chang S. Bae
2025-04-14 15:41 ` Sohil Mehta
2025-03-20 23:43 ` [PATCH RFC v2 9/9] selftests/x86/apx: Add APX test Chang S. Bae
2025-03-25 10:25 ` [PATCH RFC v2 0/9] x86: Support Intel Advanced Performance Extensions Ingo Molnar
9 siblings, 1 reply; 112+ messages in thread
From: Chang S. Bae @ 2025-03-20 23:42 UTC (permalink / raw)
To: linux-kernel; +Cc: x86, tglx, mingo, bp, dave.hansen, chang.seok.bae
With securing APX against conflicting MPX, it is now ready to be enabled.
Include APX in the enabled xfeature set.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
arch/x86/include/asm/fpu/xstate.h | 3 ++-
arch/x86/kernel/fpu/xstate.c | 3 ++-
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index 7f39fe7980c5..b308a76afbb7 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -32,7 +32,8 @@
XFEATURE_MASK_PKRU | \
XFEATURE_MASK_BNDREGS | \
XFEATURE_MASK_BNDCSR | \
- XFEATURE_MASK_XTILE)
+ XFEATURE_MASK_XTILE | \
+ XFEATURE_MASK_APX)
/*
* Features which are restored when returning to user space.
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 0d68d5c4bc48..a5e3954dc834 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -371,7 +371,8 @@ static __init void os_xrstor_booting(struct xregs_state *xstate)
XFEATURE_MASK_BNDCSR | \
XFEATURE_MASK_PASID | \
XFEATURE_MASK_CET_USER | \
- XFEATURE_MASK_XTILE)
+ XFEATURE_MASK_XTILE | \
+ XFEATURE_MASK_APX)
/*
* setup the xstate image representing the init state
--
2.45.2
^ permalink raw reply related [flat|nested] 112+ messages in thread
* [PATCH RFC v2 9/9] selftests/x86/apx: Add APX test
2025-03-20 23:42 ` [PATCH RFC v2 0/9] " Chang S. Bae
` (7 preceding siblings ...)
2025-03-20 23:42 ` [PATCH RFC v2 8/9] x86/fpu/apx: Enable APX state support Chang S. Bae
@ 2025-03-20 23:43 ` Chang S. Bae
2025-04-14 16:02 ` Sohil Mehta
2025-03-25 10:25 ` [PATCH RFC v2 0/9] x86: Support Intel Advanced Performance Extensions Ingo Molnar
9 siblings, 1 reply; 112+ messages in thread
From: Chang S. Bae @ 2025-03-20 23:43 UTC (permalink / raw)
To: linux-kernel; +Cc: x86, tglx, mingo, bp, dave.hansen, chang.seok.bae
The extended general-purpose registers for APX may contain random data,
which is currently assumed by the xstate testing framework. This allows
the testing of the new userspace feature using the common test code.
Invoke the test entry function from apx.c after enumerating the
state component and adding it to the support list
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
The patch depends on the selftest xstate series, which was just merged
into the x86/fpu branch as of now:
https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/log/?h=x86/fpu
Here is the original series posting for the record:
https://lore.kernel.org/lkml/20250226010731.2456-1-chang.seok.bae@intel.com/
---
tools/testing/selftests/x86/Makefile | 3 ++-
tools/testing/selftests/x86/apx.c | 10 ++++++++++
tools/testing/selftests/x86/xstate.c | 3 ++-
tools/testing/selftests/x86/xstate.h | 2 ++
4 files changed, 16 insertions(+), 2 deletions(-)
create mode 100644 tools/testing/selftests/x86/apx.c
diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests/x86/Makefile
index 28422c32cc8f..f703fcfe9f7c 100644
--- a/tools/testing/selftests/x86/Makefile
+++ b/tools/testing/selftests/x86/Makefile
@@ -19,7 +19,7 @@ TARGETS_C_32BIT_ONLY := entry_from_vm86 test_syscall_vdso unwind_vdso \
test_FCMOV test_FCOMI test_FISTTP \
vdso_restorer
TARGETS_C_64BIT_ONLY := fsgsbase sysret_rip syscall_numbering \
- corrupt_xstate_header amx lam test_shadow_stack avx
+ corrupt_xstate_header amx lam test_shadow_stack avx apx
# Some selftests require 32bit support enabled also on 64bit systems
TARGETS_C_32BIT_NEEDED := ldt_gdt ptrace_syscall
@@ -136,3 +136,4 @@ $(OUTPUT)/nx_stack_64: CFLAGS += -Wl,-z,noexecstack
$(OUTPUT)/avx_64: CFLAGS += -mno-avx -mno-avx512f
$(OUTPUT)/amx_64: EXTRA_FILES += xstate.c
$(OUTPUT)/avx_64: EXTRA_FILES += xstate.c
+$(OUTPUT)/apx_64: EXTRA_FILES += xstate.c
diff --git a/tools/testing/selftests/x86/apx.c b/tools/testing/selftests/x86/apx.c
new file mode 100644
index 000000000000..d9c8d41b8c5a
--- /dev/null
+++ b/tools/testing/selftests/x86/apx.c
@@ -0,0 +1,10 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define _GNU_SOURCE
+
+#include "xstate.h"
+
+int main(void)
+{
+ test_xstate(XFEATURE_APX);
+}
diff --git a/tools/testing/selftests/x86/xstate.c b/tools/testing/selftests/x86/xstate.c
index 23c1d6c964ea..97fe4bd8bc77 100644
--- a/tools/testing/selftests/x86/xstate.c
+++ b/tools/testing/selftests/x86/xstate.c
@@ -31,7 +31,8 @@
(1 << XFEATURE_OPMASK) | \
(1 << XFEATURE_ZMM_Hi256) | \
(1 << XFEATURE_Hi16_ZMM) | \
- (1 << XFEATURE_XTILEDATA))
+ (1 << XFEATURE_XTILEDATA) | \
+ (1 << XFEATURE_APX))
static inline uint64_t xgetbv(uint32_t index)
{
diff --git a/tools/testing/selftests/x86/xstate.h b/tools/testing/selftests/x86/xstate.h
index 42af36ec852f..e91e3092b5d2 100644
--- a/tools/testing/selftests/x86/xstate.h
+++ b/tools/testing/selftests/x86/xstate.h
@@ -33,6 +33,7 @@ enum xfeature {
XFEATURE_RSRVD_COMP_16,
XFEATURE_XTILECFG,
XFEATURE_XTILEDATA,
+ XFEATURE_APX,
XFEATURE_MAX,
};
@@ -59,6 +60,7 @@ static const char *xfeature_names[] =
"unknown xstate feature",
"AMX Tile config",
"AMX Tile data",
+ "APX registers",
"unknown xstate feature",
};
--
2.45.2
^ permalink raw reply related [flat|nested] 112+ messages in thread
* Re: [PATCH RFC v1 02/11] x86/fpu/xstate: Introduce xstate order table and accessor macro
2025-02-27 19:03 ` Dave Hansen
2025-02-27 19:34 ` Andrew Cooper
@ 2025-03-20 23:46 ` Chang S. Bae
1 sibling, 0 replies; 112+ messages in thread
From: Chang S. Bae @ 2025-03-20 23:46 UTC (permalink / raw)
To: Dave Hansen, linux-kernel; +Cc: x86, tglx, mingo, bp, dave.hansen
On 2/27/2025 11:03 AM, Dave Hansen wrote:
>
> Two, dynamically generate it. I suspect that once you have the
> uncompacted offset read out of CPUID, you could generate this in fewer
> lines of code than it takes to define it manually. I think it would also
> not require adding new features manually.
Yeah, you also suggested using sort(), thanks! Just posted a revision of
this patch here:
https://lore.kernel.org/lkml/20250320234301.8342-3-chang.seok.bae@intel.com/
Thanks,
Chang
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [PATCH RFC v1 06/11] x86/fpu/mpx: Remove MPX xstate component support
2025-02-28 3:12 ` Chang S. Bae
@ 2025-03-20 23:47 ` Chang S. Bae
2025-04-01 17:13 ` Chang S. Bae
0 siblings, 1 reply; 112+ messages in thread
From: Chang S. Bae @ 2025-03-20 23:47 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel, x86, tglx, mingo, bp, dave.hansen
On 2/27/2025 7:12 PM, Chang S. Bae wrote:
> On 2/27/2025 11:12 AM, Ingo Molnar wrote:
>>
>> So can this patch be moved further up in the series (without breaking
>> anything), to make it easier to review the impact of the APX changes?
>
> Will do it.
If the xfeature_noncompact_order[] table is dynamically populated at
boot time, the kernel should be able to maintain the order list flexibly
enough without dropping MPX bits.
So, I posted V2 [1] with this patch removed -- after observing MPX being
exposed to a KVM guest when running this version on an Skylake machine
and confirming that APX support passed self-tests on an APX-emulated system.
I also considered handling MPX removal up to the KVM code in a separate
series, which would eliminate about 220 lines. However, given previous
discussions like [2], posting that cleanup now might appear to disregard
those concerns. Perhaps it’s something to revisit in five years or more.
In the meantime, kernel memory usage could be improved by treating MPX
as VCPU-only, aligning with the ongoing supervisor CET state handling
work [3].
Thanks,
Chang
[1]
https://lore.kernel.org/lkml/20250320234301.8342-1-chang.seok.bae@intel.com/
[2]
https://lore.kernel.org/lkml/CALMp9eTD+GeiKK9E5_JUOy7YXPTq9z2f2LcyXZL5ypQ6vBHrHg@mail.gmail.com/
[3]
https://lore.kernel.org/lkml/20250318153316.1970147-1-chao.gao@intel.com/
^ permalink raw reply [flat|nested] 112+ messages in thread
* [tip: x86/fpu] x86/fpu/xstate: Adjust xstate copying logic for user ABI
2025-03-20 23:42 ` [PATCH RFC v2 4/9] x86/fpu/xstate: Adjust xstate copying logic for user ABI Chang S. Bae
@ 2025-03-25 10:18 ` tip-bot2 for Chang S. Bae
2025-03-25 10:28 ` tip-bot2 for Chang S. Bae
2025-04-14 7:34 ` [tip: x86/merge] " tip-bot2 for Chang S. Bae
2 siblings, 0 replies; 112+ messages in thread
From: tip-bot2 for Chang S. Bae @ 2025-03-25 10:18 UTC (permalink / raw)
To: linux-tip-commits
Cc: Chang S. Bae, Ingo Molnar, Andy Lutomirski, H. Peter Anvin,
Linus Torvalds, Oleg Nesterov, x86, linux-kernel
The following commit has been merged into the x86/fpu branch of tip:
Commit-ID: 8693f035b9d6814082aed41234f3459560ce81b2
Gitweb: https://git.kernel.org/tip/8693f035b9d6814082aed41234f3459560ce81b2
Author: Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate: Thu, 20 Mar 2025 16:42:55 -07:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Tue, 25 Mar 2025 11:02:21 +01:00
x86/fpu/xstate: Adjust xstate copying logic for user ABI
== Background ==
As feature positions in the userspace XSAVE buffer do not always align
with their feature numbers, the XSAVE format conversion needs to be
reconsidered to align with the revised xstate size calculation logic.
* For signal handling, XSAVE and XRSTOR are used directly to save and
restore extended registers.
* For ptrace, KVM, and signal returns (for 32-bit frame), the kernel
copies data between its internal buffer and the userspace XSAVE buffer.
If memcpy() were used for these cases, existing offset helpers — such
as __raw_xsave_addr() or xstate_offsets[] — would be sufficient to
handle the format conversion.
== Problem ==
When copying data from the compacted in-kernel buffer to the
non-compacted userspace buffer, the function follows the
user_regset_get2_fn() prototype. This means it utilizes struct membuf
helpers for the destination buffer. As defined in regset.h, these helpers
update the memory pointer during the copy process, enforcing sequential
writes within the loop.
Since xstate components are processed sequentially, any component whose
buffer position does not align with its feature number has an issue.
== Solution ==
Replace for_each_extended_xfeature() with the newly introduced
for_each_extended_xfeature_in_order(). This macro ensures xstate
components are handled in the correct order based on their actual
positions in the destination buffer, rather than their feature numbers.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/r/20250320234301.8342-5-chang.seok.bae@intel.com
---
arch/x86/kernel/fpu/xstate.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 1e22103..61b3a49 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1100,10 +1100,9 @@ void __copy_xstate_to_uabi_buf(struct membuf to, struct fpstate *fpstate,
const unsigned int off_mxcsr = offsetof(struct fxregs_state, mxcsr);
struct xregs_state *xinit = &init_fpstate.regs.xsave;
struct xregs_state *xsave = &fpstate->regs.xsave;
+ unsigned int zerofrom, i, xfeature;
struct xstate_header header;
- unsigned int zerofrom;
u64 mask;
- int i;
memset(&header, 0, sizeof(header));
header.xfeatures = xsave->header.xfeatures;
@@ -1172,15 +1171,16 @@ void __copy_xstate_to_uabi_buf(struct membuf to, struct fpstate *fpstate,
*/
mask = header.xfeatures;
- for_each_extended_xfeature(i, mask) {
+ for_each_extended_xfeature_in_order(i, mask) {
+ xfeature = xfeature_uncompact_order[i];
/*
* If there was a feature or alignment gap, zero the space
* in the destination buffer.
*/
- if (zerofrom < xstate_offsets[i])
- membuf_zero(&to, xstate_offsets[i] - zerofrom);
+ if (zerofrom < xstate_offsets[xfeature])
+ membuf_zero(&to, xstate_offsets[xfeature] - zerofrom);
- if (i == XFEATURE_PKRU) {
+ if (xfeature == XFEATURE_PKRU) {
struct pkru_state pkru = {0};
/*
* PKRU is not necessarily up to date in the
@@ -1190,14 +1190,14 @@ void __copy_xstate_to_uabi_buf(struct membuf to, struct fpstate *fpstate,
membuf_write(&to, &pkru, sizeof(pkru));
} else {
membuf_write(&to,
- __raw_xsave_addr(xsave, i),
- xstate_sizes[i]);
+ __raw_xsave_addr(xsave, xfeature),
+ xstate_sizes[xfeature]);
}
/*
* Keep track of the last copied state in the non-compacted
* target buffer for gap zeroing.
*/
- zerofrom = xstate_offsets[i] + xstate_sizes[i];
+ zerofrom = xstate_offsets[xfeature] + xstate_sizes[xfeature];
}
out:
^ permalink raw reply related [flat|nested] 112+ messages in thread
* [tip: x86/fpu] x86/fpu/xstate: Introduce xfeature order table and accessor macro
2025-03-20 23:42 ` [PATCH RFC v2 2/9] x86/fpu/xstate: Introduce xfeature order table and accessor macro Chang S. Bae
@ 2025-03-25 10:18 ` tip-bot2 for Chang S. Bae
2025-04-14 7:34 ` [tip: x86/merge] " tip-bot2 for Chang S. Bae
1 sibling, 0 replies; 112+ messages in thread
From: tip-bot2 for Chang S. Bae @ 2025-03-25 10:18 UTC (permalink / raw)
To: linux-tip-commits
Cc: Dave Hansen, Chang S. Bae, Ingo Molnar, Andy Lutomirski,
H. Peter Anvin, Linus Torvalds, Oleg Nesterov, x86, linux-kernel
The following commit has been merged into the x86/fpu branch of tip:
Commit-ID: b03c328e9711b4af8e4a433a1c4dad38f2e160a2
Gitweb: https://git.kernel.org/tip/b03c328e9711b4af8e4a433a1c4dad38f2e160a2
Author: Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate: Thu, 20 Mar 2025 16:42:53 -07:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Tue, 25 Mar 2025 11:02:20 +01:00
x86/fpu/xstate: Introduce xfeature order table and accessor macro
The kernel has largely assumed that higher xstate component numbers
correspond to later offsets in the buffer. However, this assumption no
longer holds for the non-compacted format, where a newer state component
may have a lower offset.
When iterating over xstate components in offset order, using the feature
number as an index may be misleading. At the same time, the CPU exposes
each component’s size and offset based on its feature number, making it a
key for state information.
To provide flexibility in handling xstate ordering, introduce a mapping
table: feature order -> feature number. The table is dynamically
populated based on the CPU-exposed features and is sorted in offset order
at boot time.
Additionally, add an accessor macro to facilitate sequential traversal of
xstate components based on their actual buffer positions, given a feature
bitmask. This accessor macro will be particularly useful for computing
custom non-compacted format sizes and iterating over xstate offsets in
non-compacted buffers.
Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/r/20250320234301.8342-3-chang.seok.bae@intel.com
---
arch/x86/kernel/fpu/xstate.c | 58 ++++++++++++++++++++++++++++++-----
1 file changed, 50 insertions(+), 8 deletions(-)
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 542c698..1e22103 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -14,6 +14,7 @@
#include <linux/proc_fs.h>
#include <linux/vmalloc.h>
#include <linux/coredump.h>
+#include <linux/sort.h>
#include <asm/fpu/api.h>
#include <asm/fpu/regset.h>
@@ -88,6 +89,31 @@ static unsigned int xstate_sizes[XFEATURE_MAX] __ro_after_init =
{ [ 0 ... XFEATURE_MAX - 1] = -1};
static unsigned int xstate_flags[XFEATURE_MAX] __ro_after_init;
+/*
+ * Ordering of xstate components in uncompacted format: The xfeature
+ * number does not necessarily indicate its position in the XSAVE buffer.
+ * This array defines the traversal order of xstate features.
+ */
+static unsigned int xfeature_uncompact_order[XFEATURE_MAX] __ro_after_init =
+ { [ 0 ... XFEATURE_MAX - 1] = -1};
+
+static inline unsigned int next_xfeature_order(unsigned int i, u64 mask)
+{
+ for (; xfeature_uncompact_order[i] != -1; i++) {
+ if (mask & BIT_ULL(xfeature_uncompact_order[i]))
+ break;
+ }
+
+ return i;
+}
+
+/* Iterate xstate features in uncompacted order: */
+#define for_each_extended_xfeature_in_order(i, mask) \
+ for (i = 0; \
+ i = next_xfeature_order(i, mask), \
+ xfeature_uncompact_order[i] != -1; \
+ i++)
+
#define XSTATE_FLAG_SUPERVISOR BIT(0)
#define XSTATE_FLAG_ALIGNED64 BIT(1)
@@ -209,13 +235,20 @@ static bool xfeature_enabled(enum xfeature xfeature)
return fpu_kernel_cfg.max_features & BIT_ULL(xfeature);
}
+static int compare_xstate_offsets(const void *xfeature1, const void *xfeature2)
+{
+ return xstate_offsets[*(unsigned int *)xfeature1] -
+ xstate_offsets[*(unsigned int *)xfeature2];
+}
+
/*
* Record the offsets and sizes of various xstates contained
- * in the XSAVE state memory layout.
+ * in the XSAVE state memory layout. Also, create an ordered
+ * list of xfeatures for handling out-of-order offsets.
*/
static void __init setup_xstate_cache(void)
{
- u32 eax, ebx, ecx, edx, i;
+ u32 eax, ebx, ecx, edx, xfeature, i = 0;
/*
* The FP xstates and SSE xstates are legacy states. They are always
* in the fixed offsets in the xsave area in either compacted form
@@ -229,21 +262,30 @@ static void __init setup_xstate_cache(void)
xstate_sizes[XFEATURE_SSE] = sizeof_field(struct fxregs_state,
xmm_space);
- for_each_extended_xfeature(i, fpu_kernel_cfg.max_features) {
- cpuid_count(CPUID_LEAF_XSTATE, i, &eax, &ebx, &ecx, &edx);
+ for_each_extended_xfeature(xfeature, fpu_kernel_cfg.max_features) {
+ cpuid_count(CPUID_LEAF_XSTATE, xfeature, &eax, &ebx, &ecx, &edx);
- xstate_sizes[i] = eax;
- xstate_flags[i] = ecx;
+ xstate_sizes[xfeature] = eax;
+ xstate_flags[xfeature] = ecx;
/*
* If an xfeature is supervisor state, the offset in EBX is
* invalid, leave it to -1.
*/
- if (xfeature_is_supervisor(i))
+ if (xfeature_is_supervisor(xfeature))
continue;
- xstate_offsets[i] = ebx;
+ xstate_offsets[xfeature] = ebx;
+
+ /* Populate the list of xfeatures before sorting */
+ xfeature_uncompact_order[i++] = xfeature;
}
+
+ /*
+ * Sort xfeatures by their offsets to support out-of-order
+ * offsets in the uncompacted format.
+ */
+ sort(xfeature_uncompact_order, i, sizeof(unsigned int), compare_xstate_offsets, NULL);
}
/*
^ permalink raw reply related [flat|nested] 112+ messages in thread
* [tip: x86/fpu] x86/fpu/xstate: Remove xstate offset check
2025-03-20 23:42 ` [PATCH RFC v2 1/9] x86/fpu/xstate: Remove xstate offset check Chang S. Bae
@ 2025-03-25 10:18 ` tip-bot2 for Chang S. Bae
2025-04-14 7:34 ` [tip: x86/merge] " tip-bot2 for Chang S. Bae
1 sibling, 0 replies; 112+ messages in thread
From: tip-bot2 for Chang S. Bae @ 2025-03-25 10:18 UTC (permalink / raw)
To: linux-tip-commits
Cc: Chang S. Bae, Ingo Molnar, Andy Lutomirski, H. Peter Anvin,
Linus Torvalds, Oleg Nesterov, x86, linux-kernel
The following commit has been merged into the x86/fpu branch of tip:
Commit-ID: 937a3c877f7c84103891a2319e24626bf3a4d8a5
Gitweb: https://git.kernel.org/tip/937a3c877f7c84103891a2319e24626bf3a4d8a5
Author: Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate: Thu, 20 Mar 2025 16:42:52 -07:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Tue, 25 Mar 2025 11:02:20 +01:00
x86/fpu/xstate: Remove xstate offset check
Traditionally, new xstate components have been assigned sequentially,
aligning feature numbers with their offsets in the XSAVE buffer. However,
this ordering is not architecturally mandated in the non-compacted
format, where a component's offset may not correspond to its feature
number.
The kernel caches CPUID-reported xstate component details, including size
and offset in the non-compacted format. As part of this process, a sanity
check is also conducted to ensure alignment between feature numbers and
offsets.
This check was likely intended as a general guideline rather than a
strict requirement. Upcoming changes will support out-of-order offsets.
Remove the check as becoming obsolete.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/r/20250320234301.8342-2-chang.seok.bae@intel.com
---
arch/x86/kernel/fpu/xstate.c | 13 -------------
1 file changed, 13 deletions(-)
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 6a41d16..542c698 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -216,9 +216,6 @@ static bool xfeature_enabled(enum xfeature xfeature)
static void __init setup_xstate_cache(void)
{
u32 eax, ebx, ecx, edx, i;
- /* start at the beginning of the "extended state" */
- unsigned int last_good_offset = offsetof(struct xregs_state,
- extended_state_area);
/*
* The FP xstates and SSE xstates are legacy states. They are always
* in the fixed offsets in the xsave area in either compacted form
@@ -246,16 +243,6 @@ static void __init setup_xstate_cache(void)
continue;
xstate_offsets[i] = ebx;
-
- /*
- * In our xstate size checks, we assume that the highest-numbered
- * xstate feature has the highest offset in the buffer. Ensure
- * it does.
- */
- WARN_ONCE(last_good_offset > xstate_offsets[i],
- "x86/fpu: misordered xstate at %d\n", last_good_offset);
-
- last_good_offset = xstate_offsets[i];
}
}
^ permalink raw reply related [flat|nested] 112+ messages in thread
* Re: [PATCH RFC v2 0/9] x86: Support Intel Advanced Performance Extensions
2025-03-20 23:42 ` [PATCH RFC v2 0/9] " Chang S. Bae
` (8 preceding siblings ...)
2025-03-20 23:43 ` [PATCH RFC v2 9/9] selftests/x86/apx: Add APX test Chang S. Bae
@ 2025-03-25 10:25 ` Ingo Molnar
9 siblings, 0 replies; 112+ messages in thread
From: Ingo Molnar @ 2025-03-25 10:25 UTC (permalink / raw)
To: Chang S. Bae; +Cc: linux-kernel, x86, tglx, mingo, bp, dave.hansen
* Chang S. Bae <chang.seok.bae@intel.com> wrote:
> Chang S. Bae (9):
> x86/fpu/xstate: Remove xstate offset check
> x86/fpu/xstate: Introduce xfeature order table and accessor macro
> x86/fpu/xstate: Adjust XSAVE buffer size calculation
> x86/fpu/xstate: Adjust xstate copying logic for user ABI
So I've applied these first 4 as preparatory patches to tip:x86/fpu,
let's get these tested a bit broadly.
> x86/cpufeatures: Add X86_FEATURE_APX
> x86/fpu/apx: Define APX state component
> x86/fpu/apx: Disallow conflicting MPX presence
> x86/fpu/apx: Enable APX state support
> selftests/x86/apx: Add APX test
Planning to apply these remaining 5 as well in a couple of days, if all
goes well with the others.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 112+ messages in thread
* [tip: x86/fpu] x86/fpu/xstate: Adjust xstate copying logic for user ABI
2025-03-20 23:42 ` [PATCH RFC v2 4/9] x86/fpu/xstate: Adjust xstate copying logic for user ABI Chang S. Bae
2025-03-25 10:18 ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
@ 2025-03-25 10:28 ` tip-bot2 for Chang S. Bae
2025-04-14 7:34 ` [tip: x86/merge] " tip-bot2 for Chang S. Bae
2 siblings, 0 replies; 112+ messages in thread
From: tip-bot2 for Chang S. Bae @ 2025-03-25 10:28 UTC (permalink / raw)
To: linux-tip-commits
Cc: Chang S. Bae, Ingo Molnar, Andy Lutomirski, H. Peter Anvin,
Linus Torvalds, Oleg Nesterov, x86, linux-kernel
The following commit has been merged into the x86/fpu branch of tip:
Commit-ID: ed91ad084ef9fa49f6c6dfef2cb10c12ce3786a6
Gitweb: https://git.kernel.org/tip/ed91ad084ef9fa49f6c6dfef2cb10c12ce3786a6
Author: Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate: Thu, 20 Mar 2025 16:42:55 -07:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Tue, 25 Mar 2025 11:21:20 +01:00
x86/fpu/xstate: Adjust xstate copying logic for user ABI
== Background ==
As feature positions in the userspace XSAVE buffer do not always align
with their feature numbers, the XSAVE format conversion needs to be
reconsidered to align with the revised xstate size calculation logic.
* For signal handling, XSAVE and XRSTOR are used directly to save and
restore extended registers.
* For ptrace, KVM, and signal returns (for 32-bit frame), the kernel
copies data between its internal buffer and the userspace XSAVE buffer.
If memcpy() were used for these cases, existing offset helpers — such
as __raw_xsave_addr() or xstate_offsets[] — would be sufficient to
handle the format conversion.
== Problem ==
When copying data from the compacted in-kernel buffer to the
non-compacted userspace buffer, the function follows the
user_regset_get2_fn() prototype. This means it utilizes struct membuf
helpers for the destination buffer. As defined in regset.h, these helpers
update the memory pointer during the copy process, enforcing sequential
writes within the loop.
Since xstate components are processed sequentially, any component whose
buffer position does not align with its feature number has an issue.
== Solution ==
Replace for_each_extended_xfeature() with the newly introduced
for_each_extended_xfeature_in_order(). This macro ensures xstate
components are handled in the correct order based on their actual
positions in the destination buffer, rather than their feature numbers.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/r/20250320234301.8342-5-chang.seok.bae@intel.com
---
arch/x86/kernel/fpu/xstate.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 93f9401..46c45e2 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1107,10 +1107,9 @@ void __copy_xstate_to_uabi_buf(struct membuf to, struct fpstate *fpstate,
const unsigned int off_mxcsr = offsetof(struct fxregs_state, mxcsr);
struct xregs_state *xinit = &init_fpstate.regs.xsave;
struct xregs_state *xsave = &fpstate->regs.xsave;
+ unsigned int zerofrom, i, xfeature;
struct xstate_header header;
- unsigned int zerofrom;
u64 mask;
- int i;
memset(&header, 0, sizeof(header));
header.xfeatures = xsave->header.xfeatures;
@@ -1179,15 +1178,16 @@ void __copy_xstate_to_uabi_buf(struct membuf to, struct fpstate *fpstate,
*/
mask = header.xfeatures;
- for_each_extended_xfeature(i, mask) {
+ for_each_extended_xfeature_in_order(i, mask) {
+ xfeature = xfeature_uncompact_order[i];
/*
* If there was a feature or alignment gap, zero the space
* in the destination buffer.
*/
- if (zerofrom < xstate_offsets[i])
- membuf_zero(&to, xstate_offsets[i] - zerofrom);
+ if (zerofrom < xstate_offsets[xfeature])
+ membuf_zero(&to, xstate_offsets[xfeature] - zerofrom);
- if (i == XFEATURE_PKRU) {
+ if (xfeature == XFEATURE_PKRU) {
struct pkru_state pkru = {0};
/*
* PKRU is not necessarily up to date in the
@@ -1197,14 +1197,14 @@ void __copy_xstate_to_uabi_buf(struct membuf to, struct fpstate *fpstate,
membuf_write(&to, &pkru, sizeof(pkru));
} else {
membuf_write(&to,
- __raw_xsave_addr(xsave, i),
- xstate_sizes[i]);
+ __raw_xsave_addr(xsave, xfeature),
+ xstate_sizes[xfeature]);
}
/*
* Keep track of the last copied state in the non-compacted
* target buffer for gap zeroing.
*/
- zerofrom = xstate_offsets[i] + xstate_sizes[i];
+ zerofrom = xstate_offsets[xfeature] + xstate_sizes[xfeature];
}
out:
^ permalink raw reply related [flat|nested] 112+ messages in thread
* [tip: x86/fpu] x86/fpu/xstate: Adjust XSAVE buffer size calculation
2025-03-20 23:42 ` [PATCH RFC v2 3/9] x86/fpu/xstate: Adjust XSAVE buffer size calculation Chang S. Bae
@ 2025-03-25 10:28 ` tip-bot2 for Chang S. Bae
2025-04-14 7:34 ` [tip: x86/merge] " tip-bot2 for Chang S. Bae
1 sibling, 0 replies; 112+ messages in thread
From: tip-bot2 for Chang S. Bae @ 2025-03-25 10:28 UTC (permalink / raw)
To: linux-tip-commits
Cc: Chang S. Bae, Ingo Molnar, Andy Lutomirski, H. Peter Anvin,
Linus Torvalds, Oleg Nesterov, x86, linux-kernel
The following commit has been merged into the x86/fpu branch of tip:
Commit-ID: a6842ee9b5be3223cdb0c8fee3f6d71c274f68ba
Gitweb: https://git.kernel.org/tip/a6842ee9b5be3223cdb0c8fee3f6d71c274f68ba
Author: Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate: Thu, 20 Mar 2025 16:42:54 -07:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Tue, 25 Mar 2025 11:21:20 +01:00
x86/fpu/xstate: Adjust XSAVE buffer size calculation
The current xstate size calculation assumes that the highest-numbered
xstate feature has the highest offset in the buffer, determining the size
based on the topmost bit in the feature mask. However, this assumption is
not architecturally guaranteed -- higher-numbered features may have lower
offsets.
With the introduction of the xfeature order table and its helper macro,
xstate components can now be traversed in their positional order. Update
the non-compacted format handling to iterate through the table to
determine the last-positioned feature. Then, set the offset accordingly.
Since size calculation primarily occurs during initialization or in
non-critical paths, looping to find the last feature is not expected to
have a meaningful performance impact.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/r/20250320234301.8342-4-chang.seok.bae@intel.com
---
arch/x86/kernel/fpu/xstate.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 1e22103..93f9401 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -581,13 +581,20 @@ static bool __init check_xstate_against_struct(int nr)
static unsigned int xstate_calculate_size(u64 xfeatures, bool compacted)
{
unsigned int topmost = fls64(xfeatures) - 1;
- unsigned int offset = xstate_offsets[topmost];
+ unsigned int offset, i;
if (topmost <= XFEATURE_SSE)
return sizeof(struct xregs_state);
- if (compacted)
+ if (compacted) {
offset = xfeature_get_offset(xfeatures, topmost);
+ } else {
+ /* Walk through the xfeature order to pick the last */
+ for_each_extended_xfeature_in_order(i, xfeatures)
+ topmost = xfeature_uncompact_order[i];
+ offset = xstate_offsets[topmost];
+ }
+
return offset + xstate_sizes[topmost];
}
^ permalink raw reply related [flat|nested] 112+ messages in thread
* Re: [PATCH RFC v1 06/11] x86/fpu/mpx: Remove MPX xstate component support
2025-03-20 23:47 ` Chang S. Bae
@ 2025-04-01 17:13 ` Chang S. Bae
0 siblings, 0 replies; 112+ messages in thread
From: Chang S. Bae @ 2025-04-01 17:13 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel, x86, tglx, mingo, bp, dave.hansen
On 3/20/2025 4:47 PM, Chang S. Bae wrote:
>
> In the meantime, kernel memory usage could be improved by treating MPX
> as VCPU-only, aligning with the ongoing supervisor CET state handling
> work [3].
Sorry, I made a naive assumption when mentioning this earlier.
Since MPX is a user feature, excluding it from the host fpstate while
exposing it to the guest would also require enforcement in userspace.
The only viable option seems to be switching XCR0. However, this would
introduce significant kernel code complexity (as you’re aware) with a
marginal benefit — saving just per-task 128 bytes on legacy systems.
Unlike guest-exclusive supervisor states, this trade-off appears to be
too costly.
Thanks,
Chang
^ permalink raw reply [flat|nested] 112+ messages in thread
* [PATCH RFC v2a 5/9] x86/cpufeatures: Add X86_FEATURE_APX
2025-03-20 23:42 ` [PATCH RFC v2 5/9] x86/cpufeatures: Add X86_FEATURE_APX Chang S. Bae
@ 2025-04-11 16:12 ` Chang S. Bae
2025-04-11 16:54 ` Sohil Mehta
0 siblings, 1 reply; 112+ messages in thread
From: Chang S. Bae @ 2025-04-11 16:12 UTC (permalink / raw)
To: linux-kernel; +Cc: x86, tglx, mingo, bp, dave.hansen, chang.seok.bae
Intel Advanced Performance Extensions (APX) introduce a new set of
general-purpose registers, managed as an extended state component via the
xstate management facility.
Before enabling this new xstate, define a feature flag to clarify the
dependency in xsave_cpuid_features[]. APX is enumerated under CPUID level
7 with EDX=1. Since this CPUID leaf is not yet allocated, place the flag
in a scattered feature word.
While this feature is intended only for userspace, exposing it via
/proc/cpuinfo is unnecessary. Instead, the existing arch_prctl(2)
mechanism with the ARCH_GET_XCOMP_SUPP option can be used to query the
feature availability.
Finally, clarify that APX depends on XSAVE.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
Rebased onto v6.15-rc1 with commit:
968e9bc4cef8 ("x86: move ZMM exclusion list into CPU feature flag")
---
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/kernel/cpu/cpuid-deps.c | 1 +
arch/x86/kernel/cpu/scattered.c | 1 +
3 files changed, 3 insertions(+)
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 6c2c152d8a67..5445937eff3d 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -481,6 +481,7 @@
#define X86_FEATURE_AMD_HETEROGENEOUS_CORES (21*32 + 6) /* Heterogeneous Core Topology */
#define X86_FEATURE_AMD_WORKLOAD_CLASS (21*32 + 7) /* Workload Classification */
#define X86_FEATURE_PREFER_YMM (21*32 + 8) /* Avoid ZMM registers due to downclocking */
+#define X86_FEATURE_APX (21*32 + 9) /* Advanced #Performance Extensions */
/*
* BUG word(s)
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index 94c062cddfa4..7c1268138a7a 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -88,6 +88,7 @@ static const struct cpuid_dep cpuid_deps[] = {
{ X86_FEATURE_SHSTK, X86_FEATURE_XSAVES },
{ X86_FEATURE_FRED, X86_FEATURE_LKGS },
{ X86_FEATURE_SPEC_CTRL_SSBD, X86_FEATURE_SPEC_CTRL },
+ { X86_FEATURE_APX, X86_FEATURE_XSAVE },
{}
};
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index 16f3ca30626a..6c40d5af8479 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -54,6 +54,7 @@ static const struct cpuid_bit cpuid_bits[] = {
{ X86_FEATURE_AMD_LBR_V2, CPUID_EAX, 1, 0x80000022, 0 },
{ X86_FEATURE_AMD_LBR_PMC_FREEZE, CPUID_EAX, 2, 0x80000022, 0 },
{ X86_FEATURE_AMD_HETEROGENEOUS_CORES, CPUID_EAX, 30, 0x80000026, 0 },
+ { X86_FEATURE_APX, CPUID_EDX, 21, 0x00000007, 1 },
{ 0, 0, 0, 0, 0 }
};
--
2.45.2
^ permalink raw reply related [flat|nested] 112+ messages in thread
* Re: [PATCH RFC v2a 5/9] x86/cpufeatures: Add X86_FEATURE_APX
2025-04-11 16:12 ` [PATCH RFC v2a " Chang S. Bae
@ 2025-04-11 16:54 ` Sohil Mehta
2025-04-11 18:23 ` Chang S. Bae
0 siblings, 1 reply; 112+ messages in thread
From: Sohil Mehta @ 2025-04-11 16:54 UTC (permalink / raw)
To: Chang S. Bae, linux-kernel; +Cc: x86, tglx, mingo, bp, dave.hansen
On 4/11/2025 9:12 AM, Chang S. Bae wrote:
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index 6c2c152d8a67..5445937eff3d 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -481,6 +481,7 @@
> #define X86_FEATURE_AMD_HETEROGENEOUS_CORES (21*32 + 6) /* Heterogeneous Core Topology */
> #define X86_FEATURE_AMD_WORKLOAD_CLASS (21*32 + 7) /* Workload Classification */
> #define X86_FEATURE_PREFER_YMM (21*32 + 8) /* Avoid ZMM registers due to downclocking */
> +#define X86_FEATURE_APX (21*32 + 9) /* Advanced #Performance Extensions */
>
Is the '#' before 'Performance' intentional? The previous version didn't
seem to have it.
> /*
> * BUG word(s)
> diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
> index 94c062cddfa4..7c1268138a7a 100644
> --- a/arch/x86/kernel/cpu/cpuid-deps.c
> +++ b/arch/x86/kernel/cpu/cpuid-deps.c
> @@ -88,6 +88,7 @@ static const struct cpuid_dep cpuid_deps[] = {
> { X86_FEATURE_SHSTK, X86_FEATURE_XSAVES },
> { X86_FEATURE_FRED, X86_FEATURE_LKGS },
> { X86_FEATURE_SPEC_CTRL_SSBD, X86_FEATURE_SPEC_CTRL },
> + { X86_FEATURE_APX, X86_FEATURE_XSAVE },
I don't think this table follows any specific logic, but recent patches
have tried grouping by similar features or features with similar
dependencies.
I don't have a preference, but should this be inserted closer to other
XSAVE dependent features?
{ X86_FEATURE_PKU, X86_FEATURE_XSAVE },
{ X86_FEATURE_MPX, X86_FEATURE_XSAVE },
{ X86_FEATURE_XGETBV1, X86_FEATURE_XSAVE },
+ { X86_FEATURE_APX, X86_FEATURE_XSAVE },
{ X86_FEATURE_CMOV, X86_FEATURE_FXSR },
> {}
> };
>
> diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
> index 16f3ca30626a..6c40d5af8479 100644
> --- a/arch/x86/kernel/cpu/scattered.c
> +++ b/arch/x86/kernel/cpu/scattered.c
> @@ -54,6 +54,7 @@ static const struct cpuid_bit cpuid_bits[] = {
> { X86_FEATURE_AMD_LBR_V2, CPUID_EAX, 1, 0x80000022, 0 },
> { X86_FEATURE_AMD_LBR_PMC_FREEZE, CPUID_EAX, 2, 0x80000022, 0 },
> { X86_FEATURE_AMD_HETEROGENEOUS_CORES, CPUID_EAX, 30, 0x80000026, 0 },
> + { X86_FEATURE_APX, CPUID_EDX, 21, 0x00000007, 1 },
There is a comment on top of that table that says:
"Please keep the leaf sorted by cpuid_bit.level for faster search."
Based on that, APX should be inserted here:
{ X86_FEATURE_INTEL_PPIN, CPUID_EBX, 0, 0x00000007, 1 },
+{ X86_FEATURE_APX, CPUID_EDX, 21, 0x00000007, 1 },
{ X86_FEATURE_RRSBA_CTRL, CPUID_EDX, 2, 0x00000007, 2 },
> { 0, 0, 0, 0, 0 }
> };
>
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [PATCH RFC v2a 5/9] x86/cpufeatures: Add X86_FEATURE_APX
2025-04-11 16:54 ` Sohil Mehta
@ 2025-04-11 18:23 ` Chang S. Bae
2025-04-11 21:57 ` Sohil Mehta
0 siblings, 1 reply; 112+ messages in thread
From: Chang S. Bae @ 2025-04-11 18:23 UTC (permalink / raw)
To: Sohil Mehta, linux-kernel; +Cc: x86, tglx, mingo, bp, dave.hansen
[-- Attachment #1: Type: text/plain, Size: 1468 bytes --]
On 4/11/2025 9:54 AM, Sohil Mehta wrote:
> On 4/11/2025 9:12 AM, Chang S. Bae wrote:
>
>> +#define X86_FEATURE_APX (21*32 + 9) /* Advanced #Performance Extensions */
>
> Is the '#' before 'Performance' intentional? The previous version didn't
> seem to have it.
Oh no — I only meant to update the bit position. My bad!
> I don't think this table follows any specific logic, but recent patches
> have tried grouping by similar features or features with similar
> dependencies.
>
> I don't have a preference, but should this be inserted closer to other
> XSAVE dependent features?
>
> { X86_FEATURE_PKU, X86_FEATURE_XSAVE },
> { X86_FEATURE_MPX, X86_FEATURE_XSAVE },
> { X86_FEATURE_XGETBV1, X86_FEATURE_XSAVE },
> + { X86_FEATURE_APX, X86_FEATURE_XSAVE },
> { X86_FEATURE_CMOV, X86_FEATURE_FXSR },
> There is a comment on top of that table that says:
> "Please keep the leaf sorted by cpuid_bit.level for faster search."
>
> Based on that, APX should be inserted here:
>
> { X86_FEATURE_INTEL_PPIN, CPUID_EBX, 0, 0x00000007, 1 },
> +{ X86_FEATURE_APX, CPUID_EDX, 21, 0x00000007, 1 },
> { X86_FEATURE_RRSBA_CTRL, CPUID_EDX, 2, 0x00000007, 2 },Yeah, both suggestions make sense. Thanks for pointing them out!
I've attached the patch revision.
Thanks,
Chang
[-- Attachment #2: 0005-x86-cpufeatures-Add-X86_FEATURE_APX.patch --]
[-- Type: text/plain, Size: 3234 bytes --]
From a3a68edf35c43162a57dad80ea2eecc9c1101246 Mon Sep 17 00:00:00 2001
From: "Chang S. Bae" <chang.seok.bae@intel.com>
Date: Thu, 13 Feb 2025 23:00:28 -0800
Subject: [PATCH RFC v2b 5/9] x86/cpufeatures: Add X86_FEATURE_APX
Intel Advanced Performance Extensions (APX) introduce a new set of
general-purpose registers, managed as an extended state component via the
xstate management facility.
Before enabling this new xstate, define a feature flag to clarify the
dependency in xsave_cpuid_features[]. APX is enumerated under CPUID level
7 with EDX=1. Since this CPUID leaf is not yet allocated, place the flag
in a scattered feature word.
While this feature is intended only for userspace, exposing it via
/proc/cpuinfo is unnecessary. Instead, the existing arch_prctl(2)
mechanism with the ARCH_GET_XCOMP_SUPP option can be used to query the
feature availability.
Finally, clarify that APX depends on XSAVE.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
RFC-V2a -> RFC-V2b: Fix typo and organize APX entries orderly (Sohil)
RFC-V2 -> RFC-V2a: Rebased onto v6.15-rc1; resolve conflict with commit
968e9bc4cef8 ("x86: move ZMM exclusion list into CPU feature flag")
---
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/kernel/cpu/cpuid-deps.c | 1 +
arch/x86/kernel/cpu/scattered.c | 1 +
3 files changed, 3 insertions(+)
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 6c2c152d8a67..eb73f3f0ec70 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -481,6 +481,7 @@
#define X86_FEATURE_AMD_HETEROGENEOUS_CORES (21*32 + 6) /* Heterogeneous Core Topology */
#define X86_FEATURE_AMD_WORKLOAD_CLASS (21*32 + 7) /* Workload Classification */
#define X86_FEATURE_PREFER_YMM (21*32 + 8) /* Avoid ZMM registers due to downclocking */
+#define X86_FEATURE_APX (21*32 + 9) /* Advanced Performance Extensions */
/*
* BUG word(s)
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index 94c062cddfa4..46efcbd6afa4 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -28,6 +28,7 @@ static const struct cpuid_dep cpuid_deps[] = {
{ X86_FEATURE_PKU, X86_FEATURE_XSAVE },
{ X86_FEATURE_MPX, X86_FEATURE_XSAVE },
{ X86_FEATURE_XGETBV1, X86_FEATURE_XSAVE },
+ { X86_FEATURE_APX, X86_FEATURE_XSAVE },
{ X86_FEATURE_CMOV, X86_FEATURE_FXSR },
{ X86_FEATURE_MMX, X86_FEATURE_FXSR },
{ X86_FEATURE_MMXEXT, X86_FEATURE_MMX },
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index 16f3ca30626a..ffb80b5ad97f 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -27,6 +27,7 @@ static const struct cpuid_bit cpuid_bits[] = {
{ X86_FEATURE_APERFMPERF, CPUID_ECX, 0, 0x00000006, 0 },
{ X86_FEATURE_EPB, CPUID_ECX, 3, 0x00000006, 0 },
{ X86_FEATURE_INTEL_PPIN, CPUID_EBX, 0, 0x00000007, 1 },
+ { X86_FEATURE_APX, CPUID_EDX, 21, 0x00000007, 1 },
{ X86_FEATURE_RRSBA_CTRL, CPUID_EDX, 2, 0x00000007, 2 },
{ X86_FEATURE_BHI_CTRL, CPUID_EDX, 4, 0x00000007, 2 },
{ X86_FEATURE_CQM_LLC, CPUID_EDX, 1, 0x0000000f, 0 },
--
2.45.2
^ permalink raw reply related [flat|nested] 112+ messages in thread
* Re: [PATCH RFC v2a 5/9] x86/cpufeatures: Add X86_FEATURE_APX
2025-04-11 18:23 ` Chang S. Bae
@ 2025-04-11 21:57 ` Sohil Mehta
2025-04-12 8:43 ` Ingo Molnar
0 siblings, 1 reply; 112+ messages in thread
From: Sohil Mehta @ 2025-04-11 21:57 UTC (permalink / raw)
To: Chang S. Bae, linux-kernel; +Cc: x86, tglx, mingo, bp, dave.hansen
On 4/11/2025 11:23 AM, Chang S. Bae wrote:
> I've attached the patch revision.
>
LGTM,
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [PATCH RFC v2 6/9] x86/fpu/apx: Define APX state component
2025-03-20 23:42 ` [PATCH RFC v2 6/9] x86/fpu/apx: Define APX state component Chang S. Bae
@ 2025-04-11 22:54 ` Sohil Mehta
0 siblings, 0 replies; 112+ messages in thread
From: Sohil Mehta @ 2025-04-11 22:54 UTC (permalink / raw)
To: Chang S. Bae, linux-kernel; +Cc: x86, tglx, mingo, bp, dave.hansen
Overall, the patch looks good to me.
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Some minor nits below.
On 3/20/2025 4:42 PM, Chang S. Bae wrote:
> The Advanced Performance Extensions (APX) feature flag was previously
> defined.
It's unnecessary to say this. You could directly start with. Advanced
Performance Extensions (APX) is associated with...
> This feature is associated with a new state component number 19.
> To support APX, it is essential to define this xstate component and
> implement the necessary sanity checks.
>
It might be more precise to say what support is being added.
Maybe something like,
During context switch, to support saving and restoring of APX registers
using XSAVE, it is essential...
> Define the new component number, state name, and those register data
> type. Then, extend the size checker to validate the register data type
> and explicitly set the APX feature flag as a dependency for the new
> component in xsave_cpuid_features[].
>
> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
> ---
> RFC-V1 -> RFC-V2:
> * Remove the ordering table change, as it is now dynamically populated.
> ---
> arch/x86/include/asm/fpu/types.h | 9 +++++++++
> arch/x86/kernel/fpu/xstate.c | 3 +++
> 2 files changed, 12 insertions(+)
>
> diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
> index de16862bf230..97310df3ea13 100644
> --- a/arch/x86/include/asm/fpu/types.h
> +++ b/arch/x86/include/asm/fpu/types.h
> @@ -125,6 +125,7 @@ enum xfeature {
> XFEATURE_RSRVD_COMP_16,
> XFEATURE_XTILE_CFG,
> XFEATURE_XTILE_DATA,
> + XFEATURE_APX,
>
> XFEATURE_MAX,
> };
> @@ -145,6 +146,7 @@ enum xfeature {
> #define XFEATURE_MASK_LBR (1 << XFEATURE_LBR)
> #define XFEATURE_MASK_XTILE_CFG (1 << XFEATURE_XTILE_CFG)
> #define XFEATURE_MASK_XTILE_DATA (1 << XFEATURE_XTILE_DATA)
> +#define XFEATURE_MASK_APX (1 << XFEATURE_APX)
>
> #define XFEATURE_MASK_FPSSE (XFEATURE_MASK_FP | XFEATURE_MASK_SSE)
> #define XFEATURE_MASK_AVX512 (XFEATURE_MASK_OPMASK \
> @@ -303,6 +305,13 @@ struct xtile_data {
> struct reg_1024_byte tmm;
> } __packed;
>
> +/*
> + * State component 19: 8B extended general purpose register.
> + */
> +struct apx_state {
> + u64 egpr[16];
> +} __packed;
> +
The below comment makes it seem the APX component is inserted out of
order. But, it's the PASID component that is actually out of order.
I'll send a cleanup patch separately. It could be applied before or
after this series.
> /*
> * State component 10 is supervisor state used for context-switching the
> * PASID state.
> diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
> index 46c45e2f2a5a..2a270683a762 100644
> --- a/arch/x86/kernel/fpu/xstate.c
> +++ b/arch/x86/kernel/fpu/xstate.c
> @@ -63,6 +63,7 @@ static const char *xfeature_names[] =
> "unknown xstate feature",
> "AMX Tile config",
> "AMX Tile data",
> + "APX registers",
> "unknown xstate feature",
> };
>
> @@ -81,6 +82,7 @@ static unsigned short xsave_cpuid_features[] __initdata = {
> [XFEATURE_CET_USER] = X86_FEATURE_SHSTK,
> [XFEATURE_XTILE_CFG] = X86_FEATURE_AMX_TILE,
> [XFEATURE_XTILE_DATA] = X86_FEATURE_AMX_TILE,
> + [XFEATURE_APX] = X86_FEATURE_APX,
> };
>
> static unsigned int xstate_offsets[XFEATURE_MAX] __ro_after_init =
> @@ -570,6 +572,7 @@ static bool __init check_xstate_against_struct(int nr)
> case XFEATURE_XTILE_CFG: return XCHECK_SZ(sz, nr, struct xtile_cfg);
> case XFEATURE_CET_USER: return XCHECK_SZ(sz, nr, struct cet_user_state);
> case XFEATURE_XTILE_DATA: check_xtile_data_against_struct(sz); return true;
> + case XFEATURE_APX: return XCHECK_SZ(sz, nr, struct apx_state);
Can we insert the new APX case before XFEATURE_XTILE_DATA? These cases
are not really in numerical order. That way, the switch case is more
consistent and easier to read.
> default:
> XSTATE_WARN_ON(1, "No structure for xstate: %d\n", nr);
> return false;
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [PATCH RFC v2a 5/9] x86/cpufeatures: Add X86_FEATURE_APX
2025-04-11 21:57 ` Sohil Mehta
@ 2025-04-12 8:43 ` Ingo Molnar
2025-04-14 8:23 ` Ingo Molnar
2025-04-16 2:16 ` [PATCH 00/10] x86/fpu: APX enablement and assorted FPU code improvements Chang S. Bae
0 siblings, 2 replies; 112+ messages in thread
From: Ingo Molnar @ 2025-04-12 8:43 UTC (permalink / raw)
To: Sohil Mehta; +Cc: Chang S. Bae, linux-kernel, x86, tglx, mingo, bp, dave.hansen
* Sohil Mehta <sohil.mehta@intel.com> wrote:
> On 4/11/2025 11:23 AM, Chang S. Bae wrote:
>
> > I've attached the patch revision.
> >
>
> LGTM,
>
> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Chang, mind sending a series of the latest version of all the pending
APX patches you have at the moment (and any other pending FPU patches
you may have), with Reviewed-by tags rolled in, etc., on top of:
git://git.kernel.org/pub/scm/linux/kernel/git/mingo/tip.git WIP.x86/fpu
Thanks,
Ingo
^ permalink raw reply [flat|nested] 112+ messages in thread
* [tip: x86/merge] x86/fpu/xstate: Adjust xstate copying logic for user ABI
2025-03-20 23:42 ` [PATCH RFC v2 4/9] x86/fpu/xstate: Adjust xstate copying logic for user ABI Chang S. Bae
2025-03-25 10:18 ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
2025-03-25 10:28 ` tip-bot2 for Chang S. Bae
@ 2025-04-14 7:34 ` tip-bot2 for Chang S. Bae
2 siblings, 0 replies; 112+ messages in thread
From: tip-bot2 for Chang S. Bae @ 2025-04-14 7:34 UTC (permalink / raw)
To: linux-tip-commits
Cc: Chang S. Bae, Ingo Molnar, Andy Lutomirski, H. Peter Anvin,
Linus Torvalds, Oleg Nesterov, x86, linux-kernel
The following commit has been merged into the x86/merge branch of tip:
Commit-ID: cbe8e4dab16c56ac84765dcd53e418160c8bc0db
Gitweb: https://git.kernel.org/tip/cbe8e4dab16c56ac84765dcd53e418160c8bc0db
Author: Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate: Thu, 20 Mar 2025 16:42:55 -07:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Mon, 14 Apr 2025 08:18:29 +02:00
x86/fpu/xstate: Adjust xstate copying logic for user ABI
== Background ==
As feature positions in the userspace XSAVE buffer do not always align
with their feature numbers, the XSAVE format conversion needs to be
reconsidered to align with the revised xstate size calculation logic.
* For signal handling, XSAVE and XRSTOR are used directly to save and
restore extended registers.
* For ptrace, KVM, and signal returns (for 32-bit frame), the kernel
copies data between its internal buffer and the userspace XSAVE buffer.
If memcpy() were used for these cases, existing offset helpers — such
as __raw_xsave_addr() or xstate_offsets[] — would be sufficient to
handle the format conversion.
== Problem ==
When copying data from the compacted in-kernel buffer to the
non-compacted userspace buffer, the function follows the
user_regset_get2_fn() prototype. This means it utilizes struct membuf
helpers for the destination buffer. As defined in regset.h, these helpers
update the memory pointer during the copy process, enforcing sequential
writes within the loop.
Since xstate components are processed sequentially, any component whose
buffer position does not align with its feature number has an issue.
== Solution ==
Replace for_each_extended_xfeature() with the newly introduced
for_each_extended_xfeature_in_order(). This macro ensures xstate
components are handled in the correct order based on their actual
positions in the destination buffer, rather than their feature numbers.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/r/20250320234301.8342-5-chang.seok.bae@intel.com
---
arch/x86/kernel/fpu/xstate.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 93f9401..46c45e2 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1107,10 +1107,9 @@ void __copy_xstate_to_uabi_buf(struct membuf to, struct fpstate *fpstate,
const unsigned int off_mxcsr = offsetof(struct fxregs_state, mxcsr);
struct xregs_state *xinit = &init_fpstate.regs.xsave;
struct xregs_state *xsave = &fpstate->regs.xsave;
+ unsigned int zerofrom, i, xfeature;
struct xstate_header header;
- unsigned int zerofrom;
u64 mask;
- int i;
memset(&header, 0, sizeof(header));
header.xfeatures = xsave->header.xfeatures;
@@ -1179,15 +1178,16 @@ void __copy_xstate_to_uabi_buf(struct membuf to, struct fpstate *fpstate,
*/
mask = header.xfeatures;
- for_each_extended_xfeature(i, mask) {
+ for_each_extended_xfeature_in_order(i, mask) {
+ xfeature = xfeature_uncompact_order[i];
/*
* If there was a feature or alignment gap, zero the space
* in the destination buffer.
*/
- if (zerofrom < xstate_offsets[i])
- membuf_zero(&to, xstate_offsets[i] - zerofrom);
+ if (zerofrom < xstate_offsets[xfeature])
+ membuf_zero(&to, xstate_offsets[xfeature] - zerofrom);
- if (i == XFEATURE_PKRU) {
+ if (xfeature == XFEATURE_PKRU) {
struct pkru_state pkru = {0};
/*
* PKRU is not necessarily up to date in the
@@ -1197,14 +1197,14 @@ void __copy_xstate_to_uabi_buf(struct membuf to, struct fpstate *fpstate,
membuf_write(&to, &pkru, sizeof(pkru));
} else {
membuf_write(&to,
- __raw_xsave_addr(xsave, i),
- xstate_sizes[i]);
+ __raw_xsave_addr(xsave, xfeature),
+ xstate_sizes[xfeature]);
}
/*
* Keep track of the last copied state in the non-compacted
* target buffer for gap zeroing.
*/
- zerofrom = xstate_offsets[i] + xstate_sizes[i];
+ zerofrom = xstate_offsets[xfeature] + xstate_sizes[xfeature];
}
out:
^ permalink raw reply related [flat|nested] 112+ messages in thread
* [tip: x86/merge] x86/fpu/xstate: Adjust XSAVE buffer size calculation
2025-03-20 23:42 ` [PATCH RFC v2 3/9] x86/fpu/xstate: Adjust XSAVE buffer size calculation Chang S. Bae
2025-03-25 10:28 ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
@ 2025-04-14 7:34 ` tip-bot2 for Chang S. Bae
1 sibling, 0 replies; 112+ messages in thread
From: tip-bot2 for Chang S. Bae @ 2025-04-14 7:34 UTC (permalink / raw)
To: linux-tip-commits
Cc: Chang S. Bae, Ingo Molnar, Andy Lutomirski, H. Peter Anvin,
Linus Torvalds, Oleg Nesterov, x86, linux-kernel
The following commit has been merged into the x86/merge branch of tip:
Commit-ID: a758ae2885eacaa68e4de9a01539a242a6bbb403
Gitweb: https://git.kernel.org/tip/a758ae2885eacaa68e4de9a01539a242a6bbb403
Author: Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate: Thu, 20 Mar 2025 16:42:54 -07:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Mon, 14 Apr 2025 08:18:29 +02:00
x86/fpu/xstate: Adjust XSAVE buffer size calculation
The current xstate size calculation assumes that the highest-numbered
xstate feature has the highest offset in the buffer, determining the size
based on the topmost bit in the feature mask. However, this assumption is
not architecturally guaranteed -- higher-numbered features may have lower
offsets.
With the introduction of the xfeature order table and its helper macro,
xstate components can now be traversed in their positional order. Update
the non-compacted format handling to iterate through the table to
determine the last-positioned feature. Then, set the offset accordingly.
Since size calculation primarily occurs during initialization or in
non-critical paths, looping to find the last feature is not expected to
have a meaningful performance impact.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/r/20250320234301.8342-4-chang.seok.bae@intel.com
---
arch/x86/kernel/fpu/xstate.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 1e22103..93f9401 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -581,13 +581,20 @@ static bool __init check_xstate_against_struct(int nr)
static unsigned int xstate_calculate_size(u64 xfeatures, bool compacted)
{
unsigned int topmost = fls64(xfeatures) - 1;
- unsigned int offset = xstate_offsets[topmost];
+ unsigned int offset, i;
if (topmost <= XFEATURE_SSE)
return sizeof(struct xregs_state);
- if (compacted)
+ if (compacted) {
offset = xfeature_get_offset(xfeatures, topmost);
+ } else {
+ /* Walk through the xfeature order to pick the last */
+ for_each_extended_xfeature_in_order(i, xfeatures)
+ topmost = xfeature_uncompact_order[i];
+ offset = xstate_offsets[topmost];
+ }
+
return offset + xstate_sizes[topmost];
}
^ permalink raw reply related [flat|nested] 112+ messages in thread
* [tip: x86/merge] x86/fpu/xstate: Introduce xfeature order table and accessor macro
2025-03-20 23:42 ` [PATCH RFC v2 2/9] x86/fpu/xstate: Introduce xfeature order table and accessor macro Chang S. Bae
2025-03-25 10:18 ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
@ 2025-04-14 7:34 ` tip-bot2 for Chang S. Bae
1 sibling, 0 replies; 112+ messages in thread
From: tip-bot2 for Chang S. Bae @ 2025-04-14 7:34 UTC (permalink / raw)
To: linux-tip-commits
Cc: Dave Hansen, Chang S. Bae, Ingo Molnar, Andy Lutomirski,
H. Peter Anvin, Linus Torvalds, Oleg Nesterov, x86, linux-kernel
The following commit has been merged into the x86/merge branch of tip:
Commit-ID: 15d51a2f6f3f7057311d6e37d5190d58597a94a9
Gitweb: https://git.kernel.org/tip/15d51a2f6f3f7057311d6e37d5190d58597a94a9
Author: Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate: Thu, 20 Mar 2025 16:42:53 -07:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Mon, 14 Apr 2025 08:18:29 +02:00
x86/fpu/xstate: Introduce xfeature order table and accessor macro
The kernel has largely assumed that higher xstate component numbers
correspond to later offsets in the buffer. However, this assumption no
longer holds for the non-compacted format, where a newer state component
may have a lower offset.
When iterating over xstate components in offset order, using the feature
number as an index may be misleading. At the same time, the CPU exposes
each component’s size and offset based on its feature number, making it a
key for state information.
To provide flexibility in handling xstate ordering, introduce a mapping
table: feature order -> feature number. The table is dynamically
populated based on the CPU-exposed features and is sorted in offset order
at boot time.
Additionally, add an accessor macro to facilitate sequential traversal of
xstate components based on their actual buffer positions, given a feature
bitmask. This accessor macro will be particularly useful for computing
custom non-compacted format sizes and iterating over xstate offsets in
non-compacted buffers.
Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/r/20250320234301.8342-3-chang.seok.bae@intel.com
---
arch/x86/kernel/fpu/xstate.c | 58 ++++++++++++++++++++++++++++++-----
1 file changed, 50 insertions(+), 8 deletions(-)
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 542c698..1e22103 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -14,6 +14,7 @@
#include <linux/proc_fs.h>
#include <linux/vmalloc.h>
#include <linux/coredump.h>
+#include <linux/sort.h>
#include <asm/fpu/api.h>
#include <asm/fpu/regset.h>
@@ -88,6 +89,31 @@ static unsigned int xstate_sizes[XFEATURE_MAX] __ro_after_init =
{ [ 0 ... XFEATURE_MAX - 1] = -1};
static unsigned int xstate_flags[XFEATURE_MAX] __ro_after_init;
+/*
+ * Ordering of xstate components in uncompacted format: The xfeature
+ * number does not necessarily indicate its position in the XSAVE buffer.
+ * This array defines the traversal order of xstate features.
+ */
+static unsigned int xfeature_uncompact_order[XFEATURE_MAX] __ro_after_init =
+ { [ 0 ... XFEATURE_MAX - 1] = -1};
+
+static inline unsigned int next_xfeature_order(unsigned int i, u64 mask)
+{
+ for (; xfeature_uncompact_order[i] != -1; i++) {
+ if (mask & BIT_ULL(xfeature_uncompact_order[i]))
+ break;
+ }
+
+ return i;
+}
+
+/* Iterate xstate features in uncompacted order: */
+#define for_each_extended_xfeature_in_order(i, mask) \
+ for (i = 0; \
+ i = next_xfeature_order(i, mask), \
+ xfeature_uncompact_order[i] != -1; \
+ i++)
+
#define XSTATE_FLAG_SUPERVISOR BIT(0)
#define XSTATE_FLAG_ALIGNED64 BIT(1)
@@ -209,13 +235,20 @@ static bool xfeature_enabled(enum xfeature xfeature)
return fpu_kernel_cfg.max_features & BIT_ULL(xfeature);
}
+static int compare_xstate_offsets(const void *xfeature1, const void *xfeature2)
+{
+ return xstate_offsets[*(unsigned int *)xfeature1] -
+ xstate_offsets[*(unsigned int *)xfeature2];
+}
+
/*
* Record the offsets and sizes of various xstates contained
- * in the XSAVE state memory layout.
+ * in the XSAVE state memory layout. Also, create an ordered
+ * list of xfeatures for handling out-of-order offsets.
*/
static void __init setup_xstate_cache(void)
{
- u32 eax, ebx, ecx, edx, i;
+ u32 eax, ebx, ecx, edx, xfeature, i = 0;
/*
* The FP xstates and SSE xstates are legacy states. They are always
* in the fixed offsets in the xsave area in either compacted form
@@ -229,21 +262,30 @@ static void __init setup_xstate_cache(void)
xstate_sizes[XFEATURE_SSE] = sizeof_field(struct fxregs_state,
xmm_space);
- for_each_extended_xfeature(i, fpu_kernel_cfg.max_features) {
- cpuid_count(CPUID_LEAF_XSTATE, i, &eax, &ebx, &ecx, &edx);
+ for_each_extended_xfeature(xfeature, fpu_kernel_cfg.max_features) {
+ cpuid_count(CPUID_LEAF_XSTATE, xfeature, &eax, &ebx, &ecx, &edx);
- xstate_sizes[i] = eax;
- xstate_flags[i] = ecx;
+ xstate_sizes[xfeature] = eax;
+ xstate_flags[xfeature] = ecx;
/*
* If an xfeature is supervisor state, the offset in EBX is
* invalid, leave it to -1.
*/
- if (xfeature_is_supervisor(i))
+ if (xfeature_is_supervisor(xfeature))
continue;
- xstate_offsets[i] = ebx;
+ xstate_offsets[xfeature] = ebx;
+
+ /* Populate the list of xfeatures before sorting */
+ xfeature_uncompact_order[i++] = xfeature;
}
+
+ /*
+ * Sort xfeatures by their offsets to support out-of-order
+ * offsets in the uncompacted format.
+ */
+ sort(xfeature_uncompact_order, i, sizeof(unsigned int), compare_xstate_offsets, NULL);
}
/*
^ permalink raw reply related [flat|nested] 112+ messages in thread
* [tip: x86/merge] x86/fpu/xstate: Remove xstate offset check
2025-03-20 23:42 ` [PATCH RFC v2 1/9] x86/fpu/xstate: Remove xstate offset check Chang S. Bae
2025-03-25 10:18 ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
@ 2025-04-14 7:34 ` tip-bot2 for Chang S. Bae
1 sibling, 0 replies; 112+ messages in thread
From: tip-bot2 for Chang S. Bae @ 2025-04-14 7:34 UTC (permalink / raw)
To: linux-tip-commits
Cc: Chang S. Bae, Ingo Molnar, Andy Lutomirski, H. Peter Anvin,
Linus Torvalds, Oleg Nesterov, x86, linux-kernel
The following commit has been merged into the x86/merge branch of tip:
Commit-ID: 031b33ef1a6a1129a1a02a16b89608ded2eff9be
Gitweb: https://git.kernel.org/tip/031b33ef1a6a1129a1a02a16b89608ded2eff9be
Author: Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate: Thu, 20 Mar 2025 16:42:52 -07:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Mon, 14 Apr 2025 08:18:29 +02:00
x86/fpu/xstate: Remove xstate offset check
Traditionally, new xstate components have been assigned sequentially,
aligning feature numbers with their offsets in the XSAVE buffer. However,
this ordering is not architecturally mandated in the non-compacted
format, where a component's offset may not correspond to its feature
number.
The kernel caches CPUID-reported xstate component details, including size
and offset in the non-compacted format. As part of this process, a sanity
check is also conducted to ensure alignment between feature numbers and
offsets.
This check was likely intended as a general guideline rather than a
strict requirement. Upcoming changes will support out-of-order offsets.
Remove the check as becoming obsolete.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/r/20250320234301.8342-2-chang.seok.bae@intel.com
---
arch/x86/kernel/fpu/xstate.c | 13 -------------
1 file changed, 13 deletions(-)
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 6a41d16..542c698 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -216,9 +216,6 @@ static bool xfeature_enabled(enum xfeature xfeature)
static void __init setup_xstate_cache(void)
{
u32 eax, ebx, ecx, edx, i;
- /* start at the beginning of the "extended state" */
- unsigned int last_good_offset = offsetof(struct xregs_state,
- extended_state_area);
/*
* The FP xstates and SSE xstates are legacy states. They are always
* in the fixed offsets in the xsave area in either compacted form
@@ -246,16 +243,6 @@ static void __init setup_xstate_cache(void)
continue;
xstate_offsets[i] = ebx;
-
- /*
- * In our xstate size checks, we assume that the highest-numbered
- * xstate feature has the highest offset in the buffer. Ensure
- * it does.
- */
- WARN_ONCE(last_good_offset > xstate_offsets[i],
- "x86/fpu: misordered xstate at %d\n", last_good_offset);
-
- last_good_offset = xstate_offsets[i];
}
}
^ permalink raw reply related [flat|nested] 112+ messages in thread
* Re: [PATCH RFC v2a 5/9] x86/cpufeatures: Add X86_FEATURE_APX
2025-04-12 8:43 ` Ingo Molnar
@ 2025-04-14 8:23 ` Ingo Molnar
2025-04-14 17:28 ` Sohil Mehta
2025-04-14 18:31 ` Chang S. Bae
2025-04-16 2:16 ` [PATCH 00/10] x86/fpu: APX enablement and assorted FPU code improvements Chang S. Bae
1 sibling, 2 replies; 112+ messages in thread
From: Ingo Molnar @ 2025-04-14 8:23 UTC (permalink / raw)
To: Sohil Mehta, Chang S. Bae
Cc: Chang S. Bae, linux-kernel, x86, tglx, mingo, bp, dave.hansen
* Ingo Molnar <mingo@kernel.org> wrote:
>
> * Sohil Mehta <sohil.mehta@intel.com> wrote:
>
> > On 4/11/2025 11:23 AM, Chang S. Bae wrote:
> >
> > > I've attached the patch revision.
> > >
> >
> > LGTM,
> >
> > Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
>
> Chang, mind sending a series of the latest version of all the pending
> APX patches you have at the moment (and any other pending FPU patches
> you may have), with Reviewed-by tags rolled in, etc., on top of:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/mingo/tip.git WIP.x86/fpu
Note that this is now all in tip:x86/fpu, alongside with a rebased
version of Chang S. Bae's preparatory patches for Intel APX:
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86/fpu
I think we could now start merging the rest of the APX patches, for
v6.16 upstreaming.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [PATCH RFC v2 8/9] x86/fpu/apx: Enable APX state support
2025-03-20 23:42 ` [PATCH RFC v2 8/9] x86/fpu/apx: Enable APX state support Chang S. Bae
@ 2025-04-14 15:41 ` Sohil Mehta
0 siblings, 0 replies; 112+ messages in thread
From: Sohil Mehta @ 2025-04-14 15:41 UTC (permalink / raw)
To: Chang S. Bae, linux-kernel; +Cc: x86, tglx, mingo, bp, dave.hansen
On 3/20/2025 4:42 PM, Chang S. Bae wrote:
> With securing APX against conflicting MPX, it is now ready to be enabled.
> Include APX in the enabled xfeature set.
>
> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
> ---
> arch/x86/include/asm/fpu/xstate.h | 3 ++-
> arch/x86/kernel/fpu/xstate.c | 3 ++-
> 2 files changed, 4 insertions(+), 2 deletions(-)
>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
> diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
> index 7f39fe7980c5..b308a76afbb7 100644
> --- a/arch/x86/include/asm/fpu/xstate.h
> +++ b/arch/x86/include/asm/fpu/xstate.h
> @@ -32,7 +32,8 @@
> XFEATURE_MASK_PKRU | \
> XFEATURE_MASK_BNDREGS | \
> XFEATURE_MASK_BNDCSR | \
> - XFEATURE_MASK_XTILE)
> + XFEATURE_MASK_XTILE | \
> + XFEATURE_MASK_APX)
>
> /*
> * Features which are restored when returning to user space.
> diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
> index 0d68d5c4bc48..a5e3954dc834 100644
> --- a/arch/x86/kernel/fpu/xstate.c
> +++ b/arch/x86/kernel/fpu/xstate.c
> @@ -371,7 +371,8 @@ static __init void os_xrstor_booting(struct xregs_state *xstate)
> XFEATURE_MASK_BNDCSR | \
> XFEATURE_MASK_PASID | \
> XFEATURE_MASK_CET_USER | \
> - XFEATURE_MASK_XTILE)
> + XFEATURE_MASK_XTILE | \
> + XFEATURE_MASK_APX)
>
> /*
> * setup the xstate image representing the init state
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [PATCH RFC v2 9/9] selftests/x86/apx: Add APX test
2025-03-20 23:43 ` [PATCH RFC v2 9/9] selftests/x86/apx: Add APX test Chang S. Bae
@ 2025-04-14 16:02 ` Sohil Mehta
0 siblings, 0 replies; 112+ messages in thread
From: Sohil Mehta @ 2025-04-14 16:02 UTC (permalink / raw)
To: Chang S. Bae; +Cc: x86, tglx, mingo, bp, dave.hansen, LKML Mailing List
On 3/20/2025 4:43 PM, Chang S. Bae wrote:
> The extended general-purpose registers for APX may contain random data,
> which is currently assumed by the xstate testing framework. This allows
> the testing of the new userspace feature using the common test code.
>
> Invoke the test entry function from apx.c after enumerating the
> state component and adding it to the support list
>
> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
> ---
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
AVX and APX have very similar single line testcases. If we end up adding
more, it might be useful to consider merging them into a common file.
> diff --git a/tools/testing/selftests/x86/apx.c b/tools/testing/selftests/x86/apx.c
> new file mode 100644
> index 000000000000..d9c8d41b8c5a
> --- /dev/null
> +++ b/tools/testing/selftests/x86/apx.c
> @@ -0,0 +1,10 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#define _GNU_SOURCE
> +
> +#include "xstate.h"
> +
> +int main(void)
> +{
> + test_xstate(XFEATURE_APX);
> +}
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [PATCH RFC v2 7/9] x86/fpu/apx: Disallow conflicting MPX presence
2025-03-20 23:42 ` [PATCH RFC v2 7/9] x86/fpu/apx: Disallow conflicting MPX presence Chang S. Bae
@ 2025-04-14 17:09 ` Sohil Mehta
2025-04-14 17:25 ` Dave Hansen
0 siblings, 1 reply; 112+ messages in thread
From: Sohil Mehta @ 2025-04-14 17:09 UTC (permalink / raw)
To: Chang S. Bae, linux-kernel; +Cc: x86, tglx, mingo, bp, dave.hansen
On 3/20/2025 4:42 PM, Chang S. Bae wrote:
> APX is introduced as xstate component 19, following AMX. However, in the
> non-compacted format, its offset overlaps with the space previously
> occupied by the now-deprecated MPX:
>
> 45fc24e89b7c ("x86/mpx: remove MPX from arch/x86")
>
Would it be useful to describe why and how this is possible?
My knowledge on this fairly limited, is it because the size of the APX
register state is the same or less than the legacy MPX state?
Also, what other options does the kernel have to handle this?
There was an earlier email from Dave that describes the trade-offs.
Would it be useful to insert some form of summary here or in a different
patch?
https://lore.kernel.org/lkml/674db309-e206-4bb4-bf99-b3c39bff7973@intel.com/
> To prevent conflicts, the kernel must ensure the CPU never expose both
> features at the same time. If so, it indicates unreliable hardware. In
> such cases, XSAVE should be disabled entirely as a precautionary measure.
>
> Add a sanity check to detect this condition and disable XSAVE if an
> invalid hardware configuration is identified.
>
> Note: MPX state components remain enabled on legacy systems solely for
> KVM guest support.
>
I didn't understand this properly. Can you please explain more? Is there
an impact or restriction that this imposes on a combination of legacy
and modern components?
Old KVM — new Guest kernel
or
New KVM — Old guest kernel
> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
> ---
> arch/x86/kernel/fpu/xstate.c | 11 +++++++++++
> 1 file changed, 11 insertions(+)
>
> diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
> index 2a270683a762..0d68d5c4bc48 100644
> --- a/arch/x86/kernel/fpu/xstate.c
> +++ b/arch/x86/kernel/fpu/xstate.c
> @@ -814,6 +814,17 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
> goto out_disable;
> }
>
> + if (fpu_kernel_cfg.max_features & XFEATURE_MASK_APX &&
> + fpu_kernel_cfg.max_features & (XFEATURE_MASK_BNDREGS | XFEATURE_MASK_BNDCSR)) {
> + /*
> + * This is a problematic CPU configuration where two
> + * conflicting state components are both enumerated.
> + */
> + pr_err("x86/fpu: both APX and MPX present in the CPU's xstate features: 0x%llx.\n",
> + fpu_kernel_cfg.max_features);
s/both/Both
For the user, this statement doesn't necessarily seem like an error
message, nor does it say what action the kernel takes.
IIUC, there is no other print that says the kernel is disabling XSAVE
once it jumps to out_disable?
It might be useful to add a "disabling XSAVE" print at the end of this
statement, like the other error messages in the same function.
With this change,
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
> + goto out_disable;
> + }
> +
> fpu_kernel_cfg.independent_features = fpu_kernel_cfg.max_features &
> XFEATURE_MASK_INDEPENDENT;
>
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [PATCH RFC v2 7/9] x86/fpu/apx: Disallow conflicting MPX presence
2025-04-14 17:09 ` Sohil Mehta
@ 2025-04-14 17:25 ` Dave Hansen
0 siblings, 0 replies; 112+ messages in thread
From: Dave Hansen @ 2025-04-14 17:25 UTC (permalink / raw)
To: Sohil Mehta, Chang S. Bae, linux-kernel; +Cc: x86, tglx, mingo, bp, dave.hansen
On 4/14/25 10:09, Sohil Mehta wrote:
> On 3/20/2025 4:42 PM, Chang S. Bae wrote:
>> APX is introduced as xstate component 19, following AMX. However, in the
>> non-compacted format, its offset overlaps with the space previously
>> occupied by the now-deprecated MPX:
>>
>> 45fc24e89b7c ("x86/mpx: remove MPX from arch/x86")
>>
> Would it be useful to describe why and how this is possible?
Every XSAVE state component is architecturally independent. There's no
architectural rule out there that says that state component offsets in
the non-compacted format must always go up or that they can't overlap.
But, there's never been an overlap like this before.
So this is technically allowed, but it's weird and it breaks some
internal assumptions that the kernel has.
Does that summarize it OK?
> My knowledge on this fairly limited, is it because the size of the APX
> register state is the same or less than the legacy MPX state?
Well, that was one reason the MPX location was _chosen_ for APX. Had APX
been bigger, it the MPX location would obviously not have been an
option. Had it been way smaller, using the MPX location would be OK.
> Also, what other options does the kernel have to handle this?
The kernel could _probably_ just disable the MPX states and favor using
APX. But, honestly, if this ever happens it's going to be because some
silly VMM is enumerating all the features it knows of and doesn't have
logic to make APX and MPX mutually exclusive.
In that case, we want the VMM to get fixed pronto, so papering over the
issue is not the best idea. Spitting out a warning and disabling XSAVE
but still booting sounds like a good balance of being annoying, but not
being _as_ annoying as a BUG_ON() or breaking the boot.
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [PATCH RFC v2a 5/9] x86/cpufeatures: Add X86_FEATURE_APX
2025-04-14 8:23 ` Ingo Molnar
@ 2025-04-14 17:28 ` Sohil Mehta
2025-04-14 17:32 ` Dave Hansen
2025-04-14 18:31 ` Chang S. Bae
1 sibling, 1 reply; 112+ messages in thread
From: Sohil Mehta @ 2025-04-14 17:28 UTC (permalink / raw)
To: Ingo Molnar, Chang S. Bae; +Cc: linux-kernel, x86, tglx, mingo, bp, dave.hansen
On 4/14/2025 1:23 AM, Ingo Molnar wrote:
> I think we could now start merging the rest of the APX patches, for
> v6.16 upstreaming.
>
Chang, Ingo,
The series looks good to me.
Functionally, the only change I have requested is in patch 7/9 to add a
"disabling XSAVE" message to the error print.
The rest of the comments are mainly about clarifications in the commit
logs. Feel free to ignore them if they seem redundant.
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [PATCH RFC v2a 5/9] x86/cpufeatures: Add X86_FEATURE_APX
2025-04-14 17:28 ` Sohil Mehta
@ 2025-04-14 17:32 ` Dave Hansen
2025-04-14 17:45 ` Sohil Mehta
0 siblings, 1 reply; 112+ messages in thread
From: Dave Hansen @ 2025-04-14 17:32 UTC (permalink / raw)
To: Sohil Mehta, Ingo Molnar, Chang S. Bae
Cc: linux-kernel, x86, tglx, mingo, bp, dave.hansen
On 4/14/25 10:28, Sohil Mehta wrote:
> Functionally, the only change I have requested is in patch 7/9 to add a
> "disabling XSAVE" message to the error print.
That's probably a good idea to stick in:
fpu__init_disable_system_xstate()
for all of the "out_disable" cases to use, no?
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [PATCH RFC v2a 5/9] x86/cpufeatures: Add X86_FEATURE_APX
2025-04-14 17:32 ` Dave Hansen
@ 2025-04-14 17:45 ` Sohil Mehta
2025-04-14 18:02 ` Chang S. Bae
0 siblings, 1 reply; 112+ messages in thread
From: Sohil Mehta @ 2025-04-14 17:45 UTC (permalink / raw)
To: Dave Hansen, Ingo Molnar, Chang S. Bae
Cc: linux-kernel, x86, tglx, mingo, bp, dave.hansen
On 4/14/2025 10:32 AM, Dave Hansen wrote:
> On 4/14/25 10:28, Sohil Mehta wrote:
>> Functionally, the only change I have requested is in patch 7/9 to add a
>> "disabling XSAVE" message to the error print.
>
> That's probably a good idea to stick in:
>
> fpu__init_disable_system_xstate()
>
> for all of the "out_disable" cases to use, no?
That way my initial inclination as well. My suggestion was mainly to
keep it consistent. But looking more closely, there is mismatch already.
So either of the options work for me.
"x86/fpu: FP/SSE not present amongst the CPU's xstate features: 0x%llx."
"x86/fpu: init_fpstate buffer too small (%zu < %d), disabling XSAVE"
"x86/fpu: xfeatures modified from 0x%016llx to 0x%016llx during init,
disabling XSAVE"
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [PATCH RFC v2a 5/9] x86/cpufeatures: Add X86_FEATURE_APX
2025-04-14 17:45 ` Sohil Mehta
@ 2025-04-14 18:02 ` Chang S. Bae
2025-04-14 18:12 ` Sohil Mehta
0 siblings, 1 reply; 112+ messages in thread
From: Chang S. Bae @ 2025-04-14 18:02 UTC (permalink / raw)
To: Sohil Mehta, Dave Hansen, Ingo Molnar
Cc: linux-kernel, x86, tglx, mingo, bp, dave.hansen
On 4/14/2025 10:45 AM, Sohil Mehta wrote:
>
> That way my initial inclination as well. My suggestion was mainly to
> keep it consistent. But looking more closely, there is mismatch already.
> So either of the options work for me.
>
> "x86/fpu: FP/SSE not present amongst the CPU's xstate features: 0x%llx."
>
> "x86/fpu: init_fpstate buffer too small (%zu < %d), disabling XSAVE"
>
> "x86/fpu: xfeatures modified from 0x%016llx to 0x%016llx during init,
> disabling XSAVE"
And you also left this comment in patch 7:
> It might be useful to add a "disabling XSAVE" print at the end of this
> statement, like the other error messages in the same function.
So it sounds like you were suggesting something along the lines of:
"x86/fpu: Both APX/MPX present in the CPU's xstate features: 0x%llx,
disabling XSAVE"
If so, I see that as an improvement to the error message rather than a
functional change. But I agree it reads better and more in line with the
other cases.
Thanks,
Chang
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [PATCH RFC v2a 5/9] x86/cpufeatures: Add X86_FEATURE_APX
2025-04-14 18:02 ` Chang S. Bae
@ 2025-04-14 18:12 ` Sohil Mehta
0 siblings, 0 replies; 112+ messages in thread
From: Sohil Mehta @ 2025-04-14 18:12 UTC (permalink / raw)
To: Chang S. Bae, Dave Hansen, Ingo Molnar
Cc: linux-kernel, x86, tglx, mingo, bp, dave.hansen
On 4/14/2025 11:02 AM, Chang S. Bae wrote:
> On 4/14/2025 10:45 AM, Sohil Mehta wrote:
>>
>> That way my initial inclination as well. My suggestion was mainly to
>> keep it consistent. But looking more closely, there is mismatch already.
>> So either of the options work for me.
>>
>> "x86/fpu: FP/SSE not present amongst the CPU's xstate features: 0x%llx."
>>
>> "x86/fpu: init_fpstate buffer too small (%zu < %d), disabling XSAVE"
>>
>> "x86/fpu: xfeatures modified from 0x%016llx to 0x%016llx during init,
>> disabling XSAVE"
>
>
> And you also left this comment in patch 7:
>
> > It might be useful to add a "disabling XSAVE" print at the end of this
> > statement, like the other error messages in the same function.
>
> So it sounds like you were suggesting something along the lines of:
> "x86/fpu: Both APX/MPX present in the CPU's xstate features: 0x%llx,
> disabling XSAVE"
>
Yes, this is what I meant. But Dave's suggestion to add a common print
in fpu__init_disable_system_xstate() is fine as well.
> If so, I see that as an improvement to the error message rather than a
> functional change.
Maybe functional is too strong. I meant a minor code/message change.
> But I agree it reads better and more in line with the
> other cases.
>
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [PATCH RFC v2a 5/9] x86/cpufeatures: Add X86_FEATURE_APX
2025-04-14 8:23 ` Ingo Molnar
2025-04-14 17:28 ` Sohil Mehta
@ 2025-04-14 18:31 ` Chang S. Bae
1 sibling, 0 replies; 112+ messages in thread
From: Chang S. Bae @ 2025-04-14 18:31 UTC (permalink / raw)
To: Ingo Molnar, Sohil Mehta; +Cc: linux-kernel, x86, tglx, mingo, bp, dave.hansen
On 4/14/2025 1:23 AM, Ingo Molnar wrote:
>
>> Chang, mind sending a series of the latest version of all the pending
>> APX patches you have at the moment (and any other pending FPU patches
>> you may have), with Reviewed-by tags rolled in, etc., on top of:
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/mingo/tip.git WIP.x86/fpu
>
> Note that this is now all in tip:x86/fpu, alongside with a rebased
> version of Chang S. Bae's preparatory patches for Intel APX:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86/fpu
>
> I think we could now start merging the rest of the APX patches, for
> v6.16 upstreaming.
Thanks for the update! Finalizing a few minor changes based on the
feedback so far, and also refreshing a few other FPU patches that are
considerably worth to be bundled up. I’ll post them in the next day or two.
Thanks,
Chang
^ permalink raw reply [flat|nested] 112+ messages in thread
* [PATCH 00/10] x86/fpu: APX enablement and assorted FPU code improvements
2025-04-12 8:43 ` Ingo Molnar
2025-04-14 8:23 ` Ingo Molnar
@ 2025-04-16 2:16 ` Chang S. Bae
2025-04-16 2:16 ` [PATCH 01/10] x86/cpufeatures: Add X86_FEATURE_APX Chang S. Bae
` (10 more replies)
1 sibling, 11 replies; 112+ messages in thread
From: Chang S. Bae @ 2025-04-16 2:16 UTC (permalink / raw)
To: mingo; +Cc: linux-kernel, x86, tglx, bp, dave.hansen, chang.seok.bae
On 4/12/2025 1:43 AM, Ingo Molnar wrote:
>
> Chang, mind sending a series of the latest version of all the pending
> APX patches you have at the moment (and any other pending FPU patches
> you may have), with Reviewed-by tags rolled in, etc.
Hi Ingo,
Here’s the updated patch set following up on the previous APX series [1],
along with a collection of additional FPU-related cleanups and
improvements that were previously posted or discussed.
The series is organized into two parts:
1. APX Enabling (PATCH 1–5)
These patches complete the APX bring-up. After laying the groundwork,
this portion finalizes the enablement:
* Patches 1, 2, and 4 are typical xfeature plumbing.
* Patch 3 handles MPX conflict -- unexpected hardware issue
* Patch 5 adds a test case.
2. Miscellaneous FPU Code Improvements (PATCH 6–10)
This batch includes various standalone improvements:
* Patch 6: Centralizes the XSAVE disablement message
* Patches 7-8: Simplifies PKRU update in XSTATE_BV on sigframe
* Patch 9: Removes the unused mxcsr_feature_mask export
* Patch 10: Renames fpu_reset_fpregs() for clarity
Each patch includes context and links to earlier discussions or
revisions.
As you noted in [2], the series is based on the x86/fpu branch and is
available at:
git://github.com/intel/apx.git apx-and-misc
[1]: https://lore.kernel.org/lkml/20250320234301.8342-1-chang.seok.bae@intel.com
[2]: https://lore.kernel.org/lkml/Z_zGCCNE_Qt3IlMZ@gmail.com
Thanks,
Chang
Chang S. Bae (10):
x86/cpufeatures: Add X86_FEATURE_APX
x86/fpu/apx: Define APX state component
x86/fpu/apx: Disallow conflicting MPX presence
x86/fpu/apx: Enable APX state support
selftests/x86/apx: Add APX test
x86/fpu: Log XSAVE disablement consistently
x86/fpu: Refactor xfeature bitmask update code for sigframe XSAVE
x86/pkeys: Simplify PKRU update in signal frame
x86/fpu: Remove export of mxcsr_feature_mask
x86/fpu: Rename fpu_reset_fpregs() to fpu_reset_fpstate_regs()
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/fpu/types.h | 9 +++++++++
arch/x86/include/asm/fpu/xstate.h | 3 ++-
arch/x86/kernel/cpu/cpuid-deps.c | 1 +
arch/x86/kernel/cpu/scattered.c | 1 +
arch/x86/kernel/fpu/core.c | 6 +++---
arch/x86/kernel/fpu/init.c | 1 -
arch/x86/kernel/fpu/signal.c | 11 +----------
arch/x86/kernel/fpu/xstate.c | 23 ++++++++++++++++++++---
arch/x86/kernel/fpu/xstate.h | 21 +++++++++++++++------
tools/testing/selftests/x86/Makefile | 3 ++-
tools/testing/selftests/x86/apx.c | 10 ++++++++++
tools/testing/selftests/x86/xstate.c | 3 ++-
tools/testing/selftests/x86/xstate.h | 2 ++
14 files changed, 69 insertions(+), 26 deletions(-)
create mode 100644 tools/testing/selftests/x86/apx.c
base-commit: e3a52b67f54aa36ab21265eeea016460b5fe1c46
--
2.45.2
^ permalink raw reply [flat|nested] 112+ messages in thread
* [PATCH 01/10] x86/cpufeatures: Add X86_FEATURE_APX
2025-04-16 2:16 ` [PATCH 00/10] x86/fpu: APX enablement and assorted FPU code improvements Chang S. Bae
@ 2025-04-16 2:16 ` Chang S. Bae
2025-04-16 8:17 ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
2025-04-16 2:16 ` [PATCH 02/10] x86/fpu/apx: Define APX state component Chang S. Bae
` (9 subsequent siblings)
10 siblings, 1 reply; 112+ messages in thread
From: Chang S. Bae @ 2025-04-16 2:16 UTC (permalink / raw)
To: mingo; +Cc: linux-kernel, x86, tglx, bp, dave.hansen, chang.seok.bae,
Sohil Mehta
Intel Advanced Performance Extensions (APX) introduce a new set of
general-purpose registers, managed as an extended state component via the
xstate management facility.
Before enabling this new xstate, define a feature flag to clarify the
dependency in xsave_cpuid_features[]. APX is enumerated under CPUID level
7 with EDX=1. Since this CPUID leaf is not yet allocated, place the flag
in a scattered feature word.
While this feature is intended only for userspace, exposing it via
/proc/cpuinfo is unnecessary. Instead, the existing arch_prctl(2)
mechanism with the ARCH_GET_XCOMP_SUPP option can be used to query the
feature availability.
Finally, clarify that APX depends on XSAVE.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
---
Changes from the last posting:
https://lore.kernel.org/lkml/20250320234301.8342-6-chang.seok.bae@intel.com/
* Rebase onto v6.15-rc1; resolve conflict with commit:
968e9bc4cef8 ("x86: move ZMM exclusion list into CPU feature flag")
* Organize APX entries orderly (Sohil). Then, include his review tag
---
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/kernel/cpu/cpuid-deps.c | 1 +
arch/x86/kernel/cpu/scattered.c | 1 +
3 files changed, 3 insertions(+)
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 6c2c152d8a67..eb73f3f0ec70 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -481,6 +481,7 @@
#define X86_FEATURE_AMD_HETEROGENEOUS_CORES (21*32 + 6) /* Heterogeneous Core Topology */
#define X86_FEATURE_AMD_WORKLOAD_CLASS (21*32 + 7) /* Workload Classification */
#define X86_FEATURE_PREFER_YMM (21*32 + 8) /* Avoid ZMM registers due to downclocking */
+#define X86_FEATURE_APX (21*32 + 9) /* Advanced Performance Extensions */
/*
* BUG word(s)
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index a2fbea0be535..72f4fb66ac20 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -28,6 +28,7 @@ static const struct cpuid_dep cpuid_deps[] = {
{ X86_FEATURE_PKU, X86_FEATURE_XSAVE },
{ X86_FEATURE_MPX, X86_FEATURE_XSAVE },
{ X86_FEATURE_XGETBV1, X86_FEATURE_XSAVE },
+ { X86_FEATURE_APX, X86_FEATURE_XSAVE },
{ X86_FEATURE_CMOV, X86_FEATURE_FXSR },
{ X86_FEATURE_MMX, X86_FEATURE_FXSR },
{ X86_FEATURE_MMXEXT, X86_FEATURE_MMX },
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index 16f3ca30626a..ffb80b5ad97f 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -27,6 +27,7 @@ static const struct cpuid_bit cpuid_bits[] = {
{ X86_FEATURE_APERFMPERF, CPUID_ECX, 0, 0x00000006, 0 },
{ X86_FEATURE_EPB, CPUID_ECX, 3, 0x00000006, 0 },
{ X86_FEATURE_INTEL_PPIN, CPUID_EBX, 0, 0x00000007, 1 },
+ { X86_FEATURE_APX, CPUID_EDX, 21, 0x00000007, 1 },
{ X86_FEATURE_RRSBA_CTRL, CPUID_EDX, 2, 0x00000007, 2 },
{ X86_FEATURE_BHI_CTRL, CPUID_EDX, 4, 0x00000007, 2 },
{ X86_FEATURE_CQM_LLC, CPUID_EDX, 1, 0x0000000f, 0 },
--
2.45.2
^ permalink raw reply related [flat|nested] 112+ messages in thread
* [PATCH 02/10] x86/fpu/apx: Define APX state component
2025-04-16 2:16 ` [PATCH 00/10] x86/fpu: APX enablement and assorted FPU code improvements Chang S. Bae
2025-04-16 2:16 ` [PATCH 01/10] x86/cpufeatures: Add X86_FEATURE_APX Chang S. Bae
@ 2025-04-16 2:16 ` Chang S. Bae
2025-04-16 8:17 ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
2025-04-16 2:16 ` [PATCH 03/10] x86/fpu/apx: Disallow conflicting MPX presence Chang S. Bae
` (8 subsequent siblings)
10 siblings, 1 reply; 112+ messages in thread
From: Chang S. Bae @ 2025-04-16 2:16 UTC (permalink / raw)
To: mingo; +Cc: linux-kernel, x86, tglx, bp, dave.hansen, chang.seok.bae,
Sohil Mehta
Advanced Performance Extensions (APX) is associated with a new state
component number 19. To support saving and restoring of the corresponding
registers via the XSAVE mechanism, introduce the component definition
along with the necessary sanity checks.
Define the new component number, state name, and those register data
type. Then, extend the size checker to validate the register data type
and explicitly list the APX feature flag as a dependency for the new
component in xsave_cpuid_features[].
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
---
Changes from the last posting:
https://lore.kernel.org/lkml/20250320234301.8342-7-chang.seok.bae@intel.com/
* Move the check to be grouped with other XCHECK_SZ() entries (Sohil)
* Massage the changelog (Sohil)
* Add Sohil's tag
---
arch/x86/include/asm/fpu/types.h | 9 +++++++++
arch/x86/kernel/fpu/xstate.c | 3 +++
2 files changed, 12 insertions(+)
diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index de16862bf230..97310df3ea13 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -125,6 +125,7 @@ enum xfeature {
XFEATURE_RSRVD_COMP_16,
XFEATURE_XTILE_CFG,
XFEATURE_XTILE_DATA,
+ XFEATURE_APX,
XFEATURE_MAX,
};
@@ -145,6 +146,7 @@ enum xfeature {
#define XFEATURE_MASK_LBR (1 << XFEATURE_LBR)
#define XFEATURE_MASK_XTILE_CFG (1 << XFEATURE_XTILE_CFG)
#define XFEATURE_MASK_XTILE_DATA (1 << XFEATURE_XTILE_DATA)
+#define XFEATURE_MASK_APX (1 << XFEATURE_APX)
#define XFEATURE_MASK_FPSSE (XFEATURE_MASK_FP | XFEATURE_MASK_SSE)
#define XFEATURE_MASK_AVX512 (XFEATURE_MASK_OPMASK \
@@ -303,6 +305,13 @@ struct xtile_data {
struct reg_1024_byte tmm;
} __packed;
+/*
+ * State component 19: 8B extended general purpose register.
+ */
+struct apx_state {
+ u64 egpr[16];
+} __packed;
+
/*
* State component 10 is supervisor state used for context-switching the
* PASID state.
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index a288597065fd..dfd07af10037 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -63,6 +63,7 @@ static const char *xfeature_names[] =
"unknown xstate feature",
"AMX Tile config",
"AMX Tile data",
+ "APX registers",
"unknown xstate feature",
};
@@ -81,6 +82,7 @@ static unsigned short xsave_cpuid_features[] __initdata = {
[XFEATURE_CET_USER] = X86_FEATURE_SHSTK,
[XFEATURE_XTILE_CFG] = X86_FEATURE_AMX_TILE,
[XFEATURE_XTILE_DATA] = X86_FEATURE_AMX_TILE,
+ [XFEATURE_APX] = X86_FEATURE_APX,
};
static unsigned int xstate_offsets[XFEATURE_MAX] __ro_after_init =
@@ -569,6 +571,7 @@ static bool __init check_xstate_against_struct(int nr)
case XFEATURE_PASID: return XCHECK_SZ(sz, nr, struct ia32_pasid_state);
case XFEATURE_XTILE_CFG: return XCHECK_SZ(sz, nr, struct xtile_cfg);
case XFEATURE_CET_USER: return XCHECK_SZ(sz, nr, struct cet_user_state);
+ case XFEATURE_APX: return XCHECK_SZ(sz, nr, struct apx_state);
case XFEATURE_XTILE_DATA: check_xtile_data_against_struct(sz); return true;
default:
XSTATE_WARN_ON(1, "No structure for xstate: %d\n", nr);
--
2.45.2
^ permalink raw reply related [flat|nested] 112+ messages in thread
* [PATCH 03/10] x86/fpu/apx: Disallow conflicting MPX presence
2025-04-16 2:16 ` [PATCH 00/10] x86/fpu: APX enablement and assorted FPU code improvements Chang S. Bae
2025-04-16 2:16 ` [PATCH 01/10] x86/cpufeatures: Add X86_FEATURE_APX Chang S. Bae
2025-04-16 2:16 ` [PATCH 02/10] x86/fpu/apx: Define APX state component Chang S. Bae
@ 2025-04-16 2:16 ` Chang S. Bae
2025-04-16 8:17 ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
2025-04-16 2:16 ` [PATCH 04/10] x86/fpu/apx: Enable APX state support Chang S. Bae
` (7 subsequent siblings)
10 siblings, 1 reply; 112+ messages in thread
From: Chang S. Bae @ 2025-04-16 2:16 UTC (permalink / raw)
To: mingo; +Cc: linux-kernel, x86, tglx, bp, dave.hansen, chang.seok.bae,
Sohil Mehta
XSTATE components are architecturally independent. There is no rule
requiring their offsets in the non-compacted format to be strictly
ascending or mutually non-overlapping. However, in practice, such
overlaps have not occurred -- until now.
APX is introduced as xstate component 19, following AMX. In the
non-compacted XSAVE format, its offset overlaps with the space previously
occupied by the now-deprecated MPX feature:
45fc24e89b7c ("x86/mpx: remove MPX from arch/x86")
To prevent conflicts, the kernel must ensure the CPU never expose both
features at the same time. If so, it indicates unreliable hardware. In
such cases, XSAVE should be disabled entirely as a precautionary measure.
Add a sanity check to detect this condition and disable XSAVE if an
invalid hardware configuration is identified.
Note: MPX state components remain enabled on legacy systems solely for
KVM guest support.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
---
Changes from the last posting:
https://lore.kernel.org/lkml/20250320234301.8342-8-chang.seok.bae@intel.com/
* Add background in the changelog (Sohil/Dave)
* Clarify XSAVE disablement (Sohil).
* Collect review tag
The related warning message will be moved to the XSAVE-disabling function
in patch 6, per Dave’s suggestion.
---
arch/x86/kernel/fpu/xstate.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index dfd07af10037..14f5c1bb2080 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -814,6 +814,17 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
goto out_disable;
}
+ if (fpu_kernel_cfg.max_features & XFEATURE_MASK_APX &&
+ fpu_kernel_cfg.max_features & (XFEATURE_MASK_BNDREGS | XFEATURE_MASK_BNDCSR)) {
+ /*
+ * This is a problematic CPU configuration where two
+ * conflicting state components are both enumerated.
+ */
+ pr_err("x86/fpu: Both APX/MPX present in the CPU's xstate features: 0x%llx, disabling XSAVE.\n",
+ fpu_kernel_cfg.max_features);
+ goto out_disable;
+ }
+
fpu_kernel_cfg.independent_features = fpu_kernel_cfg.max_features &
XFEATURE_MASK_INDEPENDENT;
--
2.45.2
^ permalink raw reply related [flat|nested] 112+ messages in thread
* [PATCH 04/10] x86/fpu/apx: Enable APX state support
2025-04-16 2:16 ` [PATCH 00/10] x86/fpu: APX enablement and assorted FPU code improvements Chang S. Bae
` (2 preceding siblings ...)
2025-04-16 2:16 ` [PATCH 03/10] x86/fpu/apx: Disallow conflicting MPX presence Chang S. Bae
@ 2025-04-16 2:16 ` Chang S. Bae
2025-04-16 8:17 ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
2025-04-16 2:16 ` [PATCH 05/10] selftests/x86/apx: Add APX test Chang S. Bae
` (6 subsequent siblings)
10 siblings, 1 reply; 112+ messages in thread
From: Chang S. Bae @ 2025-04-16 2:16 UTC (permalink / raw)
To: mingo; +Cc: linux-kernel, x86, tglx, bp, dave.hansen, chang.seok.bae,
Sohil Mehta
With securing APX against conflicting MPX, it is now ready to be enabled.
Include APX in the enabled xfeature set.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
---
Changes from the last posting:
https://lore.kernel.org/lkml/20250320234301.8342-9-chang.seok.bae@intel.com/
* Include review tag
---
arch/x86/include/asm/fpu/xstate.h | 3 ++-
arch/x86/kernel/fpu/xstate.c | 3 ++-
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index 7f39fe7980c5..b308a76afbb7 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -32,7 +32,8 @@
XFEATURE_MASK_PKRU | \
XFEATURE_MASK_BNDREGS | \
XFEATURE_MASK_BNDCSR | \
- XFEATURE_MASK_XTILE)
+ XFEATURE_MASK_XTILE | \
+ XFEATURE_MASK_APX)
/*
* Features which are restored when returning to user space.
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 14f5c1bb2080..2ac1fc182273 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -371,7 +371,8 @@ static __init void os_xrstor_booting(struct xregs_state *xstate)
XFEATURE_MASK_BNDCSR | \
XFEATURE_MASK_PASID | \
XFEATURE_MASK_CET_USER | \
- XFEATURE_MASK_XTILE)
+ XFEATURE_MASK_XTILE | \
+ XFEATURE_MASK_APX)
/*
* setup the xstate image representing the init state
--
2.45.2
^ permalink raw reply related [flat|nested] 112+ messages in thread
* [PATCH 05/10] selftests/x86/apx: Add APX test
2025-04-16 2:16 ` [PATCH 00/10] x86/fpu: APX enablement and assorted FPU code improvements Chang S. Bae
` (3 preceding siblings ...)
2025-04-16 2:16 ` [PATCH 04/10] x86/fpu/apx: Enable APX state support Chang S. Bae
@ 2025-04-16 2:16 ` Chang S. Bae
2025-04-16 8:17 ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
2025-04-16 2:16 ` [PATCH 06/10] x86/fpu: Log XSAVE disablement consistently Chang S. Bae
` (5 subsequent siblings)
10 siblings, 1 reply; 112+ messages in thread
From: Chang S. Bae @ 2025-04-16 2:16 UTC (permalink / raw)
To: mingo; +Cc: linux-kernel, x86, tglx, bp, dave.hansen, chang.seok.bae,
Sohil Mehta
The extended general-purpose registers for APX may contain random data,
which is currently assumed by the xstate testing framework. This allows
the testing of the new userspace feature using the common test code.
Invoke the test entry function from apx.c after enumerating the
state component and adding it to the support list
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
---
Changes from the last posting:
https://lore.kernel.org/lkml/20250320234301.8342-10-chang.seok.bae@intel.com/
* Add review tag
Some might view this standalone test as potentially converging with other
tests. I encountered similar situation when posting the selftest rework:
https://lore.kernel.org/lkml/20250226010731.2456-10-chang.seok.bae@intel.com/
Maybe it's worthwhile to clarify the consideration again here:
Alternatively, this invocation could be placed directly in
xstate.c::main(). However, the current test file naming convention,
which clearly specifies the tested area, seems reasonable. Adding apx.c
considerably aligns with that convention.
---
tools/testing/selftests/x86/Makefile | 3 ++-
tools/testing/selftests/x86/apx.c | 10 ++++++++++
tools/testing/selftests/x86/xstate.c | 3 ++-
tools/testing/selftests/x86/xstate.h | 2 ++
4 files changed, 16 insertions(+), 2 deletions(-)
create mode 100644 tools/testing/selftests/x86/apx.c
diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests/x86/Makefile
index 28422c32cc8f..f703fcfe9f7c 100644
--- a/tools/testing/selftests/x86/Makefile
+++ b/tools/testing/selftests/x86/Makefile
@@ -19,7 +19,7 @@ TARGETS_C_32BIT_ONLY := entry_from_vm86 test_syscall_vdso unwind_vdso \
test_FCMOV test_FCOMI test_FISTTP \
vdso_restorer
TARGETS_C_64BIT_ONLY := fsgsbase sysret_rip syscall_numbering \
- corrupt_xstate_header amx lam test_shadow_stack avx
+ corrupt_xstate_header amx lam test_shadow_stack avx apx
# Some selftests require 32bit support enabled also on 64bit systems
TARGETS_C_32BIT_NEEDED := ldt_gdt ptrace_syscall
@@ -136,3 +136,4 @@ $(OUTPUT)/nx_stack_64: CFLAGS += -Wl,-z,noexecstack
$(OUTPUT)/avx_64: CFLAGS += -mno-avx -mno-avx512f
$(OUTPUT)/amx_64: EXTRA_FILES += xstate.c
$(OUTPUT)/avx_64: EXTRA_FILES += xstate.c
+$(OUTPUT)/apx_64: EXTRA_FILES += xstate.c
diff --git a/tools/testing/selftests/x86/apx.c b/tools/testing/selftests/x86/apx.c
new file mode 100644
index 000000000000..d9c8d41b8c5a
--- /dev/null
+++ b/tools/testing/selftests/x86/apx.c
@@ -0,0 +1,10 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define _GNU_SOURCE
+
+#include "xstate.h"
+
+int main(void)
+{
+ test_xstate(XFEATURE_APX);
+}
diff --git a/tools/testing/selftests/x86/xstate.c b/tools/testing/selftests/x86/xstate.c
index 23c1d6c964ea..97fe4bd8bc77 100644
--- a/tools/testing/selftests/x86/xstate.c
+++ b/tools/testing/selftests/x86/xstate.c
@@ -31,7 +31,8 @@
(1 << XFEATURE_OPMASK) | \
(1 << XFEATURE_ZMM_Hi256) | \
(1 << XFEATURE_Hi16_ZMM) | \
- (1 << XFEATURE_XTILEDATA))
+ (1 << XFEATURE_XTILEDATA) | \
+ (1 << XFEATURE_APX))
static inline uint64_t xgetbv(uint32_t index)
{
diff --git a/tools/testing/selftests/x86/xstate.h b/tools/testing/selftests/x86/xstate.h
index 42af36ec852f..e91e3092b5d2 100644
--- a/tools/testing/selftests/x86/xstate.h
+++ b/tools/testing/selftests/x86/xstate.h
@@ -33,6 +33,7 @@ enum xfeature {
XFEATURE_RSRVD_COMP_16,
XFEATURE_XTILECFG,
XFEATURE_XTILEDATA,
+ XFEATURE_APX,
XFEATURE_MAX,
};
@@ -59,6 +60,7 @@ static const char *xfeature_names[] =
"unknown xstate feature",
"AMX Tile config",
"AMX Tile data",
+ "APX registers",
"unknown xstate feature",
};
--
2.45.2
^ permalink raw reply related [flat|nested] 112+ messages in thread
* [PATCH 06/10] x86/fpu: Log XSAVE disablement consistently
2025-04-16 2:16 ` [PATCH 00/10] x86/fpu: APX enablement and assorted FPU code improvements Chang S. Bae
` (4 preceding siblings ...)
2025-04-16 2:16 ` [PATCH 05/10] selftests/x86/apx: Add APX test Chang S. Bae
@ 2025-04-16 2:16 ` Chang S. Bae
2025-04-16 8:17 ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
2025-04-16 16:56 ` [PATCH 06/10] " Sohil Mehta
2025-04-16 2:16 ` [PATCH 07/10] x86/fpu: Refactor xfeature bitmask update code for sigframe XSAVE Chang S. Bae
` (4 subsequent siblings)
10 siblings, 2 replies; 112+ messages in thread
From: Chang S. Bae @ 2025-04-16 2:16 UTC (permalink / raw)
To: mingo; +Cc: linux-kernel, x86, tglx, bp, dave.hansen, chang.seok.bae
Not all paths that lead to fpu__init_disable_system_xstate() currently
emit a message indicating that XSAVE has been disabled. Move the print
statement into the function to ensure the message in all cases.
Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Link: https://lore.kernel.org/lkml/d6d19e39-2749-4d45-aeab-a209a0ecba17@intel.com
---
New patch for following up patch 3.
---
arch/x86/kernel/fpu/xstate.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 2ac1fc182273..8b14c9d3a1df 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -751,6 +751,8 @@ static int __init init_xstate_size(void)
*/
static void __init fpu__init_disable_system_xstate(unsigned int legacy_size)
{
+ pr_info("x86/fpu: XSAVE disabled\n");
+
fpu_kernel_cfg.max_features = 0;
cr4_clear_bits(X86_CR4_OSXSAVE);
setup_clear_cpu_cap(X86_FEATURE_XSAVE);
@@ -821,7 +823,7 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
* This is a problematic CPU configuration where two
* conflicting state components are both enumerated.
*/
- pr_err("x86/fpu: Both APX/MPX present in the CPU's xstate features: 0x%llx, disabling XSAVE.\n",
+ pr_err("x86/fpu: Both APX/MPX present in the CPU's xstate features: 0x%llx.\n",
fpu_kernel_cfg.max_features);
goto out_disable;
}
@@ -900,7 +902,7 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
init_fpstate.xfeatures = fpu_kernel_cfg.default_features;
if (init_fpstate.size > sizeof(init_fpstate.regs)) {
- pr_warn("x86/fpu: init_fpstate buffer too small (%zu < %d), disabling XSAVE\n",
+ pr_warn("x86/fpu: init_fpstate buffer too small (%zu < %d)\n",
sizeof(init_fpstate.regs), init_fpstate.size);
goto out_disable;
}
@@ -912,7 +914,7 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
* xfeatures mask.
*/
if (xfeatures != fpu_kernel_cfg.max_features) {
- pr_err("x86/fpu: xfeatures modified from 0x%016llx to 0x%016llx during init, disabling XSAVE\n",
+ pr_err("x86/fpu: xfeatures modified from 0x%016llx to 0x%016llx during init\n",
xfeatures, fpu_kernel_cfg.max_features);
goto out_disable;
}
--
2.45.2
^ permalink raw reply related [flat|nested] 112+ messages in thread
* [PATCH 07/10] x86/fpu: Refactor xfeature bitmask update code for sigframe XSAVE
2025-04-16 2:16 ` [PATCH 00/10] x86/fpu: APX enablement and assorted FPU code improvements Chang S. Bae
` (5 preceding siblings ...)
2025-04-16 2:16 ` [PATCH 06/10] x86/fpu: Log XSAVE disablement consistently Chang S. Bae
@ 2025-04-16 2:16 ` Chang S. Bae
2025-04-16 8:05 ` Ingo Molnar
2025-04-16 8:17 ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
2025-04-16 2:16 ` [PATCH 08/10] x86/pkeys: Simplify PKRU update in signal frame Chang S. Bae
` (3 subsequent siblings)
10 siblings, 2 replies; 112+ messages in thread
From: Chang S. Bae @ 2025-04-16 2:16 UTC (permalink / raw)
To: mingo; +Cc: linux-kernel, x86, tglx, bp, dave.hansen, chang.seok.bae
Currently, saving register states in the signal frame, the legacy feature
bits are always set in xregs_state->header->xfeatures. This code sequence
can be generalized for reuse in similar cases.
Refactor the logic to ensure a consistent approach across similar usages.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
Changes from the last posting:
https://lore.kernel.org/lkml/20250214010607.7067-2-chang.seok.bae@intel.com/
* No change
This patch and the next were previously posted together. I thought this
refactoring is a meaningful step toward decoupling PKRU from an
unnecessary dependency on XGETBV(1).
---
arch/x86/kernel/fpu/signal.c | 11 +----------
arch/x86/kernel/fpu/xstate.h | 12 ++++++++++++
2 files changed, 13 insertions(+), 10 deletions(-)
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index b8b4fa9c2d04..c3ec2512f2bb 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -114,7 +114,6 @@ static inline bool save_xstate_epilog(void __user *buf, int ia32_frame,
{
struct xregs_state __user *x = buf;
struct _fpx_sw_bytes sw_bytes = {};
- u32 xfeatures;
int err;
/* Setup the bytes not touched by the [f]xsave and reserved for SW. */
@@ -127,12 +126,6 @@ static inline bool save_xstate_epilog(void __user *buf, int ia32_frame,
err |= __put_user(FP_XSTATE_MAGIC2,
(__u32 __user *)(buf + fpstate->user_size));
- /*
- * Read the xfeatures which we copied (directly from the cpu or
- * from the state in task struct) to the user buffers.
- */
- err |= __get_user(xfeatures, (__u32 __user *)&x->header.xfeatures);
-
/*
* For legacy compatible, we always set FP/SSE bits in the bit
* vector while saving the state to the user context. This will
@@ -144,9 +137,7 @@ static inline bool save_xstate_epilog(void __user *buf, int ia32_frame,
* header as well as change any contents in the memory layout.
* xrestore as part of sigreturn will capture all the changes.
*/
- xfeatures |= XFEATURE_MASK_FPSSE;
-
- err |= __put_user(xfeatures, (__u32 __user *)&x->header.xfeatures);
+ err |= set_xfeature_in_sigframe(x, XFEATURE_MASK_FPSSE);
return !err;
}
diff --git a/arch/x86/kernel/fpu/xstate.h b/arch/x86/kernel/fpu/xstate.h
index 9a3a8ccf13bf..aadf02aed071 100644
--- a/arch/x86/kernel/fpu/xstate.h
+++ b/arch/x86/kernel/fpu/xstate.h
@@ -69,6 +69,18 @@ static inline u64 xfeatures_mask_independent(void)
return fpu_kernel_cfg.independent_features;
}
+static inline int set_xfeature_in_sigframe(struct xregs_state __user *xbuf, u64 mask)
+{
+ u64 xfeatures;
+ int err;
+
+ /* Read the xfeatures value already saved in the user buffer */
+ err = __get_user(xfeatures, &xbuf->header.xfeatures);
+ xfeatures |= mask;
+ err |= __put_user(xfeatures, &xbuf->header.xfeatures);
+ return err;
+}
+
/*
* Update the value of PKRU register that was already pushed onto the signal frame.
*/
--
2.45.2
^ permalink raw reply related [flat|nested] 112+ messages in thread
* [PATCH 08/10] x86/pkeys: Simplify PKRU update in signal frame
2025-04-16 2:16 ` [PATCH 00/10] x86/fpu: APX enablement and assorted FPU code improvements Chang S. Bae
` (6 preceding siblings ...)
2025-04-16 2:16 ` [PATCH 07/10] x86/fpu: Refactor xfeature bitmask update code for sigframe XSAVE Chang S. Bae
@ 2025-04-16 2:16 ` Chang S. Bae
2025-04-16 8:17 ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
2025-04-16 2:16 ` [PATCH 09/10] x86/fpu: Remove export of mxcsr_feature_mask Chang S. Bae
` (2 subsequent siblings)
10 siblings, 1 reply; 112+ messages in thread
From: Chang S. Bae @ 2025-04-16 2:16 UTC (permalink / raw)
To: mingo
Cc: linux-kernel, x86, tglx, bp, dave.hansen, chang.seok.bae,
Aruna Ramakrishna, Tony W Wang-oc
The signal delivery logic was modified to always set the PKRU bit in
xregs_state->header->xfeatures by this commit:
ae6012d72fa6 ("x86/pkeys: Ensure updated PKRU value is XRSTOR'd")
However, the change derives the bitmask value using XGETBV(1), rather
than simply updating the buffer that already holds the value. Thus, this
approach induces an unnecessary dependency on XGETBV1 for PKRU handling.
Eliminate the dependency by using the established helper function.
Subsequently, remove the now-unused 'mask' argument.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
Cc: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
---
Changes from the last posting:
https://lore.kernel.org/lkml/20250214010607.7067-3-chang.seok.bae@intel.com/
* Massage the changelog
Additional Context:
Previously, the concern was raised about environments where XGETBV1 is
unavailable — such as in some virtual machines:
https://lore.kernel.org/lkml/20250102075419.2559-1-TonyWWang-oc@zhaoxin.com
That patch tried to sidestep the problem by skipping PKRU updates
entirely when XGETBV1 is inaccessible. However, this assumed the
dependency was necessary, which isn’t the case.
---
arch/x86/kernel/fpu/xstate.h | 9 +++------
1 file changed, 3 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kernel/fpu/xstate.h b/arch/x86/kernel/fpu/xstate.h
index aadf02aed071..a6d987c16293 100644
--- a/arch/x86/kernel/fpu/xstate.h
+++ b/arch/x86/kernel/fpu/xstate.h
@@ -84,18 +84,15 @@ static inline int set_xfeature_in_sigframe(struct xregs_state __user *xbuf, u64
/*
* Update the value of PKRU register that was already pushed onto the signal frame.
*/
-static inline int update_pkru_in_sigframe(struct xregs_state __user *buf, u64 mask, u32 pkru)
+static inline int update_pkru_in_sigframe(struct xregs_state __user *buf, u32 pkru)
{
- u64 xstate_bv;
int err;
if (unlikely(!cpu_feature_enabled(X86_FEATURE_OSPKE)))
return 0;
/* Mark PKRU as in-use so that it is restored correctly. */
- xstate_bv = (mask & xfeatures_in_use()) | XFEATURE_MASK_PKRU;
-
- err = __put_user(xstate_bv, &buf->header.xfeatures);
+ err = set_xfeature_in_sigframe(buf, XFEATURE_MASK_PKRU);
if (err)
return err;
@@ -319,7 +316,7 @@ static inline int xsave_to_user_sigframe(struct xregs_state __user *buf, u32 pkr
clac();
if (!err)
- err = update_pkru_in_sigframe(buf, mask, pkru);
+ err = update_pkru_in_sigframe(buf, pkru);
return err;
}
--
2.45.2
^ permalink raw reply related [flat|nested] 112+ messages in thread
* [PATCH 09/10] x86/fpu: Remove export of mxcsr_feature_mask
2025-04-16 2:16 ` [PATCH 00/10] x86/fpu: APX enablement and assorted FPU code improvements Chang S. Bae
` (7 preceding siblings ...)
2025-04-16 2:16 ` [PATCH 08/10] x86/pkeys: Simplify PKRU update in signal frame Chang S. Bae
@ 2025-04-16 2:16 ` Chang S. Bae
2025-04-16 8:17 ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
2025-04-16 2:17 ` [PATCH 10/10] x86/fpu: Rename fpu_reset_fpregs() to fpu_reset_fpstate_regs() Chang S. Bae
2025-04-16 8:07 ` [PATCH 00/10] x86/fpu: APX enablement and assorted FPU code improvements Ingo Molnar
10 siblings, 1 reply; 112+ messages in thread
From: Chang S. Bae @ 2025-04-16 2:16 UTC (permalink / raw)
To: mingo
Cc: linux-kernel, x86, tglx, bp, dave.hansen, chang.seok.bae,
Nikolay Borisov
The variable was previously referenced in KVM code but the last usage was
removed by:
ea4d6938d4c0 ("x86/fpu: Replace KVMs home brewed FPU copy from user")
Remove its export symbol.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
---
Changes from the last posting:
https://lore.kernel.org/lkml/d143cc4c-8f8e-48e5-87f1-dded3272433a@suse.com
* Note the commit that removed its usage (Nikolay)
* Include review tag
Apologies -- given the review tag, I should have followed up this
earlier.
---
arch/x86/kernel/fpu/init.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 16b6611634c3..2d9b5e677559 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -100,7 +100,6 @@ static void __init fpu__init_system_early_generic(void)
* Boot time FPU feature detection code:
*/
unsigned int mxcsr_feature_mask __ro_after_init = 0xffffffffu;
-EXPORT_SYMBOL_GPL(mxcsr_feature_mask);
static void __init fpu__init_system_mxcsr(void)
{
--
2.45.2
^ permalink raw reply related [flat|nested] 112+ messages in thread
* [PATCH 10/10] x86/fpu: Rename fpu_reset_fpregs() to fpu_reset_fpstate_regs()
2025-04-16 2:16 ` [PATCH 00/10] x86/fpu: APX enablement and assorted FPU code improvements Chang S. Bae
` (8 preceding siblings ...)
2025-04-16 2:16 ` [PATCH 09/10] x86/fpu: Remove export of mxcsr_feature_mask Chang S. Bae
@ 2025-04-16 2:17 ` Chang S. Bae
2025-04-16 8:17 ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
2025-04-16 8:07 ` [PATCH 00/10] x86/fpu: APX enablement and assorted FPU code improvements Ingo Molnar
10 siblings, 1 reply; 112+ messages in thread
From: Chang S. Bae @ 2025-04-16 2:17 UTC (permalink / raw)
To: mingo; +Cc: linux-kernel, x86, tglx, bp, dave.hansen, chang.seok.bae
The original function name came from an overly compressed form of
'fpstate_regs' by commit:
e61d6310a0f8 ("x86/fpu: Reset permission and fpstate on exec()")
However, the term 'fpregs' typically refers to physical FPU registers. In
contrast, this function copies the init values to fpu->fpstate->regs, not
hardware registers.
Rename the function to better reflect what it actually does.
No functional change.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
Changes from the last posting:
https://lore.kernel.org/lkml/20240530192739.172566-2-chang.seok.bae@intel.com
* Refine the changelog
* Add a note referencing the original naming decision.
This patch was originally submitted as part of the In-Field Scan driver
series. Although that series is now dropped that I believe, this rename
still serves as a useful cleanup to correct a naming choice that I
previously made :(.
---
arch/x86/kernel/fpu/core.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 3a19877a314e..8d674435f173 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -733,7 +733,7 @@ static inline void restore_fpregs_from_init_fpstate(u64 features_mask)
/*
* Reset current->fpu memory state to the init values.
*/
-static void fpu_reset_fpregs(void)
+static void fpu_reset_fpstate_regs(void)
{
struct fpu *fpu = x86_task_fpu(current);
@@ -768,7 +768,7 @@ void fpu__clear_user_states(struct fpu *fpu)
fpregs_lock();
if (!cpu_feature_enabled(X86_FEATURE_FPU)) {
- fpu_reset_fpregs();
+ fpu_reset_fpstate_regs();
fpregs_unlock();
return;
}
@@ -798,7 +798,7 @@ void fpu__clear_user_states(struct fpu *fpu)
void fpu_flush_thread(void)
{
fpstate_reset(x86_task_fpu(current));
- fpu_reset_fpregs();
+ fpu_reset_fpstate_regs();
}
/*
* Load FPU context before returning to userspace.
--
2.45.2
^ permalink raw reply related [flat|nested] 112+ messages in thread
* Re: [PATCH 07/10] x86/fpu: Refactor xfeature bitmask update code for sigframe XSAVE
2025-04-16 2:16 ` [PATCH 07/10] x86/fpu: Refactor xfeature bitmask update code for sigframe XSAVE Chang S. Bae
@ 2025-04-16 8:05 ` Ingo Molnar
2025-04-16 16:55 ` Chang S. Bae
2025-04-16 8:17 ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
1 sibling, 1 reply; 112+ messages in thread
From: Ingo Molnar @ 2025-04-16 8:05 UTC (permalink / raw)
To: Chang S. Bae; +Cc: mingo, linux-kernel, x86, tglx, bp, dave.hansen
* Chang S. Bae <chang.seok.bae@intel.com> wrote:
> +static inline int set_xfeature_in_sigframe(struct xregs_state __user *xbuf, u64 mask)
> +{
> + u64 xfeatures;
> + int err;
> +
> + /* Read the xfeatures value already saved in the user buffer */
> + err = __get_user(xfeatures, &xbuf->header.xfeatures);
> + xfeatures |= mask;
> + err |= __put_user(xfeatures, &xbuf->header.xfeatures);
> + return err;
> +}
For future reference, please put an extra newline before 'return'
statements, so that the code looks more 'balanced':
> +{
> + u64 xfeatures;
> + int err;
> +
> + /* Read the xfeatures value already saved in the user buffer */
> + err = __get_user(xfeatures, &xbuf->header.xfeatures);
> + xfeatures |= mask;
> + err |= __put_user(xfeatures, &xbuf->header.xfeatures);
> +
> + return err;
> +}
I've done that for this patch, so no need to resend it.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [PATCH 00/10] x86/fpu: APX enablement and assorted FPU code improvements
2025-04-16 2:16 ` [PATCH 00/10] x86/fpu: APX enablement and assorted FPU code improvements Chang S. Bae
` (9 preceding siblings ...)
2025-04-16 2:17 ` [PATCH 10/10] x86/fpu: Rename fpu_reset_fpregs() to fpu_reset_fpstate_regs() Chang S. Bae
@ 2025-04-16 8:07 ` Ingo Molnar
2025-04-16 16:56 ` Chang S. Bae
10 siblings, 1 reply; 112+ messages in thread
From: Ingo Molnar @ 2025-04-16 8:07 UTC (permalink / raw)
To: Chang S. Bae; +Cc: mingo, linux-kernel, x86, tglx, bp, dave.hansen
* Chang S. Bae <chang.seok.bae@intel.com> wrote:
> On 4/12/2025 1:43 AM, Ingo Molnar wrote:
> >
> > Chang, mind sending a series of the latest version of all the pending
> > APX patches you have at the moment (and any other pending FPU patches
> > you may have), with Reviewed-by tags rolled in, etc.
>
> Hi Ingo,
>
> Here’s the updated patch set following up on the previous APX series [1],
> along with a collection of additional FPU-related cleanups and
> improvements that were previously posted or discussed.
>
> The series is organized into two parts:
>
> 1. APX Enabling (PATCH 1–5)
>
> These patches complete the APX bring-up. After laying the groundwork,
> this portion finalizes the enablement:
>
> * Patches 1, 2, and 4 are typical xfeature plumbing.
>
> * Patch 3 handles MPX conflict -- unexpected hardware issue
>
> * Patch 5 adds a test case.
>
> 2. Miscellaneous FPU Code Improvements (PATCH 6–10)
>
> This batch includes various standalone improvements:
>
> * Patch 6: Centralizes the XSAVE disablement message
>
> * Patches 7-8: Simplifies PKRU update in XSTATE_BV on sigframe
>
> * Patch 9: Removes the unused mxcsr_feature_mask export
>
> * Patch 10: Renames fpu_reset_fpregs() for clarity
>
> Each patch includes context and links to earlier discussions or
> revisions.
> 14 files changed, 69 insertions(+), 26 deletions(-)
Applied to tip:x86/fpu, thanks!
Note that I've merged the currently pending tip:x86/cpu bits into
tip:x86/fpu before applying these patches, to resolve a conflict with
<asm/cpufeatures.h>, and re-formatted the new X86_FEATURE_APX line to
have the canonical format.
Ingo
^ permalink raw reply [flat|nested] 112+ messages in thread
* [tip: x86/fpu] x86/fpu: Rename fpu_reset_fpregs() to fpu_reset_fpstate_regs()
2025-04-16 2:17 ` [PATCH 10/10] x86/fpu: Rename fpu_reset_fpregs() to fpu_reset_fpstate_regs() Chang S. Bae
@ 2025-04-16 8:17 ` tip-bot2 for Chang S. Bae
0 siblings, 0 replies; 112+ messages in thread
From: tip-bot2 for Chang S. Bae @ 2025-04-16 8:17 UTC (permalink / raw)
To: linux-tip-commits
Cc: Chang S. Bae, Ingo Molnar, Andy Lutomirski, H. Peter Anvin,
Linus Torvalds, Oleg Nesterov, x86, linux-kernel
The following commit has been merged into the x86/fpu branch of tip:
Commit-ID: de8304c319bc020ef79d109909ad40e944d82c82
Gitweb: https://git.kernel.org/tip/de8304c319bc020ef79d109909ad40e944d82c82
Author: Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate: Tue, 15 Apr 2025 19:17:00 -07:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Wed, 16 Apr 2025 10:01:03 +02:00
x86/fpu: Rename fpu_reset_fpregs() to fpu_reset_fpstate_regs()
The original function name came from an overly compressed form of
'fpstate_regs' by commit:
e61d6310a0f8 ("x86/fpu: Reset permission and fpstate on exec()")
However, the term 'fpregs' typically refers to physical FPU registers. In
contrast, this function copies the init values to fpu->fpstate->regs, not
hardware registers.
Rename the function to better reflect what it actually does.
No functional change.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/r/20250416021720.12305-11-chang.seok.bae@intel.com
---
arch/x86/kernel/fpu/core.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 3a19877..8d67443 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -733,7 +733,7 @@ static inline void restore_fpregs_from_init_fpstate(u64 features_mask)
/*
* Reset current->fpu memory state to the init values.
*/
-static void fpu_reset_fpregs(void)
+static void fpu_reset_fpstate_regs(void)
{
struct fpu *fpu = x86_task_fpu(current);
@@ -768,7 +768,7 @@ void fpu__clear_user_states(struct fpu *fpu)
fpregs_lock();
if (!cpu_feature_enabled(X86_FEATURE_FPU)) {
- fpu_reset_fpregs();
+ fpu_reset_fpstate_regs();
fpregs_unlock();
return;
}
@@ -798,7 +798,7 @@ void fpu__clear_user_states(struct fpu *fpu)
void fpu_flush_thread(void)
{
fpstate_reset(x86_task_fpu(current));
- fpu_reset_fpregs();
+ fpu_reset_fpstate_regs();
}
/*
* Load FPU context before returning to userspace.
^ permalink raw reply related [flat|nested] 112+ messages in thread
* [tip: x86/fpu] x86/fpu: Remove export of mxcsr_feature_mask
2025-04-16 2:16 ` [PATCH 09/10] x86/fpu: Remove export of mxcsr_feature_mask Chang S. Bae
@ 2025-04-16 8:17 ` tip-bot2 for Chang S. Bae
0 siblings, 0 replies; 112+ messages in thread
From: tip-bot2 for Chang S. Bae @ 2025-04-16 8:17 UTC (permalink / raw)
To: linux-tip-commits
Cc: Nikolay Borisov, Chang S. Bae, Ingo Molnar, Andy Lutomirski,
H. Peter Anvin, Linus Torvalds, Oleg Nesterov, x86, linux-kernel
The following commit has been merged into the x86/fpu branch of tip:
Commit-ID: 70fe4a0266ef156f3a49071da0d9ea6af0f49c44
Gitweb: https://git.kernel.org/tip/70fe4a0266ef156f3a49071da0d9ea6af0f49c44
Author: Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate: Tue, 15 Apr 2025 19:16:59 -07:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Wed, 16 Apr 2025 10:01:03 +02:00
x86/fpu: Remove export of mxcsr_feature_mask
The variable was previously referenced in KVM code but the last usage was
removed by:
ea4d6938d4c0 ("x86/fpu: Replace KVMs home brewed FPU copy from user")
Remove its export symbol.
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/r/20250416021720.12305-10-chang.seok.bae@intel.com
---
arch/x86/kernel/fpu/init.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 16b6611..2d9b5e6 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -100,7 +100,6 @@ static void __init fpu__init_system_early_generic(void)
* Boot time FPU feature detection code:
*/
unsigned int mxcsr_feature_mask __ro_after_init = 0xffffffffu;
-EXPORT_SYMBOL_GPL(mxcsr_feature_mask);
static void __init fpu__init_system_mxcsr(void)
{
^ permalink raw reply related [flat|nested] 112+ messages in thread
* [tip: x86/fpu] x86/pkeys: Simplify PKRU update in signal frame
2025-04-16 2:16 ` [PATCH 08/10] x86/pkeys: Simplify PKRU update in signal frame Chang S. Bae
@ 2025-04-16 8:17 ` tip-bot2 for Chang S. Bae
0 siblings, 0 replies; 112+ messages in thread
From: tip-bot2 for Chang S. Bae @ 2025-04-16 8:17 UTC (permalink / raw)
To: linux-tip-commits
Cc: Chang S. Bae, Ingo Molnar, Andy Lutomirski, Aruna Ramakrishna,
H. Peter Anvin, Linus Torvalds, Oleg Nesterov, Tony W Wang-oc,
Dave Hansen, x86, linux-kernel
The following commit has been merged into the x86/fpu branch of tip:
Commit-ID: d1e420772cd1eb0afe5858619c73ce36f3e781a1
Gitweb: https://git.kernel.org/tip/d1e420772cd1eb0afe5858619c73ce36f3e781a1
Author: Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate: Tue, 15 Apr 2025 19:16:58 -07:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Wed, 16 Apr 2025 10:01:03 +02:00
x86/pkeys: Simplify PKRU update in signal frame
The signal delivery logic was modified to always set the PKRU bit in
xregs_state->header->xfeatures by this commit:
ae6012d72fa6 ("x86/pkeys: Ensure updated PKRU value is XRSTOR'd")
However, the change derives the bitmask value using XGETBV(1), rather
than simply updating the buffer that already holds the value. Thus, this
approach induces an unnecessary dependency on XGETBV1 for PKRU handling.
Eliminate the dependency by using the established helper function.
Subsequently, remove the now-unused 'mask' argument.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/r/20250416021720.12305-9-chang.seok.bae@intel.com
---
arch/x86/kernel/fpu/xstate.h | 9 +++------
1 file changed, 3 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kernel/fpu/xstate.h b/arch/x86/kernel/fpu/xstate.h
index 4231e44..a0256ef 100644
--- a/arch/x86/kernel/fpu/xstate.h
+++ b/arch/x86/kernel/fpu/xstate.h
@@ -85,18 +85,15 @@ static inline int set_xfeature_in_sigframe(struct xregs_state __user *xbuf, u64
/*
* Update the value of PKRU register that was already pushed onto the signal frame.
*/
-static inline int update_pkru_in_sigframe(struct xregs_state __user *buf, u64 mask, u32 pkru)
+static inline int update_pkru_in_sigframe(struct xregs_state __user *buf, u32 pkru)
{
- u64 xstate_bv;
int err;
if (unlikely(!cpu_feature_enabled(X86_FEATURE_OSPKE)))
return 0;
/* Mark PKRU as in-use so that it is restored correctly. */
- xstate_bv = (mask & xfeatures_in_use()) | XFEATURE_MASK_PKRU;
-
- err = __put_user(xstate_bv, &buf->header.xfeatures);
+ err = set_xfeature_in_sigframe(buf, XFEATURE_MASK_PKRU);
if (err)
return err;
@@ -320,7 +317,7 @@ static inline int xsave_to_user_sigframe(struct xregs_state __user *buf, u32 pkr
clac();
if (!err)
- err = update_pkru_in_sigframe(buf, mask, pkru);
+ err = update_pkru_in_sigframe(buf, pkru);
return err;
}
^ permalink raw reply related [flat|nested] 112+ messages in thread
* [tip: x86/fpu] x86/fpu: Refactor xfeature bitmask update code for sigframe XSAVE
2025-04-16 2:16 ` [PATCH 07/10] x86/fpu: Refactor xfeature bitmask update code for sigframe XSAVE Chang S. Bae
2025-04-16 8:05 ` Ingo Molnar
@ 2025-04-16 8:17 ` tip-bot2 for Chang S. Bae
1 sibling, 0 replies; 112+ messages in thread
From: tip-bot2 for Chang S. Bae @ 2025-04-16 8:17 UTC (permalink / raw)
To: linux-tip-commits
Cc: Chang S. Bae, Ingo Molnar, Andy Lutomirski, H. Peter Anvin,
Linus Torvalds, Oleg Nesterov, x86, linux-kernel
The following commit has been merged into the x86/fpu branch of tip:
Commit-ID: 64e54461ab6e8524a8de4e63b7d1a3e4481b5cf3
Gitweb: https://git.kernel.org/tip/64e54461ab6e8524a8de4e63b7d1a3e4481b5cf3
Author: Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate: Tue, 15 Apr 2025 19:16:57 -07:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Wed, 16 Apr 2025 10:01:00 +02:00
x86/fpu: Refactor xfeature bitmask update code for sigframe XSAVE
Currently, saving register states in the signal frame, the legacy feature
bits are always set in xregs_state->header->xfeatures. This code sequence
can be generalized for reuse in similar cases.
Refactor the logic to ensure a consistent approach across similar usages.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/r/20250416021720.12305-8-chang.seok.bae@intel.com
---
arch/x86/kernel/fpu/signal.c | 11 +----------
arch/x86/kernel/fpu/xstate.h | 13 +++++++++++++
2 files changed, 14 insertions(+), 10 deletions(-)
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index b8b4fa9..c3ec251 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -114,7 +114,6 @@ static inline bool save_xstate_epilog(void __user *buf, int ia32_frame,
{
struct xregs_state __user *x = buf;
struct _fpx_sw_bytes sw_bytes = {};
- u32 xfeatures;
int err;
/* Setup the bytes not touched by the [f]xsave and reserved for SW. */
@@ -128,12 +127,6 @@ static inline bool save_xstate_epilog(void __user *buf, int ia32_frame,
(__u32 __user *)(buf + fpstate->user_size));
/*
- * Read the xfeatures which we copied (directly from the cpu or
- * from the state in task struct) to the user buffers.
- */
- err |= __get_user(xfeatures, (__u32 __user *)&x->header.xfeatures);
-
- /*
* For legacy compatible, we always set FP/SSE bits in the bit
* vector while saving the state to the user context. This will
* enable us capturing any changes(during sigreturn) to
@@ -144,9 +137,7 @@ static inline bool save_xstate_epilog(void __user *buf, int ia32_frame,
* header as well as change any contents in the memory layout.
* xrestore as part of sigreturn will capture all the changes.
*/
- xfeatures |= XFEATURE_MASK_FPSSE;
-
- err |= __put_user(xfeatures, (__u32 __user *)&x->header.xfeatures);
+ err |= set_xfeature_in_sigframe(x, XFEATURE_MASK_FPSSE);
return !err;
}
diff --git a/arch/x86/kernel/fpu/xstate.h b/arch/x86/kernel/fpu/xstate.h
index 9a3a8cc..4231e44 100644
--- a/arch/x86/kernel/fpu/xstate.h
+++ b/arch/x86/kernel/fpu/xstate.h
@@ -69,6 +69,19 @@ static inline u64 xfeatures_mask_independent(void)
return fpu_kernel_cfg.independent_features;
}
+static inline int set_xfeature_in_sigframe(struct xregs_state __user *xbuf, u64 mask)
+{
+ u64 xfeatures;
+ int err;
+
+ /* Read the xfeatures value already saved in the user buffer */
+ err = __get_user(xfeatures, &xbuf->header.xfeatures);
+ xfeatures |= mask;
+ err |= __put_user(xfeatures, &xbuf->header.xfeatures);
+
+ return err;
+}
+
/*
* Update the value of PKRU register that was already pushed onto the signal frame.
*/
^ permalink raw reply related [flat|nested] 112+ messages in thread
* [tip: x86/fpu] x86/fpu: Log XSAVE disablement consistently
2025-04-16 2:16 ` [PATCH 06/10] x86/fpu: Log XSAVE disablement consistently Chang S. Bae
@ 2025-04-16 8:17 ` tip-bot2 for Chang S. Bae
2025-04-16 16:56 ` [PATCH 06/10] " Sohil Mehta
1 sibling, 0 replies; 112+ messages in thread
From: tip-bot2 for Chang S. Bae @ 2025-04-16 8:17 UTC (permalink / raw)
To: linux-tip-commits
Cc: Dave Hansen, Chang S. Bae, Ingo Molnar, Andy Lutomirski,
H. Peter Anvin, Linus Torvalds, x86, linux-kernel
The following commit has been merged into the x86/fpu branch of tip:
Commit-ID: 39cd7fad39ce2ffbeb21939c805ee55f3ec808d4
Gitweb: https://git.kernel.org/tip/39cd7fad39ce2ffbeb21939c805ee55f3ec808d4
Author: Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate: Tue, 15 Apr 2025 19:16:56 -07:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Wed, 16 Apr 2025 09:44:15 +02:00
x86/fpu: Log XSAVE disablement consistently
Not all paths that lead to fpu__init_disable_system_xstate() currently
emit a message indicating that XSAVE has been disabled. Move the print
statement into the function to ensure the message in all cases.
Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250416021720.12305-7-chang.seok.bae@intel.com
---
arch/x86/kernel/fpu/xstate.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 2ac1fc1..8b14c9d 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -751,6 +751,8 @@ static int __init init_xstate_size(void)
*/
static void __init fpu__init_disable_system_xstate(unsigned int legacy_size)
{
+ pr_info("x86/fpu: XSAVE disabled\n");
+
fpu_kernel_cfg.max_features = 0;
cr4_clear_bits(X86_CR4_OSXSAVE);
setup_clear_cpu_cap(X86_FEATURE_XSAVE);
@@ -821,7 +823,7 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
* This is a problematic CPU configuration where two
* conflicting state components are both enumerated.
*/
- pr_err("x86/fpu: Both APX/MPX present in the CPU's xstate features: 0x%llx, disabling XSAVE.\n",
+ pr_err("x86/fpu: Both APX/MPX present in the CPU's xstate features: 0x%llx.\n",
fpu_kernel_cfg.max_features);
goto out_disable;
}
@@ -900,7 +902,7 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
init_fpstate.xfeatures = fpu_kernel_cfg.default_features;
if (init_fpstate.size > sizeof(init_fpstate.regs)) {
- pr_warn("x86/fpu: init_fpstate buffer too small (%zu < %d), disabling XSAVE\n",
+ pr_warn("x86/fpu: init_fpstate buffer too small (%zu < %d)\n",
sizeof(init_fpstate.regs), init_fpstate.size);
goto out_disable;
}
@@ -912,7 +914,7 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
* xfeatures mask.
*/
if (xfeatures != fpu_kernel_cfg.max_features) {
- pr_err("x86/fpu: xfeatures modified from 0x%016llx to 0x%016llx during init, disabling XSAVE\n",
+ pr_err("x86/fpu: xfeatures modified from 0x%016llx to 0x%016llx during init\n",
xfeatures, fpu_kernel_cfg.max_features);
goto out_disable;
}
^ permalink raw reply related [flat|nested] 112+ messages in thread
* [tip: x86/fpu] selftests/x86/apx: Add APX test
2025-04-16 2:16 ` [PATCH 05/10] selftests/x86/apx: Add APX test Chang S. Bae
@ 2025-04-16 8:17 ` tip-bot2 for Chang S. Bae
0 siblings, 0 replies; 112+ messages in thread
From: tip-bot2 for Chang S. Bae @ 2025-04-16 8:17 UTC (permalink / raw)
To: linux-tip-commits
Cc: Chang S. Bae, Ingo Molnar, Sohil Mehta, Andy Lutomirski,
H. Peter Anvin, Linus Torvalds, Oleg Nesterov, x86, linux-kernel
The following commit has been merged into the x86/fpu branch of tip:
Commit-ID: ab6f87ddd0c6d3fb114cdf897eb9839cbd429439
Gitweb: https://git.kernel.org/tip/ab6f87ddd0c6d3fb114cdf897eb9839cbd429439
Author: Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate: Tue, 15 Apr 2025 19:16:55 -07:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Wed, 16 Apr 2025 09:44:14 +02:00
selftests/x86/apx: Add APX test
The extended general-purpose registers for APX may contain random data,
which is currently assumed by the xstate testing framework. This allows
the testing of the new userspace feature using the common test code.
Invoke the test entry function from apx.c after enumerating the
state component and adding it to the support list
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/r/20250416021720.12305-6-chang.seok.bae@intel.com
---
tools/testing/selftests/x86/Makefile | 3 ++-
tools/testing/selftests/x86/apx.c | 10 ++++++++++
tools/testing/selftests/x86/xstate.c | 3 ++-
tools/testing/selftests/x86/xstate.h | 2 ++
4 files changed, 16 insertions(+), 2 deletions(-)
create mode 100644 tools/testing/selftests/x86/apx.c
diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests/x86/Makefile
index 28422c3..f703fcf 100644
--- a/tools/testing/selftests/x86/Makefile
+++ b/tools/testing/selftests/x86/Makefile
@@ -19,7 +19,7 @@ TARGETS_C_32BIT_ONLY := entry_from_vm86 test_syscall_vdso unwind_vdso \
test_FCMOV test_FCOMI test_FISTTP \
vdso_restorer
TARGETS_C_64BIT_ONLY := fsgsbase sysret_rip syscall_numbering \
- corrupt_xstate_header amx lam test_shadow_stack avx
+ corrupt_xstate_header amx lam test_shadow_stack avx apx
# Some selftests require 32bit support enabled also on 64bit systems
TARGETS_C_32BIT_NEEDED := ldt_gdt ptrace_syscall
@@ -136,3 +136,4 @@ $(OUTPUT)/nx_stack_64: CFLAGS += -Wl,-z,noexecstack
$(OUTPUT)/avx_64: CFLAGS += -mno-avx -mno-avx512f
$(OUTPUT)/amx_64: EXTRA_FILES += xstate.c
$(OUTPUT)/avx_64: EXTRA_FILES += xstate.c
+$(OUTPUT)/apx_64: EXTRA_FILES += xstate.c
diff --git a/tools/testing/selftests/x86/apx.c b/tools/testing/selftests/x86/apx.c
new file mode 100644
index 0000000..d9c8d41
--- /dev/null
+++ b/tools/testing/selftests/x86/apx.c
@@ -0,0 +1,10 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define _GNU_SOURCE
+
+#include "xstate.h"
+
+int main(void)
+{
+ test_xstate(XFEATURE_APX);
+}
diff --git a/tools/testing/selftests/x86/xstate.c b/tools/testing/selftests/x86/xstate.c
index 23c1d6c..97fe4bd 100644
--- a/tools/testing/selftests/x86/xstate.c
+++ b/tools/testing/selftests/x86/xstate.c
@@ -31,7 +31,8 @@
(1 << XFEATURE_OPMASK) | \
(1 << XFEATURE_ZMM_Hi256) | \
(1 << XFEATURE_Hi16_ZMM) | \
- (1 << XFEATURE_XTILEDATA))
+ (1 << XFEATURE_XTILEDATA) | \
+ (1 << XFEATURE_APX))
static inline uint64_t xgetbv(uint32_t index)
{
diff --git a/tools/testing/selftests/x86/xstate.h b/tools/testing/selftests/x86/xstate.h
index 42af36e..e91e309 100644
--- a/tools/testing/selftests/x86/xstate.h
+++ b/tools/testing/selftests/x86/xstate.h
@@ -33,6 +33,7 @@ enum xfeature {
XFEATURE_RSRVD_COMP_16,
XFEATURE_XTILECFG,
XFEATURE_XTILEDATA,
+ XFEATURE_APX,
XFEATURE_MAX,
};
@@ -59,6 +60,7 @@ static const char *xfeature_names[] =
"unknown xstate feature",
"AMX Tile config",
"AMX Tile data",
+ "APX registers",
"unknown xstate feature",
};
^ permalink raw reply related [flat|nested] 112+ messages in thread
* [tip: x86/fpu] x86/fpu/apx: Enable APX state support
2025-04-16 2:16 ` [PATCH 04/10] x86/fpu/apx: Enable APX state support Chang S. Bae
@ 2025-04-16 8:17 ` tip-bot2 for Chang S. Bae
0 siblings, 0 replies; 112+ messages in thread
From: tip-bot2 for Chang S. Bae @ 2025-04-16 8:17 UTC (permalink / raw)
To: linux-tip-commits
Cc: Chang S. Bae, Ingo Molnar, Sohil Mehta, Andy Lutomirski,
H. Peter Anvin, Linus Torvalds, Oleg Nesterov, x86, linux-kernel
The following commit has been merged into the x86/fpu branch of tip:
Commit-ID: 50c5b071e2833d2b61e3774cd792620311df157c
Gitweb: https://git.kernel.org/tip/50c5b071e2833d2b61e3774cd792620311df157c
Author: Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate: Tue, 15 Apr 2025 19:16:54 -07:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Wed, 16 Apr 2025 09:44:14 +02:00
x86/fpu/apx: Enable APX state support
With securing APX against conflicting MPX, it is now ready to be enabled.
Include APX in the enabled xfeature set.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/r/20250416021720.12305-5-chang.seok.bae@intel.com
---
arch/x86/include/asm/fpu/xstate.h | 3 ++-
arch/x86/kernel/fpu/xstate.c | 3 ++-
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index 7f39fe7..b308a76 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -32,7 +32,8 @@
XFEATURE_MASK_PKRU | \
XFEATURE_MASK_BNDREGS | \
XFEATURE_MASK_BNDCSR | \
- XFEATURE_MASK_XTILE)
+ XFEATURE_MASK_XTILE | \
+ XFEATURE_MASK_APX)
/*
* Features which are restored when returning to user space.
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 14f5c1b..2ac1fc1 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -371,7 +371,8 @@ static __init void os_xrstor_booting(struct xregs_state *xstate)
XFEATURE_MASK_BNDCSR | \
XFEATURE_MASK_PASID | \
XFEATURE_MASK_CET_USER | \
- XFEATURE_MASK_XTILE)
+ XFEATURE_MASK_XTILE | \
+ XFEATURE_MASK_APX)
/*
* setup the xstate image representing the init state
^ permalink raw reply related [flat|nested] 112+ messages in thread
* [tip: x86/fpu] x86/fpu/apx: Disallow conflicting MPX presence
2025-04-16 2:16 ` [PATCH 03/10] x86/fpu/apx: Disallow conflicting MPX presence Chang S. Bae
@ 2025-04-16 8:17 ` tip-bot2 for Chang S. Bae
0 siblings, 0 replies; 112+ messages in thread
From: tip-bot2 for Chang S. Bae @ 2025-04-16 8:17 UTC (permalink / raw)
To: linux-tip-commits
Cc: Chang S. Bae, Ingo Molnar, Sohil Mehta, Andy Lutomirski,
H. Peter Anvin, Linus Torvalds, Oleg Nesterov, x86, linux-kernel
The following commit has been merged into the x86/fpu branch of tip:
Commit-ID: ea68e39190cff86f457bd286c70b535e2a99a94d
Gitweb: https://git.kernel.org/tip/ea68e39190cff86f457bd286c70b535e2a99a94d
Author: Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate: Tue, 15 Apr 2025 19:16:53 -07:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Wed, 16 Apr 2025 09:44:14 +02:00
x86/fpu/apx: Disallow conflicting MPX presence
XSTATE components are architecturally independent. There is no rule
requiring their offsets in the non-compacted format to be strictly
ascending or mutually non-overlapping. However, in practice, such
overlaps have not occurred -- until now.
APX is introduced as xstate component 19, following AMX. In the
non-compacted XSAVE format, its offset overlaps with the space previously
occupied by the now-deprecated MPX feature:
45fc24e89b7c ("x86/mpx: remove MPX from arch/x86")
To prevent conflicts, the kernel must ensure the CPU never expose both
features at the same time. If so, it indicates unreliable hardware. In
such cases, XSAVE should be disabled entirely as a precautionary measure.
Add a sanity check to detect this condition and disable XSAVE if an
invalid hardware configuration is identified.
Note: MPX state components remain enabled on legacy systems solely for
KVM guest support.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/r/20250416021720.12305-4-chang.seok.bae@intel.com
---
arch/x86/kernel/fpu/xstate.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index dfd07af..14f5c1b 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -814,6 +814,17 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
goto out_disable;
}
+ if (fpu_kernel_cfg.max_features & XFEATURE_MASK_APX &&
+ fpu_kernel_cfg.max_features & (XFEATURE_MASK_BNDREGS | XFEATURE_MASK_BNDCSR)) {
+ /*
+ * This is a problematic CPU configuration where two
+ * conflicting state components are both enumerated.
+ */
+ pr_err("x86/fpu: Both APX/MPX present in the CPU's xstate features: 0x%llx, disabling XSAVE.\n",
+ fpu_kernel_cfg.max_features);
+ goto out_disable;
+ }
+
fpu_kernel_cfg.independent_features = fpu_kernel_cfg.max_features &
XFEATURE_MASK_INDEPENDENT;
^ permalink raw reply related [flat|nested] 112+ messages in thread
* [tip: x86/fpu] x86/fpu/apx: Define APX state component
2025-04-16 2:16 ` [PATCH 02/10] x86/fpu/apx: Define APX state component Chang S. Bae
@ 2025-04-16 8:17 ` tip-bot2 for Chang S. Bae
0 siblings, 0 replies; 112+ messages in thread
From: tip-bot2 for Chang S. Bae @ 2025-04-16 8:17 UTC (permalink / raw)
To: linux-tip-commits
Cc: Chang S. Bae, Ingo Molnar, Sohil Mehta, Andy Lutomirski,
H. Peter Anvin, Linus Torvalds, Oleg Nesterov, x86, linux-kernel
The following commit has been merged into the x86/fpu branch of tip:
Commit-ID: bd0b10b795c5c4c587e83c0498251356874c655c
Gitweb: https://git.kernel.org/tip/bd0b10b795c5c4c587e83c0498251356874c655c
Author: Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate: Tue, 15 Apr 2025 19:16:52 -07:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Wed, 16 Apr 2025 09:44:14 +02:00
x86/fpu/apx: Define APX state component
Advanced Performance Extensions (APX) is associated with a new state
component number 19. To support saving and restoring of the corresponding
registers via the XSAVE mechanism, introduce the component definition
along with the necessary sanity checks.
Define the new component number, state name, and those register data
type. Then, extend the size checker to validate the register data type
and explicitly list the APX feature flag as a dependency for the new
component in xsave_cpuid_features[].
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/r/20250416021720.12305-3-chang.seok.bae@intel.com
---
arch/x86/include/asm/fpu/types.h | 9 +++++++++
arch/x86/kernel/fpu/xstate.c | 3 +++
2 files changed, 12 insertions(+)
diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index de16862..97310df 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -125,6 +125,7 @@ enum xfeature {
XFEATURE_RSRVD_COMP_16,
XFEATURE_XTILE_CFG,
XFEATURE_XTILE_DATA,
+ XFEATURE_APX,
XFEATURE_MAX,
};
@@ -145,6 +146,7 @@ enum xfeature {
#define XFEATURE_MASK_LBR (1 << XFEATURE_LBR)
#define XFEATURE_MASK_XTILE_CFG (1 << XFEATURE_XTILE_CFG)
#define XFEATURE_MASK_XTILE_DATA (1 << XFEATURE_XTILE_DATA)
+#define XFEATURE_MASK_APX (1 << XFEATURE_APX)
#define XFEATURE_MASK_FPSSE (XFEATURE_MASK_FP | XFEATURE_MASK_SSE)
#define XFEATURE_MASK_AVX512 (XFEATURE_MASK_OPMASK \
@@ -304,6 +306,13 @@ struct xtile_data {
} __packed;
/*
+ * State component 19: 8B extended general purpose register.
+ */
+struct apx_state {
+ u64 egpr[16];
+} __packed;
+
+/*
* State component 10 is supervisor state used for context-switching the
* PASID state.
*/
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index a288597..dfd07af 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -63,6 +63,7 @@ static const char *xfeature_names[] =
"unknown xstate feature",
"AMX Tile config",
"AMX Tile data",
+ "APX registers",
"unknown xstate feature",
};
@@ -81,6 +82,7 @@ static unsigned short xsave_cpuid_features[] __initdata = {
[XFEATURE_CET_USER] = X86_FEATURE_SHSTK,
[XFEATURE_XTILE_CFG] = X86_FEATURE_AMX_TILE,
[XFEATURE_XTILE_DATA] = X86_FEATURE_AMX_TILE,
+ [XFEATURE_APX] = X86_FEATURE_APX,
};
static unsigned int xstate_offsets[XFEATURE_MAX] __ro_after_init =
@@ -569,6 +571,7 @@ static bool __init check_xstate_against_struct(int nr)
case XFEATURE_PASID: return XCHECK_SZ(sz, nr, struct ia32_pasid_state);
case XFEATURE_XTILE_CFG: return XCHECK_SZ(sz, nr, struct xtile_cfg);
case XFEATURE_CET_USER: return XCHECK_SZ(sz, nr, struct cet_user_state);
+ case XFEATURE_APX: return XCHECK_SZ(sz, nr, struct apx_state);
case XFEATURE_XTILE_DATA: check_xtile_data_against_struct(sz); return true;
default:
XSTATE_WARN_ON(1, "No structure for xstate: %d\n", nr);
^ permalink raw reply related [flat|nested] 112+ messages in thread
* [tip: x86/fpu] x86/cpufeatures: Add X86_FEATURE_APX
2025-04-16 2:16 ` [PATCH 01/10] x86/cpufeatures: Add X86_FEATURE_APX Chang S. Bae
@ 2025-04-16 8:17 ` tip-bot2 for Chang S. Bae
0 siblings, 0 replies; 112+ messages in thread
From: tip-bot2 for Chang S. Bae @ 2025-04-16 8:17 UTC (permalink / raw)
To: linux-tip-commits
Cc: Chang S. Bae, Ingo Molnar, Sohil Mehta, Andy Lutomirski,
H. Peter Anvin, Linus Torvalds, Oleg Nesterov, x86, linux-kernel
The following commit has been merged into the x86/fpu branch of tip:
Commit-ID: b02dc185ee86836cf1d8a37b81349374e4018ee0
Gitweb: https://git.kernel.org/tip/b02dc185ee86836cf1d8a37b81349374e4018ee0
Author: Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate: Tue, 15 Apr 2025 19:16:51 -07:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Wed, 16 Apr 2025 09:44:13 +02:00
x86/cpufeatures: Add X86_FEATURE_APX
Intel Advanced Performance Extensions (APX) introduce a new set of
general-purpose registers, managed as an extended state component via the
xstate management facility.
Before enabling this new xstate, define a feature flag to clarify the
dependency in xsave_cpuid_features[]. APX is enumerated under CPUID level
7 with EDX=1. Since this CPUID leaf is not yet allocated, place the flag
in a scattered feature word.
While this feature is intended only for userspace, exposing it via
/proc/cpuinfo is unnecessary. Instead, the existing arch_prctl(2)
mechanism with the ARCH_GET_XCOMP_SUPP option can be used to query the
feature availability.
Finally, clarify that APX depends on XSAVE.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/r/20250416021720.12305-2-chang.seok.bae@intel.com
---
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/kernel/cpu/cpuid-deps.c | 1 +
arch/x86/kernel/cpu/scattered.c | 1 +
3 files changed, 3 insertions(+)
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index bc81b9d..478ab36 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -481,6 +481,7 @@
#define X86_FEATURE_AMD_HTR_CORES (21*32+ 6) /* Heterogeneous Core Topology */
#define X86_FEATURE_AMD_WORKLOAD_CLASS (21*32+ 7) /* Workload Classification */
#define X86_FEATURE_PREFER_YMM (21*32+ 8) /* Avoid ZMM registers due to downclocking */
+#define X86_FEATURE_APX (21*32+ 9) /* Advanced Performance Extensions */
/*
* BUG word(s)
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index 94c062c..46efcbd 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -28,6 +28,7 @@ static const struct cpuid_dep cpuid_deps[] = {
{ X86_FEATURE_PKU, X86_FEATURE_XSAVE },
{ X86_FEATURE_MPX, X86_FEATURE_XSAVE },
{ X86_FEATURE_XGETBV1, X86_FEATURE_XSAVE },
+ { X86_FEATURE_APX, X86_FEATURE_XSAVE },
{ X86_FEATURE_CMOV, X86_FEATURE_FXSR },
{ X86_FEATURE_MMX, X86_FEATURE_FXSR },
{ X86_FEATURE_MMXEXT, X86_FEATURE_MMX },
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index c75c57b..dbf6d71 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -27,6 +27,7 @@ static const struct cpuid_bit cpuid_bits[] = {
{ X86_FEATURE_APERFMPERF, CPUID_ECX, 0, 0x00000006, 0 },
{ X86_FEATURE_EPB, CPUID_ECX, 3, 0x00000006, 0 },
{ X86_FEATURE_INTEL_PPIN, CPUID_EBX, 0, 0x00000007, 1 },
+ { X86_FEATURE_APX, CPUID_EDX, 21, 0x00000007, 1 },
{ X86_FEATURE_RRSBA_CTRL, CPUID_EDX, 2, 0x00000007, 2 },
{ X86_FEATURE_BHI_CTRL, CPUID_EDX, 4, 0x00000007, 2 },
{ X86_FEATURE_CQM_LLC, CPUID_EDX, 1, 0x0000000f, 0 },
^ permalink raw reply related [flat|nested] 112+ messages in thread
* Re: [PATCH 07/10] x86/fpu: Refactor xfeature bitmask update code for sigframe XSAVE
2025-04-16 8:05 ` Ingo Molnar
@ 2025-04-16 16:55 ` Chang S. Bae
0 siblings, 0 replies; 112+ messages in thread
From: Chang S. Bae @ 2025-04-16 16:55 UTC (permalink / raw)
To: Ingo Molnar; +Cc: mingo, linux-kernel, x86, tglx, bp, dave.hansen
On 4/16/2025 1:05 AM, Ingo Molnar wrote:
>
> * Chang S. Bae <chang.seok.bae@intel.com> wrote:
>
>> +static inline int set_xfeature_in_sigframe(struct xregs_state __user *xbuf, u64 mask)
>> +{
>> + u64 xfeatures;
>> + int err;
>> +
>> + /* Read the xfeatures value already saved in the user buffer */
>> + err = __get_user(xfeatures, &xbuf->header.xfeatures);
>> + xfeatures |= mask;
>> + err |= __put_user(xfeatures, &xbuf->header.xfeatures);
>> + return err;
>> +}
>
> For future reference, please put an extra newline before 'return'
> statements, so that the code looks more 'balanced':
>
>> +{
>> + u64 xfeatures;
>> + int err;
>> +
>> + /* Read the xfeatures value already saved in the user buffer */
>> + err = __get_user(xfeatures, &xbuf->header.xfeatures);
>> + xfeatures |= mask;
>> + err |= __put_user(xfeatures, &xbuf->header.xfeatures);
>> +
>> + return err;
>> +}
I see. Thanks for the note.
> I've done that for this patch, so no need to resend it.
Thanks!
Chang
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [PATCH 00/10] x86/fpu: APX enablement and assorted FPU code improvements
2025-04-16 8:07 ` [PATCH 00/10] x86/fpu: APX enablement and assorted FPU code improvements Ingo Molnar
@ 2025-04-16 16:56 ` Chang S. Bae
0 siblings, 0 replies; 112+ messages in thread
From: Chang S. Bae @ 2025-04-16 16:56 UTC (permalink / raw)
To: Ingo Molnar; +Cc: mingo, linux-kernel, x86, tglx, bp, dave.hansen
On 4/16/2025 1:07 AM, Ingo Molnar wrote:
>
>> 14 files changed, 69 insertions(+), 26 deletions(-)
>
> Applied to tip:x86/fpu, thanks!
>
> Note that I've merged the currently pending tip:x86/cpu bits into
> tip:x86/fpu before applying these patches, to resolve a conflict with
> <asm/cpufeatures.h>, and re-formatted the new X86_FEATURE_APX line to
> have the canonical format.
Appreciate!
Chang
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [PATCH 06/10] x86/fpu: Log XSAVE disablement consistently
2025-04-16 2:16 ` [PATCH 06/10] x86/fpu: Log XSAVE disablement consistently Chang S. Bae
2025-04-16 8:17 ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
@ 2025-04-16 16:56 ` Sohil Mehta
2025-04-16 17:03 ` Chang S. Bae
1 sibling, 1 reply; 112+ messages in thread
From: Sohil Mehta @ 2025-04-16 16:56 UTC (permalink / raw)
To: Chang S. Bae, mingo; +Cc: linux-kernel, x86, tglx, bp, dave.hansen
On 4/15/2025 7:16 PM, Chang S. Bae wrote:
> Not all paths that lead to fpu__init_disable_system_xstate() currently
> emit a message indicating that XSAVE has been disabled. Move the print
> statement into the function to ensure the message in all cases.
>
> Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
> Link: https://lore.kernel.org/lkml/d6d19e39-2749-4d45-aeab-a209a0ecba17@intel.com
> ---
> New patch for following up patch 3.
> ---
> arch/x86/kernel/fpu/xstate.c | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
> index 2ac1fc182273..8b14c9d3a1df 100644
> --- a/arch/x86/kernel/fpu/xstate.c
> +++ b/arch/x86/kernel/fpu/xstate.c
> @@ -751,6 +751,8 @@ static int __init init_xstate_size(void)
> */
> static void __init fpu__init_disable_system_xstate(unsigned int legacy_size)
> {
> + pr_info("x86/fpu: XSAVE disabled\n");
> +
There is a mix of pr_info(), pr_warn() and pr_err() to log these related
messages. Would it be useful to make the log level consistent in this
patch or a follow-up?
I think new the "XSAVE disabled" print should be a pr_warn() at least.
> fpu_kernel_cfg.max_features = 0;
> cr4_clear_bits(X86_CR4_OSXSAVE);
> setup_clear_cpu_cap(X86_FEATURE_XSAVE);
> @@ -821,7 +823,7 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
> * This is a problematic CPU configuration where two
> * conflicting state components are both enumerated.
> */
> - pr_err("x86/fpu: Both APX/MPX present in the CPU's xstate features: 0x%llx, disabling XSAVE.\n",
> + pr_err("x86/fpu: Both APX/MPX present in the CPU's xstate features: 0x%llx.\n",
> fpu_kernel_cfg.max_features);
> goto out_disable;
> }
> @@ -900,7 +902,7 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
> init_fpstate.xfeatures = fpu_kernel_cfg.default_features;
>
> if (init_fpstate.size > sizeof(init_fpstate.regs)) {
> - pr_warn("x86/fpu: init_fpstate buffer too small (%zu < %d), disabling XSAVE\n",
> + pr_warn("x86/fpu: init_fpstate buffer too small (%zu < %d)\n",
> sizeof(init_fpstate.regs), init_fpstate.size);
> goto out_disable;
> }
> @@ -912,7 +914,7 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
> * xfeatures mask.
> */
> if (xfeatures != fpu_kernel_cfg.max_features) {
> - pr_err("x86/fpu: xfeatures modified from 0x%016llx to 0x%016llx during init, disabling XSAVE\n",
> + pr_err("x86/fpu: xfeatures modified from 0x%016llx to 0x%016llx during init\n",
> xfeatures, fpu_kernel_cfg.max_features);
> goto out_disable;
> }
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [PATCH 06/10] x86/fpu: Log XSAVE disablement consistently
2025-04-16 16:56 ` [PATCH 06/10] " Sohil Mehta
@ 2025-04-16 17:03 ` Chang S. Bae
2025-04-16 18:04 ` Sohil Mehta
0 siblings, 1 reply; 112+ messages in thread
From: Chang S. Bae @ 2025-04-16 17:03 UTC (permalink / raw)
To: Sohil Mehta, mingo; +Cc: linux-kernel, x86, tglx, bp, dave.hansen
On 4/16/2025 9:56 AM, Sohil Mehta wrote:
> On 4/15/2025 7:16 PM, Chang S. Bae wrote:
>>
>> static void __init fpu__init_disable_system_xstate(unsigned int legacy_size)
>> {
>> + pr_info("x86/fpu: XSAVE disabled\n");
>> +
>
> There is a mix of pr_info(), pr_warn() and pr_err() to log these related
> messages. Would it be useful to make the log level consistent in this
> patch or a follow-up?
>
> I think new the "XSAVE disabled" print should be a pr_warn() at least.
I think pr_info() makes sense here, as it aligns with this hunk:
static __init void disable_smp(void)
{
pr_info("SMP disabled\n");
disable_ioapic_support();
topology_reset_possible_cpus_up();
cpumask_set_cpu(0, topology_sibling_cpumask(0));
cpumask_set_cpu(0, topology_core_cpumask(0));
cpumask_set_cpu(0, topology_die_cpumask(0));
}
If you strongly feel that pr_warn() is more appropriate, then it would
make sense to update all related messages consistently. But honestly, I
don’t think it’s a big deal either way.
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [PATCH 06/10] x86/fpu: Log XSAVE disablement consistently
2025-04-16 17:03 ` Chang S. Bae
@ 2025-04-16 18:04 ` Sohil Mehta
0 siblings, 0 replies; 112+ messages in thread
From: Sohil Mehta @ 2025-04-16 18:04 UTC (permalink / raw)
To: Chang S. Bae, mingo; +Cc: linux-kernel, x86, tglx, bp, dave.hansen
On 4/16/2025 10:03 AM, Chang S. Bae wrote:
> If you strongly feel that pr_warn() is more appropriate, then it would
> make sense to update all related messages consistently. But honestly, I
> don’t think it’s a big deal either way.
I don't have a strong preference.
But, for someone who is using log levels to reduce the kernel log
details, the below fpu-specific messages probably don't seem to be in
priority order.
pr_info("x86/fpu: XSAVE disabled\n");
pr_warn("x86/fpu: init_fpstate buffer too small (%zu < %d)\n",
pr_err("x86/fpu: Both APX/MPX present in the CPU's xstate features:
0x%llx.\n",
pr_err("x86/fpu: xfeatures modified from 0x%016llx to 0x%016llx during
init\n",
pr_err("x86/fpu: FP/SSE not present amongst the CPU's xstate features:
0x%llx.\n",
"XSAVE disabled" seems most important, with the rest just being internal
details. Maybe it's just me... Feel free to ignore unless someone else
chimes in.
^ permalink raw reply [flat|nested] 112+ messages in thread
end of thread, other threads:[~2025-04-16 18:04 UTC | newest]
Thread overview: 112+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-02-27 18:44 [PATCH RFC v1 00/11] x86: Support Intel Advanced Performance Extensions Chang S. Bae
2025-02-27 18:44 ` [PATCH RFC v1 01/11] x86/fpu/xstate: Simplify print_xstate_features() Chang S. Bae
2025-02-27 18:51 ` Dave Hansen
2025-02-27 18:52 ` Ingo Molnar
2025-02-27 19:04 ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
2025-02-27 18:44 ` [PATCH RFC v1 02/11] x86/fpu/xstate: Introduce xstate order table and accessor macro Chang S. Bae
2025-02-27 19:03 ` Dave Hansen
2025-02-27 19:34 ` Andrew Cooper
2025-02-27 19:42 ` Ingo Molnar
2025-03-20 23:46 ` Chang S. Bae
2025-02-27 19:05 ` Ingo Molnar
2025-02-27 19:32 ` Dave Hansen
2025-02-27 19:36 ` Ingo Molnar
2025-02-27 19:49 ` Andrew Cooper
2025-02-27 20:06 ` Ingo Molnar
2025-02-27 21:10 ` Andrew Cooper
2025-02-27 21:28 ` Ingo Molnar
2025-02-27 21:30 ` Ingo Molnar
2025-02-27 22:23 ` Andrew Cooper
2025-02-28 3:10 ` Chang S. Bae
2025-02-28 17:11 ` Ingo Molnar
2025-02-27 19:56 ` Dave Hansen
2025-02-27 20:02 ` Ingo Molnar
2025-02-27 20:19 ` Dave Hansen
2025-02-27 20:29 ` Ingo Molnar
2025-02-27 21:17 ` Dave Hansen
2025-02-27 21:37 ` Ingo Molnar
2025-02-27 23:54 ` Dave Hansen
2025-02-28 15:20 ` Sean Christopherson
2025-02-27 18:44 ` [PATCH RFC v1 03/11] x86/fpu/xstate: Remove xstate offset check Chang S. Bae
2025-02-27 18:44 ` [PATCH RFC v1 04/11] x86/fpu/xstate: Adjust XSAVE buffer size calculation Chang S. Bae
2025-02-27 18:44 ` [PATCH RFC v1 05/11] x86/fpu/xstate: Adjust xstate copying logic for user ABI Chang S. Bae
2025-02-27 18:44 ` [PATCH RFC v1 06/11] x86/fpu/mpx: Remove MPX xstate component support Chang S. Bae
2025-02-27 19:12 ` Ingo Molnar
2025-02-28 3:12 ` Chang S. Bae
2025-03-20 23:47 ` Chang S. Bae
2025-04-01 17:13 ` Chang S. Bae
2025-02-27 18:44 ` [PATCH RFC v1 07/11] x86/cpufeatures: Add X86_FEATURE_APX Chang S. Bae
2025-02-27 18:44 ` [PATCH RFC v1 08/11] x86/fpu/apx: Define APX state component Chang S. Bae
2025-02-27 18:44 ` [PATCH RFC v1 09/11] x86/fpu/apx: Disallow conflicting MPX presence Chang S. Bae
2025-02-27 23:16 ` Dave Hansen
2025-02-27 23:58 ` Chang S. Bae
2025-02-27 18:44 ` [PATCH RFC v1 10/11] x86/fpu/apx: Enable APX state support Chang S. Bae
2025-02-27 18:44 ` [PATCH RFC v1 11/11] selftests/x86/apx: Add APX test Chang S. Bae
2025-02-27 19:15 ` [PATCH RFC v1 00/11] x86: Support Intel Advanced Performance Extensions Ingo Molnar
2025-02-27 19:36 ` Dave Hansen
2025-02-28 3:10 ` Chang S. Bae
2025-03-20 23:42 ` [PATCH RFC v2 0/9] " Chang S. Bae
2025-03-20 23:42 ` [PATCH RFC v2 1/9] x86/fpu/xstate: Remove xstate offset check Chang S. Bae
2025-03-25 10:18 ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
2025-04-14 7:34 ` [tip: x86/merge] " tip-bot2 for Chang S. Bae
2025-03-20 23:42 ` [PATCH RFC v2 2/9] x86/fpu/xstate: Introduce xfeature order table and accessor macro Chang S. Bae
2025-03-25 10:18 ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
2025-04-14 7:34 ` [tip: x86/merge] " tip-bot2 for Chang S. Bae
2025-03-20 23:42 ` [PATCH RFC v2 3/9] x86/fpu/xstate: Adjust XSAVE buffer size calculation Chang S. Bae
2025-03-25 10:28 ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
2025-04-14 7:34 ` [tip: x86/merge] " tip-bot2 for Chang S. Bae
2025-03-20 23:42 ` [PATCH RFC v2 4/9] x86/fpu/xstate: Adjust xstate copying logic for user ABI Chang S. Bae
2025-03-25 10:18 ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
2025-03-25 10:28 ` tip-bot2 for Chang S. Bae
2025-04-14 7:34 ` [tip: x86/merge] " tip-bot2 for Chang S. Bae
2025-03-20 23:42 ` [PATCH RFC v2 5/9] x86/cpufeatures: Add X86_FEATURE_APX Chang S. Bae
2025-04-11 16:12 ` [PATCH RFC v2a " Chang S. Bae
2025-04-11 16:54 ` Sohil Mehta
2025-04-11 18:23 ` Chang S. Bae
2025-04-11 21:57 ` Sohil Mehta
2025-04-12 8:43 ` Ingo Molnar
2025-04-14 8:23 ` Ingo Molnar
2025-04-14 17:28 ` Sohil Mehta
2025-04-14 17:32 ` Dave Hansen
2025-04-14 17:45 ` Sohil Mehta
2025-04-14 18:02 ` Chang S. Bae
2025-04-14 18:12 ` Sohil Mehta
2025-04-14 18:31 ` Chang S. Bae
2025-04-16 2:16 ` [PATCH 00/10] x86/fpu: APX enablement and assorted FPU code improvements Chang S. Bae
2025-04-16 2:16 ` [PATCH 01/10] x86/cpufeatures: Add X86_FEATURE_APX Chang S. Bae
2025-04-16 8:17 ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
2025-04-16 2:16 ` [PATCH 02/10] x86/fpu/apx: Define APX state component Chang S. Bae
2025-04-16 8:17 ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
2025-04-16 2:16 ` [PATCH 03/10] x86/fpu/apx: Disallow conflicting MPX presence Chang S. Bae
2025-04-16 8:17 ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
2025-04-16 2:16 ` [PATCH 04/10] x86/fpu/apx: Enable APX state support Chang S. Bae
2025-04-16 8:17 ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
2025-04-16 2:16 ` [PATCH 05/10] selftests/x86/apx: Add APX test Chang S. Bae
2025-04-16 8:17 ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
2025-04-16 2:16 ` [PATCH 06/10] x86/fpu: Log XSAVE disablement consistently Chang S. Bae
2025-04-16 8:17 ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
2025-04-16 16:56 ` [PATCH 06/10] " Sohil Mehta
2025-04-16 17:03 ` Chang S. Bae
2025-04-16 18:04 ` Sohil Mehta
2025-04-16 2:16 ` [PATCH 07/10] x86/fpu: Refactor xfeature bitmask update code for sigframe XSAVE Chang S. Bae
2025-04-16 8:05 ` Ingo Molnar
2025-04-16 16:55 ` Chang S. Bae
2025-04-16 8:17 ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
2025-04-16 2:16 ` [PATCH 08/10] x86/pkeys: Simplify PKRU update in signal frame Chang S. Bae
2025-04-16 8:17 ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
2025-04-16 2:16 ` [PATCH 09/10] x86/fpu: Remove export of mxcsr_feature_mask Chang S. Bae
2025-04-16 8:17 ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
2025-04-16 2:17 ` [PATCH 10/10] x86/fpu: Rename fpu_reset_fpregs() to fpu_reset_fpstate_regs() Chang S. Bae
2025-04-16 8:17 ` [tip: x86/fpu] " tip-bot2 for Chang S. Bae
2025-04-16 8:07 ` [PATCH 00/10] x86/fpu: APX enablement and assorted FPU code improvements Ingo Molnar
2025-04-16 16:56 ` Chang S. Bae
2025-03-20 23:42 ` [PATCH RFC v2 6/9] x86/fpu/apx: Define APX state component Chang S. Bae
2025-04-11 22:54 ` Sohil Mehta
2025-03-20 23:42 ` [PATCH RFC v2 7/9] x86/fpu/apx: Disallow conflicting MPX presence Chang S. Bae
2025-04-14 17:09 ` Sohil Mehta
2025-04-14 17:25 ` Dave Hansen
2025-03-20 23:42 ` [PATCH RFC v2 8/9] x86/fpu/apx: Enable APX state support Chang S. Bae
2025-04-14 15:41 ` Sohil Mehta
2025-03-20 23:43 ` [PATCH RFC v2 9/9] selftests/x86/apx: Add APX test Chang S. Bae
2025-04-14 16:02 ` Sohil Mehta
2025-03-25 10:25 ` [PATCH RFC v2 0/9] x86: Support Intel Advanced Performance Extensions Ingo Molnar
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox