* [PATCH] Revert "x86/fpu: Refine and simplify the magic number check during signal return"
@ 2026-04-29 0:06 Andrei Vagin
2026-04-29 7:26 ` Chang S. Bae
0 siblings, 1 reply; 13+ messages in thread
From: Andrei Vagin @ 2026-04-29 0:06 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen
Cc: linux-kernel, criu, x86, Andrei Vagin, Chang S. Bae, stable
This reverts commit dc8aa31a7ac2 ("x86/fpu: Refine and simplify the
magic number check during signal return").
The reverted commit broke applications that construct signal frames in
userspace (such as CRIU and gVisor) if the frame's xstate size is
smaller than the kernel's fpstate->user_size.
Furthermore, this introduces a critical issue for checkpoint/restore
tools like CRIU. If a process is checkpointed while inside a signal
handler, its stack contains a signal frame formatted according to the
source host's xstate capabilities. If that process is later restored on
a destination host with larger xstate capabilities (e.g., a newer CPU
with more features enabled, resulting in a larger fpstate->user_size),
the kernel will look for FP_XSTATE_MAGIC2 at the destination host's
larger user_size offset instead of the offset encoded in the frame's
fx_sw->xstate_size. This causes the magic2 check to fail, forcing
sigreturn to silently fall back to "FX-only" mode. Upon return from the
signal handler, the process's extended state is reset to initial values
instead of being restored, leading to silent data corruption.
The original commit cited commit d877550eaf2d ("x86/fpu: Stop
relying on userspace for info to fault in xsave buffer") as
justification to stop relying on userspace for the magic number check.
However, these two changes are fundamentally different. The cited commit
only changed how much memory the kernel ensures is paged-in before
running XRSTOR to prevent an infinite loop. It did not change the signal
frame format or how the layout is validated.
Reverting this change restores the use of fx_sw->xstate_size for
locating magic2 and restores the necessary sanity checks, ensuring that
the signal frame remains self-describing and portable.
Cc: Chang S. Bae <chang.seok.bae@intel.com>
Cc: stable@vger.kernel.org
Fixes: dc8aa31a7ac2 ("x86/fpu: Refine and simplify the magic number check during signal return")
Signed-off-by: Andrei Vagin <avagin@google.com>
---
arch/x86/kernel/fpu/signal.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index c3ec2512f2bb..20b638c507ca 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -27,14 +27,19 @@
 static inline bool check_xstate_in_sigframe(struct fxregs_state __user *fxbuf,
 					    struct _fpx_sw_bytes *fx_sw)
 {
+	int min_xstate_size = sizeof(struct fxregs_state) +
+			      sizeof(struct xstate_header);
 	void __user *fpstate = fxbuf;
 	unsigned int magic2;

 	if (__copy_from_user(fx_sw, &fxbuf->sw_reserved[0], sizeof(*fx_sw)))
 		return false;

-	/* Check for the first magic field */
-	if (fx_sw->magic1 != FP_XSTATE_MAGIC1)
+	/* Check for the first magic field and other error scenarios. */
+	if (fx_sw->magic1 != FP_XSTATE_MAGIC1 ||
+	    fx_sw->xstate_size < min_xstate_size ||
+	    fx_sw->xstate_size > x86_task_fpu(current)->fpstate->user_size ||
+	    fx_sw->xstate_size > fx_sw->extended_size)
 		goto setfx;

 	/*
@@ -43,7 +48,7 @@ static inline bool check_xstate_in_sigframe(struct fxregs_state __user *fxbuf,
 	 * fpstate layout with out copying the extended state information
 	 * in the memory layout.
 	 */
-	if (__get_user(magic2, (__u32 __user *)(fpstate + x86_task_fpu(current)->fpstate->user_size)))
+	if (__get_user(magic2, (__u32 __user *)(fpstate + fx_sw->xstate_size)))
 		return false;

 	if (likely(magic2 == FP_XSTATE_MAGIC2))
--
2.54.0.545.g6539524ca2-goog
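To illustrate the difference between the two lookups, here is a minimal userspace model in C. It is a sketch and not the kernel code: struct sw_bytes, read_u32(), xstate_frame_by_frame_size(), xstate_frame_by_host_size() and make_frame() are names invented for this example, and treating an out-of-bounds read as "no magic2" is an assumption (the kernel would fault on the user access instead). The magic values are the ones defined in the x86 uapi header <asm/sigcontext.h>; the 512-byte FXSAVE area keeps its 48 software-reserved bytes at offset 464.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FP_XSTATE_MAGIC1 0x46505853U
#define FP_XSTATE_MAGIC2 0x46505845U
#define FXSAVE_SIZE      512u	/* sizeof(struct fxregs_state) */
#define XSTATE_HDR_SIZE  64u	/* sizeof(struct xstate_header) */
#define SW_BYTES_OFFSET  464u	/* offset of sw_reserved[] in the FXSAVE area */

/* Same field order as struct _fpx_sw_bytes; the name is local to this sketch. */
struct sw_bytes {
	uint32_t magic1;
	uint32_t extended_size;
	uint64_t xfeatures;
	uint32_t xstate_size;
	uint32_t padding[7];
};

/* Read the 32-bit word at 'off'; yield 0 if it lies outside the buffer. */
static uint32_t read_u32(const uint8_t *frame, size_t frame_len, size_t off)
{
	uint32_t v = 0;

	if (off <= frame_len && frame_len - off >= sizeof(v))
		memcpy(&v, frame + off, sizeof(v));
	return v;
}

/* Restored behaviour: sanity-check the frame's sizes, then look for magic2
 * at the offset the frame itself declares. false models the FX-only path. */
static bool xstate_frame_by_frame_size(const uint8_t *frame, size_t frame_len,
				       uint32_t host_user_size)
{
	struct sw_bytes sw;

	assert(frame_len >= FXSAVE_SIZE);
	memcpy(&sw, frame + SW_BYTES_OFFSET, sizeof(sw));
	if (sw.magic1 != FP_XSTATE_MAGIC1 ||
	    sw.xstate_size < FXSAVE_SIZE + XSTATE_HDR_SIZE ||
	    sw.xstate_size > host_user_size ||
	    sw.xstate_size > sw.extended_size)
		return false;
	return read_u32(frame, frame_len, sw.xstate_size) == FP_XSTATE_MAGIC2;
}

/* Behaviour being reverted: look for magic2 at the host's user_size. */
static bool xstate_frame_by_host_size(const uint8_t *frame, size_t frame_len,
				      uint32_t host_user_size)
{
	struct sw_bytes sw;

	assert(frame_len >= FXSAVE_SIZE);
	memcpy(&sw, frame + SW_BYTES_OFFSET, sizeof(sw));
	if (sw.magic1 != FP_XSTATE_MAGIC1)
		return false;
	return read_u32(frame, frame_len, host_user_size) == FP_XSTATE_MAGIC2;
}

/* Build a zeroed frame whose sw bytes and magic2 describe 'xstate_size'. */
static void make_frame(uint8_t *frame, size_t frame_len, uint32_t xstate_size)
{
	struct sw_bytes sw = {
		.magic1 = FP_XSTATE_MAGIC1,
		.extended_size = xstate_size + 4,	/* + sizeof(magic2) */
		.xstate_size = xstate_size,
	};
	uint32_t magic2 = FP_XSTATE_MAGIC2;

	assert(frame_len >= (size_t)xstate_size + sizeof(magic2));
	memset(frame, 0, frame_len);
	memcpy(frame + SW_BYTES_OFFSET, &sw, sizeof(sw));
	memcpy(frame + xstate_size, &magic2, sizeof(magic2));
}
```

With a frame built for an 832-byte xstate area (an illustrative x87/SSE/AVX size) and a host user_size of 2696, the first function still finds magic2 while the second one does not, which is the FX-only fallback described in the commit message.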
* Re: [PATCH] Revert "x86/fpu: Refine and simplify the magic number check during signal return"
2026-04-29 0:06 [PATCH] Revert "x86/fpu: Refine and simplify the magic number check during signal return" Andrei Vagin
@ 2026-04-29 7:26 ` Chang S. Bae
2026-04-29 16:44 ` Andrei Vagin
2026-05-01 18:44 ` Andrei Vagin
0 siblings, 2 replies; 13+ messages in thread
From: Chang S. Bae @ 2026-04-29 7:26 UTC (permalink / raw)
To: Andrei Vagin, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen
Cc: linux-kernel, criu, x86, stable
On 4/28/2026 5:06 PM, Andrei Vagin wrote:
>
> The reverted commit broke applications that construct signal frames in
> userspace (such as CRIU and gVisor) if the frame's xstate size is
> smaller than the kernel's fpstate->user_size.
In the extended state area, the sigframe embeds the hardware-defined
XSAVE format. If CPU A and CPU B support different XSTATE features, the
layout (size and offsets) differs across systems. However, within a
system, the layout is invariant. Userspace can query CPUID to obtain the
exact offset and sizes, which effectively defines the ABI.
On top of the XSAVE data, the kernel appends metadata (e.g. the xstate
size and magic values). In particular fpstate->user_size is written by
save_sw_bytes() at signal delivery. On sigreturn, the kernel validates
this, which is a symmetric and straightforward check.
Because the format is hardware-defined, arbitrary size mismatches should
not be allowed. The sigframe should match the CPU-defined XSAVE layout.
So the change in fact strengthens the sanity check.
> Furthermore, this introduces a critical issue for checkpoint/restore
> tools like CRIU. If a process is checkpointed while inside a signal
> handler, its stack contains a signal frame formatted according to the
> source host's xstate capabilities. If that process is later restored on
> a destination host with larger xstate capabilities (e.g., a newer CPU
> with more features enabled, resulting in a larger fpstate->user_size),
> the kernel will look for FP_XSTATE_MAGIC2 at the destination host's
> larger user_size offset instead of the offset encoded in the frame's
> fx_sw->xstate_size. This causes the magic2 check to fail, forcing
> sigreturn to silently fall back to "FX-only" mode.
It seems that userspace could translate the XSAVE buffer from CPU A's
format to CPU B's format during restore. If so, the frame can be
consistent with the destination system without modifying
fx_sw->xstate_size, and the kernel-side validation would continue to
work as intended.
Thanks,
Chang
* Re: [PATCH] Revert "x86/fpu: Refine and simplify the magic number check during signal return"
2026-04-29 7:26 ` Chang S. Bae
@ 2026-04-29 16:44 ` Andrei Vagin
2026-04-29 17:15 ` Chang S. Bae
2026-05-01 18:44 ` Andrei Vagin
1 sibling, 1 reply; 13+ messages in thread
From: Andrei Vagin @ 2026-04-29 16:44 UTC (permalink / raw)
To: Chang S. Bae
Cc: Andrei Vagin, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, linux-kernel, criu, x86, stable
On Wed, Apr 29, 2026 at 12:27 AM Chang S. Bae <chang.seok.bae@intel.com> wrote:
>
> On 4/28/2026 5:06 PM, Andrei Vagin wrote:
> >
> > The reverted commit broke applications that construct signal frames in
> > userspace (such as CRIU and gVisor) if the frame's xstate size is
> > smaller than the kernel's fpstate->user_size.
>
> In the extended state area, the sigframe embeds the hardware-defined
> XSAVE format. If CPU A and CPU B support different XSTATE features, the
> layout (size and offsets) differ across systems. However, within a
> system, the layout is invariant. Userspace can query CPUID to obtain the
> exact offset and sizes, which effectively defines the ABI.
>
> On top of the XSAVE data, the kernel appends metadata (e.g. the xstate
> size and magic values). In particular fpstate->user_size is written by
> save_sw_bytes() at signal delivery. On sigreturn, the kernel validates
> this, which is a symmetric and straightforward check.
First of all, the reverted change broke backward compatibility for
user-space. There are at least two projects (gVisor and CRIU) that
worked correctly before this change. With the reverted commit, they
run into silent memory corruption. We usually try to avoid breaking
user-space like this without strong justification.
As for layout compatibility, in most cases CPU A (older) and CPU B
(newer) have compatible XSAVE layouts in terms of saving states on A
and restoring them on B. CPU B may feature new extended hardware
states, but the layout for previously supported components remains
the same. CRIU relies on this fact to allow users to migrate
processes from older to newer CPUs. CRIU can check whether
XSAVE states align across machines.
>
> Because the format is hardware-defined, arbitrary size mismatches should
> not be allowed. The sigframe should match the CPU-defined XSAVE layout.
> So the change in fact strengthens the sanity check.
>
> > Furthermore, this introduces a critical issue for checkpoint/restore
> > tools like CRIU. If a process is checkpointed while inside a signal
> > handler, its stack contains a signal frame formatted according to the
> > source host's xstate capabilities. If that process is later restored on
> > a destination host with larger xstate capabilities (e.g., a newer CPU
> > with more features enabled, resulting in a larger fpstate->user_size),
> > the kernel will look for FP_XSTATE_MAGIC2 at the destination host's
> > larger user_size offset instead of the offset encoded in the frame's
> > fx_sw->xstate_size. This causes the magic2 check to fail, forcing
> > sigreturn to silently fall back to "FX-only" mode.
>
> It seems that userspace could translate the XSAVE buffer from CPU A's
> format to CPU B's format during restore. If so, the frame can be
> consistent with the destination system without modifying
> fx_sw->xstate_size, and the kernel-side validation would continue to
> work as intended.
When checkpointing a process, CRIU cannot determine whether it is
currently executing within a signal handler, and it cannot find
signal frames on a user stack. In fact, there could be multiple
nested signal frames stacked on top of each other if a process
triggered additional signals while executing in an earlier handler.
Even if CRIU were somehow able to locate these frames, extending
them would be impossible. The target application stack is not
under our control, and other user stack data or local variables
reside immediately after the frame.
Thanks,
Andrei
* Re: [PATCH] Revert "x86/fpu: Refine and simplify the magic number check during signal return"
2026-04-29 16:44 ` Andrei Vagin
@ 2026-04-29 17:15 ` Chang S. Bae
2026-04-29 20:44 ` Andrei Vagin
0 siblings, 1 reply; 13+ messages in thread
From: Chang S. Bae @ 2026-04-29 17:15 UTC (permalink / raw)
To: Andrei Vagin
Cc: Andrei Vagin, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, linux-kernel, criu, x86, stable
On 4/29/2026 9:44 AM, Andrei Vagin wrote:
>
> First of all, the reverted change broke backward compatibility for
> user-space.
The ABI itself is still intact. Do you mean that the kernel cannot
strengthen its sanity check logic? The change does not alter the ABI,
but enforces stricter validation of the existing format.
> As for layout compatibility, in most cases CPU A (older) and CPU B
> (newer) have compatible XSAVE layouts in terms of saving states on A
> and restoring them on B. CPU B may feature new extended hardware
> states, but the layout for previously supported components remains
> the same.
I don't think this assumption holds. For example, with APX, the state is
placed at the offset previously used by MPX. So the layout is not
strictly append-only, and offsets are not guaranteed to remain stable
across different CPU generations.
> Even if CRIU were somehow able to locate these frames, extending
> them would be impossible. The target application stack is not
> under our control, and other user stack data or local variables
> reside immediately after the frame.
I’m confused by this point. If the frame cannot be adjusted, in the
first place, how does migration work across systems with differing
feature sets?
Features can be introduced or deprecated over time, and a snapshot taken
on one machine cannot be expected to run unmodified on a random machine
with a different XSTATE set. Some form of translation is inevitable for
any cross-machine restore mechanism.
Thanks,
Chang
* Re: [PATCH] Revert "x86/fpu: Refine and simplify the magic number check during signal return"
2026-04-29 17:15 ` Chang S. Bae
@ 2026-04-29 20:44 ` Andrei Vagin
2026-04-29 21:44 ` Chang S. Bae
0 siblings, 1 reply; 13+ messages in thread
From: Andrei Vagin @ 2026-04-29 20:44 UTC (permalink / raw)
To: Chang S. Bae
Cc: Andrei Vagin, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, linux-kernel, criu, x86, stable
On Wed, Apr 29, 2026 at 10:15 AM Chang S. Bae <chang.seok.bae@intel.com> wrote:
>
> On 4/29/2026 9:44 AM, Andrei Vagin wrote:
> >
> > First of all, the reverted change broke backward compatibility for
> > user-space.
>
> The ABI itself is still intact. Do you mean that the kernel cannot
> strengthen its sanity check logic? The change does not alter the ABI,
> but enforces stricter validation of the existing format.
Enforcing validation against 'fpstate->user_size' instead of the frame's
own 'fx_sw->xstate_size' changes the kernel ABI; it does not strengthen
the sanity check logic. When user-space supplies a valid, self-consistent
frame with an explicit size that older kernels accepted, and the updated
logic rejects it, that is a userspace regression.
CRIU and gVisor breakages are not related to migration from one host to
another. In both cases, they were broken even when running on the same
host. Migration between different CPUs is a separate issue. In both
cases, the code that constructs signal frames has existed for many years
and has worked without any problem before this change.
>
> > As for layout compatibility, in most cases CPU A (older) and CPU B
> > (newer) have compatible XSAVE layouts in terms of saving states on A
> > and restoring them on B. CPU B may feature new extended hardware
> > states, but the layout for previously supported components remains
> > the same.
> I don't think this assumption holds. For example, with APX, the state is
> placed at the offset previously used by MPX. So the layout is not
> strictly append-only, and offsets are not guaranteed to remain stable
> across different CPU generations.
Regarding layout variations (like APX vs MPX), migration tools already
track XSAVE capabilities and offsets. Furthermore, APX has its own
dedicated bit in the 'xfeatures' field of the xstate_header. If
platforms present conflicting layouts or incompatible extensions, CRIU
cancels restoration.
The issue with checking against 'user_size' is that it disrupts
migration even between compatible systems. If offsets match
but the destination cpu has more features (leading to a larger
'user_size'), validation fails...
>
> > Even if CRIU were somehow able to locate these frames, extending
> > them would be impossible. The target application stack is not
> > under our control, and other user stack data or local variables
> > reside immediately after the frame.
> I’m confused by this point. If the frame cannot be adjusted, in the
> first place, how does migration work across systems with differing
> feature sets?
Cross-host migration only works reliably between compatible systems. It
works when both hosts share identical feature sets, or in a one-way
direction when the target host supports all features of the source host
and their XSAVE layouts are compatible. In this context, `compatible`
means FPU states saved on the source host are restorable on the
destination host.
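This notion of `compatible` can be sketched as a small check in C. It is a hypothetical illustration, not CRIU's actual code: struct xcomp, struct xlayout and layout_restorable() are names made up here. On real hardware the per-component table would come from CPUID leaf 0xD, where sub-leaf i reports the size of component i in EAX and its standard-format offset in EBX.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_XFEATURES 64

/* Size and standard-format offset of one XSAVE state component. */
struct xcomp {
	uint32_t offset;
	uint32_t size;
};

/* Enabled user features of a host plus its per-component layout. */
struct xlayout {
	uint64_t xfeatures;
	struct xcomp comp[MAX_XFEATURES];
};

/*
 * A state saved on 'src' is treated as restorable on 'dst' when every
 * source feature exists on the destination at the same offset and with
 * the same size. Extra destination features are allowed.
 */
static bool layout_restorable(const struct xlayout *src,
			      const struct xlayout *dst)
{
	assert(src != NULL && dst != NULL);

	/* A feature missing on the destination cannot be restored there. */
	if (src->xfeatures & ~dst->xfeatures)
		return false;

	/* Components 0 and 1 (x87, SSE) live in the fixed legacy area. */
	for (int i = 2; i < MAX_XFEATURES; i++) {
		if (!(src->xfeatures & (1ULL << i)))
			continue;
		if (src->comp[i].offset != dst->comp[i].offset ||
		    src->comp[i].size != dst->comp[i].size)
			return false;
	}
	return true;
}
```

The check is deliberately one-way: a source with fewer features passes against a larger destination, but not the other way round.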
If processes are checkpointed at safe, predefined points where they are
not executing signal handlers, target host requirements can be more
flexible. I should mention here when CRIU itself constructs signal frames
from userspace. In the final step, after all file descriptors and memory
mappings are restored, it invokes sigreturn with a pre-constructed
signal frame to restore registers and resume the fully restored process.
Since CRIU constructs these frames, it can adjust the XSAVE layout if
required. We currently do not do this because we have not yet seen
scenarios where it would be required.
> on one machine cannot be expected to run unmodified on a random machine
> with a different XSTATE set. Some form of translation is inevitable for
> any cross-machine restore mechanism.
As I mentioned, migration tools have logic to determine where a
specific workload can be migrated. Because we cannot always control the
exact execution point at which a process is stopped, state translation
is not always feasible. For instance, an active signal frame on a
process stack can be entirely outside our control. However, we can
reliably identify compatible target systems where the workload can be
resumed safely.
Thanks,
Andrei
* Re: [PATCH] Revert "x86/fpu: Refine and simplify the magic number check during signal return"
2026-04-29 20:44 ` Andrei Vagin
@ 2026-04-29 21:44 ` Chang S. Bae
2026-04-30 0:28 ` Andrei Vagin
0 siblings, 1 reply; 13+ messages in thread
From: Chang S. Bae @ 2026-04-29 21:44 UTC (permalink / raw)
To: Andrei Vagin
Cc: Andrei Vagin, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, linux-kernel, criu, x86, stable
On 4/29/2026 1:44 PM, Andrei Vagin wrote:
>
> Enforcing validation against 'fpstate->user_size' instead of the frame's
> own 'fx_sw->xstate_size' changes the kernel ABI; it does not strengthen
> the sanity check logic. When user-space supplies a valid, self-consistent
> frame with an explicit size that older kernels accepted, and the updated
> logic rejects it, that is a userspace regression.
Sorry, I don't get your version of ABI.
Eventually, XRSTOR will execute to restore the state. The kernel tracks
each task's requested feature bitmap (RFBM), which determines the size.
As described in SDM Vol. 1, Section 13.13:
An execution of an instruction in the XSAVE feature set may access
any byte of any state component on which that execution operates even
when saving a state component is omitted ...
Given this, the kernel must ensure the backing memory is valid and
sufficient. So this consistency does matter.
Thanks,
Chang
* Re: [PATCH] Revert "x86/fpu: Refine and simplify the magic number check during signal return"
2026-04-29 21:44 ` Chang S. Bae
@ 2026-04-30 0:28 ` Andrei Vagin
0 siblings, 0 replies; 13+ messages in thread
From: Andrei Vagin @ 2026-04-30 0:28 UTC (permalink / raw)
To: Chang S. Bae
Cc: Andrei Vagin, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, linux-kernel, criu, x86, stable
On Wed, Apr 29, 2026 at 2:44 PM Chang S. Bae <chang.seok.bae@intel.com> wrote:
>
> On 4/29/2026 1:44 PM, Andrei Vagin wrote:
> >
> > Enforcing validation against 'fpstate->user_size' instead of the frame's
> > own 'fx_sw->xstate_size' changes the kernel ABI; it does not strengthen
> > the sanity check logic. When user-space supplies a valid, self-consistent
> > frame with an explicit size that older kernels accepted, and the updated
> > logic rejects it, that is a userspace regression.
> Sorry, I don't get your version of ABI.
>
> Eventually, XRSTOR will execute to restore the state. The kernel tracks
> each task's requested feature bitmap (RFBM), which determines the size.
> As described in SDM Vol. 1, Section 13.13:
>
> An execution of an instruction in the XSAVE feature set may access
> any byte of any state component on which that execution operates even
> when saving a state component is omitted ...
>
> Given this, the kernel must ensure the backing memory is valid and
> sufficient. So this consistency does matter.
We need to add one more paragraph to have the full context:
Each instruction in the XSAVE feature set operates on a set of
XSAVE-managed state components. The specific set of components on
which an instruction operates is determined by the values of XCR0,
the IA32_XSS MSR, EDX:EAX, and (for XRSTOR and XRSTORS) the XSAVE
header.
Section 13.4 provides the details necessary to determine the
location of each state component for any execution of an
instruction in the XSAVE feature set. An execution of an
instruction in the XSAVE feature set may access any byte of any
state component on which that execution operates even when saving
a state component is omitted because it is in its initial
configuration; when restoring a state component to its initial
configuration; or when XFD is enabled for the state components
(see Section 13.14).
I interpret this to mean that XRSTOR will not access memory for a component
if its corresponding bit is clear in the XSAVE header.
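For reference, the register-state outcome of a standard-form XRSTOR can be written down as a short C sketch. The names struct xrstor_plan and xrstor_standard_plan() are invented for this example. It models only which components end up loaded from memory and which are set to their initial configuration; it says nothing about which bytes the CPU may touch while doing so, and it leaves out the MXCSR special case and the #GP raised for XSTATE_BV bits that are not enabled in XCR0.

```c
#include <assert.h>
#include <stdint.h>

/* Outcome of a standard-form XRSTOR, one bit per state component. */
struct xrstor_plan {
	uint64_t loaded;	/* restored from the XSAVE area */
	uint64_t initialized;	/* set to the initial configuration */
};

/*
 * RFBM is XCR0 AND the EDX:EAX instruction mask. Within RFBM, a
 * component whose XSTATE_BV bit is set is loaded from memory and a
 * component whose bit is clear is initialized. Components outside
 * RFBM are left untouched.
 */
static struct xrstor_plan xrstor_standard_plan(uint64_t xcr0,
					       uint64_t edx_eax_mask,
					       uint64_t xstate_bv)
{
	uint64_t rfbm = xcr0 & edx_eax_mask;
	struct xrstor_plan plan = {
		.loaded = rfbm & xstate_bv,
		.initialized = rfbm & ~xstate_bv,
	};

	/* A component is never both loaded and initialized. */
	assert((plan.loaded & plan.initialized) == 0);
	return plan;
}
```

For example, with XCR0 = 0x7 (x87, SSE, AVX), a full instruction mask and XSTATE_BV = 0x3, the x87 and SSE components are loaded and the AVX component is initialized.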
However, my point was not about the CPU specification, but about the
kernel ABI. The reverted change broke existing user-space applications
without justifying an ABI regression. Even if xrstor were to trigger a
fault, the kernel handles it properly, so there is no real issue there.
It feels like we are trying to justify the change after the fact. The
rule is: "we don't break user-space". As usual, there are no rules
without exceptions, but any exception should be explicitly analyzed
considering all side effects. According to the commit message of the
reverted commit, no such analysis was done.
Thanks,
Andrei
* Re: [PATCH] Revert "x86/fpu: Refine and simplify the magic number check during signal return"
2026-04-29 7:26 ` Chang S. Bae
2026-04-29 16:44 ` Andrei Vagin
@ 2026-05-01 18:44 ` Andrei Vagin
2026-05-01 19:13 ` Chang S. Bae
1 sibling, 1 reply; 13+ messages in thread
From: Andrei Vagin @ 2026-05-01 18:44 UTC (permalink / raw)
To: Chang S. Bae
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
linux-kernel, criu, x86, stable
On Wed, Apr 29, 2026 at 12:26 AM Chang S. Bae <chang.seok.bae@intel.com> wrote:
>
> On 4/28/2026 5:06 PM, Andrei Vagin wrote:
> >
> > The reverted commit broke applications that construct signal frames in
> > userspace (such as CRIU and gVisor) if the frame's xstate size is
> > smaller than the kernel's fpstate->user_size.
>
> In the extended state area, the sigframe embeds the hardware-defined
> XSAVE format. If CPU A and CPU B support different XSTATE features, the
> layout (size and offsets) differ across systems. However, within a
> system, the layout is invariant. Userspace can query CPUID to obtain the
> exact offset and sizes, which effectively defines the ABI.
I've been thinking about this more, and I believe the claim that XSAVE
offsets can differ across CPUs for the same feature is inaccurate. The
XSAVE standard format uses fixed offsets specifically to allow migration
between different CPU generations. If a feature exists on both the
source and destination CPUs, its data resides at the exact same byte
offset.
This design is what makes virtual machine migration possible.
Hypervisors cannot "translate" XSTATE data hidden in guest memory, so they
rely on these invariant offsets. The CRIU case is very similar: when a
process is in a signal handler, its state is saved on the stack as an
opaque block of memory.
If a future CPU uses different offsets for existing features, it would break
VM migration. Backward compatibility in this area should be a requirement
even for hardware. If we look at existing CPUs, they follow this principle.
Thanks,
Andrei
* Re: [PATCH] Revert "x86/fpu: Refine and simplify the magic number check during signal return"
2026-05-01 18:44 ` Andrei Vagin
@ 2026-05-01 19:13 ` Chang S. Bae
2026-05-01 20:50 ` Andrei Vagin
0 siblings, 1 reply; 13+ messages in thread
From: Chang S. Bae @ 2026-05-01 19:13 UTC (permalink / raw)
To: Andrei Vagin
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
linux-kernel, criu, x86, stable
On 5/1/2026 11:44 AM, Andrei Vagin wrote:
>
> I've been thinking about this more, and I believe the claim that XSAVE
> offsets can differ across CPUs for the same feature is inaccurate. The
> XSAVE standard format uses fixed offsets specifically to allow migration
> between different CPU generations. If a feature exists on both the
> source and destination CPUs, its data resides at the exact same byte
> offset.
There is commit ba386777a30b ("x86/elf: Add a new FPU buffer layout info
to x86 core files") for this reason:
...
The XSAVE layouts of modern AMD and Intel CPUs differ, especially
since Memory Protection Keys and the AVX-512 features have been
inculcated into the AMD CPUs.
Since AMD never adopted (and hence never left room in the XSAVE
layout for) the Intel MPX feature, tools like GDB had assumed a
fixed XSAVE layout matching that of Intel (based on the XCR0 mask).
Hence, core dumps from AMD CPUs didn't match the known size for the
XCR0 mask. This resulted in GDB and other tools not being able to
access the values of the AVX-512 and PKRU registers on AMD CPUs.
...
Thanks,
Chang
* Re: [PATCH] Revert "x86/fpu: Refine and simplify the magic number check during signal return"
2026-05-01 19:13 ` Chang S. Bae
@ 2026-05-01 20:50 ` Andrei Vagin
2026-05-01 21:04 ` Chang S. Bae
0 siblings, 1 reply; 13+ messages in thread
From: Andrei Vagin @ 2026-05-01 20:50 UTC (permalink / raw)
To: Chang S. Bae
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
linux-kernel, criu, x86, stable
On Fri, May 1, 2026 at 12:13 PM Chang S. Bae <chang.seok.bae@intel.com> wrote:
>
> On 5/1/2026 11:44 AM, Andrei Vagin wrote:
> >
> > I've been thinking about this more, and I believe the claim that XSAVE
> > offsets can differ across CPUs for the same feature is inaccurate. The
> > XSAVE standard format uses fixed offsets specifically to allow migration
> > between different CPU generations. If a feature exists on both the
> > source and destination CPUs, its data resides at the exact same byte
> > offset.
>
> There is commit ba386777a30b ("x86/elf: Add a new FPU buffer layout info
> to x86 core files") for this reason:
>
> ...
> The XSAVE layouts of modern AMD and Intel CPUs differ, especially
> since Memory Protection Keys and the AVX-512 features have been
> inculcated into the AMD CPUs.
>
> Since AMD never adopted (and hence never left room in the XSAVE
> layout for) the Intel MPX feature, tools like GDB had assumed a
> fixed XSAVE layout matching that of Intel (based on the XCR0 mask).
>
> Hence, core dumps from AMD CPUs didn't match the known size for the
> XCR0 mask. This resulted in GDB and other tools not being able to
> access the values of the AVX-512 and PKRU registers on AMD CPUs.
> ...
This is a different case: here, we have two different CPU vendors where XSAVE
layouts differ. The XSAVE layout itself is not the only reason why migration
between Intel and AMD cannot work reliably.
Thanks,
Andrei
* Re: [PATCH] Revert "x86/fpu: Refine and simplify the magic number check during signal return"
2026-05-01 20:50 ` Andrei Vagin
@ 2026-05-01 21:04 ` Chang S. Bae
2026-05-01 21:42 ` Andrei Vagin
0 siblings, 1 reply; 13+ messages in thread
From: Chang S. Bae @ 2026-05-01 21:04 UTC (permalink / raw)
To: Andrei Vagin
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
linux-kernel, criu, x86, stable
On 5/1/2026 1:50 PM, Andrei Vagin wrote:
>
> This is a different case: here, we have two different CPU vendors where XSAVE
> layouts differ. The XSAVE layout itself is not the only reason why migration
> between Intel and AMD cannot work reliably.
When saying CPU A and B, I didn't intend the same vendor but x86 in general.
* Re: [PATCH] Revert "x86/fpu: Refine and simplify the magic number check during signal return"
2026-05-01 21:04 ` Chang S. Bae
@ 2026-05-01 21:42 ` Andrei Vagin
2026-05-02 19:23 ` Chang S. Bae
0 siblings, 1 reply; 13+ messages in thread
From: Andrei Vagin @ 2026-05-01 21:42 UTC (permalink / raw)
To: Chang S. Bae
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
linux-kernel, criu, x86, stable
On Fri, May 1, 2026 at 2:04 PM Chang S. Bae <chang.seok.bae@intel.com> wrote:
>
> On 5/1/2026 1:50 PM, Andrei Vagin wrote:
> >
> > This is a different case: here, we have two different CPU vendors where XSAVE
> > layouts differ. The XSAVE layout itself is not the only reason why migration
> > between Intel and AMD cannot work reliably.
> When saying CPU A and B, I didn't intend the same vendor but x86 in general.
My point is that the reverted change broke a significant, real-life use
case that the hardware was explicitly designed to support.
It is the responsibility of C/R tooling to ensure the migration target
is compatible with the source. Enforcing a magic check based on a fixed
offset does not provide additional security. The kernel must be prepared
to handle "trash" data in the userspace xsave area and manage any
exceptions triggered by the xrstor instruction.
Thanks,
Andrei
* Re: [PATCH] Revert "x86/fpu: Refine and simplify the magic number check during signal return"
2026-05-01 21:42 ` Andrei Vagin
@ 2026-05-02 19:23 ` Chang S. Bae
0 siblings, 0 replies; 13+ messages in thread
From: Chang S. Bae @ 2026-05-02 19:23 UTC (permalink / raw)
To: Andrei Vagin
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
linux-kernel, criu, x86, stable
On 5/1/2026 2:42 PM, Andrei Vagin wrote:
>
> My point is that the reverted change broke a significant, real-life use
> case that the hardware was explicitly designed to support.
>
> It is the responsibility of C/R tooling to ensure the migration target
> is compatible with the source. Enforcing a magic check based on a fixed
> offset does not provide additional security. The kernel must be prepared
> to handle "trash" data in the userspace xsave area and manage any
> exceptions triggered by the xrstor instruction.
It looks like this behavior has been in place since c37b5efea43f ("x86,
xsave: save/restore the extended state context in sigframe"). With the
sanity check, userspace can modify fx_sw->xstate_size and
fx_sw->xfeatures (independently).
But, it seems there is no consistency check between the two. For
example, the size alone could be set to an arbitrary value within the
valid range, without matching xfeatures.
If userspace sets an inconsistent size vs. xfeatures, maybe zeroing out
the garbage could be an option, which I expect would still be compatible
with the portability model.
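One possible shape for such a consistency check, as a hypothetical C sketch: required_xstate_size() and size_covers_xfeatures() are names invented here and exist in neither the kernel nor this thread. The component table is assumed to describe the running CPU (for instance as read from CPUID leaf 0xD), and what to do on a mismatch (zeroing, rejecting, or something else) is left open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_XFEATURES   64
#define FXSAVE_SIZE     512u	/* legacy x87/SSE area */
#define XSTATE_HDR_SIZE 64u	/* XSAVE header */

/* Size and standard-format offset of one XSAVE state component. */
struct xcomp {
	uint32_t offset;
	uint32_t size;
};

/*
 * Smallest standard-format buffer that can hold every component named
 * in 'xfeatures': the end of the highest component, and never less than
 * the legacy area plus the XSAVE header.
 */
static uint32_t required_xstate_size(uint64_t xfeatures,
				     const struct xcomp *comp)
{
	uint32_t need = FXSAVE_SIZE + XSTATE_HDR_SIZE;

	assert(comp != NULL);
	for (int i = 2; i < MAX_XFEATURES; i++) {
		uint32_t end;

		if (!(xfeatures & (1ULL << i)))
			continue;
		end = comp[i].offset + comp[i].size;
		if (end > need)
			need = end;
	}
	return need;
}

/* True when the declared size is large enough for the declared features. */
static bool size_covers_xfeatures(uint32_t xstate_size, uint64_t xfeatures,
				  const struct xcomp *comp)
{
	return xstate_size >= required_xstate_size(xfeatures, comp);
}
```

With an illustrative table that places AVX at offset 576 (256 bytes) and PKRU at offset 2688 (8 bytes), a frame declaring 832 bytes covers xfeatures 0x7 but not 0x207.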
It's still not entirely clear to me whether your claimed portability was
considered in the original sigframe design. If so, this should be
documented more clearly (e.g., in headers and/or Documentation), along
with relevant selftests. I'd like to follow up on that.
That said, yes, this area ultimately falls under the rule of not
breaking userspace. So,
Acked-by: Chang S. Bae <chang.seok.bae@intel.com>
Thanks,
Chang