public inbox for linux-kernel@vger.kernel.org
* [PATCH v3] selftests/x86: Fix sysret_rip assertion failure on FRED systems
@ 2026-03-26  9:44 Yi Lai
  2026-03-26 22:06 ` Andy Lutomirski
  0 siblings, 1 reply; 13+ messages in thread
From: Yi Lai @ 2026-03-26  9:44 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	Andrew Cooper, Xin Li, x86, hpa, Shuah Khan, linux-kernel,
	linux-kselftest, yi1.lai, yi1.lai

The existing 'sysret_rip' selftest asserts that 'regs->r11 ==
regs->flags'. This check relies on the behavior of the SYSCALL
instruction on legacy x86_64, which saves 'RFLAGS' into 'R11'.

However, on systems with FRED (Flexible Return and Event Delivery)
enabled, instead of using registers, all state is saved onto the stack.
Consequently, 'R11' retains its userspace value, causing the assertion
to fail.

Fix this by detecting if FRED is enabled and skipping the register
assertion in that case. The detection is done by checking if the RPL
bits of the GS selector are preserved after a hardware exception.
IDT (via IRET) clears the RPL bits of NULL selectors, while FRED (via
ERETU) preserves them.

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Yi Lai <yi1.lai@intel.com>
---
v3:
- Move is_fred_enabled() to helpers.h for other x86 selftests to use.
  Rename empty_handler to fred_handler to avoid symbol conflicts.

v2:
- Replaced CPUID check with a runtime probe using INT3 and GS RPL
  preservation to robustly detect active FRED usage (Suggested by
  Andrew Cooper).

 tools/testing/selftests/x86/helpers.h    | 34 ++++++++++++++++++++++++
 tools/testing/selftests/x86/sysret_rip.c | 12 ++++++---
 2 files changed, 43 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/x86/helpers.h b/tools/testing/selftests/x86/helpers.h
index 4c747a1278d9..4d09ed97aaac 100644
--- a/tools/testing/selftests/x86/helpers.h
+++ b/tools/testing/selftests/x86/helpers.h
@@ -4,6 +4,7 @@
 
 #include <signal.h>
 #include <string.h>
+#include <stdbool.h>
 
 #include <asm/processor-flags.h>
 
@@ -50,4 +51,37 @@ static inline void clearhandler(int sig)
 		ksft_exit_fail_msg("sigaction failed");
 }
 
+static inline void fred_handler(int sig, siginfo_t *info, void *ctx_void)
+{
+}
+
+static inline bool is_fred_enabled(void)
+{
+	unsigned short gs_val;
+
+	sethandler(SIGTRAP, fred_handler, 0);
+
+	/*
+	 * Distinguish IDT and FRED mode by loading GS with a non-zero RPL and
+	 * triggering an exception:
+	 * IDT (IRET) clears RPL bits of NULL selectors.
+	 * FRED (ERETU) preserves them.
+	 *
+	 * If GS is loaded with 3 (Index=0, RPL=3), trigger an exception:
+	 * IDT should restore GS as 0.
+	 * FRED should preserve GS as 3.
+	 */
+	asm volatile (
+		"mov %[rpl3], %%gs\n\t"
+		"int3\n\t"
+		"mov %%gs, %[res]"
+		: [res] "=r" (gs_val)
+		: [rpl3] "r" (3)
+	);
+
+	clearhandler(SIGTRAP);
+
+	return gs_val == 3;
+}
+
 #endif /* __SELFTESTS_X86_HELPERS_H */
diff --git a/tools/testing/selftests/x86/sysret_rip.c b/tools/testing/selftests/x86/sysret_rip.c
index 2e423a335e1c..30b195266779 100644
--- a/tools/testing/selftests/x86/sysret_rip.c
+++ b/tools/testing/selftests/x86/sysret_rip.c
@@ -64,9 +64,15 @@ static void sigusr1(int sig, siginfo_t *info, void *ctx_void)
 	ctx->uc_mcontext.gregs[REG_RIP] = rip;
 	ctx->uc_mcontext.gregs[REG_RCX] = rip;
 
-	/* R11 and EFLAGS should already match. */
-	assert(ctx->uc_mcontext.gregs[REG_EFL] ==
-	       ctx->uc_mcontext.gregs[REG_R11]);
+	/*
+	 * SYSCALL works differently on FRED: it does not save RIP and RFLAGS
+	 * to RCX and R11.
+	 */
+	if (!is_fred_enabled()) {
+		/* R11 and EFLAGS should already match. */
+		assert(ctx->uc_mcontext.gregs[REG_EFL] ==
+		       ctx->uc_mcontext.gregs[REG_R11]);
+	}
 
 	sethandler(SIGSEGV, sigsegv_for_sigreturn_test, SA_RESETHAND);
 }
-- 
2.43.0



* Re: [PATCH v3] selftests/x86: Fix sysret_rip assertion failure on FRED systems
  2026-03-26  9:44 [PATCH v3] selftests/x86: Fix sysret_rip assertion failure on FRED systems Yi Lai
@ 2026-03-26 22:06 ` Andy Lutomirski
  2026-03-27 12:33   ` Peter Zijlstra
  0 siblings, 1 reply; 13+ messages in thread
From: Andy Lutomirski @ 2026-03-26 22:06 UTC (permalink / raw)
  To: Yi Lai, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, Andrew Cooper, Xin Li, the arch/x86 maintainers,
	H. Peter Anvin, Shuah Khan, Linux Kernel Mailing List,
	linux-kselftest, yi1.lai



On Thu, Mar 26, 2026, at 2:44 AM, Yi Lai wrote:
> The existing 'sysret_rip' selftest asserts that 'regs->r11 ==
> regs->flags'. This check relies on the behavior of the SYSCALL
> instruction on legacy x86_64, which saves 'RFLAGS' into 'R11'.
>
> However, on systems with FRED (Flexible Return and Event Delivery)
> enabled, instead of using registers, all state is saved onto the stack.
> Consequently, 'R11' retains its userspace value, causing the assertion
> to fail.
>
> Fix this by detecting if FRED is enabled and skipping the register
> assertion in that case. The detection is done by checking if the RPL
> bits of the GS selector are preserved after a hardware exception.
> IDT (via IRET) clears the RPL bits of NULL selectors, while FRED (via
> ERETU) preserves them.
>

I don't really like this.  I think we have two credible choices:

1. Define the Linux ABI to be that, on FRED systems, SYSCALL preserves R11 and RCX on entry and exit.  And update the test to actually test this.

2. Define the Linux ABI to be what it has been for quite a few years: SYSCALL entry copies RFLAGS to R11 and RIP to RCX and SYSCALL exit preserves all registers.

I'm in favor of #2.  People love making new programming languages and runtimes and inline asm and, these days, vibe coded crap.  And it's *easier* to emit a SYSCALL and forget to tell the compiler / code generator that RCX and R11 are clobbered than it is to remember that they're clobbered.  And it's easy to test on FRED (well, not really, but it hopefully will be some day) and it's easy to publish one's code, and then everyone is a bit screwed when the resulting program crashes sometimes on non-FRED systems.  And it will be miserable to debug.

(It's *really* *really* easy to screw this up in a way that sort of works even on non-FRED: RCX and R11 are usually clobbered across function calls, so one can get into a situation in which one's generated code usually doesn't require that SYSCALL preserve one of these registers until an inlining decision changes or some code gets reordered, and then it will start failing.  And making the failure depend on hardware details is just nasty.)

So I think we should add the ~2 lines of code to fix the SYSCALL entry on FRED to match non-FRED.

--Andy


* Re: [PATCH v3] selftests/x86: Fix sysret_rip assertion failure on FRED systems
  2026-03-26 22:06 ` Andy Lutomirski
@ 2026-03-27 12:33   ` Peter Zijlstra
  2026-03-31  2:21     ` Lai, Yi
  0 siblings, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2026-03-27 12:33 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Yi Lai, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, Andrew Cooper, Xin Li, the arch/x86 maintainers,
	H. Peter Anvin, Shuah Khan, Linux Kernel Mailing List,
	linux-kselftest, yi1.lai

On Thu, Mar 26, 2026 at 03:06:05PM -0700, Andy Lutomirski wrote:
> 
> 
> On Thu, Mar 26, 2026, at 2:44 AM, Yi Lai wrote:
> > The existing 'sysret_rip' selftest asserts that 'regs->r11 ==
> > regs->flags'. This check relies on the behavior of the SYSCALL
> > instruction on legacy x86_64, which saves 'RFLAGS' into 'R11'.
> >
> > However, on systems with FRED (Flexible Return and Event Delivery)
> > enabled, instead of using registers, all state is saved onto the stack.
> > Consequently, 'R11' retains its userspace value, causing the assertion
> > to fail.
> >
> > Fix this by detecting if FRED is enabled and skipping the register
> > assertion in that case. The detection is done by checking if the RPL
> > bits of the GS selector are preserved after a hardware exception.
> > IDT (via IRET) clears the RPL bits of NULL selectors, while FRED (via
> > ERETU) preserves them.
> >
> 
> I don't really like this.  I think we have two credible choices:
> 
> 1. Define the Linux ABI to be that, on FRED systems, SYSCALL preserves
> R11 and RCX on entry and exit.  And update the test to actually test
> this.
> 
> 2. Define the Linux ABI to be what it has been for quite a few years:
> SYSCALL entry copies RFLAGS to R11 and RIP to RCX and SYSCALL exit
> preserves all registers.
> 
> I'm in favor of #2.  People love making new programming languages and
> runtimes and inline asm and, these days, vibe coded crap.  And it's
> *easier* to emit a SYSCALL and forget to tell the compiler / code
> generator that RCX and R11 are clobbered than it is to remember that
> they're clobbered.  And it's easy to test on FRED (well, not really,
> but it hopefully will be some day) and it's easy to publish one's
> code, and then everyone is a bit screwed when the resulting program
> crashes sometimes on non-FRED systems.  And it will be miserable to
> debug.
> 
> (It's *really* *really* easy to screw this up in a way that sort of
> works even on non-FRED: RCX and R11 are usually clobbered across
> function calls, so one can get into a situation in which one's
> generated code usually doesn't require that SYSCALL preserve one of
> these registers until an inlining decision changes or some code gets
> reordered, and then it will start failing.  And making the failure
> > depend on hardware details is just nasty.)
> 
> So I think we should add the ~2 lines of code to fix the SYSCALL entry
> on FRED to match non-FRED.

Yes; I'm afraid I have to concur. Preserving the clobber on entry for
FRED systems is by far the safest choice. 

Aside from this selftest, fancy debuggers and anything that can transfer
userspace state between machines might be 'surprised'.


* Re: [PATCH v3] selftests/x86: Fix sysret_rip assertion failure on FRED systems
  2026-03-27 12:33   ` Peter Zijlstra
@ 2026-03-31  2:21     ` Lai, Yi
  2026-03-31  6:03       ` Xin Li
  0 siblings, 1 reply; 13+ messages in thread
From: Lai, Yi @ 2026-03-31  2:21 UTC (permalink / raw)
  To: Peter Zijlstra, Xin Li
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, Andrew Cooper, Xin Li, the arch/x86 maintainers,
	H. Peter Anvin, Shuah Khan, Linux Kernel Mailing List,
	linux-kselftest, yi1.lai

On Fri, Mar 27, 2026 at 01:33:15PM +0100, Peter Zijlstra wrote:
> On Thu, Mar 26, 2026 at 03:06:05PM -0700, Andy Lutomirski wrote:
> > 
> > 
> > On Thu, Mar 26, 2026, at 2:44 AM, Yi Lai wrote:
> > > The existing 'sysret_rip' selftest asserts that 'regs->r11 ==
> > > regs->flags'. This check relies on the behavior of the SYSCALL
> > > instruction on legacy x86_64, which saves 'RFLAGS' into 'R11'.
> > >
> > > However, on systems with FRED (Flexible Return and Event Delivery)
> > > enabled, instead of using registers, all state is saved onto the stack.
> > > Consequently, 'R11' retains its userspace value, causing the assertion
> > > to fail.
> > >
> > > Fix this by detecting if FRED is enabled and skipping the register
> > > assertion in that case. The detection is done by checking if the RPL
> > > bits of the GS selector are preserved after a hardware exception.
> > > IDT (via IRET) clears the RPL bits of NULL selectors, while FRED (via
> > > ERETU) preserves them.
> > >
> > 
> > I don't really like this.  I think we have two credible choices:
> > 
> > 1. Define the Linux ABI to be that, on FRED systems, SYSCALL preserves
> > R11 and RCX on entry and exit.  And update the test to actually test
> > this.
> > 
> > 2. Define the Linux ABI to be what it has been for quite a few years:
> > SYSCALL entry copies RFLAGS to R11 and RIP to RCX and SYSCALL exit
> > preserves all registers.
> > 
> > I'm in favor of #2.  People love making new programming languages and
> > runtimes and inline asm and, these days, vibe coded crap.  And it's
> > *easier* to emit a SYSCALL and forget to tell the compiler / code
> > generator that RCX and R11 are clobbered than it is to remember that
> > they're clobbered.  And it's easy to test on FRED (well, not really,
> > but it hopefully will be some day) and it's easy to publish one's
> > code, and then everyone is a bit screwed when the resulting program
> > crashes sometimes on non-FRED systems.  And it will be miserable to
> > debug.
> > 
> > (It's *really* *really* easy to screw this up in a way that sort of
> > works even on non-FRED: RCX and R11 are usually clobbered across
> > function calls, so one can get into a situation in which one's
> > generated code usually doesn't require that SYSCALL preserve one of
> > these registers until an inlining decision changes or some code gets
> > reordered, and then it will start failing.  And making the failure
> > > depend on hardware details is just nasty.)
> > 
> > So I think we should add the ~2 lines of code to fix the SYSCALL entry
> > on FRED to match non-FRED.
> 
> Yes; I'm afraid I have to concur. Preserving the clobber on entry for
> FRED systems is by far the safest choice. 
> 
> Aside from this selftest, fancy debuggers and anything that can transfer
> userspace state between machines might be 'surprised'.

Thanks Andy and Peter.

Indeed, making the selftest branch on FRED vs. non-FRED behavior
is not a good practice. The selftest should validate ABI consistency.

I agree with Andy's option #2, so this should be fixed in the FRED
syscall entry implementation.

Li Xin, does this direction look right to you? I can assist with
validation and keep the selftest aligned with the agreed ABI.

Regards,
Yi Lai



* Re: [PATCH v3] selftests/x86: Fix sysret_rip assertion failure on FRED systems
  2026-03-31  2:21     ` Lai, Yi
@ 2026-03-31  6:03       ` Xin Li
  2026-04-01  1:59         ` Xin Li
  0 siblings, 1 reply; 13+ messages in thread
From: Xin Li @ 2026-03-31  6:03 UTC (permalink / raw)
  To: Lai, Yi
  Cc: Peter Zijlstra, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, Andrew Cooper,
	the arch/x86 maintainers, H. Peter Anvin, Shuah Khan,
	Linux Kernel Mailing List, linux-kselftest, yi1.lai


>>>> The existing 'sysret_rip' selftest asserts that 'regs->r11 ==
>>>> regs->flags'. This check relies on the behavior of the SYSCALL
>>>> instruction on legacy x86_64, which saves 'RFLAGS' into 'R11'.
>>>> 
>>>> However, on systems with FRED (Flexible Return and Event Delivery)
>>>> enabled, instead of using registers, all state is saved onto the stack.
>>>> Consequently, 'R11' retains its userspace value, causing the assertion
>>>> to fail.
>>>> 
>>>> Fix this by detecting if FRED is enabled and skipping the register
>>>> assertion in that case. The detection is done by checking if the RPL
>>>> bits of the GS selector are preserved after a hardware exception.
>>>> IDT (via IRET) clears the RPL bits of NULL selectors, while FRED (via
>>>> ERETU) preserves them.
>>>> 
>>> 
>>> I don't really like this.  I think we have two credible choices:
>>> 
>>> 1. Define the Linux ABI to be that, on FRED systems, SYSCALL preserves
>>> R11 and RCX on entry and exit.  And update the test to actually test
>>> this.
>>> 
>>> 2. Define the Linux ABI to be what it has been for quite a few years:
>>> SYSCALL entry copies RFLAGS to R11 and RIP to RCX and SYSCALL exit
>>> preserves all registers.
>>> 
>>> I'm in favor of #2.  People love making new programming languages and
>>> runtimes and inline asm and, these days, vibe coded crap.  And it's
>>> *easier* to emit a SYSCALL and forget to tell the compiler / code
>>> generator that RCX and R11 are clobbered than it is to remember that
>>> they're clobbered.  And it's easy to test on FRED (well, not really,
>>> but it hopefully will be some day) and it's easy to publish one's
>>> code, and then everyone is a bit screwed when the resulting program
>>> crashes sometimes on non-FRED systems.  And it will be miserable to
>>> debug.
>>> 
>>> (It's *really* *really* easy to screw this up in a way that sort of
>>> works even on non-FRED: RCX and R11 are usually clobbered across
>>> function calls, so one can get into a situation in which one's
>>> generated code usually doesn't require that SYSCALL preserve one of
>>> these registers until an inlining decision changes or some code gets
>>> reordered, and then it will start failing.  And making the failure
>>> depend on hardware details is just nasty.)
>>> 
>>> So I think we should add the ~2 lines of code to fix the SYSCALL entry
>>> on FRED to match non-FRED.
>> 
>> Yes; I'm afraid I have to concur. Preserving the clobber on entry for
>> FRED systems is by far the safest choice. 
>> 
>> Aside from this selftest, fancy debuggers and anything that can transfer
>> userspace state between machines might be 'surprised'.
> 
> Thanks Andy and Peter.
> 
> Indeed, making the selftest branch on FRED vs. non-FRED behavior
> is not a good practice. The selftest should validate ABI consistency.
> 
> I agree with Andy's option #2, so this should be fixed in the FRED
> syscall entry implementation.
> 
> Li Xin, does this direction look right to you? I can assist with
> validation and keep the selftest aligned with the agreed ABI.
> 

Yes, consistency should take precedence over hardware-specific variations.

I would like to hear from Andrew Cooper and hpa before we do it.


* Re: [PATCH v3] selftests/x86: Fix sysret_rip assertion failure on FRED systems
  2026-03-31  6:03       ` Xin Li
@ 2026-04-01  1:59         ` Xin Li
  2026-04-01  2:48           ` H. Peter Anvin
  0 siblings, 1 reply; 13+ messages in thread
From: Xin Li @ 2026-04-01  1:59 UTC (permalink / raw)
  To: Lai, Yi
  Cc: Peter Zijlstra, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, Andrew Cooper,
	the arch/x86 maintainers, H. Peter Anvin, Shuah Khan,
	Linux Kernel Mailing List, linux-kselftest, yi1.lai



> On Mar 30, 2026, at 11:03 PM, Xin Li <xin@zytor.com> wrote:
> 
> 
>>>>> The existing 'sysret_rip' selftest asserts that 'regs->r11 ==
>>>>> regs->flags'. This check relies on the behavior of the SYSCALL
>>>>> instruction on legacy x86_64, which saves 'RFLAGS' into 'R11'.
>>>>> 
>>>>> However, on systems with FRED (Flexible Return and Event Delivery)
>>>>> enabled, instead of using registers, all state is saved onto the stack.
>>>>> Consequently, 'R11' retains its userspace value, causing the assertion
>>>>> to fail.
>>>>> 
>>>>> Fix this by detecting if FRED is enabled and skipping the register
>>>>> assertion in that case. The detection is done by checking if the RPL
>>>>> bits of the GS selector are preserved after a hardware exception.
>>>>> IDT (via IRET) clears the RPL bits of NULL selectors, while FRED (via
>>>>> ERETU) preserves them.
>>>>> 
>>>> 
>>>> I don't really like this.  I think we have two credible choices:
>>>> 
>>>> 1. Define the Linux ABI to be that, on FRED systems, SYSCALL preserves
>>>> R11 and RCX on entry and exit.  And update the test to actually test
>>>> this.
>>>> 
>>>> 2. Define the Linux ABI to be what it has been for quite a few years:
>>>> SYSCALL entry copies RFLAGS to R11 and RIP to RCX and SYSCALL exit
>>>> preserves all registers.
>>>> 
>>>> I'm in favor of #2.  People love making new programming languages and
>>>> runtimes and inline asm and, these days, vibe coded crap.  And it's
>>>> *easier* to emit a SYSCALL and forget to tell the compiler / code
>>>> generator that RCX and R11 are clobbered than it is to remember that
>>>> they're clobbered.  And it's easy to test on FRED (well, not really,
>>>> but it hopefully will be some day) and it's easy to publish one's
>>>> code, and then everyone is a bit screwed when the resulting program
>>>> crashes sometimes on non-FRED systems.  And it will be miserable to
>>>> debug.
>>>> 
>>>> (It's *really* *really* easy to screw this up in a way that sort of
>>>> works even on non-FRED: RCX and R11 are usually clobbered across
>>>> function calls, so one can get into a situation in which one's
>>>> generated code usually doesn't require that SYSCALL preserve one of
>>>> these registers until an inlining decision changes or some code gets
>>>> reordered, and then it will start failing.  And making the failure
>>>> depend on hardware details is just nasty.)
>>>> 
>>>> So I think we should add the ~2 lines of code to fix the SYSCALL entry
>>>> on FRED to match non-FRED.
>>> 
>>> Yes; I'm afraid I have to concur. Preserving the clobber on entry for
>>> FRED systems is by far the safest choice. 
>>> 
>>> Aside from this selftest, fancy debuggers and anything that can transfer
>>> userspace state between machines might be 'surprised'.
>> 
>> Thanks Andy and Peter.
>> 
>> Indeed, making the selftest branch on FRED vs. non-FRED behavior
>> is not a good practice. The selftest should validate ABI consistency.
>> 
>> I agree with Andy's option #2, so this should be fixed in the FRED
>> syscall entry implementation.
>> 
>> Li Xin, does this direction look right to you? I can assist with
>> validation and keep the selftest aligned with the agreed ABI.
>> 
> 
> Yes, consistency should take precedence over hardware-specific variations.
> 
> I would like to hear from Andrew Cooper and hpa before we do it.

Per Andy’s suggestion, the change would be:

diff --git a/arch/x86/entry/entry_fred.c b/arch/x86/entry/entry_fred.c
index 88c757ac8ccd..a19898747a2c 100644
--- a/arch/x86/entry/entry_fred.c
+++ b/arch/x86/entry/entry_fred.c
@@ -79,6 +79,9 @@ static __always_inline void fred_other(struct pt_regs *regs)
 {
 	/* The compiler can fold these conditions into a single test */
 	if (likely(regs->fred_ss.vector == FRED_SYSCALL && regs->fred_ss.l)) {
+		regs->cx = regs->ip;
+		regs->r11 = regs->flags;
+
 		regs->orig_ax = regs->ax;
 		regs->ax = -ENOSYS;
 		do_syscall_64(regs, regs->orig_ax);

This adds 4 extra MOVs on the hot path, but I don’t see that as a problem here.
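
As a rough userspace illustration of what the two added assignments do, the sketch below applies the same copy to a mock register frame. struct mock_regs and emulate_syscall_clobber are hypothetical names invented for this example; the real code operates on struct pt_regs inside fred_other().

```c
#include <stdint.h>

/* Hypothetical stand-in for the few struct pt_regs fields involved. */
struct mock_regs {
	uint64_t ip;	/* user RIP saved at entry */
	uint64_t flags;	/* user RFLAGS saved at entry */
	uint64_t cx;	/* user RCX as it was when SYSCALL executed */
	uint64_t r11;	/* user R11 as it was when SYSCALL executed */
};

/*
 * Make the saved frame look as if the legacy SYSCALL instruction had
 * run: RCX takes the saved RIP and R11 takes the saved RFLAGS. Each
 * assignment is one load plus one store, which is presumably where the
 * count of 4 MOVs comes from.
 */
static void emulate_syscall_clobber(struct mock_regs *regs)
{
	regs->cx = regs->ip;
	regs->r11 = regs->flags;
}
```

After this runs, the R11 == RFLAGS assertion in sysret_rip holds for the frame regardless of what userspace had in R11 beforehand.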








* Re: [PATCH v3] selftests/x86: Fix sysret_rip assertion failure on FRED systems
  2026-04-01  1:59         ` Xin Li
@ 2026-04-01  2:48           ` H. Peter Anvin
  2026-04-01 14:36             ` Xin Li
  0 siblings, 1 reply; 13+ messages in thread
From: H. Peter Anvin @ 2026-04-01  2:48 UTC (permalink / raw)
  To: Xin Li, Lai, Yi
  Cc: Peter Zijlstra, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, Andrew Cooper,
	the arch/x86 maintainers, Shuah Khan, Linux Kernel Mailing List,
	linux-kselftest, yi1.lai

On March 31, 2026 6:59:06 PM PDT, Xin Li <xin@zytor.com> wrote:
>
>
>> On Mar 30, 2026, at 11:03 PM, Xin Li <xin@zytor.com> wrote:
>> 
>> 
>>>>>> The existing 'sysret_rip' selftest asserts that 'regs->r11 ==
>>>>>> regs->flags'. This check relies on the behavior of the SYSCALL
>>>>>> instruction on legacy x86_64, which saves 'RFLAGS' into 'R11'.
>>>>>> 
>>>>>> However, on systems with FRED (Flexible Return and Event Delivery)
>>>>>> enabled, instead of using registers, all state is saved onto the stack.
>>>>>> Consequently, 'R11' retains its userspace value, causing the assertion
>>>>>> to fail.
>>>>>> 
>>>>>> Fix this by detecting if FRED is enabled and skipping the register
>>>>>> assertion in that case. The detection is done by checking if the RPL
>>>>>> bits of the GS selector are preserved after a hardware exception.
>>>>>> IDT (via IRET) clears the RPL bits of NULL selectors, while FRED (via
>>>>>> ERETU) preserves them.
>>>>>> 
>>>>> 
>>>>> I don't really like this.  I think we have two credible choices:
>>>>> 
>>>>> 1. Define the Linux ABI to be that, on FRED systems, SYSCALL preserves
>>>>> R11 and RCX on entry and exit.  And update the test to actually test
>>>>> this.
>>>>> 
>>>>> 2. Define the Linux ABI to be what it has been for quite a few years:
>>>>> SYSCALL entry copies RFLAGS to R11 and RIP to RCX and SYSCALL exit
>>>>> preserves all registers.
>>>>> 
>>>>> I'm in favor of #2.  People love making new programming languages and
>>>>> runtimes and inline asm and, these days, vibe coded crap.  And it's
>>>>> *easier* to emit a SYSCALL and forget to tell the compiler / code
>>>>> generator that RCX and R11 are clobbered than it is to remember that
>>>>> they're clobbered.  And it's easy to test on FRED (well, not really,
>>>>> but it hopefully will be some day) and it's easy to publish one's
>>>>> code, and then everyone is a bit screwed when the resulting program
>>>>> crashes sometimes on non-FRED systems.  And it will be miserable to
>>>>> debug.
>>>>> 
>>>>> (It's *really* *really* easy to screw this up in a way that sort of
>>>>> works even on non-FRED: RCX and R11 are usually clobbered across
>>>>> function calls, so one can get into a situation in which one's
>>>>> generated code usually doesn't require that SYSCALL preserve one of
>>>>> these registers until an inlining decision changes or some code gets
>>>>> reordered, and then it will start failing.  And making the failure
>>>>> depend on hardware details is just nasty.)
>>>>> 
>>>>> So I think we should add the ~2 lines of code to fix the SYSCALL entry
>>>>> on FRED to match non-FRED.
>>>> 
>>>> Yes; I'm afraid I have to concur. Preserving the clobber on entry for
>>>> FRED systems is by far the safest choice. 
>>>> 
>>>> Aside from this selftest, fancy debuggers and anything that can transfer
>>>> userspace state between machines might be 'surprised'.
>>> 
>>> Thanks Andy and Peter.
>>> 
>>> Indeed, making the selftest branch on FRED vs. non-FRED behavior
>>> is not a good practice. The selftest should validate ABI consistency.
>>> 
>>> I agree with Andy's option #2, so this should be fixed in the FRED
>>> syscall entry implementation.
>>> 
>>> Li Xin, does this direction look right to you? I can assist with
>>> validation and keep the selftest aligned with the agreed ABI.
>>> 
>> 
>> Yes, consistency should take precedence over hardware-specific variations.
>> 
>> I would like to hear from Andrew Cooper and hpa before we do it.
>
>Per Andy’s suggestion, the change would be:
>
>diff --git a/arch/x86/entry/entry_fred.c b/arch/x86/entry/entry_fred.c
>index 88c757ac8ccd..a19898747a2c 100644
>--- a/arch/x86/entry/entry_fred.c
>+++ b/arch/x86/entry/entry_fred.c
>@@ -79,6 +79,9 @@ static __always_inline void fred_other(struct pt_regs *regs)
> {
> 	/* The compiler can fold these conditions into a single test */
> 	if (likely(regs->fred_ss.vector == FRED_SYSCALL && regs->fred_ss.l)) {
>+		regs->cx = regs->ip;
>+		regs->r11 = regs->flags;
>+
> 		regs->orig_ax = regs->ax;
> 		regs->ax = -ENOSYS;
> 		do_syscall_64(regs, regs->orig_ax);
>
>It adds 4 extra MOVs on this hot path, but I don’t see it's a problem here.
>
>
>
>
>
>
>

We discussed this over a year ago, and at that point agreed that preserving the registers was the desired behavior. Why has this changed now?


* Re: [PATCH v3] selftests/x86: Fix sysret_rip assertion failure on FRED systems
  2026-04-01  2:48           ` H. Peter Anvin
@ 2026-04-01 14:36             ` Xin Li
  2026-04-01 17:54               ` H. Peter Anvin
  0 siblings, 1 reply; 13+ messages in thread
From: Xin Li @ 2026-04-01 14:36 UTC (permalink / raw)
  To: Anvin H. Peter
  Cc: Lai Yi, Zijlstra Peter, Lutomirski Andy, Gleixner Thomas,
	Molnar Ingo, Petkov Borislav, Hansen Dave, Cooper Andrew,
	the arch x86 maintainers, Khan Shuah, Kernel Mailing List Linux,
	linux-kselftest, yi1.lai


Thanks!
Xin

> On Mar 31, 2026, at 8:15 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> 
> On March 31, 2026 6:59:06 PM PDT, Xin Li <xin@zytor.com> wrote:
>> 
>> 
>>>> On Mar 30, 2026, at 11:03 PM, Xin Li <xin@zytor.com> wrote:
>>> 
>>> 
>>>>>>> The existing 'sysret_rip' selftest asserts that 'regs->r11 ==
>>>>>>> regs->flags'. This check relies on the behavior of the SYSCALL
>>>>>>> instruction on legacy x86_64, which saves 'RFLAGS' into 'R11'.
>>>>>>> 
>>>>>>> However, on systems with FRED (Flexible Return and Event Delivery)
>>>>>>> enabled, instead of using registers, all state is saved onto the stack.
>>>>>>> Consequently, 'R11' retains its userspace value, causing the assertion
>>>>>>> to fail.
>>>>>>> 
>>>>>>> Fix this by detecting if FRED is enabled and skipping the register
>>>>>>> assertion in that case. The detection is done by checking if the RPL
>>>>>>> bits of the GS selector are preserved after a hardware exception.
>>>>>>> IDT (via IRET) clears the RPL bits of NULL selectors, while FRED (via
>>>>>>> ERETU) preserves them.
>>>>>>> 
>>>>>> 
>>>>>> I don't really like this.  I think we have two credible choices:
>>>>>> 
>>>>>> 1. Define the Linux ABI to be that, on FRED systems, SYSCALL preserves
>>>>>> R11 and RCX on entry and exit.  And update the test to actually test
>>>>>> this.
>>>>>> 
>>>>>> 2. Define the Linux ABI to be what it has been for quite a few years:
>>>>>> SYSCALL entry copies RFLAGS to R11 and RIP to RCX and SYSCALL exit
>>>>>> preserves all registers.
>>>>>> 
>>>>>> I'm in favor of #2.  People love making new programming languages and
>>>>>> runtimes and inline asm and, these days, vibe coded crap.  And it's
>>>>>> *easier* to emit a SYSCALL and forget to tell the compiler / code
>>>>>> generator that RCX and R11 are clobbered than it is to remember that
>>>>>> they're clobbered.  And it's easy to test on FRED (well, not really,
>>>>>> but it hopefully will be some day) and it's easy to publish one's
>>>>>> code, and then everyone is a bit screwed when the resulting program
>>>>>> crashes sometimes on non-FRED systems.  And it will be miserable to
>>>>>> debug.
>>>>>> 
>>>>>> (It's *really* *really* easy to screw this up in a way that sort of
>>>>>> works even on non-FRED: RCX and R11 are usually clobbered across
>>>>>> function calls, so one can get into a situation in which one's
>>>>>> generated code usually doesn't require that SYSCALL preserve one of
>>>>>> these registers until an inlining decision changes or some code gets
>>>>>> reordered, and then it will start failing.  And making the failure
>>>>>> depend on hardware details is just nasty.)
>>>>>> 
>>>>>> So I think we should add the ~2 lines of code to fix the SYSCALL entry
>>>>>> on FRED to match non-FRED.
>>>>> 
>>>>> Yes; I'm afraid I have to concur. Preserving the clobber on entry for
>>>>> FRED systems is by far the safest choice.
>>>>> 
>>>>> Aside from this selftest, fancy debuggers and anything that can transfer
>>>>> userspace state between machines might be 'surprised'.
>>>> 
>>>> Thanks Andy and Peter.
>>>> 
>>>> Indeed, making the selftest branch on FRED vs. non-FRED behavior
>>>> is not a good practice. The selftest should validate ABI consistency.
>>>> 
>>>> I agree with Andy's option #2, so this should be fixed in the FRED
>>>> syscall entry implementation.
>>>> 
>>>> Li Xin, does this direction look right to you? I can assist with
>>>> validation and keep the selftest aligned with the agreed ABI.
>>>> 
>>> 
>>> Yes, consistency should take precedence over hardware-specific variations.
>>> 
>>> I would like to hear from Andrew Cooper and hpa before we do it.
>> 
>> Per Andy’s suggestion, the change would be:
>> 
>> diff --git a/arch/x86/entry/entry_fred.c b/arch/x86/entry/entry_fred.c
>> index 88c757ac8ccd..a19898747a2c 100644
>> --- a/arch/x86/entry/entry_fred.c
>> +++ b/arch/x86/entry/entry_fred.c
>> @@ -79,6 +79,9 @@ static __always_inline void fred_other(struct pt_regs *regs)
>> {
>>    /* The compiler can fold these conditions into a single test */
>>    if (likely(regs->fred_ss.vector == FRED_SYSCALL && regs->fred_ss.l)) {
>> +        regs->cx = regs->ip;
>> +        regs->r11 = regs->flags;
>> +
>>        regs->orig_ax = regs->ax;
>>        regs->ax = -ENOSYS;
>>        do_syscall_64(regs, regs->orig_ax);
>> 
>> It adds 4 extra MOVs on this hot path, but I don’t see that as a problem here.
> 
> We discussed this over a year ago, and at that point agreed that preserving the registers was the desired behavior. Why has this changed now?

Yes, that is technically cleaner.

The question is, is the RCX/R11 clobbering behavior an established architectural contract, or is it an implementation detail that software ignores?

I think Andy and Peter want to be on the safer side, which kind of assumes that this is established.


* Re: [PATCH v3] selftests/x86: Fix sysret_rip assertion failure on FRED systems
@ 2026-04-01 14:59 Xin Li
  2026-04-01 15:18 ` H. Peter Anvin
  0 siblings, 1 reply; 13+ messages in thread
From: Xin Li @ 2026-04-01 14:59 UTC (permalink / raw)
  To: Anvin H. Peter
  Cc: Yi Lai, Zijlstra Peter, Lutomirski Andy, Thomas Gleixner,
	Molnar Ingo, Petkov Borislav, Hansen Dave, Cooper Andrew,
	x86 maintainers the arch, Khan Shuah, Kernel Mailing List Linux,
	linux-kselftest, yi1.lai


>>>>>>> The existing 'sysret_rip' selftest asserts that 'regs->r11 ==
>>>>>>> regs->flags'. This check relies on the behavior of the SYSCALL
>>>>>>> instruction on legacy x86_64, which saves 'RFLAGS' into 'R11'.
>>>>>>> However, on systems with FRED (Flexible Return and Event Delivery)
>>>>>>> enabled, instead of using registers, all state is saved onto the stack.
>>>>>>> Consequently, 'R11' retains its userspace value, causing the assertion
>>>>>>> to fail.
>>>>>>> Fix this by detecting if FRED is enabled and skipping the register
>>>>>>> assertion in that case. The detection is done by checking if the RPL
>>>>>>> bits of the GS selector are preserved after a hardware exception.
>>>>>>> IDT (via IRET) clears the RPL bits of NULL selectors, while FRED (via
>>>>>>> ERETU) preserves them.
>>>>>> I don't really like this.  I think we have two credible choices:
>>>>>> 1. Define the Linux ABI to be that, on FRED systems, SYSCALL preserves
>>>>>> R11 and RCX on entry and exit.  And update the test to actually test
>>>>>> this.
>>>>>> 2. Define the Linux ABI to be what it has been for quite a few years:
>>>>>> SYSCALL entry copies RFLAGS to R11 and RIP to RCX and SYSCALL exit
>>>>>> preserves all registers.
>>>>>> I'm in favor of #2.  People love making new programming languages and
>>>>>> runtimes and inline asm and, these days, vibe coded crap.  And it's
>>>>>> *easier* to emit a SYSCALL and forget to tell the compiler / code
>>>>>> generator that RCX and R11 are clobbered than it is to remember that
>>>>>> they're clobbered.  And it's easy to test on FRED (well, not really,
>>>>>> but it hopefully will be some day) and it's easy to publish one's
>>>>>> code, and then everyone is a bit screwed when the resulting program
>>>>>> crashes sometimes on non-FRED systems.  And it will be miserable to
>>>>>> debug.
>>>>>> (It's *really* *really* easy to screw this up in a way that sort of
>>>>>> works even on non-FRED: RCX and R11 are usually clobbered across
>>>>>> function calls, so one can get into a situation in which one's
>>>>>> generated code usually doesn't require that SYSCALL preserve one of
>>>>>> these registers until an inlining decision changes or some code gets
>>>>>> reordered, and then it will start failing.  And making the failure
>>>>>> depend on hardware details is just nasty.)
>>>>>> So I think we should add the ~2 lines of code to fix the SYSCALL entry
>>>>>> on FRED to match non-FRED.
>>>>> Yes; I'm afraid I have to concur. Preserving the clobber on entry for
>>>>> FRED systems is by far the safest choice.
>>>>> Aside from this selftest, fancy debuggers and anything that can transfer
>>>>> userspace state between machines might be 'surprised'.
>>>> Thanks Andy and Peter.
>>>> Indeed, making the selftest branch on FRED vs. non-FRED behavior
>>>> is not a good practice. The selftest should validate ABI consistency.
>>>> I agree with Andy's option #2, so this should be fixed in the FRED
>>>> syscall entry implementation.
>>>> Li Xin, does this direction look right to you? I can assist with
>>>> validation and keep the selftest aligned with the agreed ABI.
>>> Yes, consistency should take precedence over hardware-specific variations.
>>> I would like to hear from Andrew Cooper and hpa before we do it.
>> Per Andy’s suggestion, the change would be:
>> diff --git a/arch/x86/entry/entry_fred.c b/arch/x86/entry/entry_fred.c
>> index 88c757ac8ccd..a19898747a2c 100644
>> --- a/arch/x86/entry/entry_fred.c
>> +++ b/arch/x86/entry/entry_fred.c
>> @@ -79,6 +79,9 @@ static __always_inline void fred_other(struct pt_regs *regs)
>> {
>> /* The compiler can fold these conditions into a single test */
>> if (likely(regs->fred_ss.vector == FRED_SYSCALL && regs->fred_ss.l)) {
>> +        regs->cx = regs->ip;
>> +        regs->r11 = regs->flags;
>> +
>>    regs->orig_ax = regs->ax;
>>    regs->ax = -ENOSYS;
>>    do_syscall_64(regs, regs->orig_ax);
>> It adds 4 extra MOVs on this hot path, but I don’t see that as a problem here.
> 
> We discussed this over a year ago, and at that point agreed that preserving the registers was the desired behavior. Why has this changed now?

Yes, that is technically simpler and cleaner.

The question brought up by Andy is, is the RCX/R11 clobbering behavior an established architectural contract, or is it an implementation detail that software ignores?

But both are hard to prove.

I think Andy and PeterZ want to be on the safer side, i.e., this clobbering behavior is established.


* Re: [PATCH v3] selftests/x86: Fix sysret_rip assertion failure on FRED systems
  2026-04-01 14:59 Xin Li
@ 2026-04-01 15:18 ` H. Peter Anvin
  0 siblings, 0 replies; 13+ messages in thread
From: H. Peter Anvin @ 2026-04-01 15:18 UTC (permalink / raw)
  To: Xin Li
  Cc: Yi Lai, Zijlstra Peter, Lutomirski Andy, Thomas Gleixner,
	Molnar Ingo, Petkov Borislav, Hansen Dave, Cooper Andrew,
	x86 maintainers the arch, Khan Shuah, Kernel Mailing List Linux,
	linux-kselftest, yi1.lai

On April 1, 2026 7:59:17 AM PDT, Xin Li <xin@zytor.com> wrote:
>
>>>>>>>> The existing 'sysret_rip' selftest asserts that 'regs->r11 ==
>>>>>>>> regs->flags'. This check relies on the behavior of the SYSCALL
>>>>>>>> instruction on legacy x86_64, which saves 'RFLAGS' into 'R11'.
>>>>>>>> However, on systems with FRED (Flexible Return and Event Delivery)
>>>>>>>> enabled, instead of using registers, all state is saved onto the stack.
>>>>>>>> Consequently, 'R11' retains its userspace value, causing the assertion
>>>>>>>> to fail.
>>>>>>>> Fix this by detecting if FRED is enabled and skipping the register
>>>>>>>> assertion in that case. The detection is done by checking if the RPL
>>>>>>>> bits of the GS selector are preserved after a hardware exception.
>>>>>>>> IDT (via IRET) clears the RPL bits of NULL selectors, while FRED (via
>>>>>>>> ERETU) preserves them.
>>>>>>> I don't really like this.  I think we have two credible choices:
>>>>>>> 1. Define the Linux ABI to be that, on FRED systems, SYSCALL preserves
>>>>>>> R11 and RCX on entry and exit.  And update the test to actually test
>>>>>>> this.
>>>>>>> 2. Define the Linux ABI to be what it has been for quite a few years:
>>>>>>> SYSCALL entry copies RFLAGS to R11 and RIP to RCX and SYSCALL exit
>>>>>>> preserves all registers.
>>>>>>> I'm in favor of #2.  People love making new programming languages and
>>>>>>> runtimes and inline asm and, these days, vibe coded crap.  And it's
>>>>>>> *easier* to emit a SYSCALL and forget to tell the compiler / code
>>>>>>> generator that RCX and R11 are clobbered than it is to remember that
>>>>>>> they're clobbered.  And it's easy to test on FRED (well, not really,
>>>>>>> but it hopefully will be some day) and it's easy to publish one's
>>>>>>> code, and then everyone is a bit screwed when the resulting program
>>>>>>> crashes sometimes on non-FRED systems.  And it will be miserable to
>>>>>>> debug.
>>>>>>> (It's *really* *really* easy to screw this up in a way that sort of
>>>>>>> works even on non-FRED: RCX and R11 are usually clobbered across
>>>>>>> function calls, so one can get into a situation in which one's
>>>>>>> generated code usually doesn't require that SYSCALL preserve one of
>>>>>>> these registers until an inlining decision changes or some code gets
>>>>>>> reordered, and then it will start failing.  And making the failure
>>>>>>> depend on hardware details is just nasty.)
>>>>>>> So I think we should add the ~2 lines of code to fix the SYSCALL entry
>>>>>>> on FRED to match non-FRED.
>>>>>> Yes; I'm afraid I have to concur. Preserving the clobber on entry for
>>>>>> FRED systems is by far the safest choice.
>>>>>> Aside from this selftest, fancy debuggers and anything that can transfer
>>>>>> userspace state between machines might be 'surprised'.
>>>>> Thanks Andy and Peter.
>>>>> Indeed, making the selftest branch on FRED vs. non-FRED behavior
>>>>> is not a good practice. The selftest should validate ABI consistency.
>>>>> I agree with Andy's option #2, so this should be fixed in the FRED
>>>>> syscall entry implementation.
>>>>> Li Xin, does this direction look right to you? I can assist with
>>>>> validation and keep the selftest aligned with the agreed ABI.
>>>> Yes, consistency should take precedence over hardware-specific variations.
>>>> I would like to hear from Andrew Cooper and hpa before we do it.
>>> Per Andy’s suggestion, the change would be:
>>> diff --git a/arch/x86/entry/entry_fred.c b/arch/x86/entry/entry_fred.c
>>> index 88c757ac8ccd..a19898747a2c 100644
>>> --- a/arch/x86/entry/entry_fred.c
>>> +++ b/arch/x86/entry/entry_fred.c
>>> @@ -79,6 +79,9 @@ static __always_inline void fred_other(struct pt_regs *regs)
>>> {
>>> /* The compiler can fold these conditions into a single test */
>>> if (likely(regs->fred_ss.vector == FRED_SYSCALL && regs->fred_ss.l)) {
>>> +        regs->cx = regs->ip;
>>> +        regs->r11 = regs->flags;
>>> +
>>>    regs->orig_ax = regs->ax;
>>>    regs->ax = -ENOSYS;
>>>    do_syscall_64(regs, regs->orig_ax);
>>> It adds 4 extra MOVs on this hot path, but I don’t see that as a problem here.
>> 
>> We discussed this over a year ago, and at that point agreed that preserving the registers was the desired behavior. Why has this changed now?
>
>Yes, that is technically simpler and cleaner.
>
>The question brought up by Andy is, is the RCX/R11 clobbering behavior an established architectural contract, or is it an implementation detail that software ignores?
>
>But both are hard to prove.
>
>I think Andy and PeterZ want to be on the safer side, i.e., this clobbering behavior is established.
>

I do see the point, especially once developers are mostly on FRED-capable hardware and their programs end up failing on legacy systems.

I'm more annoyed because we actually had this discussion once already.


* Re: [PATCH v3] selftests/x86: Fix sysret_rip assertion failure on FRED systems
  2026-04-01 14:36             ` Xin Li
@ 2026-04-01 17:54               ` H. Peter Anvin
  2026-04-02 13:21                 ` Andy Lutomirski
  0 siblings, 1 reply; 13+ messages in thread
From: H. Peter Anvin @ 2026-04-01 17:54 UTC (permalink / raw)
  To: Xin Li
  Cc: Lai Yi, Zijlstra Peter, Lutomirski Andy, Gleixner Thomas,
	Molnar Ingo, Petkov Borislav, Hansen Dave, Cooper Andrew,
	the arch x86 maintainers, Khan Shuah, Kernel Mailing List Linux,
	linux-kselftest, yi1.lai

On April 1, 2026 7:36:48 AM PDT, Xin Li <xin@zytor.com> wrote:
>
>Thanks!
>Xin
>
>> On Mar 31, 2026, at 8:15 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>> 
>> On March 31, 2026 6:59:06 PM PDT, Xin Li <xin@zytor.com> wrote:
>>> 
>>> 
>>>>> On Mar 30, 2026, at 11:03 PM, Xin Li <xin@zytor.com> wrote:
>>>> 
>>>> 
>>>>>>>> The existing 'sysret_rip' selftest asserts that 'regs->r11 ==
>>>>>>>> regs->flags'. This check relies on the behavior of the SYSCALL
>>>>>>>> instruction on legacy x86_64, which saves 'RFLAGS' into 'R11'.
>>>>>>>> 
>>>>>>>> However, on systems with FRED (Flexible Return and Event Delivery)
>>>>>>>> enabled, instead of using registers, all state is saved onto the stack.
>>>>>>>> Consequently, 'R11' retains its userspace value, causing the assertion
>>>>>>>> to fail.
>>>>>>>> 
>>>>>>>> Fix this by detecting if FRED is enabled and skipping the register
>>>>>>>> assertion in that case. The detection is done by checking if the RPL
>>>>>>>> bits of the GS selector are preserved after a hardware exception.
>>>>>>>> IDT (via IRET) clears the RPL bits of NULL selectors, while FRED (via
>>>>>>>> ERETU) preserves them.
>>>>>>>> 
>>>>>>> 
>>>>>>> I don't really like this.  I think we have two credible choices:
>>>>>>> 
>>>>>>> 1. Define the Linux ABI to be that, on FRED systems, SYSCALL preserves
>>>>>>> R11 and RCX on entry and exit.  And update the test to actually test
>>>>>>> this.
>>>>>>> 
>>>>>>> 2. Define the Linux ABI to be what it has been for quite a few years:
>>>>>>> SYSCALL entry copies RFLAGS to R11 and RIP to RCX and SYSCALL exit
>>>>>>> preserves all registers.
>>>>>>> 
>>>>>>> I'm in favor of #2.  People love making new programming languages and
>>>>>>> runtimes and inline asm and, these days, vibe coded crap.  And it's
>>>>>>> *easier* to emit a SYSCALL and forget to tell the compiler / code
>>>>>>> generator that RCX and R11 are clobbered than it is to remember that
>>>>>>> they're clobbered.  And it's easy to test on FRED (well, not really,
>>>>>>> but it hopefully will be some day) and it's easy to publish one's
>>>>>>> code, and then everyone is a bit screwed when the resulting program
>>>>>>> crashes sometimes on non-FRED systems.  And it will be miserable to
>>>>>>> debug.
>>>>>>> 
>>>>>>> (It's *really* *really* easy to screw this up in a way that sort of
>>>>>>> works even on non-FRED: RCX and R11 are usually clobbered across
>>>>>>> function calls, so one can get into a situation in which one's
>>>>>>> generated code usually doesn't require that SYSCALL preserve one of
>>>>>>> these registers until an inlining decision changes or some code gets
>>>>>>> reordered, and then it will start failing.  And making the failure
>>>>>>> depend on hardware details is just nasty.)
>>>>>>> 
>>>>>>> So I think we should add the ~2 lines of code to fix the SYSCALL entry
>>>>>>> on FRED to match non-FRED.
>>>>>> 
>>>>>> Yes; I'm afraid I have to concur. Preserving the clobber on entry for
>>>>>> FRED systems is by far the safest choice.
>>>>>> 
>>>>>> Aside from this selftest, fancy debuggers and anything that can transfer
>>>>>> userspace state between machines might be 'surprised'.
>>>>> 
>>>>> Thanks Andy and Peter.
>>>>> 
>>>>> Indeed, making the selftest branch on FRED vs. non-FRED behavior
>>>>> is not a good practice. The selftest should validate ABI consistency.
>>>>> 
>>>>> I agree with Andy's option #2, so this should be fixed in the FRED
>>>>> syscall entry implementation.
>>>>> 
>>>>> Li Xin, does this direction look right to you? I can assist with
>>>>> validation and keep the selftest aligned with the agreed ABI.
>>>>> 
>>>> 
>>>> Yes, consistency should take precedence over hardware-specific variations.
>>>> 
>>>> I would like to hear from Andrew Cooper and hpa before we do it.
>>> 
>>> Per Andy’s suggestion, the change would be:
>>> 
>>> diff --git a/arch/x86/entry/entry_fred.c b/arch/x86/entry/entry_fred.c
>>> index 88c757ac8ccd..a19898747a2c 100644
>>> --- a/arch/x86/entry/entry_fred.c
>>> +++ b/arch/x86/entry/entry_fred.c
>>> @@ -79,6 +79,9 @@ static __always_inline void fred_other(struct pt_regs *regs)
>>> {
>>>    /* The compiler can fold these conditions into a single test */
>>>    if (likely(regs->fred_ss.vector == FRED_SYSCALL && regs->fred_ss.l)) {
>>> +        regs->cx = regs->ip;
>>> +        regs->r11 = regs->flags;
>>> +
>>>        regs->orig_ax = regs->ax;
>>>        regs->ax = -ENOSYS;
>>>        do_syscall_64(regs, regs->orig_ax);
>>> 
>>> It adds 4 extra MOVs on this hot path, but I don’t see that as a problem here.
>> 
>> We discussed this over a year ago, and at that point agreed that preserving the registers was the desired behavior. Why has this changed now?
>
>Yes, that is technically cleaner.
>
>The question is, is the RCX/R11 clobbering behavior an established architectural contract, or is it an implementation detail that software ignores?
>
>I think Andy and Peter want to be on the safer side, which kind of assumes that this is established.
>

Clobbering is never an architectural contract; clobbering is always an option. However, I understand the concern about a developer who writes software on a FRED system that then breaks on a legacy system.
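
To make that failure mode concrete, here is a minimal sketch of a userspace wrapper around the SYSCALL instruction. The helper names `raw_syscall0` and `check_raw_getpid` are made up for illustration and do not come from any library. The part that matters is the clobber list: declaring `rcx` and `r11` is what keeps the code correct on legacy SYSCALL hardware, whereas leaving them out may appear to work on a system that happens to preserve both registers.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: issue a zero-argument system call directly. */
static long raw_syscall0(long nr)
{
#if defined(__x86_64__)
	long ret;

	/*
	 * On legacy hardware SYSCALL overwrites RCX (return RIP) and
	 * R11 (RFLAGS), so both must be listed as clobbers. Omitting
	 * them lets the compiler keep live values there across the call.
	 */
	__asm__ volatile("syscall"
			 : "=a"(ret)
			 : "a"(nr)
			 : "rcx", "r11", "memory");
	return ret;
#else
	/* Assumption: on other architectures fall back to libc. */
	return syscall(nr);
#endif
}

/* Usage: the raw wrapper should agree with libc's getpid(). */
static void check_raw_getpid(void)
{
	assert(raw_syscall0(SYS_getpid) == (long)getpid());
}
```

With the clobbers declared, the wrapper behaves the same whether or not the kernel ends up overwriting RCX and R11, which is the portable way to write it regardless of how this policy question is settled.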

Last time this came up, the policy we decided on was that a system that clobbers must do so in all cases (in order to not leak internal kernel state) but a system that can preserve (FRED or IDT-without-SYSCALL) may always do so.

I would prefer if we could defer this policy reversal for a bit. Since there is production hardware out now, I have been working on actually tuning the FRED code paths, and because the Linux kernel is so efficient, details matter in surprising ways. 

I *particularly* dislike clobbering registers on the way *into* the kernel, though. That needlessly makes them unavailable to a debugger, and one of the benefits of FRED is improving debug visibility in some specific cases.


* Re: [PATCH v3] selftests/x86: Fix sysret_rip assertion failure on FRED systems
  2026-04-01 17:54               ` H. Peter Anvin
@ 2026-04-02 13:21                 ` Andy Lutomirski
  2026-04-03 17:32                   ` H. Peter Anvin
  0 siblings, 1 reply; 13+ messages in thread
From: Andy Lutomirski @ 2026-04-02 13:21 UTC (permalink / raw)
  To: H. Peter Anvin, Xin Li
  Cc: Yi Lai, Peter Zijlstra (Intel), Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, Andrew Cooper,
	the arch/x86 maintainers, Khan Shuah, Linux Kernel Mailing List,
	linux-kselftest, yi1.lai



On Wed, Apr 1, 2026, at 10:54 AM, H. Peter Anvin wrote:
> On April 1, 2026 7:36:48 AM PDT, Xin Li <xin@zytor.com> wrote:
>>
>>Thanks!
>>Xin
>>
>>> On Mar 31, 2026, at 8:15 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>>> 
>>> On March 31, 2026 6:59:06 PM PDT, Xin Li <xin@zytor.com> wrote:
>>>> 
>>>> 
>>>>>> On Mar 30, 2026, at 11:03 PM, Xin Li <xin@zytor.com> wrote:
>>>>> 
>>>>> 
>>>>>>>>> The existing 'sysret_rip' selftest asserts that 'regs->r11 ==
>>>>>>>>> regs->flags'. This check relies on the behavior of the SYSCALL
>>>>>>>>> instruction on legacy x86_64, which saves 'RFLAGS' into 'R11'.
>>>>>>>>> 
>>>>>>>>> However, on systems with FRED (Flexible Return and Event Delivery)
>>>>>>>>> enabled, instead of using registers, all state is saved onto the stack.
>>>>>>>>> Consequently, 'R11' retains its userspace value, causing the assertion
>>>>>>>>> to fail.
>>>>>>>>> 
>>>>>>>>> Fix this by detecting if FRED is enabled and skipping the register
>>>>>>>>> assertion in that case. The detection is done by checking if the RPL
>>>>>>>>> bits of the GS selector are preserved after a hardware exception.
>>>>>>>>> IDT (via IRET) clears the RPL bits of NULL selectors, while FRED (via
>>>>>>>>> ERETU) preserves them.
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> I don't really like this.  I think we have two credible choices:
>>>>>>>> 
>>>>>>>> 1. Define the Linux ABI to be that, on FRED systems, SYSCALL preserves
>>>>>>>> R11 and RCX on entry and exit.  And update the test to actually test
>>>>>>>> this.
>>>>>>>> 
>>>>>>>> 2. Define the Linux ABI to be what it has been for quite a few years:
>>>>>>>> SYSCALL entry copies RFLAGS to R11 and RIP to RCX and SYSCALL exit
>>>>>>>> preserves all registers.
>>>>>>>> 
>>>>>>>> I'm in favor of #2.  People love making new programming languages and
>>>>>>>> runtimes and inline asm and, these days, vibe coded crap.  And it's
>>>>>>>> *easier* to emit a SYSCALL and forget to tell the compiler / code
>>>>>>>> generator that RCX and R11 are clobbered than it is to remember that
>>>>>>>> they're clobbered.  And it's easy to test on FRED (well, not really,
>>>>>>>> but it hopefully will be some day) and it's easy to publish one's
>>>>>>>> code, and then everyone is a bit screwed when the resulting program
>>>>>>>> crashes sometimes on non-FRED systems.  And it will be miserable to
>>>>>>>> debug.
>>>>>>>> 
>>>>>>>> (It's *really* *really* easy to screw this up in a way that sort of
>>>>>>>> works even on non-FRED: RCX and R11 are usually clobbered across
>>>>>>>> function calls, so one can get into a situation in which one's
>>>>>>>> generated code usually doesn't require that SYSCALL preserve one of
>>>>>>>> these registers until an inlining decision changes or some code gets
>>>>>>>> reordered, and then it will start failing.  And making the failure
>>>>>>>> depend on hardware details is just nasty.)
>>>>>>>> 
>>>>>>>> So I think we should add the ~2 lines of code to fix the SYSCALL entry
>>>>>>>> on FRED to match non-FRED.
>>>>>>> 
>>>>>>> Yes; I'm afraid I have to concur. Preserving the clobber on entry for
>>>>>>> FRED systems is by far the safest choice.
>>>>>>> 
>>>>>>> Aside from this selftest, fancy debuggers and anything that can transfer
>>>>>>> userspace state between machines might be 'surprised'.
>>>>>> 
>>>>>> Thanks Andy and Peter.
>>>>>> 
>>>>>> Indeed, making the selftest branch on FRED vs. non-FRED behavior
>>>>>> is not a good practice. The selftest should validate ABI consistency.
>>>>>> 
>>>>>> I agree with Andy's option #2, so this should be fixed in the FRED
>>>>>> syscall entry implementation.
>>>>>> 
>>>>>> Li Xin, does this direction look right to you? I can assist with
>>>>>> validation and keep the selftest aligned with the agreed ABI.
>>>>>> 
>>>>> 
>>>>> Yes, consistency should take precedence over hardware-specific variations.
>>>>> 
>>>>> I would like to hear from Andrew Cooper and hpa before we do it.
>>>> 
>>>> Per Andy’s suggestion, the change would be:
>>>> 
>>>> diff --git a/arch/x86/entry/entry_fred.c b/arch/x86/entry/entry_fred.c
>>>> index 88c757ac8ccd..a19898747a2c 100644
>>>> --- a/arch/x86/entry/entry_fred.c
>>>> +++ b/arch/x86/entry/entry_fred.c
>>>> @@ -79,6 +79,9 @@ static __always_inline void fred_other(struct pt_regs *regs)
>>>> {
>>>>    /* The compiler can fold these conditions into a single test */
>>>>    if (likely(regs->fred_ss.vector == FRED_SYSCALL && regs->fred_ss.l)) {
>>>> +        regs->cx = regs->ip;
>>>> +        regs->r11 = regs->flags;
>>>> +
>>>>        regs->orig_ax = regs->ax;
>>>>        regs->ax = -ENOSYS;
>>>>        do_syscall_64(regs, regs->orig_ax);
>>>> 
>>>> It adds 4 extra MOVs on this hot path, but I don’t see that as a problem here.
>>> 
>>> We discussed this over a year ago, and at that point agreed that preserving the registers was the desired behavior. Why has this changed now?
>>
>>Yes, that is technically cleaner.
>>
>>The question is, is the RCX/R11 clobbering behavior an established architectural contract, or is it an implementation detail that software ignores?
>>
>>I think Andy and Peter want to be on the safer side, which kind of assumes that this is established.
>>
>
> Clobbering is never an architectural contract; clobbering is always an
> option. However, I understand the concern about a developer who writes
> software on a FRED system that then breaks on a legacy system.
>
> Last time this came up, the policy we decided on was that a system that 
> clobbers must do so in all cases (in order to not leak internal kernel 
> state) but a system that can preserve (FRED or IDT-without-SYSCALL) may 
> always do so.
>
> I would prefer if we could defer this policy reversal for a bit. Since 
> there is production hardware out now, I have been working on actually 
> tuning the FRED code paths, and because the Linux kernel is so 
> efficient, details matter in surprising ways. 
>
> I *particularly* dislike clobbering registers on the way *into* the 
> kernel, though. That needlessly makes them unavailable to a debugger, 
> and one of the benefits of FRED is improving debug visibility in some 
> specific cases.

I don't really agree.  For quite a few years now, we've tried to make the exit path uniform, and we have this logic in syscall_64:

        /* SYSRET requires RCX == RIP and R11 == EFLAGS */
        if (unlikely(regs->cx != regs->ip || regs->r11 != regs->flags))
                return false;  /* fall back to IRET */

and this is not just an aesthetic thing -- it allows us to deliver signals and implement things like sigreturn without needing to track extra flag bits that mean "well, actually, we're in the syscall *code* but we're not returning from a syscall any more".  We had that a long time ago, and it was extremely difficult to understand and maintain.

So, on current kernels and kernels going back, I dunno, 10 years (I didn't try to dig out the git history, but I did write much of this code...), the semantics have been that we return to usermode in a state that matches pt_regs as precisely as we can arrange.  For the one case where we have a very longstanding divergence between entry and exit regs, we have orig_ax.

So it would be at least a fairly large maintainability regression to make the non-FRED SYSCALL behavior modify rcx and/or r11 on exit.

Now we have FRED.  Sure, it would be nice to remember the entry RCX and R11, but if we want to avoid the footgun where the effect of SYSCALL is different on FRED and non-FRED hardware, then we need the context after entry completes to have regs->cx == regs->ip and regs->r11 == regs->flags (or perhaps RCX and R11 differently poisoned, but that seems a bit silly).
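
As a rough illustration of that invariant, here is a userspace sketch. It assumes a cut-down stand-in for pt_regs; `struct mini_regs` and all of the function names are hypothetical and are not the kernel's. It models the legacy SYSCALL clobber, a FRED entry with and without the proposed two-line fixup, and the SYSRET compatibility check quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cut-down stand-in for the kernel's struct pt_regs. */
struct mini_regs {
	uint64_t ip;
	uint64_t flags;
	uint64_t cx;
	uint64_t r11;
};

/* Models what the SYSCALL instruction does on non-FRED hardware. */
static void legacy_syscall_entry(struct mini_regs *regs)
{
	regs->cx = regs->ip;
	regs->r11 = regs->flags;
}

/*
 * Models a FRED SYSCALL entry: hardware leaves RCX and R11 alone, so
 * the clobber only happens if software emulates it (the proposed fixup).
 */
static void fred_syscall_entry(struct mini_regs *regs, bool emulate_clobber)
{
	if (emulate_clobber) {
		regs->cx = regs->ip;
		regs->r11 = regs->flags;
	}
}

/* Mirrors the quoted check: SYSRET needs RCX == RIP and R11 == RFLAGS. */
static bool sysret_compatible(const struct mini_regs *regs)
{
	return regs->cx == regs->ip && regs->r11 == regs->flags;
}

/* Usage: with the fixup, both entry flavors leave identical state. */
static void check_uniform_entry(void)
{
	struct mini_regs legacy = { .ip = 0x401000, .flags = 0x246, .cx = 1, .r11 = 2 };
	struct mini_regs fred = legacy;

	legacy_syscall_entry(&legacy);
	fred_syscall_entry(&fred, true);

	assert(legacy.cx == fred.cx && legacy.r11 == fred.r11);
	assert(sysret_compatible(&legacy) && sysret_compatible(&fred));
}
```

Without the fixup, the FRED path in this model keeps the original RCX and R11, so the register state userspace sees after the system call depends on the hardware, which is the divergence being discussed.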

If we really want to have the option to fish the original rcx and r11 out from somewhere or perhaps to have extra-bonus-efficient many-parameter syscalls (I'm not sure why), then we could add orig_rcx and orig_r11.  Or we could invent a time machine and fix SYSCALL when it first came out.

--Andy


* Re: [PATCH v3] selftests/x86: Fix sysret_rip assertion failure on FRED systems
  2026-04-02 13:21                 ` Andy Lutomirski
@ 2026-04-03 17:32                   ` H. Peter Anvin
  0 siblings, 0 replies; 13+ messages in thread
From: H. Peter Anvin @ 2026-04-03 17:32 UTC (permalink / raw)
  To: Andy Lutomirski, Xin Li
  Cc: Yi Lai, Peter Zijlstra (Intel), Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, Andrew Cooper,
	the arch/x86 maintainers, Khan Shuah, Linux Kernel Mailing List,
	linux-kselftest, yi1.lai

On 2026-04-02 06:21, Andy Lutomirski wrote:
> 
> I don't really agree.  For quite a few years now, we've tried to make the exit path uniform, and we have this logic in syscall_64:
> 
>         /* SYSRET requires RCX == RIP and R11 == EFLAGS */
>         if (unlikely(regs->cx != regs->ip || regs->r11 != regs->flags))
>                 return false;  /* fall back to IRET */
> 
> and this is not just an aesthetic thing -- it allows us to deliver signals and implement things like sigreturn without needing to track extra flag bits that mean "well, actually, we're in the syscall *code* but we're not returning from a syscall any more".  We had that a long time ago, and it was extremely difficult to understand and maintain.
> 
> So, on current kernels and kernels going back, I dunno, 10 years (I didn't try to dig out the git history, but I did write much of this code...), the semantics have been that we return to usermode in a state that matches pt_regs as precisely as we can arrange.  For the one case where we have a very longstanding divergence between entry and exit regs, we have orig_ax.
> 
> So it would be at least a fairly large maintainability regression to make the non-FRED SYSCALL behavior modify rcx and/or r11 on exit.
> 
> Now we have FRED.  Sure, it would be nice to remember the entry RCX and R11, but if we want to avoid the footgun where the effect of SYSCALL is different on FRED and non-FRED hardware, then we need the context after entry completes to have regs->cx == regs->ip and regs->r11 == regs->flags (or perhaps RCX and R11 differently poisoned, but that seems a bit silly).
> 
> If we really want to have the option to fish the original rcx and r11 out from somewhere or perhaps to have extra-bonus-efficient many-parameter syscalls (I'm not sure why), then we could add orig_rcx and orig_r11.  Or we could invent a time machine and fix SYSCALL when it first came out.
> 

I certainly see what you're saying. I still don't like the idea of clobbering
registers "just because" for this reason and more...

	-hpa



end of thread, other threads:[~2026-04-03 18:05 UTC | newest]

Thread overview: 13+ messages
2026-03-26  9:44 [PATCH v3] selftests/x86: Fix sysret_rip assertion failure on FRED systems Yi Lai
2026-03-26 22:06 ` Andy Lutomirski
2026-03-27 12:33   ` Peter Zijlstra
2026-03-31  2:21     ` Lai, Yi
2026-03-31  6:03       ` Xin Li
2026-04-01  1:59         ` Xin Li
2026-04-01  2:48           ` H. Peter Anvin
2026-04-01 14:36             ` Xin Li
2026-04-01 17:54               ` H. Peter Anvin
2026-04-02 13:21                 ` Andy Lutomirski
2026-04-03 17:32                   ` H. Peter Anvin
  -- strict thread matches above, loose matches on Subject: below --
2026-04-01 14:59 Xin Li
2026-04-01 15:18 ` H. Peter Anvin
