public inbox for stable@vger.kernel.org
* [PATCH v1] x86/smp: Set up exception handling before cr4_init()
@ 2026-02-06 18:50 Xin Li (Intel)
  2026-02-06 19:19 ` Dave Hansen
  0 siblings, 1 reply; 9+ messages in thread
From: Xin Li (Intel) @ 2026-02-06 18:50 UTC (permalink / raw)
  To: linux-kernel
  Cc: tglx, mingo, bp, dave.hansen, x86, hpa, xin, peterz,
	andrew.cooper3, sohil.mehta, nikunj, thomas.lendacky, seanjc,
	stable

The current AP boot sequence initializes CR4 before setting up
exception handling.  With FRED enabled, however, CR4.FRED is set
prior to initializing the FRED configuration MSRs, introducing a
brief window where a triple fault could occur.  This wasn't
considered a problem, as the early boot code was carefully designed
to avoid triggering exceptions.  Moreover, if an exception does
occur at this stage, it's preferable for the CPU to triple fault
rather than risk a potential exploit.

However, under TDX or SEV-ES/SNP, printk() triggers a #VE or #VC,
so any logging during this small window results in a triple fault.

Swap the order of cr4_init() and cpu_init_exception_handling():
cr4_init() only involves reading from and writing to CR4, and
setting up exception handling does not depend on any specific
CR4 bits being set (arguably CR4.PAE, CR4.PSE and CR4.PGE are
related, but they are already set before start_secondary()
anyway).

Notably, this triple fault can still occur before FRED is enabled,
while the bringup IDT is in use, since it lacks a #VE handler for
TDX.

BTW, on 32-bit systems, loading CR3 with swapper_pg_dir is moved
ahead of cr4_init(), which appears to be harmless.

Fixes: 14619d912b65 ("x86/fred: FRED entry/exit and dispatch code")
Reported-by: Nikunj A Dadhania <nikunj@amd.com>
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
Cc: stable@vger.kernel.org # 6.9+
---
 arch/x86/kernel/smpboot.c | 41 ++++++++++++++++++++++++++++++++-------
 1 file changed, 34 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 5cd6950ab672..d2bf1acddb86 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -228,13 +228,6 @@ static void ap_calibrate_delay(void)
  */
 static void notrace __noendbr start_secondary(void *unused)
 {
-	/*
-	 * Don't put *anything* except direct CPU state initialization
-	 * before cpu_init(), SMP booting is too fragile that we want to
-	 * limit the things done here to the most necessary things.
-	 */
-	cr4_init();
-
 	/*
 	 * 32-bit specific. 64-bit reaches this code with the correct page
 	 * table established. Yet another historical divergence.
@@ -245,8 +238,37 @@ static void notrace __noendbr start_secondary(void *unused)
 		__flush_tlb_all();
 	}
 
+	/*
+	 * AP startup assembly code has setup the following before calling
+	 * start_secondary() on 64-bit:
+	 *
+	 * 1) CS set to __KERNEL_CS.
+	 * 2) CR3 switched to the init_top_pgt.
+	 * 3) CR4.PAE, CR4.PSE and CR4.PGE are set.
+	 * 4) GDT set to per-CPU gdt_page.
+	 * 5) ALL data segments set to the NULL descriptor.
+	 * 6) MSR_GS_BASE set to per-CPU offset.
+	 * 7) IDT set to bringup IDT.
+	 * 8) CR0 set to CR0_STATE.
+	 *
+	 * So it's ready to setup exception handling.
+	 */
 	cpu_init_exception_handling(false);
 
+	/*
+	 * Ensure bits set in cr4_pinned_bits are set in CR4.
+	 *
+	 * cr4_pinned_bits is a subset of cr4_pinned_mask, which includes
+	 * the following bits:
+	 *         X86_CR4_SMEP
+	 *         X86_CR4_SMAP
+	 *         X86_CR4_UMIP
+	 *         X86_CR4_FSGSBASE
+	 *         X86_CR4_CET
+	 *         X86_CR4_FRED
+	 */
+	cr4_init();
+
 	/*
 	 * Load the microcode before reaching the AP alive synchronization
 	 * point below so it is not part of the full per CPU serialized
@@ -272,6 +294,11 @@ static void notrace __noendbr start_secondary(void *unused)
 	 */
 	cpuhp_ap_sync_alive();
 
+	/*
+	 * Don't put *anything* except direct CPU state initialization
+	 * before cpu_init(), SMP booting is too fragile that we want to
+	 * limit the things done here to the most necessary things.
+	 */
 	cpu_init();
 	fpu__init_cpu();
 	rcutree_report_cpu_starting(raw_smp_processor_id());
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v1] x86/smp: Set up exception handling before cr4_init()
  2026-02-06 18:50 [PATCH v1] x86/smp: Set up exception handling before cr4_init() Xin Li (Intel)
@ 2026-02-06 19:19 ` Dave Hansen
  2026-02-08 19:02   ` Xin Li
  0 siblings, 1 reply; 9+ messages in thread
From: Dave Hansen @ 2026-02-06 19:19 UTC (permalink / raw)
  To: Xin Li (Intel), linux-kernel
  Cc: tglx, mingo, bp, dave.hansen, x86, hpa, peterz, andrew.cooper3,
	sohil.mehta, nikunj, thomas.lendacky, seanjc, stable

> +	/*
> +	 * AP startup assembly code has setup the following before calling
> +	 * start_secondary() on 64-bit:
> +	 *
> +	 * 1) CS set to __KERNEL_CS.
> +	 * 2) CR3 switched to the init_top_pgt.
> +	 * 3) CR4.PAE, CR4.PSE and CR4.PGE are set.
> +	 * 4) GDT set to per-CPU gdt_page.
> +	 * 5) ALL data segments set to the NULL descriptor.
> +	 * 6) MSR_GS_BASE set to per-CPU offset.
> +	 * 7) IDT set to bringup IDT.
> +	 * 8) CR0 set to CR0_STATE.
> +	 *
> +	 * So it's ready to setup exception handling.
> +	 */
>  	cpu_init_exception_handling(false);

This is fine because very little of that ^ is ever going to change. It's
not great that it duplicates the assembly logic, but it's good
documentation I guess.

> +	/*
> +	 * Ensure bits set in cr4_pinned_bits are set in CR4.
> +	 *
> +	 * cr4_pinned_bits is a subset of cr4_pinned_mask, which includes
> +	 * the following bits:
> +	 *         X86_CR4_SMEP
> +	 *         X86_CR4_SMAP
> +	 *         X86_CR4_UMIP
> +	 *         X86_CR4_FSGSBASE
> +	 *         X86_CR4_CET
> +	 *         X86_CR4_FRED
> +	 */
> +	cr4_init();

I'm not as big of a fan of this comment. The next pinned bit that gets
added will make this stale. Could we try to make this more timeless, please?

I'm also not sure I like the asymmetry of this between the boot and
secondary CPUs. On a boot CPU, CR4.SMEP will get set via
identify_boot_cpu() and eventually setup_smep(). On a secondary CPU,
it'll get set in cr4_init() and *not* in setup_smep().

This asymmetry is (I think) part of the root of the problem here
and part of how this bug came to be.

I really think the current pinning behavior is too invasive. It has zero
benefit to be pinning CR bits this early in bringup. It's only causing pain.

I _really_ think we need a defined per-cpu point where pinning comes
into effect. Marking the CPU online is one idea.

Thoughts?


* Re: [PATCH v1] x86/smp: Set up exception handling before cr4_init()
  2026-02-06 19:19 ` Dave Hansen
@ 2026-02-08 19:02   ` Xin Li
  2026-02-09  7:28     ` Sohil Mehta
  0 siblings, 1 reply; 9+ messages in thread
From: Xin Li @ 2026-02-08 19:02 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-kernel, tglx, mingo, bp, dave.hansen, x86, hpa, peterz,
	andrew.cooper3, sohil.mehta, nikunj, thomas.lendacky, seanjc,
	stable


>> + /*
>> +  * Ensure bits set in cr4_pinned_bits are set in CR4.
>> +  *
>> +  * cr4_pinned_bits is a subset of cr4_pinned_mask, which includes
>> +  * the following bits:
>> +  *         X86_CR4_SMEP
>> +  *         X86_CR4_SMAP
>> +  *         X86_CR4_UMIP
>> +  *         X86_CR4_FSGSBASE
>> +  *         X86_CR4_CET
>> +  *         X86_CR4_FRED
>> +  */
>> + cr4_init();
> 
> I'm not as big of a fan of this comment. The next pinned bit that gets
> added will make this stale. Could we try to make this more timeless, please?

Right, it’s still a potential problem.

> 
> I'm also not sure I like the asymmetry of this between the boot and
> secondary CPUs. On a boot CPU, CR4.SMEP will get set via
> identify_boot_cpu() and eventually setup_smep(). On a secondary CPU,
> it'll get set in cr4_init() and *not* in setup_smep().
> 
> This asymmetry is (I think) part of what the root of the problem is here
> and how this bug came to be.

Are you proposing a single CPU initialization function for both BSP and
APs?

It reminds me that tglx proposed the following change when I was fixing
a FRED boot order bug.

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 6de12b3c1b04..a4735d9b5a1d 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -2176,7 +2176,7 @@ static inline void tss_setup_io_bitmap(struct tss_struct *tss)
  * Setup everything needed to handle exceptions from the IDT, including the IST
  * exceptions which use paranoid_entry().
  */
-void cpu_init_exception_handling(void)
+void cpu_init_exception_handling(bool boot_cpu)
 {
 	struct tss_struct *tss = this_cpu_ptr(&cpu_tss_rw);
 	int cpu = raw_smp_processor_id();


So something like

	void cpu_init(bool boot_cpu)

and it gets called on both BSP and APs.

> 
> I really think the current pinning behavior is too invasive. It has zero
> benefit to be pinning CR bits this early in bringup. It's only causing pain.


Maybe someone can explain why it’s added like this before I find time to
dig into it.

I’m curious why cr4_init() is not part of the following cpu_init()? IOW,
why does it need to be called so early in the existing code?


> 
> I _really_ think we need a defined per-cpu point where pinning comes
> into effect. Marking the CPU online is one idea.
> 
> Thoughts?

It seems a good fit.  Just that {on,off}line() are not called on BSP (not
a real problem).

Question is who would work on it ;)?


* Re: [PATCH v1] x86/smp: Set up exception handling before cr4_init()
  2026-02-08 19:02   ` Xin Li
@ 2026-02-09  7:28     ` Sohil Mehta
  2026-02-09  8:16       ` Xin Li
  2026-02-10  0:18       ` Sohil Mehta
  0 siblings, 2 replies; 9+ messages in thread
From: Sohil Mehta @ 2026-02-09  7:28 UTC (permalink / raw)
  To: Xin Li, Dave Hansen
  Cc: linux-kernel, tglx, mingo, bp, dave.hansen, x86, hpa, peterz,
	andrew.cooper3, nikunj, thomas.lendacky, seanjc, stable

On 2/8/2026 11:02 AM, Xin Li wrote:

> 
> I’m curious why cr4_init() is not part of the following cpu_init()? IOW,
> why does it need to be called so early in the existing code?
> 

The name cpu_init() is misleading. Most of the pinned features don't get
initialized in cpu_init(). They are set up slightly later:

start_secondary()
  ap_starting()
    identify_secondary_cpu()
      identify_cpu()

The original reason for writing CR4 early on APs probably originates in
commit c7ad5ad297e6 ("x86/mm/64: Initialize CR4.PCIDE early"). Then,
when CR pinning was introduced, it was a global system-wide concept. So,
the pinned bits had to be programmed when the first write to CR4 happened.

> 
>>
>> I _really_ think we need a defined per-cpu point where pinning comes
>> into effect. Marking the CPU online is one idea.
>>
>> Thoughts?
> 

I think this approach could work. It should cover APs as well as hotplug
CPUs that come online later.

> It seems a good fit.  Just that {on,off}line() are not called on BSP (not
> a real problem).
> 

The BSP is marked online in boot_cpu_init()->set_cpu_online(). So, it
should be covered as well.

> Question is that who would work on it ;) ?

I think Dave already posted the patch for it here.
https://lore.kernel.org/lkml/02df7890-83c2-4047-8c88-46fbc6e0a892@intel.com/

I will test that out to confirm that it doesn't mess up some implicit
behavior.



* Re: [PATCH v1] x86/smp: Set up exception handling before cr4_init()
  2026-02-09  7:28     ` Sohil Mehta
@ 2026-02-09  8:16       ` Xin Li
  2026-02-09 15:14         ` Dave Hansen
  2026-02-10  0:18       ` Sohil Mehta
  1 sibling, 1 reply; 9+ messages in thread
From: Xin Li @ 2026-02-09  8:16 UTC (permalink / raw)
  To: Sohil Mehta
  Cc: Dave Hansen, linux-kernel, tglx, mingo, bp, dave.hansen, x86, hpa,
	peterz, andrew.cooper3, nikunj, thomas.lendacky, seanjc, stable


>> 
>> I’m curious why cr4_init() is not part of the following cpu_init()? IOW,
>> why does it need to be called so early in the existing code?
>> 
> 
> The name cpu_init() is misleading. Most of the pinned features don't get
> initialized in cpu_init(). They are set up slightly later:
> 
> start_secondary()
>  ap_starting()
>    identify_secondary_cpu()
>      identify_cpu()
> 
> The original reason for writing CR4 early on APs probably originates in
> commit c7ad5ad297e6 ("x86/mm/64: Initialize CR4.PCIDE early"). Then,
> when CR pinning was introduced, it was a global system-wide concept. So,
> the pinned bits had to be programmed when the first write to CR4 happened.


Thanks for digging into it and explaining it.


> 
>> 
>>> 
>>> I _really_ think we need a defined per-cpu point where pinning comes
>>> into effect. Marking the CPU online is one idea.
>>> 
>>> Thoughts?
>> 
> 
> I think this approach could work. It should cover APs as well as hotplug
> CPUs that come online later.
> 
>> It seems a good fit.  Just that {on,off}line() are not called on BSP (not
>> a real problem).
>> 
> 
> The BSP is marked online in boot_cpu_init()->set_cpu_online(). So, it
> should be covered as well.
> 
>> Question is that who would work on it ;) ?
> 
> I think Dave already posted the patch for it here.
> https://lore.kernel.org/lkml/02df7890-83c2-4047-8c88-46fbc6e0a892@intel.com/
> 
> I will test that out to confirm that it doesn't mess up some implicit
> behavior.
> 

I’m not sure if Dave also wants to make BSP/AP boot code symmetric at the same time ;)


* Re: [PATCH v1] x86/smp: Set up exception handling before cr4_init()
  2026-02-09  8:16       ` Xin Li
@ 2026-02-09 15:14         ` Dave Hansen
  0 siblings, 0 replies; 9+ messages in thread
From: Dave Hansen @ 2026-02-09 15:14 UTC (permalink / raw)
  To: Xin Li, Sohil Mehta
  Cc: linux-kernel, tglx, mingo, bp, dave.hansen, x86, hpa, peterz,
	andrew.cooper3, nikunj, thomas.lendacky, seanjc, stable

On 2/9/26 00:16, Xin Li wrote:
> I’m not sure if Dave also wants to make BSP/AP boot code symmetric
> at the same time 😉

It's perfectly fine to do things one bit at a time. In this case,
cr4_init() does two different things under the covers depending
on the state of CR pinning. If CR pinning as a feature is kicked out
of influencing the early CPU boot code at _all_, then it certainly helps
the goal of making the AP/BSP code more symmetric.

So I very much prefer doing things one bit at a time, if we can.


* Re: [PATCH v1] x86/smp: Set up exception handling before cr4_init()
  2026-02-09  7:28     ` Sohil Mehta
  2026-02-09  8:16       ` Xin Li
@ 2026-02-10  0:18       ` Sohil Mehta
  2026-02-10  1:13         ` Dave Hansen
  1 sibling, 1 reply; 9+ messages in thread
From: Sohil Mehta @ 2026-02-10  0:18 UTC (permalink / raw)
  To: Xin Li, Dave Hansen
  Cc: linux-kernel, tglx, mingo, bp, dave.hansen, x86, hpa, peterz,
	andrew.cooper3, nikunj, thomas.lendacky, seanjc, stable

On 2/8/2026 11:28 PM, Sohil Mehta wrote:
> I think Dave already posted the patch for it here.
> https://lore.kernel.org/lkml/02df7890-83c2-4047-8c88-46fbc6e0a892@intel.com/
> 
> I will test that out to confirm that it doesn't mess up some implicit
> behavior.
> 

I verified that the above patch works as expected. I added a debug print
in cpu_init_fred_exceptions() to test FRED behavior.

	if (cr4_read_shadow() & X86_CR4_FRED)
		pr_warn("FRED is already enabled on CPU%d\n",
			smp_processor_id());


With Dave's patch, the warning no longer shows up on APs.

However, I found another location where we enable FRED in CR4 before
enabling the MSRs.

__restore_processor_state():

...

  __write_cr4(ctxt->cr4);

...

  if (ctxt->cr4 & X86_CR4_FRED) {
    cpu_init_fred_exceptions();
    cpu_init_fred_rsps();
  }

Due to limitations of my test platform, I couldn't verify the FRED print
in __restore_processor_state()'s path. But, could "restore" run into a
similar issue in the future?


* Re: [PATCH v1] x86/smp: Set up exception handling before cr4_init()
  2026-02-10  0:18       ` Sohil Mehta
@ 2026-02-10  1:13         ` Dave Hansen
  2026-02-10  3:11           ` Xin Li
  0 siblings, 1 reply; 9+ messages in thread
From: Dave Hansen @ 2026-02-10  1:13 UTC (permalink / raw)
  To: Sohil Mehta, Xin Li
  Cc: linux-kernel, tglx, mingo, bp, dave.hansen, x86, hpa, peterz,
	andrew.cooper3, nikunj, thomas.lendacky, seanjc, stable

On 2/9/26 16:18, Sohil Mehta wrote:
> 
> However, I found another location where we enable FRED in CR4 before
> enabling the MSRs.
> 
> __restore_processor_state():
> 
> ...
> 
>   __write_cr4(ctxt->cr4);
> 
> ...
> 
>   if (ctxt->cr4 & X86_CR4_FRED) {
>     cpu_init_fred_exceptions();
>     cpu_init_fred_rsps();
>   }
> 
> Due to limitations of my test platform, I couldn't verify the FRED print
> in __restore_processor_state()'s path. But, could "restore" run into a
> similar issue in the future?

It sure looks like it. Good catch! I have the feeling there's never been
an exception in that code anyway. The:

	wrmsrq(MSR_GS_BASE, ctxt->kernelmode_gs_base);

is misplaced too. Exception handling leans heavily on MSR_GS_BASE.

But, it doesn't hurt to be consistent about the ordering.

In a perfect world, we'd probably unify all this code. Maybe the boot
code establishes a 'saved_context' and all the APs just
restore_processor_state() to go online normally instead of just for a
restore. Or maybe the restore_processor_state() should be unified more
with start_secondary(), because a big chunk of the stuff it's restoring is
pretty static already.

But there's no need today to do anything that drastic.


* Re: [PATCH v1] x86/smp: Set up exception handling before cr4_init()
  2026-02-10  1:13         ` Dave Hansen
@ 2026-02-10  3:11           ` Xin Li
  0 siblings, 0 replies; 9+ messages in thread
From: Xin Li @ 2026-02-10  3:11 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Sohil Mehta, linux-kernel, tglx, mingo, bp, dave.hansen, x86, hpa,
	peterz, andrew.cooper3, nikunj, thomas.lendacky, seanjc, stable



> On Feb 9, 2026, at 5:13 PM, Dave Hansen <dave.hansen@intel.com> wrote:
> 
> On 2/9/26 16:18, Sohil Mehta wrote:
>> 
>> However, I found another location where we enable FRED in CR4 before
>> enabling the MSRs.


IIRC, it’s expected:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e5f1e8af9c9e151ecd665f6d2e36fb25fec3b110




