* Save a WRMSR GS.base?
@ 2026-06-04 1:53 Borislav Petkov
  2026-06-04 9:17 ` Andrew Cooper
  0 siblings, 1 reply; 25+ messages in thread
From: Borislav Petkov @ 2026-06-04 1:53 UTC (permalink / raw)
  To: H. Peter Anvin, Andrew Cooper; +Cc: x86-ML, LKML

Hi,

so here:

diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index b85e715ebb30..ffa894bdb4ee 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -400,7 +400,9 @@ static __always_inline void x86_fsgsbase_load(struct thread_struct *prev,
 
 		/* Update the bases. */
 		wrfsbase(next->fsbase);
-		__wrgsbase_inactive(next->gsbase);
+
+		if (!cpu_feature_enabled(X86_FEATURE_LKGS))
+			__wrgsbase_inactive(next->gsbase);
 	} else {
 		load_seg_legacy(prev->fsindex, prev->fsbase,
 				next->fsindex, next->fsbase, FS);

a couple of lines above in that function we have:

		if (unlikely(prev->gsindex || next->gsindex))
			loadseg(GS, next->gsindex);

which, on a FRED machine, would do LKGS. Now that insn does:

	GS.selector := SRC;
	GS.attributes := descriptor.attributes;
	IA32_KERNEL_GS_BASE := descriptor.base; // bits 63:32 cleared

so I can save myself the __wrgsbase_inactive() which ends up doing WRMSR
GS.base.

Right? I.e., the diff above.

We're also not doing the optimization of checking whether prev.GS.base and
next.GS.base are equal. I see them both 0 in a trace here but I guess
luserspace can change them so I guess we wanna overwrite GS.base on context
switch unconditionally.

But LKGS does that for us so we don't need the WRMSR GS.base there, right?

Or am I missing something?

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: Save a WRMSR GS.base? 2026-06-04 1:53 Save a WRMSR GS.base? Borislav Petkov @ 2026-06-04 9:17 ` Andrew Cooper 2026-06-05 2:24 ` Borislav Petkov 0 siblings, 1 reply; 25+ messages in thread From: Andrew Cooper @ 2026-06-04 9:17 UTC (permalink / raw) To: Borislav Petkov, H. Peter Anvin; +Cc: Andrew Cooper, x86-ML, LKML On 04/06/2026 2:53 am, Borislav Petkov wrote: > Hi, > > so here: > > diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c > index b85e715ebb30..ffa894bdb4ee 100644 > --- a/arch/x86/kernel/process_64.c > +++ b/arch/x86/kernel/process_64.c > @@ -400,7 +400,9 @@ static __always_inline void x86_fsgsbase_load(struct thread_struct *prev, > > /* Update the bases. */ > wrfsbase(next->fsbase); > - __wrgsbase_inactive(next->gsbase); > + > + if (!cpu_feature_enabled(X86_FEATURE_LKGS)) > + __wrgsbase_inactive(next->gsbase); > } else { > load_seg_legacy(prev->fsindex, prev->fsbase, > next->fsindex, next->fsbase, FS); > > a couple of lines above in that function we have: > > if (unlikely(prev->gsindex || next->gsindex)) > loadseg(GS, next->gsindex); > > which, on a FRED machine, would do LKGS. Now that insn does: > > GS.selector := SRC; > GS.attributes := descriptor.attributes; > IA32_KERNEL_GS_BASE := descriptor.base; // bits 63:32 cleared > > so I can save myself the __wrgsbase_inactive() which ends up doing WRMSR > GS.base. > > Right? I.e., the diff above. > > We're also not doing the optimization of checking whether prev.GS.base and > next.GS.base are equal. I see them both 0 in a trace here but I guess > luserpace can change them so I guess we wanna overwrite GS.base on context > switch unconditionally. > > But LKGS does that for us so we don't need the WRMSR GS.base there, right? > > Or am I missing something? Yes, but it took me writing a "no" email to spot it. If the LKGS (in load seg) was called unconditionally, then yes it would be safe to drop the __wrgsbase_inactive(), but it's not. 
Consider a prev and next which both have the same ->gsindex (so skips loadseg()), but have different ->gsbase (still need to update KERN_GS_BASE). ~Andrew ^ permalink raw reply [flat|nested] 25+ messages in thread
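A minimal user-space sketch of the case Andrew describes, for illustration only. With the guard as quoted, loadseg(GS) is skipped when both selectors are zero, which is presumably the common 64-bit case. The struct and function names below are hypothetical stand-ins, not kernel code, and the value LKGS would load is passed in as a parameter because no descriptor table is modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two thread_struct fields that matter here. */
struct thread_model {
	uint16_t gsindex;	/* saved GS selector */
	uint64_t gsbase;	/* saved user GS.base */
};

/* Hypothetical model of the inactive (user) GS.base held in the MSR. */
struct cpu_model {
	uint64_t kernel_gs_base;
};

/*
 * Models the GS half of x86_fsgsbase_load() with the first proposed diff
 * applied. lkgs_base is the value LKGS would take from the descriptor
 * (an input here, since no descriptor table is modelled).
 */
static void switch_gs_first_diff(struct cpu_model *cpu,
				 const struct thread_model *prev,
				 const struct thread_model *next,
				 bool has_lkgs, uint32_t lkgs_base)
{
	/* Same guard as the quoted code: skipped when both selectors are 0. */
	if (prev->gsindex || next->gsindex) {
		if (has_lkgs)
			cpu->kernel_gs_base = lkgs_base;	/* zero-extended */
	}

	/* The diff: only write the base when LKGS is not available. */
	if (!has_lkgs)
		cpu->kernel_gs_base = next->gsbase;
}
```

With two tasks that both have gsindex == 0 but different gsbase values, the has_lkgs path never touches the modelled MSR, so the previous task's base stays in place.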
* Re: Save a WRMSR GS.base?
  2026-06-04 9:17 ` Andrew Cooper
@ 2026-06-05 2:24 ` Borislav Petkov
  2026-06-05 2:36 ` H. Peter Anvin
  0 siblings, 1 reply; 25+ messages in thread
From: Borislav Petkov @ 2026-06-05 2:24 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: H. Peter Anvin, x86-ML, LKML

On Thu, Jun 04, 2026 at 10:17:57AM +0100, Andrew Cooper wrote:
> Yes, but it took me writing a "no" email to spot it.

Oh, I know those situations.

> If the LKGS (in load seg) was called unconditionally, then yes it would
> be safe to drop the __wrgsbase_inactive(), but it's not.
>
> Consider a prev and next which both have the same ->gsindex (so skips
> loadseg()), but have different ->gsbase (still need to update KERN_GS_BASE).

Gah, ofc.

So we'll have to do something like this which is ugly as hell:

---

diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index b85e715ebb30..248c39da9ba0 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -391,16 +391,23 @@ static __always_inline void x86_pkru_load(struct thread_struct *prev,
 static __always_inline void x86_fsgsbase_load(struct thread_struct *prev,
 					      struct thread_struct *next)
 {
+	bool loaded_gs = false;
+
 	if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
 		/* Update the FS and GS selectors if they could have changed. */
 		if (unlikely(prev->fsindex || next->fsindex))
 			loadseg(FS, next->fsindex);
-		if (unlikely(prev->gsindex || next->gsindex))
+
+		if (unlikely(prev->gsindex || next->gsindex)) {
 			loadseg(GS, next->gsindex);
+			loaded_gs = true;
+		}
 
 		/* Update the bases. */
 		wrfsbase(next->fsbase);
-		__wrgsbase_inactive(next->gsbase);
+
+		if (!(cpu_feature_enabled(X86_FEATURE_LKGS) && loaded_gs))
+			__wrgsbase_inactive(next->gsbase);
 	} else {
 		load_seg_legacy(prev->fsindex, prev->fsbase,
 				next->fsindex, next->fsbase, FS);

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: Save a WRMSR GS.base? 2026-06-05 2:24 ` Borislav Petkov @ 2026-06-05 2:36 ` H. Peter Anvin 2026-06-05 2:54 ` Borislav Petkov 0 siblings, 1 reply; 25+ messages in thread From: H. Peter Anvin @ 2026-06-05 2:36 UTC (permalink / raw) To: Borislav Petkov, Andrew Cooper; +Cc: x86-ML, LKML On June 4, 2026 7:24:28 PM PDT, Borislav Petkov <bp@alien8.de> wrote: >On Thu, Jun 04, 2026 at 10:17:57AM +0100, Andrew Cooper wrote: >> Yes, but it took me writing a "no" email to spot it. > >Oh, I know those situations. > >> If the LKGS (in load seg) was called unconditionally, then yes it would >> be safe to drop the __wrgsbase_inactive(), but it's not. >> >> Consider a prev and next which both have the same ->gsindex (so skips >> loadseg()), but have different ->gsbase (still need to update KERN_GS_BASE). > >Gah, ofc. > >So we'll have to do something like this which is ugly as hell: > >--- > >diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c >index b85e715ebb30..248c39da9ba0 100644 >--- a/arch/x86/kernel/process_64.c >+++ b/arch/x86/kernel/process_64.c >@@ -391,16 +391,23 @@ static __always_inline void x86_pkru_load(struct thread_struct *prev, > static __always_inline void x86_fsgsbase_load(struct thread_struct *prev, > struct thread_struct *next) > { >+ bool loaded_gs = false; >+ > if (static_cpu_has(X86_FEATURE_FSGSBASE)) { > /* Update the FS and GS selectors if they could have changed. */ > if (unlikely(prev->fsindex || next->fsindex)) > loadseg(FS, next->fsindex); >- if (unlikely(prev->gsindex || next->gsindex)) >+ >+ if (unlikely(prev->gsindex || next->gsindex)) { > loadseg(GS, next->gsindex); >+ loaded_gs = true; >+ } > > /* Update the bases. */ > wrfsbase(next->fsbase); >- __wrgsbase_inactive(next->gsbase); >+ >+ if (!(cpu_feature_enabled(X86_FEATURE_LKGS) && loaded_gs)) >+ __wrgsbase_inactive(next->gsbase); > } else { > load_seg_legacy(prev->fsindex, prev->fsbase, > next->fsindex, next->fsbase, FS); > > >Thx. 
> Also consider that user space might have done: mov gs,... wrgsbase ... So gs.selector > 3 doesn't necessarily mean that the base is consistent with the descriptor. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Save a WRMSR GS.base? 2026-06-05 2:36 ` H. Peter Anvin @ 2026-06-05 2:54 ` Borislav Petkov 2026-06-05 3:20 ` H. Peter Anvin 0 siblings, 1 reply; 25+ messages in thread From: Borislav Petkov @ 2026-06-05 2:54 UTC (permalink / raw) To: H. Peter Anvin; +Cc: Andrew Cooper, x86-ML, LKML On Thu, Jun 04, 2026 at 07:36:01PM -0700, H. Peter Anvin wrote: > >diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c > >index b85e715ebb30..248c39da9ba0 100644 > >--- a/arch/x86/kernel/process_64.c > >+++ b/arch/x86/kernel/process_64.c > >@@ -391,16 +391,23 @@ static __always_inline void x86_pkru_load(struct thread_struct *prev, > > static __always_inline void x86_fsgsbase_load(struct thread_struct *prev, > > struct thread_struct *next) > > { > >+ bool loaded_gs = false; > >+ > > if (static_cpu_has(X86_FEATURE_FSGSBASE)) { > > /* Update the FS and GS selectors if they could have changed. */ > > if (unlikely(prev->fsindex || next->fsindex)) > > loadseg(FS, next->fsindex); > >- if (unlikely(prev->gsindex || next->gsindex)) > >+ > >+ if (unlikely(prev->gsindex || next->gsindex)) { > > loadseg(GS, next->gsindex); > >+ loaded_gs = true; > >+ } > > > > /* Update the bases. */ > > wrfsbase(next->fsbase); > >- __wrgsbase_inactive(next->gsbase); > >+ > >+ if (!(cpu_feature_enabled(X86_FEATURE_LKGS) && loaded_gs)) > >+ __wrgsbase_inactive(next->gsbase); > > } else { > > load_seg_legacy(prev->fsindex, prev->fsbase, > > next->fsindex, next->fsbase, FS); > > > > > >Thx. > > > > Also consider that user space might have done: > > mov gs,... > wrgsbase ... > > So gs.selector > 3 doesn't necessarily mean that the base is consistent with the descriptor. Right, I want to avoid the second write to KERNEL_GS_BASE iff we have done LKGS before. -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Save a WRMSR GS.base? 2026-06-05 2:54 ` Borislav Petkov @ 2026-06-05 3:20 ` H. Peter Anvin 2026-06-05 4:26 ` Borislav Petkov 0 siblings, 1 reply; 25+ messages in thread From: H. Peter Anvin @ 2026-06-05 3:20 UTC (permalink / raw) To: Borislav Petkov; +Cc: Andrew Cooper, x86-ML, LKML On June 4, 2026 7:54:53 PM PDT, Borislav Petkov <bp@alien8.de> wrote: >On Thu, Jun 04, 2026 at 07:36:01PM -0700, H. Peter Anvin wrote: >> >diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c >> >index b85e715ebb30..248c39da9ba0 100644 >> >--- a/arch/x86/kernel/process_64.c >> >+++ b/arch/x86/kernel/process_64.c >> >@@ -391,16 +391,23 @@ static __always_inline void x86_pkru_load(struct thread_struct *prev, >> > static __always_inline void x86_fsgsbase_load(struct thread_struct *prev, >> > struct thread_struct *next) >> > { >> >+ bool loaded_gs = false; >> >+ >> > if (static_cpu_has(X86_FEATURE_FSGSBASE)) { >> > /* Update the FS and GS selectors if they could have changed. */ >> > if (unlikely(prev->fsindex || next->fsindex)) >> > loadseg(FS, next->fsindex); >> >- if (unlikely(prev->gsindex || next->gsindex)) >> >+ >> >+ if (unlikely(prev->gsindex || next->gsindex)) { >> > loadseg(GS, next->gsindex); >> >+ loaded_gs = true; >> >+ } >> > >> > /* Update the bases. */ >> > wrfsbase(next->fsbase); >> >- __wrgsbase_inactive(next->gsbase); >> >+ >> >+ if (!(cpu_feature_enabled(X86_FEATURE_LKGS) && loaded_gs)) >> >+ __wrgsbase_inactive(next->gsbase); >> > } else { >> > load_seg_legacy(prev->fsindex, prev->fsbase, >> > next->fsindex, next->fsbase, FS); >> > >> > >> >Thx. >> > >> >> Also consider that user space might have done: >> >> mov gs,... >> wrgsbase ... >> >> So gs.selector > 3 doesn't necessarily mean that the base is consistent with the descriptor. > >Right, I want to avoid the second write to KERNEL_GS_BASE iff we have done >LKGS before. > I guess the question is why there is a "first" one. 
Logically the sequence should be LKGS first, if needed; then WRMSR(NS). LKGS can be replaced with swapgs/mov gs/swapgs on legacy. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Save a WRMSR GS.base?
  2026-06-05 3:20 ` H. Peter Anvin
@ 2026-06-05 4:26 ` Borislav Petkov
  2026-06-05 4:30 ` H. Peter Anvin
  0 siblings, 1 reply; 25+ messages in thread
From: Borislav Petkov @ 2026-06-05 4:26 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Andrew Cooper, x86-ML, LKML

On Thu, Jun 04, 2026 at 08:20:57PM -0700, H. Peter Anvin wrote:
> I guess the question is why there is a "first" one.

That happens when we do:

x86_fsgsbase_load()

  loadseg(GS) -> load_gs_index() -> native_load_gs_index() ->
	if (cpu_feature_enabled(X86_FEATURE_LKGS))
		native_lkgs(selector);

then back in x86_fsgsbase_load() we do:

	__wrgsbase_inactive(next->gsbase);

which does

	wrmsrq(MSR_KERNEL_GS_BASE, gsbase);

on FRED.

But LKGS already wrote MSR_KERNEL_GS_BASE...

> Logically the sequence should be LKGS first, if needed; then WRMSR(NS). LKGS
> can be replaced with swapgs/mov gs/swapgs on legacy.

Right.

I think avoiding that second WRMSR(MSR_KERNEL_GS_BASE) should give some perf
back...

Although, I need to think how to make it pretty...

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Save a WRMSR GS.base? 2026-06-05 4:26 ` Borislav Petkov @ 2026-06-05 4:30 ` H. Peter Anvin 2026-06-05 4:38 ` Borislav Petkov 0 siblings, 1 reply; 25+ messages in thread From: H. Peter Anvin @ 2026-06-05 4:30 UTC (permalink / raw) To: Borislav Petkov; +Cc: Andrew Cooper, x86-ML, LKML On June 4, 2026 9:26:52 PM PDT, Borislav Petkov <bp@alien8.de> wrote: >On Thu, Jun 04, 2026 at 08:20:57PM -0700, H. Peter Anvin wrote: >> I guess the question is why there is a "first" one. > >That happens when we do: > >x86_fsgsbase_load() > > loadseg(GS) -> load_gs_index() -> native_load_gs_index() -> > if (cpu_feature_enabled(X86_FEATURE_LKGS)) > native_lkgs(selector); > >then back in x86_fsgsbase_load() we do: > > __wrgsbase_inactive(next->gsbase); > >which does > > wrmsrq(MSR_KERNEL_GS_BASE, gsbase); > >on FRED. > >But LKGS already wrote MSR_KERNEL_GS_BASE... > >> Logically the sequence should be LKGS first, if needed; then WRMSR(NS). LKGS >> can be replaced with swapgs/mov gs/swapgs on legacy. > >Right. > >I think avoiding that second WRMSR(MSR_KERNEL_GS_BASE) should give some perf >back... > >Although, I need to think how to make it pretty... > Should be doing wrmsrns... ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Save a WRMSR GS.base? 2026-06-05 4:30 ` H. Peter Anvin @ 2026-06-05 4:38 ` Borislav Petkov 2026-06-05 5:05 ` H. Peter Anvin 0 siblings, 1 reply; 25+ messages in thread From: Borislav Petkov @ 2026-06-05 4:38 UTC (permalink / raw) To: H. Peter Anvin; +Cc: Andrew Cooper, x86-ML, LKML On Thu, Jun 04, 2026 at 09:30:33PM -0700, H. Peter Anvin wrote: > On June 4, 2026 9:26:52 PM PDT, Borislav Petkov <bp@alien8.de> wrote: > >On Thu, Jun 04, 2026 at 08:20:57PM -0700, H. Peter Anvin wrote: > >> I guess the question is why there is a "first" one. > > > >That happens when we do: > > > >x86_fsgsbase_load() > > > > loadseg(GS) -> load_gs_index() -> native_load_gs_index() -> > > if (cpu_feature_enabled(X86_FEATURE_LKGS)) > > native_lkgs(selector); > > > >then back in x86_fsgsbase_load() we do: > > > > __wrgsbase_inactive(next->gsbase); > > > >which does > > > > wrmsrq(MSR_KERNEL_GS_BASE, gsbase); > > > >on FRED. > > > >But LKGS already wrote MSR_KERNEL_GS_BASE... > > > >> Logically the sequence should be LKGS first, if needed; then WRMSR(NS). LKGS > >> can be replaced with swapgs/mov gs/swapgs on legacy. > > > >Right. > > > >I think avoiding that second WRMSR(MSR_KERNEL_GS_BASE) should give some perf > >back... > > > >Although, I need to think how to make it pretty... > > > > Should be doing wrmsrns... No, I think that second WRMSR* should not happen at all if we have executed LKGS which has already written MSR_KERNEL_GS_BASE, right? -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Save a WRMSR GS.base? 2026-06-05 4:38 ` Borislav Petkov @ 2026-06-05 5:05 ` H. Peter Anvin 2026-06-05 9:13 ` Andrew Cooper 0 siblings, 1 reply; 25+ messages in thread From: H. Peter Anvin @ 2026-06-05 5:05 UTC (permalink / raw) To: Borislav Petkov; +Cc: Andrew Cooper, x86-ML, LKML On June 4, 2026 9:38:46 PM PDT, Borislav Petkov <bp@alien8.de> wrote: >On Thu, Jun 04, 2026 at 09:30:33PM -0700, H. Peter Anvin wrote: >> On June 4, 2026 9:26:52 PM PDT, Borislav Petkov <bp@alien8.de> wrote: >> >On Thu, Jun 04, 2026 at 08:20:57PM -0700, H. Peter Anvin wrote: >> >> I guess the question is why there is a "first" one. >> > >> >That happens when we do: >> > >> >x86_fsgsbase_load() >> > >> > loadseg(GS) -> load_gs_index() -> native_load_gs_index() -> >> > if (cpu_feature_enabled(X86_FEATURE_LKGS)) >> > native_lkgs(selector); >> > >> >then back in x86_fsgsbase_load() we do: >> > >> > __wrgsbase_inactive(next->gsbase); >> > >> >which does >> > >> > wrmsrq(MSR_KERNEL_GS_BASE, gsbase); >> > >> >on FRED. >> > >> >But LKGS already wrote MSR_KERNEL_GS_BASE... >> > >> >> Logically the sequence should be LKGS first, if needed; then WRMSR(NS). LKGS >> >> can be replaced with swapgs/mov gs/swapgs on legacy. >> > >> >Right. >> > >> >I think avoiding that second WRMSR(MSR_KERNEL_GS_BASE) should give some perf >> >back... >> > >> >Although, I need to think how to make it pretty... >> > >> >> Should be doing wrmsrns... > >No, I think that second WRMSR* should not happen at all if we have executed >LKGS which has already written MSR_KERNEL_GS_BASE, right? > > You can't do that (at least not without further checks) if user space has WRGSBASE enabled, since you have no guarantee that the active GS.base is consistent with GS.selector. Since GS > 3 is pretty rare in 64-bit code at least, it doesn't seem to be a code path that needs to be that heavily optimized. ^ permalink raw reply [flat|nested] 25+ messages in thread
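The user-space sequence hpa refers to (mov gs, then wrgsbase) can be sketched as a small model. This is an illustration under stated assumptions: the struct, the helper names, and the idea of passing the descriptor base in as a parameter are all hypothetical, and only the user-visible effect of each instruction is modelled.

```c
#include <stdint.h>

/* Hypothetical model of the user-visible GS state of one task. */
struct user_gs {
	uint16_t selector;
	uint64_t base;
};

/* mov gs: loading a selector also reloads the base from the descriptor,
 * which can only supply 32 bits. desc_base stands in for that value. */
static void mov_gs(struct user_gs *gs, uint16_t sel, uint32_t desc_base)
{
	gs->selector = sel;
	gs->base = desc_base;
}

/* wrgsbase (64-bit form): overwrites the base, selector is untouched. */
static void wrgsbase(struct user_gs *gs, uint64_t base)
{
	gs->base = base;
}

/* What a restore that relied on LKGS alone would put in the base. */
static uint64_t restore_with_lkgs_only(uint32_t desc_base)
{
	return desc_base;
}
```

After mov_gs() followed by wrgsbase(), the task has a non-null selector and a base that no longer matches the descriptor, so a restore that only reloads the selector would hand back the wrong base.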
* Re: Save a WRMSR GS.base? 2026-06-05 5:05 ` H. Peter Anvin @ 2026-06-05 9:13 ` Andrew Cooper 2026-06-05 15:13 ` H. Peter Anvin 0 siblings, 1 reply; 25+ messages in thread From: Andrew Cooper @ 2026-06-05 9:13 UTC (permalink / raw) To: H. Peter Anvin, Borislav Petkov; +Cc: Andrew Cooper, x86-ML, LKML On 05/06/2026 6:05 am, H. Peter Anvin wrote: > On June 4, 2026 9:38:46 PM PDT, Borislav Petkov <bp@alien8.de> wrote: >> On Thu, Jun 04, 2026 at 09:30:33PM -0700, H. Peter Anvin wrote: >>> On June 4, 2026 9:26:52 PM PDT, Borislav Petkov <bp@alien8.de> wrote: >>>> On Thu, Jun 04, 2026 at 08:20:57PM -0700, H. Peter Anvin wrote: >>>>> I guess the question is why there is a "first" one. >>>> That happens when we do: >>>> >>>> x86_fsgsbase_load() >>>> >>>> loadseg(GS) -> load_gs_index() -> native_load_gs_index() -> >>>> if (cpu_feature_enabled(X86_FEATURE_LKGS)) >>>> native_lkgs(selector); >>>> >>>> then back in x86_fsgsbase_load() we do: >>>> >>>> __wrgsbase_inactive(next->gsbase); >>>> >>>> which does >>>> >>>> wrmsrq(MSR_KERNEL_GS_BASE, gsbase); >>>> >>>> on FRED. >>>> >>>> But LKGS already wrote MSR_KERNEL_GS_BASE... >>>> >>>>> Logically the sequence should be LKGS first, if needed; then WRMSR(NS). LKGS >>>>> can be replaced with swapgs/mov gs/swapgs on legacy. >>>> Right. >>>> >>>> I think avoiding that second WRMSR(MSR_KERNEL_GS_BASE) should give some perf >>>> back... >>>> >>>> Although, I need to think how to make it pretty... >>>> >>> Should be doing wrmsrns... >> No, I think that second WRMSR* should not happen at all if we have executed >> LKGS which has already written MSR_KERNEL_GS_BASE, right? >> >> > You can't do that (at least not without further checks) if user space has WRGSBASE enabled, since you have no guarantee that the active GS.base is consistent with GS.selector. > > Since GS > 3 is pretty rare in 64-bit code at least, it doesn't seem to be a code path that needs to be that heavily optimized. 
I think you're slightly talking past each other, and I also made a
mistake in the original reply, so let's try rephrasing it.

LKGS only writes a zero-extended 32-bit value into KERN_GS_BASE. This is
because there's only 32 bits of base information in the GDT/LDT.

So the real write into KERN_GS_BASE is still needed. Sorry - you can't
optimise this away. Also, I'm pretty sure amluto did some x86 selftests
covering this last time the logic was rewritten.

As to WRMSR vs WRMSRNS, yes Intel CPUs want this to be WRMSRNS. AMD
don't have WRMSRNS but this particular MSR index is architecturally not
serialising anyway.

~Andrew

^ permalink raw reply [flat|nested] 25+ messages in thread
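To make the 32-bit limit concrete, here is a small sketch that pulls the base out of a legacy 8-byte segment descriptor. The field positions follow the standard descriptor layout (base bits 15:0 in descriptor bits 31:16, base bits 23:16 in bits 39:32, base bits 31:24 in bits 63:56). The function names are hypothetical, and lkgs_base_model() only models the zero-extension, not the instruction's checks.

```c
#include <stdint.h>

/* Reassemble the 32-bit base from a raw 8-byte GDT/LDT descriptor. */
static uint32_t descriptor_base(uint64_t desc)
{
	uint32_t base_15_0  = (uint32_t)((desc >> 16) & 0xffff);
	uint32_t base_23_16 = (uint32_t)((desc >> 32) & 0xff);
	uint32_t base_31_24 = (uint32_t)((desc >> 56) & 0xff);

	return base_15_0 | (base_23_16 << 16) | (base_31_24 << 24);
}

/* Hypothetical model of the value LKGS leaves in KERN_GS_BASE:
 * the descriptor base, zero-extended to 64 bits. */
static uint64_t lkgs_base_model(uint64_t desc)
{
	return (uint64_t)descriptor_base(desc);
}
```

Even a descriptor with every bit set yields at most 0xffffffff, so a typical 64-bit user GS.base set via WRGSBASE or arch_prctl() cannot come from the descriptor.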
* Re: Save a WRMSR GS.base? 2026-06-05 9:13 ` Andrew Cooper @ 2026-06-05 15:13 ` H. Peter Anvin 2026-06-05 15:16 ` Andrew Cooper 0 siblings, 1 reply; 25+ messages in thread From: H. Peter Anvin @ 2026-06-05 15:13 UTC (permalink / raw) To: Andrew Cooper, Borislav Petkov; +Cc: x86-ML, LKML On June 5, 2026 2:13:07 AM PDT, Andrew Cooper <andrew.cooper3@citrix.com> wrote: >On 05/06/2026 6:05 am, H. Peter Anvin wrote: >> On June 4, 2026 9:38:46 PM PDT, Borislav Petkov <bp@alien8.de> wrote: >>> On Thu, Jun 04, 2026 at 09:30:33PM -0700, H. Peter Anvin wrote: >>>> On June 4, 2026 9:26:52 PM PDT, Borislav Petkov <bp@alien8.de> wrote: >>>>> On Thu, Jun 04, 2026 at 08:20:57PM -0700, H. Peter Anvin wrote: >>>>>> I guess the question is why there is a "first" one. >>>>> That happens when we do: >>>>> >>>>> x86_fsgsbase_load() >>>>> >>>>> loadseg(GS) -> load_gs_index() -> native_load_gs_index() -> >>>>> if (cpu_feature_enabled(X86_FEATURE_LKGS)) >>>>> native_lkgs(selector); >>>>> >>>>> then back in x86_fsgsbase_load() we do: >>>>> >>>>> __wrgsbase_inactive(next->gsbase); >>>>> >>>>> which does >>>>> >>>>> wrmsrq(MSR_KERNEL_GS_BASE, gsbase); >>>>> >>>>> on FRED. >>>>> >>>>> But LKGS already wrote MSR_KERNEL_GS_BASE... >>>>> >>>>>> Logically the sequence should be LKGS first, if needed; then WRMSR(NS). LKGS >>>>>> can be replaced with swapgs/mov gs/swapgs on legacy. >>>>> Right. >>>>> >>>>> I think avoiding that second WRMSR(MSR_KERNEL_GS_BASE) should give some perf >>>>> back... >>>>> >>>>> Although, I need to think how to make it pretty... >>>>> >>>> Should be doing wrmsrns... >>> No, I think that second WRMSR* should not happen at all if we have executed >>> LKGS which has already written MSR_KERNEL_GS_BASE, right? >>> >>> >> You can't do that (at least not without further checks) if user space has WRGSBASE enabled, since you have no guarantee that the active GS.base is consistent with GS.selector. 
>> >> Since GS > 3 is pretty rare in 64-bit code at least, it doesn't seem to be a code path that needs to be that heavily optimized. > >I think you're slightly talking past each other, and I also made a >mistake on the original reply, so lets try rephrasing it. > >LGKS only writes a zero-extended 32bit value into KERN_GS_BASE. This is >because there's only 32 bits of information in the GDT/LDT. > >So the real write into KERN_GS_BASE is still needed. Sorry - you can't >optimise this away. Also, I'm pretty sure amluto did some x86 selftests >covering this last time the logic was rewritten. > > >As to WRMSR vs WRMSRNS, yes Intel CPUs want this to be WRMSRNS. AMD >don't have WRMSRNS but this particular MSR index is architecturally not >architecturally serialising anyway. > >~Andrew It's not just a matter of it being a 32-bit base, it might not even be the correct one even so. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Save a WRMSR GS.base? 2026-06-05 15:13 ` H. Peter Anvin @ 2026-06-05 15:16 ` Andrew Cooper 2026-06-05 15:51 ` H. Peter Anvin 0 siblings, 1 reply; 25+ messages in thread From: Andrew Cooper @ 2026-06-05 15:16 UTC (permalink / raw) To: H. Peter Anvin, Borislav Petkov; +Cc: Andrew Cooper, x86-ML, LKML On 05/06/2026 4:13 pm, H. Peter Anvin wrote: > On June 5, 2026 2:13:07 AM PDT, Andrew Cooper <andrew.cooper3@citrix.com> wrote: >> On 05/06/2026 6:05 am, H. Peter Anvin wrote: >>> On June 4, 2026 9:38:46 PM PDT, Borislav Petkov <bp@alien8.de> wrote: >>>> On Thu, Jun 04, 2026 at 09:30:33PM -0700, H. Peter Anvin wrote: >>>>> On June 4, 2026 9:26:52 PM PDT, Borislav Petkov <bp@alien8.de> wrote: >>>>>> On Thu, Jun 04, 2026 at 08:20:57PM -0700, H. Peter Anvin wrote: >>>>>>> I guess the question is why there is a "first" one. >>>>>> That happens when we do: >>>>>> >>>>>> x86_fsgsbase_load() >>>>>> >>>>>> loadseg(GS) -> load_gs_index() -> native_load_gs_index() -> >>>>>> if (cpu_feature_enabled(X86_FEATURE_LKGS)) >>>>>> native_lkgs(selector); >>>>>> >>>>>> then back in x86_fsgsbase_load() we do: >>>>>> >>>>>> __wrgsbase_inactive(next->gsbase); >>>>>> >>>>>> which does >>>>>> >>>>>> wrmsrq(MSR_KERNEL_GS_BASE, gsbase); >>>>>> >>>>>> on FRED. >>>>>> >>>>>> But LKGS already wrote MSR_KERNEL_GS_BASE... >>>>>> >>>>>>> Logically the sequence should be LKGS first, if needed; then WRMSR(NS). LKGS >>>>>>> can be replaced with swapgs/mov gs/swapgs on legacy. >>>>>> Right. >>>>>> >>>>>> I think avoiding that second WRMSR(MSR_KERNEL_GS_BASE) should give some perf >>>>>> back... >>>>>> >>>>>> Although, I need to think how to make it pretty... >>>>>> >>>>> Should be doing wrmsrns... >>>> No, I think that second WRMSR* should not happen at all if we have executed >>>> LKGS which has already written MSR_KERNEL_GS_BASE, right? 
>>>> >>>> >>> You can't do that (at least not without further checks) if user space has WRGSBASE enabled, since you have no guarantee that the active GS.base is consistent with GS.selector. >>> >>> Since GS > 3 is pretty rare in 64-bit code at least, it doesn't seem to be a code path that needs to be that heavily optimized. >> I think you're slightly talking past each other, and I also made a >> mistake on the original reply, so lets try rephrasing it. >> >> LGKS only writes a zero-extended 32bit value into KERN_GS_BASE. This is >> because there's only 32 bits of information in the GDT/LDT. >> >> So the real write into KERN_GS_BASE is still needed. Sorry - you can't >> optimise this away. Also, I'm pretty sure amluto did some x86 selftests >> covering this last time the logic was rewritten. >> >> >> As to WRMSR vs WRMSRNS, yes Intel CPUs want this to be WRMSRNS. AMD >> don't have WRMSRNS but this particular MSR index is architecturally not >> architecturally serialising anyway. >> >> ~Andrew > It's not just a matter of it being a 32-bit base, it might not even be the correct one even so. Indeed, GS might be an LDT selector. ~Andrew ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Save a WRMSR GS.base? 2026-06-05 15:16 ` Andrew Cooper @ 2026-06-05 15:51 ` H. Peter Anvin 2026-06-05 17:17 ` Borislav Petkov 0 siblings, 1 reply; 25+ messages in thread From: H. Peter Anvin @ 2026-06-05 15:51 UTC (permalink / raw) To: Andrew Cooper, Borislav Petkov; +Cc: x86-ML, LKML On June 5, 2026 8:16:53 AM PDT, Andrew Cooper <andrew.cooper3@citrix.com> wrote: >On 05/06/2026 4:13 pm, H. Peter Anvin wrote: >> On June 5, 2026 2:13:07 AM PDT, Andrew Cooper <andrew.cooper3@citrix.com> wrote: >>> On 05/06/2026 6:05 am, H. Peter Anvin wrote: >>>> On June 4, 2026 9:38:46 PM PDT, Borislav Petkov <bp@alien8.de> wrote: >>>>> On Thu, Jun 04, 2026 at 09:30:33PM -0700, H. Peter Anvin wrote: >>>>>> On June 4, 2026 9:26:52 PM PDT, Borislav Petkov <bp@alien8.de> wrote: >>>>>>> On Thu, Jun 04, 2026 at 08:20:57PM -0700, H. Peter Anvin wrote: >>>>>>>> I guess the question is why there is a "first" one. >>>>>>> That happens when we do: >>>>>>> >>>>>>> x86_fsgsbase_load() >>>>>>> >>>>>>> loadseg(GS) -> load_gs_index() -> native_load_gs_index() -> >>>>>>> if (cpu_feature_enabled(X86_FEATURE_LKGS)) >>>>>>> native_lkgs(selector); >>>>>>> >>>>>>> then back in x86_fsgsbase_load() we do: >>>>>>> >>>>>>> __wrgsbase_inactive(next->gsbase); >>>>>>> >>>>>>> which does >>>>>>> >>>>>>> wrmsrq(MSR_KERNEL_GS_BASE, gsbase); >>>>>>> >>>>>>> on FRED. >>>>>>> >>>>>>> But LKGS already wrote MSR_KERNEL_GS_BASE... >>>>>>> >>>>>>>> Logically the sequence should be LKGS first, if needed; then WRMSR(NS). LKGS >>>>>>>> can be replaced with swapgs/mov gs/swapgs on legacy. >>>>>>> Right. >>>>>>> >>>>>>> I think avoiding that second WRMSR(MSR_KERNEL_GS_BASE) should give some perf >>>>>>> back... >>>>>>> >>>>>>> Although, I need to think how to make it pretty... >>>>>>> >>>>>> Should be doing wrmsrns... >>>>> No, I think that second WRMSR* should not happen at all if we have executed >>>>> LKGS which has already written MSR_KERNEL_GS_BASE, right? 
>>>>> >>>>> >>>> You can't do that (at least not without further checks) if user space has WRGSBASE enabled, since you have no guarantee that the active GS.base is consistent with GS.selector. >>>> >>>> Since GS > 3 is pretty rare in 64-bit code at least, it doesn't seem to be a code path that needs to be that heavily optimized. >>> I think you're slightly talking past each other, and I also made a >>> mistake on the original reply, so lets try rephrasing it. >>> >>> LGKS only writes a zero-extended 32bit value into KERN_GS_BASE. This is >>> because there's only 32 bits of information in the GDT/LDT. >>> >>> So the real write into KERN_GS_BASE is still needed. Sorry - you can't >>> optimise this away. Also, I'm pretty sure amluto did some x86 selftests >>> covering this last time the logic was rewritten. >>> >>> >>> As to WRMSR vs WRMSRNS, yes Intel CPUs want this to be WRMSRNS. AMD >>> don't have WRMSRNS but this particular MSR index is architecturally not >>> architecturally serialising anyway. >>> >>> ~Andrew >> It's not just a matter of it being a 32-bit base, it might not even be the correct one even so. > >Indeed, GS might be an LDT selector. > >~Andrew No, GS.base might have been loaded (with wrgsbase) after GS was loaded, so it could be *completely different*. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Save a WRMSR GS.base?
  2026-06-05 15:51 ` H. Peter Anvin
@ 2026-06-05 17:17 ` Borislav Petkov
  2026-06-08 6:46 ` H. Peter Anvin
  0 siblings, 1 reply; 25+ messages in thread
From: Borislav Petkov @ 2026-06-05 17:17 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Andrew Cooper, x86-ML, LKML

On Fri, Jun 05, 2026 at 08:51:04AM -0700, H. Peter Anvin wrote:
> No, GS.base might have been loaded (with wrgsbase) after GS was loaded, so
> it could be *completely different*.

So you're basically saying, LKGS would load the IA32_KERNEL_GS_BASE which
belongs to the segment selector. This is what the pseudo code in the SDM says:

	GS.selector := SRC;
	GS.attributes := descriptor.attributes;
	IA32_KERNEL_GS_BASE := descriptor.base; // bits 63:32 cleared

Now, luserspace might've put something else in GS.base with WRGSBASE:

	GS.base := SRC;

So now, on context switch, we need to load IA32_KERNEL_GS_BASE with
next->gsbase which is the GS.base of the next task we're switching to.

And yes, GS.base is mapped to IA32_KERNEL_GS_BASE so yes, we must do that
update.

And yes, as Andrew points out, both LKGS and WRGSBASE do 32-bit writes only
so we need to do the full MSR write.

Ok, thanks guys, that makes sense.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Save a WRMSR GS.base? 2026-06-05 17:17 ` Borislav Petkov @ 2026-06-08 6:46 ` H. Peter Anvin 2026-06-08 14:38 ` Borislav Petkov 0 siblings, 1 reply; 25+ messages in thread From: H. Peter Anvin @ 2026-06-08 6:46 UTC (permalink / raw) To: Borislav Petkov; +Cc: Andrew Cooper, x86-ML, LKML On June 5, 2026 10:17:11 AM PDT, Borislav Petkov <bp@alien8.de> wrote: >On Fri, Jun 05, 2026 at 08:51:04AM -0700, H. Peter Anvin wrote: >> No, GS.base might have been loaded (with wrgsbase) after GS was loaded, so >> it could be *completely different*. > >So you're basically saying, LKGS would load the IA32_KERNEL_GS_BASE which >belongs to the segment selector. This is what the pseudo code in the SDM says: > > GS.selector := SRC; > GS.attributes := descriptor.attributes; > IA32_KERNEL_GS_BASE := descriptor.base; // bits 63:32 cleared > >Now, luserspace might've put something else in GS.base with WRGSBASE: > > GS.base := SRC; > >So now, on context switch, we need to load IA32_KERNEL_GS_BASE with >next->gsbase which is the GS.base of the next task we're switching to. > >And yes, GS.base is mapped to IA32_KERNEL_GS_BASE so yes, we must do that >update. > >And yes, as Andrew points out, both LKGS and WRMGSBASE do 32-bit writes only >so we need to do the full MSR write. > >Ok, thanks guys, that makes sense. > WRxSBASE does a 64-bit write, but for GS it would incorrectly address the kernel GS.base. For legacy it can be used under swapgs, but with FRED that is disallowed. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Save a WRMSR GS.base?
  2026-06-08 6:46 ` H. Peter Anvin
@ 2026-06-08 14:38 ` Borislav Petkov
  2026-06-08 17:30 ` H. Peter Anvin
  0 siblings, 1 reply; 25+ messages in thread
From: Borislav Petkov @ 2026-06-08 14:38 UTC
To: H. Peter Anvin; +Cc: Andrew Cooper, x86-ML, LKML

On Sun, Jun 07, 2026 at 11:46:34PM -0700, H. Peter Anvin wrote:
> WRxSBASE does a 64-bit write,

When REX.W.

The SDM text is confusing:

"If no REX.W prefix is used, the operand size is 32 bits; the upper 32 bits of
the source register are ignored and upper 32 bits of the base address (for FS
or GS) are cleared."

Does this last part that GS is cleared, refer to when WRGSBASE is used with no
REX.W or in general?

> but for GS it would incorrectly address the kernel GS.base.

What does that mean?

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
* Re: Save a WRMSR GS.base?
  2026-06-08 14:38 ` Borislav Petkov
@ 2026-06-08 17:30 ` H. Peter Anvin
  2026-06-08 20:05 ` Borislav Petkov
  0 siblings, 1 reply; 25+ messages in thread
From: H. Peter Anvin @ 2026-06-08 17:30 UTC
To: Borislav Petkov; +Cc: Andrew Cooper, x86-ML, LKML

On 2026-06-08 07:38, Borislav Petkov wrote:
> On Sun, Jun 07, 2026 at 11:46:34PM -0700, H. Peter Anvin wrote:
>> WRxSBASE does a 64-bit write,
>
> When REX.W.
>
> The SDM text is confusing:
>
> "If no REX.W prefix is used, the operand size is 32 bits; the upper 32 bits of
> the source register are ignored and upper 32 bits of the base address (for FS
> or GS) are cleared."
>
> Does this last part that GS is cleared, refer to when WRGSBASE is used with no
> REX.W or in general?
>

Without REX.W (e.g. wrgsbase %eax as opposed to wrgsbase %rax).

>> but for GS it would incorrectly address the kernel GS.base.
>
> What does that mean?
>

It means that in kernel mode, it is the currently active GS.base that is
written (or read with rdgsbase), that is, the one that belongs to kernel,
not the user space one in what is confusingly enough called
MSR_KERNEL_GS_BASE.

In other words, not the one we want to task switch, *unless* you are in IDT
mode and can surround it with SWAPGS.

	-hpa
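The distinction drawn in the message above (WRGSBASE hits the *active* GS.base,
while the value we want to task switch lives in MSR_KERNEL_GS_BASE) can be
illustrated with a small user-space model. This is only a sketch: it does not
execute any of the real instructions, and struct gs_model and all model_*()
names are made up for illustration and are not kernel interfaces.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the two GS bases a CPU holds; not kernel code. */
struct gs_model {
	uint64_t active_base;    /* GS.base used by %gs: accesses right now */
	uint64_t kernel_gs_base; /* MSR_KERNEL_GS_BASE, the inactive base   */
};

/* SWAPGS: exchange the active base with MSR_KERNEL_GS_BASE. */
static void model_swapgs(struct gs_model *m)
{
	uint64_t tmp = m->active_base;

	m->active_base = m->kernel_gs_base;
	m->kernel_gs_base = tmp;
}

/*
 * WRGSBASE: always targets the active base. Without REX.W (rex_w == 0)
 * only the low 32 bits of the source are used and the upper 32 bits of
 * the base are cleared.
 */
static void model_wrgsbase(struct gs_model *m, uint64_t val, int rex_w)
{
	m->active_base = rex_w ? val : (uint32_t)val;
}

/* WRMSR to MSR_KERNEL_GS_BASE: full 64-bit write to the inactive base. */
static void model_wrmsr_kernel_gs_base(struct gs_model *m, uint64_t val)
{
	m->kernel_gs_base = val;
}

/*
 * The IDT-mode-only alternative mentioned above: bracket WRGSBASE with
 * SWAPGS so that the write lands in what ends up as the inactive base.
 */
static void model_set_user_base_idt(struct gs_model *m, uint64_t val)
{
	model_swapgs(m);
	model_wrgsbase(m, val, 1);
	model_swapgs(m);
}
```

In this model, calling model_wrgsbase() while active_base holds the kernel's
per-CPU pointer overwrites that pointer and leaves kernel_gs_base untouched,
whereas model_wrmsr_kernel_gs_base() and model_set_user_base_idt() both leave
the per-CPU pointer alone and end in the same state.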
* Re: Save a WRMSR GS.base?
  2026-06-08 17:30 ` H. Peter Anvin
@ 2026-06-08 20:05 ` Borislav Petkov
  2026-06-08 21:21 ` Borislav Petkov
  0 siblings, 1 reply; 25+ messages in thread
From: Borislav Petkov @ 2026-06-08 20:05 UTC
To: H. Peter Anvin; +Cc: Andrew Cooper, x86-ML, LKML

On Mon, Jun 08, 2026 at 10:30:36AM -0700, H. Peter Anvin wrote:
> Without REX.W (e.g. wrgsbase %eax as opposed to wrgsbase %rax).

I see.

> It means that in kernel mode, it is the currently active GS.base that is
> written (or read with rdgsbase), that is, the one that belongs to kernel,
> not the user space one in what is confusingly enough called
> MSR_KERNEL_GS_BASE.
>
> In other words, not the one we want to task switch, *unless* you are in IDT
> mode and can surround it with SWAPGS.

Uff, what a mess this stuff is. Brain is in a knot.

I think this is begging to be written down somewhere. Lemme point AI to it and
see what it would generate.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
* Re: Save a WRMSR GS.base?
  2026-06-08 20:05 ` Borislav Petkov
@ 2026-06-08 21:21 ` Borislav Petkov
  2026-06-08 21:52 ` H. Peter Anvin
  2026-06-08 22:58 ` Andrew Cooper
  0 siblings, 2 replies; 25+ messages in thread
From: Borislav Petkov @ 2026-06-08 21:21 UTC
To: H. Peter Anvin; +Cc: Andrew Cooper, x86-ML, LKML

On Mon, Jun 08, 2026 at 01:05:28PM -0700, Borislav Petkov wrote:
> I think this is begging to be written down somewhere. Lemme point AI to it and
> see what it would generate.

Something like the below... I am thinking of sticking that somewhere under
Documentation... oh look: Documentation/arch/x86/x86_64/fsgs.rst. Looks like
there was already need to document this stuff which is clearly not really
transparent. What's there is covering the userspace side more tho.

Anyway, the below is a summary of our thread with AI, it ain't half-bad and
I think we should write it down for future reference.

Thx.

---
GS handling on context switches

SWAPGS

Swaps the base-address value in MSR_KERNEL_GS_BASE with the active GS.base in
the hidden portion of the GS selector register.

MOV <segment selector>, GS

(legacy path, non-FRED) Loads the GS selector and fetches the descriptor
attributes and base from the GDT/LDT. Writes a 32-bit base into the active
GS.base. Does not touch MSR_KERNEL_GS_BASE. Useless for task switching the
user GS base on its own without surrounding SWAPGS calls.

LKGS <selector>

(FRED path, replaces MOV GS) Like MOV GS in that it loads the selector and
descriptor attributes, but it redirects the base write — instead of updating
the active GS.base, it writes the descriptor base into IA32_KERNEL_GS_BASE
(i.e. MSR_KERNEL_GS_BASE).

Critical caveat: it only writes a zero-extended 32-bit value, because GDT/LDT
descriptors only encode 32-bit bases. This means it cannot correctly represent
a full 64-bit user-space GS base (e.g. a TLS pointer), so a full 64-bit WRMSR
is still required afterwards.

WRGSBASE <reg>

(with REX.W) Writes a full 64-bit value directly into the currently active
GS.base as FS.base and GS.base in 64-bit mode are expanded to 64-bit to cover
the full address space.

The problem in kernel context: the currently active GS.base belongs to the
kernel, not the user task. So using this during context switching would
corrupt the kernel's own GS.base, unless surrounded by SWAPGS (only safe in
IDT mode). Without REX.W, the upper 32 bits of the base are cleared instead.

WRMSR MSR_KERNEL_GS_BASE / WRMSRNS MSR_KERNEL_GS_BASE

Writes a full 64-bit value into MSR_KERNEL_GS_BASE, which holds the inactive
(user-space) GS base — the one that gets swapped into the active GS.base on
SWAPGS. This is the only instruction that can correctly set a 64-bit user GS
base during a context switch from kernel mode. WRMSRNS is the non-serializing
variant, which is preferable here since this MSR is architecturally
non-serializing anyway.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
* Re: Save a WRMSR GS.base?
  2026-06-08 21:21 ` Borislav Petkov
@ 2026-06-08 21:52 ` H. Peter Anvin
  2026-06-08 22:58 ` Andrew Cooper
  1 sibling, 0 replies; 25+ messages in thread
From: H. Peter Anvin @ 2026-06-08 21:52 UTC
To: Borislav Petkov; +Cc: Andrew Cooper, x86-ML, LKML

On June 8, 2026 2:21:00 PM PDT, Borislav Petkov <bp@alien8.de> wrote:
>On Mon, Jun 08, 2026 at 01:05:28PM -0700, Borislav Petkov wrote:
>> I think this is begging to be written down somewhere. Lemme point AI to it and
>> see what it would generate.
>
>Something like the below... I am thinking of sticking that somewhere under
>Documentation... oh look: Documentation/arch/x86/x86_64/fsgs.rst. Looks like
>there was already need to document this stuff which is clearly not really
>transparent. What's there is covering the userspace side more tho.
>
>Anyway, the below is a summary of our thread with AI, it ain't half-bad and
>I think we should write it down for future reference.
>
>Thx.
>
>---
>GS handling on context switches
>
>SWAPGS
>
>Swaps the base-address value in MSR_KERNEL_GS_BASE with the active GS.base in
>the hidden portion of the GS selector register.
>
>MOV <segment selector>, GS
>
>(legacy path, non-FRED) Loads the GS selector and fetches the descriptor
>attributes and base from the GDT/LDT. Writes a 32-bit base into the active
>GS.base. Does not touch MSR_KERNEL_GS_BASE. Useless for task switching the
>user GS base on its own without surrounding SWAPGS calls.
>
>LKGS <selector>
>
>(FRED path, replaces MOV GS) Like MOV GS in that it loads the selector and
>descriptor attributes, but it redirects the base write — instead of updating
>the active GS.base, it writes the descriptor base into IA32_KERNEL_GS_BASE
>(i.e. MSR_KERNEL_GS_BASE).
>
>Critical caveat: it only writes a zero-extended 32-bit value, because GDT/LDT
>descriptors only encode 32-bit bases. This means it cannot correctly represent
>a full 64-bit user-space GS base (e.g. a TLS pointer), so a full 64-bit WRMSR
>is still required afterwards.
>
>WRGSBASE <reg>
>
>(with REX.W) Writes a full 64-bit value directly into the currently active
>GS.base as FS.base and GS.base in 64-bit mode are expanded to 64-bit to cover
>the full address space.
>
>The problem in kernel context: the currently active GS.base belongs to the
>kernel, not the user task. So using this during context switching would
>corrupt the kernel's own GS.base, unless surrounded by SWAPGS (only safe in
>IDT mode). Without REX.W, the upper 32 bits of the base are cleared instead.
>
>WRMSR MSR_KERNEL_GS_BASE / WRMSRNS MSR_KERNEL_GS_BASE
>
>Writes a full 64-bit value into MSR_KERNEL_GS_BASE, which holds the inactive
>(user-space) GS base — the one that gets swapped into the active GS.base on
>SWAPGS. This is the only instruction that can correctly set a 64-bit user GS
>base during a context switch from kernel mode. WRMSRNS is the non-serializing
>variant, which is preferable here since this MSR is architecturally
>non-serializing anyway.
>

Looks good to me.

You might want to add that while running in the kernel (which is the only
time MSRs are accessible) KERNEL_GS_BASE, despite its name, actually contains
the *user* GS base. The naming was confusing to begin with, and is even more
so with FRED.

Also, MOV GS/LKGS is the only way to update the *other* fields of the GS
descriptor (the base and the flags.)
* Re: Save a WRMSR GS.base?
  2026-06-08 21:21 ` Borislav Petkov
  2026-06-08 21:52 ` H. Peter Anvin
@ 2026-06-08 22:58 ` Andrew Cooper
  2026-06-18 1:09 ` Borislav Petkov
  1 sibling, 1 reply; 25+ messages in thread
From: Andrew Cooper @ 2026-06-08 22:58 UTC
To: Borislav Petkov, H. Peter Anvin; +Cc: Andrew Cooper, x86-ML, LKML

On 08/06/2026 10:21 pm, Borislav Petkov wrote:
> On Mon, Jun 08, 2026 at 01:05:28PM -0700, Borislav Petkov wrote:
>> I think this is begging to be written down somewhere. Lemme point AI to it and
>> see what it would generate.
> Something like the below... I am thinking of sticking that somewhere under
> Documentation... oh look: Documentation/arch/x86/x86_64/fsgs.rst. Looks like
> there was already need to document this stuff which is clearly not really
> transparent. What's there is covering the userspace side more tho.
>
> Anyway, the below is a summary of our thread with AI, it ain't half-bad and
> I think we should write it down for future reference.

This contains some mechanics, but is lacking on the background.

> ---
> GS handling on context switches

History
-------

In 32bit, data segments need reloading on entry to the kernel, and restoring
on exit to userspace. Only the segment selector is necessary, as all segment
data resides in the GDT/LDT. Bases in the GDT/LDT are 32 bits wide. The
segment selector values are user-chosen, and effectively arbitrary.

The 32bit mechanism is slow, so in 64bit, segments were made mostly flat so as
to not need reloading on entry/exit. FS and GS segment bases were extended to
64 bits, and became accessible via MSRs, and a separate GS_SHADOW value was
introduced. The SWAPGS instruction swaps GS_BASE and GS_SHADOW, as the only
action needed on entry/exit.

64bit userspace needed to make the prctl() ARCH_SET_GS call to have a base
value wider than 32 bits, and a side effect of this syscall was to zero the GS
selector.

Then the FSGSBASE instructions came along, and userspace could finally choose
an arbitrary base address not previously registered via syscall. Linux's ABI
promises to preserve both the selector value and the full base, even when they
are disconnected.

Mechanisms
----------

> SWAPGS
>
> Swaps the base-address value in MSR_KERNEL_GS_BASE with the active GS.base in
> the hidden portion of the GS selector register.

"Swaps the value in "

>
> MOV <segment selector>, GS
>
> (legacy path, non-FRED) Loads the GS selector and fetches the descriptor
> attributes and base from the GDT/LDT.

Loads the GS segment from the GDT/LDT, including selector, attributes, limit
and base. The 32 bits of base from the GDT/LDT are zero-extended into the
64bit base register.

> Writes a 32-bit base into the active
> GS.base. Does not touch MSR_KERNEL_GS_BASE. Useless for task switching the
> user GS base on its own without surrounding SWAPGS calls.

I wouldn't quite say useless. There are ways without a SWAPGS, although
they're less neat.

The salient point is that MOV GS clobbers the active kernel per-cpu pointer,
so it must be done with custom error handling/recovery so #GP/#PF/NMI/#MC can
get back to the correct per-cpu pointer.

>
> LKGS <selector>
>
> (FRED path, replaces MOV GS) Like MOV GS in that it loads the selector and
> descriptor attributes, but it redirects the base write — instead of updating
> the active GS.base, it writes the descriptor base into IA32_KERNEL_GS_BASE
> (i.e. MSR_KERNEL_GS_BASE).
>
> Critical caveat: it only writes a zero-extended 32-bit value, because GDT/LDT
> descriptors only encode 32-bit bases. This means it cannot correctly represent
> a full 64-bit user-space GS base (e.g. a TLS pointer), so a full 64-bit WRMSR
> is still required afterwards.

"Exactly like MOV GS, except that the 32bit zero-extended base value is
written into GS_SHADOW instead of the active GS.base."

"This instruction ensures that the kernel's per-cpu pointer stays good, and
does not need custom error handling."

> WRGSBASE <reg>
>
> (with REX.W) Writes a full 64-bit value directly into the currently active
> GS.base as FS.base and GS.base in 64-bit mode are expanded to 64-bit to cover
> the full address space.
>
> The problem in kernel context: the currently active GS.base belongs to the
> kernel, not the user task. So using this during context switching would
> corrupt the kernel's own GS.base, unless surrounded by SWAPGS (only safe in
> IDT mode). Without REX.W, the upper 32 bits of the base are cleared instead.

The REX.W aspect isn't interesting from a context switching point of view. It
behaves like most other instructions in this regard, even if it's not
interesting to use the 32bit form of WRGSBASE.

>
> WRMSR MSR_KERNEL_GS_BASE / WRMSRNS MSR_KERNEL_GS_BASE
>
> Writes a full 64-bit value into MSR_KERNEL_GS_BASE, which holds the inactive
> (user-space) GS base — the one that gets swapped into the active GS.base on
> SWAPGS. This is the only instruction that can correctly set a 64-bit user GS
> base during a context switch from kernel mode. WRMSRNS is the non-serializing
> variant, which is preferable here since this MSR is architecturally
> non-serializing anyway.

This is where the AI didn't get it right.

AMD ~silently made FS_BASE/GS_BASE/GS_SHADOW become
non-architecturally-serialising, enumerated by 0x80000021.eax[1].FS_GS_NS. I
think this was in Zen3.

Intel still has them as architecturally serialising, requiring software to opt
in to non-serialising behaviour using WRMSRNS.

~Andrew
* Re: Save a WRMSR GS.base?
  2026-06-08 22:58 ` Andrew Cooper
@ 2026-06-18 1:09 ` Borislav Petkov
  2026-06-18 10:22 ` David Laight
  0 siblings, 1 reply; 25+ messages in thread
From: Borislav Petkov @ 2026-06-18 1:09 UTC
To: Andrew Cooper, H. Peter Anvin; +Cc: x86-ML, LKML

Ok,

I think I incorporated them all:

---
From: "Borislav Petkov (AMD)" <bp@alien8.de>
Date: Mon, 8 Jun 2026 18:59:14 -0700
Subject: [PATCH] Documentation/x86: Document the intricacies of GS context switching

Summarized from the thread at Link by AI, with additions and improvements by
H. Peter Anvin and Andrew Cooper.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Assisted-by: Claude Code:claude-sonnet-4-6
Link: https://lore.kernel.org/all/20260604015303.GEaiDafyuU0bwP4Y05@fat_crate.local
---
 Documentation/arch/x86/x86_64/fsgs.rst | 97 ++++++++++++++++++++++++++
 1 file changed, 97 insertions(+)

diff --git a/Documentation/arch/x86/x86_64/fsgs.rst b/Documentation/arch/x86/x86_64/fsgs.rst
index 6bda4d16d3f7..fa3d4b065423 100644
--- a/Documentation/arch/x86/x86_64/fsgs.rst
+++ b/Documentation/arch/x86/x86_64/fsgs.rst
@@ -197,3 +197,100 @@ be used for FS/GS based addressing mode::
 
   mov %reg, %fs:offset
   mov %reg, %gs:offset
+
+
+Complexities with GS handling on context switches
+=================================================
+
+History
+-------
+
+In 32-bit, data segments need reloading on entry to the kernel, and restoring
+on exit to userspace. Only the segment selector is necessary, as all segment
+data resides in the GDT/LDT. Bases in the GDT/LDT are 32 bits wide. The
+segment selector values are user-chosen, and effectively arbitrary.
+
+The 32-bit mechanism is slow, so in 64-bit, segments were made mostly flat so
+as to not need reloading on entry/exit. FS and GS segment bases were extended
+to 64 bits, and became accessible via MSRs. Also, a separate GS_SHADOW value
+was introduced. The SWAPGS instruction swaps GS_BASE and GS_SHADOW, as the
+only action needed on entry/exit.
+
+64-bit userspace needed to make the prctl() ARCH_SET_GS have a base value
+greater than 32 bits, and a side effect of this syscall was to zero the GS
+selector.
+
+Then the FSGSBASE instructions came along, and userspace could finally choose
+an arbitrary base address not previously registered via the syscall. Linux's
+ABI promises to preserve both the selector value and the full base, even when
+they are disconnected.
+
+When looking at the hardware capabilities, there are multiple x86 instructions
+which modify GS:
+
+* SWAPGS
+
+Swaps the value in MSR_KERNEL_GS_BASE with the active GS.base in the hidden
+portion of the GS selector register.
+
+* MOV <segment selector>, GS
+
+(legacy path, non-FRED) Loads GS with the selector specified in <segment
+selector> and fetches the GS descriptor attributes, limit and base from the
+GDT/LDT. Writes a 32-bit base into the active GS.base, zero-extending it into
+the 64-bit base register. It does not touch MSR_KERNEL_GS_BASE.
+
+The problem with this is that because it writes the *current* GS.base, it
+corrupts the active kernel per-CPU pointer (in %gs).
+
+* LKGS <selector>
+
+(FRED path, replaces MOV GS) Like MOV GS in that it loads the selector and
+descriptor attributes, but it redirects the base write — instead of updating
+the active GS.base, it writes the descriptor base into IA32_KERNEL_GS_BASE
+(i.e. MSR_KERNEL_GS_BASE).
+
+Critical caveat: it only writes a zero-extended 32-bit value, because GDT/LDT
+descriptors only encode 32-bit bases. This means it cannot correctly represent
+a full 64-bit user-space GS base (e.g. a TLS pointer), so a full 64-bit WRMSR
+is still required afterwards.
+
+This instruction ensures that the kernel's per-CPU pointer stays good, and
+does not need custom error handling.
+
+MOV GS and LKGS are the only way to update the other fields of the GS
+descriptor.
+
+* WRGSBASE <reg>
+
+In 64-bit mode, it writes a full 64-bit value directly into the currently
+active GS.base as FS.base and GS.base in 64-bit mode are expanded to 64-bit to
+cover the full address space.
+
+The problem in kernel context: the currently active GS.base belongs to the
+kernel, not the user task. So using this during context switching would
+corrupt the kernel's own GS.base, unless surrounded by SWAPGS (only safe in
+IDT mode).
+
+In the remaining modes, the upper 32 bits of the base are cleared instead.
+
+* WRMSR MSR_KERNEL_GS_BASE / WRMSRNS MSR_KERNEL_GS_BASE
+
+Writes a full 64-bit value into MSR_KERNEL_GS_BASE, which holds the inactive
+(user-space) GS.base — the one that gets swapped into the active GS.base on
+SWAPGS. This is the only instruction that can correctly set a 64-bit user
+GS.base during a context switch from kernel mode.
+
+The non-serializing nature of the write was accomplished by the two vendors
+differently. AMD, starting with Zen4, made it the default through:
+
+  CPUID_Fn80000021_EAX [Extended Feature 2 EAX]
+  (Core::X86::Cpuid::FeatureExt2Eax)[1], FsGsKernelGsBaseNonSerializing which is
+  fixed to 1
+
+and Intel through the WRMSRNS instruction which is the non-serializing
+variant.
+
+Btw, while running in kernel mode, MSR_KERNEL_GS_BASE contains actually the
+*user* GS.base. Thus, the naming can be confusing. Unless one thinks of it as
+the kernel's access to GS.base as MSRs are accessible only in CPL0.
-- 
2.53.0

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
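The reason the full 64-bit write cannot be skipped on an LKGS-capable machine
can be replayed in a small user-space model of the GS part of the context
switch. This is a sketch under stated assumptions, not the kernel's
implementation: struct gs_state, struct task, desc_base[] and the model_*()
functions are hypothetical, the descriptor table is a made-up four-entry
array, and the skip_msr_write flag stands in for the shortcut proposed in the
first message of the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the GS state as seen while running in the kernel. */
struct gs_state {
	uint16_t selector;
	uint64_t active_base;    /* kernel per-CPU pointer                */
	uint64_t kernel_gs_base; /* MSR_KERNEL_GS_BASE: the user GS.base */
};

/* What the kernel saves per task: selector and full 64-bit base. */
struct task {
	uint16_t gsindex;
	uint64_t gsbase;
};

/* Hypothetical descriptor table: entry index -> 32-bit segment base. */
static const uint32_t desc_base[4] = { 0, 0x10000000u, 0x20000000u, 0 };

/*
 * LKGS: load the selector and write the descriptor's 32-bit base,
 * zero-extended, into MSR_KERNEL_GS_BASE. The active base is untouched.
 */
static void model_lkgs(struct gs_state *s, uint16_t sel)
{
	s->selector = sel;
	s->kernel_gs_base = desc_base[(sel >> 3) & 3];
}

/* WRMSR/WRMSRNS to MSR_KERNEL_GS_BASE: full 64-bit write. */
static void model_wrmsr_kernel_gs_base(struct gs_state *s, uint64_t val)
{
	s->kernel_gs_base = val;
}

/*
 * GS part of a context switch, shaped like the snippet quoted at the
 * start of the thread. skip_msr_write != 0 models the proposed shortcut
 * of relying on LKGS alone.
 */
static void model_switch_gs(struct gs_state *s, const struct task *prev,
			    const struct task *next, int skip_msr_write)
{
	/* The selector load is conditional, so LKGS may not run at all. */
	if (prev->gsindex || next->gsindex)
		model_lkgs(s, next->gsindex);

	if (!skip_msr_write)
		model_wrmsr_kernel_gs_base(s, next->gsbase);
}
```

With two tasks that both have a zero GS selector but different bases, the
shortcut leaves the previous task's base in place because LKGS never runs.
With a non-zero selector and a base set independently (for example via
WRGSBASE in userspace), LKGS installs the descriptor's 32-bit base rather than
the task's saved 64-bit one. In both cases the unconditional 64-bit write
restores the intended value, and the active (kernel) base is never touched.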
* Re: Save a WRMSR GS.base?
  2026-06-18 1:09 ` Borislav Petkov
@ 2026-06-18 10:22 ` David Laight
  2026-06-18 18:17 ` H. Peter Anvin
  0 siblings, 1 reply; 25+ messages in thread
From: David Laight @ 2026-06-18 10:22 UTC
To: Borislav Petkov; +Cc: Andrew Cooper, H. Peter Anvin, x86-ML, LKML

On Wed, 17 Jun 2026 18:09:02 -0700
Borislav Petkov <bp@alien8.de> wrote:

> Ok,
>
> I think I incorporated them all:
...
> +Btw, while running in kernel mode, MSR_KERNEL_GS_BASE contains actually the
> +*user* GS.base. Thus, the naming can be confusing. Unless one thinks of it as
> +the kernel's access to GS.base as MSRs are accessible only in CPL0.

That last sentence doesn't read right. Maybe:

The naming of MSR_KERNEL_GS_BASE is rather confusing.
It can only be accessed in kernel mode where it normally contains the
USER GS.base.
The only time it contains the KERNEL GS.base is on system call/interrupt entry
prior to swapgs being executed (and late in the return to user paths).

As an aside I think a 32bit program can detect hardware interrupts.
If %gs/%fs is loaded from an LDT and then the LDT entry changed (eg
a different limit) then the new limit will be loaded by the ISR
return path.
I seem to remember deciding that it was impossible to actually restore
the actual register value.

	David
* Re: Save a WRMSR GS.base?
  2026-06-18 10:22 ` David Laight
@ 2026-06-18 18:17 ` H. Peter Anvin
  0 siblings, 0 replies; 25+ messages in thread
From: H. Peter Anvin @ 2026-06-18 18:17 UTC
To: David Laight, Borislav Petkov; +Cc: Andrew Cooper, x86-ML, LKML

On June 18, 2026 3:22:56 AM PDT, David Laight <david.laight.linux@gmail.com> wrote:
>On Wed, 17 Jun 2026 18:09:02 -0700
>Borislav Petkov <bp@alien8.de> wrote:
>
>> Ok,
>>
>> I think I incorporated them all:
>...
>> +Btw, while running in kernel mode, MSR_KERNEL_GS_BASE contains actually the
>> +*user* GS.base. Thus, the naming can be confusing. Unless one thinks of it as
>> +the kernel's access to GS.base as MSRs are accessible only in CPL0.
>
>That last sentence doesn't read right. Maybe:
>
>The naming of MSR_KERNEL_GS_BASE is rather confusing.
>It can only be accessed in kernel mode where it normally contains the
>USER GS.base.
>The only time it contains the KERNEL GS.base is on system call/interrupt entry
>prior to swapgs being executed (and late in the return to user paths).
>
>As an aside I think a 32bit program can detect hardware interrupts.
>If %gs/%fs is loaded from an LDT and then the LDT entry changed (eg
>a different limit) then the new limit will be loaded by the ISR
>return path.
>I seem to remember deciding that it was impossible to actually restore
>the actual register value.
>
>	David

Not just 32-bit programs; any program using IDT, simply by loading a nonzero
null selector.

FRED does close that gap.