* Save a WRMSR GS.base?
@ 2026-06-04  1:53 Borislav Petkov
  2026-06-04  9:17 ` Andrew Cooper
  0 siblings, 1 reply; 24+ messages in thread
From: Borislav Petkov @ 2026-06-04  1:53 UTC (permalink / raw)
  To: H. Peter Anvin, Andrew Cooper; +Cc: x86-ML, LKML

Hi,

so here:

diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index b85e715ebb30..ffa894bdb4ee 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -400,7 +400,9 @@ static __always_inline void x86_fsgsbase_load(struct thread_struct *prev,
 
 		/* Update the bases. */
 		wrfsbase(next->fsbase);
-		__wrgsbase_inactive(next->gsbase);
+
+		if (!cpu_feature_enabled(X86_FEATURE_LKGS))
+			__wrgsbase_inactive(next->gsbase);
 	} else {
 		load_seg_legacy(prev->fsindex, prev->fsbase,
 				next->fsindex, next->fsbase, FS);

a couple of lines above in that function we have:

                if (unlikely(prev->gsindex || next->gsindex))
                        loadseg(GS, next->gsindex);

which, on a FRED machine, would do LKGS. Now that insn does:

		GS.selector := SRC;
		GS.attributes := descriptor.attributes;
		IA32_KERNEL_GS_BASE := descriptor.base; // bits 63:32 cleared

so I can save myself the __wrgsbase_inactive() which ends up doing WRMSR
GS.base. 

Right? I.e., the diff above.

We're also not doing the optimization of checking whether prev.GS.base and
next.GS.base are equal. I see them both 0 in a trace here but I guess
luserspace can change them so I guess we wanna overwrite GS.base on context
switch unconditionally.

But LKGS does that for us so we don't need the WRMSR GS.base there, right?

Or am I missing something?

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: Save a WRMSR GS.base?
  2026-06-04  1:53 Save a WRMSR GS.base? Borislav Petkov
@ 2026-06-04  9:17 ` Andrew Cooper
  2026-06-05  2:24   ` Borislav Petkov
  0 siblings, 1 reply; 24+ messages in thread
From: Andrew Cooper @ 2026-06-04  9:17 UTC (permalink / raw)
  To: Borislav Petkov, H. Peter Anvin; +Cc: Andrew Cooper, x86-ML, LKML

On 04/06/2026 2:53 am, Borislav Petkov wrote:
> Hi,
>
> so here:
>
> diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
> index b85e715ebb30..ffa894bdb4ee 100644
> --- a/arch/x86/kernel/process_64.c
> +++ b/arch/x86/kernel/process_64.c
> @@ -400,7 +400,9 @@ static __always_inline void x86_fsgsbase_load(struct thread_struct *prev,
>  
>  		/* Update the bases. */
>  		wrfsbase(next->fsbase);
> -		__wrgsbase_inactive(next->gsbase);
> +
> +		if (!cpu_feature_enabled(X86_FEATURE_LKGS))
> +			__wrgsbase_inactive(next->gsbase);
>  	} else {
>  		load_seg_legacy(prev->fsindex, prev->fsbase,
>  				next->fsindex, next->fsbase, FS);
>
> a couple of lines above in that function we have:
>
>                 if (unlikely(prev->gsindex || next->gsindex))
>                         loadseg(GS, next->gsindex);
>
> which, on a FRED machine, would do LKGS. Now that insn does:
>
> 		GS.selector := SRC;
> 		GS.attributes := descriptor.attributes;
> 		IA32_KERNEL_GS_BASE := descriptor.base; // bits 63:32 cleared
>
> so I can save myself the __wrgsbase_inactive() which ends up doing WRMSR
> GS.base. 
>
> Right? I.e., the diff above.
>
> We're also not doing the optimization of checking whether prev.GS.base and
> next.GS.base are equal. I see them both 0 in a trace here but I guess
> luserspace can change them so I guess we wanna overwrite GS.base on context
> switch unconditionally.
>
> But LKGS does that for us so we don't need the WRMSR GS.base there, right?
>
> Or am I missing something?

Yes, but it took me writing a "no" email to spot it.

If the LKGS (in load seg) was called unconditionally, then yes it would
be safe to drop the __wrgsbase_inactive(), but it's not.

Consider a prev and next which both have the same ->gsindex (so skips
loadseg()), but have different ->gsbase (still need to update KERN_GS_BASE).
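To make the counterexample concrete, here is a minimal user-space sketch (a
toy model, not kernel code). The names task_model, cpu_model, model_lkgs and
model_fsgsbase_load are made up for illustration, and the descriptor base
handed to the LKGS stand-in is a hypothetical 0. Going by the
"prev->gsindex || next->gsindex" check quoted above, the selector load is
skipped when both selectors are zero, so if the explicit write is dropped the
stale base survives:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only. Field names mirror the thread_struct fields in the diff. */
struct task_model {
	uint16_t gsindex;
	uint64_t gsbase;
};

struct cpu_model {
	uint16_t gs_selector;
	uint64_t kernel_gs_base;	/* stands in for MSR_KERNEL_GS_BASE */
};

/* Stand-in for LKGS: the base comes from a descriptor, so 32 bits at most. */
static void model_lkgs(struct cpu_model *cpu, uint16_t sel, uint32_t desc_base)
{
	assert(cpu);
	cpu->gs_selector = sel;
	cpu->kernel_gs_base = desc_base;	/* bits 63:32 cleared */
}

/*
 * skip_wrmsr_on_lkgs == true models the first proposed diff: the explicit
 * base write is never done when LKGS is available.
 */
static void model_fsgsbase_load(struct cpu_model *cpu,
				const struct task_model *prev,
				const struct task_model *next,
				bool skip_wrmsr_on_lkgs)
{
	assert(cpu && prev && next);

	/* Same condition as the quoted kernel code. */
	if (prev->gsindex || next->gsindex)
		model_lkgs(cpu, next->gsindex, 0 /* hypothetical descriptor base */);

	if (!skip_wrmsr_on_lkgs)
		cpu->kernel_gs_base = next->gsbase;
}
```

With prev = { 0, 0x1000 } and next = { 0, 0x2000 }, the variant that skips the
write leaves 0x1000 in the modelled MSR, while the unconditional write ends up
with 0x2000.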

~Andrew


* Re: Save a WRMSR GS.base?
  2026-06-04  9:17 ` Andrew Cooper
@ 2026-06-05  2:24   ` Borislav Petkov
  2026-06-05  2:36     ` H. Peter Anvin
  0 siblings, 1 reply; 24+ messages in thread
From: Borislav Petkov @ 2026-06-05  2:24 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: H. Peter Anvin, x86-ML, LKML

On Thu, Jun 04, 2026 at 10:17:57AM +0100, Andrew Cooper wrote:
> Yes, but it took me writing a "no" email to spot it.

Oh, I know those situations.

> If the LKGS (in load seg) was called unconditionally, then yes it would
> be safe to drop the __wrgsbase_inactive(), but it's not.
> 
> Consider a prev and next which both have the same ->gsindex (so skips
> loadseg()), but have different ->gsbase (still need to update KERN_GS_BASE).

Gah, ofc.

So we'll have to do something like this which is ugly as hell:

---

diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index b85e715ebb30..248c39da9ba0 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -391,16 +391,23 @@ static __always_inline void x86_pkru_load(struct thread_struct *prev,
 static __always_inline void x86_fsgsbase_load(struct thread_struct *prev,
 					      struct thread_struct *next)
 {
+	bool loaded_gs = false;
+
 	if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
 		/* Update the FS and GS selectors if they could have changed. */
 		if (unlikely(prev->fsindex || next->fsindex))
 			loadseg(FS, next->fsindex);
-		if (unlikely(prev->gsindex || next->gsindex))
+
+		if (unlikely(prev->gsindex || next->gsindex)) {
 			loadseg(GS, next->gsindex);
+			loaded_gs = true;
+		}
 
 		/* Update the bases. */
 		wrfsbase(next->fsbase);
-		__wrgsbase_inactive(next->gsbase);
+
+		if (!(cpu_feature_enabled(X86_FEATURE_LKGS) && loaded_gs))
+			__wrgsbase_inactive(next->gsbase);
 	} else {
 		load_seg_legacy(prev->fsindex, prev->fsbase,
 				next->fsindex, next->fsbase, FS);


Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: Save a WRMSR GS.base?
  2026-06-05  2:24   ` Borislav Petkov
@ 2026-06-05  2:36     ` H. Peter Anvin
  2026-06-05  2:54       ` Borislav Petkov
  0 siblings, 1 reply; 24+ messages in thread
From: H. Peter Anvin @ 2026-06-05  2:36 UTC (permalink / raw)
  To: Borislav Petkov, Andrew Cooper; +Cc: x86-ML, LKML

On June 4, 2026 7:24:28 PM PDT, Borislav Petkov <bp@alien8.de> wrote:
>On Thu, Jun 04, 2026 at 10:17:57AM +0100, Andrew Cooper wrote:
>> Yes, but it took me writing a "no" email to spot it.
>
>Oh, I know those situations.
>
>> If the LKGS (in load seg) was called unconditionally, then yes it would
>> be safe to drop the __wrgsbase_inactive(), but it's not.
>> 
>> Consider a prev and next which both have the same ->gsindex (so skips
>> loadseg()), but have different ->gsbase (still need to update KERN_GS_BASE).
>
>Gah, ofc.
>
>So we'll have to do something like this which is ugly as hell:
>
>---
>
>diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
>index b85e715ebb30..248c39da9ba0 100644
>--- a/arch/x86/kernel/process_64.c
>+++ b/arch/x86/kernel/process_64.c
>@@ -391,16 +391,23 @@ static __always_inline void x86_pkru_load(struct thread_struct *prev,
> static __always_inline void x86_fsgsbase_load(struct thread_struct *prev,
> 					      struct thread_struct *next)
> {
>+	bool loaded_gs = false;
>+
> 	if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
> 		/* Update the FS and GS selectors if they could have changed. */
> 		if (unlikely(prev->fsindex || next->fsindex))
> 			loadseg(FS, next->fsindex);
>-		if (unlikely(prev->gsindex || next->gsindex))
>+
>+		if (unlikely(prev->gsindex || next->gsindex)) {
> 			loadseg(GS, next->gsindex);
>+			loaded_gs = true;
>+		}
> 
> 		/* Update the bases. */
> 		wrfsbase(next->fsbase);
>-		__wrgsbase_inactive(next->gsbase);
>+
>+		if (!(cpu_feature_enabled(X86_FEATURE_LKGS) && loaded_gs))
>+			__wrgsbase_inactive(next->gsbase);
> 	} else {
> 		load_seg_legacy(prev->fsindex, prev->fsbase,
> 				next->fsindex, next->fsbase, FS);
>
>
>Thx.
>

Also consider that user space might have done: 

    mov gs,...
    wrgsbase ...

So gs.selector > 3 doesn't necessarily mean that the base is consistent with the descriptor.
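A rough user-space sketch of that sequence, as a toy model rather than real
segment handling. The names gs_state, model_mov_gs, model_wrgsbase and
base_matches_descriptor are invented for illustration, and the selector and
base values used with it are arbitrary:

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the user-visible GS state; not kernel code. */
struct gs_state {
	uint16_t selector;
	uint64_t base;
};

/* Models "mov gs, sel": the base is taken from the descriptor (32 bits). */
static void model_mov_gs(struct gs_state *gs, uint16_t sel, uint32_t desc_base)
{
	assert(gs);
	gs->selector = sel;
	gs->base = desc_base;
}

/* Models WRGSBASE: replaces the base and leaves the selector untouched. */
static void model_wrgsbase(struct gs_state *gs, uint64_t base)
{
	assert(gs);
	gs->base = base;
}

/* Non-zero if the base is what reloading the selector would produce. */
static int base_matches_descriptor(const struct gs_state *gs, uint32_t desc_base)
{
	assert(gs);
	return gs->base == (uint64_t)desc_base;
}
```

After model_mov_gs() followed by model_wrgsbase() with a different value, the
selector is unchanged but base_matches_descriptor() returns 0, which is why a
selector reload alone cannot reconstruct the task's saved base.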


* Re: Save a WRMSR GS.base?
  2026-06-05  2:36     ` H. Peter Anvin
@ 2026-06-05  2:54       ` Borislav Petkov
  2026-06-05  3:20         ` H. Peter Anvin
  0 siblings, 1 reply; 24+ messages in thread
From: Borislav Petkov @ 2026-06-05  2:54 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Andrew Cooper, x86-ML, LKML

On Thu, Jun 04, 2026 at 07:36:01PM -0700, H. Peter Anvin wrote:
> >diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
> >index b85e715ebb30..248c39da9ba0 100644
> >--- a/arch/x86/kernel/process_64.c
> >+++ b/arch/x86/kernel/process_64.c
> >@@ -391,16 +391,23 @@ static __always_inline void x86_pkru_load(struct thread_struct *prev,
> > static __always_inline void x86_fsgsbase_load(struct thread_struct *prev,
> > 					      struct thread_struct *next)
> > {
> >+	bool loaded_gs = false;
> >+
> > 	if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
> > 		/* Update the FS and GS selectors if they could have changed. */
> > 		if (unlikely(prev->fsindex || next->fsindex))
> > 			loadseg(FS, next->fsindex);
> >-		if (unlikely(prev->gsindex || next->gsindex))
> >+
> >+		if (unlikely(prev->gsindex || next->gsindex)) {
> > 			loadseg(GS, next->gsindex);
> >+			loaded_gs = true;
> >+		}
> > 
> > 		/* Update the bases. */
> > 		wrfsbase(next->fsbase);
> >-		__wrgsbase_inactive(next->gsbase);
> >+
> >+		if (!(cpu_feature_enabled(X86_FEATURE_LKGS) && loaded_gs))
> >+			__wrgsbase_inactive(next->gsbase);
> > 	} else {
> > 		load_seg_legacy(prev->fsindex, prev->fsbase,
> > 				next->fsindex, next->fsbase, FS);
> >
> >
> >Thx.
> >
> 
> Also consider that user space might have done: 
> 
>     mov gs,...
>     wrgsbase ...
> 
> So gs.selector > 3 doesn't necessarily mean that the base is consistent with the descriptor.

Right, I want to avoid the second write to KERNEL_GS_BASE iff we have done
LKGS before.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: Save a WRMSR GS.base?
  2026-06-05  2:54       ` Borislav Petkov
@ 2026-06-05  3:20         ` H. Peter Anvin
  2026-06-05  4:26           ` Borislav Petkov
  0 siblings, 1 reply; 24+ messages in thread
From: H. Peter Anvin @ 2026-06-05  3:20 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Andrew Cooper, x86-ML, LKML

On June 4, 2026 7:54:53 PM PDT, Borislav Petkov <bp@alien8.de> wrote:
>On Thu, Jun 04, 2026 at 07:36:01PM -0700, H. Peter Anvin wrote:
>> >diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
>> >index b85e715ebb30..248c39da9ba0 100644
>> >--- a/arch/x86/kernel/process_64.c
>> >+++ b/arch/x86/kernel/process_64.c
>> >@@ -391,16 +391,23 @@ static __always_inline void x86_pkru_load(struct thread_struct *prev,
>> > static __always_inline void x86_fsgsbase_load(struct thread_struct *prev,
>> > 					      struct thread_struct *next)
>> > {
>> >+	bool loaded_gs = false;
>> >+
>> > 	if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
>> > 		/* Update the FS and GS selectors if they could have changed. */
>> > 		if (unlikely(prev->fsindex || next->fsindex))
>> > 			loadseg(FS, next->fsindex);
>> >-		if (unlikely(prev->gsindex || next->gsindex))
>> >+
>> >+		if (unlikely(prev->gsindex || next->gsindex)) {
>> > 			loadseg(GS, next->gsindex);
>> >+			loaded_gs = true;
>> >+		}
>> > 
>> > 		/* Update the bases. */
>> > 		wrfsbase(next->fsbase);
>> >-		__wrgsbase_inactive(next->gsbase);
>> >+
>> >+		if (!(cpu_feature_enabled(X86_FEATURE_LKGS) && loaded_gs))
>> >+			__wrgsbase_inactive(next->gsbase);
>> > 	} else {
>> > 		load_seg_legacy(prev->fsindex, prev->fsbase,
>> > 				next->fsindex, next->fsbase, FS);
>> >
>> >
>> >Thx.
>> >
>> 
>> Also consider that user space might have done: 
>> 
>>     mov gs,...
>>     wrgsbase ...
>> 
>> So gs.selector > 3 doesn't necessarily mean that the base is consistent with the descriptor.
>
>Right, I want to avoid the second write to KERNEL_GS_BASE iff we have done
>LKGS before.
>

I guess the question is why there is a "first" one.

Logically the sequence should be LKGS first, if needed; then WRMSR(NS). LKGS can be replaced with swapgs/mov gs/swapgs on legacy.


* Re: Save a WRMSR GS.base?
  2026-06-05  3:20         ` H. Peter Anvin
@ 2026-06-05  4:26           ` Borislav Petkov
  2026-06-05  4:30             ` H. Peter Anvin
  0 siblings, 1 reply; 24+ messages in thread
From: Borislav Petkov @ 2026-06-05  4:26 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Andrew Cooper, x86-ML, LKML

On Thu, Jun 04, 2026 at 08:20:57PM -0700, H. Peter Anvin wrote:
> I guess the question is why there is a "first" one.

That happens when we do:

x86_fsgsbase_load()

	loadseg(GS) -> load_gs_index() -> native_load_gs_index() ->
	  if (cpu_feature_enabled(X86_FEATURE_LKGS))
                native_lkgs(selector);

then back in x86_fsgsbase_load() we do:

	__wrgsbase_inactive(next->gsbase);

which does

	wrmsrq(MSR_KERNEL_GS_BASE, gsbase);

on FRED.

But LKGS already wrote MSR_KERNEL_GS_BASE...

> Logically the sequence should be LKGS first, if needed; then WRMSR(NS). LKGS
> can be replaced with swapgs/mov gs/swapgs on legacy.

Right.

I think avoiding that second WRMSR(MSR_KERNEL_GS_BASE) should give some perf
back...

Although, I need to think how to make it pretty...

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: Save a WRMSR GS.base?
  2026-06-05  4:26           ` Borislav Petkov
@ 2026-06-05  4:30             ` H. Peter Anvin
  2026-06-05  4:38               ` Borislav Petkov
  0 siblings, 1 reply; 24+ messages in thread
From: H. Peter Anvin @ 2026-06-05  4:30 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Andrew Cooper, x86-ML, LKML

On June 4, 2026 9:26:52 PM PDT, Borislav Petkov <bp@alien8.de> wrote:
>On Thu, Jun 04, 2026 at 08:20:57PM -0700, H. Peter Anvin wrote:
>> I guess the question is why there is a "first" one.
>
>That happens when we do:
>
>x86_fsgsbase_load()
>
>	loadseg(GS) -> load_gs_index() -> native_load_gs_index() ->
>	  if (cpu_feature_enabled(X86_FEATURE_LKGS))
>                native_lkgs(selector);
>
>then back in x86_fsgsbase_load() we do:
>
>	__wrgsbase_inactive(next->gsbase);
>
>which does
>
>	wrmsrq(MSR_KERNEL_GS_BASE, gsbase);
>
>on FRED.
>
>But LKGS already wrote MSR_KERNEL_GS_BASE...
>
>> Logically the sequence should be LKGS first, if needed; then WRMSR(NS). LKGS
>> can be replaced with swapgs/mov gs/swapgs on legacy.
>
>Right.
>
>I think avoiding that second WRMSR(MSR_KERNEL_GS_BASE) should give some perf
>back...
>
>Although, I need to think how to make it pretty...
>

Should be doing wrmsrns...


* Re: Save a WRMSR GS.base?
  2026-06-05  4:30             ` H. Peter Anvin
@ 2026-06-05  4:38               ` Borislav Petkov
  2026-06-05  5:05                 ` H. Peter Anvin
  0 siblings, 1 reply; 24+ messages in thread
From: Borislav Petkov @ 2026-06-05  4:38 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Andrew Cooper, x86-ML, LKML

On Thu, Jun 04, 2026 at 09:30:33PM -0700, H. Peter Anvin wrote:
> On June 4, 2026 9:26:52 PM PDT, Borislav Petkov <bp@alien8.de> wrote:
> >On Thu, Jun 04, 2026 at 08:20:57PM -0700, H. Peter Anvin wrote:
> >> I guess the question is why there is a "first" one.
> >
> >That happens when we do:
> >
> >x86_fsgsbase_load()
> >
> >	loadseg(GS) -> load_gs_index() -> native_load_gs_index() ->
> >	  if (cpu_feature_enabled(X86_FEATURE_LKGS))
> >                native_lkgs(selector);
> >
> >then back in x86_fsgsbase_load() we do:
> >
> >	__wrgsbase_inactive(next->gsbase);
> >
> >which does
> >
> >	wrmsrq(MSR_KERNEL_GS_BASE, gsbase);
> >
> >on FRED.
> >
> >But LKGS already wrote MSR_KERNEL_GS_BASE...
> >
> >> Logically the sequence should be LKGS first, if needed; then WRMSR(NS). LKGS
> >> can be replaced with swapgs/mov gs/swapgs on legacy.
> >
> >Right.
> >
> >I think avoiding that second WRMSR(MSR_KERNEL_GS_BASE) should give some perf
> >back...
> >
> >Although, I need to think how to make it pretty...
> >
> 
> Should be doing wrmsrns...

No, I think that second WRMSR* should not happen at all if we have executed
LKGS which has already written MSR_KERNEL_GS_BASE, right?


-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: Save a WRMSR GS.base?
  2026-06-05  4:38               ` Borislav Petkov
@ 2026-06-05  5:05                 ` H. Peter Anvin
  2026-06-05  9:13                   ` Andrew Cooper
  0 siblings, 1 reply; 24+ messages in thread
From: H. Peter Anvin @ 2026-06-05  5:05 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Andrew Cooper, x86-ML, LKML

On June 4, 2026 9:38:46 PM PDT, Borislav Petkov <bp@alien8.de> wrote:
>On Thu, Jun 04, 2026 at 09:30:33PM -0700, H. Peter Anvin wrote:
>> On June 4, 2026 9:26:52 PM PDT, Borislav Petkov <bp@alien8.de> wrote:
>> >On Thu, Jun 04, 2026 at 08:20:57PM -0700, H. Peter Anvin wrote:
>> >> I guess the question is why there is a "first" one.
>> >
>> >That happens when we do:
>> >
>> >x86_fsgsbase_load()
>> >
>> >	loadseg(GS) -> load_gs_index() -> native_load_gs_index() ->
>> >	  if (cpu_feature_enabled(X86_FEATURE_LKGS))
>> >                native_lkgs(selector);
>> >
>> >then back in x86_fsgsbase_load() we do:
>> >
>> >	__wrgsbase_inactive(next->gsbase);
>> >
>> >which does
>> >
>> >	wrmsrq(MSR_KERNEL_GS_BASE, gsbase);
>> >
>> >on FRED.
>> >
>> >But LKGS already wrote MSR_KERNEL_GS_BASE...
>> >
>> >> Logically the sequence should be LKGS first, if needed; then WRMSR(NS). LKGS
>> >> can be replaced with swapgs/mov gs/swapgs on legacy.
>> >
>> >Right.
>> >
>> >I think avoiding that second WRMSR(MSR_KERNEL_GS_BASE) should give some perf
>> >back...
>> >
>> >Although, I need to think how to make it pretty...
>> >
>> 
>> Should be doing wrmsrns...
>
>No, I think that second WRMSR* should not happen at all if we have executed
>LKGS which has already written MSR_KERNEL_GS_BASE, right?
>
>

You can't do that (at least not without further checks) if user space has WRGSBASE enabled, since you have no guarantee that the active GS.base is consistent with GS.selector.

Since GS > 3 is pretty rare in 64-bit code at least, it doesn't seem to be a code path that needs to be that heavily optimized.


* Re: Save a WRMSR GS.base?
  2026-06-05  5:05                 ` H. Peter Anvin
@ 2026-06-05  9:13                   ` Andrew Cooper
  2026-06-05 15:13                     ` H. Peter Anvin
  0 siblings, 1 reply; 24+ messages in thread
From: Andrew Cooper @ 2026-06-05  9:13 UTC (permalink / raw)
  To: H. Peter Anvin, Borislav Petkov; +Cc: Andrew Cooper, x86-ML, LKML

On 05/06/2026 6:05 am, H. Peter Anvin wrote:
> On June 4, 2026 9:38:46 PM PDT, Borislav Petkov <bp@alien8.de> wrote:
>> On Thu, Jun 04, 2026 at 09:30:33PM -0700, H. Peter Anvin wrote:
>>> On June 4, 2026 9:26:52 PM PDT, Borislav Petkov <bp@alien8.de> wrote:
>>>> On Thu, Jun 04, 2026 at 08:20:57PM -0700, H. Peter Anvin wrote:
>>>>> I guess the question is why there is a "first" one.
>>>> That happens when we do:
>>>>
>>>> x86_fsgsbase_load()
>>>>
>>>> 	loadseg(GS) -> load_gs_index() -> native_load_gs_index() ->
>>>> 	  if (cpu_feature_enabled(X86_FEATURE_LKGS))
>>>>                native_lkgs(selector);
>>>>
>>>> then back in x86_fsgsbase_load() we do:
>>>>
>>>> 	__wrgsbase_inactive(next->gsbase);
>>>>
>>>> which does
>>>>
>>>> 	wrmsrq(MSR_KERNEL_GS_BASE, gsbase);
>>>>
>>>> on FRED.
>>>>
>>>> But LKGS already wrote MSR_KERNEL_GS_BASE...
>>>>
>>>>> Logically the sequence should be LKGS first, if needed; then WRMSR(NS). LKGS
>>>>> can be replaced with swapgs/mov gs/swapgs on legacy.
>>>> Right.
>>>>
>>>> I think avoiding that second WRMSR(MSR_KERNEL_GS_BASE) should give some perf
>>>> back...
>>>>
>>>> Although, I need to think how to make it pretty...
>>>>
>>> Should be doing wrmsrns...
>> No, I think that second WRMSR* should not happen at all if we have executed
>> LKGS which has already written MSR_KERNEL_GS_BASE, right?
>>
>>
> You can't do that (at least not without further checks) if user space has WRGSBASE enabled, since you have no guarantee that the active GS.base is consistent with GS.selector.
>
> Since GS > 3 is pretty rare in 64-bit code at least, it doesn't seem to be a code path that needs to be that heavily optimized.

I think you're slightly talking past each other, and I also made a
mistake on the original reply, so let's try rephrasing it.

LKGS only writes a zero-extended 32-bit value into KERN_GS_BASE.  This is
because there's only 32 bits of information in the GDT/LDT.
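As a small illustration of the 32-bit limit, the sketch below pulls the base
out of a legacy 8-byte segment descriptor (base bits 23:0 sit at descriptor
bits 39:16, base bits 31:24 at bits 63:56). The function names are made up
for this example, and lkgs_base_from_descriptor() only mirrors the
"bits 63:32 cleared" pseudocode quoted earlier in the thread; it is not the
kernel's or the CPU's implementation:

```c
#include <stdint.h>

/* Extract the 32-bit base from a legacy 8-byte code/data descriptor. */
static uint32_t descriptor_base(uint64_t desc)
{
	uint32_t low  = (uint32_t)((desc >> 16) & 0xffffff);	/* base 23:0  */
	uint32_t high = (uint32_t)((desc >> 56) & 0xff);	/* base 31:24 */

	return (high << 24) | low;
}

/* Value a descriptor-based load can produce: zero-extended to 64 bits. */
static uint64_t lkgs_base_from_descriptor(uint64_t desc)
{
	return (uint64_t)descriptor_base(desc);
}

/* Non-zero if a wanted base could be expressed by a descriptor at all. */
static int lkgs_can_express(uint64_t wanted_base)
{
	return (wanted_base >> 32) == 0;
}
```

Any saved gsbase with a bit set above bit 31, which is typical for 64-bit
user mappings, fails lkgs_can_express(), so a descriptor-based load cannot
stand in for the full write.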

So the real write into KERN_GS_BASE is still needed.  Sorry - you can't
optimise this away.  Also, I'm pretty sure amluto did some x86 selftests
covering this last time the logic was rewritten.


As to WRMSR vs WRMSRNS, yes Intel CPUs want this to be WRMSRNS.  AMD
don't have WRMSRNS but this particular MSR index is architecturally not
serialising anyway.

~Andrew


* Re: Save a WRMSR GS.base?
  2026-06-05  9:13                   ` Andrew Cooper
@ 2026-06-05 15:13                     ` H. Peter Anvin
  2026-06-05 15:16                       ` Andrew Cooper
  0 siblings, 1 reply; 24+ messages in thread
From: H. Peter Anvin @ 2026-06-05 15:13 UTC (permalink / raw)
  To: Andrew Cooper, Borislav Petkov; +Cc: x86-ML, LKML

On June 5, 2026 2:13:07 AM PDT, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>On 05/06/2026 6:05 am, H. Peter Anvin wrote:
>> On June 4, 2026 9:38:46 PM PDT, Borislav Petkov <bp@alien8.de> wrote:
>>> On Thu, Jun 04, 2026 at 09:30:33PM -0700, H. Peter Anvin wrote:
>>>> On June 4, 2026 9:26:52 PM PDT, Borislav Petkov <bp@alien8.de> wrote:
>>>>> On Thu, Jun 04, 2026 at 08:20:57PM -0700, H. Peter Anvin wrote:
>>>>>> I guess the question is why there is a "first" one.
>>>>> That happens when we do:
>>>>>
>>>>> x86_fsgsbase_load()
>>>>>
>>>>> 	loadseg(GS) -> load_gs_index() -> native_load_gs_index() ->
>>>>> 	  if (cpu_feature_enabled(X86_FEATURE_LKGS))
>>>>>                native_lkgs(selector);
>>>>>
>>>>> then back in x86_fsgsbase_load() we do:
>>>>>
>>>>> 	__wrgsbase_inactive(next->gsbase);
>>>>>
>>>>> which does
>>>>>
>>>>> 	wrmsrq(MSR_KERNEL_GS_BASE, gsbase);
>>>>>
>>>>> on FRED.
>>>>>
>>>>> But LKGS already wrote MSR_KERNEL_GS_BASE...
>>>>>
>>>>>> Logically the sequence should be LKGS first, if needed; then WRMSR(NS). LKGS
>>>>>> can be replaced with swapgs/mov gs/swapgs on legacy.
>>>>> Right.
>>>>>
>>>>> I think avoiding that second WRMSR(MSR_KERNEL_GS_BASE) should give some perf
>>>>> back...
>>>>>
>>>>> Although, I need to think how to make it pretty...
>>>>>
>>>> Should be doing wrmsrns...
>>> No, I think that second WRMSR* should not happen at all if we have executed
>>> LKGS which has already written MSR_KERNEL_GS_BASE, right?
>>>
>>>
>> You can't do that (at least not without further checks) if user space has WRGSBASE enabled, since you have no guarantee that the active GS.base is consistent with GS.selector.
>>
>> Since GS > 3 is pretty rare in 64-bit code at least, it doesn't seem to be a code path that needs to be that heavily optimized.
>
>I think you're slightly talking past each other, and I also made a
>mistake on the original reply, so let's try rephrasing it.
>
>LKGS only writes a zero-extended 32-bit value into KERN_GS_BASE.  This is
>because there's only 32 bits of information in the GDT/LDT.
>
>So the real write into KERN_GS_BASE is still needed.  Sorry - you can't
>optimise this away.  Also, I'm pretty sure amluto did some x86 selftests
>covering this last time the logic was rewritten.
>
>
>As to WRMSR vs WRMSRNS, yes Intel CPUs want this to be WRMSRNS.  AMD
>don't have WRMSRNS but this particular MSR index is architecturally not
>serialising anyway.
>
>~Andrew

It's not just a matter of it being a 32-bit base, it might not even be the correct one even so.


* Re: Save a WRMSR GS.base?
  2026-06-05 15:13                     ` H. Peter Anvin
@ 2026-06-05 15:16                       ` Andrew Cooper
  2026-06-05 15:51                         ` H. Peter Anvin
  0 siblings, 1 reply; 24+ messages in thread
From: Andrew Cooper @ 2026-06-05 15:16 UTC (permalink / raw)
  To: H. Peter Anvin, Borislav Petkov; +Cc: Andrew Cooper, x86-ML, LKML

On 05/06/2026 4:13 pm, H. Peter Anvin wrote:
> On June 5, 2026 2:13:07 AM PDT, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> On 05/06/2026 6:05 am, H. Peter Anvin wrote:
>>> On June 4, 2026 9:38:46 PM PDT, Borislav Petkov <bp@alien8.de> wrote:
>>>> On Thu, Jun 04, 2026 at 09:30:33PM -0700, H. Peter Anvin wrote:
>>>>> On June 4, 2026 9:26:52 PM PDT, Borislav Petkov <bp@alien8.de> wrote:
>>>>>> On Thu, Jun 04, 2026 at 08:20:57PM -0700, H. Peter Anvin wrote:
>>>>>>> I guess the question is why there is a "first" one.
>>>>>> That happens when we do:
>>>>>>
>>>>>> x86_fsgsbase_load()
>>>>>>
>>>>>> 	loadseg(GS) -> load_gs_index() -> native_load_gs_index() ->
>>>>>> 	  if (cpu_feature_enabled(X86_FEATURE_LKGS))
>>>>>>                native_lkgs(selector);
>>>>>>
>>>>>> then back in x86_fsgsbase_load() we do:
>>>>>>
>>>>>> 	__wrgsbase_inactive(next->gsbase);
>>>>>>
>>>>>> which does
>>>>>>
>>>>>> 	wrmsrq(MSR_KERNEL_GS_BASE, gsbase);
>>>>>>
>>>>>> on FRED.
>>>>>>
>>>>>> But LKGS already wrote MSR_KERNEL_GS_BASE...
>>>>>>
>>>>>>> Logically the sequence should be LKGS first, if needed; then WRMSR(NS). LKGS
>>>>>>> can be replaced with swapgs/mov gs/swapgs on legacy.
>>>>>> Right.
>>>>>>
>>>>>> I think avoiding that second WRMSR(MSR_KERNEL_GS_BASE) should give some perf
>>>>>> back...
>>>>>>
>>>>>> Although, I need to think how to make it pretty...
>>>>>>
>>>>> Should be doing wrmsrns...
>>>> No, I think that second WRMSR* should not happen at all if we have executed
>>>> LKGS which has already written MSR_KERNEL_GS_BASE, right?
>>>>
>>>>
>>> You can't do that (at least not without further checks) if user space has WRGSBASE enabled, since you have no guarantee that the active GS.base is consistent with GS.selector.
>>>
>>> Since GS > 3 is pretty rare in 64-bit code at least, it doesn't seem to be a code path that needs to be that heavily optimized.
>> I think you're slightly talking past each other, and I also made a
>> mistake on the original reply, so let's try rephrasing it.
>>
>> LKGS only writes a zero-extended 32-bit value into KERN_GS_BASE.  This is
>> because there's only 32 bits of information in the GDT/LDT.
>>
>> So the real write into KERN_GS_BASE is still needed.  Sorry - you can't
>> optimise this away.  Also, I'm pretty sure amluto did some x86 selftests
>> covering this last time the logic was rewritten.
>>
>>
>> As to WRMSR vs WRMSRNS, yes Intel CPUs want this to be WRMSRNS.  AMD
>> don't have WRMSRNS but this particular MSR index is architecturally not
>> serialising anyway.
>>
>> ~Andrew
> It's not just a matter of it being a 32-bit base, it might not even be the correct one even so.

Indeed, GS might be an LDT selector.

~Andrew


* Re: Save a WRMSR GS.base?
  2026-06-05 15:16                       ` Andrew Cooper
@ 2026-06-05 15:51                         ` H. Peter Anvin
  2026-06-05 17:17                           ` Borislav Petkov
  0 siblings, 1 reply; 24+ messages in thread
From: H. Peter Anvin @ 2026-06-05 15:51 UTC (permalink / raw)
  To: Andrew Cooper, Borislav Petkov; +Cc: x86-ML, LKML

On June 5, 2026 8:16:53 AM PDT, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>On 05/06/2026 4:13 pm, H. Peter Anvin wrote:
>> On June 5, 2026 2:13:07 AM PDT, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>> On 05/06/2026 6:05 am, H. Peter Anvin wrote:
>>>> On June 4, 2026 9:38:46 PM PDT, Borislav Petkov <bp@alien8.de> wrote:
>>>>> On Thu, Jun 04, 2026 at 09:30:33PM -0700, H. Peter Anvin wrote:
>>>>>> On June 4, 2026 9:26:52 PM PDT, Borislav Petkov <bp@alien8.de> wrote:
>>>>>>> On Thu, Jun 04, 2026 at 08:20:57PM -0700, H. Peter Anvin wrote:
>>>>>>>> I guess the question is why there is a "first" one.
>>>>>>> That happens when we do:
>>>>>>>
>>>>>>> x86_fsgsbase_load()
>>>>>>>
>>>>>>> 	loadseg(GS) -> load_gs_index() -> native_load_gs_index() ->
>>>>>>> 	  if (cpu_feature_enabled(X86_FEATURE_LKGS))
>>>>>>>                native_lkgs(selector);
>>>>>>>
>>>>>>> then back in x86_fsgsbase_load() we do:
>>>>>>>
>>>>>>> 	__wrgsbase_inactive(next->gsbase);
>>>>>>>
>>>>>>> which does
>>>>>>>
>>>>>>> 	wrmsrq(MSR_KERNEL_GS_BASE, gsbase);
>>>>>>>
>>>>>>> on FRED.
>>>>>>>
>>>>>>> But LKGS already wrote MSR_KERNEL_GS_BASE...
>>>>>>>
>>>>>>>> Logically the sequence should be LKGS first, if needed; then WRMSR(NS). LKGS
>>>>>>>> can be replaced with swapgs/mov gs/swapgs on legacy.
>>>>>>> Right.
>>>>>>>
>>>>>>> I think avoiding that second WRMSR(MSR_KERNEL_GS_BASE) should give some perf
>>>>>>> back...
>>>>>>>
>>>>>>> Although, I need to think how to make it pretty...
>>>>>>>
>>>>>> Should be doing wrmsrns...
>>>>> No, I think that second WRMSR* should not happen at all if we have executed
>>>>> LKGS which has already written MSR_KERNEL_GS_BASE, right?
>>>>>
>>>>>
>>>> You can't do that (at least not without further checks) if user space has WRGSBASE enabled, since you have no guarantee that the active GS.base is consistent with GS.selector.
>>>>
>>>> Since GS > 3 is pretty rare in 64-bit code at least, it doesn't seem to be a code path that needs to be that heavily optimized.
>>> I think you're slightly talking past each other, and I also made a
>>> mistake on the original reply, so let's try rephrasing it.
>>>
>>> LKGS only writes a zero-extended 32-bit value into KERN_GS_BASE.  This is
>>> because there's only 32 bits of information in the GDT/LDT.
>>>
>>> So the real write into KERN_GS_BASE is still needed.  Sorry - you can't
>>> optimise this away.  Also, I'm pretty sure amluto did some x86 selftests
>>> covering this last time the logic was rewritten.
>>>
>>>
>>> As to WRMSR vs WRMSRNS, yes Intel CPUs want this to be WRMSRNS.  AMD
>>> don't have WRMSRNS but this particular MSR index is architecturally not
>>> serialising anyway.
>>>
>>> ~Andrew
>> It's not just a matter of it being a 32-bit base, it might not even be the correct one even so.
>
>Indeed, GS might be an LDT selector.
>
>~Andrew

No, GS.base might have been loaded (with wrgsbase) after GS was loaded, so it could be *completely different*.


* Re: Save a WRMSR GS.base?
  2026-06-05 15:51                         ` H. Peter Anvin
@ 2026-06-05 17:17                           ` Borislav Petkov
  2026-06-08  6:46                             ` H. Peter Anvin
  0 siblings, 1 reply; 24+ messages in thread
From: Borislav Petkov @ 2026-06-05 17:17 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Andrew Cooper, x86-ML, LKML

On Fri, Jun 05, 2026 at 08:51:04AM -0700, H. Peter Anvin wrote:
> No, GS.base might have been loaded (with wrgsbase) after GS was loaded, so
> it could be *completely different*.

So you're basically saying, LKGS would load the IA32_KERNEL_GS_BASE which
belongs to the segment selector. This is what the pseudo code in the SDM says:

  GS.selector := SRC;
  GS.attributes := descriptor.attributes;
  IA32_KERNEL_GS_BASE := descriptor.base; // bits 63:32 cleared

Now, luserspace might've put something else in GS.base with WRGSBASE:

  GS.base := SRC;

So now, on context switch, we need to load IA32_KERNEL_GS_BASE with
next->gsbase which is the GS.base of the next task we're switching to.

And yes, GS.base is mapped to IA32_KERNEL_GS_BASE so yes, we must do that
update.

And yes, as Andrew points out, both LKGS and WRGSBASE do 32-bit writes only
so we need to do the full MSR write.

Ok, thanks guys, that makes sense.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

* Re: Save a WRMSR GS.base?
  2026-06-05 17:17                           ` Borislav Petkov
@ 2026-06-08  6:46                             ` H. Peter Anvin
  2026-06-08 14:38                               ` Borislav Petkov
  0 siblings, 1 reply; 24+ messages in thread
From: H. Peter Anvin @ 2026-06-08  6:46 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Andrew Cooper, x86-ML, LKML

On June 5, 2026 10:17:11 AM PDT, Borislav Petkov <bp@alien8.de> wrote:
>On Fri, Jun 05, 2026 at 08:51:04AM -0700, H. Peter Anvin wrote:
>> No, GS.base might have been loaded (with wrgsbase) after GS was loaded, so
>> it could be *completely different*.
>
>So you're basically saying, LKGS would load the IA32_KERNEL_GS_BASE which
>belongs to the segment selector. This is what the pseudo code in the SDM says:
>
>  GS.selector := SRC;
>  GS.attributes := descriptor.attributes;
>  IA32_KERNEL_GS_BASE := descriptor.base; // bits 63:32 cleared
>
>Now, luserspace might've put something else in GS.base with WRGSBASE:
>
>  GS.base := SRC;
>
>So now, on context switch, we need to load IA32_KERNEL_GS_BASE with
>next->gsbase which is the GS.base of the next task we're switching to.
>
>And yes, GS.base is mapped to IA32_KERNEL_GS_BASE so yes, we must do that
>update.
>
>And yes, as Andrew points out, both LKGS and WRGSBASE do 32-bit writes only
>so we need to do the full MSR write.
>
>Ok, thanks guys, that makes sense.
>

WRxSBASE does a 64-bit write, but for GS it would incorrectly address the kernel GS.base. For legacy it can be used under swapgs, but with FRED that is disallowed.

* Re: Save a WRMSR GS.base?
  2026-06-08  6:46                             ` H. Peter Anvin
@ 2026-06-08 14:38                               ` Borislav Petkov
  2026-06-08 17:30                                 ` H. Peter Anvin
  0 siblings, 1 reply; 24+ messages in thread
From: Borislav Petkov @ 2026-06-08 14:38 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Andrew Cooper, x86-ML, LKML

On Sun, Jun 07, 2026 at 11:46:34PM -0700, H. Peter Anvin wrote:
> WRxSBASE does a 64-bit write,

When REX.W.

The SDM text is confusing:

"If no REX.W prefix is used, the operand size is 32 bits; the upper 32 bits of
the source register are ignored and upper 32 bits of the base address (for FS
or GS) are cleared."

Does this last part that GS is cleared, refer to when WRGSBASE is used with no
REX.W or in general?

> but for GS it would incorrectly address the kernel GS.base.

What does that mean?

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

* Re: Save a WRMSR GS.base?
  2026-06-08 14:38                               ` Borislav Petkov
@ 2026-06-08 17:30                                 ` H. Peter Anvin
  2026-06-08 20:05                                   ` Borislav Petkov
  0 siblings, 1 reply; 24+ messages in thread
From: H. Peter Anvin @ 2026-06-08 17:30 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Andrew Cooper, x86-ML, LKML

On 2026-06-08 07:38, Borislav Petkov wrote:
> On Sun, Jun 07, 2026 at 11:46:34PM -0700, H. Peter Anvin wrote:
>> WRxSBASE does a 64-bit write,
> 
> When REX.W.
> 
> The SDM text is confusing:
> 
> "If no REX.W prefix is used, the operand size is 32 bits; the upper 32 bits of
> the source register are ignored and upper 32 bits of the base address (for FS
> or GS) are cleared."
> 
> Does this last part that GS is cleared, refer to when WRGSBASE is used with no
> REX.W or in general?
> 

Without REX.W (e.g. wrgsbase %eax as opposed to wrgsbase %rax).
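
A minimal sketch of the two forms, based only on the SDM sentence quoted
above plus the documented #GP on a non-canonical source.  The helper
names are invented, and the canonical check assumes 48-bit virtual
addresses (4-level paging), which is an assumption of this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumes 48-bit virtual addresses (4-level paging). */
static bool is_canonical48(uint64_t addr)
{
	uint64_t top = addr >> 47;	/* bits 63:47 must be all 0 or all 1 */

	return top == 0 || top == 0x1ffff;
}

/* wrgsbase %eax: 32-bit operand, upper 32 bits of the base are cleared. */
static uint64_t model_wrgsbase_r32(uint64_t src)
{
	return (uint32_t)src;
}

/* wrgsbase %rax: REX.W form, all 64 bits are written. */
static uint64_t model_wrgsbase_r64(uint64_t src)
{
	assert(is_canonical48(src));	/* stands in for the #GP */
	return src;
}
```

Both helpers return the resulting GS.base, so
model_wrgsbase_r32(0xffff800012345678) yields 0x12345678 while the
64-bit form returns the value unchanged.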

>> but for GS it would incorrectly address the kernel GS.base.
> 
> What does that mean?
> 

It means that in kernel mode, it is the currently active GS.base that is 
written (or read with rdgsbase), that is, the one that belongs to 
kernel, not the user space one in what is confusingly enough called 
MSR_KERNEL_GS_BASE.

In other words, not the one we want to task switch, *unless* you are in 
IDT mode and can surround it with SWAPGS.
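
As a rough illustration of the two slots, here is a toy C model (not
kernel code; the struct and the function names are hypothetical).  It
shows why a bare WRGSBASE in the kernel hits the wrong base, and how the
SWAPGS-bracketed sequence and the MSR write end up in the same state.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the two GS base slots of one CPU; names are hypothetical. */
struct cpu_gs {
	uint64_t active_base;		/* used by %gs: accesses right now */
	uint64_t kernel_gs_base_msr;	/* MSR_KERNEL_GS_BASE, the inactive slot */
};

static void model_swapgs(struct cpu_gs *c)
{
	uint64_t tmp = c->active_base;

	c->active_base = c->kernel_gs_base_msr;
	c->kernel_gs_base_msr = tmp;
}

/* WRGSBASE always hits the active slot. */
static void model_wrgsbase(struct cpu_gs *c, uint64_t v)
{
	c->active_base = v;
}

/* WRMSR to MSR_KERNEL_GS_BASE hits the inactive slot directly. */
static void model_wrmsr_kernel_gs_base(struct cpu_gs *c, uint64_t v)
{
	c->kernel_gs_base_msr = v;
}

/* IDT mode only: temporarily make the user slot the active one. */
static void set_user_gsbase_swapgs(struct cpu_gs *c, uint64_t user_base)
{
	uint64_t percpu = c->active_base;

	model_swapgs(c);
	model_wrgsbase(c, user_base);
	model_swapgs(c);
	assert(c->active_base == percpu);	/* kernel pointer is back */
}
```

Calling model_wrgsbase() alone overwrites active_base, i.e. the kernel's
own pointer.  set_user_gsbase_swapgs() and model_wrmsr_kernel_gs_base()
both leave active_base untouched and put the user value into the
inactive slot.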

	-hpa


* Re: Save a WRMSR GS.base?
  2026-06-08 17:30                                 ` H. Peter Anvin
@ 2026-06-08 20:05                                   ` Borislav Petkov
  2026-06-08 21:21                                     ` Borislav Petkov
  0 siblings, 1 reply; 24+ messages in thread
From: Borislav Petkov @ 2026-06-08 20:05 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Andrew Cooper, x86-ML, LKML

On Mon, Jun 08, 2026 at 10:30:36AM -0700, H. Peter Anvin wrote:
> Without REX.W (e.g. wrgsbase %eax as opposed to wrgsbase %rax).

I see.

> It means that in kernel mode, it is the currently active GS.base that is
> written (or read with rdgsbase), that is, the one that belongs to kernel,
> not the user space one in what is confusingly enough called
> MSR_KERNEL_GS_BASE.
> 
> In other words, not the one we want to task switch, *unless* you are in IDT
> mode and can surround it with SWAPGS.

Uff, what a mess this stuff is. Brain is in a knot.

I think this is begging to be written down somewhere. Lemme point AI to it and
see what it would generate.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

* Re: Save a WRMSR GS.base?
  2026-06-08 20:05                                   ` Borislav Petkov
@ 2026-06-08 21:21                                     ` Borislav Petkov
  2026-06-08 21:52                                       ` H. Peter Anvin
  2026-06-08 22:58                                       ` Andrew Cooper
  0 siblings, 2 replies; 24+ messages in thread
From: Borislav Petkov @ 2026-06-08 21:21 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Andrew Cooper, x86-ML, LKML

On Mon, Jun 08, 2026 at 01:05:28PM -0700, Borislav Petkov wrote:
> I think this is begging to be written down somewhere. Lemme point AI to it and
> see what it would generate.

Something like the below...  I am thinking of sticking that somewhere under
Documentation... oh look: Documentation/arch/x86/x86_64/fsgs.rst. Looks like
there was already need to document this stuff which is clearly not really
transparent. What's there is covering the userspace side more tho.

Anyway, the below is a summary of our thread with AI, it ain't half-bad and
I think we should write it down for future reference.

Thx.

---
GS handling on context switches

SWAPGS

Swaps the base-address value in MSR_KERNEL_GS_BASE with the active GS.base in
the hidden portion of the GS selector register.

MOV <segment selector>, GS

(legacy path, non-FRED) Loads the GS selector and fetches the descriptor
attributes and base from the GDT/LDT. Writes a 32-bit base into the active
GS.base. Does not touch MSR_KERNEL_GS_BASE. Useless for task switching the
user GS base on its own without surrounding SWAPGS calls.

LKGS <selector>

(FRED path, replaces MOV GS) Like MOV GS in that it loads the selector and
descriptor attributes, but it redirects the base write — instead of updating
the active GS.base, it writes the descriptor base into IA32_KERNEL_GS_BASE
(i.e. MSR_KERNEL_GS_BASE).

Critical caveat: it only writes a zero-extended 32-bit value, because GDT/LDT
descriptors only encode 32-bit bases. This means it cannot correctly represent
a full 64-bit user-space GS base (e.g. a TLS pointer), so a full 64-bit WRMSR
is still required afterwards.

WRGSBASE <reg>

(with REX.W) Writes a full 64-bit value directly into the currently active
GS.base as FS.base and GS.base in 64-bit mode are expanded to 64-bit to cover
the full address space.

The problem in kernel context: the currently active GS.base belongs to the
kernel, not the user task. So using this during context switching would
corrupt the kernel's own GS.base, unless surrounded by SWAPGS (only safe in
IDT mode). Without REX.W, the upper 32 bits of the base are cleared instead.

WRMSR MSR_KERNEL_GS_BASE / WRMSRNS MSR_KERNEL_GS_BASE

Writes a full 64-bit value into MSR_KERNEL_GS_BASE, which holds the inactive
(user-space) GS base — the one that gets swapped into the active GS.base on
SWAPGS. This is the only instruction that can correctly set a 64-bit user GS
base during a context switch from kernel mode. WRMSRNS is the non-serializing
variant, which is preferable here since this MSR is architecturally
non-serializing anyway.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

* Re: Save a WRMSR GS.base?
  2026-06-08 21:21                                     ` Borislav Petkov
@ 2026-06-08 21:52                                       ` H. Peter Anvin
  2026-06-08 22:58                                       ` Andrew Cooper
  1 sibling, 0 replies; 24+ messages in thread
From: H. Peter Anvin @ 2026-06-08 21:52 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Andrew Cooper, x86-ML, LKML

On June 8, 2026 2:21:00 PM PDT, Borislav Petkov <bp@alien8.de> wrote:
>On Mon, Jun 08, 2026 at 01:05:28PM -0700, Borislav Petkov wrote:
>> I think this is begging to be written down somewhere. Lemme point AI to it and
>> see what it would generate.
>
>Something like the below...  I am thinking of sticking that somewhere under
>Documentation... oh look: Documentation/arch/x86/x86_64/fsgs.rst. Looks like
>there was already need to document this stuff which is clearly not really
>transparent. What's there is covering the userspace side more tho.
>
>Anyway, the below is a summary of our thread with AI, it ain't half-bad and
>I think we should write it down for future reference.
>
>Thx.
>
>---
>GS handling on context switches
>
>SWAPGS
>
>Swaps the base-address value in MSR_KERNEL_GS_BASE with the active GS.base in
>the hidden portion of the GS selector register.
>
>MOV <segment selector>, GS
>
>(legacy path, non-FRED) Loads the GS selector and fetches the descriptor
>attributes and base from the GDT/LDT. Writes a 32-bit base into the active
>GS.base. Does not touch MSR_KERNEL_GS_BASE. Useless for task switching the
>user GS base on its own without surrounding SWAPGS calls.
>
>LKGS <selector>
>
>(FRED path, replaces MOV GS) Like MOV GS in that it loads the selector and
>descriptor attributes, but it redirects the base write — instead of updating
>the active GS.base, it writes the descriptor base into IA32_KERNEL_GS_BASE
>(i.e. MSR_KERNEL_GS_BASE).
>
>Critical caveat: it only writes a zero-extended 32-bit value, because GDT/LDT
>descriptors only encode 32-bit bases. This means it cannot correctly represent
>a full 64-bit user-space GS base (e.g. a TLS pointer), so a full 64-bit WRMSR
>is still required afterwards.
>
>WRGSBASE <reg>
>
>(with REX.W) Writes a full 64-bit value directly into the currently active
>GS.base as FS.base and GS.base in 64-bit mode are expanded to 64-bit to cover
>the full address space.
>
>The problem in kernel context: the currently active GS.base belongs to the
>kernel, not the user task. So using this during context switching would
>corrupt the kernel's own GS.base, unless surrounded by SWAPGS (only safe in
>IDT mode). Without REX.W, the upper 32 bits of the base are cleared instead.
>
>WRMSR MSR_KERNEL_GS_BASE / WRMSRNS MSR_KERNEL_GS_BASE
>
>Writes a full 64-bit value into MSR_KERNEL_GS_BASE, which holds the inactive
>(user-space) GS base — the one that gets swapped into the active GS.base on
>SWAPGS. This is the only instruction that can correctly set a 64-bit user GS
>base during a context switch from kernel mode. WRMSRNS is the non-serializing
>variant, which is preferable here since this MSR is architecturally
>non-serializing anyway.
>

Looks good to me. 

You might want to add that, despite its name, while running in the kernel (which is the only time MSRs are accessible) KERNEL_GS_BASE actually contains the *user* GS base.

The naming was confusing to begin with, and is even more so with FRED.

Also, MOV GS/LKGS is the only way to update the *other* fields of the GS descriptor (the limit and the flags).

* Re: Save a WRMSR GS.base?
  2026-06-08 21:21                                     ` Borislav Petkov
  2026-06-08 21:52                                       ` H. Peter Anvin
@ 2026-06-08 22:58                                       ` Andrew Cooper
  2026-06-18  1:09                                         ` Borislav Petkov
  1 sibling, 1 reply; 24+ messages in thread
From: Andrew Cooper @ 2026-06-08 22:58 UTC (permalink / raw)
  To: Borislav Petkov, H. Peter Anvin; +Cc: Andrew Cooper, x86-ML, LKML

On 08/06/2026 10:21 pm, Borislav Petkov wrote:
> On Mon, Jun 08, 2026 at 01:05:28PM -0700, Borislav Petkov wrote:
>> I think this is begging to be written down somewhere. Lemme point AI to it and
>> see what it would generate.
> Something like the below...  I am thinking of sticking that somewhere under
> Documentation... oh look: Documentation/arch/x86/x86_64/fsgs.rst. Looks like
> there was already need to document this stuff which is clearly not really
> transparent. What's there is covering the userspace side more tho.
>
> Anyway, the below is a summary of our thread with AI, it ain't half-bad and
> I think we should write it down for future reference.

This contains some mechanics, but is lacking on the background.

> ---
> GS handling on context switches

History
-------

In 32bit, data segments need reloading on entry to the kernel, and
restoring on exit to userspace.  Only the segment selector is necessary,
as all segment data resides in the GDT/LDT.  Bases in the GDT/LDT are 32
bits wide.  The segment selector values are user-chosen, and effectively
arbitrary.

The 32bit mechanism is slow, so in 64bit, segments were made mostly flat
so as to not need reloading on entry/exit.  The FS and GS segment bases
were extended to 64 bits and became accessible via MSRs, and a separate
GS_SHADOW value was introduced.  The SWAPGS instruction swaps GS_BASE and
GS_SHADOW, as the only action needed on entry/exit.

64bit userspace needed to make the arch_prctl(ARCH_SET_GS) syscall to get
a base value wider than 32 bits, and a side effect of this syscall was to
zero the GS selector.

Then the FSGSBASE instructions came along, and userspace could finally
choose an arbitrary base address not previously registered via syscall. 
Linux's ABI promises to preserve both the selector value and the full
base, even when they are disconnected.


Mechanisms
----------

> SWAPGS
>
> Swaps the base-address value in MSR_KERNEL_GS_BASE with the active GS.base in
> the hidden portion of the GS selector register.

"Swaps the value in "

>
> MOV <segment selector>, GS
>
> (legacy path, non-FRED) Loads the GS selector and fetches the descriptor
> attributes and base from the GDT/LDT.

Loads the GS segment from the GDT/LDT, including selector, attributes,
limit and base.  The 32 bits of base from the GDT/LDT are zero-extended
into the 64bit base register.

>  Writes a 32-bit base into the active
> GS.base. Does not touch MSR_KERNEL_GS_BASE. Useless for task switching the
> user GS base on its own without surrounding SWAPGS calls.

I wouldn't quite say useless.  There are ways without a SWAPGS, although
they're less neat.

The salient point is that MOV GS clobbers the active kernel per-cpu
pointer, so must be done with custom error handling/recovery so
#GP/#PF/NMI/#MC can get back to the correct per-cpu pointer.

>
> LKGS <selector>
>
> (FRED path, replaces MOV GS) Like MOV GS in that it loads the selector and
> descriptor attributes, but it redirects the base write — instead of updating
> the active GS.base, it writes the descriptor base into IA32_KERNEL_GS_BASE
> (i.e. MSR_KERNEL_GS_BASE).
>
> Critical caveat: it only writes a zero-extended 32-bit value, because GDT/LDT
> descriptors only encode 32-bit bases. This means it cannot correctly represent
> a full 64-bit user-space GS base (e.g. a TLS pointer), so a full 64-bit WRMSR
> is still required afterwards.

"Exactly like MOV GS, except that the 32bit zero-extended base value is
written into GS_SHADOW instead of the active GS.base."

"This instruction ensures that the kernel's per-cpu pointer stays good,
and does not need custom error handling."
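
A toy C model of that difference, purely illustrative: the struct, the
helper names and the idea of passing the descriptor base in as an
argument are all assumptions of this sketch, not how the hardware or the
kernel is structured.

```c
#include <assert.h>
#include <stdint.h>

struct cpu_gs {
	uint16_t selector;
	uint64_t active_base;	/* kernel per-cpu pointer while in the kernel */
	uint64_t shadow_base;	/* GS_SHADOW, i.e. MSR_KERNEL_GS_BASE */
};

/* MOV GS: the zero-extended descriptor base lands in the active base. */
static void model_mov_gs(struct cpu_gs *c, uint16_t sel, uint32_t desc_base)
{
	c->selector = sel;
	c->active_base = desc_base;
}

/* LKGS: same load, but the base lands in GS_SHADOW instead. */
static void model_lkgs(struct cpu_gs *c, uint16_t sel, uint32_t desc_base)
{
	c->selector = sel;
	c->shadow_base = desc_base;
}

/* FRED-style switch-in of a task: selector via LKGS, then the full base. */
static void model_switch_in_fred(struct cpu_gs *c, uint16_t sel,
				 uint32_t desc_base, uint64_t task_gsbase)
{
	uint64_t percpu = c->active_base;

	model_lkgs(c, sel, desc_base);
	c->shadow_base = task_gsbase;	/* stands in for the WRMSR(NS) */
	assert(c->active_base == percpu);	/* per-cpu pointer stays good */
}
```

model_mov_gs() replaces the per-cpu pointer with the 32-bit descriptor
base, which is why the legacy path needs recovery logic.  model_lkgs()
leaves it alone, but still only delivers 32 bits of base, hence the
separate 64-bit write in model_switch_in_fred().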

> WRGSBASE <reg>
>
> (with REX.W) Writes a full 64-bit value directly into the currently active
> GS.base as FS.base and GS.base in 64-bit mode are expanded to 64-bit to cover
> the full address space.
>
> The problem in kernel context: the currently active GS.base belongs to the
> kernel, not the user task. So using this during context switching would
> corrupt the kernel's own GS.base, unless surrounded by SWAPGS (only safe in
> IDT mode). Without REX.W, the upper 32 bits of the base are cleared instead.

The REX.W aspect isn't interesting from a context switching point of
view.  It behaves like most other instructions in this regard, even if
it's not interesting to use the 32bit form of WRGSBASE.

>
> WRMSR MSR_KERNEL_GS_BASE / WRMSRNS MSR_KERNEL_GS_BASE
>
> Writes a full 64-bit value into MSR_KERNEL_GS_BASE, which holds the inactive
> (user-space) GS base — the one that gets swapped into the active GS.base on
> SWAPGS. This is the only instruction that can correctly set a 64-bit user GS
> base during a context switch from kernel mode. WRMSRNS is the non-serializing
> variant, which is preferable here since this MSR is architecturally
> non-serializing anyway.

This is where the AI didn't get it right.

AMD ~silently made FS_BASE/GS_BASE/GS_SHADOW become
non-architecturally-serialising, enumerated by
0x80000021.eax[1].FS_GS_NS.  I think this was in Zen3.

Intel still has them as architecturally serialising, requiring software
to opt in to non-serialising behaviour using WRMSRNS.
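
For reference, a userspace sketch of how the two enumerations could be
probed with the compiler's <cpuid.h> helpers.  The AMD bit is the one
named above; the WRMSRNS bit position (CPUID.(EAX=7,ECX=1):EAX[19]) is
quoted from memory of the SDM and should be double-checked there.  All
function names are hypothetical.

```c
#include <stdbool.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Decode CPUID Fn8000_0021 EAX[1] (FS/GS base MSR writes not serialising). */
static bool fs_gs_ns_from_eax(unsigned int eax)
{
	return (eax >> 1) & 1;
}

/* Decode CPUID.(EAX=7,ECX=1):EAX[19] (WRMSRNS); bit position is assumed. */
static bool wrmsrns_from_eax(unsigned int eax)
{
	return (eax >> 19) & 1;
}

static bool cpu_has_fs_gs_ns(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	/* __get_cpuid() returns 0 when the leaf is above the CPU's maximum. */
	if (!__get_cpuid(0x80000021, &eax, &ebx, &ecx, &edx))
		return false;
	return fs_gs_ns_from_eax(eax);
#else
	return false;
#endif
}

static bool cpu_has_wrmsrns(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	/* Subleaf 0 reports the highest valid subleaf of leaf 7 in EAX. */
	if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx) || eax < 1)
		return false;
	if (!__get_cpuid_count(7, 1, &eax, &ebx, &ecx, &edx))
		return false;
	return wrmsrns_from_eax(eax);
#else
	return false;
#endif
}
```

The decode helpers are split out so the bit tests can be checked without
depending on the CPU the example happens to run on.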

~Andrew

* Re: Save a WRMSR GS.base?
  2026-06-08 22:58                                       ` Andrew Cooper
@ 2026-06-18  1:09                                         ` Borislav Petkov
  2026-06-18 10:22                                           ` David Laight
  0 siblings, 1 reply; 24+ messages in thread
From: Borislav Petkov @ 2026-06-18  1:09 UTC (permalink / raw)
  To: Andrew Cooper, H. Peter Anvin; +Cc: x86-ML, LKML

Ok,

I think I incorporated them all:

---
From: "Borislav Petkov (AMD)" <bp@alien8.de>
Date: Mon, 8 Jun 2026 18:59:14 -0700
Subject: [PATCH] Documentation/x86: Document the intricacies of GS context
 switching

Summarized from the thread at Link by AI, with additions and
improvements by H. Peter Anvin and Andrew Cooper.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Assisted-by: Claude Code:claude-sonnet-4-6
Link: https://lore.kernel.org/all/20260604015303.GEaiDafyuU0bwP4Y05@fat_crate.local
---
 Documentation/arch/x86/x86_64/fsgs.rst | 97 ++++++++++++++++++++++++++
 1 file changed, 97 insertions(+)

diff --git a/Documentation/arch/x86/x86_64/fsgs.rst b/Documentation/arch/x86/x86_64/fsgs.rst
index 6bda4d16d3f7..fa3d4b065423 100644
--- a/Documentation/arch/x86/x86_64/fsgs.rst
+++ b/Documentation/arch/x86/x86_64/fsgs.rst
@@ -197,3 +197,100 @@ be used for FS/GS based addressing mode::
 
 	mov %reg, %fs:offset
 	mov %reg, %gs:offset
+
+
+Complexities with GS handling on context switches
+=================================================
+
+History
+-------
+
+In 32-bit, data segments need reloading on entry to the kernel, and restoring
+on exit to userspace.  Only the segment selector is necessary, as all segment
+data resides in the GDT/LDT.  Bases in the GDT/LDT are 32 bits wide.  The
+segment selector values are user-chosen, and effectively arbitrary.
+
+The 32-bit mechanism is slow, so in 64-bit, segments were made mostly flat so
+as to not need reloading on entry/exit.  FS and GS segment bases were extended
+to 64 bits, and became accessible via MSRs. Also, a separate GS_SHADOW value
+was introduced.  The SWAPGS instruction swaps GS_BASE and GS_SHADOW, as the
+only action needed on entry/exit.
+
+64-bit userspace needed to make the arch_prctl(ARCH_SET_GS) syscall to get a
+base value wider than 32 bits, and a side effect of this syscall was to zero
+the GS selector.
+
+Then the FSGSBASE instructions came along, and userspace could finally choose
+an arbitrary base address not previously registered via the syscall.  Linux's
+ABI promises to preserve both the selector value and the full base, even when
+they are disconnected.
+
+When looking at the hardware capabilities, there are multiple x86 instructions
+which modify GS:
+
+* SWAPGS
+
+Swaps the value in MSR_KERNEL_GS_BASE with the active GS.base in the hidden
+portion of the GS selector register.
+
+* MOV <segment selector>, GS
+
+(legacy path, non-FRED) Loads GS with the selector specified in <segment
+selector> and fetches the GS descriptor attributes, limit and base from the
+GDT/LDT.  Writes a 32-bit base into the active GS.base, zero-extending it into
+the 64-bit base register. It does not touch MSR_KERNEL_GS_BASE.
+
+The problem with this is that because it writes the *current* GS.base, it
+corrupts the active kernel per-CPU pointer (in %gs).
+
+* LKGS <selector>
+
+(FRED path, replaces MOV GS) Like MOV GS in that it loads the selector and
+descriptor attributes, but it redirects the base write — instead of updating
+the active GS.base, it writes the descriptor base into IA32_KERNEL_GS_BASE
+(i.e. MSR_KERNEL_GS_BASE).
+
+Critical caveat: it only writes a zero-extended 32-bit value, because GDT/LDT
+descriptors only encode 32-bit bases. This means it cannot correctly represent
+a full 64-bit user-space GS base (e.g. a TLS pointer), so a full 64-bit WRMSR
+is still required afterwards.
+
+This instruction ensures that the kernel's per-CPU pointer stays good, and
+does not need custom error handling.
+
+MOV GS and LKGS are the only way to update the other fields of the GS
+descriptor.
+
+* WRGSBASE <reg>
+
+In 64-bit mode, it writes a full 64-bit value directly into the currently
+active GS.base as FS.base and GS.base in 64-bit mode are expanded to 64-bit to
+cover the full address space.
+
+The problem in kernel context: the currently active GS.base belongs to the
+kernel, not the user task. So using this during context switching would
+corrupt the kernel's own GS.base, unless surrounded by SWAPGS (only safe in
+IDT mode).
+
+Without REX.W (32-bit operand), the upper 32 bits of the base are cleared.
+
+* WRMSR MSR_KERNEL_GS_BASE / WRMSRNS MSR_KERNEL_GS_BASE
+
+Writes a full 64-bit value into MSR_KERNEL_GS_BASE, which holds the inactive
+(user-space) GS.base — the one that gets swapped into the active GS.base on
+SWAPGS. This is the only instruction that can correctly set a 64-bit user
+GS.base during a context switch from kernel mode.
+
+The non-serializing nature of the write was accomplished by the two vendors
+differently. AMD, starting with Zen4, made it the default through:
+
+  CPUID_Fn80000021_EAX [Extended Feature 2 EAX]
+  (Core::X86::Cpuid::FeatureExt2Eax)[1], FsGsKernelGsBaseNonSerializing which is
+  fixed to 1
+
+and Intel through the WRMSRNS instruction which is the non-serializing
+variant.
+
+Btw, while running in kernel mode, MSR_KERNEL_GS_BASE contains actually the
+*user* GS.base. Thus, the naming can be confusing. Unless one thinks of it as
+the kernel's access to GS.base as MSRs are accessible only in CPL0.
-- 
2.53.0


-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

* Re: Save a WRMSR GS.base?
  2026-06-18  1:09                                         ` Borislav Petkov
@ 2026-06-18 10:22                                           ` David Laight
  0 siblings, 0 replies; 24+ messages in thread
From: David Laight @ 2026-06-18 10:22 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Andrew Cooper, H. Peter Anvin, x86-ML, LKML

On Wed, 17 Jun 2026 18:09:02 -0700
Borislav Petkov <bp@alien8.de> wrote:

> Ok,
> 
> I think I incorporated them all:
...
> +Btw, while running in kernel mode, MSR_KERNEL_GS_BASE contains actually the
> +*user* GS.base. Thus, the naming can be confusing. Unless one thinks of it as
> +the kernel's access to GS.base as MSRs are accessible only in CPL0.

That last sentence doesn't read right. Maybe:

The naming of MSR_KERNEL_GS_BASE is rather confusing.
It can only be accessed in kernel mode where it normally contains the
USER GS.base.
The only time it contains the KERNEL GS.base is on system call/interrupt entry
prior to swapgs being executed (and late in the return to user paths).

As an aside I think a 32bit program can detect hardware interrupts.
If %gs/%fs is loaded from an LDT and then the LDT entry changed (eg
a different limit) then the new limit will be loaded by the ISR
return path.
I seem to remember deciding that it was impossible to actually restore
the real register value.

	David

end of thread (newest: ~2026-06-18 10:23 UTC)

Thread overview: 24+ messages
2026-06-04  1:53 Save a WRMSR GS.base? Borislav Petkov
2026-06-04  9:17 ` Andrew Cooper
2026-06-05  2:24   ` Borislav Petkov
2026-06-05  2:36     ` H. Peter Anvin
2026-06-05  2:54       ` Borislav Petkov
2026-06-05  3:20         ` H. Peter Anvin
2026-06-05  4:26           ` Borislav Petkov
2026-06-05  4:30             ` H. Peter Anvin
2026-06-05  4:38               ` Borislav Petkov
2026-06-05  5:05                 ` H. Peter Anvin
2026-06-05  9:13                   ` Andrew Cooper
2026-06-05 15:13                     ` H. Peter Anvin
2026-06-05 15:16                       ` Andrew Cooper
2026-06-05 15:51                         ` H. Peter Anvin
2026-06-05 17:17                           ` Borislav Petkov
2026-06-08  6:46                             ` H. Peter Anvin
2026-06-08 14:38                               ` Borislav Petkov
2026-06-08 17:30                                 ` H. Peter Anvin
2026-06-08 20:05                                   ` Borislav Petkov
2026-06-08 21:21                                     ` Borislav Petkov
2026-06-08 21:52                                       ` H. Peter Anvin
2026-06-08 22:58                                       ` Andrew Cooper
2026-06-18  1:09                                         ` Borislav Petkov
2026-06-18 10:22                                           ` David Laight
