* [PATCH v2 0/2] x86/bugs: RSB tweaks
@ 2024-11-21 20:07 Josh Poimboeuf
2024-11-21 20:07 ` [PATCH v2 1/2] x86/bugs: Don't fill RSB on VMEXIT with eIBRS+retpoline Josh Poimboeuf
` (2 more replies)
0 siblings, 3 replies; 31+ messages in thread
From: Josh Poimboeuf @ 2024-11-21 20:07 UTC (permalink / raw)
To: x86
Cc: linux-kernel, amit, kvm, amit.shah, thomas.lendacky, bp, tglx,
peterz, pawan.kumar.gupta, corbet, mingo, dave.hansen, hpa,
seanjc, pbonzini, daniel.sneddon, kai.huang, sandipan.das,
boris.ostrovsky, Babu.Moger, david.kaplan, dwmw, andrew.cooper3
v2:
- remove stray whitespace in printk
- clarify patch 2 comment message about prev/next tasks
- move error path to default switch case in spectre_v2_mitigate_rsb()
- add Reviewed-bys
Some RSB filling tweaks as discussed in the following thread:
[RFC PATCH v2 0/3] Add support for the ERAPS feature
https://lore.kernel.org/20241111163913.36139-1-amit@kernel.org
Josh Poimboeuf (2):
x86/bugs: Don't fill RSB on VMEXIT with eIBRS+retpoline
x86/bugs: Don't fill RSB on context switch with eIBRS
arch/x86/kernel/cpu/bugs.c | 115 +++++++++++++++----------------------
arch/x86/mm/tlb.c | 2 +-
2 files changed, 48 insertions(+), 69 deletions(-)
--
2.47.0
^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v2 1/2] x86/bugs: Don't fill RSB on VMEXIT with eIBRS+retpoline
  2024-11-21 20:07 [PATCH v2 0/2] x86/bugs: RSB tweaks Josh Poimboeuf
@ 2024-11-21 20:07 ` Josh Poimboeuf
  2024-11-30 15:31   ` Borislav Petkov
  2024-11-21 20:07 ` [PATCH v2 2/2] x86/bugs: Don't fill RSB on context switch with eIBRS Josh Poimboeuf
  2024-11-28 13:28 ` [RFC PATCH v3 0/2] Add support for the ERAPS feature Amit Shah
  2 siblings, 1 reply; 31+ messages in thread

From: Josh Poimboeuf @ 2024-11-21 20:07 UTC (permalink / raw)
To: x86
Cc: linux-kernel, amit, kvm, amit.shah, thomas.lendacky, bp, tglx,
    peterz, pawan.kumar.gupta, corbet, mingo, dave.hansen, hpa,
    seanjc, pbonzini, daniel.sneddon, kai.huang, sandipan.das,
    boris.ostrovsky, Babu.Moger, david.kaplan, dwmw, andrew.cooper3

eIBRS protects against RSB underflow/poisoning attacks.  Adding
retpoline to the mix doesn't change that.  Retpoline has a balanced
CALL/RET anyway.

So the current full RSB filling on VMEXIT with eIBRS+retpoline is
overkill.  Disable it (or do the VMEXIT_LITE mitigation if needed).

Suggested-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Reviewed-by: Amit Shah <amit.shah@amd.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
---
 arch/x86/kernel/cpu/bugs.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 47a01d4028f6..68bed17f0980 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1605,20 +1605,20 @@ static void __init spectre_v2_determine_rsb_fill_type_at_vmexit(enum spectre_v2_
 	case SPECTRE_V2_NONE:
 		return;
 
-	case SPECTRE_V2_EIBRS_LFENCE:
 	case SPECTRE_V2_EIBRS:
+	case SPECTRE_V2_EIBRS_LFENCE:
+	case SPECTRE_V2_EIBRS_RETPOLINE:
 		if (boot_cpu_has_bug(X86_BUG_EIBRS_PBRSB)) {
-			setup_force_cpu_cap(X86_FEATURE_RSB_VMEXIT_LITE);
 			pr_info("Spectre v2 / PBRSB-eIBRS: Retire a single CALL on VMEXIT\n");
+			setup_force_cpu_cap(X86_FEATURE_RSB_VMEXIT_LITE);
 		}
 		return;
 
-	case SPECTRE_V2_EIBRS_RETPOLINE:
 	case SPECTRE_V2_RETPOLINE:
 	case SPECTRE_V2_LFENCE:
 	case SPECTRE_V2_IBRS:
-		setup_force_cpu_cap(X86_FEATURE_RSB_VMEXIT);
 		pr_info("Spectre v2 / SpectreRSB : Filling RSB on VMEXIT\n");
+		setup_force_cpu_cap(X86_FEATURE_RSB_VMEXIT);
 		return;
 	}
-- 
2.47.0

^ permalink raw reply related	[flat|nested] 31+ messages in thread
* Re: [PATCH v2 1/2] x86/bugs: Don't fill RSB on VMEXIT with eIBRS+retpoline
  2024-11-21 20:07 ` [PATCH v2 1/2] x86/bugs: Don't fill RSB on VMEXIT with eIBRS+retpoline Josh Poimboeuf
@ 2024-11-30 15:31   ` Borislav Petkov
  2024-12-02 11:15     ` Shah, Amit
  ` (2 more replies)
  0 siblings, 3 replies; 31+ messages in thread

From: Borislav Petkov @ 2024-11-30 15:31 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: x86, linux-kernel, amit, kvm, amit.shah, thomas.lendacky, tglx,
    peterz, pawan.kumar.gupta, corbet, mingo, dave.hansen, hpa,
    seanjc, pbonzini, daniel.sneddon, kai.huang, sandipan.das,
    boris.ostrovsky, Babu.Moger, david.kaplan, dwmw, andrew.cooper3

On Thu, Nov 21, 2024 at 12:07:18PM -0800, Josh Poimboeuf wrote:
> eIBRS protects against RSB underflow/poisoning attacks.  Adding
> retpoline to the mix doesn't change that.  Retpoline has a balanced
> CALL/RET anyway.

This is exactly why I've been wanting for us to document our mitigations for
a long time now.

A bunch of statements above for which I can only rhyme up they're correct if
I search for the vendor docs. On the AMD side I've found:

"When Automatic IBRS is enabled, the internal return address stack used for
return address predictions is cleared on VMEXIT."

APM v2, p. 58/119

For the Intel side I'm not that lucky. There's something here:

https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/branch-history-injection.html

Or is it this one:

https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/speculative-execution-side-channel-mitigations.html#inpage-nav-1-3-undefined

Or is this written down explicitly in some other doc?

In any case, I'd like for us to have a piece of text accompanying such
patches, perhaps here:

  Documentation/admin-guide/hw-vuln/spectre.rst

which quotes the vendor docs.

The current thread(s) on the matter already show how confused we all are
by all the possible mitigation options, uarch speculative dances etc etc.

> So the current full RSB filling on VMEXIT with eIBRS+retpoline is
> overkill.  Disable it (or do the VMEXIT_LITE mitigation if needed).
>
> Suggested-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> Reviewed-by: Amit Shah <amit.shah@amd.com>
> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
> ---
>  arch/x86/kernel/cpu/bugs.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
> index 47a01d4028f6..68bed17f0980 100644
> --- a/arch/x86/kernel/cpu/bugs.c
> +++ b/arch/x86/kernel/cpu/bugs.c
> @@ -1605,20 +1605,20 @@ static void __init spectre_v2_determine_rsb_fill_type_at_vmexit(enum spectre_v2_
>  	case SPECTRE_V2_NONE:
>  		return;
>  
> -	case SPECTRE_V2_EIBRS_LFENCE:
>  	case SPECTRE_V2_EIBRS:
> +	case SPECTRE_V2_EIBRS_LFENCE:
> +	case SPECTRE_V2_EIBRS_RETPOLINE:
>  		if (boot_cpu_has_bug(X86_BUG_EIBRS_PBRSB)) {
> -			setup_force_cpu_cap(X86_FEATURE_RSB_VMEXIT_LITE);
>  			pr_info("Spectre v2 / PBRSB-eIBRS: Retire a single CALL on VMEXIT\n");
> +			setup_force_cpu_cap(X86_FEATURE_RSB_VMEXIT_LITE);

Why are you swapping those?

>  		}
>  		return;
>  
> -	case SPECTRE_V2_EIBRS_RETPOLINE:
>  	case SPECTRE_V2_RETPOLINE:
>  	case SPECTRE_V2_LFENCE:
>  	case SPECTRE_V2_IBRS:
> -		setup_force_cpu_cap(X86_FEATURE_RSB_VMEXIT);
>  		pr_info("Spectre v2 / SpectreRSB : Filling RSB on VMEXIT\n");
> +		setup_force_cpu_cap(X86_FEATURE_RSB_VMEXIT);

Ditto?

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Re: [PATCH v2 1/2] x86/bugs: Don't fill RSB on VMEXIT with eIBRS+retpoline
  2024-11-30 15:31   ` Borislav Petkov
@ 2024-12-02 11:15     ` Shah, Amit
  2024-12-02 12:19       ` Borislav Petkov
  2024-12-02 23:35     ` Pawan Gupta
  2024-12-05 23:13     ` Josh Poimboeuf
  2 siblings, 1 reply; 31+ messages in thread

From: Shah, Amit @ 2024-12-02 11:15 UTC (permalink / raw)
To: jpoimboe@kernel.org, bp@alien8.de
Cc: corbet@lwn.net, pawan.kumar.gupta@linux.intel.com,
    kai.huang@intel.com, kvm@vger.kernel.org, andrew.cooper3@citrix.com,
    dave.hansen@linux.intel.com, Lendacky, Thomas,
    daniel.sneddon@linux.intel.com, boris.ostrovsky@oracle.com,
    linux-kernel@vger.kernel.org, seanjc@google.com, mingo@redhat.com,
    pbonzini@redhat.com, tglx@linutronix.de, Moger, Babu,
    Das1, Sandipan, dwmw@amazon.co.uk, amit@kernel.org, hpa@zytor.com,
    peterz@infradead.org, Kaplan, David, x86@kernel.org

On Sat, 2024-11-30 at 16:31 +0100, Borislav Petkov wrote:
> On Thu, Nov 21, 2024 at 12:07:18PM -0800, Josh Poimboeuf wrote:
> > eIBRS protects against RSB underflow/poisoning attacks.  Adding
> > retpoline to the mix doesn't change that.  Retpoline has a balanced
> > CALL/RET anyway.
>
> This is exactly why I've been wanting for us to document our
> mitigations for a long time now.

FWIW, I'd say we have fairly decent documentation with commit messages
+ code + comments in code.

> A bunch of statements above for which I can only rhyme up they're
> correct if I search for the vendor docs. On the AMD side I've found:

[...]

> In any case, I'd like for us to have a piece of text accompanying
> such patches, perhaps here:
>
>   Documentation/admin-guide/hw-vuln/spectre.rst
>
> which quotes the vendor docs.

If you're saying that we need *additional* documentation that
replicates hw manuals and the knowledge we have in our commit + code +
comments, that I agree with.

I got the feeling earlier, though, that you were saying we need that
documentation *instead of* the current comments-within-code, and that
didn't sound like the right thing to do.

> The current thread(s) on the matter already show how confused we all
> are by all the possible mitigation options, uarch speculative dances
> etc etc.

... and the code flows and looks much better after this commit (for
SpectreRSB at least), which is a huge plus.

It's important to note that at some point in the past we got
vulnerabilities and hw features/quirks one after the other, and we kept
tacking mitigation code on top of the existing one -- because that's
what you need to do during an embargo period.  Now's the moment when
we're consolidating it all while taking stock of the overall situation.

This looks like a sound way to go about taking a higher-level view and
simplifying the code.  I doubt we'd want to do things any other way;
and much less doing this kind of an exercise during an embargo.  Moving
comments out of the code will only add to frustration during embargo
periods.

Just my 2c

		Amit

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Re: [PATCH v2 1/2] x86/bugs: Don't fill RSB on VMEXIT with eIBRS+retpoline
  2024-12-02 11:15     ` Shah, Amit
@ 2024-12-02 12:19       ` Borislav Petkov
  0 siblings, 0 replies; 31+ messages in thread

From: Borislav Petkov @ 2024-12-02 12:19 UTC (permalink / raw)
To: Shah, Amit
Cc: jpoimboe@kernel.org, corbet@lwn.net,
    pawan.kumar.gupta@linux.intel.com, kai.huang@intel.com,
    kvm@vger.kernel.org, andrew.cooper3@citrix.com,
    dave.hansen@linux.intel.com, Lendacky, Thomas,
    daniel.sneddon@linux.intel.com, boris.ostrovsky@oracle.com,
    linux-kernel@vger.kernel.org, seanjc@google.com, mingo@redhat.com,
    pbonzini@redhat.com, tglx@linutronix.de, Moger, Babu,
    Das1, Sandipan, dwmw@amazon.co.uk, amit@kernel.org, hpa@zytor.com,
    peterz@infradead.org, Kaplan, David, x86@kernel.org

On Mon, Dec 02, 2024 at 11:15:24AM +0000, Shah, Amit wrote:
> FWIW, I'd say we have fairly decent documentation with commit messages
> + code + comments in code.

You mean every time we want to swap back in why we did some mitigation
decision the way we did, we should do git archeology and go on a chase?

I've been doing it for years now and it is an awful experience.
Absolutely certainly can be better.

> If you're saying that we need *additional* documentation that replicates hw
> manuals and the knowledge we have in our commit + code + comments, that
> I agree with.

We need documentation or at least pointers to vendor documentation which
explain why we're doing this exact type of mitigation and why we're not doing
other. Why is it ok to disable SMT, why is it not, for example?

This patch is another good example.

> I got the feeling earlier, though, that you were saying we need that
> documentation *instead of* the current comments-within-code, and that
> didn't sound like the right thing to do.

Comments within the code are fine. Big fat comment which is almost a page
long is pushing it. It can very well exist in Documentation/ and a short
comment can point to it.

And in Documentation/ we can go nuts and explain away...

> ... and the code flows and looks much better after this commit (for
> SpectreRSB at least), which is a huge plus.

Why does it look better? Because of the move of SPECTRE_V2_EIBRS_RETPOLINE?

> It's important to note that at some point in the past we got vulnerabilities
> and hw features/quirks one after the other, and we kept tacking mitigation
> code on top of the existing one -- because that's what you need to do during
> an embargo period. Now's the moment when we're consolidating it all while
> taking stock of the overall situation.

Yes, that's exactly why I'm pushing for improving the documentation and the
reasoning for each mitigation. Exactly because stuff is not under NDA anymore
and we can do all the debating in the open, and the dust has settled.

> This looks like a sound way to go about taking a higher-level view and
> simplifying the code.

Yap. And to give you another argument for it: when we clean it up nicely,
it'll be easier to add new mitigations. :)

And no, I don't want more but no one's listening to me anyway...

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Re: [PATCH v2 1/2] x86/bugs: Don't fill RSB on VMEXIT with eIBRS+retpoline
  2024-11-30 15:31   ` Borislav Petkov
  2024-12-02 11:15     ` Shah, Amit
@ 2024-12-02 23:35     ` Pawan Gupta
  2024-12-03 11:20       ` Borislav Petkov
  2024-12-05 23:13     ` Josh Poimboeuf
  2 siblings, 1 reply; 31+ messages in thread

From: Pawan Gupta @ 2024-12-02 23:35 UTC (permalink / raw)
To: Borislav Petkov
Cc: Josh Poimboeuf, x86, linux-kernel, amit, kvm, amit.shah,
    thomas.lendacky, tglx, peterz, corbet, mingo, dave.hansen, hpa,
    seanjc, pbonzini, daniel.sneddon, kai.huang, sandipan.das,
    boris.ostrovsky, Babu.Moger, david.kaplan, dwmw, andrew.cooper3

On Sat, Nov 30, 2024 at 04:31:25PM +0100, Borislav Petkov wrote:
> On Thu, Nov 21, 2024 at 12:07:18PM -0800, Josh Poimboeuf wrote:
> > eIBRS protects against RSB underflow/poisoning attacks.  Adding
> > retpoline to the mix doesn't change that.  Retpoline has a balanced
> > CALL/RET anyway.
>
> This is exactly why I've been wanting for us to document our mitigations for
> a long time now.
>
> A bunch of statements above for which I can only rhyme up they're correct if
> I search for the vendor docs. On the AMD side I've found:
>
> "When Automatic IBRS is enabled, the internal return address stack used for
> return address predictions is cleared on VMEXIT."
>
> APM v2, p. 58/119
>
> For the Intel side I'm not that lucky. There's something here:
>
> https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/branch-history-injection.html
>
> Or is it this one:
>
> https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/speculative-execution-side-channel-mitigations.html#inpage-nav-1-3-undefined
>
> Or is this written down explicitly in some other doc?

It is in this doc:

https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/indirect-branch-restricted-speculation.html

"Processors with enhanced IBRS still support the usage model where IBRS is
set only in the OS/VMM for OSes that enable SMEP. To do this, such
processors will ensure that guest behavior cannot control the RSB after a
VM exit once IBRS is set, even if IBRS was not set at the time of the VM
exit."

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Re: [PATCH v2 1/2] x86/bugs: Don't fill RSB on VMEXIT with eIBRS+retpoline
  2024-12-02 23:35     ` Pawan Gupta
@ 2024-12-03 11:20       ` Borislav Petkov
  2024-12-05 23:12         ` Josh Poimboeuf
  0 siblings, 1 reply; 31+ messages in thread

From: Borislav Petkov @ 2024-12-03 11:20 UTC (permalink / raw)
To: Pawan Gupta
Cc: Josh Poimboeuf, x86, linux-kernel, amit, kvm, amit.shah,
    thomas.lendacky, tglx, peterz, corbet, mingo, dave.hansen, hpa,
    seanjc, pbonzini, daniel.sneddon, kai.huang, sandipan.das,
    boris.ostrovsky, Babu.Moger, david.kaplan, dwmw, andrew.cooper3

On Mon, Dec 02, 2024 at 03:35:21PM -0800, Pawan Gupta wrote:
> It is in this doc:
>
> https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/indirect-branch-restricted-speculation.html

I hope those URLs remain more stable than past experience shows.

> "Processors with enhanced IBRS still support the usage model where IBRS is
> set only in the OS/VMM for OSes that enable SMEP. To do this, such
> processors will ensure that guest behavior cannot control the RSB after a
> VM exit once IBRS is set, even if IBRS was not set at the time of the VM
> exit."

ACK, thanks.

Now, can we pls add those excerpts to Documentation/ and point to them from
the code so that it is crystal clear why it is ok?

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Re: [PATCH v2 1/2] x86/bugs: Don't fill RSB on VMEXIT with eIBRS+retpoline
  2024-12-03 11:20       ` Borislav Petkov
@ 2024-12-05 23:12         ` Josh Poimboeuf
  2024-12-21  9:13           ` Borislav Petkov
  2025-04-02  9:19           ` Shah, Amit
  0 siblings, 2 replies; 31+ messages in thread

From: Josh Poimboeuf @ 2024-12-05 23:12 UTC (permalink / raw)
To: Borislav Petkov
Cc: Pawan Gupta, x86, linux-kernel, amit, kvm, amit.shah,
    thomas.lendacky, tglx, peterz, corbet, mingo, dave.hansen, hpa,
    seanjc, pbonzini, daniel.sneddon, kai.huang, sandipan.das,
    boris.ostrovsky, Babu.Moger, david.kaplan, dwmw, andrew.cooper3

On Tue, Dec 03, 2024 at 12:20:15PM +0100, Borislav Petkov wrote:
> On Mon, Dec 02, 2024 at 03:35:21PM -0800, Pawan Gupta wrote:
> > It is in this doc:
> >
> > https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/indirect-branch-restricted-speculation.html
>
> I hope those URLs remain more stable than past experience shows.
>
> > "Processors with enhanced IBRS still support the usage model where IBRS is
> > set only in the OS/VMM for OSes that enable SMEP. To do this, such
> > processors will ensure that guest behavior cannot control the RSB after a
> > VM exit once IBRS is set, even if IBRS was not set at the time of the VM
> > exit."
>
> ACK, thanks.
>
> Now, can we pls add those excerpts to Documentation/ and point to them from
> the code so that it is crystal clear why it is ok?

Ok, I'll try to write up a document.  I'm thinking it should go in its
own return-based-attacks.rst file rather than spectre.rst, which is more
of an outdated historical document at this point.  And we want this
document to actually be read (and kept up to date) by developers instead
of mostly ignored like the others.

-- 
Josh

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Re: [PATCH v2 1/2] x86/bugs: Don't fill RSB on VMEXIT with eIBRS+retpoline
  2024-12-05 23:12         ` Josh Poimboeuf
@ 2024-12-21  9:13           ` Borislav Petkov
  2025-04-02  9:19           ` Shah, Amit
  1 sibling, 0 replies; 31+ messages in thread

From: Borislav Petkov @ 2024-12-21  9:13 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: Pawan Gupta, x86, linux-kernel, amit, kvm, amit.shah,
    thomas.lendacky, tglx, peterz, corbet, mingo, dave.hansen, hpa,
    seanjc, pbonzini, daniel.sneddon, kai.huang, sandipan.das,
    boris.ostrovsky, Babu.Moger, david.kaplan, dwmw, andrew.cooper3

On Thu, Dec 05, 2024 at 03:12:07PM -0800, Josh Poimboeuf wrote:
> Ok, I'll try to write up a document.  I'm thinking it should go in its
> own return-based-attacks.rst file rather than spectre.rst, which is more
> of an outdated historical document at this point.

Thanks!

> And we want this document to actually be read (and kept up to date) by
> developers instead of mostly ignored like the others.

Good idea. Now how do we enforce this?

We could add a rule to checkpatch or to some other patch checking script like
mine here for example :-)

https://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git/log/?h=vp

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Re: [PATCH v2 1/2] x86/bugs: Don't fill RSB on VMEXIT with eIBRS+retpoline
  2024-12-05 23:12         ` Josh Poimboeuf
  2024-12-21  9:13           ` Borislav Petkov
@ 2025-04-02  9:19           ` Shah, Amit
  2025-04-02 14:16             ` Josh Poimboeuf
  1 sibling, 1 reply; 31+ messages in thread

From: Shah, Amit @ 2025-04-02 9:19 UTC (permalink / raw)
To: jpoimboe@kernel.org, bp@alien8.de
Cc: corbet@lwn.net, pawan.kumar.gupta@linux.intel.com,
    kai.huang@intel.com, kvm@vger.kernel.org, andrew.cooper3@citrix.com,
    dave.hansen@linux.intel.com, daniel.sneddon@linux.intel.com,
    Lendacky, Thomas, boris.ostrovsky@oracle.com,
    linux-kernel@vger.kernel.org, seanjc@google.com, mingo@redhat.com,
    pbonzini@redhat.com, tglx@linutronix.de, Moger, Babu,
    Das1, Sandipan, dwmw@amazon.co.uk, hpa@zytor.com,
    peterz@infradead.org, Kaplan, David, x86@kernel.org

On Thu, 2024-12-05 at 15:12 -0800, Josh Poimboeuf wrote:
> On Tue, Dec 03, 2024 at 12:20:15PM +0100, Borislav Petkov wrote:
> > On Mon, Dec 02, 2024 at 03:35:21PM -0800, Pawan Gupta wrote:
> > > It is in this doc:
> > >
> > > https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/indirect-branch-restricted-speculation.html
> >
> > I hope those URLs remain more stable than past experience shows.
> >
> > > "Processors with enhanced IBRS still support the usage model
> > > where IBRS is set only in the OS/VMM for OSes that enable SMEP.
> > > To do this, such processors will ensure that guest behavior
> > > cannot control the RSB after a VM exit once IBRS is set, even if
> > > IBRS was not set at the time of the VM exit."
> >
> > ACK, thanks.
> >
> > Now, can we pls add those excerpts to Documentation/ and point to
> > them from the code so that it is crystal clear why it is ok?
>
> Ok, I'll try to write up a document.  I'm thinking it should go in
> its own return-based-attacks.rst file rather than spectre.rst, which
> is more of an outdated historical document at this point.  And we
> want this document to actually be read (and kept up to date) by
> developers instead of mostly ignored like the others.

Hey Josh,

Do you plan to submit a v3 with the changes?

Thanks,

		Amit

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Re: [PATCH v2 1/2] x86/bugs: Don't fill RSB on VMEXIT with eIBRS+retpoline
  2025-04-02  9:19           ` Shah, Amit
@ 2025-04-02 14:16             ` Josh Poimboeuf
  2025-04-02 14:19               ` Shah, Amit
  0 siblings, 1 reply; 31+ messages in thread

From: Josh Poimboeuf @ 2025-04-02 14:16 UTC (permalink / raw)
To: Shah, Amit
Cc: bp@alien8.de, corbet@lwn.net, pawan.kumar.gupta@linux.intel.com,
    kai.huang@intel.com, kvm@vger.kernel.org, andrew.cooper3@citrix.com,
    dave.hansen@linux.intel.com, daniel.sneddon@linux.intel.com,
    Lendacky, Thomas, boris.ostrovsky@oracle.com,
    linux-kernel@vger.kernel.org, seanjc@google.com, mingo@redhat.com,
    pbonzini@redhat.com, tglx@linutronix.de, Moger, Babu,
    Das1, Sandipan, dwmw@amazon.co.uk, hpa@zytor.com,
    peterz@infradead.org, Kaplan, David, x86@kernel.org

On Wed, Apr 02, 2025 at 09:19:19AM +0000, Shah, Amit wrote:
> On Thu, 2024-12-05 at 15:12 -0800, Josh Poimboeuf wrote:
> > On Tue, Dec 03, 2024 at 12:20:15PM +0100, Borislav Petkov wrote:
> > > On Mon, Dec 02, 2024 at 03:35:21PM -0800, Pawan Gupta wrote:
> > > > It is in this doc:
> > > >
> > > > https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/indirect-branch-restricted-speculation.html
> > >
> > > I hope those URLs remain more stable than past experience shows.
> > >
> > > > "Processors with enhanced IBRS still support the usage model
> > > > where IBRS is set only in the OS/VMM for OSes that enable SMEP.
> > > > To do this, such processors will ensure that guest behavior
> > > > cannot control the RSB after a VM exit once IBRS is set, even
> > > > if IBRS was not set at the time of the VM exit."
> > >
> > > ACK, thanks.
> > >
> > > Now, can we pls add those excerpts to Documentation/ and point to
> > > them from the code so that it is crystal clear why it is ok?
> >
> > Ok, I'll try to write up a document.  I'm thinking it should go in
> > its own return-based-attacks.rst file rather than spectre.rst,
> > which is more of an outdated historical document at this point.
> > And we want this document to actually be read (and kept up to date)
> > by developers instead of mostly ignored like the others.
>
> Hey Josh,
>
> Do you plan to submit a v3 with the changes?

Thanks for the reminder, I actually had the patches ready to go a few
months ago (with a fancy new doc) and then forgot to post.  Let me dust
off the cobwebs.

-- 
Josh

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Re: [PATCH v2 1/2] x86/bugs: Don't fill RSB on VMEXIT with eIBRS+retpoline
  2025-04-02 14:16             ` Josh Poimboeuf
@ 2025-04-02 14:19               ` Shah, Amit
  0 siblings, 0 replies; 31+ messages in thread

From: Shah, Amit @ 2025-04-02 14:19 UTC (permalink / raw)
To: jpoimboe@kernel.org
Cc: corbet@lwn.net, boris.ostrovsky@oracle.com,
    andrew.cooper3@citrix.com, kai.huang@intel.com,
    dave.hansen@linux.intel.com, daniel.sneddon@linux.intel.com,
    Lendacky, Thomas, pawan.kumar.gupta@linux.intel.com,
    linux-kernel@vger.kernel.org, seanjc@google.com, mingo@redhat.com,
    pbonzini@redhat.com, kvm@vger.kernel.org, Moger, Babu,
    Das1, Sandipan, tglx@linutronix.de, dwmw@amazon.co.uk,
    hpa@zytor.com, peterz@infradead.org, bp@alien8.de, Kaplan, David,
    x86@kernel.org

On Wed, 2025-04-02 at 07:16 -0700, Josh Poimboeuf wrote:
> On Wed, Apr 02, 2025 at 09:19:19AM +0000, Shah, Amit wrote:
>
> [...]
>
> > Hey Josh,
> >
> > Do you plan to submit a v3 with the changes?
>
> Thanks for the reminder, I actually had the patches ready to go a few
> months ago (with a fancy new doc) and then forgot to post.  Let me
> dust off the cobwebs.

Excellent, thanks!

		Amit

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Re: [PATCH v2 1/2] x86/bugs: Don't fill RSB on VMEXIT with eIBRS+retpoline
  2024-11-30 15:31   ` Borislav Petkov
  2024-12-02 11:15     ` Shah, Amit
  2024-12-02 23:35     ` Pawan Gupta
@ 2024-12-05 23:13     ` Josh Poimboeuf
  2 siblings, 0 replies; 31+ messages in thread

From: Josh Poimboeuf @ 2024-12-05 23:13 UTC (permalink / raw)
To: Borislav Petkov
Cc: x86, linux-kernel, amit, kvm, amit.shah, thomas.lendacky, tglx,
    peterz, pawan.kumar.gupta, corbet, mingo, dave.hansen, hpa,
    seanjc, pbonzini, daniel.sneddon, kai.huang, sandipan.das,
    boris.ostrovsky, Babu.Moger, david.kaplan, dwmw, andrew.cooper3

On Sat, Nov 30, 2024 at 04:31:25PM +0100, Borislav Petkov wrote:
> On Thu, Nov 21, 2024 at 12:07:18PM -0800, Josh Poimboeuf wrote:
> > --- a/arch/x86/kernel/cpu/bugs.c
> > +++ b/arch/x86/kernel/cpu/bugs.c
> > @@ -1605,20 +1605,20 @@ static void __init spectre_v2_determine_rsb_fill_type_at_vmexit(enum spectre_v2_
> >  	case SPECTRE_V2_NONE:
> >  		return;
> >  
> > -	case SPECTRE_V2_EIBRS_LFENCE:
> >  	case SPECTRE_V2_EIBRS:
> > +	case SPECTRE_V2_EIBRS_LFENCE:
> > +	case SPECTRE_V2_EIBRS_RETPOLINE:
> >  		if (boot_cpu_has_bug(X86_BUG_EIBRS_PBRSB)) {
> > -			setup_force_cpu_cap(X86_FEATURE_RSB_VMEXIT_LITE);
> >  			pr_info("Spectre v2 / PBRSB-eIBRS: Retire a single CALL on VMEXIT\n");
> > +			setup_force_cpu_cap(X86_FEATURE_RSB_VMEXIT_LITE);
>
> Why are you swapping those?
>
> >  		}
> >  		return;
> >  
> > -	case SPECTRE_V2_EIBRS_RETPOLINE:
> >  	case SPECTRE_V2_RETPOLINE:
> >  	case SPECTRE_V2_LFENCE:
> >  	case SPECTRE_V2_IBRS:
> > -		setup_force_cpu_cap(X86_FEATURE_RSB_VMEXIT);
> >  		pr_info("Spectre v2 / SpectreRSB : Filling RSB on VMEXIT\n");
> > +		setup_force_cpu_cap(X86_FEATURE_RSB_VMEXIT);
>
> Ditto?

It's more readable that way, similar to how a comment goes before code.

-- 
Josh

^ permalink raw reply	[flat|nested] 31+ messages in thread
* [PATCH v2 2/2] x86/bugs: Don't fill RSB on context switch with eIBRS
  2024-11-21 20:07 [PATCH v2 0/2] x86/bugs: RSB tweaks Josh Poimboeuf
  2024-11-21 20:07 ` [PATCH v2 1/2] x86/bugs: Don't fill RSB on VMEXIT with eIBRS+retpoline Josh Poimboeuf
@ 2024-11-21 20:07 ` Josh Poimboeuf
  2024-12-03 11:42   ` Borislav Petkov
  2024-12-05 23:32   ` Josh Poimboeuf
  2024-11-28 13:28 ` [RFC PATCH v3 0/2] Add support for the ERAPS feature Amit Shah
  2 siblings, 2 replies; 31+ messages in thread

From: Josh Poimboeuf @ 2024-11-21 20:07 UTC (permalink / raw)
To: x86
Cc: linux-kernel, amit, kvm, amit.shah, thomas.lendacky, bp, tglx,
    peterz, pawan.kumar.gupta, corbet, mingo, dave.hansen, hpa,
    seanjc, pbonzini, daniel.sneddon, kai.huang, sandipan.das,
    boris.ostrovsky, Babu.Moger, david.kaplan, dwmw, andrew.cooper3

User->user Spectre v2 attacks (including RSB) across context switches
are already mitigated by IBPB in cond_mitigation(), if enabled globally
or if either the prev or the next task has opted in to protection.  RSB
filling without IBPB serves no purpose for protecting user space, as
indirect branches are still vulnerable.

User->kernel RSB attacks are mitigated by eIBRS, in which case the RSB
filling on context switch isn't needed, so remove it.

While at it, update and coalesce the comments describing the various
RSB mitigations.

Suggested-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Reviewed-by: Amit Shah <amit.shah@amd.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
---
 arch/x86/kernel/cpu/bugs.c | 103 +++++++++++++++----------------------
 arch/x86/mm/tlb.c          |   2 +-
 2 files changed, 42 insertions(+), 63 deletions(-)

diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 68bed17f0980..d5102b72f74d 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1579,31 +1579,48 @@ static void __init spec_ctrl_disable_kernel_rrsba(void)
 		rrsba_disabled = true;
 }
 
-static void __init spectre_v2_determine_rsb_fill_type_at_vmexit(enum spectre_v2_mitigation mode)
+static void __init spectre_v2_mitigate_rsb(enum spectre_v2_mitigation mode)
 {
 	/*
-	 * Similar to context switches, there are two types of RSB attacks
-	 * after VM exit:
+	 * In general there are two types of RSB attacks:
 	 *
-	 * 1) RSB underflow
+	 * 1) RSB underflow ("Intel Retbleed")
+	 *
+	 *    Some Intel parts have "bottomless RSB".  When the RSB is empty,
+	 *    speculated return targets may come from the branch predictor,
+	 *    which could have a user-poisoned BTB or BHB entry.
+	 *
+	 *    user->user attacks are mitigated by IBPB on context switch.
+	 *
+	 *    user->kernel attacks via context switch are mitigated by IBRS,
+	 *    eIBRS, or RSB filling.
+	 *
+	 *    user->kernel attacks via kernel entry are mitigated by IBRS,
+	 *    eIBRS, or call depth tracking.
+	 *
+	 *    On VMEXIT, guest->host attacks are mitigated by IBRS, eIBRS, or
+	 *    RSB filling.
 	 *
 	 * 2) Poisoned RSB entry
 	 *
-	 * When retpoline is enabled, both are mitigated by filling/clearing
-	 * the RSB.
+	 *    On a context switch, the previous task can poison RSB entries
+	 *    used by the next task, controlling its speculative return
+	 *    targets.  Poisoned RSB entries can also be created by "AMD
+	 *    Retbleed" or SRSO.
 	 *
-	 * When IBRS is enabled, while #1 would be mitigated by the IBRS branch
-	 * prediction isolation protections, RSB still needs to be cleared
-	 * because of #2.  Note that SMEP provides no protection here, unlike
-	 * user-space-poisoned RSB entries.
+	 *    user->user attacks are mitigated by IBPB on context switch.
 	 *
-	 * eIBRS should protect against RSB poisoning, but if the EIBRS_PBRSB
-	 * bug is present then a LITE version of RSB protection is required,
-	 * just a single call needs to retire before a RET is executed.
+	 *    user->kernel attacks via context switch are prevented by
+	 *    SMEP+eIBRS+SRSO mitigations, or RSB clearing.
+	 *
+	 *    guest->host attacks are mitigated by eIBRS or RSB clearing on
+	 *    VMEXIT.  eIBRS implementations with X86_BUG_EIBRS_PBRSB still
+	 *    need "lite" RSB filling which retires a CALL before the first
+	 *    RET.
 	 */
 	switch (mode) {
 	case SPECTRE_V2_NONE:
-		return;
+		break;
 
 	case SPECTRE_V2_EIBRS:
 	case SPECTRE_V2_EIBRS_LFENCE:
@@ -1612,18 +1629,21 @@ static void __init spectre_v2_determine_rsb_fill_type_at_vmexit(enum spectre_v2_
 			pr_info("Spectre v2 / PBRSB-eIBRS: Retire a single CALL on VMEXIT\n");
 			setup_force_cpu_cap(X86_FEATURE_RSB_VMEXIT_LITE);
 		}
-		return;
+		break;
 
 	case SPECTRE_V2_RETPOLINE:
 	case SPECTRE_V2_LFENCE:
 	case SPECTRE_V2_IBRS:
-		pr_info("Spectre v2 / SpectreRSB : Filling RSB on VMEXIT\n");
+		pr_info("Spectre v2 / SpectreRSB: Filling RSB on context switch and VMEXIT\n");
+		setup_force_cpu_cap(X86_FEATURE_RSB_CTXSW);
 		setup_force_cpu_cap(X86_FEATURE_RSB_VMEXIT);
-		return;
-	}
+		break;
 
-	pr_warn_once("Unknown Spectre v2 mode, disabling RSB mitigation at VM exit");
-	dump_stack();
+	default:
+		pr_warn_once("Unknown Spectre v2 mode, disabling RSB mitigation\n");
+		dump_stack();
+		break;
+	}
 }
 
 /*
@@ -1817,48 +1837,7 @@ static void __init spectre_v2_select_mitigation(void)
 	spectre_v2_enabled = mode;
 	pr_info("%s\n", spectre_v2_strings[mode]);
 
-	/*
-	 * If Spectre v2 protection has been enabled, fill the RSB during a
-	 * context switch.  In general there are two types of RSB attacks
-	 * across context switches, for which the CALLs/RETs may be unbalanced.
-	 *
-	 * 1) RSB underflow
-	 *
-	 *    Some Intel parts have "bottomless RSB".  When the RSB is empty,
-	 *    speculated return targets may come from the branch predictor,
-	 *    which could have a user-poisoned BTB or BHB entry.
-	 *
-	 *    AMD has it even worse: *all* returns are speculated from the BTB,
-	 *    regardless of the state of the RSB.
-	 *
-	 *    When IBRS or eIBRS is enabled, the "user -> kernel" attack
-	 *    scenario is mitigated by the IBRS branch prediction isolation
-	 *    properties, so the RSB buffer filling wouldn't be necessary to
-	 *    protect against this type of attack.
-	 *
-	 *    The "user -> user" attack scenario is mitigated by RSB filling.
-	 *
-	 * 2) Poisoned RSB entry
-	 *
-	 *    If the 'next' in-kernel return stack is shorter than 'prev',
-	 *    'next' could be tricked into speculating with a user-poisoned RSB
-	 *    entry.
-	 *
-	 *    The "user -> kernel" attack scenario is mitigated by SMEP and
-	 *    eIBRS.
-	 *
-	 *    The "user -> user" scenario, also known as SpectreBHB, requires
-	 *    RSB clearing.
-	 *
-	 * So to mitigate all cases, unconditionally fill RSB on context
-	 * switches.
-	 *
-	 * FIXME: Is this pointless for retbleed-affected AMD?
-	 */
-	setup_force_cpu_cap(X86_FEATURE_RSB_CTXSW);
-	pr_info("Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch\n");
-
-	spectre_v2_determine_rsb_fill_type_at_vmexit(mode);
+	spectre_v2_mitigate_rsb(mode);
 
 	/*
 	 * Retpoline protects the kernel, but doesn't protect firmware.  IBRS
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 86593d1b787d..dc388f5ae7ef 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -388,7 +388,7 @@ static void cond_mitigation(struct task_struct *next)
 		prev_mm = this_cpu_read(cpu_tlbstate.last_user_mm_spec);
 
 	/*
-	 * Avoid user/user BTB poisoning by flushing the branch predictor
+	 * Avoid user->user BTB/RSB poisoning by flushing the branch predictor
 	 * when switching between processes.
This stops one process from * doing Spectre-v2 attacks on another. * -- 2.47.0 ^ permalink raw reply related [flat|nested] 31+ messages in thread
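The decision table that the patch's reworked spectre_v2_mitigate_rsb() encodes can be sketched as a small standalone C model. The enum and flag names below are simplified stand-ins for the kernel's real types, and the SPECTRE_V2_EIBRS_RETPOLINE case is assumed to follow the other eIBRS modes (the hunk above elides part of that switch):

```c
#include <stdbool.h>

/* Simplified stand-ins for the kernel's spectre_v2_mitigation modes. */
enum spectre_v2_mode {
	SPECTRE_V2_NONE,
	SPECTRE_V2_RETPOLINE,
	SPECTRE_V2_LFENCE,
	SPECTRE_V2_IBRS,
	SPECTRE_V2_EIBRS,
	SPECTRE_V2_EIBRS_RETPOLINE,
	SPECTRE_V2_EIBRS_LFENCE,
};

#define F_RSB_CTXSW       (1u << 0)	/* fill RSB on context switch */
#define F_RSB_VMEXIT      (1u << 1)	/* fill RSB on VMEXIT */
#define F_RSB_VMEXIT_LITE (1u << 2)	/* retire one CALL on VMEXIT (PBRSB) */

/*
 * Which RSB mitigations a given Spectre v2 mode still needs after this
 * patch: eIBRS modes drop full filling entirely (only the PBRSB "lite"
 * sequence may remain); retpoline/LFENCE/IBRS keep filling on both
 * context switch and VMEXIT.
 */
unsigned int rsb_mitigation(enum spectre_v2_mode mode, bool bug_eibrs_pbrsb)
{
	switch (mode) {
	case SPECTRE_V2_NONE:
		return 0;
	case SPECTRE_V2_EIBRS:
	case SPECTRE_V2_EIBRS_RETPOLINE:
	case SPECTRE_V2_EIBRS_LFENCE:
		return bug_eibrs_pbrsb ? F_RSB_VMEXIT_LITE : 0;
	case SPECTRE_V2_RETPOLINE:
	case SPECTRE_V2_LFENCE:
	case SPECTRE_V2_IBRS:
		return F_RSB_CTXSW | F_RSB_VMEXIT;
	default:
		return 0;	/* unknown mode: the kernel warns and disables */
	}
}
```

This is only a model of the selection logic, not of the mitigation sequences themselves.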
* Re: [PATCH v2 2/2] x86/bugs: Don't fill RSB on context switch with eIBRS 2024-11-21 20:07 ` [PATCH v2 2/2] x86/bugs: Don't fill RSB on context switch with eIBRS Josh Poimboeuf @ 2024-12-03 11:42 ` Borislav Petkov 2024-12-05 23:32 ` Josh Poimboeuf 1 sibling, 0 replies; 31+ messages in thread From: Borislav Petkov @ 2024-12-03 11:42 UTC (permalink / raw) To: Josh Poimboeuf Cc: x86, linux-kernel, amit, kvm, amit.shah, thomas.lendacky, tglx, peterz, pawan.kumar.gupta, corbet, mingo, dave.hansen, hpa, seanjc, pbonzini, daniel.sneddon, kai.huang, sandipan.das, boris.ostrovsky, Babu.Moger, david.kaplan, dwmw, andrew.cooper3 On Thu, Nov 21, 2024 at 12:07:19PM -0800, Josh Poimboeuf wrote: > @@ -1579,31 +1579,48 @@ static void __init spec_ctrl_disable_kernel_rrsba(void) > rrsba_disabled = true; > } > > -static void __init spectre_v2_determine_rsb_fill_type_at_vmexit(enum spectre_v2_mitigation mode) > +static void __init spectre_v2_mitigate_rsb(enum spectre_v2_mitigation mode) mitigate the return stack buffer? The previous name was describing better what this function does... Perhaps spectre_v2_determine_rsb_handling() or so. "Handling" to mean, what do to with the RSB based on the selected Spectre v2 mitigation. > { > /* > - * Similar to context switches, there are two types of RSB attacks > - * after VM exit: > + * In general there are two types of RSB attacks: > * > - * 1) RSB underflow > + * 1) RSB underflow ("Intel Retbleed") > + * > + * Some Intel parts have "bottomless RSB". When the RSB is empty, This is exactly what I mean with expanding this in Documentation/: Which parts are those? > + * speculated return targets may come from the branch predictor, > + * which could have a user-poisoned BTB or BHB entry. Also there we could explain what those alternative predictors are: "RSBA: The processor supports RSB Alternate. Alternative branch predictors may be used by RET instructions when the RSB is empty." 
https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/speculative-execution-side-channel-mitigations.html > + * > + * user->user attacks are mitigated by IBPB on context switch. > + * > + * user->kernel attacks via context switch are mitigated by IBRS, > + * eIBRS, or RSB filling. > + * > + * user->kernel attacks via kernel entry are mitigated by IBRS, > + * eIBRS, or call depth tracking. > + * > + * On VMEXIT, guest->host attacks are mitigated by IBRS, eIBRS, or > + * RSB filling. I like the explanation of every attack vector. I'd like even more if that were expanded in Documentation/ and we can always go look it up there when touching this code or reviewing patches touching it. > * > * 2) Poisoned RSB entry > * > - * When retpoline is enabled, both are mitigated by filling/clearing > - * the RSB. > + * On a context switch, the previous task can poison RSB entries > + * used by the next task, controlling its speculative return > + * targets. Poisoned RSB entries can also be created by "AMD > + * Retbleed" or SRSO. "... by the AMD Retbleed variant... " > * > - * When IBRS is enabled, while #1 would be mitigated by the IBRS branch > - * prediction isolation protections, RSB still needs to be cleared > - * because of #2. Note that SMEP provides no protection here, unlike > - * user-space-poisoned RSB entries. > + * user->user attacks are mitigated by IBPB on context switch. > * > - * eIBRS should protect against RSB poisoning, but if the EIBRS_PBRSB > - * bug is present then a LITE version of RSB protection is required, > - * just a single call needs to retire before a RET is executed. > + * user->kernel attacks via context switch are prevented by > + * SMEP+eIBRS+SRSO mitigations, or RSB clearing. we call it "RSB filling" above. What's the difference? The hw *IBRS* vs our RSB stuffing sw sequence? > + * guest->host attacks are mitigated by eIBRS or RSB clearing on > + * VMEXIT. 
I think you mean our sw RSB stuffing sequence here. > eIBRS implementations with X86_BUG_EIBRS_PBRSB still > + * need "lite" RSB filling which retires a CALL before the first > + * RET. > */ > switch (mode) { > case SPECTRE_V2_NONE: > - return; > + break; > > case SPECTRE_V2_EIBRS: > case SPECTRE_V2_EIBRS_LFENCE: > @@ -1612,18 +1629,21 @@ static void __init spectre_v2_determine_rsb_fill_type_at_vmexit(enum spectre_v2_ > pr_info("Spectre v2 / PBRSB-eIBRS: Retire a single CALL on VMEXIT\n"); > setup_force_cpu_cap(X86_FEATURE_RSB_VMEXIT_LITE); > } > - return; > + break; > > case SPECTRE_V2_RETPOLINE: > case SPECTRE_V2_LFENCE: > case SPECTRE_V2_IBRS: > - pr_info("Spectre v2 / SpectreRSB : Filling RSB on VMEXIT\n"); > + pr_info("Spectre v2 / SpectreRSB: Filling RSB on context switch and VMEXIT\n"); > + setup_force_cpu_cap(X86_FEATURE_RSB_CTXSW); Aha, with "RSB filling" you mean our sw sequence. It would be good to have those explained properly and have some text establishing the terminology we're using. Thx. -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette ^ permalink raw reply [flat|nested] 31+ messages in thread
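For readers following the terminology thread: the "sw sequence" being debated is the kernel's RSB-stuffing loop (__FILL_RETURN_BUFFER in arch/x86/include/asm/nospec-branch.h), a run of benign CALLs whose only job is to overwrite every RSB entry. A toy C model of why that works — illustrative only, not the actual asm sequence:

```c
#include <stdbool.h>

#define RSB_DEPTH 32		/* typical RSB size on recent parts */
#define BENIGN    0xb0d1e5u	/* stand-in for a harmless return target */

struct rsb {
	unsigned int entry[RSB_DEPTH];
	int top;		/* circular top-of-stack index */
};

/* A CALL pushes a predicted return target onto the RSB. */
static void rsb_push(struct rsb *r, unsigned int target)
{
	r->top = (r->top + 1) % RSB_DEPTH;
	r->entry[r->top] = target;
}

/*
 * The software "filling"/"stuffing" sequence: RSB_DEPTH benign CALLs,
 * enough to overwrite every (possibly attacker-poisoned) entry.
 */
void rsb_fill(struct rsb *r)
{
	for (int i = 0; i < RSB_DEPTH; i++)
		rsb_push(r, BENIGN);
}

bool rsb_is_clean(const struct rsb *r)
{
	for (int i = 0; i < RSB_DEPTH; i++)
		if (r->entry[i] != BENIGN)
			return false;
	return true;
}
```

"Filling" and "clearing" are used interchangeably in the thread for this same sequence; both mean replacing all entries with harmless targets.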
* Re: [PATCH v2 2/2] x86/bugs: Don't fill RSB on context switch with eIBRS 2024-11-21 20:07 ` [PATCH v2 2/2] x86/bugs: Don't fill RSB on context switch with eIBRS Josh Poimboeuf 2024-12-03 11:42 ` Borislav Petkov @ 2024-12-05 23:32 ` Josh Poimboeuf 2024-12-06 0:53 ` Josh Poimboeuf 2024-12-06 10:10 ` Shah, Amit 1 sibling, 2 replies; 31+ messages in thread From: Josh Poimboeuf @ 2024-12-05 23:32 UTC (permalink / raw) To: x86 Cc: linux-kernel, amit, kvm, amit.shah, thomas.lendacky, bp, tglx, peterz, pawan.kumar.gupta, corbet, mingo, dave.hansen, hpa, seanjc, pbonzini, daniel.sneddon, kai.huang, sandipan.das, boris.ostrovsky, Babu.Moger, david.kaplan, dwmw, andrew.cooper3 On Thu, Nov 21, 2024 at 12:07:19PM -0800, Josh Poimboeuf wrote: > User->user Spectre v2 attacks (including RSB) across context switches > are already mitigated by IBPB in cond_mitigation(), if enabled globally > or if either the prev or the next task has opted in to protection. RSB > filling without IBPB serves no purpose for protecting user space, as > indirect branches are still vulnerable. Question for Intel/AMD folks: where is it documented that IBPB clears the RSB? I thought I'd seen this somewhere but I can't seem to find it. -- Josh ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2 2/2] x86/bugs: Don't fill RSB on context switch with eIBRS 2024-12-05 23:32 ` Josh Poimboeuf @ 2024-12-06 0:53 ` Josh Poimboeuf 2024-12-06 23:02 ` Josh Poimboeuf 2024-12-06 10:10 ` Shah, Amit 1 sibling, 1 reply; 31+ messages in thread From: Josh Poimboeuf @ 2024-12-06 0:53 UTC (permalink / raw) To: x86 Cc: linux-kernel, amit, kvm, amit.shah, thomas.lendacky, bp, tglx, peterz, pawan.kumar.gupta, corbet, mingo, dave.hansen, hpa, seanjc, pbonzini, daniel.sneddon, kai.huang, sandipan.das, boris.ostrovsky, Babu.Moger, david.kaplan, dwmw, andrew.cooper3 On Thu, Dec 05, 2024 at 03:32:47PM -0800, Josh Poimboeuf wrote: > On Thu, Nov 21, 2024 at 12:07:19PM -0800, Josh Poimboeuf wrote: > > User->user Spectre v2 attacks (including RSB) across context switches > > are already mitigated by IBPB in cond_mitigation(), if enabled globally > > or if either the prev or the next task has opted in to protection. RSB > > filling without IBPB serves no purpose for protecting user space, as > > indirect branches are still vulnerable. > > Question for Intel/AMD folks: where is it documented that IBPB clears > the RSB? I thought I'd seen this somewhere but I can't seem to find it. For Intel, I found this: https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/post-barrier-return-stack-buffer-predictions.html "Software that executed before the IBPB command cannot control the predicted targets of indirect branches executed after the command on the same logical processor. The term indirect branch in this context includes near return instructions, so these predicted targets may come from the RSB. This article uses the term RSB-barrier to refer to either an IBPB command event, or (on processors which support enhanced IBRS) either a VM exit with IBRS set to 1 or setting IBRS to 1 after a VM exit." I haven't seen anything that explicit for AMD. -- Josh ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2 2/2] x86/bugs: Don't fill RSB on context switch with eIBRS 2024-12-06 0:53 ` Josh Poimboeuf @ 2024-12-06 23:02 ` Josh Poimboeuf 2024-12-30 14:54 ` Shah, Amit 2025-01-08 11:50 ` Shah, Amit 0 siblings, 2 replies; 31+ messages in thread From: Josh Poimboeuf @ 2024-12-06 23:02 UTC (permalink / raw) To: x86 Cc: linux-kernel, amit, kvm, amit.shah, thomas.lendacky, bp, tglx, peterz, pawan.kumar.gupta, corbet, mingo, dave.hansen, hpa, seanjc, pbonzini, daniel.sneddon, kai.huang, sandipan.das, boris.ostrovsky, Babu.Moger, david.kaplan, dwmw, andrew.cooper3 On Thu, Dec 05, 2024 at 04:53:03PM -0800, Josh Poimboeuf wrote: > On Thu, Dec 05, 2024 at 03:32:47PM -0800, Josh Poimboeuf wrote: > > On Thu, Nov 21, 2024 at 12:07:19PM -0800, Josh Poimboeuf wrote: > > > User->user Spectre v2 attacks (including RSB) across context switches > > > are already mitigated by IBPB in cond_mitigation(), if enabled globally > > > or if either the prev or the next task has opted in to protection. RSB > > > filling without IBPB serves no purpose for protecting user space, as > > > indirect branches are still vulnerable. > > > > Question for Intel/AMD folks: where is it documented that IBPB clears > > the RSB? I thought I'd seen this somewhere but I can't seem to find it. > > For Intel, I found this: > > https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/post-barrier-return-stack-buffer-predictions.html > > "Software that executed before the IBPB command cannot control the > predicted targets of indirect branches executed after the command on > the same logical processor. The term indirect branch in this context > includes near return instructions, so these predicted targets may come > from the RSB. > > This article uses the term RSB-barrier to refer to either an IBPB > command event, or (on processors which support enhanced IBRS) either a > VM exit with IBRS set to 1 or setting IBRS to 1 after a VM exit." 
> > I haven't seen anything that explicit for AMD. Found it. As Andrew mentioned earlier, AMD IBPB only clears RSB if the IBPB_RET CPUID bit is set. From APM vol 3: CPUID Fn8000_0008_EBX Extended Feature Identifiers: 30 IBPB_RET The processor clears the return address predictor when MSR PRED_CMD.IBPB is written to 1. We check that already for the IBPB entry mitigation, but now we'll also need to do so for the context switch IBPB. Question for AMD, does SBPB behave the same way, i.e. does it clear RSB if IBPB_RET? -- Josh ^ permalink raw reply [flat|nested] 31+ messages in thread
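The CPUID gate Josh quotes — Fn8000_0008_EBX bit 30 (IBPB_RET) — reduces to a one-line predicate over the raw CPUID output. Sketched here as a user-space helper (the function name is illustrative; in the kernel this is a CPU feature bit checked via cpu_feature_enabled()):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * CPUID Fn8000_0008_EBX bit 30: IBPB_RET — writing PRED_CMD.IBPB also
 * clears the return address predictor (APM vol 3, quoted above).
 */
#define AMD_IBPB_RET_BIT 30

/*
 * Given the EBX output of CPUID leaf 0x80000008, report whether IBPB on
 * this part is also an RSB barrier.  If it is not, software RSB filling
 * is still needed even when IBPB is issued on context switch.
 */
bool amd_ibpb_clears_rsb(uint32_t ebx)
{
	return (ebx >> AMD_IBPB_RET_BIT) & 1;
}
```
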
* Re: [PATCH v2 2/2] x86/bugs: Don't fill RSB on context switch with eIBRS 2024-12-06 23:02 ` Josh Poimboeuf @ 2024-12-30 14:54 ` Shah, Amit 2025-01-08 11:50 ` Shah, Amit 1 sibling, 0 replies; 31+ messages in thread From: Shah, Amit @ 2024-12-30 14:54 UTC (permalink / raw) To: jpoimboe@kernel.org, x86@kernel.org Cc: corbet@lwn.net, pawan.kumar.gupta@linux.intel.com, kai.huang@intel.com, kvm@vger.kernel.org, andrew.cooper3@citrix.com, dave.hansen@linux.intel.com, Lendacky, Thomas, daniel.sneddon@linux.intel.com, boris.ostrovsky@oracle.com, linux-kernel@vger.kernel.org, seanjc@google.com, mingo@redhat.com, pbonzini@redhat.com, tglx@linutronix.de, Moger, Babu, Das1, Sandipan, dwmw@amazon.co.uk, hpa@zytor.com, peterz@infradead.org, bp@alien8.de, Kaplan, David On Fri, 2024-12-06 at 15:02 -0800, Josh Poimboeuf wrote: > On Thu, Dec 05, 2024 at 04:53:03PM -0800, Josh Poimboeuf wrote: > > On Thu, Dec 05, 2024 at 03:32:47PM -0800, Josh Poimboeuf wrote: > > > On Thu, Nov 21, 2024 at 12:07:19PM -0800, Josh Poimboeuf wrote: > > > > User->user Spectre v2 attacks (including RSB) across context > > > > switches > > > > are already mitigated by IBPB in cond_mitigation(), if enabled > > > > globally > > > > or if either the prev or the next task has opted in to > > > > protection. RSB > > > > filling without IBPB serves no purpose for protecting user > > > > space, as > > > > indirect branches are still vulnerable. > > > > > > Question for Intel/AMD folks: where is it documented that IBPB > > > clears > > > the RSB? I thought I'd seen this somewhere but I can't seem to > > > find it. > > > > For Intel, I found this: > > > > > > https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/post-barrier-return-stack-buffer-predictions.html > > > > "Software that executed before the IBPB command cannot control > > the > > predicted targets of indirect branches executed after the command > > on > > the same logical processor. 
The term indirect branch in this > > context > > includes near return instructions, so these predicted targets may > > come > > from the RSB. > > > > This article uses the term RSB-barrier to refer to either an IBPB > > command event, or (on processors which support enhanced IBRS) > > either a > > VM exit with IBRS set to 1 or setting IBRS to 1 after a VM exit." > > > > I haven't seen anything that explicit for AMD. > > Found it. As Andrew mentioned earlier, AMD IBPB only clears RSB if > the > IBPB_RET CPUID bit is set. From APM vol 3: > > CPUID Fn8000_0008_EBX Extended Feature Identifiers: > > 30 IBPB_RET The processor clears the return address > predictor when MSR PRED_CMD.IBPB is written > to 1. > > We check that already for the IBPB entry mitigation, but now we'll > also > need to do so for the context switch IBPB. > > Question for AMD, does SBPB behave the same way, i.e. does it clear > RSB > if IBPB_RET? I'm not sure about this. I'll ask around internally. Amit ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2 2/2] x86/bugs: Don't fill RSB on context switch with eIBRS 2024-12-06 23:02 ` Josh Poimboeuf 2024-12-30 14:54 ` Shah, Amit @ 2025-01-08 11:50 ` Shah, Amit 1 sibling, 0 replies; 31+ messages in thread From: Shah, Amit @ 2025-01-08 11:50 UTC (permalink / raw) To: jpoimboe@kernel.org, x86@kernel.org Cc: corbet@lwn.net, pawan.kumar.gupta@linux.intel.com, kai.huang@intel.com, kvm@vger.kernel.org, andrew.cooper3@citrix.com, dave.hansen@linux.intel.com, Lendacky, Thomas, daniel.sneddon@linux.intel.com, boris.ostrovsky@oracle.com, linux-kernel@vger.kernel.org, seanjc@google.com, mingo@redhat.com, pbonzini@redhat.com, tglx@linutronix.de, Moger, Babu, Das1, Sandipan, dwmw@amazon.co.uk, amit@kernel.org, hpa@zytor.com, peterz@infradead.org, bp@alien8.de, Kaplan, David On Fri, 2024-12-06 at 15:02 -0800, Josh Poimboeuf wrote: > On Thu, Dec 05, 2024 at 04:53:03PM -0800, Josh Poimboeuf wrote: > > On Thu, Dec 05, 2024 at 03:32:47PM -0800, Josh Poimboeuf wrote: > > > On Thu, Nov 21, 2024 at 12:07:19PM -0800, Josh Poimboeuf wrote: > > > > User->user Spectre v2 attacks (including RSB) across context > > > > switches > > > > are already mitigated by IBPB in cond_mitigation(), if enabled > > > > globally > > > > or if either the prev or the next task has opted in to > > > > protection. RSB > > > > filling without IBPB serves no purpose for protecting user > > > > space, as > > > > indirect branches are still vulnerable. > > > > > > Question for Intel/AMD folks: where is it documented that IBPB > > > clears > > > the RSB? I thought I'd seen this somewhere but I can't seem to > > > find it. 
> > > > For Intel, I found this: > > > > > > https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/post-barrier-return-stack-buffer-predictions.html > > > > "Software that executed before the IBPB command cannot control > > the > > predicted targets of indirect branches executed after the command > > on > > the same logical processor. The term indirect branch in this > > context > > includes near return instructions, so these predicted targets may > > come > > from the RSB. > > > > This article uses the term RSB-barrier to refer to either an IBPB > > command event, or (on processors which support enhanced IBRS) > > either a > > VM exit with IBRS set to 1 or setting IBRS to 1 after a VM exit." > > > > I haven't seen anything that explicit for AMD. > > Found it. As Andrew mentioned earlier, AMD IBPB only clears RSB if > the > IBPB_RET CPUID bit is set. From APM vol 3: > > CPUID Fn8000_0008_EBX Extended Feature Identifiers: > > 30 IBPB_RET The processor clears the return address > predictor when MSR PRED_CMD.IBPB is written > to 1. > > We check that already for the IBPB entry mitigation, but now we'll > also > need to do so for the context switch IBPB. > > Question for AMD, does SBPB behave the same way, i.e. does it clear > RSB > if IBPB_RET? That's correct - SBPB clears the RSB only when IBPB_RET is set. Amit ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2 2/2] x86/bugs: Don't fill RSB on context switch with eIBRS 2024-12-05 23:32 ` Josh Poimboeuf 2024-12-06 0:53 ` Josh Poimboeuf @ 2024-12-06 10:10 ` Shah, Amit 2024-12-09 20:46 ` jpoimboe 1 sibling, 1 reply; 31+ messages in thread From: Shah, Amit @ 2024-12-06 10:10 UTC (permalink / raw) To: jpoimboe@kernel.org, x86@kernel.org Cc: corbet@lwn.net, pawan.kumar.gupta@linux.intel.com, kai.huang@intel.com, kvm@vger.kernel.org, andrew.cooper3@citrix.com, dave.hansen@linux.intel.com, Lendacky, Thomas, daniel.sneddon@linux.intel.com, boris.ostrovsky@oracle.com, linux-kernel@vger.kernel.org, seanjc@google.com, mingo@redhat.com, pbonzini@redhat.com, tglx@linutronix.de, Moger, Babu, Das1, Sandipan, dwmw@amazon.co.uk, hpa@zytor.com, peterz@infradead.org, bp@alien8.de, Kaplan, David On Thu, 2024-12-05 at 15:32 -0800, Josh Poimboeuf wrote: > On Thu, Nov 21, 2024 at 12:07:19PM -0800, Josh Poimboeuf wrote: > > User->user Spectre v2 attacks (including RSB) across context > > switches > > are already mitigated by IBPB in cond_mitigation(), if enabled > > globally > > or if either the prev or the next task has opted in to protection. > > RSB > > filling without IBPB serves no purpose for protecting user space, > > as > > indirect branches are still vulnerable. > > Question for Intel/AMD folks: where is it documented that IBPB clears > the RSB? I thought I'd seen this somewhere but I can't seem to find > it. "AMD64 TECHNOLOGY INDIRECT BRANCH CONTROL EXTENSION" https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/white-papers/111006-architecture-guidelines-update-amd64-technology-indirect-branch-control-extension.pdf has: Indirect branch prediction barrier (IBPB) exists at MSR 0x49 (PRED_CMD) bit 0. This is a write only MSR that both GP faults when software reads it or if software tries to write any of the bits in 63:1. 
When bit zero is written, the processor guarantees that older indirect branches cannot influence predictions of indirect branches in the future. This applies to jmp indirects, call indirects and returns. As this restricts the processor from using all previous indirect branch information, it is intended to only be used by software when switching from one user context to another user context that requires protection, or from one guest to another guest. Amit ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2 2/2] x86/bugs: Don't fill RSB on context switch with eIBRS 2024-12-06 10:10 ` Shah, Amit @ 2024-12-09 20:46 ` jpoimboe 0 siblings, 0 replies; 31+ messages in thread From: jpoimboe @ 2024-12-09 20:46 UTC (permalink / raw) To: Shah, Amit Cc: x86@kernel.org, corbet@lwn.net, pawan.kumar.gupta@linux.intel.com, kai.huang@intel.com, kvm@vger.kernel.org, andrew.cooper3@citrix.com, dave.hansen@linux.intel.com, Lendacky, Thomas, daniel.sneddon@linux.intel.com, boris.ostrovsky@oracle.com, linux-kernel@vger.kernel.org, seanjc@google.com, mingo@redhat.com, pbonzini@redhat.com, tglx@linutronix.de, Moger, Babu, Das1, Sandipan, dwmw@amazon.co.uk, hpa@zytor.com, peterz@infradead.org, bp@alien8.de, Kaplan, David On Fri, Dec 06, 2024 at 10:10:31AM +0000, Shah, Amit wrote: > On Thu, 2024-12-05 at 15:32 -0800, Josh Poimboeuf wrote: > > On Thu, Nov 21, 2024 at 12:07:19PM -0800, Josh Poimboeuf wrote: > > > User->user Spectre v2 attacks (including RSB) across context > > > switches > > > are already mitigated by IBPB in cond_mitigation(), if enabled > > > globally > > > or if either the prev or the next task has opted in to protection. > > > RSB > > > filling without IBPB serves no purpose for protecting user space, > > > as > > > indirect branches are still vulnerable. > > > > Question for Intel/AMD folks: where is it documented that IBPB clears > > the RSB? I thought I'd seen this somewhere but I can't seem to find > > it. > > "AMD64 TECHNOLOGY INDIRECT BRANCH CONTROL EXTENSION" > https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/white-papers/111006-architecture-guidelines-update-amd64-technology-indirect-branch-control-extension.pdf > > has: > > Indirect branch prediction barrier (IBPB) exists at MSR 0x49 (PRED_CMD) > it 0. This is a write only MSR that both GP faults when software reads > it or if software tries to write any of the bits in 63:1. 
When bit zero > is written, the processor guarantees that older indirect branches > cannot influence predictions of indirect branches in the future. This > applies to jmp indirects, call indirects and returns. As this restricts > the processor from using all previous indirect branch information, it > is intended to only be used by software when switching from one user > context to another user context that requires protection, or from one > guest to another guest. Sounds like that needs to be updated to mention the IBPB_RET bit. -- Josh ^ permalink raw reply [flat|nested] 31+ messages in thread
* [RFC PATCH v3 0/2] Add support for the ERAPS feature 2024-11-21 20:07 [PATCH v2 0/2] x86/bugs: RSB tweaks Josh Poimboeuf 2024-11-21 20:07 ` [PATCH v2 1/2] x86/bugs: Don't fill RSB on VMEXIT with eIBRS+retpoline Josh Poimboeuf 2024-11-21 20:07 ` [PATCH v2 2/2] x86/bugs: Don't fill RSB on context switch with eIBRS Josh Poimboeuf @ 2024-11-28 13:28 ` Amit Shah 2024-11-28 13:28 ` [RFC PATCH v3 1/2] x86: cpu/bugs: add AMD ERAPS support; hardware flushes RSB Amit Shah 2024-11-28 13:28 ` [RFC PATCH v3 2/2] x86: kvm: svm: advertise ERAPS (larger RSB) support to guests Amit Shah 2 siblings, 2 replies; 31+ messages in thread From: Amit Shah @ 2024-11-28 13:28 UTC (permalink / raw) To: linux-kernel, kvm, x86, linux-doc Cc: amit.shah, thomas.lendacky, bp, tglx, peterz, jpoimboe, pawan.kumar.gupta, corbet, mingo, dave.hansen, hpa, seanjc, pbonzini, daniel.sneddon, kai.huang, sandipan.das, boris.ostrovsky, Babu.Moger, david.kaplan, dwmw, andrew.cooper3, Amit Shah Newer AMD CPUs (Zen5+) have the ERAPS feature bit that allows us to remove the RSB filling loops required during context switches and VM exits. This patchset implements the feature to: * remove the need for RSB filling on context switches and VMEXITs in host and guests * allow KVM guests to use the full default RSB stack The feature isn't yet part of an APM update that details its working, so this is still tagged as RFC. The notes at https://amitshah.net/2024/11/eraps-reduces-software-tax-for-hardware-bugs/ may help follow along till the APM is public. v3: * rebase on top of Josh's RSB tweaks series * with that rebase, only the non-AutoIBRS case needs special ERAPS support. 
AutoIBRS is currently disabled when SEV-SNP is active (commit acaa4b5c4c8) * remove comment about RSB_CLEAR_LOOPS and the size of the RSB -- it's not necessary anymore with the rework * remove comment from patch 2 in svm.c in favour of the commit message v2: * reword comments to highlight context switch as the main trigger for RSB flushes in hardware (Dave Hansen) * Split out outdated comment updates in (v1) patch1 to be a standalone patch1 in this series, to reinforce RSB filling is only required for RSB poisoning cases for AMD * Remove mentions of BTC/BTC_NO (Andrew Cooper) * Add braces in case stmt (kernel test robot) * s/boot_cpu_has/cpu_feature_enabled (Boris Petkov) Amit Shah (2): x86: cpu/bugs: add AMD ERAPS support; hardware flushes RSB x86: kvm: svm: advertise ERAPS (larger RSB) support to guests Documentation/admin-guide/hw-vuln/spectre.rst | 5 ++-- arch/x86/include/asm/cpufeatures.h | 1 + arch/x86/include/asm/svm.h | 6 +++- arch/x86/kernel/cpu/bugs.c | 6 +++- arch/x86/kvm/cpuid.c | 18 ++++++++++-- arch/x86/kvm/svm/svm.c | 29 +++++++++++++++++++ arch/x86/kvm/svm/svm.h | 15 ++++++++++ 7 files changed, 74 insertions(+), 6 deletions(-) -- 2.47.0 ^ permalink raw reply [flat|nested] 31+ messages in thread
* [RFC PATCH v3 1/2] x86: cpu/bugs: add AMD ERAPS support; hardware flushes RSB 2024-11-28 13:28 ` [RFC PATCH v3 0/2] Add support for the ERAPS feature Amit Shah @ 2024-11-28 13:28 ` Amit Shah 2024-12-02 17:26 ` Dave Hansen 2024-11-28 13:28 ` [RFC PATCH v3 2/2] x86: kvm: svm: advertise ERAPS (larger RSB) support to guests Amit Shah 1 sibling, 1 reply; 31+ messages in thread From: Amit Shah @ 2024-11-28 13:28 UTC (permalink / raw) To: linux-kernel, kvm, x86, linux-doc Cc: amit.shah, thomas.lendacky, bp, tglx, peterz, jpoimboe, pawan.kumar.gupta, corbet, mingo, dave.hansen, hpa, seanjc, pbonzini, daniel.sneddon, kai.huang, sandipan.das, boris.ostrovsky, Babu.Moger, david.kaplan, dwmw, andrew.cooper3 From: Amit Shah <amit.shah@amd.com> When Automatic IBRS is disabled, Linux flushes the RSB on every context switch. This RSB flush is not necessary in software with the ERAPS feature on Zen5+ CPUs that flushes the RSB in hardware on a context switch (triggered by mov-to-CR3). Additionally, the ERAPS feature also tags host and guest addresses in the RSB - eliminating the need for software flushing of the RSB on VMEXIT. Disable all RSB flushing by Linux when the CPU has ERAPS. Feature mentioned in AMD PPR 57238. Will be resubmitted once APM is public - which I'm told is imminent. 
Signed-off-by: Amit Shah <amit.shah@amd.com> --- Documentation/admin-guide/hw-vuln/spectre.rst | 5 +++-- arch/x86/include/asm/cpufeatures.h | 1 + arch/x86/kernel/cpu/bugs.c | 6 +++++- 3 files changed, 9 insertions(+), 3 deletions(-) diff --git a/Documentation/admin-guide/hw-vuln/spectre.rst b/Documentation/admin-guide/hw-vuln/spectre.rst index 132e0bc6007e..647c10c0307a 100644 --- a/Documentation/admin-guide/hw-vuln/spectre.rst +++ b/Documentation/admin-guide/hw-vuln/spectre.rst @@ -417,9 +417,10 @@ The possible values in this file are: - Return stack buffer (RSB) protection status: - ============= =========================================== + ============= ======================================================== 'RSB filling' Protection of RSB on context switch enabled - ============= =========================================== + 'ERAPS' Hardware RSB flush on context switches + guest/host tags + ============= ======================================================== - EIBRS Post-barrier Return Stack Buffer (PBRSB) protection status: diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index 17b6590748c0..79a1373050f7 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -461,6 +461,7 @@ #define X86_FEATURE_AUTOIBRS (20*32+ 8) /* Automatic IBRS */ #define X86_FEATURE_NO_SMM_CTL_MSR (20*32+ 9) /* SMM_CTL MSR is not present */ +#define X86_FEATURE_ERAPS (20*32+24) /* Enhanced RAP / RSB / RAS Security */ #define X86_FEATURE_SBPB (20*32+27) /* Selective Branch Prediction Barrier */ #define X86_FEATURE_IBPB_BRTYPE (20*32+28) /* MSR_PRED_CMD[IBPB] flushes all branch type predictions */ #define X86_FEATURE_SRSO_NO (20*32+29) /* CPU is not affected by SRSO */ diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c index d5102b72f74d..d7af5f811776 100644 --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -1634,6 +1634,9 @@ static void __init spectre_v2_mitigate_rsb(enum 
spectre_v2_mitigation mode) case SPECTRE_V2_RETPOLINE: case SPECTRE_V2_LFENCE: case SPECTRE_V2_IBRS: + if (boot_cpu_has(X86_FEATURE_ERAPS)) + break; + pr_info("Spectre v2 / SpectreRSB: Filling RSB on context switch and VMEXIT\n"); setup_force_cpu_cap(X86_FEATURE_RSB_CTXSW); setup_force_cpu_cap(X86_FEATURE_RSB_VMEXIT); @@ -2850,7 +2853,7 @@ static ssize_t spectre_v2_show_state(char *buf) spectre_v2_enabled == SPECTRE_V2_EIBRS_LFENCE) return sysfs_emit(buf, "Vulnerable: eIBRS+LFENCE with unprivileged eBPF and SMT\n"); - return sysfs_emit(buf, "%s%s%s%s%s%s%s%s\n", + return sysfs_emit(buf, "%s%s%s%s%s%s%s%s%s\n", spectre_v2_strings[spectre_v2_enabled], ibpb_state(), boot_cpu_has(X86_FEATURE_USE_IBRS_FW) ? "; IBRS_FW" : "", @@ -2858,6 +2861,7 @@ static ssize_t spectre_v2_show_state(char *buf) boot_cpu_has(X86_FEATURE_RSB_CTXSW) ? "; RSB filling" : "", pbrsb_eibrs_state(), spectre_bhi_state(), + boot_cpu_has(X86_FEATURE_ERAPS) ? "; ERAPS hardware RSB flush" : "", /* this should always be at the end */ spectre_v2_module_string()); } -- 2.47.0 ^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [RFC PATCH v3 1/2] x86: cpu/bugs: add AMD ERAPS support; hardware flushes RSB 2024-11-28 13:28 ` [RFC PATCH v3 1/2] x86: cpu/bugs: add AMD ERAPS support; hardware flushes RSB Amit Shah @ 2024-12-02 17:26 ` Dave Hansen 2024-12-02 18:09 ` Amit Shah 0 siblings, 1 reply; 31+ messages in thread From: Dave Hansen @ 2024-12-02 17:26 UTC (permalink / raw) To: Amit Shah, linux-kernel, kvm, x86, linux-doc Cc: amit.shah, thomas.lendacky, bp, tglx, peterz, jpoimboe, pawan.kumar.gupta, corbet, mingo, dave.hansen, hpa, seanjc, pbonzini, daniel.sneddon, kai.huang, sandipan.das, boris.ostrovsky, Babu.Moger, david.kaplan, dwmw, andrew.cooper3 On 11/28/24 05:28, Amit Shah wrote: > From: Amit Shah <amit.shah@amd.com> > > When Automatic IBRS is disabled, Linux flushed the RSB on every context > switch. This RSB flush is not necessary in software with the ERAPS > feature on Zen5+ CPUs that flushes the RSB in hardware on a context > switch (triggered by mov-to-CR3). > > Additionally, the ERAPS feature also tags host and guest addresses in > the RSB - eliminating the need for software flushing of the RSB on > VMEXIT. > > Disable all RSB flushing by Linux when the CPU has ERAPS. > > Feature mentioned in AMD PPR 57238. Will be resubmitted once APM is > public - which I'm told is imminent. There was a _lot_ of discussion about this. But all of that discussion seems to have been trimmed out and it seems like we're basically back to: "this is new hardware supposed to mitigate SpectreRSB, thus it mitigates SpectreRSB." Could we please summarize the previous discussions in the changelog? Otherwise, I fear it will be lost. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC PATCH v3 1/2] x86: cpu/bugs: add AMD ERAPS support; hardware flushes RSB 2024-12-02 17:26 ` Dave Hansen @ 2024-12-02 18:09 ` Amit Shah 2024-12-02 18:25 ` Dave Hansen 0 siblings, 1 reply; 31+ messages in thread From: Amit Shah @ 2024-12-02 18:09 UTC (permalink / raw) To: Dave Hansen, linux-kernel, kvm, x86, linux-doc Cc: thomas.lendacky, bp, tglx, peterz, jpoimboe, pawan.kumar.gupta, corbet, mingo, dave.hansen, hpa, seanjc, pbonzini, daniel.sneddon, kai.huang, sandipan.das, boris.ostrovsky, Babu.Moger, david.kaplan, dwmw, andrew.cooper3, Amit Shah On Mon, 2024-12-02 at 09:26 -0800, Dave Hansen wrote: > On 11/28/24 05:28, Amit Shah wrote: > > From: Amit Shah <amit.shah@amd.com> > > > > When Automatic IBRS is disabled, Linux flushed the RSB on every > > context > > switch. This RSB flush is not necessary in software with the ERAPS > > feature on Zen5+ CPUs that flushes the RSB in hardware on a context > > switch (triggered by mov-to-CR3). > > > > Additionally, the ERAPS feature also tags host and guest addresses > > in > > the RSB - eliminating the need for software flushing of the RSB on > > VMEXIT. > > > > Disable all RSB flushing by Linux when the CPU has ERAPS. > > > > Feature mentioned in AMD PPR 57238. Will be resubmitted once APM > > is > > public - which I'm told is imminent. > > There was a _lot_ of discussion about this. But all of that > discussion > seems to have been trimmed out and it seems like we're basically back > to: "this is new hardware supposed to mitigate SpectreRSB, thus it > mitigates SpectreRSB." Absolutely, I don't want that to get lost -- but I think that got captured in Josh's rework patchset. With that rework, I don't even need this patchset for the hardware feature to work, because we now rely on AutoIBRS to do the RSB clearing; and the hardware takes care of AutoIBRS and ERAPS interaction in Zen5. The only thing this patch now does is to handle the AutoIBRS-disabled case -- which happens when SEV-SNP is turned on (i.e. 
let the hw clear the RSB instead of stuffing it in Linux). I can still include the summary of the discussion in this patch - I just feel it isn't necessary with the rework. Amit ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC PATCH v3 1/2] x86: cpu/bugs: add AMD ERAPS support; hardware flushes RSB 2024-12-02 18:09 ` Amit Shah @ 2024-12-02 18:25 ` Dave Hansen 2024-12-02 18:36 ` Sean Christopherson 0 siblings, 1 reply; 31+ messages in thread From: Dave Hansen @ 2024-12-02 18:25 UTC (permalink / raw) To: Amit Shah, linux-kernel, kvm, x86, linux-doc Cc: thomas.lendacky, bp, tglx, peterz, jpoimboe, pawan.kumar.gupta, corbet, mingo, dave.hansen, hpa, seanjc, pbonzini, daniel.sneddon, kai.huang, sandipan.das, boris.ostrovsky, Babu.Moger, david.kaplan, dwmw, andrew.cooper3, Amit Shah On 12/2/24 10:09, Amit Shah wrote: > I can still include the summary of the discussion in this patch - I > just feel it isn't necessary with the rework. It's necessary. There's a new hardware feature. You want it to replace a software sequence in certain situations. You have to make an actual, coherent argument as to why the hardware feature is a suitable replacement. For instance (and I'm pretty sure we've gone over this more than once), the changelog here still makes the claim that a "context switch" and a "mov-to-CR3" are the same thing. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC PATCH v3 1/2] x86: cpu/bugs: add AMD ERAPS support; hardware flushes RSB 2024-12-02 18:25 ` Dave Hansen @ 2024-12-02 18:36 ` Sean Christopherson 0 siblings, 0 replies; 31+ messages in thread From: Sean Christopherson @ 2024-12-02 18:36 UTC (permalink / raw) To: Dave Hansen Cc: Amit Shah, linux-kernel, kvm, x86, linux-doc, thomas.lendacky, bp, tglx, peterz, jpoimboe, pawan.kumar.gupta, corbet, mingo, dave.hansen, hpa, pbonzini, daniel.sneddon, kai.huang, sandipan.das, boris.ostrovsky, Babu.Moger, david.kaplan, dwmw, andrew.cooper3, Amit Shah On Mon, Dec 02, 2024, Dave Hansen wrote: > On 12/2/24 10:09, Amit Shah wrote: > > I can still include the summary of the discussion in this patch - I > > just feel it isn't necessary with the rework. > > It's necessary. > > There's a new hardware feature. You want it to replace a software > sequence in certain situations. You have to make an actual, coherent > argument argument as to why the hardware feature is a suitable replacement. > > For instance (and I'm pretty sure we've gone over this more than once), > the changelog here still makes the claim that a "context switch" and a > "mov-to-CR3" are the same thing. +1000. I want a crisp, precise description of the actual hardware behavior, so that KVM can do the right thing when virtualizing ERAPS. ^ permalink raw reply [flat|nested] 31+ messages in thread
* [RFC PATCH v3 2/2] x86: kvm: svm: advertise ERAPS (larger RSB) support to guests 2024-11-28 13:28 ` [RFC PATCH v3 0/2] Add support for the ERAPS feature Amit Shah 2024-11-28 13:28 ` [RFC PATCH v3 1/2] x86: cpu/bugs: add AMD ERAPS support; hardware flushes RSB Amit Shah @ 2024-11-28 13:28 ` Amit Shah 2024-12-02 18:30 ` Sean Christopherson 1 sibling, 1 reply; 31+ messages in thread From: Amit Shah @ 2024-11-28 13:28 UTC (permalink / raw) To: linux-kernel, kvm, x86, linux-doc Cc: amit.shah, thomas.lendacky, bp, tglx, peterz, jpoimboe, pawan.kumar.gupta, corbet, mingo, dave.hansen, hpa, seanjc, pbonzini, daniel.sneddon, kai.huang, sandipan.das, boris.ostrovsky, Babu.Moger, david.kaplan, dwmw, andrew.cooper3 From: Amit Shah <amit.shah@amd.com> AMD CPUs with the ERAPS feature (Zen5+) have a larger RSB (aka RAP). While the new default RSB size is used on the host without any software modification necessary, the RSB size for guests is limited to the older value (32 entries) for backwards compatibility. With this patch, KVM enables guest mode to also use the default number of entries by setting the new ALLOW_LARGER_RAP bit in the VMCB. The two cases for backward compatibility that need special handling are nested guests, and guests using shadow paging (or when NPT is disabled): For nested guests: the ERAPS feature adds host/guest tagging to entries in the RSB, but does not distinguish between ASIDs. On a nested exit, the L0 hypervisor instructs the hardware (via another new VMCB bit, FLUSH_RAP_ON_VMRUN) to flush the RSB on the next VMRUN to prevent RSB poisoning attacks from an L2 guest to an L1 guest. With that in place, this feature can be exposed to guests. For shadow paging guests: do not expose this feature to guests; only expose if nested paging is enabled, to ensure a context switch within a guest triggers a context switch on the CPU -- thereby ensuring guest context switches flush guest RSB entries. 
For shadow paging, the CPU's CR3 is not used for guest processes, and hence cannot benefit from this feature. Signed-off-by: Amit Shah <amit.shah@amd.com> --- arch/x86/include/asm/svm.h | 6 +++++- arch/x86/kvm/cpuid.c | 18 ++++++++++++++++-- arch/x86/kvm/svm/svm.c | 29 +++++++++++++++++++++++++++++ arch/x86/kvm/svm/svm.h | 15 +++++++++++++++ 4 files changed, 65 insertions(+), 3 deletions(-) diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h index 2b59b9951c90..f8584a63c859 100644 --- a/arch/x86/include/asm/svm.h +++ b/arch/x86/include/asm/svm.h @@ -129,7 +129,8 @@ struct __attribute__ ((__packed__)) vmcb_control_area { u64 tsc_offset; u32 asid; u8 tlb_ctl; - u8 reserved_2[3]; + u8 erap_ctl; + u8 reserved_2[2]; u32 int_ctl; u32 int_vector; u32 int_state; @@ -175,6 +176,9 @@ struct __attribute__ ((__packed__)) vmcb_control_area { #define TLB_CONTROL_FLUSH_ASID 3 #define TLB_CONTROL_FLUSH_ASID_LOCAL 7 +#define ERAP_CONTROL_ALLOW_LARGER_RAP 0 +#define ERAP_CONTROL_FLUSH_RAP 1 + #define V_TPR_MASK 0x0f #define V_IRQ_SHIFT 8 diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 097bdc022d0f..dd589670a716 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -803,6 +803,8 @@ void kvm_set_cpu_caps(void) F(WRMSR_XX_BASE_NS) ); + if (tdp_enabled) + kvm_cpu_cap_check_and_set(X86_FEATURE_ERAPS); kvm_cpu_cap_check_and_set(X86_FEATURE_SBPB); kvm_cpu_cap_check_and_set(X86_FEATURE_IBPB_BRTYPE); kvm_cpu_cap_check_and_set(X86_FEATURE_SRSO_NO); @@ -1362,10 +1364,22 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function) case 0x80000020: entry->eax = entry->ebx = entry->ecx = entry->edx = 0; break; - case 0x80000021: - entry->ebx = entry->ecx = entry->edx = 0; + case 0x80000021: { + unsigned int ebx_mask = 0; + + entry->ecx = entry->edx = 0; cpuid_entry_override(entry, CPUID_8000_0021_EAX); + + /* + * Bits 23:16 in EBX indicate the size of the RSB. + * Expose the value in the hardware to the guest. 
+ */ + if (kvm_cpu_cap_has(X86_FEATURE_ERAPS)) + ebx_mask |= GENMASK(23, 16); + + entry->ebx &= ebx_mask; break; + } /* AMD Extended Performance Monitoring and Debug */ case 0x80000022: { union cpuid_0x80000022_ebx ebx; diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index dd15cc635655..9b055de079cb 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -1360,6 +1360,13 @@ static void init_vmcb(struct kvm_vcpu *vcpu) if (boot_cpu_has(X86_FEATURE_V_SPEC_CTRL)) set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SPEC_CTRL, 1, 1); + /* + * If the hardware has a larger RSB, use it in the guest context as + * well. + */ + if (cpu_feature_enabled(X86_FEATURE_ERAPS) && npt_enabled) + vmcb_set_larger_rap(svm->vmcb); + if (kvm_vcpu_apicv_active(vcpu)) avic_init_vmcb(svm, vmcb); @@ -3395,6 +3402,7 @@ static void dump_vmcb(struct kvm_vcpu *vcpu) pr_err("%-20s%016llx\n", "tsc_offset:", control->tsc_offset); pr_err("%-20s%d\n", "asid:", control->asid); pr_err("%-20s%d\n", "tlb_ctl:", control->tlb_ctl); + pr_err("%-20s%d\n", "erap_ctl:", control->erap_ctl); pr_err("%-20s%08x\n", "int_ctl:", control->int_ctl); pr_err("%-20s%08x\n", "int_vector:", control->int_vector); pr_err("%-20s%08x\n", "int_state:", control->int_state); @@ -3561,6 +3569,27 @@ static int svm_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath) trace_kvm_nested_vmexit(vcpu, KVM_ISA_SVM); + if (boot_cpu_has(X86_FEATURE_ERAPS) + && vmcb_is_larger_rap(svm->vmcb01.ptr)) { + /* + * XXX a few further optimizations can be made: + * + * 1. In pre_svm_run() we can reset this bit when a hw + * TLB flush has happened - any context switch on a + * CPU (which causes a TLB flush) auto-flushes the RSB + * - eg when this vCPU is scheduled on a different + * pCPU. + * + * 2. This is also not needed in the case where the + * vCPU is being scheduled on the same pCPU, but there + * was a context switch between the #VMEXIT and VMRUN. + * + * 3. 
If the guest returns to L2 again after this + * #VMEXIT, there's no need to flush the RSB. + */ + vmcb_set_flush_rap(svm->vmcb01.ptr); + } + vmexit = nested_svm_exit_special(svm); if (vmexit == NESTED_EXIT_CONTINUE) diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h index 43fa6a16eb19..8a7877f46dc5 100644 --- a/arch/x86/kvm/svm/svm.h +++ b/arch/x86/kvm/svm/svm.h @@ -500,6 +500,21 @@ static inline bool svm_is_intercept(struct vcpu_svm *svm, int bit) return vmcb_is_intercept(&svm->vmcb->control, bit); } +static inline void vmcb_set_flush_rap(struct vmcb *vmcb) +{ + __set_bit(ERAP_CONTROL_FLUSH_RAP, (unsigned long *)&vmcb->control.erap_ctl); +} + +static inline void vmcb_set_larger_rap(struct vmcb *vmcb) +{ + __set_bit(ERAP_CONTROL_ALLOW_LARGER_RAP, (unsigned long *)&vmcb->control.erap_ctl); +} + +static inline bool vmcb_is_larger_rap(struct vmcb *vmcb) +{ + return test_bit(ERAP_CONTROL_ALLOW_LARGER_RAP, (unsigned long *)&vmcb->control.erap_ctl); +} + static inline bool nested_vgif_enabled(struct vcpu_svm *svm) { return guest_can_use(&svm->vcpu, X86_FEATURE_VGIF) && -- 2.47.0 ^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [RFC PATCH v3 2/2] x86: kvm: svm: advertise ERAPS (larger RSB) support to guests 2024-11-28 13:28 ` [RFC PATCH v3 2/2] x86: kvm: svm: advertise ERAPS (larger RSB) support to guests Amit Shah @ 2024-12-02 18:30 ` Sean Christopherson 2025-03-27 11:10 ` Shah, Amit 0 siblings, 1 reply; 31+ messages in thread From: Sean Christopherson @ 2024-12-02 18:30 UTC (permalink / raw) To: Amit Shah Cc: linux-kernel, kvm, x86, linux-doc, amit.shah, thomas.lendacky, bp, tglx, peterz, jpoimboe, pawan.kumar.gupta, corbet, mingo, dave.hansen, hpa, pbonzini, daniel.sneddon, kai.huang, sandipan.das, boris.ostrovsky, Babu.Moger, david.kaplan, dwmw, andrew.cooper3 KVM: SVM: Please read through the relevant maintainer handbooks, there are warts all over. Documentation/process/maintainer-kvm-x86.rst Documentation/process/maintainer-tip.rst And the shortlog is wrong. The patch itself is also broken. KVM should (a) add support for virtualizing ERAPS and (b) advertise support to *userspace*. The userspace VMM ultimately decides what to expose/enable for the guest. On Thu, Nov 28, 2024, Amit Shah wrote: > From: Amit Shah <amit.shah@amd.com> > > AMD CPUs with the ERAPS feature (Zen5+) have a larger RSB (aka RAP). Please spell out ERAPS at least once (I assume it's "Enhanced RAPs"?) and very briefly document what it does. > While the new default RSB size is used on the host without any software > modification necessary, the RSB size for guests is limited to the older Please describe hardware behavior, and make it abundantly clear that the changelog is talking about hardware behavior. One of my pet peeves (understatement) with the APM is that it does a shit job of explaining the actual architectural behavior. > value (32 entries) for backwards compatibility. Backwards compatibility with what? And how far back? E.g. have CPUs with a RAP always had 32 entries? > With this patch, KVM No "this patch" > enables guest mode Use imperative mood. 
> to also use the default number of entries by setting "default" is clearly wrong, since the *default* behavior is to use > the new ALLOW_LARGER_RAP bit in the VMCB. I detest the "ALLOW_LARGER" name. "Allow" implies the guest somehow has a choice. And "Larger" implies there's an even larger size And again, please explicitly describe what this bit does. > The two cases for backward compatibility that need special handling are > nested guests, and guests using shadow paging Guests don't use shadow paging, *KVM* uses > (or when NPT is disabled): "i.e", not "or". "Or" makes it sound like "NPT is disabled" is separate case from shadow paging. > For nested guests: the ERAPS feature adds host/guest tagging to entries > in the RSB, but does not distinguish between ASIDs. On a nested exit, > the L0 hypervisor instructs the hardware (via another new VMCB bit, I strongly suspect this was copied from the APM. Please don't do that. State what change is being for *KVM*, not for "the L0 hypervisor". This verbiage mixes hardware behavior with software behavior, which again is why I hate much of the APM's wording. > FLUSH_RAP_ON_VMRUN) to flush the RSB on the next VMRUN to prevent RSB > poisoning attacks from an L2 guest to an L1 guest. With that in place, > this feature can be exposed to guests. ERAPS can also be advertised if nested virtualization is disabled, no? I think it makes sense to first add support for ERAPS if "!nested", and then in a separate path, add support for ERAPS when nested virtualization is enabled. Partly so that it's easier for readers to understand why nested VMs are special, but mainly because the nested virtualization support is sorely lacking. > For shadow paging guests: do not expose this feature to guests; only > expose if nested paging is enabled, to ensure a context switch within > a guest triggers a context switch on the CPU -- thereby ensuring guest > context switches flush guest RSB entries. Huh? 
> For shadow paging, the CPU's CR3 is not used for guest processes, and hence > cannot benefit from this feature. What does that have to do with anything? > Signed-off-by: Amit Shah <amit.shah@amd.com> > --- > arch/x86/include/asm/svm.h | 6 +++++- > arch/x86/kvm/cpuid.c | 18 ++++++++++++++++-- > arch/x86/kvm/svm/svm.c | 29 +++++++++++++++++++++++++++++ > arch/x86/kvm/svm/svm.h | 15 +++++++++++++++ > 4 files changed, 65 insertions(+), 3 deletions(-) > > diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h > index 2b59b9951c90..f8584a63c859 100644 > --- a/arch/x86/include/asm/svm.h > +++ b/arch/x86/include/asm/svm.h > @@ -129,7 +129,8 @@ struct __attribute__ ((__packed__)) vmcb_control_area { > u64 tsc_offset; > u32 asid; > u8 tlb_ctl; > - u8 reserved_2[3]; > + u8 erap_ctl; > + u8 reserved_2[2]; > u32 int_ctl; > u32 int_vector; > u32 int_state; > @@ -175,6 +176,9 @@ struct __attribute__ ((__packed__)) vmcb_control_area { > #define TLB_CONTROL_FLUSH_ASID 3 > #define TLB_CONTROL_FLUSH_ASID_LOCAL 7 > > +#define ERAP_CONTROL_ALLOW_LARGER_RAP 0 > +#define ERAP_CONTROL_FLUSH_RAP 1 Assuming the control enables using the full RAP size, these should be something like: #define ERAP_CONTROL_ENABLE_FULL_RAP_MASK BIT(0) #define ERAP_CONTROL_FLUSH_RAP_ON_VMRUN BIT(1) > #define V_TPR_MASK 0x0f > > #define V_IRQ_SHIFT 8 > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c > index 097bdc022d0f..dd589670a716 100644 > --- a/arch/x86/kvm/cpuid.c > +++ b/arch/x86/kvm/cpuid.c > @@ -803,6 +803,8 @@ void kvm_set_cpu_caps(void) > F(WRMSR_XX_BASE_NS) > ); > > + if (tdp_enabled) > + kvm_cpu_cap_check_and_set(X86_FEATURE_ERAPS); > kvm_cpu_cap_check_and_set(X86_FEATURE_SBPB); > kvm_cpu_cap_check_and_set(X86_FEATURE_IBPB_BRTYPE); > kvm_cpu_cap_check_and_set(X86_FEATURE_SRSO_NO); > @@ -1362,10 +1364,22 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function) > case 0x80000020: > entry->eax = entry->ebx = entry->ecx = entry->edx = 0; > break; > - case 
0x80000021: > - entry->ebx = entry->ecx = entry->edx = 0; > + case 0x80000021: { > + unsigned int ebx_mask = 0; > + > + entry->ecx = entry->edx = 0; > cpuid_entry_override(entry, CPUID_8000_0021_EAX); > + > + /* > + * Bits 23:16 in EBX indicate the size of the RSB. Is this enumeration explicitly tied to ERAPS? > + * Expose the value in the hardware to the guest. __do_cpuid_func() is used to advertise KVM's supported CPUID to host userspace, not to the guest. Side topic, what happens when Zen6 adds EVEN_LARGER_RAP? Enumerating the size of the RAP suggets it's likely to change in the future. > + */ > + if (kvm_cpu_cap_has(X86_FEATURE_ERAPS)) > + ebx_mask |= GENMASK(23, 16); > + > + entry->ebx &= ebx_mask; This is a waste of code and makes it unnecessarily difficult to read. Just do: if (kvm_cpu_cap_has(X86_FEATURE_ERAPS)) entry->ebx &= GENMASK(23, 16); else entry->ebx = 0; > break; > + } > /* AMD Extended Performance Monitoring and Debug */ > case 0x80000022: { > union cpuid_0x80000022_ebx ebx; > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c > index dd15cc635655..9b055de079cb 100644 > --- a/arch/x86/kvm/svm/svm.c > +++ b/arch/x86/kvm/svm/svm.c > @@ -1360,6 +1360,13 @@ static void init_vmcb(struct kvm_vcpu *vcpu) > if (boot_cpu_has(X86_FEATURE_V_SPEC_CTRL)) > set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SPEC_CTRL, 1, 1); > > + /* > + * If the hardware has a larger RSB, use it in the guest context as > + * well. > + */ > + if (cpu_feature_enabled(X86_FEATURE_ERAPS) && npt_enabled) This is wrong. Userspace owns the vCPU model, not KVM. If userspace wants to disable ERAPS for the guest, say because of a hardware vulnerability, then KVM needs to honor that. And this should be kvm_cpu_cap_has(), not copy+paste of the code that enables the KVM capability. > + vmcb_set_larger_rap(svm->vmcb); s/set/enable. "set" implies a value in this context. 
> + > if (kvm_vcpu_apicv_active(vcpu)) > avic_init_vmcb(svm, vmcb); > > @@ -3395,6 +3402,7 @@ static void dump_vmcb(struct kvm_vcpu *vcpu) > pr_err("%-20s%016llx\n", "tsc_offset:", control->tsc_offset); > pr_err("%-20s%d\n", "asid:", control->asid); > pr_err("%-20s%d\n", "tlb_ctl:", control->tlb_ctl); > + pr_err("%-20s%d\n", "erap_ctl:", control->erap_ctl); > pr_err("%-20s%08x\n", "int_ctl:", control->int_ctl); > pr_err("%-20s%08x\n", "int_vector:", control->int_vector); > pr_err("%-20s%08x\n", "int_state:", control->int_state); > @@ -3561,6 +3569,27 @@ static int svm_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath) > > trace_kvm_nested_vmexit(vcpu, KVM_ISA_SVM); > > + if (boot_cpu_has(X86_FEATURE_ERAPS) > + && && goes on the previous line. > vmcb_is_larger_rap(svm->vmcb01.ptr)) { This should be something like "vmcb_is_full_rap_size_enabled()". "is_larger_rap()" begs the question: is larger than what? > + /* > + * XXX a few further optimizations can be made: > + * > + * 1. In pre_svm_run() we can reset this bit when a hw > + * TLB flush has happened - any context switch on a > + * CPU (which causes a TLB flush) auto-flushes the RSB > + * - eg when this vCPU is scheduled on a different > + * pCPU. > + * > + * 2. This is also not needed in the case where the > + * vCPU is being scheduled on the same pCPU, but there > + * was a context switch between the #VMEXIT and VMRUN. Either do the optimizations straightaway, or call them out as possible optimizations in the changelog and then explain why it's not worth doing them. The above also mixes hardware behavior and software behavior, to the point where I honestly have no idea who is doing what. "A context switch" tells me nothing useful. > + * > + * 3. If the guest returns to L2 again after this > + * #VMEXIT, there's no need to flush the RSB. This one in particular is trivially easy to implement correctly. 
This also highlights the fact that KVM completely fails to emulate FLUSH_RAP_ON_VMRUN if it's set in vmcb12, though that's somewhat of a moot point because unless I'm missing something, KVM is responsible for emulating host vs. guest hardware tagging. From L1's perspective, the (virtual) CPU, a.k.a. KVM, is responsible for isolating guest (L2) RAP entries from host (L1) RAP entries. And so KVM must flush the RAP on every nested VM-Exit *and* nested VM-Enter, not just on nested VM-Exit. > + */ > + vmcb_set_flush_rap(svm->vmcb01.ptr); Eh, follow the TLB flush helpers and just go with vmcb_flush_rap(). > + } > + > vmexit = nested_svm_exit_special(svm); > > if (vmexit == NESTED_EXIT_CONTINUE) > diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h > index 43fa6a16eb19..8a7877f46dc5 100644 > --- a/arch/x86/kvm/svm/svm.h > +++ b/arch/x86/kvm/svm/svm.h > @@ -500,6 +500,21 @@ static inline bool svm_is_intercept(struct vcpu_svm *svm, int bit) > return vmcb_is_intercept(&svm->vmcb->control, bit); > } > > +static inline void vmcb_set_flush_rap(struct vmcb *vmcb) > +{ > + __set_bit(ERAP_CONTROL_FLUSH_RAP, (unsigned long *)&vmcb->control.erap_ctl); Eww. Don't use the bitops helpers, casting a u8 to an unsigned long, and then having to use the non-atomic helpers makes this way, way more complicated then it actually is. vmcb->control.erap_ctl |= ERAP_CONTROL_FLUSH_RAP_ON_VMRUN; > +} > + > +static inline void vmcb_set_larger_rap(struct vmcb *vmcb) > +{ > + __set_bit(ERAP_CONTROL_ALLOW_LARGER_RAP, (unsigned long *)&vmcb->control.erap_ctl); > +} > + > +static inline bool vmcb_is_larger_rap(struct vmcb *vmcb) > +{ > + return test_bit(ERAP_CONTROL_ALLOW_LARGER_RAP, (unsigned long *)&vmcb->control.erap_ctl); > +} > + > static inline bool nested_vgif_enabled(struct vcpu_svm *svm) > { > return guest_can_use(&svm->vcpu, X86_FEATURE_VGIF) && > -- > 2.47.0 > ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC PATCH v3 2/2] x86: kvm: svm: advertise ERAPS (larger RSB) support to guests 2024-12-02 18:30 ` Sean Christopherson @ 2025-03-27 11:10 ` Shah, Amit 0 siblings, 0 replies; 31+ messages in thread From: Shah, Amit @ 2025-03-27 11:10 UTC (permalink / raw) To: seanjc@google.com Cc: corbet@lwn.net, pawan.kumar.gupta@linux.intel.com, kai.huang@intel.com, jpoimboe@kernel.org, andrew.cooper3@citrix.com, dave.hansen@linux.intel.com, daniel.sneddon@linux.intel.com, Lendacky, Thomas, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, mingo@redhat.com, dwmw@amazon.co.uk, pbonzini@redhat.com, tglx@linutronix.de, Moger, Babu, Das1, Sandipan, linux-doc@vger.kernel.org, hpa@zytor.com, peterz@infradead.org, bp@alien8.de, boris.ostrovsky@oracle.com, Kaplan, David, x86@kernel.org Thanks for the review, Sean. Apologies for not replying sooner. I hope you still have some context in mind. I have a question for you below, but other than that, I'll send out a new rev next week. On Mon, 2024-12-02 at 10:30 -0800, Sean Christopherson wrote: > KVM: SVM: > > Please through the relevant maintainer handbooks, there are warts all > over. > > Documentation/process/maintainer-kvm-x86.rst > Documentation/process/maintainer-tip.rst > > And the shortlog is wrong. The patch itself is also broken. KVM > should (a) add > support for virtualizing ERAPS and (b) advertise support to > *userspace*. The > userspace VMM ultimately decides what to expose/enable for the guest. Point taken - I interpreted it the way I wanted it to - but I can see what you mean - I'll reword. [...] > > While the new default RSB size is used on the host without any > > software > > modification necessary, the RSB size for guests is limited to the > > older > > Please describe hardware behavior, and make it abundantly clear that > the changelog > is talking about hardware behavior. One of my pet peeves > (understatement) with > the APM is that it does a shit job of explaining the actual > architectural behavior. 
OK, more rewording. Hardware behaviour is to just use 64 entries on the host; and limit to 32 entries for any guest. If the right VMCB bits are set, like in the patch, the hardware uses all the 64 entries for the guest. > > value (32 entries) for backwards compatibility. > > Backwards compatibility with what? And how far back? E.g. have CPUs > with a RAP > always had 32 entries? Yes, all CPUs upto Zen5 had 32 entries -- that number's even just hard- coded in the definition of RSB_CLEAR_LOOPS, which is used for the RSB stuffing. The wording is borrowed a bit from the APM -- especially the "default" value. With Zen5, the default hardware RSB size is 64. So to expose the hardware-default size to the guest, this VMCB update is necessary; else the guests use the non-default, trimmed-down 32 entry RSB. > > > to also use the default number of entries by setting > > "default" is clearly wrong, since the *default* behavior is to use Yea; I get this is confusing. The hardware default is 64, but guest default is 32. My intention with that stmt was to say use the native default vs the guest default. Will reword. > > the new ALLOW_LARGER_RAP bit in the VMCB. > > I detest the "ALLOW_LARGER" name. "Allow" implies the guest somehow > has a choice. > And "Larger" implies there's an even larger size I agree, but this is the name used in the APM. What's the preference - deviate from the APM naming, or stick to it, despite the confusion it causes? > > The two cases for backward compatibility that need special handling > > are > > nested guests, and guests using shadow paging > > Guests don't use shadow paging, *KVM* uses > > > (or when NPT is disabled): > > "i.e", not "or". "Or" makes it sound like "NPT is disabled" is > separate case > from shadow paging. > > > For nested guests: the ERAPS feature adds host/guest tagging to > > entries > > in the RSB, but does not distinguish between ASIDs. 
On a nested > > exit, > > the L0 hypervisor instructs the hardware (via another new VMCB bit, > > I strongly suspect this was copied from the APM. Please don't do > that. State > what change is being for *KVM*, not for "the L0 hypervisor". This > verbiage mixes > hardware behavior with software behavior, which again is why I hate > much of the > APM's wording. This isn't necessarily from the APM; this is me describing what the hw does, to set up why KVM needs to do a few other things. I want to say that the hw is in control of host/guest tagging for the RSB entries; but it does not tag ASIDs. So KVM running as L0 hypervisor needs to take certain steps. KVM running as L1 hypervisor does not need to. > > FLUSH_RAP_ON_VMRUN) to flush the RSB on the next VMRUN to prevent > > RSB > > poisoning attacks from an L2 guest to an L1 guest. With that in > > place, > > this feature can be exposed to guests. > > ERAPS can also be advertised if nested virtualization is disabled, > no? I think > it makes sense to first add support for ERAPS if "!nested", and then > in a separate > path, add support for ERAPS when nested virtualization is enabled. > Partly so that > it's easier for readers to understand why nested VMs are special, but > mainly because > the nested virtualization support is sorely lacking. Yea, it makes sense to split these up, I'll try doing that. Nested virt disabled is the common case, so it makes sense to make that more obvious as well. > > For shadow paging guests: do not expose this feature to guests; > > only > > expose if nested paging is enabled, to ensure a context switch > > within > > a guest triggers a context switch on the CPU -- thereby ensuring > > guest > > context switches flush guest RSB entries. > > Huh? OK this is where I'm slightly confused as well - so tell me if I'm wrong: When EPT/NPT is disabled, and shadow MMU is used by kvm, the CR3 register on the CPU holds the PGD of the qemu process. 
So if a task switch happens within the guest, the CR3 on the CPU is not updated, but KVM's shadow MMU routines change the page tables pointed to by that CR3. Contrasting to the NPT case, the CPU's CR3 holds the guest PGD directly, and task switches within the guest cause an update to the CPU's CR3. Am I misremembering and misreading the code? > > For shadow paging, the CPU's CR3 is not used for guest processes, > > and hence > > cannot benefit from this feature. > > What does that have to do with anything? If what I wrote above is true, since CR3 does not change when guest tasks switch, the RSB will not be flushed by hardware on guest task switches, and so exposing ERAPS to this guest is wrong -- it'll relax its own FILL_RETURN_BUFFER routines (via patch 1) and that's not what we want when the hw is doing that. [...] > > @@ -1362,10 +1364,22 @@ static inline int __do_cpuid_func(struct > > kvm_cpuid_array *array, u32 function) > > case 0x80000020: > > entry->eax = entry->ebx = entry->ecx = entry->edx = 0; > > break; > > - case 0x80000021: > > - entry->ebx = entry->ecx = entry->edx = 0; > > + case 0x80000021: { > > + unsigned int ebx_mask = 0; > > + > > + entry->ecx = entry->edx = 0; > > cpuid_entry_override(entry, CPUID_8000_0021_EAX); > > + > > + /* > > + * Bits 23:16 in EBX indicate the size of the RSB. > > Is this enumeration explicitly tied to ERAPS? Not tied to ERAPS, but it happens to be defined at the same time as ERAPS is being defined, and there's no other CPUID bit that says 'the RSB size is now available in these bits'. So yea, the ERAPS CPUID bit is being overloaded here. Existence of the ERAPS CPUID bit confirms that the default RSB size is now different, and that the hardware flushes the RSB (as in patch 1). > > + * Expose the value in the hardware to the guest. > > __do_cpuid_func() is used to advertise KVM's supported CPUID to host > userspace, > not to the guest. > > Side topic, what happens when Zen6 adds EVEN_LARGER_RAP? 
Enumerating > the size of > the RAP suggets it's likely to change in the future. It's just informational-only for now, and there aren't any plans to extend the RAP anymore. But you're right - it could change in the future. > > + */ > > + if (kvm_cpu_cap_has(X86_FEATURE_ERAPS)) > > + ebx_mask |= GENMASK(23, 16); > > + > > + entry->ebx &= ebx_mask; > > This is a waste of code and makes it unnecessarily difficult to > read. Just do: > > if (kvm_cpu_cap_has(X86_FEATURE_ERAPS)) > entry->ebx &= GENMASK(23, 16); > else > entry->ebx = 0; Well I left it all this way to make it easier the the future to add more ebx bits here, but sure - I'll just shorten it now. [...] Amit ^ permalink raw reply [flat|nested] 31+ messages in thread