* [PATCH v2 0/2] x86/bugs: RSB tweaks
@ 2024-11-21 20:07 Josh Poimboeuf
2024-11-21 20:07 ` [PATCH v2 1/2] x86/bugs: Don't fill RSB on VMEXIT with eIBRS+retpoline Josh Poimboeuf
` (2 more replies)
0 siblings, 3 replies; 31+ messages in thread
From: Josh Poimboeuf @ 2024-11-21 20:07 UTC (permalink / raw)
To: x86
Cc: linux-kernel, amit, kvm, amit.shah, thomas.lendacky, bp, tglx,
peterz, pawan.kumar.gupta, corbet, mingo, dave.hansen, hpa,
seanjc, pbonzini, daniel.sneddon, kai.huang, sandipan.das,
boris.ostrovsky, Babu.Moger, david.kaplan, dwmw, andrew.cooper3
v2:
- remove stray whitespace in printk
- clarify patch 2 comment message about prev/next tasks
- move error path to default switch case in spectre_v2_mitigate_rsb()
- add Reviewed-bys
Some RSB filling tweaks as discussed in the following thread:
[RFC PATCH v2 0/3] Add support for the ERAPS feature
https://lore.kernel.org/20241111163913.36139-1-amit@kernel.org
Josh Poimboeuf (2):
x86/bugs: Don't fill RSB on VMEXIT with eIBRS+retpoline
x86/bugs: Don't fill RSB on context switch with eIBRS
arch/x86/kernel/cpu/bugs.c | 115 +++++++++++++++----------------------
arch/x86/mm/tlb.c | 2 +-
2 files changed, 48 insertions(+), 69 deletions(-)
--
2.47.0
^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v2 1/2] x86/bugs: Don't fill RSB on VMEXIT with eIBRS+retpoline
  2024-11-21 20:07 [PATCH v2 0/2] x86/bugs: RSB tweaks Josh Poimboeuf
@ 2024-11-21 20:07 ` Josh Poimboeuf
  2024-11-30 15:31   ` Borislav Petkov
  2024-11-21 20:07 ` [PATCH v2 2/2] x86/bugs: Don't fill RSB on context switch with eIBRS Josh Poimboeuf
  2024-11-28 13:28 ` [RFC PATCH v3 0/2] Add support for the ERAPS feature Amit Shah
  2 siblings, 1 reply; 31+ messages in thread

From: Josh Poimboeuf @ 2024-11-21 20:07 UTC (permalink / raw)
To: x86
Cc: linux-kernel, amit, kvm, amit.shah, thomas.lendacky, bp, tglx,
    peterz, pawan.kumar.gupta, corbet, mingo, dave.hansen, hpa,
    seanjc, pbonzini, daniel.sneddon, kai.huang, sandipan.das,
    boris.ostrovsky, Babu.Moger, david.kaplan, dwmw, andrew.cooper3

eIBRS protects against RSB underflow/poisoning attacks.  Adding
retpoline to the mix doesn't change that.  Retpoline has a balanced
CALL/RET anyway.

So the current full RSB filling on VMEXIT with eIBRS+retpoline is
overkill.  Disable it (or do the VMEXIT_LITE mitigation if needed).

Suggested-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Reviewed-by: Amit Shah <amit.shah@amd.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
---
 arch/x86/kernel/cpu/bugs.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 47a01d4028f6..68bed17f0980 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1605,20 +1605,20 @@ static void __init spectre_v2_determine_rsb_fill_type_at_vmexit(enum spectre_v2_
 	case SPECTRE_V2_NONE:
 		return;
 
-	case SPECTRE_V2_EIBRS_LFENCE:
 	case SPECTRE_V2_EIBRS:
+	case SPECTRE_V2_EIBRS_LFENCE:
+	case SPECTRE_V2_EIBRS_RETPOLINE:
 		if (boot_cpu_has_bug(X86_BUG_EIBRS_PBRSB)) {
-			setup_force_cpu_cap(X86_FEATURE_RSB_VMEXIT_LITE);
 			pr_info("Spectre v2 / PBRSB-eIBRS: Retire a single CALL on VMEXIT\n");
+			setup_force_cpu_cap(X86_FEATURE_RSB_VMEXIT_LITE);
 		}
 		return;
 
-	case SPECTRE_V2_EIBRS_RETPOLINE:
 	case SPECTRE_V2_RETPOLINE:
 	case SPECTRE_V2_LFENCE:
 	case SPECTRE_V2_IBRS:
-		setup_force_cpu_cap(X86_FEATURE_RSB_VMEXIT);
 		pr_info("Spectre v2 / SpectreRSB : Filling RSB on VMEXIT\n");
+		setup_force_cpu_cap(X86_FEATURE_RSB_VMEXIT);
 		return;
 	}
-- 
2.47.0

^ permalink raw reply related	[flat|nested] 31+ messages in thread
* Re: [PATCH v2 1/2] x86/bugs: Don't fill RSB on VMEXIT with eIBRS+retpoline
  2024-11-21 20:07 ` [PATCH v2 1/2] x86/bugs: Don't fill RSB on VMEXIT with eIBRS+retpoline Josh Poimboeuf
@ 2024-11-30 15:31   ` Borislav Petkov
  2024-12-02 11:15     ` Shah, Amit
  ` (2 more replies)
  0 siblings, 3 replies; 31+ messages in thread

From: Borislav Petkov @ 2024-11-30 15:31 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: x86, linux-kernel, amit, kvm, amit.shah, thomas.lendacky, tglx,
    peterz, pawan.kumar.gupta, corbet, mingo, dave.hansen, hpa,
    seanjc, pbonzini, daniel.sneddon, kai.huang, sandipan.das,
    boris.ostrovsky, Babu.Moger, david.kaplan, dwmw, andrew.cooper3

On Thu, Nov 21, 2024 at 12:07:18PM -0800, Josh Poimboeuf wrote:
> eIBRS protects against RSB underflow/poisoning attacks.  Adding
> retpoline to the mix doesn't change that.  Retpoline has a balanced
> CALL/RET anyway.

This is exactly why I've been wanting for us to document our mitigations for
a long time now.

A bunch of statements above for which I can only rhyme up they're correct if
I search for the vendor docs. On the AMD side I've found:

"When Automatic IBRS is enabled, the internal return address stack used for
return address predictions is cleared on VMEXIT."

APM v2, p. 58/119

For the Intel side I'm not that lucky. There's something here:

https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/branch-history-injection.html

Or is it this one:

https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/speculative-execution-side-channel-mitigations.html#inpage-nav-1-3-undefined

Or is this written down explicitly in some other doc?

In any case, I'd like for us to have a piece of text accompanying such
patches, perhaps here:

  Documentation/admin-guide/hw-vuln/spectre.rst

which quotes the vendor docs.

The current thread(s) on the matter already show how confused we all are
by all the possible mitigation options, uarch speculative dances etc etc.

> So the current full RSB filling on VMEXIT with eIBRS+retpoline is
> overkill.  Disable it (or do the VMEXIT_LITE mitigation if needed).
>
> Suggested-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> Reviewed-by: Amit Shah <amit.shah@amd.com>
> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
> ---
>  arch/x86/kernel/cpu/bugs.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
> index 47a01d4028f6..68bed17f0980 100644
> --- a/arch/x86/kernel/cpu/bugs.c
> +++ b/arch/x86/kernel/cpu/bugs.c
> @@ -1605,20 +1605,20 @@ static void __init spectre_v2_determine_rsb_fill_type_at_vmexit(enum spectre_v2_
>  	case SPECTRE_V2_NONE:
>  		return;
>  
> -	case SPECTRE_V2_EIBRS_LFENCE:
>  	case SPECTRE_V2_EIBRS:
> +	case SPECTRE_V2_EIBRS_LFENCE:
> +	case SPECTRE_V2_EIBRS_RETPOLINE:
>  		if (boot_cpu_has_bug(X86_BUG_EIBRS_PBRSB)) {
> -			setup_force_cpu_cap(X86_FEATURE_RSB_VMEXIT_LITE);
>  			pr_info("Spectre v2 / PBRSB-eIBRS: Retire a single CALL on VMEXIT\n");
> +			setup_force_cpu_cap(X86_FEATURE_RSB_VMEXIT_LITE);

Why are you swapping those?

>  		}
>  		return;
>  
> -	case SPECTRE_V2_EIBRS_RETPOLINE:
>  	case SPECTRE_V2_RETPOLINE:
>  	case SPECTRE_V2_LFENCE:
>  	case SPECTRE_V2_IBRS:
> -		setup_force_cpu_cap(X86_FEATURE_RSB_VMEXIT);
>  		pr_info("Spectre v2 / SpectreRSB : Filling RSB on VMEXIT\n");
> +		setup_force_cpu_cap(X86_FEATURE_RSB_VMEXIT);

Ditto?

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Re: [PATCH v2 1/2] x86/bugs: Don't fill RSB on VMEXIT with eIBRS+retpoline
  2024-11-30 15:31   ` Borislav Petkov
@ 2024-12-02 11:15     ` Shah, Amit
  2024-12-02 12:19       ` Borislav Petkov
  2024-12-02 23:35     ` Pawan Gupta
  2024-12-05 23:13     ` Josh Poimboeuf
  2 siblings, 1 reply; 31+ messages in thread

From: Shah, Amit @ 2024-12-02 11:15 UTC (permalink / raw)
To: jpoimboe@kernel.org, bp@alien8.de
Cc: corbet@lwn.net, pawan.kumar.gupta@linux.intel.com,
    kai.huang@intel.com, kvm@vger.kernel.org, andrew.cooper3@citrix.com,
    dave.hansen@linux.intel.com, Lendacky, Thomas,
    daniel.sneddon@linux.intel.com, boris.ostrovsky@oracle.com,
    linux-kernel@vger.kernel.org, seanjc@google.com, mingo@redhat.com,
    pbonzini@redhat.com, tglx@linutronix.de, Moger, Babu,
    Das1, Sandipan, dwmw@amazon.co.uk, amit@kernel.org, hpa@zytor.com,
    peterz@infradead.org, Kaplan, David, x86@kernel.org

On Sat, 2024-11-30 at 16:31 +0100, Borislav Petkov wrote:
> On Thu, Nov 21, 2024 at 12:07:18PM -0800, Josh Poimboeuf wrote:
> > eIBRS protects against RSB underflow/poisoning attacks.  Adding
> > retpoline to the mix doesn't change that.  Retpoline has a balanced
> > CALL/RET anyway.
>
> This is exactly why I've been wanting for us to document our
> mitigations for a long time now.

FWIW, I'd say we have fairly decent documentation with commit messages
+ code + comments in code.

> A bunch of statements above for which I can only rhyme up they're
> correct if I search for the vendor docs. On the AMD side I've found:

[...]

> In any case, I'd like for us to have a piece of text accompanying
> such patches, perhaps here:
>
>   Documentation/admin-guide/hw-vuln/spectre.rst
>
> which quotes the vendor docs.

If you're saying that we need *additional* documentation that
replicates hw manuals and the knowledge we have in our commit + code +
comments, that I agree with.

I got the feeling earlier, though, that you were saying we need that
documentation *instead of* the current comments-within-code, and that
didn't sound like the right thing to do.

> The current thread(s) on the matter already show how confused we all
> are by all the possible mitigation options, uarch speculative dances
> etc etc.

... and the code flows and looks much better after this commit (for
SpectreRSB at least), which is a huge plus.

It's important to note that at some point in the past we got
vulnerabilities and hw features/quirks one after the other, and we kept
tacking mitigation code on top of the existing one -- because that's
what you need to do during an embargo period.  Now's the moment when
we're consolidating it all while taking stock of the overall situation.

This looks like a sound way to go about taking a higher-level view and
simplifying the code.  I doubt we'd want to do things any other way;
and much less doing this kind of an exercise during an embargo.  Moving
comments out of the code will only add to frustration during embargo
periods.

Just my 2c

		Amit

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Re: [PATCH v2 1/2] x86/bugs: Don't fill RSB on VMEXIT with eIBRS+retpoline
  2024-12-02 11:15     ` Shah, Amit
@ 2024-12-02 12:19       ` Borislav Petkov
  0 siblings, 0 replies; 31+ messages in thread

From: Borislav Petkov @ 2024-12-02 12:19 UTC (permalink / raw)
To: Shah, Amit
Cc: jpoimboe@kernel.org, corbet@lwn.net,
    pawan.kumar.gupta@linux.intel.com, kai.huang@intel.com,
    kvm@vger.kernel.org, andrew.cooper3@citrix.com,
    dave.hansen@linux.intel.com, Lendacky, Thomas,
    daniel.sneddon@linux.intel.com, boris.ostrovsky@oracle.com,
    linux-kernel@vger.kernel.org, seanjc@google.com, mingo@redhat.com,
    pbonzini@redhat.com, tglx@linutronix.de, Moger, Babu,
    Das1, Sandipan, dwmw@amazon.co.uk, amit@kernel.org, hpa@zytor.com,
    peterz@infradead.org, Kaplan, David, x86@kernel.org

On Mon, Dec 02, 2024 at 11:15:24AM +0000, Shah, Amit wrote:
> FWIW, I'd say we have fairly decent documentation with commit messages
> + code + comments in code.

You mean every time we want to swap back in why we did some mitigation
decision the way we did, we should do git archeology and go on a chase?

I've been doing it for years now and it is an awful experience.
Absolutely certainly can be better.

> If you're saying that we need *additional* documentation that replicates hw
> manuals and the knowledge we have in our commit + code + comments, that
> I agree with.

We need documentation or at least pointers to vendor documentation which
explain why we're doing this exact type of mitigation and why we're not doing
other. Why is it ok to disable SMT, why is it not, for example?

This patch is another good example.

> I got the feeling earlier, though, that you were saying we need that
> documentation *instead of* the current comments-within-code, and that
> didn't sound like the right thing to do.

Comments within the code are fine. Big fat comment which is almost a page
long is pushing it. It can very well exist in Documentation/ and a short
comment can point to it.

And in Documentation/ we can go nuts and explain away...

> ... and the code flows and looks much better after this commit (for
> SpectreRSB at least), which is a huge plus.

Why does it look better? Because of the move of SPECTRE_V2_EIBRS_RETPOLINE?

> It's important to note that at some point in the past we got vulnerabilities
> and hw features/quirks one after the other, and we kept tacking mitigation
> code on top of the existing one -- because that's what you need to do during
> an embargo period. Now's the moment when we're consolidating it all while
> taking stock of the overall situation.

Yes, that's exactly why I'm pushing for improving the documentation and the
reasoning for each mitigation. Exactly because stuff is not under NDA anymore
and we can do all the debating in the open, and the dust has settled.

> This looks like a sound way to go about taking a higher-level view and
> simplifying the code.

Yap. And to give you another argument for it: when we clean it up nicely,
it'll be easier to add new mitigations. :)

And no, I don't want more but no one's listening to me anyway...

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Re: [PATCH v2 1/2] x86/bugs: Don't fill RSB on VMEXIT with eIBRS+retpoline
  2024-11-30 15:31   ` Borislav Petkov
  2024-12-02 11:15     ` Shah, Amit
@ 2024-12-02 23:35     ` Pawan Gupta
  2024-12-03 11:20       ` Borislav Petkov
  2024-12-05 23:13     ` Josh Poimboeuf
  2 siblings, 1 reply; 31+ messages in thread

From: Pawan Gupta @ 2024-12-02 23:35 UTC (permalink / raw)
To: Borislav Petkov
Cc: Josh Poimboeuf, x86, linux-kernel, amit, kvm, amit.shah,
    thomas.lendacky, tglx, peterz, corbet, mingo, dave.hansen, hpa,
    seanjc, pbonzini, daniel.sneddon, kai.huang, sandipan.das,
    boris.ostrovsky, Babu.Moger, david.kaplan, dwmw, andrew.cooper3

On Sat, Nov 30, 2024 at 04:31:25PM +0100, Borislav Petkov wrote:
> On Thu, Nov 21, 2024 at 12:07:18PM -0800, Josh Poimboeuf wrote:
> > eIBRS protects against RSB underflow/poisoning attacks.  Adding
> > retpoline to the mix doesn't change that.  Retpoline has a balanced
> > CALL/RET anyway.
>
> This is exactly why I've been wanting for us to document our mitigations for
> a long time now.
>
> A bunch of statements above for which I can only rhyme up they're correct if
> I search for the vendor docs. On the AMD side I've found:
>
> "When Automatic IBRS is enabled, the internal return address stack used for
> return address predictions is cleared on VMEXIT."
>
> APM v2, p. 58/119
>
> For the Intel side I'm not that lucky. There's something here:
>
> https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/branch-history-injection.html
>
> Or is it this one:
>
> https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/speculative-execution-side-channel-mitigations.html#inpage-nav-1-3-undefined
>
> Or is this written down explicitly in some other doc?

It is in this doc:

https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/indirect-branch-restricted-speculation.html

"Processors with enhanced IBRS still support the usage model where IBRS is
set only in the OS/VMM for OSes that enable SMEP. To do this, such
processors will ensure that guest behavior cannot control the RSB after a
VM exit once IBRS is set, even if IBRS was not set at the time of the VM
exit."

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Re: [PATCH v2 1/2] x86/bugs: Don't fill RSB on VMEXIT with eIBRS+retpoline
  2024-12-02 23:35     ` Pawan Gupta
@ 2024-12-03 11:20       ` Borislav Petkov
  2024-12-05 23:12         ` Josh Poimboeuf
  0 siblings, 1 reply; 31+ messages in thread

From: Borislav Petkov @ 2024-12-03 11:20 UTC (permalink / raw)
To: Pawan Gupta
Cc: Josh Poimboeuf, x86, linux-kernel, amit, kvm, amit.shah,
    thomas.lendacky, tglx, peterz, corbet, mingo, dave.hansen, hpa,
    seanjc, pbonzini, daniel.sneddon, kai.huang, sandipan.das,
    boris.ostrovsky, Babu.Moger, david.kaplan, dwmw, andrew.cooper3

On Mon, Dec 02, 2024 at 03:35:21PM -0800, Pawan Gupta wrote:
> It is in this doc:
>
> https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/indirect-branch-restricted-speculation.html

I hope those URLs remain more stable than past experience shows.

> "Processors with enhanced IBRS still support the usage model where IBRS is
> set only in the OS/VMM for OSes that enable SMEP. To do this, such
> processors will ensure that guest behavior cannot control the RSB after a
> VM exit once IBRS is set, even if IBRS was not set at the time of the VM
> exit."

ACK, thanks.

Now, can we pls add those excerpts to Documentation/ and point to them from
the code so that it is crystal clear why it is ok?

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Re: [PATCH v2 1/2] x86/bugs: Don't fill RSB on VMEXIT with eIBRS+retpoline
  2024-12-03 11:20       ` Borislav Petkov
@ 2024-12-05 23:12         ` Josh Poimboeuf
  2024-12-21  9:13           ` Borislav Petkov
  2025-04-02  9:19           ` Shah, Amit
  0 siblings, 2 replies; 31+ messages in thread

From: Josh Poimboeuf @ 2024-12-05 23:12 UTC (permalink / raw)
To: Borislav Petkov
Cc: Pawan Gupta, x86, linux-kernel, amit, kvm, amit.shah,
    thomas.lendacky, tglx, peterz, corbet, mingo, dave.hansen, hpa,
    seanjc, pbonzini, daniel.sneddon, kai.huang, sandipan.das,
    boris.ostrovsky, Babu.Moger, david.kaplan, dwmw, andrew.cooper3

On Tue, Dec 03, 2024 at 12:20:15PM +0100, Borislav Petkov wrote:
> On Mon, Dec 02, 2024 at 03:35:21PM -0800, Pawan Gupta wrote:
> > It is in this doc:
> >
> > https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/indirect-branch-restricted-speculation.html
>
> I hope those URLs remain more stable than past experience shows.
>
> > "Processors with enhanced IBRS still support the usage model where IBRS is
> > set only in the OS/VMM for OSes that enable SMEP. To do this, such
> > processors will ensure that guest behavior cannot control the RSB after a
> > VM exit once IBRS is set, even if IBRS was not set at the time of the VM
> > exit."
>
> ACK, thanks.
>
> Now, can we pls add those excerpts to Documentation/ and point to them from
> the code so that it is crystal clear why it is ok?

Ok, I'll try to write up a document.  I'm thinking it should go in its
own return-based-attacks.rst file rather than spectre.rst, which is more
of an outdated historical document at this point.  And we want this
document to actually be read (and kept up to date) by developers instead
of mostly ignored like the others.

-- 
Josh

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Re: [PATCH v2 1/2] x86/bugs: Don't fill RSB on VMEXIT with eIBRS+retpoline
  2024-12-05 23:12         ` Josh Poimboeuf
@ 2024-12-21  9:13           ` Borislav Petkov
  2025-04-02  9:19           ` Shah, Amit
  1 sibling, 0 replies; 31+ messages in thread

From: Borislav Petkov @ 2024-12-21  9:13 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: Pawan Gupta, x86, linux-kernel, amit, kvm, amit.shah,
    thomas.lendacky, tglx, peterz, corbet, mingo, dave.hansen, hpa,
    seanjc, pbonzini, daniel.sneddon, kai.huang, sandipan.das,
    boris.ostrovsky, Babu.Moger, david.kaplan, dwmw, andrew.cooper3

On Thu, Dec 05, 2024 at 03:12:07PM -0800, Josh Poimboeuf wrote:
> Ok, I'll try to write up a document.  I'm thinking it should go in its
> own return-based-attacks.rst file rather than spectre.rst, which is more
> of an outdated historical document at this point.

Thanks!

> And we want this document to actually be read (and kept up to date) by
> developers instead of mostly ignored like the others.

Good idea. Now how do we enforce this?

We could add a rule to checkpatch or to some other patch checking script like
mine here for example :-)

https://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git/log/?h=vp

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Re: [PATCH v2 1/2] x86/bugs: Don't fill RSB on VMEXIT with eIBRS+retpoline
  2024-12-05 23:12         ` Josh Poimboeuf
  2024-12-21  9:13           ` Borislav Petkov
@ 2025-04-02  9:19           ` Shah, Amit
  2025-04-02 14:16             ` Josh Poimboeuf
  1 sibling, 1 reply; 31+ messages in thread

From: Shah, Amit @ 2025-04-02 9:19 UTC (permalink / raw)
To: jpoimboe@kernel.org, bp@alien8.de
Cc: corbet@lwn.net, pawan.kumar.gupta@linux.intel.com,
    kai.huang@intel.com, kvm@vger.kernel.org, andrew.cooper3@citrix.com,
    dave.hansen@linux.intel.com, daniel.sneddon@linux.intel.com,
    Lendacky, Thomas, boris.ostrovsky@oracle.com,
    linux-kernel@vger.kernel.org, seanjc@google.com, mingo@redhat.com,
    pbonzini@redhat.com, tglx@linutronix.de, Moger, Babu,
    Das1, Sandipan, dwmw@amazon.co.uk, hpa@zytor.com,
    peterz@infradead.org, Kaplan, David, x86@kernel.org

On Thu, 2024-12-05 at 15:12 -0800, Josh Poimboeuf wrote:
> On Tue, Dec 03, 2024 at 12:20:15PM +0100, Borislav Petkov wrote:
> > On Mon, Dec 02, 2024 at 03:35:21PM -0800, Pawan Gupta wrote:
> > > It is in this doc:
> > >
> > > https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/indirect-branch-restricted-speculation.html
> >
> > I hope those URLs remain more stable than past experience shows.
> >
> > > "Processors with enhanced IBRS still support the usage model
> > > where IBRS is set only in the OS/VMM for OSes that enable SMEP.
> > > To do this, such processors will ensure that guest behavior
> > > cannot control the RSB after a VM exit once IBRS is set, even if
> > > IBRS was not set at the time of the VM exit."
> >
> > ACK, thanks.
> >
> > Now, can we pls add those excerpts to Documentation/ and point to
> > them from the code so that it is crystal clear why it is ok?
>
> Ok, I'll try to write up a document.  I'm thinking it should go in
> its own return-based-attacks.rst file rather than spectre.rst, which
> is more of an outdated historical document at this point.  And we
> want this document to actually be read (and kept up to date) by
> developers instead of mostly ignored like the others.

Hey Josh,

Do you plan to submit a v3 with the changes?

Thanks,

		Amit

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Re: [PATCH v2 1/2] x86/bugs: Don't fill RSB on VMEXIT with eIBRS+retpoline
  2025-04-02  9:19           ` Shah, Amit
@ 2025-04-02 14:16             ` Josh Poimboeuf
  2025-04-02 14:19               ` Shah, Amit
  0 siblings, 1 reply; 31+ messages in thread

From: Josh Poimboeuf @ 2025-04-02 14:16 UTC (permalink / raw)
To: Shah, Amit
Cc: bp@alien8.de, corbet@lwn.net, pawan.kumar.gupta@linux.intel.com,
    kai.huang@intel.com, kvm@vger.kernel.org, andrew.cooper3@citrix.com,
    dave.hansen@linux.intel.com, daniel.sneddon@linux.intel.com,
    Lendacky, Thomas, boris.ostrovsky@oracle.com,
    linux-kernel@vger.kernel.org, seanjc@google.com, mingo@redhat.com,
    pbonzini@redhat.com, tglx@linutronix.de, Moger, Babu,
    Das1, Sandipan, dwmw@amazon.co.uk, hpa@zytor.com,
    peterz@infradead.org, Kaplan, David, x86@kernel.org

On Wed, Apr 02, 2025 at 09:19:19AM +0000, Shah, Amit wrote:
> On Thu, 2024-12-05 at 15:12 -0800, Josh Poimboeuf wrote:
> > On Tue, Dec 03, 2024 at 12:20:15PM +0100, Borislav Petkov wrote:
> > > On Mon, Dec 02, 2024 at 03:35:21PM -0800, Pawan Gupta wrote:
> > > > It is in this doc:
> > > >
> > > > https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/indirect-branch-restricted-speculation.html
> > >
> > > I hope those URLs remain more stable than past experience shows.
> > >
> > > > "Processors with enhanced IBRS still support the usage model
> > > > where IBRS is set only in the OS/VMM for OSes that enable SMEP.
> > > > To do this, such processors will ensure that guest behavior
> > > > cannot control the RSB after a VM exit once IBRS is set, even
> > > > if IBRS was not set at the time of the VM exit."
> > >
> > > ACK, thanks.
> > >
> > > Now, can we pls add those excerpts to Documentation/ and point to
> > > them from the code so that it is crystal clear why it is ok?
> >
> > Ok, I'll try to write up a document.  I'm thinking it should go in
> > its own return-based-attacks.rst file rather than spectre.rst,
> > which is more of an outdated historical document at this point.
> > And we want this document to actually be read (and kept up to date)
> > by developers instead of mostly ignored like the others.
>
> Hey Josh,
>
> Do you plan to submit a v3 with the changes?

Thanks for the reminder, I actually had the patches ready to go a few
months ago (with a fancy new doc) and then forgot to post.  Let me dust
off the cobwebs.

-- 
Josh

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Re: [PATCH v2 1/2] x86/bugs: Don't fill RSB on VMEXIT with eIBRS+retpoline
  2025-04-02 14:16             ` Josh Poimboeuf
@ 2025-04-02 14:19               ` Shah, Amit
  0 siblings, 0 replies; 31+ messages in thread

From: Shah, Amit @ 2025-04-02 14:19 UTC (permalink / raw)
To: jpoimboe@kernel.org
Cc: corbet@lwn.net, boris.ostrovsky@oracle.com,
    andrew.cooper3@citrix.com, kai.huang@intel.com,
    dave.hansen@linux.intel.com, daniel.sneddon@linux.intel.com,
    Lendacky, Thomas, pawan.kumar.gupta@linux.intel.com,
    linux-kernel@vger.kernel.org, seanjc@google.com, mingo@redhat.com,
    pbonzini@redhat.com, kvm@vger.kernel.org, Moger, Babu,
    Das1, Sandipan, tglx@linutronix.de, dwmw@amazon.co.uk,
    hpa@zytor.com, peterz@infradead.org, bp@alien8.de, Kaplan, David,
    x86@kernel.org

On Wed, 2025-04-02 at 07:16 -0700, Josh Poimboeuf wrote:
> On Wed, Apr 02, 2025 at 09:19:19AM +0000, Shah, Amit wrote:
>
> [...]
>
> > Hey Josh,
> >
> > Do you plan to submit a v3 with the changes?
>
> Thanks for the reminder, I actually had the patches ready to go a few
> months ago (with a fancy new doc) and then forgot to post.  Let me
> dust off the cobwebs.

Excellent, thanks!

		Amit

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Re: [PATCH v2 1/2] x86/bugs: Don't fill RSB on VMEXIT with eIBRS+retpoline
  2024-11-30 15:31   ` Borislav Petkov
  2024-12-02 11:15     ` Shah, Amit
  2024-12-02 23:35     ` Pawan Gupta
@ 2024-12-05 23:13     ` Josh Poimboeuf
  2 siblings, 0 replies; 31+ messages in thread

From: Josh Poimboeuf @ 2024-12-05 23:13 UTC (permalink / raw)
To: Borislav Petkov
Cc: x86, linux-kernel, amit, kvm, amit.shah, thomas.lendacky, tglx,
    peterz, pawan.kumar.gupta, corbet, mingo, dave.hansen, hpa,
    seanjc, pbonzini, daniel.sneddon, kai.huang, sandipan.das,
    boris.ostrovsky, Babu.Moger, david.kaplan, dwmw, andrew.cooper3

On Sat, Nov 30, 2024 at 04:31:25PM +0100, Borislav Petkov wrote:
> On Thu, Nov 21, 2024 at 12:07:18PM -0800, Josh Poimboeuf wrote:
> > --- a/arch/x86/kernel/cpu/bugs.c
> > +++ b/arch/x86/kernel/cpu/bugs.c
> > @@ -1605,20 +1605,20 @@ static void __init spectre_v2_determine_rsb_fill_type_at_vmexit(enum spectre_v2_
> >  	case SPECTRE_V2_NONE:
> >  		return;
> >  
> > -	case SPECTRE_V2_EIBRS_LFENCE:
> >  	case SPECTRE_V2_EIBRS:
> > +	case SPECTRE_V2_EIBRS_LFENCE:
> > +	case SPECTRE_V2_EIBRS_RETPOLINE:
> >  		if (boot_cpu_has_bug(X86_BUG_EIBRS_PBRSB)) {
> > -			setup_force_cpu_cap(X86_FEATURE_RSB_VMEXIT_LITE);
> >  			pr_info("Spectre v2 / PBRSB-eIBRS: Retire a single CALL on VMEXIT\n");
> > +			setup_force_cpu_cap(X86_FEATURE_RSB_VMEXIT_LITE);
>
> Why are you swapping those?
>
> >  		}
> >  		return;
> >  
> > -	case SPECTRE_V2_EIBRS_RETPOLINE:
> >  	case SPECTRE_V2_RETPOLINE:
> >  	case SPECTRE_V2_LFENCE:
> >  	case SPECTRE_V2_IBRS:
> > -		setup_force_cpu_cap(X86_FEATURE_RSB_VMEXIT);
> >  		pr_info("Spectre v2 / SpectreRSB : Filling RSB on VMEXIT\n");
> > +		setup_force_cpu_cap(X86_FEATURE_RSB_VMEXIT);
>
> Ditto?

It's more readable that way, similar to how a comment goes before code.

-- 
Josh

^ permalink raw reply	[flat|nested] 31+ messages in thread
* [PATCH v2 2/2] x86/bugs: Don't fill RSB on context switch with eIBRS
  2024-11-21 20:07 [PATCH v2 0/2] x86/bugs: RSB tweaks Josh Poimboeuf
  2024-11-21 20:07 ` [PATCH v2 1/2] x86/bugs: Don't fill RSB on VMEXIT with eIBRS+retpoline Josh Poimboeuf
@ 2024-11-21 20:07 ` Josh Poimboeuf
  2024-12-03 11:42   ` Borislav Petkov
  2024-12-05 23:32   ` Josh Poimboeuf
  2024-11-28 13:28 ` [RFC PATCH v3 0/2] Add support for the ERAPS feature Amit Shah
  2 siblings, 2 replies; 31+ messages in thread

From: Josh Poimboeuf @ 2024-11-21 20:07 UTC (permalink / raw)
To: x86
Cc: linux-kernel, amit, kvm, amit.shah, thomas.lendacky, bp, tglx,
    peterz, pawan.kumar.gupta, corbet, mingo, dave.hansen, hpa,
    seanjc, pbonzini, daniel.sneddon, kai.huang, sandipan.das,
    boris.ostrovsky, Babu.Moger, david.kaplan, dwmw, andrew.cooper3

User->user Spectre v2 attacks (including RSB) across context switches
are already mitigated by IBPB in cond_mitigation(), if enabled globally
or if either the prev or the next task has opted in to protection.  RSB
filling without IBPB serves no purpose for protecting user space, as
indirect branches are still vulnerable.

User->kernel RSB attacks are mitigated by eIBRS, in which case the RSB
filling on context switch isn't needed, so remove it.

While at it, update and coalesce the comments describing the various
RSB mitigations.

Suggested-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Reviewed-by: Amit Shah <amit.shah@amd.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
---
 arch/x86/kernel/cpu/bugs.c | 103 +++++++++++++++----------------------
 arch/x86/mm/tlb.c          |   2 +-
 2 files changed, 42 insertions(+), 63 deletions(-)

diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 68bed17f0980..d5102b72f74d 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1579,31 +1579,48 @@ static void __init spec_ctrl_disable_kernel_rrsba(void)
 		rrsba_disabled = true;
 }
 
-static void __init spectre_v2_determine_rsb_fill_type_at_vmexit(enum spectre_v2_mitigation mode)
+static void __init spectre_v2_mitigate_rsb(enum spectre_v2_mitigation mode)
 {
 	/*
-	 * Similar to context switches, there are two types of RSB attacks
-	 * after VM exit:
+	 * In general there are two types of RSB attacks:
 	 *
-	 * 1) RSB underflow
+	 * 1) RSB underflow ("Intel Retbleed")
+	 *
+	 *    Some Intel parts have "bottomless RSB".  When the RSB is empty,
+	 *    speculated return targets may come from the branch predictor,
+	 *    which could have a user-poisoned BTB or BHB entry.
+	 *
+	 *    user->user attacks are mitigated by IBPB on context switch.
+	 *
+	 *    user->kernel attacks via context switch are mitigated by IBRS,
+	 *    eIBRS, or RSB filling.
+	 *
+	 *    user->kernel attacks via kernel entry are mitigated by IBRS,
+	 *    eIBRS, or call depth tracking.
+	 *
+	 *    On VMEXIT, guest->host attacks are mitigated by IBRS, eIBRS, or
+	 *    RSB filling.
 	 *
 	 * 2) Poisoned RSB entry
 	 *
-	 * When retpoline is enabled, both are mitigated by filling/clearing
-	 * the RSB.
+	 *    On a context switch, the previous task can poison RSB entries
+	 *    used by the next task, controlling its speculative return
+	 *    targets.  Poisoned RSB entries can also be created by "AMD
+	 *    Retbleed" or SRSO.
 	 *
-	 * When IBRS is enabled, while #1 would be mitigated by the IBRS branch
-	 * prediction isolation protections, RSB still needs to be cleared
-	 * because of #2.  Note that SMEP provides no protection here, unlike
-	 * user-space-poisoned RSB entries.
+	 *    user->user attacks are mitigated by IBPB on context switch.
 	 *
-	 * eIBRS should protect against RSB poisoning, but if the EIBRS_PBRSB
-	 * bug is present then a LITE version of RSB protection is required,
-	 * just a single call needs to retire before a RET is executed.
+	 *    user->kernel attacks via context switch are prevented by
+	 *    SMEP+eIBRS+SRSO mitigations, or RSB clearing.
+	 *
+	 *    guest->host attacks are mitigated by eIBRS or RSB clearing on
+	 *    VMEXIT.  eIBRS implementations with X86_BUG_EIBRS_PBRSB still
+	 *    need "lite" RSB filling which retires a CALL before the first
+	 *    RET.
 	 */
 	switch (mode) {
 	case SPECTRE_V2_NONE:
-		return;
+		break;
 
 	case SPECTRE_V2_EIBRS:
 	case SPECTRE_V2_EIBRS_LFENCE:
@@ -1612,18 +1629,21 @@ static void __init spectre_v2_determine_rsb_fill_type_at_vmexit(enum spectre_v2_
 			pr_info("Spectre v2 / PBRSB-eIBRS: Retire a single CALL on VMEXIT\n");
 			setup_force_cpu_cap(X86_FEATURE_RSB_VMEXIT_LITE);
 		}
-		return;
+		break;
 
 	case SPECTRE_V2_RETPOLINE:
 	case SPECTRE_V2_LFENCE:
 	case SPECTRE_V2_IBRS:
-		pr_info("Spectre v2 / SpectreRSB : Filling RSB on VMEXIT\n");
+		pr_info("Spectre v2 / SpectreRSB: Filling RSB on context switch and VMEXIT\n");
+		setup_force_cpu_cap(X86_FEATURE_RSB_CTXSW);
 		setup_force_cpu_cap(X86_FEATURE_RSB_VMEXIT);
-		return;
-	}
+		break;
 
-	pr_warn_once("Unknown Spectre v2 mode, disabling RSB mitigation at VM exit");
-	dump_stack();
+	default:
+		pr_warn_once("Unknown Spectre v2 mode, disabling RSB mitigation\n");
+		dump_stack();
+		break;
+	}
 }
 
 /*
@@ -1817,48 +1837,7 @@ static void __init spectre_v2_select_mitigation(void)
 	spectre_v2_enabled = mode;
 	pr_info("%s\n", spectre_v2_strings[mode]);
 
-	/*
-	 * If Spectre v2 protection has been enabled, fill the RSB during a
-	 * context switch.  In general there are two types of RSB attacks
-	 * across context switches, for which the CALLs/RETs may be unbalanced.
-	 *
-	 * 1) RSB underflow
-	 *
-	 *    Some Intel parts have "bottomless RSB".  When the RSB is empty,
-	 *    speculated return targets may come from the branch predictor,
-	 *    which could have a user-poisoned BTB or BHB entry.
-	 *
-	 *    AMD has it even worse: *all* returns are speculated from the BTB,
-	 *    regardless of the state of the RSB.
-	 *
-	 *    When IBRS or eIBRS is enabled, the "user -> kernel" attack
-	 *    scenario is mitigated by the IBRS branch prediction isolation
-	 *    properties, so the RSB buffer filling wouldn't be necessary to
-	 *    protect against this type of attack.
-	 *
-	 *    The "user -> user" attack scenario is mitigated by RSB filling.
-	 *
-	 * 2) Poisoned RSB entry
-	 *
-	 *    If the 'next' in-kernel return stack is shorter than 'prev',
-	 *    'next' could be tricked into speculating with a user-poisoned RSB
-	 *    entry.
-	 *
-	 *    The "user -> kernel" attack scenario is mitigated by SMEP and
-	 *    eIBRS.
-	 *
-	 *    The "user -> user" scenario, also known as SpectreBHB, requires
-	 *    RSB clearing.
-	 *
-	 * So to mitigate all cases, unconditionally fill RSB on context
-	 * switches.
-	 *
-	 * FIXME: Is this pointless for retbleed-affected AMD?
-	 */
-	setup_force_cpu_cap(X86_FEATURE_RSB_CTXSW);
-	pr_info("Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch\n");
-
-	spectre_v2_determine_rsb_fill_type_at_vmexit(mode);
+	spectre_v2_mitigate_rsb(mode);
 
 	/*
 	 * Retpoline protects the kernel, but doesn't protect firmware.  IBRS
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 86593d1b787d..dc388f5ae7ef 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -388,7 +388,7 @@ static void cond_mitigation(struct task_struct *next)
 		prev_mm = this_cpu_read(cpu_tlbstate.last_user_mm_spec);
 
 	/*
-	 * Avoid user/user BTB poisoning by flushing the branch predictor
+	 * Avoid user->user BTB/RSB poisoning by flushing the branch predictor
 	 * when switching between processes.
This stops one process from * doing Spectre-v2 attacks on another. * -- 2.47.0 ^ permalink raw reply related [flat|nested] 31+ messages in thread
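The decision table that the patch's reworked spectre_v2_mitigate_rsb() encodes can be sketched as a small standalone C model. The enum and flag names below are simplified stand-ins for the kernel's real types, and the SPECTRE_V2_EIBRS_RETPOLINE case is assumed to follow the other eIBRS modes (the hunk above elides part of that switch):

```c
#include <stdbool.h>

/* Simplified stand-ins for the kernel's spectre_v2_mitigation modes. */
enum spectre_v2_mode {
	SPECTRE_V2_NONE,
	SPECTRE_V2_RETPOLINE,
	SPECTRE_V2_LFENCE,
	SPECTRE_V2_IBRS,
	SPECTRE_V2_EIBRS,
	SPECTRE_V2_EIBRS_RETPOLINE,
	SPECTRE_V2_EIBRS_LFENCE,
};

#define F_RSB_CTXSW       (1u << 0)	/* fill RSB on context switch */
#define F_RSB_VMEXIT      (1u << 1)	/* fill RSB on VMEXIT */
#define F_RSB_VMEXIT_LITE (1u << 2)	/* retire one CALL on VMEXIT (PBRSB) */

/*
 * Which RSB mitigations a given Spectre v2 mode still needs after this
 * patch: eIBRS modes drop full filling entirely (only the PBRSB "lite"
 * sequence may remain); retpoline/LFENCE/IBRS keep filling on both
 * context switch and VMEXIT.
 */
unsigned int rsb_mitigation(enum spectre_v2_mode mode, bool bug_eibrs_pbrsb)
{
	switch (mode) {
	case SPECTRE_V2_NONE:
		return 0;
	case SPECTRE_V2_EIBRS:
	case SPECTRE_V2_EIBRS_RETPOLINE:
	case SPECTRE_V2_EIBRS_LFENCE:
		return bug_eibrs_pbrsb ? F_RSB_VMEXIT_LITE : 0;
	case SPECTRE_V2_RETPOLINE:
	case SPECTRE_V2_LFENCE:
	case SPECTRE_V2_IBRS:
		return F_RSB_CTXSW | F_RSB_VMEXIT;
	default:
		return 0;	/* unknown mode: the kernel warns and disables */
	}
}
```

This is only a model of the selection logic, not of the mitigation sequences themselves.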
* Re: [PATCH v2 2/2] x86/bugs: Don't fill RSB on context switch with eIBRS 2024-11-21 20:07 ` [PATCH v2 2/2] x86/bugs: Don't fill RSB on context switch with eIBRS Josh Poimboeuf @ 2024-12-03 11:42 ` Borislav Petkov 2024-12-05 23:32 ` Josh Poimboeuf 1 sibling, 0 replies; 31+ messages in thread From: Borislav Petkov @ 2024-12-03 11:42 UTC (permalink / raw) To: Josh Poimboeuf Cc: x86, linux-kernel, amit, kvm, amit.shah, thomas.lendacky, tglx, peterz, pawan.kumar.gupta, corbet, mingo, dave.hansen, hpa, seanjc, pbonzini, daniel.sneddon, kai.huang, sandipan.das, boris.ostrovsky, Babu.Moger, david.kaplan, dwmw, andrew.cooper3 On Thu, Nov 21, 2024 at 12:07:19PM -0800, Josh Poimboeuf wrote: > @@ -1579,31 +1579,48 @@ static void __init spec_ctrl_disable_kernel_rrsba(void) > rrsba_disabled = true; > } > > -static void __init spectre_v2_determine_rsb_fill_type_at_vmexit(enum spectre_v2_mitigation mode) > +static void __init spectre_v2_mitigate_rsb(enum spectre_v2_mitigation mode) mitigate the return stack buffer? The previous name was describing better what this function does... Perhaps spectre_v2_determine_rsb_handling() or so. "Handling" to mean, what do to with the RSB based on the selected Spectre v2 mitigation. > { > /* > - * Similar to context switches, there are two types of RSB attacks > - * after VM exit: > + * In general there are two types of RSB attacks: > * > - * 1) RSB underflow > + * 1) RSB underflow ("Intel Retbleed") > + * > + * Some Intel parts have "bottomless RSB". When the RSB is empty, This is exactly what I mean with expanding this in Documentation/: Which parts are those? > + * speculated return targets may come from the branch predictor, > + * which could have a user-poisoned BTB or BHB entry. Also there we could explain what those alternative predictors are: "RSBA: The processor supports RSB Alternate. Alternative branch predictors may be used by RET instructions when the RSB is empty." 
https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/speculative-execution-side-channel-mitigations.html > + * > + * user->user attacks are mitigated by IBPB on context switch. > + * > + * user->kernel attacks via context switch are mitigated by IBRS, > + * eIBRS, or RSB filling. > + * > + * user->kernel attacks via kernel entry are mitigated by IBRS, > + * eIBRS, or call depth tracking. > + * > + * On VMEXIT, guest->host attacks are mitigated by IBRS, eIBRS, or > + * RSB filling. I like the explanation of every attack vector. I'd like even more if that were expanded in Documentation/ and we can always go look it up there when touching this code or reviewing patches touching it. > * > * 2) Poisoned RSB entry > * > - * When retpoline is enabled, both are mitigated by filling/clearing > - * the RSB. > + * On a context switch, the previous task can poison RSB entries > + * used by the next task, controlling its speculative return > + * targets. Poisoned RSB entries can also be created by "AMD > + * Retbleed" or SRSO. "... by the AMD Retbleed variant... " > * > - * When IBRS is enabled, while #1 would be mitigated by the IBRS branch > - * prediction isolation protections, RSB still needs to be cleared > - * because of #2. Note that SMEP provides no protection here, unlike > - * user-space-poisoned RSB entries. > + * user->user attacks are mitigated by IBPB on context switch. > * > - * eIBRS should protect against RSB poisoning, but if the EIBRS_PBRSB > - * bug is present then a LITE version of RSB protection is required, > - * just a single call needs to retire before a RET is executed. > + * user->kernel attacks via context switch are prevented by > + * SMEP+eIBRS+SRSO mitigations, or RSB clearing. we call it "RSB filling" above. What's the difference? The hw *IBRS* vs our RSB stuffing sw sequence? > + * guest->host attacks are mitigated by eIBRS or RSB clearing on > + * VMEXIT. 
I think you mean our sw RSB stuffing sequence here. > eIBRS implementations with X86_BUG_EIBRS_PBRSB still > + * need "lite" RSB filling which retires a CALL before the first > + * RET. > */ > switch (mode) { > case SPECTRE_V2_NONE: > - return; > + break; > > case SPECTRE_V2_EIBRS: > case SPECTRE_V2_EIBRS_LFENCE: > @@ -1612,18 +1629,21 @@ static void __init spectre_v2_determine_rsb_fill_type_at_vmexit(enum spectre_v2_ > pr_info("Spectre v2 / PBRSB-eIBRS: Retire a single CALL on VMEXIT\n"); > setup_force_cpu_cap(X86_FEATURE_RSB_VMEXIT_LITE); > } > - return; > + break; > > case SPECTRE_V2_RETPOLINE: > case SPECTRE_V2_LFENCE: > case SPECTRE_V2_IBRS: > - pr_info("Spectre v2 / SpectreRSB : Filling RSB on VMEXIT\n"); > + pr_info("Spectre v2 / SpectreRSB: Filling RSB on context switch and VMEXIT\n"); > + setup_force_cpu_cap(X86_FEATURE_RSB_CTXSW); Aha, with "RSB filling" you mean our sw sequence. It would be good to have those explained properly and have some text establishing the terminology we're using. Thx. -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette ^ permalink raw reply [flat|nested] 31+ messages in thread
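For readers following the terminology thread: the "sw sequence" being debated is the kernel's RSB-stuffing loop (__FILL_RETURN_BUFFER in arch/x86/include/asm/nospec-branch.h), a run of benign CALLs whose only job is to overwrite every RSB entry. A toy C model of why that works — illustrative only, not the actual asm sequence:

```c
#include <stdbool.h>

#define RSB_DEPTH 32		/* typical RSB size on recent parts */
#define BENIGN    0xb0d1e5u	/* stand-in for a harmless return target */

struct rsb {
	unsigned int entry[RSB_DEPTH];
	int top;		/* circular top-of-stack index */
};

/* A CALL pushes a predicted return target onto the RSB. */
static void rsb_push(struct rsb *r, unsigned int target)
{
	r->top = (r->top + 1) % RSB_DEPTH;
	r->entry[r->top] = target;
}

/*
 * The software "filling"/"stuffing" sequence: RSB_DEPTH benign CALLs,
 * enough to overwrite every (possibly attacker-poisoned) entry.
 */
void rsb_fill(struct rsb *r)
{
	for (int i = 0; i < RSB_DEPTH; i++)
		rsb_push(r, BENIGN);
}

bool rsb_is_clean(const struct rsb *r)
{
	for (int i = 0; i < RSB_DEPTH; i++)
		if (r->entry[i] != BENIGN)
			return false;
	return true;
}
```

"Filling" and "clearing" are used interchangeably in the thread for this same sequence; both mean replacing all entries with harmless targets.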
* Re: [PATCH v2 2/2] x86/bugs: Don't fill RSB on context switch with eIBRS 2024-11-21 20:07 ` [PATCH v2 2/2] x86/bugs: Don't fill RSB on context switch with eIBRS Josh Poimboeuf 2024-12-03 11:42 ` Borislav Petkov @ 2024-12-05 23:32 ` Josh Poimboeuf 2024-12-06 0:53 ` Josh Poimboeuf 2024-12-06 10:10 ` Shah, Amit 1 sibling, 2 replies; 31+ messages in thread From: Josh Poimboeuf @ 2024-12-05 23:32 UTC (permalink / raw) To: x86 Cc: linux-kernel, amit, kvm, amit.shah, thomas.lendacky, bp, tglx, peterz, pawan.kumar.gupta, corbet, mingo, dave.hansen, hpa, seanjc, pbonzini, daniel.sneddon, kai.huang, sandipan.das, boris.ostrovsky, Babu.Moger, david.kaplan, dwmw, andrew.cooper3 On Thu, Nov 21, 2024 at 12:07:19PM -0800, Josh Poimboeuf wrote: > User->user Spectre v2 attacks (including RSB) across context switches > are already mitigated by IBPB in cond_mitigation(), if enabled globally > or if either the prev or the next task has opted in to protection. RSB > filling without IBPB serves no purpose for protecting user space, as > indirect branches are still vulnerable. Question for Intel/AMD folks: where is it documented that IBPB clears the RSB? I thought I'd seen this somewhere but I can't seem to find it. -- Josh ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2 2/2] x86/bugs: Don't fill RSB on context switch with eIBRS 2024-12-05 23:32 ` Josh Poimboeuf @ 2024-12-06 0:53 ` Josh Poimboeuf 2024-12-06 23:02 ` Josh Poimboeuf 2024-12-06 10:10 ` Shah, Amit 1 sibling, 1 reply; 31+ messages in thread From: Josh Poimboeuf @ 2024-12-06 0:53 UTC (permalink / raw) To: x86 Cc: linux-kernel, amit, kvm, amit.shah, thomas.lendacky, bp, tglx, peterz, pawan.kumar.gupta, corbet, mingo, dave.hansen, hpa, seanjc, pbonzini, daniel.sneddon, kai.huang, sandipan.das, boris.ostrovsky, Babu.Moger, david.kaplan, dwmw, andrew.cooper3 On Thu, Dec 05, 2024 at 03:32:47PM -0800, Josh Poimboeuf wrote: > On Thu, Nov 21, 2024 at 12:07:19PM -0800, Josh Poimboeuf wrote: > > User->user Spectre v2 attacks (including RSB) across context switches > > are already mitigated by IBPB in cond_mitigation(), if enabled globally > > or if either the prev or the next task has opted in to protection. RSB > > filling without IBPB serves no purpose for protecting user space, as > > indirect branches are still vulnerable. > > Question for Intel/AMD folks: where is it documented that IBPB clears > the RSB? I thought I'd seen this somewhere but I can't seem to find it. For Intel, I found this: https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/post-barrier-return-stack-buffer-predictions.html "Software that executed before the IBPB command cannot control the predicted targets of indirect branches executed after the command on the same logical processor. The term indirect branch in this context includes near return instructions, so these predicted targets may come from the RSB. This article uses the term RSB-barrier to refer to either an IBPB command event, or (on processors which support enhanced IBRS) either a VM exit with IBRS set to 1 or setting IBRS to 1 after a VM exit." I haven't seen anything that explicit for AMD. -- Josh ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2 2/2] x86/bugs: Don't fill RSB on context switch with eIBRS 2024-12-06 0:53 ` Josh Poimboeuf @ 2024-12-06 23:02 ` Josh Poimboeuf 2024-12-30 14:54 ` Shah, Amit 2025-01-08 11:50 ` Shah, Amit 0 siblings, 2 replies; 31+ messages in thread From: Josh Poimboeuf @ 2024-12-06 23:02 UTC (permalink / raw) To: x86 Cc: linux-kernel, amit, kvm, amit.shah, thomas.lendacky, bp, tglx, peterz, pawan.kumar.gupta, corbet, mingo, dave.hansen, hpa, seanjc, pbonzini, daniel.sneddon, kai.huang, sandipan.das, boris.ostrovsky, Babu.Moger, david.kaplan, dwmw, andrew.cooper3 On Thu, Dec 05, 2024 at 04:53:03PM -0800, Josh Poimboeuf wrote: > On Thu, Dec 05, 2024 at 03:32:47PM -0800, Josh Poimboeuf wrote: > > On Thu, Nov 21, 2024 at 12:07:19PM -0800, Josh Poimboeuf wrote: > > > User->user Spectre v2 attacks (including RSB) across context switches > > > are already mitigated by IBPB in cond_mitigation(), if enabled globally > > > or if either the prev or the next task has opted in to protection. RSB > > > filling without IBPB serves no purpose for protecting user space, as > > > indirect branches are still vulnerable. > > > > Question for Intel/AMD folks: where is it documented that IBPB clears > > the RSB? I thought I'd seen this somewhere but I can't seem to find it. > > For Intel, I found this: > > https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/post-barrier-return-stack-buffer-predictions.html > > "Software that executed before the IBPB command cannot control the > predicted targets of indirect branches executed after the command on > the same logical processor. The term indirect branch in this context > includes near return instructions, so these predicted targets may come > from the RSB. > > This article uses the term RSB-barrier to refer to either an IBPB > command event, or (on processors which support enhanced IBRS) either a > VM exit with IBRS set to 1 or setting IBRS to 1 after a VM exit." 
> > I haven't seen anything that explicit for AMD. Found it. As Andrew mentioned earlier, AMD IBPB only clears RSB if the IBPB_RET CPUID bit is set. From APM vol 3: CPUID Fn8000_0008_EBX Extended Feature Identifiers: 30 IBPB_RET The processor clears the return address predictor when MSR PRED_CMD.IBPB is written to 1. We check that already for the IBPB entry mitigation, but now we'll also need to do so for the context switch IBPB. Question for AMD, does SBPB behave the same way, i.e. does it clear RSB if IBPB_RET? -- Josh ^ permalink raw reply [flat|nested] 31+ messages in thread
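The CPUID gate Josh quotes — Fn8000_0008_EBX bit 30 (IBPB_RET) — reduces to a one-line predicate over the raw CPUID output. Sketched here as a user-space helper (the function name is illustrative; in the kernel this is a CPU feature bit checked via cpu_feature_enabled()):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * CPUID Fn8000_0008_EBX bit 30: IBPB_RET — writing PRED_CMD.IBPB also
 * clears the return address predictor (APM vol 3, quoted above).
 */
#define AMD_IBPB_RET_BIT 30

/*
 * Given the EBX output of CPUID leaf 0x80000008, report whether IBPB on
 * this part is also an RSB barrier.  If it is not, software RSB filling
 * is still needed even when IBPB is issued on context switch.
 */
bool amd_ibpb_clears_rsb(uint32_t ebx)
{
	return (ebx >> AMD_IBPB_RET_BIT) & 1;
}
```
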
* Re: [PATCH v2 2/2] x86/bugs: Don't fill RSB on context switch with eIBRS 2024-12-06 23:02 ` Josh Poimboeuf @ 2024-12-30 14:54 ` Shah, Amit 2025-01-08 11:50 ` Shah, Amit 1 sibling, 0 replies; 31+ messages in thread From: Shah, Amit @ 2024-12-30 14:54 UTC (permalink / raw) To: jpoimboe@kernel.org, x86@kernel.org Cc: corbet@lwn.net, pawan.kumar.gupta@linux.intel.com, kai.huang@intel.com, kvm@vger.kernel.org, andrew.cooper3@citrix.com, dave.hansen@linux.intel.com, Lendacky, Thomas, daniel.sneddon@linux.intel.com, boris.ostrovsky@oracle.com, linux-kernel@vger.kernel.org, seanjc@google.com, mingo@redhat.com, pbonzini@redhat.com, tglx@linutronix.de, Moger, Babu, Das1, Sandipan, dwmw@amazon.co.uk, hpa@zytor.com, peterz@infradead.org, bp@alien8.de, Kaplan, David On Fri, 2024-12-06 at 15:02 -0800, Josh Poimboeuf wrote: > On Thu, Dec 05, 2024 at 04:53:03PM -0800, Josh Poimboeuf wrote: > > On Thu, Dec 05, 2024 at 03:32:47PM -0800, Josh Poimboeuf wrote: > > > On Thu, Nov 21, 2024 at 12:07:19PM -0800, Josh Poimboeuf wrote: > > > > User->user Spectre v2 attacks (including RSB) across context > > > > switches > > > > are already mitigated by IBPB in cond_mitigation(), if enabled > > > > globally > > > > or if either the prev or the next task has opted in to > > > > protection. RSB > > > > filling without IBPB serves no purpose for protecting user > > > > space, as > > > > indirect branches are still vulnerable. > > > > > > Question for Intel/AMD folks: where is it documented that IBPB > > > clears > > > the RSB? I thought I'd seen this somewhere but I can't seem to > > > find it. > > > > For Intel, I found this: > > > > > > https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/post-barrier-return-stack-buffer-predictions.html > > > > "Software that executed before the IBPB command cannot control > > the > > predicted targets of indirect branches executed after the command > > on > > the same logical processor. 
The term indirect branch in this > > context > > includes near return instructions, so these predicted targets may > > come > > from the RSB. > > > > This article uses the term RSB-barrier to refer to either an IBPB > > command event, or (on processors which support enhanced IBRS) > > either a > > VM exit with IBRS set to 1 or setting IBRS to 1 after a VM exit." > > > > I haven't seen anything that explicit for AMD. > > Found it. As Andrew mentioned earlier, AMD IBPB only clears RSB if > the > IBPB_RET CPUID bit is set. From APM vol 3: > > CPUID Fn8000_0008_EBX Extended Feature Identifiers: > > 30 IBPB_RET The processor clears the return address > predictor when MSR PRED_CMD.IBPB is written > to 1. > > We check that already for the IBPB entry mitigation, but now we'll > also > need to do so for the context switch IBPB. > > Question for AMD, does SBPB behave the same way, i.e. does it clear > RSB > if IBPB_RET? I'm not sure about this. I'll ask around internally. Amit ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2 2/2] x86/bugs: Don't fill RSB on context switch with eIBRS 2024-12-06 23:02 ` Josh Poimboeuf 2024-12-30 14:54 ` Shah, Amit @ 2025-01-08 11:50 ` Shah, Amit 1 sibling, 0 replies; 31+ messages in thread From: Shah, Amit @ 2025-01-08 11:50 UTC (permalink / raw) To: jpoimboe@kernel.org, x86@kernel.org Cc: corbet@lwn.net, pawan.kumar.gupta@linux.intel.com, kai.huang@intel.com, kvm@vger.kernel.org, andrew.cooper3@citrix.com, dave.hansen@linux.intel.com, Lendacky, Thomas, daniel.sneddon@linux.intel.com, boris.ostrovsky@oracle.com, linux-kernel@vger.kernel.org, seanjc@google.com, mingo@redhat.com, pbonzini@redhat.com, tglx@linutronix.de, Moger, Babu, Das1, Sandipan, dwmw@amazon.co.uk, amit@kernel.org, hpa@zytor.com, peterz@infradead.org, bp@alien8.de, Kaplan, David On Fri, 2024-12-06 at 15:02 -0800, Josh Poimboeuf wrote: > On Thu, Dec 05, 2024 at 04:53:03PM -0800, Josh Poimboeuf wrote: > > On Thu, Dec 05, 2024 at 03:32:47PM -0800, Josh Poimboeuf wrote: > > > On Thu, Nov 21, 2024 at 12:07:19PM -0800, Josh Poimboeuf wrote: > > > > User->user Spectre v2 attacks (including RSB) across context > > > > switches > > > > are already mitigated by IBPB in cond_mitigation(), if enabled > > > > globally > > > > or if either the prev or the next task has opted in to > > > > protection. RSB > > > > filling without IBPB serves no purpose for protecting user > > > > space, as > > > > indirect branches are still vulnerable. > > > > > > Question for Intel/AMD folks: where is it documented that IBPB > > > clears > > > the RSB? I thought I'd seen this somewhere but I can't seem to > > > find it. 
> > > > For Intel, I found this: > > > > > > https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/post-barrier-return-stack-buffer-predictions.html > > > > "Software that executed before the IBPB command cannot control > > the > > predicted targets of indirect branches executed after the command > > on > > the same logical processor. The term indirect branch in this > > context > > includes near return instructions, so these predicted targets may > > come > > from the RSB. > > > > This article uses the term RSB-barrier to refer to either an IBPB > > command event, or (on processors which support enhanced IBRS) > > either a > > VM exit with IBRS set to 1 or setting IBRS to 1 after a VM exit." > > > > I haven't seen anything that explicit for AMD. > > Found it. As Andrew mentioned earlier, AMD IBPB only clears RSB if > the > IBPB_RET CPUID bit is set. From APM vol 3: > > CPUID Fn8000_0008_EBX Extended Feature Identifiers: > > 30 IBPB_RET The processor clears the return address > predictor when MSR PRED_CMD.IBPB is written > to 1. > > We check that already for the IBPB entry mitigation, but now we'll > also > need to do so for the context switch IBPB. > > Question for AMD, does SBPB behave the same way, i.e. does it clear > RSB > if IBPB_RET? That's correct - SBPB clears the RSB only when IBPB_RET is set. Amit ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2 2/2] x86/bugs: Don't fill RSB on context switch with eIBRS 2024-12-05 23:32 ` Josh Poimboeuf 2024-12-06 0:53 ` Josh Poimboeuf @ 2024-12-06 10:10 ` Shah, Amit 2024-12-09 20:46 ` jpoimboe 1 sibling, 1 reply; 31+ messages in thread From: Shah, Amit @ 2024-12-06 10:10 UTC (permalink / raw) To: jpoimboe@kernel.org, x86@kernel.org Cc: corbet@lwn.net, pawan.kumar.gupta@linux.intel.com, kai.huang@intel.com, kvm@vger.kernel.org, andrew.cooper3@citrix.com, dave.hansen@linux.intel.com, Lendacky, Thomas, daniel.sneddon@linux.intel.com, boris.ostrovsky@oracle.com, linux-kernel@vger.kernel.org, seanjc@google.com, mingo@redhat.com, pbonzini@redhat.com, tglx@linutronix.de, Moger, Babu, Das1, Sandipan, dwmw@amazon.co.uk, hpa@zytor.com, peterz@infradead.org, bp@alien8.de, Kaplan, David On Thu, 2024-12-05 at 15:32 -0800, Josh Poimboeuf wrote: > On Thu, Nov 21, 2024 at 12:07:19PM -0800, Josh Poimboeuf wrote: > > User->user Spectre v2 attacks (including RSB) across context > > switches > > are already mitigated by IBPB in cond_mitigation(), if enabled > > globally > > or if either the prev or the next task has opted in to protection. > > RSB > > filling without IBPB serves no purpose for protecting user space, > > as > > indirect branches are still vulnerable. > > Question for Intel/AMD folks: where is it documented that IBPB clears > the RSB? I thought I'd seen this somewhere but I can't seem to find > it. "AMD64 TECHNOLOGY INDIRECT BRANCH CONTROL EXTENSION" https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/white-papers/111006-architecture-guidelines-update-amd64-technology-indirect-branch-control-extension.pdf has: Indirect branch prediction barrier (IBPB) exists at MSR 0x49 (PRED_CMD) bit 0. This is a write only MSR that both GP faults when software reads it or if software tries to write any of the bits in 63:1. 
When bit zero is written, the processor guarantees that older indirect branches cannot influence predictions of indirect branches in the future. This applies to jmp indirects, call indirects and returns. As this restricts the processor from using all previous indirect branch information, it is intended to only be used by software when switching from one user context to another user context that requires protection, or from one guest to another guest. Amit ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2 2/2] x86/bugs: Don't fill RSB on context switch with eIBRS 2024-12-06 10:10 ` Shah, Amit @ 2024-12-09 20:46 ` jpoimboe 0 siblings, 0 replies; 31+ messages in thread From: jpoimboe @ 2024-12-09 20:46 UTC (permalink / raw) To: Shah, Amit Cc: x86@kernel.org, corbet@lwn.net, pawan.kumar.gupta@linux.intel.com, kai.huang@intel.com, kvm@vger.kernel.org, andrew.cooper3@citrix.com, dave.hansen@linux.intel.com, Lendacky, Thomas, daniel.sneddon@linux.intel.com, boris.ostrovsky@oracle.com, linux-kernel@vger.kernel.org, seanjc@google.com, mingo@redhat.com, pbonzini@redhat.com, tglx@linutronix.de, Moger, Babu, Das1, Sandipan, dwmw@amazon.co.uk, hpa@zytor.com, peterz@infradead.org, bp@alien8.de, Kaplan, David On Fri, Dec 06, 2024 at 10:10:31AM +0000, Shah, Amit wrote: > On Thu, 2024-12-05 at 15:32 -0800, Josh Poimboeuf wrote: > > On Thu, Nov 21, 2024 at 12:07:19PM -0800, Josh Poimboeuf wrote: > > > User->user Spectre v2 attacks (including RSB) across context > > > switches > > > are already mitigated by IBPB in cond_mitigation(), if enabled > > > globally > > > or if either the prev or the next task has opted in to protection. > > > RSB > > > filling without IBPB serves no purpose for protecting user space, > > > as > > > indirect branches are still vulnerable. > > > > Question for Intel/AMD folks: where is it documented that IBPB clears > > the RSB? I thought I'd seen this somewhere but I can't seem to find > > it. > > "AMD64 TECHNOLOGY INDIRECT BRANCH CONTROL EXTENSION" > https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/white-papers/111006-architecture-guidelines-update-amd64-technology-indirect-branch-control-extension.pdf > > has: > > Indirect branch prediction barrier (IBPB) exists at MSR 0x49 (PRED_CMD) > it 0. This is a write only MSR that both GP faults when software reads > it or if software tries to write any of the bits in 63:1. 
When bit zero > is written, the processor guarantees that older indirect branches > cannot influence predictions of indirect branches in the future. This > applies to jmp indirects, call indirects and returns. As this restricts > the processor from using all previous indirect branch information, it > is intended to only be used by software when switching from one user > context to another user context that requires protection, or from one > guest to another guest. Sounds like that needs to be updated to mention the IBPB_RET bit. -- Josh ^ permalink raw reply [flat|nested] 31+ messages in thread
* [RFC PATCH v3 0/2] Add support for the ERAPS feature 2024-11-21 20:07 [PATCH v2 0/2] x86/bugs: RSB tweaks Josh Poimboeuf 2024-11-21 20:07 ` [PATCH v2 1/2] x86/bugs: Don't fill RSB on VMEXIT with eIBRS+retpoline Josh Poimboeuf 2024-11-21 20:07 ` [PATCH v2 2/2] x86/bugs: Don't fill RSB on context switch with eIBRS Josh Poimboeuf @ 2024-11-28 13:28 ` Amit Shah 2024-11-28 13:28 ` [RFC PATCH v3 1/2] x86: cpu/bugs: add AMD ERAPS support; hardware flushes RSB Amit Shah 2024-11-28 13:28 ` [RFC PATCH v3 2/2] x86: kvm: svm: advertise ERAPS (larger RSB) support to guests Amit Shah 2 siblings, 2 replies; 31+ messages in thread From: Amit Shah @ 2024-11-28 13:28 UTC (permalink / raw) To: linux-kernel, kvm, x86, linux-doc Cc: amit.shah, thomas.lendacky, bp, tglx, peterz, jpoimboe, pawan.kumar.gupta, corbet, mingo, dave.hansen, hpa, seanjc, pbonzini, daniel.sneddon, kai.huang, sandipan.das, boris.ostrovsky, Babu.Moger, david.kaplan, dwmw, andrew.cooper3, Amit Shah Newer AMD CPUs (Zen5+) have the ERAPS feature bit that allows us to remove the RSB filling loops required during context switches and VM exits. This patchset implements the feature to: * remove the need for RSB filling on context switches and VMEXITs in host and guests * allow KVM guests to use the full default RSB stack The feature isn't yet part of an APM update that details its working, so this is still tagged as RFC. The notes at https://amitshah.net/2024/11/eraps-reduces-software-tax-for-hardware-bugs/ may help follow along till the APM is public. v3: * rebase on top of Josh's RSB tweaks series * with that rebase, only the non-AutoIBRS case needs special ERAPS support. 
AutoIBRS is currently disabled when SEV-SNP is active (commit acaa4b5c4c8) * remove comment about RSB_CLEAR_LOOPS and the size of the RSB -- it's not necessary anymore with the rework * remove comment from patch 2 in svm.c in favour of the commit message v2: * reword comments to highlight context switch as the main trigger for RSB flushes in hardware (Dave Hansen) * Split out outdated comment updates in (v1) patch1 to be a standalone patch1 in this series, to reinforce RSB filling is only required for RSB poisoning cases for AMD * Remove mentions of BTC/BTC_NO (Andrew Cooper) * Add braces in case stmt (kernel test robot) * s/boot_cpu_has/cpu_feature_enabled (Boris Petkov) Amit Shah (2): x86: cpu/bugs: add AMD ERAPS support; hardware flushes RSB x86: kvm: svm: advertise ERAPS (larger RSB) support to guests Documentation/admin-guide/hw-vuln/spectre.rst | 5 ++-- arch/x86/include/asm/cpufeatures.h | 1 + arch/x86/include/asm/svm.h | 6 +++- arch/x86/kernel/cpu/bugs.c | 6 +++- arch/x86/kvm/cpuid.c | 18 ++++++++++-- arch/x86/kvm/svm/svm.c | 29 +++++++++++++++++++ arch/x86/kvm/svm/svm.h | 15 ++++++++++ 7 files changed, 74 insertions(+), 6 deletions(-) -- 2.47.0 ^ permalink raw reply [flat|nested] 31+ messages in thread
* [RFC PATCH v3 1/2] x86: cpu/bugs: add AMD ERAPS support; hardware flushes RSB 2024-11-28 13:28 ` [RFC PATCH v3 0/2] Add support for the ERAPS feature Amit Shah @ 2024-11-28 13:28 ` Amit Shah 2024-12-02 17:26 ` Dave Hansen 2024-11-28 13:28 ` [RFC PATCH v3 2/2] x86: kvm: svm: advertise ERAPS (larger RSB) support to guests Amit Shah 1 sibling, 1 reply; 31+ messages in thread From: Amit Shah @ 2024-11-28 13:28 UTC (permalink / raw) To: linux-kernel, kvm, x86, linux-doc Cc: amit.shah, thomas.lendacky, bp, tglx, peterz, jpoimboe, pawan.kumar.gupta, corbet, mingo, dave.hansen, hpa, seanjc, pbonzini, daniel.sneddon, kai.huang, sandipan.das, boris.ostrovsky, Babu.Moger, david.kaplan, dwmw, andrew.cooper3 From: Amit Shah <amit.shah@amd.com> When Automatic IBRS is disabled, Linux flushes the RSB on every context switch. This RSB flush is not necessary in software with the ERAPS feature on Zen5+ CPUs that flushes the RSB in hardware on a context switch (triggered by mov-to-CR3). Additionally, the ERAPS feature also tags host and guest addresses in the RSB - eliminating the need for software flushing of the RSB on VMEXIT. Disable all RSB flushing by Linux when the CPU has ERAPS. Feature mentioned in AMD PPR 57238. Will be resubmitted once APM is public - which I'm told is imminent. 
Signed-off-by: Amit Shah <amit.shah@amd.com> --- Documentation/admin-guide/hw-vuln/spectre.rst | 5 +++-- arch/x86/include/asm/cpufeatures.h | 1 + arch/x86/kernel/cpu/bugs.c | 6 +++++- 3 files changed, 9 insertions(+), 3 deletions(-) diff --git a/Documentation/admin-guide/hw-vuln/spectre.rst b/Documentation/admin-guide/hw-vuln/spectre.rst index 132e0bc6007e..647c10c0307a 100644 --- a/Documentation/admin-guide/hw-vuln/spectre.rst +++ b/Documentation/admin-guide/hw-vuln/spectre.rst @@ -417,9 +417,10 @@ The possible values in this file are: - Return stack buffer (RSB) protection status: - ============= =========================================== + ============= ======================================================== 'RSB filling' Protection of RSB on context switch enabled - ============= =========================================== + 'ERAPS' Hardware RSB flush on context switches + guest/host tags + ============= ======================================================== - EIBRS Post-barrier Return Stack Buffer (PBRSB) protection status: diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index 17b6590748c0..79a1373050f7 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -461,6 +461,7 @@ #define X86_FEATURE_AUTOIBRS (20*32+ 8) /* Automatic IBRS */ #define X86_FEATURE_NO_SMM_CTL_MSR (20*32+ 9) /* SMM_CTL MSR is not present */ +#define X86_FEATURE_ERAPS (20*32+24) /* Enhanced RAP / RSB / RAS Security */ #define X86_FEATURE_SBPB (20*32+27) /* Selective Branch Prediction Barrier */ #define X86_FEATURE_IBPB_BRTYPE (20*32+28) /* MSR_PRED_CMD[IBPB] flushes all branch type predictions */ #define X86_FEATURE_SRSO_NO (20*32+29) /* CPU is not affected by SRSO */ diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c index d5102b72f74d..d7af5f811776 100644 --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -1634,6 +1634,9 @@ static void __init spectre_v2_mitigate_rsb(enum 
spectre_v2_mitigation mode) case SPECTRE_V2_RETPOLINE: case SPECTRE_V2_LFENCE: case SPECTRE_V2_IBRS: + if (boot_cpu_has(X86_FEATURE_ERAPS)) + break; + pr_info("Spectre v2 / SpectreRSB: Filling RSB on context switch and VMEXIT\n"); setup_force_cpu_cap(X86_FEATURE_RSB_CTXSW); setup_force_cpu_cap(X86_FEATURE_RSB_VMEXIT); @@ -2850,7 +2853,7 @@ static ssize_t spectre_v2_show_state(char *buf) spectre_v2_enabled == SPECTRE_V2_EIBRS_LFENCE) return sysfs_emit(buf, "Vulnerable: eIBRS+LFENCE with unprivileged eBPF and SMT\n"); - return sysfs_emit(buf, "%s%s%s%s%s%s%s%s\n", + return sysfs_emit(buf, "%s%s%s%s%s%s%s%s%s\n", spectre_v2_strings[spectre_v2_enabled], ibpb_state(), boot_cpu_has(X86_FEATURE_USE_IBRS_FW) ? "; IBRS_FW" : "", @@ -2858,6 +2861,7 @@ static ssize_t spectre_v2_show_state(char *buf) boot_cpu_has(X86_FEATURE_RSB_CTXSW) ? "; RSB filling" : "", pbrsb_eibrs_state(), spectre_bhi_state(), + boot_cpu_has(X86_FEATURE_ERAPS) ? "; ERAPS hardware RSB flush" : "", /* this should always be at the end */ spectre_v2_module_string()); } -- 2.47.0 ^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [RFC PATCH v3 1/2] x86: cpu/bugs: add AMD ERAPS support; hardware flushes RSB 2024-11-28 13:28 ` [RFC PATCH v3 1/2] x86: cpu/bugs: add AMD ERAPS support; hardware flushes RSB Amit Shah @ 2024-12-02 17:26 ` Dave Hansen 2024-12-02 18:09 ` Amit Shah 0 siblings, 1 reply; 31+ messages in thread From: Dave Hansen @ 2024-12-02 17:26 UTC (permalink / raw) To: Amit Shah, linux-kernel, kvm, x86, linux-doc Cc: amit.shah, thomas.lendacky, bp, tglx, peterz, jpoimboe, pawan.kumar.gupta, corbet, mingo, dave.hansen, hpa, seanjc, pbonzini, daniel.sneddon, kai.huang, sandipan.das, boris.ostrovsky, Babu.Moger, david.kaplan, dwmw, andrew.cooper3 On 11/28/24 05:28, Amit Shah wrote: > From: Amit Shah <amit.shah@amd.com> > > When Automatic IBRS is disabled, Linux flushed the RSB on every context > switch. This RSB flush is not necessary in software with the ERAPS > feature on Zen5+ CPUs that flushes the RSB in hardware on a context > switch (triggered by mov-to-CR3). > > Additionally, the ERAPS feature also tags host and guest addresses in > the RSB - eliminating the need for software flushing of the RSB on > VMEXIT. > > Disable all RSB flushing by Linux when the CPU has ERAPS. > > Feature mentioned in AMD PPR 57238. Will be resubmitted once APM is > public - which I'm told is imminent. There was a _lot_ of discussion about this. But all of that discussion seems to have been trimmed out and it seems like we're basically back to: "this is new hardware supposed to mitigate SpectreRSB, thus it mitigates SpectreRSB." Could we please summarize the previous discussions in the changelog? Otherwise, I fear it will be lost. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC PATCH v3 1/2] x86: cpu/bugs: add AMD ERAPS support; hardware flushes RSB 2024-12-02 17:26 ` Dave Hansen @ 2024-12-02 18:09 ` Amit Shah 2024-12-02 18:25 ` Dave Hansen 0 siblings, 1 reply; 31+ messages in thread From: Amit Shah @ 2024-12-02 18:09 UTC (permalink / raw) To: Dave Hansen, linux-kernel, kvm, x86, linux-doc Cc: thomas.lendacky, bp, tglx, peterz, jpoimboe, pawan.kumar.gupta, corbet, mingo, dave.hansen, hpa, seanjc, pbonzini, daniel.sneddon, kai.huang, sandipan.das, boris.ostrovsky, Babu.Moger, david.kaplan, dwmw, andrew.cooper3, Amit Shah On Mon, 2024-12-02 at 09:26 -0800, Dave Hansen wrote: > On 11/28/24 05:28, Amit Shah wrote: > > From: Amit Shah <amit.shah@amd.com> > > > > When Automatic IBRS is disabled, Linux flushed the RSB on every > > context > > switch. This RSB flush is not necessary in software with the ERAPS > > feature on Zen5+ CPUs that flushes the RSB in hardware on a context > > switch (triggered by mov-to-CR3). > > > > Additionally, the ERAPS feature also tags host and guest addresses > > in > > the RSB - eliminating the need for software flushing of the RSB on > > VMEXIT. > > > > Disable all RSB flushing by Linux when the CPU has ERAPS. > > > > Feature mentioned in AMD PPR 57238. Will be resubmitted once APM > > is > > public - which I'm told is imminent. > > There was a _lot_ of discussion about this. But all of that > discussion > seems to have been trimmed out and it seems like we're basically back > to: "this is new hardware supposed to mitigate SpectreRSB, thus it > mitigates SpectreRSB." Absolutely, I don't want that to get lost -- but I think that got captured in Josh's rework patchset. With that rework, I don't even need this patchset for the hardware feature to work, because we now rely on AutoIBRS to do the RSB clearing; and the hardware takes care of AutoIBRS and ERAPS interaction in Zen5. The only thing this patch now does is to handle the AutoIBRS-disabled case -- which happens when SEV-SNP is turned on (i.e. 
let the hw clear the RSB instead of stuffing it in Linux). I can still include the summary of the discussion in this patch - I just feel it isn't necessary with the rework. Amit ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC PATCH v3 1/2] x86: cpu/bugs: add AMD ERAPS support; hardware flushes RSB 2024-12-02 18:09 ` Amit Shah @ 2024-12-02 18:25 ` Dave Hansen 2024-12-02 18:36 ` Sean Christopherson 0 siblings, 1 reply; 31+ messages in thread From: Dave Hansen @ 2024-12-02 18:25 UTC (permalink / raw) To: Amit Shah, linux-kernel, kvm, x86, linux-doc Cc: thomas.lendacky, bp, tglx, peterz, jpoimboe, pawan.kumar.gupta, corbet, mingo, dave.hansen, hpa, seanjc, pbonzini, daniel.sneddon, kai.huang, sandipan.das, boris.ostrovsky, Babu.Moger, david.kaplan, dwmw, andrew.cooper3, Amit Shah On 12/2/24 10:09, Amit Shah wrote: > I can still include the summary of the discussion in this patch - I > just feel it isn't necessary with the rework. It's necessary. There's a new hardware feature. You want it to replace a software sequence in certain situations. You have to make an actual, coherent argument as to why the hardware feature is a suitable replacement. For instance (and I'm pretty sure we've gone over this more than once), the changelog here still makes the claim that a "context switch" and a "mov-to-CR3" are the same thing. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC PATCH v3 1/2] x86: cpu/bugs: add AMD ERAPS support; hardware flushes RSB 2024-12-02 18:25 ` Dave Hansen @ 2024-12-02 18:36 ` Sean Christopherson 0 siblings, 0 replies; 31+ messages in thread From: Sean Christopherson @ 2024-12-02 18:36 UTC (permalink / raw) To: Dave Hansen Cc: Amit Shah, linux-kernel, kvm, x86, linux-doc, thomas.lendacky, bp, tglx, peterz, jpoimboe, pawan.kumar.gupta, corbet, mingo, dave.hansen, hpa, pbonzini, daniel.sneddon, kai.huang, sandipan.das, boris.ostrovsky, Babu.Moger, david.kaplan, dwmw, andrew.cooper3, Amit Shah On Mon, Dec 02, 2024, Dave Hansen wrote: > On 12/2/24 10:09, Amit Shah wrote: > > I can still include the summary of the discussion in this patch - I > > just feel it isn't necessary with the rework. > > It's necessary. > > There's a new hardware feature. You want it to replace a software > sequence in certain situations. You have to make an actual, coherent > argument argument as to why the hardware feature is a suitable replacement. > > For instance (and I'm pretty sure we've gone over this more than once), > the changelog here still makes the claim that a "context switch" and a > "mov-to-CR3" are the same thing. +1000. I want a crisp, precise description of the actual hardware behavior, so that KVM can do the right thing when virtualizing ERAPS. ^ permalink raw reply [flat|nested] 31+ messages in thread
* [RFC PATCH v3 2/2] x86: kvm: svm: advertise ERAPS (larger RSB) support to guests 2024-11-28 13:28 ` [RFC PATCH v3 0/2] Add support for the ERAPS feature Amit Shah 2024-11-28 13:28 ` [RFC PATCH v3 1/2] x86: cpu/bugs: add AMD ERAPS support; hardware flushes RSB Amit Shah @ 2024-11-28 13:28 ` Amit Shah 2024-12-02 18:30 ` Sean Christopherson 1 sibling, 1 reply; 31+ messages in thread From: Amit Shah @ 2024-11-28 13:28 UTC (permalink / raw) To: linux-kernel, kvm, x86, linux-doc Cc: amit.shah, thomas.lendacky, bp, tglx, peterz, jpoimboe, pawan.kumar.gupta, corbet, mingo, dave.hansen, hpa, seanjc, pbonzini, daniel.sneddon, kai.huang, sandipan.das, boris.ostrovsky, Babu.Moger, david.kaplan, dwmw, andrew.cooper3 From: Amit Shah <amit.shah@amd.com> AMD CPUs with the ERAPS feature (Zen5+) have a larger RSB (aka RAP). While the new default RSB size is used on the host without any software modification necessary, the RSB size for guests is limited to the older value (32 entries) for backwards compatibility. With this patch, KVM enables guest mode to also use the default number of entries by setting the new ALLOW_LARGER_RAP bit in the VMCB. The two cases for backward compatibility that need special handling are nested guests, and guests using shadow paging (or when NPT is disabled): For nested guests: the ERAPS feature adds host/guest tagging to entries in the RSB, but does not distinguish between ASIDs. On a nested exit, the L0 hypervisor instructs the hardware (via another new VMCB bit, FLUSH_RAP_ON_VMRUN) to flush the RSB on the next VMRUN to prevent RSB poisoning attacks from an L2 guest to an L1 guest. With that in place, this feature can be exposed to guests. For shadow paging guests: do not expose this feature to guests; only expose if nested paging is enabled, to ensure a context switch within a guest triggers a context switch on the CPU -- thereby ensuring guest context switches flush guest RSB entries. 
For shadow paging, the CPU's CR3 is not used for guest processes, and hence cannot benefit from this feature. Signed-off-by: Amit Shah <amit.shah@amd.com> --- arch/x86/include/asm/svm.h | 6 +++++- arch/x86/kvm/cpuid.c | 18 ++++++++++++++++-- arch/x86/kvm/svm/svm.c | 29 +++++++++++++++++++++++++++++ arch/x86/kvm/svm/svm.h | 15 +++++++++++++++ 4 files changed, 65 insertions(+), 3 deletions(-) diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h index 2b59b9951c90..f8584a63c859 100644 --- a/arch/x86/include/asm/svm.h +++ b/arch/x86/include/asm/svm.h @@ -129,7 +129,8 @@ struct __attribute__ ((__packed__)) vmcb_control_area { u64 tsc_offset; u32 asid; u8 tlb_ctl; - u8 reserved_2[3]; + u8 erap_ctl; + u8 reserved_2[2]; u32 int_ctl; u32 int_vector; u32 int_state; @@ -175,6 +176,9 @@ struct __attribute__ ((__packed__)) vmcb_control_area { #define TLB_CONTROL_FLUSH_ASID 3 #define TLB_CONTROL_FLUSH_ASID_LOCAL 7 +#define ERAP_CONTROL_ALLOW_LARGER_RAP 0 +#define ERAP_CONTROL_FLUSH_RAP 1 + #define V_TPR_MASK 0x0f #define V_IRQ_SHIFT 8 diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 097bdc022d0f..dd589670a716 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -803,6 +803,8 @@ void kvm_set_cpu_caps(void) F(WRMSR_XX_BASE_NS) ); + if (tdp_enabled) + kvm_cpu_cap_check_and_set(X86_FEATURE_ERAPS); kvm_cpu_cap_check_and_set(X86_FEATURE_SBPB); kvm_cpu_cap_check_and_set(X86_FEATURE_IBPB_BRTYPE); kvm_cpu_cap_check_and_set(X86_FEATURE_SRSO_NO); @@ -1362,10 +1364,22 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function) case 0x80000020: entry->eax = entry->ebx = entry->ecx = entry->edx = 0; break; - case 0x80000021: - entry->ebx = entry->ecx = entry->edx = 0; + case 0x80000021: { + unsigned int ebx_mask = 0; + + entry->ecx = entry->edx = 0; cpuid_entry_override(entry, CPUID_8000_0021_EAX); + + /* + * Bits 23:16 in EBX indicate the size of the RSB. + * Expose the value in the hardware to the guest. 
+ */ + if (kvm_cpu_cap_has(X86_FEATURE_ERAPS)) + ebx_mask |= GENMASK(23, 16); + + entry->ebx &= ebx_mask; break; + } /* AMD Extended Performance Monitoring and Debug */ case 0x80000022: { union cpuid_0x80000022_ebx ebx; diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index dd15cc635655..9b055de079cb 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -1360,6 +1360,13 @@ static void init_vmcb(struct kvm_vcpu *vcpu) if (boot_cpu_has(X86_FEATURE_V_SPEC_CTRL)) set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SPEC_CTRL, 1, 1); + /* + * If the hardware has a larger RSB, use it in the guest context as + * well. + */ + if (cpu_feature_enabled(X86_FEATURE_ERAPS) && npt_enabled) + vmcb_set_larger_rap(svm->vmcb); + if (kvm_vcpu_apicv_active(vcpu)) avic_init_vmcb(svm, vmcb); @@ -3395,6 +3402,7 @@ static void dump_vmcb(struct kvm_vcpu *vcpu) pr_err("%-20s%016llx\n", "tsc_offset:", control->tsc_offset); pr_err("%-20s%d\n", "asid:", control->asid); pr_err("%-20s%d\n", "tlb_ctl:", control->tlb_ctl); + pr_err("%-20s%d\n", "erap_ctl:", control->erap_ctl); pr_err("%-20s%08x\n", "int_ctl:", control->int_ctl); pr_err("%-20s%08x\n", "int_vector:", control->int_vector); pr_err("%-20s%08x\n", "int_state:", control->int_state); @@ -3561,6 +3569,27 @@ static int svm_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath) trace_kvm_nested_vmexit(vcpu, KVM_ISA_SVM); + if (boot_cpu_has(X86_FEATURE_ERAPS) + && vmcb_is_larger_rap(svm->vmcb01.ptr)) { + /* + * XXX a few further optimizations can be made: + * + * 1. In pre_svm_run() we can reset this bit when a hw + * TLB flush has happened - any context switch on a + * CPU (which causes a TLB flush) auto-flushes the RSB + * - eg when this vCPU is scheduled on a different + * pCPU. + * + * 2. This is also not needed in the case where the + * vCPU is being scheduled on the same pCPU, but there + * was a context switch between the #VMEXIT and VMRUN. + * + * 3. 
If the guest returns to L2 again after this + * #VMEXIT, there's no need to flush the RSB. + */ + vmcb_set_flush_rap(svm->vmcb01.ptr); + } + vmexit = nested_svm_exit_special(svm); if (vmexit == NESTED_EXIT_CONTINUE) diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h index 43fa6a16eb19..8a7877f46dc5 100644 --- a/arch/x86/kvm/svm/svm.h +++ b/arch/x86/kvm/svm/svm.h @@ -500,6 +500,21 @@ static inline bool svm_is_intercept(struct vcpu_svm *svm, int bit) return vmcb_is_intercept(&svm->vmcb->control, bit); } +static inline void vmcb_set_flush_rap(struct vmcb *vmcb) +{ + __set_bit(ERAP_CONTROL_FLUSH_RAP, (unsigned long *)&vmcb->control.erap_ctl); +} + +static inline void vmcb_set_larger_rap(struct vmcb *vmcb) +{ + __set_bit(ERAP_CONTROL_ALLOW_LARGER_RAP, (unsigned long *)&vmcb->control.erap_ctl); +} + +static inline bool vmcb_is_larger_rap(struct vmcb *vmcb) +{ + return test_bit(ERAP_CONTROL_ALLOW_LARGER_RAP, (unsigned long *)&vmcb->control.erap_ctl); +} + static inline bool nested_vgif_enabled(struct vcpu_svm *svm) { return guest_can_use(&svm->vcpu, X86_FEATURE_VGIF) && -- 2.47.0 ^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [RFC PATCH v3 2/2] x86: kvm: svm: advertise ERAPS (larger RSB) support to guests 2024-11-28 13:28 ` [RFC PATCH v3 2/2] x86: kvm: svm: advertise ERAPS (larger RSB) support to guests Amit Shah @ 2024-12-02 18:30 ` Sean Christopherson 2025-03-27 11:10 ` Shah, Amit 0 siblings, 1 reply; 31+ messages in thread From: Sean Christopherson @ 2024-12-02 18:30 UTC (permalink / raw) To: Amit Shah Cc: linux-kernel, kvm, x86, linux-doc, amit.shah, thomas.lendacky, bp, tglx, peterz, jpoimboe, pawan.kumar.gupta, corbet, mingo, dave.hansen, hpa, pbonzini, daniel.sneddon, kai.huang, sandipan.das, boris.ostrovsky, Babu.Moger, david.kaplan, dwmw, andrew.cooper3 KVM: SVM: Please read through the relevant maintainer handbooks, there are warts all over. Documentation/process/maintainer-kvm-x86.rst Documentation/process/maintainer-tip.rst And the shortlog is wrong. The patch itself is also broken. KVM should (a) add support for virtualizing ERAPS and (b) advertise support to *userspace*. The userspace VMM ultimately decides what to expose/enable for the guest. On Thu, Nov 28, 2024, Amit Shah wrote: > From: Amit Shah <amit.shah@amd.com> > > AMD CPUs with the ERAPS feature (Zen5+) have a larger RSB (aka RAP). Please spell out ERAPS at least once (I assume it's "Enhanced RAPs"?) and very briefly document what it does. > While the new default RSB size is used on the host without any software > modification necessary, the RSB size for guests is limited to the older Please describe hardware behavior, and make it abundantly clear that the changelog is talking about hardware behavior. One of my pet peeves (understatement) with the APM is that it does a shit job of explaining the actual architectural behavior. > value (32 entries) for backwards compatibility. Backwards compatibility with what? And how far back? E.g. have CPUs with a RAP always had 32 entries? > With this patch, KVM No "this patch" > enables guest mode Use imperative mood. 
> to also use the default number of entries by setting "default" is clearly wrong, since the *default* behavior is to use > the new ALLOW_LARGER_RAP bit in the VMCB. I detest the "ALLOW_LARGER" name. "Allow" implies the guest somehow has a choice. And "Larger" implies there's an even larger size And again, please explicitly describe what this bit does. > The two cases for backward compatibility that need special handling are > nested guests, and guests using shadow paging Guests don't use shadow paging, *KVM* uses > (or when NPT is disabled): "i.e", not "or". "Or" makes it sound like "NPT is disabled" is separate case from shadow paging. > For nested guests: the ERAPS feature adds host/guest tagging to entries > in the RSB, but does not distinguish between ASIDs. On a nested exit, > the L0 hypervisor instructs the hardware (via another new VMCB bit, I strongly suspect this was copied from the APM. Please don't do that. State what change is being for *KVM*, not for "the L0 hypervisor". This verbiage mixes hardware behavior with software behavior, which again is why I hate much of the APM's wording. > FLUSH_RAP_ON_VMRUN) to flush the RSB on the next VMRUN to prevent RSB > poisoning attacks from an L2 guest to an L1 guest. With that in place, > this feature can be exposed to guests. ERAPS can also be advertised if nested virtualization is disabled, no? I think it makes sense to first add support for ERAPS if "!nested", and then in a separate path, add support for ERAPS when nested virtualization is enabled. Partly so that it's easier for readers to understand why nested VMs are special, but mainly because the nested virtualization support is sorely lacking. > For shadow paging guests: do not expose this feature to guests; only > expose if nested paging is enabled, to ensure a context switch within > a guest triggers a context switch on the CPU -- thereby ensuring guest > context switches flush guest RSB entries. Huh? 
> For shadow paging, the CPU's CR3 is not used for guest processes, and hence > cannot benefit from this feature. What does that have to do with anything? > Signed-off-by: Amit Shah <amit.shah@amd.com> > --- > arch/x86/include/asm/svm.h | 6 +++++- > arch/x86/kvm/cpuid.c | 18 ++++++++++++++++-- > arch/x86/kvm/svm/svm.c | 29 +++++++++++++++++++++++++++++ > arch/x86/kvm/svm/svm.h | 15 +++++++++++++++ > 4 files changed, 65 insertions(+), 3 deletions(-) > > diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h > index 2b59b9951c90..f8584a63c859 100644 > --- a/arch/x86/include/asm/svm.h > +++ b/arch/x86/include/asm/svm.h > @@ -129,7 +129,8 @@ struct __attribute__ ((__packed__)) vmcb_control_area { > u64 tsc_offset; > u32 asid; > u8 tlb_ctl; > - u8 reserved_2[3]; > + u8 erap_ctl; > + u8 reserved_2[2]; > u32 int_ctl; > u32 int_vector; > u32 int_state; > @@ -175,6 +176,9 @@ struct __attribute__ ((__packed__)) vmcb_control_area { > #define TLB_CONTROL_FLUSH_ASID 3 > #define TLB_CONTROL_FLUSH_ASID_LOCAL 7 > > +#define ERAP_CONTROL_ALLOW_LARGER_RAP 0 > +#define ERAP_CONTROL_FLUSH_RAP 1 Assuming the control enables using the full RAP size, these should be something like: #define ERAP_CONTROL_ENABLE_FULL_RAP_MASK BIT(0) #define ERAP_CONTROL_FLUSH_RAP_ON_VMRUN BIT(1) > #define V_TPR_MASK 0x0f > > #define V_IRQ_SHIFT 8 > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c > index 097bdc022d0f..dd589670a716 100644 > --- a/arch/x86/kvm/cpuid.c > +++ b/arch/x86/kvm/cpuid.c > @@ -803,6 +803,8 @@ void kvm_set_cpu_caps(void) > F(WRMSR_XX_BASE_NS) > ); > > + if (tdp_enabled) > + kvm_cpu_cap_check_and_set(X86_FEATURE_ERAPS); > kvm_cpu_cap_check_and_set(X86_FEATURE_SBPB); > kvm_cpu_cap_check_and_set(X86_FEATURE_IBPB_BRTYPE); > kvm_cpu_cap_check_and_set(X86_FEATURE_SRSO_NO); > @@ -1362,10 +1364,22 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function) > case 0x80000020: > entry->eax = entry->ebx = entry->ecx = entry->edx = 0; > break; > - case 
0x80000021: > - entry->ebx = entry->ecx = entry->edx = 0; > + case 0x80000021: { > + unsigned int ebx_mask = 0; > + > + entry->ecx = entry->edx = 0; > cpuid_entry_override(entry, CPUID_8000_0021_EAX); > + > + /* > + * Bits 23:16 in EBX indicate the size of the RSB. Is this enumeration explicitly tied to ERAPS? > + * Expose the value in the hardware to the guest. __do_cpuid_func() is used to advertise KVM's supported CPUID to host userspace, not to the guest. Side topic, what happens when Zen6 adds EVEN_LARGER_RAP? Enumerating the size of the RAP suggets it's likely to change in the future. > + */ > + if (kvm_cpu_cap_has(X86_FEATURE_ERAPS)) > + ebx_mask |= GENMASK(23, 16); > + > + entry->ebx &= ebx_mask; This is a waste of code and makes it unnecessarily difficult to read. Just do: if (kvm_cpu_cap_has(X86_FEATURE_ERAPS)) entry->ebx &= GENMASK(23, 16); else entry->ebx = 0; > break; > + } > /* AMD Extended Performance Monitoring and Debug */ > case 0x80000022: { > union cpuid_0x80000022_ebx ebx; > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c > index dd15cc635655..9b055de079cb 100644 > --- a/arch/x86/kvm/svm/svm.c > +++ b/arch/x86/kvm/svm/svm.c > @@ -1360,6 +1360,13 @@ static void init_vmcb(struct kvm_vcpu *vcpu) > if (boot_cpu_has(X86_FEATURE_V_SPEC_CTRL)) > set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SPEC_CTRL, 1, 1); > > + /* > + * If the hardware has a larger RSB, use it in the guest context as > + * well. > + */ > + if (cpu_feature_enabled(X86_FEATURE_ERAPS) && npt_enabled) This is wrong. Userspace owns the vCPU model, not KVM. If userspace wants to disable ERAPS for the guest, say because of a hardware vulnerability, then KVM needs to honor that. And this should be kvm_cpu_cap_has(), not copy+paste of the code that enables the KVM capability. > + vmcb_set_larger_rap(svm->vmcb); s/set/enable. "set" implies a value in this context. 
> + > if (kvm_vcpu_apicv_active(vcpu)) > avic_init_vmcb(svm, vmcb); > > @@ -3395,6 +3402,7 @@ static void dump_vmcb(struct kvm_vcpu *vcpu) > pr_err("%-20s%016llx\n", "tsc_offset:", control->tsc_offset); > pr_err("%-20s%d\n", "asid:", control->asid); > pr_err("%-20s%d\n", "tlb_ctl:", control->tlb_ctl); > + pr_err("%-20s%d\n", "erap_ctl:", control->erap_ctl); > pr_err("%-20s%08x\n", "int_ctl:", control->int_ctl); > pr_err("%-20s%08x\n", "int_vector:", control->int_vector); > pr_err("%-20s%08x\n", "int_state:", control->int_state); > @@ -3561,6 +3569,27 @@ static int svm_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath) > > trace_kvm_nested_vmexit(vcpu, KVM_ISA_SVM); > > + if (boot_cpu_has(X86_FEATURE_ERAPS) > + && && goes on the previous line. > vmcb_is_larger_rap(svm->vmcb01.ptr)) { This should be something like "vmcb_is_full_rap_size_enabled()". "is_larger_rap()" begs the question: is larger than what? > + /* > + * XXX a few further optimizations can be made: > + * > + * 1. In pre_svm_run() we can reset this bit when a hw > + * TLB flush has happened - any context switch on a > + * CPU (which causes a TLB flush) auto-flushes the RSB > + * - eg when this vCPU is scheduled on a different > + * pCPU. > + * > + * 2. This is also not needed in the case where the > + * vCPU is being scheduled on the same pCPU, but there > + * was a context switch between the #VMEXIT and VMRUN. Either do the optimizations straightaway, or call them out as possible optimizations in the changelog and then explain why it's not worth doing them. The above also mixes hardware behavior and software behavior, to the point where I honestly have no idea who is doing what. "A context switch" tells me nothing useful. > + * > + * 3. If the guest returns to L2 again after this > + * #VMEXIT, there's no need to flush the RSB. This one in particular is trivially easy to implement correctly. 
This also highlights the fact that KVM completely fails to emulate FLUSH_RAP_ON_VMRUN if it's set in vmcb12, though that's somewhat of a moot point because unless I'm missing something, KVM is responsible for emulating host vs. guest hardware tagging. From L1's perspective, the (virtual) CPU, a.k.a. KVM, is responsible for isolating guest (L2) RAP entries from host (L1) RAP entries. And so KVM must flush the RAP on every nested VM-Exit *and* nested VM-Enter, not just on nested VM-Exit. > + */ > + vmcb_set_flush_rap(svm->vmcb01.ptr); Eh, follow the TLB flush helpers and just go with vmcb_flush_rap(). > + } > + > vmexit = nested_svm_exit_special(svm); > > if (vmexit == NESTED_EXIT_CONTINUE) > diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h > index 43fa6a16eb19..8a7877f46dc5 100644 > --- a/arch/x86/kvm/svm/svm.h > +++ b/arch/x86/kvm/svm/svm.h > @@ -500,6 +500,21 @@ static inline bool svm_is_intercept(struct vcpu_svm *svm, int bit) > return vmcb_is_intercept(&svm->vmcb->control, bit); > } > > +static inline void vmcb_set_flush_rap(struct vmcb *vmcb) > +{ > + __set_bit(ERAP_CONTROL_FLUSH_RAP, (unsigned long *)&vmcb->control.erap_ctl); Eww. Don't use the bitops helpers, casting a u8 to an unsigned long, and then having to use the non-atomic helpers makes this way, way more complicated then it actually is. vmcb->control.erap_ctl |= ERAP_CONTROL_FLUSH_RAP_ON_VMRUN; > +} > + > +static inline void vmcb_set_larger_rap(struct vmcb *vmcb) > +{ > + __set_bit(ERAP_CONTROL_ALLOW_LARGER_RAP, (unsigned long *)&vmcb->control.erap_ctl); > +} > + > +static inline bool vmcb_is_larger_rap(struct vmcb *vmcb) > +{ > + return test_bit(ERAP_CONTROL_ALLOW_LARGER_RAP, (unsigned long *)&vmcb->control.erap_ctl); > +} > + > static inline bool nested_vgif_enabled(struct vcpu_svm *svm) > { > return guest_can_use(&svm->vcpu, X86_FEATURE_VGIF) && > -- > 2.47.0 > ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC PATCH v3 2/2] x86: kvm: svm: advertise ERAPS (larger RSB) support to guests 2024-12-02 18:30 ` Sean Christopherson @ 2025-03-27 11:10 ` Shah, Amit 0 siblings, 0 replies; 31+ messages in thread From: Shah, Amit @ 2025-03-27 11:10 UTC (permalink / raw) To: seanjc@google.com Cc: corbet@lwn.net, pawan.kumar.gupta@linux.intel.com, kai.huang@intel.com, jpoimboe@kernel.org, andrew.cooper3@citrix.com, dave.hansen@linux.intel.com, daniel.sneddon@linux.intel.com, Lendacky, Thomas, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, mingo@redhat.com, dwmw@amazon.co.uk, pbonzini@redhat.com, tglx@linutronix.de, Moger, Babu, Das1, Sandipan, linux-doc@vger.kernel.org, hpa@zytor.com, peterz@infradead.org, bp@alien8.de, boris.ostrovsky@oracle.com, Kaplan, David, x86@kernel.org Thanks for the review, Sean. Apologies for not replying sooner. I hope you still have some context in mind. I have a question for you below, but other than that, I'll send out a new rev next week. On Mon, 2024-12-02 at 10:30 -0800, Sean Christopherson wrote: > KVM: SVM: > > Please through the relevant maintainer handbooks, there are warts all > over. > > Documentation/process/maintainer-kvm-x86.rst > Documentation/process/maintainer-tip.rst > > And the shortlog is wrong. The patch itself is also broken. KVM > should (a) add > support for virtualizing ERAPS and (b) advertise support to > *userspace*. The > userspace VMM ultimately decides what to expose/enable for the guest. Point taken - I interpreted it the way I wanted it to - but I can see what you mean - I'll reword. [...] > > While the new default RSB size is used on the host without any > > software > > modification necessary, the RSB size for guests is limited to the > > older > > Please describe hardware behavior, and make it abundantly clear that > the changelog > is talking about hardware behavior. One of my pet peeves > (understatement) with > the APM is that it does a shit job of explaining the actual > architectural behavior. 
OK, more rewording. Hardware behaviour is to just use 64 entries on the host; and limit to 32 entries for any guest. If the right VMCB bits are set, like in the patch, the hardware uses all the 64 entries for the guest. > > value (32 entries) for backwards compatibility. > > Backwards compatibility with what? And how far back? E.g. have CPUs > with a RAP > always had 32 entries? Yes, all CPUs upto Zen5 had 32 entries -- that number's even just hard- coded in the definition of RSB_CLEAR_LOOPS, which is used for the RSB stuffing. The wording is borrowed a bit from the APM -- especially the "default" value. With Zen5, the default hardware RSB size is 64. So to expose the hardware-default size to the guest, this VMCB update is necessary; else the guests use the non-default, trimmed-down 32 entry RSB. > > > to also use the default number of entries by setting > > "default" is clearly wrong, since the *default* behavior is to use Yea; I get this is confusing. The hardware default is 64, but guest default is 32. My intention with that stmt was to say use the native default vs the guest default. Will reword. > > the new ALLOW_LARGER_RAP bit in the VMCB. > > I detest the "ALLOW_LARGER" name. "Allow" implies the guest somehow > has a choice. > And "Larger" implies there's an even larger size I agree, but this is the name used in the APM. What's the preference - deviate from the APM naming, or stick to it, despite the confusion it causes? > > The two cases for backward compatibility that need special handling > > are > > nested guests, and guests using shadow paging > > Guests don't use shadow paging, *KVM* uses > > > (or when NPT is disabled): > > "i.e", not "or". "Or" makes it sound like "NPT is disabled" is > separate case > from shadow paging. > > > For nested guests: the ERAPS feature adds host/guest tagging to > > entries > > in the RSB, but does not distinguish between ASIDs. 
On a nested > > exit, > > the L0 hypervisor instructs the hardware (via another new VMCB bit, > > I strongly suspect this was copied from the APM. Please don't do > that. State > what change is being for *KVM*, not for "the L0 hypervisor". This > verbiage mixes > hardware behavior with software behavior, which again is why I hate > much of the > APM's wording. This isn't necessarily from the APM; this is me describing what the hw does, to set up why KVM needs to do a few other things. I want to say that the hw is in control of host/guest tagging for the RSB entries; but it does not tag ASIDs. So KVM running as L0 hypervisor needs to take certain steps. KVM running as L1 hypervisor does not need to. > > FLUSH_RAP_ON_VMRUN) to flush the RSB on the next VMRUN to prevent > > RSB > > poisoning attacks from an L2 guest to an L1 guest. With that in > > place, > > this feature can be exposed to guests. > > ERAPS can also be advertised if nested virtualization is disabled, > no? I think > it makes sense to first add support for ERAPS if "!nested", and then > in a separate > path, add support for ERAPS when nested virtualization is enabled. > Partly so that > it's easier for readers to understand why nested VMs are special, but > mainly because > the nested virtualization support is sorely lacking. Yea, it makes sense to split these up, I'll try doing that. Nested virt disabled is the common case, so it makes sense to make that more obvious as well. > > For shadow paging guests: do not expose this feature to guests; > > only > > expose if nested paging is enabled, to ensure a context switch > > within > > a guest triggers a context switch on the CPU -- thereby ensuring > > guest > > context switches flush guest RSB entries. > > Huh? OK this is where I'm slightly confused as well - so tell me if I'm wrong: When EPT/NPT is disabled, and shadow MMU is used by kvm, the CR3 register on the CPU holds the PGD of the qemu process. 
So if a task switch happens within the guest, the CR3 on the CPU is not updated, but KVM's shadow MMU routines change the page tables pointed to by that CR3. Contrasting to the NPT case, the CPU's CR3 holds the guest PGD directly, and task switches within the guest cause an update to the CPU's CR3. Am I misremembering and misreading the code? > > For shadow paging, the CPU's CR3 is not used for guest processes, > > and hence > > cannot benefit from this feature. > > What does that have to do with anything? If what I wrote above is true, since CR3 does not change when guest tasks switch, the RSB will not be flushed by hardware on guest task switches, and so exposing ERAPS to this guest is wrong -- it'll relax its own FILL_RETURN_BUFFER routines (via patch 1) and that's not what we want when the hw is doing that. [...] > > @@ -1362,10 +1364,22 @@ static inline int __do_cpuid_func(struct > > kvm_cpuid_array *array, u32 function) > > case 0x80000020: > > entry->eax = entry->ebx = entry->ecx = entry->edx = 0; > > break; > > - case 0x80000021: > > - entry->ebx = entry->ecx = entry->edx = 0; > > + case 0x80000021: { > > + unsigned int ebx_mask = 0; > > + > > + entry->ecx = entry->edx = 0; > > cpuid_entry_override(entry, CPUID_8000_0021_EAX); > > + > > + /* > > + * Bits 23:16 in EBX indicate the size of the RSB. > > Is this enumeration explicitly tied to ERAPS? Not tied to ERAPS, but it happens to be defined at the same time as ERAPS is being defined, and there's no other CPUID bit that says 'the RSB size is now available in these bits'. So yea, the ERAPS CPUID bit is being overloaded here. Existence of the ERAPS CPUID bit confirms that the default RSB size is now different, and that the hardware flushes the RSB (as in patch 1). > > + * Expose the value in the hardware to the guest. > > __do_cpuid_func() is used to advertise KVM's supported CPUID to host > userspace, > not to the guest. > > Side topic, what happens when Zen6 adds EVEN_LARGER_RAP? 
Enumerating > the size of > the RAP suggets it's likely to change in the future. It's just informational-only for now, and there aren't any plans to extend the RAP anymore. But you're right - it could change in the future. > > + */ > > + if (kvm_cpu_cap_has(X86_FEATURE_ERAPS)) > > + ebx_mask |= GENMASK(23, 16); > > + > > + entry->ebx &= ebx_mask; > > This is a waste of code and makes it unnecessarily difficult to > read. Just do: > > if (kvm_cpu_cap_has(X86_FEATURE_ERAPS)) > entry->ebx &= GENMASK(23, 16); > else > entry->ebx = 0; Well I left it all this way to make it easier the the future to add more ebx bits here, but sure - I'll just shorten it now. [...] Amit ^ permalink raw reply [flat|nested] 31+ messages in thread