* [PATCH v3 0/3] VMSCAPE optimization for BHI variant
@ 2025-10-27 23:43 Pawan Gupta
2025-10-27 23:43 ` [PATCH v3 1/3] x86/bhi: Add BHB clearing for CPUs with larger branch history Pawan Gupta
` (3 more replies)
0 siblings, 4 replies; 18+ messages in thread
From: Pawan Gupta @ 2025-10-27 23:43 UTC (permalink / raw)
To: x86, H. Peter Anvin, Josh Poimboeuf, David Kaplan,
Sean Christopherson, Paolo Bonzini, Borislav Petkov, Dave Hansen
Cc: linux-kernel, kvm, Asit Mallick, Tao Zhang
v3:
- s/x86_pred_flush_pending/x86_predictor_flush_exit_to_user/ (Sean).
- Removed IBPB & BHB-clear mutual exclusion at exit-to-userspace.
- Collected tags.
v2: https://lore.kernel.org/r/20251015-vmscape-bhb-v2-0-91cbdd9c3a96@linux.intel.com
- Added check for IBPB feature in vmscape_select_mitigation(). (David)
- s/vmscape=auto/vmscape=on/ (David)
- Added patch to remove LFENCE from VMSCAPE BHB-clear sequence.
- Rebased to v6.18-rc1.
v1: https://lore.kernel.org/r/20250924-vmscape-bhb-v1-0-da51f0e1934d@linux.intel.com
Hi All,
These patches aim to improve the performance of a recent mitigation for
the VMSCAPE[1] vulnerability. The improvement is relevant for the BHI
variant of VMSCAPE, which affects Alder Lake and newer processors.
The current mitigation uses IBPB on KVM exit-to-userspace for the entire
range of affected CPUs. This is overkill for CPUs that are only affected
by the BHI variant. On such CPUs, clearing the branch history is
sufficient to mitigate VMSCAPE, and is also more apt because the
underlying issue is poisoned branch history.
Roadmap:
- First patch introduces clear_bhb_long_loop() for processors with larger
branch history tables.
- Second patch replaces IBPB on exit-to-userspace with the branch history
clearing sequence.
- Third patch removes the LFENCE from the BHB clearing long loop, as the
ring transition to userspace itself acts as a barrier.
Below is the iPerf data for transfer between guest and host, comparing IBPB
and BHB-clear mitigation. BHB-clear shows performance improvement over IBPB
in most cases.
Platform: Emerald Rapids
Baseline: vmscape=off
(pN = N parallel connections)
| iPerf user-net | IBPB | BHB Clear |
|----------------|---------|-----------|
| UDP 1-vCPU_p1 | -12.5% | 1.3% |
| TCP 1-vCPU_p1 | -10.4% | -1.5% |
| TCP 1-vCPU_p1 | -7.5% | -3.0% |
| UDP 4-vCPU_p16 | -3.7% | -3.7% |
| TCP 4-vCPU_p4 | -2.9% | -1.4% |
| UDP 4-vCPU_p4 | -0.6% | 0.0% |
| TCP 4-vCPU_p4 | 3.5% | 0.0% |
| iPerf bridge-net | IBPB | BHB Clear |
|------------------|---------|-----------|
| UDP 1-vCPU_p1 | -9.4% | -0.4% |
| TCP 1-vCPU_p1 | -3.9% | -0.5% |
| UDP 4-vCPU_p16 | -2.2% | -3.8% |
| TCP 4-vCPU_p4 | -1.0% | -1.0% |
| TCP 4-vCPU_p4 | 0.5% | 0.5% |
| UDP 4-vCPU_p4 | 0.0% | 0.9% |
| TCP 1-vCPU_p1 | 0.0% | 0.9% |
| iPerf vhost-net | IBPB | BHB Clear |
|-----------------|---------|-----------|
| UDP 1-vCPU_p1 | -4.3% | 1.0% |
| TCP 1-vCPU_p1 | -3.8% | -0.5% |
| TCP 1-vCPU_p1 | -2.7% | -0.7% |
| UDP 4-vCPU_p16 | -0.7% | -2.2% |
| TCP 4-vCPU_p4 | -0.4% | 0.8% |
| UDP 4-vCPU_p4 | 0.4% | -0.7% |
| TCP 4-vCPU_p4 | 0.0% | 0.6% |
[1] https://comsec.ethz.ch/research/microarch/vmscape-exposing-and-exploiting-incomplete-branch-predictor-isolation-in-cloud-environments/
---
Pawan Gupta (3):
x86/bhi: Add BHB clearing for CPUs with larger branch history
x86/vmscape: Replace IBPB with branch history clear on exit to userspace
x86/vmscape: Remove LFENCE from BHB clearing long loop
Documentation/admin-guide/hw-vuln/vmscape.rst | 8 ++++
Documentation/admin-guide/kernel-parameters.txt | 4 +-
arch/x86/entry/entry_64.S | 63 ++++++++++++++++++-------
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/entry-common.h | 12 +++--
arch/x86/include/asm/nospec-branch.h | 5 +-
arch/x86/kernel/cpu/bugs.c | 53 +++++++++++++++------
arch/x86/kvm/x86.c | 5 +-
8 files changed, 110 insertions(+), 41 deletions(-)
---
base-commit: fd57572253bc356330dbe5b233c2e1d8426c66fd
change-id: 20250916-vmscape-bhb-d7d469977f2f
Best regards,
--
Pawan
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v3 1/3] x86/bhi: Add BHB clearing for CPUs with larger branch history
2025-10-27 23:43 [PATCH v3 0/3] VMSCAPE optimization for BHI variant Pawan Gupta
@ 2025-10-27 23:43 ` Pawan Gupta
2025-11-03 20:04 ` Dave Hansen
2025-10-27 23:43 ` [PATCH v3 2/3] x86/vmscape: Replace IBPB with branch history clear on exit to userspace Pawan Gupta
` (2 subsequent siblings)
3 siblings, 1 reply; 18+ messages in thread
From: Pawan Gupta @ 2025-10-27 23:43 UTC (permalink / raw)
To: x86, H. Peter Anvin, Josh Poimboeuf, David Kaplan,
Sean Christopherson, Paolo Bonzini, Borislav Petkov, Dave Hansen
Cc: linux-kernel, kvm, Asit Mallick, Tao Zhang
Add a version of clear_bhb_loop() that works on CPUs with larger branch
history tables, such as Alder Lake and newer. It can serve as a cheaper
alternative to the IBPB mitigation for VMSCAPE.
clear_bhb_loop() and the new clear_bhb_long_loop() differ only in their
loop counts. Convert the asm implementation of clear_bhb_loop() into a
macro that is used by both variants, passing the counts as arguments.
There is no difference in the output of:
$ objdump --disassemble=clear_bhb_loop vmlinux
before and after this commit.
Acked-by: David Kaplan <david.kaplan@amd.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
arch/x86/entry/entry_64.S | 47 ++++++++++++++++++++++++++----------
arch/x86/include/asm/nospec-branch.h | 3 +++
2 files changed, 37 insertions(+), 13 deletions(-)
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index ed04a968cc7d0095ab0185b2e3b5beffb7680afd..f5f62af080d8ec6fe81e4dbe78ce44d08e62aa59 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1499,11 +1499,6 @@ SYM_CODE_END(rewind_stack_and_make_dead)
* from the branch history tracker in the Branch Predictor, therefore removing
* user influence on subsequent BTB lookups.
*
- * It should be used on parts prior to Alder Lake. Newer parts should use the
- * BHI_DIS_S hardware control instead. If a pre-Alder Lake part is being
- * virtualized on newer hardware the VMM should protect against BHI attacks by
- * setting BHI_DIS_S for the guests.
- *
* CALLs/RETs are necessary to prevent Loop Stream Detector(LSD) from engaging
* and not clearing the branch history. The call tree looks like:
*
@@ -1529,11 +1524,12 @@ SYM_CODE_END(rewind_stack_and_make_dead)
* that all RETs are in the second half of a cacheline to mitigate Indirect
* Target Selection, rather than taking the slowpath via its_return_thunk.
*/
-SYM_FUNC_START(clear_bhb_loop)
+.macro __CLEAR_BHB_LOOP outer_loop_count:req, inner_loop_count:req
ANNOTATE_NOENDBR
push %rbp
mov %rsp, %rbp
- movl $5, %ecx
+
+ movl $\outer_loop_count, %ecx
ANNOTATE_INTRA_FUNCTION_CALL
call 1f
jmp 5f
@@ -1542,29 +1538,54 @@ SYM_FUNC_START(clear_bhb_loop)
* Shift instructions so that the RET is in the upper half of the
* cacheline and don't take the slowpath to its_return_thunk.
*/
- .skip 32 - (.Lret1 - 1f), 0xcc
+ .skip 32 - (.Lret1_\@ - 1f), 0xcc
ANNOTATE_INTRA_FUNCTION_CALL
1: call 2f
-.Lret1: RET
+.Lret1_\@:
+ RET
.align 64, 0xcc
/*
- * As above shift instructions for RET at .Lret2 as well.
+ * As above shift instructions for RET at .Lret2_\@ as well.
*
- * This should be ideally be: .skip 32 - (.Lret2 - 2f), 0xcc
+ * This should be ideally be: .skip 32 - (.Lret2_\@ - 2f), 0xcc
* but some Clang versions (e.g. 18) don't like this.
*/
.skip 32 - 18, 0xcc
-2: movl $5, %eax
+2: movl $\inner_loop_count, %eax
3: jmp 4f
nop
4: sub $1, %eax
jnz 3b
sub $1, %ecx
jnz 1b
-.Lret2: RET
+.Lret2_\@:
+ RET
5: lfence
+
pop %rbp
RET
+.endm
+
+/*
+ * This should be used on parts prior to Alder Lake. Newer parts should use the
+ * BHI_DIS_S hardware control instead. If a pre-Alder Lake part is being
+ * virtualized on newer hardware the VMM should protect against BHI attacks by
+ * setting BHI_DIS_S for the guests.
+ */
+SYM_FUNC_START(clear_bhb_loop)
+ __CLEAR_BHB_LOOP 5, 5
SYM_FUNC_END(clear_bhb_loop)
EXPORT_SYMBOL_GPL(clear_bhb_loop)
STACK_FRAME_NON_STANDARD(clear_bhb_loop)
+
+/*
+ * A longer version of clear_bhb_loop to ensure that the BHB is cleared on CPUs
+ * with larger branch history tables (i.e. Alder Lake and newer). BHI_DIS_S
+ * protects the kernel, but to mitigate the guest influence on the host
+ * userspace either IBPB or this sequence should be used. See VMSCAPE bug.
+ */
+SYM_FUNC_START(clear_bhb_long_loop)
+ __CLEAR_BHB_LOOP 12, 7
+SYM_FUNC_END(clear_bhb_long_loop)
+EXPORT_SYMBOL_GPL(clear_bhb_long_loop)
+STACK_FRAME_NON_STANDARD(clear_bhb_long_loop)
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 08ed5a2e46a5fd790bcb1b73feb6469518809c06..49707e563bdf71bdd05d3827f10dd2b8ac6bca2c 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -388,6 +388,9 @@ extern void write_ibpb(void);
#ifdef CONFIG_X86_64
extern void clear_bhb_loop(void);
+extern void clear_bhb_long_loop(void);
+#else
+static inline void clear_bhb_long_loop(void) {}
#endif
extern void (*x86_return_thunk)(void);
--
2.34.1
* [PATCH v3 2/3] x86/vmscape: Replace IBPB with branch history clear on exit to userspace
2025-10-27 23:43 [PATCH v3 0/3] VMSCAPE optimization for BHI variant Pawan Gupta
2025-10-27 23:43 ` [PATCH v3 1/3] x86/bhi: Add BHB clearing for CPUs with larger branch history Pawan Gupta
@ 2025-10-27 23:43 ` Pawan Gupta
2025-10-29 22:47 ` Sean Christopherson
2025-11-03 20:31 ` Dave Hansen
2025-10-27 23:43 ` [PATCH v3 3/3] x86/vmscape: Remove LFENCE from BHB clearing long loop Pawan Gupta
2025-11-03 20:07 ` [PATCH v3 0/3] VMSCAPE optimization for BHI variant Dave Hansen
3 siblings, 2 replies; 18+ messages in thread
From: Pawan Gupta @ 2025-10-27 23:43 UTC (permalink / raw)
To: x86, H. Peter Anvin, Josh Poimboeuf, David Kaplan,
Sean Christopherson, Paolo Bonzini, Borislav Petkov, Dave Hansen
Cc: linux-kernel, kvm, Asit Mallick, Tao Zhang
The IBPB mitigation for VMSCAPE is overkill for CPUs that are only
affected by the BHI variant of VMSCAPE. On such CPUs, eIBRS already
provides indirect branch isolation between the guest and host userspace,
but a guest could still poison the branch history.
To mitigate that, use the recently added clear_bhb_long_loop() to isolate
the branch history between the guest and userspace. Add the cmdline
option 'vmscape=on', which automatically selects the appropriate
mitigation based on the CPU.
Acked-by: David Kaplan <david.kaplan@amd.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
Documentation/admin-guide/hw-vuln/vmscape.rst | 8 ++++
Documentation/admin-guide/kernel-parameters.txt | 4 +-
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/entry-common.h | 12 +++---
arch/x86/include/asm/nospec-branch.h | 2 +-
arch/x86/kernel/cpu/bugs.c | 53 ++++++++++++++++++-------
arch/x86/kvm/x86.c | 5 ++-
7 files changed, 61 insertions(+), 24 deletions(-)
diff --git a/Documentation/admin-guide/hw-vuln/vmscape.rst b/Documentation/admin-guide/hw-vuln/vmscape.rst
index d9b9a2b6c114c05a7325e5f3c9d42129339b870b..580f288ae8bfc601ff000d6d95d711bb9084459e 100644
--- a/Documentation/admin-guide/hw-vuln/vmscape.rst
+++ b/Documentation/admin-guide/hw-vuln/vmscape.rst
@@ -86,6 +86,10 @@ The possible values in this file are:
run a potentially malicious guest and issues an IBPB before the first
exit to userspace after VM-exit.
+ * 'Mitigation: Clear BHB before exit to userspace':
+
+ As above, conditional BHB clearing mitigation is enabled.
+
* 'Mitigation: IBPB on VMEXIT':
IBPB is issued on every VM-exit. This occurs when other mitigations like
@@ -108,3 +112,7 @@ The mitigation can be controlled via the ``vmscape=`` command line parameter:
Force vulnerability detection and mitigation even on processors that are
not known to be affected.
+
+ * ``vmscape=on``:
+
+ Choose the mitigation based on the VMSCAPE variant the CPU is affected by.
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 6c42061ca20e581b5192b66c6f25aba38d4f8ff8..4b4711ced5e187495476b5365cd7b3df81db893b 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -8104,9 +8104,11 @@
off - disable the mitigation
ibpb - use Indirect Branch Prediction Barrier
- (IBPB) mitigation (default)
+ (IBPB) mitigation
force - force vulnerability detection even on
unaffected processors
+ on - (default) automatically select IBPB
+ or BHB clear mitigation based on CPU
vsyscall= [X86-64,EARLY]
Controls the behavior of vsyscalls (i.e. calls to
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 4091a776e37aaed67ca93b0a0cd23cc25dbc33d4..3d547c3eab4e3290de3eee8e89f21587fee34931 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -499,6 +499,7 @@
#define X86_FEATURE_IBPB_EXIT_TO_USER (21*32+14) /* Use IBPB on exit-to-userspace, see VMSCAPE bug */
#define X86_FEATURE_ABMC (21*32+15) /* Assignable Bandwidth Monitoring Counters */
#define X86_FEATURE_MSR_IMM (21*32+16) /* MSR immediate form instructions */
+#define X86_FEATURE_CLEAR_BHB_EXIT_TO_USER (21*32+17) /* Clear branch history on exit-to-userspace, see VMSCAPE bug */
/*
* BUG word(s)
diff --git a/arch/x86/include/asm/entry-common.h b/arch/x86/include/asm/entry-common.h
index ce3eb6d5fdf9f2dba59b7bad24afbfafc8c36918..b629e85c33aa7387042cce60040b8a493e3e6d46 100644
--- a/arch/x86/include/asm/entry-common.h
+++ b/arch/x86/include/asm/entry-common.h
@@ -94,11 +94,13 @@ static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs,
*/
choose_random_kstack_offset(rdtsc());
- /* Avoid unnecessary reads of 'x86_ibpb_exit_to_user' */
- if (cpu_feature_enabled(X86_FEATURE_IBPB_EXIT_TO_USER) &&
- this_cpu_read(x86_ibpb_exit_to_user)) {
- indirect_branch_prediction_barrier();
- this_cpu_write(x86_ibpb_exit_to_user, false);
+ if (unlikely(this_cpu_read(x86_predictor_flush_exit_to_user))) {
+ if (cpu_feature_enabled(X86_FEATURE_IBPB_EXIT_TO_USER))
+ indirect_branch_prediction_barrier();
+ if (cpu_feature_enabled(X86_FEATURE_CLEAR_BHB_EXIT_TO_USER))
+ clear_bhb_long_loop();
+
+ this_cpu_write(x86_predictor_flush_exit_to_user, false);
}
}
#define arch_exit_to_user_mode_prepare arch_exit_to_user_mode_prepare
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 49707e563bdf71bdd05d3827f10dd2b8ac6bca2c..745394be734f3c2b5640c9aef10156fe1d02636b 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -534,7 +534,7 @@ void alternative_msr_write(unsigned int msr, u64 val, unsigned int feature)
: "memory");
}
-DECLARE_PER_CPU(bool, x86_ibpb_exit_to_user);
+DECLARE_PER_CPU(bool, x86_predictor_flush_exit_to_user);
static inline void indirect_branch_prediction_barrier(void)
{
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index d7fa03bf51b4517c12cc68e7c441f7589a4983d1..592730201b6e50a6b1f381e4179a0e45560418cc 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -109,12 +109,11 @@ DEFINE_PER_CPU(u64, x86_spec_ctrl_current);
EXPORT_PER_CPU_SYMBOL_GPL(x86_spec_ctrl_current);
/*
- * Set when the CPU has run a potentially malicious guest. An IBPB will
- * be needed to before running userspace. That IBPB will flush the branch
- * predictor content.
+ * Set when the CPU has run a potentially malicious guest. Indicates that a
+ * branch predictor flush is needed before running userspace.
*/
-DEFINE_PER_CPU(bool, x86_ibpb_exit_to_user);
-EXPORT_PER_CPU_SYMBOL_GPL(x86_ibpb_exit_to_user);
+DEFINE_PER_CPU(bool, x86_predictor_flush_exit_to_user);
+EXPORT_PER_CPU_SYMBOL_GPL(x86_predictor_flush_exit_to_user);
u64 x86_pred_cmd __ro_after_init = PRED_CMD_IBPB;
@@ -3197,13 +3196,15 @@ enum vmscape_mitigations {
VMSCAPE_MITIGATION_AUTO,
VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER,
VMSCAPE_MITIGATION_IBPB_ON_VMEXIT,
+ VMSCAPE_MITIGATION_BHB_CLEAR_EXIT_TO_USER,
};
static const char * const vmscape_strings[] = {
- [VMSCAPE_MITIGATION_NONE] = "Vulnerable",
+ [VMSCAPE_MITIGATION_NONE] = "Vulnerable",
/* [VMSCAPE_MITIGATION_AUTO] */
- [VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER] = "Mitigation: IBPB before exit to userspace",
- [VMSCAPE_MITIGATION_IBPB_ON_VMEXIT] = "Mitigation: IBPB on VMEXIT",
+ [VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER] = "Mitigation: IBPB before exit to userspace",
+ [VMSCAPE_MITIGATION_IBPB_ON_VMEXIT] = "Mitigation: IBPB on VMEXIT",
+ [VMSCAPE_MITIGATION_BHB_CLEAR_EXIT_TO_USER] = "Mitigation: Clear BHB before exit to userspace",
};
static enum vmscape_mitigations vmscape_mitigation __ro_after_init =
@@ -3221,6 +3222,8 @@ static int __init vmscape_parse_cmdline(char *str)
} else if (!strcmp(str, "force")) {
setup_force_cpu_bug(X86_BUG_VMSCAPE);
vmscape_mitigation = VMSCAPE_MITIGATION_AUTO;
+ } else if (!strcmp(str, "on")) {
+ vmscape_mitigation = VMSCAPE_MITIGATION_AUTO;
} else {
pr_err("Ignoring unknown vmscape=%s option.\n", str);
}
@@ -3231,18 +3234,35 @@ early_param("vmscape", vmscape_parse_cmdline);
static void __init vmscape_select_mitigation(void)
{
- if (!boot_cpu_has_bug(X86_BUG_VMSCAPE) ||
- !boot_cpu_has(X86_FEATURE_IBPB)) {
+ if (!boot_cpu_has_bug(X86_BUG_VMSCAPE)) {
vmscape_mitigation = VMSCAPE_MITIGATION_NONE;
return;
}
- if (vmscape_mitigation == VMSCAPE_MITIGATION_AUTO) {
- if (should_mitigate_vuln(X86_BUG_VMSCAPE))
- vmscape_mitigation = VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER;
- else
- vmscape_mitigation = VMSCAPE_MITIGATION_NONE;
+ if (vmscape_mitigation == VMSCAPE_MITIGATION_AUTO &&
+ !should_mitigate_vuln(X86_BUG_VMSCAPE))
+ vmscape_mitigation = VMSCAPE_MITIGATION_NONE;
+
+ if (vmscape_mitigation == VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER &&
+ !boot_cpu_has(X86_FEATURE_IBPB)) {
+ pr_err("IBPB not supported, switching to AUTO select\n");
+ vmscape_mitigation = VMSCAPE_MITIGATION_AUTO;
}
+
+ if (vmscape_mitigation != VMSCAPE_MITIGATION_AUTO)
+ return;
+
+ /*
+ * CPUs with BHI_CTRL(ADL and newer) can avoid the IBPB and use BHB
+ * clear sequence. These CPUs are only vulnerable to the BHI variant
+ * of the VMSCAPE attack and does not require an IBPB flush.
+ */
+ if (boot_cpu_has(X86_FEATURE_BHI_CTRL))
+ vmscape_mitigation = VMSCAPE_MITIGATION_BHB_CLEAR_EXIT_TO_USER;
+ else if (boot_cpu_has(X86_FEATURE_IBPB))
+ vmscape_mitigation = VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER;
+ else
+ vmscape_mitigation = VMSCAPE_MITIGATION_NONE;
}
static void __init vmscape_update_mitigation(void)
@@ -3261,6 +3281,8 @@ static void __init vmscape_apply_mitigation(void)
{
if (vmscape_mitigation == VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER)
setup_force_cpu_cap(X86_FEATURE_IBPB_EXIT_TO_USER);
+ else if (vmscape_mitigation == VMSCAPE_MITIGATION_BHB_CLEAR_EXIT_TO_USER)
+ setup_force_cpu_cap(X86_FEATURE_CLEAR_BHB_EXIT_TO_USER);
}
#undef pr_fmt
@@ -3352,6 +3374,7 @@ void cpu_bugs_smt_update(void)
break;
case VMSCAPE_MITIGATION_IBPB_ON_VMEXIT:
case VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER:
+ case VMSCAPE_MITIGATION_BHB_CLEAR_EXIT_TO_USER:
/*
* Hypervisors can be attacked across-threads, warn for SMT when
* STIBP is not already enabled system-wide.
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b4b5d2d0963467a81c7cc00575547619654295c6..0212de0ec2da153acb4218dc187dfcb87eae7115 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11397,8 +11397,9 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
* set for the CPU that actually ran the guest, and not the CPU that it
* may migrate to.
*/
- if (cpu_feature_enabled(X86_FEATURE_IBPB_EXIT_TO_USER))
- this_cpu_write(x86_ibpb_exit_to_user, true);
+ if (cpu_feature_enabled(X86_FEATURE_IBPB_EXIT_TO_USER) ||
+ cpu_feature_enabled(X86_FEATURE_CLEAR_BHB_EXIT_TO_USER))
+ this_cpu_write(x86_predictor_flush_exit_to_user, true);
/*
* Consume any pending interrupts, including the possible source of
--
2.34.1
* [PATCH v3 3/3] x86/vmscape: Remove LFENCE from BHB clearing long loop
2025-10-27 23:43 [PATCH v3 0/3] VMSCAPE optimization for BHI variant Pawan Gupta
2025-10-27 23:43 ` [PATCH v3 1/3] x86/bhi: Add BHB clearing for CPUs with larger branch history Pawan Gupta
2025-10-27 23:43 ` [PATCH v3 2/3] x86/vmscape: Replace IBPB with branch history clear on exit to userspace Pawan Gupta
@ 2025-10-27 23:43 ` Pawan Gupta
2025-11-03 20:45 ` Dave Hansen
2025-11-03 20:07 ` [PATCH v3 0/3] VMSCAPE optimization for BHI variant Dave Hansen
3 siblings, 1 reply; 18+ messages in thread
From: Pawan Gupta @ 2025-10-27 23:43 UTC (permalink / raw)
To: x86, H. Peter Anvin, Josh Poimboeuf, David Kaplan,
Sean Christopherson, Paolo Bonzini, Borislav Petkov, Dave Hansen
Cc: linux-kernel, kvm, Asit Mallick, Tao Zhang
The long loop is used to clear the branch history when switching from a
guest to host userspace. The LFENCE barrier is not required in this case
because the ring transition itself acts as a barrier.
Move the prologue, LFENCE, and epilogue out of the __CLEAR_BHB_LOOP macro
to allow skipping the LFENCE in the long-loop variant. Rename the
long-loop function to clear_bhb_long_loop_no_barrier() to reflect the
change.
Acked-by: David Kaplan <david.kaplan@amd.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
arch/x86/entry/entry_64.S | 32 ++++++++++++++++++++------------
arch/x86/include/asm/entry-common.h | 2 +-
arch/x86/include/asm/nospec-branch.h | 4 ++--
3 files changed, 23 insertions(+), 15 deletions(-)
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index f5f62af080d8ec6fe81e4dbe78ce44d08e62aa59..bb456a3c652e97f3a6fe72866b6dee04f59ccc98 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1525,10 +1525,6 @@ SYM_CODE_END(rewind_stack_and_make_dead)
* Target Selection, rather than taking the slowpath via its_return_thunk.
*/
.macro __CLEAR_BHB_LOOP outer_loop_count:req, inner_loop_count:req
- ANNOTATE_NOENDBR
- push %rbp
- mov %rsp, %rbp
-
movl $\outer_loop_count, %ecx
ANNOTATE_INTRA_FUNCTION_CALL
call 1f
@@ -1560,10 +1556,7 @@ SYM_CODE_END(rewind_stack_and_make_dead)
jnz 1b
.Lret2_\@:
RET
-5: lfence
-
- pop %rbp
- RET
+5:
.endm
/*
@@ -1573,7 +1566,15 @@ SYM_CODE_END(rewind_stack_and_make_dead)
* setting BHI_DIS_S for the guests.
*/
SYM_FUNC_START(clear_bhb_loop)
+ ANNOTATE_NOENDBR
+ push %rbp
+ mov %rsp, %rbp
+
__CLEAR_BHB_LOOP 5, 5
+
+ lfence
+ pop %rbp
+ RET
SYM_FUNC_END(clear_bhb_loop)
EXPORT_SYMBOL_GPL(clear_bhb_loop)
STACK_FRAME_NON_STANDARD(clear_bhb_loop)
@@ -1584,8 +1585,15 @@ STACK_FRAME_NON_STANDARD(clear_bhb_loop)
* protects the kernel, but to mitigate the guest influence on the host
* userspace either IBPB or this sequence should be used. See VMSCAPE bug.
*/
-SYM_FUNC_START(clear_bhb_long_loop)
+SYM_FUNC_START(clear_bhb_long_loop_no_barrier)
+ ANNOTATE_NOENDBR
+ push %rbp
+ mov %rsp, %rbp
+
__CLEAR_BHB_LOOP 12, 7
-SYM_FUNC_END(clear_bhb_long_loop)
-EXPORT_SYMBOL_GPL(clear_bhb_long_loop)
-STACK_FRAME_NON_STANDARD(clear_bhb_long_loop)
+
+ pop %rbp
+ RET
+SYM_FUNC_END(clear_bhb_long_loop_no_barrier)
+EXPORT_SYMBOL_GPL(clear_bhb_long_loop_no_barrier)
+STACK_FRAME_NON_STANDARD(clear_bhb_long_loop_no_barrier)
diff --git a/arch/x86/include/asm/entry-common.h b/arch/x86/include/asm/entry-common.h
index b629e85c33aa7387042cce60040b8a493e3e6d46..eb2b7303a9c1fc5976388c2a6a3fb7914b553239 100644
--- a/arch/x86/include/asm/entry-common.h
+++ b/arch/x86/include/asm/entry-common.h
@@ -98,7 +98,7 @@ static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs,
if (cpu_feature_enabled(X86_FEATURE_IBPB_EXIT_TO_USER))
indirect_branch_prediction_barrier();
if (cpu_feature_enabled(X86_FEATURE_CLEAR_BHB_EXIT_TO_USER))
- clear_bhb_long_loop();
+ clear_bhb_long_loop_no_barrier();
this_cpu_write(x86_predictor_flush_exit_to_user, false);
}
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 745394be734f3c2b5640c9aef10156fe1d02636b..7f479aaa21313e484e7a0fded0b8b417feb8e2d0 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -388,9 +388,9 @@ extern void write_ibpb(void);
#ifdef CONFIG_X86_64
extern void clear_bhb_loop(void);
-extern void clear_bhb_long_loop(void);
+extern void clear_bhb_long_loop_no_barrier(void);
#else
-static inline void clear_bhb_long_loop(void) {}
+static inline void clear_bhb_long_loop_no_barrier(void) {}
#endif
extern void (*x86_return_thunk)(void);
--
2.34.1
* Re: [PATCH v3 2/3] x86/vmscape: Replace IBPB with branch history clear on exit to userspace
2025-10-27 23:43 ` [PATCH v3 2/3] x86/vmscape: Replace IBPB with branch history clear on exit to userspace Pawan Gupta
@ 2025-10-29 22:47 ` Sean Christopherson
2025-10-30 0:08 ` Pawan Gupta
2025-11-03 20:31 ` Dave Hansen
1 sibling, 1 reply; 18+ messages in thread
From: Sean Christopherson @ 2025-10-29 22:47 UTC (permalink / raw)
To: Pawan Gupta
Cc: x86, H. Peter Anvin, Josh Poimboeuf, David Kaplan, Paolo Bonzini,
Borislav Petkov, Dave Hansen, linux-kernel, kvm, Asit Mallick,
Tao Zhang
On Mon, Oct 27, 2025, Pawan Gupta wrote:
> IBPB mitigation for VMSCAPE is an overkill for CPUs that are only affected
> by the BHI variant of VMSCAPE. On such CPUs, eIBRS already provides
> indirect branch isolation between guest and host userspace. But, a guest
> could still poison the branch history.
>
> To mitigate that, use the recently added clear_bhb_long_loop() to isolate
> the branch history between guest and userspace. Add cmdline option
> 'vmscape=on' that automatically selects the appropriate mitigation based
> on the CPU.
>
> Acked-by: David Kaplan <david.kaplan@amd.com>
> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> ---
> Documentation/admin-guide/hw-vuln/vmscape.rst | 8 ++++
> Documentation/admin-guide/kernel-parameters.txt | 4 +-
> arch/x86/include/asm/cpufeatures.h | 1 +
> arch/x86/include/asm/entry-common.h | 12 +++---
> arch/x86/include/asm/nospec-branch.h | 2 +-
> arch/x86/kernel/cpu/bugs.c | 53 ++++++++++++++++++-------
> arch/x86/kvm/x86.c | 5 ++-
> 7 files changed, 61 insertions(+), 24 deletions(-)
For the KVM changes,
Acked-by: Sean Christopherson <seanjc@google.com>
* Re: [PATCH v3 2/3] x86/vmscape: Replace IBPB with branch history clear on exit to userspace
2025-10-29 22:47 ` Sean Christopherson
@ 2025-10-30 0:08 ` Pawan Gupta
0 siblings, 0 replies; 18+ messages in thread
From: Pawan Gupta @ 2025-10-30 0:08 UTC (permalink / raw)
To: Sean Christopherson
Cc: x86, H. Peter Anvin, Josh Poimboeuf, David Kaplan, Paolo Bonzini,
Borislav Petkov, Dave Hansen, linux-kernel, kvm, Asit Mallick,
Tao Zhang
On Wed, Oct 29, 2025 at 03:47:54PM -0700, Sean Christopherson wrote:
> On Mon, Oct 27, 2025, Pawan Gupta wrote:
> > IBPB mitigation for VMSCAPE is an overkill for CPUs that are only affected
> > by the BHI variant of VMSCAPE. On such CPUs, eIBRS already provides
> > indirect branch isolation between guest and host userspace. But, a guest
> > could still poison the branch history.
> >
> > To mitigate that, use the recently added clear_bhb_long_loop() to isolate
> > the branch history between guest and userspace. Add cmdline option
> > 'vmscape=on' that automatically selects the appropriate mitigation based
> > on the CPU.
> >
> > Acked-by: David Kaplan <david.kaplan@amd.com>
> > Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> > ---
> > Documentation/admin-guide/hw-vuln/vmscape.rst | 8 ++++
> > Documentation/admin-guide/kernel-parameters.txt | 4 +-
> > arch/x86/include/asm/cpufeatures.h | 1 +
> > arch/x86/include/asm/entry-common.h | 12 +++---
> > arch/x86/include/asm/nospec-branch.h | 2 +-
> > arch/x86/kernel/cpu/bugs.c | 53 ++++++++++++++++++-------
> > arch/x86/kvm/x86.c | 5 ++-
> > 7 files changed, 61 insertions(+), 24 deletions(-)
>
> For the KVM changes,
>
> Acked-by: Sean Christopherson <seanjc@google.com>
Thank you.
* Re: [PATCH v3 1/3] x86/bhi: Add BHB clearing for CPUs with larger branch history
2025-10-27 23:43 ` [PATCH v3 1/3] x86/bhi: Add BHB clearing for CPUs with larger branch history Pawan Gupta
@ 2025-11-03 20:04 ` Dave Hansen
2025-11-03 22:45 ` Pawan Gupta
0 siblings, 1 reply; 18+ messages in thread
From: Dave Hansen @ 2025-11-03 20:04 UTC (permalink / raw)
To: Pawan Gupta, x86, H. Peter Anvin, Josh Poimboeuf, David Kaplan,
Sean Christopherson, Paolo Bonzini, Borislav Petkov, Dave Hansen
Cc: linux-kernel, kvm, Asit Mallick, Tao Zhang
On 10/27/25 16:43, Pawan Gupta wrote:
> Add a version of clear_bhb_loop() that works on CPUs with larger branch
> history table such as Alder Lake and newer. This could serve as a cheaper
> alternative to IBPB mitigation for VMSCAPE.
This is missing a bit of background about clear_bhb_loop(). What does it
mitigate? This is also a better place to talk about why this loop exists
if it doesn't work on newer CPUs.
In other words, please mention BHI_DIS_S here.
> clear_bhb_loop() and the new clear_bhb_long_loop() only differ in the loop
> counter. Convert the asm implementation of clear_bhb_loop() into a macro
> that is used by both the variants, passing counter as an argument.
I find these a lot easier to review if you separate out the refactoring
from the new work. I know it's not a lot of code, but refactor first,
then add the new function in a separate patch.
> +/*
> + * A longer version of clear_bhb_loop to ensure that the BHB is cleared on CPUs
"clear_bhb_loop()", please.
> + * with larger branch history tables (i.e. Alder Lake and newer). BHI_DIS_S
> + * protects the kernel, but to mitigate the guest influence on the host
> + * userspace either IBPB or this sequence should be used. See VMSCAPE bug.
> + */
> +SYM_FUNC_START(clear_bhb_long_loop)
> + __CLEAR_BHB_LOOP 12, 7
> +SYM_FUNC_END(clear_bhb_long_loop)
> +EXPORT_SYMBOL_GPL(clear_bhb_long_loop)
> +STACK_FRAME_NON_STANDARD(clear_bhb_long_loop)
All the pieces are out there, but I feel like we need this in one place,
somewhere:
BHI_DIS_S: Mitigates user=>kernel attacks on new CPUs. Faster than the
long loop.
Long Loop: Mitigates guest=>host userspace attacks on new CPUs. Would
also work for user=>kernel, but BHI_DIS_S is faster.
Short Loop: The only choice on older CPUs. Used for both user=>kernel
and guest=>host userspace mitigation.
* Re: [PATCH v3 0/3] VMSCAPE optimization for BHI variant
2025-10-27 23:43 [PATCH v3 0/3] VMSCAPE optimization for BHI variant Pawan Gupta
` (2 preceding siblings ...)
2025-10-27 23:43 ` [PATCH v3 3/3] x86/vmscape: Remove LFENCE from BHB clearing long loop Pawan Gupta
@ 2025-11-03 20:07 ` Dave Hansen
2025-11-03 23:03 ` Pawan Gupta
3 siblings, 1 reply; 18+ messages in thread
From: Dave Hansen @ 2025-11-03 20:07 UTC (permalink / raw)
To: Pawan Gupta, x86, H. Peter Anvin, Josh Poimboeuf, David Kaplan,
Sean Christopherson, Paolo Bonzini, Borislav Petkov, Dave Hansen
Cc: linux-kernel, kvm, Asit Mallick, Tao Zhang
On 10/27/25 16:43, Pawan Gupta wrote:
> | iPerf user-net | IBPB | BHB Clear |
> |----------------|---------|-----------|
> | UDP 1-vCPU_p1 | -12.5% | 1.3% |
...
Could you clarify what "1.3%" means? Is that relative to the baseline,
or relative to the IBPB number?
If it's relative to the baseline, then this data either looks wrong or
noisy since there are a lot of places where adding the BHB Clear loop
makes things faster.
* Re: [PATCH v3 2/3] x86/vmscape: Replace IBPB with branch history clear on exit to userspace
2025-10-27 23:43 ` [PATCH v3 2/3] x86/vmscape: Replace IBPB with branch history clear on exit to userspace Pawan Gupta
2025-10-29 22:47 ` Sean Christopherson
@ 2025-11-03 20:31 ` Dave Hansen
2025-11-06 23:40 ` Pawan Gupta
1 sibling, 1 reply; 18+ messages in thread
From: Dave Hansen @ 2025-11-03 20:31 UTC (permalink / raw)
To: Pawan Gupta, x86, H. Peter Anvin, Josh Poimboeuf, David Kaplan,
Sean Christopherson, Paolo Bonzini, Borislav Petkov, Dave Hansen
Cc: linux-kernel, kvm, Asit Mallick, Tao Zhang
On 10/27/25 16:43, Pawan Gupta wrote:
> IBPB mitigation for VMSCAPE is an overkill for CPUs that are only affected
> by the BHI variant of VMSCAPE. On such CPUs, eIBRS already provides
> indirect branch isolation between guest and host userspace. But, a guest
> could still poison the branch history.
This is missing a wee bit of background about how branch history and
indirect branch prediction are involved in VMSCAPE.
> To mitigate that, use the recently added clear_bhb_long_loop() to isolate
> the branch history between guest and userspace. Add cmdline option
> 'vmscape=on' that automatically selects the appropriate mitigation based
> on the CPU.
Is "=on" the right thing here as opposed to "=auto"? What you have here
doesn't actually turn VMSCAPE mitigation on for 'vmscape=on'.
> Documentation/admin-guide/hw-vuln/vmscape.rst | 8 ++++
> Documentation/admin-guide/kernel-parameters.txt | 4 +-
> arch/x86/include/asm/cpufeatures.h | 1 +
> arch/x86/include/asm/entry-common.h | 12 +++---
> arch/x86/include/asm/nospec-branch.h | 2 +-
> arch/x86/kernel/cpu/bugs.c | 53 ++++++++++++++++++-------
> arch/x86/kvm/x86.c | 5 ++-
> 7 files changed, 61 insertions(+), 24 deletions(-)
I think I'd rather this be three or four or five more patches.
The rename:
> -DECLARE_PER_CPU(bool, x86_ibpb_exit_to_user);
> +DECLARE_PER_CPU(bool, x86_predictor_flush_exit_to_user);
could be alone by itself.
So could the additional command-line override and its documentation.
(whatever it gets named).
...
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index 4091a776e37aaed67ca93b0a0cd23cc25dbc33d4..3d547c3eab4e3290de3eee8e89f21587fee34931 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -499,6 +499,7 @@
> #define X86_FEATURE_IBPB_EXIT_TO_USER (21*32+14) /* Use IBPB on exit-to-userspace, see VMSCAPE bug */
> #define X86_FEATURE_ABMC (21*32+15) /* Assignable Bandwidth Monitoring Counters */
> #define X86_FEATURE_MSR_IMM (21*32+16) /* MSR immediate form instructions */
> +#define X86_FEATURE_CLEAR_BHB_EXIT_TO_USER (21*32+17) /* Clear branch history on exit-to-userspace, see VMSCAPE bug */
X86_FEATURE flags are cheap, but they're not infinite. Is this worth two
of these? It actually makes the code actively worse. (See below).
> diff --git a/arch/x86/include/asm/entry-common.h b/arch/x86/include/asm/entry-common.h
> index ce3eb6d5fdf9f2dba59b7bad24afbfafc8c36918..b629e85c33aa7387042cce60040b8a493e3e6d46 100644
> --- a/arch/x86/include/asm/entry-common.h
> +++ b/arch/x86/include/asm/entry-common.h
> @@ -94,11 +94,13 @@ static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs,
> */
> choose_random_kstack_offset(rdtsc());
>
> - /* Avoid unnecessary reads of 'x86_ibpb_exit_to_user' */
> - if (cpu_feature_enabled(X86_FEATURE_IBPB_EXIT_TO_USER) &&
> - this_cpu_read(x86_ibpb_exit_to_user)) {
> - indirect_branch_prediction_barrier();
> - this_cpu_write(x86_ibpb_exit_to_user, false);
> + if (unlikely(this_cpu_read(x86_predictor_flush_exit_to_user))) {
> + if (cpu_feature_enabled(X86_FEATURE_IBPB_EXIT_TO_USER))
> + indirect_branch_prediction_barrier();
> + if (cpu_feature_enabled(X86_FEATURE_CLEAR_BHB_EXIT_TO_USER))
> + clear_bhb_long_loop();
> +
> + this_cpu_write(x86_predictor_flush_exit_to_user, false);
> }
> }
One (mildly) nice thing about the old code was that it could avoid
reading 'x86_predictor_flush_exit_to_user' in the unaffected case.
Also, how does the code generation end up looking here? Each
cpu_feature_enabled() has an alternative, and
indirect_branch_prediction_barrier() has another one. Are we generating
alternatives that can't even possibly happen? For instance, could we
ever have system with X86_FEATURE_IBPB_EXIT_TO_USER but *not*
X86_FEATURE_IBPB?
Let's say this was:
if (cpu_feature_enabled(X86_FEATURE_FOO_EXIT_TO_USER) &&
this_cpu_read(x86_ibpb_exit_to_user)) {
static_call(clear_branch_history);
this_cpu_write(x86_ibpb_exit_to_user, false);
}
And the static_call() was assigned to either clear_bhb_long_loop() or
write_ibpb(). I suspect the code generation would be nicer and it would
eliminate one reason for having two X86_FEATUREs.
> static enum vmscape_mitigations vmscape_mitigation __ro_after_init =
> @@ -3221,6 +3222,8 @@ static int __init vmscape_parse_cmdline(char *str)
> } else if (!strcmp(str, "force")) {
> setup_force_cpu_bug(X86_BUG_VMSCAPE);
> vmscape_mitigation = VMSCAPE_MITIGATION_AUTO;
> + } else if (!strcmp(str, "on")) {
> + vmscape_mitigation = VMSCAPE_MITIGATION_AUTO;
> } else {
> pr_err("Ignoring unknown vmscape=%s option.\n", str);
> }
Yeah, it's goofy that =on sets ..._AUTO.
> @@ -3231,18 +3234,35 @@ early_param("vmscape", vmscape_parse_cmdline);
>
> static void __init vmscape_select_mitigation(void)
> {
> - if (!boot_cpu_has_bug(X86_BUG_VMSCAPE) ||
> - !boot_cpu_has(X86_FEATURE_IBPB)) {
> + if (!boot_cpu_has_bug(X86_BUG_VMSCAPE)) {
> vmscape_mitigation = VMSCAPE_MITIGATION_NONE;
> return;
> }
>
> - if (vmscape_mitigation == VMSCAPE_MITIGATION_AUTO) {
> - if (should_mitigate_vuln(X86_BUG_VMSCAPE))
> - vmscape_mitigation = VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER;
> - else
> - vmscape_mitigation = VMSCAPE_MITIGATION_NONE;
> + if (vmscape_mitigation == VMSCAPE_MITIGATION_AUTO &&
> + !should_mitigate_vuln(X86_BUG_VMSCAPE))
> + vmscape_mitigation = VMSCAPE_MITIGATION_NONE;
> +
> + if (vmscape_mitigation == VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER &&
> + !boot_cpu_has(X86_FEATURE_IBPB)) {
> + pr_err("IBPB not supported, switching to AUTO select\n");
> + vmscape_mitigation = VMSCAPE_MITIGATION_AUTO;
> }
> +
> + if (vmscape_mitigation != VMSCAPE_MITIGATION_AUTO)
> + return;
> +
> + /*
> + * CPUs with BHI_CTRL(ADL and newer) can avoid the IBPB and use BHB
> + * clear sequence. These CPUs are only vulnerable to the BHI variant
> + * of the VMSCAPE attack and does not require an IBPB flush.
> + */
> + if (boot_cpu_has(X86_FEATURE_BHI_CTRL))
> + vmscape_mitigation = VMSCAPE_MITIGATION_BHB_CLEAR_EXIT_TO_USER;
> + else if (boot_cpu_has(X86_FEATURE_IBPB))
> + vmscape_mitigation = VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER;
> + else
> + vmscape_mitigation = VMSCAPE_MITIGATION_NONE;
> }
Yeah, there are a *lot* of logic changes there. Any simplifications by
breaking this up would be appreciated.
> static void __init vmscape_update_mitigation(void)
> @@ -3261,6 +3281,8 @@ static void __init vmscape_apply_mitigation(void)
> {
> if (vmscape_mitigation == VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER)
> setup_force_cpu_cap(X86_FEATURE_IBPB_EXIT_TO_USER);
> + else if (vmscape_mitigation == VMSCAPE_MITIGATION_BHB_CLEAR_EXIT_TO_USER)
> + setup_force_cpu_cap(X86_FEATURE_CLEAR_BHB_EXIT_TO_USER);
> }
Yeah, so in that scheme I was talking about a minute ago, this could be
where you do a static_call_update() instead of setting individual
feature bits.
* Re: [PATCH v3 3/3] x86/vmscape: Remove LFENCE from BHB clearing long loop
2025-10-27 23:43 ` [PATCH v3 3/3] x86/vmscape: Remove LFENCE from BHB clearing long loop Pawan Gupta
@ 2025-11-03 20:45 ` Dave Hansen
2025-11-04 22:01 ` Pawan Gupta
0 siblings, 1 reply; 18+ messages in thread
From: Dave Hansen @ 2025-11-03 20:45 UTC (permalink / raw)
To: Pawan Gupta, x86, H. Peter Anvin, Josh Poimboeuf, David Kaplan,
Sean Christopherson, Paolo Bonzini, Borislav Petkov, Dave Hansen
Cc: linux-kernel, kvm, Asit Mallick, Tao Zhang
On 10/27/25 16:43, Pawan Gupta wrote:
> Long loop is used to clear the branch history when switching from a guest
> to host userspace. The LFENCE barrier is not required in this case as ring
> transition itself acts as a barrier.
>
> Move the prologue, LFENCE and epilogue out of __CLEAR_BHB_LOOP macro to
> allow skipping the LFENCE in the long loop variant. Rename the long loop
> function to clear_bhb_long_loop_no_barrier() to reflect the change.
Too. Much. Assembly.
Is there a reason we can't do more of this in C? Can we have _one_
assembly function, please? One that takes the loop counts? No macros, no
duplication functions. Just one:
void __clear_bhb_loop(int inner, int outer);
Then we have sensible code that looks like this:
void clear_bhb_loop()
{
__clear_bhb_loop(inner, outer);
lfence();
}
void clear_bhb_loop_nofence()
{
__clear_bhb_loop(inner, outer);
}
We don't need a short and a long *version*. We just have one function
(or pair of functions) that gets called that works everywhere.
Actually, if you just used global variables and called the assembly one:
extern void clear_bhb_loop_nofence();
then the other implementation would just be:
void clear_bhb_loop()
{
__clear_bhb_loop(inner, outer);
lfence();
}
Then we have *ONE* assembly function instead of four.
Right? What am I missing?
Does the LFENCE *need* to be before that last pop and RET?
* Re: [PATCH v3 1/3] x86/bhi: Add BHB clearing for CPUs with larger branch history
2025-11-03 20:04 ` Dave Hansen
@ 2025-11-03 22:45 ` Pawan Gupta
0 siblings, 0 replies; 18+ messages in thread
From: Pawan Gupta @ 2025-11-03 22:45 UTC (permalink / raw)
To: Dave Hansen
Cc: x86, H. Peter Anvin, Josh Poimboeuf, David Kaplan,
Sean Christopherson, Paolo Bonzini, Borislav Petkov, Dave Hansen,
linux-kernel, kvm, Asit Mallick, Tao Zhang
On Mon, Nov 03, 2025 at 12:04:50PM -0800, Dave Hansen wrote:
> On 10/27/25 16:43, Pawan Gupta wrote:
> > Add a version of clear_bhb_loop() that works on CPUs with larger branch
> > history table such as Alder Lake and newer. This could serve as a cheaper
> > alternative to IBPB mitigation for VMSCAPE.
>
> This is missing a bit of background about clear_bhb_loop(). What does it
> mitigate? This is also a better place to talk about why this loop exists
> if it doesn't work on newer CPUs.
>
> In other words, please mention BHI_DIS_S here.
Sure, will add the background on clear_bhb_loop() and BHI_DIS_S.
> > clear_bhb_loop() and the new clear_bhb_long_loop() only differ in the loop
> > counter. Convert the asm implementation of clear_bhb_loop() into a macro
> > that is used by both the variants, passing counter as an argument.
>
> I find these a lot easier to review if you separate out the refactoring
> from the new work. I know it's not a lot of code, but refactor first,
> then add he new function in a separate patch.
Ya, that's a better way to do it. I will split the patch.
> > +/*
> > + * A longer version of clear_bhb_loop to ensure that the BHB is cleared on CPUs
>
> "clear_bhb_loop()", please.
Will fix.
> > + * with larger branch history tables (i.e. Alder Lake and newer). BHI_DIS_S
> > + * protects the kernel, but to mitigate the guest influence on the host
> > + * userspace either IBPB or this sequence should be used. See VMSCAPE bug.
> > + */
> > +SYM_FUNC_START(clear_bhb_long_loop)
> > + __CLEAR_BHB_LOOP 12, 7
> > +SYM_FUNC_END(clear_bhb_long_loop)
> > +EXPORT_SYMBOL_GPL(clear_bhb_long_loop)
> > +STACK_FRAME_NON_STANDARD(clear_bhb_long_loop)
>
> All the pieces are out there, but I feel like we need this in one place,
> somewhere:
>
> BHI_DIS_S: Mitigates user=>kernel attacks on new CPUs. Faster than the
> long loop.
> Long Loop: Mitigates guest=>host userspace attacks on new CPUs. Would
> also work for user=>kernel, but BHI_DIS_S is faster.
> Short Loop: The only choice on older CPUs. Used for both user=>kernel
> and guest=>host userspace mitigation.
Sure, I will capture them in one place. I guess this should also go in the
documentation.
* Re: [PATCH v3 0/3] VMSCAPE optimization for BHI variant
2025-11-03 20:07 ` [PATCH v3 0/3] VMSCAPE optimization for BHI variant Dave Hansen
@ 2025-11-03 23:03 ` Pawan Gupta
0 siblings, 0 replies; 18+ messages in thread
From: Pawan Gupta @ 2025-11-03 23:03 UTC (permalink / raw)
To: Dave Hansen
Cc: x86, H. Peter Anvin, Josh Poimboeuf, David Kaplan,
Sean Christopherson, Paolo Bonzini, Borislav Petkov, Dave Hansen,
linux-kernel, kvm, Asit Mallick, Tao Zhang
On Mon, Nov 03, 2025 at 12:07:30PM -0800, Dave Hansen wrote:
> On 10/27/25 16:43, Pawan Gupta wrote:
> > | iPerf user-net | IBPB | BHB Clear |
> > |----------------|---------|-----------|
> > | UDP 1-vCPU_p1 | -12.5% | 1.3% |
> ...
>
> Could you clarify what "1.3%" means? Is that relative to the baseline,
> or relative to the IBPB number?
This is relative to the baseline, sorry I didn't mention that explicitly.
> If it's relative to the baseline, then this data either looks wrong or
> noisy since there are a lot of places where adding the BHB Clear loop
> makes things faster.
I will double check, but I am fairly positive that this wasn't noisy.
Surprisingly, there were a few other cases where the BHB-clearing was
performing better than the baseline.
* Re: [PATCH v3 3/3] x86/vmscape: Remove LFENCE from BHB clearing long loop
2025-11-03 20:45 ` Dave Hansen
@ 2025-11-04 22:01 ` Pawan Gupta
2025-11-04 22:35 ` Dave Hansen
0 siblings, 1 reply; 18+ messages in thread
From: Pawan Gupta @ 2025-11-04 22:01 UTC (permalink / raw)
To: Dave Hansen
Cc: x86, H. Peter Anvin, Josh Poimboeuf, David Kaplan,
Sean Christopherson, Paolo Bonzini, Borislav Petkov, Dave Hansen,
linux-kernel, kvm, Asit Mallick, Tao Zhang
On Mon, Nov 03, 2025 at 12:45:35PM -0800, Dave Hansen wrote:
> On 10/27/25 16:43, Pawan Gupta wrote:
> > Long loop is used to clear the branch history when switching from a guest
> > to host userspace. The LFENCE barrier is not required in this case as ring
> > transition itself acts as a barrier.
> >
> > Move the prologue, LFENCE and epilogue out of __CLEAR_BHB_LOOP macro to
> > allow skipping the LFENCE in the long loop variant. Rename the long loop
> > function to clear_bhb_long_loop_no_barrier() to reflect the change.
>
> Too. Much. Assembly.
>
> Is there a reason we can't do more of this in C?
Apart from VMSCAPE, BHB clearing is also required when entering kernel from
system calls. And one of the safety requirements is to absolutely not
execute any indirect call/jmp unless we have cleared the BHB. In a C
implementation we cannot guarantee that the compiler won't generate
indirect branches before the BHB clearing can be done.
> Can we have _one_ assembly function, please? One that takes the loop
> counts? No macros, no duplication functions. Just one:
This seems possible for all the C callers. ASM callers should stick to asm
versions of BHB clearing to guarantee the compiler did not do anything
funky that would break the mitigation.
> void __clear_bhb_loop(int inner, int outer);
>
> Then we have sensible code that looks like this:
>
> void clear_bhb_loop()
> {
> __clear_bhb_loop(inner, outer);
> lfence();
> }
>
> void clear_bhb_loop_nofence()
> {
> __clear_bhb_loop(inner, outer);
> }
>
> We don't need a short and a long *version*. We just have one function
> (or pair of functions) that gets called that works everywhere.
>
> Actually, if you just used global variables and called the assembly one:
>
> extern void clear_bhb_loop_nofence();
>
> then the other implementation would just be:
>
> void clear_bhb_loop()
> {
> __clear_bhb_loop(inner, outer);
> lfence();
> }
>
> Then we have *ONE* assembly function instead of four.
>
> Right? What am I missing?
Overall, these look to be good improvements to me. The only concern is
making sure that we don't inadvertently call the C version from places that
strictly require no indirect branches before BHB clearing.
> Does the LFENCE *need* to be before that last pop and RET?
At syscall entry, VMexit and BPF (for native BHI mitigation), it does not
matter whether the LFENCE is before or after the last RET, as long as we can
guarantee that there will be no indirect call/jmp before the LFENCE. The C
version may not be able to provide this guarantee.
For exit-to-userspace (for VMSCAPE), a C implementation is perfectly fine
since the goal is to protect userspace.
To summarize, only one of the BHB-clear callsites can safely use the C
version, while the others need to continue using the assembly version. I do
not anticipate more such callsites that would be okay with indirect
branches before BHB clearing.
I am open to suggestions on making the code more readable while ensuring
the safety.
* Re: [PATCH v3 3/3] x86/vmscape: Remove LFENCE from BHB clearing long loop
2025-11-04 22:01 ` Pawan Gupta
@ 2025-11-04 22:35 ` Dave Hansen
2025-11-04 23:36 ` Pawan Gupta
0 siblings, 1 reply; 18+ messages in thread
From: Dave Hansen @ 2025-11-04 22:35 UTC (permalink / raw)
To: Pawan Gupta
Cc: x86, H. Peter Anvin, Josh Poimboeuf, David Kaplan,
Sean Christopherson, Paolo Bonzini, Borislav Petkov, Dave Hansen,
linux-kernel, kvm, Asit Mallick, Tao Zhang
On 11/4/25 14:01, Pawan Gupta wrote:
> On Mon, Nov 03, 2025 at 12:45:35PM -0800, Dave Hansen wrote:
...
>> Too. Much. Assembly.
>>
>> Is there a reason we can't do more of this in C?
>
> Apart from VMSCAPE, BHB clearing is also required when entering kernel from
> system calls. And one of the safety requirement is to absolutely not
> execute any indirect call/jmp unless we have cleared the BHB. In a C
> implementation we cannot guarantee that the compiler won't generate
> indirect branches before the BHB clearing can be done.
That's a good reason, and I did forget about the CLEAR_BRANCH_HISTORY
route to get in to this code.
But my main aversion was to having so many different functions with
different names to do different things that are also exported to the world.
For instance, if we need an LFENCE in the entry code, we could do this:
.macro CLEAR_BRANCH_HISTORY
ALTERNATIVE "", "call clear_bhb_loop; lfence",\
X86_FEATURE_CLEAR_BHB_LOOP
.endm
Instead of having a LFENCE variant of clear_bhb_loop().
>> Can we have _one_ assembly function, please? One that takes the loop
>> counts? No macros, no duplication functions. Just one:
>
> This seems possible for all the C callers. ASM callers should stick to asm
> versions of BHB clearing to guarantee the compiler did not do anything
> funky that would break the mitigation.
ASM callers can pass arguments to functions too. ;)
Sure, the syscall entry path might not be the *best* place in the world
to do that because it'll add even more noops.
It does make me wonder if we want to deal with this more holistically
somehow:
/* clobbers %rax, make sure it is after saving the syscall nr */
IBRS_ENTER
UNTRAIN_RET
CLEAR_BRANCH_HISTORY
especially if we're creating lots and lots of variants of functions to
keep the ALTERNATIVE noop padding short.
* Re: [PATCH v3 3/3] x86/vmscape: Remove LFENCE from BHB clearing long loop
2025-11-04 22:35 ` Dave Hansen
@ 2025-11-04 23:36 ` Pawan Gupta
0 siblings, 0 replies; 18+ messages in thread
From: Pawan Gupta @ 2025-11-04 23:36 UTC (permalink / raw)
To: Dave Hansen
Cc: x86, H. Peter Anvin, Josh Poimboeuf, David Kaplan,
Sean Christopherson, Paolo Bonzini, Borislav Petkov, Dave Hansen,
linux-kernel, kvm, Asit Mallick, Tao Zhang
On Tue, Nov 04, 2025 at 02:35:11PM -0800, Dave Hansen wrote:
> On 11/4/25 14:01, Pawan Gupta wrote:
> > On Mon, Nov 03, 2025 at 12:45:35PM -0800, Dave Hansen wrote:
> ...
> >> Too. Much. Assembly.
> >>
> >> Is there a reason we can't do more of this in C?
> >
> > Apart from VMSCAPE, BHB clearing is also required when entering kernel from
> > system calls. And one of the safety requirement is to absolutely not
> > execute any indirect call/jmp unless we have cleared the BHB. In a C
> > implementation we cannot guarantee that the compiler won't generate
> > indirect branches before the BHB clearing can be done.
>
> That's a good reason, and I did forget about the CLEAR_BRANCH_HISTORY
> route to get in to this code.
>
> But my main aversion was to having so many different functions with
> different names to do different things that are also exported to the world.
>
> For instance, if we need an LFENCE in the entry code, we could do this:
>
> .macro CLEAR_BRANCH_HISTORY
> ALTERNATIVE "", "call clear_bhb_loop; lfence",\
> X86_FEATURE_CLEAR_BHB_LOOP
> .endm
>
> Instead of having a LFENCE variant of clear_bhb_loop().
This makes perfect sense. I will do that.
> >> Can we have _one_ assembly function, please? One that takes the loop
> >> counts? No macros, no duplication functions. Just one:
> >
> > This seems possible for all the C callers. ASM callers should stick to asm
> > versions of BHB clearing to guarantee the compiler did not do anything
> > funky that would break the mitigation.
>
> ASM callers can pass arguments to functions too. ;)
Oh, my comment was more from the safety perspective of compiler-induced
code.
> Sure, the syscall entry path might not be the *best* place in the world
> to do that because it'll add even more noops.
Right.
> It does make me wonder if we want to deal with this more holistically
> somehow:
>
> /* clobbers %rax, make sure it is after saving the syscall nr */
> IBRS_ENTER
> UNTRAIN_RET
> CLEAR_BRANCH_HISTORY
>
> especially if we're creating lots and lots of variants of functions to
> keep the ALTERNATIVE noop padding short.
Hmm, mitigations that are mutually exclusive can certainly be grouped
together in an ALTERNATIVE_N block. It also has the potential to quickly
become messy, but it is certainly worth exploring.
* Re: [PATCH v3 2/3] x86/vmscape: Replace IBPB with branch history clear on exit to userspace
2025-11-03 20:31 ` Dave Hansen
@ 2025-11-06 23:40 ` Pawan Gupta
2025-11-19 10:33 ` Nikolay Borisov
0 siblings, 1 reply; 18+ messages in thread
From: Pawan Gupta @ 2025-11-06 23:40 UTC (permalink / raw)
To: Dave Hansen
Cc: x86, H. Peter Anvin, Josh Poimboeuf, David Kaplan,
Sean Christopherson, Paolo Bonzini, Borislav Petkov, Dave Hansen,
linux-kernel, kvm, Asit Mallick, Tao Zhang
[ I drafted the reply to this email earlier, but forgot to send it, sorry. ]
On Mon, Nov 03, 2025 at 12:31:09PM -0800, Dave Hansen wrote:
> On 10/27/25 16:43, Pawan Gupta wrote:
> > IBPB mitigation for VMSCAPE is an overkill for CPUs that are only affected
> > by the BHI variant of VMSCAPE. On such CPUs, eIBRS already provides
> > indirect branch isolation between guest and host userspace. But, a guest
> > could still poison the branch history.
>
> This is missing a wee bit of background about how branch history and
> indirect branch prediction are involved in VMSCAPE.
Adding more background to this.
> > To mitigate that, use the recently added clear_bhb_long_loop() to isolate
> > the branch history between guest and userspace. Add cmdline option
> > 'vmscape=on' that automatically selects the appropriate mitigation based
> > on the CPU.
>
> Is "=on" the right thing here as opposed to "=auto"?
v1 had it as =auto, David Kaplan made a point that for attack vector controls
"auto" means "defer to attack vector controls":
https://lore.kernel.org/all/LV3PR12MB9265B1C6D9D36408539B68B9941EA@LV3PR12MB9265.namprd12.prod.outlook.com/
"Maybe a better solution instead is to add a new option 'vmscape=on'.
If we look at the other most recently added bugs like TSA and ITS, neither
have an explicit 'auto' cmdline option. But they do have 'on' cmdline
options.
The difference between 'auto' and 'on' is that 'auto' defers to the attack
vector controls while 'on' means 'enable this mitigation if the CPU is
vulnerable' (as opposed to 'force' which will enable it even if not
vulnerable).
An explicit 'vmscape=on' could give users an option to ensure the
mitigation is used (regardless of attack vectors) and could choose the best
mitigation (BHB clear if available, otherwise IBPB).
I'd still advise users to not specify any option here unless they know what
they're doing. But an 'on' option would arguably be more consistent with
the other recent bugs and maybe meets the needs you're after?"
> What you have here doesn't actually turn VMSCAPE mitigation on for
> 'vmscape=on'.
It picks between BHB-clear and IBPB, but it still turns 'on' the
mitigation. Maybe I am misunderstanding you?
> > Documentation/admin-guide/hw-vuln/vmscape.rst | 8 ++++
> > Documentation/admin-guide/kernel-parameters.txt | 4 +-
> > arch/x86/include/asm/cpufeatures.h | 1 +
> > arch/x86/include/asm/entry-common.h | 12 +++---
> > arch/x86/include/asm/nospec-branch.h | 2 +-
> > arch/x86/kernel/cpu/bugs.c | 53 ++++++++++++++++++-------
> > arch/x86/kvm/x86.c | 5 ++-
> > 7 files changed, 61 insertions(+), 24 deletions(-)
>
> I think I'd rather this be three or four or five more patches.
>
> The rename:
>
> > -DECLARE_PER_CPU(bool, x86_ibpb_exit_to_user);
> > +DECLARE_PER_CPU(bool, x86_predictor_flush_exit_to_user);
>
> could be alone by itself.
>
> So could the additional command-line override and its documentation.
> (whatever it gets named).
On it.
> > diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> > index 4091a776e37aaed67ca93b0a0cd23cc25dbc33d4..3d547c3eab4e3290de3eee8e89f21587fee34931 100644
> > --- a/arch/x86/include/asm/cpufeatures.h
> > +++ b/arch/x86/include/asm/cpufeatures.h
> > @@ -499,6 +499,7 @@
> > #define X86_FEATURE_IBPB_EXIT_TO_USER (21*32+14) /* Use IBPB on exit-to-userspace, see VMSCAPE bug */
> > #define X86_FEATURE_ABMC (21*32+15) /* Assignable Bandwidth Monitoring Counters */
> > #define X86_FEATURE_MSR_IMM (21*32+16) /* MSR immediate form instructions */
> > +#define X86_FEATURE_CLEAR_BHB_EXIT_TO_USER (21*32+17) /* Clear branch history on exit-to-userspace, see VMSCAPE bug */
>
> X86_FEATURE flags are cheap, but they're not infinite. Is this worth two
> of these? It actually makes the code actively worse. (See below).
>
> > diff --git a/arch/x86/include/asm/entry-common.h b/arch/x86/include/asm/entry-common.h
> > index ce3eb6d5fdf9f2dba59b7bad24afbfafc8c36918..b629e85c33aa7387042cce60040b8a493e3e6d46 100644
> > --- a/arch/x86/include/asm/entry-common.h
> > +++ b/arch/x86/include/asm/entry-common.h
> > @@ -94,11 +94,13 @@ static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs,
> > */
> > choose_random_kstack_offset(rdtsc());
> >
> > - /* Avoid unnecessary reads of 'x86_ibpb_exit_to_user' */
> > - if (cpu_feature_enabled(X86_FEATURE_IBPB_EXIT_TO_USER) &&
> > - this_cpu_read(x86_ibpb_exit_to_user)) {
> > - indirect_branch_prediction_barrier();
> > - this_cpu_write(x86_ibpb_exit_to_user, false);
> > + if (unlikely(this_cpu_read(x86_predictor_flush_exit_to_user))) {
> > + if (cpu_feature_enabled(X86_FEATURE_IBPB_EXIT_TO_USER))
> > + indirect_branch_prediction_barrier();
> > + if (cpu_feature_enabled(X86_FEATURE_CLEAR_BHB_EXIT_TO_USER))
> > + clear_bhb_long_loop();
> > +
> > + this_cpu_write(x86_predictor_flush_exit_to_user, false);
> > }
> > }
>
> One (mildly) nice thing about the old code was that it could avoid
> reading 'x86_predictor_flush_exit_to_user' in the unaffected case.
Yes.
> Also, how does the code generation end up looking here? Each
> cpu_feature_enabled() has an alternative, and
> indirect_branch_prediction_barrier() has another one. Are we generating
> alternatives that can't even possibly happen? For instance, could we
> ever have system with X86_FEATURE_IBPB_EXIT_TO_USER but *not*
> X86_FEATURE_IBPB?
No, without IBPB X86_FEATURE_IBPB_EXIT_TO_USER won't be set. As you
suggested below, static_call() can call write_ibpb() directly in this case.
> Let's say this was:
>
> if (cpu_feature_enabled(X86_FEATURE_FOO_EXIT_TO_USER) &&
With static_call() we could also live without X86_FEATURE_FOO_EXIT_TO_USER,
but ...
> this_cpu_read(x86_ibpb_exit_to_user)) {
... it has the slight drawback that we always read this.
> static_call(clear_branch_history);
> this_cpu_write(x86_ibpb_exit_to_user, false);
> }
>
> And the static_call() was assigned to either clear_bhb_long_loop() or
> write_ibpb(). I suspect the code generation would be nicer and it would
> eliminate one reason for having two X86_FEATUREs.
Agree.
> > static enum vmscape_mitigations vmscape_mitigation __ro_after_init =
> > @@ -3221,6 +3222,8 @@ static int __init vmscape_parse_cmdline(char *str)
> > } else if (!strcmp(str, "force")) {
> > setup_force_cpu_bug(X86_BUG_VMSCAPE);
> > vmscape_mitigation = VMSCAPE_MITIGATION_AUTO;
> > + } else if (!strcmp(str, "on")) {
> > + vmscape_mitigation = VMSCAPE_MITIGATION_AUTO;
> > } else {
> > pr_err("Ignoring unknown vmscape=%s option.\n", str);
> > }
>
> Yeah, it's goofy that =on sets ..._AUTO.
Yes, we can go back to =auto. David, I hope it is not too big of a problem
with attack vector controls?
> > @@ -3231,18 +3234,35 @@ early_param("vmscape", vmscape_parse_cmdline);
> >
> > static void __init vmscape_select_mitigation(void)
> > {
> > - if (!boot_cpu_has_bug(X86_BUG_VMSCAPE) ||
> > - !boot_cpu_has(X86_FEATURE_IBPB)) {
> > + if (!boot_cpu_has_bug(X86_BUG_VMSCAPE)) {
> > vmscape_mitigation = VMSCAPE_MITIGATION_NONE;
> > return;
> > }
> >
> > - if (vmscape_mitigation == VMSCAPE_MITIGATION_AUTO) {
> > - if (should_mitigate_vuln(X86_BUG_VMSCAPE))
> > - vmscape_mitigation = VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER;
> > - else
> > - vmscape_mitigation = VMSCAPE_MITIGATION_NONE;
> > + if (vmscape_mitigation == VMSCAPE_MITIGATION_AUTO &&
> > + !should_mitigate_vuln(X86_BUG_VMSCAPE))
> > + vmscape_mitigation = VMSCAPE_MITIGATION_NONE;
> > +
> > + if (vmscape_mitigation == VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER &&
> > + !boot_cpu_has(X86_FEATURE_IBPB)) {
> > + pr_err("IBPB not supported, switching to AUTO select\n");
> > + vmscape_mitigation = VMSCAPE_MITIGATION_AUTO;
> > }
> > +
> > + if (vmscape_mitigation != VMSCAPE_MITIGATION_AUTO)
> > + return;
> > +
> > + /*
> > + * CPUs with BHI_CTRL(ADL and newer) can avoid the IBPB and use BHB
> > + * clear sequence. These CPUs are only vulnerable to the BHI variant
> > + * of the VMSCAPE attack and does not require an IBPB flush.
> > + */
> > + if (boot_cpu_has(X86_FEATURE_BHI_CTRL))
> > + vmscape_mitigation = VMSCAPE_MITIGATION_BHB_CLEAR_EXIT_TO_USER;
> > + else if (boot_cpu_has(X86_FEATURE_IBPB))
> > + vmscape_mitigation = VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER;
> > + else
> > + vmscape_mitigation = VMSCAPE_MITIGATION_NONE;
> > }
>
> Yeah, there are a *lot* of logic changes there. Any simplifications by
> breaking this up would be appreciated.
Into multiple patches, I guess? Will do.
> > static void __init vmscape_update_mitigation(void)
> > @@ -3261,6 +3281,8 @@ static void __init vmscape_apply_mitigation(void)
> > {
> > if (vmscape_mitigation == VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER)
> > setup_force_cpu_cap(X86_FEATURE_IBPB_EXIT_TO_USER);
> > + else if (vmscape_mitigation == VMSCAPE_MITIGATION_BHB_CLEAR_EXIT_TO_USER)
> > + setup_force_cpu_cap(X86_FEATURE_CLEAR_BHB_EXIT_TO_USER);
> > }
>
> Yeah, so in that scheme I was talking about a minute ago, this could be
> where you do a static_call_update() instead of setting individual
> feature bits.
Yes, and we can avoid both IBPB_EXIT_TO_USER and CLEAR_BHB_EXIT_TO_USER
feature flags.
* Re: [PATCH v3 2/3] x86/vmscape: Replace IBPB with branch history clear on exit to userspace
2025-11-06 23:40 ` Pawan Gupta
@ 2025-11-19 10:33 ` Nikolay Borisov
2025-11-19 18:26 ` Pawan Gupta
0 siblings, 1 reply; 18+ messages in thread
From: Nikolay Borisov @ 2025-11-19 10:33 UTC (permalink / raw)
To: Pawan Gupta, Dave Hansen
Cc: x86, H. Peter Anvin, Josh Poimboeuf, David Kaplan,
Sean Christopherson, Paolo Bonzini, Borislav Petkov, Dave Hansen,
linux-kernel, kvm, Asit Mallick, Tao Zhang
On 11/7/25 01:40, Pawan Gupta wrote:
> [ I drafted the reply this this email earlier, but forgot to send it, sorry. ]
>
> On Mon, Nov 03, 2025 at 12:31:09PM -0800, Dave Hansen wrote:
>> On 10/27/25 16:43, Pawan Gupta wrote:
>>> IBPB mitigation for VMSCAPE is an overkill for CPUs that are only affected
>>> by the BHI variant of VMSCAPE. On such CPUs, eIBRS already provides
>>> indirect branch isolation between guest and host userspace. But, a guest
>>> could still poison the branch history.
>>
>> This is missing a wee bit of background about how branch history and
>> indirect branch prediction are involved in VMSCAPE.
>
> Adding more background to this.
>
>>> To mitigate that, use the recently added clear_bhb_long_loop() to isolate
>>> the branch history between guest and userspace. Add cmdline option
>>> 'vmscape=on' that automatically selects the appropriate mitigation based
>>> on the CPU.
>>
>> Is "=on" the right thing here as opposed to "=auto"?
>
> v1 had it as =auto, David Kaplan made a point that for attack vector controls
> "auto" means "defer to attack vector controls":
>
> https://lore.kernel.org/all/LV3PR12MB9265B1C6D9D36408539B68B9941EA@LV3PR12MB9265.namprd12.prod.outlook.com/
>
> "Maybe a better solution instead is to add a new option 'vmscape=on'.
>
> If we look at the other most recently added bugs like TSA and ITS, neither
> have an explicit 'auto' cmdline option. But they do have 'on' cmdline
> options.
>
> The difference between 'auto' and 'on' is that 'auto' defers to the attack
> vector controls while 'on' means 'enable this mitigation if the CPU is
> vulnerable' (as opposed to 'force' which will enable it even if not
> vulnerable).
>
> An explicit 'vmscape=on' could give users an option to ensure the
> mitigation is used (regardless of attack vectors) and could choose the best
> mitigation (BHB clear if available, otherwise IBPB).
I thought the whole idea of attack vectors was that the gazillion
options for a gazillion mitigations had become untenable over time. Now,
what you are saying is - on top of the simplification, let's add yet more
options to override the attack vectors o_O. IMO, having 'force' is
sufficient to cover scenarios where people really want this mitigation -
either because they know better, or because they want to test something.
'force' also covers the "on" case, so let's leave the "on" behavior to
attack vector support, and use 'force' for everything else.
<snip>
* Re: [PATCH v3 2/3] x86/vmscape: Replace IBPB with branch history clear on exit to userspace
2025-11-19 10:33 ` Nikolay Borisov
@ 2025-11-19 18:26 ` Pawan Gupta
0 siblings, 0 replies; 18+ messages in thread
From: Pawan Gupta @ 2025-11-19 18:26 UTC (permalink / raw)
To: Nikolay Borisov
Cc: Dave Hansen, x86, H. Peter Anvin, Josh Poimboeuf, David Kaplan,
Sean Christopherson, Paolo Bonzini, Borislav Petkov, Dave Hansen,
linux-kernel, kvm, Asit Mallick, Tao Zhang
On Wed, Nov 19, 2025 at 12:33:05PM +0200, Nikolay Borisov wrote:
>
>
> On 11/7/25 01:40, Pawan Gupta wrote:
> > [ I drafted the reply to this email earlier, but forgot to send it, sorry. ]
> >
> > On Mon, Nov 03, 2025 at 12:31:09PM -0800, Dave Hansen wrote:
> > > On 10/27/25 16:43, Pawan Gupta wrote:
> > > > IBPB mitigation for VMSCAPE is an overkill for CPUs that are only affected
> > > > by the BHI variant of VMSCAPE. On such CPUs, eIBRS already provides
> > > > indirect branch isolation between guest and host userspace. But, a guest
> > > > could still poison the branch history.
> > >
> > > This is missing a wee bit of background about how branch history and
> > > indirect branch prediction are involved in VMSCAPE.
> >
> > Adding more background to this.
> >
> > > > To mitigate that, use the recently added clear_bhb_long_loop() to isolate
> > > > the branch history between guest and userspace. Add cmdline option
> > > > 'vmscape=on' that automatically selects the appropriate mitigation based
> > > > on the CPU.
> > >
> > > Is "=on" the right thing here as opposed to "=auto"?
> >
> > v1 had it as =auto, David Kaplan made a point that for attack vector controls
> > "auto" means "defer to attack vector controls":
> >
> > https://lore.kernel.org/all/LV3PR12MB9265B1C6D9D36408539B68B9941EA@LV3PR12MB9265.namprd12.prod.outlook.com/
> >
> > "Maybe a better solution instead is to add a new option 'vmscape=on'.
> >
> > If we look at the other most recently added bugs like TSA and ITS, neither
> > have an explicit 'auto' cmdline option. But they do have 'on' cmdline
> > options.
> >
> > The difference between 'auto' and 'on' is that 'auto' defers to the attack
> > vector controls while 'on' means 'enable this mitigation if the CPU is
> > vulnerable' (as opposed to 'force' which will enable it even if not
> > vulnerable).
> >
> > An explicit 'vmscape=on' could give users an option to ensure the
> > mitigation is used (regardless of attack vectors) and could choose the best
> > mitigation (BHB clear if available, otherwise IBPB).
>
> I thought the whole idea of attack vectors was that the gazillion options
> for a gazillion mitigations had become untenable over time. Now, what you
> are saying is - on top of the simplification, let's add yet more options to
> override the attack vectors o_O. IMO, having 'force' is sufficient to cover
> scenarios where people really want this mitigation - either because they
> know better, or because they want to test something.
Agree with that in general. It all boils down to: is there a use case
where people would want to use attack vector controls but also want to
override one specific mitigation?
> 'force' also covers the "on" case, so let's leave the "on" behavior to
> attack vector support, and use 'force' for everything else.
'force' covers the "on" case with the caveat that it also forces the BUG on
unaffected CPUs.
Given that attack vectors do allow all other mitigations to override the
attack vector settings, VMSCAPE should be no different. Otherwise, we would
need a change to let attack vectors reign over all mitigations.
end of thread [newest: 2025-11-19 18:26 UTC]
Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-10-27 23:43 [PATCH v3 0/3] VMSCAPE optimization for BHI variant Pawan Gupta
2025-10-27 23:43 ` [PATCH v3 1/3] x86/bhi: Add BHB clearing for CPUs with larger branch history Pawan Gupta
2025-11-03 20:04 ` Dave Hansen
2025-11-03 22:45 ` Pawan Gupta
2025-10-27 23:43 ` [PATCH v3 2/3] x86/vmscape: Replace IBPB with branch history clear on exit to userspace Pawan Gupta
2025-10-29 22:47 ` Sean Christopherson
2025-10-30 0:08 ` Pawan Gupta
2025-11-03 20:31 ` Dave Hansen
2025-11-06 23:40 ` Pawan Gupta
2025-11-19 10:33 ` Nikolay Borisov
2025-11-19 18:26 ` Pawan Gupta
2025-10-27 23:43 ` [PATCH v3 3/3] x86/vmscape: Remove LFENCE from BHB clearing long loop Pawan Gupta
2025-11-03 20:45 ` Dave Hansen
2025-11-04 22:01 ` Pawan Gupta
2025-11-04 22:35 ` Dave Hansen
2025-11-04 23:36 ` Pawan Gupta
2025-11-03 20:07 ` [PATCH v3 0/3] VMSCAPE optimization for BHI variant Dave Hansen
2025-11-03 23:03 ` Pawan Gupta