public inbox for linux-doc@vger.kernel.org
* [PATCH v9 00/10] VMSCAPE optimization for BHI variant
@ 2026-04-03  0:30 Pawan Gupta
  2026-04-03  0:30 ` [PATCH v9 01/10] x86/bhi: x86/vmscape: Move LFENCE out of clear_bhb_loop() Pawan Gupta
                   ` (10 more replies)
  0 siblings, 11 replies; 32+ messages in thread
From: Pawan Gupta @ 2026-04-03  0:30 UTC (permalink / raw)
  To: x86, Jon Kohler, Nikolay Borisov, H. Peter Anvin, Josh Poimboeuf,
	David Kaplan, Sean Christopherson, Borislav Petkov, Dave Hansen,
	Peter Zijlstra, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, KP Singh, Jiri Olsa, David S. Miller,
	David Laight, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	David Ahern, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, John Fastabend, Stanislav Fomichev, Hao Luo,
	Paolo Bonzini, Jonathan Corbet
  Cc: linux-kernel, kvm, Asit Mallick, Tao Zhang, bpf, netdev,
	linux-doc

v9:
- Use global variables for BHB loop counters instead of ALTERNATIVE-based
  approach. (Dave & others)
- Use 32-bit registers (%eax/%ecx) for loop counters, loaded via movzbl
  from 8-bit globals. 8-bit registers (e.g. %ah in the inner loop) caused
  performance regression on certain CPUs due to partial-register stalls. (David Laight)
- Let BPF save/restore %rax/%rcx as in the original implementation, since
  it is the only caller that needs these registers preserved across the
  BHB clearing sequence.
- Drop Reviewed-by from patch 2/10 as the implementation changed significantly.
- Apply Tested-by from Jon Kohler to the series (except patch 2/10).
- Fix commit message grammar. (Borislav)
- Rebased to v7.0-rc6.

v8: https://lore.kernel.org/r/20260324-vmscape-bhb-v8-0-68bb524b3ab9@linux.intel.com
- Use helper in KVM to convey the mitigation status. (PeterZ/Borisov)
- Fix the documentation for default vmscape mitigation. (BPF bot)
- Remove the stray lines in bug.c (BPF bot).
- Updated commit messages and comments.
- Rebased to v7.0-rc5.

v7: https://lore.kernel.org/r/20260319-vmscape-bhb-v7-0-b76a777a98af@linux.intel.com
- s/This allows/Allow/ and s/This does adds/This adds/ in patch 1/10 commit
  message (Borislav).
- Minimize register usage in BHB clearing seq. (David Laight)
  - Instead of separate ecx/eax counters, use al/ah.
  - Adjust the alignment of RET due to register size change.
  - save/restore rax in the seq itself.
  - Remove the save/restore of rax/rcx for BPF callers.
- Rename clear_bhb_loop() to clear_bhb_loop_nofence() to make it
  obvious that the LFENCE is not part of the sequence (Borislav).
- Fix Kconfig: s/select/depends on/ HAVE_STATIC_CALL (PeterZ).
- Rebased to v7.0-rc4.

v6: https://lore.kernel.org/r/20251201-vmscape-bhb-v6-0-d610dd515714@linux.intel.com
- Remove semicolon at the end of asm in ALTERNATIVE (Uros).
- Fix build warning in vmscape_select_mitigation() (LKP).
- Rebased to v6.18.

v5: https://lore.kernel.org/r/20251126-vmscape-bhb-v5-2-02d66e423b00@linux.intel.com
- For BHI seq, limit runtime-patching to loop counts only (Dave).
  Dropped 2 patches that moved the BHB seq to a macro.
- Remove redundant switch cases in vmscape_select_mitigation() (Nikolay).
- Improve commit message (Nikolay).
- Collected tags.

v4: https://lore.kernel.org/r/20251119-vmscape-bhb-v4-0-1adad4e69ddc@linux.intel.com
- Move LFENCE to the callsite, out of clear_bhb_loop(). (Dave)
- Make clear_bhb_loop() work for larger BHB. (Dave)
  This now uses hardware enumeration to determine the BHB size to clear.
- Use write_ibpb() instead of indirect_branch_prediction_barrier() when
  IBPB is known to be available. (Dave)
- Use static_call() to simplify mitigation at exit-to-userspace. (Dave)
- Refactor vmscape_select_mitigation(). (Dave)
- Fix vmscape=on which was wrongly behaving as AUTO. (Dave)
- Split the patches. (Dave)
  - Patches 1-4 prepare for making the sequence flexible for VMSCAPE use.
  - Patch 5 is a trivial variable rename.
  - Patches 6-8 prepare for deploying the BHB mitigation for VMSCAPE.
  - Patch 9 deploys the mitigation.
  - Patches 10-11 fix ON vs. AUTO mode.

v3: https://lore.kernel.org/r/20251027-vmscape-bhb-v3-0-5793c2534e93@linux.intel.com
- s/x86_pred_flush_pending/x86_predictor_flush_exit_to_user/ (Sean).
- Removed IBPB & BHB-clear mutual exclusion at exit-to-userspace.
- Collected tags.

v2: https://lore.kernel.org/r/20251015-vmscape-bhb-v2-0-91cbdd9c3a96@linux.intel.com
- Added check for IBPB feature in vmscape_select_mitigation(). (David)
- s/vmscape=auto/vmscape=on/ (David)
- Added patch to remove LFENCE from VMSCAPE BHB-clear sequence.
- Rebased to v6.18-rc1.

v1: https://lore.kernel.org/r/20250924-vmscape-bhb-v1-0-da51f0e1934d@linux.intel.com

Hi All,

These patches aim to improve the performance of a recent mitigation for the
VMSCAPE[1] vulnerability. The improvement is relevant for the BHI variant of
VMSCAPE, which affects Alder Lake and newer processors.

The current mitigation approach uses IBPB on kvm-exit-to-userspace for the
entire range of affected CPUs. This is overkill for CPUs that are only
affected by the BHI variant. On such CPUs, clearing the branch history is
sufficient for VMSCAPE, and also more apt, as the underlying issue is
poisoned branch history.

Below is the iPerf data for transfers between guest and host, comparing the
IBPB and BHB-clear mitigations. BHB-clear shows a performance improvement
over IBPB in most cases.

Platform: Emerald Rapids
Baseline: vmscape=off
Target: IBPB at VMexit-to-userspace vs. the new BHB-clear at
	VMexit-to-userspace mitigation (both compared against baseline).

(pN = N parallel connections)

| iPerf user-net | IBPB    | BHB Clear |
|----------------|---------|-----------|
| UDP 1-vCPU_p1  | -12.5%  |   1.3%    |
| TCP 1-vCPU_p1  | -10.4%  |  -1.5%    |
| TCP 1-vCPU_p1  | -7.5%   |  -3.0%    |
| UDP 4-vCPU_p16 | -3.7%   |  -3.7%    |
| TCP 4-vCPU_p4  | -2.9%   |  -1.4%    |
| UDP 4-vCPU_p4  | -0.6%   |   0.0%    |
| TCP 4-vCPU_p4  |  3.5%   |   0.0%    |

| iPerf bridge-net | IBPB    | BHB Clear |
|------------------|---------|-----------|
| UDP 1-vCPU_p1    | -9.4%   |  -0.4%    |
| TCP 1-vCPU_p1    | -3.9%   |  -0.5%    |
| UDP 4-vCPU_p16   | -2.2%   |  -3.8%    |
| TCP 4-vCPU_p4    | -1.0%   |  -1.0%    |
| TCP 4-vCPU_p4    |  0.5%   |   0.5%    |
| UDP 4-vCPU_p4    |  0.0%   |   0.9%    |
| TCP 1-vCPU_p1    |  0.0%   |   0.9%    |

| iPerf vhost-net | IBPB    | BHB Clear |
|-----------------|---------|-----------|
| UDP 1-vCPU_p1   | -4.3%   |   1.0%    |
| TCP 1-vCPU_p1   | -3.8%   |  -0.5%    |
| TCP 1-vCPU_p1   | -2.7%   |  -0.7%    |
| UDP 4-vCPU_p16  | -0.7%   |  -2.2%    |
| TCP 4-vCPU_p4   | -0.4%   |   0.8%    |
| UDP 4-vCPU_p4   |  0.4%   |  -0.7%    |
| TCP 4-vCPU_p4   |  0.0%   |   0.6%    |

[1] https://comsec.ethz.ch/research/microarch/vmscape-exposing-and-exploiting-incomplete-branch-predictor-isolation-in-cloud-environments/

---
Pawan Gupta (10):
      x86/bhi: x86/vmscape: Move LFENCE out of clear_bhb_loop()
      x86/bhi: Make clear_bhb_loop() effective on newer CPUs
      x86/bhi: Rename clear_bhb_loop() to clear_bhb_loop_nofence()
      x86/vmscape: Rename x86_ibpb_exit_to_user to x86_predictor_flush_exit_to_user
      x86/vmscape: Move mitigation selection to a switch()
      x86/vmscape: Use write_ibpb() instead of indirect_branch_prediction_barrier()
      x86/vmscape: Use static_call() for predictor flush
      x86/vmscape: Deploy BHB clearing mitigation
      x86/vmscape: Resolve conflict between attack-vectors and vmscape=force
      x86/vmscape: Add cmdline vmscape=on to override attack vector controls

 Documentation/admin-guide/hw-vuln/vmscape.rst   | 15 ++++-
 Documentation/admin-guide/kernel-parameters.txt |  6 +-
 arch/x86/Kconfig                                |  1 +
 arch/x86/entry/entry_64.S                       | 21 +++---
 arch/x86/include/asm/cpufeatures.h              |  2 +-
 arch/x86/include/asm/entry-common.h             | 13 ++--
 arch/x86/include/asm/nospec-branch.h            | 15 +++--
 arch/x86/include/asm/processor.h                |  1 +
 arch/x86/kernel/cpu/bugs.c                      | 89 +++++++++++++++++++++----
 arch/x86/kvm/x86.c                              |  4 +-
 arch/x86/net/bpf_jit_comp.c                     |  4 +-
 11 files changed, 135 insertions(+), 36 deletions(-)
---
base-commit: 7aaa8047eafd0bd628065b15757d9b48c5f9c07d
change-id: 20250916-vmscape-bhb-d7d469977f2f

Best regards,
--  
Thanks,
Pawan




* [PATCH v9 01/10] x86/bhi: x86/vmscape: Move LFENCE out of clear_bhb_loop()
  2026-04-03  0:30 [PATCH v9 00/10] VMSCAPE optimization for BHI variant Pawan Gupta
@ 2026-04-03  0:30 ` Pawan Gupta
  2026-04-03 15:16   ` Borislav Petkov
  2026-04-03  0:31 ` [PATCH v9 02/10] x86/bhi: Make clear_bhb_loop() effective on newer CPUs Pawan Gupta
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 32+ messages in thread
From: Pawan Gupta @ 2026-04-03  0:30 UTC (permalink / raw)
  To: x86, Jon Kohler, Nikolay Borisov, H. Peter Anvin, Josh Poimboeuf,
	David Kaplan, Sean Christopherson, Borislav Petkov, Dave Hansen,
	Peter Zijlstra, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, KP Singh, Jiri Olsa, David S. Miller,
	David Laight, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	David Ahern, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, John Fastabend, Stanislav Fomichev, Hao Luo,
	Paolo Bonzini, Jonathan Corbet
  Cc: linux-kernel, kvm, Asit Mallick, Tao Zhang, bpf, netdev,
	linux-doc

Currently, the BHB clearing sequence is followed by an LFENCE to prevent
subsequent indirect branches from executing transiently before the BHB is
fully cleared. However, the LFENCE barrier is unnecessary in certain cases,
for example, when the kernel is using the BHI_DIS_S mitigation and BHB
clearing is only needed for userspace. In such cases, the LFENCE is
redundant because ring transitions already provide the necessary
serialization.

Below is a quick recap of BHI mitigation options:

On Alder Lake and newer

    BHI_DIS_S: Hardware control to mitigate BHI in ring0. This has low
    performance overhead.

    Long loop: Alternatively, a longer version of the BHB clearing sequence
    can be used to mitigate BHI. It can also be used to mitigate the BHI
    variant of VMSCAPE. This is not yet implemented in Linux.

On older CPUs

    Short loop: Clears BHB at kernel entry and VMexit. The "Long loop" is
    effective on older CPUs as well, but should be avoided because of
    unnecessary overhead.

On Alder Lake and newer CPUs, eIBRS isolates the indirect targets between
guest and host. But when affected by the BHI variant of VMSCAPE, a guest's
branch history may still influence indirect branches in userspace. This
also means the big hammer IBPB could be replaced with a cheaper option that
clears the BHB at exit-to-userspace after a VMexit.

In preparation for adding the support for the BHB sequence (without LFENCE)
on newer CPUs, move the LFENCE to the caller side after clear_bhb_loop() is
executed. Allow callers to decide whether they need the LFENCE or not. This
adds a few extra bytes to the call sites, but it obviates the need for
multiple variants of clear_bhb_loop().

Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Jon Kohler <jon@nutanix.com>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
 arch/x86/entry/entry_64.S            | 5 ++++-
 arch/x86/include/asm/nospec-branch.h | 4 ++--
 arch/x86/net/bpf_jit_comp.c          | 2 ++
 3 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 42447b1e1dff..3a180a36ca0e 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1528,6 +1528,9 @@ SYM_CODE_END(rewind_stack_and_make_dead)
  * refactored in the future if needed. The .skips are for safety, to ensure
  * that all RETs are in the second half of a cacheline to mitigate Indirect
  * Target Selection, rather than taking the slowpath via its_return_thunk.
+ *
+ * Note, callers should use a speculation barrier like LFENCE immediately after
+ * a call to this function to ensure BHB is cleared before indirect branches.
  */
 SYM_FUNC_START(clear_bhb_loop)
 	ANNOTATE_NOENDBR
@@ -1562,7 +1565,7 @@ SYM_FUNC_START(clear_bhb_loop)
 	sub	$1, %ecx
 	jnz	1b
 .Lret2:	RET
-5:	lfence
+5:
 	pop	%rbp
 	RET
 SYM_FUNC_END(clear_bhb_loop)
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 4f4b5e8a1574..70b377fcbc1c 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -331,11 +331,11 @@
 
 #ifdef CONFIG_X86_64
 .macro CLEAR_BRANCH_HISTORY
-	ALTERNATIVE "", "call clear_bhb_loop", X86_FEATURE_CLEAR_BHB_LOOP
+	ALTERNATIVE "", "call clear_bhb_loop; lfence", X86_FEATURE_CLEAR_BHB_LOOP
 .endm
 
 .macro CLEAR_BRANCH_HISTORY_VMEXIT
-	ALTERNATIVE "", "call clear_bhb_loop", X86_FEATURE_CLEAR_BHB_VMEXIT
+	ALTERNATIVE "", "call clear_bhb_loop; lfence", X86_FEATURE_CLEAR_BHB_VMEXIT
 .endm
 #else
 #define CLEAR_BRANCH_HISTORY
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index e9b78040d703..63d6c9fa5e80 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -1624,6 +1624,8 @@ static int emit_spectre_bhb_barrier(u8 **pprog, u8 *ip,
 
 		if (emit_call(&prog, func, ip))
 			return -EINVAL;
+		/* Don't speculate past this until BHB is cleared */
+		EMIT_LFENCE();
 		EMIT1(0x59); /* pop rcx */
 		EMIT1(0x58); /* pop rax */
 	}

-- 
2.34.1




* [PATCH v9 02/10] x86/bhi: Make clear_bhb_loop() effective on newer CPUs
  2026-04-03  0:30 [PATCH v9 00/10] VMSCAPE optimization for BHI variant Pawan Gupta
  2026-04-03  0:30 ` [PATCH v9 01/10] x86/bhi: x86/vmscape: Move LFENCE out of clear_bhb_loop() Pawan Gupta
@ 2026-04-03  0:31 ` Pawan Gupta
  2026-04-03 18:10   ` Jim Mattson
  2026-04-03  0:31 ` [PATCH v9 03/10] x86/bhi: Rename clear_bhb_loop() to clear_bhb_loop_nofence() Pawan Gupta
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 32+ messages in thread
From: Pawan Gupta @ 2026-04-03  0:31 UTC (permalink / raw)
  To: x86, Jon Kohler, Nikolay Borisov, H. Peter Anvin, Josh Poimboeuf,
	David Kaplan, Sean Christopherson, Borislav Petkov, Dave Hansen,
	Peter Zijlstra, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, KP Singh, Jiri Olsa, David S. Miller,
	David Laight, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	David Ahern, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, John Fastabend, Stanislav Fomichev, Hao Luo,
	Paolo Bonzini, Jonathan Corbet
  Cc: linux-kernel, kvm, Asit Mallick, Tao Zhang, bpf, netdev,
	linux-doc

As a mitigation for BHI, clear_bhb_loop() executes branches that overwrite
the Branch History Buffer (BHB). On Alder Lake and newer parts this
sequence is not sufficient because it doesn't clear enough entries. This
was not an issue because these CPUs use the BHI_DIS_S hardware mitigation
in the kernel.

Now, with the BHI variant of VMSCAPE, branch history must also be isolated
between guests and userspace. Since BHI_DIS_S only protects the kernel, the
newer CPUs currently use IBPB for this.

A cheaper alternative to the IBPB mitigation is clear_bhb_loop(), but it
does not currently clear enough BHB entries to be effective on newer CPUs
with their larger BHB. At boot, dynamically set the loop counts of
clear_bhb_loop() such that it is effective on newer CPUs too. Use the
X86_FEATURE_BHI_CTRL feature flag to select the appropriate loop counts.

Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
 arch/x86/entry/entry_64.S            |  8 +++++---
 arch/x86/include/asm/nospec-branch.h |  2 ++
 arch/x86/kernel/cpu/bugs.c           | 13 +++++++++++++
 3 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 3a180a36ca0e..bbd4b1c7ec04 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1536,7 +1536,9 @@ SYM_FUNC_START(clear_bhb_loop)
 	ANNOTATE_NOENDBR
 	push	%rbp
 	mov	%rsp, %rbp
-	movl	$5, %ecx
+
+	movzbl    bhb_seq_outer_loop(%rip), %ecx
+
 	ANNOTATE_INTRA_FUNCTION_CALL
 	call	1f
 	jmp	5f
@@ -1556,8 +1558,8 @@ SYM_FUNC_START(clear_bhb_loop)
 	 * This should be ideally be: .skip 32 - (.Lret2 - 2f), 0xcc
 	 * but some Clang versions (e.g. 18) don't like this.
 	 */
-	.skip 32 - 18, 0xcc
-2:	movl	$5, %eax
+	.skip 32 - 20, 0xcc
+2:	movzbl  bhb_seq_inner_loop(%rip), %eax
 3:	jmp	4f
 	nop
 4:	sub	$1, %eax
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 70b377fcbc1c..87b83ae7c97f 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -548,6 +548,8 @@ DECLARE_PER_CPU(u64, x86_spec_ctrl_current);
 extern void update_spec_ctrl_cond(u64 val);
 extern u64 spec_ctrl_current(void);
 
+extern u8 bhb_seq_inner_loop, bhb_seq_outer_loop;
+
 /*
  * With retpoline, we must use IBRS to restrict branch prediction
  * before calling into firmware.
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 83f51cab0b1e..2cb4a96247d8 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -2047,6 +2047,10 @@ enum bhi_mitigations {
 static enum bhi_mitigations bhi_mitigation __ro_after_init =
 	IS_ENABLED(CONFIG_MITIGATION_SPECTRE_BHI) ? BHI_MITIGATION_AUTO : BHI_MITIGATION_OFF;
 
+/* Default to short BHB sequence values */
+u8 bhb_seq_outer_loop __ro_after_init = 5;
+u8 bhb_seq_inner_loop __ro_after_init = 5;
+
 static int __init spectre_bhi_parse_cmdline(char *str)
 {
 	if (!str)
@@ -3242,6 +3246,15 @@ void __init cpu_select_mitigations(void)
 		x86_spec_ctrl_base &= ~SPEC_CTRL_MITIGATIONS_MASK;
 	}
 
+	/*
+	 * Switch to long BHB clear sequence on newer CPUs (with BHI_CTRL
+	 * support), see Intel's BHI guidance.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_BHI_CTRL)) {
+		bhb_seq_outer_loop = 12;
+		bhb_seq_inner_loop = 7;
+	}
+
 	x86_arch_cap_msr = x86_read_arch_cap_msr();
 
 	cpu_print_attack_vectors();

-- 
2.34.1




* [PATCH v9 03/10] x86/bhi: Rename clear_bhb_loop() to clear_bhb_loop_nofence()
  2026-04-03  0:30 [PATCH v9 00/10] VMSCAPE optimization for BHI variant Pawan Gupta
  2026-04-03  0:30 ` [PATCH v9 01/10] x86/bhi: x86/vmscape: Move LFENCE out of clear_bhb_loop() Pawan Gupta
  2026-04-03  0:31 ` [PATCH v9 02/10] x86/bhi: Make clear_bhb_loop() effective on newer CPUs Pawan Gupta
@ 2026-04-03  0:31 ` Pawan Gupta
  2026-04-03  0:31 ` [PATCH v9 04/10] x86/vmscape: Rename x86_ibpb_exit_to_user to x86_predictor_flush_exit_to_user Pawan Gupta
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 32+ messages in thread
From: Pawan Gupta @ 2026-04-03  0:31 UTC (permalink / raw)
  To: x86, Jon Kohler, Nikolay Borisov, H. Peter Anvin, Josh Poimboeuf,
	David Kaplan, Sean Christopherson, Borislav Petkov, Dave Hansen,
	Peter Zijlstra, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, KP Singh, Jiri Olsa, David S. Miller,
	David Laight, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	David Ahern, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, John Fastabend, Stanislav Fomichev, Hao Luo,
	Paolo Bonzini, Jonathan Corbet
  Cc: linux-kernel, kvm, Asit Mallick, Tao Zhang, bpf, netdev,
	linux-doc

To reflect the recent change that moved LFENCE to the caller side.

Suggested-by: Borislav Petkov <bp@alien8.de>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Tested-by: Jon Kohler <jon@nutanix.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
 arch/x86/entry/entry_64.S            | 8 ++++----
 arch/x86/include/asm/nospec-branch.h | 6 +++---
 arch/x86/net/bpf_jit_comp.c          | 2 +-
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index bbd4b1c7ec04..1f56d086d312 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1532,7 +1532,7 @@ SYM_CODE_END(rewind_stack_and_make_dead)
  * Note, callers should use a speculation barrier like LFENCE immediately after
  * a call to this function to ensure BHB is cleared before indirect branches.
  */
-SYM_FUNC_START(clear_bhb_loop)
+SYM_FUNC_START(clear_bhb_loop_nofence)
 	ANNOTATE_NOENDBR
 	push	%rbp
 	mov	%rsp, %rbp
@@ -1570,6 +1570,6 @@ SYM_FUNC_START(clear_bhb_loop)
 5:
 	pop	%rbp
 	RET
-SYM_FUNC_END(clear_bhb_loop)
-EXPORT_SYMBOL_FOR_KVM(clear_bhb_loop)
-STACK_FRAME_NON_STANDARD(clear_bhb_loop)
+SYM_FUNC_END(clear_bhb_loop_nofence)
+EXPORT_SYMBOL_FOR_KVM(clear_bhb_loop_nofence)
+STACK_FRAME_NON_STANDARD(clear_bhb_loop_nofence)
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 87b83ae7c97f..157eb69c7f0f 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -331,11 +331,11 @@
 
 #ifdef CONFIG_X86_64
 .macro CLEAR_BRANCH_HISTORY
-	ALTERNATIVE "", "call clear_bhb_loop; lfence", X86_FEATURE_CLEAR_BHB_LOOP
+	ALTERNATIVE "", "call clear_bhb_loop_nofence; lfence", X86_FEATURE_CLEAR_BHB_LOOP
 .endm
 
 .macro CLEAR_BRANCH_HISTORY_VMEXIT
-	ALTERNATIVE "", "call clear_bhb_loop; lfence", X86_FEATURE_CLEAR_BHB_VMEXIT
+	ALTERNATIVE "", "call clear_bhb_loop_nofence; lfence", X86_FEATURE_CLEAR_BHB_VMEXIT
 .endm
 #else
 #define CLEAR_BRANCH_HISTORY
@@ -389,7 +389,7 @@ extern void entry_untrain_ret(void);
 extern void write_ibpb(void);
 
 #ifdef CONFIG_X86_64
-extern void clear_bhb_loop(void);
+extern void clear_bhb_loop_nofence(void);
 #endif
 
 extern void (*x86_return_thunk)(void);
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 63d6c9fa5e80..f40e88f87273 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -1619,7 +1619,7 @@ static int emit_spectre_bhb_barrier(u8 **pprog, u8 *ip,
 		EMIT1(0x51); /* push rcx */
 		ip += 2;
 
-		func = (u8 *)clear_bhb_loop;
+		func = (u8 *)clear_bhb_loop_nofence;
 		ip += x86_call_depth_emit_accounting(&prog, func, ip);
 
 		if (emit_call(&prog, func, ip))

-- 
2.34.1




* [PATCH v9 04/10] x86/vmscape: Rename x86_ibpb_exit_to_user to x86_predictor_flush_exit_to_user
  2026-04-03  0:30 [PATCH v9 00/10] VMSCAPE optimization for BHI variant Pawan Gupta
                   ` (2 preceding siblings ...)
  2026-04-03  0:31 ` [PATCH v9 03/10] x86/bhi: Rename clear_bhb_loop() to clear_bhb_loop_nofence() Pawan Gupta
@ 2026-04-03  0:31 ` Pawan Gupta
  2026-04-03  0:31 ` [PATCH v9 05/10] x86/vmscape: Move mitigation selection to a switch() Pawan Gupta
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 32+ messages in thread
From: Pawan Gupta @ 2026-04-03  0:31 UTC (permalink / raw)
  To: x86, Jon Kohler, Nikolay Borisov, H. Peter Anvin, Josh Poimboeuf,
	David Kaplan, Sean Christopherson, Borislav Petkov, Dave Hansen,
	Peter Zijlstra, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, KP Singh, Jiri Olsa, David S. Miller,
	David Laight, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	David Ahern, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, John Fastabend, Stanislav Fomichev, Hao Luo,
	Paolo Bonzini, Jonathan Corbet
  Cc: linux-kernel, kvm, Asit Mallick, Tao Zhang, bpf, netdev,
	linux-doc

With the upcoming changes, x86_ibpb_exit_to_user will also be used when the
BHB clearing sequence is used. Rename it to cover both cases.

No functional change.

Suggested-by: Sean Christopherson <seanjc@google.com>
Tested-by: Jon Kohler <jon@nutanix.com>
Acked-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
 arch/x86/include/asm/entry-common.h  | 6 +++---
 arch/x86/include/asm/nospec-branch.h | 2 +-
 arch/x86/kernel/cpu/bugs.c           | 4 ++--
 arch/x86/kvm/x86.c                   | 2 +-
 4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/entry-common.h b/arch/x86/include/asm/entry-common.h
index ce3eb6d5fdf9..c45858db16c9 100644
--- a/arch/x86/include/asm/entry-common.h
+++ b/arch/x86/include/asm/entry-common.h
@@ -94,11 +94,11 @@ static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs,
 	 */
 	choose_random_kstack_offset(rdtsc());
 
-	/* Avoid unnecessary reads of 'x86_ibpb_exit_to_user' */
+	/* Avoid unnecessary reads of 'x86_predictor_flush_exit_to_user' */
 	if (cpu_feature_enabled(X86_FEATURE_IBPB_EXIT_TO_USER) &&
-	    this_cpu_read(x86_ibpb_exit_to_user)) {
+	    this_cpu_read(x86_predictor_flush_exit_to_user)) {
 		indirect_branch_prediction_barrier();
-		this_cpu_write(x86_ibpb_exit_to_user, false);
+		this_cpu_write(x86_predictor_flush_exit_to_user, false);
 	}
 }
 #define arch_exit_to_user_mode_prepare arch_exit_to_user_mode_prepare
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 157eb69c7f0f..0381db59c39d 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -533,7 +533,7 @@ void alternative_msr_write(unsigned int msr, u64 val, unsigned int feature)
 		: "memory");
 }
 
-DECLARE_PER_CPU(bool, x86_ibpb_exit_to_user);
+DECLARE_PER_CPU(bool, x86_predictor_flush_exit_to_user);
 
 static inline void indirect_branch_prediction_barrier(void)
 {
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 2cb4a96247d8..002bf4adccc3 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -65,8 +65,8 @@ EXPORT_PER_CPU_SYMBOL_GPL(x86_spec_ctrl_current);
  * be needed to before running userspace. That IBPB will flush the branch
  * predictor content.
  */
-DEFINE_PER_CPU(bool, x86_ibpb_exit_to_user);
-EXPORT_PER_CPU_SYMBOL_GPL(x86_ibpb_exit_to_user);
+DEFINE_PER_CPU(bool, x86_predictor_flush_exit_to_user);
+EXPORT_PER_CPU_SYMBOL_GPL(x86_predictor_flush_exit_to_user);
 
 u64 x86_pred_cmd __ro_after_init = PRED_CMD_IBPB;
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fd1c4a36b593..45d7cfedc507 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11464,7 +11464,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	 * may migrate to.
 	 */
 	if (cpu_feature_enabled(X86_FEATURE_IBPB_EXIT_TO_USER))
-		this_cpu_write(x86_ibpb_exit_to_user, true);
+		this_cpu_write(x86_predictor_flush_exit_to_user, true);
 
 	/*
 	 * Consume any pending interrupts, including the possible source of

-- 
2.34.1




* [PATCH v9 05/10] x86/vmscape: Move mitigation selection to a switch()
  2026-04-03  0:30 [PATCH v9 00/10] VMSCAPE optimization for BHI variant Pawan Gupta
                   ` (3 preceding siblings ...)
  2026-04-03  0:31 ` [PATCH v9 04/10] x86/vmscape: Rename x86_ibpb_exit_to_user to x86_predictor_flush_exit_to_user Pawan Gupta
@ 2026-04-03  0:31 ` Pawan Gupta
  2026-04-03  0:32 ` [PATCH v9 06/10] x86/vmscape: Use write_ibpb() instead of indirect_branch_prediction_barrier() Pawan Gupta
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 32+ messages in thread
From: Pawan Gupta @ 2026-04-03  0:31 UTC (permalink / raw)
  To: x86, Jon Kohler, Nikolay Borisov, H. Peter Anvin, Josh Poimboeuf,
	David Kaplan, Sean Christopherson, Borislav Petkov, Dave Hansen,
	Peter Zijlstra, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, KP Singh, Jiri Olsa, David S. Miller,
	David Laight, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	David Ahern, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, John Fastabend, Stanislav Fomichev, Hao Luo,
	Paolo Bonzini, Jonathan Corbet
  Cc: linux-kernel, kvm, Asit Mallick, Tao Zhang, bpf, netdev,
	linux-doc

This ensures that all mitigation modes are explicitly handled, while
keeping the mitigation selection for each mode together. It also prepares
for adding a BHB-clearing mitigation mode for VMSCAPE.

Tested-by: Jon Kohler <jon@nutanix.com>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
 arch/x86/kernel/cpu/bugs.c | 24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 002bf4adccc3..636280c612f0 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -3088,17 +3088,33 @@ early_param("vmscape", vmscape_parse_cmdline);
 
 static void __init vmscape_select_mitigation(void)
 {
-	if (!boot_cpu_has_bug(X86_BUG_VMSCAPE) ||
-	    !boot_cpu_has(X86_FEATURE_IBPB)) {
+	if (!boot_cpu_has_bug(X86_BUG_VMSCAPE)) {
 		vmscape_mitigation = VMSCAPE_MITIGATION_NONE;
 		return;
 	}
 
-	if (vmscape_mitigation == VMSCAPE_MITIGATION_AUTO) {
-		if (should_mitigate_vuln(X86_BUG_VMSCAPE))
+	if ((vmscape_mitigation == VMSCAPE_MITIGATION_AUTO) &&
+	    !should_mitigate_vuln(X86_BUG_VMSCAPE))
+		vmscape_mitigation = VMSCAPE_MITIGATION_NONE;
+
+	switch (vmscape_mitigation) {
+	case VMSCAPE_MITIGATION_NONE:
+		break;
+
+	case VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER:
+		if (!boot_cpu_has(X86_FEATURE_IBPB))
+			vmscape_mitigation = VMSCAPE_MITIGATION_NONE;
+		break;
+
+	case VMSCAPE_MITIGATION_AUTO:
+		if (boot_cpu_has(X86_FEATURE_IBPB))
 			vmscape_mitigation = VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER;
 		else
 			vmscape_mitigation = VMSCAPE_MITIGATION_NONE;
+		break;
+
+	default:
+		break;
 	}
 }
 

-- 
2.34.1




* [PATCH v9 06/10] x86/vmscape: Use write_ibpb() instead of indirect_branch_prediction_barrier()
  2026-04-03  0:30 [PATCH v9 00/10] VMSCAPE optimization for BHI variant Pawan Gupta
                   ` (4 preceding siblings ...)
  2026-04-03  0:31 ` [PATCH v9 05/10] x86/vmscape: Move mitigation selection to a switch() Pawan Gupta
@ 2026-04-03  0:32 ` Pawan Gupta
  2026-04-03  0:32 ` [PATCH v9 07/10] x86/vmscape: Use static_call() for predictor flush Pawan Gupta
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 32+ messages in thread
From: Pawan Gupta @ 2026-04-03  0:32 UTC (permalink / raw)
  To: x86, Jon Kohler, Nikolay Borisov, H. Peter Anvin, Josh Poimboeuf,
	David Kaplan, Sean Christopherson, Borislav Petkov, Dave Hansen,
	Peter Zijlstra, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, KP Singh, Jiri Olsa, David S. Miller,
	David Laight, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	David Ahern, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, John Fastabend, Stanislav Fomichev, Hao Luo,
	Paolo Bonzini, Jonathan Corbet
  Cc: linux-kernel, kvm, Asit Mallick, Tao Zhang, bpf, netdev,
	linux-doc

indirect_branch_prediction_barrier() is a wrapper around write_ibpb() that
also checks whether the CPU supports IBPB. For VMSCAPE, the call to
indirect_branch_prediction_barrier() is only reachable when the CPU
supports IBPB.

Simply call write_ibpb() directly to avoid unnecessary alternative
patching.

Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Jon Kohler <jon@nutanix.com>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
 arch/x86/include/asm/entry-common.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/entry-common.h b/arch/x86/include/asm/entry-common.h
index c45858db16c9..78b143673ca7 100644
--- a/arch/x86/include/asm/entry-common.h
+++ b/arch/x86/include/asm/entry-common.h
@@ -97,7 +97,7 @@ static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs,
 	/* Avoid unnecessary reads of 'x86_predictor_flush_exit_to_user' */
 	if (cpu_feature_enabled(X86_FEATURE_IBPB_EXIT_TO_USER) &&
 	    this_cpu_read(x86_predictor_flush_exit_to_user)) {
-		indirect_branch_prediction_barrier();
+		write_ibpb();
 		this_cpu_write(x86_predictor_flush_exit_to_user, false);
 	}
 }

-- 
2.34.1




* [PATCH v9 07/10] x86/vmscape: Use static_call() for predictor flush
  2026-04-03  0:30 [PATCH v9 00/10] VMSCAPE optimization for BHI variant Pawan Gupta
                   ` (5 preceding siblings ...)
  2026-04-03  0:32 ` [PATCH v9 06/10] x86/vmscape: Use write_ibpb() instead of indirect_branch_prediction_barrier() Pawan Gupta
@ 2026-04-03  0:32 ` Pawan Gupta
  2026-04-03 14:52   ` Sean Christopherson
  2026-04-03  0:32 ` [PATCH v9 08/10] x86/vmscape: Deploy BHB clearing mitigation Pawan Gupta
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 32+ messages in thread
From: Pawan Gupta @ 2026-04-03  0:32 UTC (permalink / raw)
  To: x86, Jon Kohler, Nikolay Borisov, H. Peter Anvin, Josh Poimboeuf,
	David Kaplan, Sean Christopherson, Borislav Petkov, Dave Hansen,
	Peter Zijlstra, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, KP Singh, Jiri Olsa, David S. Miller,
	David Laight, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	David Ahern, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, John Fastabend, Stanislav Fomichev, Hao Luo,
	Paolo Bonzini, Jonathan Corbet
  Cc: linux-kernel, kvm, Asit Mallick, Tao Zhang, bpf, netdev,
	linux-doc

Adding more exit-to-userspace mitigation options for VMSCAPE would normally
require a series of checks to decide which mitigation to use. In this case,
the mitigation is performed by a function that is chosen at boot, so the
extra feature flags and multiple checks can be avoided by dispatching to the
mitigating function via a static_call().

Replace the flag-based mitigation selector with a static_call(). This also
frees the existing X86_FEATURE_IBPB_EXIT_TO_USER.

Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Jon Kohler <jon@nutanix.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
 arch/x86/Kconfig                     |  1 +
 arch/x86/include/asm/cpufeatures.h   |  2 +-
 arch/x86/include/asm/entry-common.h  |  7 +++----
 arch/x86/include/asm/nospec-branch.h |  3 +++
 arch/x86/include/asm/processor.h     |  1 +
 arch/x86/kernel/cpu/bugs.c           | 14 +++++++++++++-
 arch/x86/kvm/x86.c                   |  2 +-
 7 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index e2df1b147184..5b8def9ddb98 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2720,6 +2720,7 @@ config MITIGATION_TSA
 config MITIGATION_VMSCAPE
 	bool "Mitigate VMSCAPE"
 	depends on KVM
+	depends on HAVE_STATIC_CALL
 	default y
 	help
 	  Enable mitigation for VMSCAPE attacks. VMSCAPE is a hardware security
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index dbe104df339b..b4d529dd6d30 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -503,7 +503,7 @@
 #define X86_FEATURE_TSA_SQ_NO		(21*32+11) /* AMD CPU not vulnerable to TSA-SQ */
 #define X86_FEATURE_TSA_L1_NO		(21*32+12) /* AMD CPU not vulnerable to TSA-L1 */
 #define X86_FEATURE_CLEAR_CPU_BUF_VM	(21*32+13) /* Clear CPU buffers using VERW before VMRUN */
-#define X86_FEATURE_IBPB_EXIT_TO_USER	(21*32+14) /* Use IBPB on exit-to-userspace, see VMSCAPE bug */
+/* Free */
 #define X86_FEATURE_ABMC		(21*32+15) /* Assignable Bandwidth Monitoring Counters */
 #define X86_FEATURE_MSR_IMM		(21*32+16) /* MSR immediate form instructions */
 #define X86_FEATURE_SGX_EUPDATESVN	(21*32+17) /* Support for ENCLS[EUPDATESVN] instruction */
diff --git a/arch/x86/include/asm/entry-common.h b/arch/x86/include/asm/entry-common.h
index 78b143673ca7..783e7cb50cae 100644
--- a/arch/x86/include/asm/entry-common.h
+++ b/arch/x86/include/asm/entry-common.h
@@ -4,6 +4,7 @@
 
 #include <linux/randomize_kstack.h>
 #include <linux/user-return-notifier.h>
+#include <linux/static_call_types.h>
 
 #include <asm/nospec-branch.h>
 #include <asm/io_bitmap.h>
@@ -94,10 +95,8 @@ static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs,
 	 */
 	choose_random_kstack_offset(rdtsc());
 
-	/* Avoid unnecessary reads of 'x86_predictor_flush_exit_to_user' */
-	if (cpu_feature_enabled(X86_FEATURE_IBPB_EXIT_TO_USER) &&
-	    this_cpu_read(x86_predictor_flush_exit_to_user)) {
-		write_ibpb();
+	if (unlikely(this_cpu_read(x86_predictor_flush_exit_to_user))) {
+		static_call_cond(vmscape_predictor_flush)();
 		this_cpu_write(x86_predictor_flush_exit_to_user, false);
 	}
 }
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 0381db59c39d..066fd8095200 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -542,6 +542,9 @@ static inline void indirect_branch_prediction_barrier(void)
 			    :: "rax", "rcx", "rdx", "memory");
 }
 
+#include <linux/static_call_types.h>
+DECLARE_STATIC_CALL(vmscape_predictor_flush, write_ibpb);
+
 /* The Intel SPEC CTRL MSR base value cache */
 extern u64 x86_spec_ctrl_base;
 DECLARE_PER_CPU(u64, x86_spec_ctrl_current);
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index a24c7805acdb..20ab4dd588c6 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -753,6 +753,7 @@ enum mds_mitigations {
 };
 
 extern bool gds_ucode_mitigated(void);
+extern bool vmscape_mitigation_enabled(void);
 
 /*
  * Make previous memory operations globally visible before
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 636280c612f0..2f431d0be3d9 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -144,6 +144,12 @@ EXPORT_SYMBOL_GPL(cpu_buf_idle_clear);
  */
 DEFINE_STATIC_KEY_FALSE(switch_mm_cond_l1d_flush);
 
+/*
+ * Controls how vmscape is mitigated e.g. via IBPB or BHB-clear
+ * sequence. This defaults to no mitigation.
+ */
+DEFINE_STATIC_CALL_NULL(vmscape_predictor_flush, write_ibpb);
+
 #undef pr_fmt
 #define pr_fmt(fmt)	"mitigations: " fmt
 
@@ -3133,8 +3139,14 @@ static void __init vmscape_update_mitigation(void)
 static void __init vmscape_apply_mitigation(void)
 {
 	if (vmscape_mitigation == VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER)
-		setup_force_cpu_cap(X86_FEATURE_IBPB_EXIT_TO_USER);
+		static_call_update(vmscape_predictor_flush, write_ibpb);
+}
+
+bool vmscape_mitigation_enabled(void)
+{
+	return !!static_call_query(vmscape_predictor_flush);
 }
+EXPORT_SYMBOL_FOR_KVM(vmscape_mitigation_enabled);
 
 #undef pr_fmt
 #define pr_fmt(fmt) fmt
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 45d7cfedc507..e204482e64f3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11463,7 +11463,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	 * set for the CPU that actually ran the guest, and not the CPU that it
 	 * may migrate to.
 	 */
-	if (cpu_feature_enabled(X86_FEATURE_IBPB_EXIT_TO_USER))
+	if (vmscape_mitigation_enabled())
 		this_cpu_write(x86_predictor_flush_exit_to_user, true);
 
 	/*

-- 
2.34.1



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v9 08/10] x86/vmscape: Deploy BHB clearing mitigation
  2026-04-03  0:30 [PATCH v9 00/10] VMSCAPE optimization for BHI variant Pawan Gupta
                   ` (6 preceding siblings ...)
  2026-04-03  0:32 ` [PATCH v9 07/10] x86/vmscape: Use static_call() for predictor flush Pawan Gupta
@ 2026-04-03  0:32 ` Pawan Gupta
  2026-04-03  0:32 ` [PATCH v9 09/10] x86/vmscape: Resolve conflict between attack-vectors and vmscape=force Pawan Gupta
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 32+ messages in thread
From: Pawan Gupta @ 2026-04-03  0:32 UTC (permalink / raw)
  To: x86, Jon Kohler, Nikolay Borisov, H. Peter Anvin, Josh Poimboeuf,
	David Kaplan, Sean Christopherson, Borislav Petkov, Dave Hansen,
	Peter Zijlstra, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, KP Singh, Jiri Olsa, David S. Miller,
	David Laight, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	David Ahern, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, John Fastabend, Stanislav Fomichev, Hao Luo,
	Paolo Bonzini, Jonathan Corbet
  Cc: linux-kernel, kvm, Asit Mallick, Tao Zhang, bpf, netdev,
	linux-doc

The IBPB mitigation for VMSCAPE is overkill on CPUs that are only affected
by the BHI variant of VMSCAPE. On such CPUs, eIBRS already provides
indirect branch isolation between guest and host userspace. However, branch
history from the guest may still influence indirect branches in host
userspace.

To mitigate the BHI aspect, use the BHB clearing sequence. Since IBPB is no
longer the only mitigation for VMSCAPE, update the documentation to reflect
that =auto may select either the IBPB or the BHB clearing mitigation,
depending on the CPU.

Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Tested-by: Jon Kohler <jon@nutanix.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
 Documentation/admin-guide/hw-vuln/vmscape.rst   | 11 ++++++++-
 Documentation/admin-guide/kernel-parameters.txt |  4 +++-
 arch/x86/include/asm/entry-common.h             |  4 ++++
 arch/x86/include/asm/nospec-branch.h            |  2 ++
 arch/x86/kernel/cpu/bugs.c                      | 30 +++++++++++++++++++------
 5 files changed, 42 insertions(+), 9 deletions(-)

diff --git a/Documentation/admin-guide/hw-vuln/vmscape.rst b/Documentation/admin-guide/hw-vuln/vmscape.rst
index d9b9a2b6c114..7c40cf70ad7a 100644
--- a/Documentation/admin-guide/hw-vuln/vmscape.rst
+++ b/Documentation/admin-guide/hw-vuln/vmscape.rst
@@ -86,6 +86,10 @@ The possible values in this file are:
    run a potentially malicious guest and issues an IBPB before the first
    exit to userspace after VM-exit.
 
+ * 'Mitigation: Clear BHB before exit to userspace':
+
+   As above, conditional BHB clearing mitigation is enabled.
+
  * 'Mitigation: IBPB on VMEXIT':
 
    IBPB is issued on every VM-exit. This occurs when other mitigations like
@@ -102,9 +106,14 @@ The mitigation can be controlled via the ``vmscape=`` command line parameter:
 
  * ``vmscape=ibpb``:
 
-   Enable conditional IBPB mitigation (default when CONFIG_MITIGATION_VMSCAPE=y).
+   Enable conditional IBPB mitigation.
 
  * ``vmscape=force``:
 
    Force vulnerability detection and mitigation even on processors that are
    not known to be affected.
+
+ * ``vmscape=auto``:
+
+   Choose the mitigation based on the VMSCAPE variant the CPU is affected by.
+   (default when CONFIG_MITIGATION_VMSCAPE=y)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 03a550630644..3853c7109419 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -8378,9 +8378,11 @@ Kernel parameters
 
 			off		- disable the mitigation
 			ibpb		- use Indirect Branch Prediction Barrier
-					  (IBPB) mitigation (default)
+					  (IBPB) mitigation
 			force		- force vulnerability detection even on
 					  unaffected processors
+			auto		- (default) use IBPB or BHB clear
+					  mitigation based on CPU
 
 	vsyscall=	[X86-64,EARLY]
 			Controls the behavior of vsyscalls (i.e. calls to
diff --git a/arch/x86/include/asm/entry-common.h b/arch/x86/include/asm/entry-common.h
index 783e7cb50cae..13db31472f3a 100644
--- a/arch/x86/include/asm/entry-common.h
+++ b/arch/x86/include/asm/entry-common.h
@@ -96,6 +96,10 @@ static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs,
 	choose_random_kstack_offset(rdtsc());
 
 	if (unlikely(this_cpu_read(x86_predictor_flush_exit_to_user))) {
+		/*
+		 * Since the mitigation is for userspace, an explicit
+		 * speculation barrier is not required after flush.
+		 */
 		static_call_cond(vmscape_predictor_flush)();
 		this_cpu_write(x86_predictor_flush_exit_to_user, false);
 	}
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 066fd8095200..38478383139b 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -390,6 +390,8 @@ extern void write_ibpb(void);
 
 #ifdef CONFIG_X86_64
 extern void clear_bhb_loop_nofence(void);
+#else
+static inline void clear_bhb_loop_nofence(void) {}
 #endif
 
 extern void (*x86_return_thunk)(void);
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 2f431d0be3d9..c7946cd809f7 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -61,9 +61,8 @@ DEFINE_PER_CPU(u64, x86_spec_ctrl_current);
 EXPORT_PER_CPU_SYMBOL_GPL(x86_spec_ctrl_current);
 
 /*
- * Set when the CPU has run a potentially malicious guest. An IBPB will
- * be needed to before running userspace. That IBPB will flush the branch
- * predictor content.
+ * Set when the CPU has run a potentially malicious guest. Indicates that a
+ * branch predictor flush is needed before running userspace.
  */
 DEFINE_PER_CPU(bool, x86_predictor_flush_exit_to_user);
 EXPORT_PER_CPU_SYMBOL_GPL(x86_predictor_flush_exit_to_user);
@@ -3060,13 +3059,15 @@ enum vmscape_mitigations {
 	VMSCAPE_MITIGATION_AUTO,
 	VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER,
 	VMSCAPE_MITIGATION_IBPB_ON_VMEXIT,
+	VMSCAPE_MITIGATION_BHB_CLEAR_EXIT_TO_USER,
 };
 
 static const char * const vmscape_strings[] = {
-	[VMSCAPE_MITIGATION_NONE]		= "Vulnerable",
+	[VMSCAPE_MITIGATION_NONE]			= "Vulnerable",
 	/* [VMSCAPE_MITIGATION_AUTO] */
-	[VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER]	= "Mitigation: IBPB before exit to userspace",
-	[VMSCAPE_MITIGATION_IBPB_ON_VMEXIT]	= "Mitigation: IBPB on VMEXIT",
+	[VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER]		= "Mitigation: IBPB before exit to userspace",
+	[VMSCAPE_MITIGATION_IBPB_ON_VMEXIT]		= "Mitigation: IBPB on VMEXIT",
+	[VMSCAPE_MITIGATION_BHB_CLEAR_EXIT_TO_USER]	= "Mitigation: Clear BHB before exit to userspace",
 };
 
 static enum vmscape_mitigations vmscape_mitigation __ro_after_init =
@@ -3084,6 +3085,8 @@ static int __init vmscape_parse_cmdline(char *str)
 	} else if (!strcmp(str, "force")) {
 		setup_force_cpu_bug(X86_BUG_VMSCAPE);
 		vmscape_mitigation = VMSCAPE_MITIGATION_AUTO;
+	} else if (!strcmp(str, "auto")) {
+		vmscape_mitigation = VMSCAPE_MITIGATION_AUTO;
 	} else {
 		pr_err("Ignoring unknown vmscape=%s option.\n", str);
 	}
@@ -3113,7 +3116,17 @@ static void __init vmscape_select_mitigation(void)
 		break;
 
 	case VMSCAPE_MITIGATION_AUTO:
-		if (boot_cpu_has(X86_FEATURE_IBPB))
+		/*
+		 * CPUs with BHI_CTRL(ADL and newer) can avoid the IBPB and use
+		 * BHB clear sequence. These CPUs are only vulnerable to the BHI
+		 * variant of the VMSCAPE attack, and thus they do not require a
+		 * full predictor flush.
+		 *
+		 * Note, in 32-bit mode BHB clear sequence is not supported.
+		 */
+		if (boot_cpu_has(X86_FEATURE_BHI_CTRL) && IS_ENABLED(CONFIG_X86_64))
+			vmscape_mitigation = VMSCAPE_MITIGATION_BHB_CLEAR_EXIT_TO_USER;
+		else if (boot_cpu_has(X86_FEATURE_IBPB))
 			vmscape_mitigation = VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER;
 		else
 			vmscape_mitigation = VMSCAPE_MITIGATION_NONE;
@@ -3140,6 +3153,8 @@ static void __init vmscape_apply_mitigation(void)
 {
 	if (vmscape_mitigation == VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER)
 		static_call_update(vmscape_predictor_flush, write_ibpb);
+	else if (vmscape_mitigation == VMSCAPE_MITIGATION_BHB_CLEAR_EXIT_TO_USER)
+		static_call_update(vmscape_predictor_flush, clear_bhb_loop_nofence);
 }
 
 bool vmscape_mitigation_enabled(void)
@@ -3237,6 +3252,7 @@ void cpu_bugs_smt_update(void)
 		break;
 	case VMSCAPE_MITIGATION_IBPB_ON_VMEXIT:
 	case VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER:
+	case VMSCAPE_MITIGATION_BHB_CLEAR_EXIT_TO_USER:
 		/*
 		 * Hypervisors can be attacked across-threads, warn for SMT when
 		 * STIBP is not already enabled system-wide.

-- 
2.34.1



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v9 09/10] x86/vmscape: Resolve conflict between attack-vectors and vmscape=force
  2026-04-03  0:30 [PATCH v9 00/10] VMSCAPE optimization for BHI variant Pawan Gupta
                   ` (7 preceding siblings ...)
  2026-04-03  0:32 ` [PATCH v9 08/10] x86/vmscape: Deploy BHB clearing mitigation Pawan Gupta
@ 2026-04-03  0:32 ` Pawan Gupta
  2026-04-03  0:33 ` [PATCH v9 10/10] x86/vmscape: Add cmdline vmscape=on to override attack vector controls Pawan Gupta
  2026-04-04 15:20 ` [PATCH v9 00/10] VMSCAPE optimization for BHI variant David Laight
  10 siblings, 0 replies; 32+ messages in thread
From: Pawan Gupta @ 2026-04-03  0:32 UTC (permalink / raw)
  To: x86, Jon Kohler, Nikolay Borisov, H. Peter Anvin, Josh Poimboeuf,
	David Kaplan, Sean Christopherson, Borislav Petkov, Dave Hansen,
	Peter Zijlstra, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, KP Singh, Jiri Olsa, David S. Miller,
	David Laight, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	David Ahern, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, John Fastabend, Stanislav Fomichev, Hao Luo,
	Paolo Bonzini, Jonathan Corbet
  Cc: linux-kernel, kvm, Asit Mallick, Tao Zhang, bpf, netdev,
	linux-doc

The vmscape=force option currently defaults to the AUTO mitigation. This
lets attack-vector controls override the VMSCAPE mitigation, preventing the
user from actually forcing it.

When the VMSCAPE mitigation is forced, allow it to be deployed irrespective
of attack vectors. Introduce VMSCAPE_MITIGATION_ON, which wins over
attack-vector controls.

Tested-by: Jon Kohler <jon@nutanix.com>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
 arch/x86/kernel/cpu/bugs.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index c7946cd809f7..ba8389df467a 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -3057,6 +3057,7 @@ static void __init srso_apply_mitigation(void)
 enum vmscape_mitigations {
 	VMSCAPE_MITIGATION_NONE,
 	VMSCAPE_MITIGATION_AUTO,
+	VMSCAPE_MITIGATION_ON,
 	VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER,
 	VMSCAPE_MITIGATION_IBPB_ON_VMEXIT,
 	VMSCAPE_MITIGATION_BHB_CLEAR_EXIT_TO_USER,
@@ -3065,6 +3066,7 @@ enum vmscape_mitigations {
 static const char * const vmscape_strings[] = {
 	[VMSCAPE_MITIGATION_NONE]			= "Vulnerable",
 	/* [VMSCAPE_MITIGATION_AUTO] */
+	/* [VMSCAPE_MITIGATION_ON] */
 	[VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER]		= "Mitigation: IBPB before exit to userspace",
 	[VMSCAPE_MITIGATION_IBPB_ON_VMEXIT]		= "Mitigation: IBPB on VMEXIT",
 	[VMSCAPE_MITIGATION_BHB_CLEAR_EXIT_TO_USER]	= "Mitigation: Clear BHB before exit to userspace",
@@ -3084,7 +3086,7 @@ static int __init vmscape_parse_cmdline(char *str)
 		vmscape_mitigation = VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER;
 	} else if (!strcmp(str, "force")) {
 		setup_force_cpu_bug(X86_BUG_VMSCAPE);
-		vmscape_mitigation = VMSCAPE_MITIGATION_AUTO;
+		vmscape_mitigation = VMSCAPE_MITIGATION_ON;
 	} else if (!strcmp(str, "auto")) {
 		vmscape_mitigation = VMSCAPE_MITIGATION_AUTO;
 	} else {
@@ -3116,6 +3118,7 @@ static void __init vmscape_select_mitigation(void)
 		break;
 
 	case VMSCAPE_MITIGATION_AUTO:
+	case VMSCAPE_MITIGATION_ON:
 		/*
 		 * CPUs with BHI_CTRL(ADL and newer) can avoid the IBPB and use
 		 * BHB clear sequence. These CPUs are only vulnerable to the BHI
@@ -3249,6 +3252,7 @@ void cpu_bugs_smt_update(void)
 	switch (vmscape_mitigation) {
 	case VMSCAPE_MITIGATION_NONE:
 	case VMSCAPE_MITIGATION_AUTO:
+	case VMSCAPE_MITIGATION_ON:
 		break;
 	case VMSCAPE_MITIGATION_IBPB_ON_VMEXIT:
 	case VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER:

-- 
2.34.1



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v9 10/10] x86/vmscape: Add cmdline vmscape=on to override attack vector controls
  2026-04-03  0:30 [PATCH v9 00/10] VMSCAPE optimization for BHI variant Pawan Gupta
                   ` (8 preceding siblings ...)
  2026-04-03  0:32 ` [PATCH v9 09/10] x86/vmscape: Resolve conflict between attack-vectors and vmscape=force Pawan Gupta
@ 2026-04-03  0:33 ` Pawan Gupta
  2026-04-04 15:20 ` [PATCH v9 00/10] VMSCAPE optimization for BHI variant David Laight
  10 siblings, 0 replies; 32+ messages in thread
From: Pawan Gupta @ 2026-04-03  0:33 UTC (permalink / raw)
  To: x86, Jon Kohler, Nikolay Borisov, H. Peter Anvin, Josh Poimboeuf,
	David Kaplan, Sean Christopherson, Borislav Petkov, Dave Hansen,
	Peter Zijlstra, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, KP Singh, Jiri Olsa, David S. Miller,
	David Laight, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	David Ahern, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, John Fastabend, Stanislav Fomichev, Hao Luo,
	Paolo Bonzini, Jonathan Corbet
  Cc: linux-kernel, kvm, Asit Mallick, Tao Zhang, bpf, netdev,
	linux-doc

In general, individual mitigation knobs override the attack-vector
controls. For VMSCAPE, =ibpb exists, but there is no knob to select the BHB
clearing mitigation. The =force option would select BHB clearing when
supported, but with the side effect of also forcing the bug, thereby
deploying the mitigation on unaffected parts too.

Add a new cmdline option vmscape=on to enable the mitigation based on the
VMSCAPE variant the CPU is affected by.

Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Tested-by: Jon Kohler <jon@nutanix.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
 Documentation/admin-guide/hw-vuln/vmscape.rst   | 4 ++++
 Documentation/admin-guide/kernel-parameters.txt | 2 ++
 arch/x86/kernel/cpu/bugs.c                      | 2 ++
 3 files changed, 8 insertions(+)

diff --git a/Documentation/admin-guide/hw-vuln/vmscape.rst b/Documentation/admin-guide/hw-vuln/vmscape.rst
index 7c40cf70ad7a..2558a5c3d956 100644
--- a/Documentation/admin-guide/hw-vuln/vmscape.rst
+++ b/Documentation/admin-guide/hw-vuln/vmscape.rst
@@ -117,3 +117,7 @@ The mitigation can be controlled via the ``vmscape=`` command line parameter:
 
    Choose the mitigation based on the VMSCAPE variant the CPU is affected by.
    (default when CONFIG_MITIGATION_VMSCAPE=y)
+
+ * ``vmscape=on``:
+
+   Same as ``auto``, except that it overrides attack vector controls.
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 3853c7109419..98204d464477 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -8383,6 +8383,8 @@ Kernel parameters
 					  unaffected processors
 			auto		- (default) use IBPB or BHB clear
 					  mitigation based on CPU
+			on		- same as "auto", but override attack
+					  vector control
 
 	vsyscall=	[X86-64,EARLY]
 			Controls the behavior of vsyscalls (i.e. calls to
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index ba8389df467a..366ebe1e1fb9 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -3087,6 +3087,8 @@ static int __init vmscape_parse_cmdline(char *str)
 	} else if (!strcmp(str, "force")) {
 		setup_force_cpu_bug(X86_BUG_VMSCAPE);
 		vmscape_mitigation = VMSCAPE_MITIGATION_ON;
+	} else if (!strcmp(str, "on")) {
+		vmscape_mitigation = VMSCAPE_MITIGATION_ON;
 	} else if (!strcmp(str, "auto")) {
 		vmscape_mitigation = VMSCAPE_MITIGATION_AUTO;
 	} else {

-- 
2.34.1



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v9 07/10] x86/vmscape: Use static_call() for predictor flush
  2026-04-03  0:32 ` [PATCH v9 07/10] x86/vmscape: Use static_call() for predictor flush Pawan Gupta
@ 2026-04-03 14:52   ` Sean Christopherson
  2026-04-03 16:44     ` Pawan Gupta
  0 siblings, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2026-04-03 14:52 UTC (permalink / raw)
  To: Pawan Gupta
  Cc: x86, Jon Kohler, Nikolay Borisov, H. Peter Anvin, Josh Poimboeuf,
	David Kaplan, Borislav Petkov, Dave Hansen, Peter Zijlstra,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, KP Singh,
	Jiri Olsa, David S. Miller, David Laight, Andy Lutomirski,
	Thomas Gleixner, Ingo Molnar, David Ahern, Martin KaFai Lau,
	Eduard Zingerman, Song Liu, Yonghong Song, John Fastabend,
	Stanislav Fomichev, Hao Luo, Paolo Bonzini, Jonathan Corbet,
	linux-kernel, kvm, Asit Mallick, Tao Zhang, bpf, netdev,
	linux-doc

On Thu, Apr 02, 2026, Pawan Gupta wrote:
> Adding more mitigation options at exit-to-userspace for VMSCAPE would
> usually require a series of checks to decide which mitigation to use. In
> this case, the mitigation is done by calling a function, which is decided
> at boot. So, adding more feature flags and multiple checks can be avoided
> by using static_call() to the mitigating function.
> 
> Replace the flag-based mitigation selector with a static_call(). This also
> frees the existing X86_FEATURE_IBPB_EXIT_TO_USER.

...

> @@ -3133,8 +3139,14 @@ static void __init vmscape_update_mitigation(void)
>  static void __init vmscape_apply_mitigation(void)
>  {
>  	if (vmscape_mitigation == VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER)
> -		setup_force_cpu_cap(X86_FEATURE_IBPB_EXIT_TO_USER);
> +		static_call_update(vmscape_predictor_flush, write_ibpb);
> +}
> +
> +bool vmscape_mitigation_enabled(void)
> +{
> +	return !!static_call_query(vmscape_predictor_flush);
>  }
> +EXPORT_SYMBOL_FOR_KVM(vmscape_mitigation_enabled);
>  
>  #undef pr_fmt
>  #define pr_fmt(fmt) fmt
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 45d7cfedc507..e204482e64f3 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -11463,7 +11463,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  	 * set for the CPU that actually ran the guest, and not the CPU that it
>  	 * may migrate to.
>  	 */
> -	if (cpu_feature_enabled(X86_FEATURE_IBPB_EXIT_TO_USER))
> +	if (vmscape_mitigation_enabled())

This is pretty lame.  It turns a statically patched MOV

  11548		if (cpu_feature_enabled(X86_FEATURE_IBPB_EXIT_TO_USER))
  11549			this_cpu_write(x86_ibpb_exit_to_user, true);
     0x000000000003c57a <+858>:	movb   $0x1,%gs:0x0(%rip)        # 0x3c582 <vcpu_enter_guest+866>

into a function call and two sets of conditional branches.  And with mitigations
enabled, that function call may trigger the wonderful unret insanity

  11548		if (vmscape_mitigation_enabled())
     0x000000000003c575 <+853>:	call   0x3c57a <vcpu_enter_guest+858>
     0x000000000003c57a <+858>:	test   %al,%al
     0x000000000003c57c <+860>:	je     0x3c586 <vcpu_enter_guest+870>

  11549			this_cpu_write(x86_predictor_flush_exit_to_user, true);
     0x000000000003c57e <+862>:	movb   $0x1,%gs:0x0(%rip)        # 0x3c586 <vcpu_enter_guest+870>


  3166	{
     0xffffffff81285320 <+0>:	endbr64
     0xffffffff81285324 <+4>:	call   0xffffffff812aa5a0 <__fentry__>

  3167		return !!static_call_query(vmscape_predictor_flush);
     0xffffffff81285329 <+9>:	mov    0x13a4f30(%rip),%rax        # 0xffffffff8262a260 <__SCK__vmscape_predictor_flush>
     0xffffffff81285330 <+16>:	test   %rax,%rax
     0xffffffff81285333 <+19>:	setne  %al

  3168	}
     0xffffffff81285336 <+22>:	jmp    0xffffffff81db1e30 <__x86_return_thunk>

While this isn't KVM's super hot inner run loop, it's still very much a hot path.
Even more annoying, KVM will eat the function call on kernels with CPU_MITIGATIONS=n.

I'd like to at least do something like the below to make the common case of
multiple guest entry/exits more or less free, and to avoid the CALL+(UN)RET
overhead, but trying to include linux/static_call.h in processor.h (or any other
core x86 header) creates a cyclical dependency :-/

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 20ab4dd588c6..0dc0680a80f8 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -36,6 +36,7 @@ struct vm86;
 #include <linux/err.h>
 #include <linux/irqflags.h>
 #include <linux/mem_encrypt.h>
+#include <linux/static_call.h>
 
 /*
  * We handle most unaligned accesses in hardware.  On the other hand
@@ -753,7 +754,11 @@ enum mds_mitigations {
 };
 
 extern bool gds_ucode_mitigated(void);
-extern bool vmscape_mitigation_enabled(void);
+
+static inline bool vmscape_mitigation_enabled(void)
+{
+       return !!static_call_query(vmscape_predictor_flush);
+}
 
 /*
  * Make previous memory operations globally visible before
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 366ebe1e1fb9..02bf626f0773 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -148,6 +148,7 @@ DEFINE_STATIC_KEY_FALSE(switch_mm_cond_l1d_flush);
  * sequence. This defaults to no mitigation.
  */
 DEFINE_STATIC_CALL_NULL(vmscape_predictor_flush, write_ibpb);
+EXPORT_STATIC_CALL_GPL(vmscape_predictor_flush);
 
 #undef pr_fmt
 #define pr_fmt(fmt)    "mitigations: " fmt
@@ -3162,12 +3163,6 @@ static void __init vmscape_apply_mitigation(void)
                static_call_update(vmscape_predictor_flush, clear_bhb_loop_nofence);
 }
 
-bool vmscape_mitigation_enabled(void)
-{
-       return !!static_call_query(vmscape_predictor_flush);
-}
-EXPORT_SYMBOL_FOR_KVM(vmscape_mitigation_enabled);
-
 #undef pr_fmt
 #define pr_fmt(fmt) fmt
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a1fbbab08291..117c60d00758 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11545,7 +11545,9 @@ static noinline int vcpu_enter_guest(struct kvm_vcpu *vcpu)
         * set for the CPU that actually ran the guest, and not the CPU that it
         * may migrate to.
         */
-       if (vmscape_mitigation_enabled())
+       if (IS_ENABLED(CONFIG_CPU_MITIGATIONS) &&
+           !this_cpu_read(x86_predictor_flush_exit_to_user) &&
+           vmscape_mitigation_enabled())
                this_cpu_write(x86_predictor_flush_exit_to_user, true);
 
        /*

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v9 01/10] x86/bhi: x86/vmscape: Move LFENCE out of clear_bhb_loop()
  2026-04-03  0:30 ` [PATCH v9 01/10] x86/bhi: x86/vmscape: Move LFENCE out of clear_bhb_loop() Pawan Gupta
@ 2026-04-03 15:16   ` Borislav Petkov
  2026-04-03 16:45     ` Pawan Gupta
  0 siblings, 1 reply; 32+ messages in thread
From: Borislav Petkov @ 2026-04-03 15:16 UTC (permalink / raw)
  To: Pawan Gupta
  Cc: x86, Jon Kohler, Nikolay Borisov, H. Peter Anvin, Josh Poimboeuf,
	David Kaplan, Sean Christopherson, Dave Hansen, Peter Zijlstra,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, KP Singh,
	Jiri Olsa, David S. Miller, David Laight, Andy Lutomirski,
	Thomas Gleixner, Ingo Molnar, David Ahern, Martin KaFai Lau,
	Eduard Zingerman, Song Liu, Yonghong Song, John Fastabend,
	Stanislav Fomichev, Hao Luo, Paolo Bonzini, Jonathan Corbet,
	linux-kernel, kvm, Asit Mallick, Tao Zhang, bpf, netdev,
	linux-doc

On Thu, Apr 02, 2026 at 05:30:47PM -0700, Pawan Gupta wrote:
> Currently, the BHB clearing sequence is followed by an LFENCE to prevent
> transient execution of subsequent indirect branches prematurely. However,
> the LFENCE barrier could be unnecessary in certain cases. For example, when
> the kernel is using the BHI_DIS_S mitigation, and BHB clearing is only
> needed for userspace. In such cases, the LFENCE is redundant because ring
> transitions would provide the necessary serialization.
> 
> Below is a quick recap of BHI mitigation options:
> 
> On Alder Lake and newer
> 
>     BHI_DIS_S: Hardware control to mitigate BHI in ring0. This has low
>     performance overhead.
> 
>     Long loop: Alternatively, a longer version of the BHB clearing sequence
>     can be used to mitigate BHI. It can also be used to mitigate the BHI
>     variant of VMSCAPE. This is not yet implemented in Linux.
> 
> On older CPUs
> 
>     Short loop: Clears BHB at kernel entry and VMexit. The "Long loop" is
>     effective on older CPUs as well, but should be avoided because of
>     unnecessary overhead.
> 
> On Alder Lake and newer CPUs, eIBRS isolates the indirect targets between
> guest and host. But when affected by the BHI variant of VMSCAPE, a guest's
> branch history may still influence indirect branches in userspace. This
> also means the big hammer IBPB could be replaced with a cheaper option that
> clears the BHB at exit-to-userspace after a VMexit.
> 
> In preparation for adding support for the BHB sequence (without LFENCE)
> on newer CPUs, move the LFENCE to the caller side, to be executed after
> clear_bhb_loop() returns, and let callers decide whether they need the
> LFENCE. This adds a few extra bytes to the call sites, but it obviates
> the need for multiple variants of clear_bhb_loop().
> 
> Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
> Tested-by: Jon Kohler <jon@nutanix.com>
> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> ---
>  arch/x86/entry/entry_64.S            | 5 ++++-
>  arch/x86/include/asm/nospec-branch.h | 4 ++--
>  arch/x86/net/bpf_jit_comp.c          | 2 ++
>  3 files changed, 8 insertions(+), 3 deletions(-)

Acked-by: Borislav Petkov (AMD) <bp@alien8.de>

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 32+ messages in thread
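[Editor's note: the caller-side LFENCE split described in the patch above can be modeled in plain C. Everything below (function names, the string log) is invented for illustration; the real sequence is x86 assembly in entry_64.S and the barrier is an actual LFENCE instruction.]

```c
#include <assert.h>
#include <string.h>

/* Tiny model: each "operation" appends its name to a log so the two
 * call paths can be compared. */
static char log_buf[64];

static void op(const char *name)
{
	strcat(log_buf, name);
	strcat(log_buf, ";");
}

/* The shared sequence no longer ends with the barrier. */
static void clear_bhb_model(void)
{
	op("clear_bhb");
}

/* A caller that still needs serialization (e.g. kernel entry on CPUs
 * relying on the loop for BHI) adds the barrier itself. */
static const char *entry_path(void)
{
	log_buf[0] = '\0';
	clear_bhb_model();
	op("lfence");
	return log_buf;
}

/* A caller where a later ring transition already serializes (e.g. a
 * BHI_DIS_S kernel clearing BHB only for userspace) skips the
 * redundant barrier. */
static const char *exit_to_user_path(void)
{
	log_buf[0] = '\0';
	clear_bhb_model();
	return log_buf;
}
```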

* Re: [PATCH v9 07/10] x86/vmscape: Use static_call() for predictor flush
  2026-04-03 14:52   ` Sean Christopherson
@ 2026-04-03 16:44     ` Pawan Gupta
  2026-04-03 17:26       ` Pawan Gupta
  0 siblings, 1 reply; 32+ messages in thread
From: Pawan Gupta @ 2026-04-03 16:44 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: x86, Jon Kohler, Nikolay Borisov, H. Peter Anvin, Josh Poimboeuf,
	David Kaplan, Borislav Petkov, Dave Hansen, Peter Zijlstra,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, KP Singh,
	Jiri Olsa, David S. Miller, David Laight, Andy Lutomirski,
	Thomas Gleixner, Ingo Molnar, David Ahern, Martin KaFai Lau,
	Eduard Zingerman, Song Liu, Yonghong Song, John Fastabend,
	Stanislav Fomichev, Hao Luo, Paolo Bonzini, Jonathan Corbet,
	linux-kernel, kvm, Asit Mallick, Tao Zhang, bpf, netdev,
	linux-doc

On Fri, Apr 03, 2026 at 07:52:23AM -0700, Sean Christopherson wrote:
> On Thu, Apr 02, 2026, Pawan Gupta wrote:
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -11463,7 +11463,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> >  	 * set for the CPU that actually ran the guest, and not the CPU that it
> >  	 * may migrate to.
> >  	 */
> > -	if (cpu_feature_enabled(X86_FEATURE_IBPB_EXIT_TO_USER))
> > +	if (vmscape_mitigation_enabled())
> 
> This is pretty lame.  It turns a statically patched MOV

Yes it is, this was done ...

>   11548		if (cpu_feature_enabled(X86_FEATURE_IBPB_EXIT_TO_USER))
>   11549			this_cpu_write(x86_ibpb_exit_to_user, true);
>      0x000000000003c57a <+858>:	movb   $0x1,%gs:0x0(%rip)        # 0x3c582 <vcpu_enter_guest+866>
> 
> into a function call and two sets of conditional branches.  And with mitigations
> enabled, that function call may trigger the wonderful unret insanity
> 
>   11548		if (vmscape_mitigation_enabled())
>      0x000000000003c575 <+853>:	call   0x3c57a <vcpu_enter_guest+858>
>      0x000000000003c57a <+858>:	test   %al,%al
>      0x000000000003c57c <+860>:	je     0x3c586 <vcpu_enter_guest+870>
> 
>   11549			this_cpu_write(x86_predictor_flush_exit_to_user, true);
>      0x000000000003c57e <+862>:	movb   $0x1,%gs:0x0(%rip)        # 0x3c586 <vcpu_enter_guest+870>
> 
> 
>   3166	{
>      0xffffffff81285320 <+0>:	endbr64
>      0xffffffff81285324 <+4>:	call   0xffffffff812aa5a0 <__fentry__>
> 
>   3167		return !!static_call_query(vmscape_predictor_flush);
>      0xffffffff81285329 <+9>:	mov    0x13a4f30(%rip),%rax        # 0xffffffff8262a260 <__SCK__vmscape_predictor_flush>
>      0xffffffff81285330 <+16>:	test   %rax,%rax
>      0xffffffff81285333 <+19>:	setne  %al
> 
>   3168	}
>      0xffffffff81285336 <+22>:	jmp    0xffffffff81db1e30 <__x86_return_thunk>
> 
> While this isn't KVM's super hot inner run loop, it's still very much a hot path.
> Even more annoying, KVM will eat the function call on kernels with CPU_MITIGATIONS=n.
> 
> I'd like to at least do something like the below to make the common case of
> multiple guest entry/exits more or less free, and to avoid the CALL+(UN)RET
> overhead, but trying to include linux/static_call.h in processor.h (or any other
> core x86 header) creates a cyclical dependency :-/
> 
> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> index 20ab4dd588c6..0dc0680a80f8 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -36,6 +36,7 @@ struct vm86;
>  #include <linux/err.h>
>  #include <linux/irqflags.h>
>  #include <linux/mem_encrypt.h>
> +#include <linux/static_call.h>
>  
>  /*
>   * We handle most unaligned accesses in hardware.  On the other hand
> @@ -753,7 +754,11 @@ enum mds_mitigations {
>  };
>  
>  extern bool gds_ucode_mitigated(void);
> -extern bool vmscape_mitigation_enabled(void);
> +
> +static inline bool vmscape_mitigation_enabled(void)
> +{
> +       return !!static_call_query(vmscape_predictor_flush);
> +}
>  
>  /*
>   * Make previous memory operations globally visible before
> diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
> index 366ebe1e1fb9..02bf626f0773 100644
> --- a/arch/x86/kernel/cpu/bugs.c
> +++ b/arch/x86/kernel/cpu/bugs.c
> @@ -148,6 +148,7 @@ DEFINE_STATIC_KEY_FALSE(switch_mm_cond_l1d_flush);
>   * sequence. This defaults to no mitigation.
>   */
>  DEFINE_STATIC_CALL_NULL(vmscape_predictor_flush, write_ibpb);
> +EXPORT_STATIC_CALL_GPL(vmscape_predictor_flush);

... to avoid exporting the static key, so that modules (other than KVM)
cannot do static_call_update(vmscape_predictor_flush).

Peter suggested changes that allowed adding EXPORT_STATIC_CALL_FOR_KVM():

  https://lore.kernel.org/all/20260319214409.GL3738786@noisy.programming.kicks-ass.net/

EXPORT_STATIC_CALL_FOR_KVM() seems to be a cleaner approach to me.

Boris, I know you didn't like exporting the static_key. But, as Sean said,
this is a hot path, and avoiding the unnecessary call would benefit all
CPUs (affected or unaffected). Moreover, EXPORT_STATIC_CALL_FOR_KVM()
somewhat addresses your concern about exporting the static_key to the
world. Would you be okay with it?

^ permalink raw reply	[flat|nested] 32+ messages in thread
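[Editor's note: Sean's proposed vmscape_mitigation_enabled() helper reduces to "is the static call target non-NULL?". A rough userspace approximation with an ordinary function pointer is sketched below; all names are invented, and the kernel's static_call machinery patches the call site rather than dereferencing a pointer at runtime.]

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a write_ibpb()-style predictor flush routine. */
static void dummy_flush(void)
{
}

/* Stand-in for the static call key: NULL means "no mitigation
 * selected", mirroring DEFINE_STATIC_CALL_NULL(). */
static void (*predictor_flush)(void) = NULL;

/* Analogous to: return !!static_call_query(vmscape_predictor_flush); */
static int mitigation_enabled(void)
{
	return predictor_flush != NULL;
}

/* Analogous to a boot-time static_call_update(). */
static void select_mitigation(void)
{
	predictor_flush = dummy_flush;
}
```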

* Re: [PATCH v9 01/10] x86/bhi: x86/vmscape: Move LFENCE out of clear_bhb_loop()
  2026-04-03 15:16   ` Borislav Petkov
@ 2026-04-03 16:45     ` Pawan Gupta
  2026-04-03 17:11       ` Borislav Petkov
  0 siblings, 1 reply; 32+ messages in thread
From: Pawan Gupta @ 2026-04-03 16:45 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: x86, Jon Kohler, Nikolay Borisov, H. Peter Anvin, Josh Poimboeuf,
	David Kaplan, Sean Christopherson, Dave Hansen, Peter Zijlstra,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, KP Singh,
	Jiri Olsa, David S. Miller, David Laight, Andy Lutomirski,
	Thomas Gleixner, Ingo Molnar, David Ahern, Martin KaFai Lau,
	Eduard Zingerman, Song Liu, Yonghong Song, John Fastabend,
	Stanislav Fomichev, Hao Luo, Paolo Bonzini, Jonathan Corbet,
	linux-kernel, kvm, Asit Mallick, Tao Zhang, bpf, netdev,
	linux-doc

On Fri, Apr 03, 2026 at 05:16:30PM +0200, Borislav Petkov wrote:
> On Thu, Apr 02, 2026 at 05:30:47PM -0700, Pawan Gupta wrote:
> > Currently, the BHB clearing sequence is followed by an LFENCE to prevent
> > subsequent indirect branches from being executed transiently before the
> > BHB is fully cleared. However, the LFENCE barrier could be unnecessary in
> > certain cases, for example when the kernel is using the BHI_DIS_S
> > mitigation and BHB clearing is only needed for userspace. In such cases,
> > the LFENCE is redundant because ring transitions would provide the
> > necessary serialization.
> > 
> > Below is a quick recap of BHI mitigation options:
> > 
> > On Alder Lake and newer
> > 
> >     BHI_DIS_S: Hardware control to mitigate BHI in ring0. This has low
> >     performance overhead.
> > 
> >     Long loop: Alternatively, a longer version of the BHB clearing sequence
> >     can be used to mitigate BHI. It can also be used to mitigate the BHI
> >     variant of VMSCAPE. This is not yet implemented in Linux.
> > 
> > On older CPUs
> > 
> >     Short loop: Clears BHB at kernel entry and VMexit. The "Long loop" is
> >     effective on older CPUs as well, but should be avoided because of
> >     unnecessary overhead.
> > 
> > On Alder Lake and newer CPUs, eIBRS isolates the indirect targets between
> > guest and host. But when affected by the BHI variant of VMSCAPE, a guest's
> > branch history may still influence indirect branches in userspace. This
> > also means the big hammer IBPB could be replaced with a cheaper option that
> > clears the BHB at exit-to-userspace after a VMexit.
> > 
> > In preparation for adding support for the BHB sequence (without LFENCE)
> > on newer CPUs, move the LFENCE to the caller side, to be executed after
> > clear_bhb_loop() returns, and let callers decide whether they need the
> > LFENCE. This adds a few extra bytes to the call sites, but it obviates
> > the need for multiple variants of clear_bhb_loop().
> > 
> > Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
> > Tested-by: Jon Kohler <jon@nutanix.com>
> > Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
> > Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> > ---
> >  arch/x86/entry/entry_64.S            | 5 ++++-
> >  arch/x86/include/asm/nospec-branch.h | 4 ++--
> >  arch/x86/net/bpf_jit_comp.c          | 2 ++
> >  3 files changed, 8 insertions(+), 3 deletions(-)
> 
> Acked-by: Borislav Petkov (AMD) <bp@alien8.de>

Thanks.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v9 01/10] x86/bhi: x86/vmscape: Move LFENCE out of clear_bhb_loop()
  2026-04-03 16:45     ` Pawan Gupta
@ 2026-04-03 17:11       ` Borislav Petkov
  0 siblings, 0 replies; 32+ messages in thread
From: Borislav Petkov @ 2026-04-03 17:11 UTC (permalink / raw)
  To: Pawan Gupta
  Cc: x86, Jon Kohler, Nikolay Borisov, H. Peter Anvin, Josh Poimboeuf,
	David Kaplan, Sean Christopherson, Dave Hansen, Peter Zijlstra,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, KP Singh,
	Jiri Olsa, David S. Miller, David Laight, Andy Lutomirski,
	Thomas Gleixner, Ingo Molnar, David Ahern, Martin KaFai Lau,
	Eduard Zingerman, Song Liu, Yonghong Song, John Fastabend,
	Stanislav Fomichev, Hao Luo, Paolo Bonzini, Jonathan Corbet,
	linux-kernel, kvm, Asit Mallick, Tao Zhang, bpf, netdev,
	linux-doc

On Fri, Apr 03, 2026 at 09:45:52AM -0700, Pawan Gupta wrote:
> Thanks.

You don't have to say "thanks" to every review - we're one big family.

:-)

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v9 07/10] x86/vmscape: Use static_call() for predictor flush
  2026-04-03 16:44     ` Pawan Gupta
@ 2026-04-03 17:26       ` Pawan Gupta
  0 siblings, 0 replies; 32+ messages in thread
From: Pawan Gupta @ 2026-04-03 17:26 UTC (permalink / raw)
  To: Sean Christopherson, Borislav Petkov
  Cc: x86, Jon Kohler, Nikolay Borisov, H. Peter Anvin, Josh Poimboeuf,
	David Kaplan, Dave Hansen, Peter Zijlstra, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, KP Singh, Jiri Olsa,
	David S. Miller, David Laight, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, David Ahern, Martin KaFai Lau, Eduard Zingerman,
	Song Liu, Yonghong Song, John Fastabend, Stanislav Fomichev,
	Hao Luo, Paolo Bonzini, Jonathan Corbet, linux-kernel, kvm,
	Asit Mallick, Tao Zhang, bpf, netdev, linux-doc

On Fri, Apr 03, 2026 at 09:44:32AM -0700, Pawan Gupta wrote:
> On Fri, Apr 03, 2026 at 07:52:23AM -0700, Sean Christopherson wrote:
> > On Thu, Apr 02, 2026, Pawan Gupta wrote:
> > > --- a/arch/x86/kvm/x86.c
> > > +++ b/arch/x86/kvm/x86.c
> > > @@ -11463,7 +11463,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> > >  	 * set for the CPU that actually ran the guest, and not the CPU that it
> > >  	 * may migrate to.
> > >  	 */
> > > -	if (cpu_feature_enabled(X86_FEATURE_IBPB_EXIT_TO_USER))
> > > +	if (vmscape_mitigation_enabled())
> > 
> > This is pretty lame.  It turns a statically patched MOV
> 
> Yes it is, this was done ...
> 
> >   11548		if (cpu_feature_enabled(X86_FEATURE_IBPB_EXIT_TO_USER))
> >   11549			this_cpu_write(x86_ibpb_exit_to_user, true);
> >      0x000000000003c57a <+858>:	movb   $0x1,%gs:0x0(%rip)        # 0x3c582 <vcpu_enter_guest+866>
> > 
> > into a function call and two sets of conditional branches.  And with mitigations
> > enabled, that function call may trigger the wonderful unret insanity
> > 
> >   11548		if (vmscape_mitigation_enabled())
> >      0x000000000003c575 <+853>:	call   0x3c57a <vcpu_enter_guest+858>
> >      0x000000000003c57a <+858>:	test   %al,%al
> >      0x000000000003c57c <+860>:	je     0x3c586 <vcpu_enter_guest+870>
> > 
> >   11549			this_cpu_write(x86_predictor_flush_exit_to_user, true);
> >      0x000000000003c57e <+862>:	movb   $0x1,%gs:0x0(%rip)        # 0x3c586 <vcpu_enter_guest+870>
> > 
> > 
> >   3166	{
> >      0xffffffff81285320 <+0>:	endbr64
> >      0xffffffff81285324 <+4>:	call   0xffffffff812aa5a0 <__fentry__>
> > 
> >   3167		return !!static_call_query(vmscape_predictor_flush);
> >      0xffffffff81285329 <+9>:	mov    0x13a4f30(%rip),%rax        # 0xffffffff8262a260 <__SCK__vmscape_predictor_flush>
> >      0xffffffff81285330 <+16>:	test   %rax,%rax
> >      0xffffffff81285333 <+19>:	setne  %al
> > 
> >   3168	}
> >      0xffffffff81285336 <+22>:	jmp    0xffffffff81db1e30 <__x86_return_thunk>
> > 
> > While this isn't KVM's super hot inner run loop, it's still very much a hot path.
> > Even more annoying, KVM will eat the function call on kernels with CPU_MITIGATIONS=n.
> > 
> > I'd like to at least do something like the below to make the common case of
> > multiple guest entry/exits more or less free, and to avoid the CALL+(UN)RET
> > overhead, but trying to include linux/static_call.h in processor.h (or any other
> > core x86 header) creates a cyclical dependency :-/
> > 
> > diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> > index 20ab4dd588c6..0dc0680a80f8 100644
> > --- a/arch/x86/include/asm/processor.h
> > +++ b/arch/x86/include/asm/processor.h
> > @@ -36,6 +36,7 @@ struct vm86;
> >  #include <linux/err.h>
> >  #include <linux/irqflags.h>
> >  #include <linux/mem_encrypt.h>
> > +#include <linux/static_call.h>
> >  
> >  /*
> >   * We handle most unaligned accesses in hardware.  On the other hand
> > @@ -753,7 +754,11 @@ enum mds_mitigations {
> >  };
> >  
> >  extern bool gds_ucode_mitigated(void);
> > -extern bool vmscape_mitigation_enabled(void);
> > +
> > +static inline bool vmscape_mitigation_enabled(void)
> > +{
> > +       return !!static_call_query(vmscape_predictor_flush);
> > +}
> >  
> >  /*
> >   * Make previous memory operations globally visible before
> > diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
> > index 366ebe1e1fb9..02bf626f0773 100644
> > --- a/arch/x86/kernel/cpu/bugs.c
> > +++ b/arch/x86/kernel/cpu/bugs.c
> > @@ -148,6 +148,7 @@ DEFINE_STATIC_KEY_FALSE(switch_mm_cond_l1d_flush);
> >   * sequence. This defaults to no mitigation.
> >   */
> >  DEFINE_STATIC_CALL_NULL(vmscape_predictor_flush, write_ibpb);
> > +EXPORT_STATIC_CALL_GPL(vmscape_predictor_flush);
> 
> ... to avoid exporting the static key, so that modules (other than KVM)
> cannot do static_call_update(vmscape_predictor_flush).
> 
> Peter suggested changes that allowed adding EXPORT_STATIC_CALL_FOR_KVM():
> 
>   https://lore.kernel.org/all/20260319214409.GL3738786@noisy.programming.kicks-ass.net/

Sorry, this is the correct link for EXPORT_STATIC_CALL_FOR_KVM():

  https://lore.kernel.org/all/20260320062206.bdrnmnvho6lhmejw@desk/

> EXPORT_STATIC_CALL_FOR_KVM() seems to be a cleaner approach to me.
> 
> Boris, I know you didn't like exporting the static_key. But, as Sean said,
> this is a hot path, and avoiding the unnecessary call would benefit all
> CPUs (affected or unaffected). Moreover, EXPORT_STATIC_CALL_FOR_KVM()
> somewhat addresses your concern about exporting the static_key to the
> world. Would you be okay with it?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v9 02/10] x86/bhi: Make clear_bhb_loop() effective on newer CPUs
  2026-04-03  0:31 ` [PATCH v9 02/10] x86/bhi: Make clear_bhb_loop() effective on newer CPUs Pawan Gupta
@ 2026-04-03 18:10   ` Jim Mattson
  2026-04-03 18:52     ` Pawan Gupta
  0 siblings, 1 reply; 32+ messages in thread
From: Jim Mattson @ 2026-04-03 18:10 UTC (permalink / raw)
  To: Pawan Gupta
  Cc: x86, Jon Kohler, Nikolay Borisov, H. Peter Anvin, Josh Poimboeuf,
	David Kaplan, Sean Christopherson, Borislav Petkov, Dave Hansen,
	Peter Zijlstra, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, KP Singh, Jiri Olsa, David S. Miller,
	David Laight, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	David Ahern, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, John Fastabend, Stanislav Fomichev, Hao Luo,
	Paolo Bonzini, Jonathan Corbet, linux-kernel, kvm, Asit Mallick,
	Tao Zhang, bpf, netdev, linux-doc

On Thu, Apr 2, 2026 at 5:32 PM Pawan Gupta
<pawan.kumar.gupta@linux.intel.com> wrote:
>
> As a mitigation for BHI, clear_bhb_loop() executes branches that overwrite
> the Branch History Buffer (BHB). On Alder Lake and newer parts this
> sequence is not sufficient because it doesn't clear enough entries. This
> was not an issue because these CPUs use the BHI_DIS_S hardware mitigation
> in the kernel.
>
> Now with VMSCAPE (BHI variant) it is also required to isolate branch
> history between guests and userspace. Since BHI_DIS_S only protects the
> kernel, the newer CPUs also use IBPB.
>
> A cheaper alternative to the current IBPB mitigation is clear_bhb_loop().
> But it currently does not clear enough BHB entries to be effective on newer
> CPUs with larger BHB. At boot, dynamically set the loop count of
> clear_bhb_loop() such that it is effective on newer CPUs too. Use the
> X86_FEATURE_BHI_CTRL feature flag to select the appropriate loop count.
>
> Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> ---
>  arch/x86/entry/entry_64.S            |  8 +++++---
>  arch/x86/include/asm/nospec-branch.h |  2 ++
>  arch/x86/kernel/cpu/bugs.c           | 13 +++++++++++++
>  3 files changed, 20 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index 3a180a36ca0e..bbd4b1c7ec04 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -1536,7 +1536,9 @@ SYM_FUNC_START(clear_bhb_loop)
>         ANNOTATE_NOENDBR
>         push    %rbp
>         mov     %rsp, %rbp
> -       movl    $5, %ecx
> +
> +       movzbl    bhb_seq_outer_loop(%rip), %ecx
> +
>         ANNOTATE_INTRA_FUNCTION_CALL
>         call    1f
>         jmp     5f
> @@ -1556,8 +1558,8 @@ SYM_FUNC_START(clear_bhb_loop)
>          * This should be ideally be: .skip 32 - (.Lret2 - 2f), 0xcc
>          * but some Clang versions (e.g. 18) don't like this.
>          */
> -       .skip 32 - 18, 0xcc
> -2:     movl    $5, %eax
> +       .skip 32 - 20, 0xcc
> +2:     movzbl  bhb_seq_inner_loop(%rip), %eax
>  3:     jmp     4f
>         nop
>  4:     sub     $1, %eax
> diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
> index 70b377fcbc1c..87b83ae7c97f 100644
> --- a/arch/x86/include/asm/nospec-branch.h
> +++ b/arch/x86/include/asm/nospec-branch.h
> @@ -548,6 +548,8 @@ DECLARE_PER_CPU(u64, x86_spec_ctrl_current);
>  extern void update_spec_ctrl_cond(u64 val);
>  extern u64 spec_ctrl_current(void);
>
> +extern u8 bhb_seq_inner_loop, bhb_seq_outer_loop;
> +
>  /*
>   * With retpoline, we must use IBRS to restrict branch prediction
>   * before calling into firmware.
> diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
> index 83f51cab0b1e..2cb4a96247d8 100644
> --- a/arch/x86/kernel/cpu/bugs.c
> +++ b/arch/x86/kernel/cpu/bugs.c
> @@ -2047,6 +2047,10 @@ enum bhi_mitigations {
>  static enum bhi_mitigations bhi_mitigation __ro_after_init =
>         IS_ENABLED(CONFIG_MITIGATION_SPECTRE_BHI) ? BHI_MITIGATION_AUTO : BHI_MITIGATION_OFF;
>
> +/* Default to short BHB sequence values */
> +u8 bhb_seq_outer_loop __ro_after_init = 5;
> +u8 bhb_seq_inner_loop __ro_after_init = 5;
> +
>  static int __init spectre_bhi_parse_cmdline(char *str)
>  {
>         if (!str)
> @@ -3242,6 +3246,15 @@ void __init cpu_select_mitigations(void)
>                 x86_spec_ctrl_base &= ~SPEC_CTRL_MITIGATIONS_MASK;
>         }
>
> +       /*
> +        * Switch to long BHB clear sequence on newer CPUs (with BHI_CTRL
> +        * support), see Intel's BHI guidance.
> +        */
> +       if (cpu_feature_enabled(X86_FEATURE_BHI_CTRL)) {
> +               bhb_seq_outer_loop = 12;
> +               bhb_seq_inner_loop = 7;
> +       }
> +

How does this work for VMs in a heterogeneous migration pool that
spans the Alder Lake boundary? They can't advertise BHI_CTRL, because
it isn't available on all hosts in the migration pool, but they need
the long sequence when running on Alder Lake or newer.

Previously, I considered such a migration pool infeasible, because of
the change in MAXPHYADDR, but I now predict that I will lose that
battle.


>         x86_arch_cap_msr = x86_read_arch_cap_msr();
>
>         cpu_print_attack_vectors();
>
> --
> 2.34.1
>
>
>

^ permalink raw reply	[flat|nested] 32+ messages in thread
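[Editor's note: the patch above turns the two hard-coded `movl $5` loop bounds into runtime-loaded bytes. A rough userspace model of the nested loop structure is sketched below, comparing the short (5x5) and long (12x7) configurations by total inner iterations; the model only counts iterations, whereas the real sequence executes taken branches to displace BHB entries.]

```c
#include <assert.h>

/* Model the nesting of clear_bhb_loop(): the outer loop runs 'outer'
 * times and each iteration runs the inner branch loop 'inner' times.
 * Returns the total number of inner iterations, a rough proxy for how
 * many history-displacing branches execute. */
static unsigned int bhb_loop_iterations(unsigned char outer, unsigned char inner)
{
	unsigned int total = 0;

	for (unsigned int i = 0; i < outer; i++)
		for (unsigned int j = 0; j < inner; j++)
			total++;
	return total;
}
```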

* Re: [PATCH v9 02/10] x86/bhi: Make clear_bhb_loop() effective on newer CPUs
  2026-04-03 18:10   ` Jim Mattson
@ 2026-04-03 18:52     ` Pawan Gupta
  2026-04-03 20:19       ` Jim Mattson
  0 siblings, 1 reply; 32+ messages in thread
From: Pawan Gupta @ 2026-04-03 18:52 UTC (permalink / raw)
  To: Jim Mattson
  Cc: x86, Jon Kohler, Nikolay Borisov, H. Peter Anvin, Josh Poimboeuf,
	David Kaplan, Sean Christopherson, Borislav Petkov, Dave Hansen,
	Peter Zijlstra, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, KP Singh, Jiri Olsa, David S. Miller,
	David Laight, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	David Ahern, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, John Fastabend, Stanislav Fomichev, Hao Luo,
	Paolo Bonzini, Jonathan Corbet, linux-kernel, kvm, Asit Mallick,
	Tao Zhang, bpf, netdev, linux-doc

On Fri, Apr 03, 2026 at 11:10:08AM -0700, Jim Mattson wrote:
> On Thu, Apr 2, 2026 at 5:32 PM Pawan Gupta
> <pawan.kumar.gupta@linux.intel.com> wrote:
> >
> > As a mitigation for BHI, clear_bhb_loop() executes branches that overwrite
> > the Branch History Buffer (BHB). On Alder Lake and newer parts this
> > sequence is not sufficient because it doesn't clear enough entries. This
> > was not an issue because these CPUs use the BHI_DIS_S hardware mitigation
> > in the kernel.
> >
> > Now with VMSCAPE (BHI variant) it is also required to isolate branch
> > history between guests and userspace. Since BHI_DIS_S only protects the
> > kernel, the newer CPUs also use IBPB.
> >
> > A cheaper alternative to the current IBPB mitigation is clear_bhb_loop().
> > But it currently does not clear enough BHB entries to be effective on newer
> > CPUs with larger BHB. At boot, dynamically set the loop count of
> > clear_bhb_loop() such that it is effective on newer CPUs too. Use the
> > X86_FEATURE_BHI_CTRL feature flag to select the appropriate loop count.
> >
> > Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
> > Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> > ---
> >  arch/x86/entry/entry_64.S            |  8 +++++---
> >  arch/x86/include/asm/nospec-branch.h |  2 ++
> >  arch/x86/kernel/cpu/bugs.c           | 13 +++++++++++++
> >  3 files changed, 20 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> > index 3a180a36ca0e..bbd4b1c7ec04 100644
> > --- a/arch/x86/entry/entry_64.S
> > +++ b/arch/x86/entry/entry_64.S
> > @@ -1536,7 +1536,9 @@ SYM_FUNC_START(clear_bhb_loop)
> >         ANNOTATE_NOENDBR
> >         push    %rbp
> >         mov     %rsp, %rbp
> > -       movl    $5, %ecx
> > +
> > +       movzbl    bhb_seq_outer_loop(%rip), %ecx
> > +
> >         ANNOTATE_INTRA_FUNCTION_CALL
> >         call    1f
> >         jmp     5f
> > @@ -1556,8 +1558,8 @@ SYM_FUNC_START(clear_bhb_loop)
> >          * This should be ideally be: .skip 32 - (.Lret2 - 2f), 0xcc
> >          * but some Clang versions (e.g. 18) don't like this.
> >          */
> > -       .skip 32 - 18, 0xcc
> > -2:     movl    $5, %eax
> > +       .skip 32 - 20, 0xcc
> > +2:     movzbl  bhb_seq_inner_loop(%rip), %eax
> >  3:     jmp     4f
> >         nop
> >  4:     sub     $1, %eax
> > diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
> > index 70b377fcbc1c..87b83ae7c97f 100644
> > --- a/arch/x86/include/asm/nospec-branch.h
> > +++ b/arch/x86/include/asm/nospec-branch.h
> > @@ -548,6 +548,8 @@ DECLARE_PER_CPU(u64, x86_spec_ctrl_current);
> >  extern void update_spec_ctrl_cond(u64 val);
> >  extern u64 spec_ctrl_current(void);
> >
> > +extern u8 bhb_seq_inner_loop, bhb_seq_outer_loop;
> > +
> >  /*
> >   * With retpoline, we must use IBRS to restrict branch prediction
> >   * before calling into firmware.
> > diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
> > index 83f51cab0b1e..2cb4a96247d8 100644
> > --- a/arch/x86/kernel/cpu/bugs.c
> > +++ b/arch/x86/kernel/cpu/bugs.c
> > @@ -2047,6 +2047,10 @@ enum bhi_mitigations {
> >  static enum bhi_mitigations bhi_mitigation __ro_after_init =
> >         IS_ENABLED(CONFIG_MITIGATION_SPECTRE_BHI) ? BHI_MITIGATION_AUTO : BHI_MITIGATION_OFF;
> >
> > +/* Default to short BHB sequence values */
> > +u8 bhb_seq_outer_loop __ro_after_init = 5;
> > +u8 bhb_seq_inner_loop __ro_after_init = 5;
> > +
> >  static int __init spectre_bhi_parse_cmdline(char *str)
> >  {
> >         if (!str)
> > @@ -3242,6 +3246,15 @@ void __init cpu_select_mitigations(void)
> >                 x86_spec_ctrl_base &= ~SPEC_CTRL_MITIGATIONS_MASK;
> >         }
> >
> > +       /*
> > +        * Switch to long BHB clear sequence on newer CPUs (with BHI_CTRL
> > +        * support), see Intel's BHI guidance.
> > +        */
> > +       if (cpu_feature_enabled(X86_FEATURE_BHI_CTRL)) {
> > +               bhb_seq_outer_loop = 12;
> > +               bhb_seq_inner_loop = 7;
> > +       }
> > +
> 
> How does this work for VMs in a heterogeneous migration pool that
> spans the Alder Lake boundary? They can't advertise BHI_CTRL, because
> it isn't available on all hosts in the migration pool, but they need
> the long sequence when running on Alder Lake or newer.

As we discussed elsewhere, support for migration pools is much more
involved. It should be dealt with in a separate QEMU/KVM-focused series.

A quick fix could be adding support for spectre_bhi=long that guests in a
migration pool can use?

> Previously, I considered such a migration pool infeasible, because of
> the change in MAXPHYADDR, but I now predict that I will lose that
> battle.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v9 02/10] x86/bhi: Make clear_bhb_loop() effective on newer CPUs
  2026-04-03 18:52     ` Pawan Gupta
@ 2026-04-03 20:19       ` Jim Mattson
  2026-04-03 21:34         ` Pawan Gupta
  0 siblings, 1 reply; 32+ messages in thread
From: Jim Mattson @ 2026-04-03 20:19 UTC (permalink / raw)
  To: Pawan Gupta
  Cc: x86, Jon Kohler, Nikolay Borisov, H. Peter Anvin, Josh Poimboeuf,
	David Kaplan, Sean Christopherson, Borislav Petkov, Dave Hansen,
	Peter Zijlstra, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, KP Singh, Jiri Olsa, David S. Miller,
	David Laight, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	David Ahern, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, John Fastabend, Stanislav Fomichev, Hao Luo,
	Paolo Bonzini, Jonathan Corbet, linux-kernel, kvm, Asit Mallick,
	Tao Zhang, bpf, netdev, linux-doc

On Fri, Apr 3, 2026 at 11:52 AM Pawan Gupta
<pawan.kumar.gupta@linux.intel.com> wrote:
>
> On Fri, Apr 03, 2026 at 11:10:08AM -0700, Jim Mattson wrote:
> > On Thu, Apr 2, 2026 at 5:32 PM Pawan Gupta
> > <pawan.kumar.gupta@linux.intel.com> wrote:
> > >
> > > As a mitigation for BHI, clear_bhb_loop() executes branches that overwrite
> > > the Branch History Buffer (BHB). On Alder Lake and newer parts this
> > > sequence is not sufficient because it doesn't clear enough entries. This
> > > was not an issue because these CPUs use the BHI_DIS_S hardware mitigation
> > > in the kernel.
> > >
> > > Now with VMSCAPE (BHI variant) it is also required to isolate branch
> > > history between guests and userspace. Since BHI_DIS_S only protects the
> > > kernel, the newer CPUs also use IBPB.
> > >
> > > A cheaper alternative to the current IBPB mitigation is clear_bhb_loop().
> > > But it currently does not clear enough BHB entries to be effective on newer
> > > CPUs with larger BHB. At boot, dynamically set the loop count of
> > > clear_bhb_loop() such that it is effective on newer CPUs too. Use the
> > > X86_FEATURE_BHI_CTRL feature flag to select the appropriate loop count.
> > >
> > > Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
> > > Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> > > ---
> > >  arch/x86/entry/entry_64.S            |  8 +++++---
> > >  arch/x86/include/asm/nospec-branch.h |  2 ++
> > >  arch/x86/kernel/cpu/bugs.c           | 13 +++++++++++++
> > >  3 files changed, 20 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> > > index 3a180a36ca0e..bbd4b1c7ec04 100644
> > > --- a/arch/x86/entry/entry_64.S
> > > +++ b/arch/x86/entry/entry_64.S
> > > @@ -1536,7 +1536,9 @@ SYM_FUNC_START(clear_bhb_loop)
> > >         ANNOTATE_NOENDBR
> > >         push    %rbp
> > >         mov     %rsp, %rbp
> > > -       movl    $5, %ecx
> > > +
> > > +       movzbl    bhb_seq_outer_loop(%rip), %ecx
> > > +
> > >         ANNOTATE_INTRA_FUNCTION_CALL
> > >         call    1f
> > >         jmp     5f
> > > @@ -1556,8 +1558,8 @@ SYM_FUNC_START(clear_bhb_loop)
> > >          * This should be ideally be: .skip 32 - (.Lret2 - 2f), 0xcc
> > >          * but some Clang versions (e.g. 18) don't like this.
> > >          */
> > > -       .skip 32 - 18, 0xcc
> > > -2:     movl    $5, %eax
> > > +       .skip 32 - 20, 0xcc
> > > +2:     movzbl  bhb_seq_inner_loop(%rip), %eax
> > >  3:     jmp     4f
> > >         nop
> > >  4:     sub     $1, %eax
> > > diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
> > > index 70b377fcbc1c..87b83ae7c97f 100644
> > > --- a/arch/x86/include/asm/nospec-branch.h
> > > +++ b/arch/x86/include/asm/nospec-branch.h
> > > @@ -548,6 +548,8 @@ DECLARE_PER_CPU(u64, x86_spec_ctrl_current);
> > >  extern void update_spec_ctrl_cond(u64 val);
> > >  extern u64 spec_ctrl_current(void);
> > >
> > > +extern u8 bhb_seq_inner_loop, bhb_seq_outer_loop;
> > > +
> > >  /*
> > >   * With retpoline, we must use IBRS to restrict branch prediction
> > >   * before calling into firmware.
> > > diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
> > > index 83f51cab0b1e..2cb4a96247d8 100644
> > > --- a/arch/x86/kernel/cpu/bugs.c
> > > +++ b/arch/x86/kernel/cpu/bugs.c
> > > @@ -2047,6 +2047,10 @@ enum bhi_mitigations {
> > >  static enum bhi_mitigations bhi_mitigation __ro_after_init =
> > >         IS_ENABLED(CONFIG_MITIGATION_SPECTRE_BHI) ? BHI_MITIGATION_AUTO : BHI_MITIGATION_OFF;
> > >
> > > +/* Default to short BHB sequence values */
> > > +u8 bhb_seq_outer_loop __ro_after_init = 5;
> > > +u8 bhb_seq_inner_loop __ro_after_init = 5;
> > > +
> > >  static int __init spectre_bhi_parse_cmdline(char *str)
> > >  {
> > >         if (!str)
> > > @@ -3242,6 +3246,15 @@ void __init cpu_select_mitigations(void)
> > >                 x86_spec_ctrl_base &= ~SPEC_CTRL_MITIGATIONS_MASK;
> > >         }
> > >
> > > +       /*
> > > +        * Switch to long BHB clear sequence on newer CPUs (with BHI_CTRL
> > > +        * support), see Intel's BHI guidance.
> > > +        */
> > > +       if (cpu_feature_enabled(X86_FEATURE_BHI_CTRL)) {
> > > +               bhb_seq_outer_loop = 12;
> > > +               bhb_seq_inner_loop = 7;
> > > +       }
> > > +
> >
> > How does this work for VMs in a heterogeneous migration pool that
> > spans the Alder Lake boundary? They can't advertise BHI_CTRL, because
> > it isn't available on all hosts in the migration pool, but they need
> > the long sequence when running on Alder Lake or newer.
>
> As we discussed elsewhere, support for migration pool is much more
> involved. It should be dealt in a separate QEMU/KVM focused series.
>
> A quickfix could be adding support for spectre_bhi=long that guests in a
> migration pool can use?

The simplest solution is to add "|
cpu_feature_enabled(X86_FEATURE_HYPERVISOR)" to the condition above.
If that is unacceptable for the performance of pre-Alder Lake
migration pools, you could define a CPUID or MSR bit that says
explicitly, "long BHB flush sequence needed," rather than trying to
intuit that property from the presence of BHI_CTRL. Like
IA32_ARCH_CAPABILITIES.SKIP_L1DFL_VMENTRY, the bit would only be set
by a hypervisor.

I am still skeptical of the need for MSR_VIRTUAL_ENUMERATION and
friends, unless there is a major guest OS out there that relies on
them.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v9 02/10] x86/bhi: Make clear_bhb_loop() effective on newer CPUs
  2026-04-03 20:19       ` Jim Mattson
@ 2026-04-03 21:34         ` Pawan Gupta
  2026-04-03 21:59           ` Jim Mattson
  0 siblings, 1 reply; 32+ messages in thread
From: Pawan Gupta @ 2026-04-03 21:34 UTC (permalink / raw)
  To: Jim Mattson
  Cc: x86, Jon Kohler, Nikolay Borisov, H. Peter Anvin, Josh Poimboeuf,
	David Kaplan, Sean Christopherson, Borislav Petkov, Dave Hansen,
	Peter Zijlstra, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, KP Singh, Jiri Olsa, David S. Miller,
	David Laight, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	David Ahern, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, John Fastabend, Stanislav Fomichev, Hao Luo,
	Paolo Bonzini, Jonathan Corbet, linux-kernel, kvm, Asit Mallick,
	Tao Zhang, bpf, netdev, linux-doc

On Fri, Apr 03, 2026 at 01:19:17PM -0700, Jim Mattson wrote:
> On Fri, Apr 3, 2026 at 11:52 AM Pawan Gupta
> <pawan.kumar.gupta@linux.intel.com> wrote:
> >
> > On Fri, Apr 03, 2026 at 11:10:08AM -0700, Jim Mattson wrote:
> > > On Thu, Apr 2, 2026 at 5:32 PM Pawan Gupta
> > > <pawan.kumar.gupta@linux.intel.com> wrote:
> > > >
> > > > As a mitigation for BHI, clear_bhb_loop() executes branches that overwrite
> > > > the Branch History Buffer (BHB). On Alder Lake and newer parts this
> > > > sequence is not sufficient because it doesn't clear enough entries. This
> > > > was not an issue because these CPUs use the BHI_DIS_S hardware mitigation
> > > > in the kernel.
> > > >
> > > > Now with VMSCAPE (BHI variant) it is also required to isolate branch
> > > > history between guests and userspace. Since BHI_DIS_S only protects the
> > > > kernel, the newer CPUs also use IBPB.
> > > >
> > > > A cheaper alternative to the current IBPB mitigation is clear_bhb_loop().
> > > > But it currently does not clear enough BHB entries to be effective on newer
> > > > CPUs with larger BHB. At boot, dynamically set the loop count of
> > > > clear_bhb_loop() such that it is effective on newer CPUs too. Use the
> > > > X86_FEATURE_BHI_CTRL feature flag to select the appropriate loop count.
> > > >
> > > > Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
> > > > Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> > > > ---
> > > >  arch/x86/entry/entry_64.S            |  8 +++++---
> > > >  arch/x86/include/asm/nospec-branch.h |  2 ++
> > > >  arch/x86/kernel/cpu/bugs.c           | 13 +++++++++++++
> > > >  3 files changed, 20 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> > > > index 3a180a36ca0e..bbd4b1c7ec04 100644
> > > > --- a/arch/x86/entry/entry_64.S
> > > > +++ b/arch/x86/entry/entry_64.S
> > > > @@ -1536,7 +1536,9 @@ SYM_FUNC_START(clear_bhb_loop)
> > > >         ANNOTATE_NOENDBR
> > > >         push    %rbp
> > > >         mov     %rsp, %rbp
> > > > -       movl    $5, %ecx
> > > > +
> > > > +       movzbl    bhb_seq_outer_loop(%rip), %ecx
> > > > +
> > > >         ANNOTATE_INTRA_FUNCTION_CALL
> > > >         call    1f
> > > >         jmp     5f
> > > > @@ -1556,8 +1558,8 @@ SYM_FUNC_START(clear_bhb_loop)
> > > >          * This should be ideally be: .skip 32 - (.Lret2 - 2f), 0xcc
> > > >          * but some Clang versions (e.g. 18) don't like this.
> > > >          */
> > > > -       .skip 32 - 18, 0xcc
> > > > -2:     movl    $5, %eax
> > > > +       .skip 32 - 20, 0xcc
> > > > +2:     movzbl  bhb_seq_inner_loop(%rip), %eax
> > > >  3:     jmp     4f
> > > >         nop
> > > >  4:     sub     $1, %eax
> > > > diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
> > > > index 70b377fcbc1c..87b83ae7c97f 100644
> > > > --- a/arch/x86/include/asm/nospec-branch.h
> > > > +++ b/arch/x86/include/asm/nospec-branch.h
> > > > @@ -548,6 +548,8 @@ DECLARE_PER_CPU(u64, x86_spec_ctrl_current);
> > > >  extern void update_spec_ctrl_cond(u64 val);
> > > >  extern u64 spec_ctrl_current(void);
> > > >
> > > > +extern u8 bhb_seq_inner_loop, bhb_seq_outer_loop;
> > > > +
> > > >  /*
> > > >   * With retpoline, we must use IBRS to restrict branch prediction
> > > >   * before calling into firmware.
> > > > diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
> > > > index 83f51cab0b1e..2cb4a96247d8 100644
> > > > --- a/arch/x86/kernel/cpu/bugs.c
> > > > +++ b/arch/x86/kernel/cpu/bugs.c
> > > > @@ -2047,6 +2047,10 @@ enum bhi_mitigations {
> > > >  static enum bhi_mitigations bhi_mitigation __ro_after_init =
> > > >         IS_ENABLED(CONFIG_MITIGATION_SPECTRE_BHI) ? BHI_MITIGATION_AUTO : BHI_MITIGATION_OFF;
> > > >
> > > > +/* Default to short BHB sequence values */
> > > > +u8 bhb_seq_outer_loop __ro_after_init = 5;
> > > > +u8 bhb_seq_inner_loop __ro_after_init = 5;
> > > > +
> > > >  static int __init spectre_bhi_parse_cmdline(char *str)
> > > >  {
> > > >         if (!str)
> > > > @@ -3242,6 +3246,15 @@ void __init cpu_select_mitigations(void)
> > > >                 x86_spec_ctrl_base &= ~SPEC_CTRL_MITIGATIONS_MASK;
> > > >         }
> > > >
> > > > +       /*
> > > > +        * Switch to long BHB clear sequence on newer CPUs (with BHI_CTRL
> > > > +        * support), see Intel's BHI guidance.
> > > > +        */
> > > > +       if (cpu_feature_enabled(X86_FEATURE_BHI_CTRL)) {
> > > > +               bhb_seq_outer_loop = 12;
> > > > +               bhb_seq_inner_loop = 7;
> > > > +       }
> > > > +
> > >
> > > How does this work for VMs in a heterogeneous migration pool that
> > > spans the Alder Lake boundary? They can't advertise BHI_CTRL, because
> > > it isn't available on all hosts in the migration pool, but they need
> > > the long sequence when running on Alder Lake or newer.
> >
> > As we discussed elsewhere, support for migration pool is much more
> > involved. It should be dealt in a separate QEMU/KVM focused series.
> >
> > A quickfix could be adding support for spectre_bhi=long that guests in a
> > migration pool can use?
> 
> The simplest solution is to add "|
> cpu_feature_enabled(X86_FEATURE_HYPERVISOR)" to the condition above.
> If that is unacceptable for the performance of pre-Alder Lake

Yes, that would be unnecessary overhead.

> migration pools, you could define a CPUID or MSR bit that says
> explicitly, "long BHB flush sequence needed," rather than trying to
> intuit that property from the presence of BHI_CTRL. Like
> IA32_ARCH_CAPABILITIES.SKIP_L1DFL_VMENTRY, the bit would only be set
> by a hypervisor.

I will think about this more.

> I am still skeptical of the need for MSR_VIRTUAL_ENUMERATION and
> friends, unless there is a major guest OS out there that relies on
> them.

If we set MSR_VIRTUAL_ENUMERATION aside for a moment, the userspace VMM is
in the best position to decide whether a guest needs
virtual.SPEC_CTRL[BHI_DIS_S]. Could the userspace VMM, via a KVM interface,
enable BHI_DIS_S for the guests that are in a migration pool?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v9 02/10] x86/bhi: Make clear_bhb_loop() effective on newer CPUs
  2026-04-03 21:34         ` Pawan Gupta
@ 2026-04-03 21:59           ` Jim Mattson
  2026-04-03 23:16             ` Pawan Gupta
  0 siblings, 1 reply; 32+ messages in thread
From: Jim Mattson @ 2026-04-03 21:59 UTC (permalink / raw)
  To: Pawan Gupta
  Cc: x86, Jon Kohler, Nikolay Borisov, H. Peter Anvin, Josh Poimboeuf,
	David Kaplan, Sean Christopherson, Borislav Petkov, Dave Hansen,
	Peter Zijlstra, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, KP Singh, Jiri Olsa, David S. Miller,
	David Laight, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	David Ahern, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, John Fastabend, Stanislav Fomichev, Hao Luo,
	Paolo Bonzini, Jonathan Corbet, linux-kernel, kvm, Asit Mallick,
	Tao Zhang, bpf, netdev, linux-doc

On Fri, Apr 3, 2026 at 2:34 PM Pawan Gupta
<pawan.kumar.gupta@linux.intel.com> wrote:
>
> On Fri, Apr 03, 2026 at 01:19:17PM -0700, Jim Mattson wrote:
> > On Fri, Apr 3, 2026 at 11:52 AM Pawan Gupta
> > <pawan.kumar.gupta@linux.intel.com> wrote:
> > >
> > > On Fri, Apr 03, 2026 at 11:10:08AM -0700, Jim Mattson wrote:
> > > > On Thu, Apr 2, 2026 at 5:32 PM Pawan Gupta
> > > > <pawan.kumar.gupta@linux.intel.com> wrote:
> > > > >
> > > > > As a mitigation for BHI, clear_bhb_loop() executes branches that overwrite
> > > > > the Branch History Buffer (BHB). On Alder Lake and newer parts this
> > > > > sequence is not sufficient because it doesn't clear enough entries. This
> > > > > was not an issue because these CPUs use the BHI_DIS_S hardware mitigation
> > > > > in the kernel.
> > > > >
> > > > > Now with VMSCAPE (BHI variant) it is also required to isolate branch
> > > > > history between guests and userspace. Since BHI_DIS_S only protects the
> > > > > kernel, the newer CPUs also use IBPB.
> > > > >
> > > > > A cheaper alternative to the current IBPB mitigation is clear_bhb_loop().
> > > > > But it currently does not clear enough BHB entries to be effective on newer
> > > > > CPUs with larger BHB. At boot, dynamically set the loop count of
> > > > > clear_bhb_loop() such that it is effective on newer CPUs too. Use the
> > > > > X86_FEATURE_BHI_CTRL feature flag to select the appropriate loop count.
> > > > >
> > > > > Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
> > > > > Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> > > > > ---
> > > > >  arch/x86/entry/entry_64.S            |  8 +++++---
> > > > >  arch/x86/include/asm/nospec-branch.h |  2 ++
> > > > >  arch/x86/kernel/cpu/bugs.c           | 13 +++++++++++++
> > > > >  3 files changed, 20 insertions(+), 3 deletions(-)
> > > > >
> > > > > diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> > > > > index 3a180a36ca0e..bbd4b1c7ec04 100644
> > > > > --- a/arch/x86/entry/entry_64.S
> > > > > +++ b/arch/x86/entry/entry_64.S
> > > > > @@ -1536,7 +1536,9 @@ SYM_FUNC_START(clear_bhb_loop)
> > > > >         ANNOTATE_NOENDBR
> > > > >         push    %rbp
> > > > >         mov     %rsp, %rbp
> > > > > -       movl    $5, %ecx
> > > > > +
> > > > > +       movzbl    bhb_seq_outer_loop(%rip), %ecx
> > > > > +
> > > > >         ANNOTATE_INTRA_FUNCTION_CALL
> > > > >         call    1f
> > > > >         jmp     5f
> > > > > @@ -1556,8 +1558,8 @@ SYM_FUNC_START(clear_bhb_loop)
> > > > >          * This should be ideally be: .skip 32 - (.Lret2 - 2f), 0xcc
> > > > >          * but some Clang versions (e.g. 18) don't like this.
> > > > >          */
> > > > > -       .skip 32 - 18, 0xcc
> > > > > -2:     movl    $5, %eax
> > > > > +       .skip 32 - 20, 0xcc
> > > > > +2:     movzbl  bhb_seq_inner_loop(%rip), %eax
> > > > >  3:     jmp     4f
> > > > >         nop
> > > > >  4:     sub     $1, %eax
> > > > > diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
> > > > > index 70b377fcbc1c..87b83ae7c97f 100644
> > > > > --- a/arch/x86/include/asm/nospec-branch.h
> > > > > +++ b/arch/x86/include/asm/nospec-branch.h
> > > > > @@ -548,6 +548,8 @@ DECLARE_PER_CPU(u64, x86_spec_ctrl_current);
> > > > >  extern void update_spec_ctrl_cond(u64 val);
> > > > >  extern u64 spec_ctrl_current(void);
> > > > >
> > > > > +extern u8 bhb_seq_inner_loop, bhb_seq_outer_loop;
> > > > > +
> > > > >  /*
> > > > >   * With retpoline, we must use IBRS to restrict branch prediction
> > > > >   * before calling into firmware.
> > > > > diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
> > > > > index 83f51cab0b1e..2cb4a96247d8 100644
> > > > > --- a/arch/x86/kernel/cpu/bugs.c
> > > > > +++ b/arch/x86/kernel/cpu/bugs.c
> > > > > @@ -2047,6 +2047,10 @@ enum bhi_mitigations {
> > > > >  static enum bhi_mitigations bhi_mitigation __ro_after_init =
> > > > >         IS_ENABLED(CONFIG_MITIGATION_SPECTRE_BHI) ? BHI_MITIGATION_AUTO : BHI_MITIGATION_OFF;
> > > > >
> > > > > +/* Default to short BHB sequence values */
> > > > > +u8 bhb_seq_outer_loop __ro_after_init = 5;
> > > > > +u8 bhb_seq_inner_loop __ro_after_init = 5;
> > > > > +
> > > > >  static int __init spectre_bhi_parse_cmdline(char *str)
> > > > >  {
> > > > >         if (!str)
> > > > > @@ -3242,6 +3246,15 @@ void __init cpu_select_mitigations(void)
> > > > >                 x86_spec_ctrl_base &= ~SPEC_CTRL_MITIGATIONS_MASK;
> > > > >         }
> > > > >
> > > > > +       /*
> > > > > +        * Switch to long BHB clear sequence on newer CPUs (with BHI_CTRL
> > > > > +        * support), see Intel's BHI guidance.
> > > > > +        */
> > > > > +       if (cpu_feature_enabled(X86_FEATURE_BHI_CTRL)) {
> > > > > +               bhb_seq_outer_loop = 12;
> > > > > +               bhb_seq_inner_loop = 7;
> > > > > +       }
> > > > > +
> > > >
> > > > How does this work for VMs in a heterogeneous migration pool that
> > > > spans the Alder Lake boundary? They can't advertise BHI_CTRL, because
> > > > it isn't available on all hosts in the migration pool, but they need
> > > > the long sequence when running on Alder Lake or newer.
> > >
> > > As we discussed elsewhere, support for migration pool is much more
> > > involved. It should be dealt in a separate QEMU/KVM focused series.
> > >
> > > A quickfix could be adding support for spectre_bhi=long that guests in a
> > > migration pool can use?
> >
> > The simplest solution is to add "|
> > cpu_feature_enabled(X86_FEATURE_HYPERVISOR)" to the condition above.
> > If that is unacceptable for the performance of pre-Alder Lake
>
> Yes, that would be unnecessary overhead.
>
> > migration pools, you could define a CPUID or MSR bit that says
> > explicitly, "long BHB flush sequence needed," rather than trying to
> > intuit that property from the presence of BHI_CTRL. Like
> > IA32_ARCH_CAPABILITIES.SKIP_L1DFL_VMENTRY, the bit would only be set
> > by a hypervisor.
>
> I will think about this more.
>
> > I am still skeptical of the need for MSR_VIRTUAL_ENUMERATION and
> > friends, unless there is a major guest OS out there that relies on
> > them.
>
> If we forget about MSR_VIRTUAL_ENUMERATION for a moment, userspace VMM is
> in the best position to decide whether a guest needs
> virtual.SPEC_CTRL[BHI_DIS_S]. Via a KVM interface userspace VMM can get
> BHI_DIS_S for the guests that are in migration pool?

That is not possible today, since KVM does not implement Intel's
IA32_SPEC_CTRL virtualization, and cedes the hardware IA32_SPEC_CTRL
to the guest after the first non-zero write to the guest's MSR.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v9 02/10] x86/bhi: Make clear_bhb_loop() effective on newer CPUs
  2026-04-03 21:59           ` Jim Mattson
@ 2026-04-03 23:16             ` Pawan Gupta
  2026-04-03 23:22               ` Jim Mattson
  0 siblings, 1 reply; 32+ messages in thread
From: Pawan Gupta @ 2026-04-03 23:16 UTC (permalink / raw)
  To: Jim Mattson
  Cc: x86, Jon Kohler, Nikolay Borisov, H. Peter Anvin, Josh Poimboeuf,
	David Kaplan, Sean Christopherson, Borislav Petkov, Dave Hansen,
	Peter Zijlstra, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, KP Singh, Jiri Olsa, David S. Miller,
	David Laight, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	David Ahern, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, John Fastabend, Stanislav Fomichev, Hao Luo,
	Paolo Bonzini, Jonathan Corbet, linux-kernel, kvm, Asit Mallick,
	Tao Zhang, bpf, netdev, linux-doc, chao.gao

On Fri, Apr 03, 2026 at 02:59:33PM -0700, Jim Mattson wrote:
> On Fri, Apr 3, 2026 at 2:34 PM Pawan Gupta
> <pawan.kumar.gupta@linux.intel.com> wrote:
> >
> > On Fri, Apr 03, 2026 at 01:19:17PM -0700, Jim Mattson wrote:
> > > On Fri, Apr 3, 2026 at 11:52 AM Pawan Gupta
> > > <pawan.kumar.gupta@linux.intel.com> wrote:
> > > >
> > > > On Fri, Apr 03, 2026 at 11:10:08AM -0700, Jim Mattson wrote:
> > > > > On Thu, Apr 2, 2026 at 5:32 PM Pawan Gupta
> > > > > <pawan.kumar.gupta@linux.intel.com> wrote:
> > > > > >
> > > > > > As a mitigation for BHI, clear_bhb_loop() executes branches that overwrite
> > > > > > the Branch History Buffer (BHB). On Alder Lake and newer parts this
> > > > > > sequence is not sufficient because it doesn't clear enough entries. This
> > > > > > was not an issue because these CPUs use the BHI_DIS_S hardware mitigation
> > > > > > in the kernel.
> > > > > >
> > > > > > Now with VMSCAPE (BHI variant) it is also required to isolate branch
> > > > > > history between guests and userspace. Since BHI_DIS_S only protects the
> > > > > > kernel, the newer CPUs also use IBPB.
> > > > > >
> > > > > > A cheaper alternative to the current IBPB mitigation is clear_bhb_loop().
> > > > > > But it currently does not clear enough BHB entries to be effective on newer
> > > > > > CPUs with larger BHB. At boot, dynamically set the loop count of
> > > > > > clear_bhb_loop() such that it is effective on newer CPUs too. Use the
> > > > > > X86_FEATURE_BHI_CTRL feature flag to select the appropriate loop count.
> > > > > >
> > > > > > Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
> > > > > > Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> > > > > > ---
> > > > > >  arch/x86/entry/entry_64.S            |  8 +++++---
> > > > > >  arch/x86/include/asm/nospec-branch.h |  2 ++
> > > > > >  arch/x86/kernel/cpu/bugs.c           | 13 +++++++++++++
> > > > > >  3 files changed, 20 insertions(+), 3 deletions(-)
> > > > > >
> > > > > > diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> > > > > > index 3a180a36ca0e..bbd4b1c7ec04 100644
> > > > > > --- a/arch/x86/entry/entry_64.S
> > > > > > +++ b/arch/x86/entry/entry_64.S
> > > > > > @@ -1536,7 +1536,9 @@ SYM_FUNC_START(clear_bhb_loop)
> > > > > >         ANNOTATE_NOENDBR
> > > > > >         push    %rbp
> > > > > >         mov     %rsp, %rbp
> > > > > > -       movl    $5, %ecx
> > > > > > +
> > > > > > +       movzbl    bhb_seq_outer_loop(%rip), %ecx
> > > > > > +
> > > > > >         ANNOTATE_INTRA_FUNCTION_CALL
> > > > > >         call    1f
> > > > > >         jmp     5f
> > > > > > @@ -1556,8 +1558,8 @@ SYM_FUNC_START(clear_bhb_loop)
> > > > > >          * This should be ideally be: .skip 32 - (.Lret2 - 2f), 0xcc
> > > > > >          * but some Clang versions (e.g. 18) don't like this.
> > > > > >          */
> > > > > > -       .skip 32 - 18, 0xcc
> > > > > > -2:     movl    $5, %eax
> > > > > > +       .skip 32 - 20, 0xcc
> > > > > > +2:     movzbl  bhb_seq_inner_loop(%rip), %eax
> > > > > >  3:     jmp     4f
> > > > > >         nop
> > > > > >  4:     sub     $1, %eax
> > > > > > diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
> > > > > > index 70b377fcbc1c..87b83ae7c97f 100644
> > > > > > --- a/arch/x86/include/asm/nospec-branch.h
> > > > > > +++ b/arch/x86/include/asm/nospec-branch.h
> > > > > > @@ -548,6 +548,8 @@ DECLARE_PER_CPU(u64, x86_spec_ctrl_current);
> > > > > >  extern void update_spec_ctrl_cond(u64 val);
> > > > > >  extern u64 spec_ctrl_current(void);
> > > > > >
> > > > > > +extern u8 bhb_seq_inner_loop, bhb_seq_outer_loop;
> > > > > > +
> > > > > >  /*
> > > > > >   * With retpoline, we must use IBRS to restrict branch prediction
> > > > > >   * before calling into firmware.
> > > > > > diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
> > > > > > index 83f51cab0b1e..2cb4a96247d8 100644
> > > > > > --- a/arch/x86/kernel/cpu/bugs.c
> > > > > > +++ b/arch/x86/kernel/cpu/bugs.c
> > > > > > @@ -2047,6 +2047,10 @@ enum bhi_mitigations {
> > > > > >  static enum bhi_mitigations bhi_mitigation __ro_after_init =
> > > > > >         IS_ENABLED(CONFIG_MITIGATION_SPECTRE_BHI) ? BHI_MITIGATION_AUTO : BHI_MITIGATION_OFF;
> > > > > >
> > > > > > +/* Default to short BHB sequence values */
> > > > > > +u8 bhb_seq_outer_loop __ro_after_init = 5;
> > > > > > +u8 bhb_seq_inner_loop __ro_after_init = 5;
> > > > > > +
> > > > > >  static int __init spectre_bhi_parse_cmdline(char *str)
> > > > > >  {
> > > > > >         if (!str)
> > > > > > @@ -3242,6 +3246,15 @@ void __init cpu_select_mitigations(void)
> > > > > >                 x86_spec_ctrl_base &= ~SPEC_CTRL_MITIGATIONS_MASK;
> > > > > >         }
> > > > > >
> > > > > > +       /*
> > > > > > +        * Switch to long BHB clear sequence on newer CPUs (with BHI_CTRL
> > > > > > +        * support), see Intel's BHI guidance.
> > > > > > +        */
> > > > > > +       if (cpu_feature_enabled(X86_FEATURE_BHI_CTRL)) {
> > > > > > +               bhb_seq_outer_loop = 12;
> > > > > > +               bhb_seq_inner_loop = 7;
> > > > > > +       }
> > > > > > +
> > > > >
> > > > > How does this work for VMs in a heterogeneous migration pool that
> > > > > spans the Alder Lake boundary? They can't advertise BHI_CTRL, because
> > > > > it isn't available on all hosts in the migration pool, but they need
> > > > > the long sequence when running on Alder Lake or newer.
> > > >
> > > > As we discussed elsewhere, support for migration pool is much more
> > > > involved. It should be dealt in a separate QEMU/KVM focused series.
> > > >
> > > > A quickfix could be adding support for spectre_bhi=long that guests in a
> > > > migration pool can use?
> > >
> > > The simplest solution is to add "|
> > > cpu_feature_enabled(X86_FEATURE_HYPERVISOR)" to the condition above.
> > > If that is unacceptable for the performance of pre-Alder Lake
> >
> > Yes, that would be unnecessary overhead.
> >
> > > migration pools, you could define a CPUID or MSR bit that says
> > > explicitly, "long BHB flush sequence needed," rather than trying to
> > > intuit that property from the presence of BHI_CTRL. Like
> > > IA32_ARCH_CAPABILITIES.SKIP_L1DFL_VMENTRY, the bit would only be set
> > > by a hypervisor.
> >
> > I will think about this more.
> >
> > > I am still skeptical of the need for MSR_VIRTUAL_ENUMERATION and
> > > friends, unless there is a major guest OS out there that relies on
> > > them.
> >
> > If we forget about MSR_VIRTUAL_ENUMERATION for a moment, userspace VMM is
> > in the best position to decide whether a guest needs
> > virtual.SPEC_CTRL[BHI_DIS_S]. Via a KVM interface userspace VMM can get
> > BHI_DIS_S for the guests that are in migration pool?
> 
> That is not possible today, since KVM does not implement Intel's
> IA32_SPEC_CTRL virtualization, and cedes the hardware IA32_SPEC_CTRL
> to the guest after the first non-zero write to the guest's MSR.

Yes, KVM doesn't support it yet. But adding that support to give more
control to the userspace VMM helps this case, and probably many others in
the future.

I will check with Chao if he can prepare the next version of the virtual
SPEC_CTRL series (leaving out the virtual mitigation MSRs).

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v9 02/10] x86/bhi: Make clear_bhb_loop() effective on newer CPUs
  2026-04-03 23:16             ` Pawan Gupta
@ 2026-04-03 23:22               ` Jim Mattson
  2026-04-03 23:33                 ` Pawan Gupta
  0 siblings, 1 reply; 32+ messages in thread
From: Jim Mattson @ 2026-04-03 23:22 UTC (permalink / raw)
  To: Pawan Gupta
  Cc: x86, Jon Kohler, Nikolay Borisov, H. Peter Anvin, Josh Poimboeuf,
	David Kaplan, Sean Christopherson, Borislav Petkov, Dave Hansen,
	Peter Zijlstra, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, KP Singh, Jiri Olsa, David S. Miller,
	David Laight, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	David Ahern, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, John Fastabend, Stanislav Fomichev, Hao Luo,
	Paolo Bonzini, Jonathan Corbet, linux-kernel, kvm, Asit Mallick,
	Tao Zhang, bpf, netdev, linux-doc, chao.gao

On Fri, Apr 3, 2026 at 4:16 PM Pawan Gupta
<pawan.kumar.gupta@linux.intel.com> wrote:
>
> On Fri, Apr 03, 2026 at 02:59:33PM -0700, Jim Mattson wrote:
> > On Fri, Apr 3, 2026 at 2:34 PM Pawan Gupta
> > <pawan.kumar.gupta@linux.intel.com> wrote:
> > >
> > > On Fri, Apr 03, 2026 at 01:19:17PM -0700, Jim Mattson wrote:
> > > > On Fri, Apr 3, 2026 at 11:52 AM Pawan Gupta
> > > > <pawan.kumar.gupta@linux.intel.com> wrote:
> > > > >
> > > > > On Fri, Apr 03, 2026 at 11:10:08AM -0700, Jim Mattson wrote:
> > > > > > On Thu, Apr 2, 2026 at 5:32 PM Pawan Gupta
> > > > > > <pawan.kumar.gupta@linux.intel.com> wrote:
> > > > > > >
> > > > > > > As a mitigation for BHI, clear_bhb_loop() executes branches that overwrite
> > > > > > > the Branch History Buffer (BHB). On Alder Lake and newer parts this
> > > > > > > sequence is not sufficient because it doesn't clear enough entries. This
> > > > > > > was not an issue because these CPUs use the BHI_DIS_S hardware mitigation
> > > > > > > in the kernel.
> > > > > > >
> > > > > > > Now with VMSCAPE (BHI variant) it is also required to isolate branch
> > > > > > > history between guests and userspace. Since BHI_DIS_S only protects the
> > > > > > > kernel, the newer CPUs also use IBPB.
> > > > > > >
> > > > > > > A cheaper alternative to the current IBPB mitigation is clear_bhb_loop().
> > > > > > > But it currently does not clear enough BHB entries to be effective on newer
> > > > > > > CPUs with larger BHB. At boot, dynamically set the loop count of
> > > > > > > clear_bhb_loop() such that it is effective on newer CPUs too. Use the
> > > > > > > X86_FEATURE_BHI_CTRL feature flag to select the appropriate loop count.
> > > > > > >
> > > > > > > Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
> > > > > > > Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> > > > > > > ---
> > > > > > >  arch/x86/entry/entry_64.S            |  8 +++++---
> > > > > > >  arch/x86/include/asm/nospec-branch.h |  2 ++
> > > > > > >  arch/x86/kernel/cpu/bugs.c           | 13 +++++++++++++
> > > > > > >  3 files changed, 20 insertions(+), 3 deletions(-)
> > > > > > >
> > > > > > > diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> > > > > > > index 3a180a36ca0e..bbd4b1c7ec04 100644
> > > > > > > --- a/arch/x86/entry/entry_64.S
> > > > > > > +++ b/arch/x86/entry/entry_64.S
> > > > > > > @@ -1536,7 +1536,9 @@ SYM_FUNC_START(clear_bhb_loop)
> > > > > > >         ANNOTATE_NOENDBR
> > > > > > >         push    %rbp
> > > > > > >         mov     %rsp, %rbp
> > > > > > > -       movl    $5, %ecx
> > > > > > > +
> > > > > > > +       movzbl    bhb_seq_outer_loop(%rip), %ecx
> > > > > > > +
> > > > > > >         ANNOTATE_INTRA_FUNCTION_CALL
> > > > > > >         call    1f
> > > > > > >         jmp     5f
> > > > > > > @@ -1556,8 +1558,8 @@ SYM_FUNC_START(clear_bhb_loop)
> > > > > > >          * This should be ideally be: .skip 32 - (.Lret2 - 2f), 0xcc
> > > > > > >          * but some Clang versions (e.g. 18) don't like this.
> > > > > > >          */
> > > > > > > -       .skip 32 - 18, 0xcc
> > > > > > > -2:     movl    $5, %eax
> > > > > > > +       .skip 32 - 20, 0xcc
> > > > > > > +2:     movzbl  bhb_seq_inner_loop(%rip), %eax
> > > > > > >  3:     jmp     4f
> > > > > > >         nop
> > > > > > >  4:     sub     $1, %eax
> > > > > > > diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
> > > > > > > index 70b377fcbc1c..87b83ae7c97f 100644
> > > > > > > --- a/arch/x86/include/asm/nospec-branch.h
> > > > > > > +++ b/arch/x86/include/asm/nospec-branch.h
> > > > > > > @@ -548,6 +548,8 @@ DECLARE_PER_CPU(u64, x86_spec_ctrl_current);
> > > > > > >  extern void update_spec_ctrl_cond(u64 val);
> > > > > > >  extern u64 spec_ctrl_current(void);
> > > > > > >
> > > > > > > +extern u8 bhb_seq_inner_loop, bhb_seq_outer_loop;
> > > > > > > +
> > > > > > >  /*
> > > > > > >   * With retpoline, we must use IBRS to restrict branch prediction
> > > > > > >   * before calling into firmware.
> > > > > > > diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
> > > > > > > index 83f51cab0b1e..2cb4a96247d8 100644
> > > > > > > --- a/arch/x86/kernel/cpu/bugs.c
> > > > > > > +++ b/arch/x86/kernel/cpu/bugs.c
> > > > > > > @@ -2047,6 +2047,10 @@ enum bhi_mitigations {
> > > > > > >  static enum bhi_mitigations bhi_mitigation __ro_after_init =
> > > > > > >         IS_ENABLED(CONFIG_MITIGATION_SPECTRE_BHI) ? BHI_MITIGATION_AUTO : BHI_MITIGATION_OFF;
> > > > > > >
> > > > > > > +/* Default to short BHB sequence values */
> > > > > > > +u8 bhb_seq_outer_loop __ro_after_init = 5;
> > > > > > > +u8 bhb_seq_inner_loop __ro_after_init = 5;
> > > > > > > +
> > > > > > >  static int __init spectre_bhi_parse_cmdline(char *str)
> > > > > > >  {
> > > > > > >         if (!str)
> > > > > > > @@ -3242,6 +3246,15 @@ void __init cpu_select_mitigations(void)
> > > > > > >                 x86_spec_ctrl_base &= ~SPEC_CTRL_MITIGATIONS_MASK;
> > > > > > >         }
> > > > > > >
> > > > > > > +       /*
> > > > > > > +        * Switch to long BHB clear sequence on newer CPUs (with BHI_CTRL
> > > > > > > +        * support), see Intel's BHI guidance.
> > > > > > > +        */
> > > > > > > +       if (cpu_feature_enabled(X86_FEATURE_BHI_CTRL)) {
> > > > > > > +               bhb_seq_outer_loop = 12;
> > > > > > > +               bhb_seq_inner_loop = 7;
> > > > > > > +       }
> > > > > > > +
> > > > > >
> > > > > > How does this work for VMs in a heterogeneous migration pool that
> > > > > > spans the Alder Lake boundary? They can't advertise BHI_CTRL, because
> > > > > > it isn't available on all hosts in the migration pool, but they need
> > > > > > the long sequence when running on Alder Lake or newer.
> > > > >
> > > > > As we discussed elsewhere, support for migration pool is much more
> > > > > involved. It should be dealt in a separate QEMU/KVM focused series.
> > > > >
> > > > > A quickfix could be adding support for spectre_bhi=long that guests in a
> > > > > migration pool can use?
> > > >
> > > > The simplest solution is to add "|
> > > > cpu_feature_enabled(X86_FEATURE_HYPERVISOR)" to the condition above.
> > > > If that is unacceptable for the performance of pre-Alder Lake
> > >
> > > Yes, that would be unnecessary overhead.
> > >
> > > > migration pools, you could define a CPUID or MSR bit that says
> > > > explicitly, "long BHB flush sequence needed," rather than trying to
> > > > intuit that property from the presence of BHI_CTRL. Like
> > > > IA32_ARCH_CAPABILITIES.SKIP_L1DFL_VMENTRY, the bit would only be set
> > > > by a hypervisor.
> > >
> > > I will think about this more.
> > >
> > > > I am still skeptical of the need for MSR_VIRTUAL_ENUMERATION and
> > > > friends, unless there is a major guest OS out there that relies on
> > > > them.
> > >
> > > If we forget about MSR_VIRTUAL_ENUMERATION for a moment, userspace VMM is
> > > in the best position to decide whether a guest needs
> > > virtual.SPEC_CTRL[BHI_DIS_S]. Via a KVM interface userspace VMM can get
> > > BHI_DIS_S for the guests that are in migration pool?
> >
> > That is not possible today, since KVM does not implement Intel's
> > IA32_SPEC_CTRL virtualization, and cedes the hardware IA32_SPEC_CTRL
> > to the guest after the first non-zero write to the guest's MSR.
>
> Yes, KVM doesn't support it yet. But, adding that support to give more
> control to userspace VMM helps this case, and probably many others in
> the future.

But didn't you tell me that Windows doesn't want the hypervisor to set
BHI_DIS_S behind their back?

> I will check with Chao if he can prepare the next version of virtual
> SPEC_CTRL series (leaving out virtual mitigation MSRs).

Excellent.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v9 02/10] x86/bhi: Make clear_bhb_loop() effective on newer CPUs
  2026-04-03 23:22               ` Jim Mattson
@ 2026-04-03 23:33                 ` Pawan Gupta
  2026-04-03 23:39                   ` Jim Mattson
  0 siblings, 1 reply; 32+ messages in thread
From: Pawan Gupta @ 2026-04-03 23:33 UTC (permalink / raw)
  To: Jim Mattson
  Cc: x86, Jon Kohler, Nikolay Borisov, H. Peter Anvin, Josh Poimboeuf,
	David Kaplan, Sean Christopherson, Borislav Petkov, Dave Hansen,
	Peter Zijlstra, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, KP Singh, Jiri Olsa, David S. Miller,
	David Laight, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	David Ahern, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, John Fastabend, Stanislav Fomichev, Hao Luo,
	Paolo Bonzini, Jonathan Corbet, linux-kernel, kvm, Asit Mallick,
	Tao Zhang, bpf, netdev, linux-doc, chao.gao

On Fri, Apr 03, 2026 at 04:22:28PM -0700, Jim Mattson wrote:
> On Fri, Apr 3, 2026 at 4:16 PM Pawan Gupta
> <pawan.kumar.gupta@linux.intel.com> wrote:
> >
> > On Fri, Apr 03, 2026 at 02:59:33PM -0700, Jim Mattson wrote:
> > > On Fri, Apr 3, 2026 at 2:34 PM Pawan Gupta
> > > <pawan.kumar.gupta@linux.intel.com> wrote:
> > > >
> > > > On Fri, Apr 03, 2026 at 01:19:17PM -0700, Jim Mattson wrote:
> > > > > On Fri, Apr 3, 2026 at 11:52 AM Pawan Gupta
> > > > > <pawan.kumar.gupta@linux.intel.com> wrote:
> > > > > >
> > > > > > On Fri, Apr 03, 2026 at 11:10:08AM -0700, Jim Mattson wrote:
> > > > > > > On Thu, Apr 2, 2026 at 5:32 PM Pawan Gupta
> > > > > > > <pawan.kumar.gupta@linux.intel.com> wrote:
> > > > > > > >
> > > > > > > > As a mitigation for BHI, clear_bhb_loop() executes branches that overwrite
> > > > > > > > the Branch History Buffer (BHB). On Alder Lake and newer parts this
> > > > > > > > sequence is not sufficient because it doesn't clear enough entries. This
> > > > > > > > was not an issue because these CPUs use the BHI_DIS_S hardware mitigation
> > > > > > > > in the kernel.
> > > > > > > >
> > > > > > > > Now with VMSCAPE (BHI variant) it is also required to isolate branch
> > > > > > > > history between guests and userspace. Since BHI_DIS_S only protects the
> > > > > > > > kernel, the newer CPUs also use IBPB.
> > > > > > > >
> > > > > > > > A cheaper alternative to the current IBPB mitigation is clear_bhb_loop().
> > > > > > > > But it currently does not clear enough BHB entries to be effective on newer
> > > > > > > > CPUs with larger BHB. At boot, dynamically set the loop count of
> > > > > > > > clear_bhb_loop() such that it is effective on newer CPUs too. Use the
> > > > > > > > X86_FEATURE_BHI_CTRL feature flag to select the appropriate loop count.
> > > > > > > >
> > > > > > > > Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
> > > > > > > > Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> > > > > > > > ---
> > > > > > > >  arch/x86/entry/entry_64.S            |  8 +++++---
> > > > > > > >  arch/x86/include/asm/nospec-branch.h |  2 ++
> > > > > > > >  arch/x86/kernel/cpu/bugs.c           | 13 +++++++++++++
> > > > > > > >  3 files changed, 20 insertions(+), 3 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> > > > > > > > index 3a180a36ca0e..bbd4b1c7ec04 100644
> > > > > > > > --- a/arch/x86/entry/entry_64.S
> > > > > > > > +++ b/arch/x86/entry/entry_64.S
> > > > > > > > @@ -1536,7 +1536,9 @@ SYM_FUNC_START(clear_bhb_loop)
> > > > > > > >         ANNOTATE_NOENDBR
> > > > > > > >         push    %rbp
> > > > > > > >         mov     %rsp, %rbp
> > > > > > > > -       movl    $5, %ecx
> > > > > > > > +
> > > > > > > > +       movzbl    bhb_seq_outer_loop(%rip), %ecx
> > > > > > > > +
> > > > > > > >         ANNOTATE_INTRA_FUNCTION_CALL
> > > > > > > >         call    1f
> > > > > > > >         jmp     5f
> > > > > > > > @@ -1556,8 +1558,8 @@ SYM_FUNC_START(clear_bhb_loop)
> > > > > > > >          * This should be ideally be: .skip 32 - (.Lret2 - 2f), 0xcc
> > > > > > > >          * but some Clang versions (e.g. 18) don't like this.
> > > > > > > >          */
> > > > > > > > -       .skip 32 - 18, 0xcc
> > > > > > > > -2:     movl    $5, %eax
> > > > > > > > +       .skip 32 - 20, 0xcc
> > > > > > > > +2:     movzbl  bhb_seq_inner_loop(%rip), %eax
> > > > > > > >  3:     jmp     4f
> > > > > > > >         nop
> > > > > > > >  4:     sub     $1, %eax
> > > > > > > > diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
> > > > > > > > index 70b377fcbc1c..87b83ae7c97f 100644
> > > > > > > > --- a/arch/x86/include/asm/nospec-branch.h
> > > > > > > > +++ b/arch/x86/include/asm/nospec-branch.h
> > > > > > > > @@ -548,6 +548,8 @@ DECLARE_PER_CPU(u64, x86_spec_ctrl_current);
> > > > > > > >  extern void update_spec_ctrl_cond(u64 val);
> > > > > > > >  extern u64 spec_ctrl_current(void);
> > > > > > > >
> > > > > > > > +extern u8 bhb_seq_inner_loop, bhb_seq_outer_loop;
> > > > > > > > +
> > > > > > > >  /*
> > > > > > > >   * With retpoline, we must use IBRS to restrict branch prediction
> > > > > > > >   * before calling into firmware.
> > > > > > > > diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
> > > > > > > > index 83f51cab0b1e..2cb4a96247d8 100644
> > > > > > > > --- a/arch/x86/kernel/cpu/bugs.c
> > > > > > > > +++ b/arch/x86/kernel/cpu/bugs.c
> > > > > > > > @@ -2047,6 +2047,10 @@ enum bhi_mitigations {
> > > > > > > >  static enum bhi_mitigations bhi_mitigation __ro_after_init =
> > > > > > > >         IS_ENABLED(CONFIG_MITIGATION_SPECTRE_BHI) ? BHI_MITIGATION_AUTO : BHI_MITIGATION_OFF;
> > > > > > > >
> > > > > > > > +/* Default to short BHB sequence values */
> > > > > > > > +u8 bhb_seq_outer_loop __ro_after_init = 5;
> > > > > > > > +u8 bhb_seq_inner_loop __ro_after_init = 5;
> > > > > > > > +
> > > > > > > >  static int __init spectre_bhi_parse_cmdline(char *str)
> > > > > > > >  {
> > > > > > > >         if (!str)
> > > > > > > > @@ -3242,6 +3246,15 @@ void __init cpu_select_mitigations(void)
> > > > > > > >                 x86_spec_ctrl_base &= ~SPEC_CTRL_MITIGATIONS_MASK;
> > > > > > > >         }
> > > > > > > >
> > > > > > > > +       /*
> > > > > > > > +        * Switch to long BHB clear sequence on newer CPUs (with BHI_CTRL
> > > > > > > > +        * support), see Intel's BHI guidance.
> > > > > > > > +        */
> > > > > > > > +       if (cpu_feature_enabled(X86_FEATURE_BHI_CTRL)) {
> > > > > > > > +               bhb_seq_outer_loop = 12;
> > > > > > > > +               bhb_seq_inner_loop = 7;
> > > > > > > > +       }
> > > > > > > > +
> > > > > > >
> > > > > > > How does this work for VMs in a heterogeneous migration pool that
> > > > > > > spans the Alder Lake boundary? They can't advertise BHI_CTRL, because
> > > > > > > it isn't available on all hosts in the migration pool, but they need
> > > > > > > the long sequence when running on Alder Lake or newer.
> > > > > >
> > > > > > As we discussed elsewhere, support for a migration pool is much more
> > > > > > involved. It should be dealt with in a separate QEMU/KVM-focused series.
> > > > > >
> > > > > > A quickfix could be adding support for spectre_bhi=long that guests in a
> > > > > > migration pool can use?
> > > > >
> > > > > The simplest solution is to add "||
> > > > > cpu_feature_enabled(X86_FEATURE_HYPERVISOR)" to the condition above.
> > > > > If that is unacceptable for the performance of pre-Alder Lake
> > > >
> > > > Yes, that would be unnecessary overhead.
> > > >
> > > > > migration pools, you could define a CPUID or MSR bit that says
> > > > > explicitly, "long BHB flush sequence needed," rather than trying to
> > > > > intuit that property from the presence of BHI_CTRL. Like
> > > > > IA32_ARCH_CAPABILITIES.SKIP_L1DFL_VMENTRY, the bit would only be set
> > > > > by a hypervisor.
> > > >
> > > > I will think about this more.
> > > >
> > > > > I am still skeptical of the need for MSR_VIRTUAL_ENUMERATION and
> > > > > friends, unless there is a major guest OS out there that relies on
> > > > > them.
> > > >
> > > > If we forget about MSR_VIRTUAL_ENUMERATION for a moment, userspace VMM is
> > > > in the best position to decide whether a guest needs
> > > > virtual.SPEC_CTRL[BHI_DIS_S]. Via a KVM interface userspace VMM can get
> > > > BHI_DIS_S for the guests that are in migration pool?
> > >
> > > That is not possible today, since KVM does not implement Intel's
> > > IA32_SPEC_CTRL virtualization, and cedes the hardware IA32_SPEC_CTRL
> > > to the guest after the first non-zero write to the guest's MSR.
> >
> > Yes, KVM doesn't support it yet. But, adding that support to give more
> > control to userspace VMM helps this case, and probably many others in
> > the future.
> 
> But didn't you tell me that Windows doesn't want the hypervisor to set
> BHI_DIS_S behind their back?

Since cloud providers have greater control over userspace, the decision to
use BHI_DIS_S or not can be left to them. KVM would simply follow what it
is asked to do by the userspace.

> > I will check with Chao if he can prepare the next version of virtual
> > SPEC_CTRL series (leaving out virtual mitigation MSRs).
> 
> Excellent.


* Re: [PATCH v9 02/10] x86/bhi: Make clear_bhb_loop() effective on newer CPUs
  2026-04-03 23:33                 ` Pawan Gupta
@ 2026-04-03 23:39                   ` Jim Mattson
  2026-04-04  0:21                     ` Pawan Gupta
  0 siblings, 1 reply; 32+ messages in thread
From: Jim Mattson @ 2026-04-03 23:39 UTC (permalink / raw)
  To: Pawan Gupta
  Cc: x86, Jon Kohler, Nikolay Borisov, H. Peter Anvin, Josh Poimboeuf,
	David Kaplan, Sean Christopherson, Borislav Petkov, Dave Hansen,
	Peter Zijlstra, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, KP Singh, Jiri Olsa, David S. Miller,
	David Laight, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	David Ahern, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, John Fastabend, Stanislav Fomichev, Hao Luo,
	Paolo Bonzini, Jonathan Corbet, linux-kernel, kvm, Asit Mallick,
	Tao Zhang, bpf, netdev, linux-doc, chao.gao

On Fri, Apr 3, 2026 at 4:33 PM Pawan Gupta
<pawan.kumar.gupta@linux.intel.com> wrote:
>
> On Fri, Apr 03, 2026 at 04:22:28PM -0700, Jim Mattson wrote:
> > On Fri, Apr 3, 2026 at 4:16 PM Pawan Gupta
> > <pawan.kumar.gupta@linux.intel.com> wrote:
> > >
> > > On Fri, Apr 03, 2026 at 02:59:33PM -0700, Jim Mattson wrote:
> > > > On Fri, Apr 3, 2026 at 2:34 PM Pawan Gupta
> > > > <pawan.kumar.gupta@linux.intel.com> wrote:
> > > > >
> > > > > On Fri, Apr 03, 2026 at 01:19:17PM -0700, Jim Mattson wrote:
> > > > > > On Fri, Apr 3, 2026 at 11:52 AM Pawan Gupta
> > > > > > <pawan.kumar.gupta@linux.intel.com> wrote:
> > > > > > >
> > > > > > > On Fri, Apr 03, 2026 at 11:10:08AM -0700, Jim Mattson wrote:
> > > > > > > > On Thu, Apr 2, 2026 at 5:32 PM Pawan Gupta
> > > > > > > > <pawan.kumar.gupta@linux.intel.com> wrote:
> > > > > > > > >
> > > > > > > > > As a mitigation for BHI, clear_bhb_loop() executes branches that overwrite
> > > > > > > > > the Branch History Buffer (BHB). On Alder Lake and newer parts this
> > > > > > > > > sequence is not sufficient because it doesn't clear enough entries. This
> > > > > > > > > was not an issue because these CPUs use the BHI_DIS_S hardware mitigation
> > > > > > > > > in the kernel.
> > > > > > > > >
> > > > > > > > > Now with VMSCAPE (BHI variant) it is also required to isolate branch
> > > > > > > > > history between guests and userspace. Since BHI_DIS_S only protects the
> > > > > > > > > kernel, the newer CPUs also use IBPB.
> > > > > > > > >
> > > > > > > > > A cheaper alternative to the current IBPB mitigation is clear_bhb_loop().
> > > > > > > > > But it currently does not clear enough BHB entries to be effective on newer
> > > > > > > > > CPUs with larger BHB. At boot, dynamically set the loop count of
> > > > > > > > > clear_bhb_loop() such that it is effective on newer CPUs too. Use the
> > > > > > > > > X86_FEATURE_BHI_CTRL feature flag to select the appropriate loop count.
> > > > > > > > >
> > > > > > > > > Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
> > > > > > > > > Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> > > > > > > > > ---
> > > > > > > > >  arch/x86/entry/entry_64.S            |  8 +++++---
> > > > > > > > >  arch/x86/include/asm/nospec-branch.h |  2 ++
> > > > > > > > >  arch/x86/kernel/cpu/bugs.c           | 13 +++++++++++++
> > > > > > > > >  3 files changed, 20 insertions(+), 3 deletions(-)
> > > > > > > > >
> > > > > > > > > diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> > > > > > > > > index 3a180a36ca0e..bbd4b1c7ec04 100644
> > > > > > > > > --- a/arch/x86/entry/entry_64.S
> > > > > > > > > +++ b/arch/x86/entry/entry_64.S
> > > > > > > > > @@ -1536,7 +1536,9 @@ SYM_FUNC_START(clear_bhb_loop)
> > > > > > > > >         ANNOTATE_NOENDBR
> > > > > > > > >         push    %rbp
> > > > > > > > >         mov     %rsp, %rbp
> > > > > > > > > -       movl    $5, %ecx
> > > > > > > > > +
> > > > > > > > > +       movzbl    bhb_seq_outer_loop(%rip), %ecx
> > > > > > > > > +
> > > > > > > > >         ANNOTATE_INTRA_FUNCTION_CALL
> > > > > > > > >         call    1f
> > > > > > > > >         jmp     5f
> > > > > > > > > @@ -1556,8 +1558,8 @@ SYM_FUNC_START(clear_bhb_loop)
> > > > > > > > >          * This should be ideally be: .skip 32 - (.Lret2 - 2f), 0xcc
> > > > > > > > >          * but some Clang versions (e.g. 18) don't like this.
> > > > > > > > >          */
> > > > > > > > > -       .skip 32 - 18, 0xcc
> > > > > > > > > -2:     movl    $5, %eax
> > > > > > > > > +       .skip 32 - 20, 0xcc
> > > > > > > > > +2:     movzbl  bhb_seq_inner_loop(%rip), %eax
> > > > > > > > >  3:     jmp     4f
> > > > > > > > >         nop
> > > > > > > > >  4:     sub     $1, %eax
> > > > > > > > > diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
> > > > > > > > > index 70b377fcbc1c..87b83ae7c97f 100644
> > > > > > > > > --- a/arch/x86/include/asm/nospec-branch.h
> > > > > > > > > +++ b/arch/x86/include/asm/nospec-branch.h
> > > > > > > > > @@ -548,6 +548,8 @@ DECLARE_PER_CPU(u64, x86_spec_ctrl_current);
> > > > > > > > >  extern void update_spec_ctrl_cond(u64 val);
> > > > > > > > >  extern u64 spec_ctrl_current(void);
> > > > > > > > >
> > > > > > > > > +extern u8 bhb_seq_inner_loop, bhb_seq_outer_loop;
> > > > > > > > > +
> > > > > > > > >  /*
> > > > > > > > >   * With retpoline, we must use IBRS to restrict branch prediction
> > > > > > > > >   * before calling into firmware.
> > > > > > > > > diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
> > > > > > > > > index 83f51cab0b1e..2cb4a96247d8 100644
> > > > > > > > > --- a/arch/x86/kernel/cpu/bugs.c
> > > > > > > > > +++ b/arch/x86/kernel/cpu/bugs.c
> > > > > > > > > @@ -2047,6 +2047,10 @@ enum bhi_mitigations {
> > > > > > > > >  static enum bhi_mitigations bhi_mitigation __ro_after_init =
> > > > > > > > >         IS_ENABLED(CONFIG_MITIGATION_SPECTRE_BHI) ? BHI_MITIGATION_AUTO : BHI_MITIGATION_OFF;
> > > > > > > > >
> > > > > > > > > +/* Default to short BHB sequence values */
> > > > > > > > > +u8 bhb_seq_outer_loop __ro_after_init = 5;
> > > > > > > > > +u8 bhb_seq_inner_loop __ro_after_init = 5;
> > > > > > > > > +
> > > > > > > > >  static int __init spectre_bhi_parse_cmdline(char *str)
> > > > > > > > >  {
> > > > > > > > >         if (!str)
> > > > > > > > > @@ -3242,6 +3246,15 @@ void __init cpu_select_mitigations(void)
> > > > > > > > >                 x86_spec_ctrl_base &= ~SPEC_CTRL_MITIGATIONS_MASK;
> > > > > > > > >         }
> > > > > > > > >
> > > > > > > > > +       /*
> > > > > > > > > +        * Switch to long BHB clear sequence on newer CPUs (with BHI_CTRL
> > > > > > > > > +        * support), see Intel's BHI guidance.
> > > > > > > > > +        */
> > > > > > > > > +       if (cpu_feature_enabled(X86_FEATURE_BHI_CTRL)) {
> > > > > > > > > +               bhb_seq_outer_loop = 12;
> > > > > > > > > +               bhb_seq_inner_loop = 7;
> > > > > > > > > +       }
> > > > > > > > > +
> > > > > > > >
> > > > > > > > How does this work for VMs in a heterogeneous migration pool that
> > > > > > > > spans the Alder Lake boundary? They can't advertise BHI_CTRL, because
> > > > > > > > it isn't available on all hosts in the migration pool, but they need
> > > > > > > > the long sequence when running on Alder Lake or newer.
> > > > > > >
> > > > > > > As we discussed elsewhere, support for a migration pool is much more
> > > > > > > involved. It should be dealt with in a separate QEMU/KVM-focused series.
> > > > > > >
> > > > > > > A quickfix could be adding support for spectre_bhi=long that guests in a
> > > > > > > migration pool can use?
> > > > > >
> > > > > > The simplest solution is to add "||
> > > > > > cpu_feature_enabled(X86_FEATURE_HYPERVISOR)" to the condition above.
> > > > > > If that is unacceptable for the performance of pre-Alder Lake
> > > > >
> > > > > Yes, that would be unnecessary overhead.
> > > > >
> > > > > > migration pools, you could define a CPUID or MSR bit that says
> > > > > > explicitly, "long BHB flush sequence needed," rather than trying to
> > > > > > intuit that property from the presence of BHI_CTRL. Like
> > > > > > IA32_ARCH_CAPABILITIES.SKIP_L1DFL_VMENTRY, the bit would only be set
> > > > > > by a hypervisor.
> > > > >
> > > > > I will think about this more.
> > > > >
> > > > > > I am still skeptical of the need for MSR_VIRTUAL_ENUMERATION and
> > > > > > friends, unless there is a major guest OS out there that relies on
> > > > > > them.
> > > > >
> > > > > If we forget about MSR_VIRTUAL_ENUMERATION for a moment, userspace VMM is
> > > > > in the best position to decide whether a guest needs
> > > > > virtual.SPEC_CTRL[BHI_DIS_S]. Via a KVM interface userspace VMM can get
> > > > > BHI_DIS_S for the guests that are in migration pool?
> > > >
> > > > That is not possible today, since KVM does not implement Intel's
> > > > IA32_SPEC_CTRL virtualization, and cedes the hardware IA32_SPEC_CTRL
> > > > to the guest after the first non-zero write to the guest's MSR.
> > >
> > > Yes, KVM doesn't support it yet. But, adding that support to give more
> > > control to userspace VMM helps this case, and probably many others in
> > > the future.
> >
> > But didn't you tell me that Windows doesn't want the hypervisor to set
> > BHI_DIS_S behind their back?
>
> Since cloud providers have greater control over userspace, the decision to
> use BHI_DIS_S or not can be left to them. KVM would simply follow what it
> is asked to do by the userspace.

I feel like we've gone over this before, but if userspace tells KVM
not to enable BHI_DIS_S, how do we inform Windows that it needs to do
the longer clearing sequence, despite the fact that the virtual CPU is
masquerading as Ice Lake?

I don't think the virtual mitigation MSRs address that issue.

> > > I will check with Chao if he can prepare the next version of virtual
> > > SPEC_CTRL series (leaving out virtual mitigation MSRs).
> >
> > Excellent.


* Re: [PATCH v9 02/10] x86/bhi: Make clear_bhb_loop() effective on newer CPUs
  2026-04-03 23:39                   ` Jim Mattson
@ 2026-04-04  0:21                     ` Pawan Gupta
  2026-04-04  2:21                       ` Jim Mattson
  0 siblings, 1 reply; 32+ messages in thread
From: Pawan Gupta @ 2026-04-04  0:21 UTC (permalink / raw)
  To: Jim Mattson
  Cc: x86, Jon Kohler, Nikolay Borisov, H. Peter Anvin, Josh Poimboeuf,
	David Kaplan, Sean Christopherson, Borislav Petkov, Dave Hansen,
	Peter Zijlstra, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, KP Singh, Jiri Olsa, David S. Miller,
	David Laight, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	David Ahern, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, John Fastabend, Stanislav Fomichev, Hao Luo,
	Paolo Bonzini, Jonathan Corbet, linux-kernel, kvm, Asit Mallick,
	Tao Zhang, bpf, netdev, linux-doc, chao.gao

On Fri, Apr 03, 2026 at 04:39:54PM -0700, Jim Mattson wrote:
> > Since cloud providers have greater control over userspace, the decision to
> > use BHI_DIS_S or not can be left to them. KVM would simply follow what it
> > is asked to do by the userspace.
> 
> I feel like we've gone over this before, but if userspace tells KVM
> not to enable BHI_DIS_S, how do we inform Windows that it needs to do
> the longer clearing sequence, despite the fact that the virtual CPU is
> masquerading as Ice Lake?

IMO, if an OS is allergic to a hardware mitigation, and is also aware that
it is virtualized, it should default to a sw mitigation that works everywhere.

> I don't think the virtual mitigation MSRs address that issue.

Virtual mitigation MSRs are meant to inform the VMM about the guest's
mitigation. Even if there were a way to tell the guest that it needs to use
a different mitigation, it seems unrealistic for a guest to change its
mitigation post-migration.


* Re: [PATCH v9 02/10] x86/bhi: Make clear_bhb_loop() effective on newer CPUs
  2026-04-04  0:21                     ` Pawan Gupta
@ 2026-04-04  2:21                       ` Jim Mattson
  2026-04-04  3:49                         ` Pawan Gupta
  0 siblings, 1 reply; 32+ messages in thread
From: Jim Mattson @ 2026-04-04  2:21 UTC (permalink / raw)
  To: Pawan Gupta
  Cc: x86, Jon Kohler, Nikolay Borisov, H. Peter Anvin, Josh Poimboeuf,
	David Kaplan, Sean Christopherson, Borislav Petkov, Dave Hansen,
	Peter Zijlstra, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, KP Singh, Jiri Olsa, David S. Miller,
	David Laight, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	David Ahern, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, John Fastabend, Stanislav Fomichev, Hao Luo,
	Paolo Bonzini, Jonathan Corbet, linux-kernel, kvm, Asit Mallick,
	Tao Zhang, bpf, netdev, linux-doc, chao.gao

On Fri, Apr 3, 2026 at 5:22 PM Pawan Gupta
<pawan.kumar.gupta@linux.intel.com> wrote:
>
> On Fri, Apr 03, 2026 at 04:39:54PM -0700, Jim Mattson wrote:
> > > Since cloud providers have greater control over userspace, the decision to
> > > use BHI_DIS_S or not can be left to them. KVM would simply follow what it
> > > is asked to do by the userspace.
> >
> > I feel like we've gone over this before, but if userspace tells KVM
> > not to enable BHI_DIS_S, how do we inform Windows that it needs to do
> > the longer clearing sequence, despite the fact that the virtual CPU is
> > masquerading as Ice Lake?
>
> IMO, if an OS is allergic to a hardware mitigation, and is also aware that
> it is virtualized, it should default to a sw mitigation that works everywhere.

Agreed. So, without any information to the contrary, VMs should assume
the long BHB clearing sequence is required.

Returning to my earlier comment, the test should be:

+       if (cpu_feature_enabled(X86_FEATURE_BHI_CTRL) ||
+           cpu_feature_enabled(X86_FEATURE_HYPERVISOR)) {
+               bhb_seq_outer_loop = 12;
+               bhb_seq_inner_loop = 7;
+       }


* Re: [PATCH v9 02/10] x86/bhi: Make clear_bhb_loop() effective on newer CPUs
  2026-04-04  2:21                       ` Jim Mattson
@ 2026-04-04  3:49                         ` Pawan Gupta
  2026-04-06 14:23                           ` Jim Mattson
  0 siblings, 1 reply; 32+ messages in thread
From: Pawan Gupta @ 2026-04-04  3:49 UTC (permalink / raw)
  To: Jim Mattson
  Cc: x86, Jon Kohler, Nikolay Borisov, H. Peter Anvin, Josh Poimboeuf,
	David Kaplan, Sean Christopherson, Borislav Petkov, Dave Hansen,
	Peter Zijlstra, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, KP Singh, Jiri Olsa, David S. Miller,
	David Laight, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	David Ahern, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, John Fastabend, Stanislav Fomichev, Hao Luo,
	Paolo Bonzini, Jonathan Corbet, linux-kernel, kvm, Asit Mallick,
	Tao Zhang, bpf, netdev, linux-doc, chao.gao

On Fri, Apr 03, 2026 at 07:21:02PM -0700, Jim Mattson wrote:
> On Fri, Apr 3, 2026 at 5:22 PM Pawan Gupta
> <pawan.kumar.gupta@linux.intel.com> wrote:
> >
> > On Fri, Apr 03, 2026 at 04:39:54PM -0700, Jim Mattson wrote:
> > > > Since cloud providers have greater control over userspace, the decision to
> > > > use BHI_DIS_S or not can be left to them. KVM would simply follow what it
> > > > is asked to do by the userspace.
> > >
> > > I feel like we've gone over this before, but if userspace tells KVM
> > > not to enable BHI_DIS_S, how do we inform Windows that it needs to do
> > > the longer clearing sequence, despite the fact that the virtual CPU is
> > > masquerading as Ice Lake?
> >
> > IMO, if an OS is allergic to a hardware mitigation, and is also aware that
> > it is virtualized, it should default to a sw mitigation that works everywhere.
> 
> Agreed. So, without any information to the contrary, VMs should assume
> the long BHB clearing sequence is required.
> 
> Returning to my earlier comment, the test should be:
> 
> +       if (cpu_feature_enabled(X86_FEATURE_BHI_CTRL) ||
> +           cpu_feature_enabled(X86_FEATURE_HYPERVISOR)) {
> +               bhb_seq_outer_loop = 12;
> +               bhb_seq_inner_loop = 7;
> +       }

To be clear, my comment was for an OS that doesn't want BHI_DIS_S
under-the-hood with virtual-SPEC_CTRL. Linux doesn't have that problem;
hardware mitigation on Linux is perfectly okay.

Without virtual-SPEC_CTRL, the problem set is limited to guests that
migrate across Alder Lake generation CPUs. As you mentioned, the change in
MAXPHYADDR makes it unlikely.

With virtual-SPEC_CTRL support, guests that fall into the subset that
migrate in spite of the MAXPHYADDR change would also be mitigated. Then, on top
of hardware mitigation, deploying the long sequence in the guest would
incur a significant performance penalty for no good reason.


* Re: [PATCH v9 00/10] VMSCAPE optimization for BHI variant
  2026-04-03  0:30 [PATCH v9 00/10] VMSCAPE optimization for BHI variant Pawan Gupta
                   ` (9 preceding siblings ...)
  2026-04-03  0:33 ` [PATCH v9 10/10] x86/vmscape: Add cmdline vmscape=on to override attack vector controls Pawan Gupta
@ 2026-04-04 15:20 ` David Laight
  2026-04-05  7:23   ` Pawan Gupta
  10 siblings, 1 reply; 32+ messages in thread
From: David Laight @ 2026-04-04 15:20 UTC (permalink / raw)
  To: Pawan Gupta
  Cc: x86, Jon Kohler, Nikolay Borisov, H. Peter Anvin, Josh Poimboeuf,
	David Kaplan, Sean Christopherson, Borislav Petkov, Dave Hansen,
	Peter Zijlstra, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, KP Singh, Jiri Olsa, David S. Miller,
	Andy Lutomirski, Thomas Gleixner, Ingo Molnar, David Ahern,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, Stanislav Fomichev, Hao Luo, Paolo Bonzini,
	Jonathan Corbet, linux-kernel, kvm, Asit Mallick, Tao Zhang, bpf,
	netdev, linux-doc

On Thu, 2 Apr 2026 17:30:32 -0700
Pawan Gupta <pawan.kumar.gupta@linux.intel.com> wrote:

> v9:
> - Use global variables for BHB loop counters instead of ALTERNATIVE-based
>   approach. (Dave & others)
> - Use 32-bit registers (%eax/%ecx) for loop counters, loaded via movzbl
>   from 8-bit globals. 8-bit registers (e.g. %ah in the inner loop) caused
>   performance regression on certain CPUs due to partial-register stalls. (David Laight)
> - Let BPF save/restore %rax/%rcx as in the original implementation, since
>   it is the only caller that needs these registers preserved across the
>   BHB clearing sequence.

That is as dangerous as hell...
Does BPF even save %rcx - I'm sure I checked that a long time ago
and found it didn't.
(I'm mostly AFK over Easter and can't check.)
At least there should be a bloody great big comment that BPF calls this code
and only saves specific registers.
But given the number of mispredicted branches and other pipeline stalls
in this code a couple of register saves to stack are unlikely to make
any difference.

	David



* Re: [PATCH v9 00/10] VMSCAPE optimization for BHI variant
  2026-04-04 15:20 ` [PATCH v9 00/10] VMSCAPE optimization for BHI variant David Laight
@ 2026-04-05  7:23   ` Pawan Gupta
  0 siblings, 0 replies; 32+ messages in thread
From: Pawan Gupta @ 2026-04-05  7:23 UTC (permalink / raw)
  To: David Laight
  Cc: x86, Jon Kohler, Nikolay Borisov, H. Peter Anvin, Josh Poimboeuf,
	David Kaplan, Sean Christopherson, Borislav Petkov, Dave Hansen,
	Peter Zijlstra, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, KP Singh, Jiri Olsa, David S. Miller,
	Andy Lutomirski, Thomas Gleixner, Ingo Molnar, David Ahern,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, Stanislav Fomichev, Hao Luo, Paolo Bonzini,
	Jonathan Corbet, linux-kernel, kvm, Asit Mallick, Tao Zhang, bpf,
	netdev, linux-doc

On Sat, Apr 04, 2026 at 04:20:59PM +0100, David Laight wrote:
> On Thu, 2 Apr 2026 17:30:32 -0700
> Pawan Gupta <pawan.kumar.gupta@linux.intel.com> wrote:
> 
> > v9:
> > - Use global variables for BHB loop counters instead of ALTERNATIVE-based
> >   approach. (Dave & others)
> > - Use 32-bit registers (%eax/%ecx) for loop counters, loaded via movzbl
> >   from 8-bit globals. 8-bit registers (e.g. %ah in the inner loop) caused
> >   performance regression on certain CPUs due to partial-register stalls. (David Laight)
> > - Let BPF save/restore %rax/%rcx as in the original implementation, since
> >   it is the only caller that needs these registers preserved across the
> >   BHB clearing sequence.
> 
> That is as dangerous as hell...
> Does BPF even save %rcx - I'm sure I checked that a long time ago
> and found it didn't.

The code below injects save/restore of %rax and %rcx into BPF programs:

arch/x86/net/bpf_jit_comp.c

static int emit_spectre_bhb_barrier(u8 **pprog, u8 *ip, struct bpf_prog *bpf_prog)
{
	u8 *prog = *pprog;
	u8 *func;

	if (cpu_feature_enabled(X86_FEATURE_CLEAR_BHB_LOOP)) {
		/* The clearing sequence clobbers eax and ecx. */
		EMIT1(0x50); /* push rax */
		EMIT1(0x51); /* push rcx */
		ip += 2;

		func = (u8 *)clear_bhb_loop_nofence;
		ip += x86_call_depth_emit_accounting(&prog, func, ip);

		if (emit_call(&prog, func, ip))
			return -EINVAL;
		/* Don't speculate past this until BHB is cleared */
		EMIT_LFENCE();
		EMIT1(0x59); /* pop rcx */
		EMIT1(0x58); /* pop rax */
	}
	...

> (I'm mostly AFK over Easter and can't check.)
> At least there should be a bloody great big comment that BPF calls this code
> and only saves specific registers.

Sure, will add.

> But given the number of mispredicted branches and other pipeline stalls
> in this code a couple of register saves to stack are unlikely to make
> any difference.

BPF programs have been saving/restoring these registers for a long time now.
What problem are you anticipating?


* Re: [PATCH v9 02/10] x86/bhi: Make clear_bhb_loop() effective on newer CPUs
  2026-04-04  3:49                         ` Pawan Gupta
@ 2026-04-06 14:23                           ` Jim Mattson
  0 siblings, 0 replies; 32+ messages in thread
From: Jim Mattson @ 2026-04-06 14:23 UTC (permalink / raw)
  To: Pawan Gupta
  Cc: x86, Jon Kohler, Nikolay Borisov, H. Peter Anvin, Josh Poimboeuf,
	David Kaplan, Sean Christopherson, Borislav Petkov, Dave Hansen,
	Peter Zijlstra, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, KP Singh, Jiri Olsa, David S. Miller,
	David Laight, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	David Ahern, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, John Fastabend, Stanislav Fomichev, Hao Luo,
	Paolo Bonzini, Jonathan Corbet, linux-kernel, kvm, Asit Mallick,
	Tao Zhang, bpf, netdev, linux-doc, chao.gao

On Fri, Apr 3, 2026 at 8:50 PM Pawan Gupta
<pawan.kumar.gupta@linux.intel.com> wrote:
>
> On Fri, Apr 03, 2026 at 07:21:02PM -0700, Jim Mattson wrote:
> > On Fri, Apr 3, 2026 at 5:22 PM Pawan Gupta
> > <pawan.kumar.gupta@linux.intel.com> wrote:
> > >
> > > On Fri, Apr 03, 2026 at 04:39:54PM -0700, Jim Mattson wrote:
> > > > > Since cloud providers have greater control over userspace, the decision to
> > > > > use BHI_DIS_S or not can be left to them. KVM would simply follow what it
> > > > > is asked to do by the userspace.
> > > >
> > > > I feel like we've gone over this before, but if userspace tells KVM
> > > > not to enable BHI_DIS_S, how do we inform Windows that it needs to do
> > > > the longer clearing sequence, despite the fact that the virtual CPU is
> > > > masquerading as Ice Lake?
> > >
> > > IMO, if an OS is allergic to a hardware mitigation, and is also aware that
> > > it is virtualized, it should default to a sw mitigation that works everywhere.
> >
> > Agreed. So, without any information to the contrary, VMs should assume
> > the long BHB clearing sequence is required.
> >
> > Returning to my earlier comment, the test should be:
> >
> > +       if (cpu_feature_enabled(X86_FEATURE_BHI_CTRL) ||
> > +           cpu_feature_enabled(X86_FEATURE_HYPERVISOR)) {
> > +               bhb_seq_outer_loop = 12;
> > +               bhb_seq_inner_loop = 7;
> > +       }
>
> To be clear, my comment was for an OS that doesn't want BHI_DIS_S
> under-the-hood with virtual-SPEC_CTRL. Linux doesn't have that problem;
> the hardware mitigation on Linux is perfectly okay.

Today, BHI_DIS_S under-the-hood isn't offered. If the hypervisor
doesn't offer the paravirtual mitigation MSRs, the guest must assume
that the hypervisor will not set BHI_DIS_S on its behalf.

> Without virtual-SPEC_CTRL, the problem set is limited to guests that
> migrate across Alder Lake generation CPUs. As you mentioned, the change in
> MAXPHYADDR makes it unlikely.

I have been unable to make a compelling argument for not crossing this
boundary. The only applications I can point to that are broken by the
missing reserved bits are (nested) hypervisors using shadow-paging.
Since both nVMX and nSVM support TDP, the niche case isn't a concern.
There are compelling business reasons to support seamless migration
from pre-Alder Lake to post-Alder Lake. If you know of any other
applications that will fail with a mis-emulated smaller MAXPHYADDR,
please let me know.

> With virtual-SPEC_CTRL support, guests that fall into the subset that
> migrate in spite of the MAXPHYADDR change would also be mitigated. Then, on
> top of the hardware mitigation, deploying the long sequence in the guest
> would incur a significant performance penalty for no good reason.

Yes, but the guest needs a way to determine whether the hypervisor
will do what's necessary to make the short sequence effective. And, in
particular, no KVM hypervisor today is prepared to do that.

When running under a hypervisor, without BHI_CTRL and without any
evidence to the contrary, the guest must assume that the longer
sequence is necessary. At the very least, we need a CPUID or MSR bit
that says, "the short BHB clearing sequence is adequate for this
vCPU."


end of thread, other threads:[~2026-04-06 14:23 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-04-03  0:30 [PATCH v9 00/10] VMSCAPE optimization for BHI variant Pawan Gupta
2026-04-03  0:30 ` [PATCH v9 01/10] x86/bhi: x86/vmscape: Move LFENCE out of clear_bhb_loop() Pawan Gupta
2026-04-03 15:16   ` Borislav Petkov
2026-04-03 16:45     ` Pawan Gupta
2026-04-03 17:11       ` Borislav Petkov
2026-04-03  0:31 ` [PATCH v9 02/10] x86/bhi: Make clear_bhb_loop() effective on newer CPUs Pawan Gupta
2026-04-03 18:10   ` Jim Mattson
2026-04-03 18:52     ` Pawan Gupta
2026-04-03 20:19       ` Jim Mattson
2026-04-03 21:34         ` Pawan Gupta
2026-04-03 21:59           ` Jim Mattson
2026-04-03 23:16             ` Pawan Gupta
2026-04-03 23:22               ` Jim Mattson
2026-04-03 23:33                 ` Pawan Gupta
2026-04-03 23:39                   ` Jim Mattson
2026-04-04  0:21                     ` Pawan Gupta
2026-04-04  2:21                       ` Jim Mattson
2026-04-04  3:49                         ` Pawan Gupta
2026-04-06 14:23                           ` Jim Mattson
2026-04-03  0:31 ` [PATCH v9 03/10] x86/bhi: Rename clear_bhb_loop() to clear_bhb_loop_nofence() Pawan Gupta
2026-04-03  0:31 ` [PATCH v9 04/10] x86/vmscape: Rename x86_ibpb_exit_to_user to x86_predictor_flush_exit_to_user Pawan Gupta
2026-04-03  0:31 ` [PATCH v9 05/10] x86/vmscape: Move mitigation selection to a switch() Pawan Gupta
2026-04-03  0:32 ` [PATCH v9 06/10] x86/vmscape: Use write_ibpb() instead of indirect_branch_prediction_barrier() Pawan Gupta
2026-04-03  0:32 ` [PATCH v9 07/10] x86/vmscape: Use static_call() for predictor flush Pawan Gupta
2026-04-03 14:52   ` Sean Christopherson
2026-04-03 16:44     ` Pawan Gupta
2026-04-03 17:26       ` Pawan Gupta
2026-04-03  0:32 ` [PATCH v9 08/10] x86/vmscape: Deploy BHB clearing mitigation Pawan Gupta
2026-04-03  0:32 ` [PATCH v9 09/10] x86/vmscape: Resolve conflict between attack-vectors and vmscape=force Pawan Gupta
2026-04-03  0:33 ` [PATCH v9 10/10] x86/vmscape: Add cmdline vmscape=on to override attack vector controls Pawan Gupta
2026-04-04 15:20 ` [PATCH v9 00/10] VMSCAPE optimization for BHI variant David Laight
2026-04-05  7:23   ` Pawan Gupta

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox