public inbox for kvm@vger.kernel.org
* [PATCH v6 0/9] VMSCAPE optimization for BHI variant
@ 2025-12-02  6:18 Pawan Gupta
  2025-12-02  6:18 ` [PATCH v6 1/9] x86/bhi: x86/vmscape: Move LFENCE out of clear_bhb_loop() Pawan Gupta
                   ` (8 more replies)
  0 siblings, 9 replies; 27+ messages in thread
From: Pawan Gupta @ 2025-12-02  6:18 UTC (permalink / raw)
  To: x86, David Kaplan, Nikolay Borisov, H. Peter Anvin,
	Josh Poimboeuf, Sean Christopherson, Paolo Bonzini,
	Borislav Petkov, Dave Hansen
  Cc: linux-kernel, kvm, Asit Mallick, Tao Zhang

v6:
- Remove semicolon at the end of asm in ALTERNATIVE (Uros).
- Fix build warning in vmscape_select_mitigation() (LKP).
- Rebased to v6.18.

v5: https://lore.kernel.org/r/20251126-vmscape-bhb-v5-2-02d66e423b00@linux.intel.com
- For BHI seq, limit runtime-patching to loop counts only (Dave).
  Dropped 2 patches that moved the BHB seq to a macro.
- Remove redundant switch cases in vmscape_select_mitigation() (Nikolay).
- Improve commit message (Nikolay).
- Collected tags.

v4: https://lore.kernel.org/r/20251119-vmscape-bhb-v4-0-1adad4e69ddc@linux.intel.com
- Move LFENCE to the callsite, out of clear_bhb_loop(). (Dave)
- Make clear_bhb_loop() work for larger BHB. (Dave)
  This now uses hardware enumeration to determine the BHB size to clear.
- Use write_ibpb() instead of indirect_branch_prediction_barrier() when
  IBPB is known to be available. (Dave)
- Use static_call() to simplify mitigation at exit-to-userspace. (Dave)
- Refactor vmscape_select_mitigation(). (Dave)
- Fix vmscape=on which was wrongly behaving as AUTO. (Dave)
- Split the patches. (Dave)
  - Patches 1-4 prepare for making the sequence flexible for VMSCAPE use.
  - Patch 5 is a trivial rename of a variable.
  - Patches 6-8 prepare for deploying the BHB mitigation for VMSCAPE.
  - Patch 9 deploys the mitigation.
  - Patches 10-11 fix ON vs AUTO mode.

v3: https://lore.kernel.org/r/20251027-vmscape-bhb-v3-0-5793c2534e93@linux.intel.com
- s/x86_pred_flush_pending/x86_predictor_flush_exit_to_user/ (Sean).
- Removed IBPB & BHB-clear mutual exclusion at exit-to-userspace.
- Collected tags.

v2: https://lore.kernel.org/r/20251015-vmscape-bhb-v2-0-91cbdd9c3a96@linux.intel.com
- Added check for IBPB feature in vmscape_select_mitigation(). (David)
- s/vmscape=auto/vmscape=on/ (David)
- Added patch to remove LFENCE from VMSCAPE BHB-clear sequence.
- Rebased to v6.18-rc1.

v1: https://lore.kernel.org/r/20250924-vmscape-bhb-v1-0-da51f0e1934d@linux.intel.com

Hi All,

These patches aim to improve the performance of a recent mitigation for the
VMSCAPE[1] vulnerability. The improvement is relevant for the BHI variant of
VMSCAPE, which affects Alder Lake and newer processors.

The current mitigation approach uses an IBPB on KVM exit-to-userspace for
the entire affected range of CPUs. This is overkill for CPUs that are only
affected by the BHI variant. On such CPUs, clearing the branch history is
sufficient to mitigate VMSCAPE, and is also more apt because the underlying
issue is poisoned branch history.
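The per-CPU handshake behind both mitigations can be sketched in user-space C (illustrative only, not kernel code; the function names here are stand-ins): KVM marks the CPU after running a guest, and the next exit to userspace performs exactly one predictor flush.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space sketch of the "flush pending" handshake:
 * KVM sets the flag after a guest has run; the exit-to-userspace path
 * consumes it and performs a single predictor flush (IBPB today, a
 * BHB clear with this series). */
static bool flush_pending;
static int flushes_done;

static void kvm_after_vmexit(void)
{
	flush_pending = true;
}

static void exit_to_user(void)
{
	if (flush_pending) {
		flushes_done++;	/* stands in for IBPB or clear_bhb_loop() */
		flush_pending = false;
	}
}
```

The flag ensures the flush cost is paid at most once per guest run, not on every exit to userspace.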

Below is the iPerf data for transfer between guest and host, comparing IBPB
and BHB-clear mitigation. BHB-clear shows performance improvement over IBPB
in most cases.

Platform: Emerald Rapids
Baseline: vmscape=off
Target: IBPB at VMexit-to-userspace vs the new BHB-clear at
	VMexit-to-userspace mitigation (both compared against baseline).

(pN = N parallel connections)

| iPerf user-net | IBPB    | BHB Clear |
|----------------|---------|-----------|
| UDP 1-vCPU_p1  | -12.5%  |   1.3%    |
| TCP 1-vCPU_p1  | -10.4%  |  -1.5%    |
| TCP 1-vCPU_p1  | -7.5%   |  -3.0%    |
| UDP 4-vCPU_p16 | -3.7%   |  -3.7%    |
| TCP 4-vCPU_p4  | -2.9%   |  -1.4%    |
| UDP 4-vCPU_p4  | -0.6%   |   0.0%    |
| TCP 4-vCPU_p4  |  3.5%   |   0.0%    |

| iPerf bridge-net | IBPB    | BHB Clear |
|------------------|---------|-----------|
| UDP 1-vCPU_p1    | -9.4%   |  -0.4%    |
| TCP 1-vCPU_p1    | -3.9%   |  -0.5%    |
| UDP 4-vCPU_p16   | -2.2%   |  -3.8%    |
| TCP 4-vCPU_p4    | -1.0%   |  -1.0%    |
| TCP 4-vCPU_p4    |  0.5%   |   0.5%    |
| UDP 4-vCPU_p4    |  0.0%   |   0.9%    |
| TCP 1-vCPU_p1    |  0.0%   |   0.9%    |

| iPerf vhost-net | IBPB    | BHB Clear |
|-----------------|---------|-----------|
| UDP 1-vCPU_p1   | -4.3%   |   1.0%    |
| TCP 1-vCPU_p1   | -3.8%   |  -0.5%    |
| TCP 1-vCPU_p1   | -2.7%   |  -0.7%    |
| UDP 4-vCPU_p16  | -0.7%   |  -2.2%    |
| TCP 4-vCPU_p4   | -0.4%   |   0.8%    |
| UDP 4-vCPU_p4   |  0.4%   |  -0.7%    |
| TCP 4-vCPU_p4   |  0.0%   |   0.6%    |

[1] https://comsec.ethz.ch/research/microarch/vmscape-exposing-and-exploiting-incomplete-branch-predictor-isolation-in-cloud-environments/

---
Pawan Gupta (9):
      x86/bhi: x86/vmscape: Move LFENCE out of clear_bhb_loop()
      x86/bhi: Make clear_bhb_loop() effective on newer CPUs
      x86/vmscape: Rename x86_ibpb_exit_to_user to x86_predictor_flush_exit_to_user
      x86/vmscape: Move mitigation selection to a switch()
      x86/vmscape: Use write_ibpb() instead of indirect_branch_prediction_barrier()
      x86/vmscape: Use static_call() for predictor flush
      x86/vmscape: Deploy BHB clearing mitigation
      x86/vmscape: Fix conflicting attack-vector controls with =force
      x86/vmscape: Add cmdline vmscape=on to override attack vector controls

 Documentation/admin-guide/hw-vuln/vmscape.rst   |  8 +++
 Documentation/admin-guide/kernel-parameters.txt |  4 +-
 arch/x86/Kconfig                                |  1 +
 arch/x86/entry/entry_64.S                       | 13 +++--
 arch/x86/include/asm/cpufeatures.h              |  2 +-
 arch/x86/include/asm/entry-common.h             |  9 ++--
 arch/x86/include/asm/nospec-branch.h            | 11 +++--
 arch/x86/kernel/cpu/bugs.c                      | 65 +++++++++++++++++++------
 arch/x86/kvm/x86.c                              |  4 +-
 arch/x86/net/bpf_jit_comp.c                     |  2 +
 10 files changed, 90 insertions(+), 29 deletions(-)
---
base-commit: 7d0a66e4bb9081d75c82ec4957c50034cb0ea449
change-id: 20250916-vmscape-bhb-d7d469977f2f

Best regards,
-- 
Thanks,
Pawan



^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v6 1/9] x86/bhi: x86/vmscape: Move LFENCE out of clear_bhb_loop()
  2025-12-02  6:18 [PATCH v6 0/9] VMSCAPE optimization for BHI variant Pawan Gupta
@ 2025-12-02  6:18 ` Pawan Gupta
  2026-01-01 12:51   ` Borislav Petkov
  2025-12-02  6:19 ` [PATCH v6 2/9] x86/bhi: Make clear_bhb_loop() effective on newer CPUs Pawan Gupta
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 27+ messages in thread
From: Pawan Gupta @ 2025-12-02  6:18 UTC (permalink / raw)
  To: x86, David Kaplan, Nikolay Borisov, H. Peter Anvin,
	Josh Poimboeuf, Sean Christopherson, Paolo Bonzini,
	Borislav Petkov, Dave Hansen
  Cc: linux-kernel, kvm, Asit Mallick, Tao Zhang

Currently, the BHB clearing sequence is followed by an LFENCE to prevent
premature transient execution of subsequent indirect branches. However, the
LFENCE barrier can be unnecessary in certain cases, for example when the
kernel is using the BHI_DIS_S mitigation and BHB clearing is only needed
for userspace. In such cases, the LFENCE is redundant because ring
transitions provide the necessary serialization.

Below is a quick recap of BHI mitigation options:

  On Alder Lake and newer

  - BHI_DIS_S: Hardware control to mitigate BHI in ring0. This has low
	       performance overhead.
  - Long loop: Alternatively, longer version of BHB clearing sequence
	       can be used to mitigate BHI. It can also be used to mitigate
	       BHI variant of VMSCAPE. This is not yet implemented in
	       Linux.

  On older CPUs

  - Short loop: Clears BHB at kernel entry and VMexit. The "Long loop" is
		effective on older CPUs as well, but should be avoided
		because of unnecessary overhead.

On Alder Lake and newer CPUs, eIBRS isolates the indirect targets between
guest and host. But when affected by the BHI variant of VMSCAPE, a guest's
branch history may still influence indirect branches in userspace. This
also means the big-hammer IBPB can be replaced with a cheaper option that
clears the BHB at exit-to-userspace after a VMexit.

In preparation for adding support for the BHB sequence (without LFENCE) on
newer CPUs, move the LFENCE to the caller side, after clear_bhb_loop() is
executed. This allows callers to decide whether they need the LFENCE. It
does add a few extra bytes to the call sites, but it obviates the need for
multiple variants of clear_bhb_loop().
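The rule this patch encodes can be stated as a tiny conceptual sketch (user-space C, not kernel code): after clear_bhb_loop(), a caller only needs its own LFENCE when no serializing ring transition sits between the BHB clear and the next indirect branch.

```c
#include <assert.h>
#include <stdbool.h>

/* Conceptual sketch of the caller-side decision: a ring transition
 * already serializes, so only callers without one (e.g. the BPF JIT
 * barrier in this patch) must emit an LFENCE themselves. */
static bool caller_needs_lfence(bool ring_transition_before_indirect_branch)
{
	return !ring_transition_before_indirect_branch;
}
```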

Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
 arch/x86/entry/entry_64.S            | 5 ++++-
 arch/x86/include/asm/nospec-branch.h | 4 ++--
 arch/x86/net/bpf_jit_comp.c          | 2 ++
 3 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index ed04a968cc7d0095ab0185b2e3b5beffb7680afd..886f86790b4467347031bc27d3d761d5cc286da1 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1528,6 +1528,9 @@ SYM_CODE_END(rewind_stack_and_make_dead)
  * refactored in the future if needed. The .skips are for safety, to ensure
  * that all RETs are in the second half of a cacheline to mitigate Indirect
  * Target Selection, rather than taking the slowpath via its_return_thunk.
+ *
+ * Note, callers should use a speculation barrier like LFENCE immediately after
+ * a call to this function to ensure BHB is cleared before indirect branches.
  */
 SYM_FUNC_START(clear_bhb_loop)
 	ANNOTATE_NOENDBR
@@ -1562,7 +1565,7 @@ SYM_FUNC_START(clear_bhb_loop)
 	sub	$1, %ecx
 	jnz	1b
 .Lret2:	RET
-5:	lfence
+5:
 	pop	%rbp
 	RET
 SYM_FUNC_END(clear_bhb_loop)
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 08ed5a2e46a5fd790bcb1b73feb6469518809c06..ec5ebf96dbb9e240f402f39efc6929ae45ec8f0b 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -329,11 +329,11 @@
 
 #ifdef CONFIG_X86_64
 .macro CLEAR_BRANCH_HISTORY
-	ALTERNATIVE "", "call clear_bhb_loop", X86_FEATURE_CLEAR_BHB_LOOP
+	ALTERNATIVE "", "call clear_bhb_loop; lfence", X86_FEATURE_CLEAR_BHB_LOOP
 .endm
 
 .macro CLEAR_BRANCH_HISTORY_VMEXIT
-	ALTERNATIVE "", "call clear_bhb_loop", X86_FEATURE_CLEAR_BHB_VMEXIT
+	ALTERNATIVE "", "call clear_bhb_loop; lfence", X86_FEATURE_CLEAR_BHB_VMEXIT
 .endm
 #else
 #define CLEAR_BRANCH_HISTORY
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index de5083cb1d3747bba00effca3703a4f6eea80d8d..c1ec14c559119b120edfac079aeb07948e9844b8 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -1603,6 +1603,8 @@ static int emit_spectre_bhb_barrier(u8 **pprog, u8 *ip,
 
 		if (emit_call(&prog, func, ip))
 			return -EINVAL;
+		/* Don't speculate past this until BHB is cleared */
+		EMIT_LFENCE();
 		EMIT1(0x59); /* pop rcx */
 		EMIT1(0x58); /* pop rax */
 	}

-- 
2.34.1




* [PATCH v6 2/9] x86/bhi: Make clear_bhb_loop() effective on newer CPUs
  2025-12-02  6:18 [PATCH v6 0/9] VMSCAPE optimization for BHI variant Pawan Gupta
  2025-12-02  6:18 ` [PATCH v6 1/9] x86/bhi: x86/vmscape: Move LFENCE out of clear_bhb_loop() Pawan Gupta
@ 2025-12-02  6:19 ` Pawan Gupta
  2025-12-10 12:31   ` Nikolay Borisov
  2026-01-24 19:34   ` Borislav Petkov
  2025-12-02  6:19 ` [PATCH v6 3/9] x86/vmscape: Rename x86_ibpb_exit_to_user to x86_predictor_flush_exit_to_user Pawan Gupta
                   ` (6 subsequent siblings)
  8 siblings, 2 replies; 27+ messages in thread
From: Pawan Gupta @ 2025-12-02  6:19 UTC (permalink / raw)
  To: x86, David Kaplan, Nikolay Borisov, H. Peter Anvin,
	Josh Poimboeuf, Sean Christopherson, Paolo Bonzini,
	Borislav Petkov, Dave Hansen
  Cc: linux-kernel, kvm, Asit Mallick, Tao Zhang

As a mitigation for BHI, clear_bhb_loop() executes branches that overwrite
the Branch History Buffer (BHB). On Alder Lake and newer parts this
sequence is not sufficient because it does not clear enough entries. This
was not an issue so far because these CPUs have a hardware control
(BHI_DIS_S) that mitigates BHI in the kernel.

The BHI variant of VMSCAPE requires isolating branch history between guests
and userspace, and there is no equivalent hardware control for userspace.
To effectively isolate branch history on newer CPUs, clear_bhb_loop()
should execute a sufficient number of branches to clear the larger BHB.

Dynamically set the loop count of clear_bhb_loop() so that it is effective
on newer CPUs too. Use the hardware enumeration X86_FEATURE_BHI_CTRL to
select the appropriate loop count.
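The count selection patched in by the ALTERNATIVE below can be sketched in plain C (illustrative user-space code; the 5/5 and 12/7 outer/inner pairs come straight from the patch, mapping BHI_CTRL enumeration to the larger counts):

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch of the loop-count selection: 5 outer / 5 inner iterations on
 * parts without BHI_CTRL, 12 outer / 7 inner on Alder Lake and newer
 * parts, which enumerate X86_FEATURE_BHI_CTRL. */
static void bhb_loop_counts(bool has_bhi_ctrl, int *outer, int *inner)
{
	if (has_bhi_ctrl) {
		*outer = 12;
		*inner = 7;
	} else {
		*outer = 5;
		*inner = 5;
	}
}
```

In the real patch the two pairs are runtime-patched into %ecx/%edx via ALTERNATIVE, so the choice costs nothing after boot.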

Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
 arch/x86/entry/entry_64.S | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 886f86790b4467347031bc27d3d761d5cc286da1..9f6f4a7c5baf1fe4e3ab18b11e25e2fbcc77489d 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1536,7 +1536,11 @@ SYM_FUNC_START(clear_bhb_loop)
 	ANNOTATE_NOENDBR
 	push	%rbp
 	mov	%rsp, %rbp
-	movl	$5, %ecx
+
+	/* loop count differs based on BHI_CTRL, see Intel's BHI guidance */
+	ALTERNATIVE "movl $5,  %ecx; movl $5, %edx",	\
+		    "movl $12, %ecx; movl $7, %edx", X86_FEATURE_BHI_CTRL
+
 	ANNOTATE_INTRA_FUNCTION_CALL
 	call	1f
 	jmp	5f
@@ -1557,7 +1561,7 @@ SYM_FUNC_START(clear_bhb_loop)
 	 * but some Clang versions (e.g. 18) don't like this.
 	 */
 	.skip 32 - 18, 0xcc
-2:	movl	$5, %eax
+2:	movl	%edx, %eax
 3:	jmp	4f
 	nop
 4:	sub	$1, %eax

-- 
2.34.1




* [PATCH v6 3/9] x86/vmscape: Rename x86_ibpb_exit_to_user to x86_predictor_flush_exit_to_user
  2025-12-02  6:18 [PATCH v6 0/9] VMSCAPE optimization for BHI variant Pawan Gupta
  2025-12-02  6:18 ` [PATCH v6 1/9] x86/bhi: x86/vmscape: Move LFENCE out of clear_bhb_loop() Pawan Gupta
  2025-12-02  6:19 ` [PATCH v6 2/9] x86/bhi: Make clear_bhb_loop() effective on newer CPUs Pawan Gupta
@ 2025-12-02  6:19 ` Pawan Gupta
  2025-12-02  6:19 ` [PATCH v6 4/9] x86/vmscape: Move mitigation selection to a switch() Pawan Gupta
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 27+ messages in thread
From: Pawan Gupta @ 2025-12-02  6:19 UTC (permalink / raw)
  To: x86, David Kaplan, Nikolay Borisov, H. Peter Anvin,
	Josh Poimboeuf, Sean Christopherson, Paolo Bonzini,
	Borislav Petkov, Dave Hansen
  Cc: linux-kernel, kvm, Asit Mallick, Tao Zhang

With the upcoming changes, x86_ibpb_exit_to_user will also be used when the
BHB clearing sequence is deployed. Rename it to cover both cases.

No functional change.

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
 arch/x86/include/asm/entry-common.h  | 6 +++---
 arch/x86/include/asm/nospec-branch.h | 2 +-
 arch/x86/kernel/cpu/bugs.c           | 4 ++--
 arch/x86/kvm/x86.c                   | 2 +-
 4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/entry-common.h b/arch/x86/include/asm/entry-common.h
index ce3eb6d5fdf9f2dba59b7bad24afbfafc8c36918..c45858db16c92fc1364fb818185fba7657840991 100644
--- a/arch/x86/include/asm/entry-common.h
+++ b/arch/x86/include/asm/entry-common.h
@@ -94,11 +94,11 @@ static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs,
 	 */
 	choose_random_kstack_offset(rdtsc());
 
-	/* Avoid unnecessary reads of 'x86_ibpb_exit_to_user' */
+	/* Avoid unnecessary reads of 'x86_predictor_flush_exit_to_user' */
 	if (cpu_feature_enabled(X86_FEATURE_IBPB_EXIT_TO_USER) &&
-	    this_cpu_read(x86_ibpb_exit_to_user)) {
+	    this_cpu_read(x86_predictor_flush_exit_to_user)) {
 		indirect_branch_prediction_barrier();
-		this_cpu_write(x86_ibpb_exit_to_user, false);
+		this_cpu_write(x86_predictor_flush_exit_to_user, false);
 	}
 }
 #define arch_exit_to_user_mode_prepare arch_exit_to_user_mode_prepare
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index ec5ebf96dbb9e240f402f39efc6929ae45ec8f0b..df60f9cf51b84e5b75e5db70713188d2e6ad0f5d 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -531,7 +531,7 @@ void alternative_msr_write(unsigned int msr, u64 val, unsigned int feature)
 		: "memory");
 }
 
-DECLARE_PER_CPU(bool, x86_ibpb_exit_to_user);
+DECLARE_PER_CPU(bool, x86_predictor_flush_exit_to_user);
 
 static inline void indirect_branch_prediction_barrier(void)
 {
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index d7fa03bf51b4517c12cc68e7c441f7589a4983d1..1e9b11198db0fe2483bd17b1327bcfd44a2c1dbf 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -113,8 +113,8 @@ EXPORT_PER_CPU_SYMBOL_GPL(x86_spec_ctrl_current);
  * be needed to before running userspace. That IBPB will flush the branch
  * predictor content.
  */
-DEFINE_PER_CPU(bool, x86_ibpb_exit_to_user);
-EXPORT_PER_CPU_SYMBOL_GPL(x86_ibpb_exit_to_user);
+DEFINE_PER_CPU(bool, x86_predictor_flush_exit_to_user);
+EXPORT_PER_CPU_SYMBOL_GPL(x86_predictor_flush_exit_to_user);
 
 u64 x86_pred_cmd __ro_after_init = PRED_CMD_IBPB;
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c9c2aa6f4705e1ae257bf94572967a5724a940a7..60123568fba85c8a445f9220d3f4a1d11fd0eb77 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11397,7 +11397,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	 * may migrate to.
 	 */
 	if (cpu_feature_enabled(X86_FEATURE_IBPB_EXIT_TO_USER))
-		this_cpu_write(x86_ibpb_exit_to_user, true);
+		this_cpu_write(x86_predictor_flush_exit_to_user, true);
 
 	/*
 	 * Consume any pending interrupts, including the possible source of

-- 
2.34.1




* [PATCH v6 4/9] x86/vmscape: Move mitigation selection to a switch()
  2025-12-02  6:18 [PATCH v6 0/9] VMSCAPE optimization for BHI variant Pawan Gupta
                   ` (2 preceding siblings ...)
  2025-12-02  6:19 ` [PATCH v6 3/9] x86/vmscape: Rename x86_ibpb_exit_to_user to x86_predictor_flush_exit_to_user Pawan Gupta
@ 2025-12-02  6:19 ` Pawan Gupta
  2025-12-10 16:15   ` Nikolay Borisov
  2025-12-02  6:19 ` [PATCH v6 5/9] x86/vmscape: Use write_ibpb() instead of indirect_branch_prediction_barrier() Pawan Gupta
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 27+ messages in thread
From: Pawan Gupta @ 2025-12-02  6:19 UTC (permalink / raw)
  To: x86, David Kaplan, Nikolay Borisov, H. Peter Anvin,
	Josh Poimboeuf, Sean Christopherson, Paolo Bonzini,
	Borislav Petkov, Dave Hansen
  Cc: linux-kernel, kvm, Asit Mallick, Tao Zhang

This ensures that all mitigation modes are explicitly handled, while
keeping the mitigation selection for each mode together. It also prepares
for adding a BHB-clearing mitigation mode for VMSCAPE.

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
 arch/x86/kernel/cpu/bugs.c | 24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 1e9b11198db0fe2483bd17b1327bcfd44a2c1dbf..71865b9d2c5c18cd0cf3cb8bbf07d1576cd20498 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -3231,17 +3231,33 @@ early_param("vmscape", vmscape_parse_cmdline);
 
 static void __init vmscape_select_mitigation(void)
 {
-	if (!boot_cpu_has_bug(X86_BUG_VMSCAPE) ||
-	    !boot_cpu_has(X86_FEATURE_IBPB)) {
+	if (!boot_cpu_has_bug(X86_BUG_VMSCAPE)) {
 		vmscape_mitigation = VMSCAPE_MITIGATION_NONE;
 		return;
 	}
 
-	if (vmscape_mitigation == VMSCAPE_MITIGATION_AUTO) {
-		if (should_mitigate_vuln(X86_BUG_VMSCAPE))
+	if ((vmscape_mitigation == VMSCAPE_MITIGATION_AUTO) &&
+	    !should_mitigate_vuln(X86_BUG_VMSCAPE))
+		vmscape_mitigation = VMSCAPE_MITIGATION_NONE;
+
+	switch (vmscape_mitigation) {
+	case VMSCAPE_MITIGATION_NONE:
+		break;
+
+	case VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER:
+		if (!boot_cpu_has(X86_FEATURE_IBPB))
+			vmscape_mitigation = VMSCAPE_MITIGATION_NONE;
+		break;
+
+	case VMSCAPE_MITIGATION_AUTO:
+		if (boot_cpu_has(X86_FEATURE_IBPB))
 			vmscape_mitigation = VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER;
 		else
 			vmscape_mitigation = VMSCAPE_MITIGATION_NONE;
+		break;
+
+	default:
+		break;
 	}
 }
 

-- 
2.34.1




* [PATCH v6 5/9] x86/vmscape: Use write_ibpb() instead of indirect_branch_prediction_barrier()
  2025-12-02  6:18 [PATCH v6 0/9] VMSCAPE optimization for BHI variant Pawan Gupta
                   ` (3 preceding siblings ...)
  2025-12-02  6:19 ` [PATCH v6 4/9] x86/vmscape: Move mitigation selection to a switch() Pawan Gupta
@ 2025-12-02  6:19 ` Pawan Gupta
  2025-12-02  6:20 ` [PATCH v6 6/9] x86/vmscape: Use static_call() for predictor flush Pawan Gupta
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 27+ messages in thread
From: Pawan Gupta @ 2025-12-02  6:19 UTC (permalink / raw)
  To: x86, David Kaplan, Nikolay Borisov, H. Peter Anvin,
	Josh Poimboeuf, Sean Christopherson, Paolo Bonzini,
	Borislav Petkov, Dave Hansen
  Cc: linux-kernel, kvm, Asit Mallick, Tao Zhang

indirect_branch_prediction_barrier() is a wrapper around write_ibpb() that
also checks whether the CPU supports IBPB. For VMSCAPE, the call to
indirect_branch_prediction_barrier() is only reached when the CPU supports
IBPB.

Simply call write_ibpb() directly to avoid the unnecessary alternative
patching.

Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
 arch/x86/include/asm/entry-common.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/entry-common.h b/arch/x86/include/asm/entry-common.h
index c45858db16c92fc1364fb818185fba7657840991..78b143673ca72642149eb2dbf3e3e31370fe6b28 100644
--- a/arch/x86/include/asm/entry-common.h
+++ b/arch/x86/include/asm/entry-common.h
@@ -97,7 +97,7 @@ static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs,
 	/* Avoid unnecessary reads of 'x86_predictor_flush_exit_to_user' */
 	if (cpu_feature_enabled(X86_FEATURE_IBPB_EXIT_TO_USER) &&
 	    this_cpu_read(x86_predictor_flush_exit_to_user)) {
-		indirect_branch_prediction_barrier();
+		write_ibpb();
 		this_cpu_write(x86_predictor_flush_exit_to_user, false);
 	}
 }

-- 
2.34.1




* [PATCH v6 6/9] x86/vmscape: Use static_call() for predictor flush
  2025-12-02  6:18 [PATCH v6 0/9] VMSCAPE optimization for BHI variant Pawan Gupta
                   ` (4 preceding siblings ...)
  2025-12-02  6:19 ` [PATCH v6 5/9] x86/vmscape: Use write_ibpb() instead of indirect_branch_prediction_barrier() Pawan Gupta
@ 2025-12-02  6:20 ` Pawan Gupta
  2025-12-11 10:06   ` Nikolay Borisov
  2025-12-11 10:50   ` Peter Zijlstra
  2025-12-02  6:20 ` [PATCH v6 7/9] x86/vmscape: Deploy BHB clearing mitigation Pawan Gupta
                   ` (2 subsequent siblings)
  8 siblings, 2 replies; 27+ messages in thread
From: Pawan Gupta @ 2025-12-02  6:20 UTC (permalink / raw)
  To: x86, David Kaplan, Nikolay Borisov, H. Peter Anvin,
	Josh Poimboeuf, Sean Christopherson, Paolo Bonzini,
	Borislav Petkov, Dave Hansen
  Cc: linux-kernel, kvm, Asit Mallick, Tao Zhang

Adding more mitigation options at exit-to-userspace for VMSCAPE would
usually require a series of checks to decide which mitigation to use. In
this case, the mitigation is done by calling a function that is decided at
boot, so adding more feature flags and multiple checks can be avoided by
using a static_call() to the mitigating function.

Replace the flag-based mitigation selector with a static_call(). This also
frees the existing X86_FEATURE_IBPB_EXIT_TO_USER.
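A user-space analogue of the static_call() pattern used here (illustrative only; `ibpb_stub` is a stand-in for write_ibpb()): one call site, the target installed once at boot, and a NULL target degrading to a no-op, which is what static_call_cond() on a DEFINE_STATIC_CALL_NULL relies on.

```c
#include <assert.h>
#include <stddef.h>

static int flushes;

static void ibpb_stub(void)	/* stands in for write_ibpb() */
{
	flushes++;
}

/* NULL until a mitigation is selected, like DEFINE_STATIC_CALL_NULL */
static void (*predictor_flush)(void);

static void exit_path(void)
{
	if (predictor_flush)	/* static_call_cond(): NULL is a no-op */
		predictor_flush();
}
```

The real static_call() goes further: the call site is patched to a direct call (or a NOP) at update time, so the exit path pays no indirect-branch or flag-check cost.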

Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
 arch/x86/Kconfig                     | 1 +
 arch/x86/include/asm/cpufeatures.h   | 2 +-
 arch/x86/include/asm/entry-common.h  | 7 +++----
 arch/x86/include/asm/nospec-branch.h | 3 +++
 arch/x86/kernel/cpu/bugs.c           | 5 ++++-
 arch/x86/kvm/x86.c                   | 2 +-
 6 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index fa3b616af03a2d50eaf5f922bc8cd4e08a284045..066f62f15e67e85fda0f3fd66acabad9a9794ff8 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2706,6 +2706,7 @@ config MITIGATION_TSA
 config MITIGATION_VMSCAPE
 	bool "Mitigate VMSCAPE"
 	depends on KVM
+	select HAVE_STATIC_CALL
 	default y
 	help
 	  Enable mitigation for VMSCAPE attacks. VMSCAPE is a hardware security
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 4091a776e37aaed67ca93b0a0cd23cc25dbc33d4..02871318c999f94ec8557e5fb0b8fb299960d454 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -496,7 +496,7 @@
 #define X86_FEATURE_TSA_SQ_NO		(21*32+11) /* AMD CPU not vulnerable to TSA-SQ */
 #define X86_FEATURE_TSA_L1_NO		(21*32+12) /* AMD CPU not vulnerable to TSA-L1 */
 #define X86_FEATURE_CLEAR_CPU_BUF_VM	(21*32+13) /* Clear CPU buffers using VERW before VMRUN */
-#define X86_FEATURE_IBPB_EXIT_TO_USER	(21*32+14) /* Use IBPB on exit-to-userspace, see VMSCAPE bug */
+/* Free */
 #define X86_FEATURE_ABMC		(21*32+15) /* Assignable Bandwidth Monitoring Counters */
 #define X86_FEATURE_MSR_IMM		(21*32+16) /* MSR immediate form instructions */
 
diff --git a/arch/x86/include/asm/entry-common.h b/arch/x86/include/asm/entry-common.h
index 78b143673ca72642149eb2dbf3e3e31370fe6b28..783e7cb50caeb6c6fc68e0a5c75ab43e75e37116 100644
--- a/arch/x86/include/asm/entry-common.h
+++ b/arch/x86/include/asm/entry-common.h
@@ -4,6 +4,7 @@
 
 #include <linux/randomize_kstack.h>
 #include <linux/user-return-notifier.h>
+#include <linux/static_call_types.h>
 
 #include <asm/nospec-branch.h>
 #include <asm/io_bitmap.h>
@@ -94,10 +95,8 @@ static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs,
 	 */
 	choose_random_kstack_offset(rdtsc());
 
-	/* Avoid unnecessary reads of 'x86_predictor_flush_exit_to_user' */
-	if (cpu_feature_enabled(X86_FEATURE_IBPB_EXIT_TO_USER) &&
-	    this_cpu_read(x86_predictor_flush_exit_to_user)) {
-		write_ibpb();
+	if (unlikely(this_cpu_read(x86_predictor_flush_exit_to_user))) {
+		static_call_cond(vmscape_predictor_flush)();
 		this_cpu_write(x86_predictor_flush_exit_to_user, false);
 	}
 }
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index df60f9cf51b84e5b75e5db70713188d2e6ad0f5d..15a2fa8f2f48a066e102263513eff9537ac1d25f 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -540,6 +540,9 @@ static inline void indirect_branch_prediction_barrier(void)
 			    :: "rax", "rcx", "rdx", "memory");
 }
 
+#include <linux/static_call_types.h>
+DECLARE_STATIC_CALL(vmscape_predictor_flush, write_ibpb);
+
 /* The Intel SPEC CTRL MSR base value cache */
 extern u64 x86_spec_ctrl_base;
 DECLARE_PER_CPU(u64, x86_spec_ctrl_current);
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 71865b9d2c5c18cd0cf3cb8bbf07d1576cd20498..71a35a153c1eb852438d533fc8ad76eefaca3219 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -200,6 +200,9 @@ DEFINE_STATIC_KEY_FALSE(switch_mm_cond_l1d_flush);
 DEFINE_STATIC_KEY_FALSE(cpu_buf_vm_clear);
 EXPORT_SYMBOL_GPL(cpu_buf_vm_clear);
 
+DEFINE_STATIC_CALL_NULL(vmscape_predictor_flush, write_ibpb);
+EXPORT_STATIC_CALL_GPL(vmscape_predictor_flush);
+
 #undef pr_fmt
 #define pr_fmt(fmt)	"mitigations: " fmt
 
@@ -3276,7 +3279,7 @@ static void __init vmscape_update_mitigation(void)
 static void __init vmscape_apply_mitigation(void)
 {
 	if (vmscape_mitigation == VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER)
-		setup_force_cpu_cap(X86_FEATURE_IBPB_EXIT_TO_USER);
+		static_call_update(vmscape_predictor_flush, write_ibpb);
 }
 
 #undef pr_fmt
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 60123568fba85c8a445f9220d3f4a1d11fd0eb77..7e55ef3b3203a26be1a138c8fa838a8c5aae0125 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11396,7 +11396,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	 * set for the CPU that actually ran the guest, and not the CPU that it
 	 * may migrate to.
 	 */
-	if (cpu_feature_enabled(X86_FEATURE_IBPB_EXIT_TO_USER))
+	if (static_call_query(vmscape_predictor_flush))
 		this_cpu_write(x86_predictor_flush_exit_to_user, true);
 
 	/*

-- 
2.34.1




* [PATCH v6 7/9] x86/vmscape: Deploy BHB clearing mitigation
  2025-12-02  6:18 [PATCH v6 0/9] VMSCAPE optimization for BHI variant Pawan Gupta
                   ` (5 preceding siblings ...)
  2025-12-02  6:20 ` [PATCH v6 6/9] x86/vmscape: Use static_call() for predictor flush Pawan Gupta
@ 2025-12-02  6:20 ` Pawan Gupta
  2025-12-11 14:26   ` Nikolay Borisov
  2025-12-02  6:20 ` [PATCH v6 8/9] x86/vmscape: Fix conflicting attack-vector controls with =force Pawan Gupta
  2025-12-02  6:21 ` [PATCH v6 9/9] x86/vmscape: Add cmdline vmscape=on to override attack vector controls Pawan Gupta
  8 siblings, 1 reply; 27+ messages in thread
From: Pawan Gupta @ 2025-12-02  6:20 UTC (permalink / raw)
  To: x86, David Kaplan, Nikolay Borisov, H. Peter Anvin,
	Josh Poimboeuf, Sean Christopherson, Paolo Bonzini,
	Borislav Petkov, Dave Hansen
  Cc: linux-kernel, kvm, Asit Mallick, Tao Zhang

The IBPB mitigation for VMSCAPE is overkill on CPUs that are only affected
by the BHI variant of VMSCAPE. On such CPUs, eIBRS already provides
indirect branch isolation between guest and host userspace. However, branch
history from a guest may still influence indirect branches in host
userspace.

To mitigate the BHI aspect, use clear_bhb_loop().
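The AUTO selection order this patch introduces can be sketched as a small C function (user-space illustration; the priority mirrors the hunk in vmscape_select_mitigation() below): prefer the cheap BHB clear on BHI_CTRL-capable 64-bit kernels, fall back to IBPB, else leave the CPU unmitigated.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Sketch of the VMSCAPE_MITIGATION_AUTO decision added by this patch. */
static const char *vmscape_auto(bool has_bhi_ctrl, bool is_64bit, bool has_ibpb)
{
	/* BHI_CTRL parts (ADL+) only need the BHB clear; the sequence
	 * is not supported in 32-bit mode. */
	if (has_bhi_ctrl && is_64bit)
		return "BHB_CLEAR_EXIT_TO_USER";
	if (has_ibpb)
		return "IBPB_EXIT_TO_USER";
	return "NONE";
}
```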

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
 Documentation/admin-guide/hw-vuln/vmscape.rst |  4 ++++
 arch/x86/include/asm/nospec-branch.h          |  2 ++
 arch/x86/kernel/cpu/bugs.c                    | 26 +++++++++++++++++++-------
 3 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/Documentation/admin-guide/hw-vuln/vmscape.rst b/Documentation/admin-guide/hw-vuln/vmscape.rst
index d9b9a2b6c114c05a7325e5f3c9d42129339b870b..dc63a0bac03d43d1e295de0791dd6497d101f986 100644
--- a/Documentation/admin-guide/hw-vuln/vmscape.rst
+++ b/Documentation/admin-guide/hw-vuln/vmscape.rst
@@ -86,6 +86,10 @@ The possible values in this file are:
    run a potentially malicious guest and issues an IBPB before the first
    exit to userspace after VM-exit.
 
+ * 'Mitigation: Clear BHB before exit to userspace':
+
+   As above, conditional BHB clearing mitigation is enabled.
+
  * 'Mitigation: IBPB on VMEXIT':
 
    IBPB is issued on every VM-exit. This occurs when other mitigations like
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 15a2fa8f2f48a066e102263513eff9537ac1d25f..1e8c26c37dbed4256b35101fb41c0e1eb6ef9272 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -388,6 +388,8 @@ extern void write_ibpb(void);
 
 #ifdef CONFIG_X86_64
 extern void clear_bhb_loop(void);
+#else
+static inline void clear_bhb_loop(void) {}
 #endif
 
 extern void (*x86_return_thunk)(void);
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 71a35a153c1eb852438d533fc8ad76eefaca3219..61c3b4ae131f39fd716a54ba46d255844b1bb609 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -109,9 +109,8 @@ DEFINE_PER_CPU(u64, x86_spec_ctrl_current);
 EXPORT_PER_CPU_SYMBOL_GPL(x86_spec_ctrl_current);
 
 /*
- * Set when the CPU has run a potentially malicious guest. An IBPB will
- * be needed to before running userspace. That IBPB will flush the branch
- * predictor content.
+ * Set when the CPU has run a potentially malicious guest. Indicates that a
+ * branch predictor flush is needed before running userspace.
  */
 DEFINE_PER_CPU(bool, x86_predictor_flush_exit_to_user);
 EXPORT_PER_CPU_SYMBOL_GPL(x86_predictor_flush_exit_to_user);
@@ -3200,13 +3199,15 @@ enum vmscape_mitigations {
 	VMSCAPE_MITIGATION_AUTO,
 	VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER,
 	VMSCAPE_MITIGATION_IBPB_ON_VMEXIT,
+	VMSCAPE_MITIGATION_BHB_CLEAR_EXIT_TO_USER,
 };
 
 static const char * const vmscape_strings[] = {
-	[VMSCAPE_MITIGATION_NONE]		= "Vulnerable",
+	[VMSCAPE_MITIGATION_NONE]			= "Vulnerable",
 	/* [VMSCAPE_MITIGATION_AUTO] */
-	[VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER]	= "Mitigation: IBPB before exit to userspace",
-	[VMSCAPE_MITIGATION_IBPB_ON_VMEXIT]	= "Mitigation: IBPB on VMEXIT",
+	[VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER]		= "Mitigation: IBPB before exit to userspace",
+	[VMSCAPE_MITIGATION_IBPB_ON_VMEXIT]		= "Mitigation: IBPB on VMEXIT",
+	[VMSCAPE_MITIGATION_BHB_CLEAR_EXIT_TO_USER]	= "Mitigation: Clear BHB before exit to userspace",
 };
 
 static enum vmscape_mitigations vmscape_mitigation __ro_after_init =
@@ -3253,7 +3254,15 @@ static void __init vmscape_select_mitigation(void)
 		break;
 
 	case VMSCAPE_MITIGATION_AUTO:
-		if (boot_cpu_has(X86_FEATURE_IBPB))
+		/*
+		 * CPUs with BHI_CTRL (ADL and newer) can avoid the IBPB and use the
+		 * BHB clear sequence. These CPUs are only vulnerable to the BHI
+		 * variant of the VMSCAPE attack and do not require an IBPB flush.
+		 * The BHB clear sequence is not supported in 32-bit mode.
+		 */
+		if (boot_cpu_has(X86_FEATURE_BHI_CTRL) && IS_ENABLED(CONFIG_X86_64))
+			vmscape_mitigation = VMSCAPE_MITIGATION_BHB_CLEAR_EXIT_TO_USER;
+		else if (boot_cpu_has(X86_FEATURE_IBPB))
 			vmscape_mitigation = VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER;
 		else
 			vmscape_mitigation = VMSCAPE_MITIGATION_NONE;
@@ -3280,6 +3289,8 @@ static void __init vmscape_apply_mitigation(void)
 {
 	if (vmscape_mitigation == VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER)
 		static_call_update(vmscape_predictor_flush, write_ibpb);
+	else if (vmscape_mitigation == VMSCAPE_MITIGATION_BHB_CLEAR_EXIT_TO_USER)
+		static_call_update(vmscape_predictor_flush, clear_bhb_loop);
 }
 
 #undef pr_fmt
@@ -3371,6 +3382,7 @@ void cpu_bugs_smt_update(void)
 		break;
 	case VMSCAPE_MITIGATION_IBPB_ON_VMEXIT:
 	case VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER:
+	case VMSCAPE_MITIGATION_BHB_CLEAR_EXIT_TO_USER:
 		/*
 		 * Hypervisors can be attacked across-threads, warn for SMT when
 		 * STIBP is not already enabled system-wide.

-- 
2.34.1



^ permalink raw reply related	[flat|nested] 27+ messages in thread
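The static_call()-based dispatch this patch plugs into can be approximated in plain C, with an ordinary function pointer standing in for the boot-time-patched call site. All names below are illustrative stand-ins, not the kernel API; this is a sketch of the pattern, not the real entry code:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two flush routines. */
static int ibpb_calls, bhb_calls;
static void fake_write_ibpb(void)     { ibpb_calls++; }
static void fake_clear_bhb_loop(void) { bhb_calls++;  }
static void nop_flush(void)           { }

/*
 * Emulates the static call: the default target is a no-op until
 * mitigation selection retargets it once at boot.
 */
static void (*vmscape_predictor_flush)(void) = nop_flush;

/* Mirrors the per-CPU x86_predictor_flush_exit_to_user flag. */
static bool predictor_flush_pending;

/* Sketch of the exit-to-userspace hook: flush once, then clear the flag. */
static void exit_to_user_sketch(void)
{
	if (predictor_flush_pending) {
		vmscape_predictor_flush();
		predictor_flush_pending = false;
	}
}
```

Retargeting the pointer once at boot means the hot path carries no per-exit mode checks, which is the point of replacing the feature-flag test with a static call.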

* [PATCH v6 8/9] x86/vmscape: Fix conflicting attack-vector controls with =force
  2025-12-02  6:18 [PATCH v6 0/9] VMSCAPE optimization for BHI variant Pawan Gupta
                   ` (6 preceding siblings ...)
  2025-12-02  6:20 ` [PATCH v6 7/9] x86/vmscape: Deploy BHB clearing mitigation Pawan Gupta
@ 2025-12-02  6:20 ` Pawan Gupta
  2025-12-02  6:21 ` [PATCH v6 9/9] x86/vmscape: Add cmdline vmscape=on to override attack vector controls Pawan Gupta
  8 siblings, 0 replies; 27+ messages in thread
From: Pawan Gupta @ 2025-12-02  6:20 UTC (permalink / raw)
  To: x86, David Kaplan, Nikolay Borisov, H. Peter Anvin,
	Josh Poimboeuf, Sean Christopherson, Paolo Bonzini,
	Borislav Petkov, Dave Hansen
  Cc: linux-kernel, kvm, Asit Mallick, Tao Zhang

The vmscape=force option currently defaults to the AUTO mitigation. This is
not correct because attack-vector controls override a mitigation in AUTO
mode, which prevents a user from forcing the VMSCAPE mitigation when it
conflicts with attack-vector controls.

The kernel should deploy a forced mitigation irrespective of attack vectors.
Instead of AUTO, use VMSCAPE_MITIGATION_ON, which wins over attack-vector
controls.

Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
 arch/x86/kernel/cpu/bugs.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 61c3b4ae131f39fd716a54ba46d255844b1bb609..58cd26e4f4c385a10230912666c02dbb05e71cba 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -3197,6 +3197,7 @@ static void __init srso_apply_mitigation(void)
 enum vmscape_mitigations {
 	VMSCAPE_MITIGATION_NONE,
 	VMSCAPE_MITIGATION_AUTO,
+	VMSCAPE_MITIGATION_ON,
 	VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER,
 	VMSCAPE_MITIGATION_IBPB_ON_VMEXIT,
 	VMSCAPE_MITIGATION_BHB_CLEAR_EXIT_TO_USER,
@@ -3205,6 +3206,7 @@ enum vmscape_mitigations {
 static const char * const vmscape_strings[] = {
 	[VMSCAPE_MITIGATION_NONE]			= "Vulnerable",
 	/* [VMSCAPE_MITIGATION_AUTO] */
+	/* [VMSCAPE_MITIGATION_ON] */
 	[VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER]		= "Mitigation: IBPB before exit to userspace",
 	[VMSCAPE_MITIGATION_IBPB_ON_VMEXIT]		= "Mitigation: IBPB on VMEXIT",
 	[VMSCAPE_MITIGATION_BHB_CLEAR_EXIT_TO_USER]	= "Mitigation: Clear BHB before exit to userspace",
@@ -3224,7 +3226,7 @@ static int __init vmscape_parse_cmdline(char *str)
 		vmscape_mitigation = VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER;
 	} else if (!strcmp(str, "force")) {
 		setup_force_cpu_bug(X86_BUG_VMSCAPE);
-		vmscape_mitigation = VMSCAPE_MITIGATION_AUTO;
+		vmscape_mitigation = VMSCAPE_MITIGATION_ON;
 	} else {
 		pr_err("Ignoring unknown vmscape=%s option.\n", str);
 	}
@@ -3254,6 +3256,7 @@ static void __init vmscape_select_mitigation(void)
 		break;
 
 	case VMSCAPE_MITIGATION_AUTO:
+	case VMSCAPE_MITIGATION_ON:
 		/*
 		 * CPUs with BHI_CTRL (ADL and newer) can avoid the IBPB and use the
 		 * BHB clear sequence. These CPUs are only vulnerable to the BHI
@@ -3379,6 +3382,7 @@ void cpu_bugs_smt_update(void)
 	switch (vmscape_mitigation) {
 	case VMSCAPE_MITIGATION_NONE:
 	case VMSCAPE_MITIGATION_AUTO:
+	case VMSCAPE_MITIGATION_ON:
 		break;
 	case VMSCAPE_MITIGATION_IBPB_ON_VMEXIT:
 	case VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER:

-- 
2.34.1



^ permalink raw reply related	[flat|nested] 27+ messages in thread
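The AUTO/ON arm of the selection logic this patch extends reduces to a small decision function. A minimal C sketch, with feature flags flattened to booleans and names kept close to, but not identical with, the kernel's:

```c
#include <assert.h>
#include <stdbool.h>

enum vmscape_mitigations {
	VMSCAPE_MITIGATION_NONE,
	VMSCAPE_MITIGATION_AUTO,
	VMSCAPE_MITIGATION_ON,
	VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER,
	VMSCAPE_MITIGATION_IBPB_ON_VMEXIT,
	VMSCAPE_MITIGATION_BHB_CLEAR_EXIT_TO_USER,
};

/*
 * Mirrors the shared AUTO/ON case of vmscape_select_mitigation():
 * prefer the cheap BHB clear when BHI_CTRL is enumerated (64-bit
 * only), fall back to IBPB, else leave the CPU unmitigated.
 */
static enum vmscape_mitigations
select_auto_or_on(bool has_bhi_ctrl, bool has_ibpb, bool is_64bit)
{
	if (has_bhi_ctrl && is_64bit)
		return VMSCAPE_MITIGATION_BHB_CLEAR_EXIT_TO_USER;
	if (has_ibpb)
		return VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER;
	return VMSCAPE_MITIGATION_NONE;
}
```

The distinction between AUTO and ON matters before this function runs: ON survives attack-vector filtering, AUTO does not; once selection reaches this arm, both pick the same mitigation.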

* [PATCH v6 9/9] x86/vmscape: Add cmdline vmscape=on to override attack vector controls
  2025-12-02  6:18 [PATCH v6 0/9] VMSCAPE optimization for BHI variant Pawan Gupta
                   ` (7 preceding siblings ...)
  2025-12-02  6:20 ` [PATCH v6 8/9] x86/vmscape: Fix conflicting attack-vector controls with =force Pawan Gupta
@ 2025-12-02  6:21 ` Pawan Gupta
  8 siblings, 0 replies; 27+ messages in thread
From: Pawan Gupta @ 2025-12-02  6:21 UTC (permalink / raw)
  To: x86, David Kaplan, Nikolay Borisov, H. Peter Anvin,
	Josh Poimboeuf, Sean Christopherson, Paolo Bonzini,
	Borislav Petkov, Dave Hansen
  Cc: linux-kernel, kvm, Asit Mallick, Tao Zhang

In general, individual mitigation controls can be used to override the
attack-vector controls. However, nothing exists to select the BHB-clearing
mitigation for VMSCAPE. The =force option comes close, but it has the side
effect of also forcibly setting the bug, hence deploying the mitigation on
unaffected parts too.

Add a new cmdline option, vmscape=on, to enable the mitigation based on the
VMSCAPE variant the CPU is affected by.

Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
 Documentation/admin-guide/hw-vuln/vmscape.rst   | 4 ++++
 Documentation/admin-guide/kernel-parameters.txt | 4 +++-
 arch/x86/kernel/cpu/bugs.c                      | 2 ++
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/hw-vuln/vmscape.rst b/Documentation/admin-guide/hw-vuln/vmscape.rst
index dc63a0bac03d43d1e295de0791dd6497d101f986..580f288ae8bfc601ff000d6d95d711bb9084459e 100644
--- a/Documentation/admin-guide/hw-vuln/vmscape.rst
+++ b/Documentation/admin-guide/hw-vuln/vmscape.rst
@@ -112,3 +112,7 @@ The mitigation can be controlled via the ``vmscape=`` command line parameter:
 
    Force vulnerability detection and mitigation even on processors that are
    not known to be affected.
+
+ * ``vmscape=on``:
+
+   Choose the mitigation based on the VMSCAPE variant the CPU is affected by.
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 6c42061ca20e581b5192b66c6f25aba38d4f8ff8..d2ccec6e10f3ea094c01083d4c133b837c7fc7d7 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -8104,9 +8104,11 @@
 
 			off		- disable the mitigation
 			ibpb		- use Indirect Branch Prediction Barrier
-					  (IBPB) mitigation (default)
+					  (IBPB) mitigation
 			force		- force vulnerability detection even on
 					  unaffected processors
+			on		- (default) selects IBPB or BHB clear
+					  mitigation based on CPU
 
 	vsyscall=	[X86-64,EARLY]
 			Controls the behavior of vsyscalls (i.e. calls to
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 58cd26e4f4c385a10230912666c02dbb05e71cba..5870bb67baf3bb54be80a7c193c26b6f6eb246d5 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -3227,6 +3227,8 @@ static int __init vmscape_parse_cmdline(char *str)
 	} else if (!strcmp(str, "force")) {
 		setup_force_cpu_bug(X86_BUG_VMSCAPE);
 		vmscape_mitigation = VMSCAPE_MITIGATION_ON;
+	} else if (!strcmp(str, "on")) {
+		vmscape_mitigation = VMSCAPE_MITIGATION_ON;
 	} else {
 		pr_err("Ignoring unknown vmscape=%s option.\n", str);
 	}

-- 
2.34.1



^ permalink raw reply related	[flat|nested] 27+ messages in thread
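After this patch, the full set of vmscape= keywords maps to mitigation modes as below. A C sketch of the string matching in vmscape_parse_cmdline() (enum names abbreviated here for brevity; "force" additionally force-sets the bug bit in the real kernel, which is not modeled):

```c
#include <assert.h>
#include <string.h>

enum vmscape_mode {
	MODE_NONE,	/* off */
	MODE_IBPB,	/* ibpb */
	MODE_ON,	/* on, force */
	MODE_UNCHANGED,	/* unknown option, ignored with an error */
};

/* Sketch of the cmdline keyword dispatch. */
static enum vmscape_mode parse_vmscape(const char *str)
{
	if (!strcmp(str, "off"))
		return MODE_NONE;
	if (!strcmp(str, "ibpb"))
		return MODE_IBPB;
	if (!strcmp(str, "force"))
		return MODE_ON;	/* plus setup_force_cpu_bug() in the kernel */
	if (!strcmp(str, "on"))
		return MODE_ON;
	return MODE_UNCHANGED;
}
```

The difference between "on" and "force" is thus only whether the bug bit is forced on unaffected CPUs; both resolve to the ON mode that overrides attack-vector controls.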

* Re: [PATCH v6 2/9] x86/bhi: Make clear_bhb_loop() effective on newer CPUs
  2025-12-02  6:19 ` [PATCH v6 2/9] x86/bhi: Make clear_bhb_loop() effective on newer CPUs Pawan Gupta
@ 2025-12-10 12:31   ` Nikolay Borisov
  2025-12-10 13:35     ` David Laight
  2025-12-14 17:16     ` Pawan Gupta
  2026-01-24 19:34   ` Borislav Petkov
  1 sibling, 2 replies; 27+ messages in thread
From: Nikolay Borisov @ 2025-12-10 12:31 UTC (permalink / raw)
  To: Pawan Gupta, x86, David Kaplan, Nikolay Borisov, H. Peter Anvin,
	Josh Poimboeuf, Sean Christopherson, Paolo Bonzini,
	Borislav Petkov, Dave Hansen
  Cc: linux-kernel, kvm, Asit Mallick, Tao Zhang



On 2.12.25 г. 8:19 ч., Pawan Gupta wrote:
> As a mitigation for BHI, clear_bhb_loop() executes branches that overwrites
> the Branch History Buffer (BHB). On Alder Lake and newer parts this
> sequence is not sufficient because it doesn't clear enough entries. This
> was not an issue because these CPUs have a hardware control (BHI_DIS_S)
> that mitigates BHI in kernel.
> 
> BHI variant of VMSCAPE requires isolating branch history between guests and
> userspace. Note that there is no equivalent hardware control for userspace.
> To effectively isolate branch history on newer CPUs, clear_bhb_loop()
> should execute sufficient number of branches to clear a larger BHB.
> 
> Dynamically set the loop count of clear_bhb_loop() such that it is
> effective on newer CPUs too. Use the hardware control enumeration
> X86_FEATURE_BHI_CTRL to select the appropriate loop count.
> 
> Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>

nit: My RB tag is incorrect. While I did agree with Dave's suggestion to 
have global variables for the loop counts, I hadn't actually seen the 
code, so I couldn't have given my RB on something which I hadn't seen 
but only agreed with in principle.

Now that I have seen the code, I'm willing to give my:

Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
> ---
>   arch/x86/entry/entry_64.S | 8 ++++++--
>   1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index 886f86790b4467347031bc27d3d761d5cc286da1..9f6f4a7c5baf1fe4e3ab18b11e25e2fbcc77489d 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -1536,7 +1536,11 @@ SYM_FUNC_START(clear_bhb_loop)
>   	ANNOTATE_NOENDBR
>   	push	%rbp
>   	mov	%rsp, %rbp
> -	movl	$5, %ecx
> +
> +	/* loop count differs based on BHI_CTRL, see Intel's BHI guidance */
> +	ALTERNATIVE "movl $5,  %ecx; movl $5, %edx",	\
> +		    "movl $12, %ecx; movl $7, %edx", X86_FEATURE_BHI_CTRL

nit: Just

> +
>   	ANNOTATE_INTRA_FUNCTION_CALL
>   	call	1f
>   	jmp	5f
> @@ -1557,7 +1561,7 @@ SYM_FUNC_START(clear_bhb_loop)
>   	 * but some Clang versions (e.g. 18) don't like this.
>   	 */
>   	.skip 32 - 18, 0xcc
> -2:	movl	$5, %eax
> +2:	movl	%edx, %eax
>   3:	jmp	4f
>   	nop
>   4:	sub	$1, %eax
> 


^ permalink raw reply	[flat|nested] 27+ messages in thread
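The two ALTERNATIVE variants in the patch under review change only how many times the nested loops run (5x5 without BHI_CTRL, 12x7 with it). A trivial C model of the iteration counts — the actual BHB clearing comes from the taken branches each iteration retires, which this sketch does not reproduce:

```c
#include <assert.h>

/*
 * Models the loop nesting of clear_bhb_loop(): %ecx outer iterations,
 * each reloading %eax from %edx for the inner iterations.
 */
static int bhb_inner_iterations(int outer, int inner)
{
	int total = 0;

	for (int i = 0; i < outer; i++)		/* sub $1, %ecx; jnz 1b */
		for (int j = 0; j < inner; j++)	/* sub $1, %eax; jnz 3b */
			total++;
	return total;
}
```

So the BHI_CTRL variant runs 84 inner iterations versus 25 for the legacy counts, enough extra branches to overwrite the larger BHB on Alder Lake and newer parts.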

* Re: [PATCH v6 2/9] x86/bhi: Make clear_bhb_loop() effective on newer CPUs
  2025-12-10 12:31   ` Nikolay Borisov
@ 2025-12-10 13:35     ` David Laight
  2025-12-10 15:42       ` Nikolay Borisov
  2025-12-14 18:38       ` Pawan Gupta
  2025-12-14 17:16     ` Pawan Gupta
  1 sibling, 2 replies; 27+ messages in thread
From: David Laight @ 2025-12-10 13:35 UTC (permalink / raw)
  To: Nikolay Borisov
  Cc: Pawan Gupta, x86, David Kaplan, H. Peter Anvin, Josh Poimboeuf,
	Sean Christopherson, Paolo Bonzini, Borislav Petkov, Dave Hansen,
	linux-kernel, kvm, Asit Mallick, Tao Zhang

On Wed, 10 Dec 2025 14:31:31 +0200
Nikolay Borisov <nik.borisov@suse.com> wrote:

> On 2.12.25 г. 8:19 ч., Pawan Gupta wrote:
> > As a mitigation for BHI, clear_bhb_loop() executes branches that overwrites
> > the Branch History Buffer (BHB). On Alder Lake and newer parts this
> > sequence is not sufficient because it doesn't clear enough entries. This
> > was not an issue because these CPUs have a hardware control (BHI_DIS_S)
> > that mitigates BHI in kernel.
> > 
> > BHI variant of VMSCAPE requires isolating branch history between guests and
> > userspace. Note that there is no equivalent hardware control for userspace.
> > To effectively isolate branch history on newer CPUs, clear_bhb_loop()
> > should execute sufficient number of branches to clear a larger BHB.
> > 
> > Dynamically set the loop count of clear_bhb_loop() such that it is
> > effective on newer CPUs too. Use the hardware control enumeration
> > X86_FEATURE_BHI_CTRL to select the appropriate loop count.
> > 
> > Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
> > Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
> > Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>  
> 
> nit: My RB tag is incorrect, while I did agree with Dave's suggestion to 
> have global variables for the loop counts I haven't' really seen the 
> code so I couldn't have given my RB on something which I haven't seen 
> but did agree with in principle.

I thought the plan was to use global variables rather than ALTERNATIVE.
The performance of this code is dominated by the loop.

I also found this code in arch/x86/net/bpf_jit_comp.c:
	if (cpu_feature_enabled(X86_FEATURE_CLEAR_BHB_LOOP)) {
		/* The clearing sequence clobbers eax and ecx. */
		EMIT1(0x50); /* push rax */
		EMIT1(0x51); /* push rcx */
		ip += 2;

		func = (u8 *)clear_bhb_loop;
		ip += x86_call_depth_emit_accounting(&prog, func, ip);

		if (emit_call(&prog, func, ip))
			return -EINVAL;
		EMIT1(0x59); /* pop rcx */
		EMIT1(0x58); /* pop rax */
	}
which appears to assume that only rax and rcx are changed.
Since all the counts are small, there is nothing stopping the code
using the 8-bit registers %al, %ah, %cl and %ch.

There are probably some schemes that only need one register.
eg two separate ALTERNATIVE blocks.

	David

> 
> Now that I have seen the code I'm willing to give my :
> 
> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
> > ---
> >   arch/x86/entry/entry_64.S | 8 ++++++--
> >   1 file changed, 6 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> > index 886f86790b4467347031bc27d3d761d5cc286da1..9f6f4a7c5baf1fe4e3ab18b11e25e2fbcc77489d 100644
> > --- a/arch/x86/entry/entry_64.S
> > +++ b/arch/x86/entry/entry_64.S
> > @@ -1536,7 +1536,11 @@ SYM_FUNC_START(clear_bhb_loop)
> >   	ANNOTATE_NOENDBR
> >   	push	%rbp
> >   	mov	%rsp, %rbp
> > -	movl	$5, %ecx
> > +
> > +	/* loop count differs based on BHI_CTRL, see Intel's BHI guidance */
> > +	ALTERNATIVE "movl $5,  %ecx; movl $5, %edx",	\
> > +		    "movl $12, %ecx; movl $7, %edx", X86_FEATURE_BHI_CTRL  
> 
> nit: Just
> 
> > +
> >   	ANNOTATE_INTRA_FUNCTION_CALL
> >   	call	1f
> >   	jmp	5f
> > @@ -1557,7 +1561,7 @@ SYM_FUNC_START(clear_bhb_loop)
> >   	 * but some Clang versions (e.g. 18) don't like this.
> >   	 */
> >   	.skip 32 - 18, 0xcc
> > -2:	movl	$5, %eax
> > +2:	movl	%edx, %eax
> >   3:	jmp	4f
> >   	nop
> >   4:	sub	$1, %eax
> >   
> 
> 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v6 2/9] x86/bhi: Make clear_bhb_loop() effective on newer CPUs
  2025-12-10 13:35     ` David Laight
@ 2025-12-10 15:42       ` Nikolay Borisov
  2025-12-14 18:38       ` Pawan Gupta
  1 sibling, 0 replies; 27+ messages in thread
From: Nikolay Borisov @ 2025-12-10 15:42 UTC (permalink / raw)
  To: David Laight
  Cc: Pawan Gupta, x86, David Kaplan, H. Peter Anvin, Josh Poimboeuf,
	Sean Christopherson, Paolo Bonzini, Borislav Petkov, Dave Hansen,
	linux-kernel, kvm, Asit Mallick, Tao Zhang



On 10.12.25 г. 15:35 ч., David Laight wrote:
> On Wed, 10 Dec 2025 14:31:31 +0200
> Nikolay Borisov <nik.borisov@suse.com> wrote:
> 
>> On 2.12.25 г. 8:19 ч., Pawan Gupta wrote:
>>> As a mitigation for BHI, clear_bhb_loop() executes branches that overwrites
>>> the Branch History Buffer (BHB). On Alder Lake and newer parts this
>>> sequence is not sufficient because it doesn't clear enough entries. This
>>> was not an issue because these CPUs have a hardware control (BHI_DIS_S)
>>> that mitigates BHI in kernel.
>>>
>>> BHI variant of VMSCAPE requires isolating branch history between guests and
>>> userspace. Note that there is no equivalent hardware control for userspace.
>>> To effectively isolate branch history on newer CPUs, clear_bhb_loop()
>>> should execute sufficient number of branches to clear a larger BHB.
>>>
>>> Dynamically set the loop count of clear_bhb_loop() such that it is
>>> effective on newer CPUs too. Use the hardware control enumeration
>>> X86_FEATURE_BHI_CTRL to select the appropriate loop count.
>>>
>>> Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
>>> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
>>> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
>>
>> nit: My RB tag is incorrect, while I did agree with Dave's suggestion to
>> have global variables for the loop counts I haven't' really seen the
>> code so I couldn't have given my RB on something which I haven't seen
>> but did agree with in principle.
> 
> I thought the plan was to use global variables rather than ALTERNATIVE.
> The performance of this code is dominated by the loop.

Generally yes, and I was on the verge of calling this out. However, what 
stopped me is the fact that the global variables would be set 
"somewhere else", whereas with the current approach everything is 
contained within the clear_bhb_loop() function. Both ways have their 
merits, but I don't want to endlessly bikeshed.

<snip>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v6 4/9] x86/vmscape: Move mitigation selection to a switch()
  2025-12-02  6:19 ` [PATCH v6 4/9] x86/vmscape: Move mitigation selection to a switch() Pawan Gupta
@ 2025-12-10 16:15   ` Nikolay Borisov
  0 siblings, 0 replies; 27+ messages in thread
From: Nikolay Borisov @ 2025-12-10 16:15 UTC (permalink / raw)
  To: Pawan Gupta, x86, David Kaplan, H. Peter Anvin, Josh Poimboeuf,
	Sean Christopherson, Paolo Bonzini, Borislav Petkov, Dave Hansen
  Cc: linux-kernel, kvm, Asit Mallick, Tao Zhang



On 2.12.25 г. 8:19 ч., Pawan Gupta wrote:
> This ensures that all mitigation modes are explicitly handled, while
> keeping the mitigation selection for each mode together. This also prepares
> for adding BHB-clearing mitigation mode for VMSCAPE.
> 
> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>

Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v6 6/9] x86/vmscape: Use static_call() for predictor flush
  2025-12-02  6:20 ` [PATCH v6 6/9] x86/vmscape: Use static_call() for predictor flush Pawan Gupta
@ 2025-12-11 10:06   ` Nikolay Borisov
  2025-12-11 10:50   ` Peter Zijlstra
  1 sibling, 0 replies; 27+ messages in thread
From: Nikolay Borisov @ 2025-12-11 10:06 UTC (permalink / raw)
  To: Pawan Gupta, x86, David Kaplan, H. Peter Anvin, Josh Poimboeuf,
	Sean Christopherson, Paolo Bonzini, Borislav Petkov, Dave Hansen
  Cc: linux-kernel, kvm, Asit Mallick, Tao Zhang



On 2.12.25 г. 8:20 ч., Pawan Gupta wrote:
> Adding more mitigation options at exit-to-userspace for VMSCAPE would
> usually require a series of checks to decide which mitigation to use. In
> this case, the mitigation is done by calling a function, which is decided
> at boot. So, adding more feature flags and multiple checks can be avoided
> by using static_call() to the mitigating function.
> 
> Replace the flag-based mitigation selector with a static_call(). This also
> frees the existing X86_FEATURE_IBPB_EXIT_TO_USER.
> 
> Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>

Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v6 6/9] x86/vmscape: Use static_call() for predictor flush
  2025-12-02  6:20 ` [PATCH v6 6/9] x86/vmscape: Use static_call() for predictor flush Pawan Gupta
  2025-12-11 10:06   ` Nikolay Borisov
@ 2025-12-11 10:50   ` Peter Zijlstra
  2025-12-14 18:45     ` Pawan Gupta
  1 sibling, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2025-12-11 10:50 UTC (permalink / raw)
  To: Pawan Gupta
  Cc: x86, David Kaplan, Nikolay Borisov, H. Peter Anvin,
	Josh Poimboeuf, Sean Christopherson, Paolo Bonzini,
	Borislav Petkov, Dave Hansen, linux-kernel, kvm, Asit Mallick,
	Tao Zhang

On Mon, Dec 01, 2025 at 10:20:14PM -0800, Pawan Gupta wrote:
> Adding more mitigation options at exit-to-userspace for VMSCAPE would
> usually require a series of checks to decide which mitigation to use. In
> this case, the mitigation is done by calling a function, which is decided
> at boot. So, adding more feature flags and multiple checks can be avoided
> by using static_call() to the mitigating function.
> 
> Replace the flag-based mitigation selector with a static_call(). This also
> frees the existing X86_FEATURE_IBPB_EXIT_TO_USER.
> 
> Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> ---
>  arch/x86/Kconfig                     | 1 +
>  arch/x86/include/asm/cpufeatures.h   | 2 +-
>  arch/x86/include/asm/entry-common.h  | 7 +++----
>  arch/x86/include/asm/nospec-branch.h | 3 +++
>  arch/x86/kernel/cpu/bugs.c           | 5 ++++-
>  arch/x86/kvm/x86.c                   | 2 +-
>  6 files changed, 13 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index fa3b616af03a2d50eaf5f922bc8cd4e08a284045..066f62f15e67e85fda0f3fd66acabad9a9794ff8 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -2706,6 +2706,7 @@ config MITIGATION_TSA
>  config MITIGATION_VMSCAPE
>  	bool "Mitigate VMSCAPE"
>  	depends on KVM
> +	select HAVE_STATIC_CALL

That can't be right.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v6 7/9] x86/vmscape: Deploy BHB clearing mitigation
  2025-12-02  6:20 ` [PATCH v6 7/9] x86/vmscape: Deploy BHB clearing mitigation Pawan Gupta
@ 2025-12-11 14:26   ` Nikolay Borisov
  0 siblings, 0 replies; 27+ messages in thread
From: Nikolay Borisov @ 2025-12-11 14:26 UTC (permalink / raw)
  To: Pawan Gupta, x86, David Kaplan, H. Peter Anvin, Josh Poimboeuf,
	Sean Christopherson, Paolo Bonzini, Borislav Petkov, Dave Hansen
  Cc: linux-kernel, kvm, Asit Mallick, Tao Zhang



On 2.12.25 г. 8:20 ч., Pawan Gupta wrote:
> IBPB mitigation for VMSCAPE is an overkill on CPUs that are only affected
> by the BHI variant of VMSCAPE. On such CPUs, eIBRS already provides
> indirect branch isolation between guest and host userspace. However, branch
> history from guest may also influence the indirect branches in host
> userspace.
> 
> To mitigate the BHI aspect, use clear_bhb_loop().
> 
> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>

Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v6 2/9] x86/bhi: Make clear_bhb_loop() effective on newer CPUs
  2025-12-10 12:31   ` Nikolay Borisov
  2025-12-10 13:35     ` David Laight
@ 2025-12-14 17:16     ` Pawan Gupta
  1 sibling, 0 replies; 27+ messages in thread
From: Pawan Gupta @ 2025-12-14 17:16 UTC (permalink / raw)
  To: Nikolay Borisov
  Cc: x86, David Kaplan, H. Peter Anvin, Josh Poimboeuf,
	Sean Christopherson, Paolo Bonzini, Borislav Petkov, Dave Hansen,
	linux-kernel, kvm, Asit Mallick, Tao Zhang

On Wed, Dec 10, 2025 at 02:31:31PM +0200, Nikolay Borisov wrote:
> 
> 
> On 2.12.25 г. 8:19 ч., Pawan Gupta wrote:
> > As a mitigation for BHI, clear_bhb_loop() executes branches that overwrites
> > the Branch History Buffer (BHB). On Alder Lake and newer parts this
> > sequence is not sufficient because it doesn't clear enough entries. This
> > was not an issue because these CPUs have a hardware control (BHI_DIS_S)
> > that mitigates BHI in kernel.
> > 
> > BHI variant of VMSCAPE requires isolating branch history between guests and
> > userspace. Note that there is no equivalent hardware control for userspace.
> > To effectively isolate branch history on newer CPUs, clear_bhb_loop()
> > should execute sufficient number of branches to clear a larger BHB.
> > 
> > Dynamically set the loop count of clear_bhb_loop() such that it is
> > effective on newer CPUs too. Use the hardware control enumeration
> > X86_FEATURE_BHI_CTRL to select the appropriate loop count.
> > 
> > Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
> > Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
> > Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> 
> nit: My RB tag is incorrect, while I did agree with Dave's suggestion to
> have global variables for the loop counts I haven't' really seen the code so
> I couldn't have given my RB on something which I haven't seen but did agree
> with in principle.

The tag got applied from v4, but yes the patch got updated since:

https://lore.kernel.org/all/8b657ef2-d9a7-4424-987d-111beb477727@suse.com/

> Now that I have seen the code I'm willing to give my :
> 
> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>

Thanks.

> > ---
> >   arch/x86/entry/entry_64.S | 8 ++++++--
> >   1 file changed, 6 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> > index 886f86790b4467347031bc27d3d761d5cc286da1..9f6f4a7c5baf1fe4e3ab18b11e25e2fbcc77489d 100644
> > --- a/arch/x86/entry/entry_64.S
> > +++ b/arch/x86/entry/entry_64.S
> > @@ -1536,7 +1536,11 @@ SYM_FUNC_START(clear_bhb_loop)
> >   	ANNOTATE_NOENDBR
> >   	push	%rbp
> >   	mov	%rsp, %rbp
> > -	movl	$5, %ecx
> > +
> > +	/* loop count differs based on BHI_CTRL, see Intel's BHI guidance */
> > +	ALTERNATIVE "movl $5,  %ecx; movl $5, %edx",	\
> > +		    "movl $12, %ecx; movl $7, %edx", X86_FEATURE_BHI_CTRL
> 
> nit: Just

Will do:

	/* Just loop count differs based on BHI_CTRL, see Intel's BHI guidance */

> > +
> >   	ANNOTATE_INTRA_FUNCTION_CALL
> >   	call	1f
> >   	jmp	5f
> > @@ -1557,7 +1561,7 @@ SYM_FUNC_START(clear_bhb_loop)
> >   	 * but some Clang versions (e.g. 18) don't like this.
> >   	 */
> >   	.skip 32 - 18, 0xcc
> > -2:	movl	$5, %eax
> > +2:	movl	%edx, %eax
> >   3:	jmp	4f
> >   	nop
> >   4:	sub	$1, %eax
> > 
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v6 2/9] x86/bhi: Make clear_bhb_loop() effective on newer CPUs
  2025-12-10 13:35     ` David Laight
  2025-12-10 15:42       ` Nikolay Borisov
@ 2025-12-14 18:38       ` Pawan Gupta
  2025-12-14 19:02         ` David Laight
  1 sibling, 1 reply; 27+ messages in thread
From: Pawan Gupta @ 2025-12-14 18:38 UTC (permalink / raw)
  To: David Laight
  Cc: Nikolay Borisov, x86, David Kaplan, H. Peter Anvin,
	Josh Poimboeuf, Sean Christopherson, Paolo Bonzini,
	Borislav Petkov, Dave Hansen, linux-kernel, kvm, Asit Mallick,
	Tao Zhang

On Wed, Dec 10, 2025 at 01:35:42PM +0000, David Laight wrote:
> On Wed, 10 Dec 2025 14:31:31 +0200
> Nikolay Borisov <nik.borisov@suse.com> wrote:
> 
> > On 2.12.25 г. 8:19 ч., Pawan Gupta wrote:
> > > As a mitigation for BHI, clear_bhb_loop() executes branches that overwrites
> > > the Branch History Buffer (BHB). On Alder Lake and newer parts this
> > > sequence is not sufficient because it doesn't clear enough entries. This
> > > was not an issue because these CPUs have a hardware control (BHI_DIS_S)
> > > that mitigates BHI in kernel.
> > > 
> > > BHI variant of VMSCAPE requires isolating branch history between guests and
> > > userspace. Note that there is no equivalent hardware control for userspace.
> > > To effectively isolate branch history on newer CPUs, clear_bhb_loop()
> > > should execute sufficient number of branches to clear a larger BHB.
> > > 
> > > Dynamically set the loop count of clear_bhb_loop() such that it is
> > > effective on newer CPUs too. Use the hardware control enumeration
> > > X86_FEATURE_BHI_CTRL to select the appropriate loop count.
> > > 
> > > Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
> > > Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
> > > Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>  
> > 
> > nit: My RB tag is incorrect, while I did agree with Dave's suggestion to 
> > have global variables for the loop counts I haven't' really seen the 
> > code so I couldn't have given my RB on something which I haven't seen 
> > but did agree with in principle.
> 
> I thought the plan was to use global variables rather than ALTERNATIVE.
> The performance of this code is dominated by the loop.

Using globals was much more involved, requiring changes in at least 3 files.
The current ALTERNATIVE approach is much simpler and avoids the additional
handling needed to make sure that globals are set correctly for all
mitigation modes of BHI and VMSCAPE.

[ BTW, I am travelling on a vacation and will be intermittently checking my
  emails. ]

> I also found this code in arch/x86/net/bpf_jit_comp.c:
> 	if (cpu_feature_enabled(X86_FEATURE_CLEAR_BHB_LOOP)) {
> 		/* The clearing sequence clobbers eax and ecx. */
> 		EMIT1(0x50); /* push rax */
> 		EMIT1(0x51); /* push rcx */
> 		ip += 2;
> 
> 		func = (u8 *)clear_bhb_loop;
> 		ip += x86_call_depth_emit_accounting(&prog, func, ip);
> 
> 		if (emit_call(&prog, func, ip))
> 			return -EINVAL;
> 		EMIT1(0x59); /* pop rcx */
> 		EMIT1(0x58); /* pop rax */
> 	}
> which appears to assume that only rax and rcx are changed.
> Since all the counts are small, there is nothing stopping the code
> using the 8-bit registers %al, %ah, %cl and %ch.

Thanks for catching this.

> There are probably some schemes that only need one register.
> eg two separate ALTERNATIVE blocks.

Also, I think it is better to use a callee-saved register like rbx to avoid
callers having to save/restore registers. Something like below:

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 9f6f4a7c5baf..ca4a34ce314a 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1535,11 +1535,12 @@ SYM_CODE_END(rewind_stack_and_make_dead)
 SYM_FUNC_START(clear_bhb_loop)
 	ANNOTATE_NOENDBR
 	push	%rbp
+	push	%rbx
 	mov	%rsp, %rbp
 
 	/* loop count differs based on BHI_CTRL, see Intel's BHI guidance */
-	ALTERNATIVE "movl $5,  %ecx; movl $5, %edx",	\
-		    "movl $12, %ecx; movl $7, %edx", X86_FEATURE_BHI_CTRL
+	ALTERNATIVE "movb $5,  %bl",	\
+		    "movb $12, %bl", X86_FEATURE_BHI_CTRL
 
 	ANNOTATE_INTRA_FUNCTION_CALL
 	call	1f
@@ -1561,15 +1562,17 @@ SYM_FUNC_START(clear_bhb_loop)
 	 * but some Clang versions (e.g. 18) don't like this.
 	 */
 	.skip 32 - 18, 0xcc
-2:	movl	%edx, %eax
+2:	ALTERNATIVE "movb $5, %bh",	\
+		    "movb $7, %bh", X86_FEATURE_BHI_CTRL
 3:	jmp	4f
 	nop
-4:	sub	$1, %eax
+4:	sub	$1, %bh
 	jnz	3b
-	sub	$1, %ecx
+	sub	$1, %bl
 	jnz	1b
 .Lret2:	RET
 5:
+	pop	%rbx
 	pop	%rbp
 	RET
 SYM_FUNC_END(clear_bhb_loop)
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index c1ec14c55911..823b3f613774 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -1593,11 +1593,6 @@ static int emit_spectre_bhb_barrier(u8 **pprog, u8 *ip,
 	u8 *func;
 
 	if (cpu_feature_enabled(X86_FEATURE_CLEAR_BHB_LOOP)) {
-		/* The clearing sequence clobbers eax and ecx. */
-		EMIT1(0x50); /* push rax */
-		EMIT1(0x51); /* push rcx */
-		ip += 2;
-
 		func = (u8 *)clear_bhb_loop;
 		ip += x86_call_depth_emit_accounting(&prog, func, ip);
 
@@ -1605,8 +1600,6 @@ static int emit_spectre_bhb_barrier(u8 **pprog, u8 *ip,
 			return -EINVAL;
 		/* Don't speculate past this until BHB is cleared */
 		EMIT_LFENCE();
-		EMIT1(0x59); /* pop rcx */
-		EMIT1(0x58); /* pop rax */
 	}
 	/* Insert IBHF instruction */
 	if ((cpu_feature_enabled(X86_FEATURE_CLEAR_BHB_LOOP) &&

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v6 6/9] x86/vmscape: Use static_call() for predictor flush
  2025-12-11 10:50   ` Peter Zijlstra
@ 2025-12-14 18:45     ` Pawan Gupta
  0 siblings, 0 replies; 27+ messages in thread
From: Pawan Gupta @ 2025-12-14 18:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, David Kaplan, Nikolay Borisov, H. Peter Anvin,
	Josh Poimboeuf, Sean Christopherson, Paolo Bonzini,
	Borislav Petkov, Dave Hansen, linux-kernel, kvm, Asit Mallick,
	Tao Zhang

On Thu, Dec 11, 2025 at 11:50:50AM +0100, Peter Zijlstra wrote:
> On Mon, Dec 01, 2025 at 10:20:14PM -0800, Pawan Gupta wrote:
> > Adding more mitigation options at exit-to-userspace for VMSCAPE would
> > usually require a series of checks to decide which mitigation to use. In
> > this case, the mitigation is done by calling a function, which is decided
> > at boot. So, adding more feature flags and multiple checks can be avoided
> > by using static_call() to the mitigating function.
> > 
> > Replace the flag-based mitigation selector with a static_call(). This also
> > frees the existing X86_FEATURE_IBPB_EXIT_TO_USER.
> > 
> > Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
> > Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> > ---
> >  arch/x86/Kconfig                     | 1 +
> >  arch/x86/include/asm/cpufeatures.h   | 2 +-
> >  arch/x86/include/asm/entry-common.h  | 7 +++----
> >  arch/x86/include/asm/nospec-branch.h | 3 +++
> >  arch/x86/kernel/cpu/bugs.c           | 5 ++++-
> >  arch/x86/kvm/x86.c                   | 2 +-
> >  6 files changed, 13 insertions(+), 7 deletions(-)
> > 
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index fa3b616af03a2d50eaf5f922bc8cd4e08a284045..066f62f15e67e85fda0f3fd66acabad9a9794ff8 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -2706,6 +2706,7 @@ config MITIGATION_TSA
> >  config MITIGATION_VMSCAPE
> >  	bool "Mitigate VMSCAPE"
> >  	depends on KVM
> > +	select HAVE_STATIC_CALL
> 
> That can't be right.

Hmm, should be "depends on HAVE_STATIC_CALL". Will fix it, thanks.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v6 2/9] x86/bhi: Make clear_bhb_loop() effective on newer CPUs
  2025-12-14 18:38       ` Pawan Gupta
@ 2025-12-14 19:02         ` David Laight
  2025-12-15 18:01           ` Pawan Gupta
  0 siblings, 1 reply; 27+ messages in thread
From: David Laight @ 2025-12-14 19:02 UTC (permalink / raw)
  To: Pawan Gupta
  Cc: Nikolay Borisov, x86, David Kaplan, H. Peter Anvin,
	Josh Poimboeuf, Sean Christopherson, Paolo Bonzini,
	Borislav Petkov, Dave Hansen, linux-kernel, kvm, Asit Mallick,
	Tao Zhang

On Sun, 14 Dec 2025 10:38:27 -0800
Pawan Gupta <pawan.kumar.gupta@linux.intel.com> wrote:

> On Wed, Dec 10, 2025 at 01:35:42PM +0000, David Laight wrote:
> > On Wed, 10 Dec 2025 14:31:31 +0200
> > Nikolay Borisov <nik.borisov@suse.com> wrote:
> >   
> > > On 2.12.25 г. 8:19 ч., Pawan Gupta wrote:  
> > > > As a mitigation for BHI, clear_bhb_loop() executes branches that overwrite
> > > > the Branch History Buffer (BHB). On Alder Lake and newer parts this
> > > > sequence is not sufficient because it doesn't clear enough entries. This
> > > > was not an issue because these CPUs have a hardware control (BHI_DIS_S)
> > > > that mitigates BHI in kernel.
> > > > 
> > > > BHI variant of VMSCAPE requires isolating branch history between guests and
> > > > userspace. Note that there is no equivalent hardware control for userspace.
> > > > To effectively isolate branch history on newer CPUs, clear_bhb_loop()
> > > > should execute a sufficient number of branches to clear a larger BHB.
> > > > 
> > > > Dynamically set the loop count of clear_bhb_loop() such that it is
> > > > effective on newer CPUs too. Use the hardware control enumeration
> > > > X86_FEATURE_BHI_CTRL to select the appropriate loop count.
> > > > 
> > > > Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
> > > > Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
> > > > Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>    
> > > 
> > > nit: My RB tag is incorrect, while I did agree with Dave's suggestion to 
> > > > have global variables for the loop counts I haven't really seen the
> > > code so I couldn't have given my RB on something which I haven't seen 
> > > but did agree with in principle.  
> > 
> > I thought the plan was to use global variables rather than ALTERNATIVE.
> > The performance of this code is dominated by the loop.  
> 
> Using globals was much more involved, requiring changes in at least 3 files.
> The current ALTERNATIVE approach is much simpler and avoids additional
> handling to make sure that globals are set correctly for all mitigation
> modes of BHI and VMSCAPE.
> 
> [ BTW, I am travelling on a vacation and will be intermittently checking my
>   emails. ]
> 
> > I also found this code in arch/x86/net/bpf_jit_comp.c:
> > 	if (cpu_feature_enabled(X86_FEATURE_CLEAR_BHB_LOOP)) {
> > 		/* The clearing sequence clobbers eax and ecx. */
> > 		EMIT1(0x50); /* push rax */
> > 		EMIT1(0x51); /* push rcx */
> > 		ip += 2;
> > 
> > 		func = (u8 *)clear_bhb_loop;
> > 		ip += x86_call_depth_emit_accounting(&prog, func, ip);
> > 
> > 		if (emit_call(&prog, func, ip))
> > 			return -EINVAL;
> > 		EMIT1(0x59); /* pop rcx */
> > 		EMIT1(0x58); /* pop rax */
> > 	}
> > which appears to assume that only rax and rcx are changed.
> > Since all the counts are small, there is nothing stopping the code
> > using the 8-bit registers %al, %ah, %cl and %ch.  
> 
> Thanks for catching this.

I was trying to find where it was called from.
Failed to find the one on system call entry...

> > There are probably some schemes that only need one register.
> > eg two separate ALTERNATIVE blocks.  
> 
> Also, I think it is better to use a callee-saved register like rbx to avoid
> callers having to save/restore registers. Something like below:

I'm not sure.
%ax is the return value so can be 'trashed' by a normal function call.
But if the bpf code is saving %ax then it isn't expecting a normal call.
OTOH if you are going to save the register in clear_bhb_loop you might
as well use %ax to get the slightly shorter instructions for %al.
(I think 'movb' comes out shorter - as if it really matters.)

Definitely worth a comment that it must save all registers.

I also wonder if it needs to set up a stack frame?
Again, the code is so slow it won't matter.

	David


> 
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index 9f6f4a7c5baf..ca4a34ce314a 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -1535,11 +1535,12 @@ SYM_CODE_END(rewind_stack_and_make_dead)
>  SYM_FUNC_START(clear_bhb_loop)
>  	ANNOTATE_NOENDBR
>  	push	%rbp
> +	push	%rbx
>  	mov	%rsp, %rbp
>  
>  	/* loop count differs based on BHI_CTRL, see Intel's BHI guidance */
> -	ALTERNATIVE "movl $5,  %ecx; movl $5, %edx",	\
> -		    "movl $12, %ecx; movl $7, %edx", X86_FEATURE_BHI_CTRL
> +	ALTERNATIVE "movb $5,  %bl",	\
> +		    "movb $12, %bl", X86_FEATURE_BHI_CTRL
>  
>  	ANNOTATE_INTRA_FUNCTION_CALL
>  	call	1f
> @@ -1561,15 +1562,17 @@ SYM_FUNC_START(clear_bhb_loop)
>  	 * but some Clang versions (e.g. 18) don't like this.
>  	 */
>  	.skip 32 - 18, 0xcc
> -2:	movl	%edx, %eax
> +2:	ALTERNATIVE "movb $5, %bh",	\
> +		    "movb $7, %bh", X86_FEATURE_BHI_CTRL
>  3:	jmp	4f
>  	nop
> -4:	sub	$1, %eax
> +4:	sub	$1, %bh
>  	jnz	3b
> -	sub	$1, %ecx
> +	sub	$1, %bl
>  	jnz	1b
>  .Lret2:	RET
>  5:
> +	pop	%rbx
>  	pop	%rbp
>  	RET
>  SYM_FUNC_END(clear_bhb_loop)
> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> index c1ec14c55911..823b3f613774 100644
> --- a/arch/x86/net/bpf_jit_comp.c
> +++ b/arch/x86/net/bpf_jit_comp.c
> @@ -1593,11 +1593,6 @@ static int emit_spectre_bhb_barrier(u8 **pprog, u8 *ip,
>  	u8 *func;
>  
>  	if (cpu_feature_enabled(X86_FEATURE_CLEAR_BHB_LOOP)) {
> -		/* The clearing sequence clobbers eax and ecx. */
> -		EMIT1(0x50); /* push rax */
> -		EMIT1(0x51); /* push rcx */
> -		ip += 2;
> -
>  		func = (u8 *)clear_bhb_loop;
>  		ip += x86_call_depth_emit_accounting(&prog, func, ip);
>  
> @@ -1605,8 +1600,6 @@ static int emit_spectre_bhb_barrier(u8 **pprog, u8 *ip,
>  			return -EINVAL;
>  		/* Don't speculate past this until BHB is cleared */
>  		EMIT_LFENCE();
> -		EMIT1(0x59); /* pop rcx */
> -		EMIT1(0x58); /* pop rax */
>  	}
>  	/* Insert IBHF instruction */
>  	if ((cpu_feature_enabled(X86_FEATURE_CLEAR_BHB_LOOP) &&


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v6 2/9] x86/bhi: Make clear_bhb_loop() effective on newer CPUs
  2025-12-14 19:02         ` David Laight
@ 2025-12-15 18:01           ` Pawan Gupta
  2025-12-15 21:05             ` David Laight
  0 siblings, 1 reply; 27+ messages in thread
From: Pawan Gupta @ 2025-12-15 18:01 UTC (permalink / raw)
  To: David Laight
  Cc: Nikolay Borisov, x86, David Kaplan, H. Peter Anvin,
	Josh Poimboeuf, Sean Christopherson, Paolo Bonzini,
	Borislav Petkov, Dave Hansen, linux-kernel, kvm, Asit Mallick,
	Tao Zhang

On Sun, Dec 14, 2025 at 07:02:33PM +0000, David Laight wrote:
> On Sun, 14 Dec 2025 10:38:27 -0800
> Pawan Gupta <pawan.kumar.gupta@linux.intel.com> wrote:
> 
> > On Wed, Dec 10, 2025 at 01:35:42PM +0000, David Laight wrote:
> > > On Wed, 10 Dec 2025 14:31:31 +0200
> > > Nikolay Borisov <nik.borisov@suse.com> wrote:
> > >   
> > > > On 2.12.25 г. 8:19 ч., Pawan Gupta wrote:  
> > > > > > As a mitigation for BHI, clear_bhb_loop() executes branches that overwrite
> > > > > the Branch History Buffer (BHB). On Alder Lake and newer parts this
> > > > > sequence is not sufficient because it doesn't clear enough entries. This
> > > > > was not an issue because these CPUs have a hardware control (BHI_DIS_S)
> > > > > that mitigates BHI in kernel.
> > > > > 
> > > > > BHI variant of VMSCAPE requires isolating branch history between guests and
> > > > > userspace. Note that there is no equivalent hardware control for userspace.
> > > > > To effectively isolate branch history on newer CPUs, clear_bhb_loop()
> > > > > > should execute a sufficient number of branches to clear a larger BHB.
> > > > > 
> > > > > Dynamically set the loop count of clear_bhb_loop() such that it is
> > > > > effective on newer CPUs too. Use the hardware control enumeration
> > > > > X86_FEATURE_BHI_CTRL to select the appropriate loop count.
> > > > > 
> > > > > Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
> > > > > Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
> > > > > Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>    
> > > > 
> > > > nit: My RB tag is incorrect, while I did agree with Dave's suggestion to 
> > > > > have global variables for the loop counts I haven't really seen the
> > > > code so I couldn't have given my RB on something which I haven't seen 
> > > > but did agree with in principle.  
> > > 
> > > I thought the plan was to use global variables rather than ALTERNATIVE.
> > > The performance of this code is dominated by the loop.  
> > 
> > Using globals was much more involved, requiring changes in at least 3 files.
> > The current ALTERNATIVE approach is much simpler and avoids additional
> > handling to make sure that globals are set correctly for all mitigation
> > modes of BHI and VMSCAPE.
> > 
> > [ BTW, I am travelling on a vacation and will be intermittently checking my
> >   emails. ]
> > 
> > > I also found this code in arch/x86/net/bpf_jit_comp.c:
> > > 	if (cpu_feature_enabled(X86_FEATURE_CLEAR_BHB_LOOP)) {
> > > 		/* The clearing sequence clobbers eax and ecx. */
> > > 		EMIT1(0x50); /* push rax */
> > > 		EMIT1(0x51); /* push rcx */
> > > 		ip += 2;
> > > 
> > > 		func = (u8 *)clear_bhb_loop;
> > > 		ip += x86_call_depth_emit_accounting(&prog, func, ip);
> > > 
> > > 		if (emit_call(&prog, func, ip))
> > > 			return -EINVAL;
> > > 		EMIT1(0x59); /* pop rcx */
> > > 		EMIT1(0x58); /* pop rax */
> > > 	}
> > > which appears to assume that only rax and rcx are changed.
> > > Since all the counts are small, there is nothing stopping the code
> > > using the 8-bit registers %al, %ah, %cl and %ch.  
> > 
> > Thanks for catching this.
> 
> I was trying to find where it was called from.
> Failed to find the one on system call entry...

The macro CLEAR_BRANCH_HISTORY calls clear_bhb_loop() at system call entry.

> > > There are probably some schemes that only need one register.
> > > eg two separate ALTERNATIVE blocks.  
> > 
> > Also, I think it is better to use a callee-saved register like rbx to avoid
> > callers having to save/restore registers. Something like below:
> 
> I'm not sure.
> %ax is the return value so can be 'trashed' by a normal function call.
> But if the bpf code is saving %ax then it isn't expecting a normal call.

The BHB clear sequence is executed at the end of the BPF JITted code, and %rax
is likely the return value of the BPF program. So, saving/restoring %rax
around the sequence makes sense to me.

> OTOH if you are going to save the register in clear_bhb_loop you might
> as well use %ax to get the slightly shorter instructions for %al.
> (I think 'movb' comes out shorter - as if it really matters.)

%rbx is a callee-saved register so it felt more intuitive to save/restore
it in clear_bhb_loop(). But, I can use %ax if you feel strongly.

> Definitely worth a comment that it must save all resisters.

Yes, will add a comment.

> I also wonder if it needs to setup a stack frame?

I don't know if that's necessary, objtool doesn't complain because
clear_bhb_loop() is marked STACK_FRAME_NON_STANDARD.

> Again, the code is so slow it won't matter.
> 
> 	David

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v6 2/9] x86/bhi: Make clear_bhb_loop() effective on newer CPUs
  2025-12-15 18:01           ` Pawan Gupta
@ 2025-12-15 21:05             ` David Laight
  0 siblings, 0 replies; 27+ messages in thread
From: David Laight @ 2025-12-15 21:05 UTC (permalink / raw)
  To: Pawan Gupta
  Cc: Nikolay Borisov, x86, David Kaplan, H. Peter Anvin,
	Josh Poimboeuf, Sean Christopherson, Paolo Bonzini,
	Borislav Petkov, Dave Hansen, linux-kernel, kvm, Asit Mallick,
	Tao Zhang

On Mon, 15 Dec 2025 10:01:36 -0800
Pawan Gupta <pawan.kumar.gupta@linux.intel.com> wrote:

> On Sun, Dec 14, 2025 at 07:02:33PM +0000, David Laight wrote:
> > On Sun, 14 Dec 2025 10:38:27 -0800
> > Pawan Gupta <pawan.kumar.gupta@linux.intel.com> wrote:
> >   
> > > On Wed, Dec 10, 2025 at 01:35:42PM +0000, David Laight wrote:  
> > > > On Wed, 10 Dec 2025 14:31:31 +0200
> > > > Nikolay Borisov <nik.borisov@suse.com> wrote:
> > > >     
> > > > > On 2.12.25 г. 8:19 ч., Pawan Gupta wrote:    
> > > > > > > As a mitigation for BHI, clear_bhb_loop() executes branches that overwrite
> > > > > > the Branch History Buffer (BHB). On Alder Lake and newer parts this
> > > > > > sequence is not sufficient because it doesn't clear enough entries. This
> > > > > > was not an issue because these CPUs have a hardware control (BHI_DIS_S)
> > > > > > that mitigates BHI in kernel.
> > > > > > 
> > > > > > BHI variant of VMSCAPE requires isolating branch history between guests and
> > > > > > userspace. Note that there is no equivalent hardware control for userspace.
> > > > > > To effectively isolate branch history on newer CPUs, clear_bhb_loop()
> > > > > > > should execute a sufficient number of branches to clear a larger BHB.
> > > > > > 
> > > > > > Dynamically set the loop count of clear_bhb_loop() such that it is
> > > > > > effective on newer CPUs too. Use the hardware control enumeration
> > > > > > X86_FEATURE_BHI_CTRL to select the appropriate loop count.
> > > > > > 
> > > > > > Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
> > > > > > Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
> > > > > > Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>      
> > > > > 
> > > > > nit: My RB tag is incorrect, while I did agree with Dave's suggestion to 
> > > > > > have global variables for the loop counts I haven't really seen the
> > > > > code so I couldn't have given my RB on something which I haven't seen 
> > > > > but did agree with in principle.    
> > > > 
> > > > I thought the plan was to use global variables rather than ALTERNATIVE.
> > > > The performance of this code is dominated by the loop.    
> > > 
> > > Using globals was much more involved, requiring changes in at least 3 files.
> > > The current ALTERNATIVE approach is much simpler and avoids additional
> > > handling to make sure that globals are set correctly for all mitigation
> > > modes of BHI and VMSCAPE.
> > > 
> > > [ BTW, I am travelling on a vacation and will be intermittently checking my
> > >   emails. ]
> > >   
> > > > I also found this code in arch/x86/net/bpf_jit_comp.c:
> > > > 	if (cpu_feature_enabled(X86_FEATURE_CLEAR_BHB_LOOP)) {
> > > > 		/* The clearing sequence clobbers eax and ecx. */
> > > > 		EMIT1(0x50); /* push rax */
> > > > 		EMIT1(0x51); /* push rcx */
> > > > 		ip += 2;
> > > > 
> > > > 		func = (u8 *)clear_bhb_loop;
> > > > 		ip += x86_call_depth_emit_accounting(&prog, func, ip);
> > > > 
> > > > 		if (emit_call(&prog, func, ip))
> > > > 			return -EINVAL;
> > > > 		EMIT1(0x59); /* pop rcx */
> > > > 		EMIT1(0x58); /* pop rax */
> > > > 	}
> > > > which appears to assume that only rax and rcx are changed.
> > > > Since all the counts are small, there is nothing stopping the code
> > > > using the 8-bit registers %al, %ah, %cl and %ch.    
> > > 
> > > Thanks for catching this.  
> > 
> > I was trying to find where it was called from.
> > Failed to find the one on system call entry...  
> 
> The macro CLEAR_BRANCH_HISTORY calls clear_bhb_loop() at system call entry.

I didn't look very hard :-)

> 
> > > > There are probably some schemes that only need one register.
> > > > eg two separate ALTERNATIVE blocks.    
> > > 
> > > Also, I think it is better to use a callee-saved register like rbx to avoid
> > > callers having to save/restore registers. Something like below:  
> > 
> > I'm not sure.
> > %ax is the return value so can be 'trashed' by a normal function call.
> > But if the bpf code is saving %ax then it isn't expecting a normal call.  
> 
> BHB clear sequence is executed at the end of the BPF JITted code, and %rax
> is likely the return value of the BPF program. So, saving/restoring %rax
> around the sequence makes sense to me.
> 
> > OTOH if you are going to save the register in clear_bhb_loop you might
> > as well use %ax to get the slightly shorter instructions for %al.
> > (I think 'movb' comes out shorter - as if it really matters.)  
> 
> %rbx is a callee-saved register so it felt more intuitive to save/restore
> it in clear_bhb_loop(). But, I can use %ax if you feel strongly.

If you are going to save a register it might as well be %ax.
Otherwise someone will wonder why you picked a different one.

> 
> > Definitely worth a comment that it must save all registers.
> 
> Yes, will add a comment.
> 
> > I also wonder if it needs to set up a stack frame?
> 
> I don't know if that's necessary, objtool doesn't complain because
> clear_bhb_loop() is marked STACK_FRAME_NON_STANDARD.

In some senses it is a leaf function - and the compiler doesn't create
stack frames for those (by default).

Provided objtool isn't confused by all the call instructions it probably
doesn't matter.

	David

> 
> > Again, the code is so slow it won't matter.
> > 
> > 	David  


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v6 1/9] x86/bhi: x86/vmscape: Move LFENCE out of clear_bhb_loop()
  2025-12-02  6:18 ` [PATCH v6 1/9] x86/bhi: x86/vmscape: Move LFENCE out of clear_bhb_loop() Pawan Gupta
@ 2026-01-01 12:51   ` Borislav Petkov
  2026-01-06  4:29     ` Pawan Gupta
  0 siblings, 1 reply; 27+ messages in thread
From: Borislav Petkov @ 2026-01-01 12:51 UTC (permalink / raw)
  To: Pawan Gupta
  Cc: x86, David Kaplan, Nikolay Borisov, H. Peter Anvin,
	Josh Poimboeuf, Sean Christopherson, Paolo Bonzini, Dave Hansen,
	linux-kernel, kvm, Asit Mallick, Tao Zhang

On Mon, Dec 01, 2025 at 10:18:59PM -0800, Pawan Gupta wrote:
> In preparation for adding the support for BHB sequence (without LFENCE) on
> newer CPUs, move the LFENCE to the caller side after clear_bhb_loop() is
> executed. This allows callers to decide whether they need the LFENCE or

s/This allows/Allow/

> not. This does adds a few extra bytes to the call sites, but it obviates

s/This does adds/This adds/

> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index ed04a968cc7d0095ab0185b2e3b5beffb7680afd..886f86790b4467347031bc27d3d761d5cc286da1 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -1528,6 +1528,9 @@ SYM_CODE_END(rewind_stack_and_make_dead)
>   * refactored in the future if needed. The .skips are for safety, to ensure
>   * that all RETs are in the second half of a cacheline to mitigate Indirect
>   * Target Selection, rather than taking the slowpath via its_return_thunk.
> + *
> + * Note, callers should use a speculation barrier like LFENCE immediately after
> + * a call to this function to ensure BHB is cleared before indirect branches.
>   */

Comments do get missed. So, I'd call the function clear_bhb_loop_unfenced or
something to that effect so that it is perfectly clear that !BHI_DIS_S parts
will need the LFENCE at the end. This way it is in the name and should make
people think what they're calling. I'd hope...

>  SYM_FUNC_START(clear_bhb_loop)
>  	ANNOTATE_NOENDBR

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v6 1/9] x86/bhi: x86/vmscape: Move LFENCE out of clear_bhb_loop()
  2026-01-01 12:51   ` Borislav Petkov
@ 2026-01-06  4:29     ` Pawan Gupta
  0 siblings, 0 replies; 27+ messages in thread
From: Pawan Gupta @ 2026-01-06  4:29 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: x86, David Kaplan, Nikolay Borisov, H. Peter Anvin,
	Josh Poimboeuf, Sean Christopherson, Paolo Bonzini, Dave Hansen,
	linux-kernel, kvm, Asit Mallick, Tao Zhang

On Thu, Jan 01, 2026 at 01:51:22PM +0100, Borislav Petkov wrote:
> On Mon, Dec 01, 2025 at 10:18:59PM -0800, Pawan Gupta wrote:
> > In preparation for adding the support for BHB sequence (without LFENCE) on
> > newer CPUs, move the LFENCE to the caller side after clear_bhb_loop() is
> > executed. This allows callers to decide whether they need the LFENCE or
> 
> s/This allows/Allow/

> > not. This does adds a few extra bytes to the call sites, but it obviates
> 
> s/This does adds/This adds/

Ok.

> > diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> > index ed04a968cc7d0095ab0185b2e3b5beffb7680afd..886f86790b4467347031bc27d3d761d5cc286da1 100644
> > --- a/arch/x86/entry/entry_64.S
> > +++ b/arch/x86/entry/entry_64.S
> > @@ -1528,6 +1528,9 @@ SYM_CODE_END(rewind_stack_and_make_dead)
> >   * refactored in the future if needed. The .skips are for safety, to ensure
> >   * that all RETs are in the second half of a cacheline to mitigate Indirect
> >   * Target Selection, rather than taking the slowpath via its_return_thunk.
> > + *
> > + * Note, callers should use a speculation barrier like LFENCE immediately after
> > + * a call to this function to ensure BHB is cleared before indirect branches.
> >   */
> 
> Comments do get missed. So, I'd call the function clear_bhb_loop_unfenced or
> something to that effect so that it is perfectly clear that !BHI_DIS_S parts
> will need the LFENCE at the end. This way it is in the name and should make
> people think what they're calling. I'd hope...

Sure, renaming this to clear_bhb_loop_nofence() in a separate patch.

Will send v7 after some testing.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v6 2/9] x86/bhi: Make clear_bhb_loop() effective on newer CPUs
  2025-12-02  6:19 ` [PATCH v6 2/9] x86/bhi: Make clear_bhb_loop() effective on newer CPUs Pawan Gupta
  2025-12-10 12:31   ` Nikolay Borisov
@ 2026-01-24 19:34   ` Borislav Petkov
  2026-03-05  0:41     ` Pawan Gupta
  1 sibling, 1 reply; 27+ messages in thread
From: Borislav Petkov @ 2026-01-24 19:34 UTC (permalink / raw)
  To: Pawan Gupta
  Cc: x86, David Kaplan, Nikolay Borisov, H. Peter Anvin,
	Josh Poimboeuf, Sean Christopherson, Paolo Bonzini, Dave Hansen,
	linux-kernel, kvm, Asit Mallick, Tao Zhang

On Mon, Dec 01, 2025 at 10:19:14PM -0800, Pawan Gupta wrote:
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index 886f86790b4467347031bc27d3d761d5cc286da1..9f6f4a7c5baf1fe4e3ab18b11e25e2fbcc77489d 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -1536,7 +1536,11 @@ SYM_FUNC_START(clear_bhb_loop)
>  	ANNOTATE_NOENDBR
>  	push	%rbp
>  	mov	%rsp, %rbp
> -	movl	$5, %ecx
> +
> +	/* loop count differs based on BHI_CTRL, see Intel's BHI guidance */
> +	ALTERNATIVE "movl $5,  %ecx; movl $5, %edx",	\
> +		    "movl $12, %ecx; movl $7, %edx", X86_FEATURE_BHI_CTRL

Why isn't this written like this:

in C:

clear_bhb_loop:

	if (cpu_feature_enabled(X86_FEATURE_BHI_CTRL))
		__clear_bhb_loop(12, 7);
	else
		__clear_bhb_loop(5, 5);

and then the __-version is asm and it gets those two arguments from %rdi, and
%rsi instead of more hard-coded, error-prone registers diddling alternative
gunk?

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v6 2/9] x86/bhi: Make clear_bhb_loop() effective on newer CPUs
  2026-01-24 19:34   ` Borislav Petkov
@ 2026-03-05  0:41     ` Pawan Gupta
  0 siblings, 0 replies; 27+ messages in thread
From: Pawan Gupta @ 2026-03-05  0:41 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: x86, David Kaplan, Nikolay Borisov, H. Peter Anvin,
	Josh Poimboeuf, Sean Christopherson, Paolo Bonzini, Dave Hansen,
	linux-kernel, kvm, Asit Mallick, Tao Zhang

First of all, apologies for not responding to this and many other emails I
still need to read. (For the past few months I was off work and have been
dealing with a personal emergency. Now that's over, I am catching up with
the pending stuff.)

On Sat, Jan 24, 2026 at 08:34:18PM +0100, Borislav Petkov wrote:
> On Mon, Dec 01, 2025 at 10:19:14PM -0800, Pawan Gupta wrote:
> > diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> > index 886f86790b4467347031bc27d3d761d5cc286da1..9f6f4a7c5baf1fe4e3ab18b11e25e2fbcc77489d 100644
> > --- a/arch/x86/entry/entry_64.S
> > +++ b/arch/x86/entry/entry_64.S
> > @@ -1536,7 +1536,11 @@ SYM_FUNC_START(clear_bhb_loop)
> >  	ANNOTATE_NOENDBR
> >  	push	%rbp
> >  	mov	%rsp, %rbp
> > -	movl	$5, %ecx
> > +
> > +	/* loop count differs based on BHI_CTRL, see Intel's BHI guidance */
> > +	ALTERNATIVE "movl $5,  %ecx; movl $5, %edx",	\
> > +		    "movl $12, %ecx; movl $7, %edx", X86_FEATURE_BHI_CTRL
> 
> Why isn't this written like this:
> 
> in C:
> 
> clear_bhb_loop:
> 
> 	if (cpu_feature_enabled(X86_FEATURE_BHI_CTRL))
> 		__clear_bhb_loop(12, 7);
> 	else
> 		__clear_bhb_loop(5, 5);
> 
> and then the __-version is asm and it gets those two arguments from %rdi, and
> %rsi instead of more hard-coded, error-prone registers diddling alternative
> gunk?

This would require CLEAR_BRANCH_HISTORY to move the hard-coded arguments
into registers, which doesn't look pretty:

.macro CLEAR_BRANCH_HISTORY
	ALTERNATIVE "movq $5,  %rdi; movq $5, %rsi",		\
		    "movq $12, %rdi; movq $7, %rsi", X86_FEATURE_BHI_CTRL

	ALTERNATIVE "", "call clear_bhb_loop; lfence", X86_FEATURE_CLEAR_BHB_LOOP
.endm

I don't think we can avoid the register diddling one way or the other. Also
it is best if the loop count stays within clear_bhb_loop(), so that at least
the callsites can stay clean and don't have to worry about the magic number
arguments.

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2026-03-05  0:42 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-12-02  6:18 [PATCH v6 0/9] VMSCAPE optimization for BHI variant Pawan Gupta
2025-12-02  6:18 ` [PATCH v6 1/9] x86/bhi: x86/vmscape: Move LFENCE out of clear_bhb_loop() Pawan Gupta
2026-01-01 12:51   ` Borislav Petkov
2026-01-06  4:29     ` Pawan Gupta
2025-12-02  6:19 ` [PATCH v6 2/9] x86/bhi: Make clear_bhb_loop() effective on newer CPUs Pawan Gupta
2025-12-10 12:31   ` Nikolay Borisov
2025-12-10 13:35     ` David Laight
2025-12-10 15:42       ` Nikolay Borisov
2025-12-14 18:38       ` Pawan Gupta
2025-12-14 19:02         ` David Laight
2025-12-15 18:01           ` Pawan Gupta
2025-12-15 21:05             ` David Laight
2025-12-14 17:16     ` Pawan Gupta
2026-01-24 19:34   ` Borislav Petkov
2026-03-05  0:41     ` Pawan Gupta
2025-12-02  6:19 ` [PATCH v6 3/9] x86/vmscape: Rename x86_ibpb_exit_to_user to x86_predictor_flush_exit_to_user Pawan Gupta
2025-12-02  6:19 ` [PATCH v6 4/9] x86/vmscape: Move mitigation selection to a switch() Pawan Gupta
2025-12-10 16:15   ` Nikolay Borisov
2025-12-02  6:19 ` [PATCH v6 5/9] x86/vmscape: Use write_ibpb() instead of indirect_branch_prediction_barrier() Pawan Gupta
2025-12-02  6:20 ` [PATCH v6 6/9] x86/vmscape: Use static_call() for predictor flush Pawan Gupta
2025-12-11 10:06   ` Nikolay Borisov
2025-12-11 10:50   ` Peter Zijlstra
2025-12-14 18:45     ` Pawan Gupta
2025-12-02  6:20 ` [PATCH v6 7/9] x86/vmscape: Deploy BHB clearing mitigation Pawan Gupta
2025-12-11 14:26   ` Nikolay Borisov
2025-12-02  6:20 ` [PATCH v6 8/9] x86/vmscape: Fix conflicting attack-vector controls with =force Pawan Gupta
2025-12-02  6:21 ` [PATCH v6 9/9] x86/vmscape: Add cmdline vmscape=on to override attack vector controls Pawan Gupta

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox