* [PATCH v10 01/12] x86/bhi: x86/vmscape: Move LFENCE out of clear_bhb_loop()
From: Pawan Gupta @ 2026-04-14 7:05 UTC
To: x86, Jon Kohler, Nikolay Borisov, H. Peter Anvin, Josh Poimboeuf,
David Kaplan, Sean Christopherson, Borislav Petkov, Dave Hansen,
Peter Zijlstra, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, KP Singh, Jiri Olsa, David S. Miller,
David Laight, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
David Ahern, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, John Fastabend, Stanislav Fomichev, Hao Luo,
Paolo Bonzini, Jonathan Corbet
Cc: linux-kernel, kvm, Asit Mallick, Tao Zhang, bpf, netdev,
linux-doc
In-Reply-To: <20260414-vmscape-bhb-v10-0-efa924abae5f@linux.intel.com>
Currently, the BHB clearing sequence is followed by an LFENCE to prevent
subsequent indirect branches from executing transiently before the BHB is
cleared. However, the LFENCE barrier can be unnecessary in certain cases,
for example when the kernel is using the BHI_DIS_S mitigation and BHB
clearing is only needed for userspace. In such cases, the LFENCE is
redundant because ring transitions provide the necessary serialization.
Below is a quick recap of BHI mitigation options:
On Alder Lake and newer:
- BHI_DIS_S: Hardware control to mitigate BHI in ring0. This has low
  performance overhead.
- Long loop: Alternatively, a longer version of the BHB clearing sequence
  can be used to mitigate BHI. It can also be used to mitigate the BHI
  variant of VMSCAPE. This is not yet implemented in Linux.
On older CPUs:
- Short loop: Clears BHB at kernel entry and VMexit. The "Long loop" is
  effective on older CPUs as well, but should be avoided because of
  unnecessary overhead.
On Alder Lake and newer CPUs, eIBRS isolates the indirect targets between
guest and host. But when affected by the BHI variant of VMSCAPE, a guest's
branch history may still influence indirect branches in userspace. This
also means the big hammer IBPB could be replaced with a cheaper option that
clears the BHB at exit-to-userspace after a VMexit.
In preparation for adding the support for the BHB sequence (without LFENCE)
on newer CPUs, move the LFENCE to the caller side after clear_bhb_loop() is
executed. Allow callers to decide whether they need the LFENCE or not. This
adds a few extra bytes to the call sites, but it obviates the need for
multiple variants of clear_bhb_loop().
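The caller-side arrangement can be illustrated with a minimal userspace
sketch in C. This is only a model under stated assumptions, not the kernel
implementation: clear_bhb_model(), speculation_barrier(),
entry_path_model(), exit_to_user_path_model() and the barriers_issued
counter are hypothetical names, and the branch sequence itself is reduced
to a stub.

```c
#include <assert.h>

/* Hypothetical counter so the model can be checked; not part of the patch. */
static int barriers_issued;

/* Stand-in for the LFENCE that call sites now issue themselves. */
static void speculation_barrier(void)
{
#if defined(__GNUC__) && defined(__x86_64__)
	__asm__ volatile("lfence" ::: "memory");
#endif
	barriers_issued++;
}

/* Stub for the BHB clearing sequence; in this model it ends without a barrier. */
static void clear_bhb_model(void)
{
	/* The branch sequence that overwrites the BHB would run here. */
}

/* Models a call site that reaches indirect branches next: barrier issued. */
static void entry_path_model(void)
{
	clear_bhb_model();
	speculation_barrier();
}

/* Models a call site followed by a ring transition: barrier left out. */
static void exit_to_user_path_model(void)
{
	clear_bhb_model();
}
```

Only the first call site pays for the barrier; both share one copy of the
clearing sequence, which is the trade-off described above.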
Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Jon Kohler <jon@nutanix.com>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
arch/x86/entry/entry_64.S | 5 ++++-
arch/x86/include/asm/nospec-branch.h | 4 ++--
arch/x86/net/bpf_jit_comp.c | 2 ++
3 files changed, 8 insertions(+), 3 deletions(-)
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 42447b1e1dff..3a180a36ca0e 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1528,6 +1528,9 @@ SYM_CODE_END(rewind_stack_and_make_dead)
* refactored in the future if needed. The .skips are for safety, to ensure
* that all RETs are in the second half of a cacheline to mitigate Indirect
* Target Selection, rather than taking the slowpath via its_return_thunk.
+ *
+ * Note, callers should use a speculation barrier like LFENCE immediately after
+ * a call to this function to ensure BHB is cleared before indirect branches.
*/
SYM_FUNC_START(clear_bhb_loop)
ANNOTATE_NOENDBR
@@ -1562,7 +1565,7 @@ SYM_FUNC_START(clear_bhb_loop)
sub $1, %ecx
jnz 1b
.Lret2: RET
-5: lfence
+5:
pop %rbp
RET
SYM_FUNC_END(clear_bhb_loop)
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 4f4b5e8a1574..70b377fcbc1c 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -331,11 +331,11 @@
#ifdef CONFIG_X86_64
.macro CLEAR_BRANCH_HISTORY
- ALTERNATIVE "", "call clear_bhb_loop", X86_FEATURE_CLEAR_BHB_LOOP
+ ALTERNATIVE "", "call clear_bhb_loop; lfence", X86_FEATURE_CLEAR_BHB_LOOP
.endm
.macro CLEAR_BRANCH_HISTORY_VMEXIT
- ALTERNATIVE "", "call clear_bhb_loop", X86_FEATURE_CLEAR_BHB_VMEXIT
+ ALTERNATIVE "", "call clear_bhb_loop; lfence", X86_FEATURE_CLEAR_BHB_VMEXIT
.endm
#else
#define CLEAR_BRANCH_HISTORY
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index e9b78040d703..63d6c9fa5e80 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -1624,6 +1624,8 @@ static int emit_spectre_bhb_barrier(u8 **pprog, u8 *ip,
if (emit_call(&prog, func, ip))
return -EINVAL;
+ /* Don't speculate past this until BHB is cleared */
+ EMIT_LFENCE();
EMIT1(0x59); /* pop rcx */
EMIT1(0x58); /* pop rax */
}
--
2.34.1
* [PATCH v10 02/12] x86/bhi: Make clear_bhb_loop() effective on newer CPUs
From: Pawan Gupta @ 2026-04-14 7:05 UTC
To: x86, Jon Kohler, Nikolay Borisov, H. Peter Anvin, Josh Poimboeuf,
David Kaplan, Sean Christopherson, Borislav Petkov, Dave Hansen,
Peter Zijlstra, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, KP Singh, Jiri Olsa, David S. Miller,
David Laight, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
David Ahern, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, John Fastabend, Stanislav Fomichev, Hao Luo,
Paolo Bonzini, Jonathan Corbet
Cc: linux-kernel, kvm, Asit Mallick, Tao Zhang, bpf, netdev,
linux-doc
In-Reply-To: <20260414-vmscape-bhb-v10-0-efa924abae5f@linux.intel.com>
As a mitigation for BHI, clear_bhb_loop() executes branches that overwrite
the Branch History Buffer (BHB). On Alder Lake and newer parts this
sequence is not sufficient because it doesn't clear enough entries. This
was not an issue because these CPUs use the BHI_DIS_S hardware mitigation
in the kernel.
Now with VMSCAPE (BHI variant) it is also required to isolate branch
history between guests and userspace. Since BHI_DIS_S only protects the
kernel, the newer CPUs also use IBPB.
A cheaper alternative to the current IBPB mitigation is clear_bhb_loop().
But it currently does not clear enough BHB entries to be effective on newer
CPUs with larger BHB. At boot, dynamically set the loop count of
clear_bhb_loop() such that it is effective on newer CPUs too.
Introduce global loop counts, initializing them with appropriate values
based on the hardware feature X86_FEATURE_BHI_CTRL.
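As a rough illustration of what the two sets of counts mean, here is a
userspace sketch in C. It only models the selection and the loop nesting;
select_bhb_loop_counts() and bhb_inner_iterations() are hypothetical
helpers, and the has_bhi_ctrl argument stands in for the
X86_FEATURE_BHI_CTRL check. The counts are the ones used in this patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Defaults correspond to the short sequence. */
static unsigned char bhb_seq_outer_loop = 5;
static unsigned char bhb_seq_inner_loop = 5;

/* Hypothetical helper: models the boot-time choice keyed on BHI_CTRL. */
static void select_bhb_loop_counts(bool has_bhi_ctrl)
{
	if (has_bhi_ctrl) {
		/* Long sequence for CPUs with a larger BHB. */
		bhb_seq_outer_loop = 12;
		bhb_seq_inner_loop = 7;
	} else {
		bhb_seq_outer_loop = 5;
		bhb_seq_inner_loop = 5;
	}
}

/* Counts inner-loop iterations using the same nesting as the sequence. */
static unsigned int bhb_inner_iterations(void)
{
	unsigned int total = 0;

	for (unsigned int outer = bhb_seq_outer_loop; outer != 0; outer--)
		for (unsigned int inner = bhb_seq_inner_loop; inner != 0; inner--)
			total++;

	return total;
}
```

In this model the short sequence runs 5 * 5 = 25 inner iterations and the
long one 12 * 7 = 84, which is why the long sequence is avoided where the
short one is sufficient.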
Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
arch/x86/entry/entry_64.S | 8 +++++---
arch/x86/include/asm/nospec-branch.h | 2 ++
arch/x86/kernel/cpu/bugs.c | 13 +++++++++++++
3 files changed, 20 insertions(+), 3 deletions(-)
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 3a180a36ca0e..bbd4b1c7ec04 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1536,7 +1536,9 @@ SYM_FUNC_START(clear_bhb_loop)
ANNOTATE_NOENDBR
push %rbp
mov %rsp, %rbp
- movl $5, %ecx
+
+ movzbl bhb_seq_outer_loop(%rip), %ecx
+
ANNOTATE_INTRA_FUNCTION_CALL
call 1f
jmp 5f
@@ -1556,8 +1558,8 @@ SYM_FUNC_START(clear_bhb_loop)
* This should be ideally be: .skip 32 - (.Lret2 - 2f), 0xcc
* but some Clang versions (e.g. 18) don't like this.
*/
- .skip 32 - 18, 0xcc
-2: movl $5, %eax
+ .skip 32 - 20, 0xcc
+2: movzbl bhb_seq_inner_loop(%rip), %eax
3: jmp 4f
nop
4: sub $1, %eax
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 70b377fcbc1c..87b83ae7c97f 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -548,6 +548,8 @@ DECLARE_PER_CPU(u64, x86_spec_ctrl_current);
extern void update_spec_ctrl_cond(u64 val);
extern u64 spec_ctrl_current(void);
+extern u8 bhb_seq_inner_loop, bhb_seq_outer_loop;
+
/*
* With retpoline, we must use IBRS to restrict branch prediction
* before calling into firmware.
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 83f51cab0b1e..2cb4a96247d8 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -2047,6 +2047,10 @@ enum bhi_mitigations {
static enum bhi_mitigations bhi_mitigation __ro_after_init =
IS_ENABLED(CONFIG_MITIGATION_SPECTRE_BHI) ? BHI_MITIGATION_AUTO : BHI_MITIGATION_OFF;
+/* Default to short BHB sequence values */
+u8 bhb_seq_outer_loop __ro_after_init = 5;
+u8 bhb_seq_inner_loop __ro_after_init = 5;
+
static int __init spectre_bhi_parse_cmdline(char *str)
{
if (!str)
@@ -3242,6 +3246,15 @@ void __init cpu_select_mitigations(void)
x86_spec_ctrl_base &= ~SPEC_CTRL_MITIGATIONS_MASK;
}
+ /*
+ * Switch to long BHB clear sequence on newer CPUs (with BHI_CTRL
+ * support), see Intel's BHI guidance.
+ */
+ if (cpu_feature_enabled(X86_FEATURE_BHI_CTRL)) {
+ bhb_seq_outer_loop = 12;
+ bhb_seq_inner_loop = 7;
+ }
+
x86_arch_cap_msr = x86_read_arch_cap_msr();
cpu_print_attack_vectors();
--
2.34.1
* [PATCH v10 03/12] x86/bhi: Rename clear_bhb_loop() to clear_bhb_loop_nofence()
From: Pawan Gupta @ 2026-04-14 7:06 UTC
To: x86, Jon Kohler, Nikolay Borisov, H. Peter Anvin, Josh Poimboeuf,
David Kaplan, Sean Christopherson, Borislav Petkov, Dave Hansen,
Peter Zijlstra, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, KP Singh, Jiri Olsa, David S. Miller,
David Laight, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
David Ahern, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, John Fastabend, Stanislav Fomichev, Hao Luo,
Paolo Bonzini, Jonathan Corbet
Cc: linux-kernel, kvm, Asit Mallick, Tao Zhang, bpf, netdev,
linux-doc
In-Reply-To: <20260414-vmscape-bhb-v10-0-efa924abae5f@linux.intel.com>
Rename clear_bhb_loop() to clear_bhb_loop_nofence() to reflect the recent
change that moved the LFENCE to the caller side.
Suggested-by: Borislav Petkov <bp@alien8.de>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Tested-by: Jon Kohler <jon@nutanix.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
arch/x86/entry/entry_64.S | 8 ++++----
arch/x86/include/asm/nospec-branch.h | 6 +++---
arch/x86/net/bpf_jit_comp.c | 2 +-
3 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index bbd4b1c7ec04..1f56d086d312 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1532,7 +1532,7 @@ SYM_CODE_END(rewind_stack_and_make_dead)
* Note, callers should use a speculation barrier like LFENCE immediately after
* a call to this function to ensure BHB is cleared before indirect branches.
*/
-SYM_FUNC_START(clear_bhb_loop)
+SYM_FUNC_START(clear_bhb_loop_nofence)
ANNOTATE_NOENDBR
push %rbp
mov %rsp, %rbp
@@ -1570,6 +1570,6 @@ SYM_FUNC_START(clear_bhb_loop)
5:
pop %rbp
RET
-SYM_FUNC_END(clear_bhb_loop)
-EXPORT_SYMBOL_FOR_KVM(clear_bhb_loop)
-STACK_FRAME_NON_STANDARD(clear_bhb_loop)
+SYM_FUNC_END(clear_bhb_loop_nofence)
+EXPORT_SYMBOL_FOR_KVM(clear_bhb_loop_nofence)
+STACK_FRAME_NON_STANDARD(clear_bhb_loop_nofence)
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 87b83ae7c97f..157eb69c7f0f 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -331,11 +331,11 @@
#ifdef CONFIG_X86_64
.macro CLEAR_BRANCH_HISTORY
- ALTERNATIVE "", "call clear_bhb_loop; lfence", X86_FEATURE_CLEAR_BHB_LOOP
+ ALTERNATIVE "", "call clear_bhb_loop_nofence; lfence", X86_FEATURE_CLEAR_BHB_LOOP
.endm
.macro CLEAR_BRANCH_HISTORY_VMEXIT
- ALTERNATIVE "", "call clear_bhb_loop; lfence", X86_FEATURE_CLEAR_BHB_VMEXIT
+ ALTERNATIVE "", "call clear_bhb_loop_nofence; lfence", X86_FEATURE_CLEAR_BHB_VMEXIT
.endm
#else
#define CLEAR_BRANCH_HISTORY
@@ -389,7 +389,7 @@ extern void entry_untrain_ret(void);
extern void write_ibpb(void);
#ifdef CONFIG_X86_64
-extern void clear_bhb_loop(void);
+extern void clear_bhb_loop_nofence(void);
#endif
extern void (*x86_return_thunk)(void);
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 63d6c9fa5e80..f40e88f87273 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -1619,7 +1619,7 @@ static int emit_spectre_bhb_barrier(u8 **pprog, u8 *ip,
EMIT1(0x51); /* push rcx */
ip += 2;
- func = (u8 *)clear_bhb_loop;
+ func = (u8 *)clear_bhb_loop_nofence;
ip += x86_call_depth_emit_accounting(&prog, func, ip);
if (emit_call(&prog, func, ip))
--
2.34.1
* [PATCH v10 04/12] x86/vmscape: Rename x86_ibpb_exit_to_user to x86_predictor_flush_exit_to_user
From: Pawan Gupta @ 2026-04-14 7:06 UTC
To: x86, Jon Kohler, Nikolay Borisov, H. Peter Anvin, Josh Poimboeuf,
David Kaplan, Sean Christopherson, Borislav Petkov, Dave Hansen,
Peter Zijlstra, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, KP Singh, Jiri Olsa, David S. Miller,
David Laight, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
David Ahern, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, John Fastabend, Stanislav Fomichev, Hao Luo,
Paolo Bonzini, Jonathan Corbet
Cc: linux-kernel, kvm, Asit Mallick, Tao Zhang, bpf, netdev,
linux-doc
In-Reply-To: <20260414-vmscape-bhb-v10-0-efa924abae5f@linux.intel.com>
With the upcoming changes, x86_ibpb_exit_to_user will also be used when the
BHB clearing sequence is used. Rename it to cover both cases.
No functional change.
Suggested-by: Sean Christopherson <seanjc@google.com>
Tested-by: Jon Kohler <jon@nutanix.com>
Acked-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
arch/x86/include/asm/entry-common.h | 6 +++---
arch/x86/include/asm/nospec-branch.h | 2 +-
arch/x86/kernel/cpu/bugs.c | 4 ++--
arch/x86/kvm/x86.c | 2 +-
4 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/arch/x86/include/asm/entry-common.h b/arch/x86/include/asm/entry-common.h
index ce3eb6d5fdf9..c45858db16c9 100644
--- a/arch/x86/include/asm/entry-common.h
+++ b/arch/x86/include/asm/entry-common.h
@@ -94,11 +94,11 @@ static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs,
*/
choose_random_kstack_offset(rdtsc());
- /* Avoid unnecessary reads of 'x86_ibpb_exit_to_user' */
+ /* Avoid unnecessary reads of 'x86_predictor_flush_exit_to_user' */
if (cpu_feature_enabled(X86_FEATURE_IBPB_EXIT_TO_USER) &&
- this_cpu_read(x86_ibpb_exit_to_user)) {
+ this_cpu_read(x86_predictor_flush_exit_to_user)) {
indirect_branch_prediction_barrier();
- this_cpu_write(x86_ibpb_exit_to_user, false);
+ this_cpu_write(x86_predictor_flush_exit_to_user, false);
}
}
#define arch_exit_to_user_mode_prepare arch_exit_to_user_mode_prepare
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 157eb69c7f0f..0381db59c39d 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -533,7 +533,7 @@ void alternative_msr_write(unsigned int msr, u64 val, unsigned int feature)
: "memory");
}
-DECLARE_PER_CPU(bool, x86_ibpb_exit_to_user);
+DECLARE_PER_CPU(bool, x86_predictor_flush_exit_to_user);
static inline void indirect_branch_prediction_barrier(void)
{
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 2cb4a96247d8..002bf4adccc3 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -65,8 +65,8 @@ EXPORT_PER_CPU_SYMBOL_GPL(x86_spec_ctrl_current);
* be needed to before running userspace. That IBPB will flush the branch
* predictor content.
*/
-DEFINE_PER_CPU(bool, x86_ibpb_exit_to_user);
-EXPORT_PER_CPU_SYMBOL_GPL(x86_ibpb_exit_to_user);
+DEFINE_PER_CPU(bool, x86_predictor_flush_exit_to_user);
+EXPORT_PER_CPU_SYMBOL_GPL(x86_predictor_flush_exit_to_user);
u64 x86_pred_cmd __ro_after_init = PRED_CMD_IBPB;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fd1c4a36b593..45d7cfedc507 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11464,7 +11464,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
* may migrate to.
*/
if (cpu_feature_enabled(X86_FEATURE_IBPB_EXIT_TO_USER))
- this_cpu_write(x86_ibpb_exit_to_user, true);
+ this_cpu_write(x86_predictor_flush_exit_to_user, true);
/*
* Consume any pending interrupts, including the possible source of
--
2.34.1
* [PATCH v10 07/12] static_call: Add EXPORT_STATIC_CALL_FOR_MODULES()
From: Pawan Gupta @ 2026-04-14 7:07 UTC
To: x86, Jon Kohler, Nikolay Borisov, H. Peter Anvin, Josh Poimboeuf,
David Kaplan, Sean Christopherson, Borislav Petkov, Dave Hansen,
Peter Zijlstra, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, KP Singh, Jiri Olsa, David S. Miller,
David Laight, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
David Ahern, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, John Fastabend, Stanislav Fomichev, Hao Luo,
Paolo Bonzini, Jonathan Corbet
Cc: linux-kernel, kvm, Asit Mallick, Tao Zhang, bpf, netdev,
linux-doc
In-Reply-To: <20260414-vmscape-bhb-v10-0-efa924abae5f@linux.intel.com>
There is EXPORT_STATIC_CALL_TRAMP() that hides the static key from all
modules. But there is no equivalent of EXPORT_SYMBOL_FOR_MODULES() to
restrict symbol visibility to only certain modules.
Add EXPORT_STATIC_CALL_FOR_MODULES(name, mods) that wraps both the key and
the trampoline with EXPORT_SYMBOL_FOR_MODULES(), allowing only a limited
set of modules to see and update the static key.
The immediate user is KVM, in the following commit.
checkpatch reported the warnings below with this change; I believe they
don't apply in this case:
include/linux/static_call.h:219: WARNING: Non-declarative macros with multiple statements should be enclosed in a do - while loop
include/linux/static_call.h:220: WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
include/linux/static_call.h | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/include/linux/static_call.h b/include/linux/static_call.h
index 78a77a4ae0ea..b610afd1ed55 100644
--- a/include/linux/static_call.h
+++ b/include/linux/static_call.h
@@ -216,6 +216,9 @@ extern long __static_call_return0(void);
#define EXPORT_STATIC_CALL_GPL(name) \
EXPORT_SYMBOL_GPL(STATIC_CALL_KEY(name)); \
EXPORT_SYMBOL_GPL(STATIC_CALL_TRAMP(name))
+#define EXPORT_STATIC_CALL_FOR_MODULES(name, mods) \
+ EXPORT_SYMBOL_FOR_MODULES(STATIC_CALL_KEY(name), mods); \
+ EXPORT_SYMBOL_FOR_MODULES(STATIC_CALL_TRAMP(name), mods)
/* Leave the key unexported, so modules can't change static call targets: */
#define EXPORT_STATIC_CALL_TRAMP(name) \
@@ -276,6 +279,9 @@ extern long __static_call_return0(void);
#define EXPORT_STATIC_CALL_GPL(name) \
EXPORT_SYMBOL_GPL(STATIC_CALL_KEY(name)); \
EXPORT_SYMBOL_GPL(STATIC_CALL_TRAMP(name))
+#define EXPORT_STATIC_CALL_FOR_MODULES(name, mods) \
+ EXPORT_SYMBOL_FOR_MODULES(STATIC_CALL_KEY(name), mods); \
+ EXPORT_SYMBOL_FOR_MODULES(STATIC_CALL_TRAMP(name), mods)
/* Leave the key unexported, so modules can't change static call targets: */
#define EXPORT_STATIC_CALL_TRAMP(name) \
@@ -346,6 +352,8 @@ static inline int static_call_text_reserved(void *start, void *end)
#define EXPORT_STATIC_CALL(name) EXPORT_SYMBOL(STATIC_CALL_KEY(name))
#define EXPORT_STATIC_CALL_GPL(name) EXPORT_SYMBOL_GPL(STATIC_CALL_KEY(name))
+#define EXPORT_STATIC_CALL_FOR_MODULES(name, mods) \
+ EXPORT_SYMBOL_FOR_MODULES(STATIC_CALL_KEY(name), mods)
#endif /* CONFIG_HAVE_STATIC_CALL */
--
2.34.1
* [PATCH v10 05/12] x86/vmscape: Move mitigation selection to a switch()
From: Pawan Gupta @ 2026-04-14 7:06 UTC
To: x86, Jon Kohler, Nikolay Borisov, H. Peter Anvin, Josh Poimboeuf,
David Kaplan, Sean Christopherson, Borislav Petkov, Dave Hansen,
Peter Zijlstra, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, KP Singh, Jiri Olsa, David S. Miller,
David Laight, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
David Ahern, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, John Fastabend, Stanislav Fomichev, Hao Luo,
Paolo Bonzini, Jonathan Corbet
Cc: linux-kernel, kvm, Asit Mallick, Tao Zhang, bpf, netdev,
linux-doc
In-Reply-To: <20260414-vmscape-bhb-v10-0-efa924abae5f@linux.intel.com>
This ensures that all mitigation modes are explicitly handled, while
keeping the mitigation selection for each mode together. This also prepares
for adding BHB-clearing mitigation mode for VMSCAPE.
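The selection flow can be exercised outside the kernel with a small model
in C. This is a sketch under stated assumptions, not the kernel code: the
enum values, struct cpu_model and vmscape_select_model() are hypothetical
stand-ins for vmscape_mitigation, the boot_cpu_has*() checks and
should_mitigate_vuln().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced set of modes for the model. */
enum vmscape_mode {
	VMSCAPE_NONE,
	VMSCAPE_AUTO,
	VMSCAPE_IBPB_EXIT_TO_USER,
};

/* Hypothetical stand-in for the CPU and attack-vector checks. */
struct cpu_model {
	bool has_bug;		/* models boot_cpu_has_bug(X86_BUG_VMSCAPE) */
	bool has_ibpb;		/* models boot_cpu_has(X86_FEATURE_IBPB) */
	bool should_mitigate;	/* models should_mitigate_vuln() */
};

static enum vmscape_mode vmscape_select_model(enum vmscape_mode requested,
					      const struct cpu_model *cpu)
{
	if (!cpu->has_bug)
		return VMSCAPE_NONE;

	/* Attack-vector controls only gate the AUTO mode. */
	if (requested == VMSCAPE_AUTO && !cpu->should_mitigate)
		requested = VMSCAPE_NONE;

	switch (requested) {
	case VMSCAPE_NONE:
		break;

	case VMSCAPE_IBPB_EXIT_TO_USER:
		if (!cpu->has_ibpb)
			requested = VMSCAPE_NONE;
		break;

	case VMSCAPE_AUTO:
		requested = cpu->has_ibpb ? VMSCAPE_IBPB_EXIT_TO_USER
					  : VMSCAPE_NONE;
		break;

	default:
		break;
	}

	return requested;
}
```

Each mode owns one case, so a new mode needs only a new case and does not
disturb the existing ones.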
Tested-by: Jon Kohler <jon@nutanix.com>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
arch/x86/kernel/cpu/bugs.c | 24 ++++++++++++++++++++----
1 file changed, 20 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 002bf4adccc3..636280c612f0 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -3088,17 +3088,33 @@ early_param("vmscape", vmscape_parse_cmdline);
static void __init vmscape_select_mitigation(void)
{
- if (!boot_cpu_has_bug(X86_BUG_VMSCAPE) ||
- !boot_cpu_has(X86_FEATURE_IBPB)) {
+ if (!boot_cpu_has_bug(X86_BUG_VMSCAPE)) {
vmscape_mitigation = VMSCAPE_MITIGATION_NONE;
return;
}
- if (vmscape_mitigation == VMSCAPE_MITIGATION_AUTO) {
- if (should_mitigate_vuln(X86_BUG_VMSCAPE))
+ if ((vmscape_mitigation == VMSCAPE_MITIGATION_AUTO) &&
+ !should_mitigate_vuln(X86_BUG_VMSCAPE))
+ vmscape_mitigation = VMSCAPE_MITIGATION_NONE;
+
+ switch (vmscape_mitigation) {
+ case VMSCAPE_MITIGATION_NONE:
+ break;
+
+ case VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER:
+ if (!boot_cpu_has(X86_FEATURE_IBPB))
+ vmscape_mitigation = VMSCAPE_MITIGATION_NONE;
+ break;
+
+ case VMSCAPE_MITIGATION_AUTO:
+ if (boot_cpu_has(X86_FEATURE_IBPB))
vmscape_mitigation = VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER;
else
vmscape_mitigation = VMSCAPE_MITIGATION_NONE;
+ break;
+
+ default:
+ break;
}
}
--
2.34.1
* [PATCH v10 06/12] x86/vmscape: Use write_ibpb() instead of indirect_branch_prediction_barrier()
From: Pawan Gupta @ 2026-04-14 7:06 UTC
To: x86, Jon Kohler, Nikolay Borisov, H. Peter Anvin, Josh Poimboeuf,
David Kaplan, Sean Christopherson, Borislav Petkov, Dave Hansen,
Peter Zijlstra, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, KP Singh, Jiri Olsa, David S. Miller,
David Laight, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
David Ahern, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, John Fastabend, Stanislav Fomichev, Hao Luo,
Paolo Bonzini, Jonathan Corbet
Cc: linux-kernel, kvm, Asit Mallick, Tao Zhang, bpf, netdev,
linux-doc
In-Reply-To: <20260414-vmscape-bhb-v10-0-efa924abae5f@linux.intel.com>
indirect_branch_prediction_barrier() is a wrapper around write_ibpb() that
also checks whether the CPU supports IBPB. For VMSCAPE, the call to
indirect_branch_prediction_barrier() is only reachable when the CPU
supports IBPB, so the check is redundant.
Simply call write_ibpb() directly to avoid unnecessary alternative
patching.
Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Jon Kohler <jon@nutanix.com>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
arch/x86/include/asm/entry-common.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/entry-common.h b/arch/x86/include/asm/entry-common.h
index c45858db16c9..78b143673ca7 100644
--- a/arch/x86/include/asm/entry-common.h
+++ b/arch/x86/include/asm/entry-common.h
@@ -97,7 +97,7 @@ static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs,
/* Avoid unnecessary reads of 'x86_predictor_flush_exit_to_user' */
if (cpu_feature_enabled(X86_FEATURE_IBPB_EXIT_TO_USER) &&
this_cpu_read(x86_predictor_flush_exit_to_user)) {
- indirect_branch_prediction_barrier();
+ write_ibpb();
this_cpu_write(x86_predictor_flush_exit_to_user, false);
}
}
--
2.34.1
* [PATCH v10 08/12] kvm: Define EXPORT_STATIC_CALL_FOR_KVM()
From: Pawan Gupta @ 2026-04-14 7:07 UTC
To: x86, Jon Kohler, Nikolay Borisov, H. Peter Anvin, Josh Poimboeuf,
David Kaplan, Sean Christopherson, Borislav Petkov, Dave Hansen,
Peter Zijlstra, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, KP Singh, Jiri Olsa, David S. Miller,
David Laight, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
David Ahern, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, John Fastabend, Stanislav Fomichev, Hao Luo,
Paolo Bonzini, Jonathan Corbet
Cc: linux-kernel, kvm, Asit Mallick, Tao Zhang, bpf, netdev,
linux-doc
In-Reply-To: <20260414-vmscape-bhb-v10-0-efa924abae5f@linux.intel.com>
EXPORT_SYMBOL_FOR_KVM() exists to export symbols to KVM modules. Static
calls need the same treatment when the core kernel defines a static_call
that KVM needs access to (e.g. from a VM-exit path).
Define EXPORT_STATIC_CALL_FOR_KVM() as the static_call analogue of
EXPORT_SYMBOL_FOR_KVM(). The same three-way logic applies:
- KVM_SUB_MODULES defined: export to "kvm," plus all sub-modules
- KVM=m, no sub-modules: export to "kvm" only
- KVM built-in: no export needed (noop)
As with EXPORT_SYMBOL_FOR_KVM(), allow architectures to override the
definition (e.g. to suppress the export when kvm.ko itself will not be
built despite CONFIG_KVM=m). Add the x86 no-op override in
arch/x86/include/asm/kvm_types.h for that case.
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
arch/x86/include/asm/kvm_types.h | 1 +
include/linux/kvm_types.h | 13 ++++++++++++-
2 files changed, 13 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/kvm_types.h b/arch/x86/include/asm/kvm_types.h
index d7c704ed1be9..bceeaed2940e 100644
--- a/arch/x86/include/asm/kvm_types.h
+++ b/arch/x86/include/asm/kvm_types.h
@@ -15,6 +15,7 @@
* at least one vendor module is enabled.
*/
#define EXPORT_SYMBOL_FOR_KVM(symbol)
+#define EXPORT_STATIC_CALL_FOR_KVM(symbol)
#endif
#define KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE 40
diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h
index a568d8e6f4e8..c81f4fdba625 100644
--- a/include/linux/kvm_types.h
+++ b/include/linux/kvm_types.h
@@ -13,6 +13,8 @@
EXPORT_SYMBOL_FOR_MODULES(symbol, __stringify(KVM_SUB_MODULES))
#define EXPORT_SYMBOL_FOR_KVM(symbol) \
EXPORT_SYMBOL_FOR_MODULES(symbol, "kvm," __stringify(KVM_SUB_MODULES))
+#define EXPORT_STATIC_CALL_FOR_KVM(symbol) \
+ EXPORT_STATIC_CALL_FOR_MODULES(symbol, "kvm," __stringify(KVM_SUB_MODULES))
#else
#define EXPORT_SYMBOL_FOR_KVM_INTERNAL(symbol)
/*
@@ -27,7 +29,16 @@
#define EXPORT_SYMBOL_FOR_KVM(symbol)
#endif /* IS_MODULE(CONFIG_KVM) */
#endif /* EXPORT_SYMBOL_FOR_KVM */
-#endif
+
+#ifndef EXPORT_STATIC_CALL_FOR_KVM
+#if IS_MODULE(CONFIG_KVM)
+#define EXPORT_STATIC_CALL_FOR_KVM(symbol) EXPORT_STATIC_CALL_FOR_MODULES(symbol, "kvm")
+#else
+#define EXPORT_STATIC_CALL_FOR_KVM(symbol)
+#endif /* IS_MODULE(CONFIG_KVM) */
+#endif /* EXPORT_STATIC_CALL_FOR_KVM */
+
+#endif /* KVM_SUB_MODULES */
#ifndef __ASSEMBLER__
--
2.34.1
* [PATCH v10 11/12] x86/vmscape: Resolve conflict between attack-vectors and vmscape=force
From: Pawan Gupta @ 2026-04-14 7:08 UTC
To: x86, Jon Kohler, Nikolay Borisov, H. Peter Anvin, Josh Poimboeuf,
David Kaplan, Sean Christopherson, Borislav Petkov, Dave Hansen,
Peter Zijlstra, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, KP Singh, Jiri Olsa, David S. Miller,
David Laight, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
David Ahern, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, John Fastabend, Stanislav Fomichev, Hao Luo,
Paolo Bonzini, Jonathan Corbet
Cc: linux-kernel, kvm, Asit Mallick, Tao Zhang, bpf, netdev,
linux-doc
In-Reply-To: <20260414-vmscape-bhb-v10-0-efa924abae5f@linux.intel.com>
The vmscape=force option currently defaults to the AUTO mitigation. This
lets attack-vector controls override the vmscape mitigation, preventing the
user from being able to force the VMSCAPE mitigation.
When the vmscape mitigation is forced, allow it to be deployed irrespective
of attack vectors. Introduce VMSCAPE_MITIGATION_ON that wins over
attack-vector controls.
Tested-by: Jon Kohler <jon@nutanix.com>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
arch/x86/kernel/cpu/bugs.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 1082ed1fb2e6..fbdb137720c4 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -3058,6 +3058,7 @@ static void __init srso_apply_mitigation(void)
enum vmscape_mitigations {
VMSCAPE_MITIGATION_NONE,
VMSCAPE_MITIGATION_AUTO,
+ VMSCAPE_MITIGATION_ON,
VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER,
VMSCAPE_MITIGATION_IBPB_ON_VMEXIT,
VMSCAPE_MITIGATION_BHB_CLEAR_EXIT_TO_USER,
@@ -3066,6 +3067,7 @@ enum vmscape_mitigations {
static const char * const vmscape_strings[] = {
[VMSCAPE_MITIGATION_NONE] = "Vulnerable",
/* [VMSCAPE_MITIGATION_AUTO] */
+ /* [VMSCAPE_MITIGATION_ON] */
[VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER] = "Mitigation: IBPB before exit to userspace",
[VMSCAPE_MITIGATION_IBPB_ON_VMEXIT] = "Mitigation: IBPB on VMEXIT",
[VMSCAPE_MITIGATION_BHB_CLEAR_EXIT_TO_USER] = "Mitigation: Clear BHB before exit to userspace",
@@ -3085,7 +3087,7 @@ static int __init vmscape_parse_cmdline(char *str)
vmscape_mitigation = VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER;
} else if (!strcmp(str, "force")) {
setup_force_cpu_bug(X86_BUG_VMSCAPE);
- vmscape_mitigation = VMSCAPE_MITIGATION_AUTO;
+ vmscape_mitigation = VMSCAPE_MITIGATION_ON;
} else if (!strcmp(str, "auto")) {
vmscape_mitigation = VMSCAPE_MITIGATION_AUTO;
} else {
@@ -3117,6 +3119,7 @@ static void __init vmscape_select_mitigation(void)
break;
case VMSCAPE_MITIGATION_AUTO:
+ case VMSCAPE_MITIGATION_ON:
/*
* CPUs with BHI_CTRL(ADL and newer) can avoid the IBPB and use
* BHB clear sequence. These CPUs are only vulnerable to the BHI
@@ -3244,6 +3247,7 @@ void cpu_bugs_smt_update(void)
switch (vmscape_mitigation) {
case VMSCAPE_MITIGATION_NONE:
case VMSCAPE_MITIGATION_AUTO:
+ case VMSCAPE_MITIGATION_ON:
break;
case VMSCAPE_MITIGATION_IBPB_ON_VMEXIT:
case VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER:
--
2.34.1
* [PATCH v10 12/12] x86/vmscape: Add cmdline vmscape=on to override attack vector controls
From: Pawan Gupta @ 2026-04-14 7:08 UTC
To: x86, Jon Kohler, Nikolay Borisov, H. Peter Anvin, Josh Poimboeuf,
David Kaplan, Sean Christopherson, Borislav Petkov, Dave Hansen,
Peter Zijlstra, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, KP Singh, Jiri Olsa, David S. Miller,
David Laight, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
David Ahern, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, John Fastabend, Stanislav Fomichev, Hao Luo,
Paolo Bonzini, Jonathan Corbet
Cc: linux-kernel, kvm, Asit Mallick, Tao Zhang, bpf, netdev,
linux-doc
In-Reply-To: <20260414-vmscape-bhb-v10-0-efa924abae5f@linux.intel.com>
In general, individual mitigation knobs override the attack vector
controls. For VMSCAPE, =ibpb exists, but there is no option to select the
BHB clearing mitigation. The =force option would select BHB clearing when
supported, but with the side effect of also forcing the bug, hence
deploying the mitigation on unaffected parts too.
Add a new cmdline option vmscape=on to enable the mitigation based on the
VMSCAPE variant the CPU is affected by.
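The difference between "auto", "on" and "force" can be sketched with a
small userspace model in C. It is illustrative only: struct
vmscape_cmdline, parse_vmscape_model() and vmscape_mitigation_wanted() are
hypothetical names, only the three options discussed here are modeled, and
returning false for anything else is an assumption of the sketch rather
than the kernel's behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, reduced request type for the model. */
enum vmscape_req { REQ_AUTO, REQ_ON };

struct vmscape_cmdline {
	enum vmscape_req req;
	bool force_bug;		/* models setup_force_cpu_bug(X86_BUG_VMSCAPE) */
};

/* Returns false for options this sketch does not model. */
static bool parse_vmscape_model(const char *str, struct vmscape_cmdline *out)
{
	out->force_bug = false;

	if (!strcmp(str, "force")) {
		out->force_bug = true;
		out->req = REQ_ON;
	} else if (!strcmp(str, "on")) {
		out->req = REQ_ON;
	} else if (!strcmp(str, "auto")) {
		out->req = REQ_AUTO;
	} else {
		return false;
	}

	return true;
}

/* Decides whether a mitigation is deployed at all in this model. */
static bool vmscape_mitigation_wanted(const struct vmscape_cmdline *c,
				      bool cpu_affected,
				      bool attack_vector_allows)
{
	/* "force" marks the CPU as affected even when it is not. */
	if (!cpu_affected && !c->force_bug)
		return false;

	/* Only "auto" defers to the attack vector controls. */
	if (c->req == REQ_AUTO && !attack_vector_allows)
		return false;

	return true;
}
```

In the model, "on" differs from "auto" only when attack vector controls
would veto the mitigation, and differs from "force" only on unaffected
CPUs.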
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Tested-by: Jon Kohler <jon@nutanix.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
Documentation/admin-guide/hw-vuln/vmscape.rst | 4 ++++
Documentation/admin-guide/kernel-parameters.txt | 2 ++
arch/x86/kernel/cpu/bugs.c | 2 ++
3 files changed, 8 insertions(+)
diff --git a/Documentation/admin-guide/hw-vuln/vmscape.rst b/Documentation/admin-guide/hw-vuln/vmscape.rst
index 7c40cf70ad7a..2558a5c3d956 100644
--- a/Documentation/admin-guide/hw-vuln/vmscape.rst
+++ b/Documentation/admin-guide/hw-vuln/vmscape.rst
@@ -117,3 +117,7 @@ The mitigation can be controlled via the ``vmscape=`` command line parameter:
Choose the mitigation based on the VMSCAPE variant the CPU is affected by.
(default when CONFIG_MITIGATION_VMSCAPE=y)
+
+ * ``vmscape=on``:
+
+ Same as ``auto``, except that it overrides attack vector controls.
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 3853c7109419..98204d464477 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -8383,6 +8383,8 @@ Kernel parameters
unaffected processors
auto - (default) use IBPB or BHB clear
mitigation based on CPU
+ on - same as "auto", but override attack
+ vector control
vsyscall= [X86-64,EARLY]
Controls the behavior of vsyscalls (i.e. calls to
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index fbdb137720c4..4e0b77fb21dd 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -3088,6 +3088,8 @@ static int __init vmscape_parse_cmdline(char *str)
} else if (!strcmp(str, "force")) {
setup_force_cpu_bug(X86_BUG_VMSCAPE);
vmscape_mitigation = VMSCAPE_MITIGATION_ON;
+ } else if (!strcmp(str, "on")) {
+ vmscape_mitigation = VMSCAPE_MITIGATION_ON;
} else if (!strcmp(str, "auto")) {
vmscape_mitigation = VMSCAPE_MITIGATION_AUTO;
} else {
--
2.34.1
^ permalink raw reply related
* [PATCH v10 09/12] x86/vmscape: Use static_call() for predictor flush
From: Pawan Gupta @ 2026-04-14 7:07 UTC (permalink / raw)
To: x86, Jon Kohler, Nikolay Borisov, H. Peter Anvin, Josh Poimboeuf,
David Kaplan, Sean Christopherson, Borislav Petkov, Dave Hansen,
Peter Zijlstra, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, KP Singh, Jiri Olsa, David S. Miller,
David Laight, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
David Ahern, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, John Fastabend, Stanislav Fomichev, Hao Luo,
Paolo Bonzini, Jonathan Corbet
Cc: linux-kernel, kvm, Asit Mallick, Tao Zhang, bpf, netdev,
linux-doc
In-Reply-To: <20260414-vmscape-bhb-v10-0-efa924abae5f@linux.intel.com>
Adding more mitigation options at exit-to-userspace for VMSCAPE would
usually require a series of checks to decide which mitigation to use. In
this case, the mitigation is done by calling a function that is chosen
at boot. So, adding more feature flags and multiple checks can be avoided
by using a static_call() to the mitigating function.
Replace the flag-based mitigation selector with a static_call(). This also
frees the existing X86_FEATURE_IBPB_EXIT_TO_USER.
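The idea can be illustrated with a user-space analogue. This is a rough sketch
only: a plain function pointer stands in for the static call (the real
static_call() patches the call site instead of making an indirect call), and
fake_write_ibpb(), fake_clear_bhb(), select_flush() and exit_to_user() are
hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flush routines; counters stand in for the real side effects. */
static int ibpb_count;
static int bhb_clear_count;

static void fake_write_ibpb(void) { ibpb_count++; }
static void fake_clear_bhb(void)  { bhb_clear_count++; }

/* NULL means "no mitigation selected", like DEFINE_STATIC_CALL_NULL(). */
static void (*predictor_flush)(void);

/* Boot-time selection, done once; analogous to static_call_update(). */
static void select_flush(void (*fn)(void))
{
	predictor_flush = fn;
}

/* Exit-to-user path: one flag check and one conditional call. */
static void exit_to_user(int *flush_pending)
{
	assert(flush_pending != NULL);
	if (*flush_pending) {
		if (predictor_flush)	/* models static_call_cond() */
			predictor_flush();
		*flush_pending = 0;
	}
}
```

Adding another mitigation in this model only means passing a different function
to select_flush(); the exit path itself does not grow any new feature checks.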
Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Jon Kohler <jon@nutanix.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
arch/x86/Kconfig | 1 +
arch/x86/include/asm/cpufeatures.h | 2 +-
arch/x86/include/asm/entry-common.h | 7 +++----
arch/x86/include/asm/nospec-branch.h | 3 +++
arch/x86/kernel/cpu/bugs.c | 9 ++++++++-
arch/x86/kvm/x86.c | 2 +-
6 files changed, 17 insertions(+), 7 deletions(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index e2df1b147184..5b8def9ddb98 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2720,6 +2720,7 @@ config MITIGATION_TSA
config MITIGATION_VMSCAPE
bool "Mitigate VMSCAPE"
depends on KVM
+ depends on HAVE_STATIC_CALL
default y
help
Enable mitigation for VMSCAPE attacks. VMSCAPE is a hardware security
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index dbe104df339b..b4d529dd6d30 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -503,7 +503,7 @@
#define X86_FEATURE_TSA_SQ_NO (21*32+11) /* AMD CPU not vulnerable to TSA-SQ */
#define X86_FEATURE_TSA_L1_NO (21*32+12) /* AMD CPU not vulnerable to TSA-L1 */
#define X86_FEATURE_CLEAR_CPU_BUF_VM (21*32+13) /* Clear CPU buffers using VERW before VMRUN */
-#define X86_FEATURE_IBPB_EXIT_TO_USER (21*32+14) /* Use IBPB on exit-to-userspace, see VMSCAPE bug */
+/* Free */
#define X86_FEATURE_ABMC (21*32+15) /* Assignable Bandwidth Monitoring Counters */
#define X86_FEATURE_MSR_IMM (21*32+16) /* MSR immediate form instructions */
#define X86_FEATURE_SGX_EUPDATESVN (21*32+17) /* Support for ENCLS[EUPDATESVN] instruction */
diff --git a/arch/x86/include/asm/entry-common.h b/arch/x86/include/asm/entry-common.h
index 78b143673ca7..783e7cb50cae 100644
--- a/arch/x86/include/asm/entry-common.h
+++ b/arch/x86/include/asm/entry-common.h
@@ -4,6 +4,7 @@
#include <linux/randomize_kstack.h>
#include <linux/user-return-notifier.h>
+#include <linux/static_call_types.h>
#include <asm/nospec-branch.h>
#include <asm/io_bitmap.h>
@@ -94,10 +95,8 @@ static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs,
*/
choose_random_kstack_offset(rdtsc());
- /* Avoid unnecessary reads of 'x86_predictor_flush_exit_to_user' */
- if (cpu_feature_enabled(X86_FEATURE_IBPB_EXIT_TO_USER) &&
- this_cpu_read(x86_predictor_flush_exit_to_user)) {
- write_ibpb();
+ if (unlikely(this_cpu_read(x86_predictor_flush_exit_to_user))) {
+ static_call_cond(vmscape_predictor_flush)();
this_cpu_write(x86_predictor_flush_exit_to_user, false);
}
}
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 0381db59c39d..066fd8095200 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -542,6 +542,9 @@ static inline void indirect_branch_prediction_barrier(void)
:: "rax", "rcx", "rdx", "memory");
}
+#include <linux/static_call_types.h>
+DECLARE_STATIC_CALL(vmscape_predictor_flush, write_ibpb);
+
/* The Intel SPEC CTRL MSR base value cache */
extern u64 x86_spec_ctrl_base;
DECLARE_PER_CPU(u64, x86_spec_ctrl_current);
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 636280c612f0..bfc0e41697f6 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -144,6 +144,13 @@ EXPORT_SYMBOL_GPL(cpu_buf_idle_clear);
*/
DEFINE_STATIC_KEY_FALSE(switch_mm_cond_l1d_flush);
+/*
+ * Controls how vmscape is mitigated e.g. via IBPB or BHB-clear
+ * sequence. This defaults to no mitigation.
+ */
+DEFINE_STATIC_CALL_NULL(vmscape_predictor_flush, write_ibpb);
+EXPORT_STATIC_CALL_FOR_KVM(vmscape_predictor_flush);
+
#undef pr_fmt
#define pr_fmt(fmt) "mitigations: " fmt
@@ -3133,7 +3140,7 @@ static void __init vmscape_update_mitigation(void)
static void __init vmscape_apply_mitigation(void)
{
if (vmscape_mitigation == VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER)
- setup_force_cpu_cap(X86_FEATURE_IBPB_EXIT_TO_USER);
+ static_call_update(vmscape_predictor_flush, write_ibpb);
}
#undef pr_fmt
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 45d7cfedc507..5582056b2fa1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11463,7 +11463,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
* set for the CPU that actually ran the guest, and not the CPU that it
* may migrate to.
*/
- if (cpu_feature_enabled(X86_FEATURE_IBPB_EXIT_TO_USER))
+ if (static_call_query(vmscape_predictor_flush))
this_cpu_write(x86_predictor_flush_exit_to_user, true);
/*
--
2.34.1
^ permalink raw reply related
* [PATCH v10 10/12] x86/vmscape: Deploy BHB clearing mitigation
From: Pawan Gupta @ 2026-04-14 7:07 UTC (permalink / raw)
To: x86, Jon Kohler, Nikolay Borisov, H. Peter Anvin, Josh Poimboeuf,
David Kaplan, Sean Christopherson, Borislav Petkov, Dave Hansen,
Peter Zijlstra, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, KP Singh, Jiri Olsa, David S. Miller,
David Laight, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
David Ahern, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, John Fastabend, Stanislav Fomichev, Hao Luo,
Paolo Bonzini, Jonathan Corbet
Cc: linux-kernel, kvm, Asit Mallick, Tao Zhang, bpf, netdev,
linux-doc
In-Reply-To: <20260414-vmscape-bhb-v10-0-efa924abae5f@linux.intel.com>
IBPB mitigation for VMSCAPE is overkill on CPUs that are only affected
by the BHI variant of VMSCAPE. On such CPUs, eIBRS already provides
indirect branch isolation between guest and host userspace. However, branch
history from the guest may still influence indirect branches in host
userspace.
To mitigate the BHI aspect, use the BHB clearing sequence. Since IBPB is no
longer the only mitigation for VMSCAPE, update the documentation to reflect
that =auto could select either the IBPB or the BHB clear mitigation based on
the CPU.
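The =auto selection order described above can be sketched as a small
user-space function. The struct cpu_caps fields and select_auto() are
hypothetical stand-ins for boot_cpu_has(X86_FEATURE_BHI_CTRL),
boot_cpu_has(X86_FEATURE_IBPB) and IS_ENABLED(CONFIG_X86_64); this is an
illustration of the decision order in the patch, not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum mitigation {
	MIT_NONE,
	MIT_IBPB_EXIT_TO_USER,
	MIT_BHB_CLEAR_EXIT_TO_USER,
};

/* Hypothetical summary of the CPU/build properties the selection looks at. */
struct cpu_caps {
	bool has_bhi_ctrl;	/* models X86_FEATURE_BHI_CTRL */
	bool has_ibpb;		/* models X86_FEATURE_IBPB */
	bool is_64bit;		/* models IS_ENABLED(CONFIG_X86_64) */
};

static enum mitigation select_auto(const struct cpu_caps *c)
{
	assert(c != NULL);
	/* BHB clearing is preferred, but only available on 64-bit builds. */
	if (c->has_bhi_ctrl && c->is_64bit)
		return MIT_BHB_CLEAR_EXIT_TO_USER;
	/* Otherwise fall back to the full predictor flush if possible. */
	if (c->has_ibpb)
		return MIT_IBPB_EXIT_TO_USER;
	return MIT_NONE;
}
```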
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Tested-by: Jon Kohler <jon@nutanix.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
Documentation/admin-guide/hw-vuln/vmscape.rst | 11 ++++++++-
Documentation/admin-guide/kernel-parameters.txt | 4 +++-
arch/x86/include/asm/entry-common.h | 4 ++++
arch/x86/include/asm/nospec-branch.h | 2 ++
arch/x86/kernel/cpu/bugs.c | 30 +++++++++++++++++++------
5 files changed, 42 insertions(+), 9 deletions(-)
diff --git a/Documentation/admin-guide/hw-vuln/vmscape.rst b/Documentation/admin-guide/hw-vuln/vmscape.rst
index d9b9a2b6c114..7c40cf70ad7a 100644
--- a/Documentation/admin-guide/hw-vuln/vmscape.rst
+++ b/Documentation/admin-guide/hw-vuln/vmscape.rst
@@ -86,6 +86,10 @@ The possible values in this file are:
run a potentially malicious guest and issues an IBPB before the first
exit to userspace after VM-exit.
+ * 'Mitigation: Clear BHB before exit to userspace':
+
+ As above, conditional BHB clearing mitigation is enabled.
+
* 'Mitigation: IBPB on VMEXIT':
IBPB is issued on every VM-exit. This occurs when other mitigations like
@@ -102,9 +106,14 @@ The mitigation can be controlled via the ``vmscape=`` command line parameter:
* ``vmscape=ibpb``:
- Enable conditional IBPB mitigation (default when CONFIG_MITIGATION_VMSCAPE=y).
+ Enable conditional IBPB mitigation.
* ``vmscape=force``:
Force vulnerability detection and mitigation even on processors that are
not known to be affected.
+
+ * ``vmscape=auto``:
+
+ Choose the mitigation based on the VMSCAPE variant the CPU is affected by.
+ (default when CONFIG_MITIGATION_VMSCAPE=y)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 03a550630644..3853c7109419 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -8378,9 +8378,11 @@ Kernel parameters
off - disable the mitigation
ibpb - use Indirect Branch Prediction Barrier
- (IBPB) mitigation (default)
+ (IBPB) mitigation
force - force vulnerability detection even on
unaffected processors
+ auto - (default) use IBPB or BHB clear
+ mitigation based on CPU
vsyscall= [X86-64,EARLY]
Controls the behavior of vsyscalls (i.e. calls to
diff --git a/arch/x86/include/asm/entry-common.h b/arch/x86/include/asm/entry-common.h
index 783e7cb50cae..13db31472f3a 100644
--- a/arch/x86/include/asm/entry-common.h
+++ b/arch/x86/include/asm/entry-common.h
@@ -96,6 +96,10 @@ static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs,
choose_random_kstack_offset(rdtsc());
if (unlikely(this_cpu_read(x86_predictor_flush_exit_to_user))) {
+ /*
+ * Since the mitigation is for userspace, an explicit
+ * speculation barrier is not required after flush.
+ */
static_call_cond(vmscape_predictor_flush)();
this_cpu_write(x86_predictor_flush_exit_to_user, false);
}
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 066fd8095200..38478383139b 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -390,6 +390,8 @@ extern void write_ibpb(void);
#ifdef CONFIG_X86_64
extern void clear_bhb_loop_nofence(void);
+#else
+static inline void clear_bhb_loop_nofence(void) {}
#endif
extern void (*x86_return_thunk)(void);
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index bfc0e41697f6..1082ed1fb2e6 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -61,9 +61,8 @@ DEFINE_PER_CPU(u64, x86_spec_ctrl_current);
EXPORT_PER_CPU_SYMBOL_GPL(x86_spec_ctrl_current);
/*
- * Set when the CPU has run a potentially malicious guest. An IBPB will
- * be needed to before running userspace. That IBPB will flush the branch
- * predictor content.
+ * Set when the CPU has run a potentially malicious guest. Indicates that a
+ * branch predictor flush is needed before running userspace.
*/
DEFINE_PER_CPU(bool, x86_predictor_flush_exit_to_user);
EXPORT_PER_CPU_SYMBOL_GPL(x86_predictor_flush_exit_to_user);
@@ -3061,13 +3060,15 @@ enum vmscape_mitigations {
VMSCAPE_MITIGATION_AUTO,
VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER,
VMSCAPE_MITIGATION_IBPB_ON_VMEXIT,
+ VMSCAPE_MITIGATION_BHB_CLEAR_EXIT_TO_USER,
};
static const char * const vmscape_strings[] = {
- [VMSCAPE_MITIGATION_NONE] = "Vulnerable",
+ [VMSCAPE_MITIGATION_NONE] = "Vulnerable",
/* [VMSCAPE_MITIGATION_AUTO] */
- [VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER] = "Mitigation: IBPB before exit to userspace",
- [VMSCAPE_MITIGATION_IBPB_ON_VMEXIT] = "Mitigation: IBPB on VMEXIT",
+ [VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER] = "Mitigation: IBPB before exit to userspace",
+ [VMSCAPE_MITIGATION_IBPB_ON_VMEXIT] = "Mitigation: IBPB on VMEXIT",
+ [VMSCAPE_MITIGATION_BHB_CLEAR_EXIT_TO_USER] = "Mitigation: Clear BHB before exit to userspace",
};
static enum vmscape_mitigations vmscape_mitigation __ro_after_init =
@@ -3085,6 +3086,8 @@ static int __init vmscape_parse_cmdline(char *str)
} else if (!strcmp(str, "force")) {
setup_force_cpu_bug(X86_BUG_VMSCAPE);
vmscape_mitigation = VMSCAPE_MITIGATION_AUTO;
+ } else if (!strcmp(str, "auto")) {
+ vmscape_mitigation = VMSCAPE_MITIGATION_AUTO;
} else {
pr_err("Ignoring unknown vmscape=%s option.\n", str);
}
@@ -3114,7 +3117,17 @@ static void __init vmscape_select_mitigation(void)
break;
case VMSCAPE_MITIGATION_AUTO:
- if (boot_cpu_has(X86_FEATURE_IBPB))
+ /*
+ * CPUs with BHI_CTRL(ADL and newer) can avoid the IBPB and use
+ * BHB clear sequence. These CPUs are only vulnerable to the BHI
+ * variant of the VMSCAPE attack, and thus they do not require a
+ * full predictor flush.
+ *
+ * Note, in 32-bit mode BHB clear sequence is not supported.
+ */
+ if (boot_cpu_has(X86_FEATURE_BHI_CTRL) && IS_ENABLED(CONFIG_X86_64))
+ vmscape_mitigation = VMSCAPE_MITIGATION_BHB_CLEAR_EXIT_TO_USER;
+ else if (boot_cpu_has(X86_FEATURE_IBPB))
vmscape_mitigation = VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER;
else
vmscape_mitigation = VMSCAPE_MITIGATION_NONE;
@@ -3141,6 +3154,8 @@ static void __init vmscape_apply_mitigation(void)
{
if (vmscape_mitigation == VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER)
static_call_update(vmscape_predictor_flush, write_ibpb);
+ else if (vmscape_mitigation == VMSCAPE_MITIGATION_BHB_CLEAR_EXIT_TO_USER)
+ static_call_update(vmscape_predictor_flush, clear_bhb_loop_nofence);
}
#undef pr_fmt
@@ -3232,6 +3247,7 @@ void cpu_bugs_smt_update(void)
break;
case VMSCAPE_MITIGATION_IBPB_ON_VMEXIT:
case VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER:
+ case VMSCAPE_MITIGATION_BHB_CLEAR_EXIT_TO_USER:
/*
* Hypervisors can be attacked across-threads, warn for SMT when
* STIBP is not already enabled system-wide.
--
2.34.1
^ permalink raw reply related
* Re: [PATCH v4 08/13] mfd: sec: resolve PMIC revision in S2MU005
From: Kaustabh Chakraborty @ 2026-04-14 7:25 UTC (permalink / raw)
To: Kaustabh Chakraborty, Lee Jones, Pavel Machek, Rob Herring,
Krzysztof Kozlowski, Conor Dooley, MyungJoo Ham, Chanwoo Choi,
Sebastian Reichel, Krzysztof Kozlowski, André Draszik,
Alexandre Belloni, Jonathan Corbet, Shuah Khan, Nam Tran,
Łukasz Lebiedziński
Cc: linux-leds, devicetree, linux-kernel, linux-pm, linux-samsung-soc,
linux-rtc, linux-doc
In-Reply-To: <20260414-s2mu005-pmic-v4-8-7fe7480577e6@disroot.org>
On 2026-04-14 12:03 +05:30, Kaustabh Chakraborty wrote:
> In devices other than S2MPG1X, the revision can be retrieved from the
> first register of the PMIC regmap. In S2MU005 however, the location is
> in offset 0x73. Introduce a switch-case block to allow selecting the
> REG_ID register.
>
> S2MU005 also has a field mask for the revision. Apply it using
> FIELD_GET() and get the extracted value.
>
> Signed-off-by: Kaustabh Chakraborty <kauschluss@disroot.org>
> ---
> drivers/mfd/sec-common.c | 18 +++++++++++++-----
> 1 file changed, 13 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/mfd/sec-common.c b/drivers/mfd/sec-common.c
> index 883e6d0aa3f06..43215605191e4 100644
> --- a/drivers/mfd/sec-common.c
> +++ b/drivers/mfd/sec-common.c
> @@ -16,6 +16,7 @@
> #include <linux/mfd/samsung/irq.h>
> #include <linux/mfd/samsung/s2mps11.h>
> #include <linux/mfd/samsung/s2mps13.h>
> +#include <linux/mfd/samsung/s2mu005.h>
> #include <linux/module.h>
> #include <linux/of.h>
> #include <linux/pm.h>
> @@ -119,20 +120,27 @@ static const struct mfd_cell s2mu005_devs[] = {
>
> static void sec_pmic_dump_rev(struct sec_pmic_dev *sec_pmic)
> {
> - unsigned int val;
> + unsigned int reg, mask, val;
>
> - /* For s2mpg1x, the revision is in a different regmap */
> switch (sec_pmic->device_type) {
> case S2MPG10:
> case S2MPG11:
> + /* For s2mpg1x, the revision is in a different regmap */
> return;
> - default:
> + case S2MU005:
> + reg = S2MU005_REG_ID;
> + mask = S2MU005_ID_MASK;
> break;
> + default:
> + /* For other device types, REG_ID is always the first register. */
> + reg = S2MPS11_REG_ID;
> + mask = ~0;
> }
>
> - /* For each device type, the REG_ID is always the first register */
> - if (!regmap_read(sec_pmic->regmap_pmic, S2MPS11_REG_ID, &val))
> + if (!regmap_read(sec_pmic->regmap_pmic, reg, &val)) {
> + val = FIELD_GET(S2MU005_ID_MASK, val);
Bug here! FIELD_GET(mask, val) should've been used.
> dev_dbg(sec_pmic->dev, "Revision: 0x%x\n", val);
> + }
> }
>
> static void sec_pmic_configure(struct sec_pmic_dev *sec_pmic)
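The inline remark above is about the hard-coded mask: the switch block sets a
per-device mask, but the extraction ignores it. A user-space sketch of the
intended behaviour follows. field_extract() is a hypothetical helper, not the
kernel's FIELD_GET() macro, and any mask values used with it are made up, since
the real S2MU005_ID_MASK is not shown in this thread.

```c
#include <assert.h>

/*
 * Hypothetical helper: extract the field selected by a runtime mask.
 * This is not the kernel's FIELD_GET() macro.
 */
static unsigned int field_extract(unsigned int mask, unsigned int val)
{
	unsigned int shift = 0;

	assert(mask != 0);
	/* Find the position of the lowest set bit in the mask. */
	while (!((mask >> shift) & 1u))
		shift++;
	return (val & mask) >> shift;
}
```

With an all-ones mask, as in the default case of the switch, the value passes
through unchanged, so devices without a revision field mask keep their current
output.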
^ permalink raw reply
* Re: [PATCH v5 00/21] Virtual Swap Space
From: Christoph Hellwig @ 2026-04-14 7:52 UTC (permalink / raw)
To: Nhat Pham
Cc: YoungJun Park, kasong, Liam.Howlett, akpm, apopple, axelrasmussen,
baohua, baolin.wang, bhe, byungchul, cgroups, chengming.zhou,
chrisl, corbet, david, dev.jain, gourry, hannes, hughd, jannh,
joshua.hahnjy, lance.yang, lenb, linux-doc, linux-kernel,
linux-mm, linux-pm, lorenzo.stoakes, matthew.brost, mhocko,
muchun.song, npache, pavel, peterx, peterz, pfalcato, rafael,
rakie.kim, roman.gushchin, rppt, ryan.roberts, shakeel.butt,
shikemeng, surenb, tglx, vbabka, weixugc, ying.huang, yosry.ahmed,
yuanchu, zhengqi.arch, ziy, kernel-team, riel
In-Reply-To: <CAKEwX=NnHxpQKp9qBg2=r_euyjgxw2nHXjbgof3MymHTgJmRAQ@mail.gmail.com>
On Sat, Apr 11, 2026 at 06:40:44PM -0700, Nhat Pham wrote:
> > However, if the modularization from point 1 is achieved and
> > vswap acts as a swap device itself, then we can cleanly
> > establish a:
> >
> > virtual -> physical
>
> I read that thread some time ago. Some remarks:
>
> 1. I think Christoph has a point. Seems like some of your ideas are
> broadly applicable to swap in general. Maybe fixing swap infra
> generally would make a lot of sense?
I think a first step would be a dump of that code, even if it is against
an old kernel so that everyone knows what we are talking about.
> 2. Why do we need to do two virtual layers here? For example, If you
> want to buffer multiple swap outs and turn them into a sequential
> request, you can:
>
> a. Allocate virtual swap space for them as you wish. They don't even
> need to be sequential.
>
> b. At swap_writeout() time, don't allocate physical swap space for
> them right away. Instead, accumulate them into a buffer. You can add a
> new virtual swap entry type to flag it if necessary.
>
> c. Once that buffer reaches a certain size, you can now allocate
> contiguous physical swap space for them. Then flush etc. You can flush
> at swap_writeout() time, or use a dedicated threads etc.
That matches what file systems do with delalloc, where they just
adjust an in-memory counter for space reservations.
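As a toy model of the scheme in steps a to c, and of the delalloc comparison,
the sketch below only bumps a reservation counter at writeout time and assigns
a contiguous physical run when the batch is full. Everything here (struct
swap_batch, BATCH, queue_page(), flush_batch()) is hypothetical and unrelated
to the actual swap or file system code.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH 4	/* hypothetical flush threshold, in pages */

struct swap_batch {
	size_t reserved;	/* pages accounted for but not yet placed */
	size_t next_phys;	/* next free physical slot (illustrative) */
	size_t phys[BATCH];	/* physical slot given to each buffered page */
};

/* Flush: hand one contiguous physical run to everything buffered so far. */
static size_t flush_batch(struct swap_batch *b)
{
	size_t n = b->reserved;

	for (size_t i = 0; i < n; i++)
		b->phys[i] = b->next_phys + i;
	b->next_phys += n;
	b->reserved = 0;
	return n;
}

/*
 * Writeout: only adjust the in-memory reservation counter. Physical slots
 * are assigned once the batch is full. Returns the number of pages flushed,
 * or 0 while still buffering.
 */
static size_t queue_page(struct swap_batch *b)
{
	assert(b != NULL && b->reserved < BATCH);
	b->reserved++;
	if (b->reserved == BATCH)
		return flush_batch(b);
	return 0;
}
```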
> Deduplication sounds like something that should live at a lower layer
> - I was thinking about it for zswap/zsmalloc back then. I mean, I
> assume you don't want content sharing across different swap media? :)
> Something along the line of:
Does dedup in swap really make much sense? If you want to dedup you
also want to do that in-memory, i.e. using ksm.
^ permalink raw reply
* Re: [PATCH v4 05/13] dt-bindings: mfd: s2mps11: add documentation for S2MU005 PMIC
From: Rob Herring (Arm) @ 2026-04-14 7:58 UTC (permalink / raw)
To: Kaustabh Chakraborty
Cc: Conor Dooley, Krzysztof Kozlowski, linux-rtc, linux-leds,
Jonathan Corbet, linux-pm, devicetree, Pavel Machek, Nam Tran,
linux-kernel, Shuah Khan, linux-doc, MyungJoo Ham,
Alexandre Belloni, Łukasz Lebiedziński, Chanwoo Choi,
Lee Jones, Krzysztof Kozlowski, Sebastian Reichel,
André Draszik, linux-samsung-soc
In-Reply-To: <20260414-s2mu005-pmic-v4-5-7fe7480577e6@disroot.org>
On Tue, 14 Apr 2026 12:02:57 +0530, Kaustabh Chakraborty wrote:
> Samsung's S2MU005 PMIC includes subdevices for a charger, an MUIC (Micro
> USB Interface Controller), and flash and RGB LED controllers.
>
> Since regulators are not supported by this device, unmark this property
> as required and instead set this in a per-device basis for ones which
> need it.
>
> Add the compatible and documentation for the S2MU005 PMIC. Also, add an
> example for nodes for supported sub-devices, i.e. charger, extcon,
> flash, and rgb.
>
> Signed-off-by: Kaustabh Chakraborty <kauschluss@disroot.org>
> ---
> .../devicetree/bindings/mfd/samsung,s2mps11.yaml | 121 ++++++++++++++++++++-
> 1 file changed, 120 insertions(+), 1 deletion(-)
>
My bot found errors running 'make dt_binding_check' on your patch:
yamllint warnings/errors:
dtschema/dtc warnings/errors:
Documentation/devicetree/bindings/mfd/samsung,s2mps11.example.dts:241.29-39: Warning (reg_format): /example-2/i2c/pmic@3d/extcon/port/endpoint@0:reg: property has invalid length (4 bytes) (#address-cells == 2, #size-cells == 1)
Documentation/devicetree/bindings/mfd/samsung,s2mps11.example.dts:246.29-39: Warning (reg_format): /example-2/i2c/pmic@3d/extcon/port/endpoint@1:reg: property has invalid length (4 bytes) (#address-cells == 2, #size-cells == 1)
Documentation/devicetree/bindings/mfd/samsung,s2mps11.example.dtb: Warning (pci_device_reg): Failed prerequisite 'reg_format'
Documentation/devicetree/bindings/mfd/samsung,s2mps11.example.dtb: Warning (pci_device_bus_num): Failed prerequisite 'reg_format'
Documentation/devicetree/bindings/mfd/samsung,s2mps11.example.dtb: Warning (simple_bus_reg): Failed prerequisite 'reg_format'
Documentation/devicetree/bindings/mfd/samsung,s2mps11.example.dtb: Warning (i2c_bus_reg): Failed prerequisite 'reg_format'
Documentation/devicetree/bindings/mfd/samsung,s2mps11.example.dtb: Warning (spi_bus_reg): Failed prerequisite 'reg_format'
Documentation/devicetree/bindings/mfd/samsung,s2mps11.example.dts:240.53-243.27: Warning (avoid_default_addr_size): /example-2/i2c/pmic@3d/extcon/port/endpoint@0: Relying on default #address-cells value
Documentation/devicetree/bindings/mfd/samsung,s2mps11.example.dts:240.53-243.27: Warning (avoid_default_addr_size): /example-2/i2c/pmic@3d/extcon/port/endpoint@0: Relying on default #size-cells value
Documentation/devicetree/bindings/mfd/samsung,s2mps11.example.dts:245.49-248.27: Warning (avoid_default_addr_size): /example-2/i2c/pmic@3d/extcon/port/endpoint@1: Relying on default #address-cells value
Documentation/devicetree/bindings/mfd/samsung,s2mps11.example.dts:245.49-248.27: Warning (avoid_default_addr_size): /example-2/i2c/pmic@3d/extcon/port/endpoint@1: Relying on default #size-cells value
Documentation/devicetree/bindings/mfd/samsung,s2mps11.example.dtb: Warning (unique_unit_address_if_enabled): Failed prerequisite 'avoid_default_addr_size'
Documentation/devicetree/bindings/mfd/samsung,s2mps11.example.dts:240.53-243.27: Warning (graph_endpoint): /example-2/i2c/pmic@3d/extcon/port/endpoint@0: graph node '#address-cells' is -1, must be 1
Documentation/devicetree/bindings/mfd/samsung,s2mps11.example.dts:240.53-243.27: Warning (graph_endpoint): /example-2/i2c/pmic@3d/extcon/port/endpoint@0: graph node '#size-cells' is -1, must be 0
Documentation/devicetree/bindings/mfd/samsung,s2mps11.example.dts:245.49-248.27: Warning (graph_endpoint): /example-2/i2c/pmic@3d/extcon/port/endpoint@1: graph node '#address-cells' is -1, must be 1
Documentation/devicetree/bindings/mfd/samsung,s2mps11.example.dts:245.49-248.27: Warning (graph_endpoint): /example-2/i2c/pmic@3d/extcon/port/endpoint@1: graph node '#size-cells' is -1, must be 0
doc reference errors (make refcheckdocs):
See https://patchwork.kernel.org/project/devicetree/patch/20260414-s2mu005-pmic-v4-5-7fe7480577e6@disroot.org
The base for the series is generally the latest rc1. A different dependency
should be noted in *this* patch.
If you already ran 'make dt_binding_check' and didn't see the above
error(s), then make sure 'yamllint' is installed and dt-schema is up to
date:
pip3 install dtschema --upgrade
Please check and re-submit after running the above command yourself. Note
that DT_SCHEMA_FILES can be set to your schema file to speed up checking
your schema. However, it must be unset to test all examples with your schema.
^ permalink raw reply
* Re: [syzbot ci] Re: veth: add Byte Queue Limits (BQL) support
From: Jesper Dangaard Brouer @ 2026-04-14 8:06 UTC (permalink / raw)
To: syzbot ci, andrew, ast, bpf, corbet, daniel, davem, edumazet,
frederic, horms, j.koeppeler, john.fastabend, kernel-team, kuba,
linux-doc, linux-kernel, linux-kselftest, netdev, pabeni, sdf,
shuah
Cc: syzbot, syzkaller-bugs
In-Reply-To: <69dd48c2.a00a0220.468cb.004e.GAE@google.com>
[-- Attachment #1: Type: text/plain, Size: 4594 bytes --]
On 13/04/2026 21.49, syzbot ci wrote:
> syzbot ci has tested the following series
>
> [v2] veth: add Byte Queue Limits (BQL) support
> https://lore.kernel.org/all/20260413094442.1376022-1-hawk@kernel.org
> * [PATCH net-next v2 1/5] net: add dev->bql flag to allow BQL sysfs for IFF_NO_QUEUE devices
> * [PATCH net-next v2 2/5] veth: implement Byte Queue Limits (BQL) for latency reduction
> * [PATCH net-next v2 3/5] veth: add tx_timeout watchdog as BQL safety net
> * [PATCH net-next v2 4/5] net: sched: add timeout count to NETDEV WATCHDOG message
> * [PATCH net-next v2 5/5] selftests: net: add veth BQL stress test
>
> and found the following issue:
> WARNING in veth_napi_del_range
>
> Full report is available here:
> https://ci.syzbot.org/series/ee732006-8545-4abd-a105-b4b1592a7baf
>
> ***
>
> WARNING in veth_napi_del_range
>
I have attached a reproducer myself.
I also have V3 ready; see below for the diff.
> tree: net-next
> URL: https://kernel.googlesource.com/pub/scm/linux/kernel/git/netdev/net-next.git
> base: 8806d502e0a7e7d895b74afbd24e8550a65a2b17
> arch: amd64
> compiler: Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
> config: https://ci.syzbot.org/builds/90743a26-f003-44cf-abcc-5991c47588b2/config
> syz repro: https://ci.syzbot.org/findings/d068bfb2-9f8b-466a-95b4-cd7e7b00006c/syz_repro
>
> ------------[ cut here ]------------
> index >= dev->num_tx_queues
> WARNING: ./include/linux/netdevice.h:2672 at netdev_get_tx_queue include/linux/netdevice.h:2672 [inline], CPU#0: syz.1.27/6002
> WARNING: ./include/linux/netdevice.h:2672 at veth_napi_del_range+0x3b7/0x4e0 drivers/net/veth.c:1142, CPU#0: syz.1.27/6002
> Modules linked in:
> CPU: 0 UID: 0 PID: 6002 Comm: syz.1.27 Not tainted syzkaller #0 PREEMPT(full)
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> RIP: 0010:netdev_get_tx_queue include/linux/netdevice.h:2672 [inline]
> RIP: 0010:veth_napi_del_range+0x3b7/0x4e0 drivers/net/veth.c:1142
> Code: 00 e8 ad 96 69 fe 44 39 6c 24 10 74 5e e8 41 61 44 fb 41 ff c5 49 bc 00 00 00 00 00 fc ff df e9 6d ff ff ff e8 2a 61 44 fb 90 <0f> 0b 90 42 80 3c 23 00 75 8e eb 94 48 8b 0c 24 80 e1 07 80 c1 03
> RSP: 0018:ffffc90003adf918 EFLAGS: 00010293
> RAX: ffffffff86814ec6 RBX: 1ffff110227a6c03 RCX: ffff888103a857c0
> RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000002
> RBP: 1ffff110227a6c9a R08: ffff888113f01ab7 R09: 0000000000000000
> R10: ffff888113f01a98 R11: ffffed10227e0357 R12: dffffc0000000000
> R13: 0000000000000002 R14: 0000000000000002 R15: ffff888113d36018
> FS: 000055555ea16500(0000) GS:ffff88818de4a000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007efc287456b8 CR3: 000000010cdd0000 CR4: 00000000000006f0
> Call Trace:
> <TASK>
> veth_napi_del drivers/net/veth.c:1153 [inline]
> veth_disable_xdp+0x1b0/0x310 drivers/net/veth.c:1255
> veth_xdp_set drivers/net/veth.c:1693 [inline]
> veth_xdp+0x48e/0x730 drivers/net/veth.c:1717
> dev_xdp_propagate+0x125/0x260 net/core/dev_api.c:348
> bond_xdp_set drivers/net/bonding/bond_main.c:5715 [inline]
> bond_xdp+0x3ca/0x830 drivers/net/bonding/bond_main.c:5761
> dev_xdp_install+0x42c/0x600 net/core/dev.c:10387
> dev_xdp_detach_link net/core/dev.c:10579 [inline]
> bpf_xdp_link_release+0x362/0x540 net/core/dev.c:10595
> bpf_link_free+0x103/0x480 kernel/bpf/syscall.c:3292
> bpf_link_put_direct kernel/bpf/syscall.c:3344 [inline]
> bpf_link_release+0x6b/0x80 kernel/bpf/syscall.c:3351
> __fput+0x44f/0xa70 fs/file_table.c:469
> task_work_run+0x1d9/0x270 kernel/task_work.c:233
The BQL reset loop in veth_napi_del_range() iterates
dev->real_num_rx_queues but indexes into peer's TX queues,
which goes out of bounds when the peer has fewer TX queues
(e.g. veth enslaved to a bond with XDP).
Fix is to clamp the loop to the peer's real_num_tx_queues.
Will be included in the V3 submission.
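The clamping can be shown with a small user-space model. struct fake_dev,
MAX_QUEUES and reset_peer_queues() are hypothetical stand-ins for the
net_device, its TX queue array and the reset loop; only the bound calculation
mirrors the diff below.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_QUEUES 8

/* Hypothetical stand-in for a net_device's TX queue state. */
struct fake_dev {
	unsigned int real_num_tx_queues;
	unsigned int inflight_bytes[MAX_QUEUES];
};

static int min_int(int a, int b)
{
	return a < b ? a : b;
}

/*
 * Reset the peer's queues in [start, end), but never index past the
 * peer's own queue count. Returns the number of queues reset.
 */
static int reset_peer_queues(struct fake_dev *peer, int start, int end)
{
	int peer_end, i, n = 0;

	assert(peer != NULL && start >= 0);
	assert(peer->real_num_tx_queues <= MAX_QUEUES);
	peer_end = min_int(end, (int)peer->real_num_tx_queues);
	for (i = start; i < peer_end; i++) {
		peer->inflight_bytes[i] = 0;
		n++;
	}
	return n;
}
```

When the caller's range is wider than the peer's queue count, the loop simply
stops early, and a range that starts beyond the peer's queues resets nothing.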
#syz test
---
drivers/net/veth.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 911e7e36e166..9d7b085c9548 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -1138,7 +1138,9 @@ static void veth_napi_del_range(struct net_device *dev, int start, int end)
*/
peer = rtnl_dereference(priv->peer);
if (peer) {
- for (i = start; i < end; i++)
+ int peer_end = min(end, (int)peer->real_num_tx_queues);
+
+ for (i = start; i < peer_end; i++)
netdev_tx_reset_queue(netdev_get_tx_queue(peer, i));
}
[-- Attachment #2: repro-syzbot-veth-bql.sh --]
[-- Type: application/x-shellscript, Size: 2967 bytes --]
^ permalink raw reply related
* Re: Re: [syzbot ci] Re: veth: add Byte Queue Limits (BQL) support
From: syzbot ci @ 2026-04-14 8:08 UTC (permalink / raw)
To: hawk
Cc: andrew, ast, bpf, corbet, daniel, davem, edumazet, frederic, hawk,
horms, j.koeppeler, john.fastabend, kernel-team, kuba, linux-doc,
linux-kernel, linux-kselftest, netdev, pabeni, sdf, shuah, syzbot,
syzkaller-bugs
In-Reply-To: <41689f2e-8786-49a6-912d-f65e48245a61@kernel.org>
Please attach the patch to act upon.
^ permalink raw reply
* [PATCH] docs: fix typos in kernel documentation
From: fru1tworld @ 2026-04-14 8:45 UTC (permalink / raw)
To: corbet; +Cc: skhan, linux-doc, fru1tworld
reinitalizes => reinitializes
unpriviledged => unprivileged
the the => the (duplicated word)
sub-struture => sub-structure
Signed-off-by: fru1tworld <fruitworld.planet@gmail.com>
---
Documentation/block/data-integrity.rst | 2 +-
Documentation/core-api/list.rst | 2 +-
Documentation/core-api/real-time/differences.rst | 2 +-
Documentation/gpu/drm-uapi.rst | 2 +-
4 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/Documentation/block/data-integrity.rst b/Documentation/block/data-integrity.rst
index 99905e880..b7b10c8ab 100644
--- a/Documentation/block/data-integrity.rst
+++ b/Documentation/block/data-integrity.rst
@@ -154,7 +154,7 @@ bio_free() will automatically free the bip.
----------------
Block devices can set up the integrity information in the integrity
-sub-struture of the queue_limits structure.
+sub-structure of the queue_limits structure.
Layered block devices will need to pick a profile that's appropriate
for all subdevices. queue_limits_stack_integrity() can help with that. DM
diff --git a/Documentation/core-api/list.rst b/Documentation/core-api/list.rst
index 241464ca0..4819343a2 100644
--- a/Documentation/core-api/list.rst
+++ b/Documentation/core-api/list.rst
@@ -752,7 +752,7 @@ This is because list_splice() did not reinitialize the list_head it took
entries from, leaving its pointer pointing into what is now a different list.
If we want to avoid this situation, list_splice_init() can be used. It does the
-same thing as list_splice(), except reinitalizes the donor list_head after the
+same thing as list_splice(), except reinitializes the donor list_head after the
transplant.
Concurrency considerations
diff --git a/Documentation/core-api/real-time/differences.rst b/Documentation/core-api/real-time/differences.rst
index 83ec9aa1c..a129570da 100644
--- a/Documentation/core-api/real-time/differences.rst
+++ b/Documentation/core-api/real-time/differences.rst
@@ -213,7 +213,7 @@ to suspend until the callback completes, ensuring forward progress without
risking livelock.
In order to solve the problem at the API level, the sequence locks were extended
-to allow a proper handover between the the spinning reader and the maybe
+to allow a proper handover between the spinning reader and the maybe
blocked writer.
Sequence locks
diff --git a/Documentation/gpu/drm-uapi.rst b/Documentation/gpu/drm-uapi.rst
index d98428a59..14ecaf98d 100644
--- a/Documentation/gpu/drm-uapi.rst
+++ b/Documentation/gpu/drm-uapi.rst
@@ -568,7 +568,7 @@ ENOSPC:
EPERM/EACCES:
Returned for an operation that is valid, but needs more privileges.
E.g. root-only or much more common, DRM master-only operations return
- this when called by unpriviledged clients. There's no clear
+ this when called by unprivileged clients. There's no clear
difference between EACCES and EPERM.
ENODEV:
--
2.52.0
^ permalink raw reply related
* [PATCH] docs: rust: fix grammar in testing documentation
From: Ariful Islam Shoikot @ 2026-04-14 9:07 UTC (permalink / raw)
To: linux-doc; +Cc: Ariful Islam Shoikot
Replace "how to test" with "on how to test" for clarity
Signed-off-by: Ariful Islam Shoikot <islamarifulshoikat@gmail.com>
---
Documentation/rust/testing.rst | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Documentation/rust/testing.rst b/Documentation/rust/testing.rst
index f43cb77bcc69..edce2cb6c54e 100644
--- a/Documentation/rust/testing.rst
+++ b/Documentation/rust/testing.rst
@@ -3,7 +3,7 @@
Testing
=======
-This document contains useful information how to test the Rust code in the
+This document contains useful information on how to test the Rust code in the
kernel.
There are three sorts of tests:
--
2.43.0
^ permalink raw reply related
* [PATCH v4 0/2] hwmon: Add support for MPS mp2985
From: wenswang @ 2026-04-14 9:28 UTC (permalink / raw)
To: robh, krzk+dt, conor+dt, linux, corbet, skhan
Cc: devicetree, linux-kernel, linux-hwmon, linux-doc, Wensheng Wang
From: Wensheng Wang <wenswang@yeah.net>
Add mp2985 driver in hwmon and add dt-bindings for it.
V3 -> V4:
1. Avoid mantissa data overflow in mp2985_linear_exp_transfer()
function.
V2 -> V3:
1. The shifted mantissa is clamped to the range [-1024, 1023]
before being masked in the mp2985_linear_exp_transfer() function.
2. The PMBUS_VOUT_OV_FAULT_LIMIT and PMBUS_VOUT_UV_FAULT_LIMIT
values are clamped to 0xFFF before being written to the mp2985.
3. Fix the vout scale issue for vout linear11 mode.
v1 -> v2:
1. add Krzysztof's Acked-by
2. remove duplicate entry in mp2985.rst
3. clamp vout value to 32767
4. simplify the code for obtaining PMBUS_VOUT_MODE bit value
5. add comment for explaining MP2985 supported vout mode
6. switch back to previous page after obtaining vid scale to avoid
confusing the PMBus core
Wensheng Wang (2):
dt-bindings: hwmon: Add MPS mp2985
hwmon: add MP2985 driver
.../devicetree/bindings/trivial-devices.yaml | 2 +
Documentation/hwmon/index.rst | 1 +
Documentation/hwmon/mp2985.rst | 147 +++++++
MAINTAINERS | 7 +
drivers/hwmon/pmbus/Kconfig | 9 +
drivers/hwmon/pmbus/Makefile | 1 +
drivers/hwmon/pmbus/mp2985.c | 402 ++++++++++++++++++
7 files changed, 569 insertions(+)
create mode 100644 Documentation/hwmon/mp2985.rst
create mode 100644 drivers/hwmon/pmbus/mp2985.c
--
2.25.1
^ permalink raw reply
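For readers following the changelog, the linear11 re-scaling that mp2985_linear_exp_transfer() performs can be sketched in plain userspace C. This is only an illustration under assumptions, not the driver code: the names sign_extend(), linear11_rescale() and linear11_selfcheck() are hypothetical, and the saturation to 0..0x3FF mirrors the behaviour described in this series.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low `bits` bits of v (two's complement). */
int sign_extend(unsigned int v, unsigned int bits)
{
	unsigned int sign = 1u << (bits - 1);

	v &= (1u << bits) - 1;
	return (int)(v ^ sign) - (int)sign;
}

/*
 * Hypothetical helper: re-express a linear11 word (5-bit exponent in
 * bits 15..11, 11-bit mantissa in bits 10..0) using the 5-bit exponent
 * target_exp5. Negative mantissas become 0 and positive ones saturate
 * at 0x3FF, as the changelog describes for the MP2985.
 */
uint16_t linear11_rescale(uint16_t word, uint16_t target_exp5)
{
	int exponent = sign_extend(word >> 11, 5);
	int mantissa = sign_extend(word, 11);
	int shift = exponent - sign_extend(target_exp5, 5);

	if (mantissa < 0)
		mantissa = 0;
	else if (shift > 0)
		/* Check against the limit before shifting to avoid overflow. */
		mantissa = (0x3FF >> shift) >= mantissa ? mantissa << shift : 0x3FF;
	else
		mantissa >>= -shift;

	return (uint16_t)(mantissa | ((target_exp5 & 0x1f) << 11));
}

/* Usage: 12 with exponent 0 becomes 96 with exponent -3 (0x1D). */
void linear11_selfcheck(void)
{
	assert(linear11_rescale(0x000C, 0x1D) == 0xE860);
	assert(linear11_rescale(0x00C8, 0x1D) == 0xEBFF); /* saturates */
}
```

Comparing the mantissa against the right-shifted limit before the left shift is what keeps the intermediate value from overflowing, which appears to be the point of the V3 -> V4 change.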
* [PATCH v4 2/2] hwmon: add MP2985 driver
From: wenswang @ 2026-04-14 9:29 UTC (permalink / raw)
To: robh, krzk+dt, conor+dt, linux, corbet, skhan
Cc: devicetree, linux-kernel, linux-hwmon, linux-doc, Wensheng Wang
In-Reply-To: <20260414092921.1067735-1-wenswang@yeah.net>
From: Wensheng Wang <wenswang@yeah.net>
Add support for MPS mp2985 controller. This driver exposes
telemetry readings and supports reading and writing limit values.
Signed-off-by: Wensheng Wang <wenswang@yeah.net>
---
V3 -> V4:
1. Avoid mantissa data overflow in mp2985_linear_exp_transfer()
function.
V2 -> V3:
1. The shifted mantissa is clamped to the range [-1024, 1023]
before being masked in the mp2985_linear_exp_transfer() function.
2. The PMBUS_VOUT_OV_FAULT_LIMIT and PMBUS_VOUT_UV_FAULT_LIMIT
values are clamped to 0xFFF before being written to the mp2985.
3. Fix the vout scale issue for vout linear11 mode.
v1 -> v2:
1. remove duplicate entry in mp2985.rst
2. clamp vout value to 32767
3. simplify the code for obtaining PMBUS_VOUT_MODE bit value
4. add comment for explaining MP2985 supported vout mode
5. switch back to previous page after obtaining vid scale to avoid
confusing the PMBus core
Documentation/hwmon/index.rst | 1 +
Documentation/hwmon/mp2985.rst | 147 ++++++++++++
MAINTAINERS | 7 +
drivers/hwmon/pmbus/Kconfig | 9 +
drivers/hwmon/pmbus/Makefile | 1 +
drivers/hwmon/pmbus/mp2985.c | 402 +++++++++++++++++++++++++++++++++
6 files changed, 567 insertions(+)
create mode 100644 Documentation/hwmon/mp2985.rst
create mode 100644 drivers/hwmon/pmbus/mp2985.c
diff --git a/Documentation/hwmon/index.rst b/Documentation/hwmon/index.rst
index b2ca8513cfcd..1b7007f41b39 100644
--- a/Documentation/hwmon/index.rst
+++ b/Documentation/hwmon/index.rst
@@ -183,6 +183,7 @@ Hardware Monitoring Kernel Drivers
mp2925
mp29502
mp2975
+ mp2985
mp2993
mp5023
mp5920
diff --git a/Documentation/hwmon/mp2985.rst b/Documentation/hwmon/mp2985.rst
new file mode 100644
index 000000000000..87a39c8a300c
--- /dev/null
+++ b/Documentation/hwmon/mp2985.rst
@@ -0,0 +1,147 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Kernel driver mp2985
+====================
+
+Supported chips:
+
+ * MPS mp2985
+
+ Prefix: 'mp2985'
+
+Author:
+
+ Wensheng Wang <wenswang@yeah.net>
+
+Description
+-----------
+
+This driver implements support for Monolithic Power Systems, Inc. (MPS)
+MP2985 Dual Loop Digital Multi-phase Controller.
+
+Device compliant with:
+
+- PMBus rev 1.3 interface.
+
+The driver exports the following attributes via the 'sysfs' files
+for input voltage:
+
+**in1_input**
+
+**in1_label**
+
+**in1_crit**
+
+**in1_crit_alarm**
+
+**in1_lcrit**
+
+**in1_lcrit_alarm**
+
+**in1_max**
+
+**in1_max_alarm**
+
+**in1_min**
+
+**in1_min_alarm**
+
+The driver provides the following attributes for output voltage:
+
+**in2_input**
+
+**in2_label**
+
+**in2_crit**
+
+**in2_crit_alarm**
+
+**in2_lcrit**
+
+**in2_lcrit_alarm**
+
+**in3_input**
+
+**in3_label**
+
+**in3_crit**
+
+**in3_crit_alarm**
+
+**in3_lcrit**
+
+**in3_lcrit_alarm**
+
+The driver provides the following attributes for input current:
+
+**curr1_input**
+
+**curr1_label**
+
+The driver provides the following attributes for output current:
+
+**curr2_input**
+
+**curr2_label**
+
+**curr2_crit**
+
+**curr2_crit_alarm**
+
+**curr2_max**
+
+**curr2_max_alarm**
+
+**curr3_input**
+
+**curr3_label**
+
+**curr3_crit**
+
+**curr3_crit_alarm**
+
+**curr3_max**
+
+**curr3_max_alarm**
+
+The driver provides the following attributes for input power:
+
+**power1_input**
+
+**power1_label**
+
+**power2_input**
+
+**power2_label**
+
+The driver provides the following attributes for output power:
+
+**power3_input**
+
+**power3_label**
+
+**power4_input**
+
+**power4_label**
+
+The driver provides the following attributes for temperature:
+
+**temp1_input**
+
+**temp1_crit**
+
+**temp1_crit_alarm**
+
+**temp1_max**
+
+**temp1_max_alarm**
+
+**temp2_input**
+
+**temp2_crit**
+
+**temp2_crit_alarm**
+
+**temp2_max**
+
+**temp2_max_alarm**
diff --git a/MAINTAINERS b/MAINTAINERS
index 3adc870d523b..ead04c2d1665 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -17933,6 +17933,13 @@ S: Maintained
F: Documentation/hwmon/mp29502.rst
F: drivers/hwmon/pmbus/mp29502.c
+MPS MP2985 DRIVER
+M: Wensheng Wang <wenswang@yeah.net>
+L: linux-hwmon@vger.kernel.org
+S: Maintained
+F: Documentation/hwmon/mp2985.rst
+F: drivers/hwmon/pmbus/mp2985.c
+
MPS MP2993 DRIVER
M: Noah Wang <noahwang.wang@outlook.com>
L: linux-hwmon@vger.kernel.org
diff --git a/drivers/hwmon/pmbus/Kconfig b/drivers/hwmon/pmbus/Kconfig
index fc1273abe357..83fe5866c083 100644
--- a/drivers/hwmon/pmbus/Kconfig
+++ b/drivers/hwmon/pmbus/Kconfig
@@ -447,6 +447,15 @@ config SENSORS_MP2975
This driver can also be built as a module. If so, the module will
be called mp2975.
+config SENSORS_MP2985
+ tristate "MPS MP2985"
+ help
+ If you say yes here you get hardware monitoring support for MPS
+ MP2985 Dual Loop Digital Multi-Phase Controller.
+
+ This driver can also be built as a module. If so, the module will
+ be called mp2985.
+
config SENSORS_MP2993
tristate "MPS MP2993"
help
diff --git a/drivers/hwmon/pmbus/Makefile b/drivers/hwmon/pmbus/Makefile
index d6c86924f887..24505bbee2b0 100644
--- a/drivers/hwmon/pmbus/Makefile
+++ b/drivers/hwmon/pmbus/Makefile
@@ -45,6 +45,7 @@ obj-$(CONFIG_SENSORS_MP2891) += mp2891.o
obj-$(CONFIG_SENSORS_MP2925) += mp2925.o
obj-$(CONFIG_SENSORS_MP29502) += mp29502.o
obj-$(CONFIG_SENSORS_MP2975) += mp2975.o
+obj-$(CONFIG_SENSORS_MP2985) += mp2985.o
obj-$(CONFIG_SENSORS_MP2993) += mp2993.o
obj-$(CONFIG_SENSORS_MP5023) += mp5023.o
obj-$(CONFIG_SENSORS_MP5920) += mp5920.o
diff --git a/drivers/hwmon/pmbus/mp2985.c b/drivers/hwmon/pmbus/mp2985.c
new file mode 100644
index 000000000000..eb1a25b00c0b
--- /dev/null
+++ b/drivers/hwmon/pmbus/mp2985.c
@@ -0,0 +1,402 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Hardware monitoring driver for MPS Multi-phase Digital VR Controllers(MP2985)
+ *
+ * Copyright (C) 2026 MPS
+ */
+
+#include <linux/bitfield.h>
+#include <linux/i2c.h>
+#include <linux/module.h>
+#include <linux/of_device.h>
+#include "pmbus.h"
+
+/*
+ * Vendor-specific registers READ_PIN_EST(0x93), READ_IIN_EST(0x8E),
+ * MFR_VR_MULTI_CONFIG_R1(0x0D) and MFR_VR_MULTI_CONFIG_R2(0x1D).
+ * READ_PIN_EST is used to read pin telemetry, READ_IIN_EST
+ * is used to read iin telemetry, and MFR_VR_MULTI_CONFIG_R1 and
+ * MFR_VR_MULTI_CONFIG_R2 are used to obtain the vid scale.
+ */
+#define READ_PIN_EST 0x93
+#define READ_IIN_EST 0x8E
+#define MFR_VR_MULTI_CONFIG_R1 0x0D
+#define MFR_VR_MULTI_CONFIG_R2 0x1D
+
+#define MP2985_VOUT_DIV 64
+#define MP2985_VOUT_OVUV_UINT 125
+#define MP2985_VOUT_OVUV_DIV 64
+
+#define MP2985_PAGE_NUM 2
+
+#define MP2985_RAIL1_FUNC (PMBUS_HAVE_VIN | PMBUS_HAVE_PIN | \
+ PMBUS_HAVE_VOUT | PMBUS_HAVE_IOUT | \
+ PMBUS_HAVE_POUT | PMBUS_HAVE_TEMP | \
+ PMBUS_HAVE_STATUS_VOUT | \
+ PMBUS_HAVE_STATUS_IOUT | \
+ PMBUS_HAVE_STATUS_TEMP | \
+ PMBUS_HAVE_STATUS_INPUT)
+
+#define MP2985_RAIL2_FUNC (PMBUS_HAVE_PIN | PMBUS_HAVE_VOUT | \
+ PMBUS_HAVE_IOUT | PMBUS_HAVE_POUT | \
+ PMBUS_HAVE_TEMP | PMBUS_HAVE_IIN | \
+ PMBUS_HAVE_STATUS_VOUT | \
+ PMBUS_HAVE_STATUS_IOUT | \
+ PMBUS_HAVE_STATUS_TEMP | \
+ PMBUS_HAVE_STATUS_INPUT)
+
+struct mp2985_data {
+ struct pmbus_driver_info info;
+ int vout_scale[MP2985_PAGE_NUM];
+ int vid_offset[MP2985_PAGE_NUM];
+};
+
+#define to_mp2985_data(x) container_of(x, struct mp2985_data, info)
+
+static u16 mp2985_linear_exp_transfer(u16 word, u16 expect_exponent)
+{
+ s16 exponent, mantissa, target_exponent;
+
+ exponent = ((s16)word) >> 11;
+ mantissa = ((s16)((word & 0x7ff) << 5)) >> 5;
+ target_exponent = (s16)((expect_exponent & 0x1f) << 11) >> 11;
+
+ /*
+ * The MP2985 does not support negative limit values. If a negative
+ * limit value is written, the limit value becomes 0. The
+ * maximum positive limit value is limited to 0x3FF.
+ */
+ if (mantissa < 0) {
+ mantissa = 0;
+ } else {
+ if (exponent > target_exponent) {
+ mantissa = (1023 >> (exponent - target_exponent)) >= mantissa ?
+ mantissa << (exponent - target_exponent) :
+ 0x3FF;
+ } else {
+ mantissa = clamp_val(mantissa >> (target_exponent - exponent),
+ 0, 0x3FF);
+ }
+ }
+
+ return mantissa | ((expect_exponent << 11) & 0xf800);
+}
+
+static int mp2985_read_byte_data(struct i2c_client *client, int page, int reg)
+{
+ int ret;
+
+ switch (reg) {
+ case PMBUS_VOUT_MODE:
+ /*
+ * The MP2985 does not follow standard PMBus protocol completely,
+ * and the calculation of vout in this driver is based on direct
+ * format. As a result, the format of vout is enforced to direct.
+ */
+ ret = PB_VOUT_MODE_DIRECT;
+ break;
+ default:
+ ret = -ENODATA;
+ break;
+ }
+
+ return ret;
+}
+
+static int mp2985_read_word_data(struct i2c_client *client, int page, int phase,
+ int reg)
+{
+ const struct pmbus_driver_info *info = pmbus_get_driver_info(client);
+ struct mp2985_data *data = to_mp2985_data(info);
+ int ret;
+
+ switch (reg) {
+ case PMBUS_READ_VOUT:
+ ret = pmbus_read_word_data(client, page, phase, reg);
+ if (ret < 0)
+ return ret;
+
+ /*
+ * The MP2985 supports three vout modes: direct, linear11 and vid.
+ * In vid mode, the MP2985 vout telemetry has a 49 vid step offset, but
+ * PMBUS_VOUT_OV_FAULT_LIMIT and PMBUS_VOUT_UV_FAULT_LIMIT do not take
+ * this into consideration; their resolution is 1.953125mV/LSB. As a
+ * result, format[PSC_VOLTAGE_OUT] cannot be set to vid mode directly,
+ * and an extra vid_offset variable is added for vout telemetry.
+ */
+ ret = clamp_val(DIV_ROUND_CLOSEST(((ret & GENMASK(11, 0)) +
+ data->vid_offset[page]) *
+ data->vout_scale[page], MP2985_VOUT_DIV),
+ 0, 0x7FFF);
+ break;
+ case PMBUS_READ_IIN:
+ /*
+ * The MP2985 has the standard PMBUS_READ_IIN register(0x89), but it is
+ * not used to read the per-rail input current. The input current
+ * is read through the vendor-redefined register READ_IIN_EST(0x8E).
+ */
+ ret = pmbus_read_word_data(client, page, phase, READ_IIN_EST);
+ break;
+ case PMBUS_READ_PIN:
+ /*
+ * The MP2985 has the standard PMBUS_READ_PIN register(0x97), but it
+ * is not used to read the per-rail input power. The per-rail input
+ * power is read through the vendor-redefined register
+ * READ_PIN_EST(0x93).
+ */
+ ret = pmbus_read_word_data(client, page, phase, READ_PIN_EST);
+ break;
+ case PMBUS_VOUT_OV_FAULT_LIMIT:
+ case PMBUS_VOUT_UV_FAULT_LIMIT:
+ ret = pmbus_read_word_data(client, page, phase, reg);
+ if (ret < 0)
+ return ret;
+
+ ret = DIV_ROUND_CLOSEST((ret & GENMASK(11, 0)) * MP2985_VOUT_OVUV_UINT,
+ MP2985_VOUT_OVUV_DIV);
+ break;
+ case PMBUS_STATUS_WORD:
+ case PMBUS_READ_VIN:
+ case PMBUS_READ_IOUT:
+ case PMBUS_READ_POUT:
+ case PMBUS_READ_TEMPERATURE_1:
+ case PMBUS_VIN_OV_FAULT_LIMIT:
+ case PMBUS_VIN_OV_WARN_LIMIT:
+ case PMBUS_VIN_UV_WARN_LIMIT:
+ case PMBUS_VIN_UV_FAULT_LIMIT:
+ case PMBUS_IOUT_OC_FAULT_LIMIT:
+ case PMBUS_IOUT_OC_WARN_LIMIT:
+ case PMBUS_OT_FAULT_LIMIT:
+ case PMBUS_OT_WARN_LIMIT:
+ /*
+ * These registers are not explicitly handled by the driver,
+ * so return -ENODATA and let the PMBus core access them.
+ */
+ ret = -ENODATA;
+ break;
+ default:
+ /*
+ * The MP2985 does not support reading any other telemetry or
+ * limit value, so return -EINVAL directly.
+ */
+ ret = -EINVAL;
+ break;
+ }
+
+ return ret;
+}
+
+static int mp2985_write_word_data(struct i2c_client *client, int page, int reg,
+ u16 word)
+{
+ int ret;
+
+ switch (reg) {
+ case PMBUS_VIN_OV_FAULT_LIMIT:
+ case PMBUS_VIN_OV_WARN_LIMIT:
+ case PMBUS_VIN_UV_WARN_LIMIT:
+ case PMBUS_VIN_UV_FAULT_LIMIT:
+ /*
+ * The PMBUS_VIN_OV_FAULT_LIMIT, PMBUS_VIN_OV_WARN_LIMIT,
+ * PMBUS_VIN_UV_WARN_LIMIT and PMBUS_VIN_UV_FAULT_LIMIT
+ * of MP2985 are in linear11 format, and the exponent is a
+ * constant value(5'b11101), so the exponent of the word
+ * parameter should be converted to 5'b11101(0x1D).
+ */
+ ret = pmbus_write_word_data(client, page, reg,
+ mp2985_linear_exp_transfer(word, 0x1D));
+ break;
+ case PMBUS_VOUT_OV_FAULT_LIMIT:
+ case PMBUS_VOUT_UV_FAULT_LIMIT:
+ /*
+ * Bits 0-11 hold the limit value, and bits 12-15
+ * must not be changed.
+ */
+ ret = pmbus_read_word_data(client, page, 0xff, reg);
+ if (ret < 0)
+ return ret;
+
+ ret = pmbus_write_word_data(client, page, reg,
+ (ret & ~GENMASK(11, 0)) |
+ clamp_val(DIV_ROUND_CLOSEST(word * MP2985_VOUT_OVUV_DIV,
+ MP2985_VOUT_OVUV_UINT), 0, 0xFFF));
+ break;
+ case PMBUS_OT_FAULT_LIMIT:
+ case PMBUS_OT_WARN_LIMIT:
+ /*
+ * The PMBUS_OT_FAULT_LIMIT and PMBUS_OT_WARN_LIMIT of
+ * MP2985 are in linear11 format, and the exponent is a
+ * constant value(5'b00000), so the exponent of the word
+ * parameter should be converted to 5'b00000.
+ */
+ ret = pmbus_write_word_data(client, page, reg,
+ mp2985_linear_exp_transfer(word, 0x00));
+ break;
+ case PMBUS_IOUT_OC_FAULT_LIMIT:
+ case PMBUS_IOUT_OC_WARN_LIMIT:
+ /*
+ * The PMBUS_IOUT_OC_FAULT_LIMIT and PMBUS_IOUT_OC_WARN_LIMIT
+ * of MP2985 are in linear11 format, and the exponent cannot be
+ * changed.
+ */
+ ret = pmbus_read_word_data(client, page, 0xff, reg);
+ if (ret < 0)
+ return ret;
+
+ ret = pmbus_write_word_data(client, page, reg,
+ mp2985_linear_exp_transfer(word,
+ FIELD_GET(GENMASK(15, 11),
+ ret)));
+ break;
+ default:
+ /*
+ * The MP2985 does not support configuring any other limit value,
+ * so return -EINVAL directly.
+ */
+ ret = -EINVAL;
+ break;
+ }
+
+ return ret;
+}
+
+static int
+mp2985_identify_vout_scale(struct i2c_client *client, struct pmbus_driver_info *info,
+ int page)
+{
+ struct mp2985_data *data = to_mp2985_data(info);
+ int ret;
+
+ ret = i2c_smbus_write_byte_data(client, PMBUS_PAGE, page);
+ if (ret < 0)
+ return ret;
+
+ ret = i2c_smbus_read_byte_data(client, PMBUS_VOUT_MODE);
+ if (ret < 0)
+ return ret;
+
+ /*
+ * The MP2985 supports three vout modes. If PMBUS_VOUT_MODE
+ * bit5 is 1, it is vid mode. If PMBUS_VOUT_MODE bit4
+ * is 1, it is linear11 mode, and the vout scale is 1.953125mV/LSB.
+ * If PMBUS_VOUT_MODE bit6 is 1, it is direct mode, and the
+ * vout scale is 1mV/LSB. In vid mode, the MP2985 vout telemetry
+ * has a 49 vid step offset.
+ */
+ if (FIELD_GET(BIT(5), ret)) {
+ ret = i2c_smbus_write_byte_data(client, PMBUS_PAGE, 2);
+ if (ret < 0)
+ return ret;
+
+ ret = i2c_smbus_read_word_data(client, page == 0 ?
+ MFR_VR_MULTI_CONFIG_R1 :
+ MFR_VR_MULTI_CONFIG_R2);
+ if (ret < 0)
+ return ret;
+
+ if (page == 0) {
+ if (FIELD_GET(BIT(4), ret))
+ data->vout_scale[page] = 320;
+ else
+ data->vout_scale[page] = 640;
+ } else {
+ if (FIELD_GET(BIT(3), ret))
+ data->vout_scale[page] = 320;
+ else
+ data->vout_scale[page] = 640;
+ }
+
+ data->vid_offset[page] = 49;
+
+ /*
+ * For vid mode, the MP2985 must be switched to page 2
+ * to obtain the vout scale value, which may confuse the PMBus
+ * core. To avoid this, switch back to the previous page
+ * again.
+ */
+ ret = i2c_smbus_write_byte_data(client, PMBUS_PAGE, page);
+ if (ret < 0)
+ return ret;
+ } else if (FIELD_GET(BIT(4), ret)) {
+ data->vout_scale[page] = 125;
+ data->vid_offset[page] = 0;
+ } else {
+ data->vout_scale[page] = 64;
+ data->vid_offset[page] = 0;
+ }
+
+ return 0;
+}
+
+static int mp2985_identify(struct i2c_client *client, struct pmbus_driver_info *info)
+{
+ int ret;
+
+ ret = mp2985_identify_vout_scale(client, info, 0);
+ if (ret < 0)
+ return ret;
+
+ return mp2985_identify_vout_scale(client, info, 1);
+}
+
+static struct pmbus_driver_info mp2985_info = {
+ .pages = MP2985_PAGE_NUM,
+ .format[PSC_VOLTAGE_IN] = linear,
+ .format[PSC_CURRENT_IN] = linear,
+ .format[PSC_CURRENT_OUT] = linear,
+ .format[PSC_POWER] = linear,
+ .format[PSC_TEMPERATURE] = linear,
+ .format[PSC_VOLTAGE_OUT] = direct,
+
+ .m[PSC_VOLTAGE_OUT] = 1,
+ .R[PSC_VOLTAGE_OUT] = 3,
+ .b[PSC_VOLTAGE_OUT] = 0,
+
+ .func[0] = MP2985_RAIL1_FUNC,
+ .func[1] = MP2985_RAIL2_FUNC,
+ .read_word_data = mp2985_read_word_data,
+ .read_byte_data = mp2985_read_byte_data,
+ .write_word_data = mp2985_write_word_data,
+ .identify = mp2985_identify,
+};
+
+static int mp2985_probe(struct i2c_client *client)
+{
+ struct mp2985_data *data;
+
+ data = devm_kzalloc(&client->dev, sizeof(struct mp2985_data), GFP_KERNEL);
+ if (!data)
+ return -ENOMEM;
+
+ memcpy(&data->info, &mp2985_info, sizeof(mp2985_info));
+
+ return pmbus_do_probe(client, &data->info);
+}
+
+static const struct i2c_device_id mp2985_id[] = {
+ {"mp2985", 0},
+ {}
+};
+MODULE_DEVICE_TABLE(i2c, mp2985_id);
+
+static const struct of_device_id __maybe_unused mp2985_of_match[] = {
+ {.compatible = "mps,mp2985"},
+ {}
+};
+MODULE_DEVICE_TABLE(of, mp2985_of_match);
+
+static struct i2c_driver mp2985_driver = {
+ .driver = {
+ .name = "mp2985",
+ .of_match_table = mp2985_of_match,
+ },
+ .probe = mp2985_probe,
+ .id_table = mp2985_id,
+};
+
+module_i2c_driver(mp2985_driver);
+
+MODULE_AUTHOR("Wensheng Wang <wenswang@yeah.net>");
+MODULE_DESCRIPTION("PMBus driver for MPS MP2985 device");
+MODULE_LICENSE("GPL");
+MODULE_IMPORT_NS("PMBUS");
--
2.25.1
^ permalink raw reply related
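The vout arithmetic in the patch above may be easier to review as a standalone userspace sketch. The helper names below (div_round_closest(), clamp_int(), vout_raw_to_mv(), ovuv_raw_to_mv(), ovuv_mv_to_raw(), vout_selfcheck()) are assumptions made for illustration and are not part of the driver; the constants (divider 64, OV/UV unit 125, scales 64/125/320/640, vid offset 49) are the ones used in the patch.

```c
#include <assert.h>
#include <stdint.h>

#define VOUT_DIV	64	/* scale values are expressed in 1/64 mV */
#define OVUV_UNIT	125	/* 125/64 = 1.953125 mV per LSB */
#define OVUV_DIV	64

/* Rounded division for a non-negative numerator and positive divisor. */
int div_round_closest(int n, int d)
{
	return (n + d / 2) / d;
}

int clamp_int(int v, int lo, int hi)
{
	return v < lo ? lo : (v > hi ? hi : v);
}

/*
 * Convert a raw READ_VOUT word to millivolts. scale is 64 (direct),
 * 125 (linear11), or 320/640 (vid); vid_offset is 49 in vid mode, else 0.
 */
int vout_raw_to_mv(uint16_t raw, int vid_offset, int scale)
{
	int mv = div_round_closest(((raw & 0xFFF) + vid_offset) * scale,
				   VOUT_DIV);

	return clamp_int(mv, 0, 0x7FFF);
}

/* OV/UV fault limits always use 1.953125 mV/LSB in the low 12 bits. */
int ovuv_raw_to_mv(uint16_t raw)
{
	return div_round_closest((raw & 0xFFF) * OVUV_UNIT, OVUV_DIV);
}

uint16_t ovuv_mv_to_raw(int mv)
{
	return (uint16_t)clamp_int(div_round_closest(mv * OVUV_DIV, OVUV_UNIT),
				   0, 0xFFF);
}

/* Usage: a 1 V limit round-trips through the 12-bit register field. */
void vout_selfcheck(void)
{
	assert(ovuv_mv_to_raw(1000) == 512);
	assert(ovuv_raw_to_mv(512) == 1000);
}
```

Expressing every scale as a multiple of 1/64 mV lets the three vout modes share one integer formula, which is presumably why the driver reports vout in direct format with m = 1, R = 3.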
* [PATCH v4 1/2] dt-bindings: hwmon: Add MPS mp2985
From: wenswang @ 2026-04-14 9:29 UTC (permalink / raw)
To: robh, krzk+dt, conor+dt, linux, corbet, skhan
Cc: devicetree, linux-kernel, linux-hwmon, linux-doc, Wensheng Wang,
Krzysztof Kozlowski
In-Reply-To: <20260414092801.1067470-1-wenswang@yeah.net>
From: Wensheng Wang <wenswang@yeah.net>
Add support for MPS mp2985 controller.
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Signed-off-by: Wensheng Wang <wenswang@yeah.net>
---
v1 -> v2:
1. add Krzysztof's Acked-by
Documentation/devicetree/bindings/trivial-devices.yaml | 2 ++
1 file changed, 2 insertions(+)
diff --git a/Documentation/devicetree/bindings/trivial-devices.yaml b/Documentation/devicetree/bindings/trivial-devices.yaml
index a482aeadcd44..d61482269352 100644
--- a/Documentation/devicetree/bindings/trivial-devices.yaml
+++ b/Documentation/devicetree/bindings/trivial-devices.yaml
@@ -325,6 +325,8 @@ properties:
- mps,mp29612
# Monolithic Power Systems Inc. multi-phase controller mp29816
- mps,mp29816
+ # Monolithic Power Systems Inc. multi-phase controller mp2985
+ - mps,mp2985
# Monolithic Power Systems Inc. multi-phase controller mp2993
- mps,mp2993
# Monolithic Power Systems Inc. hot-swap protection device
--
2.25.1
^ permalink raw reply related
* Re: [PATCH v4 2/9] bus: mhi: Move sahara protocol driver under drivers/bus/mhi
From: Kishore Batta @ 2026-04-14 9:45 UTC (permalink / raw)
To: Manivannan Sadhasivam, Jeff Hugo
Cc: Jonathan Corbet, Shuah Khan, Carl Vanderlip, Oded Gabbay,
andersson, linux-doc, linux-kernel, linux-arm-msm, dri-devel, mhi
In-Reply-To: <sab2tgxtiftme5gscknsl7cfifpshtlrnnihbm2g56ppbowcit@bg4bzwuta6a6>
On 4/13/2026 4:34 PM, Manivannan Sadhasivam wrote:
> On Thu, Apr 09, 2026 at 02:20:02PM -0600, Jeff Hugo wrote:
>> On 3/19/2026 12:31 AM, Kishore Batta wrote:
>>> The Sahara protocol driver is currently located under the QAIC
>>> accelerator subsystem even though the protocol itself is transported over the
>>> MHI bus and is used by multiple Qualcomm flashless devices.
>>>
>>> Relocate the Sahara protocol driver to drivers/bus/mhi and register it as
>>> an independent MHI protocol driver. This avoids treating Sahara as QAIC
>>> specific and makes it available for reuse by other MHI based devices.
>>>
>>> As part of this move, introduce a dedicated Kconfig and Makefile under the
>>> MHI subsystem and expose the sahara interface via a common header.
>> I don't think this belongs under MHI. Mani needs to confirm that he agrees
>> with the concept of moving this there.
>>
>> The Sahara protocol as defined by the spec does not require MHI. We know
>> that there are Sahara implementations over USB. I don't see a dependency or
>> relationship to MHI other than the current in-kernel implementation uses
>> MHI, but there are plenty of things that use MHI (qaic, mhi-net, ath12k,
>> etc) which are not a part of the MHI bus.
>>
> Since Sahara is a MHI client driver, it is OK with me to place it under
> drivers/bus/mhi/host/. We do tend to host the client/controller drivers if they
> also bind to separate top level subsystems like Net, WWAN... but for the pure
> protocol drivers like Sahara, MHI can provide asylum.
>
> - Mani
Thanks for the confirmation, Mani. I will keep the Sahara driver under
drivers/bus/mhi/host/ and also move the Sahara documentation under the
Documentation/mhi/ directory.
^ permalink raw reply
* Re: [PATCH v4 2/9] bus: mhi: Move sahara protocol driver under drivers/bus/mhi
From: Kishore Batta @ 2026-04-14 9:48 UTC (permalink / raw)
To: Manivannan Sadhasivam
Cc: Jonathan Corbet, Shuah Khan, Jeff Hugo, Carl Vanderlip,
Oded Gabbay, andersson, linux-doc, linux-kernel, linux-arm-msm,
dri-devel, mhi
In-Reply-To: <enwtopztznwtvlhukkggxcdmh4t7v7duoiuapi5gd4zggqwbit@ypb4nxnds53f>
On 4/13/2026 4:50 PM, Manivannan Sadhasivam wrote:
> On Thu, Mar 19, 2026 at 12:01:42PM +0530, Kishore Batta wrote:
>> The Sahara protocol driver is currently located under the QAIC
>> accelerator subsystem even though the protocol itself is transported over the
>> MHI bus and is used by multiple Qualcomm flashless devices.
>>
>> Relocate the Sahara protocol driver to drivers/bus/mhi and register it as
>> an independent MHI protocol driver. This avoids treating Sahara as QAIC
>> specific and makes it available for reuse by other MHI based devices.
>>
>> As part of this move, introduce a dedicated Kconfig and Makefile under the
>> MHI subsystem and expose the sahara interface via a common header.
>>
>> Signed-off-by: Kishore Batta <kishore.batta@oss.qualcomm.com>
>> ---
>> drivers/accel/qaic/Kconfig | 1 +
>> drivers/accel/qaic/Makefile | 3 +--
>> drivers/accel/qaic/qaic_drv.c | 11 ++---------
>> drivers/bus/mhi/Kconfig | 1 +
>> drivers/bus/mhi/Makefile | 3 +++
>> drivers/bus/mhi/sahara/Kconfig | 18 ++++++++++++++++++
>> drivers/bus/mhi/sahara/Makefile | 2 ++
> Create one more subdir 'clients' and move 'sahara' here:
> drivers/bus/mhi/host/clients/sahara/
>
> I'm not sure if we are going to have Sahara implementation for the endpoint
> itself. If so, it should be moved under drivers/bus/mhi/common/.
Thanks for the suggestion. I will create a clients directory and move
the Sahara driver there. On the endpoint side, the Sahara driver is
implemented in XBL, so it is not required here.
>
>> drivers/{accel/qaic => bus/mhi/sahara}/sahara.c | 16 +++++++++++-----
>> {drivers/accel/qaic => include/linux}/sahara.h | 0
> include/linux/mhi/sahara.h
ACK. I will move the header file to include/linux/mhi/sahara.h
>
>> 9 files changed, 39 insertions(+), 16 deletions(-)
>>
>> diff --git a/drivers/accel/qaic/Kconfig b/drivers/accel/qaic/Kconfig
>> index 116e42d152ca885b8c59e33c7a87519a0abc6bb3..1e5f1f4fa93c12d8ca8fb37633f2f0bee9997499 100644
>> --- a/drivers/accel/qaic/Kconfig
>> +++ b/drivers/accel/qaic/Kconfig
>> @@ -8,6 +8,7 @@ config DRM_ACCEL_QAIC
>> depends on DRM_ACCEL
>> depends on PCI && HAS_IOMEM
>> depends on MHI_BUS
>> + select MHI_SAHARA
>> select CRC32
>> select WANT_DEV_COREDUMP
>> help
>> diff --git a/drivers/accel/qaic/Makefile b/drivers/accel/qaic/Makefile
>> index 71f727b74da3bb4478324689f02a7cea24a05c2d..e7b8458800072aa627f7f36c3257883aa56f4ce4 100644
>> --- a/drivers/accel/qaic/Makefile
>> +++ b/drivers/accel/qaic/Makefile
>> @@ -13,7 +13,6 @@ qaic-y := \
>> qaic_ras.o \
>> qaic_ssr.o \
>> qaic_sysfs.o \
>> - qaic_timesync.o \
>> - sahara.o
>> + qaic_timesync.o
>>
>> qaic-$(CONFIG_DEBUG_FS) += qaic_debugfs.o
>> diff --git a/drivers/accel/qaic/qaic_drv.c b/drivers/accel/qaic/qaic_drv.c
>> index 63fb8c7b4abcbe4f1b76c32106f4e8b9ea5e2c8e..76cc8086825e7949ed756d51fcb56a08f392d228 100644
>> --- a/drivers/accel/qaic/qaic_drv.c
>> +++ b/drivers/accel/qaic/qaic_drv.c
>> @@ -15,6 +15,7 @@
>> #include <linux/msi.h>
>> #include <linux/mutex.h>
>> #include <linux/pci.h>
>> +#include <linux/sahara.h>
>> #include <linux/spinlock.h>
>> #include <linux/workqueue.h>
>> #include <linux/wait.h>
>> @@ -32,7 +33,6 @@
>> #include "qaic_ras.h"
>> #include "qaic_ssr.h"
>> #include "qaic_timesync.h"
>> -#include "sahara.h"
>>
>> MODULE_IMPORT_NS("DMA_BUF");
>>
>> @@ -782,18 +782,12 @@ static int __init qaic_init(void)
>> ret = pci_register_driver(&qaic_pci_driver);
>> if (ret) {
>> pr_debug("qaic: pci_register_driver failed %d\n", ret);
>> - return ret;
>> + goto free_pci;
>> }
>>
>> ret = mhi_driver_register(&qaic_mhi_driver);
>> if (ret) {
>> pr_debug("qaic: mhi_driver_register failed %d\n", ret);
>> - goto free_pci;
>> - }
>> -
>> - ret = sahara_register();
>> - if (ret) {
>> - pr_debug("qaic: sahara_register failed %d\n", ret);
>> goto free_mhi;
>> }
>>
>> @@ -847,7 +841,6 @@ static void __exit qaic_exit(void)
>> qaic_ras_unregister();
>> qaic_bootlog_unregister();
>> qaic_timesync_deinit();
>> - sahara_unregister();
>> mhi_driver_unregister(&qaic_mhi_driver);
>> pci_unregister_driver(&qaic_pci_driver);
>> }
>> diff --git a/drivers/bus/mhi/Kconfig b/drivers/bus/mhi/Kconfig
>> index b39a11e6c624ba00349cca22d74bd876020590ab..4acedb886adccc6f76f69c241d53106da59b491f 100644
>> --- a/drivers/bus/mhi/Kconfig
>> +++ b/drivers/bus/mhi/Kconfig
>> @@ -7,3 +7,4 @@
>>
>> source "drivers/bus/mhi/host/Kconfig"
>> source "drivers/bus/mhi/ep/Kconfig"
>> +source "drivers/bus/mhi/sahara/Kconfig"
>> diff --git a/drivers/bus/mhi/Makefile b/drivers/bus/mhi/Makefile
>> index 354204b0ef3ae4030469a24a659f32429d592aef..e4af535e1bb1bc9481fae60d7eb347700d2e874c 100644
>> --- a/drivers/bus/mhi/Makefile
>> +++ b/drivers/bus/mhi/Makefile
>> @@ -3,3 +3,6 @@ obj-$(CONFIG_MHI_BUS) += host/
>>
>> # Endpoint MHI stack
>> obj-$(CONFIG_MHI_BUS_EP) += ep/
>> +
>> +# Sahara MHI protocol
>> +obj-$(CONFIG_MHI_SAHARA) += sahara/
>> diff --git a/drivers/bus/mhi/sahara/Kconfig b/drivers/bus/mhi/sahara/Kconfig
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..3f1caf6acd979a4af68aaf0e250aa54762e8cda5
>> --- /dev/null
>> +++ b/drivers/bus/mhi/sahara/Kconfig
>> @@ -0,0 +1,18 @@
>> +config MHI_SAHARA
>> + tristate
>> + depends on MHI_BUS
>> + select FW_LOADER_COMPRESS
>> + select FW_LOADER_COMPRESS_XZ
>> + select FW_LOADER_COMPRESS_ZSTD
> Why suddenly these configs pop up?
I will remove these in the next version.
>
>> + help
>> + Enable support for the Sahara protocol transported over the MHI bus.
>> +
>> + The Sahara protocol is used to transfer firmware images, retrieve
>> + memory dumps and exchange command mode DDR calibration data between
>> + host and device. This driver is not tied to a specific SoC and may be
>> + used by multiple MHI based devices.
>> +
>> + If unsure, say N.
>> +
>> + To compile this driver as a module, choose M here: the module will be
>> + called mhi_sahara.
>> diff --git a/drivers/bus/mhi/sahara/Makefile b/drivers/bus/mhi/sahara/Makefile
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..fc02a25935011cbd7138ea8f24b88cf5b032a4ce
>> --- /dev/null
>> +++ b/drivers/bus/mhi/sahara/Makefile
>> @@ -0,0 +1,2 @@
>> +obj-$(CONFIG_MHI_SAHARA) += mhi_sahara.o
>> +mhi_sahara-y := sahara.o
>> diff --git a/drivers/accel/qaic/sahara.c b/drivers/bus/mhi/sahara/sahara.c
>> similarity index 99%
>> rename from drivers/accel/qaic/sahara.c
>> rename to drivers/bus/mhi/sahara/sahara.c
>> index fd3c3b2d1fd3bb698809e6ca669128e2dce06613..8ff7b6425ac5423ef8f32117151dca10397686a8 100644
>> --- a/drivers/accel/qaic/sahara.c
>> +++ b/drivers/bus/mhi/sahara/sahara.c
>> @@ -1,6 +1,8 @@
>> -// SPDX-License-Identifier: GPL-2.0-only
>> -
>> -/* Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved. */
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * Copyright (c) 2018-2020, The Linux Foundation. All rights reserved.
> Why are you changing the copyright?
I misunderstood the comment on patch 1 of the series. Only the copyright
style needs to change. I will fix it in the next version.
>
>> + *
>> + */
>>
>> #include <linux/devcoredump.h>
>> #include <linux/firmware.h>
>> @@ -9,12 +11,11 @@
>> #include <linux/minmax.h>
>> #include <linux/mod_devicetable.h>
>> #include <linux/overflow.h>
>> +#include <linux/sahara.h>
>> #include <linux/types.h>
>> #include <linux/vmalloc.h>
>> #include <linux/workqueue.h>
>>
>> -#include "sahara.h"
>> -
>> #define SAHARA_HELLO_CMD 0x1 /* Min protocol version 1.0 */
>> #define SAHARA_HELLO_RESP_CMD 0x2 /* Min protocol version 1.0 */
>> #define SAHARA_READ_DATA_CMD 0x3 /* Min protocol version 1.0 */
>> @@ -928,8 +929,13 @@ int sahara_register(void)
>> {
>> return mhi_driver_register(&sahara_mhi_driver);
>> }
>> +module_init(sahara_register);
>>
>> void sahara_unregister(void)
>> {
>> mhi_driver_unregister(&sahara_mhi_driver);
>> }
>> +module_exit(sahara_unregister);
> Use module_mhi_driver().
ACK.
>
> - Mani
>
* Re: [PATCH v4 4/9] bus: mhi: Centralize firmware image table selection at probe time
From: Kishore Batta @ 2026-04-14 9:49 UTC (permalink / raw)
To: Manivannan Sadhasivam
Cc: Jonathan Corbet, Shuah Khan, Jeff Hugo, Carl Vanderlip,
Oded Gabbay, andersson, linux-doc, linux-kernel, linux-arm-msm,
dri-devel, mhi
In-Reply-To: <2sykuv6r643v3i6ymdoevzohoxdmgrrodvgpbaystskz7fwgun@fd3p7gcso252>
On 4/13/2026 4:56 PM, Manivannan Sadhasivam wrote:
> On Thu, Mar 19, 2026 at 12:01:44PM +0530, Kishore Batta wrote:
>> The Sahara driver currently selects firmware image tables using
>> scattered, device specific conditionals in the probe path, making the
>> logic harder to follow and extend.
>>
>> Refactor firmware image table selection into a single, explicit probe-time
>> mechanism by introducing a variant table that captures device matching,
>> firmware image tables, firmware folder names, and streaming behavior in
>> one place.
>>
>> This centralizes device specific decisions, simplifies the probe logic,
>> and avoids ad-hoc conditionals while preserving the existing behavior for
>> all supported AIC devices.
>>
>> Signed-off-by: Kishore Batta <kishore.batta@oss.qualcomm.com>
>> ---
>> drivers/bus/mhi/sahara/sahara.c | 66 ++++++++++++++++++++++++++++++++++++-----
>> 1 file changed, 58 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/bus/mhi/sahara/sahara.c b/drivers/bus/mhi/sahara/sahara.c
>> index e3499977e7c6b53bc624a8eb00d0636f2ea63307..8f1c0d72066c0cf80c09d78bfc51df2e482133b9 100644
>> --- a/drivers/bus/mhi/sahara/sahara.c
>> +++ b/drivers/bus/mhi/sahara/sahara.c
>> @@ -180,6 +180,16 @@ struct sahara_context {
>> u32 read_data_length;
>> bool is_mem_dump_mode;
>> bool non_streaming;
>> + const char *fw_folder;
>> +};
>> +
>> +struct sahara_variant {
>> + const char *match;
>> + bool match_is_chan;
> This name makes no sense.
>
> - Mani
I will drop this in the next version.
>> + const char * const *image_table;
>> + size_t table_size;
>> + const char *fw_folder;
>> + bool non_streaming;
>> };
>>
>> static const char * const aic100_image_table[] = {
>> @@ -224,11 +234,50 @@ static const char * const aic200_image_table[] = {
>> [78] = "qcom/aic200/pvs.bin",
>> };
>>
>> +static const struct sahara_variant sahara_variants[] = {
>> + {
>> + .match = "AIC100",
>> + .match_is_chan = false,
>> + .image_table = aic100_image_table,
>> + .table_size = ARRAY_SIZE(aic100_image_table),
>> + .fw_folder = "aic100",
>> + .non_streaming = true,
>> + },
>> + {
>> + .match = "AIC200",
>> + .match_is_chan = false,
>> + .image_table = aic200_image_table,
>> + .table_size = ARRAY_SIZE(aic200_image_table),
>> + .fw_folder = "aic200",
>> + .non_streaming = false,
>> + }
>> +};
>> +
>> static bool is_streaming(struct sahara_context *context)
>> {
>> return !context->non_streaming;
>> }
>>
>> +static const struct sahara_variant *sahara_select_variant(struct mhi_device *mhi_dev,
>> + const struct mhi_device_id *id)
>> +{
>> + int i;
>> +
>> + for (i = 0; i < ARRAY_SIZE(sahara_variants); i++) {
>> + const struct sahara_variant *v = &sahara_variants[i];
>> +
>> + if (v->match_is_chan) {
>> + if (id && id->chan && !strcmp(id->chan, v->match))
>> + return v;
>> + } else {
>> + if (mhi_dev->mhi_cntrl && mhi_dev->mhi_cntrl->name &&
>> + !strcmp(mhi_dev->mhi_cntrl->name, v->match))
>> + return v;
>> + }
>> + }
>> + return NULL;
>> +}
>> +
>> static int sahara_find_image(struct sahara_context *context, u32 image_id)
>> {
>> int ret;
>> @@ -797,6 +846,7 @@ static void sahara_read_data_processing(struct work_struct *work)
>>
>> static int sahara_mhi_probe(struct mhi_device *mhi_dev, const struct mhi_device_id *id)
>> {
>> + const struct sahara_variant *variant;
>> struct sahara_context *context;
>> int ret;
>> int i;
>> @@ -809,14 +859,14 @@ static int sahara_mhi_probe(struct mhi_device *mhi_dev, const struct mhi_device_
>> if (!context->rx)
>> return -ENOMEM;
>>
>> - if (!strcmp(mhi_dev->mhi_cntrl->name, "AIC200")) {
>> - context->image_table = aic200_image_table;
>> - context->table_size = ARRAY_SIZE(aic200_image_table);
>> - } else {
>> - context->image_table = aic100_image_table;
>> - context->table_size = ARRAY_SIZE(aic100_image_table);
>> - context->non_streaming = true;
>> - }
>> + variant = sahara_select_variant(mhi_dev, id);
>> + if (!variant)
>> + return -ENODEV;
>> +
>> + context->image_table = variant->image_table;
>> + context->table_size = variant->table_size;
>> + context->non_streaming = variant->non_streaming;
>> + context->fw_folder = variant->fw_folder;
>>
>> /*
>> * There are two firmware implementations for READ_DATA handling.
>>
>> --
>> 2.34.1
>>
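For reference, a minimal userspace sketch of the table-driven lookup with
match_is_chan dropped, assuming matching is done on the controller name
only. This is an illustration, not the next revision of the patch:
select_variant() is a hypothetical stand-in for sahara_select_variant(),
it takes the name as a plain string instead of a struct mhi_device, and
the AIC100 table entry is a placeholder.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Placeholder entry; the real AIC100 table is not reproduced here. */
static const char * const aic100_image_table[] = {
	[0] = "placeholder/aic100.bin",
};

/* Single entry taken from the quoted patch context. */
static const char * const aic200_image_table[] = {
	[78] = "qcom/aic200/pvs.bin",
};

/* Same fields as the patch's struct, minus match_is_chan. */
struct sahara_variant {
	const char *match;		/* controller name to compare against */
	const char * const *image_table;
	size_t table_size;
	const char *fw_folder;
	bool non_streaming;
};

static const struct sahara_variant sahara_variants[] = {
	{
		.match = "AIC100",
		.image_table = aic100_image_table,
		.table_size = ARRAY_SIZE(aic100_image_table),
		.fw_folder = "aic100",
		.non_streaming = true,
	},
	{
		.match = "AIC200",
		.image_table = aic200_image_table,
		.table_size = ARRAY_SIZE(aic200_image_table),
		.fw_folder = "aic200",
		.non_streaming = false,
	},
};

/* Hypothetical helper: return the matching variant, or NULL if none. */
static const struct sahara_variant *select_variant(const char *cntrl_name)
{
	size_t i;

	if (!cntrl_name)
		return NULL;

	for (i = 0; i < ARRAY_SIZE(sahara_variants); i++) {
		if (!strcmp(cntrl_name, sahara_variants[i].match))
			return &sahara_variants[i];
	}

	return NULL;
}
```

In this sketch an unknown controller name yields NULL, which the probe
path in the patch maps to -ENODEV. The code being replaced instead fell
back to the AIC100 table for any name other than "AIC200".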