* Re: [PATCH v13 02/12] iio: kstrtox: add local _parse_integer_limit_init() helper
From: Jonathan Cameron @ 2026-05-17 13:53 UTC
To: Rodrigo Alencar via B4 Relay
Cc: rodrigo.alencar, linux-kernel, linux-iio, devicetree, linux-doc,
David Lechner, Andy Shevchenko, Lars-Peter Clausen,
Michael Hennerich, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
Jonathan Corbet, Andrew Morton, Petr Mladek, Steven Rostedt,
Andy Shevchenko, Rasmus Villemoes, Sergey Senozhatsky, Shuah Khan
In-Reply-To: <20260517-adf41513-iio-driver-v13-2-bb6e134a360f@analog.com>
On Sun, 17 May 2026 10:13:57 +0100
Rodrigo Alencar via B4 Relay <devnull+rodrigo.alencar.analog.com@kernel.org> wrote:
> From: Rodrigo Alencar <rodrigo.alencar@analog.com>
>
> Add parsing helper that accepts an initial value for the accumulated
> result when parsing a 64-bit integer. It reuses the current implementation
> for _parse_integer_limit(), which now consumes the new function with
> init = 0. The diff algorithm would have the documentation header and
> prototype of _parse_integer_limit() moved around so it is adjusted
> according to guidelines.
>
> Signed-off-by: Rodrigo Alencar <rodrigo.alencar@analog.com>
Sashiko makes a good point on this. The change is not currently IIO-specific,
so the patch title should make that clear, e.g.:
kstrtox: add local _parse_integer_limit_init() helper
> ---
> lib/kstrtox.c | 39 ++++++++++++++++++++++++++-------------
> 1 file changed, 26 insertions(+), 13 deletions(-)
>
> diff --git a/lib/kstrtox.c b/lib/kstrtox.c
> index 97be2a39f537..0705461f51c0 100644
> --- a/lib/kstrtox.c
> +++ b/lib/kstrtox.c
> @@ -39,23 +39,15 @@ const char *_parse_integer_fixup_radix(const char *s, unsigned int *base)
> return s;
> }
>
> -/*
> - * Convert non-negative integer string representation in explicitly given radix
> - * to an integer. A maximum of max_chars characters will be converted.
> - *
> - * Return number of characters consumed maybe or-ed with overflow bit.
> - * If overflow occurs, result integer (incorrect) is still returned.
> - *
> - * Don't you dare use this function.
> - */
> -noinline
> -unsigned int _parse_integer_limit(const char *s, unsigned int base, unsigned long long *p,
> - size_t max_chars)
> +static unsigned int _parse_integer_limit_init(const char *s, unsigned int base,
> + unsigned long long init,
> + unsigned long long *p,
> + size_t max_chars)
> {
> unsigned long long res;
> unsigned int rv;
>
> - res = 0;
> + res = init;
> rv = 0;
> while (max_chars--) {
> unsigned int c = *s;
> @@ -87,6 +79,27 @@ unsigned int _parse_integer_limit(const char *s, unsigned int base, unsigned lon
> return rv;
> }
>
> +/**
> + * _parse_integer_limit() - Convert integer string representation to an integer
> + * limiting the number of characters parsed.
> + * @s: The start of the string.
> + * @base: The number base to use.
> + * @p: Where to write the result of the conversion.
> + * @max_chars: Maximum amount of characters to consume.
> + *
> + * Convert non-negative integer string representation in explicitly given radix
> + * to an integer. A maximum of max_chars characters will be converted.
> + *
> + * Return: Number of characters consumed maybe or-ed with overflow bit.
> + * If overflow occurs, result integer (incorrect) is still returned.
> + */
> +noinline
> +unsigned int _parse_integer_limit(const char *s, unsigned int base,
> + unsigned long long *p, size_t max_chars)
> +{
> + return _parse_integer_limit_init(s, base, 0, p, max_chars);
> +}
> +
> noinline
> unsigned int _parse_integer(const char *s, unsigned int base, unsigned long long *p)
> {
>
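For readers following the commit message above, here is a minimal userspace
sketch of the idea: a digit parser that starts accumulating from a
caller-supplied value instead of zero. The name parse_digits_init(), the
separate overflow flag, and the simplified overflow check are assumptions made
for illustration; this is not the kernel implementation, which ORs an overflow
bit into its return value.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical userspace model of "parse with an initial accumulator".
 * Returns the number of digits consumed; *overflow is set to 1 if the
 * accumulated value wrapped (the wrapped value is still stored in *out).
 */
static unsigned int parse_digits_init(const char *s, unsigned int base,
				      unsigned long long init,
				      unsigned long long *out,
				      size_t max_chars, int *overflow)
{
	unsigned long long res = init;	/* start from the caller's value */
	unsigned int n = 0;

	assert(base >= 2 && base <= 16);
	*overflow = 0;

	while (max_chars-- && *s) {
		unsigned int c = (unsigned char)*s;
		unsigned int val;

		if (c >= '0' && c <= '9')
			val = c - '0';
		else if (c >= 'a' && c <= 'f')
			val = c - 'a' + 10;
		else if (c >= 'A' && c <= 'F')
			val = c - 'A' + 10;
		else
			break;

		if (val >= base)
			break;

		/* res * base + val would exceed ULLONG_MAX */
		if (res > (ULLONG_MAX - val) / base)
			*overflow = 1;

		res = res * base + val;
		n++;
		s++;
	}

	*out = res;
	return n;
}

/* Model of the wrapper described in the patch: same parser, init = 0. */
static unsigned int parse_digits(const char *s, unsigned int base,
				 unsigned long long *out, size_t max_chars,
				 int *overflow)
{
	return parse_digits_init(s, base, 0, out, max_chars, overflow);
}
```

Seeding a second call with the result of the first (parsing "123", then "45"
with init = 123) yields 12345, which is the kind of continuation an init
parameter permits.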
* [PATCH v3] killswitch: add per-function short-circuit mitigation primitive
From: Sasha Levin @ 2026-05-17 13:48 UTC
To: linux-kernel
Cc: linux-doc, linux-kselftest, bpf, live-patching,
Greg Kroah-Hartman, Andrew Morton, Jonathan Corbet,
Mathieu Desnoyers, Joshua Peisach, Florian Weimer, Breno Leitao,
Anthony Iliopoulos, Michal Hocko, Jiri Olsa, Sasha Levin
In-Reply-To: <20260508195749.1885522-1-sashal@kernel.org>
When a kernel (security) issue goes public, fleets stay exposed until a patched
kernel is built, distributed, and rebooted into.
For many such issues the simplest mitigation is to stop calling the buggy
function. Killswitch provides that. An admin writes:
    echo "engage af_alg_sendmsg -1" \
        > /sys/kernel/security/killswitch/control
After this, af_alg_sendmsg() returns -EPERM on every call without
running its body. The mitigation takes effect immediately, and is dropped on
the next reboot -- by which point a patched kernel is hopefully in place.
A lot of recent kernel issues sit in code paths most installs only have enabled
to support a relative minority of users: AF_ALG, ksmbd, nf_tables, vsock, ax25,
and friends.
For most users, the cost of "this socket family stops working for the day" is
much smaller than the cost of running a known vulnerable kernel until the fix
lands.
Why not an existing facility:
* livepatch needs a built, signed, per-kernel-version module per CVE.
Under Secure Boot the operator can't sign their own, so they wait
for the vendor, and only a minority of vendors actually ship
livepatches. Killswitch covers the days before that module shows
up.
* fail_function (CONFIG_FUNCTION_ERROR_INJECTION) is disabled in
most production kernels. Even where enabled, it only works on
functions pre-annotated with ALLOW_ERROR_INJECTION() in source -
no help for a freshly-disclosed CVE. The debugfs UI is blocked by
lockdown=integrity and the override is probabilistic.
* BPF override (bpf_override_return) honors the same
ALLOW_ERROR_INJECTION() whitelist, and BPF itself is off in many
production kernels. Even where on, the operator interface is
"load a verified BPF program," not a one-line write.
* Module blacklist only helps when the bug is in a loadable module.
Killswitch fills the gap: write a symbol to securityfs, and the function
returns the chosen value until it is disengaged or the system reboots.
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
Changes since v2:
- Fix LLVM=1 build: gate __noipa__ on __has_attribute() (Breno)
- Admin guide: do-not-engage list, pre-soak workflow, relation to
livepatch/fail_function/BPF (Michal, Mathieu, Joshua)
- Add CVE-2026-43284 (esp_input) worked example + netns selftest
- Drop unused [reason] token from Kconfig help and cmdline comment
- Commit message: spell out why livepatch / fail_function / BPF
override / module-blacklist don't cover this window.
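
The control grammar documented in the patch below (engage <symbol> <retval>,
disengage <symbol>, disengage_all) can be sketched in userspace as follows.
This is only an illustration: struct ks_request, ks_parse_control(), the buffer
sizes, and the sscanf-based tokenizing are all hypothetical and are not taken
from kernel/killswitch.c. The base-0 number parsing mirrors the
kstrtol(buf, 0, &val) call the patch uses for the retval file.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum ks_cmd { KS_ENGAGE, KS_DISENGAGE, KS_DISENGAGE_ALL };

/* Hypothetical parsed form of one line written to "control". */
struct ks_request {
	enum ks_cmd cmd;
	char symbol[64];
	long retval;
};

/* Returns 0 on success, -EINVAL if the line does not match the grammar. */
static int ks_parse_control(const char *line, struct ks_request *req)
{
	char verb[16], sym[64], val[32], extra[2];
	char *end;
	int n;

	assert(line && req);

	/* A fourth token, if present, makes the line malformed. */
	n = sscanf(line, "%15s %63s %31s %1s", verb, sym, val, extra);

	if (n == 1 && !strcmp(verb, "disengage_all")) {
		req->cmd = KS_DISENGAGE_ALL;
		return 0;
	}
	if (n == 2 && !strcmp(verb, "disengage")) {
		req->cmd = KS_DISENGAGE;
		strcpy(req->symbol, sym);
		return 0;
	}
	if (n == 3 && !strcmp(verb, "engage")) {
		errno = 0;
		req->retval = strtol(val, &end, 0);	/* accepts -1, 0x10, ... */
		if (errno || end == val || *end != '\0')
			return -EINVAL;
		req->cmd = KS_ENGAGE;
		strcpy(req->symbol, sym);
		return 0;
	}
	return -EINVAL;
}
```

With this sketch, the line "engage af_alg_sendmsg -1" parses to an engage
request with retval -1, which is -EPERM on Linux.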
Documentation/admin-guide/index.rst | 1 +
Documentation/admin-guide/killswitch.rst | 229 +++++
Documentation/admin-guide/tainted-kernels.rst | 8 +
MAINTAINERS | 11 +
include/linux/killswitch.h | 19 +
include/linux/panic.h | 3 +-
include/linux/security.h | 1 +
init/Kconfig | 2 +
kernel/Kconfig.killswitch | 31 +
kernel/Makefile | 1 +
kernel/killswitch.c | 863 ++++++++++++++++++
kernel/panic.c | 1 +
lib/Kconfig.debug | 13 +
lib/Makefile | 1 +
lib/test_killswitch.c | 85 ++
security/security.c | 1 +
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/killswitch/.gitignore | 1 +
tools/testing/selftests/killswitch/Makefile | 8 +
.../selftests/killswitch/cve_31431_test.c | 162 ++++
.../selftests/killswitch/cve_43284_test.c | 88 ++
.../selftests/killswitch/killswitch_test.sh | 254 ++++++
22 files changed, 1783 insertions(+), 1 deletion(-)
create mode 100644 Documentation/admin-guide/killswitch.rst
create mode 100644 include/linux/killswitch.h
create mode 100644 kernel/Kconfig.killswitch
create mode 100644 kernel/killswitch.c
create mode 100644 lib/test_killswitch.c
create mode 100644 tools/testing/selftests/killswitch/.gitignore
create mode 100644 tools/testing/selftests/killswitch/Makefile
create mode 100644 tools/testing/selftests/killswitch/cve_31431_test.c
create mode 100644 tools/testing/selftests/killswitch/cve_43284_test.c
create mode 100755 tools/testing/selftests/killswitch/killswitch_test.sh
diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst
index cd28dfe91b060..ca37dd70f108d 100644
--- a/Documentation/admin-guide/index.rst
+++ b/Documentation/admin-guide/index.rst
@@ -70,6 +70,7 @@ problems and bugs in particular.
bug-hunting
bug-bisect
tainted-kernels
+ killswitch
ramoops
dynamic-debug-howto
init
diff --git a/Documentation/admin-guide/killswitch.rst b/Documentation/admin-guide/killswitch.rst
new file mode 100644
index 0000000000000..a524cc9ee23ca
--- /dev/null
+++ b/Documentation/admin-guide/killswitch.rst
@@ -0,0 +1,229 @@
+.. SPDX-License-Identifier: GPL-2.0
+..
+.. Copyright (C) 2026 Sasha Levin <sashal@kernel.org>
+
+============
+Killswitch
+============
+
+Killswitch lets a privileged operator make a chosen kernel function
+return a fixed value without executing its body, as a temporary
+mitigation for a security bug while a real fix is being prepared.
+
+The function returns the operator-supplied value and nothing else
+runs in its place. There is no allowlist, no return-type check; if
+the kprobe layer accepts the symbol, killswitch engages it. Once
+engaged, the change is in effect on every CPU until ``disengage`` is
+written or the system reboots.
+
+Configuration
+=============
+
+``CONFIG_KILLSWITCH``
+ Enables the feature. Depends on ``SECURITYFS``, ``KPROBES`` (with
+ ftrace support), and ``FUNCTION_ERROR_INJECTION``.
+
+The interface
+=============
+
+::
+
+ /sys/kernel/security/killswitch/
+ engaged RO currently-engaged functions
+ control WO command sink
+ taint RO 0 or 1
+ fn/<name>/ per-function directory, created on engage
+ retval RW return value
+ hits RO per-cpu summed call count
+
+Three commands are accepted by ``control``::
+
+ engage <symbol> <retval>
+ disengage <symbol>
+ disengage_all
+
+Each engage and disengage emits a single ``KERN_WARNING`` line to
+dmesg with the symbol, retval, hit count (on disengage), and the
+operator's identity (uid/auid/sessionid/comm, or ``source=cmdline``).
+
+Engagement is rejected when:
+
+* the symbol is unknown, in a non-traceable section, on the kprobe
+ blacklist, or otherwise refused by ``register_kprobe`` (the error
+ from the kprobe layer is logged and returned to userspace);
+* the symbol is already engaged (``-EBUSY``);
+* the operator does not hold ``CAP_SYS_ADMIN``.
+
+Whatever value the operator writes is what the function returns.
+Writing the wrong type or wrong value lands in the caller as-is.
+
+Boot parameter
+==============
+
+``killswitch=fn1=<val>,fn2=<val>,...``
+
+Parsed early; engagements are applied at the end of kernel init
+once the kprobe subsystem is up. Parse failures emit a warning and
+skip the offending entry; they never panic.
+
+Useful for fleet rollout: when an issue drops, ship the mitigation
+in the bootloader / PXE config and roll the fleet through reboots
+while the real fix is being prepared.
+
+Tainting
+========
+
+The first successful engagement (runtime or boot-time) sets
+``TAINT_KILLSWITCH`` (bit 20, char ``H``). The taint persists across
+``disengage`` until reboot, so an oops on a killswitch-modified
+kernel is identifiable from the banner: ``Tainted: ... H`` tells a
+maintainer to consult ``engaged`` before further triage.
+
+Module unload
+=============
+
+If a module containing an engaged target is unloaded, killswitch
+auto-disengages the entry and emits a ``KERN_WARNING`` so the loss
+of mitigation is visible. Reloading the module does not silently
+re-arm the killswitch; the operator re-engages explicitly.
+
+Choosing the right target
+=========================
+
+A function that *looks* skippable may be relied on by callers for a
+side effect (a lock the caller releases, a refcount the caller
+drops, a scatterlist the caller consumes). The rule of thumb:
+
+ Pick the **highest-level** entry point that contains the bug.
+
+That gives callers no chance to dereference half-initialised state
+from a function whose body was skipped. Two illustrative examples
+from ``crypto/af_alg.c``:
+
+Anti-pattern: ``af_alg_count_tsgl``
+-----------------------------------
+
+``af_alg_count_tsgl()`` returns ``unsigned int`` (the number of TX
+SG entries). Engaging it with retval ``0`` causes the caller in
+``algif_aead.c`` to allocate a 1-entry scatterlist (its
+``if (!entries) entries = 1`` guard) and then walk the *real* TX
+SGL into that undersized destination via ``af_alg_pull_tsgl``,
+producing out-of-bounds writes. **Killswitching here introduces a
+worse bug than the one being mitigated.**
+
+Anti-pattern: ``af_alg_pull_tsgl``
+----------------------------------
+
+``af_alg_pull_tsgl()`` returns ``void``, so any retval is accepted.
+But its caller depends on the per-request SGL being filled in.
+Skipping the body leaves the per-request SGL with NULL pages; the
+next-stage ``memcpy_sglist`` dereferences them and the kernel
+oopses.
+
+Correct pattern: ``af_alg_sendmsg``
+-----------------------------------
+
+``af_alg_sendmsg()`` is the highest-level entry into the AF_ALG
+send path. Engaging it with retval ``-EPERM`` causes every send
+attempt to return -EPERM to userspace; no caller ever sees
+half-initialised state, and any AF_ALG-reachable bug downstream of
+``sendmsg`` is unreachable until the killswitch is disengaged.
+
+The canonical pattern: pick a syscall-handler-shaped function whose
+return value already encodes "this operation didn't happen", and
+let userspace handle the error as it would any other failed
+syscall.
+
+Correct pattern: ``esp_input`` (CVE-2026-43284)
+-----------------------------------------------
+
+The IPsec ESP receive-path bug fixed by ``xfrm: esp: avoid in-place
+decrypt on shared skb frags`` is reachable through ``esp_input()``
+in ``net/ipv4/esp4.c`` (and ``esp6_input()`` for IPv6). Engage these
+with retval ``-EINVAL``:
+
+::
+
+ echo "engage esp_input -22" > /sys/kernel/security/killswitch/control
+ echo "engage esp6_input -22" > /sys/kernel/security/killswitch/control
+
+Inbound ESP packets are then dropped before decapsulation, neutering
+any bug downstream of the ESP receive path. IPsec tunnels stop
+working; other networking is unaffected.
+
+Do not engage
+=============
+
+Do not killswitch:
+
+* process or memory primitives the rest of the kernel needs to
+ function: ``fork``, ``do_exit``, ``__alloc_pages``, ``kmalloc``,
+ ``schedule``, anything in ``mm/`` reached by every allocation.
+* hot paths in the scheduler, timekeeping, RCU, or interrupt entry.
+* functions invoked from the killswitch path itself (``securityfs``,
+ ``lockdown``, ``audit``, ``kprobe`` registration) -- the system
+ may livelock or refuse to disengage.
+* functions whose return value is read structurally (size, count,
+ pointer-to-allocated-thing) rather than as success/failure.
+ See the AF_ALG anti-patterns above for what goes wrong.
+
+When in doubt, measure first.
+
+Pre-soak before engaging
+========================
+
+If the target's call rate is unknown, attach a counter for a few
+seconds first. With perf::
+
+ perf probe --add 'esp_input'
+ perf stat -a -e probe:esp_input -- sleep 5
+
+Or with bpftrace::
+
+ bpftrace -e 'kprobe:esp_input { @hits = count(); } interval:s:5 { exit(); }'
+
+A target with ten thousand hits per second is not a candidate -- the
+kernel will not survive five seconds with that path returning a
+fixed error.
+
+Relation to other facilities
+============================
+
+* ``CONFIG_FUNCTION_ERROR_INJECTION`` provides the same architecture
+ trampoline (``override_function_with_return``), which killswitch
+ reuses. fail_function is debug-oriented: targets must be
+ pre-annotated with ``ALLOW_ERROR_INJECTION()`` in source, the
+ override is probabilistic, and the interface is on debugfs (blocked
+ under ``lockdown=integrity``). Killswitch is the production cousin:
+ no whitelist, deterministic, securityfs-visible under integrity
+ lockdown, with audit and taint.
+* livepatch can do everything killswitch can and more, at the cost
+ of building, signing, and shipping a kernel module per mitigation.
+ Killswitch is for the window before that module exists.
+* BPF override (``bpf_override_return``) needs a BPF program and
+ ``CONFIG_BPF_KPROBE_OVERRIDE``; appropriate when the policy is
+ conditional, overkill for "always return -EPERM".
+
+Safety notes
+============
+
+* In-flight calls during ``write()`` to ``control`` may run either
+ the original body or the override. The override is ``return X``,
+ which has no preconditions to violate.
+* SMP visibility comes from ``text_poke_bp()``. ``write()`` to
+ ``control`` returns only after every CPU sees the new path.
+* The ftrace ops unregister waits for in-flight pre-handlers, so
+ freeing the engagement attribute on disengage is safe.
+* Inline functions, freed ``__init`` symbols, and anything compiled
+ away cannot be killswitched. ``register_kprobe`` rejects them
+ with whatever error the kprobe layer chooses.
+
+Diagnostics
+===========
+
+Per-call hits are aggregated in a per-cpu counter readable at
+``/sys/kernel/security/killswitch/fn/<name>/hits``. Per-hit logging
+is not provided to avoid log storms on hot paths.
+
+A ``KILLSWITCH`` entry appears in the kernel taint vector once any
+engagement succeeds (also visible as ``H`` in the oops banner).
diff --git a/Documentation/admin-guide/tainted-kernels.rst b/Documentation/admin-guide/tainted-kernels.rst
index 9ead927a37c0f..71a6e3364eddc 100644
--- a/Documentation/admin-guide/tainted-kernels.rst
+++ b/Documentation/admin-guide/tainted-kernels.rst
@@ -102,6 +102,7 @@ Bit Log Number Reason that got the kernel tainted
17 _/T 131072 kernel was built with the struct randomization plugin
18 _/N 262144 an in-kernel test has been run
19 _/J 524288 userspace used a mutating debug operation in fwctl
+ 20 _/H 1048576 killswitch override engaged (function short-circuited)
=== === ====== ========================================================
Note: The character ``_`` is representing a blank in this table to make reading
@@ -189,3 +190,10 @@ More detailed explanation for tainting
19) ``J`` if userspace opened /dev/fwctl/* and performed a FWTCL_RPC_DEBUG_WRITE
to use the devices debugging features. Device debugging features could
cause the device to malfunction in undefined ways.
+
+ 20) ``H`` if the killswitch primitive (see
+ Documentation/admin-guide/killswitch.rst) has been engaged on at least
+ one function. The kernel is no longer running its source: at least one
+ function has been short-circuited to return a fixed value. The taint
+ persists across ``disengage`` until the next reboot — once the running
+ image has been modified, oops triage must reflect that.
diff --git a/MAINTAINERS b/MAINTAINERS
index b2040011a3865..b4005b61d444f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14350,6 +14350,17 @@ F: lib/Kconfig.kmsan
F: mm/kmsan/
F: scripts/Makefile.kmsan
+KILLSWITCH (function short-circuit mitigation)
+M: Sasha Levin <sashal@kernel.org>
+L: linux-kernel@vger.kernel.org
+S: Maintained
+F: Documentation/admin-guide/killswitch.rst
+F: include/linux/killswitch.h
+F: kernel/Kconfig.killswitch
+F: kernel/killswitch.c
+F: lib/test_killswitch.c
+F: tools/testing/selftests/killswitch/
+
KPROBES
M: Naveen N Rao <naveen@kernel.org>
M: "David S. Miller" <davem@davemloft.net>
diff --git a/include/linux/killswitch.h b/include/linux/killswitch.h
new file mode 100644
index 0000000000000..3fad49e180ddf
--- /dev/null
+++ b/include/linux/killswitch.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2026 Sasha Levin <sashal@kernel.org>
+ */
+#ifndef _LINUX_KILLSWITCH_H
+#define _LINUX_KILLSWITCH_H
+
+#ifdef CONFIG_KILLSWITCH
+int killswitch_engage(const char *symbol, long retval);
+int killswitch_disengage(const char *symbol);
+bool killswitch_is_engaged(const char *symbol);
+#else
+static inline int killswitch_engage(const char *symbol, long retval)
+{ return -EOPNOTSUPP; }
+static inline int killswitch_disengage(const char *symbol) { return -EOPNOTSUPP; }
+static inline bool killswitch_is_engaged(const char *symbol) { return false; }
+#endif
+
+#endif /* _LINUX_KILLSWITCH_H */
diff --git a/include/linux/panic.h b/include/linux/panic.h
index f1dd417e54b29..6699261a61f13 100644
--- a/include/linux/panic.h
+++ b/include/linux/panic.h
@@ -88,7 +88,8 @@ static inline void set_arch_panic_timeout(int timeout, int arch_default_timeout)
#define TAINT_RANDSTRUCT 17
#define TAINT_TEST 18
#define TAINT_FWCTL 19
-#define TAINT_FLAGS_COUNT 20
+#define TAINT_KILLSWITCH 20
+#define TAINT_FLAGS_COUNT 21
#define TAINT_FLAGS_MAX ((1UL << TAINT_FLAGS_COUNT) - 1)
struct taint_flag {
diff --git a/include/linux/security.h b/include/linux/security.h
index 41d7367cf4036..038027c33ba1a 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -146,6 +146,7 @@ enum lockdown_reason {
LOCKDOWN_DBG_WRITE_KERNEL,
LOCKDOWN_RTAS_ERROR_INJECTION,
LOCKDOWN_XEN_USER_ACTIONS,
+ LOCKDOWN_KILLSWITCH,
LOCKDOWN_INTEGRITY_MAX,
LOCKDOWN_KCORE,
LOCKDOWN_KPROBES,
diff --git a/init/Kconfig b/init/Kconfig
index 2937c4d308aec..5368dd4b5c65b 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -2278,6 +2278,8 @@ config ASN1
source "kernel/Kconfig.locks"
+source "kernel/Kconfig.killswitch"
+
config ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
bool
diff --git a/kernel/Kconfig.killswitch b/kernel/Kconfig.killswitch
new file mode 100644
index 0000000000000..a33f7ecb2861e
--- /dev/null
+++ b/kernel/Kconfig.killswitch
@@ -0,0 +1,31 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Killswitch: per-function short-circuit mitigation primitive.
+#
+# Copyright (C) 2026 Sasha Levin <sashal@kernel.org>
+#
+
+config KILLSWITCH
+ bool "Killswitch: short-circuit a kernel function as a CVE mitigation"
+ depends on SECURITYFS
+ depends on KPROBES && HAVE_KPROBES_ON_FTRACE
+ depends on HAVE_FUNCTION_ERROR_INJECTION
+ select FUNCTION_ERROR_INJECTION
+ help
+ Provide an admin-facing mechanism to make a chosen kernel function
+ return a fixed value without executing its body, as a temporary
+ mitigation for a security bug before a real fix is available.
+
+ Operators write "engage <symbol> <retval>" to
+ /sys/kernel/security/killswitch/control. The function entry is
+ redirected via a kprobe whose pre-handler sets the chosen return
+ value and short-circuits the call. There is no allowlist,
+ denylist, or return-type validation: if the kprobe layer accepts
+ the symbol the engagement proceeds, otherwise its error is
+ returned to userspace.
+
+ This is *not* livepatch: there is no replacement implementation,
+ the function simply returns the chosen value. Engaging a killswitch
+ taints the kernel (TAINT_KILLSWITCH, 'H'). Requires CAP_SYS_ADMIN.
+
+ If unsure, say N.
diff --git a/kernel/Makefile b/kernel/Makefile
index 6785982013dce..b3e408d9f275e 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -100,6 +100,7 @@ obj-$(CONFIG_GCOV_KERNEL) += gcov/
obj-$(CONFIG_KCOV) += kcov.o
obj-$(CONFIG_KPROBES) += kprobes.o
obj-$(CONFIG_FAIL_FUNCTION) += fail_function.o
+obj-$(CONFIG_KILLSWITCH) += killswitch.o
obj-$(CONFIG_KGDB) += debug/
obj-$(CONFIG_DETECT_HUNG_TASK) += hung_task.o
obj-$(CONFIG_LOCKUP_DETECTOR) += watchdog.o
diff --git a/kernel/killswitch.c b/kernel/killswitch.c
new file mode 100644
index 0000000000000..7f509c62ea748
--- /dev/null
+++ b/kernel/killswitch.c
@@ -0,0 +1,863 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Per-function short-circuit mitigation.
+ *
+ * Copyright (C) 2026 Sasha Levin <sashal@kernel.org>
+ *
+ * Engaging a killswitch installs a kprobe at the function's entry
+ * whose pre-handler sets the return register and skips the body via
+ * override_function_with_return(). Operator interface lives at
+ * /sys/kernel/security/killswitch/.
+ */
+
+#include <linux/audit.h>
+#include <linux/capability.h>
+#include <linux/cred.h>
+#include <linux/ctype.h>
+#include <linux/error-injection.h>
+#include <linux/init.h>
+#include <linux/killswitch.h>
+#include <linux/kprobes.h>
+#include <linux/kref.h>
+#include <linux/list.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/notifier.h>
+#include <linux/panic.h>
+#include <linux/percpu.h>
+#include <linux/printk.h>
+#include <linux/sched.h>
+#include <linux/security.h>
+#include <linux/seq_file.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+#include <linux/uaccess.h>
+#include <linux/uidgid.h>
+
+struct ks_attr {
+ struct list_head list;
+ struct kprobe kp;
+ /* atomic so a writer racing an in-flight call can't tear the long. */
+ atomic_long_t retval;
+ /* false once disengaged; per-fn file ops then return -EIDRM. */
+ bool engaged;
+ unsigned long __percpu *hits;
+ struct dentry *dir;
+ /* engaged_list holds one ref; each open per-fn fd holds one. */
+ struct kref refcnt;
+};
+
+static DEFINE_MUTEX(ks_lock);
+static LIST_HEAD(ks_engaged_list);
+static struct dentry *ks_root_dir;
+static struct dentry *ks_fn_dir; /* parent for per-fn directories */
+
+/* ------------------------------------------------------------------ *
+ * Pre-handler: the actual override *
+ * ------------------------------------------------------------------ */
+
+static int ks_kprobe_pre_handler(struct kprobe *kp, struct pt_regs *regs)
+{
+ struct ks_attr *attr = container_of(kp, struct ks_attr, kp);
+
+ this_cpu_inc(*attr->hits);
+ regs_set_return_value(regs, (unsigned long)atomic_long_read(&attr->retval));
+ override_function_with_return(regs);
+ return 1;
+}
+NOKPROBE_SYMBOL(ks_kprobe_pre_handler);
+
+/* Defined non-NULL so the kprobe layer keeps the IPMODIFY ops. */
+static void ks_kprobe_post_handler(struct kprobe *kp, struct pt_regs *regs,
+ unsigned long flags)
+{
+}
+
+/* ------------------------------------------------------------------ *
+ * Attribute lifecycle *
+ * ------------------------------------------------------------------ */
+
+static struct ks_attr *ks_attr_lookup(const char *symbol)
+{
+ struct ks_attr *attr;
+
+ list_for_each_entry(attr, &ks_engaged_list, list)
+ if (!strcmp(attr->kp.symbol_name, symbol))
+ return attr;
+ return NULL;
+}
+
+static unsigned long ks_attr_hits(const struct ks_attr *attr)
+{
+ unsigned long total = 0;
+ int cpu;
+
+ for_each_possible_cpu(cpu)
+ total += *per_cpu_ptr(attr->hits, cpu);
+ return total;
+}
+
+static void ks_attr_destroy(struct ks_attr *attr)
+{
+ if (!attr)
+ return;
+ free_percpu(attr->hits);
+ kfree(attr->kp.symbol_name);
+ kfree(attr);
+}
+
+static void ks_attr_kref_release(struct kref *kref)
+{
+ ks_attr_destroy(container_of(kref, struct ks_attr, refcnt));
+}
+
+static void ks_attr_get(struct ks_attr *attr)
+{
+ kref_get(&attr->refcnt);
+}
+
+static void ks_attr_put(struct ks_attr *attr)
+{
+ kref_put(&attr->refcnt, ks_attr_kref_release);
+}
+
+static struct ks_attr *ks_attr_alloc(const char *symbol)
+{
+ struct ks_attr *attr;
+
+ attr = kzalloc(sizeof(*attr), GFP_KERNEL);
+ if (!attr)
+ return NULL;
+
+ attr->kp.symbol_name = kstrdup(symbol, GFP_KERNEL);
+ if (!attr->kp.symbol_name)
+ goto err;
+
+ attr->hits = alloc_percpu(unsigned long);
+ if (!attr->hits)
+ goto err;
+
+ attr->kp.pre_handler = ks_kprobe_pre_handler;
+ attr->kp.post_handler = ks_kprobe_post_handler;
+ INIT_LIST_HEAD(&attr->list);
+ kref_init(&attr->refcnt);
+ return attr;
+
+err:
+ ks_attr_destroy(attr);
+ return NULL;
+}
+
+/* ------------------------------------------------------------------ *
+ * Securityfs: per-fn attribute files *
+ * ------------------------------------------------------------------ */
+
+/*
+ * Look up by symbol name (the parent dentry's basename) under
+ * ks_lock and confirm attr->dir is the file's parent dentry. This
+ * binds the fd to the engagement it was opened against and avoids
+ * dereferencing inode->i_private, which a racing disengage may have
+ * freed. d_parent is stable for the open's lifetime via the file's
+ * dentry reference.
+ */
+static int ks_attr_open(struct inode *inode, struct file *file)
+{
+ struct dentry *parent = file->f_path.dentry->d_parent;
+ const char *name = parent->d_name.name;
+ struct ks_attr *attr;
+
+ mutex_lock(&ks_lock);
+ attr = ks_attr_lookup(name);
+ if (attr && attr->dir == parent)
+ ks_attr_get(attr);
+ else
+ attr = NULL;
+ mutex_unlock(&ks_lock);
+ if (!attr)
+ return -ENOENT;
+ file->private_data = attr;
+ return 0;
+}
+
+static int ks_attr_release(struct inode *inode, struct file *file)
+{
+ ks_attr_put(file->private_data);
+ file->private_data = NULL;
+ return 0;
+}
+
+/* Caller must hold ks_lock. */
+static int ks_attr_check_live(const struct ks_attr *attr)
+{
+ return attr->engaged ? 0 : -EIDRM;
+}
+
+static ssize_t ks_retval_read(struct file *file, char __user *ubuf,
+ size_t count, loff_t *ppos)
+{
+ struct ks_attr *attr = file->private_data;
+ char buf[32];
+ long val;
+ int ret, len;
+
+ mutex_lock(&ks_lock);
+ ret = ks_attr_check_live(attr);
+ val = atomic_long_read(&attr->retval);
+ mutex_unlock(&ks_lock);
+ if (ret)
+ return ret;
+ len = scnprintf(buf, sizeof(buf), "%ld\n", val);
+ return simple_read_from_buffer(ubuf, count, ppos, buf, len);
+}
+
+static ssize_t ks_retval_write(struct file *file, const char __user *ubuf,
+ size_t count, loff_t *ppos)
+{
+ struct ks_attr *attr = file->private_data;
+ char buf[32];
+ long val;
+ int ret;
+
+ if (count >= sizeof(buf))
+ return -EINVAL;
+ if (copy_from_user(buf, ubuf, count))
+ return -EFAULT;
+ buf[count] = '\0';
+ strim(buf);
+
+ ret = kstrtol(buf, 0, &val);
+ if (ret)
+ return ret;
+
+ mutex_lock(&ks_lock);
+ ret = ks_attr_check_live(attr);
+ if (!ret)
+ atomic_long_set(&attr->retval, val);
+ mutex_unlock(&ks_lock);
+
+ return ret ? ret : count;
+}
+
+static const struct file_operations ks_retval_fops = {
+ .open = ks_attr_open,
+ .release = ks_attr_release,
+ .read = ks_retval_read,
+ .write = ks_retval_write,
+ .llseek = default_llseek,
+};
+
+static ssize_t ks_hits_read(struct file *file, char __user *ubuf,
+ size_t count, loff_t *ppos)
+{
+ struct ks_attr *attr = file->private_data;
+ char buf[32];
+ unsigned long hits;
+ int ret, len;
+
+ mutex_lock(&ks_lock);
+ ret = ks_attr_check_live(attr);
+ hits = ks_attr_hits(attr);
+ mutex_unlock(&ks_lock);
+ if (ret)
+ return ret;
+ len = scnprintf(buf, sizeof(buf), "%lu\n", hits);
+ return simple_read_from_buffer(ubuf, count, ppos, buf, len);
+}
+
+static const struct file_operations ks_hits_fops = {
+ .open = ks_attr_open,
+ .release = ks_attr_release,
+ .read = ks_hits_read,
+ .llseek = default_llseek,
+};
+
+static int ks_create_attr_dir(struct ks_attr *attr)
+{
+ struct dentry *d;
+
+ attr->dir = securityfs_create_dir(attr->kp.symbol_name, ks_fn_dir);
+ if (IS_ERR(attr->dir))
+ return PTR_ERR(attr->dir);
+
+ /* ks_attr_open looks the attr up by name; i_private is unused. */
+ d = securityfs_create_file("retval", 0600, attr->dir,
+ NULL, &ks_retval_fops);
+ if (IS_ERR(d))
+ goto err;
+ d = securityfs_create_file("hits", 0400, attr->dir,
+ NULL, &ks_hits_fops);
+ if (IS_ERR(d))
+ goto err;
+ return 0;
+err:
+ securityfs_remove(attr->dir);
+ attr->dir = NULL;
+ return PTR_ERR(d);
+}
+
+/* ------------------------------------------------------------------ *
+ * Engage / disengage *
+ * ------------------------------------------------------------------ */
+
+static int __ks_engage(const char *symbol, long retval, bool from_cmdline)
+{
+ struct ks_attr *attr;
+ int ret;
+
+ if (!symbol || !*symbol)
+ return -EINVAL;
+
+ if (!from_cmdline) {
+ ret = security_locked_down(LOCKDOWN_KILLSWITCH);
+ if (ret)
+ return ret;
+ }
+
+ mutex_lock(&ks_lock);
+
+ if (ks_attr_lookup(symbol)) {
+ ret = -EBUSY;
+ goto out_unlock;
+ }
+
+ attr = ks_attr_alloc(symbol);
+ if (!attr) {
+ ret = -ENOMEM;
+ goto out_unlock;
+ }
+
+ atomic_long_set(&attr->retval, retval);
+
+ ret = register_kprobe(&attr->kp);
+ if (ret) {
+ pr_warn("killswitch: register_kprobe(%s) failed: %d\n",
+ symbol, ret);
+ ks_attr_put(attr);
+ goto out_unlock;
+ }
+
+ ret = ks_create_attr_dir(attr);
+ if (ret) {
+ unregister_kprobe(&attr->kp);
+ ks_attr_put(attr);
+ goto out_unlock;
+ }
+
+ list_add_tail(&attr->list, &ks_engaged_list);
+ attr->engaged = true;
+ add_taint(TAINT_KILLSWITCH, LOCKDEP_STILL_OK);
+
+ if (from_cmdline) {
+ pr_warn("killswitch: engage %s=%ld source=cmdline\n",
+ symbol, retval);
+ } else {
+ pr_warn("killswitch: engage %s=%ld uid=%u auid=%u ses=%u comm=%s\n",
+ symbol, retval,
+ from_kuid(&init_user_ns, current_uid()),
+ from_kuid(&init_user_ns, audit_get_loginuid(current)),
+ audit_get_sessionid(current),
+ current->comm);
+ }
+ ret = 0;
+
+out_unlock:
+ mutex_unlock(&ks_lock);
+ return ret;
+}
+
+int killswitch_engage(const char *symbol, long retval)
+{
+ return __ks_engage(symbol, retval, false);
+}
+
+static int __ks_disengage(const char *symbol)
+{
+ struct ks_attr *attr;
+ unsigned long hits;
+ int ret = 0;
+
+ mutex_lock(&ks_lock);
+ attr = ks_attr_lookup(symbol);
+ if (!attr) {
+ ret = -ENOENT;
+ goto out_unlock;
+ }
+
+ unregister_kprobe(&attr->kp);
+ attr->engaged = false;
+ list_del(&attr->list);
+ hits = ks_attr_hits(attr);
+ securityfs_remove(attr->dir);
+
+ pr_warn("killswitch: disengage %s hits=%lu uid=%u auid=%u ses=%u comm=%s\n",
+ symbol, hits,
+ from_kuid(&init_user_ns, current_uid()),
+ from_kuid(&init_user_ns, audit_get_loginuid(current)),
+ audit_get_sessionid(current),
+ current->comm);
+
+ /* unregister_kprobe() already waited out in-flight pre-handlers. */
+ ks_attr_put(attr);
+
+out_unlock:
+ mutex_unlock(&ks_lock);
+ return ret;
+}
+
+int killswitch_disengage(const char *symbol)
+{
+ return __ks_disengage(symbol);
+}
+
+bool killswitch_is_engaged(const char *symbol)
+{
+ bool engaged;
+
+ mutex_lock(&ks_lock);
+ engaged = ks_attr_lookup(symbol) != NULL;
+ mutex_unlock(&ks_lock);
+ return engaged;
+}
+
+static void ks_disengage_all_locked(void)
+{
+ struct ks_attr *attr, *n;
+
+ list_for_each_entry_safe(attr, n, &ks_engaged_list, list) {
+ unregister_kprobe(&attr->kp);
+ attr->engaged = false;
+ list_del(&attr->list);
+ securityfs_remove(attr->dir);
+ pr_warn("killswitch: disengage %s hits=%lu (disengage_all)\n",
+ attr->kp.symbol_name, ks_attr_hits(attr));
+ ks_attr_put(attr);
+ }
+}
+
+/* ------------------------------------------------------------------ *
+ * Module unload: drop engagements on functions in the going module *
+ * ------------------------------------------------------------------ */
+
+static int ks_module_notify(struct notifier_block *nb, unsigned long action,
+ void *data)
+{
+ struct module *mod = data;
+ struct ks_attr *attr, *n;
+
+ if (action != MODULE_STATE_GOING)
+ return NOTIFY_DONE;
+
+ mutex_lock(&ks_lock);
+ list_for_each_entry_safe(attr, n, &ks_engaged_list, list) {
+ if (!attr->kp.addr ||
+ __module_address((unsigned long)attr->kp.addr) != mod)
+ continue;
+
+ pr_warn("killswitch: %s mitigation lost: module %s unloading; re-engage after reload if still needed\n",
+ attr->kp.symbol_name, mod->name);
+ unregister_kprobe(&attr->kp);
+ attr->engaged = false;
+ list_del(&attr->list);
+ securityfs_remove(attr->dir);
+ ks_attr_put(attr);
+ }
+ mutex_unlock(&ks_lock);
+ return NOTIFY_DONE;
+}
+
+static struct notifier_block ks_module_nb = {
+ .notifier_call = ks_module_notify,
+};
+
+/* ------------------------------------------------------------------ *
+ * Top-level securityfs files: control / engaged / taint *
+ * ------------------------------------------------------------------ */
+
+static int ks_engaged_show(struct seq_file *m, void *v)
+{
+ struct ks_attr *attr;
+
+ mutex_lock(&ks_lock);
+ list_for_each_entry(attr, &ks_engaged_list, list) {
+ seq_printf(m, "%s retval=%ld hits=%lu\n",
+ attr->kp.symbol_name,
+ atomic_long_read(&attr->retval),
+ ks_attr_hits(attr));
+ }
+ mutex_unlock(&ks_lock);
+ return 0;
+}
+
+static int ks_engaged_open(struct inode *inode, struct file *file)
+{
+ return single_open(file, ks_engaged_show, NULL);
+}
+
+static const struct file_operations ks_engaged_fops = {
+ .open = ks_engaged_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = single_release,
+};
+
+static ssize_t ks_taint_read(struct file *file, char __user *ubuf,
+ size_t count, loff_t *ppos)
+{
+ char buf[4];
+ int len;
+
+ len = scnprintf(buf, sizeof(buf), "%d\n",
+ test_taint(TAINT_KILLSWITCH) ? 1 : 0);
+ return simple_read_from_buffer(ubuf, count, ppos, buf, len);
+}
+
+static const struct file_operations ks_taint_fops = {
+ .open = simple_open,
+ .read = ks_taint_read,
+ .llseek = default_llseek,
+};
+
+/*
+ * control: parse one of:
+ * engage <symbol> <retval>
+ * disengage <symbol>
+ * disengage_all
+ */
+static ssize_t ks_control_write(struct file *file, const char __user *ubuf,
+ size_t count, loff_t *ppos)
+{
+ char *buf, *cur, *verb, *sym, *retstr;
+ long retval = 0;
+ int ret;
+
+ if (!capable(CAP_SYS_ADMIN))
+ return -EPERM;
+
+ if (count == 0 || count > 4096)
+ return -EINVAL;
+
+ buf = memdup_user_nul(ubuf, count);
+ if (IS_ERR(buf))
+ return PTR_ERR(buf);
+
+ cur = strim(buf);
+ verb = strsep(&cur, " \t\n");
+ if (!verb || !*verb) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ if (!strcmp(verb, "disengage_all")) {
+ mutex_lock(&ks_lock);
+ ks_disengage_all_locked();
+ mutex_unlock(&ks_lock);
+ ret = count;
+ goto out;
+ }
+
+ sym = strsep(&cur, " \t\n");
+ if (!sym || !*sym) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ if (!strcmp(verb, "disengage")) {
+ ret = __ks_disengage(sym);
+ ret = ret ? ret : count;
+ goto out;
+ }
+
+ if (strcmp(verb, "engage")) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ retstr = strsep(&cur, " \t\n");
+ if (!retstr || !*retstr) {
+ ret = -EINVAL;
+ goto out;
+ }
+ if (kstrtol(retstr, 0, &retval)) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ ret = killswitch_engage(sym, retval);
+ if (!ret)
+ ret = count;
+
+out:
+ kfree(buf);
+ return ret;
+}
+
+static const struct file_operations ks_control_fops = {
+ .open = simple_open,
+ .write = ks_control_write,
+ .llseek = noop_llseek,
+};
+
+/* ------------------------------------------------------------------ *
+ * Boot parameter: *
+ * killswitch=fn1=-1,fn2=0,fn3=-22 *
+ * ------------------------------------------------------------------ */
+
+#define KS_BOOT_BUF 1024
+static char ks_boot_buf[KS_BOOT_BUF] __initdata;
+static bool ks_boot_present __initdata;
+
+static int __init ks_boot_setup(char *str)
+{
+ if (!str)
+ return 0;
+ strscpy(ks_boot_buf, str, sizeof(ks_boot_buf));
+ ks_boot_present = true;
+ return 1;
+}
+__setup("killswitch=", ks_boot_setup);
+
+static void __init ks_apply_boot_params(void)
+{
+ char *cur, *tok;
+ long retval;
+
+ if (!ks_boot_present)
+ return;
+
+ cur = ks_boot_buf;
+ while ((tok = strsep(&cur, ",")) != NULL) {
+ char *eq, *sym, *retstr;
+
+ if (!*tok)
+ continue;
+ eq = strchr(tok, '=');
+ if (!eq) {
+ pr_warn("killswitch: cmdline missing '=': %s\n", tok);
+ continue;
+ }
+ *eq++ = '\0';
+ sym = tok;
+ retstr = eq;
+
+ if (kstrtol(retstr, 0, &retval)) {
+ pr_warn("killswitch: cmdline bad retval %s=%s\n",
+ sym, retstr);
+ continue;
+ }
+
+ if (__ks_engage(sym, retval, true))
+ pr_warn("killswitch: cmdline engage %s failed\n", sym);
+ }
+}
+
+/* ------------------------------------------------------------------ *
+ * Init *
+ * ------------------------------------------------------------------ */
+
+static int __init killswitch_init(void)
+{
+ struct dentry *d;
+
+ ks_root_dir = securityfs_create_dir("killswitch", NULL);
+ if (IS_ERR(ks_root_dir))
+ return PTR_ERR(ks_root_dir);
+
+ d = securityfs_create_file("control", 0200, ks_root_dir,
+ NULL, &ks_control_fops);
+ if (IS_ERR(d))
+ goto err;
+ d = securityfs_create_file("engaged", 0444, ks_root_dir,
+ NULL, &ks_engaged_fops);
+ if (IS_ERR(d))
+ goto err;
+ d = securityfs_create_file("taint", 0444, ks_root_dir,
+ NULL, &ks_taint_fops);
+ if (IS_ERR(d))
+ goto err;
+
+ ks_fn_dir = securityfs_create_dir("fn", ks_root_dir);
+ if (IS_ERR(ks_fn_dir)) {
+ d = ks_fn_dir;
+ goto err;
+ }
+
+ register_module_notifier(&ks_module_nb);
+ ks_apply_boot_params();
+
+ pr_info("killswitch: ready (securityfs at /sys/kernel/security/killswitch/)\n");
+ return 0;
+
+err:
+ securityfs_remove(ks_root_dir);
+ return PTR_ERR(d);
+}
+late_initcall(killswitch_init);
+
+/* ------------------------------------------------------------------ *
+ * KUnit tests *
+ * ------------------------------------------------------------------ */
+
+#if IS_ENABLED(CONFIG_KUNIT)
+#include <kunit/test.h>
+
+/* Non-static so kallsyms resolves them without CONFIG_KALLSYMS_ALL. */
+int ks_kunit_target_int(int x);
+void *ks_kunit_target_ptr(int x);
+
+#if __has_attribute(__noipa__)
+# define KS_KUNIT_NOIPA __attribute__((__noipa__))
+#else
+# define KS_KUNIT_NOIPA noinline __noclone
+#endif
+
+KS_KUNIT_NOIPA int ks_kunit_target_int(int x)
+{
+ return x + 1;
+}
+
+KS_KUNIT_NOIPA void *ks_kunit_target_ptr(int x)
+{
+ return ERR_PTR(-EIO);
+}
+
+static int ks_kunit_init(struct kunit *test)
+{
+ if (security_locked_down(LOCKDOWN_KILLSWITCH))
+ kunit_skip(test, "integrity lockdown blocks killswitch_engage()");
+ return 0;
+}
+
+static int ks_kunit_init_lockdown(struct kunit *test)
+{
+ if (!security_locked_down(LOCKDOWN_KILLSWITCH))
+ kunit_skip(test, "requires lockdown=integrity");
+ return 0;
+}
+
+static void ks_disengage_quiet(const char *sym)
+{
+ if (killswitch_is_engaged(sym))
+ killswitch_disengage(sym);
+}
+
+static void ks_test_engage_int(struct kunit *test)
+{
+ int ret;
+
+ ret = killswitch_engage("ks_kunit_target_int", -EPERM);
+ KUNIT_EXPECT_EQ(test, ret, 0);
+ KUNIT_EXPECT_EQ(test, ks_kunit_target_int(7), -EPERM);
+ KUNIT_EXPECT_EQ(test, killswitch_disengage("ks_kunit_target_int"), 0);
+ KUNIT_EXPECT_EQ(test, ks_kunit_target_int(7), 8);
+}
+
+static void ks_test_double_engage(struct kunit *test)
+{
+ KUNIT_ASSERT_EQ(test,
+ killswitch_engage("ks_kunit_target_int", 0), 0);
+ KUNIT_EXPECT_EQ(test,
+ killswitch_engage("ks_kunit_target_int", 0), -EBUSY);
+ ks_disengage_quiet("ks_kunit_target_int");
+}
+
+static void ks_test_disengage_unknown(struct kunit *test)
+{
+ KUNIT_EXPECT_EQ(test,
+ killswitch_disengage("ks_kunit_target_int"), -ENOENT);
+}
+
+static void ks_test_pointer_target(struct kunit *test)
+{
+ long retval = (long)(unsigned long)ERR_PTR(-EACCES);
+
+ KUNIT_ASSERT_EQ(test,
+ killswitch_engage("ks_kunit_target_ptr", retval), 0);
+ KUNIT_EXPECT_TRUE(test, IS_ERR(ks_kunit_target_ptr(0)));
+ KUNIT_EXPECT_EQ(test, PTR_ERR(ks_kunit_target_ptr(0)), -EACCES);
+ ks_disengage_quiet("ks_kunit_target_ptr");
+}
+
+static void ks_test_taint_set(struct kunit *test)
+{
+ KUNIT_ASSERT_EQ(test,
+ killswitch_engage("ks_kunit_target_int", 0), 0);
+ KUNIT_EXPECT_TRUE(test, test_taint(TAINT_KILLSWITCH));
+ ks_disengage_quiet("ks_kunit_target_int");
+ /* taint must persist even after disengage */
+ KUNIT_EXPECT_TRUE(test, test_taint(TAINT_KILLSWITCH));
+}
+
+static void ks_test_hits_counter(struct kunit *test)
+{
+ struct ks_attr *attr;
+ int i;
+
+ KUNIT_ASSERT_EQ(test,
+ killswitch_engage("ks_kunit_target_int", 0), 0);
+
+ for (i = 0; i < 17; i++)
+ (void)ks_kunit_target_int(i);
+
+ mutex_lock(&ks_lock);
+ attr = ks_attr_lookup("ks_kunit_target_int");
+ KUNIT_EXPECT_NOT_NULL(test, attr);
+ if (attr)
+ KUNIT_EXPECT_EQ(test, ks_attr_hits(attr), 17UL);
+ mutex_unlock(&ks_lock);
+
+ ks_disengage_quiet("ks_kunit_target_int");
+}
+
+static struct kunit_case ks_kunit_cases[] = {
+ KUNIT_CASE(ks_test_engage_int),
+ KUNIT_CASE(ks_test_double_engage),
+ KUNIT_CASE(ks_test_disengage_unknown),
+ KUNIT_CASE(ks_test_pointer_target),
+ KUNIT_CASE(ks_test_taint_set),
+ KUNIT_CASE(ks_test_hits_counter),
+ {}
+};
+
+static struct kunit_suite ks_kunit_suite = {
+ .name = "killswitch",
+ .init = ks_kunit_init,
+ .test_cases = ks_kunit_cases,
+};
+
+/*
+ * Lockdown suite. Skipped unless the kernel was booted with
+ * lockdown=integrity (or higher). Run together with
+ * killswitch=ks_kunit_target_int=... on the same cmdline to also
+ * exercise the cmdline-bypass and disengage-under-lockdown paths.
+ */
+static void ks_test_lockdown_runtime_engage(struct kunit *test)
+{
+ KUNIT_EXPECT_EQ(test,
+ killswitch_engage("ks_kunit_target_int", 0), -EPERM);
+}
+
+static void ks_test_lockdown_cmdline_disengage(struct kunit *test)
+{
+ if (!killswitch_is_engaged("ks_kunit_target_int"))
+ kunit_skip(test,
+ "requires killswitch=ks_kunit_target_int=... on cmdline");
+ KUNIT_EXPECT_EQ(test,
+ killswitch_disengage("ks_kunit_target_int"), 0);
+}
+
+static struct kunit_case ks_kunit_lockdown_cases[] = {
+ KUNIT_CASE(ks_test_lockdown_runtime_engage),
+ KUNIT_CASE(ks_test_lockdown_cmdline_disengage),
+ {}
+};
+
+static struct kunit_suite ks_kunit_lockdown_suite = {
+ .name = "killswitch_lockdown",
+ .init = ks_kunit_init_lockdown,
+ .test_cases = ks_kunit_lockdown_cases,
+};
+
+kunit_test_suites(&ks_kunit_suite, &ks_kunit_lockdown_suite);
+
+#endif /* CONFIG_KUNIT */
+
diff --git a/kernel/panic.c b/kernel/panic.c
index 20feada5319d4..8ee174c7b7dd0 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -825,6 +825,7 @@ const struct taint_flag taint_flags[TAINT_FLAGS_COUNT] = {
TAINT_FLAG(RANDSTRUCT, 'T', ' '),
TAINT_FLAG(TEST, 'N', ' '),
TAINT_FLAG(FWCTL, 'J', ' '),
+ TAINT_FLAG(KILLSWITCH, 'H', ' '),
};
#undef TAINT_FLAG
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 8ff5adcfe1e0a..5770639c7b0ea 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -3349,6 +3349,19 @@ config TEST_HMM
If unsure, say N.
+config TEST_KILLSWITCH
+ tristate "Test module for the killswitch mitigation primitive"
+ depends on KILLSWITCH && DEBUG_FS
+ depends on m
+ help
+ Build a module that exposes a deliberately-vulnerable function
+ ks_test_vuln() and a debugfs trigger /sys/kernel/debug/test_killswitch/fire.
+ The killswitch selftest in tools/testing/selftests/killswitch/
+ uses this to confirm that engaging a killswitch suppresses the
+ -EBADMSG error the function would otherwise return.
+
+ If unsure, say N.
+
config TEST_FREE_PAGES
tristate "Test freeing pages"
help
diff --git a/lib/Makefile b/lib/Makefile
index f33a24bf1c19a..d763225340674 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -100,6 +100,7 @@ obj-$(CONFIG_TEST_MEMCAT_P) += test_memcat_p.o
obj-$(CONFIG_TEST_OBJAGG) += test_objagg.o
obj-$(CONFIG_TEST_MEMINIT) += test_meminit.o
obj-$(CONFIG_TEST_LOCKUP) += test_lockup.o
+obj-$(CONFIG_TEST_KILLSWITCH) += test_killswitch.o
obj-$(CONFIG_TEST_HMM) += test_hmm.o
obj-$(CONFIG_TEST_FREE_PAGES) += test_free_pages.o
obj-$(CONFIG_TEST_REF_TRACKER) += test_ref_tracker.o
diff --git a/lib/test_killswitch.c b/lib/test_killswitch.c
new file mode 100644
index 0000000000000..cc2584ad652ff
--- /dev/null
+++ b/lib/test_killswitch.c
@@ -0,0 +1,85 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Test target for the killswitch selftest. ks_test_vuln() returns
+ * -EBADMSG on a magic input, standing in for "the buggy path runs
+ * and produces a bad outcome". Engaging killswitch on this function
+ * with retval 0 is the mitigation.
+ *
+ * Copyright (C) 2026 Sasha Levin <sashal@kernel.org>
+ */
+
+#include <linux/debugfs.h>
+#include <linux/fs.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/uaccess.h>
+
+#define KS_TEST_MAGIC 0xC0FFEEL
+
+int ks_test_vuln(long magic);
+
+/*
+ * Returns -EBADMSG on the magic input -- stands in for "the buggy
+ * path runs and produces a bad outcome". Engaging a killswitch on
+ * this function with retval 0 represents the mitigation: even on
+ * the magic input, callers see success because the body never runs.
+ *
+ * noinline keeps the compiler from inlining the body, so the call
+ * actually reaches the kprobe-instrumented entry point.
+ */
+noinline int ks_test_vuln(long magic)
+{
+ if (magic == KS_TEST_MAGIC)
+ return -EBADMSG;
+ return 0;
+}
+EXPORT_SYMBOL_GPL(ks_test_vuln);
+
+static struct dentry *ks_test_dir;
+
+static ssize_t ks_test_fire_write(struct file *file, const char __user *ubuf,
+ size_t count, loff_t *ppos)
+{
+ char buf[32];
+ long magic;
+ int ret;
+
+ if (count == 0 || count >= sizeof(buf))
+ return -EINVAL;
+ if (copy_from_user(buf, ubuf, count))
+ return -EFAULT;
+ buf[count] = '\0';
+
+ ret = kstrtol(strim(buf), 0, &magic);
+ if (ret)
+ return ret;
+
+ ret = ks_test_vuln(magic);
+ return ret ? ret : count;
+}
+
+static const struct file_operations ks_test_fire_fops = {
+ .write = ks_test_fire_write,
+ .open = simple_open,
+ .llseek = noop_llseek,
+};
+
+static int __init test_killswitch_init(void)
+{
+ ks_test_dir = debugfs_create_dir("test_killswitch", NULL);
+ debugfs_create_file("fire", 0200, ks_test_dir, NULL,
+ &ks_test_fire_fops);
+ pr_info("test_killswitch: loaded (magic=0x%lx)\n", KS_TEST_MAGIC);
+ return 0;
+}
+module_init(test_killswitch_init);
+
+static void __exit test_killswitch_exit(void)
+{
+ debugfs_remove_recursive(ks_test_dir);
+}
+module_exit(test_killswitch_exit);
+
+MODULE_LICENSE("GPL v2");
+MODULE_DESCRIPTION("Deliberately-vulnerable target for killswitch selftest");
diff --git a/security/security.c b/security/security.c
index 4e999f0236516..bf700abc911a9 100644
--- a/security/security.c
+++ b/security/security.c
@@ -62,6 +62,7 @@ const char *const lockdown_reasons[LOCKDOWN_CONFIDENTIALITY_MAX + 1] = {
[LOCKDOWN_DBG_WRITE_KERNEL] = "use of kgdb/kdb to write kernel RAM",
[LOCKDOWN_RTAS_ERROR_INJECTION] = "RTAS error injection",
[LOCKDOWN_XEN_USER_ACTIONS] = "Xen guest user action",
+ [LOCKDOWN_KILLSWITCH] = "engaging a killswitch",
[LOCKDOWN_INTEGRITY_MAX] = "integrity",
[LOCKDOWN_KCORE] = "/proc/kcore access",
[LOCKDOWN_KPROBES] = "use of kprobes",
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 6e59b8f63e416..04c3f8c5ff229 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -53,6 +53,7 @@ TARGETS += ipc
TARGETS += ir
TARGETS += kcmp
TARGETS += kexec
+TARGETS += killswitch
TARGETS += kselftest_harness
TARGETS += kvm
TARGETS += landlock
diff --git a/tools/testing/selftests/killswitch/.gitignore b/tools/testing/selftests/killswitch/.gitignore
new file mode 100644
index 0000000000000..cbf204ce18615
--- /dev/null
+++ b/tools/testing/selftests/killswitch/.gitignore
@@ -0,0 +1,2 @@
+cve_31431_test
+cve_43284_test
diff --git a/tools/testing/selftests/killswitch/Makefile b/tools/testing/selftests/killswitch/Makefile
new file mode 100644
index 0000000000000..ccf41165cb73d
--- /dev/null
+++ b/tools/testing/selftests/killswitch/Makefile
@@ -0,0 +1,8 @@
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) 2026 Sasha Levin <sashal@kernel.org>
+TEST_GEN_PROGS := cve_31431_test cve_43284_test
+TEST_PROGS := killswitch_test.sh
+
+CFLAGS += -O2 -g -std=gnu99 -Wall $(KHDR_INCLUDES)
+
+include ../lib.mk
diff --git a/tools/testing/selftests/killswitch/cve_31431_test.c b/tools/testing/selftests/killswitch/cve_31431_test.c
new file mode 100644
index 0000000000000..1ff817c51d881
--- /dev/null
+++ b/tools/testing/selftests/killswitch/cve_31431_test.c
@@ -0,0 +1,162 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * AF_ALG AEAD round-trip prober. The killswitch selftest uses this
+ * to demonstrate that engaging a killswitch on af_alg_sendmsg
+ * neuters AF_ALG operations (sendmsg returns -EPERM), mitigating
+ * any AF_ALG-reachable bug whose exploit primitive runs from the
+ * send path.
+ *
+ * Exit codes:
+ * 0 AEAD round-trip succeeded (function intact)
+ * 1 AEAD round-trip refused (mitigation engaged)
+ * 2 setup error (no AF_ALG, missing aead/gcm(aes), etc.) -> SKIP
+ *
+ * Copyright (C) 2026 Sasha Levin <sashal@kernel.org>
+ */
+
+#include <errno.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/socket.h>
+#include <unistd.h>
+#include <linux/if_alg.h>
+
+#define KEY_LEN 16
+#define IV_LEN 12
+#define AAD_LEN 16
+#define PT_LEN 64
+#define TAG_LEN 16
+#define EXPECTED_LEN (AAD_LEN + PT_LEN + TAG_LEN)
+
+#ifndef AF_ALG
+#define AF_ALG 38
+#endif
+#ifndef SOL_ALG
+#define SOL_ALG 279
+#endif
+
+int main(void)
+{
+ struct sockaddr_alg sa = {
+ .salg_family = AF_ALG,
+ .salg_type = "aead",
+ .salg_name = "gcm(aes)",
+ };
+ unsigned char key[KEY_LEN] = { 0 };
+ unsigned char iv[IV_LEN] = { 0 };
+ unsigned char buf[1024] = { 0 };
+ struct msghdr msg = { 0 };
+ struct iovec iov;
+ struct cmsghdr *cmsg;
+ struct af_alg_iv *aiv;
+ char cbuf[256] = { 0 };
+ int *p_op, *p_assoclen;
+ int sk, opfd;
+ ssize_t n;
+
+ sk = socket(AF_ALG, SOCK_SEQPACKET, 0);
+ if (sk < 0) {
+ fprintf(stderr, "AF_ALG socket: %s -- skip\n", strerror(errno));
+ return 2;
+ }
+ if (bind(sk, (struct sockaddr *)&sa, sizeof(sa))) {
+ fprintf(stderr, "bind aead/gcm(aes): %s -- skip\n",
+ strerror(errno));
+ close(sk);
+ return 2;
+ }
+ if (setsockopt(sk, SOL_ALG, ALG_SET_KEY, key, KEY_LEN)) {
+ fprintf(stderr, "ALG_SET_KEY: %s -- skip\n", strerror(errno));
+ close(sk);
+ return 2;
+ }
+ if (setsockopt(sk, SOL_ALG, ALG_SET_AEAD_AUTHSIZE, NULL, TAG_LEN)) {
+ fprintf(stderr, "ALG_SET_AEAD_AUTHSIZE: %s -- skip\n",
+ strerror(errno));
+ close(sk);
+ return 2;
+ }
+
+ opfd = accept(sk, NULL, 0);
+ if (opfd < 0) {
+ fprintf(stderr, "accept: %s -- skip\n", strerror(errno));
+ close(sk);
+ return 2;
+ }
+
+ /* control message: ENCRYPT op + IV + assoclen */
+ msg.msg_control = cbuf;
+ msg.msg_controllen = CMSG_SPACE(sizeof(int))
+ + CMSG_SPACE(sizeof(*aiv) + IV_LEN)
+ + CMSG_SPACE(sizeof(int));
+
+ cmsg = CMSG_FIRSTHDR(&msg);
+ cmsg->cmsg_level = SOL_ALG;
+ cmsg->cmsg_type = ALG_SET_OP;
+ cmsg->cmsg_len = CMSG_LEN(sizeof(int));
+ p_op = (int *)CMSG_DATA(cmsg);
+ *p_op = ALG_OP_ENCRYPT;
+
+ cmsg = CMSG_NXTHDR(&msg, cmsg);
+ cmsg->cmsg_level = SOL_ALG;
+ cmsg->cmsg_type = ALG_SET_IV;
+ cmsg->cmsg_len = CMSG_LEN(sizeof(*aiv) + IV_LEN);
+ aiv = (struct af_alg_iv *)CMSG_DATA(cmsg);
+ aiv->ivlen = IV_LEN;
+ memcpy(aiv->iv, iv, IV_LEN);
+
+ cmsg = CMSG_NXTHDR(&msg, cmsg);
+ cmsg->cmsg_level = SOL_ALG;
+ cmsg->cmsg_type = ALG_SET_AEAD_ASSOCLEN;
+ cmsg->cmsg_len = CMSG_LEN(sizeof(int));
+ p_assoclen = (int *)CMSG_DATA(cmsg);
+ *p_assoclen = AAD_LEN;
+
+ /* AAD || plaintext */
+ memset(buf, 0xaa, AAD_LEN);
+ memset(buf + AAD_LEN, 0x55, PT_LEN);
+ iov.iov_base = buf;
+ iov.iov_len = AAD_LEN + PT_LEN;
+ msg.msg_iov = &iov;
+ msg.msg_iovlen = 1;
+
+ n = sendmsg(opfd, &msg, 0);
+ if (n < 0) {
+ /*
+ * sendmsg refused: this is exactly the killswitch
+ * af_alg_sendmsg=-EPERM mitigation outcome. Distinct
+ * exit code from setup failure so the test script can
+ * tell them apart.
+ */
+ fprintf(stderr, "sendmsg: %s -- mitigation engaged?\n",
+ strerror(errno));
+ close(opfd); close(sk);
+ return 1;
+ }
+
+ /* recv: AAD echoed, plus ciphertext + tag */
+ memset(buf, 0, sizeof(buf));
+ n = read(opfd, buf, EXPECTED_LEN);
+ close(opfd); close(sk);
+
+ if (n == 0) {
+ printf("AEAD returned 0 bytes -- killswitch mitigation engaged\n");
+ return 1;
+ }
+ if (n != EXPECTED_LEN) {
+ fprintf(stderr,
+ "AEAD short read: got %zd, expected %d -- mitigated?\n",
+ n, EXPECTED_LEN);
+ return 1;
+ }
+
+ /* sanity: a constant-byte run after the AAD means no encryption */
+ if (memcmp(buf + AAD_LEN, buf + AAD_LEN + 1, PT_LEN - 1) == 0) {
+ fprintf(stderr, "AEAD output looks unencrypted\n");
+ return 2;
+ }
+
+ printf("AEAD round-trip OK (%zd bytes)\n", n);
+ return 0;
+}
diff --git a/tools/testing/selftests/killswitch/cve_43284_test.c b/tools/testing/selftests/killswitch/cve_43284_test.c
new file mode 100644
index 0000000000000..4771cb0957dc1
--- /dev/null
+++ b/tools/testing/selftests/killswitch/cve_43284_test.c
@@ -0,0 +1,88 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * UDP loopback round-trip prober. Wrapped by killswitch_test.sh with
+ * an IPsec ESP SA + policy pair on loopback, this demonstrates that
+ * engaging a killswitch on esp_input drops inbound ESP packets before
+ * decapsulation, mitigating CVE-2026-43284 ("Dirty Frag", upstream fix
+ * xfrm: esp: avoid in-place decrypt on shared skb frags).
+ *
+ * The binary itself knows nothing about ESP -- it sends one UDP
+ * datagram to itself and waits up to a second for delivery.
+ *
+ * Exit codes:
+ * 0 UDP round-trip succeeded (no mitigation in effect)
+ * 1 UDP recv timed out (mitigation engaged)
+ * 2 setup error -> SKIP
+ *
+ * Copyright (C) 2026 Sasha Levin <sashal@kernel.org>
+ */
+
+#include <arpa/inet.h>
+#include <errno.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/socket.h>
+#include <sys/time.h>
+#include <unistd.h>
+
+#define UDP_PORT 53435
+#define PROBE "ks-43284-probe"
+
+int main(void)
+{
+ struct sockaddr_in addr = {
+ .sin_family = AF_INET,
+ .sin_port = htons(UDP_PORT),
+ .sin_addr.s_addr = htonl(INADDR_LOOPBACK),
+ };
+ struct timeval tv = { .tv_sec = 1, .tv_usec = 0 };
+ char buf[64];
+ int sk;
+ ssize_t n;
+
+ sk = socket(AF_INET, SOCK_DGRAM, 0);
+ if (sk < 0) {
+ fprintf(stderr, "socket: %s -- skip\n", strerror(errno));
+ return 2;
+ }
+ if (bind(sk, (struct sockaddr *)&addr, sizeof(addr))) {
+ fprintf(stderr, "bind: %s -- skip\n", strerror(errno));
+ close(sk);
+ return 2;
+ }
+ if (setsockopt(sk, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv))) {
+ fprintf(stderr, "SO_RCVTIMEO: %s -- skip\n", strerror(errno));
+ close(sk);
+ return 2;
+ }
+
+ if (sendto(sk, PROBE, sizeof(PROBE) - 1, 0,
+ (struct sockaddr *)&addr, sizeof(addr)) < 0) {
+ fprintf(stderr, "sendto: %s -- skip\n", strerror(errno));
+ close(sk);
+ return 2;
+ }
+
+ memset(buf, 0, sizeof(buf));
+ n = recvfrom(sk, buf, sizeof(buf), 0, NULL, NULL);
+ close(sk);
+
+ if (n < 0) {
+ if (errno == EAGAIN || errno == EWOULDBLOCK) {
+ fprintf(stderr,
+ "recvfrom: timeout -- mitigation engaged?\n");
+ return 1;
+ }
+ fprintf(stderr, "recvfrom: %s\n", strerror(errno));
+ return 2;
+ }
+ if (n != (ssize_t)(sizeof(PROBE) - 1) ||
+ memcmp(buf, PROBE, sizeof(PROBE) - 1)) {
+ fprintf(stderr, "recvfrom: bad payload (%zd bytes)\n", n);
+ return 2;
+ }
+
+ printf("UDP round-trip OK (%zd bytes)\n", n);
+ return 0;
+}
diff --git a/tools/testing/selftests/killswitch/killswitch_test.sh b/tools/testing/selftests/killswitch/killswitch_test.sh
new file mode 100755
index 0000000000000..ea3fd394a984f
--- /dev/null
+++ b/tools/testing/selftests/killswitch/killswitch_test.sh
@@ -0,0 +1,254 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# End-to-end killswitch selftest. Drives the test_killswitch module
+# through an engage/disengage cycle and confirms each transition
+# behaves as expected. Also runs the AF_ALG and IPsec ESP mitigation proofs.
+#
+# Requirements (see Documentation/admin-guide/killswitch.rst):
+# - CONFIG_KILLSWITCH=y
+# - CONFIG_TEST_KILLSWITCH=m
+# - run as root (CAP_SYS_ADMIN)
+#
+# Copyright (C) 2026 Sasha Levin <sashal@kernel.org>
+#
+
+set -u
+
+KS=/sys/kernel/security/killswitch
+TRIG=/sys/kernel/debug/test_killswitch/fire
+
+NOMOD=0
+SKIP_RC=4
+N=0
+FAIL=0
+
+ksft_pass() { N=$((N+1)); echo "ok $N - $*"; }
+ksft_fail() { N=$((N+1)); FAIL=$((FAIL+1)); echo "not ok $N - $*"; }
+ksft_skip() { echo "ok 1 - SKIP $*"; echo "1..1"; exit $SKIP_RC; }
+
+[[ $EUID -eq 0 ]] || ksft_skip "must be root"
+[[ -d $KS ]] || ksft_skip "$KS not present (CONFIG_KILLSWITCH disabled?)"
+
+if ! modprobe test_killswitch 2>/dev/null; then
+ NOMOD=1
+fi
+[[ -e $TRIG ]] || ksft_skip "$TRIG missing (test_killswitch.ko not installed?)"
+
+cleanup() {
+ echo "disengage_all" > $KS/control 2>/dev/null || true
+ [[ $NOMOD -eq 0 ]] && rmmod test_killswitch 2>/dev/null || true
+}
+trap cleanup EXIT
+
+# --- pre-engage: bad path runs, write fails with EBADMSG ---
+if echo 0xC0FFEE > $TRIG 2>/dev/null; then
+ ksft_fail "pre-engage: write should have failed (-EBADMSG)"
+else
+ [[ $? -ne 0 ]] && ksft_pass "pre-engage: bad path returns error" \
+ || ksft_fail "pre-engage: unexpected outcome"
+fi
+
+# --- engage ---
+echo "engage ks_test_vuln 0" > $KS/control
+grep -q "^ks_test_vuln" $KS/engaged \
+ && ksft_pass "engage: ks_test_vuln in engaged list" \
+ || ksft_fail "engage: missing from engaged list"
+
+[[ $(cat $KS/taint) == 1 ]] \
+ && ksft_pass "engage: taint set" \
+ || ksft_fail "engage: taint not set"
+
+[[ -d $KS/fn/ks_test_vuln ]] \
+ && ksft_pass "engage: per-fn dir created" \
+ || ksft_fail "engage: per-fn dir missing"
+
+# --- post-engage: error suppressed; write returns successfully ---
+if echo 0xC0FFEE > $TRIG 2>/dev/null; then
+ ksft_pass "post-engage: error suppressed, write succeeded"
+else
+ ksft_fail "post-engage: write should succeed"
+fi
+
+[[ $(cat $KS/fn/ks_test_vuln/hits) -ge 1 ]] \
+ && ksft_pass "post-engage: hits counter incremented" \
+ || ksft_fail "post-engage: hits counter did not move"
+
+# --- retval rewrite is a plain write (no validation) ---
+echo 7 > $KS/fn/ks_test_vuln/retval
+[[ $(cat $KS/fn/ks_test_vuln/retval) == 7 ]] \
+ && ksft_pass "retval rewrite round-trips" \
+ || ksft_fail "retval rewrite failed"
+
+# --- engage on a kprobe-rejected function fails ---
+# warn_thunk_thunk is in /sys/kernel/debug/kprobes/blacklist;
+# register_kprobe() refuses it.
+KP_REJECT=warn_thunk_thunk
+if echo "engage $KP_REJECT 0" > $KS/control 2>/dev/null; then
+ ksft_fail "register_kprobe should have rejected $KP_REJECT"
+ echo "disengage $KP_REJECT" > $KS/control
+else
+ ksft_pass "register_kprobe refuses blacklisted target"
+fi
+
+# --- disengage ---
+echo "disengage ks_test_vuln" > $KS/control
+[[ -z "$(cat $KS/engaged)" ]] \
+ && ksft_pass "disengage: engaged list empty" \
+ || ksft_fail "disengage: engaged list not empty"
+
+[[ ! -d $KS/fn/ks_test_vuln ]] \
+ && ksft_pass "disengage: per-fn dir removed" \
+ || ksft_fail "disengage: per-fn dir still present"
+
+[[ $(cat $KS/taint) == 1 ]] \
+ && ksft_pass "disengage: taint persists" \
+ || ksft_fail "disengage: taint should persist"
+
+# --- post-disengage: bad path active again ---
+if echo 0xC0FFEE > $TRIG 2>/dev/null; then
+ ksft_fail "post-disengage: write should fail again"
+else
+ ksft_pass "post-disengage: bad path active again"
+fi
+
+# ---- CVE-2026-31431 mitigation proof (AF_ALG aead via af_alg_sendmsg) ----
+# Skip the whole block if AF_ALG / AEAD machinery isn't compiled in.
+if [[ -x $(dirname "$0")/cve_31431_test ]]; then
+ CVE=$(dirname "$0")/cve_31431_test
+ $CVE >/dev/null 2>&1 && PRE=$? || PRE=$?
+ if [[ $PRE -eq 0 ]]; then
+ ksft_pass "cve-31431: pre-engage AEAD round-trip OK"
+
+ echo "engage af_alg_sendmsg -1" > $KS/control
+ $CVE >/dev/null 2>&1 && POST=$? || POST=$?
+ if [[ $POST -eq 1 ]]; then
+ ksft_pass "cve-31431: post-engage AEAD refused (mitigated)"
+ else
+ ksft_fail "cve-31431: post-engage exit=$POST (expected 1)"
+ fi
+
+ HITS=$(cat $KS/fn/af_alg_sendmsg/hits 2>/dev/null || echo 0)
+ [[ $HITS -ge 1 ]] && ksft_pass "cve-31431: hits=$HITS recorded" \
+ || ksft_fail "cve-31431: hits not recorded"
+
+ echo "disengage af_alg_sendmsg" > $KS/control
+ $CVE >/dev/null 2>&1 && POST2=$? || POST2=$?
+ [[ $POST2 -eq 0 ]] && ksft_pass "cve-31431: post-disengage restored" \
+ || ksft_fail "cve-31431: post-disengage exit=$POST2"
+ elif [[ $PRE -eq 2 ]]; then
+ echo "# SKIP cve-31431 (AF_ALG/AEAD not available)"
+ else
+ ksft_fail "cve-31431: pre-engage exit=$PRE"
+ fi
+fi
+
+# ---- CVE-2026-43284 mitigation proof (IPsec ESP via esp_input) ----
+# Engaging esp_input causes inbound ESP packets to be dropped before
+# decapsulation, neutering any bug downstream of the ESP receive path.
+# Two netns + veth so traffic actually traverses xfrm (single-netns
+# 127.0.0.0/8 traffic short-circuits before xfrm policy lookup).
+NS0=ks-esp-0
+NS1=ks-esp-1
+esp_setup_ok=0
+esp_cleanup() {
+ # Always attempt removal: setup can fail after the namespaces exist.
+ ip netns del $NS0 2>/dev/null
+ ip netns del $NS1 2>/dev/null
+}
+trap 'cleanup; esp_cleanup' EXIT
+
+# UDP probe in python3; the whole block is skipped when python3 is missing.
+esp_round_trip() {
+ # $1: source netns, $2: dest netns, $3: dest ip, $4: port
+ local tmp rpid rc
+ tmp=$(mktemp)
+ ip netns exec "$2" python3 -c '
+import socket
+r = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
+r.bind(("0.0.0.0", '"$4"'))
+r.settimeout(2.0)
+try:
+ d,_ = r.recvfrom(64)
+ print(d.decode(errors="replace"))
+except socket.timeout:
+ print("timeout")
+' > "$tmp" 2>&1 &
+ rpid=$!
+ sleep 0.3
+ ip netns exec "$1" python3 -c '
+import socket
+s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
+s.sendto(b"ks-esp-probe", ("'"$3"'", '"$4"'))
+' 2>/dev/null
+ wait $rpid 2>/dev/null
+ rc=1
+ grep -q "ks-esp-probe" "$tmp" && rc=0
+ rm -f "$tmp"
+ return $rc
+}
+
+if command -v ip >/dev/null 2>&1 && command -v python3 >/dev/null 2>&1; then
+ KEY=0x0123456789abcdef0123456789abcdef01234567
+
+ if ip netns add $NS0 2>/dev/null && \
+ ip netns add $NS1 2>/dev/null && \
+ ip link add veth0 type veth peer name veth1 2>/dev/null && \
+ ip link set veth0 netns $NS0 2>/dev/null && \
+ ip link set veth1 netns $NS1 2>/dev/null && \
+ ip -n $NS0 addr add 10.99.0.1/24 dev veth0 2>/dev/null && \
+ ip -n $NS1 addr add 10.99.0.2/24 dev veth1 2>/dev/null && \
+ ip -n $NS0 link set veth0 up 2>/dev/null && \
+ ip -n $NS1 link set veth1 up 2>/dev/null && \
+ ip -n $NS0 link set lo up 2>/dev/null && \
+ ip -n $NS1 link set lo up 2>/dev/null && \
+ ip -n $NS0 xfrm state add src 10.99.0.1 dst 10.99.0.2 proto esp \
+ spi 0x1000 mode transport reqid 0x100 \
+ aead 'rfc4106(gcm(aes))' $KEY 128 2>/dev/null && \
+ ip -n $NS0 xfrm state add src 10.99.0.2 dst 10.99.0.1 proto esp \
+ spi 0x1001 mode transport reqid 0x100 \
+ aead 'rfc4106(gcm(aes))' $KEY 128 2>/dev/null && \
+ ip -n $NS1 xfrm state add src 10.99.0.1 dst 10.99.0.2 proto esp \
+ spi 0x1000 mode transport reqid 0x100 \
+ aead 'rfc4106(gcm(aes))' $KEY 128 2>/dev/null && \
+ ip -n $NS1 xfrm state add src 10.99.0.2 dst 10.99.0.1 proto esp \
+ spi 0x1001 mode transport reqid 0x100 \
+ aead 'rfc4106(gcm(aes))' $KEY 128 2>/dev/null && \
+ ip -n $NS0 xfrm policy add src 10.99.0.1 dst 10.99.0.2 \
+ dir out tmpl src 10.99.0.1 dst 10.99.0.2 proto esp \
+ reqid 0x100 mode transport 2>/dev/null && \
+ ip -n $NS1 xfrm policy add src 10.99.0.1 dst 10.99.0.2 \
+ dir in tmpl src 10.99.0.1 dst 10.99.0.2 proto esp \
+ reqid 0x100 mode transport 2>/dev/null; then
+ esp_setup_ok=1
+ fi
+
+ if [[ $esp_setup_ok -eq 1 ]] \
+ && esp_round_trip $NS0 $NS1 10.99.0.2 53435; then
+ ksft_pass "cve-43284: pre-engage ESP round-trip OK"
+
+ echo "engage esp_input -22" > $KS/control
+ if esp_round_trip $NS0 $NS1 10.99.0.2 53435; then
+ ksft_fail "cve-43284: post-engage ESP should have been dropped"
+ else
+ ksft_pass "cve-43284: post-engage ESP refused (mitigated)"
+ fi
+
+ ESP_HITS=$(cat $KS/fn/esp_input/hits 2>/dev/null || echo 0)
+ [[ $ESP_HITS -ge 1 ]] \
+ && ksft_pass "cve-43284: hits=$ESP_HITS recorded" \
+ || ksft_fail "cve-43284: hits not recorded"
+
+ echo "disengage esp_input" > $KS/control
+ if esp_round_trip $NS0 $NS1 10.99.0.2 53435; then
+ ksft_pass "cve-43284: post-disengage restored"
+ else
+ ksft_fail "cve-43284: post-disengage ESP still dropped"
+ fi
+ else
+ echo "# SKIP cve-43284 (netns/veth/XFRM/ESP setup failed)"
+ fi
+fi
+
+echo "1..$N"
+exit $((FAIL > 0))
--
2.53.0
^ permalink raw reply related
* [PATCH] docs: gpu: fix spelling errors and remove duplicate sentence
From: Elliot Tester @ 2026-05-17 13:41 UTC (permalink / raw)
To: alexander.deucher, christian.koenig, maarten.lankhorst, mripard,
tzimmermann, airlied, simona, corbet
Cc: skhan, amd-gfx, dri-devel, linux-doc, linux-kernel, Elliot Tester
Fix various spelling errors in GPU docs:
- indicies -> indices (userq.rst)
- umap -> unmap (userq.rst)
- pre-empt -> preempt (drm-compute.rst)
- buffer-leaks -> buffer leaks (drm-uapi.rst)
- Additionally to -> In addition to (drm-uapi.rst)
- unpriviledged -> unprivileged (drm-uapi.rst)
- fucntions -> functions (todo.rst)
- varios -> various (todo.rst)
- implementions -> implementations (todo.rst)
- complection -> completion (todo.rst)
Also remove a duplicated sentence and a stray "uff." in todo.rst, add
the missing period after the drm_ioctl.c reference, and add the missing
newline at the end of drm-uapi.rst. These fixes make the docs a little
easier to read.
Signed-off-by: Elliot Tester <elliotctester1@gmail.com>
---
Documentation/gpu/amdgpu/userq.rst | 4 ++--
Documentation/gpu/drm-compute.rst | 2 +-
Documentation/gpu/drm-uapi.rst | 10 +++++-----
Documentation/gpu/todo.rst | 11 +++++------
4 files changed, 13 insertions(+), 14 deletions(-)
diff --git a/Documentation/gpu/amdgpu/userq.rst b/Documentation/gpu/amdgpu/userq.rst
index 88f54393b..94427e18a 100644
--- a/Documentation/gpu/amdgpu/userq.rst
+++ b/Documentation/gpu/amdgpu/userq.rst
@@ -156,9 +156,9 @@ IOCTL Interfaces
GPU virtual addresses used for queues and related data (rptrs, wptrs, context
save areas, etc.) should be validated by the kernel mode driver to prevent the
user from specifying invalid GPU virtual addresses. If the user provides
-invalid GPU virtual addresses or doorbell indicies, the IOCTL should return an
+invalid GPU virtual addresses or doorbell indices, the IOCTL should return an
error message. These buffers should also be tracked in the kernel driver so
-that if the user attempts to unmap the buffer(s) from the GPUVM, the umap call
+that if the user attempts to unmap the buffer(s) from the GPUVM, the unmap call
would return an error.
INFO
diff --git a/Documentation/gpu/drm-compute.rst b/Documentation/gpu/drm-compute.rst
index f90c3e63a..35cc8d654 100644
--- a/Documentation/gpu/drm-compute.rst
+++ b/Documentation/gpu/drm-compute.rst
@@ -7,7 +7,7 @@ seconds. (The time let the user wait before he reaches for the power button).
This means that other techniques need to be used to manage those workloads,
that cannot use fences.
-Some hardware may schedule compute jobs, and have no way to pre-empt them, or
+Some hardware may schedule compute jobs, and have no way to preempt them, or
have their memory swapped out from them. Or they simply want their workload
not to be preempted or swapped out at all.
diff --git a/Documentation/gpu/drm-uapi.rst b/Documentation/gpu/drm-uapi.rst
index 579e87cb9..0ef498bff 100644
--- a/Documentation/gpu/drm-uapi.rst
+++ b/Documentation/gpu/drm-uapi.rst
@@ -150,10 +150,10 @@ separate render node called renderD<num>. There will be one render node
per device. No ioctls except PRIME-related ioctls will be allowed on
this node. Especially GEM_OPEN will be explicitly prohibited. For a
complete list of driver-independent ioctls that can be used on render
-nodes, see the ioctls marked DRM_RENDER_ALLOW in drm_ioctl.c Render
-nodes are designed to avoid the buffer-leaks, which occur if clients
+nodes, see the ioctls marked DRM_RENDER_ALLOW in drm_ioctl.c. Render
+nodes are designed to avoid the buffer leaks, which occur if clients
guess the flink names or mmap offsets on the legacy interface.
-Additionally to this basic interface, drivers must mark their
+In addition to this basic interface, drivers must mark their
driver-dependent render-only ioctls as DRM_RENDER_ALLOW so render
clients can use them. Driver authors must be careful not to allow any
privileged ioctls on render nodes.
@@ -568,7 +568,7 @@ ENOSPC:
EPERM/EACCES:
Returned for an operation that is valid, but needs more privileges.
E.g. root-only or much more common, DRM master-only operations return
- this when called by unpriviledged clients. There's no clear
+ this when called by unprivileged clients. There's no clear
difference between EACCES and EPERM.
ENODEV:
@@ -761,4 +761,4 @@ Stable uAPI events
From ``drivers/gpu/drm/scheduler/gpu_scheduler_trace.h``
.. kernel-doc:: drivers/gpu/drm/scheduler/gpu_scheduler_trace.h
- :doc: uAPI trace events
\ No newline at end of file
+ :doc: uAPI trace events
diff --git a/Documentation/gpu/todo.rst b/Documentation/gpu/todo.rst
index bc9f14c8a..b13cd4347 100644
--- a/Documentation/gpu/todo.rst
+++ b/Documentation/gpu/todo.rst
@@ -55,7 +55,7 @@ There are still drivers that use drm_simple_display_pipe. The task here is to
convert them to use regular atomic helpers. Search for a driver that calls
drm_simple_display_pipe_init() and inline all helpers from drm_simple_kms_helper.c
into the driver, such that no simple-KMS interfaces are required. Please also
-rename all inlined fucntions according to driver conventions.
+rename all inlined functions according to driver conventions.
Contact: Thomas Zimmermann, respective driver maintainer
@@ -301,7 +301,7 @@ Various hold-ups:
valid formats for atomic drivers.
- Many drivers subclass drm_framebuffer, we'd need a embedding compatible
- version of the varios drm_gem_fb_create functions. Maybe called
+ version of the various drm_gem_fb_create functions. Maybe called
drm_gem_fb_create/_with_dirty/_with_funcs as needed.
Contact: Simona Vetter
@@ -326,10 +326,9 @@ everything after it has done the write-protect/mkwrite trickery:
vma->vm_page_prot = pgprot_wrprotect(vma->vm_page_prot);
-- Set the mkwrite and fsync callbacks with similar implementions to the core
+- Set the mkwrite and fsync callbacks with similar implementations to the core
fbdev defio stuff. These should all work on plain ptes, they don't actually
- require a struct page. uff. These should all work on plain ptes, they don't
- actually require a struct page.
+ require a struct page.
- Track the dirty pages in a separate structure (bitfield with one bit per page
should work) to avoid clobbering struct page.
@@ -914,7 +913,7 @@ Querying errors from drm_syncobj
================================
The drm_syncobj container can be used by driver independent code to signal
-complection of submission.
+completion of submission.
One minor feature still missing is a generic DRM IOCTL to query the error
status of binary and timeline drm_syncobj.
--
2.54.0
^ permalink raw reply related
* Re: [PATCH v6 03/11] dt-bindings: mfd: add documentation for S2MU005 PMIC
From: Kaustabh Chakraborty @ 2026-05-17 13:09 UTC (permalink / raw)
To: Conor Dooley, Kaustabh Chakraborty
Cc: Lee Jones, Pavel Machek, Rob Herring, Krzysztof Kozlowski,
Conor Dooley, MyungJoo Ham, Chanwoo Choi, Sebastian Reichel,
Krzysztof Kozlowski, André Draszik, Alexandre Belloni,
Jonathan Corbet, Shuah Khan, Nam Tran,
Łukasz Lebiedziński, linux-leds, devicetree,
linux-kernel, linux-pm, linux-samsung-soc, linux-rtc, linux-doc
In-Reply-To: <20260516-esquire-chitchat-0fffa597e2f3@spud>
On 2026-05-16 23:25 +01:00, Conor Dooley wrote:
> On Sat, May 16, 2026 at 02:41:29AM +0530, Kaustabh Chakraborty wrote:
>> On 2026-05-15 18:14 +01:00, Conor Dooley wrote:
>> > On Fri, May 15, 2026 at 04:08:59PM +0530, Kaustabh Chakraborty wrote:
>> >> Samsung's S2MU005 PMIC includes subdevices for a charger, an MUIC (Micro
>> >> USB Interface Controller), and flash and RGB LED controllers.
>> >>
>> >> Add the compatible and documentation for the S2MU005 PMIC. Also, add an
>> >> example for nodes for supported sub-devices, i.e. MUIC, flash LEDs, and
>> >> RGB LEDs. Charger sub-device uses the node of the parent.
>> >>
>> >> Signed-off-by: Kaustabh Chakraborty <kauschluss@disroot.org>
>> >> ---
>> >> .../bindings/mfd/samsung,s2mu005-pmic.yaml | 120 +++++++++++++++++++++
>> >> 1 file changed, 120 insertions(+)
>> >>
>> >> diff --git a/Documentation/devicetree/bindings/mfd/samsung,s2mu005-pmic.yaml b/Documentation/devicetree/bindings/mfd/samsung,s2mu005-pmic.yaml
>> >> new file mode 100644
>> >> index 0000000000000..0e6afb7d2017b
>> >> --- /dev/null
>> >> +++ b/Documentation/devicetree/bindings/mfd/samsung,s2mu005-pmic.yaml
>> >> @@ -0,0 +1,120 @@
>> >> +# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
>> >> +%YAML 1.2
>> >> +---
>> >> +$id: http://devicetree.org/schemas/mfd/samsung,s2mu005-pmic.yaml#
>> >> +$schema: http://devicetree.org/meta-schemas/core.yaml#
>> >> +
>> >> +title: Samsung S2MU005 Power Management IC
>> >> +
>> >> +maintainers:
>> >> + - Kaustabh Chakraborty <kauschluss@disroot.org>
>> >> +
>> >> +description: |
>> >> + The S2MU005 is a companion power management IC which includes subdevices for
>> >> + a charger controller, an MUIC (Micro USB Interface Controller), and flash and
>> >> + RGB LED controllers.
>> >> +
>> >> +allOf:
>> >> + - $ref: /schemas/power/supply/power-supply.yaml#
>> >> +
>> >> +properties:
>> >> + compatible:
>> >> + const: samsung,s2mu005-pmic
>> >> +
>> >> + flash:
>> >> + $ref: /schemas/leds/samsung,s2mu005-flash.yaml
>> >> + description:
>> >> + Child node describing flash LEDs.
>> >> +
>> >> + interrupts:
>> >> + maxItems: 1
>> >> +
>> >> + muic:
>> >> + $ref: /schemas/extcon/samsung,s2mu005-muic.yaml#
>> >> + description:
>> >> + Child node describing MUIC device.
>> >> +
>> >> + multi-led:
>> >> + type: object
>> >> +
>> >> + allOf:
>> >> + - $ref: /schemas/leds/leds-class-multicolor.yaml#
>> >
>> > Does this need to be an allOf when the other refs are not?
>>
>> It has its own properties, that's the reason. This used to be its own
>> thing in dt-bindings/leds, but I was asked to move it here in prior
>> reviews.
>
> What do you mean by "its own properties"?
I mean, the other schemas (muic, flash) are in their own files, each with
a compatible and other properties too.
This one inherits properties from leds-class-multicolor AND has a
"compatible" property, which is not defined in leds-class-multicolor.
Now, if you ask why the compatible exists, that's something Krzysztof
suggested in previous revisions.
Krzysztof had also reviewed this patch, and (as with the previous patch)
I missed the trailers; that has been addressed in v7.
>>
>> >> +
>> >> + properties:
>> >> + compatible:
>> >> + const: samsung,s2mu005-rgb
>> >> +
>> >> + required:
>> >> + - compatible
>> >> +
>> >> + unevaluatedProperties: false
>> >> +
>> >> + reg:
>> >> + maxItems: 1
>> >
>> > Move this above the child nodes please.
>>
>> But properties are sorted in lex order?
>
> Typically the binding is sorted in the same order as properties go in
> nodes. Common stuff like reg/clocks/interrupts therefore ends up above
> child nodes.
So, do I change this? For one, I don't see the same being followed in
other Samsung schemas in the same directory (not that I'm trying to pose
it as an argument against your suggestion), and this was reviewed by
Krzysztof, which is addressed in v7.
^ permalink raw reply
* [PATCH v6 2/2] hwmon: add AMD Promontory 21 xHCI temperature sensor support
From: Jihong Min @ 2026-05-17 13:04 UTC (permalink / raw)
To: Greg Kroah-Hartman, Mathias Nyman
Cc: Guenter Roeck, Jonathan Corbet, Shuah Khan, Mario Limonciello,
Basavaraj Natikar, linux-usb, linux-hwmon, linux-doc, linux-pci,
linux-kernel, Jihong Min, Mario Limonciello (AMD),
Yaroslav Isakov
In-Reply-To: <20260517130407.795157-1-hurryman2212@gmail.com>
Add an auxiliary-bus hwmon driver for the temperature sensor exposed by
AMD Promontory 21 (PROM21) xHCI PCI functions. The driver binds to the
"hwmon" auxiliary device published by the PROM21 xHCI PCI glue and
exposes the sensor as temp1_input under the prom21_xhci hwmon device.
The sensor is accessed through a PROM21 vendor index/data register pair
in the xHCI PCI MMIO BAR. The driver consumes parent-provided MMIO data
from the PROM21 PCI glue instead of inspecting the parent PCI driver's
drvdata. The read path restores the previous vendor index value after
sampling and does not runtime-resume the parent PCI device; reads from a
suspended parent return -ENODATA.
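
For readers who want to trace the read path described above without
hardware, here is a minimal userspace model in Python. It is only a sketch:
FakeProm21Regs, read_temp_raw() and the parent_active flag are hypothetical
names invented for illustration, the fake register file assumes the data
register is meaningful only while the temperature selector is latched, and
ENODATA uses the asm-generic errno value. Only the offsets, the selector and
the save/select/read/restore order come from the patch.

```python
ENODATA = 61  # asm-generic errno value; used here only as an error marker

INDEX_OFFSET = 0x3000       # vendor index register (from the patch)
DATA_OFFSET = 0x3008        # vendor data register (from the patch)
TEMP_SELECTOR = 0x0001E520  # temperature selector (from the patch)


class FakeProm21Regs:
    """Hypothetical stand-in for the xHCI MMIO BAR, not a kernel API."""

    def __init__(self, raw_temp, index=0):
        self.index = index
        self.raw_temp = raw_temp
        self.writes = []  # record of values written to the index register

    def readl(self, offset):
        if offset == INDEX_OFFSET:
            return self.index
        if offset == DATA_OFFSET:
            # Model assumption: data is valid only with the selector latched.
            return self.raw_temp if self.index == TEMP_SELECTOR else 0
        raise ValueError(f"unmodelled offset {offset:#x}")

    def writel(self, value, offset):
        if offset != INDEX_OFFSET:
            raise ValueError(f"unmodelled offset {offset:#x}")
        self.writes.append(value)
        self.index = value


def read_temp_raw(regs, parent_active):
    """Return (0, raw) on success or (-ENODATA, None), mirroring the driver."""
    if not parent_active:
        # The driver never wakes a suspended parent for a hwmon read.
        return -ENODATA, None

    saved = regs.readl(INDEX_OFFSET)            # save previous index
    regs.writel(TEMP_SELECTOR, INDEX_OFFSET)    # select temperature
    data = regs.readl(DATA_OFFSET) & 0xFF       # low 8 bits are the raw value
    regs.writel(saved, INDEX_OFFSET)            # restore previous index
    regs.readl(INDEX_OFFSET)                    # read back, as the driver does

    if data == 0:
        # A zero raw value is treated as invalid.
        return -ENODATA, None
    return 0, data
```

The point of the model is the ordering: the previous index value is back in
place before the function returns, including on the invalid-data path.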
Document the supported device, register access, runtime PM behavior, and
sysfs lookup method. The documentation also records the observation
method used to identify the register pair and derive the conversion
formula.
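
As a quick sanity check of the conversion, the integer arithmetic used by the
driver can be mirrored in userspace. This is a sketch in Python, not driver
code: div_round_closest() and raw_to_millicelsius() are illustrative names,
and the rounding helper only models the non-negative case, which is all an
8-bit raw value needs.

```python
def div_round_closest(x, divisor):
    """Round-to-nearest integer division for x >= 0 and divisor > 0."""
    return (x + divisor // 2) // divisor


def raw_to_millicelsius(raw):
    """Map an 8-bit raw reading to millidegrees Celsius.

    Integer form of temp[C] = raw * 0.9066 - 78.624, scaled by 1000.
    """
    if not 0 <= raw <= 0xFF:
        raise ValueError("raw value must fit in 8 bits")
    return div_round_closest(raw * 9066, 10) - 78624


if __name__ == "__main__":
    for raw in (0x57, 0x80, 0xFF):
        print(f"raw={raw:#04x} -> {raw_to_millicelsius(raw)} millidegrees C")
```

Each raw step is worth roughly 0.9 degrees C under this mapping, and raw
values below 87 come out negative.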
Assisted-by: Codex:gpt-5.5
Signed-off-by: Jihong Min <hurryman2212@gmail.com>
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Tested-by: Yaroslav Isakov <yaroslav.isakov@gmail.com>
---
Documentation/hwmon/index.rst | 1 +
Documentation/hwmon/prom21-xhci.rst | 101 ++++++++++++
drivers/hwmon/Kconfig | 10 ++
drivers/hwmon/Makefile | 1 +
drivers/hwmon/prom21-xhci.c | 239 ++++++++++++++++++++++++++++
5 files changed, 352 insertions(+)
create mode 100644 Documentation/hwmon/prom21-xhci.rst
create mode 100644 drivers/hwmon/prom21-xhci.c
diff --git a/Documentation/hwmon/index.rst b/Documentation/hwmon/index.rst
index 8b655e5d6b68..324208f1faa2 100644
--- a/Documentation/hwmon/index.rst
+++ b/Documentation/hwmon/index.rst
@@ -216,6 +216,7 @@ Hardware Monitoring Kernel Drivers
pmbus
powerz
powr1220
+ prom21-xhci
pt5161l
pxe1610
pwm-fan
diff --git a/Documentation/hwmon/prom21-xhci.rst b/Documentation/hwmon/prom21-xhci.rst
new file mode 100644
index 000000000000..7984fb187bd8
--- /dev/null
+++ b/Documentation/hwmon/prom21-xhci.rst
@@ -0,0 +1,101 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Kernel driver prom21-xhci
+=========================
+
+Supported chips:
+
+ * AMD Promontory 21 (PROM21) xHCI USB host controller
+
+ Prefix: 'prom21_xhci'
+
+ PCI IDs: 1022:43fc, 1022:43fd
+
+Author:
+
+ - Jihong Min <hurryman2212@gmail.com>
+
+Description
+-----------
+
+This driver exposes the temperature sensor in AMD PROM21 xHCI controllers.
+
+The driver binds to an auxiliary device created by the xHCI PCI driver for
+supported controllers. The sensor value is accessed through a vendor-specific
+index/data register pair in the controller's PCI MMIO BAR.
+The auxiliary device is created by the ``xhci-pci-prom21`` PCI glue driver.
+USB host operation is otherwise delegated to the common ``xhci-pci`` code.
+
+PROM21 is an AMD chipset IP used in single-chip or daisy-chained configurations
+to build AMD 6xx/8xx series chipsets. Since the xHCI controllers are
+integrated in PROM21, this sensor can also serve as an approximate monitor of
+the AMD chipset temperature.
+
+Register access
+---------------
+
+The temperature value is read through a vendor-specific index/data register
+pair in the xHCI PCI MMIO BAR. The driver uses the following byte offsets from
+the MMIO BAR base:
+
+======================= =====================================================
+0x3000 Vendor index register
+0x3008 Vendor data register
+======================= =====================================================
+
+The driver saves the current vendor index register value, writes the
+temperature selector ``0x0001e520`` to the vendor index register, reads the
+vendor data register, and restores the previous vendor index value before
+returning. The raw temperature value is the low 8 bits of the vendor data
+register value.
+
+The hwmon core serializes this driver's callbacks, and the driver restores the
+previous index value after each read. This does not provide synchronization
+with firmware, SMM, ACPI AML, or any other user outside this driver.
+
+No public AMD reference is available for the register pair or the raw value.
+The register pair was identified on an X870E system with two PROM21 xHCI
+controllers. One controller was passed through to a Windows VM, and the same
+controller's PCI MMIO BAR was observed from the Linux host while HWiNFO64 was
+reporting the PROM21 xHCI temperature. In the test environment, the reported
+temperature was very stable at idle and the displayed sensor resolution was
+low, which made it possible to look for a consistently repeating MMIO response
+for the same reported temperature. During observation, offset 0x3000 repeatedly
+contained selector ``0x0001e520``. Writing the same selector to offset 0x3000
+from Linux and then reading offset 0x3008 reproduced the same raw value, so the
+offsets are treated as a vendor index/data register pair.
+
+The conversion formula was empirically inferred by matching observed raw
+8-bit values against HWiNFO64's reported PROM21 xHCI temperature for the same
+controller. The observed mapping is:
+
+ temp[C] = raw * 0.9066 - 78.624
+
+Runtime PM
+----------
+
+The driver does not wake the xHCI PCI device for hwmon reads. It reads the
+temperature only when the parent device is already active. A read from a
+suspended device returns ``-ENODATA``. After a successful read, the driver
+drops its active-only runtime PM reference and lets the PM core re-evaluate the
+idle state.
+
+Sysfs entries
+-------------
+
+======================= =====================================================
+temp1_input Temperature in millidegrees Celsius
+======================= =====================================================
+
+The hwmon device name is ``prom21_xhci``. The sysfs path depends on the hwmon
+device number assigned by the kernel. Userspace can locate the device by
+matching the ``name`` attribute:
+
+.. code-block:: sh
+
+ for hwmon in /sys/class/hwmon/hwmon*; do
+ [ "$(cat "$hwmon/name")" = "prom21_xhci" ] || continue
+ cat "$hwmon/temp1_input"
+ done
+
+If the raw register value is invalid, ``temp1_input`` returns ``-ENODATA``.
diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
index 14e4cea48acc..fe0f14e247b5 100644
--- a/drivers/hwmon/Kconfig
+++ b/drivers/hwmon/Kconfig
@@ -951,6 +951,16 @@ config SENSORS_POWR1220
This driver can also be built as a module. If so, the module
will be called powr1220.
+config SENSORS_PROM21_XHCI
+ tristate "AMD Promontory 21 xHCI temperature sensor"
+ depends on USB_XHCI_PCI_PROM21
+ help
+ If you say yes here you get support for the AMD Promontory 21
+ (PROM21) xHCI temperature sensor.
+
+ This driver can also be built as a module. If so, the module
+ will be called prom21-xhci.
+
config SENSORS_LAN966X
tristate "Microchip LAN966x Hardware Monitoring"
depends on SOC_LAN966 || COMPILE_TEST
diff --git a/drivers/hwmon/Makefile b/drivers/hwmon/Makefile
index 982ee2c6f9de..f833aed890d8 100644
--- a/drivers/hwmon/Makefile
+++ b/drivers/hwmon/Makefile
@@ -196,6 +196,7 @@ obj-$(CONFIG_SENSORS_PC87427) += pc87427.o
obj-$(CONFIG_SENSORS_PCF8591) += pcf8591.o
obj-$(CONFIG_SENSORS_POWERZ) += powerz.o
obj-$(CONFIG_SENSORS_POWR1220) += powr1220.o
+obj-$(CONFIG_SENSORS_PROM21_XHCI) += prom21-xhci.o
obj-$(CONFIG_SENSORS_PT5161L) += pt5161l.o
obj-$(CONFIG_SENSORS_PWM_FAN) += pwm-fan.o
obj-$(CONFIG_SENSORS_QNAP_MCU_HWMON) += qnap-mcu-hwmon.o
diff --git a/drivers/hwmon/prom21-xhci.c b/drivers/hwmon/prom21-xhci.c
new file mode 100644
index 000000000000..d40d0c53ce45
--- /dev/null
+++ b/drivers/hwmon/prom21-xhci.c
@@ -0,0 +1,239 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * AMD Promontory 21 xHCI Hwmon Implementation
+ * (only temperature monitoring is supported)
+ *
+ * This can be effectively used as the alternative chipset temperature monitor.
+ *
+ * Copyright (C) 2026 Jihong Min <hurryman2212@gmail.com>
+ */
+
+#include <linux/auxiliary_bus.h>
+#include <linux/device.h>
+#include <linux/err.h>
+#include <linux/errno.h>
+#include <linux/hwmon.h>
+#include <linux/io.h>
+#include <linux/math.h>
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/platform_data/usb-xhci-prom21.h>
+#include <linux/pm_runtime.h>
+
+#define PROM21_XHCI_INDEX_OFFSET 0x3000
+#define PROM21_XHCI_DATA_OFFSET 0x3008
+#define PROM21_XHCI_TEMP_SELECTOR 0x0001e520
+
+struct prom21_xhci {
+ struct pci_dev *pdev;
+ struct device *hwmon_dev;
+ void __iomem *regs;
+};
+
+static int prom21_xhci_pm_get(struct prom21_xhci *hwmon)
+{
+ struct device *dev = &hwmon->pdev->dev;
+ int ret;
+
+ /*
+ * PROM21 temperature register access does not return a valid value while
+ * the parent xHCI PCI function is suspended. Do not wake the device from
+ * a hwmon read. On success, hold a usage reference without changing the
+ * runtime PM state; if runtime PM is disabled, allow the read unless the
+ * device is still marked suspended.
+ */
+ ret = pm_runtime_get_if_active(dev);
+ if (ret > 0)
+ return 0;
+
+ if (ret == -EINVAL) {
+ if (pm_runtime_status_suspended(dev))
+ return -ENODATA;
+
+ pm_runtime_get_noresume(dev);
+ return 0;
+ }
+
+ if (!ret)
+ return -ENODATA;
+
+ return ret;
+}
+
+/*
+ * This is not a pure MMIO read. The PROM21 vendor data register is selected
+ * by temporarily writing PROM21_XHCI_TEMP_SELECTOR to the vendor index
+ * register.
+ * The hwmon core already serializes this driver's callbacks, so this driver
+ * does not need an additional private lock. That does not synchronize with
+ * firmware, SMM, ACPI, or other possible users. Keep the sequence short and
+ * restore the previous index before returning.
+ */
+static int prom21_xhci_read_temp_raw_restore_index(struct prom21_xhci *hwmon,
+ u8 *raw)
+{
+ struct device *dev = &hwmon->pdev->dev;
+ u32 index;
+ u8 data;
+ int ret;
+
+ ret = prom21_xhci_pm_get(hwmon);
+ if (ret)
+ return ret;
+
+ index = readl(hwmon->regs + PROM21_XHCI_INDEX_OFFSET);
+ /* Select the PROM21 temperature register through the vendor index. */
+ writel(PROM21_XHCI_TEMP_SELECTOR,
+ hwmon->regs + PROM21_XHCI_INDEX_OFFSET);
+ /* Use a 32-bit read for PCI MMIO register access. */
+ data = readl(hwmon->regs + PROM21_XHCI_DATA_OFFSET) & 0xff;
+ /* Restore the previous vendor index register value. */
+ writel(index, hwmon->regs + PROM21_XHCI_INDEX_OFFSET);
+ readl(hwmon->regs + PROM21_XHCI_INDEX_OFFSET);
+
+ /*
+ * Drop the usage reference taken by prom21_xhci_pm_get(). This is
+ * enough because the read path never resumes the device; use the normal
+ * put path so the PM core can re-evaluate idle state after the read.
+ * Otherwise, a racing xHCI autosuspend attempt can see a nonzero
+ * runtime PM usage count and skip autosuspend, and a later
+ * pm_runtime_put_noidle(), which does not check for an idle device,
+ * would leave the device active.
+ */
+ pm_runtime_put(dev);
+
+ if (!data)
+ return -ENODATA;
+
+ *raw = data;
+ return 0;
+}
+
+static long prom21_xhci_raw_to_millicelsius(u8 raw)
+{
+ /*
+ * No public AMD reference is available for this value.
+ * The scale was derived from observed PROM21 xHCI temperature readings:
+ * temp[C] = raw * 0.9066 - 78.624
+ */
+ return DIV_ROUND_CLOSEST(raw * 9066, 10) - 78624;
+}
+
+static umode_t prom21_xhci_is_visible(const void *drvdata,
+ enum hwmon_sensor_types type, u32 attr,
+ int channel)
+{
+ if (type != hwmon_temp)
+ return 0;
+
+ switch (attr) {
+ case hwmon_temp_input:
+ return 0444;
+ default:
+ return 0;
+ }
+}
+
+static int prom21_xhci_read(struct device *dev, enum hwmon_sensor_types type,
+ u32 attr, int channel, long *val)
+{
+ struct prom21_xhci *hwmon = dev_get_drvdata(dev);
+ u8 raw;
+ int ret;
+
+ if (type != hwmon_temp || attr != hwmon_temp_input)
+ return -EOPNOTSUPP;
+
+ ret = prom21_xhci_read_temp_raw_restore_index(hwmon, &raw);
+ if (ret)
+ return ret;
+
+ *val = prom21_xhci_raw_to_millicelsius(raw);
+ return 0;
+}
+
+static const struct hwmon_ops prom21_xhci_ops = {
+ .is_visible = prom21_xhci_is_visible,
+ .read = prom21_xhci_read,
+};
+
+static const struct hwmon_channel_info *const prom21_xhci_info[] = {
+ HWMON_CHANNEL_INFO(temp, HWMON_T_INPUT),
+ NULL,
+};
+
+static const struct hwmon_chip_info prom21_xhci_chip_info = {
+ .ops = &prom21_xhci_ops,
+ .info = prom21_xhci_info,
+};
+
+static int prom21_xhci_probe(struct auxiliary_device *auxdev,
+ const struct auxiliary_device_id *id)
+{
+ struct device *dev = &auxdev->dev;
+ const struct prom21_xhci_pdata *pdata = dev_get_platdata(dev);
+ struct prom21_xhci *hwmon;
+
+ if (!pdata)
+ return dev_err_probe(dev, -ENODEV,
+ "platform data unavailable\n");
+
+ if (!pdata->regs ||
+ pdata->rsrc_len < PROM21_XHCI_DATA_OFFSET + sizeof(u32))
+ return dev_err_probe(dev, -ENODEV, "invalid MMIO resource\n");
+
+ hwmon = devm_kzalloc(dev, sizeof(*hwmon), GFP_KERNEL);
+ if (!hwmon)
+ return -ENOMEM;
+
+ hwmon->pdev = pdata->pdev;
+ hwmon->regs = pdata->regs;
+ auxiliary_set_drvdata(auxdev, hwmon);
+
+ /*
+ * Parent the hwmon device to the PCI function because the temperature
+ * value is read from that function's MMIO BAR, and systems may contain
+ * multiple PROM21 xHCI functions. This lets userspace identify the PCI
+ * endpoint for each reading. The auxiliary driver still owns the hwmon
+ * lifetime and unregisters it before HCD teardown.
+ */
+ hwmon->hwmon_dev =
+ hwmon_device_register_with_info(&pdata->pdev->dev, "prom21_xhci",
+ hwmon, &prom21_xhci_chip_info,
+ NULL);
+ if (IS_ERR(hwmon->hwmon_dev))
+ return PTR_ERR(hwmon->hwmon_dev);
+
+ return 0;
+}
+
+static void prom21_xhci_remove(struct auxiliary_device *auxdev)
+{
+ struct prom21_xhci *hwmon = auxiliary_get_drvdata(auxdev);
+
+ /*
+ * The PROM21 PCI glue destroys the auxiliary device before HCD teardown.
+ * Unregister the hwmon device here so sysfs removes the attributes,
+ * stops new reads, and drains active hwmon callbacks before the xHCI
+ * MMIO mapping is released.
+ */
+ hwmon_device_unregister(hwmon->hwmon_dev);
+}
+
+static const struct auxiliary_device_id prom21_xhci_id_table[] = {
+ { .name = "xhci_pci_prom21.hwmon" },
+ {}
+};
+MODULE_DEVICE_TABLE(auxiliary, prom21_xhci_id_table);
+
+static struct auxiliary_driver prom21_xhci_driver = {
+ .name = "prom21-xhci",
+ .probe = prom21_xhci_probe,
+ .remove = prom21_xhci_remove,
+ .id_table = prom21_xhci_id_table,
+};
+module_auxiliary_driver(prom21_xhci_driver);
+
+MODULE_AUTHOR("Jihong Min <hurryman2212@gmail.com>");
+MODULE_DESCRIPTION("AMD Promontory 21 xHCI temperature sensor driver");
+MODULE_LICENSE("GPL");
--
2.53.0
^ permalink raw reply related
* [PATCH v6 1/2] usb: xhci-pci: add AMD Promontory 21 PCI glue
From: Jihong Min @ 2026-05-17 13:04 UTC (permalink / raw)
To: Greg Kroah-Hartman, Mathias Nyman
Cc: Guenter Roeck, Jonathan Corbet, Shuah Khan, Mario Limonciello,
Basavaraj Natikar, linux-usb, linux-hwmon, linux-doc, linux-pci,
linux-kernel, Jihong Min, Mario Limonciello (AMD),
Yaroslav Isakov
In-Reply-To: <20260517130407.795157-1-hurryman2212@gmail.com>
AMD Promontory 21 (PROM21) xHCI controllers use generic xHCI
operation, but the PCI function also exposes optional
controller-specific sensor functionality. Add a small PROM21 PCI glue
driver for AMD 1022:43fc and 1022:43fd controllers.
The driver delegates USB host operation to the common xhci-pci core,
collects the parent-provided MMIO resource data, and creates a "hwmon"
auxiliary device for optional child drivers. Failure to create the
auxiliary device is logged but does not fail the xHCI probe, since the
auxiliary device is only needed for sensor support.
Make the PROM21 PCI glue a hidden Kconfig tristate that follows
USB_XHCI_PCI. This keeps the glue built in with a built-in xhci-pci core
and builds it as a module with a modular xhci-pci core. A built-in
xhci-pci core must not hand PROM21 controllers to a PROM21 glue driver
that is only available as a module, otherwise USB behind those controllers
can be unavailable during initramfs and PROM21 temperature sensor support
may not appear until the controller is rebound after the module loads.
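
To illustrate how the hidden tristate tracks USB_XHCI_PCI, here is a
simplified model of Kconfig tristate evaluation in Python. It is a sketch
under stated assumptions: tri_and() and prom21_glue_value() are hypothetical
helpers, AND is modelled as the minimum of n < m < y, and the model ignores
CONFIG_MODULES=n and the "select AUXILIARY_BUS" side effect.

```python
TRI = {"n": 0, "m": 1, "y": 2}
NAMES = {value: name for name, value in TRI.items()}


def tri_and(*values):
    """Kconfig-style AND over tristate values: the minimum of n < m < y."""
    return NAMES[min(TRI[v] for v in values)]


def prom21_glue_value(x86, usb_xhci_pci):
    """Model the promptless USB_XHCI_PCI_PROM21 symbol.

    depends on X86
    depends on USB_XHCI_PCI
    default USB_XHCI_PCI
    """
    depends = tri_and(x86, usb_xhci_pci)
    default = usb_xhci_pci
    # A promptless symbol takes its default, capped by its dependencies.
    return tri_and(default, depends)


if __name__ == "__main__":
    for core in ("n", "m", "y"):
        print(f"USB_XHCI_PCI={core} -> PROM21 glue={prom21_glue_value('y', core)}")
```

Under this model the glue can never be modular while the xhci-pci core is
built in, which is the case the paragraph above is meant to rule out.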
Assisted-by: Codex:gpt-5.5
Signed-off-by: Jihong Min <hurryman2212@gmail.com>
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Tested-by: Yaroslav Isakov <yaroslav.isakov@gmail.com>
---
drivers/usb/host/Kconfig | 7 +
drivers/usb/host/Makefile | 1 +
drivers/usb/host/xhci-pci-prom21.c | 136 ++++++++++++++++++
drivers/usb/host/xhci-pci.c | 11 ++
drivers/usb/host/xhci-pci.h | 3 +
include/linux/platform_data/usb-xhci-prom21.h | 22 +++
6 files changed, 180 insertions(+)
create mode 100644 drivers/usb/host/xhci-pci-prom21.c
create mode 100644 include/linux/platform_data/usb-xhci-prom21.h
diff --git a/drivers/usb/host/Kconfig b/drivers/usb/host/Kconfig
index 0a277a07cf70..89bf262235e1 100644
--- a/drivers/usb/host/Kconfig
+++ b/drivers/usb/host/Kconfig
@@ -42,6 +42,13 @@ config USB_XHCI_PCI
depends on USB_PCI
default y
+config USB_XHCI_PCI_PROM21
+ tristate
+ depends on X86
+ depends on USB_XHCI_PCI
+ default USB_XHCI_PCI
+ select AUXILIARY_BUS
+
config USB_XHCI_PCI_RENESAS
tristate "Support for additional Renesas xHCI controller with firmware"
depends on USB_XHCI_PCI
diff --git a/drivers/usb/host/Makefile b/drivers/usb/host/Makefile
index a07e7ba9cd53..174580c1281a 100644
--- a/drivers/usb/host/Makefile
+++ b/drivers/usb/host/Makefile
@@ -71,6 +71,7 @@ obj-$(CONFIG_USB_UHCI_HCD) += uhci-hcd.o
obj-$(CONFIG_USB_FHCI_HCD) += fhci.o
obj-$(CONFIG_USB_XHCI_HCD) += xhci-hcd.o
obj-$(CONFIG_USB_XHCI_PCI) += xhci-pci.o
+obj-$(CONFIG_USB_XHCI_PCI_PROM21) += xhci-pci-prom21.o
obj-$(CONFIG_USB_XHCI_PCI_RENESAS) += xhci-pci-renesas.o
obj-$(CONFIG_USB_XHCI_PLATFORM) += xhci-plat-hcd.o
obj-$(CONFIG_USB_XHCI_HISTB) += xhci-histb.o
diff --git a/drivers/usb/host/xhci-pci-prom21.c b/drivers/usb/host/xhci-pci-prom21.c
new file mode 100644
index 000000000000..be0933ca5c62
--- /dev/null
+++ b/drivers/usb/host/xhci-pci-prom21.c
@@ -0,0 +1,136 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * AMD Promontory 21 xHCI host controller PCI Bus Glue.
+ *
+ * This does not add any PROM21-specific USB or xHCI operation. It exists only
+ * to publish an auxiliary device for integrated temperature sensor support.
+ *
+ * Copyright (C) 2026 Jihong Min <hurryman2212@gmail.com>
+ */
+
+#include <linux/auxiliary_bus.h>
+#include <linux/device/devres.h>
+#include <linux/errno.h>
+#include <linux/idr.h>
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/platform_data/usb-xhci-prom21.h>
+#include <linux/usb.h>
+#include <linux/usb/hcd.h>
+
+#include "xhci-pci.h"
+
+struct prom21_xhci_auxdev {
+ struct auxiliary_device *auxdev;
+ struct prom21_xhci_pdata pdata;
+ int id;
+};
+
+static DEFINE_IDA(prom21_xhci_auxdev_ida);
+
+static void prom21_xhci_auxdev_release(struct device *dev, void *res)
+{
+ struct prom21_xhci_auxdev *prom21_auxdev = res;
+
+ auxiliary_device_destroy(prom21_auxdev->auxdev);
+ ida_free(&prom21_xhci_auxdev_ida, prom21_auxdev->id);
+}
+
+static int prom21_xhci_create_auxdev(struct pci_dev *pdev)
+{
+ struct prom21_xhci_auxdev *prom21_auxdev;
+ struct usb_hcd *hcd = pci_get_drvdata(pdev);
+
+ if (!hcd)
+ return -ENODEV;
+
+ prom21_auxdev = devres_alloc(prom21_xhci_auxdev_release,
+ sizeof(*prom21_auxdev), GFP_KERNEL);
+ if (!prom21_auxdev)
+ return -ENOMEM;
+
+ prom21_auxdev->pdata.pdev = pdev;
+ prom21_auxdev->pdata.regs = hcd->regs;
+ prom21_auxdev->pdata.rsrc_len = hcd->rsrc_len;
+
+ prom21_auxdev->id = ida_alloc(&prom21_xhci_auxdev_ida, GFP_KERNEL);
+ if (prom21_auxdev->id < 0) {
+ int ret = prom21_auxdev->id;
+
+ devres_free(prom21_auxdev);
+ return ret;
+ }
+
+ prom21_auxdev->auxdev = auxiliary_device_create(&pdev->dev,
+ KBUILD_MODNAME, "hwmon",
+ &prom21_auxdev->pdata,
+ prom21_auxdev->id);
+ if (!prom21_auxdev->auxdev) {
+ ida_free(&prom21_xhci_auxdev_ida, prom21_auxdev->id);
+ devres_free(prom21_auxdev);
+ return -ENOMEM;
+ }
+
+ devres_add(&pdev->dev, prom21_auxdev);
+ return 0;
+}
+
+static void prom21_xhci_destroy_auxdev(struct pci_dev *pdev)
+{
+ devres_release(&pdev->dev, prom21_xhci_auxdev_release, NULL, NULL);
+}
+
+static int prom21_xhci_probe(struct pci_dev *dev,
+ const struct pci_device_id *id)
+{
+ int retval;
+
+ retval = xhci_pci_common_probe(dev, id);
+ if (retval)
+ return retval;
+
+ retval = prom21_xhci_create_auxdev(dev);
+ if (retval) {
+ /*
+ * The auxiliary device only provides optional temperature sensor
+ * support. Keep the xHCI controller usable if it fails.
+ */
+ dev_err(&dev->dev,
+ "failed to create PROM21 hwmon auxiliary device: %d\n",
+ retval);
+ }
+
+ return 0;
+}
+
+static void prom21_xhci_remove(struct pci_dev *dev)
+{
+ prom21_xhci_destroy_auxdev(dev);
+ xhci_pci_remove(dev);
+}
+
+static const struct pci_device_id pci_ids[] = {
+ { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_PROM21_XHCI_43FC) },
+ { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_PROM21_XHCI_43FD) },
+ { /* end: all zeroes */ }
+};
+MODULE_DEVICE_TABLE(pci, pci_ids);
+
+static struct pci_driver prom21_xhci_driver = {
+ .name = "xhci-pci-prom21",
+ .id_table = pci_ids,
+
+ .probe = prom21_xhci_probe,
+ .remove = prom21_xhci_remove,
+
+ .shutdown = usb_hcd_pci_shutdown,
+ .driver = {
+ .pm = pm_ptr(&usb_hcd_pci_pm_ops),
+ },
+};
+module_pci_driver(prom21_xhci_driver);
+
+MODULE_AUTHOR("Jihong Min <hurryman2212@gmail.com>");
+MODULE_DESCRIPTION("AMD Promontory 21 xHCI PCI Host Controller Driver");
+MODULE_IMPORT_NS("xhci");
+MODULE_LICENSE("GPL");
diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c
index 585b2f3117b0..039c26b241d0 100644
--- a/drivers/usb/host/xhci-pci.c
+++ b/drivers/usb/host/xhci-pci.c
@@ -696,12 +696,23 @@ static const struct pci_device_id pci_ids_renesas[] = {
{ /* end: all zeroes */ }
};
+/* handled by xhci-pci-prom21 if enabled */
+static const struct pci_device_id pci_ids_prom21[] = {
+ { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_PROM21_XHCI_43FC) },
+ { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_PROM21_XHCI_43FD) },
+ { /* end: all zeroes */ }
+};
+
static int xhci_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
{
if (IS_ENABLED(CONFIG_USB_XHCI_PCI_RENESAS) &&
pci_match_id(pci_ids_renesas, dev))
return -ENODEV;
+ if (IS_ENABLED(CONFIG_USB_XHCI_PCI_PROM21) &&
+ pci_match_id(pci_ids_prom21, dev))
+ return -ENODEV;
+
return xhci_pci_common_probe(dev, id);
}
diff --git a/drivers/usb/host/xhci-pci.h b/drivers/usb/host/xhci-pci.h
index e87c7d9d76b8..11f435f94322 100644
--- a/drivers/usb/host/xhci-pci.h
+++ b/drivers/usb/host/xhci-pci.h
@@ -4,6 +4,9 @@
#ifndef XHCI_PCI_H
#define XHCI_PCI_H
+#define PCI_DEVICE_ID_AMD_PROM21_XHCI_43FC 0x43fc
+#define PCI_DEVICE_ID_AMD_PROM21_XHCI_43FD 0x43fd
+
int xhci_pci_common_probe(struct pci_dev *dev, const struct pci_device_id *id);
void xhci_pci_remove(struct pci_dev *dev);
diff --git a/include/linux/platform_data/usb-xhci-prom21.h b/include/linux/platform_data/usb-xhci-prom21.h
new file mode 100644
index 000000000000..ee672ad452a8
--- /dev/null
+++ b/include/linux/platform_data/usb-xhci-prom21.h
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * AMD Promontory 21 xHCI auxiliary device platform data.
+ *
+ * Copyright (C) 2026 Jihong Min <hurryman2212@gmail.com>
+ */
+
+#ifndef _LINUX_PLATFORM_DATA_USB_XHCI_PROM21_H
+#define _LINUX_PLATFORM_DATA_USB_XHCI_PROM21_H
+
+#include <linux/compiler_types.h>
+#include <linux/types.h>
+
+struct pci_dev;
+
+struct prom21_xhci_pdata {
+ struct pci_dev *pdev;
+ void __iomem *regs;
+ resource_size_t rsrc_len;
+};
+
+#endif
--
2.53.0
^ permalink raw reply related
* [PATCH v6 0/2] AMD Promontory 21 xHCI temperature sensor support
From: Jihong Min @ 2026-05-17 13:04 UTC (permalink / raw)
To: Greg Kroah-Hartman, Mathias Nyman
Cc: Guenter Roeck, Jonathan Corbet, Shuah Khan, Mario Limonciello,
Basavaraj Natikar, linux-usb, linux-hwmon, linux-doc, linux-pci,
linux-kernel, Jihong Min
Hi,
This series adds temperature monitoring for AMD Promontory 21 (PROM21)
xHCI PCI functions.
Patch 1 adds a small PROM21-specific xHCI PCI glue driver. USB host
operation is delegated to the common xhci-pci code, while the PROM21 glue
publishes an auxiliary device for optional sensor support.
Patch 2 adds an auxiliary-bus hwmon driver that binds to that auxiliary
device and exposes the PROM21 xHCI temperature value as temp1_input.
The hwmon driver reads the sensor through a vendor index/data register pair
in the xHCI PCI MMIO BAR. It does not wake the parent PCI device for hwmon
reads; if the parent is suspended, the read returns -ENODATA.
Changes in v6:
- Make USB_XHCI_PCI_PROM21 a hidden tristate that follows USB_XHCI_PCI,
so the PROM21 PCI glue is built in with a built-in xhci-pci core and
built as a module with a modular xhci-pci core.
- Use an IDA-allocated auxiliary device id instead of encoding the PCI
domain/BDF into the auxiliary id.
- Use a 32-bit read for the PROM21 vendor data register and mask the low
byte instead of using readb().
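
For readers unfamiliar with index/data register pairs, the access pattern described above can be sketched as a userspace simulation. Everything named fake_* or FAKE_* here is hypothetical: the real register offsets and selector values live in the hwmon patch and are not reproduced. The sketch only shows the shape of the sequence: refuse with -ENODATA when the parent is suspended, write the index, do a 32-bit read of the data register, and keep the low byte.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical offsets and selector; the real values are in the hwmon patch. */
#define FAKE_INDEX_REG   0x0
#define FAKE_DATA_REG    0x4
#define FAKE_TEMP_INDEX  0x1

/* Simulated device standing in for the xHCI MMIO BAR. */
struct fake_dev {
	bool suspended;
	uint32_t index;       /* last value written to the index register */
	uint32_t sensor_raw;  /* what the data register returns for FAKE_TEMP_INDEX */
};

static void fake_writel(struct fake_dev *d, uint32_t val, unsigned int off)
{
	if (off == FAKE_INDEX_REG)
		d->index = val;
}

static uint32_t fake_readl(const struct fake_dev *d, unsigned int off)
{
	if (off == FAKE_DATA_REG && d->index == FAKE_TEMP_INDEX)
		return d->sensor_raw;
	return 0;
}

/* Returns 0 and stores the raw byte, or -ENODATA if the parent is suspended. */
static int fake_read_temp_raw(struct fake_dev *d, uint8_t *raw)
{
	if (d->suspended)
		return -ENODATA;	/* do not wake the parent for a sensor read */

	fake_writel(d, FAKE_TEMP_INDEX, FAKE_INDEX_REG);
	*raw = fake_readl(d, FAKE_DATA_REG) & 0xff;	/* 32-bit read, keep low byte */
	return 0;
}
```

A real driver would presumably also need to serialize the index write and the data read against concurrent readers; that is left out of this sketch.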
Jihong Min (2):
usb: xhci-pci: add AMD Promontory 21 PCI glue
hwmon: add AMD Promontory 21 xHCI temperature sensor support
Documentation/hwmon/index.rst | 1 +
Documentation/hwmon/prom21-xhci.rst | 101 ++++++++
drivers/hwmon/Kconfig | 10 +
drivers/hwmon/Makefile | 1 +
drivers/hwmon/prom21-xhci.c | 239 ++++++++++++++++++
drivers/usb/host/Kconfig | 7 +
drivers/usb/host/Makefile | 1 +
drivers/usb/host/xhci-pci-prom21.c | 136 ++++++++++
drivers/usb/host/xhci-pci.c | 11 +
drivers/usb/host/xhci-pci.h | 3 +
include/linux/platform_data/usb-xhci-prom21.h | 22 ++
11 files changed, 532 insertions(+)
create mode 100644 Documentation/hwmon/prom21-xhci.rst
create mode 100644 drivers/hwmon/prom21-xhci.c
create mode 100644 drivers/usb/host/xhci-pci-prom21.c
create mode 100644 include/linux/platform_data/usb-xhci-prom21.h
--
2.53.0
^ permalink raw reply
* [PATCH] docs: hwmon: htu31: document debugfs serial_number
From: Chen-Shi-Hong @ 2026-05-17 12:52 UTC (permalink / raw)
To: Guenter Roeck
Cc: Jonathan Corbet, Shuah Khan, linux-hwmon, linux-doc, linux-kernel,
Chen-Shi-Hong
Document the debugfs serial_number file exposed by the htu31 driver.
The driver creates a debugfs entry for the sensor serial number, but
the documentation currently only describes the sysfs interface.
Signed-off-by: Chen-Shi-Hong <eric039eric@gmail.com>
---
Documentation/hwmon/htu31.rst | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/Documentation/hwmon/htu31.rst b/Documentation/hwmon/htu31.rst
index ccde84264643..9ab774dcf65d 100644
--- a/Documentation/hwmon/htu31.rst
+++ b/Documentation/hwmon/htu31.rst
@@ -35,3 +35,10 @@ temp1_input: temperature input
humidity1_input: humidity input
heater_enable: heater control
=================== =================
+
+debugfs-Interface
+-----------------
+
+=================== =========================================
+serial_number: unique serial number of the sensor
+=================== =========================================
--
2.53.0
^ permalink raw reply related
* Re: [RFC PATCH v3 2/3] Documentation: add kconfirm
From: Miguel Ojeda @ 2026-05-17 12:35 UTC (permalink / raw)
To: Nathan Chancellor
Cc: Julian Braha, nsc, jani.nikula, akpm, gary, ljs, arnd, gregkh,
masahiroy, ojeda, corbet, qingfang.deng, yann.prono, demiobenour,
ej, linux-kernel, rust-for-linux, linux-doc, linux-kbuild
In-Reply-To: <20260517094041.GC3773662@ax162>
On Sun, May 17, 2026 at 11:40 AM Nathan Chancellor <nathan@kernel.org> wrote:
>
> I guess this is kind of a weird/unique situation. I agree that the files
> generated by 'cargo run' should absolutely be contained within the build
> folder; at that point, $(srctree) could be read only and I would
> consider it rude not to respect the user's choice of build directory.
> For 'cargo vendor' however, I am not sure. They are source files and I
> would expect that running 'cargo vendor' would be more considered part
> of preparing the source tree, rather than the build one (so it should
> not be read only).
That would simplify things, yeah. We could always start there and see
if someone needs it.
> At the same time, it might be safer for dependency updates and internal
> consistency that they are confined to the build folder. I guess we would
> only want to remove them with a 'distclean', rather than 'mrproper' or
> 'clean', in that case, to avoid requiring users to constantly run
> 'cargo vendor'. It might be more ergonomic for this to be a Kbuild
> target ('kconfirmvendor'?) so that this could be handled automatically
> based on the user's build command.
Yeah, it is a bit painful to not have the usual Kbuild
variables/infrastructure around... On the other hand, it is a nice
property to know that nothing called via `make` will ever connect (or
need to connect) to the Internet.
Hmm... Perhaps a good middle ground would be having something in the
name that makes it obvious it will connect, e.g. `fetch` like Git? Or,
if people feel strongly about the property mentioned, then something
like an environment variable that needs to be set to allow it (with a
message printed about it if it is not set).
If this were allowed, i.e. if we are OK having things in `make` that
fetch stuff and put it in the build folder (only in certain targets,
of course), then we could actually think about doing more things that
we didn't so far, such as other setup-like targets, e.g. preparing
kernel.org toolchains, setting up a Rust toolchain via `rustup`
(including `bindgen` etc.), and so on and so forth.
> Additionally, can we detect explicitly when dependencies are not
> properly vendored and error with a more helpful error message? The build
> command in patch 1 just throws up its hands when the build fails and
> asks if the dependencies have been set up but if we provided our own
> vendoring build target, we could add some canary that says we vendored
> successfully and if that is not present, error before even running the
> build and say "hey, you need to explicitly run this target before you
> build".
+1, good error messages help a lot. Something like `rustavailable`
that prints which particular thing is missing is great (that one even
tries to warn about some problematic versions testing for bugs --
hopefully we don't need `autoconf`... :).
Cheers,
Miguel
^ permalink raw reply
* Re: [PATCH v11 6/6] docs: iio: adc: ad4691: add driver documentation
From: Jonathan Cameron @ 2026-05-17 12:32 UTC (permalink / raw)
To: David Lechner
Cc: radu.sabau, Lars-Peter Clausen, Michael Hennerich, Nuno Sá,
Andy Shevchenko, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
Uwe Kleine-König, Liam Girdwood, Mark Brown, Linus Walleij,
Bartosz Golaszewski, Philipp Zabel, Jonathan Corbet, Shuah Khan,
linux-iio, devicetree, linux-kernel, linux-pwm, linux-gpio,
linux-doc
In-Reply-To: <78d4bb1c-4c6d-4781-86ee-458579ac6990@baylibre.com>
On Sat, 16 May 2026 13:18:03 -0500
David Lechner <dlechner@baylibre.com> wrote:
> On 5/15/26 8:31 AM, Radu Sabau via B4 Relay wrote:
> > From: Radu Sabau <radu.sabau@analog.com>
> >
> > Add RST documentation for the AD4691 family ADC driver covering
> > supported devices, IIO channels, operating modes, oversampling,
> > reference voltage, LDO supply, reset, GP pins, SPI offload support,
> > and buffer data format.
> >
> > Signed-off-by: Radu Sabau <radu.sabau@analog.com>
FWIW I finished giving the series another look and mostly didn't have
anything to add to David's review! So subject to further discussion
on the feedback (and maybe a day or two to see if others want to enter
the fray), looking forward to v12. Fingers crossed that's good to go.
Thanks,
Jonathan
^ permalink raw reply
* Re: [PATCH v11 3/6] iio: adc: ad4691: add triggered buffer support
From: Jonathan Cameron @ 2026-05-17 12:25 UTC (permalink / raw)
To: David Lechner
Cc: radu.sabau, Lars-Peter Clausen, Michael Hennerich, Nuno Sá,
Andy Shevchenko, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
Uwe Kleine-König, Liam Girdwood, Mark Brown, Linus Walleij,
Bartosz Golaszewski, Philipp Zabel, Jonathan Corbet, Shuah Khan,
linux-iio, devicetree, linux-kernel, linux-pwm, linux-gpio,
linux-doc
In-Reply-To: <9b7986e1-6550-415d-b301-33089ba10177@baylibre.com>
On Sat, 16 May 2026 12:32:51 -0500
David Lechner <dlechner@baylibre.com> wrote:
> On 5/15/26 8:31 AM, Radu Sabau via B4 Relay wrote:
> > From: Radu Sabau <radu.sabau@analog.com>
> >
> > Add buffered capture support using the IIO triggered buffer framework.
> >
> > CNV Burst Mode: the GP pin identified by interrupt-names in the device
> > tree is configured as DATA_READY output. The IRQ handler stops
> > conversions and fires the IIO trigger; the trigger handler executes a
> > pre-built SPI message that reads all active channels from the AVG_IN
> > accumulator registers and then resets accumulator state and restarts
> > conversions for the next cycle.
> >
> > Manual Mode: CNV is tied to SPI CS so each transfer simultaneously
> > reads the previous result and starts the next conversion (pipelined
> > N+1 scheme). At preenable time a pre-built, optimised SPI message of
> > N+1 transfers is constructed (N channel reads plus one NOOP to drain
> > the pipeline). The trigger handler executes the message in a single
> > spi_sync() call and collects the results. An external trigger (e.g.
> > iio-trig-hrtimer) is required to drive the trigger at the desired
> > sample rate.
> >
> > Both modes share the same trigger handler and push a complete scan —
> > one big-endian 16-bit (__be16) slot per active channel, densely packed
> > in scan_index order, followed by a timestamp.
> >
> > The CNV Burst Mode sampling frequency (PWM period) is exposed as a
> > buffer-level attribute via IIO_DEVICE_ATTR.
> >
> > Signed-off-by: Radu Sabau <radu.sabau@analog.com>
> > +
> > +static int ad4691_manual_buffer_preenable(struct iio_dev *indio_dev)
> > +{
> > + struct ad4691_state *st = iio_priv(indio_dev);
> > + unsigned int k, i;
> > + int ret;
> > +
> > + memset(st->scan_xfers, 0, sizeof(st->scan_xfers));
> > + memset(st->scan_tx, 0, sizeof(st->scan_tx));
> > +
> > + spi_message_init(&st->scan_msg);
> > +
> > + k = 0;
> > + iio_for_each_active_channel(indio_dev, i) {
> > + if (i >= indio_dev->num_channels - 1)
> > + break; /* skip soft timestamp */
>
> I don't think timestamp gets set in the scan mask. It is handled separately.
FWIW that is a sashiko false positive (I believe anyway!)
If we do hit this please shout as we have a core bug.
If anyone has time to look at how hard it would be to tweak
iio_for_each_active_channel to skip a last element timestamp that
would be great.
I think that iterates one too far which is what sashiko is tripping over.
I'm only keen to fix that if we can make it low cost and hide it entirely
from drivers.
Jonathan
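
To make the bound in question concrete, here is a userspace sketch of walking the set bits of a scan mask with an explicit upper limit. The helper name and the single-word mask are assumptions for illustration only; this is not the IIO core macro, and whether the timestamp bit can ever be set in the mask is exactly the open question above.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical helper: collect the indices of set bits in mask, visiting
 * only bits [0, nbits). Returns the number of indices written to out.
 */
static size_t collect_active(unsigned long mask, unsigned int nbits,
			     unsigned int *out, size_t out_len)
{
	size_t n = 0;

	for (unsigned int i = 0;
	     i < nbits && i < sizeof(mask) * CHAR_BIT && n < out_len; i++) {
		if (mask & (1UL << i))
			out[n++] = i;
	}
	return n;
}
```

With three data channels plus a timestamp at index 3, passing nbits = 3 keeps the walk inside the data channels, while nbits = 4 would also visit the timestamp slot if its bit were set.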
>
> > + /*
> > + * Channel-select command occupies the first (high) byte of the
> > + * 16-bit DIN frame; the second byte is a don't-care zero pad.
> > + * put_unaligned_be16() writes [cmd, 0x00] in memory so the
> > + * SPI controller sends the command byte first on the wire.
> > + */
> > + put_unaligned_be16((u16)(AD4691_ADC_CHAN(i) << 8), &st->scan_tx[k]);
> > + st->scan_xfers[k].tx_buf = &st->scan_tx[k];
> > + /*
> > + * The pipeline means xfer[0] receives the residual from the
> > + * previous sequence, not a valid sample. Discard it (rx_buf=NULL)
> > + * to avoid aliasing vals[0] across two concurrent DMA mappings.
> > + * xfer[1] (or the NOOP when only one channel is active) writes
> > + * the real ch[0] result to vals[0]. Subsequent transfers write
> > + * into vals[k-1] so each result lands at the next dense slot.
> > + */
> > + st->scan_xfers[k].rx_buf = (k == 0) ? NULL : &st->vals[k - 1];
> > + st->scan_xfers[k].len = sizeof(st->scan_tx[k]);
> > + st->scan_xfers[k].cs_change = 1;
> > + st->scan_xfers[k].cs_change_delay.value = AD4691_CNV_HIGH_TIME_NS;
> > + st->scan_xfers[k].cs_change_delay.unit = SPI_DELAY_UNIT_NSECS;
> > + spi_message_add_tail(&st->scan_xfers[k], &st->scan_msg);
> > + k++;
> > + }
> > +
> > + /* Final NOOP transfer retrieves the last channel's result. */
> > + st->scan_xfers[k].tx_buf = &st->scan_tx[k]; /* scan_tx[k] == 0 == NOOP */
> > + st->scan_xfers[k].rx_buf = &st->vals[k - 1];
> > + st->scan_xfers[k].len = sizeof(st->scan_tx[k]);
> > + spi_message_add_tail(&st->scan_xfers[k], &st->scan_msg);
> > +
> > + ret = spi_optimize_message(st->spi, &st->scan_msg);
> > + if (ret)
> > + return ret;
> > +
> > + ret = ad4691_enter_conversion_mode(st);
> > + if (ret) {
> > + spi_unoptimize_message(&st->scan_msg);
> > + return ret;
> > + }
> > +
> > + return 0;
> > +}
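
The N+1 pipelining that the quoted preenable code sets up can be modelled in userspace. This is a sketch under stated assumptions: the fake_* names, the sample encoding, and the NOOP value of 0 are hypothetical and only mirror the behaviour described in the commit message, where each transfer returns the result of the conversion started by the previous one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NOOP_CMD 0

/* Hypothetical device model: one conversion result is always in flight. */
struct fake_adc {
	uint16_t pending;	/* result the next transfer will clock out */
};

static uint16_t fake_convert(uint8_t chan_cmd)
{
	/* Stand-in for a sample: encode the command so results are traceable. */
	return (uint16_t)(0x1000u + chan_cmd);
}

static uint16_t fake_xfer(struct fake_adc *adc, uint8_t cmd)
{
	uint16_t rx = adc->pending;

	adc->pending = (cmd == NOOP_CMD) ? 0 : fake_convert(cmd);
	return rx;
}

/* Run n channel reads plus one NOOP; vals[k - 1] receives transfer k's data. */
static void fake_scan(struct fake_adc *adc, const uint8_t *cmds, size_t n,
		      uint16_t *vals)
{
	for (size_t k = 0; k <= n; k++) {
		uint8_t cmd = (k < n) ? cmds[k] : NOOP_CMD;
		uint16_t rx = fake_xfer(adc, cmd);

		if (k > 0)
			vals[k - 1] = rx;	/* transfer 0 carries stale data */
	}
}
```

The first transfer's receive data is dropped, matching the rx_buf = NULL case in the quoted code, and the trailing NOOP drains the last channel's result.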
^ permalink raw reply
* Re: [PATCH v13 02/12] iio: kstrtox: add local _parse_integer_limit_init() helper
From: Rodrigo Alencar @ 2026-05-17 12:19 UTC (permalink / raw)
To: rodrigo.alencar, linux-kernel, linux-iio, devicetree, linux-doc
Cc: Jonathan Cameron, David Lechner, Andy Shevchenko,
Lars-Peter Clausen, Michael Hennerich, Rob Herring,
Krzysztof Kozlowski, Conor Dooley, Jonathan Corbet, Andrew Morton,
Petr Mladek, Steven Rostedt, Andy Shevchenko, Rasmus Villemoes,
Sergey Senozhatsky, Shuah Khan
In-Reply-To: <20260517-adf41513-iio-driver-v13-2-bb6e134a360f@analog.com>
On 26/05/17 10:13AM, Rodrigo Alencar via B4 Relay wrote:
> From: Rodrigo Alencar <rodrigo.alencar@analog.com>
>
> Add parsing helper that accepts an initial value for the accumulated
> result when parsing an 64-bit integer. It reuses current implementation
> for _parse_integer_limit(), which now consumes the new function with
> init = 0. The diff algorithm would have the documentation header and
> prototype of _parse_integer_limit() moved around so it is adjusted
> according to guidelines.
Sashiko's feedback at:
https://sashiko.dev/#/patchset/20260517-adf41513-iio-driver-v13-0-bb6e134a360f%40analog.com?part=2
...
> -/*
> - * Convert non-negative integer string representation in explicitly given radix
> - * to an integer. A maximum of max_chars characters will be converted.
> - *
> - * Return number of characters consumed maybe or-ed with overflow bit.
> - * If overflow occurs, result integer (incorrect) is still returned.
> - *
> - * Don't you dare use this function.
> - */
> -noinline
> -unsigned int _parse_integer_limit(const char *s, unsigned int base, unsigned long long *p,
> - size_t max_chars)
...
> +/**
> + * _parse_integer_limit() - Convert integer string representation to an integer
> + * limiting the number of characters parsed.
> + * @s: The start of the string.
> + * @base: The number base to use.
> + * @p: Where to write the result of the conversion.
> + * @max_chars: Maximum amount of characters to consume.
> + *
> + * Convert non-negative integer string representation in explicitly given radix
> + * to an integer. A maximum of max_chars characters will be converted.
> + *
> + * Return: Number of characters consumed maybe or-ed with overflow bit.
> + * If overflow occurs, result integer (incorrect) is still returned.
> + */
> +noinline
> +unsigned int _parse_integer_limit(const char *s, unsigned int base,
> + unsigned long long *p, size_t max_chars)
Is it safe to remove the "Don't you dare use this function." warning?
Since this function returns the number of consumed characters bitwise-ORed with
the KSTRTOX_OVERFLOW flag, turning the comment into standard kernel-doc format
makes it look like an inviting API.
If a developer uses this and directly advances a pointer by the return value
without masking the overflow bit, could an integer overflow during parsing cause
the pointer to jump out-of-bounds?
That return value merged with the overflow indication is a bit weird in the first
place. I wonder if we need a precursor patch cleaning _parse_integer_limit() and
removing this KSTRTOX_OVERFLOW completely... turning it into:
ssize_t _parse_integer_limit(const char *s,
const char **endp,
unsigned int base,
unsigned long long *res,
size_t max_chars);
so the function returns the number of converted characters as an ssize_t, or
-ERANGE if overflow occurs. Also, endp would point to the end of the conversion,
so the current usage (in _kstrtoull()):
rv = _parse_integer(s, base, &_res);
if (rv & KSTRTOX_OVERFLOW)
return -ERANGE;
if (rv == 0)
return -EINVAL;
s += rv;
would become:
ret = _parse_integer(s, &s, base, &_res);
if (ret < 0)
return ret;
if (!ret)
return -EINVAL;
which seems more aligned to the code we see out there.
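
As a rough illustration of the proposed interface, here is a userspace sketch. It is not kernel code: the function name, the digit helper, and the choice to stop at the first overflow without writing the result are all assumptions made for this example.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical helper: map a character to its digit value, or -1. */
static int digit_value(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'z')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'Z')
		return c - 'A' + 10;
	return -1;
}

/* Returns the number of characters consumed, or -ERANGE on overflow. */
static ssize_t parse_integer_limit_sketch(const char *s, const char **endp,
					  unsigned int base,
					  unsigned long long *res,
					  size_t max_chars)
{
	unsigned long long acc = 0;
	size_t n = 0;

	while (n < max_chars) {
		int d = digit_value(s[n]);

		if (d < 0 || (unsigned int)d >= base)
			break;
		/* Reject before the multiply-add can wrap. */
		if (acc > (ULLONG_MAX - (unsigned int)d) / base)
			return -ERANGE;
		acc = acc * base + (unsigned int)d;
		n++;
	}

	*res = acc;
	if (endp)
		*endp = s + n;
	return (ssize_t)n;
}
```

A caller then checks for a negative return, treats 0 as "no digits", and continues from *endp, with no overflow bit to mask off the count.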
--
Kind regards,
Rodrigo Alencar
^ permalink raw reply
* Re: [PATCH v11 2/6] iio: adc: ad4691: add initial driver for AD4691 family
From: Jonathan Cameron @ 2026-05-17 12:19 UTC (permalink / raw)
To: Radu Sabau via B4 Relay
Cc: radu.sabau, Lars-Peter Clausen, Michael Hennerich, David Lechner,
Nuno Sá, Andy Shevchenko, Rob Herring, Krzysztof Kozlowski,
Conor Dooley, Uwe Kleine-König, Liam Girdwood, Mark Brown,
Linus Walleij, Bartosz Golaszewski, Philipp Zabel,
Jonathan Corbet, Shuah Khan, linux-iio, devicetree, linux-kernel,
linux-pwm, linux-gpio, linux-doc
In-Reply-To: <20260515-ad4692-multichannel-sar-adc-driver-v11-2-eab27d852ac2@analog.com>
On Fri, 15 May 2026 16:31:31 +0300
Radu Sabau via B4 Relay <devnull+radu.sabau.analog.com@kernel.org> wrote:
> From: Radu Sabau <radu.sabau@analog.com>
>
> Add support for the Analog Devices AD4691 family of high-speed,
> low-power multichannel SAR ADCs: AD4691 (16-ch, 500 kSPS),
> AD4692 (16-ch, 1 MSPS), AD4693 (8-ch, 500 kSPS) and
> AD4694 (8-ch, 1 MSPS).
>
> The driver implements a custom regmap layer over raw SPI to handle the
> device's mixed 1/2/3/4-byte register widths and uses the standard IIO
> read_raw/write_raw interface for single-channel reads.
>
> The chip idles in Autonomous Mode so that single-shot read_raw can use
> the internal oscillator without disturbing the hardware configuration.
>
> Three voltage supply domains are managed: avdd (required), vio, and a
> reference supply on either the REF pin (ref-supply, external buffer)
> or the REFIN pin (refin-supply, uses the on-chip reference buffer;
> REFBUF_EN is set accordingly). Hardware reset is performed via
> the reset controller framework; a software reset through SPI_CONFIG_A
> is used as fallback when no hardware reset is available.
>
> Accumulator channel masking for single-shot reads uses ACC_MASK_REG via
> an ADDR_DESCENDING SPI write, which covers both mask bytes in a single
> 16-bit transfer.
>
> IIO_CHAN_INFO_SAMP_FREQ is exposed as info_mask_separate. The oscillator
> is shared hardware — writing any channel's sampling_frequency attribute
> sets it for all others — but per-channel attributes are used throughout
> the series to avoid an ABI change when per-channel oversampling ratios
> are introduced in a later commit, at which point the effective output
> rate (osc_freq / osr[N]) becomes genuinely per-channel.
>
> Reviewed-by: David Lechner <dlechner@baylibre.com>
> Signed-off-by: Radu Sabau <radu.sabau@analog.com>
Just a couple of really trivial comments.
I might just have tweaked some of these whilst applying but seeing
as it seems you'll be doing a v12 - over to you :)
Jonathan
> diff --git a/drivers/iio/adc/ad4691.c b/drivers/iio/adc/ad4691.c
> new file mode 100644
> index 000000000000..ba77e1bfef16
> --- /dev/null
> +++ b/drivers/iio/adc/ad4691.c
> +
> +#define AD4691_OSC_EN_REG 0x180
> +#define AD4691_STATE_RESET_REG 0x181
> +#define AD4691_STATE_RESET_ALL 0x01
Is this a value in a field or documented as whole register value?
I checked, it's BIT(0) - rest of register is Reserved 0
Hence cleaner to use BIT(0) for this definition.
> +#define AD4691_ADC_SETUP 0x182
> +#define AD4691_ADC_MODE_MASK GENMASK(1, 0)
> +#define AD4691_AUTONOMOUS_MODE 0x02
Personal preference would have been to fully define the field values
at this stage, but meh, you bring in the other used ones in later patches
so not critical.
> +
> +#define AD4691_CHANNEL(ch) \
> + { \
> + .type = IIO_VOLTAGE, \
> + .indexed = 1, \
> + .info_mask_separate = BIT(IIO_CHAN_INFO_RAW) \
> + | BIT(IIO_CHAN_INFO_SAMP_FREQ), \
> + .info_mask_separate_available = \
> + BIT(IIO_CHAN_INFO_SAMP_FREQ), \
> + .info_mask_shared_by_all = BIT(IIO_CHAN_INFO_SCALE), \
> + .channel = ch, \
> + .scan_index = ch, \
> + .scan_type = { \
> + .sign = 'u', \
> + .realbits = 16, \
> + .storagebits = 16, \
Really trivial, but I suspect that scan_index and most of scan_type aren't used
until the buffered support is added. So in an ideal world, add that extra stuff
in that patch. .realbits is used here, so it is fine to initialize that now.
> + }, \
> + }
> +
> +static int ad4691_single_shot_read(struct iio_dev *indio_dev,
> + struct iio_chan_spec const *chan, int *val)
> +{
> + struct ad4691_state *st = iio_priv(indio_dev);
> + unsigned int reg_val, osc_idx, period_us;
> + int ret;
> +
> + guard(mutex)(&st->lock);
> +
> + /* Use AUTONOMOUS mode for single-shot reads. */
> + ret = regmap_write(st->regmap, AD4691_STATE_RESET_REG,
> + AD4691_STATE_RESET_ALL);
This is identical to the line below that you don't wrap. I don't mind which,
but consistency is good.
> + if (ret)
> + return ret;
> +
> + ret = regmap_write(st->regmap, AD4691_STD_SEQ_CONFIG,
> + BIT(chan->channel));
> + if (ret)
> + return ret;
> +
> + ret = regmap_write(st->regmap, AD4691_ACC_MASK_REG,
> + ~BIT(chan->channel) & GENMASK(15, 0));
> + if (ret)
> + return ret;
> +
> + ret = regmap_read(st->regmap, AD4691_OSC_FREQ_REG, &reg_val);
> + if (ret)
> + return ret;
> +
> + ret = regmap_write(st->regmap, AD4691_OSC_EN_REG, 1);
> + if (ret)
> + return ret;
> +
> + osc_idx = FIELD_GET(AD4691_OSC_FREQ_MASK, reg_val);
> + /* Wait 2 oscillator periods for the conversion to complete. */
> + period_us = DIV_ROUND_UP(2UL * USEC_PER_SEC, ad4691_osc_freqs_Hz[osc_idx]);
> + fsleep(period_us);
> +
> + ret = regmap_write(st->regmap, AD4691_OSC_EN_REG, 0);
> + if (ret)
> + return ret;
> +
> + ret = regmap_read(st->regmap, AD4691_AVG_IN(chan->channel), &reg_val);
> + if (ret)
> + return ret;
> +
> + *val = reg_val;
> +
> + ret = regmap_write(st->regmap, AD4691_STATE_RESET_REG, AD4691_STATE_RESET_ALL);
See above.
> + if (ret)
> + return ret;
> +
> + return IIO_VAL_INT;
> +}
> +
> +static const struct of_device_id ad4691_of_match[] = {
> + { .compatible = "adi,ad4691", .data = &ad4691_chip_info },
> + { .compatible = "adi,ad4692", .data = &ad4692_chip_info },
> + { .compatible = "adi,ad4693", .data = &ad4693_chip_info },
> + { .compatible = "adi,ad4694", .data = &ad4694_chip_info },
> + { }
> +};
> +MODULE_DEVICE_TABLE(of, ad4691_of_match);
> +
> +static const struct spi_device_id ad4691_id[] = {
> + { "ad4691", (kernel_ulong_t)&ad4691_chip_info },
> + { "ad4692", (kernel_ulong_t)&ad4692_chip_info },
> + { "ad4693", (kernel_ulong_t)&ad4693_chip_info },
> + { "ad4694", (kernel_ulong_t)&ad4694_chip_info },
New thing for IIO, but as you are doing another spin anyway, can you save
us a future patch? It seems likely that, similar to the I2C work Uwe is
currently doing, we will eventually move away from stashing pointers
in that ulong_t. So please use named initializers. I'm not sure
why they've been standard for years for of_device_id but not spi_device_id!
Jonathan
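
A sketch of what the named-initializer form could look like, written with userspace stand-ins so it compiles on its own. The *_sketch types and variables are hypothetical; only the field names (name, driver_data) are taken from struct spi_device_id, and the channel counts come from the commit message.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Userspace stand-ins for the kernel types. */
typedef uintptr_t kernel_ulong_t;

struct spi_device_id_sketch {
	char name[32];
	kernel_ulong_t driver_data;
};

struct chip_info_sketch {
	unsigned int num_channels;
};

static const struct chip_info_sketch ad4691_info_sketch = { .num_channels = 16 };
static const struct chip_info_sketch ad4693_info_sketch = { .num_channels = 8 };

/* Each field is named, so a later layout change does not silently shift values. */
static const struct spi_device_id_sketch ad4691_id_sketch[] = {
	{ .name = "ad4691", .driver_data = (kernel_ulong_t)&ad4691_info_sketch },
	{ .name = "ad4693", .driver_data = (kernel_ulong_t)&ad4693_info_sketch },
	{ .name = "" }	/* sentinel: empty name terminates the table */
};
```

In the driver itself the terminator would stay as the usual empty entry; the explicit empty name is used here only to keep the sketch within standard C.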
> + { }
> +};
> +MODULE_DEVICE_TABLE(spi, ad4691_id);
^ permalink raw reply
* Re: [PATCH v11 2/6] iio: adc: ad4691: add initial driver for AD4691 family
From: Jonathan Cameron @ 2026-05-17 12:14 UTC (permalink / raw)
To: David Lechner
Cc: radu.sabau, Lars-Peter Clausen, Michael Hennerich, Nuno Sá,
Andy Shevchenko, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
Uwe Kleine-König, Liam Girdwood, Mark Brown, Linus Walleij,
Bartosz Golaszewski, Philipp Zabel, Jonathan Corbet, Shuah Khan,
linux-iio, devicetree, linux-kernel, linux-pwm, linux-gpio,
linux-doc
In-Reply-To: <0696b662-f478-4d1a-95e0-0338bbdb719e@baylibre.com>
On Sat, 16 May 2026 12:11:16 -0500
David Lechner <dlechner@baylibre.com> wrote:
> On 5/15/26 8:31 AM, Radu Sabau via B4 Relay wrote:
> > From: Radu Sabau <radu.sabau@analog.com>
> >
> > Add support for the Analog Devices AD4691 family of high-speed,
> > low-power multichannel SAR ADCs: AD4691 (16-ch, 500 kSPS),
> > AD4692 (16-ch, 1 MSPS), AD4693 (8-ch, 500 kSPS) and
> > AD4694 (8-ch, 1 MSPS).
> >
> > The driver implements a custom regmap layer over raw SPI to handle the
> > device's mixed 1/2/3/4-byte register widths and uses the standard IIO
> > read_raw/write_raw interface for single-channel reads.
> >
> > The chip idles in Autonomous Mode so that single-shot read_raw can use
> > the internal oscillator without disturbing the hardware configuration.
> >
> > Three voltage supply domains are managed: avdd (required), vio, and a
> > reference supply on either the REF pin (ref-supply, external buffer)
> > or the REFIN pin (refin-supply, uses the on-chip reference buffer;
> > REFBUF_EN is set accordingly). Hardware reset is performed via
> > the reset controller framework; a software reset through SPI_CONFIG_A
> > is used as fallback when no hardware reset is available.
> >
> > Accumulator channel masking for single-shot reads uses ACC_MASK_REG via
> > an ADDR_DESCENDING SPI write, which covers both mask bytes in a single
> > 16-bit transfer.
> >
> > IIO_CHAN_INFO_SAMP_FREQ is exposed as info_mask_separate. The oscillator
> > is shared hardware — writing any channel's sampling_frequency attribute
> > sets it for all others — but per-channel attributes are used throughout
> > the series to avoid an ABI change when per-channel oversampling ratios
> > are introduced in a later commit, at which point the effective output
> > rate (osc_freq / osr[N]) becomes genuinely per-channel.
> >
> > Reviewed-by: David Lechner <dlechner@baylibre.com>
> > Signed-off-by: Radu Sabau <radu.sabau@analog.com>
One follow up as I was commenting on same code...
> > diff --git a/drivers/iio/adc/ad4691.c b/drivers/iio/adc/ad4691.c
> > new file mode 100644
> > index 000000000000..ba77e1bfef16
> > --- /dev/null
> > +++ b/drivers/iio/adc/ad4691.c
> > +static int ad4691_get_sampling_freq(struct ad4691_state *st, int *val)
> > +{
> > + unsigned int reg_val;
> > + int ret;
> > +
>
> No mutex lock here? Maybe without OK since it is a read.
Agreed. It's not a bug, but also not a fast path and it will save reasoning
and need for comment to just take the lock.
>
> > + /*
> > + * AD4691_OSC_FREQ_REG is non-volatile and written during
> > + * ad4691_config(), so regmap returns the cached value here without
> > + * touching the SPI bus. No lock is needed.
> > + */
> > + ret = regmap_read(st->regmap, AD4691_OSC_FREQ_REG, &reg_val);
> > + if (ret)
> > + return ret;
> > +
> > + *val = ad4691_osc_freqs_Hz[FIELD_GET(AD4691_OSC_FREQ_MASK, reg_val)];
> > + return IIO_VAL_INT;
> > +}
> > +
> > +static int ad4691_set_sampling_freq(struct iio_dev *indio_dev, int freq)
> > +{
> > + struct ad4691_state *st = iio_priv(indio_dev);
> > + unsigned int start = ad4691_samp_freq_start(st->info);
> > +
> > + IIO_DEV_ACQUIRE_DIRECT_MODE(indio_dev, claim);
> > + if (IIO_DEV_ACQUIRE_FAILED(claim))
> > + return -EBUSY;
> > +
> > + for (unsigned int i = start; i < ARRAY_SIZE(ad4691_osc_freqs_Hz); i++) {
> > + if (ad4691_osc_freqs_Hz[i] != freq)
> > + continue;
>
> mutex lock?
Agreed. Whilst the direct mode acquire will serialize, that's an internal
implementation detail. Where a driver needs to ensure some sequences are not
interrupted (like, I think, for the single-shot read?) then it should take the
local lock.
>
> > + return regmap_update_bits(st->regmap, AD4691_OSC_FREQ_REG,
> > + AD4691_OSC_FREQ_MASK, i);
> > + }
> > +
> > + return -EINVAL;
> > +}
^ permalink raw reply
* Re: [PATCH v13 08/12] iio: frequency: adf41513: driver implementation
From: Rodrigo Alencar @ 2026-05-17 11:56 UTC (permalink / raw)
To: rodrigo.alencar, linux-kernel, linux-iio, devicetree, linux-doc
Cc: Jonathan Cameron, David Lechner, Andy Shevchenko,
Lars-Peter Clausen, Michael Hennerich, Rob Herring,
Krzysztof Kozlowski, Conor Dooley, Jonathan Corbet, Andrew Morton,
Petr Mladek, Steven Rostedt, Andy Shevchenko, Rasmus Villemoes,
Sergey Senozhatsky, Shuah Khan
In-Reply-To: <20260517-adf41513-iio-driver-v13-8-bb6e134a360f@analog.com>
On 26/05/17 10:14AM, Rodrigo Alencar via B4 Relay wrote:
> From: Rodrigo Alencar <rodrigo.alencar@analog.com>
>
> The driver is based on existing PLL drivers in the IIO subsystem and
> implements the following key features:
>
> - Integer-N and fractional-N (fixed/variable modulus) synthesis modes;
> - High-resolution frequency calculations using microhertz (µHz) precision
> to handle sub-Hz resolution across multi-GHz frequency ranges;
> - IIO debugfs interface for direct register access;
> - FW property parsing from devicetree including charge pump settings and
> reference path configuration;
> - Power management support with suspend/resume callbacks;
> - Lock detect GPIO monitoring.
Sashiko's feedback at:
https://sashiko.dev/#/patchset/20260517-adf41513-iio-driver-v13-0-bb6e134a360f%40analog.com?part=8
...
> +static int adf41513_calc_pfd_frequency(struct adf41513_state *st,
> + struct adf41513_pll_settings *result,
> + u64 fpfd_limit_uhz)
> +{
> + result->ref_div2 = st->data.ref_div2_en;
> + result->ref_doubler = st->data.ref_doubler_en;
> +
> + if (st->data.ref_doubler_en &&
> + st->ref_freq_hz > ADF41513_MAX_REF_FREQ_DOUBLER_HZ) {
> + result->ref_doubler = 0;
> + dev_warn(&st->spi->dev, "Disabling ref doubler due to high reference frequency\n");
Will this cause kernel log spam if an application frequently changes the
frequency? Since ref_doubler_en and ref_freq_hz are constant after probe,
could we validate this once during initialization or use dev_warn_once()
here?
Indeed, will move this check to parse_fw()
> + }
...
> +static int adf41513_calc_variable_mod(struct adf41513_state *st,
> + struct adf41513_pll_settings *result)
> +{
> + u64 freq_error_uhz, mod2;
> + u32 frac1, frac2;
> + u32 int_val = div64_u64_rem(result->target_frequency_uhz,
> + result->pfd_frequency_uhz, &freq_error_uhz);
> +
> + if (st->chip_info->has_prescaler_8_9 && int_val >= ADF41513_MIN_INT_FRAC_8_9 &&
> + int_val <= ADF41513_MAX_INT_8_9)
> + result->prescaler = 1;
> + else if (int_val >= ADF41513_MIN_INT_FRAC_4_5 && int_val <= ADF41513_MAX_INT_4_5)
> + result->prescaler = 0;
> + else
> + return -ERANGE;
> +
> + /* calculate required mod2 based on target resolution / 2 */
> + mod2 = DIV64_U64_ROUND_CLOSEST(result->pfd_frequency_uhz << 1,
> + st->data.freq_resolution_uhz * ADF41513_FIXED_MODULUS);
> + /* ensure mod2 is at least 2 for meaningful operation */
> + mod2 = clamp(mod2, 2, ADF41513_MAX_MOD2);
> +
> + /* calculate frac1 and frac2 */
> + frac1 = mul_u64_u64_div_u64(freq_error_uhz, ADF41513_FIXED_MODULUS,
> + result->pfd_frequency_uhz);
> + freq_error_uhz -= mul_u64_u32_div(result->pfd_frequency_uhz, frac1,
> + ADF41513_FIXED_MODULUS);
> + frac2 = mul_u64_u64_div_u64(freq_error_uhz, mod2 * ADF41513_FIXED_MODULUS,
> + result->pfd_frequency_uhz);
Can this calculation result in a frac2 value that exceeds mod2?
Because mul_u64_u32_div() truncates the remainder during the integer
division, the subtracted value might be smaller than the exact mathematical
contribution of frac1. This leaves freq_error_uhz artificially larger than
its theoretical bound.
When frac2 is computed by multiplying this enlarged error by
(mod2 * ADF41513_FIXED_MODULUS), couldn't the result exceed mod2? If
the hardware requires frac2 < mod2, should we cap frac2 or handle the
remainder differently?
Will replace it with:
/* calculate frac1 and frac2 */
frac1 = mul_u64_u64_div_u64(freq_error_uhz, ADF41513_FIXED_MODULUS,
result->pfd_frequency_uhz);
frac2 = mul_u64_u64_div_u64(freq_error_uhz, mod2 * ADF41513_FIXED_MODULUS,
result->pfd_frequency_uhz) - (u64)frac1 * mod2;
which would remove the possible error introduced by freq_error_uhz -= mul_u64_u32_div().
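
The bound on the revised expression can be checked by hand. Write
x = freq_error * MOD / f_pfd. Then frac1 = floor(x), and because frac1 * mod2
is an integer, floor(x * mod2) - frac1 * mod2 = floor((x - frac1) * mod2),
which lies in [0, mod2 - 1] since 0 <= x - frac1 < 1. The following is only a
userspace sketch of that argument, not the driver code: the fixed modulus of
2^25 is an assumed value for illustration, mul_div_u64() is a hypothetical
stand-in for the kernel's mul_u64_u64_div_u64() built on the GCC/Clang
unsigned __int128 extension, and calc_fracs() and fracs_in_range() are
hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value for illustration; the driver's ADF41513_FIXED_MODULUS may differ. */
#define FIXED_MODULUS (1ULL << 25)

/* Userspace stand-in for mul_u64_u64_div_u64(): floor(a * b / d). */
static uint64_t mul_div_u64(uint64_t a, uint64_t b, uint64_t d)
{
	return (uint64_t)(((unsigned __int128)a * b) / d);
}

struct frac_pair {
	uint32_t frac1;
	uint64_t frac2;
};

/*
 * Revised scheme: frac1 and frac2 are both derived from the same,
 * untruncated remainder err_uhz (which must be smaller than pfd_uhz).
 */
static struct frac_pair calc_fracs(uint64_t err_uhz, uint64_t pfd_uhz,
				   uint64_t mod2)
{
	struct frac_pair r;

	assert(err_uhz < pfd_uhz);

	r.frac1 = (uint32_t)mul_div_u64(err_uhz, FIXED_MODULUS, pfd_uhz);
	r.frac2 = mul_div_u64(err_uhz, mod2 * FIXED_MODULUS, pfd_uhz) -
		  (uint64_t)r.frac1 * mod2;
	return r;
}

/* Returns 1 if frac1 < FIXED_MODULUS and frac2 < mod2 over a sweep of remainders. */
static int fracs_in_range(uint64_t pfd_uhz, uint64_t mod2)
{
	uint64_t step = pfd_uhz / 1009 + 1;

	for (uint64_t err = 0; err < pfd_uhz; err += step) {
		struct frac_pair r = calc_fracs(err, pfd_uhz, mod2);

		if (r.frac1 >= FIXED_MODULUS || r.frac2 >= mod2)
			return 0;
	}
	return 1;
}
```

Calling fracs_in_range() with a PFD frequency in µHz and the largest mod2 of
interest exercises the bound; whether the hardware really requires
frac2 < mod2 is a question for the datasheet.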
> +
> + /* integer part */
> + result->actual_frequency_uhz = (u64)int_val * result->pfd_frequency_uhz;
...
--
Kind regards,
Rodrigo Alencar
^ permalink raw reply
* Re: [PATCH v11 2/6] iio: adc: ad4691: add initial driver for AD4691 family
From: Jonathan Cameron @ 2026-05-17 11:52 UTC (permalink / raw)
To: Radu Sabau via B4 Relay
Cc: radu.sabau, Lars-Peter Clausen, Michael Hennerich, David Lechner,
Nuno Sá, Andy Shevchenko, Rob Herring, Krzysztof Kozlowski,
Conor Dooley, Uwe Kleine-König, Liam Girdwood, Mark Brown,
Linus Walleij, Bartosz Golaszewski, Philipp Zabel,
Jonathan Corbet, Shuah Khan, linux-iio, devicetree, linux-kernel,
linux-pwm, linux-gpio, linux-doc
In-Reply-To: <20260515-ad4692-multichannel-sar-adc-driver-v11-2-eab27d852ac2@analog.com>
On Fri, 15 May 2026 16:31:31 +0300
Radu Sabau via B4 Relay <devnull+radu.sabau.analog.com@kernel.org> wrote:
> From: Radu Sabau <radu.sabau@analog.com>
>
> Add support for the Analog Devices AD4691 family of high-speed,
> low-power multichannel SAR ADCs: AD4691 (16-ch, 500 kSPS),
> AD4692 (16-ch, 1 MSPS), AD4693 (8-ch, 500 kSPS) and
> AD4694 (8-ch, 1 MSPS).
>
> The driver implements a custom regmap layer over raw SPI to handle the
> device's mixed 1/2/3/4-byte register widths and uses the standard IIO
> read_raw/write_raw interface for single-channel reads.
>
> The chip idles in Autonomous Mode so that single-shot read_raw can use
> the internal oscillator without disturbing the hardware configuration.
>
> Three voltage supply domains are managed: avdd (required), vio, and a
> reference supply on either the REF pin (ref-supply, external buffer)
> or the REFIN pin (refin-supply, uses the on-chip reference buffer;
> REFBUF_EN is set accordingly). Hardware reset is performed via
> the reset controller framework; a software reset through SPI_CONFIG_A
> is used as fallback when no hardware reset is available.
>
> Accumulator channel masking for single-shot reads uses ACC_MASK_REG via
> an ADDR_DESCENDING SPI write, which covers both mask bytes in a single
> 16-bit transfer.
>
> IIO_CHAN_INFO_SAMP_FREQ is exposed as info_mask_separate. The oscillator
> is shared hardware — writing any channel's sampling_frequency attribute
> sets it for all others — but per-channel attributes are used throughout
> the series to avoid an ABI change when per-channel oversampling ratios
> are introduced in a later commit, at which point the effective output
> rate (osc_freq / osr[N]) becomes genuinely per-channel.
>
> Reviewed-by: David Lechner <dlechner@baylibre.com>
> Signed-off-by: Radu Sabau <radu.sabau@analog.com>
Just to make sure it's not missed, one sashiko comment referenced below.
I agree with it that no sleep after coming out of reset is unusual. Not
impossible I suppose, but needs a comment.
> +static int ad4691_reset(struct ad4691_state *st)
> +{
> + struct device *dev = regmap_get_device(st->regmap);
> + struct reset_control *rst;
> +
> + rst = devm_reset_control_get_optional_exclusive(dev, NULL);
> + if (IS_ERR(rst))
> + return dev_err_probe(dev, PTR_ERR(rst), "Failed to get reset\n");
> +
> + if (rst) {
> + /*
> + * Assert the reset line before sleeping to guarantee a proper
> + * reset pulse on every probe, including driver reloads where
> + * the line may already be deasserted (reset_control_put() does
> + * not re-assert on release).
> + * devm_reset_control_get_optional_exclusive_deasserted() cannot
> + * be used because it deasserts immediately without delay; the
> + * datasheet (Table 5) requires a ≥300 µs reset pulse width
> + * before deassertion.
> + */
> + reset_control_assert(rst);
> + fsleep(300);
> + return reset_control_deassert(rst);
Sashiko makes the reasonable point that we'd kind of expect some time between
that pin dropping the device out of reset and it being able to respond. If it
really is that quick - then add a comment.
https://sashiko.dev/#/patchset/20260515-ad4692-multichannel-sar-adc-driver-v11-0-eab27d852ac2%40analog.com
> + }
> +
> + /* No hardware reset available, fall back to software reset. */
> + return regmap_write(st->regmap, AD4691_SPI_CONFIG_A_REG,
> + AD4691_SW_RESET);
Same applies here.
> +}
^ permalink raw reply
* Re: [PATCH v11 2/6] iio: adc: ad4691: add initial driver for AD4691 family
From: Jonathan Cameron @ 2026-05-17 11:50 UTC (permalink / raw)
To: David Lechner
Cc: radu.sabau, Lars-Peter Clausen, Michael Hennerich, Nuno Sá,
Andy Shevchenko, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
Uwe Kleine-König, Liam Girdwood, Mark Brown, Linus Walleij,
Bartosz Golaszewski, Philipp Zabel, Jonathan Corbet, Shuah Khan,
linux-iio, devicetree, linux-kernel, linux-pwm, linux-gpio,
linux-doc
In-Reply-To: <0696b662-f478-4d1a-95e0-0338bbdb719e@baylibre.com>
On Sat, 16 May 2026 12:11:16 -0500
David Lechner <dlechner@baylibre.com> wrote:
> > +static int ad4691_probe(struct spi_device *spi)
> > +{
> > + struct device *dev = &spi->dev;
> > + struct iio_dev *indio_dev;
> > + struct ad4691_state *st;
> > + int ret;
> > +
> > + indio_dev = devm_iio_device_alloc(dev, sizeof(*st));
> > + if (!indio_dev)
> > + return -ENOMEM;
> > +
> > + st = iio_priv(indio_dev);
> > + st->info = spi_get_device_match_data(spi);
> > + if (!st->info)
> > + return -ENODEV;
>
> We've recently standardized on not checking return value
> of spi_get_device_match_data().
There is a wrinkle in that plan. It's still possible to bind these
drivers to a different ID via driver_override and that will land
us in a NULL dereference.
Until we've closed that out (Andy and others are working on it)
we should probably keep these in. Once that's in place we can
cycle back to clean them out + potentially backport that feature
to ensure drivers that are currently not checking are fine.
>
> > +
> > + ret = devm_mutex_init(dev, &st->lock);
> > + if (ret)
> > + return ret;
> > +
> > + st->regmap = devm_regmap_init(dev, NULL, spi, &ad4691_regmap_config);
> > + if (IS_ERR(st->regmap))
> > + return dev_err_probe(dev, PTR_ERR(st->regmap),
> > + "Failed to initialize regmap\n");
> > +
> > + ret = ad4691_regulator_setup(st);
> > + if (ret)
> > + return ret;
> > +
> > + ret = ad4691_reset(st);
> > + if (ret)
> > + return ret;
> > +
> > + ret = ad4691_config(st);
> > + if (ret)
> > + return ret;
> > +
> > + indio_dev->name = st->info->name;
> > + indio_dev->info = &ad4691_info;
> > + indio_dev->modes = INDIO_DIRECT_MODE;
> > +
> > + indio_dev->channels = st->info->sw_info->channels;
> > + indio_dev->num_channels = st->info->sw_info->num_channels;
> > +
> > + return devm_iio_device_register(dev, indio_dev);
> > +}
^ permalink raw reply
* Re: [PATCH RFC v4 01/10] dt-bindings: iio: frequency: add ad9910
From: Jonathan Cameron @ 2026-05-17 11:28 UTC (permalink / raw)
To: Rodrigo Alencar
Cc: Rodrigo Alencar via B4 Relay, rodrigo.alencar, linux-iio,
devicetree, linux-kernel, linux-doc, linux-hardening,
Lars-Peter Clausen, Michael Hennerich, David Lechner,
Andy Shevchenko, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
Philipp Zabel, Jonathan Corbet, Shuah Khan, Kees Cook,
Gustavo A. R. Silva
In-Reply-To: <ikhp5dsb4ook2cx665p4xbqg5ykmnoytiaybv2cx5khyn3wngn@j752jkewd36b>
On Sun, 17 May 2026 11:12:14 +0100
Rodrigo Alencar <455.rodrigo.alencar@gmail.com> wrote:
> On 26/05/16 11:40AM, Jonathan Cameron wrote:
> > On Wed, 13 May 2026 16:09:24 +0100
> > Rodrigo Alencar <455.rodrigo.alencar@gmail.com> wrote:
> >
> > > On 26/05/12 07:31PM, Jonathan Cameron wrote:
> > > > On Fri, 08 May 2026 18:00:17 +0100
> > > > Rodrigo Alencar via B4 Relay <devnull+rodrigo.alencar.analog.com@kernel.org> wrote:
> > > >
> > > > > From: Rodrigo Alencar <rodrigo.alencar@analog.com>
> > > > >
> > > > > DT-bindings for AD9910, a 1 GSPS DDS with 14-bit DAC. It includes
> > > > > configurations for clocks, DAC current, reset and basic GPIO control.
> > > >
> > > > I think this is getting close enough now that for next version you should
> > > > drop the RFC (which is probably gating DT binding folk giving it
> > > > a detailed review!)
> > > >
> > > > >
> > > > > Signed-off-by: Rodrigo Alencar <rodrigo.alencar@analog.com>
> > > >
> > > > > +
> > > > > + adi,dac-output-current-microamp:
> > > > > + minimum: 8640
> > > > > + maximum: 31590
> > > > > + default: 20070
> > > > > + description:
> > > > > + DAC full-scale output current in microamps.
> > > > > +
> > > > Can we use generic dac.yaml defined output-range-microamp? The base will be 0 always but
> > > > that shouldn't matter.
> > > >
> > >
> > > would that be fine even if we do not have those child channel nodes in the device-tree node?
> > >
> > I think I'd rather we generalized to cover the 'one shared value' case rather
> > than go with a vendor-specific binding.
>
> I can't reference dac.yaml in node level as it forces the nodename to match a pattern:
>
> dds@0 (adi,ad9910): $nodename:0: 'dds@0' does not match '^channel(@[0-9a-f]+)?$'
>
> Also, I can't reference the property only, because it ends with *-microamp:
>
> output-range-microamp:
> $ref: /schemas/iio/dac/dac.yaml#/properties/output-range-microamp
> items:
> - const: 0
> - minimum: 8640
> maximum: 31590
> default: 20070
>
> which gets me:
>
> properties:output-range-microamp: '$ref' should not be valid under {'const': '$ref'}
>
> so I will adjust it to:
>
> output-range-microamp:
> description: DAC full-scale output current in microamps.
> items:
> - const: 0
> - minimum: 8640
> maximum: 31590
> default: 20070
>
> and not reference dac.yaml at all.
>
That works. Thanks
^ permalink raw reply
* [PATCH net-next 2/2] net/mlx5: implement max_sfs parameter
From: Tariq Toukan @ 2026-05-17 11:27 UTC (permalink / raw)
To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
David S. Miller
Cc: Jiri Pirko, Simon Horman, Jonathan Corbet, Shuah Khan,
Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch,
Vlad Dumitrescu, Aleksandr Loktionov, Daniel Zahka, David Ahern,
Nikolay Aleksandrov, netdev, linux-doc, linux-kernel, linux-rdma,
Gal Pressman, Dragos Tatulea, Jiri Pirko, Nikolay Aleksandrov
In-Reply-To: <20260517112700.343575-1-tariqt@nvidia.com>
From: Nikolay Aleksandrov <nikolay@nvidia.com>
Implement the max_sfs generic parameter to allow users to control the total
number of light-weight NIC subfunctions that can be created, using devlink instead
of external vendor tools. A value of 0 will effectively disable creation
of new subfunction devices. A warning is sent to user-space via extack
(returning extack without error code is interpreted as a warning by
user-space tools).
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
Documentation/networking/devlink/mlx5.rst | 7 +-
.../mellanox/mlx5/core/lib/nv_param.c | 83 ++++++++++++++++++-
2 files changed, 86 insertions(+), 4 deletions(-)
diff --git a/Documentation/networking/devlink/mlx5.rst b/Documentation/networking/devlink/mlx5.rst
index 4bba4d780a4a..283b93d16861 100644
--- a/Documentation/networking/devlink/mlx5.rst
+++ b/Documentation/networking/devlink/mlx5.rst
@@ -45,8 +45,13 @@ Parameters
- The range is between 1 and a device-specific max.
- Applies to each physical function (PF) independently, if the device
supports it. Otherwise, it applies symmetrically to all PFs.
+ * - ``max_sfs``
+ - permanent
+ - The range is between 0 and a device-specific max.
+ - Applies to each physical function (PF) independently.
-Note: permanent parameters such as ``enable_sriov`` and ``total_vfs`` require FW reset to take effect
+Note: permanent parameters such as ``enable_sriov``, ``total_vfs`` and ``max_sfs``
+ require FW reset to take effect
.. code-block:: bash
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/nv_param.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/nv_param.c
index 19bb620b7436..eff3a67e4ca0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/nv_param.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/nv_param.c
@@ -68,7 +68,9 @@ struct mlx5_ifc_mnvda_reg_bits {
struct mlx5_ifc_nv_global_pci_conf_bits {
u8 sriov_valid[0x1];
- u8 reserved_at_1[0x10];
+ u8 reserved_at_1[0xa];
+ u8 per_pf_num_sf[0x1];
+ u8 reserved_at_c[0x5];
u8 per_pf_total_vf[0x1];
u8 reserved_at_12[0xe];
@@ -93,9 +95,11 @@ struct mlx5_ifc_nv_global_pci_cap_bits {
};
struct mlx5_ifc_nv_pf_pci_conf_bits {
- u8 reserved_at_0[0x9];
+ u8 log_sf_bar_size[0x8];
+ u8 pf_total_sf_en[0x1];
u8 pf_total_vf_en[0x1];
- u8 reserved_at_a[0x16];
+ u8 reserved_at_a[0x6];
+ u8 total_sf[0x10];
u8 reserved_at_20[0x20];
@@ -755,6 +759,76 @@ static int mlx5_devlink_total_vfs_validate(struct devlink *devlink, u32 id,
return 0;
}
+static int mlx5_devlink_max_sfs_get(struct devlink *devlink, u32 id,
+ struct devlink_param_gset_ctx *ctx,
+ struct netlink_ext_ack *extack)
+{
+ struct mlx5_core_dev *dev = devlink_priv(devlink);
+ u32 mnvda[MLX5_ST_SZ_DW(mnvda_reg)] = {};
+ void *data;
+ int err;
+
+ err = mlx5_nv_param_read_per_host_pf_conf(dev, mnvda, sizeof(mnvda));
+ if (err) {
+ NL_SET_ERR_MSG_MOD(extack, "Failed to read PF configuration");
+ return err;
+ }
+
+ data = MLX5_ADDR_OF(mnvda_reg, mnvda, configuration_item_data);
+ ctx->val.vu32 = MLX5_GET(nv_pf_pci_conf, data, total_sf);
+
+ return 0;
+}
+
+static int mlx5_devlink_max_sfs_set(struct devlink *devlink, u32 id,
+ struct devlink_param_gset_ctx *ctx,
+ struct netlink_ext_ack *extack)
+{
+ struct mlx5_core_dev *dev = devlink_priv(devlink);
+ u32 mnvda[MLX5_ST_SZ_DW(mnvda_reg)] = {};
+ void *data;
+ int err;
+
+ err = mlx5_nv_param_read_global_pci_conf(dev, mnvda, sizeof(mnvda));
+ if (err) {
+ NL_SET_ERR_MSG_MOD(extack,
+ "Failed to read global PCI configuration");
+ return err;
+ }
+
+ data = MLX5_ADDR_OF(mnvda_reg, mnvda, configuration_item_data);
+ MLX5_SET(nv_global_pci_conf, data, per_pf_num_sf, !!ctx->val.vu32);
+
+ err = mlx5_nv_param_write(dev, mnvda, sizeof(mnvda));
+ if (err) {
+ NL_SET_ERR_MSG_MOD(extack,
+ "Failed to change per_pf_num_sf global PCI configuration");
+ return err;
+ }
+
+ memset(mnvda, 0, sizeof(mnvda));
+ err = mlx5_nv_param_read_per_host_pf_conf(dev, mnvda, sizeof(mnvda));
+ if (err) {
+ NL_SET_ERR_MSG_MOD(extack, "Failed to read PF configuration");
+ return err;
+ }
+
+ data = MLX5_ADDR_OF(mnvda_reg, mnvda, configuration_item_data);
+ MLX5_SET(nv_pf_pci_conf, data, log_sf_bar_size, ctx->val.vu32 ? 12 : 0);
+ MLX5_SET(nv_pf_pci_conf, data, pf_total_sf_en, !!ctx->val.vu32);
+ MLX5_SET(nv_pf_pci_conf, data, total_sf, ctx->val.vu32);
+
+ err = mlx5_nv_param_write(dev, mnvda, sizeof(mnvda));
+ if (err) {
+ NL_SET_ERR_MSG_MOD(extack,
+ "Failed to change PF PCI configuration");
+ return err;
+ }
+ NL_SET_ERR_MSG(extack, "Modifying max_sfs requires a reboot");
+
+ return 0;
+}
+
static const struct devlink_param mlx5_nv_param_devlink_params[] = {
DEVLINK_PARAM_GENERIC(ENABLE_SRIOV, BIT(DEVLINK_PARAM_CMODE_PERMANENT),
mlx5_devlink_enable_sriov_get,
@@ -763,6 +837,9 @@ static const struct devlink_param mlx5_nv_param_devlink_params[] = {
mlx5_devlink_total_vfs_get,
mlx5_devlink_total_vfs_set,
mlx5_devlink_total_vfs_validate),
+ DEVLINK_PARAM_GENERIC(MAX_SFS, BIT(DEVLINK_PARAM_CMODE_PERMANENT),
+ mlx5_devlink_max_sfs_get,
+ mlx5_devlink_max_sfs_set, NULL),
DEVLINK_PARAM_DRIVER(MLX5_DEVLINK_PARAM_ID_CQE_COMPRESSION_TYPE,
"cqe_compress_type", DEVLINK_PARAM_TYPE_STRING,
BIT(DEVLINK_PARAM_CMODE_PERMANENT),
--
2.44.0
^ permalink raw reply related
* [PATCH net-next 1/2] devlink: add generic device max_sfs parameter
From: Tariq Toukan @ 2026-05-17 11:26 UTC (permalink / raw)
To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
David S. Miller
Cc: Jiri Pirko, Simon Horman, Jonathan Corbet, Shuah Khan,
Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch,
Vlad Dumitrescu, Aleksandr Loktionov, Daniel Zahka, David Ahern,
Nikolay Aleksandrov, netdev, linux-doc, linux-kernel, linux-rdma,
Gal Pressman, Dragos Tatulea, Jiri Pirko, Nikolay Aleksandrov
In-Reply-To: <20260517112700.343575-1-tariqt@nvidia.com>
From: Nikolay Aleksandrov <nikolay@nvidia.com>
Add a new generic devlink device parameter (max_sfs) to control if and
how many light-weight NIC subfunctions can be created. Subfunctions are
light-weight network functions backed by an underlying PCI function.
Their lifecycle can already be managed by devlink, but currently users
cannot enable them in the device. They can be enabled/disabled only via
external vendor tools. This parameter allows subfunctions to be enabled
(>0) or disabled (0) via devlink. A subsequent patch will add support
for max_sfs to the mlx5 driver.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
Documentation/networking/devlink/devlink-params.rst | 6 ++++++
include/net/devlink.h | 4 ++++
net/devlink/param.c | 5 +++++
3 files changed, 15 insertions(+)
diff --git a/Documentation/networking/devlink/devlink-params.rst b/Documentation/networking/devlink/devlink-params.rst
index ea17756dcda6..29b8a9246fb6 100644
--- a/Documentation/networking/devlink/devlink-params.rst
+++ b/Documentation/networking/devlink/devlink-params.rst
@@ -165,3 +165,9 @@ own name.
- u32
- Controls the maximum number of MAC address filters that can be assigned
to a Virtual Function (VF).
+ * - ``max_sfs``
+ - u32
+ - The maximum number of subfunctions which can be created on the device.
+ Modifying this parameter may require a device restart and PCI bus
+ rescanning because the BAR layout may change. A value of 0 disables
+ subfunction creation.
diff --git a/include/net/devlink.h b/include/net/devlink.h
index bcd31de1f890..4ec455cfe7a4 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -546,6 +546,7 @@ enum devlink_param_generic_id {
DEVLINK_PARAM_GENERIC_ID_TOTAL_VFS,
DEVLINK_PARAM_GENERIC_ID_NUM_DOORBELLS,
DEVLINK_PARAM_GENERIC_ID_MAX_MAC_PER_VF,
+ DEVLINK_PARAM_GENERIC_ID_MAX_SFS,
/* add new param generic ids above here*/
__DEVLINK_PARAM_GENERIC_ID_MAX,
@@ -619,6 +620,9 @@ enum devlink_param_generic_id {
#define DEVLINK_PARAM_GENERIC_MAX_MAC_PER_VF_NAME "max_mac_per_vf"
#define DEVLINK_PARAM_GENERIC_MAX_MAC_PER_VF_TYPE DEVLINK_PARAM_TYPE_U32
+#define DEVLINK_PARAM_GENERIC_MAX_SFS_NAME "max_sfs"
+#define DEVLINK_PARAM_GENERIC_MAX_SFS_TYPE DEVLINK_PARAM_TYPE_U32
+
#define DEVLINK_PARAM_GENERIC(_id, _cmodes, _get, _set, _validate) \
{ \
.id = DEVLINK_PARAM_GENERIC_ID_##_id, \
diff --git a/net/devlink/param.c b/net/devlink/param.c
index cf95268da5b0..523243e49d88 100644
--- a/net/devlink/param.c
+++ b/net/devlink/param.c
@@ -117,6 +117,11 @@ static const struct devlink_param devlink_param_generic[] = {
.name = DEVLINK_PARAM_GENERIC_MAX_MAC_PER_VF_NAME,
.type = DEVLINK_PARAM_GENERIC_MAX_MAC_PER_VF_TYPE,
},
+ {
+ .id = DEVLINK_PARAM_GENERIC_ID_MAX_SFS,
+ .name = DEVLINK_PARAM_GENERIC_MAX_SFS_NAME,
+ .type = DEVLINK_PARAM_GENERIC_MAX_SFS_TYPE,
+ },
};
static int devlink_param_generic_verify(const struct devlink_param *param)
--
2.44.0
^ permalink raw reply related
* [PATCH net-next 0/2] devlink: add generic max_sfs parameter and mlx5 support
From: Tariq Toukan @ 2026-05-17 11:26 UTC (permalink / raw)
To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
David S. Miller
Cc: Jiri Pirko, Simon Horman, Jonathan Corbet, Shuah Khan,
Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch,
Vlad Dumitrescu, Aleksandr Loktionov, Daniel Zahka, David Ahern,
Nikolay Aleksandrov, netdev, linux-doc, linux-kernel, linux-rdma,
Gal Pressman, Dragos Tatulea, Jiri Pirko
Hi,
This series by Nikolay introduces a new generic devlink device
parameter, max_sfs, to control the number of light-weight NIC
subfunctions (SFs) that can be created on a device.
The first patch adds the generic devlink parameter and infrastructure
support.
The second patch implements support for the parameter in the mlx5
driver.
With this addition, users can enable or disable SF creation directly via
devlink, without relying on external vendor-specific tools.
Regards,
Tariq
Nikolay Aleksandrov (2):
devlink: add generic device max_sfs parameter
net/mlx5: implement max_sfs parameter
.../networking/devlink/devlink-params.rst | 6 ++
Documentation/networking/devlink/mlx5.rst | 7 +-
.../mellanox/mlx5/core/lib/nv_param.c | 83 ++++++++++++++++++-
include/net/devlink.h | 4 +
net/devlink/param.c | 5 ++
5 files changed, 101 insertions(+), 4 deletions(-)
base-commit: 627ac78f2741e2ebd2225e2e953b6964a8a9182f
--
2.44.0
^ permalink raw reply
* [PATCH] docs/gpu: fix typo in msm-crash-dump.rst
From: Sakurai Shun @ 2026-05-17 11:06 UTC (permalink / raw)
To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
Simona Vetter, Jonathan Corbet, Shuah Khan
Cc: Sakurai Shun, dri-devel, linux-doc, linux-kernel
Replace "uinque" with "unique"
Signed-off-by: Sakurai Shun <ssh1326@icloud.com>
---
Documentation/gpu/msm-crash-dump.rst | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Documentation/gpu/msm-crash-dump.rst b/Documentation/gpu/msm-crash-dump.rst
index 9509cc422..474c0d265 100644
--- a/Documentation/gpu/msm-crash-dump.rst
+++ b/Documentation/gpu/msm-crash-dump.rst
@@ -70,7 +70,7 @@ ringbuffer
bo
List of buffers from the hanging submission if available.
- Each buffer object will have a uinque iova.
+ Each buffer object will have a unique iova.
iova
GPU address of the buffer object.
--
2.54.0
^ permalink raw reply related
* Re: [PATCH v13 06/12] iio: core: add decimal value formatting into 64-bit value
From: Rodrigo Alencar @ 2026-05-17 10:44 UTC (permalink / raw)
To: Andy Shevchenko, rodrigo.alencar
Cc: linux-kernel, linux-iio, devicetree, linux-doc, Jonathan Cameron,
David Lechner, Andy Shevchenko, Lars-Peter Clausen,
Michael Hennerich, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
Jonathan Corbet, Andrew Morton, Petr Mladek, Steven Rostedt,
Rasmus Villemoes, Sergey Senozhatsky, Shuah Khan
In-Reply-To: <agmVYvWBmLI4A65m@ashevche-desk.local>
On 26/05/17 01:16PM, Andy Shevchenko wrote:
> On Sun, May 17, 2026 at 10:14:01AM +0100, Rodrigo Alencar via B4 Relay wrote:
>
> > Create new format types for iio values (IIO_VAL_DECIMAL64_*), which
> > defines the representation of fixed decimal point values into a single
> > 64-bit number. This new format increases the range of represented values,
> > allowing for integer parts greater than 2^32, as bits are not "wasted"
> > in the fractional part, which can be seen in IIO_VAL_INT_PLUS_MICRO and
> > IIO_VAL_INT_PLUS_NANO. Helpers are created to compose and decompose 64-bit
> > decimals into integer values used in IIO formatting interfaces, which
> > creates consistency and avoids error-prone manual assignments when using
> > wordpart macros. When doing the parsing, kstrtodec64() is used with the
> > scale defined by the specific decimal format type.
>
> ...
>
> > + {
> > + int scale = type - IIO_VAL_DECIMAL64_BASE;
>
> > + int l = 0;
>
> Perhaps make it global in the function? We have the same in
> IIO_VAL_INT_MULTIPLE case.
>
> > + s64 frac;
> > +
> > + tmp2 = div64_s64_rem(iio_val_s64_from_s32s(vals),
> > + int_pow(10, scale), &frac);
> > + if (tmp2 == 0 && frac < 0)
> > + l += sysfs_emit_at(buf, offset, "-");
> > +
> > + l += sysfs_emit_at(buf, offset + l, "%lld.%0*lld", tmp2, scale,
> > + abs(frac));
> > + return l;
> > + }
>
> ...
>
> > #ifndef _IIO_TYPES_H_
> > #define _IIO_TYPES_H_
>
> Also needs types.h now... Which makes me wonder whether the proposed macros
> are placed in a good enough location.
That is a good point. I left it there because they are related to IIO_VAL_INT_64
and IIO_VAL_DECIMAL64_*. I had them as macros initially, but they are not exactly
"types" indeed.
>
> (Note, iio/iio.h is actually missing types.h, but includes it indirectly.)
>
> --
> With Best Regards,
> Andy Shevchenko
>
>
--
Kind regards,
Rodrigo Alencar
^ permalink raw reply
* [PATCH] docs/mm: fix typo in process_addrs.rst
From: Sakurai Shun @ 2026-05-17 10:36 UTC (permalink / raw)
To: Andrew Morton, Suren Baghdasaryan, Liam R. Howlett,
Lorenzo Stoakes, Vlastimil Babka, Shakeel Butt, David Hildenbrand,
Mike Rapoport, Michal Hocko, Jonathan Corbet, Shuah Khan
Cc: Sakurai Shun, linux-mm, linux-doc, linux-kernel
Replace "presense" with "presence"
Signed-off-by: Sakurai Shun <ssh1326@icloud.com>
---
Documentation/mm/process_addrs.rst | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Documentation/mm/process_addrs.rst b/Documentation/mm/process_addrs.rst
index 851680ead..042d64d72 100644
--- a/Documentation/mm/process_addrs.rst
+++ b/Documentation/mm/process_addrs.rst
@@ -775,7 +775,7 @@ lock, releasing or downgrading the mmap write lock also releases the VMA write
lock so there is no :c:func:`!vma_end_write` function.
Note that when write-locking a VMA lock, the :c:member:`!vma.vm_refcnt` is temporarily
-modified so that readers can detect the presense of a writer. The reference counter is
+modified so that readers can detect the presence of a writer. The reference counter is
restored once the vma sequence number used for serialisation is updated.
This ensures the semantics we require - VMA write locks provide exclusive write
--
2.54.0
^ permalink raw reply related
* Re: [PATCH v13 06/12] iio: core: add decimal value formatting into 64-bit value
From: Andy Shevchenko @ 2026-05-17 10:16 UTC (permalink / raw)
To: rodrigo.alencar
Cc: linux-kernel, linux-iio, devicetree, linux-doc, Jonathan Cameron,
David Lechner, Andy Shevchenko, Lars-Peter Clausen,
Michael Hennerich, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
Jonathan Corbet, Andrew Morton, Petr Mladek, Steven Rostedt,
Rasmus Villemoes, Sergey Senozhatsky, Shuah Khan
In-Reply-To: <20260517-adf41513-iio-driver-v13-6-bb6e134a360f@analog.com>
On Sun, May 17, 2026 at 10:14:01AM +0100, Rodrigo Alencar via B4 Relay wrote:
> Create new format types for iio values (IIO_VAL_DECIMAL64_*), which
> defines the representation of fixed decimal point values into a single
> 64-bit number. This new format increases the range of represented values,
> allowing for integer parts greater than 2^32, as bits are not "wasted"
> in the fractional part, which can be seen in IIO_VAL_INT_PLUS_MICRO and
> IIO_VAL_INT_PLUS_NANO. Helpers are created to compose and decompose 64-bit
> decimals into integer values used in IIO formatting interfaces, which
> creates consistency and avoids error-prone manual assignments when using
> wordpart macros. When doing the parsing, kstrtodec64() is used with the
> scale defined by the specific decimal format type.
...
> + {
> + int scale = type - IIO_VAL_DECIMAL64_BASE;
> + int l = 0;
Perhaps make it global in the function? We have the same in
IIO_VAL_INT_MULTIPLE case.
> + s64 frac;
> +
> + tmp2 = div64_s64_rem(iio_val_s64_from_s32s(vals),
> + int_pow(10, scale), &frac);
> + if (tmp2 == 0 && frac < 0)
> + l += sysfs_emit_at(buf, offset, "-");
> +
> + l += sysfs_emit_at(buf, offset + l, "%lld.%0*lld", tmp2, scale,
> + abs(frac));
> + return l;
> + }
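
For readers following the formatting logic in the hunk above, here is a
userspace sketch of the same idea, not the IIO code itself. It assumes the
value is a signed 64-bit integer scaled by 10^scale with scale between 1 and
18; format_decimal64(), pow10_i64() and decimal64_equals() are hypothetical
names, and snprintf() stands in for sysfs_emit_at(). The point it illustrates
is the sign handling: when the integer part is zero, the minus sign would be
lost unless it is emitted explicitly.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* 10^scale for small scales; stand-in for the kernel's int_pow(10, scale). */
static int64_t pow10_i64(int scale)
{
	int64_t p = 1;

	while (scale-- > 0)
		p *= 10;
	return p;
}

/*
 * Format a fixed-point decimal stored in one signed 64-bit value.
 * Returns what snprintf() returns (the length that was, or would be, written).
 */
static int format_decimal64(char *buf, size_t len, int64_t val, int scale)
{
	int64_t div, ipart, frac;
	const char *sign;

	assert(scale >= 1 && scale <= 18);

	div = pow10_i64(scale);
	ipart = val / div;	/* C division truncates toward zero */
	frac = val % div;	/* remainder carries the sign of val */

	/* "-0" cannot be printed via %lld, so emit the sign separately. */
	sign = (ipart == 0 && frac < 0) ? "-" : "";

	return snprintf(buf, len, "%s%lld.%0*lld", sign, (long long)ipart,
			scale, (long long)llabs(frac));
}

/* Convenience check: 1 if val formats to exactly the expected text. */
static int decimal64_equals(int64_t val, int scale, const char *expected)
{
	char buf[32];

	format_decimal64(buf, sizeof(buf), val, scale);
	return strcmp(buf, expected) == 0;
}
```

With scale 3, the value -5 formats as "-0.005" rather than "0.005", which is
the case the tmp2 == 0 && frac < 0 branch in the patch covers.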
...
> #ifndef _IIO_TYPES_H_
> #define _IIO_TYPES_H_
Also needs types.h now... Which makes me wonder whether the proposed macros
are placed in a good enough location.
(Note, iio/iio.h is actually missing types.h, but includes it indirectly.)
--
With Best Regards,
Andy Shevchenko
^ permalink raw reply