Linux Documentation


* Re: [PATCH v6 04/12] PCI: liveupdate: Document driver binding responsibilities
From: Samiullah Khawaja @ 2026-06-23 16:43 UTC
  To: David Matlack
  Cc: kexec, linux-doc, linux-kernel, linux-mm, linux-pci,
	Adithya Jayachandran, Alexander Graf, Alex Williamson,
	Bjorn Helgaas, Chris Li, David Rientjes, Jacob Pan,
	Jason Gunthorpe, Jonathan Corbet, Josh Hilke, Leon Romanovsky,
	Lukas Wunner, Mike Rapoport, Parav Pandit, Pasha Tatashin,
	Pranjal Shrivastava, Pratyush Yadav, Saeed Mahameed, Shuah Khan,
	Vipin Sharma, William Tu, Yi Liu
In-Reply-To: <20260522202410.3104264-5-dmatlack@google.com>

On Fri, May 22, 2026 at 08:24:02PM +0000, David Matlack wrote:
>Document how driver binding works during a Live Update and what the PCI
>core expects of drivers and users. Note that this is only a description
>of the current division of responsibilities. These may change in the
>future if we decide to revisit them.
>
>Signed-off-by: David Matlack <dmatlack@google.com>
>---
> drivers/pci/liveupdate.c | 16 ++++++++++++++++
> 1 file changed, 16 insertions(+)
>
>diff --git a/drivers/pci/liveupdate.c b/drivers/pci/liveupdate.c
>index 96c43b84532c..4f2ec6ffdd16 100644
>--- a/drivers/pci/liveupdate.c
>+++ b/drivers/pci/liveupdate.c
>@@ -70,6 +70,22 @@
>  * preserved. These may be relaxed in the future:
>  *
>  *  * The device cannot be a Virtual Function (VF).
>+ *
>+ * Driver Binding
>+ * ==============
>+ *
>+ * In the outgoing kernel, it is the driver's responsibility to ensure that it
>+ * does not release a device between pci_liveupdate_preserve() and
>+ * pci_liveupdate_unpreserve().
>+ *
>+ * In the incoming kernel, it is the driver's responsibility to ensure that it
>+ * does not release a preserved device between probe() and
>+ * pci_liveupdate_finish().
>+ *
>+ * It is the user's responsibility to ensure that incoming preserved devices are
>+ * bound to the correct driver. i.e. The PCI core does not protect against a
>+ * device getting preserved by driver A in the outgoing kernel and then getting
>+ * bound to driver B in the incoming kernel.
>  */
>
> #define pr_fmt(fmt) "PCI: liveupdate: " fmt
>-- 
>2.54.0.746.g67dd491aae-goog
>

Reviewed-by: Samiullah Khawaja <skhawaja@google.com>


* Re: [RFC PATCH 1/3] mm/numa: add exclusive node pool and numa=standby boot parameter
From: Gregory Price @ 2026-06-23 16:36 UTC
  To: Mike Rapoport
  Cc: linux-mm, x86, linux-doc, linux-kernel, linux-acpi, driver-core,
	kernel-team, corbet, skhan, dave.hansen, luto, peterz, tglx,
	mingo, bp, hpa, rafael, lenb, gregkh, dakr, akpm, rdunlap,
	feng.tang, dapeng1.mi, elver, kuba, ebiggers, lirongqing, paulmck,
	dave.jiang, jic23, xueshuai, kai.huang
In-Reply-To: <ai5vj_RjSxl_FLu-@kernel.org>

On Sun, Jun 14, 2026 at 12:08:31PM +0300, Mike Rapoport wrote:
> On Thu, Jun 11, 2026 at 10:04:01AM -0400, Gregory Price wrote:
> > On Thu, Jun 11, 2026 at 12:00:17PM +0300, Mike Rapoport wrote:
>  
> > So really I think you're pointing out that futex_init() here probably
> > shouldn't be using num_possible_nodes?
> 
> I'd rather say that num_possible_nodes() with and without CXL (or other
> differentiated memory) has different semantics.
> Maybe we need to add a new primitive for possible differentiated nodes and
> keep num_possible_nodes() to mean "number of possible nodes with normal
> memory".
>  

We'd have to define "normal" here a little more precisely.

Normal = N_MEMORY at __init?
Normal = N_MEMORY in the future?

We also use the possible_nodes() mask to allocate per-node pgdat, so
the futex example is largely just another "hey look at this thing,
I wonder what other stuff is out there".

~Gregory


* [PATCH v6 8/8] x86/setup: prepend embedded bootconfig cmdline before parse_early_param
From: Breno Leitao @ 2026-06-23 16:15 UTC
  To: Masami Hiramatsu, Andrew Morton, Nathan Chancellor, paulmck,
	Nicolas Schier, Nick Desaulniers, Bill Wendling, Justin Stitt,
	Jonathan Corbet, Shuah Khan
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, linux-kernel, linux-trace-kernel, linux-kbuild,
	bpf, llvm, linux-doc, Breno Leitao, kernel-team
In-Reply-To: <20260623-bootconfig_using_tools-v6-0-640c2f587a3c@debian.org>

Call xbc_prepend_embedded_cmdline() in setup_arch() right after the
CONFIG_CMDLINE merge and before strscpy(command_line, ...) so the
build-time-rendered embedded bootconfig "kernel" subtree is part of
boot_command_line by the time parse_early_param() runs. early_param()
handlers (mem=, earlycon=, loglevel=, ...) now see values supplied via
CONFIG_BOOT_CONFIG_EMBED_FILE without parsing bootconfig at runtime.

Gate the prepend on the same opt-in the runtime parser uses: prepend
when "bootconfig" is present on the command line, or when
CONFIG_BOOT_CONFIG_FORCE is set. Detect it with parse_args(), exactly
as setup_boot_config() does, so both agree on what counts as opt-in:
any "bootconfig" key regardless of value (bare, =0, =1, ...), and only
before the "--" that separates init arguments. Sharing the parser keeps
the early and late paths from diverging -- e.g. "bootconfig=0" or a
"-- bootconfig" meant for init must not apply the embedded keys early
while the runtime parser skips them.
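
To illustrate the opt-in rule above, here is a minimal userspace sketch,
not the kernel code. It assumes tokens are separated by plain spaces and
ignores the quoting that parse_args() handles; the function name
bootconfig_requested() is made up for this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Simplified stand-in for the parse_args() walk: split on spaces only
 * (no quote handling), stop at a bare "--", and report whether any
 * token before it is the key "bootconfig", with or without "=value".
 */
static bool bootconfig_requested(const char *cmdline)
{
	static const char key[] = "bootconfig";
	const size_t key_len = sizeof(key) - 1;
	const char *p = cmdline;

	while (*p) {
		size_t len;

		p += strspn(p, " ");		/* skip separators */
		len = strcspn(p, " ");		/* length of this token */
		if (!len)
			break;
		if (len == 2 && !strncmp(p, "--", 2))
			return false;		/* the rest belongs to init */
		if (len >= key_len && !strncmp(p, key, key_len) &&
		    (len == key_len || p[key_len] == '='))
			return true;		/* bare, =0, =1, ... all count */
		p += len;
	}
	return false;
}
```

Under this rule "bootconfig=0" still counts as opt-in, while
"quiet -- bootconfig" does not, because the key only appears after the
init separator.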

The prepend necessarily runs before setup_boot_config() detects an
initrd bootconfig, so an initrd cannot override the embedded "kernel"
keys for early_param(). This is intentional: the embedded cmdline acts
like a build-time CONFIG_CMDLINE. An initrd bootconfig's "kernel" keys
never reached early_param() anyway (they apply late via
extra_command_line), so nothing is lost -- the initrd keys still apply
late, with last-wins keeping the embedded values in effect.

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 arch/x86/Kconfig        |  1 +
 arch/x86/kernel/setup.c | 43 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 44 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 0de23e6471973..8ab11199c16d5 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -127,6 +127,7 @@ config X86
 	select ARCH_SUPPORTS_NUMA_BALANCING	if X86_64
 	select ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP	if NR_CPUS <= 4096
 	select ARCH_SUPPORTS_CFI		if X86_64
+	select ARCH_SUPPORTS_CMDLINE_FROM_BOOTCONFIG
 	select ARCH_USES_CFI_TRAPS		if X86_64 && CFI
 	select ARCH_SUPPORTS_LTO_CLANG
 	select ARCH_SUPPORTS_LTO_CLANG_THIN
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 46882ce79c3a4..c973a2cebcd04 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -6,6 +6,7 @@
  * parts of early kernel initialization.
  */
 #include <linux/acpi.h>
+#include <linux/bootconfig.h>
 #include <linux/console.h>
 #include <linux/cpu.h>
 #include <linux/crash_dump.h>
@@ -881,6 +882,37 @@ static void __init x86_report_nx(void)
  * Note: On x86_64, fixmaps are ready for use even before this is called.
  */

+#ifdef CONFIG_CMDLINE_FROM_BOOTCONFIG
+static int __init bootconfig_optin(char *param, char *val,
+				   const char *unused, void *arg)
+{
+	if (!strcmp(param, "bootconfig"))
+		*(bool *)arg = true;
+	return 0;
+}
+
+/*
+ * Did the user opt in to bootconfig on the kernel command line? Use
+ * parse_args() so this matches setup_boot_config() exactly, including
+ * stopping at the "--" that separates init arguments.
+ */
+static bool __init bootconfig_cmdline_requested(void)
+{
+	static char tmp_cmdline[COMMAND_LINE_SIZE] __initdata;
+	bool found = false;
+
+	if (IS_ENABLED(CONFIG_BOOT_CONFIG_FORCE))
+		return true;
+
+	strscpy(tmp_cmdline, boot_command_line, COMMAND_LINE_SIZE);
+	if (IS_ERR(parse_args("bootconfig", tmp_cmdline, NULL, 0, 0, 0,
+			      &found, bootconfig_optin)))
+		return false;
+
+	return found;
+}
+#endif
+
 void __init setup_arch(char **cmdline_p)
 {
 #ifdef CONFIG_X86_32
@@ -924,6 +956,17 @@ void __init setup_arch(char **cmdline_p)
 	builtin_cmdline_added = true;
 #endif

+#ifdef CONFIG_CMDLINE_FROM_BOOTCONFIG
+	/*
+	 * Prepend the build-time-rendered embedded "kernel" keys here so
+	 * parse_early_param() below sees them, gating on the same opt-in
+	 * as the runtime parser (see bootconfig_cmdline_requested()).
+	 */
+	if (bootconfig_cmdline_requested())
+		xbc_prepend_embedded_cmdline(boot_command_line,
+					     COMMAND_LINE_SIZE);
+#endif
+
 	strscpy(command_line, boot_command_line, COMMAND_LINE_SIZE);
 	*cmdline_p = command_line;

-- 
2.53.0-Meta


* [PATCH v6 7/8] bootconfig: skip runtime kernel.* render once prepended early
From: Breno Leitao @ 2026-06-23 16:15 UTC
  To: Masami Hiramatsu, Andrew Morton, Nathan Chancellor, paulmck,
	Nicolas Schier, Nick Desaulniers, Bill Wendling, Justin Stitt,
	Jonathan Corbet, Shuah Khan
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, linux-kernel, linux-trace-kernel, linux-kbuild,
	bpf, llvm, linux-doc, Breno Leitao, kernel-team
In-Reply-To: <20260623-bootconfig_using_tools-v6-0-640c2f587a3c@debian.org>

setup_boot_config() folds the embedded bootconfig "kernel" subtree into
the command line via xbc_make_cmdline("kernel"). A subsequent patch lets
an architecture prepend the build-time-rendered embedded "kernel" keys
to boot_command_line early in setup_arch(); rendering them again here
would then duplicate every key in saved_command_line and make
accumulating handlers (console=, earlycon=, ...) re-register the same
value.

Track whether the bootconfig data came from the embedded source
(from_embedded) and skip the runtime render only when the early prepend
actually happened, as reported by xbc_embedded_cmdline_applied(). On
architectures that do not select ARCH_SUPPORTS_CMDLINE_FROM_BOOTCONFIG
that helper is a stub returning false, so this path is unchanged and the
embedded "kernel" keys still reach the cmdline via the runtime parser
exactly as before.
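
The skip condition can be read as a small truth table. The sketch below
is a standalone illustration only; should_render_kernel_keys() is a
hypothetical name, and the two booleans stand in for from_embedded and
the result of xbc_embedded_cmdline_applied().

```c
#include <stdbool.h>

/*
 * Should setup_boot_config() still render the "kernel" subtree at runtime?
 *
 *   from_embedded  early_applied  render?
 *   false          false          yes  (initrd bootconfig)
 *   false          true           yes  (initrd bootconfig)
 *   true           false          yes  (prepend skipped or unsupported)
 *   true           true           no   (keys are already on the cmdline)
 */
static bool should_render_kernel_keys(bool from_embedded, bool early_applied)
{
	return !from_embedded || !early_applied;
}
```

Only the last row skips the render, which is the one case where the
same keys would otherwise appear twice.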

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 init/main.c | 25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/init/main.c b/init/main.c
index e363232b428b4..260bd5242f94e 100644
--- a/init/main.c
+++ b/init/main.c
@@ -378,12 +378,15 @@ static void __init setup_boot_config(void)
 	int pos, ret;
 	size_t size;
 	char *err;
+	bool from_embedded = false;
 
 	/* Cut out the bootconfig data even if we have no bootconfig option */
 	data = get_boot_config_from_initrd(&size);
 	/* If there is no bootconfig in initrd, try embedded one. */
-	if (!data)
+	if (!data) {
 		data = xbc_get_embedded_bootconfig(&size);
+		from_embedded = true;
+	}
 
 	strscpy(tmp_cmdline, boot_command_line, COMMAND_LINE_SIZE);
 	err = parse_args("bootconfig", tmp_cmdline, NULL, 0, 0, 0, NULL,
@@ -421,8 +424,24 @@ static void __init setup_boot_config(void)
 	} else {
 		xbc_get_info(&ret, NULL);
 		pr_info("Load bootconfig: %ld bytes %d nodes\n", (long)size, ret);
-		/* keys starting with "kernel." are passed via cmdline */
-		extra_command_line = xbc_make_cmdline("kernel");
+		/*
+		 * keys starting with "kernel." are passed via cmdline. When
+		 * this bootconfig came from the embedded source and
+		 * setup_arch() already prepended the rendered "kernel" subtree
+		 * to boot_command_line, rendering again here would duplicate
+		 * the keys in saved_command_line and make accumulating handlers
+		 * (console=, earlycon=, ...) re-register the same value. Skip
+		 * only when the prepend really happened.
+		 *
+		 * On arches that do not select ARCH_SUPPORTS_CMDLINE_FROM_BOOTCONFIG,
+		 * CONFIG_CMDLINE_FROM_BOOTCONFIG is unselectable and
+		 * xbc_embedded_cmdline_applied() collapses to a stub returning
+		 * false, so this path still runs and the embedded "kernel"
+		 * keys reach the cmdline via the runtime parser exactly as
+		 * before this series.
+		 */
+		if (!from_embedded || !xbc_embedded_cmdline_applied())
+			extra_command_line = xbc_make_cmdline("kernel");
 		/* Also, "init." keys are init arguments */
 		extra_init_args = xbc_make_cmdline("init");
 	}

-- 
2.53.0-Meta



* [PATCH v6 6/8] Documentation: bootconfig: document build-time cmdline rendering
From: Breno Leitao @ 2026-06-23 16:15 UTC
  To: Masami Hiramatsu, Andrew Morton, Nathan Chancellor, paulmck,
	Nicolas Schier, Nick Desaulniers, Bill Wendling, Justin Stitt,
	Jonathan Corbet, Shuah Khan
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, linux-kernel, linux-trace-kernel, linux-kbuild,
	bpf, llvm, linux-doc, Breno Leitao, kernel-team
In-Reply-To: <20260623-bootconfig_using_tools-v6-0-640c2f587a3c@debian.org>

Add a section describing CONFIG_CMDLINE_FROM_BOOTCONFIG: what it
does (renders the embedded "kernel" subtree to a flat cmdline at
build time so early_param() handlers see the values), what it
requires (BOOT_CONFIG_EMBED, a non-empty BOOT_CONFIG_EMBED_FILE,
and ARCH_SUPPORTS_CMDLINE_FROM_BOOTCONFIG -- currently x86 only),
the bootconfig opt-in semantics, the initrd-vs-embedded precedence,
and the soft-error overflow behavior.
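
The precedence rules come down to token order on the final command
line: initrd "kernel" keys first, then the embedded cmdline, then the
bootloader arguments. As a rough userspace sketch, assuming a
last-one-wins parameter and space-only tokenization (last_value() is a
hypothetical helper, not a kernel function):

```c
#include <stddef.h>
#include <string.h>

/*
 * Return a pointer to the value of the last "key=" token in @cmdline, or
 * NULL if the key never appears. Tokens are split on spaces only, so the
 * returned value runs up to the next space or the end of the string.
 */
static const char *last_value(const char *cmdline, const char *key)
{
	size_t key_len = strlen(key);
	const char *found = NULL;
	const char *p = cmdline;

	while (*p) {
		size_t len;

		p += strspn(p, " ");		/* skip separators */
		len = strcspn(p, " ");		/* length of this token */
		if (len > key_len && !strncmp(p, key, key_len) &&
		    p[key_len] == '=')
			found = p + key_len + 1;	/* later tokens override */
		p += len;
	}
	return found;
}
```

For "loglevel=3 loglevel=7 mem=4G" (initrd key, then embedded keys) the
lookup lands on 7; appending a bootloader "loglevel=4" moves it to 4.
Parameters whose handlers accumulate values do not follow this model.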

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 Documentation/admin-guide/bootconfig.rst | 81 ++++++++++++++++++++++++++++++++
 1 file changed, 81 insertions(+)

diff --git a/Documentation/admin-guide/bootconfig.rst b/Documentation/admin-guide/bootconfig.rst
index f712758472d5c..349cefbb2bbcd 100644
--- a/Documentation/admin-guide/bootconfig.rst
+++ b/Documentation/admin-guide/bootconfig.rst
@@ -234,6 +234,87 @@ Kconfig option selected.
 Note that even if you set this option, you can override the embedded
 bootconfig by another bootconfig which attached to the initrd.
 
+Rendering Embedded kernel.* Keys at Build Time
+----------------------------------------------
+
+By default, the embedded bootconfig (``CONFIG_BOOT_CONFIG_EMBED=y``) is
+parsed at runtime, after ``parse_early_param()`` has already run. Early
+parameter handlers (``mem=``, ``earlycon=``, ``loglevel=``, ...) therefore
+cannot see values supplied via the embedded ``kernel`` subtree.
+
+``CONFIG_CMDLINE_FROM_BOOTCONFIG`` resolves this by rendering the
+``kernel`` subtree of ``CONFIG_BOOT_CONFIG_EMBED_FILE`` into a flat cmdline
+string at kernel build time (via ``tools/bootconfig -C``) and prepending
+it to ``boot_command_line`` during early architecture setup, so the keys
+are visible to ``parse_early_param()``.
+
+The option requires ``CONFIG_BOOT_CONFIG_EMBED=y``, a non-empty
+``CONFIG_BOOT_CONFIG_EMBED_FILE``, and an architecture that selects
+``CONFIG_ARCH_SUPPORTS_CMDLINE_FROM_BOOTCONFIG``. Currently only x86
+selects it; on other architectures the embedded bootconfig still works,
+but only through the late runtime parser.
+
+The same ``bootconfig`` opt-in applies as elsewhere: the rendered keys
+are prepended only when ``bootconfig`` (in any form) appears on the
+kernel command line, or when ``CONFIG_BOOT_CONFIG_FORCE`` is set, which
+defaults to ``y`` when ``CONFIG_BOOT_CONFIG_EMBED`` is set.
+
+For example, given::
+
+ kernel {
+   loglevel = 7
+   mem = 4G
+ }
+
+the kernel boots as if ``loglevel=7 mem=4G`` had been prepended to the
+bootloader command line, with the values visible to early-parsed
+handlers. Comma-separated values are still expanded into multiple
+cmdline entries per the bootconfig array convention -- the embedded
+``kernel.earlycon = "uart8250,io,0x3f8"`` must be quoted to land as a
+single ``earlycon=`` entry, exactly as for the runtime parser.
+
+If the rendered string would not fit in ``COMMAND_LINE_SIZE`` together
+with the existing command line, the prepend is skipped and an error is
+logged, so an oversized embedded bootconfig cannot brick a boot.
+
+Interaction with other command line and bootconfig sources
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+With ``CONFIG_CMDLINE_FROM_BOOTCONFIG=y`` the rendered ``kernel``
+subtree behaves like a build-time command line (similar to
+``CONFIG_CMDLINE``), not like a bootconfig source. It is prepended to
+``boot_command_line`` in ``setup_arch()``, before ``parse_early_param()``
+and long before the runtime parser looks at an initrd. Options can reach
+the kernel from up to four places:
+
+- Bootloader command line: the arguments the boot loader passes. The
+  embedded cmdline is prepended in front of them, so for last-one-wins
+  parameters a bootloader option still overrides the embedded value.
+  Visible in /proc/cmdline.
+- Embedded cmdline (this option): the rendered ``kernel`` subtree,
+  prepended early so it is seen by ``parse_early_param()``. Visible in
+  /proc/cmdline.
+- Initrd bootconfig: parsed late in ``setup_boot_config()``; its
+  ``kernel`` keys are placed ahead of ``boot_command_line``, i.e. before
+  the embedded cmdline, so last-wins favors the embedded values. As a
+  bootconfig source, an initrd bootconfig still replaces the embedded
+  bootconfig. Visible in /proc/cmdline and /proc/bootconfig.
+- Embedded bootconfig (runtime): parsed late, only when no initrd
+  bootconfig is present. Visible in /proc/cmdline and /proc/bootconfig.
+
+So with this option the embedded ``kernel.*`` values take precedence
+over an initrd bootconfig's ``kernel.*`` values: for early parameters
+the initrd is not parsed yet, and for ordinary parameters the embedded
+keys land later in the command line. If you need an initrd bootconfig to
+override the embedded ``kernel.*`` keys, leave this option off and rely
+on the runtime parser.
+
+The rendered string is part of the command line, so it appears in
+/proc/cmdline. It is deliberately not shown in /proc/bootconfig: that
+file keeps reporting the parsed bootconfig tree -- the initrd bootconfig
+if present, otherwise the embedded bootconfig -- independent of whether
+build-time cmdline rendering is enabled.
+
 Kernel parameters via Boot Config
 =================================
 

-- 
2.53.0-Meta



* [PATCH v6 5/8] bootconfig: add xbc_prepend_embedded_cmdline() helper
From: Breno Leitao @ 2026-06-23 16:15 UTC
  To: Masami Hiramatsu, Andrew Morton, Nathan Chancellor, paulmck,
	Nicolas Schier, Nick Desaulniers, Bill Wendling, Justin Stitt,
	Jonathan Corbet, Shuah Khan
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, linux-kernel, linux-trace-kernel, linux-kbuild,
	bpf, llvm, linux-doc, Breno Leitao, kernel-team
In-Reply-To: <20260623-bootconfig_using_tools-v6-0-640c2f587a3c@debian.org>

Add a helper that prepends the build-time-rendered embedded bootconfig
"kernel" subtree (embedded_kernel_cmdline[] from embedded-cmdline.S) to
a cmdline buffer with a separating space. Architectures call this from
setup_arch() before parse_early_param() so early_param() handlers
(mem=, earlycon=, loglevel=, ...) see values supplied via the embedded
bootconfig.

The in-place prepend (shift the existing string right, then drop the
embedded string in front) is factored into a small str_prepend() helper.

On overflow the helper logs an error and leaves the cmdline untouched
rather than panicking. Booting without the embedded values is better
than refusing to boot, and the error tells the user why their embedded
keys are missing.
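
The shift-then-copy prepend and the overflow rule can be tried in
userspace with a sketch like the one below. It mirrors the shape of the
helper in this patch but is not the kernel code: prepend_cmdline() is a
hypothetical name, it takes the embedded string as an argument instead
of reading linker symbols, and it returns a bool instead of logging.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Shift the current string right (NUL included), then copy src in front. */
static void str_prepend(char *dst, size_t dst_len,
			const char *src, size_t src_len)
{
	memmove(dst + src_len, dst, dst_len + 1);
	memcpy(dst, src, src_len);
}

/*
 * Prepend @embed to the string in @dst, whose buffer holds @size bytes.
 * Returns true if the prepend happened, false if @embed is empty or the
 * result would not fit; @dst is left untouched in the false case.
 */
static bool prepend_cmdline(char *dst, size_t size, const char *embed)
{
	size_t embed_len = strlen(embed);
	const char *nul;
	size_t dst_len;

	if (!size || !embed_len)
		return false;

	/* Bounded length: an unterminated buffer counts as full. */
	nul = memchr(dst, '\0', size);
	dst_len = nul ? (size_t)(nul - dst) : size;

	if (embed_len + dst_len + 1 > size)
		return false;	/* no truncation on overflow */

	str_prepend(dst, dst_len, embed, embed_len);
	return true;
}
```

The embedded string is expected to carry its own trailing space, so no
separator is added here. Refusing the prepend on overflow, rather than
truncating, keeps the existing command line intact.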

The helper records whether it actually prepended, exposed via
xbc_embedded_cmdline_applied(). setup_boot_config() uses this to decide
whether the runtime "kernel" render would duplicate keys already folded
into boot_command_line.

When CONFIG_CMDLINE_FROM_BOOTCONFIG=n, the public declaration in
<linux/bootconfig.h> resolves to a no-op stub so callers compile
unchanged.

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 include/linux/bootconfig.h |  9 ++++++
 lib/bootconfig.c           | 78 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 87 insertions(+)

diff --git a/include/linux/bootconfig.h b/include/linux/bootconfig.h
index 1c7f3b74ffcf3..43324b477f13a 100644
--- a/include/linux/bootconfig.h
+++ b/include/linux/bootconfig.h
@@ -308,4 +308,13 @@ static inline const char *xbc_get_embedded_bootconfig(size_t *size)
 }
 #endif
 
+/* Build-time-rendered bootconfig cmdline prepended in setup_arch() */
+#ifdef CONFIG_CMDLINE_FROM_BOOTCONFIG
+void __init xbc_prepend_embedded_cmdline(char *dst, size_t size);
+bool __init xbc_embedded_cmdline_applied(void);
+#else
+static inline void xbc_prepend_embedded_cmdline(char *dst, size_t size) { }
+static inline bool xbc_embedded_cmdline_applied(void) { return false; }
+#endif
+
 #endif
diff --git a/lib/bootconfig.c b/lib/bootconfig.c
index 926094d97397e..05cb1ea9afdae 100644
--- a/lib/bootconfig.c
+++ b/lib/bootconfig.c
@@ -19,6 +19,7 @@
 #include <linux/errno.h>
 #include <linux/cache.h>
 #include <linux/compiler.h>
+#include <linux/printk.h>
 #include <linux/sprintf.h>
 #include <linux/memblock.h>
 #include <linux/string.h>
@@ -34,6 +35,83 @@ const char * __init xbc_get_embedded_bootconfig(size_t *size)
 	return (*size) ? embedded_bootconfig_data : NULL;
 }
 #endif
+
+#ifdef CONFIG_CMDLINE_FROM_BOOTCONFIG
+/* embedded_kernel_cmdline is defined in embedded-cmdline.S */
+extern __visible const char embedded_kernel_cmdline[];
+extern __visible const char embedded_kernel_cmdline_end[];
+
+/* Set once the embedded cmdline has actually been prepended. */
+static bool xbc_cmdline_applied __initdata;
+
+/*
+ * str_prepend() - Prepend @src in front of the string in @dst, in place
+ * @dst: NUL-terminated destination buffer, currently @dst_len bytes long
+ * @dst_len: length of the current @dst string (excluding its NUL)
+ * @src: bytes to prepend (not NUL-terminated)
+ * @src_len: number of bytes from @src to prepend
+ *
+ * The caller must guarantee @dst has room for src_len + dst_len + 1 bytes.
+ * Moving dst_len + 1 bytes carries @dst's NUL terminator too, so an empty
+ * @dst needs no special case.
+ */
+static void __init str_prepend(char *dst, size_t dst_len,
+			       const char *src, size_t src_len)
+{
+	memmove(dst + src_len, dst, dst_len + 1);
+	memcpy(dst, src, src_len);
+}
+
+/**
+ * xbc_prepend_embedded_cmdline() - Prepend embedded bootconfig cmdline
+ * @dst: cmdline buffer to prepend into (must already contain a NUL byte)
+ * @size: total capacity of @dst in bytes
+ *
+ * Prepend the build-time-rendered "kernel" subtree of the embedded
+ * bootconfig to @dst. The rendered string already ends with a single
+ * space (the xbc_snprint_cmdline() invariant), which serves as the
+ * separator between the embedded keys and any existing content of @dst.
+ * On overflow, log an error and leave @dst untouched rather than
+ * silently truncating: booting without the embedded values is better
+ * than refusing to boot, and the error message tells the user why
+ * their embedded keys are missing.
+ *
+ * Intended to be called from setup_arch() before parse_early_param() so
+ * that early_param() handlers see the embedded values.
+ */
+void __init xbc_prepend_embedded_cmdline(char *dst, size_t size)
+{
+	size_t embed_len = embedded_kernel_cmdline_end - embedded_kernel_cmdline;
+	size_t dst_len;
+
+	if (!size || embed_len <= 1)	/* trailing NUL only */
+		return;
+	embed_len--;			/* exclude trailing NUL byte */
+
+	dst_len = strnlen(dst, size);
+	if (embed_len + dst_len + 1 > size) {
+		pr_err("embedded bootconfig cmdline (%zu bytes) does not fit in COMMAND_LINE_SIZE with %zu bytes already used; ignoring embedded values\n",
+		       embed_len, dst_len);
+		return;
+	}
+
+	str_prepend(dst, dst_len, embedded_kernel_cmdline, embed_len);
+	xbc_cmdline_applied = true;
+}
+
+/**
+ * xbc_embedded_cmdline_applied() - Did the embedded cmdline get prepended?
+ *
+ * Return true if xbc_prepend_embedded_cmdline() actually prepended the
+ * embedded "kernel" subtree. setup_boot_config() uses this to avoid
+ * rendering the same keys a second time.
+ */
+bool __init xbc_embedded_cmdline_applied(void)
+{
+	return xbc_cmdline_applied;
+}
+#endif
+
 #endif
 
 /*

-- 
2.53.0-Meta



* [PATCH v6 4/8] bootconfig: clean build-time tools/bootconfig from make clean
From: Breno Leitao @ 2026-06-23 16:15 UTC
  To: Masami Hiramatsu, Andrew Morton, Nathan Chancellor, paulmck,
	Nicolas Schier, Nick Desaulniers, Bill Wendling, Justin Stitt,
	Jonathan Corbet, Shuah Khan
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, linux-kernel, linux-trace-kernel, linux-kbuild,
	bpf, llvm, linux-doc, Breno Leitao, kernel-team, Nicolas Schier
In-Reply-To: <20260623-bootconfig_using_tools-v6-0-640c2f587a3c@debian.org>

The previous patch builds tools/bootconfig during 'make prepare' to
render the embedded bootconfig cmdline, but nothing removes it on
'make clean', leaving the compiled tool and its objects behind.

Wire a bootconfig_clean hook into the top-level clean target so the
compiled tool and its objects are removed by make clean, matching the
prepare-wired tools/objtool and tools/bpf/resolve_btfids.

The hook runs tools/bootconfig's Makefile via $(MAKE), which the kernel
build invokes with -rR (MAKEFLAGS += -rR). -rR drops the built-in $(RM)
variable, so the existing "$(RM) -f ..." clean recipe would expand to a
bare "-f ..." and fail. Spell the recipe with a literal "rm -f" so it
keeps working both standalone and when invoked from Kbuild.

Reviewed-by: Nicolas Schier <n.schier@fritz.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
---
 Makefile                  | 11 ++++++++++-
 tools/bootconfig/Makefile |  2 +-
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/Makefile b/Makefile
index 5255aa35a2e51..20a2bcacde3b8 100644
--- a/Makefile
+++ b/Makefile
@@ -1587,6 +1587,15 @@ ifneq ($(wildcard $(objtool_O)),)
 	$(Q)$(MAKE) -sC $(abs_srctree)/tools/objtool O=$(objtool_O) srctree=$(abs_srctree) $(patsubst objtool_%,%,$@)
 endif
 
+PHONY += bootconfig_clean
+
+bootconfig_O = $(abspath $(objtree))/tools/bootconfig
+
+bootconfig_clean:
+ifneq ($(wildcard $(bootconfig_O)),)
+	$(Q)$(MAKE) -sC $(srctree)/tools/bootconfig O=$(bootconfig_O) clean
+endif
+
 tools/: FORCE
 	$(Q)mkdir -p $(objtree)/tools
 	$(Q)$(MAKE) O=$(abspath $(objtree)) subdir=tools -C $(srctree)/tools/
@@ -1757,7 +1766,7 @@ vmlinuxclean:
 	$(Q)$(CONFIG_SHELL) $(srctree)/scripts/link-vmlinux.sh clean
 	$(Q)$(if $(ARCH_POSTLINK), $(MAKE) -f $(ARCH_POSTLINK) clean)
 
-clean: archclean vmlinuxclean resolve_btfids_clean objtool_clean
+clean: archclean vmlinuxclean resolve_btfids_clean objtool_clean bootconfig_clean
 
 # mrproper - Delete all generated files, including .config
 #
diff --git a/tools/bootconfig/Makefile b/tools/bootconfig/Makefile
index 4e82fd9553cde..3cb8066d5141b 100644
--- a/tools/bootconfig/Makefile
+++ b/tools/bootconfig/Makefile
@@ -27,4 +27,4 @@ install: $(ALL_PROGRAMS)
 	install $(OUTPUT)bootconfig $(DESTDIR)$(bindir)
 
 clean:
-	$(RM) -f $(OUTPUT)*.o $(ALL_PROGRAMS)
+	rm -f $(OUTPUT)*.o $(ALL_PROGRAMS)

-- 
2.53.0-Meta



* [PATCH v6 3/8] bootconfig: render embedded bootconfig as a kernel cmdline at build time
From: Breno Leitao @ 2026-06-23 16:15 UTC
  To: Masami Hiramatsu, Andrew Morton, Nathan Chancellor, paulmck,
	Nicolas Schier, Nick Desaulniers, Bill Wendling, Justin Stitt,
	Jonathan Corbet, Shuah Khan
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, linux-kernel, linux-trace-kernel, linux-kbuild,
	bpf, llvm, linux-doc, Breno Leitao, kernel-team, Nicolas Schier
In-Reply-To: <20260623-bootconfig_using_tools-v6-0-640c2f587a3c@debian.org>

Add the build-time pipeline that renders the "kernel" subtree of
CONFIG_BOOT_CONFIG_EMBED_FILE into a flat cmdline string and stashes
it in .init.rodata as embedded_kernel_cmdline[]. A follow-up patch
adds the runtime helper that prepends this string to boot_command_line
during early architecture setup so parse_early_param() sees the values.

The build wires up:
  tools/bootconfig -C kernel - userspace tool already shared with
                               lib/bootconfig.c, used here in -C mode
                               to render a bootconfig file to a cmdline
  lib/embedded-cmdline.S     - .incbin's the rendered text plus a NUL
                               (listed under the EXTRA BOOT CONFIG
                               MAINTAINERS entry)
  lib/Makefile rule          - runs tools/bootconfig at build time
  Makefile prepare dep       - ensures tools/bootconfig is built first,
                               same pattern as tools/objtool and
                               tools/bpf/resolve_btfids

Drop the test target from tools/bootconfig/Makefile's default 'all'
recipe so that hooking the binary into the kernel build does not run
test-bootconfig.sh on every prepare. The tests stay available as
'make -C tools/bootconfig test', matching the convention of
tools/objtool and tools/bpf/resolve_btfids whose 'all' targets only
build the binary.

Require BOOT_CONFIG_EMBED_FILE to be non-empty before the new option
can be enabled, otherwise tools/bootconfig -C runs against an empty
file and prints a parse error on every kernel build.

The feature gates on CONFIG_ARCH_SUPPORTS_CMDLINE_FROM_BOOTCONFIG, a
silent symbol arches select once they've wired the prepend call into
setup_arch(). No arch selects it in this patch, so the user-visible
CONFIG_CMDLINE_FROM_BOOTCONFIG cannot be enabled yet; follow-up patches
add the runtime behavior and the first arch opt-in.

tools/bootconfig also installs on target systems, so its own Makefile
keeps $(CC) and stays cross-buildable as a standalone tool. The kernel
build, which runs the tool on the build host during prepare, instead
forces CC=$(HOSTCC) from a dedicated tools/bootconfig rule and clears
CROSS_COMPILE= in the sub-make. Without that clear, an LLVM=1 cross
build would inherit CROSS_COMPILE and tools/scripts/Makefile.include
would inject --target=/--sysroot= flags into the host clang invocation,
producing a target binary that fails to exec ("Exec format error").

embedded-cmdline.S places the rendered string in its own .init.rodata
subsection (.init.rodata.embed_cmdline) with the "a" (allocatable,
read-only) flag and %progbits. lib/bootconfig-data.S already places
the embedded bootconfig blob in .init.rodata with the "aw" flag
(xbc_init() rewrites separators in place, so that data must be
writable). Using a distinct subsection name avoids the ld.lld section-
type mismatch that would otherwise arise from mixing "a" and "aw"
under the same name; the linker's "*(.init.rodata .init.rodata.*)"
glob still folds both into the init image and frees them after boot.

A follow-up patch wires the build-time tools/bootconfig into the
top-level clean target.

Reviewed-by: Nicolas Schier <n.schier@fritz.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
---
 MAINTAINERS               |  1 +
 Makefile                  | 16 ++++++++++++++++
 init/Kconfig              | 36 ++++++++++++++++++++++++++++++++++++
 lib/Makefile              | 16 ++++++++++++++++
 lib/embedded-cmdline.S    | 16 ++++++++++++++++
 tools/bootconfig/Makefile |  2 +-
 6 files changed, 86 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 57656ec0e9d5d..953231df1911d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9844,6 +9844,7 @@ F:	fs/proc/bootconfig.c
 F:	include/linux/bootconfig.h
 F:	lib/bootconfig-data.S
 F:	lib/bootconfig.c
+F:	lib/embedded-cmdline.S
 F:	tools/bootconfig/*
 F:	tools/bootconfig/scripts/*
 
diff --git a/Makefile b/Makefile
index bf196c6df5b92..5255aa35a2e51 100644
--- a/Makefile
+++ b/Makefile
@@ -1545,6 +1545,22 @@ prepare: tools/bpf/resolve_btfids
 endif
 endif
 
+# tools/bootconfig renders the embedded bootconfig into a cmdline at build time.
+ifdef CONFIG_CMDLINE_FROM_BOOTCONFIG
+prepare: tools/bootconfig
+endif
+
+# tools/bootconfig is run on the build host during prepare, so force a host
+# binary here; its own Makefile keeps $(CC) for standalone and cross builds.
+# CROSS_COMPILE= is cleared so tools/scripts/Makefile.include does not inject
+# the target's --target=/--sysroot= flags into the host clang invocation under
+# LLVM=1 cross builds (which would produce a target binary that fails to exec).
+tools/bootconfig: export CC := $(HOSTCC)
+tools/bootconfig: FORCE
+	$(Q)mkdir -p $(objtree)/tools
+	$(Q)$(MAKE) O=$(abspath $(objtree)) subdir=tools -C $(srctree)/tools/ \
+		bootconfig CROSS_COMPILE=
+
 # The tools build system is not a part of Kbuild and tends to introduce
 # its own unique issues. If you need to integrate a new tool into Kbuild,
 # please consider locating that tool outside the tools/ tree and using the
diff --git a/init/Kconfig b/init/Kconfig
index 5230d4879b1c8..598690ec313a2 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1566,6 +1566,42 @@ config BOOT_CONFIG_EMBED_FILE
 	  This bootconfig will be used if there is no initrd or no other
 	  bootconfig in the initrd.
 
+config ARCH_SUPPORTS_CMDLINE_FROM_BOOTCONFIG
+	bool
+	help
+	  Silent symbol; no C code reads it directly. Architectures
+	  select it once their setup_arch() calls
+	  xbc_prepend_embedded_cmdline() before parse_early_param().
+	  Its only role is to gate the user-visible
+	  CMDLINE_FROM_BOOTCONFIG option per-arch, the same
+	  ARCH_SUPPORTS_* idiom used by ARCH_SUPPORTS_CFI, etc.
+
+config CMDLINE_FROM_BOOTCONFIG
+	bool "Render embedded bootconfig as kernel cmdline at build time"
+	depends on BOOT_CONFIG_EMBED_FILE != ""
+	depends on ARCH_SUPPORTS_CMDLINE_FROM_BOOTCONFIG
+	depends on CMDLINE = ""
+	default n
+	help
+	  Render the "kernel" subtree of the embedded bootconfig file into a
+	  flat cmdline string at kernel build time and prepend it to
+	  boot_command_line during early architecture setup. This makes
+	  early_param() handlers (e.g. mem=, earlycon=, loglevel=) see the
+	  values supplied via the embedded bootconfig.
+
+	  The runtime bootconfig parser is unaffected, so tree-structured
+	  consumers such as ftrace boot-time tracing keep working.
+
+	  Note: when an initrd also carries a bootconfig, its "kernel"
+	  subtree is still parsed at runtime, but the embedded "kernel"
+	  keys remain in boot_command_line for parse_early_param() and
+	  end up later than the initrd keys in saved_command_line, so
+	  parse_args() last-wins favors the embedded values. If you need
+	  initrd to override embedded kernel.* keys, leave this option
+	  off.
+
+	  If unsure, say N.
+
 config CMDLINE_LOG_WRAP_IDEAL_LEN
 	int "Length to try to wrap the cmdline when logged at boot"
 	default 1021
diff --git a/lib/Makefile b/lib/Makefile
index 7f75cc6edf94a..4ccdce2fd5e5b 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -273,6 +273,22 @@ filechk_defbconf = cat $(or $(real-prereqs), /dev/null)
 $(obj)/default.bconf: $(CONFIG_BOOT_CONFIG_EMBED_FILE) FORCE
 	$(call filechk,defbconf)
 
+obj-$(CONFIG_CMDLINE_FROM_BOOTCONFIG) += embedded-cmdline.o
+$(obj)/embedded-cmdline.o: $(obj)/embedded_cmdline.bin
+
+# Render the bootconfig "kernel" subtree to a flat cmdline string using
+# the userspace tools/bootconfig parser (-C mode). The runtime prepend
+# helper enforces COMMAND_LINE_SIZE at boot, so no build-time size
+# check is performed here (COMMAND_LINE_SIZE is an arch header
+# constant, not a Kconfig value).
+quiet_cmd_render_cmdline = BCONF2C $@
+      cmd_render_cmdline = \
+	$(objtree)/tools/bootconfig/bootconfig -C $< > $@
+
+targets += embedded_cmdline.bin
+$(obj)/embedded_cmdline.bin: $(obj)/default.bconf $(objtree)/tools/bootconfig/bootconfig FORCE
+	$(call if_changed,render_cmdline)
+
 obj-$(CONFIG_RBTREE_TEST) += rbtree_test.o
 obj-$(CONFIG_INTERVAL_TREE_TEST) += interval_tree_test.o
 
diff --git a/lib/embedded-cmdline.S b/lib/embedded-cmdline.S
new file mode 100644
index 0000000000000..bda81b4a42bea
--- /dev/null
+++ b/lib/embedded-cmdline.S
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Embed the build-time-rendered bootconfig "kernel" subtree as a flat
+ * cmdline string. setup_arch() prepends this to boot_command_line on
+ * architectures that select ARCH_SUPPORTS_CMDLINE_FROM_BOOTCONFIG.
+ *
+ * Copyright (c) 2026 Meta Platforms, Inc. and affiliates
+ * Copyright (c) 2026 Breno Leitao <leitao@debian.org>
+ */
+	.section .init.rodata.embed_cmdline, "a", %progbits
+	.global embedded_kernel_cmdline
+embedded_kernel_cmdline:
+	.incbin "lib/embedded_cmdline.bin"
+	.byte 0
+	.global embedded_kernel_cmdline_end
+embedded_kernel_cmdline_end:
diff --git a/tools/bootconfig/Makefile b/tools/bootconfig/Makefile
index 90eb47c9d8de6..4e82fd9553cde 100644
--- a/tools/bootconfig/Makefile
+++ b/tools/bootconfig/Makefile
@@ -15,7 +15,7 @@ override CFLAGS += -Wall -g -I$(CURDIR)/include
 ALL_TARGETS := bootconfig
 ALL_PROGRAMS := $(patsubst %,$(OUTPUT)%,$(ALL_TARGETS))
 
-all: $(ALL_PROGRAMS) test
+all: $(ALL_PROGRAMS)
 
 $(OUTPUT)bootconfig: main.c include/linux/bootconfig.h $(LIBSRC)
 	$(CC) $(filter %.c,$^) $(CFLAGS) $(LDFLAGS) -o $@

-- 
2.53.0-Meta



* [PATCH v6 2/8] bootconfig: render descendant keys when xbc_snprint_cmdline() root has a value
From: Breno Leitao @ 2026-06-23 16:15 UTC
  To: Masami Hiramatsu, Andrew Morton, Nathan Chancellor, paulmck,
	Nicolas Schier, Nick Desaulniers, Bill Wendling, Justin Stitt,
	Jonathan Corbet, Shuah Khan
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, linux-kernel, linux-trace-kernel, linux-kbuild,
	bpf, llvm, linux-doc, Breno Leitao, kernel-team
In-Reply-To: <20260623-bootconfig_using_tools-v6-0-640c2f587a3c@debian.org>

xbc_node_for_each_key_value() walks to the first leaf under @root, and
when @root is itself a leaf it yields @root. That happens not only for
an empty "kernel {}" subtree, but also when @root carries both a value
and subkeys, e.g.

	kernel = x
	kernel.foo = bar

Here @root ("kernel") is a leaf because its first child is the value
node "x", so the iterator returns @root first. Feeding @root back into
xbc_node_compose_key_after(root, root) returns -EINVAL, which the only
in-kernel caller papers over with a "len <= 0" check -- but the
follow-up tools/bootconfig -C user propagates the error and turns such
a bootconfig into a build failure. Worse, short-circuiting the whole
call on a leaf @root would silently drop the valid "kernel.foo = bar"
descendant that this patch should render.

Skip @root inside the loop instead of bailing out: the value-only entry
is dropped (it is rendered through the "kernel" cmdline path, not here),
while real descendant keys are still emitted. An entirely empty subtree
now renders nothing and returns 0 rather than -EINVAL, matching the
"nothing to render is not an error" semantics expected by the new
build-time caller.
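
As a rough illustration only, the "skip @root, keep the descendants"
behavior can be modelled in plain userspace C. The toy_node type and
render_skip_root() below are hypothetical stand-ins, not the bootconfig
API; the yielded[] array simply plays the role of what the iterator
hands back when @root is itself a leaf.

```c
#include <stddef.h>
#include <stdio.h>

/* Toy node: a key name relative to the root, plus a value. */
struct toy_node {
	const char *name;
	const char *value;
};

/*
 * @yielded models the sequence an iterator hands back. When @root is
 * itself a leaf, @root shows up in that sequence and is skipped rather
 * than treated as an error.
 */
static int render_skip_root(char *buf, size_t size,
			    const struct toy_node *root,
			    const struct toy_node *const *yielded, size_t n)
{
	size_t len = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		size_t room = (buf && size > len) ? size - len : 0;
		int ret;

		if (yielded[i] == root)
			continue;	/* no key can be composed for the root */

		ret = snprintf(room ? buf + len : NULL, room, "%s=%s ",
			       yielded[i]->name, yielded[i]->value);
		if (ret < 0)
			return ret;
		len += (size_t)ret;
	}
	return (int)len;	/* 0 when only the root was yielded */
}
```

With a root carrying "x" and one child "foo = bar", only the child is
rendered; with the root alone, the function returns 0 instead of an
error.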

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 lib/bootconfig.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/lib/bootconfig.c b/lib/bootconfig.c
index 2ed9ee3dc81c7..926094d97397e 100644
--- a/lib/bootconfig.c
+++ b/lib/bootconfig.c
@@ -440,6 +440,17 @@ int __init xbc_snprint_cmdline(char *buf, size_t size, struct xbc_node *root)
 	 * itself is well defined and returns the would-be length.
 	 */
 	xbc_node_for_each_key_value(root, knode, val) {
+		/*
+		 * An empty or value-only @root (e.g. "kernel {}" or
+		 * "kernel = x", possibly alongside "kernel.foo = bar")
+		 * yields @root itself here. Skip it: composing a key for it
+		 * would fail with -EINVAL, yet any real descendant keys must
+		 * still be rendered. An entirely empty subtree then renders
+		 * nothing and returns 0 rather than an error.
+		 */
+		if (knode == root)
+			continue;
+
 		ret = xbc_node_compose_key_after(root, knode,
 					xbc_namebuf, XBC_KEYLEN_MAX);
 		if (ret < 0)

-- 
2.53.0-Meta


* [PATCH v6 1/8] bootconfig: fix NULL-pointer arithmetic in xbc_snprint_cmdline()
From: Breno Leitao @ 2026-06-23 16:15 UTC
  To: Masami Hiramatsu, Andrew Morton, Nathan Chancellor, paulmck,
	Nicolas Schier, Nick Desaulniers, Bill Wendling, Justin Stitt,
	Jonathan Corbet, Shuah Khan
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, linux-kernel, linux-trace-kernel, linux-kbuild,
	bpf, llvm, linux-doc, Breno Leitao, kernel-team
In-Reply-To: <20260623-bootconfig_using_tools-v6-0-640c2f587a3c@debian.org>

xbc_snprint_cmdline() is meant to be called twice: first with
buf=NULL, size=0 to probe the rendered length, then with a real
buffer to fill it (the standard snprintf() two-pass pattern). The
probe call makes the function compute "buf + size" (NULL + 0) and,
on every iteration, advance "buf += ret" from that NULL base and
pass the result back into snprintf().

Pointer arithmetic on a NULL pointer is undefined behavior. It is
harmless in the in-kernel callers today, but the follow-up patches
run this same code in the userspace tools/bootconfig parser at kernel
build time, where host UBSan / FORTIFY_SOURCE abort the build.

Track a running written length (size_t) instead of mutating @buf, and
only form "buf + len" when @buf is non-NULL. snprintf(NULL, 0, ...)
is itself well defined and returns the would-be length, so the
two-pass "probe then fill" usage returns identical byte counts.
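
For reference, a minimal userspace sketch of the "running length"
pattern described above. render_pairs() and rest() here are
illustrative names, not the kernel functions, and the sketch also
clamps the destination to NULL once the buffer is full, which is an
assumption of this example rather than something the patch does.

```c
#include <stddef.h>
#include <stdio.h>

/* Remaining space in the buffer; 0 once len has reached or passed size. */
static size_t rest(size_t len, size_t size)
{
	return size > len ? size - len : 0;
}

/*
 * Hypothetical stand-in for the renderer: emits "key=value " pairs.
 * Returns the would-be length, like snprintf(), or a negative value
 * on an output error.
 */
static int render_pairs(char *buf, size_t size,
			const char *const *keys, const char *const *vals,
			size_t n)
{
	size_t len = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		/* Only form "buf + len" while there is a real buffer with room. */
		size_t room = buf ? rest(len, size) : 0;
		int ret = snprintf(room ? buf + len : NULL, room,
				   "%s=%s ", keys[i], vals[i]);

		if (ret < 0)
			return ret;
		len += (size_t)ret;
	}
	return (int)len;
}
```

Calling render_pairs(NULL, 0, ...) first and then again with a buffer
of at least that length plus one yields the same count both times.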

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 lib/bootconfig.c | 23 ++++++++++++++++-------
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/lib/bootconfig.c b/lib/bootconfig.c
index f445b7703fdd9..2ed9ee3dc81c7 100644
--- a/lib/bootconfig.c
+++ b/lib/bootconfig.c
@@ -427,10 +427,18 @@ static char xbc_namebuf[XBC_KEYLEN_MAX] __initdata;
 int __init xbc_snprint_cmdline(char *buf, size_t size, struct xbc_node *root)
 {
 	struct xbc_node *knode, *vnode;
-	char *end = buf + size;
 	const char *val, *q;
+	size_t len = 0;
 	int ret;
 
+	/*
+	 * Track the running written length rather than advancing @buf, so we
+	 * never form "buf + size" or "buf += ret" while @buf is NULL (the
+	 * size-probe call passes buf=NULL, size=0). NULL pointer arithmetic
+	 * is undefined behavior and trips host UBSan / FORTIFY_SOURCE when
+	 * this renderer runs at kernel build time. snprintf(NULL, 0, ...)
+	 * itself is well defined and returns the would-be length.
+	 */
 	xbc_node_for_each_key_value(root, knode, val) {
 		ret = xbc_node_compose_key_after(root, knode,
 					xbc_namebuf, XBC_KEYLEN_MAX);
@@ -439,10 +447,11 @@ int __init xbc_snprint_cmdline(char *buf, size_t size, struct xbc_node *root)
 
 		vnode = xbc_node_get_child(knode);
 		if (!vnode) {
-			ret = snprintf(buf, rest(buf, end), "%s ", xbc_namebuf);
+			ret = snprintf(buf ? buf + len : NULL, rest(len, size),
+				       "%s ", xbc_namebuf);
 			if (ret < 0)
 				return ret;
-			buf += ret;
+			len += ret;
 			continue;
 		}
 		xbc_array_for_each_value(vnode, val) {
@@ -452,15 +461,15 @@ int __init xbc_snprint_cmdline(char *buf, size_t size, struct xbc_node *root)
 			 * whitespace.
 			 */
 			q = strpbrk(val, " \t\r\n") ? "\"" : "";
-			ret = snprintf(buf, rest(buf, end), "%s=%s%s%s ",
-				       xbc_namebuf, q, val, q);
+			ret = snprintf(buf ? buf + len : NULL, rest(len, size),
+				       "%s=%s%s%s ", xbc_namebuf, q, val, q);
 			if (ret < 0)
 				return ret;
-			buf += ret;
+			len += ret;
 		}
 	}
 
-	return buf - (end - size);
+	return len;
 }
 #undef rest
 

-- 
2.53.0-Meta



* [PATCH v6 0/8] bootconfig: embed kernel.* cmdline at build time
From: Breno Leitao @ 2026-06-23 16:15 UTC
  To: Masami Hiramatsu, Andrew Morton, Nathan Chancellor, paulmck,
	Nicolas Schier, Nick Desaulniers, Bill Wendling, Justin Stitt,
	Jonathan Corbet, Shuah Khan
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, linux-kernel, linux-trace-kernel, linux-kbuild,
	bpf, llvm, linux-doc, Breno Leitao, kernel-team, Nicolas Schier

The userspace pieces (xbc_snprint_cmdline() in lib/, tools/bootconfig -C)
already landed; this series wires the rendered cmdline into the kernel.

Motivation: today the embedded bootconfig is parsed at runtime, after
parse_early_param() has already run, so early_param() handlers can't
see embedded values. Folding the kernel.* subtree into the cmdline at
build time gives a CONFIG_CMDLINE-equivalent for embedded-bootconfig
users without forcing them to maintain two cmdline sources.

Behaviorally, the "kernel" subtree is rendered to a flat string at
build time and stashed in .init.rodata. setup_arch() prepends it to
boot_command_line before parse_early_param() runs. Overflow is a soft
error: the helper logs and leaves boot_command_line untouched rather
than panicking, so an oversized embedded bconf cannot brick a boot.
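
The soft-overflow behavior can be sketched in userspace C roughly as
follows. prepend_cmdline() is a hypothetical helper written for this
illustration; it is not xbc_prepend_embedded_cmdline(), whose exact
signature and separator handling may differ.

```c
#include <stddef.h>
#include <string.h>

/*
 * Prepend @embedded and a separating space to the NUL-terminated
 * string in @cmdline, whose buffer holds @size bytes. Returns 0 on
 * success, or -1 if the result would not fit, in which case @cmdline
 * is left untouched.
 */
static int prepend_cmdline(char *cmdline, size_t size, const char *embedded)
{
	size_t elen = strlen(embedded);
	size_t clen = strlen(cmdline);

	if (elen == 0)
		return 0;	/* nothing to prepend */

	/* embedded + ' ' + existing + NUL must all fit */
	if (elen + 1 + clen + 1 > size)
		return -1;

	memmove(cmdline + elen + 1, cmdline, clen + 1);
	memcpy(cmdline, embedded, elen);
	cmdline[elen] = ' ';
	return 0;
}
```

The point of checking before any copy is that a failed prepend leaves
the bootloader-supplied command line exactly as it was.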

Signed-off-by: Breno Leitao <leitao@debian.org>
---
Changes in v6:
- renamed CONFIG_BOOT_CONFIG_EMBED_CMDLINE to
  CONFIG_CMDLINE_FROM_BOOTCONFIG
- prepend embedded bootconfig cmdline before parse_early_param
- Link to v5: https://lore.kernel.org/r/20260617-bootconfig_using_tools-v5-0-fd589a9cc5e3@debian.org

Changes in v5:
- Patch 3 (Kconfig): drop the redundant "depends on BOOT_CONFIG_EMBED"
  from CMDLINE_FROM_BOOTCONFIG (Julian Braha).
- Patch 6 (Documentation): spell out how the embedded cmdline interacts
  with the bootloader cmdline, an initrd bootconfig, and the embedded
  bootconfig
- Link to v4: https://lore.kernel.org/r/20260609-bootconfig_using_tools-v4-0-73c463f03a97@debian.org

Changes in v4:
- Patch 3 (build pipeline): clear CROSS_COMPILE= in the kernel-side
  tools/bootconfig sub-make. Without it, an LLVM=1 cross build
  inherits CROSS_COMPILE and tools/scripts/Makefile.include injects
  --target=/--sysroot= into the host clang, producing a target
  binary that fails to exec.
- Patch 3 (build pipeline): place embedded-cmdline.S in its own
  .init.rodata.embed_cmdline subsection ("a") so ld.lld does not
  see a section-type mismatch against lib/bootconfig-data.S's
  writable .init.rodata ("aw"). The linker's *(.init.rodata
  .init.rodata.*) glob still folds it into the init image.
- Patch 6 (x86/setup): also accept the bootconfig=<anything> form
  via cmdline_find_option(), matching the runtime parse_args() loop.
  Without it, bootconfig=0/=off would skip the early prepend but
  still trigger the late runtime apply -- a split-brain state.
- New patch 7: document CONFIG_CMDLINE_FROM_BOOTCONFIG in
  Documentation/admin-guide/bootconfig.rst (semantics, opt-in,
  precedence, overflow behavior, example).
- Link to v3: https://lore.kernel.org/r/20260608-bootconfig_using_tools-v3-0-4ddd079a0696@debian.org

Changes in v3:
- Patch 3: Move HOSTCC override to the kernel-side rule; tool keeps
  $(CC) for standalone/cross builds.
- Patch 6: Drop the false fail-safe wording; document the
  BOOT_CONFIG_FORCE=y default interaction.
- Link to v2:
  https://lore.kernel.org/r/20260605-bootconfig_using_tools-v2-0-d309f544b5f7@debian.org

Changes in v2 (addressing review of v1):
- Split out a standalone fix for the NULL-pointer arithmetic in
  xbc_snprint_cmdline() so the build-time render cannot trip host
  UBSan/FORTIFY_SOURCE.
- Rework the leaf-root handling: instead of returning early, skip @root
  inside the loop so a root carrying both a value and subkeys
  (kernel = x together with kernel.foo = bar) still renders its
  descendant keys.
- Build tools/bootconfig with $(HOSTCC) so cross-compiled (ARCH=...)
  builds render the cmdline on the build host instead of failing with
  "Exec format error".
- Mark the embedded cmdline section read-only (drop the "w" flag from
  .init.rodata).
- Add a make-clean hook so tools/bootconfig artifacts are removed by
  make clean.
- Gate the x86 prepend on "bootconfig" being present on the command
  line (or CONFIG_BOOT_CONFIG_FORCE), matching the init.* opt-in
  semantics documented in bootconfig.rst and preserving fail-safe
  recovery: dropping "bootconfig" from the bootloader cmdline now also
  disables the embedded kernel.* keys.
- Link to v1: https://patch.msgid.link/20260527-bootconfig_using_tools-v1-0-b6906a86e7d5@debian.org

---
Breno Leitao (8):
      bootconfig: fix NULL-pointer arithmetic in xbc_snprint_cmdline()
      bootconfig: render descendant keys when xbc_snprint_cmdline() root has a value
      bootconfig: render embedded bootconfig as a kernel cmdline at build time
      bootconfig: clean build-time tools/bootconfig from make clean
      bootconfig: add xbc_prepend_embedded_cmdline() helper
      Documentation: bootconfig: document build-time cmdline rendering
      bootconfig: skip runtime kernel.* render once prepended early
      x86/setup: prepend embedded bootconfig cmdline before parse_early_param

 Documentation/admin-guide/bootconfig.rst |  81 ++++++++++++++++++++++
 MAINTAINERS                              |   1 +
 Makefile                                 |  27 +++++++-
 arch/x86/Kconfig                         |   1 +
 arch/x86/kernel/setup.c                  |  43 ++++++++++++
 include/linux/bootconfig.h               |   9 +++
 init/Kconfig                             |  36 ++++++++++
 init/main.c                              |  25 ++++++-
 lib/Makefile                             |  16 +++++
 lib/bootconfig.c                         | 112 +++++++++++++++++++++++++++++--
 lib/embedded-cmdline.S                   |  16 +++++
 tools/bootconfig/Makefile                |   4 +-
 12 files changed, 358 insertions(+), 13 deletions(-)
---
base-commit: a87737435cfa134f9cdcc696ba3080759d04cf72
change-id: 20260508-bootconfig_using_tools-cfa7aa9d6a5a

Best regards,
-- 
Breno Leitao <leitao@debian.org>



* Re: [RFC PATCH v2 06/10] kvm: guest_memfd: Add support for freezing and unfreezing mappings
From: Pratyush Yadav @ 2026-06-23 16:14 UTC
  To: tarunsahu
  Cc: Ackerley Tng, Jonathan Corbet, vannapurve, fvdl, Pasha Tatashin,
	Shuah Khan, sagis, aneesh.kumar, skhawaja, vipinsh,
	Pratyush Yadav, david, dmatlack, mark.rutland, Paolo Bonzini,
	Mike Rapoport, Alexander Graf, seanjc, axelrasmussen,
	linux-kselftest, kexec, linux-kernel, linux-doc, kvm, linux-mm
In-Reply-To: <9huztsqtmihs.fsf@tarunix.c.googlers.com>

On Tue, Jun 23 2026, tarunsahu@google.com wrote:

> Ackerley Tng <ackerleytng@google.com> writes:
>
>> Tarun Sahu <tarunsahu@google.com> writes:
>>
>>>  static long kvm_gmem_fallocate(struct file *file, int mode, loff_t offset,
>>>  			       loff_t len)
>>>  {
>>> +	struct inode *inode = file_inode(file);
>>>  	int ret;
>>> +	int idx;
>>>
>>> -	if (!(mode & FALLOC_FL_KEEP_SIZE))
>>> -		return -EOPNOTSUPP;
>>> +	idx = srcu_read_lock(&kvm_gmem_freeze_srcu);
>>> +	if (kvm_gmem_is_frozen(inode)) {
>>> +		srcu_read_unlock(&kvm_gmem_freeze_srcu, idx);
>>> +		return -EPERM;
>>> +	}
>>
>> fallocate may eventually go to kvm_gmem_get_folio(), so that would check
>> kvm_gmem_is_frozen() twice. Is this meant to catch the punch hole case?

Yeah, I reckon you can get away with doing this check only in
kvm_gmem_get_folio(). Normally you'd like to fail early, but as of now I
don't see much of a problem. If you drop the check here and fail in
kvm_gmem_get_folio() you'd end up taking and releasing the mapping
invalidate_lock, but this isn't a fast path anyway so I don't think it
should matter much.

I think either way works just fine...

>>
>>>
>>> -	if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))
>>> -		return -EOPNOTSUPP;
>>> +	if (!(mode & FALLOC_FL_KEEP_SIZE)) {
>>> +		ret = -EOPNOTSUPP;
>>> +		goto out;
>>> +	}
>>>
>>> -	if (!PAGE_ALIGNED(offset) || !PAGE_ALIGNED(len))
>>> -		return -EINVAL;
>>> +	if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE)) {
>>> +		ret = -EOPNOTSUPP;
>>> +		goto out;
>>> +	}
>>> +
>>> +	if (!PAGE_ALIGNED(offset) || !PAGE_ALIGNED(len)) {
>>> +		ret = -EINVAL;
>>> +		goto out;
>>> +	}
>>
>> There's some reordering here. Why not let the validation happen like
>> before, then check kvm_gmem_is_frozen()?

There is no reordering, if I am reading the diff correctly. The diff is
somewhat misleading. The kvm_gmem_is_frozen() call is added at the top
of the function, and then all the later checks are in the same place but
get a goto out (and hence a full body to the if block). So the diff
reads like reordering, but there is none.

It would be very neat if srcu had a cleanup.h style scope-based locking
function, but on a quick glance I can't see one.
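
To illustrate what scope-based locking buys here, a small userspace
sketch using the GCC/Clang cleanup attribute, which is the same
compiler feature cleanup.h builds on. Everything below (toy_lock,
TOY_GUARD, toy_fallocate) is made up for the example and is not an
existing kernel or SRCU interface.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy read-side lock: counts readers instead of doing real SRCU. */
struct toy_lock {
	int readers;
};

struct toy_guard {
	struct toy_lock *lock;
};

static struct toy_guard toy_read_lock(struct toy_lock *l)
{
	l->readers++;
	return (struct toy_guard){ .lock = l };
}

static void toy_read_unlock(struct toy_guard *g)
{
	g->lock->readers--;
}

/* Release the guard automatically when it goes out of scope. */
#define TOY_GUARD(name, l) \
	struct toy_guard name __attribute__((cleanup(toy_read_unlock))) = \
		toy_read_lock(l)

static int toy_fallocate(struct toy_lock *l, bool frozen, bool aligned)
{
	TOY_GUARD(g, l);

	if (frozen)
		return -EPERM;	/* unlock runs on this return */
	if (!aligned)
		return -EINVAL;	/* and on this one, no "goto out" needed */
	return 0;
}
```

With a guard like this the validation checks could keep their plain
early returns, so the diff would not need to turn each one into a
"goto out" block.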

>
> To align with design. "stop the fallocate call if inode is frozen, No
> need to go further". I dont have strict opinion on this. I am fine with
> taking it across punch hole as well to make it more fine grained. But it
> will no longer claims stop the fallocate call (allocation one is stopped
> in separate path: fault path) , though functionally it does the same
> thing.
>
> WDYT?
>
> ~Tarun

-- 
Regards,
Pratyush Yadav


* [PATCH v2] usbcore: Add quirk for 255-bytes initial config read
From: Nikhil Solanke @ 2026-06-23 16:10 UTC
  To: linux-usb
  Cc: gregkh, linux-kernel, stern, michal.pecio, stable, corbet, skhan,
	linux-doc, Nikhil Solanke

Certain third-party USB game controllers exposing (or spoofing) an Xbox
360-compatible interface (VID:PID 045e:028e) fail to enumerate under Linux.
The device disconnects from the bus without responding to the initial
GET_DESCRIPTOR(CONFIGURATION) request, and the kernel logs 'unable to read
config index 0 descriptor/start: -71'.

The device then falls back to a secondary Android HID mode (with a
different VID:PID), losing XInput functionality including rumble support.
The failure reproduces across multiple machines, host controller types, and
kernel versions including current mainline and LTS. The device enumerates
correctly and remains in XInput mode under Windows. Notably, the device
enumerates correctly in Android mode when the same 9-byte request
is issued for that mode's configuration descriptor, confirming the firmware
bug is specific to the XInput mode.

usbmon traces from Linux and Wireshark/USBPcap traces from Windows are
identical up to the point of failure, with no visible protocol-level
difference explaining the divergence. The root cause was identified when
Michal Pecio discovered via a QEMU bus-level capture that Windows does not
use wLength=9 for the initial config descriptor request; it uses
wLength=255. Alan Stern subsequently confirmed this with a bus
analyzer on a different USB 2.0 device, and Michal verified the behavior
goes back to Windows 95 OSR2.1.

So, add a new quirk flag USB_QUIRK_WINDOWS_CONFIG_REQ_SIZE which causes
usb_get_configuration() to issue a 255-byte request instead of
USB_DT_CONFIG_SIZE (9) for the initial GET_DESCRIPTOR(CONFIGURATION)
request, mimicking long-standing Windows behavior.
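
As a rough userspace sketch of the decision logic (not the kernel
code itself): the initial request size depends on the quirk bit, and
the second read is only needed when the first one returned fewer bytes
than wTotalLength, the little-endian 16-bit field at byte offset 2 of
the configuration descriptor. The names and the quirk bit value below
are placeholders chosen for this example.

```c
#include <stddef.h>
#include <stdint.h>

#define CONFIG_HEADER_SIZE	9	/* size of the configuration header */
#define WINDOWS_REQ_SIZE	255	/* initial request size used by Windows */
#define QUIRK_WINDOWS_REQ	(1u << 19)	/* placeholder quirk bit */

/* Pick the size of the initial GET_DESCRIPTOR(CONFIGURATION) request. */
static size_t initial_req_size(uint32_t quirks)
{
	return (quirks & QUIRK_WINDOWS_REQ) ? WINDOWS_REQ_SIZE
					    : CONFIG_HEADER_SIZE;
}

/* wTotalLength is a little-endian 16-bit field at byte offset 2. */
static uint16_t total_length(const uint8_t *desc)
{
	return (uint16_t)(desc[2] | (desc[3] << 8));
}

/* True when the first read already returned the whole configuration. */
static int first_read_is_complete(const uint8_t *desc, int got)
{
	return got >= 4 && got >= total_length(desc);
}
```

For a device whose configuration is 153 bytes long, a 9-byte first
read still needs the follow-up request, while a 255-byte first read
that returns all 153 bytes does not.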

Suggested-by: Alan Stern <stern@rowland.harvard.edu>
Suggested-by: Michal Pecio <michal.pecio@gmail.com>
Closes: https://lore.kernel.org/linux-usb/CAFgddh+JWdT4LLwMc5qjM8q_pBu-fRo2qADR5ovAKoGHWMQrRw@mail.gmail.com/
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable@vger.kernel.org

Signed-off-by: Nikhil Solanke <nikhilsolanke5@gmail.com>
---
Changes in v2:
- Add Documentation
- Naming changes
- Refactored to have a better flow with existing code.

 .../admin-guide/kernel-parameters.txt         |  9 +++
 drivers/usb/core/config.c                     | 61 ++++++++++++++-----
 drivers/usb/core/hub.c                        |  6 +-
 drivers/usb/core/quirks.c                     |  4 ++
 include/linux/usb/quirks.h                    |  3 +
 5 files changed, 67 insertions(+), 16 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 97007f4f69d4..af4bf0ef2c7b 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -8158,6 +8158,15 @@ Kernel parameters
 				q = USB_QUIRK_FORCE_ONE_CONFIG (Device
 					claims zero configurations,
 					forcing to 1);
+                r = USB_QUIRK_WINDOWS_CONFIG_REQ_SIZE (Device
+                    fails during initialization when sent a
+                    9-byte configuration descriptor request.
+                    Request 255 bytes instead to mirror
+                    Windows' behavior. This quirk was originally
+                    meant to fix some quirky gamepads that refuse
+                    to connect in their XInput mode, but it can
+                    also potentially fix issues with other USB
+                    devices that work on Windows but not on Linux)
 			Example: quirks=0781:5580:bk,0a5c:5834:gij
 
 	usbhid.mousepoll=
diff --git a/drivers/usb/core/config.c b/drivers/usb/core/config.c
index 45e20c6d76c0..4fc3145404d6 100644
--- a/drivers/usb/core/config.c
+++ b/drivers/usb/core/config.c
@@ -19,6 +19,9 @@
 
 #define USB_MAXCONFIG			8	/* Arbitrary limit */
 
+/* config req size if USB_QUIRK_WINDOWS_CONFIG_REQ_SIZE is set */
+#define USB_CONFIG_WINDOWS_REQ_SIZE	255
+
 static int find_next_descriptor(unsigned char *buffer, int size,
     int dt1, int dt2, int *num_skipped)
 {
@@ -912,6 +915,13 @@ int usb_get_configuration(struct usb_device *dev)
 	unsigned char *bigbuffer;
 	struct usb_config_descriptor *desc;
 	int result;
+	/*
+	 * Devices with quirky firmware will stall or reset when asked only for
+	 * the configuration header. This variable decides which size to use in
+	 * that case, if the quirk for that device was set.
+	 */
+	size_t usb_config_req_size = (dev->quirks & USB_QUIRK_WINDOWS_CONFIG_REQ_SIZE)
+		? USB_CONFIG_WINDOWS_REQ_SIZE : USB_DT_CONFIG_SIZE;
 
 	if (ncfg > USB_MAXCONFIG) {
 		dev_notice(ddev, "too many configurations: %d, "
@@ -938,18 +948,27 @@ int usb_get_configuration(struct usb_device *dev)
 	if (!dev->rawdescriptors)
 		return -ENOMEM;
 
-	desc = kmalloc(USB_DT_CONFIG_SIZE, GFP_KERNEL);
+	desc = kmalloc(usb_config_req_size, GFP_KERNEL);
+
 	if (!desc)
 		return -ENOMEM;
 
 	for (cfgno = 0; cfgno < ncfg; cfgno++) {
-		/* We grab just the first descriptor so we know how long
-		 * the whole configuration is */
+
+		if (dev->quirks & USB_QUIRK_DELAY_INIT)
+			msleep(200);
+
+		/*
+		 * Grab just the first descriptor so we know how long the whole
+		 * configuration is. In case of quirky firmware, try to grab the
+		 * whole thing in one go by asking for a 255-byte buffer,
+		 * mirroring Windows behavior.
+		 */
 		result = usb_get_descriptor(dev, USB_DT_CONFIG, cfgno,
-		    desc, USB_DT_CONFIG_SIZE);
+						desc, usb_config_req_size);
 		if (result < 0) {
 			dev_err(ddev, "unable to read config index %d "
-			    "descriptor/%s: %d\n", cfgno, "start", result);
+				"descriptor/%s: %d\n", cfgno, "start", result);
 			if (result != -EPIPE)
 				goto err;
 			dev_notice(ddev, "chopping to %d config(s)\n", cfgno);
@@ -957,13 +976,25 @@ int usb_get_configuration(struct usb_device *dev)
 			break;
 		} else if (result < 4) {
 			dev_err(ddev, "config index %d descriptor too short "
-			    "(expected %i, got %i)\n", cfgno,
-			    USB_DT_CONFIG_SIZE, result);
+				"(asked for %zu, got %i, expected at least %i)\n",
+				cfgno, usb_config_req_size, result, 4);
 			result = -EINVAL;
 			goto err;
 		}
+
 		length = max_t(int, le16_to_cpu(desc->wTotalLength),
-		    USB_DT_CONFIG_SIZE);
+				USB_DT_CONFIG_SIZE);
+
+		/*
+		 * If the device returns the full length configuration
+		 * descriptor, skip the second read. Otherwise, send a second
+		 * request asking for the full length.
+		 */
+		if (result >= le16_to_cpu(desc->wTotalLength)) {
+			bigbuffer = (unsigned char *) desc;
+			desc = NULL;
+			goto store_and_parse;
+		}
 
 		/* Now that we know the length, get the whole thing */
 		bigbuffer = kmalloc(length, GFP_KERNEL);
@@ -972,23 +1003,25 @@ int usb_get_configuration(struct usb_device *dev)
 			goto err;
 		}
 
-		if (dev->quirks & USB_QUIRK_DELAY_INIT)
-			msleep(200);
-
 		result = usb_get_descriptor(dev, USB_DT_CONFIG, cfgno,
-		    bigbuffer, length);
+						bigbuffer, length);
+
 		if (result < 0) {
 			dev_err(ddev, "unable to read config index %d "
-			    "descriptor/%s\n", cfgno, "all");
+				"descriptor/%s\n", cfgno, "all");
 			kfree(bigbuffer);
 			goto err;
 		}
+
 		if (result < length) {
 			dev_notice(ddev, "config index %d descriptor too short "
-			    "(expected %i, got %i)\n", cfgno, length, result);
+				"(asked for %i, got %i)\n",
+				cfgno, length, result);
 			length = result;
 		}
 
+store_and_parse:
+		krealloc(bigbuffer, length, GFP_KERNEL);
 		dev->rawdescriptors[cfgno] = bigbuffer;
 
 		result = usb_parse_configuration(dev, cfgno,
diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index 24960ba9caa9..9acd278666fc 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -2527,8 +2527,10 @@ static int usb_enumerate_device(struct usb_device *udev)
 		err = usb_get_configuration(udev);
 		if (err < 0) {
 			if (err != -ENODEV)
-				dev_err(&udev->dev, "can't read configurations, error %d\n",
-						err);
+				dev_err(&udev->dev, "can't read configurations, "
+					"for device %04x:%04x, error %d\n",
+					le16_to_cpu(udev->descriptor.idVendor),
+					le16_to_cpu(udev->descriptor.idProduct), err);
 			return err;
 		}
 	}
diff --git a/drivers/usb/core/quirks.c b/drivers/usb/core/quirks.c
index 87810eff974e..df670b0b66fe 100644
--- a/drivers/usb/core/quirks.c
+++ b/drivers/usb/core/quirks.c
@@ -142,6 +142,10 @@ static int quirks_param_set(const char *value, const struct kernel_param *kp)
 				break;
 			case 'q':
 				flags |= USB_QUIRK_FORCE_ONE_CONFIG;
+				break;
+			case 'r':
+				flags |= USB_QUIRK_WINDOWS_CONFIG_REQ_SIZE;
+				break;
 			/* Ignore unrecognized flag characters */
 			}
 		}
diff --git a/include/linux/usb/quirks.h b/include/linux/usb/quirks.h
index b3cc7beab4a3..a4043b33c2c2 100644
--- a/include/linux/usb/quirks.h
+++ b/include/linux/usb/quirks.h
@@ -81,4 +81,7 @@
 /* Device claims zero configurations, forcing to 1 */
 #define USB_QUIRK_FORCE_ONE_CONFIG		BIT(18)
 
+/* Use a 255-byte config descriptor request, mirroring Windows behavior */
+#define USB_QUIRK_WINDOWS_CONFIG_REQ_SIZE	BIT(19)
+
 #endif /* __LINUX_USB_QUIRKS_H */
-- 
2.54.0



* Re: [RFC PATCH v2 03/10] kvm: Prepare core VM structs and helpers for LUO support
From: tarunsahu @ 2026-06-23 15:33 UTC
  To: Ackerley Tng, Jonathan Corbet, vannapurve, fvdl, Pasha Tatashin,
	Shuah Khan, sagis, aneesh.kumar, skhawaja, vipinsh,
	Pratyush Yadav, david, dmatlack, mark.rutland, Paolo Bonzini,
	Mike Rapoport, Alexander Graf, seanjc, axelrasmussen
  Cc: linux-kselftest, kexec, linux-kernel, linux-doc, kvm, linux-mm
In-Reply-To: <CAEvNRgGharGxs9s_ow0Z4iiQ9PCzdghch-4Fk6UMjiPP9tX-5g@mail.gmail.com>

Ackerley Tng <ackerleytng@google.com> writes:

> Tarun Sahu <tarunsahu@google.com> writes:
>
>> Introduce core infrastructure to support VM preservation with LUO.
>>
>> First two changes are just refactoring, no functional change, third
>> change introduces a new member in struct kvm.
>> - Move ITOA_MAX_LEN to kvm_mm.h for reuse by upcoming kvm_luo code.
>> - Add a public kvm_create_vm_file() helper wrapping kvm_create_vm()
>>   and anon_inode_getfile() to provide a unified VM file creation API.
>> - Track a weak reference to the backing file in struct kvm under
>>   CONFIG_LIVEUPDATE_GUEST_MEMFD to enable reverse file resolution
>>   without circular lifetime dependencies.
>>
>
> Given the above, I think this should be separate patches.
>
>> Signed-off-by: Tarun Sahu <tarunsahu@google.com>
>> ---
>>  include/linux/kvm_host.h | 14 +++++++
>>  virt/kvm/kvm_main.c      | 79 +++++++++++++++++++++++++++++-----------
>>  virt/kvm/kvm_mm.h        |  3 ++
>>  3 files changed, 75 insertions(+), 21 deletions(-)
>>
>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>> index 4c14aee1fb06..9111a28637af 100644
>> --- a/include/linux/kvm_host.h
>> +++ b/include/linux/kvm_host.h
>> @@ -874,6 +874,18 @@ struct kvm {
>>  #ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
>>  	/* Protected by slots_lock (for writes) and RCU (for reads) */
>>  	struct xarray mem_attr_array;
>> +#endif
>> +#ifdef CONFIG_LIVEUPDATE_GUEST_MEMFD
>> +	/*
>> +	 * Weak reference to the VFS file backing this KVM instance. Stored
>> +	 * without incrementing the file refcount to prevent a circular lifetime
>> +	 * dependency (since file->private_data already pins this struct kvm).
>> +	 * Used exclusively to resolve the file pointer back from struct kvm.
>> +	 *
>> +	 * Written/cleared via rcu_assign_pointer() and read locklessly under
>> +	 * RCU (e.g. via get_file_active() to prevent ABA races).
>> +	 */
>> +	struct file *vm_file;
>>  #endif
>
> We didn't really talk about this during the calls, but it seems weird to
> preserve a vm_file with pretty much nothing other than the vm type. The
> entire VM is re-created, which means it could potentially be a
> completely different VM?
>
> In some sense it's more flexible since the guest_memfd can be restored
> with some completely different VM, but it seems like it could introduce
> other issues.
>
> I think other KVM folks would probably have more thoughts here.
>
>>  	char stats_id[KVM_STATS_NAME_SIZE];
>>  };
>> @@ -1074,7 +1086,9 @@ void kvm_get_kvm(struct kvm *kvm);
>>  bool kvm_get_kvm_safe(struct kvm *kvm);
>>  void kvm_put_kvm(struct kvm *kvm);
>>  bool file_is_kvm(struct file *file);
>> +struct file *kvm_create_vm_file(unsigned long type, const char *fdname);
>>  void kvm_put_kvm_no_destroy(struct kvm *kvm);
>> +void kvm_uevent_notify_vm_create(struct kvm *kvm);
>>
>>  static inline struct kvm_memslots *__kvm_memslots(struct kvm *kvm, int as_id)
>>  {
>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> index 89489996fbc1..65f0c5fb353e 100644
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -67,9 +67,6 @@
>>  #include <linux/kvm_dirty_ring.h>
>>
>>
>> -/* Worst case buffer size needed for holding an integer. */
>> -#define ITOA_MAX_LEN 12
>> -
>>  MODULE_AUTHOR("Qumranet");
>>  MODULE_DESCRIPTION("Kernel-based Virtual Machine (KVM) Hypervisor");
>>  MODULE_LICENSE("GPL");
>> @@ -1349,6 +1346,19 @@ static int kvm_vm_release(struct inode *inode, struct file *filp)
>>  {
>>  	struct kvm *kvm = filp->private_data;
>>
>> +#ifdef CONFIG_LIVEUPDATE_GUEST_MEMFD
>> +	/*
>> +	 * Clear the weak reference of the vm file.
>> +	 * In case vm file is closed by userspace, but kvm still has
>> +	 * other users like vCPUs, clearing this pointer ensures
>> +	 * that we don't have a dangling pointer to a closed file.
>> +	 *
>> +	 * Cleared via rcu_assign_pointer() to ensure proper memory visibility
>> +	 * for concurrent lockless readers under RCU.
>> +	 */
>> +	rcu_assign_pointer(kvm->vm_file, NULL);
>> +#endif
>> +
>>  	kvm_irqfd_release(kvm);
>>
>>  	kvm_put_kvm(kvm);
>> @@ -5476,11 +5486,47 @@ bool file_is_kvm(struct file *file)
>>  }
>>  EXPORT_SYMBOL_FOR_KVM_INTERNAL(file_is_kvm);
>>
>> +struct file *kvm_create_vm_file(unsigned long type, const char *fdname)
>> +{
>> +	struct kvm *kvm = kvm_create_vm(type, fdname);
>> +	struct file *file;
>> +
>> +	if (IS_ERR(kvm))
>> +		return ERR_CAST(kvm);
>> +
>> +	file = anon_inode_getfile("kvm-vm", &kvm_vm_fops, kvm, O_RDWR);
>> +	if (IS_ERR(file)) {
>> +		kvm_put_kvm(kvm);
>> +		return file;
>> +	}
>> +
>> +#ifdef CONFIG_LIVEUPDATE_GUEST_MEMFD
>> +	/*
>> +	 * Weak reference to the file (without get_file()) to prevent a circular
>> +	 * dependency. Safe because the file's release path clears this pointer
>> +	 * and drops its reference to the VM.
>> +	 *
>> +	 * Written via rcu_assign_pointer() because the pointer can be read
>> +	 * locklessly under RCU (e.g., in kvm_gmem_luo_preserve() via
>> +	 * get_file_active() to prevent lockless ABA races).
>> +	 */
>> +	rcu_assign_pointer(kvm->vm_file, file);
>> +#endif
>> +
>> +	/*
>> +	 * Don't call kvm_put_kvm anymore at this point; file->f_op is
>> +	 * already set, with ->release() being kvm_vm_release().  In error
>> +	 * cases it will be called by the final fput(file) and will take
>> +	 * care of doing kvm_put_kvm(kvm).
>> +	 */
>> +
>> +	return file;
>> +}
>> +
>>  static int kvm_dev_ioctl_create_vm(unsigned long type)
>>  {
>>  	char fdname[ITOA_MAX_LEN + 1];
>>  	int r, fd;
>> -	struct kvm *kvm;
>>  	struct file *file;
>>
>>  	fd = get_unused_fd_flags(O_CLOEXEC);
>> @@ -5489,31 +5535,17 @@ static int kvm_dev_ioctl_create_vm(unsigned long type)
>>
>>  	snprintf(fdname, sizeof(fdname), "%d", fd);
>>
>> -	kvm = kvm_create_vm(type, fdname);
>> -	if (IS_ERR(kvm)) {
>> -		r = PTR_ERR(kvm);
>> -		goto put_fd;
>> -	}
>> -
>> -	file = anon_inode_getfile("kvm-vm", &kvm_vm_fops, kvm, O_RDWR);
>> +	file = kvm_create_vm_file(type, fdname);
>>  	if (IS_ERR(file)) {
>>  		r = PTR_ERR(file);
>> -		goto put_kvm;
>> +		goto put_fd;
>>  	}
>>
>> -	/*
>> -	 * Don't call kvm_put_kvm anymore at this point; file->f_op is
>> -	 * already set, with ->release() being kvm_vm_release().  In error
>> -	 * cases it will be called by the final fput(file) and will take
>> -	 * care of doing kvm_put_kvm(kvm).
>> -	 */
>> -	kvm_uevent_notify_change(KVM_EVENT_CREATE_VM, kvm);
>> +	kvm_uevent_notify_change(KVM_EVENT_CREATE_VM, file->private_data);
>
> Notifying with file->private_data threw me off... I would rather inline
> the rcu_assign_pointer() in this function and have this line read
> notify(..., kvm) like before.

Yes, I will update it to:

     struct kvm *kvm;
     ...
     kvm = file->private_data;
     notify (..., kvm);

About rcu_assign_pointer(): I am not sure I follow. IIUC, it only sets
the kvm->vm_file pointer, which is unrelated to file->private_data.
Keeping rcu_assign_pointer(kvm->vm_file, file) where it is now, inside
kvm_create_vm_file(), makes logical sense: that function creates both
the struct kvm and the VM file, so all the links between them should be
set up there.
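
To illustrate the split, here is a minimal userspace sketch, not the
kernel code. All mock_* names are hypothetical stand-ins, and the plain
pointer store stands in for rcu_assign_pointer(). The helper wires up
both directions (file -> kvm and the weak kvm -> file back-pointer), and
the caller only fetches kvm into a local before notifying:

```c
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-ins; these are not the kernel definitions. */
struct mock_file;

struct mock_kvm {
	struct mock_file *vm_file;	/* weak: no reference taken on the file */
};

struct mock_file {
	void *private_data;		/* strong: the file pins the mock_kvm */
};

/* Creates the VM and its file, and sets up both links in one place. */
static struct mock_file *mock_create_vm_file(void)
{
	struct mock_kvm *kvm = calloc(1, sizeof(*kvm));
	struct mock_file *file;

	if (!kvm)
		return NULL;

	file = calloc(1, sizeof(*file));
	if (!file) {
		free(kvm);
		return NULL;
	}

	file->private_data = kvm;	/* file -> kvm */
	kvm->vm_file = file;		/* kvm -> file (weak back-pointer) */
	return file;
}

/* Stand-in for the uevent notification; records the last VM notified. */
static struct mock_kvm *last_notified;

static void mock_notify_create(struct mock_kvm *kvm)
{
	last_notified = kvm;
}

/* Caller side: fetch kvm into a local, then notify with it. */
static struct mock_file *mock_ioctl_create_vm(void)
{
	struct mock_file *file = mock_create_vm_file();
	struct mock_kvm *kvm;

	if (!file)
		return NULL;

	kvm = file->private_data;
	mock_notify_create(kvm);
	return file;
}
```

With this shape the caller never touches kvm->vm_file; it only reads
file->private_data once.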

>
>>
>>  	fd_install(fd, file);
>>  	return fd;
>>
>> -put_kvm:
>> -	kvm_put_kvm(kvm);
>>  put_fd:
>>  	put_unused_fd(fd);
>>  	return r;
>> @@ -6341,6 +6373,11 @@ static void kvm_uevent_notify_change(unsigned int type, struct kvm *kvm)
>>  	kfree(env);
>>  }
>>
>> +void kvm_uevent_notify_vm_create(struct kvm *kvm)
>> +{
>> +	kvm_uevent_notify_change(KVM_EVENT_CREATE_VM, kvm);
>> +}
>> +
>>  static void kvm_init_debug(void)
>>  {
>>  	const struct file_operations *fops;
>> diff --git a/virt/kvm/kvm_mm.h b/virt/kvm/kvm_mm.h
>> index 9fcc5d5b7f8d..7aa1d65c3d46 100644
>> --- a/virt/kvm/kvm_mm.h
>> +++ b/virt/kvm/kvm_mm.h
>> @@ -3,6 +3,9 @@
>>  #ifndef __KVM_MM_H__
>>  #define __KVM_MM_H__ 1
>>
>> +/* Worst case buffer size needed for holding an integer as a string. */
>> +#define ITOA_MAX_LEN 12
>> +
>>  /*
>>   * Architectures can choose whether to use an rwlock or spinlock
>>   * for the mmu_lock.  These macros, for use in common code
>> --
>> 2.54.0.1032.g2f8565e1d1-goog


* Re: [RFC PATCH v2 07/10] kvm: guest_memfd_luo: add support for guest_memfd preservation
From: tarunsahu @ 2026-06-23 15:26 UTC (permalink / raw)
  To: Ackerley Tng, Jonathan Corbet, vannapurve, fvdl, Pasha Tatashin,
	Shuah Khan, sagis, aneesh.kumar, skhawaja, vipinsh,
	Pratyush Yadav, david, dmatlack, mark.rutland, Paolo Bonzini,
	Mike Rapoport, Alexander Graf, seanjc, axelrasmussen
  Cc: linux-kselftest, kexec, linux-kernel, linux-doc, kvm, linux-mm
In-Reply-To: <CAEvNRgGYM+evk5H-bU8suU++zX3-w25fcc+9mu1YoNG8DtysHw@mail.gmail.com>


Thanks for reviewing!

Ackerley Tng <ackerleytng@google.com> writes:

> Tarun Sahu <tarunsahu@google.com> writes:
>
>> This patch sets up the basic infrastructure to preserve the guest_memfd.
>> Currently this supports only fully shared guest_memfd and backed by
>> PAGE_SIZE pages.
>>
>> It registers a new LUO file handler for guest_memfd files to serialize
>> and deserialize guest memory. This allows preserving guest memory backed
>> by guest_memfd across updates, ensuring that guest instances can be
>> resumed seamlessly without losing their memory contents.
>>
>> Preservation is straight forward. It walks through the folios and
>> serialize them.
>>
>> There is kvm_gmem_freeze call on preserve which freeze the guest_memfd
>> inode. It avoids any changes to inode mapping with fallocate calls or
>> any new fault allocation (fails) on or after preservation. No need to check
>> this during the page fault as preservation is only supported for
>> pre-faulted/pre-allocated guest_memfd.
>>
>> While retrieving the guest_memfd, it requires the struct kvm to create
>> new guest_memfd. So it first get the vm_file from the same session using
>> the token passed during the preservation. And use it to get
>> vm_file->kvm.
>>
>> This change also update the MAINTAINERS list.
>>
>> Signed-off-by: Tarun Sahu <tarunsahu@google.com>
>> ---
>>  MAINTAINERS                 |   1 +
>>  include/linux/kho/abi/kvm.h |  79 +++++-
>>  virt/kvm/Makefile.kvm       |   2 +-
>>  virt/kvm/guest_memfd_luo.c  | 485 ++++++++++++++++++++++++++++++++++++
>>  virt/kvm/kvm_main.c         |   7 +
>>  virt/kvm/kvm_mm.h           |   4 +
>>  6 files changed, 571 insertions(+), 7 deletions(-)
>>  create mode 100644 virt/kvm/guest_memfd_luo.c
>>
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index 9bfc3c1f6676..16cba790a84d 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -14418,6 +14418,7 @@ L:	kexec@lists.infradead.org
>>  L:	kvm@vger.kernel.org
>>  S:	Maintained
>>  T:	git git://git.kernel.org/pub/scm/linux/kernel/git/liveupdate/linux.git
>> +F:	virt/kvm/guest_memfd_luo.c
>>  F:	virt/kvm/kvm_luo.c
>>
>>  KVM PARAVIRT (KVM/paravirt)
>> diff --git a/include/linux/kho/abi/kvm.h b/include/linux/kho/abi/kvm.h
>> index 718db68a541a..42074d76e04a 100644
>> --- a/include/linux/kho/abi/kvm.h
>> +++ b/include/linux/kho/abi/kvm.h
>> @@ -9,20 +9,23 @@
>>  #define _LINUX_KHO_ABI_KVM_H
>>
>>  #include <linux/types.h>
>> +#include <linux/bits.h>
>>  #include <linux/kho/abi/kexec_handover.h>
>>
>>  /**
>> - * DOC: KVM Live Update ABI
>> + * DOC: KVM and guest_memfd Live Update ABI
>>   *
>> - * KVM uses the ABI defined below for preserving its state
>> + * KVM and guest_memfd use the ABI defined below for preserving their states
>>   * across a kexec reboot using the LUO.
>>   *
>> - * The state is serialized into a packed structure `struct kvm_luo_ser`
>> - * which is handed over to the next kernel via the KHO mechanism.
>> + * The state is serialized into packed structures (struct kvm_luo_ser and
>> + * struct guest_memfd_luo_ser) which are handed over to the next kernel via
>> + * the KHO mechanism.
>>   *
>> - * This interface is a contract. Any modification to the structure layout
>> + * This interface is a contract. Any modification to the structure layouts
>>   * constitutes a breaking change. Such changes require incrementing the
>> - * version number in the KVM_LUO_FH_COMPATIBLE compatibility string.
>> + * version number in the KVM_LUO_FH_COMPATIBLE or
>> + * GUEST_MEMFD_LUO_FH_COMPATIBLE compatibility strings.
>>   */
>>
>>  /**
>> @@ -36,4 +39,68 @@ struct kvm_luo_ser {
>>  /* The compatibility string for KVM VM file handler */
>>  #define KVM_LUO_FH_COMPATIBLE	"kvm_vm_luo_v1"
>>
>> +/**
>> + * struct guest_memfd_luo_folio_ser - Serialization layout for a single folio in guest_memfd.
>> + * @pfn:   Page Frame Number of the folio.
>> + * @index: Page offset of the folio within the file.
>> + * @flags: State flags associated with the folio.
>> + */
>> +struct guest_memfd_luo_folio_ser {
>> +	u64 pfn:52;
>> +	u64 flags:12;
>> +	u64 index;
>> +} __packed;
>> +
>> +/**
>> + * GUEST_MEMFD_LUO_FOLIO_UPTODATE - The folio is up-to-date.
>> + *
>> + * This flag is per folio to check if the folio is uptodate.
>> + */
>> +#define GUEST_MEMFD_LUO_FOLIO_UPTODATE	BIT(0)
>> +
>> +
>> +/**
>> + * GUEST_MEMFD_LUO_FLAG_MMAP - The guest_memfd supports mmap.
>> + *
>> + * This flag indicates that the guest_memfd supports host-side mmap.
>> + */
>> +#define GUEST_MEMFD_LUO_FLAG_MMAP		BIT(0)
>> +
>> +/**
>> + * GUEST_MEMFD_LUO_FLAG_INIT_SHARED - Initialize memory as shared.
>> + *
>> + * This flag indicates that the guest_memfd has been initialized as shared
>> + * memory.
>> + */
>> +#define GUEST_MEMFD_LUO_FLAG_INIT_SHARED	BIT(1)
>> +
>> +/**
>> + * GUEST_MEMFD_LUO_SUPPORTED_FLAGS - Supported guest_memfd LUO flags mask.
>> + *
>> + * A mask of all guest_memfd preservation flags supported by this version
>> + * of the KVM LUO ABI.
>> + */
>> +#define GUEST_MEMFD_LUO_SUPPORTED_FLAGS	(GUEST_MEMFD_LUO_FLAG_MMAP | \
>> +						 GUEST_MEMFD_LUO_FLAG_INIT_SHARED)
>> +
>> +/**
>> + * struct guest_memfd_luo_ser - Main serialization structure for guest_memfd.
>> + * @size:      The size of the file in bytes.
>> + * @flags:     File-level flags.
>> + * @nr_folios: Number of folios in the folios array.
>> + * @vm_token:  Token of the associated KVM VM instance.
>> + * @folios:    KHO vmalloc descriptor pointing to the array of
>> + *             struct guest_memfd_luo_folio_ser.
>> + */
>> +struct guest_memfd_luo_ser {
>> +	u64 size;
>> +	u64 flags;
>> +	u64 nr_folios;
>> +	u64 vm_token;
>> +	struct kho_vmalloc folios;
>> +} __packed;
>> +
>> +/* The compatibility string for GUEST_MEMFD file handler */
>> +#define GUEST_MEMFD_LUO_FH_COMPATIBLE	"guest_memfd_luo_v1"
>> +
>>  #endif /* _LINUX_KHO_ABI_KVM_H */
>> diff --git a/virt/kvm/Makefile.kvm b/virt/kvm/Makefile.kvm
>> index c1a962159264..d30fca094c42 100644
>> --- a/virt/kvm/Makefile.kvm
>> +++ b/virt/kvm/Makefile.kvm
>> @@ -13,4 +13,4 @@ kvm-$(CONFIG_HAVE_KVM_IRQ_ROUTING) += $(KVM)/irqchip.o
>>  kvm-$(CONFIG_HAVE_KVM_DIRTY_RING) += $(KVM)/dirty_ring.o
>>  kvm-$(CONFIG_HAVE_KVM_PFNCACHE) += $(KVM)/pfncache.o
>>  kvm-$(CONFIG_KVM_GUEST_MEMFD) += $(KVM)/guest_memfd.o
>> -kvm-$(CONFIG_LIVEUPDATE_GUEST_MEMFD) += $(KVM)/kvm_luo.o
>> +kvm-$(CONFIG_LIVEUPDATE_GUEST_MEMFD) += $(KVM)/guest_memfd_luo.o $(KVM)/kvm_luo.o
>> diff --git a/virt/kvm/guest_memfd_luo.c b/virt/kvm/guest_memfd_luo.c
>> new file mode 100644
>> index 000000000000..d466f889c9aa
>> --- /dev/null
>> +++ b/virt/kvm/guest_memfd_luo.c
>> @@ -0,0 +1,485 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +
>> +/*
>> + * Copyright (c) 2026, Google LLC.
>> + * Tarun Sahu <tarunsahu@google.com>
>> + *
>> + * Guestmemfd Preservation for Live Update Orchestrator (LUO)
>> + */
>> +
>> +/**
>> + * DOC: Guestmemfd Preservation via LUO
>> + *
>> + * Overview
>> + * ========
>> + *
>> + * Guest memory file descriptors (guest_memfd) can be preserved over a kexec
>> + * reboot using the Live Update Orchestrator (LUO) file preservation. This
>> + * allows userspace to preserve VM memory across kexec reboots.
>> + *
>> + * The preservation is not intended to be transparent. Only select properties
>> + * of the guest_memfd are preserved, while others are reset to default.
>> + *
>> + * Preserved Properties
>> + * ====================
>> + *
>> + * The following properties of guest_memfd are preserved across kexec:
>> + *
>> + * File Size
>> + *   The size of the file is preserved.
>> + *
>> + * File Contents
>> + *   All folios present in the page cache are preserved.
>> + *
>> + * File-level Flags
>> + *   The file-level flags (such as MMAP support and INIT_SHARED default mapping)
>> + *   are preserved.
>> + *
>> + * Non-Preserved Properties
>> + * ========================
>> + *
>> + * NUMA Memory Policy
>> + *   NUMA memory policies associated with the guest_memfd are not preserved.
>> + */
>> +#include <linux/liveupdate.h>
>> +#include <linux/kvm_host.h>
>> +#include <linux/pagemap.h>
>> +#include <linux/file.h>
>> +#include <linux/err.h>
>> +#include <linux/anon_inodes.h>
>> +#include <linux/magic.h>
>> +#include <linux/kexec_handover.h>
>> +#include <linux/kho/abi/kexec_handover.h>
>> +#include <linux/kho/abi/kvm.h>
>> +#include "guest_memfd.h"
>> +
>> +static int kvm_gmem_luo_walk_folios(struct address_space *mapping,
>> +		pgoff_t end_index, struct guest_memfd_luo_folio_ser *folios_ser,
>> +		u64 *out_count)
>> +{
>> +	struct folio_batch fbatch;
>> +	pgoff_t index = 0;
>> +	u64 count = 0;
>> +	int err = 0;
>> +
>> +	folio_batch_init(&fbatch);
>> +	while (index < end_index) {
>> +		unsigned int nr, i;
>> +
>> +		nr = filemap_get_folios(mapping, &index, end_index - 1, &fbatch);
>> +		if (nr == 0)
>> +			break;
>> +
>> +		for (i = 0; i < nr; i++) {
>> +			struct folio *folio = fbatch.folios[i];
>> +
>> +			if (folios_ser) {
>> +				if (folio_test_hwpoison(folio)) {
>> +					err = -EHWPOISON;
>> +					folio_batch_release(&fbatch);
>> +					goto out;
>> +				}
>> +				err = kho_preserve_folio(folio);
>> +				if (err) {
>> +					folio_batch_release(&fbatch);
>> +					goto out;
>> +				}
>> +
>> +				folios_ser[count].pfn = folio_pfn(folio);
>> +				folios_ser[count].index = folio->index;
>> +				folios_ser[count].flags = folio_test_uptodate(folio) ?
>> +							  GUEST_MEMFD_LUO_FOLIO_UPTODATE : 0;
>> +			}
>> +			count++;
>> +		}
>> +		folio_batch_release(&fbatch);
>> +		cond_resched();
>> +	}
>> +
>> +out:
>> +	*out_count = count;
>> +	return err;
>> +}
>> +
>> +static bool kvm_gmem_luo_can_preserve(struct liveupdate_file_handler *handler, struct file *file)
>> +{
>> +	struct inode *inode = file_inode(file);
>> +	struct gmem_file *gmem_file = file->private_data;
>> +	struct kvm *kvm = gmem_file->kvm;
>> +
>> +	if (inode->i_sb->s_magic != GUEST_MEMFD_MAGIC)
>> +		return 0;
>> +
>
> How does .can_preserve decide route to this function? If it already
> routes here, wouldn't this inode definitely be a guest_memfd file?

No.
kvm_gmem_luo_handler {
                 ...
                 .ops.can_preserve = kvm_gmem_luo_can_preserve;
                 ...
                 }

.can_preserve is an eligibility check. The LUO preservation ioctl can be
called on any type of file (for example, a memfd file). LUO internally
loops through all the registered handlers (the memfd_luo handler, the
kvm_luo handler, the guest_memfd_luo handler, etc.) and calls each
handler's can_preserve callback, which for this handler is
kvm_gmem_luo_can_preserve(). For a memfd file it returns false, and for
a supported guest_memfd file it returns true.

In short: this function is how LUO selects this handler for a
guest_memfd file. The additional checks narrow that down to the
guest_memfd configurations we currently support for preservation.
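
A minimal userspace sketch of that dispatch, assuming a simple array of
handlers. The mock_* types, the enum, and find_handler() are
hypothetical and only model the idea; the real LUO registration and
lookup code is not shown in this thread:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical file kinds standing in for the superblock magic check. */
enum mock_file_kind { MOCK_MEMFD, MOCK_GUEST_MEMFD, MOCK_KVM_VM };

struct mock_file {
	enum mock_file_kind kind;
	bool has_private_mem;	/* stand-in for kvm_arch_has_private_mem() */
	bool large_folios;	/* stand-in for mapping_large_folio_support() */
};

struct mock_handler {
	const char *name;
	bool (*can_preserve)(const struct mock_file *file);
};

static bool memfd_can_preserve(const struct mock_file *file)
{
	return file->kind == MOCK_MEMFD;
}

static bool gmem_can_preserve(const struct mock_file *file)
{
	/* Type check first: this callback is offered every kind of file. */
	if (file->kind != MOCK_GUEST_MEMFD)
		return false;
	/* Then the checks that narrow down to the supported subset. */
	if (file->has_private_mem || file->large_folios)
		return false;
	return true;
}

static const struct mock_handler handlers[] = {
	{ "memfd_luo", memfd_can_preserve },
	{ "guest_memfd_luo", gmem_can_preserve },
};

/* Returns the first handler that accepts the file, or NULL if none does. */
static const struct mock_handler *find_handler(const struct mock_file *file)
{
	size_t i;

	for (i = 0; i < sizeof(handlers) / sizeof(handlers[0]); i++) {
		if (handlers[i].can_preserve(file))
			return &handlers[i];
	}
	return NULL;
}
```

Because every handler sees every file, the type check inside the
callback is what keeps a memfd from being claimed by the guest_memfd
handler.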

>
>> +	if (kvm_arch_has_private_mem(kvm))
>> +		return 0;
>> +
>> +	if (mapping_large_folio_support(inode->i_mapping))
>> +		return 0;
>> +
>> +	return 1;
>
> Let's return true and false rather than relying on casting.

ACK. Will update it.
>
>> +}
>> +
>> +static int kvm_gmem_luo_preserve(struct liveupdate_file_op_args *args)
>> +{
>> +	struct guest_memfd_luo_folio_ser *folios_ser = NULL;
>> +	u64 count = 0, gmem_flags, abi_flags = 0;
>> +	struct guest_memfd_luo_ser *ser;
>> +	struct address_space *mapping;
>> +	struct gmem_file *gmem_file;
>> +	struct inode *inode;
>> +	pgoff_t end_index;
>> +	struct kvm *kvm;
>> +	int err = 0;
>> +	long size;
>> +
>> +	inode = file_inode(args->file);
>
> I think to lock out all allocates, you'd have to take
> filemap_invalidate_lock() before freezing.

No need; the freeze waits for synchronization. It proceeds only after
any in-flight allocation finishes, and future allocations are blocked
once freeze = true.

>
>> +	kvm_gmem_freeze(inode, true);
>> +
>> +	mapping = inode->i_mapping;
>> +	size = i_size_read(inode);
>> +	if (!size) {
>> +		err = -EINVAL;
>> +		goto err_unfreeze_inode;
>> +	}
>> +
>> +	if (WARN_ON_ONCE(!PAGE_ALIGNED(size))) {
>> +		err = -EINVAL;
>> +		goto err_unfreeze_inode;
>> +	}
>> +
>> +	gmem_file = args->file->private_data;
>> +	kvm = gmem_file->kvm;
>> +
>> +	gmem_flags = READ_ONCE(GMEM_I(inode)->flags);
>> +	if (gmem_flags & ~(GUEST_MEMFD_FLAG_MMAP | GUEST_MEMFD_FLAG_INIT_SHARED
>
> Why condition this on MMAP?

Here it only checks which flags are set in the gmem inode. Currently,
the gmem inode (not talking about preservation here) supports three
flags: MMAP, INIT_SHARED and FROZEN. If any flag other than these is
set, we fail preservation with -EOPNOTSUPP.

>
> After conversions lands, we'd have to iterate to check that the entire
> guest_memfd is shared offset-by-offset instead of checking for
> INIT_SHARED.

This is unrelated to the code snippet here, but it is a good question,
so let's discuss it.

I disagree; that will not be needed. Whether in-place conversion lands
before or after this preservation series, it will not affect this
series (IMHO, IIUC). I have the following two checks in v3:
https://lore.kernel.org/all/20260622184851.2309827-7-tarunsahu@google.com/#:~:text=static%20bool%20kvm_gmem_luo_can_preserve,0%3B%0A%2B%0A%2B%09return%201%3B%0A%2B%7D

INIT_SHARED: ensures that the guest_memfd was completely shared when it
was created.
!kvm_arch_has_private_mem(): rules out any future conversion.

So if both checks pass, the guest_memfd file does not have any private
pages, and we can preserve it. This also aligns with the logic in the
in-place conversion series.

>
>> +				| GUEST_MEMFD_F_MAPPING_FROZEN)) {
>
> This would always be true since kvm_gmem_freeze() is done above.

As I said above, it is just checking whether any flags other than these
three are set.

>
>> +		err = -EOPNOTSUPP;
>> +		goto err_unfreeze_inode;
>> +	}
>> +
>> +	if (gmem_flags & GUEST_MEMFD_FLAG_MMAP)
>> +		abi_flags |= GUEST_MEMFD_LUO_FLAG_MMAP;
>> +	if (gmem_flags & GUEST_MEMFD_FLAG_INIT_SHARED)
>> +		abi_flags |= GUEST_MEMFD_LUO_FLAG_INIT_SHARED;
>> +
>
> Is it intentional to have a different set of flags that are actually
> preserved? I think we should refactor out a function to transfer the
> flags over.

To keep backward compatibility with the userspace ABI, the flags must
stay constant across kernel versions. I have defined these two flags in
include/linux/kho/abi/kvm.h, and they are guaranteed to stay constant
across kernel versions. Internal flags might be renamed, but externally
these will remain constant.

gmem_inode->flags (internal kernel variability)
                   |
                   v
"userspace-defined gmem_inode flags" => preserve

<KEXEC to new kernel version>

retrieve => "userspace-defined gmem_inode flags"
                   |
                   v
gmem_inode->flags (internal kernel variability)

Ack on refactoring out the logic: I will take care of that in the next
revision (v4).
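
As a rough idea of what the refactored helpers could look like, here is
a userspace sketch. The MOCK_GMEM_* bit positions are made up on
purpose to show that internal values may differ from the ABI values;
the MOCK_LUO_* values mirror BIT(0) and BIT(1) from the patch. The
function names, and the rejection of unknown ABI flags on the retrieve
side, are assumptions:

```c
#include <errno.h>
#include <stdint.h>

/* Internal flags: hypothetical bit positions, free to change per kernel. */
#define MOCK_GMEM_FLAG_MMAP		(1ULL << 4)
#define MOCK_GMEM_FLAG_INIT_SHARED	(1ULL << 5)
#define MOCK_GMEM_F_MAPPING_FROZEN	(1ULL << 63)

/* ABI flags: fixed values, mirroring BIT(0) and BIT(1) in the patch. */
#define MOCK_LUO_FLAG_MMAP		(1ULL << 0)
#define MOCK_LUO_FLAG_INIT_SHARED	(1ULL << 1)
#define MOCK_LUO_SUPPORTED_FLAGS \
	(MOCK_LUO_FLAG_MMAP | MOCK_LUO_FLAG_INIT_SHARED)

/* Preserve side: internal -> ABI. Fails on flags it does not know. */
static int gmem_flags_to_abi(uint64_t gmem_flags, uint64_t *abi_flags)
{
	const uint64_t known = MOCK_GMEM_FLAG_MMAP |
			       MOCK_GMEM_FLAG_INIT_SHARED |
			       MOCK_GMEM_F_MAPPING_FROZEN;
	uint64_t out = 0;

	if (gmem_flags & ~known)
		return -EOPNOTSUPP;

	if (gmem_flags & MOCK_GMEM_FLAG_MMAP)
		out |= MOCK_LUO_FLAG_MMAP;
	if (gmem_flags & MOCK_GMEM_FLAG_INIT_SHARED)
		out |= MOCK_LUO_FLAG_INIT_SHARED;
	/* FROZEN is accepted but not carried over, as in the patch. */

	*abi_flags = out;
	return 0;
}

/* Retrieve side: ABI -> internal. Fails on ABI flags it does not know. */
static int abi_flags_to_gmem(uint64_t abi_flags, uint64_t *gmem_flags)
{
	uint64_t out = 0;

	if (abi_flags & ~MOCK_LUO_SUPPORTED_FLAGS)
		return -EOPNOTSUPP;

	if (abi_flags & MOCK_LUO_FLAG_MMAP)
		out |= MOCK_GMEM_FLAG_MMAP;
	if (abi_flags & MOCK_LUO_FLAG_INIT_SHARED)
		out |= MOCK_GMEM_FLAG_INIT_SHARED;

	*gmem_flags = out;
	return 0;
}
```

Keeping both directions next to each other makes it harder to add a
flag on one side and forget the other.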

>
>> +	end_index = size >> PAGE_SHIFT;
>> +
>> +	ser = kho_alloc_preserve(sizeof(*ser));
>> +	if (IS_ERR(ser)) {
>> +		err = PTR_ERR(ser);
>> +		goto err_unfreeze_inode;
>> +	}
>> +
>> +	/* First pass: Count the folios present in the page cache */
>> +	err = kvm_gmem_luo_walk_folios(mapping, end_index, NULL, &count);
>> +	if (err)
>> +		goto err_free_ser;
>> +
>> +	ser->size = size;
>> +	ser->flags = abi_flags;
>> +	ser->nr_folios = count;
>> +	ser->vm_token = 0; // It will be set during the kvm_gmem_luo_freeze()
>
> I don't think // is commonly used.

Will take care of such instances.

>
>> +
>> +	if (count > 0) {
>> +		folios_ser = vcalloc(count, sizeof(*folios_ser));
>> +		if (!folios_ser) {
>> +			err = -ENOMEM;
>> +			goto err_free_ser;
>> +		}
>> +
>> +		/* Second pass: Fill the metadata array and preserve folios */
>> +		err = kvm_gmem_luo_walk_folios(mapping, end_index, folios_ser, &count);
>
> I think it's clearer to just define 2 functions rather than using the
> same function twice to do these different things. The comments on the
> two passes can then be dropped.

I used a single function to avoid duplicating the same code.
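
The pattern, reduced to a userspace sketch (hypothetical names; a plain
array with zero meaning "hole" stands in for the page cache): passing
NULL as the output only counts, and passing a buffer also fills it, so
both passes share one definition of what is present.

```c
#include <stddef.h>
#include <stdint.h>

struct mock_entry {
	uint64_t pfn;
	uint64_t index;
};

/*
 * Walks the populated slots of a sparse table. With out == NULL it only
 * counts; otherwise it also records each populated slot. The loop body
 * is shared, so both passes agree on what counts as "present".
 */
static size_t walk_present(const uint64_t *pfns, size_t nr_slots,
			   struct mock_entry *out)
{
	size_t count = 0;
	size_t i;

	for (i = 0; i < nr_slots; i++) {
		if (!pfns[i])
			continue;	/* hole: nothing cached at this index */
		if (out) {
			out[count].pfn = pfns[i];
			out[count].index = i;
		}
		count++;
	}
	return count;
}
```

The caller runs it once with NULL to size the array, allocates, and runs
it again with the array; comparing the two counts catches any change in
between.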

>
>> +		if (err)
>> +			goto err_unpreserve_unlocked;
>> +
>> +		if (WARN_ON_ONCE(count != ser->nr_folios)) {
>> +			err = -EINVAL;
>> +			goto err_unpreserve_unlocked;
>> +		}
>> +	}
>> +
>> +	if (count > 0) {
>> +		err = kho_preserve_vmalloc(folios_ser, &ser->folios);
>> +		if (err)
>> +			goto err_unpreserve_unlocked;
>> +	}
>> +
>> +	args->serialized_data = virt_to_phys(ser);
>> +	args->private_data = folios_ser;
>> +
>> +	return 0;
>> +
>> +err_unpreserve_unlocked:
>> +	for (long i = (long)count - 1; i >= 0; i--) {
>
> Not sure if it's common to define long i inline.

I will correct it. Good find, thanks.

>
>> +		struct folio *folio = pfn_folio(folios_ser[i].pfn);
>> +
>> +		kho_unpreserve_folio(folio);
>> +	}
>> +	vfree(folios_ser);
>> +err_free_ser:
>> +	kho_unpreserve_free(ser);
>> +err_unfreeze_inode:
>> +	kvm_gmem_freeze(inode, false);
>> +	return err;
>> +}
>> +
>> +static int kvm_gmem_luo_freeze(struct liveupdate_file_op_args *args)
>> +{
>> +	struct guest_memfd_luo_ser *ser;
>> +	struct gmem_file *gmem_file;
>> +	struct kvm *kvm;
>> +	struct file *kvm_file;
>> +	u64 vm_token;
>> +	int err;
>> +
>> +	if (WARN_ON_ONCE(!args->serialized_data))
>> +		return -EINVAL;
>> +
>> +	ser = phys_to_virt(args->serialized_data);
>> +
>> +	gmem_file = args->file->private_data;
>> +	kvm = gmem_file->kvm;
>> +
>> +	/*
>> +	 * Obtain a strong reference to kvm->vm_file to prevent the SLAB_TYPESAFE_BY_RCU
>> +	 * file memory from being reallocated while it is being processed.
>> +	 */
>> +	kvm_file = get_file_active(&kvm->vm_file);
>> +	if (!kvm_file)
>> +		return -ENOENT;
>> +
>> +	err = liveupdate_get_token_outgoing(args->session, kvm_file, &vm_token);
>> +	fput(kvm_file);
>> +	if (err)
>> +		return err;
>> +
>> +	ser->vm_token = vm_token;
>> +	return 0;
>> +}
>> +
>> +static void kvm_gmem_luo_discard_folios(
>> +	const struct guest_memfd_luo_folio_ser *folios_ser,
>> +	u64 nr_folios, u64 start_idx)
>> +{
>> +	long i;
>> +
>> +	for (i = start_idx; i < nr_folios; i++) {
>> +		struct folio *folio;
>> +		phys_addr_t phys;
>> +
>> +		if (!folios_ser[i].pfn)
>> +			continue;
>> +
>> +		phys = PFN_PHYS(folios_ser[i].pfn);
>> +		folio = kho_restore_folio(phys);
>> +		if (folio)
>> +			folio_put(folio);
>> +	}
>> +}
>> +
>> +static void kvm_gmem_luo_unpreserve(struct liveupdate_file_op_args *args)
>> +{
>> +	struct guest_memfd_luo_folio_ser *folios_ser = args->private_data;
>> +	struct guest_memfd_luo_ser *ser;
>> +	long i;
>> +
>> +	if (WARN_ON_ONCE(!args->serialized_data))
>> +		return;
>> +
>> +	ser = phys_to_virt(args->serialized_data);
>> +	if (!ser)
>> +		return;
>> +
>> +	if (ser->nr_folios > 0)
>> +		kho_unpreserve_vmalloc(&ser->folios);
>> +	for (i = ser->nr_folios - 1; i >= 0; i--) {
>> +		struct folio *folio;
>> +
>> +		if (!folios_ser[i].pfn)
>
> Is it possible for pfn to be 0 here? Perhaps this should be a
> WARN_ON_ONCE().

In the LUO design, we explicitly assume that pfn 0 is invalid.
Looping in +Pratyush for more context.

Yes, WARN_ON_ONCE makes sense. I will update it.
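
Roughly, the unwind loop would then look like the userspace sketch
below. warn_once_zero_pfn() is a hypothetical stand-in for
WARN_ON_ONCE(), and the counter replaces the actual
kho_unpreserve_folio() call:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct mock_folio_ser {
	uint64_t pfn;
};

/* Userspace stand-in for WARN_ON_ONCE(): reports the first hit only. */
static bool warned_zero_pfn;

static bool warn_once_zero_pfn(bool cond)
{
	if (cond && !warned_zero_pfn) {
		warned_zero_pfn = true;
		fprintf(stderr, "unexpected pfn 0 in serialized folio array\n");
	}
	return cond;
}

/* Unwinds entries in reverse order; returns how many were released. */
static long unwind_folios(const struct mock_folio_ser *ser, long nr)
{
	long released = 0;
	long i;

	/* Signed index so that the i >= 0 condition can terminate. */
	for (i = nr - 1; i >= 0; i--) {
		if (warn_once_zero_pfn(ser[i].pfn == 0))
			continue;	/* pfn 0 is treated as "no folio" */
		released++;		/* kernel code would unpreserve here */
	}
	return released;
}
```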

>
>> +			continue;
>> +
>> +		folio = pfn_folio(folios_ser[i].pfn);
>> +		kho_unpreserve_folio(folio);
>> +	}
>> +	vfree(folios_ser);
>> +
>> +	kho_unpreserve_free(ser);
>> +	kvm_gmem_freeze(file_inode(args->file), false);
>> +}
>> +
>>
>> [...snip...]
>>
Thanks,
Tarun




* Re: [PATCH v3 1/2] dt-bindings: iio: dac: Add AD5529R
From: Jonathan Cameron @ 2026-06-23 15:08 UTC (permalink / raw)
  To: Conor Dooley
  Cc: Nuno Sá, Rodrigo Alencar, Janani Sunil, Janani Sunil,
	Lars-Peter Clausen, Michael Hennerich, David Lechner,
	Nuno Sá, Andy Shevchenko, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, Philipp Zabel, Jonathan Corbet, Shuah Khan,
	linux-iio, devicetree, linux-kernel, linux-doc, Mark Brown
In-Reply-To: <20260622-captive-tux-067efd31ceac@spud>

On Mon, 22 Jun 2026 19:39:43 +0100
Conor Dooley <conor@kernel.org> wrote:

> On Mon, Jun 22, 2026 at 05:29:11PM +0100, Jonathan Cameron wrote:
> > > > > Yeah. It's not clear to me how that works for the microchip devices
> > > > > (I suspect it doesn't!)
> > > > > 
> > > > > Just thinking as I type, but could we do something a bit nasty with
> > > > > a gpio mux that doesn't actually switch but represents the GPIO being
> > > > > shared?  Given this is all tied to the spi bus that should all happen
> > > > > under serializing locks. 
> > > > > 
> > > > > Agreed though that this would be nicer as an SPI thing that let
> > > > > us specify that a single CS is share by multiple devices and their
> > > > > is some other signal acting to select which one we are talking to.
> > > > >     
> > > > 
> > > > If the device-addressing on the same chip-select is to be handled
> > > > by the spi framework, wouldn't we lose device-specific features?
> > > > 
> > > > I understand that this multi-device feature is there mostly to extend the
> > > > channel count from 16 to 32, 48 or 64. I suppose the command:
> > > > 
> > > > 	"MULTI DEVICE SW LDAC MODE"
> > > > 
> > > > exists so that software can update channel values accross multiple devices.    
> > > 
> > > Right! You do have a point! I agree the main driver for a feature like
> > > this is likely to extend the channel count and effectively "aggregate"
> > > devices.
> > > 
> > > But I would say that even with the spi solution the MULTI DEVICE stuff
> > > should be doable (as we still need a sort of adi,pin-id property). 
> > > 
> > > But yes, I do feel that the whole feature is for aggregation so seeing
> > > one device with 32 channels is the expectation here? Rather than seeing
> > > two devices with 16 channels.  
> > 
> > Agreed - if we have messages that address both devices at once that needs
> > to be a unified driver and given they are about triggering simultaneous
> > update of all channels it needs to look like one big device.
> > This ends up similar to how we handle daisy chain devices.
> > 
> > The question of what to do on devices that don't have this feature
> > is rather different. Good thing you read the datasheet :)  
> 
> I'm not sure it really is, the intent for the microchip devices I think
> is pretty similar. The mcp3911 datasheet cites three-phase power
> metering using three devices as a typical use-case, for example.
> Probably creating an amalgamated device is a good fit there too?
> 
> I assume an amalgamated device for this ADI product means per-channel ID
> properties?

Per device as they are 8 channel each, but effectively the same.
(microchip devices are 2 channels each I think)

> If so, I think they should be made generic and the Microchip
> products retrofitted to use them, with a fallback to the proprietary
> property. Not going to ask for the support for multiple devices in those
> drivers, since the current way doesn't work and there'd be no loss of
> support. Someone from Microchip can do that. The proprietary property
> to generic conversion should be straightforward and provides weight to
> an argument for this being generic, since that'd be three devices that
> can all share?

I think the practical difference is that for the ADI part we have to support
it as a single device as some commands effectively don't use the address
bits in the message.  For the microchip ones we could register multiple
separate SPI devices and just provide one address for each.

Equally we could treat the microchip ones as a unified device in which case
the binding will be the same and there will be a property with a bunch
of addresses (one for each device present). I'm fine with this option given
as you say the datasheet calls out using this for channel expansion.

Jonathan





* Re: [PATCH] crypto: af_alg - Add af_alg_restrict sysctl, defaulting to 1
From: Luiz Augusto von Dentz @ 2026-06-23 15:04 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-crypto, Herbert Xu, linux-kernel, linux-doc,
	linux-bluetooth, iwd, linux-hardening, Milan Broz,
	Demi Marie Obenour, Andy Lutomirski
In-Reply-To: <20260622234803.6982-1-ebiggers@kernel.org>

Hi Eric,

On Mon, Jun 22, 2026 at 7:50 PM Eric Biggers <ebiggers@kernel.org> wrote:
>
> AF_ALG is a frequent source of vulnerabilities and a maintenance
> nightmare.  It exposes far more functionality to userspace than ever
> should have been exposed, especially to unprivileged processes.  Recent
> exploits have targeted kernel internal implementation details like
> "authencesn" that have zero use case for userspace access.
>
> Fortunately, AF_ALG is rarely used in practice, as userspace crypto
> libraries exist.  And when it is used, only some functionality is known
> to be used, and many users are known to hold capabilities already.
> iwd for example requires CAP_NET_ADMIN and has a known algorithm list
> (https://lore.kernel.org/linux-crypto/bcbbef00-5881-421b-8892-7be6c04b832d@gmail.com/).
>
> Thus, let's restrict the set of allowed algorithms by default, depending
> on the capabilities held.
>
> Add a sysctl /proc/sys/crypto/af_alg_restrict with meaning:
>
>     0: unrestricted
>     1: limited functionality
>     2: completely disabled
>
> Set the default value to 1, which enables an algorithm allowlist for
> unprivileged processes and a slightly longer allowlist for privileged
> processes.
>
> Note that the list may be tweaked in the future.  However, the common
> use cases such as iwd and bluez are taken into account already.  I've
> tested that iwd still works with the default value of 1.
>
> Signed-off-by: Eric Biggers <ebiggers@kernel.org>
> ---
>  Documentation/admin-guide/sysctl/crypto.rst | 36 +++++++++++
>  Documentation/crypto/userspace-if.rst       | 13 +++-
>  crypto/af_alg.c                             | 72 +++++++++++++++++++--
>  crypto/algif_aead.c                         | 11 ++++
>  crypto/algif_hash.c                         | 24 +++++++
>  crypto/algif_rng.c                          |  9 +++
>  crypto/algif_skcipher.c                     | 20 ++++++
>  include/crypto/if_alg.h                     |  8 +++
>  8 files changed, 184 insertions(+), 9 deletions(-)
>
> diff --git a/Documentation/admin-guide/sysctl/crypto.rst b/Documentation/admin-guide/sysctl/crypto.rst
> index b707bd314a64..9a1bd53287f4 100644
> --- a/Documentation/admin-guide/sysctl/crypto.rst
> +++ b/Documentation/admin-guide/sysctl/crypto.rst
> @@ -5,10 +5,46 @@
>  These files show up in ``/proc/sys/crypto/``, depending on the
>  kernel configuration:
>
>  .. contents:: :local:
>
> +.. _af_alg_restrict:
> +
> +af_alg_restrict
> +===============
> +
> +Controls the level of restriction of AF_ALG.
> +
> +AF_ALG is a deprecated and rarely-used userspace interface that is a
> +frequent source of vulnerabilities. It also unnecessarily exposes a
> +large number of kernel implementation details. For more information
> +about AF_ALG, see :ref:`Documentation/crypto/userspace-if.rst
> +<crypto_userspace_interface>`.
> +
> +Starting in Linux v7.3, AF_ALG supports only a limited set of
> +algorithms by default. This sysctl allows the system administrator to
> +remove this restriction when needed for compatibility reasons, or to
> +go further and disable AF_ALG entirely. The default value is 1.
> +
> +===  ==================================================================
> +0    AF_ALG is unrestricted.
> +
> +1    AF_ALG is supported with a limited list of algorithms. The list
> +     is designed for compatibility with known users such as iwd and
> +     bluez that haven't yet been fixed to use userspace crypto code.

Is the expectation that we go shopping for a userspace crypto library here?
That would mean the Bluetooth stack uses two different crypto stacks:
the kernel for the module parts and a userspace library for components
like bluetoothd, resulting in two different sources to audit. So, while
I agree we should restrict it to just the algorithms in use and require
capabilities, I don't think we should assume the current userspace
users are broken and need fixing.

> +     Specifically, there is an allowlist for unprivileged processes
> +     and a somewhat longer allowlist for processes that hold
> +     CAP_SYS_ADMIN or CAP_NET_ADMIN in the initial user namespace.
> +
> +     Attempts to bind() an AF_ALG socket with a disallowed algorithm
> +     fail with ENOENT.
> +
> +2    AF_ALG is completely disabled. Attempts to create an AF_ALG
> +     socket fail with EAFNOSUPPORT.
> +===  ==================================================================
> +
>  fips_enabled
>  ============
>
>  Read-only flag that indicates whether FIPS mode is enabled.
>
> diff --git a/Documentation/crypto/userspace-if.rst b/Documentation/crypto/userspace-if.rst
> index ab93300c8e04..d6194346e366 100644
> --- a/Documentation/crypto/userspace-if.rst
> +++ b/Documentation/crypto/userspace-if.rst
> @@ -1,5 +1,7 @@
> +.. _crypto_userspace_interface:
> +
>  User Space Interface
>  ====================
>
>  Introduction
>  ------------
> @@ -10,13 +12,18 @@ code.
>
>  AF_ALG is insecure and is deprecated. Originally added to the kernel in 2010,
>  most kernel developers now consider it to be a mistake. Support for hardware
>  accelerators, which was the original purpose of AF_ALG, has been removed.
>
> -AF_ALG continues to be supported only for backwards compatibility. On systems
> -where no programs using AF_ALG remain, the support for it should be disabled by
> -disabling ``CONFIG_CRYPTO_USER_API_*``.
> +AF_ALG continues to be supported only for backwards compatibility.
> +
> +Starting in Linux v7.3, the set of algorithms supported by AF_ALG is limited by
> +default. See :ref:`/proc/sys/crypto/af_alg_restrict <af_alg_restrict>`.
> +
> +On systems where no programs using AF_ALG remain, the support for it should be
> +disabled entirely by setting ``/proc/sys/crypto/af_alg_restrict`` to 2 or by
> +disabling ``CONFIG_CRYPTO_USER_API_*`` in the kernel configuration.
>
>  Deprecation
>  -----------
>
>  AF_ALG was originally intended to provide userspace programs access to crypto
> diff --git a/crypto/af_alg.c b/crypto/af_alg.c
> index cce000e8590e..34b801568fba 100644
> --- a/crypto/af_alg.c
> +++ b/crypto/af_alg.c
> @@ -6,10 +6,11 @@
>   *
>   * Copyright (c) 2010 Herbert Xu <herbert@gondor.apana.org.au>
>   */
>
>  #include <linux/atomic.h>
> +#include <linux/capability.h>
>  #include <crypto/if_alg.h>
>  #include <linux/crypto.h>
>  #include <linux/init.h>
>  #include <linux/kernel.h>
>  #include <linux/key.h>
> @@ -20,14 +21,32 @@
>  #include <linux/rwsem.h>
>  #include <linux/sched.h>
>  #include <linux/sched/signal.h>
>  #include <linux/security.h>
>  #include <linux/string.h>
> +#include <linux/sysctl.h>
> +#include <linux/user_namespace.h>
>  #include <keys/user-type.h>
>  #include <keys/trusted-type.h>
>  #include <keys/encrypted-type.h>
>
> +static int af_alg_restrict = 1;
> +
> +static const struct ctl_table af_alg_table[] = {
> +       {
> +               .procname       = "af_alg_restrict",
> +               .data           = &af_alg_restrict,
> +               .maxlen         = sizeof(int),
> +               .mode           = 0644,
> +               .proc_handler   = proc_dointvec_minmax,
> +               .extra1         = SYSCTL_ZERO,
> +               .extra2         = SYSCTL_TWO,
> +       },
> +};
> +
> +static struct ctl_table_header *af_alg_header;
> +
>  struct alg_type_list {
>         const struct af_alg_type *type;
>         struct list_head list;
>  };
>
> @@ -108,10 +127,43 @@ int af_alg_unregister_type(const struct af_alg_type *type)
>
>         return err;
>  }
>  EXPORT_SYMBOL_GPL(af_alg_unregister_type);
>
> +static bool af_alg_capable(void)
> +{
> +       return ns_capable_noaudit(&init_user_ns, CAP_NET_ADMIN) ||
> +              capable(CAP_SYS_ADMIN);
> +}
> +
> +int af_alg_check_restriction(const char *name,
> +                            const struct af_alg_allowlist_entry allowlist[])
> +{
> +       int level = READ_ONCE(af_alg_restrict);
> +
> +       if (level == 0)
> +               return 0;
> +       if (level == 1) {
> +               for (const struct af_alg_allowlist_entry *ent = allowlist;
> +                    ent->name; ent++) {
> +                       if (strcmp(name, ent->name) == 0 &&
> +                           (!ent->privileged || af_alg_capable()))
> +                               return 0;
> +               }
> +       }
> +       /*
> +        * Use -ENOENT (the error code for "algorithm not found") instead of
> +        * -EACCES or -EPERM, for the highest chance of correctly triggering
> +        * fallback code paths in userspace programs.
> +        *
> +        * Don't log a warning, since it would be noisy.  iwd tries to bind a
> +        * bunch of algorithms that it never uses.
> +        */
> +       return -ENOENT;
> +}
> +EXPORT_SYMBOL_GPL(af_alg_check_restriction);
> +
>  static void alg_do_release(const struct af_alg_type *type, void *private)
>  {
>         if (!type)
>                 return;
>
> @@ -504,10 +556,13 @@ static int alg_create(struct net *net, struct socket *sock, int protocol,
>                       int kern)
>  {
>         struct sock *sk;
>         int err;
>
> +       if (READ_ONCE(af_alg_restrict) == 2)
> +               return -EAFNOSUPPORT;
> +
>         if (sock->type != SOCK_SEQPACKET)
>                 return -ESOCKTNOSUPPORT;
>         if (protocol != 0)
>                 return -EPROTONOSUPPORT;
>
> @@ -1220,31 +1275,36 @@ int af_alg_get_rsgl(struct sock *sk, struct msghdr *msg, int flags,
>  }
>  EXPORT_SYMBOL_GPL(af_alg_get_rsgl);
>
>  static int __init af_alg_init(void)
>  {
> -       int err = proto_register(&alg_proto, 0);
> +       int err;
> +
> +       af_alg_header = register_sysctl("crypto", af_alg_table);
>
> +       err = proto_register(&alg_proto, 0);
>         if (err)
> -               goto out;
> +               goto out_unregister_sysctl;
>
>         err = sock_register(&alg_family);
> -       if (err != 0)
> +       if (err)
>                 goto out_unregister_proto;
>
> -out:
> -       return err;
> +       return 0;
>
>  out_unregister_proto:
>         proto_unregister(&alg_proto);
> -       goto out;
> +out_unregister_sysctl:
> +       unregister_sysctl_table(af_alg_header);
> +       return err;
>  }
>
>  static void __exit af_alg_exit(void)
>  {
>         sock_unregister(PF_ALG);
>         proto_unregister(&alg_proto);
> +       unregister_sysctl_table(af_alg_header);
>  }
>
>  module_init(af_alg_init);
>  module_exit(af_alg_exit);
>  MODULE_DESCRIPTION("Crypto userspace interface");
> diff --git a/crypto/algif_aead.c b/crypto/algif_aead.c
> index 787aac8aeb24..b9217f9086aa 100644
> --- a/crypto/algif_aead.c
> +++ b/crypto/algif_aead.c
> @@ -32,10 +32,15 @@
>  #include <linux/mm.h>
>  #include <linux/module.h>
>  #include <linux/net.h>
>  #include <net/sock.h>
>
> +static const struct af_alg_allowlist_entry aead_allowlist[] = {
> +       { "ccm(aes)", true }, /* bluez */
> +       {},
> +};
> +
>  static inline bool aead_sufficient_data(struct sock *sk)
>  {
>         struct alg_sock *ask = alg_sk(sk);
>         struct sock *psk = ask->parent;
>         struct alg_sock *pask = alg_sk(psk);
> @@ -342,10 +347,16 @@ static struct proto_ops algif_aead_ops_nokey = {
>         .poll           =       af_alg_poll,
>  };
>
>  static void *aead_bind(const char *name)
>  {
> +       int err;
> +
> +       err = af_alg_check_restriction(name, aead_allowlist);
> +       if (err)
> +               return ERR_PTR(err);
> +
>         return crypto_alloc_aead(name, 0, AF_ALG_CRYPTOAPI_MASK);
>  }
>
>  static void aead_release(void *private)
>  {
> diff --git a/crypto/algif_hash.c b/crypto/algif_hash.c
> index 5452ad6c1506..a8d958d51ece 100644
> --- a/crypto/algif_hash.c
> +++ b/crypto/algif_hash.c
> @@ -14,10 +14,28 @@
>  #include <linux/mm.h>
>  #include <linux/module.h>
>  #include <linux/net.h>
>  #include <net/sock.h>
>
> +static const struct af_alg_allowlist_entry hash_allowlist[] = {
> +       { "cmac(aes)", true }, /* iwd, bluez */
> +       { "hmac(md5)", true }, /* iwd */
> +       { "hmac(sha1)", true }, /* iwd */
> +       { "hmac(sha224)", true }, /* iwd */
> +       { "hmac(sha256)", true }, /* iwd */
> +       { "hmac(sha384)", true }, /* iwd */
> +       { "hmac(sha512)", true }, /* iwd, sha512hmac */
> +       { "md4", true }, /* iwd */
> +       { "md5", true }, /* iwd */
> +       { "sha1", false }, /* iwd, iproute2 < 7.0 */
> +       { "sha224", true }, /* iwd */
> +       { "sha256", true }, /* iwd */
> +       { "sha384", true }, /* iwd */
> +       { "sha512", true }, /* iwd */
> +       {},
> +};
> +
>  struct hash_ctx {
>         struct af_alg_sgl sgl;
>
>         u8 *result;
>
> @@ -380,10 +398,16 @@ static struct proto_ops algif_hash_ops_nokey = {
>         .accept         =       hash_accept_nokey,
>  };
>
>  static void *hash_bind(const char *name)
>  {
> +       int err;
> +
> +       err = af_alg_check_restriction(name, hash_allowlist);
> +       if (err)
> +               return ERR_PTR(err);
> +
>         return crypto_alloc_ahash(name, 0, AF_ALG_CRYPTOAPI_MASK);
>  }
>
>  static void hash_release(void *private)
>  {
> diff --git a/crypto/algif_rng.c b/crypto/algif_rng.c
> index 4dfe7899f8fa..bd522915d56d 100644
> --- a/crypto/algif_rng.c
> +++ b/crypto/algif_rng.c
> @@ -48,10 +48,14 @@
>
>  MODULE_LICENSE("GPL");
>  MODULE_AUTHOR("Stephan Mueller <smueller@chronox.de>");
>  MODULE_DESCRIPTION("User-space interface for random number generators");
>
> +static const struct af_alg_allowlist_entry rng_allowlist[] = {
> +       {},
> +};
> +
>  struct rng_ctx {
>  #define MAXSIZE 128
>         unsigned int len;
>         struct crypto_rng *drng;
>         u8 *addtl;
> @@ -199,10 +203,15 @@ static struct proto_ops __maybe_unused algif_rng_test_ops = {
>
>  static void *rng_bind(const char *name)
>  {
>         struct rng_parent_ctx *pctx;
>         struct crypto_rng *rng;
> +       int err;
> +
> +       err = af_alg_check_restriction(name, rng_allowlist);
> +       if (err)
> +               return ERR_PTR(err);
>
>         pctx = kzalloc_obj(*pctx);
>         if (!pctx)
>                 return ERR_PTR(-ENOMEM);
>
> diff --git a/crypto/algif_skcipher.c b/crypto/algif_skcipher.c
> index df20bdfe1f1f..2b8069667974 100644
> --- a/crypto/algif_skcipher.c
> +++ b/crypto/algif_skcipher.c
> @@ -32,10 +32,24 @@
>  #include <linux/mm.h>
>  #include <linux/module.h>
>  #include <linux/net.h>
>  #include <net/sock.h>
>
> +static const struct af_alg_allowlist_entry skcipher_allowlist[] = {
> +       { "adiantum(xchacha12,aes)", false }, /* cryptsetup */
> +       { "adiantum(xchacha20,aes)", false }, /* cryptsetup */
> +       { "cbc(aes)", true }, /* iwd */
> +       { "cbc(des)", true }, /* iwd */
> +       { "cbc(des3_ede)", true }, /* iwd */
> +       { "ctr(aes)", true }, /* iwd */
> +       { "ecb(aes)", true }, /* iwd, bluez */
> +       { "ecb(des)", true }, /* iwd */
> +       { "hctr2(aes)", false }, /* cryptsetup */
> +       { "xts(aes)", false }, /* cryptsetup benchmark */
> +       {},
> +};
> +
>  static int skcipher_sendmsg(struct socket *sock, struct msghdr *msg,
>                             size_t size)
>  {
>         struct sock *sk = sock->sk;
>         struct alg_sock *ask = alg_sk(sk);
> @@ -307,10 +321,16 @@ static struct proto_ops algif_skcipher_ops_nokey = {
>         .poll           =       af_alg_poll,
>  };
>
>  static void *skcipher_bind(const char *name)
>  {
> +       int err;
> +
> +       err = af_alg_check_restriction(name, skcipher_allowlist);
> +       if (err)
> +               return ERR_PTR(err);
> +
>         return crypto_alloc_skcipher(name, 0, AF_ALG_CRYPTOAPI_MASK);
>  }
>
>  static void skcipher_release(void *private)
>  {
> diff --git a/include/crypto/if_alg.h b/include/crypto/if_alg.h
> index 7643ba954125..4e9ed8e73403 100644
> --- a/include/crypto/if_alg.h
> +++ b/include/crypto/if_alg.h
> @@ -159,13 +159,21 @@ struct af_alg_ctx {
>         unsigned int len;
>
>         unsigned int inflight;
>  };
>
> +struct af_alg_allowlist_entry {
> +       const char *name;
> +       bool privileged;
> +};
> +
>  int af_alg_register_type(const struct af_alg_type *type);
>  int af_alg_unregister_type(const struct af_alg_type *type);
>
> +int af_alg_check_restriction(const char *name,
> +                            const struct af_alg_allowlist_entry allowlist[]);
> +
>  int af_alg_release(struct socket *sock);
>  void af_alg_release_parent(struct sock *sk);
>  int af_alg_accept(struct sock *sk, struct socket *newsock,
>                   struct proto_accept_arg *arg);
>
>
> base-commit: 1dc18801be29bc54709aa355b8acd80e183b03cd
> --
> 2.54.0
>
>
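
For anyone following the allowlist semantics without the tree at hand, here is
a minimal userspace sketch of the check that af_alg_check_restriction() in the
quoted patch performs. It is a model, not the kernel code: the sysctl value and
the capability test are passed in as plain parameters (level, capable), the
names check_restriction() and struct allowlist_entry are made up for this
sketch, and the table is only a subset of the patch's hash_allowlist.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Mirrors struct af_alg_allowlist_entry from the quoted patch. */
struct allowlist_entry {
	const char *name;	/* NULL terminates the list */
	bool privileged;	/* true: only for callers holding the capability */
};

/* Subset of the patch's hash_allowlist, for illustration only. */
static const struct allowlist_entry hash_allowlist[] = {
	{ "cmac(aes)", true },
	{ "hmac(sha256)", true },
	{ "sha1", false },
	{ NULL, false },
};

/*
 * Userspace model of the bind()-time check.  'level' stands in for the
 * af_alg_restrict sysctl and 'capable' for af_alg_capable(); both are
 * plain parameters here (an assumption of this sketch) so the logic can
 * be exercised without a kernel.
 */
static int check_restriction(int level, bool capable, const char *name,
			     const struct allowlist_entry *allowlist)
{
	assert(name && allowlist);

	if (level == 0)
		return 0;	/* unrestricted */
	if (level == 1) {
		for (const struct allowlist_entry *ent = allowlist;
		     ent->name; ent++) {
			if (strcmp(name, ent->name) == 0 &&
			    (!ent->privileged || capable))
				return 0;
		}
	}
	/* Not on the list, not privileged enough, or level 2. */
	return -ENOENT;
}
```

With level 1, an unprivileged caller gets "sha1" but -ENOENT for
"hmac(sha256)"; the same lookup succeeds once capable is true.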


-- 
Luiz Augusto von Dentz


* Re: [PATCH][v2] mm/dmapool: Untangle CONFIG_SLUB_DEBUG_ON abuse and switch to static key
From: Liam R . Howlett @ 2026-06-23 14:58 UTC (permalink / raw)
  To: lirongqing
  Cc: Jonathan Corbet, Shuah Khan, Andrew Morton, David Hildenbrand,
	Lorenzo Stoakes, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, linux-doc, linux-kernel,
	linux-mm
In-Reply-To: <20260623121245.2221-1-lirongqing@baidu.com>


When generating new versions of a patch, using git format-patch -v[#]
will make the subject [PATCH v2], as opposed to appending v2 after the
[PATCH] as it looks like you did.

On 26/06/23 08:12PM, lirongqing wrote:
> From: Li RongQing <lirongqing@baidu.com>
> 
> The dmapool subsystem historically wrapped its debugging logic inside an
> This approach is fundamentally flawed because CONFIG_SLUB_DEBUG_ON
> merely defines compile-time defaults for SLUB and caused two flaws:

This looks truncated? "inside an This approach.."

> 
> On production kernels where CONFIG_SLUB_DEBUG=y but
> CONFIG_SLUB_DEBUG_ON=n, dmapool debugging was completely compiled out
> at compile time, leaving no way to enable it without rebuilding the
> kernel.
> 
> On kernels with CONFIG_SLUB_DEBUG_ON=y, dmapool debugging stayed
> unconditionally active even if a user explicitly disabled slub debugging
> at boot time.
> 
> Clean up this mess by removing the #ifdef and switching to a runtime
> static key (dmapool_debug_enabled), allowing dmapool debugging to be
> toggled cleanly via its own boot parameter: dmapool_debug
> 
> Suggested-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
> Signed-off-by: Li RongQing <lirongqing@baidu.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: David Hildenbrand <david@kernel.org>
> Cc: Lorenzo Stoakes <ljs@kernel.org>
> Cc: Liam R. Howlett <liam@infradead.org>
> Cc: Vlastimil Babka <vbabka@kernel.org>
> Cc: Mike Rapoport <rppt@kernel.org>
> Cc: Suren Baghdasaryan <surenb@google.com>
> Cc: Michal Hocko <mhocko@suse.com>
> ---
> Diff with v1:  Move the static key check out of pool_init_page etc 

A link to v1 here is a good idea so that people can see the diff and the
conversation.

> 
>  Documentation/admin-guide/kernel-parameters.txt |  5 +++
>  mm/dmapool.c                                    | 57 ++++++++++++++-----------
>  2 files changed, 38 insertions(+), 24 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 19c9a19..66d853c 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -1304,6 +1304,11 @@ Kernel parameters
>  
>  	dis_ucode_ldr	[X86] Disable the microcode loader.
>  
> +	dmapool_debug	[MM]
> +			Enable DMA pool debugging. This enables memory
> +			poisoning and validation for DMA pool allocations.
> +			Useful for debugging DMA API misuse.
> +
>  	dma_debug=off	If the kernel is compiled with DMA_API_DEBUG support,
>  			this option disables the debugging code at boot.
>  
> diff --git a/mm/dmapool.c b/mm/dmapool.c
> index 5d8af6e..7bd037a 100644
> --- a/mm/dmapool.c
> +++ b/mm/dmapool.c
> @@ -35,10 +35,23 @@
>  #include <linux/string.h>
>  #include <linux/types.h>
>  #include <linux/wait.h>
> +#include <linux/static_key.h>
> +#include <linux/init.h>
>  
> -#ifdef CONFIG_SLUB_DEBUG_ON
> -#define DMAPOOL_DEBUG 1
> -#endif
> +/*
> + * Debugging support for dmapool using static key.
> + *
> + * This allows enabling dmapool debug at boot time via:
> + *   dmapool_debug
> + */
> +static DEFINE_STATIC_KEY_FALSE(dmapool_debug_enabled);
> +
> +static int __init dmapool_debug_setup(char *str)
> +{
> +	static_branch_enable(&dmapool_debug_enabled);
> +	return 1;
> +}
> +__setup("dmapool_debug", dmapool_debug_setup);
>  
>  struct dma_block {
>  	struct dma_block *next_block;
> @@ -92,7 +105,6 @@ static ssize_t pools_show(struct device *dev, struct device_attribute *attr, cha
>  
>  static DEVICE_ATTR_RO(pools);
>  
> -#ifdef DMAPOOL_DEBUG
>  static void pool_check_block(struct dma_pool *pool, struct dma_block *block,
>  			     gfp_t mem_flags)
>  {
> @@ -161,23 +173,6 @@ static void pool_init_page(struct dma_pool *pool, struct dma_page *page)
>  {
>  	memset(page->vaddr, POOL_POISON_FREED, pool->allocation);
>  }
> -#else
> -static void pool_check_block(struct dma_pool *pool, struct dma_block *block,
> -			     gfp_t mem_flags)
> -{
> -}
> -
> -static bool pool_block_err(struct dma_pool *pool, void *vaddr, dma_addr_t dma)
> -{
> -	if (want_init_on_free())
> -		memset(vaddr, 0, pool->size);
> -	return false;
> -}
> -
> -static void pool_init_page(struct dma_pool *pool, struct dma_page *page)
> -{
> -}
> -#endif
>  
>  static struct dma_block *pool_block_pop(struct dma_pool *pool)
>  {
> @@ -305,7 +300,9 @@ static void pool_initialise_page(struct dma_pool *pool, struct dma_page *page)
>  	unsigned int next_boundary = pool->boundary, offset = 0;
>  	struct dma_block *block, *first = NULL, *last = NULL;
>  
> -	pool_init_page(pool, page);
> +	if (static_branch_unlikely(&dmapool_debug_enabled))
> +		pool_init_page(pool, page);
> +
>  	while (offset + pool->size <= pool->allocation) {
>  		if (offset + pool->size > next_boundary) {
>  			offset = next_boundary;
> @@ -433,7 +430,10 @@ void *dma_pool_alloc(struct dma_pool *pool, gfp_t mem_flags,
>  	spin_unlock_irqrestore(&pool->lock, flags);
>  
>  	*handle = block->dma;
> -	pool_check_block(pool, block, mem_flags);
> +
> +	if (static_branch_unlikely(&dmapool_debug_enabled))
> +		pool_check_block(pool, block, mem_flags);
> +
>  	if (want_init_on_alloc(mem_flags))
>  		memset(block, 0, pool->size);
>  
> @@ -454,9 +454,18 @@ void dma_pool_free(struct dma_pool *pool, void *vaddr, dma_addr_t dma)
>  {
>  	struct dma_block *block = vaddr;
>  	unsigned long flags;
> +	bool err = false;
>  
>  	spin_lock_irqsave(&pool->lock, flags);
> -	if (!pool_block_err(pool, vaddr, dma)) {
> +
> +	if (static_branch_unlikely(&dmapool_debug_enabled))
> +		err = pool_block_err(pool, vaddr, dma);
> +	else {
> +		if (want_init_on_free())
> +			memset(vaddr, 0, pool->size);
> +	}
> +
> +	if (!err) {
>  		pool_block_push(pool, block, dma);
>  		pool->nr_active--;
>  	}
> -- 
> 2.9.4
> 
> 
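
As a reading aid for the dma_pool_free() hunk above, here is a small userspace
model of the control flow the patch ends up with. Everything in it is a
stand-in chosen for this sketch: struct pool_model, model_block_err() and
model_free() are hypothetical names, the debug flag replaces the
dmapool_debug_enabled static key, init_on_free replaces want_init_on_free(),
and the "already free" condition is passed in rather than detected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_POISON_FREED 0xa7	/* illustrative poison byte for this model */

struct pool_model {
	size_t size;		/* block size in bytes */
	bool debug;		/* stand-in for the dmapool_debug_enabled key */
	bool init_on_free;	/* stand-in for want_init_on_free() */
	unsigned int nr_active;	/* blocks currently handed out */
};

/* Stand-in for pool_block_err(): reject a bad free, otherwise poison. */
static bool model_block_err(struct pool_model *pool, unsigned char *vaddr,
			    bool already_free)
{
	if (already_free)
		return true;
	memset(vaddr, MODEL_POISON_FREED, pool->size);
	return false;
}

/* Returns true if the block was returned to the pool. */
static bool model_free(struct pool_model *pool, unsigned char *vaddr,
		       bool already_free)
{
	bool err = false;

	assert(pool && vaddr);

	if (pool->debug)
		err = model_block_err(pool, vaddr, already_free);
	else if (pool->init_on_free)
		memset(vaddr, 0, pool->size);

	if (!err)
		pool->nr_active--;
	return !err;
}
```

Note that in this model, as in the hunk, the init-on-free zeroing only happens
on the non-debug path; with debugging on, the block is poisoned instead.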


* Re: [PATCH v3 1/2] dt-bindings: iio: dac: Add AD5529R
From: Jonathan Cameron @ 2026-06-23 14:57 UTC (permalink / raw)
  To: Rodrigo Alencar
  Cc: Nuno Sá, Conor Dooley, Janani Sunil, Janani Sunil,
	Lars-Peter Clausen, Michael Hennerich, David Lechner,
	Nuno Sá, Andy Shevchenko, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, Philipp Zabel, Jonathan Corbet, Shuah Khan,
	linux-iio, devicetree, linux-kernel, linux-doc, Mark Brown
In-Reply-To: <5guhkvoxhkzevjm4b45hhk3772akpnz3givr3gb56x5ywamzyf@oiycreun4y62>

On Tue, 23 Jun 2026 09:09:14 +0100
Rodrigo Alencar <455.rodrigo.alencar@gmail.com> wrote:

> On 22/06/26 13:20, Nuno Sá wrote:
> > On Mon, Jun 22, 2026 at 12:51:20PM +0100, Rodrigo Alencar wrote:  
> > > On 22/06/26 11:29, Nuno Sá wrote:  
> > > > On Mon, Jun 22, 2026 at 10:24:05AM +0100, Rodrigo Alencar wrote:  
> > > > > On 21/06/26 15:33, Jonathan Cameron wrote:  
> > > > > > On Fri, 19 Jun 2026 16:54:11 +0100
> > > > > > Nuno Sá <noname.nuno@gmail.com> wrote:
> > > > > >   
> > > > > > > On Fri, Jun 19, 2026 at 03:12:07PM +0100, Conor Dooley wrote:  
> > > > > > > > On Fri, Jun 19, 2026 at 02:01:08PM +0100, Nuno Sá wrote:    
> > > > > > > > > On Fri, Jun 19, 2026 at 12:40:54PM +0100, Conor Dooley wrote:    
> > > > > > > > > > On Fri, Jun 19, 2026 at 12:36:55PM +0100, Conor Dooley wrote:    
> > > > > > > > > > > On Fri, Jun 19, 2026 at 12:33:11PM +0200, Janani Sunil wrote:    
> > > > > > > > > > > > 
> > > > > > > > > > > > On 6/14/26 21:44, Jonathan Cameron wrote:    
> > > > > > > > > > > > > On Tue, 9 Jun 2026 16:47:23 +0200
> > > > > > > > > > > > > Janani Sunil <jan.sun97@gmail.com> wrote:
> > > > > > > > > > > > >     
> > > > > > > > > > > > > > On 5/26/26 15:11, Rodrigo Alencar wrote:    
> > > > > > > > > > > > > > > On 26/05/19 05:42PM, Janani Sunil wrote:    
> > > > > > > > > > > > > > > > Devicetree bindings for AD5529R 16 channel 12/16 bit high voltage,
> > > > > > > > > > > > > > > > buffered voltage output digital-to-analog converter (DAC) with an
> > > > > > > > > > > > > > > > integrated precision reference.    
> > > > > > > > > > > > > > > ...
> > > > > > > > > > > > > > > Probably others may comment on that, but...
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > This parent node may support device addressing for multi-device support through
> > > > > > > > > > > > > > > those ID pins. I suppose that each device may have its own power supplies or
> > > > > > > > > > > > > > > other resources like the toggle pins or reset and enable.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > That way I suppose that an example would look like...    
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +patternProperties:
> > > > > > > > > > > > > > > > +  "^channel@([0-9]|1[0-5])$":
> > > > > > > > > > > > > > > > +    type: object
> > > > > > > > > > > > > > > > +    description: Child nodes for individual channel configuration
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +    properties:
> > > > > > > > > > > > > > > > +      reg:
> > > > > > > > > > > > > > > > +        description: Channel number.
> > > > > > > > > > > > > > > > +        minimum: 0
> > > > > > > > > > > > > > > > +        maximum: 15
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +      adi,output-range-microvolt:
> > > > > > > > > > > > > > > > +        description: |
> > > > > > > > > > > > > > > > +          Output voltage range for this channel as [min, max] in microvolts.
> > > > > > > > > > > > > > > > +          If not specified, defaults to 0V to 5V range.
> > > > > > > > > > > > > > > > +        oneOf:
> > > > > > > > > > > > > > > > +          - items:
> > > > > > > > > > > > > > > > +              - const: 0
> > > > > > > > > > > > > > > > +              - enum: [5000000, 10000000, 20000000, 40000000]
> > > > > > > > > > > > > > > > +          - items:
> > > > > > > > > > > > > > > > +              - const: -5000000
> > > > > > > > > > > > > > > > +              - const: 5000000
> > > > > > > > > > > > > > > > +          - items:
> > > > > > > > > > > > > > > > +              - const: -10000000
> > > > > > > > > > > > > > > > +              - const: 10000000
> > > > > > > > > > > > > > > > +          - items:
> > > > > > > > > > > > > > > > +              - const: -15000000
> > > > > > > > > > > > > > > > +              - const: 15000000
> > > > > > > > > > > > > > > > +          - items:
> > > > > > > > > > > > > > > > +              - const: -20000000
> > > > > > > > > > > > > > > > +              - const: 20000000
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +    required:
> > > > > > > > > > > > > > > > +      - reg
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +    additionalProperties: false
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +required:
> > > > > > > > > > > > > > > > +  - compatible
> > > > > > > > > > > > > > > > +  - reg
> > > > > > > > > > > > > > > > +  - vdd-supply
> > > > > > > > > > > > > > > > +  - avdd-supply
> > > > > > > > > > > > > > > > +  - hvdd-supply
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +dependencies:
> > > > > > > > > > > > > > > > +  spi-cpha: [ spi-cpol ]
> > > > > > > > > > > > > > > > +  spi-cpol: [ spi-cpha ]
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +allOf:
> > > > > > > > > > > > > > > > +  - $ref: /schemas/spi/spi-peripheral-props.yaml#
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +unevaluatedProperties: false
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +examples:
> > > > > > > > > > > > > > > > +  - |
> > > > > > > > > > > > > > > > +    #include <dt-bindings/gpio/gpio.h>
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +    spi {
> > > > > > > > > > > > > > > > +        #address-cells = <1>;
> > > > > > > > > > > > > > > > +        #size-cells = <0>;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +        dac@0 {
> > > > > > > > > > > > > > > > +            compatible = "adi,ad5529r-16";
> > > > > > > > > > > > > > > > +            reg = <0>;
> > > > > > > > > > > > > > > > +            spi-max-frequency = <25000000>;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +            vdd-supply = <&vdd_regulator>;
> > > > > > > > > > > > > > > > +            avdd-supply = <&avdd_regulator>;
> > > > > > > > > > > > > > > > +            hvdd-supply = <&hvdd_regulator>;
> > > > > > > > > > > > > > > > +            hvss-supply = <&hvss_regulator>;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +            reset-gpios = <&gpio0 87 GPIO_ACTIVE_LOW>;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +            #address-cells = <1>;
> > > > > > > > > > > > > > > > +            #size-cells = <0>;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +            channel@0 {
> > > > > > > > > > > > > > > > +                reg = <0>;
> > > > > > > > > > > > > > > > +                adi,output-range-microvolt = <0 5000000>;
> > > > > > > > > > > > > > > > +            };
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +            channel@1 {
> > > > > > > > > > > > > > > > +                reg = <1>;
> > > > > > > > > > > > > > > > +                adi,output-range-microvolt = <(-10000000) 10000000>;
> > > > > > > > > > > > > > > > +            };
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +            channel@2 {
> > > > > > > > > > > > > > > > +                reg = <2>;
> > > > > > > > > > > > > > > > +                adi,output-range-microvolt = <0 40000000>;
> > > > > > > > > > > > > > > > +            };
> > > > > > > > > > > > > > > > +        };
> > > > > > > > > > > > > > > > +    };    
> > > > > > > > > > > > > > > ...
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 	spi {
> > > > > > > > > > > > > > > 		#address-cells = <1>;
> > > > > > > > > > > > > > > 		#size-cells = <0>;
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 		multi-dac@0 {
> > > > > > > > > > > > > > > 			compatible = "adi,ad5529r-16";
> > > > > > > > > > > > > > > 			reg = <0>;
> > > > > > > > > > > > > > > 			spi-max-frequency = <25000000>;
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 			#address-cells = <1>;
> > > > > > > > > > > > > > > 			#size-cells = <0>;
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 			dac@0 {
> > > > > > > > > > > > > > > 				reg = <0>;
> > > > > > > > > > > > > > > 				vdd-supply = <&vdd_regulator>;
> > > > > > > > > > > > > > > 				avdd-supply = <&avdd_regulator>;
> > > > > > > > > > > > > > > 				hvdd-supply = <&hvdd_regulator>;
> > > > > > > > > > > > > > > 				hvss-supply = <&hvss_regulator>;
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 				reset-gpios = <&gpio0 87 GPIO_ACTIVE_LOW>;
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 				#address-cells = <1>;
> > > > > > > > > > > > > > > 				#size-cells = <0>;
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 				channel@0 {
> > > > > > > > > > > > > > > 					reg = <0>;
> > > > > > > > > > > > > > > 					adi,output-range-microvolt = <0 5000000>;
> > > > > > > > > > > > > > > 				};
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 				channel@1 {
> > > > > > > > > > > > > > > 					reg = <1>;
> > > > > > > > > > > > > > > 					adi,output-range-microvolt = <(-10000000) 10000000>;
> > > > > > > > > > > > > > > 				};
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 				channel@2 {
> > > > > > > > > > > > > > > 					reg = <2>;
> > > > > > > > > > > > > > > 					adi,output-range-microvolt = <0 40000000>;
> > > > > > > > > > > > > > > 				};
> > > > > > > > > > > > > > > > 			};
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 			dac@1 {
> > > > > > > > > > > > > > > 				reg = <1>;
> > > > > > > > > > > > > > > 				vdd-supply = <&vdd_regulator>;
> > > > > > > > > > > > > > > 				avdd-supply = <&avdd_regulator>;
> > > > > > > > > > > > > > > 				hvdd-supply = <&hvdd_regulator>;
> > > > > > > > > > > > > > > 				hvss-supply = <&hvss_regulator>;
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 				reset-gpios = <&gpio0 88 GPIO_ACTIVE_LOW>;
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 				#address-cells = <1>;
> > > > > > > > > > > > > > > 				#size-cells = <0>;
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 				channel@0 {
> > > > > > > > > > > > > > > 					reg = <0>;
> > > > > > > > > > > > > > > 					adi,output-range-microvolt = <0 5000000>;
> > > > > > > > > > > > > > > 				};
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 				channel@1 {
> > > > > > > > > > > > > > > 					reg = <1>;
> > > > > > > > > > > > > > > 					adi,output-range-microvolt = <(-10000000) 10000000>;
> > > > > > > > > > > > > > > 				};
> > > > > > > > > > > > > > > 			}
> > > > > > > > > > > > > > > 		};
> > > > > > > > > > > > > > > 	};
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > then you might need something like:
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 	patternProperties:
> > > > > > > > > > > > > > > 		"^dac@[0-3]$":
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > and put most of the things under this node pattern.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > So the main driver that you're putting together might need to handle up to four instances.
> > > > > > > > > > > > > > > > Even if your current driver cannot handle this, the dt-bindings might need to cover that.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > Need to double check if each dac node needs a separate compatible, so you would maybe populate
> > > > > > > > > > > > > > > a platform data to be shared with the child nodes, which would be a separate driver.
> > > > > > > > > > > > > > > (not sure if it would make sense to mix and match ad5529r-16 and ad5529r-12).    
> > > > > > > > > > > > > > Hi Rodrigo,
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > Thank you for looking at this.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > For now, I would prefer to keep the binding scoped to a single AD5529R device instance. The current
> > > > > > > > > > > > > > hardware/use case we have only needs one device node and the driver is written around that model as well.
> > > > > > > > > > > > > > While the device addressing pins could allow multi-device topology, we do not have an actual platform using
> > > > > > > > > > > > > > that configuration at the moment, so I would prefer not to introduce an extra parent/child binding structure
> > > > > > > > > > > > > > speculatively without a validating use case.    
> > > > > > > > > > > > > Interesting feature - kind of similar to address control on a typical i2c bus device, or
> > > > > > > > > > > > > looking at it another way a kind of distributed SPI mux.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Challenge of a binding is we need to anticipate the future.  So I think we do need something
> > > > > > > > > > > > > like Rodrigo is suggesting even if we only (for now) support a single instance in the driver.
> > > > > > > > > > > > > That would leave the path open to supporting the addressing at a later date.
> > > > > > > > > > > > > An alternative might be to look at it like a chained device setup. In those we pretend there
> > > > > > > > > > > > > is just one device with a lot of channels etc.  The snag is that here things are more loosely
> > > > > > > > > > > > > coupled whereas for those devices it tends to be you have to read / write the same register
> > > > > > > > > > > > > in all devices in the chain as one big SPI message.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > > +CC Mark Brown as he may know of some precedent for this feature. For his reference..
> > > > > > > > > > > > > > - Each of these devices has 2 ID pins.  The SPI transfers have to contain the 2 bit
> > > > > > > > > > > > > value that matches that or they are ignored.  Thus a single bus + 1 chip select can
> > > > > > > > > > > > > be used to talk to 4 devices.  Question is what that looks like in device tree + I guess
> > > > > > > > > > > > > longer term how to support it cleanly in SPI.    
> > > > > > > > > > > 
> > > > > > > > > > > I'd swear I have seen this before, from some Microchip devices. Let me
> > > > > > > > > > > see if I can find what I am thinking of...    
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > microchip,mcp3911 and microchip,mcp3564 both seem to do this with
> > > > > > > > > > slightly different properties.
> > > > > > > > > > 
> > > > > > > > > >   microchip,device-addr:
> > > > > > > > > >     description: Device address when multiple MCP3911 chips are present on the same SPI bus.
> > > > > > > > > >     $ref: /schemas/types.yaml#/definitions/uint32
> > > > > > > > > >     enum: [0, 1, 2, 3]
> > > > > > > > > >     default: 0
> > > > > > > > > > 
> > > > > > > > > > and
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > >   microchip,hw-device-address:
> > > > > > > > > >     $ref: /schemas/types.yaml#/definitions/uint32
> > > > > > > > > >     minimum: 0
> > > > > > > > > >     maximum: 3
> > > > > > > > > >     description:
> > > > > > > > > >       The address is set on a per-device basis by fuses in the factory,
> > > > > > > > > >       configured on request. If not requested, the fuses are set for 0x1.
> > > > > > > > > >       The device address is part of the device markings to avoid
> > > > > > > > > >       potential confusion. This address is coded on two bits, so four possible
> > > > > > > > > >       addresses are available when multiple devices are present on the same
> > > > > > > > > >       SPI bus with only one Chip Select line for all devices.
> > > > > > > > > >       Each device communication starts by a CS falling edge, followed by the
> > > > > > > > > >       clocking of the device address (BITS[7:6] - top two bits of COMMAND BYTE
> > > > > > > > > >       which is first one on the wire).
> > > > > > > > > > 
> > > > > > > > > > This sounds exactly like the sort of feature that you're dealing with
> > > > > > > > > > here?
> > > > > > > > > >     
> > > > > > > > > 
> > > > > > > > > The core idea yes but for this chip, things are a bit more annoying (but
> > > > > > > > > Janani can correct me if I'm wrong). Here, each device can, in theory,
> > > > > > > > > > have its own supplies, pins and at the very least, channels with maybe
> > > > > > > > > different scales. That is why Janani is proposing dac nodes. Given I
> > > > > > > > > honestly don't like much of that "adi,ad5529r-bus" compatible I wondered
> > > > > > > > > about solving this at the spi level.
> > > > > > > > > 
> > > > > > > > > Ah and to make it more annoying, we can also mix 12 and 16 bits variants
> > > > > > > > > together in the same bus.    
> > > > > > > > 
> > > > > > > > I'm definitely missing something, because that property for the
> > > > > > > > > microchip devices is not impacted by what else is on the bus. AFAICT, you
> > > > > > > > could have an mcp3911 and an mcp3564 on the same bus even though both
> > > > > > > > are completely different devices with different drivers. They have
> > > > > > > > individual device nodes and their own supplies etc etc. These aren't
> > > > > > > > per-channel properties on an adc or dac, they're per child device on a
> > > > > > > > spi bus.    
> > > > > > > 
> > > > > > > Maybe I'm the one missing something :). IIRC, spi would not allow two
> > > > > > > devices on the same CS right? Because for this chip we would need
> > > > > > > something like:
> > > > > > > 
> > > > > > > spi {
> > > > > > > 	dac@0 {
> > > > > > > 		reg = <0>;
> > > > > > > 		adi,pin-id = <0>;
> > > > > > > 	};
> > > > > > > 
> > > > > > > 	dac@1 {
> > > > > > > 		reg = <0>; // which seems already problematic?
> > > > > > > > 		adi,pin-id = <1>;
> > > > > > > 	};
> > > > > > > 
> > > > > > > 	...
> > > > > > > 
> > > > > > > 	//up to 4
> > > > > > > };  
> > > > > > Yeah. It's not clear to me how that works for the microchip devices
> > > > > > (I suspect it doesn't!)
> > > > > > 
> > > > > > Just thinking as I type, but could we do something a bit nasty with
> > > > > > a gpio mux that doesn't actually switch but represents the GPIO being
> > > > > > shared?  Given this is all tied to the spi bus that should all happen
> > > > > > under serializing locks. 
> > > > > > 
> > > > > > Agreed though that this would be nicer as an SPI thing that let
> > > > > > > us specify that a single CS is shared by multiple devices and there
> > > > > > > is some other signal acting to select which one we are talking to.
> > > > > >   
> > > > > 
> > > > > If the device-addressing on the same chip-select is to be handled
> > > > > by the spi framework, wouldn't we lose device-specific features?
> > > > > 
> > > > > I understand that this multi-device feature is there mostly to extend the
> > > > > channel count from 16 to 32, 48 or 64. I suppose the command:
> > > > > 
> > > > > 	"MULTI DEVICE SW LDAC MODE"
> > > > > 
> > > > > > exists so that software can update channel values across multiple devices.  
> > > > 
> > > > Right! You do have a point! I agree the main driver for a feature like
> > > > this is likely to extend the channel count and effectively "aggregate"
> > > > devices.
> > > > 
> > > > But I would say that even with the spi solution the MULTI DEVICE stuff
> > > > should be doable (as we still need a sort of adi,pin-id property).   
> > > 
> > > I don't think we can have something like an IIO buffer shared by multiple
> > > devices. Synchronizing separate devices would be doable with proper hardware
> > > > support for this (probably involving an FPGA).  
> > 
> > True!
> >   
> > >    
> > > > But yes, I do feel that the whole feature is for aggregation so seeing
> > > > one device with 32 channels is the expectation here? Rather than seeing
> > > > two devices with 16 channels.  
> > > 
> > > Yes, I think aggregation is the whole point there... so that the IIO driver
> > > is multi-device-aware.  
> > 
> > Which makes me feel that different pins per device might be possible
> > from an HW point of view but does not make much sense. For example, for
> > the buffer example I would expect LDAC to be shared between all the
> > devices.  
> 
> That is why I would still suggest the multi-dac node in the middle...
> the parent node can hold shared resources, while the dac children can
> have their own, overriding or inheriting stuff.
> 
Before going down that path I'd want confirmation this is something we
actually think anyone will build.

Jonathan




* Re: [PATCH v3 1/2] dt-bindings: iio: dac: Add AD5529R
From: Jonathan Cameron @ 2026-06-23 14:55 UTC (permalink / raw)
  To: Janani Sunil
  Cc: David Lechner, Nuno Sá, Rodrigo Alencar, Conor Dooley,
	Janani Sunil, Lars-Peter Clausen, Michael Hennerich, Nuno Sá,
	Andy Shevchenko, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Philipp Zabel, Jonathan Corbet, Shuah Khan, linux-iio, devicetree,
	linux-kernel, linux-doc, Mark Brown
In-Reply-To: <0f778d06-726f-4aec-ab3b-f6e091cc9421@gmail.com>

On Tue, 23 Jun 2026 12:07:52 +0200
Janani Sunil <jan.sun97@gmail.com> wrote:

> On 6/22/26 17:36, David Lechner wrote:
> > On 6/22/26 7:20 AM, Nuno Sá wrote:  
> >> On Mon, Jun 22, 2026 at 12:51:20PM +0100, Rodrigo Alencar wrote:  
> >>> On 22/06/26 11:29, Nuno Sá wrote:  
> >>>> On Mon, Jun 22, 2026 at 10:24:05AM +0100, Rodrigo Alencar wrote:  
> >>>>> On 21/06/26 15:33, Jonathan Cameron wrote:  
> >>>>>> On Fri, 19 Jun 2026 16:54:11 +0100
> >>>>>> Nuno Sá <noname.nuno@gmail.com> wrote:
> >>>>>>  
> >>>>>>> On Fri, Jun 19, 2026 at 03:12:07PM +0100, Conor Dooley wrote:  
> >>>>>>>> On Fri, Jun 19, 2026 at 02:01:08PM +0100, Nuno Sá wrote:  
> >>>>>>>>> On Fri, Jun 19, 2026 at 12:40:54PM +0100, Conor Dooley wrote:  
> >>>>>>>>>> On Fri, Jun 19, 2026 at 12:36:55PM +0100, Conor Dooley wrote:  
> >>>>>>>>>>> On Fri, Jun 19, 2026 at 12:33:11PM +0200, Janani Sunil wrote:  
> >>>>>>>>>>>> On 6/14/26 21:44, Jonathan Cameron wrote:  
> >>>>>>>>>>>>> On Tue, 9 Jun 2026 16:47:23 +0200
> >>>>>>>>>>>>> Janani Sunil <jan.sun97@gmail.com> wrote:
> >>>>>>>>>>>>>      
> >>>>>>>>>>>>>> On 5/26/26 15:11, Rodrigo Alencar wrote:  
> >>>>>>>>>>>>>>> On 26/05/19 05:42PM, Janani Sunil wrote:  
> >>>>>>>>>>>>>>>> Devicetree bindings for AD5529R 16 channel 12/16 bit high voltage,
> >>>>>>>>>>>>>>>> buffered voltage output digital-to-analog converter (DAC) with an
> >>>>>>>>>>>>>>>> integrated precision reference.  
> >>>>>>>>>>>>>>> ...
> >>>>>>>>>>>>>>> Probably others may comment on that, but...
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> This parent node may support device addressing for multi-device support through
> >>>>>>>>>>>>>>> those ID pins. I suppose that each device may have its own power supplies or
> >>>>>>>>>>>>>>> other resources like the toggle pins or reset and enable.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> That way I suppose that an example would look like...  
> >>>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>>> +patternProperties:
> >>>>>>>>>>>>>>>> +  "^channel@([0-9]|1[0-5])$":
> >>>>>>>>>>>>>>>> +    type: object
> >>>>>>>>>>>>>>>> +    description: Child nodes for individual channel configuration
> >>>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>>> +    properties:
> >>>>>>>>>>>>>>>> +      reg:
> >>>>>>>>>>>>>>>> +        description: Channel number.
> >>>>>>>>>>>>>>>> +        minimum: 0
> >>>>>>>>>>>>>>>> +        maximum: 15
> >>>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>>> +      adi,output-range-microvolt:
> >>>>>>>>>>>>>>>> +        description: |
> >>>>>>>>>>>>>>>> +          Output voltage range for this channel as [min, max] in microvolts.
> >>>>>>>>>>>>>>>> +          If not specified, defaults to 0V to 5V range.
> >>>>>>>>>>>>>>>> +        oneOf:
> >>>>>>>>>>>>>>>> +          - items:
> >>>>>>>>>>>>>>>> +              - const: 0
> >>>>>>>>>>>>>>>> +              - enum: [5000000, 10000000, 20000000, 40000000]
> >>>>>>>>>>>>>>>> +          - items:
> >>>>>>>>>>>>>>>> +              - const: -5000000
> >>>>>>>>>>>>>>>> +              - const: 5000000
> >>>>>>>>>>>>>>>> +          - items:
> >>>>>>>>>>>>>>>> +              - const: -10000000
> >>>>>>>>>>>>>>>> +              - const: 10000000
> >>>>>>>>>>>>>>>> +          - items:
> >>>>>>>>>>>>>>>> +              - const: -15000000
> >>>>>>>>>>>>>>>> +              - const: 15000000
> >>>>>>>>>>>>>>>> +          - items:
> >>>>>>>>>>>>>>>> +              - const: -20000000
> >>>>>>>>>>>>>>>> +              - const: 20000000
> >>>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>>> +    required:
> >>>>>>>>>>>>>>>> +      - reg
> >>>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>>> +    additionalProperties: false
> >>>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>>> +required:
> >>>>>>>>>>>>>>>> +  - compatible
> >>>>>>>>>>>>>>>> +  - reg
> >>>>>>>>>>>>>>>> +  - vdd-supply
> >>>>>>>>>>>>>>>> +  - avdd-supply
> >>>>>>>>>>>>>>>> +  - hvdd-supply
> >>>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>>> +dependencies:
> >>>>>>>>>>>>>>>> +  spi-cpha: [ spi-cpol ]
> >>>>>>>>>>>>>>>> +  spi-cpol: [ spi-cpha ]
> >>>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>>> +allOf:
> >>>>>>>>>>>>>>>> +  - $ref: /schemas/spi/spi-peripheral-props.yaml#
> >>>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>>> +unevaluatedProperties: false
> >>>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>>> +examples:
> >>>>>>>>>>>>>>>> +  - |
> >>>>>>>>>>>>>>>> +    #include <dt-bindings/gpio/gpio.h>
> >>>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>>> +    spi {
> >>>>>>>>>>>>>>>> +        #address-cells = <1>;
> >>>>>>>>>>>>>>>> +        #size-cells = <0>;
> >>>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>>> +        dac@0 {
> >>>>>>>>>>>>>>>> +            compatible = "adi,ad5529r-16";
> >>>>>>>>>>>>>>>> +            reg = <0>;
> >>>>>>>>>>>>>>>> +            spi-max-frequency = <25000000>;
> >>>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>>> +            vdd-supply = <&vdd_regulator>;
> >>>>>>>>>>>>>>>> +            avdd-supply = <&avdd_regulator>;
> >>>>>>>>>>>>>>>> +            hvdd-supply = <&hvdd_regulator>;
> >>>>>>>>>>>>>>>> +            hvss-supply = <&hvss_regulator>;
> >>>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>>> +            reset-gpios = <&gpio0 87 GPIO_ACTIVE_LOW>;
> >>>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>>> +            #address-cells = <1>;
> >>>>>>>>>>>>>>>> +            #size-cells = <0>;
> >>>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>>> +            channel@0 {
> >>>>>>>>>>>>>>>> +                reg = <0>;
> >>>>>>>>>>>>>>>> +                adi,output-range-microvolt = <0 5000000>;
> >>>>>>>>>>>>>>>> +            };
> >>>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>>> +            channel@1 {
> >>>>>>>>>>>>>>>> +                reg = <1>;
> >>>>>>>>>>>>>>>> +                adi,output-range-microvolt = <(-10000000) 10000000>;
> >>>>>>>>>>>>>>>> +            };
> >>>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>>> +            channel@2 {
> >>>>>>>>>>>>>>>> +                reg = <2>;
> >>>>>>>>>>>>>>>> +                adi,output-range-microvolt = <0 40000000>;
> >>>>>>>>>>>>>>>> +            };
> >>>>>>>>>>>>>>>> +        };
> >>>>>>>>>>>>>>>> +    };  
> >>>>>>>>>>>>>>> ...
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 	spi {
> >>>>>>>>>>>>>>> 		#address-cells = <1>;
> >>>>>>>>>>>>>>> 		#size-cells = <0>;
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 		multi-dac@0 {
> >>>>>>>>>>>>>>> 			compatible = "adi,ad5529r-16";
> >>>>>>>>>>>>>>> 			reg = <0>;
> >>>>>>>>>>>>>>> 			spi-max-frequency = <25000000>;
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 			#address-cells = <1>;
> >>>>>>>>>>>>>>> 			#size-cells = <0>;
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 			dac@0 {
> >>>>>>>>>>>>>>> 				reg = <0>;
> >>>>>>>>>>>>>>> 				vdd-supply = <&vdd_regulator>;
> >>>>>>>>>>>>>>> 				avdd-supply = <&avdd_regulator>;
> >>>>>>>>>>>>>>> 				hvdd-supply = <&hvdd_regulator>;
> >>>>>>>>>>>>>>> 				hvss-supply = <&hvss_regulator>;
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 				reset-gpios = <&gpio0 87 GPIO_ACTIVE_LOW>;
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 				#address-cells = <1>;
> >>>>>>>>>>>>>>> 				#size-cells = <0>;
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 				channel@0 {
> >>>>>>>>>>>>>>> 					reg = <0>;
> >>>>>>>>>>>>>>> 					adi,output-range-microvolt = <0 5000000>;
> >>>>>>>>>>>>>>> 				};
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 				channel@1 {
> >>>>>>>>>>>>>>> 					reg = <1>;
> >>>>>>>>>>>>>>> 					adi,output-range-microvolt = <(-10000000) 10000000>;
> >>>>>>>>>>>>>>> 				};
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 				channel@2 {
> >>>>>>>>>>>>>>> 					reg = <2>;
> >>>>>>>>>>>>>>> 					adi,output-range-microvolt = <0 40000000>;
> >>>>>>>>>>>>>>> 				};
> >>>>>>>>>>>>>>> 			};
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 			dac@1 {
> >>>>>>>>>>>>>>> 				reg = <1>;
> >>>>>>>>>>>>>>> 				vdd-supply = <&vdd_regulator>;
> >>>>>>>>>>>>>>> 				avdd-supply = <&avdd_regulator>;
> >>>>>>>>>>>>>>> 				hvdd-supply = <&hvdd_regulator>;
> >>>>>>>>>>>>>>> 				hvss-supply = <&hvss_regulator>;
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 				reset-gpios = <&gpio0 88 GPIO_ACTIVE_LOW>;
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 				#address-cells = <1>;
> >>>>>>>>>>>>>>> 				#size-cells = <0>;
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 				channel@0 {
> >>>>>>>>>>>>>>> 					reg = <0>;
> >>>>>>>>>>>>>>> 					adi,output-range-microvolt = <0 5000000>;
> >>>>>>>>>>>>>>> 				};
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 				channel@1 {
> >>>>>>>>>>>>>>> 					reg = <1>;
> >>>>>>>>>>>>>>> 					adi,output-range-microvolt = <(-10000000) 10000000>;
> >>>>>>>>>>>>>>> 				};
> >>>>>>>>>>>>>>> 			};
> >>>>>>>>>>>>>>> 		};
> >>>>>>>>>>>>>>> 	};
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> then you might need something like:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 	patternProperties:
> >>>>>>>>>>>>>>> 		"^dac@[0-3]$":
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> and put most of the things under this node pattern.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> So the main driver that you're putting together might need to handle up to four instances.
> >>>>>>>>>>>>>>> Even if your current driver cannot handle this, the dt-bindings might need to cover that.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Need to double check if each dac node needs a separate compatible, so you would maybe populate
> >>>>>>>>>>>>>>> a platform data to be shared with the child nodes, which would be a separate driver.
> >>>>>>>>>>>>>>> (not sure if it would make sense to mix and match ad5529r-16 and ad5529r-12).  
> >>>>>>>>>>>>>> Hi Rodrigo,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thank you for looking at this.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> For now, I would prefer to keep the binding scoped to a single AD5529R device instance. The current
> >>>>>>>>>>>>>> hardware/use case we have only needs one device node and the driver is written around that model as well.
> >>>>>>>>>>>>>> While the device addressing pins could allow multi-device topology, we do not have an actual platform using
> >>>>>>>>>>>>>> that configuration at the moment, so I would prefer not to introduce an extra parent/child binding structure
> >>>>>>>>>>>>>> speculatively without a validating use case.  
> >>>>>>>>>>>>> Interesting feature - kind of similar to address control on a typical i2c bus device, or
> >>>>>>>>>>>>> looking at it another way a kind of distributed SPI mux.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Challenge of a binding is we need to anticipate the future.  So I think we do need something
> >>>>>>>>>>>>> like Rodrigo is suggesting even if we only (for now) support a single instance in the driver.
> >>>>>>>>>>>>> That would leave the path open to supporting the addressing at a later date.
> >>>>>>>>>>>>> An alternative might be to look at it like a chained device setup. In those we pretend there
> >>>>>>>>>>>>> is just one device with a lot of channels etc.  The snag is that here things are more loosely
> >>>>>>>>>>>>> coupled whereas for those devices it tends to be you have to read / write the same register
> >>>>>>>>>>>>> in all devices in the chain as one big SPI message.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> +CC Mark Brown as he may know of some precedent for this feature. For his reference..
> >>>>>>>>>>>>> - Each of these devices has 2 ID pins.  The SPI transfers have to contain the 2 bit
> >>>>>>>>>>>>> value that matches that or they are ignored.  Thus a single bus + 1 chip select can
> >>>>>>>>>>>>> be used to talk to 4 devices.  Question is what that looks like in device tree + I guess
> >>>>>>>>>>>>> longer term how to support it cleanly in SPI.  
> >>>>>>>>>>> I'd swear I have seen this before, from some Microchip devices. Let me
> >>>>>>>>>>> see if I can find what I am thinking of...  
> >>>>>>>>>>
> >>>>>>>>>> microchip,mcp3911 and microchip,mcp3564 both seem to do this with
> >>>>>>>>>> slightly different properties.
> >>>>>>>>>>
> >>>>>>>>>>    microchip,device-addr:
> >>>>>>>>>>      description: Device address when multiple MCP3911 chips are present on the same SPI bus.
> >>>>>>>>>>      $ref: /schemas/types.yaml#/definitions/uint32
> >>>>>>>>>>      enum: [0, 1, 2, 3]
> >>>>>>>>>>      default: 0
> >>>>>>>>>>
> >>>>>>>>>> and
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>    microchip,hw-device-address:
> >>>>>>>>>>      $ref: /schemas/types.yaml#/definitions/uint32
> >>>>>>>>>>      minimum: 0
> >>>>>>>>>>      maximum: 3
> >>>>>>>>>>      description:
> >>>>>>>>>>        The address is set on a per-device basis by fuses in the factory,
> >>>>>>>>>>        configured on request. If not requested, the fuses are set for 0x1.
> >>>>>>>>>>        The device address is part of the device markings to avoid
> >>>>>>>>>>        potential confusion. This address is coded on two bits, so four possible
> >>>>>>>>>>        addresses are available when multiple devices are present on the same
> >>>>>>>>>>        SPI bus with only one Chip Select line for all devices.
> >>>>>>>>>>        Each device communication starts by a CS falling edge, followed by the
> >>>>>>>>>>        clocking of the device address (BITS[7:6] - top two bits of COMMAND BYTE
> >>>>>>>>>>        which is first one on the wire).
> >>>>>>>>>>
> >>>>>>>>>> This sounds exactly like the sort of feature that you're dealing with
> >>>>>>>>>> here?
> >>>>>>>>>>      
> >>>>>>>>> The core idea yes but for this chip, things are a bit more annoying (but
> >>>>>>>>> Janani can correct me if I'm wrong). Here, each device can, in theory,
> >>>>>>>>> have its own supplies, pins and at the very least, channels with maybe
> >>>>>>>>> different scales. That is why Janani is proposing dac nodes. Given I
> >>>>>>>>> honestly don't like much of that "adi,ad5529r-bus" compatible I wondered
> >>>>>>>>> about solving this at the spi level.
> >>>>>>>>>
> >>>>>>>>> Ah and to make it more annoying, we can also mix 12 and 16 bits variants
> >>>>>>>>> together in the same bus.  
> >>>>>>>> I'm definitely missing something, because that property for the
> >>>>>>>> microchip devices is not impacted by what else is on the bus. AFAICT, you
> >>>>>>>> could have an mcp3911 and an mcp3564 on the same bus even though both
> >>>>>>>> are completely different devices with different drivers. They have
> >>>>>>>> individual device nodes and their own supplies etc etc. These aren't
> >>>>>>>> per-channel properties on an adc or dac, they're per child device on a
> >>>>>>>> spi bus.  
> >>>>>>> Maybe I'm the one missing something :). IIRC, spi would not allow two
> >>>>>>> devices on the same CS right? Because for this chip we would need
> >>>>>>> something like:
> >>>>>>>
> >>>>>>> spi {
> >>>>>>> 	dac@0 {
> >>>>>>> 		reg = <0>;
> >>>>>>> 		adi,pin-id = <0>;
> >>>>>>> 	};
> >>>>>>>
> >>>>>>> 	dac@1 {
> >>>>>>> 		reg = <0>; // which seems already problematic?
> >>>>>>> 		adi,pin-id = <1>;
> >>>>>>> 	};
> >>>>>>>
> >>>>>>> 	...
> >>>>>>>
> >>>>>>> 	//up to 4
> >>>>>>> };  
> >>>>>> Yeah. It's not clear to me how that works for the microchip devices
> >>>>>> (I suspect it doesn't!)
> >>>>>>
> >>>>>> Just thinking as I type, but could we do something a bit nasty with
> >>>>>> a gpio mux that doesn't actually switch but represents the GPIO being
> >>>>>> shared?  Given this is all tied to the spi bus that should all happen
> >>>>>> under serializing locks.
> >>>>>>
> >>>>>> Agreed though that this would be nicer as an SPI thing that let
> >>>>>> us specify that a single CS is shared by multiple devices and there
> >>>>>> is some other signal acting to select which one we are talking to.
> >>>>>>  
> >>>>> If the device-addressing on the same chip-select is to be handled
> >>>>> by the spi framework, wouldn't we lose device-specific features?
> >>>>>
> >>>>> I understand that this multi-device feature is there mostly to extend the
> >>>>> channel count from 16 to 32, 48 or 64. I suppose the command:
> >>>>>
> >>>>> 	"MULTI DEVICE SW LDAC MODE"
> >>>>>
> >>>>> exists so that software can update channel values across multiple devices.  
> >>>> Right! You do have a point! I agree the main driver for a feature like
> >>>> this is likely to extend the channel count and effectively "aggregate"
> >>>> devices.
> >>>>
> >>>> But I would say that even with the spi solution the MULTI DEVICE stuff
> >>>> should be doable (as we still need a sort of adi,pin-id property).  
> >>> I don't think we can have something like an IIO buffer shared by multiple
> >>> devices. Synchronizing separate devices would be doable with proper hardware
> >>> support for this (probably involving an FPGA).  
> >> True!
> >>  
> >>>     
> >>>> But yes, I do feel that the whole feature is for aggregation so seeing
> >>>> one device with 32 channels is the expectation here? Rather than seeing
> >>>> two devices with 16 channels.  
> >>> Yes, I think aggregation is the whole point there... so that the IIO driver
> >>> is multi-device-aware.  
> >> Which makes me feel that different pins per device might be possible
> >> from an HW point of view but does not make much sense. For example, for
> >> the buffer example I would expect LDAC to be shared between all the
> >> devices.
> >>
> >> - Nuno Sá  
> > I think I mentioned this on a previous revision, but I still think the
> > simplest way to go about it would be to assume that all chips treated
> > as an aggregate device have everything wired in parallel and just add
> > support for per-chip wiring on an as-needed basis. This is how we have
> > handled daisy-chained devices so far.  
> 
> Hi David,
> 
> One thing about this approach is that it does not cover a combination of 12 and 16 bit parts in the chain,
> since the compatible string would be at the top level and apply to all chips. To handle this without per-chip child nodes or per-chip compatibles,
> I propose an "adi,resolution" property as an integer array, indexed by the device position:
> 

To me, mixed parts smell like a case that we will probably never see.  If
a customer actually has a design doing that, we can add optional dt
bindings at that time.

So I'd stick to separate compatibles.

> 
> dac@0 {
>      compatible = "adi,ad5529r";
>      reg = <0>;
>      adi,device-addrs = <0 1>;
>      adi,resolution   = <16 12>;   /* per-chip, indexed by position */
>      reset-gpios = <&gpio0 87 GPIO_ACTIVE_LOW>;
>      vdd-supply  = <&vdd_reg>;
>      hvdd-supply = <&hvdd_reg>;
> 
>      channel@0  { reg = <0>;  adi,output-range-microvolt = <0 5000000>; };
>      channel@16 { reg = <16>; adi,output-range-microvolt = <0 40000000>; };
> };
> 
> 
> 1) This follows the daisy-chain/aggregated model as you suggested, exposing N*16 channels as a single IIO device.
> 2) Keeps the binding flat: no phantom compatible at a parent bus node, no per-chip child nodes.
> 3) Enables a 12 bit + 16 bit device combination in the chain, without needing a per-chip compatible.
> 4) adi,device-addrs specifies the HW address, allowing the driver to encode it into the SPI frame.
> 5) Supplies and GPIOs remain simple, assuming parallel wiring across all chips.
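
For reference, the flat numbering in the example above (channel@16 landing on
the second chip) could be resolved roughly as follows. This is only a sketch
under assumptions: flat_to_chip(), struct chan_loc and AD5529R_CH_PER_CHIP are
hypothetical names, not taken from the posted driver, and it assumes channels
are numbered as position * 16 + local channel with adi,device-addrs indexed by
position.

```c
#include <errno.h>

#define AD5529R_CH_PER_CHIP	16	/* 16 channels per chip */

/* Hypothetical result type: where a flat channel number ends up. */
struct chan_loc {
	unsigned int hw_addr;	/* address strapped on the chip's ID pins */
	unsigned int local_ch;	/* channel 0..15 within that chip */
};

/*
 * Hypothetical helper: map a flat channel index (the channel@N reg
 * value) to a chip and a channel within it. 'addrs' mirrors the
 * proposed adi,device-addrs array.
 */
static inline int flat_to_chip(unsigned int flat_ch,
			       const unsigned int *addrs,
			       unsigned int num_chips,
			       struct chan_loc *loc)
{
	unsigned int pos = flat_ch / AD5529R_CH_PER_CHIP;

	if (pos >= num_chips)
		return -EINVAL;	/* channel beyond the last chip */

	loc->hw_addr = addrs[pos];
	loc->local_ch = flat_ch % AD5529R_CH_PER_CHIP;
	return 0;
}
```

With adi,device-addrs = <0 1>, channel 16 resolves to the chip at HW address 1,
local channel 0, and anything from channel 32 upwards is rejected.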
> 
> Jonathan, you had earlier suggested using separate compatibles
> (adi,ad5529r-16 and adi,ad5529r-12) to handle the resolution
> difference. However, with the aggregated flat binding model, separate
> per-chip compatibles would require child nodes, which brings back the
> phantom compatible problem at the parent level. The adi,resolution
> array is intended as an alternative that achieves the same goal,
> expressing per-chip resolution, without needing a per-chip compatible
> or child node structure.
> 
> Does this look reasonable?
> 
> Best Regards,
> Janani Sunil
> 
> 



* Re: [RFC PATCH v2 06/10] kvm: guest_memfd: Add support for freezing and unfreezing mappings
From: tarunsahu @ 2026-06-23 14:36 UTC (permalink / raw)
  To: Ackerley Tng, Jonathan Corbet, vannapurve, fvdl, Pasha Tatashin,
	Shuah Khan, sagis, aneesh.kumar, skhawaja, vipinsh,
	Pratyush Yadav, david, dmatlack, mark.rutland, Paolo Bonzini,
	Mike Rapoport, Alexander Graf, seanjc, axelrasmussen
  Cc: linux-kselftest, kexec, linux-kernel, linux-doc, kvm, linux-mm
In-Reply-To: <CAEvNRgFEHciT3T9y+qEYRvXhDwfrggoU7Rm=f9hT3OrV+wgpNQ@mail.gmail.com>

Ackerley Tng <ackerleytng@google.com> writes:

> Tarun Sahu <tarunsahu@google.com> writes:
>
>>  static long kvm_gmem_fallocate(struct file *file, int mode, loff_t offset,
>>  			       loff_t len)
>>  {
>> +	struct inode *inode = file_inode(file);
>>  	int ret;
>> +	int idx;
>>
>> -	if (!(mode & FALLOC_FL_KEEP_SIZE))
>> -		return -EOPNOTSUPP;
>> +	idx = srcu_read_lock(&kvm_gmem_freeze_srcu);
>> +	if (kvm_gmem_is_frozen(inode)) {
>> +		srcu_read_unlock(&kvm_gmem_freeze_srcu, idx);
>> +		return -EPERM;
>> +	}
>
> fallocate may eventually go to kvm_gmem_get_folio(), so that would check
> kvm_gmem_is_frozen() twice. Is this meant to catch the punch hole case?
>
>>
>> -	if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))
>> -		return -EOPNOTSUPP;
>> +	if (!(mode & FALLOC_FL_KEEP_SIZE)) {
>> +		ret = -EOPNOTSUPP;
>> +		goto out;
>> +	}
>>
>> -	if (!PAGE_ALIGNED(offset) || !PAGE_ALIGNED(len))
>> -		return -EINVAL;
>> +	if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE)) {
>> +		ret = -EOPNOTSUPP;
>> +		goto out;
>> +	}
>> +
>> +	if (!PAGE_ALIGNED(offset) || !PAGE_ALIGNED(len)) {
>> +		ret = -EINVAL;
>> +		goto out;
>> +	}
>
> There's some reordering here. Why not let the validation happen like
> before, then check kvm_gmem_is_frozen()?

To align with the design: "stop the fallocate call if the inode is frozen;
no need to go further". I don't have a strong opinion on this. I am fine
with moving the check to the punch-hole path as well to make it more
fine-grained. But then it can no longer claim to stop the fallocate call as
a whole (the allocation case is stopped in a separate path, the fault
path), though functionally it does the same thing.
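
For reference, here is a rough userspace sketch of the alternative ordering
(validate the arguments first, then check the frozen state). The SKETCH_*
constants and the function name are made up for illustration and are not
taken from the patch; the real code uses FALLOC_FL_*, PAGE_ALIGNED() and
kvm_gmem_is_frozen() under SRCU.

```c
#include <errno.h>
#include <stdbool.h>

/* Stand-in values for illustration only, not the kernel definitions. */
#define SKETCH_KEEP_SIZE   0x01
#define SKETCH_PUNCH_HOLE  0x02
#define SKETCH_PAGE_SIZE   4096L

/*
 * Hypothetical model of the "validate, then check frozen" ordering.
 * Argument errors are reported first; -EPERM is returned only for an
 * otherwise valid request against a frozen inode.
 */
static int sketch_fallocate_check(bool frozen, int mode, long offset, long len)
{
	if (!(mode & SKETCH_KEEP_SIZE))
		return -EOPNOTSUPP;

	if (mode & ~(SKETCH_KEEP_SIZE | SKETCH_PUNCH_HOLE))
		return -EOPNOTSUPP;

	if (offset % SKETCH_PAGE_SIZE || len % SKETCH_PAGE_SIZE)
		return -EINVAL;

	if (frozen)
		return -EPERM;

	return 0;
}
```

With this ordering, a malformed request on a frozen inode still gets
-EOPNOTSUPP or -EINVAL, and only a well-formed one gets -EPERM.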

WDYT?

~Tarun

* Re: [RFC PATCH 0/6] mm/damon: hardware-sampled access reports
From: Zeng Heng @ 2026-06-23 14:08 UTC (permalink / raw)
  To: Ravi Jonnalagadda, sj, akinobu.mita, damon, linux-mm,
	linux-kernel, linux-doc
  Cc: akpm, corbet, bijan311, ajayjoshi, honggyu.kim, yunjeong.mun
In-Reply-To: <20260529165640.820-1-ravis.opensrc@gmail.com>

Hi Ravi,

On 2026/5/30 0:56, Ravi Jonnalagadda wrote:
> This series introduces a vendor and PMU-agnostic substrate inside DAMON
> that consumes hardware-sampled access reports through the standard
> perf-event interface.  Userspace selects the PMU through sysfs (raw
> type/config knobs), driving either Intel PEBS L3-miss sampling or AMD
> IBS Op sampling.
> 

[...]

> 
> Ravi Jonnalagadda (6):
>    mm/damon: add struct damon_perf_event{,_attr} and per-ctx perf_events
>      list
>    mm/damon/sysfs-sample: expose perf_events configuration via sysfs
>    mm/damon/sysfs: install perf_events on apply
>    mm/damon/core: per-CPU SPSC ring drain and damon_perf_event lifecycle
>    mm/damon/vaddr: implement perf-event access check
>    mm/damon: add damos_node_eligible_mem_bp tracepoint
> 
>   include/linux/damon.h        |  80 +++++
>   include/trace/events/damon.h |  49 +++
>   mm/damon/core.c              | 403 ++++++++++++++++++++----
>   mm/damon/ops-common.h        |  39 +++
>   mm/damon/sysfs-common.h      |   6 +
>   mm/damon/sysfs-sample.c      | 579 +++++++++++++++++++++++++++++++++++
>   mm/damon/sysfs.c             |   3 +
>   mm/damon/vaddr.c             | 267 ++++++++++++++++
>   8 files changed, 1370 insertions(+), 56 deletions(-)
> 
> 
> base-commit: 4c8ad15abf15eb480d3ad85f902001e35465ef18

I wasn't able to apply this patch series to the mainline linux (or
linux-next) tree, and I also had trouble identifying where the base commit
comes from.

Would you mind sharing where this baseline is from?

Thanks in advance!


Best Regards,
Zeng Heng

* [dlech:b4/iio-adc-ti-ads122c14 10/10] htmldocs: Documentation/iio/ads112c14.rst:57: WARNING: Block quote ends without a blank line; unexpected unindent. [docutils]
From: kernel test robot @ 2026-06-23 14:05 UTC (permalink / raw)
  To: David Lechner (TI); +Cc: oe-kbuild-all, linux-doc

tree:   https://github.com/dlech/linux b4/iio-adc-ti-ads122c14
head:   9a48214781dd8525063938516a2a00ffb7b3c23c
commit: 9a48214781dd8525063938516a2a00ffb7b3c23c [10/10] WIP
compiler: clang version 22.1.8 (https://github.com/llvm/llvm-project ca7933e47d3a3451d81e72ac174dcb5aa28b59d1)
docutils: docutils (Docutils 0.21.2, Python 3.13.5, on linux)
reproduce: (https://download.01.org/0day-ci/archive/20260623/202606231501.hG7m0kHG-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202606231501.hG7m0kHG-lkp@intel.com/

All warnings (new ones prefixed by >>):

   in0_lcrit_fault_log             Set to 1 by a VIN undervoltage fault occurring.
   in0_crit_fault_log              Set to 1 by a VIN overvoltage fault occurring.
   curr1_crit_fault_log            Set to 1 by an overcurrent fault occurring.
   =======================         ========================================== [docutils]
   Documentation/iio/ads112c14.rst:56: ERROR: Unexpected indentation. [docutils]
>> Documentation/iio/ads112c14.rst:57: WARNING: Block quote ends without a blank line; unexpected unindent. [docutils]
   WARNING: ./include/linux/damon.h:909 struct member 'probes' not described in 'damon_ctx'
   WARNING: ./include/linux/damon.h:909 struct member 'rnd_state' not described in 'damon_ctx'
   WARNING: ./include/linux/damon.h:909 struct member 'probes' not described in 'damon_ctx'
   WARNING: ./include/linux/damon.h:909 struct member 'rnd_state' not described in 'damon_ctx'
   Documentation/mm/memfd_preservation:7: ./mm/memfd_luo.c:13: ERROR: Unexpected section title.
--
   Documentation/userspace-api/landlock:596: ./include/uapi/linux/landlock.h:58: ERROR: Unknown target name: "network flags". [docutils]
   Documentation/userspace-api/landlock:596: ./include/uapi/linux/landlock.h:63: ERROR: Unknown target name: "scope flags". [docutils]
   Documentation/userspace-api/landlock:596: ./include/uapi/linux/landlock.h:24: ERROR: Unknown target name: "filesystem flags". [docutils]
   Documentation/userspace-api/landlock:605: ./include/uapi/linux/landlock.h:228: ERROR: Unknown target name: "filesystem flags". [docutils]
   Documentation/userspace-api/landlock:605: ./include/uapi/linux/landlock.h:251: ERROR: Unknown target name: "network flags". [docutils]
>> Documentation/iio/ads112c14.rst:22: WARNING: unknown document: '../devicetree/bindings/iio/adc/ti,ads112c14' [ref.doc]
   Documentation/networking/checksum-offloads:157: ./include/linux/skbuff.h:181: WARNING: Failed to create a cross reference. A title or caption not found: 'crc' [ref.ref]


vim +57 Documentation/iio/ads112c14.rst

    21	
  > 22	See :doc:`../devicetree/bindings/iio/adc/ti,ads112c14` for the complete
    23	binding description.
    24	
    25	The minimal device tree node looks like this::
    26	
    27	    adc@40 {
    28	        compatible = "ti,ads122c14";
    29	        reg = <0x40>;
    30	        avdd-supply = <&avdd>;
    31	        dvdd-supply = <&dvdd>;
    32	
    33	        /* TODO: add channel@ nodes here */
    34	    };
    35	
    36	Effect of devicetree on registers
    37	---------------------------------
    38	
    39	Device properties have the following effects on registers:
    40	
    41	* ``CLK_SEL`` and ``GPIO3_CFG`` are set if the ``clocks`` property is present.
    42	* ``GPIO2_SRC`` may be set when ``interrupts`` and ``interrupt-names`` for
    43	  "fault" is present.
    44	* ``GPIO3_SRC`` may be set when ``interrupts`` and ``interrupt-names`` for
    45	  "drdy" is present.
    46	
    47	Channel-specific properties have the following effects on registers only when
    48	a measurement is active on the channel:
    49	
    50	* ``AINP`` and ``AINN`` are set according to the ``single-channel`` or
    51	  ``differential-channel`` properties of each channel.
    52	* ``REF_SEL`` and ``REF_VAL`` will be set according to the value of the
    53	  ``ti,ref-source`` property.
    54	* When ``ti,ref-source`` is set to REFP-REFN:
    55	  * If ``refp-supply`` is given and not the same phandle as ``avdd-supply``,
    56	    ``REFP_BUF_EN`` will be set to 1, otherwise it will be 0.
  > 57	  * If ``refn-supply`` is given, ``REFN_BUF_EN`` will be set to 1, otherwise it
    58	    will be 0.
    59	* ``I1MUX`` and ``I2MUX`` will be set to the values of the ``excitation-channels``
    60	   property.
    61	* ``IUNIT``, ``I1MAG`` and ``I2MAG`` will be set to best fit according to the
    62	   values of the ``excitation-current-nanoamp`` property.
    63	* ``GC_EN`` will be set to 1 if the ``input-channel-rotation`` property is present,
    64	  otherwise it will be 0.
    65	* ``CODING`` is set based on the ``bipolar`` property.
    66	

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

* Re: [PATCH v4 4/4] KVM: PPC: Document KVM_PPC_GET_COMPAT_CAPS ioctl
From: Amit Machhiwal @ 2026-06-23 14:05 UTC (permalink / raw)
  To: Vaibhav Jain
  Cc: Amit Machhiwal, linuxppc-dev, Madhavan Srinivasan,
	Anushree Mathur, Paolo Bonzini, Nicholas Piggin, Michael Ellerman,
	Christophe Leroy (CS GROUP), Jonathan Corbet, Shuah Khan, kvm,
	linux-kernel, linux-doc, lkp
In-Reply-To: <8733yjcb0y.fsf@vajain21.in.ibm.com>

Hi Vaibhav,

Thanks for reviewing this patch. Please find my response below.

On 2026/06/19 11:44 AM, Vaibhav Jain wrote:
> Hi Amit,
> 
> Thanks for the patch and incorporating V3 review comments. Further
> review comments inline below:
> 
> Amit Machhiwal <amachhiw@linux.ibm.com> writes:
> 
> > Add documentation for the KVM_PPC_GET_COMPAT_CAPS ioctl to the KVM API
> > documentation.
> >
> > The ioctl exposes host processor compatibility modes supported for
> > nested KVM guests on PowerPC systems. The documentation includes
> > comprehensive error code descriptions, structure field definitions
> > including the size field for forward compatibility, and KVM-specific
> > capability bit constants.
> >
> > Signed-off-by: Amit Machhiwal <amachhiw@linux.ibm.com>
> > ---
> >  Documentation/virt/kvm/api.rst | 47 ++++++++++++++++++++++++++++++++++
> >  1 file changed, 47 insertions(+)
> >
> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > index 52bbbb553ce1..ba6feba74d7d 100644
> > --- a/Documentation/virt/kvm/api.rst
> > +++ b/Documentation/virt/kvm/api.rst
> > @@ -6553,6 +6553,53 @@ KVM_S390_KEYOP_SSKE
> >    Sets the storage key for the guest address ``guest_addr`` to the key
> >    specified in ``key``, returning the previous value in ``key``.
> >  
> > +4.145 KVM_PPC_GET_COMPAT_CAPS
> > +-----------------------------
> > +:Capability: KVM_CAP_PPC_COMPAT_CAPS
> > +:Architectures: powerpc
> > +:Type: vm ioctl
> > +:Parameters: struct kvm_ppc_compat_caps (out)
> > +:Returns: 0 on success, negative value on failure
> > +
> > +Errors include:
> > +
> > +  ======== ============================================================
> > +  EFAULT   if ``struct kvm_ppc_compat_caps`` cannot be read from or
> > +           written to userspace
> > +  EINVAL   if the ``size`` field is smaller than the current structure
> > +           size, or if the backend implementation fails to retrieve or
> > +           map CPU compatibility capabilities
> > +  ENOTTY   if the backend does not implement the ``get_compat_caps``
> > +           operation (e.g., on non-pseries platforms or when the
> > +           required KVM operations are not available)
> > +  ======== ============================================================
> > +
> > +IBM POWER system server-based processors provide a compatibility mode feature
> > +where an Nth generation processor can operate in modes consistent with earlier
> > +generations such as (N-1) and (N-2).
> > +
> > +This ioctl provides userspace with information about the CPU compatibility modes
> > +supported by the current host processor for booting the nested KVM guests on
> > +PowerNV (KVM nested APIv1) and PowerVM (KVM nested APIv2) platforms.
> > +
> 
> Please add a detail on how returned 'size' field can be less than what
> the userspace has sent and how it should be handled.

Good point! However, with the change suggested for patch 1 (using a strict
equality check, `user_size != sizeof(host_caps)`), the returned size will
always equal what userspace sent, because we reject any size mismatch with
-EINVAL.

I'll update the documentation to clarify the strict size matching
behavior:

- Userspace must set size to sizeof(struct kvm_ppc_compat_caps) before
  the ioctl
- The kernel validates this with strict equality (!= check)
- Returns -EINVAL if size doesn't match
- On success, the returned size will always equal what was sent

This should eliminate the forward-compatibility scenario where the returned
size could differ from the sent size.
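
As a rough illustration of the strict check, here is a userspace sketch
only. The struct mirrors the layout quoted below, but the sketch_* names and
the helper itself are placeholders, not the actual kernel code:

```c
#include <errno.h>
#include <stdint.h>

/* Same field layout as the struct in the documentation patch. */
struct sketch_compat_caps {
	uint64_t flags;               /* reserved for future use */
	uint64_t size;                /* size of this structure */
	uint64_t compat_capabilities; /* capabilities supported by the host */
};

/*
 * Hypothetical helper: accept the request only when the size supplied by
 * userspace matches the structure size exactly. Smaller and larger values
 * are both rejected.
 */
static int sketch_check_size(const struct sketch_compat_caps *req)
{
	if (req->size != sizeof(*req))
		return -EINVAL;

	return 0;
}
```

Because any mismatch is rejected, the size seen on a successful return is
the same value the caller filled in.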

Thanks,
Amit

> 
> > +::
> > +
> > +  struct kvm_ppc_compat_caps {
> > +	__u64	flags;			/* Reserved for future use */
> > +	__u64	size;			/* Size of this structure */
> > +	__u64	compat_capabilities;	/* Capabilities supported by the host */
> > +  };
> > +
> > +The ``compat_capabilities`` bit field describes the processor compatibility
> > +modes supported by the host. For example, the following bits indicate support
> > +for specific processor modes.
> > +
> > +::
> > +
> > +  KVM_PPC_COMPAT_CAP_POWER9  (bit 1): KVM guests can run in Power9 processor mode
> > +  KVM_PPC_COMPAT_CAP_POWER10 (bit 2): KVM guests can run in Power10 processor mode
> > +  KVM_PPC_COMPAT_CAP_POWER11 (bit 3): KVM guests can run in Power11 processor mode
> > +
> >  .. _kvm_run:
> >  
> >  5. The kvm_run structure
> > -- 
> > 2.50.1 (Apple Git-155)
> >
> 
> -- 
> Cheers
> ~ Vaibhav

* Re: [RFC PATCH v2 06/10] kvm: guest_memfd: Add support for freezing and unfreezing mappings
From: tarunsahu @ 2026-06-23 14:03 UTC (permalink / raw)
  To: Sean Christopherson, Ackerley Tng
  Cc: Jonathan Corbet, vannapurve, fvdl, Pasha Tatashin, Shuah Khan,
	sagis, aneesh.kumar, skhawaja, vipinsh, Pratyush Yadav, david,
	dmatlack, mark.rutland, Paolo Bonzini, Mike Rapoport,
	Alexander Graf, axelrasmussen, linux-kselftest, kexec,
	linux-kernel, linux-doc, kvm, linux-mm
In-Reply-To: <ajnOnzdknfwbuJ9g@google.com>

Sean Christopherson <seanjc@google.com> writes:

> On Mon, Jun 22, 2026, Ackerley Tng wrote:
>> Tarun Sahu <tarunsahu@google.com> writes:
>> 
>> > This patch introduces the freeze on gmem_inode which prevents
>> 
>> Can't find the reference now, but commit messages should take the
>> imperative mood and avoid "this patch" [*]
>
> From Documentation/process/submitting-patches.rst:
>
>   Describe your changes in imperative mood, e.g. "make xyzzy do frotz"
>   instead of "[This patch] makes xyzzy do frotz" or "[I] changed xyzzy
>   to do frotz", as if you are giving orders to the codebase to change
>   its behaviour.
>
> Documentation/process/maintainer-tip.rst and Documentation/process/maintainer-kvm-x86.rst
> elaborate more on the preferred style (I do most of the guest_memfd maintenance,
> and so for all intents and purpose it's bound by KVM x86 "rules").


Thanks! Will take care of that.
