Linux Trace Kernel
* [PATCH v11 00/11] tracing/probes: Add more typecast features
From: Masami Hiramatsu (Google) @ 2026-06-26 14:14 UTC (permalink / raw)
  To: Steven Rostedt, Mathieu Desnoyers
  Cc: Jonathan Corbet, Shuah Khan, Masami Hiramatsu, linux-kernel,
	linux-trace-kernel, linux-doc, linux-kselftest

Hi,

Here is the 11th version of the series to introduce more typecast features
to probe events. The previous version is here:

 https://lore.kernel.org/all/178243982430.790911.17439694390021542101.stgit@devnote2/

In this version, I fixed minor issues and added 2 patches to make
in-tree tools ignore comment lines in dynamic_events [3/11][4/11].

This series extends the BTF typecast feature and adds more options:

1. Expanding BTF typecast to kprobe and fprobe.
   (currently only function entry/exit)

2. Introduce a container_of-like typecast. This adds an "assigned
   member" option to the typecast.

   (STRUCT,MEMBER)VAR->ANOTHER_MEMBER

   This casts VAR to the STRUCT type, treating VAR as the address
   of STRUCT.MEMBER. In C, it is:

   container_of(VAR, STRUCT, MEMBER)->ANOTHER_MEMBER

3. Support nested typecast, e.g.

   (STRUCT)((STRUCT2)VAR->MEMBER2)->MEMBER

   The nesting level must be smaller than 3.

4. Add $current variable to point to the "current" task_struct.
   This is useful with typecast, e.g.

   (task_struct)$current->pid

5. Per-cpu dereference support.

   Introduce this_cpu_read(VAR) and this_cpu_ptr(VAR) to
   access per-cpu data on the current CPU (accessing another CPU's
   data is not stable, because it can change.)

   You can access the member of per-cpu data structure using
   typecast like:

   (STRUCT)this_cpu_ptr(VAR)->MEMBER

6. Support event fields without $ prefix on eprobes.

   Now eprobe events can access their event fields.
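
As an illustration of item 2 (not part of this series), here is a
userspace C sketch of what a container_of-style cast computes. The
struct names (item, list_node) and the helper item_id_from_node() are
hypothetical, and the macro is a simplified stand-in for the kernel's
container_of().

```c
#include <stddef.h>

/* Hypothetical types for illustration; not taken from the series. */
struct list_node {
	struct list_node *next;
};

struct item {
	int id;
	struct list_node node;	/* embedded member */
};

/* Simplified userspace stand-in for the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/*
 * Rough C equivalent of a probe argument written as "(item,node)VAR->id":
 * var holds the address of item.node, and we read item.id from the
 * enclosing structure.
 */
static int item_id_from_node(struct list_node *var)
{
	return container_of(var, struct item, node)->id;
}
```

Given a struct item whose id is 42, passing the address of its node
member to item_id_from_node() yields 42, because the macro subtracts
the member offset to get back to the start of the enclosing struct.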

I also added a fetcharg dump feature (for debugging) and updated the test
scripts to test part of them.

Thanks,

---
base-commit: c69b5f959286395e94c237ce6d7d4970bad7f6e3

Masami Hiramatsu (Google) (11):
      tracing/probes: Allow eprobe to use variable without $ prefix
      tracing/probes: Support dumping fetcharg program for debugging dynamic events
      tools/bootconfig: Ignore comment lines in dynamic_events/kprobe_events file
      perf/probe: Ignore comment lines in dynamic_events/kprobe_events file
      tracing/probes: Support typecast for various probe events
      tracing/probes: Support nested typecast
      tracing/probes: Type casting always involves nested calls
      tracing/probes: Support field specifier option for typecast
      tracing/probes: Add $current variable support
      tracing/probes: Add this_cpu_read() and this_cpu_ptr() dereference method to fetcharg
      tracing/probes: Add a new testcase for BTF typecasts


 Documentation/trace/eprobetrace.rst                |    7 
 Documentation/trace/fprobetrace.rst                |   10 
 Documentation/trace/kprobetrace.rst                |   11 
 kernel/trace/Kconfig                               |   12 
 kernel/trace/trace.c                               |    8 
 kernel/trace/trace_eprobe.c                        |    2 
 kernel/trace/trace_fprobe.c                        |    2 
 kernel/trace/trace_kprobe.c                        |    2 
 kernel/trace/trace_probe.c                         |  585 ++++++++++++++++----
 kernel/trace/trace_probe.h                         |  100 ++-
 kernel/trace/trace_probe_tmpl.h                    |   25 +
 kernel/trace/trace_uprobe.c                        |    3 
 samples/trace_events/trace-events-sample.c         |   40 +
 samples/trace_events/trace-events-sample.h         |   34 +
 tools/bootconfig/scripts/ftrace2bconf.sh           |    2 
 tools/perf/util/probe-file.c                       |    2 
 .../ftrace/test.d/dynevent/btf_probe_event.tc      |   51 ++
 .../test.d/dynevent/btf_typecast_accepted.tc       |  107 ++++
 .../test.d/dynevent/eprobes_syntax_errors.tc       |   12 
 .../ftrace/test.d/dynevent/fprobe_syntax_errors.tc |   12 
 .../ftrace/test.d/kprobe/kprobe_syntax_errors.tc   |   12 
 .../ftrace/test.d/kprobe/uprobe_syntax_errors.tc   |    5 
 22 files changed, 890 insertions(+), 154 deletions(-)
 create mode 100644 tools/testing/selftests/ftrace/test.d/dynevent/btf_probe_event.tc
 create mode 100644 tools/testing/selftests/ftrace/test.d/dynevent/btf_typecast_accepted.tc

--
Masami Hiramatsu (Google) <mhiramat@kernel.org>


* Re: [RFC PATCH 1/3] mm/compaction: skip isolate mlocked folios when compact_unevictable_allowed=0
From: Alexander Krabler @ 2026-06-26 13:42 UTC (permalink / raw)
  To: Wandun, Vlastimil Babka (SUSE), linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
	linux-rt-devel@lists.linux.dev
  Cc: akpm@linux-foundation.org, surenb@google.com, mhocko@suse.com,
	jackmanb@google.com, hannes@cmpxchg.org, ziy@nvidia.com,
	rostedt@goodmis.org, mhiramat@kernel.org,
	mathieu.desnoyers@efficios.com, david@kernel.org, ljs@kernel.org,
	liam@infradead.org, rppt@kernel.org, bigeasy@linutronix.de,
	clrkwllms@kernel.org, Hugh Dickins
In-Reply-To: <a96b0b24-c405-43c4-96ef-605bacd17cad@gmail.com>

On 6/26/26 11:38, Wandun wrote:
> On 6/26/26 16:45, Alexander Krabler wrote:
>> However, we were not able to reproduce the actual race
>> (mlockall() process waiting on a migration PTE),
>> not in the past, not now. Might be hard to trigger that race.
>
> Not hard to trigger that case, I added a debug message, such as below,
> lots of messages occur in a few second.
>
> diff --cc mm/memory.c
> index ff338c2abe92,ff338c2abe92..6552b3b14f78
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@@ -4768,6 -4768,6 +4768,8 @@@ vm_fault_t do_swap_page(struct vm_faul
>                 if (softleaf_is_migration(entry)) {
>                         migration_entry_wait(vma->vm_mm, vmf->pmd,
>                                              vmf->address);
> +                       if (!strcmp(current->comm, "repro"))
> +                               pr_err("============== hit ================\n");
>                 } else if (softleaf_is_device_exclusive(entry)) {
>                         vmf->page = softleaf_to_page(entry);
>                         ret = remove_device_exclusive_entry(vmf);

I have a kprobe set on migration_entry_wait that logs into an ftrace buffer
(including the kernel stack trace).
Yes, this function is hit, but only inside the mmap syscall, which is okay:
memory allocation is not realtime-safe.

           repro-2090    [002] d....   811.129549: frt_migration_entry_wait: (migration_entry_wait+0x0/0x100)
           repro-2090    [002] d....   811.129553: <stack trace>
 => migration_entry_wait
 => __handle_mm_fault
 => handle_mm_fault
 => __get_user_pages
 => populate_vma_page_range
 => __mm_populate
 => vm_mmap_pgoff
 => ksys_mmap_pgoff
 => __arm64_sys_mmap
 => el0_svc_common.constprop.0
 => do_el0_svc
 => el0_svc
 => el0t_64_sync_handler
 => el0t_64_sync

The original race was an instruction abort interrupt that came out of
nowhere, caused by the migration PTE set by kcompactd.
I see this kind of race quite often on non-mlockall() processes,
but can't reproduce it on memory-locked processes.

Example:
          podman-832     [000] d....   812.447820: frt_migration_entry_wait: (migration_entry_wait+0x0/0x100)
          podman-832     [000] d....   812.447823: <stack trace>
 => migration_entry_wait
 => __handle_mm_fault
 => handle_mm_fault
 => do_page_fault
 => do_translation_fault
 => do_mem_abort
 => el0_da
 => el0t_64_sync_handler
 => el0t_64_sync

Thanks,
Alexander

--

KUKA Deutschland GmbH   Board of Directors: Michael Jürgens (Chairman), Johan Naten, Hui Zhang   Registered Office: Augsburg HRB 14914



* [PATCH v7 9/9] init/main.c: use bootconfig_cmdline_requested() for the runtime opt-in
From: Breno Leitao @ 2026-06-26 12:50 UTC (permalink / raw)
  To: Masami Hiramatsu, Andrew Morton, Nathan Chancellor, paulmck,
	Nicolas Schier, Nick Desaulniers, Bill Wendling, Justin Stitt,
	Jonathan Corbet, Shuah Khan
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, linux-kernel, linux-trace-kernel, linux-kbuild,
	bpf, llvm, linux-doc, Breno Leitao, kernel-team
In-Reply-To: <20260626-bootconfig_using_tools-v7-0-24ab72139c29@debian.org>

setup_boot_config() open-coded the same "is bootconfig requested on the
kernel command line?" check that setup_arch() performs via the shared
bootconfig_cmdline_requested() helper. Switch it to the helper so the
early (setup_arch) and late (setup_boot_config) paths use one parser and
cannot disagree on what counts as opt-in.

The helper also reports the offset of the init arguments following a "--"
separator, which is exactly what initargs_offs needs, so the local
parse_args() call, its bootconfig_params() callback and the tmp_cmdline
copy are removed.

No functional change intended.

Suggested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
---
 init/main.c | 27 ++++++---------------------
 1 file changed, 6 insertions(+), 21 deletions(-)

diff --git a/init/main.c b/init/main.c
index 260bd5242f94e..39a518a472422 100644
--- a/init/main.c
+++ b/init/main.c
@@ -356,28 +356,17 @@ static char * __init xbc_make_cmdline(const char *key)
 	return new_cmdline;
 }
 
-static int __init bootconfig_params(char *param, char *val,
-				    const char *unused, void *arg)
-{
-	if (strcmp(param, "bootconfig") == 0) {
-		bootconfig_found = true;
-	}
-	return 0;
-}
-
 static int __init warn_bootconfig(char *str)
 {
-	/* The 'bootconfig' has been handled by bootconfig_params(). */
+	/* The 'bootconfig' option is handled by setup_boot_config(). */
 	return 0;
 }
 
 static void __init setup_boot_config(void)
 {
-	static char tmp_cmdline[COMMAND_LINE_SIZE] __initdata;
 	const char *msg, *data;
-	int pos, ret;
+	int pos, ret, offs;
 	size_t size;
-	char *err;
 	bool from_embedded = false;
 
 	/* Cut out the bootconfig data even if we have no bootconfig option */
@@ -388,16 +377,12 @@ static void __init setup_boot_config(void)
 		from_embedded = true;
 	}
 
-	strscpy(tmp_cmdline, boot_command_line, COMMAND_LINE_SIZE);
-	err = parse_args("bootconfig", tmp_cmdline, NULL, 0, 0, 0, NULL,
-			 bootconfig_params);
-
-	if (IS_ERR(err) || !(bootconfig_found || IS_ENABLED(CONFIG_BOOT_CONFIG_FORCE)))
+	bootconfig_found = bootconfig_cmdline_requested(boot_command_line, &offs);
+	if (!(bootconfig_found || IS_ENABLED(CONFIG_BOOT_CONFIG_FORCE)))
 		return;
 
-	/* parse_args() stops at the next param of '--' and returns an address */
-	if (err)
-		initargs_offs = err - tmp_cmdline;
+	/* Offset of the init arguments after a "--", located by the helper. */
+	initargs_offs = offs;
 
 	if (!data) {
 		/* If user intended to use bootconfig, show an error level message */

-- 
2.53.0-Meta



* [PATCH v7 8/9] bootconfig: skip runtime kernel.* render once prepended early
From: Breno Leitao @ 2026-06-26 12:50 UTC (permalink / raw)
  To: Masami Hiramatsu, Andrew Morton, Nathan Chancellor, paulmck,
	Nicolas Schier, Nick Desaulniers, Bill Wendling, Justin Stitt,
	Jonathan Corbet, Shuah Khan
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, linux-kernel, linux-trace-kernel, linux-kbuild,
	bpf, llvm, linux-doc, Breno Leitao, kernel-team
In-Reply-To: <20260626-bootconfig_using_tools-v7-0-24ab72139c29@debian.org>

setup_boot_config() folds the embedded bootconfig "kernel" subtree into
the command line via xbc_make_cmdline("kernel"). A subsequent patch lets
an architecture prepend the build-time-rendered embedded "kernel" keys
to boot_command_line early in setup_arch(); rendering them again here
would then duplicate every key in saved_command_line and make
accumulating handlers (console=, earlycon=, ...) re-register the same
value.

Track whether the bootconfig data came from the embedded source
(from_embedded) and skip the runtime render only when the early prepend
actually happened, as reported by xbc_embedded_cmdline_applied(). On
architectures that do not select ARCH_SUPPORTS_CMDLINE_FROM_BOOTCONFIG
that helper is a stub returning false, so this path is unchanged and the
embedded "kernel" keys still reach the cmdline via the runtime parser
exactly as before.
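
The skip condition described above can be written as a small predicate.
This is only a sketch: the function name should_render_kernel_keys() is
hypothetical, and its two parameters stand in for the from_embedded flag
and the result of xbc_embedded_cmdline_applied().

```c
#include <stdbool.h>

/*
 * Render the "kernel" subtree at runtime unless the data came from the
 * embedded bootconfig AND the early prepend already put the same keys
 * into boot_command_line.
 */
static bool should_render_kernel_keys(bool from_embedded, bool early_applied)
{
	return !from_embedded || !early_applied;
}
```

Only one of the four input combinations skips the render: embedded
source with the early prepend applied. An initrd bootconfig is always
rendered, and so is an embedded one when the prepend did not happen.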

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 init/main.c | 25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/init/main.c b/init/main.c
index e363232b428b4..260bd5242f94e 100644
--- a/init/main.c
+++ b/init/main.c
@@ -378,12 +378,15 @@ static void __init setup_boot_config(void)
 	int pos, ret;
 	size_t size;
 	char *err;
+	bool from_embedded = false;
 
 	/* Cut out the bootconfig data even if we have no bootconfig option */
 	data = get_boot_config_from_initrd(&size);
 	/* If there is no bootconfig in initrd, try embedded one. */
-	if (!data)
+	if (!data) {
 		data = xbc_get_embedded_bootconfig(&size);
+		from_embedded = true;
+	}
 
 	strscpy(tmp_cmdline, boot_command_line, COMMAND_LINE_SIZE);
 	err = parse_args("bootconfig", tmp_cmdline, NULL, 0, 0, 0, NULL,
@@ -421,8 +424,24 @@ static void __init setup_boot_config(void)
 	} else {
 		xbc_get_info(&ret, NULL);
 		pr_info("Load bootconfig: %ld bytes %d nodes\n", (long)size, ret);
-		/* keys starting with "kernel." are passed via cmdline */
-		extra_command_line = xbc_make_cmdline("kernel");
+		/*
+		 * keys starting with "kernel." are passed via cmdline. When
+		 * this bootconfig came from the embedded source and
+		 * setup_arch() already prepended the rendered "kernel" subtree
+		 * to boot_command_line, rendering again here would duplicate
+		 * the keys in saved_command_line and make accumulating handlers
+		 * (console=, earlycon=, ...) re-register the same value. Skip
+		 * only when the prepend really happened.
+		 *
+		 * On arches that do not select ARCH_SUPPORTS_CMDLINE_FROM_BOOTCONFIG,
+		 * CONFIG_CMDLINE_FROM_BOOTCONFIG is unselectable and
+		 * xbc_embedded_cmdline_applied() collapses to a stub returning
+		 * false, so this path still runs and the embedded "kernel"
+		 * keys reach the cmdline via the runtime parser exactly as
+		 * before this series.
+		 */
+		if (!from_embedded || !xbc_embedded_cmdline_applied())
+			extra_command_line = xbc_make_cmdline("kernel");
 		/* Also, "init." keys are init arguments */
 		extra_init_args = xbc_make_cmdline("init");
 	}

-- 
2.53.0-Meta



* [PATCH v7 7/9] x86/setup: prepend embedded bootconfig cmdline before parse_early_param
From: Breno Leitao @ 2026-06-26 12:50 UTC (permalink / raw)
  To: Masami Hiramatsu, Andrew Morton, Nathan Chancellor, paulmck,
	Nicolas Schier, Nick Desaulniers, Bill Wendling, Justin Stitt,
	Jonathan Corbet, Shuah Khan
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, linux-kernel, linux-trace-kernel, linux-kbuild,
	bpf, llvm, linux-doc, Breno Leitao, kernel-team
In-Reply-To: <20260626-bootconfig_using_tools-v7-0-24ab72139c29@debian.org>

Call xbc_prepend_embedded_cmdline() in setup_arch() right after the
CONFIG_CMDLINE merge and before strscpy(command_line, ...) so the
build-time-rendered embedded bootconfig "kernel" subtree is part of
boot_command_line by the time parse_early_param() runs. early_param()
handlers (mem=, earlycon=, loglevel=, ...) now see values supplied via
CONFIG_BOOT_CONFIG_EMBED_FILE without parsing bootconfig at runtime.

Gate the prepend on the same opt-in the runtime parser uses: prepend
when "bootconfig" is present on the command line, or when
CONFIG_BOOT_CONFIG_FORCE is set. Detect it with parse_args(), exactly
as setup_boot_config() does, so both agree on what counts as opt-in:
any "bootconfig" key regardless of value (bare, =0, =1, ...), and only
before the "--" that separates init arguments. Sharing the parser keeps
the early and late paths from diverging -- e.g. "bootconfig=0" or a
"-- bootconfig" meant for init must not apply the embedded keys early
while the runtime parser skips them.
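
To make the opt-in rule concrete, here is a simplified userspace model
of it. It is not the kernel implementation: the name optin_requested()
is hypothetical, and it assumes parameters are separated by spaces and
never quoted, whereas parse_args() handles more cases.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Simplified model of the opt-in rule: "bootconfig" counts in any form
 * (bare, =0, =1, ...) but only before a standalone "--".
 * If init_offs is non-NULL it receives the offset of the init arguments
 * that follow "--", or 0 when there is no "--".
 */
static bool optin_requested(const char *cmdline, int *init_offs)
{
	const char *p = cmdline;
	bool found = false;

	if (init_offs)
		*init_offs = 0;

	while (*p) {
		size_t len = strcspn(p, " ");

		if (len == 2 && !strncmp(p, "--", 2)) {
			/* Everything after "--" belongs to init. */
			if (init_offs) {
				const char *rest = p + len;

				rest += strspn(rest, " ");
				*init_offs = (int)(rest - cmdline);
			}
			break;
		}
		if (len >= 10 && !strncmp(p, "bootconfig", 10) &&
		    (len == 10 || p[10] == '='))
			found = true;

		p += len;
		p += strspn(p, " ");
	}
	return found;
}
```

With this model, "bootconfig=0 quiet" counts as opt-in, while
"quiet -- bootconfig" does not, because the key appears only after the
separator and is therefore meant for init.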

The prepend necessarily runs before setup_boot_config() detects an
initrd bootconfig, so an initrd cannot override the embedded "kernel"
keys for early_param(). This is intentional: the embedded cmdline acts
like a build-time CONFIG_CMDLINE. An initrd bootconfig's "kernel" keys
never reached early_param() anyway (they apply late via
extra_command_line), so nothing is lost -- the initrd keys still apply
late, with last-wins keeping the embedded values in effect.
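
The last-wins argument can be illustrated with a userspace sketch. The
helper last_wins() is hypothetical and only models a parameter whose
handler keeps the most recent value. It assumes space-separated,
unquoted parameters.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Copy the value of the last "key=value" found in cmdline into out.
 * Returns false when the key does not appear at all.
 */
static bool last_wins(const char *cmdline, const char *key,
		      char *out, size_t out_size)
{
	size_t klen = strlen(key);
	const char *p = cmdline;
	bool found = false;

	while (*p) {
		size_t len = strcspn(p, " ");

		if (len > klen && !strncmp(p, key, klen) && p[klen] == '=') {
			/* A later occurrence overwrites an earlier one. */
			snprintf(out, out_size, "%.*s",
				 (int)(len - klen - 1), p + klen + 1);
			found = true;
		}
		p += len;
		p += strspn(p, " ");
	}
	return found;
}
```

If the initrd supplies loglevel=3 and the embedded cmdline supplies
loglevel=7, the combined line reads "loglevel=3 loglevel=7 ..." since
the initrd keys are placed first, and the lookup returns 7.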

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 arch/x86/Kconfig        |  1 +
 arch/x86/kernel/setup.c | 14 +++++++++++++-
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 0de23e6471973..8ab11199c16d5 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -127,6 +127,7 @@ config X86
 	select ARCH_SUPPORTS_NUMA_BALANCING	if X86_64
 	select ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP	if NR_CPUS <= 4096
 	select ARCH_SUPPORTS_CFI		if X86_64
+	select ARCH_SUPPORTS_CMDLINE_FROM_BOOTCONFIG
 	select ARCH_USES_CFI_TRAPS		if X86_64 && CFI
 	select ARCH_SUPPORTS_LTO_CLANG
 	select ARCH_SUPPORTS_LTO_CLANG_THIN
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 46882ce79c3a4..88b055a46591e 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -6,6 +6,7 @@
  * parts of early kernel initialization.
  */
 #include <linux/acpi.h>
+#include <linux/bootconfig.h>
 #include <linux/console.h>
 #include <linux/cpu.h>
 #include <linux/crash_dump.h>
@@ -880,7 +881,6 @@ static void __init x86_report_nx(void)
  *
  * Note: On x86_64, fixmaps are ready for use even before this is called.
  */
-
 void __init setup_arch(char **cmdline_p)
 {
 #ifdef CONFIG_X86_32
@@ -924,6 +924,18 @@ void __init setup_arch(char **cmdline_p)
 	builtin_cmdline_added = true;
 #endif
 
+#ifdef CONFIG_CMDLINE_FROM_BOOTCONFIG
+	/*
+	 * Prepend the build-time-rendered embedded "kernel" keys here so
+	 * parse_early_param() below sees them, using the same opt-in as the
+	 * runtime parser, plus the build-time CONFIG_BOOT_CONFIG_FORCE.
+	 */
+	if (bootconfig_cmdline_requested(boot_command_line, NULL) ||
+	    IS_ENABLED(CONFIG_BOOT_CONFIG_FORCE))
+		xbc_prepend_embedded_cmdline(boot_command_line,
+					     COMMAND_LINE_SIZE);
+#endif
+
 	strscpy(command_line, boot_command_line, COMMAND_LINE_SIZE);
 	*cmdline_p = command_line;
 

-- 
2.53.0-Meta



* [PATCH v7 6/9] Documentation: bootconfig: document build-time cmdline rendering
From: Breno Leitao @ 2026-06-26 12:50 UTC (permalink / raw)
  To: Masami Hiramatsu, Andrew Morton, Nathan Chancellor, paulmck,
	Nicolas Schier, Nick Desaulniers, Bill Wendling, Justin Stitt,
	Jonathan Corbet, Shuah Khan
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, linux-kernel, linux-trace-kernel, linux-kbuild,
	bpf, llvm, linux-doc, Breno Leitao, kernel-team
In-Reply-To: <20260626-bootconfig_using_tools-v7-0-24ab72139c29@debian.org>

Add a section describing CONFIG_CMDLINE_FROM_BOOTCONFIG: what it
does (renders the embedded "kernel" subtree to a flat cmdline at
build time so early_param() handlers see the values), what it
requires (BOOT_CONFIG_EMBED, a non-empty BOOT_CONFIG_EMBED_FILE,
CONFIG_CMDLINE to be empty, and ARCH_SUPPORTS_CMDLINE_FROM_BOOTCONFIG --
currently x86 only), the bootconfig opt-in semantics, the initrd-vs-embedded
precedence, and the soft-error overflow behavior.

This addresses feedback from the Sashiko AI review and Masami Hiramatsu:
the CONFIG_CMDLINE requirement is enforced at the Kconfig level but was
not mentioned in the documentation. Users who satisfy all other
requirements could otherwise be confused to find the option hidden in
menuconfig when CONFIG_CMDLINE is non-empty.

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 Documentation/admin-guide/bootconfig.rst | 81 ++++++++++++++++++++++++++++++++
 1 file changed, 81 insertions(+)

diff --git a/Documentation/admin-guide/bootconfig.rst b/Documentation/admin-guide/bootconfig.rst
index f712758472d5c..3d6412458c8b6 100644
--- a/Documentation/admin-guide/bootconfig.rst
+++ b/Documentation/admin-guide/bootconfig.rst
@@ -234,6 +234,87 @@ Kconfig option selected.
 Note that even if you set this option, you can override the embedded
 bootconfig by another bootconfig which attached to the initrd.
 
+Rendering Embedded kernel.* Keys at Build Time
+----------------------------------------------
+
+By default, the embedded bootconfig (``CONFIG_BOOT_CONFIG_EMBED=y``) is
+parsed at runtime, after ``parse_early_param()`` has already run. Early
+parameter handlers (``mem=``, ``earlycon=``, ``loglevel=``, ...) therefore
+cannot see values supplied via the embedded ``kernel`` subtree.
+
+``CONFIG_CMDLINE_FROM_BOOTCONFIG`` resolves this by rendering the
+``kernel`` subtree of ``CONFIG_BOOT_CONFIG_EMBED_FILE`` into a flat cmdline
+string at kernel build time (via ``tools/bootconfig -C``) and prepending
+it to ``boot_command_line`` during early architecture setup, so the keys
+are visible to ``parse_early_param()``.
+
+The option requires ``CONFIG_BOOT_CONFIG_EMBED=y``, a non-empty
+``CONFIG_BOOT_CONFIG_EMBED_FILE``, ``CONFIG_CMDLINE`` to be empty, and
+an architecture that selects ``CONFIG_ARCH_SUPPORTS_CMDLINE_FROM_BOOTCONFIG``.
+Currently only x86 selects it; on other architectures the embedded
+bootconfig still works, but only through the late runtime parser.
+
+The same ``bootconfig`` opt-in applies as elsewhere: the rendered keys
+are prepended only when ``bootconfig`` (in any form) appears on the
+kernel command line, or when ``CONFIG_BOOT_CONFIG_FORCE`` is set, which
+defaults to ``y`` when ``CONFIG_BOOT_CONFIG_EMBED`` is set.
+
+For example, given::
+
+ kernel {
+   loglevel = 7
+   mem = 4G
+ }
+
+the kernel boots as if ``loglevel=7 mem=4G`` had been prepended to the
+bootloader command line, with the values visible to early-parsed
+handlers. Comma-separated values are still expanded into multiple
+cmdline entries per the bootconfig array convention -- the embedded
+``kernel.earlycon = "uart8250,io,0x3f8"`` must be quoted to land as a
+single ``earlycon=`` entry, exactly as for the runtime parser.
+
+If the rendered string would not fit in ``COMMAND_LINE_SIZE`` together
+with the existing command line, the prepend is skipped and an error is
+logged, so an oversized embedded bootconfig cannot brick a boot.
+
+Interaction with other command line and bootconfig sources
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+With ``CONFIG_CMDLINE_FROM_BOOTCONFIG=y`` the rendered ``kernel``
+subtree behaves like a build-time command line (similar to
+``CONFIG_CMDLINE``), not like a bootconfig source. It is prepended to
+``boot_command_line`` in ``setup_arch()``, before ``parse_early_param()``
+and long before the runtime parser looks at an initrd. Options can reach
+the kernel from up to four places:
+
+- Bootloader command line: the arguments the boot loader passes. The
+  embedded cmdline is prepended in front of them, so for last-one-wins
+  parameters a bootloader option still overrides the embedded value.
+  Visible in /proc/cmdline.
+- Embedded cmdline (this option): the rendered ``kernel`` subtree,
+  prepended early so it is seen by ``parse_early_param()``. Visible in
+  /proc/cmdline.
+- Initrd bootconfig: parsed late in ``setup_boot_config()``; its
+  ``kernel`` keys are placed ahead of ``boot_command_line``, i.e. before
+  the embedded cmdline, so last-wins favors the embedded values. As a
+  bootconfig source, an initrd bootconfig still replaces the embedded
+  bootconfig. Visible in /proc/cmdline and /proc/bootconfig.
+- Embedded bootconfig (runtime): parsed late, only when no initrd
+  bootconfig is present. Visible in /proc/cmdline and /proc/bootconfig.
+
+So with this option the embedded ``kernel.*`` values take precedence
+over an initrd bootconfig's ``kernel.*`` values: for early parameters
+the initrd is not parsed yet, and for ordinary parameters the embedded
+keys land later in the command line. If you need an initrd bootconfig to
+override the embedded ``kernel.*`` keys, leave this option off and rely
+on the runtime parser.
+
+The rendered string is part of the command line, so it appears in
+/proc/cmdline. It is deliberately not shown in /proc/bootconfig: that
+file keeps reporting the parsed bootconfig tree -- the initrd bootconfig
+if present, otherwise the embedded bootconfig -- independent of whether
+build-time cmdline rendering is enabled.
+
 Kernel parameters via Boot Config
 =================================
 

-- 
2.53.0-Meta



* [PATCH v7 5/9] bootconfig: add xbc_prepend_embedded_cmdline() helper
From: Breno Leitao @ 2026-06-26 12:50 UTC (permalink / raw)
  To: Masami Hiramatsu, Andrew Morton, Nathan Chancellor, paulmck,
	Nicolas Schier, Nick Desaulniers, Bill Wendling, Justin Stitt,
	Jonathan Corbet, Shuah Khan
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, linux-kernel, linux-trace-kernel, linux-kbuild,
	bpf, llvm, linux-doc, Breno Leitao, kernel-team
In-Reply-To: <20260626-bootconfig_using_tools-v7-0-24ab72139c29@debian.org>

Add a helper that prepends the build-time-rendered embedded bootconfig
"kernel" subtree (embedded_kernel_cmdline[] from embedded-cmdline.S) to
a cmdline buffer with a separating space. Architectures call this from
setup_arch() before parse_early_param() so early_param() handlers
(mem=, earlycon=, loglevel=, ...) see values supplied via the embedded
bootconfig.

The in-place prepend (shift the existing string right, then drop the
embedded string in front) is factored into a small str_prepend() helper.

On overflow the helper logs an error and leaves the cmdline untouched
rather than panicking. Booting without the embedded values is better
than refusing to boot, and the error tells the user why their embedded
keys are missing.
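
For readers who want to try the prepend and overflow behavior outside
the kernel, here is a userspace sketch of the same idea. The name
prepend_or_skip() is hypothetical. It returns a bool instead of logging,
and it uses memchr() where the kernel code uses strnlen().

```c
#include <stdbool.h>
#include <string.h>

/*
 * Prepend src (src_len bytes, expected to end with a separating space)
 * to the NUL-terminated string in dst, whose buffer holds size bytes.
 * Returns false and leaves dst untouched when the result would not fit.
 */
static bool prepend_or_skip(char *dst, size_t size,
			    const char *src, size_t src_len)
{
	const char *nul = memchr(dst, '\0', size);
	size_t dst_len = nul ? (size_t)(nul - dst) : size;

	/* Need room for both strings plus the terminating NUL. */
	if (src_len + dst_len + 1 > size)
		return false;

	/* Shift the existing string (and its NUL) right, then fill the gap. */
	memmove(dst + src_len, dst, dst_len + 1);
	memcpy(dst, src, src_len);
	return true;
}
```

Moving dst_len + 1 bytes carries the terminator along, so an empty dst
needs no special case. A dst with no NUL inside size bytes is treated as
full and the prepend is refused.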

The helper records whether it actually prepended, exposed via
xbc_embedded_cmdline_applied(). setup_boot_config() uses this to decide
whether the runtime "kernel" render would duplicate keys already folded
into boot_command_line.

Also add bootconfig_cmdline_requested(), a small parse_args() wrapper
that reports whether "bootconfig" was passed on the command line and,
via an optional out-parameter, where the "--" init arguments begin.
setup_arch() and setup_boot_config() share it so the early and late
paths agree on the opt-in. It sits under CONFIG_BOOT_CONFIG rather than
CONFIG_CMDLINE_FROM_BOOTCONFIG because the runtime parser needs it on
every bootconfig build.

When CONFIG_CMDLINE_FROM_BOOTCONFIG=n, the public declaration in
<linux/bootconfig.h> resolves to a no-op stub so callers compile
unchanged.

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 include/linux/bootconfig.h |  14 +++++
 lib/bootconfig.c           | 128 ++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 141 insertions(+), 1 deletion(-)

diff --git a/include/linux/bootconfig.h b/include/linux/bootconfig.h
index 1c7f3b74ffcf3..deda507500da2 100644
--- a/include/linux/bootconfig.h
+++ b/include/linux/bootconfig.h
@@ -308,4 +308,18 @@ static inline const char *xbc_get_embedded_bootconfig(size_t *size)
 }
 #endif
 
+/* Bootconfig opt-in detection, shared by setup_arch() and setup_boot_config() */
+#ifdef CONFIG_BOOT_CONFIG
+bool __init bootconfig_cmdline_requested(const char *boot_cmdline, int *end_offset);
+#endif
+
+/* Build-time-rendered bootconfig cmdline prepended in setup_arch() */
+#ifdef CONFIG_CMDLINE_FROM_BOOTCONFIG
+void __init xbc_prepend_embedded_cmdline(char *dst, size_t size);
+bool __init xbc_embedded_cmdline_applied(void);
+#else
+static inline void xbc_prepend_embedded_cmdline(char *dst, size_t size) { }
+static inline bool xbc_embedded_cmdline_applied(void) { return false; }
+#endif
+
 #endif
diff --git a/lib/bootconfig.c b/lib/bootconfig.c
index 926094d97397e..89c88e359179f 100644
--- a/lib/bootconfig.c
+++ b/lib/bootconfig.c
@@ -19,9 +19,13 @@
 #include <linux/errno.h>
 #include <linux/cache.h>
 #include <linux/compiler.h>
+#include <linux/init.h>
+#include <linux/moduleparam.h>
+#include <linux/printk.h>
 #include <linux/sprintf.h>
 #include <linux/memblock.h>
 #include <linux/string.h>
+#include <asm/setup.h>		/* COMMAND_LINE_SIZE */
 
 #ifdef CONFIG_BOOT_CONFIG_EMBED
 /* embedded_bootconfig_data is defined in bootconfig-data.S */
@@ -34,7 +38,129 @@ const char * __init xbc_get_embedded_bootconfig(size_t *size)
 	return (*size) ? embedded_bootconfig_data : NULL;
 }
 #endif
-#endif
+
+#ifdef CONFIG_CMDLINE_FROM_BOOTCONFIG
+/* embedded_kernel_cmdline is defined in embedded-cmdline.S */
+extern __visible const char embedded_kernel_cmdline[];
+extern __visible const char embedded_kernel_cmdline_end[];
+
+/* Set once the embedded cmdline has actually been prepended. */
+static bool xbc_cmdline_applied __initdata;
+
+/*
+ * str_prepend() - Prepend @src in front of the string in @dst, in place
+ * @dst: NUL-terminated destination buffer, currently @dst_len bytes long
+ * @dst_len: length of the current @dst string (excluding its NUL)
+ * @src: bytes to prepend (not NUL-terminated)
+ * @src_len: number of bytes from @src to prepend
+ *
+ * The caller must guarantee @dst has room for src_len + dst_len + 1 bytes.
+ * Moving dst_len + 1 bytes carries @dst's NUL terminator too, so an empty
+ * @dst needs no special case.
+ */
+static void __init str_prepend(char *dst, size_t dst_len,
+			       const char *src, size_t src_len)
+{
+	memmove(dst + src_len, dst, dst_len + 1);
+	memcpy(dst, src, src_len);
+}
+
+/**
+ * xbc_prepend_embedded_cmdline() - Prepend embedded bootconfig cmdline
+ * @dst: cmdline buffer to prepend into (must already contain a NUL byte)
+ * @size: total capacity of @dst in bytes
+ *
+ * Prepend the build-time-rendered "kernel" subtree of the embedded
+ * bootconfig to @dst. The rendered string already ends with a single
+ * space (the xbc_snprint_cmdline() invariant), which serves as the
+ * separator between the embedded keys and any existing content of @dst.
+ * On overflow, log an error and leave @dst untouched rather than
+ * silently truncating: booting without the embedded values is better
+ * than refusing to boot, and the error message tells the user why
+ * their embedded keys are missing.
+ *
+ * Intended to be called from setup_arch() before parse_early_param() so
+ * that early_param() handlers see the embedded values.
+ */
+void __init xbc_prepend_embedded_cmdline(char *dst, size_t size)
+{
+	size_t embed_len = embedded_kernel_cmdline_end - embedded_kernel_cmdline;
+	size_t dst_len;
+
+	if (!size || embed_len <= 1)	/* trailing NUL only */
+		return;
+	embed_len--;			/* exclude trailing NUL byte */
+
+	dst_len = strnlen(dst, size);
+	if (embed_len + dst_len + 1 > size) {
+		pr_err("embedded bootconfig cmdline (%zu bytes) does not fit in COMMAND_LINE_SIZE with %zu bytes already used; ignoring embedded values\n",
+		       embed_len, dst_len);
+		return;
+	}
+
+	str_prepend(dst, dst_len, embedded_kernel_cmdline, embed_len);
+	xbc_cmdline_applied = true;
+}
+
+/**
+ * xbc_embedded_cmdline_applied() - Did the embedded cmdline get prepended?
+ *
+ * Return true if xbc_prepend_embedded_cmdline() actually prepended the
+ * embedded "kernel" subtree. setup_boot_config() uses this to avoid
+ * rendering the same keys a second time.
+ */
+bool __init xbc_embedded_cmdline_applied(void)
+{
+	return xbc_cmdline_applied;
+}
+#endif	/* CONFIG_CMDLINE_FROM_BOOTCONFIG */
+
+/* parse_args() callback: flag when the "bootconfig" parameter is present. */
+static int __init bootconfig_optin(char *param, char *val,
+				   const char *unused, void *arg)
+{
+	if (!strcmp(param, "bootconfig"))
+		*(bool *)arg = true;
+	return 0;
+}
+
+/**
+ * bootconfig_cmdline_requested() - Was "bootconfig" passed on the cmdline?
+ * @boot_cmdline: kernel command line to inspect (not modified)
+ * @end_offset: if non-NULL, set to the offset of the init arguments that
+ *		follow a "--" separator, or 0 when there is none
+ *
+ * Parse a private copy of @boot_cmdline (parse_args() is destructive) and
+ * report whether "bootconfig" is present before the "--" separator.
+ * setup_arch() uses this to gate prepending the build-time embedded cmdline;
+ * setup_boot_config() uses it for the runtime opt-in and to locate the init
+ * arguments via @end_offset. Sharing one parser keeps the early and late
+ * paths agreeing on what counts as opt-in. CONFIG_BOOT_CONFIG_FORCE is not
+ * folded in here; callers apply it where they need it.
+ */
+bool __init bootconfig_cmdline_requested(const char *boot_cmdline, int *end_offset)
+{
+	static char tmp_cmdline[COMMAND_LINE_SIZE] __initdata;
+	bool found = false;
+	char *err;
+
+	if (end_offset)
+		*end_offset = 0;
+
+	strscpy(tmp_cmdline, boot_cmdline, COMMAND_LINE_SIZE);
+	err = parse_args("bootconfig", tmp_cmdline, NULL, 0, 0, 0,
+			 &found, bootconfig_optin);
+	if (IS_ERR(err))
+		return false;
+
+	/* parse_args() stops at "--" and returns the address of the rest. */
+	if (end_offset && err)
+		*end_offset = err - tmp_cmdline;
+
+	return found;
+}
+
+#endif	/* __KERNEL__ */
 
 /*
  * Extra Boot Config (XBC) is given as tree-structured ascii text of

-- 
2.53.0-Meta
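
For readers following the opt-in logic in bootconfig_cmdline_requested()
above, here is a rough userspace approximation. It is only a sketch:
the function name cmdline_requests_bootconfig() is made up for
illustration, tokens are split on spaces and tabs only, and the quoting
rules that the kernel's parse_args() handles are deliberately ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical userspace approximation: report whether "bootconfig" (bare
 * or as "bootconfig=<value>") appears before a "--" separator. If
 * @end_offset is non-NULL it receives the offset of the text that follows
 * "--", or 0 when there is no separator. Quoting is not handled.
 */
bool cmdline_requests_bootconfig(const char *cmdline, size_t *end_offset)
{
	static const char key[] = "bootconfig";
	const size_t klen = sizeof(key) - 1;
	const char *p = cmdline;
	bool found = false;

	assert(cmdline);
	if (end_offset)
		*end_offset = 0;

	for (;;) {
		size_t tlen;

		p += strspn(p, " \t");		/* skip separators */
		if (!*p)
			break;
		tlen = strcspn(p, " \t");	/* length of this token */

		/* Everything after "--" belongs to init, so stop here. */
		if (tlen == 2 && !strncmp(p, "--", 2)) {
			p += 2;
			p += strspn(p, " \t");
			if (end_offset)
				*end_offset = (size_t)(p - cmdline);
			break;
		}

		/* Accept "bootconfig" and "bootconfig=<anything>". */
		if (tlen >= klen && !strncmp(p, key, klen) &&
		    (tlen == klen || p[klen] == '='))
			found = true;
		p += tlen;
	}
	return found;
}
```

Unlike the kernel helper, this sketch never writes to its input, so it
does not need a private copy of the command line.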



* [PATCH v7 4/9] bootconfig: clean build-time tools/bootconfig from make clean
From: Breno Leitao @ 2026-06-26 12:50 UTC (permalink / raw)
  To: Masami Hiramatsu, Andrew Morton, Nathan Chancellor, paulmck,
	Nicolas Schier, Nick Desaulniers, Bill Wendling, Justin Stitt,
	Jonathan Corbet, Shuah Khan
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, linux-kernel, linux-trace-kernel, linux-kbuild,
	bpf, llvm, linux-doc, Breno Leitao, kernel-team, Nicolas Schier
In-Reply-To: <20260626-bootconfig_using_tools-v7-0-24ab72139c29@debian.org>

The previous patch builds tools/bootconfig during 'make prepare' to
render the embedded bootconfig cmdline, but nothing removes it on
'make clean', leaving the compiled tool and its objects behind.

Wire a bootconfig_clean hook into the top-level clean target so the
compiled tool and its objects are removed by make clean, matching the
prepare-wired tools/objtool and tools/bpf/resolve_btfids.

The hook runs tools/bootconfig's Makefile via $(MAKE), which the kernel
build invokes with -rR (MAKEFLAGS += -rR). -rR drops the built-in $(RM)
variable, so the existing "$(RM) -f ..." clean recipe would expand to a
bare "-f ..." and fail. Spell the recipe with a literal "rm -f" so it
keeps working both standalone and when invoked from Kbuild.

Reviewed-by: Nicolas Schier <n.schier@fritz.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
---
 Makefile                  | 11 ++++++++++-
 tools/bootconfig/Makefile |  2 +-
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/Makefile b/Makefile
index 5255aa35a2e51..20a2bcacde3b8 100644
--- a/Makefile
+++ b/Makefile
@@ -1587,6 +1587,15 @@ ifneq ($(wildcard $(objtool_O)),)
 	$(Q)$(MAKE) -sC $(abs_srctree)/tools/objtool O=$(objtool_O) srctree=$(abs_srctree) $(patsubst objtool_%,%,$@)
 endif
 
+PHONY += bootconfig_clean
+
+bootconfig_O = $(abspath $(objtree))/tools/bootconfig
+
+bootconfig_clean:
+ifneq ($(wildcard $(bootconfig_O)),)
+	$(Q)$(MAKE) -sC $(srctree)/tools/bootconfig O=$(bootconfig_O) clean
+endif
+
 tools/: FORCE
 	$(Q)mkdir -p $(objtree)/tools
 	$(Q)$(MAKE) O=$(abspath $(objtree)) subdir=tools -C $(srctree)/tools/
@@ -1757,7 +1766,7 @@ vmlinuxclean:
 	$(Q)$(CONFIG_SHELL) $(srctree)/scripts/link-vmlinux.sh clean
 	$(Q)$(if $(ARCH_POSTLINK), $(MAKE) -f $(ARCH_POSTLINK) clean)
 
-clean: archclean vmlinuxclean resolve_btfids_clean objtool_clean
+clean: archclean vmlinuxclean resolve_btfids_clean objtool_clean bootconfig_clean
 
 # mrproper - Delete all generated files, including .config
 #
diff --git a/tools/bootconfig/Makefile b/tools/bootconfig/Makefile
index 4e82fd9553cde..3cb8066d5141b 100644
--- a/tools/bootconfig/Makefile
+++ b/tools/bootconfig/Makefile
@@ -27,4 +27,4 @@ install: $(ALL_PROGRAMS)
 	install $(OUTPUT)bootconfig $(DESTDIR)$(bindir)
 
 clean:
-	$(RM) -f $(OUTPUT)*.o $(ALL_PROGRAMS)
+	rm -f $(OUTPUT)*.o $(ALL_PROGRAMS)

-- 
2.53.0-Meta



* [PATCH v7 3/9] bootconfig: render embedded bootconfig as a kernel cmdline at build time
From: Breno Leitao @ 2026-06-26 12:50 UTC (permalink / raw)
  To: Masami Hiramatsu, Andrew Morton, Nathan Chancellor, paulmck,
	Nicolas Schier, Nick Desaulniers, Bill Wendling, Justin Stitt,
	Jonathan Corbet, Shuah Khan
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, linux-kernel, linux-trace-kernel, linux-kbuild,
	bpf, llvm, linux-doc, Breno Leitao, kernel-team, Nicolas Schier
In-Reply-To: <20260626-bootconfig_using_tools-v7-0-24ab72139c29@debian.org>

Add the build-time pipeline that renders the "kernel" subtree of
CONFIG_BOOT_CONFIG_EMBED_FILE into a flat cmdline string and stashes
it in .init.rodata as embedded_kernel_cmdline[]. A follow-up patch
adds the runtime helper that prepends this string to boot_command_line
during early architecture setup so parse_early_param() sees the values.

The build wires up:
  tools/bootconfig -C kernel - userspace tool already shared with
                               lib/bootconfig.c, used here in -C mode
                               to render a bootconfig file to a cmdline
  lib/embedded-cmdline.S     - .incbin's the rendered text plus a NUL
                               (listed under the EXTRA BOOT CONFIG
                               MAINTAINERS entry)
  lib/Makefile rule          - runs tools/bootconfig at build time
  Makefile prepare dep       - ensures tools/bootconfig is built first,
                               same pattern as tools/objtool and
                               tools/bpf/resolve_btfids

Drop the test target from tools/bootconfig/Makefile's default 'all'
recipe so that hooking the binary into the kernel build does not run
test-bootconfig.sh on every prepare. The tests stay available as
'make -C tools/bootconfig test', matching the convention of
tools/objtool and tools/bpf/resolve_btfids whose 'all' targets only
build the binary.

Require BOOT_CONFIG_EMBED_FILE to be non-empty before the new option
can be enabled, otherwise tools/bootconfig -C runs against an empty
file and prints a parse error on every kernel build.

The feature gates on CONFIG_ARCH_SUPPORTS_CMDLINE_FROM_BOOTCONFIG, a
silent symbol arches select once they've wired the prepend call into
setup_arch(). No arch selects it in this patch, so the user-visible
CONFIG_CMDLINE_FROM_BOOTCONFIG is not yet enableable; when an arch
later opts in, the runtime behavior is added by the follow-up patches.

tools/bootconfig also installs on target systems, so its own Makefile
keeps $(CC) and stays cross-buildable as a standalone tool. The kernel
build, which runs the tool on the build host during prepare, instead
forces CC=$(HOSTCC) from a dedicated tools/bootconfig rule and clears
CROSS_COMPILE= in the sub-make. Without that clear, an LLVM=1 cross
build would inherit CROSS_COMPILE and tools/scripts/Makefile.include
would inject --target=/--sysroot= flags into the host clang invocation,
producing a target binary that fails to exec ("Exec format error").

embedded-cmdline.S places the rendered string in its own .init.rodata
subsection (.init.rodata.embed_cmdline) with the "a" (allocatable,
read-only) flag and %progbits. lib/bootconfig-data.S already places
the embedded bootconfig blob in .init.rodata with the "aw" flag
(xbc_init() rewrites separators in place, so that data must be
writable). Using a distinct subsection name avoids the ld.lld section-
type mismatch that would otherwise arise from mixing "a" and "aw"
under the same name; the linker's "*(.init.rodata .init.rodata.*)"
glob still folds both into the init image and frees them after boot.

A follow-up patch wires the build-time tools/bootconfig into the
top-level clean target.

Reviewed-by: Nicolas Schier <n.schier@fritz.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
---
 MAINTAINERS               |  1 +
 Makefile                  | 16 ++++++++++++++++
 init/Kconfig              | 36 ++++++++++++++++++++++++++++++++++++
 lib/Makefile              | 16 ++++++++++++++++
 lib/embedded-cmdline.S    | 16 ++++++++++++++++
 tools/bootconfig/Makefile |  2 +-
 6 files changed, 86 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 57656ec0e9d5d..953231df1911d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9844,6 +9844,7 @@ F:	fs/proc/bootconfig.c
 F:	include/linux/bootconfig.h
 F:	lib/bootconfig-data.S
 F:	lib/bootconfig.c
+F:	lib/embedded-cmdline.S
 F:	tools/bootconfig/*
 F:	tools/bootconfig/scripts/*
 
diff --git a/Makefile b/Makefile
index bf196c6df5b92..5255aa35a2e51 100644
--- a/Makefile
+++ b/Makefile
@@ -1545,6 +1545,22 @@ prepare: tools/bpf/resolve_btfids
 endif
 endif
 
+# tools/bootconfig renders the embedded bootconfig into a cmdline at build time.
+ifdef CONFIG_CMDLINE_FROM_BOOTCONFIG
+prepare: tools/bootconfig
+endif
+
+# tools/bootconfig is run on the build host during prepare, so force a host
+# binary here; its own Makefile keeps $(CC) for standalone and cross builds.
+# CROSS_COMPILE= is cleared so tools/scripts/Makefile.include does not inject
+# the target's --target=/--sysroot= flags into the host clang invocation under
+# LLVM=1 cross builds (which would produce a target binary that fails to exec).
+tools/bootconfig: export CC := $(HOSTCC)
+tools/bootconfig: FORCE
+	$(Q)mkdir -p $(objtree)/tools
+	$(Q)$(MAKE) O=$(abspath $(objtree)) subdir=tools -C $(srctree)/tools/ \
+		bootconfig CROSS_COMPILE=
+
 # The tools build system is not a part of Kbuild and tends to introduce
 # its own unique issues. If you need to integrate a new tool into Kbuild,
 # please consider locating that tool outside the tools/ tree and using the
diff --git a/init/Kconfig b/init/Kconfig
index 5230d4879b1c8..598690ec313a2 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1566,6 +1566,42 @@ config BOOT_CONFIG_EMBED_FILE
 	  This bootconfig will be used if there is no initrd or no other
 	  bootconfig in the initrd.
 
+config ARCH_SUPPORTS_CMDLINE_FROM_BOOTCONFIG
+	bool
+	help
+	  Silent symbol; no C code reads it directly. Architectures
+	  select it once their setup_arch() calls
+	  xbc_prepend_embedded_cmdline() before parse_early_param().
+	  Its only role is to gate the user-visible
+	  CMDLINE_FROM_BOOTCONFIG option per-arch, the same
+	  ARCH_SUPPORTS_* idiom used by ARCH_SUPPORTS_CFI, etc.
+
+config CMDLINE_FROM_BOOTCONFIG
+	bool "Render embedded bootconfig as kernel cmdline at build time"
+	depends on BOOT_CONFIG_EMBED_FILE != ""
+	depends on ARCH_SUPPORTS_CMDLINE_FROM_BOOTCONFIG
+	depends on CMDLINE = ""
+	default n
+	help
+	  Render the "kernel" subtree of the embedded bootconfig file into a
+	  flat cmdline string at kernel build time and prepend it to
+	  boot_command_line during early architecture setup. This makes
+	  early_param() handlers (e.g. mem=, earlycon=, loglevel=) see the
+	  values supplied via the embedded bootconfig.
+
+	  The runtime bootconfig parser is unaffected, so tree-structured
+	  consumers such as ftrace boot-time tracing keep working.
+
+	  Note: when an initrd also carries a bootconfig, its "kernel"
+	  subtree is still parsed at runtime, but the embedded "kernel"
+	  keys remain in boot_command_line for parse_early_param() and
+	  end up later than the initrd keys in saved_command_line, so
+	  parse_args() last-wins favors the embedded values. If you need
+	  initrd to override embedded kernel.* keys, leave this option
+	  off.
+
+	  If unsure, say N.
+
 config CMDLINE_LOG_WRAP_IDEAL_LEN
 	int "Length to try to wrap the cmdline when logged at boot"
 	default 1021
diff --git a/lib/Makefile b/lib/Makefile
index 7f75cc6edf94a..4ccdce2fd5e5b 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -273,6 +273,22 @@ filechk_defbconf = cat $(or $(real-prereqs), /dev/null)
 $(obj)/default.bconf: $(CONFIG_BOOT_CONFIG_EMBED_FILE) FORCE
 	$(call filechk,defbconf)
 
+obj-$(CONFIG_CMDLINE_FROM_BOOTCONFIG) += embedded-cmdline.o
+$(obj)/embedded-cmdline.o: $(obj)/embedded_cmdline.bin
+
+# Render the bootconfig "kernel" subtree to a flat cmdline string using
+# the userspace tools/bootconfig parser (-C mode). The runtime prepend
+# helper enforces COMMAND_LINE_SIZE at boot, so no build-time size
+# check is performed here (COMMAND_LINE_SIZE is an arch header
+# constant, not a Kconfig value).
+quiet_cmd_render_cmdline = BCONF2C $@
+      cmd_render_cmdline = \
+	$(objtree)/tools/bootconfig/bootconfig -C $< > $@
+
+targets += embedded_cmdline.bin
+$(obj)/embedded_cmdline.bin: $(obj)/default.bconf $(objtree)/tools/bootconfig/bootconfig FORCE
+	$(call if_changed,render_cmdline)
+
 obj-$(CONFIG_RBTREE_TEST) += rbtree_test.o
 obj-$(CONFIG_INTERVAL_TREE_TEST) += interval_tree_test.o
 
diff --git a/lib/embedded-cmdline.S b/lib/embedded-cmdline.S
new file mode 100644
index 0000000000000..bda81b4a42bea
--- /dev/null
+++ b/lib/embedded-cmdline.S
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Embed the build-time-rendered bootconfig "kernel" subtree as a flat
+ * cmdline string. setup_arch() prepends this to boot_command_line on
+ * architectures that select ARCH_SUPPORTS_CMDLINE_FROM_BOOTCONFIG.
+ *
+ * Copyright (c) 2026 Meta Platforms, Inc. and affiliates
+ * Copyright (c) 2026 Breno Leitao <leitao@debian.org>
+ */
+	.section .init.rodata.embed_cmdline, "a", %progbits
+	.global embedded_kernel_cmdline
+embedded_kernel_cmdline:
+	.incbin "lib/embedded_cmdline.bin"
+	.byte 0
+	.global embedded_kernel_cmdline_end
+embedded_kernel_cmdline_end:
diff --git a/tools/bootconfig/Makefile b/tools/bootconfig/Makefile
index 90eb47c9d8de6..4e82fd9553cde 100644
--- a/tools/bootconfig/Makefile
+++ b/tools/bootconfig/Makefile
@@ -15,7 +15,7 @@ override CFLAGS += -Wall -g -I$(CURDIR)/include
 ALL_TARGETS := bootconfig
 ALL_PROGRAMS := $(patsubst %,$(OUTPUT)%,$(ALL_TARGETS))
 
-all: $(ALL_PROGRAMS) test
+all: $(ALL_PROGRAMS)
 
 $(OUTPUT)bootconfig: main.c include/linux/bootconfig.h $(LIBSRC)
 	$(CC) $(filter %.c,$^) $(CFLAGS) $(LDFLAGS) -o $@

-- 
2.53.0-Meta



* [PATCH v7 2/9] bootconfig: render descendant keys when xbc_snprint_cmdline() root has a value
From: Breno Leitao @ 2026-06-26 12:50 UTC (permalink / raw)
  To: Masami Hiramatsu, Andrew Morton, Nathan Chancellor, paulmck,
	Nicolas Schier, Nick Desaulniers, Bill Wendling, Justin Stitt,
	Jonathan Corbet, Shuah Khan
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, linux-kernel, linux-trace-kernel, linux-kbuild,
	bpf, llvm, linux-doc, Breno Leitao, kernel-team
In-Reply-To: <20260626-bootconfig_using_tools-v7-0-24ab72139c29@debian.org>

xbc_node_for_each_key_value() walks to the first leaf under @root, and
when @root is itself a leaf it yields @root. That happens not only for
an empty "kernel {}" subtree, but also when @root carries both a value
and subkeys, e.g.

	kernel = x
	kernel.foo = bar

Here @root ("kernel") is a leaf because its first child is the value
node "x", so the iterator returns @root first. Feeding @root back into
xbc_node_compose_key_after(root, root) returns -EINVAL, which the only
in-kernel caller papers over with a "len <= 0" check -- but the
follow-up tools/bootconfig -C user propagates the error and turns such
a bootconfig into a build failure. Worse, short-circuiting the whole
call on a leaf @root would silently drop the valid "kernel.foo = bar"
descendant that this patch should render.

Skip @root inside the loop instead of bailing out: the value-only entry
is dropped (it is rendered through the "kernel" cmdline path, not here),
while real descendant keys are still emitted. An entirely empty subtree
now renders nothing and returns 0 rather than -EINVAL, matching the
"nothing to render is not an error" semantics expected by the new
build-time caller.
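
The skip-instead-of-bail behavior can be illustrated with a toy model.
This is a sketch under stated assumptions, not the kernel's data
structures: struct toy_node, toy_compose_key_after() and
toy_count_renderable() are hypothetical names, the iterator is replaced
by a plain array of yielded nodes, and -1 stands in for -EINVAL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Toy key node: just a name and a parent link (NULL above the root). */
struct toy_node {
	const char *name;
	const struct toy_node *parent;
};

/*
 * Compose the dotted key of @node relative to @root into @buf. Asking for
 * the key of @root itself is an error, as in the case described above.
 */
int toy_compose_key_after(const struct toy_node *root,
			  const struct toy_node *node, char *buf, size_t size)
{
	char tmp[128];
	int ret;

	assert(root && buf);
	if (!node || node == root)
		return -1;	/* stands in for -EINVAL */
	if (node->parent == root)
		return snprintf(buf, size, "%s", node->name);

	/* Build the parent's key first, then append this node's name. */
	ret = toy_compose_key_after(root, node->parent, tmp, sizeof(tmp));
	if (ret < 0 || (size_t)ret >= sizeof(tmp))
		return -1;
	return snprintf(buf, size, "%s.%s", tmp, node->name);
}

/*
 * Count how many of the @n nodes yielded by a (hypothetical) iterator can
 * be rendered, skipping @root when the iterator yields it.
 */
int toy_count_renderable(const struct toy_node *root,
			 const struct toy_node *const yielded[], size_t n)
{
	char key[128];
	int count = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (yielded[i] == root)
			continue;	/* skip rather than fail the whole call */
		if (toy_compose_key_after(root, yielded[i], key, sizeof(key)) < 0)
			return -1;
		count++;
	}
	return count;
}
```

With only the root yielded, the count is 0 rather than an error, which
mirrors the "nothing to render is not an error" outcome.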

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 lib/bootconfig.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/lib/bootconfig.c b/lib/bootconfig.c
index 2ed9ee3dc81c7..926094d97397e 100644
--- a/lib/bootconfig.c
+++ b/lib/bootconfig.c
@@ -440,6 +440,17 @@ int __init xbc_snprint_cmdline(char *buf, size_t size, struct xbc_node *root)
 	 * itself is well defined and returns the would-be length.
 	 */
 	xbc_node_for_each_key_value(root, knode, val) {
+		/*
+		 * An empty or value-only @root (e.g. "kernel {}" or
+		 * "kernel = x", possibly alongside "kernel.foo = bar")
+		 * yields @root itself here. Skip it: composing a key for it
+		 * would fail with -EINVAL, yet any real descendant keys must
+		 * still be rendered. An entirely empty subtree then renders
+		 * nothing and returns 0 rather than an error.
+		 */
+		if (knode == root)
+			continue;
+
 		ret = xbc_node_compose_key_after(root, knode,
 					xbc_namebuf, XBC_KEYLEN_MAX);
 		if (ret < 0)

-- 
2.53.0-Meta



* [PATCH v7 0/9] bootconfig: embed kernel.* cmdline at build time
From: Breno Leitao @ 2026-06-26 12:50 UTC (permalink / raw)
  To: Masami Hiramatsu, Andrew Morton, Nathan Chancellor, paulmck,
	Nicolas Schier, Nick Desaulniers, Bill Wendling, Justin Stitt,
	Jonathan Corbet, Shuah Khan
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, linux-kernel, linux-trace-kernel, linux-kbuild,
	bpf, llvm, linux-doc, Breno Leitao, kernel-team, Nicolas Schier

The userspace pieces (xbc_snprint_cmdline() in lib/, tools/bootconfig -C)
already landed; this series wires the rendered cmdline into the kernel.

Motivation: today the embedded bootconfig is parsed at runtime, after
parse_early_param() has already run, so early_param() handlers can't
see embedded values. Folding the kernel.* subtree into the cmdline at
build time gives a CONFIG_CMDLINE-equivalent for embedded-bootconfig
users without forcing them to maintain two cmdline sources.

Behaviorally, the "kernel" subtree is rendered to a flat string at
build time and stashed in .init.rodata. setup_arch() prepends it to
boot_command_line before parse_early_param() runs. Overflow is a soft
error: the helper logs and leaves boot_command_line untouched rather
than panicking, so an oversized embedded bconf cannot brick a boot.
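
As an illustration of the prepend-with-soft-overflow behavior described
above, here is a minimal userspace sketch. prepend_cmdline() is a
hypothetical stand-in, not the helper from this series; it assumes the
rendered string already ends with a space separator and that the
destination holds a NUL-terminated string.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for the prepend helper: put @src in front of the
 * string already stored in @dst, whose total capacity is @size bytes.
 * Returns false and leaves @dst untouched when nothing was prepended.
 */
bool prepend_cmdline(char *dst, size_t size, const char *src)
{
	size_t src_len, dst_len;
	const char *nul;

	assert(dst && src);
	src_len = strlen(src);
	if (!size || !src_len)
		return false;

	/* @dst must already contain a NUL byte within its capacity. */
	nul = memchr(dst, '\0', size);
	if (!nul)
		return false;
	dst_len = (size_t)(nul - dst);

	/* Need room for both strings plus the terminating NUL. */
	if (src_len + dst_len + 1 > size)
		return false;

	/* Shift the existing text (and its NUL) right, then copy @src in. */
	memmove(dst + src_len, dst, dst_len + 1);
	memcpy(dst, src, src_len);
	return true;
}
```

A caller would log the failure and carry on with the unmodified buffer,
which is the "soft error" choice: the existing command line still boots.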

Signed-off-by: Breno Leitao <leitao@debian.org>
---
Changes in v7:
- The runtime opt-in now shares one helper instead of open-coding its
  own. (Masami)
- bootconfig_cmdline_requested() moved into generic lib code (Masami)
- Link to v6: https://lore.kernel.org/r/20260623-bootconfig_using_tools-v6-0-640c2f587a3c@debian.org

Changes in v6:
- renamed CONFIG_BOOT_CONFIG_EMBED_CMDLINE to
  CONFIG_CMDLINE_FROM_BOOTCONFIG
- prepend embedded bootconfig cmdline before parse_early_param
- Link to v5: https://lore.kernel.org/r/20260617-bootconfig_using_tools-v5-0-fd589a9cc5e3@debian.org

Changes in v5:
- Patch 3 (Kconfig): drop the redundant "depends on BOOT_CONFIG_EMBED"
  from CMDLINE_FROM_BOOTCONFIG. (Julian Braha)
- Patch 6 (Documentation): spell out how the embedded cmdline interacts
  with the bootloader cmdline, an initrd bootconfig, and the embedded
  bootconfig
- Link to v4: https://lore.kernel.org/r/20260609-bootconfig_using_tools-v4-0-73c463f03a97@debian.org

Changes in v4:
- Patch 3 (build pipeline): clear CROSS_COMPILE= in the kernel-side
  tools/bootconfig sub-make. Without it, an LLVM=1 cross build
  inherits CROSS_COMPILE and tools/scripts/Makefile.include injects
  --target=/--sysroot= into the host clang, producing a target
  binary that fails to exec.
- Patch 3 (build pipeline): place embedded-cmdline.S in its own
  .init.rodata.embed_cmdline subsection ("a") so ld.lld does not
  see a section-type mismatch against lib/bootconfig-data.S's
  writable .init.rodata ("aw"). The linker's *(.init.rodata
  .init.rodata.*) glob still folds it into the init image.
- Patch 6 (x86/setup): also accept the bootconfig=<anything> form
  via cmdline_find_option(), matching the runtime parse_args() loop.
  Without it, bootconfig=0/=off would skip the early prepend but
  still trigger the late runtime apply -- a split-brain state.
- New patch 7: document CONFIG_CMDLINE_FROM_BOOTCONFIG in
  Documentation/admin-guide/bootconfig.rst (semantics, opt-in,
  precedence, overflow behavior, example).
- Link to v3: https://lore.kernel.org/r/20260608-bootconfig_using_tools-v3-0-4ddd079a0696@debian.org

Changes in v3:
- Patch 3: Move HOSTCC override to the kernel-side rule; tool keeps
  $(CC) for standalone/cross builds.
- Patch 6: Drop the false fail-safe wording; document the
  BOOT_CONFIG_FORCE=y default interaction.
- Link to v2: https://lore.kernel.org/r/20260605-bootconfig_using_tools-v2-0-d309f544b5f7@debian.org

Changes in v2 (addressing review of v1):
- Split out a standalone fix for the NULL-pointer arithmetic in
  xbc_snprint_cmdline() so the build-time render cannot trip host
  UBSan/FORTIFY_SOURCE.
- Rework the leaf-root handling: instead of returning early, skip @root
  inside the loop so a root carrying both a value and subkeys
  (kernel = x together with kernel.foo = bar) still renders its
  descendant keys.
- Build tools/bootconfig with $(HOSTCC) so cross-compiled (ARCH=...)
  builds render the cmdline on the build host instead of failing with
  "Exec format error".
- Mark the embedded cmdline section read-only (drop the "w" flag from
  .init.rodata).
- Add a make-clean hook so tools/bootconfig artifacts are removed by
  make clean.
- Gate the x86 prepend on "bootconfig" being present on the command
  line (or CONFIG_BOOT_CONFIG_FORCE), matching the init.* opt-in
  semantics documented in bootconfig.rst and preserving fail-safe
  recovery: dropping "bootconfig" from the bootloader cmdline now also
  disables the embedded kernel.* keys.
- Link to v1: https://patch.msgid.link/20260527-bootconfig_using_tools-v1-0-b6906a86e7d5@debian.org

---
Breno Leitao (9):
      bootconfig: fix NULL-pointer arithmetic in xbc_snprint_cmdline()
      bootconfig: render descendant keys when xbc_snprint_cmdline() root has a value
      bootconfig: render embedded bootconfig as a kernel cmdline at build time
      bootconfig: clean build-time tools/bootconfig from make clean
      bootconfig: add xbc_prepend_embedded_cmdline() helper
      Documentation: bootconfig: document build-time cmdline rendering
      x86/setup: prepend embedded bootconfig cmdline before parse_early_param
      bootconfig: skip runtime kernel.* render once prepended early
      init/main.c: use bootconfig_cmdline_requested() for the runtime opt-in

 Documentation/admin-guide/bootconfig.rst |  81 ++++++++++++++++
 MAINTAINERS                              |   1 +
 Makefile                                 |  27 +++++-
 arch/x86/Kconfig                         |   1 +
 arch/x86/kernel/setup.c                  |  14 ++-
 include/linux/bootconfig.h               |  14 +++
 init/Kconfig                             |  36 +++++++
 init/main.c                              |  52 +++++-----
 lib/Makefile                             |  16 +++
 lib/bootconfig.c                         | 162 +++++++++++++++++++++++++++++--
 lib/embedded-cmdline.S                   |  16 +++
 tools/bootconfig/Makefile                |   4 +-
 12 files changed, 388 insertions(+), 36 deletions(-)
---
base-commit: a87737435cfa134f9cdcc696ba3080759d04cf72
change-id: 20260508-bootconfig_using_tools-cfa7aa9d6a5a

Best regards,
-- 
Breno Leitao <leitao@debian.org>



* [PATCH v7 1/9] bootconfig: fix NULL-pointer arithmetic in xbc_snprint_cmdline()
From: Breno Leitao @ 2026-06-26 12:50 UTC (permalink / raw)
  To: Masami Hiramatsu, Andrew Morton, Nathan Chancellor, paulmck,
	Nicolas Schier, Nick Desaulniers, Bill Wendling, Justin Stitt,
	Jonathan Corbet, Shuah Khan
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, linux-kernel, linux-trace-kernel, linux-kbuild,
	bpf, llvm, linux-doc, Breno Leitao, kernel-team
In-Reply-To: <20260626-bootconfig_using_tools-v7-0-24ab72139c29@debian.org>

xbc_snprint_cmdline() is meant to be called twice: first with
buf=NULL, size=0 to probe the rendered length, then with a real
buffer to fill it (the standard snprintf() two-pass pattern). The
probe call makes the function compute "buf + size" (NULL + 0) and,
on every iteration, advance "buf += ret" from that NULL base and
pass the result back into snprintf().

Pointer arithmetic on a NULL pointer is undefined behavior. It is
harmless in the in-kernel callers today, but the follow-up patches
run this same code in the userspace tools/bootconfig parser at kernel
build time, where host UBSan / FORTIFY_SOURCE abort the build.

Track a running written length (size_t) instead of mutating @buf, and
only form "buf + len" when @buf is non-NULL. snprintf(NULL, 0, ...)
is itself well defined and returns the would-be length, so the
two-pass "probe then fill" usage returns identical byte counts.
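
The running-length approach can be sketched in plain userspace C. This
is an illustration under assumptions, not the patched function:
render_pairs() and its key/value arrays are hypothetical, and the
sketch additionally passes NULL once the buffer is full so that it
never forms a pointer past the end of the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Bytes still free in a buffer of @size once @len bytes are accounted for. */
static size_t rest(size_t len, size_t size)
{
	return len < size ? size - len : 0;
}

/*
 * Hypothetical renderer: emit "key=value " for each pair and return the
 * would-be length, like snprintf(). @buf may be NULL only when @size is 0.
 */
int render_pairs(char *buf, size_t size, const char *const keys[],
		 const char *const vals[], size_t n)
{
	size_t len = 0;
	size_t i;
	int ret;

	assert(buf || size == 0);

	for (i = 0; i < n; i++) {
		/* Form "buf + len" only while there is room left in @buf. */
		char *dst = rest(len, size) ? buf + len : NULL;

		ret = snprintf(dst, rest(len, size), "%s=%s ",
			       keys[i], vals[i]);
		if (ret < 0)
			return ret;
		len += (size_t)ret;	/* track length, never mutate @buf */
	}
	return (int)len;
}
```

Calling render_pairs(NULL, 0, ...) first gives the required length; a
second call with a buffer of that length plus one byte for the NUL
fills it and returns the same count.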

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 lib/bootconfig.c | 23 ++++++++++++++++-------
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/lib/bootconfig.c b/lib/bootconfig.c
index f445b7703fdd9..2ed9ee3dc81c7 100644
--- a/lib/bootconfig.c
+++ b/lib/bootconfig.c
@@ -427,10 +427,18 @@ static char xbc_namebuf[XBC_KEYLEN_MAX] __initdata;
 int __init xbc_snprint_cmdline(char *buf, size_t size, struct xbc_node *root)
 {
 	struct xbc_node *knode, *vnode;
-	char *end = buf + size;
 	const char *val, *q;
+	size_t len = 0;
 	int ret;
 
+	/*
+	 * Track the running written length rather than advancing @buf, so we
+	 * never form "buf + size" or "buf += ret" while @buf is NULL (the
+	 * size-probe call passes buf=NULL, size=0). NULL pointer arithmetic
+	 * is undefined behavior and trips host UBSan / FORTIFY_SOURCE when
+	 * this renderer runs at kernel build time. snprintf(NULL, 0, ...)
+	 * itself is well defined and returns the would-be length.
+	 */
 	xbc_node_for_each_key_value(root, knode, val) {
 		ret = xbc_node_compose_key_after(root, knode,
 					xbc_namebuf, XBC_KEYLEN_MAX);
@@ -439,10 +447,11 @@ int __init xbc_snprint_cmdline(char *buf, size_t size, struct xbc_node *root)
 
 		vnode = xbc_node_get_child(knode);
 		if (!vnode) {
-			ret = snprintf(buf, rest(buf, end), "%s ", xbc_namebuf);
+			ret = snprintf(buf ? buf + len : NULL, rest(len, size),
+				       "%s ", xbc_namebuf);
 			if (ret < 0)
 				return ret;
-			buf += ret;
+			len += ret;
 			continue;
 		}
 		xbc_array_for_each_value(vnode, val) {
@@ -452,15 +461,15 @@ int __init xbc_snprint_cmdline(char *buf, size_t size, struct xbc_node *root)
 			 * whitespace.
 			 */
 			q = strpbrk(val, " \t\r\n") ? "\"" : "";
-			ret = snprintf(buf, rest(buf, end), "%s=%s%s%s ",
-				       xbc_namebuf, q, val, q);
+			ret = snprintf(buf ? buf + len : NULL, rest(len, size),
+				       "%s=%s%s%s ", xbc_namebuf, q, val, q);
 			if (ret < 0)
 				return ret;
-			buf += ret;
+			len += ret;
 		}
 	}
 
-	return buf - (end - size);
+	return len;
 }
 #undef rest
 

-- 
2.53.0-Meta



* Re: [RFC PATCH v2 0/4] tracing/osnoise: Track IPIs
From: Valentin Schneider @ 2026-06-26 12:25 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, linux-trace-kernel, Masami Hiramatsu,
	Mathieu Desnoyers, Tomas Glozar, Costa Shulyupin, Crystal Wood,
	John Kacur, Ivan Pravdin, Jonathan Corbet
In-Reply-To: <20260626062658.7f95bcad@fedora>

On 26/06/26 06:26, Steven Rostedt wrote:
> On Wed, 17 Jun 2026 15:17:55 +0200
> Valentin Schneider <vschneid@redhat.com> wrote:
>
>> Hi folks,
>>
>> So I've seen a few times now reports of latency spikes caused by IPIs, usually
>> because of isolation misconfiguration, but only detected at the tail end of
>> e.g. a 24h timerlat run.
>>
>> It's not because those IPIs are rare, but rather that they don't by themselves
>> cause a monitored CPU to reach the latency threshold, it's usually a combined
>> interference that gets us there.
>>
>> I'd like to make it easier to detect such misconfigurations and thus IPIs
>> hitting supposedly-isolated CPUs. I initially kludged a timerlat option to stop
>> tracing as soon as an IPI was sent to a monitored CPU, regardless of the latency
>> threshold. It sort of did the trick, but Tomáš convinced me timerlat wasn't
>> really the place for that.
>>
>> So here's IPI tracking added to osnoise. This time around fully in userspace, as
>> Tomáš pointed out to me that this will make it a lot easier to deploy to older
>> kernels.
>>
>> Based on top of linux/next at 'next-20260616' to have the latest libsubcmd
>> changes.
>>
>
> Hi Valentin,
>
> My new job actually makes me very interested in IPI interference, and
> this patch set looks *very* interesting. I'm currently finishing up my
> orientation and hopefully next week I can start catching up on all my
> email.
>

Welcome back :-) If IPIs are your thing, you may also have a look at
[1]. I'm working on a v10 following some (surprisingly) useful feedback
from Sashiko.

[1]: https://lore.kernel.org/lkml/20260505082355.1982003-1-vschneid@redhat.com/

> I'll try to take a deeper look at this in the coming weeks.
>

Thanks!

> -- Steve



* Re: [PATCH 0/2] rtla: Add tests for option parsing with attached arguments
From: Tomas Glozar @ 2026-06-26 12:22 UTC (permalink / raw)
  To: John Kacur; +Cc: linux-trace-kernel, Steven Rostedt, linux-kernel
In-Reply-To: <20260602155210.60439-1-jkacur@redhat.com>

út 2. 6. 2026 v 17:52 odesílatel John Kacur <jkacur@redhat.com> napsal:
>
> Note: Patch 1/2 is a resend of the timerlat hist tests sent previously.
> Patch 2/2 adds tests for the remaining rtla commands.
>

Ah, this confused me. I saw cover letters, 1/2, and 2/2 with almost
the same title and content, and wondered what was going on.

> Signed-off-by: John Kacur <jkacur@redhat.com>
>
> John Kacur (2):
>   rtla/timerlat: Add tests for option parsing with attached arguments
>   rtla: Add tests for option parsing with attached arguments
>
>  tools/tracing/rtla/tests/hwnoise.t  | 10 ++++++++++
>  tools/tracing/rtla/tests/osnoise.t  | 18 ++++++++++++++++++
>  tools/tracing/rtla/tests/timerlat.t | 18 ++++++++++++++++++

This should be either one patch, or three patches (hwnoise, osnoise,
timerlat); especially timerlat top and timerlat hist should not be
split between two commits, as the tests are in the same file. Also,
identical tests for both top and hist should use check_top_hist or
check_top_q_hist, see commit c15c55c01e48 ("rtla/tests: Cover both top
and hist tools where possible").

Anyway, I don't think this needs runtime tests. CLI unit tests (well,
actually more like integration tests, as RTLA design doesn't have
proper isolated unit tests) in tools/tracing/rtla/tests/unit/*_cli.c
should be able to fully cover this, with the benefit of being much
faster and not requiring root or any kernel features. Do you have any
concerns that cannot be covered by unit testing?

>  3 files changed, 46 insertions(+)
>
> --
> 2.54.0
>

Tomas



* Re: [PATCH] tracing: eprobe: read the complete FILTER_PTR_STRING pointer
From: Steven Rostedt @ 2026-06-26 10:42 UTC (permalink / raw)
  To: Martin Kaiser; +Cc: Masami Hiramatsu (Google), linux-trace-kernel, linux-kernel
In-Reply-To: <aj5SdK9gUIVoPmmE@akranes.kaiser.cx>

On Fri, 26 Jun 2026 12:20:36 +0200
Martin Kaiser <martin@kaiser.cx> wrote:

> > That is, to have +u0() say "this is going to be dereferencing user space".  
> 
> > I'll add Martin's patch and see if it makes the above work.  
> 
> I've just tried your command with my patch. It works for me, filenames are
> logged correctly.

Yep, this definitely looks like a fix. We have:

	addr = rec + field->offset;

Where addr points to the location of the field on the ring buffer, thus
your change to make it:

	val = *(unsigned long *)addr;

Reads the full "long size" of the event on the ring buffer, instead of
reading just one byte. It is "val" that gets dereferenced later by the
probe logic (the "+u0()"), which has all the protections we need.
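
The difference between the two reads can be sketched in plain userspace C. This is only an illustration: `rec` stands in for the event record on the ring buffer and `offset` for `field->offset`, the helper names are hypothetical, and `memcpy()` is used instead of the direct cast only to sidestep alignment concerns in this sketch.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: reads only the first byte stored at the field,
 * so a pointer-sized value comes back truncated. */
static unsigned long read_field_one_byte(const unsigned char *rec, size_t offset)
{
	const unsigned char *addr = rec + offset;

	return (unsigned long)*addr;
}

/* Hypothetical helper: reads the full long-sized value stored at the field. */
static unsigned long read_field_full(const unsigned char *rec, size_t offset)
{
	const unsigned char *addr = rec + offset;
	unsigned long val;

	memcpy(&val, addr, sizeof(val));
	return val;
}

/* Stores a pointer-like value in a fake record and compares both reads. */
static void check_reads(void)
{
	unsigned char rec[32] = { 0 };
	unsigned long ptr = 0x12345678UL;

	memcpy(rec + 8, &ptr, sizeof(ptr));
	assert(read_field_full(rec, 8) == ptr);
	assert(read_field_one_byte(rec, 8) != ptr);
}
```

Only the full-width read returns a value that can meaningfully be dereferenced afterwards.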

I'll queue this up.

Thanks!

-- Steve

^ permalink raw reply

* Re: [PATCH v3] mm/lruvec: trace LRU add drains and drain-all requests
From: Steven Rostedt @ 2026-06-26 10:23 UTC (permalink / raw)
  To: Vlastimil Babka (SUSE)
  Cc: Shakeel Butt, David Hildenbrand (Arm), JP Kobryn, linux-mm, willy,
	usama.arif, akpm, mhocko, mhiramat, mathieu.desnoyers, kasong,
	qi.zheng, baohua, axelrasmussen, yuanchu, weixugc, chrisl,
	shikemeng, nphamcs, baoquan.he, youngjun.park, linux-kernel,
	linux-trace-kernel
In-Reply-To: <1136baf3-3967-4202-9eaa-5fd667c235cf@kernel.org>

On Wed, 17 Jun 2026 20:18:57 +0200
"Vlastimil Babka (SUSE)" <vbabka@kernel.org> wrote:

> Yeah and I don't recall ever that a change to a mm tracepoint would ever
> break someone who'd complain and we'd have to revert it. These are niche
> enough. So I think the risk is low.

Note, we have literally thousands of trace events already, so the
chances of one being required by an application are rather low.
Especially since access still requires root access, which limits it to
administration tooling.

That said, if you know of a tool that uses trace events, then those
that it is likely to use can become an ABI. For mm trace events,
rasdaemon is the tool to worry about.

-- Steve

^ permalink raw reply

* Re: [RFC PATCH v2 0/4] tracing/osnoise: Track IPIs
From: Steven Rostedt @ 2026-06-26 10:26 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: linux-kernel, linux-trace-kernel, Masami Hiramatsu,
	Mathieu Desnoyers, Tomas Glozar, Costa Shulyupin, Crystal Wood,
	John Kacur, Ivan Pravdin, Jonathan Corbet
In-Reply-To: <20260617131803.2988989-1-vschneid@redhat.com>

On Wed, 17 Jun 2026 15:17:55 +0200
Valentin Schneider <vschneid@redhat.com> wrote:

> Hi folks,
> 
> So I've seen a few times now reports of latency spikes caused by IPIs, usually
> because of isolation misconfiguration, but only detected at the tail end of
> e.g. a 24h timerlat run.
> 
> It's not because those IPIs are rare, but rather that they don't by themselves
> cause a monitored CPU to reach the latency threshold, it's usually a combined
> interference that gets us there.
> 
> I'd like to make it easier to detect such misconfigurations and thus IPIs
> hitting supposedly-isolated CPUs. I initially kludged a timerlat option to stop
> tracing as soon as an IPI was sent to a monitored CPU, regardless of the latency
> threshold. It sort of did the trick, but Tomáš convinced me timerlat wasn't
> really the place for that.
> 
> So here's IPI tracking added to osnoise. This time around fully in userspace, as
> Tomáš pointed out to me that this will make it a lot easier to deploy to older
> kernels.
> 
> Based on top of linux/next at 'next-20260616' to have the latest libsubcmd
> changes.
>   

Hi Valentin,

My new job actually makes me very interested in IPI interference, and
this patch set looks *very* interesting. I'm currently finishing up my
orientation and hopefully next week I can start catching up on all my
email.

I'll try to take a deeper look at this in the coming weeks.

-- Steve

^ permalink raw reply

* Re: [PATCH] tracing: eprobe: read the complete FILTER_PTR_STRING pointer
From: Martin Kaiser @ 2026-06-26 10:20 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Masami Hiramatsu (Google), linux-trace-kernel, linux-kernel
In-Reply-To: <20260626055440.76c28d25@fedora>

Thus wrote Steven Rostedt (rostedt@goodmis.org):

> On Mon, 22 Jun 2026 12:58:15 +0900
> Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:

> > The problem is that the event does not provide the information that
> > the string is in user space or not. But actually, for syscall events
> > all data pointed by syscall parameter should be in the user space.

> I think we should make this work then:

>   echo 'e:open syscalls.sys_enter_openat file=+u0($filename):ustring' > dynamic_events

> That is, to have +u0() say "this is going to be dereferencing user space".

> I'll add Martin's patch and see if it makes the above work.

I've just tried your command with my patch. It works for me, filenames are
logged correctly.

Martin

> -- Steve

^ permalink raw reply

* Re: [PATCH] tracing: eprobe: read the complete FILTER_PTR_STRING pointer
From: Steven Rostedt @ 2026-06-26  9:54 UTC (permalink / raw)
  To: Masami Hiramatsu (Google); +Cc: Martin Kaiser, linux-trace-kernel, linux-kernel
In-Reply-To: <20260622125815.7416792c020bd3d81c01e51b@kernel.org>

On Mon, 22 Jun 2026 12:58:15 +0900
Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:

> The problem is that the event does not provide the information that
> the string is in user space or not. But actually, for syscall events
> all data pointed by syscall parameter should be in the user space.

I think we should make this work then:

  echo 'e:open syscalls.sys_enter_openat file=+u0($filename):ustring' > dynamic_events

That is, to have +u0() say "this is going to be dereferencing user space".

I'll add Martin's patch and see if it makes the above work.

-- Steve

^ permalink raw reply

* Re: [PATCHv4 05/13] uprobes/x86: Move optimized uprobe from nop5 to nop10
From: Oleg Nesterov @ 2026-06-26  9:43 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Peter Zijlstra, Ingo Molnar, Masami Hiramatsu, Andrii Nakryiko,
	bpf, linux-trace-kernel
In-Reply-To: <20260526205840.173790-6-jolsa@kernel.org>

On 05/26, Jiri Olsa wrote:
>
> which means we need to allow 0x2e prefix which maps to INAT_PFX_CS
> attribute in is_prefix_bad function.

...

> --- a/arch/x86/kernel/uprobes.c
> +++ b/arch/x86/kernel/uprobes.c
> @@ -266,7 +266,6 @@ static bool is_prefix_bad(struct insn *insn)
>  		attr = inat_get_opcode_attribute(p);
>  		switch (attr) {
>  		case INAT_MAKE_PREFIX(INAT_PFX_ES):
> -		case INAT_MAKE_PREFIX(INAT_PFX_CS):

I know nothing about how x86 CPU works, so let me ask...

What if insn->x86_64 is false? Is it safe to allow the CS prefix in
this case?

Oleg.


^ permalink raw reply

* Re: [RFC PATCH 1/3] mm/compaction: skip isolate mlocked folios when compact_unevictable_allowed=0
From: Wandun @ 2026-06-26  9:39 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-mm, linux-kernel, linux-trace-kernel, linux-rt-devel, akpm,
	vbabka, surenb, mhocko, jackmanb, hannes, ziy, rostedt, mhiramat,
	mathieu.desnoyers, david, ljs, liam, rppt, clrkwllms,
	Alexander.Krabler
In-Reply-To: <20260626092606.7BgipTin@linutronix.de>



On 6/26/26 17:26, Sebastian Andrzej Siewior wrote:
> On 2026-06-04 10:38:10 [+0800], Wandun Chen wrote:
> …
>> Reported-by: Alexander Krabler <Alexander.Krabler@kuka.com>
>> Closes: https://lore.kernel.org/all/DU0PR01MB10385345F7153F334100981888259A@DU0PR01MB10385.eurprd01.prod.exchangelabs.com/
>> Suggested-by: Vlastimil Babka <vbabka@suse.cz>
>> Signed-off-by: Wandun Chen <chenwandun@lixiang.com>
>> Link: https://lore.kernel.org/all/33275585-f2db-4779-89f0-3ae24b455a67@suse.cz/ [1]
> 
> Is it possible to get a Fixes tag on the final fix so that it can be
> backported to stable?

Got it.

Best regards,
Wandun

> 
> Sebastian


^ permalink raw reply

* Re: [RFC PATCH 1/3] mm/compaction: skip isolate mlocked folios when compact_unevictable_allowed=0
From: Wandun @ 2026-06-26  9:38 UTC (permalink / raw)
  To: Alexander Krabler, Vlastimil Babka (SUSE), linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
	linux-rt-devel@lists.linux.dev
  Cc: akpm@linux-foundation.org, surenb@google.com, mhocko@suse.com,
	jackmanb@google.com, hannes@cmpxchg.org, ziy@nvidia.com,
	rostedt@goodmis.org, mhiramat@kernel.org,
	mathieu.desnoyers@efficios.com, david@kernel.org, ljs@kernel.org,
	liam@infradead.org, rppt@kernel.org, bigeasy@linutronix.de,
	clrkwllms@kernel.org, Hugh Dickins
In-Reply-To: <PR3PR01MB6666C11E08516555C153D4FD82EB2@PR3PR01MB6666.eurprd01.prod.exchangelabs.com>



On 6/26/26 16:45, Alexander Krabler wrote:
> On 6/24/26 13:08, Wandun wrote:
>> On 6/22/26 17:55, Vlastimil Babka (SUSE) wrote:
>>> On 6/18/26 13:43, Wandun wrote:
>>>> Yes, I wrote a test case that can reproduce it in a few seconds.
>>>>
>>>> The test case contains 3 steps:
>>>> 1. mlockall
>>>> 2. mmap file(2GB) + trigger file write page fault;
>>>> 3. during step 1, trigger compact via /proc/sys/vm/compact_memory
>>>>
>>>>
>>>> My reproduction environment is qemu with 4GB ram, 8 core, aarch64,
>>>> preempt_rt and includes the tracepoint in patch 02.
>>>> After running the reproduction program for a few seconds, the
>>>> following output appears.
>>>>
>>>> repro-403     [004] ....1   101.270505: mm_compaction_isolate_folio: pfn=0x71e3a mode=0x0 flags=referenced|uptodate|mlocked
>>>> repro-403     [004] ....1   101.270507: mm_compaction_isolate_folio: pfn=0x71e3b mode=0x0 flags=referenced|uptodate|mlocked
>>>> repro-403     [004] ....1   101.270513: mm_compaction_isolate_folio: pfn=0x71e3c mode=0x0 flags=referenced|uptodate|mlocked
>>>> repro-403     [004] ....1   101.270515: mm_compaction_isolate_folio: pfn=0x71e3d mode=0x0 flags=uptodate|mlocked
>>>> repro-403     [004] ....1   101.270517: mm_compaction_isolate_folio: pfn=0x71e3e mode=0x0 flags=uptodate|mlocked
>>>> repro-403     [004] ....1   101.270520: mm_compaction_isolate_folio: pfn=0x71e3f mode=0x0 flags=uptodate|mlocked
> 
> I applied your PATCH 2/3 to our kernel and checked with your reproducer,
> I get similar output, e.g.
> t_compact-2148    [005] ....1   515.320221: mm_compaction_isolate_folio: pfn=0xe66c2 mode=0x0
>                                             flags=referenced|uptodate|active|swapbacked|mlocked
> 
> With your first patch applied, the number of these messages decreases.

Some of the mlocked but not unevictable pages have been filtered out, so
the messages decrease, but the race is still there.

> I was not able to apply your third patch to our (older) kernel.

Patch 3 is not relevant to your case. The problem in your report is caused by
kcompactd, not by CMA allocation, so it is of no use to you.

> 
> However, we were not able to reproduce the actual race
> (mlockall() process waiting on a migration PTE),
> not in the past, not now. Might be hard to trigger that race.

It is not hard to trigger that case. I added a debug message, shown below,
and lots of messages appear within a few seconds.

diff --cc mm/memory.c
index ff338c2abe92,ff338c2abe92..6552b3b14f78
--- a/mm/memory.c
+++ b/mm/memory.c
@@@ -4768,6 -4768,6 +4768,8 @@@ vm_fault_t do_swap_page(struct vm_faul
                if (softleaf_is_migration(entry)) {
                        migration_entry_wait(vma->vm_mm, vmf->pmd,
                                             vmf->address);
+                       if (!strcmp(current->comm, "repro"))
+                               pr_err("============== hit ================\n");
                } else if (softleaf_is_device_exclusive(entry)) {
                        vmf->page = softleaf_to_page(entry);
                        ret = remove_device_exclusive_entry(vmf);

Best regards,
Wandun

> 
>> IIUC, more accurately, it is the migration entry in the page table that is really bad
>> for the RT process, because isolating a page doesn't modify the page table, so memory
>> access continues as usual. Therefore a new idea occurred to me.
>>
>> S1. In the mlock[all] syscall, if mlock_vma_pages_range hits a migration entry,
>>     then it should wait for the migration to complete.
>>
>> S2. During the unmap phase of memory migration, prevent a page from being unmapped
>>     if the page's associated vma is marked with VM_LOCKED, similar to how reclaim is
>>     disabled for pages in a VM_LOCKED vma (try_to_unmap_one).
>>
>>
>> For a page handled during the mlock[all] syscall:
>>   - if migration has already finished, there is nothing to do;
>>   - if migration is in progress and the migration entry is already filled, we
>>     wait (S1);
>>   - if the page is in flight, going to be isolated/migrated, S2 prevents the unmap.
>>
>> For a page handled during a page fault: VM_LOCKED is already set on the vma,
>> so S2 guarantees it will not be unmapped, hence no migration entry.
> 
> I do not understand all details of this, but it looks good,
> especially the S1 case makes a lot of sense for me.
> 
> Nitpick: I suggest to switch order of PATCH 1 and 2 for the next iteration,
> introducing the tracepoint first and then improve the situation.
> 
> Thanks a lot for looking into this issue!
> 
> Best regards,
> Alexander
> 


^ permalink raw reply

* Re: [RFC PATCH 1/3] mm/compaction: skip isolate mlocked folios when compact_unevictable_allowed=0
From: Sebastian Andrzej Siewior @ 2026-06-26  9:26 UTC (permalink / raw)
  To: Wandun Chen
  Cc: linux-mm, linux-kernel, linux-trace-kernel, linux-rt-devel, akpm,
	vbabka, surenb, mhocko, jackmanb, hannes, ziy, rostedt, mhiramat,
	mathieu.desnoyers, david, ljs, liam, rppt, clrkwllms,
	Alexander.Krabler
In-Reply-To: <20260604023812.3700316-2-chenwandun1@gmail.com>

On 2026-06-04 10:38:10 [+0800], Wandun Chen wrote:
…
> Reported-by: Alexander Krabler <Alexander.Krabler@kuka.com>
> Closes: https://lore.kernel.org/all/DU0PR01MB10385345F7153F334100981888259A@DU0PR01MB10385.eurprd01.prod.exchangelabs.com/
> Suggested-by: Vlastimil Babka <vbabka@suse.cz>
> Signed-off-by: Wandun Chen <chenwandun@lixiang.com>
> Link: https://lore.kernel.org/all/33275585-f2db-4779-89f0-3ae24b455a67@suse.cz/ [1]

Is it possible to get a Fixes tag on the final fix so that it can be
backported to stable?

Sebastian

^ permalink raw reply

* Re: [PATCH v4 2/2] tracing: Remove trace_printk.h from kernel.h
From: Steven Rostedt @ 2026-06-26  8:51 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: linux-kernel, linux-trace-kernel, Masami Hiramatsu, Mark Rutland,
	Mathieu Desnoyers, Andrew Morton, Linus Torvalds,
	Sebastian Andrzej Siewior, John Ogness, Thomas Gleixner,
	Peter Zijlstra, Julia Lawall, Yury Norov, linux-doc, linux-kbuild,
	linuxppc-dev, dri-devel, linux-stm32, linux-arm-kernel,
	linux-rdma, linux-usb, linux-ext4, linux-nfs, kvm, intel-gfx
In-Reply-To: <20260625234158.GA261868@ax162>

On Thu, 25 Jun 2026 16:41:58 -0700
Nathan Chancellor <nathan@kernel.org> wrote:


> The following diff resolves it for me, should I send it as a separate
> patch or do you want to just fold it in with a note?
> 
> diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
> index 621566345406..2301a701ffbb 100644
> --- a/include/linux/lockdep.h
> +++ b/include/linux/lockdep.h
> @@ -10,6 +10,7 @@
>  #ifndef __LINUX_LOCKDEP_H
>  #define __LINUX_LOCKDEP_H
>  
> +#include <linux/instruction_pointer.h>

Ah, so the reason for this breakage is that lockdep was relying on
instruction_pointer.h, which just happened to be included in kernel.h
via trace_printk.h.

This is a separate issue, so it should be a separate patch. I'll add it
as patch 1 of this series.

Can you send me the config you used? This didn't trigger in my tests.

Thanks,

-- Steve



>  #include <linux/lockdep_types.h>
>  #include <linux/smp.h>
>  #include <asm/percpu.h>


^ permalink raw reply

* Re: [RFC PATCH 1/3] mm/compaction: skip isolate mlocked folios when compact_unevictable_allowed=0
From: Alexander Krabler @ 2026-06-26  8:45 UTC (permalink / raw)
  To: Wandun, Vlastimil Babka (SUSE), linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
	linux-rt-devel@lists.linux.dev
  Cc: akpm@linux-foundation.org, surenb@google.com, mhocko@suse.com,
	jackmanb@google.com, hannes@cmpxchg.org, ziy@nvidia.com,
	rostedt@goodmis.org, mhiramat@kernel.org,
	mathieu.desnoyers@efficios.com, david@kernel.org, ljs@kernel.org,
	liam@infradead.org, rppt@kernel.org, bigeasy@linutronix.de,
	clrkwllms@kernel.org, Hugh Dickins
In-Reply-To: <ca1115c0-1509-453a-8235-08e381a3da6f@gmail.com>

On 6/24/26 13:08, Wandun wrote:
> On 6/22/26 17:55, Vlastimil Babka (SUSE) wrote:
>> On 6/18/26 13:43, Wandun wrote:
>>> Yes, I wrote a test case that can reproduce it in a few seconds.
>>>
>>> The test case contains 3 steps:
>>> 1. mlockall
>>> 2. mmap file(2GB) + trigger file write page fault;
>>> 3. during step 1, trigger compact via /proc/sys/vm/compact_memory
>>>
>>>
>>> My reproduction environment is qemu with 4GB ram, 8 core, aarch64,
>>> preempt_rt and includes the tracepoint in patch 02.
>>> After running the reproduction program for a few seconds, the
>>> following output appears.
>>>
>>> repro-403     [004] ....1   101.270505: mm_compaction_isolate_folio: pfn=0x71e3a mode=0x0 flags=referenced|uptodate|mlocked
>>> repro-403     [004] ....1   101.270507: mm_compaction_isolate_folio: pfn=0x71e3b mode=0x0 flags=referenced|uptodate|mlocked
>>> repro-403     [004] ....1   101.270513: mm_compaction_isolate_folio: pfn=0x71e3c mode=0x0 flags=referenced|uptodate|mlocked
>>> repro-403     [004] ....1   101.270515: mm_compaction_isolate_folio: pfn=0x71e3d mode=0x0 flags=uptodate|mlocked
>>> repro-403     [004] ....1   101.270517: mm_compaction_isolate_folio: pfn=0x71e3e mode=0x0 flags=uptodate|mlocked
>>> repro-403     [004] ....1   101.270520: mm_compaction_isolate_folio: pfn=0x71e3f mode=0x0 flags=uptodate|mlocked

I applied your PATCH 2/3 to our kernel and checked with your reproducer,
I get similar output, e.g.
t_compact-2148    [005] ....1   515.320221: mm_compaction_isolate_folio: pfn=0xe66c2 mode=0x0
                                            flags=referenced|uptodate|active|swapbacked|mlocked

With your first patch applied, the number of these messages decreases.
I was not able to apply your third patch to our (older) kernel.

However, we were not able to reproduce the actual race
(mlockall() process waiting on a migration PTE),
not in the past, not now. Might be hard to trigger that race.

> IIUC, more accurately, it is the migration entry in the page table that is really bad
> for the RT process, because isolating a page doesn't modify the page table, so memory
> access continues as usual. Therefore a new idea occurred to me.
>
> S1. In the mlock[all] syscall, if mlock_vma_pages_range hits a migration entry,
>     then it should wait for the migration to complete.
>
> S2. During the unmap phase of memory migration, prevent a page from being unmapped
>     if the page's associated vma is marked with VM_LOCKED, similar to how reclaim is
>     disabled for pages in a VM_LOCKED vma (try_to_unmap_one).
>
>
> For a page handled during the mlock[all] syscall:
>   - if migration has already finished, there is nothing to do;
>   - if migration is in progress and the migration entry is already filled, we
>     wait (S1);
>   - if the page is in flight, going to be isolated/migrated, S2 prevents the unmap.
>
> For a page handled during a page fault: VM_LOCKED is already set on the vma,
> so S2 guarantees it will not be unmapped, hence no migration entry.
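
The case analysis quoted above can be written down as a small decision table. This is a userspace sketch only, not kernel code: every enum and function name here is hypothetical and merely labels the cases from the proposal.

```c
#include <stdbool.h>

/* Hypothetical states of a page as seen by the mlock[all] syscall. */
enum migration_state {
	MIGRATION_FINISHED,        /* migration already completed */
	MIGRATION_ENTRY_INSTALLED, /* migration entry already in the page table */
	MIGRATION_PENDING,         /* page about to be isolated/migrated */
};

/* Hypothetical outcomes, named after the S1/S2 ideas in the proposal. */
enum mlock_action {
	MLOCK_NOTHING_TO_DO,
	MLOCK_WAIT_FOR_MIGRATION,  /* S1: wait for the migration to complete */
	MLOCK_RELY_ON_UNMAP_SKIP,  /* S2: unmap is refused for VM_LOCKED vmas */
};

/* Maps each case from the proposal to the action it describes. */
static enum mlock_action mlock_page_action(enum migration_state state)
{
	switch (state) {
	case MIGRATION_FINISHED:
		return MLOCK_NOTHING_TO_DO;
	case MIGRATION_ENTRY_INSTALLED:
		return MLOCK_WAIT_FOR_MIGRATION;
	case MIGRATION_PENDING:
	default:
		return MLOCK_RELY_ON_UNMAP_SKIP;
	}
}

/* S2 as a predicate: migration may unmap a page only if its vma is not locked. */
static bool migration_may_unmap(bool vma_is_locked)
{
	return !vma_is_locked;
}
```

In the page-fault case, the vma is already locked, so `migration_may_unmap(true)` is false and no migration entry would be created under this model.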

I do not understand all details of this, but it looks good,
especially the S1 case makes a lot of sense for me.

Nitpick: I suggest to switch order of PATCH 1 and 2 for the next iteration,
introducing the tracepoint first and then improve the situation.

Thanks a lot for looking into this issue!

Best regards,
Alexander

--

KUKA Deutschland GmbH   Board of Directors: Michael Jürgens (Chairman), Johan Naten, Hui Zhang   Registered Office: Augsburg HRB 14914

This e-mail may contain confidential and/or privileged information. If you are not the intended recipient (or have received this e-mail in error) please notify the sender immediately and destroy this e-mail. Any unauthorized copying, disclosure or distribution of contents of this e-mail is strictly forbidden.

Please consider the environment before printing this e-mail.

^ permalink raw reply

