Linux Trace Kernel


* Re: [PATCH v2] scripts: Have make TAGS not include structure members
From: Steven Rostedt @ 2026-05-27 18:47 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Linux trace kernel, linux-kbuild, Andrew Morton,
	Masahiro Yamada, Masatake YAMATO, Geert Uytterhoeven,
	Michal Marek, Yang Bai
In-Reply-To: <20260527162914.GH3102624@noisy.programming.kicks-ass.net>

On Wed, 27 May 2026 18:29:14 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> Yeah, I often use member tags.
> 
> The tags file has a 'kind' field, what you want is for emacs to order
> on kind and prefer 'f' over 'm'.
> 
> The alternative is switching to use emacs-lsp, that way the editor knows
> the kind of symbol you want. If you're on a function call, it should
> only consider 'f' tags. Whereas if the cursor is on a member deref, it
> should only consider 'm'.

OK, so in addition to my procrastination of sending out this patch, I
finally changed my .emacs file to have "Meta-." call
xref-find-definitions instead of find-tags.

The xref-find-definitions gives a list of all the tags it finds and you
can search for the function. In the example of "dev_name", I simply
searched for "dev_name(" and it found the function immediately.

As "find-tags" was deprecated back in 2016 (10 years ago!), and
xref-find-definitions doesn't suffer as much as 'find-tags' does with
respect to member tags, I'll simply drop this patch.

I can also finally archive the conversation I have in my INBOX! ;-)

-- Steve


* [PATCH 3/4] bootconfig: add xbc_prepend_embedded_cmdline() helper
From: Breno Leitao @ 2026-05-27 16:41 UTC (permalink / raw)
  To: Masami Hiramatsu, Andrew Morton, Nathan Chancellor, paulmck,
	Nicolas Schier
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, linux-kernel, linux-trace-kernel, linux-kbuild,
	bpf, Breno Leitao, kernel-team
In-Reply-To: <20260527-bootconfig_using_tools-v1-0-b6906a86e7d5@debian.org>

Add a helper that prepends the build-time-rendered embedded bootconfig
"kernel" subtree (embedded_kernel_cmdline[] from embedded-cmdline.S) to
a cmdline buffer with a separating space. Architectures call this from
setup_arch() before parse_early_param() so early_param() handlers
(mem=, earlycon=, loglevel=, ...) see values supplied via the embedded
bootconfig.

On overflow the helper logs an error and leaves the cmdline untouched
rather than panicking. Booting without the embedded values is better
than refusing to boot, and the error tells the user why their embedded
keys are missing.

When CONFIG_BOOT_CONFIG_EMBED_CMDLINE=n, the public declaration in
<linux/bootconfig.h> resolves to a no-op stub so callers compile
unchanged.
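
For illustration only, the prepend-with-overflow-check behaviour described
above can be sketched in userspace C. This is a minimal sketch, not the
kernel helper: prepend_cmdline() and its bool return value are
hypothetical, and the "embed" argument stands in for
embedded_kernel_cmdline[], assumed to already end in a separating space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical userspace stand-in for the helper described above.
 * "dst" is an existing cmdline buffer of capacity "size"; "embed" is the
 * rendered string, assumed to end in a single separating space.
 * Returns true if "embed" was prepended, false if "dst" was left untouched
 * (nothing to prepend, or the result would not fit).
 */
static bool prepend_cmdline(char *dst, size_t size, const char *embed)
{
	size_t embed_len, dst_len;
	const char *nul;

	assert(dst && embed);

	embed_len = strlen(embed);
	if (!size || !embed_len)
		return false;

	/* Bounded length of the existing content; "size" if no NUL is found. */
	nul = memchr(dst, '\0', size);
	dst_len = nul ? (size_t)(nul - dst) : size;

	/* Both strings plus the terminating NUL must fit in the buffer. */
	if (embed_len + dst_len + 1 > size)
		return false;

	/* Shift the existing content and its NUL right, then copy in front. */
	memmove(dst + embed_len, dst, dst_len + 1);
	memcpy(dst, embed, embed_len);
	return true;
}
```

With a 16-byte buffer holding "quiet", prepending "mem=1G " yields
"mem=1G quiet"; with an 8-byte buffer the call returns false and the
buffer still reads "quiet", which mirrors the soft-error behaviour the
commit message describes.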

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 include/linux/bootconfig.h |  7 +++++++
 lib/bootconfig.c           | 48 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 55 insertions(+)

diff --git a/include/linux/bootconfig.h b/include/linux/bootconfig.h
index 1c7f3b74ffcf..dcb0c86cbc54 100644
--- a/include/linux/bootconfig.h
+++ b/include/linux/bootconfig.h
@@ -308,4 +308,11 @@ static inline const char *xbc_get_embedded_bootconfig(size_t *size)
 }
 #endif
 
+/* Build-time-rendered bootconfig cmdline prepended in setup_arch() */
+#ifdef CONFIG_BOOT_CONFIG_EMBED_CMDLINE
+void __init xbc_prepend_embedded_cmdline(char *dst, size_t size);
+#else
+static inline void xbc_prepend_embedded_cmdline(char *dst, size_t size) { }
+#endif
+
 #endif
diff --git a/lib/bootconfig.c b/lib/bootconfig.c
index 3a102c9122f7..10c62c8600c8 100644
--- a/lib/bootconfig.c
+++ b/lib/bootconfig.c
@@ -19,6 +19,7 @@
 #include <linux/errno.h>
 #include <linux/cache.h>
 #include <linux/compiler.h>
+#include <linux/printk.h>
 #include <linux/sprintf.h>
 #include <linux/memblock.h>
 #include <linux/string.h>
@@ -34,6 +35,53 @@ const char * __init xbc_get_embedded_bootconfig(size_t *size)
 	return (*size) ? embedded_bootconfig_data : NULL;
 }
 #endif
+
+#ifdef CONFIG_BOOT_CONFIG_EMBED_CMDLINE
+/* embedded_kernel_cmdline is defined in embedded-cmdline.S */
+extern __visible const char embedded_kernel_cmdline[];
+extern __visible const char embedded_kernel_cmdline_end[];
+
+/**
+ * xbc_prepend_embedded_cmdline() - Prepend embedded bootconfig cmdline
+ * @dst: cmdline buffer to prepend into (must already contain a NUL byte)
+ * @size: total capacity of @dst in bytes
+ *
+ * Prepend the build-time-rendered "kernel" subtree of the embedded
+ * bootconfig to @dst. The rendered string already ends with a single
+ * space (the xbc_snprint_cmdline() invariant), which serves as the
+ * separator between the embedded keys and any existing content of @dst.
+ * On overflow, log an error and leave @dst untouched rather than
+ * silently truncating: booting without the embedded values is better
+ * than refusing to boot, and the error message tells the user why
+ * their embedded keys are missing.
+ *
+ * Intended to be called from setup_arch() before parse_early_param() so
+ * that early_param() handlers see the embedded values.
+ */
+void __init xbc_prepend_embedded_cmdline(char *dst, size_t size)
+{
+	size_t embed_len = embedded_kernel_cmdline_end - embedded_kernel_cmdline;
+	size_t dst_len;
+
+	if (!size || embed_len <= 1)	/* trailing NUL only */
+		return;
+	embed_len--;			/* exclude trailing NUL byte */
+
+	dst_len = strnlen(dst, size);
+	if (embed_len + dst_len + 1 > size) {
+		pr_err("embedded bootconfig cmdline (%zu bytes) does not fit in COMMAND_LINE_SIZE with %zu bytes already used; ignoring embedded values\n",
+		       embed_len, dst_len);
+		return;
+	}
+
+	if (dst_len)
+		memmove(dst + embed_len, dst, dst_len + 1);
+	else
+		dst[embed_len] = '\0';
+	memcpy(dst, embedded_kernel_cmdline, embed_len);
+}
+#endif
+
 #endif
 
 /*

-- 
2.54.0



* [PATCH 4/4] x86/setup: prepend embedded bootconfig cmdline before parse_early_param
From: Breno Leitao @ 2026-05-27 16:41 UTC (permalink / raw)
  To: Masami Hiramatsu, Andrew Morton, Nathan Chancellor, paulmck,
	Nicolas Schier
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, linux-kernel, linux-trace-kernel, linux-kbuild,
	bpf, Breno Leitao, kernel-team
In-Reply-To: <20260527-bootconfig_using_tools-v1-0-b6906a86e7d5@debian.org>

Call xbc_prepend_embedded_cmdline() in setup_arch() right after the
CONFIG_CMDLINE merge and before strscpy(command_line, ...) so the
build-time-rendered embedded bootconfig "kernel" subtree is part of
boot_command_line by the time parse_early_param() runs. early_param()
handlers (mem=, earlycon=, loglevel=, ...) now see values supplied via
CONFIG_BOOT_CONFIG_EMBED_FILE without parsing bootconfig at runtime.

Select ARCH_SUPPORTS_CMDLINE_FROM_BOOTCONFIG so the user-visible
CONFIG_BOOT_CONFIG_EMBED_CMDLINE option becomes selectable on x86.

With this select in place, setup_boot_config() in init/main.c would
otherwise render the embedded "kernel" subtree a second time via
xbc_make_cmdline("kernel") and prepend it to saved_command_line /
static_command_line through extra_command_line, duplicating every
embedded kernel.* key in /proc/cmdline and causing accumulating
handlers (console=, earlycon=, ...) to register the same value twice.
Track whether the bootconfig data came from the embedded source and
skip the duplicate render in that case.
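
As a rough model of that rule (not the kernel code itself), the decision
reduces to a two-input predicate. The function name below is
hypothetical; embed_cmdline_enabled stands in for
IS_ENABLED(CONFIG_BOOT_CONFIG_EMBED_CMDLINE) and from_embedded for the
flag set when no bootconfig was found in the initrd.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the check described above: returns true when the
 * "kernel" subtree should still be rendered at runtime, false when the
 * build-time-rendered copy was already prepended during early setup.
 */
static bool render_kernel_subtree_at_runtime(bool embed_cmdline_enabled,
					     bool from_embedded)
{
	return !embed_cmdline_enabled || !from_embedded;
}

/* Walk the full truth table; only (enabled, embedded) skips the render. */
static void check_render_rule(void)
{
	assert(render_kernel_subtree_at_runtime(false, false));
	assert(render_kernel_subtree_at_runtime(false, true));
	assert(render_kernel_subtree_at_runtime(true, false));
	assert(!render_kernel_subtree_at_runtime(true, true));
}
```

Only the combination "option enabled and data came from the embedded
source" skips the runtime render; an initrd-supplied bootconfig is still
rendered as before.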

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 arch/x86/Kconfig        |  1 +
 arch/x86/kernel/setup.c |  3 +++
 init/main.c             | 19 ++++++++++++++++---
 3 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index f24810015234..f839795692b4 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -126,6 +126,7 @@ config X86
 	select ARCH_SUPPORTS_NUMA_BALANCING	if X86_64
 	select ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP	if NR_CPUS <= 4096
 	select ARCH_SUPPORTS_CFI		if X86_64
+	select ARCH_SUPPORTS_CMDLINE_FROM_BOOTCONFIG
 	select ARCH_USES_CFI_TRAPS		if X86_64 && CFI
 	select ARCH_SUPPORTS_LTO_CLANG
 	select ARCH_SUPPORTS_LTO_CLANG_THIN
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 46882ce79c3a..592c4c79c974 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -6,6 +6,7 @@
  * parts of early kernel initialization.
  */
 #include <linux/acpi.h>
+#include <linux/bootconfig.h>
 #include <linux/console.h>
 #include <linux/cpu.h>
 #include <linux/crash_dump.h>
@@ -924,6 +925,8 @@ void __init setup_arch(char **cmdline_p)
 	builtin_cmdline_added = true;
 #endif
 
+	xbc_prepend_embedded_cmdline(boot_command_line, COMMAND_LINE_SIZE);
+
 	strscpy(command_line, boot_command_line, COMMAND_LINE_SIZE);
 	*cmdline_p = command_line;
 
diff --git a/init/main.c b/init/main.c
index e363232b428b..8264bfa97aa2 100644
--- a/init/main.c
+++ b/init/main.c
@@ -378,12 +378,15 @@ static void __init setup_boot_config(void)
 	int pos, ret;
 	size_t size;
 	char *err;
+	bool from_embedded = false;
 
 	/* Cut out the bootconfig data even if we have no bootconfig option */
 	data = get_boot_config_from_initrd(&size);
 	/* If there is no bootconfig in initrd, try embedded one. */
-	if (!data)
+	if (!data) {
 		data = xbc_get_embedded_bootconfig(&size);
+		from_embedded = true;
+	}
 
 	strscpy(tmp_cmdline, boot_command_line, COMMAND_LINE_SIZE);
 	err = parse_args("bootconfig", tmp_cmdline, NULL, 0, 0, 0, NULL,
@@ -421,8 +424,18 @@ static void __init setup_boot_config(void)
 	} else {
 		xbc_get_info(&ret, NULL);
 		pr_info("Load bootconfig: %ld bytes %d nodes\n", (long)size, ret);
-		/* keys starting with "kernel." are passed via cmdline */
-		extra_command_line = xbc_make_cmdline("kernel");
+		/*
+		 * keys starting with "kernel." are passed via cmdline. When
+		 * BOOT_CONFIG_EMBED_CMDLINE is enabled and this bootconfig
+		 * came from the embedded source, setup_arch() already
+		 * prepended the rendered "kernel" subtree to
+		 * boot_command_line; rendering again here would duplicate
+		 * the keys in saved_command_line / static_command_line and
+		 * cause accumulating handlers (console=, earlycon=, ...) to
+		 * re-register the same value.
+		 */
+		if (!IS_ENABLED(CONFIG_BOOT_CONFIG_EMBED_CMDLINE) || !from_embedded)
+			extra_command_line = xbc_make_cmdline("kernel");
 		/* Also, "init." keys are init arguments */
 		extra_init_args = xbc_make_cmdline("init");
 	}

-- 
2.54.0



* [PATCH 2/4] bootconfig: render embedded bootconfig as a kernel cmdline at build time
From: Breno Leitao @ 2026-05-27 16:41 UTC (permalink / raw)
  To: Masami Hiramatsu, Andrew Morton, Nathan Chancellor, paulmck,
	Nicolas Schier
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, linux-kernel, linux-trace-kernel, linux-kbuild,
	bpf, Breno Leitao, kernel-team
In-Reply-To: <20260527-bootconfig_using_tools-v1-0-b6906a86e7d5@debian.org>

Add the build-time pipeline that renders the "kernel" subtree of
CONFIG_BOOT_CONFIG_EMBED_FILE into a flat cmdline string and stashes
it in .init.rodata as embedded_kernel_cmdline[]. A follow-up patch
adds the runtime helper that prepends this string to boot_command_line
during early architecture setup so parse_early_param() sees the values.

The build wires up:
  tools/bootconfig -C kernel - userspace tool already shared with
                               lib/bootconfig.c, used here in -C mode
                               to render a bootconfig file to a cmdline
  lib/embedded-cmdline.S     - .incbin's the rendered text plus a NUL
  lib/Makefile rule          - runs tools/bootconfig at build time
  Makefile prepare dep       - ensures tools/bootconfig is built first,
                               same pattern as tools/objtool and
                               tools/bpf/resolve_btfids

Drop the test target from tools/bootconfig/Makefile's default 'all'
recipe so that hooking the binary into the kernel build does not run
test-bootconfig.sh on every prepare. The tests stay available as
'make -C tools/bootconfig test', matching the convention of
tools/objtool and tools/bpf/resolve_btfids whose 'all' targets only
build the binary.

Require BOOT_CONFIG_EMBED_FILE to be non-empty before the new option
can be enabled, otherwise tools/bootconfig -C runs against an empty
file and prints a parse error on every kernel build.

The feature gates on CONFIG_ARCH_SUPPORTS_CMDLINE_FROM_BOOTCONFIG, a
silent symbol arches select once they've wired the prepend call into
setup_arch(). No arch selects it in this patch, so the user-visible
CONFIG_BOOT_CONFIG_EMBED_CMDLINE is not yet enableable; when an arch
later opts in, the runtime behavior is added by the follow-up patches.

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 Makefile                  |  5 +++++
 init/Kconfig              | 33 +++++++++++++++++++++++++++++++++
 lib/Makefile              | 16 ++++++++++++++++
 lib/embedded-cmdline.S    | 16 ++++++++++++++++
 tools/bootconfig/Makefile |  2 +-
 5 files changed, 71 insertions(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index d59f703f9797..3ee259d00a9a 100644
--- a/Makefile
+++ b/Makefile
@@ -1543,6 +1543,11 @@ prepare: tools/bpf/resolve_btfids
 endif
 endif
 
+# lib/Makefile invokes tools/bootconfig to render the embedded bconf to cmdline.
+ifdef CONFIG_BOOT_CONFIG_EMBED_CMDLINE
+prepare: tools/bootconfig
+endif
+
 # The tools build system is not a part of Kbuild and tends to introduce
 # its own unique issues. If you need to integrate a new tool into Kbuild,
 # please consider locating that tool outside the tools/ tree and using the
diff --git a/init/Kconfig b/init/Kconfig
index ca35184532dc..5f491a5ac4b8 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1569,6 +1569,39 @@ config BOOT_CONFIG_EMBED_FILE
 	  This bootconfig will be used if there is no initrd or no other
 	  bootconfig in the initrd.
 
+config ARCH_SUPPORTS_CMDLINE_FROM_BOOTCONFIG
+	bool
+	help
+	  Selected by architectures whose setup_arch() prepends the
+	  build-time-rendered embedded bootconfig cmdline to
+	  boot_command_line before parse_early_param() runs.
+
+config BOOT_CONFIG_EMBED_CMDLINE
+	bool "Render embedded bootconfig as kernel cmdline at build time"
+	depends on BOOT_CONFIG_EMBED
+	depends on BOOT_CONFIG_EMBED_FILE != ""
+	depends on ARCH_SUPPORTS_CMDLINE_FROM_BOOTCONFIG
+	default n
+	help
+	  Render the "kernel" subtree of the embedded bootconfig file into a
+	  flat cmdline string at kernel build time and prepend it to
+	  boot_command_line during early architecture setup. This makes
+	  early_param() handlers (e.g. mem=, earlycon=, loglevel=) see the
+	  values supplied via the embedded bootconfig.
+
+	  The runtime bootconfig parser is unaffected, so tree-structured
+	  consumers such as ftrace boot-time tracing keep working.
+
+	  Note: when an initrd also carries a bootconfig, its "kernel"
+	  subtree is still parsed at runtime, but the embedded "kernel"
+	  keys remain in boot_command_line for parse_early_param() and
+	  end up later than the initrd keys in saved_command_line, so
+	  parse_args() last-wins favors the embedded values. If you need
+	  initrd to override embedded kernel.* keys, leave this option
+	  off.
+
+	  If unsure, say N.
+
 config CMDLINE_LOG_WRAP_IDEAL_LEN
 	int "Length to try to wrap the cmdline when logged at boot"
 	default 1021
diff --git a/lib/Makefile b/lib/Makefile
index 6e72d2c1cce7..9de0ac7732a2 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -273,6 +273,22 @@ filechk_defbconf = cat $(or $(real-prereqs), /dev/null)
 $(obj)/default.bconf: $(CONFIG_BOOT_CONFIG_EMBED_FILE) FORCE
 	$(call filechk,defbconf)
 
+obj-$(CONFIG_BOOT_CONFIG_EMBED_CMDLINE) += embedded-cmdline.o
+$(obj)/embedded-cmdline.o: $(obj)/embedded_cmdline.bin
+
+# Render the bootconfig "kernel" subtree to a flat cmdline string using
+# the userspace tools/bootconfig parser (-C mode). The runtime prepend
+# helper enforces COMMAND_LINE_SIZE at boot, so no build-time size
+# check is performed here (COMMAND_LINE_SIZE is an arch header
+# constant, not a Kconfig value).
+quiet_cmd_render_cmdline = BCONF2C $@
+      cmd_render_cmdline = \
+	$(objtree)/tools/bootconfig/bootconfig -C $< > $@
+
+targets += embedded_cmdline.bin
+$(obj)/embedded_cmdline.bin: $(obj)/default.bconf $(objtree)/tools/bootconfig/bootconfig FORCE
+	$(call if_changed,render_cmdline)
+
 obj-$(CONFIG_RBTREE_TEST) += rbtree_test.o
 obj-$(CONFIG_INTERVAL_TREE_TEST) += interval_tree_test.o
 
diff --git a/lib/embedded-cmdline.S b/lib/embedded-cmdline.S
new file mode 100644
index 000000000000..7e2e1d81af96
--- /dev/null
+++ b/lib/embedded-cmdline.S
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Embed the build-time-rendered bootconfig "kernel" subtree as a flat
+ * cmdline string. setup_arch() prepends this to boot_command_line on
+ * architectures that select ARCH_SUPPORTS_CMDLINE_FROM_BOOTCONFIG.
+ *
+ * Copyright (c) 2026 Meta Platforms, Inc. and affiliates
+ * Copyright (c) 2026 Breno Leitao <leitao@debian.org>
+ */
+	.section .init.rodata, "aw"
+	.global embedded_kernel_cmdline
+embedded_kernel_cmdline:
+	.incbin "lib/embedded_cmdline.bin"
+	.byte 0
+	.global embedded_kernel_cmdline_end
+embedded_kernel_cmdline_end:
diff --git a/tools/bootconfig/Makefile b/tools/bootconfig/Makefile
index 90eb47c9d8de..4e82fd9553cd 100644
--- a/tools/bootconfig/Makefile
+++ b/tools/bootconfig/Makefile
@@ -15,7 +15,7 @@ override CFLAGS += -Wall -g -I$(CURDIR)/include
 ALL_TARGETS := bootconfig
 ALL_PROGRAMS := $(patsubst %,$(OUTPUT)%,$(ALL_TARGETS))
 
-all: $(ALL_PROGRAMS) test
+all: $(ALL_PROGRAMS)
 
 $(OUTPUT)bootconfig: main.c include/linux/bootconfig.h $(LIBSRC)
 	$(CC) $(filter %.c,$^) $(CFLAGS) $(LDFLAGS) -o $@

-- 
2.54.0



* [PATCH 1/4] bootconfig: return 0 from xbc_snprint_cmdline() for a leaf root
From: Breno Leitao @ 2026-05-27 16:41 UTC (permalink / raw)
  To: Masami Hiramatsu, Andrew Morton, Nathan Chancellor, paulmck,
	Nicolas Schier
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, linux-kernel, linux-trace-kernel, linux-kbuild,
	bpf, Breno Leitao, kernel-team
In-Reply-To: <20260527-bootconfig_using_tools-v1-0-b6906a86e7d5@debian.org>

Returning -EINVAL when @root has no descendant key nodes is a quirky
result for a renderer: "nothing to render" is not an error. The only
existing caller, xbc_make_cmdline(), papers over it with a `len <= 0`
check, so the misbehavior is harmless today. The new -C user in
tools/bootconfig added by the follow-up patches propagates the error
and turns an empty "kernel {}" subtree into a build failure.

Short-circuit the leaf-root case and return 0 so the rendered length
matches the rendered content.
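
To illustrate the "nothing to render is not an error" convention, here is
a toy userspace renderer, assuming a flat array of key/value pairs
instead of the real xbc_node tree. struct kv, render_cmdline() and the
exact output format are hypothetical; only the return-value convention
(0 for empty input, negative for a real error, snprintf()-style length
otherwise) is the point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical flat stand-in for the key/value pairs under a subtree. */
struct kv {
	const char *key;
	const char *val;
};

/*
 * Render "n" pairs as key="val" entries, each followed by a space.
 * Returns the length the full output needs (snprintf()-style), 0 when
 * there is nothing to render, or a negative value on a formatting error.
 */
static int render_cmdline(char *buf, size_t size, const struct kv *pairs,
			  size_t n)
{
	int total = 0;
	size_t i;

	assert(buf && size > 0);
	assert(pairs || n == 0);

	/* Empty input: an empty string and length 0, not an error code. */
	if (n == 0) {
		buf[0] = '\0';
		return 0;
	}

	for (i = 0; i < n; i++) {
		/* Clamp the write offset so we never step past the buffer. */
		size_t off = (size_t)total < size ? (size_t)total : size;
		int ret = snprintf(buf + off, size - off, "%s=\"%s\" ",
				   pairs[i].key, pairs[i].val);

		if (ret < 0)
			return ret;
		total += ret;
	}
	return total;
}
```

A caller can then treat a negative return as a failure and a zero return
as "empty subtree, nothing to add", instead of folding both into one
`len <= 0` check.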

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 lib/bootconfig.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/lib/bootconfig.c b/lib/bootconfig.c
index f445b7703fdd..3a102c9122f7 100644
--- a/lib/bootconfig.c
+++ b/lib/bootconfig.c
@@ -431,6 +431,16 @@ int __init xbc_snprint_cmdline(char *buf, size_t size, struct xbc_node *root)
 	const char *val, *q;
 	int ret;
 
+	/*
+	 * A leaf @root (e.g. an empty "kernel {}" subtree, or a key whose
+	 * only child is a value node) has no descendant key/value pairs to
+	 * render. The leaf-finding iterator below would otherwise return
+	 * @root itself, which xbc_node_compose_key_after() rejects with
+	 * -EINVAL.
+	 */
+	if (root && xbc_node_is_leaf(root))
+		return 0;
+
 	xbc_node_for_each_key_value(root, knode, val) {
 		ret = xbc_node_compose_key_after(root, knode,
 					xbc_namebuf, XBC_KEYLEN_MAX);

-- 
2.54.0



* [PATCH 0/4] bootconfig: embed kernel.* cmdline at build time
From: Breno Leitao @ 2026-05-27 16:41 UTC (permalink / raw)
  To: Masami Hiramatsu, Andrew Morton, Nathan Chancellor, paulmck,
	Nicolas Schier
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, linux-kernel, linux-trace-kernel, linux-kbuild,
	bpf, Breno Leitao, kernel-team

The userspace pieces (xbc_snprint_cmdline() in lib/, tools/bootconfig -C)
already landed; this series wires the rendered cmdline into the kernel.

Motivation: today the embedded bootconfig is parsed at runtime, after
parse_early_param() has already run, so early_param() handlers can't
see embedded values. Folding the kernel.* subtree into the cmdline at
build time gives a CONFIG_CMDLINE-equivalent for embedded-bootconfig
users without forcing them to maintain two cmdline sources.

Behaviorally, the "kernel" subtree is rendered to a flat string at
build time and stashed in .init.rodata. setup_arch() prepends it to
boot_command_line before parse_early_param() runs. Overflow is a soft
error: the helper logs and leaves boot_command_line untouched rather
than panicking, so an oversized embedded bconf cannot brick a boot.

Signed-off-by: Breno Leitao <leitao@debian.org>
---
Breno Leitao (4):
      bootconfig: return 0 from xbc_snprint_cmdline() for a leaf root
      bootconfig: render embedded bootconfig as a kernel cmdline at build time
      bootconfig: add xbc_prepend_embedded_cmdline() helper
      x86/setup: prepend embedded bootconfig cmdline before parse_early_param

 Makefile                   |  5 ++++
 arch/x86/Kconfig           |  1 +
 arch/x86/kernel/setup.c    |  3 +++
 include/linux/bootconfig.h |  7 ++++++
 init/Kconfig               | 33 ++++++++++++++++++++++++++
 init/main.c                | 19 ++++++++++++---
 lib/Makefile               | 16 +++++++++++++
 lib/bootconfig.c           | 58 ++++++++++++++++++++++++++++++++++++++++++++++
 lib/embedded-cmdline.S     | 16 +++++++++++++
 tools/bootconfig/Makefile  |  2 +-
 10 files changed, 156 insertions(+), 4 deletions(-)
---
base-commit: e7e28506af98ce4e1059e5ec59334b335c00a246
change-id: 20260508-bootconfig_using_tools-cfa7aa9d6a5a

Best regards,
--  
Breno Leitao <leitao@debian.org>


* Re: [PATCH v2] scripts: Have make TAGS not include structure members
From: Peter Zijlstra @ 2026-05-27 16:31 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Linux trace kernel, linux-kbuild, Andrew Morton,
	Masahiro Yamada, Masatake YAMATO, Geert Uytterhoeven,
	Michal Marek, Yang Bai, Stephen Boyd
In-Reply-To: <20260527162914.GH3102624@noisy.programming.kicks-ass.net>

On Wed, May 27, 2026 at 06:29:14PM +0200, Peter Zijlstra wrote:
> On Wed, May 27, 2026 at 12:11:44PM -0400, Steven Rostedt wrote:
> > From: Steven Rostedt <rostedt@goodmis.org>
> > 
> > It is really annoying when I use emacs TAGS to search for something
> > like "dev_name" and have to go through 12 iterations before I find the
> > function "dev_name". I really do not care about structures that include
> > "dev_name" as one of its fields, and I'm sure pretty much all other
> > developers do not care either.
> > 
> > There's a "remove_structs" variable used by the scripts/tags.sh, which
> > I'm guessing is supposed to remove these structures from the TAGS file,
> > but it must do a poor job at it, as I'm always hitting structures when
> > I want the actual declaration.
> > 
> > Luckily, the etags program comes with an option "--no-members", which does
> > exactly what I want, and I'm sure all other kernel developers want too.
> > 
> > Create a new "no_members" variable and assign it to "--no-members" for the
> > "TAGS" case and pass that to the etags program to remove structures.
> > 
> > Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> > ---
> > Changes since v1: https://lore.kernel.org/all/20131115093645.6dc03918@gandalf.local.home/
> > 
> > - Use a no_members variable instead of hard coding the --no-members into
> >   the etags call, as that can break some "tags" cases. (Michal Marek)
> 
> Yeah, I often use member tags.
> 
> > The tags file has a 'kind' field, what you want is for emacs to order
> on kind and prefer 'f' over 'm'.
> 
> The alternative is switching to use emacs-lsp, that way the editor knows
> the kind of symbol you want. If you're on a function call, it should
> only consider 'f' tags. Whereas if the cursor is on a member deref, it
> should only consider 'm'.

That said, setting up clangd on the kernel tree is rather more painful
than I'd like it to be :-(


* Re: [PATCH v2] scripts: Have make TAGS not include structure members
From: Peter Zijlstra @ 2026-05-27 16:29 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Linux trace kernel, linux-kbuild, Andrew Morton,
	Masahiro Yamada, Masatake YAMATO, Geert Uytterhoeven,
	Michal Marek, Yang Bai, Stephen Boyd
In-Reply-To: <20260527121144.08a1f676@fedora>

On Wed, May 27, 2026 at 12:11:44PM -0400, Steven Rostedt wrote:
> From: Steven Rostedt <rostedt@goodmis.org>
> 
> It is really annoying when I use emacs TAGS to search for something
> like "dev_name" and have to go through 12 iterations before I find the
> function "dev_name". I really do not care about structures that include
> "dev_name" as one of its fields, and I'm sure pretty much all other
> developers do not care either.
> 
> There's a "remove_structs" variable used by the scripts/tags.sh, which
> I'm guessing is supposed to remove these structures from the TAGS file,
> but it must do a poor job at it, as I'm always hitting structures when
> I want the actual declaration.
> 
> Luckily, the etags program comes with an option "--no-members", which does
> exactly what I want, and I'm sure all other kernel developers want too.
> 
> Create a new "no_members" variable and assign it to "--no-members" for the
> "TAGS" case and pass that to the etags program to remove structures.
> 
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> ---
> Changes since v1: https://lore.kernel.org/all/20131115093645.6dc03918@gandalf.local.home/
> 
> - Use a no_members variable instead of hard coding the --no-members into
>   the etags call, as that can break some "tags" cases. (Michal Marek)

Yeah, I often use member tags.

The tags file has a 'kind' field, what you want is for emacs to order
on kind and prefer 'f' over 'm'.

The alternative is switching to use emacs-lsp, that way the editor knows
the kind of symbol you want. If you're on a function call, it should
only consider 'f' tags. Whereas if the cursor is on a member deref, it
should only consider 'm'.


* Re: [PATCH v3 2/2] serial: qcom-geni: Add tracepoints for Qualcomm GENI serial driver
From: Praveen Talari @ 2026-05-27 16:16 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, Jiri Slaby,
	Konrad Dybcio, linux-kernel, linux-trace-kernel, linux-arm-msm,
	linux-serial, Mukesh Kumar Savaliya, Aniket Randive,
	chandana.chiluveru, jyothi.seerapu
In-Reply-To: <2026052738-unexpired-diligence-10f8@gregkh>

Hi

On 27-05-2026 13:47, Greg Kroah-Hartman wrote:
> On Tue, May 26, 2026 at 10:36:18AM +0530, Praveen Talari wrote:
>> Hi
>>
>> On 22-05-2026 15:17, Greg Kroah-Hartman wrote:
>>> On Mon, May 18, 2026 at 11:26:56PM +0530, Praveen Talari wrote:
>>>> Add tracing to the Qualcomm GENI serial driver to improve runtime
>>>> observability.
>>>>
>>>> Trace hooks are added at key points including termios and clock
>>>> configuration, manual control get/set, interrupt handling, and data
>>>> TX/RX paths.
>>>>
>>>> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
>>>> Signed-off-by: Praveen Talari <praveen.talari@oss.qualcomm.com>
>>>> ---
>>>> v2->v3:
>>>> - Updated commit text(removed example as it was available on cover
>>>>     letter).
>>>> ---
>>>>    drivers/tty/serial/qcom_geni_serial.c | 27 +++++++++++++++++++++++----
>>>>    1 file changed, 23 insertions(+), 4 deletions(-)
>>> This patch did not apply to my tree :(
>> Do you mean these patches are not applied cleanly?
> Yes.
>
>> If yes, i will push on linux-next tip.
> You mean rebase, right?

Yes, I have already posted v4.

  https://lore.kernel.org/all/20260526-add-tracepoints-for-qcom-geni-serial-v4-0-e94fbaec0232@oss.qualcomm.com/


Thanks,

Praveen Talari

>
> thanks,
>
> greg k-h


* [PATCH v2] scripts: Have make TAGS not include structure members
From: Steven Rostedt @ 2026-05-27 16:11 UTC (permalink / raw)
  To: LKML, Linux trace kernel, linux-kbuild
  Cc: Andrew Morton, Masahiro Yamada, Masatake YAMATO, Peter Zijlstra,
	Geert Uytterhoeven, Michal Marek, Yang Bai, Stephen Boyd

From: Steven Rostedt <rostedt@goodmis.org>

It is really annoying when I use emacs TAGS to search for something
like "dev_name" and have to go through 12 iterations before I find the
function "dev_name". I really do not care about structures that include
"dev_name" as one of its fields, and I'm sure pretty much all other
developers do not care either.

There's a "remove_structs" variable used by the scripts/tags.sh, which
I'm guessing is supposed to remove these structures from the TAGS file,
but it must do a poor job at it, as I'm always hitting structures when
I want the actual declaration.

Luckily, the etags program comes with an option "--no-members", which does
exactly what I want, and I'm sure all other kernel developers want too.

Create a new "no_members" variable and assign it to "--no-members" for the
"TAGS" case and pass that to the etags program to remove structures.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
Changes since v1: https://lore.kernel.org/all/20131115093645.6dc03918@gandalf.local.home/

- Use a no_members variable instead of hard coding the --no-members into
  the etags call, as that can break some "tags" cases. (Michal Marek)

- Rebase to the current decade. Yes, v1 is from 2013. I've been carrying
  this patch in my personal repos as a quilt entry where I would just push
  it when doing a "make TAGS". I also have the conversation still in my
  INBOX to remind me to send a v2. Talk about procrastination! It only
  took me 13 years to send the v2 :-p

  I'm still keeping the same Cc's. I wonder how many of them will be
  broken. :-/

 scripts/tags.sh | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/scripts/tags.sh b/scripts/tags.sh
index 243373683f98..018588014eed 100755
--- a/scripts/tags.sh
+++ b/scripts/tags.sh
@@ -305,7 +305,7 @@ exuberant()
 emacs()
 {
 	setup_regex emacs asm c
-	all_target_sources | xargs $1 -a "${regex[@]}"
+	all_target_sources | xargs $1 -a $no_members "${regex[@]}"

 	setup_regex emacs kconfig
 	all_kconfigs | xargs $1 -a "${regex[@]}"
@@ -334,6 +334,7 @@ if [ "${ARCH}" = "um" ]; then
 fi

 remove_structs=
+no_members=
 case "$1" in
 	"cscope")
 		docscope
@@ -353,6 +354,7 @@ case "$1" in
 		rm -f TAGS
 		xtags etags
 		remove_structs=y
+		no_members=--no-members
 		;;
 esac

-- 
2.53.0


* Re: [PATCH v3] perf/ftrace: Fix WARNING in __unregister_ftrace_function
From: Rik van Riel @ 2026-05-27 15:43 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Masami Hiramatsu, Mathieu Desnoyers, linux-kernel, kernel-team,
	linux-trace-kernel
In-Reply-To: <20260527113732.664d26db@fedora>

On Wed, 2026-05-27 at 11:37 -0400, Steven Rostedt wrote:
> On Wed, 27 May 2026 11:13:01 -0400
> Rik van Riel <riel@surriel.com> wrote:
> 
> > perf_ftrace_function_unregister() unconditionally calls
> > unregister_ftrace_function() without checking whether the
> > ftrace_ops
> > was ever successfully registered. This triggers a WARN_ON in
> > __unregister_ftrace_function() when the ops doesn't have
> > FTRACE_OPS_FL_ENABLED set.
> > 
> > This can happen during perf_event_alloc() error cleanup when
> > perf_trace_destroy() is called via __free_event() on an event whose
> > ftrace_ops registration failed or was already torn down by
> > perf_try_init_event()'s err_destroy path.
> > 
> > The call path is:
> >   perf_event_alloc() error cleanup
> >     -> __free_event()
> >       -> event->destroy() [tp_perf_event_destroy]
> >         -> perf_trace_destroy()
> >           -> perf_trace_event_close()
> >             -> TRACE_REG_PERF_CLOSE
> >               -> perf_ftrace_function_unregister()
> >                 -> unregister_ftrace_function()
> >                   -> __unregister_ftrace_function()
> > >                     -> WARN_ON(!(ops->flags & FTRACE_OPS_FL_ENABLED))
> > 
> > Fix this by checking FTRACE_OPS_FL_ENABLED before attempting to
> > unregister. If the ops is not enabled, just free the filter and
> > return success.
> > 
> > Signed-off-by: Rik van Riel <riel@surriel.com>
> 
> Thanks Rik. Is this urgent where it should have a Fixes tag and Cc
> stable as well as be sent to Linus during the -rc release, or can it
> wait for the next merge window?
> 

I don't think this patch has any particular urgency.

It can go in using whatever flow is most convenient.

-- 
All Rights Reversed.

* Re: [PATCH 07/13] rv: Simply hybrid automata monitors's clock variables
From: Nam Cao @ 2026-05-27 15:41 UTC (permalink / raw)
  To: Gabriele Monaco
  Cc: Steven Rostedt, Wander Lairson Costa, linux-trace-kernel,
	linux-kernel
In-Reply-To: <054cfff25288a98a7d7922de149be91fcbc79bc0.camel@redhat.com>

Gabriele Monaco <gmonaco@redhat.com> writes:
> I'd prefer if this was consistent with the above as in (now - env <=
> expire) or (expire >= now - env), whichever you prefer but let's keep it
> equivalent.
> Or do you have a reason to rearrange it here?

No reason. Let's keep it consistent.

Thanks for the comments.

Nam
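
A minimal sketch of why one form of the comparison is worth keeping
everywhere. It assumes the clock values are unsigned 64-bit and that env
holds the time the clock was last reset; neither is shown in this thread.
The helper names are hypothetical. The elapsed-time form and its swapped
spelling always agree, while a rearranged "deadline" form (not proposed by
anyone here, included only for contrast) can give a different answer once
env + expire wraps around:

```c
#include <stdbool.h>
#include <stdint.h>

/* Elapsed-time form, as written in the review comment. */
bool clk_within(uint64_t now, uint64_t env, uint64_t expire)
{
	return now - env <= expire;
}

/* Same comparison with the operands swapped; always equivalent. */
bool clk_within_swapped(uint64_t now, uint64_t env, uint64_t expire)
{
	return expire >= now - env;
}

/* Deadline form: not equivalent when env + expire overflows. */
bool clk_within_deadline(uint64_t now, uint64_t env, uint64_t expire)
{
	return now <= env + expire;
}
```

With env close to UINT64_MAX, the first two helpers still report the
elapsed time correctly, while the third compares against a wrapped
deadline.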

* Re: [PATCH v3] perf/ftrace: Fix WARNING in __unregister_ftrace_function
From: Steven Rostedt @ 2026-05-27 15:37 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Masami Hiramatsu, Mathieu Desnoyers, linux-kernel, kernel-team,
	linux-trace-kernel
In-Reply-To: <20260527111301.2d0d8256@fangorn>

On Wed, 27 May 2026 11:13:01 -0400
Rik van Riel <riel@surriel.com> wrote:

> perf_ftrace_function_unregister() unconditionally calls
> unregister_ftrace_function() without checking whether the ftrace_ops
> was ever successfully registered. This triggers a WARN_ON in
> __unregister_ftrace_function() when the ops doesn't have
> FTRACE_OPS_FL_ENABLED set.
> 
> This can happen during perf_event_alloc() error cleanup when
> perf_trace_destroy() is called via __free_event() on an event whose
> ftrace_ops registration failed or was already torn down by
> perf_try_init_event()'s err_destroy path.
> 
> The call path is:
>   perf_event_alloc() error cleanup
>     -> __free_event()
>       -> event->destroy() [tp_perf_event_destroy]
>         -> perf_trace_destroy()
>           -> perf_trace_event_close()
>             -> TRACE_REG_PERF_CLOSE
>               -> perf_ftrace_function_unregister()
>                 -> unregister_ftrace_function()
>                   -> __unregister_ftrace_function()
>                     -> WARN_ON(!(ops->flags & FTRACE_OPS_FL_ENABLED))  
> 
> Fix this by checking FTRACE_OPS_FL_ENABLED before attempting to
> unregister. If the ops is not enabled, just free the filter and
> return success.
> 
> Signed-off-by: Rik van Riel <riel@surriel.com>

Thanks Rik. Is this urgent where it should have a Fixes tag and Cc
stable as well as be sent to Linus during the -rc release, or can it
wait for the next merge window?

-- Steve

* Re: [PATCH v6 05/43] KVM: guest_memfd: Wire up kvm_get_memory_attributes() to per-gmem attributes
From: Ackerley Tng @ 2026-05-27 15:35 UTC (permalink / raw)
  To: Sean Christopherson, Fuad Tabba
  Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
	ira.weiny, jmattson, jthoughton, michael.roth, oupton,
	pankaj.gupta, qperret, rick.p.edgecombe, rientjes, shivankg,
	steven.price, willy, wyihan, yan.y.zhao, forkloop, pratyush,
	suzuki.poulose, aneesh.kumar, liam, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Steven Rostedt, Masami Hiramatsu,
	Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, Shuah Khan,
	Vishal Annapurve, Andrew Morton, Chris Li, Kairui Song,
	Kemeng Shi, Nhat Pham, Baoquan He, Barry Song, Axel Rasmussen,
	Yuanchu Xie, Wei Xu, Youngjun Park, Qi Zheng, Shakeel Butt,
	Kiryl Shutsemau, Jason Gunthorpe, Vlastimil Babka, kvm,
	linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
	linux-mm, linux-coco
In-Reply-To: <CAEvNRgEZ9vCKkoMC11tVrueAonGWH2x6OeaYYxXGEj2gwHUaKw@mail.gmail.com>

Ackerley Tng <ackerleytng@google.com> writes:

>
> [...snip...]
>
>>
>> Hmm, I wonder if we can figure out a way to consolidate some documentation,
>> because this is _exactly_ the same pattern that x86's host_pfn_mapping_level()
>> deals with (see its big comment below).
>>
>
> This would be great, are you thinking an actual comment or something in
> Documentation/?
>
> Perhaps we could iterate on this a little with me providing the newbie
> perspective. Do you want me to take a stab at writing something up?
>

Please see https://lore.kernel.org/all/20260527-kvm-locking-docs-v1-0-4fe8b602ff47@google.com/T/!

>>
>> [...snip...]
>>

* Re: [PATCH 1/8] scripts/sorttable: Handle RISC-V patchable ftrace entries
From: Steven Rostedt @ 2026-05-27 15:30 UTC (permalink / raw)
  To: Wang Han
  Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
	Masami Hiramatsu, Mark Rutland, Catalin Marinas, Chen Pei,
	Andy Chiu, Björn Töpel, Deepak Gupta, Puranjay Mohan,
	Conor Dooley, Josh Poimboeuf, Jiri Kosina, Miroslav Benes,
	Petr Mladek, Joe Lawrence, Shuah Khan, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, linux-riscv,
	linux-kernel, linux-trace-kernel, live-patching, linux-kselftest,
	linux-perf-users
In-Reply-To: <20260527123530.2593918-2-wanghan@linux.alibaba.com>

On Wed, 27 May 2026 20:35:23 +0800
Wang Han <wanghan@linux.alibaba.com> wrote:

> Signed-off-by: Wang Han <wanghan@linux.alibaba.com>
> ---
>  scripts/sorttable.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/scripts/sorttable.c b/scripts/sorttable.c
> index e8ed11c680c6..b4061c2c03e1 100644
> --- a/scripts/sorttable.c
> +++ b/scripts/sorttable.c
> @@ -901,11 +901,17 @@ static int do_file(char const *const fname, void *addr)
>  		/* fallthrough */
>  	case EM_386:
>  	case EM_LOONGARCH:
> -	case EM_RISCV:
>  	case EM_S390:
>  	case EM_X86_64:
>  		custom_sort = sort_relative_table_with_data;
>  		break;
> +	case EM_RISCV:
> +#ifdef MCOUNT_SORT_ENABLED
> +		/* RISC-V uses patchable function entries before function entry. */
> +		before_func = 8;
> +#endif
> +		custom_sort = sort_relative_table_with_data;
> +		break;
>  	case EM_PARISC:
>  	case EM_PPC:
>  	case EM_PPC64:

So basically RISCV has the same problem as ARM64 with patchable
entries. As this may happen for other archs in the future, I would like
to group them together like this:

diff --git a/scripts/sorttable.c b/scripts/sorttable.c
index e8ed11c680c6..b3d9073d9fbc 100644
--- a/scripts/sorttable.c
+++ b/scripts/sorttable.c
@@ -891,17 +891,23 @@ static int do_file(char const *const fname, void *addr)
 	table_sort_t custom_sort = NULL;
 
 	switch (elf_map_machine(ehdr)) {
-	case EM_AARCH64:
 #ifdef MCOUNT_SORT_ENABLED
+	case EM_AARCH64:
 		sort_reloc = true;
 		rela_type = 0x403;
-		/* arm64 uses patchable function entry placing before function */
+		/*
+		 * arm64 and RISCV use patchable function entry placing
+		 * before function
+		 */
+	case EM_RISCV:
 		before_func = 8;
+#else
+	case EM_AARCH64:
+	case EM_RISCV:
 #endif
 		/* fallthrough */
 	case EM_386:
 	case EM_LOONGARCH:
-	case EM_RISCV:
 	case EM_S390:
 	case EM_X86_64:
 		custom_sort = sort_relative_table_with_data;

Does the above work for you? (I didn't even compile-test it, though.)

-- Steve
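
The grouping suggested above can be tried outside of sorttable.c with a
small stand-alone model. This is only a sketch: struct sort_cfg and
pick_sort_cfg() are hypothetical stand-ins for the locals in do_file(),
rela_type is left out, and MCOUNT_SORT_ENABLED is defined by hand. The
case labels use the ELF machine constants, so the RISC-V label is spelled
EM_RISCV:

```c
#include <stdbool.h>

/* ELF machine numbers, copied here so the sketch does not need <elf.h>. */
enum { EM_386 = 3, EM_X86_64 = 62, EM_AARCH64 = 183, EM_RISCV = 243 };

#define MCOUNT_SORT_ENABLED 1	/* assumption: the mcount sort is built in */

struct sort_cfg {		/* hypothetical stand-in for do_file() locals */
	bool sort_reloc;
	long before_func;
	bool relative_sort;
};

struct sort_cfg pick_sort_cfg(int machine)
{
	struct sort_cfg cfg = { false, 0, false };

	switch (machine) {
#ifdef MCOUNT_SORT_ENABLED
	case EM_AARCH64:
		cfg.sort_reloc = true;
		/* fallthrough */
	case EM_RISCV:
		/* both place patchable entries before the function */
		cfg.before_func = 8;
#else
	case EM_AARCH64:
	case EM_RISCV:
#endif
		/* fallthrough */
	case EM_386:
	case EM_X86_64:
		cfg.relative_sort = true;
		break;
	default:
		break;
	}
	return cfg;
}
```

In this model only arm64 gets sort_reloc, both arm64 and RISC-V get
before_func = 8, and every listed machine ends up with the relative sort.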

* [PATCH v3] perf/ftrace: Fix WARNING in __unregister_ftrace_function
From: Rik van Riel @ 2026-05-27 15:13 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Masami Hiramatsu, Mathieu Desnoyers, linux-kernel, kernel-team,
	linux-trace-kernel

perf_ftrace_function_unregister() unconditionally calls
unregister_ftrace_function() without checking whether the ftrace_ops
was ever successfully registered. This triggers a WARN_ON in
__unregister_ftrace_function() when the ops doesn't have
FTRACE_OPS_FL_ENABLED set.

This can happen during perf_event_alloc() error cleanup when
perf_trace_destroy() is called via __free_event() on an event whose
ftrace_ops registration failed or was already torn down by
perf_try_init_event()'s err_destroy path.

The call path is:
  perf_event_alloc() error cleanup
    -> __free_event()
      -> event->destroy() [tp_perf_event_destroy]
        -> perf_trace_destroy()
          -> perf_trace_event_close()
            -> TRACE_REG_PERF_CLOSE
              -> perf_ftrace_function_unregister()
                -> unregister_ftrace_function()
                  -> __unregister_ftrace_function()
                    -> WARN_ON(!(ops->flags & FTRACE_OPS_FL_ENABLED))

Fix this by checking FTRACE_OPS_FL_ENABLED before attempting to
unregister. If the ops is not enabled, just free the filter and
return success.

Signed-off-by: Rik van Riel <riel@surriel.com>
---
 kernel/trace/trace_event_perf.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index a6bb7577e8c5..5b272856e5ab 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -497,7 +497,17 @@ static int perf_ftrace_function_register(struct perf_event *event)
 static int perf_ftrace_function_unregister(struct perf_event *event)
 {
 	struct ftrace_ops *ops = &event->ftrace_ops;
-	int ret = unregister_ftrace_function(ops);
+	int ret = 0;
+
+	/*
+	 * Perf will call this unconditionally even if the ops is not
+	 * enabled. The unregister_ftrace_function() will warn if called
+	 * when not enabled. Just bypass the unregistering if ops isn't
+	 * enabled here.
+	 */
+	if (ops->flags & FTRACE_OPS_FL_ENABLED)
+		ret = unregister_ftrace_function(ops);
+
 	ftrace_free_filter(ops);
 	return ret;
 }
-- 
2.54.0
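
For readers who want to poke at the guard without a kernel, here is a
user-space model of the pattern. Everything in it is hypothetical:
OPS_FL_ENABLED stands in for FTRACE_OPS_FL_ENABLED, model_unregister()
for unregister_ftrace_function() (its -1 return value is an assumption,
not the kernel's), and the warnings counter for the WARN_ON.

```c
#include <assert.h>
#include <stdio.h>

#define OPS_FL_ENABLED (1u << 0)	/* stand-in for FTRACE_OPS_FL_ENABLED */

struct model_ops {
	unsigned int flags;
};

int warnings;	/* counts how often the modelled WARN_ON fired */

/* Models unregister_ftrace_function(): warns when the ops is not enabled. */
int model_unregister(struct model_ops *ops)
{
	assert(ops);
	if (!(ops->flags & OPS_FL_ENABLED)) {
		warnings++;
		fprintf(stderr, "WARN: ops not enabled\n");
		return -1;
	}
	ops->flags &= ~OPS_FL_ENABLED;
	return 0;
}

/* Models the patched perf_ftrace_function_unregister(). */
int model_perf_unregister(struct model_ops *ops)
{
	int ret = 0;

	assert(ops);
	if (ops->flags & OPS_FL_ENABLED)
		ret = model_unregister(ops);
	/* the real function frees the filter here in both cases */
	return ret;
}
```

Calling model_perf_unregister() on an ops that was never enabled, or
calling it twice on the same ops, returns 0 and leaves the warnings
counter untouched, which is the behaviour the patch is after.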


* [PATCH 3/3] Documentation/rtla: Add -A/--aligned option
From: Tomas Glozar @ 2026-05-27 14:49 UTC (permalink / raw)
  To: Steven Rostedt, Tomas Glozar
  Cc: John Kacur, Luis Goncalves, Crystal Wood, Costa Shulyupin,
	Wander Lairson Costa, LKML, linux-trace-kernel
In-Reply-To: <20260527144928.2944472-1-tglozar@redhat.com>

Cover the newly added -A/--aligned option that aligns timerlat threads
using the corresponding feature of the timerlat tracer.

A note is added to clarify what alignment means, similar to the note in
the tracer implementation in commit 4245bf4dc58f ("tracing/osnoise: Add
option to align tlat threads").

Signed-off-by: Tomas Glozar <tglozar@redhat.com>
---
 Documentation/tools/rtla/common_timerlat_options.txt | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/Documentation/tools/rtla/common_timerlat_options.txt b/Documentation/tools/rtla/common_timerlat_options.txt
index ab159b2cbfe7..0a01fbc93ee1 100644
--- a/Documentation/tools/rtla/common_timerlat_options.txt
+++ b/Documentation/tools/rtla/common_timerlat_options.txt
@@ -95,3 +95,15 @@
         * **full**        Print the entire stack trace, including unknown addresses.
 
         For unknown addresses, the raw pointer is printed.
+
+**-A**, **--aligned** *us*
+
+        Align wake-up of timerlat threads to a set offset in microseconds.
+
+        The alignment will be applied when the threads wake up at the start of tracing while
+        the timer for the first cycle is armed. Each thread sets its timer to the wake-up time
+        of the previous thread plus the alignment.
+
+        This option may be used with any non-negative argument, including zero, which will
+        align threads so that they wake up all at the same time.
+
-- 
2.54.0
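
The rule in the documentation text ("the wake-up time of the previous
thread plus the alignment") boils down to simple arithmetic, sketched
below. aligned_wakeups() and first_wake_us are hypothetical names used
only for illustration; how the tracer orders the threads and picks the
first wake-up time is not covered by this patch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Fill wake_us[i] with the first wake-up time of thread i: thread 0 wakes
 * at first_wake_us, every later thread at the previous thread's wake-up
 * time plus align_us.
 */
void aligned_wakeups(unsigned long long first_wake_us,
		     unsigned long long align_us,
		     unsigned long long *wake_us, size_t nr_threads)
{
	size_t i;

	assert(wake_us != NULL || nr_threads == 0);

	for (i = 0; i < nr_threads; i++)
		wake_us[i] = i ? wake_us[i - 1] + align_us : first_wake_us;
}
```

With an alignment of 500 and a first wake-up at 1000, four threads wake
at 1000, 1500, 2000 and 2500; with an alignment of zero they all wake at
1000, matching the last paragraph of the option description.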


* [PATCH 2/3] rtla/tests: Add unit tests for -A/--aligned option
From: Tomas Glozar @ 2026-05-27 14:49 UTC (permalink / raw)
  To: Steven Rostedt, Tomas Glozar
  Cc: John Kacur, Luis Goncalves, Crystal Wood, Costa Shulyupin,
	Wander Lairson Costa, LKML, linux-trace-kernel
In-Reply-To: <20260527144928.2944472-1-tglozar@redhat.com>

Add both parse_args() and opt_* tests for the newly added -A/--aligned
option.

Assisted-by: Claude:claude-4.5-opus-high-thinking
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
---
 .../rtla/tests/unit/cli_opt_callback.c        | 12 +++++++++++
 .../rtla/tests/unit/timerlat_hist_cli.c       | 20 +++++++++++++++++++
 .../rtla/tests/unit/timerlat_top_cli.c        | 20 +++++++++++++++++++
 3 files changed, 52 insertions(+)

diff --git a/tools/tracing/rtla/tests/unit/cli_opt_callback.c b/tools/tracing/rtla/tests/unit/cli_opt_callback.c
index 01647f4227d1..4a406af42821 100644
--- a/tools/tracing/rtla/tests/unit/cli_opt_callback.c
+++ b/tools/tracing/rtla/tests/unit/cli_opt_callback.c
@@ -545,6 +545,17 @@ START_TEST(test_opt_nano_cb)
 }
 END_TEST
 
+START_TEST(test_opt_timerlat_align_cb)
+{
+	struct timerlat_params params = {0};
+	const struct option opt = TEST_CALLBACK(&params, opt_timerlat_align_cb);
+
+	ck_assert_int_eq(opt_timerlat_align_cb(&opt, "500", 0), 0);
+	ck_assert(params.timerlat_align);
+	ck_assert_int_eq(params.timerlat_align_us, 500);
+}
+END_TEST
+
 START_TEST(test_opt_stack_format_cb)
 {
 	int stack_format = 0;
@@ -689,6 +700,7 @@ Suite *cli_opt_callback_suite(void)
 	tcase_add_test(tc, test_opt_nano_cb);
 	tcase_add_test(tc, test_opt_stack_format_cb);
 	tcase_add_exit_test(tc, test_opt_stack_format_cb_invalid, EXIT_FAILURE);
+	tcase_add_test(tc, test_opt_timerlat_align_cb);
 	suite_add_tcase(s, tc);
 
 	tc = tcase_create("histogram");
diff --git a/tools/tracing/rtla/tests/unit/timerlat_hist_cli.c b/tools/tracing/rtla/tests/unit/timerlat_hist_cli.c
index 81dc04596cd1..968bf962f53f 100644
--- a/tools/tracing/rtla/tests/unit/timerlat_hist_cli.c
+++ b/tools/tracing/rtla/tests/unit/timerlat_hist_cli.c
@@ -373,6 +373,24 @@ START_TEST(test_user_threads_long)
 }
 END_TEST
 
+START_TEST(test_aligned_short)
+{
+	PARSE_ARGS("timerlat", "hist", "-A", "500");
+
+	ck_assert(tlat_params->timerlat_align);
+	ck_assert_int_eq(tlat_params->timerlat_align_us, 500);
+}
+END_TEST
+
+START_TEST(test_aligned_long)
+{
+	PARSE_ARGS("timerlat", "hist", "--aligned", "500");
+
+	ck_assert(tlat_params->timerlat_align);
+	ck_assert_int_eq(tlat_params->timerlat_align_us, 500);
+}
+END_TEST
+
 /* Histogram Options */
 
 START_TEST(test_bucket_size_short)
@@ -654,6 +672,8 @@ Suite *timerlat_hist_cli_suite(void)
 	tcase_add_test(tc, test_user_load_long);
 	tcase_add_test(tc, test_user_threads_short);
 	tcase_add_test(tc, test_user_threads_long);
+	tcase_add_test(tc, test_aligned_short);
+	tcase_add_test(tc, test_aligned_long);
 	suite_add_tcase(s, tc);
 
 	tc = tcase_create("histogram_options");
diff --git a/tools/tracing/rtla/tests/unit/timerlat_top_cli.c b/tools/tracing/rtla/tests/unit/timerlat_top_cli.c
index 1c39008564c5..33aa6588d503 100644
--- a/tools/tracing/rtla/tests/unit/timerlat_top_cli.c
+++ b/tools/tracing/rtla/tests/unit/timerlat_top_cli.c
@@ -373,6 +373,24 @@ START_TEST(test_user_threads_long)
 }
 END_TEST
 
+START_TEST(test_aligned_short)
+{
+	PARSE_ARGS("timerlat", "top", "-A", "500");
+
+	ck_assert(tlat_params->timerlat_align);
+	ck_assert_int_eq(tlat_params->timerlat_align_us, 500);
+}
+END_TEST
+
+START_TEST(test_aligned_long)
+{
+	PARSE_ARGS("timerlat", "top", "--aligned", "500");
+
+	ck_assert(tlat_params->timerlat_align);
+	ck_assert_int_eq(tlat_params->timerlat_align_us, 500);
+}
+END_TEST
+
 /* Output */
 
 START_TEST(test_nano_short)
@@ -596,6 +614,8 @@ Suite *timerlat_top_cli_suite(void)
 	tcase_add_test(tc, test_user_load_long);
 	tcase_add_test(tc, test_user_threads_short);
 	tcase_add_test(tc, test_user_threads_long);
+	tcase_add_test(tc, test_aligned_short);
+	tcase_add_test(tc, test_aligned_long);
 	suite_add_tcase(s, tc);
 
 	tc = tcase_create("output");
-- 
2.54.0


* [PATCH 1/3] rtla/timerlat: Add -A/--aligned CLI option
From: Tomas Glozar @ 2026-05-27 14:49 UTC (permalink / raw)
  To: Steven Rostedt, Tomas Glozar
  Cc: John Kacur, Luis Goncalves, Crystal Wood, Costa Shulyupin,
	Wander Lairson Costa, LKML, linux-trace-kernel

Add a new option, -A/--aligned, that enables timerlat thread alignment
implemented on the kernel-side in commit 4245bf4dc58f ("tracing/osnoise:
Add option to align tlat threads"). The option takes an argument,
representing alignment between timerlat threads in microseconds.

The feature is modeled after the option of the same name in the
cyclictest tool.

Signed-off-by: Tomas Glozar <tglozar@redhat.com>
---

This patchset depends on "rtla: Migrate to libsubcmd for command line option parsing"
- https://lore.kernel.org/linux-trace-kernel/20260521141833.2353025-1-tglozar@redhat.com/T/
as it uses the new CLI implementation. That in turn depends on the test patches:
- https://lore.kernel.org/linux-trace-kernel/20260423130558.882022-1-tglozar@redhat.com/T/
- https://lore.kernel.org/linux-trace-kernel/20260424140244.958495-1-tglozar@redhat.com/
as it touches tests, as well as a fix touching the CLI:
- https://lore.kernel.org/linux-trace-kernel/20260414185223.65353-1-costa.shul@redhat.com/

 tools/tracing/rtla/src/cli.c      |   2 +
 tools/tracing/rtla/src/cli_p.h    |  17 ++++
 tools/tracing/rtla/src/common.h   |   8 ++
 tools/tracing/rtla/src/osnoise.c  | 149 ++++++++++++++++++++++++++++++
 tools/tracing/rtla/src/osnoise.h  |   6 ++
 tools/tracing/rtla/src/timerlat.c |  18 ++++
 tools/tracing/rtla/src/timerlat.h |   2 +
 7 files changed, 202 insertions(+)

diff --git a/tools/tracing/rtla/src/cli.c b/tools/tracing/rtla/src/cli.c
index 709219341a56..c5279c987531 100644
--- a/tools/tracing/rtla/src/cli.c
+++ b/tools/tracing/rtla/src/cli.c
@@ -248,6 +248,7 @@ struct common_params *timerlat_top_parse_args(int argc, char **argv)
 		RTLA_OPT_USER_THREADS,
 		RTLA_OPT_KERNEL_THREADS,
 		RTLA_OPT_USER_LOAD,
+		TIMERLAT_OPT_ALIGNED,
 
 	OPT_GROUP("Output:"),
 		TIMERLAT_OPT_NANO,
@@ -362,6 +363,7 @@ struct common_params *timerlat_hist_parse_args(int argc, char **argv)
 		RTLA_OPT_USER_THREADS,
 		RTLA_OPT_KERNEL_THREADS,
 		RTLA_OPT_USER_LOAD,
+		TIMERLAT_OPT_ALIGNED,
 
 	OPT_GROUP("Histogram Options:"),
 		HIST_OPT_BUCKET_SIZE,
diff --git a/tools/tracing/rtla/src/cli_p.h b/tools/tracing/rtla/src/cli_p.h
index 3cea4f6e976e..3c939de9abf0 100644
--- a/tools/tracing/rtla/src/cli_p.h
+++ b/tools/tracing/rtla/src/cli_p.h
@@ -447,6 +447,10 @@ static int opt_osnoise_on_end_cb(const struct option *opt, const char *arg, int
 	"set the stack format (truncate, skip, full)", \
 	opt_stack_format_cb)
 
+#define TIMERLAT_OPT_ALIGNED OPT_CALLBACK('A', "aligned", params, "us", \
+	"align thread wakeups to a specific offset", \
+	opt_timerlat_align_cb)
+
 /*
  * Callback functions for command line options for timerlat tools
  */
@@ -608,6 +612,19 @@ static int opt_stack_format_cb(const struct option *opt, const char *arg, int un
 	return 0;
 }
 
+static int opt_timerlat_align_cb(const struct option *opt, const char *arg, int unset)
+{
+	struct timerlat_params *params = opt->value;
+
+	if (unset || !arg)
+		return -1;
+
+	params->timerlat_align = true;
+	params->timerlat_align_us = get_llong_from_str((char *)arg);
+
+	return 0;
+}
+
 /*
  * Macros for command line options specific to histogram-based tools
  */
diff --git a/tools/tracing/rtla/src/common.h b/tools/tracing/rtla/src/common.h
index 0dfca83bd726..04b287a03f6d 100644
--- a/tools/tracing/rtla/src/common.h
+++ b/tools/tracing/rtla/src/common.h
@@ -51,6 +51,14 @@ struct osnoise_context {
 	/* -1 as init value because 0 is off */
 	int			orig_opt_workload;
 	int			opt_workload;
+
+	/* -1 as init value because 0 is off */
+	int			orig_opt_timerlat_align;
+	int			opt_timerlat_align;
+
+	/* 0 as init value */
+	unsigned long long	orig_timerlat_align_us;
+	unsigned long long	timerlat_align_us;
 };
 
 extern volatile int stop_tracing;
diff --git a/tools/tracing/rtla/src/osnoise.c b/tools/tracing/rtla/src/osnoise.c
index e1e32898af2d..4ff5dad013b1 100644
--- a/tools/tracing/rtla/src/osnoise.c
+++ b/tools/tracing/rtla/src/osnoise.c
@@ -423,6 +423,86 @@ void osnoise_put_timerlat_period_us(struct osnoise_context *context)
 	context->orig_timerlat_period_us = OSNOISE_TIME_INIT_VAL;
 }
 
+/*
+ * osnoise_get_timerlat_align_us - read and save the original "timerlat_align_us"
+ */
+static long long
+osnoise_get_timerlat_align_us(struct osnoise_context *context)
+{
+	long long timerlat_align_us;
+
+	if (context->timerlat_align_us != OSNOISE_OPTION_INIT_VAL)
+		return context->timerlat_align_us;
+
+	if (context->orig_timerlat_align_us != OSNOISE_OPTION_INIT_VAL)
+		return context->orig_timerlat_align_us;
+
+	timerlat_align_us = osnoise_read_ll_config("osnoise/timerlat_align_us");
+	if (timerlat_align_us < 0)
+		goto out_err;
+
+	context->orig_timerlat_align_us = timerlat_align_us;
+	return timerlat_align_us;
+
+out_err:
+	return OSNOISE_OPTION_INIT_VAL;
+}
+
+/*
+ * osnoise_set_timerlat_align_us - set "timerlat_align_us"
+ */
+int osnoise_set_timerlat_align_us(struct osnoise_context *context, long long timerlat_align_us)
+{
+	long long curr_timerlat_align_us = osnoise_get_timerlat_align_us(context);
+	int retval;
+
+	if (curr_timerlat_align_us == OSNOISE_OPTION_INIT_VAL)
+		return -1;
+
+	retval = osnoise_write_ll_config("osnoise/timerlat_align_us", timerlat_align_us);
+	if (retval < 0)
+		return -1;
+
+	context->timerlat_align_us = timerlat_align_us;
+
+	return 0;
+}
+
+/*
+ * osnoise_restore_timerlat_align_us - restore "timerlat_align_us"
+ */
+void osnoise_restore_timerlat_align_us(struct osnoise_context *context)
+{
+	int retval;
+
+	if (context->orig_timerlat_align_us == OSNOISE_OPTION_INIT_VAL)
+		return;
+
+	if (context->orig_timerlat_align_us == context->timerlat_align_us)
+		goto out_done;
+
+	retval = osnoise_write_ll_config("osnoise/timerlat_align_us",
+				   context->orig_timerlat_align_us);
+	if (retval < 0)
+		err_msg("Could not restore original osnoise timerlat_align_us\n");
+
+out_done:
+	context->timerlat_align_us = OSNOISE_OPTION_INIT_VAL;
+}
+
+/*
+ * osnoise_put_timerlat_align_us - restore original values and cleanup data
+ */
+void osnoise_put_timerlat_align_us(struct osnoise_context *context)
+{
+	osnoise_restore_timerlat_align_us(context);
+
+	if (context->orig_timerlat_align_us == OSNOISE_OPTION_INIT_VAL)
+		return;
+
+	context->orig_timerlat_align_us = OSNOISE_OPTION_INIT_VAL;
+}
+
 /*
  * osnoise_get_stop_us - read and save the original "stop_tracing_us"
  */
@@ -908,6 +988,67 @@ static void osnoise_put_workload(struct osnoise_context *context)
 	context->orig_opt_workload = OSNOISE_OPTION_INIT_VAL;
 }
 
+static int osnoise_get_timerlat_align(struct osnoise_context *context)
+{
+	if (context->opt_timerlat_align != OSNOISE_OPTION_INIT_VAL)
+		return context->opt_timerlat_align;
+
+	if (context->orig_opt_timerlat_align != OSNOISE_OPTION_INIT_VAL)
+		return context->orig_opt_timerlat_align;
+
+	context->orig_opt_timerlat_align = osnoise_options_get_option("TIMERLAT_ALIGN");
+
+	return context->orig_opt_timerlat_align;
+}
+
+int osnoise_set_timerlat_align(struct osnoise_context *context, bool onoff)
+{
+	int opt_timerlat_align = osnoise_get_timerlat_align(context);
+	int retval;
+
+	if (opt_timerlat_align == OSNOISE_OPTION_INIT_VAL)
+		return -1;
+
+	if (opt_timerlat_align == onoff)
+		return 0;
+
+	retval = osnoise_options_set_option("TIMERLAT_ALIGN", onoff);
+	if (retval < 0)
+		return -2;
+
+	context->opt_timerlat_align = onoff;
+
+	return 0;
+}
+
+static void osnoise_restore_timerlat_align(struct osnoise_context *context)
+{
+	int retval;
+
+	if (context->orig_opt_timerlat_align == OSNOISE_OPTION_INIT_VAL)
+		return;
+
+	if (context->orig_opt_timerlat_align == context->opt_timerlat_align)
+		goto out_done;
+
+	retval = osnoise_options_set_option("TIMERLAT_ALIGN", context->orig_opt_timerlat_align);
+	if (retval < 0)
+		err_msg("Could not restore original TIMERLAT_ALIGN option\n");
+
+out_done:
+	context->orig_opt_timerlat_align = OSNOISE_OPTION_INIT_VAL;
+}
+
+static void osnoise_put_timerlat_align(struct osnoise_context *context)
+{
+	osnoise_restore_timerlat_align(context);
+
+	if (context->orig_opt_timerlat_align == OSNOISE_OPTION_INIT_VAL)
+		return;
+
+	context->orig_opt_timerlat_align = OSNOISE_OPTION_INIT_VAL;
+}
+
 enum {
 	FLAG_CONTEXT_NEWLY_CREATED	= (1 << 0),
 	FLAG_CONTEXT_DELETED		= (1 << 1),
@@ -960,6 +1101,12 @@ struct osnoise_context *osnoise_context_alloc(void)
 	context->orig_opt_workload	= OSNOISE_OPTION_INIT_VAL;
 	context->opt_workload		= OSNOISE_OPTION_INIT_VAL;
 
+	context->orig_opt_timerlat_align	= OSNOISE_OPTION_INIT_VAL;
+	context->opt_timerlat_align		= OSNOISE_OPTION_INIT_VAL;
+
+	context->orig_timerlat_align_us	= OSNOISE_OPTION_INIT_VAL;
+	context->timerlat_align_us	= OSNOISE_OPTION_INIT_VAL;
+
 	osnoise_get_context(context);
 
 	return context;
@@ -988,6 +1135,8 @@ void osnoise_put_context(struct osnoise_context *context)
 	osnoise_put_tracing_thresh(context);
 	osnoise_put_irq_disable(context);
 	osnoise_put_workload(context);
+	osnoise_put_timerlat_align(context);
+	osnoise_put_timerlat_align_us(context);
 
 	free(context);
 }
diff --git a/tools/tracing/rtla/src/osnoise.h b/tools/tracing/rtla/src/osnoise.h
index 168669aa7e0d..340ff5a64e6e 100644
--- a/tools/tracing/rtla/src/osnoise.h
+++ b/tools/tracing/rtla/src/osnoise.h
@@ -49,6 +49,12 @@ void osnoise_restore_print_stack(struct osnoise_context *context);
 int osnoise_set_print_stack(struct osnoise_context *context,
 			    long long print_stack);
 
+int osnoise_set_timerlat_align_us(struct osnoise_context *context,
+				  long long timerlat_align_us);
+void osnoise_restore_timerlat_align_us(struct osnoise_context *context);
+
+int osnoise_set_timerlat_align(struct osnoise_context *context, bool onoff);
+
 int osnoise_set_irq_disable(struct osnoise_context *context, bool onoff);
 void osnoise_report_missed_events(struct osnoise_tool *tool);
 int osnoise_apply_config(struct osnoise_tool *tool, struct osnoise_params *params);
diff --git a/tools/tracing/rtla/src/timerlat.c b/tools/tracing/rtla/src/timerlat.c
index f990c8365776..169aa9a6569d 100644
--- a/tools/tracing/rtla/src/timerlat.c
+++ b/tools/tracing/rtla/src/timerlat.c
@@ -77,6 +77,24 @@ timerlat_apply_config(struct osnoise_tool *tool, struct timerlat_params *params)
 		goto out_err;
 	}
 
+	retval = osnoise_set_timerlat_align(tool->context, params->timerlat_align);
+	if (retval && params->timerlat_align) {
+		/*
+		 * We might be running on a kernel that does not support timerlat align.
+		 * Unless user requested it explicitly, ignore the error.
+		 */
+		err_msg("Failed to enable timerlat align\n");
+		goto out_err;
+	}
+
+	if (params->timerlat_align) {
+		retval = osnoise_set_timerlat_align_us(tool->context, params->timerlat_align_us);
+		if (retval) {
+			err_msg("Failed to set timerlat align us\n");
+			goto out_err;
+		}
+	}
+
 	/*
 	 * If the user did not specify a type of thread, try user-threads first.
 	 * Fall back to kernel threads otherwise.
diff --git a/tools/tracing/rtla/src/timerlat.h b/tools/tracing/rtla/src/timerlat.h
index 38ab6b41a15e..84ec6d778183 100644
--- a/tools/tracing/rtla/src/timerlat.h
+++ b/tools/tracing/rtla/src/timerlat.h
@@ -31,6 +31,8 @@ struct timerlat_params {
 	enum timerlat_tracing_mode mode;
 	const char		*bpf_action_program;
 	enum stack_format	stack_format;
+	bool			timerlat_align;
+	unsigned long long	timerlat_align_us;
 };
 
 #define to_timerlat_params(ptr) container_of(ptr, struct timerlat_params, common)
-- 
2.54.0


* Re: [PATCH v6] tracing/eprobes: Allow use of BTF names to dereference pointers
From: Steven Rostedt @ 2026-05-27 14:16 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: LKML, Linux trace kernel, Masami Hiramatsu, Mathieu Desnoyers,
	Mark Rutland, Peter Zijlstra, Namhyung Kim, Takaya Saeki,
	Douglas Raillard, Tom Zanussi, Andrew Morton, Thomas Gleixner,
	Ian Rogers
In-Reply-To: <20260527100815.53e55c57@gandalf.local.home>

On Wed, 27 May 2026 10:08:15 -0400
Steven Rostedt <rostedt@kernel.org> wrote:

> > this seems to be supported only for argument (pointer) stored in the trace record,
> > not the actual arguments to the tracepoint, is that right?
> > 
> > so I can deref worker from sched.sched_kthread_work_queue_work, like:
> > 
> >   echo 'e:myprobe sched.sched_kthread_work_queue_work (kthread_worker)worker->flags (kthread_work)work->canceling' > dynamic_events  
> 
> Correct, that is because eprobes "e:" works on the output of a trace event.
> 
> 
> > 
> > but I can't deref sched.sched_process_exec p->pid, like:
> > 
> >   # echo 'e:myprobe sched.sched_process_exec (task_struct)p->pid' > dynamic_events
> >   bash: echo: write error: Invalid argument  
> 
> For function prototypes of a tracepoint, you would use a tprobe "t:"
> 
>  # echo 't:exec sched_process_exec pid=p->pid' > dynamic_events


Hmm, this is why I need to write a book ;-)

Thanks for helping with what content I need to add!

-- Steve

* Re: [PATCH v2] perf/ftrace: Fix WARNING in __unregister_ftrace_function
From: Steven Rostedt @ 2026-05-27 14:14 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Masami Hiramatsu, Mathieu Desnoyers, linux-kernel,
	linux-trace-kernel, kernel-team
In-Reply-To: <20260513161916.04151502@fangorn>

On Wed, 13 May 2026 16:19:16 -0400
Rik van Riel <riel@surriel.com> wrote:

> diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
> index a6bb7577e8c5..58e1b427b576 100644
> --- a/kernel/trace/trace_event_perf.c
> +++ b/kernel/trace/trace_event_perf.c
> @@ -497,7 +497,11 @@ static int perf_ftrace_function_register(struct perf_event *event)
>  static int perf_ftrace_function_unregister(struct perf_event *event)
>  {
>  	struct ftrace_ops *ops = &event->ftrace_ops;
> -	int ret = unregister_ftrace_function(ops);
> +	int ret = 0;
> +

Because this differs from unregister_ftrace_function() in that it will
not fail if the ops is not registered, it deserves a comment. Something
like:

	/*
	 * Perf will call this unconditionally even if the ops is not
	 * enabled. The unregister_ftrace_function() will warn if called
	 * when not enabled. Just bypass the unregistering if ops isn't
	 * enabled here.
	 */

Thanks,

-- Steve


> +	if (ops->flags & FTRACE_OPS_FL_ENABLED)
> +		ret = unregister_ftrace_function(ops);
> +
>  	ftrace_free_filter(ops);
>  	return ret;
>  }

* Re: [PATCH] selftests/ftrace: Fix trace_marker_raw test on 64K page kernels
From: Steven Rostedt @ 2026-05-27 14:09 UTC (permalink / raw)
  To: Tianchen Ding
  Cc: Masami Hiramatsu, Mathieu Desnoyers, Shuah Khan, linux-kernel,
	linux-trace-kernel, linux-kselftest
In-Reply-To: <20260527095438.1794905-1-dtcccc@linux.alibaba.com>

On Wed, 27 May 2026 17:54:37 +0800
Tianchen Ding <dtcccc@linux.alibaba.com> wrote:

> On ARM64 kernels with 64K pages, the trace_marker_raw test fails because
> bash's printf builtin uses stdio buffering which splits output into
> multiple small write() calls to the tracefs file. Since each individual
> write is within TRACE_MARKER_MAX_SIZE (4096), they all succeed, causing
> the "too big" write test to incorrectly pass.
> 
> Fix by piping make_str output through dd with iflag=fullblock to
> guarantee a single atomic write() syscall to trace_marker_raw.
> 
> Fixes: 37f46601383a ("selftests/tracing: Add basic test for trace_marker_raw file")
> Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
> ---
>  .../selftests/ftrace/test.d/00basic/trace_marker_raw.tc     | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/testing/selftests/ftrace/test.d/00basic/trace_marker_raw.tc b/tools/testing/selftests/ftrace/test.d/00basic/trace_marker_raw.tc
> index 8e905d4fe6dd..efd8263e6087 100644
> --- a/tools/testing/selftests/ftrace/test.d/00basic/trace_marker_raw.tc
> +++ b/tools/testing/selftests/ftrace/test.d/00basic/trace_marker_raw.tc
> @@ -43,8 +43,10 @@ write_buffer() {
>  	id=$1
>  	size=$2
>  
> -	# write the string into the raw marker
> -	make_str $id $size > trace_marker_raw
> +	# Pipe through dd to ensure a single atomic write() syscall.
> +	# Shell's printf builtin uses stdio buffering which may split the
> +	# output into multiple writes.

Could you also mention in the comment that this is for architectures with 64K pages?

Thanks for fixing this,

-- Steve

> +	make_str $id $size | dd of=trace_marker_raw bs=`expr $size + 4` iflag=fullblock
>  }
>  
>  
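
To illustrate the fix discussed above, here is a minimal sketch of the
single-write technique outside of tracefs. It assumes a dd that supports
iflag=fullblock (GNU coreutils does); make_payload and write_once are
hypothetical names, and a regular file stands in for trace_marker_raw.
With iflag=fullblock, dd keeps reading from the pipe until it has a full
bs-sized input block, so the producer's many small writes are collected
and passed on as one output block:

```shell
#!/bin/bash

# Hypothetical producer: emit $1 bytes one printf at a time, mimicking
# output that reaches the pipe in many small pieces.
make_payload() {
	local size=$1 i
	for ((i = 0; i < size; i++)); do
		printf 'x'
	done
}

# Gather the whole payload into one bs-sized block (iflag=fullblock) and
# hand it to the destination as a single output block.
# Assumption: $size equals the exact number of bytes the producer emits.
write_once() {
	local size=$1 dest=$2
	make_payload "$size" | dd of="$dest" bs="$size" iflag=fullblock 2>/dev/null
}
```

Calling write_once 5000 /tmp/out should leave a 5000-byte file. Without
iflag=fullblock, dd may issue one short write per partial read from the
pipe, which is the behaviour the patch is working around.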



* Re: [PATCH v6] tracing/eprobes: Allow use of BTF names to dereference pointers
From: Steven Rostedt @ 2026-05-27 14:08 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: LKML, Linux trace kernel, Masami Hiramatsu, Mathieu Desnoyers,
	Mark Rutland, Peter Zijlstra, Namhyung Kim, Takaya Saeki,
	Douglas Raillard, Tom Zanussi, Andrew Morton, Thomas Gleixner,
	Ian Rogers
In-Reply-To: <ahayVg7TvNrf1ama@krava>

On Wed, 27 May 2026 10:59:02 +0200
Jiri Olsa <olsajiri@gmail.com> wrote:

> 
> hi,
> this seems to be supported only for argument (pointer) stored in the trace record,
> not the actual arguments to the tracepoint, is that right?
> 
> so I can deref worker from sched.sched_kthread_work_queue_work, like:
> 
>   echo 'e:myprobe sched.sched_kthread_work_queue_work (kthread_worker)worker->flags (kthread_work)work->canceling' > dynamic_events

Correct, that is because an eprobe ("e:") works on the output of a trace
event, that is, on the fields stored in the trace record.


> 
> but I can't deref sched.sched_process_exec p->pid, like:
> 
>   # echo 'e:myprobe sched.sched_process_exec (task_struct)p->pid' > dynamic_events
>   bash: echo: write error: Invalid argument

For the arguments in a tracepoint's function prototype, you would use a
tprobe ("t:"):

 # echo 't:exec sched_process_exec pid=p->pid' > dynamic_events
 # echo 1 > events/tracepoints/exec/enable
 # cat trace
# tracer: nop
#
# entries-in-buffer/entries-written: 7/7   #P:8
#
#                                _-----=> irqs-off/BH-disabled
#                               / _----=> need-resched
#                              | / _---=> hardirq/softirq
#                              || / _--=> preempt-depth
#                              ||| / _-=> migrate-disable
#                              |||| /     delay
#           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
#              | |         |   |||||     |         |
    rtkit-daemon-1935    [005] .....   105.350235: exec: (__probestub_sched_process_exec+0x4/0x10) pid=1935
    rtkit-daemon-1935    [005] .....   105.376609: exec: (__probestub_sched_process_exec+0x4/0x10) pid=1935
 pkla-check-auth-1939    [000] .....   105.404491: exec: (__probestub_sched_process_exec+0x4/0x10) pid=1939
 at-spi-bus-laun-1953    [000] .....   105.914139: exec: (__probestub_sched_process_exec+0x4/0x10) pid=1953
     dbus-daemon-1959    [002] .....   105.919123: exec: (__probestub_sched_process_exec+0x4/0x10) pid=1959
        Xwayland-1961    [006] .....   106.175491: exec: (__probestub_sched_process_exec+0x4/0x10) pid=1961
           <...>-1962    [005] .....   107.406472: exec: (__probestub_sched_process_exec+0x4/0x10) pid=1962

No need for typecasting either ;-)


> > +	ctx->offset += tmp - arg;
> > +	ret = parse_btf_arg(tmp, pcode, end, ctx);
> > +	ctx->flags &= ~TPARG_FL_TYPECAST;
> > +	ctx->last_struct = NULL;
> > +out_put:
> > +	btf_put(ctx->struct_btf);  
> 
> 
> should we zero ctx->struct_btf in case there's more type casts,
> so query_btf_struct would re-init it?

Yeah, I already mentioned that mistake:

  https://lore.kernel.org/all/20260522072322.18aa72dd@gandalf.local.home/

>> Oops, I forgot to do:
>> 
>> 	ctx->struct_btf = NULL;
>> 
>> here.
>> 
>> Will fix.

Thanks for the review!

-- Steve


* [PATCH v8 6/6] selftests/mm: add hwpoison-panic destructive test
From: Breno Leitao @ 2026-05-27 14:06 UTC (permalink / raw)
  To: Miaohe Lin, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Shuah Khan, Naoya Horiguchi, Steven Rostedt, Masami Hiramatsu,
	Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, Liam R. Howlett,
	Liam R. Howlett
  Cc: linux-mm, linux-kernel, linux-doc, linux-kselftest, Breno Leitao,
	linux-trace-kernel, kernel-team
In-Reply-To: <20260527-ecc_panic-v8-0-9ea0cfa16bb0@debian.org>

Add a destructive selftest that verifies
vm.panic_on_unrecoverable_memory_failure actually panics when a
hwpoison error hits a kernel-owned page.

Three "kinds" of kernel-owned page can be targeted, selectable via
the script's first positional argument (default: rodata):

  rodata  - a PG_reserved page in the kernel rodata range, sourced
            from the "Kernel rodata" sub-resource of "System RAM" in
            /proc/iomem.  That entry is reported on every major
            architecture and guarantees the chosen PFN is backed by
            struct page (an online System RAM range, not a firmware
            hole), is PG_reserved, and is read-only -- so even if
            the panic fails to fire for some reason, the resulting
            PG_hwpoison marker on rodata does not corrupt writable
            kernel state.

  slab    - a slab page found by walking /proc/kpageflags for the
            first PFN with KPF_SLAB set (and KPF_HWPOISON / KPF_NOPAGE
            / KPF_COMPOUND_TAIL clear).  Exercises the get_any_page()
            path on a non PG_reserved kernel-owned page and so
            catches regressions where get_any_page() collapses
            kernel-owned pages into a transient -EIO instead of
            -ENOTRECOVERABLE.

  pgtable - same as slab, but the PFN is selected via KPF_PGTABLE.

PageLargeKmalloc, the fourth page type matched by
HWPoisonKernelOwned(), is intentionally not covered: it is a
PAGE_TYPE_OPS flag with no /proc/kpageflags bit, so selecting such
a PFN from userspace is not feasible.  The slab and pgtable
variants already exercise the same get_any_page() positive-check
branch.

The script enables the sysctl and writes the selected physical
address to /sys/devices/system/memory/hard_offline_page.  A
successful run crashes the kernel with

  Memory failure: <pfn>: unrecoverable page

A return from the inject means the panic did not fire and the test
fails.  Test outcome is therefore observed externally (serial
console, kdump) rather than from the script's own exit code.

The script is intentionally NOT wired into run_vmtests.sh: every
successful run panics the kernel, which is incompatible with the
sequential "run each category in the same VM" model that
run_vmtests.sh assumes.  It is also not registered as a TEST_PROGS /
ksft_* wrapper so a default kselftest run does not opt itself into
a panic.  The script is meant to be executed manually inside a
disposable VM (e.g. virtme-ng), one variant per VM boot, and
requires RUN_DESTRUCTIVE=1 in the environment as a safety net.

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 tools/testing/selftests/mm/Makefile          |   1 +
 tools/testing/selftests/mm/hwpoison-panic.sh | 193 +++++++++++++++++++++++++++
 2 files changed, 194 insertions(+)

diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
index e6df968f0971..170e376c97b4 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -181,6 +181,7 @@ TEST_FILES += charge_reserved_hugetlb.sh
 TEST_FILES += hugetlb_reparenting_test.sh
 TEST_FILES += test_page_frag.sh
 TEST_FILES += run_vmtests.sh
+TEST_FILES += hwpoison-panic.sh
 
 # required by charge_reserved_hugetlb.sh
 TEST_FILES += write_hugetlb_memory.sh
diff --git a/tools/testing/selftests/mm/hwpoison-panic.sh b/tools/testing/selftests/mm/hwpoison-panic.sh
new file mode 100755
index 000000000000..43fc379f8761
--- /dev/null
+++ b/tools/testing/selftests/mm/hwpoison-panic.sh
@@ -0,0 +1,193 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Verify vm.panic_on_unrecoverable_memory_failure by injecting a hwpoison
+# error on a kernel-owned page and confirming the kernel panics.
+#
+# Three "kinds" of kernel-owned page can be targeted, selectable via the
+# first positional argument (default: rodata):
+#
+#   rodata  - a PG_reserved page in the kernel rodata range
+#             (sourced from /proc/iomem "Kernel rodata").  Exercises
+#             memory_failure() -> get_any_page() on a PageReserved page.
+#
+#   slab    - a slab page found via /proc/kpageflags (KPF_SLAB).
+#             Exercises memory_failure() -> get_any_page() on a non
+#             PG_reserved kernel-owned page.  This path is what catches
+#             regressions where get_any_page() collapses kernel-owned
+#             pages into a transient -EIO instead of -ENOTRECOVERABLE.
+#
+#   pgtable - a page-table page found via /proc/kpageflags (KPF_PGTABLE).
+#             Same path as slab, different page type.
+#
+# This test is DESTRUCTIVE: a successful run crashes the kernel.  It is
+# meant to be executed inside a disposable VM (e.g. virtme-ng) with a
+# serial console captured by the harness.  It is skipped unless the
+# caller opts in via RUN_DESTRUCTIVE=1.
+#
+# Test passes externally: the kernel must panic with
+#   "Memory failure: <pfn>: unrecoverable page"
+# A return from the inject means the panic did not fire and the test
+# fails.
+#
+# Author: Breno Leitao <leitao@debian.org>
+
+set -u
+
+ksft_skip=4
+sysctl_path=/proc/sys/vm/panic_on_unrecoverable_memory_failure
+inject_path=/sys/devices/system/memory/hard_offline_page
+kpageflags_path=/proc/kpageflags
+
+# /proc/kpageflags bit positions (see include/uapi/linux/kernel-page-flags.h)
+KPF_SLAB=7
+KPF_COMPOUND_TAIL=16
+KPF_HWPOISON=19
+KPF_NOPAGE=20
+KPF_PGTABLE=26
+
+kind=${1:-rodata}
+
+ksft_print() { echo "# $*"; }
+ksft_exit_skip() { ksft_print "$*"; exit "$ksft_skip"; }
+ksft_exit_fail() { echo "not ok 1 $*"; exit 1; }
+
+if [ "$(id -u)" -ne 0 ]; then
+	ksft_exit_skip "must run as root"
+fi
+
+if [ ! -w "$sysctl_path" ]; then
+	ksft_exit_skip "$sysctl_path not present (kernel without the sysctl?)"
+fi
+
+if [ ! -w "$inject_path" ]; then
+	ksft_exit_skip "$inject_path not present (no MEMORY_HOTPLUG?)"
+fi
+
+if [ "${RUN_DESTRUCTIVE:-0}" != "1" ]; then
+	ksft_exit_skip "destructive test; re-run with RUN_DESTRUCTIVE=1 inside a disposable VM"
+fi
+
+# Pick a PFN inside the kernel image rodata region of /proc/iomem.
+# This is preferred over a top-level "Reserved" entry because top-level
+# Reserved ranges are often firmware holes that have no backing struct
+# page; pfn_to_online_page() returns NULL on those and memory_failure()
+# bails out with -ENXIO before reaching the panic path.
+#
+# "Kernel rodata" is reported as a sub-resource of "System RAM" on every
+# major architecture, which guarantees:
+#   - the PFN is backed by struct page (within an online memory range);
+#   - PG_reserved is set on the page (kernel image area);
+#   - the memory is read-only, so setting PG_hwpoison on it does not
+#     corrupt writable kernel state if the panic somehow does not fire.
+#
+# /proc/iomem entries look like (indented for sub-resources):
+#     "  02500000-02ffffff : Kernel rodata"
+pick_rodata_phys_addr() {
+	awk -v pagesize="$(getconf PAGE_SIZE)" '
+	/: Kernel rodata[[:space:]]*$/ {
+		sub(/^[[:space:]]+/, "")
+		n = split($0, a, /[- ]/)
+		start = strtonum("0x" a[1])
+		end   = strtonum("0x" a[2])
+		if (end <= start)
+			next
+		# Page-align upward and emit the first byte of that page.
+		pfn = int((start + pagesize - 1) / pagesize)
+		printf "0x%x\n", pfn * pagesize
+		exit 0
+	}
+	' /proc/iomem
+}
+
+# Walk /proc/kpageflags and return the phys addr of the first PFN that
+# has bit $1 set, with KPF_HWPOISON, KPF_NOPAGE and KPF_COMPOUND_TAIL
+# all clear (so we attack a real, non-tail, not-already-poisoned page).
+#
+# We skip the first 16 MiB of PFNs to step past low-memory special
+# ranges (BIOS/EFI/ACPI/etc.) that often are PG_reserved and would not
+# exhibit the slab/pgtable type we are looking for.
+pick_kpageflags_phys_addr() {
+	local want_bit=$1
+	local pagesize skip_pfn
+
+	[ -r "$kpageflags_path" ] || return
+
+	pagesize=$(getconf PAGE_SIZE)
+	skip_pfn=$(((16 * 1024 * 1024) / pagesize))
+
+	od -An -tx8 -v -w8 -j "$((skip_pfn * 8))" "$kpageflags_path" 2>/dev/null | \
+	awk -v want_bit="$want_bit" \
+	    -v hwp_bit="$KPF_HWPOISON" \
+	    -v nopage_bit="$KPF_NOPAGE" \
+	    -v tail_bit="$KPF_COMPOUND_TAIL" \
+	    -v base_pfn="$skip_pfn" \
+	    -v pagesize="$pagesize" '
+	# Test whether bit "b" is set in the 16-hex-digit value "hex".
+	# Done with substring + per-digit lookup so we never rely on awk
+	# bitwise operators (mawk lacks them) or 64-bit FP precision.
+	function bit_set(hex, b,    di, bi, c, v) {
+		di = int(b / 4)
+		bi = b - di * 4
+		c = substr(hex, length(hex) - di, 1)
+		v = index("0123456789abcdef", tolower(c)) - 1
+		if (bi == 0) return (v % 2) == 1
+		if (bi == 1) return int(v / 2) % 2 == 1
+		if (bi == 2) return int(v / 4) % 2 == 1
+		return int(v / 8) % 2 == 1
+	}
+	{
+		gsub(/^[[:space:]]+/, "")
+		h = $1
+		if (bit_set(h, want_bit) &&
+		    !bit_set(h, hwp_bit) &&
+		    !bit_set(h, nopage_bit) &&
+		    !bit_set(h, tail_bit)) {
+			pfn = base_pfn + NR - 1
+			printf "0x%x\n", pfn * pagesize
+			exit 0
+		}
+	}
+	'
+}
+
+case "$kind" in
+rodata)
+	phys_addr=$(pick_rodata_phys_addr)
+	missing_msg='no "Kernel rodata" entry in /proc/iomem'
+	;;
+slab)
+	phys_addr=$(pick_kpageflags_phys_addr "$KPF_SLAB")
+	missing_msg="no usable slab PFN found in $kpageflags_path"
+	;;
+pgtable)
+	phys_addr=$(pick_kpageflags_phys_addr "$KPF_PGTABLE")
+	missing_msg="no usable page-table PFN found in $kpageflags_path"
+	;;
+*)
+	ksft_exit_fail "unknown kind '$kind' (expected: rodata|slab|pgtable)"
+	;;
+esac
+
+if [ -z "$phys_addr" ]; then
+	ksft_exit_skip "$missing_msg"
+fi
+
+ksft_print "enabling $sysctl_path"
+prior=$(cat "$sysctl_path")
+echo 1 > "$sysctl_path" || ksft_exit_fail "failed to enable sysctl"
+
+ksft_print "injecting hwpoison at phys 0x$(printf '%x' "$phys_addr") (kind=$kind)"
+ksft_print "expecting kernel panic: 'Memory failure: <pfn>: unrecoverable page'"
+
+# If this returns, the kernel did not panic → test failed.  Restore the
+# sysctl before reporting so the system is left as we found it.
+if echo "$phys_addr" > "$inject_path"; then
+	echo "$prior" > "$sysctl_path"
+	ksft_exit_fail "inject returned without panic; sysctl ineffective"
+fi
+
+# Write failed (e.g. -EINVAL on offlining a non-online region): also a
+# failure for this test, since we expected the panic path.
+echo "$prior" > "$sysctl_path"
+ksft_exit_fail "inject failed before reaching the panic path"

-- 
2.54.0



* [PATCH v8 5/6] Documentation: document panic_on_unrecoverable_memory_failure sysctl
From: Breno Leitao @ 2026-05-27 14:06 UTC (permalink / raw)
  To: Miaohe Lin, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Shuah Khan, Naoya Horiguchi, Steven Rostedt, Masami Hiramatsu,
	Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, Liam R. Howlett,
	Liam R. Howlett
  Cc: linux-mm, linux-kernel, linux-doc, linux-kselftest, Breno Leitao,
	linux-trace-kernel, kernel-team
In-Reply-To: <20260527-ecc_panic-v8-0-9ea0cfa16bb0@debian.org>

Add documentation for the new vm.panic_on_unrecoverable_memory_failure
sysctl, describing which failures trigger a panic (kernel-owned pages
the handler cannot recover) and which are intentionally left out
(transient allocator races and unclassified pages).

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 Documentation/admin-guide/sysctl/vm.rst | 85 +++++++++++++++++++++++++++++++++
 1 file changed, 85 insertions(+)

diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index 97e12359775c..f71d87039904 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -67,6 +67,7 @@ Currently, these files are in /proc/sys/vm:
 - page-cluster
 - page_lock_unfairness
 - panic_on_oom
+- panic_on_unrecoverable_memory_failure
 - percpu_pagelist_high_fraction
 - stat_interval
 - stat_refresh
@@ -925,6 +926,90 @@ panic_on_oom=2+kdump gives you very strong tool to investigate
 why oom happens. You can get snapshot.
 
 
+panic_on_unrecoverable_memory_failure
+======================================
+
+When a hardware memory error (e.g. multi-bit ECC) hits a kernel page
+that cannot be recovered by the memory failure handler, the default
+behaviour is to ignore the error and continue operation.  This is
+dangerous because the corrupted data remains accessible to the kernel,
+risking silent data corruption or a delayed crash when the poisoned
+memory is next accessed.
+
+When enabled, this sysctl triggers a panic on memory failure events
+hitting kernel-owned pages that the handler cannot recover:
+``PageReserved`` (firmware reservations, kernel image, vDSO, zero
+page, and similar memblock-reserved regions), ``PageSlab``,
+``PageTable``, and ``PageLargeKmalloc``.  These are owned by the
+kernel and the memory failure handler cannot reliably evict their
+contents.
+
+For soft offline (``madvise(MADV_SOFT_OFFLINE)``,
+``/sys/devices/system/memory/soft_offline_page``), pages owned by
+``movable_ops`` are exempted, since soft offline is allowed to
+migrate them even though they are not on the LRU.
+
+Other unrecoverable kernel-owned populations (vmalloc allocations,
+kernel stack pages, ...) are not currently covered because the
+handler has no page-type signal that distinguishes them from a
+userspace folio temporarily off the LRU during migration or
+compaction.  Such pages still go through the standard
+MF_MSG_GET_HWPOISON path: ``PG_hwpoison`` is set on them and a
+delayed crash on the next access remains possible.  Coverage may
+grow as the handler gains stronger kernel-ownership signals.
+
+Recoverable failure paths are also intentionally left out: in-flight
+buddy allocations and other transient races with the page allocator
+can reach the same diagnostic, and panicking on them would risk
+killing the box for a page destined for userspace where the standard
+SIGBUS recovery path applies.  Pages whose state could not be
+classified at all are not covered either, since an unknown state is
+not a sound basis for a panic decision.
+
+For many environments it is preferable to panic immediately with a clean
+crash dump that captures the original error context, rather than to
+continue and face a random crash later whose cause is difficult to
+diagnose.
+
+Use cases
+---------
+
+This option is most useful in environments where unattributed crashes
+are expensive to debug or where data integrity must take precedence
+over availability:
+
+* Large fleets, where multi-bit ECC errors on kernel pages are observed
+  regularly and post-mortem analysis of an unrelated downstream crash
+  (often seconds to minutes after the original error) consumes
+  significant engineering effort.
+
+* Systems configured with kdump, where panicking at the moment of the
+  hardware error produces a vmcore that still contains the faulting
+  address, the affected page state, and the originating MCE/GHES
+  record — context that is typically lost by the time a delayed crash
+  occurs.
+
+* High-availability clusters that rely on fast, deterministic node
+  failure for failover, and prefer an immediate panic over silent data
+  corruption propagating to replicas or persistent storage.
+
+* Kernel and platform developers reproducing hwpoison issues with
+  tools such as ``mce-inject`` or error-injection debugfs interfaces,
+  where panicking on the unrecoverable path makes regressions
+  immediately visible instead of surfacing as later, unrelated
+  failures.
+
+= =====================================================================
+0 Try to continue operation (default).
+1 Panic immediately.  If the ``panic`` sysctl is also non-zero then the
+  machine will be rebooted.
+= =====================================================================
+
+Example::
+
+     echo 1 > /proc/sys/vm/panic_on_unrecoverable_memory_failure
+
+
 percpu_pagelist_high_fraction
 =============================
 

-- 
2.54.0


