* [PATCH v7 1/9] bootconfig: fix NULL-pointer arithmetic in xbc_snprint_cmdline()
From: Breno Leitao @ 2026-06-26 12:50 UTC
To: Masami Hiramatsu, Andrew Morton, Nathan Chancellor, paulmck,
Nicolas Schier, Nick Desaulniers, Bill Wendling, Justin Stitt,
Jonathan Corbet, Shuah Khan
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, linux-kernel, linux-trace-kernel, linux-kbuild,
bpf, llvm, linux-doc, Breno Leitao, kernel-team
In-Reply-To: <20260626-bootconfig_using_tools-v7-0-24ab72139c29@debian.org>
xbc_snprint_cmdline() is meant to be called twice: first with
buf=NULL, size=0 to probe the rendered length, then with a real
buffer to fill it (the standard snprintf() two-pass pattern). The
probe call makes the function compute "buf + size" (NULL + 0) and,
on every iteration, advance "buf += ret" from that NULL base and
pass the result back into snprintf().
Pointer arithmetic on a NULL pointer is undefined behavior. It is
harmless in the in-kernel callers today, but the follow-up patches
run this same code in the userspace tools/bootconfig parser at kernel
build time, where host UBSan / FORTIFY_SOURCE abort the build.
Track a running written length (size_t) instead of mutating @buf, and
only form "buf + len" when @buf is non-NULL. snprintf(NULL, 0, ...)
is itself well defined and returns the would-be length, so the
two-pass "probe then fill" usage returns identical byte counts.
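The probe-then-fill pattern described above can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the kernel code: struct kv, remaining(), render_pairs() and render_alloc() are hypothetical names, and remaining() merely approximates the role of the rest() macro in lib/bootconfig.c.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical key/value pair standing in for an xbc key and its value. */
struct kv {
	const char *key;
	const char *val;
};

/* Space left in a buffer of @size bytes after @len bytes were produced. */
static size_t remaining(size_t len, size_t size)
{
	return len < size ? size - len : 0;
}

/*
 * Render "key=val " for each pair. Works as a size probe when called with
 * buf == NULL and size == 0: no arithmetic is done on the NULL pointer,
 * and snprintf(NULL, 0, ...) reports the length it would have written.
 */
static int render_pairs(char *buf, size_t size, const struct kv *kv, size_t n)
{
	size_t len = 0;
	size_t i;
	int ret;

	for (i = 0; i < n; i++) {
		/* Clamp the offset so the pointer never passes the buffer end. */
		char *dst = buf ? buf + (len < size ? len : size) : NULL;

		ret = snprintf(dst, remaining(len, size), "%s=%s ",
			       kv[i].key, kv[i].val);
		if (ret < 0)
			return ret;
		len += (size_t)ret;
	}
	return (int)len;
}

/* Two-pass usage: probe the length, allocate, then fill. Caller frees. */
static char *render_alloc(const struct kv *kv, size_t n)
{
	int need = render_pairs(NULL, 0, kv, n);	/* pass 1: size probe */
	int filled;
	char *out;

	if (need < 0)
		return NULL;
	out = malloc((size_t)need + 1);
	if (!out)
		return NULL;
	out[0] = '\0';	/* stay a valid string even when n == 0 */

	filled = render_pairs(out, (size_t)need + 1, kv, n);	/* pass 2 */
	assert(filled == need);	/* both passes must agree on the count */
	return out;
}
```

Because the running length is a size_t rather than a moved pointer, the probe and the fill call walk exactly the same code path and differ only in the destination handed to snprintf().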
Signed-off-by: Breno Leitao <leitao@debian.org>
---
lib/bootconfig.c | 23 ++++++++++++++++-------
1 file changed, 16 insertions(+), 7 deletions(-)
diff --git a/lib/bootconfig.c b/lib/bootconfig.c
index f445b7703fdd9..2ed9ee3dc81c7 100644
--- a/lib/bootconfig.c
+++ b/lib/bootconfig.c
@@ -427,10 +427,18 @@ static char xbc_namebuf[XBC_KEYLEN_MAX] __initdata;
int __init xbc_snprint_cmdline(char *buf, size_t size, struct xbc_node *root)
{
struct xbc_node *knode, *vnode;
- char *end = buf + size;
const char *val, *q;
+ size_t len = 0;
int ret;
+ /*
+ * Track the running written length rather than advancing @buf, so we
+ * never form "buf + size" or "buf += ret" while @buf is NULL (the
+ * size-probe call passes buf=NULL, size=0). NULL pointer arithmetic
+ * is undefined behavior and trips host UBSan / FORTIFY_SOURCE when
+ * this renderer runs at kernel build time. snprintf(NULL, 0, ...)
+ * itself is well defined and returns the would-be length.
+ */
xbc_node_for_each_key_value(root, knode, val) {
ret = xbc_node_compose_key_after(root, knode,
xbc_namebuf, XBC_KEYLEN_MAX);
@@ -439,10 +447,11 @@ int __init xbc_snprint_cmdline(char *buf, size_t size, struct xbc_node *root)
vnode = xbc_node_get_child(knode);
if (!vnode) {
- ret = snprintf(buf, rest(buf, end), "%s ", xbc_namebuf);
+ ret = snprintf(buf ? buf + len : NULL, rest(len, size),
+ "%s ", xbc_namebuf);
if (ret < 0)
return ret;
- buf += ret;
+ len += ret;
continue;
}
xbc_array_for_each_value(vnode, val) {
@@ -452,15 +461,15 @@ int __init xbc_snprint_cmdline(char *buf, size_t size, struct xbc_node *root)
* whitespace.
*/
q = strpbrk(val, " \t\r\n") ? "\"" : "";
- ret = snprintf(buf, rest(buf, end), "%s=%s%s%s ",
- xbc_namebuf, q, val, q);
+ ret = snprintf(buf ? buf + len : NULL, rest(len, size),
+ "%s=%s%s%s ", xbc_namebuf, q, val, q);
if (ret < 0)
return ret;
- buf += ret;
+ len += ret;
}
}
- return buf - (end - size);
+ return len;
}
#undef rest
--
2.53.0-Meta
* [PATCH v7 0/9] bootconfig: embed kernel.* cmdline at build time
From: Breno Leitao @ 2026-06-26 12:50 UTC
To: Masami Hiramatsu, Andrew Morton, Nathan Chancellor, paulmck,
Nicolas Schier, Nick Desaulniers, Bill Wendling, Justin Stitt,
Jonathan Corbet, Shuah Khan
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, linux-kernel, linux-trace-kernel, linux-kbuild,
bpf, llvm, linux-doc, Breno Leitao, kernel-team, Nicolas Schier
The userspace pieces (xbc_snprint_cmdline() in lib/, tools/bootconfig -C)
already landed; this series wires the rendered cmdline into the kernel.
Motivation: today the embedded bootconfig is parsed at runtime, after
parse_early_param() has already run, so early_param() handlers can't
see embedded values. Folding the kernel.* subtree into the cmdline at
build time gives a CONFIG_CMDLINE-equivalent for embedded-bootconfig
users without forcing them to maintain two cmdline sources.
Behaviorally, the "kernel" subtree is rendered to a flat string at
build time and stashed in .init.rodata. setup_arch() prepends it to
boot_command_line before parse_early_param() runs. Overflow is a soft
error: the helper logs and leaves boot_command_line untouched rather
than panicking, so an oversized embedded bootconfig cannot make the
kernel unbootable.
Signed-off-by: Breno Leitao <leitao@debian.org>
---
Changes in v7:
- The runtime opt-in now shares one helper instead of open-coding its
own. (Masami)
- bootconfig_cmdline_requested() moved into generic lib code (Masami)
- Link to v6: https://lore.kernel.org/r/20260623-bootconfig_using_tools-v6-0-640c2f587a3c@debian.org
Changes in v6:
- Renamed CONFIG_BOOT_CONFIG_EMBED_CMDLINE to
CONFIG_CMDLINE_FROM_BOOTCONFIG
- Prepend the embedded bootconfig cmdline before parse_early_param()
- Link to v5: https://lore.kernel.org/r/20260617-bootconfig_using_tools-v5-0-fd589a9cc5e3@debian.org
Changes in v5:
- Patch 3 (Kconfig): drop the redundant "depends on BOOT_CONFIG_EMBED"
from CMDLINE_FROM_BOOTCONFIG (Julian Braha)
- Patch 6 (Documentation): spell out how the embedded cmdline interacts
with the bootloader cmdline, an initrd bootconfig, and the embedded
bootconfig
- Link to v4: https://lore.kernel.org/r/20260609-bootconfig_using_tools-v4-0-73c463f03a97@debian.org
Changes in v4:
- Patch 3 (build pipeline): clear CROSS_COMPILE= in the kernel-side
tools/bootconfig sub-make. Without it, an LLVM=1 cross build
inherits CROSS_COMPILE and tools/scripts/Makefile.include injects
--target=/--sysroot= into the host clang, producing a target
binary that fails to exec.
- Patch 3 (build pipeline): place embedded-cmdline.S in its own
.init.rodata.embed_cmdline subsection ("a") so ld.lld does not
see a section-type mismatch against lib/bootconfig-data.S's
writable .init.rodata ("aw"). The linker's *(.init.rodata
.init.rodata.*) glob still folds it into the init image.
- Patch 6 (x86/setup): also accept the bootconfig=<anything> form
via cmdline_find_option(), matching the runtime parse_args() loop.
Without it, bootconfig=0/=off would skip the early prepend but
still trigger the late runtime apply -- a split-brain state.
- New patch 7: document CONFIG_CMDLINE_FROM_BOOTCONFIG in
Documentation/admin-guide/bootconfig.rst (semantics, opt-in,
precedence, overflow behavior, example).
- Link to v3: https://lore.kernel.org/r/20260608-bootconfig_using_tools-v3-0-4ddd079a0696@debian.org
Changes in v3:
- Patch 3: Move HOSTCC override to the kernel-side rule; tool keeps
$(CC) for standalone/cross builds.
- Patch 6: Drop the false fail-safe wording; document the
BOOT_CONFIG_FORCE=y default interaction.
- Link to v2: https://lore.kernel.org/r/20260605-bootconfig_using_tools-v2-0-d309f544b5f7@debian.org
Changes in v2 (addressing review of v1):
- Split out a standalone fix for the NULL-pointer arithmetic in
xbc_snprint_cmdline() so the build-time render cannot trip host
UBSan/FORTIFY_SOURCE.
- Rework the leaf-root handling: instead of returning early, skip @root
inside the loop so a root carrying both a value and subkeys
(kernel = x together with kernel.foo = bar) still renders its
descendant keys.
- Build tools/bootconfig with $(HOSTCC) so cross-compiled (ARCH=...)
builds render the cmdline on the build host instead of failing with
"Exec format error".
- Mark the embedded cmdline section read-only (drop the "w" flag from
.init.rodata).
- Add a make-clean hook so tools/bootconfig artifacts are removed by
make clean.
- Gate the x86 prepend on "bootconfig" being present on the command
line (or CONFIG_BOOT_CONFIG_FORCE), matching the init.* opt-in
semantics documented in bootconfig.rst and preserving fail-safe
recovery: dropping "bootconfig" from the bootloader cmdline now also
disables the embedded kernel.* keys.
- Link to v1: https://patch.msgid.link/20260527-bootconfig_using_tools-v1-0-b6906a86e7d5@debian.org
---
Breno Leitao (9):
bootconfig: fix NULL-pointer arithmetic in xbc_snprint_cmdline()
bootconfig: render descendant keys when xbc_snprint_cmdline() root has a value
bootconfig: render embedded bootconfig as a kernel cmdline at build time
bootconfig: clean build-time tools/bootconfig from make clean
bootconfig: add xbc_prepend_embedded_cmdline() helper
Documentation: bootconfig: document build-time cmdline rendering
x86/setup: prepend embedded bootconfig cmdline before parse_early_param
bootconfig: skip runtime kernel.* render once prepended early
init/main.c: use bootconfig_cmdline_requested() for the runtime opt-in
Documentation/admin-guide/bootconfig.rst | 81 ++++++++++++++++
MAINTAINERS | 1 +
Makefile | 27 +++++-
arch/x86/Kconfig | 1 +
arch/x86/kernel/setup.c | 14 ++-
include/linux/bootconfig.h | 14 +++
init/Kconfig | 36 +++++++
init/main.c | 52 +++++-----
lib/Makefile | 16 +++
lib/bootconfig.c | 162 +++++++++++++++++++++++++++++--
lib/embedded-cmdline.S | 16 +++
tools/bootconfig/Makefile | 4 +-
12 files changed, 388 insertions(+), 36 deletions(-)
---
base-commit: a87737435cfa134f9cdcc696ba3080759d04cf72
change-id: 20260508-bootconfig_using_tools-cfa7aa9d6a5a
Best regards,
--
Breno Leitao <leitao@debian.org>
* [PATCH v7 2/9] bootconfig: render descendant keys when xbc_snprint_cmdline() root has a value
From: Breno Leitao @ 2026-06-26 12:50 UTC
To: Masami Hiramatsu, Andrew Morton, Nathan Chancellor, paulmck,
Nicolas Schier, Nick Desaulniers, Bill Wendling, Justin Stitt,
Jonathan Corbet, Shuah Khan
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, linux-kernel, linux-trace-kernel, linux-kbuild,
bpf, llvm, linux-doc, Breno Leitao, kernel-team
In-Reply-To: <20260626-bootconfig_using_tools-v7-0-24ab72139c29@debian.org>
xbc_node_for_each_key_value() walks to the first leaf under @root, and
when @root is itself a leaf it yields @root. That happens not only for
an empty "kernel {}" subtree, but also when @root carries both a value
and subkeys, e.g.
	kernel = x
	kernel.foo = bar
Here @root ("kernel") is a leaf because its first child is the value
node "x", so the iterator returns @root first. Feeding @root back into
xbc_node_compose_key_after(root, root) returns -EINVAL, which the only
in-kernel caller papers over with a "len <= 0" check -- but the
follow-up tools/bootconfig -C user propagates the error and turns such
a bootconfig into a build failure. Worse, short-circuiting the whole
call on a leaf @root would silently drop the valid "kernel.foo = bar"
descendant that this patch should render.
Skip @root inside the loop instead of bailing out: the value-only entry
is dropped (it is rendered through the "kernel" cmdline path, not here),
while real descendant keys are still emitted. An entirely empty subtree
now renders nothing and returns 0 rather than -EINVAL, matching the
"nothing to render is not an error" semantics expected by the new
build-time caller.
Signed-off-by: Breno Leitao <leitao@debian.org>
---
lib/bootconfig.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/lib/bootconfig.c b/lib/bootconfig.c
index 2ed9ee3dc81c7..926094d97397e 100644
--- a/lib/bootconfig.c
+++ b/lib/bootconfig.c
@@ -440,6 +440,17 @@ int __init xbc_snprint_cmdline(char *buf, size_t size, struct xbc_node *root)
* itself is well defined and returns the would-be length.
*/
xbc_node_for_each_key_value(root, knode, val) {
+ /*
+ * An empty or value-only @root (e.g. "kernel {}" or
+ * "kernel = x", possibly alongside "kernel.foo = bar")
+ * yields @root itself here. Skip it: composing a key for it
+ * would fail with -EINVAL, yet any real descendant keys must
+ * still be rendered. An entirely empty subtree then renders
+ * nothing and returns 0 rather than an error.
+ */
+ if (knode == root)
+ continue;
+
ret = xbc_node_compose_key_after(root, knode,
xbc_namebuf, XBC_KEYLEN_MAX);
if (ret < 0)
--
2.53.0-Meta
* [PATCH v7 3/9] bootconfig: render embedded bootconfig as a kernel cmdline at build time
From: Breno Leitao @ 2026-06-26 12:50 UTC
To: Masami Hiramatsu, Andrew Morton, Nathan Chancellor, paulmck,
Nicolas Schier, Nick Desaulniers, Bill Wendling, Justin Stitt,
Jonathan Corbet, Shuah Khan
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, linux-kernel, linux-trace-kernel, linux-kbuild,
bpf, llvm, linux-doc, Breno Leitao, kernel-team, Nicolas Schier
In-Reply-To: <20260626-bootconfig_using_tools-v7-0-24ab72139c29@debian.org>
Add the build-time pipeline that renders the "kernel" subtree of
CONFIG_BOOT_CONFIG_EMBED_FILE into a flat cmdline string and stashes
it in .init.rodata as embedded_kernel_cmdline[]. A follow-up patch
adds the runtime helper that prepends this string to boot_command_line
during early architecture setup so parse_early_param() sees the values.
The build wires up:
  tools/bootconfig -C kernel - userspace tool already shared with
                               lib/bootconfig.c, used here in -C mode
                               to render a bootconfig file to a cmdline
  lib/embedded-cmdline.S     - .incbin's the rendered text plus a NUL
                               (listed under the EXTRA BOOT CONFIG
                               MAINTAINERS entry)
  lib/Makefile rule          - runs tools/bootconfig at build time
  Makefile prepare dep       - ensures tools/bootconfig is built first,
                               same pattern as tools/objtool and
                               tools/bpf/resolve_btfids
Drop the test target from tools/bootconfig/Makefile's default 'all'
recipe so that hooking the binary into the kernel build does not run
test-bootconfig.sh on every prepare. The tests stay available as
'make -C tools/bootconfig test', matching the convention of
tools/objtool and tools/bpf/resolve_btfids whose 'all' targets only
build the binary.
Require BOOT_CONFIG_EMBED_FILE to be non-empty before the new option
can be enabled, otherwise tools/bootconfig -C runs against an empty
file and prints a parse error on every kernel build.
The feature is gated on CONFIG_ARCH_SUPPORTS_CMDLINE_FROM_BOOTCONFIG, a
silent symbol that arches select once they have wired the prepend call
into setup_arch(). No arch selects it in this patch, so the user-visible
CONFIG_CMDLINE_FROM_BOOTCONFIG cannot be enabled yet; the follow-up
patches add the runtime behavior that an opting-in arch relies on.
tools/bootconfig also installs on target systems, so its own Makefile
keeps $(CC) and stays cross-buildable as a standalone tool. The kernel
build, which runs the tool on the build host during prepare, instead
forces CC=$(HOSTCC) from a dedicated tools/bootconfig rule and clears
CROSS_COMPILE= in the sub-make. Without that clear, an LLVM=1 cross
build would inherit CROSS_COMPILE and tools/scripts/Makefile.include
would inject --target=/--sysroot= flags into the host clang invocation,
producing a target binary that fails to exec ("Exec format error").
embedded-cmdline.S places the rendered string in its own .init.rodata
subsection (.init.rodata.embed_cmdline) with the "a" (allocatable,
read-only) flag and %progbits. lib/bootconfig-data.S already places
the embedded bootconfig blob in .init.rodata with the "aw" flag
(xbc_init() rewrites separators in place, so that data must be
writable). Using a distinct subsection name avoids the ld.lld section-
type mismatch that would otherwise arise from mixing "a" and "aw"
under the same name; the linker's "*(.init.rodata .init.rodata.*)"
glob still folds both into the init image and frees them after boot.
A follow-up patch wires the build-time tools/bootconfig into the
top-level clean target.
Reviewed-by: Nicolas Schier <n.schier@fritz.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
---
MAINTAINERS | 1 +
Makefile | 16 ++++++++++++++++
init/Kconfig | 36 ++++++++++++++++++++++++++++++++++++
lib/Makefile | 16 ++++++++++++++++
lib/embedded-cmdline.S | 16 ++++++++++++++++
tools/bootconfig/Makefile | 2 +-
6 files changed, 86 insertions(+), 1 deletion(-)
diff --git a/MAINTAINERS b/MAINTAINERS
index 57656ec0e9d5d..953231df1911d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9844,6 +9844,7 @@ F: fs/proc/bootconfig.c
F: include/linux/bootconfig.h
F: lib/bootconfig-data.S
F: lib/bootconfig.c
+F: lib/embedded-cmdline.S
F: tools/bootconfig/*
F: tools/bootconfig/scripts/*
diff --git a/Makefile b/Makefile
index bf196c6df5b92..5255aa35a2e51 100644
--- a/Makefile
+++ b/Makefile
@@ -1545,6 +1545,22 @@ prepare: tools/bpf/resolve_btfids
endif
endif
+# tools/bootconfig renders the embedded bootconfig into a cmdline at build time.
+ifdef CONFIG_CMDLINE_FROM_BOOTCONFIG
+prepare: tools/bootconfig
+endif
+
+# tools/bootconfig is run on the build host during prepare, so force a host
+# binary here; its own Makefile keeps $(CC) for standalone and cross builds.
+# CROSS_COMPILE= is cleared so tools/scripts/Makefile.include does not inject
+# the target's --target=/--sysroot= flags into the host clang invocation under
+# LLVM=1 cross builds (which would produce a target binary that fails to exec).
+tools/bootconfig: export CC := $(HOSTCC)
+tools/bootconfig: FORCE
+ $(Q)mkdir -p $(objtree)/tools
+ $(Q)$(MAKE) O=$(abspath $(objtree)) subdir=tools -C $(srctree)/tools/ \
+ bootconfig CROSS_COMPILE=
+
# The tools build system is not a part of Kbuild and tends to introduce
# its own unique issues. If you need to integrate a new tool into Kbuild,
# please consider locating that tool outside the tools/ tree and using the
diff --git a/init/Kconfig b/init/Kconfig
index 5230d4879b1c8..598690ec313a2 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1566,6 +1566,42 @@ config BOOT_CONFIG_EMBED_FILE
This bootconfig will be used if there is no initrd or no other
bootconfig in the initrd.
+config ARCH_SUPPORTS_CMDLINE_FROM_BOOTCONFIG
+ bool
+ help
+ Silent symbol; no C code reads it directly. Architectures
+ select it once their setup_arch() calls
+ xbc_prepend_embedded_cmdline() before parse_early_param().
+ Its only role is to gate the user-visible
+ CMDLINE_FROM_BOOTCONFIG option per-arch, the same
+ ARCH_SUPPORTS_* idiom used by ARCH_SUPPORTS_CFI, etc.
+
+config CMDLINE_FROM_BOOTCONFIG
+ bool "Render embedded bootconfig as kernel cmdline at build time"
+ depends on BOOT_CONFIG_EMBED_FILE != ""
+ depends on ARCH_SUPPORTS_CMDLINE_FROM_BOOTCONFIG
+ depends on CMDLINE = ""
+ default n
+ help
+ Render the "kernel" subtree of the embedded bootconfig file into a
+ flat cmdline string at kernel build time and prepend it to
+ boot_command_line during early architecture setup. This makes
+ early_param() handlers (e.g. mem=, earlycon=, loglevel=) see the
+ values supplied via the embedded bootconfig.
+
+ The runtime bootconfig parser is unaffected, so tree-structured
+ consumers such as ftrace boot-time tracing keep working.
+
+ Note: when an initrd also carries a bootconfig, its "kernel"
+ subtree is still parsed at runtime, but the embedded "kernel"
+ keys remain in boot_command_line for parse_early_param() and
+ end up later than the initrd keys in saved_command_line, so
+ parse_args() last-wins favors the embedded values. If you need
+ initrd to override embedded kernel.* keys, leave this option
+ off.
+
+ If unsure, say N.
+
config CMDLINE_LOG_WRAP_IDEAL_LEN
int "Length to try to wrap the cmdline when logged at boot"
default 1021
diff --git a/lib/Makefile b/lib/Makefile
index 7f75cc6edf94a..4ccdce2fd5e5b 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -273,6 +273,22 @@ filechk_defbconf = cat $(or $(real-prereqs), /dev/null)
$(obj)/default.bconf: $(CONFIG_BOOT_CONFIG_EMBED_FILE) FORCE
$(call filechk,defbconf)
+obj-$(CONFIG_CMDLINE_FROM_BOOTCONFIG) += embedded-cmdline.o
+$(obj)/embedded-cmdline.o: $(obj)/embedded_cmdline.bin
+
+# Render the bootconfig "kernel" subtree to a flat cmdline string using
+# the userspace tools/bootconfig parser (-C mode). The runtime prepend
+# helper enforces COMMAND_LINE_SIZE at boot, so no build-time size
+# check is performed here (COMMAND_LINE_SIZE is an arch header
+# constant, not a Kconfig value).
+quiet_cmd_render_cmdline = BCONF2C $@
+ cmd_render_cmdline = \
+ $(objtree)/tools/bootconfig/bootconfig -C $< > $@
+
+targets += embedded_cmdline.bin
+$(obj)/embedded_cmdline.bin: $(obj)/default.bconf $(objtree)/tools/bootconfig/bootconfig FORCE
+ $(call if_changed,render_cmdline)
+
obj-$(CONFIG_RBTREE_TEST) += rbtree_test.o
obj-$(CONFIG_INTERVAL_TREE_TEST) += interval_tree_test.o
diff --git a/lib/embedded-cmdline.S b/lib/embedded-cmdline.S
new file mode 100644
index 0000000000000..bda81b4a42bea
--- /dev/null
+++ b/lib/embedded-cmdline.S
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Embed the build-time-rendered bootconfig "kernel" subtree as a flat
+ * cmdline string. setup_arch() prepends this to boot_command_line on
+ * architectures that select ARCH_SUPPORTS_CMDLINE_FROM_BOOTCONFIG.
+ *
+ * Copyright (c) 2026 Meta Platforms, Inc. and affiliates
+ * Copyright (c) 2026 Breno Leitao <leitao@debian.org>
+ */
+ .section .init.rodata.embed_cmdline, "a", %progbits
+ .global embedded_kernel_cmdline
+embedded_kernel_cmdline:
+ .incbin "lib/embedded_cmdline.bin"
+ .byte 0
+ .global embedded_kernel_cmdline_end
+embedded_kernel_cmdline_end:
diff --git a/tools/bootconfig/Makefile b/tools/bootconfig/Makefile
index 90eb47c9d8de6..4e82fd9553cde 100644
--- a/tools/bootconfig/Makefile
+++ b/tools/bootconfig/Makefile
@@ -15,7 +15,7 @@ override CFLAGS += -Wall -g -I$(CURDIR)/include
ALL_TARGETS := bootconfig
ALL_PROGRAMS := $(patsubst %,$(OUTPUT)%,$(ALL_TARGETS))
-all: $(ALL_PROGRAMS) test
+all: $(ALL_PROGRAMS)
$(OUTPUT)bootconfig: main.c include/linux/bootconfig.h $(LIBSRC)
$(CC) $(filter %.c,$^) $(CFLAGS) $(LDFLAGS) -o $@
--
2.53.0-Meta
* [PATCH v7 4/9] bootconfig: clean build-time tools/bootconfig from make clean
From: Breno Leitao @ 2026-06-26 12:50 UTC
To: Masami Hiramatsu, Andrew Morton, Nathan Chancellor, paulmck,
Nicolas Schier, Nick Desaulniers, Bill Wendling, Justin Stitt,
Jonathan Corbet, Shuah Khan
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, linux-kernel, linux-trace-kernel, linux-kbuild,
bpf, llvm, linux-doc, Breno Leitao, kernel-team, Nicolas Schier
In-Reply-To: <20260626-bootconfig_using_tools-v7-0-24ab72139c29@debian.org>
The previous patch builds tools/bootconfig during 'make prepare' to
render the embedded bootconfig cmdline, but nothing removes it on
'make clean', leaving the compiled tool and its objects behind.
Wire a bootconfig_clean hook into the top-level clean target so the
compiled tool and its objects are removed by make clean, matching the
prepare-wired tools/objtool and tools/bpf/resolve_btfids.
The hook runs tools/bootconfig's Makefile via $(MAKE), which the kernel
build invokes with -rR (MAKEFLAGS += -rR). -rR drops the built-in $(RM)
variable, so the existing "$(RM) -f ..." clean recipe would expand to a
bare "-f ..." and fail. Spell the recipe with a literal "rm -f" so it
keeps working both standalone and when invoked from Kbuild.
Reviewed-by: Nicolas Schier <n.schier@fritz.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
---
Makefile | 11 ++++++++++-
tools/bootconfig/Makefile | 2 +-
2 files changed, 11 insertions(+), 2 deletions(-)
diff --git a/Makefile b/Makefile
index 5255aa35a2e51..20a2bcacde3b8 100644
--- a/Makefile
+++ b/Makefile
@@ -1587,6 +1587,15 @@ ifneq ($(wildcard $(objtool_O)),)
$(Q)$(MAKE) -sC $(abs_srctree)/tools/objtool O=$(objtool_O) srctree=$(abs_srctree) $(patsubst objtool_%,%,$@)
endif
+PHONY += bootconfig_clean
+
+bootconfig_O = $(abspath $(objtree))/tools/bootconfig
+
+bootconfig_clean:
+ifneq ($(wildcard $(bootconfig_O)),)
+ $(Q)$(MAKE) -sC $(srctree)/tools/bootconfig O=$(bootconfig_O) clean
+endif
+
tools/: FORCE
$(Q)mkdir -p $(objtree)/tools
$(Q)$(MAKE) O=$(abspath $(objtree)) subdir=tools -C $(srctree)/tools/
@@ -1757,7 +1766,7 @@ vmlinuxclean:
$(Q)$(CONFIG_SHELL) $(srctree)/scripts/link-vmlinux.sh clean
$(Q)$(if $(ARCH_POSTLINK), $(MAKE) -f $(ARCH_POSTLINK) clean)
-clean: archclean vmlinuxclean resolve_btfids_clean objtool_clean
+clean: archclean vmlinuxclean resolve_btfids_clean objtool_clean bootconfig_clean
# mrproper - Delete all generated files, including .config
#
diff --git a/tools/bootconfig/Makefile b/tools/bootconfig/Makefile
index 4e82fd9553cde..3cb8066d5141b 100644
--- a/tools/bootconfig/Makefile
+++ b/tools/bootconfig/Makefile
@@ -27,4 +27,4 @@ install: $(ALL_PROGRAMS)
install $(OUTPUT)bootconfig $(DESTDIR)$(bindir)
clean:
- $(RM) -f $(OUTPUT)*.o $(ALL_PROGRAMS)
+ rm -f $(OUTPUT)*.o $(ALL_PROGRAMS)
--
2.53.0-Meta
* [PATCH v7 5/9] bootconfig: add xbc_prepend_embedded_cmdline() helper
From: Breno Leitao @ 2026-06-26 12:50 UTC
To: Masami Hiramatsu, Andrew Morton, Nathan Chancellor, paulmck,
Nicolas Schier, Nick Desaulniers, Bill Wendling, Justin Stitt,
Jonathan Corbet, Shuah Khan
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, linux-kernel, linux-trace-kernel, linux-kbuild,
bpf, llvm, linux-doc, Breno Leitao, kernel-team
In-Reply-To: <20260626-bootconfig_using_tools-v7-0-24ab72139c29@debian.org>
Add a helper that prepends the build-time-rendered embedded bootconfig
"kernel" subtree (embedded_kernel_cmdline[] from embedded-cmdline.S) to
a cmdline buffer with a separating space. Architectures call this from
setup_arch() before parse_early_param() so early_param() handlers
(mem=, earlycon=, loglevel=, ...) see values supplied via the embedded
bootconfig.
The in-place prepend (shift the existing string right, then drop the
embedded string in front) is factored into a small str_prepend() helper.
On overflow the helper logs an error and leaves the cmdline untouched
rather than panicking. Booting without the embedded values is better
than refusing to boot, and the error tells the user why their embedded
keys are missing.
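The in-place prepend and its overflow policy can be tried out in userspace with a sketch like the one below. str_prepend() follows the shape shown in the patch; prepend_checked() is a hypothetical wrapper written for this illustration (it takes a NUL-terminated prefix and reports through fprintf()), so it should not be read as the kernel helper itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Prepend @src_len bytes of @src in front of the string in @dst, in place.
 * Moving dst_len + 1 bytes carries the NUL terminator along, so an empty
 * @dst needs no special case.
 */
static void str_prepend(char *dst, size_t dst_len,
			const char *src, size_t src_len)
{
	memmove(dst + src_len, dst, dst_len + 1);
	memcpy(dst, src, src_len);
}

/*
 * Hypothetical wrapper: prepend @src to the string in @dst only if the
 * result, including its NUL, fits in @size bytes. On overflow, report
 * the problem and leave @dst untouched.
 */
static bool prepend_checked(char *dst, size_t size, const char *src)
{
	size_t src_len = strlen(src);
	const char *nul = memchr(dst, '\0', size);
	size_t dst_len;

	if (!nul)	/* @dst is not NUL-terminated within @size bytes */
		return false;
	dst_len = (size_t)(nul - dst);

	if (src_len + dst_len + 1 > size) {
		fprintf(stderr, "prefix (%zu bytes) does not fit with %zu bytes used; ignored\n",
			src_len, dst_len);
		return false;
	}

	str_prepend(dst, dst_len, src, src_len);
	return true;
}
```

Checking the combined length before any byte is moved is what lets the overflow path leave the destination exactly as it was.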
The helper records whether it actually prepended, exposed via
xbc_embedded_cmdline_applied(). setup_boot_config() uses this to decide
whether the runtime "kernel" render would duplicate keys already folded
into boot_command_line.
Also add bootconfig_cmdline_requested(), a small parse_args() wrapper
that reports whether "bootconfig" was passed on the command line and,
via an optional out-parameter, where the "--" init arguments begin.
setup_arch() and setup_boot_config() share it so the early and late
paths agree on the opt-in. It sits under CONFIG_BOOT_CONFIG rather than
CONFIG_CMDLINE_FROM_BOOTCONFIG because the runtime parser needs it on
every bootconfig build.
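The opt-in check can be approximated in userspace as follows. This sketch is deliberately simplified and rests on assumptions: cmdline_has_bootconfig() is a hypothetical name, it splits on whitespace only, and it ignores the quoting rules that the kernel's parse_args() implements, so it illustrates the intended semantics rather than the real parser.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/*
 * Report whether a "bootconfig" parameter appears before the "--"
 * separator. If @end_offset is non-NULL, it is set to the offset of the
 * first init argument after "--", or 0 when there is no separator.
 * Simplification: tokens are split on whitespace; quotes are not handled.
 */
static bool cmdline_has_bootconfig(const char *cmdline, int *end_offset)
{
	const char *p = cmdline;
	const char *tok, *eq;
	size_t len, klen;
	bool found = false;

	if (end_offset)
		*end_offset = 0;

	while (*p) {
		while (isspace((unsigned char)*p))
			p++;
		if (!*p)
			break;

		tok = p;
		while (*p && !isspace((unsigned char)*p))
			p++;
		len = (size_t)(p - tok);

		if (len == 2 && !memcmp(tok, "--", 2)) {
			/* Everything after "--" belongs to init. */
			while (isspace((unsigned char)*p))
				p++;
			if (end_offset)
				*end_offset = (int)(p - cmdline);
			break;
		}

		/* Compare only the key, so "bootconfig=<anything>" matches. */
		eq = memchr(tok, '=', len);
		klen = eq ? (size_t)(eq - tok) : len;
		if (klen == strlen("bootconfig") &&
		    !memcmp(tok, "bootconfig", klen))
			found = true;
	}
	return found;
}
```

Routing both the early and the late caller through one such function is what keeps them from disagreeing about forms like bootconfig=off.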
When CONFIG_CMDLINE_FROM_BOOTCONFIG=n, the public declaration in
<linux/bootconfig.h> resolves to a no-op stub so callers compile
unchanged.
Signed-off-by: Breno Leitao <leitao@debian.org>
---
include/linux/bootconfig.h | 14 +++++
lib/bootconfig.c | 128 ++++++++++++++++++++++++++++++++++++++++++++-
2 files changed, 141 insertions(+), 1 deletion(-)
diff --git a/include/linux/bootconfig.h b/include/linux/bootconfig.h
index 1c7f3b74ffcf3..deda507500da2 100644
--- a/include/linux/bootconfig.h
+++ b/include/linux/bootconfig.h
@@ -308,4 +308,18 @@ static inline const char *xbc_get_embedded_bootconfig(size_t *size)
}
#endif
+/* Bootconfig opt-in detection, shared by setup_arch() and setup_boot_config() */
+#ifdef CONFIG_BOOT_CONFIG
+bool __init bootconfig_cmdline_requested(const char *boot_cmdline, int *end_offset);
+#endif
+
+/* Build-time-rendered bootconfig cmdline prepended in setup_arch() */
+#ifdef CONFIG_CMDLINE_FROM_BOOTCONFIG
+void __init xbc_prepend_embedded_cmdline(char *dst, size_t size);
+bool __init xbc_embedded_cmdline_applied(void);
+#else
+static inline void xbc_prepend_embedded_cmdline(char *dst, size_t size) { }
+static inline bool xbc_embedded_cmdline_applied(void) { return false; }
+#endif
+
#endif
diff --git a/lib/bootconfig.c b/lib/bootconfig.c
index 926094d97397e..89c88e359179f 100644
--- a/lib/bootconfig.c
+++ b/lib/bootconfig.c
@@ -19,9 +19,13 @@
#include <linux/errno.h>
#include <linux/cache.h>
#include <linux/compiler.h>
+#include <linux/init.h>
+#include <linux/moduleparam.h>
+#include <linux/printk.h>
#include <linux/sprintf.h>
#include <linux/memblock.h>
#include <linux/string.h>
+#include <asm/setup.h> /* COMMAND_LINE_SIZE */
#ifdef CONFIG_BOOT_CONFIG_EMBED
/* embedded_bootconfig_data is defined in bootconfig-data.S */
@@ -34,7 +38,129 @@ const char * __init xbc_get_embedded_bootconfig(size_t *size)
return (*size) ? embedded_bootconfig_data : NULL;
}
#endif
-#endif
+
+#ifdef CONFIG_CMDLINE_FROM_BOOTCONFIG
+/* embedded_kernel_cmdline is defined in embedded-cmdline.S */
+extern __visible const char embedded_kernel_cmdline[];
+extern __visible const char embedded_kernel_cmdline_end[];
+
+/* Set once the embedded cmdline has actually been prepended. */
+static bool xbc_cmdline_applied __initdata;
+
+/*
+ * str_prepend() - Prepend @src in front of the string in @dst, in place
+ * @dst: NUL-terminated destination buffer, currently @dst_len bytes long
+ * @dst_len: length of the current @dst string (excluding its NUL)
+ * @src: bytes to prepend (not NUL-terminated)
+ * @src_len: number of bytes from @src to prepend
+ *
+ * The caller must guarantee @dst has room for src_len + dst_len + 1 bytes.
+ * Moving dst_len + 1 bytes carries @dst's NUL terminator too, so an empty
+ * @dst needs no special case.
+ */
+static void __init str_prepend(char *dst, size_t dst_len,
+ const char *src, size_t src_len)
+{
+ memmove(dst + src_len, dst, dst_len + 1);
+ memcpy(dst, src, src_len);
+}
+
+/**
+ * xbc_prepend_embedded_cmdline() - Prepend embedded bootconfig cmdline
+ * @dst: cmdline buffer to prepend into (must already contain a NUL byte)
+ * @size: total capacity of @dst in bytes
+ *
+ * Prepend the build-time-rendered "kernel" subtree of the embedded
+ * bootconfig to @dst. The rendered string already ends with a single
+ * space (the xbc_snprint_cmdline() invariant), which serves as the
+ * separator between the embedded keys and any existing content of @dst.
+ * On overflow, log an error and leave @dst untouched rather than
+ * silently truncating: booting without the embedded values is better
+ * than refusing to boot, and the error message tells the user why
+ * their embedded keys are missing.
+ *
+ * Intended to be called from setup_arch() before parse_early_param() so
+ * that early_param() handlers see the embedded values.
+ */
+void __init xbc_prepend_embedded_cmdline(char *dst, size_t size)
+{
+ size_t embed_len = embedded_kernel_cmdline_end - embedded_kernel_cmdline;
+ size_t dst_len;
+
+ if (!size || embed_len <= 1) /* trailing NUL only */
+ return;
+ embed_len--; /* exclude trailing NUL byte */
+
+ dst_len = strnlen(dst, size);
+ if (embed_len + dst_len + 1 > size) {
+ pr_err("embedded bootconfig cmdline (%zu bytes) does not fit in COMMAND_LINE_SIZE with %zu bytes already used; ignoring embedded values\n",
+ embed_len, dst_len);
+ return;
+ }
+
+ str_prepend(dst, dst_len, embedded_kernel_cmdline, embed_len);
+ xbc_cmdline_applied = true;
+}
+
+/**
+ * xbc_embedded_cmdline_applied() - Did the embedded cmdline get prepended?
+ *
+ * Return true if xbc_prepend_embedded_cmdline() actually prepended the
+ * embedded "kernel" subtree. setup_boot_config() uses this to avoid
+ * rendering the same keys a second time.
+ */
+bool __init xbc_embedded_cmdline_applied(void)
+{
+ return xbc_cmdline_applied;
+}
+#endif /* CONFIG_CMDLINE_FROM_BOOTCONFIG */
+
+/* parse_args() callback: flag when the "bootconfig" parameter is present. */
+static int __init bootconfig_optin(char *param, char *val,
+ const char *unused, void *arg)
+{
+ if (!strcmp(param, "bootconfig"))
+ *(bool *)arg = true;
+ return 0;
+}
+
+/**
+ * bootconfig_cmdline_requested() - Was "bootconfig" passed on the cmdline?
+ * @boot_cmdline: kernel command line to inspect (not modified)
+ * @end_offset: if non-NULL, set to the offset of the init arguments that
+ * follow a "--" separator, or 0 when there is none
+ *
+ * Parse a private copy of @boot_cmdline (parse_args() is destructive) and
+ * report whether "bootconfig" is present before the "--" separator.
+ * setup_arch() uses this to gate prepending the build-time embedded cmdline;
+ * setup_boot_config() uses it for the runtime opt-in and to locate the init
+ * arguments via @end_offset. Sharing one parser keeps the early and late
+ * paths agreeing on what counts as opt-in. CONFIG_BOOT_CONFIG_FORCE is not
+ * folded in here; callers apply it where they need it.
+ */
+bool __init bootconfig_cmdline_requested(const char *boot_cmdline, int *end_offset)
+{
+ static char tmp_cmdline[COMMAND_LINE_SIZE] __initdata;
+ bool found = false;
+ char *err;
+
+ if (end_offset)
+ *end_offset = 0;
+
+ strscpy(tmp_cmdline, boot_cmdline, COMMAND_LINE_SIZE);
+ err = parse_args("bootconfig", tmp_cmdline, NULL, 0, 0, 0,
+ &found, bootconfig_optin);
+ if (IS_ERR(err))
+ return false;
+
+ /* parse_args() stops at "--" and returns the address of the rest. */
+ if (end_offset && err)
+ *end_offset = err - tmp_cmdline;
+
+ return found;
+}
+
+#endif /* __KERNEL__ */
/*
* Extra Boot Config (XBC) is given as tree-structured ascii text of
--
2.53.0-Meta
* [PATCH v7 6/9] Documentation: bootconfig: document build-time cmdline rendering
From: Breno Leitao @ 2026-06-26 12:50 UTC (permalink / raw)
To: Masami Hiramatsu, Andrew Morton, Nathan Chancellor, paulmck,
Nicolas Schier, Nick Desaulniers, Bill Wendling, Justin Stitt,
Jonathan Corbet, Shuah Khan
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, linux-kernel, linux-trace-kernel, linux-kbuild,
bpf, llvm, linux-doc, Breno Leitao, kernel-team
In-Reply-To: <20260626-bootconfig_using_tools-v7-0-24ab72139c29@debian.org>
Add a section describing CONFIG_CMDLINE_FROM_BOOTCONFIG: what it
does (renders the embedded "kernel" subtree to a flat cmdline at
build time so early_param() handlers see the values), what it
requires (BOOT_CONFIG_EMBED, a non-empty BOOT_CONFIG_EMBED_FILE,
CONFIG_CMDLINE to be empty, and ARCH_SUPPORTS_CMDLINE_FROM_BOOTCONFIG --
currently x86 only), the bootconfig opt-in semantics, the initrd-vs-embedded
precedence, and the soft-error overflow behavior.
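For illustration, the soft-error overflow behavior can be sketched in
userspace C roughly as below. This is not the kernel code: sketch_prepend()
is a hypothetical name, and the sketch assumes the rendered string already
carries any separator it needs, so the fit check only has to account for the
trailing NUL.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Prepend src (src_len bytes, NUL not counted) to the NUL-terminated string
 * in dst, whose buffer holds size bytes. On overflow dst is left untouched
 * and false is returned, mirroring the "skip rather than truncate" policy.
 * Hypothetical helper for illustration only.
 */
static bool sketch_prepend(char *dst, size_t size,
			   const char *src, size_t src_len)
{
	const char *nul;
	size_t dst_len;

	if (!size || !src_len)
		return false;

	/* Bounded length of dst: size if no NUL is found inside the buffer. */
	nul = memchr(dst, '\0', size);
	dst_len = nul ? (size_t)(nul - dst) : size;

	if (src_len + dst_len + 1 > size)	/* +1 for the trailing NUL */
		return false;

	/* Shift the existing content (and its NUL) right, then copy src in. */
	memmove(dst + src_len, dst, dst_len + 1);
	memcpy(dst, src, src_len);
	return true;
}
```

With a 16-byte buffer holding "ro quiet", prepending the 7-byte "mem=4G "
fits exactly; a further prepend is refused and the buffer stays as it was.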
This addresses feedback from the Sashiko AI review and Masami Hiramatsu to
document the CONFIG_CMDLINE requirement. It is enforced at the Kconfig level
but was not mentioned in the documentation, which could confuse users who
satisfy all other requirements yet still find the option hidden in
menuconfig because CONFIG_CMDLINE is non-empty.
Signed-off-by: Breno Leitao <leitao@debian.org>
---
Documentation/admin-guide/bootconfig.rst | 81 ++++++++++++++++++++++++++++++++
1 file changed, 81 insertions(+)
diff --git a/Documentation/admin-guide/bootconfig.rst b/Documentation/admin-guide/bootconfig.rst
index f712758472d5c..3d6412458c8b6 100644
--- a/Documentation/admin-guide/bootconfig.rst
+++ b/Documentation/admin-guide/bootconfig.rst
@@ -234,6 +234,87 @@ Kconfig option selected.
Note that even if you set this option, you can override the embedded
bootconfig by another bootconfig which attached to the initrd.
+Rendering Embedded kernel.* Keys at Build Time
+----------------------------------------------
+
+By default, the embedded bootconfig (``CONFIG_BOOT_CONFIG_EMBED=y``) is
+parsed at runtime, after ``parse_early_param()`` has already run. Early
+parameter handlers (``mem=``, ``earlycon=``, ``loglevel=``, ...) therefore
+cannot see values supplied via the embedded ``kernel`` subtree.
+
+``CONFIG_CMDLINE_FROM_BOOTCONFIG`` resolves this by rendering the
+``kernel`` subtree of ``CONFIG_BOOT_CONFIG_EMBED_FILE`` into a flat cmdline
+string at kernel build time (via ``tools/bootconfig -C``) and prepending
+it to ``boot_command_line`` during early architecture setup, so the keys
+are visible to ``parse_early_param()``.
+
+The option requires ``CONFIG_BOOT_CONFIG_EMBED=y``, a non-empty
+``CONFIG_BOOT_CONFIG_EMBED_FILE``, ``CONFIG_CMDLINE`` to be empty, and
+an architecture that selects ``CONFIG_ARCH_SUPPORTS_CMDLINE_FROM_BOOTCONFIG``.
+Currently only x86 selects it; on other architectures the embedded
+bootconfig still works, but only through the late runtime parser.
+
+The same ``bootconfig`` opt-in applies as elsewhere: the rendered keys
+are prepended only when ``bootconfig`` (in any form) appears on the
+kernel command line, or when ``CONFIG_BOOT_CONFIG_FORCE`` is set, which
+defaults to ``y`` when ``CONFIG_BOOT_CONFIG_EMBED`` is set.
+
+For example, given::
+
+ kernel {
+ loglevel = 7
+ mem = 4G
+ }
+
+the kernel boots as if ``loglevel=7 mem=4G`` had been prepended to the
+bootloader command line, with the values visible to early-parsed
+handlers. Comma-separated values are still expanded into multiple
+cmdline entries per the bootconfig array convention -- the embedded
+``kernel.earlycon = "uart8250,io,0x3f8"`` must be quoted to land as a
+single ``earlycon=`` entry, exactly as for the runtime parser.
+
+If the rendered string would not fit in ``COMMAND_LINE_SIZE`` together
+with the existing command line, the prepend is skipped and an error is
+logged, so an oversized embedded bootconfig cannot brick a boot.
+
+Interaction with other command line and bootconfig sources
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+With ``CONFIG_CMDLINE_FROM_BOOTCONFIG=y`` the rendered ``kernel``
+subtree behaves like a build-time command line (similar to
+``CONFIG_CMDLINE``), not like a bootconfig source. It is prepended to
+``boot_command_line`` in ``setup_arch()``, before ``parse_early_param()``
+and long before the runtime parser looks at an initrd. Options can reach
+the kernel from up to four places:
+
+- Bootloader command line: the arguments the boot loader passes. The
+ embedded cmdline is prepended in front of them, so for last-one-wins
+ parameters a bootloader option still overrides the embedded value.
+ Visible in /proc/cmdline.
+- Embedded cmdline (this option): the rendered ``kernel`` subtree,
+ prepended early so it is seen by ``parse_early_param()``. Visible in
+ /proc/cmdline.
+- Initrd bootconfig: parsed late in ``setup_boot_config()``; its
+ ``kernel`` keys are placed ahead of ``boot_command_line``, i.e. before
+ the embedded cmdline, so last-wins favors the embedded values. As a
+ bootconfig source, an initrd bootconfig still replaces the embedded
+ bootconfig. Visible in /proc/cmdline and /proc/bootconfig.
+- Embedded bootconfig (runtime): parsed late, only when no initrd
+ bootconfig is present. Visible in /proc/cmdline and /proc/bootconfig.
+
+So with this option the embedded ``kernel.*`` values take precedence
+over an initrd bootconfig's ``kernel.*`` values: for early parameters
+the initrd is not parsed yet, and for ordinary parameters the embedded
+keys land later in the command line. If you need an initrd bootconfig to
+override the embedded ``kernel.*`` keys, leave this option off and rely
+on the runtime parser.
+
+The rendered string is part of the command line, so it appears in
+/proc/cmdline. It is deliberately not shown in /proc/bootconfig: that
+file keeps reporting the parsed bootconfig tree -- the initrd bootconfig
+if present, otherwise the embedded bootconfig -- independent of whether
+build-time cmdline rendering is enabled.
+
Kernel parameters via Boot Config
=================================
--
2.53.0-Meta
* [PATCH v7 7/9] x86/setup: prepend embedded bootconfig cmdline before parse_early_param
From: Breno Leitao @ 2026-06-26 12:50 UTC (permalink / raw)
To: Masami Hiramatsu, Andrew Morton, Nathan Chancellor, paulmck,
Nicolas Schier, Nick Desaulniers, Bill Wendling, Justin Stitt,
Jonathan Corbet, Shuah Khan
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, linux-kernel, linux-trace-kernel, linux-kbuild,
bpf, llvm, linux-doc, Breno Leitao, kernel-team
In-Reply-To: <20260626-bootconfig_using_tools-v7-0-24ab72139c29@debian.org>
Call xbc_prepend_embedded_cmdline() in setup_arch() right after the
CONFIG_CMDLINE merge and before strscpy(command_line, ...) so the
build-time-rendered embedded bootconfig "kernel" subtree is part of
boot_command_line by the time parse_early_param() runs. early_param()
handlers (mem=, earlycon=, loglevel=, ...) now see values supplied via
CONFIG_BOOT_CONFIG_EMBED_FILE without parsing bootconfig at runtime.
Gate the prepend on the same opt-in the runtime parser uses: prepend
when "bootconfig" is present on the command line, or when
CONFIG_BOOT_CONFIG_FORCE is set. Detect it with parse_args(), exactly
as setup_boot_config() does, so both agree on what counts as opt-in:
any "bootconfig" key regardless of value (bare, =0, =1, ...), and only
before the "--" that separates init arguments. Sharing the parser keeps
the early and late paths from diverging -- e.g. "bootconfig=0" or a
"-- bootconfig" meant for init must not apply the embedded keys early
while the runtime parser skips them.
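As a rough userspace illustration of these opt-in rules (not the kernel's
parse_args(), which also handles quoting), the check could look like the
sketch below. sketch_bootconfig_requested() is a hypothetical name, and
tokens are assumed to be separated by plain spaces only.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true when a "bootconfig" key (bare or with any =value) appears
 * before a standalone "--". If end_offset is non-NULL it receives the
 * offset of the first init argument after "--", or 0 when there is none.
 * Simplified: splits on spaces only and ignores quoting.
 */
static bool sketch_bootconfig_requested(const char *cmdline, int *end_offset)
{
	const char *p = cmdline;
	bool found = false;

	if (end_offset)
		*end_offset = 0;

	while (*p) {
		size_t tok_len, key_len;

		p += strspn(p, " ");		/* skip separators */
		if (!*p)
			break;
		tok_len = strcspn(p, " ");	/* length of this token */

		if (tok_len == 2 && !strncmp(p, "--", 2)) {
			const char *rest = p + tok_len;

			rest += strspn(rest, " ");
			if (end_offset && *rest)
				*end_offset = (int)(rest - cmdline);
			break;			/* the rest belongs to init */
		}

		/* The key is the part of the token before '=', if any. */
		key_len = 0;
		while (key_len < tok_len && p[key_len] != '=')
			key_len++;
		if (key_len == strlen("bootconfig") &&
		    !strncmp(p, "bootconfig", key_len))
			found = true;

		p += tok_len;
	}
	return found;
}
```

Under these assumptions "ro bootconfig=0 quiet" counts as opt-in, while
"ro quiet -- bootconfig" does not, because the key only appears after the
separator.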
The prepend necessarily runs before setup_boot_config() detects an
initrd bootconfig, so an initrd cannot override the embedded "kernel"
keys for early_param(). This is intentional: the embedded cmdline acts
like a build-time CONFIG_CMDLINE. An initrd bootconfig's "kernel" keys
never reached early_param() anyway (they apply late via
extra_command_line), so nothing is lost -- the initrd keys still apply
late, with last-wins keeping the embedded values in effect.
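To make the ordering concrete, here is a small userspace sketch of a
last-one-wins lookup over a flat command line. sketch_last_value() is a
hypothetical helper, not a kernel function, and it assumes a parameter whose
handler simply keeps the last value it is given; accumulating parameters such
as console= behave differently.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return a pointer to the value of the LAST "key=value" token in cmdline
 * and store its length in *val_len, or return NULL if key never appears.
 * Tokens are split on spaces; quoting is ignored in this sketch.
 */
static const char *sketch_last_value(const char *cmdline, const char *key,
				     size_t *val_len)
{
	size_t key_len = strlen(key);
	const char *p = cmdline, *val = NULL;

	while (*p) {
		size_t tok_len;

		p += strspn(p, " ");		/* skip separators */
		tok_len = strcspn(p, " ");	/* 0 only at end of string */
		if (tok_len > key_len && !strncmp(p, key, key_len) &&
		    p[key_len] == '=') {
			val = p + key_len + 1;	/* later hits overwrite */
			*val_len = tok_len - key_len - 1;
		}
		p += tok_len;
	}
	return val;
}
```

With the initrd "kernel" keys first, the embedded cmdline next and the
bootloader arguments last, a line such as
"loglevel=3 loglevel=7 mem=4G loglevel=4" resolves loglevel to the
bootloader's 4; without a bootloader override it resolves to the embedded 7.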
Signed-off-by: Breno Leitao <leitao@debian.org>
---
arch/x86/Kconfig | 1 +
arch/x86/kernel/setup.c | 14 +++++++++++++-
2 files changed, 14 insertions(+), 1 deletion(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 0de23e6471973..8ab11199c16d5 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -127,6 +127,7 @@ config X86
select ARCH_SUPPORTS_NUMA_BALANCING if X86_64
select ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP if NR_CPUS <= 4096
select ARCH_SUPPORTS_CFI if X86_64
+ select ARCH_SUPPORTS_CMDLINE_FROM_BOOTCONFIG
select ARCH_USES_CFI_TRAPS if X86_64 && CFI
select ARCH_SUPPORTS_LTO_CLANG
select ARCH_SUPPORTS_LTO_CLANG_THIN
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 46882ce79c3a4..88b055a46591e 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -6,6 +6,7 @@
* parts of early kernel initialization.
*/
#include <linux/acpi.h>
+#include <linux/bootconfig.h>
#include <linux/console.h>
#include <linux/cpu.h>
#include <linux/crash_dump.h>
@@ -880,7 +881,6 @@ static void __init x86_report_nx(void)
*
* Note: On x86_64, fixmaps are ready for use even before this is called.
*/
-
void __init setup_arch(char **cmdline_p)
{
#ifdef CONFIG_X86_32
@@ -924,6 +924,18 @@ void __init setup_arch(char **cmdline_p)
builtin_cmdline_added = true;
#endif
+#ifdef CONFIG_CMDLINE_FROM_BOOTCONFIG
+ /*
+ * Prepend the build-time-rendered embedded "kernel" keys here so
+ * parse_early_param() below sees them, using the same opt-in as the
+ * runtime parser, plus the build-time CONFIG_BOOT_CONFIG_FORCE.
+ */
+ if (bootconfig_cmdline_requested(boot_command_line, NULL) ||
+ IS_ENABLED(CONFIG_BOOT_CONFIG_FORCE))
+ xbc_prepend_embedded_cmdline(boot_command_line,
+ COMMAND_LINE_SIZE);
+#endif
+
strscpy(command_line, boot_command_line, COMMAND_LINE_SIZE);
*cmdline_p = command_line;
--
2.53.0-Meta
* [PATCH v7 8/9] bootconfig: skip runtime kernel.* render once prepended early
From: Breno Leitao @ 2026-06-26 12:50 UTC (permalink / raw)
To: Masami Hiramatsu, Andrew Morton, Nathan Chancellor, paulmck,
Nicolas Schier, Nick Desaulniers, Bill Wendling, Justin Stitt,
Jonathan Corbet, Shuah Khan
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, linux-kernel, linux-trace-kernel, linux-kbuild,
bpf, llvm, linux-doc, Breno Leitao, kernel-team
In-Reply-To: <20260626-bootconfig_using_tools-v7-0-24ab72139c29@debian.org>
setup_boot_config() folds the embedded bootconfig "kernel" subtree into
the command line via xbc_make_cmdline("kernel"). A subsequent patch lets
an architecture prepend the build-time-rendered embedded "kernel" keys
to boot_command_line early in setup_arch(); rendering them again here
would then duplicate every key in saved_command_line and make
accumulating handlers (console=, earlycon=, ...) re-register the same
value.
Track whether the bootconfig data came from the embedded source
(from_embedded) and skip the runtime render only when the early prepend
actually happened, as reported by xbc_embedded_cmdline_applied(). On
architectures that do not select ARCH_SUPPORTS_CMDLINE_FROM_BOOTCONFIG
that helper is a stub returning false, so this path is unchanged and the
embedded "kernel" keys still reach the cmdline via the runtime parser
exactly as before.
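The skip condition reduces to a small truth table. The sketch below only
restates the decision described above; the function name is hypothetical and
the two flags stand in for from_embedded and the result of
xbc_embedded_cmdline_applied().

```c
#include <stdbool.h>

/*
 * from_embedded  early_applied  render "kernel" keys at runtime?
 *     false          any            yes (data came from the initrd)
 *     true           false          yes (nothing was prepended early)
 *     true           true           no  (keys are already on the cmdline)
 */
static bool sketch_should_render_kernel_keys(bool from_embedded,
					      bool early_applied)
{
	return !from_embedded || !early_applied;
}
```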
Signed-off-by: Breno Leitao <leitao@debian.org>
---
init/main.c | 25 ++++++++++++++++++++++---
1 file changed, 22 insertions(+), 3 deletions(-)
diff --git a/init/main.c b/init/main.c
index e363232b428b4..260bd5242f94e 100644
--- a/init/main.c
+++ b/init/main.c
@@ -378,12 +378,15 @@ static void __init setup_boot_config(void)
int pos, ret;
size_t size;
char *err;
+ bool from_embedded = false;
/* Cut out the bootconfig data even if we have no bootconfig option */
data = get_boot_config_from_initrd(&size);
/* If there is no bootconfig in initrd, try embedded one. */
- if (!data)
+ if (!data) {
data = xbc_get_embedded_bootconfig(&size);
+ from_embedded = true;
+ }
strscpy(tmp_cmdline, boot_command_line, COMMAND_LINE_SIZE);
err = parse_args("bootconfig", tmp_cmdline, NULL, 0, 0, 0, NULL,
@@ -421,8 +424,24 @@ static void __init setup_boot_config(void)
} else {
xbc_get_info(&ret, NULL);
pr_info("Load bootconfig: %ld bytes %d nodes\n", (long)size, ret);
- /* keys starting with "kernel." are passed via cmdline */
- extra_command_line = xbc_make_cmdline("kernel");
+ /*
+ * keys starting with "kernel." are passed via cmdline. When
+ * this bootconfig came from the embedded source and
+ * setup_arch() already prepended the rendered "kernel" subtree
+ * to boot_command_line, rendering again here would duplicate
+ * the keys in saved_command_line and make accumulating handlers
+ * (console=, earlycon=, ...) re-register the same value. Skip
+ * only when the prepend really happened.
+ *
+ * On arches that do not select ARCH_SUPPORTS_CMDLINE_FROM_BOOTCONFIG,
+ * CONFIG_CMDLINE_FROM_BOOTCONFIG is unselectable and
+ * xbc_embedded_cmdline_applied() collapses to a stub returning
+ * false, so this path still runs and the embedded "kernel"
+ * keys reach the cmdline via the runtime parser exactly as
+ * before this series.
+ */
+ if (!from_embedded || !xbc_embedded_cmdline_applied())
+ extra_command_line = xbc_make_cmdline("kernel");
/* Also, "init." keys are init arguments */
extra_init_args = xbc_make_cmdline("init");
}
--
2.53.0-Meta
* [PATCH v7 9/9] init/main.c: use bootconfig_cmdline_requested() for the runtime opt-in
From: Breno Leitao @ 2026-06-26 12:50 UTC (permalink / raw)
To: Masami Hiramatsu, Andrew Morton, Nathan Chancellor, paulmck,
Nicolas Schier, Nick Desaulniers, Bill Wendling, Justin Stitt,
Jonathan Corbet, Shuah Khan
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, linux-kernel, linux-trace-kernel, linux-kbuild,
bpf, llvm, linux-doc, Breno Leitao, kernel-team
In-Reply-To: <20260626-bootconfig_using_tools-v7-0-24ab72139c29@debian.org>
setup_boot_config() open-coded the same "is bootconfig requested on the
kernel command line?" check that setup_arch() performs via the shared
bootconfig_cmdline_requested() helper. Switch it to the helper so the
early (setup_arch) and late (setup_boot_config) paths use one parser and
cannot disagree on what counts as opt-in.
The helper also reports the offset of the init arguments following a "--"
separator, which is exactly what initargs_offs needs, so the local
parse_args() call, its bootconfig_params() callback and the tmp_cmdline
copy are removed.
No functional change intended.
Suggested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
---
init/main.c | 27 ++++++---------------------
1 file changed, 6 insertions(+), 21 deletions(-)
diff --git a/init/main.c b/init/main.c
index 260bd5242f94e..39a518a472422 100644
--- a/init/main.c
+++ b/init/main.c
@@ -356,28 +356,17 @@ static char * __init xbc_make_cmdline(const char *key)
return new_cmdline;
}
-static int __init bootconfig_params(char *param, char *val,
- const char *unused, void *arg)
-{
- if (strcmp(param, "bootconfig") == 0) {
- bootconfig_found = true;
- }
- return 0;
-}
-
static int __init warn_bootconfig(char *str)
{
- /* The 'bootconfig' has been handled by bootconfig_params(). */
+ /* The 'bootconfig' option is handled by setup_boot_config(). */
return 0;
}
static void __init setup_boot_config(void)
{
- static char tmp_cmdline[COMMAND_LINE_SIZE] __initdata;
const char *msg, *data;
- int pos, ret;
+ int pos, ret, offs;
size_t size;
- char *err;
bool from_embedded = false;
/* Cut out the bootconfig data even if we have no bootconfig option */
@@ -388,16 +377,12 @@ static void __init setup_boot_config(void)
from_embedded = true;
}
- strscpy(tmp_cmdline, boot_command_line, COMMAND_LINE_SIZE);
- err = parse_args("bootconfig", tmp_cmdline, NULL, 0, 0, 0, NULL,
- bootconfig_params);
-
- if (IS_ERR(err) || !(bootconfig_found || IS_ENABLED(CONFIG_BOOT_CONFIG_FORCE)))
+ bootconfig_found = bootconfig_cmdline_requested(boot_command_line, &offs);
+ if (!(bootconfig_found || IS_ENABLED(CONFIG_BOOT_CONFIG_FORCE)))
return;
- /* parse_args() stops at the next param of '--' and returns an address */
- if (err)
- initargs_offs = err - tmp_cmdline;
+ /* Offset of the init arguments after a "--", located by the helper. */
+ initargs_offs = offs;
if (!data) {
/* If user intended to use bootconfig, show an error level message */
--
2.53.0-Meta
* Re: [RFC PATCH 1/3] mm/compaction: skip isolate mlocked folios when compact_unevictable_allowed=0
From: Alexander Krabler @ 2026-06-26 13:42 UTC (permalink / raw)
To: Wandun, Vlastimil Babka (SUSE), linux-mm@kvack.org,
linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
linux-rt-devel@lists.linux.dev
Cc: akpm@linux-foundation.org, surenb@google.com, mhocko@suse.com,
jackmanb@google.com, hannes@cmpxchg.org, ziy@nvidia.com,
rostedt@goodmis.org, mhiramat@kernel.org,
mathieu.desnoyers@efficios.com, david@kernel.org, ljs@kernel.org,
liam@infradead.org, rppt@kernel.org, bigeasy@linutronix.de,
clrkwllms@kernel.org, Hugh Dickins
In-Reply-To: <a96b0b24-c405-43c4-96ef-605bacd17cad@gmail.com>
On 6/26/26 11:38, Wandun wrote:
> On 6/26/26 16:45, Alexander Krabler wrote:
>> However, we were not able to reproduce the actual race
>> (mlockall() process waiting on a migration PTE),
>> not in the past, not now. Might be hard to trigger that race.
>
> Not hard to trigger that case, I added a debug message, such as below,
> lots of messages occur in a few second.
>
> diff --cc mm/memory.c
> index ff338c2abe92,ff338c2abe92..6552b3b14f78
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@@ -4768,6 -4768,6 +4768,8 @@@ vm_fault_t do_swap_page(struct vm_faul
> if (softleaf_is_migration(entry)) {
> migration_entry_wait(vma->vm_mm, vmf->pmd,
> vmf->address);
> + if (!strcmp(current->comm, "repro"))
> + pr_err("============== hit ================\n");
> } else if (softleaf_is_device_exclusive(entry)) {
> vmf->page = softleaf_to_page(entry);
> ret = remove_device_exclusive_entry(vmf);
I have a kprobe set on migration_entry_wait, logging into an ftrace buffer
(including the kernel stack trace).
Yes, this function is hit, but only inside the mmap syscall, which is okay,
since memory allocation is not realtime-safe anyway.
repro-2090 [002] d.... 811.129549: frt_migration_entry_wait: (migration_entry_wait+0x0/0x100)
repro-2090 [002] d.... 811.129553: <stack trace>
=> migration_entry_wait
=> __handle_mm_fault
=> handle_mm_fault
=> __get_user_pages
=> populate_vma_page_range
=> __mm_populate
=> vm_mmap_pgoff
=> ksys_mmap_pgoff
=> __arm64_sys_mmap
=> el0_svc_common.constprop.0
=> do_el0_svc
=> el0_svc
=> el0t_64_sync_handler
=> el0t_64_sync
The original race was an instruction abort that appeared out of nowhere,
caused by the migration PTE set by kcompactd.
I see this kind of race quite often in non-mlockall() processes,
but can't reproduce it in memory-locked processes.
Example:
podman-832 [000] d.... 812.447820: frt_migration_entry_wait: (migration_entry_wait+0x0/0x100)
podman-832 [000] d.... 812.447823: <stack trace>
=> migration_entry_wait
=> __handle_mm_fault
=> handle_mm_fault
=> do_page_fault
=> do_translation_fault
=> do_mem_abort
=> el0_da
=> el0t_64_sync_handler
=> el0t_64_sync
Thanks,
Alexander
--
KUKA Deutschland GmbH Board of Directors: Michael Jürgens (Chairman), Johan Naten, Hui Zhang Registered Office: Augsburg HRB 14914
* [PATCH v11 00/11] tracing/probes: Add more typecast features
From: Masami Hiramatsu (Google) @ 2026-06-26 14:14 UTC (permalink / raw)
To: Steven Rostedt, Mathieu Desnoyers
Cc: Jonathan Corbet, Shuah Khan, Masami Hiramatsu, linux-kernel,
linux-trace-kernel, linux-doc, linux-kselftest
Hi,
Here is the 11th version of the series to introduce more typecast features
to probe events. The previous version is here:
https://lore.kernel.org/all/178243982430.790911.17439694390021542101.stgit@devnote2/
In this version, I fixed minor issues and added 2 patches that fix
in-tree tools to ignore comment lines in dynamic_events [3/11][4/11].
This series extends the BTF typecast feature and adds more options:
1. Expanding BTF typecast to kprobe and fprobe.
(currently only function entry/exit)
2. Introduce a container_of-like typecast. This adds an "assigned
member" option to the typecast.
(STRUCT,MEMBER)VAR->ANOTHER_MEMBER
This casts VAR to the STRUCT type, treating VAR as the address
of STRUCT.MEMBER. In C, it is:
container_of(VAR, STRUCT, MEMBER)->ANOTHER_MEMBER
3. Support nested typecasts, e.g.
(STRUCT)((STRUCT2)VAR->MEMBER2)->MEMBER
The nesting level must be smaller than 3.
4. Add a $current variable that points to the "current" task_struct.
This is useful with typecast, e.g.
(task_struct)$current->pid
5. Per-cpu dereference support.
Introduce this_cpu_read(VAR) and this_cpu_ptr(VAR) to
access per-cpu data on the current CPU (accessing another CPU's
data is not stable, because it can be changed).
You can access a member of a per-cpu data structure using a
typecast like:
(STRUCT)this_cpu_ptr(VAR)->MEMBER
6. Support event fields without the $ prefix on eprobes.
Now eprobe events can access their event fields.
I also added a fetcharg dump feature (for debugging) and updated the test
scripts to cover part of these features.
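As a plain C illustration of what the container_of-like typecast in item 2
expresses, consider the sketch below. The struct names and the
sketch_container_of() macro are hypothetical userspace stand-ins, not the
kernel definitions; only the pointer arithmetic is the point.

```c
#include <stddef.h>

/* Hypothetical types, for illustration only. */
struct sketch_list {
	struct sketch_list *next;
};

struct sketch_task {
	int pid;
	struct sketch_list node;	/* MEMBER: what the probe variable points at */
};

/* Userspace stand-in for the kernel's container_of(). */
#define sketch_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* C equivalent of a probe argument written as (sketch_task,node)var->pid. */
static int sketch_pid_from_node(struct sketch_list *var)
{
	return sketch_container_of(var, struct sketch_task, node)->pid;
}
```

Given a pointer to the embedded node, the function walks back to the
enclosing structure and reads another member, which is what
(STRUCT,MEMBER)VAR->ANOTHER_MEMBER asks the probe to do.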
Thanks,
---
base-commit: c69b5f959286395e94c237ce6d7d4970bad7f6e3
Masami Hiramatsu (Google) (11):
tracing/probes: Allow eprobe to use variable without $ prefix
tracing/probes: Support dumping fetcharg program for debugging dynamic events
tools/bootconfig: Ignore comment lines in dynamic_events/kprobe_events file
perf/probe: Ignore comment lines in dynamic_events/kprobe_events file
tracing/probes: Support typecast for various probe events
tracing/probes: Support nested typecast
tracing/probes: Type casting always involves nested calls
tracing/probes: Support field specifier option for typecast
tracing/probes: Add $current variable support
tracing/probes: Add this_cpu_read() and this_cpu_ptr() dereference method to fetcharg
tracing/probes: Add a new testcase for BTF typecasts
Documentation/trace/eprobetrace.rst | 7
Documentation/trace/fprobetrace.rst | 10
Documentation/trace/kprobetrace.rst | 11
kernel/trace/Kconfig | 12
kernel/trace/trace.c | 8
kernel/trace/trace_eprobe.c | 2
kernel/trace/trace_fprobe.c | 2
kernel/trace/trace_kprobe.c | 2
kernel/trace/trace_probe.c | 585 ++++++++++++++++----
kernel/trace/trace_probe.h | 100 ++-
kernel/trace/trace_probe_tmpl.h | 25 +
kernel/trace/trace_uprobe.c | 3
samples/trace_events/trace-events-sample.c | 40 +
samples/trace_events/trace-events-sample.h | 34 +
tools/bootconfig/scripts/ftrace2bconf.sh | 2
tools/perf/util/probe-file.c | 2
.../ftrace/test.d/dynevent/btf_probe_event.tc | 51 ++
.../test.d/dynevent/btf_typecast_accepted.tc | 107 ++++
.../test.d/dynevent/eprobes_syntax_errors.tc | 12
.../ftrace/test.d/dynevent/fprobe_syntax_errors.tc | 12
.../ftrace/test.d/kprobe/kprobe_syntax_errors.tc | 12
.../ftrace/test.d/kprobe/uprobe_syntax_errors.tc | 5
22 files changed, 890 insertions(+), 154 deletions(-)
create mode 100644 tools/testing/selftests/ftrace/test.d/dynevent/btf_probe_event.tc
create mode 100644 tools/testing/selftests/ftrace/test.d/dynevent/btf_typecast_accepted.tc
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
* [PATCH v11 01/11] tracing/probes: Allow eprobe to use variable without $ prefix
From: Masami Hiramatsu (Google) @ 2026-06-26 14:14 UTC (permalink / raw)
To: Steven Rostedt, Mathieu Desnoyers
Cc: Jonathan Corbet, Shuah Khan, Masami Hiramatsu, linux-kernel,
linux-trace-kernel, linux-doc, linux-kselftest
In-Reply-To: <178248325671.841606.17344906774310339507.stgit@devnote2>
From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Commit 69efd863a785 ("tracing/eprobes: Allow use of BTF names
to dereference pointers") allows an eprobe to use an event field without
the "$" prefix when it is used with a typecast. It is natural to allow
this without a typecast as well.
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
Changes in v8:
- Newly added.
---
kernel/trace/trace_probe.c | 12 +++++++++++-
kernel/trace/trace_probe.h | 1 +
.../test.d/dynevent/eprobes_syntax_errors.tc | 3 +--
3 files changed, 13 insertions(+), 3 deletions(-)
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index 0da7c0b53ba7..2ce7d62471cb 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -1341,7 +1341,17 @@ parse_probe_arg(char *arg, const struct fetch_type *type,
ret = handle_typecast(arg, pcode, end, ctx);
break;
default:
- if (isalpha(arg[0]) || arg[0] == '_') { /* BTF variable */
+ if (isalpha(arg[0]) || arg[0] == '_') {
+ /* BTF variable or event field*/
+ if (ctx->flags & TPARG_FL_TEVENT) {
+ ret = parse_trace_event(arg, *pcode, ctx);
+ if (ret < 0) {
+ trace_probe_log_err(ctx->offset,
+ NO_EVENT_FIELD);
+ return -EINVAL;
+ }
+ break;
+ }
if (!tparg_is_function_entry(ctx->flags) &&
!tparg_is_function_return(ctx->flags)) {
trace_probe_log_err(ctx->offset, NOSUP_BTFARG);
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index 40b53b5b58a9..2e0d8384ee5c 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -559,6 +559,7 @@ extern int traceprobe_define_arg_fields(struct trace_event_call *event_call,
C(NO_PTR_STRCT, "This is not a pointer to union/structure."), \
C(NOSUP_DAT_ARG, "Non pointer structure/union argument is not supported."),\
C(BAD_HYPHEN, "Failed to parse single hyphen. Forgot '>'?"), \
+ C(NO_EVENT_FIELD, "This event field is not found."), \
C(NO_BTF_FIELD, "This field is not found."), \
C(BAD_BTF_TID, "Failed to get BTF type info."),\
C(BAD_TYPE4STR, "This type does not fit for string."),\
diff --git a/tools/testing/selftests/ftrace/test.d/dynevent/eprobes_syntax_errors.tc b/tools/testing/selftests/ftrace/test.d/dynevent/eprobes_syntax_errors.tc
index 2a680c086047..0e65e787e426 100644
--- a/tools/testing/selftests/ftrace/test.d/dynevent/eprobes_syntax_errors.tc
+++ b/tools/testing/selftests/ftrace/test.d/dynevent/eprobes_syntax_errors.tc
@@ -10,7 +10,7 @@ check_error() { # command-with-error-pos-by-^
check_error 'e ^a.' # NO_EVENT_INFO
check_error 'e ^.b' # NO_EVENT_INFO
check_error 'e ^a.b' # BAD_ATTACH_EVENT
-check_error 'e syscalls/sys_enter_openat ^foo' # BAD_ATTACH_ARG
+check_error 'e syscalls/sys_enter_openat ^foo' # NO_EVENT_FIELD
check_error 'e:^/bar syscalls/sys_enter_openat' # NO_GROUP_NAME
check_error 'e:^12345678901234567890123456789012345678901234567890123456789012345/bar syscalls/sys_enter_openat' # GROUP_TOO_LONG
@@ -19,7 +19,6 @@ check_error 'e:^ syscalls/sys_enter_openat' # NO_EVENT_NAME
check_error 'e:foo/^12345678901234567890123456789012345678901234567890123456789012345 syscalls/sys_enter_openat' # EVENT_TOO_LONG
check_error 'e:foo/^bar.1 syscalls/sys_enter_openat' # BAD_EVENT_NAME
-check_error 'e:foo/bar syscalls/sys_enter_openat arg=^dfd' # BAD_FETCH_ARG
check_error 'e:foo/bar syscalls/sys_enter_openat arg=^$foo' # BAD_ATTACH_ARG
if grep -q '<attached-group>\.<attached-event>.*\[if <filter>\]' README; then
* [PATCH v11 02/11] tracing/probes: Support dumping fetcharg program for debugging dynamic events
From: Masami Hiramatsu (Google) @ 2026-06-26 14:14 UTC (permalink / raw)
To: Steven Rostedt, Mathieu Desnoyers
Cc: Jonathan Corbet, Shuah Khan, Masami Hiramatsu, linux-kernel,
linux-trace-kernel, linux-doc, linux-kselftest
In-Reply-To: <178248325671.841606.17344906774310339507.stgit@devnote2>
From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
For debugging probe events, it is helpful to verify the compiled
fetch instructions for each probe argument. This introduces a new
kernel config CONFIG_PROBE_EVENTS_DUMP_FETCHARG to decode the
instruction sequence of each argument and display it under a
commented line starting with '#' immediately following the dynamic
event definition (such as in dynamic_events, kprobe_events,
uprobe_events, etc.).
For example:
/sys/kernel/tracing # cat dynamic_events
p:kprobes/p_vfs_read_0 vfs_read arg1=+0(file):ustring arg2=%ax:x16
# arg1: ARG(0) -> ST_USTRING(offset=0,size=4) -> END
# arg2: REG(80) -> ST_RAW(size=2) -> END
Assisted-by: Antigravity:gemini-3.5-flash
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
Changes in v8:
- State this feature is only for debugging probe events.
- Fix dependency list after description in Kconfig.
Changes in v7:
- Show trace event field name for FETCH_OP_TP_ARG.
- Show immediate string value for FETCH_OP_IMMSTR.
- Fix style issues warned by checkpatch.pl.
Changes in v6:
- Newly added.
---
kernel/trace/Kconfig | 12 +++++
kernel/trace/trace_eprobe.c | 2 +
kernel/trace/trace_fprobe.c | 2 +
kernel/trace/trace_kprobe.c | 2 +
kernel/trace/trace_probe.c | 96 +++++++++++++++++++++++++++++++++++++++++++
kernel/trace/trace_probe.h | 79 +++++++++++++++++++++--------------
kernel/trace/trace_uprobe.c | 3 +
7 files changed, 164 insertions(+), 32 deletions(-)
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 084f34dc6c9f..0ab5916575a9 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -779,6 +779,18 @@ config PROBE_EVENTS_BTF_ARGS
kernel function entry or a tracepoint.
This is available only if BTF (BPF Type Format) support is enabled.
+config PROBE_EVENTS_DUMP_FETCHARG
+ bool "Dump of dynamic probe event fetch-arguments"
+ depends on PROBE_EVENTS
+ default n
+ help
+ This shows the dump of fetch-arguments of dynamic probe events
+ alongside their event definitions in the dynamic_events file
+ as comment lines. This is useful to debug the probe events.
+ Since this exposes the raw values in the dynamic_events file,
+ it might be a security risk. Only enable it if you need to debug
+ probe events themselves.
+
config KPROBE_EVENTS
depends on KPROBES
depends on HAVE_REGS_AND_STACK_ACCESS_API
diff --git a/kernel/trace/trace_eprobe.c b/kernel/trace/trace_eprobe.c
index 50518b071414..462c31145733 100644
--- a/kernel/trace/trace_eprobe.c
+++ b/kernel/trace/trace_eprobe.c
@@ -87,6 +87,8 @@ static int eprobe_dyn_event_show(struct seq_file *m, struct dyn_event *ev)
seq_printf(m, " %s=%s", ep->tp.args[i].name, ep->tp.args[i].comm);
seq_putc(m, '\n');
+ trace_probe_dump_args(m, &ep->tp);
+
return 0;
}
diff --git a/kernel/trace/trace_fprobe.c b/kernel/trace/trace_fprobe.c
index 4d1abbf66229..536781cd4c47 100644
--- a/kernel/trace/trace_fprobe.c
+++ b/kernel/trace/trace_fprobe.c
@@ -1449,6 +1449,8 @@ static int trace_fprobe_show(struct seq_file *m, struct dyn_event *ev)
seq_printf(m, " %s=%s", tf->tp.args[i].name, tf->tp.args[i].comm);
seq_putc(m, '\n');
+ trace_probe_dump_args(m, &tf->tp);
+
return 0;
}
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index a8420e6abb56..cfa807d8e760 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1320,6 +1320,8 @@ static int trace_kprobe_show(struct seq_file *m, struct dyn_event *ev)
seq_printf(m, " %s=%s", tk->tp.args[i].name, tk->tp.args[i].comm);
seq_putc(m, '\n');
+ trace_probe_dump_args(m, &tk->tp);
+
return 0;
}
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index 2ce7d62471cb..0908019aea12 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -2403,3 +2403,99 @@ int trace_probe_print_args(struct trace_seq *s, struct probe_arg *args, int nr_a
}
return 0;
}
+
+#ifdef CONFIG_PROBE_EVENTS_DUMP_FETCHARG
+
+struct fetch_op_decode {
+ const char *name;
+ void (*decode)(struct seq_file *m, struct fetch_insn *insn);
+};
+
+static const struct fetch_op_decode fetch_op_decode[];
+
+static void fetcharg_decode_none(struct seq_file *m, struct fetch_insn *insn)
+{
+ seq_puts(m, fetch_op_decode[insn->op].name);
+}
+
+static void fetcharg_decode_param(struct seq_file *m, struct fetch_insn *insn)
+{
+ seq_printf(m, "%s(%u)", fetch_op_decode[insn->op].name, insn->param);
+}
+
+static void fetcharg_decode_imm(struct seq_file *m, struct fetch_insn *insn)
+{
+ seq_printf(m, "%s(0x%lx)", fetch_op_decode[insn->op].name, insn->immediate);
+}
+
+static void fetcharg_decode_string(struct seq_file *m, struct fetch_insn *insn)
+{
+ seq_printf(m, "%s(%s)", fetch_op_decode[insn->op].name, (char *)insn->data);
+}
+
+static void fetcharg_decode_symbol(struct seq_file *m, struct fetch_insn *insn)
+{
+ seq_printf(m, "%s(%s)", fetch_op_decode[insn->op].name, (char *)insn->data);
+}
+
+static void fetcharg_decode_offset(struct seq_file *m, struct fetch_insn *insn)
+{
+ seq_printf(m, "%s(offset=%d)", fetch_op_decode[insn->op].name, insn->offset);
+}
+
+static void fetcharg_decode_store(struct seq_file *m, struct fetch_insn *insn)
+{
+ if (insn->op == FETCH_OP_ST_RAW)
+ seq_printf(m, "%s(size=%u)", fetch_op_decode[insn->op].name, insn->size);
+ else
+ seq_printf(m, "%s(offset=%d,size=%u)", fetch_op_decode[insn->op].name,
+ insn->offset, insn->size);
+}
+
+static void fetcharg_decode_bf(struct seq_file *m, struct fetch_insn *insn)
+{
+ seq_printf(m, "%s(basesize=%u,lshift=%u,rshift=%u)",
+ fetch_op_decode[insn->op].name, insn->basesize, insn->lshift, insn->rshift);
+}
+
+static void fetcharg_decode_tp_arg(struct seq_file *m, struct fetch_insn *insn)
+{
+ struct ftrace_event_field *field = insn->data;
+
+ seq_printf(m, "%s(%s)", fetch_op_decode[insn->op].name, field->name);
+}
+
+#define FETCH_OP(opname, decode_fn) \
+ [FETCH_OP_##opname] = { .name = #opname, .decode = fetcharg_decode_##decode_fn }
+
+static const struct fetch_op_decode fetch_op_decode[] = FETCH_OP_LIST;
+#undef FETCH_OP
+
+static void trace_probe_dump_arg(struct seq_file *m, struct probe_arg *parg)
+{
+ int i;
+
+ seq_printf(m, "# %s: ", parg->name);
+ for (i = 0; i < FETCH_INSN_MAX; i++) {
+ struct fetch_insn *insn = parg->code + i;
+
+ if (insn->op >= ARRAY_SIZE(fetch_op_decode) || !fetch_op_decode[insn->op].decode)
+ seq_printf(m, "unknown(%d)", insn->op);
+ else
+ fetch_op_decode[insn->op].decode(m, insn);
+
+ if (insn->op == FETCH_OP_END)
+ break;
+ seq_puts(m, " -> ");
+ }
+ seq_putc(m, '\n');
+}
+
+void trace_probe_dump_args(struct seq_file *m, struct trace_probe *tp)
+{
+ int i;
+
+ for (i = 0; i < tp->nr_args; i++)
+ trace_probe_dump_arg(m, &tp->args[i]);
+}
+#endif /* CONFIG_PROBE_EVENTS_DUMP_FETCHARG */
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index 2e0d8384ee5c..e36cfe39e9a8 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -83,38 +83,46 @@ static nokprobe_inline u32 update_data_loc(u32 loc, int consumed)
/* Printing function type */
typedef int (*print_type_func_t)(struct trace_seq *, void *, void *);
-enum fetch_op {
- FETCH_OP_NOP = 0,
- // Stage 1 (load) ops
- FETCH_OP_REG, /* Register : .param = offset */
- FETCH_OP_STACK, /* Stack : .param = index */
- FETCH_OP_STACKP, /* Stack pointer */
- FETCH_OP_RETVAL, /* Return value */
- FETCH_OP_IMM, /* Immediate : .immediate */
- FETCH_OP_COMM, /* Current comm */
- FETCH_OP_ARG, /* Function argument : .param */
- FETCH_OP_FOFFS, /* File offset: .immediate */
- FETCH_OP_IMMSTR, /* Allocated string: .data */
- FETCH_OP_EDATA, /* Entry data: .offset */
- // Stage 2 (dereference) op
- FETCH_OP_DEREF, /* Dereference: .offset */
- FETCH_OP_UDEREF, /* User-space Dereference: .offset */
- // Stage 3 (store) ops
- FETCH_OP_ST_RAW, /* Raw: .size */
- FETCH_OP_ST_MEM, /* Mem: .offset, .size */
- FETCH_OP_ST_UMEM, /* Mem: .offset, .size */
- FETCH_OP_ST_STRING, /* String: .offset, .size */
- FETCH_OP_ST_USTRING, /* User String: .offset, .size */
- FETCH_OP_ST_SYMSTR, /* Kernel Symbol String: .offset, .size */
- FETCH_OP_ST_EDATA, /* Store Entry Data: .offset */
- // Stage 4 (modify) op
- FETCH_OP_MOD_BF, /* Bitfield: .basesize, .lshift, .rshift */
- // Stage 5 (loop) op
- FETCH_OP_LP_ARRAY, /* Array: .param = loop count */
- FETCH_OP_TP_ARG, /* Trace Point argument */
- FETCH_OP_END,
- FETCH_NOP_SYMBOL, /* Unresolved Symbol holder */
-};
+#define FETCH_OP_LIST { \
+ /* Stage 1 (load) ops */ \
+ FETCH_OP(NOP, none), /* NOP */ \
+ FETCH_OP(REG, param), /* Register: .param = offset */ \
+ FETCH_OP(STACK, param), /* Stack: .param = index */ \
+ FETCH_OP(STACKP, none), /* Stack pointer */ \
+ FETCH_OP(RETVAL, none), /* Return value */ \
+ FETCH_OP(IMM, imm), /* Immediate: .immediate */ \
+ FETCH_OP(COMM, none), /* Current comm */ \
+ FETCH_OP(ARG, param), /* Argument: .param = index */ \
+ FETCH_OP(FOFFS, imm), /* File offset: .immediate */ \
+ FETCH_OP(IMMSTR, string), /* Allocated string: .data */ \
+ FETCH_OP(EDATA, offset), /* Entry data: .offset */ \
+ FETCH_OP(TP_ARG, tp_arg), /* Tracepoint argument: .data */\
+ /* Stage 2 (dereference) ops */ \
+ FETCH_OP(DEREF, offset), /* Dereference: .offset */ \
+ FETCH_OP(UDEREF, offset), /* User-space dereference: .offset */\
+ /* Stage 3 (store) ops */ \
+ FETCH_OP(ST_RAW, store), /* Raw value: .size */ \
+ FETCH_OP(ST_MEM, store), /* Memory: .offset, .size */ \
+ FETCH_OP(ST_UMEM, store), /* User memory: .offset, .size */\
+ FETCH_OP(ST_STRING, store), /* String: .offset, .size */ \
+ FETCH_OP(ST_USTRING, store), /* User string: .offset, .size */\
+ FETCH_OP(ST_SYMSTR, store), /* Symbol name: .offset, .size */\
+ FETCH_OP(ST_EDATA, offset), /* Entry data: .offset */ \
+ /* Stage 4 (modify) op */ \
+ FETCH_OP(MOD_BF, bf), /* Bitfield: .basesize, .lshift, .rshift*/\
+ /* Stage 5 (loop) op */ \
+ FETCH_OP(LP_ARRAY, param), /* Loop array: .param = count */\
+ /* End */ \
+ FETCH_OP(END, none), \
+ /* Unresolved Symbol holder */ \
+ FETCH_OP(NOP_SYMBOL, symbol), /* Non loaded symbol: .data = symbol name */\
+}
+
+#define FETCH_OP(opname, decode_fn) FETCH_OP_##opname
+enum fetch_op FETCH_OP_LIST;
+#undef FETCH_OP
+
+#define FETCH_NOP_SYMBOL FETCH_OP_NOP_SYMBOL
struct fetch_insn {
enum fetch_op op;
@@ -370,6 +378,13 @@ bool trace_probe_match_command_args(struct trace_probe *tp,
int trace_probe_create(const char *raw_command, int (*createfn)(int, const char **));
int trace_probe_print_args(struct trace_seq *s, struct probe_arg *args, int nr_args,
u8 *data, void *field);
+#ifdef CONFIG_PROBE_EVENTS_DUMP_FETCHARG
+void trace_probe_dump_args(struct seq_file *m, struct trace_probe *tp);
+#else
+static inline void trace_probe_dump_args(struct seq_file *m, struct trace_probe *tp)
+{
+}
+#endif
#ifdef CONFIG_HAVE_FUNCTION_ARG_ACCESS_API
int traceprobe_get_entry_data_size(struct trace_probe *tp);
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index c274346853d1..b2e264a4b96c 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -765,6 +765,9 @@ static int trace_uprobe_show(struct seq_file *m, struct dyn_event *ev)
seq_printf(m, " %s=%s", tu->tp.args[i].name, tu->tp.args[i].comm);
seq_putc(m, '\n');
+
+ trace_probe_dump_args(m, &tu->tp);
+
return 0;
}
^ permalink raw reply related
* [PATCH v11 03/11] tools/bootconfig: Ignore comment lines in dynamic_events/kprobe_events file
From: Masami Hiramatsu (Google) @ 2026-06-26 14:14 UTC (permalink / raw)
To: Steven Rostedt, Mathieu Desnoyers
Cc: Jonathan Corbet, Shuah Khan, Masami Hiramatsu, linux-kernel,
linux-trace-kernel, linux-doc, linux-kselftest
In-Reply-To: <178248325671.841606.17344906774310339507.stgit@devnote2>
From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Since the dynamic_events/kprobe_events files show the fetcharg debug
information as comment lines, their readers need to ignore those lines.
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
tools/bootconfig/scripts/ftrace2bconf.sh | 2 ++
1 file changed, 2 insertions(+)
diff --git a/tools/bootconfig/scripts/ftrace2bconf.sh b/tools/bootconfig/scripts/ftrace2bconf.sh
index 1603801cf126..8eed445c295e 100755
--- a/tools/bootconfig/scripts/ftrace2bconf.sh
+++ b/tools/bootconfig/scripts/ftrace2bconf.sh
@@ -57,6 +57,8 @@ EOF
kprobe_event_options() {
cat $TRACEFS/kprobe_events | while read p args; do
case $p in
+ \#*)
+ continue;;
r*)
cat 1>&2 << EOF
# WARN: A return probe found but it is not supported by bootconfig. Skip it.
^ permalink raw reply related
* [PATCH v11 04/11] perf/probe: Ignore comment lines in dynamic_events/kprobe_events file
From: Masami Hiramatsu (Google) @ 2026-06-26 14:14 UTC (permalink / raw)
To: Steven Rostedt, Mathieu Desnoyers
Cc: Jonathan Corbet, Shuah Khan, Masami Hiramatsu, linux-kernel,
linux-trace-kernel, linux-doc, linux-kselftest
In-Reply-To: <178248325671.841606.17344906774310339507.stgit@devnote2>
From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Since the dynamic_events/kprobe_events files show the fetcharg debug
information as comment lines, their readers need to ignore those lines.
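As a rough userspace illustration of the filtering described above (not the perf code itself), a reader can drop every line whose first character is '#'. The helper names is_comment_line() and count_probe_lines() are made up for this sketch, and the sample probe definitions in the usage are only illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: a line is a comment if it begins with '#'. */
static int is_comment_line(const char *line)
{
	assert(line != NULL);
	return line[0] == '#';
}

/*
 * Hypothetical helper: count the non-empty, non-comment lines in a
 * newline-separated buffer, i.e. the lines a probe-list reader keeps.
 */
static size_t count_probe_lines(const char *buf)
{
	size_t n = 0;

	assert(buf != NULL);
	while (*buf) {
		size_t len = strcspn(buf, "\n");

		if (len > 0 && !is_comment_line(buf))
			n++;
		buf += len;
		if (*buf == '\n')
			buf++;
	}
	return n;
}
```

Checking only the first character keeps the test cheap and matches how the comment lines are emitted: each dump line starts with "# ".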
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
tools/perf/util/probe-file.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/tools/perf/util/probe-file.c b/tools/perf/util/probe-file.c
index 4032572cbf55..4d12693a83b3 100644
--- a/tools/perf/util/probe-file.c
+++ b/tools/perf/util/probe-file.c
@@ -197,6 +197,8 @@ struct strlist *probe_file__get_rawlist(int fd)
idx = strlen(p) - 1;
if (p[idx] == '\n')
p[idx] = '\0';
+ if (buf[0] == '#')
+ continue;
ret = strlist__add(sl, buf);
if (ret < 0) {
pr_debug("strlist__add failed (%d)\n", ret);
^ permalink raw reply related
* [PATCH v11 05/11] tracing/probes: Support typecast for various probe events
From: Masami Hiramatsu (Google) @ 2026-06-26 14:15 UTC (permalink / raw)
To: Steven Rostedt, Mathieu Desnoyers
Cc: Jonathan Corbet, Shuah Khan, Masami Hiramatsu, linux-kernel,
linux-trace-kernel, linux-doc, linux-kselftest
In-Reply-To: <178248325671.841606.17344906774310339507.stgit@devnote2>
From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Support the BTF typecast feature on other probe events, but only on
kernel function entry or return, and only with a function parameter name
or $retval as the cast target. This means you can do:
(STRUCT)PARAM->MEMBER
Note: you cannot use other variables like $stackN, %reg etc. That
needs nesting support.
To support other probe events, we just need to use last_struct type
when we find a function parameter in parse_btf_arg().
This also updates <tracefs>/README file to show struct typecast.
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
Changes in v5:
- Add comments about $retval with typecast.
- Even if the type of the return value is not known, if the user specifies
a typecast, use it as its type.
Changes in v3:
- Clarify the limitation.
Changes in v2:
- Fix to re-enable typecast on eprobe.
---
Documentation/trace/fprobetrace.rst | 3 +++
Documentation/trace/kprobetrace.rst | 4 ++++
kernel/trace/trace.c | 2 +-
kernel/trace/trace_probe.c | 23 +++++++++++++++++------
kernel/trace/trace_probe.h | 5 +++++
5 files changed, 30 insertions(+), 7 deletions(-)
diff --git a/Documentation/trace/fprobetrace.rst b/Documentation/trace/fprobetrace.rst
index b4c2ca3d02c1..7435ded2d66d 100644
--- a/Documentation/trace/fprobetrace.rst
+++ b/Documentation/trace/fprobetrace.rst
@@ -57,6 +57,9 @@ Synopsis of fprobe-events
(u8/u16/u32/u64/s8/s16/s32/s64), hexadecimal types
(x8/x16/x32/x64), "char", "string", "ustring", "symbol", "symstr"
and bitfield are supported.
+ (STRUCT)FIELD->MEMBER[->MEMBER] : If BTF is supported, typecast FIELD to
+ a pointer to STRUCT and then derference the pointer defined by
+ ->MEMBER.
(\*1) This is available only when BTF is enabled.
(\*2) only for the probe on function entry (offs == 0). Note, this argument access
diff --git a/Documentation/trace/kprobetrace.rst b/Documentation/trace/kprobetrace.rst
index 3b6791c17e9b..f73614997d52 100644
--- a/Documentation/trace/kprobetrace.rst
+++ b/Documentation/trace/kprobetrace.rst
@@ -61,6 +61,10 @@ Synopsis of kprobe_events
(x8/x16/x32/x64), VFS layer common type(%pd/%pD), "char",
"string", "ustring", "symbol", "symstr" and bitfield are
supported.
+ (STRUCT)FIELD->MEMBER[->MEMBER] : If BTF is supported, typecast FIELD to
+ a pointer to STRUCT and then derference the pointer defined by
+ ->MEMBER. Note that this is available only when the probe is
+ on function entry.
(\*1) only for the probe on function entry (offs == 0). Note, this argument access
is best effort, because depending on the argument type, it may be passed on
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 1146b83b711a..280a3dccd13f 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4322,7 +4322,7 @@ static const char readme_msg[] =
#ifdef CONFIG_HAVE_FUNCTION_ARG_ACCESS_API
"\t $stack<index>, $stack, $retval, $comm, $arg<N>,\n"
#ifdef CONFIG_PROBE_EVENTS_BTF_ARGS
- "\t <argname>[->field[->field|.field...]],\n"
+ "\t [(structname)]<argname>[->field[->field|.field...]],\n"
#endif
#else
"\t $stack<index>, $stack, $retval, $comm,\n"
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index 0908019aea12..e6cc9f3d6c8b 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -699,7 +699,7 @@ static int parse_btf_arg(char *varname,
if (ctx->flags & TPARG_FL_RETURN && !strcmp(varname, "$retval")) {
code->op = FETCH_OP_RETVAL;
- /* Check whether the function return type is not void */
+ /* Check whether the function return type is not void, even with typecast. */
if (query_btf_context(ctx) == 0) {
if (ctx->proto->type == 0) {
trace_probe_log_err(ctx->offset, NO_RETVAL);
@@ -708,6 +708,13 @@ static int parse_btf_arg(char *varname,
tid = ctx->proto->type;
goto found;
}
+ /*
+ * Even if we can not find appropriate BTF info, we can still access
+ * the field via typecast.
+ */
+ if (ctx->struct_btf)
+ goto found;
+
if (field) {
trace_probe_log_err(ctx->offset + field - varname,
NO_BTF_ENTRY);
@@ -752,7 +759,10 @@ static int parse_btf_arg(char *varname,
return -ENOENT;
found:
- type = btf_type_skip_modifiers(ctx->btf, tid, NULL);
+ if (ctx->struct_btf)
+ type = ctx->last_struct;
+ else
+ type = btf_type_skip_modifiers(ctx->btf, tid, NULL);
found_type:
if (!type) {
trace_probe_log_err(ctx->offset, BAD_BTF_TID);
@@ -829,10 +839,11 @@ static int handle_typecast(char *arg, struct fetch_insn **pcode,
char *tmp;
int ret;
- /* Currently this only works for eprobes */
- if (!(ctx->flags & TPARG_FL_TEVENT)) {
- trace_probe_log_err(ctx->offset, TYPECAST_NOT_EVENT);
- return -EINVAL;
+ if (!(tparg_is_event_probe(ctx->flags) ||
+ tparg_is_function_entry(ctx->flags) ||
+ tparg_is_function_return(ctx->flags))) {
+ trace_probe_log_err(ctx->offset, NOSUP_BTFARG);
+ return -EOPNOTSUPP;
}
tmp = strchr(arg, ')');
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index e36cfe39e9a8..aa72e2ffdd93 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -429,6 +429,11 @@ static inline bool tparg_is_function_return(unsigned int flags)
return (flags & TPARG_FL_LOC_MASK) == (TPARG_FL_KERNEL | TPARG_FL_RETURN);
}
+static inline bool tparg_is_event_probe(unsigned int flags)
+{
+ return !!(flags & TPARG_FL_TEVENT);
+}
+
struct traceprobe_parse_context {
struct trace_event_call *event;
/* BTF related parameters */
^ permalink raw reply related
* [PATCH v11 06/11] tracing/probes: Support nested typecast
From: Masami Hiramatsu (Google) @ 2026-06-26 14:15 UTC (permalink / raw)
To: Steven Rostedt, Mathieu Desnoyers
Cc: Jonathan Corbet, Shuah Khan, Masami Hiramatsu, linux-kernel,
linux-trace-kernel, linux-doc, linux-kselftest
In-Reply-To: <178248325671.841606.17344906774310339507.stgit@devnote2>
From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
When we hit an open parenthesis right after the closing parenthesis of
a typecast, it means we have a nested typecast. This allows us to
typecast a generic data member in a structure to a pointer to
another structure.
For example, to cast DATA_MEMBER of the VAR structure to a STRUCT pointer
and get the MEMBER value:
(STRUCT)(VAR->DATA_MEMBER)->MEMBER
Typecasts can also be nested:
(STRUCT1)((STRUCT2)$ARG->FIELD2)->FIELD1
Currently the max nest level is limited to 3.
This also allows users to typecast registers or stack entries in
kprobe events, e.g.
(STRUCT)(%ax)->MEMBER
(STRUCT)($stack0)->MEMBER
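Nesting depends on finding the parenthesis that closes the inner expression. Below is a minimal userspace sketch of that matching step, assuming the input starts at the opening parenthesis; match_close_paren() is a name chosen for this illustration, not the kernel helper.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace sketch of the matcher: @s must point at an opening
 * parenthesis. Returns the matching ')' or NULL if it is unbalanced.
 */
static const char *match_close_paren(const char *s)
{
	int depth = 0;

	assert(s != NULL && *s == '(');
	for (; *s; s++) {
		if (*s == '(') {
			depth++;
		} else if (*s == ')') {
			/* Depth returns to zero at the matching close. */
			if (--depth == 0)
				return s;
		}
	}
	return NULL;
}
```

For an input such as "((STRUCT2)$arg1->f2)->f1" the inner cast's parentheses raise and lower the depth, so the returned pointer is the one that closes the outermost group.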
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
Changes in v11:
- Fix to return -EINVAL if WARN_ON_ONCE() is hit.
Changes in v6:
- Add a WARN_ON_ONCE check for leaking nested_level (it must not happen.)
Changes in v4:
- Use orig_offset for reporting NO_PTR_STRCT error.
Changes in v2:
- Fix to skip "->" after closing parenthesis.
---
Documentation/trace/eprobetrace.rst | 2 +
Documentation/trace/fprobetrace.rst | 2 +
Documentation/trace/kprobetrace.rst | 2 +
kernel/trace/trace.c | 1
kernel/trace/trace_probe.c | 83 ++++++++++++++++++++++++++++++++---
kernel/trace/trace_probe.h | 7 +++
6 files changed, 88 insertions(+), 9 deletions(-)
diff --git a/Documentation/trace/eprobetrace.rst b/Documentation/trace/eprobetrace.rst
index fe3602540569..cd0b4aa7f896 100644
--- a/Documentation/trace/eprobetrace.rst
+++ b/Documentation/trace/eprobetrace.rst
@@ -50,6 +50,8 @@ Synopsis of eprobe_events
a pointer to STRUCT and then derference the pointer defined by
->MEMBER. Note that when this is used, the FIELD name does not
need to be prefixed with a '$'.
+ (STRUCT)(FETCHARG)->MEMBER[->MEMBER] : typecast can nest, so the above can
+ also be used with another FETCHARG instead of FIELD.
Types
-----
diff --git a/Documentation/trace/fprobetrace.rst b/Documentation/trace/fprobetrace.rst
index 7435ded2d66d..6b8bb27bb62d 100644
--- a/Documentation/trace/fprobetrace.rst
+++ b/Documentation/trace/fprobetrace.rst
@@ -60,6 +60,8 @@ Synopsis of fprobe-events
(STRUCT)FIELD->MEMBER[->MEMBER] : If BTF is supported, typecast FIELD to
a pointer to STRUCT and then derference the pointer defined by
->MEMBER.
+ (STRUCT)(FETCHARG)->MEMBER[->MEMBER] : typecast can nest, so the above can
+ also be used with another FETCHARG instead of FIELD.
(\*1) This is available only when BTF is enabled.
(\*2) only for the probe on function entry (offs == 0). Note, this argument access
diff --git a/Documentation/trace/kprobetrace.rst b/Documentation/trace/kprobetrace.rst
index f73614997d52..c4382765d5b2 100644
--- a/Documentation/trace/kprobetrace.rst
+++ b/Documentation/trace/kprobetrace.rst
@@ -65,6 +65,8 @@ Synopsis of kprobe_events
a pointer to STRUCT and then derference the pointer defined by
->MEMBER. Note that this is available only when the probe is
on function entry.
+ (STRUCT)(FETCHARG)->MEMBER[->MEMBER] : typecast can nest, so the above can
+ also be used with another FETCHARG instead of FIELD.
(\*1) only for the probe on function entry (offs == 0). Note, this argument access
is best effort, because depending on the argument type, it may be passed on
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 280a3dccd13f..e56ee034c486 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4323,6 +4323,7 @@ static const char readme_msg[] =
"\t $stack<index>, $stack, $retval, $comm, $arg<N>,\n"
#ifdef CONFIG_PROBE_EVENTS_BTF_ARGS
"\t [(structname)]<argname>[->field[->field|.field...]],\n"
+ "\t [(structname)](fetcharg)->field[->field|.field...],\n"
#endif
#else
"\t $stack<index>, $stack, $retval, $comm,\n"
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index e6cc9f3d6c8b..827ae04f6351 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -832,10 +832,35 @@ static int query_btf_struct(const char *sname, struct traceprobe_parse_context *
return 0;
}
+/* Find the matching closing parenthesis for a given opening parenthesis. */
+static char *find_matched_close_paren(char *s)
+{
+ char *p = s;
+ int count = 0;
+
+ while (*p) {
+ if (*p == '(')
+ count++;
+ else if (*p == ')') {
+ if (--count == 0)
+ return p;
+ }
+ p++;
+ }
+ return NULL;
+}
+
+static int
+parse_probe_arg(char *arg, const struct fetch_type *type,
+ struct fetch_insn **pcode, struct fetch_insn *end,
+ struct traceprobe_parse_context *ctx);
+
static int handle_typecast(char *arg, struct fetch_insn **pcode,
struct fetch_insn *end,
struct traceprobe_parse_context *ctx)
{
+ int orig_offset = ctx->offset;
+ bool nested = false;
char *tmp;
int ret;
@@ -852,19 +877,56 @@ static int handle_typecast(char *arg, struct fetch_insn **pcode,
DEREF_OPEN_BRACE);
return -EINVAL;
}
- *tmp = '\0';
- ret = query_btf_struct(arg + 1, ctx);
- *tmp = ')';
+ *tmp++ = '\0';
+ /* Handle the nested structure like (STRUCT)(VAR->FIELD)->... */
+ if (*tmp == '(') {
+ char *close = find_matched_close_paren(tmp);
+
+ ctx->offset += tmp - arg;
+ if (!close) {
+ trace_probe_log_err(ctx->offset, DEREF_OPEN_BRACE);
+ return -EINVAL;
+ }
+ /* We expect a field access for typecast */
+ if (close[1] != '-' || close[2] != '>') {
+ trace_probe_log_err(ctx->offset + close - tmp + 1,
+ TYPECAST_REQ_FIELD);
+ return -EINVAL;
+ }
+
+ ctx->nested_level++;
+ if (ctx->nested_level > TRACEPROBE_MAX_NESTED_LEVEL) {
+ trace_probe_log_err(ctx->offset, TOO_MANY_NESTED);
+ return -E2BIG;
+ }
+ *close = '\0';
+
+ ctx->offset += 1; /* for the '(' */
+ /* We need to parse the nested one */
+ ret = parse_probe_arg(tmp + 1, find_fetch_type(NULL, ctx->flags),
+ pcode, end, ctx);
+ if (ret < 0)
+ return ret;
+ ctx->nested_level--;
+ clear_struct_btf(ctx);
+
+ tmp = close + 3;/* Skip "->" after closing parenthesis */
+ nested = true;
+ }
+
+ ret = query_btf_struct(arg + 1, ctx);
if (ret < 0) {
- trace_probe_log_err(ctx->offset + 1, NO_PTR_STRCT);
+ trace_probe_log_err(orig_offset + 1, NO_PTR_STRCT);
return -EINVAL;
}
- tmp++;
-
- ctx->offset += tmp - arg;
- ret = parse_btf_arg(tmp, pcode, end, ctx);
+ ctx->offset = orig_offset + tmp - arg;
+ /* If it is nested, tmp points to the field name. */
+ if (nested)
+ ret = parse_btf_field(tmp, ctx->last_struct, pcode, end, ctx);
+ else
+ ret = parse_btf_arg(tmp, pcode, end, ctx);
return ret;
}
@@ -1638,6 +1700,11 @@ static int traceprobe_parse_probe_arg_body(const char *argv, ssize_t *size,
ctx);
if (ret < 0)
goto fail;
+ /* nested_level must be 0 here, otherwise there is a bug. */
+ if (WARN_ON_ONCE(ctx->nested_level)) {
+ ret = -EINVAL;
+ goto fail;
+ }
/* Update storing type if BTF is available */
if (IS_ENABLED(CONFIG_PROBE_EVENTS_BTF_ARGS) &&
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index aa72e2ffdd93..7d71925244e8 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -450,8 +450,11 @@ struct traceprobe_parse_context {
struct trace_probe *tp;
unsigned int flags;
int offset;
+ int nested_level;
};
+#define TRACEPROBE_MAX_NESTED_LEVEL 3
+
extern int traceprobe_parse_probe_arg(struct trace_probe *tp, int i,
const char *argv,
struct traceprobe_parse_context *ctx);
@@ -587,7 +590,9 @@ extern int traceprobe_define_arg_fields(struct trace_event_call *event_call,
C(TOO_MANY_ARGS, "Too many arguments are specified"), \
C(TOO_MANY_EARGS, "Too many entry arguments specified"), \
C(EVENT_TOO_BIG, "Event too big (too many fields?)"), \
- C(TYPECAST_NOT_EVENT, "Typecasts are only for eprobe fields"),
+ C(TYPECAST_NOT_EVENT, "Typecasts are only for eprobe fields"), \
+ C(TYPECAST_REQ_FIELD, "Typecast requires a field access"), \
+ C(TOO_MANY_NESTED, "Too many nested typecasts/dereferences"),
#undef C
#define C(a, b) TP_ERR_##a
^ permalink raw reply related
* [PATCH v11 07/11] tracing/probes: Type casting always involves nested calls
From: Masami Hiramatsu (Google) @ 2026-06-26 14:15 UTC (permalink / raw)
To: Steven Rostedt, Mathieu Desnoyers
Cc: Jonathan Corbet, Shuah Khan, Masami Hiramatsu, linux-kernel,
linux-trace-kernel, linux-doc, linux-kselftest
In-Reply-To: <178248325671.841606.17344906774310339507.stgit@devnote2>
From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
This allows type casting to various fetchargs without parentheses
by recursively calling parse_probe_arg on the target when type
casting is used.
For example, this allows the following expressions:
- (STRUCT)%REG->FIELD
- (STRUCT)$stackN->FIELD
- (STRUCT)@SYM->FIELD
Note that @SYM+/-OFFSET with typecast needs parentheses like:
- (STRUCT)(@SYM-8)->FIELD
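The @SYM+/-OFFSET restriction can be sketched as a small check: after the cast, the first '+' or '-' in an @SYM token is either an offset (followed by a digit) or the start of "->". The helper sym_offset_needs_parens() and the symbol names in the usage are hypothetical and only mirror the rule stated above.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if @arg (the text right after the
 * "(STRUCT)" cast) is an "@SYM+/-OFFSET" form that must be wrapped in
 * parentheses, 0 otherwise.
 */
static int sym_offset_needs_parens(const char *arg)
{
	const char *p;

	assert(arg != NULL);
	if (arg[0] != '@')
		return 0;
	/* The first '+' or '-' is either an offset or the '-' of "->". */
	p = strpbrk(arg, "+-");
	return p && isdigit((unsigned char)p[1]);
}
```

Without the parentheses, the '-' of an offset and the '-' of "->" would be ambiguous to a parser that looks for the first "->" to end the inner variable name.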
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
Changes in v8:
- Fix caret position in error case.
- Add a comment about @SYM+/-OFFSET without parentheses.
Changes in v7:
- Prohibit using @SYM+/-OFFSET without parentheses.
- Cleanup parse_btf_arg() since ctx->struct_btf is always NULL now.
Changes in v6:
- Newly added.
---
kernel/trace/trace_probe.c | 123 ++++++++++++++++++++++++++------------------
kernel/trace/trace_probe.h | 4 +
2 files changed, 75 insertions(+), 52 deletions(-)
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index 827ae04f6351..1b97b125e9cb 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -684,19 +684,6 @@ static int parse_btf_arg(char *varname,
return -EOPNOTSUPP;
}
- if (ctx->flags & TPARG_FL_TEVENT) {
- ret = parse_trace_event(varname, code, ctx);
- if (ret < 0) {
- trace_probe_log_err(ctx->offset, BAD_ATTACH_ARG);
- return ret;
- }
- /* TEVENT is only here via a typecast */
- if (WARN_ON_ONCE(ctx->struct_btf == NULL))
- return -EINVAL;
- type = ctx->last_struct;
- goto found_type;
- }
-
if (ctx->flags & TPARG_FL_RETURN && !strcmp(varname, "$retval")) {
code->op = FETCH_OP_RETVAL;
/* Check whether the function return type is not void, even with typecast. */
@@ -708,13 +695,6 @@ static int parse_btf_arg(char *varname,
tid = ctx->proto->type;
goto found;
}
- /*
- * Even if we can not find appropriate BTF info, we can still access
- * the field via typecast.
- */
- if (ctx->struct_btf)
- goto found;
-
if (field) {
trace_probe_log_err(ctx->offset + field - varname,
NO_BTF_ENTRY);
@@ -759,11 +739,7 @@ static int parse_btf_arg(char *varname,
return -ENOENT;
found:
- if (ctx->struct_btf)
- type = ctx->last_struct;
- else
- type = btf_type_skip_modifiers(ctx->btf, tid, NULL);
-found_type:
+ type = btf_type_skip_modifiers(ctx->btf, tid, NULL);
if (!type) {
trace_probe_log_err(ctx->offset, BAD_BTF_TID);
return -EINVAL;
@@ -860,7 +836,7 @@ static int handle_typecast(char *arg, struct fetch_insn **pcode,
struct traceprobe_parse_context *ctx)
{
int orig_offset = ctx->offset;
- bool nested = false;
+ char *close;
char *tmp;
int ret;
@@ -871,6 +847,17 @@ static int handle_typecast(char *arg, struct fetch_insn **pcode,
return -EOPNOTSUPP;
}
+ /*
+ * Always consider the token after typecast as a nested call
+ * For example: (STRUCT)VAR->FIELD and (STRUCT)(VAR)->FIELD are same.
+ * VAR is solved in the nested call.
+ */
+ ctx->nested_level++;
+ if (ctx->nested_level > TRACEPROBE_MAX_NESTED_LEVEL) {
+ trace_probe_log_err(ctx->offset, TOO_MANY_NESTED);
+ return -E2BIG;
+ }
+
tmp = strchr(arg, ')');
if (!tmp) {
trace_probe_log_err(ctx->offset + strlen(arg),
@@ -879,11 +866,10 @@ static int handle_typecast(char *arg, struct fetch_insn **pcode,
}
*tmp++ = '\0';
- /* Handle the nested structure like (STRUCT)(VAR->FIELD)->... */
+ ctx->offset += tmp - arg;
if (*tmp == '(') {
- char *close = find_matched_close_paren(tmp);
+ close = find_matched_close_paren(tmp);
- ctx->offset += tmp - arg;
if (!close) {
trace_probe_log_err(ctx->offset, DEREF_OPEN_BRACE);
return -EINVAL;
@@ -894,27 +880,66 @@ static int handle_typecast(char *arg, struct fetch_insn **pcode,
TYPECAST_REQ_FIELD);
return -EINVAL;
}
-
- ctx->nested_level++;
- if (ctx->nested_level > TRACEPROBE_MAX_NESTED_LEVEL) {
- trace_probe_log_err(ctx->offset, TOO_MANY_NESTED);
- return -E2BIG;
+ /* Skip '(' */
+ ctx->offset += 1;
+ tmp++;
+ } else if (*tmp == '+' || *tmp == '-') {
+ /* Dereference can have another field access inside it. */
+ char *open = strchr(tmp + 1, '(');
+
+ if (!open) {
+ trace_probe_log_err(ctx->offset,
+ DEREF_NEED_BRACE);
+ return -EINVAL;
+ }
+ close = find_matched_close_paren(open);
+ if (!close) {
+ trace_probe_log_err(ctx->offset + strlen(tmp),
+ DEREF_OPEN_BRACE);
+ return -EINVAL;
+ }
+ close++;
+ /* We expect a field access for typecast */
+ if (close[0] != '-' || close[1] != '>') {
+ trace_probe_log_err(ctx->offset + close - tmp,
+ TYPECAST_REQ_FIELD);
+ return -EINVAL;
+ }
+ } else {
+ if (tmp[0] == '@') {
+ /* @sym+offset is not allowed without parenthesized */
+ close = strpbrk(tmp, "+-");
+ if (close && isdigit(close[1])) {
+ trace_probe_log_err(ctx->offset,
+ TYPECAST_SYM_OFFSET);
+ return -EINVAL;
+ }
}
- *close = '\0';
+ /* Inner variable name */
+ close = strchr(tmp, '-');
+ if (!close || close[1] != '>') {
+ trace_probe_log_err(ctx->offset + strlen(tmp),
+ TYPECAST_REQ_FIELD);
+ return -EINVAL;
+ }
+ }
+ *close = '\0';
- ctx->offset += 1; /* for the '(' */
- /* We need to parse the nested one */
- ret = parse_probe_arg(tmp + 1, find_fetch_type(NULL, ctx->flags),
- pcode, end, ctx);
- if (ret < 0)
- return ret;
- ctx->nested_level--;
- clear_struct_btf(ctx);
+ /* We need to parse the nested one */
+ ret = parse_probe_arg(tmp, find_fetch_type(NULL, ctx->flags),
+ pcode, end, ctx);
+ if (ret < 0)
+ return ret;
+ ctx->nested_level--;
+ clear_struct_btf(ctx);
- tmp = close + 3;/* Skip "->" after closing parenthesis */
- nested = true;
- }
+ /* Let tmp point the field name. */
+ if (close[1] == '-')
+ tmp = close + 3; /* Skip "->" after closing parenthesis */
+ else
+ tmp = close + 2; /* Skip ">" after inner variable name */
+ /* resolve the typecast struct name */
ret = query_btf_struct(arg + 1, ctx);
if (ret < 0) {
trace_probe_log_err(orig_offset + 1, NO_PTR_STRCT);
@@ -922,11 +947,7 @@ static int handle_typecast(char *arg, struct fetch_insn **pcode,
}
ctx->offset = orig_offset + tmp - arg;
- /* If it is nested, tmp points to the field name. */
- if (nested)
- ret = parse_btf_field(tmp, ctx->last_struct, pcode, end, ctx);
- else
- ret = parse_btf_arg(tmp, pcode, end, ctx);
+ ret = parse_btf_field(tmp, ctx->last_struct, pcode, end, ctx);
return ret;
}
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index 7d71925244e8..f4fbe3010978 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -453,6 +453,7 @@ struct traceprobe_parse_context {
int nested_level;
};
+/* Each typecast consumes nested level. So the max number of typecast is 3. */
#define TRACEPROBE_MAX_NESTED_LEVEL 3
extern int traceprobe_parse_probe_arg(struct trace_probe *tp, int i,
@@ -592,7 +593,8 @@ extern int traceprobe_define_arg_fields(struct trace_event_call *event_call,
C(EVENT_TOO_BIG, "Event too big (too many fields?)"), \
C(TYPECAST_NOT_EVENT, "Typecasts are only for eprobe fields"), \
C(TYPECAST_REQ_FIELD, "Typecast requires a field access"), \
- C(TOO_MANY_NESTED, "Too many nested typecasts/dereferences"),
+ C(TOO_MANY_NESTED, "Too many nested typecasts/dereferences"), \
+ C(TYPECAST_SYM_OFFSET, "@SYM+/-OFFSET with typecast needs parentheses")
#undef C
#define C(a, b) TP_ERR_##a
^ permalink raw reply related
* [PATCH v11 08/11] tracing/probes: Support field specifier option for typecast
From: Masami Hiramatsu (Google) @ 2026-06-26 14:15 UTC (permalink / raw)
To: Steven Rostedt, Mathieu Desnoyers
Cc: Jonathan Corbet, Shuah Khan, Masami Hiramatsu, linux-kernel,
linux-trace-kernel, linux-doc, linux-kselftest
In-Reply-To: <178248325671.841606.17344906774310339507.stgit@devnote2>
From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Add a field specifier option for the typecast. This works like the
container_of() macro.
(STRUCT[,FIELD[.FIELD2...]])VAR
This is equivalent to:
container_of(VAR, struct STRUCT, FIELD[.FIELD2...])
For example:
echo "f tick_nohz_handler next_tick=(tick_sched,sched_timer)timer->next_tick" >> dynamic_events
This will trace tick_nohz_handler() with its tick_sched::next_tick, which
is reached from @timer by container_of(timer, struct tick_sched, sched_timer).
So, if you enable both the fprobes:tick_nohz_handler__entry and
timer:hrtimer_expire_entry events, you will see something like:
<idle>-0 [002] d.h1. 3778.087272: hrtimer_expire_entry: hrtimer=00000000d63db328 function=tick_nohz_handler now=3777450051040
<idle>-0 [002] d.h1. 3778.087281: tick_nohz_handler__entry: (tick_nohz_handler+0x4/0x140) next_tick=3777450000000
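The offset arithmetic behind the field specifier can be sketched in plain userspace C. This is only an illustration, not the kernel implementation: struct demo_timer, struct demo_tick_sched and both helper functions are hypothetical names, and the layouts are made up. It shows that fetching MEMBER through (STRUCT,FIELD)VAR->MEMBER amounts to reading at VAR + offsetof(MEMBER) - offsetof(FIELD), which matches the patch subtracting prefix_byteoffs from the first dereference offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel structures; layouts are illustrative. */
struct demo_timer {
	uint64_t expires;
	void *function;
};

struct demo_tick_sched {
	unsigned int flags;
	struct demo_timer sched_timer;	/* embedded member: FIELD of the typecast */
	uint64_t next_tick;		/* member fetched by "->next_tick" */
};

/*
 * Byte offset applied to VAR when fetching MEMBER via (STRUCT,FIELD)VAR->MEMBER.
 * It can be negative when MEMBER sits before FIELD in the structure.
 */
static long typecast_deref_offset(size_t member_offs, size_t field_offs)
{
	return (long)member_offs - (long)field_offs;
}

/* Read next_tick starting from a pointer to the embedded sched_timer. */
static uint64_t fetch_next_tick(const struct demo_timer *timer)
{
	long offs;
	uint64_t val;

	assert(timer != NULL);
	offs = typecast_deref_offset(offsetof(struct demo_tick_sched, next_tick),
				     offsetof(struct demo_tick_sched, sched_timer));
	/* Same address container_of() would produce, plus the member offset. */
	memcpy(&val, (const char *)timer + offs, sizeof(val));
	return val;
}
```

Calling fetch_next_tick() with the address of the sched_timer member of a demo_tick_sched instance returns that instance's next_tick, without ever materializing the container pointer.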
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
Changes in v6:
- Update according to the always-nested patch.
Changes in v3:
- Fix error caret position.
Changes in v2:
- Use byteoffset for typecast field offset instead of bitoffset. This fixes negative modulo calculation.
- Check whether a field is specified after typecast.
- Reject if typecast field option has arrow operator.
---
Documentation/trace/eprobetrace.rst | 5 +
Documentation/trace/fprobetrace.rst | 8 +-
Documentation/trace/kprobetrace.rst | 8 +-
kernel/trace/trace.c | 4 -
kernel/trace/trace_probe.c | 169 ++++++++++++++++++++++++-----------
kernel/trace/trace_probe.h | 5 +
6 files changed, 135 insertions(+), 64 deletions(-)
diff --git a/Documentation/trace/eprobetrace.rst b/Documentation/trace/eprobetrace.rst
index cd0b4aa7f896..680e0af43d5d 100644
--- a/Documentation/trace/eprobetrace.rst
+++ b/Documentation/trace/eprobetrace.rst
@@ -49,7 +49,10 @@ Synopsis of eprobe_events
(STRUCT)FIELD->MEMBER[->MEMBER] : If BTF is supported, typecast FIELD to
a pointer to STRUCT and then derference the pointer defined by
->MEMBER. Note that when this is used, the FIELD name does not
- need to be prefixed with a '$'.
+ need to be prefixed with a '$'. ASGN can be specified optionally.
+ If ASGN is specified, FIELD will be cast to the same offset
+ position as the ASGN member, rather than to the beginning of
+ the STRUCT.
(STRUCT)(FETCHARG)->MEMBER[->MEMBER] : typecast can nest, so the above can
also be used with another FETCHARG instead of FIELD.
diff --git a/Documentation/trace/fprobetrace.rst b/Documentation/trace/fprobetrace.rst
index 6b8bb27bb62d..290a9e6f7491 100644
--- a/Documentation/trace/fprobetrace.rst
+++ b/Documentation/trace/fprobetrace.rst
@@ -57,10 +57,12 @@ Synopsis of fprobe-events
(u8/u16/u32/u64/s8/s16/s32/s64), hexadecimal types
(x8/x16/x32/x64), "char", "string", "ustring", "symbol", "symstr"
and bitfield are supported.
- (STRUCT)FIELD->MEMBER[->MEMBER] : If BTF is supported, typecast FIELD to
+ (STRUCT[,ASGN])FIELD->MEMBER[->MEMBER] : If BTF is supported, typecast FIELD to
a pointer to STRUCT and then derference the pointer defined by
- ->MEMBER.
- (STRUCT)(FETCHARG)->MEMBER[->MEMBER] : typecast can nest, so the above can
+ ->MEMBER. ASGN can be specified optionally. If ASGN is specified,
+ FIELD will be cast to the same offset position as the ASGN member,
+ rather than to the beginning of the STRUCT.
+ (STRUCT[,ASGN])(FETCHARG)->MEMBER[->MEMBER] : typecast can nest, so the above can
also be used with another FETCHARG instead of FIELD.
(\*1) This is available only when BTF is enabled.
diff --git a/Documentation/trace/kprobetrace.rst b/Documentation/trace/kprobetrace.rst
index c4382765d5b2..a62707e6a9f2 100644
--- a/Documentation/trace/kprobetrace.rst
+++ b/Documentation/trace/kprobetrace.rst
@@ -61,11 +61,13 @@ Synopsis of kprobe_events
(x8/x16/x32/x64), VFS layer common type(%pd/%pD), "char",
"string", "ustring", "symbol", "symstr" and bitfield are
supported.
- (STRUCT)FIELD->MEMBER[->MEMBER] : If BTF is supported, typecast FIELD to
+ (STRUCT[,ASGN])FIELD->MEMBER[->MEMBER] : If BTF is supported, typecast FIELD to
a pointer to STRUCT and then derference the pointer defined by
->MEMBER. Note that this is available only when the probe is
- on function entry.
- (STRUCT)(FETCHARG)->MEMBER[->MEMBER] : typecast can nest, so the above can
+ on function entry. ASGN can be specified optionally. If ASGN
+ is specified, FIELD will be cast to the same offset position
+ as the ASGN member, rather than to the beginning of the STRUCT.
+ (STRUCT[,ASGN])(FETCHARG)->MEMBER[->MEMBER] : typecast can nest, so the above can
also be used with another FETCHARG instead of FIELD.
(\*1) only for the probe on function entry (offs == 0). Note, this argument access
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index e56ee034c486..5670c4b91dc0 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4322,8 +4322,8 @@ static const char readme_msg[] =
#ifdef CONFIG_HAVE_FUNCTION_ARG_ACCESS_API
"\t $stack<index>, $stack, $retval, $comm, $arg<N>,\n"
#ifdef CONFIG_PROBE_EVENTS_BTF_ARGS
- "\t [(structname)]<argname>[->field[->field|.field...]],\n"
- "\t [(structname)](fetcharg)->field[->field|.field...],\n"
+ "\t [(structname[,field])]<argname>[->field[->field|.field...]],\n"
+ "\t [(structname[,field])](fetcharg)->field[->field|.field...],\n"
#endif
#else
"\t $stack<index>, $stack, $retval, $comm,\n"
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index 1b97b125e9cb..fd006b415c68 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -568,6 +568,64 @@ static int split_next_field(char *varname, char **next_field,
return ret;
}
+/* Inner loop for solving dot operator ('.'). Return bit-offset of the given field */
+static int get_bitoffset_of_field(char **pfieldname, const struct btf_type **ptype,
+ struct traceprobe_parse_context *ctx)
+{
+ const struct btf_type *type = *ptype;
+ const struct btf_member *field;
+ struct btf *btf = ctx_btf(ctx);
+ char *fieldname = *pfieldname;
+ int bitoffs = 0;
+ u32 anon_offs;
+ char *next;
+ int is_ptr;
+
+ do {
+ next = NULL;
+ is_ptr = split_next_field(fieldname, &next, ctx);
+ if (is_ptr < 0)
+ return is_ptr;
+
+ anon_offs = 0;
+ field = btf_find_struct_member(btf, type, fieldname,
+ &anon_offs);
+ if (IS_ERR(field)) {
+ trace_probe_log_err(ctx->offset, BAD_BTF_TID);
+ return PTR_ERR(field);
+ }
+ if (!field) {
+ trace_probe_log_err(ctx->offset, NO_BTF_FIELD);
+ return -ENOENT;
+ }
+ /* Add anonymous structure/union offset */
+ bitoffs += anon_offs;
+
+ /* Accumulate the bit-offsets of the dot-connected fields */
+ if (btf_type_kflag(type)) {
+ bitoffs += BTF_MEMBER_BIT_OFFSET(field->offset);
+ ctx->last_bitsize = BTF_MEMBER_BITFIELD_SIZE(field->offset);
+ } else {
+ bitoffs += field->offset;
+ ctx->last_bitsize = 0;
+ }
+
+ type = btf_type_skip_modifiers(btf, field->type, NULL);
+ if (!type) {
+ trace_probe_log_err(ctx->offset, BAD_BTF_TID);
+ return -EINVAL;
+ }
+
+ if (next)
+ ctx->offset += next - fieldname;
+ fieldname = next;
+ } while (!is_ptr && fieldname);
+
+ *pfieldname = fieldname;
+ *ptype = type;
+
+ return bitoffs;
+}
/*
* Parse the field of data structure. The @type must be a pointer type
* pointing the target data structure type.
@@ -577,15 +635,13 @@ static int parse_btf_field(char *fieldname, const struct btf_type *type,
struct traceprobe_parse_context *ctx)
{
struct fetch_insn *code = *pcode;
- const struct btf_member *field;
- u32 bitoffs, anon_offs;
- bool is_struct = ctx->struct_btf != NULL;
struct btf *btf = ctx_btf(ctx);
- char *next;
- int is_ptr;
+ bool is_first_field = true;
+ int bitoffs;
do {
- if (!is_struct) {
+ /* For the first field of typecast, @type will be the target structure type. */
+ if (!(is_first_field && ctx->struct_btf)) {
/* Outer loop for solving arrow operator ('->') */
if (BTF_INFO_KIND(type->info) != BTF_KIND_PTR) {
trace_probe_log_err(ctx->offset, NO_PTR_STRCT);
@@ -599,60 +655,25 @@ static int parse_btf_field(char *fieldname, const struct btf_type *type,
return -EINVAL;
}
}
- /* Only the first type can skip being a pointer */
- is_struct = false;
-
- bitoffs = 0;
- do {
- /* Inner loop for solving dot operator ('.') */
- next = NULL;
- is_ptr = split_next_field(fieldname, &next, ctx);
- if (is_ptr < 0)
- return is_ptr;
-
- anon_offs = 0;
- field = btf_find_struct_member(btf, type, fieldname,
- &anon_offs);
- if (IS_ERR(field)) {
- trace_probe_log_err(ctx->offset, BAD_BTF_TID);
- return PTR_ERR(field);
- }
- if (!field) {
- trace_probe_log_err(ctx->offset, NO_BTF_FIELD);
- return -ENOENT;
- }
- /* Add anonymous structure/union offset */
- bitoffs += anon_offs;
-
- /* Accumulate the bit-offsets of the dot-connected fields */
- if (btf_type_kflag(type)) {
- bitoffs += BTF_MEMBER_BIT_OFFSET(field->offset);
- ctx->last_bitsize = BTF_MEMBER_BITFIELD_SIZE(field->offset);
- } else {
- bitoffs += field->offset;
- ctx->last_bitsize = 0;
- }
-
- type = btf_type_skip_modifiers(btf, field->type, NULL);
- if (!type) {
- trace_probe_log_err(ctx->offset, BAD_BTF_TID);
- return -EINVAL;
- }
-
- ctx->offset += next - fieldname;
- fieldname = next;
- } while (!is_ptr && fieldname);
+ bitoffs = get_bitoffset_of_field(&fieldname, &type, ctx);
+ if (bitoffs < 0)
+ return bitoffs;
if (++code == end) {
trace_probe_log_err(ctx->offset, TOO_MANY_OPS);
return -EINVAL;
}
code->op = FETCH_OP_DEREF; /* TODO: user deref support */
code->offset = bitoffs / 8;
+ if (is_first_field && ctx->struct_btf) {
+ /* The first field can be typecasted with field option. */
+ code->offset -= ctx->prefix_byteoffs;
+ }
*pcode = code;
ctx->last_bitoffs = bitoffs % 8;
ctx->last_type = type;
+ is_first_field = false;
} while (fieldname);
return 0;
@@ -808,6 +829,46 @@ static int query_btf_struct(const char *sname, struct traceprobe_parse_context *
return 0;
}
+static int parse_btf_casttype(char *casttype, struct traceprobe_parse_context *ctx)
+{
+ char *field;
+ int ret;
+
+ /* Field option - evaluated later. */
+ field = strchr(casttype, ',');
+ if (field)
+ *field++ = '\0';
+
+ ret = query_btf_struct(casttype, ctx);
+ if (ret < 0) {
+ trace_probe_log_err(ctx->offset, NO_PTR_STRCT);
+ return -EINVAL;
+ }
+
+ if (field) {
+ struct btf_type *type = (struct btf_type *)ctx->last_struct;
+
+ ctx->offset += field - casttype;
+ ret = get_bitoffset_of_field(&field, &ctx->last_struct, ctx);
+ if (ret < 0)
+ return ret;
+ if (ret % 8) {
+ trace_probe_log_err(ctx->offset, TYPECAST_NOT_ALIGNED);
+ return -EINVAL;
+ }
+ if (field != NULL) {
+ /* this means @field skips an arrow operator ("->"). */
+ trace_probe_log_err(ctx->offset - 2, TYPECAST_BAD_ARROW);
+ return -EINVAL;
+ }
+ ctx->prefix_byteoffs = ret / 8;
+ /* Restore the original struct type (overwritten by get_bitoffset_of_field) */
+ ctx->last_struct = type;
+ }
+
+ return ret;
+}
+
/* Find the matching closing parenthesis for a given opening parenthesis. */
static char *find_matched_close_paren(char *s)
{
@@ -940,14 +1001,14 @@ static int handle_typecast(char *arg, struct fetch_insn **pcode,
tmp = close + 2; /* Skip ">" after inner variable name */
/* resolve the typecast struct name */
- ret = query_btf_struct(arg + 1, ctx);
- if (ret < 0) {
- trace_probe_log_err(orig_offset + 1, NO_PTR_STRCT);
- return -EINVAL;
- }
+ ctx->offset = orig_offset + 1; /* for the '(' */
+ ret = parse_btf_casttype(arg + 1, ctx);
+ if (ret < 0)
+ return ret;
ctx->offset = orig_offset + tmp - arg;
ret = parse_btf_field(tmp, ctx->last_struct, pcode, end, ctx);
+ ctx->prefix_byteoffs = 0;
return ret;
}
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index f4fbe3010978..e7fcc77f51fc 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -451,6 +451,7 @@ struct traceprobe_parse_context {
unsigned int flags;
int offset;
int nested_level;
+ int prefix_byteoffs; /* The byte offset of the prefix field of typecast */
};
/* Each typecast consumes nested level. So the max number of typecast is 3. */
@@ -594,7 +595,9 @@ extern int traceprobe_define_arg_fields(struct trace_event_call *event_call,
C(TYPECAST_NOT_EVENT, "Typecasts are only for eprobe fields"), \
C(TYPECAST_REQ_FIELD, "Typecast requires a field access"), \
C(TOO_MANY_NESTED, "Too many nested typecasts/dereferences"), \
- C(TYPECAST_SYM_OFFSET, "@SYM+/-OFFSET with typecast needs parentheses")
+ C(TYPECAST_SYM_OFFSET, "@SYM+/-OFFSET with typecast needs parentheses") \
+ C(TYPECAST_NOT_ALIGNED, "Typecast field option is not byte-aligned"), \
+ C(TYPECAST_BAD_ARROW, "Typecast field option does not support -> operator"),
#undef C
#define C(a, b) TP_ERR_##a
^ permalink raw reply related
* [PATCH v11 09/11] tracing/probes: Add $current variable support
From: Masami Hiramatsu (Google) @ 2026-06-26 14:15 UTC (permalink / raw)
To: Steven Rostedt, Mathieu Desnoyers
Cc: Jonathan Corbet, Shuah Khan, Masami Hiramatsu, linux-kernel,
linux-trace-kernel, linux-doc, linux-kselftest
In-Reply-To: <178248325671.841606.17344906774310339507.stgit@devnote2>
From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Since BTF can be used to cast a value to a structure pointer type,
it is useful to add support for a "$current" special variable to
fetcharg.
Users can define a fetcharg that accesses properties of the current
task_struct using BTF info, e.g.
$current->cpus_ptr
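How the text following "$current" is handled can be sketched as a small classifier. This is a simplified userspace model, assuming CONFIG_PROBE_EVENTS_BTF_ARGS is enabled and the probe is a kernel probe; enum current_kind and classify_current() are hypothetical names that do not exist in the kernel. As in the patch, only the character right after "current" is checked at this stage, and the field path itself is validated later by the BTF parser.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum current_kind {
	CURRENT_INVALID,	/* e.g. "currentX": rejected */
	CURRENT_PLAIN,		/* "current": yields the task_struct address */
	CURRENT_BTF,		/* "current->...": handed to the BTF field parser */
};

/* @var is the variable text after the leading '$'. */
static enum current_kind classify_current(const char *var)
{
	static const char name[] = "current";
	size_t len = sizeof(name) - 1;

	assert(var != NULL);
	if (strncmp(var, name, len) != 0)
		return CURRENT_INVALID;

	var += len;
	if (*var == '-')		/* start of a "->field" access */
		return CURRENT_BTF;
	if (*var == '\0')		/* bare $current */
		return CURRENT_PLAIN;
	return CURRENT_INVALID;		/* any other trailing text */
}
```

With this model, "current" is the plain form, "current->cpus_ptr" takes the BTF path, and "currentX" is rejected.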
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
Changes in v8:
- Avoid uninitialized ctx->btf issue on $current without typecast.
Changes in v7:
- Fix to use force-typecast for task_struct implicitly.
Changes in v6:
- Rebased on dump fetcharg patch.
- Remove function name/eprobe requirement for $current.
Changes in v5:
- Use s32 for bpf_find_btf_id().
Changes in v4:
- Add $current in README when CONFIG_HAVE_FUNCTION_ARG_ACCESS_API=y case.
- Fix to prohibit using $current in eprobes and address based kprobes.
Changes in v3:
- Remove $current support from eprobes (because eprobes are only for events)
- Prohibit uprobes from using $current.
Changes in v2:
- Support to parse $current in parse_btf_arg().
- If no typecast is given for $current, it is automatically cast to task_struct.
- Report an error if $current is followed by anything other than "-".
---
Documentation/trace/fprobetrace.rst | 1 +
Documentation/trace/kprobetrace.rst | 1 +
kernel/trace/trace.c | 4 ++--
kernel/trace/trace_probe.c | 37 ++++++++++++++++++++++++++++++++++-
kernel/trace/trace_probe.h | 1 +
kernel/trace/trace_probe_tmpl.h | 3 +++
6 files changed, 44 insertions(+), 3 deletions(-)
diff --git a/Documentation/trace/fprobetrace.rst b/Documentation/trace/fprobetrace.rst
index 290a9e6f7491..3392cab016b3 100644
--- a/Documentation/trace/fprobetrace.rst
+++ b/Documentation/trace/fprobetrace.rst
@@ -50,6 +50,7 @@ Synopsis of fprobe-events
$argN : Fetch the Nth function argument. (N >= 1) (\*2)
$retval : Fetch return value.(\*3)
$comm : Fetch current task comm.
+ $current : Fetch the address of the current task_struct.
+|-[u]OFFS(FETCHARG) : Fetch memory at FETCHARG +|- OFFS address.(\*4)(\*5)
\IMM : Store an immediate value to the argument.
NAME=FETCHARG : Set NAME as the argument name of FETCHARG.
diff --git a/Documentation/trace/kprobetrace.rst b/Documentation/trace/kprobetrace.rst
index a62707e6a9f2..81e4fe38791d 100644
--- a/Documentation/trace/kprobetrace.rst
+++ b/Documentation/trace/kprobetrace.rst
@@ -53,6 +53,7 @@ Synopsis of kprobe_events
$argN : Fetch the Nth function argument. (N >= 1) (\*1)
$retval : Fetch return value.(\*2)
$comm : Fetch current task comm.
+ $current : Fetch the address of the current task_struct.
+|-[u]OFFS(FETCHARG) : Fetch memory at FETCHARG +|- OFFS address.(\*3)(\*4)
\IMM : Store an immediate value to the argument.
NAME=FETCHARG : Set NAME as the argument name of FETCHARG.
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 5670c4b91dc0..2b0b4f9acb2e 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4320,13 +4320,13 @@ static const char readme_msg[] =
"\t args: <name>=fetcharg[:type]\n"
"\t fetcharg: (%<register>|$<efield>), @<address>, @<symbol>[+|-<offset>],\n"
#ifdef CONFIG_HAVE_FUNCTION_ARG_ACCESS_API
- "\t $stack<index>, $stack, $retval, $comm, $arg<N>,\n"
+ "\t $stack<index>, $stack, $retval, $comm, $arg<N>, $current\n"
#ifdef CONFIG_PROBE_EVENTS_BTF_ARGS
"\t [(structname[,field])]<argname>[->field[->field|.field...]],\n"
"\t [(structname[,field])](fetcharg)->field[->field|.field...],\n"
#endif
#else
- "\t $stack<index>, $stack, $retval, $comm,\n"
+ "\t $stack<index>, $stack, $retval, $comm, $current\n"
#endif
"\t +|-[u]<offset>(<fetcharg>), \\imm-value, \\\"imm-string\"\n"
"\t kernel return probes support: $retval, $arg<N>, $comm\n"
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index fd006b415c68..999dec84275d 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -692,7 +692,9 @@ static int parse_btf_arg(char *varname,
int i, is_ptr, ret;
u32 tid;
- if (!ctx->funcname && !(ctx->flags & TPARG_FL_TEVENT))
+ /* Note: field is not separated at this point, so check prefix. */
+ if (!str_has_prefix(varname, "$current") &&
+ !ctx->funcname && !(ctx->flags & TPARG_FL_TEVENT))
return -EINVAL;
is_ptr = split_next_field(varname, &field, ctx);
@@ -705,6 +707,20 @@ static int parse_btf_arg(char *varname,
return -EOPNOTSUPP;
}
+ if (!strcmp(varname, "$current")) {
+ code->op = FETCH_OP_CURRENT;
+ /* If no typecast is specified for $current, use task_struct by default */
+ ret = bpf_find_btf_id("task_struct", BTF_KIND_STRUCT, &ctx->struct_btf);
+ if (ret < 0) {
+ trace_probe_log_err(ctx->offset, NO_BTF_ENTRY);
+ return -ENOENT;
+ }
+ tid = (u32)ret;
+ type = ctx->last_struct =
+ btf_type_skip_modifiers(ctx->struct_btf, tid, NULL);
+ goto found_type;
+ }
+
if (ctx->flags & TPARG_FL_RETURN && !strcmp(varname, "$retval")) {
code->op = FETCH_OP_RETVAL;
/* Check whether the function return type is not void, even with typecast. */
@@ -761,6 +777,7 @@ static int parse_btf_arg(char *varname,
found:
type = btf_type_skip_modifiers(ctx->btf, tid, NULL);
+found_type:
if (!type) {
trace_probe_log_err(ctx->offset, BAD_BTF_TID);
return -EINVAL;
@@ -1270,6 +1287,24 @@ static int parse_probe_vars(char *orig_arg, const struct fetch_type *t,
return 0;
}
+ /* $current returns the address of the current task_struct. */
+ if (str_has_prefix(arg, "current")) {
+ /* $current is only supported by kernel probe. */
+ if (!(ctx->flags & TPARG_FL_KERNEL)) {
+ err = TP_ERR_BAD_VAR;
+ goto inval;
+ }
+ arg += strlen("current");
+ if (*arg == '-' && IS_ENABLED(CONFIG_PROBE_EVENTS_BTF_ARGS))
+ return parse_btf_arg(orig_arg, pcode, end, ctx);
+
+ if (*arg != '\0')
+ goto inval;
+
+ code->op = FETCH_OP_CURRENT;
+ return 0;
+ }
+
#ifdef CONFIG_HAVE_FUNCTION_ARG_ACCESS_API
len = str_has_prefix(arg, "arg");
if (len) {
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index e7fcc77f51fc..053f72fdaece 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -92,6 +92,7 @@ typedef int (*print_type_func_t)(struct trace_seq *, void *, void *);
FETCH_OP(RETVAL, none), /* Return value */ \
FETCH_OP(IMM, imm), /* Immediate: .immediate */ \
FETCH_OP(COMM, none), /* Current comm */ \
+ FETCH_OP(CURRENT, none), /* Current task_struct address */\
FETCH_OP(ARG, param), /* Argument: .param = index */ \
FETCH_OP(FOFFS, imm), /* File offset: .immediate */ \
FETCH_OP(IMMSTR, string), /* Allocated string: .data */ \
diff --git a/kernel/trace/trace_probe_tmpl.h b/kernel/trace/trace_probe_tmpl.h
index 51436f19083b..d0e9662cde00 100644
--- a/kernel/trace/trace_probe_tmpl.h
+++ b/kernel/trace/trace_probe_tmpl.h
@@ -112,6 +112,9 @@ process_common_fetch_insn(struct fetch_insn *code, unsigned long *val)
case FETCH_OP_IMMSTR:
*val = (unsigned long)code->data;
break;
+ case FETCH_OP_CURRENT:
+ *val = (unsigned long)current;
+ break;
default:
return -EILSEQ;
}
^ permalink raw reply related
* [PATCH v11 10/11] tracing/probes: Add this_cpu_read() and this_cpu_ptr() dereference method to fetcharg
From: Masami Hiramatsu (Google) @ 2026-06-26 14:15 UTC (permalink / raw)
To: Steven Rostedt, Mathieu Desnoyers
Cc: Jonathan Corbet, Shuah Khan, Masami Hiramatsu, linux-kernel,
linux-trace-kernel, linux-doc, linux-kselftest
In-Reply-To: <178248325671.841606.17344906774310339507.stgit@devnote2>
From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
When tracing kernel local variables, we sometimes need to read per-CPU
variables. The existing simple dereference is not enough to access
them.
Thus, introduce a special this_cpu_read() dereference to access a per-CPU
variable on the current CPU (accessing another CPU's variable may race with
updates on other CPUs). this_cpu_ptr() is also provided for obtaining the
per-CPU pointer.
These work the same way as the kernel per-CPU macros.
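The relation between the two methods, where this_cpu_read(VAR) is handled as +0(this_cpu_ptr(VAR)), can be modeled in userspace C. This is a sketch under stated assumptions, not kernel code: a per-CPU variable is modeled as a byte offset into one small area per CPU, and demo_area, demo_cpu, demo_this_cpu_ptr() and demo_this_cpu_read() are all hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_NR_CPUS	4
#define DEMO_AREA_SIZE	64

/* Hypothetical per-CPU areas: one byte array per CPU. */
static unsigned char demo_area[DEMO_NR_CPUS][DEMO_AREA_SIZE];
static int demo_cpu;	/* stand-in for the CPU the probe fires on */

/* Translate a per-CPU offset into an address inside this CPU's area. */
static void *demo_this_cpu_ptr(size_t percpu_offs)
{
	assert(percpu_offs + sizeof(unsigned long) <= DEMO_AREA_SIZE);
	return &demo_area[demo_cpu][percpu_offs];
}

/* this_cpu_read(VAR) modeled as +0(this_cpu_ptr(VAR)). */
static unsigned long demo_this_cpu_read(size_t percpu_offs)
{
	unsigned long val;

	/* Step 1: address translation; step 2: dereference at offset 0. */
	memcpy(&val, (unsigned char *)demo_this_cpu_ptr(percpu_offs) + 0,
	       sizeof(val));
	return val;
}
```

The same offset yields a different value on each CPU, so the translation has to use the CPU the probe runs on. An offset of 0 is valid in this model, in line with the v3 note that the per-CPU variable is just an offset.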
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
Changes in v11:
- Remove this_cpu_*() from eprobetrace.rst.
Changes in v10:
- Prohibit this_cpu_*() for eprobe events.
Changes in v9:
- Prohibit this_cpu_*() for non kernel probes.
Changes in v6:
- Rebased on dump fetcharg patch.
- Fix to fetch static percpu variable with @SYM correctly.
Changes in v5:
- Simplify this_cpu_read() into +0(this_cpu_ptr()).
Changes in v3:
- Remove NULL check for percpu var because it is just an offset, could be 0.
- Simplify process_fetch_insn_bottom() code.
- If the last operation is this_cpu_read(), read only memory of the specific
size (of type).
Changes in v2:
- Drop +CPU/+PCPU and introduce this_cpu_read() and this_cpu_ptr().
- Support these methods with BTF typecast.
- Just check the base address is NOT NULL instead of is_kernel_percpu_address().
---
Documentation/trace/fprobetrace.rst | 2
Documentation/trace/kprobetrace.rst | 2
kernel/trace/trace.c | 1
kernel/trace/trace_probe.c | 152 ++++++++++++++++++++++++++---------
kernel/trace/trace_probe.h | 6 +
kernel/trace/trace_probe_tmpl.h | 22 ++++-
6 files changed, 139 insertions(+), 46 deletions(-)
diff --git a/Documentation/trace/fprobetrace.rst b/Documentation/trace/fprobetrace.rst
index 3392cab016b3..3439bc9bd351 100644
--- a/Documentation/trace/fprobetrace.rst
+++ b/Documentation/trace/fprobetrace.rst
@@ -52,6 +52,8 @@ Synopsis of fprobe-events
$comm : Fetch current task comm.
$current : Fetch the address of the current task_struct.
+|-[u]OFFS(FETCHARG) : Fetch memory at FETCHARG +|- OFFS address.(\*4)(\*5)
+ this_cpu_read(FETCHARG) : Read the value of the per-CPU variable FETCHARG on the current CPU.
+ this_cpu_ptr(FETCHARG) : Get the address of the per-CPU variable FETCHARG on the current CPU.
\IMM : Store an immediate value to the argument.
NAME=FETCHARG : Set NAME as the argument name of FETCHARG.
FETCHARG:TYPE : Set TYPE as the type of FETCHARG. Currently, basic types
diff --git a/Documentation/trace/kprobetrace.rst b/Documentation/trace/kprobetrace.rst
index 81e4fe38791d..9ae330eb0a52 100644
--- a/Documentation/trace/kprobetrace.rst
+++ b/Documentation/trace/kprobetrace.rst
@@ -55,6 +55,8 @@ Synopsis of kprobe_events
$comm : Fetch current task comm.
$current : Fetch the address of the current task_struct.
+|-[u]OFFS(FETCHARG) : Fetch memory at FETCHARG +|- OFFS address.(\*3)(\*4)
+ this_cpu_read(FETCHARG) : Read the value of the per-CPU variable FETCHARG on the current CPU.
+ this_cpu_ptr(FETCHARG) : Get the address of the per-CPU variable FETCHARG on the current CPU.
\IMM : Store an immediate value to the argument.
NAME=FETCHARG : Set NAME as the argument name of FETCHARG.
FETCHARG:TYPE : Set TYPE as the type of FETCHARG. Currently, basic types
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 2b0b4f9acb2e..c9e182d40059 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4329,6 +4329,7 @@ static const char readme_msg[] =
"\t $stack<index>, $stack, $retval, $comm, $current\n"
#endif
"\t +|-[u]<offset>(<fetcharg>), \\imm-value, \\\"imm-string\"\n"
+ "\t this_cpu_read(<fetcharg>), this_cpu_ptr(<fetcharg>)\n"
"\t kernel return probes support: $retval, $arg<N>, $comm\n"
"\t type: s8/16/32/64, u8/16/32/64, x8/16/32/64, char, string, symbol,\n"
"\t b<bit-width>@<bit-offset>/<container-size>, ustring,\n"
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index 999dec84275d..18c212122344 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -345,6 +345,109 @@ static int parse_trace_event(char *arg, struct fetch_insn *code,
return -EINVAL;
}
+/* this_cpu_* parser */
+#define THIS_CPU_PTR_PREFIX "this_cpu_ptr("
+#define THIS_CPU_READ_PREFIX "this_cpu_read("
+#define THIS_CPU_PTR_LEN (sizeof(THIS_CPU_PTR_PREFIX) - 1)
+#define THIS_CPU_READ_LEN (sizeof(THIS_CPU_READ_PREFIX) - 1)
+
+static int
+parse_probe_arg(char *arg, const struct fetch_type *type,
+ struct fetch_insn **pcode, struct fetch_insn *end,
+ struct traceprobe_parse_context *ctx);
+
+/* handle dereference nested call */
+static inline int handle_dereference(char *arg, struct fetch_insn **pcode,
+ struct fetch_insn *end, struct traceprobe_parse_context *ctx,
+ int deref, long offset)
+{
+ const struct fetch_type *type = find_fetch_type(NULL, ctx->flags);
+ struct fetch_insn *code = *pcode;
+ int cur_offs = ctx->offset;
+ char *tmp;
+ int ret;
+
+ tmp = strrchr(arg, ')');
+ if (!tmp) {
+ trace_probe_log_err(ctx->offset + strlen(arg),
+ DEREF_OPEN_BRACE);
+ return -EINVAL;
+ }
+
+ *tmp = '\0';
+ ret = parse_probe_arg(arg, type, &code, end, ctx);
+ if (ret)
+ return ret;
+ ctx->offset = cur_offs;
+ if (code->op == FETCH_OP_COMM || code->op == FETCH_OP_IMMSTR) {
+ trace_probe_log_err(ctx->offset, COMM_CANT_DEREF);
+ return -EINVAL;
+ }
+
+ /*
+ * this_cpu_ptr(@SYM) does not use SYM value, but use SYM address.
+ * So we overwrite the last FETCH_OP_DEREF with FETCH_OP_CPU_PTR.
+ */
+ if (!(deref == FETCH_OP_CPU_PTR && *arg == '@')) {
+ code++;
+ if (code == end) {
+ trace_probe_log_err(ctx->offset, TOO_MANY_OPS);
+ return -EINVAL;
+ }
+ }
+ *pcode = code;
+
+ code->op = deref;
+ code->offset = offset;
+ /* Reset the last type if used */
+ ctx->last_type = NULL;
+ return 0;
+}
+
+static int parse_this_cpu(char *arg, struct fetch_insn **pcode,
+ struct fetch_insn *end,
+ struct traceprobe_parse_context *ctx)
+{
+ struct fetch_insn *code;
+ bool is_ptr = false;
+ int ret;
+
+ /*
+ * This is only for kernel probes, excluding eprobe, because per-cpu
+ * pointer should not be recorded by events.
+ */
+ if (!(ctx->flags & TPARG_FL_KERNEL) ||
+ (ctx->flags & TPARG_FL_TEVENT)) {
+ trace_probe_log_err(ctx->offset, NOSUP_PERCPU);
+ return -EINVAL;
+ }
+ if (str_has_prefix(arg, THIS_CPU_PTR_PREFIX)) {
+ arg += THIS_CPU_PTR_LEN;
+ ctx->offset += THIS_CPU_PTR_LEN;
+ is_ptr = true;
+ } else if (str_has_prefix(arg, THIS_CPU_READ_PREFIX)) {
+ arg += THIS_CPU_READ_LEN;
+ ctx->offset += THIS_CPU_READ_LEN;
+ } else
+ return -EINVAL;
+
+ ret = handle_dereference(arg, pcode, end, ctx, FETCH_OP_CPU_PTR, 0);
+ if (ret || is_ptr)
+ return ret;
+
+ /* this_cpu_read(VAR) -> +0(this_cpu_ptr(VAR)) */
+ code = *pcode;
+ code++;
+ if (code == end) {
+ trace_probe_log_err(ctx->offset, TOO_MANY_OPS);
+ return -EINVAL;
+ }
+ code->op = FETCH_OP_DEREF;
+ code->offset = 0;
+ *pcode = code;
+ return 0;
+}
+
#ifdef CONFIG_PROBE_EVENTS_BTF_ARGS
static u32 btf_type_int(const struct btf_type *t)
@@ -904,11 +1007,6 @@ static char *find_matched_close_paren(char *s)
return NULL;
}
-static int
-parse_probe_arg(char *arg, const struct fetch_type *type,
- struct fetch_insn **pcode, struct fetch_insn *end,
- struct traceprobe_parse_context *ctx);
-
static int handle_typecast(char *arg, struct fetch_insn **pcode,
struct fetch_insn *end,
struct traceprobe_parse_context *ctx)
@@ -961,7 +1059,9 @@ static int handle_typecast(char *arg, struct fetch_insn **pcode,
/* Skip '(' */
ctx->offset += 1;
tmp++;
- } else if (*tmp == '+' || *tmp == '-') {
+ } else if (*tmp == '+' || *tmp == '-' ||
+ str_has_prefix(tmp, THIS_CPU_PTR_PREFIX) ||
+ str_has_prefix(tmp, THIS_CPU_READ_PREFIX)) {
/* Dereference can have another field access inside it. */
char *open = strchr(tmp + 1, '(');
@@ -1481,36 +1581,9 @@ parse_probe_arg(char *arg, const struct fetch_type *type,
}
ctx->offset += (tmp + 1 - arg) + (arg[0] != '-' ? 1 : 0);
arg = tmp + 1;
- tmp = strrchr(arg, ')');
- if (!tmp) {
- trace_probe_log_err(ctx->offset + strlen(arg),
- DEREF_OPEN_BRACE);
- return -EINVAL;
- } else {
- const struct fetch_type *t2 = find_fetch_type(NULL, ctx->flags);
- int cur_offs = ctx->offset;
-
- *tmp = '\0';
- ret = parse_probe_arg(arg, t2, &code, end, ctx);
- if (ret)
- break;
- ctx->offset = cur_offs;
- if (code->op == FETCH_OP_COMM ||
- code->op == FETCH_OP_IMMSTR) {
- trace_probe_log_err(ctx->offset, COMM_CANT_DEREF);
- return -EINVAL;
- }
- if (++code == end) {
- trace_probe_log_err(ctx->offset, TOO_MANY_OPS);
- return -EINVAL;
- }
- *pcode = code;
-
- code->op = deref;
- code->offset = offset;
- /* Reset the last type if used */
- ctx->last_type = NULL;
- }
+ ret = handle_dereference(arg, pcode, end, ctx, deref, offset);
+ if (ret < 0)
+ return ret;
break;
case '\\': /* Immediate value */
if (arg[1] == '"') { /* Immediate string */
@@ -1531,7 +1604,10 @@ parse_probe_arg(char *arg, const struct fetch_type *type,
ret = handle_typecast(arg, pcode, end, ctx);
break;
default:
- if (isalpha(arg[0]) || arg[0] == '_') {
+ if (str_has_prefix(arg, THIS_CPU_PTR_PREFIX) ||
+ str_has_prefix(arg, THIS_CPU_READ_PREFIX)) {
+ ret = parse_this_cpu(arg, pcode, end, ctx);
+ } else if (isalpha(arg[0]) || arg[0] == '_') {
/* BTF variable or event field*/
if (ctx->flags & TPARG_FL_TEVENT) {
ret = parse_trace_event(arg, *pcode, ctx);
@@ -1548,8 +1624,8 @@ parse_probe_arg(char *arg, const struct fetch_type *type,
return -EINVAL;
}
ret = parse_btf_arg(arg, pcode, end, ctx);
- break;
}
+ break;
}
if (!ret && code->op == FETCH_OP_NOP) {
/* Parsed, but do not find fetch method */
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index 053f72fdaece..e6268a8dc378 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -101,6 +101,7 @@ typedef int (*print_type_func_t)(struct trace_seq *, void *, void *);
/* Stage 2 (dereference) ops */ \
FETCH_OP(DEREF, offset), /* Dereference: .offset */ \
FETCH_OP(UDEREF, offset), /* User-space dereference: .offset */\
+ FETCH_OP(CPU_PTR, none), /* Per-CPU pointer: .offset */ \
/* Stage 3 (store) ops */ \
FETCH_OP(ST_RAW, store), /* Raw value: .size */ \
FETCH_OP(ST_MEM, store), /* Memory: .offset, .size */ \
@@ -596,9 +597,10 @@ extern int traceprobe_define_arg_fields(struct trace_event_call *event_call,
C(TYPECAST_NOT_EVENT, "Typecasts are only for eprobe fields"), \
C(TYPECAST_REQ_FIELD, "Typecast requires a field access"), \
C(TOO_MANY_NESTED, "Too many nested typecasts/dereferences"), \
- C(TYPECAST_SYM_OFFSET, "@SYM+/-OFFSET with typecast needs parentheses") \
+ C(TYPECAST_SYM_OFFSET, "@SYM+/-OFFSET with typecast needs parentheses"), \
C(TYPECAST_NOT_ALIGNED, "Typecast field option is not byte-aligned"), \
- C(TYPECAST_BAD_ARROW, "Typecast field option does not support -> operator"),
+ C(TYPECAST_BAD_ARROW, "Typecast field option does not support -> operator"), \
+ C(NOSUP_PERCPU, "Per-cpu variable access is only for kernel probes"),
#undef C
#define C(a, b) TP_ERR_##a
diff --git a/kernel/trace/trace_probe_tmpl.h b/kernel/trace/trace_probe_tmpl.h
index d0e9662cde00..8db12f758fda 100644
--- a/kernel/trace/trace_probe_tmpl.h
+++ b/kernel/trace/trace_probe_tmpl.h
@@ -129,25 +129,35 @@ process_fetch_insn_bottom(struct fetch_insn *code, unsigned long val,
struct fetch_insn *s3 = NULL;
int total = 0, ret = 0, i = 0;
u32 loc = 0;
- unsigned long lval = val;
+ unsigned long lval, llval = val;
stage2:
/* 2nd stage: dereference memory if needed */
do {
- if (code->op == FETCH_OP_DEREF) {
- lval = val;
+ lval = val;
+ switch (code->op) {
+ case FETCH_OP_DEREF:
ret = probe_mem_read(&val, (void *)val + code->offset,
sizeof(val));
- } else if (code->op == FETCH_OP_UDEREF) {
- lval = val;
+ break;
+ case FETCH_OP_UDEREF:
ret = probe_mem_read_user(&val,
(void *)val + code->offset, sizeof(val));
- } else
break;
+ case FETCH_OP_CPU_PTR:
+ val = (unsigned long)this_cpu_ptr((void __percpu *)val);
+ ret = 0;
+ break;
+ default:
+ lval = llval;
+ goto out;
+ }
if (ret)
return ret;
+ llval = lval;
code++;
} while (1);
+out:
s3 = code;
stage3:
* [PATCH v11 11/11] tracing/probes: Add a new testcase for BTF typecasts
From: Masami Hiramatsu (Google) @ 2026-06-26 14:16 UTC (permalink / raw)
To: Steven Rostedt, Mathieu Desnoyers
Cc: Jonathan Corbet, Shuah Khan, Masami Hiramatsu, linux-kernel,
linux-trace-kernel, linux-doc, linux-kselftest
In-Reply-To: <178248325671.841606.17344906774310339507.stgit@devnote2>
From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
With the introduction of container_of-style BTF typecasting and
per-CPU variable access support in trace probes, we need a way to
verify their functionality and prevent regressions.
Add a new ftrace kselftest and update the trace event sample module
to test and validate these features.
Specifically, update the trace-events-sample module to set up a
periodic timer whose callback accesses a per-CPU counter. Introduce
a new sample trace event, foo_timer_fn, to trace this callback
and log the current counter value.
Then, add a new test case, btf_probe_event.tc, which defines a
dynamic probe on the timer callback. The probe uses BTF typecasting
to recover the parent structure from the timer argument and
this_cpu_read() to fetch the per-CPU counter. The test verifies
the integrity of the implementation by ensuring the values
recorded by the dynamic probe match those from the static tracepoint.
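To make the container_of-style typecast concrete, here is a minimal
userspace sketch of the pointer arithmetic that an argument such as
"(foo_timer_data,timer)t->name" asks the probe to perform. The names
fake_timer, fake_timer_data and name_from_timer() are hypothetical
stand-ins invented for this illustration; the macro is a simplified
version of the kernel's container_of() without its type checking, and
the counter is a plain int rather than an int __percpu pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for struct timer_list. */
struct fake_timer {
	unsigned long expires;
};

/* Hypothetical stand-in for struct foo_timer_data. */
struct fake_timer_data {
	const char *name;
	struct fake_timer timer;	/* the member the callback receives */
	int counter;			/* plain int here, not per-CPU */
};

/*
 * Recover the enclosing structure from a pointer to its embedded
 * timer, then read a sibling field. This mirrors what the probe
 * argument "(foo_timer_data,timer)t->name" describes.
 */
static const char *name_from_timer(struct fake_timer *t)
{
	struct fake_timer_data *data =
		container_of(t, struct fake_timer_data, timer);

	return data->name;
}
```

Passing the address of the embedded timer member to name_from_timer()
yields the name field of the structure that contains it, which is the
same value the static tracepoint reads through its data argument.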
Assisted-by: Antigravity:gemini-3.5-flash
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
Changes in v11:
- nit: fix the error code in comment.
Changes in v10:
- Add a check for $current and this_cpu_* for eprobe
Changes in v9:
- Add a testcase for checking new syntax.
Changes in v8:
- Add more test cases.
Changes in v6:
- Update testcase according to changes.
Changes in v5:
- Add more syntax test cases.
Changes in v4:
- Fix uprobe $current test.
Changes in v3:
- Add syntax test case.
- Update testcase to use this_cpu_read()
Changes in v2:
- Use timer_shutdown_sync() instead of timer_delete_sync() for teardown.
---
samples/trace_events/trace-events-sample.c | 40 +++++++
samples/trace_events/trace-events-sample.h | 34 ++++++
.../ftrace/test.d/dynevent/btf_probe_event.tc | 51 ++++++++++
.../test.d/dynevent/btf_typecast_accepted.tc | 107 ++++++++++++++++++++
.../test.d/dynevent/eprobes_syntax_errors.tc | 9 ++
.../ftrace/test.d/dynevent/fprobe_syntax_errors.tc | 12 ++
.../ftrace/test.d/kprobe/kprobe_syntax_errors.tc | 12 ++
.../ftrace/test.d/kprobe/uprobe_syntax_errors.tc | 5 +
8 files changed, 265 insertions(+), 5 deletions(-)
create mode 100644 tools/testing/selftests/ftrace/test.d/dynevent/btf_probe_event.tc
create mode 100644 tools/testing/selftests/ftrace/test.d/dynevent/btf_typecast_accepted.tc
diff --git a/samples/trace_events/trace-events-sample.c b/samples/trace_events/trace-events-sample.c
index 0b7a6efdb247..ca5d98c360cb 100644
--- a/samples/trace_events/trace-events-sample.c
+++ b/samples/trace_events/trace-events-sample.c
@@ -94,6 +94,20 @@ static int simple_thread_fn(void *arg)
static DEFINE_MUTEX(thread_mutex);
static int simple_thread_cnt;
+static struct foo_timer_data *foo_timer_data;
+
+static void sample_timer_cb(struct timer_list *t)
+{
+ struct foo_timer_data *data = container_of(t, struct foo_timer_data, timer);
+
+ get_cpu();
+ trace_foo_timer_fn(data);
+ (*this_cpu_ptr(data->counter))++;
+ put_cpu();
+
+ mod_timer(t, jiffies + HZ);
+}
+
int foo_bar_reg(void)
{
mutex_lock(&thread_mutex);
@@ -132,9 +146,27 @@ void foo_bar_unreg(void)
static int __init trace_event_init(void)
{
+ foo_timer_data = kzalloc_obj(*foo_timer_data, GFP_KERNEL);
+ if (!foo_timer_data)
+ return -ENOMEM;
+
+ foo_timer_data->name = "sample_timer_counter";
+ foo_timer_data->counter = alloc_percpu(int);
+ if (!foo_timer_data->counter) {
+ kfree(foo_timer_data);
+ return -ENOMEM;
+ }
+
+ timer_setup(&foo_timer_data->timer, sample_timer_cb, 0);
+ mod_timer(&foo_timer_data->timer, jiffies + HZ);
+
simple_tsk = kthread_run(simple_thread, NULL, "event-sample");
- if (IS_ERR(simple_tsk))
- return -1;
+ if (IS_ERR(simple_tsk)) {
+ timer_shutdown_sync(&foo_timer_data->timer);
+ free_percpu(foo_timer_data->counter);
+ kfree(foo_timer_data);
+ return PTR_ERR(simple_tsk);
+ }
return 0;
}
@@ -147,6 +179,10 @@ static void __exit trace_event_exit(void)
kthread_stop(simple_tsk_fn);
simple_tsk_fn = NULL;
mutex_unlock(&thread_mutex);
+
+ timer_shutdown_sync(&foo_timer_data->timer);
+ free_percpu(foo_timer_data->counter);
+ kfree(foo_timer_data);
}
module_init(trace_event_init);
diff --git a/samples/trace_events/trace-events-sample.h b/samples/trace_events/trace-events-sample.h
index 1a05fc153353..816848a456a2 100644
--- a/samples/trace_events/trace-events-sample.h
+++ b/samples/trace_events/trace-events-sample.h
@@ -247,12 +247,14 @@
*/
/*
- * It is OK to have helper functions in the file, but they need to be protected
- * from being defined more than once. Remember, this file gets included more
- * than once.
+ * It is OK to have helper functions and data structures in the file, but they
+ * need to be protected from being defined more than once. Remember, this file
+ * gets included more than once.
*/
#ifndef __TRACE_EVENT_SAMPLE_HELPER_FUNCTIONS
#define __TRACE_EVENT_SAMPLE_HELPER_FUNCTIONS
+#include <linux/timer.h>
+
static inline int __length_of(const int *list)
{
int i;
@@ -270,6 +272,13 @@ enum {
TRACE_SAMPLE_BAR = 4,
TRACE_SAMPLE_ZOO = 8,
};
+
+struct foo_timer_data {
+ const char *name;
+ struct timer_list timer;
+ int __percpu *counter;
+};
+
#endif
/*
@@ -595,6 +604,25 @@ TRACE_EVENT(foo_rel_loc,
__get_rel_bitmask(bitmask),
__get_rel_cpumask(cpumask))
);
+
+TRACE_EVENT(foo_timer_fn,
+
+ TP_PROTO(struct foo_timer_data *data),
+
+ TP_ARGS(data),
+
+ TP_STRUCT__entry(
+ __string( name, data->name )
+ __field( int, count )
+ ),
+
+ TP_fast_assign(
+ __assign_str(name);
+ __entry->count = *this_cpu_ptr(data->counter);
+ ),
+
+ TP_printk("name=%s count=%d", __get_str(name), __entry->count)
+);
#endif
/***** NOTICE! The #if protection ends here. *****/
diff --git a/tools/testing/selftests/ftrace/test.d/dynevent/btf_probe_event.tc b/tools/testing/selftests/ftrace/test.d/dynevent/btf_probe_event.tc
new file mode 100644
index 000000000000..96791e120b7d
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/dynevent/btf_probe_event.tc
@@ -0,0 +1,51 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: BTF event with typecast and percpu access
+# requires: dynamic_events "this_cpu_read(<fetcharg>)":README "[(structname[,field])]<argname>[->field[->field|.field...]]":README
+
+# Check if the sample module is loaded
+if ! lsmod | grep -q trace_events_sample; then
+ modprobe trace-events-sample || exit_unsupported
+fi
+
+echo 0 > events/enable
+echo > dynamic_events
+
+# The sample_timer_cb(struct timer_list *t) is called.
+# We want to check (STRUCT,FIELD)VAR typecast and this_cpu_read() access.
+# (foo_timer_data,timer)t converts t to struct foo_timer_data * using container_of.
+# data->counter is a per-cpu pointer to int.
+# this_cpu_read(data->counter) should give the value of the counter.
+
+echo 'f:mysample/myevent sample_timer_cb name=(foo_timer_data,timer)t->name:string count=this_cpu_read((foo_timer_data,timer)t->counter)' >> dynamic_events
+
+echo 1 > events/mysample/myevent/enable
+echo 1 > events/sample-trace/foo_timer_fn/enable
+
+sleep 2
+
+echo 0 > events/mysample/myevent/enable
+echo 0 > events/sample-trace/foo_timer_fn/enable
+
+# Compare the values.
+MATCH=0
+while read line; do
+ if echo $line | grep -q "foo_timer_fn:"; then
+ NAME=`echo $line | sed 's/.*name=\([^ ]*\) .*/\1/'`
+ COUNT=`echo $line | sed 's/.*count=\([^ ]*\).*/\1/'`
+ if grep -q "myevent:.*name=\"${NAME}\" count=$COUNT" trace; then
+ MATCH=$((MATCH+1))
+ fi
+ fi
+done < trace
+
+if [ $MATCH -eq 0 ]; then
+ echo "No matching events found"
+ exit_fail
+fi
+
+# Clean up
+echo 0 > events/mysample/myevent/enable
+echo 0 > events/sample-trace/foo_timer_fn/enable
+echo > dynamic_events
+clear_trace
diff --git a/tools/testing/selftests/ftrace/test.d/dynevent/btf_typecast_accepted.tc b/tools/testing/selftests/ftrace/test.d/dynevent/btf_typecast_accepted.tc
new file mode 100644
index 000000000000..acf0b5a917d3
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/dynevent/btf_typecast_accepted.tc
@@ -0,0 +1,107 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: BTF typecast and percpu access syntax validation
+# requires: dynamic_events "this_cpu_read(<fetcharg>)":README "[(structname[,field])]<argname>[->field[->field|.field...]]":README
+
+KPROBES=
+FPROBES=
+
+if grep -qF "p[:[<group>/][<event>]] <place> [<args>]" README ; then
+ KPROBES=yes
+fi
+if grep -qF "f[:[<group>/][<event>]] <func-name>[%return] [<args>]" README ; then
+ FPROBES=yes
+fi
+
+if [ -z "$KPROBES" -a -z "$FPROBES" ] ; then
+ exit_unsupported
+fi
+
+echo 0 > events/enable
+echo > dynamic_events
+
+# Load trace-events-sample module if available to have per-CPU counter structure defined
+if ! lsmod | grep -q trace_events_sample; then
+ modprobe trace-events-sample || true
+fi
+
+if [ "$FPROBES" ] ; then
+ # 1. Test basic typecast on fprobe
+ echo 'f:fpevent1 vfs_read name=(file)file->f_path.dentry->d_name.name:string' >> dynamic_events
+ # 2. Test parenthesized typecast target on fprobe
+ echo 'f:fpevent2 vfs_read name=(file)(file)->f_path.dentry->d_name.name:string' >> dynamic_events
+ # 3. Test nested typecasts on fprobe
+ echo 'f:fpevent3 vfs_read name=(dentry)((file)file->f_path.dentry)->d_name.name:string' >> dynamic_events
+ # 4. Test container_of-style typecast with field option on fprobe
+ echo 'f:fpevent4 vfs_read name=(file,f_path)file->f_mode' >> dynamic_events
+ # 5. Test typecast on return value on fprobe
+ echo 'f:fpevent5 vfs_read%return name=(file)$retval->f_path.dentry->d_name.name:string' >> dynamic_events
+ # 6. Test $current variable support on fprobe
+ echo 'f:fpevent6 vfs_read pid=$current->pid' >> dynamic_events
+ echo 'f:fpevent7 vfs_read pid=(task_struct)$current->pid' >> dynamic_events
+ echo 'f:fpevent8 vfs_read pid=(task_struct,group_leader)$current->pid' >> dynamic_events
+
+ # Test this_cpu_read and this_cpu_ptr on fprobe
+ if lsmod | grep -q trace_events_sample; then
+ echo 'f:fpevent9 sample_timer_cb name=(foo_timer_data,timer)t->name:string count=this_cpu_read((foo_timer_data,timer)t->counter)' >> dynamic_events
+ echo 'f:fpevent10 sample_timer_cb ptr=this_cpu_ptr((foo_timer_data,timer)t->counter)' >> dynamic_events
+ fi
+fi
+
+if [ "$KPROBES" ] ; then
+ # 7. Test basic typecast on kprobe
+ echo 'p:kpevent1 vfs_read name=(file)file->f_path.dentry->d_name.name:string' >> dynamic_events
+ # 8. Test parenthesized typecast target on kprobe
+ echo 'p:kpevent2 vfs_read name=(file)(file)->f_path.dentry->d_name.name:string' >> dynamic_events
+ # 9. Test nested typecasts on kprobe
+ echo 'p:kpevent3 vfs_read name=(dentry)((file)file->f_path.dentry)->d_name.name:string' >> dynamic_events
+ # 10. Test container_of-style typecast with field option on kprobe
+ echo 'p:kpevent4 vfs_read name=(file,f_path)file->f_mode' >> dynamic_events
+ # 11. Test typecast on return value on kretprobe
+ echo 'r:kpevent5 vfs_read name=(file)$retval->f_path.dentry->d_name.name:string' >> dynamic_events
+ # 12. Test $current variable support on kprobe
+ echo 'p:kpevent6 vfs_read pid=$current->pid' >> dynamic_events
+ echo 'p:kpevent7 vfs_read pid=(task_struct)$current->pid' >> dynamic_events
+ echo 'p:kpevent8 vfs_read pid=(task_struct,group_leader)$current->pid' >> dynamic_events
+
+ # Test this_cpu_read and this_cpu_ptr on kprobe
+ if lsmod | grep -q trace_events_sample; then
+ echo 'p:kpevent9 sample_timer_cb name=(foo_timer_data,timer)t->name:string count=this_cpu_read((foo_timer_data,timer)t->counter)' >> dynamic_events
+ echo 'p:kpevent10 sample_timer_cb ptr=this_cpu_ptr((foo_timer_data,timer)t->counter)' >> dynamic_events
+ fi
+fi
+
+# Verify the events exist in dynamic_events
+if [ "$FPROBES" ] ; then
+ grep -q "fpevent1 " dynamic_events
+ grep -q "fpevent2 " dynamic_events
+ grep -q "fpevent3 " dynamic_events
+ grep -q "fpevent4 " dynamic_events
+ grep -q "fpevent5 " dynamic_events
+ grep -q "fpevent6 " dynamic_events
+ grep -q "fpevent7 " dynamic_events
+ grep -q "fpevent8 " dynamic_events
+ if lsmod | grep -q trace_events_sample; then
+ grep -q "fpevent9 " dynamic_events
+ grep -q "fpevent10 " dynamic_events
+ fi
+fi
+
+if [ "$KPROBES" ] ; then
+ grep -q "kpevent1 " dynamic_events
+ grep -q "kpevent2 " dynamic_events
+ grep -q "kpevent3 " dynamic_events
+ grep -q "kpevent4 " dynamic_events
+ grep -q "kpevent5 " dynamic_events
+ grep -q "kpevent6 " dynamic_events
+ grep -q "kpevent7 " dynamic_events
+ grep -q "kpevent8 " dynamic_events
+ if lsmod | grep -q trace_events_sample; then
+ grep -q "kpevent9 " dynamic_events
+ grep -q "kpevent10 " dynamic_events
+ fi
+fi
+
+# Clean up
+echo > dynamic_events
+clear_trace
diff --git a/tools/testing/selftests/ftrace/test.d/dynevent/eprobes_syntax_errors.tc b/tools/testing/selftests/ftrace/test.d/dynevent/eprobes_syntax_errors.tc
index 0e65e787e426..1d6d1cf94f16 100644
--- a/tools/testing/selftests/ftrace/test.d/dynevent/eprobes_syntax_errors.tc
+++ b/tools/testing/selftests/ftrace/test.d/dynevent/eprobes_syntax_errors.tc
@@ -21,8 +21,17 @@ check_error 'e:foo/^bar.1 syscalls/sys_enter_openat' # BAD_EVENT_NAME
check_error 'e:foo/bar syscalls/sys_enter_openat arg=^$foo' # BAD_ATTACH_ARG
+check_error 'e:foo/bar syscalls/sys_enter_openat arg=^COMM' # NO_EVENT_FIELD
+if grep -q '\\$current' README; then
+ check_error 'e:foo/bar syscalls/sys_enter_openat arg=^current' # NO_EVENT_FIELD
+fi
+
if grep -q '<attached-group>\.<attached-event>.*\[if <filter>\]' README; then
check_error 'e:foo/bar syscalls/sys_enter_openat if ^' # NO_EP_FILTER
fi
+if grep -q 'this_cpu_read(<fetcharg>)' README; then
+ check_error 'e:foo/bar syscalls/sys_enter_openat arg=^this_cpu_read(file)' # NOSUP_PERCPU
+fi
+
exit 0
diff --git a/tools/testing/selftests/ftrace/test.d/dynevent/fprobe_syntax_errors.tc b/tools/testing/selftests/ftrace/test.d/dynevent/fprobe_syntax_errors.tc
index fee479295e2f..e9d7e6919c7f 100644
--- a/tools/testing/selftests/ftrace/test.d/dynevent/fprobe_syntax_errors.tc
+++ b/tools/testing/selftests/ftrace/test.d/dynevent/fprobe_syntax_errors.tc
@@ -112,6 +112,18 @@ check_error 'f vfs_read%return $retval->^foo' # NO_PTR_STRCT
check_error 'f vfs_read file->^foo' # NO_BTF_FIELD
check_error 'f vfs_read file^-.foo' # BAD_HYPHEN
check_error 'f vfs_read ^file:string' # BAD_TYPE4STR
+if grep -qF "[(structname" README ; then
+check_error 'f vfs_read arg1=(task_struct)file^' # TYPECAST_REQ_FIELD
+check_error 'f vfs_read arg1=(a)((b)((c)(^(d)file->d)->c)->b)->a' # TOO_MANY_NESTED
+check_error 'f vfs_read arg1=(task_struct,^in_execve)file->comm' # TYPECAST_NOT_ALIGNED
+check_error 'f vfs_read arg1=(task_struct,^foo_bar)file->pid' # NO_BTF_FIELD
+check_error 'f vfs_read arg1=(^task_struct1234)file->pid' # NO_PTR_STRCT
+check_error 'f vfs_read arg1=(task_struct,se^->group_node)file->comm' # TYPECAST_BAD_ARROW
+check_error 'f vfs_read arg1=(task_struct,^->pid)file->comm' # NO_BTF_FIELD
+check_error 'f vfs_read arg1=(task_struct,^.pid)file->comm' # NO_BTF_FIELD
+check_error 'f vfs_read arg1=(task_struct,^.)file->comm' # NO_BTF_FIELD
+check_error 'f vfs_read arg1=(task_struct)^@symbol+10->comm' # TYPECAST_SYM_OFFSET
+fi
fi
else
diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_syntax_errors.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_syntax_errors.tc
index 8f1c58f0c239..21ce8414459f 100644
--- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_syntax_errors.tc
+++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_syntax_errors.tc
@@ -115,6 +115,18 @@ check_error 'p vfs_read+20 ^$arg*' # NOFENTRY_ARGS
check_error 'p vfs_read ^hoge' # NO_BTFARG
check_error 'p kfree ^$arg10' # NO_BTFARG (exceed the number of parameters)
check_error 'r kfree ^$retval' # NO_RETVAL
+if grep -qF "[(structname" README ; then
+check_error 'p vfs_read arg1=(task_struct)file^' # TYPECAST_REQ_FIELD
+check_error 'p vfs_read arg1=(a)((b)((c)(^(d)file->d)->c)->b)->a' # TOO_MANY_NESTED
+check_error 'p vfs_read arg1=(task_struct,^in_execve)file->comm' # TYPECAST_NOT_ALIGNED
+check_error 'p vfs_read arg1=(task_struct,^foo_bar)file->pid' # NO_BTF_FIELD
+check_error 'p vfs_read arg1=(^task_struct1234)file->pid' # NO_PTR_STRCT
+check_error 'p vfs_read arg1=(task_struct,se^->group_node)file->comm' # TYPECAST_BAD_ARROW
+check_error 'p vfs_read arg1=(task_struct,^->pid)file->comm' # NO_BTF_FIELD
+check_error 'p vfs_read arg1=(task_struct,^.pid)file->comm' # NO_BTF_FIELD
+check_error 'p vfs_read arg1=(task_struct,^.)file->comm' # NO_BTF_FIELD
+check_error 'p vfs_read arg1=(task_struct)^@symbol+10->comm' # TYPECAST_SYM_OFFSET
+fi
else
check_error 'p vfs_read ^$arg*' # NOSUP_BTFARG
fi
diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/uprobe_syntax_errors.tc b/tools/testing/selftests/ftrace/test.d/kprobe/uprobe_syntax_errors.tc
index c817158b99db..e12dc967ec76 100644
--- a/tools/testing/selftests/ftrace/test.d/kprobe/uprobe_syntax_errors.tc
+++ b/tools/testing/selftests/ftrace/test.d/kprobe/uprobe_syntax_errors.tc
@@ -28,4 +28,9 @@ if grep -q ".*symstr.*" README; then
check_error 'p /bin/sh:10 $stack0:^symstr' # BAD_TYPE
fi
+# $current is not supported by uprobe
+if grep -q "\$current.*" README; then
+check_error 'p /bin/sh:10 ^$current:u8' # BAD_VAR
+fi
+
exit 0
* Re: [PATCH v7 0/9] bootconfig: embed kernel.* cmdline at build time
From: Masami Hiramatsu @ 2026-06-26 14:33 UTC (permalink / raw)
To: Breno Leitao
Cc: Andrew Morton, Nathan Chancellor, paulmck, Nicolas Schier,
Nick Desaulniers, Bill Wendling, Justin Stitt, Jonathan Corbet,
Shuah Khan, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, linux-kernel,
linux-trace-kernel, linux-kbuild, bpf, llvm, linux-doc,
kernel-team, Nicolas Schier
In-Reply-To: <20260626-bootconfig_using_tools-v7-0-24ab72139c29@debian.org>
On Fri, 26 Jun 2026 05:50:09 -0700
Breno Leitao <leitao@debian.org> wrote:
> The userspace pieces (xbc_snprint_cmdline() in lib/, tools/bootconfig -C)
> already landed; this series wires the rendered cmdline into the kernel.
>
> Motivation: today the embedded bootconfig is parsed at runtime, after
> parse_early_param() has already run, so early_param() handlers can't
> see embedded values. Folding the kernel.* subtree into the cmdline at
> build time gives a CONFIG_CMDLINE-equivalent for embedded-bootconfig
> users without forcing them to maintain two cmdline sources.
>
> Behaviorally, the "kernel" subtree is rendered to a flat string at
> build time and stashed in .init.rodata. setup_arch() prepends it to
> boot_command_line before parse_early_param() runs. Overflow is a soft
> error: the helper logs and leaves boot_command_line untouched rather
> than panicking, so an oversized embedded bconf cannot brick a boot.
>
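The soft-overflow behavior described above can be illustrated with a
minimal userspace sketch. It assumes a hypothetical helper,
prepend_cmdline(), operating on a fixed-size buffer; it is not the
series' xbc_prepend_embedded_cmdline(), whose signature and logging are
not shown in this message. The point is only that the length check
happens before any byte is moved, so a failure leaves the buffer intact.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: prepend @extra and a separating space to the
 * NUL-terminated string in @cmdline, whose total capacity is @size.
 * Returns 0 on success and -1 if the result would not fit. On failure
 * @cmdline is left untouched.
 */
static int prepend_cmdline(char *cmdline, size_t size, const char *extra)
{
	size_t elen = strlen(extra);
	size_t clen = strlen(cmdline);

	if (elen == 0)
		return 0;	/* nothing to prepend */

	/* Room needed: extra + one space + existing line + NUL. */
	if (elen + 1 + clen + 1 > size) {
		fprintf(stderr, "embedded cmdline too long, ignoring it\n");
		return -1;
	}

	/* Shift the existing string, including its NUL, to the right. */
	memmove(cmdline + elen + 1, cmdline, clen + 1);
	memcpy(cmdline, extra, elen);
	cmdline[elen] = ' ';
	return 0;
}
```

With a 32-byte buffer holding "root=/dev/sda1", prepending "quiet"
succeeds; with a 16-byte buffer the same call returns -1 and the
original string is still there.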
Thanks for the update! This looks good to me.
Let me pick it and test it.
Thanks,
> Signed-off-by: Breno Leitao <leitao@debian.org>
> ---
> Changes in v7:
> - The runtime opt-in now shares one helper instead of open-coding its
> own. (Masami)
> - bootconfig_cmdline_requested() moved into generic lib code (Masami)
> - Link to v6: https://lore.kernel.org/r/20260623-bootconfig_using_tools-v6-0-640c2f587a3c@debian.org
>
> Changes in v6:
> - renamed CONFIG_BOOT_CONFIG_EMBED_CMDLINE to
> CONFIG_CMDLINE_FROM_BOOTCONFIG
> - prepend embedded bootconfig cmdline before parse_early_param
> - Link to v5: https://lore.kernel.org/r/20260617-bootconfig_using_tools-v5-0-fd589a9cc5e3@debian.org
>
> Changes in v5:
> - Patch 3 (Kconfig): drop the redundant "depends on BOOT_CONFIG_EMBED"
> from CMDLINE_FROM_BOOTCONFIG; Julian Braha.
> - Patch 6 (Documentation): spell out how the embedded cmdline interacts
> with the bootloader cmdline, an initrd bootconfig, and the embedded
> bootconfig
> - Link to v4: https://lore.kernel.org/r/20260609-bootconfig_using_tools-v4-0-73c463f03a97@debian.org
>
> Changes in v4:
> - Patch 3 (build pipeline): clear CROSS_COMPILE= in the kernel-side
> tools/bootconfig sub-make. Without it, an LLVM=1 cross build
> inherits CROSS_COMPILE and tools/scripts/Makefile.include injects
> --target=/--sysroot= into the host clang, producing a target
> binary that fails to exec.
> - Patch 3 (build pipeline): place embedded-cmdline.S in its own
> .init.rodata.embed_cmdline subsection ("a") so ld.lld does not
> see a section-type mismatch against lib/bootconfig-data.S's
> writable .init.rodata ("aw"). The linker's *(.init.rodata
> .init.rodata.*) glob still folds it into the init image.
> - Patch 6 (x86/setup): also accept the bootconfig=<anything> form
> via cmdline_find_option(), matching the runtime parse_args() loop.
> Without it, bootconfig=0/=off would skip the early prepend but
> still trigger the late runtime apply -- a split-brain state.
> - New patch 7: document CONFIG_CMDLINE_FROM_BOOTCONFIG in
> Documentation/admin-guide/bootconfig.rst (semantics, opt-in,
> precedence, overflow behavior, example).
> - Link to v3: https://lore.kernel.org/r/20260608-bootconfig_using_tools-v3-0-4ddd079a0696@debian.org
>
> Changes in v3:
> - Patch 3: Move HOSTCC override to the kernel-side rule; tool keeps
> $(CC) for standalone/cross builds.
> - Patch 6: Drop the false fail-safe wording; document the
> BOOT_CONFIG_FORCE=y default interaction.
> - Link to v2: https://lore.kernel.org/r/20260605-bootconfig_using_tools-v2-0-d309f544b5f7@debian.org
>
> Changes in v2 (addressing review of v1):
> - Split out a standalone fix for the NULL-pointer arithmetic in
> xbc_snprint_cmdline() so the build-time render cannot trip host
> UBSan/FORTIFY_SOURCE.
> - Rework the leaf-root handling: instead of returning early, skip @root
> inside the loop so a root carrying both a value and subkeys
> (kernel = x together with kernel.foo = bar) still renders its
> descendant keys.
> - Build tools/bootconfig with $(HOSTCC) so cross-compiled (ARCH=...)
> builds render the cmdline on the build host instead of failing with
> "Exec format error".
> - Mark the embedded cmdline section read-only (drop the "w" flag from
> .init.rodata).
> - Add a make-clean hook so tools/bootconfig artifacts are removed by
> make clean.
> - Gate the x86 prepend on "bootconfig" being present on the command
> line (or CONFIG_BOOT_CONFIG_FORCE), matching the init.* opt-in
> semantics documented in bootconfig.rst and preserving fail-safe
> recovery: dropping "bootconfig" from the bootloader cmdline now also
> disables the embedded kernel.* keys.
> - Link to v1: https://patch.msgid.link/20260527-bootconfig_using_tools-v1-0-b6906a86e7d5@debian.org
>
> ---
> Breno Leitao (9):
> bootconfig: fix NULL-pointer arithmetic in xbc_snprint_cmdline()
> bootconfig: render descendant keys when xbc_snprint_cmdline() root has a value
> bootconfig: render embedded bootconfig as a kernel cmdline at build time
> bootconfig: clean build-time tools/bootconfig from make clean
> bootconfig: add xbc_prepend_embedded_cmdline() helper
> Documentation: bootconfig: document build-time cmdline rendering
> x86/setup: prepend embedded bootconfig cmdline before parse_early_param
> bootconfig: skip runtime kernel.* render once prepended early
> init/main.c: use bootconfig_cmdline_requested() for the runtime opt-in
>
> Documentation/admin-guide/bootconfig.rst | 81 ++++++++++++++++
> MAINTAINERS | 1 +
> Makefile | 27 +++++-
> arch/x86/Kconfig | 1 +
> arch/x86/kernel/setup.c | 14 ++-
> include/linux/bootconfig.h | 14 +++
> init/Kconfig | 36 +++++++
> init/main.c | 52 +++++-----
> lib/Makefile | 16 +++
> lib/bootconfig.c | 162 +++++++++++++++++++++++++++++--
> lib/embedded-cmdline.S | 16 +++
> tools/bootconfig/Makefile | 4 +-
> 12 files changed, 388 insertions(+), 36 deletions(-)
> ---
> base-commit: a87737435cfa134f9cdcc696ba3080759d04cf72
> change-id: 20260508-bootconfig_using_tools-cfa7aa9d6a5a
>
> Best regards,
> --
> Breno Leitao <leitao@debian.org>
>
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
* Re: [PATCH v7 0/9] bootconfig: embed kernel.* cmdline at build time
From: Breno Leitao @ 2026-06-26 14:53 UTC (permalink / raw)
To: Masami Hiramatsu
Cc: Andrew Morton, Nathan Chancellor, paulmck, Nicolas Schier,
Nick Desaulniers, Bill Wendling, Justin Stitt, Jonathan Corbet,
Shuah Khan, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, linux-kernel,
linux-trace-kernel, linux-kbuild, bpf, llvm, linux-doc,
kernel-team, Nicolas Schier
In-Reply-To: <20260626233327.b5c9c8de494acdde4ddf5c02@kernel.org>
On Fri, Jun 26, 2026 at 11:33:27PM +0900, Masami Hiramatsu wrote:
> > The userspace pieces (xbc_snprint_cmdline() in lib/, tools/bootconfig -C)
> > already landed; this series wires the rendered cmdline into the kernel.
> >
> > Motivation: today the embedded bootconfig is parsed at runtime, after
> > parse_early_param() has already run, so early_param() handlers can't
> > see embedded values. Folding the kernel.* subtree into the cmdline at
> > build time gives a CONFIG_CMDLINE-equivalent for embedded-bootconfig
> > users without forcing them to maintain two cmdline sources.
> >
> > Behaviorally, the "kernel" subtree is rendered to a flat string at
> > build time and stashed in .init.rodata. setup_arch() prepends it to
> > boot_command_line before parse_early_param() runs. Overflow is a soft
> > error: the helper logs and leaves boot_command_line untouched rather
> > than panicking, so an oversized embedded bconf cannot brick a boot.
> >
>
> Thanks for the update! This looks good to me.
> Let me pick it and test it.
This is great. Thanks for it and for the support so far.
--breno