* [PATCH] tracing: samples: avoid warning about __aeabi_unwind_cpp_pr1
From: Arnd Bergmann @ 2026-03-23 10:56 UTC (permalink / raw)
To: Steven Rostedt, Masami Hiramatsu, Nathan Chancellor, Marc Zyngier,
Vincent Donnefort
Cc: Arnd Bergmann, Mathieu Desnoyers, linux-kernel,
linux-trace-kernel
From: Arnd Bergmann <arnd@arndb.de>
The now more verbose check found another symbol missing from the whitelist:
Unexpected symbols in kernel/trace/simple_ring_buffer.o:
U __aeabi_unwind_cpp_pr1
Add this to the Makefile.
Fixes: 1211907ac0b5 ("tracing: Generate undef symbols allowlist for simple_ring_buffer")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
kernel/trace/Makefile | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
index d662c1a64cd5..aba6a25db17b 100644
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -169,8 +169,8 @@ targets += undefsyms_base.o
# because it is not linked into vmlinux.
KASAN_SANITIZE_undefsyms_base.o := y
-UNDEFINED_ALLOWLIST = __asan __gcov __kasan __kcsan __hwasan __sancov __sanitizer __tsan __ubsan __x86_indirect_thunk \
- __msan simple_ring_buffer \
+UNDEFINED_ALLOWLIST = __asan __gcov __kasan __kcsan __hwasan __sancov __sanitizer __tsan __ubsan __msan \
+ __x86_indirect_thunk __aeabi_unwind_cpp simple_ring_buffer \
$(shell $(NM) -u $(obj)/undefsyms_base.o 2>/dev/null | awk '{print $$2}')
quiet_cmd_check_undefined = NM $<
--
2.39.5
^ permalink raw reply related
* Re: [PATCH] coredump: add tracepoint for coredump events
From: Breno Leitao @ 2026-03-23 11:41 UTC (permalink / raw)
To: Steven Rostedt
Cc: Christian Brauner, Alexander Viro, Jan Kara, Masami Hiramatsu,
Mathieu Desnoyers, linux-kernel, linux-fsdevel,
linux-trace-kernel, bpf, kernel-team, Andrii Nakryiko
In-Reply-To: <20260320144834.079ba241@gandalf.local.home>
On Fri, Mar 20, 2026 at 02:48:34PM -0400, Steven Rostedt wrote:
> On Fri, 20 Mar 2026 14:21:23 +0100
> Christian Brauner <brauner@kernel.org> wrote:
>
> > > +TRACE_EVENT(coredump,
> > > +
> > > + TP_PROTO(int sig),
> > > +
> > > + TP_ARGS(sig),
> > > +
> > > + TP_STRUCT__entry(
> > > + __field(int, sig)
> > > + __array(char, comm, TASK_COMM_LEN)
> > > + __field(pid_t, pid)
> > > + ),
> > > +
> > > + TP_fast_assign(
> > > + __entry->sig = sig;
> > > + memcpy(__entry->comm, current->comm, TASK_COMM_LEN);
> > > + __entry->pid = current->pid;
> >
> > That's the TID as seen in the global pid namespace.
> > I assume this is what you want but worth noting.
>
> Not to mention the pid is saved in all trace events and is available for
> perf and bpf too. Even the change log showed it:
>
> sleep-634 [036] ..... 145.222206: coredump: sig=11 comm=sleep pid=634
>
> ^^^ ^^^
>
> So it should not be included. It's duplicate and only wastes space. Now if
> you wanted to save the name space pid, that may be useful.
In my use case, I don't need the namespace pid since I'm primarily
focused on system-wide monitoring, and the global pid is sufficient
for my purposes. Thanks for the heads-up.
I'll update the patch to remove the pid field.
Thanks,
--breno
^ permalink raw reply
* [PATCH v2] coredump: add tracepoint for coredump events
From: Breno Leitao @ 2026-03-23 11:46 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner, Jan Kara, Steven Rostedt,
Masami Hiramatsu, Mathieu Desnoyers
Cc: linux-kernel, linux-fsdevel, linux-trace-kernel, bpf, kernel-team,
song, Andrii Nakryiko, Breno Leitao
Coredump is a generally useful and interesting event in the lifetime
of a process. Add a tracepoint so it can be monitored through the
standard kernel tracing infrastructure.
BPF-based crash monitoring is an advanced approach that
allows real-time crash interception: by attaching a BPF program at
this point, tools can use bpf_get_stack() with BPF_F_USER_STACK to
capture the user-space stack trace at the exact moment of the crash,
before the process is fully terminated, without waiting for a
coredump file to be written and parsed.
However, there is currently no stable kernel API for this use case.
Existing tools rely on attaching fentry probes to do_coredump(),
which is an internal function whose signature changes across kernel
versions, breaking these tools.
Add a stable tracepoint that fires at the beginning of
do_coredump(), providing BPF programs a reliable attachment point.
At tracepoint time, the crashing process context is still live, so
BPF programs can call bpf_get_stack() with BPF_F_USER_STACK to
extract the user-space backtrace.
The tracepoint records:
- sig: signal number that triggered the coredump
- comm: process name
Example output:
$ echo 1 > /sys/kernel/tracing/events/coredump/coredump/enable
$ sleep 999 &
$ kill -SEGV $!
$ cat /sys/kernel/tracing/trace
# TASK-PID CPU# ||||| TIMESTAMP FUNCTION
# | | | ||||| | |
sleep-634 [036] ..... 145.222206: coredump: sig=11 comm=sleep
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
---
Changes in v2:
- Remove the pid from the tracpoint message, given pid is saved in all
trace events (Christian, Steven)
- Link to v1:
https://patch.msgid.link/20260320-coredump_tracepoint-v1-1-34864746cbb3@debian.org
---
fs/coredump.c | 5 +++++
include/trace/events/coredump.h | 45 +++++++++++++++++++++++++++++++++++++++++
2 files changed, 50 insertions(+)
diff --git a/fs/coredump.c b/fs/coredump.c
index 29df8aa19e2e7..bb6fdb1f458e9 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -63,6 +63,9 @@
#include <trace/events/sched.h>
+#define CREATE_TRACE_POINTS
+#include <trace/events/coredump.h>
+
static bool dump_vma_snapshot(struct coredump_params *cprm);
static void free_vma_snapshot(struct coredump_params *cprm);
@@ -1090,6 +1093,8 @@ static inline bool coredump_skip(const struct coredump_params *cprm,
static void do_coredump(struct core_name *cn, struct coredump_params *cprm,
size_t **argv, int *argc, const struct linux_binfmt *binfmt)
{
+ trace_coredump(cprm->siginfo->si_signo);
+
if (!coredump_parse(cn, cprm, argv, argc)) {
coredump_report_failure("format_corename failed, aborting core");
return;
diff --git a/include/trace/events/coredump.h b/include/trace/events/coredump.h
new file mode 100644
index 0000000000000..c7b9c53fc4986
--- /dev/null
+++ b/include/trace/events/coredump.h
@@ -0,0 +1,45 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2026 Meta Platforms, Inc. and affiliates.
+ * Copyright (c) 2026 Breno Leitao <leitao@debian.org>
+ */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM coredump
+
+#if !defined(_TRACE_COREDUMP_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_COREDUMP_H
+
+#include <linux/sched.h>
+#include <linux/tracepoint.h>
+
+/**
+ * coredump - called when a coredump starts
+ * @sig: signal number that triggered the coredump
+ *
+ * This tracepoint fires at the beginning of a coredump attempt,
+ * providing a stable interface for monitoring coredump events.
+ */
+TRACE_EVENT(coredump,
+
+ TP_PROTO(int sig),
+
+ TP_ARGS(sig),
+
+ TP_STRUCT__entry(
+ __field(int, sig)
+ __array(char, comm, TASK_COMM_LEN)
+ ),
+
+ TP_fast_assign(
+ __entry->sig = sig;
+ memcpy(__entry->comm, current->comm, TASK_COMM_LEN);
+ ),
+
+ TP_printk("sig=%d comm=%s",
+ __entry->sig, __entry->comm)
+);
+
+#endif /* _TRACE_COREDUMP_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
---
base-commit: b5d083a3ed1e2798396d5e491432e887da8d4a06
change-id: 20260320-coredump_tracepoint-4de4399ce1b6
Best regards,
--
Breno Leitao <leitao@debian.org>
^ permalink raw reply related
* Re: [PATCH v11 4/5] ring-buffer: Skip invalid sub-buffers when rewinding persistent ring buffer
From: Masami Hiramatsu @ 2026-03-23 11:50 UTC (permalink / raw)
To: kernel test robot
Cc: Steven Rostedt, llvm, oe-kbuild-all, Mathieu Desnoyers,
linux-kernel, linux-trace-kernel, Ian Rogers
In-Reply-To: <202603230725.uMAZiKJx-lkp@intel.com>
On Mon, 23 Mar 2026 07:18:07 +0800
kernel test robot <lkp@intel.com> wrote:
> Hi Masami,
>
> kernel test robot noticed the following build errors:
>
> [auto build test ERROR on trace/for-next]
> [also build test ERROR on geert-m68k/for-next geert-m68k/for-linus openrisc/for-next deller-parisc/for-next powerpc/next powerpc/fixes s390/features uml/next tip/x86/core uml/fixes v7.0-rc4 next-20260320]
> [cannot apply to linus/master]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
>
> url: https://github.com/intel-lab-lkp/linux/commits/Masami-Hiramatsu-Google/ring-buffer-Fix-to-update-per-subbuf-entries-of-persistent-ring-buffer/20260322-122412
> base: https://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace for-next
> patch link: https://lore.kernel.org/r/177391156211.193994.7531495945584650297.stgit%40mhiramat.tok.corp.google.com
> patch subject: [PATCH v11 4/5] ring-buffer: Skip invalid sub-buffers when rewinding persistent ring buffer
> config: x86_64-kexec (https://download.01.org/0day-ci/archive/20260323/202603230725.uMAZiKJx-lkp@intel.com/config)
> compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260323/202603230725.uMAZiKJx-lkp@intel.com/reproduce)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202603230725.uMAZiKJx-lkp@intel.com/
>
> All errors (new ones prefixed by >>):
>
> >> kernel/trace/ring_buffer.c:1965:15: error: use of undeclared identifier 'bpage'
> 1965 | local_set(&bpage->entries, 0);
> | ^
> kernel/trace/ring_buffer.c:1966:15: error: use of undeclared identifier 'bpage'
> 1966 | local_set(&bpage->page->commit, 0);
> | ^
> 2 errors generated.
>
>
> vim +/bpage +1965 kernel/trace/ring_buffer.c
>
> 1910
> 1911 /* If the meta data has been validated, now validate the events */
> 1912 static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
> 1913 {
> 1914 struct ring_buffer_cpu_meta *meta = cpu_buffer->ring_meta;
> 1915 struct buffer_page *head_page, *orig_head;
> 1916 unsigned long entry_bytes = 0;
> 1917 unsigned long entries = 0;
> 1918 int discarded = 0;
> 1919 int ret;
> 1920 u64 ts;
> 1921 int i;
> 1922
> 1923 if (!meta || !meta->head_buffer)
> 1924 return;
> 1925
> 1926 orig_head = head_page = cpu_buffer->head_page;
> 1927
> 1928 /* Do the reader page first */
> 1929 ret = rb_validate_buffer(cpu_buffer->reader_page, cpu_buffer->cpu, meta);
> 1930 if (ret < 0) {
> 1931 pr_info("Ring buffer meta [%d] invalid reader page detected\n",
> 1932 cpu_buffer->cpu);
> 1933 discarded++;
> 1934 } else {
> 1935 entries += ret;
> 1936 entry_bytes += rb_page_size(cpu_buffer->reader_page);
> 1937 }
> 1938
> 1939 ts = head_page->page->time_stamp;
> 1940
> 1941 /*
> 1942 * Try to rewind the head so that we can read the pages which already
> 1943 * read in the previous boot.
> 1944 */
> 1945 if (head_page == cpu_buffer->tail_page)
> 1946 goto skip_rewind;
> 1947
> 1948 rb_dec_page(&head_page);
> 1949 for (i = 0; i < meta->nr_subbufs + 1; i++, rb_dec_page(&head_page)) {
> 1950
> 1951 /* Rewind until tail (writer) page. */
> 1952 if (head_page == cpu_buffer->tail_page)
> 1953 break;
> 1954
> 1955 /* Rewind until unused page (no timestamp, no commit). */
> 1956 if (!head_page->page->time_stamp && rb_page_commit(head_page) == 0)
> 1957 break;
> 1958
> 1959 /*
> 1960 * Skip if the page is invalid, or its timestamp is newer than the
> 1961 * previous valid page.
> 1962 */
> 1963 ret = rb_validate_buffer(head_page, cpu_buffer->cpu, meta);
> 1964 if (ret >= 0 && ts < head_page->page->time_stamp) {
> > 1965 local_set(&bpage->entries, 0);
> 1966 local_set(&bpage->page->commit, 0);
Ooops, sorry, I made a copy & paste mistake. this should be head_page->entries and head_page->page_commit.
Let me send v12 on the latest tracing/fixes.
Thanks,
> 1967 head_page->page->time_stamp = ts;
> 1968 ret = -1;
> 1969 }
> 1970 if (ret < 0) {
> 1971 if (!discarded)
> 1972 pr_info("Ring buffer meta [%d] invalid buffer page detected\n",
> 1973 cpu_buffer->cpu);
> 1974 discarded++;
> 1975 } else {
> 1976 entries += ret;
> 1977 entry_bytes += rb_page_size(head_page);
> 1978 if (ret > 0)
> 1979 local_inc(&cpu_buffer->pages_touched);
> 1980 ts = head_page->page->time_stamp;
> 1981 }
> 1982 }
> 1983 if (i)
> 1984 pr_info("Ring buffer [%d] rewound %d pages\n", cpu_buffer->cpu, i);
> 1985
> 1986 /* The last rewound page must be skipped. */
> 1987 if (head_page != orig_head)
> 1988 rb_inc_page(&head_page);
> 1989
> 1990 /*
> 1991 * If the ring buffer was rewound, then inject the reader page
> 1992 * into the location just before the original head page.
> 1993 */
> 1994 if (head_page != orig_head) {
> 1995 struct buffer_page *bpage = orig_head;
> 1996
> 1997 rb_dec_page(&bpage);
> 1998 /*
> 1999 * Insert the reader_page before the original head page.
> 2000 * Since the list encode RB_PAGE flags, general list
> 2001 * operations should be avoided.
> 2002 */
> 2003 cpu_buffer->reader_page->list.next = &orig_head->list;
> 2004 cpu_buffer->reader_page->list.prev = orig_head->list.prev;
> 2005 orig_head->list.prev = &cpu_buffer->reader_page->list;
> 2006 bpage->list.next = &cpu_buffer->reader_page->list;
> 2007
> 2008 /* Make the head_page the reader page */
> 2009 cpu_buffer->reader_page = head_page;
> 2010 bpage = head_page;
> 2011 rb_inc_page(&head_page);
> 2012 head_page->list.prev = bpage->list.prev;
> 2013 rb_dec_page(&bpage);
> 2014 bpage->list.next = &head_page->list;
> 2015 rb_set_list_to_head(&bpage->list);
> 2016 cpu_buffer->pages = &head_page->list;
> 2017
> 2018 cpu_buffer->head_page = head_page;
> 2019 meta->head_buffer = (unsigned long)head_page->page;
> 2020
> 2021 /* Reset all the indexes */
> 2022 bpage = cpu_buffer->reader_page;
> 2023 meta->buffers[0] = rb_meta_subbuf_idx(meta, bpage->page);
> 2024 bpage->id = 0;
> 2025
> 2026 for (i = 1, bpage = head_page; i < meta->nr_subbufs;
> 2027 i++, rb_inc_page(&bpage)) {
> 2028 meta->buffers[i] = rb_meta_subbuf_idx(meta, bpage->page);
> 2029 bpage->id = i;
> 2030 }
> 2031
> 2032 /* We'll restart verifying from orig_head */
> 2033 head_page = orig_head;
> 2034 }
> 2035
> 2036 skip_rewind:
> 2037 /* If the commit_buffer is the reader page, update the commit page */
> 2038 if (meta->commit_buffer == (unsigned long)cpu_buffer->reader_page->page) {
> 2039 cpu_buffer->commit_page = cpu_buffer->reader_page;
> 2040 /* Nothing more to do, the only page is the reader page */
> 2041 goto done;
> 2042 }
> 2043
> 2044 /* Iterate until finding the commit page */
> 2045 for (i = 0; i < meta->nr_subbufs + 1; i++, rb_inc_page(&head_page)) {
> 2046
> 2047 /* Reader page has already been done */
> 2048 if (head_page == cpu_buffer->reader_page)
> 2049 continue;
> 2050
> 2051 ret = rb_validate_buffer(head_page, cpu_buffer->cpu, meta);
> 2052 if (ret < 0) {
> 2053 if (!discarded)
> 2054 pr_info("Ring buffer meta [%d] invalid buffer page detected\n",
> 2055 cpu_buffer->cpu);
> 2056 discarded++;
> 2057 } else {
> 2058 /* If the buffer has content, update pages_touched */
> 2059 if (ret)
> 2060 local_inc(&cpu_buffer->pages_touched);
> 2061
> 2062 entries += ret;
> 2063 entry_bytes += rb_page_size(head_page);
> 2064 }
> 2065 if (head_page == cpu_buffer->commit_page)
> 2066 break;
> 2067 }
> 2068
> 2069 if (head_page != cpu_buffer->commit_page) {
> 2070 pr_info("Ring buffer meta [%d] commit page not found\n",
> 2071 cpu_buffer->cpu);
> 2072 goto invalid;
> 2073 }
> 2074 done:
> 2075 local_set(&cpu_buffer->entries, entries);
> 2076 local_set(&cpu_buffer->entries_bytes, entry_bytes);
> 2077
> 2078 pr_info("Ring buffer meta [%d] is from previous boot! (%d pages discarded)\n",
> 2079 cpu_buffer->cpu, discarded);
> 2080 return;
> 2081
> 2082 invalid:
> 2083 /* The content of the buffers are invalid, reset the meta data */
> 2084 meta->head_buffer = 0;
> 2085 meta->commit_buffer = 0;
> 2086
> 2087 /* Reset the reader page */
> 2088 local_set(&cpu_buffer->reader_page->entries, 0);
> 2089 local_set(&cpu_buffer->reader_page->page->commit, 0);
> 2090
> 2091 /* Reset all the subbuffers */
> 2092 for (i = 0; i < meta->nr_subbufs - 1; i++, rb_inc_page(&head_page)) {
> 2093 local_set(&head_page->entries, 0);
> 2094 rb_init_page(head_page->page);
> 2095 }
> 2096 }
> 2097
>
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki
>
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
^ permalink raw reply
* Re: [PATCH] tracing: fprobe: fix the length of unused fgraph_data
From: Masami Hiramatsu @ 2026-03-23 12:06 UTC (permalink / raw)
To: Martin Kaiser
Cc: Steven Rostedt, Mathieu Desnoyers, linux-trace-kernel,
linux-kernel, stable
In-Reply-To: <20260323102020.239567-1-martin@kaiser.cx>
On Mon, 23 Mar 2026 11:19:36 +0100
Martin Kaiser <martin@kaiser.cx> wrote:
> If fprobe_entry does not fill the allocated fgraph_data completely, the
> unused part is zeroed with memset.
>
> Fix the length for this memset call. Both reserved_words and used are in
> units of return stack words, but memset needs the number of bytes.
>
> Cc: stable@vger.kernel.org
> Fixes: 4346ba160409 ("fprobe: Rewrite fprobe on function-graph tracer")
> Signed-off-by: Martin Kaiser <martin@kaiser.cx>
Good catch!
Thanks,
> ---
> kernel/trace/fprobe.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/trace/fprobe.c b/kernel/trace/fprobe.c
> index dcadf1d23b8a..6a1192515afd 100644
> --- a/kernel/trace/fprobe.c
> +++ b/kernel/trace/fprobe.c
> @@ -451,7 +451,7 @@ static int fprobe_fgraph_entry(struct ftrace_graph_ent *trace, struct fgraph_ops
> }
> }
> if (used < reserved_words)
> - memset(fgraph_data + used, 0, reserved_words - used);
> + memset(fgraph_data + used, 0, (reserved_words - used) * sizeof(long));
>
> /* If any exit_handler is set, data must be used. */
> return used != 0;
> --
> 2.43.7
>
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
^ permalink raw reply
* Re: [PATCH v2] coredump: add tracepoint for coredump events
From: Christian Brauner @ 2026-03-23 12:16 UTC (permalink / raw)
To: Breno Leitao
Cc: Christian Brauner, Alexander Viro, Jan Kara, Steven Rostedt,
Masami Hiramatsu, Mathieu Desnoyers, linux-kernel, linux-fsdevel,
linux-trace-kernel, bpf, kernel-team, song, Andrii Nakryiko
In-Reply-To: <20260323-coredump_tracepoint-v2-1-afced083b38d@debian.org>
On Mon, 23 Mar 2026 04:46:27 -0700, Breno Leitao wrote:
> Coredump is a generally useful and interesting event in the lifetime
> of a process. Add a tracepoint so it can be monitored through the
> standard kernel tracing infrastructure.
>
> BPF-based crash monitoring is an advanced approach that
> allows real-time crash interception: by attaching a BPF program at
> this point, tools can use bpf_get_stack() with BPF_F_USER_STACK to
> capture the user-space stack trace at the exact moment of the crash,
> before the process is fully terminated, without waiting for a
> coredump file to be written and parsed.
>
> [...]
Applied to the vfs-7.1.misc branch of the vfs/vfs.git tree.
Patches in the vfs-7.1.misc branch should appear in linux-next soon.
Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.
It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.
Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.
tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs-7.1.misc
[1/1] coredump: add tracepoint for coredump events
https://git.kernel.org/vfs/vfs/c/f30186b0c782
^ permalink raw reply
* [PATCH v11 0/4] ring-buffer: Making persistent ring buffers robust
From: Masami Hiramatsu (Google) @ 2026-03-23 12:30 UTC (permalink / raw)
To: Steven Rostedt
Cc: Masami Hiramatsu, Mathieu Desnoyers, linux-kernel,
linux-trace-kernel, Ian Rogers
Hi,
Here is the 12th version of improvement patches for making persistent
ring buffers robust to failures.
The previous version is here:
https://lore.kernel.org/all/177391152793.193994.8986943289250629418.stgit@mhiramat.tok.corp.google.com/
In this version, I rebased it on tracing/fixes branch (thus
the first fix patch has been removed) and fixed
a build error which came from a copy-paste mistake.
Thank you,
---
Masami Hiramatsu (Google) (4):
ring-buffer: Flush and stop persistent ring buffer on panic
ring-buffer: Skip invalid sub-buffers when validating persistent ring buffer
ring-buffer: Skip invalid sub-buffers when rewinding persistent ring buffer
ring-buffer: Add persistent ring buffer selftest
arch/alpha/include/asm/Kbuild | 1
arch/arc/include/asm/Kbuild | 1
arch/arm/include/asm/Kbuild | 1
arch/arm64/include/asm/ring_buffer.h | 10 +
arch/csky/include/asm/Kbuild | 1
arch/hexagon/include/asm/Kbuild | 1
arch/loongarch/include/asm/Kbuild | 1
arch/m68k/include/asm/Kbuild | 1
arch/microblaze/include/asm/Kbuild | 1
arch/mips/include/asm/Kbuild | 1
arch/nios2/include/asm/Kbuild | 1
arch/openrisc/include/asm/Kbuild | 1
arch/parisc/include/asm/Kbuild | 1
arch/powerpc/include/asm/Kbuild | 1
arch/riscv/include/asm/Kbuild | 1
arch/s390/include/asm/Kbuild | 1
arch/sh/include/asm/Kbuild | 1
arch/sparc/include/asm/Kbuild | 1
arch/um/include/asm/Kbuild | 1
arch/x86/include/asm/Kbuild | 1
arch/xtensa/include/asm/Kbuild | 1
include/asm-generic/ring_buffer.h | 13 ++
include/linux/ring_buffer.h | 1
kernel/trace/Kconfig | 15 ++
kernel/trace/ring_buffer.c | 237 ++++++++++++++++++++++++++--------
kernel/trace/trace.c | 4 +
26 files changed, 241 insertions(+), 59 deletions(-)
create mode 100644 arch/arm64/include/asm/ring_buffer.h
create mode 100644 include/asm-generic/ring_buffer.h
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
^ permalink raw reply
* [PATCH v11 1/4] ring-buffer: Flush and stop persistent ring buffer on panic
From: Masami Hiramatsu (Google) @ 2026-03-23 12:30 UTC (permalink / raw)
To: Steven Rostedt
Cc: Masami Hiramatsu, Mathieu Desnoyers, linux-kernel,
linux-trace-kernel, Ian Rogers
In-Reply-To: <177426904455.3236905.2079719303397330696.stgit@mhiramat.tok.corp.google.com>
From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
On real hardware, panic and machine reboot may not flush hardware cache
to memory. This means the persistent ring buffer, which relies on a
coherent state of memory, may not have its events written to the buffer
and they may be lost. Moreover, there may be inconsistency with the
counters which are used for validation of the integrity of the
persistent ring buffer which may cause all data to be discarded.
To avoid this issue, stop recording of the ring buffer on panic and
flush the cache of the ring buffer's memory.
Fixes: e645535a954a ("tracing: Add option to use memmapped memory for trace boot instance")
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
Changes in v11:
- Do nothing by default since flush_cache_vmap() does nothing on x86
but it can cause deadlock on some architectures via on_each_cpu()
because other CPUs will be stoppped when panic notifier is called.
Changes in v9:
- Fix typo of & to &&.
- Fix typo of "Generic"
Changes in v6:
- Introduce asm/ring_buffer.h for arch_ring_buffer_flush_range().
- Use flush_cache_vmap() instead of flush_cache_all().
Changes in v5:
- Use ring_buffer_record_off() instead of ring_buffer_record_disable().
- Use flush_cache_all() to ensure flush all cache.
Changes in v3:
- update patch description.
---
arch/alpha/include/asm/Kbuild | 1 +
arch/arc/include/asm/Kbuild | 1 +
arch/arm/include/asm/Kbuild | 1 +
arch/arm64/include/asm/ring_buffer.h | 10 ++++++++++
arch/csky/include/asm/Kbuild | 1 +
arch/hexagon/include/asm/Kbuild | 1 +
arch/loongarch/include/asm/Kbuild | 1 +
arch/m68k/include/asm/Kbuild | 1 +
arch/microblaze/include/asm/Kbuild | 1 +
arch/mips/include/asm/Kbuild | 1 +
arch/nios2/include/asm/Kbuild | 1 +
arch/openrisc/include/asm/Kbuild | 1 +
arch/parisc/include/asm/Kbuild | 1 +
arch/powerpc/include/asm/Kbuild | 1 +
arch/riscv/include/asm/Kbuild | 1 +
arch/s390/include/asm/Kbuild | 1 +
arch/sh/include/asm/Kbuild | 1 +
arch/sparc/include/asm/Kbuild | 1 +
arch/um/include/asm/Kbuild | 1 +
arch/x86/include/asm/Kbuild | 1 +
arch/xtensa/include/asm/Kbuild | 1 +
include/asm-generic/ring_buffer.h | 13 +++++++++++++
kernel/trace/ring_buffer.c | 22 ++++++++++++++++++++++
23 files changed, 65 insertions(+)
create mode 100644 arch/arm64/include/asm/ring_buffer.h
create mode 100644 include/asm-generic/ring_buffer.h
diff --git a/arch/alpha/include/asm/Kbuild b/arch/alpha/include/asm/Kbuild
index 483965c5a4de..b154b4e3dfa8 100644
--- a/arch/alpha/include/asm/Kbuild
+++ b/arch/alpha/include/asm/Kbuild
@@ -5,4 +5,5 @@ generic-y += agp.h
generic-y += asm-offsets.h
generic-y += kvm_para.h
generic-y += mcs_spinlock.h
+generic-y += ring_buffer.h
generic-y += text-patching.h
diff --git a/arch/arc/include/asm/Kbuild b/arch/arc/include/asm/Kbuild
index 4c69522e0328..483caacc6988 100644
--- a/arch/arc/include/asm/Kbuild
+++ b/arch/arc/include/asm/Kbuild
@@ -5,5 +5,6 @@ generic-y += extable.h
generic-y += kvm_para.h
generic-y += mcs_spinlock.h
generic-y += parport.h
+generic-y += ring_buffer.h
generic-y += user.h
generic-y += text-patching.h
diff --git a/arch/arm/include/asm/Kbuild b/arch/arm/include/asm/Kbuild
index 03657ff8fbe3..decad5f2c826 100644
--- a/arch/arm/include/asm/Kbuild
+++ b/arch/arm/include/asm/Kbuild
@@ -3,6 +3,7 @@ generic-y += early_ioremap.h
generic-y += extable.h
generic-y += flat.h
generic-y += parport.h
+generic-y += ring_buffer.h
generated-y += mach-types.h
generated-y += unistd-nr.h
diff --git a/arch/arm64/include/asm/ring_buffer.h b/arch/arm64/include/asm/ring_buffer.h
new file mode 100644
index 000000000000..62316c406888
--- /dev/null
+++ b/arch/arm64/include/asm/ring_buffer.h
@@ -0,0 +1,10 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef _ASM_ARM64_RING_BUFFER_H
+#define _ASM_ARM64_RING_BUFFER_H
+
+#include <asm/cacheflush.h>
+
+/* Flush D-cache on persistent ring buffer */
+#define arch_ring_buffer_flush_range(start, end) dcache_clean_pop(start, end)
+
+#endif /* _ASM_ARM64_RING_BUFFER_H */
diff --git a/arch/csky/include/asm/Kbuild b/arch/csky/include/asm/Kbuild
index 3a5c7f6e5aac..7dca0c6cdc84 100644
--- a/arch/csky/include/asm/Kbuild
+++ b/arch/csky/include/asm/Kbuild
@@ -9,6 +9,7 @@ generic-y += qrwlock.h
generic-y += qrwlock_types.h
generic-y += qspinlock.h
generic-y += parport.h
+generic-y += ring_buffer.h
generic-y += user.h
generic-y += vmlinux.lds.h
generic-y += text-patching.h
diff --git a/arch/hexagon/include/asm/Kbuild b/arch/hexagon/include/asm/Kbuild
index 1efa1e993d4b..0f887d4238ed 100644
--- a/arch/hexagon/include/asm/Kbuild
+++ b/arch/hexagon/include/asm/Kbuild
@@ -5,4 +5,5 @@ generic-y += extable.h
generic-y += iomap.h
generic-y += kvm_para.h
generic-y += mcs_spinlock.h
+generic-y += ring_buffer.h
generic-y += text-patching.h
diff --git a/arch/loongarch/include/asm/Kbuild b/arch/loongarch/include/asm/Kbuild
index 9034b583a88a..7e92957baf6a 100644
--- a/arch/loongarch/include/asm/Kbuild
+++ b/arch/loongarch/include/asm/Kbuild
@@ -10,5 +10,6 @@ generic-y += qrwlock.h
generic-y += user.h
generic-y += ioctl.h
generic-y += mmzone.h
+generic-y += ring_buffer.h
generic-y += statfs.h
generic-y += text-patching.h
diff --git a/arch/m68k/include/asm/Kbuild b/arch/m68k/include/asm/Kbuild
index b282e0dd8dc1..62543bf305ff 100644
--- a/arch/m68k/include/asm/Kbuild
+++ b/arch/m68k/include/asm/Kbuild
@@ -3,5 +3,6 @@ generated-y += syscall_table.h
generic-y += extable.h
generic-y += kvm_para.h
generic-y += mcs_spinlock.h
+generic-y += ring_buffer.h
generic-y += spinlock.h
generic-y += text-patching.h
diff --git a/arch/microblaze/include/asm/Kbuild b/arch/microblaze/include/asm/Kbuild
index 7178f990e8b3..0030309b47ad 100644
--- a/arch/microblaze/include/asm/Kbuild
+++ b/arch/microblaze/include/asm/Kbuild
@@ -5,6 +5,7 @@ generic-y += extable.h
generic-y += kvm_para.h
generic-y += mcs_spinlock.h
generic-y += parport.h
+generic-y += ring_buffer.h
generic-y += syscalls.h
generic-y += tlb.h
generic-y += user.h
diff --git a/arch/mips/include/asm/Kbuild b/arch/mips/include/asm/Kbuild
index 684569b2ecd6..9771c3d85074 100644
--- a/arch/mips/include/asm/Kbuild
+++ b/arch/mips/include/asm/Kbuild
@@ -12,5 +12,6 @@ generic-y += mcs_spinlock.h
generic-y += parport.h
generic-y += qrwlock.h
generic-y += qspinlock.h
+generic-y += ring_buffer.h
generic-y += user.h
generic-y += text-patching.h
diff --git a/arch/nios2/include/asm/Kbuild b/arch/nios2/include/asm/Kbuild
index 28004301c236..0a2530964413 100644
--- a/arch/nios2/include/asm/Kbuild
+++ b/arch/nios2/include/asm/Kbuild
@@ -5,6 +5,7 @@ generic-y += cmpxchg.h
generic-y += extable.h
generic-y += kvm_para.h
generic-y += mcs_spinlock.h
+generic-y += ring_buffer.h
generic-y += spinlock.h
generic-y += user.h
generic-y += text-patching.h
diff --git a/arch/openrisc/include/asm/Kbuild b/arch/openrisc/include/asm/Kbuild
index cef49d60d74c..8aa34621702d 100644
--- a/arch/openrisc/include/asm/Kbuild
+++ b/arch/openrisc/include/asm/Kbuild
@@ -8,4 +8,5 @@ generic-y += spinlock_types.h
generic-y += spinlock.h
generic-y += qrwlock_types.h
generic-y += qrwlock.h
+generic-y += ring_buffer.h
generic-y += user.h
diff --git a/arch/parisc/include/asm/Kbuild b/arch/parisc/include/asm/Kbuild
index 4fb596d94c89..d48d158f7241 100644
--- a/arch/parisc/include/asm/Kbuild
+++ b/arch/parisc/include/asm/Kbuild
@@ -4,4 +4,5 @@ generated-y += syscall_table_64.h
generic-y += agp.h
generic-y += kvm_para.h
generic-y += mcs_spinlock.h
+generic-y += ring_buffer.h
generic-y += user.h
diff --git a/arch/powerpc/include/asm/Kbuild b/arch/powerpc/include/asm/Kbuild
index 2e23533b67e3..805b5aeebb6f 100644
--- a/arch/powerpc/include/asm/Kbuild
+++ b/arch/powerpc/include/asm/Kbuild
@@ -5,4 +5,5 @@ generated-y += syscall_table_spu.h
generic-y += agp.h
generic-y += mcs_spinlock.h
generic-y += qrwlock.h
+generic-y += ring_buffer.h
generic-y += early_ioremap.h
diff --git a/arch/riscv/include/asm/Kbuild b/arch/riscv/include/asm/Kbuild
index bd5fc9403295..7721b63642f4 100644
--- a/arch/riscv/include/asm/Kbuild
+++ b/arch/riscv/include/asm/Kbuild
@@ -14,5 +14,6 @@ generic-y += ticket_spinlock.h
generic-y += qrwlock.h
generic-y += qrwlock_types.h
generic-y += qspinlock.h
+generic-y += ring_buffer.h
generic-y += user.h
generic-y += vmlinux.lds.h
diff --git a/arch/s390/include/asm/Kbuild b/arch/s390/include/asm/Kbuild
index 80bad7de7a04..0c1fc47c3ba0 100644
--- a/arch/s390/include/asm/Kbuild
+++ b/arch/s390/include/asm/Kbuild
@@ -7,3 +7,4 @@ generated-y += unistd_nr.h
generic-y += asm-offsets.h
generic-y += mcs_spinlock.h
generic-y += mmzone.h
+generic-y += ring_buffer.h
diff --git a/arch/sh/include/asm/Kbuild b/arch/sh/include/asm/Kbuild
index 4d3f10ed8275..f0403d3ee8ab 100644
--- a/arch/sh/include/asm/Kbuild
+++ b/arch/sh/include/asm/Kbuild
@@ -3,4 +3,5 @@ generated-y += syscall_table.h
generic-y += kvm_para.h
generic-y += mcs_spinlock.h
generic-y += parport.h
+generic-y += ring_buffer.h
generic-y += text-patching.h
diff --git a/arch/sparc/include/asm/Kbuild b/arch/sparc/include/asm/Kbuild
index 17ee8a273aa6..49c6bb326b75 100644
--- a/arch/sparc/include/asm/Kbuild
+++ b/arch/sparc/include/asm/Kbuild
@@ -4,4 +4,5 @@ generated-y += syscall_table_64.h
generic-y += agp.h
generic-y += kvm_para.h
generic-y += mcs_spinlock.h
+generic-y += ring_buffer.h
generic-y += text-patching.h
diff --git a/arch/um/include/asm/Kbuild b/arch/um/include/asm/Kbuild
index 1b9b82bbe322..2a1629ba8140 100644
--- a/arch/um/include/asm/Kbuild
+++ b/arch/um/include/asm/Kbuild
@@ -17,6 +17,7 @@ generic-y += module.lds.h
generic-y += parport.h
generic-y += percpu.h
generic-y += preempt.h
+generic-y += ring_buffer.h
generic-y += runtime-const.h
generic-y += softirq_stack.h
generic-y += switch_to.h
diff --git a/arch/x86/include/asm/Kbuild b/arch/x86/include/asm/Kbuild
index 4566000e15c4..078fd2c0d69d 100644
--- a/arch/x86/include/asm/Kbuild
+++ b/arch/x86/include/asm/Kbuild
@@ -14,3 +14,4 @@ generic-y += early_ioremap.h
generic-y += fprobe.h
generic-y += mcs_spinlock.h
generic-y += mmzone.h
+generic-y += ring_buffer.h
diff --git a/arch/xtensa/include/asm/Kbuild b/arch/xtensa/include/asm/Kbuild
index 13fe45dea296..e57af619263a 100644
--- a/arch/xtensa/include/asm/Kbuild
+++ b/arch/xtensa/include/asm/Kbuild
@@ -6,5 +6,6 @@ generic-y += mcs_spinlock.h
generic-y += parport.h
generic-y += qrwlock.h
generic-y += qspinlock.h
+generic-y += ring_buffer.h
generic-y += user.h
generic-y += text-patching.h
diff --git a/include/asm-generic/ring_buffer.h b/include/asm-generic/ring_buffer.h
new file mode 100644
index 000000000000..201d2aee1005
--- /dev/null
+++ b/include/asm-generic/ring_buffer.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Generic arch dependent ring_buffer macros.
+ */
+#ifndef __ASM_GENERIC_RING_BUFFER_H__
+#define __ASM_GENERIC_RING_BUFFER_H__
+
+#include <linux/cacheflush.h>
+
+/* Flush cache on ring buffer range if needed. Do nothing by default. */
+#define arch_ring_buffer_flush_range(start, end) do { } while (0)
+
+#endif /* __ASM_GENERIC_RING_BUFFER_H__ */
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 170170bd83bd..0131ce4e018c 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -6,6 +6,7 @@
*/
#include <linux/sched/isolation.h>
#include <linux/trace_recursion.h>
+#include <linux/panic_notifier.h>
#include <linux/trace_events.h>
#include <linux/ring_buffer.h>
#include <linux/trace_clock.h>
@@ -30,6 +31,7 @@
#include <linux/oom.h>
#include <linux/mm.h>
+#include <asm/ring_buffer.h>
#include <asm/local64.h>
#include <asm/local.h>
#include <asm/setup.h>
@@ -589,6 +591,7 @@ struct trace_buffer {
unsigned long range_addr_start;
unsigned long range_addr_end;
+ struct notifier_block flush_nb;
struct ring_buffer_meta *meta;
@@ -2471,6 +2474,16 @@ static void rb_free_cpu_buffer(struct ring_buffer_per_cpu *cpu_buffer)
kfree(cpu_buffer);
}
+/* Stop recording on a persistent buffer and flush cache if needed. */
+static int rb_flush_buffer_cb(struct notifier_block *nb, unsigned long event, void *data)
+{
+ struct trace_buffer *buffer = container_of(nb, struct trace_buffer, flush_nb);
+
+ ring_buffer_record_off(buffer);
+ arch_ring_buffer_flush_range(buffer->range_addr_start, buffer->range_addr_end);
+ return NOTIFY_DONE;
+}
+
static struct trace_buffer *alloc_buffer(unsigned long size, unsigned flags,
int order, unsigned long start,
unsigned long end,
@@ -2590,6 +2603,12 @@ static struct trace_buffer *alloc_buffer(unsigned long size, unsigned flags,
mutex_init(&buffer->mutex);
+ /* Persistent ring buffer needs to flush cache before reboot. */
+ if (start && end) {
+ buffer->flush_nb.notifier_call = rb_flush_buffer_cb;
+ atomic_notifier_chain_register(&panic_notifier_list, &buffer->flush_nb);
+ }
+
return_ptr(buffer);
fail_free_buffers:
@@ -2677,6 +2696,9 @@ ring_buffer_free(struct trace_buffer *buffer)
{
int cpu;
+ if (buffer->range_addr_start && buffer->range_addr_end)
+ atomic_notifier_chain_unregister(&panic_notifier_list, &buffer->flush_nb);
+
cpuhp_state_remove_instance(CPUHP_TRACE_RB_PREPARE, &buffer->node);
irq_work_sync(&buffer->irq_work.work);
^ permalink raw reply related
* [PATCH v11 2/4] ring-buffer: Skip invalid sub-buffers when validating persistent ring buffer
From: Masami Hiramatsu (Google) @ 2026-03-23 12:31 UTC (permalink / raw)
To: Steven Rostedt
Cc: Masami Hiramatsu, Mathieu Desnoyers, linux-kernel,
linux-trace-kernel, Ian Rogers
In-Reply-To: <177426904455.3236905.2079719303397330696.stgit@mhiramat.tok.corp.google.com>
From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Skip invalid sub-buffers when validating the persistent ring buffer
instead of discarding the entire ring buffer. Only skipped buffers
are invalidated (cleared).
If the cache data in memory fails to be synchronized during a reboot,
the persistent ring buffer may become partially corrupted, but other
sub-buffers may still contain readable event data. Only discard the
subbuffers that are found to be corrupted.
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
Changes in v11:
- Fix a typo.
Changes in v9:
- Add meta->subbuf_size check.
- Fix a typo.
- Handle invalid reader_page case.
Changes in v8:
- Add comment in rb_valudate_buffer()
- Clear the RB_MISSED_* flags in rb_valudate_buffer() instead of
skipping subbuf.
- Remove unused subbuf local variable from rb_cpu_meta_valid().
Changes in v7:
- Combined with Handling RB_MISSED_* flags patch, focus on validation at boot.
- Remove checking subbuffer data when validating metadata, because it should be done
later.
- Do not mark the discarded sub buffer page but just reset it.
Changes in v6:
- Show invalid page detection message once per CPU.
Changes in v5:
- Instead of showing errors for each page, just show the number
of discarded pages at last.
Changes in v3:
- Record missed data event on commit.
---
kernel/trace/ring_buffer.c | 98 ++++++++++++++++++++++++++------------------
1 file changed, 58 insertions(+), 40 deletions(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 0131ce4e018c..dd4c8cf350df 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -396,6 +396,12 @@ static __always_inline unsigned int rb_page_commit(struct buffer_page *bpage)
return local_read(&bpage->page->commit);
}
+/* Size is determined by what has been committed */
+static __always_inline unsigned int rb_page_size(struct buffer_page *bpage)
+{
+ return rb_page_commit(bpage) & ~RB_MISSED_MASK;
+}
+
static void free_buffer_page(struct buffer_page *bpage)
{
/* Range pages are not to be freed */
@@ -1791,7 +1797,6 @@ static bool rb_cpu_meta_valid(struct ring_buffer_cpu_meta *meta, int cpu,
unsigned long *subbuf_mask)
{
int subbuf_size = PAGE_SIZE;
- struct buffer_data_page *subbuf;
unsigned long buffers_start;
unsigned long buffers_end;
int i;
@@ -1799,6 +1804,11 @@ static bool rb_cpu_meta_valid(struct ring_buffer_cpu_meta *meta, int cpu,
if (!subbuf_mask)
return false;
+ if (meta->subbuf_size != PAGE_SIZE) {
+ pr_info("Ring buffer boot meta [%d] invalid subbuf_size\n", cpu);
+ return false;
+ }
+
buffers_start = meta->first_buffer;
buffers_end = meta->first_buffer + (subbuf_size * meta->nr_subbufs);
@@ -1815,11 +1825,12 @@ static bool rb_cpu_meta_valid(struct ring_buffer_cpu_meta *meta, int cpu,
return false;
}
- subbuf = rb_subbufs_from_meta(meta);
-
bitmap_clear(subbuf_mask, 0, meta->nr_subbufs);
- /* Is the meta buffers and the subbufs themselves have correct data? */
+ /*
+ * Ensure the meta::buffers array has correct data. The data in each subbufs
+ * are checked later in rb_meta_validate_events().
+ */
for (i = 0; i < meta->nr_subbufs; i++) {
if (meta->buffers[i] < 0 ||
meta->buffers[i] >= meta->nr_subbufs) {
@@ -1827,18 +1838,12 @@ static bool rb_cpu_meta_valid(struct ring_buffer_cpu_meta *meta, int cpu,
return false;
}
- if ((unsigned)local_read(&subbuf->commit) > subbuf_size) {
- pr_info("Ring buffer boot meta [%d] buffer invalid commit\n", cpu);
- return false;
- }
-
if (test_bit(meta->buffers[i], subbuf_mask)) {
pr_info("Ring buffer boot meta [%d] array has duplicates\n", cpu);
return false;
}
set_bit(meta->buffers[i], subbuf_mask);
- subbuf = (void *)subbuf + subbuf_size;
}
return true;
@@ -1902,13 +1907,22 @@ static int rb_read_data_buffer(struct buffer_data_page *dpage, int tail, int cpu
return events;
}
-static int rb_validate_buffer(struct buffer_data_page *dpage, int cpu)
+static int rb_validate_buffer(struct buffer_data_page *dpage, int cpu,
+ struct ring_buffer_cpu_meta *meta)
{
unsigned long long ts;
+ unsigned long tail;
u64 delta;
- int tail;
- tail = local_read(&dpage->commit);
+ /*
+ * When a sub-buffer is recovered from a read, the commit value may
+ * have RB_MISSED_* bits set, as these bits are reset on reuse.
+ * Even after clearing these bits, a commit value greater than the
+ * subbuf_size is considered invalid.
+ */
+ tail = local_read(&dpage->commit) & ~RB_MISSED_MASK;
+ if (tail > meta->subbuf_size)
+ return -1;
return rb_read_data_buffer(dpage, tail, cpu, &ts, &delta);
}
@@ -1919,6 +1933,7 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
struct buffer_page *head_page, *orig_head;
unsigned long entry_bytes = 0;
unsigned long entries = 0;
+ int discarded = 0;
int ret;
u64 ts;
int i;
@@ -1929,14 +1944,19 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
orig_head = head_page = cpu_buffer->head_page;
/* Do the reader page first */
- ret = rb_validate_buffer(cpu_buffer->reader_page->page, cpu_buffer->cpu);
+ ret = rb_validate_buffer(cpu_buffer->reader_page->page, cpu_buffer->cpu, meta);
if (ret < 0) {
- pr_info("Ring buffer reader page is invalid\n");
- goto invalid;
+ pr_info("Ring buffer meta [%d] invalid reader page detected\n",
+ cpu_buffer->cpu);
+ discarded++;
+ /* Instead of discard whole ring buffer, discard only this sub-buffer. */
+ local_set(&cpu_buffer->reader_page->entries, 0);
+ local_set(&cpu_buffer->reader_page->page->commit, 0);
+ } else {
+ entries += ret;
+ entry_bytes += rb_page_size(cpu_buffer->reader_page);
+ local_set(&cpu_buffer->reader_page->entries, ret);
}
- entries += ret;
- entry_bytes += local_read(&cpu_buffer->reader_page->page->commit);
- local_set(&cpu_buffer->reader_page->entries, ret);
ts = head_page->page->time_stamp;
@@ -1964,7 +1984,7 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
break;
/* Stop rewind if the page is invalid. */
- ret = rb_validate_buffer(head_page->page, cpu_buffer->cpu);
+ ret = rb_validate_buffer(head_page->page, cpu_buffer->cpu, meta);
if (ret < 0)
break;
@@ -2043,21 +2063,24 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
if (head_page == cpu_buffer->reader_page)
continue;
- ret = rb_validate_buffer(head_page->page, cpu_buffer->cpu);
+ ret = rb_validate_buffer(head_page->page, cpu_buffer->cpu, meta);
if (ret < 0) {
- pr_info("Ring buffer meta [%d] invalid buffer page\n",
- cpu_buffer->cpu);
- goto invalid;
- }
-
- /* If the buffer has content, update pages_touched */
- if (ret)
- local_inc(&cpu_buffer->pages_touched);
-
- entries += ret;
- entry_bytes += local_read(&head_page->page->commit);
- local_set(&head_page->entries, ret);
+ if (!discarded)
+ pr_info("Ring buffer meta [%d] invalid buffer page detected\n",
+ cpu_buffer->cpu);
+ discarded++;
+ /* Instead of discard whole ring buffer, discard only this sub-buffer. */
+ local_set(&head_page->entries, 0);
+ local_set(&head_page->page->commit, 0);
+ } else {
+ /* If the buffer has content, update pages_touched */
+ if (ret)
+ local_inc(&cpu_buffer->pages_touched);
+ entries += ret;
+ entry_bytes += rb_page_size(head_page);
+ local_set(&head_page->entries, ret);
+ }
if (head_page == cpu_buffer->commit_page)
break;
}
@@ -2071,7 +2094,8 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
local_set(&cpu_buffer->entries, entries);
local_set(&cpu_buffer->entries_bytes, entry_bytes);
- pr_info("Ring buffer meta [%d] is from previous boot!\n", cpu_buffer->cpu);
+ pr_info("Ring buffer meta [%d] is from previous boot! (%d pages discarded)\n",
+ cpu_buffer->cpu, discarded);
return;
invalid:
@@ -3258,12 +3282,6 @@ rb_iter_head_event(struct ring_buffer_iter *iter)
return NULL;
}
-/* Size is determined by what has been committed */
-static __always_inline unsigned rb_page_size(struct buffer_page *bpage)
-{
- return rb_page_commit(bpage) & ~RB_MISSED_MASK;
-}
-
static __always_inline unsigned
rb_commit_index(struct ring_buffer_per_cpu *cpu_buffer)
{
^ permalink raw reply related
* [PATCH v11 3/4] ring-buffer: Skip invalid sub-buffers when rewinding persistent ring buffer
From: Masami Hiramatsu (Google) @ 2026-03-23 12:31 UTC (permalink / raw)
To: Steven Rostedt
Cc: Masami Hiramatsu, Mathieu Desnoyers, linux-kernel,
linux-trace-kernel, Ian Rogers
In-Reply-To: <177426904455.3236905.2079719303397330696.stgit@mhiramat.tok.corp.google.com>
From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Skip invalid sub-buffers when rewinding the persistent ring buffer
instead of stopping the rewinding the ring buffer. The skipped
buffers are cleared.
To ensure the rewinding stops at the unused page, this also clears
buffer_data_page::time_stamp when tracing resets the buffer. This
allows us to identify unused pages and empty pages.
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
Changes in v12:
- Fix build error.
Changes in v11:
- Reset timestamp when the buffer is invalid.
- When rewinding, skip subbuf page if timestamp is wrong and
check timestamp after validating buffer data page.
Changes in v10:
- Newly added.
---
kernel/trace/ring_buffer.c | 76 +++++++++++++++++++++++++-------------------
1 file changed, 43 insertions(+), 33 deletions(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index dd4c8cf350df..99f59f1c63e0 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -389,6 +389,7 @@ struct buffer_page {
static void rb_init_page(struct buffer_data_page *bpage)
{
local_set(&bpage->commit, 0);
+ bpage->time_stamp = 0;
}
static __always_inline unsigned int rb_page_commit(struct buffer_page *bpage)
@@ -1907,12 +1908,14 @@ static int rb_read_data_buffer(struct buffer_data_page *dpage, int tail, int cpu
return events;
}
-static int rb_validate_buffer(struct buffer_data_page *dpage, int cpu,
+static int rb_validate_buffer(struct buffer_page *bpage, int cpu,
struct ring_buffer_cpu_meta *meta)
{
+ struct buffer_data_page *dpage = bpage->page;
unsigned long long ts;
unsigned long tail;
u64 delta;
+ int ret = -1;
/*
* When a sub-buffer is recovered from a read, the commit value may
@@ -1921,9 +1924,17 @@ static int rb_validate_buffer(struct buffer_data_page *dpage, int cpu,
* subbuf_size is considered invalid.
*/
tail = local_read(&dpage->commit) & ~RB_MISSED_MASK;
- if (tail > meta->subbuf_size)
- return -1;
- return rb_read_data_buffer(dpage, tail, cpu, &ts, &delta);
+ if (tail <= meta->subbuf_size)
+ ret = rb_read_data_buffer(dpage, tail, cpu, &ts, &delta);
+
+ if (ret < 0) {
+ local_set(&bpage->entries, 0);
+ local_set(&bpage->page->commit, 0);
+ } else {
+ local_set(&bpage->entries, ret);
+ }
+
+ return ret;
}
/* If the meta data has been validated, now validate the events */
@@ -1944,18 +1955,14 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
orig_head = head_page = cpu_buffer->head_page;
/* Do the reader page first */
- ret = rb_validate_buffer(cpu_buffer->reader_page->page, cpu_buffer->cpu, meta);
+ ret = rb_validate_buffer(cpu_buffer->reader_page, cpu_buffer->cpu, meta);
if (ret < 0) {
pr_info("Ring buffer meta [%d] invalid reader page detected\n",
cpu_buffer->cpu);
discarded++;
- /* Instead of discard whole ring buffer, discard only this sub-buffer. */
- local_set(&cpu_buffer->reader_page->entries, 0);
- local_set(&cpu_buffer->reader_page->page->commit, 0);
} else {
entries += ret;
entry_bytes += rb_page_size(cpu_buffer->reader_page);
- local_set(&cpu_buffer->reader_page->entries, ret);
}
ts = head_page->page->time_stamp;
@@ -1974,26 +1981,33 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
if (head_page == cpu_buffer->tail_page)
break;
- /* Ensure the page has older data than head. */
- if (ts < head_page->page->time_stamp)
- break;
-
- ts = head_page->page->time_stamp;
- /* Ensure the page has correct timestamp and some data. */
- if (!ts || rb_page_commit(head_page) == 0)
- break;
-
- /* Stop rewind if the page is invalid. */
- ret = rb_validate_buffer(head_page->page, cpu_buffer->cpu, meta);
- if (ret < 0)
+ /* Rewind until unused page (no timestamp, no commit). */
+ if (!head_page->page->time_stamp && rb_page_commit(head_page) == 0)
break;
- /* Recover the number of entries and update stats. */
- local_set(&head_page->entries, ret);
- if (ret)
- local_inc(&cpu_buffer->pages_touched);
- entries += ret;
- entry_bytes += rb_page_commit(head_page);
+ /*
+ * Skip if the page is invalid, or its timestamp is newer than the
+ * previous valid page.
+ */
+ ret = rb_validate_buffer(head_page, cpu_buffer->cpu, meta);
+ if (ret >= 0 && ts < head_page->page->time_stamp) {
+ local_set(&head_page->entries, 0);
+ local_set(&head_page->page->commit, 0);
+ head_page->page->time_stamp = ts;
+ ret = -1;
+ }
+ if (ret < 0) {
+ if (!discarded)
+ pr_info("Ring buffer meta [%d] invalid buffer page detected\n",
+ cpu_buffer->cpu);
+ discarded++;
+ } else {
+ entries += ret;
+ entry_bytes += rb_page_size(head_page);
+ if (ret > 0)
+ local_inc(&cpu_buffer->pages_touched);
+ ts = head_page->page->time_stamp;
+ }
}
if (i)
pr_info("Ring buffer [%d] rewound %d pages\n", cpu_buffer->cpu, i);
@@ -2063,15 +2077,12 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
if (head_page == cpu_buffer->reader_page)
continue;
- ret = rb_validate_buffer(head_page->page, cpu_buffer->cpu, meta);
+ ret = rb_validate_buffer(head_page, cpu_buffer->cpu, meta);
if (ret < 0) {
if (!discarded)
pr_info("Ring buffer meta [%d] invalid buffer page detected\n",
cpu_buffer->cpu);
discarded++;
- /* Instead of discard whole ring buffer, discard only this sub-buffer. */
- local_set(&head_page->entries, 0);
- local_set(&head_page->page->commit, 0);
} else {
/* If the buffer has content, update pages_touched */
if (ret)
@@ -2079,7 +2090,6 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
entries += ret;
entry_bytes += rb_page_size(head_page);
- local_set(&head_page->entries, ret);
}
if (head_page == cpu_buffer->commit_page)
break;
@@ -2110,7 +2120,7 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
/* Reset all the subbuffers */
for (i = 0; i < meta->nr_subbufs - 1; i++, rb_inc_page(&head_page)) {
local_set(&head_page->entries, 0);
- local_set(&head_page->page->commit, 0);
+ rb_init_page(head_page->page);
}
}
^ permalink raw reply related
* [PATCH v11 4/4] ring-buffer: Add persistent ring buffer selftest
From: Masami Hiramatsu (Google) @ 2026-03-23 12:31 UTC (permalink / raw)
To: Steven Rostedt
Cc: Masami Hiramatsu, Mathieu Desnoyers, linux-kernel,
linux-trace-kernel, Ian Rogers
In-Reply-To: <177426904455.3236905.2079719303397330696.stgit@mhiramat.tok.corp.google.com>
From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Add a self-destractive test for the persistent ring buffer. This
will invalidate some sub-buffer pages in the persistent ring buffer
when kernel gets panic, and check whether the number of detected
invalid pages and the total entry_bytes are the same as record
after reboot.
This can ensure the kernel correctly recover partially corrupted
persistent ring buffer when boot.
The test only runs on the persistent ring buffer whose name is
"ptracingtest". And user has to fill it up with events before
kernel panics.
To run the test, enable CONFIG_RING_BUFFER_PERSISTENT_SELFTEST
and you have to setup the kernel cmdline;
reserve_mem=20M:2M:trace trace_instance=ptracingtest^traceoff@trace
panic=1
And run following commands after the 1st boot;
cd /sys/kernel/tracing/instances/ptracingtest
echo 1 > tracing_on
echo 1 > events/enable
sleep 3
echo c > /proc/sysrq-trigger
After panic message, the kernel will reboot and run the verification
on the persistent ring buffer, e.g.
Ring buffer meta [2] invalid buffer page detected
Ring buffer meta [2] is from previous boot! (318 pages discarded)
Ring buffer testing [2] invalid pages: PASSED (318/318)
Ring buffer testing [2] entry_bytes: PASSED (1300476/1300476)
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
Changes in v10:
- Add entry_bytes test.
- Do not compile test code if CONFIG_RING_BUFFER_PERSISTENT_SELFTEST=n.
Changes in v9:
- Test also reader pages.
---
include/linux/ring_buffer.h | 1 +
kernel/trace/Kconfig | 15 +++++++++
kernel/trace/ring_buffer.c | 69 +++++++++++++++++++++++++++++++++++++++++++
kernel/trace/trace.c | 4 ++
4 files changed, 89 insertions(+)
diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index d862fa610270..a92a22c44387 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -238,6 +238,7 @@ int ring_buffer_subbuf_size_get(struct trace_buffer *buffer);
enum ring_buffer_flags {
RB_FL_OVERWRITE = 1 << 0,
+ RB_FL_TESTING = 1 << 1,
};
#ifdef CONFIG_RING_BUFFER
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 49de13cae428..2e6f3b7c6a31 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -1202,6 +1202,21 @@ config RING_BUFFER_VALIDATE_TIME_DELTAS
Only say Y if you understand what this does, and you
still want it enabled. Otherwise say N
+config RING_BUFFER_PERSISTENT_SELFTEST
+ bool "Enable persistent ring buffer selftest"
+ depends on RING_BUFFER
+ help
+ Run a selftest on the persistent ring buffer which names
+ "ptracingtest" (and its backup) when panic_on_reboot by
+ invalidating ring buffer pages.
+ Note that user has to enable events on the persistent ring
+ buffer manually to fill up ring buffers before rebooting.
+ Since this invalidates the data on test target ring buffer,
+ "ptracingtest" persistent ring buffer must not be used for
+ actual tracing, but only for testing.
+
+ If unsure, say N
+
config MMIOTRACE_TEST
tristate "Test module for mmiotrace"
depends on MMIOTRACE && m
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 99f59f1c63e0..31e479143ca0 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -63,6 +63,10 @@ struct ring_buffer_cpu_meta {
unsigned long commit_buffer;
__u32 subbuf_size;
__u32 nr_subbufs;
+#ifdef CONFIG_RING_BUFFER_PERSISTENT_SELFTEST
+ __u32 nr_invalid;
+ __u32 entry_bytes;
+#endif
int buffers[];
};
@@ -2106,6 +2110,19 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
pr_info("Ring buffer meta [%d] is from previous boot! (%d pages discarded)\n",
cpu_buffer->cpu, discarded);
+
+#ifdef CONFIG_RING_BUFFER_PERSISTENT_SELFTEST
+ if (meta->nr_invalid)
+ pr_info("Ring buffer testing [%d] invalid pages: %s (%d/%d)\n",
+ cpu_buffer->cpu,
+ (discarded == meta->nr_invalid) ? "PASSED" : "FAILED",
+ discarded, meta->nr_invalid);
+ if (meta->entry_bytes)
+ pr_info("Ring buffer testing [%d] entry_bytes: %s (%ld/%ld)\n",
+ cpu_buffer->cpu,
+ (entry_bytes == meta->entry_bytes) ? "PASSED" : "FAILED",
+ (long)entry_bytes, (long)meta->entry_bytes);
+#endif
return;
invalid:
@@ -2508,12 +2525,64 @@ static void rb_free_cpu_buffer(struct ring_buffer_per_cpu *cpu_buffer)
kfree(cpu_buffer);
}
+#ifdef CONFIG_RING_BUFFER_PERSISTENT_SELFTEST
+static void rb_test_inject_invalid_pages(struct trace_buffer *buffer)
+{
+ struct ring_buffer_per_cpu *cpu_buffer;
+ struct ring_buffer_cpu_meta *meta;
+ struct buffer_data_page *dpage;
+ u32 entry_bytes = 0;
+ unsigned long ptr;
+ int subbuf_size;
+ int invalid = 0;
+ int cpu;
+ int i;
+
+ if (!(buffer->flags & RB_FL_TESTING))
+ return;
+
+ guard(preempt)();
+ cpu = smp_processor_id();
+
+ cpu_buffer = buffer->buffers[cpu];
+ meta = cpu_buffer->ring_meta;
+ ptr = (unsigned long)rb_subbufs_from_meta(meta);
+ subbuf_size = meta->subbuf_size;
+
+ for (i = 0; i < meta->nr_subbufs; i++) {
+ int idx = meta->buffers[i];
+
+ dpage = (void *)(ptr + idx * subbuf_size);
+ /* Skip unused pages */
+ if (!local_read(&dpage->commit))
+ continue;
+
+ /* Invalidate even pages. */
+ if (!(i & 0x1)) {
+ local_add(subbuf_size + 1, &dpage->commit);
+ invalid++;
+ } else {
+ /* Count total commit bytes. */
+ entry_bytes += local_read(&dpage->commit);
+ }
+ }
+
+ pr_info("Inject invalidated %d pages on CPU%d, total size: %ld\n",
+ invalid, cpu, (long)entry_bytes);
+ meta->nr_invalid = invalid;
+ meta->entry_bytes = entry_bytes;
+}
+#else /* !CONFIG_RING_BUFFER_PERSISTENT_SELFTEST */
+#define rb_test_inject_invalid_pages(buffer) do { } while (0)
+#endif
+
/* Stop recording on a persistent buffer and flush cache if needed. */
static int rb_flush_buffer_cb(struct notifier_block *nb, unsigned long event, void *data)
{
struct trace_buffer *buffer = container_of(nb, struct trace_buffer, flush_nb);
ring_buffer_record_off(buffer);
+ rb_test_inject_invalid_pages(buffer);
arch_ring_buffer_flush_range(buffer->range_addr_start, buffer->range_addr_end);
return NOTIFY_DONE;
}
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index a626211ceb9a..561bcc1c9125 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -9366,6 +9366,8 @@ static void setup_trace_scratch(struct trace_array *tr,
memset(tscratch, 0, size);
}
+#define TRACE_TEST_PTRACING_NAME "ptracingtest"
+
static int
allocate_trace_buffer(struct trace_array *tr, struct array_buffer *buf, unsigned long size)
{
@@ -9378,6 +9380,8 @@ allocate_trace_buffer(struct trace_array *tr, struct array_buffer *buf, unsigned
buf->tr = tr;
if (tr->range_addr_start && tr->range_addr_size) {
+ if (!strcmp(tr->name, TRACE_TEST_PTRACING_NAME))
+ rb_flags |= RB_FL_TESTING;
/* Add scratch buffer to handle 128 modules */
buf->buffer = ring_buffer_alloc_range(size, rb_flags, 0,
tr->range_addr_start,
^ permalink raw reply related
* Re: [PATCH] module/kallsyms: sort function symbols and use binary search
From: Petr Pavlu @ 2026-03-23 13:06 UTC (permalink / raw)
To: Stanislaw Gruszka
Cc: linux-modules, Sami Tolvanen, Luis Chamberlain, linux-kernel,
linux-trace-kernel, live-patching, Daniel Gomez, Aaron Tomlin,
Steven Rostedt, Masami Hiramatsu, Jordan Rome, Viktor Malik
In-Reply-To: <20260317110423.45481-1-stf_xl@wp.pl>
On 3/17/26 12:04 PM, Stanislaw Gruszka wrote:
> Module symbol lookup via find_kallsyms_symbol() performs a linear scan
> over the entire symtab when resolving an address. The number of symbols
> in module symtabs has grown over the years, largely due to additional
> metadata in non-standard sections, making this lookup very slow.
>
> Improve this by separating function symbols during module load, placing
> them at the beginning of the symtab, sorting them by address, and using
> binary search when resolving addresses in module text.
Doesn't considering only function symbols break the expected behavior
with CONFIG_KALLSYMS_ALL=y. For instance, when using kdb, is it still
able to see all symbols in a module? The module loader should be remain
consistent with the main kallsyms code regarding which symbols can be
looked up.
>
> This also should improve times for linear symbol name lookups, as valid
> function symbols are now located at the beginning of the symtab.
>
> The cost of sorting is small relative to module load time. In repeated
> module load tests [1], depending on .config options, this change
> increases load time between 2% and 4%. With cold caches, the difference
> is not measurable, as memory access latency dominates.
>
> The sorting theoretically could be done in compile time, but much more
> complicated as we would have to simulate kernel addresses resolution
> for symbols, and then correct relocation entries. That would be risky
> if get out of sync.
>
> The improvement can be observed when listing ftrace filter functions:
>
> root@nano:~# time cat /sys/kernel/tracing/available_filter_functions | wc -l
> 74908
>
> real 0m1.315s
> user 0m0.000s
> sys 0m1.312s
>
> After:
>
> root@nano:~# time cat /sys/kernel/tracing/available_filter_functions | wc -l
> 74911
>
> real 0m0.167s
> user 0m0.004s
> sys 0m0.175s
>
> (there are three more symbols introduced by the patch)
This looks as a reasonable improvement.
>
> For livepatch modules, the symtab layout is preserved and the existing
> linear search is used. For this case, it should be possible to keep
> the original ELF symtab instead of copying it 1:1, but that is outside
> the scope of this patch.
Livepatch modules are already handled specially by the kallsyms module
code so excluding them from this optimization is probably ok.
However, it might be worth revisiting this exception. I believe that
livepatch support requires the original symbol table for relocations to
remain usable. It might make sense to investigate whether updating the
relocation data with the adjusted symbol indexes would be sensible.
--
Thanks,
Petr
^ permalink raw reply
* Re: [PATCH 3/3] rtla: Parse cmdline using libsubcmd
From: Tomas Glozar @ 2026-03-23 14:15 UTC (permalink / raw)
To: Wander Lairson Costa
Cc: Steven Rostedt, John Kacur, Luis Goncalves, Crystal Wood,
Costa Shulyupin, Ivan Pravdin, Namhyung Kim, Ian Rogers,
Arnaldo Carvalho de Melo, LKML, linux-trace-kernel,
linux-perf-users
In-Reply-To: <ab2D0a32yuwYTO1K@192.168.0.121>
pá 20. 3. 2026 v 18:32 odesílatel Wander Lairson Costa
<wander@redhat.com> napsal:
> [snip]
>
> > -int getopt_auto(int argc, char **argv, const struct option *long_opts);
> > int common_parse_options(int argc, char **argv, struct common_params *common);
>
>
> The function common_parse_options() body was removed, but the declaration remains.
>
> > int common_apply_config(struct osnoise_tool *tool, struct common_params *params);
>
> [snip]
>
Thanks for noticing that, I will fix that in v2 that will be needed either way.
Tomas
^ permalink raw reply
* Re: [PATCH 3/3] rtla: Parse cmdline using libsubcmd
From: Tomas Glozar @ 2026-03-23 14:26 UTC (permalink / raw)
To: Costa Shulyupin
Cc: Steven Rostedt, John Kacur, Luis Goncalves, Crystal Wood,
Wander Lairson Costa, Ivan Pravdin, Namhyung Kim, Ian Rogers,
Arnaldo Carvalho de Melo, LKML, linux-trace-kernel,
linux-perf-users
In-Reply-To: <CADDUTFx7eSqQoA0Sv2hJCcpunosWcg3=VkhWQNoFv1iOC=Fc-w@mail.gmail.com>
so 21. 3. 2026 v 17:08 odesílatel Costa Shulyupin
<costa.shul@redhat.com> napsal:
>
> On Fri, 20 Mar 2026 at 17:07, Tomas Glozar <tglozar@redhat.com> wrote:
> >> +#define TIMERLAT_OPT_NANO OPT_CALLBACK('n', "nano", params, NULL, \
> > + "display data in nanoseconds", \
> > + opt_nano_cb)
>
> -n/--nano requires value incorrectly
>
Yes, this is a mistake.
> File: src/cli.c:463
> Cause: TIMERLAT_OPT_NANO used OPT_CALLBACK which expects an argument,
> but -n is a flag.
> Fix: Changed to OPT_CALLBACK_NOOPT:
>
> > + HIST_OPT_NO_IRQ,
> --no-irq clashes with auto-negation of --irq
>
> File: src/cli.c:1042
> Cause: libsubcmd auto-generates --no-X negations for every option.
> --no-irq (histogram boolean) collides with the auto-negation of --irq
> (stop threshold). The first match wins, so --irq was matched first and
> its negation intercepted the call.
> Fix: Moved HIST_OPT_NO_IRQ before RTLA_OPT_STOP('i', "irq", ...) in
> the options array so the explicit --no-irq boolean is found first.
>
Thank you for reporting that, I had no idea libsubcmd does this, I
haven't seen it documented anywhere, and I never used it in either
perf or bpftool.
It is a very strange thing to do for non-boolean options in my
opinion. If I understand the intention correctly, the option is
intended to unset another option specified before it, i.e. this:
$ ./rtla timerlat hist --event=timer --no-event
is supposed to behave the same as:
$ ./rtla timerlat hist
But it doesn't work:
$ ./rtla timerlat hist --event=timer --no-event
Usage: rtla timerlat hist [<options>]
perf fails even harder on this:
$ ./perf version
perf version 7.0.rc2.g61637af4f771
$ ./perf record --event=cycles --no-event
Segmentation fault (core dumped)
I'm thinking about modifying libsubcmd so this can be at least
disabled (not sure if the behavior is even intended in the non-boolean
case). Reshuffling the RTLA option array could work, too, but is
unintuitive from the developer's point of view, and will also force
histogram options to be before threshold options in the help message,
which is not what I wanted.
Tomas
^ permalink raw reply
* Re: [PATCH] tracing: fprobe: fix the length of unused fgraph_data
From: Steven Rostedt @ 2026-03-23 14:48 UTC (permalink / raw)
To: Martin Kaiser
Cc: Masami Hiramatsu, Mathieu Desnoyers, linux-trace-kernel,
linux-kernel, stable
In-Reply-To: <20260323102020.239567-1-martin@kaiser.cx>
On Mon, 23 Mar 2026 11:19:36 +0100
Martin Kaiser <martin@kaiser.cx> wrote:
> If fprobe_entry does not fill the allocated fgraph_data completely, the
> unused part is zeroed with memset.
>
> Fix the length for this memset call. Both reserved_words and used are in
> units of return stack words, but memset needs the number of bytes.
>
> Cc: stable@vger.kernel.org
> Fixes: 4346ba160409 ("fprobe: Rewrite fprobe on function-graph tracer")
> Signed-off-by: Martin Kaiser <martin@kaiser.cx>
> ---
> kernel/trace/fprobe.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/trace/fprobe.c b/kernel/trace/fprobe.c
> index dcadf1d23b8a..6a1192515afd 100644
> --- a/kernel/trace/fprobe.c
> +++ b/kernel/trace/fprobe.c
> @@ -451,7 +451,7 @@ static int fprobe_fgraph_entry(struct ftrace_graph_ent *trace, struct fgraph_ops
> }
> }
> if (used < reserved_words)
> - memset(fgraph_data + used, 0, reserved_words - used);
> + memset(fgraph_data + used, 0, (reserved_words - used) * sizeof(long));
So fgraph_data is only used internally between the fprobe_fgraph_entry()
and fprobe_return() as it only exists on the fgraph shadow stack. I'm not
even sure if the unused portion needs to be zeroed out.
Thus, this may be correct, but it doesn't look like a true bug that needs a
stable tag.
-- Steve
>
> /* If any exit_handler is set, data must be used. */
> return used != 0;
^ permalink raw reply
* [PATCH v2 00/19] tracepoint: Avoid double static_branch evaluation at guarded call sites
From: Vineeth Pillai (Google) @ 2026-03-23 16:00 UTC (permalink / raw)
To: Steven Rostedt, Peter Zijlstra, Dmitry Ilvokhin
Cc: Vineeth Pillai (Google), Masami Hiramatsu, Mathieu Desnoyers,
Ingo Molnar, Jens Axboe, io-uring, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
Marcelo Ricardo Leitner, Xin Long, Jon Maloy, Aaron Conole,
Eelco Chaudron, Ilya Maximets, netdev, bpf, linux-sctp,
tipc-discussion, dev, Jiri Pirko, Oded Gabbay, Koby Elbaz,
dri-devel, Rafael J. Wysocki, Viresh Kumar, Gautham R. Shenoy,
Huang Rui, Mario Limonciello, Len Brown, Srinivas Pandruvada,
linux-pm, MyungJoo Ham, Kyungmin Park, Chanwoo Choi,
Christian König, Sumit Semwal, linaro-mm-sig, Eddie James,
Andrew Jeffery, Joel Stanley, linux-fsi, David Airlie,
Simona Vetter, Alex Deucher, Danilo Krummrich, Matthew Brost,
Philipp Stanner, Harry Wentland, Leo Li, amd-gfx, Jiri Kosina,
Benjamin Tissoires, linux-input, Wolfram Sang, linux-i2c,
Mark Brown, Michael Hennerich, Nuno Sá, linux-spi,
James E.J. Bottomley, Martin K. Petersen, linux-scsi, Chris Mason,
David Sterba, linux-btrfs, Thomas Gleixner, Andrew Morton,
SeongJae Park, linux-mm, Borislav Petkov, Dave Hansen, x86,
linux-trace-kernel, linux-kernel
When a caller already guards a tracepoint with an explicit enabled check:
if (trace_foo_enabled() && cond)
trace_foo(args);
trace_foo() internally re-evaluates the static_branch_unlikely() key.
Since static branches are patched binary instructions the compiler cannot
fold the two evaluations, so every such site pays the cost twice.
This series introduces trace_call__##name() as a companion to
trace_##name(). It calls __do_trace_##name() directly, bypassing the
redundant static-branch re-check, while preserving all other correctness
properties of the normal path (RCU-watching assertion, might_fault() for
syscall tracepoints). The internal __do_trace_##name() symbol is not
leaked to call sites; trace_call__##name() is the only new public API.
if (trace_foo_enabled() && cond)
trace_call__foo(args); /* calls __do_trace_foo() directly */
The first patch adds the three-location change to
include/linux/tracepoint.h (__DECLARE_TRACE, __DECLARE_TRACE_SYSCALL,
and the !TRACEPOINTS_ENABLED stub). The remaining 18 patches
mechanically convert all guarded call sites found in the tree:
kernel/, io_uring/, net/, accel/habanalabs, cpufreq/, devfreq/,
dma-buf/, fsi/, drm/, HID, i2c/, spi/, scsi/ufs/, btrfs/,
net/devlink/, kernel/time/, kernel/trace/, mm/damon/, and arch/x86/.
This series is motivated by Peter Zijlstra's observation in the discussion
around Dmitry Ilvokhin's locking tracepoint instrumentation series, where
he noted that compilers cannot optimize static branches and that guarded
call sites end up evaluating the static branch twice for no reason, and
by Steven Rostedt's suggestion to add a proper API instead of exposing
internal implementation details like __do_trace_##name() directly to
call sites:
https://lore.kernel.org/linux-trace-kernel/8298e098d3418cb446ef396f119edac58a3414e9.1772642407.git.d@ilvokhin.com
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Changes in v2:
- Renamed trace_invoke_##name() to trace_call__##name() (double
underscore) per review comments.
- Added 4 new patches covering sites missed in v1, found using
coccinelle to scan the tree (Keith Busch):
* net/devlink: guarded tracepoint_enabled() block in trap.c
* kernel/time: early-return guard in tick-sched.c (tick_stop)
* kernel/trace: early-return guard in trace_benchmark.c
* mm/damon: early-return guard in core.c
* arch/x86: do_trace_*() wrapper functions in lib/msr.c, which
are called exclusively from tracepoint_enabled()-guarded sites
in asm/msr.h
v1: https://lore.kernel.org/linux-trace-kernel/abSqrJ1J59RQC47U@kbusch-mbp/
Vineeth Pillai (Google) (19):
tracepoint: Add trace_call__##name() API
kernel: Use trace_call__##name() at guarded tracepoint call sites
io_uring: Use trace_call__##name() at guarded tracepoint call sites
net: Use trace_call__##name() at guarded tracepoint call sites
accel/habanalabs: Use trace_call__##name() at guarded tracepoint call
sites
cpufreq: Use trace_call__##name() at guarded tracepoint call sites
devfreq: Use trace_call__##name() at guarded tracepoint call sites
dma-buf: Use trace_call__##name() at guarded tracepoint call sites
fsi: Use trace_call__##name() at guarded tracepoint call sites
drm: Use trace_call__##name() at guarded tracepoint call sites
HID: Use trace_call__##name() at guarded tracepoint call sites
i2c: Use trace_call__##name() at guarded tracepoint call sites
spi: Use trace_call__##name() at guarded tracepoint call sites
scsi: ufs: Use trace_call__##name() at guarded tracepoint call sites
btrfs: Use trace_call__##name() at guarded tracepoint call sites
net: devlink: Use trace_call__##name() at guarded tracepoint call
sites
kernel: time, trace: Use trace_call__##name() at guarded tracepoint
call sites
mm: damon: Use trace_call__##name() at guarded tracepoint call sites
x86: msr: Use trace_call__##name() at guarded tracepoint call sites
arch/x86/lib/msr.c | 6 +++---
drivers/accel/habanalabs/common/device.c | 12 ++++++------
drivers/accel/habanalabs/common/mmu/mmu.c | 3 ++-
drivers/accel/habanalabs/common/pci/pci.c | 4 ++--
drivers/cpufreq/amd-pstate.c | 10 +++++-----
drivers/cpufreq/cpufreq.c | 2 +-
drivers/cpufreq/intel_pstate.c | 2 +-
drivers/devfreq/devfreq.c | 2 +-
drivers/dma-buf/dma-fence.c | 4 ++--
drivers/fsi/fsi-master-aspeed.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 4 ++--
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 2 +-
drivers/gpu/drm/scheduler/sched_entity.c | 4 ++--
drivers/hid/intel-ish-hid/ipc/pci-ish.c | 2 +-
drivers/i2c/i2c-core-slave.c | 2 +-
drivers/spi/spi-axi-spi-engine.c | 4 ++--
drivers/ufs/core/ufshcd.c | 12 ++++++------
fs/btrfs/extent_map.c | 4 ++--
fs/btrfs/raid56.c | 4 ++--
include/linux/tracepoint.h | 11 +++++++++++
io_uring/io_uring.h | 2 +-
kernel/irq_work.c | 2 +-
kernel/sched/ext.c | 2 +-
kernel/smp.c | 2 +-
kernel/time/tick-sched.c | 12 ++++++------
kernel/trace/trace_benchmark.c | 2 +-
mm/damon/core.c | 2 +-
net/core/dev.c | 2 +-
net/core/xdp.c | 2 +-
net/devlink/trap.c | 2 +-
net/openvswitch/actions.c | 2 +-
net/openvswitch/datapath.c | 2 +-
net/sctp/outqueue.c | 2 +-
net/tipc/node.c | 2 +-
35 files changed, 74 insertions(+), 62 deletions(-)
--
2.53.0
^ permalink raw reply
* [PATCH v2 02/19] kernel: Use trace_call__##name() at guarded tracepoint call sites
From: Vineeth Pillai (Google) @ 2026-03-23 16:00 UTC (permalink / raw)
Cc: Vineeth Pillai (Google), Steven Rostedt, Peter Zijlstra,
Tejun Heo, David Vernet, Andrea Righi, Changwoo Min, Ingo Molnar,
Juri Lelli, Vincent Guittot, Dietmar Eggemann, Ben Segall,
Mel Gorman, Valentin Schneider, Thomas Gleixner,
Yury Norov [NVIDIA], Paul E. McKenney, Rik van Riel, Roman Kisel,
Joel Fernandes, Rafael J. Wysocki, Ulf Hansson, linux-kernel,
sched-ext, linux-trace-kernel
In-Reply-To: <20260323160052.17528-1-vineeth@bitbyteword.org>
Replace trace_foo() with the new trace_call__foo() at sites already
guarded by trace_foo_enabled(), avoiding a redundant
static_branch_unlikely() re-evaluation inside the tracepoint.
trace_call__foo() calls the tracepoint callbacks directly without
utilizing the static branch again.
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Vineeth Pillai (Google) <vineeth@bitbyteword.org>
Assisted-by: Claude:claude-sonnet-4-6
---
kernel/irq_work.c | 2 +-
kernel/sched/ext.c | 2 +-
kernel/smp.c | 2 +-
3 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/kernel/irq_work.c b/kernel/irq_work.c
index 73f7e1fd4ab4d..120fd7365fbe2 100644
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -79,7 +79,7 @@ void __weak arch_irq_work_raise(void)
static __always_inline void irq_work_raise(struct irq_work *work)
{
if (trace_ipi_send_cpu_enabled() && arch_irq_work_has_interrupt())
- trace_ipi_send_cpu(smp_processor_id(), _RET_IP_, work->func);
+ trace_call__ipi_send_cpu(smp_processor_id(), _RET_IP_, work->func);
arch_irq_work_raise();
}
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 1594987d637b0..cfbac9cf62f84 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -4494,7 +4494,7 @@ static __printf(2, 3) void dump_line(struct seq_buf *s, const char *fmt, ...)
vscnprintf(line_buf, sizeof(line_buf), fmt, args);
va_end(args);
- trace_sched_ext_dump(line_buf);
+ trace_call__sched_ext_dump(line_buf);
}
#endif
/* @s may be zero sized and seq_buf triggers WARN if so */
diff --git a/kernel/smp.c b/kernel/smp.c
index f349960f79cad..537cf1f461d75 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -394,7 +394,7 @@ void __smp_call_single_queue(int cpu, struct llist_node *node)
func = CSD_TYPE(csd) == CSD_TYPE_TTWU ?
sched_ttwu_pending : csd->func;
- trace_csd_queue_cpu(cpu, _RET_IP_, func, csd);
+ trace_call__csd_queue_cpu(cpu, _RET_IP_, func, csd);
}
/*
--
2.53.0
^ permalink raw reply related
* [PATCH v2 03/19] io_uring: Use trace_call__##name() at guarded tracepoint call sites
From: Vineeth Pillai (Google) @ 2026-03-23 16:00 UTC (permalink / raw)
Cc: Vineeth Pillai (Google), Steven Rostedt, Peter Zijlstra,
Jens Axboe, io-uring, linux-kernel, linux-trace-kernel
In-Reply-To: <20260323160052.17528-1-vineeth@bitbyteword.org>
Replace trace_foo() with the new trace_call__foo() at sites already
guarded by trace_foo_enabled(), avoiding a redundant
static_branch_unlikely() re-evaluation inside the tracepoint.
trace_call__foo() calls the tracepoint callbacks directly without
utilizing the static branch again.
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Vineeth Pillai (Google) <vineeth@bitbyteword.org>
Assisted-by: Claude:claude-sonnet-4-6
---
io_uring/io_uring.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 0fa844faf2871..e99975ffdda12 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -299,7 +299,7 @@ static __always_inline bool io_fill_cqe_req(struct io_ring_ctx *ctx,
}
if (trace_io_uring_complete_enabled())
- trace_io_uring_complete(req->ctx, req, cqe);
+ trace_call__io_uring_complete(req->ctx, req, cqe);
return true;
}
--
2.53.0
^ permalink raw reply related
* [PATCH v2 05/19] accel/habanalabs: Use trace_call__##name() at guarded tracepoint call sites
From: Vineeth Pillai (Google) @ 2026-03-23 16:00 UTC (permalink / raw)
Cc: Vineeth Pillai (Google), Steven Rostedt, Peter Zijlstra,
Koby Elbaz, Konstantin Sinyuk, Oded Gabbay, Kees Cook,
Andy Shevchenko, dri-devel, linux-kernel, linux-trace-kernel
In-Reply-To: <20260323160052.17528-1-vineeth@bitbyteword.org>
Replace trace_foo() with the new trace_call__foo() at sites already
guarded by trace_foo_enabled(), avoiding a redundant
static_branch_unlikely() re-evaluation inside the tracepoint.
trace_call__foo() calls the tracepoint callbacks directly without
utilizing the static branch again.
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Vineeth Pillai (Google) <vineeth@bitbyteword.org>
Assisted-by: Claude:claude-sonnet-4-6
---
drivers/accel/habanalabs/common/device.c | 12 ++++++------
drivers/accel/habanalabs/common/mmu/mmu.c | 3 ++-
drivers/accel/habanalabs/common/pci/pci.c | 4 ++--
3 files changed, 10 insertions(+), 9 deletions(-)
diff --git a/drivers/accel/habanalabs/common/device.c b/drivers/accel/habanalabs/common/device.c
index 09b27bac3a31d..a9ad1edd0bf41 100644
--- a/drivers/accel/habanalabs/common/device.c
+++ b/drivers/accel/habanalabs/common/device.c
@@ -132,8 +132,8 @@ static void *hl_dma_alloc_common(struct hl_device *hdev, size_t size, dma_addr_t
}
if (trace_habanalabs_dma_alloc_enabled() && !ZERO_OR_NULL_PTR(ptr))
- trace_habanalabs_dma_alloc(&(hdev)->pdev->dev, (u64) (uintptr_t) ptr, *dma_handle,
- size, caller);
+ trace_call__habanalabs_dma_alloc(&(hdev)->pdev->dev, (u64) (uintptr_t) ptr,
+ *dma_handle, size, caller);
return ptr;
}
@@ -206,7 +206,7 @@ int hl_dma_map_sgtable_caller(struct hl_device *hdev, struct sg_table *sgt,
return 0;
for_each_sgtable_dma_sg(sgt, sg, i)
- trace_habanalabs_dma_map_page(&(hdev)->pdev->dev,
+ trace_call__habanalabs_dma_map_page(&(hdev)->pdev->dev,
page_to_phys(sg_page(sg)),
sg->dma_address - prop->device_dma_offset_for_host_access,
#ifdef CONFIG_NEED_SG_DMA_LENGTH
@@ -249,7 +249,7 @@ void hl_dma_unmap_sgtable_caller(struct hl_device *hdev, struct sg_table *sgt,
if (trace_habanalabs_dma_unmap_page_enabled()) {
for_each_sgtable_dma_sg(sgt, sg, i)
- trace_habanalabs_dma_unmap_page(&(hdev)->pdev->dev,
+ trace_call__habanalabs_dma_unmap_page(&(hdev)->pdev->dev,
page_to_phys(sg_page(sg)),
sg->dma_address - prop->device_dma_offset_for_host_access,
#ifdef CONFIG_NEED_SG_DMA_LENGTH
@@ -2656,7 +2656,7 @@ inline u32 hl_rreg(struct hl_device *hdev, u32 reg)
u32 val = readl(hdev->rmmio + reg);
if (unlikely(trace_habanalabs_rreg32_enabled()))
- trace_habanalabs_rreg32(&(hdev)->pdev->dev, reg, val);
+ trace_call__habanalabs_rreg32(&(hdev)->pdev->dev, reg, val);
return val;
}
@@ -2674,7 +2674,7 @@ inline u32 hl_rreg(struct hl_device *hdev, u32 reg)
inline void hl_wreg(struct hl_device *hdev, u32 reg, u32 val)
{
if (unlikely(trace_habanalabs_wreg32_enabled()))
- trace_habanalabs_wreg32(&(hdev)->pdev->dev, reg, val);
+ trace_call__habanalabs_wreg32(&(hdev)->pdev->dev, reg, val);
writel(val, hdev->rmmio + reg);
}
diff --git a/drivers/accel/habanalabs/common/mmu/mmu.c b/drivers/accel/habanalabs/common/mmu/mmu.c
index 6c7c4ff8a8a95..fdeca1fe5fc78 100644
--- a/drivers/accel/habanalabs/common/mmu/mmu.c
+++ b/drivers/accel/habanalabs/common/mmu/mmu.c
@@ -263,7 +263,8 @@ int hl_mmu_unmap_page(struct hl_ctx *ctx, u64 virt_addr, u32 page_size, bool flu
mmu_funcs->flush(ctx);
if (trace_habanalabs_mmu_unmap_enabled() && !rc)
- trace_habanalabs_mmu_unmap(&hdev->pdev->dev, virt_addr, 0, page_size, flush_pte);
+ trace_call__habanalabs_mmu_unmap(&hdev->pdev->dev, virt_addr,
+ 0, page_size, flush_pte);
return rc;
}
diff --git a/drivers/accel/habanalabs/common/pci/pci.c b/drivers/accel/habanalabs/common/pci/pci.c
index 81cbd8697d4cd..bffbde3c399be 100644
--- a/drivers/accel/habanalabs/common/pci/pci.c
+++ b/drivers/accel/habanalabs/common/pci/pci.c
@@ -123,7 +123,7 @@ int hl_pci_elbi_read(struct hl_device *hdev, u64 addr, u32 *data)
pci_read_config_dword(pdev, mmPCI_CONFIG_ELBI_DATA, data);
if (unlikely(trace_habanalabs_elbi_read_enabled()))
- trace_habanalabs_elbi_read(&hdev->pdev->dev, (u32) addr, val);
+ trace_call__habanalabs_elbi_read(&hdev->pdev->dev, (u32) addr, val);
return 0;
}
@@ -186,7 +186,7 @@ static int hl_pci_elbi_write(struct hl_device *hdev, u64 addr, u32 data)
if ((val & PCI_CONFIG_ELBI_STS_MASK) == PCI_CONFIG_ELBI_STS_DONE) {
if (unlikely(trace_habanalabs_elbi_write_enabled()))
- trace_habanalabs_elbi_write(&hdev->pdev->dev, (u32) addr, val);
+ trace_call__habanalabs_elbi_write(&hdev->pdev->dev, (u32) addr, val);
return 0;
}
--
2.53.0
^ permalink raw reply related
* [PATCH v2 04/19] net: Use trace_call__##name() at guarded tracepoint call sites
From: Vineeth Pillai (Google) @ 2026-03-23 16:00 UTC (permalink / raw)
Cc: Vineeth Pillai (Google), Steven Rostedt, Peter Zijlstra,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Alexei Starovoitov, Daniel Borkmann,
Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
Aaron Conole, Eelco Chaudron, Ilya Maximets,
Marcelo Ricardo Leitner, Xin Long, Jon Maloy, Willem de Bruijn,
Samiullah Khawaja, Hangbin Liu, Kuniyuki Iwashima, netdev,
linux-kernel, bpf, dev, linux-sctp, tipc-discussion,
linux-trace-kernel
In-Reply-To: <20260323160052.17528-1-vineeth@bitbyteword.org>
Replace trace_foo() with the new trace_call__foo() at sites already
guarded by trace_foo_enabled(), avoiding a redundant
static_branch_unlikely() re-evaluation inside the tracepoint.
trace_call__foo() calls the tracepoint callbacks directly without
utilizing the static branch again.
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Vineeth Pillai (Google) <vineeth@bitbyteword.org>
Assisted-by: Claude:claude-sonnet-4-6
---
net/core/dev.c | 2 +-
net/core/xdp.c | 2 +-
net/openvswitch/actions.c | 2 +-
net/openvswitch/datapath.c | 2 +-
net/sctp/outqueue.c | 2 +-
net/tipc/node.c | 2 +-
6 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index 14a83f2035b93..f7602b1892fea 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6444,7 +6444,7 @@ void netif_receive_skb_list(struct list_head *head)
return;
if (trace_netif_receive_skb_list_entry_enabled()) {
list_for_each_entry(skb, head, list)
- trace_netif_receive_skb_list_entry(skb);
+ trace_call__netif_receive_skb_list_entry(skb);
}
netif_receive_skb_list_internal(head);
trace_netif_receive_skb_list_exit(0);
diff --git a/net/core/xdp.c b/net/core/xdp.c
index 9890a30584ba7..3003e5c574191 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -362,7 +362,7 @@ int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
xsk_pool_set_rxq_info(allocator, xdp_rxq);
if (trace_mem_connect_enabled() && xdp_alloc)
- trace_mem_connect(xdp_alloc, xdp_rxq);
+ trace_call__mem_connect(xdp_alloc, xdp_rxq);
return 0;
}
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 792ca44a461da..60823de201417 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -1259,7 +1259,7 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
int err = 0;
if (trace_ovs_do_execute_action_enabled())
- trace_ovs_do_execute_action(dp, skb, key, a, rem);
+ trace_call__ovs_do_execute_action(dp, skb, key, a, rem);
/* Actions that rightfully have to consume the skb should do it
* and return directly.
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index e209099218b41..2b9755e2e4731 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -335,7 +335,7 @@ int ovs_dp_upcall(struct datapath *dp, struct sk_buff *skb,
int err;
if (trace_ovs_dp_upcall_enabled())
- trace_ovs_dp_upcall(dp, skb, key, upcall_info);
+ trace_call__ovs_dp_upcall(dp, skb, key, upcall_info);
if (upcall_info->portid == 0) {
err = -ENOTCONN;
diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
index f6b8c13dafa4a..4025d863ffc84 100644
--- a/net/sctp/outqueue.c
+++ b/net/sctp/outqueue.c
@@ -1267,7 +1267,7 @@ int sctp_outq_sack(struct sctp_outq *q, struct sctp_chunk *chunk)
/* SCTP path tracepoint for congestion control debugging. */
if (trace_sctp_probe_path_enabled()) {
list_for_each_entry(transport, transport_list, transports)
- trace_sctp_probe_path(transport, asoc);
+ trace_call__sctp_probe_path(transport, asoc);
}
sack_ctsn = ntohl(sack->cum_tsn_ack);
diff --git a/net/tipc/node.c b/net/tipc/node.c
index af442a5ef8f3d..5745d6aa0a054 100644
--- a/net/tipc/node.c
+++ b/net/tipc/node.c
@@ -1943,7 +1943,7 @@ static bool tipc_node_check_state(struct tipc_node *n, struct sk_buff *skb,
if (trace_tipc_node_check_state_enabled()) {
trace_tipc_skb_dump(skb, false, "skb for node state check");
- trace_tipc_node_check_state(n, true, " ");
+ trace_call__tipc_node_check_state(n, true, " ");
}
l = n->links[bearer_id].link;
if (!l)
--
2.53.0
^ permalink raw reply related
* [PATCH v2 06/19] cpufreq: Use trace_call__##name() at guarded tracepoint call sites
From: Vineeth Pillai (Google) @ 2026-03-23 16:00 UTC (permalink / raw)
Cc: Vineeth Pillai (Google), Steven Rostedt, Peter Zijlstra,
Huang Rui, Gautham R. Shenoy, Mario Limonciello, Perry Yuan,
Rafael J. Wysocki, Viresh Kumar, Srinivas Pandruvada, Len Brown,
linux-pm, linux-kernel, linux-trace-kernel
In-Reply-To: <20260323160052.17528-1-vineeth@bitbyteword.org>
Replace trace_foo() with the new trace_call__foo() at sites already
guarded by trace_foo_enabled(), avoiding a redundant
static_branch_unlikely() re-evaluation inside the tracepoint.
trace_call__foo() calls the tracepoint callbacks directly without
utilizing the static branch again.
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Vineeth Pillai (Google) <vineeth@bitbyteword.org>
Assisted-by: Claude:claude-sonnet-4-6
---
drivers/cpufreq/amd-pstate.c | 10 +++++-----
drivers/cpufreq/cpufreq.c | 2 +-
drivers/cpufreq/intel_pstate.c | 2 +-
3 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
index 5aa9fcd80cf51..4c47324aa2f73 100644
--- a/drivers/cpufreq/amd-pstate.c
+++ b/drivers/cpufreq/amd-pstate.c
@@ -247,7 +247,7 @@ static int msr_update_perf(struct cpufreq_policy *policy, u8 min_perf,
if (trace_amd_pstate_epp_perf_enabled()) {
union perf_cached perf = READ_ONCE(cpudata->perf);
- trace_amd_pstate_epp_perf(cpudata->cpu,
+ trace_call__amd_pstate_epp_perf(cpudata->cpu,
perf.highest_perf,
epp,
min_perf,
@@ -298,7 +298,7 @@ static int msr_set_epp(struct cpufreq_policy *policy, u8 epp)
if (trace_amd_pstate_epp_perf_enabled()) {
union perf_cached perf = cpudata->perf;
- trace_amd_pstate_epp_perf(cpudata->cpu, perf.highest_perf,
+ trace_call__amd_pstate_epp_perf(cpudata->cpu, perf.highest_perf,
epp,
FIELD_GET(AMD_CPPC_MIN_PERF_MASK,
cpudata->cppc_req_cached),
@@ -343,7 +343,7 @@ static int shmem_set_epp(struct cpufreq_policy *policy, u8 epp)
if (trace_amd_pstate_epp_perf_enabled()) {
union perf_cached perf = cpudata->perf;
- trace_amd_pstate_epp_perf(cpudata->cpu, perf.highest_perf,
+ trace_call__amd_pstate_epp_perf(cpudata->cpu, perf.highest_perf,
epp,
FIELD_GET(AMD_CPPC_MIN_PERF_MASK,
cpudata->cppc_req_cached),
@@ -507,7 +507,7 @@ static int shmem_update_perf(struct cpufreq_policy *policy, u8 min_perf,
if (trace_amd_pstate_epp_perf_enabled()) {
union perf_cached perf = READ_ONCE(cpudata->perf);
- trace_amd_pstate_epp_perf(cpudata->cpu,
+ trace_call__amd_pstate_epp_perf(cpudata->cpu,
perf.highest_perf,
epp,
min_perf,
@@ -588,7 +588,7 @@ static void amd_pstate_update(struct amd_cpudata *cpudata, u8 min_perf,
}
if (trace_amd_pstate_perf_enabled() && amd_pstate_sample(cpudata)) {
- trace_amd_pstate_perf(min_perf, des_perf, max_perf, cpudata->freq,
+ trace_call__amd_pstate_perf(min_perf, des_perf, max_perf, cpudata->freq,
cpudata->cur.mperf, cpudata->cur.aperf, cpudata->cur.tsc,
cpudata->cpu, fast_switch);
}
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 277884d91913c..58901047eae5a 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -2222,7 +2222,7 @@ unsigned int cpufreq_driver_fast_switch(struct cpufreq_policy *policy,
if (trace_cpu_frequency_enabled()) {
for_each_cpu(cpu, policy->cpus)
- trace_cpu_frequency(freq, cpu);
+ trace_call__cpu_frequency(freq, cpu);
}
return freq;
diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 11c58af419006..70be952209144 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -3132,7 +3132,7 @@ static void intel_cpufreq_trace(struct cpudata *cpu, unsigned int trace_type, in
return;
sample = &cpu->sample;
- trace_pstate_sample(trace_type,
+ trace_call__pstate_sample(trace_type,
0,
old_pstate,
cpu->pstate.current_pstate,
--
2.53.0
^ permalink raw reply related
* [PATCH v2 07/19] devfreq: Use trace_call__##name() at guarded tracepoint call sites
From: Vineeth Pillai (Google) @ 2026-03-23 16:00 UTC (permalink / raw)
Cc: Vineeth Pillai (Google), Steven Rostedt, Peter Zijlstra,
MyungJoo Ham, Kyungmin Park, Chanwoo Choi, linux-pm, linux-kernel,
linux-trace-kernel
In-Reply-To: <20260323160052.17528-1-vineeth@bitbyteword.org>
Replace trace_foo() with the new trace_call__foo() at sites already
guarded by trace_foo_enabled(), avoiding a redundant
static_branch_unlikely() re-evaluation inside the tracepoint.
trace_call__foo() calls the tracepoint callbacks directly without
utilizing the static branch again.
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Vineeth Pillai (Google) <vineeth@bitbyteword.org>
Assisted-by: Claude:claude-sonnet-4-6
---
drivers/devfreq/devfreq.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
index c0a74091b9041..d1b27d9b753df 100644
--- a/drivers/devfreq/devfreq.c
+++ b/drivers/devfreq/devfreq.c
@@ -370,7 +370,7 @@ static int devfreq_set_target(struct devfreq *devfreq, unsigned long new_freq,
* change order of between devfreq device and passive devfreq device.
*/
if (trace_devfreq_frequency_enabled() && new_freq != cur_freq)
- trace_devfreq_frequency(devfreq, new_freq, cur_freq);
+ trace_call__devfreq_frequency(devfreq, new_freq, cur_freq);
freqs.new = new_freq;
devfreq_notify_transition(devfreq, &freqs, DEVFREQ_POSTCHANGE);
--
2.53.0
^ permalink raw reply related
* [PATCH v2 08/19] dma-buf: Use trace_call__##name() at guarded tracepoint call sites
From: Vineeth Pillai (Google) @ 2026-03-23 16:00 UTC (permalink / raw)
Cc: Vineeth Pillai (Google), Steven Rostedt, Peter Zijlstra,
Sumit Semwal, Christian König, linux-media, dri-devel,
linaro-mm-sig, linux-kernel, linux-trace-kernel
In-Reply-To: <20260323160052.17528-1-vineeth@bitbyteword.org>
Replace trace_foo() with the new trace_call__foo() at sites already
guarded by trace_foo_enabled(), avoiding a redundant
static_branch_unlikely() re-evaluation inside the tracepoint.
trace_call__foo() calls the tracepoint callbacks directly without
utilizing the static branch again.
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Vineeth Pillai (Google) <vineeth@bitbyteword.org>
Assisted-by: Claude:claude-sonnet-4-6
---
drivers/dma-buf/dma-fence.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index 35afcfcac5910..232e92196da43 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -535,7 +535,7 @@ dma_fence_wait_timeout(struct dma_fence *fence, bool intr, signed long timeout)
if (trace_dma_fence_wait_start_enabled()) {
rcu_read_lock();
- trace_dma_fence_wait_start(fence);
+ trace_call__dma_fence_wait_start(fence);
rcu_read_unlock();
}
if (fence->ops->wait)
@@ -544,7 +544,7 @@ dma_fence_wait_timeout(struct dma_fence *fence, bool intr, signed long timeout)
ret = dma_fence_default_wait(fence, intr, timeout);
if (trace_dma_fence_wait_end_enabled()) {
rcu_read_lock();
- trace_dma_fence_wait_end(fence);
+ trace_call__dma_fence_wait_end(fence);
rcu_read_unlock();
}
return ret;
--
2.53.0
^ permalink raw reply related
* [PATCH v2 09/19] fsi: Use trace_call__##name() at guarded tracepoint call sites
From: Vineeth Pillai (Google) @ 2026-03-23 16:00 UTC (permalink / raw)
Cc: Vineeth Pillai (Google), Steven Rostedt, Peter Zijlstra,
Eddie James, Ninad Palsule, Joel Stanley, Andrew Jeffery,
linux-fsi, linux-arm-kernel, linux-aspeed, linux-kernel,
linux-trace-kernel
In-Reply-To: <20260323160052.17528-1-vineeth@bitbyteword.org>
Replace trace_foo() with the new trace_call__foo() at sites already
guarded by trace_foo_enabled(), avoiding a redundant
static_branch_unlikely() re-evaluation inside the tracepoint.
trace_call__foo() calls the tracepoint callbacks directly without
utilizing the static branch again.
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Vineeth Pillai (Google) <vineeth@bitbyteword.org>
Assisted-by: Claude:claude-sonnet-4-6
---
drivers/fsi/fsi-master-aspeed.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/fsi/fsi-master-aspeed.c b/drivers/fsi/fsi-master-aspeed.c
index aa1380cdff338..8313c7d530eb7 100644
--- a/drivers/fsi/fsi-master-aspeed.c
+++ b/drivers/fsi/fsi-master-aspeed.c
@@ -229,7 +229,7 @@ static int check_errors(struct fsi_master_aspeed *aspeed, int err)
opb_readl(aspeed, ctrl_base + FSI_MSTAP0, &mstap0);
opb_readl(aspeed, ctrl_base + FSI_MESRB0, &mesrb0);
- trace_fsi_master_aspeed_opb_error(
+ trace_call__fsi_master_aspeed_opb_error(
be32_to_cpu(mresp0),
be32_to_cpu(mstap0),
be32_to_cpu(mesrb0));
--
2.53.0
^ permalink raw reply related
* [PATCH v2 01/19] tracepoint: Add trace_call__##name() API
From: Vineeth Pillai (Google) @ 2026-03-23 16:00 UTC (permalink / raw)
To: Steven Rostedt, Peter Zijlstra, Dmitry Ilvokhin
Cc: Vineeth Pillai (Google), Masami Hiramatsu, Mathieu Desnoyers,
Ingo Molnar, Jens Axboe, io-uring, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
Marcelo Ricardo Leitner, Xin Long, Jon Maloy, Aaron Conole,
Eelco Chaudron, Ilya Maximets, netdev, bpf, linux-sctp,
tipc-discussion, dev, Jiri Pirko, Oded Gabbay, Koby Elbaz,
dri-devel, Rafael J. Wysocki, Viresh Kumar, Gautham R. Shenoy,
Huang Rui, Mario Limonciello, Len Brown, Srinivas Pandruvada,
linux-pm, MyungJoo Ham, Kyungmin Park, Chanwoo Choi,
Christian König, Sumit Semwal, linaro-mm-sig, Eddie James,
Andrew Jeffery, Joel Stanley, linux-fsi, David Airlie,
Simona Vetter, Alex Deucher, Danilo Krummrich, Matthew Brost,
Philipp Stanner, Harry Wentland, Leo Li, amd-gfx, Jiri Kosina,
Benjamin Tissoires, linux-input, Wolfram Sang, linux-i2c,
Mark Brown, Michael Hennerich, Nuno Sá, linux-spi,
James E.J. Bottomley, Martin K. Petersen, linux-scsi, Chris Mason,
David Sterba, linux-btrfs, Thomas Gleixner, Andrew Morton,
SeongJae Park, linux-mm, Borislav Petkov, Dave Hansen, x86,
linux-trace-kernel, linux-kernel
In-Reply-To: <20260323160052.17528-1-vineeth@bitbyteword.org>
Add trace_call__##name() as a companion to trace_##name(). When a
caller already guards a tracepoint with an explicit enabled check:
if (trace_foo_enabled() && cond)
trace_foo(args);
trace_foo() internally repeats the static_branch_unlikely() test, which
the compiler cannot fold since static branches are patched binary
instructions. This results in two static-branch evaluations for every
guarded call site.
trace_call__##name() calls __do_trace_##name() directly, skipping the
redundant static-branch re-check. This avoids leaking the internal
__do_trace_##name() symbol into call sites while still eliminating the
double evaluation:
if (trace_foo_enabled() && cond)
trace_invoke_foo(args); /* calls __do_trace_foo() directly */
Three locations are updated:
- __DECLARE_TRACE: invoke form omits static_branch_unlikely, retains
the LOCKDEP RCU-watching assertion.
- __DECLARE_TRACE_SYSCALL: same, plus retains might_fault().
- !TRACEPOINTS_ENABLED stub: empty no-op so callers compile cleanly
when tracepoints are compiled out.
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Vineeth Pillai (Google) <vineeth@bitbyteword.org>
Assisted-by: Claude:claude-sonnet-4-6
---
include/linux/tracepoint.h | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index 22ca1c8b54f32..ed969705341f1 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -294,6 +294,10 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
WARN_ONCE(!rcu_is_watching(), \
"RCU not watching for tracepoint"); \
} \
+ } \
+ static inline void trace_call__##name(proto) \
+ { \
+ __do_trace_##name(args); \
}
#define __DECLARE_TRACE_SYSCALL(name, proto, args, data_proto) \
@@ -313,6 +317,11 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
WARN_ONCE(!rcu_is_watching(), \
"RCU not watching for tracepoint"); \
} \
+ } \
+ static inline void trace_call__##name(proto) \
+ { \
+ might_fault(); \
+ __do_trace_##name(args); \
}
/*
@@ -398,6 +407,8 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
#define __DECLARE_TRACE_COMMON(name, proto, args, data_proto) \
static inline void trace_##name(proto) \
{ } \
+ static inline void trace_call__##name(proto) \
+ { } \
static inline int \
register_trace_##name(void (*probe)(data_proto), \
void *data) \
--
2.53.0
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox