Linux Trace Kernel


* Re: [PATCH v17 1/5] ring-buffer: Flush and stop persistent ring buffer on panic
From: Geert Uytterhoeven @ 2026-04-23  7:28 UTC (permalink / raw)
  To: Masami Hiramatsu (Google)
  Cc: Steven Rostedt, Catalin Marinas, Will Deacon, Mathieu Desnoyers,
	linux-kernel, linux-trace-kernel, Ian Rogers, linux-arm-kernel
In-Reply-To: <177687459412.932171.8121855108122534476.stgit@mhiramat.tok.corp.google.com>

On Wed, 22 Apr 2026 at 18:26, Masami Hiramatsu (Google)
<mhiramat@kernel.org> wrote:
> From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
>
> On real hardware, panic and machine reboot may not flush hardware cache
> to memory. This means the persistent ring buffer, which relies on a
> coherent state of memory, may not have its events written to the buffer
> and they may be lost. Moreover, there may be inconsistency with the
> counters which are used for validation of the integrity of the
> persistent ring buffer which may cause all data to be discarded.
>
> To avoid this issue, stop recording of the ring buffer on panic and
> flush the cache of the ring buffer's memory.
>
> Fixes: e645535a954a ("tracing: Add option to use memmapped memory for trace boot instance")
> Cc: stable@vger.kernel.org
> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> Acked-by: Catalin Marinas <catalin.marinas@arm.com>

>  arch/m68k/include/asm/Kbuild         |    1 +

Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> # m68k

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply

* Re: [PATCH bpf-next 10/17] bpf: Add support for tracing_multi link session
From: XIAO WU @ 2026-04-23  8:07 UTC (permalink / raw)
  To: bot+bpf-ci
  Cc: andrii, ast, bpf, clm, daniel, eddyz87, ihor.solodrai, jolsa,
	kafai, linux-trace-kernel, martin.lau, menglong8.dong, rostedt,
	songliubraving, yhs, yonghong.song
In-Reply-To: <0520af2c467a82b82aa3014b7e721f95bfde9e91324bb2e183b099e5a37bbc3e@mail.kernel.org>

Hi,

> BPF_TRACE_FSESSION_MULTI is now accepted here, which means
> a program with this type can enter bpf_tracing_prog_attach()
> via BPF_RAW_TRACEPOINT_OPEN:
>
> bpf_raw_tracepoint_open()
>   bpf_raw_tp_link_attach()       /* name == NULL */
>     bpf_tracing_prog_attach()    /* BPF_TRACE_FSESSION_MULTI */
>
> Further down in bpf_tracing_prog_attach(), the fexit node
> initialization only checks for BPF_TRACE_FSESSION:
>
> kernel/bpf/syscall.c:bpf_tracing_prog_attach() {
>     ...
>     if (prog->expected_attach_type == BPF_TRACE_FSESSION) {
>         link->fexit.link = &link->link.link;
>         link->fexit.cookie = bpf_cookie;
>     }
>     ...
> }
>
> So for BPF_TRACE_FSESSION_MULTI, link->fexit.link stays NULL
> (from kzalloc). When __bpf_trampoline_link_prog() later calls
> fsession_exit(), it returns &link->fexit with a NULL link
> field. This node gets added to the trampoline FEXIT list, and
> bpf_trampoline_get_progs() then dereferences it:
>
> kernel/bpf/trampoline.c:bpf_trampoline_get_progs() {
>     ...
>     hlist_for_each_entry(node, &tr->progs_hlist[kind], tramp_hlist) {
>         *ip_arg |= node->link->prog->call_get_func_ip;
>                    ^^^^^^^^^^
>     ...
> }
>
> Would it make sense to either add BPF_TRACE_FSESSION_MULTI to
> the fexit initialization, or reject this type in
> bpf_tracing_prog_attach() since it should only be used through
> bpf_tracing_multi_attach()?

Yes, confirmed.

I reproduced this on x86_64 with a minimal tracing program loaded as
BPF_PROG_TYPE_TRACING with
expected_attach_type=BPF_TRACE_FSESSION_MULTI, then attached through
BPF_RAW_TRACEPOINT_OPEN with name=NULL.

This reaches bpf_tracing_prog_attach() without initializing link->fexit
for FSESSION_MULTI and later hits the NULL dereference path in
trampoline handling, as you pointed out.

C reproducer:

--8<--
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Use kernel-under-test UAPI, not host's potentially older one. */
#include "../kernel-source/include/uapi/linux/bpf.h"

#ifndef __NR_bpf
#define __NR_bpf 321
#endif

static int sys_bpf(int cmd, union bpf_attr *attr, unsigned int size)
{
    return (int)syscall(__NR_bpf, cmd, attr, size);
}

static void bump_memlock(void)
{
    struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
    setrlimit(RLIMIT_MEMLOCK, &r);
}

int main(void)
{
    bump_memlock();

    /* r0 = 0; exit */
    struct bpf_insn prog[] = {
        { .code = 0xb7, .dst_reg = 0, .src_reg = 0, .off = 0, .imm = 0 },
        { .code = 0x95, .dst_reg = 0, .src_reg = 0, .off = 0, .imm = 0 },
    };

    char license[] = "GPL";
    static char log_buf[1 << 20];

    union bpf_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.prog_type = BPF_PROG_TYPE_TRACING;
    attr.expected_attach_type = BPF_TRACE_FSESSION_MULTI;
    attr.insn_cnt = (uint32_t)(sizeof(prog) / sizeof(prog[0]));
    attr.insns = (uint64_t)(uintptr_t)prog;
    attr.license = (uint64_t)(uintptr_t)license;
    attr.log_buf = (uint64_t)(uintptr_t)log_buf;
    attr.log_size = sizeof(log_buf);
    attr.log_level = 1;

    int prog_fd = sys_bpf(BPF_PROG_LOAD, &attr, sizeof(attr));
    if (prog_fd < 0) {
        fprintf(stderr, "BPF_PROG_LOAD failed: errno=%d (%s)\n", errno,
                strerror(errno));
        if (log_buf[0])
            fprintf(stderr, "verifier log:\n%s\n", log_buf);
        return 1;
    }

    memset(&attr, 0, sizeof(attr));
    attr.raw_tracepoint.prog_fd = prog_fd;
    /* NULL name drives TRACING attach path */
    attr.raw_tracepoint.name = 0;
    attr.raw_tracepoint.cookie = 0x4141414142424242ULL;

    int link_fd = sys_bpf(BPF_RAW_TRACEPOINT_OPEN, &attr, sizeof(attr));

    if (link_fd < 0) {
        fprintf(stderr, "BPF_RAW_TRACEPOINT_OPEN returned errno=%d (%s)\n",
                errno, strerror(errno));
        close(prog_fd);
        return 2;
    }

    fprintf(stderr, "Unexpectedly succeeded: link_fd=%d\n", link_fd);
    close(link_fd);
    close(prog_fd);
    return 0;
}
--8<--

I agree the patch should be made bisect-safe. I will post a follow-up
that ensures BPF_TRACE_FSESSION_MULTI cannot enter this uninitialized
fexit path (either by initializing it consistently where needed, or
rejecting this attach route and keeping it exclusive to
bpf_tracing_multi_attach()).

Signed-off-by: XIAO WU <shawdoxwu@gmail.com>

Thanks

^ permalink raw reply

* Re: [PATCH v17 0/5] ring-buffer: Making persistent ring buffers robust
From: Masami Hiramatsu @ 2026-04-23  8:26 UTC (permalink / raw)
  To: Masami Hiramatsu (Google)
  Cc: Steven Rostedt, Catalin Marinas, Will Deacon, Mathieu Desnoyers,
	linux-kernel, linux-trace-kernel, Ian Rogers, linux-arm-kernel
In-Reply-To: <177687458572.932171.10907864814735342737.stgit@mhiramat.tok.corp.google.com>

Hi,

Sashiko[1] pointed out other problems. Let me review them.
I also found one mistake (not introduced by this series), so I will fix it too.

[1] https://sashiko.dev/#/patchset/177687458572.932171.10907864814735342737.stgit%40mhiramat.tok.corp.google.com

Thanks,

On Thu, 23 Apr 2026 01:16:26 +0900
"Masami Hiramatsu (Google)" <mhiramat@kernel.org> wrote:

> Hi,
> 
> Here is the 17th version of improvement patches for making persistent
> ring buffers robust to failures.
> The previous version is here:
> 
> https://lore.kernel.org/all/177547105523.259641.14385891517704197263.stgit@mhiramat.tok.corp.google.com/
> 
> This version fixes some review comments from Sashiko[1], which
> includes:
> [2/5] Fix to use rb_page_size() of rewound pages for entry_bytes.
> [3/5] - Fix to verify head_page at first before using its timestamp.
>       - Reset timestamp if the page is invalid.
> [4/5] - In rb_test_inject_invalid_pages(), changed entry_bytes and
>        idx to unsigned long
>       - Added NULL checks for cpu_buffer and meta.
>       - In allocate_trace_buffer(), added a NULL check for tr->name
>        before comparing it with strcmp.
> [5/5] Added NULL check for dpage in rbm_show in ring_buffer.c.
> 
> [1] https://sashiko.dev/#/patchset/177552432201.853249.5125045538812833325.stgit%40mhiramat.tok.corp.google.com
> 
> Thank you,
> 
> Masami Hiramatsu (Google) (5):
>       ring-buffer: Flush and stop persistent ring buffer on panic
>       ring-buffer: Skip invalid sub-buffers when validating persistent ring buffer
>       ring-buffer: Skip invalid sub-buffers when rewinding persistent ring buffer
>       ring-buffer: Add persistent ring buffer invalid-page inject test
>       ring-buffer: Show commit numbers in buffer_meta file
> 
> 
>  arch/alpha/include/asm/Kbuild        |    1 
>  arch/arc/include/asm/Kbuild          |    1 
>  arch/arm/include/asm/Kbuild          |    1 
>  arch/arm64/include/asm/ring_buffer.h |   10 +
>  arch/csky/include/asm/Kbuild         |    1 
>  arch/hexagon/include/asm/Kbuild      |    1 
>  arch/loongarch/include/asm/Kbuild    |    1 
>  arch/m68k/include/asm/Kbuild         |    1 
>  arch/microblaze/include/asm/Kbuild   |    1 
>  arch/mips/include/asm/Kbuild         |    1 
>  arch/nios2/include/asm/Kbuild        |    1 
>  arch/openrisc/include/asm/Kbuild     |    1 
>  arch/parisc/include/asm/Kbuild       |    1 
>  arch/powerpc/include/asm/Kbuild      |    1 
>  arch/riscv/include/asm/Kbuild        |    1 
>  arch/s390/include/asm/Kbuild         |    1 
>  arch/sh/include/asm/Kbuild           |    1 
>  arch/sparc/include/asm/Kbuild        |    1 
>  arch/um/include/asm/Kbuild           |    1 
>  arch/x86/include/asm/Kbuild          |    1 
>  arch/xtensa/include/asm/Kbuild       |    1 
>  include/asm-generic/ring_buffer.h    |   13 ++
>  include/linux/ring_buffer.h          |    1 
>  kernel/trace/Kconfig                 |   34 ++++
>  kernel/trace/ring_buffer.c           |  275 ++++++++++++++++++++++++++--------
>  kernel/trace/trace.c                 |    4 
>  26 files changed, 290 insertions(+), 67 deletions(-)
>  create mode 100644 arch/arm64/include/asm/ring_buffer.h
>  create mode 100644 include/asm-generic/ring_buffer.h
> 
> 
> base-commit: 6170922f137231b98fc568571befef63e1edff3f
> --
> Masami Hiramatsu (Google) <mhiramat@kernel.org>


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply

* Re: [PATCH bpf-next 10/17] bpf: Add support for tracing_multi link session
From: Jiri Olsa @ 2026-04-23  8:35 UTC (permalink / raw)
  To: XIAO WU
  Cc: bot+bpf-ci, andrii, ast, bpf, clm, daniel, eddyz87, ihor.solodrai,
	kafai, linux-trace-kernel, martin.lau, menglong8.dong, rostedt,
	songliubraving, yhs, yonghong.song
In-Reply-To: <20260423160724.00004f6d@gmail.com>

On Thu, Apr 23, 2026 at 04:07:24PM +0800, XIAO WU wrote:

SNIP

> I agree the patch should be made bisect-safe. I will post a follow-up
> that ensures BPF_TRACE_FSESSION_MULTI cannot enter this uninitialized
> fexit path (either by initializing it consistently where needed, or
> rejecting this attach route and keeping it exclusive to
> bpf_tracing_multi_attach()).
> 
> Signed-off-by: XIAO WU <shawdoxwu@gmail.com>
> 
> Thanks

fyi there's v5 already https://lore.kernel.org/bpf/20260417192502.194548-1-jolsa@kernel.org/

jirka

^ permalink raw reply

* [PATCH] mm/vmscan: add balance_pgdat begin/end tracepoints
From: Bunyod Suvonov @ 2026-04-23 10:37 UTC (permalink / raw)
  To: akpm, hannes, rostedt, mhiramat
  Cc: david, mhocko, zhengqi.arch, shakeel.butt, ljs, mathieu.desnoyers,
	linux-mm, linux-trace-kernel, linux-kernel, Bunyod Suvonov

Vmscan has six main reclaim entry points: try_to_free_pages() for
direct reclaim, try_to_free_mem_cgroup_pages() for memcg reclaim,
mem_cgroup_shrink_node() for memcg soft limit reclaim, node_reclaim()
for node reclaim, shrink_all_memory() for hibernation reclaim, and
balance_pgdat() for kswapd reclaim.

All of them, except for shrink_all_memory() and balance_pgdat(), already
have begin/end tracepoints. This makes it harder to trace which reclaim
path is responsible for memory reclaim activity, because kswapd reclaim
cannot be identified as cleanly as other reclaim entry points, even
though it is the main background reclaim path under memory pressure.
There may be no need to trace shrink_all_memory() as it is primarily
used during hibernation. So this patch adds the missing tracepoint pair
for balance_pgdat().

The begin tracepoint records the node id, requested reclaim order, and
highest_zoneidx. The end tracepoint records the node id, reclaim order
that balance_pgdat() finished with, highest_zoneidx, and nr_reclaimed.
Together, they show the requested reclaim order and zone bound, whether
reclaim fell back to a lower order, and how much reclaim work was done.
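
A minimal sketch of how the two events could be switched on from user space
once the patch is applied. It assumes tracefs is mounted at the usual
/sys/kernel/tracing location and that the events appear under the existing
"vmscan" event group; the function name and the TRACEFS override variable
are hypothetical and only used for illustration.

```shell
#!/bin/bash
# Enable both balance_pgdat events under a given tracefs root.
# Returns 1 if an event's "enable" file is missing or not writable,
# e.g. on a kernel without this patch or when not running as root.
enable_balance_pgdat_events() {
	local root="$1" ev f
	for ev in mm_vmscan_balance_pgdat_begin mm_vmscan_balance_pgdat_end
	do
		f="$root/events/vmscan/$ev/enable"
		[ -w "$f" ] || return 1
		echo 1 > "$f" || return 1
	done
}

# TRACEFS is a hypothetical override; the default is the standard mount point.
if enable_balance_pgdat_events "${TRACEFS:-/sys/kernel/tracing}"
then
	echo "balance_pgdat events enabled"
else
	echo "balance_pgdat events not available"
fi
```

With both events enabled, each kswapd reclaim pass should show up in the
trace as a begin/end pair for the same nid.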

Signed-off-by: Bunyod Suvonov <b.suvonov@sjtu.edu.cn>
---
 include/trace/events/vmscan.h | 52 +++++++++++++++++++++++++++++++++++
 mm/vmscan.c                   |  5 ++++
 2 files changed, 57 insertions(+)

diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
index 4445a8d9218d..b4bf7b8def1f 100644
--- a/include/trace/events/vmscan.h
+++ b/include/trace/events/vmscan.h
@@ -96,6 +96,58 @@ TRACE_EVENT(mm_vmscan_kswapd_wake,
 		__entry->order)
 );
 
+TRACE_EVENT(mm_vmscan_balance_pgdat_begin,
+
+	TP_PROTO(int nid, int order, int highest_zoneidx),
+
+	TP_ARGS(nid, order, highest_zoneidx),
+
+	TP_STRUCT__entry(
+		__field(int, nid)
+		__field(int, order)
+		__field(int, highest_zoneidx)
+	),
+
+	TP_fast_assign(
+		__entry->nid = nid;
+		__entry->order = order;
+		__entry->highest_zoneidx = highest_zoneidx;
+	),
+
+	TP_printk("nid=%d order=%d highest_zoneidx=%-8s",
+		__entry->nid,
+		__entry->order,
+		__print_symbolic(__entry->highest_zoneidx, ZONE_TYPE))
+);
+
+TRACE_EVENT(mm_vmscan_balance_pgdat_end,
+
+	TP_PROTO(int nid, int order, int highest_zoneidx,
+		 unsigned long nr_reclaimed),
+
+	TP_ARGS(nid, order, highest_zoneidx, nr_reclaimed),
+
+	TP_STRUCT__entry(
+		__field(int, nid)
+		__field(int, order)
+		__field(int, highest_zoneidx)
+		__field(unsigned long, nr_reclaimed)
+	),
+
+	TP_fast_assign(
+		__entry->nid = nid;
+		__entry->order = order;
+		__entry->highest_zoneidx = highest_zoneidx;
+		__entry->nr_reclaimed = nr_reclaimed;
+	),
+
+	TP_printk("nid=%d order=%d highest_zoneidx=%-8s nr_reclaimed=%lu",
+		__entry->nid,
+		__entry->order,
+		__print_symbolic(__entry->highest_zoneidx, ZONE_TYPE),
+		__entry->nr_reclaimed)
+);
+
 TRACE_EVENT(mm_vmscan_wakeup_kswapd,
 
 	TP_PROTO(int nid, int zid, int order, gfp_t gfp_flags),
diff --git a/mm/vmscan.c b/mm/vmscan.c
index bd1b1aa12581..b2d89ed69d22 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -7121,6 +7121,8 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int highest_zoneidx)
 		.may_unmap = 1,
 	};
 
+	trace_mm_vmscan_balance_pgdat_begin(pgdat->node_id, order,
+					    highest_zoneidx);
 	set_task_reclaim_state(current, &sc.reclaim_state);
 	psi_memstall_enter(&pflags);
 	__fs_reclaim_acquire(_THIS_IP_);
@@ -7314,6 +7316,9 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int highest_zoneidx)
 	psi_memstall_leave(&pflags);
 	set_task_reclaim_state(current, NULL);
 
+	trace_mm_vmscan_balance_pgdat_end(pgdat->node_id, sc.order,
+					  highest_zoneidx, sc.nr_reclaimed);
+
 	/*
 	 * Return the order kswapd stopped reclaiming at as
 	 * prepare_kswapd_sleep() takes it into account. If another caller
-- 
2.53.0


^ permalink raw reply related

* [PATCH 0/9] rtla/tests: Extend runtime test coverage
From: Tomas Glozar @ 2026-04-23 13:05 UTC (permalink / raw)
  To: Steven Rostedt, Tomas Glozar
  Cc: John Kacur, Luis Goncalves, Crystal Wood, Costa Shulyupin,
	Wander Lairson Costa, LKML, linux-trace-kernel

This patchset introduces some new tests to cover more options, especially
histogram and thread options. Most of the new tests use positive and negative
output matches, sometimes in combination with action scripts, to verify that
RTLA is applying the settings correctly.

Tests were reorganized a little, adding two new sections: thread tests and
histogram tests, next to basic tests.

Additionally, coverage of existing tests is extended by adding new matches and
by extending tests to cover both top and hist tools where possible. For the
latter, new helpers check_top_hist and check_top_q_hist are added to engine.sh.

As part of the new action scripts, detection of measurement threads is made more
robust by following child processes of either RTLA (user workload) or kthreadd
(kernel workload) rather than grepping through the comms of all processes, which
might have led to false positives.
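
A rough sketch of the parent-based lookup idea, assuming pgrep from procps
is available. The function name children_of is hypothetical, and this is not
the get_workload_pids() helper added by the series; it only illustrates
looking processes up through their parent instead of searching the comms of
all processes. For kernel workloads the parent would be kthreadd (PID 2 on
Linux).

```shell
#!/bin/bash
# Hypothetical helper, not the get_workload_pids() from this series:
# print the PIDs of the direct children of a given parent PID.
# An optional second argument is a pgrep name pattern that further
# restricts the result.
children_of() {
	local parent="$1"
	pgrep -P "$parent" ${2:+"$2"}
}

# Demo: a background job is a direct child of this shell, so it is
# found through its parent rather than by searching all comms.
self="$BASHPID"
sleep 30 &
child=$!
found="$(children_of "$self")"
if echo "$found" | grep -qx "$child"
then
	echo "child found via parent"
fi
kill "$child"
```

Restricting the search to one parent means an unrelated process with a
similar name elsewhere on the system cannot be picked up by accident.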

These changes significantly improve test coverage and make the test suite more
robust against false positives from unrelated processes.

Tomas Glozar (9):
  rtla/tests: Cover both top and hist tools where possible
  rtla/tests: Add get_workload_pids() helper
  rtla/tests: Check -c/--cpus thread affinity
  rtla/tests: Use negative match when testing --aa-only
  rtla/tests: Extend timerlat top --aa-only coverage
  rtla/tests: Cover all hist options in runtime tests
  rtla/tests: Add runtime test for -H/--house-keeping
  rtla/tests: Add runtime test for -k and -u options
  rtla/tests: Add runtime tests for -C/--cgroup

 tools/tracing/rtla/tests/engine.sh            |  15 +++
 tools/tracing/rtla/tests/osnoise.t            |  73 +++++++----
 .../rtla/tests/scripts/check-cgroup-match.sh  |  17 +++
 .../tracing/rtla/tests/scripts/check-cpus.sh  |   9 ++
 .../tests/scripts/check-housekeeping-cpus.sh  |   4 +
 .../rtla/tests/scripts/check-priority.sh      |   8 +-
 .../scripts/check-user-kernel-threads.sh      |  16 +++
 .../tests/scripts/lib/get_workload_pids.sh    |  11 ++
 tools/tracing/rtla/tests/timerlat.t           | 113 +++++++++++-------
 9 files changed, 194 insertions(+), 72 deletions(-)
 create mode 100755 tools/tracing/rtla/tests/scripts/check-cgroup-match.sh
 create mode 100755 tools/tracing/rtla/tests/scripts/check-cpus.sh
 create mode 100755 tools/tracing/rtla/tests/scripts/check-housekeeping-cpus.sh
 create mode 100755 tools/tracing/rtla/tests/scripts/check-user-kernel-threads.sh
 create mode 100644 tools/tracing/rtla/tests/scripts/lib/get_workload_pids.sh

-- 
2.53.0

^ permalink raw reply

* [PATCH 1/9] rtla/tests: Cover both top and hist tools where possible
From: Tomas Glozar @ 2026-04-23 13:05 UTC (permalink / raw)
  To: Steven Rostedt, Tomas Glozar
  Cc: John Kacur, Luis Goncalves, Crystal Wood, Costa Shulyupin,
	Wander Lairson Costa, LKML, linux-trace-kernel
In-Reply-To: <20260423130558.882022-1-tglozar@redhat.com>

RTLA runtime tests currently do not cover both tool variants for osnoise
and timerlat properly. Many tests applicable to both tools are only
run for one tool, chosen arbitrarily.

Introduce two new shell functions, check_top_hist() and
check_top_q_hist(). The functions use the same syntax as check() and run
check() on the arguments twice: once replacing the "TOOL" string in the
command with "top" (or "top -q"), once replacing it with "hist". The
"top -q" variant is used for tests that match messages printed after the
RTLA main loop is aborted and expect them to start on a new line, which
only happens for top tools in quiet mode; without -q, the top output is
printed on the same line and the matches would fail.

Tests that are applicable to both top and hist tools were modified to
run for both; additionally, tests that were already done for both
tools were migrated to the new shell functions, unless the test command
or matches differ between the tools. Additional tests were added to test
tool-specific help messages.
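
To illustrate the substitution described above, here is a small standalone
sketch. The two helpers are copied from this patch; check() is replaced by
a stub that only prints its arguments, which is an assumption made so the
example runs outside the RTLA test suite.

```shell
#!/bin/bash
# Stub standing in for check() from engine.sh (assumption: the real
# function runs the command and matches its output; this one only
# prints what it was asked to do).
check() {
	printf '%s | %s | %s\n' "$1" "$2" "${*:3}"
}

# The two helpers as added by the patch.
check_top_hist() {
	check "top $1" "$(echo "$2" | sed 's/TOOL/top/g')" "${@:3}"
	check "hist $1" "$(echo "$2" | sed 's/TOOL/hist/g')" "${@:3}"
}

check_top_q_hist() {
	check "top $1" "$(echo "$2" | sed 's/TOOL/top -q/g')" "${@:3}"
	check "hist $1" "$(echo "$2" | sed 's/TOOL/hist/g')" "${@:3}"
}

# One call expands into a "top -q" run and a "hist" run.
check_top_q_hist "verify the --stop/-s param" \
	"osnoise TOOL -s 30 -T 1" 2 "osnoise hit stop tracing"
```

Each call produces two check() invocations with identical expected exit
code and matches; only the test name prefix and the tool in the command
differ.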

Signed-off-by: Tomas Glozar <tglozar@redhat.com>
---
 tools/tracing/rtla/tests/engine.sh  | 15 ++++++
 tools/tracing/rtla/tests/osnoise.t  | 46 +++++++++--------
 tools/tracing/rtla/tests/timerlat.t | 76 ++++++++++++++---------------
 3 files changed, 73 insertions(+), 64 deletions(-)

diff --git a/tools/tracing/rtla/tests/engine.sh b/tools/tracing/rtla/tests/engine.sh
index ed261e07c6d9..27d92f19a322 100644
--- a/tools/tracing/rtla/tests/engine.sh
+++ b/tools/tracing/rtla/tests/engine.sh
@@ -112,6 +112,21 @@ check_with_osnoise_options() {
 	NO_RESET_OSNOISE=1 check "$arg1" "$arg2" "$arg3"
 }
 
+check_top_hist() {
+	# Test one command with both "top" and "hist" tools, replacing "TOOL" in
+	# command with either "top" or "hist" respectively, and prefixing the test
+	# names with "top " and "hist ".
+	check "top $1" "$(echo "$2" | sed 's/TOOL/top/g')" "${@:3}"
+	check "hist $1" "$(echo "$2" | sed 's/TOOL/hist/g')" "${@:3}"
+}
+
+check_top_q_hist() {
+	# Same as above, but pass "-q" to top so that strings printed in main
+	# loop are on their own line for top too, not only for hist.
+	check "top $1" "$(echo "$2" | sed 's/TOOL/top -q/g')" "${@:3}"
+	check "hist $1" "$(echo "$2" | sed 's/TOOL/hist/g')" "${@:3}"
+}
+
 set_timeout() {
 	TIMEOUT="timeout -v -k 15s $1"
 }
diff --git a/tools/tracing/rtla/tests/osnoise.t b/tools/tracing/rtla/tests/osnoise.t
index 396334608920..ce3a448b1f87 100644
--- a/tools/tracing/rtla/tests/osnoise.t
+++ b/tools/tracing/rtla/tests/osnoise.t
@@ -7,13 +7,15 @@ set_timeout 2m
 
 check "verify help page" \
 	"osnoise --help" 0 "osnoise version"
-check "verify the --priority/-P param" \
-	"osnoise top -P F:1 -c 0 -r 900000 -d 10s -q -S 1 --on-threshold shell,command=\"tests/scripts/check-priority.sh osnoise/ SCHED_FIFO 1\"" \
+check_top_hist "verify help page" \
+	"osnoise TOOL --help" 0 "rtla osnoise"
+check_top_q_hist "verify the --priority/-P param" \
+	"osnoise TOOL -P F:1 -c 0 -r 900000 -d 10s -S 1 --on-threshold shell,command=\"tests/scripts/check-priority.sh osnoise/ SCHED_FIFO 1\"" \
 	2 "Priorities are set correctly"
-check "verify the --stop/-s param" \
-	"osnoise top -s 30 -T 1" 2 "osnoise hit stop tracing"
-check "verify the  --trace param" \
-	"osnoise hist -s 30 -T 1 -t" 2 "Saving trace to osnoise_trace.txt"
+check_top_q_hist "verify the --stop/-s param" \
+	"osnoise TOOL -s 30 -T 1" 2 "osnoise hit stop tracing"
+check_top_q_hist "verify the --trace param" \
+	"osnoise TOOL -s 30 -T 1 -t" 2 "Saving trace to osnoise_trace.txt"
 check "verify the --entries/-E param" \
 	"osnoise hist -P F:1 -c 0 -r 900000 -d 10s -b 10 -E 25"
 
@@ -24,27 +26,23 @@ check_with_osnoise_options "apply default period" \
 	"osnoise hist -s 1" 2 period_us=600000000
 
 # Actions tests
-check "trace output through -t with custom filename" \
-	"osnoise hist -S 2 -t custom_filename.txt" 2 "^  Saving trace to custom_filename.txt$"
-check "trace output through --on-threshold trace" \
-	"osnoise hist -S 2 --on-threshold trace" 2 "^  Saving trace to osnoise_trace.txt$"
-check "trace output through --on-threshold trace with custom filename" \
-	"osnoise hist -S 2 --on-threshold trace,file=custom_filename.txt" 2 "^  Saving trace to custom_filename.txt$"
-check "exec command" \
-	"osnoise hist -S 2 --on-threshold shell,command='echo TestOutput'" 2 "^TestOutput$"
-check "multiple actions" \
-	"osnoise hist -S 2 --on-threshold shell,command='echo -n 1' --on-threshold shell,command='echo 2'" 2 "^12$"
+check_top_q_hist "trace output through -t with custom filename" \
+	"osnoise TOOL -S 2 -t custom_filename.txt" 2 "^  Saving trace to custom_filename.txt$"
+check_top_q_hist "trace output through --on-threshold trace" \
+	"osnoise TOOL -S 2 --on-threshold trace" 2 "^  Saving trace to osnoise_trace.txt$"
+check_top_q_hist "trace output through --on-threshold trace with custom filename" \
+	"osnoise TOOL -S 2 --on-threshold trace,file=custom_filename.txt" 2 "^  Saving trace to custom_filename.txt$"
+check_top_q_hist "exec command" \
+	"osnoise TOOL -S 2 --on-threshold shell,command='echo TestOutput'" 2 "^TestOutput$"
+check_top_q_hist "multiple actions" \
+	"osnoise TOOL -S 2 --on-threshold shell,command='echo -n 1' --on-threshold shell,command='echo 2'" 2 "^12$"
 check "hist stop at failed action" \
 	"osnoise hist -S 2 --on-threshold shell,command='echo -n 1; false' --on-threshold shell,command='echo -n 2'" 2 "^1# RTLA osnoise histogram$"
 check "top stop at failed action" \
 	"osnoise top -S 2 --on-threshold shell,command='echo -n abc; false' --on-threshold shell,command='echo -n defgh'" 2 "^abc" "defgh"
-check "hist with continue" \
-	"osnoise hist -S 2 -d 5s --on-threshold shell,command='echo TestOutput' --on-threshold continue" 0 "^TestOutput$"
-check "top with continue" \
-	"osnoise top -q -S 2 -d 5s --on-threshold shell,command='echo TestOutput' --on-threshold continue" 0 "^TestOutput$"
-check "hist with trace output at end" \
-	"osnoise hist -d 1s --on-end trace" 0 "^  Saving trace to osnoise_trace.txt$"
-check "top with trace output at end" \
-	"osnoise top -d 1s --on-end trace" 0 "^  Saving trace to osnoise_trace.txt$"
+check_top_q_hist "with continue" \
+	"osnoise TOOL -S 2 -d 5s --on-threshold shell,command='echo TestOutput' --on-threshold continue" 0 "^TestOutput$"
+check_top_hist "with trace output at end" \
+	"osnoise TOOL -d 1s --on-end trace" 0 "^  Saving trace to osnoise_trace.txt$"
 
 test_end
diff --git a/tools/tracing/rtla/tests/timerlat.t b/tools/tracing/rtla/tests/timerlat.t
index fd4935fd7b49..d7944710a859 100644
--- a/tools/tracing/rtla/tests/timerlat.t
+++ b/tools/tracing/rtla/tests/timerlat.t
@@ -22,64 +22,60 @@ export RTLA_NO_BPF=$option
 # Basic tests
 check "verify help page" \
 	"timerlat --help" 0 "timerlat version"
-check "verify -s/--stack" \
-	"timerlat top -s 3 -T 10 -t" 2 "Blocking thread stack trace"
-check "verify -P/--priority" \
-	"timerlat top -P F:1 -c 0 -d 10s -q -T 1 --on-threshold shell,command=\"tests/scripts/check-priority.sh timerlatu/ SCHED_FIFO 1\"" \
+check_top_hist "verify help page" \
+	"timerlat TOOL --help" 0 "rtla timerlat"
+check_top_hist "verify -s/--stack" \
+	"timerlat TOOL -s 3 -T 10 -t" 2 "Blocking thread stack trace"
+check_top_hist "verify -P/--priority" \
+	"timerlat TOOL -P F:1 -c 0 -d 10s -T 1 --on-threshold shell,command=\"tests/scripts/check-priority.sh timerlatu/ SCHED_FIFO 1\"" \
 	2 "Priorities are set correctly"
-check "test in nanoseconds" \
-	"timerlat top -i 2 -c 0 -n -d 10s" 2 "ns"
-check "set the automatic trace mode" \
-	"timerlat top -a 5" 2 "analyzing it"
-check "dump tasks" \
-	"timerlat top -a 5 --dump-tasks" 2 "Printing CPU tasks"
+check_top_hist "test in nanoseconds" \
+	"timerlat TOOL -i 2 -c 0 -n -d 10s" 2 "ns"
+check_top_hist "set the automatic trace mode" \
+	"timerlat TOOL -a 5" 2 "analyzing it"
+check_top_hist "dump tasks" \
+	"timerlat TOOL -a 5 --dump-tasks" 2 "Printing CPU tasks"
 check "print the auto-analysis if hits the stop tracing condition" \
 	"timerlat top --aa-only 5" 2
-check "disable auto-analysis" \
-	"timerlat top -s 3 -T 10 -t --no-aa" 2
-check "verify -c/--cpus" \
-	"timerlat hist -c 0 -d 10s"
-check "hist test in nanoseconds" \
-	"timerlat hist -i 2 -c 0 -n -d 10s" 2 "ns"
+check_top_hist "disable auto-analysis" \
+	"timerlat TOOL -s 3 -T 10 -t --no-aa" 2
+check_top_hist "verify -c/--cpus" \
+	"timerlat TOOL -c 0 -d 10s"
 
 # Actions tests
-check "trace output through -t" \
-	"timerlat hist -T 2 -t" 2 "^  Saving trace to timerlat_trace.txt$"
-check "trace output through -t with custom filename" \
-	"timerlat hist -T 2 -t custom_filename.txt" 2 "^  Saving trace to custom_filename.txt$"
-check "trace output through --on-threshold trace" \
-	"timerlat hist -T 2 --on-threshold trace" 2 "^  Saving trace to timerlat_trace.txt$"
-check "trace output through --on-threshold trace with custom filename" \
-	"timerlat hist -T 2 --on-threshold trace,file=custom_filename.txt" 2 "^  Saving trace to custom_filename.txt$"
-check "exec command" \
-	"timerlat hist -T 2 --on-threshold shell,command='echo TestOutput'" 2 "^TestOutput$"
-check "multiple actions" \
-	"timerlat hist -T 2 --on-threshold shell,command='echo -n 1' --on-threshold shell,command='echo 2'" 2 "^12$"
+check_top_q_hist "trace output through -t" \
+	"timerlat TOOL -T 2 -t" 2 "^  Saving trace to timerlat_trace.txt$"
+check_top_q_hist "trace output through -t with custom filename" \
+	"timerlat TOOL -T 2 -t custom_filename.txt" 2 "^  Saving trace to custom_filename.txt$"
+check_top_q_hist "trace output through --on-threshold trace" \
+	"timerlat TOOL -T 2 --on-threshold trace" 2 "^  Saving trace to timerlat_trace.txt$"
+check_top_q_hist "trace output through --on-threshold trace with custom filename" \
+	"timerlat TOOL -T 2 --on-threshold trace,file=custom_filename.txt" 2 "^  Saving trace to custom_filename.txt$"
+check_top_q_hist "exec command" \
+	"timerlat TOOL -T 2 --on-threshold shell,command='echo TestOutput'" 2 "^TestOutput$"
+check_top_q_hist "multiple actions" \
+	"timerlat TOOL -T 2 --on-threshold shell,command='echo -n 1' --on-threshold shell,command='echo 2'" 2 "^12$"
 check "hist stop at failed action" \
 	"timerlat hist -T 2 --on-threshold shell,command='echo -n 1; false' --on-threshold shell,command='echo -n 2'" 2 "^1# RTLA timerlat histogram$"
 check "top stop at failed action" \
 	"timerlat top -T 2 --on-threshold shell,command='echo -n abc; false' --on-threshold shell,command='echo -n defgh'" 2 "^abc" "defgh"
-check "hist with continue" \
-	"timerlat hist -T 2 -d 5s --on-threshold shell,command='echo TestOutput' --on-threshold continue" 0 "^TestOutput$"
-check "top with continue" \
-	"timerlat top -q -T 2 -d 5s --on-threshold shell,command='echo TestOutput' --on-threshold continue" 0 "^TestOutput$"
-check "hist with trace output at end" \
-	"timerlat hist -d 1s --on-end trace" 0 "^  Saving trace to timerlat_trace.txt$"
-check "top with trace output at end" \
-	"timerlat top -d 1s --on-end trace" 0 "^  Saving trace to timerlat_trace.txt$"
+check_top_q_hist "with continue" \
+	"timerlat TOOL -T 2 -d 5s --on-threshold shell,command='echo TestOutput' --on-threshold continue" 0 "^TestOutput$"
+check_top_hist "with trace output at end" \
+	"timerlat TOOL -d 1s --on-end trace" 0 "^  Saving trace to timerlat_trace.txt$"
 
 # BPF action program tests
 if [ "$option" -eq 0 ]
 then
 	# Test BPF action program properly in BPF mode
 	[ -z "$BPFTOOL" ] && BPFTOOL=bpftool
-	check "hist with BPF action program (BPF mode)" \
-		"timerlat hist -T 2 --bpf-action tests/bpf/bpf_action_map.o --on-threshold shell,command='$BPFTOOL map dump name rtla_test_map'" \
+	check_top_q_hist "with BPF action program (BPF mode)" \
+		"timerlat TOOL -T 2 --bpf-action tests/bpf/bpf_action_map.o --on-threshold shell,command='$BPFTOOL map dump name rtla_test_map'" \
 		2 '"value": 42'
 else
 	# Test BPF action program failure in non-BPF mode
-	check "hist with BPF action program (non-BPF mode)" \
-		"timerlat hist -T 2 --bpf-action tests/bpf/bpf_action_map.o" \
+	check_top_q_hist "with BPF action program (non-BPF mode)" \
+		"timerlat TOOL -T 2 --bpf-action tests/bpf/bpf_action_map.o" \
 		1 "BPF actions are not supported in tracefs-only mode"
 fi
 done
-- 
2.53.0


^ permalink raw reply related

* [PATCH 3/9] rtla/tests: Check -c/--cpus thread affinity
From: Tomas Glozar @ 2026-04-23 13:05 UTC (permalink / raw)
  To: Steven Rostedt, Tomas Glozar
  Cc: John Kacur, Luis Goncalves, Crystal Wood, Costa Shulyupin,
	Wander Lairson Costa, LKML, linux-trace-kernel
In-Reply-To: <20260423130558.882022-1-tglozar@redhat.com>

RTLA runtime tests verify the -c/--cpus options, but do not check
whether the correct affinity is actually applied.

Add a script named check-cpus.sh that retrieves the affinity of all
workload threads and use it to check the -c/--cpus option for both
osnoise and timerlat tools.

Also add the missing -c/--cpus test for osnoise.
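
As a sketch of what the match in the tests relies on, the snippet below
applies the same cut-based extraction as check-cpus.sh to a canned line in
the format printed by "taskset -c -p". The sample line (including its PID)
and the affinity_field name are made up for illustration; no real thread is
queried.

```shell
#!/bin/bash
# Reduce one line of "taskset -c -p PID" output to the CPU list, the same
# way check-cpus.sh does: take the text after the colon, and let the
# unquoted command substitution drop the leading space.
affinity_field() {
	echo -n $(echo "$1" | cut -d ':' -f 2)
}

# Sample line in taskset's format; the PID is made up.
sample="pid 1234's current affinity list: 0"

line="Affinity of threads: $(affinity_field "$sample")"
echo "$line"

# The test then matches the line against an anchored pattern.
echo "$line" | grep -q '^Affinity of threads: 0$' && echo "match"
```

Because the pattern is anchored at both ends, any affinity other than
exactly CPU 0 makes the match fail.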

Signed-off-by: Tomas Glozar <tglozar@redhat.com>
---
 tools/tracing/rtla/tests/osnoise.t             | 2 ++
 tools/tracing/rtla/tests/scripts/check-cpus.sh | 9 +++++++++
 tools/tracing/rtla/tests/timerlat.t            | 4 ++--
 3 files changed, 13 insertions(+), 2 deletions(-)
 create mode 100755 tools/tracing/rtla/tests/scripts/check-cpus.sh

diff --git a/tools/tracing/rtla/tests/osnoise.t b/tools/tracing/rtla/tests/osnoise.t
index ed6ff0cc3329..5edffb23981b 100644
--- a/tools/tracing/rtla/tests/osnoise.t
+++ b/tools/tracing/rtla/tests/osnoise.t
@@ -18,6 +18,8 @@ check_top_q_hist "verify the --trace param" \
 	"osnoise TOOL -s 30 -T 1 -t" 2 "Saving trace to osnoise_trace.txt"
 check "verify the --entries/-E param" \
 	"osnoise hist -P F:1 -c 0 -r 900000 -d 10s -b 10 -E 25"
+check_top_q_hist "verify the -c/--cpus param" \
+	"osnoise TOOL -P F:1 -c 0 -r 900000 -d 10s -S 1 --on-threshold shell,command=tests/scripts/check-cpus.sh" 2 "^Affinity of threads: 0$"
 
 # Test setting default period by putting an absurdly high period
 # and stopping on threshold.
diff --git a/tools/tracing/rtla/tests/scripts/check-cpus.sh b/tools/tracing/rtla/tests/scripts/check-cpus.sh
new file mode 100755
index 000000000000..0b016d4a7945
--- /dev/null
+++ b/tools/tracing/rtla/tests/scripts/check-cpus.sh
@@ -0,0 +1,9 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+. "$(dirname $0)/lib/get_workload_pids.sh"
+echo -n "Affinity of threads: "
+for pid in $(get_workload_pids)
+do
+    echo -n $(taskset -c -p $pid | cut -d ':' -f 2)
+done
+echo
diff --git a/tools/tracing/rtla/tests/timerlat.t b/tools/tracing/rtla/tests/timerlat.t
index 765dffd9d42a..fb60022aaa64 100644
--- a/tools/tracing/rtla/tests/timerlat.t
+++ b/tools/tracing/rtla/tests/timerlat.t
@@ -39,8 +39,8 @@ check "print the auto-analysis if hits the stop tracing condition" \
 	"timerlat top --aa-only 5" 2
 check_top_hist "disable auto-analysis" \
 	"timerlat TOOL -s 3 -T 10 -t --no-aa" 2
-check_top_hist "verify -c/--cpus" \
-	"timerlat TOOL -c 0 -d 10s"
+check_top_q_hist "verify -c/--cpus" \
+	"timerlat TOOL -c 0 -d 10s -T 1 --on-threshold shell,command=tests/scripts/check-cpus.sh" 2 "^Affinity of threads: 0$"
 
 # Actions tests
 check_top_q_hist "trace output through -t" \
-- 
2.53.0



* [PATCH 2/9] rtla/tests: Add get_workload_pids() helper
From: Tomas Glozar @ 2026-04-23 13:05 UTC (permalink / raw)
  To: Steven Rostedt, Tomas Glozar
  Cc: John Kacur, Luis Goncalves, Crystal Wood, Costa Shulyupin,
	Wander Lairson Costa, LKML, linux-trace-kernel
In-Reply-To: <20260423130558.882022-1-tglozar@redhat.com>

RTLA runtime tests that check workload processes (currently the test
case "verify -P/--priority" of timerlat.t and "verify the --priority/-P
param" of osnoise.t) use "pgrep timerlatu/" or "pgrep osnoise/"
respectively to identify the workload.

Make them more robust by adding a get_workload_pids() helper that
finds the main rtla process and returns the PIDs of all siblings other
than the test script itself, plus all child processes of kthreadd that
have the osnoise/timerlat kthread pattern comm.

This filters out any spurious processes not related to the running test
that happen to have "timerlatu/" or "osnoise/" in their command, for
example, a user grepping the same names at the time of the running of
the test.
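
The difference between the two selection strategies can be sketched on
a made-up process table, without touching real processes. Everything
below (the table, the PIDs, by_name and by_parent) is hypothetical and
only a simplified stand-in for what pgrep does; the helper added by
this patch uses pgrep -P instead.

```bash
# Hypothetical process table, one "PID PPID COMMAND" entry per line.
# 100 is the rtla process, 101 the shell running the test script,
# 900 an unrelated process that happens to mention the pattern.
table='100 1 rtla
101 100 sh
102 100 timerlatu/0
103 100 timerlatu/1
900 1 grep timerlatu/'

# Name-based selection: every entry whose line contains the pattern.
by_name() {
    awk -v pat="$1" 'index($0, pat) { print $1 }'
}

# Parent-based selection: children of PID $1, excluding PID $2.
by_parent() {
    awk -v parent="$1" -v skip="$2" '$2 == parent && $1 != skip { print $1 }'
}

echo "by name:  " $(echo "$table" | by_name timerlatu/)
echo "by parent:" $(echo "$table" | by_parent 100 101)
```

Only the name-based selection picks up PID 900, the unrelated process.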

Signed-off-by: Tomas Glozar <tglozar@redhat.com>
---
 tools/tracing/rtla/tests/osnoise.t                    |  2 +-
 tools/tracing/rtla/tests/scripts/check-priority.sh    |  8 ++++----
 .../rtla/tests/scripts/lib/get_workload_pids.sh       | 11 +++++++++++
 tools/tracing/rtla/tests/timerlat.t                   |  2 +-
 4 files changed, 17 insertions(+), 6 deletions(-)
 create mode 100644 tools/tracing/rtla/tests/scripts/lib/get_workload_pids.sh

diff --git a/tools/tracing/rtla/tests/osnoise.t b/tools/tracing/rtla/tests/osnoise.t
index ce3a448b1f87..ed6ff0cc3329 100644
--- a/tools/tracing/rtla/tests/osnoise.t
+++ b/tools/tracing/rtla/tests/osnoise.t
@@ -10,7 +10,7 @@ check "verify help page" \
 check_top_hist "verify help page" \
 	"osnoise TOOL --help" 0 "rtla osnoise"
 check_top_q_hist "verify the --priority/-P param" \
-	"osnoise TOOL -P F:1 -c 0 -r 900000 -d 10s -S 1 --on-threshold shell,command=\"tests/scripts/check-priority.sh osnoise/ SCHED_FIFO 1\"" \
+	"osnoise TOOL -P F:1 -c 0 -r 900000 -d 10s -S 1 --on-threshold shell,command=\"tests/scripts/check-priority.sh SCHED_FIFO 1\"" \
 	2 "Priorities are set correctly"
 check_top_q_hist "verify the --stop/-s param" \
 	"osnoise TOOL -s 30 -T 1" 2 "osnoise hit stop tracing"
diff --git a/tools/tracing/rtla/tests/scripts/check-priority.sh b/tools/tracing/rtla/tests/scripts/check-priority.sh
index 79b702a34a96..b51d5232a868 100755
--- a/tools/tracing/rtla/tests/scripts/check-priority.sh
+++ b/tools/tracing/rtla/tests/scripts/check-priority.sh
@@ -1,8 +1,8 @@
 #!/bin/bash
 # SPDX-License-Identifier: GPL-2.0
-pids="$(pgrep ^$1)" || exit 1
-for pid in $pids
+. "$(dirname $0)/lib/get_workload_pids.sh"
+for pid in $(get_workload_pids)
 do
-  chrt -p $pid | cut -d ':' -f 2 | head -n1 | grep "^ $2\$" >/dev/null
-  chrt -p $pid | cut -d ':' -f 2 | tail -n1 | grep "^ $3\$" >/dev/null
+  chrt -p $pid | cut -d ':' -f 2 | head -n1 | grep "^ $1\$" >/dev/null
+  chrt -p $pid | cut -d ':' -f 2 | tail -n1 | grep "^ $2\$" >/dev/null
 done && echo "Priorities are set correctly"
diff --git a/tools/tracing/rtla/tests/scripts/lib/get_workload_pids.sh b/tools/tracing/rtla/tests/scripts/lib/get_workload_pids.sh
new file mode 100644
index 000000000000..8aff98cd2c1f
--- /dev/null
+++ b/tools/tracing/rtla/tests/scripts/lib/get_workload_pids.sh
@@ -0,0 +1,11 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+get_workload_pids() {
+    local shell_pid=$$
+    local rtla_pid=$(ps -o ppid= $shell_pid)
+
+    # kernel threads
+    pgrep -P $(pgrep ^kthreadd$) -f '^(osnoise|timerlat)/[0-9]+$'
+    # user threads
+    pgrep -P $rtla_pid | grep -v "^$shell_pid$"
+}
diff --git a/tools/tracing/rtla/tests/timerlat.t b/tools/tracing/rtla/tests/timerlat.t
index d7944710a859..765dffd9d42a 100644
--- a/tools/tracing/rtla/tests/timerlat.t
+++ b/tools/tracing/rtla/tests/timerlat.t
@@ -27,7 +27,7 @@ check_top_hist "verify help page" \
 check_top_hist "verify -s/--stack" \
 	"timerlat TOOL -s 3 -T 10 -t" 2 "Blocking thread stack trace"
 check_top_hist "verify -P/--priority" \
-	"timerlat TOOL -P F:1 -c 0 -d 10s -T 1 --on-threshold shell,command=\"tests/scripts/check-priority.sh timerlatu/ SCHED_FIFO 1\"" \
+	"timerlat TOOL -P F:1 -c 0 -d 10s -T 1 --on-threshold shell,command=\"tests/scripts/check-priority.sh SCHED_FIFO 1\"" \
 	2 "Priorities are set correctly"
 check_top_hist "test in nanoseconds" \
 	"timerlat TOOL -i 2 -c 0 -n -d 10s" 2 "ns"
-- 
2.53.0



* [PATCH 4/9] rtla/tests: Use negative match when testing --aa-only
From: Tomas Glozar @ 2026-04-23 13:05 UTC (permalink / raw)
  To: Steven Rostedt, Tomas Glozar
  Cc: John Kacur, Luis Goncalves, Crystal Wood, Costa Shulyupin,
	Wander Lairson Costa, LKML, linux-trace-kernel
In-Reply-To: <20260423130558.882022-1-tglozar@redhat.com>

For testing the -a/--auto option in timerlat tool, the string "analyzing
it" is matched against to make sure auto-analysis was triggered.

Use the same string as a negative match for --aa-only option test.

Signed-off-by: Tomas Glozar <tglozar@redhat.com>
---
 tools/tracing/rtla/tests/timerlat.t | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/tracing/rtla/tests/timerlat.t b/tools/tracing/rtla/tests/timerlat.t
index fb60022aaa64..f47a82c115c7 100644
--- a/tools/tracing/rtla/tests/timerlat.t
+++ b/tools/tracing/rtla/tests/timerlat.t
@@ -38,7 +38,7 @@ check_top_hist "dump tasks" \
 check "print the auto-analysis if hits the stop tracing condition" \
 	"timerlat top --aa-only 5" 2
 check_top_hist "disable auto-analysis" \
-	"timerlat TOOL -s 3 -T 10 -t --no-aa" 2
+	"timerlat TOOL -s 3 -T 10 -t --no-aa" 2 "" "analyzing it"
 check_top_q_hist "verify -c/--cpus" \
 	"timerlat TOOL -c 0 -d 10s -T 1 --on-threshold shell,command=tests/scripts/check-cpus.sh" 2 "^Affinity of threads: 0$"
 
-- 
2.53.0



* [PATCH 5/9] rtla/tests: Extend timerlat top --aa-only coverage
From: Tomas Glozar @ 2026-04-23 13:05 UTC (permalink / raw)
  To: Steven Rostedt, Tomas Glozar
  Cc: John Kacur, Luis Goncalves, Crystal Wood, Costa Shulyupin,
	Wander Lairson Costa, LKML, linux-trace-kernel
In-Reply-To: <20260423130558.882022-1-tglozar@redhat.com>

rtla-timerlat-top's --aa-only option is currently only tested for return
value.

Extend the tests to also check that only auto-analysis is being done via
a negative match for the "Timer Latency" text in the top header, and
further split the test case into two:

- one test case for --aa-only stopping on threshold
- one test case for --aa-only exiting without threshold being hit

For both cases, the expected output ("analyzing it" or "Max latency was"
respectively) is checked against in addition to the negative match.
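
As a rough sketch of what a combined positive and negative match means,
assuming extended regular expressions matched per line: match_output
below is a hypothetical helper written for this note, not the check()
function of the test engine, and the sample text is made up.

```bash
# Succeed when the output in $1 has a line matching $2 (if non-empty)
# and has no line matching $3 (if non-empty).
match_output() {
    local output=$1 expected=$2 unexpected=$3

    if [ -n "$expected" ] &&
       ! printf '%s\n' "$output" | grep -Eq -- "$expected"; then
        return 1
    fi
    if [ -n "$unexpected" ] &&
       printf '%s\n' "$output" | grep -Eq -- "$unexpected"; then
        return 1
    fi
    return 0
}

# Made-up sample text, not real rtla output.
sample='  Max latency was (made-up sample line)'

if match_output "$sample" '^  Max latency was' 'Timer Latency'; then
    echo "match"
else
    echo "no match"
fi
```

An empty expected or unexpected pattern skips that half of the check,
which mirrors how the tests pass "" when only one match is wanted.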

Signed-off-by: Tomas Glozar <tglozar@redhat.com>
---
 tools/tracing/rtla/tests/timerlat.t | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/tools/tracing/rtla/tests/timerlat.t b/tools/tracing/rtla/tests/timerlat.t
index f47a82c115c7..28c01d8b299d 100644
--- a/tools/tracing/rtla/tests/timerlat.t
+++ b/tools/tracing/rtla/tests/timerlat.t
@@ -35,8 +35,10 @@ check_top_hist "set the automatic trace mode" \
 	"timerlat TOOL -a 5" 2 "analyzing it"
 check_top_hist "dump tasks" \
 	"timerlat TOOL -a 5 --dump-tasks" 2 "Printing CPU tasks"
-check "print the auto-analysis if hits the stop tracing condition" \
-	"timerlat top --aa-only 5" 2
+check "verify --aa-only stop on threshold" \
+	"timerlat top --aa-only 5" 2 "analyzing it" "Timer Latency"
+check "verify --aa-only max latency" \
+	"timerlat top --aa-only 2000000 -d 1s" 0 "^  Max latency was" "Timer Latency"
 check_top_hist "disable auto-analysis" \
 	"timerlat TOOL -s 3 -T 10 -t --no-aa" 2 "" "analyzing it"
 check_top_q_hist "verify -c/--cpus" \
-- 
2.53.0



* [PATCH 6/9] rtla/tests: Cover all hist options in runtime tests
From: Tomas Glozar @ 2026-04-23 13:05 UTC (permalink / raw)
  To: Steven Rostedt, Tomas Glozar
  Cc: John Kacur, Luis Goncalves, Crystal Wood, Costa Shulyupin,
	Wander Lairson Costa, LKML, linux-trace-kernel
In-Reply-To: <20260423130558.882022-1-tglozar@redhat.com>

Cover all options regarding histogram formatting for both
rtla-osnoise-hist and rtla-timerlat-hist tools. All options also have
output checking using positive or negative match, except for
-b/--bucket-size and -E/--entries, which cannot be tested in isolation
because their output depends on the actual data collected.

Old -E/--entries test for rtla-osnoise was replaced with a new one
equivalent to the timerlat one.

Signed-off-by: Tomas Glozar <tglozar@redhat.com>
---
 tools/tracing/rtla/tests/osnoise.t  | 18 ++++++++++++++++--
 tools/tracing/rtla/tests/timerlat.t | 20 ++++++++++++++++++++
 2 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/tools/tracing/rtla/tests/osnoise.t b/tools/tracing/rtla/tests/osnoise.t
index 5edffb23981b..773a46e2dc5f 100644
--- a/tools/tracing/rtla/tests/osnoise.t
+++ b/tools/tracing/rtla/tests/osnoise.t
@@ -16,11 +16,25 @@ check_top_q_hist "verify the --stop/-s param" \
 	"osnoise TOOL -s 30 -T 1" 2 "osnoise hit stop tracing"
 check_top_q_hist "verify the --trace param" \
 	"osnoise TOOL -s 30 -T 1 -t" 2 "Saving trace to osnoise_trace.txt"
-check "verify the --entries/-E param" \
-	"osnoise hist -P F:1 -c 0 -r 900000 -d 10s -b 10 -E 25"
 check_top_q_hist "verify the -c/--cpus param" \
 	"osnoise TOOL -P F:1 -c 0 -r 900000 -d 10s -S 1 --on-threshold shell,command=tests/scripts/check-cpus.sh" 2 "^Affinity of threads: 0$"
 
+# Histogram tests
+check "hist with -b/--bucket-size" \
+	"osnoise hist -b 1 -d 1s"
+check "hist with -E/--entries" \
+	"osnoise hist -E 10 -d 1s"
+check "hist with -E/--entries out of range" \
+	"osnoise hist -E 1 -d 1s" 1 "^Entries must be > 10 and < 9999999$"
+check "hist with --no-header" \
+	"osnoise hist --no-header -d 1s" 0 "" "RTLA osnoise histogram"
+check "hist with --with-zeros" \
+	"osnoise hist --with-zeros -b 100000 -E 21 -d 1s" 0 '^2000000\s+0\s+'
+check "hist with --no-index" \
+	"osnoise hist --no-index --with-zeros -d 1s" 0 "" "^count:"
+check "hist with --no-summary" \
+	"osnoise hist --no-summary -d 1s" 0 "" "^count:"
+
 # Test setting default period by putting an absurdly high period
 # and stopping on threshold.
 # If default period is not set, this will time out.
diff --git a/tools/tracing/rtla/tests/timerlat.t b/tools/tracing/rtla/tests/timerlat.t
index 28c01d8b299d..a14d9ec32ede 100644
--- a/tools/tracing/rtla/tests/timerlat.t
+++ b/tools/tracing/rtla/tests/timerlat.t
@@ -44,6 +44,26 @@ check_top_hist "disable auto-analysis" \
 check_top_q_hist "verify -c/--cpus" \
 	"timerlat TOOL -c 0 -d 10s -T 1 --on-threshold shell,command=tests/scripts/check-cpus.sh" 2 "^Affinity of threads: 0$"
 
+# Histogram tests
+check "hist with -b/--bucket-size" \
+	"timerlat hist -b 1 -d 1s"
+check "hist with -E/--entries" \
+	"timerlat hist -E 10 -d 1s"
+check "hist with -E/--entries out of range" \
+	"timerlat hist -E 1 -d 1s" 1 "^Entries must be > 10 and < 9999999$"
+check "hist with --no-header" \
+	"timerlat hist --no-header -d 1s" 0 "" "RTLA timerlat histogram"
+check "hist with --with-zeros" \
+	"timerlat hist --with-zeros -b 100000 -E 21 -d 1s" 0 '^2000000\s+0\s+'
+check "hist with --no-index" \
+	"timerlat hist --no-index --with-zeros -d 1s" 0 "" "^count:"
+check "hist with --no-summary" \
+	"timerlat hist --no-summary -d 1s" 0 "" "^ALL:"
+check "hist with --no-irq" \
+	"timerlat hist --no-irq -d 1s" 0 "" "IRQ-"
+check "hist with --no-thread" \
+	"timerlat hist --no-thread -d 1s" 0 "" "Thr-"
+
 # Actions tests
 check_top_q_hist "trace output through -t" \
 	"timerlat TOOL -T 2 -t" 2 "^  Saving trace to timerlat_trace.txt$"
-- 
2.53.0



* [PATCH 7/9] rtla/tests: Add runtime test for -H/--house-keeping
From: Tomas Glozar @ 2026-04-23 13:05 UTC (permalink / raw)
  To: Steven Rostedt, Tomas Glozar
  Cc: John Kacur, Luis Goncalves, Crystal Wood, Costa Shulyupin,
	Wander Lairson Costa, LKML, linux-trace-kernel
In-Reply-To: <20260423130558.882022-1-tglozar@redhat.com>

Add a runtime test for -H/--house-keeping option for both osnoise and
timerlat tools, with affinity checking similar to what is done for
-c/--cpus.

Signed-off-by: Tomas Glozar <tglozar@redhat.com>
---
 tools/tracing/rtla/tests/osnoise.t                          | 2 ++
 tools/tracing/rtla/tests/scripts/check-housekeeping-cpus.sh | 4 ++++
 tools/tracing/rtla/tests/timerlat.t                         | 2 ++
 3 files changed, 8 insertions(+)
 create mode 100755 tools/tracing/rtla/tests/scripts/check-housekeeping-cpus.sh

diff --git a/tools/tracing/rtla/tests/osnoise.t b/tools/tracing/rtla/tests/osnoise.t
index 773a46e2dc5f..cdea84914345 100644
--- a/tools/tracing/rtla/tests/osnoise.t
+++ b/tools/tracing/rtla/tests/osnoise.t
@@ -18,6 +18,8 @@ check_top_q_hist "verify the --trace param" \
 	"osnoise TOOL -s 30 -T 1 -t" 2 "Saving trace to osnoise_trace.txt"
 check_top_q_hist "verify the -c/--cpus param" \
 	"osnoise TOOL -P F:1 -c 0 -r 900000 -d 10s -S 1 --on-threshold shell,command=tests/scripts/check-cpus.sh" 2 "^Affinity of threads: 0$"
+check_top_q_hist "verify the -H/--house-keeping param" \
+	"osnoise TOOL -P F:1 -H 0 -r 900000 -d 10s -S 1 --on-threshold shell,command=tests/scripts/check-housekeeping-cpus.sh" 2 "^Affinity of threads: 0$"
 
 # Histogram tests
 check "hist with -b/--bucket-size" \
diff --git a/tools/tracing/rtla/tests/scripts/check-housekeeping-cpus.sh b/tools/tracing/rtla/tests/scripts/check-housekeeping-cpus.sh
new file mode 100755
index 000000000000..4742f34efb49
--- /dev/null
+++ b/tools/tracing/rtla/tests/scripts/check-housekeeping-cpus.sh
@@ -0,0 +1,4 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+pid=$(ps -o ppid= $$)
+echo "Affinity of threads:$(taskset -c -p $pid | cut -d ':' -f 2)"
diff --git a/tools/tracing/rtla/tests/timerlat.t b/tools/tracing/rtla/tests/timerlat.t
index a14d9ec32ede..20f68bcbcb27 100644
--- a/tools/tracing/rtla/tests/timerlat.t
+++ b/tools/tracing/rtla/tests/timerlat.t
@@ -43,6 +43,8 @@ check_top_hist "disable auto-analysis" \
 	"timerlat TOOL -s 3 -T 10 -t --no-aa" 2 "" "analyzing it"
 check_top_q_hist "verify -c/--cpus" \
 	"timerlat TOOL -c 0 -d 10s -T 1 --on-threshold shell,command=tests/scripts/check-cpus.sh" 2 "^Affinity of threads: 0$"
+check_top_q_hist "verify -H/--house-keeping" \
+	"timerlat TOOL -H 0 -d 10s -T 1 --on-threshold shell,command=tests/scripts/check-housekeeping-cpus.sh" 2 "^Affinity of threads: 0$"
 
 # Histogram tests
 check "hist with -b/--bucket-size" \
-- 
2.53.0



* [PATCH 8/9] rtla/tests: Add runtime test for -k and -u options
From: Tomas Glozar @ 2026-04-23 13:05 UTC (permalink / raw)
  To: Steven Rostedt, Tomas Glozar
  Cc: John Kacur, Luis Goncalves, Crystal Wood, Costa Shulyupin,
	Wander Lairson Costa, LKML, linux-trace-kernel
In-Reply-To: <20260423130558.882022-1-tglozar@redhat.com>

Add runtime test for rtla-timerlat's -k/--kernel-threads and
-u/--user-threads options using get_workload_pids.sh to check whether
the appropriate threads are being created.

The tests are implemented for both top and hist. Additionally, all tests
related to timerlat threads are moved to a separate section in the test
files. The latter is also done for rtla-osnoise tests.

Signed-off-by: Tomas Glozar <tglozar@redhat.com>
---
 tools/tracing/rtla/tests/osnoise.t               |  8 +++++---
 .../tests/scripts/check-user-kernel-threads.sh   | 16 ++++++++++++++++
 tools/tracing/rtla/tests/timerlat.t              | 12 +++++++++---
 3 files changed, 30 insertions(+), 6 deletions(-)
 create mode 100755 tools/tracing/rtla/tests/scripts/check-user-kernel-threads.sh

diff --git a/tools/tracing/rtla/tests/osnoise.t b/tools/tracing/rtla/tests/osnoise.t
index cdea84914345..d0b623233db5 100644
--- a/tools/tracing/rtla/tests/osnoise.t
+++ b/tools/tracing/rtla/tests/osnoise.t
@@ -9,13 +9,15 @@ check "verify help page" \
 	"osnoise --help" 0 "osnoise version"
 check_top_hist "verify help page" \
 	"osnoise TOOL --help" 0 "rtla osnoise"
-check_top_q_hist "verify the --priority/-P param" \
-	"osnoise TOOL -P F:1 -c 0 -r 900000 -d 10s -S 1 --on-threshold shell,command=\"tests/scripts/check-priority.sh SCHED_FIFO 1\"" \
-	2 "Priorities are set correctly"
 check_top_q_hist "verify the --stop/-s param" \
 	"osnoise TOOL -s 30 -T 1" 2 "osnoise hit stop tracing"
 check_top_q_hist "verify the --trace param" \
 	"osnoise TOOL -s 30 -T 1 -t" 2 "Saving trace to osnoise_trace.txt"
+
+# Thread tests
+check_top_q_hist "verify the --priority/-P param" \
+	"osnoise TOOL -P F:1 -c 0 -r 900000 -d 10s -S 1 --on-threshold shell,command=\"tests/scripts/check-priority.sh SCHED_FIFO 1\"" \
+	2 "Priorities are set correctly"
 check_top_q_hist "verify the -c/--cpus param" \
 	"osnoise TOOL -P F:1 -c 0 -r 900000 -d 10s -S 1 --on-threshold shell,command=tests/scripts/check-cpus.sh" 2 "^Affinity of threads: 0$"
 check_top_q_hist "verify the -H/--house-keeping param" \
diff --git a/tools/tracing/rtla/tests/scripts/check-user-kernel-threads.sh b/tools/tracing/rtla/tests/scripts/check-user-kernel-threads.sh
new file mode 100755
index 000000000000..bb7ac510a735
--- /dev/null
+++ b/tools/tracing/rtla/tests/scripts/check-user-kernel-threads.sh
@@ -0,0 +1,16 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+. "$(dirname $0)/lib/get_workload_pids.sh"
+kthreadd_pid=$(pgrep ^kthreadd$)
+cnt_kernel=0
+cnt_user=0
+for pid in $(get_workload_pids)
+do
+    if [ "$(echo $(ps -o ppid= $pid))" = "$kthreadd_pid" ]
+    then
+        ((++cnt_kernel))
+    else
+        ((++cnt_user))
+    fi
+done
+echo "$cnt_kernel kernel threads, $cnt_user user threads"
diff --git a/tools/tracing/rtla/tests/timerlat.t b/tools/tracing/rtla/tests/timerlat.t
index 20f68bcbcb27..3557adbdebae 100644
--- a/tools/tracing/rtla/tests/timerlat.t
+++ b/tools/tracing/rtla/tests/timerlat.t
@@ -26,9 +26,6 @@ check_top_hist "verify help page" \
 	"timerlat TOOL --help" 0 "rtla timerlat"
 check_top_hist "verify -s/--stack" \
 	"timerlat TOOL -s 3 -T 10 -t" 2 "Blocking thread stack trace"
-check_top_hist "verify -P/--priority" \
-	"timerlat TOOL -P F:1 -c 0 -d 10s -T 1 --on-threshold shell,command=\"tests/scripts/check-priority.sh SCHED_FIFO 1\"" \
-	2 "Priorities are set correctly"
 check_top_hist "test in nanoseconds" \
 	"timerlat TOOL -i 2 -c 0 -n -d 10s" 2 "ns"
 check_top_hist "set the automatic trace mode" \
@@ -41,10 +38,19 @@ check "verify --aa-only max latency" \
 	"timerlat top --aa-only 2000000 -d 1s" 0 "^  Max latency was" "Timer Latency"
 check_top_hist "disable auto-analysis" \
 	"timerlat TOOL -s 3 -T 10 -t --no-aa" 2 "" "analyzing it"
+
+# Thread tests
+check_top_hist "verify -P/--priority" \
+	"timerlat TOOL -P F:1 -c 0 -d 10s -T 1 --on-threshold shell,command=\"tests/scripts/check-priority.sh SCHED_FIFO 1\"" \
+	2 "Priorities are set correctly"
 check_top_q_hist "verify -c/--cpus" \
 	"timerlat TOOL -c 0 -d 10s -T 1 --on-threshold shell,command=tests/scripts/check-cpus.sh" 2 "^Affinity of threads: 0$"
 check_top_q_hist "verify -H/--house-keeping" \
 	"timerlat TOOL -H 0 -d 10s -T 1 --on-threshold shell,command=tests/scripts/check-housekeeping-cpus.sh" 2 "^Affinity of threads: 0$"
+check_top_q_hist "verify -k/--kernel-threads" \
+	"timerlat TOOL -k -c 0 -d 10s -T 1 --on-threshold shell,command=tests/scripts/check-user-kernel-threads.sh" 2 "1 kernel threads, 0 user threads"
+check_top_q_hist "verify -u/--user-threads" \
+	"timerlat TOOL -u -c 0 -d 10s -T 1 --on-threshold shell,command=tests/scripts/check-user-kernel-threads.sh" 2 "0 kernel threads, 1 user threads"
 
 # Histogram tests
 check "hist with -b/--bucket-size" \
-- 
2.53.0



* [PATCH 9/9] rtla/tests: Add runtime tests for -C/--cgroup
From: Tomas Glozar @ 2026-04-23 13:05 UTC (permalink / raw)
  To: Steven Rostedt, Tomas Glozar
  Cc: John Kacur, Luis Goncalves, Crystal Wood, Costa Shulyupin,
	Wander Lairson Costa, LKML, linux-trace-kernel
In-Reply-To: <20260423130558.882022-1-tglozar@redhat.com>

Add a new script check-cgroup-match.sh that retrieves the cgroup of the
main rtla process and compares it to the cgroup of the rtla workload
threads.

Add a new test based on this script, for both osnoise and timerlat
tools, testing the variant of -C without argument (which sets the cgroup
of the workload to the cgroup of the rtla main process).

Note that this has to be tested in kernel mode to be significant for
timerlat tool, as user workloads inherit the parent rtla process cgroup
even without the option.
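
The comparison itself boils down to reading two procfs files. A minimal
sketch, assuming /proc/<pid>/cgroup is readable for both processes; the
same_cgroup name and its optional third parameter (an alternative
procfs root, handy for trying it on plain files) are hypothetical and
not part of this patch:

```bash
# Compare the cgroup membership of PIDs $1 and $2.
# $3: optional procfs root, defaults to /proc.
# Returns 0 if identical, 1 if different, 2 if either file is unreadable.
same_cgroup() {
    local root=${3:-/proc}

    [ -r "$root/$1/cgroup" ] && [ -r "$root/$2/cgroup" ] || return 2
    [ "$(cat "$root/$1/cgroup")" = "$(cat "$root/$2/cgroup")" ]
}

if same_cgroup "$$" "$PPID"; then
    echo "shell and its parent share a cgroup"
else
    echo "cgroups differ or are unreadable"
fi
```

Treating an unreadable file as a separate result keeps a process that
has already exited from being reported as a match.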

Signed-off-by: Tomas Glozar <tglozar@redhat.com>
---
 tools/tracing/rtla/tests/osnoise.t              |  3 +++
 .../rtla/tests/scripts/check-cgroup-match.sh    | 17 +++++++++++++++++
 tools/tracing/rtla/tests/timerlat.t             |  3 +++
 3 files changed, 23 insertions(+)
 create mode 100755 tools/tracing/rtla/tests/scripts/check-cgroup-match.sh

diff --git a/tools/tracing/rtla/tests/osnoise.t b/tools/tracing/rtla/tests/osnoise.t
index d0b623233db5..06787471d0e8 100644
--- a/tools/tracing/rtla/tests/osnoise.t
+++ b/tools/tracing/rtla/tests/osnoise.t
@@ -18,6 +18,9 @@ check_top_q_hist "verify the --trace param" \
 check_top_q_hist "verify the --priority/-P param" \
 	"osnoise TOOL -P F:1 -c 0 -r 900000 -d 10s -S 1 --on-threshold shell,command=\"tests/scripts/check-priority.sh SCHED_FIFO 1\"" \
 	2 "Priorities are set correctly"
+check_top_q_hist "verify the -C/--cgroup param" \
+	"osnoise TOOL -C -c 0 -r 900000 -d 10s -S 1 --on-threshold shell,command=\"tests/scripts/check-cgroup-match.sh\"" \
+	2 "cgroup matches for all workload PIDs"
 check_top_q_hist "verify the -c/--cpus param" \
 	"osnoise TOOL -P F:1 -c 0 -r 900000 -d 10s -S 1 --on-threshold shell,command=tests/scripts/check-cpus.sh" 2 "^Affinity of threads: 0$"
 check_top_q_hist "verify the -H/--house-keeping param" \
diff --git a/tools/tracing/rtla/tests/scripts/check-cgroup-match.sh b/tools/tracing/rtla/tests/scripts/check-cgroup-match.sh
new file mode 100755
index 000000000000..fdc2c68c5957
--- /dev/null
+++ b/tools/tracing/rtla/tests/scripts/check-cgroup-match.sh
@@ -0,0 +1,17 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+. "$(dirname $0)/lib/get_workload_pids.sh"
+rtla_pid=$(echo $(ps -o ppid= $$))
+rtla_cgroup=$(</proc/$rtla_pid/cgroup)
+echo "RTLA cgroup: $rtla_cgroup"
+for pid in $(get_workload_pids)
+do
+    pid_cgroup=$(</proc/$pid/cgroup)
+    echo "PID $pid cgroup: $pid_cgroup"
+    if ! [ "$pid_cgroup" = "$rtla_cgroup" ]
+    then
+        echo "Mismatch!"
+        exit 1
+    fi
+done
+echo "cgroup matches for all workload PIDs"
diff --git a/tools/tracing/rtla/tests/timerlat.t b/tools/tracing/rtla/tests/timerlat.t
index 3557adbdebae..3ebfe316b39e 100644
--- a/tools/tracing/rtla/tests/timerlat.t
+++ b/tools/tracing/rtla/tests/timerlat.t
@@ -43,6 +43,9 @@ check_top_hist "disable auto-analysis" \
 check_top_hist "verify -P/--priority" \
 	"timerlat TOOL -P F:1 -c 0 -d 10s -T 1 --on-threshold shell,command=\"tests/scripts/check-priority.sh SCHED_FIFO 1\"" \
 	2 "Priorities are set correctly"
+check_top_hist "verify -C/--cgroup" \
+	"timerlat TOOL -k -C -c 0 -d 10s -T 1 --on-threshold shell,command=\"tests/scripts/check-cgroup-match.sh\"" \
+	2 "cgroup matches for all workload PIDs"
 check_top_q_hist "verify -c/--cpus" \
 	"timerlat TOOL -c 0 -d 10s -T 1 --on-threshold shell,command=tests/scripts/check-cpus.sh" 2 "^Affinity of threads: 0$"
 check_top_q_hist "verify -H/--house-keeping" \
-- 
2.53.0



* [PATCH] rtla: Document tests in README
From: Tomas Glozar @ 2026-04-23 13:07 UTC (permalink / raw)
  To: Steven Rostedt, Tomas Glozar
  Cc: John Kacur, Luis Goncalves, Crystal Wood, Costa Shulyupin,
	Wander Lairson Costa, LKML, linux-trace-kernel

RTLA tests are not documented anywhere. Mention both runtime and unit
tests in the README, with instructions on how to run them and a list of
dependencies and required system configuration.
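
As an aside, the tracer part of the required system configuration can
be checked before running "make check". The sketch below assumes that
tracefs lists the usable tracers in its available_tracers file;
check_tracers and its optional mount point argument are hypothetical
and not something this patch adds.

```bash
# Report osnoise/timerlat tracers missing from tracefs.
# $1: tracefs mount point, defaults to the path named in the README.
# Returns 0 if both tracers are listed, 1 otherwise.
check_tracers() {
    local tracefs=${1:-/sys/kernel/tracing} tracer missing=0

    for tracer in osnoise timerlat; do
        if ! grep -qw "$tracer" "$tracefs/available_tracers" 2>/dev/null; then
            echo "missing tracer: $tracer"
            missing=1
        fi
    done
    return $missing
}

if check_tracers; then
    echo "osnoise and timerlat tracers are available"
else
    echo "runtime tests are not expected to work on this system"
fi
```

Reading available_tracers may itself need root, in which case both
tracers are reported as missing.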

Signed-off-by: Tomas Glozar <tglozar@redhat.com>
---
 tools/tracing/rtla/README.txt | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/tools/tracing/rtla/README.txt b/tools/tracing/rtla/README.txt
index a9faee4dbb3a..8a782cd2c171 100644
--- a/tools/tracing/rtla/README.txt
+++ b/tools/tracing/rtla/README.txt
@@ -42,4 +42,34 @@ For development, we suggest the following steps for compiling rtla:
   $ make
   $ sudo make install
 
+Running tests
+
+RTLA has two test suites: a runtime test suite and a unit test suite.
+
+The runtime test suite is available as "make check" (root required) and has
+the following dependencies, in addition to RTLA build dependencies:
+
+- Perl
+- Test::Harness / TAP::Harness
+- bash
+- coreutils
+- ldd
+- util-linux
+- procps(-ng)
+- bpftool (if rtla is built against libbpf)
+
+as well as the following required system configuration:
+
+- CONFIG_OSNOISE_TRACER=y
+- CONFIG_TIMERLAT_TRACER=y
+- tracefs mounted and readable at /sys/kernel/tracing
+
+The unit test suite is available as "make unit-tests" and has the following
+dependencies:
+
+- libcheck
+
+Unlike the runtime test suite, root is not required to run unit tests, nor is
+a tracefs/osnoise/timerlat-capable kernel required.
+
 For further information, please refer to the rtla man page.
-- 
2.53.0



* Re: [PATCH v2 2/2] module/kallsyms: sort function symbols and use binary search
From: Petr Pavlu @ 2026-04-23 14:00 UTC (permalink / raw)
  To: Stanislaw Gruszka
  Cc: linux-modules, Sami Tolvanen, Luis Chamberlain, linux-kernel,
	linux-trace-kernel, live-patching, Daniel Gomez, Aaron Tomlin,
	Steven Rostedt, Masami Hiramatsu, Jordan Rome, Viktor Malik
In-Reply-To: <20260327110005.16499-2-stf_xl@wp.pl>

On 3/27/26 12:00 PM, Stanislaw Gruszka wrote:
> Module symbol lookup via find_kallsyms_symbol() performs a linear scan
> over the entire symtab when resolving an address. The number of symbols
> in module symtabs has grown over the years, largely due to additional
> metadata in non-standard sections, making this lookup very slow.
> 
> Improve this by separating function symbols during module load, placing
> them at the beginning of the symtab, sorting them by address, and using
> binary search when resolving addresses in module text.
> 
> This also should improve times for linear symbol name lookups, as valid
> function symbols are now located at the beginning of the symtab.
> 
> The cost of sorting is small relative to module load time. In repeated
> module load tests [1], depending on .config options, this change
> increases load time between 2% and 4%. With cold caches, the difference
> is not measurable, as memory access latency dominates.
> 
> The sorting theoretically could be done in compile time, but much more
> complicated as we would have to simulate kernel addresses resolution
> for symbols, and then correct relocation entries. That would be risky
> if get out of sync.
> 
> The improvement can be observed when listing ftrace filter functions.
> 
> Before:
> 
> root@nano:~# time cat /sys/kernel/tracing/available_filter_functions | wc -l
> 74908
> 
> real	0m1.315s
> user	0m0.000s
> sys	0m1.312s
> 
> After:
> 
> root@nano:~# time cat /sys/kernel/tracing/available_filter_functions | wc -l
> 74911
> 
> real	0m0.167s
> user	0m0.004s
> sys	0m0.175s
> 
> (there are three more symbols introduced by the patch)
> 
> For livepatch modules, the symtab layout is preserved and the existing
> linear search is used. For this case, it should be possible to keep
> the original ELF symtab instead of copying it 1:1, but that is outside
> the scope of this patch.
> 
> Link: https://gist.github.com/sgruszka/09f3fb1dad53a97b1aad96e1927ab117 [1]
> Signed-off-by: Stanislaw Gruszka <stf_xl@wp.pl>

Sorry for the delay reviewing this patch.

> ---
> v1 -> v2: 
>  - fix searching data symbols for CONFIG_KALLSYMS_ALL
>  - use kallsyms_symbol_value() in elf_sym_cmp()
> 
>  include/linux/module.h   |   1 +
>  kernel/module/internal.h |   1 +
>  kernel/module/kallsyms.c | 171 +++++++++++++++++++++++++++++----------
>  3 files changed, 130 insertions(+), 43 deletions(-)
> 
> diff --git a/include/linux/module.h b/include/linux/module.h
> index ac254525014c..67c053afa882 100644
> --- a/include/linux/module.h
> +++ b/include/linux/module.h
> @@ -379,6 +379,7 @@ struct module_memory {
>  struct mod_kallsyms {
>  	Elf_Sym *symtab;
>  	unsigned int num_symtab;
> +	unsigned int num_func_syms;
>  	char *strtab;
>  	char *typetab;
>  };
> diff --git a/kernel/module/internal.h b/kernel/module/internal.h
> index 618202578b42..6a4d498619b1 100644
> --- a/kernel/module/internal.h
> +++ b/kernel/module/internal.h
> @@ -73,6 +73,7 @@ struct load_info {
>  	bool sig_ok;
>  #ifdef CONFIG_KALLSYMS
>  	unsigned long mod_kallsyms_init_off;
> +	unsigned long num_func_syms;
>  #endif
>  #ifdef CONFIG_MODULE_DECOMPRESS
>  #ifdef CONFIG_MODULE_STATS
> diff --git a/kernel/module/kallsyms.c b/kernel/module/kallsyms.c
> index f23126d804b2..d69e99e67707 100644
> --- a/kernel/module/kallsyms.c
> +++ b/kernel/module/kallsyms.c
> @@ -10,6 +10,7 @@
>  #include <linux/kallsyms.h>
>  #include <linux/buildid.h>
>  #include <linux/bsearch.h>
> +#include <linux/sort.h>
>  #include "internal.h"
>  
>  /* Lookup exported symbol in given range of kernel_symbols */
> @@ -103,6 +104,95 @@ static bool is_core_symbol(const Elf_Sym *src, const Elf_Shdr *sechdrs,
>  	return true;
>  }
>  
> +static inline bool is_func_symbol(const Elf_Sym *sym)
> +{
> +	return sym->st_shndx != SHN_UNDEF && sym->st_size != 0 &&
> +	       ELF_ST_TYPE(sym->st_info) == STT_FUNC;
> +}
> +
> +static unsigned int bsearch_func_symbol(struct mod_kallsyms *kallsyms,
> +					unsigned long addr,
> +					unsigned long *bestval,
> +					unsigned long *nextval)
> +
> +{
> +	unsigned int mid, low = 1, high = kallsyms->num_func_syms + 1;
> +	unsigned int best = 0;
> +	unsigned long thisval;
> +
> +	while (low < high) {
> +		mid = low + (high - low) / 2;
> +		thisval = kallsyms_symbol_value(&kallsyms->symtab[mid]);
> +
> +		if (thisval <= addr) {
> +			*bestval = thisval;
> +			best = mid;
> +			low = mid + 1;

If thisval == addr, the search moves to the right and finds the last
symbol with the same address. I believe it should do the opposite and
return the first symbol to match the behavior of
search_kallsyms_symbol().
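
A minimal, untested sketch of the first-match variant, assuming a plain
ascending array of addresses rather than the module symtab.
find_first_le() and its parameters are hypothetical names used only for
illustration and are not part of the patch:

```c
#include <stddef.h>

/*
 * Hypothetical helper: vals[] is sorted in ascending order. Return the
 * index of the FIRST element holding the greatest value <= addr, or
 * (size_t)-1 if every element is greater than addr.
 */
static size_t find_first_le(const unsigned long *vals, size_t n,
			    unsigned long addr)
{
	size_t low = 0, high = n;
	unsigned long bestval;

	/* Step 1: find the first index whose value is > addr. */
	while (low < high) {
		size_t mid = low + (high - low) / 2;

		if (vals[mid] <= addr)
			low = mid + 1;
		else
			high = mid;
	}
	if (low == 0)
		return (size_t)-1;

	/* Step 2: walk back to the first index holding the same value. */
	bestval = vals[low - 1];
	high = low - 1;
	low = 0;
	while (low < high) {
		size_t mid = low + (high - low) / 2;

		if (vals[mid] < bestval)
			low = mid + 1;
		else
			high = mid;
	}
	return low;
}
```

The second loop is a plain lower-bound search, so the cost stays
logarithmic even when many symbols share one address.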

> +		} else {
> +			*nextval = thisval;
> +			high = mid;
> +		}
> +	}
> +
> +	return best;
> +}
> +
> +static const char *kallsyms_symbol_name(struct mod_kallsyms *kallsyms,
> +					unsigned int symnum)
> +{
> +	return kallsyms->strtab + kallsyms->symtab[symnum].st_name;
> +}
> +
> +static unsigned int search_kallsyms_symbol(struct mod_kallsyms *kallsyms,
> +					   unsigned long addr,
> +					   unsigned long *bestval,
> +					   unsigned long *nextval)
> +{
> +	unsigned int i, best = 0;
> +
> +	/*
> +	 * Scan for closest preceding symbol and next symbol. (ELF starts
> +	 * real symbols at 1). Skip the initial function symbols range
> +	 * if num_func_syms is non-zero, those are handled separately for
> +	 * the core TEXT segment lookup.
> +	 */
> +	for (i = 1 + kallsyms->num_func_syms; i < kallsyms->num_symtab; i++) {
> +		const Elf_Sym *sym = &kallsyms->symtab[i];
> +		unsigned long thisval = kallsyms_symbol_value(sym);
> +
> +		if (sym->st_shndx == SHN_UNDEF)
> +			continue;
> +
> +		/*
> +		 * We ignore unnamed symbols: they're uninformative
> +		 * and inserted at a whim.
> +		 */
> +		if (*kallsyms_symbol_name(kallsyms, i) == '\0' ||
> +		    is_mapping_symbol(kallsyms_symbol_name(kallsyms, i)))
> +			continue;
> +
> +		if (thisval <= addr && thisval > *bestval) {
> +			best = i;
> +			*bestval = thisval;
> +		}
> +		if (thisval > addr && thisval < *nextval)
> +			*nextval = thisval;
> +	}
> +
> +	return best;
> +}
> +
> +static int elf_sym_cmp(const void *a, const void *b)
> +{
> +	unsigned long val_a = kallsyms_symbol_value((const Elf_Sym *)a);
> +	unsigned long val_b = kallsyms_symbol_value((const Elf_Sym *)b);
> +
> +	if (val_a < val_b)
> +		return -1;
> +
> +	return val_a > val_b;

Does this comparison function and the sort() call result in stable
sorting? If val_a and val_b are the same, the sorting should preserve
the original order.
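
If it turns out not to be stable (lib/sort.c is a heapsort, as far as I
can tell), one option is an explicit tie-break on the original position.
The following is only a userspace sketch of that idea using qsort();
struct sym_key, sym_key_cmp() and sort_keys_stable() are hypothetical
names, not something from the patch:

```c
#include <stdlib.h>

/* Hypothetical sort key: symbol value plus its position before sorting. */
struct sym_key {
	unsigned long val;
	unsigned int orig_idx;
};

/* Order by value; equal values keep their original relative order. */
static int sym_key_cmp(const void *a, const void *b)
{
	const struct sym_key *ka = a, *kb = b;

	if (ka->val != kb->val)
		return ka->val < kb->val ? -1 : 1;
	if (ka->orig_idx != kb->orig_idx)
		return ka->orig_idx < kb->orig_idx ? -1 : 1;
	return 0;
}

/* Record each key's current position, then sort with the tie-break. */
static void sort_keys_stable(struct sym_key *keys, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		keys[i].orig_idx = (unsigned int)i;

	qsort(keys, n, sizeof(*keys), sym_key_cmp);
}
```

Because no two keys compare equal, the result does not depend on whether
the underlying sort algorithm is stable.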

> +}
> +
>  /*
>   * We only allocate and copy the strings needed by the parts of symtab
>   * we keep.  This is simple, but has the effect of making multiple
> @@ -115,9 +205,10 @@ void layout_symtab(struct module *mod, struct load_info *info)
>  	Elf_Shdr *symsect = info->sechdrs + info->index.sym;
>  	Elf_Shdr *strsect = info->sechdrs + info->index.str;
>  	const Elf_Sym *src;
> -	unsigned int i, nsrc, ndst, strtab_size = 0;
> +	unsigned int i, nsrc, ndst, nfunc, strtab_size = 0;
>  	struct module_memory *mod_mem_data = &mod->mem[MOD_DATA];
>  	struct module_memory *mod_mem_init_data = &mod->mem[MOD_INIT_DATA];
> +	bool is_lp_mod = is_livepatch_module(mod);
>  
>  	/* Put symbol section at end of init part of module. */
>  	symsect->sh_flags |= SHF_ALLOC;
> @@ -129,12 +220,14 @@ void layout_symtab(struct module *mod, struct load_info *info)
>  	nsrc = symsect->sh_size / sizeof(*src);
>  
>  	/* Compute total space required for the core symbols' strtab. */
> -	for (ndst = i = 0; i < nsrc; i++) {
> -		if (i == 0 || is_livepatch_module(mod) ||
> +	for (ndst = nfunc = i = 0; i < nsrc; i++) {
> +		if (i == 0 || is_lp_mod ||
>  		    is_core_symbol(src + i, info->sechdrs, info->hdr->e_shnum,
>  				   info->index.pcpu)) {
>  			strtab_size += strlen(&info->strtab[src[i].st_name]) + 1;
>  			ndst++;
> +			if (!is_lp_mod && is_func_symbol(src + i))
> +				nfunc++;
>  		}
>  	}
>  
> @@ -156,6 +249,7 @@ void layout_symtab(struct module *mod, struct load_info *info)
>  	mod_mem_init_data->size = ALIGN(mod_mem_init_data->size,
>  					__alignof__(struct mod_kallsyms));
>  	info->mod_kallsyms_init_off = mod_mem_init_data->size;
> +	info->num_func_syms = nfunc;
>  
>  	mod_mem_init_data->size += sizeof(struct mod_kallsyms);
>  	info->init_typeoffs = mod_mem_init_data->size;
> @@ -169,7 +263,7 @@ void layout_symtab(struct module *mod, struct load_info *info)
>   */
>  void add_kallsyms(struct module *mod, const struct load_info *info)
>  {
> -	unsigned int i, ndst;
> +	unsigned int i, di, nfunc, ndst;
>  	const Elf_Sym *src;
>  	Elf_Sym *dst;
>  	char *s;
> @@ -178,6 +272,7 @@ void add_kallsyms(struct module *mod, const struct load_info *info)
>  	void *data_base = mod->mem[MOD_DATA].base;
>  	void *init_data_base = mod->mem[MOD_INIT_DATA].base;
>  	struct mod_kallsyms *kallsyms;
> +	bool is_lp_mod = is_livepatch_module(mod);
>  
>  	kallsyms = init_data_base + info->mod_kallsyms_init_off;

This code is followed by the initialization of kallsyms:

	kallsyms->symtab = (void *)symsec->sh_addr;
	kallsyms->num_symtab = symsec->sh_size / sizeof(Elf_Sym);
	/* Make sure we get permanent strtab: don't use info->strtab. */
	kallsyms->strtab = (void *)info->sechdrs[info->index.str].sh_addr;
	kallsyms->typetab = init_data_base + info->init_typeoffs;

I suggest adding 'kallsyms->num_func_syms = 0;' after the initialization
of kallsyms->num_symtab.

>  
> @@ -194,19 +289,28 @@ void add_kallsyms(struct module *mod, const struct load_info *info)
>  	mod->core_kallsyms.symtab = dst = data_base + info->symoffs;
>  	mod->core_kallsyms.strtab = s = data_base + info->stroffs;
>  	mod->core_kallsyms.typetab = data_base + info->core_typeoffs;
> +
>  	strtab_size = info->core_typeoffs - info->stroffs;
>  	src = kallsyms->symtab;
> -	for (ndst = i = 0; i < kallsyms->num_symtab; i++) {
> +	ndst = info->num_func_syms + 1;
> +
> +	for (nfunc = i = 0; i < kallsyms->num_symtab; i++) {
>  		kallsyms->typetab[i] = elf_type(src + i, info);
> -		if (i == 0 || is_livepatch_module(mod) ||
> +		if (i == 0 || is_lp_mod ||
>  		    is_core_symbol(src + i, info->sechdrs, info->hdr->e_shnum,
>  				   info->index.pcpu)) {
>  			ssize_t ret;
>  
> -			mod->core_kallsyms.typetab[ndst] =
> -				kallsyms->typetab[i];
> -			dst[ndst] = src[i];
> -			dst[ndst++].st_name = s - mod->core_kallsyms.strtab;
> +			if (i == 0)
> +				di = 0;
> +			else if (!is_lp_mod && is_func_symbol(src + i))
> +				di = 1 + nfunc++;
> +			else
> +				di = ndst++;
> +
> +			mod->core_kallsyms.typetab[di] = kallsyms->typetab[i];
> +			dst[di] = src[i];
> +			dst[di].st_name = s - mod->core_kallsyms.strtab;
>  			ret = strscpy(s, &kallsyms->strtab[src[i].st_name],
>  				      strtab_size);
>  			if (ret < 0)
> @@ -216,9 +320,13 @@ void add_kallsyms(struct module *mod, const struct load_info *info)
>  		}
>  	}
>  
> +	WARN_ON_ONCE(nfunc != info->num_func_syms);
> +	sort(dst + 1, nfunc, sizeof(Elf_Sym), elf_sym_cmp, NULL);
> +

The code sorts mod->core_kallsyms.symtab but mod->core_kallsyms.typetab
is not reordered accordingly.
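
One way to keep the two tables in step is to sort address/type pairs
together and write them back. This is a userspace sketch only, assuming
plain arrays in place of symtab and typetab; struct sym_pair and
sort_with_types() are hypothetical names:

```c
#include <stdlib.h>

/* Hypothetical pairing of one symbol address with its type character. */
struct sym_pair {
	unsigned long val;
	char type;
};

static int sym_pair_cmp(const void *a, const void *b)
{
	const struct sym_pair *pa = a, *pb = b;

	if (pa->val < pb->val)
		return -1;
	return pa->val > pb->val;
}

/*
 * Sort vals[] in ascending order and move each types[] entry together
 * with its value. Returns 0 on success, or -1 if the scratch buffer
 * cannot be allocated (both arrays are then left untouched).
 */
static int sort_with_types(unsigned long *vals, char *types, size_t n)
{
	struct sym_pair *tmp;
	size_t i;

	if (n < 2)
		return 0;

	tmp = malloc(n * sizeof(*tmp));
	if (!tmp)
		return -1;

	for (i = 0; i < n; i++) {
		tmp[i].val = vals[i];
		tmp[i].type = types[i];
	}

	qsort(tmp, n, sizeof(*tmp), sym_pair_cmp);

	for (i = 0; i < n; i++) {
		vals[i] = tmp[i].val;
		types[i] = tmp[i].type;
	}

	free(tmp);
	return 0;
}
```

In the kernel it may be possible to avoid the scratch buffer by passing
a custom swap callback to the sort routine instead of NULL, so that each
swap of two Elf_Sym entries also swaps the matching typetab bytes. That
is an assumption and has not been checked against this patch.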

>  	/* Set up to point into init section. */
>  	rcu_assign_pointer(mod->kallsyms, kallsyms);
>  	mod->core_kallsyms.num_symtab = ndst;
> +	mod->core_kallsyms.num_func_syms = nfunc;
>  }
>  
>  #if IS_ENABLED(CONFIG_STACKTRACE_BUILD_ID)
> @@ -241,11 +349,6 @@ void init_build_id(struct module *mod, const struct load_info *info)
>  }
>  #endif
>  
> -static const char *kallsyms_symbol_name(struct mod_kallsyms *kallsyms, unsigned int symnum)
> -{
> -	return kallsyms->strtab + kallsyms->symtab[symnum].st_name;
> -}
> -
>  /*
>   * Given a module and address, find the corresponding symbol and return its name
>   * while providing its size and offset if needed.
> @@ -255,7 +358,10 @@ static const char *find_kallsyms_symbol(struct module *mod,
>  					unsigned long *size,
>  					unsigned long *offset)
>  {
> -	unsigned int i, best = 0;
> +	unsigned int (*search)(struct mod_kallsyms *kallsyms,
> +			       unsigned long addr, unsigned long *bestval,
> +			       unsigned long *nextval);
> +	unsigned int best;
>  	unsigned long nextval, bestval;
>  	struct mod_kallsyms *kallsyms = rcu_dereference(mod->kallsyms);
>  	struct module_memory *mod_mem = NULL;
> @@ -266,6 +372,11 @@ static const char *find_kallsyms_symbol(struct module *mod,
>  			continue;
>  #endif
>  		if (within_module_mem_type(addr, mod, type)) {
> +			if (type == MOD_TEXT && kallsyms->num_func_syms > 0)
> +				search = bsearch_func_symbol;

I'm not sure if it is ok to limit the search only to function symbols
when the address lies in MOD_TEXT. The text can theoretically contain
non-function symbols. Could this optimization be adjusted to sort all
MOD_TEXT symbols (excluding anonymous and mapping symbols) and move them
to the front of the symbol table?

> +			else
> +				search = search_kallsyms_symbol;
> +
>  			mod_mem = &mod->mem[type];
>  			break;
>  		}
> @@ -278,33 +389,7 @@ static const char *find_kallsyms_symbol(struct module *mod,
>  	nextval = (unsigned long)mod_mem->base + mod_mem->size;
>  	bestval = (unsigned long)mod_mem->base - 1;
>  
> -	/*
> -	 * Scan for closest preceding symbol, and next symbol. (ELF
> -	 * starts real symbols at 1).
> -	 */
> -	for (i = 1; i < kallsyms->num_symtab; i++) {
> -		const Elf_Sym *sym = &kallsyms->symtab[i];
> -		unsigned long thisval = kallsyms_symbol_value(sym);
> -
> -		if (sym->st_shndx == SHN_UNDEF)
> -			continue;
> -
> -		/*
> -		 * We ignore unnamed symbols: they're uninformative
> -		 * and inserted at a whim.
> -		 */
> -		if (*kallsyms_symbol_name(kallsyms, i) == '\0' ||
> -		    is_mapping_symbol(kallsyms_symbol_name(kallsyms, i)))
> -			continue;
> -
> -		if (thisval <= addr && thisval > bestval) {
> -			best = i;
> -			bestval = thisval;
> -		}
> -		if (thisval > addr && thisval < nextval)
> -			nextval = thisval;
> -	}
> -
> +	best = search(kallsyms, addr, &bestval, &nextval);
>  	if (!best)
>  		return NULL;
>  

-- 
Thanks,
Petr

^ permalink raw reply

* [PATCH 1/1] tools/rv: ensure monitor name and desc are NUL-terminated
From: unknownbbqrx @ 2026-04-23 14:19 UTC (permalink / raw)
  To: rostedt, gmonaco; +Cc: linux-trace-kernel, linux-kernel, unknownbbqrx


ikm_fill_monitor_definition() copies monitor name and description with
strncpy(), but does not guarantee NUL termination when source strings are
equal to or longer than the destination buffers.

Clamp copies to sizeof(dst) - 1 and explicitly append '\0' for both fields
to keep them safe for later string operations.
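
For illustration only, the same pattern as a standalone helper.
copy_field() is a hypothetical name and does not exist in
tools/verification/rv:

```c
#include <string.h>

/*
 * Hypothetical helper: copy src into dst, truncating if needed, and
 * always leave dst NUL-terminated. dst_size is the full size of dst.
 */
static void copy_field(char *dst, size_t dst_size, const char *src)
{
	if (dst_size == 0)
		return;

	/* Copy at most dst_size - 1 bytes so the last byte stays free. */
	strncpy(dst, src, dst_size - 1);
	dst[dst_size - 1] = '\0';
}
```

With a 4-byte destination, a source of "abcdefgh" ends up stored as
"abc" followed by the terminator.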

Signed-off-by: unknownbbqrx <dev@unknownbbqr.xyz>
---
 tools/verification/rv/src/in_kernel.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/tools/verification/rv/src/in_kernel.c b/tools/verification/rv/src/in_kernel.c
index 4bb746ea6..d32453824 100644
--- a/tools/verification/rv/src/in_kernel.c
+++ b/tools/verification/rv/src/in_kernel.c
@@ -215,10 +215,11 @@ static int ikm_fill_monitor_definition(char *name, struct monitor *ikm, char *co
 		return -1;
 	}
 
-	strncpy(ikm->name, nested_name, MAX_DA_NAME_LEN);
+	strncpy(ikm->name, nested_name, sizeof(ikm->name) - 1);
+	ikm->name[sizeof(ikm->name) - 1] = '\0';
 	ikm->enabled = enabled;
-	strncpy(ikm->desc, desc, MAX_DESCRIPTION);
-
+	strncpy(ikm->desc, desc, sizeof(ikm->desc) - 1);
+	ikm->desc[sizeof(ikm->desc) - 1] = '\0';
 	free(desc);
 
 	return 0;
-- 
2.53.0




^ permalink raw reply related

* [PATCH] tools/rv: harden monitor name lookup bounds checks
From: unknownbbqrx @ 2026-04-23 14:44 UTC (permalink / raw)
  To: rostedt, gmonaco; +Cc: linux-trace-kernel, linux-kernel, unknownbbqrx


Bound monitor-name derived copies in __ikm_find_monitor_name() and avoid
unbounded writes from sprintf()/memcpy().

Pass the output buffer size from the caller, validate extracted line length
from rv/available_monitors, and use snprintf() with truncation checks when
building container monitor names.
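
As a sketch of the truncation check in isolation, assuming a
hypothetical helper named build_container_prefix() that is not part of
the rv tool:

```c
#include <stdio.h>

/*
 * Hypothetical helper: write "<name>:" into buf. Returns 0 on success,
 * or -1 on an encoding error or if the result did not fit in buf_size
 * bytes including the terminating NUL.
 */
static int build_container_prefix(char *buf, size_t buf_size,
				  const char *name)
{
	int n = snprintf(buf, buf_size, "%s:", name);

	/* snprintf() returns the length it wanted to write. */
	if (n < 0 || (size_t)n >= buf_size)
		return -1;

	return 0;
}
```

A return value equal to or larger than the buffer size means the output
was cut short, so the caller can bail out instead of matching against a
truncated name.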

Signed-off-by: unknownbbqrx <dev@unknownbbqr.xyz>
---
 tools/verification/rv/src/in_kernel.c | 34 +++++++++++++++++++++------
 1 file changed, 27 insertions(+), 7 deletions(-)

diff --git a/tools/verification/rv/src/in_kernel.c b/tools/verification/rv/src/in_kernel.c
index d32453824..f17eac9b6 100644
--- a/tools/verification/rv/src/in_kernel.c
+++ b/tools/verification/rv/src/in_kernel.c
@@ -56,9 +56,12 @@ static int __ikm_read_enable(char *monitor_name)
  * The string out_name is populated with the full name, which can be
  * equal to monitor_name or container/monitor_name if nested
  */
-static int __ikm_find_monitor_name(char *monitor_name, char *out_name)
+static int __ikm_find_monitor_name(char *monitor_name, char *out_name,
+				  size_t out_name_size)
 {
-	char *available_monitors, container[MAX_DA_NAME_LEN+1], *cursor, *end;
+	char *available_monitors, container[MAX_DA_NAME_LEN + 2], *cursor, *end;
+	size_t len;
+	int n;
 	int retval = 1;
 
 	available_monitors = tracefs_instance_file_read(NULL, "rv/available_monitors", NULL);
@@ -72,17 +75,34 @@ static int __ikm_find_monitor_name(char *monitor_name, char *out_name)
 	}
 
 	for (; cursor > available_monitors; cursor--)
-		if (*(cursor-1) == '\n')
+		if (*(cursor - 1) == '\n')
 			break;
+
 	end = strstr(cursor, "\n");
-	memcpy(out_name, cursor, end-cursor);
-	out_name[end-cursor] = '\0';
+	if (!end) {
+		retval = -1;
+		goto out_free;
+	}
+
+	len = end - cursor;
+	if (len >= out_name_size) {
+		retval = -1;
+		goto out_free;
+	}
+
+	memcpy(out_name, cursor, len);
+	out_name[len] = '\0';
 
 	cursor = strstr(out_name, ":");
 	if (cursor)
 		*cursor = '/';
 	else {
-		sprintf(container, "%s:", monitor_name);
+		n = snprintf(container, sizeof(container), "%s:", monitor_name);
+		if (n < 0 || (size_t)n >= sizeof(container)) {
+			retval = -1;
+			goto out_free;
+		}
+
 		if (strstr(available_monitors, container))
 			config_is_container = 1;
 	}
@@ -782,7 +802,7 @@ int ikm_run_monitor(char *monitor_name, int argc, char **argv)
 	else
 		nested_name = monitor_name;
 
-	retval = __ikm_find_monitor_name(monitor_name, full_name);
+	retval = __ikm_find_monitor_name(monitor_name, full_name, sizeof(full_name));
 	if (!retval)
 		return 0;
 	if (retval < 0) {

base-commit: 2e68039281932e6dc37718a1ea7cbb8e2cda42e6
prerequisite-patch-id: b61dd51dee390277603975bf729a687113185c3a
-- 
2.53.0




^ permalink raw reply related

* [PATCH v3 0/3] Enable perf tracing for unprivileged users
From: Anubhav Shelat @ 2026-04-23 15:17 UTC (permalink / raw)
  To: peterz, mingo, mhiramat, rostedt, acme, namhyung
  Cc: mathieu.desnoyers, mark.rutland, alexander.shishkin, jolsa,
	irogers, adrian.hunter, james.clark, linux-kernel,
	linux-trace-kernel, linux-perf-users, Anubhav Shelat

Enable users to use perf-trace to trace their own processes, like strace
but without the overhead of ptrace(). Ensure that users cannot access
other users' or systemwide tracing data.

Changes in v3:
- Don't set PERF_SAMPLE_IP for unprivileged tracepoints. This allows us
  to exclude PERF_SAMPLE_IP from kaddr_leak without weakening KASLR.
- Mount tracefs as world-traversable so users can access eventfs
  directories.

v2: https://lore.kernel.org/lkml/20260410133529.21947-1-ashelat@redhat.com/

Anubhav Shelat (3):
  perf evsel: don't set PERF_SAMPLE_IP for unprivileged tracepoints
  perf: enable unprivileged syscall tracing with perf trace
  tracefs: make root directory world-traversable

 fs/tracefs/inode.c              |  2 +-
 kernel/events/core.c            | 24 +++++++++++++++++++++---
 kernel/trace/trace_event_perf.c | 12 +++++++++++-
 kernel/trace/trace_events.c     |  8 ++++++--
 tools/perf/util/evsel.c         |  4 +++-
 5 files changed, 42 insertions(+), 8 deletions(-)

-- 
2.53.0


^ permalink raw reply

* [PATCH v3 1/3] perf evsel: don't set PERF_SAMPLE_IP for unprivileged tracepoints
From: Anubhav Shelat @ 2026-04-23 15:17 UTC (permalink / raw)
  To: peterz, mingo, mhiramat, rostedt, acme, namhyung
  Cc: mathieu.desnoyers, mark.rutland, alexander.shishkin, jolsa,
	irogers, adrian.hunter, james.clark, linux-kernel,
	linux-trace-kernel, linux-perf-users, Anubhav Shelat
In-Reply-To: <20260423151746.16258-1-ashelat@redhat.com>

For tracepoint events the IP is a static kernel address. It does not vary
between samples and provides no useful information to unprivileged users.
Leaving PERF_SAMPLE_IP unset for unprivileged tracepoints avoids exposing
a kernel address that reveals the KASLR base offset, and slightly reduces
the sample record size.

Assisted-by: Claude:claude-sonnet-4.5
Signed-off-by: Anubhav Shelat <ashelat@redhat.com>
---
 tools/perf/util/evsel.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index f59228c1a39e..a1091d937ff9 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1503,7 +1503,9 @@ void evsel__config(struct evsel *evsel, struct record_opts *opts,
 	attr->write_backward = opts->overwrite ? 1 : 0;
 	attr->read_format   = PERF_FORMAT_LOST;
 
-	evsel__set_sample_bit(evsel, IP);
+	if (attr->type != PERF_TYPE_TRACEPOINT || perf_event_paranoid_check(1))
+		evsel__set_sample_bit(evsel, IP);
+
 	evsel__set_sample_bit(evsel, TID);
 
 	if (evsel->sample_read) {
-- 
2.53.0


^ permalink raw reply related

* [PATCH v3 2/3] perf: enable unprivileged syscall tracing with perf trace
From: Anubhav Shelat @ 2026-04-23 15:17 UTC (permalink / raw)
  To: peterz, mingo, mhiramat, rostedt, acme, namhyung
  Cc: mathieu.desnoyers, mark.rutland, alexander.shishkin, jolsa,
	irogers, adrian.hunter, james.clark, linux-kernel,
	linux-trace-kernel, linux-perf-users, Anubhav Shelat
In-Reply-To: <20260423151746.16258-1-ashelat@redhat.com>

Allow unprivileged users to trace their own processes' syscalls using
perf trace, similar to strace without the intrusive overhead of ptrace().

Currently, perf trace requires CAP_PERFMON or paranoid level ≤ 1 even
though the kernel has existing infrastructure (TRACE_EVENT_FL_CAP_ANY)
specifically designed to mark syscall tracepoints as safe for
unprivileged access. To fix this:

1. Loosen the condition in perf_event_open() which requires privileges
for all events with exclude_kernel=0. This allows perf_event_open() to
bypass the paranoid check for task-attached tracepoint events. Ensure
that sample types which can expose kernel addresses to unprivileged
users are blocked.

2. Make the format and id tracefs files world-readable only for tracepoints
with TRACE_EVENT_FL_CAP_ANY, allowing unprivileged users to see syscall
tracepoint ids without exposing sensitive information.

Also add a check to perf_trace_event_perm() to ensure only
TRACE_EVENT_FL_CAP_ANY events can be traced.

Example usage after this change:
  $ perf trace ls          # works as unprivileged user
  $ perf trace             # system-wide, still requires privileges
  $ perf trace -p 1234     # requires ptrace permission on pid 1234

Assisted-by: Claude:claude-sonnet-4.5
Signed-off-by: Anubhav Shelat <ashelat@redhat.com>
---
 kernel/events/core.c            | 24 +++++++++++++++++++++---
 kernel/trace/trace_event_perf.c | 12 +++++++++++-
 kernel/trace/trace_events.c     |  8 ++++++--
 3 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 6d1f8bad7e1c..e9c53758574d 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -13833,9 +13833,27 @@ SYSCALL_DEFINE5(perf_event_open,
 		return err;
 
 	if (!attr.exclude_kernel) {
-		err = perf_allow_kernel();
-		if (err)
-			return err;
+		bool tp_bypass = false;
+
+		if (attr.type == PERF_TYPE_TRACEPOINT && pid != -1) {
+			/*
+			 * Block sample types that expose kernel addresses to
+			 * prevent KASLR bypass
+			 */
+			u64 kaddr_leak = PERF_SAMPLE_CALLCHAIN |
+					 PERF_SAMPLE_BRANCH_STACK |
+					 PERF_SAMPLE_ADDR |
+					 PERF_SAMPLE_REGS_INTR |
+					 PERF_SAMPLE_IP;
+
+			tp_bypass = !(attr.sample_type & kaddr_leak);
+		}
+
+		if (!tp_bypass) {
+			err = perf_allow_kernel();
+			if (err)
+				return err;
+		}
 	}
 
 	if (attr.namespaces) {
diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index a6bb7577e8c5..e8347df7ede5 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -73,8 +73,18 @@ static int perf_trace_event_perm(struct trace_event_call *tp_event,
 	}
 
 	/* No tracing, just counting, so no obvious leak */
-	if (!(p_event->attr.sample_type & PERF_SAMPLE_RAW))
+	if (!(p_event->attr.sample_type & PERF_SAMPLE_RAW)) {
+		/*
+		 * Only allow CAP_ANY tracepoints for unprivileged
+		 * task-attached events in case kernel context is exposed.
+		 */
+		if (!p_event->attr.exclude_kernel && !perfmon_capable()) {
+			if (!(p_event->attach_state == PERF_ATTACH_TASK &&
+			      (tp_event->flags & TRACE_EVENT_FL_CAP_ANY)))
+				return -EACCES;
+		}
 		return 0;
+	}
 
 	/* Some events are ok to be traced by non-root users... */
 	if (p_event->attach_state == PERF_ATTACH_TASK) {
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index aa422dc80ae8..69be5561d0b8 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -3054,7 +3054,9 @@ static int event_callback(const char *name, umode_t *mode, void **data,
 	struct trace_event_call *call = file->event_call;
 
 	if (strcmp(name, "format") == 0) {
-		*mode = TRACE_MODE_READ;
+		*mode = (call->flags & TRACE_EVENT_FL_CAP_ANY) ?
+			(TRACE_MODE_READ | 0004) :
+			TRACE_MODE_READ;
 		*fops = &ftrace_event_format_fops;
 		return 1;
 	}
@@ -3090,7 +3092,9 @@ static int event_callback(const char *name, umode_t *mode, void **data,
 #ifdef CONFIG_PERF_EVENTS
 	if (call->event.type && call->class->reg &&
 	    strcmp(name, "id") == 0) {
-		*mode = TRACE_MODE_READ;
+		*mode = (call->flags & TRACE_EVENT_FL_CAP_ANY) ?
+			(TRACE_MODE_READ | 0004) :
+			TRACE_MODE_READ;
 		*data = (void *)(long)call->event.type;
 		*fops = &ftrace_event_id_fops;
 		return 1;
-- 
2.53.0


^ permalink raw reply related

* [PATCH v3 3/3] tracefs: make root directory world-traversable
From: Anubhav Shelat @ 2026-04-23 15:17 UTC (permalink / raw)
  To: peterz, mingo, mhiramat, rostedt, acme, namhyung
  Cc: mathieu.desnoyers, mark.rutland, alexander.shishkin, jolsa,
	irogers, adrian.hunter, james.clark, linux-kernel,
	linux-trace-kernel, linux-perf-users, Anubhav Shelat
In-Reply-To: <20260423151746.16258-1-ashelat@redhat.com>

Change the default tracefs mount mode from 0700 to 0755. This allows
unprivileged users to access the eventfs directories underneath which
already use 0755.

This does not expose any tracing data since access to the files
themselves is controlled by individual permissions.

Signed-off-by: Anubhav Shelat <ashelat@redhat.com>
---
 fs/tracefs/inode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
index 03f768536fd5..9506450fbc91 100644
--- a/fs/tracefs/inode.c
+++ b/fs/tracefs/inode.c
@@ -23,7 +23,7 @@
 #include <linux/slab.h>
 #include "internal.h"
 
-#define TRACEFS_DEFAULT_MODE	0700
+#define TRACEFS_DEFAULT_MODE	0755
 static struct kmem_cache *tracefs_inode_cachep __ro_after_init;
 
 static struct vfsmount *tracefs_mount;
-- 
2.53.0


^ permalink raw reply related

* Re: [PATCH] mm/vmscan: add balance_pgdat begin/end tracepoints
From: Shakeel Butt @ 2026-04-23 17:46 UTC (permalink / raw)
  To: Bunyod Suvonov
  Cc: akpm, hannes, rostedt, mhiramat, david, mhocko, zhengqi.arch, ljs,
	mathieu.desnoyers, linux-mm, linux-trace-kernel, linux-kernel
In-Reply-To: <20260423103753.546582-1-b.suvonov@sjtu.edu.cn>

On Thu, Apr 23, 2026 at 06:37:53PM +0800, Bunyod Suvonov wrote:
> Vmscan has six main reclaim entry points: try_to_free_pages() for
> direct reclaim, try_to_free_mem_cgroup_pages() for memcg reclaim,
> mem_cgroup_shrink_node() for memcg soft limit reclaim, node_reclaim()
> for node reclaim, shrink_all_memory() for hibernation reclaim, and
> balance_pgdat() for kswapd reclaim.
> 
> All of them, except for shrink_all_memory() and balance_pgdat(), already
> have begin/end tracepoints. This makes it harder to trace which reclaim
> path is responsible for memory reclaim activity, because kswapd reclaim
> cannot be identified as cleanly as other reclaim entry points, even
> though it is the main background reclaim path under memory pressure.
> There may be no need to trace shrink_all_memory() as it is primarily
> used during hibernation. So this patch adds the missing tracepoint pair
> for balance_pgdat().
> 
> The begin tracepoint records the node id, requested reclaim order, and
> highest_zoneidx. The end tracepoint records the node id, reclaim order
> that balance_pgdat() finished with, highest_zoneidx, and nr_reclaimed.

Do we need to trace highest_zoneidx at the end? Can it change within
balance_pgdat()?

> Together, they show the requested reclaim order and zone bound, whether
> reclaim fell back to a lower order, and how much reclaim work was done.
> 
> Signed-off-by: Bunyod Suvonov <b.suvonov@sjtu.edu.cn>

Overall looks good. 


^ permalink raw reply

* Re: [PATCH 7.2 v16 00/13] khugepaged: mTHP support
From: Andrew Morton @ 2026-04-23 20:30 UTC (permalink / raw)
  To: Nico Pache
  Cc: linux-doc, linux-kernel, linux-mm, linux-trace-kernel, aarcange,
	anshuman.khandual, apopple, baohua, baolin.wang, byungchul,
	catalin.marinas, cl, corbet, dave.hansen, david, dev.jain, gourry,
	hannes, hughd, jack, jackmanb, jannh, jglisse, joshua.hahnjy, kas,
	lance.yang, Liam.Howlett, ljs, mathieu.desnoyers, matthew.brost,
	mhiramat, mhocko, peterx, pfalcato, rakie.kim, raquini, rdunlap,
	richard.weiyang, rientjes, rostedt, rppt, ryan.roberts, shivankg,
	sunnanyong, surenb, thomas.hellstrom, tiwai, usamaarif642, vbabka,
	vishal.moola, wangkefeng.wang, will, willy, yang, ying.huang, ziy,
	zokeefe
In-Reply-To: <20260419185750.260784-1-npache@redhat.com>

On Sun, 19 Apr 2026 12:57:37 -0600 Nico Pache <npache@redhat.com> wrote:

> The following series provides khugepaged with the capability to collapse
> anonymous memory regions to mTHPs.

Thanks, I added this to mm.git's mm-new branch for testing while review
is being completed.  I added notes regarding Usama's comments, so they
don't get lost.

^ permalink raw reply
