* [PATCH 5/9] rtla/tests: Extend timerlat top --aa-only coverage
From: Tomas Glozar @ 2026-04-23 13:05 UTC (permalink / raw)
To: Steven Rostedt, Tomas Glozar
Cc: John Kacur, Luis Goncalves, Crystal Wood, Costa Shulyupin,
Wander Lairson Costa, LKML, linux-trace-kernel
In-Reply-To: <20260423130558.882022-1-tglozar@redhat.com>
rtla-timerlat-top's --aa-only option is currently only tested for return
value.
Extend the tests to also check that only auto-analysis is being done via
a negative match for the "Timer Latency" text in the top header, and
further split the test case into two:
- one test case for --aa-only stopping on threshold
- one test case for --aa-only exiting without threshold being hit
For both cases, the expected output ("analyzing it" or "Max latency was"
respectively) is checked against in addition to the negative match.
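As an illustration only, the positive/negative matching described above
can be sketched as below. output_matches() is a hypothetical helper, not
the check() implementation from engine.sh, and the sample output is made
up; the sketch assumes an empty pattern means "do not check".

```shell
#!/bin/bash
# Hypothetical helper, not part of engine.sh: succeed only if the output
# contains the expected pattern and does not contain the unexpected one.
# An empty pattern disables the corresponding check.
output_matches() {
	local output=$1 expected=$2 unexpected=$3

	if [ -n "$expected" ] && ! grep -E -q -- "$expected" <<< "$output"; then
		return 1
	fi
	if [ -n "$unexpected" ] && grep -E -q -- "$unexpected" <<< "$output"; then
		return 1
	fi
	return 0
}

# Made-up sample standing in for "timerlat top --aa-only" output.
sample="rtla timerlat hit stop tracing
## CPU 0 hit stop tracing, analyzing it ##"

output_matches "$sample" "analyzing it" "Timer Latency" && echo "aa-only: ok"
```

A run whose output contained the top header would fail the second
condition even if the positive match succeeded.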
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
---
tools/tracing/rtla/tests/timerlat.t | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/tools/tracing/rtla/tests/timerlat.t b/tools/tracing/rtla/tests/timerlat.t
index f47a82c115c7..28c01d8b299d 100644
--- a/tools/tracing/rtla/tests/timerlat.t
+++ b/tools/tracing/rtla/tests/timerlat.t
@@ -35,8 +35,10 @@ check_top_hist "set the automatic trace mode" \
"timerlat TOOL -a 5" 2 "analyzing it"
check_top_hist "dump tasks" \
"timerlat TOOL -a 5 --dump-tasks" 2 "Printing CPU tasks"
-check "print the auto-analysis if hits the stop tracing condition" \
- "timerlat top --aa-only 5" 2
+check "verify --aa-only stop on threshold" \
+ "timerlat top --aa-only 5" 2 "analyzing it" "Timer Latency"
+check "verify --aa-only max latency" \
+ "timerlat top --aa-only 2000000 -d 1s" 0 "^ Max latency was" "Timer Latency"
check_top_hist "disable auto-analysis" \
"timerlat TOOL -s 3 -T 10 -t --no-aa" 2 "" "analyzing it"
check_top_q_hist "verify -c/--cpus" \
--
2.53.0
* [PATCH 4/9] rtla/tests: Use negative match when testing --aa-only
From: Tomas Glozar @ 2026-04-23 13:05 UTC (permalink / raw)
To: Steven Rostedt, Tomas Glozar
Cc: John Kacur, Luis Goncalves, Crystal Wood, Costa Shulyupin,
Wander Lairson Costa, LKML, linux-trace-kernel
In-Reply-To: <20260423130558.882022-1-tglozar@redhat.com>
For testing the -a/--auto option in timerlat tool, the string "analyzing
it" is matched against to make sure auto-analysis was triggered.
Use the same string as a negative match for the --no-aa option test.
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
---
tools/tracing/rtla/tests/timerlat.t | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/tracing/rtla/tests/timerlat.t b/tools/tracing/rtla/tests/timerlat.t
index fb60022aaa64..f47a82c115c7 100644
--- a/tools/tracing/rtla/tests/timerlat.t
+++ b/tools/tracing/rtla/tests/timerlat.t
@@ -38,7 +38,7 @@ check_top_hist "dump tasks" \
check "print the auto-analysis if hits the stop tracing condition" \
"timerlat top --aa-only 5" 2
check_top_hist "disable auto-analysis" \
- "timerlat TOOL -s 3 -T 10 -t --no-aa" 2
+ "timerlat TOOL -s 3 -T 10 -t --no-aa" 2 "" "analyzing it"
check_top_q_hist "verify -c/--cpus" \
"timerlat TOOL -c 0 -d 10s -T 1 --on-threshold shell,command=tests/scripts/check-cpus.sh" 2 "^Affinity of threads: 0$"
--
2.53.0
* [PATCH 2/9] rtla/tests: Add get_workload_pids() helper
From: Tomas Glozar @ 2026-04-23 13:05 UTC (permalink / raw)
To: Steven Rostedt, Tomas Glozar
Cc: John Kacur, Luis Goncalves, Crystal Wood, Costa Shulyupin,
Wander Lairson Costa, LKML, linux-trace-kernel
In-Reply-To: <20260423130558.882022-1-tglozar@redhat.com>
RTLA runtime tests that check workload processes (currently the test
case "verify -P/--priority" of timerlat.t and "verify the --priority/-P
param" of osnoise.t) use "pgrep timerlatu/" or "pgrep osnoise/"
respectively to identify the workload.
Make them more robust by adding a get_workload_pids() helper that
finds the main rtla process and returns the PIDs of all siblings other
than the test script itself, plus all child processes of kthreadd that
have the osnoise/timerlat kthread pattern comm.
This filters out any spurious processes not related to the running test
that happen to have "timerlatu/" or "osnoise/" in their command, for
example, a user grepping for the same names while the test is
running.
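The difference between the two selection strategies can be sketched on a
made-up process table. Everything below is hypothetical (the table, the
PIDs and both function names); it does not call ps or pgrep and only
illustrates why following parentage avoids name-based false positives.

```shell
#!/bin/bash
# Hypothetical process table with columns "PID PPID COMM". PID 500 is an
# unrelated process that merely has a matching name.
ps_table='2 0 kthreadd
300 1 rtla
301 300 timerlatu/0
302 300 check-priority
500 1 timerlatu/9'

# Name-based selection: every process whose comm matches the pattern.
pids_by_name() {
	awk -v pat="$1" '$3 ~ pat { print $1 }' <<< "$ps_table"
}

# Parent-based selection: children of a given parent, excluding one PID
# (standing in for the test script itself).
pids_by_parent() {
	awk -v parent="$1" -v self="$2" \
		'$2 == parent && $1 != self { print $1 }' <<< "$ps_table"
}

pids_by_name '^timerlatu/'    # selects 301 and the unrelated 500
pids_by_parent 300 302        # selects 301 only
```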
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
---
tools/tracing/rtla/tests/osnoise.t | 2 +-
tools/tracing/rtla/tests/scripts/check-priority.sh | 8 ++++----
.../rtla/tests/scripts/lib/get_workload_pids.sh | 11 +++++++++++
tools/tracing/rtla/tests/timerlat.t | 2 +-
4 files changed, 17 insertions(+), 6 deletions(-)
create mode 100644 tools/tracing/rtla/tests/scripts/lib/get_workload_pids.sh
diff --git a/tools/tracing/rtla/tests/osnoise.t b/tools/tracing/rtla/tests/osnoise.t
index ce3a448b1f87..ed6ff0cc3329 100644
--- a/tools/tracing/rtla/tests/osnoise.t
+++ b/tools/tracing/rtla/tests/osnoise.t
@@ -10,7 +10,7 @@ check "verify help page" \
check_top_hist "verify help page" \
"osnoise TOOL --help" 0 "rtla osnoise"
check_top_q_hist "verify the --priority/-P param" \
- "osnoise TOOL -P F:1 -c 0 -r 900000 -d 10s -S 1 --on-threshold shell,command=\"tests/scripts/check-priority.sh osnoise/ SCHED_FIFO 1\"" \
+ "osnoise TOOL -P F:1 -c 0 -r 900000 -d 10s -S 1 --on-threshold shell,command=\"tests/scripts/check-priority.sh SCHED_FIFO 1\"" \
2 "Priorities are set correctly"
check_top_q_hist "verify the --stop/-s param" \
"osnoise TOOL -s 30 -T 1" 2 "osnoise hit stop tracing"
diff --git a/tools/tracing/rtla/tests/scripts/check-priority.sh b/tools/tracing/rtla/tests/scripts/check-priority.sh
index 79b702a34a96..b51d5232a868 100755
--- a/tools/tracing/rtla/tests/scripts/check-priority.sh
+++ b/tools/tracing/rtla/tests/scripts/check-priority.sh
@@ -1,8 +1,8 @@
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0
-pids="$(pgrep ^$1)" || exit 1
-for pid in $pids
+. "$(dirname $0)/lib/get_workload_pids.sh"
+for pid in $(get_workload_pids)
do
- chrt -p $pid | cut -d ':' -f 2 | head -n1 | grep "^ $2\$" >/dev/null
- chrt -p $pid | cut -d ':' -f 2 | tail -n1 | grep "^ $3\$" >/dev/null
+ chrt -p $pid | cut -d ':' -f 2 | head -n1 | grep "^ $1\$" >/dev/null
+ chrt -p $pid | cut -d ':' -f 2 | tail -n1 | grep "^ $2\$" >/dev/null
done && echo "Priorities are set correctly"
diff --git a/tools/tracing/rtla/tests/scripts/lib/get_workload_pids.sh b/tools/tracing/rtla/tests/scripts/lib/get_workload_pids.sh
new file mode 100644
index 000000000000..8aff98cd2c1f
--- /dev/null
+++ b/tools/tracing/rtla/tests/scripts/lib/get_workload_pids.sh
@@ -0,0 +1,11 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+get_workload_pids() {
+ local shell_pid=$$
+ local rtla_pid=$(ps -o ppid= $shell_pid)
+
+ # kernel threads
+ pgrep -P $(pgrep ^kthreadd$) -f '^(osnoise|timerlat)/[0-9]+$'
+ # user threads
+ pgrep -P $rtla_pid | grep -v "^$shell_pid$"
+}
diff --git a/tools/tracing/rtla/tests/timerlat.t b/tools/tracing/rtla/tests/timerlat.t
index d7944710a859..765dffd9d42a 100644
--- a/tools/tracing/rtla/tests/timerlat.t
+++ b/tools/tracing/rtla/tests/timerlat.t
@@ -27,7 +27,7 @@ check_top_hist "verify help page" \
check_top_hist "verify -s/--stack" \
"timerlat TOOL -s 3 -T 10 -t" 2 "Blocking thread stack trace"
check_top_hist "verify -P/--priority" \
- "timerlat TOOL -P F:1 -c 0 -d 10s -T 1 --on-threshold shell,command=\"tests/scripts/check-priority.sh timerlatu/ SCHED_FIFO 1\"" \
+ "timerlat TOOL -P F:1 -c 0 -d 10s -T 1 --on-threshold shell,command=\"tests/scripts/check-priority.sh SCHED_FIFO 1\"" \
2 "Priorities are set correctly"
check_top_hist "test in nanoseconds" \
"timerlat TOOL -i 2 -c 0 -n -d 10s" 2 "ns"
--
2.53.0
* [PATCH 3/9] rtla/tests: Check -c/--cpus thread affinity
From: Tomas Glozar @ 2026-04-23 13:05 UTC (permalink / raw)
To: Steven Rostedt, Tomas Glozar
Cc: John Kacur, Luis Goncalves, Crystal Wood, Costa Shulyupin,
Wander Lairson Costa, LKML, linux-trace-kernel
In-Reply-To: <20260423130558.882022-1-tglozar@redhat.com>
RTLA runtime tests verify the -c/--cpus options, but do not check
whether the correct affinity is actually applied.
Add a script named check-cpus.sh that retrieves the affinity of all
workload threads and use it to check the -c/--cpus option for both
osnoise and timerlat tools.
Also add missing -c/--cpus test for osnoise.
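A rough sketch of how the affinity line is assembled, using a stand-in
for taskset so that it runs without a live workload. fake_taskset() and
report_affinity() are hypothetical names, and the sample line only
mimics the "current affinity list" output of util-linux taskset.

```shell
#!/bin/bash
# Hypothetical stand-in for "taskset -c -p PID"; the CPU list is fixed
# to 0 for illustration.
fake_taskset() {
	echo "pid $1's current affinity list: 0"
}

report_affinity() {
	echo -n "Affinity of threads: "
	for pid in "$@"
	do
		# The unquoted $(...) lets word splitting drop the leading
		# space left over after cutting at the colon.
		echo -n $(fake_taskset "$pid" | cut -d ':' -f 2)
	done
	echo
}

report_affinity 1234    # prints "Affinity of threads: 0"
```

In this sketch the per-thread values are concatenated without a
separator, so two PIDs would print "00" and no longer match the anchored
pattern "^Affinity of threads: 0$".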
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
---
tools/tracing/rtla/tests/osnoise.t | 2 ++
tools/tracing/rtla/tests/scripts/check-cpus.sh | 9 +++++++++
tools/tracing/rtla/tests/timerlat.t | 4 ++--
3 files changed, 13 insertions(+), 2 deletions(-)
create mode 100755 tools/tracing/rtla/tests/scripts/check-cpus.sh
diff --git a/tools/tracing/rtla/tests/osnoise.t b/tools/tracing/rtla/tests/osnoise.t
index ed6ff0cc3329..5edffb23981b 100644
--- a/tools/tracing/rtla/tests/osnoise.t
+++ b/tools/tracing/rtla/tests/osnoise.t
@@ -18,6 +18,8 @@ check_top_q_hist "verify the --trace param" \
"osnoise TOOL -s 30 -T 1 -t" 2 "Saving trace to osnoise_trace.txt"
check "verify the --entries/-E param" \
"osnoise hist -P F:1 -c 0 -r 900000 -d 10s -b 10 -E 25"
+check_top_q_hist "verify the -c/--cpus param" \
+ "osnoise TOOL -P F:1 -c 0 -r 900000 -d 10s -S 1 --on-threshold shell,command=tests/scripts/check-cpus.sh" 2 "^Affinity of threads: 0$"
# Test setting default period by putting an absurdly high period
# and stopping on threshold.
diff --git a/tools/tracing/rtla/tests/scripts/check-cpus.sh b/tools/tracing/rtla/tests/scripts/check-cpus.sh
new file mode 100755
index 000000000000..0b016d4a7945
--- /dev/null
+++ b/tools/tracing/rtla/tests/scripts/check-cpus.sh
@@ -0,0 +1,9 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+. "$(dirname $0)/lib/get_workload_pids.sh"
+echo -n "Affinity of threads: "
+for pid in $(get_workload_pids)
+do
+ echo -n $(taskset -c -p $pid | cut -d ':' -f 2)
+done
+echo
diff --git a/tools/tracing/rtla/tests/timerlat.t b/tools/tracing/rtla/tests/timerlat.t
index 765dffd9d42a..fb60022aaa64 100644
--- a/tools/tracing/rtla/tests/timerlat.t
+++ b/tools/tracing/rtla/tests/timerlat.t
@@ -39,8 +39,8 @@ check "print the auto-analysis if hits the stop tracing condition" \
"timerlat top --aa-only 5" 2
check_top_hist "disable auto-analysis" \
"timerlat TOOL -s 3 -T 10 -t --no-aa" 2
-check_top_hist "verify -c/--cpus" \
- "timerlat TOOL -c 0 -d 10s"
+check_top_q_hist "verify -c/--cpus" \
+ "timerlat TOOL -c 0 -d 10s -T 1 --on-threshold shell,command=tests/scripts/check-cpus.sh" 2 "^Affinity of threads: 0$"
# Actions tests
check_top_q_hist "trace output through -t" \
--
2.53.0
* [PATCH 1/9] rtla/tests: Cover both top and hist tools where possible
From: Tomas Glozar @ 2026-04-23 13:05 UTC (permalink / raw)
To: Steven Rostedt, Tomas Glozar
Cc: John Kacur, Luis Goncalves, Crystal Wood, Costa Shulyupin,
Wander Lairson Costa, LKML, linux-trace-kernel
In-Reply-To: <20260423130558.882022-1-tglozar@redhat.com>
RTLA runtime tests currently do not cover both tool variants for osnoise
and timerlat properly. Many tests applicable to both tools are only
tested for one tool, selected randomly.
Introduce two new shell functions, check_top_hist() and
check_top_q_hist(). The functions use the same syntax as check() and run
check() on the arguments twice: once replacing the "TOOL" string in the
command with "top" (or "top -q"), once replacing it with "hist". The top
-q variant is used for tests that rely on messages printed after aborting
the RTLA main loop starting on a new line, which only happens for top
tools in quiet mode; without -q, the top output is printed on the same
line and the matches would fail.
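To show the expansion without running rtla, the sketch below pairs the
check_top_q_hist() body from this patch with a stub check() that only
prints the test name and command. The stub is an assumption made for
illustration; the real check() in engine.sh runs the command and
evaluates exit code and matches.

```shell
#!/bin/bash
# Stub standing in for check() from engine.sh: print the test name and
# the command instead of running anything.
check() {
	printf '%s | %s\n' "$1" "$2"
}

# Same body as the helper added by this patch.
check_top_q_hist() {
	check "top $1" "$(echo "$2" | sed 's/TOOL/top -q/g')" "${@:3}"
	check "hist $1" "$(echo "$2" | sed 's/TOOL/hist/g')" "${@:3}"
}

check_top_q_hist "verify the --stop/-s param" \
	"osnoise TOOL -s 30 -T 1" 2 "osnoise hit stop tracing"
```

With the stub, this prints one line for "osnoise top -q -s 30 -T 1" and
one for "osnoise hist -s 30 -T 1", each prefixed with the tool name.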
Tests that are applicable to both top and hist tools were modified to
run for both; additionally, tests that were already done for both
tools were migrated to the new shell functions, unless the test command
or matches differ between the tools. Additional tests were added to test
tool-specific help messages.
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
---
tools/tracing/rtla/tests/engine.sh | 15 ++++++
tools/tracing/rtla/tests/osnoise.t | 46 +++++++++--------
tools/tracing/rtla/tests/timerlat.t | 76 ++++++++++++++---------------
3 files changed, 73 insertions(+), 64 deletions(-)
diff --git a/tools/tracing/rtla/tests/engine.sh b/tools/tracing/rtla/tests/engine.sh
index ed261e07c6d9..27d92f19a322 100644
--- a/tools/tracing/rtla/tests/engine.sh
+++ b/tools/tracing/rtla/tests/engine.sh
@@ -112,6 +112,21 @@ check_with_osnoise_options() {
NO_RESET_OSNOISE=1 check "$arg1" "$arg2" "$arg3"
}
+check_top_hist() {
+ # Test one command with both "top" and "hist" tools, replacing "TOOL" in
+ # command with either "top" or "hist" respectively, and prefixing the test
+ # names with "top " and "hist ".
+ check "top $1" "$(echo "$2" | sed 's/TOOL/top/g')" "${@:3}"
+ check "hist $1" "$(echo "$2" | sed 's/TOOL/hist/g')" "${@:3}"
+}
+
+check_top_q_hist() {
+ # Same as above, but pass "-q" to top so that strings printed in main
+ # loop are on their own line for top too, not only for hist.
+ check "top $1" "$(echo "$2" | sed 's/TOOL/top -q/g')" "${@:3}"
+ check "hist $1" "$(echo "$2" | sed 's/TOOL/hist/g')" "${@:3}"
+}
+
set_timeout() {
TIMEOUT="timeout -v -k 15s $1"
}
diff --git a/tools/tracing/rtla/tests/osnoise.t b/tools/tracing/rtla/tests/osnoise.t
index 396334608920..ce3a448b1f87 100644
--- a/tools/tracing/rtla/tests/osnoise.t
+++ b/tools/tracing/rtla/tests/osnoise.t
@@ -7,13 +7,15 @@ set_timeout 2m
check "verify help page" \
"osnoise --help" 0 "osnoise version"
-check "verify the --priority/-P param" \
- "osnoise top -P F:1 -c 0 -r 900000 -d 10s -q -S 1 --on-threshold shell,command=\"tests/scripts/check-priority.sh osnoise/ SCHED_FIFO 1\"" \
+check_top_hist "verify help page" \
+ "osnoise TOOL --help" 0 "rtla osnoise"
+check_top_q_hist "verify the --priority/-P param" \
+ "osnoise TOOL -P F:1 -c 0 -r 900000 -d 10s -S 1 --on-threshold shell,command=\"tests/scripts/check-priority.sh osnoise/ SCHED_FIFO 1\"" \
2 "Priorities are set correctly"
-check "verify the --stop/-s param" \
- "osnoise top -s 30 -T 1" 2 "osnoise hit stop tracing"
-check "verify the --trace param" \
- "osnoise hist -s 30 -T 1 -t" 2 "Saving trace to osnoise_trace.txt"
+check_top_q_hist "verify the --stop/-s param" \
+ "osnoise TOOL -s 30 -T 1" 2 "osnoise hit stop tracing"
+check_top_q_hist "verify the --trace param" \
+ "osnoise TOOL -s 30 -T 1 -t" 2 "Saving trace to osnoise_trace.txt"
check "verify the --entries/-E param" \
"osnoise hist -P F:1 -c 0 -r 900000 -d 10s -b 10 -E 25"
@@ -24,27 +26,23 @@ check_with_osnoise_options "apply default period" \
"osnoise hist -s 1" 2 period_us=600000000
# Actions tests
-check "trace output through -t with custom filename" \
- "osnoise hist -S 2 -t custom_filename.txt" 2 "^ Saving trace to custom_filename.txt$"
-check "trace output through --on-threshold trace" \
- "osnoise hist -S 2 --on-threshold trace" 2 "^ Saving trace to osnoise_trace.txt$"
-check "trace output through --on-threshold trace with custom filename" \
- "osnoise hist -S 2 --on-threshold trace,file=custom_filename.txt" 2 "^ Saving trace to custom_filename.txt$"
-check "exec command" \
- "osnoise hist -S 2 --on-threshold shell,command='echo TestOutput'" 2 "^TestOutput$"
-check "multiple actions" \
- "osnoise hist -S 2 --on-threshold shell,command='echo -n 1' --on-threshold shell,command='echo 2'" 2 "^12$"
+check_top_q_hist "trace output through -t with custom filename" \
+ "osnoise TOOL -S 2 -t custom_filename.txt" 2 "^ Saving trace to custom_filename.txt$"
+check_top_q_hist "trace output through --on-threshold trace" \
+ "osnoise TOOL -S 2 --on-threshold trace" 2 "^ Saving trace to osnoise_trace.txt$"
+check_top_q_hist "trace output through --on-threshold trace with custom filename" \
+ "osnoise TOOL -S 2 --on-threshold trace,file=custom_filename.txt" 2 "^ Saving trace to custom_filename.txt$"
+check_top_q_hist "exec command" \
+ "osnoise TOOL -S 2 --on-threshold shell,command='echo TestOutput'" 2 "^TestOutput$"
+check_top_q_hist "multiple actions" \
+ "osnoise TOOL -S 2 --on-threshold shell,command='echo -n 1' --on-threshold shell,command='echo 2'" 2 "^12$"
check "hist stop at failed action" \
"osnoise hist -S 2 --on-threshold shell,command='echo -n 1; false' --on-threshold shell,command='echo -n 2'" 2 "^1# RTLA osnoise histogram$"
check "top stop at failed action" \
"osnoise top -S 2 --on-threshold shell,command='echo -n abc; false' --on-threshold shell,command='echo -n defgh'" 2 "^abc" "defgh"
-check "hist with continue" \
- "osnoise hist -S 2 -d 5s --on-threshold shell,command='echo TestOutput' --on-threshold continue" 0 "^TestOutput$"
-check "top with continue" \
- "osnoise top -q -S 2 -d 5s --on-threshold shell,command='echo TestOutput' --on-threshold continue" 0 "^TestOutput$"
-check "hist with trace output at end" \
- "osnoise hist -d 1s --on-end trace" 0 "^ Saving trace to osnoise_trace.txt$"
-check "top with trace output at end" \
- "osnoise top -d 1s --on-end trace" 0 "^ Saving trace to osnoise_trace.txt$"
+check_top_q_hist "with continue" \
+ "osnoise TOOL -S 2 -d 5s --on-threshold shell,command='echo TestOutput' --on-threshold continue" 0 "^TestOutput$"
+check_top_hist "with trace output at end" \
+ "osnoise TOOL -d 1s --on-end trace" 0 "^ Saving trace to osnoise_trace.txt$"
test_end
diff --git a/tools/tracing/rtla/tests/timerlat.t b/tools/tracing/rtla/tests/timerlat.t
index fd4935fd7b49..d7944710a859 100644
--- a/tools/tracing/rtla/tests/timerlat.t
+++ b/tools/tracing/rtla/tests/timerlat.t
@@ -22,64 +22,60 @@ export RTLA_NO_BPF=$option
# Basic tests
check "verify help page" \
"timerlat --help" 0 "timerlat version"
-check "verify -s/--stack" \
- "timerlat top -s 3 -T 10 -t" 2 "Blocking thread stack trace"
-check "verify -P/--priority" \
- "timerlat top -P F:1 -c 0 -d 10s -q -T 1 --on-threshold shell,command=\"tests/scripts/check-priority.sh timerlatu/ SCHED_FIFO 1\"" \
+check_top_hist "verify help page" \
+ "timerlat TOOL --help" 0 "rtla timerlat"
+check_top_hist "verify -s/--stack" \
+ "timerlat TOOL -s 3 -T 10 -t" 2 "Blocking thread stack trace"
+check_top_hist "verify -P/--priority" \
+ "timerlat TOOL -P F:1 -c 0 -d 10s -T 1 --on-threshold shell,command=\"tests/scripts/check-priority.sh timerlatu/ SCHED_FIFO 1\"" \
2 "Priorities are set correctly"
-check "test in nanoseconds" \
- "timerlat top -i 2 -c 0 -n -d 10s" 2 "ns"
-check "set the automatic trace mode" \
- "timerlat top -a 5" 2 "analyzing it"
-check "dump tasks" \
- "timerlat top -a 5 --dump-tasks" 2 "Printing CPU tasks"
+check_top_hist "test in nanoseconds" \
+ "timerlat TOOL -i 2 -c 0 -n -d 10s" 2 "ns"
+check_top_hist "set the automatic trace mode" \
+ "timerlat TOOL -a 5" 2 "analyzing it"
+check_top_hist "dump tasks" \
+ "timerlat TOOL -a 5 --dump-tasks" 2 "Printing CPU tasks"
check "print the auto-analysis if hits the stop tracing condition" \
"timerlat top --aa-only 5" 2
-check "disable auto-analysis" \
- "timerlat top -s 3 -T 10 -t --no-aa" 2
-check "verify -c/--cpus" \
- "timerlat hist -c 0 -d 10s"
-check "hist test in nanoseconds" \
- "timerlat hist -i 2 -c 0 -n -d 10s" 2 "ns"
+check_top_hist "disable auto-analysis" \
+ "timerlat TOOL -s 3 -T 10 -t --no-aa" 2
+check_top_hist "verify -c/--cpus" \
+ "timerlat TOOL -c 0 -d 10s"
# Actions tests
-check "trace output through -t" \
- "timerlat hist -T 2 -t" 2 "^ Saving trace to timerlat_trace.txt$"
-check "trace output through -t with custom filename" \
- "timerlat hist -T 2 -t custom_filename.txt" 2 "^ Saving trace to custom_filename.txt$"
-check "trace output through --on-threshold trace" \
- "timerlat hist -T 2 --on-threshold trace" 2 "^ Saving trace to timerlat_trace.txt$"
-check "trace output through --on-threshold trace with custom filename" \
- "timerlat hist -T 2 --on-threshold trace,file=custom_filename.txt" 2 "^ Saving trace to custom_filename.txt$"
-check "exec command" \
- "timerlat hist -T 2 --on-threshold shell,command='echo TestOutput'" 2 "^TestOutput$"
-check "multiple actions" \
- "timerlat hist -T 2 --on-threshold shell,command='echo -n 1' --on-threshold shell,command='echo 2'" 2 "^12$"
+check_top_q_hist "trace output through -t" \
+ "timerlat TOOL -T 2 -t" 2 "^ Saving trace to timerlat_trace.txt$"
+check_top_q_hist "trace output through -t with custom filename" \
+ "timerlat TOOL -T 2 -t custom_filename.txt" 2 "^ Saving trace to custom_filename.txt$"
+check_top_q_hist "trace output through --on-threshold trace" \
+ "timerlat TOOL -T 2 --on-threshold trace" 2 "^ Saving trace to timerlat_trace.txt$"
+check_top_q_hist "trace output through --on-threshold trace with custom filename" \
+ "timerlat TOOL -T 2 --on-threshold trace,file=custom_filename.txt" 2 "^ Saving trace to custom_filename.txt$"
+check_top_q_hist "exec command" \
+ "timerlat TOOL -T 2 --on-threshold shell,command='echo TestOutput'" 2 "^TestOutput$"
+check_top_q_hist "multiple actions" \
+ "timerlat TOOL -T 2 --on-threshold shell,command='echo -n 1' --on-threshold shell,command='echo 2'" 2 "^12$"
check "hist stop at failed action" \
"timerlat hist -T 2 --on-threshold shell,command='echo -n 1; false' --on-threshold shell,command='echo -n 2'" 2 "^1# RTLA timerlat histogram$"
check "top stop at failed action" \
"timerlat top -T 2 --on-threshold shell,command='echo -n abc; false' --on-threshold shell,command='echo -n defgh'" 2 "^abc" "defgh"
-check "hist with continue" \
- "timerlat hist -T 2 -d 5s --on-threshold shell,command='echo TestOutput' --on-threshold continue" 0 "^TestOutput$"
-check "top with continue" \
- "timerlat top -q -T 2 -d 5s --on-threshold shell,command='echo TestOutput' --on-threshold continue" 0 "^TestOutput$"
-check "hist with trace output at end" \
- "timerlat hist -d 1s --on-end trace" 0 "^ Saving trace to timerlat_trace.txt$"
-check "top with trace output at end" \
- "timerlat top -d 1s --on-end trace" 0 "^ Saving trace to timerlat_trace.txt$"
+check_top_q_hist "with continue" \
+ "timerlat TOOL -T 2 -d 5s --on-threshold shell,command='echo TestOutput' --on-threshold continue" 0 "^TestOutput$"
+check_top_hist "with trace output at end" \
+ "timerlat TOOL -d 1s --on-end trace" 0 "^ Saving trace to timerlat_trace.txt$"
# BPF action program tests
if [ "$option" -eq 0 ]
then
# Test BPF action program properly in BPF mode
[ -z "$BPFTOOL" ] && BPFTOOL=bpftool
- check "hist with BPF action program (BPF mode)" \
- "timerlat hist -T 2 --bpf-action tests/bpf/bpf_action_map.o --on-threshold shell,command='$BPFTOOL map dump name rtla_test_map'" \
+ check_top_q_hist "with BPF action program (BPF mode)" \
+ "timerlat TOOL -T 2 --bpf-action tests/bpf/bpf_action_map.o --on-threshold shell,command='$BPFTOOL map dump name rtla_test_map'" \
2 '"value": 42'
else
# Test BPF action program failure in non-BPF mode
- check "hist with BPF action program (non-BPF mode)" \
- "timerlat hist -T 2 --bpf-action tests/bpf/bpf_action_map.o" \
+ check_top_q_hist "with BPF action program (non-BPF mode)" \
+ "timerlat TOOL -T 2 --bpf-action tests/bpf/bpf_action_map.o" \
1 "BPF actions are not supported in tracefs-only mode"
fi
done
--
2.53.0
* [PATCH 0/9] rtla/tests: Extend runtime test coverage
From: Tomas Glozar @ 2026-04-23 13:05 UTC (permalink / raw)
To: Steven Rostedt, Tomas Glozar
Cc: John Kacur, Luis Goncalves, Crystal Wood, Costa Shulyupin,
Wander Lairson Costa, LKML, linux-trace-kernel
This patchset introduces some new tests to cover more options, especially
histogram and thread options. Most of the new tests use positive and negative
output matches, sometimes in combination with action scripts, to verify that
RTLA is applying the settings correctly.
Tests were reorganized a little, adding two new sections: thread tests and
histogram tests, next to basic tests.
Additionally, coverage of existing tests is extended by adding new matches and
by extending tests to cover both top and hist tools where possible. For the
latter, new helpers check_top_hist and check_top_q_hist are added to engine.sh.
As part of the new action scripts, detection of measurement threads is made more
robust by following child processes of either RTLA (user workload) or kthreadd
(kernel workload) rather than grepping through the comms of all processes, which
might have led to false positives.
These changes significantly improve test coverage and make the test suite more
robust against false positives from unrelated processes.
Tomas Glozar (9):
rtla/tests: Cover both top and hist tools where possible
rtla/tests: Add get_workload_pids() helper
rtla/tests: Check -c/--cpus thread affinity
rtla/tests: Use negative match when testing --aa-only
rtla/tests: Extend timerlat top --aa-only coverage
rtla/tests: Cover all hist options in runtime tests
rtla/tests: Add runtime test for -H/--house-keeping
rtla/tests: Add runtime test for -k and -u options
rtla/tests: Add runtime tests for -C/--cgroup
tools/tracing/rtla/tests/engine.sh | 15 +++
tools/tracing/rtla/tests/osnoise.t | 73 +++++++----
.../rtla/tests/scripts/check-cgroup-match.sh | 17 +++
.../tracing/rtla/tests/scripts/check-cpus.sh | 9 ++
.../tests/scripts/check-housekeeping-cpus.sh | 4 +
.../rtla/tests/scripts/check-priority.sh | 8 +-
.../scripts/check-user-kernel-threads.sh | 16 +++
.../tests/scripts/lib/get_workload_pids.sh | 11 ++
tools/tracing/rtla/tests/timerlat.t | 113 +++++++++++-------
9 files changed, 194 insertions(+), 72 deletions(-)
create mode 100755 tools/tracing/rtla/tests/scripts/check-cgroup-match.sh
create mode 100755 tools/tracing/rtla/tests/scripts/check-cpus.sh
create mode 100755 tools/tracing/rtla/tests/scripts/check-housekeeping-cpus.sh
create mode 100755 tools/tracing/rtla/tests/scripts/check-user-kernel-threads.sh
create mode 100644 tools/tracing/rtla/tests/scripts/lib/get_workload_pids.sh
--
2.53.0
* [PATCH] mm/vmscan: add balance_pgdat begin/end tracepoints
From: Bunyod Suvonov @ 2026-04-23 10:37 UTC (permalink / raw)
To: akpm, hannes, rostedt, mhiramat
Cc: david, mhocko, zhengqi.arch, shakeel.butt, ljs, mathieu.desnoyers,
linux-mm, linux-trace-kernel, linux-kernel, Bunyod Suvonov
Vmscan has six main reclaim entry points: try_to_free_pages() for
direct reclaim, try_to_free_mem_cgroup_pages() for memcg reclaim,
mem_cgroup_shrink_node() for memcg soft limit reclaim, node_reclaim()
for node reclaim, shrink_all_memory() for hibernation reclaim, and
balance_pgdat() for kswapd reclaim.
All of them, except for shrink_all_memory() and balance_pgdat(), already
have begin/end tracepoints. This makes it harder to trace which reclaim
path is responsible for memory reclaim activity, because kswapd reclaim
cannot be identified as cleanly as other reclaim entry points, even
though it is the main background reclaim path under memory pressure.
There may be no need to trace shrink_all_memory() as it is primarily
used during hibernation. So this patch adds the missing tracepoint pair
for balance_pgdat().
The begin tracepoint records the node id, requested reclaim order, and
highest_zoneidx. The end tracepoint records the node id, reclaim order
that balance_pgdat() finished with, highest_zoneidx, and nr_reclaimed.
Together, they show the requested reclaim order and zone bound, whether
reclaim fell back to a lower order, and how much reclaim work was done.
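As a rough illustration of how the pair could be consumed, the sketch
below compares the order in a begin payload with the order in an end
payload. The two payload strings are hypothetical samples that follow
the TP_printk() formats in this patch; real trace lines also carry
task, CPU and timestamp prefixes, and field() is a made-up helper.

```shell
#!/bin/bash
# Hypothetical event payloads following the TP_printk() formats.
begin='nid=0 order=3 highest_zoneidx=Normal  '
end='nid=0 order=0 highest_zoneidx=Normal   nr_reclaimed=512'

# Print the value of key $1 from a "key=value key=value ..." payload $2.
field() {
	tr ' ' '\n' <<< "$2" | sed -n "s/^$1=//p"
}

requested=$(field order "$begin")
finished=$(field order "$end")

# A lower order at the end means reclaim fell back from the request.
if [ "$finished" -lt "$requested" ]; then
	echo "node $(field nid "$end"): fell back from order $requested to $finished," \
	     "reclaimed $(field nr_reclaimed "$end") pages"
fi
```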
Signed-off-by: Bunyod Suvonov <b.suvonov@sjtu.edu.cn>
---
include/trace/events/vmscan.h | 52 +++++++++++++++++++++++++++++++++++
mm/vmscan.c | 5 ++++
2 files changed, 57 insertions(+)
diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
index 4445a8d9218d..b4bf7b8def1f 100644
--- a/include/trace/events/vmscan.h
+++ b/include/trace/events/vmscan.h
@@ -96,6 +96,58 @@ TRACE_EVENT(mm_vmscan_kswapd_wake,
__entry->order)
);
+TRACE_EVENT(mm_vmscan_balance_pgdat_begin,
+
+ TP_PROTO(int nid, int order, int highest_zoneidx),
+
+ TP_ARGS(nid, order, highest_zoneidx),
+
+ TP_STRUCT__entry(
+ __field(int, nid)
+ __field(int, order)
+ __field(int, highest_zoneidx)
+ ),
+
+ TP_fast_assign(
+ __entry->nid = nid;
+ __entry->order = order;
+ __entry->highest_zoneidx = highest_zoneidx;
+ ),
+
+ TP_printk("nid=%d order=%d highest_zoneidx=%-8s",
+ __entry->nid,
+ __entry->order,
+ __print_symbolic(__entry->highest_zoneidx, ZONE_TYPE))
+);
+
+TRACE_EVENT(mm_vmscan_balance_pgdat_end,
+
+ TP_PROTO(int nid, int order, int highest_zoneidx,
+ unsigned long nr_reclaimed),
+
+ TP_ARGS(nid, order, highest_zoneidx, nr_reclaimed),
+
+ TP_STRUCT__entry(
+ __field(int, nid)
+ __field(int, order)
+ __field(int, highest_zoneidx)
+ __field(unsigned long, nr_reclaimed)
+ ),
+
+ TP_fast_assign(
+ __entry->nid = nid;
+ __entry->order = order;
+ __entry->highest_zoneidx = highest_zoneidx;
+ __entry->nr_reclaimed = nr_reclaimed;
+ ),
+
+ TP_printk("nid=%d order=%d highest_zoneidx=%-8s nr_reclaimed=%lu",
+ __entry->nid,
+ __entry->order,
+ __print_symbolic(__entry->highest_zoneidx, ZONE_TYPE),
+ __entry->nr_reclaimed)
+);
+
TRACE_EVENT(mm_vmscan_wakeup_kswapd,
TP_PROTO(int nid, int zid, int order, gfp_t gfp_flags),
diff --git a/mm/vmscan.c b/mm/vmscan.c
index bd1b1aa12581..b2d89ed69d22 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -7121,6 +7121,8 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int highest_zoneidx)
.may_unmap = 1,
};
+ trace_mm_vmscan_balance_pgdat_begin(pgdat->node_id, order,
+ highest_zoneidx);
set_task_reclaim_state(current, &sc.reclaim_state);
psi_memstall_enter(&pflags);
__fs_reclaim_acquire(_THIS_IP_);
@@ -7314,6 +7316,9 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int highest_zoneidx)
psi_memstall_leave(&pflags);
set_task_reclaim_state(current, NULL);
+ trace_mm_vmscan_balance_pgdat_end(pgdat->node_id, sc.order,
+ highest_zoneidx, sc.nr_reclaimed);
+
/*
* Return the order kswapd stopped reclaiming at as
* prepare_kswapd_sleep() takes it into account. If another caller
--
2.53.0
* Re: [PATCH bpf-next 10/17] bpf: Add support for tracing_multi link session
From: Jiri Olsa @ 2026-04-23 8:35 UTC (permalink / raw)
To: XIAO WU
Cc: bot+bpf-ci, andrii, ast, bpf, clm, daniel, eddyz87, ihor.solodrai,
kafai, linux-trace-kernel, martin.lau, menglong8.dong, rostedt,
songliubraving, yhs, yonghong.song
In-Reply-To: <20260423160724.00004f6d@gmail.com>
On Thu, Apr 23, 2026 at 04:07:24PM +0800, XIAO WU wrote:
SNIP
> I agree the patch should be made bisect-safe. I will post a follow-up
> that ensures BPF_TRACE_FSESSION_MULTI cannot enter this uninitialized
> fexit path (either by initializing it consistently where needed, or
> rejecting this attach route and keeping it exclusive to
> bpf_tracing_multi_attach()).
>
> Signed-off-by: XIAO WU <shawdoxwu@gmail.com>
>
> Thanks
fyi there's v5 already https://lore.kernel.org/bpf/20260417192502.194548-1-jolsa@kernel.org/
jirka
* Re: [PATCH v17 0/5] ring-buffer: Making persistent ring buffers robust
From: Masami Hiramatsu @ 2026-04-23 8:26 UTC (permalink / raw)
To: Masami Hiramatsu (Google)
Cc: Steven Rostedt, Catalin Marinas, Will Deacon, Mathieu Desnoyers,
linux-kernel, linux-trace-kernel, Ian Rogers, linux-arm-kernel
In-Reply-To: <177687458572.932171.10907864814735342737.stgit@mhiramat.tok.corp.google.com>
Hi,
Sashiko[1] pointed out other problems. Let me review them.
I also found one mistake (not introduced by this series), so I will fix it too.
[1] https://sashiko.dev/#/patchset/177687458572.932171.10907864814735342737.stgit%40mhiramat.tok.corp.google.com
Thanks,
On Thu, 23 Apr 2026 01:16:26 +0900
"Masami Hiramatsu (Google)" <mhiramat@kernel.org> wrote:
> Hi,
>
> Here is the 17th version of improvement patches for making persistent
> ring buffers robust to failures.
> The previous version is here:
>
> https://lore.kernel.org/all/177547105523.259641.14385891517704197263.stgit@mhiramat.tok.corp.google.com/
>
> This version fixes some review comments from Sashiko[1], which
> includes:
> [2/5] Fix to use rb_page_size() of rewound pages for entry_bytes.
> [3/5] - Fix to verify head_page at first before using its timestamp.
> - Reset timestamp if the page is invalid.
> [4/5] - In rb_test_inject_invalid_pages(), changed entry_bytes and
> idx to unsigned long
> - Added NULL checks for cpu_buffer and meta.
> - In allocate_trace_buffer(), added a NULL check for tr->name
> before comparing it with strcmp.
> [5/5] Added NULL check for dpage in rbm_show in ring_buffer.c.
>
> [1] https://sashiko.dev/#/patchset/177552432201.853249.5125045538812833325.stgit%40mhiramat.tok.corp.google.com
>
> Thank you,
>
> Masami Hiramatsu (Google) (5):
> ring-buffer: Flush and stop persistent ring buffer on panic
> ring-buffer: Skip invalid sub-buffers when validating persistent ring buffer
> ring-buffer: Skip invalid sub-buffers when rewinding persistent ring buffer
> ring-buffer: Add persistent ring buffer invalid-page inject test
> ring-buffer: Show commit numbers in buffer_meta file
>
>
> arch/alpha/include/asm/Kbuild | 1
> arch/arc/include/asm/Kbuild | 1
> arch/arm/include/asm/Kbuild | 1
> arch/arm64/include/asm/ring_buffer.h | 10 +
> arch/csky/include/asm/Kbuild | 1
> arch/hexagon/include/asm/Kbuild | 1
> arch/loongarch/include/asm/Kbuild | 1
> arch/m68k/include/asm/Kbuild | 1
> arch/microblaze/include/asm/Kbuild | 1
> arch/mips/include/asm/Kbuild | 1
> arch/nios2/include/asm/Kbuild | 1
> arch/openrisc/include/asm/Kbuild | 1
> arch/parisc/include/asm/Kbuild | 1
> arch/powerpc/include/asm/Kbuild | 1
> arch/riscv/include/asm/Kbuild | 1
> arch/s390/include/asm/Kbuild | 1
> arch/sh/include/asm/Kbuild | 1
> arch/sparc/include/asm/Kbuild | 1
> arch/um/include/asm/Kbuild | 1
> arch/x86/include/asm/Kbuild | 1
> arch/xtensa/include/asm/Kbuild | 1
> include/asm-generic/ring_buffer.h | 13 ++
> include/linux/ring_buffer.h | 1
> kernel/trace/Kconfig | 34 ++++
> kernel/trace/ring_buffer.c | 275 ++++++++++++++++++++++++++--------
> kernel/trace/trace.c | 4
> 26 files changed, 290 insertions(+), 67 deletions(-)
> create mode 100644 arch/arm64/include/asm/ring_buffer.h
> create mode 100644 include/asm-generic/ring_buffer.h
>
>
> base-commit: 6170922f137231b98fc568571befef63e1edff3f
> --
> Masami Hiramatsu (Google) <mhiramat@kernel.org>
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
^ permalink raw reply
* Re: [PATCH bpf-next 10/17] bpf: Add support for tracing_multi link session
From: XIAO WU @ 2026-04-23 8:07 UTC (permalink / raw)
To: bot+bpf-ci
Cc: andrii, ast, bpf, clm, daniel, eddyz87, ihor.solodrai, jolsa,
kafai, linux-trace-kernel, martin.lau, menglong8.dong, rostedt,
songliubraving, yhs, yonghong.song
In-Reply-To: <0520af2c467a82b82aa3014b7e721f95bfde9e91324bb2e183b099e5a37bbc3e@mail.kernel.org>
Hi,
> BPF_TRACE_FSESSION_MULTI is now accepted here, which means
> a program with this type can enter bpf_tracing_prog_attach()
> via BPF_RAW_TRACEPOINT_OPEN:
>
> bpf_raw_tracepoint_open()
> bpf_raw_tp_link_attach() /* name == NULL */
> bpf_tracing_prog_attach() /* BPF_TRACE_FSESSION_MULTI */
>
> Further down in bpf_tracing_prog_attach(), the fexit node
> initialization only checks for BPF_TRACE_FSESSION:
>
> kernel/bpf/syscall.c:bpf_tracing_prog_attach() {
> ...
> if (prog->expected_attach_type == BPF_TRACE_FSESSION) {
> link->fexit.link = &link->link.link;
> link->fexit.cookie = bpf_cookie;
> }
> ...
> }
>
> So for BPF_TRACE_FSESSION_MULTI, link->fexit.link stays NULL
> (from kzalloc). When __bpf_trampoline_link_prog() later calls
> fsession_exit(), it returns &link->fexit with a NULL link
> field. This node gets added to the trampoline FEXIT list, and
> bpf_trampoline_get_progs() then dereferences it:
>
> kernel/bpf/trampoline.c:bpf_trampoline_get_progs() {
> ...
> hlist_for_each_entry(node, &tr->progs_hlist[kind], tramp_hlist) {
> *ip_arg |= node->link->prog->call_get_func_ip;
> ^^^^^^^^^^
> ...
> }
>
> Would it make sense to either add BPF_TRACE_FSESSION_MULTI to
> the fexit initialization, or reject this type in
> bpf_tracing_prog_attach() since it should only be used through
> bpf_tracing_multi_attach()?
Yes, confirmed.
I reproduced this on x86_64 with a minimal tracing program loaded as
BPF_PROG_TYPE_TRACING with
expected_attach_type=BPF_TRACE_FSESSION_MULTI, then attached through
BPF_RAW_TRACEPOINT_OPEN with name=NULL.
This reaches bpf_tracing_prog_attach() without initializing link->fexit
for FSESSION_MULTI and later hits the NULL dereference path in
trampoline handling, as you pointed out.
C reproducer:
--8<--
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/syscall.h>
#include <unistd.h>
/* Use kernel-under-test UAPI, not host's potentially older one. */
#include "../kernel-source/include/uapi/linux/bpf.h"
#ifndef __NR_bpf
#define __NR_bpf 321
#endif
static int sys_bpf(int cmd, union bpf_attr *attr, unsigned int size)
{
return (int)syscall(__NR_bpf, cmd, attr, size);
}
static void bump_memlock(void)
{
struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
setrlimit(RLIMIT_MEMLOCK, &r);
}
int main(void)
{
bump_memlock();
/* r0 = 0; exit */
struct bpf_insn prog[] = {
{ .code = 0xb7, .dst_reg = 0, .src_reg = 0, .off = 0, .imm = 0 },
{ .code = 0x95, .dst_reg = 0, .src_reg = 0, .off = 0, .imm = 0 },
};
char license[] = "GPL";
static char log_buf[1 << 20];
union bpf_attr attr;
memset(&attr, 0, sizeof(attr));
attr.prog_type = BPF_PROG_TYPE_TRACING;
attr.expected_attach_type = BPF_TRACE_FSESSION_MULTI;
attr.insn_cnt = (uint32_t)(sizeof(prog) / sizeof(prog[0]));
attr.insns = (uint64_t)(uintptr_t)prog;
attr.license = (uint64_t)(uintptr_t)license;
attr.log_buf = (uint64_t)(uintptr_t)log_buf;
attr.log_size = sizeof(log_buf);
attr.log_level = 1;
int prog_fd = sys_bpf(BPF_PROG_LOAD, &attr, sizeof(attr));
if (prog_fd < 0) {
fprintf(stderr, "BPF_PROG_LOAD failed: errno=%d (%s)\n", errno,
strerror(errno));
if (log_buf[0])
fprintf(stderr, "verifier log:\n%s\n", log_buf);
return 1;
}
memset(&attr, 0, sizeof(attr));
attr.raw_tracepoint.prog_fd = prog_fd;
attr.raw_tracepoint.name = 0; /* NULL name drives TRACING attach path */
attr.raw_tracepoint.cookie = 0x4141414142424242ULL;
int link_fd = sys_bpf(BPF_RAW_TRACEPOINT_OPEN, &attr, sizeof(attr));
if (link_fd < 0) {
fprintf(stderr, "BPF_RAW_TRACEPOINT_OPEN returned errno=%d (%s)\n",
errno, strerror(errno));
close(prog_fd);
return 2;
}
fprintf(stderr, "Unexpectedly succeeded: link_fd=%d\n", link_fd);
close(link_fd);
close(prog_fd);
return 0;
}
--8<--
I agree the patch should be made bisect-safe. I will post a follow-up
that ensures BPF_TRACE_FSESSION_MULTI cannot enter this uninitialized
fexit path (either by initializing it consistently where needed, or
rejecting this attach route and keeping it exclusive to
bpf_tracing_multi_attach()).
Signed-off-by: XIAO WU <shawdoxwu@gmail.com>
Thanks
^ permalink raw reply
* Re: [PATCH v17 1/5] ring-buffer: Flush and stop persistent ring buffer on panic
From: Geert Uytterhoeven @ 2026-04-23 7:28 UTC (permalink / raw)
To: Masami Hiramatsu (Google)
Cc: Steven Rostedt, Catalin Marinas, Will Deacon, Mathieu Desnoyers,
linux-kernel, linux-trace-kernel, Ian Rogers, linux-arm-kernel
In-Reply-To: <177687459412.932171.8121855108122534476.stgit@mhiramat.tok.corp.google.com>
On Wed, 22 Apr 2026 at 18:26, Masami Hiramatsu (Google)
<mhiramat@kernel.org> wrote:
> From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
>
> On real hardware, panic and machine reboot may not flush hardware cache
> to memory. This means the persistent ring buffer, which relies on a
> coherent state of memory, may not have its events written to the buffer
> and they may be lost. Moreover, there may be inconsistency with the
> counters which are used for validation of the integrity of the
> persistent ring buffer which may cause all data to be discarded.
>
> To avoid this issue, stop recording of the ring buffer on panic and
> flush the cache of the ring buffer's memory.
>
> Fixes: e645535a954a ("tracing: Add option to use memmapped memory for trace boot instance")
> Cc: stable@vger.kernel.org
> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> Acked-by: Catalin Marinas <catalin.marinas@arm.com>
> arch/m68k/include/asm/Kbuild | 1 +
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> # m68k
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
^ permalink raw reply
* Re: [PATCH v13 00/18] unwind_deferred: Implement sframe handling
From: Indu Bhagat @ 2026-04-23 7:00 UTC (permalink / raw)
To: Jens Remus
Cc: linux-kernel, linux-trace-kernel, bpf, x86, linux-mm,
Steven Rostedt, Jens Remus, Josh Poimboeuf, Masami Hiramatsu,
Mathieu Desnoyers, Peter Zijlstra, Ingo Molnar, Jiri Olsa,
Arnaldo Carvalho de Melo, Namhyung Kim, Thomas Gleixner,
Andrii Nakryiko, Jose E. Marchesi, Beau Belgrave, Linus Torvalds,
Andrew Morton, Florian Weimer, Kees Cook, Carlos O'Donell,
Sam James, Dylan Hatch, Borislav Petkov, Dave Hansen,
David Hildenbrand, H. Peter Anvin, Liam R. Howlett,
Lorenzo Stoakes, Michal Hocko, Mike Rapoport, Suren Baghdasaryan,
Vlastimil Babka, Heiko Carstens, Vasily Gorbik, ibhagatgnu
In-Reply-To: <20260127150554.2760964-1-jremus@linux.ibm.com>
On Tue, Jan 27, 2026 at 7:32 AM Jens Remus <jremus@linux.ibm.com> wrote:
>
> This is the implementation of parsing the SFrame V3 stack trace information
> from an .sframe section in an ELF file. It's a continuation of Josh's and
> Steve's work that can be found here:
>
> https://lore.kernel.org/all/cover.1737511963.git.jpoimboe@kernel.org/
> https://lore.kernel.org/all/20250827201548.448472904@kernel.org/
>
> Currently the only way to get a user space stack trace from a stack
> walk (and not just copying a large amount of user stack into the kernel
> ring buffer) is to use frame pointers. This has a few issues. The biggest
> one is that compiling frame pointers into every application and library
> has been shown to cause performance overhead.
>
> Another issue is that the format of the frames may not always be consistent
> between different compilers and some architectures (s390) have no defined
> format to do a reliable stack walk. The only way to perform user space
> profiling on these architectures is to copy the user stack into the kernel
> buffer.
>
> SFrame [1] is now supported in binutils (x86-64, ARM64, and s390). There are
> discussions going on about supporting SFrame in LLVM. SFrame acts more like
> ORC, and lives in the ELF executable file as its own section. Like ORC it
> has two tables where the first table is sorted by instruction pointers (IP)
> and using the current IP and finding its entry in the first table, it will
> take you to the second table which will tell you where the return address
> of the current function is located and then you can use that address to
> look it up in the first table to find the return address of that function,
> and so on. This performs a user space stack walk.
>
> Now because the .sframe section lives in the ELF file it needs to be faulted
> into memory when it is used. This means that walking the user space stack
> requires being in a faultable context. As profilers like perf request a stack
> trace in interrupt or NMI context, it cannot do the walking when it is
> requested. Instead it must be deferred until it is safe to fault in user
> space. One place this is known to be safe is when the task is about to return
> back to user space.
>
> This series makes the deferred unwind user code implement SFrame format V3
> and enables it on x86-64.
>
> [1]: https://sourceware.org/binutils/wiki/sframe
>
>
> This series applies on top of the tip perf/core branch:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git perf/core
>
> The to be stack-traced user space programs (and libraries) need to be
> built with the recent SFrame stack trace information format V3, as
> generated by the upcoming binutils 2.46 with assembler option --gsframe.
> It can be built from source from the binutils-2_46-branch branch:
>
> git://sourceware.org/git/binutils-gdb.git binutils-2_46-branch
>
> Namhyung Kim's related perf tools deferred callchain support can be used
> for testing ("perf record --call-graph fp,defer" and "perf report/script").
>
>
> Changes since v12 (see patch notes for details):
> - Rebase on tip perf/core branch (d55c571e4333).
> - Add support for SFrame V3, including its new flexible FDEs. SFrame V2
> is not supported.
>
> Changes since v11 (see patch notes for details):
> - Rebase on tip master branch (f8fdee44bf2f) with Namhyung Kim's
> perf/defer-callchain-v4 branch merged on top.
> - Adjust to Peter's latest unwind user enhancements.
> - Simplify logic by using an internal SFrame FDE representation, whose
> FDE function start address field is an address instead of a PC-relative
> offset (from FDE).
> - Rename struct sframe_fre to sframe_fre_internal to align with
> struct sframe_fde_internal.
> - Remove unused pt_regs from unwind_user_next_common() and its
> callers. (Peter)
> - Simplify unwind_user_next_sframe(). (Peter)
> - Fix a few checkpatch errors and warnings.
> - Minor cleanups (e.g. move includes, fix indentation).
>
> Changes since v10:
> - Support for SFrame V2 PC-relative FDE function start address.
> - Support for SFrame V2 representing RA undefined as indication for
> outermost frames.
>
>
> Patches 1, 4, 11, and 17 have been updated to exclusively support the
> latest SFrame V3 stack trace information format, that is generated by
> the upcoming binutils 2.46 release. Old SFrame V2 sections get rejected
> with dynamic debug message "bad/unsupported sframe header".
>
> Patches 7 and 8 add support to unwind user (sframe) for outermost frames.
>
> Patches 12-15 add support to unwind user (sframe) for the new SFrame V3
> flexible FDEs.
>
> Patch 16 improves the performance of searching the SFrame FRE for an IP.
>
Thanks Jens for your work on this. Apart from some of those minor
renames you are already planning on doing (as you mentioned in the
meeting today), the SFrame related bits look OK to me.
Reviewed-by: Indu Bhagat <ibhagatgnu@gmail.com>
> Regards,
> Jens
>
>
> Jens Remus (7):
> unwind_user: Stop when reaching an outermost frame
> unwind_user/sframe: Add support for outermost frame indication
> unwind_user: Enable archs that pass RA in a register
> unwind_user: Flexible FP/RA recovery rules
> unwind_user: Flexible CFA recovery rules
> unwind_user/sframe: Add support for SFrame V3 flexible FDEs
> unwind_user/sframe: Separate reading of FRE from reading of FRE data
> words
>
> Josh Poimboeuf (11):
> unwind_user/sframe: Add support for reading .sframe headers
> unwind_user/sframe: Store .sframe section data in per-mm maple tree
> x86/uaccess: Add unsafe_copy_from_user() implementation
> unwind_user/sframe: Add support for reading .sframe contents
> unwind_user/sframe: Detect .sframe sections in executables
> unwind_user/sframe: Wire up unwind_user to sframe
> unwind_user/sframe: Remove .sframe section on detected corruption
> unwind_user/sframe: Show file name in debug output
> unwind_user/sframe: Add .sframe validation option
> unwind_user/sframe/x86: Enable sframe unwinding on x86
> unwind_user/sframe: Add prctl() interface for registering .sframe
> sections
>
> MAINTAINERS | 1 +
> arch/Kconfig | 23 +
> arch/x86/Kconfig | 1 +
> arch/x86/include/asm/mmu.h | 2 +-
> arch/x86/include/asm/uaccess.h | 39 +-
> arch/x86/include/asm/unwind_user.h | 69 +-
> arch/x86/include/asm/unwind_user_sframe.h | 12 +
> fs/binfmt_elf.c | 48 +-
> include/linux/mm_types.h | 3 +
> include/linux/sframe.h | 60 ++
> include/linux/unwind_user.h | 18 +
> include/linux/unwind_user_types.h | 46 +-
> include/uapi/linux/elf.h | 1 +
> include/uapi/linux/prctl.h | 6 +-
> kernel/fork.c | 10 +
> kernel/sys.c | 9 +
> kernel/unwind/Makefile | 3 +-
> kernel/unwind/sframe.c | 840 ++++++++++++++++++++++
> kernel/unwind/sframe.h | 87 +++
> kernel/unwind/sframe_debug.h | 68 ++
> kernel/unwind/user.c | 105 ++-
> mm/init-mm.c | 2 +
> 22 files changed, 1414 insertions(+), 39 deletions(-)
> create mode 100644 arch/x86/include/asm/unwind_user_sframe.h
> create mode 100644 include/linux/sframe.h
> create mode 100644 kernel/unwind/sframe.c
> create mode 100644 kernel/unwind/sframe.h
> create mode 100644 kernel/unwind/sframe_debug.h
>
> --
> 2.51.0
>
>
^ permalink raw reply
* Re: [moderation] KCSAN: data-race in filemap_read / filemap_splice_read (3)
From: syzbot @ 2026-04-23 5:05 UTC (permalink / raw)
To: adilger.kernel, akpm, almaz.alexandrovich, baolin.wang, hughd,
jack, jiayuan.chen, jiayuan.chen, linux-ext4, linux-fsdevel,
linux-kernel, linux-mm, linux-trace-kernel, mathieu.desnoyers,
mhiramat, ntfs3, rostedt, syzkaller-upstream-moderation, tytso,
willy
In-Reply-To: <699fd494.050a0220.2fcbed.0000.GAE@google.com>
Auto-closing this bug as obsolete.
Crashes did not happen for a while, no reproducer and no activity.
^ permalink raw reply
* Re: [PATCH net v1] net: validate skb->napi_id in RX tracepoints
From: patchwork-bot+netdevbpf @ 2026-04-23 3:40 UTC (permalink / raw)
To: Kohei Enju
Cc: netdev, linux-trace-kernel, davem, edumazet, kuba, pabeni, horms,
rostedt, mhiramat, mathieu.desnoyers
In-Reply-To: <20260420105427.162816-1-kohei@enjuk.jp>
Hello:
This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Mon, 20 Apr 2026 10:54:23 +0000 you wrote:
> Since commit 2bd82484bb4c ("xps: fix xps for stacked devices"),
> skb->napi_id shares storage with sender_cpu. RX tracepoints using
> net_dev_rx_verbose_template read skb->napi_id directly and can therefore
> report sender_cpu values as if they were NAPI IDs.
>
> For example, on the loopback path this can report 1 as napi_id, where 1
> comes from raw_smp_processor_id() + 1 in the XPS path:
>
> [...]
Here is the summary with links:
- [net,v1] net: validate skb->napi_id in RX tracepoints
https://git.kernel.org/netdev/net/c/3bfcf396081a
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply
* [PATCH v17 5/5] ring-buffer: Show commit numbers in buffer_meta file
From: Masami Hiramatsu (Google) @ 2026-04-22 16:17 UTC (permalink / raw)
To: Steven Rostedt, Catalin Marinas, Will Deacon
Cc: Masami Hiramatsu, Mathieu Desnoyers, linux-kernel,
linux-trace-kernel, Ian Rogers, linux-arm-kernel
In-Reply-To: <177687458572.932171.10907864814735342737.stgit@mhiramat.tok.corp.google.com>
From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
In addition to the index number, show the commit numbers of
each data page in the per_cpu buffer_meta file.
This is useful for understanding the current status of the
persistent ring buffer. (Note that this file is shown
only for the persistent ring buffer and its backup instance.)
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
Changes in v17:
- Added NULL check for dpage in rbm_show in ring_buffer.c.
Changes in v16:
- update description.
---
kernel/trace/ring_buffer.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 31448c5ea791..15dcbf554d49 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -2215,6 +2215,7 @@ static int rbm_show(struct seq_file *m, void *v)
struct ring_buffer_per_cpu *cpu_buffer = m->private;
struct ring_buffer_cpu_meta *meta = cpu_buffer->ring_meta;
unsigned long val = (unsigned long)v;
+ struct buffer_data_page *dpage;
if (val == 1) {
seq_printf(m, "head_buffer: %d\n",
@@ -2227,7 +2228,9 @@ static int rbm_show(struct seq_file *m, void *v)
}
val -= 2;
- seq_printf(m, "buffer[%ld]: %d\n", val, meta->buffers[val]);
+ dpage = rb_range_buffer(cpu_buffer, val);
+ seq_printf(m, "buffer[%ld]: %d (commit: %ld)\n",
+ val, meta->buffers[val], dpage ? local_read(&dpage->commit) : -1);
return 0;
}
^ permalink raw reply related
* [PATCH v17 4/5] ring-buffer: Add persistent ring buffer invalid-page inject test
From: Masami Hiramatsu (Google) @ 2026-04-22 16:16 UTC (permalink / raw)
To: Steven Rostedt, Catalin Marinas, Will Deacon
Cc: Masami Hiramatsu, Mathieu Desnoyers, linux-kernel,
linux-trace-kernel, Ian Rogers, linux-arm-kernel
In-Reply-To: <177687458572.932171.10907864814735342737.stgit@mhiramat.tok.corp.google.com>
From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Add a self-corrupting test for the persistent ring buffer.
This injects an erroneous value into some sub-buffer pages (where
the index is even or a multiple of 5) in the persistent ring buffer
when the kernel panics and, after reboot, checks whether the number of
detected invalid pages and the total entry_bytes match the recorded
values.
This ensures that the kernel can correctly recover a partially
corrupted persistent ring buffer after a reboot or panic.
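As an illustration only (not part of the patch), the page-selection rule
can be modelled in user space. The sketch below assumes every sub-buffer
is in use, whereas the kernel code skips pages whose commit is zero. The
helper names is_injected(), count_injected() and longest_injected_run()
are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Selection rule from the patch: even indices or multiples of 5. */
static bool is_injected(size_t i)
{
	return !(i & 0x1) || !(i % 5);
}

/* Count how many of the first n indices would be invalidated. */
static size_t count_injected(size_t n)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		if (is_injected(i))
			count++;
	return count;
}

/* Length of the longest run of consecutive invalidated indices. */
static size_t longest_injected_run(size_t n)
{
	size_t best = 0, run = 0;

	for (size_t i = 0; i < n; i++) {
		run = is_injected(i) ? run + 1 : 0;
		if (run > best)
			best = run;
	}
	return best;
}
```

For the first ten indices this selects 0, 2, 4, 5, 6 and 8, so indices
4 to 6 form a run of three contiguous invalidated pages.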
The test only runs on the persistent ring buffer whose name is
"ptracingtest". The user has to fill it with events before a
kernel panic.
To run the test, enable CONFIG_RING_BUFFER_PERSISTENT_INJECT
and add the following kernel cmdline:
reserve_mem=20M:2M:trace trace_instance=ptracingtest^traceoff@trace
panic=1
Run the following commands after the 1st boot:
cd /sys/kernel/tracing/instances/ptracingtest
echo 1 > tracing_on
echo 1 > events/enable
sleep 3
echo c > /proc/sysrq-trigger
After the panic message, the kernel will reboot and run the verification
on the persistent ring buffer, e.g.
Ring buffer meta [2] invalid buffer page detected
Ring buffer meta [2] is from previous boot! (318 pages discarded)
Ring buffer testing [2] invalid pages: PASSED (318/318)
Ring buffer testing [2] entry_bytes: PASSED (1300476/1300476)
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
Changes in v17:
- In rb_test_inject_invalid_pages(), changed entry_bytes and
idx to unsigned long
- Added NULL checks for cpu_buffer and meta.
- In allocate_trace_buffer(), added a NULL check for tr->name
before comparing it with strcmp.
Changes in v16:
- Update description and comments according to review comments.
Changes in v15:
- Use pr_warn() for test result.
- Inject errors on pages whose index is a multiple of 5 so that
this can reproduce contiguous empty pages.
Changes in v14:
- Rename config to CONFIG_RING_BUFFER_PERSISTENT_INJECT.
- Clear meta->nr_invalid/entry_bytes after testing.
- Add test commands in config comment.
Changes in v10:
- Add entry_bytes test.
- Do not compile test code if CONFIG_RING_BUFFER_PERSISTENT_SELFTEST=n.
Changes in v9:
- Test also reader pages.
---
include/linux/ring_buffer.h | 1 +
kernel/trace/Kconfig | 34 +++++++++++++++++++
kernel/trace/ring_buffer.c | 79 +++++++++++++++++++++++++++++++++++++++++++
kernel/trace/trace.c | 4 ++
4 files changed, 118 insertions(+)
diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index 994f52b34344..0670742b2d60 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -238,6 +238,7 @@ int ring_buffer_subbuf_size_get(struct trace_buffer *buffer);
enum ring_buffer_flags {
RB_FL_OVERWRITE = 1 << 0,
+ RB_FL_TESTING = 1 << 1,
};
#ifdef CONFIG_RING_BUFFER
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index e130da35808f..084f34dc6c9f 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -1202,6 +1202,40 @@ config RING_BUFFER_VALIDATE_TIME_DELTAS
Only say Y if you understand what this does, and you
still want it enabled. Otherwise say N
+config RING_BUFFER_PERSISTENT_INJECT
+ bool "Enable persistent ring buffer error injection test"
+ depends on RING_BUFFER
+ help
+ This option will have the kernel check if the persistent ring
+ buffer is named "ptracingtest", and if so, it will corrupt some
+ of its pages on a kernel panic. This is used to test if the
+ persistent ring buffer can recover from some of its sub-buffers
+ being corrupted.
+ To use this, boot a kernel with a "ptracingtest" persistent
+ ring buffer, e.g.
+
+ reserve_mem=20M:2M:trace trace_instance=ptracingtest@trace panic=1
+
+ And after the 1st boot, run the following commands:
+
+ cd /sys/kernel/tracing/instances/ptracingtest
+ echo 1 > events/enable
+ echo 1 > tracing_on
+ sleep 3
+ echo c > /proc/sysrq-trigger
+
+ After the panic message, the kernel will reboot and will show
+ the test results in the console output.
+
+ Note that events for the test ring buffer need to be enabled
+ prior to crashing the kernel so that the ring buffer has content
+ that the test will corrupt.
+ As the test will corrupt events in the "ptracingtest" persistent
+ ring buffer, it should not be used for any other purpose other
+ than this test.
+
+ If unsure, say N
+
config MMIOTRACE_TEST
tristate "Test module for mmiotrace"
depends on MMIOTRACE && m
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 0e3d2d037d4d..31448c5ea791 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -64,6 +64,10 @@ struct ring_buffer_cpu_meta {
unsigned long commit_buffer;
__u32 subbuf_size;
__u32 nr_subbufs;
+#ifdef CONFIG_RING_BUFFER_PERSISTENT_INJECT
+ __u32 nr_invalid;
+ __u32 entry_bytes;
+#endif
int buffers[];
};
@@ -2085,6 +2089,21 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
if (discarded)
pr_cont(" (%d pages discarded)", discarded);
pr_cont("\n");
+
+#ifdef CONFIG_RING_BUFFER_PERSISTENT_INJECT
+ if (meta->nr_invalid)
+ pr_warn("Ring buffer testing [%d] invalid pages: %s (%d/%d)\n",
+ cpu_buffer->cpu,
+ (discarded == meta->nr_invalid) ? "PASSED" : "FAILED",
+ discarded, meta->nr_invalid);
+ if (meta->entry_bytes)
+ pr_warn("Ring buffer testing [%d] entry_bytes: %s (%ld/%ld)\n",
+ cpu_buffer->cpu,
+ (entry_bytes == meta->entry_bytes) ? "PASSED" : "FAILED",
+ (long)entry_bytes, (long)meta->entry_bytes);
+ meta->nr_invalid = 0;
+ meta->entry_bytes = 0;
+#endif
return;
invalid:
@@ -2565,12 +2584,72 @@ static void rb_free_cpu_buffer(struct ring_buffer_per_cpu *cpu_buffer)
kfree(cpu_buffer);
}
+#ifdef CONFIG_RING_BUFFER_PERSISTENT_INJECT
+static void rb_test_inject_invalid_pages(struct trace_buffer *buffer)
+{
+ struct ring_buffer_per_cpu *cpu_buffer;
+ struct ring_buffer_cpu_meta *meta;
+ struct buffer_data_page *dpage;
+ unsigned long entry_bytes = 0;
+ unsigned long ptr;
+ int subbuf_size;
+ int invalid = 0;
+ int cpu;
+ int i;
+
+ if (!(buffer->flags & RB_FL_TESTING))
+ return;
+
+ guard(preempt)();
+ cpu = smp_processor_id();
+
+ cpu_buffer = buffer->buffers[cpu];
+ if (!cpu_buffer)
+ return;
+ meta = cpu_buffer->ring_meta;
+ if (!meta)
+ return;
+
+ ptr = (unsigned long)rb_subbufs_from_meta(meta);
+ subbuf_size = meta->subbuf_size;
+
+ for (i = 0; i < meta->nr_subbufs; i++) {
+ unsigned long idx = meta->buffers[i];
+
+ dpage = (void *)(ptr + idx * subbuf_size);
+ /* Skip unused pages */
+ if (!local_read(&dpage->commit))
+ continue;
+
+ /*
+ * Invalidate even pages or multiples of 5. This will cause 3
+ * contiguous invalidated(empty) pages.
+ */
+ if (!(i & 0x1) || !(i % 5)) {
+ local_add(subbuf_size + 1, &dpage->commit);
+ invalid++;
+ } else {
+ /* Count total commit bytes. */
+ entry_bytes += local_read(&dpage->commit);
+ }
+ }
+
+ pr_info("Inject invalidated %d pages on CPU%d, total size: %ld\n",
+ invalid, cpu, (long)entry_bytes);
+ meta->nr_invalid = invalid;
+ meta->entry_bytes = entry_bytes;
+}
+#else /* !CONFIG_RING_BUFFER_PERSISTENT_INJECT */
+#define rb_test_inject_invalid_pages(buffer) do { } while (0)
+#endif
+
/* Stop recording on a persistent buffer and flush cache if needed. */
static int rb_flush_buffer_cb(struct notifier_block *nb, unsigned long event, void *data)
{
struct trace_buffer *buffer = container_of(nb, struct trace_buffer, flush_nb);
ring_buffer_record_off(buffer);
+ rb_test_inject_invalid_pages(buffer);
arch_ring_buffer_flush_range(buffer->range_addr_start, buffer->range_addr_end);
return NOTIFY_DONE;
}
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index e9455d46ec16..d972b24cd73b 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -9436,6 +9436,8 @@ static void setup_trace_scratch(struct trace_array *tr,
memset(tscratch, 0, size);
}
+#define TRACE_TEST_PTRACING_NAME "ptracingtest"
+
static int
allocate_trace_buffer(struct trace_array *tr, struct array_buffer *buf, unsigned long size)
{
@@ -9448,6 +9450,8 @@ allocate_trace_buffer(struct trace_array *tr, struct array_buffer *buf, unsigned
buf->tr = tr;
if (tr->range_addr_start && tr->range_addr_size) {
+ if (tr->name && !strcmp(tr->name, TRACE_TEST_PTRACING_NAME))
+ rb_flags |= RB_FL_TESTING;
/* Add scratch buffer to handle 128 modules */
buf->buffer = ring_buffer_alloc_range(size, rb_flags, 0,
tr->range_addr_start,
^ permalink raw reply related
* [PATCH v17 3/5] ring-buffer: Skip invalid sub-buffers when rewinding persistent ring buffer
From: Masami Hiramatsu (Google) @ 2026-04-22 16:16 UTC (permalink / raw)
To: Steven Rostedt, Catalin Marinas, Will Deacon
Cc: Masami Hiramatsu, Mathieu Desnoyers, linux-kernel,
linux-trace-kernel, Ian Rogers, linux-arm-kernel
In-Reply-To: <177687458572.932171.10907864814735342737.stgit@mhiramat.tok.corp.google.com>
From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Skip invalid sub-buffers when rewinding the persistent ring buffer
instead of stopping the rewind. The skipped buffers are cleared.
To ensure the rewinding stops at the unused page, this also clears
buffer_data_page::time_stamp when tracing resets the buffer. This
allows us to identify unused pages and empty pages.
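The timestamp ordering rule that this patch adds to rb_validate_buffer()
can be sketched in user space as follows. This is a simplified model that
covers only the timestamp comparison, not the event parsing. The names
ts_in_window() and reset_ts() are hypothetical, and a bound of 0 stands
for "no bound", as in the diff below.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true when a page timestamp is consistent with its neighbours.
 * A bound of 0 means that bound is not checked.
 */
static bool ts_in_window(uint64_t ts, uint64_t prev_ts, uint64_t next_ts)
{
	if (prev_ts && prev_ts > ts)
		return false;	/* older than the preceding page */
	if (next_ts && ts > next_ts)
		return false;	/* newer than the following page */
	return true;
}

/* Timestamp given to a page that failed validation: nearest known bound. */
static uint64_t reset_ts(uint64_t prev_ts, uint64_t next_ts)
{
	return prev_ts ? prev_ts : next_ts;
}
```

Because a rejected page inherits a neighbouring timestamp rather than 0,
it stays distinguishable from an unused page, which has neither a
timestamp nor a commit.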
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
Changes in v17:
- Fix to verify head_page at first before using its timestamp.
- Reset timestamp if the page is invalid.
Changes in v12:
- Fix build error.
Changes in v11:
- Reset timestamp when the buffer is invalid.
- When rewinding, skip subbuf page if timestamp is wrong and
check timestamp after validating buffer data page.
Changes in v10:
- Newly added.
---
kernel/trace/ring_buffer.c | 92 ++++++++++++++++++++++++++------------------
1 file changed, 54 insertions(+), 38 deletions(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 26507b93cf40..0e3d2d037d4d 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -363,6 +363,7 @@ struct buffer_page {
static void rb_init_page(struct buffer_data_page *bpage)
{
local_set(&bpage->commit, 0);
+ bpage->time_stamp = 0;
}
static __always_inline unsigned int rb_page_commit(struct buffer_page *bpage)
@@ -1878,12 +1879,14 @@ static int rb_read_data_buffer(struct buffer_data_page *dpage, int tail, int cpu
return events;
}
-static int rb_validate_buffer(struct buffer_data_page *dpage, int cpu,
- struct ring_buffer_cpu_meta *meta)
+static int rb_validate_buffer(struct buffer_page *bpage, int cpu,
+ struct ring_buffer_cpu_meta *meta, u64 prev_ts, u64 next_ts)
{
+ struct buffer_data_page *dpage = bpage->page;
unsigned long long ts;
unsigned long tail;
u64 delta;
+ int ret = -1;
/*
* When a sub-buffer is recovered from a read, the commit value may
@@ -1892,9 +1895,19 @@ static int rb_validate_buffer(struct buffer_data_page *dpage, int cpu,
* subbuf_size is considered invalid.
*/
tail = local_read(&dpage->commit) & ~RB_MISSED_MASK;
- if (tail > meta->subbuf_size)
- return -1;
- return rb_read_data_buffer(dpage, tail, cpu, &ts, &delta);
+ if (tail <= meta->subbuf_size)
+ ret = rb_read_data_buffer(dpage, tail, cpu, &ts, &delta);
+
+ if (ret < 0 || (prev_ts && prev_ts > ts) || (next_ts && ts > next_ts)) {
+ local_set(&bpage->entries, 0);
+ local_set(&bpage->page->commit, 0);
+ bpage->page->time_stamp = prev_ts ? prev_ts : next_ts;
+ ret = -1;
+ } else {
+ local_set(&bpage->entries, ret);
+ }
+
+ return ret;
}
/* If the meta data has been validated, now validate the events */
@@ -1914,25 +1927,29 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
orig_head = head_page = cpu_buffer->head_page;
- /* Do the reader page first */
- ret = rb_validate_buffer(cpu_buffer->reader_page->page, cpu_buffer->cpu, meta);
+ /* Do the head page first */
+ ret = rb_validate_buffer(head_page, cpu_buffer->cpu, meta, 0, 0);
+ if (ret < 0) {
+ pr_info("Ring buffer meta [%d] invalid head page detected\n",
+ cpu_buffer->cpu);
+ goto skip_rewind;
+ }
+ ts = head_page->page->time_stamp;
+
+ /* Do the reader page - reader must be previous to head. */
+ ret = rb_validate_buffer(cpu_buffer->reader_page, cpu_buffer->cpu, meta, 0, ts);
if (ret < 0) {
pr_info("Ring buffer meta [%d] invalid reader page detected\n",
cpu_buffer->cpu);
discarded++;
- /* Instead of discard whole ring buffer, discard only this sub-buffer. */
- local_set(&cpu_buffer->reader_page->entries, 0);
- local_set(&cpu_buffer->reader_page->page->commit, 0);
} else {
entries += ret;
entry_bytes += rb_page_size(cpu_buffer->reader_page);
- local_set(&cpu_buffer->reader_page->entries, ret);
+ ts = cpu_buffer->reader_page->page->time_stamp;
}
- ts = head_page->page->time_stamp;
-
/*
- * Try to rewind the head so that we can read the pages which already
+ * Try to rewind the head so that we can read the pages which are already
* read in the previous boot.
*/
if (head_page == cpu_buffer->tail_page)
@@ -1945,26 +1962,27 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
if (head_page == cpu_buffer->tail_page)
break;
- /* Ensure the page has older data than head. */
- if (ts < head_page->page->time_stamp)
- break;
-
- ts = head_page->page->time_stamp;
- /* Ensure the page has correct timestamp and some data. */
- if (!ts || rb_page_commit(head_page) == 0)
+ /* Rewind until unused page (no timestamp, no commit). */
+ if (!head_page->page->time_stamp && rb_page_commit(head_page) == 0)
break;
- /* Stop rewind if the page is invalid. */
- ret = rb_validate_buffer(head_page->page, cpu_buffer->cpu, meta);
- if (ret < 0)
- break;
-
- /* Recover the number of entries and update stats. */
- local_set(&head_page->entries, ret);
- if (ret)
- local_inc(&cpu_buffer->pages_touched);
- entries += ret;
- entry_bytes += rb_page_size(head_page);
+ /*
+ * Skip if the page is invalid, or its timestamp is newer than the
+ * previous valid page.
+ */
+ ret = rb_validate_buffer(head_page, cpu_buffer->cpu, meta, 0, ts);
+ if (ret < 0) {
+ if (!discarded)
+ pr_info("Ring buffer meta [%d] invalid buffer page detected\n",
+ cpu_buffer->cpu);
+ discarded++;
+ } else {
+ entries += ret;
+ entry_bytes += rb_page_size(head_page);
+ if (ret > 0)
+ local_inc(&cpu_buffer->pages_touched);
+ ts = head_page->page->time_stamp;
+ }
}
if (i)
pr_info("Ring buffer [%d] rewound %d pages\n", cpu_buffer->cpu, i);
@@ -2026,6 +2044,7 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
/* Nothing more to do, the only page is the reader page */
goto done;
}
+ ts = head_page->page->time_stamp;
/* Iterate until finding the commit page */
for (i = 0; i < meta->nr_subbufs + 1; i++, rb_inc_page(&head_page)) {
@@ -2034,15 +2053,12 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
if (head_page == cpu_buffer->reader_page)
continue;
- ret = rb_validate_buffer(head_page->page, cpu_buffer->cpu, meta);
+ ret = rb_validate_buffer(head_page, cpu_buffer->cpu, meta, ts, 0);
if (ret < 0) {
if (!discarded)
pr_info("Ring buffer meta [%d] invalid buffer page detected\n",
cpu_buffer->cpu);
discarded++;
- /* Instead of discard whole ring buffer, discard only this sub-buffer. */
- local_set(&head_page->entries, 0);
- local_set(&head_page->page->commit, 0);
} else {
/* If the buffer has content, update pages_touched */
if (ret)
@@ -2050,7 +2066,7 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
entries += ret;
entry_bytes += rb_page_size(head_page);
- local_set(&head_page->entries, ret);
+ ts = head_page->page->time_stamp;
}
if (head_page == cpu_buffer->commit_page)
break;
@@ -2083,7 +2099,7 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
/* Reset all the subbuffers */
for (i = 0; i < meta->nr_subbufs - 1; i++, rb_inc_page(&head_page)) {
local_set(&head_page->entries, 0);
- local_set(&head_page->page->commit, 0);
+ rb_init_page(head_page->page);
}
}
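
The timestamp window check that this patch adds to rb_validate_buffer() can be sketched in userspace roughly as follows. This is a minimal illustration, not the kernel code: struct demo_page, demo_validate() and demo_validate_forward() are hypothetical names, and event parsing is simulated by the "readable" and "nr_events" fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a sub-buffer; not the kernel's struct buffer_page. */
struct demo_page {
	uint64_t time_stamp;	/* timestamp stored in the page header */
	unsigned int commit;	/* bytes committed to the page */
	int entries;		/* recovered event count */
	int readable;		/* 0 simulates a payload that fails to parse */
	int nr_events;		/* events a successful parse would return */
};

/*
 * Validate one page against an optional time window. A bound of 0 means
 * "no bound on that side". Returns the event count, or -1 after resetting
 * the page so that it no longer breaks the ordering of its neighbours.
 */
static int demo_validate(struct demo_page *p, uint64_t prev_ts, uint64_t next_ts)
{
	int ret = p->readable ? p->nr_events : -1;
	uint64_t ts = p->time_stamp;

	if (ret < 0 || (prev_ts && prev_ts > ts) || (next_ts && ts > next_ts)) {
		p->entries = 0;
		p->commit = 0;
		p->time_stamp = prev_ts ? prev_ts : next_ts;
		return -1;
	}
	p->entries = ret;
	return ret;
}

/*
 * Walk pages oldest to newest. Each page must not be older than the last
 * valid page; invalid pages are skipped and counted instead of aborting.
 */
static int demo_validate_forward(struct demo_page *pages, size_t n, int *discarded)
{
	uint64_t ts;
	int entries = 0;
	size_t i;

	assert(n > 0);
	ts = pages[0].time_stamp;
	for (i = 0; i < n; i++) {
		int ret = demo_validate(&pages[i], ts, 0);

		if (ret < 0) {
			(*discarded)++;
			continue;
		}
		entries += ret;
		ts = pages[i].time_stamp;	/* only valid pages advance the bound */
	}
	return entries;
}
```

Only valid pages advance the running bound, so a single corrupted timestamp cannot cause every later page to be rejected.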
^ permalink raw reply related
* [PATCH v17 2/5] ring-buffer: Skip invalid sub-buffers when validating persistent ring buffer
From: Masami Hiramatsu (Google) @ 2026-04-22 16:16 UTC (permalink / raw)
To: Steven Rostedt, Catalin Marinas, Will Deacon
Cc: Masami Hiramatsu, Mathieu Desnoyers, linux-kernel,
linux-trace-kernel, Ian Rogers, linux-arm-kernel
In-Reply-To: <177687458572.932171.10907864814735342737.stgit@mhiramat.tok.corp.google.com>
From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Skip invalid sub-buffers when validating the persistent ring buffer
instead of discarding the entire ring buffer. Only skipped buffers
are invalidated (cleared).
If the cache data in memory fails to be synchronized during a reboot,
the persistent ring buffer may become partially corrupted, but other
sub-buffers may still contain readable event data. Only discard the
sub-buffers that are found to be corrupted.
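
As a rough illustration of the approach described above, the following userspace sketch resets only those sub-buffers whose commit value is implausible and keeps the rest. All demo_* names are hypothetical, the flag mask and sub-buffer size are assumptions made for this sketch, and the real validation also parses the events in each sub-buffer, which is omitted here.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed for this sketch: the top two bits of the commit word are flags. */
#define DEMO_MISSED_MASK	(3UL << 30)
#define DEMO_SUBBUF_SIZE	4096UL

struct demo_subbuf {
	unsigned long commit;	/* committed bytes, possibly with flag bits set */
	int entries;
};

/* Size is what was committed, with the flag bits stripped. */
static unsigned long demo_page_size(const struct demo_subbuf *sb)
{
	return sb->commit & ~DEMO_MISSED_MASK;
}

/* Returns 0 if the commit value is plausible, -1 otherwise. */
static int demo_check_commit(const struct demo_subbuf *sb)
{
	return demo_page_size(sb) > DEMO_SUBBUF_SIZE ? -1 : 0;
}

/*
 * Reset only the bad sub-buffers and report how many were discarded.
 * Returns the total number of committed bytes in the surviving ones.
 */
static unsigned long demo_recover(struct demo_subbuf *sb, size_t n, int *discarded)
{
	unsigned long bytes = 0;
	size_t i;

	assert(discarded != NULL);
	*discarded = 0;
	for (i = 0; i < n; i++) {
		if (demo_check_commit(&sb[i]) < 0) {
			sb[i].commit = 0;
			sb[i].entries = 0;
			(*discarded)++;
			continue;
		}
		bytes += demo_page_size(&sb[i]);
	}
	return bytes;
}
```

Masking before the size comparison matters: a sub-buffer with a flag bit set would otherwise look far larger than the sub-buffer size and be discarded even though its data is intact.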
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
Changes in v17:
- Fix to use rb_page_size() of rewound pages for entry_bytes.
Changes in v15:
- Skip reader_page loop check on persistent ring buffer because
there can be contiguous empty (invalidated) pages.
- Do not show discarded page number information if it is 0.
Changes in v11:
- Fix a typo.
Changes in v9:
- Add meta->subbuf_size check.
- Fix a typo.
- Handle invalid reader_page case.
Changes in v8:
- Add comment in rb_validate_buffer()
- Clear the RB_MISSED_* flags in rb_validate_buffer() instead of
skipping subbuf.
- Remove unused subbuf local variable from rb_cpu_meta_valid().
Changes in v7:
- Combined with Handling RB_MISSED_* flags patch, focus on validation at boot.
- Remove checking subbuffer data when validating metadata, because it should be done
later.
- Do not mark the discarded sub buffer page but just reset it.
Changes in v6:
- Show invalid page detection message once per CPU.
Changes in v5:
- Instead of showing errors for each page, just show the number
of discarded pages at the end.
Changes in v3:
- Record missed data event on commit.
---
kernel/trace/ring_buffer.c | 111 ++++++++++++++++++++++++++------------------
1 file changed, 66 insertions(+), 45 deletions(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index b5ed4c72643e..26507b93cf40 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -370,6 +370,12 @@ static __always_inline unsigned int rb_page_commit(struct buffer_page *bpage)
return local_read(&bpage->page->commit);
}
+/* Size is determined by what has been committed */
+static __always_inline unsigned int rb_page_size(struct buffer_page *bpage)
+{
+ return rb_page_commit(bpage) & ~RB_MISSED_MASK;
+}
+
static void free_buffer_page(struct buffer_page *bpage)
{
/* Range pages are not to be freed */
@@ -1762,7 +1768,6 @@ static bool rb_cpu_meta_valid(struct ring_buffer_cpu_meta *meta, int cpu,
unsigned long *subbuf_mask)
{
int subbuf_size = PAGE_SIZE;
- struct buffer_data_page *subbuf;
unsigned long buffers_start;
unsigned long buffers_end;
int i;
@@ -1770,6 +1775,11 @@ static bool rb_cpu_meta_valid(struct ring_buffer_cpu_meta *meta, int cpu,
if (!subbuf_mask)
return false;
+ if (meta->subbuf_size != PAGE_SIZE) {
+ pr_info("Ring buffer boot meta [%d] invalid subbuf_size\n", cpu);
+ return false;
+ }
+
buffers_start = meta->first_buffer;
buffers_end = meta->first_buffer + (subbuf_size * meta->nr_subbufs);
@@ -1786,11 +1796,12 @@ static bool rb_cpu_meta_valid(struct ring_buffer_cpu_meta *meta, int cpu,
return false;
}
- subbuf = rb_subbufs_from_meta(meta);
-
bitmap_clear(subbuf_mask, 0, meta->nr_subbufs);
- /* Is the meta buffers and the subbufs themselves have correct data? */
+ /*
+ * Ensure the meta::buffers array has correct data. The data in each subbufs
+ * are checked later in rb_meta_validate_events().
+ */
for (i = 0; i < meta->nr_subbufs; i++) {
if (meta->buffers[i] < 0 ||
meta->buffers[i] >= meta->nr_subbufs) {
@@ -1798,18 +1809,12 @@ static bool rb_cpu_meta_valid(struct ring_buffer_cpu_meta *meta, int cpu,
return false;
}
- if ((unsigned)local_read(&subbuf->commit) > subbuf_size) {
- pr_info("Ring buffer boot meta [%d] buffer invalid commit\n", cpu);
- return false;
- }
-
if (test_bit(meta->buffers[i], subbuf_mask)) {
pr_info("Ring buffer boot meta [%d] array has duplicates\n", cpu);
return false;
}
set_bit(meta->buffers[i], subbuf_mask);
- subbuf = (void *)subbuf + subbuf_size;
}
return true;
@@ -1873,13 +1878,22 @@ static int rb_read_data_buffer(struct buffer_data_page *dpage, int tail, int cpu
return events;
}
-static int rb_validate_buffer(struct buffer_data_page *dpage, int cpu)
+static int rb_validate_buffer(struct buffer_data_page *dpage, int cpu,
+ struct ring_buffer_cpu_meta *meta)
{
unsigned long long ts;
+ unsigned long tail;
u64 delta;
- int tail;
- tail = local_read(&dpage->commit);
+ /*
+ * When a sub-buffer is recovered from a read, the commit value may
+ * have RB_MISSED_* bits set, as these bits are reset on reuse.
+ * Even after clearing these bits, a commit value greater than the
+ * subbuf_size is considered invalid.
+ */
+ tail = local_read(&dpage->commit) & ~RB_MISSED_MASK;
+ if (tail > meta->subbuf_size)
+ return -1;
return rb_read_data_buffer(dpage, tail, cpu, &ts, &delta);
}
@@ -1890,6 +1904,7 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
struct buffer_page *head_page, *orig_head;
unsigned long entry_bytes = 0;
unsigned long entries = 0;
+ int discarded = 0;
int ret;
u64 ts;
int i;
@@ -1900,14 +1915,19 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
orig_head = head_page = cpu_buffer->head_page;
/* Do the reader page first */
- ret = rb_validate_buffer(cpu_buffer->reader_page->page, cpu_buffer->cpu);
+ ret = rb_validate_buffer(cpu_buffer->reader_page->page, cpu_buffer->cpu, meta);
if (ret < 0) {
- pr_info("Ring buffer reader page is invalid\n");
- goto invalid;
+ pr_info("Ring buffer meta [%d] invalid reader page detected\n",
+ cpu_buffer->cpu);
+ discarded++;
+ /* Instead of discard whole ring buffer, discard only this sub-buffer. */
+ local_set(&cpu_buffer->reader_page->entries, 0);
+ local_set(&cpu_buffer->reader_page->page->commit, 0);
+ } else {
+ entries += ret;
+ entry_bytes += rb_page_size(cpu_buffer->reader_page);
+ local_set(&cpu_buffer->reader_page->entries, ret);
}
- entries += ret;
- entry_bytes += local_read(&cpu_buffer->reader_page->page->commit);
- local_set(&cpu_buffer->reader_page->entries, ret);
ts = head_page->page->time_stamp;
@@ -1935,7 +1955,7 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
break;
/* Stop rewind if the page is invalid. */
- ret = rb_validate_buffer(head_page->page, cpu_buffer->cpu);
+ ret = rb_validate_buffer(head_page->page, cpu_buffer->cpu, meta);
if (ret < 0)
break;
@@ -1944,7 +1964,7 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
if (ret)
local_inc(&cpu_buffer->pages_touched);
entries += ret;
- entry_bytes += rb_page_commit(head_page);
+ entry_bytes += rb_page_size(head_page);
}
if (i)
pr_info("Ring buffer [%d] rewound %d pages\n", cpu_buffer->cpu, i);
@@ -2014,21 +2034,24 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
if (head_page == cpu_buffer->reader_page)
continue;
- ret = rb_validate_buffer(head_page->page, cpu_buffer->cpu);
+ ret = rb_validate_buffer(head_page->page, cpu_buffer->cpu, meta);
if (ret < 0) {
- pr_info("Ring buffer meta [%d] invalid buffer page\n",
- cpu_buffer->cpu);
- goto invalid;
- }
-
- /* If the buffer has content, update pages_touched */
- if (ret)
- local_inc(&cpu_buffer->pages_touched);
-
- entries += ret;
- entry_bytes += local_read(&head_page->page->commit);
- local_set(&head_page->entries, ret);
+ if (!discarded)
+ pr_info("Ring buffer meta [%d] invalid buffer page detected\n",
+ cpu_buffer->cpu);
+ discarded++;
+ /* Instead of discard whole ring buffer, discard only this sub-buffer. */
+ local_set(&head_page->entries, 0);
+ local_set(&head_page->page->commit, 0);
+ } else {
+ /* If the buffer has content, update pages_touched */
+ if (ret)
+ local_inc(&cpu_buffer->pages_touched);
+ entries += ret;
+ entry_bytes += rb_page_size(head_page);
+ local_set(&head_page->entries, ret);
+ }
if (head_page == cpu_buffer->commit_page)
break;
}
@@ -2042,7 +2065,10 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
local_set(&cpu_buffer->entries, entries);
local_set(&cpu_buffer->entries_bytes, entry_bytes);
- pr_info("Ring buffer meta [%d] is from previous boot!\n", cpu_buffer->cpu);
+ pr_info("Ring buffer meta [%d] is from previous boot!", cpu_buffer->cpu);
+ if (discarded)
+ pr_cont(" (%d pages discarded)", discarded);
+ pr_cont("\n");
return;
invalid:
@@ -3329,12 +3355,6 @@ rb_iter_head_event(struct ring_buffer_iter *iter)
return NULL;
}
-/* Size is determined by what has been committed */
-static __always_inline unsigned rb_page_size(struct buffer_page *bpage)
-{
- return rb_page_commit(bpage) & ~RB_MISSED_MASK;
-}
-
static __always_inline unsigned
rb_commit_index(struct ring_buffer_per_cpu *cpu_buffer)
{
@@ -5647,11 +5667,12 @@ __rb_get_reader_page(struct ring_buffer_per_cpu *cpu_buffer)
again:
/*
* This should normally only loop twice. But because the
- * start of the reader inserts an empty page, it causes
- * a case where we will loop three times. There should be no
- * reason to loop four times (that I know of).
+ * start of the reader inserts an empty page, it causes a
+ * case where we will loop three times. There should be no
+ * reason to loop four times unless the ring buffer is a
+ * recovered persistent ring buffer.
*/
- if (RB_WARN_ON(cpu_buffer, ++nr_loops > 3)) {
+ if (RB_WARN_ON(cpu_buffer, ++nr_loops > 3 && !cpu_buffer->ring_meta)) {
reader = NULL;
goto out;
}
^ permalink raw reply related
* [PATCH v17 1/5] ring-buffer: Flush and stop persistent ring buffer on panic
From: Masami Hiramatsu (Google) @ 2026-04-22 16:16 UTC (permalink / raw)
To: Steven Rostedt, Catalin Marinas, Will Deacon
Cc: Masami Hiramatsu, Mathieu Desnoyers, linux-kernel,
linux-trace-kernel, Ian Rogers, linux-arm-kernel
In-Reply-To: <177687458572.932171.10907864814735342737.stgit@mhiramat.tok.corp.google.com>
From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
On real hardware, panic and machine reboot may not flush hardware cache
to memory. This means the persistent ring buffer, which relies on a
coherent state of memory, may not have its events written to the buffer
and they may be lost. Moreover, the counters used to validate the
integrity of the persistent ring buffer may become inconsistent, which
may cause all data to be discarded.
To avoid this issue, stop recording of the ring buffer on panic and
flush the cache of the ring buffer's memory.
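
The mechanism can be sketched in userspace as a callback chain in which each persistent buffer embeds its own notifier and recovers itself through container_of-style pointer arithmetic. This is only an illustration under stated assumptions: the demo_* types and functions are hypothetical stand-ins for the kernel's notifier chain, and the cache flush is simulated by setting a flag.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for a notifier block; only the shape matters here. */
struct demo_notifier {
	int (*call)(struct demo_notifier *nb, unsigned long event, void *data);
	struct demo_notifier *next;
};

#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct demo_buffer {
	int recording;			/* 1 while writers may add events */
	int flushed;			/* set when the simulated flush ran */
	unsigned long range_start;	/* non-zero only for persistent buffers */
	unsigned long range_end;
	struct demo_notifier flush_nb;	/* embedded so the callback can find us */
};

/* Stop recording first, then flush, so nothing is written after the flush. */
static int demo_flush_cb(struct demo_notifier *nb, unsigned long event, void *data)
{
	struct demo_buffer *b = demo_container_of(nb, struct demo_buffer, flush_nb);

	(void)event;
	(void)data;
	b->recording = 0;
	b->flushed = 1;
	return 0;
}

/* Only buffers backed by a fixed memory range need the callback. */
static void demo_register(struct demo_notifier **head, struct demo_buffer *b)
{
	if (!b->range_start || !b->range_end)
		return;
	b->flush_nb.call = demo_flush_cb;
	b->flush_nb.next = *head;
	*head = &b->flush_nb;
}

/* Simulated panic path: run every registered callback once. */
static void demo_panic(struct demo_notifier *head)
{
	for (; head; head = head->next) {
		assert(head->call != NULL);
		head->call(head, 0, NULL);
	}
}
```

Embedding the notifier in the buffer avoids a separate lookup table: the callback receives only the notifier pointer and derives the owning buffer from it.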
Fixes: e645535a954a ("tracing: Add option to use memmapped memory for trace boot instance")
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
---
Changes in v13:
- Fix a rebase conflict.
Changes in v11:
- Do nothing by default, since flush_cache_vmap() does nothing on x86
and can cause a deadlock on some architectures via on_each_cpu(),
because other CPUs will be stopped when the panic notifier is called.
Changes in v9:
- Fix typo of & to &&.
- Fix typo of "Generic"
Changes in v6:
- Introduce asm/ring_buffer.h for arch_ring_buffer_flush_range().
- Use flush_cache_vmap() instead of flush_cache_all().
Changes in v5:
- Use ring_buffer_record_off() instead of ring_buffer_record_disable().
- Use flush_cache_all() to ensure flush all cache.
Changes in v3:
- update patch description.
---
arch/alpha/include/asm/Kbuild | 1 +
arch/arc/include/asm/Kbuild | 1 +
arch/arm/include/asm/Kbuild | 1 +
arch/arm64/include/asm/ring_buffer.h | 10 ++++++++++
arch/csky/include/asm/Kbuild | 1 +
arch/hexagon/include/asm/Kbuild | 1 +
arch/loongarch/include/asm/Kbuild | 1 +
arch/m68k/include/asm/Kbuild | 1 +
arch/microblaze/include/asm/Kbuild | 1 +
arch/mips/include/asm/Kbuild | 1 +
arch/nios2/include/asm/Kbuild | 1 +
arch/openrisc/include/asm/Kbuild | 1 +
arch/parisc/include/asm/Kbuild | 1 +
arch/powerpc/include/asm/Kbuild | 1 +
arch/riscv/include/asm/Kbuild | 1 +
arch/s390/include/asm/Kbuild | 1 +
arch/sh/include/asm/Kbuild | 1 +
arch/sparc/include/asm/Kbuild | 1 +
arch/um/include/asm/Kbuild | 1 +
arch/x86/include/asm/Kbuild | 1 +
arch/xtensa/include/asm/Kbuild | 1 +
include/asm-generic/ring_buffer.h | 13 +++++++++++++
kernel/trace/ring_buffer.c | 22 ++++++++++++++++++++++
23 files changed, 65 insertions(+)
create mode 100644 arch/arm64/include/asm/ring_buffer.h
create mode 100644 include/asm-generic/ring_buffer.h
diff --git a/arch/alpha/include/asm/Kbuild b/arch/alpha/include/asm/Kbuild
index 483965c5a4de..b154b4e3dfa8 100644
--- a/arch/alpha/include/asm/Kbuild
+++ b/arch/alpha/include/asm/Kbuild
@@ -5,4 +5,5 @@ generic-y += agp.h
generic-y += asm-offsets.h
generic-y += kvm_para.h
generic-y += mcs_spinlock.h
+generic-y += ring_buffer.h
generic-y += text-patching.h
diff --git a/arch/arc/include/asm/Kbuild b/arch/arc/include/asm/Kbuild
index 4c69522e0328..483caacc6988 100644
--- a/arch/arc/include/asm/Kbuild
+++ b/arch/arc/include/asm/Kbuild
@@ -5,5 +5,6 @@ generic-y += extable.h
generic-y += kvm_para.h
generic-y += mcs_spinlock.h
generic-y += parport.h
+generic-y += ring_buffer.h
generic-y += user.h
generic-y += text-patching.h
diff --git a/arch/arm/include/asm/Kbuild b/arch/arm/include/asm/Kbuild
index 03657ff8fbe3..decad5f2c826 100644
--- a/arch/arm/include/asm/Kbuild
+++ b/arch/arm/include/asm/Kbuild
@@ -3,6 +3,7 @@ generic-y += early_ioremap.h
generic-y += extable.h
generic-y += flat.h
generic-y += parport.h
+generic-y += ring_buffer.h
generated-y += mach-types.h
generated-y += unistd-nr.h
diff --git a/arch/arm64/include/asm/ring_buffer.h b/arch/arm64/include/asm/ring_buffer.h
new file mode 100644
index 000000000000..62316c406888
--- /dev/null
+++ b/arch/arm64/include/asm/ring_buffer.h
@@ -0,0 +1,10 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef _ASM_ARM64_RING_BUFFER_H
+#define _ASM_ARM64_RING_BUFFER_H
+
+#include <asm/cacheflush.h>
+
+/* Flush D-cache on persistent ring buffer */
+#define arch_ring_buffer_flush_range(start, end) dcache_clean_pop(start, end)
+
+#endif /* _ASM_ARM64_RING_BUFFER_H */
diff --git a/arch/csky/include/asm/Kbuild b/arch/csky/include/asm/Kbuild
index 3a5c7f6e5aac..7dca0c6cdc84 100644
--- a/arch/csky/include/asm/Kbuild
+++ b/arch/csky/include/asm/Kbuild
@@ -9,6 +9,7 @@ generic-y += qrwlock.h
generic-y += qrwlock_types.h
generic-y += qspinlock.h
generic-y += parport.h
+generic-y += ring_buffer.h
generic-y += user.h
generic-y += vmlinux.lds.h
generic-y += text-patching.h
diff --git a/arch/hexagon/include/asm/Kbuild b/arch/hexagon/include/asm/Kbuild
index 1efa1e993d4b..0f887d4238ed 100644
--- a/arch/hexagon/include/asm/Kbuild
+++ b/arch/hexagon/include/asm/Kbuild
@@ -5,4 +5,5 @@ generic-y += extable.h
generic-y += iomap.h
generic-y += kvm_para.h
generic-y += mcs_spinlock.h
+generic-y += ring_buffer.h
generic-y += text-patching.h
diff --git a/arch/loongarch/include/asm/Kbuild b/arch/loongarch/include/asm/Kbuild
index 9034b583a88a..7e92957baf6a 100644
--- a/arch/loongarch/include/asm/Kbuild
+++ b/arch/loongarch/include/asm/Kbuild
@@ -10,5 +10,6 @@ generic-y += qrwlock.h
generic-y += user.h
generic-y += ioctl.h
generic-y += mmzone.h
+generic-y += ring_buffer.h
generic-y += statfs.h
generic-y += text-patching.h
diff --git a/arch/m68k/include/asm/Kbuild b/arch/m68k/include/asm/Kbuild
index b282e0dd8dc1..62543bf305ff 100644
--- a/arch/m68k/include/asm/Kbuild
+++ b/arch/m68k/include/asm/Kbuild
@@ -3,5 +3,6 @@ generated-y += syscall_table.h
generic-y += extable.h
generic-y += kvm_para.h
generic-y += mcs_spinlock.h
+generic-y += ring_buffer.h
generic-y += spinlock.h
generic-y += text-patching.h
diff --git a/arch/microblaze/include/asm/Kbuild b/arch/microblaze/include/asm/Kbuild
index 7178f990e8b3..0030309b47ad 100644
--- a/arch/microblaze/include/asm/Kbuild
+++ b/arch/microblaze/include/asm/Kbuild
@@ -5,6 +5,7 @@ generic-y += extable.h
generic-y += kvm_para.h
generic-y += mcs_spinlock.h
generic-y += parport.h
+generic-y += ring_buffer.h
generic-y += syscalls.h
generic-y += tlb.h
generic-y += user.h
diff --git a/arch/mips/include/asm/Kbuild b/arch/mips/include/asm/Kbuild
index 684569b2ecd6..9771c3d85074 100644
--- a/arch/mips/include/asm/Kbuild
+++ b/arch/mips/include/asm/Kbuild
@@ -12,5 +12,6 @@ generic-y += mcs_spinlock.h
generic-y += parport.h
generic-y += qrwlock.h
generic-y += qspinlock.h
+generic-y += ring_buffer.h
generic-y += user.h
generic-y += text-patching.h
diff --git a/arch/nios2/include/asm/Kbuild b/arch/nios2/include/asm/Kbuild
index 28004301c236..0a2530964413 100644
--- a/arch/nios2/include/asm/Kbuild
+++ b/arch/nios2/include/asm/Kbuild
@@ -5,6 +5,7 @@ generic-y += cmpxchg.h
generic-y += extable.h
generic-y += kvm_para.h
generic-y += mcs_spinlock.h
+generic-y += ring_buffer.h
generic-y += spinlock.h
generic-y += user.h
generic-y += text-patching.h
diff --git a/arch/openrisc/include/asm/Kbuild b/arch/openrisc/include/asm/Kbuild
index cef49d60d74c..8aa34621702d 100644
--- a/arch/openrisc/include/asm/Kbuild
+++ b/arch/openrisc/include/asm/Kbuild
@@ -8,4 +8,5 @@ generic-y += spinlock_types.h
generic-y += spinlock.h
generic-y += qrwlock_types.h
generic-y += qrwlock.h
+generic-y += ring_buffer.h
generic-y += user.h
diff --git a/arch/parisc/include/asm/Kbuild b/arch/parisc/include/asm/Kbuild
index 4fb596d94c89..d48d158f7241 100644
--- a/arch/parisc/include/asm/Kbuild
+++ b/arch/parisc/include/asm/Kbuild
@@ -4,4 +4,5 @@ generated-y += syscall_table_64.h
generic-y += agp.h
generic-y += kvm_para.h
generic-y += mcs_spinlock.h
+generic-y += ring_buffer.h
generic-y += user.h
diff --git a/arch/powerpc/include/asm/Kbuild b/arch/powerpc/include/asm/Kbuild
index 2e23533b67e3..805b5aeebb6f 100644
--- a/arch/powerpc/include/asm/Kbuild
+++ b/arch/powerpc/include/asm/Kbuild
@@ -5,4 +5,5 @@ generated-y += syscall_table_spu.h
generic-y += agp.h
generic-y += mcs_spinlock.h
generic-y += qrwlock.h
+generic-y += ring_buffer.h
generic-y += early_ioremap.h
diff --git a/arch/riscv/include/asm/Kbuild b/arch/riscv/include/asm/Kbuild
index bd5fc9403295..7721b63642f4 100644
--- a/arch/riscv/include/asm/Kbuild
+++ b/arch/riscv/include/asm/Kbuild
@@ -14,5 +14,6 @@ generic-y += ticket_spinlock.h
generic-y += qrwlock.h
generic-y += qrwlock_types.h
generic-y += qspinlock.h
+generic-y += ring_buffer.h
generic-y += user.h
generic-y += vmlinux.lds.h
diff --git a/arch/s390/include/asm/Kbuild b/arch/s390/include/asm/Kbuild
index 80bad7de7a04..0c1fc47c3ba0 100644
--- a/arch/s390/include/asm/Kbuild
+++ b/arch/s390/include/asm/Kbuild
@@ -7,3 +7,4 @@ generated-y += unistd_nr.h
generic-y += asm-offsets.h
generic-y += mcs_spinlock.h
generic-y += mmzone.h
+generic-y += ring_buffer.h
diff --git a/arch/sh/include/asm/Kbuild b/arch/sh/include/asm/Kbuild
index 4d3f10ed8275..f0403d3ee8ab 100644
--- a/arch/sh/include/asm/Kbuild
+++ b/arch/sh/include/asm/Kbuild
@@ -3,4 +3,5 @@ generated-y += syscall_table.h
generic-y += kvm_para.h
generic-y += mcs_spinlock.h
generic-y += parport.h
+generic-y += ring_buffer.h
generic-y += text-patching.h
diff --git a/arch/sparc/include/asm/Kbuild b/arch/sparc/include/asm/Kbuild
index 17ee8a273aa6..49c6bb326b75 100644
--- a/arch/sparc/include/asm/Kbuild
+++ b/arch/sparc/include/asm/Kbuild
@@ -4,4 +4,5 @@ generated-y += syscall_table_64.h
generic-y += agp.h
generic-y += kvm_para.h
generic-y += mcs_spinlock.h
+generic-y += ring_buffer.h
generic-y += text-patching.h
diff --git a/arch/um/include/asm/Kbuild b/arch/um/include/asm/Kbuild
index 1b9b82bbe322..2a1629ba8140 100644
--- a/arch/um/include/asm/Kbuild
+++ b/arch/um/include/asm/Kbuild
@@ -17,6 +17,7 @@ generic-y += module.lds.h
generic-y += parport.h
generic-y += percpu.h
generic-y += preempt.h
+generic-y += ring_buffer.h
generic-y += runtime-const.h
generic-y += softirq_stack.h
generic-y += switch_to.h
diff --git a/arch/x86/include/asm/Kbuild b/arch/x86/include/asm/Kbuild
index 4566000e15c4..078fd2c0d69d 100644
--- a/arch/x86/include/asm/Kbuild
+++ b/arch/x86/include/asm/Kbuild
@@ -14,3 +14,4 @@ generic-y += early_ioremap.h
generic-y += fprobe.h
generic-y += mcs_spinlock.h
generic-y += mmzone.h
+generic-y += ring_buffer.h
diff --git a/arch/xtensa/include/asm/Kbuild b/arch/xtensa/include/asm/Kbuild
index 13fe45dea296..e57af619263a 100644
--- a/arch/xtensa/include/asm/Kbuild
+++ b/arch/xtensa/include/asm/Kbuild
@@ -6,5 +6,6 @@ generic-y += mcs_spinlock.h
generic-y += parport.h
generic-y += qrwlock.h
generic-y += qspinlock.h
+generic-y += ring_buffer.h
generic-y += user.h
generic-y += text-patching.h
diff --git a/include/asm-generic/ring_buffer.h b/include/asm-generic/ring_buffer.h
new file mode 100644
index 000000000000..201d2aee1005
--- /dev/null
+++ b/include/asm-generic/ring_buffer.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Generic arch dependent ring_buffer macros.
+ */
+#ifndef __ASM_GENERIC_RING_BUFFER_H__
+#define __ASM_GENERIC_RING_BUFFER_H__
+
+#include <linux/cacheflush.h>
+
+/* Flush cache on ring buffer range if needed. Do nothing by default. */
+#define arch_ring_buffer_flush_range(start, end) do { } while (0)
+
+#endif /* __ASM_GENERIC_RING_BUFFER_H__ */
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index cef49f8871d2..b5ed4c72643e 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -7,6 +7,7 @@
#include <linux/ring_buffer_types.h>
#include <linux/sched/isolation.h>
#include <linux/trace_recursion.h>
+#include <linux/panic_notifier.h>
#include <linux/trace_events.h>
#include <linux/ring_buffer.h>
#include <linux/trace_clock.h>
@@ -31,6 +32,7 @@
#include <linux/oom.h>
#include <linux/mm.h>
+#include <asm/ring_buffer.h>
#include <asm/local64.h>
#include <asm/local.h>
#include <asm/setup.h>
@@ -559,6 +561,7 @@ struct trace_buffer {
unsigned long range_addr_start;
unsigned long range_addr_end;
+ struct notifier_block flush_nb;
struct ring_buffer_meta *meta;
@@ -2520,6 +2523,16 @@ static void rb_free_cpu_buffer(struct ring_buffer_per_cpu *cpu_buffer)
kfree(cpu_buffer);
}
+/* Stop recording on a persistent buffer and flush cache if needed. */
+static int rb_flush_buffer_cb(struct notifier_block *nb, unsigned long event, void *data)
+{
+ struct trace_buffer *buffer = container_of(nb, struct trace_buffer, flush_nb);
+
+ ring_buffer_record_off(buffer);
+ arch_ring_buffer_flush_range(buffer->range_addr_start, buffer->range_addr_end);
+ return NOTIFY_DONE;
+}
+
static struct trace_buffer *alloc_buffer(unsigned long size, unsigned flags,
int order, unsigned long start,
unsigned long end,
@@ -2650,6 +2663,12 @@ static struct trace_buffer *alloc_buffer(unsigned long size, unsigned flags,
mutex_init(&buffer->mutex);
+ /* Persistent ring buffer needs to flush cache before reboot. */
+ if (start && end) {
+ buffer->flush_nb.notifier_call = rb_flush_buffer_cb;
+ atomic_notifier_chain_register(&panic_notifier_list, &buffer->flush_nb);
+ }
+
return_ptr(buffer);
fail_free_buffers:
@@ -2748,6 +2767,9 @@ ring_buffer_free(struct trace_buffer *buffer)
{
int cpu;
+ if (buffer->range_addr_start && buffer->range_addr_end)
+ atomic_notifier_chain_unregister(&panic_notifier_list, &buffer->flush_nb);
+
cpuhp_state_remove_instance(CPUHP_TRACE_RB_PREPARE, &buffer->node);
irq_work_sync(&buffer->irq_work.work);
^ permalink raw reply related
* [PATCH v17 0/5] ring-buffer: Making persistent ring buffers robust
From: Masami Hiramatsu (Google) @ 2026-04-22 16:16 UTC (permalink / raw)
To: Steven Rostedt, Catalin Marinas, Will Deacon
Cc: Masami Hiramatsu, Mathieu Desnoyers, linux-kernel,
linux-trace-kernel, Ian Rogers, linux-arm-kernel
Hi,
Here is the 17th version of improvement patches for making persistent
ring buffers robust to failures.
The previous version is here:
https://lore.kernel.org/all/177547105523.259641.14385891517704197263.stgit@mhiramat.tok.corp.google.com/
This version addresses some review comments from Sashiko[1], which
includes:
[2/5] Fix to use rb_page_size() of rewound pages for entry_bytes.
[3/5] - Fix to verify head_page at first before using its timestamp.
- Reset timestamp if the page is invalid.
[4/5] - In rb_test_inject_invalid_pages(), changed entry_bytes and
idx to unsigned long
- Added NULL checks for cpu_buffer and meta.
- In allocate_trace_buffer(), added a NULL check for tr->name
before comparing it with strcmp.
[5/5] Added NULL check for dpage in rbm_show in ring_buffer.c.
[1] https://sashiko.dev/#/patchset/177552432201.853249.5125045538812833325.stgit%40mhiramat.tok.corp.google.com
Thank you,
Masami Hiramatsu (Google) (5):
ring-buffer: Flush and stop persistent ring buffer on panic
ring-buffer: Skip invalid sub-buffers when validating persistent ring buffer
ring-buffer: Skip invalid sub-buffers when rewinding persistent ring buffer
ring-buffer: Add persistent ring buffer invalid-page inject test
ring-buffer: Show commit numbers in buffer_meta file
arch/alpha/include/asm/Kbuild | 1
arch/arc/include/asm/Kbuild | 1
arch/arm/include/asm/Kbuild | 1
arch/arm64/include/asm/ring_buffer.h | 10 +
arch/csky/include/asm/Kbuild | 1
arch/hexagon/include/asm/Kbuild | 1
arch/loongarch/include/asm/Kbuild | 1
arch/m68k/include/asm/Kbuild | 1
arch/microblaze/include/asm/Kbuild | 1
arch/mips/include/asm/Kbuild | 1
arch/nios2/include/asm/Kbuild | 1
arch/openrisc/include/asm/Kbuild | 1
arch/parisc/include/asm/Kbuild | 1
arch/powerpc/include/asm/Kbuild | 1
arch/riscv/include/asm/Kbuild | 1
arch/s390/include/asm/Kbuild | 1
arch/sh/include/asm/Kbuild | 1
arch/sparc/include/asm/Kbuild | 1
arch/um/include/asm/Kbuild | 1
arch/x86/include/asm/Kbuild | 1
arch/xtensa/include/asm/Kbuild | 1
include/asm-generic/ring_buffer.h | 13 ++
include/linux/ring_buffer.h | 1
kernel/trace/Kconfig | 34 ++++
kernel/trace/ring_buffer.c | 275 ++++++++++++++++++++++++++--------
kernel/trace/trace.c | 4
26 files changed, 290 insertions(+), 67 deletions(-)
create mode 100644 arch/arm64/include/asm/ring_buffer.h
create mode 100644 include/asm-generic/ring_buffer.h
base-commit: 6170922f137231b98fc568571befef63e1edff3f
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
^ permalink raw reply
* Re: [PATCH] ftrace: fix use-after-free of mod->name in function_stat_show()
From: Xiang Gao @ 2026-04-22 9:35 UTC (permalink / raw)
To: rostedt
Cc: mhiramat, mark.rutland, mathieu.desnoyers, linux-kernel,
linux-trace-kernel, gaoxiang17, Xiang Gao
In-Reply-To: <20260417101814.22d5c21b@fedora>
On Fri, 17 Apr 2026 10:18:14 -0400, Steven Rostedt wrote:
> Was AI used for any part of this patch? Including finding the bug? If
> so, it must be disclosed.
Yes, AI was used. Claude (claude-opus-4-7) assisted in both finding
the bug and drafting the fix. I reviewed the analysis and took
responsibility for the submission, but I should have disclosed this
up front per Documentation/process/coding-assistants.rst. I
apologize for the oversight, and I will add an
Assisted-by: Claude:claude-opus-4-7 tag in the follow-up.
> Just move guard(rcu) out of this if statement to include the below
> reference. No need to make the code worse. This really looks like
> AI slop :-(
You are right. Hoisting guard(rcu)() to the top of the
if (tr->trace_flags & TRACE_ITER(PROF_TEXT_OFFSET)) {
block so its scope covers the single snprintf() after the if/else is
the correct fix -- +1/-1, net zero, instead of duplicating snprintf()
into both branches as I did. I should have recognized this instead of
submitting the first plausible-looking approach.
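
The scoping point can be illustrated outside the kernel with the GCC/Clang cleanup attribute, which is the kind of compiler extension that scope-based guards build on. This is a hedged sketch, not the ftrace code: demo_guard(), demo_format() and the lock counter are hypothetical, and the "lock" is only a counter standing in for a read-side critical section.

```c
#include <assert.h>
#include <stdio.h>

static int demo_lock_depth;	/* > 0 while inside the simulated read section */

static void demo_read_lock(void)
{
	demo_lock_depth++;
}

static void demo_read_unlock(int *unused)
{
	(void)unused;
	demo_lock_depth--;
}

/* Scope-based guard: unlock runs automatically when the scope is left. */
#define demo_guard() \
	int demo_guard_token __attribute__((cleanup(demo_read_unlock), unused)) = \
		(demo_read_lock(), 0)

/* mod_name is assumed to be safe to dereference only while the lock is held. */
static int demo_format(char *out, size_t len, const char *mod_name, int with_offset)
{
	if (with_offset) {
		const char *shown;

		demo_guard();	/* hoisted: covers the if/else and the snprintf() */
		if (mod_name)
			shown = mod_name;
		else
			shown = "[kernel]";

		/* The single use of 'shown' is still inside the guarded scope. */
		assert(demo_lock_depth == 1);
		return snprintf(out, len, "%s+0x10", shown);
	}
	return snprintf(out, len, "symbol");
}
```

Because the guard is declared at the top of the outer block, one formatting call after the inner if/else is protected, and there is no need to duplicate it in each branch.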
I will send a follow-up patch that restores the single snprintf()
after the if/else and hoists guard(rcu)() to cover it, with the
Subject capitalized ("ftrace: Fix ...") and
Assisted-by: Claude:claude-opus-4-7 added.
Thanks for the review and for pushing back on the approach.
Xiang
^ permalink raw reply
* [PATCH v2] tracepoint: Fix typo in tracepoint.h comment
From: Sheng Che Peng @ 2026-04-22 2:18 UTC (permalink / raw)
To: rostedt, mathieu.desnoyers
Cc: linux-trace-kernel, linux-kernel, Sheng Che Peng
Change "my" to "may" in the description of subsystem configurations.
Signed-off-by: Sheng Che Peng <synte4028@gmail.com>
---
include/linux/tracepoint.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index 578e520b6ee6c..763eea4d80d87 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -202,7 +202,7 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
#define TP_CONDITION(args...) args
/*
- * Individual subsystem my have a separate configuration to
+ * Individual subsystem may have a separate configuration to
* enable their tracepoints. By default, this file will create
* the tracepoints if CONFIG_TRACEPOINTS is defined. If a subsystem
* wants to be able to disable its tracepoints from being created
--
2.34.1
^ permalink raw reply related
* Re: [PATCH net v1] net: validate skb->napi_id in RX tracepoints
From: Jiayuan Chen @ 2026-04-22 1:55 UTC (permalink / raw)
To: Kohei Enju
Cc: netdev, linux-trace-kernel, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, Steven Rostedt,
Masami Hiramatsu, Mathieu Desnoyers
In-Reply-To: <aeYTImGIRuOf72mi@x1>
On 4/20/26 7:54 PM, Kohei Enju wrote:
> On 04/20 19:27, Jiayuan Chen wrote:
>> On 4/20/26 6:54 PM, Kohei Enju wrote:
>>> Since commit 2bd82484bb4c ("xps: fix xps for stacked devices"),
>>> skb->napi_id shares storage with sender_cpu. RX tracepoints using
>>> net_dev_rx_verbose_template read skb->napi_id directly and can therefore
>>> report sender_cpu values as if they were NAPI IDs.
>>>
>>> For example, on the loopback path this can report 1 as napi_id, where 1
>> So I think veth_forward_skb->__netif_rx could be affected as well?
> Yes. Just in case, I've confirmed the same behavior in the veth path.
> The mentioned loopback path is just a single example of possibly
> affected paths.
>
> Thanks,
> Kohei
>
Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev>
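The storage sharing described in the quoted commit message can be sketched in plain C. This is an illustration under assumptions, not the patch under review: struct demo_skb, DEMO_NR_CPUS, DEMO_MIN_NAPI_ID and both demo_*() helpers are hypothetical names, and treating IDs at or below the CPU count as "not a NAPI ID" is an assumption made only for this example.

```c
#include <stdbool.h>

/* Hypothetical CPU count; IDs up to this value are assumed to be CPU-derived. */
#define DEMO_NR_CPUS		8u
#define DEMO_MIN_NAPI_ID	(DEMO_NR_CPUS + 1u)

/* Two fields sharing one word, as the commit message describes. */
struct demo_skb {
	union {
		unsigned int napi_id;
		unsigned int sender_cpu;
	};
};

/* Assumed validity rule for this sketch: real NAPI IDs start above the CPU range. */
static bool demo_napi_id_valid(unsigned int id)
{
	return id >= DEMO_MIN_NAPI_ID;
}

/* Value a tracepoint could report: the ID if plausible, otherwise 0. */
unsigned int demo_trace_napi_id(const struct demo_skb *skb)
{
	return demo_napi_id_valid(skb->napi_id) ? skb->napi_id : 0;
}
```

Reading skb->napi_id directly after a transmit path has written sender_cpu returns the CPU-derived value, so a check of this kind before reporting keeps the two meanings apart.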
^ permalink raw reply
* Re: [PATCH] kernel/trace/ftrace: introduce ftrace module notifier
From: Song Liu @ 2026-04-21 22:40 UTC (permalink / raw)
To: Song Chen
Cc: Petr Mladek, Steven Rostedt, Miroslav Benes, mcgrof, petr.pavlu,
da.gomez, samitolvanen, atomlin, mhiramat, mark.rutland,
mathieu.desnoyers, linux-modules, linux-kernel,
linux-trace-kernel, live-patching
In-Reply-To: <4037aa19-1b01-4076-b823-5cc0e43becac@189.cn>
Hi,
I am replying partly to make sure folks know there are two
people with the same first name.
On Sun, Apr 12, 2026 at 7:11 AM Song Chen <chensong_2000@189.cn> wrote:
[...]
> >
> > + We would need to make sure that it does not break some
> > existing "hidden" dependencies.
> >
> Thanks so much, this is the solution I'm working on. I replaced next
> with a list_head in notifier_block and implemented
> a notifier_call_chain_reverse to address the order issues, as you
> suggested. And a new robust revision for rolling back.
I personally don't think there is strong enough motivation to make
changes like the following. If there is indeed a strong motivation,
please make it clear in the next revision.
Thanks,
Song
^ permalink raw reply
* Re: [PATCH] kernel: trace: do not generate undefsyms_base.c
From: Nathan Chancellor @ 2026-04-21 21:51 UTC (permalink / raw)
To: Paolo Bonzini
Cc: linux-kernel, kvm, torvalds, rostedt, Marc Zyngier, Arnd Bergmann,
linux-trace-kernel
In-Reply-To: <20260421100455.324333-1-pbonzini@redhat.com>
On Tue, Apr 21, 2026 at 11:04:55AM +0100, Paolo Bonzini wrote:
> The code to autogenerate undefsyms_base.c in the Makefile is larger
> than the file itself.
>
> Remove the "echo" indirection that creates the file, which keeps
> the build system sane and makes it much easier to edit it if/when
> new situations arrive.
>
> Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Yeah, I don't really know how I did not see this originally :/ tunnel
vision is real I suppose.
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
> ---
> kernel/trace/.gitignore | 1 -
> kernel/trace/Makefile | 35 ++++-------------------------------
> kernel/trace/undefsyms_base.c | 28 ++++++++++++++++++++++++++++
> 3 files changed, 32 insertions(+), 32 deletions(-)
> delete mode 100644 kernel/trace/.gitignore
> create mode 100644 kernel/trace/undefsyms_base.c
>
> diff --git a/kernel/trace/.gitignore b/kernel/trace/.gitignore
> deleted file mode 100644
> index 6adbb09d6deb..000000000000
> --- a/kernel/trace/.gitignore
> +++ /dev/null
> @@ -1 +0,0 @@
> -/undefsyms_base.c
> diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
> index 4d4229e5eec4..1decdce8cbef 100644
> --- a/kernel/trace/Makefile
> +++ b/kernel/trace/Makefile
> @@ -133,41 +133,14 @@ obj-$(CONFIG_TRACE_REMOTE) += trace_remote.o
> obj-$(CONFIG_SIMPLE_RING_BUFFER) += simple_ring_buffer.o
> obj-$(CONFIG_TRACE_REMOTE_TEST) += remote_test.o
>
> -#
> # simple_ring_buffer is used by the pKVM hypervisor which does not have access
> # to all kernel symbols. Fail the build if forbidden symbols are found.
> -#
> -# undefsyms_base generates a set of compiler and tooling-generated symbols that can
> -# safely be ignored for simple_ring_buffer.
> -#
> -filechk_undefsyms_base = \
> - echo '$(pound)include <linux/atomic.h>'; \
> - echo '$(pound)include <linux/string.h>'; \
> - echo '$(pound)include <asm/page.h>'; \
> - echo 'static char page[PAGE_SIZE] __aligned(PAGE_SIZE);'; \
> - echo 'void undefsyms_base(void *p, int n);'; \
> - echo 'void undefsyms_base(void *p, int n) {'; \
> - echo ' char buffer[256] = { 0 };'; \
> - echo ' u32 u = 0;'; \
> - echo ' memset((char * volatile)page, 8, PAGE_SIZE);'; \
> - echo ' memset((char * volatile)buffer, 8, sizeof(buffer));'; \
> - echo ' memcpy((void * volatile)p, buffer, sizeof(buffer));'; \
> - echo ' cmpxchg((u32 * volatile)&u, 0, 8);'; \
> - echo ' WARN_ON(n == 0xdeadbeef);'; \
> - echo '}'
> -
> -$(obj)/undefsyms_base.c: FORCE
> - $(call filechk,undefsyms_base)
> -
> -clean-files += undefsyms_base.c
> -
> -$(obj)/undefsyms_base.o: $(obj)/undefsyms_base.c
>
> +# Basic compiler and tooling-generated symbols that can safely be left
> +# undefined. Ensure KASAN is enabled to avoid logic that may disable
> +# FORTIFY_SOURCE when KASAN is not enabled. undefsyms_base.o does not
> +# automatically get KASAN flags because it is not linked into vmlinux.
> targets += undefsyms_base.o
> -
> -# Ensure KASAN is enabled to avoid logic that may disable FORTIFY_SOURCE when
> -# KASAN is not enabled. undefsyms_base.o does not automatically get KASAN flags
> -# because it is not linked into vmlinux.
> KASAN_SANITIZE_undefsyms_base.o := y
>
> UNDEFINED_ALLOWLIST = __asan __gcov __kasan __kcsan __hwasan __sancov __sanitizer __tsan __ubsan __x86_indirect_thunk \
> diff --git a/kernel/trace/undefsyms_base.c b/kernel/trace/undefsyms_base.c
> new file mode 100644
> index 000000000000..e65baf58e6ff
> --- /dev/null
> +++ b/kernel/trace/undefsyms_base.c
> @@ -0,0 +1,28 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +/*
> + * simple_ring_buffer is used by the pKVM hypervisor which does not have access
> + * to all kernel symbols. Whatever is undefined when compiling this file is
> + * compiler and tooling-generated symbols that can safely be ignored for
> + * simple_ring_buffer.
> + */
> +
> +#include <linux/atomic.h>
> +#include <linux/string.h>
> +#include <asm/page.h>
> +
> +void undefsyms_base(void *p, int n);
> +
> +static char page[PAGE_SIZE] __aligned(PAGE_SIZE);
> +
> +void undefsyms_base(void *p, int n)
> +{
> + char buffer[256] = { 0 };
> +
> + u32 u = 0;
> + memset((char * volatile)page, 8, PAGE_SIZE);
> + memset((char * volatile)buffer, 8, sizeof(buffer));
> + memcpy((void * volatile)p, buffer, sizeof(buffer));
> + cmpxchg((u32 * volatile)&u, 0, 8);
> + WARN_ON(n == 0xdeadbeef);
> +}
> --
> 2.53.0
>
^ permalink raw reply