* [RFC PATCH 0/4] rv/tlob: Add task latency over budget RV monitor
@ 2026-04-12 19:27 wen.yang
2026-04-12 19:27 ` [RFC PATCH 1/4] rv/tlob: Add tlob model DOT file wen.yang
` (3 more replies)
0 siblings, 4 replies; 7+ messages in thread
From: wen.yang @ 2026-04-12 19:27 UTC (permalink / raw)
To: Steven Rostedt, Gabriele Monaco, Masami Hiramatsu,
Mathieu Desnoyers
Cc: linux-trace-kernel, linux-kernel, Wen Yang
From: Wen Yang <wen.yang@linux.dev>
This series introduces tlob (task latency over budget), a new per-task
Runtime Verification monitor.
Background
----------
The RV framework formalises kernel behavioural properties as
deterministic automata. Existing monitors (wwnr, sssw, opid, etc.) cover
scheduling and locking invariants; none tracks wall-clock latency of
a per-task code path, including off-CPU time. This property is needed
in ADAS perception/planning pipelines, industrial real-time
controllers, and similar mixed-criticality deployments.
tlob adds this capability. A caller demarcates a code path via a
start/stop pair; the kernel arms a per-task hrtimer for the requested
budget. If the task has not called TRACE_STOP before the timer fires,
a violation is recorded, the stop call returns -EOVERFLOW, and an
event is pushed to the caller's mmap ring.
The tracefs interface requires only tracefs write permissions, avoiding
the CAP_BPF privilege needed for equivalent eBPF-based approaches. The
DA model (patch 1) can be independently verified with standard model-
checking tools.
Design
------
The monitor is a three-state deterministic automaton (DA):
unmonitored --trace_start--> on_cpu
on_cpu --switch_out--> off_cpu
off_cpu --switch_in--> on_cpu
{on_cpu, off_cpu} --{trace_stop, budget_expired}--> unmonitored
Per-task state lives in a fixed-size hash table (TLOB_MAX_MONITORED
slots) with RCU-deferred free. Timing is based on CLOCK_MONOTONIC
(ktime_get()), so budgets account for off-CPU time.
Two userspace interfaces are provided:

- tracefs: uprobe pair registration via the monitor/enable files;
  no new UAPI required.

- /dev/rv ioctls (CONFIG_RV_CHARDEV):
  TLOB_IOCTL_TRACE_START — arm the budget for a target task
  TLOB_IOCTL_TRACE_STOP — disarm; returns -EOVERFLOW on violation
Each /dev/rv file descriptor has a per-fd mmap ring (a physically
contiguous control page struct tlob_mmap_page followed by an array of
struct tlob_event records). Head/tail/dropped are userspace-readable
without locking; overflow uses a drop-new policy.
New UAPI (include/uapi/linux/rv.h): tlob_start_args, tlob_event,
tlob_mmap_page, ioctl numbers (RV_IOC_MAGIC=0xB9, registered in
Documentation/userspace-api/ioctl/ioctl-number.rst).
Testing
-------
KUnit (patch 3): six suites (38 cases) gated on CONFIG_TLOB_KUNIT_TEST.
./tools/testing/kunit/kunit.py run \
--kunitconfig kernel/trace/rv/monitors/tlob/.kunitconfig
Coverage: automaton state transitions, start/stop API error paths,
scheduler context-switch accounting, tracepoint payload fields,
ring-buffer push/overflow/wakeup, and the uprobe line parser.
kselftest (patch 4): 19 TAP test points under
tools/testing/selftests/rv/. Requires CONFIG_RV_MON_TLOB=y,
CONFIG_RV_CHARDEV=y, and root.
make -C tools/testing/selftests/rv
sudo ./test_tlob.sh
Patch overview
--------------
Patch 1 — DOT model: formal automaton specification for verification.
Patch 2 — monitor implementation, UAPI, and documentation.
Patch 3 — KUnit in-kernel unit tests.
Patch 4 — kselftest user-space integration tests.
Wen Yang (4):
rv/tlob: Add tlob model DOT file
rv/tlob: Add tlob deterministic automaton monitor
rv/tlob: Add KUnit tests for the tlob monitor
selftests/rv: Add selftest for the tlob monitor
Documentation/trace/rv/index.rst | 1 +
Documentation/trace/rv/monitor_tlob.rst | 381 ++++++
.../userspace-api/ioctl/ioctl-number.rst | 1 +
MAINTAINERS | 3 +
include/uapi/linux/rv.h | 181 +++
kernel/trace/rv/Kconfig | 17 +
kernel/trace/rv/Makefile | 3 +
kernel/trace/rv/monitors/tlob/.kunitconfig | 5 +
kernel/trace/rv/monitors/tlob/Kconfig | 63 +
kernel/trace/rv/monitors/tlob/tlob.c | 987 ++++++++++++++
kernel/trace/rv/monitors/tlob/tlob.h | 145 ++
kernel/trace/rv/monitors/tlob/tlob_kunit.c | 1194 +++++++++++++++++
kernel/trace/rv/monitors/tlob/tlob_trace.h | 42 +
kernel/trace/rv/rv.c | 4 +
kernel/trace/rv/rv_dev.c | 602 +++++++++
kernel/trace/rv/rv_trace.h | 50 +
tools/include/uapi/linux/rv.h | 54 +
tools/testing/selftests/rv/Makefile | 18 +
tools/testing/selftests/rv/test_tlob.sh | 563 ++++++++
tools/testing/selftests/rv/tlob_helper.c | 994 ++++++++++++++
.../testing/selftests/rv/tlob_uprobe_target.c | 108 ++
tools/verification/models/tlob.dot | 25 +
22 files changed, 5441 insertions(+)
create mode 100644 Documentation/trace/rv/monitor_tlob.rst
create mode 100644 include/uapi/linux/rv.h
create mode 100644 kernel/trace/rv/monitors/tlob/.kunitconfig
create mode 100644 kernel/trace/rv/monitors/tlob/Kconfig
create mode 100644 kernel/trace/rv/monitors/tlob/tlob.c
create mode 100644 kernel/trace/rv/monitors/tlob/tlob.h
create mode 100644 kernel/trace/rv/monitors/tlob/tlob_kunit.c
create mode 100644 kernel/trace/rv/monitors/tlob/tlob_trace.h
create mode 100644 kernel/trace/rv/rv_dev.c
create mode 100644 tools/include/uapi/linux/rv.h
create mode 100644 tools/testing/selftests/rv/Makefile
create mode 100755 tools/testing/selftests/rv/test_tlob.sh
create mode 100644 tools/testing/selftests/rv/tlob_helper.c
create mode 100644 tools/testing/selftests/rv/tlob_uprobe_target.c
create mode 100644 tools/verification/models/tlob.dot
--
2.43.0
* [RFC PATCH 1/4] rv/tlob: Add tlob model DOT file
2026-04-12 19:27 [RFC PATCH 0/4] rv/tlob: Add task latency over budget RV monitor wen.yang
@ 2026-04-12 19:27 ` wen.yang
2026-04-13 8:19 ` Gabriele Monaco
2026-04-12 19:27 ` [RFC PATCH 2/4] rv/tlob: Add tlob deterministic automaton monitor wen.yang
` (2 subsequent siblings)
3 siblings, 1 reply; 7+ messages in thread
From: wen.yang @ 2026-04-12 19:27 UTC (permalink / raw)
To: Steven Rostedt, Gabriele Monaco, Masami Hiramatsu,
Mathieu Desnoyers
Cc: linux-trace-kernel, linux-kernel, Wen Yang
From: Wen Yang <wen.yang@linux.dev>
Add the Graphviz DOT specification for the tlob (task latency over
budget) deterministic automaton.
The model has three states: unmonitored, on_cpu, and off_cpu.
trace_start transitions from unmonitored to on_cpu; switch_out and
switch_in cycle between on_cpu and off_cpu; trace_stop and
budget_expired return to unmonitored from either active state.
unmonitored is the sole accepting state.
switch_in, switch_out, and sched_wakeup self-loop in unmonitored;
sched_wakeup self-loops in on_cpu; switch_out and sched_wakeup
self-loop in off_cpu.
Signed-off-by: Wen Yang <wen.yang@linux.dev>
---
MAINTAINERS | 3 +++
tools/verification/models/tlob.dot | 25 +++++++++++++++++++++++++
2 files changed, 28 insertions(+)
create mode 100644 tools/verification/models/tlob.dot
diff --git a/MAINTAINERS b/MAINTAINERS
index 9fbb619c6..c2c56236c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -23242,7 +23242,10 @@ S: Maintained
F: Documentation/trace/rv/
F: include/linux/rv.h
F: include/rv/
+F: include/uapi/linux/rv.h
F: kernel/trace/rv/
+F: samples/rv/
+F: tools/testing/selftests/rv/
F: tools/testing/selftests/verification/
F: tools/verification/
diff --git a/tools/verification/models/tlob.dot b/tools/verification/models/tlob.dot
new file mode 100644
index 000000000..df34a14b8
--- /dev/null
+++ b/tools/verification/models/tlob.dot
@@ -0,0 +1,25 @@
+digraph state_automaton {
+ center = true;
+ size = "7,11";
+ {node [shape = plaintext, style=invis, label=""] "__init_unmonitored"};
+ {node [shape = ellipse] "unmonitored"};
+ {node [shape = plaintext] "unmonitored"};
+ {node [shape = plaintext] "on_cpu"};
+ {node [shape = plaintext] "off_cpu"};
+ "__init_unmonitored" -> "unmonitored";
+ "unmonitored" [label = "unmonitored", color = green3];
+ "unmonitored" -> "on_cpu" [ label = "trace_start" ];
+ "unmonitored" -> "unmonitored" [ label = "switch_in\nswitch_out\nsched_wakeup" ];
+ "on_cpu" [label = "on_cpu"];
+ "on_cpu" -> "off_cpu" [ label = "switch_out" ];
+ "on_cpu" -> "unmonitored" [ label = "trace_stop\nbudget_expired" ];
+ "on_cpu" -> "on_cpu" [ label = "sched_wakeup" ];
+ "off_cpu" [label = "off_cpu"];
+ "off_cpu" -> "on_cpu" [ label = "switch_in" ];
+ "off_cpu" -> "unmonitored" [ label = "trace_stop\nbudget_expired" ];
+ "off_cpu" -> "off_cpu" [ label = "switch_out\nsched_wakeup" ];
+ { rank = min ;
+ "__init_unmonitored";
+ "unmonitored";
+ }
+}
--
2.43.0
* [RFC PATCH 2/4] rv/tlob: Add tlob deterministic automaton monitor
2026-04-12 19:27 [RFC PATCH 0/4] rv/tlob: Add task latency over budget RV monitor wen.yang
2026-04-12 19:27 ` [RFC PATCH 1/4] rv/tlob: Add tlob model DOT file wen.yang
@ 2026-04-12 19:27 ` wen.yang
2026-04-13 8:19 ` Gabriele Monaco
2026-04-12 19:27 ` [RFC PATCH 3/4] rv/tlob: Add KUnit tests for the tlob monitor wen.yang
2026-04-12 19:27 ` [RFC PATCH 4/4] selftests/rv: Add selftest " wen.yang
3 siblings, 1 reply; 7+ messages in thread
From: wen.yang @ 2026-04-12 19:27 UTC (permalink / raw)
To: Steven Rostedt, Gabriele Monaco, Masami Hiramatsu,
Mathieu Desnoyers
Cc: linux-trace-kernel, linux-kernel, Wen Yang
From: Wen Yang <wen.yang@linux.dev>
Add the tlob (task latency over budget) RV monitor. tlob tracks the
monotonic elapsed time (CLOCK_MONOTONIC) of a marked per-task code
path, including time off-CPU, and fires a per-task hrtimer when the
elapsed time exceeds a configurable budget.
Three-state DA (unmonitored/on_cpu/off_cpu) driven by trace_start,
trace_stop, switch_in/out, and budget_expired events. Per-task state
lives in a fixed-size hash table (TLOB_MAX_MONITORED slots) with
RCU-deferred free.
Two userspace interfaces:
- tracefs: uprobe pair registration via the monitor file using the
format "threshold_us:offset_start:offset_stop:binary_path"
- /dev/rv ioctls (CONFIG_RV_CHARDEV): TLOB_IOCTL_TRACE_START /
TRACE_STOP; TRACE_STOP returns -EOVERFLOW on violation
Each /dev/rv fd has a per-fd mmap ring buffer (physically contiguous
pages). A control page (struct tlob_mmap_page) at offset 0 exposes
head/tail/dropped for lockless userspace reads; struct tlob_event
records follow at data_offset. Drop-new policy on overflow.
UAPI: include/uapi/linux/rv.h (tlob_start_args, tlob_event,
tlob_mmap_page, ioctl numbers), monitor_tlob.rst,
ioctl-number.rst (RV_IOC_MAGIC=0xB9).
Signed-off-by: Wen Yang <wen.yang@linux.dev>
---
Documentation/trace/rv/index.rst | 1 +
Documentation/trace/rv/monitor_tlob.rst | 381 +++++++
.../userspace-api/ioctl/ioctl-number.rst | 1 +
include/uapi/linux/rv.h | 181 ++++
kernel/trace/rv/Kconfig | 17 +
kernel/trace/rv/Makefile | 2 +
kernel/trace/rv/monitors/tlob/Kconfig | 51 +
kernel/trace/rv/monitors/tlob/tlob.c | 986 ++++++++++++++++++
kernel/trace/rv/monitors/tlob/tlob.h | 145 +++
kernel/trace/rv/monitors/tlob/tlob_trace.h | 42 +
kernel/trace/rv/rv.c | 4 +
kernel/trace/rv/rv_dev.c | 602 +++++++++++
kernel/trace/rv/rv_trace.h | 50 +
13 files changed, 2463 insertions(+)
create mode 100644 Documentation/trace/rv/monitor_tlob.rst
create mode 100644 include/uapi/linux/rv.h
create mode 100644 kernel/trace/rv/monitors/tlob/Kconfig
create mode 100644 kernel/trace/rv/monitors/tlob/tlob.c
create mode 100644 kernel/trace/rv/monitors/tlob/tlob.h
create mode 100644 kernel/trace/rv/monitors/tlob/tlob_trace.h
create mode 100644 kernel/trace/rv/rv_dev.c
diff --git a/Documentation/trace/rv/index.rst b/Documentation/trace/rv/index.rst
index a2812ac5c..4f2bfaf38 100644
--- a/Documentation/trace/rv/index.rst
+++ b/Documentation/trace/rv/index.rst
@@ -15,3 +15,4 @@ Runtime Verification
monitor_wwnr.rst
monitor_sched.rst
monitor_rtapp.rst
+ monitor_tlob.rst
diff --git a/Documentation/trace/rv/monitor_tlob.rst b/Documentation/trace/rv/monitor_tlob.rst
new file mode 100644
index 000000000..d498e9894
--- /dev/null
+++ b/Documentation/trace/rv/monitor_tlob.rst
@@ -0,0 +1,381 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Monitor tlob
+============
+
+- Name: tlob - task latency over budget
+- Type: per-task deterministic automaton
+- Author: Wen Yang <wen.yang@linux.dev>
+
+Description
+-----------
+
+The tlob monitor tracks per-task elapsed time (CLOCK_MONOTONIC, including
+both on-CPU and off-CPU time) and reports a violation when the monitored
+task exceeds a configurable latency budget threshold.
+
+The monitor implements a three-state deterministic automaton::
+
+ |
+ | (initial)
+ v
+ +--------------+
+ +-------> | unmonitored |
+ | +--------------+
+ | |
+ | trace_start
+ | v
+ | +--------------+
+ | | on_cpu |
+ | +--------------+
+ | | |
+ | switch_out| | trace_stop / budget_expired
+ | v v
+ | +--------------+ (unmonitored)
+ | | off_cpu |
+ | +--------------+
+ | | |
+ | | switch_in| trace_stop / budget_expired
+ | v v
+ | (on_cpu) (unmonitored)
+ |
+ +-- trace_stop (from on_cpu or off_cpu)
+
+ Key transitions:
+ unmonitored --(trace_start)--> on_cpu
+ on_cpu --(switch_out)--> off_cpu
+ off_cpu --(switch_in)--> on_cpu
+ on_cpu --(trace_stop)--> unmonitored
+ off_cpu --(trace_stop)--> unmonitored
+ on_cpu --(budget_expired)-> unmonitored [violation]
+ off_cpu --(budget_expired)-> unmonitored [violation]
+
+ sched_wakeup self-loops in on_cpu and unmonitored; switch_out and
+ sched_wakeup self-loop in off_cpu. budget_expired is fired by the
+ one-shot hrtimer; it always transitions to unmonitored regardless of
+ whether the task is on-CPU or off-CPU when the timer fires.
+
+State Descriptions
+------------------
+
+- **unmonitored**: Task is not being traced. Scheduling events
+ (``switch_in``, ``switch_out``, ``sched_wakeup``) are silently
+ ignored (self-loop). The monitor waits for a ``trace_start`` event
+ to begin a new observation window.
+
+- **on_cpu**: Task is running on the CPU with the deadline timer armed.
+ A one-shot hrtimer was set for ``threshold_us`` microseconds at
+ ``trace_start`` time. A ``switch_out`` event transitions to
+ ``off_cpu``; the hrtimer keeps running (off-CPU time counts toward
+ the budget). A ``trace_stop`` cancels the timer and returns to
+ ``unmonitored`` (normal completion). If the hrtimer fires
+ (``budget_expired``) the violation is recorded and the automaton
+ transitions to ``unmonitored``.
+
+- **off_cpu**: Task was preempted or blocked. The one-shot hrtimer
+ continues to run. A ``switch_in`` event returns to ``on_cpu``.
+ A ``trace_stop`` cancels the timer and returns to ``unmonitored``.
+ If the hrtimer fires (``budget_expired``) while the task is off-CPU,
+ the violation is recorded and the automaton transitions to
+ ``unmonitored``.
+
+Rationale
+---------
+
+The per-task latency budget threshold allows operators to express timing
+requirements in microseconds and receive an immediate ftrace event when a
+task exceeds its budget. This is useful for real-time tasks
+(``SCHED_FIFO`` / ``SCHED_DEADLINE``) where total elapsed time must
+remain within a known bound.
+
+Each task has an independent threshold, so up to ``TLOB_MAX_MONITORED``
+(64) tasks with different timing requirements can be monitored
+simultaneously.
+
+On threshold violation the automaton records a ``tlob_budget_exceeded``
+ftrace event carrying the final on-CPU / off-CPU time breakdown, but does
+not kill or throttle the task. Monitoring can be restarted by issuing a
+new ``trace_start`` event (or a new ``TLOB_IOCTL_TRACE_START`` ioctl).
+
+A per-task one-shot hrtimer is armed at ``trace_start`` for exactly
+``threshold_us`` microseconds. It fires at most once per monitoring
+window, performs an O(1) hash lookup, records the violation, and injects
+the ``budget_expired`` event into the DA. When ``CONFIG_RV_MON_TLOB``
+is not set there is zero runtime cost.
+
+Usage
+-----
+
+tracefs interface (uprobe-based external monitoring)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The ``monitor`` tracefs file allows any privileged user to instrument an
+unmodified binary via uprobes, without changing its source code. Write a
+four-field record to attach two plain entry uprobes: one at
+``offset_start`` fires ``tlob_start_task()`` and one at ``offset_stop``
+fires ``tlob_stop_task()``, so the latency budget covers exactly the code
+region between the two offsets::
+
+ threshold_us:offset_start:offset_stop:binary_path
+
+``binary_path`` comes last so it may freely contain ``:`` (e.g. paths
+inside a container namespace).
+
+The uprobes fire for every task that executes the probed instruction in
+the binary, consistent with the native uprobe semantics. All tasks that
+execute the code region get independent per-task monitoring slots.
+
+Using two plain entry uprobes (rather than a uretprobe for the stop) means
+that a mistyped offset can never corrupt the call stack; the worst outcome
+of a bad ``offset_stop`` is a missed stop that causes the hrtimer to fire
+and report a budget violation.
+
+Example -- monitor a code region in ``/usr/bin/myapp`` with a 5 ms
+budget, where the region starts at offset 0x12a0 and ends at 0x12f0::
+
+ echo 1 > /sys/kernel/tracing/rv/monitors/tlob/enable
+
+ # Bind uprobes: start probe starts the clock, stop probe stops it
+ echo "5000:0x12a0:0x12f0:/usr/bin/myapp" \
+ > /sys/kernel/tracing/rv/monitors/tlob/monitor
+
+ # Remove the uprobe binding for this code region
+ echo "-0x12a0:/usr/bin/myapp" > /sys/kernel/tracing/rv/monitors/tlob/monitor
+
+ # List registered uprobe bindings (mirrors the write format)
+ cat /sys/kernel/tracing/rv/monitors/tlob/monitor
+ # -> 5000:0x12a0:0x12f0:/usr/bin/myapp
+
+ # Read violations from the trace buffer
+ cat /sys/kernel/tracing/trace
+
+Up to ``TLOB_MAX_MONITORED`` tasks may be monitored simultaneously.
+
+The offsets can be obtained with ``nm`` or ``readelf``::
+
+ nm -n /usr/bin/myapp | grep my_function
+ # -> 00000000000012a0 T my_function
+
+ readelf -s /usr/bin/myapp | grep my_function
+ # -> 42: 00000000000012a0 336 FUNC GLOBAL DEFAULT 13 my_function
+
+ # offset_start = 0x12a0 (function entry)
+ # offset_stop = 0x12a0 + 0x50 = 0x12f0 (or any instruction before return)
+
+Notes:
+
+- The uprobes fire for every task that executes the probed instruction,
+ so concurrent calls from different threads each get independent
+ monitoring slots.
+- ``offset_stop`` need not be a function return; it can be any instruction
+ within the region. If the stop probe is never reached (e.g. early exit
+ path bypasses it), the hrtimer fires and a budget violation is reported.
+- Each ``(binary_path, offset_start)`` pair may only be registered once.
+ A second write with the same ``offset_start`` for the same binary is
+ rejected with ``-EEXIST``. Two entry uprobes at the same address would
+ both fire for every task, causing ``tlob_start_task()`` to be called
+ twice; the second call would silently fail with ``-EEXIST`` and the
+ second binding's threshold would never take effect. Different code
+ regions that share the same ``offset_stop`` (common exit point) are
+ explicitly allowed.
+- The uprobe binding is removed when ``-offset_start:binary_path`` is
+ written to ``monitor``, or when the monitor is disabled.
+- The ``tag`` field in every ``tlob_budget_exceeded`` event is
+ automatically set to ``offset_start`` for the tracefs path, so
+ violation events for different code regions are immediately
+ distinguishable even when ``threshold_us`` values are identical.
+
+ftrace ring buffer (budget violation events)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+When a monitored task exceeds its latency budget the hrtimer fires,
+records the violation, and emits a single ``tlob_budget_exceeded`` event
+into the ftrace ring buffer. **Nothing is written to the ftrace ring
+buffer while the task is within budget.**
+
+The event carries the on-CPU / off-CPU time breakdown so that root-cause
+analysis (CPU-bound vs. scheduling / I/O overrun) is immediate::
+
+ cat /sys/kernel/tracing/trace
+
+Example output::
+
+ myapp-1234 [003] .... 12345.678: tlob_budget_exceeded: \
+ myapp[1234]: budget exceeded threshold=5000 \
+ on_cpu=820 off_cpu=4500 switches=3 state=off_cpu tag=0x00000000000012a0
+
+Field descriptions:
+
+``threshold``
+ Configured latency budget in microseconds.
+
+``on_cpu``
+ Cumulative on-CPU time since ``trace_start``, in microseconds.
+
+``off_cpu``
+ Cumulative off-CPU (scheduling + I/O wait) time since ``trace_start``,
+ in microseconds.
+
+``switches``
+ Number of times the task was scheduled out during this window.
+
+``state``
+ DA state when the hrtimer fired: ``on_cpu`` means the task was executing
+ when the budget expired (CPU-bound overrun); ``off_cpu`` means the task
+ was preempted or blocked (scheduling / I/O overrun).
+
+``tag``
+ Opaque 64-bit cookie supplied by the caller via ``tlob_start_args.tag``
+ (ioctl path) or automatically set to ``offset_start`` (tracefs uprobe
+ path). Use it to distinguish violations from different code regions
+ monitored by the same thread. Zero when not set.
+
+To capture violations in a file::
+
+ trace-cmd record -e tlob_budget_exceeded &
+ # ... run workload ...
+ trace-cmd report
+
+/dev/rv ioctl interface (self-instrumentation)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Tasks can self-instrument their own code paths via the ``/dev/rv`` misc
+device (requires ``CONFIG_RV_CHARDEV``). The kernel key is
+``task_struct``; multiple threads sharing a single fd each get their own
+independent monitoring slot.
+
+**Synchronous mode** -- the calling thread checks its own result::
+
+ int fd = open("/dev/rv", O_RDWR);
+
+ struct tlob_start_args args = {
+ .threshold_us = 50000, /* 50 ms */
+ .tag = 0, /* optional; 0 = don't care */
+ .notify_fd = -1, /* no fd notification */
+ };
+ ioctl(fd, TLOB_IOCTL_TRACE_START, &args);
+
+ /* ... code path under observation ... */
+
+ int ret = ioctl(fd, TLOB_IOCTL_TRACE_STOP, NULL);
+ /* ret == 0: within budget */
+ /* ret == -EOVERFLOW: budget exceeded */
+
+ close(fd);
+
+**Asynchronous mode** -- a dedicated monitor thread receives violation
+records via ``read()`` on a shared fd, decoupling the observation from
+the critical path::
+
+ /* Monitor thread: open a dedicated fd. */
+ int monitor_fd = open("/dev/rv", O_RDWR);
+
+ /* Worker thread: set notify_fd = monitor_fd in TRACE_START args. */
+ int work_fd = open("/dev/rv", O_RDWR);
+ struct tlob_start_args args = {
+ .threshold_us = 10000, /* 10 ms */
+ .tag = REGION_A,
+ .notify_fd = monitor_fd,
+ };
+ ioctl(work_fd, TLOB_IOCTL_TRACE_START, &args);
+ /* ... critical section ... */
+ ioctl(work_fd, TLOB_IOCTL_TRACE_STOP, NULL);
+
+ /* Monitor thread: blocking read() returns one or more tlob_event records. */
+ struct tlob_event ntfs[8];
+ ssize_t n = read(monitor_fd, ntfs, sizeof(ntfs));
+ for (int i = 0; i < n / sizeof(struct tlob_event); i++) {
+ struct tlob_event *ntf = &ntfs[i];
+ printf("tid=%u tag=0x%llx exceeded budget=%llu us "
+ "(on_cpu=%llu off_cpu=%llu switches=%u state=%s)\n",
+ ntf->tid, ntf->tag, ntf->threshold_us,
+ ntf->on_cpu_us, ntf->off_cpu_us, ntf->switches,
+ ntf->state ? "on_cpu" : "off_cpu");
+ }
+
+**mmap ring buffer** -- zero-copy consumption of violation events::
+
+ int fd = open("/dev/rv", O_RDWR);
+ struct tlob_start_args args = {
+ .threshold_us = 1000, /* 1 ms */
+ .notify_fd = fd, /* push violations to own ring buffer */
+ };
+ ioctl(fd, TLOB_IOCTL_TRACE_START, &args);
+
+ /* Map the ring: one control page + capacity data records. */
+ size_t psz = sysconf(_SC_PAGESIZE);
+ size_t cap = 64; /* read the actual value from page->capacity after mmap */
+ size_t len = (psz + cap * sizeof(struct tlob_event) + psz - 1) & ~(psz - 1);
+ void *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+
+ struct tlob_mmap_page *page = map;
+ struct tlob_event *data =
+ (struct tlob_event *)((char *)map + page->data_offset);
+
+ /* Consumer loop: poll for events, read without copying. */
+ while (1) {
+ poll(&(struct pollfd){fd, POLLIN, 0}, 1, -1);
+
+ uint32_t head = __atomic_load_n(&page->data_head, __ATOMIC_ACQUIRE);
+ uint32_t tail = page->data_tail;
+ while (tail != head) {
+ handle(&data[tail & (page->capacity - 1)]);
+ tail++;
+ }
+ __atomic_store_n(&page->data_tail, tail, __ATOMIC_RELEASE);
+ }
+
+Note: ``read()`` and ``mmap()`` share the same ring and ``data_tail``
+cursor. Do not use both simultaneously on the same fd.
+
+``tlob_event`` fields:
+
+``tid``
+ Thread ID (``task_pid_vnr``) of the violating task.
+
+``threshold_us``
+ Budget that was exceeded, in microseconds.
+
+``on_cpu_us``
+ Cumulative on-CPU time at violation time, in microseconds.
+
+``off_cpu_us``
+ Cumulative off-CPU time at violation time, in microseconds.
+
+``switches``
+ Number of context switches since ``TRACE_START``.
+
+``state``
+ 1 = timer fired while task was on-CPU; 0 = timer fired while off-CPU.
+
+``tag``
+ Cookie from ``tlob_start_args.tag``; for the tracefs uprobe path this
+ equals ``offset_start``. Zero when not set.
+
+tracefs files
+-------------
+
+The following files are created under
+``/sys/kernel/tracing/rv/monitors/tlob/``:
+
+``enable`` (rw)
+ Write ``1`` to enable the monitor; write ``0`` to disable it and
+ stop all currently monitored tasks.
+
+``desc`` (ro)
+ Human-readable description of the monitor.
+
+``monitor`` (rw)
+ Write ``threshold_us:offset_start:offset_stop:binary_path`` to bind two
+ plain entry uprobes in *binary_path*. The uprobe at *offset_start* fires
+ ``tlob_start_task()``; the uprobe at *offset_stop* fires
+ ``tlob_stop_task()``. Returns ``-EEXIST`` if a binding with the same
+ *offset_start* already exists for *binary_path*. Write
+ ``-offset_start:binary_path`` to remove the binding. Read to list
+ registered bindings, one
+ ``threshold_us:0xoffset_start:0xoffset_stop:binary_path`` entry per line.
+
+Specification
+-------------
+
+Graphviz DOT file in tools/verification/models/tlob.dot
diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst b/Documentation/userspace-api/ioctl/ioctl-number.rst
index 331223761..8d3af68db 100644
--- a/Documentation/userspace-api/ioctl/ioctl-number.rst
+++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
@@ -385,6 +385,7 @@ Code Seq# Include File Comments
0xB8 01-02 uapi/misc/mrvl_cn10k_dpi.h Marvell CN10K DPI driver
0xB8 all uapi/linux/mshv.h Microsoft Hyper-V /dev/mshv driver
<mailto:linux-hyperv@vger.kernel.org>
+0xB9 00-3F linux/rv.h Runtime Verification (RV) monitors
0xBA 00-0F uapi/linux/liveupdate.h Pasha Tatashin
<mailto:pasha.tatashin@soleen.com>
0xC0 00-0F linux/usb/iowarrior.h
diff --git a/include/uapi/linux/rv.h b/include/uapi/linux/rv.h
new file mode 100644
index 000000000..d1b96d8cd
--- /dev/null
+++ b/include/uapi/linux/rv.h
@@ -0,0 +1,181 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * UAPI definitions for Runtime Verification (RV) monitors.
+ *
+ * All RV monitors that expose an ioctl self-instrumentation interface
+ * share the magic byte RV_IOC_MAGIC (0xB9), registered in
+ * Documentation/userspace-api/ioctl/ioctl-number.rst.
+ *
+ * A single /dev/rv misc device serves as the entry point. ioctl numbers
+ * encode both the monitor identity and the operation:
+ *
+ * 0x01 - 0x1F tlob (task latency over budget)
+ * 0x20 - 0x3F reserved for future RV monitors
+ *
+ * Usage examples and design rationale are in:
+ * Documentation/trace/rv/monitor_tlob.rst
+ */
+
+#ifndef _UAPI_LINUX_RV_H
+#define _UAPI_LINUX_RV_H
+
+#include <linux/ioctl.h>
+#include <linux/types.h>
+
+/* Magic byte shared by all RV monitor ioctls. */
+#define RV_IOC_MAGIC 0xB9
+
+/* -----------------------------------------------------------------------
+ * tlob: task latency over budget monitor (nr 0x01 - 0x1F)
+ * -----------------------------------------------------------------------
+ */
+
+/**
+ * struct tlob_start_args - arguments for TLOB_IOCTL_TRACE_START
+ * @threshold_us: Latency budget for this critical section, in microseconds.
+ * Must be greater than zero.
+ * @tag: Opaque 64-bit cookie supplied by the caller. Echoed back
+ * verbatim in the tlob_budget_exceeded ftrace event and in any
+ * tlob_event record delivered via @notify_fd. Use it to identify
+ * which code region triggered a violation when the same thread
+ * monitors multiple regions sequentially. Set to 0 if not
+ * needed.
+ * @notify_fd: File descriptor that will receive a tlob_event record on
+ * violation. Must refer to an open /dev/rv fd. May equal
+ * the calling fd (self-notification, useful for retrieving the
+ * on_cpu_us / off_cpu_us breakdown after TRACE_STOP returns
+ * -EOVERFLOW). Set to -1 to disable fd notification; in that
+ * case violations are only signalled via the TRACE_STOP return
+ * value and the tlob_budget_exceeded ftrace event.
+ * @flags: Must be 0. Reserved for future extensions.
+ */
+struct tlob_start_args {
+ __u64 threshold_us;
+ __u64 tag;
+ __s32 notify_fd;
+ __u32 flags;
+};
+
+/**
+ * struct tlob_event - one budget-exceeded event
+ *
+ * Consumed by read() on the notify_fd registered at TLOB_IOCTL_TRACE_START.
+ * Each record describes a single budget exceedance for one task.
+ *
+ * @tid: Thread ID (task_pid_vnr) of the violating task.
+ * @threshold_us: Budget that was exceeded, in microseconds.
+ * @on_cpu_us: Cumulative on-CPU time at violation time, in microseconds.
+ * @off_cpu_us: Cumulative off-CPU (scheduling + I/O wait) time at
+ * violation time, in microseconds.
+ * @switches: Number of context switches since TRACE_START.
+ * @state: DA state at violation: 1 = on_cpu, 0 = off_cpu.
+ * @tag: Cookie from tlob_start_args.tag; for the tracefs uprobe path
+ * this is the offset_start value. Zero when not set.
+ */
+struct tlob_event {
+ __u32 tid;
+ __u32 pad;
+ __u64 threshold_us;
+ __u64 on_cpu_us;
+ __u64 off_cpu_us;
+ __u32 switches;
+ __u32 state; /* 1 = on_cpu, 0 = off_cpu */
+ __u64 tag;
+};
+
+/**
+ * struct tlob_mmap_page - control page for the mmap'd violation ring buffer
+ *
+ * Mapped at offset 0 of the mmap region returned by mmap(2) on a /dev/rv fd.
+ * The data array of struct tlob_event records begins at offset @data_offset
+ * (always one page from the mmap base; use this field rather than hard-coding
+ * PAGE_SIZE so the code remains correct across architectures).
+ *
+ * Ring layout:
+ *
+ * mmap base + 0 : struct tlob_mmap_page (one page)
+ * mmap base + data_offset : struct tlob_event[capacity]
+ *
+ * The mmap length determines the ring capacity. Compute it as:
+ *
+ * raw = sysconf(_SC_PAGESIZE) + capacity * sizeof(struct tlob_event)
+ * length = (raw + sysconf(_SC_PAGESIZE) - 1) & ~(sysconf(_SC_PAGESIZE) - 1)
+ *
+ * i.e. round the raw byte count up to the next page boundary before
+ * passing it to mmap(2). The kernel requires a page-aligned length.
+ * capacity must be a power of 2. Read @capacity after a successful
+ * mmap(2) for the actual value.
+ *
+ * Producer/consumer ordering contract:
+ *
+ * Kernel (producer):
+ * data[data_head & (capacity - 1)] = event;
+ * // pairs with load-acquire in userspace:
+ * smp_store_release(&page->data_head, data_head + 1);
+ *
+ * Userspace (consumer):
+ * // pairs with store-release in kernel:
+ * head = __atomic_load_n(&page->data_head, __ATOMIC_ACQUIRE);
+ * for (tail = page->data_tail; tail != head; tail++)
+ * handle(&data[tail & (capacity - 1)]);
+ * __atomic_store_n(&page->data_tail, tail, __ATOMIC_RELEASE);
+ *
+ * @data_head and @data_tail are monotonically increasing __u32 counters
+ * in units of records. Unsigned 32-bit wrap-around is handled correctly
+ * by modular arithmetic; the ring is full when
+ * (data_head - data_tail) == capacity.
+ *
+ * When the ring is full the kernel drops the incoming record and increments
+ * @dropped. The consumer should check @dropped periodically to detect loss.
+ *
+ * read() and mmap() share the same ring buffer. Do not use both
+ * simultaneously on the same fd.
+ *
+ * @data_head: Next write slot index. Updated by the kernel with
+ * store-release ordering. Read by userspace with load-acquire.
+ * @data_tail: Next read slot index. Updated by userspace. Read by the
+ * kernel to detect overflow.
+ * @capacity: Actual ring capacity in records (power of 2). Written once
+ * by the kernel at mmap time; read-only for userspace thereafter.
+ * @version: Ring buffer ABI version; currently 1.
+ * @data_offset: Byte offset from the mmap base to the data array.
+ * Always equal to sysconf(_SC_PAGESIZE) on the running kernel.
+ * @record_size: sizeof(struct tlob_event) as seen by the kernel. Verify
+ * this matches userspace's sizeof before indexing the array.
+ * @dropped: Number of events dropped because the ring was full.
+ * Monotonically increasing; read with __ATOMIC_RELAXED.
+ */
+struct tlob_mmap_page {
+ __u32 data_head;
+ __u32 data_tail;
+ __u32 capacity;
+ __u32 version;
+ __u32 data_offset;
+ __u32 record_size;
+ __u64 dropped;
+};
+
+/*
+ * TLOB_IOCTL_TRACE_START - begin monitoring the calling task.
+ *
+ * Arms a per-task hrtimer for args.threshold_us microseconds. If args.notify_fd
+ * is >= 0, a tlob_event record is pushed into that fd's ring buffer on
+ * violation in addition to the tlob_budget_exceeded ftrace event.
+ * args.notify_fd == -1 disables fd notification.
+ *
+ * Violation records are consumed by read() on the notify_fd (blocking or
+ * non-blocking depending on O_NONBLOCK). On violation, TLOB_IOCTL_TRACE_STOP
+ * also returns -EOVERFLOW regardless of whether notify_fd is set.
+ *
+ * args.flags must be 0.
+ */
+#define TLOB_IOCTL_TRACE_START _IOW(RV_IOC_MAGIC, 0x01, struct tlob_start_args)
+
+/*
+ * TLOB_IOCTL_TRACE_STOP - end monitoring the calling task.
+ *
+ * Returns 0 if within budget, -EOVERFLOW if the budget was exceeded.
+ */
+#define TLOB_IOCTL_TRACE_STOP _IO(RV_IOC_MAGIC, 0x02)
+
+#endif /* _UAPI_LINUX_RV_H */
diff --git a/kernel/trace/rv/Kconfig b/kernel/trace/rv/Kconfig
index 5b4be87ba..227573cda 100644
--- a/kernel/trace/rv/Kconfig
+++ b/kernel/trace/rv/Kconfig
@@ -65,6 +65,7 @@ source "kernel/trace/rv/monitors/pagefault/Kconfig"
source "kernel/trace/rv/monitors/sleep/Kconfig"
# Add new rtapp monitors here
+source "kernel/trace/rv/monitors/tlob/Kconfig"
# Add new monitors here
config RV_REACTORS
@@ -93,3 +94,19 @@ config RV_REACT_PANIC
help
Enables the panic reactor. The panic reactor emits a printk()
message if an exception is found and panic()s the system.
+
+config RV_CHARDEV
+ bool "RV ioctl interface via /dev/rv"
+ depends on RV
+ default n
+ help
+ Register a /dev/rv misc device that exposes an ioctl interface
+ for RV monitor self-instrumentation. All RV monitors share the
+ single device node; ioctl numbers encode the monitor identity.
+
+ When enabled, user-space programs can open /dev/rv and use
+ monitor-specific ioctl commands to bracket code regions they
+ want the kernel RV subsystem to observe.
+
+ Say Y here if you want to use the tlob self-instrumentation
+ ioctl interface; otherwise say N.
diff --git a/kernel/trace/rv/Makefile b/kernel/trace/rv/Makefile
index 750e4ad6f..cc3781a3b 100644
--- a/kernel/trace/rv/Makefile
+++ b/kernel/trace/rv/Makefile
@@ -3,6 +3,7 @@
ccflags-y += -I $(src) # needed for trace events
obj-$(CONFIG_RV) += rv.o
+obj-$(CONFIG_RV_CHARDEV) += rv_dev.o
obj-$(CONFIG_RV_MON_WIP) += monitors/wip/wip.o
obj-$(CONFIG_RV_MON_WWNR) += monitors/wwnr/wwnr.o
obj-$(CONFIG_RV_MON_SCHED) += monitors/sched/sched.o
@@ -17,6 +18,7 @@ obj-$(CONFIG_RV_MON_STS) += monitors/sts/sts.o
obj-$(CONFIG_RV_MON_NRP) += monitors/nrp/nrp.o
obj-$(CONFIG_RV_MON_SSSW) += monitors/sssw/sssw.o
obj-$(CONFIG_RV_MON_OPID) += monitors/opid/opid.o
+obj-$(CONFIG_RV_MON_TLOB) += monitors/tlob/tlob.o
# Add new monitors here
obj-$(CONFIG_RV_REACTORS) += rv_reactors.o
obj-$(CONFIG_RV_REACT_PRINTK) += reactor_printk.o
diff --git a/kernel/trace/rv/monitors/tlob/Kconfig b/kernel/trace/rv/monitors/tlob/Kconfig
new file mode 100644
index 000000000..010237480
--- /dev/null
+++ b/kernel/trace/rv/monitors/tlob/Kconfig
@@ -0,0 +1,51 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+config RV_MON_TLOB
+ depends on RV
+ depends on UPROBES
+ select DA_MON_EVENTS_ID
+ bool "tlob monitor"
+ help
+ Enable the tlob (task latency over budget) monitor. This monitor
+ tracks the elapsed time (CLOCK_MONOTONIC) of a marked code path within a
+ task (including both on-CPU and off-CPU time) and reports a
+ violation when the elapsed time exceeds a configurable budget
+ threshold.
+
+ The monitor implements a three-state deterministic automaton.
+ States: unmonitored, on_cpu, off_cpu.
+ Key transitions:
+ unmonitored --(trace_start)--> on_cpu
+ on_cpu --(switch_out)--> off_cpu
+ off_cpu --(switch_in)--> on_cpu
+ on_cpu --(trace_stop)--> unmonitored
+ off_cpu --(trace_stop)--> unmonitored
+ on_cpu --(budget_expired)--> unmonitored
+ off_cpu --(budget_expired)--> unmonitored
+
+ External configuration is done via the tracefs "monitor" file:
+	  echo threshold_us:offset_start:offset_stop:binary_path > .../rv/monitors/tlob/monitor
+	  echo -offset_start:binary_path > .../rv/monitors/tlob/monitor (remove binding)
+	  cat .../rv/monitors/tlob/monitor (list bindings)
+
+ The uprobe binding places two plain entry uprobes at offset_start and
+ offset_stop in the binary; these trigger tlob_start_task() and
+ tlob_stop_task() respectively. Using two entry uprobes (rather than a
+ uretprobe) means that a mistyped offset can never corrupt the call
+ stack; the worst outcome is a missed stop, which causes the hrtimer to
+ fire and report a budget violation.
+
+ Violation events are delivered via a lock-free mmap ring buffer on
+ /dev/rv (enabled by CONFIG_RV_CHARDEV). The consumer mmap()s the
+ device, reads records from the data array using the head/tail indices
+ in the control page, and advances data_tail when done.
+
+ For self-instrumentation, use TLOB_IOCTL_TRACE_START /
+ TLOB_IOCTL_TRACE_STOP via the /dev/rv misc device (enabled by
+ CONFIG_RV_CHARDEV).
+
+ Up to TLOB_MAX_MONITORED tasks may be monitored simultaneously.
+
+ For further information, see:
+ Documentation/trace/rv/monitor_tlob.rst
diff --git a/kernel/trace/rv/monitors/tlob/tlob.c b/kernel/trace/rv/monitors/tlob/tlob.c
new file mode 100644
index 000000000..a6e474025
--- /dev/null
+++ b/kernel/trace/rv/monitors/tlob/tlob.c
@@ -0,0 +1,986 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * tlob: task latency over budget monitor
+ *
+ * Track the elapsed wall-clock time of a marked code path and detect when
+ * a monitored task exceeds its per-task latency budget. CLOCK_MONOTONIC
+ * is used so both on-CPU and off-CPU time count toward the budget.
+ *
+ * Per-task state is maintained in a spinlock-protected hash table. A
+ * one-shot hrtimer fires at the deadline; if the task has not called
+ * trace_stop by then, a violation is recorded.
+ *
+ * Up to TLOB_MAX_MONITORED tasks may be tracked simultaneously.
+ *
+ * Copyright (C) 2026 Wen Yang <wen.yang@linux.dev>
+ */
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/ftrace.h>
+#include <linux/hash.h>
+#include <linux/hrtimer.h>
+#include <linux/kernel.h>
+#include <linux/ktime.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/namei.h>
+#include <linux/poll.h>
+#include <linux/rv.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/atomic.h>
+#include <linux/rcupdate.h>
+#include <linux/spinlock.h>
+#include <linux/tracefs.h>
+#include <linux/uaccess.h>
+#include <linux/uprobes.h>
+#include <kunit/visibility.h>
+#include <rv/instrumentation.h>
+
+/* rv_interface_lock is defined in kernel/trace/rv/rv.c */
+extern struct mutex rv_interface_lock;
+
+#define MODULE_NAME "tlob"
+
+#include <rv_trace.h>
+#include <trace/events/sched.h>
+
+#define RV_MON_TYPE RV_MON_PER_TASK
+#include "tlob.h"
+#include <rv/da_monitor.h>
+
+/* Hash table size; must be a power of two. */
+#define TLOB_HTABLE_BITS 6
+#define TLOB_HTABLE_SIZE (1 << TLOB_HTABLE_BITS)
+
+/* Maximum binary path length for uprobe binding. */
+#define TLOB_MAX_PATH 256
+
+/* Per-task latency monitoring state. */
+struct tlob_task_state {
+ struct hlist_node hlist;
+ struct task_struct *task;
+ u64 threshold_us;
+ u64 tag;
+ struct hrtimer deadline_timer;
+ int canceled; /* protected by entry_lock */
+ struct file *notify_file; /* NULL or held reference */
+
+ /*
+ * entry_lock serialises the mutable accounting fields below.
+ * Lock order: tlob_table_lock -> entry_lock (never reverse).
+ */
+ raw_spinlock_t entry_lock;
+ u64 on_cpu_us;
+ u64 off_cpu_us;
+ ktime_t last_ts;
+ u32 switches;
+ u8 da_state;
+
+ struct rcu_head rcu; /* for call_rcu() teardown */
+};
+
+/* Per-uprobe-binding state: a start + stop probe pair for one binary region. */
+struct tlob_uprobe_binding {
+ struct list_head list;
+ u64 threshold_us;
+ struct path path;
+ char binpath[TLOB_MAX_PATH]; /* canonical path for read/remove */
+ loff_t offset_start;
+ loff_t offset_stop;
+ struct uprobe_consumer entry_uc;
+ struct uprobe_consumer stop_uc;
+ struct uprobe *entry_uprobe;
+ struct uprobe *stop_uprobe;
+};
+
+/* Object pool for tlob_task_state. */
+static struct kmem_cache *tlob_state_cache;
+
+/* Hash table and lock protecting table structure (insert/delete/canceled). */
+static struct hlist_head tlob_htable[TLOB_HTABLE_SIZE];
+static DEFINE_RAW_SPINLOCK(tlob_table_lock);
+static atomic_t tlob_num_monitored = ATOMIC_INIT(0);
+
+/* Uprobe binding list; protected by tlob_uprobe_mutex. */
+static LIST_HEAD(tlob_uprobe_list);
+static DEFINE_MUTEX(tlob_uprobe_mutex);
+
+/* Forward declaration */
+static enum hrtimer_restart tlob_deadline_timer_fn(struct hrtimer *timer);
+static struct rv_monitor rv_this;
+
+/* Hash table helpers */
+
+static unsigned int tlob_hash_task(const struct task_struct *task)
+{
+ return hash_ptr((void *)task, TLOB_HTABLE_BITS);
+}
+
+/*
+ * tlob_find_rcu - look up per-task state.
+ * Must be called under rcu_read_lock() or with tlob_table_lock held.
+ */
+static struct tlob_task_state *tlob_find_rcu(struct task_struct *task)
+{
+ struct tlob_task_state *ws;
+ unsigned int h = tlob_hash_task(task);
+
+ hlist_for_each_entry_rcu(ws, &tlob_htable[h], hlist,
+ lockdep_is_held(&tlob_table_lock))
+ if (ws->task == task)
+ return ws;
+ return NULL;
+}
+
+/* Allocate and initialise a new per-task state entry. */
+static struct tlob_task_state *tlob_alloc(struct task_struct *task,
+ u64 threshold_us, u64 tag)
+{
+ struct tlob_task_state *ws;
+
+ ws = kmem_cache_zalloc(tlob_state_cache, GFP_ATOMIC);
+ if (!ws)
+ return NULL;
+
+ ws->task = task;
+ get_task_struct(task);
+ ws->threshold_us = threshold_us;
+ ws->tag = tag;
+ ws->last_ts = ktime_get();
+ ws->da_state = on_cpu_tlob;
+ raw_spin_lock_init(&ws->entry_lock);
+ hrtimer_setup(&ws->deadline_timer, tlob_deadline_timer_fn,
+ CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+ return ws;
+}
+
+/* RCU callback: free the slab once no readers remain. */
+static void tlob_free_rcu_slab(struct rcu_head *head)
+{
+ struct tlob_task_state *ws =
+ container_of(head, struct tlob_task_state, rcu);
+ kmem_cache_free(tlob_state_cache, ws);
+}
+
+/* Arm the one-shot deadline timer for threshold_us microseconds. */
+static void tlob_arm_deadline(struct tlob_task_state *ws)
+{
+ hrtimer_start(&ws->deadline_timer,
+ ns_to_ktime(ws->threshold_us * NSEC_PER_USEC),
+ HRTIMER_MODE_REL);
+}
+
+/*
+ * Push a violation record into a monitor fd's ring buffer (softirq context).
+ * Drop-new policy: discard incoming record when full. smp_store_release on
+ * data_head pairs with smp_load_acquire in the consumer.
+ */
+static void tlob_event_push(struct rv_file_priv *priv,
+ const struct tlob_event *info)
+{
+ struct tlob_ring *ring = &priv->ring;
+ unsigned long flags;
+ u32 head, tail;
+
+ spin_lock_irqsave(&ring->lock, flags);
+
+ head = ring->page->data_head;
+ tail = READ_ONCE(ring->page->data_tail);
+
+ if (head - tail > ring->mask) {
+ /* Ring full: drop incoming record. */
+ ring->page->dropped++;
+ spin_unlock_irqrestore(&ring->lock, flags);
+ return;
+ }
+
+ ring->data[head & ring->mask] = *info;
+ /* pairs with smp_load_acquire() in the consumer */
+ smp_store_release(&ring->page->data_head, head + 1);
+
+ spin_unlock_irqrestore(&ring->lock, flags);
+
+ wake_up_interruptible_poll(&priv->waitq, EPOLLIN | EPOLLRDNORM);
+}
+
+#if IS_ENABLED(CONFIG_KUNIT)
+void tlob_event_push_kunit(struct rv_file_priv *priv,
+ const struct tlob_event *info)
+{
+ tlob_event_push(priv, info);
+}
+EXPORT_SYMBOL_IF_KUNIT(tlob_event_push_kunit);
+#endif /* CONFIG_KUNIT */
+
+/*
+ * Budget exceeded: remove the entry, record the violation, and inject
+ * budget_expired into the DA.
+ *
+ * Lock order: tlob_table_lock -> entry_lock. tlob_stop_task() sets
+ * ws->canceled under both locks; if we see it here the stop path owns cleanup.
+ * fput/put_task_struct are done before call_rcu(); the RCU callback only
+ * reclaims the slab.
+ */
+static enum hrtimer_restart tlob_deadline_timer_fn(struct hrtimer *timer)
+{
+ struct tlob_task_state *ws =
+ container_of(timer, struct tlob_task_state, deadline_timer);
+ struct tlob_event info = {};
+ struct file *notify_file;
+ struct task_struct *task;
+ unsigned long flags;
+ /* snapshots taken under entry_lock */
+ u64 on_cpu_us, off_cpu_us, threshold_us, tag;
+ u32 switches;
+ bool on_cpu;
+ bool push_event = false;
+
+ raw_spin_lock_irqsave(&tlob_table_lock, flags);
+ /* stop path sets canceled under both locks; if set it owns cleanup */
+ if (ws->canceled) {
+ raw_spin_unlock_irqrestore(&tlob_table_lock, flags);
+ return HRTIMER_NORESTART;
+ }
+
+ /* Finalize accounting and snapshot all fields under entry_lock. */
+ raw_spin_lock(&ws->entry_lock);
+
+ {
+ ktime_t now = ktime_get();
+ u64 delta_us = ktime_to_us(ktime_sub(now, ws->last_ts));
+
+ if (ws->da_state == on_cpu_tlob)
+ ws->on_cpu_us += delta_us;
+ else
+ ws->off_cpu_us += delta_us;
+ }
+
+ ws->canceled = 1;
+ on_cpu_us = ws->on_cpu_us;
+ off_cpu_us = ws->off_cpu_us;
+ threshold_us = ws->threshold_us;
+ tag = ws->tag;
+ switches = ws->switches;
+ on_cpu = (ws->da_state == on_cpu_tlob);
+ notify_file = ws->notify_file;
+ if (notify_file) {
+ info.tid = task_pid_vnr(ws->task);
+ info.threshold_us = threshold_us;
+ info.on_cpu_us = on_cpu_us;
+ info.off_cpu_us = off_cpu_us;
+ info.switches = switches;
+ info.state = on_cpu ? 1 : 0;
+ info.tag = tag;
+ push_event = true;
+ }
+
+ raw_spin_unlock(&ws->entry_lock);
+
+ hlist_del_rcu(&ws->hlist);
+ atomic_dec(&tlob_num_monitored);
+ /*
+ * Hold a reference so task remains valid across da_handle_event()
+ * after we drop tlob_table_lock.
+ */
+ task = ws->task;
+ get_task_struct(task);
+ raw_spin_unlock_irqrestore(&tlob_table_lock, flags);
+
+ /*
+ * Both locks are now released; ws is exclusively owned (removed from
+ * the hash table with canceled=1). Emit the tracepoint and push the
+ * violation record.
+ */
+	trace_tlob_budget_exceeded(task, threshold_us, on_cpu_us,
+				   off_cpu_us, switches, on_cpu, tag);
+
+ if (push_event) {
+ struct rv_file_priv *priv = notify_file->private_data;
+
+ if (priv)
+ tlob_event_push(priv, &info);
+ }
+
+ da_handle_event(task, budget_expired_tlob);
+
+ if (notify_file)
+ fput(notify_file); /* ref from fget() at TRACE_START */
+ put_task_struct(ws->task); /* ref from tlob_alloc() */
+ put_task_struct(task); /* extra ref from get_task_struct() above */
+ call_rcu(&ws->rcu, tlob_free_rcu_slab);
+ return HRTIMER_NORESTART;
+}
+
+/* Tracepoint handlers */
+
+/*
+ * handle_sched_switch - advance the DA and accumulate on/off-CPU time.
+ *
+ * RCU read-side for lock-free lookup; entry_lock for per-task accounting.
+ * da_handle_event() is called after rcu_read_unlock() to avoid holding the
+ * read-side critical section across the RV framework.
+ */
+static void handle_sched_switch(void *data, bool preempt,
+ struct task_struct *prev,
+ struct task_struct *next,
+ unsigned int prev_state)
+{
+ struct tlob_task_state *ws;
+ unsigned long flags;
+ bool do_prev = false, do_next = false;
+ ktime_t now;
+
+ rcu_read_lock();
+
+ ws = tlob_find_rcu(prev);
+ if (ws) {
+ raw_spin_lock_irqsave(&ws->entry_lock, flags);
+ if (!ws->canceled) {
+ now = ktime_get();
+ ws->on_cpu_us += ktime_to_us(ktime_sub(now, ws->last_ts));
+ ws->last_ts = now;
+ ws->switches++;
+ ws->da_state = off_cpu_tlob;
+ do_prev = true;
+ }
+ raw_spin_unlock_irqrestore(&ws->entry_lock, flags);
+ }
+
+ ws = tlob_find_rcu(next);
+ if (ws) {
+ raw_spin_lock_irqsave(&ws->entry_lock, flags);
+ if (!ws->canceled) {
+ now = ktime_get();
+ ws->off_cpu_us += ktime_to_us(ktime_sub(now, ws->last_ts));
+ ws->last_ts = now;
+ ws->da_state = on_cpu_tlob;
+ do_next = true;
+ }
+ raw_spin_unlock_irqrestore(&ws->entry_lock, flags);
+ }
+
+ rcu_read_unlock();
+
+ if (do_prev)
+ da_handle_event(prev, switch_out_tlob);
+ if (do_next)
+ da_handle_event(next, switch_in_tlob);
+}
+
+static void handle_sched_wakeup(void *data, struct task_struct *p)
+{
+ struct tlob_task_state *ws;
+ unsigned long flags;
+ bool found = false;
+
+ rcu_read_lock();
+ ws = tlob_find_rcu(p);
+ if (ws) {
+ raw_spin_lock_irqsave(&ws->entry_lock, flags);
+ found = !ws->canceled;
+ raw_spin_unlock_irqrestore(&ws->entry_lock, flags);
+ }
+ rcu_read_unlock();
+
+ if (found)
+ da_handle_event(p, sched_wakeup_tlob);
+}
+
+/* -----------------------------------------------------------------------
+ * Core start/stop helpers (also called from rv_dev.c)
+ * -----------------------------------------------------------------------
+ */
+
+/*
+ * __tlob_insert - insert @ws into the hash table and arm its deadline timer.
+ *
+ * Re-checks for duplicates and capacity under tlob_table_lock; the caller
+ * may have done a lock-free pre-check before allocating @ws. On failure @ws
+ * is freed directly (never in table, so no call_rcu needed) and the caller's
+ * @notify_file reference is left untouched, matching the tlob_start_task()
+ * contract that the caller keeps that reference on any failure.
+ */
+static int __tlob_insert(struct task_struct *task, struct tlob_task_state *ws)
+{
+	unsigned int h;
+	unsigned long flags;
+
+	raw_spin_lock_irqsave(&tlob_table_lock, flags);
+	if (tlob_find_rcu(task)) {
+		raw_spin_unlock_irqrestore(&tlob_table_lock, flags);
+		put_task_struct(ws->task);
+		kmem_cache_free(tlob_state_cache, ws);
+		return -EEXIST;
+	}
+	if (atomic_read(&tlob_num_monitored) >= TLOB_MAX_MONITORED) {
+		raw_spin_unlock_irqrestore(&tlob_table_lock, flags);
+		put_task_struct(ws->task);
+		kmem_cache_free(tlob_state_cache, ws);
+		return -ENOSPC;
+	}
+ h = tlob_hash_task(task);
+ hlist_add_head_rcu(&ws->hlist, &tlob_htable[h]);
+ atomic_inc(&tlob_num_monitored);
+ raw_spin_unlock_irqrestore(&tlob_table_lock, flags);
+
+ da_handle_start_run_event(task, trace_start_tlob);
+ tlob_arm_deadline(ws);
+ return 0;
+}
+
+/**
+ * tlob_start_task - begin monitoring @task with latency budget @threshold_us.
+ * @task: task to monitor; a reference is taken for the table entry.
+ * @threshold_us: latency budget in microseconds; must convert to a valid
+ *	ktime_t, else -ERANGE.
+ * @notify_file: /dev/rv file whose ring buffer receives a tlob_event on
+ *	violation; caller transfers the fget() reference to tlob.c.
+ *	Pass NULL for synchronous mode (violations only via TRACE_STOP
+ *	return value and the tlob_budget_exceeded event).
+ * @tag: opaque caller cookie reported back in the violation record.
+ *
+ * Returns 0, -ENODEV, -ERANGE, -EEXIST, -ENOSPC, or -ENOMEM. On failure the
+ * caller retains responsibility for any @notify_file reference.
+ */
+int tlob_start_task(struct task_struct *task, u64 threshold_us,
+ struct file *notify_file, u64 tag)
+{
+ struct tlob_task_state *ws;
+ unsigned long flags;
+
+ if (!tlob_state_cache)
+ return -ENODEV;
+
+ if (threshold_us > (u64)KTIME_MAX / NSEC_PER_USEC)
+ return -ERANGE;
+
+ /* Quick pre-check before allocation. */
+ raw_spin_lock_irqsave(&tlob_table_lock, flags);
+ if (tlob_find_rcu(task)) {
+ raw_spin_unlock_irqrestore(&tlob_table_lock, flags);
+ return -EEXIST;
+ }
+ if (atomic_read(&tlob_num_monitored) >= TLOB_MAX_MONITORED) {
+ raw_spin_unlock_irqrestore(&tlob_table_lock, flags);
+ return -ENOSPC;
+ }
+ raw_spin_unlock_irqrestore(&tlob_table_lock, flags);
+
+ ws = tlob_alloc(task, threshold_us, tag);
+ if (!ws)
+ return -ENOMEM;
+
+ ws->notify_file = notify_file;
+ return __tlob_insert(task, ws);
+}
+EXPORT_SYMBOL_GPL(tlob_start_task);
+
+/**
+ * tlob_stop_task - stop monitoring @task before the deadline fires.
+ *
+ * Sets canceled under entry_lock (inside tlob_table_lock) before calling
+ * hrtimer_cancel(), racing safely with the timer callback.
+ *
+ * Returns 0 if within budget, -ESRCH if the entry is gone (deadline already
+ * fired, or TRACE_START was never called).
+ */
+int tlob_stop_task(struct task_struct *task)
+{
+ struct tlob_task_state *ws;
+ struct file *notify_file;
+ unsigned long flags;
+
+ raw_spin_lock_irqsave(&tlob_table_lock, flags);
+ ws = tlob_find_rcu(task);
+ if (!ws) {
+ raw_spin_unlock_irqrestore(&tlob_table_lock, flags);
+ return -ESRCH;
+ }
+
+ /* Prevent handle_sched_switch from updating accounting after removal. */
+ raw_spin_lock(&ws->entry_lock);
+ ws->canceled = 1;
+ raw_spin_unlock(&ws->entry_lock);
+
+ hlist_del_rcu(&ws->hlist);
+ atomic_dec(&tlob_num_monitored);
+ raw_spin_unlock_irqrestore(&tlob_table_lock, flags);
+
+ hrtimer_cancel(&ws->deadline_timer);
+
+ da_handle_event(task, trace_stop_tlob);
+
+ notify_file = ws->notify_file;
+ if (notify_file)
+ fput(notify_file);
+ put_task_struct(ws->task);
+ call_rcu(&ws->rcu, tlob_free_rcu_slab);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(tlob_stop_task);
+
+/* Stop monitoring all tracked tasks; called on monitor disable. */
+static void tlob_stop_all(void)
+{
+ struct tlob_task_state *batch[TLOB_MAX_MONITORED];
+ struct tlob_task_state *ws;
+ struct hlist_node *tmp;
+ unsigned long flags;
+ int n = 0, i;
+
+ raw_spin_lock_irqsave(&tlob_table_lock, flags);
+ for (i = 0; i < TLOB_HTABLE_SIZE; i++) {
+ hlist_for_each_entry_safe(ws, tmp, &tlob_htable[i], hlist) {
+ raw_spin_lock(&ws->entry_lock);
+ ws->canceled = 1;
+ raw_spin_unlock(&ws->entry_lock);
+ hlist_del_rcu(&ws->hlist);
+ atomic_dec(&tlob_num_monitored);
+ if (n < TLOB_MAX_MONITORED)
+ batch[n++] = ws;
+ }
+ }
+ raw_spin_unlock_irqrestore(&tlob_table_lock, flags);
+
+ for (i = 0; i < n; i++) {
+ ws = batch[i];
+ hrtimer_cancel(&ws->deadline_timer);
+ da_handle_event(ws->task, trace_stop_tlob);
+ if (ws->notify_file)
+ fput(ws->notify_file);
+ put_task_struct(ws->task);
+ call_rcu(&ws->rcu, tlob_free_rcu_slab);
+ }
+}
+
+/* uprobe binding helpers */
+
+static int tlob_uprobe_entry_handler(struct uprobe_consumer *uc,
+ struct pt_regs *regs, __u64 *data)
+{
+ struct tlob_uprobe_binding *b =
+ container_of(uc, struct tlob_uprobe_binding, entry_uc);
+
+ tlob_start_task(current, b->threshold_us, NULL, (u64)b->offset_start);
+ return 0;
+}
+
+static int tlob_uprobe_stop_handler(struct uprobe_consumer *uc,
+ struct pt_regs *regs, __u64 *data)
+{
+ tlob_stop_task(current);
+ return 0;
+}
+
+/*
+ * Register start + stop entry uprobes for a binding.
+ * Both are plain entry uprobes (no uretprobe), so a wrong offset never
+ * corrupts the call stack; the worst outcome is a missed stop (hrtimer
+ * fires and reports a budget violation).
+ * Called with tlob_uprobe_mutex held.
+ */
+static int tlob_add_uprobe(u64 threshold_us, const char *binpath,
+ loff_t offset_start, loff_t offset_stop)
+{
+ struct tlob_uprobe_binding *b, *tmp_b;
+ char pathbuf[TLOB_MAX_PATH];
+ struct inode *inode;
+ char *canon;
+ int ret;
+
+ b = kzalloc(sizeof(*b), GFP_KERNEL);
+ if (!b)
+ return -ENOMEM;
+
+ if (binpath[0] != '/') {
+ kfree(b);
+ return -EINVAL;
+ }
+
+ b->threshold_us = threshold_us;
+ b->offset_start = offset_start;
+ b->offset_stop = offset_stop;
+
+ ret = kern_path(binpath, LOOKUP_FOLLOW, &b->path);
+ if (ret)
+ goto err_free;
+
+ if (!d_is_reg(b->path.dentry)) {
+ ret = -EINVAL;
+ goto err_path;
+ }
+
+ /* Reject duplicate start offset for the same binary. */
+ list_for_each_entry(tmp_b, &tlob_uprobe_list, list) {
+ if (tmp_b->offset_start == offset_start &&
+ tmp_b->path.dentry == b->path.dentry) {
+ ret = -EEXIST;
+ goto err_path;
+ }
+ }
+
+ /* Store canonical path for read-back and removal matching. */
+ canon = d_path(&b->path, pathbuf, sizeof(pathbuf));
+ if (IS_ERR(canon)) {
+ ret = PTR_ERR(canon);
+ goto err_path;
+ }
+ strscpy(b->binpath, canon, sizeof(b->binpath));
+
+ b->entry_uc.handler = tlob_uprobe_entry_handler;
+ b->stop_uc.handler = tlob_uprobe_stop_handler;
+
+ inode = d_real_inode(b->path.dentry);
+
+ b->entry_uprobe = uprobe_register(inode, offset_start, 0, &b->entry_uc);
+ if (IS_ERR(b->entry_uprobe)) {
+ ret = PTR_ERR(b->entry_uprobe);
+ b->entry_uprobe = NULL;
+ goto err_path;
+ }
+
+ b->stop_uprobe = uprobe_register(inode, offset_stop, 0, &b->stop_uc);
+ if (IS_ERR(b->stop_uprobe)) {
+ ret = PTR_ERR(b->stop_uprobe);
+ b->stop_uprobe = NULL;
+ goto err_entry;
+ }
+
+ list_add_tail(&b->list, &tlob_uprobe_list);
+ return 0;
+
+err_entry:
+ uprobe_unregister_nosync(b->entry_uprobe, &b->entry_uc);
+ uprobe_unregister_sync();
+err_path:
+ path_put(&b->path);
+err_free:
+ kfree(b);
+ return ret;
+}
+
+/*
+ * Remove the uprobe binding for (offset_start, binpath).
+ * binpath is resolved to a dentry for comparison so symlinks are handled
+ * correctly. Called with tlob_uprobe_mutex held.
+ */
+static void tlob_remove_uprobe_by_key(loff_t offset_start, const char *binpath)
+{
+ struct tlob_uprobe_binding *b, *tmp;
+ struct path remove_path;
+
+ if (kern_path(binpath, LOOKUP_FOLLOW, &remove_path))
+ return;
+
+ list_for_each_entry_safe(b, tmp, &tlob_uprobe_list, list) {
+ if (b->offset_start != offset_start)
+ continue;
+ if (b->path.dentry != remove_path.dentry)
+ continue;
+ uprobe_unregister_nosync(b->entry_uprobe, &b->entry_uc);
+ uprobe_unregister_nosync(b->stop_uprobe, &b->stop_uc);
+ list_del(&b->list);
+ uprobe_unregister_sync();
+ path_put(&b->path);
+ kfree(b);
+ break;
+ }
+
+ path_put(&remove_path);
+}
+
+/* Unregister all uprobe bindings; called from disable_tlob(). */
+static void tlob_remove_all_uprobes(void)
+{
+ struct tlob_uprobe_binding *b, *tmp;
+
+ mutex_lock(&tlob_uprobe_mutex);
+ list_for_each_entry_safe(b, tmp, &tlob_uprobe_list, list) {
+ uprobe_unregister_nosync(b->entry_uprobe, &b->entry_uc);
+ uprobe_unregister_nosync(b->stop_uprobe, &b->stop_uc);
+ list_del(&b->list);
+ path_put(&b->path);
+ kfree(b);
+ }
+ mutex_unlock(&tlob_uprobe_mutex);
+ uprobe_unregister_sync();
+}
+
+/*
+ * tracefs "monitor" file
+ *
+ * Read: one "threshold_us:0xoffset_start:0xoffset_stop:binary_path\n"
+ * line per registered uprobe binding.
+ * Write: "threshold_us:offset_start:offset_stop:binary_path" - add uprobe binding
+ * "-offset_start:binary_path" - remove uprobe binding
+ */
+
+static ssize_t tlob_monitor_read(struct file *file,
+ char __user *ubuf,
+ size_t count, loff_t *ppos)
+{
+	/* threshold(20) + 2 offsets(2*18) + path(256) + delimiters */
+ const int line_sz = TLOB_MAX_PATH + 72;
+ struct tlob_uprobe_binding *b;
+ char *buf, *p;
+ int n = 0, buf_sz, pos = 0;
+ ssize_t ret;
+
+	mutex_lock(&tlob_uprobe_mutex);
+	list_for_each_entry(b, &tlob_uprobe_list, list)
+		n++;
+
+	/*
+	 * Keep the mutex held across sizing and formatting so the list
+	 * cannot change between the count and the print.
+	 */
+	buf_sz = (n ? n : 1) * line_sz + 1;
+	buf = kmalloc(buf_sz, GFP_KERNEL);
+	if (!buf) {
+		mutex_unlock(&tlob_uprobe_mutex);
+		return -ENOMEM;
+	}
+
+	list_for_each_entry(b, &tlob_uprobe_list, list) {
+		p = b->binpath;
+		pos += scnprintf(buf + pos, buf_sz - pos,
+				 "%llu:0x%llx:0x%llx:%s\n",
+				 b->threshold_us,
+				 (unsigned long long)b->offset_start,
+				 (unsigned long long)b->offset_stop,
+				 p);
+	}
+	mutex_unlock(&tlob_uprobe_mutex);
+
+ ret = simple_read_from_buffer(ubuf, count, ppos, buf, pos);
+ kfree(buf);
+ return ret;
+}
+
+/*
+ * Parse "threshold_us:offset_start:offset_stop:binary_path".
+ * binary_path comes last so it may freely contain ':'.
+ * Returns 0 on success.
+ */
+VISIBLE_IF_KUNIT int tlob_parse_uprobe_line(char *buf, u64 *thr_out,
+ char **path_out,
+ loff_t *start_out, loff_t *stop_out)
+{
+ unsigned long long thr;
+ long long start, stop;
+ int n = 0;
+
+ /*
+ * %llu : decimal-only (microseconds)
+ * %lli : auto-base, accepts 0x-prefixed hex for offsets
+ * %n : records the byte offset of the first path character
+ */
+ if (sscanf(buf, "%llu:%lli:%lli:%n", &thr, &start, &stop, &n) != 3)
+ return -EINVAL;
+ if (thr == 0 || n == 0 || buf[n] == '\0')
+ return -EINVAL;
+ if (start < 0 || stop < 0)
+ return -EINVAL;
+
+ *thr_out = thr;
+ *start_out = start;
+ *stop_out = stop;
+ *path_out = buf + n;
+ return 0;
+}
+
+static ssize_t tlob_monitor_write(struct file *file,
+ const char __user *ubuf,
+ size_t count, loff_t *ppos)
+{
+ char buf[TLOB_MAX_PATH + 64];
+ loff_t offset_start, offset_stop;
+ u64 threshold_us;
+ char *binpath;
+ int ret;
+
+ if (count >= sizeof(buf))
+ return -EINVAL;
+ if (copy_from_user(buf, ubuf, count))
+ return -EFAULT;
+ buf[count] = '\0';
+
+ if (count > 0 && buf[count - 1] == '\n')
+ buf[count - 1] = '\0';
+
+ /* Remove request: "-offset_start:binary_path" */
+ if (buf[0] == '-') {
+ long long off;
+ int n = 0;
+
+ if (sscanf(buf + 1, "%lli:%n", &off, &n) != 1 || n == 0)
+ return -EINVAL;
+ binpath = buf + 1 + n;
+ if (binpath[0] != '/')
+ return -EINVAL;
+
+ mutex_lock(&tlob_uprobe_mutex);
+ tlob_remove_uprobe_by_key((loff_t)off, binpath);
+ mutex_unlock(&tlob_uprobe_mutex);
+
+ return (ssize_t)count;
+ }
+
+ /*
+ * Uprobe binding: "threshold_us:offset_start:offset_stop:binary_path"
+ * binpath points into buf at the start of the path field.
+ */
+ ret = tlob_parse_uprobe_line(buf, &threshold_us,
+ &binpath, &offset_start, &offset_stop);
+ if (ret)
+ return ret;
+
+ mutex_lock(&tlob_uprobe_mutex);
+ ret = tlob_add_uprobe(threshold_us, binpath, offset_start, offset_stop);
+ mutex_unlock(&tlob_uprobe_mutex);
+ return ret ? ret : (ssize_t)count;
+}
+
+static const struct file_operations tlob_monitor_fops = {
+ .open = simple_open,
+ .read = tlob_monitor_read,
+ .write = tlob_monitor_write,
+ .llseek = noop_llseek,
+};
+
+/*
+ * __tlob_init_monitor / __tlob_destroy_monitor - called with rv_interface_lock
+ * held (required by da_monitor_init/destroy via rv_get/put_task_monitor_slot).
+ */
+static int __tlob_init_monitor(void)
+{
+ int i, retval;
+
+ tlob_state_cache = kmem_cache_create("tlob_task_state",
+ sizeof(struct tlob_task_state),
+ 0, 0, NULL);
+ if (!tlob_state_cache)
+ return -ENOMEM;
+
+ for (i = 0; i < TLOB_HTABLE_SIZE; i++)
+ INIT_HLIST_HEAD(&tlob_htable[i]);
+ atomic_set(&tlob_num_monitored, 0);
+
+ retval = da_monitor_init();
+ if (retval) {
+ kmem_cache_destroy(tlob_state_cache);
+ tlob_state_cache = NULL;
+ return retval;
+ }
+
+ rv_this.enabled = 1;
+ return 0;
+}
+
+static void __tlob_destroy_monitor(void)
+{
+ rv_this.enabled = 0;
+ tlob_stop_all();
+ tlob_remove_all_uprobes();
+ /*
+ * Drain pending call_rcu() callbacks from tlob_stop_all() before
+ * destroying the kmem_cache.
+ */
+ synchronize_rcu();
+ da_monitor_destroy();
+ kmem_cache_destroy(tlob_state_cache);
+ tlob_state_cache = NULL;
+}
+
+/*
+ * tlob_init_monitor / tlob_destroy_monitor - KUnit wrappers that acquire
+ * rv_interface_lock, satisfying the lockdep_assert_held() inside
+ * rv_get/put_task_monitor_slot().
+ */
+VISIBLE_IF_KUNIT int tlob_init_monitor(void)
+{
+ int ret;
+
+ mutex_lock(&rv_interface_lock);
+ ret = __tlob_init_monitor();
+ mutex_unlock(&rv_interface_lock);
+ return ret;
+}
+EXPORT_SYMBOL_IF_KUNIT(tlob_init_monitor);
+
+VISIBLE_IF_KUNIT void tlob_destroy_monitor(void)
+{
+ mutex_lock(&rv_interface_lock);
+ __tlob_destroy_monitor();
+ mutex_unlock(&rv_interface_lock);
+}
+EXPORT_SYMBOL_IF_KUNIT(tlob_destroy_monitor);
+
+VISIBLE_IF_KUNIT int tlob_enable_hooks(void)
+{
+ rv_attach_trace_probe("tlob", sched_switch, handle_sched_switch);
+ rv_attach_trace_probe("tlob", sched_wakeup, handle_sched_wakeup);
+ return 0;
+}
+EXPORT_SYMBOL_IF_KUNIT(tlob_enable_hooks);
+
+VISIBLE_IF_KUNIT void tlob_disable_hooks(void)
+{
+ rv_detach_trace_probe("tlob", sched_switch, handle_sched_switch);
+ rv_detach_trace_probe("tlob", sched_wakeup, handle_sched_wakeup);
+}
+EXPORT_SYMBOL_IF_KUNIT(tlob_disable_hooks);
+
+/*
+ * enable_tlob / disable_tlob - called by rv_enable/disable_monitor() which
+ * already holds rv_interface_lock; call the __ variants directly.
+ */
+static int enable_tlob(void)
+{
+ int retval;
+
+ retval = __tlob_init_monitor();
+ if (retval)
+ return retval;
+
+ return tlob_enable_hooks();
+}
+
+static void disable_tlob(void)
+{
+ tlob_disable_hooks();
+ __tlob_destroy_monitor();
+}
+
+static struct rv_monitor rv_this = {
+ .name = "tlob",
+ .description = "Per-task latency-over-budget monitor.",
+ .enable = enable_tlob,
+ .disable = disable_tlob,
+ .reset = da_monitor_reset_all,
+ .enabled = 0,
+};
+
+static int __init register_tlob(void)
+{
+ int ret;
+
+ ret = rv_register_monitor(&rv_this, NULL);
+ if (ret)
+ return ret;
+
+ if (rv_this.root_d) {
+ tracefs_create_file("monitor", 0644, rv_this.root_d, NULL,
+ &tlob_monitor_fops);
+ }
+
+ return 0;
+}
+
+static void __exit unregister_tlob(void)
+{
+ rv_unregister_monitor(&rv_this);
+}
+
+module_init(register_tlob);
+module_exit(unregister_tlob);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Wen Yang <wen.yang@linux.dev>");
+MODULE_DESCRIPTION("tlob: task latency over budget per-task monitor.");
diff --git a/kernel/trace/rv/monitors/tlob/tlob.h b/kernel/trace/rv/monitors/tlob/tlob.h
new file mode 100644
index 000000000..3438a6175
--- /dev/null
+++ b/kernel/trace/rv/monitors/tlob/tlob.h
@@ -0,0 +1,145 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _RV_TLOB_H
+#define _RV_TLOB_H
+
+/*
+ * C representation of the tlob automaton, generated from tlob.dot via rvgen
+ * and extended with tlob_start_task()/tlob_stop_task() declarations.
+ * For the format description see Documentation/trace/rv/deterministic_automata.rst
+ */
+
+#include <linux/rv.h>
+#include <uapi/linux/rv.h>
+
+#define MONITOR_NAME tlob
+
+enum states_tlob {
+ unmonitored_tlob,
+ on_cpu_tlob,
+ off_cpu_tlob,
+ state_max_tlob,
+};
+
+#define INVALID_STATE state_max_tlob
+
+enum events_tlob {
+ trace_start_tlob,
+ switch_in_tlob,
+ switch_out_tlob,
+ sched_wakeup_tlob,
+ trace_stop_tlob,
+ budget_expired_tlob,
+ event_max_tlob,
+};
+
+struct automaton_tlob {
+ char *state_names[state_max_tlob];
+ char *event_names[event_max_tlob];
+ unsigned char function[state_max_tlob][event_max_tlob];
+ unsigned char initial_state;
+ bool final_states[state_max_tlob];
+};
+
+static const struct automaton_tlob automaton_tlob = {
+ .state_names = {
+ "unmonitored",
+ "on_cpu",
+ "off_cpu",
+ },
+ .event_names = {
+ "trace_start",
+ "switch_in",
+ "switch_out",
+ "sched_wakeup",
+ "trace_stop",
+ "budget_expired",
+ },
+ .function = {
+ /* unmonitored */
+ {
+ on_cpu_tlob, /* trace_start */
+ unmonitored_tlob, /* switch_in */
+ unmonitored_tlob, /* switch_out */
+ unmonitored_tlob, /* sched_wakeup */
+ INVALID_STATE, /* trace_stop */
+ INVALID_STATE, /* budget_expired */
+ },
+ /* on_cpu */
+ {
+ INVALID_STATE, /* trace_start */
+ INVALID_STATE, /* switch_in */
+ off_cpu_tlob, /* switch_out */
+ on_cpu_tlob, /* sched_wakeup */
+ unmonitored_tlob, /* trace_stop */
+ unmonitored_tlob, /* budget_expired */
+ },
+ /* off_cpu */
+ {
+ INVALID_STATE, /* trace_start */
+ on_cpu_tlob, /* switch_in */
+ off_cpu_tlob, /* switch_out */
+ off_cpu_tlob, /* sched_wakeup */
+ unmonitored_tlob, /* trace_stop */
+ unmonitored_tlob, /* budget_expired */
+ },
+ },
+ /*
+ * final_states: unmonitored is the sole accepting state.
+ * Violations are recorded via tlob_event_push() and tlob_budget_exceeded.
+ */
+ .initial_state = unmonitored_tlob,
+ .final_states = { 1, 0, 0 },
+};
+
+/* Exported for use by the RV ioctl layer (rv_dev.c) */
+int tlob_start_task(struct task_struct *task, u64 threshold_us,
+ struct file *notify_file, u64 tag);
+int tlob_stop_task(struct task_struct *task);
+
+/* Maximum number of concurrently monitored tasks (also used by KUnit). */
+#define TLOB_MAX_MONITORED 64U
+
+/*
+ * Ring buffer constants (also published in UAPI for mmap size calculation).
+ */
+#define TLOB_RING_DEFAULT_CAP 64U /* records allocated at open() */
+#define TLOB_RING_MIN_CAP 8U /* minimum accepted by mmap() */
+#define TLOB_RING_MAX_CAP 4096U /* maximum accepted by mmap() */
+
+/**
+ * struct tlob_ring - per-fd mmap-capable violation ring buffer.
+ *
+ * Allocated as a contiguous page range at rv_open() time:
+ * page 0: struct tlob_mmap_page (shared with userspace)
+ * pages 1-N: struct tlob_event[capacity]
+ */
+struct tlob_ring {
+ struct tlob_mmap_page *page;
+ struct tlob_event *data;
+ u32 mask;
+ spinlock_t lock;
+ unsigned long base;
+ unsigned int order;
+};
+
+/**
+ * struct rv_file_priv - per-fd private data for /dev/rv.
+ */
+struct rv_file_priv {
+ struct tlob_ring ring;
+ wait_queue_head_t waitq;
+};
+
+#if IS_ENABLED(CONFIG_KUNIT)
+int tlob_init_monitor(void);
+void tlob_destroy_monitor(void);
+int tlob_enable_hooks(void);
+void tlob_disable_hooks(void);
+void tlob_event_push_kunit(struct rv_file_priv *priv,
+ const struct tlob_event *info);
+int tlob_parse_uprobe_line(char *buf, u64 *thr_out,
+ char **path_out,
+ loff_t *start_out, loff_t *stop_out);
+#endif /* CONFIG_KUNIT */
+
+#endif /* _RV_TLOB_H */
diff --git a/kernel/trace/rv/monitors/tlob/tlob_trace.h b/kernel/trace/rv/monitors/tlob/tlob_trace.h
new file mode 100644
index 000000000..b08d67776
--- /dev/null
+++ b/kernel/trace/rv/monitors/tlob/tlob_trace.h
@@ -0,0 +1,42 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Snippet to be included in rv_trace.h
+ */
+
+#ifdef CONFIG_RV_MON_TLOB
+/*
+ * tlob uses the generic event_da_monitor_id and error_da_monitor_id event
+ * classes so that both event classes are instantiated. This avoids a
+ * -Werror=unused-variable warning that the compiler emits when a
+ * DECLARE_EVENT_CLASS has no corresponding DEFINE_EVENT instance.
+ *
+ * The event_tlob tracepoint is defined here but the call-site in
+ * da_handle_event() is overridden with a no-op macro below so that no
+ * trace record is emitted on every scheduler context switch. Budget
+ * violations are reported via the dedicated tlob_budget_exceeded event.
+ *
+ * error_tlob IS kept active so that invalid DA transitions (programming
+ * errors) are still visible in the ftrace ring buffer for debugging.
+ */
+DEFINE_EVENT(event_da_monitor_id, event_tlob,
+ TP_PROTO(int id, char *state, char *event, char *next_state,
+ bool final_state),
+ TP_ARGS(id, state, event, next_state, final_state));
+
+DEFINE_EVENT(error_da_monitor_id, error_tlob,
+ TP_PROTO(int id, char *state, char *event),
+ TP_ARGS(id, state, event));
+
+/*
+ * Override the trace_event_tlob() call-site with a no-op after the
+ * DEFINE_EVENT above has satisfied the event class instantiation
+ * requirement. The tracepoint symbol itself exists (and can be enabled
+ * via tracefs) but the automatic call from da_handle_event() is silenced
+ * to avoid per-context-switch ftrace noise during normal operation.
+ */
+#undef trace_event_tlob
+#define trace_event_tlob(id, state, event, next_state, final_state) \
+ do { (void)(id); (void)(state); (void)(event); \
+ (void)(next_state); (void)(final_state); } while (0)
+#endif /* CONFIG_RV_MON_TLOB */
diff --git a/kernel/trace/rv/rv.c b/kernel/trace/rv/rv.c
index ee4e68102..e754e76d5 100644
--- a/kernel/trace/rv/rv.c
+++ b/kernel/trace/rv/rv.c
@@ -148,6 +148,10 @@
#include <rv_trace.h>
#endif
+#ifdef CONFIG_RV_MON_TLOB
+EXPORT_TRACEPOINT_SYMBOL_GPL(tlob_budget_exceeded);
+#endif
+
#include "rv.h"
DEFINE_MUTEX(rv_interface_lock);
diff --git a/kernel/trace/rv/rv_dev.c b/kernel/trace/rv/rv_dev.c
new file mode 100644
index 000000000..a052f3203
--- /dev/null
+++ b/kernel/trace/rv/rv_dev.c
@@ -0,0 +1,602 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * rv_dev.c - /dev/rv misc device for RV monitor self-instrumentation
+ *
+ * A single misc device (MISC_DYNAMIC_MINOR) serves all RV monitors.
+ * ioctl numbers encode the monitor identity:
+ *
+ * 0x01 - 0x1F tlob (task latency over budget)
+ * 0x20 - 0x3F reserved
+ *
+ * Each monitor exports tlob_start_task() / tlob_stop_task() which are
+ * called here. The calling task is identified by current.
+ *
+ * Magic: RV_IOC_MAGIC (0xB9), defined in include/uapi/linux/rv.h
+ *
+ * Per-fd private data (rv_file_priv)
+ * ------------------------------------
+ * Every open() of /dev/rv allocates an rv_file_priv (defined in tlob.h).
+ * When TLOB_IOCTL_TRACE_START is called with args.notify_fd >= 0, violations
+ * are pushed as tlob_event records into that fd's per-fd ring buffer (tlob_ring)
+ * and its poll/epoll waitqueue is woken.
+ *
+ * Consumers drain records with read() on the notify_fd; read() blocks until
+ * at least one record is available (unless O_NONBLOCK is set).
+ *
+ * Per-thread "started" tracking (tlob_task_handle)
+ * -------------------------------------------------
+ * tlob_stop_task() returns -ESRCH in two distinct situations:
+ *
+ * (a) The deadline timer already fired and removed the tlob hash-table
+ * entry before TRACE_STOP arrived -> budget was exceeded -> -EOVERFLOW
+ *
+ * (b) TRACE_START was never called for this thread -> programming error
+ * -> -ESRCH
+ *
+ * To distinguish them, rv_dev.c maintains a lightweight hash table
+ * (tlob_handles) that records a tlob_task_handle for every task_struct *
+ * for which a successful TLOB_IOCTL_TRACE_START has been
+ * issued but the corresponding TLOB_IOCTL_TRACE_STOP has not yet arrived.
+ *
+ * tlob_task_handle is a thin "session ticket" -- it carries only the
+ * task pointer and the owning file descriptor. The heavy per-task state
+ * (hrtimer, DA state, threshold) lives in tlob_task_state inside tlob.c.
+ *
+ * The table is keyed on task_struct * (same key as tlob.c), protected
+ * by tlob_handles_lock (spinlock, irq-safe). No get_task_struct()
+ * refcount is needed here because tlob.c already holds a reference for
+ * each live entry.
+ *
+ * Multiple threads may share the same fd. Each thread has its own
+ * tlob_task_handle in the table, so concurrent TRACE_START / TRACE_STOP
+ * calls from different threads do not interfere.
+ *
+ * The fd release path (rv_release) calls tlob_stop_task() for every
+ * handle in tlob_handles that belongs to the closing fd, ensuring cleanup
+ * even if the user forgets to call TRACE_STOP.
+ */
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/gfp.h>
+#include <linux/hash.h>
+#include <linux/mm.h>
+#include <linux/miscdevice.h>
+#include <linux/module.h>
+#include <linux/poll.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/uaccess.h>
+#include <uapi/linux/rv.h>
+
+#ifdef CONFIG_RV_MON_TLOB
+#include "monitors/tlob/tlob.h"
+#endif
+
+/* -----------------------------------------------------------------------
+ * tlob_task_handle - per-thread session ticket for the ioctl interface
+ *
+ * One handle is allocated by TLOB_IOCTL_TRACE_START and freed by
+ * TLOB_IOCTL_TRACE_STOP (or by rv_release if the fd is closed).
+ *
+ * @hlist: Hash-table linkage in tlob_handles (keyed on task pointer).
+ * @task: The monitored thread. Plain pointer; no refcount held here
+ * because tlob.c holds one for the lifetime of the monitoring
+ * window, which encompasses the lifetime of this handle.
+ * @file: The /dev/rv file descriptor that issued TRACE_START.
+ * Used by rv_release() to sweep orphaned handles on close().
+ * -----------------------------------------------------------------------
+ */
+#define TLOB_HANDLES_BITS 5
+#define TLOB_HANDLES_SIZE (1 << TLOB_HANDLES_BITS)
+
+struct tlob_task_handle {
+ struct hlist_node hlist;
+ struct task_struct *task;
+ struct file *file;
+};
+
+static struct hlist_head tlob_handles[TLOB_HANDLES_SIZE];
+static DEFINE_SPINLOCK(tlob_handles_lock);
+
+static unsigned int tlob_handle_hash(const struct task_struct *task)
+{
+ return hash_ptr((void *)task, TLOB_HANDLES_BITS);
+}
+
+/* Must be called with tlob_handles_lock held. */
+static struct tlob_task_handle *
+tlob_handle_find_locked(struct task_struct *task)
+{
+ struct tlob_task_handle *h;
+ unsigned int slot = tlob_handle_hash(task);
+
+ hlist_for_each_entry(h, &tlob_handles[slot], hlist) {
+ if (h->task == task)
+ return h;
+ }
+ return NULL;
+}
+
+/*
+ * tlob_handle_alloc - record that @task has an active monitoring session
+ * opened via @file.
+ *
+ * Returns 0 on success, -EEXIST if @task already has a handle (double
+ * TRACE_START without TRACE_STOP), -ENOMEM on allocation failure.
+ */
+static int tlob_handle_alloc(struct task_struct *task, struct file *file)
+{
+ struct tlob_task_handle *h;
+ unsigned long flags;
+ unsigned int slot;
+
+ h = kmalloc(sizeof(*h), GFP_KERNEL);
+ if (!h)
+ return -ENOMEM;
+ h->task = task;
+ h->file = file;
+
+ spin_lock_irqsave(&tlob_handles_lock, flags);
+ if (tlob_handle_find_locked(task)) {
+ spin_unlock_irqrestore(&tlob_handles_lock, flags);
+ kfree(h);
+ return -EEXIST;
+ }
+ slot = tlob_handle_hash(task);
+ hlist_add_head(&h->hlist, &tlob_handles[slot]);
+ spin_unlock_irqrestore(&tlob_handles_lock, flags);
+ return 0;
+}
+
+/*
+ * tlob_handle_free - remove the handle for @task and free it.
+ *
+ * Returns 1 if a handle existed (TRACE_START was called), 0 if not found
+ * (TRACE_START was never called for this thread).
+ */
+static int tlob_handle_free(struct task_struct *task)
+{
+ struct tlob_task_handle *h;
+ unsigned long flags;
+
+ spin_lock_irqsave(&tlob_handles_lock, flags);
+ h = tlob_handle_find_locked(task);
+ if (h) {
+ hlist_del_init(&h->hlist);
+ spin_unlock_irqrestore(&tlob_handles_lock, flags);
+ kfree(h);
+ return 1;
+ }
+ spin_unlock_irqrestore(&tlob_handles_lock, flags);
+ return 0;
+}
+
+/*
+ * tlob_handle_sweep_file - release all handles owned by @file.
+ *
+ * Called from rv_release() when the fd is closed without TRACE_STOP.
+ * Calls tlob_stop_task() for each orphaned handle to drain the tlob
+ * monitoring entries and prevent resource leaks in tlob.c.
+ *
+ * Handles are collected under the lock (short critical section), then
+ * processed outside it (tlob_stop_task() may sleep/spin internally).
+ */
+#ifdef CONFIG_RV_MON_TLOB
+static void tlob_handle_sweep_file(struct file *file)
+{
+ /* Bound by the monitor's slot count: handles can outnumber hash buckets. */
+ struct tlob_task_handle *batch[TLOB_MAX_MONITORED];
+ struct tlob_task_handle *h;
+ struct hlist_node *tmp;
+ unsigned long flags;
+ int i, n = 0;
+
+ spin_lock_irqsave(&tlob_handles_lock, flags);
+ for (i = 0; i < TLOB_HANDLES_SIZE; i++) {
+ hlist_for_each_entry_safe(h, tmp, &tlob_handles[i], hlist) {
+ if (h->file == file) {
+ hlist_del_init(&h->hlist);
+ batch[n++] = h;
+ }
+ }
+ }
+ spin_unlock_irqrestore(&tlob_handles_lock, flags);
+
+ for (i = 0; i < n; i++) {
+ /*
+ * Ignore -ESRCH: the deadline timer may have already fired
+ * and cleaned up the tlob entry.
+ */
+ tlob_stop_task(batch[i]->task);
+ kfree(batch[i]);
+ }
+}
+#else
+static inline void tlob_handle_sweep_file(struct file *file) {}
+#endif /* CONFIG_RV_MON_TLOB */
+
+/* -----------------------------------------------------------------------
+ * Ring buffer lifecycle
+ * -----------------------------------------------------------------------
+ */
+
+/*
+ * tlob_ring_alloc - allocate a ring of @cap records (must be a power of 2).
+ *
+ * Allocates a physically contiguous block of pages:
+ * page 0 : struct tlob_mmap_page (control page, shared with userspace)
+ * pages 1..N : struct tlob_event[cap] (data pages)
+ *
+ * Each page is marked reserved so it can be mapped to userspace via mmap().
+ */
+static int tlob_ring_alloc(struct tlob_ring *ring, u32 cap)
+{
+ unsigned int total = PAGE_SIZE + cap * sizeof(struct tlob_event);
+ unsigned int order = get_order(total);
+ unsigned long base;
+ unsigned int i;
+
+ base = __get_free_pages(GFP_KERNEL | __GFP_ZERO, order);
+ if (!base)
+ return -ENOMEM;
+
+ for (i = 0; i < (1u << order); i++)
+ SetPageReserved(virt_to_page((void *)(base + i * PAGE_SIZE)));
+
+ ring->base = base;
+ ring->order = order;
+ ring->page = (struct tlob_mmap_page *)base;
+ ring->data = (struct tlob_event *)(base + PAGE_SIZE);
+ ring->mask = cap - 1;
+ spin_lock_init(&ring->lock);
+
+ ring->page->capacity = cap;
+ ring->page->version = 1;
+ ring->page->data_offset = PAGE_SIZE;
+ ring->page->record_size = sizeof(struct tlob_event);
+ return 0;
+}
+
+static void tlob_ring_free(struct tlob_ring *ring)
+{
+ unsigned int i;
+
+ if (!ring->base)
+ return;
+
+ for (i = 0; i < (1u << ring->order); i++)
+ ClearPageReserved(virt_to_page((void *)(ring->base + i * PAGE_SIZE)));
+
+ free_pages(ring->base, ring->order);
+ ring->base = 0;
+ ring->page = NULL;
+ ring->data = NULL;
+}
+
+/* -----------------------------------------------------------------------
+ * File operations
+ * -----------------------------------------------------------------------
+ */
+
+static int rv_open(struct inode *inode, struct file *file)
+{
+ struct rv_file_priv *priv;
+ int ret;
+
+ priv = kzalloc(sizeof(*priv), GFP_KERNEL);
+ if (!priv)
+ return -ENOMEM;
+
+ ret = tlob_ring_alloc(&priv->ring, TLOB_RING_DEFAULT_CAP);
+ if (ret) {
+ kfree(priv);
+ return ret;
+ }
+
+ init_waitqueue_head(&priv->waitq);
+ file->private_data = priv;
+ return 0;
+}
+
+static int rv_release(struct inode *inode, struct file *file)
+{
+ struct rv_file_priv *priv = file->private_data;
+
+ tlob_handle_sweep_file(file);
+ tlob_ring_free(&priv->ring);
+ kfree(priv);
+ file->private_data = NULL;
+ return 0;
+}
+
+static __poll_t rv_poll(struct file *file, poll_table *wait)
+{
+ struct rv_file_priv *priv = file->private_data;
+
+ if (!priv)
+ return EPOLLERR;
+
+ poll_wait(file, &priv->waitq, wait);
+
+ /*
+ * Pairs with smp_store_release(&ring->page->data_head, ...) in
+ * tlob_event_push(). No lock needed: head is written by the kernel
+ * producer and read here; tail is written by the consumer and we only
+ * need an approximate check for the poll fast path.
+ */
+ if (smp_load_acquire(&priv->ring.page->data_head) !=
+ READ_ONCE(priv->ring.page->data_tail))
+ return EPOLLIN | EPOLLRDNORM;
+
+ return 0;
+}
+
+/*
+ * rv_read - consume tlob_event violation records from this fd's ring buffer.
+ *
+ * Each read() returns a whole number of struct tlob_event records. @count must
+ * be at least sizeof(struct tlob_event); partial-record sizes are rejected with
+ * -EINVAL.
+ *
+ * Blocking behaviour follows O_NONBLOCK on the fd:
+ * O_NONBLOCK clear: blocks until at least one record is available.
+ * O_NONBLOCK set: returns -EAGAIN immediately if the ring is empty.
+ *
+ * Returns the number of bytes copied (always a multiple of sizeof tlob_event),
+ * -EAGAIN if non-blocking and empty, or a negative error code.
+ *
+ * read() and mmap() share the same ring and data_tail cursor; do not use
+ * both simultaneously on the same fd.
+ */
+static ssize_t rv_read(struct file *file, char __user *buf, size_t count,
+ loff_t *ppos)
+{
+ struct rv_file_priv *priv = file->private_data;
+ struct tlob_ring *ring;
+ size_t rec = sizeof(struct tlob_event);
+ unsigned long irqflags;
+ ssize_t done = 0;
+ int ret;
+
+ if (!priv)
+ return -ENODEV;
+
+ ring = &priv->ring;
+
+ if (count < rec)
+ return -EINVAL;
+
+ /* Blocking path: sleep until the producer advances data_head. */
+ if (!(file->f_flags & O_NONBLOCK)) {
+ ret = wait_event_interruptible(priv->waitq,
+ /* pairs with smp_store_release() in the producer */
+ smp_load_acquire(&ring->page->data_head) !=
+ READ_ONCE(ring->page->data_tail));
+ if (ret)
+ return ret;
+ }
+
+ /*
+ * Drain records into the caller's buffer. ring->lock serialises
+ * concurrent read() callers and the softirq producer.
+ */
+ while (done + rec <= count) {
+ struct tlob_event record;
+ u32 head, tail;
+
+ spin_lock_irqsave(&ring->lock, irqflags);
+ /* pairs with smp_store_release() in the producer */
+ head = smp_load_acquire(&ring->page->data_head);
+ tail = ring->page->data_tail;
+ if (head == tail) {
+ spin_unlock_irqrestore(&ring->lock, irqflags);
+ break;
+ }
+ record = ring->data[tail & ring->mask];
+ WRITE_ONCE(ring->page->data_tail, tail + 1);
+ spin_unlock_irqrestore(&ring->lock, irqflags);
+
+ if (copy_to_user(buf + done, &record, rec))
+ return done ? done : -EFAULT;
+ done += rec;
+ }
+
+ return done ? done : -EAGAIN;
+}
+
+/*
+ * rv_mmap - map the per-fd violation ring buffer into userspace.
+ *
+ * The mmap region covers the full ring allocation:
+ *
+ * offset 0 : struct tlob_mmap_page (control page)
+ * offset PAGE_SIZE : struct tlob_event[capacity] (data pages)
+ *
+ * The caller must map exactly PAGE_SIZE + capacity * sizeof(struct tlob_event)
+ * bytes starting at offset 0 (vm_pgoff must be 0). The actual capacity is
+ * read from tlob_mmap_page.capacity after a successful mmap(2).
+ *
+ * Private mappings (MAP_PRIVATE) are rejected: the shared data_tail field
+ * written by userspace must be visible to the kernel producer.
+ */
+static int rv_mmap(struct file *file, struct vm_area_struct *vma)
+{
+ struct rv_file_priv *priv = file->private_data;
+ struct tlob_ring *ring;
+ unsigned long size = vma->vm_end - vma->vm_start;
+ unsigned long ring_size;
+
+ if (!priv)
+ return -ENODEV;
+
+ ring = &priv->ring;
+
+ if (vma->vm_pgoff != 0)
+ return -EINVAL;
+
+ ring_size = PAGE_ALIGN(PAGE_SIZE + ((unsigned long)(ring->mask + 1) *
+ sizeof(struct tlob_event)));
+ if (size != ring_size)
+ return -EINVAL;
+
+ if (!(vma->vm_flags & VM_SHARED))
+ return -EINVAL;
+
+ return remap_pfn_range(vma, vma->vm_start,
+ page_to_pfn(virt_to_page((void *)ring->base)),
+ ring_size, vma->vm_page_prot);
+}
+
+/* -----------------------------------------------------------------------
+ * ioctl dispatcher
+ * -----------------------------------------------------------------------
+ */
+
+static long rv_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+{
+ unsigned int nr = _IOC_NR(cmd);
+
+ /*
+ * Verify the magic byte so we don't accidentally handle ioctls
+ * intended for a different device.
+ */
+ if (_IOC_TYPE(cmd) != RV_IOC_MAGIC)
+ return -ENOTTY;
+
+#ifdef CONFIG_RV_MON_TLOB
+ /* tlob: ioctl numbers 0x01 - 0x1F */
+ switch (cmd) {
+ case TLOB_IOCTL_TRACE_START: {
+ struct tlob_start_args args;
+ struct file *notify_file = NULL;
+ int ret, hret;
+
+ if (copy_from_user(&args,
+ (struct tlob_start_args __user *)arg,
+ sizeof(args)))
+ return -EFAULT;
+ if (args.threshold_us == 0)
+ return -EINVAL;
+ if (args.flags != 0)
+ return -EINVAL;
+
+ /*
+ * If notify_fd >= 0, resolve it to a file pointer.
+ * fget() bumps the reference count; tlob.c drops it
+ * via fput() when the monitoring window ends.
+ * Reject non-/dev/rv fds to prevent type confusion.
+ */
+ if (args.notify_fd >= 0) {
+ notify_file = fget(args.notify_fd);
+ if (!notify_file)
+ return -EBADF;
+ if (notify_file->f_op != file->f_op) {
+ fput(notify_file);
+ return -EINVAL;
+ }
+ }
+
+ ret = tlob_start_task(current, args.threshold_us,
+ notify_file, args.tag);
+ if (ret != 0) {
+ /* tlob.c did not take ownership; drop ref. */
+ if (notify_file)
+ fput(notify_file);
+ return ret;
+ }
+
+ /*
+ * Record session handle. Free any stale handle left by
+ * a previous window whose deadline timer fired (timer
+ * removes tlob_task_state but cannot touch tlob_handles).
+ */
+ tlob_handle_free(current);
+ hret = tlob_handle_alloc(current, file);
+ if (hret < 0) {
+ tlob_stop_task(current);
+ return hret;
+ }
+ return 0;
+ }
+ case TLOB_IOCTL_TRACE_STOP: {
+ int had_handle;
+ int ret;
+
+ /*
+ * Atomically remove the session handle for current.
+ *
+ * had_handle == 0: TRACE_START was never called for
+ * this thread -> caller bug -> -ESRCH
+ *
+ * had_handle == 1: TRACE_START was called. If
+ * tlob_stop_task() now returns
+ * -ESRCH, the deadline timer already
+ * fired -> budget exceeded -> -EOVERFLOW
+ */
+ had_handle = tlob_handle_free(current);
+ if (!had_handle)
+ return -ESRCH;
+
+ ret = tlob_stop_task(current);
+ return (ret == -ESRCH) ? -EOVERFLOW : ret;
+ }
+ default:
+ break;
+ }
+#endif /* CONFIG_RV_MON_TLOB */
+
+ return -ENOTTY;
+}
+
+/* -----------------------------------------------------------------------
+ * Module init / exit
+ * -----------------------------------------------------------------------
+ */
+
+static const struct file_operations rv_fops = {
+ .owner = THIS_MODULE,
+ .open = rv_open,
+ .release = rv_release,
+ .read = rv_read,
+ .poll = rv_poll,
+ .mmap = rv_mmap,
+ .unlocked_ioctl = rv_ioctl,
+#ifdef CONFIG_COMPAT
+ .compat_ioctl = compat_ptr_ioctl,
+#endif
+ .llseek = noop_llseek,
+};
+
+/*
+ * 0666: /dev/rv is a self-instrumentation device. All ioctls operate
+ * exclusively on the calling task (current); no task can monitor another
+ * via this interface. Opening the device does not grant any privilege
+ * beyond observing one's own latency, so world-read/write is appropriate.
+ */
+static struct miscdevice rv_miscdev = {
+ .minor = MISC_DYNAMIC_MINOR,
+ .name = "rv",
+ .fops = &rv_fops,
+ .mode = 0666,
+};
+
+static int __init rv_ioctl_init(void)
+{
+ int i;
+
+ for (i = 0; i < TLOB_HANDLES_SIZE; i++)
+ INIT_HLIST_HEAD(&tlob_handles[i]);
+
+ return misc_register(&rv_miscdev);
+}
+
+static void __exit rv_ioctl_exit(void)
+{
+ misc_deregister(&rv_miscdev);
+}
+
+module_init(rv_ioctl_init);
+module_exit(rv_ioctl_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("RV ioctl interface via /dev/rv");
diff --git a/kernel/trace/rv/rv_trace.h b/kernel/trace/rv/rv_trace.h
index 4a6faddac..65d6c6485 100644
--- a/kernel/trace/rv/rv_trace.h
+++ b/kernel/trace/rv/rv_trace.h
@@ -126,6 +126,7 @@ DECLARE_EVENT_CLASS(error_da_monitor_id,
#include <monitors/snroc/snroc_trace.h>
#include <monitors/nrp/nrp_trace.h>
#include <monitors/sssw/sssw_trace.h>
+#include <monitors/tlob/tlob_trace.h>
// Add new monitors based on CONFIG_DA_MON_EVENTS_ID here
#endif /* CONFIG_DA_MON_EVENTS_ID */
@@ -202,6 +203,55 @@ TRACE_EVENT(rv_retries_error,
__get_str(event), __get_str(name))
);
#endif /* CONFIG_RV_MON_MAINTENANCE_EVENTS */
+
+#ifdef CONFIG_RV_MON_TLOB
+/*
+ * tlob_budget_exceeded - emitted when a monitored task exceeds its latency
+ * budget. Carries the on-CPU / off-CPU time breakdown so that the cause
+ * of the overrun (CPU-bound vs. scheduling/I/O latency) is immediately
+ * visible in the ftrace ring buffer without post-processing.
+ */
+TRACE_EVENT(tlob_budget_exceeded,
+
+ TP_PROTO(struct task_struct *task, u64 threshold_us,
+ u64 on_cpu_us, u64 off_cpu_us, u32 switches,
+ bool state_is_on_cpu, u64 tag),
+
+ TP_ARGS(task, threshold_us, on_cpu_us, off_cpu_us, switches,
+ state_is_on_cpu, tag),
+
+ TP_STRUCT__entry(
+ __string(comm, task->comm)
+ __field(pid_t, pid)
+ __field(u64, threshold_us)
+ __field(u64, on_cpu_us)
+ __field(u64, off_cpu_us)
+ __field(u32, switches)
+ __field(bool, state_is_on_cpu)
+ __field(u64, tag)
+ ),
+
+ TP_fast_assign(
+ __assign_str(comm);
+ __entry->pid = task->pid;
+ __entry->threshold_us = threshold_us;
+ __entry->on_cpu_us = on_cpu_us;
+ __entry->off_cpu_us = off_cpu_us;
+ __entry->switches = switches;
+ __entry->state_is_on_cpu = state_is_on_cpu;
+ __entry->tag = tag;
+ ),
+
+ TP_printk("%s[%d]: budget exceeded threshold=%llu on_cpu=%llu off_cpu=%llu switches=%u state=%s tag=0x%016llx",
+ __get_str(comm), __entry->pid,
+ __entry->threshold_us,
+ __entry->on_cpu_us, __entry->off_cpu_us,
+ __entry->switches,
+ __entry->state_is_on_cpu ? "on_cpu" : "off_cpu",
+ __entry->tag)
+);
+#endif /* CONFIG_RV_MON_TLOB */
+
#endif /* _TRACE_RV_H */
/* This part must be outside protection */
--
2.43.0
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [RFC PATCH 3/4] rv/tlob: Add KUnit tests for the tlob monitor
2026-04-12 19:27 [RFC PATCH 0/4] rv/tlob: Add task latency over budget RV monitor wen.yang
2026-04-12 19:27 ` [RFC PATCH 1/4] rv/tlob: Add tlob model DOT file wen.yang
2026-04-12 19:27 ` [RFC PATCH 2/4] rv/tlob: Add tlob deterministic automaton monitor wen.yang
@ 2026-04-12 19:27 ` wen.yang
2026-04-12 19:27 ` [RFC PATCH 4/4] selftests/rv: Add selftest " wen.yang
3 siblings, 0 replies; 7+ messages in thread
From: wen.yang @ 2026-04-12 19:27 UTC (permalink / raw)
To: Steven Rostedt, Gabriele Monaco, Masami Hiramatsu,
Mathieu Desnoyers
Cc: linux-trace-kernel, linux-kernel, Wen Yang
From: Wen Yang <wen.yang@linux.dev>
Add six KUnit test suites gated behind CONFIG_TLOB_KUNIT_TEST
(depends on RV_MON_TLOB && KUNIT; default KUNIT_ALL_TESTS).
A .kunitconfig fragment is provided for the kunit.py runner.
Coverage: automaton state transitions and self-loops; start/stop API
error paths (duplicate start, missing start, overflow threshold,
table-full, immediate deadline); scheduler context-switch accounting
for on/off-CPU time; violation tracepoint payload fields; ring buffer
push, drop-new overflow, and wakeup; and the uprobe line parser.
Signed-off-by: Wen Yang <wen.yang@linux.dev>
---
kernel/trace/rv/Makefile | 1 +
kernel/trace/rv/monitors/tlob/.kunitconfig | 5 +
kernel/trace/rv/monitors/tlob/Kconfig | 12 +
kernel/trace/rv/monitors/tlob/tlob.c | 1 +
kernel/trace/rv/monitors/tlob/tlob_kunit.c | 1194 ++++++++++++++++++++
5 files changed, 1213 insertions(+)
create mode 100644 kernel/trace/rv/monitors/tlob/.kunitconfig
create mode 100644 kernel/trace/rv/monitors/tlob/tlob_kunit.c
diff --git a/kernel/trace/rv/Makefile b/kernel/trace/rv/Makefile
index cc3781a3b..6d963207d 100644
--- a/kernel/trace/rv/Makefile
+++ b/kernel/trace/rv/Makefile
@@ -19,6 +19,7 @@ obj-$(CONFIG_RV_MON_NRP) += monitors/nrp/nrp.o
obj-$(CONFIG_RV_MON_SSSW) += monitors/sssw/sssw.o
obj-$(CONFIG_RV_MON_OPID) += monitors/opid/opid.o
obj-$(CONFIG_RV_MON_TLOB) += monitors/tlob/tlob.o
+obj-$(CONFIG_TLOB_KUNIT_TEST) += monitors/tlob/tlob_kunit.o
# Add new monitors here
obj-$(CONFIG_RV_REACTORS) += rv_reactors.o
obj-$(CONFIG_RV_REACT_PRINTK) += reactor_printk.o
diff --git a/kernel/trace/rv/monitors/tlob/.kunitconfig b/kernel/trace/rv/monitors/tlob/.kunitconfig
new file mode 100644
index 000000000..977c58601
--- /dev/null
+++ b/kernel/trace/rv/monitors/tlob/.kunitconfig
@@ -0,0 +1,5 @@
+CONFIG_FTRACE=y
+CONFIG_KUNIT=y
+CONFIG_RV=y
+CONFIG_RV_MON_TLOB=y
+CONFIG_TLOB_KUNIT_TEST=y
diff --git a/kernel/trace/rv/monitors/tlob/Kconfig b/kernel/trace/rv/monitors/tlob/Kconfig
index 010237480..4ccd2f881 100644
--- a/kernel/trace/rv/monitors/tlob/Kconfig
+++ b/kernel/trace/rv/monitors/tlob/Kconfig
@@ -49,3 +49,15 @@ config RV_MON_TLOB
For further information, see:
Documentation/trace/rv/monitor_tlob.rst
+config TLOB_KUNIT_TEST
+ tristate "KUnit tests for tlob monitor" if !KUNIT_ALL_TESTS
+ depends on RV_MON_TLOB && KUNIT
+ default KUNIT_ALL_TESTS
+ help
+ Enable KUnit in-kernel unit tests for the tlob RV monitor.
+
+ Tests cover automaton state transitions, the hash table helpers,
+ the start/stop task interface, and the event ring buffer including
+ overflow handling and wakeup behaviour.
+
+ Say Y or M here to run the tlob KUnit test suite; otherwise say N.
diff --git a/kernel/trace/rv/monitors/tlob/tlob.c b/kernel/trace/rv/monitors/tlob/tlob.c
index a6e474025..dd959eb9b 100644
--- a/kernel/trace/rv/monitors/tlob/tlob.c
+++ b/kernel/trace/rv/monitors/tlob/tlob.c
@@ -784,6 +784,7 @@ VISIBLE_IF_KUNIT int tlob_parse_uprobe_line(char *buf, u64 *thr_out,
*path_out = buf + n;
return 0;
}
+EXPORT_SYMBOL_IF_KUNIT(tlob_parse_uprobe_line);
static ssize_t tlob_monitor_write(struct file *file,
const char __user *ubuf,
diff --git a/kernel/trace/rv/monitors/tlob/tlob_kunit.c b/kernel/trace/rv/monitors/tlob/tlob_kunit.c
new file mode 100644
index 000000000..64f5abb34
--- /dev/null
+++ b/kernel/trace/rv/monitors/tlob/tlob_kunit.c
@@ -0,0 +1,1194 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * KUnit tests for the tlob RV monitor.
+ *
+ * tlob_automaton: DA transition table coverage.
+ * tlob_task_api: tlob_start_task()/tlob_stop_task() lifecycle and errors.
+ * tlob_sched_integration: on/off-CPU accounting across real context switches.
+ * tlob_trace_output: tlob_budget_exceeded tracepoint field verification.
+ * tlob_event_buf: ring buffer push, overflow, and wakeup.
+ * tlob_parse_uprobe: uprobe format string parser acceptance and rejection.
+ *
+ * The duplicate-(binary, offset_start) constraint enforced by tlob_add_uprobe()
+ * is not covered here: that function calls kern_path() and requires a real
+ * filesystem, which is outside the scope of unit tests. It is covered by the
+ * uprobe_duplicate_offset case in tools/testing/selftests/rv/test_tlob.sh.
+ */
+#include <kunit/test.h>
+#include <linux/atomic.h>
+#include <linux/completion.h>
+#include <linux/delay.h>
+#include <linux/kthread.h>
+#include <linux/ktime.h>
+#include <linux/mutex.h>
+#include <linux/sched.h>
+#include <linux/sched/task.h>
+#include <linux/tracepoint.h>
+
+/*
+ * Pull in the rv tracepoint declarations so that
+ * register_trace_tlob_budget_exceeded() is available.
+ * No CREATE_TRACE_POINTS here -- the tracepoint implementation lives in rv.c.
+ */
+#include <rv_trace.h>
+
+#include "tlob.h"
+
+/*
+ * da_handle_event_tlob - apply one automaton transition on @da_mon.
+ *
+ * This helper is used only by the KUnit automaton suite. It applies the
+ * tlob transition table directly on a supplied da_monitor without touching
+ * per-task slots, tracepoints, or timers.
+ */
+static void da_handle_event_tlob(struct da_monitor *da_mon,
+ enum events_tlob event)
+{
+ enum states_tlob curr_state = (enum states_tlob)da_mon->curr_state;
+ enum states_tlob next_state =
+ (enum states_tlob)automaton_tlob.function[curr_state][event];
+
+ if (next_state != INVALID_STATE)
+ da_mon->curr_state = next_state;
+}
+
+MODULE_IMPORT_NS("EXPORTED_FOR_KUNIT_TESTING");
+
+/*
+ * Suite 1: automaton state-machine transitions
+ */
+
+/* unmonitored -> trace_start -> on_cpu */
+static void tlob_unmonitored_to_on_cpu(struct kunit *test)
+{
+ struct da_monitor mon = { .curr_state = unmonitored_tlob };
+
+ da_handle_event_tlob(&mon, trace_start_tlob);
+ KUNIT_EXPECT_EQ(test, (int)mon.curr_state, (int)on_cpu_tlob);
+}
+
+/* on_cpu -> switch_out -> off_cpu */
+static void tlob_on_cpu_switch_out(struct kunit *test)
+{
+ struct da_monitor mon = { .curr_state = on_cpu_tlob };
+
+ da_handle_event_tlob(&mon, switch_out_tlob);
+ KUNIT_EXPECT_EQ(test, (int)mon.curr_state, (int)off_cpu_tlob);
+}
+
+/* off_cpu -> switch_in -> on_cpu */
+static void tlob_off_cpu_switch_in(struct kunit *test)
+{
+ struct da_monitor mon = { .curr_state = off_cpu_tlob };
+
+ da_handle_event_tlob(&mon, switch_in_tlob);
+ KUNIT_EXPECT_EQ(test, (int)mon.curr_state, (int)on_cpu_tlob);
+}
+
+/* on_cpu -> budget_expired -> unmonitored */
+static void tlob_on_cpu_budget_expired(struct kunit *test)
+{
+ struct da_monitor mon = { .curr_state = on_cpu_tlob };
+
+ da_handle_event_tlob(&mon, budget_expired_tlob);
+ KUNIT_EXPECT_EQ(test, (int)mon.curr_state, (int)unmonitored_tlob);
+}
+
+/* off_cpu -> budget_expired -> unmonitored */
+static void tlob_off_cpu_budget_expired(struct kunit *test)
+{
+ struct da_monitor mon = { .curr_state = off_cpu_tlob };
+
+ da_handle_event_tlob(&mon, budget_expired_tlob);
+ KUNIT_EXPECT_EQ(test, (int)mon.curr_state, (int)unmonitored_tlob);
+}
+
+/* on_cpu -> trace_stop -> unmonitored */
+static void tlob_on_cpu_trace_stop(struct kunit *test)
+{
+ struct da_monitor mon = { .curr_state = on_cpu_tlob };
+
+ da_handle_event_tlob(&mon, trace_stop_tlob);
+ KUNIT_EXPECT_EQ(test, (int)mon.curr_state, (int)unmonitored_tlob);
+}
+
+/* off_cpu -> trace_stop -> unmonitored */
+static void tlob_off_cpu_trace_stop(struct kunit *test)
+{
+ struct da_monitor mon = { .curr_state = off_cpu_tlob };
+
+ da_handle_event_tlob(&mon, trace_stop_tlob);
+ KUNIT_EXPECT_EQ(test, (int)mon.curr_state, (int)unmonitored_tlob);
+}
+
+/* budget_expired -> unmonitored; a single trace_start re-enters on_cpu. */
+static void tlob_violation_then_restart(struct kunit *test)
+{
+ struct da_monitor mon = { .curr_state = unmonitored_tlob };
+
+ da_handle_event_tlob(&mon, trace_start_tlob);
+ KUNIT_EXPECT_EQ(test, (int)mon.curr_state, (int)on_cpu_tlob);
+
+ da_handle_event_tlob(&mon, budget_expired_tlob);
+ KUNIT_EXPECT_EQ(test, (int)mon.curr_state, (int)unmonitored_tlob);
+
+ /* Single trace_start is sufficient to re-enter on_cpu */
+ da_handle_event_tlob(&mon, trace_start_tlob);
+ KUNIT_EXPECT_EQ(test, (int)mon.curr_state, (int)on_cpu_tlob);
+
+ da_handle_event_tlob(&mon, trace_stop_tlob);
+ KUNIT_EXPECT_EQ(test, (int)mon.curr_state, (int)unmonitored_tlob);
+}
+
+/* off_cpu self-loops on switch_out and sched_wakeup. */
+static void tlob_off_cpu_self_loops(struct kunit *test)
+{
+ static const enum events_tlob events[] = {
+ switch_out_tlob, sched_wakeup_tlob,
+ };
+ unsigned int i;
+
+ for (i = 0; i < ARRAY_SIZE(events); i++) {
+ struct da_monitor mon = { .curr_state = off_cpu_tlob };
+
+ da_handle_event_tlob(&mon, events[i]);
+ KUNIT_EXPECT_EQ_MSG(test, (int)mon.curr_state,
+ (int)off_cpu_tlob,
+ "event %u should self-loop in off_cpu",
+ events[i]);
+ }
+}
+
+/* on_cpu self-loops on sched_wakeup. */
+static void tlob_on_cpu_self_loops(struct kunit *test)
+{
+ struct da_monitor mon = { .curr_state = on_cpu_tlob };
+
+ da_handle_event_tlob(&mon, sched_wakeup_tlob);
+ KUNIT_EXPECT_EQ_MSG(test, (int)mon.curr_state, (int)on_cpu_tlob,
+ "sched_wakeup should self-loop in on_cpu");
+}
+
+/* Scheduling events in unmonitored self-loop (no state change). */
+static void tlob_unmonitored_ignores_sched(struct kunit *test)
+{
+ static const enum events_tlob events[] = {
+ switch_in_tlob, switch_out_tlob, sched_wakeup_tlob,
+ };
+ unsigned int i;
+
+ for (i = 0; i < ARRAY_SIZE(events); i++) {
+ struct da_monitor mon = { .curr_state = unmonitored_tlob };
+
+ da_handle_event_tlob(&mon, events[i]);
+ KUNIT_EXPECT_EQ_MSG(test, (int)mon.curr_state,
+ (int)unmonitored_tlob,
+ "event %u should self-loop in unmonitored",
+ events[i]);
+ }
+}
+
+static void tlob_full_happy_path(struct kunit *test)
+{
+ struct da_monitor mon = { .curr_state = unmonitored_tlob };
+
+ da_handle_event_tlob(&mon, trace_start_tlob);
+ KUNIT_EXPECT_EQ(test, (int)mon.curr_state, (int)on_cpu_tlob);
+
+ da_handle_event_tlob(&mon, switch_out_tlob);
+ KUNIT_EXPECT_EQ(test, (int)mon.curr_state, (int)off_cpu_tlob);
+
+ da_handle_event_tlob(&mon, switch_in_tlob);
+ KUNIT_EXPECT_EQ(test, (int)mon.curr_state, (int)on_cpu_tlob);
+
+ da_handle_event_tlob(&mon, trace_stop_tlob);
+ KUNIT_EXPECT_EQ(test, (int)mon.curr_state, (int)unmonitored_tlob);
+}
+
+static void tlob_multiple_switches(struct kunit *test)
+{
+ struct da_monitor mon = { .curr_state = unmonitored_tlob };
+ int i;
+
+ da_handle_event_tlob(&mon, trace_start_tlob);
+ KUNIT_EXPECT_EQ(test, (int)mon.curr_state, (int)on_cpu_tlob);
+
+ for (i = 0; i < 3; i++) {
+ da_handle_event_tlob(&mon, switch_out_tlob);
+ KUNIT_EXPECT_EQ(test, (int)mon.curr_state, (int)off_cpu_tlob);
+ da_handle_event_tlob(&mon, switch_in_tlob);
+ KUNIT_EXPECT_EQ(test, (int)mon.curr_state, (int)on_cpu_tlob);
+ }
+
+ da_handle_event_tlob(&mon, trace_stop_tlob);
+ KUNIT_EXPECT_EQ(test, (int)mon.curr_state, (int)unmonitored_tlob);
+}
+
+static struct kunit_case tlob_automaton_cases[] = {
+ KUNIT_CASE(tlob_unmonitored_to_on_cpu),
+ KUNIT_CASE(tlob_on_cpu_switch_out),
+ KUNIT_CASE(tlob_off_cpu_switch_in),
+ KUNIT_CASE(tlob_on_cpu_budget_expired),
+ KUNIT_CASE(tlob_off_cpu_budget_expired),
+ KUNIT_CASE(tlob_on_cpu_trace_stop),
+ KUNIT_CASE(tlob_off_cpu_trace_stop),
+ KUNIT_CASE(tlob_off_cpu_self_loops),
+ KUNIT_CASE(tlob_on_cpu_self_loops),
+ KUNIT_CASE(tlob_unmonitored_ignores_sched),
+ KUNIT_CASE(tlob_full_happy_path),
+ KUNIT_CASE(tlob_violation_then_restart),
+ KUNIT_CASE(tlob_multiple_switches),
+ {}
+};
+
+static struct kunit_suite tlob_automaton_suite = {
+ .name = "tlob_automaton",
+ .test_cases = tlob_automaton_cases,
+};
+
+/*
+ * Suite 2: task registration API
+ */
+
+/* Basic start/stop cycle */
+static void tlob_start_stop_ok(struct kunit *test)
+{
+ int ret;
+
+ ret = tlob_start_task(current, 10000000 /* 10 s, won't fire */, NULL, 0);
+ KUNIT_ASSERT_EQ(test, ret, 0);
+ KUNIT_EXPECT_EQ(test, tlob_stop_task(current), 0);
+}
+
+/* Double start must return -EEXIST. */
+static void tlob_double_start(struct kunit *test)
+{
+ KUNIT_ASSERT_EQ(test, tlob_start_task(current, 10000000, NULL, 0), 0);
+ KUNIT_EXPECT_EQ(test, tlob_start_task(current, 10000000, NULL, 0), -EEXIST);
+ tlob_stop_task(current);
+}
+
+/* Stop without start must return -ESRCH. */
+static void tlob_stop_without_start(struct kunit *test)
+{
+ tlob_stop_task(current); /* clear any stale entry first */
+ KUNIT_EXPECT_EQ(test, tlob_stop_task(current), -ESRCH);
+}
+
+/*
+ * A 1 us budget fires before tlob_stop_task() is called. Either the
+ * timer wins (-ESRCH) or we are very fast (0); both are valid.
+ */
+static void tlob_immediate_deadline(struct kunit *test)
+{
+ int ret = tlob_start_task(current, 1 /* 1 us - fires almost immediately */, NULL, 0);
+
+ KUNIT_ASSERT_EQ(test, ret, 0);
+ /* Let the 1 us timer fire */
+ udelay(100);
+ /*
+ * By now the hrtimer has almost certainly fired. Either it has
+ * (returns -ESRCH) or we were very fast (returns 0). Both are
+ * acceptable; just ensure no crash and the table is clean after.
+ */
+ ret = tlob_stop_task(current);
+ KUNIT_EXPECT_TRUE(test, ret == 0 || ret == -ESRCH);
+}
+
+/*
+ * Fill the table to TLOB_MAX_MONITORED using kthreads (each needs a
+ * distinct task_struct), then verify the next start returns -ENOSPC.
+ */
+struct tlob_waiter_ctx {
+ struct completion start;
+ struct completion done;
+};
+
+static int tlob_waiter_fn(void *arg)
+{
+ struct tlob_waiter_ctx *ctx = arg;
+
+ wait_for_completion(&ctx->start);
+ complete(&ctx->done);
+ return 0;
+}
+
+static void tlob_enospc(struct kunit *test)
+{
+ struct tlob_waiter_ctx *ctxs;
+ struct task_struct **threads;
+ int i, ret;
+
+ ctxs = kunit_kcalloc(test, TLOB_MAX_MONITORED,
+ sizeof(*ctxs), GFP_KERNEL);
+ KUNIT_ASSERT_NOT_NULL(test, ctxs);
+
+ threads = kunit_kcalloc(test, TLOB_MAX_MONITORED,
+ sizeof(*threads), GFP_KERNEL);
+ KUNIT_ASSERT_NOT_NULL(test, threads);
+
+ /* Start TLOB_MAX_MONITORED kthreads and monitor each */
+ for (i = 0; i < TLOB_MAX_MONITORED; i++) {
+ init_completion(&ctxs[i].start);
+ init_completion(&ctxs[i].done);
+
+ threads[i] = kthread_run(tlob_waiter_fn, &ctxs[i],
+ "tlob_waiter_%d", i);
+ if (IS_ERR(threads[i])) {
+ KUNIT_FAIL(test, "kthread_run failed at i=%d", i);
+ threads[i] = NULL;
+ goto cleanup;
+ }
+ get_task_struct(threads[i]);
+
+ ret = tlob_start_task(threads[i], 10000000, NULL, 0);
+ if (ret != 0) {
+ KUNIT_FAIL(test, "tlob_start_task failed at i=%d: %d",
+ i, ret);
+ /* threads[i] stays set; cleanup unblocks, stops and releases it */
+ goto cleanup;
+ }
+ }
+
+ /* The table is now full: one more must fail with -ENOSPC */
+ ret = tlob_start_task(current, 10000000, NULL, 0);
+ KUNIT_EXPECT_EQ(test, ret, -ENOSPC);
+
+cleanup:
+ /*
+ * Two-pass cleanup: cancel tlob monitoring and unblock kthreads first,
+ * then kthread_stop() to wait for full exit before releasing refs.
+ */
+ for (i = 0; i < TLOB_MAX_MONITORED; i++) {
+ if (!threads[i])
+ break;
+ tlob_stop_task(threads[i]);
+ complete(&ctxs[i].start);
+ }
+ for (i = 0; i < TLOB_MAX_MONITORED; i++) {
+ if (!threads[i])
+ break;
+ kthread_stop(threads[i]);
+ put_task_struct(threads[i]);
+ }
+}
+
+/*
+ * A kthread holds a mutex for 80 ms; arm a 10 ms budget, burn ~1 ms
+ * on-CPU, then block on the mutex. The timer fires off-CPU; stop
+ * must return -ESRCH.
+ */
+struct tlob_holder_ctx {
+ struct mutex lock;
+ struct completion ready;
+ unsigned int hold_ms;
+};
+
+static int tlob_holder_fn(void *arg)
+{
+ struct tlob_holder_ctx *ctx = arg;
+
+ mutex_lock(&ctx->lock);
+ complete(&ctx->ready);
+ msleep(ctx->hold_ms);
+ mutex_unlock(&ctx->lock);
+ return 0;
+}
+
+static void tlob_deadline_fires_off_cpu(struct kunit *test)
+{
+ struct tlob_holder_ctx ctx = { .hold_ms = 80 };
+ struct task_struct *holder;
+ ktime_t t0;
+ int ret;
+
+ mutex_init(&ctx.lock);
+ init_completion(&ctx.ready);
+
+ holder = kthread_run(tlob_holder_fn, &ctx, "tlob_holder_kunit");
+ KUNIT_ASSERT_NOT_ERR_OR_NULL(test, holder);
+ wait_for_completion(&ctx.ready);
+
+ /* Arm 10 ms budget while kthread holds the mutex. */
+ ret = tlob_start_task(current, 10000, NULL, 0);
+ KUNIT_ASSERT_EQ(test, ret, 0);
+
+ /* Phase 1: burn ~1 ms on-CPU to exercise on_cpu accounting. */
+ t0 = ktime_get();
+ while (ktime_us_delta(ktime_get(), t0) < 1000)
+ cpu_relax();
+
+ /*
+ * Phase 2: block on the mutex -> on_cpu->off_cpu transition.
+ * The 10 ms budget fires while we are off-CPU.
+ */
+ mutex_lock(&ctx.lock);
+ mutex_unlock(&ctx.lock);
+
+ /* Timer already fired and removed the entry -> -ESRCH */
+ KUNIT_EXPECT_EQ(test, tlob_stop_task(current), -ESRCH);
+}
+
+/* Arm a 1 ms budget and busy-spin for 50 ms; timer fires on-CPU. */
+static void tlob_deadline_fires_on_cpu(struct kunit *test)
+{
+ ktime_t t0;
+ int ret;
+
+ ret = tlob_start_task(current, 1000 /* 1 ms */, NULL, 0);
+ KUNIT_ASSERT_EQ(test, ret, 0);
+
+ /* Busy-spin 50 ms - 50x the budget */
+ t0 = ktime_get();
+ while (ktime_us_delta(ktime_get(), t0) < 50000)
+ cpu_relax();
+
+ /* Timer fired during the spin; entry is gone */
+ KUNIT_EXPECT_EQ(test, tlob_stop_task(current), -ESRCH);
+}
+
+/*
+ * Start three tasks, call tlob_destroy_monitor() + tlob_init_monitor(),
+ * and verify the table is empty afterwards.
+ */
+static int tlob_dummy_fn(void *arg)
+{
+ wait_for_completion((struct completion *)arg);
+ return 0;
+}
+
+static void tlob_stop_all_cleanup(struct kunit *test)
+{
+ struct completion done1, done2;
+ struct task_struct *t1, *t2;
+ int ret;
+
+ init_completion(&done1);
+ init_completion(&done2);
+
+ t1 = kthread_run(tlob_dummy_fn, &done1, "tlob_dummy1");
+ KUNIT_ASSERT_NOT_ERR_OR_NULL(test, t1);
+ get_task_struct(t1);
+
+ t2 = kthread_run(tlob_dummy_fn, &done2, "tlob_dummy2");
+ KUNIT_ASSERT_NOT_ERR_OR_NULL(test, t2);
+ get_task_struct(t2);
+
+ KUNIT_ASSERT_EQ(test, tlob_start_task(current, 10000000, NULL, 0), 0);
+ KUNIT_ASSERT_EQ(test, tlob_start_task(t1, 10000000, NULL, 0), 0);
+ KUNIT_ASSERT_EQ(test, tlob_start_task(t2, 10000000, NULL, 0), 0);
+
+ /* Destroy clears all entries via tlob_stop_all() */
+ tlob_destroy_monitor();
+ ret = tlob_init_monitor();
+ KUNIT_ASSERT_EQ(test, ret, 0);
+
+ /* Table must be empty now */
+ KUNIT_EXPECT_EQ(test, tlob_stop_task(current), -ESRCH);
+ KUNIT_EXPECT_EQ(test, tlob_stop_task(t1), -ESRCH);
+ KUNIT_EXPECT_EQ(test, tlob_stop_task(t2), -ESRCH);
+
+ complete(&done1);
+ complete(&done2);
+ /*
+ * completions live on stack; wait for kthreads to exit before return.
+ */
+ kthread_stop(t1);
+ kthread_stop(t2);
+ put_task_struct(t1);
+ put_task_struct(t2);
+}
+
+/* A threshold that overflows ktime_t must be rejected with -ERANGE. */
+static void tlob_overflow_threshold(struct kunit *test)
+{
+ /* (KTIME_MAX / NSEC_PER_USEC) + 1: converting to ns exceeds KTIME_MAX */
+ u64 too_large = (u64)(KTIME_MAX / NSEC_PER_USEC) + 1;
+
+ KUNIT_EXPECT_EQ(test,
+ tlob_start_task(current, too_large, NULL, 0),
+ -ERANGE);
+}
+
+static int tlob_task_api_suite_init(struct kunit_suite *suite)
+{
+ return tlob_init_monitor();
+}
+
+static void tlob_task_api_suite_exit(struct kunit_suite *suite)
+{
+ tlob_destroy_monitor();
+}
+
+static struct kunit_case tlob_task_api_cases[] = {
+ KUNIT_CASE(tlob_start_stop_ok),
+ KUNIT_CASE(tlob_double_start),
+ KUNIT_CASE(tlob_stop_without_start),
+ KUNIT_CASE(tlob_immediate_deadline),
+ KUNIT_CASE(tlob_enospc),
+ KUNIT_CASE(tlob_overflow_threshold),
+ KUNIT_CASE(tlob_deadline_fires_off_cpu),
+ KUNIT_CASE(tlob_deadline_fires_on_cpu),
+ KUNIT_CASE(tlob_stop_all_cleanup),
+ {}
+};
+
+static struct kunit_suite tlob_task_api_suite = {
+ .name = "tlob_task_api",
+ .suite_init = tlob_task_api_suite_init,
+ .suite_exit = tlob_task_api_suite_exit,
+ .test_cases = tlob_task_api_cases,
+};
+
+/*
+ * Suite 3: scheduling integration
+ */
+
+struct tlob_ping_ctx {
+ struct completion ping;
+ struct completion pong;
+};
+
+static int tlob_ping_fn(void *arg)
+{
+ struct tlob_ping_ctx *ctx = arg;
+
+ /* Wait for main to give us the CPU back */
+ wait_for_completion(&ctx->ping);
+ complete(&ctx->pong);
+ return 0;
+}
+
+/* Force two context switches and verify stop returns 0 (within budget). */
+static void tlob_sched_switch_accounting(struct kunit *test)
+{
+ struct tlob_ping_ctx ctx;
+ struct task_struct *peer;
+ int ret;
+
+ init_completion(&ctx.ping);
+ init_completion(&ctx.pong);
+
+ peer = kthread_run(tlob_ping_fn, &ctx, "tlob_ping_kunit");
+ KUNIT_ASSERT_NOT_ERR_OR_NULL(test, peer);
+
+ /* Arm a generous 5 s budget so the timer never fires */
+ ret = tlob_start_task(current, 5000000, NULL, 0);
+ KUNIT_ASSERT_EQ(test, ret, 0);
+
+ /*
+ * complete(ping) -> peer runs, forcing a context switch out and back.
+ */
+ complete(&ctx.ping);
+ wait_for_completion(&ctx.pong);
+
+ /*
+ * Back on CPU after one off-CPU interval; stop must return 0.
+ */
+ ret = tlob_stop_task(current);
+ KUNIT_EXPECT_EQ(test, ret, 0);
+}
+
+/*
+ * Verify that monitoring a kthread (not current) works: start on behalf
+ * of a kthread, let it block, then stop it.
+ */
+static int tlob_block_fn(void *arg)
+{
+ struct completion *done = arg;
+
+ /* Block briefly, exercising off_cpu accounting for this task */
+ msleep(20);
+ complete(done);
+ return 0;
+}
+
+static void tlob_monitor_other_task(struct kunit *test)
+{
+ struct completion done;
+ struct task_struct *target;
+ int ret;
+
+ init_completion(&done);
+
+ target = kthread_run(tlob_block_fn, &done, "tlob_target_kunit");
+ KUNIT_ASSERT_NOT_ERR_OR_NULL(test, target);
+ get_task_struct(target);
+
+ /* Arm a 5 s budget for the target task */
+ ret = tlob_start_task(target, 5000000, NULL, 0);
+ KUNIT_ASSERT_EQ(test, ret, 0);
+
+ wait_for_completion(&done);
+
+ /*
+ * Target has finished; stop_task may return 0 (still in htable)
+ * or -ESRCH (kthread exited and timer fired / entry cleaned up).
+ */
+ ret = tlob_stop_task(target);
+ KUNIT_EXPECT_TRUE(test, ret == 0 || ret == -ESRCH);
+ put_task_struct(target);
+}
+
+static int tlob_sched_suite_init(struct kunit_suite *suite)
+{
+ return tlob_init_monitor();
+}
+
+static void tlob_sched_suite_exit(struct kunit_suite *suite)
+{
+ tlob_destroy_monitor();
+}
+
+static struct kunit_case tlob_sched_integration_cases[] = {
+ KUNIT_CASE(tlob_sched_switch_accounting),
+ KUNIT_CASE(tlob_monitor_other_task),
+ {}
+};
+
+static struct kunit_suite tlob_sched_integration_suite = {
+ .name = "tlob_sched_integration",
+ .suite_init = tlob_sched_suite_init,
+ .suite_exit = tlob_sched_suite_exit,
+ .test_cases = tlob_sched_integration_cases,
+};
+
+/*
+ * Suite 4: ftrace tracepoint field verification
+ */
+
+/* Capture fields from trace_tlob_budget_exceeded for inspection. */
+struct tlob_exceeded_capture {
+ atomic_t fired; /* 1 after first call */
+ pid_t pid;
+ u64 threshold_us;
+ u64 on_cpu_us;
+ u64 off_cpu_us;
+ u32 switches;
+ bool state_is_on_cpu;
+ u64 tag;
+};
+
+static void
+probe_tlob_budget_exceeded(void *data,
+ struct task_struct *task, u64 threshold_us,
+ u64 on_cpu_us, u64 off_cpu_us,
+ u32 switches, bool state_is_on_cpu, u64 tag)
+{
+ struct tlob_exceeded_capture *cap = data;
+
+ /* Only capture the first event to avoid races. */
+ if (atomic_cmpxchg(&cap->fired, 0, 1) != 0)
+ return;
+
+ cap->pid = task->pid;
+ cap->threshold_us = threshold_us;
+ cap->on_cpu_us = on_cpu_us;
+ cap->off_cpu_us = off_cpu_us;
+ cap->switches = switches;
+ cap->state_is_on_cpu = state_is_on_cpu;
+ cap->tag = tag;
+}
+
+/*
+ * Arm a 2 ms budget and busy-spin for 60 ms. Verify the tracepoint fires
+ * once with matching threshold, correct pid, and total time >= budget.
+ *
+ * state_is_on_cpu is not asserted: preemption during the spin makes it
+ * non-deterministic.
+ */
+static void tlob_trace_budget_exceeded_on_cpu(struct kunit *test)
+{
+ struct tlob_exceeded_capture cap = {};
+ const u64 threshold_us = 2000; /* 2 ms */
+ ktime_t t0;
+ int ret;
+
+ atomic_set(&cap.fired, 0);
+
+ ret = register_trace_tlob_budget_exceeded(probe_tlob_budget_exceeded,
+ &cap);
+ KUNIT_ASSERT_EQ(test, ret, 0);
+
+ ret = tlob_start_task(current, threshold_us, NULL, 0);
+ KUNIT_ASSERT_EQ(test, ret, 0);
+
+ /* Busy-spin 60 ms -- 30x the budget */
+ t0 = ktime_get();
+ while (ktime_us_delta(ktime_get(), t0) < 60000)
+ cpu_relax();
+
+ /* Entry removed by timer; stop returns -ESRCH */
+ tlob_stop_task(current);
+
+ /*
+ * Synchronise: ensure the probe callback has completed before we
+ * read the captured fields.
+ */
+ tracepoint_synchronize_unregister();
+ unregister_trace_tlob_budget_exceeded(probe_tlob_budget_exceeded, &cap);
+
+ KUNIT_EXPECT_EQ(test, atomic_read(&cap.fired), 1);
+ KUNIT_EXPECT_EQ(test, (int)cap.pid, (int)current->pid);
+ KUNIT_EXPECT_EQ(test, cap.threshold_us, threshold_us);
+ /* Total elapsed must cover at least the budget */
+ KUNIT_EXPECT_GE(test, cap.on_cpu_us + cap.off_cpu_us, threshold_us);
+}
+
+/*
+ * Holder kthread grabs a mutex for 80 ms; arm 10 ms budget, burn ~1 ms
+ * on-CPU, then block on the mutex. Timer fires off-CPU. Verify:
+ * state_is_on_cpu == false, switches >= 1, off_cpu_us > 0.
+ */
+static void tlob_trace_budget_exceeded_off_cpu(struct kunit *test)
+{
+ struct tlob_exceeded_capture cap = {};
+ struct tlob_holder_ctx ctx = { .hold_ms = 80 };
+ struct task_struct *holder;
+ const u64 threshold_us = 10000; /* 10 ms */
+ ktime_t t0;
+ int ret;
+
+ atomic_set(&cap.fired, 0);
+
+ mutex_init(&ctx.lock);
+ init_completion(&ctx.ready);
+
+ holder = kthread_run(tlob_holder_fn, &ctx, "tlob_holder2_kunit");
+ KUNIT_ASSERT_NOT_ERR_OR_NULL(test, holder);
+ wait_for_completion(&ctx.ready);
+
+ ret = register_trace_tlob_budget_exceeded(probe_tlob_budget_exceeded,
+ &cap);
+ KUNIT_ASSERT_EQ(test, ret, 0);
+
+ ret = tlob_start_task(current, threshold_us, NULL, 0);
+ KUNIT_ASSERT_EQ(test, ret, 0);
+
+ /* Phase 1: ~1 ms on-CPU */
+ t0 = ktime_get();
+ while (ktime_us_delta(ktime_get(), t0) < 1000)
+ cpu_relax();
+
+ /* Phase 2: block -> off-CPU; timer fires here */
+ mutex_lock(&ctx.lock);
+ mutex_unlock(&ctx.lock);
+
+ tlob_stop_task(current);
+
+ tracepoint_synchronize_unregister();
+ unregister_trace_tlob_budget_exceeded(probe_tlob_budget_exceeded, &cap);
+
+ KUNIT_EXPECT_EQ(test, atomic_read(&cap.fired), 1);
+ KUNIT_EXPECT_EQ(test, cap.threshold_us, threshold_us);
+ /* Violation happened off-CPU */
+ KUNIT_EXPECT_FALSE(test, cap.state_is_on_cpu);
+ /* At least the switch_out event was counted */
+ KUNIT_EXPECT_GE(test, (u64)cap.switches, (u64)1);
+ /* Off-CPU time must be non-zero */
+ KUNIT_EXPECT_GT(test, cap.off_cpu_us, (u64)0);
+}
+
+/* threshold_us in the tracepoint must exactly match the start argument. */
+static void tlob_trace_threshold_field_accuracy(struct kunit *test)
+{
+ static const u64 thresholds[] = { 500, 1000, 3000 };
+ unsigned int i;
+
+ for (i = 0; i < ARRAY_SIZE(thresholds); i++) {
+ struct tlob_exceeded_capture cap = {};
+ ktime_t t0;
+ int ret;
+
+ atomic_set(&cap.fired, 0);
+
+ ret = register_trace_tlob_budget_exceeded(
+ probe_tlob_budget_exceeded, &cap);
+ KUNIT_ASSERT_EQ(test, ret, 0);
+
+ ret = tlob_start_task(current, thresholds[i], NULL, 0);
+ KUNIT_ASSERT_EQ(test, ret, 0);
+
+ /* Spin for 20x the threshold to ensure timer fires */
+ t0 = ktime_get();
+ while (ktime_us_delta(ktime_get(), t0) <
+ (s64)(thresholds[i] * 20))
+ cpu_relax();
+
+ tlob_stop_task(current);
+
+ tracepoint_synchronize_unregister();
+ unregister_trace_tlob_budget_exceeded(
+ probe_tlob_budget_exceeded, &cap);
+
+ KUNIT_EXPECT_EQ_MSG(test, cap.threshold_us, thresholds[i],
+ "threshold mismatch for entry %u", i);
+ }
+}
+
+static int tlob_trace_suite_init(struct kunit_suite *suite)
+{
+ int ret;
+
+ ret = tlob_init_monitor();
+ if (ret)
+ return ret;
+ return tlob_enable_hooks();
+}
+
+static void tlob_trace_suite_exit(struct kunit_suite *suite)
+{
+ tlob_disable_hooks();
+ tlob_destroy_monitor();
+}
+
+static struct kunit_case tlob_trace_output_cases[] = {
+ KUNIT_CASE(tlob_trace_budget_exceeded_on_cpu),
+ KUNIT_CASE(tlob_trace_budget_exceeded_off_cpu),
+ KUNIT_CASE(tlob_trace_threshold_field_accuracy),
+ {}
+};
+
+static struct kunit_suite tlob_trace_output_suite = {
+ .name = "tlob_trace_output",
+ .suite_init = tlob_trace_suite_init,
+ .suite_exit = tlob_trace_suite_exit,
+ .test_cases = tlob_trace_output_cases,
+};
+
+/* Suite 5: ring buffer */
+
+/*
+ * Allocate a synthetic rv_file_priv for ring buffer tests. Uses
+ * kunit_kzalloc() instead of __get_free_pages() since the ring is never
+ * mmap'd here.
+ */
+static struct rv_file_priv *alloc_priv_kunit(struct kunit *test, u32 cap)
+{
+ struct rv_file_priv *priv;
+ struct tlob_ring *ring;
+
+ priv = kunit_kzalloc(test, sizeof(*priv), GFP_KERNEL);
+ if (!priv)
+ return NULL;
+
+ ring = &priv->ring;
+
+ ring->page = kunit_kzalloc(test, sizeof(struct tlob_mmap_page),
+ GFP_KERNEL);
+ if (!ring->page)
+ return NULL;
+
+ ring->data = kunit_kzalloc(test, cap * sizeof(struct tlob_event),
+ GFP_KERNEL);
+ if (!ring->data)
+ return NULL;
+
+ ring->mask = cap - 1;
+ ring->page->capacity = cap;
+ ring->page->version = 1;
+ ring->page->data_offset = PAGE_SIZE; /* nominal; not used in tests */
+ ring->page->record_size = sizeof(struct tlob_event);
+ spin_lock_init(&ring->lock);
+ init_waitqueue_head(&priv->waitq);
+ return priv;
+}
+
+/* Push one record and verify all fields survive the round-trip. */
+static void tlob_event_push_one(struct kunit *test)
+{
+ struct rv_file_priv *priv;
+ struct tlob_ring *ring;
+ struct tlob_event in = {
+ .tid = 1234,
+ .threshold_us = 5000,
+ .on_cpu_us = 3000,
+ .off_cpu_us = 2000,
+ .switches = 3,
+ .state = 1,
+ };
+ struct tlob_event out = {};
+ u32 tail;
+
+ priv = alloc_priv_kunit(test, TLOB_RING_DEFAULT_CAP);
+ KUNIT_ASSERT_NOT_NULL(test, priv);
+
+ ring = &priv->ring;
+
+ tlob_event_push_kunit(priv, &in);
+
+ /* One record written, none dropped */
+ KUNIT_EXPECT_EQ(test, ring->page->data_head, 1u);
+ KUNIT_EXPECT_EQ(test, ring->page->data_tail, 0u);
+ KUNIT_EXPECT_EQ(test, ring->page->dropped, 0ull);
+
+ /* Dequeue manually */
+ tail = ring->page->data_tail;
+ out = ring->data[tail & ring->mask];
+ ring->page->data_tail = tail + 1;
+
+ KUNIT_EXPECT_EQ(test, out.tid, in.tid);
+ KUNIT_EXPECT_EQ(test, out.threshold_us, in.threshold_us);
+ KUNIT_EXPECT_EQ(test, out.on_cpu_us, in.on_cpu_us);
+ KUNIT_EXPECT_EQ(test, out.off_cpu_us, in.off_cpu_us);
+ KUNIT_EXPECT_EQ(test, out.switches, in.switches);
+ KUNIT_EXPECT_EQ(test, out.state, in.state);
+
+ /* Ring is now empty */
+ KUNIT_EXPECT_EQ(test, ring->page->data_head, ring->page->data_tail);
+}
+
+/*
+ * Fill to capacity, push one more. Drop-new policy: head stays at cap,
+ * dropped == 1, oldest record is preserved.
+ */
+static void tlob_event_push_overflow(struct kunit *test)
+{
+ struct rv_file_priv *priv;
+ struct tlob_ring *ring;
+ struct tlob_event ntf = {};
+ struct tlob_event out = {};
+ const u32 cap = TLOB_RING_MIN_CAP;
+ u32 i;
+
+ priv = alloc_priv_kunit(test, cap);
+ KUNIT_ASSERT_NOT_NULL(test, priv);
+
+ ring = &priv->ring;
+
+ /* Push cap + 1 records; tid encodes the sequence */
+ for (i = 0; i <= cap; i++) {
+ ntf.tid = i;
+ ntf.threshold_us = (u64)i * 1000;
+ tlob_event_push_kunit(priv, &ntf);
+ }
+
+ /* Drop-new: head stopped at cap; the newest record was dropped and counted */
+ KUNIT_EXPECT_EQ(test, ring->page->data_head, cap);
+ KUNIT_EXPECT_EQ(test, ring->page->data_tail, 0u);
+ KUNIT_EXPECT_EQ(test, ring->page->dropped, 1ull);
+
+ /* Oldest surviving record must be the first one pushed (tid == 0) */
+ out = ring->data[ring->page->data_tail & ring->mask];
+ KUNIT_EXPECT_EQ(test, out.tid, 0u);
+
+ /* Drain the ring; the last record must have tid == cap - 1 */
+ for (i = 0; i < cap; i++) {
+ u32 tail = ring->page->data_tail;
+
+ out = ring->data[tail & ring->mask];
+ ring->page->data_tail = tail + 1;
+ }
+ KUNIT_EXPECT_EQ(test, out.tid, cap - 1);
+ KUNIT_EXPECT_EQ(test, ring->page->data_head, ring->page->data_tail);
+}
+
+/* A freshly initialised ring is empty. */
+static void tlob_event_empty(struct kunit *test)
+{
+ struct rv_file_priv *priv;
+ struct tlob_ring *ring;
+
+ priv = alloc_priv_kunit(test, TLOB_RING_DEFAULT_CAP);
+ KUNIT_ASSERT_NOT_NULL(test, priv);
+
+ ring = &priv->ring;
+
+ KUNIT_EXPECT_EQ(test, ring->page->data_head, 0u);
+ KUNIT_EXPECT_EQ(test, ring->page->data_tail, 0u);
+ KUNIT_EXPECT_EQ(test, ring->page->dropped, 0ull);
+}
+
+/*
+ * A kthread blocks on wait_event_interruptible(); pushing one record
+ * must wake it within 1 s.
+ */
+
+struct tlob_wakeup_ctx {
+ struct rv_file_priv *priv;
+ struct completion ready;
+ struct completion done;
+ int woke;
+};
+
+static int tlob_wakeup_thread(void *arg)
+{
+ struct tlob_wakeup_ctx *ctx = arg;
+ struct tlob_ring *ring = &ctx->priv->ring;
+
+ complete(&ctx->ready);
+
+ wait_event_interruptible(ctx->priv->waitq,
+ smp_load_acquire(&ring->page->data_head) !=
+ READ_ONCE(ring->page->data_tail) ||
+ kthread_should_stop());
+
+ if (smp_load_acquire(&ring->page->data_head) !=
+ READ_ONCE(ring->page->data_tail))
+ ctx->woke = 1;
+
+ complete(&ctx->done);
+ return 0;
+}
+
+static void tlob_ring_wakeup(struct kunit *test)
+{
+ struct rv_file_priv *priv;
+ struct tlob_wakeup_ctx ctx;
+ struct task_struct *t;
+ struct tlob_event ev = { .tid = 99 };
+ long timeout;
+
+ priv = alloc_priv_kunit(test, TLOB_RING_DEFAULT_CAP);
+ KUNIT_ASSERT_NOT_NULL(test, priv);
+
+ init_completion(&ctx.ready);
+ init_completion(&ctx.done);
+ ctx.priv = priv;
+ ctx.woke = 0;
+
+ t = kthread_run(tlob_wakeup_thread, &ctx, "tlob_wakeup_kunit");
+ KUNIT_ASSERT_NOT_ERR_OR_NULL(test, t);
+ get_task_struct(t);
+
+ /* Let the kthread reach wait_event_interruptible */
+ wait_for_completion(&ctx.ready);
+ usleep_range(10000, 20000);
+
+ /* Push one record -- must wake the waiter */
+ tlob_event_push_kunit(priv, &ev);
+
+ timeout = wait_for_completion_timeout(&ctx.done, msecs_to_jiffies(1000));
+ kthread_stop(t);
+ put_task_struct(t);
+
+ KUNIT_EXPECT_GT(test, timeout, 0L);
+ KUNIT_EXPECT_EQ(test, ctx.woke, 1);
+ KUNIT_EXPECT_EQ(test, priv->ring.page->data_head, 1u);
+}
+
+static struct kunit_case tlob_event_buf_cases[] = {
+ KUNIT_CASE(tlob_event_push_one),
+ KUNIT_CASE(tlob_event_push_overflow),
+ KUNIT_CASE(tlob_event_empty),
+ KUNIT_CASE(tlob_ring_wakeup),
+ {}
+};
+
+static struct kunit_suite tlob_event_buf_suite = {
+ .name = "tlob_event_buf",
+ .test_cases = tlob_event_buf_cases,
+};
+
+/* Suite 6: uprobe format string parser */
+
+/* Happy path: decimal offsets, plain path. */
+static void tlob_parse_decimal_offsets(struct kunit *test)
+{
+ char buf[] = "5000:4768:4848:/usr/bin/myapp";
+ u64 thr;
+ loff_t start, stop;
+ char *path;
+
+ KUNIT_EXPECT_EQ(test,
+ tlob_parse_uprobe_line(buf, &thr, &path, &start, &stop),
+ 0);
+ KUNIT_EXPECT_EQ(test, thr, (u64)5000);
+ KUNIT_EXPECT_EQ(test, start, (loff_t)4768);
+ KUNIT_EXPECT_EQ(test, stop, (loff_t)4848);
+ KUNIT_EXPECT_STREQ(test, path, "/usr/bin/myapp");
+}
+
+/* Happy path: 0x-prefixed hex offsets. */
+static void tlob_parse_hex_offsets(struct kunit *test)
+{
+ char buf[] = "10000:0x12a0:0x12f0:/usr/bin/myapp";
+ u64 thr;
+ loff_t start, stop;
+ char *path;
+
+ KUNIT_EXPECT_EQ(test,
+ tlob_parse_uprobe_line(buf, &thr, &path, &start, &stop),
+ 0);
+ KUNIT_EXPECT_EQ(test, start, (loff_t)0x12a0);
+ KUNIT_EXPECT_EQ(test, stop, (loff_t)0x12f0);
+ KUNIT_EXPECT_STREQ(test, path, "/usr/bin/myapp");
+}
+
+/* Path containing ':' must not be truncated. */
+static void tlob_parse_path_with_colon(struct kunit *test)
+{
+ char buf[] = "1000:0x100:0x200:/opt/my:app/bin";
+ u64 thr;
+ loff_t start, stop;
+ char *path;
+
+ KUNIT_EXPECT_EQ(test,
+ tlob_parse_uprobe_line(buf, &thr, &path, &start, &stop),
+ 0);
+ KUNIT_EXPECT_STREQ(test, path, "/opt/my:app/bin");
+}
+
+/* Zero threshold must be rejected. */
+static void tlob_parse_zero_threshold(struct kunit *test)
+{
+ char buf[] = "0:0x100:0x200:/usr/bin/myapp";
+ u64 thr;
+ loff_t start, stop;
+ char *path;
+
+ KUNIT_EXPECT_EQ(test,
+ tlob_parse_uprobe_line(buf, &thr, &path, &start, &stop),
+ -EINVAL);
+}
+
+/* Empty path (trailing ':' with nothing after) must be rejected. */
+static void tlob_parse_empty_path(struct kunit *test)
+{
+ char buf[] = "5000:0x100:0x200:";
+ u64 thr;
+ loff_t start, stop;
+ char *path;
+
+ KUNIT_EXPECT_EQ(test,
+ tlob_parse_uprobe_line(buf, &thr, &path, &start, &stop),
+ -EINVAL);
+}
+
+/* Missing field (3 tokens instead of 4) must be rejected. */
+static void tlob_parse_too_few_fields(struct kunit *test)
+{
+ char buf[] = "5000:0x100:/usr/bin/myapp";
+ u64 thr;
+ loff_t start, stop;
+ char *path;
+
+ KUNIT_EXPECT_EQ(test,
+ tlob_parse_uprobe_line(buf, &thr, &path, &start, &stop),
+ -EINVAL);
+}
+
+/* Negative offset must be rejected. */
+static void tlob_parse_negative_offset(struct kunit *test)
+{
+ char buf[] = "5000:-1:0x200:/usr/bin/myapp";
+ u64 thr;
+ loff_t start, stop;
+ char *path;
+
+ KUNIT_EXPECT_EQ(test,
+ tlob_parse_uprobe_line(buf, &thr, &path, &start, &stop),
+ -EINVAL);
+}
+
+static struct kunit_case tlob_parse_uprobe_cases[] = {
+ KUNIT_CASE(tlob_parse_decimal_offsets),
+ KUNIT_CASE(tlob_parse_hex_offsets),
+ KUNIT_CASE(tlob_parse_path_with_colon),
+ KUNIT_CASE(tlob_parse_zero_threshold),
+ KUNIT_CASE(tlob_parse_empty_path),
+ KUNIT_CASE(tlob_parse_too_few_fields),
+ KUNIT_CASE(tlob_parse_negative_offset),
+ {}
+};
+
+static struct kunit_suite tlob_parse_uprobe_suite = {
+ .name = "tlob_parse_uprobe",
+ .test_cases = tlob_parse_uprobe_cases,
+};
+
+kunit_test_suites(&tlob_automaton_suite,
+ &tlob_task_api_suite,
+ &tlob_sched_integration_suite,
+ &tlob_trace_output_suite,
+ &tlob_event_buf_suite,
+ &tlob_parse_uprobe_suite);
+
+MODULE_DESCRIPTION("KUnit tests for the tlob RV monitor");
+MODULE_LICENSE("GPL");
--
2.43.0
* [RFC PATCH 4/4] selftests/rv: Add selftest for the tlob monitor
From: wen.yang @ 2026-04-12 19:27 UTC (permalink / raw)
To: Steven Rostedt, Gabriele Monaco, Masami Hiramatsu,
Mathieu Desnoyers
Cc: linux-trace-kernel, linux-kernel, Wen Yang
From: Wen Yang <wen.yang@linux.dev>
Add a kselftest suite (TAP output, 20 test points) for the tlob RV
monitor under tools/testing/selftests/rv/.
test_tlob.sh drives a compiled C helper (tlob_helper) and, for uprobe
tests, a target binary (tlob_uprobe_target). Coverage spans the
tracefs enable/disable path, uprobe-triggered violations, and the
ioctl interface (within-budget stop, CPU-bound and sleep violations,
duplicate start, ring buffer mmap and consumption).
Requires CONFIG_RV_MON_TLOB=y and CONFIG_RV_CHARDEV=y; must be run
as root.
Signed-off-by: Wen Yang <wen.yang@linux.dev>
---
tools/include/uapi/linux/rv.h | 54 +
tools/testing/selftests/rv/Makefile | 18 +
tools/testing/selftests/rv/test_tlob.sh | 563 ++++++++++
tools/testing/selftests/rv/tlob_helper.c | 994 ++++++++++++++++++
.../testing/selftests/rv/tlob_uprobe_target.c | 108 ++
5 files changed, 1737 insertions(+)
create mode 100644 tools/include/uapi/linux/rv.h
create mode 100644 tools/testing/selftests/rv/Makefile
create mode 100755 tools/testing/selftests/rv/test_tlob.sh
create mode 100644 tools/testing/selftests/rv/tlob_helper.c
create mode 100644 tools/testing/selftests/rv/tlob_uprobe_target.c
diff --git a/tools/include/uapi/linux/rv.h b/tools/include/uapi/linux/rv.h
new file mode 100644
index 000000000..bef07aded
--- /dev/null
+++ b/tools/include/uapi/linux/rv.h
@@ -0,0 +1,54 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * UAPI definitions for Runtime Verification (RV) monitors.
+ *
+ * This is a tools-friendly copy of include/uapi/linux/rv.h.
+ * Keep in sync with the kernel header.
+ */
+
+#ifndef _UAPI_LINUX_RV_H
+#define _UAPI_LINUX_RV_H
+
+#include <linux/types.h>
+#include <sys/ioctl.h>
+
+/* Magic byte shared by all RV monitor ioctls. */
+#define RV_IOC_MAGIC 0xB9
+
+/* -----------------------------------------------------------------------
+ * tlob: task latency over budget monitor (nr 0x01 - 0x1F)
+ * -----------------------------------------------------------------------
+ */
+
+struct tlob_start_args {
+ __u64 threshold_us;
+ __u64 tag;
+ __s32 notify_fd;
+ __u32 flags;
+};
+
+struct tlob_event {
+ __u32 tid;
+ __u32 pad;
+ __u64 threshold_us;
+ __u64 on_cpu_us;
+ __u64 off_cpu_us;
+ __u32 switches;
+ __u32 state; /* 1 = on_cpu, 0 = off_cpu */
+ __u64 tag;
+};
+
+struct tlob_mmap_page {
+ __u32 data_head;
+ __u32 data_tail;
+ __u32 capacity;
+ __u32 version;
+ __u32 data_offset;
+ __u32 record_size;
+ __u64 dropped;
+};
+
+#define TLOB_IOCTL_TRACE_START _IOW(RV_IOC_MAGIC, 0x01, struct tlob_start_args)
+#define TLOB_IOCTL_TRACE_STOP _IO(RV_IOC_MAGIC, 0x02)
+
+#endif /* _UAPI_LINUX_RV_H */
diff --git a/tools/testing/selftests/rv/Makefile b/tools/testing/selftests/rv/Makefile
new file mode 100644
index 000000000..14e94a1ab
--- /dev/null
+++ b/tools/testing/selftests/rv/Makefile
@@ -0,0 +1,18 @@
+# SPDX-License-Identifier: GPL-2.0
+# Makefile for rv selftests
+
+TEST_GEN_PROGS := tlob_helper tlob_uprobe_target
+
+TEST_PROGS := \
+	test_tlob.sh
+
+# TOOLS_INCLUDES is defined by ../lib.mk; provides -isystem to
+# tools/include/uapi so that #include <linux/rv.h> resolves to the
+# in-tree UAPI header without requiring make headers_install.
+# Note: both must be added to the global variables, not as target-specific
+# overrides, because lib.mk rewrites TEST_GEN_PROGS to $(OUTPUT)/name
+# before per-target rules would be evaluated.
+CFLAGS += $(TOOLS_INCLUDES)
+LDLIBS += -lpthread
+
+include ../lib.mk
diff --git a/tools/testing/selftests/rv/test_tlob.sh b/tools/testing/selftests/rv/test_tlob.sh
new file mode 100755
index 000000000..3ba2125eb
--- /dev/null
+++ b/tools/testing/selftests/rv/test_tlob.sh
@@ -0,0 +1,563 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+#
+# Selftest for the tlob (task latency over budget) RV monitor.
+#
+# Two interfaces are tested:
+#
+# 1. tracefs interface:
+# enable/disable, presence of tracefs files,
+# uprobe binding (threshold_us:offset_start:offset_stop:binary_path) and
+# violation detection via the ftrace ring buffer.
+#
+# 2. /dev/rv ioctl self-instrumentation (via tlob_helper):
+# within-budget, over-budget on-CPU, over-budget off-CPU (sleep),
+# double-start, stop-without-start.
+#
+# Written to be POSIX sh compatible (no bash-specific extensions).
+
+ksft_skip=4
+t_pass=0; t_fail=0; t_skip=0; t_total=0
+
+tap_header() { echo "TAP version 13"; }
+tap_plan() { echo "1..$1"; }
+tap_pass() { t_pass=$((t_pass+1)); echo "ok $t_total - $1"; }
+tap_fail() { t_fail=$((t_fail+1)); echo "not ok $t_total - $1"
+ [ -n "$2" ] && echo " # $2"; }
+tap_skip() { t_skip=$((t_skip+1)); echo "ok $t_total - $1 # SKIP $2"; }
+next_test() { t_total=$((t_total+1)); }
+
+TRACEFS=$(awk '$3 == "tracefs" { print $2; exit }' /proc/mounts 2>/dev/null)
+[ -z "$TRACEFS" ] && TRACEFS=/sys/kernel/tracing
+
+RV_DIR="${TRACEFS}/rv"
+TLOB_DIR="${RV_DIR}/monitors/tlob"
+TRACE_FILE="${TRACEFS}/trace"
+TRACING_ON="${TRACEFS}/tracing_on"
+TLOB_MONITOR="${TLOB_DIR}/monitor"
+BUDGET_EXCEEDED_ENABLE="${TRACEFS}/events/rv/tlob_budget_exceeded/enable"
+RV_DEV="/dev/rv"
+
+# tlob_helper and tlob_uprobe_target must be in the same directory as
+# this script or on PATH.
+SCRIPT_DIR=$(dirname "$0")
+IOCTL_HELPER="${SCRIPT_DIR}/tlob_helper"
+UPROBE_TARGET="${SCRIPT_DIR}/tlob_uprobe_target"
+
+check_root() { [ "$(id -u)" = "0" ] || { echo "# Need root" >&2; exit $ksft_skip; }; }
+check_tracefs() { [ -d "${TRACEFS}" ] || { echo "# No tracefs" >&2; exit $ksft_skip; }; }
+check_rv_dir() { [ -d "${RV_DIR}" ] || { echo "# No RV infra" >&2; exit $ksft_skip; }; }
+check_tlob() { [ -d "${TLOB_DIR}" ] || { echo "# No tlob monitor" >&2; exit $ksft_skip; }; }
+
+tlob_enable() { echo 1 > "${TLOB_DIR}/enable"; }
+tlob_disable() { echo 0 > "${TLOB_DIR}/enable" 2>/dev/null; }
+tlob_is_enabled() { [ "$(cat "${TLOB_DIR}/enable" 2>/dev/null)" = "1" ]; }
+trace_event_enable() { echo 1 > "${BUDGET_EXCEEDED_ENABLE}" 2>/dev/null; }
+trace_event_disable() { echo 0 > "${BUDGET_EXCEEDED_ENABLE}" 2>/dev/null; }
+trace_on() { echo 1 > "${TRACING_ON}" 2>/dev/null; }
+trace_clear() { echo > "${TRACE_FILE}"; }
+trace_grep() { grep -q "$1" "${TRACE_FILE}" 2>/dev/null; }
+
+cleanup() {
+ tlob_disable
+ trace_event_disable
+ trace_clear
+}
+
+# ---------------------------------------------------------------------------
+# Test 1: enable / disable
+# ---------------------------------------------------------------------------
+run_test_enable_disable() {
+ next_test; cleanup
+ tlob_enable
+ if ! tlob_is_enabled; then
+ tap_fail "enable_disable" "not enabled after echo 1"; cleanup; return
+ fi
+ tlob_disable
+ if tlob_is_enabled; then
+ tap_fail "enable_disable" "still enabled after echo 0"; cleanup; return
+ fi
+ tap_pass "enable_disable"; cleanup
+}
+
+# ---------------------------------------------------------------------------
+# Test 2: tracefs files present
+# ---------------------------------------------------------------------------
+run_test_tracefs_files() {
+ next_test; cleanup
+ missing=""
+ for f in enable desc monitor; do
+ [ ! -e "${TLOB_DIR}/${f}" ] && missing="${missing} ${f}"
+ done
+ [ -n "${missing}" ] \
+ && tap_fail "tracefs_files" "missing:${missing}" \
+ || tap_pass "tracefs_files"
+ cleanup
+}
+
+# ---------------------------------------------------------------------------
+# Helper: resolve file offset of a function inside a binary.
+#
+# Usage: resolve_offset <binary> <vaddr_hex>
+# Prints the hex file offset, or empty string on failure.
+# ---------------------------------------------------------------------------
+resolve_offset() {
+ bin=$1; vaddr=$2
+ # Parse /proc/self/maps to find the mapping that contains vaddr.
+ # Each line: start-end perms offset dev inode [path]
+ while IFS= read -r line; do
+ set -- $line
+	range=$1; off=$4; path=$6
+ [ -z "$path" ] && continue
+ # Only consider the mapping for our binary
+ [ "$path" != "$bin" ] && continue
+ # Split range into start and end
+ start=$(echo "$range" | cut -d- -f1)
+ end=$(echo "$range" | cut -d- -f2)
+ # Convert hex to decimal for comparison (use printf)
+ s=$(printf "%d" "0x${start}" 2>/dev/null) || continue
+ e=$(printf "%d" "0x${end}" 2>/dev/null) || continue
+ v=$(printf "%d" "${vaddr}" 2>/dev/null) || continue
+ o=$(printf "%d" "0x${off}" 2>/dev/null) || continue
+ if [ "$v" -ge "$s" ] && [ "$v" -lt "$e" ]; then
+ file_off=$(printf "0x%x" $(( (v - s) + o )))
+ echo "$file_off"
+ return
+ fi
+ done < /proc/self/maps
+}
+
+# ---------------------------------------------------------------------------
+# Test 3: no false positive without a binding
+#
+# With the monitor enabled but no uprobe binding registered, 0.5 s of
+# idle time must not produce a budget_exceeded event.
+# ---------------------------------------------------------------------------
+run_test_uprobe_no_false_positive() {
+ next_test; cleanup
+ if [ ! -e "${TLOB_MONITOR}" ]; then
+ tap_skip "uprobe_no_false_positive" "monitor file not available"
+ cleanup; return
+ fi
+	trace_event_enable
+	trace_on
+	tlob_enable
+	trace_clear
+	# No binding is registered - verify that 0.5 s of idle time does
+	# not produce a spurious budget_exceeded event.
+	sleep 0.5
+ trace_grep "budget_exceeded" \
+ && tap_fail "uprobe_no_false_positive" \
+ "spurious budget_exceeded without any binding" \
+ || tap_pass "uprobe_no_false_positive"
+ cleanup
+}
+
+# ---------------------------------------------------------------------------
+# Helper: get_uprobe_offset <binary> <symbol>
+#
+# Use tlob_helper sym_offset to get the ELF file offset of <symbol>
+# in <binary>. Prints the hex offset (e.g. "0x11d0") or empty string on
+# failure.
+# ---------------------------------------------------------------------------
+get_uprobe_offset() {
+ bin=$1; sym=$2
+ if [ ! -x "${IOCTL_HELPER}" ]; then
+ return
+ fi
+ "${IOCTL_HELPER}" sym_offset "${bin}" "${sym}" 2>/dev/null
+}
+
+# ---------------------------------------------------------------------------
+# Test 4: uprobe binding - violation detected
+#
+# Start tlob_uprobe_target (a busy-spin binary with a well-known symbol),
+# attach a uprobe on tlob_busy_work with a 10 us threshold, and verify
+# that a budget_exceeded event appears.
+# ---------------------------------------------------------------------------
+run_test_uprobe_violation() {
+ next_test; cleanup
+ if [ ! -e "${TLOB_MONITOR}" ]; then
+ tap_skip "uprobe_violation" "monitor file not available"
+ cleanup; return
+ fi
+ if [ ! -x "${UPROBE_TARGET}" ]; then
+ tap_skip "uprobe_violation" \
+ "tlob_uprobe_target not found or not executable"
+ cleanup; return
+ fi
+
+ # Get the file offsets of the start and stop probe symbols
+ busy_offset=$(get_uprobe_offset "${UPROBE_TARGET}" "tlob_busy_work")
+ if [ -z "${busy_offset}" ]; then
+ tap_skip "uprobe_violation" \
+ "cannot resolve tlob_busy_work offset in ${UPROBE_TARGET}"
+ cleanup; return
+ fi
+ stop_offset=$(get_uprobe_offset "${UPROBE_TARGET}" "tlob_busy_work_done")
+ if [ -z "${stop_offset}" ]; then
+ tap_skip "uprobe_violation" \
+ "cannot resolve tlob_busy_work_done offset in ${UPROBE_TARGET}"
+ cleanup; return
+ fi
+
+ # Start the busy-spin target (run for 30 s so the test can observe it)
+ "${UPROBE_TARGET}" 30000 &
+ busy_pid=$!
+ sleep 0.05
+
+ trace_event_enable
+ trace_on
+ tlob_enable
+ trace_clear
+
+ # Bind the target: 10 us budget; start=tlob_busy_work, stop=tlob_busy_work_done
+ binding="10:${busy_offset}:${stop_offset}:${UPROBE_TARGET}"
+ if ! echo "${binding}" > "${TLOB_MONITOR}" 2>/dev/null; then
+ kill "${busy_pid}" 2>/dev/null; wait "${busy_pid}" 2>/dev/null
+ tap_skip "uprobe_violation" \
+ "uprobe binding rejected (CONFIG_UPROBES=y needed)"
+ cleanup; return
+ fi
+
+ # Wait up to 2 s for a budget_exceeded event
+ found=0; i=0
+ while [ "$i" -lt 20 ]; do
+ sleep 0.1
+ trace_grep "budget_exceeded" && { found=1; break; }
+ i=$((i+1))
+ done
+
+ echo "-${busy_offset}:${UPROBE_TARGET}" > "${TLOB_MONITOR}" 2>/dev/null
+ kill "${busy_pid}" 2>/dev/null; wait "${busy_pid}" 2>/dev/null
+
+ if [ "${found}" != "1" ]; then
+ tap_fail "uprobe_violation" "no budget_exceeded within 2 s"
+ cleanup; return
+ fi
+
+ # Validate the event fields: threshold must match, on_cpu must be non-zero
+ # (CPU-bound violation), and state must be on_cpu.
+ ev=$(grep "budget_exceeded" "${TRACE_FILE}" | head -n 1)
+ if ! echo "${ev}" | grep -q "threshold=10 "; then
+ tap_fail "uprobe_violation" "threshold field mismatch: ${ev}"
+ cleanup; return
+ fi
+ on_cpu=$(echo "${ev}" | grep -o "on_cpu=[0-9]*" | cut -d= -f2)
+ if [ "${on_cpu:-0}" -eq 0 ]; then
+ tap_fail "uprobe_violation" "on_cpu=0 for a CPU-bound spin: ${ev}"
+ cleanup; return
+ fi
+ if ! echo "${ev}" | grep -q "state=on_cpu"; then
+ tap_fail "uprobe_violation" "state is not on_cpu: ${ev}"
+ cleanup; return
+ fi
+ tap_pass "uprobe_violation"
+ cleanup
+}
+
+# ---------------------------------------------------------------------------
+# Test 5: uprobe binding - remove binding stops monitoring
+#
+# Register a uprobe binding for tlob_uprobe_target, then immediately
+# remove it. Verify that the monitor file no longer lists the binding.
+# ---------------------------------------------------------------------------
+run_test_uprobe_unbind() {
+ next_test; cleanup
+ if [ ! -e "${TLOB_MONITOR}" ]; then
+ tap_skip "uprobe_unbind" "monitor file not available"
+ cleanup; return
+ fi
+ if [ ! -x "${UPROBE_TARGET}" ]; then
+ tap_skip "uprobe_unbind" \
+ "tlob_uprobe_target not found or not executable"
+ cleanup; return
+ fi
+
+ busy_offset=$(get_uprobe_offset "${UPROBE_TARGET}" "tlob_busy_work")
+ stop_offset=$(get_uprobe_offset "${UPROBE_TARGET}" "tlob_busy_work_done")
+ if [ -z "${busy_offset}" ] || [ -z "${stop_offset}" ]; then
+ tap_skip "uprobe_unbind" \
+ "cannot resolve tlob_busy_work/tlob_busy_work_done offset"
+ cleanup; return
+ fi
+
+ "${UPROBE_TARGET}" 30000 &
+ busy_pid=$!
+ sleep 0.05
+
+ tlob_enable
+ # 5 s budget - should not fire during this quick test
+ binding="5000000:${busy_offset}:${stop_offset}:${UPROBE_TARGET}"
+ if ! echo "${binding}" > "${TLOB_MONITOR}" 2>/dev/null; then
+ kill "${busy_pid}" 2>/dev/null; wait "${busy_pid}" 2>/dev/null
+ tap_skip "uprobe_unbind" \
+ "uprobe binding rejected (CONFIG_UPROBES=y needed)"
+ cleanup; return
+ fi
+
+ # Remove the binding
+ echo "-${busy_offset}:${UPROBE_TARGET}" > "${TLOB_MONITOR}" 2>/dev/null
+
+ # The monitor file should no longer list the binding for this offset
+ if grep -q "^[0-9]*:0x${busy_offset#0x}:" "${TLOB_MONITOR}" 2>/dev/null; then
+ kill "${busy_pid}" 2>/dev/null; wait "${busy_pid}" 2>/dev/null
+		tap_fail "uprobe_unbind" "binding still listed after removal"
+ cleanup; return
+ fi
+
+ kill "${busy_pid}" 2>/dev/null; wait "${busy_pid}" 2>/dev/null
+ tap_pass "uprobe_unbind"
+ cleanup
+}
+
+# ---------------------------------------------------------------------------
+# Test 6: uprobe - duplicate offset_start rejected
+#
+# Registering a second binding with the same offset_start in the same binary
+# must be rejected with an error, since two entry uprobes at the same address
+# would cause double tlob_start_task() calls and undefined behaviour.
+# ---------------------------------------------------------------------------
+run_test_uprobe_duplicate_offset() {
+ next_test; cleanup
+ if [ ! -e "${TLOB_MONITOR}" ]; then
+ tap_skip "uprobe_duplicate_offset" "monitor file not available"
+ cleanup; return
+ fi
+ if [ ! -x "${UPROBE_TARGET}" ]; then
+ tap_skip "uprobe_duplicate_offset" \
+ "tlob_uprobe_target not found or not executable"
+ cleanup; return
+ fi
+
+ busy_offset=$(get_uprobe_offset "${UPROBE_TARGET}" "tlob_busy_work")
+ stop_offset=$(get_uprobe_offset "${UPROBE_TARGET}" "tlob_busy_work_done")
+ if [ -z "${busy_offset}" ] || [ -z "${stop_offset}" ]; then
+ tap_skip "uprobe_duplicate_offset" \
+ "cannot resolve tlob_busy_work/tlob_busy_work_done offset"
+ cleanup; return
+ fi
+
+ tlob_enable
+
+ # First binding: should succeed
+ if ! echo "5000000:${busy_offset}:${stop_offset}:${UPROBE_TARGET}" \
+ > "${TLOB_MONITOR}" 2>/dev/null; then
+ tap_skip "uprobe_duplicate_offset" \
+ "uprobe binding rejected (CONFIG_UPROBES=y needed)"
+ cleanup; return
+ fi
+
+ # Second binding with same offset_start: must be rejected
+ if echo "9999:${busy_offset}:${stop_offset}:${UPROBE_TARGET}" \
+ > "${TLOB_MONITOR}" 2>/dev/null; then
+ echo "-${busy_offset}:${UPROBE_TARGET}" > "${TLOB_MONITOR}" 2>/dev/null
+ tap_fail "uprobe_duplicate_offset" \
+ "duplicate offset_start was accepted (expected error)"
+ cleanup; return
+ fi
+
+ echo "-${busy_offset}:${UPROBE_TARGET}" > "${TLOB_MONITOR}" 2>/dev/null
+ tap_pass "uprobe_duplicate_offset"
+ cleanup
+}
+
+
+# ---------------------------------------------------------------------------
+# Test 7: two bindings in one binary with independent thresholds
+#
+# Region A: tlob_busy_work with a 5 s budget - should NOT fire during the test.
+# Region B: tlob_busy_work_done with a 10 us budget - SHOULD fire quickly since
+# tlob_uprobe_target calls tlob_busy_work_done after a busy spin.
+#
+# Verifies that independent bindings for different offsets in the same binary
+# are tracked separately and that only the tight-budget binding triggers a
+# budget_exceeded event.
+# ---------------------------------------------------------------------------
+run_test_uprobe_independent_thresholds() {
+ next_test; cleanup
+ if [ ! -e "${TLOB_MONITOR}" ]; then
+ tap_skip "uprobe_independent_thresholds" \
+ "monitor file not available"; cleanup; return
+ fi
+ if [ ! -x "${UPROBE_TARGET}" ]; then
+ tap_skip "uprobe_independent_thresholds" \
+ "tlob_uprobe_target not found or not executable"
+ cleanup; return
+ fi
+
+ busy_offset=$(get_uprobe_offset "${UPROBE_TARGET}" "tlob_busy_work")
+ busy_stop_offset=$(get_uprobe_offset "${UPROBE_TARGET}" "tlob_busy_work_done")
+ if [ -z "${busy_offset}" ] || [ -z "${busy_stop_offset}" ]; then
+ tap_skip "uprobe_independent_thresholds" \
+ "cannot resolve tlob_busy_work/tlob_busy_work_done offset"
+ cleanup; return
+ fi
+
+ "${UPROBE_TARGET}" 30000 &
+ busy_pid=$!
+ sleep 0.05
+
+ trace_event_enable
+ trace_on
+ tlob_enable
+ trace_clear
+
+ # Region A: generous 5 s budget on tlob_busy_work entry (should not fire)
+ if ! echo "5000000:${busy_offset}:${busy_stop_offset}:${UPROBE_TARGET}" \
+ > "${TLOB_MONITOR}" 2>/dev/null; then
+ kill "${busy_pid}" 2>/dev/null; wait "${busy_pid}" 2>/dev/null
+ tap_skip "uprobe_independent_thresholds" \
+ "uprobe binding rejected (CONFIG_UPROBES=y needed)"
+ cleanup; return
+ fi
+ # Region B: tight 10 us budget on tlob_busy_work_done (fires quickly)
+ echo "10:${busy_stop_offset}:${busy_stop_offset}:${UPROBE_TARGET}" \
+ > "${TLOB_MONITOR}" 2>/dev/null
+
+ found=0; i=0
+ while [ "$i" -lt 20 ]; do
+ sleep 0.1
+ trace_grep "budget_exceeded" && { found=1; break; }
+ i=$((i+1))
+ done
+
+ echo "-${busy_offset}:${UPROBE_TARGET}" > "${TLOB_MONITOR}" 2>/dev/null
+ echo "-${busy_stop_offset}:${UPROBE_TARGET}" > "${TLOB_MONITOR}" 2>/dev/null
+ kill "${busy_pid}" 2>/dev/null; wait "${busy_pid}" 2>/dev/null
+
+ if [ "${found}" != "1" ]; then
+ tap_fail "uprobe_independent_thresholds" \
+ "budget_exceeded not raised for tight-budget region within 2 s"
+ cleanup; return
+ fi
+
+ # The violation must carry threshold=10 (Region B's budget).
+ ev=$(grep "budget_exceeded" "${TRACE_FILE}" | head -n 1)
+ if ! echo "${ev}" | grep -q "threshold=10 "; then
+ tap_fail "uprobe_independent_thresholds" \
+ "violation threshold is not Region B's 10 us: ${ev}"
+ cleanup; return
+ fi
+ tap_pass "uprobe_independent_thresholds"
+ cleanup
+}
+
+# ---------------------------------------------------------------------------
+# ioctl tests via tlob_helper
+#
+# Each test invokes the helper with a sub-test name.
+# Exit code: 0=pass, 1=fail, 2=skip.
+# ---------------------------------------------------------------------------
+run_ioctl_test() {
+ testname=$1
+ next_test
+
+ if [ ! -x "${IOCTL_HELPER}" ]; then
+ tap_skip "ioctl_${testname}" \
+ "tlob_helper not found or not executable"
+ return
+ fi
+ if [ ! -c "${RV_DEV}" ]; then
+ tap_skip "ioctl_${testname}" \
+ "${RV_DEV} not present (CONFIG_RV_CHARDEV=y needed)"
+ return
+ fi
+
+ tlob_enable
+ "${IOCTL_HELPER}" "${testname}"
+ rc=$?
+ tlob_disable
+
+ case "${rc}" in
+ 0) tap_pass "ioctl_${testname}" ;;
+ 2) tap_skip "ioctl_${testname}" "helper returned skip" ;;
+ *) tap_fail "ioctl_${testname}" "helper exited with code ${rc}" ;;
+ esac
+}
+
+# run_ioctl_test_not_enabled - like run_ioctl_test but deliberately does NOT
+# enable the tlob monitor before invoking the helper. Used to verify that
+# ioctls issued against a disabled monitor return ENODEV rather than crashing
+# the kernel with a NULL pointer dereference.
run_ioctl_test_not_enabled() {
+ next_test
+
+ if [ ! -x "${IOCTL_HELPER}" ]; then
+ tap_skip "ioctl_not_enabled" \
+ "tlob_helper not found or not executable"
+ return
+ fi
+ if [ ! -c "${RV_DEV}" ]; then
+ tap_skip "ioctl_not_enabled" \
+ "${RV_DEV} not present (CONFIG_RV_CHARDEV=y needed)"
+ return
+ fi
+
+ # Monitor intentionally left disabled.
+ tlob_disable
+ "${IOCTL_HELPER}" not_enabled
+ rc=$?
+
+ case "${rc}" in
+ 0) tap_pass "ioctl_not_enabled" ;;
+ 2) tap_skip "ioctl_not_enabled" "helper returned skip" ;;
+ *) tap_fail "ioctl_not_enabled" "helper exited with code ${rc}" ;;
+ esac
+}
+
+# ---------------------------------------------------------------------------
+# Main
+# ---------------------------------------------------------------------------
+check_root; check_tracefs; check_rv_dir; check_tlob
+tap_header; tap_plan 20
+
+# tracefs interface tests
+run_test_enable_disable
+run_test_tracefs_files
+
+# uprobe external monitoring tests
+run_test_uprobe_no_false_positive
+run_test_uprobe_violation
+run_test_uprobe_unbind
+run_test_uprobe_duplicate_offset
+run_test_uprobe_independent_thresholds
+
+# /dev/rv ioctl self-instrumentation tests
+run_ioctl_test_not_enabled
+run_ioctl_test within_budget
+run_ioctl_test over_budget_cpu
+run_ioctl_test over_budget_sleep
+run_ioctl_test double_start
+run_ioctl_test stop_no_start
+run_ioctl_test multi_thread
+run_ioctl_test self_watch
+run_ioctl_test invalid_flags
+run_ioctl_test notify_fd_bad
+run_ioctl_test mmap_basic
+run_ioctl_test mmap_errors
+run_ioctl_test mmap_consume
+
+echo "# Passed: ${t_pass} Failed: ${t_fail} Skipped: ${t_skip}"
+[ "${t_fail}" -gt 0 ] && exit 1 || exit 0
diff --git a/tools/testing/selftests/rv/tlob_helper.c b/tools/testing/selftests/rv/tlob_helper.c
new file mode 100644
index 000000000..cd76b56d1
--- /dev/null
+++ b/tools/testing/selftests/rv/tlob_helper.c
@@ -0,0 +1,994 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * tlob_helper.c - test helper and ELF utility for tlob selftests
+ *
+ * Called by test_tlob.sh to exercise the /dev/rv ioctl interface and to
+ * resolve ELF symbol offsets for uprobe bindings. One subcommand per
+ * invocation so the shell script can report each as an independent TAP
+ * test case.
+ *
+ * Usage: tlob_helper <subcommand> [args...]
+ *
+ * Synchronous TRACE_START / TRACE_STOP tests:
+ * not_enabled - TRACE_START without tlob enabled -> ENODEV (no kernel crash)
+ * within_budget - start(50000 us), sleep 10 ms, stop -> expect 0
+ * over_budget_cpu - start(5000 us), busyspin 100 ms, stop -> EOVERFLOW
+ * over_budget_sleep - start(3000 us), sleep 50 ms, stop -> EOVERFLOW
+ *
+ * Error-handling tests:
+ * double_start - two starts without stop -> EEXIST on second
+ * stop_no_start - stop without start -> ESRCH
+ *
+ * Per-thread isolation test:
+ * multi_thread - two threads share one fd; one within budget, one over
+ *
+ * Asynchronous notification test (notify_fd + read()):
+ * self_watch - one worker exceeds budget; the monitor fd receives
+ *              exactly one notification via read()
+ *
+ * Input-validation tests (TRACE_START error paths):
+ * invalid_flags - TRACE_START with flags != 0 -> EINVAL
+ * notify_fd_bad - TRACE_START with notify_fd = stdout (non-rv fd) -> EINVAL
+ *
+ * mmap ring buffer tests (Scenario D):
+ * mmap_basic - mmap succeeds; verify tlob_mmap_page fields
+ * (version, capacity, data_offset, record_size)
+ * mmap_errors - MAP_PRIVATE, wrong size, and non-zero pgoff all
+ * return EINVAL
+ * mmap_consume - trigger a real violation via self-notification and
+ * consume the event through the mmap'd ring
+ *
+ * ELF utility (does not require /dev/rv):
+ * sym_offset <binary> <symbol>
+ * - print the ELF file offset of <symbol> in <binary>
+ * (used by the shell script to build uprobe bindings)
+ *
+ * Exit code: 0 = pass, 1 = fail, 2 = skip (device not available).
+ */
+#define _GNU_SOURCE
+#include <elf.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <poll.h>
+#include <pthread.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/mman.h>
+#include <sys/stat.h>
+#include <time.h>
+#include <unistd.h>
+
+#include <linux/rv.h>
+
+/* Default ring capacity allocated at open(); matches TLOB_RING_DEFAULT_CAP. */
+#define TLOB_RING_DEFAULT_CAP 64U
+
+static int rv_fd = -1;
+
+static int open_rv(void)
+{
+ rv_fd = open("/dev/rv", O_RDWR);
+ if (rv_fd < 0) {
+ fprintf(stderr, "open /dev/rv: %s\n", strerror(errno));
+ return -1;
+ }
+ return 0;
+}
+
+static void busy_spin_us(unsigned long us)
+{
+	struct timespec start, now;
+	uint64_t elapsed;
+
+	clock_gettime(CLOCK_MONOTONIC, &start);
+	do {
+		clock_gettime(CLOCK_MONOTONIC, &now);
+		/*
+		 * Accumulate in 64 bits: a 32-bit unsigned long overflows
+		 * after ~4.3 s worth of nanoseconds.
+		 */
+		elapsed = (uint64_t)(now.tv_sec - start.tv_sec)
+				* 1000000000ULL
+			+ (uint64_t)(now.tv_nsec - start.tv_nsec);
+	} while (elapsed < (uint64_t)us * 1000ULL);
+}
+
+static int do_start(uint64_t threshold_us)
+{
+ struct tlob_start_args args = {
+ .threshold_us = threshold_us,
+ .notify_fd = -1,
+ };
+
+ return ioctl(rv_fd, TLOB_IOCTL_TRACE_START, &args);
+}
+
+static int do_stop(void)
+{
+ return ioctl(rv_fd, TLOB_IOCTL_TRACE_STOP, NULL);
+}
+
+/* -----------------------------------------------------------------------
+ * Synchronous TRACE_START / TRACE_STOP tests
+ * -----------------------------------------------------------------------
+ */
+
+/*
+ * test_not_enabled - TRACE_START must return ENODEV when the tlob monitor
+ * has not been enabled (tlob_state_cache is NULL).
+ *
+ * The shell wrapper deliberately does NOT call tlob_enable before invoking
+ * this subcommand, so the ioctl is expected to fail with ENODEV rather than
+ * crashing the kernel with a NULL pointer dereference in kmem_cache_alloc.
+ */
+static int test_not_enabled(void)
+{
+ int ret;
+
+ ret = do_start(1000);
+ if (ret == 0) {
+ fprintf(stderr, "TRACE_START: expected ENODEV, got success\n");
+ do_stop();
+ return 1;
+ }
+ if (errno != ENODEV) {
+ fprintf(stderr, "TRACE_START: expected ENODEV, got %s\n",
+ strerror(errno));
+ return 1;
+ }
+ return 0;
+}
+
+static int test_within_budget(void)
+{
+ int ret;
+
+ if (do_start(50000) < 0) {
+ fprintf(stderr, "TRACE_START: %s\n", strerror(errno));
+ return 1;
+ }
+ usleep(10000); /* 10 ms < 50 ms budget */
+ ret = do_stop();
+ if (ret != 0) {
+ fprintf(stderr, "TRACE_STOP: expected 0, got %d errno=%s\n",
+ ret, strerror(errno));
+ return 1;
+ }
+ return 0;
+}
+
+static int test_over_budget_cpu(void)
+{
+ int ret;
+
+ if (do_start(5000) < 0) {
+ fprintf(stderr, "TRACE_START: %s\n", strerror(errno));
+ return 1;
+ }
+ busy_spin_us(100000); /* 100 ms >> 5 ms budget */
+ ret = do_stop();
+ if (ret == 0) {
+ fprintf(stderr, "TRACE_STOP: expected EOVERFLOW, got 0\n");
+ return 1;
+ }
+ if (errno != EOVERFLOW) {
+ fprintf(stderr, "TRACE_STOP: expected EOVERFLOW, got %s\n",
+ strerror(errno));
+ return 1;
+ }
+ return 0;
+}
+
+static int test_over_budget_sleep(void)
+{
+ int ret;
+
+ if (do_start(3000) < 0) {
+ fprintf(stderr, "TRACE_START: %s\n", strerror(errno));
+ return 1;
+ }
+ usleep(50000); /* 50 ms >> 3 ms budget, off-CPU time counts */
+ ret = do_stop();
+ if (ret == 0) {
+ fprintf(stderr, "TRACE_STOP: expected EOVERFLOW, got 0\n");
+ return 1;
+ }
+ if (errno != EOVERFLOW) {
+ fprintf(stderr, "TRACE_STOP: expected EOVERFLOW, got %s\n",
+ strerror(errno));
+ return 1;
+ }
+ return 0;
+}
+
+/* -----------------------------------------------------------------------
+ * Error-handling tests
+ * -----------------------------------------------------------------------
+ */
+
+static int test_double_start(void)
+{
+ int ret;
+
+ if (do_start(10000000) < 0) {
+ fprintf(stderr, "first TRACE_START: %s\n", strerror(errno));
+ return 1;
+ }
+ ret = do_start(10000000);
+ if (ret == 0) {
+ fprintf(stderr, "second TRACE_START: expected EEXIST, got 0\n");
+ do_stop();
+ return 1;
+ }
+ if (errno != EEXIST) {
+ fprintf(stderr, "second TRACE_START: expected EEXIST, got %s\n",
+ strerror(errno));
+ do_stop();
+ return 1;
+ }
+ do_stop(); /* clean up */
+ return 0;
+}
+
+static int test_stop_no_start(void)
+{
+ int ret;
+
+ /* Ensure clean state: ignore error from a stale entry */
+ do_stop();
+
+ ret = do_stop();
+ if (ret == 0) {
+ fprintf(stderr, "TRACE_STOP: expected ESRCH, got 0\n");
+ return 1;
+ }
+ if (errno != ESRCH) {
+ fprintf(stderr, "TRACE_STOP: expected ESRCH, got %s\n",
+ strerror(errno));
+ return 1;
+ }
+ return 0;
+}
+
+/* -----------------------------------------------------------------------
+ * Per-thread isolation test
+ *
+ * Two threads share a single /dev/rv fd. The monitor uses task_struct *
+ * as the key, so each thread gets an independent slot regardless of the
+ * shared fd.
+ * -----------------------------------------------------------------------
+ */
+
+struct mt_thread_args {
+ uint64_t threshold_us;
+ unsigned long workload_us;
+ int busy;
+ int expect_eoverflow;
+ int result;
+};
+
+static void *mt_thread_fn(void *arg)
+{
+ struct mt_thread_args *a = arg;
+ int ret;
+
+ if (do_start(a->threshold_us) < 0) {
+ fprintf(stderr, "thread TRACE_START: %s\n", strerror(errno));
+ a->result = 1;
+ return NULL;
+ }
+
+ if (a->busy)
+ busy_spin_us(a->workload_us);
+ else
+ usleep(a->workload_us);
+
+ ret = do_stop();
+ if (a->expect_eoverflow) {
+ if (ret == 0 || errno != EOVERFLOW) {
+ fprintf(stderr, "thread: expected EOVERFLOW, got ret=%d errno=%s\n",
+ ret, strerror(errno));
+ a->result = 1;
+ return NULL;
+ }
+ } else {
+ if (ret != 0) {
+ fprintf(stderr, "thread: expected 0, got ret=%d errno=%s\n",
+ ret, strerror(errno));
+ a->result = 1;
+ return NULL;
+ }
+ }
+ a->result = 0;
+ return NULL;
+}
+
+static int test_multi_thread(void)
+{
+ pthread_t ta, tb;
+ struct mt_thread_args a = {
+ .threshold_us = 20000, /* 20 ms */
+ .workload_us = 5000, /* 5 ms sleep -> within budget */
+ .busy = 0,
+ .expect_eoverflow = 0,
+ };
+ struct mt_thread_args b = {
+ .threshold_us = 3000, /* 3 ms */
+ .workload_us = 30000, /* 30 ms spin -> over budget */
+ .busy = 1,
+ .expect_eoverflow = 1,
+ };
+
+ pthread_create(&ta, NULL, mt_thread_fn, &a);
+ pthread_create(&tb, NULL, mt_thread_fn, &b);
+ pthread_join(ta, NULL);
+ pthread_join(tb, NULL);
+
+ return (a.result || b.result) ? 1 : 0;
+}
+
+/* -----------------------------------------------------------------------
+ * Asynchronous notification test (notify_fd + read())
+ *
+ * A dedicated monitor_fd is opened by the main thread. Two worker threads
+ * each open their own work_fd and call TLOB_IOCTL_TRACE_START with
+ * notify_fd = monitor_fd, nominating it as the violation target. Worker A
+ * stays within budget; worker B exceeds it. The main thread reads from
+ * monitor_fd and expects exactly one tlob_event record.
+ * -----------------------------------------------------------------------
+ */
+
+struct sw_worker_args {
+ int monitor_fd;
+ uint64_t threshold_us;
+ unsigned long workload_us;
+ int busy;
+ int result;
+};
+
+static void *sw_worker_fn(void *arg)
+{
+ struct sw_worker_args *a = arg;
+ struct tlob_start_args args = {
+ .threshold_us = a->threshold_us,
+ .notify_fd = a->monitor_fd,
+ };
+ int work_fd;
+ int ret;
+
+ work_fd = open("/dev/rv", O_RDWR);
+ if (work_fd < 0) {
+ fprintf(stderr, "worker open /dev/rv: %s\n", strerror(errno));
+ a->result = 1;
+ return NULL;
+ }
+
+ ret = ioctl(work_fd, TLOB_IOCTL_TRACE_START, &args);
+ if (ret < 0) {
+ fprintf(stderr, "TRACE_START (notify): %s\n", strerror(errno));
+ close(work_fd);
+ a->result = 1;
+ return NULL;
+ }
+
+ if (a->busy)
+ busy_spin_us(a->workload_us);
+ else
+ usleep(a->workload_us);
+
+ ioctl(work_fd, TLOB_IOCTL_TRACE_STOP, NULL);
+ close(work_fd);
+ a->result = 0;
+ return NULL;
+}
+
+static int test_self_watch(void)
+{
+ int monitor_fd;
+ pthread_t ta, tb;
+ struct sw_worker_args a = {
+ .threshold_us = 50000, /* 50 ms */
+ .workload_us = 5000, /* 5 ms sleep -> no violation */
+ .busy = 0,
+ };
+ struct sw_worker_args b = {
+ .threshold_us = 3000, /* 3 ms */
+ .workload_us = 30000, /* 30 ms spin -> violation */
+ .busy = 1,
+ };
+ struct tlob_event ntfs[8];
+ int violations = 0;
+ ssize_t n;
+
+ /*
+ * Open monitor_fd with O_NONBLOCK so read() after the workers finish
+ * returns immediately rather than blocking forever.
+ */
+ monitor_fd = open("/dev/rv", O_RDWR | O_NONBLOCK);
+ if (monitor_fd < 0) {
+ fprintf(stderr, "open /dev/rv (monitor_fd): %s\n", strerror(errno));
+ return 1;
+ }
+ a.monitor_fd = monitor_fd;
+ b.monitor_fd = monitor_fd;
+
+ pthread_create(&ta, NULL, sw_worker_fn, &a);
+ pthread_create(&tb, NULL, sw_worker_fn, &b);
+ pthread_join(ta, NULL);
+ pthread_join(tb, NULL);
+
+ if (a.result || b.result) {
+ close(monitor_fd);
+ return 1;
+ }
+
+ /*
+ * Drain all available tlob_event records. With O_NONBLOCK the final
+ * read() returns -1 with errno set to EAGAIN once the buffer is empty.
+ */
+ while ((n = read(monitor_fd, ntfs, sizeof(ntfs))) > 0)
+ violations += (int)(n / sizeof(struct tlob_event));
+
+ close(monitor_fd);
+
+ if (violations != 1) {
+ fprintf(stderr, "self_watch: expected 1 violation, got %d\n",
+ violations);
+ return 1;
+ }
+ return 0;
+}
+
+/* -----------------------------------------------------------------------
+ * Input-validation tests (TRACE_START error paths)
+ * -----------------------------------------------------------------------
+ */
+
+/*
+ * test_invalid_flags - TRACE_START with flags != 0 must return EINVAL.
+ *
+ * The flags field is reserved for future extensions and must be zero.
+ * Callers that set it to a non-zero value are rejected early so that a
+ * future kernel can assign meaning to those bits without silently
+ * ignoring them.
+ */
+static int test_invalid_flags(void)
+{
+ struct tlob_start_args args = {
+ .threshold_us = 1000,
+ .notify_fd = -1,
+ .flags = 1, /* non-zero: must be rejected */
+ };
+ int ret;
+
+ ret = ioctl(rv_fd, TLOB_IOCTL_TRACE_START, &args);
+ if (ret == 0) {
+ fprintf(stderr, "TRACE_START(flags=1): expected EINVAL, got success\n");
+ do_stop();
+ return 1;
+ }
+ if (errno != EINVAL) {
+ fprintf(stderr, "TRACE_START(flags=1): expected EINVAL, got %s\n",
+ strerror(errno));
+ return 1;
+ }
+ return 0;
+}
+
+/*
+ * test_notify_fd_bad - TRACE_START with a non-/dev/rv notify_fd must return
+ * EINVAL.
+ *
+ * When notify_fd >= 0, the kernel resolves it to a struct file and checks
+ * that its private_data is non-NULL (i.e. it is a /dev/rv file descriptor).
+ * Passing stdout (fd 1) supplies a real, open fd whose private_data is NULL,
+ * so the kernel must reject it with EINVAL.
+ */
+static int test_notify_fd_bad(void)
+{
+ struct tlob_start_args args = {
+ .threshold_us = 1000,
+ .notify_fd = STDOUT_FILENO, /* open but not a /dev/rv fd */
+ .flags = 0,
+ };
+ int ret;
+
+ ret = ioctl(rv_fd, TLOB_IOCTL_TRACE_START, &args);
+ if (ret == 0) {
+ fprintf(stderr,
+ "TRACE_START(notify_fd=stdout): expected EINVAL, got success\n");
+ do_stop();
+ return 1;
+ }
+ if (errno != EINVAL) {
+ fprintf(stderr,
+ "TRACE_START(notify_fd=stdout): expected EINVAL, got %s\n",
+ strerror(errno));
+ return 1;
+ }
+ return 0;
+}
+
+/* -----------------------------------------------------------------------
+ * mmap ring buffer tests (Scenario D)
+ * -----------------------------------------------------------------------
+ */
+
+/*
+ * test_mmap_basic - mmap the ring buffer and verify the control page fields.
+ *
+ * The kernel allocates TLOB_RING_DEFAULT_CAP records at open(). A shared
+ * mmap of PAGE_SIZE + cap * record_size must succeed and the tlob_mmap_page
+ * header must contain consistent values.
+ */
+static int test_mmap_basic(void)
+{
+ long pagesize = sysconf(_SC_PAGESIZE);
+ size_t mmap_len = (size_t)pagesize +
+ TLOB_RING_DEFAULT_CAP * sizeof(struct tlob_event);
+ /* rv_mmap requires a page-aligned length */
+ mmap_len = (mmap_len + (size_t)(pagesize - 1)) & ~(size_t)(pagesize - 1);
+ struct tlob_mmap_page *page;
+ struct tlob_event *data;
+ void *map;
+ int ret = 0;
+
+ map = mmap(NULL, mmap_len, PROT_READ | PROT_WRITE, MAP_SHARED, rv_fd, 0);
+ if (map == MAP_FAILED) {
+ fprintf(stderr, "mmap_basic: mmap: %s\n", strerror(errno));
+ return 1;
+ }
+
+ page = (struct tlob_mmap_page *)map;
+ data = (struct tlob_event *)((char *)map + page->data_offset);
+
+ if (page->version != 1) {
+ fprintf(stderr, "mmap_basic: expected version=1, got %u\n",
+ page->version);
+ ret = 1;
+ goto out;
+ }
+ if (page->capacity != TLOB_RING_DEFAULT_CAP) {
+ fprintf(stderr, "mmap_basic: expected capacity=%u, got %u\n",
+ TLOB_RING_DEFAULT_CAP, page->capacity);
+ ret = 1;
+ goto out;
+ }
+ if (page->data_offset != (uint32_t)pagesize) {
+ fprintf(stderr, "mmap_basic: expected data_offset=%ld, got %u\n",
+ pagesize, page->data_offset);
+ ret = 1;
+ goto out;
+ }
+ if (page->record_size != sizeof(struct tlob_event)) {
+ fprintf(stderr, "mmap_basic: expected record_size=%zu, got %u\n",
+ sizeof(struct tlob_event), page->record_size);
+ ret = 1;
+ goto out;
+ }
+ if (page->data_head != 0 || page->data_tail != 0) {
+ fprintf(stderr, "mmap_basic: ring not empty at open: head=%u tail=%u\n",
+ page->data_head, page->data_tail);
+ ret = 1;
+ goto out;
+ }
+ /* Touch the data array to confirm the mapping is accessible; the
+ * volatile access prevents the compiler from eliding the load.
+ */
+ (void)*(volatile unsigned char *)&data[0];
+out:
+ munmap(map, mmap_len);
+ return ret;
+}
+
+/*
+ * test_mmap_errors - verify that rv_mmap() rejects invalid mmap parameters.
+ *
+ * Four cases are tested, each must return MAP_FAILED with errno == EINVAL:
+ * 1. size one page short of the correct ring length
+ * 2. size one page larger than the correct ring length
+ * 3. MAP_PRIVATE (only MAP_SHARED is permitted)
+ * 4. non-zero vm_pgoff (offset must be 0)
+ */
+static int test_mmap_errors(void)
+{
+ long pagesize = sysconf(_SC_PAGESIZE);
+ size_t correct_len = (size_t)pagesize +
+ TLOB_RING_DEFAULT_CAP * sizeof(struct tlob_event);
+ /* rv_mmap requires a page-aligned length */
+ correct_len = (correct_len + (size_t)(pagesize - 1)) & ~(size_t)(pagesize - 1);
+ void *map;
+ int ret = 0;
+
+ /* Case 1: one full page short (subtracting just 1 would round back up to correct_len) */
+ map = mmap(NULL, correct_len - (size_t)pagesize, PROT_READ | PROT_WRITE,
+ MAP_SHARED, rv_fd, 0);
+ if (map != MAP_FAILED) {
+ fprintf(stderr, "mmap_errors: short-size mmap succeeded (expected EINVAL)\n");
+ munmap(map, correct_len - (size_t)pagesize);
+ ret = 1;
+ } else if (errno != EINVAL) {
+ fprintf(stderr, "mmap_errors: short-size: expected EINVAL, got %s\n",
+ strerror(errno));
+ ret = 1;
+ }
+
+ /* Case 2: size one page too large */
+ map = mmap(NULL, correct_len + (size_t)pagesize, PROT_READ | PROT_WRITE,
+ MAP_SHARED, rv_fd, 0);
+ if (map != MAP_FAILED) {
+ fprintf(stderr, "mmap_errors: oversized mmap succeeded (expected EINVAL)\n");
+ munmap(map, correct_len + (size_t)pagesize);
+ ret = 1;
+ } else if (errno != EINVAL) {
+ fprintf(stderr, "mmap_errors: oversized: expected EINVAL, got %s\n",
+ strerror(errno));
+ ret = 1;
+ }
+
+ /* Case 3: MAP_PRIVATE instead of MAP_SHARED */
+ map = mmap(NULL, correct_len, PROT_READ | PROT_WRITE,
+ MAP_PRIVATE, rv_fd, 0);
+ if (map != MAP_FAILED) {
+ fprintf(stderr, "mmap_errors: MAP_PRIVATE succeeded (expected EINVAL)\n");
+ munmap(map, correct_len);
+ ret = 1;
+ } else if (errno != EINVAL) {
+ fprintf(stderr, "mmap_errors: MAP_PRIVATE: expected EINVAL, got %s\n",
+ strerror(errno));
+ ret = 1;
+ }
+
+ /* Case 4: non-zero file offset (pgoff = 1) */
+ map = mmap(NULL, correct_len, PROT_READ | PROT_WRITE,
+ MAP_SHARED, rv_fd, (off_t)pagesize);
+ if (map != MAP_FAILED) {
+ fprintf(stderr, "mmap_errors: non-zero pgoff mmap succeeded (expected EINVAL)\n");
+ munmap(map, correct_len);
+ ret = 1;
+ } else if (errno != EINVAL) {
+ fprintf(stderr, "mmap_errors: non-zero pgoff: expected EINVAL, got %s\n",
+ strerror(errno));
+ ret = 1;
+ }
+
+ return ret;
+}
+
+/*
+ * test_mmap_consume - zero-copy consumption of a real violation event.
+ *
+ * Arms a 5 ms budget with self-notification (notify_fd = rv_fd), sleeps
+ * 50 ms (off-CPU violation), then reads the pushed event through the mmap'd
+ * ring without calling read(). Verifies:
+ * - TRACE_STOP returns EOVERFLOW (budget was exceeded)
+ * - data_head == 1 after the violation
+ * - the event fields (threshold_us, tag, tid) are correct
+ * - data_tail can be advanced to consume the record (ring empties)
+ */
+static int test_mmap_consume(void)
+{
+ long pagesize = sysconf(_SC_PAGESIZE);
+ size_t mmap_len = (size_t)pagesize +
+ TLOB_RING_DEFAULT_CAP * sizeof(struct tlob_event);
+ /* rv_mmap requires a page-aligned length */
+ mmap_len = (mmap_len + (size_t)(pagesize - 1)) & ~(size_t)(pagesize - 1);
+ struct tlob_start_args args = {
+ .threshold_us = 5000, /* 5 ms */
+ .notify_fd = rv_fd, /* self-notification */
+ .tag = 0xdeadbeefULL,
+ .flags = 0,
+ };
+ struct tlob_mmap_page *page;
+ struct tlob_event *data;
+ void *map;
+ int stop_ret;
+ int ret = 0;
+
+ map = mmap(NULL, mmap_len, PROT_READ | PROT_WRITE, MAP_SHARED, rv_fd, 0);
+ if (map == MAP_FAILED) {
+ fprintf(stderr, "mmap_consume: mmap: %s\n", strerror(errno));
+ return 1;
+ }
+
+ page = (struct tlob_mmap_page *)map;
+ data = (struct tlob_event *)((char *)map + page->data_offset);
+
+ if (ioctl(rv_fd, TLOB_IOCTL_TRACE_START, &args) < 0) {
+ fprintf(stderr, "mmap_consume: TRACE_START: %s\n", strerror(errno));
+ ret = 1;
+ goto out;
+ }
+
+ usleep(50000); /* 50 ms >> 5 ms budget -> off-CPU violation */
+
+ stop_ret = ioctl(rv_fd, TLOB_IOCTL_TRACE_STOP, NULL);
+ if (stop_ret == 0) {
+ fprintf(stderr, "mmap_consume: TRACE_STOP returned 0, expected EOVERFLOW\n");
+ ret = 1;
+ goto out;
+ }
+ if (errno != EOVERFLOW) {
+ fprintf(stderr, "mmap_consume: TRACE_STOP: expected EOVERFLOW, got %s\n",
+ strerror(errno));
+ ret = 1;
+ goto out;
+ }
+
+ /* Pairs with smp_store_release in tlob_event_push. */
+ if (__atomic_load_n(&page->data_head, __ATOMIC_ACQUIRE) != 1) {
+ fprintf(stderr, "mmap_consume: expected data_head=1, got %u\n",
+ page->data_head);
+ ret = 1;
+ goto out;
+ }
+ if (page->data_tail != 0) {
+ fprintf(stderr, "mmap_consume: expected data_tail=0, got %u\n",
+ page->data_tail);
+ ret = 1;
+ goto out;
+ }
+
+ /* Verify record content */
+ if (data[0].threshold_us != 5000) {
+ fprintf(stderr, "mmap_consume: expected threshold_us=5000, got %llu\n",
+ (unsigned long long)data[0].threshold_us);
+ ret = 1;
+ goto out;
+ }
+ if (data[0].tag != 0xdeadbeefULL) {
+ fprintf(stderr, "mmap_consume: expected tag=0xdeadbeef, got %llx\n",
+ (unsigned long long)data[0].tag);
+ ret = 1;
+ goto out;
+ }
+ if (data[0].tid == 0) {
+ fprintf(stderr, "mmap_consume: tid is 0\n");
+ ret = 1;
+ goto out;
+ }
+
+ /* Consume: advance data_tail and confirm ring is empty */
+ __atomic_store_n(&page->data_tail, 1U, __ATOMIC_RELEASE);
+ if (__atomic_load_n(&page->data_head, __ATOMIC_ACQUIRE) !=
+ __atomic_load_n(&page->data_tail, __ATOMIC_ACQUIRE)) {
+ fprintf(stderr, "mmap_consume: ring not empty after consume\n");
+ ret = 1;
+ }
+
+out:
+ munmap(map, mmap_len);
+ return ret;
+}
+
+/* -----------------------------------------------------------------------
+ * ELF utility: sym_offset
+ *
+ * Print the ELF file offset of a symbol in a binary. Supports 32- and
+ * 64-bit ELF. Walks the section headers to find .symtab (falling back to
+ * .dynsym), then converts the symbol's virtual address to a file offset
+ * via the PT_LOAD program headers.
+ *
+ * Does not require /dev/rv; used by the shell script to build uprobe
+ * bindings of the form pid:threshold_us:offset_start:offset_stop:binary_path.
+ *
+ * Returns 0 on success (offset printed to stdout), 1 on failure.
+ * -----------------------------------------------------------------------
+ */
+static int sym_offset(const char *binary, const char *symname)
+{
+ int fd;
+ struct stat st;
+ void *map;
+ Elf64_Ehdr *ehdr;
+ Elf32_Ehdr *ehdr32;
+ int is64;
+ uint64_t sym_vaddr = 0;
+ int found = 0;
+ uint64_t file_offset = 0;
+
+ fd = open(binary, O_RDONLY);
+ if (fd < 0) {
+ fprintf(stderr, "open %s: %s\n", binary, strerror(errno));
+ return 1;
+ }
+ if (fstat(fd, &st) < 0) {
+ close(fd);
+ return 1;
+ }
+ map = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
+ close(fd);
+ if (map == MAP_FAILED) {
+ fprintf(stderr, "mmap: %s\n", strerror(errno));
+ return 1;
+ }
+
+ /* Identify ELF class */
+ ehdr = (Elf64_Ehdr *)map;
+ ehdr32 = (Elf32_Ehdr *)map;
+ if (st.st_size < EI_NIDENT ||
+ ehdr->e_ident[EI_MAG0] != ELFMAG0 ||
+ ehdr->e_ident[EI_MAG1] != ELFMAG1 ||
+ ehdr->e_ident[EI_MAG2] != ELFMAG2 ||
+ ehdr->e_ident[EI_MAG3] != ELFMAG3) {
+ fprintf(stderr, "%s: not an ELF file\n", binary);
+ munmap(map, (size_t)st.st_size);
+ return 1;
+ }
+ is64 = (ehdr->e_ident[EI_CLASS] == ELFCLASS64);
+
+ if (is64) {
+ /* Walk section headers to find .symtab or .dynsym */
+ Elf64_Shdr *shdrs = (Elf64_Shdr *)((char *)map + ehdr->e_shoff);
+ Elf64_Shdr *shstrtab_hdr = &shdrs[ehdr->e_shstrndx];
+ const char *shstrtab = (char *)map + shstrtab_hdr->sh_offset;
+ int si;
+
+ /* Prefer .symtab; fall back to .dynsym */
+ for (int pass = 0; pass < 2 && !found; pass++) {
+ const char *target = pass ? ".dynsym" : ".symtab";
+
+ for (si = 0; si < ehdr->e_shnum && !found; si++) {
+ Elf64_Shdr *sh = &shdrs[si];
+ const char *name = shstrtab + sh->sh_name;
+
+ if (strcmp(name, target) != 0)
+ continue;
+
+ Elf64_Shdr *strtab_sh = &shdrs[sh->sh_link];
+ const char *strtab = (char *)map + strtab_sh->sh_offset;
+ Elf64_Sym *syms = (Elf64_Sym *)((char *)map + sh->sh_offset);
+ uint64_t nsyms = sh->sh_size / sizeof(Elf64_Sym);
+ uint64_t j;
+
+ for (j = 0; j < nsyms; j++) {
+ if (strcmp(strtab + syms[j].st_name, symname) == 0) {
+ sym_vaddr = syms[j].st_value;
+ found = 1;
+ break;
+ }
+ }
+ }
+ }
+
+ if (!found) {
+ fprintf(stderr, "symbol '%s' not found in %s\n", symname, binary);
+ munmap(map, (size_t)st.st_size);
+ return 1;
+ }
+
+ /* Convert vaddr to file offset via PT_LOAD segments */
+ Elf64_Phdr *phdrs = (Elf64_Phdr *)((char *)map + ehdr->e_phoff);
+ int pi;
+
+ for (pi = 0; pi < ehdr->e_phnum; pi++) {
+ Elf64_Phdr *ph = &phdrs[pi];
+
+ if (ph->p_type != PT_LOAD)
+ continue;
+ if (sym_vaddr >= ph->p_vaddr &&
+ sym_vaddr < ph->p_vaddr + ph->p_filesz) {
+ file_offset = sym_vaddr - ph->p_vaddr + ph->p_offset;
+ break;
+ }
+ }
+ } else {
+ /* 32-bit ELF */
+ Elf32_Shdr *shdrs = (Elf32_Shdr *)((char *)map + ehdr32->e_shoff);
+ Elf32_Shdr *shstrtab_hdr = &shdrs[ehdr32->e_shstrndx];
+ const char *shstrtab = (char *)map + shstrtab_hdr->sh_offset;
+ int si;
+ uint32_t sym_vaddr32 = 0;
+
+ for (int pass = 0; pass < 2 && !found; pass++) {
+ const char *target = pass ? ".dynsym" : ".symtab";
+
+ for (si = 0; si < ehdr32->e_shnum && !found; si++) {
+ Elf32_Shdr *sh = &shdrs[si];
+ const char *name = shstrtab + sh->sh_name;
+
+ if (strcmp(name, target) != 0)
+ continue;
+
+ Elf32_Shdr *strtab_sh = &shdrs[sh->sh_link];
+ const char *strtab = (char *)map + strtab_sh->sh_offset;
+ Elf32_Sym *syms = (Elf32_Sym *)((char *)map + sh->sh_offset);
+ uint32_t nsyms = sh->sh_size / sizeof(Elf32_Sym);
+ uint32_t j;
+
+ for (j = 0; j < nsyms; j++) {
+ if (strcmp(strtab + syms[j].st_name, symname) == 0) {
+ sym_vaddr32 = syms[j].st_value;
+ found = 1;
+ break;
+ }
+ }
+ }
+ }
+
+ if (!found) {
+ fprintf(stderr, "symbol '%s' not found in %s\n", symname, binary);
+ munmap(map, (size_t)st.st_size);
+ return 1;
+ }
+
+ Elf32_Phdr *phdrs = (Elf32_Phdr *)((char *)map + ehdr32->e_phoff);
+ int pi;
+
+ for (pi = 0; pi < ehdr32->e_phnum; pi++) {
+ Elf32_Phdr *ph = &phdrs[pi];
+
+ if (ph->p_type != PT_LOAD)
+ continue;
+ if (sym_vaddr32 >= ph->p_vaddr &&
+ sym_vaddr32 < ph->p_vaddr + ph->p_filesz) {
+ file_offset = sym_vaddr32 - ph->p_vaddr + ph->p_offset;
+ break;
+ }
+ }
+ sym_vaddr = sym_vaddr32;
+ }
+
+ munmap(map, (size_t)st.st_size);
+
+ if (!file_offset && sym_vaddr) {
+ fprintf(stderr, "could not map vaddr 0x%lx to file offset\n",
+ (unsigned long)sym_vaddr);
+ return 1;
+ }
+
+ printf("0x%lx\n", (unsigned long)file_offset);
+ return 0;
+}
+
+int main(int argc, char *argv[])
+{
+ int rc;
+
+ if (argc < 2) {
+ fprintf(stderr, "Usage: %s <subcommand> [args...]\n", argv[0]);
+ return 1;
+ }
+
+ /* sym_offset does not need /dev/rv */
+ if (strcmp(argv[1], "sym_offset") == 0) {
+ if (argc < 4) {
+ fprintf(stderr, "Usage: %s sym_offset <binary> <symbol>\n",
+ argv[0]);
+ return 1;
+ }
+ return sym_offset(argv[2], argv[3]);
+ }
+
+ if (open_rv() < 0)
+ return 2; /* skip */
+
+ if (strcmp(argv[1], "not_enabled") == 0)
+ rc = test_not_enabled();
+ else if (strcmp(argv[1], "within_budget") == 0)
+ rc = test_within_budget();
+ else if (strcmp(argv[1], "over_budget_cpu") == 0)
+ rc = test_over_budget_cpu();
+ else if (strcmp(argv[1], "over_budget_sleep") == 0)
+ rc = test_over_budget_sleep();
+ else if (strcmp(argv[1], "double_start") == 0)
+ rc = test_double_start();
+ else if (strcmp(argv[1], "stop_no_start") == 0)
+ rc = test_stop_no_start();
+ else if (strcmp(argv[1], "multi_thread") == 0)
+ rc = test_multi_thread();
+ else if (strcmp(argv[1], "self_watch") == 0)
+ rc = test_self_watch();
+ else if (strcmp(argv[1], "invalid_flags") == 0)
+ rc = test_invalid_flags();
+ else if (strcmp(argv[1], "notify_fd_bad") == 0)
+ rc = test_notify_fd_bad();
+ else if (strcmp(argv[1], "mmap_basic") == 0)
+ rc = test_mmap_basic();
+ else if (strcmp(argv[1], "mmap_errors") == 0)
+ rc = test_mmap_errors();
+ else if (strcmp(argv[1], "mmap_consume") == 0)
+ rc = test_mmap_consume();
+ else {
+ fprintf(stderr, "Unknown test: %s\n", argv[1]);
+ rc = 1;
+ }
+
+ close(rv_fd);
+ return rc;
+}
diff --git a/tools/testing/selftests/rv/tlob_uprobe_target.c b/tools/testing/selftests/rv/tlob_uprobe_target.c
new file mode 100644
index 000000000..6c895cb40
--- /dev/null
+++ b/tools/testing/selftests/rv/tlob_uprobe_target.c
@@ -0,0 +1,108 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * tlob_uprobe_target.c - uprobe target binary for tlob selftests.
+ *
+ * Provides two well-known probe points:
+ * tlob_busy_work() - start probe: arms the tlob budget timer
+ * tlob_busy_work_done() - stop probe: cancels the timer on completion
+ *
+ * The tlob selftest writes a five-field uprobe binding:
+ * pid:threshold_us:offset_start:offset_stop:binary_path
+ * where offset_start is the file offset of tlob_busy_work and offset_stop
+ * is the file offset of tlob_busy_work_done (resolved via tlob_helper
+ * sym_offset).
+ *
+ * Both probe points are plain entry uprobes (no uretprobe). The busy loop
+ * keeps the task on-CPU so that either the stop probe fires cleanly (within
+ * budget) or the hrtimer fires first and emits tlob_budget_exceeded (over
+ * budget).
+ *
+ * Usage: tlob_uprobe_target <duration_ms>
+ *
+ * Loops calling tlob_busy_work() in 200 ms iterations until <duration_ms>
+ * has elapsed (0 = run for ~24 hours). Short iterations ensure the uprobe
+ * entry fires on every call even if the uprobe is installed after the
+ * program has started.
+ */
+#define _GNU_SOURCE
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <time.h>
+
+#ifndef noinline
+#define noinline __attribute__((noinline))
+#endif
+
+static inline int timespec_before(const struct timespec *a,
+ const struct timespec *b)
+{
+ return a->tv_sec < b->tv_sec ||
+ (a->tv_sec == b->tv_sec && a->tv_nsec < b->tv_nsec);
+}
+
+static void timespec_add_ms(struct timespec *ts, unsigned long ms)
+{
+ ts->tv_sec += ms / 1000;
+ ts->tv_nsec += (long)(ms % 1000) * 1000000L;
+ if (ts->tv_nsec >= 1000000000L) {
+ ts->tv_sec++;
+ ts->tv_nsec -= 1000000000L;
+ }
+}
+
+/*
+ * tlob_busy_work_done - stop-probe target.
+ *
+ * Called by tlob_busy_work() after the busy loop. The uprobe on this
+ * function's entry fires tlob_stop_task(), cancelling the budget timer.
+ * noinline ensures the compiler never merges this function with its caller,
+ * guaranteeing the entry uprobe always fires.
+ */
+noinline void tlob_busy_work_done(void)
+{
+ /* empty: the uprobe fires on entry */
+}
+
+/*
+ * tlob_busy_work - start-probe target.
+ *
+ * The uprobe on this function's entry fires tlob_start_task(), arming the
+ * budget timer. noinline prevents the compiler and linker (including LTO)
+ * from inlining this function into its callers, ensuring the entry uprobe
+ * fires on every call.
+ */
+noinline void tlob_busy_work(unsigned long duration_ns)
+{
+ struct timespec start, now;
+ unsigned long elapsed;
+
+ clock_gettime(CLOCK_MONOTONIC, &start);
+ do {
+ clock_gettime(CLOCK_MONOTONIC, &now);
+ elapsed = (unsigned long)(now.tv_sec - start.tv_sec)
+ * 1000000000UL
+ + (unsigned long)(now.tv_nsec - start.tv_nsec);
+ } while (elapsed < duration_ns);
+
+ tlob_busy_work_done();
+}
+
+int main(int argc, char *argv[])
+{
+ unsigned long duration_ms = 0;
+ struct timespec deadline, now;
+
+ if (argc >= 2)
+ duration_ms = strtoul(argv[1], NULL, 10);
+
+ clock_gettime(CLOCK_MONOTONIC, &deadline);
+ timespec_add_ms(&deadline, duration_ms ? duration_ms : 86400000UL);
+
+ do {
+ tlob_busy_work(200 * 1000000UL); /* 200 ms per iteration */
+ clock_gettime(CLOCK_MONOTONIC, &now);
+ } while (timespec_before(&now, &deadline));
+
+ return 0;
+}
--
2.43.0
* Re: [RFC PATCH 2/4] rv/tlob: Add tlob deterministic automaton monitor
2026-04-12 19:27 ` [RFC PATCH 2/4] rv/tlob: Add tlob deterministic automaton monitor wen.yang
@ 2026-04-13 8:19 ` Gabriele Monaco
0 siblings, 0 replies; 7+ messages in thread
From: Gabriele Monaco @ 2026-04-13 8:19 UTC (permalink / raw)
To: wen.yang
Cc: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
linux-trace-kernel, linux-kernel
On Mon, 2026-04-13 at 03:27 +0800, wen.yang@linux.dev wrote:
> From: Wen Yang <wen.yang@linux.dev>
>
> Add the tlob (task latency over budget) RV monitor. tlob tracks the
> monotonic elapsed time (CLOCK_MONOTONIC) of a marked per-task code
> path, including time off-CPU, and fires a per-task hrtimer when the
> elapsed time exceeds a configurable budget.
>
> Three-state DA (unmonitored/on_cpu/off_cpu) driven by trace_start,
> switch_in/out, and budget_expired events. Per-task state lives in a
> fixed-size hash table (TLOB_MAX_MONITORED slots) with RCU-deferred
> free.
>
> Two userspace interfaces:
> - tracefs: uprobe pair registration via the monitor file using the
> format "pid:threshold_us:offset_start:offset_stop:binary_path"
> - /dev/rv ioctls (CONFIG_RV_CHARDEV): TLOB_IOCTL_TRACE_START /
> TRACE_STOP; TRACE_STOP returns -EOVERFLOW on violation
>
> Each /dev/rv fd has a per-fd mmap ring buffer (physically contiguous
> pages). A control page (struct tlob_mmap_page) at offset 0 exposes
> head/tail/dropped for lockless userspace reads; struct tlob_event
> records follow at data_offset. Drop-new policy on overflow.
>
> UAPI: include/uapi/linux/rv.h (tlob_start_args, tlob_event,
> tlob_mmap_page, ioctl numbers), monitor_tlob.rst,
> ioctl-number.rst (RV_IOC_MAGIC=0xB9).
>
I'm not fully grasping all the requirements for the monitors yet, but I see you
are reimplementing a lot of functionality in the monitor itself rather than
within RV. Let's see if we can consolidate some of it:
* you're using timer expirations, can we do it with timed automata? [1]
* RV automata usually don't have an /unmonitored/ state: your trace_start event
would be the start condition (da_event_start) and the monitor would become
non-running at each violation (it calls da_monitor_reset() automatically);
all setup/cleanup logic should be handled implicitly within RV. I believe that
would also save you that ugly trace_event_tlob() redefinition.
* you're maintaining a local hash table for each task_struct, that could use
the per-object monitors [2] where your "object" is in fact your struct,
allocated when you start the monitor with all appropriate fields and indexed by
pid
* you are handling violations manually; since timed automata trigger a
full-fledged violation on timeouts, can you use the RV way (error tracepoints
or reactors only)? Do you need the additional reporting within the
tracepoint/ioctl? Could the userspace consumer infer all of that from other
events and let RV do just the monitoring?
* I like the uprobe thing, we could probably move all that to a common helper
once we figure out how to make it generic.
Note: [1] and [2] didn't reach upstream yet, but should reach linux-next soon.
Thanks,
Gabriele
[1] -
https://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git/commit/?h=rv/for-next&id=f5587d1b6ec938afb2f74fe399a68020d66923e4
[2] -
https://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git/commit/?h=rv/for-next&id=da282bf7fadb095ee0a40c32ff0126429c769b45
> Signed-off-by: Wen Yang <wen.yang@linux.dev>
> ---
> Documentation/trace/rv/index.rst | 1 +
> Documentation/trace/rv/monitor_tlob.rst | 381 +++++++
> .../userspace-api/ioctl/ioctl-number.rst | 1 +
> include/uapi/linux/rv.h | 181 ++++
> kernel/trace/rv/Kconfig | 17 +
> kernel/trace/rv/Makefile | 2 +
> kernel/trace/rv/monitors/tlob/Kconfig | 51 +
> kernel/trace/rv/monitors/tlob/tlob.c | 986 ++++++++++++++++++
> kernel/trace/rv/monitors/tlob/tlob.h | 145 +++
> kernel/trace/rv/monitors/tlob/tlob_trace.h | 42 +
> kernel/trace/rv/rv.c | 4 +
> kernel/trace/rv/rv_dev.c | 602 +++++++++++
> kernel/trace/rv/rv_trace.h | 50 +
> 13 files changed, 2463 insertions(+)
> create mode 100644 Documentation/trace/rv/monitor_tlob.rst
> create mode 100644 include/uapi/linux/rv.h
> create mode 100644 kernel/trace/rv/monitors/tlob/Kconfig
> create mode 100644 kernel/trace/rv/monitors/tlob/tlob.c
> create mode 100644 kernel/trace/rv/monitors/tlob/tlob.h
> create mode 100644 kernel/trace/rv/monitors/tlob/tlob_trace.h
> create mode 100644 kernel/trace/rv/rv_dev.c
>
> diff --git a/Documentation/trace/rv/index.rst b/Documentation/trace/rv/index.rst
> index a2812ac5c..4f2bfaf38 100644
> --- a/Documentation/trace/rv/index.rst
> +++ b/Documentation/trace/rv/index.rst
> @@ -15,3 +15,4 @@ Runtime Verification
> monitor_wwnr.rst
> monitor_sched.rst
> monitor_rtapp.rst
> + monitor_tlob.rst
> diff --git a/Documentation/trace/rv/monitor_tlob.rst b/Documentation/trace/rv/monitor_tlob.rst
> new file mode 100644
> index 000000000..d498e9894
> --- /dev/null
> +++ b/Documentation/trace/rv/monitor_tlob.rst
> @@ -0,0 +1,381 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +Monitor tlob
> +============
> +
> +- Name: tlob - task latency over budget
> +- Type: per-task deterministic automaton
> +- Author: Wen Yang <wen.yang@linux.dev>
> +
> +Description
> +-----------
> +
> +The tlob monitor tracks per-task elapsed time (CLOCK_MONOTONIC, including
> +both on-CPU and off-CPU time) and reports a violation when the monitored
> +task exceeds a configurable latency budget threshold.
> +
> +The monitor implements a three-state deterministic automaton::
> +
> + |
> + | (initial)
> + v
> + +--------------+
> + +-------> | unmonitored |
> + | +--------------+
> + | |
> + | trace_start
> + | v
> + | +--------------+
> + | | on_cpu |
> + | +--------------+
> + | | |
> + | switch_out| | trace_stop / budget_expired
> + | v v
> + | +--------------+ (unmonitored)
> + | | off_cpu |
> + | +--------------+
> + | | |
> + | | switch_in| trace_stop / budget_expired
> + | v v
> + | (on_cpu) (unmonitored)
> + |
> + +-- trace_stop (from on_cpu or off_cpu)
> +
> + Key transitions:
> + unmonitored --(trace_start)--> on_cpu
> + on_cpu --(switch_out)--> off_cpu
> + off_cpu --(switch_in)--> on_cpu
> + on_cpu --(trace_stop)--> unmonitored
> + off_cpu --(trace_stop)--> unmonitored
> + on_cpu --(budget_expired)-> unmonitored [violation]
> + off_cpu --(budget_expired)-> unmonitored [violation]
> +
> + sched_wakeup self-loops in on_cpu and unmonitored; switch_out and
> + sched_wakeup self-loop in off_cpu. budget_expired is fired by the
> + one-shot hrtimer; it always transitions to unmonitored regardless of
> + whether the task is on-CPU or off-CPU when the timer fires.
> +
> +State Descriptions
> +------------------
> +
> +- **unmonitored**: Task is not being traced. Scheduling events
> + (``switch_in``, ``switch_out``, ``sched_wakeup``) are silently
> + ignored (self-loop). The monitor waits for a ``trace_start`` event
> + to begin a new observation window.
> +
> +- **on_cpu**: Task is running on the CPU with the deadline timer armed.
> + A one-shot hrtimer was set for ``threshold_us`` microseconds at
> + ``trace_start`` time. A ``switch_out`` event transitions to
> + ``off_cpu``; the hrtimer keeps running (off-CPU time counts toward
> + the budget). A ``trace_stop`` cancels the timer and returns to
> + ``unmonitored`` (normal completion). If the hrtimer fires
> + (``budget_expired``) the violation is recorded and the automaton
> + transitions to ``unmonitored``.
> +
> +- **off_cpu**: Task was preempted or blocked. The one-shot hrtimer
> + continues to run. A ``switch_in`` event returns to ``on_cpu``.
> + A ``trace_stop`` cancels the timer and returns to ``unmonitored``.
> + If the hrtimer fires (``budget_expired``) while the task is off-CPU,
> + the violation is recorded and the automaton transitions to
> + ``unmonitored``.
> +
> +Rationale
> +---------
> +
> +The per-task latency budget threshold allows operators to express timing
> +requirements in microseconds and receive an immediate ftrace event when a
> +task exceeds its budget. This is useful for real-time tasks
> +(``SCHED_FIFO`` / ``SCHED_DEADLINE``) where total elapsed time must
> +remain within a known bound.
> +
> +Each task has an independent threshold, so up to ``TLOB_MAX_MONITORED``
> +(64) tasks with different timing requirements can be monitored
> +simultaneously.
> +
> +On threshold violation the automaton records a ``tlob_budget_exceeded``
> +ftrace event carrying the final on-CPU / off-CPU time breakdown, but does
> +not kill or throttle the task. Monitoring can be restarted by issuing a
> +new ``trace_start`` event (or a new ``TLOB_IOCTL_TRACE_START`` ioctl).
> +
> +A per-task one-shot hrtimer is armed at ``trace_start`` for exactly
> +``threshold_us`` microseconds. It fires at most once per monitoring
> +window, performs an O(1) hash lookup, records the violation, and injects
> +the ``budget_expired`` event into the DA. When ``CONFIG_RV_MON_TLOB``
> +is not set there is zero runtime cost.
> +
> +Usage
> +-----
> +
> +tracefs interface (uprobe-based external monitoring)
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +The ``monitor`` tracefs file allows any user with tracefs write access to
> +instrument an unmodified binary via uprobes, without changing its source
> +code. Write a
> +four-field record to attach two plain entry uprobes: one at
> +``offset_start`` fires ``tlob_start_task()`` and one at ``offset_stop``
> +fires ``tlob_stop_task()``, so the latency budget covers exactly the code
> +region between the two offsets::
> +
> + threshold_us:offset_start:offset_stop:binary_path
> +
> +``binary_path`` comes last so it may freely contain ``:`` (e.g. paths
> +inside a container namespace).
> +
> +The uprobes fire for every task that executes the probed instruction in
> +the binary, consistent with the native uprobe semantics. All tasks that
> +execute the code region get independent per-task monitoring slots.
> +
> +Using two plain entry uprobes (rather than a uretprobe for the stop) means
> +that a mistyped offset can never corrupt the call stack; the worst outcome
> +of a bad ``offset_stop`` is a missed stop that causes the hrtimer to fire
> +and report a budget violation.
> +
> +Example -- monitor a code region in ``/usr/bin/myapp`` with a 5 ms
> +budget, where the region starts at offset 0x12a0 and ends at 0x12f0::
> +
> + echo 1 > /sys/kernel/tracing/rv/monitors/tlob/enable
> +
> + # Bind uprobes: start probe starts the clock, stop probe stops it
> + echo "5000:0x12a0:0x12f0:/usr/bin/myapp" \
> + > /sys/kernel/tracing/rv/monitors/tlob/monitor
> +
> + # Remove the uprobe binding for this code region
> + echo "-0x12a0:/usr/bin/myapp" \
> + > /sys/kernel/tracing/rv/monitors/tlob/monitor
> +
> + # List registered uprobe bindings (mirrors the write format)
> + cat /sys/kernel/tracing/rv/monitors/tlob/monitor
> + # -> 5000:0x12a0:0x12f0:/usr/bin/myapp
> +
> + # Read violations from the trace buffer
> + cat /sys/kernel/tracing/trace
> +
> +Up to ``TLOB_MAX_MONITORED`` tasks may be monitored simultaneously.
> +
> +The offsets can be obtained with ``nm`` or ``readelf``::
> +
> + nm -n /usr/bin/myapp | grep my_function
> + # -> 00000000000012a0 T my_function
> +
> + readelf -s /usr/bin/myapp | grep my_function
> + # -> 42: 00000000000012a0 336 FUNC GLOBAL DEFAULT 13 my_function
> +
> + # offset_start = 0x12a0 (function entry)
> + # offset_stop = 0x12a0 + 0x50 = 0x12f0 (or any instruction before return)
> +
> +Notes:
> +
> +- The uprobes fire for every task that executes the probed instruction,
> + so concurrent calls from different threads each get independent
> + monitoring slots.
> +- ``offset_stop`` need not be a function return; it can be any instruction
> + within the region. If the stop probe is never reached (e.g. early exit
> + path bypasses it), the hrtimer fires and a budget violation is reported.
> +- Each ``(binary_path, offset_start)`` pair may only be registered once.
> + A second write with the same ``offset_start`` for the same binary is
> + rejected with ``-EEXIST``. Two entry uprobes at the same address would
> + both fire for every task, causing ``tlob_start_task()`` to be called
> + twice; the second call would silently fail with ``-EEXIST`` and the
> + second binding's threshold would never take effect. Different code
> + regions that share the same ``offset_stop`` (common exit point) are
> + explicitly allowed.
> +- The uprobe binding is removed when ``-offset_start:binary_path`` is
> + written to ``monitor``, or when the monitor is disabled.
> +- The ``tag`` field in every ``tlob_budget_exceeded`` event is
> + automatically set to ``offset_start`` for the tracefs path, so
> + violation events for different code regions are immediately
> + distinguishable even when ``threshold_us`` values are identical.
> +
> +ftrace ring buffer (budget violation events)
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +When a monitored task exceeds its latency budget the hrtimer fires,
> +records the violation, and emits a single ``tlob_budget_exceeded`` event
> +into the ftrace ring buffer. **Nothing is written to the ftrace ring
> +buffer while the task is within budget.**
> +
> +The event carries the on-CPU / off-CPU time breakdown so that root-cause
> +analysis (CPU-bound vs. scheduling / I/O overrun) is immediate::
> +
> + cat /sys/kernel/tracing/trace
> +
> +Example output::
> +
> + myapp-1234 [003] .... 12345.678: tlob_budget_exceeded: \
> + myapp[1234]: budget exceeded threshold=5000 \
> + on_cpu=820 off_cpu=4500 switches=3 state=off_cpu tag=0x00000000000012a0
> +
> +Field descriptions:
> +
> +``threshold``
> + Configured latency budget in microseconds.
> +
> +``on_cpu``
> + Cumulative on-CPU time since ``trace_start``, in microseconds.
> +
> +``off_cpu``
> + Cumulative off-CPU (scheduling + I/O wait) time since ``trace_start``,
> + in microseconds.
> +
> +``switches``
> + Number of times the task was scheduled out during this window.
> +
> +``state``
> + DA state when the hrtimer fired: ``on_cpu`` means the task was executing
> + when the budget expired (CPU-bound overrun); ``off_cpu`` means the task
> + was preempted or blocked (scheduling / I/O overrun).
> +
> +``tag``
> + Opaque 64-bit cookie supplied by the caller via ``tlob_start_args.tag``
> + (ioctl path) or automatically set to ``offset_start`` (tracefs uprobe
> + path). Use it to distinguish violations from different code regions
> + monitored by the same thread. Zero when not set.
> +
> +To capture violations in a file::
> +
> + trace-cmd record -e tlob_budget_exceeded &
> + # ... run workload ...
> + trace-cmd report
> +
> +/dev/rv ioctl interface (self-instrumentation)
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Tasks can self-instrument their own code paths via the ``/dev/rv`` misc
> +device (requires ``CONFIG_RV_CHARDEV``). The kernel key is
> +``task_struct``; multiple threads sharing a single fd each get their own
> +independent monitoring slot.
> +
> +**Synchronous mode** -- the calling thread checks its own result::
> +
> + int fd = open("/dev/rv", O_RDWR);
> +
> + struct tlob_start_args args = {
> + .threshold_us = 50000, /* 50 ms */
> + .tag = 0, /* optional; 0 = don't care */
> + .notify_fd = -1, /* no fd notification */
> + };
> + ioctl(fd, TLOB_IOCTL_TRACE_START, &args);
> +
> + /* ... code path under observation ... */
> +
> + int ret = ioctl(fd, TLOB_IOCTL_TRACE_STOP, NULL);
> + /* ret == 0: within budget */
> + /* ret == -EOVERFLOW: budget exceeded */
> +
> + close(fd);
> +
> +**Asynchronous mode** -- a dedicated monitor thread receives violation
> +records via ``read()`` on a shared fd, decoupling the observation from
> +the critical path::
> +
> + /* Monitor thread: open a dedicated fd. */
> + int monitor_fd = open("/dev/rv", O_RDWR);
> +
> + /* Worker thread: set notify_fd = monitor_fd in TRACE_START args. */
> + int work_fd = open("/dev/rv", O_RDWR);
> + struct tlob_start_args args = {
> + .threshold_us = 10000, /* 10 ms */
> + .tag = REGION_A,
> + .notify_fd = monitor_fd,
> + };
> + ioctl(work_fd, TLOB_IOCTL_TRACE_START, &args);
> + /* ... critical section ... */
> + ioctl(work_fd, TLOB_IOCTL_TRACE_STOP, NULL);
> +
> + /* Monitor thread: blocking read() returns one or more
> +  * tlob_event records. */
> + struct tlob_event ntfs[8];
> + ssize_t n = read(monitor_fd, ntfs, sizeof(ntfs));
> + ssize_t nrec = n > 0 ? n / (ssize_t)sizeof(struct tlob_event) : 0;
> + for (ssize_t i = 0; i < nrec; i++) {
> + struct tlob_event *ntf = &ntfs[i];
> + printf("tid=%u tag=0x%llx exceeded budget=%llu us "
> + "(on_cpu=%llu off_cpu=%llu switches=%u state=%s)\n",
> + ntf->tid, ntf->tag, ntf->threshold_us,
> + ntf->on_cpu_us, ntf->off_cpu_us, ntf->switches,
> + ntf->state ? "on_cpu" : "off_cpu");
> + }
> +
> +**mmap ring buffer** -- zero-copy consumption of violation events::
> +
> + int fd = open("/dev/rv", O_RDWR);
> + struct tlob_start_args args = {
> + .threshold_us = 1000, /* 1 ms */
> + .notify_fd = fd, /* push violations to own ring buffer */
> + };
> + ioctl(fd, TLOB_IOCTL_TRACE_START, &args);
> +
> + /* Map the ring: one control page + capacity data records.
> +  * The kernel requires a page-aligned length, so round up. */
> + size_t pagesize = sysconf(_SC_PAGESIZE);
> + size_t cap = 64; /* requested; read page->capacity after mmap */
> + size_t len = pagesize + cap * sizeof(struct tlob_event);
> + len = (len + pagesize - 1) & ~(pagesize - 1);
> + void *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> +
> + struct tlob_mmap_page *page = map;
> + struct tlob_event *data =
> + (struct tlob_event *)((char *)map + page->data_offset);
> +
> + /* Consumer loop: poll for events, read without copying. */
> + while (1) {
> + poll(&(struct pollfd){fd, POLLIN, 0}, 1, -1);
> +
> + uint32_t head = __atomic_load_n(&page->data_head, __ATOMIC_ACQUIRE);
> + uint32_t tail = page->data_tail;
> + while (tail != head) {
> + handle(&data[tail & (page->capacity - 1)]);
> + tail++;
> + }
> + __atomic_store_n(&page->data_tail, tail, __ATOMIC_RELEASE);
> + }
> +
> +Note: ``read()`` and ``mmap()`` share the same ring and ``data_tail``
> +cursor. Do not use both simultaneously on the same fd.
> +
> +``tlob_event`` fields:
> +
> +``tid``
> + Thread ID (``task_pid_vnr``) of the violating task.
> +
> +``threshold_us``
> + Budget that was exceeded, in microseconds.
> +
> +``on_cpu_us``
> + Cumulative on-CPU time at violation time, in microseconds.
> +
> +``off_cpu_us``
> + Cumulative off-CPU time at violation time, in microseconds.
> +
> +``switches``
> + Number of context switches since ``TRACE_START``.
> +
> +``state``
> + 1 = timer fired while task was on-CPU; 0 = timer fired while off-CPU.
> +
> +``tag``
> + Cookie from ``tlob_start_args.tag``; for the tracefs uprobe path this
> + equals ``offset_start``. Zero when not set.
> +
> +tracefs files
> +-------------
> +
> +The following files are created under
> +``/sys/kernel/tracing/rv/monitors/tlob/``:
> +
> +``enable`` (rw)
> + Write ``1`` to enable the monitor; write ``0`` to disable it and
> + stop all currently monitored tasks.
> +
> +``desc`` (ro)
> + Human-readable description of the monitor.
> +
> +``monitor`` (rw)
> + Write ``threshold_us:offset_start:offset_stop:binary_path`` to bind two
> + plain entry uprobes in *binary_path*. The uprobe at *offset_start* fires
> + ``tlob_start_task()``; the uprobe at *offset_stop* fires
> + ``tlob_stop_task()``. Returns ``-EEXIST`` if a binding with the same
> + *offset_start* already exists for *binary_path*. Write
> + ``-offset_start:binary_path`` to remove the binding. Read to list
> + registered bindings, one
> + ``threshold_us:0xoffset_start:0xoffset_stop:binary_path`` entry per line.
> +
> +Specification
> +-------------
> +
> +Graphviz DOT file in tools/verification/models/tlob.dot
> diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst b/Documentation/userspace-api/ioctl/ioctl-number.rst
> index 331223761..8d3af68db 100644
> --- a/Documentation/userspace-api/ioctl/ioctl-number.rst
> +++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
> @@ -385,6 +385,7 @@ Code  Seq#    Include File                  Comments
> 0xB8  01-02  uapi/misc/mrvl_cn10k_dpi.h    Marvell CN10K DPI driver
> 0xB8  all    uapi/linux/mshv.h             Microsoft Hyper-V /dev/mshv driver
>                                            <mailto:linux-hyperv@vger.kernel.org>
> +0xB9  00-3F  linux/rv.h                    Runtime Verification (RV) monitors
> 0xBA  00-0F  uapi/linux/liveupdate.h       Pasha Tatashin
>                                            <mailto:pasha.tatashin@soleen.com>
> 0xC0  00-0F  linux/usb/iowarrior.h
> diff --git a/include/uapi/linux/rv.h b/include/uapi/linux/rv.h
> new file mode 100644
> index 000000000..d1b96d8cd
> --- /dev/null
> +++ b/include/uapi/linux/rv.h
> @@ -0,0 +1,181 @@
> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> +/*
> + * UAPI definitions for Runtime Verification (RV) monitors.
> + *
> + * All RV monitors that expose an ioctl self-instrumentation interface
> + * share the magic byte RV_IOC_MAGIC (0xB9), registered in
> + * Documentation/userspace-api/ioctl/ioctl-number.rst.
> + *
> + * A single /dev/rv misc device serves as the entry point. ioctl numbers
> + * encode both the monitor identity and the operation:
> + *
> + * 0x01 - 0x1F tlob (task latency over budget)
> + * 0x20 - 0x3F reserved for future RV monitors
> + *
> + * Usage examples and design rationale are in:
> + * Documentation/trace/rv/monitor_tlob.rst
> + */
> +
> +#ifndef _UAPI_LINUX_RV_H
> +#define _UAPI_LINUX_RV_H
> +
> +#include <linux/ioctl.h>
> +#include <linux/types.h>
> +
> +/* Magic byte shared by all RV monitor ioctls. */
> +#define RV_IOC_MAGIC 0xB9
> +
> +/* -----------------------------------------------------------------------
> + * tlob: task latency over budget monitor (nr 0x01 - 0x1F)
> + * -----------------------------------------------------------------------
> + */
> +
> +/**
> + * struct tlob_start_args - arguments for TLOB_IOCTL_TRACE_START
> + * @threshold_us: Latency budget for this critical section, in microseconds.
> + * Must be greater than zero.
> + * @tag: Opaque 64-bit cookie supplied by the caller. Echoed back
> + * verbatim in the tlob_budget_exceeded ftrace event and in any
> + * tlob_event record delivered via @notify_fd. Use it to
> identify
> + * which code region triggered a violation when the same thread
> + * monitors multiple regions sequentially. Set to 0 if not
> + * needed.
> + * @notify_fd: File descriptor that will receive a tlob_event record on
> + * violation. Must refer to an open /dev/rv fd. May equal
> + * the calling fd (self-notification, useful for retrieving the
> + * on_cpu_us / off_cpu_us breakdown after TRACE_STOP returns
> + * -EOVERFLOW). Set to -1 to disable fd notification; in that
> + * case violations are only signalled via the TRACE_STOP return
> + * value and the tlob_budget_exceeded ftrace event.
> + * @flags: Must be 0. Reserved for future extensions.
> + */
> +struct tlob_start_args {
> + __u64 threshold_us;
> + __u64 tag;
> + __s32 notify_fd;
> + __u32 flags;
> +};
> +
> +/**
> + * struct tlob_event - one budget-exceeded event
> + *
> + * Consumed by read() on the notify_fd registered at TLOB_IOCTL_TRACE_START.
> + * Each record describes a single budget exceedance for one task.
> + *
> + * @tid: Thread ID (task_pid_vnr) of the violating task.
> + * @threshold_us: Budget that was exceeded, in microseconds.
> + * @on_cpu_us: Cumulative on-CPU time at violation time, in microseconds.
> + * @off_cpu_us: Cumulative off-CPU (scheduling + I/O wait) time at
> + * violation time, in microseconds.
> + * @switches: Number of context switches since TRACE_START.
> + * @state: DA state at violation: 1 = on_cpu, 0 = off_cpu.
> + * @tag: Cookie from tlob_start_args.tag; for the tracefs uprobe path
> + * this is the offset_start value. Zero when not set.
> + */
> +struct tlob_event {
> + __u32 tid;
> + __u32 pad;
> + __u64 threshold_us;
> + __u64 on_cpu_us;
> + __u64 off_cpu_us;
> + __u32 switches;
> + __u32 state; /* 1 = on_cpu, 0 = off_cpu */
> + __u64 tag;
> +};
> +
> +/**
> + * struct tlob_mmap_page - control page for the mmap'd violation ring buffer
> + *
> + * Mapped at offset 0 of the mmap region returned by mmap(2) on a /dev/rv fd.
> + * The data array of struct tlob_event records begins at offset @data_offset
> + * (always one page from the mmap base; use this field rather than
> + * hard-coding PAGE_SIZE so the code remains correct across architectures).
> + *
> + * Ring layout:
> + *
> + * mmap base + 0 : struct tlob_mmap_page (one page)
> + * mmap base + data_offset : struct tlob_event[capacity]
> + *
> + * The mmap length determines the ring capacity. Compute it as:
> + *
> + * raw = sysconf(_SC_PAGESIZE) + capacity * sizeof(struct tlob_event)
> + * length = (raw + sysconf(_SC_PAGESIZE) - 1) & ~(sysconf(_SC_PAGESIZE) - 1)
> + *
> + * i.e. round the raw byte count up to the next page boundary before
> + * passing it to mmap(2). The kernel requires a page-aligned length.
> + * capacity must be a power of 2. Read @capacity after a successful
> + * mmap(2) for the actual value.
> + *
> + * Producer/consumer ordering contract:
> + *
> + * Kernel (producer):
> + * data[data_head & (capacity - 1)] = event;
> + * // pairs with load-acquire in userspace:
> + * smp_store_release(&page->data_head, data_head + 1);
> + *
> + * Userspace (consumer):
> + * // pairs with store-release in kernel:
> + * head = __atomic_load_n(&page->data_head, __ATOMIC_ACQUIRE);
> + * for (tail = page->data_tail; tail != head; tail++)
> + * handle(&data[tail & (capacity - 1)]);
> + * __atomic_store_n(&page->data_tail, tail, __ATOMIC_RELEASE);
> + *
> + * @data_head and @data_tail are monotonically increasing __u32 counters
> + * in units of records. Unsigned 32-bit wrap-around is handled correctly
> + * by modular arithmetic; the ring is full when
> + * (data_head - data_tail) == capacity.
> + *
> + * When the ring is full the kernel drops the incoming record and increments
> + * @dropped. The consumer should check @dropped periodically to detect loss.
> + *
> + * read() and mmap() share the same ring buffer. Do not use both
> + * simultaneously on the same fd.
> + *
> + * @data_head: Next write slot index. Updated by the kernel with
> + * store-release ordering. Read by userspace with load-
> acquire.
> + * @data_tail: Next read slot index. Updated by userspace. Read by the
> + * kernel to detect overflow.
> + * @capacity: Actual ring capacity in records (power of 2). Written once
> + * by the kernel at mmap time; read-only for userspace thereafter.
> + * @version: Ring buffer ABI version; currently 1.
> + * @data_offset: Byte offset from the mmap base to the data array.
> + * Always equal to sysconf(_SC_PAGESIZE) on the running kernel.
> + * @record_size: sizeof(struct tlob_event) as seen by the kernel. Verify
> + * this matches userspace's sizeof before indexing the array.
> + * @dropped: Number of events dropped because the ring was full.
> + * Monotonically increasing; read with __ATOMIC_RELAXED.
> + */
> +struct tlob_mmap_page {
> + __u32 data_head;
> + __u32 data_tail;
> + __u32 capacity;
> + __u32 version;
> + __u32 data_offset;
> + __u32 record_size;
> + __u64 dropped;
> +};
> +
> +/*
> + * TLOB_IOCTL_TRACE_START - begin monitoring the calling task.
> + *
> + * Arms a per-task hrtimer for threshold_us microseconds. If args.notify_fd
> + * is >= 0, a tlob_event record is pushed into that fd's ring buffer on
> + * violation in addition to the tlob_budget_exceeded ftrace event.
> + * args.notify_fd == -1 disables fd notification.
> + *
> + * Violation records are consumed by read() on the notify_fd (blocking or
> + * non-blocking depending on O_NONBLOCK). On violation,
> + * TLOB_IOCTL_TRACE_STOP also returns -EOVERFLOW regardless of whether
> + * notify_fd is set.
> + *
> + * args.flags must be 0.
> + */
> +#define TLOB_IOCTL_TRACE_START _IOW(RV_IOC_MAGIC, 0x01, struct tlob_start_args)
> +
> +/*
> + * TLOB_IOCTL_TRACE_STOP - end monitoring the calling task.
> + *
> + * Returns 0 if within budget, -EOVERFLOW if the budget was exceeded.
> + */
> +#define TLOB_IOCTL_TRACE_STOP _IO(RV_IOC_MAGIC, 0x02)
> +
> +#endif /* _UAPI_LINUX_RV_H */
> diff --git a/kernel/trace/rv/Kconfig b/kernel/trace/rv/Kconfig
> index 5b4be87ba..227573cda 100644
> --- a/kernel/trace/rv/Kconfig
> +++ b/kernel/trace/rv/Kconfig
> @@ -65,6 +65,7 @@ source "kernel/trace/rv/monitors/pagefault/Kconfig"
> source "kernel/trace/rv/monitors/sleep/Kconfig"
> # Add new rtapp monitors here
>
> +source "kernel/trace/rv/monitors/tlob/Kconfig"
> # Add new monitors here
>
> config RV_REACTORS
> @@ -93,3 +94,19 @@ config RV_REACT_PANIC
> help
> Enables the panic reactor. The panic reactor emits a printk()
> message if an exception is found and panic()s the system.
> +
> +config RV_CHARDEV
> + bool "RV ioctl interface via /dev/rv"
> + depends on RV
> + default n
> + help
> + Register a /dev/rv misc device that exposes an ioctl interface
> + for RV monitor self-instrumentation. All RV monitors share the
> + single device node; ioctl numbers encode the monitor identity.
> +
> + When enabled, user-space programs can open /dev/rv and use
> + monitor-specific ioctl commands to bracket code regions they
> + want the kernel RV subsystem to observe.
> +
> + Say Y here if you want to use the tlob self-instrumentation
> + ioctl interface; otherwise say N.
> diff --git a/kernel/trace/rv/Makefile b/kernel/trace/rv/Makefile
> index 750e4ad6f..cc3781a3b 100644
> --- a/kernel/trace/rv/Makefile
> +++ b/kernel/trace/rv/Makefile
> @@ -3,6 +3,7 @@
> ccflags-y += -I $(src) # needed for trace events
>
> obj-$(CONFIG_RV) += rv.o
> +obj-$(CONFIG_RV_CHARDEV) += rv_dev.o
> obj-$(CONFIG_RV_MON_WIP) += monitors/wip/wip.o
> obj-$(CONFIG_RV_MON_WWNR) += monitors/wwnr/wwnr.o
> obj-$(CONFIG_RV_MON_SCHED) += monitors/sched/sched.o
> @@ -17,6 +18,7 @@ obj-$(CONFIG_RV_MON_STS) += monitors/sts/sts.o
> obj-$(CONFIG_RV_MON_NRP) += monitors/nrp/nrp.o
> obj-$(CONFIG_RV_MON_SSSW) += monitors/sssw/sssw.o
> obj-$(CONFIG_RV_MON_OPID) += monitors/opid/opid.o
> +obj-$(CONFIG_RV_MON_TLOB) += monitors/tlob/tlob.o
> # Add new monitors here
> obj-$(CONFIG_RV_REACTORS) += rv_reactors.o
> obj-$(CONFIG_RV_REACT_PRINTK) += reactor_printk.o
> diff --git a/kernel/trace/rv/monitors/tlob/Kconfig b/kernel/trace/rv/monitors/tlob/Kconfig
> new file mode 100644
> index 000000000..010237480
> --- /dev/null
> +++ b/kernel/trace/rv/monitors/tlob/Kconfig
> @@ -0,0 +1,51 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +#
> +config RV_MON_TLOB
> + depends on RV
> + depends on UPROBES
> + select DA_MON_EVENTS_ID
> + bool "tlob monitor"
> + help
> + Enable the tlob (task latency over budget) monitor. This monitor
> + tracks the elapsed time (CLOCK_MONOTONIC) of a marked code path
> within a
> + task (including both on-CPU and off-CPU time) and reports a
> + violation when the elapsed time exceeds a configurable budget
> + threshold.
> +
> + The monitor implements a three-state deterministic automaton.
> + States: unmonitored, on_cpu, off_cpu.
> + Key transitions:
> + unmonitored --(trace_start)--> on_cpu
> + on_cpu --(switch_out)--> off_cpu
> + off_cpu --(switch_in)--> on_cpu
> + on_cpu --(trace_stop)--> unmonitored
> + off_cpu --(trace_stop)--> unmonitored
> + on_cpu --(budget_expired)--> unmonitored
> + off_cpu --(budget_expired)--> unmonitored
> +
> + External configuration is done via the tracefs "monitor" file:
> + echo threshold_us:offset_start:offset_stop:binary_path > .../rv/monitors/tlob/monitor
> + echo -offset_start:binary_path > .../rv/monitors/tlob/monitor (remove binding)
> + cat .../rv/monitors/tlob/monitor (list bindings)
> +
> + The uprobe binding places two plain entry uprobes at offset_start and
> + offset_stop in the binary; these trigger tlob_start_task() and
> + tlob_stop_task() respectively. Using two entry uprobes (rather than a
> + uretprobe) means that a mistyped offset can never corrupt the call
> + stack; the worst outcome is a missed stop, which causes the hrtimer to
> + fire and report a budget violation.
> +
> + Violation events are delivered via a lock-free mmap ring buffer on
> + /dev/rv (enabled by CONFIG_RV_CHARDEV). The consumer mmap()s the
> + device, reads records from the data array using the head/tail indices
> + in the control page, and advances data_tail when done.
> +
> + For self-instrumentation, use TLOB_IOCTL_TRACE_START /
> + TLOB_IOCTL_TRACE_STOP via the /dev/rv misc device (enabled by
> + CONFIG_RV_CHARDEV).
> +
> + Up to TLOB_MAX_MONITORED tasks may be monitored simultaneously.
> +
> + For further information, see:
> + Documentation/trace/rv/monitor_tlob.rst
> +
> diff --git a/kernel/trace/rv/monitors/tlob/tlob.c b/kernel/trace/rv/monitors/tlob/tlob.c
> new file mode 100644
> index 000000000..a6e474025
> --- /dev/null
> +++ b/kernel/trace/rv/monitors/tlob/tlob.c
> @@ -0,0 +1,986 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * tlob: task latency over budget monitor
> + *
> + * Track the elapsed wall-clock time of a marked code path and detect when
> + * a monitored task exceeds its per-task latency budget. CLOCK_MONOTONIC
> + * is used so both on-CPU and off-CPU time count toward the budget.
> + *
> + * Per-task state is maintained in a spinlock-protected hash table. A
> + * one-shot hrtimer fires at the deadline; if the task has not called
> + * trace_stop by then, a violation is recorded.
> + *
> + * Up to TLOB_MAX_MONITORED tasks may be tracked simultaneously.
> + *
> + * Copyright (C) 2026 Wen Yang <wen.yang@linux.dev>
> + */
> +#include <linux/file.h>
> +#include <linux/fs.h>
> +#include <linux/ftrace.h>
> +#include <linux/hash.h>
> +#include <linux/hrtimer.h>
> +#include <linux/kernel.h>
> +#include <linux/ktime.h>
> +#include <linux/module.h>
> +#include <linux/init.h>
> +#include <linux/namei.h>
> +#include <linux/poll.h>
> +#include <linux/rv.h>
> +#include <linux/sched.h>
> +#include <linux/slab.h>
> +#include <linux/atomic.h>
> +#include <linux/rcupdate.h>
> +#include <linux/spinlock.h>
> +#include <linux/tracefs.h>
> +#include <linux/uaccess.h>
> +#include <linux/uprobes.h>
> +#include <kunit/visibility.h>
> +#include <rv/instrumentation.h>
> +
> +/* rv_interface_lock is defined in kernel/trace/rv/rv.c */
> +extern struct mutex rv_interface_lock;
> +
> +#define MODULE_NAME "tlob"
> +
> +#include <rv_trace.h>
> +#include <trace/events/sched.h>
> +
> +#define RV_MON_TYPE RV_MON_PER_TASK
> +#include "tlob.h"
> +#include <rv/da_monitor.h>
> +
> +/* Hash table size; must be a power of two. */
> +#define TLOB_HTABLE_BITS 6
> +#define TLOB_HTABLE_SIZE (1 << TLOB_HTABLE_BITS)
> +
> +/* Maximum binary path length for uprobe binding. */
> +#define TLOB_MAX_PATH 256
> +
> +/* Per-task latency monitoring state. */
> +struct tlob_task_state {
> + struct hlist_node hlist;
> + struct task_struct *task;
> + u64 threshold_us;
> + u64 tag;
> + struct hrtimer deadline_timer;
> + int canceled; /* protected by entry_lock */
> + struct file *notify_file; /* NULL or held reference */
> +
> + /*
> + * entry_lock serialises the mutable accounting fields below.
> + * Lock order: tlob_table_lock -> entry_lock (never reverse).
> + */
> + raw_spinlock_t entry_lock;
> + u64 on_cpu_us;
> + u64 off_cpu_us;
> + ktime_t last_ts;
> + u32 switches;
> + u8 da_state;
> +
> + struct rcu_head rcu; /* for call_rcu() teardown */
> +};
> +
> +/* Per-uprobe-binding state: a start + stop probe pair for one binary region.
> */
> +struct tlob_uprobe_binding {
> + struct list_head list;
> + u64 threshold_us;
> + struct path path;
> + char binpath[TLOB_MAX_PATH]; /* canonical path for read/remove */
> + loff_t offset_start;
> + loff_t offset_stop;
> + struct uprobe_consumer entry_uc;
> + struct uprobe_consumer stop_uc;
> + struct uprobe *entry_uprobe;
> + struct uprobe *stop_uprobe;
> +};
> +
> +/* Object pool for tlob_task_state. */
> +static struct kmem_cache *tlob_state_cache;
> +
> +/* Hash table and lock protecting table structure (insert/delete/canceled). */
> +static struct hlist_head tlob_htable[TLOB_HTABLE_SIZE];
> +static DEFINE_RAW_SPINLOCK(tlob_table_lock);
> +static atomic_t tlob_num_monitored = ATOMIC_INIT(0);
> +
> +/* Uprobe binding list; protected by tlob_uprobe_mutex. */
> +static LIST_HEAD(tlob_uprobe_list);
> +static DEFINE_MUTEX(tlob_uprobe_mutex);
> +
> +/* Forward declaration */
> +static enum hrtimer_restart tlob_deadline_timer_fn(struct hrtimer *timer);
> +
> +/* Hash table helpers */
> +
> +static unsigned int tlob_hash_task(const struct task_struct *task)
> +{
> + return hash_ptr((void *)task, TLOB_HTABLE_BITS);
> +}
> +
> +/*
> + * tlob_find_rcu - look up per-task state.
> + * Must be called under rcu_read_lock() or with tlob_table_lock held.
> + */
> +static struct tlob_task_state *tlob_find_rcu(struct task_struct *task)
> +{
> + struct tlob_task_state *ws;
> + unsigned int h = tlob_hash_task(task);
> +
> + hlist_for_each_entry_rcu(ws, &tlob_htable[h], hlist,
> + lockdep_is_held(&tlob_table_lock))
> + if (ws->task == task)
> + return ws;
> + return NULL;
> +}
> +
> +/* Allocate and initialise a new per-task state entry. */
> +static struct tlob_task_state *tlob_alloc(struct task_struct *task,
> + u64 threshold_us, u64 tag)
> +{
> + struct tlob_task_state *ws;
> +
> + ws = kmem_cache_zalloc(tlob_state_cache, GFP_ATOMIC);
> + if (!ws)
> + return NULL;
> +
> + ws->task = task;
> + get_task_struct(task);
> + ws->threshold_us = threshold_us;
> + ws->tag = tag;
> + ws->last_ts = ktime_get();
> + ws->da_state = on_cpu_tlob;
> + raw_spin_lock_init(&ws->entry_lock);
> + hrtimer_setup(&ws->deadline_timer, tlob_deadline_timer_fn,
> + CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> + return ws;
> +}
> +
> +/* RCU callback: free the slab once no readers remain. */
> +static void tlob_free_rcu_slab(struct rcu_head *head)
> +{
> + struct tlob_task_state *ws =
> + container_of(head, struct tlob_task_state, rcu);
> + kmem_cache_free(tlob_state_cache, ws);
> +}
> +
> +/* Arm the one-shot deadline timer for threshold_us microseconds. */
> +static void tlob_arm_deadline(struct tlob_task_state *ws)
> +{
> + hrtimer_start(&ws->deadline_timer,
> + ns_to_ktime(ws->threshold_us * NSEC_PER_USEC),
> + HRTIMER_MODE_REL);
> +}
> +
> +/*
> + * Push a violation record into a monitor fd's ring buffer (softirq context).
> + * Drop-new policy: discard incoming record when full. smp_store_release on
> + * data_head pairs with smp_load_acquire in the consumer.
> + */
> +static void tlob_event_push(struct rv_file_priv *priv,
> + const struct tlob_event *info)
> +{
> + struct tlob_ring *ring = &priv->ring;
> + unsigned long flags;
> + u32 head, tail;
> +
> + spin_lock_irqsave(&ring->lock, flags);
> +
> + head = ring->page->data_head;
> + tail = READ_ONCE(ring->page->data_tail);
> +
> + if (head - tail > ring->mask) {
> + /* Ring full: drop incoming record. */
> + ring->page->dropped++;
> + spin_unlock_irqrestore(&ring->lock, flags);
> + return;
> + }
> +
> + ring->data[head & ring->mask] = *info;
> + /* pairs with smp_load_acquire() in the consumer */
> + smp_store_release(&ring->page->data_head, head + 1);
> +
> + spin_unlock_irqrestore(&ring->lock, flags);
> +
> + wake_up_interruptible_poll(&priv->waitq, EPOLLIN | EPOLLRDNORM);
> +}
> +
> +#if IS_ENABLED(CONFIG_KUNIT)
> +void tlob_event_push_kunit(struct rv_file_priv *priv,
> + const struct tlob_event *info)
> +{
> + tlob_event_push(priv, info);
> +}
> +EXPORT_SYMBOL_IF_KUNIT(tlob_event_push_kunit);
> +#endif /* CONFIG_KUNIT */
> +
> +/*
> + * Budget exceeded: remove the entry, record the violation, and inject
> + * budget_expired into the DA.
> + *
> + * Lock order: tlob_table_lock -> entry_lock. tlob_stop_task() sets
> + * ws->canceled under both locks; if we see it here the stop path owns
> + * cleanup.
> + * fput/put_task_struct are done before call_rcu(); the RCU callback only
> + * reclaims the slab.
> + */
> +static enum hrtimer_restart tlob_deadline_timer_fn(struct hrtimer *timer)
> +{
> + struct tlob_task_state *ws =
> + container_of(timer, struct tlob_task_state, deadline_timer);
> + struct tlob_event info = {};
> + struct file *notify_file;
> + struct task_struct *task;
> + unsigned long flags;
> + /* snapshots taken under entry_lock */
> + u64 on_cpu_us, off_cpu_us, threshold_us, tag;
> + u32 switches;
> + bool on_cpu;
> + bool push_event = false;
> +
> + raw_spin_lock_irqsave(&tlob_table_lock, flags);
> + /* stop path sets canceled under both locks; if set it owns cleanup
> */
> + if (ws->canceled) {
> + raw_spin_unlock_irqrestore(&tlob_table_lock, flags);
> + return HRTIMER_NORESTART;
> + }
> +
> + /* Finalize accounting and snapshot all fields under entry_lock. */
> + raw_spin_lock(&ws->entry_lock);
> +
> + {
> + ktime_t now = ktime_get();
> + u64 delta_us = ktime_to_us(ktime_sub(now, ws->last_ts));
> +
> + if (ws->da_state == on_cpu_tlob)
> + ws->on_cpu_us += delta_us;
> + else
> + ws->off_cpu_us += delta_us;
> + }
> +
> + ws->canceled = 1;
> + on_cpu_us = ws->on_cpu_us;
> + off_cpu_us = ws->off_cpu_us;
> + threshold_us = ws->threshold_us;
> + tag = ws->tag;
> + switches = ws->switches;
> + on_cpu = (ws->da_state == on_cpu_tlob);
> + notify_file = ws->notify_file;
> + if (notify_file) {
> + info.tid = task_pid_vnr(ws->task);
> + info.threshold_us = threshold_us;
> + info.on_cpu_us = on_cpu_us;
> + info.off_cpu_us = off_cpu_us;
> + info.switches = switches;
> + info.state = on_cpu ? 1 : 0;
> + info.tag = tag;
> + push_event = true;
> + }
> +
> + raw_spin_unlock(&ws->entry_lock);
> +
> + hlist_del_rcu(&ws->hlist);
> + atomic_dec(&tlob_num_monitored);
> + /*
> + * Hold a reference so task remains valid across da_handle_event()
> + * after we drop tlob_table_lock.
> + */
> + task = ws->task;
> + get_task_struct(task);
> + raw_spin_unlock_irqrestore(&tlob_table_lock, flags);
> +
> + /*
> + * Both locks are now released; ws is exclusively owned (removed from
> + * the hash table with canceled=1). Emit the tracepoint and push the
> + * violation record.
> + */
> + trace_tlob_budget_exceeded(ws->task, threshold_us, on_cpu_us,
> + off_cpu_us, switches, on_cpu, tag);
> +
> + if (push_event) {
> + struct rv_file_priv *priv = notify_file->private_data;
> +
> + if (priv)
> + tlob_event_push(priv, &info);
> + }
> +
> + da_handle_event(task, budget_expired_tlob);
> +
> + if (notify_file)
> + fput(notify_file); /* ref from fget() at TRACE_START */
> + put_task_struct(ws->task); /* ref from tlob_alloc() */
> + put_task_struct(task); /* extra ref from get_task_struct() above */
> + call_rcu(&ws->rcu, tlob_free_rcu_slab);
> + return HRTIMER_NORESTART;
> +}
> +
> +/* Tracepoint handlers */
> +
> +/*
> + * handle_sched_switch - advance the DA and accumulate on/off-CPU time.
> + *
> + * RCU read-side for lock-free lookup; entry_lock for per-task accounting.
> + * da_handle_event() is called after rcu_read_unlock() to avoid holding the
> + * read-side critical section across the RV framework.
> + */
> +static void handle_sched_switch(void *data, bool preempt,
> + struct task_struct *prev,
> + struct task_struct *next,
> + unsigned int prev_state)
> +{
> + struct tlob_task_state *ws;
> + unsigned long flags;
> + bool do_prev = false, do_next = false;
> + ktime_t now;
> +
> + rcu_read_lock();
> +
> + ws = tlob_find_rcu(prev);
> + if (ws) {
> + raw_spin_lock_irqsave(&ws->entry_lock, flags);
> + if (!ws->canceled) {
> + now = ktime_get();
> + ws->on_cpu_us += ktime_to_us(ktime_sub(now, ws->last_ts));
> + ws->last_ts = now;
> + ws->switches++;
> + ws->da_state = off_cpu_tlob;
> + do_prev = true;
> + }
> + raw_spin_unlock_irqrestore(&ws->entry_lock, flags);
> + }
> +
> + ws = tlob_find_rcu(next);
> + if (ws) {
> + raw_spin_lock_irqsave(&ws->entry_lock, flags);
> + if (!ws->canceled) {
> + now = ktime_get();
> + ws->off_cpu_us += ktime_to_us(ktime_sub(now, ws->last_ts));
> + ws->last_ts = now;
> + ws->da_state = on_cpu_tlob;
> + do_next = true;
> + }
> + raw_spin_unlock_irqrestore(&ws->entry_lock, flags);
> + }
> +
> + rcu_read_unlock();
> +
> + if (do_prev)
> + da_handle_event(prev, switch_out_tlob);
> + if (do_next)
> + da_handle_event(next, switch_in_tlob);
> +}
> +
> +static void handle_sched_wakeup(void *data, struct task_struct *p)
> +{
> + struct tlob_task_state *ws;
> + unsigned long flags;
> + bool found = false;
> +
> + rcu_read_lock();
> + ws = tlob_find_rcu(p);
> + if (ws) {
> + raw_spin_lock_irqsave(&ws->entry_lock, flags);
> + found = !ws->canceled;
> + raw_spin_unlock_irqrestore(&ws->entry_lock, flags);
> + }
> + rcu_read_unlock();
> +
> + if (found)
> + da_handle_event(p, sched_wakeup_tlob);
> +}
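The two handlers above charge the elapsed wall time since `last_ts` to whichever bucket matches the state being left, then restamp. A hedged C sketch of that delta accounting (structure and names hypothetical, timestamps are plain microsecond integers instead of ktime_t):

```c
#include <assert.h>
#include <stdint.h>

enum da_state { ON_CPU, OFF_CPU };

/* Model of the per-task accounting: at every transition the time since
 * last_ts is charged to the bucket for the state being left, mirroring
 * handle_sched_switch() above. */
struct acct {
	enum da_state state;
	uint64_t last_ts, on_us, off_us;
	uint32_t switches;
};

static void acct_switch(struct acct *a, uint64_t now, enum da_state next)
{
	if (a->state == ON_CPU)
		a->on_us += now - a->last_ts;
	else
		a->off_us += now - a->last_ts;
	a->last_ts = now;
	if (next == OFF_CPU)
		a->switches++;	/* counted on switch-out, as in the patch */
	a->state = next;
}
```

Because every transition restamps `last_ts`, the sum `on_us + off_us` always equals the total wall time since TRACE_START, which is what makes the budget cover off-CPU time.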
> +
> +/* -----------------------------------------------------------------------
> + * Core start/stop helpers (also called from rv_dev.c)
> + * -----------------------------------------------------------------------
> + */
> +
> +/*
> + * __tlob_insert - insert @ws into the hash table and arm its deadline timer.
> + *
> + * Re-checks for duplicates and capacity under tlob_table_lock; the caller
> + * may have done a lock-free pre-check before allocating @ws. On failure @ws
> + * is freed directly (never in table, so no call_rcu needed).
> + */
> +static int __tlob_insert(struct task_struct *task, struct tlob_task_state *ws)
> +{
> + unsigned int h;
> + unsigned long flags;
> +
> + raw_spin_lock_irqsave(&tlob_table_lock, flags);
> + if (tlob_find_rcu(task)) {
> + raw_spin_unlock_irqrestore(&tlob_table_lock, flags);
> + if (ws->notify_file)
> + fput(ws->notify_file);
> + put_task_struct(ws->task);
> + kmem_cache_free(tlob_state_cache, ws);
> + return -EEXIST;
> + }
> + if (atomic_read(&tlob_num_monitored) >= TLOB_MAX_MONITORED) {
> + raw_spin_unlock_irqrestore(&tlob_table_lock, flags);
> + if (ws->notify_file)
> + fput(ws->notify_file);
> + put_task_struct(ws->task);
> + kmem_cache_free(tlob_state_cache, ws);
> + return -ENOSPC;
> + }
> + h = tlob_hash_task(task);
> + hlist_add_head_rcu(&ws->hlist, &tlob_htable[h]);
> + atomic_inc(&tlob_num_monitored);
> + raw_spin_unlock_irqrestore(&tlob_table_lock, flags);
> +
> + da_handle_start_run_event(task, trace_start_tlob);
> + tlob_arm_deadline(ws);
> + return 0;
> +}
> +
> +/**
> + * tlob_start_task - begin monitoring @task with latency budget @threshold_us.
> + *
> + * @notify_file: /dev/rv fd whose ring buffer receives a tlob_event on
> + *               violation; on success the fget() reference is transferred
> + *               to tlob.c. Pass NULL for synchronous mode (violations only
> + *               via TRACE_STOP return value and the tlob_budget_exceeded
> + *               event).
> + *
> + * Returns 0, -ENODEV, -ERANGE, -EEXIST, -ENOSPC, or -ENOMEM. On failures
> + * before the reference is attached to the per-task state (-ENODEV, -ERANGE,
> + * the pre-check -EEXIST/-ENOSPC, -ENOMEM) the caller retains the
> + * @notify_file reference; once attached, __tlob_insert() drops it itself
> + * on insertion failure.
> + */
> +int tlob_start_task(struct task_struct *task, u64 threshold_us,
> + struct file *notify_file, u64 tag)
> +{
> + struct tlob_task_state *ws;
> + unsigned long flags;
> +
> + if (!tlob_state_cache)
> + return -ENODEV;
> +
> + if (threshold_us > (u64)KTIME_MAX / NSEC_PER_USEC)
> + return -ERANGE;
> +
> + /* Quick pre-check before allocation. */
> + raw_spin_lock_irqsave(&tlob_table_lock, flags);
> + if (tlob_find_rcu(task)) {
> + raw_spin_unlock_irqrestore(&tlob_table_lock, flags);
> + return -EEXIST;
> + }
> + if (atomic_read(&tlob_num_monitored) >= TLOB_MAX_MONITORED) {
> + raw_spin_unlock_irqrestore(&tlob_table_lock, flags);
> + return -ENOSPC;
> + }
> + raw_spin_unlock_irqrestore(&tlob_table_lock, flags);
> +
> + ws = tlob_alloc(task, threshold_us, tag);
> + if (!ws)
> + return -ENOMEM;
> +
> + ws->notify_file = notify_file;
> + return __tlob_insert(task, ws);
> +}
> +EXPORT_SYMBOL_GPL(tlob_start_task);
> +
> +/**
> + * tlob_stop_task - stop monitoring @task before the deadline fires.
> + *
> + * Sets canceled under entry_lock (inside tlob_table_lock) before calling
> + * hrtimer_cancel(), racing safely with the timer callback.
> + *
> + * Returns 0 if within budget, -ESRCH if the entry is gone (deadline already
> + * fired, or TRACE_START was never called).
> + */
> +int tlob_stop_task(struct task_struct *task)
> +{
> + struct tlob_task_state *ws;
> + struct file *notify_file;
> + unsigned long flags;
> +
> + raw_spin_lock_irqsave(&tlob_table_lock, flags);
> + ws = tlob_find_rcu(task);
> + if (!ws) {
> + raw_spin_unlock_irqrestore(&tlob_table_lock, flags);
> + return -ESRCH;
> + }
> +
> + /* Prevent handle_sched_switch from updating accounting after removal. */
> + raw_spin_lock(&ws->entry_lock);
> + ws->canceled = 1;
> + raw_spin_unlock(&ws->entry_lock);
> +
> + hlist_del_rcu(&ws->hlist);
> + atomic_dec(&tlob_num_monitored);
> + raw_spin_unlock_irqrestore(&tlob_table_lock, flags);
> +
> + hrtimer_cancel(&ws->deadline_timer);
> +
> + da_handle_event(task, trace_stop_tlob);
> +
> + notify_file = ws->notify_file;
> + if (notify_file)
> + fput(notify_file);
> + put_task_struct(ws->task);
> + call_rcu(&ws->rcu, tlob_free_rcu_slab);
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(tlob_stop_task);
> +
> +/* Stop monitoring all tracked tasks; called on monitor disable. */
> +static void tlob_stop_all(void)
> +{
> + struct tlob_task_state *batch[TLOB_MAX_MONITORED];
> + struct tlob_task_state *ws;
> + struct hlist_node *tmp;
> + unsigned long flags;
> + int n = 0, i;
> +
> + raw_spin_lock_irqsave(&tlob_table_lock, flags);
> + for (i = 0; i < TLOB_HTABLE_SIZE; i++) {
> + hlist_for_each_entry_safe(ws, tmp, &tlob_htable[i], hlist) {
> + raw_spin_lock(&ws->entry_lock);
> + ws->canceled = 1;
> + raw_spin_unlock(&ws->entry_lock);
> + hlist_del_rcu(&ws->hlist);
> + atomic_dec(&tlob_num_monitored);
> + if (n < TLOB_MAX_MONITORED)
> + batch[n++] = ws;
> + }
> + }
> + raw_spin_unlock_irqrestore(&tlob_table_lock, flags);
> +
> + for (i = 0; i < n; i++) {
> + ws = batch[i];
> + hrtimer_cancel(&ws->deadline_timer);
> + da_handle_event(ws->task, trace_stop_tlob);
> + if (ws->notify_file)
> + fput(ws->notify_file);
> + put_task_struct(ws->task);
> + call_rcu(&ws->rcu, tlob_free_rcu_slab);
> + }
> +}
> +
> +/* uprobe binding helpers */
> +
> +static int tlob_uprobe_entry_handler(struct uprobe_consumer *uc,
> + struct pt_regs *regs, __u64 *data)
> +{
> + struct tlob_uprobe_binding *b =
> + container_of(uc, struct tlob_uprobe_binding, entry_uc);
> +
> + tlob_start_task(current, b->threshold_us, NULL, (u64)b->offset_start);
> + return 0;
> +}
> +
> +static int tlob_uprobe_stop_handler(struct uprobe_consumer *uc,
> + struct pt_regs *regs, __u64 *data)
> +{
> + tlob_stop_task(current);
> + return 0;
> +}
> +
> +/*
> + * Register start + stop entry uprobes for a binding.
> + * Both are plain entry uprobes (no uretprobe), so a wrong offset never
> + * corrupts the call stack; the worst outcome is a missed stop (hrtimer
> + * fires and reports a budget violation).
> + * Called with tlob_uprobe_mutex held.
> + */
> +static int tlob_add_uprobe(u64 threshold_us, const char *binpath,
> + loff_t offset_start, loff_t offset_stop)
> +{
> + struct tlob_uprobe_binding *b, *tmp_b;
> + char pathbuf[TLOB_MAX_PATH];
> + struct inode *inode;
> + char *canon;
> + int ret;
> +
> + b = kzalloc(sizeof(*b), GFP_KERNEL);
> + if (!b)
> + return -ENOMEM;
> +
> + if (binpath[0] != '/') {
> + kfree(b);
> + return -EINVAL;
> + }
> +
> + b->threshold_us = threshold_us;
> + b->offset_start = offset_start;
> + b->offset_stop = offset_stop;
> +
> + ret = kern_path(binpath, LOOKUP_FOLLOW, &b->path);
> + if (ret)
> + goto err_free;
> +
> + if (!d_is_reg(b->path.dentry)) {
> + ret = -EINVAL;
> + goto err_path;
> + }
> +
> + /* Reject duplicate start offset for the same binary. */
> + list_for_each_entry(tmp_b, &tlob_uprobe_list, list) {
> + if (tmp_b->offset_start == offset_start &&
> + tmp_b->path.dentry == b->path.dentry) {
> + ret = -EEXIST;
> + goto err_path;
> + }
> + }
> +
> + /* Store canonical path for read-back and removal matching. */
> + canon = d_path(&b->path, pathbuf, sizeof(pathbuf));
> + if (IS_ERR(canon)) {
> + ret = PTR_ERR(canon);
> + goto err_path;
> + }
> + strscpy(b->binpath, canon, sizeof(b->binpath));
> +
> + b->entry_uc.handler = tlob_uprobe_entry_handler;
> + b->stop_uc.handler = tlob_uprobe_stop_handler;
> +
> + inode = d_real_inode(b->path.dentry);
> +
> + b->entry_uprobe = uprobe_register(inode, offset_start, 0, &b->entry_uc);
> + if (IS_ERR(b->entry_uprobe)) {
> + ret = PTR_ERR(b->entry_uprobe);
> + b->entry_uprobe = NULL;
> + goto err_path;
> + }
> +
> + b->stop_uprobe = uprobe_register(inode, offset_stop, 0, &b->stop_uc);
> + if (IS_ERR(b->stop_uprobe)) {
> + ret = PTR_ERR(b->stop_uprobe);
> + b->stop_uprobe = NULL;
> + goto err_entry;
> + }
> +
> + list_add_tail(&b->list, &tlob_uprobe_list);
> + return 0;
> +
> +err_entry:
> + uprobe_unregister_nosync(b->entry_uprobe, &b->entry_uc);
> + uprobe_unregister_sync();
> +err_path:
> + path_put(&b->path);
> +err_free:
> + kfree(b);
> + return ret;
> +}
> +
> +/*
> + * Remove the uprobe binding for (offset_start, binpath).
> + * binpath is resolved to a dentry for comparison so symlinks are handled
> + * correctly. Called with tlob_uprobe_mutex held.
> + */
> +static void tlob_remove_uprobe_by_key(loff_t offset_start, const char *binpath)
> +{
> + struct tlob_uprobe_binding *b, *tmp;
> + struct path remove_path;
> +
> + if (kern_path(binpath, LOOKUP_FOLLOW, &remove_path))
> + return;
> +
> + list_for_each_entry_safe(b, tmp, &tlob_uprobe_list, list) {
> + if (b->offset_start != offset_start)
> + continue;
> + if (b->path.dentry != remove_path.dentry)
> + continue;
> + uprobe_unregister_nosync(b->entry_uprobe, &b->entry_uc);
> + uprobe_unregister_nosync(b->stop_uprobe, &b->stop_uc);
> + list_del(&b->list);
> + uprobe_unregister_sync();
> + path_put(&b->path);
> + kfree(b);
> + break;
> + }
> +
> + path_put(&remove_path);
> +}
> +
> +/* Unregister all uprobe bindings; called from disable_tlob(). */
> +static void tlob_remove_all_uprobes(void)
> +{
> + struct tlob_uprobe_binding *b, *tmp;
> +
> + mutex_lock(&tlob_uprobe_mutex);
> + list_for_each_entry_safe(b, tmp, &tlob_uprobe_list, list) {
> + uprobe_unregister_nosync(b->entry_uprobe, &b->entry_uc);
> + uprobe_unregister_nosync(b->stop_uprobe, &b->stop_uc);
> + list_del(&b->list);
> + path_put(&b->path);
> + kfree(b);
> + }
> + mutex_unlock(&tlob_uprobe_mutex);
> + uprobe_unregister_sync();
> +}
> +
> +/*
> + * tracefs "monitor" file
> + *
> + * Read: one "threshold_us:0xoffset_start:0xoffset_stop:binary_path\n"
> + * line per registered uprobe binding.
> + * Write: "threshold_us:offset_start:offset_stop:binary_path" - add binding
> + *        "-offset_start:binary_path"                         - remove binding
> + */
> +
> +static ssize_t tlob_monitor_read(struct file *file,
> + char __user *ubuf,
> + size_t count, loff_t *ppos)
> +{
> + /* threshold(20) + 2 offsets(2*18) + delimiters + path(TLOB_MAX_PATH) */
> + const int line_sz = TLOB_MAX_PATH + 72;
> + struct tlob_uprobe_binding *b;
> + char *buf, *p;
> + int n = 0, buf_sz, pos = 0;
> + ssize_t ret;
> +
> + mutex_lock(&tlob_uprobe_mutex);
> + list_for_each_entry(b, &tlob_uprobe_list, list)
> + n++;
> + mutex_unlock(&tlob_uprobe_mutex);
> +
> + buf_sz = (n ? n : 1) * line_sz + 1;
> + buf = kmalloc(buf_sz, GFP_KERNEL);
> + if (!buf)
> + return -ENOMEM;
> +
> + mutex_lock(&tlob_uprobe_mutex);
> + list_for_each_entry(b, &tlob_uprobe_list, list) {
> + p = b->binpath;
> + pos += scnprintf(buf + pos, buf_sz - pos,
> + "%llu:0x%llx:0x%llx:%s\n",
> + b->threshold_us,
> + (unsigned long long)b->offset_start,
> + (unsigned long long)b->offset_stop,
> + p);
> + }
> + mutex_unlock(&tlob_uprobe_mutex);
> +
> + ret = simple_read_from_buffer(ubuf, count, ppos, buf, pos);
> + kfree(buf);
> + return ret;
> +}
> +
> +/*
> + * Parse "threshold_us:offset_start:offset_stop:binary_path".
> + * binary_path comes last so it may freely contain ':'.
> + * Returns 0 on success.
> + */
> +VISIBLE_IF_KUNIT int tlob_parse_uprobe_line(char *buf, u64 *thr_out,
> + char **path_out,
> + loff_t *start_out, loff_t *stop_out)
> +{
> + unsigned long long thr;
> + long long start, stop;
> + int n = 0;
> +
> + /*
> + * %llu : decimal-only (microseconds)
> + * %lli : auto-base, accepts 0x-prefixed hex for offsets
> + * %n : records the byte offset of the first path character
> + */
> + if (sscanf(buf, "%llu:%lli:%lli:%n", &thr, &start, &stop, &n) != 3)
> + return -EINVAL;
> + if (thr == 0 || n == 0 || buf[n] == '\0')
> + return -EINVAL;
> + if (start < 0 || stop < 0)
> + return -EINVAL;
> +
> + *thr_out = thr;
> + *start_out = start;
> + *stop_out = stop;
> + *path_out = buf + n;
> + return 0;
> +}
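The grammar accepted above can be exercised with userspace sscanf(), whose behaviour for these conversions matches the kernel's: `%llu` is decimal-only, `%lli` auto-detects a `0x` prefix, and `%n` records where the path begins. A small replica (the wrapper name is mine, the pattern and checks are the patch's):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Userspace replica of tlob_parse_uprobe_line()'s sscanf pattern.
 * Note: if the trailing ':' is missing, sscanf still returns 3 but never
 * reaches %n, so n stays 0 -- that is why the n == 0 check is needed. */
static int parse_line(const char *buf, unsigned long long *thr,
		      long long *start, long long *stop, const char **path)
{
	int n = 0;

	if (sscanf(buf, "%llu:%lli:%lli:%n", thr, start, stop, &n) != 3)
		return -1;
	if (*thr == 0 || n == 0 || buf[n] == '\0')
		return -1;
	if (*start < 0 || *stop < 0)
		return -1;
	*path = buf + n;
	return 0;
}
```

Putting the path last means it may freely contain ':', e.g. an overlayfs-style path, without any escaping in the write format.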
> +
> +static ssize_t tlob_monitor_write(struct file *file,
> + const char __user *ubuf,
> + size_t count, loff_t *ppos)
> +{
> + char buf[TLOB_MAX_PATH + 64];
> + loff_t offset_start, offset_stop;
> + u64 threshold_us;
> + char *binpath;
> + int ret;
> +
> + if (count >= sizeof(buf))
> + return -EINVAL;
> + if (copy_from_user(buf, ubuf, count))
> + return -EFAULT;
> + buf[count] = '\0';
> +
> + if (count > 0 && buf[count - 1] == '\n')
> + buf[count - 1] = '\0';
> +
> + /* Remove request: "-offset_start:binary_path" */
> + if (buf[0] == '-') {
> + long long off;
> + int n = 0;
> +
> + if (sscanf(buf + 1, "%lli:%n", &off, &n) != 1 || n == 0)
> + return -EINVAL;
> + binpath = buf + 1 + n;
> + if (binpath[0] != '/')
> + return -EINVAL;
> +
> + mutex_lock(&tlob_uprobe_mutex);
> + tlob_remove_uprobe_by_key((loff_t)off, binpath);
> + mutex_unlock(&tlob_uprobe_mutex);
> +
> + return (ssize_t)count;
> + }
> +
> + /*
> + * Uprobe binding: "threshold_us:offset_start:offset_stop:binary_path".
> + * binpath points into buf at the start of the path field.
> + */
> + ret = tlob_parse_uprobe_line(buf, &threshold_us,
> + &binpath, &offset_start, &offset_stop);
> + if (ret)
> + return ret;
> +
> + mutex_lock(&tlob_uprobe_mutex);
> + ret = tlob_add_uprobe(threshold_us, binpath, offset_start, offset_stop);
> + mutex_unlock(&tlob_uprobe_mutex);
> + return ret ? ret : (ssize_t)count;
> +}
> +
> +static const struct file_operations tlob_monitor_fops = {
> + .open = simple_open,
> + .read = tlob_monitor_read,
> + .write = tlob_monitor_write,
> + .llseek = noop_llseek,
> +};
> +
> +/*
> + * __tlob_init_monitor / __tlob_destroy_monitor - called with
> + * rv_interface_lock held (required by da_monitor_init/destroy via
> + * rv_get/put_task_monitor_slot()).
> + */
> +static int __tlob_init_monitor(void)
> +{
> + int i, retval;
> +
> + tlob_state_cache = kmem_cache_create("tlob_task_state",
> + sizeof(struct tlob_task_state),
> + 0, 0, NULL);
> + if (!tlob_state_cache)
> + return -ENOMEM;
> +
> + for (i = 0; i < TLOB_HTABLE_SIZE; i++)
> + INIT_HLIST_HEAD(&tlob_htable[i]);
> + atomic_set(&tlob_num_monitored, 0);
> +
> + retval = da_monitor_init();
> + if (retval) {
> + kmem_cache_destroy(tlob_state_cache);
> + tlob_state_cache = NULL;
> + return retval;
> + }
> +
> + rv_this.enabled = 1;
> + return 0;
> +}
> +
> +static void __tlob_destroy_monitor(void)
> +{
> + rv_this.enabled = 0;
> + tlob_stop_all();
> + tlob_remove_all_uprobes();
> + /*
> + * Drain pending call_rcu() callbacks from tlob_stop_all() before
> + * destroying the kmem_cache.
> + */
> + synchronize_rcu();
> + da_monitor_destroy();
> + kmem_cache_destroy(tlob_state_cache);
> + tlob_state_cache = NULL;
> +}
> +
> +/*
> + * tlob_init_monitor / tlob_destroy_monitor - KUnit wrappers that acquire
> + * rv_interface_lock, satisfying the lockdep_assert_held() inside
> + * rv_get/put_task_monitor_slot().
> + */
> +VISIBLE_IF_KUNIT int tlob_init_monitor(void)
> +{
> + int ret;
> +
> + mutex_lock(&rv_interface_lock);
> + ret = __tlob_init_monitor();
> + mutex_unlock(&rv_interface_lock);
> + return ret;
> +}
> +EXPORT_SYMBOL_IF_KUNIT(tlob_init_monitor);
> +
> +VISIBLE_IF_KUNIT void tlob_destroy_monitor(void)
> +{
> + mutex_lock(&rv_interface_lock);
> + __tlob_destroy_monitor();
> + mutex_unlock(&rv_interface_lock);
> +}
> +EXPORT_SYMBOL_IF_KUNIT(tlob_destroy_monitor);
> +
> +VISIBLE_IF_KUNIT int tlob_enable_hooks(void)
> +{
> + rv_attach_trace_probe("tlob", sched_switch, handle_sched_switch);
> + rv_attach_trace_probe("tlob", sched_wakeup, handle_sched_wakeup);
> + return 0;
> +}
> +EXPORT_SYMBOL_IF_KUNIT(tlob_enable_hooks);
> +
> +VISIBLE_IF_KUNIT void tlob_disable_hooks(void)
> +{
> + rv_detach_trace_probe("tlob", sched_switch, handle_sched_switch);
> + rv_detach_trace_probe("tlob", sched_wakeup, handle_sched_wakeup);
> +}
> +EXPORT_SYMBOL_IF_KUNIT(tlob_disable_hooks);
> +
> +/*
> + * enable_tlob / disable_tlob - called by rv_enable/disable_monitor() which
> + * already holds rv_interface_lock; call the __ variants directly.
> + */
> +static int enable_tlob(void)
> +{
> + int retval;
> +
> + retval = __tlob_init_monitor();
> + if (retval)
> + return retval;
> +
> + return tlob_enable_hooks();
> +}
> +
> +static void disable_tlob(void)
> +{
> + tlob_disable_hooks();
> + __tlob_destroy_monitor();
> +}
> +
> +static struct rv_monitor rv_this = {
> + .name = "tlob",
> + .description = "Per-task latency-over-budget monitor.",
> + .enable = enable_tlob,
> + .disable = disable_tlob,
> + .reset = da_monitor_reset_all,
> + .enabled = 0,
> +};
> +
> +static int __init register_tlob(void)
> +{
> + int ret;
> +
> + ret = rv_register_monitor(&rv_this, NULL);
> + if (ret)
> + return ret;
> +
> + if (rv_this.root_d) {
> + tracefs_create_file("monitor", 0644, rv_this.root_d, NULL,
> + &tlob_monitor_fops);
> + }
> +
> + return 0;
> +}
> +
> +static void __exit unregister_tlob(void)
> +{
> + rv_unregister_monitor(&rv_this);
> +}
> +
> +module_init(register_tlob);
> +module_exit(unregister_tlob);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("Wen Yang <wen.yang@linux.dev>");
> +MODULE_DESCRIPTION("tlob: task latency over budget per-task monitor.");
> diff --git a/kernel/trace/rv/monitors/tlob/tlob.h b/kernel/trace/rv/monitors/tlob/tlob.h
> new file mode 100644
> index 000000000..3438a6175
> --- /dev/null
> +++ b/kernel/trace/rv/monitors/tlob/tlob.h
> @@ -0,0 +1,145 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _RV_TLOB_H
> +#define _RV_TLOB_H
> +
> +/*
> + * C representation of the tlob automaton, generated from tlob.dot via rvgen
> + * and extended with tlob_start_task()/tlob_stop_task() declarations.
> + * For the format description see
> + * Documentation/trace/rv/deterministic_automata.rst
> + */
> +
> +#include <linux/rv.h>
> +#include <uapi/linux/rv.h>
> +
> +#define MONITOR_NAME tlob
> +
> +enum states_tlob {
> + unmonitored_tlob,
> + on_cpu_tlob,
> + off_cpu_tlob,
> + state_max_tlob,
> +};
> +
> +#define INVALID_STATE state_max_tlob
> +
> +enum events_tlob {
> + trace_start_tlob,
> + switch_in_tlob,
> + switch_out_tlob,
> + sched_wakeup_tlob,
> + trace_stop_tlob,
> + budget_expired_tlob,
> + event_max_tlob,
> +};
> +
> +struct automaton_tlob {
> + char *state_names[state_max_tlob];
> + char *event_names[event_max_tlob];
> + unsigned char function[state_max_tlob][event_max_tlob];
> + unsigned char initial_state;
> + bool final_states[state_max_tlob];
> +};
> +
> +static const struct automaton_tlob automaton_tlob = {
> + .state_names = {
> + "unmonitored",
> + "on_cpu",
> + "off_cpu",
> + },
> + .event_names = {
> + "trace_start",
> + "switch_in",
> + "switch_out",
> + "sched_wakeup",
> + "trace_stop",
> + "budget_expired",
> + },
> + .function = {
> + /* unmonitored */
> + {
> + on_cpu_tlob, /* trace_start */
> + unmonitored_tlob, /* switch_in */
> + unmonitored_tlob, /* switch_out */
> + unmonitored_tlob, /* sched_wakeup */
> + INVALID_STATE, /* trace_stop */
> + INVALID_STATE, /* budget_expired */
> + },
> + /* on_cpu */
> + {
> + INVALID_STATE, /* trace_start */
> + INVALID_STATE, /* switch_in */
> + off_cpu_tlob, /* switch_out */
> + on_cpu_tlob, /* sched_wakeup */
> + unmonitored_tlob, /* trace_stop */
> + unmonitored_tlob, /* budget_expired */
> + },
> + /* off_cpu */
> + {
> + INVALID_STATE, /* trace_start */
> + on_cpu_tlob, /* switch_in */
> + off_cpu_tlob, /* switch_out */
> + off_cpu_tlob, /* sched_wakeup */
> + unmonitored_tlob, /* trace_stop */
> + unmonitored_tlob, /* budget_expired */
> + },
> + },
> + /*
> + * final_states: unmonitored is the sole accepting state.
> + * Violations are recorded via ntf_push and tlob_budget_exceeded.
> + */
> + .initial_state = unmonitored_tlob,
> + .final_states = { 1, 0, 0 },
> +};
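The transition table above can be walked directly to check the model against the cover-letter description. A standalone C copy (enum and helper names are mine; the table entries mirror the initializer above):

```c
#include <assert.h>

/* Standalone copy of the tlob transition function, for illustration. */
enum state { UNMONITORED, ON_CPU, OFF_CPU, STATE_MAX };
enum event { TRACE_START, SWITCH_IN, SWITCH_OUT, SCHED_WAKEUP,
	     TRACE_STOP, BUDGET_EXPIRED, EVENT_MAX };
#define INVALID STATE_MAX

static const unsigned char function[STATE_MAX][EVENT_MAX] = {
	/* unmonitored: only trace_start leaves; stop/expire are invalid */
	{ ON_CPU, UNMONITORED, UNMONITORED, UNMONITORED, INVALID, INVALID },
	/* on_cpu */
	{ INVALID, INVALID, OFF_CPU, ON_CPU, UNMONITORED, UNMONITORED },
	/* off_cpu */
	{ INVALID, ON_CPU, OFF_CPU, OFF_CPU, UNMONITORED, UNMONITORED },
};

/* Advance the DA; returns -1 on an invalid transition. */
static int da_step(int *cur, enum event e)
{
	unsigned char next = function[*cur][e];

	if (next == INVALID)
		return -1;
	*cur = next;
	return 0;
}
```

A well-formed monitoring window (start, any number of switches/wakeups, stop or expiry) always returns to unmonitored, the sole accepting state.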
> +
> +/* Exported for use by the RV ioctl layer (rv_dev.c) */
> +int tlob_start_task(struct task_struct *task, u64 threshold_us,
> + struct file *notify_file, u64 tag);
> +int tlob_stop_task(struct task_struct *task);
> +
> +/* Maximum number of concurrently monitored tasks (also used by KUnit). */
> +#define TLOB_MAX_MONITORED 64U
> +
> +/*
> + * Ring buffer constants (also published in UAPI for mmap size calculation).
> + */
> +#define TLOB_RING_DEFAULT_CAP 64U /* records allocated at open() */
> +#define TLOB_RING_MIN_CAP 8U /* minimum accepted by mmap() */
> +#define TLOB_RING_MAX_CAP 4096U /* maximum accepted by mmap() */
> +
> +/**
> + * struct tlob_ring - per-fd mmap-capable violation ring buffer.
> + *
> + * Allocated as a contiguous page range at rv_open() time:
> + * page 0: struct tlob_mmap_page (shared with userspace)
> + * pages 1-N: struct tlob_event[capacity]
> + */
> +struct tlob_ring {
> + struct tlob_mmap_page *page;
> + struct tlob_event *data;
> + u32 mask;
> + spinlock_t lock;
> + unsigned long base;
> + unsigned int order;
> +};
> +
> +/**
> + * struct rv_file_priv - per-fd private data for /dev/rv.
> + */
> +struct rv_file_priv {
> + struct tlob_ring ring;
> + wait_queue_head_t waitq;
> +};
> +
> +#if IS_ENABLED(CONFIG_KUNIT)
> +int tlob_init_monitor(void);
> +void tlob_destroy_monitor(void);
> +int tlob_enable_hooks(void);
> +void tlob_disable_hooks(void);
> +void tlob_event_push_kunit(struct rv_file_priv *priv,
> + const struct tlob_event *info);
> +int tlob_parse_uprobe_line(char *buf, u64 *thr_out,
> + char **path_out,
> + loff_t *start_out, loff_t *stop_out);
> +#endif /* CONFIG_KUNIT */
> +
> +#endif /* _RV_TLOB_H */
> diff --git a/kernel/trace/rv/monitors/tlob/tlob_trace.h b/kernel/trace/rv/monitors/tlob/tlob_trace.h
> new file mode 100644
> index 000000000..b08d67776
> --- /dev/null
> +++ b/kernel/trace/rv/monitors/tlob/tlob_trace.h
> @@ -0,0 +1,42 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +/*
> + * Snippet to be included in rv_trace.h
> + */
> +
> +#ifdef CONFIG_RV_MON_TLOB
> +/*
> + * tlob uses the generic event_da_monitor_id and error_da_monitor_id event
> + * classes so that both event classes are instantiated. This avoids a
> + * -Werror=unused-variable warning that the compiler emits when a
> + * DECLARE_EVENT_CLASS has no corresponding DEFINE_EVENT instance.
> + *
> + * The event_tlob tracepoint is defined here but the call-site in
> + * da_handle_event() is overridden with a no-op macro below so that no
> + * trace record is emitted on every scheduler context switch. Budget
> + * violations are reported via the dedicated tlob_budget_exceeded event.
> + *
> + * error_tlob IS kept active so that invalid DA transitions (programming
> + * errors) are still visible in the ftrace ring buffer for debugging.
> + */
> +DEFINE_EVENT(event_da_monitor_id, event_tlob,
> + TP_PROTO(int id, char *state, char *event, char *next_state,
> + bool final_state),
> + TP_ARGS(id, state, event, next_state, final_state));
> +
> +DEFINE_EVENT(error_da_monitor_id, error_tlob,
> + TP_PROTO(int id, char *state, char *event),
> + TP_ARGS(id, state, event));
> +
> +/*
> + * Override the trace_event_tlob() call-site with a no-op after the
> + * DEFINE_EVENT above has satisfied the event class instantiation
> + * requirement. The tracepoint symbol itself exists (and can be enabled
> + * via tracefs) but the automatic call from da_handle_event() is silenced
> + * to avoid per-context-switch ftrace noise during normal operation.
> + */
> +#undef trace_event_tlob
> +#define trace_event_tlob(id, state, event, next_state, final_state) \
> + do { (void)(id); (void)(state); (void)(event); \
> + (void)(next_state); (void)(final_state); } while (0)
> +#endif /* CONFIG_RV_MON_TLOB */
> diff --git a/kernel/trace/rv/rv.c b/kernel/trace/rv/rv.c
> index ee4e68102..e754e76d5 100644
> --- a/kernel/trace/rv/rv.c
> +++ b/kernel/trace/rv/rv.c
> @@ -148,6 +148,10 @@
> #include <rv_trace.h>
> #endif
>
> +#ifdef CONFIG_RV_MON_TLOB
> +EXPORT_TRACEPOINT_SYMBOL_GPL(tlob_budget_exceeded);
> +#endif
> +
> #include "rv.h"
>
> DEFINE_MUTEX(rv_interface_lock);
> diff --git a/kernel/trace/rv/rv_dev.c b/kernel/trace/rv/rv_dev.c
> new file mode 100644
> index 000000000..a052f3203
> --- /dev/null
> +++ b/kernel/trace/rv/rv_dev.c
> @@ -0,0 +1,602 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * rv_dev.c - /dev/rv misc device for RV monitor self-instrumentation
> + *
> + * A single misc device (MISC_DYNAMIC_MINOR) serves all RV monitors.
> + * ioctl numbers encode the monitor identity:
> + *
> + * 0x01 - 0x1F tlob (task latency over budget)
> + * 0x20 - 0x3F reserved
> + *
> + * Each monitor exports tlob_start_task() / tlob_stop_task() which are
> + * called here. The calling task is identified by current.
> + *
> + * Magic: RV_IOC_MAGIC (0xB9), defined in include/uapi/linux/rv.h
> + *
> + * Per-fd private data (rv_file_priv)
> + * ------------------------------------
> + * Every open() of /dev/rv allocates an rv_file_priv (defined in tlob.h).
> + * When TLOB_IOCTL_TRACE_START is called with args.notify_fd >= 0, violations
> + * are pushed as tlob_event records into that fd's per-fd ring buffer
> + * (tlob_ring) and its poll/epoll waitqueue is woken.
> + *
> + * Consumers drain records with read() on the notify_fd; read() blocks until
> + * at least one record is available (unless O_NONBLOCK is set).
> + *
> + * Per-thread "started" tracking (tlob_task_handle)
> + * -------------------------------------------------
> + * tlob_stop_task() returns -ESRCH in two distinct situations:
> + *
> + * (a) The deadline timer already fired and removed the tlob hash-table
> + * entry before TRACE_STOP arrived -> budget was exceeded -> -EOVERFLOW
> + *
> + * (b) TRACE_START was never called for this thread -> programming error
> + * -> -ESRCH
> + *
> + * To distinguish them, rv_dev.c maintains a lightweight hash table
> + * (tlob_handles) that records a tlob_task_handle for every task_struct *
> + * for which a successful TLOB_IOCTL_TRACE_START has been
> + * issued but the corresponding TLOB_IOCTL_TRACE_STOP has not yet arrived.
> + *
> + * tlob_task_handle is a thin "session ticket" -- it carries only the
> + * task pointer and the owning file descriptor. The heavy per-task state
> + * (hrtimer, DA state, threshold) lives in tlob_task_state inside tlob.c.
> + *
> + * The table is keyed on task_struct * (same key as tlob.c), protected
> + * by tlob_handles_lock (spinlock, irq-safe). No get_task_struct()
> + * refcount is needed here because tlob.c already holds a reference for
> + * each live entry.
> + *
> + * Multiple threads may share the same fd. Each thread has its own
> + * tlob_task_handle in the table, so concurrent TRACE_START / TRACE_STOP
> + * calls from different threads do not interfere.
> + *
> + * The fd release path (rv_release) calls tlob_stop_task() for every
> + * handle in tlob_handles that belongs to the closing fd, ensuring cleanup
> + * even if the user forgets to call TRACE_STOP.
> + */
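The comment's (a)/(b) disambiguation reduces to a small errno mapping at the ioctl layer: tlob_stop_task() only ever reports "entry gone" (-ESRCH), and the handle table supplies the missing bit. A hedged sketch of that mapping (function name is mine; the semantics are the ones described above):

```c
#include <assert.h>
#include <errno.h>

/* Sketch of the TRACE_STOP result mapping described in the comment above:
 * handle_existed says whether a TRACE_START handle was found for this
 * thread, stop_ret is tlob_stop_task()'s return value. */
static int trace_stop_result(int handle_existed, int stop_ret)
{
	if (stop_ret == 0)
		return 0;		/* stopped within budget */
	if (stop_ret == -ESRCH && handle_existed)
		return -EOVERFLOW;	/* deadline timer fired first */
	return -ESRCH;			/* no TRACE_START was issued */
}
```

This is why the handle must be consumed before mapping the error: once the timer removes the tlob entry, only the handle distinguishes a blown budget from a programming error.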
> +#include <linux/file.h>
> +#include <linux/fs.h>
> +#include <linux/gfp.h>
> +#include <linux/hash.h>
> +#include <linux/mm.h>
> +#include <linux/miscdevice.h>
> +#include <linux/module.h>
> +#include <linux/poll.h>
> +#include <linux/sched.h>
> +#include <linux/slab.h>
> +#include <linux/spinlock.h>
> +#include <linux/uaccess.h>
> +#include <uapi/linux/rv.h>
> +
> +#ifdef CONFIG_RV_MON_TLOB
> +#include "monitors/tlob/tlob.h"
> +#endif
> +
> +/* -----------------------------------------------------------------------
> + * tlob_task_handle - per-thread session ticket for the ioctl interface
> + *
> + * One handle is allocated by TLOB_IOCTL_TRACE_START and freed by
> + * TLOB_IOCTL_TRACE_STOP (or by rv_release if the fd is closed).
> + *
> + * @hlist: Hash-table linkage in tlob_handles (keyed on task pointer).
> + * @task: The monitored thread. Plain pointer; no refcount held here
> + * because tlob.c holds one for the lifetime of the monitoring
> + * window, which encompasses the lifetime of this handle.
> + * @file: The /dev/rv file descriptor that issued TRACE_START.
> + * Used by rv_release() to sweep orphaned handles on close().
> + * -----------------------------------------------------------------------
> + */
> +#define TLOB_HANDLES_BITS 5
> +#define TLOB_HANDLES_SIZE (1 << TLOB_HANDLES_BITS)
> +
> +struct tlob_task_handle {
> + struct hlist_node hlist;
> + struct task_struct *task;
> + struct file *file;
> +};
> +
> +static struct hlist_head tlob_handles[TLOB_HANDLES_SIZE];
> +static DEFINE_SPINLOCK(tlob_handles_lock);
> +
> +static unsigned int tlob_handle_hash(const struct task_struct *task)
> +{
> + return hash_ptr((void *)task, TLOB_HANDLES_BITS);
> +}
> +
> +/* Must be called with tlob_handles_lock held. */
> +static struct tlob_task_handle *
> +tlob_handle_find_locked(struct task_struct *task)
> +{
> + struct tlob_task_handle *h;
> + unsigned int slot = tlob_handle_hash(task);
> +
> + hlist_for_each_entry(h, &tlob_handles[slot], hlist) {
> + if (h->task == task)
> + return h;
> + }
> + return NULL;
> +}
> +
> +/*
> + * tlob_handle_alloc - record that @task has an active monitoring session
> + * opened via @file.
> + *
> + * Returns 0 on success, -EEXIST if @task already has a handle (double
> + * TRACE_START without TRACE_STOP), -ENOMEM on allocation failure.
> + */
> +static int tlob_handle_alloc(struct task_struct *task, struct file *file)
> +{
> + struct tlob_task_handle *h;
> + unsigned long flags;
> + unsigned int slot;
> +
> + h = kmalloc(sizeof(*h), GFP_KERNEL);
> + if (!h)
> + return -ENOMEM;
> + h->task = task;
> + h->file = file;
> +
> + spin_lock_irqsave(&tlob_handles_lock, flags);
> + if (tlob_handle_find_locked(task)) {
> + spin_unlock_irqrestore(&tlob_handles_lock, flags);
> + kfree(h);
> + return -EEXIST;
> + }
> + slot = tlob_handle_hash(task);
> + hlist_add_head(&h->hlist, &tlob_handles[slot]);
> + spin_unlock_irqrestore(&tlob_handles_lock, flags);
> + return 0;
> +}
> +
> +/*
> + * tlob_handle_free - remove the handle for @task and free it.
> + *
> + * Returns 1 if a handle existed (TRACE_START was called), 0 if not found
> + * (TRACE_START was never called for this thread).
> + */
> +static int tlob_handle_free(struct task_struct *task)
> +{
> + struct tlob_task_handle *h;
> + unsigned long flags;
> +
> + spin_lock_irqsave(&tlob_handles_lock, flags);
> + h = tlob_handle_find_locked(task);
> + if (h) {
> + hlist_del_init(&h->hlist);
> + spin_unlock_irqrestore(&tlob_handles_lock, flags);
> + kfree(h);
> + return 1;
> + }
> + spin_unlock_irqrestore(&tlob_handles_lock, flags);
> + return 0;
> +}
> +
> +/*
> + * tlob_handle_sweep_file - release all handles owned by @file.
> + *
> + * Called from rv_release() when the fd is closed without TRACE_STOP.
> + * Calls tlob_stop_task() for each orphaned handle to drain the tlob
> + * monitoring entries and prevent resource leaks in tlob.c.
> + *
> + * Handles are collected under the lock (short critical section), then
> + * processed outside it (tlob_stop_task() may sleep/spin internally).
> + */
> +#ifdef CONFIG_RV_MON_TLOB
> +static void tlob_handle_sweep_file(struct file *file)
> +{
> +	struct tlob_task_handle *h;
> +	struct hlist_node *tmp;
> +	HLIST_HEAD(batch);
> +	unsigned long flags;
> +	int i;
> +
> +	spin_lock_irqsave(&tlob_handles_lock, flags);
> +	for (i = 0; i < TLOB_HANDLES_SIZE; i++) {
> +		hlist_for_each_entry_safe(h, tmp, &tlob_handles[i], hlist) {
> +			if (h->file == file) {
> +				hlist_del_init(&h->hlist);
> +				hlist_add_head(&h->hlist, &batch);
> +			}
> +		}
> +	}
> +	spin_unlock_irqrestore(&tlob_handles_lock, flags);
> +
> +	/*
> +	 * The number of orphaned handles is not bounded by the bucket count,
> +	 * so collect them on a local list rather than a fixed-size array.
> +	 */
> +	hlist_for_each_entry_safe(h, tmp, &batch, hlist) {
> +		/*
> +		 * Ignore -ESRCH: the deadline timer may have already fired
> +		 * and cleaned up the tlob entry.
> +		 */
> +		tlob_stop_task(h->task);
> +		kfree(h);
> +	}
> +}
> +#else
> +static inline void tlob_handle_sweep_file(struct file *file) {}
> +#endif /* CONFIG_RV_MON_TLOB */
> +
> +/* -----------------------------------------------------------------------
> + * Ring buffer lifecycle
> + * -----------------------------------------------------------------------
> + */
> +
> +/*
> + * tlob_ring_alloc - allocate a ring of @cap records (must be a power of 2).
> + *
> + * Allocates a physically contiguous block of pages:
> + * page 0 : struct tlob_mmap_page (control page, shared with userspace)
> + * pages 1..N : struct tlob_event[cap] (data pages)
> + *
> + * Each page is marked reserved so it can be mapped to userspace via mmap().
> + */
> +static int tlob_ring_alloc(struct tlob_ring *ring, u32 cap)
> +{
> + unsigned int total = PAGE_SIZE + cap * sizeof(struct tlob_event);
> + unsigned int order = get_order(total);
> + unsigned long base;
> + unsigned int i;
> +
> + base = __get_free_pages(GFP_KERNEL | __GFP_ZERO, order);
> + if (!base)
> + return -ENOMEM;
> +
> + for (i = 0; i < (1u << order); i++)
> +		SetPageReserved(virt_to_page((void *)(base + i * PAGE_SIZE)));
> +
> + ring->base = base;
> + ring->order = order;
> + ring->page = (struct tlob_mmap_page *)base;
> + ring->data = (struct tlob_event *)(base + PAGE_SIZE);
> + ring->mask = cap - 1;
> + spin_lock_init(&ring->lock);
> +
> + ring->page->capacity = cap;
> + ring->page->version = 1;
> + ring->page->data_offset = PAGE_SIZE;
> + ring->page->record_size = sizeof(struct tlob_event);
> + return 0;
> +}
> +
> +static void tlob_ring_free(struct tlob_ring *ring)
> +{
> + unsigned int i;
> +
> + if (!ring->base)
> + return;
> +
> + for (i = 0; i < (1u << ring->order); i++)
> +		ClearPageReserved(virt_to_page((void *)(ring->base + i * PAGE_SIZE)));
> +
> + free_pages(ring->base, ring->order);
> + ring->base = 0;
> + ring->page = NULL;
> + ring->data = NULL;
> +}
> +
> +/* -----------------------------------------------------------------------
> + * File operations
> + * -----------------------------------------------------------------------
> + */
> +
> +static int rv_open(struct inode *inode, struct file *file)
> +{
> + struct rv_file_priv *priv;
> + int ret;
> +
> + priv = kzalloc(sizeof(*priv), GFP_KERNEL);
> + if (!priv)
> + return -ENOMEM;
> +
> + ret = tlob_ring_alloc(&priv->ring, TLOB_RING_DEFAULT_CAP);
> + if (ret) {
> + kfree(priv);
> + return ret;
> + }
> +
> + init_waitqueue_head(&priv->waitq);
> + file->private_data = priv;
> + return 0;
> +}
> +
> +static int rv_release(struct inode *inode, struct file *file)
> +{
> + struct rv_file_priv *priv = file->private_data;
> +
> + tlob_handle_sweep_file(file);
> + tlob_ring_free(&priv->ring);
> + kfree(priv);
> + file->private_data = NULL;
> + return 0;
> +}
> +
> +static __poll_t rv_poll(struct file *file, poll_table *wait)
> +{
> + struct rv_file_priv *priv = file->private_data;
> +
> + if (!priv)
> + return EPOLLERR;
> +
> + poll_wait(file, &priv->waitq, wait);
> +
> + /*
> + * Pairs with smp_store_release(&ring->page->data_head, ...) in
> + * tlob_event_push(). No lock needed: head is written by the kernel
> + * producer and read here; tail is written by the consumer and we only
> + * need an approximate check for the poll fast path.
> + */
> + if (smp_load_acquire(&priv->ring.page->data_head) !=
> + READ_ONCE(priv->ring.page->data_tail))
> + return EPOLLIN | EPOLLRDNORM;
> +
> + return 0;
> +}
> +
> +/*
> + * rv_read - consume tlob_event violation records from this fd's ring buffer.
> + *
> + * Each read() returns a whole number of struct tlob_event records. @count must
> + * be at least sizeof(struct tlob_event); partial-record sizes are rejected with
> + * -EINVAL.
> + *
> + * Blocking behaviour follows O_NONBLOCK on the fd:
> + * O_NONBLOCK clear: blocks until at least one record is available.
> + * O_NONBLOCK set: returns -EAGAIN immediately if the ring is empty.
> + *
> + * Returns the number of bytes copied (always a multiple of sizeof tlob_event),
> + * -EAGAIN if non-blocking and empty, or a negative error code.
> + *
> + * read() and mmap() share the same ring and data_tail cursor; do not use
> + * both simultaneously on the same fd.
> + */
> +static ssize_t rv_read(struct file *file, char __user *buf, size_t count,
> + loff_t *ppos)
> +{
> + struct rv_file_priv *priv = file->private_data;
> + struct tlob_ring *ring;
> + size_t rec = sizeof(struct tlob_event);
> + unsigned long irqflags;
> + ssize_t done = 0;
> + int ret;
> +
> + if (!priv)
> + return -ENODEV;
> +
> + ring = &priv->ring;
> +
> + if (count < rec)
> + return -EINVAL;
> +
> + /* Blocking path: sleep until the producer advances data_head. */
> + if (!(file->f_flags & O_NONBLOCK)) {
> + ret = wait_event_interruptible(priv->waitq,
> + /* pairs with smp_store_release() in the producer */
> + smp_load_acquire(&ring->page->data_head) !=
> + READ_ONCE(ring->page->data_tail));
> + if (ret)
> + return ret;
> + }
> +
> + /*
> + * Drain records into the caller's buffer. ring->lock serialises
> + * concurrent read() callers and the softirq producer.
> + */
> + while (done + rec <= count) {
> + struct tlob_event record;
> + u32 head, tail;
> +
> + spin_lock_irqsave(&ring->lock, irqflags);
> + /* pairs with smp_store_release() in the producer */
> + head = smp_load_acquire(&ring->page->data_head);
> + tail = ring->page->data_tail;
> + if (head == tail) {
> + spin_unlock_irqrestore(&ring->lock, irqflags);
> + break;
> + }
> + record = ring->data[tail & ring->mask];
> + WRITE_ONCE(ring->page->data_tail, tail + 1);
> + spin_unlock_irqrestore(&ring->lock, irqflags);
> +
> + if (copy_to_user(buf + done, &record, rec))
> + return done ? done : -EFAULT;
> + done += rec;
> + }
> +
> + return done ? done : -EAGAIN;
> +}
> +
> +/*
> + * rv_mmap - map the per-fd violation ring buffer into userspace.
> + *
> + * The mmap region covers the full ring allocation:
> + *
> + * offset 0 : struct tlob_mmap_page (control page)
> + * offset PAGE_SIZE : struct tlob_event[capacity] (data pages)
> + *
> + * The caller must map exactly PAGE_SIZE + capacity * sizeof(struct tlob_event)
> + * bytes starting at offset 0 (vm_pgoff must be 0). The actual capacity is
> + * read from tlob_mmap_page.capacity after a successful mmap(2).
> + *
> + * Private mappings (MAP_PRIVATE) are rejected: the shared data_tail field
> + * written by userspace must be visible to the kernel producer.
> + */
> +static int rv_mmap(struct file *file, struct vm_area_struct *vma)
> +{
> + struct rv_file_priv *priv = file->private_data;
> + struct tlob_ring *ring;
> + unsigned long size = vma->vm_end - vma->vm_start;
> + unsigned long ring_size;
> +
> + if (!priv)
> + return -ENODEV;
> +
> + ring = &priv->ring;
> +
> + if (vma->vm_pgoff != 0)
> + return -EINVAL;
> +
> + ring_size = PAGE_ALIGN(PAGE_SIZE + ((unsigned long)(ring->mask + 1) *
> + sizeof(struct tlob_event)));
> + if (size != ring_size)
> + return -EINVAL;
> +
> + if (!(vma->vm_flags & VM_SHARED))
> + return -EINVAL;
> +
> + return remap_pfn_range(vma, vma->vm_start,
> + page_to_pfn(virt_to_page((void *)ring->base)),
> + ring_size, vma->vm_page_prot);
> +}
> +
> +/* -----------------------------------------------------------------------
> + * ioctl dispatcher
> + * -----------------------------------------------------------------------
> + */
> +
> +static long rv_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
> +{
> + unsigned int nr = _IOC_NR(cmd);
> +
> + /*
> + * Verify the magic byte so we don't accidentally handle ioctls
> + * intended for a different device.
> + */
> + if (_IOC_TYPE(cmd) != RV_IOC_MAGIC)
> + return -ENOTTY;
> +
> +#ifdef CONFIG_RV_MON_TLOB
> + /* tlob: ioctl numbers 0x01 - 0x1F */
> + switch (cmd) {
> + case TLOB_IOCTL_TRACE_START: {
> + struct tlob_start_args args;
> + struct file *notify_file = NULL;
> + int ret, hret;
> +
> + if (copy_from_user(&args,
> + (struct tlob_start_args __user *)arg,
> + sizeof(args)))
> + return -EFAULT;
> + if (args.threshold_us == 0)
> + return -EINVAL;
> + if (args.flags != 0)
> + return -EINVAL;
> +
> + /*
> + * If notify_fd >= 0, resolve it to a file pointer.
> + * fget() bumps the reference count; tlob.c drops it
> + * via fput() when the monitoring window ends.
> + * Reject non-/dev/rv fds to prevent type confusion.
> + */
> + if (args.notify_fd >= 0) {
> + notify_file = fget(args.notify_fd);
> + if (!notify_file)
> + return -EBADF;
> + if (notify_file->f_op != file->f_op) {
> + fput(notify_file);
> + return -EINVAL;
> + }
> + }
> +
> + ret = tlob_start_task(current, args.threshold_us,
> + notify_file, args.tag);
> + if (ret != 0) {
> + /* tlob.c did not take ownership; drop ref. */
> + if (notify_file)
> + fput(notify_file);
> + return ret;
> + }
> +
> + /*
> + * Record session handle. Free any stale handle left by
> + * a previous window whose deadline timer fired (timer
> + * removes tlob_task_state but cannot touch tlob_handles).
> + */
> + tlob_handle_free(current);
> + hret = tlob_handle_alloc(current, file);
> + if (hret < 0) {
> + tlob_stop_task(current);
> + return hret;
> + }
> + return 0;
> + }
> + case TLOB_IOCTL_TRACE_STOP: {
> + int had_handle;
> + int ret;
> +
> + /*
> + * Atomically remove the session handle for current.
> + *
> + * had_handle == 0: TRACE_START was never called for
> + * this thread -> caller bug -> -ESRCH
> + *
> + * had_handle == 1: TRACE_START was called. If
> + * tlob_stop_task() now returns
> + * -ESRCH, the deadline timer already
> + * fired -> budget exceeded -> -EOVERFLOW
> + */
> + had_handle = tlob_handle_free(current);
> + if (!had_handle)
> + return -ESRCH;
> +
> + ret = tlob_stop_task(current);
> + return (ret == -ESRCH) ? -EOVERFLOW : ret;
> + }
> + default:
> + break;
> + }
> +#endif /* CONFIG_RV_MON_TLOB */
> +
> + return -ENOTTY;
> +}
> +
> +/* -----------------------------------------------------------------------
> + * Module init / exit
> + * -----------------------------------------------------------------------
> + */
> +
> +static const struct file_operations rv_fops = {
> + .owner = THIS_MODULE,
> + .open = rv_open,
> + .release = rv_release,
> + .read = rv_read,
> + .poll = rv_poll,
> + .mmap = rv_mmap,
> + .unlocked_ioctl = rv_ioctl,
> +#ifdef CONFIG_COMPAT
> +	.compat_ioctl = compat_ptr_ioctl,
> +#endif
> + .llseek = noop_llseek,
> +};
> +
> +/*
> + * 0666: /dev/rv is a self-instrumentation device. All ioctls operate
> + * exclusively on the calling task (current); no task can monitor another
> + * via this interface. Opening the device does not grant any privilege
> + * beyond observing one's own latency, so world-read/write is appropriate.
> + */
> +static struct miscdevice rv_miscdev = {
> + .minor = MISC_DYNAMIC_MINOR,
> + .name = "rv",
> + .fops = &rv_fops,
> + .mode = 0666,
> +};
> +
> +static int __init rv_ioctl_init(void)
> +{
> + int i;
> +
> + for (i = 0; i < TLOB_HANDLES_SIZE; i++)
> + INIT_HLIST_HEAD(&tlob_handles[i]);
> +
> + return misc_register(&rv_miscdev);
> +}
> +
> +static void __exit rv_ioctl_exit(void)
> +{
> + misc_deregister(&rv_miscdev);
> +}
> +
> +module_init(rv_ioctl_init);
> +module_exit(rv_ioctl_exit);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_DESCRIPTION("RV ioctl interface via /dev/rv");
> diff --git a/kernel/trace/rv/rv_trace.h b/kernel/trace/rv/rv_trace.h
> index 4a6faddac..65d6c6485 100644
> --- a/kernel/trace/rv/rv_trace.h
> +++ b/kernel/trace/rv/rv_trace.h
> @@ -126,6 +126,7 @@ DECLARE_EVENT_CLASS(error_da_monitor_id,
> #include <monitors/snroc/snroc_trace.h>
> #include <monitors/nrp/nrp_trace.h>
> #include <monitors/sssw/sssw_trace.h>
> +#include <monitors/tlob/tlob_trace.h>
> // Add new monitors based on CONFIG_DA_MON_EVENTS_ID here
>
> #endif /* CONFIG_DA_MON_EVENTS_ID */
> @@ -202,6 +203,55 @@ TRACE_EVENT(rv_retries_error,
> __get_str(event), __get_str(name))
> );
> #endif /* CONFIG_RV_MON_MAINTENANCE_EVENTS */
> +
> +#ifdef CONFIG_RV_MON_TLOB
> +/*
> + * tlob_budget_exceeded - emitted when a monitored task exceeds its latency
> + * budget. Carries the on-CPU / off-CPU time breakdown so that the cause
> + * of the overrun (CPU-bound vs. scheduling/I/O latency) is immediately
> + * visible in the ftrace ring buffer without post-processing.
> + */
> +TRACE_EVENT(tlob_budget_exceeded,
> +
> + TP_PROTO(struct task_struct *task, u64 threshold_us,
> + u64 on_cpu_us, u64 off_cpu_us, u32 switches,
> + bool state_is_on_cpu, u64 tag),
> +
> + TP_ARGS(task, threshold_us, on_cpu_us, off_cpu_us, switches,
> + state_is_on_cpu, tag),
> +
> + TP_STRUCT__entry(
> + __string(comm, task->comm)
> + __field(pid_t, pid)
> + __field(u64, threshold_us)
> + __field(u64, on_cpu_us)
> + __field(u64, off_cpu_us)
> + __field(u32, switches)
> + __field(bool, state_is_on_cpu)
> + __field(u64, tag)
> + ),
> +
> + TP_fast_assign(
> + __assign_str(comm);
> + __entry->pid = task->pid;
> + __entry->threshold_us = threshold_us;
> + __entry->on_cpu_us = on_cpu_us;
> + __entry->off_cpu_us = off_cpu_us;
> + __entry->switches = switches;
> + __entry->state_is_on_cpu = state_is_on_cpu;
> + __entry->tag = tag;
> + ),
> +
> +	TP_printk("%s[%d]: budget exceeded threshold=%llu on_cpu=%llu off_cpu=%llu switches=%u state=%s tag=0x%016llx",
> + __get_str(comm), __entry->pid,
> + __entry->threshold_us,
> + __entry->on_cpu_us, __entry->off_cpu_us,
> + __entry->switches,
> + __entry->state_is_on_cpu ? "on_cpu" : "off_cpu",
> + __entry->tag)
> +);
> +#endif /* CONFIG_RV_MON_TLOB */
> +
> #endif /* _TRACE_RV_H */
>
> /* This part must be outside protection */
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFC PATCH 1/4] rv/tlob: Add tlob model DOT file
2026-04-12 19:27 ` [RFC PATCH 1/4] rv/tlob: Add tlob model DOT file wen.yang
@ 2026-04-13 8:19 ` Gabriele Monaco
0 siblings, 0 replies; 7+ messages in thread
From: Gabriele Monaco @ 2026-04-13 8:19 UTC (permalink / raw)
To: wen.yang, Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers
Cc: linux-trace-kernel, linux-kernel
On Mon, 2026-04-13 at 03:27 +0800, wen.yang@linux.dev wrote:
> From: Wen Yang <wen.yang@linux.dev>
>
> Add the Graphviz DOT specification for the tlob (task latency over
> budget) deterministic automaton.
>
> The model has three states: unmonitored, on_cpu, and off_cpu.
> trace_start transitions from unmonitored to on_cpu; switch_out and
> switch_in cycle between on_cpu and off_cpu; trace_stop and
> budget_expired return to unmonitored from either active state.
> unmonitored is the sole accepting state.
>
> switch_in, switch_out, and sched_wakeup self-loop in unmonitored;
> sched_wakeup self-loops in on_cpu; switch_out and sched_wakeup
> self-loop in off_cpu.
>
> Signed-off-by: Wen Yang <wen.yang@linux.dev>
> ---
Interesting monitor! Thanks.
I'm going to go through it in more detail later, but let me share some
initial comments.
> MAINTAINERS | 3 +++
> tools/verification/models/tlob.dot | 25 +++++++++++++++++++++++++
> 2 files changed, 28 insertions(+)
> create mode 100644 tools/verification/models/tlob.dot
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 9fbb619c6..c2c56236c 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -23242,7 +23242,10 @@ S: Maintained
> F: Documentation/trace/rv/
> F: include/linux/rv.h
> F: include/rv/
> +F: include/uapi/linux/rv.h
> F: kernel/trace/rv/
> +F: samples/rv/
> +F: tools/testing/selftests/rv/
> F: tools/testing/selftests/verification/
> F: tools/verification/
This change doesn't belong here: the patch itself is not adding those files,
so you should probably move this hunk to a later patch in the series.
>
> diff --git a/tools/verification/models/tlob.dot b/tools/verification/models/tlob.dot
> new file mode 100644
> index 000000000..df34a14b8
> --- /dev/null
> +++ b/tools/verification/models/tlob.dot
> @@ -0,0 +1,25 @@
> +digraph state_automaton {
> + center = true;
> + size = "7,11";
> +	{node [shape = plaintext, style=invis, label=""] "__init_unmonitored"};
> + {node [shape = ellipse] "unmonitored"};
> + {node [shape = plaintext] "unmonitored"};
> + {node [shape = plaintext] "on_cpu"};
> + {node [shape = plaintext] "off_cpu"};
> + "__init_unmonitored" -> "unmonitored";
> + "unmonitored" [label = "unmonitored", color = green3];
> + "unmonitored" -> "on_cpu" [ label = "trace_start" ];
> +	"unmonitored" -> "unmonitored" [ label = "switch_in\nswitch_out\nsched_wakeup" ];
> + "on_cpu" [label = "on_cpu"];
> + "on_cpu" -> "off_cpu" [ label = "switch_out" ];
> + "on_cpu" -> "unmonitored" [ label = "trace_stop\nbudget_expired" ];
> + "on_cpu" -> "on_cpu" [ label = "sched_wakeup" ];
> + "off_cpu" [label = "off_cpu"];
> + "off_cpu" -> "on_cpu" [ label = "switch_in" ];
> + "off_cpu" -> "unmonitored" [ label = "trace_stop\nbudget_expired" ];
> + "off_cpu" -> "off_cpu" [ label = "switch_out\nsched_wakeup" ];
> + { rank = min ;
> + "__init_unmonitored";
> + "unmonitored";
> + }
> +}
2026-04-12 19:27 [RFC PATCH 0/4] rv/tlob: Add task latency over budget RV monitor wen.yang
2026-04-12 19:27 ` [RFC PATCH 1/4] rv/tlob: Add tlob model DOT file wen.yang
2026-04-13 8:19 ` Gabriele Monaco
2026-04-12 19:27 ` [RFC PATCH 2/4] rv/tlob: Add tlob deterministic automaton monitor wen.yang
2026-04-13 8:19 ` Gabriele Monaco
2026-04-12 19:27 ` [RFC PATCH 3/4] rv/tlob: Add KUnit tests for the tlob monitor wen.yang
2026-04-12 19:27 ` [RFC PATCH 4/4] selftests/rv: Add selftest " wen.yang