[PATCH v3 0/9] rv/tlob: Add task latency over budget RV monitor

The Linux Kernel Mailing List
 help / color / mirror / Atom feed

* [PATCH v3 0/9] rv/tlob: Add task latency over budget RV monitor
@ 2026-06-07 16:13 wen.yang
  2026-06-07 16:13 ` [PATCH v3 1/9] rv/da: introduce DA_MON_ALLOCATION_STRATEGY wen.yang
                   ` (8 more replies)
  0 siblings, 9 replies; 13+ messages in thread
From: wen.yang @ 2026-06-07 16:13 UTC (permalink / raw)
  To: Gabriele Monaco
  Cc: Steven Rostedt, linux-trace-kernel, linux-kernel, Wen Yang

From: Wen Yang <wen.yang@linux.dev>

This series introduces tlob (task latency over budget), a per-task
hybrid automaton RV monitor that measures elapsed wall-clock time across
a user-delimited code section and fires when the time exceeds a
configurable budget.

The series applies cleanly on top of:
  [1] git://git.kernel.org/pub/scm/linux/kernel/git/gmonaco/linux.git rv-fixes-7.1
      "rv fixes for v7.1"

Background
----------
The existing wwnr monitor uses a two-state DA to detect tasks that are
woken but never run.  tlob extends the RV framework to a three-state
hybrid automaton:

  running  (initial) -- on CPU
  waiting             -- in the scheduler runqueue, not yet on CPU
  sleeping            -- blocked on a lock, I/O, or similar resource

A single HA clock invariant, clk_elapsed < BUDGET_NS(), is active in
all states.  The framework enforces it via a per-task hrtimer.  On
expiry, error_env_tlob is emitted, followed by detail_env_tlob which
carries a per-state time breakdown (running_ns, waiting_ns, sleeping_ns)
that pinpoints whether the overrun occurred in the running, waiting, or
sleeping state.

Userspace interface
-------------------
Tasks are registered for monitoring by writing to the tracefs monitor
file:

  # echo "p /path/to/binary:START_OFFSET STOP_OFFSET threshold=NS" \
        > /sys/kernel/tracing/rv/monitors/tlob/monitor

Two uprobes are registered at START_OFFSET (entry) and STOP_OFFSET
(exit) of the delimited section.  When a task executes the entry uprobe,
the monitor starts; when the task reaches the exit uprobe or the budget
expires, monitoring stops and the slot is returned to the pool.

Multiple uprobe pairs can be registered for the same binary or different
binaries.  Each task can have at most one active monitoring session; if
a task hits a start uprobe while already monitored, the prior session is
cancelled and a new one begins.

Series structure
----------------
Patch 1: rv/da: introduce DA_MON_ALLOCATION_STRATEGY
  Consolidates per-object DA storage allocation under a compile-time
  selector with three strategies:
    DA_ALLOC_AUTO   (default) - lock-free kmalloc_nolock; unbounded
    DA_ALLOC_POOL             - pre-allocated fixed-size pool
    DA_ALLOC_MANUAL           - caller pre-inserts storage

  da_handle_start_event() and da_handle_start_run_event() call
  da_prepare_storage() which resolves at compile time to the correct
  allocation function.

  This patch also includes critical correctness fixes for the pool
  implementation:
  - Add tracepoint_synchronize_unregister() in da_monitor_destroy_pool()
    to fix UAF where in-flight handlers access freed pool storage
  - Fix duplicate hash entry race in da_create_or_get_pool() via
    concurrent-insert detection under RCU
  - Add capacity field to fix build error (DA_MON_POOL_SIZE undeclared
    in da_pool_return_cb)

Patch 2: rv: add generic uprobe infrastructure for RV monitors
  Introduces rv_uprobe, a thin wrapper around uprobe_consumer for RV
  monitors.  Provides rv_uprobe_register(), rv_uprobe_unregister(),
  and rv_uprobe_sync() for safe teardown.

Patch 3: rv/tlob: add tlob model DOT file
  The formal model used to generate tlob.h.

Patch 4: rv/ha: fix ha_invariant_passed_ns silent bypass of invariant check
  Fixes a bug where ha_invariant_passed_ns() returned 0 early when
  env_store was invalid (U64_MAX), leaving it at U64_MAX and causing
  ha_check_invariant_ns() to always pass.  The fix calls ha_reset_clk_ns()
  then ha_set_invariant_ns() on first use.

Patch 5: rv/ha: make da_monitor_reset_hook and EVENT_NONE_LBL overridable
  Allows tlob to override EVENT_NONE_LBL for its start_tlob self-loop.

Patch 6: rv/tlob: add tlob hybrid automaton monitor
  The main tlob implementation, including:
  - Three-state HA (running/waiting/sleeping)
  - Per-task hrtimer enforcement (HRTIMER_MODE_REL_HARD)
  - DA_ALLOC_POOL for allocation-free hot path
  - Uprobe registration via tracefs monitor file
  - Per-state time accumulation (running_ns, waiting_ns, sleeping_ns)
  - HIGH_RES_TIMERS dependency in Kconfig

Patches 7-9: Tests
  - KUnit tests for tlob monitor
  - Selftest infrastructure fixes
  - tlob selftests (uprobe binding, state tracking, violation detection)

Changes since v2
----------------
All feedback from Gabriele Monaco has been addressed:

-- Patch 02 (per-task slot ordering / ha_monitor_reset_env):
   Dropped from v3; rebased on top of Gabriele's series [1].

-- Patch 03 (verificationtest-ktap):
   Changed to use realpath for robustness as suggested.

-- Patch 04 (pre-allocated storage pool):
   Complete redesign as DA_MON_ALLOCATION_STRATEGY:
   - Three strategies (AUTO/POOL/MANUAL) via compile-time macro
   - da_monitor_init_prealloc() removed; da_monitor_init() selects
     internally
   - da_create_or_get_kmalloc() removed (no viable use case)
   - nomiss updated to use DA_ALLOC_MANUAL
   - da_extra_cleanup() hook added for per-entry teardown

   Critical bug fixes included in this patch:
   - tracepoint_synchronize_unregister() added to da_monitor_destroy_pool()
     to prevent UAF from in-flight handlers accessing freed pool storage
   - Duplicate hash entry race fixed in da_create_or_get_pool() via
     concurrent-insert detection and slot return under RCU
   - capacity field added to fix DA_MON_POOL_SIZE undeclared build error

-- Patch 05 (generic uprobe infrastructure):
   Carried unchanged into v3.

-- Patch 06 (rvgen __init arrow reset):
   Carried unchanged into v3.

-- Patch 08 (tlob monitor):
   Split and refactored:
   - ioctl interface deferred to follow-up series (tracefs-only in v3)
   - Handler simplification: three inline helpers (tlob_acc_running/
     waiting/sleeping) with scoped_guard(rcu)
   - do_prev/do_next flags removed (da_handle_event skips unmonitored)
   - scoped_guard(rcu) and guard(mutex) applied throughout
   - tlob_stop_all() removed; da_extra_cleanup() hook used instead
   - start_tlob self-loop added to DOT model as suggested
   - ha_setup_invariants() guards against redundant timer restart
   - HIGH_RES_TIMERS dependency added to Kconfig

Additional improvements in v3
------------------------------
Beyond the v2 feedback, this version includes:

1. Simplified tlob monitor implementation:
   - Removed redundant tlob_num_monitored atomic counter
     (da_handle_event already handles unmonitored tasks via hash lookup)
   - Eliminated extra cacheline touch on every sched_switch/sched_wakeup
   - Several repeated pattern simplifications.

2. Extracted common accumulation logic:
   - __tlob_acc() using offsetof() replaces three nearly-identical functions
   - Reduces code duplication while maintaining type safety

3. Complete test coverage:
   - KUnit tests for core functionality
   - Comprehensive selftests for uprobe integration, state tracking,
     and violation detection

Testing
-------
All patches have been tested on:
- x86_64 with CONFIG_PREEMPT_RT
- All KUnit tests pass
- All selftests pass with verificationtest-ktap

  
[1] git://git.kernel.org/pub/scm/linux/kernel/git/gmonaco/linux.git rv-fixes-7.1
    "rv fixes for v7.1"


Wen Yang (9):
  rv/da: introduce DA_MON_ALLOCATION_STRATEGY
  rv: add generic uprobe infrastructure for RV monitors
  rv/tlob: add tlob model DOT file
  rv/ha: fix ha_invariant_passed_ns silent bypass of invariant check
  rv/ha: make da_monitor_reset_hook and EVENT_NONE_LBL overridable
  rv/tlob: add tlob hybrid automaton monitor
  rv/tlob: add KUnit tests for the tlob monitor
  selftests/verification: fix verificationtest-ktap for out-of-tree
    execution
  selftests/verification: add tlob selftests

 Documentation/trace/rv/index.rst              |   1 +
 Documentation/trace/rv/monitor_tlob.rst       | 177 ++++
 include/rv/da_monitor.h                       | 276 ++++-
 include/rv/ha_monitor.h                       |  22 +-
 include/rv/rv_uprobe.h                        | 119 +++
 kernel/trace/rv/Kconfig                       |   5 +
 kernel/trace/rv/Makefile                      |   3 +
 kernel/trace/rv/monitors/nomiss/nomiss.c      |   6 +-
 kernel/trace/rv/monitors/tlob/.kunitconfig    |   6 +
 kernel/trace/rv/monitors/tlob/Kconfig         |  19 +
 kernel/trace/rv/monitors/tlob/tlob.c          | 968 ++++++++++++++++++
 kernel/trace/rv/monitors/tlob/tlob.h          | 148 +++
 kernel/trace/rv/monitors/tlob/tlob_kunit.c    |  92 ++
 kernel/trace/rv/monitors/tlob/tlob_trace.h    |  49 +
 kernel/trace/rv/rv_trace.h                    |   1 +
 kernel/trace/rv/rv_uprobe.c                   | 182 ++++
 .../testing/selftests/verification/.gitignore |   2 +
 tools/testing/selftests/verification/Makefile |  19 +-
 .../verification/test.d/tlob/Makefile         |  20 +
 .../verification/test.d/tlob/test.d/functions |   1 +
 .../verification/test.d/tlob/tlob_sym.c       | 189 ++++
 .../verification/test.d/tlob/tlob_target.c    | 138 +++
 .../verification/test.d/tlob/uprobe_bind.tc   |  37 +
 .../test.d/tlob/uprobe_detail_running.tc      |  51 +
 .../test.d/tlob/uprobe_detail_sleeping.tc     |  50 +
 .../test.d/tlob/uprobe_detail_waiting.tc      |  66 ++
 .../verification/test.d/tlob/uprobe_multi.tc  |  64 ++
 .../test.d/tlob/uprobe_no_event.tc            |  19 +
 .../test.d/tlob/uprobe_violation.tc           |  67 ++
 .../verification/verificationtest-ktap        |   4 +-
 tools/verification/models/tlob.dot            |  22 +
 31 files changed, 2789 insertions(+), 34 deletions(-)
 create mode 100644 Documentation/trace/rv/monitor_tlob.rst
 create mode 100644 include/rv/rv_uprobe.h
 create mode 100644 kernel/trace/rv/monitors/tlob/.kunitconfig
 create mode 100644 kernel/trace/rv/monitors/tlob/Kconfig
 create mode 100644 kernel/trace/rv/monitors/tlob/tlob.c
 create mode 100644 kernel/trace/rv/monitors/tlob/tlob.h
 create mode 100644 kernel/trace/rv/monitors/tlob/tlob_kunit.c
 create mode 100644 kernel/trace/rv/monitors/tlob/tlob_trace.h
 create mode 100644 kernel/trace/rv/rv_uprobe.c
 create mode 100644 tools/testing/selftests/verification/test.d/tlob/Makefile
 create mode 100644 tools/testing/selftests/verification/test.d/tlob/test.d/functions
 create mode 100644 tools/testing/selftests/verification/test.d/tlob/tlob_sym.c
 create mode 100644 tools/testing/selftests/verification/test.d/tlob/tlob_target.c
 create mode 100644 tools/testing/selftests/verification/test.d/tlob/uprobe_bind.tc
 create mode 100644 tools/testing/selftests/verification/test.d/tlob/uprobe_detail_running.tc
 create mode 100644 tools/testing/selftests/verification/test.d/tlob/uprobe_detail_sleeping.tc
 create mode 100644 tools/testing/selftests/verification/test.d/tlob/uprobe_detail_waiting.tc
 create mode 100644 tools/testing/selftests/verification/test.d/tlob/uprobe_multi.tc
 create mode 100644 tools/testing/selftests/verification/test.d/tlob/uprobe_no_event.tc
 create mode 100644 tools/testing/selftests/verification/test.d/tlob/uprobe_violation.tc
 create mode 100644 tools/verification/models/tlob.dot

-- 
2.43.0


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v3 1/9] rv/da: introduce DA_MON_ALLOCATION_STRATEGY
  2026-06-07 16:13 [PATCH v3 0/9] rv/tlob: Add task latency over budget RV monitor wen.yang
@ 2026-06-07 16:13 ` wen.yang
  2026-06-15  9:56   ` Gabriele Monaco
  2026-06-07 16:13 ` [PATCH v3 2/9] rv: add generic uprobe infrastructure for RV monitors wen.yang
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 13+ messages in thread
From: wen.yang @ 2026-06-07 16:13 UTC (permalink / raw)
  To: Gabriele Monaco
  Cc: Steven Rostedt, linux-trace-kernel, linux-kernel, Wen Yang

From: Wen Yang <wen.yang@linux.dev>

Consolidate per-object DA monitor storage allocation under a
single compile-time selector, replacing the ad-hoc
da_monitor_init_prealloc() API.

Three strategies are provided:

  DA_ALLOC_AUTO   (default) - lock-free kmalloc_nolock on the hot path;
                              unbounded capacity.  Preserves the existing
                              behaviour for all monitors that do not set
                              DA_MON_ALLOCATION_STRATEGY.

  DA_ALLOC_POOL             - pre-allocated fixed-size pool.  Requires the
                              monitor to define DA_MON_POOL_SIZE; enforced
                              with #error.  da_prepare_storage() acquires
                              spinlock_t (O(1), irqsave); must be called
                              from task context on PREEMPT_RT where
                              spinlock_t is a sleeping lock.

  DA_ALLOC_MANUAL           - caller pre-inserts storage via
                              da_create_empty_storage() before the first
                              da_handle_start_event(); the framework only
                              links the target field.  Useful for monitors
                              that allocate storage from known-safe task
                              context (e.g. a syscall path) and then hand
                              it to a tracepoint handler on the hot path.

da_handle_start_event() and da_handle_start_run_event() both call
da_prepare_storage() which resolves at compile time to the correct
allocation function, so no runtime dispatch is needed.

da_monitor_init_prealloc() is removed; da_monitor_init() selects pool
or kmalloc initialisation internally based on the strategy.

A da_extra_cleanup() hook macro is added: the default is a no-op; a
monitor may define it as a function called by da_monitor_destroy() on
each remaining entry before hash_del_rcu().

nomiss is updated to DA_ALLOC_MANUAL: it calls da_create_empty_storage()
from handle_sys_enter() (the sched_setscheduler syscall path, safe
task context), then da_fill_empty_storage() links the sched_dl_entity
target on the first da_handle_start_run_event() call in
handle_sched_switch().

Suggested-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Wen Yang <wen.yang@linux.dev>
---
 include/rv/da_monitor.h                  | 276 +++++++++++++++++++++--
 kernel/trace/rv/monitors/nomiss/nomiss.c |   6 +-
 2 files changed, 254 insertions(+), 28 deletions(-)

diff --git a/include/rv/da_monitor.h b/include/rv/da_monitor.h
index 34b8fba9ecd4..eb7fc02ecb8a 100644
--- a/include/rv/da_monitor.h
+++ b/include/rv/da_monitor.h
@@ -14,6 +14,26 @@
 #ifndef _RV_DA_MONITOR_H
 #define _RV_DA_MONITOR_H
 
+/*
+ * Allocation strategies for RV_MON_PER_OBJ monitors.
+ *
+ * Define DA_MON_ALLOCATION_STRATEGY before including this header.
+ * DA_ALLOC_AUTO   - lock-free kmalloc on the hot path; unbounded capacity.
+ * DA_ALLOC_POOL   - pre-allocated fixed-size pool; requires DA_MON_POOL_SIZE.
+ *                   da_prepare_storage() acquires spinlock_t (O(1), irqsave);
+ *                   must be called from task context on PREEMPT_RT where
+ *                   spinlock_t is a sleeping lock.
+ * DA_ALLOC_MANUAL - caller inserts storage before da_handle_start_event();
+ *                   the framework only links the target field.
+ */
+#define DA_ALLOC_AUTO   0
+#define DA_ALLOC_POOL   1
+#define DA_ALLOC_MANUAL 2
+
+#ifndef DA_MON_ALLOCATION_STRATEGY
+# define DA_MON_ALLOCATION_STRATEGY DA_ALLOC_AUTO
+#endif
+
 #include <rv/automata.h>
 #include <linux/rv.h>
 #include <linux/stringify.h>
@@ -66,6 +86,19 @@ static struct rv_monitor rv_this;
 #define da_monitor_sync_hook()
 #endif
 
+/*
+ * Hook for per-object teardown during da_monitor_destroy().
+ *
+ * Called for each entry still in the hash table when the monitor is
+ * destroyed.  Invoked before da_monitor_reset() and hash_del_rcu(), so
+ * it is safe to call ha_cancel_timer_sync() here.
+ *
+ * Define before including this header.  Default is a no-op.
+ */
+#ifndef da_extra_cleanup
+#define da_extra_cleanup(da_mon)
+#endif
+
 /*
  * Type for the target id, default to int but can be overridden.
  * A long type can work as hash table key (PER_OBJ) but will be downgraded to
@@ -398,6 +431,16 @@ static inline void da_monitor_destroy(void)
  * Functions to define, init and get a per-object monitor.
  */
 
+/*
+ * DA_MON_POOL_SIZE must be defined before this header is included (directly or
+ * transitively via ha_monitor.h) when DA_ALLOC_POOL is selected.  In practice
+ * this means defining it after the monitor's model header (which supplies the
+ * capacity constant) and before the ha_monitor.h include.
+ */
+#if DA_MON_ALLOCATION_STRATEGY == DA_ALLOC_POOL && !defined(DA_MON_POOL_SIZE)
+# error "DA_ALLOC_POOL requires DA_MON_POOL_SIZE to be defined before including this header"
+#endif
+
 struct da_monitor_storage {
 	da_id_type id;
 	monitor_target target;
@@ -495,18 +538,6 @@ static inline da_id_type da_get_id(struct da_monitor *da_mon)
 	return container_of(da_mon, struct da_monitor_storage, rv.da_mon)->id;
 }
 
-/*
- * da_create_or_get - create the per-object storage if not already there
- *
- * This needs a lookup so should be guarded by RCU, the condition is checked
- * directly in da_create_storage()
- */
-static inline void da_create_or_get(da_id_type id, monitor_target target)
-{
-	guard(rcu)();
-	da_create_storage(id, target, da_get_monitor(id, target));
-}
-
 /*
  * da_fill_empty_storage - store the target in a pre-allocated storage
  *
@@ -537,15 +568,96 @@ static inline monitor_target da_get_target_by_id(da_id_type id)
 	return mon_storage->target;
 }
 
+/*
+ * Per-object pool state.
+ *
+ * Zero-initialised by default (storage == NULL ⟹ kmalloc mode).  A monitor
+ * opts into pool mode by defining DA_MON_ALLOCATION_STRATEGY DA_ALLOC_POOL
+ * and DA_MON_POOL_SIZE before including this header; da_monitor_init() then
+ * pre-allocates the pool internally.
+ *
+ * Because every field is wrapped in this struct and the struct itself is a
+ * per-TU static, each monitor that includes this header gets a completely
+ * independent pool.  A kmalloc monitor (e.g. nomiss) and a pool monitor
+ * (e.g. tlob) therefore coexist without any interference.
+ *
+ * da_pool_return_cb runs from softirq (non-PREEMPT_RT) or rcuc kthread
+ * (PREEMPT_RT); spin_lock_irqsave handles both.
+ */
+struct da_per_obj_pool {
+	struct da_monitor_storage  *storage;  /* non-NULL ⟹ pool mode */
+	struct da_monitor_storage **free;     /* kmalloc'd pointer stack */
+	unsigned int                free_top;
+	unsigned int                capacity; /* total number of slots */
+	spinlock_t                  lock;
+};
+
+static struct da_per_obj_pool da_pool = {
+	.lock = __SPIN_LOCK_UNLOCKED(da_pool.lock),
+};
+
+static void da_pool_return_cb(struct rcu_head *head)
+{
+	struct da_monitor_storage *ms =
+		container_of(head, struct da_monitor_storage, rcu);
+	unsigned long flags;
+
+	spin_lock_irqsave(&da_pool.lock, flags);
+	if (!WARN_ON_ONCE(!da_pool.free || da_pool.free_top >= da_pool.capacity))
+		da_pool.free[da_pool.free_top++] = ms;
+	spin_unlock_irqrestore(&da_pool.lock, flags);
+}
+
+/*
+ * da_create_or_get_pool - pop a slot and insert it into the hash.
+ *
+ * Returns the new da_monitor on success, NULL if the pool is exhausted, or
+ * the existing da_monitor if a concurrent caller already inserted the same id
+ * (in which case the popped slot is returned to the free stack).
+ *
+ * Must be called inside an RCU read-side critical section (guard(rcu)()).
+ */
+static inline struct da_monitor *
+da_create_or_get_pool(da_id_type id, monitor_target target)
+{
+	struct da_monitor_storage *mon_storage, *existing;
+	unsigned long flags;
+
+	spin_lock_irqsave(&da_pool.lock, flags);
+	if (!da_pool.free_top) {
+		spin_unlock_irqrestore(&da_pool.lock, flags);
+		return NULL;
+	}
+	mon_storage = da_pool.free[--da_pool.free_top];
+	spin_unlock_irqrestore(&da_pool.lock, flags);
+
+	mon_storage->id = id;
+	mon_storage->target = target;
+
+	/*
+	 * A concurrent caller may have inserted the same id between our spinlock
+	 * release and here.  Return the slot to the pool and yield to the winner.
+	 */
+	existing = __da_get_mon_storage(id);
+	if (unlikely(existing)) {
+		spin_lock_irqsave(&da_pool.lock, flags);
+		da_pool.free[da_pool.free_top++] = mon_storage;
+		spin_unlock_irqrestore(&da_pool.lock, flags);
+		return &existing->rv.da_mon;
+	}
+	hash_add_rcu(da_monitor_ht, &mon_storage->node, id);
+	return &mon_storage->rv.da_mon;
+}
+
+
 /*
  * da_destroy_storage - destroy the per-object storage
  *
- * The caller is responsible to synchronise writers, either with locks or
- * implicitly. For instance, if da_destroy_storage is called at sched_exit and
- * da_create_storage can never occur after that, it's safe to call this without
- * locks.
- * This function includes an RCU read-side critical section to synchronise
- * against da_monitor_destroy().
+ * Pool mode: removes from hash and returns the slot via call_rcu().
+ * Kmalloc mode: removes from hash and frees via kfree_rcu().
+ *
+ * Includes an RCU read-side critical section to synchronise against
+ * da_monitor_destroy().
  */
 static inline void da_destroy_storage(da_id_type id)
 {
@@ -558,7 +670,11 @@ static inline void da_destroy_storage(da_id_type id)
 		return;
 	da_monitor_reset_hook(&mon_storage->rv.da_mon);
 	hash_del_rcu(&mon_storage->node);
+#if DA_MON_ALLOCATION_STRATEGY == DA_ALLOC_POOL
+	call_rcu(&mon_storage->rcu, da_pool_return_cb);
+#else
 	kfree_rcu(mon_storage, rcu);
+#endif
 }
 
 static void __da_monitor_reset_all(void (*reset)(struct da_monitor *))
@@ -581,13 +697,87 @@ static inline void da_monitor_reset_state_all(void)
 	__da_monitor_reset_all(da_monitor_reset_state);
 }
 
+/* Not part of the public API; called by da_monitor_init() for DA_ALLOC_POOL. */
+static inline int __da_monitor_init_pool(unsigned int prealloc_count)
+{
+	da_pool.storage = kcalloc(prealloc_count, sizeof(*da_pool.storage),
+				  GFP_KERNEL);
+	if (!da_pool.storage)
+		return -ENOMEM;
+
+	da_pool.free = kmalloc_array(prealloc_count, sizeof(*da_pool.free),
+				     GFP_KERNEL);
+	if (!da_pool.free) {
+		kfree(da_pool.storage);
+		da_pool.storage = NULL;
+		return -ENOMEM;
+	}
+
+	da_pool.capacity = prealloc_count;
+	da_pool.free_top = 0;
+	for (unsigned int i = 0; i < prealloc_count; i++)
+		da_pool.free[da_pool.free_top++] = &da_pool.storage[i];
+	return 0;
+}
+
+/*
+ * da_monitor_init - initialise the per-object monitor
+ *
+ * Selects the allocation path at compile time based on DA_MON_ALLOCATION_STRATEGY:
+ *   DA_ALLOC_POOL   - pre-allocates DA_MON_POOL_SIZE storage slots.
+ *   DA_ALLOC_AUTO / DA_ALLOC_MANUAL - initialises the hash table only.
+ */
 static inline int da_monitor_init(void)
 {
 	hash_init(da_monitor_ht);
+#if DA_MON_ALLOCATION_STRATEGY == DA_ALLOC_POOL
+	return __da_monitor_init_pool(DA_MON_POOL_SIZE);
+#else
 	return 0;
+#endif
 }
 
-static inline void da_monitor_destroy(void)
+static inline void da_monitor_destroy_pool(void)
+{
+	struct da_monitor_storage *ms;
+	struct hlist_node *tmp;
+	int bkt;
+
+	/*
+	 * Ensure all in-flight tracepoint handlers that may hold a raw pointer
+	 * to a pool slot (e.g. tlob_stop_task after its RCU guard exits) have
+	 * completed before we begin tearing down the pool.  Mirrors the same
+	 * call in da_monitor_destroy_kmalloc().
+	 */
+	tracepoint_synchronize_unregister();
+
+	/*
+	 * Drain any entries that were not stopped before destroy (e.g.
+	 * uprobe-started sessions whose stop probe never fired).  Call
+	 * da_extra_cleanup() before hash_del_rcu() so the hook may safely
+	 * call ha_cancel_timer_sync() while the monitor is still reachable.
+	 */
+	hash_for_each_safe(da_monitor_ht, bkt, tmp, ms, node) {
+		da_extra_cleanup(&ms->rv.da_mon);
+		hash_del_rcu(&ms->node);
+		call_rcu(&ms->rcu, da_pool_return_cb);
+	}
+
+	/*
+	 * rcu_barrier() drains every pending call_rcu() callback, including
+	 * both da_pool_return_cb() and any monitor-specific free callbacks
+	 * (e.g. tlob_free_rcu) enqueued by da_extra_cleanup().
+	 */
+	rcu_barrier();
+	kfree(da_pool.storage);
+	da_pool.storage = NULL;
+	kfree(da_pool.free);
+	da_pool.free = NULL;
+	da_pool.free_top = 0;
+	da_pool.capacity = 0;
+}
+
+static inline void da_monitor_destroy_kmalloc(void)
 {
 	struct da_monitor_storage *mon_storage;
 	struct hlist_node *tmp;
@@ -607,15 +797,51 @@ static inline void da_monitor_destroy(void)
 }
 
 /*
- * Allow the per-object monitors to run allocation manually, necessary if the
- * start condition is in a context problematic for allocation (e.g. scheduling).
- * In such case, if the storage was pre-allocated without a target, set it now.
+ * da_monitor_destroy - tear down the per-object monitor
+ *
+ * DA_ALLOC_POOL: calls tracepoint_synchronize_unregister() to drain any
+ * in-flight handlers, then iterates the hash draining remaining entries via
+ * da_extra_cleanup() + hash_del_rcu() + call_rcu(), then rcu_barrier() to
+ * wait for all pending da_pool_return_cb() callbacks before freeing the pool.
+ * DA_ALLOC_AUTO / DA_ALLOC_MANUAL: drains remaining entries after
+ * tracepoint_synchronize_unregister() + synchronize_rcu().
  */
-#ifdef DA_SKIP_AUTO_ALLOC
-#define da_prepare_storage da_fill_empty_storage
+static inline void da_monitor_destroy(void)
+{
+#if DA_MON_ALLOCATION_STRATEGY == DA_ALLOC_POOL
+	da_monitor_destroy_pool();
 #else
+	da_monitor_destroy_kmalloc();
+#endif
+}
+
+/*
+ * da_prepare_storage - obtain (or create) the da_monitor for (id, target)
+ *
+ * The implementation is selected at compile time by DA_MON_ALLOCATION_STRATEGY:
+ *
+ * DA_ALLOC_AUTO   - calls da_create_storage() (lock-free kmalloc_nolock).
+ * DA_ALLOC_POOL   - if an entry already exists, returns it; otherwise pops a
+ *                   slot from the pre-allocated pool and re-looks it up.
+ *                   Returns NULL if the pool is exhausted.
+ * DA_ALLOC_MANUAL - caller has already inserted storage via da_create_empty_storage();
+ *                   only fills in the target field if it was left NULL.
+ */
+#if DA_MON_ALLOCATION_STRATEGY == DA_ALLOC_POOL
+static inline struct da_monitor *da_prepare_storage(da_id_type id,
+						     monitor_target target,
+						     struct da_monitor *da_mon)
+{
+	if (da_mon)
+		return da_mon;
+	/* da_create_or_get_pool() returns the da_monitor directly; no re-lookup needed. */
+	return da_create_or_get_pool(id, target);
+}
+#elif DA_MON_ALLOCATION_STRATEGY == DA_ALLOC_MANUAL
+#define da_prepare_storage da_fill_empty_storage
+#else /* DA_ALLOC_AUTO */
 #define da_prepare_storage da_create_storage
-#endif /* DA_SKIP_AUTO_ALLOC */
+#endif
 
 #endif /* RV_MON_TYPE */
 
diff --git a/kernel/trace/rv/monitors/nomiss/nomiss.c b/kernel/trace/rv/monitors/nomiss/nomiss.c
index 8ead8783c29f..ac4d334e757f 100644
--- a/kernel/trace/rv/monitors/nomiss/nomiss.c
+++ b/kernel/trace/rv/monitors/nomiss/nomiss.c
@@ -17,8 +17,8 @@
 
 #define RV_MON_TYPE RV_MON_PER_OBJ
 #define HA_TIMER_TYPE HA_TIMER_WHEEL
-/* The start condition is on sched_switch, it's dangerous to allocate there */
-#define DA_SKIP_AUTO_ALLOC
+/* Allocate storage in sched_setscheduler; sched_switch is too hot to alloc. */
+#define DA_MON_ALLOCATION_STRATEGY DA_ALLOC_MANUAL
 typedef struct sched_dl_entity *monitor_target;
 #include "nomiss.h"
 #include <rv/ha_monitor.h>
@@ -214,7 +214,7 @@ static void handle_sys_enter(void *data, struct pt_regs *regs, long id)
 	if (p->policy == SCHED_DEADLINE)
 		da_reset(EXPAND_ID_TASK(p));
 	else if (new_policy == SCHED_DEADLINE)
-		da_create_or_get(EXPAND_ID_TASK(p));
+		da_create_empty_storage(get_entity_id(&p->dl, task_cpu(p), DL_TASK));
 }
 
 static void handle_sched_wakeup(void *data, struct task_struct *tsk)
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 1/9] rv/da: introduce DA_MON_ALLOCATION_STRATEGY
  2026-06-07 16:13 ` [PATCH v3 1/9] rv/da: introduce DA_MON_ALLOCATION_STRATEGY wen.yang
@ 2026-06-15  9:56   ` Gabriele Monaco
  0 siblings, 0 replies; 13+ messages in thread
From: Gabriele Monaco @ 2026-06-15  9:56 UTC (permalink / raw)
  To: wen.yang; +Cc: Steven Rostedt, linux-trace-kernel, linux-kernel

On Mon, 2026-06-08 at 00:13 +0800, wen.yang@linux.dev wrote:

> +#ifndef DA_MON_ALLOCATION_STRATEGY
> +# define DA_MON_ALLOCATION_STRATEGY DA_ALLOC_AUTO
> +#endif

I'm not sure the space goes there according to kernel coding style, we
don't use it in this file and clang-format removes it. I'd keep
consistency.

...

>  
> +/*
> + * DA_MON_POOL_SIZE must be defined before this header is included

I don't think we need to be this verbose (also, ha_monitor may not even
be included if that's a da_monitor). I would stop at the line above.

> (directly or
> + * transitively via ha_monitor.h) when DA_ALLOC_POOL is selected. 
> In practice
> + * this means defining it after the monitor's model header (which
> supplies the
> + * capacity constant) and before the ha_monitor.h include.
> + */

> +#if DA_MON_ALLOCATION_STRATEGY == DA_ALLOC_POOL &&
> !defined(DA_MON_POOL_SIZE)
> +# error "DA_ALLOC_POOL requires DA_MON_POOL_SIZE to be defined
> before including this header"

Same here I'd keep consistency and remove the space before error.

> +#endif

...
  
> +/*
> + * Per-object pool state.
> + *
> + * Zero-initialised by default (storage == NULL ⟹ kmalloc mode).  A

Mmh, ⟹  doesn't seem to print that well on my terminal, let's perhaps
use plain old ASCII => . Also I don't find what you put in parentheses
to be adding much value, we could even omit it.

I remember discussing about this so I may have missed your answer, but
why don't we handle this pool as a simple kmem_cache/mempool instead of
implementing a similar logic from scratch?

> monitor
> + * opts into pool mode by defining DA_MON_ALLOCATION_STRATEGY
> DA_ALLOC_POOL
> + * and DA_MON_POOL_SIZE before including this header;
> da_monitor_init() then
> + * pre-allocates the pool internally.
> + *
> + * Because every field is wrapped in this struct and the struct
> itself is a
> + * per-TU static, each monitor that includes this header gets a
> completely
> + * independent pool.  A kmalloc monitor (e.g. nomiss) and a pool
> monitor
> + * (e.g. tlob) therefore coexist without any interference.
> + *
> + * da_pool_return_cb runs from softirq (non-PREEMPT_RT) or rcuc
> kthread
> + * (PREEMPT_RT); spin_lock_irqsave handles both.
> + */
> +struct da_per_obj_pool {
> +	struct da_monitor_storage  *storage;  /* non-NULL ⟹ pool
> mode */
> +	struct da_monitor_storage **free;     /* kmalloc'd pointer
> stack */
> +	unsigned int                free_top;
> +	unsigned int                capacity; /* total number of
> slots */
> +	spinlock_t                  lock;
> +};
> +
> +static struct da_per_obj_pool da_pool = {
> +	.lock = __SPIN_LOCK_UNLOCKED(da_pool.lock),
> +};

...

>  /*
>   * da_destroy_storage - destroy the per-object storage
>   *
> - * The caller is responsible to synchronise writers, either with
> locks or
> - * implicitly. For instance, if da_destroy_storage is called at
> sched_exit and
> - * da_create_storage can never occur after that, it's safe to call
> this without
> - * locks.
> - * This function includes an RCU read-side critical section to
> synchronise
> - * against da_monitor_destroy().
> + * Pool mode: removes from hash and returns the slot via call_rcu().
> + * Kmalloc mode: removes from hash and frees via kfree_rcu().
> + *
> + * Includes an RCU read-side critical section to synchronise against
> + * da_monitor_destroy().
>   */
>  static inline void da_destroy_storage(da_id_type id)
>  {
> @@ -558,7 +670,11 @@ static inline void da_destroy_storage(da_id_type
> id)
>  		return;
>  	da_monitor_reset_hook(&mon_storage->rv.da_mon);
>  	hash_del_rcu(&mon_storage->node);
> +#if DA_MON_ALLOCATION_STRATEGY == DA_ALLOC_POOL
> +	call_rcu(&mon_storage->rcu, da_pool_return_cb);
> +#else

ifdeffery in functions is discouraged as quite unreadable. Since
DA_MON_ALLOCATION_STRATEGY is guaranteed to be defined, simple C ifs are
going to be mostly equivalent (the compiler will cut instead of the
preprocessor, but still).

>  	kfree_rcu(mon_storage, rcu);
> +#endif
>  }

...

> +
> +/*
> + * da_monitor_init - initialise the per-object monitor
> + *
> + * Selects the allocation path at compile time based on
> DA_MON_ALLOCATION_STRATEGY:
> + *   DA_ALLOC_POOL   - pre-allocates DA_MON_POOL_SIZE storage slots.
> + *   DA_ALLOC_AUTO / DA_ALLOC_MANUAL - initialises the hash table
> only.
> + */
>  static inline int da_monitor_init(void)
>  {
>  	hash_init(da_monitor_ht);
> +#if DA_MON_ALLOCATION_STRATEGY == DA_ALLOC_POOL
> +	return __da_monitor_init_pool(DA_MON_POOL_SIZE);
> +#else

Same here, use if()

>  	return 0;
> +#endif
>  }
>  
> -static inline void da_monitor_destroy(void)
> +static inline void da_monitor_destroy_pool(void)
> +{
> +	struct da_monitor_storage *ms;
> +	struct hlist_node *tmp;
> +	int bkt;
> +
> +	/*
> +	 * Ensure all in-flight tracepoint handlers that may hold a
> raw pointer
> +	 * to a pool slot (e.g. tlob_stop_task after its RCU guard
> exits) have
> +	 * completed before we begin tearing down the pool.  Mirrors
> the same
> +	 * call in da_monitor_destroy_kmalloc().
> +	 */
> +	tracepoint_synchronize_unregister();
> +

This is common between the pool and kmalloc flavours, you can leave it
in da_monitor_destroy() before branching.

> +	/*
> +	 * Drain any entries that were not stopped before destroy
> (e.g.
> +	 * uprobe-started sessions whose stop probe never fired). 
> Call
> +	 * da_extra_cleanup() before hash_del_rcu() so the hook may
> safely
> +	 * call ha_cancel_timer_sync() while the monitor is still
> reachable.
> +	 */
> +	hash_for_each_safe(da_monitor_ht, bkt, tmp, ms, node) {
> +		da_extra_cleanup(&ms->rv.da_mon);
> +		hash_del_rcu(&ms->node);
> +		call_rcu(&ms->rcu, da_pool_return_cb);
> +	}

Cannot you make also this common? da_extra_cleanup() should be called in
all flavours and the only difference I see here is the rcu callback.

Also do you really need call_rcu() ? Since we should already have waited
for a grace period, you can probably call the function directly. If not,
I'd still try and make both flavours consistent (sync + free OR call_rcu
+ barrier, not both).

> +
> +	/*
> +	 * rcu_barrier() drains every pending call_rcu() callback,
> including
> +	 * both da_pool_return_cb() and any monitor-specific free
> callbacks
> +	 * (e.g. tlob_free_rcu) enqueued by da_extra_cleanup().
> +	 */
> +	rcu_barrier();
> +	kfree(da_pool.storage);
> +	da_pool.storage = NULL;
> +	kfree(da_pool.free);
> +	da_pool.free = NULL;
> +	da_pool.free_top = 0;
> +	da_pool.capacity = 0;

Only this part is really specific to this allocation flavour, if you
want it in a separate function, go ahead, but the rest should probably
share as much code as possible (especially the cleanup/synchronisation
mess we just worked out).

Thanks,
Gabriele


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v3 2/9] rv: add generic uprobe infrastructure for RV monitors
  2026-06-07 16:13 [PATCH v3 0/9] rv/tlob: Add task latency over budget RV monitor wen.yang
  2026-06-07 16:13 ` [PATCH v3 1/9] rv/da: introduce DA_MON_ALLOCATION_STRATEGY wen.yang
@ 2026-06-07 16:13 ` wen.yang
  2026-06-07 16:13 ` [PATCH v3 3/9] rv/tlob: add tlob model DOT file wen.yang
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: wen.yang @ 2026-06-07 16:13 UTC (permalink / raw)
  To: Gabriele Monaco
  Cc: Steven Rostedt, linux-trace-kernel, linux-kernel, Wen Yang

From: Wen Yang <wen.yang@linux.dev>

Introduce rv_uprobe, a thin wrapper around uprobe_consumer providing
rv_uprobe_attach_path(), rv_uprobe_attach(), and rv_uprobe_detach()
for RV monitors.  An opaque priv pointer is forwarded unchanged to
entry/return handlers so monitors can carry per-binding state (e.g. a
latency threshold) to the hot path without any global lookup.

rv_uprobe_detach() is fully synchronous (nosync + sync + path_put +
kfree), closing the use-after-free window present in open-coded
patterns where kfree() precedes uprobe_unregister_sync().

Suggested-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Wen Yang <wen.yang@linux.dev>
---
 include/rv/rv_uprobe.h      | 119 +++++++++++++++++++++++
 kernel/trace/rv/Kconfig     |   4 +
 kernel/trace/rv/Makefile    |   1 +
 kernel/trace/rv/rv_uprobe.c | 182 ++++++++++++++++++++++++++++++++++++
 4 files changed, 306 insertions(+)
 create mode 100644 include/rv/rv_uprobe.h
 create mode 100644 kernel/trace/rv/rv_uprobe.c

diff --git a/include/rv/rv_uprobe.h b/include/rv/rv_uprobe.h
new file mode 100644
index 000000000000..9106c5c9275e
--- /dev/null
+++ b/include/rv/rv_uprobe.h
@@ -0,0 +1,119 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Generic uprobe infrastructure for RV monitors.
+ *
+ */
+
+#ifndef _RV_UPROBE_H
+#define _RV_UPROBE_H
+
+#include <linux/path.h>
+#include <linux/types.h>
+
+struct pt_regs;
+
+/**
+ * struct rv_uprobe - a single uprobe registered on behalf of an RV monitor
+ *
+ * @offset:   byte offset within the ELF binary where the probe is installed
+ * @priv:     monitor-private pointer; set at attach time, never touched by
+ *            this layer; passed unchanged to entry_fn / ret_fn
+ * @path:     resolved path of the probed binary (read-only after attach);
+ *            callers may use path.dentry for identity comparisons
+ *
+ * The implementation fields (uprobe_consumer, uprobe handle, callbacks) are
+ * private to rv_uprobe.c and are not exposed here; monitors must not access
+ * them directly.
+ */
+struct rv_uprobe {
+	/* public: read-only after rv_uprobe_attach*() */
+	loff_t		 offset;
+	void		*priv;
+	struct path	 path;
+};
+
+/**
+ * rv_uprobe_attach_path - register an uprobe given an already-resolved path
+ * @path:     path of the target binary; rv_uprobe takes its own reference
+ * @offset:   byte offset within the binary
+ * @entry_fn: called on probe hit (entry); may be NULL
+ * @ret_fn:   called on function return (uretprobe); may be NULL
+ * @priv:     opaque pointer forwarded to callbacks unchanged
+ *
+ * Use this variant when the caller has already resolved the path (e.g. to
+ * register multiple probes on the same binary with a single kern_path call).
+ * The inode is derived internally via d_real_inode(), so inode and path are
+ * always consistent.
+ *
+ * Returns a pointer to the new rv_uprobe on success, ERR_PTR on failure.
+ */
+struct rv_uprobe *rv_uprobe_attach_path(struct path *path, loff_t offset,
+	int (*entry_fn)(struct rv_uprobe *p, struct pt_regs *regs, __u64 *data),
+	int (*ret_fn)(struct rv_uprobe *p, unsigned long func,
+			struct pt_regs *regs, __u64 *data),
+	void *priv);
+
+/**
+ * rv_uprobe_attach - resolve binpath and register an uprobe
+ * @binpath:  absolute path to the target binary
+ * @offset:   byte offset within the binary
+ * @entry_fn: called on probe hit (entry); may be NULL
+ * @ret_fn:   called on function return (uretprobe); may be NULL
+ * @priv:     opaque pointer forwarded to callbacks unchanged
+ *
+ * Resolves binpath via kern_path(), then delegates to rv_uprobe_attach_path().
+ *
+ * Returns a pointer to the new rv_uprobe on success, ERR_PTR on failure.
+ */
+struct rv_uprobe *rv_uprobe_attach(const char *binpath, loff_t offset,
+	int (*entry_fn)(struct rv_uprobe *p, struct pt_regs *regs, __u64 *data),
+	int (*ret_fn)(struct rv_uprobe *p, unsigned long func,
+			struct pt_regs *regs, __u64 *data),
+	void *priv);
+
+/**
+ * rv_uprobe_detach - synchronously unregister an uprobe and free it
+ * @p:  probe to detach; may be NULL (no-op)
+ *
+ * Calls uprobe_unregister_nosync(), then uprobe_unregister_sync() to wait
+ * for any in-progress handler to finish, then releases the path reference
+ * and frees the rv_uprobe struct.  The caller's priv data is NOT freed.
+ *
+ * When removing a single probe, prefer this over the three-phase API.
+ * Safe to call from process context only (uprobe_unregister_sync() may
+ * schedule).
+ */
+void rv_uprobe_detach(struct rv_uprobe *p);
+
+/**
+ * rv_uprobe_unregister_nosync - dequeue an uprobe without waiting
+ * @p:  probe to dequeue; may be NULL (no-op)
+ *
+ * Removes the uprobe from the uprobe subsystem but does NOT wait for
+ * in-flight handlers to complete.  The caller must call rv_uprobe_sync()
+ * before calling rv_uprobe_free() on the same probe.
+ *
+ * Use this to batch multiple deregistrations before a single rv_uprobe_sync().
+ */
+void rv_uprobe_unregister_nosync(struct rv_uprobe *p);
+
+/**
+ * rv_uprobe_sync - wait for all in-flight uprobe handlers to complete
+ *
+ * Global barrier: waits for every in-flight uprobe handler across the system
+ * to finish.  Call once after a batch of rv_uprobe_unregister_nosync() calls
+ * and before any rv_uprobe_free() call.
+ */
+void rv_uprobe_sync(void);
+
+/**
+ * rv_uprobe_free - release resources of a previously deregistered probe
+ * @p:  probe to free; may be NULL (no-op)
+ *
+ * Releases the path reference and frees the rv_uprobe struct.  Must only
+ * be called after rv_uprobe_sync() has returned.  The caller's priv data
+ * is NOT freed.
+ */
+void rv_uprobe_free(struct rv_uprobe *p);
+
+#endif /* _RV_UPROBE_H */
diff --git a/kernel/trace/rv/Kconfig b/kernel/trace/rv/Kconfig
index 3884b14df375..e2e0033a00b9 100644
--- a/kernel/trace/rv/Kconfig
+++ b/kernel/trace/rv/Kconfig
@@ -59,6 +59,10 @@ config RV_PER_TASK_MONITORS
 	  This option configures the maximum number of per-task RV monitors that can run
 	  simultaneously.
 
+config RV_UPROBE
+	bool
+	depends on RV && UPROBES
+
 source "kernel/trace/rv/monitors/wip/Kconfig"
 source "kernel/trace/rv/monitors/wwnr/Kconfig"
 
diff --git a/kernel/trace/rv/Makefile b/kernel/trace/rv/Makefile
index 94498da35b37..f139b904bea3 100644
--- a/kernel/trace/rv/Makefile
+++ b/kernel/trace/rv/Makefile
@@ -21,6 +21,7 @@ obj-$(CONFIG_RV_MON_STALL) += monitors/stall/stall.o
 obj-$(CONFIG_RV_MON_DEADLINE) += monitors/deadline/deadline.o
 obj-$(CONFIG_RV_MON_NOMISS) += monitors/nomiss/nomiss.o
 # Add new monitors here
+obj-$(CONFIG_RV_UPROBE) += rv_uprobe.o
 obj-$(CONFIG_RV_REACTORS) += rv_reactors.o
 obj-$(CONFIG_RV_REACT_PRINTK) += reactor_printk.o
 obj-$(CONFIG_RV_REACT_PANIC) += reactor_panic.o
diff --git a/kernel/trace/rv/rv_uprobe.c b/kernel/trace/rv/rv_uprobe.c
new file mode 100644
index 000000000000..3d8b764dded3
--- /dev/null
+++ b/kernel/trace/rv/rv_uprobe.c
@@ -0,0 +1,182 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Generic uprobe infrastructure for RV monitors.
+ *
+ */
+#include <linux/dcache.h>
+#include <linux/fs.h>
+#include <linux/namei.h>
+#include <linux/slab.h>
+#include <linux/uprobes.h>
+#include <rv/rv_uprobe.h>
+
+/*
+ * Private extension of struct rv_uprobe.  Allocated by rv_uprobe_attach*()
+ * and returned to callers as &impl->pub.
+ */
+struct rv_uprobe_impl {
+	struct rv_uprobe	pub;	/* must be first; callers hold &pub */
+	struct uprobe_consumer	uc;
+	struct uprobe		*uprobe;
+	int (*entry_fn)(struct rv_uprobe *p, struct pt_regs *regs, __u64 *data);
+	int (*ret_fn)(struct rv_uprobe *p, unsigned long func,
+			struct pt_regs *regs, __u64 *data);
+};
+
+static int rv_uprobe_handler(struct uprobe_consumer *uc,
+			     struct pt_regs *regs, __u64 *data)
+{
+	struct rv_uprobe_impl *impl = container_of(uc, struct rv_uprobe_impl, uc);
+
+	if (impl->entry_fn)
+		return impl->entry_fn(&impl->pub, regs, data);
+	return 0;
+}
+
+static int rv_uprobe_ret_handler(struct uprobe_consumer *uc,
+				 unsigned long func,
+				 struct pt_regs *regs, __u64 *data)
+{
+	struct rv_uprobe_impl *impl = container_of(uc, struct rv_uprobe_impl, uc);
+
+	if (impl->ret_fn)
+		return impl->ret_fn(&impl->pub, func, regs, data);
+	return 0;
+}
+
+static struct rv_uprobe *
+__rv_uprobe_attach(struct inode *inode, struct path *path, loff_t offset,
+		   int (*entry_fn)(struct rv_uprobe *p, struct pt_regs *regs, __u64 *data),
+		   int (*ret_fn)(struct rv_uprobe *p, unsigned long func,
+				   struct pt_regs *regs, __u64 *data),
+		   void *priv)
+{
+	struct rv_uprobe_impl *impl;
+	int ret;
+
+	if (!entry_fn && !ret_fn)
+		return ERR_PTR(-EINVAL);
+
+	impl = kzalloc_obj(*impl, GFP_KERNEL);
+	if (!impl)
+		return ERR_PTR(-ENOMEM);
+
+	impl->pub.offset = offset;
+	impl->pub.priv   = priv;
+	impl->entry_fn   = entry_fn;
+	impl->ret_fn     = ret_fn;
+	path_get(path);
+	impl->pub.path   = *path;
+
+	if (entry_fn)
+		impl->uc.handler     = rv_uprobe_handler;
+	if (ret_fn)
+		impl->uc.ret_handler = rv_uprobe_ret_handler;
+
+	impl->uprobe = uprobe_register(inode, offset, 0, &impl->uc);
+	if (IS_ERR(impl->uprobe)) {
+		ret = PTR_ERR(impl->uprobe);
+		path_put(&impl->pub.path);
+		kfree(impl);
+		return ERR_PTR(ret);
+	}
+
+	return &impl->pub;
+}
+
+/**
+ * rv_uprobe_attach_path - register an uprobe given an already-resolved path
+ */
+struct rv_uprobe *rv_uprobe_attach_path(struct path *path, loff_t offset,
+	int (*entry_fn)(struct rv_uprobe *p, struct pt_regs *regs, __u64 *data),
+	int (*ret_fn)(struct rv_uprobe *p, unsigned long func,
+			struct pt_regs *regs, __u64 *data),
+	void *priv)
+{
+	struct inode *inode = d_real_inode(path->dentry);
+
+	return __rv_uprobe_attach(inode, path, offset, entry_fn, ret_fn, priv);
+}
+EXPORT_SYMBOL_GPL(rv_uprobe_attach_path);
+
+/**
+ * rv_uprobe_attach - resolve binpath and register an uprobe
+ */
+struct rv_uprobe *rv_uprobe_attach(const char *binpath, loff_t offset,
+	int (*entry_fn)(struct rv_uprobe *p, struct pt_regs *regs, __u64 *data),
+	int (*ret_fn)(struct rv_uprobe *p, unsigned long func,
+			struct pt_regs *regs, __u64 *data),
+	void *priv)
+{
+	struct rv_uprobe *p;
+	struct path path;
+	int ret;
+
+	ret = kern_path(binpath, LOOKUP_FOLLOW, &path);
+	if (ret)
+		return ERR_PTR(ret);
+
+	if (!d_is_reg(path.dentry)) {
+		path_put(&path);
+		return ERR_PTR(-EINVAL);
+	}
+
+	p = rv_uprobe_attach_path(&path, offset, entry_fn, ret_fn, priv);
+	path_put(&path);
+	return p;
+}
+EXPORT_SYMBOL_GPL(rv_uprobe_attach);
+
+/**
+ * rv_uprobe_detach - synchronously unregister an uprobe and free it
+ */
+void rv_uprobe_detach(struct rv_uprobe *p)
+{
+	if (!p)
+		return;
+
+	rv_uprobe_unregister_nosync(p);
+	rv_uprobe_sync();
+	rv_uprobe_free(p);
+}
+EXPORT_SYMBOL_GPL(rv_uprobe_detach);
+
+/**
+ * rv_uprobe_unregister_nosync - dequeue an uprobe without waiting
+ */
+void rv_uprobe_unregister_nosync(struct rv_uprobe *p)
+{
+	struct rv_uprobe_impl *impl;
+
+	if (!p)
+		return;
+
+	impl = container_of(p, struct rv_uprobe_impl, pub);
+	uprobe_unregister_nosync(impl->uprobe, &impl->uc);
+}
+EXPORT_SYMBOL_GPL(rv_uprobe_unregister_nosync);
+
+/**
+ * rv_uprobe_sync - wait for all in-flight uprobe handlers to complete
+ */
+void rv_uprobe_sync(void)
+{
+	uprobe_unregister_sync();
+}
+EXPORT_SYMBOL_GPL(rv_uprobe_sync);
+
+/**
+ * rv_uprobe_free - release resources of a previously deregistered probe
+ */
+void rv_uprobe_free(struct rv_uprobe *p)
+{
+	struct rv_uprobe_impl *impl;
+
+	if (!p)
+		return;
+
+	impl = container_of(p, struct rv_uprobe_impl, pub);
+	path_put(&p->path);
+	kfree(impl);
+}
+EXPORT_SYMBOL_GPL(rv_uprobe_free);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v3 3/9] rv/tlob: add tlob model DOT file
  2026-06-07 16:13 [PATCH v3 0/9] rv/tlob: Add task latency over budget RV monitor wen.yang
  2026-06-07 16:13 ` [PATCH v3 1/9] rv/da: introduce DA_MON_ALLOCATION_STRATEGY wen.yang
  2026-06-07 16:13 ` [PATCH v3 2/9] rv: add generic uprobe infrastructure for RV monitors wen.yang
@ 2026-06-07 16:13 ` wen.yang
  2026-06-07 16:13 ` [PATCH v3 4/9] rv/ha: fix ha_invariant_passed_ns silent bypass of invariant check wen.yang
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: wen.yang @ 2026-06-07 16:13 UTC (permalink / raw)
  To: Gabriele Monaco
  Cc: Steven Rostedt, linux-trace-kernel, linux-kernel, Wen Yang

From: Wen Yang <wen.yang@linux.dev>

Add tools/verification/models/tlob.dot, the Graphviz specification of
the tlob hybrid automaton.  The model has three states (running,
waiting, sleeping) connected by four transitions (switch_in, preempt,
wakeup, sleep) with a single clock invariant clk_elapsed < BUDGET_NS()
active in all states.

Suggested-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Wen Yang <wen.yang@linux.dev>
---
 tools/verification/models/tlob.dot | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)
 create mode 100644 tools/verification/models/tlob.dot

diff --git a/tools/verification/models/tlob.dot b/tools/verification/models/tlob.dot
new file mode 100644
index 000000000000..a1834daff2ed
--- /dev/null
+++ b/tools/verification/models/tlob.dot
@@ -0,0 +1,22 @@
+digraph state_automaton {
+	center = true;
+	size = "7,11";
+	{node [shape = plaintext, style=invis, label=""] "__init_running"};
+	{node [shape = ellipse] "running"};
+	{node [shape = plaintext] "running"};
+	{node [shape = plaintext] "waiting"};
+	{node [shape = plaintext] "sleeping"};
+	"__init_running" -> "running";
+	"running"  -> "running"  [ label = "start;reset(clk_elapsed)" ];
+	"running"  [label = "running\nclk_elapsed < BUDGET_NS()",  color = green3];
+	"waiting"  [label = "waiting\nclk_elapsed < BUDGET_NS()"];
+	"sleeping" [label = "sleeping\nclk_elapsed < BUDGET_NS()"];
+	"running"  -> "sleeping" [ label = "sleep" ];
+	"running"  -> "waiting"  [ label = "preempt" ];
+	"waiting"  -> "running"  [ label = "switch_in" ];
+	"sleeping" -> "waiting"  [ label = "wakeup" ];
+	{ rank = min ;
+		"__init_running";
+		"running";
+	}
+}
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v3 4/9] rv/ha: fix ha_invariant_passed_ns silent bypass of invariant check
  2026-06-07 16:13 [PATCH v3 0/9] rv/tlob: Add task latency over budget RV monitor wen.yang
                   ` (2 preceding siblings ...)
  2026-06-07 16:13 ` [PATCH v3 3/9] rv/tlob: add tlob model DOT file wen.yang
@ 2026-06-07 16:13 ` wen.yang
  2026-06-15 10:12   ` Gabriele Monaco
  2026-06-07 16:13 ` [PATCH v3 5/9] rv/ha: make da_monitor_reset_hook and EVENT_NONE_LBL overridable wen.yang
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 13+ messages in thread
From: wen.yang @ 2026-06-07 16:13 UTC (permalink / raw)
  To: Gabriele Monaco
  Cc: Steven Rostedt, linux-trace-kernel, linux-kernel, Wen Yang

From: Wen Yang <wen.yang@linux.dev>

The function is documented as "prepare the invariant and return the time
since reset", but on the first call (env_store == U64_MAX) it exits
early without calling ha_set_invariant_ns():

  if (ha_monitor_env_invalid(ha_mon, env))  /* env_store == U64_MAX */
      return 0;   /* ha_set_invariant_ns skipped, env_store stays U64_MAX */
  ...
  ha_set_invariant_ns(ha_mon, env, expire - passed, time_ns);

This leaves env_store == U64_MAX, so ha_check_invariant_ns() always
passes on the first activation regardless of elapsed time:

  return READ_ONCE(ha_mon->env_store[env]) >= time_ns;  /* U64_MAX >= any */

Fix: establish the guard before converting to the invariant:

  if (ha_monitor_env_invalid(ha_mon, env))
      ha_reset_clk_ns(ha_mon, env, time_ns); /* guard: env_store = time_ns */
  passed = ha_get_env(ha_mon, env, time_ns);
  ha_set_invariant_ns(ha_mon, env, expire - passed, time_ns);
                                 /* invariant: env_store = time_ns + expire */

Apply the same fix to ha_invariant_passed_jiffy().

Signed-off-by: Wen Yang <wen.yang@linux.dev>
---
 include/rv/ha_monitor.h | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/include/rv/ha_monitor.h b/include/rv/ha_monitor.h
index 28d3c74cabfc..e5860900a337 100644
--- a/include/rv/ha_monitor.h
+++ b/include/rv/ha_monitor.h
@@ -365,16 +365,22 @@ static inline bool ha_check_invariant_ns(struct ha_monitor *ha_mon,
 }
 /*
  * ha_invariant_passed_ns - prepare the invariant and return the time since reset
+ *
+ * If the env has not been initialised yet (first entry into a state with an
+ * invariant), anchor the guard clock at the current time so that the full
+ * budget is available from this point.  This preserves the documented
+ * guard→invariant ordering: ha_set_invariant_ns() is always preceded by a
+ * valid guard representation in env_store.
  */
 static inline u64 ha_invariant_passed_ns(struct ha_monitor *ha_mon, enum envs env,
 				   u64 expire, u64 time_ns)
 {
-	u64 passed = 0;
+	u64 passed;
 
 	if (env < 0 || env >= ENV_MAX_STORED)
 		return 0;
 	if (ha_monitor_env_invalid(ha_mon, env))
-		return 0;
+		ha_reset_clk_ns(ha_mon, env, time_ns);
 	passed = ha_get_env(ha_mon, env, time_ns);
 	ha_set_invariant_ns(ha_mon, env, expire - passed, time_ns);
 	return passed;
@@ -404,16 +410,19 @@ static inline bool ha_check_invariant_jiffy(struct ha_monitor *ha_mon,
 }
 /*
  * ha_invariant_passed_jiffy - prepare the invariant and return the time since reset
+ *
+ * Same first-use semantics as ha_invariant_passed_ns(): anchor the guard clock
+ * now if the env has not been initialised.
  */
 static inline u64 ha_invariant_passed_jiffy(struct ha_monitor *ha_mon, enum envs env,
 				      u64 expire, u64 time_ns)
 {
-	u64 passed = 0;
+	u64 passed;
 
 	if (env < 0 || env >= ENV_MAX_STORED)
 		return 0;
 	if (ha_monitor_env_invalid(ha_mon, env))
-		return 0;
+		ha_reset_clk_jiffy(ha_mon, env);
 	passed = ha_get_env(ha_mon, env, time_ns);
 	ha_set_invariant_jiffy(ha_mon, env, expire - passed);
 	return passed;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 4/9] rv/ha: fix ha_invariant_passed_ns silent bypass of invariant check
  2026-06-07 16:13 ` [PATCH v3 4/9] rv/ha: fix ha_invariant_passed_ns silent bypass of invariant check wen.yang
@ 2026-06-15 10:12   ` Gabriele Monaco
  0 siblings, 0 replies; 13+ messages in thread
From: Gabriele Monaco @ 2026-06-15 10:12 UTC (permalink / raw)
  To: wen.yang; +Cc: Steven Rostedt, linux-trace-kernel, linux-kernel

On Mon, 2026-06-08 at 00:13 +0800, wen.yang@linux.dev wrote:
> From: Wen Yang <wen.yang@linux.dev>
> 
> The function is documented as "prepare the invariant and return the
> time
> since reset", but on the first call (env_store == U64_MAX) it exits
> early without calling ha_set_invariant_ns():
> 
>   if (ha_monitor_env_invalid(ha_mon, env))  /* env_store == U64_MAX
> */
>       return 0;   /* ha_set_invariant_ns skipped, env_store stays
> U64_MAX */
>   ...
>   ha_set_invariant_ns(ha_mon, env, expire - passed, time_ns);
> 
> This leaves env_store == U64_MAX, so ha_check_invariant_ns() always
> passes on the first activation regardless of elapsed time:
> 
>   return READ_ONCE(ha_mon->env_store[env]) >= time_ns;  /* U64_MAX >=
> any */
> 
> Fix: establish the guard before converting to the invariant:
> 
>   if (ha_monitor_env_invalid(ha_mon, env))
>       ha_reset_clk_ns(ha_mon, env, time_ns); /* guard: env_store =
> time_ns */
>   passed = ha_get_env(ha_mon, env, time_ns);
>   ha_set_invariant_ns(ha_mon, env, expire - passed, time_ns);
>                                  /* invariant: env_store = time_ns +
> expire */
> 
> Apply the same fix to ha_invariant_passed_jiffy().
> 
> Signed-off-by: Wen Yang <wen.yang@linux.dev>

This is neat and probably works just fine in your case. Anyway I'm
planning on a more seamless solution not involving invalid states at
all. Initialisation would include a reset, which should work better on
monitors doing da_handle_start(), so not running the first event.
That's, by the way, pretty much what you wanted by doing reset() on the
(virtual) init transition.

We first needs Nam's simplification though [1]. We can synchronise
during this development cycle, you probably won't really need to bother
for now.

Thanks,
Gabriele

[1] -
https://lore.kernel.org/lkml/08188c28f274da63a3f8549add3086a92aef45e5.1780908661.git.namcao@linutronix.de

> ---
>  include/rv/ha_monitor.h | 17 +++++++++++++----
>  1 file changed, 13 insertions(+), 4 deletions(-)
> 
> diff --git a/include/rv/ha_monitor.h b/include/rv/ha_monitor.h
> index 28d3c74cabfc..e5860900a337 100644
> --- a/include/rv/ha_monitor.h
> +++ b/include/rv/ha_monitor.h
> @@ -365,16 +365,22 @@ static inline bool ha_check_invariant_ns(struct
> ha_monitor *ha_mon,
>  }
>  /*
>   * ha_invariant_passed_ns - prepare the invariant and return the
> time since reset
> + *
> + * If the env has not been initialised yet (first entry into a state
> with an
> + * invariant), anchor the guard clock at the current time so that
> the full
> + * budget is available from this point.  This preserves the
> documented
> + * guard→invariant ordering: ha_set_invariant_ns() is always
> preceded by a
> + * valid guard representation in env_store.
>   */
>  static inline u64 ha_invariant_passed_ns(struct ha_monitor *ha_mon,
> enum envs env,
>  				   u64 expire, u64 time_ns)
>  {
> -	u64 passed = 0;
> +	u64 passed;
>  
>  	if (env < 0 || env >= ENV_MAX_STORED)
>  		return 0;
>  	if (ha_monitor_env_invalid(ha_mon, env))
> -		return 0;
> +		ha_reset_clk_ns(ha_mon, env, time_ns);
>  	passed = ha_get_env(ha_mon, env, time_ns);
>  	ha_set_invariant_ns(ha_mon, env, expire - passed, time_ns);
>  	return passed;
> @@ -404,16 +410,19 @@ static inline bool
> ha_check_invariant_jiffy(struct ha_monitor *ha_mon,
>  }
>  /*
>   * ha_invariant_passed_jiffy - prepare the invariant and return the
> time since reset
> + *
> + * Same first-use semantics as ha_invariant_passed_ns(): anchor the
> guard clock
> + * now if the env has not been initialised.
>   */
>  static inline u64 ha_invariant_passed_jiffy(struct ha_monitor
> *ha_mon, enum envs env,
>  				      u64 expire, u64 time_ns)
>  {
> -	u64 passed = 0;
> +	u64 passed;
>  
>  	if (env < 0 || env >= ENV_MAX_STORED)
>  		return 0;
>  	if (ha_monitor_env_invalid(ha_mon, env))
> -		return 0;
> +		ha_reset_clk_jiffy(ha_mon, env);
>  	passed = ha_get_env(ha_mon, env, time_ns);
>  	ha_set_invariant_jiffy(ha_mon, env, expire - passed);
>  	return passed;


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v3 5/9] rv/ha: make da_monitor_reset_hook and EVENT_NONE_LBL overridable
  2026-06-07 16:13 [PATCH v3 0/9] rv/tlob: Add task latency over budget RV monitor wen.yang
                   ` (3 preceding siblings ...)
  2026-06-07 16:13 ` [PATCH v3 4/9] rv/ha: fix ha_invariant_passed_ns silent bypass of invariant check wen.yang
@ 2026-06-07 16:13 ` wen.yang
  2026-06-15 10:16   ` Gabriele Monaco
  2026-06-07 16:13 ` [PATCH v3 7/9] rv/tlob: add KUnit tests for the tlob monitor wen.yang
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 13+ messages in thread
From: wen.yang @ 2026-06-07 16:13 UTC (permalink / raw)
  To: Gabriele Monaco
  Cc: Steven Rostedt, linux-trace-kernel, linux-kernel, Wen Yang

From: Wen Yang <wen.yang@linux.dev>

Wrap the two definitions with #ifndef guards so that HA-based monitors
can substitute their own implementations before including this header:

  /* in monitor.c, before #include <rv/ha_monitor.h> */
  #define da_monitor_reset_hook  my_monitor_reset_env
  #define EVENT_NONE_LBL         "idle"

No behaviour change for monitors that do not override either macro.

Signed-off-by: Wen Yang <wen.yang@linux.dev>
---
 include/rv/ha_monitor.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/include/rv/ha_monitor.h b/include/rv/ha_monitor.h
index e5860900a337..610da54c111f 100644
--- a/include/rv/ha_monitor.h
+++ b/include/rv/ha_monitor.h
@@ -36,7 +36,10 @@ static bool ha_monitor_handle_constraint(struct da_monitor *da_mon,
 					 da_id_type id);
 #define da_monitor_event_hook ha_monitor_handle_constraint
 #define da_monitor_init_hook ha_monitor_init_env
+/* Allow monitors to override da_monitor_reset_hook before including this header. */
+#ifndef da_monitor_reset_hook
 #define da_monitor_reset_hook ha_monitor_reset_env
+#endif
 #define da_monitor_sync_hook() synchronize_rcu()
 
 #if !defined(HA_SKIP_AUTO_CLEANUP) && RV_MON_TYPE == RV_MON_PER_TASK
@@ -75,7 +78,9 @@ _Static_assert(offsetof(struct ha_monitor, da_mon) == 0,
 #define ENV_INVALID_VALUE U64_MAX
 /* Error with no event occurs only on timeouts */
 #define EVENT_NONE EVENT_MAX
+#ifndef EVENT_NONE_LBL
 #define EVENT_NONE_LBL "none"
+#endif
 #define ENV_BUFFER_SIZE 64
 
 #ifdef CONFIG_RV_REACTORS
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 5/9] rv/ha: make da_monitor_reset_hook and EVENT_NONE_LBL overridable
  2026-06-07 16:13 ` [PATCH v3 5/9] rv/ha: make da_monitor_reset_hook and EVENT_NONE_LBL overridable wen.yang
@ 2026-06-15 10:16   ` Gabriele Monaco
  0 siblings, 0 replies; 13+ messages in thread
From: Gabriele Monaco @ 2026-06-15 10:16 UTC (permalink / raw)
  To: wen.yang; +Cc: Steven Rostedt, linux-trace-kernel, linux-kernel

On Mon, 2026-06-08 at 00:13 +0800, wen.yang@linux.dev wrote:
> From: Wen Yang <wen.yang@linux.dev>
> 
> Wrap the two definitions with #ifndef guards so that HA-based
> monitors
> can substitute their own implementations before including this
> header:
> 
>   /* in monitor.c, before #include <rv/ha_monitor.h> */
>   #define da_monitor_reset_hook  my_monitor_reset_env
>   #define EVENT_NONE_LBL         "idle"
> 
> No behaviour change for monitors that do not override either macro.
> 
> Signed-off-by: Wen Yang <wen.yang@linux.dev>
> ---
>  include/rv/ha_monitor.h | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/include/rv/ha_monitor.h b/include/rv/ha_monitor.h
> index e5860900a337..610da54c111f 100644
> --- a/include/rv/ha_monitor.h
> +++ b/include/rv/ha_monitor.h
> @@ -36,7 +36,10 @@ static bool ha_monitor_handle_constraint(struct
> da_monitor *da_mon,
>  					 da_id_type id);
>  #define da_monitor_event_hook ha_monitor_handle_constraint
>  #define da_monitor_init_hook ha_monitor_init_env
> +/* Allow monitors to override da_monitor_reset_hook before including
> this header. */

Just a nit: users can do that but shouldn't do it mindlessly, so let's
add a line like:

  "Make sure you still call ha_monitor_reset_env() or reset timers
otherwise."

Other than that looks good

Reviewed-by: Gabriele Monaco <gmonaco@redhat.com>

Thanks,
Gabriele

> +#ifndef da_monitor_reset_hook
>  #define da_monitor_reset_hook ha_monitor_reset_env
> +#endif
>  #define da_monitor_sync_hook() synchronize_rcu()
>  
>  #if !defined(HA_SKIP_AUTO_CLEANUP) && RV_MON_TYPE == RV_MON_PER_TASK
> @@ -75,7 +78,9 @@ _Static_assert(offsetof(struct ha_monitor, da_mon)
> == 0,
>  #define ENV_INVALID_VALUE U64_MAX
>  /* Error with no event occurs only on timeouts */
>  #define EVENT_NONE EVENT_MAX
> +#ifndef EVENT_NONE_LBL
>  #define EVENT_NONE_LBL "none"
> +#endif
>  #define ENV_BUFFER_SIZE 64
>  
>  #ifdef CONFIG_RV_REACTORS


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v3 7/9] rv/tlob: add KUnit tests for the tlob monitor
  2026-06-07 16:13 [PATCH v3 0/9] rv/tlob: Add task latency over budget RV monitor wen.yang
                   ` (4 preceding siblings ...)
  2026-06-07 16:13 ` [PATCH v3 5/9] rv/ha: make da_monitor_reset_hook and EVENT_NONE_LBL overridable wen.yang
@ 2026-06-07 16:13 ` wen.yang
  2026-06-07 16:13 ` [PATCH v3 8/9] selftests/verification: fix verificationtest-ktap for out-of-tree execution wen.yang
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: wen.yang @ 2026-06-07 16:13 UTC (permalink / raw)
  To: Gabriele Monaco
  Cc: Steven Rostedt, linux-trace-kernel, linux-kernel, Wen Yang

From: Wen Yang <wen.yang@linux.dev>

Add CONFIG_TLOB_KUNIT_TEST (tristate, depends on RV_MON_TLOB && KUNIT,
default KUNIT_ALL_TESTS) with a single test suite covering the uprobe
line parser: valid bindings are accepted, malformed ones return -EINVAL,
and out-of-range thresholds return -ERANGE.

Signed-off-by: Wen Yang <wen.yang@linux.dev>
---
 kernel/trace/rv/Makefile                   |  1 +
 kernel/trace/rv/monitors/tlob/.kunitconfig |  6 ++
 kernel/trace/rv/monitors/tlob/Kconfig      |  7 ++
 kernel/trace/rv/monitors/tlob/tlob_kunit.c | 92 ++++++++++++++++++++++
 4 files changed, 106 insertions(+)
 create mode 100644 kernel/trace/rv/monitors/tlob/.kunitconfig
 create mode 100644 kernel/trace/rv/monitors/tlob/tlob_kunit.c

diff --git a/kernel/trace/rv/Makefile b/kernel/trace/rv/Makefile
index ae59e97f8682..316d53398345 100644
--- a/kernel/trace/rv/Makefile
+++ b/kernel/trace/rv/Makefile
@@ -21,6 +21,7 @@ obj-$(CONFIG_RV_MON_STALL) += monitors/stall/stall.o
 obj-$(CONFIG_RV_MON_DEADLINE) += monitors/deadline/deadline.o
 obj-$(CONFIG_RV_MON_NOMISS) += monitors/nomiss/nomiss.o
 obj-$(CONFIG_RV_MON_TLOB) += monitors/tlob/tlob.o
+obj-$(CONFIG_TLOB_KUNIT_TEST) += monitors/tlob/tlob_kunit.o
 # Add new monitors here
 obj-$(CONFIG_RV_UPROBE) += rv_uprobe.o
 obj-$(CONFIG_RV_REACTORS) += rv_reactors.o
diff --git a/kernel/trace/rv/monitors/tlob/.kunitconfig b/kernel/trace/rv/monitors/tlob/.kunitconfig
new file mode 100644
index 000000000000..35d313dfc20d
--- /dev/null
+++ b/kernel/trace/rv/monitors/tlob/.kunitconfig
@@ -0,0 +1,6 @@
+CONFIG_FTRACE=y
+CONFIG_KUNIT=y
+CONFIG_MODULES=y
+CONFIG_RV=y
+CONFIG_RV_MON_TLOB=y
+CONFIG_TLOB_KUNIT_TEST=y
diff --git a/kernel/trace/rv/monitors/tlob/Kconfig b/kernel/trace/rv/monitors/tlob/Kconfig
index b29a375de228..7ec3326640c2 100644
--- a/kernel/trace/rv/monitors/tlob/Kconfig
+++ b/kernel/trace/rv/monitors/tlob/Kconfig
@@ -10,3 +10,10 @@ config RV_MON_TLOB
 	  monitor.  tlob tracks per-task elapsed wall-clock time across a
 	  user-delimited code section and emits error_env_tlob when the
 	  elapsed time exceeds a configurable per-invocation budget.
+
+config TLOB_KUNIT_TEST
+	tristate "KUnit tests for tlob monitor" if !KUNIT_ALL_TESTS
+	depends on RV_MON_TLOB && KUNIT
+	default KUNIT_ALL_TESTS
+	help
+	  Enable KUnit unit tests for the tlob RV monitor.
diff --git a/kernel/trace/rv/monitors/tlob/tlob_kunit.c b/kernel/trace/rv/monitors/tlob/tlob_kunit.c
new file mode 100644
index 000000000000..6450d61b26c3
--- /dev/null
+++ b/kernel/trace/rv/monitors/tlob/tlob_kunit.c
@@ -0,0 +1,92 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * KUnit tests for the tlob RV monitor.
+ *
+ */
+#include <kunit/test.h>
+
+#include "tlob.h"
+
+MODULE_IMPORT_NS("EXPORTED_FOR_KUNIT_TESTING");
+
+static const char * const tlob_parse_valid[] = {
+	"p /usr/bin/myapp:4768 4848 threshold=5000000",
+	"p /usr/bin/myapp:0x12a0 0x12f0 threshold=10000000",
+	"p /opt/my:app/bin:0x100 0x200 threshold=1000000",
+};
+
+static const char * const tlob_parse_invalid[] = {
+	/* add: malformed */
+	"p :0x100 0x200 threshold=5000",
+	"p /usr/bin/myapp:0x100 threshold=5000",
+	"p /usr/bin/myapp:-1 0x200 threshold=5000",
+	"p /usr/bin/myapp:0x100 0x200",
+	"p /usr/bin/myapp:0x100 0x100 threshold=5000",
+	/* remove: malformed */
+	"-usr/bin/myapp:0x100",
+	"-/usr/bin/myapp",
+	"-/:0x100",
+	"-/usr/bin/myapp:abc",
+};
+
+/* threshold_ns < 1000 or > TLOB_MAX_THRESHOLD_NS return -ERANGE, not -EINVAL. */
+static const char * const tlob_parse_out_of_range[] = {
+	"p /usr/bin/myapp:0x100 0x200 threshold=0",
+	"p /usr/bin/myapp:0x100 0x200 threshold=999",
+	"p /usr/bin/myapp:0x100 0x200 threshold=3600000000001",  /* TLOB_MAX_THRESHOLD_NS + 1 */
+};
+
+/*
+ * Valid add lines return -ENOENT (kern_path() finds no such file in the test
+ * environment) rather than 0; a non-(-EINVAL) return confirms the format was
+ * accepted by the parser.
+ */
+static void tlob_parse_valid_accepted(struct kunit *test)
+{
+	char buf[128];
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(tlob_parse_valid); i++) {
+		strscpy(buf, tlob_parse_valid[i], sizeof(buf));
+		KUNIT_EXPECT_NE(test, tlob_create_or_delete_uprobe(buf), -EINVAL);
+	}
+}
+
+static void tlob_parse_invalid_rejected(struct kunit *test)
+{
+	char buf[128];
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(tlob_parse_invalid); i++) {
+		strscpy(buf, tlob_parse_invalid[i], sizeof(buf));
+		KUNIT_EXPECT_EQ(test, tlob_create_or_delete_uprobe(buf), -EINVAL);
+	}
+}
+
+static void tlob_parse_out_of_range_rejected(struct kunit *test)
+{
+	char buf[128];
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(tlob_parse_out_of_range); i++) {
+		strscpy(buf, tlob_parse_out_of_range[i], sizeof(buf));
+		KUNIT_EXPECT_EQ(test, tlob_create_or_delete_uprobe(buf), -ERANGE);
+	}
+}
+
+static struct kunit_case tlob_parse_cases[] = {
+	KUNIT_CASE(tlob_parse_valid_accepted),
+	KUNIT_CASE(tlob_parse_invalid_rejected),
+	KUNIT_CASE(tlob_parse_out_of_range_rejected),
+	{}
+};
+
+static struct kunit_suite tlob_parse_suite = {
+	.name       = "tlob_parse",
+	.test_cases = tlob_parse_cases,
+};
+
+kunit_test_suite(tlob_parse_suite);
+
+MODULE_DESCRIPTION("KUnit tests for the tlob RV monitor");
+MODULE_LICENSE("GPL");
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v3 8/9] selftests/verification: fix verificationtest-ktap for out-of-tree execution
  2026-06-07 16:13 [PATCH v3 0/9] rv/tlob: Add task latency over budget RV monitor wen.yang
                   ` (5 preceding siblings ...)
  2026-06-07 16:13 ` [PATCH v3 7/9] rv/tlob: add KUnit tests for the tlob monitor wen.yang
@ 2026-06-07 16:13 ` wen.yang
  2026-06-13 16:00 ` [PATCH v3 0/9] rv/tlob: Add task latency over budget RV monitor Wen Yang
       [not found] ` <629023dbcc4389fcc6ec46d88c98eb19aa0abc36.1780847473.git.wen.yang@linux.dev>
  8 siblings, 0 replies; 13+ messages in thread
From: wen.yang @ 2026-06-07 16:13 UTC (permalink / raw)
  To: Gabriele Monaco
  Cc: Steven Rostedt, linux-trace-kernel, linux-kernel, Wen Yang

From: Wen Yang <wen.yang@linux.dev>

verificationtest-ktap used CWD-relative paths which broke when
invoked outside the verification directory (e.g. via vng).

Resolve paths via realpath "$(dirname "$0")" so the script works
from any working directory.  Accept an optional subdirectory argument
interpreted relative to the script's directory.

Suggested-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Wen Yang <wen.yang@linux.dev>
---
 tools/testing/selftests/verification/verificationtest-ktap | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/verification/verificationtest-ktap b/tools/testing/selftests/verification/verificationtest-ktap
index 18f7fe324e2f..055747cef38a 100755
--- a/tools/testing/selftests/verification/verificationtest-ktap
+++ b/tools/testing/selftests/verification/verificationtest-ktap
@@ -5,4 +5,6 @@
 #
 # Copyright (C) Arm Ltd., 2023
 
-../ftrace/ftracetest -K -v --rv ../verification
+dir=$(realpath "$(dirname "$0")")
+testdir=$(cd "$dir" && realpath "${1:-.}")
+"$dir/../ftrace/ftracetest" -K -v --rv "$testdir"
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 0/9] rv/tlob: Add task latency over budget RV monitor
  2026-06-07 16:13 [PATCH v3 0/9] rv/tlob: Add task latency over budget RV monitor wen.yang
                   ` (6 preceding siblings ...)
  2026-06-07 16:13 ` [PATCH v3 8/9] selftests/verification: fix verificationtest-ktap for out-of-tree execution wen.yang
@ 2026-06-13 16:00 ` Wen Yang
       [not found] ` <629023dbcc4389fcc6ec46d88c98eb19aa0abc36.1780847473.git.wen.yang@linux.dev>
  8 siblings, 0 replies; 13+ messages in thread
From: Wen Yang @ 2026-06-13 16:00 UTC (permalink / raw)
  To: Gabriele Monaco; +Cc: Steven Rostedt, linux-trace-kernel, linux-kernel


Hi Gabriele,

Gentle ping on this series. Please let me know if there are any
concerns or if further changes are needed.

Thanks for your time,
Wen


On 6/8/26 00:13, wen.yang@linux.dev wrote:
> From: Wen Yang <wen.yang@linux.dev>
> 
> This series introduces tlob (task latency over budget), a per-task
> hybrid automaton RV monitor that measures elapsed wall-clock time across
> a user-delimited code section and fires when the time exceeds a
> configurable budget.
> 
> The series applies cleanly on top of:
>    [1] git://git.kernel.org/pub/scm/linux/kernel/git/gmonaco/linux.git rv-fixes-7.1
>        "rv fixes for v7.1"
> 
> Background
> ----------
> The existing wwnr monitor uses a two-state DA to detect tasks that are
> woken but never run.  tlob extends the RV framework to a three-state
> hybrid automaton:
> 
>    running  (initial) -- on CPU
>    waiting             -- in the scheduler runqueue, not yet on CPU
>    sleeping            -- blocked on a lock, I/O, or similar resource
> 
> A single HA clock invariant, clk_elapsed < BUDGET_NS(), is active in
> all states.  The framework enforces it via a per-task hrtimer.  On
> expiry, error_env_tlob is emitted, followed by detail_env_tlob which
> carries a per-state time breakdown (running_ns, waiting_ns, sleeping_ns)
> that pinpoints whether the overrun occurred in the running, waiting, or
> sleeping state.
> 
> Userspace interface
> -------------------
> Tasks are registered for monitoring by writing to the tracefs monitor
> file:
> 
>    # echo "p /path/to/binary:START_OFFSET STOP_OFFSET threshold=NS" \
>          > /sys/kernel/tracing/rv/monitors/tlob/monitor
> 
> Two uprobes are registered at START_OFFSET (entry) and STOP_OFFSET
> (exit) of the delimited section.  When a task executes the entry uprobe,
> the monitor starts; when the task reaches the exit uprobe or the budget
> expires, monitoring stops and the slot is returned to the pool.
> 
> Multiple uprobe pairs can be registered for the same binary or different
> binaries.  Each task can have at most one active monitoring session; if
> a task hits a start uprobe while already monitored, the prior session is
> cancelled and a new one begins.
> 
> Series structure
> ----------------
> Patch 1: rv/da: introduce DA_MON_ALLOCATION_STRATEGY
>    Consolidates per-object DA storage allocation under a compile-time
>    selector with three strategies:
>      DA_ALLOC_AUTO   (default) - lock-free kmalloc_nolock; unbounded
>      DA_ALLOC_POOL             - pre-allocated fixed-size pool
>      DA_ALLOC_MANUAL           - caller pre-inserts storage
> 
>    da_handle_start_event() and da_handle_start_run_event() call
>    da_prepare_storage() which resolves at compile time to the correct
>    allocation function.
> 
>    This patch also includes critical correctness fixes for the pool
>    implementation:
>    - Add tracepoint_synchronize_unregister() in da_monitor_destroy_pool()
>      to fix UAF where in-flight handlers access freed pool storage
>    - Fix duplicate hash entry race in da_create_or_get_pool() via
>      concurrent-insert detection under RCU
>    - Add capacity field to fix build error (DA_MON_POOL_SIZE undeclared
>      in da_pool_return_cb)
> 
> Patch 2: rv: add generic uprobe infrastructure for RV monitors
>    Introduces rv_uprobe, a thin wrapper around uprobe_consumer for RV
>    monitors.  Provides rv_uprobe_register(), rv_uprobe_unregister(),
>    and rv_uprobe_sync() for safe teardown.
> 
> Patch 3: rv/tlob: add tlob model DOT file
>    The formal model used to generate tlob.h.
> 
> Patch 4: rv/ha: fix ha_invariant_passed_ns silent bypass of invariant check
>    Fixes a bug where ha_invariant_passed_ns() returned 0 early when
>    env_store was invalid (U64_MAX), leaving it at U64_MAX and causing
>    ha_check_invariant_ns() to always pass.  The fix calls ha_reset_clk_ns()
>    then ha_set_invariant_ns() on first use.
> 
> Patch 5: rv/ha: make da_monitor_reset_hook and EVENT_NONE_LBL overridable
>    Allows tlob to override EVENT_NONE_LBL for its start_tlob self-loop.
> 
> Patch 6: rv/tlob: add tlob hybrid automaton monitor
>    The main tlob implementation, including:
>    - Three-state HA (running/waiting/sleeping)
>    - Per-task hrtimer enforcement (HRTIMER_MODE_REL_HARD)
>    - DA_ALLOC_POOL for allocation-free hot path
>    - Uprobe registration via tracefs monitor file
>    - Per-state time accumulation (running_ns, waiting_ns, sleeping_ns)
>    - HIGH_RES_TIMERS dependency in Kconfig
> 
> Patches 7-9: Tests
>    - KUnit tests for tlob monitor
>    - Selftest infrastructure fixes
>    - tlob selftests (uprobe binding, state tracking, violation detection)
> 
> Changes since v2
> ----------------
> All feedback from Gabriele Monaco has been addressed:
> 
> -- Patch 02 (per-task slot ordering / ha_monitor_reset_env):
>     Dropped from v3; rebased on top of Gabriele's series [1].
> 
> -- Patch 03 (verificationtest-ktap):
>     Changed to use realpath for robustness as suggested.
> 
> -- Patch 04 (pre-allocated storage pool):
>     Complete redesign as DA_MON_ALLOCATION_STRATEGY:
>     - Three strategies (AUTO/POOL/MANUAL) via compile-time macro
>     - da_monitor_init_prealloc() removed; da_monitor_init() selects
>       internally
>     - da_create_or_get_kmalloc() removed (no viable use case)
>     - nomiss updated to use DA_ALLOC_MANUAL
>     - da_extra_cleanup() hook added for per-entry teardown
> 
>     Critical bug fixes included in this patch:
>     - tracepoint_synchronize_unregister() added to da_monitor_destroy_pool()
>       to prevent UAF from in-flight handlers accessing freed pool storage
>     - Duplicate hash entry race fixed in da_create_or_get_pool() via
>       concurrent-insert detection and slot return under RCU
>     - capacity field added to fix DA_MON_POOL_SIZE undeclared build error
> 
> -- Patch 05 (generic uprobe infrastructure):
>     Carried unchanged into v3.
> 
> -- Patch 06 (rvgen __init arrow reset):
>     Carried unchanged into v3.
> 
> -- Patch 08 (tlob monitor):
>     Split and refactored:
>     - ioctl interface deferred to follow-up series (tracefs-only in v3)
>     - Handler simplification: three inline helpers (tlob_acc_running/
>       waiting/sleeping) with scoped_guard(rcu)
>     - do_prev/do_next flags removed (da_handle_event skips unmonitored)
>     - scoped_guard(rcu) and guard(mutex) applied throughout
>     - tlob_stop_all() removed; da_extra_cleanup() hook used instead
>     - start_tlob self-loop added to DOT model as suggested
>     - ha_setup_invariants() guards against redundant timer restart
>     - HIGH_RES_TIMERS dependency added to Kconfig
> 
> Additional improvements in v3
> ------------------------------
> Beyond the v2 feedback, this version includes:
> 
> 1. Simplified tlob monitor implementation:
>     - Removed redundant tlob_num_monitored atomic counter
>       (da_handle_event already handles unmonitored tasks via hash lookup)
>     - Eliminated extra cacheline touch on every sched_switch/sched_wakeup
>     - Several repeated pattern simplifications.
> 
> 2. Extracted common accumulation logic:
>     - __tlob_acc() using offsetof() replaces three nearly-identical functions
>     - Reduces code duplication while maintaining type safety
> 
> 3. Complete test coverage:
>     - KUnit tests for core functionality
>     - Comprehensive selftests for uprobe integration, state tracking,
>       and violation detection
> 
> Testing
> -------
> All patches have been tested on:
> - x86_64 with CONFIG_PREEMPT_RT
> - All KUnit tests pass
> - All selftests pass with verificationtest-ktap
> 
>    
> [1] git://git.kernel.org/pub/scm/linux/kernel/git/gmonaco/linux.git rv-fixes-7.1
>      "rv fixes for v7.1"
> 
> 
> Wen Yang (9):
>    rv/da: introduce DA_MON_ALLOCATION_STRATEGY
>    rv: add generic uprobe infrastructure for RV monitors
>    rv/tlob: add tlob model DOT file
>    rv/ha: fix ha_invariant_passed_ns silent bypass of invariant check
>    rv/ha: make da_monitor_reset_hook and EVENT_NONE_LBL overridable
>    rv/tlob: add tlob hybrid automaton monitor
>    rv/tlob: add KUnit tests for the tlob monitor
>    selftests/verification: fix verificationtest-ktap for out-of-tree
>      execution
>    selftests/verification: add tlob selftests
> 
>   Documentation/trace/rv/index.rst              |   1 +
>   Documentation/trace/rv/monitor_tlob.rst       | 177 ++++
>   include/rv/da_monitor.h                       | 276 ++++-
>   include/rv/ha_monitor.h                       |  22 +-
>   include/rv/rv_uprobe.h                        | 119 +++
>   kernel/trace/rv/Kconfig                       |   5 +
>   kernel/trace/rv/Makefile                      |   3 +
>   kernel/trace/rv/monitors/nomiss/nomiss.c      |   6 +-
>   kernel/trace/rv/monitors/tlob/.kunitconfig    |   6 +
>   kernel/trace/rv/monitors/tlob/Kconfig         |  19 +
>   kernel/trace/rv/monitors/tlob/tlob.c          | 968 ++++++++++++++++++
>   kernel/trace/rv/monitors/tlob/tlob.h          | 148 +++
>   kernel/trace/rv/monitors/tlob/tlob_kunit.c    |  92 ++
>   kernel/trace/rv/monitors/tlob/tlob_trace.h    |  49 +
>   kernel/trace/rv/rv_trace.h                    |   1 +
>   kernel/trace/rv/rv_uprobe.c                   | 182 ++++
>   .../testing/selftests/verification/.gitignore |   2 +
>   tools/testing/selftests/verification/Makefile |  19 +-
>   .../verification/test.d/tlob/Makefile         |  20 +
>   .../verification/test.d/tlob/test.d/functions |   1 +
>   .../verification/test.d/tlob/tlob_sym.c       | 189 ++++
>   .../verification/test.d/tlob/tlob_target.c    | 138 +++
>   .../verification/test.d/tlob/uprobe_bind.tc   |  37 +
>   .../test.d/tlob/uprobe_detail_running.tc      |  51 +
>   .../test.d/tlob/uprobe_detail_sleeping.tc     |  50 +
>   .../test.d/tlob/uprobe_detail_waiting.tc      |  66 ++
>   .../verification/test.d/tlob/uprobe_multi.tc  |  64 ++
>   .../test.d/tlob/uprobe_no_event.tc            |  19 +
>   .../test.d/tlob/uprobe_violation.tc           |  67 ++
>   .../verification/verificationtest-ktap        |   4 +-
>   tools/verification/models/tlob.dot            |  22 +
>   31 files changed, 2789 insertions(+), 34 deletions(-)
>   create mode 100644 Documentation/trace/rv/monitor_tlob.rst
>   create mode 100644 include/rv/rv_uprobe.h
>   create mode 100644 kernel/trace/rv/monitors/tlob/.kunitconfig
>   create mode 100644 kernel/trace/rv/monitors/tlob/Kconfig
>   create mode 100644 kernel/trace/rv/monitors/tlob/tlob.c
>   create mode 100644 kernel/trace/rv/monitors/tlob/tlob.h
>   create mode 100644 kernel/trace/rv/monitors/tlob/tlob_kunit.c
>   create mode 100644 kernel/trace/rv/monitors/tlob/tlob_trace.h
>   create mode 100644 kernel/trace/rv/rv_uprobe.c
>   create mode 100644 tools/testing/selftests/verification/test.d/tlob/Makefile
>   create mode 100644 tools/testing/selftests/verification/test.d/tlob/test.d/functions
>   create mode 100644 tools/testing/selftests/verification/test.d/tlob/tlob_sym.c
>   create mode 100644 tools/testing/selftests/verification/test.d/tlob/tlob_target.c
>   create mode 100644 tools/testing/selftests/verification/test.d/tlob/uprobe_bind.tc
>   create mode 100644 tools/testing/selftests/verification/test.d/tlob/uprobe_detail_running.tc
>   create mode 100644 tools/testing/selftests/verification/test.d/tlob/uprobe_detail_sleeping.tc
>   create mode 100644 tools/testing/selftests/verification/test.d/tlob/uprobe_detail_waiting.tc
>   create mode 100644 tools/testing/selftests/verification/test.d/tlob/uprobe_multi.tc
>   create mode 100644 tools/testing/selftests/verification/test.d/tlob/uprobe_no_event.tc
>   create mode 100644 tools/testing/selftests/verification/test.d/tlob/uprobe_violation.tc
>   create mode 100644 tools/verification/models/tlob.dot
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

[parent not found: <629023dbcc4389fcc6ec46d88c98eb19aa0abc36.1780847473.git.wen.yang@linux.dev>]

* Re: [PATCH v3 6/9] rv/tlob: add tlob hybrid automaton monitor
       [not found] ` <629023dbcc4389fcc6ec46d88c98eb19aa0abc36.1780847473.git.wen.yang@linux.dev>
@ 2026-06-15 15:24   ` Gabriele Monaco
  0 siblings, 0 replies; 13+ messages in thread
From: Gabriele Monaco @ 2026-06-15 15:24 UTC (permalink / raw)
  To: wen.yang; +Cc: Steven Rostedt, linux-trace-kernel, linux-kernel

On Mon, 2026-06-08 at 00:13 +0800, wen.yang@linux.dev wrote:
> From: Wen Yang <wen.yang@linux.dev>
> 
> Add tlob (task latency over budget), a per-task hybrid automaton RV
> monitor that tracks elapsed wall-clock time across a user-delimited
> code section and emits error_env_tlob when the elapsed time exceeds a
> configurable budget.
> 
> The monitor uses RV_MON_PER_OBJ with three states (running, waiting,
> sleeping) driven by sched_switch and sched_wakeup tracepoints, and a
> single clock invariant clk_elapsed < budget enforced by an hrtimer
> (HRTIMER_MODE_REL_HARD).  On violation, detail_env_tlob provides a
> per-state time breakdown (running_ns, waiting_ns, sleeping_ns).
> 
> Per-task state is managed via DA_ALLOC_POOL to avoid allocation on
> the
> scheduler tracepoint path.  Uprobe pairs are registered through the
> tracefs monitor file as "p PATH:OFFSET_START OFFSET_STOP
> threshold=NS".
> 
> Also adds ha_cancel_timer_sync() to ha_monitor.h, a blocking cancel
> variant needed by tlob's stop_task path to ensure the hrtimer
> callback
> has completed before the per-task monitor state is freed.
> 
> Suggested-by: Gabriele Monaco <gmonaco@redhat.com>

I'm not sure this applies to be fair, I'm reviewing the patch and as
part of that I'm suggesting things, but the main idea is yours [1].

Unless you mess up, I'm going to appear as reviewer anyway ;)

[1] - https://www.kernel.org/doc/html/latest/process/submitting-patches.html#using-reported-by-tested-by-reviewed-by-suggested-by-and-fixes

> Signed-off-by: Wen Yang <wen.yang@linux.dev>
> ---

...

> diff --git a/kernel/trace/rv/Kconfig b/kernel/trace/rv/Kconfig
> index e2e0033a00b9..ed2de31d0312 100644
> --- a/kernel/trace/rv/Kconfig
> +++ b/kernel/trace/rv/Kconfig
> @@ -85,6 +85,7 @@ source "kernel/trace/rv/monitors/sleep/Kconfig"
>  source "kernel/trace/rv/monitors/stall/Kconfig"
>  source "kernel/trace/rv/monitors/deadline/Kconfig"
>  source "kernel/trace/rv/monitors/nomiss/Kconfig"
> +source "kernel/trace/rv/monitors/tlob/Kconfig"
>  # Add new deadline monitors here

Your line should be after the marker for deadline monitors, this is
required for the Kconfig to look good with nested monitors.

Just let rvgen handle the way it looks (V2 was fine, just above the
general monitor marker):

 # Add new deadline monitors here

+source "kernel/trace/rv/monitors/tlob/Kconfig"
 # Add new monitors here

>  
>  # Add new monitors here

> diff --git a/kernel/trace/rv/monitors/tlob/tlob.c
> b/kernel/trace/rv/monitors/tlob/tlob.c
> new file mode 100644
> index 000000000000..d8e0c4794720
> --- /dev/null
> +++ b/kernel/trace/rv/monitors/tlob/tlob.c
> @@ -0,0 +1,968 @@
> +#include <linux/hrtimer.h>
> +#include <linux/kernel.h>
> +#include <linux/ktime.h>
> +#include <linux/module.h>
> +#include <linux/init.h>
> +#include <linux/namei.h>
> +#include <linux/rv.h>
> +#include <linux/sched.h>
> +#include <linux/slab.h>
> +#include <linux/tracefs.h>
> +#include <kunit/visibility.h>
> +#include <rv/instrumentation.h>
> +#include <rv/rv_uprobe.h>

You added a few includes I don't believe you need: hrtimer, ktime, sched
(included in events/sched) and tracefs (included in rv).

You can keep the others as they come from the template, all monitors
include them anyway.

> +#include "../../rv.h"

This is <rv.h> the includepath allows that just like rv_trace.

> +
> +#define MODULE_NAME "tlob"
> +
> +#include <trace/events/sched.h>
> +#include <rv_trace.h>

...

> +/* Type for da_monitor_storage.target; must be defined before the
> includes. */
> +typedef struct tlob_task_state *monitor_target;
> +
> +/* Forward-declared so da_monitor_reset_hook works before
> ha_monitor.h. */
> +static inline void tlob_reset_notify(struct da_monitor *da_mon);
> +#define da_monitor_reset_hook tlob_reset_notify
> +
> +/* Override EVENT_NONE_LBL so the timer-fired violation shows
> "budget_exceeded". */
> +#define EVENT_NONE_LBL "budget_exceeded"
> +
> +#include "tlob.h"
> +
> +/*
> + * DA_MON_POOL_SIZE must be defined HERE: after tlob.h (which
> defines
> + * TLOB_MAX_MONITORED) and before #include <rv/ha_monitor.h> (which
> + * transitively includes da_monitor.h and expands
> __da_monitor_init_pool
> + * using this macro).  Placing the define before tlob.h or after
> + * ha_monitor.h both cause a build error.
> + */

I don't think you need this comment here. I would add a comment in case
moving it somewhere else would silently change the behaviour, but here
the compiler is going to tell you what isn't defined if you really want
to move it around.

Same for the one line comments above but those are less invasive.

> +#define DA_MON_POOL_SIZE TLOB_MAX_MONITORED
> +
> +/*
> + * Forward-declare tlob_extra_cleanup so the #define below is valid
> when
> + * da_monitor.h (included via ha_monitor.h) expands da_extra_cleanup
> inside
> + * da_monitor_destroy().  The full definition follows after
> ha_monitor.h.
> + */

I'd argue the same here. Though defining da_extra_cleanup() before is
the actual catch (if you do after, the compiler won't tell you it's
using a no-op). If you really want to leave a line for that, feel free
to but I wouldn't be that verbose.
You can combine it to da_monitor_reset_hook() where the same rules
apply.

> +static inline void tlob_extra_cleanup(struct da_monitor *da_mon);
> +#define da_extra_cleanup tlob_extra_cleanup
> +
> +#include <rv/ha_monitor.h>
> +
> +/*
> + * Called from da_monitor_reset() on both normal stop and hrtimer
> expiry.
> + * On violation (stopping==0), emits detail_env_tlob.
> + */
> +static inline void tlob_reset_notify(struct da_monitor *da_mon)
> +{
> +	struct ha_monitor *ha_mon = to_ha_monitor(da_mon);
> +	struct tlob_task_state *ws;
> +
> +	ha_monitor_reset_env(da_mon);

Could have the rest run only if trace_detail_env_tlob_enabled() and
later use trace_call__detail_env_tlob() (the raw tracepoint without
enabled check).

So if the tracepoint is disabled, this code almost doesn't exist.

> +
> +	ws = ha_get_target(ha_mon);
> +	if (!ws)
> +		return;
> +
> +	/*
> +	 * Emit per-state breakdown on budget violation only.
> +	 * stopping==0: timer callback owns this path (genuine
> overrun).
> +	 * stopping==1: normal stop claimed ownership first; skip.
> +	 */
> +	if (!atomic_read(&ws->stopping)) {
> +		unsigned int curr_state = READ_ONCE(da_mon-
> >curr_state);
> +		u64 running_ns, waiting_ns, sleeping_ns, partial_ns;
> +		unsigned long flags;
> +
> +		/*
> +		 * Snapshot accumulators; partial_ns covers
> curr_state time
> +		 * not yet folded in (transition-out pending).
> +		 */
> +		raw_spin_lock_irqsave(&ws->entry_lock, flags);
> +		partial_ns   = ktime_get_ns() - ktime_to_ns(ws-
> >last_ts);
> +		running_ns   = ws->running_ns  +
> +			       (curr_state == running_tlob  ?
> partial_ns : 0);
> +		waiting_ns   = ws->waiting_ns  +
> +			       (curr_state == waiting_tlob  ?
> partial_ns : 0);
> +		sleeping_ns  = ws->sleeping_ns +
> +			       (curr_state == sleeping_tlob ?
> partial_ns : 0);
> +		raw_spin_unlock_irqrestore(&ws->entry_lock, flags);
> +
> +		trace_detail_env_tlob(da_get_id(da_mon), ws-
> >threshold_ns,
> +				      running_ns, waiting_ns,
> sleeping_ns);
> +	}
> +}
> +
> +#define BUDGET_NS(ha_mon) (ha_get_target(ha_mon)->threshold_ns)
> +
> +/* HA constraint functions (called by ha_monitor_handle_constraint)
> */
> +
> +static u64 ha_get_env(struct ha_monitor *ha_mon, enum envs_tlob env,
> u64 time_ns)
> +{
> +	if (env == clk_elapsed_tlob)
> +		return ha_get_clk_ns(ha_mon, env, time_ns);
> +	return ENV_INVALID_VALUE;
> +}

So here you removed ha_reset_env() because that's done implicitly in the
start. It seems to work, but you're going to need to define it if we
apply an automatic reset at start (what I mentioned in the other review).

It's kinda idiomatic to have resets whenever you have clocks, yours is a
special case because you don't really expect start (the only event with a
reset) to happen again.

It's ok to redefine the functions the way you did, though you risk
leaving them out of sync in case you update the model (rvgen's output is
going to be quite different). Not sure if that's ever going to happen
anyway.

> +
> +/*
> + * ha_verify_invariants - clk_elapsed < BUDGET_NS must hold in all
> states.
> + *
> + * The invariant is uniform across running/waiting/sleeping; check
> it
> + * unconditionally rather than enumerating each state.
> + */
> +static inline bool ha_verify_invariants(struct ha_monitor *ha_mon,
> +					enum states curr_state, enum
> events event,
> +					enum states next_state, u64
> time_ns)
> +{

The compiler is probably going to produce something not that different
if you keep the ifs, but this is smaller and cleaner. Again, you only
need to track it manually but it looks alright now.

> +	return ha_check_invariant_ns(ha_mon, clk_elapsed_tlob,
> time_ns);
> +}
> +
> +/*
> + * Convert invariant (deadline) to guard (reset anchor) on state
> transitions.
> + *
> + * The conversion is identical for every departing state; skip only
> self-loops.
> + */
> +static inline void ha_convert_inv_guard(struct ha_monitor *ha_mon,
> +					enum states curr_state, enum
> events event,
> +					enum states next_state, u64
> time_ns)
> +{
> +	if (curr_state != next_state)
> +		ha_inv_to_guard(ha_mon, clk_elapsed_tlob,
> BUDGET_NS(ha_mon), time_ns);
> +}
> +
> +/* No per-event guard conditions for tlob; invariants suffice. */
> +static inline bool ha_verify_guards(struct ha_monitor *ha_mon,
> +				    enum states curr_state, enum
> events event,
> +				    enum states next_state, u64
> time_ns)
> +{

Yeah you're not quite using a reset and you probably shouldn't so this
is alright too.

> +	return true;
> +}
> +
> +/*
> + * Arm or cancel the HA budget timer on state transitions.
> + *
> + * The timer must run in every monitored state
> (running/waiting/sleeping),
> + * so arm it whenever next_state is any of the three.  On a self-
> loop caused
> + * by a non-start event the timer is already running; skip the
> redundant
> + * restart.  On a true state change the old timer is implicitly
> superseded by
> + * the new ha_start_timer_ns() call.
> + *
> + * Guard on stopping: sched_switch events can arrive after
> ha_cancel_timer_sync,
> + * restarting the timer and triggering an ODEBUG "activate active"
> splat.
> + * The _acquire pairs with the cmpxchg_release in tlob_stop_task.
> + */
> +static inline void ha_setup_invariants(struct ha_monitor *ha_mon,
> +				       enum states curr_state, enum
> events event,
> +				       enum states next_state, u64
> time_ns)
> +{
> +	if (next_state == curr_state && event != start_tlob)
> +		return;
> +

stopping is only ever set in the cleanup phase, right? You don't really
need to cancel timers because you should have done it already, what
about the slightly more readable:

	if (atomic_read_acquire(&ha_get_target(ha_mon)->stopping))
		return;
	if (next_state < state_max_tlob)
		ha_start_timer_ns(ha_mon, clk_elapsed_tlob, BUDGET_NS(ha_mon), time_ns);
	else
		ha_cancel_timer(ha_mon);

> +	if (next_state < state_max_tlob) {
> +		if (!atomic_read_acquire(&ha_get_target(ha_mon)-
> >stopping))
> +			ha_start_timer_ns(ha_mon, clk_elapsed_tlob,
> BUDGET_NS(ha_mon), time_ns);
> +	} else {
> +		ha_cancel_timer(ha_mon);
> +	}
> +}

...

> +/*
> + * da_extra_cleanup - per-task teardown called by
> da_monitor_destroy().
> + *
> + * Claims cleanup ownership via CAS; cancels the budget timer;
> decrements the
> + * monitored-task counter; and schedules the slab free via
> call_rcu().
> + * Must run before da_monitor_reset() (i.e. before hash_del_rcu())
> so that
> + * ha_cancel_timer_sync() can safely access the still-registered
> ha_monitor.
> + */
> +static inline void tlob_extra_cleanup(struct da_monitor *da_mon)
> +{
> +	struct ha_monitor *ha_mon = to_ha_monitor(da_mon);
> +	struct tlob_task_state *ws = ha_get_target(ha_mon);
> +
> +	if (!ws)
> +		return;
> +
> +	if (atomic_cmpxchg_release(&ws->stopping, 0, 1) != 0)
> +		return;
> +
> +	ha_cancel_timer_sync(ha_mon);

You shouldn't need to do this since that's standard HA functionality and
is stopped already by the reset + sync RCU sequence.

> +	put_task_struct(ws->task);
> +	call_rcu(&ws->rcu, tlob_free_rcu);

And since you can sleep and already waited for a grace period, you can
probably just free now.
This is the same point I had in da_monitor_destroy_pool() vs
da_monitor_destroy_kmalloc(): let's align what we do with sync/RCU.

> +}
> +
> +/*
> + * __tlob_acc - accumulate elapsed ns into one per-state counter.
> + *
> + * Looks up the task's tlob_task_state under RCU, adds the interval
> + * [ws->last_ts, now] to the field at @offset within the state
> struct,
> + * and updates last_ts.  Returns true if the task is monitored.
> + *
> + * entry_lock is a raw spinlock so this is safe from hardirq
> context.
> + */
> +static inline bool __tlob_acc(struct task_struct *task, ktime_t now,
> +			       size_t offset)
> +{
> +	struct tlob_task_state *ws;
> +	unsigned long flags;
> +
> +	scoped_guard(rcu) {
> +		ws = da_get_target_by_id(task->pid);
> +		if (!ws)
> +			return false;
> +		raw_spin_lock_irqsave(&ws->entry_lock, flags);
> +		*(u64 *)((char *)ws + offset) +=
> ktime_to_ns(ktime_sub(now, ws->last_ts));

I don't really like this, you have a few options to avoid raw pointer
arithmetics:

1. a macro:

	#define __tlob_acc(task, now, field) do { \
		...
		ws->field += ktime_stuff(now); \
		...
	}
	static inline bool tlob_acc_running(struct task_struct *task, ktime_t now)
	{
		return __tlob_acc(task, now, running_ns);
	}

2. use an array in the struct and pass the index around:

	enum tlob_acc {
		running,
		waiting,
		sleeping,
		max,
	}

	struct tlob_task_state {
		...
		u64			accs_ns[max];
		...
	};

	bool __tlob_acc(struct task_struct *task, ktime_t now, enum tlob_acc acc)
	{
		...
		ws->accs_ns[acc] += ktime_stuff(now);
		...
	}
	static inline bool tlob_acc_running(struct task_struct *task, ktime_t now)
	{
		return __tlob_acc(task, now, running);
	}


Some folks don't like macros, so perhaps options 2 is cleaner.
You may not even need the helper functions since calling the generic
tlob_acc like this is still readable.

If you go down that path, you may also want to simplify the function
using normal (unscoped) guards for both RCU and the lock, since you take
them until the end of the function (won't work with the macro!).

> +		ws->last_ts = now;
> +		raw_spin_unlock_irqrestore(&ws->entry_lock, flags);
> +	}
> +	return true;
> +}
> +
> +/* Accumulate running_ns for prev; returns true if prev is
> monitored. */
> +static inline bool tlob_acc_running(struct task_struct *task,
> ktime_t now)
> +{
> +	return __tlob_acc(task, now, offsetof(struct
> tlob_task_state, running_ns));
> +}
> +
> +/* Accumulate waiting_ns for next; returns true if next is
> monitored. */
> +static inline bool tlob_acc_waiting(struct task_struct *task,
> ktime_t now)
> +{
> +	return __tlob_acc(task, now, offsetof(struct
> tlob_task_state, waiting_ns));
> +}
> +
> +/*
> + * handle_sched_switch - advance the DA on every context switch.
> + *
> + * Generates three DA events:
> + *   prev, prev_state != 0  -> sleep_tlob    (running -> sleeping)
> + *   prev, prev_state == 0  -> preempt_tlob  (running -> waiting)
> + *   next                   -> switch_in_tlob (waiting -> running)
> + *
> + * A single ktime_get() at handler entry is shared by both acc calls
> so that
> + * prev's running_ns and next's waiting_ns share the same context-
> switch
> + * timestamp; neither absorbs handler overhead into its accumulator.
> + *
> + * No waiting->sleeping edge exists: a task can only block
> voluntarily
> + * (call schedule()) while it is executing on CPU, which corresponds
> to
> + * the running DA state.  A task in the waiting state is
> TASK_RUNNING in
> + * kernel terms (on the runqueue) and cannot block itself.
> + *
> + * da_handle_event() is called unconditionally: it skips tasks that
> have no
> + * monitor entry in the hash table.
> + */
> +static void handle_sched_switch(void *data, bool preempt_unused,
> +				struct task_struct *prev,
> +				struct task_struct *next,
> +				unsigned int prev_state)
> +{
> +	ktime_t now = ktime_get();
> +	bool prev_preempted = (prev_state == 0);
> +
> +	/*
> +	 * No guard on tlob_num_monitored here: da_handle_event()
> internally
> +	 * calls da_monitor_handling_event() which checks both
> rv_monitoring_on()
> +	 * and da_monitoring(da_mon).  The hash lookup inside
> da_get_monitor()
> +	 * simply returns NULL for unmonitored tasks, which is
> equally fast as
> +	 * an atomic_read() guard.  By omitting the guard we avoid
> touching the
> +	 * tlob_num_monitored cacheline on every global context-
> switch.
> +	 */
> +	if (tlob_acc_running(prev, now))
> +		da_handle_event(prev->pid, NULL,
> +				prev_preempted ? preempt_tlob :
> sleep_tlob);
> +	if (tlob_acc_waiting(next, now))
> +		da_handle_event(next->pid, NULL, switch_in_tlob);
> +}
> +
> +/* Accumulate sleeping_ns on wakeup; returns true if task is
> monitored. */
> +static inline bool tlob_acc_sleeping(struct task_struct *task,
> ktime_t now)
> +{
> +	return __tlob_acc(task, now, offsetof(struct
> tlob_task_state, sleeping_ns));
> +}
> +
> +/*
> + * handle_sched_wakeup - sleeping -> waiting transition.
> + *
> + * try_to_wake_up() skips TASK_RUNNING tasks, so this never fires
> for a
> + * task already in running or waiting state.
> + */
> +static void handle_sched_wakeup(void *data, struct task_struct *p)
> +{
> +	ktime_t now = ktime_get();
> +
> +	/* Same reasoning as handle_sched_switch: rely on hash-
> lookup fast path. */
> +	if (tlob_acc_sleeping(p, now))
> +		da_handle_event(p->pid, NULL, wakeup_tlob);
> +}
> +
> +/*
> + * handle_sched_process_exit - clean up if a task exits without
> TRACE_STOP.
> + *
> + * Called in do_exit() context; the task still has a valid pid here.
> + * tlob_stop_task() returns -ESRCH if the task is not monitored,
> which is fine.
> + */
> +static void handle_sched_process_exit(void *data, struct task_struct
> *p,
> +				       bool group_dead)
> +{
> +	tlob_stop_task(p);
> +}
> +
> +
> +
> +/**
> + * tlob_start_task - begin monitoring @task with budget
> @threshold_ns ns.
> + * @task:         Task to monitor; may be current or another task.
> + * @threshold_ns: Latency budget in nanoseconds (wall-clock; running
> + waiting + sleeping).
> + *                Must be in [1000, TLOB_MAX_THRESHOLD_NS].
> + *
> + * Returns 0, -ENODEV, -ERANGE, -EALREADY, -ENOMEM, or -ENOSPC.
> + */
> +int tlob_start_task(struct task_struct *task, u64 threshold_ns)
> +{
> +	struct tlob_task_state *ws;
> +
> +	if (!da_monitor_enabled())
> +		return -ENODEV;
> +
> +	if (threshold_ns < 1000 || threshold_ns >

Why not some TLOB_MIN_THRESHOLD_NS (that can of course be 1000)?

> TLOB_MAX_THRESHOLD_NS)
> +		return -ERANGE;
> +
> +	/* Serialise duplicate-check + pool-slot claim for the same
> pid. */
> +	guard(mutex)(&tlob_start_mutex);
> +
> +	if (da_get_target_by_id(task->pid))
> +		return -EALREADY;
> +
> +	ws = kmem_cache_zalloc(tlob_state_cache, GFP_KERNEL);
> +	if (!ws)
> +		return -ENOMEM;
> +
> +	ws->task = task;
> +	get_task_struct(task);
> +	ws->threshold_ns = threshold_ns;
> +	ws->last_ts = ktime_get();
> +	raw_spin_lock_init(&ws->entry_lock);
> +
> +	/*
> +	 * da_handle_start_run_event() claims a pool slot via
> da_prepare_storage(),
> +	 * initialises the monitor, and delivers start_tlob in one
> step: the
> +	 * generated ha_setup_invariants() resets clk_elapsed and
> arms the timer.
> +	 * Returns 0 if the pool is exhausted (-ENOSPC).
> +	 */
> +	if (!da_handle_start_run_event(task->pid, ws, start_tlob)) {
> +		put_task_struct(task);
> +		kmem_cache_free(tlob_state_cache, ws);
> +		return -ENOSPC;
> +	}
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(tlob_start_task);
> +
> +/**
> + * tlob_stop_task - stop monitoring @task.
> + * @task: Task to stop.
> + *
> + * CAS on ws->stopping (0->1) under RCU claims cleanup ownership;
> + * the winner cancels the timer synchronously and frees all
> resources.
> + *
> + * Returns 0, -EOVERFLOW (budget exceeded), -ESRCH (not monitored),
> + * or -EAGAIN (concurrent caller claimed cleanup).
> + */
> +int tlob_stop_task(struct task_struct *task)
> +{
> +	struct da_monitor *da_mon;
> +	struct ha_monitor *ha_mon;
> +	struct tlob_task_state *ws;
> +	bool budget_exceeded;
> +
> +	scoped_guard(rcu) {
> +		ws = da_get_target_by_id(task->pid);
> +		if (!ws)
> +			return -ESRCH;
> +
> +		da_mon = da_get_monitor(task->pid, NULL);
> +		if (unlikely(!da_mon)) {
> +			/* ws in hash but da_mon gone; internal
> inconsistency. */
> +			WARN_ON_ONCE(1);

	if (unlikely(WARN_ON_ONCE(!da_mon)))

> +			return -ESRCH;
> +		}
> +
> +		ha_mon = to_ha_monitor(da_mon);
> +
> +		/*
> +		 * CAS (0->1) claims cleanup ownership under RCU (ws
> guaranteed valid).
> +		 * _release pairs with atomic_read_acquire in
> ha_setup_invariants.
> +		 */
> +		if (atomic_cmpxchg_release(&ws->stopping, 0, 1) !=
> 0)
> +			return -EAGAIN;
> +	}
> +
> +	/* Wait for in-flight timer callback before reading
> da_monitoring. */
> +	ha_cancel_timer_sync(ha_mon);
> +
> +	/* Timer fired first -> budget exceeded; otherwise reset
> normally. */
> +	scoped_guard(rcu) {
> +		budget_exceeded = !da_monitoring(da_mon);
> +		if (!budget_exceeded)
> +			da_monitor_reset(da_mon);
> +	}
> +	da_destroy_storage(task->pid);
> +
> +	put_task_struct(ws->task);
> +	call_rcu(&ws->rcu, tlob_free_rcu);
> +	return budget_exceeded ? -EOVERFLOW : 0;
> +}
> +EXPORT_SYMBOL_GPL(tlob_stop_task);
> +
> +
> +static int tlob_uprobe_entry_handler(struct rv_uprobe *p, struct
> pt_regs *regs,
> +				     __u64 *data)
> +{
> +	struct tlob_uprobe_binding *b = p->priv;
> +
> +	tlob_start_task(current, b->threshold_ns);
> +	return 0;
> +}
> +
> +static int tlob_uprobe_stop_handler(struct rv_uprobe *p, struct
> pt_regs *regs,
> +				    __u64 *data)
> +{
> +	tlob_stop_task(current);
> +	return 0;
> +}
> +
> +/*
> + * Register start + stop entry uprobes for a binding.
> + * Called with tlob_uprobe_mutex held.
> + */
> +static int tlob_add_uprobe(u64 threshold_ns, const char *binpath,
> +			   loff_t offset_start, loff_t offset_stop)
> +{

You can use cleanup helpers here to save a couple of gotos (making the
code less error prone and more readable):

> +	struct tlob_uprobe_binding *b, *tmp_b;

struct tlob_uprobe_binding __free(kfree) *b = NULL, *tmp_b;

> +	char pathbuf[TLOB_MAX_PATH];
> +	struct path path;

struct path path __free(path_put) = {};

> +	char *canon;
> +	int ret;
> +
> +	if (binpath[0] != '/')
> +		return -EINVAL;
> +
> +	b = kzalloc_obj(*b, GFP_KERNEL);
> +	if (!b)
> +		return -ENOMEM;
> +
> +	b->threshold_ns = threshold_ns;
> +	b->offset_start = offset_start;
> +	b->offset_stop  = offset_stop;
> +
> +	ret = kern_path(binpath, LOOKUP_FOLLOW, &path);
> +	if (ret)
> +		goto err_free;
> +
> +	if (!d_is_reg(path.dentry)) {
> +		ret = -EINVAL;
> +		goto err_path;
> +	}
> +
> +	/* Reject duplicate start offset for the same binary. */
> +	list_for_each_entry(tmp_b, &tlob_uprobe_list, list) {
> +		if (tmp_b->offset_start == offset_start &&
> +		    tmp_b->start_probe->path.dentry == path.dentry)
> {
> +			ret = -EEXIST;
> +			goto err_path;
> +		}
> +	}
> +
> +	canon = d_path(&path, pathbuf, sizeof(pathbuf));
> +	if (IS_ERR(canon)) {
> +		ret = PTR_ERR(canon);
> +		goto err_path;
> +	}
> +	strscpy(b->binpath, canon, sizeof(b->binpath));
> +
> +	/* Both probes share b (priv) and path; attach_path refs
> path itself. */
> +	b->start_probe = rv_uprobe_attach_path(&path, offset_start,
> +					      
> tlob_uprobe_entry_handler, NULL, b);
> +	if (IS_ERR(b->start_probe)) {
> +		ret = PTR_ERR(b->start_probe);
> +		b->start_probe = NULL;
> +		goto err_path;
> +	}
> +
> +	b->stop_probe = rv_uprobe_attach_path(&path, offset_stop,
> +					     
> tlob_uprobe_stop_handler, NULL, b);
> +	if (IS_ERR(b->stop_probe)) {
> +		ret = PTR_ERR(b->stop_probe);
> +		b->stop_probe = NULL;
> +		goto err_start;
> +	}
> +
> +	path_put(&path);
> +	list_add_tail(&b->list, &tlob_uprobe_list);

And finally tell the compiler you want to keep b in the happy path.

list_add_tail(&no_free_ptr(b)->list, &tlob_uprobe_list);

Not tested, but you got the gist, check include/linux/cleanup.h and grep
around for users if it doesn't work as expected.

> +	return 0;
> +
> +err_start:
> +	rv_uprobe_detach(b->start_probe);

This will need to stay, unless you want to DEFINE_FREE() for this, may
be fun but not required.

> +err_path:
> +	path_put(&path);
> +err_free:
> +	kfree(b);
> +	return ret;
> +}

...

> +
> +/* Parse "-PATH:OFFSET_START" (ftrace uprobe_events removal
> convention). */

This isn't quite a standard function comment (and you don't really need
to make it so, it wouldn't add much value), but I'd say at least make it
a multi-line comment which is more common in this location:

/*
 * Parse ...
 */

...

> +static void __tlob_destroy_monitor(void)
> +{
> +	rv_this.enabled = 0;
> +	/*
> +	 * Remove uprobes first; rv_uprobe_sync() inside ensures all
> in-flight
> +	 * handlers have finished before we proceed.
> +	 */
> +	tlob_remove_all_uprobes();
> +
> +	/*
> +	 * da_monitor_destroy() iterates any remaining entries via
> da_extra_cleanup
> +	 * (tlob_extra_cleanup), cancels their timers, and frees
> their state.
> +	 * rcu_barrier() inside drains both da_pool_return_cb and
> tlob_free_rcu
> +	 * callbacks before the pool arrays are freed.
> +	 */

I'm not sure we need this many comments, the behaviour may change and we
don't need to describe what the function does before calling it.

If anything say /why/ you call a function before another, but I'd say
it's trivial here so you can just drop both comments.

> +	ha_monitor_destroy();
> +	kmem_cache_destroy(tlob_state_cache);
> +	tlob_state_cache = NULL;
> +}

...

> +static int __init register_tlob(void)
> +{
> +	int ret;
> +
> +	ret = rv_register_monitor(&rv_this, NULL);
> +	if (ret)
> +		return ret;
> +
> +	if (rv_this.root_d) {
> +		if (!tracefs_create_file("monitor", 0644,

We have rv_create_file() , it's exactly the same but let's be consistent
so if we ever decide to overload it, this won't stay behind.

> rv_this.root_d, NULL,
> +					 &tlob_monitor_fops)) {
> +			rv_unregister_monitor(&rv_this);
> +			return -ENOMEM;
> +		}
> +	}
> +
> +	return 0;
> +}

> diff --git a/kernel/trace/rv/monitors/tlob/tlob.h
> b/kernel/trace/rv/monitors/tlob/tlob.h
> new file mode 100644
> index 000000000000..b6724e629c69
> --- /dev/null
> +++ b/kernel/trace/rv/monitors/tlob/tlob.h
> @@ -0,0 +1,148 @@

...

> +
> +enum states_tlob {
> +	running_tlob,
> +	waiting_tlob,
> +	sleeping_tlob,
> +	state_max_tlob,
> +};

This is personal preference, so feel free to ignore it. The header file
is (mostly) automatically generated and rvgen currently gives you a
different order for enums/arrays. Maybe just reorder all those according
to what rvgen does (just run it without the -a option and diff what
comes out).

The main reason is again to make it easy to update the file in case the
model/framework changes.

Thanks,
Gabriele


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2026-06-15 15:25 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-07 16:13 [PATCH v3 0/9] rv/tlob: Add task latency over budget RV monitor wen.yang
2026-06-07 16:13 ` [PATCH v3 1/9] rv/da: introduce DA_MON_ALLOCATION_STRATEGY wen.yang
2026-06-15  9:56   ` Gabriele Monaco
2026-06-07 16:13 ` [PATCH v3 2/9] rv: add generic uprobe infrastructure for RV monitors wen.yang
2026-06-07 16:13 ` [PATCH v3 3/9] rv/tlob: add tlob model DOT file wen.yang
2026-06-07 16:13 ` [PATCH v3 4/9] rv/ha: fix ha_invariant_passed_ns silent bypass of invariant check wen.yang
2026-06-15 10:12   ` Gabriele Monaco
2026-06-07 16:13 ` [PATCH v3 5/9] rv/ha: make da_monitor_reset_hook and EVENT_NONE_LBL overridable wen.yang
2026-06-15 10:16   ` Gabriele Monaco
2026-06-07 16:13 ` [PATCH v3 7/9] rv/tlob: add KUnit tests for the tlob monitor wen.yang
2026-06-07 16:13 ` [PATCH v3 8/9] selftests/verification: fix verificationtest-ktap for out-of-tree execution wen.yang
2026-06-13 16:00 ` [PATCH v3 0/9] rv/tlob: Add task latency over budget RV monitor Wen Yang
     [not found] ` <629023dbcc4389fcc6ec46d88c98eb19aa0abc36.1780847473.git.wen.yang@linux.dev>
2026-06-15 15:24   ` [PATCH v3 6/9] rv/tlob: add tlob hybrid automaton monitor Gabriele Monaco

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox