Linux Trace Kernel
 help / color / mirror / Atom feed
From: Gabriele Monaco <gmonaco@redhat.com>
To: wen.yang@linux.dev
Cc: Steven Rostedt <rostedt@goodmis.org>,
	linux-trace-kernel@vger.kernel.org,
		linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 6/9] rv/tlob: add tlob hybrid automaton monitor
Date: Mon, 15 Jun 2026 17:24:55 +0200	[thread overview]
Message-ID: <04900b5700a2d5d344a60cb346cfffaf4e3d5fa4.camel@redhat.com> (raw)
In-Reply-To: <629023dbcc4389fcc6ec46d88c98eb19aa0abc36.1780847473.git.wen.yang@linux.dev>

On Mon, 2026-06-08 at 00:13 +0800, wen.yang@linux.dev wrote:
> From: Wen Yang <wen.yang@linux.dev>
> 
> Add tlob (task latency over budget), a per-task hybrid automaton RV
> monitor that tracks elapsed wall-clock time across a user-delimited
> code section and emits error_env_tlob when the elapsed time exceeds a
> configurable budget.
> 
> The monitor uses RV_MON_PER_OBJ with three states (running, waiting,
> sleeping) driven by sched_switch and sched_wakeup tracepoints, and a
> single clock invariant clk_elapsed < budget enforced by an hrtimer
> (HRTIMER_MODE_REL_HARD).  On violation, detail_env_tlob provides a
> per-state time breakdown (running_ns, waiting_ns, sleeping_ns).
> 
> Per-task state is managed via DA_ALLOC_POOL to avoid allocation on
> the
> scheduler tracepoint path.  Uprobe pairs are registered through the
> tracefs monitor file as "p PATH:OFFSET_START OFFSET_STOP
> threshold=NS".
> 
> Also adds ha_cancel_timer_sync() to ha_monitor.h, a blocking cancel
> variant needed by tlob's stop_task path to ensure the hrtimer
> callback
> has completed before the per-task monitor state is freed.
> 
> Suggested-by: Gabriele Monaco <gmonaco@redhat.com>

I'm not sure this applies to be fair, I'm reviewing the patch and as
part of that I'm suggesting things, but the main idea is yours [1].

Unless you mess up, I'm going to appear as reviewer anyway ;)

[1] - https://www.kernel.org/doc/html/latest/process/submitting-patches.html#using-reported-by-tested-by-reviewed-by-suggested-by-and-fixes

> Signed-off-by: Wen Yang <wen.yang@linux.dev>
> ---

...

> diff --git a/kernel/trace/rv/Kconfig b/kernel/trace/rv/Kconfig
> index e2e0033a00b9..ed2de31d0312 100644
> --- a/kernel/trace/rv/Kconfig
> +++ b/kernel/trace/rv/Kconfig
> @@ -85,6 +85,7 @@ source "kernel/trace/rv/monitors/sleep/Kconfig"
>  source "kernel/trace/rv/monitors/stall/Kconfig"
>  source "kernel/trace/rv/monitors/deadline/Kconfig"
>  source "kernel/trace/rv/monitors/nomiss/Kconfig"
> +source "kernel/trace/rv/monitors/tlob/Kconfig"
>  # Add new deadline monitors here

Your line should be after the marker for deadline monitors, this is
required for the Kconfig to look good with nested monitors.

Just let rvgen handle the way it looks (V2 was fine, just above the
general monitor marker):

 # Add new deadline monitors here

+source "kernel/trace/rv/monitors/tlob/Kconfig"
 # Add new monitors here

>  
>  # Add new monitors here

> diff --git a/kernel/trace/rv/monitors/tlob/tlob.c
> b/kernel/trace/rv/monitors/tlob/tlob.c
> new file mode 100644
> index 000000000000..d8e0c4794720
> --- /dev/null
> +++ b/kernel/trace/rv/monitors/tlob/tlob.c
> @@ -0,0 +1,968 @@
> +#include <linux/hrtimer.h>
> +#include <linux/kernel.h>
> +#include <linux/ktime.h>
> +#include <linux/module.h>
> +#include <linux/init.h>
> +#include <linux/namei.h>
> +#include <linux/rv.h>
> +#include <linux/sched.h>
> +#include <linux/slab.h>
> +#include <linux/tracefs.h>
> +#include <kunit/visibility.h>
> +#include <rv/instrumentation.h>
> +#include <rv/rv_uprobe.h>

You added a few includes I don't believe you need: hrtimer, ktime, sched
(included in events/sched) and tracefs (included in rv).

You can keep the others as they come from the template, all monitors
include them anyway.

> +#include "../../rv.h"

This is <rv.h> the includepath allows that just like rv_trace.

> +
> +#define MODULE_NAME "tlob"
> +
> +#include <trace/events/sched.h>
> +#include <rv_trace.h>

...

> +/* Type for da_monitor_storage.target; must be defined before the
> includes. */
> +typedef struct tlob_task_state *monitor_target;
> +
> +/* Forward-declared so da_monitor_reset_hook works before
> ha_monitor.h. */
> +static inline void tlob_reset_notify(struct da_monitor *da_mon);
> +#define da_monitor_reset_hook tlob_reset_notify
> +
> +/* Override EVENT_NONE_LBL so the timer-fired violation shows
> "budget_exceeded". */
> +#define EVENT_NONE_LBL "budget_exceeded"
> +
> +#include "tlob.h"
> +
> +/*
> + * DA_MON_POOL_SIZE must be defined HERE: after tlob.h (which
> defines
> + * TLOB_MAX_MONITORED) and before #include <rv/ha_monitor.h> (which
> + * transitively includes da_monitor.h and expands
> __da_monitor_init_pool
> + * using this macro).  Placing the define before tlob.h or after
> + * ha_monitor.h both cause a build error.
> + */

I don't think you need this comment here. I would add a comment in case
moving it somewhere else would silently change the behaviour, but here
the compiler is going to tell you what isn't defined if you really want
to move it around.

Same for the one line comments above but those are less invasive.

> +#define DA_MON_POOL_SIZE TLOB_MAX_MONITORED
> +
> +/*
> + * Forward-declare tlob_extra_cleanup so the #define below is valid
> when
> + * da_monitor.h (included via ha_monitor.h) expands da_extra_cleanup
> inside
> + * da_monitor_destroy().  The full definition follows after
> ha_monitor.h.
> + */

I'd argue the same here. Though defining da_extra_cleanup() before is
the actual catch (if you do after, the compiler won't tell you it's
using a no-op). If you really want to leave a line for that, feel free
to but I wouldn't be that verbose.
You can combine it to da_monitor_reset_hook() where the same rules
apply.

> +static inline void tlob_extra_cleanup(struct da_monitor *da_mon);
> +#define da_extra_cleanup tlob_extra_cleanup
> +
> +#include <rv/ha_monitor.h>
> +
> +/*
> + * Called from da_monitor_reset() on both normal stop and hrtimer
> expiry.
> + * On violation (stopping==0), emits detail_env_tlob.
> + */
> +static inline void tlob_reset_notify(struct da_monitor *da_mon)
> +{
> +	struct ha_monitor *ha_mon = to_ha_monitor(da_mon);
> +	struct tlob_task_state *ws;
> +
> +	ha_monitor_reset_env(da_mon);

Could have the rest run only if trace_detail_env_tlob_enabled() and
later use trace_call__detail_env_tlob() (the raw tracepoint without
enabled check).

So if the tracepoint is disabled, this code almost doesn't exist.

> +
> +	ws = ha_get_target(ha_mon);
> +	if (!ws)
> +		return;
> +
> +	/*
> +	 * Emit per-state breakdown on budget violation only.
> +	 * stopping==0: timer callback owns this path (genuine
> overrun).
> +	 * stopping==1: normal stop claimed ownership first; skip.
> +	 */
> +	if (!atomic_read(&ws->stopping)) {
> +		unsigned int curr_state = READ_ONCE(da_mon-
> >curr_state);
> +		u64 running_ns, waiting_ns, sleeping_ns, partial_ns;
> +		unsigned long flags;
> +
> +		/*
> +		 * Snapshot accumulators; partial_ns covers
> curr_state time
> +		 * not yet folded in (transition-out pending).
> +		 */
> +		raw_spin_lock_irqsave(&ws->entry_lock, flags);
> +		partial_ns   = ktime_get_ns() - ktime_to_ns(ws-
> >last_ts);
> +		running_ns   = ws->running_ns  +
> +			       (curr_state == running_tlob  ?
> partial_ns : 0);
> +		waiting_ns   = ws->waiting_ns  +
> +			       (curr_state == waiting_tlob  ?
> partial_ns : 0);
> +		sleeping_ns  = ws->sleeping_ns +
> +			       (curr_state == sleeping_tlob ?
> partial_ns : 0);
> +		raw_spin_unlock_irqrestore(&ws->entry_lock, flags);
> +
> +		trace_detail_env_tlob(da_get_id(da_mon), ws-
> >threshold_ns,
> +				      running_ns, waiting_ns,
> sleeping_ns);
> +	}
> +}
> +
> +#define BUDGET_NS(ha_mon) (ha_get_target(ha_mon)->threshold_ns)
> +
> +/* HA constraint functions (called by ha_monitor_handle_constraint)
> */
> +
> +static u64 ha_get_env(struct ha_monitor *ha_mon, enum envs_tlob env,
> u64 time_ns)
> +{
> +	if (env == clk_elapsed_tlob)
> +		return ha_get_clk_ns(ha_mon, env, time_ns);
> +	return ENV_INVALID_VALUE;
> +}

So here you removed ha_reset_env() because that's done implicitly in the
start. It seems to work, but you're going to need to define it if we
apply an automatic reset at start (what I mentioned in the other review).

It's kinda idiomatic to have resets whenever you have clocks, yours is a
special case because you don't really expect start (the only event with a
reset) to happen again.

It's ok to redefine the functions the way you did, though you risk
leaving them out of sync in case you update the model (rvgen's output is
going to be quite different). Not sure if that's ever going to happen
anyway.

> +
> +/*
> + * ha_verify_invariants - clk_elapsed < BUDGET_NS must hold in all
> states.
> + *
> + * The invariant is uniform across running/waiting/sleeping; check
> it
> + * unconditionally rather than enumerating each state.
> + */
> +static inline bool ha_verify_invariants(struct ha_monitor *ha_mon,
> +					enum states curr_state, enum
> events event,
> +					enum states next_state, u64
> time_ns)
> +{

The compiler is probably going to produce something not that different
if you keep the ifs, but this is smaller and cleaner. Again, you only
need to track it manually but it looks alright now.

> +	return ha_check_invariant_ns(ha_mon, clk_elapsed_tlob,
> time_ns);
> +}
> +
> +/*
> + * Convert invariant (deadline) to guard (reset anchor) on state
> transitions.
> + *
> + * The conversion is identical for every departing state; skip only
> self-loops.
> + */
> +static inline void ha_convert_inv_guard(struct ha_monitor *ha_mon,
> +					enum states curr_state, enum
> events event,
> +					enum states next_state, u64
> time_ns)
> +{
> +	if (curr_state != next_state)
> +		ha_inv_to_guard(ha_mon, clk_elapsed_tlob,
> BUDGET_NS(ha_mon), time_ns);
> +}
> +
> +/* No per-event guard conditions for tlob; invariants suffice. */
> +static inline bool ha_verify_guards(struct ha_monitor *ha_mon,
> +				    enum states curr_state, enum
> events event,
> +				    enum states next_state, u64
> time_ns)
> +{

Yeah you're not quite using a reset and you probably shouldn't so this
is alright too.

> +	return true;
> +}
> +
> +/*
> + * Arm or cancel the HA budget timer on state transitions.
> + *
> + * The timer must run in every monitored state
> (running/waiting/sleeping),
> + * so arm it whenever next_state is any of the three.  On a self-
> loop caused
> + * by a non-start event the timer is already running; skip the
> redundant
> + * restart.  On a true state change the old timer is implicitly
> superseded by
> + * the new ha_start_timer_ns() call.
> + *
> + * Guard on stopping: sched_switch events can arrive after
> ha_cancel_timer_sync,
> + * restarting the timer and triggering an ODEBUG "activate active"
> splat.
> + * The _acquire pairs with the cmpxchg_release in tlob_stop_task.
> + */
> +static inline void ha_setup_invariants(struct ha_monitor *ha_mon,
> +				       enum states curr_state, enum
> events event,
> +				       enum states next_state, u64
> time_ns)
> +{
> +	if (next_state == curr_state && event != start_tlob)
> +		return;
> +

stopping is only ever set in the cleanup phase, right? You don't really
need to cancel timers because you should have done it already, what
about the slightly more readable:

	if (atomic_read_acquire(&ha_get_target(ha_mon)->stopping))
		return;
	if (next_state < state_max_tlob)
		ha_start_timer_ns(ha_mon, clk_elapsed_tlob, BUDGET_NS(ha_mon), time_ns);
	else
		ha_cancel_timer(ha_mon);

> +	if (next_state < state_max_tlob) {
> +		if (!atomic_read_acquire(&ha_get_target(ha_mon)-
> >stopping))
> +			ha_start_timer_ns(ha_mon, clk_elapsed_tlob,
> BUDGET_NS(ha_mon), time_ns);
> +	} else {
> +		ha_cancel_timer(ha_mon);
> +	}
> +}

...

> +/*
> + * da_extra_cleanup - per-task teardown called by
> da_monitor_destroy().
> + *
> + * Claims cleanup ownership via CAS; cancels the budget timer;
> decrements the
> + * monitored-task counter; and schedules the slab free via
> call_rcu().
> + * Must run before da_monitor_reset() (i.e. before hash_del_rcu())
> so that
> + * ha_cancel_timer_sync() can safely access the still-registered
> ha_monitor.
> + */
> +static inline void tlob_extra_cleanup(struct da_monitor *da_mon)
> +{
> +	struct ha_monitor *ha_mon = to_ha_monitor(da_mon);
> +	struct tlob_task_state *ws = ha_get_target(ha_mon);
> +
> +	if (!ws)
> +		return;
> +
> +	if (atomic_cmpxchg_release(&ws->stopping, 0, 1) != 0)
> +		return;
> +
> +	ha_cancel_timer_sync(ha_mon);

You shouldn't need to do this since that's standard HA functionality and
is stopped already by the reset + sync RCU sequence.

> +	put_task_struct(ws->task);
> +	call_rcu(&ws->rcu, tlob_free_rcu);

And since you can sleep and already waited for a grace period, you can
probably just free now.
This is the same point I had in da_monitor_destroy_pool() vs
da_monitor_destroy_kmalloc(): let's align what we do with sync/RCU.

> +}
> +
> +/*
> + * __tlob_acc - accumulate elapsed ns into one per-state counter.
> + *
> + * Looks up the task's tlob_task_state under RCU, adds the interval
> + * [ws->last_ts, now] to the field at @offset within the state
> struct,
> + * and updates last_ts.  Returns true if the task is monitored.
> + *
> + * entry_lock is a raw spinlock so this is safe from hardirq
> context.
> + */
> +static inline bool __tlob_acc(struct task_struct *task, ktime_t now,
> +			       size_t offset)
> +{
> +	struct tlob_task_state *ws;
> +	unsigned long flags;
> +
> +	scoped_guard(rcu) {
> +		ws = da_get_target_by_id(task->pid);
> +		if (!ws)
> +			return false;
> +		raw_spin_lock_irqsave(&ws->entry_lock, flags);
> +		*(u64 *)((char *)ws + offset) +=
> ktime_to_ns(ktime_sub(now, ws->last_ts));

I don't really like this, you have a few options to avoid raw pointer
arithmetics:

1. a macro:

	#define __tlob_acc(task, now, field) do { \
		...
		ws->field += ktime_stuff(now); \
		...
	}
	static inline bool tlob_acc_running(struct task_struct *task, ktime_t now)
	{
		return __tlob_acc(task, now, running_ns);
	}

2. use an array in the struct and pass the index around:

	enum tlob_acc {
		running,
		waiting,
		sleeping,
		max,
	}

	struct tlob_task_state {
		...
		u64			accs_ns[max];
		...
	};

	bool __tlob_acc(struct task_struct *task, ktime_t now, enum tlob_acc acc)
	{
		...
		ws->accs_ns[acc] += ktime_stuff(now);
		...
	}
	static inline bool tlob_acc_running(struct task_struct *task, ktime_t now)
	{
		return __tlob_acc(task, now, running);
	}


Some folks don't like macros, so perhaps options 2 is cleaner.
You may not even need the helper functions since calling the generic
tlob_acc like this is still readable.

If you go down that path, you may also want to simplify the function
using normal (unscoped) guards for both RCU and the lock, since you take
them until the end of the function (won't work with the macro!).

> +		ws->last_ts = now;
> +		raw_spin_unlock_irqrestore(&ws->entry_lock, flags);
> +	}
> +	return true;
> +}
> +
> +/* Accumulate running_ns for prev; returns true if prev is
> monitored. */
> +static inline bool tlob_acc_running(struct task_struct *task,
> ktime_t now)
> +{
> +	return __tlob_acc(task, now, offsetof(struct
> tlob_task_state, running_ns));
> +}
> +
> +/* Accumulate waiting_ns for next; returns true if next is
> monitored. */
> +static inline bool tlob_acc_waiting(struct task_struct *task,
> ktime_t now)
> +{
> +	return __tlob_acc(task, now, offsetof(struct
> tlob_task_state, waiting_ns));
> +}
> +
> +/*
> + * handle_sched_switch - advance the DA on every context switch.
> + *
> + * Generates three DA events:
> + *   prev, prev_state != 0  -> sleep_tlob    (running -> sleeping)
> + *   prev, prev_state == 0  -> preempt_tlob  (running -> waiting)
> + *   next                   -> switch_in_tlob (waiting -> running)
> + *
> + * A single ktime_get() at handler entry is shared by both acc calls
> so that
> + * prev's running_ns and next's waiting_ns share the same context-
> switch
> + * timestamp; neither absorbs handler overhead into its accumulator.
> + *
> + * No waiting->sleeping edge exists: a task can only block
> voluntarily
> + * (call schedule()) while it is executing on CPU, which corresponds
> to
> + * the running DA state.  A task in the waiting state is
> TASK_RUNNING in
> + * kernel terms (on the runqueue) and cannot block itself.
> + *
> + * da_handle_event() is called unconditionally: it skips tasks that
> have no
> + * monitor entry in the hash table.
> + */
> +static void handle_sched_switch(void *data, bool preempt_unused,
> +				struct task_struct *prev,
> +				struct task_struct *next,
> +				unsigned int prev_state)
> +{
> +	ktime_t now = ktime_get();
> +	bool prev_preempted = (prev_state == 0);
> +
> +	/*
> +	 * No guard on tlob_num_monitored here: da_handle_event()
> internally
> +	 * calls da_monitor_handling_event() which checks both
> rv_monitoring_on()
> +	 * and da_monitoring(da_mon).  The hash lookup inside
> da_get_monitor()
> +	 * simply returns NULL for unmonitored tasks, which is
> equally fast as
> +	 * an atomic_read() guard.  By omitting the guard we avoid
> touching the
> +	 * tlob_num_monitored cacheline on every global context-
> switch.
> +	 */
> +	if (tlob_acc_running(prev, now))
> +		da_handle_event(prev->pid, NULL,
> +				prev_preempted ? preempt_tlob :
> sleep_tlob);
> +	if (tlob_acc_waiting(next, now))
> +		da_handle_event(next->pid, NULL, switch_in_tlob);
> +}
> +
> +/* Accumulate sleeping_ns on wakeup; returns true if task is
> monitored. */
> +static inline bool tlob_acc_sleeping(struct task_struct *task,
> ktime_t now)
> +{
> +	return __tlob_acc(task, now, offsetof(struct
> tlob_task_state, sleeping_ns));
> +}
> +
> +/*
> + * handle_sched_wakeup - sleeping -> waiting transition.
> + *
> + * try_to_wake_up() skips TASK_RUNNING tasks, so this never fires
> for a
> + * task already in running or waiting state.
> + */
> +static void handle_sched_wakeup(void *data, struct task_struct *p)
> +{
> +	ktime_t now = ktime_get();
> +
> +	/* Same reasoning as handle_sched_switch: rely on hash-
> lookup fast path. */
> +	if (tlob_acc_sleeping(p, now))
> +		da_handle_event(p->pid, NULL, wakeup_tlob);
> +}
> +
> +/*
> + * handle_sched_process_exit - clean up if a task exits without
> TRACE_STOP.
> + *
> + * Called in do_exit() context; the task still has a valid pid here.
> + * tlob_stop_task() returns -ESRCH if the task is not monitored,
> which is fine.
> + */
> +static void handle_sched_process_exit(void *data, struct task_struct
> *p,
> +				       bool group_dead)
> +{
> +	tlob_stop_task(p);
> +}
> +
> +
> +
> +/**
> + * tlob_start_task - begin monitoring @task with budget
> @threshold_ns ns.
> + * @task:         Task to monitor; may be current or another task.
> + * @threshold_ns: Latency budget in nanoseconds (wall-clock; running
> + waiting + sleeping).
> + *                Must be in [1000, TLOB_MAX_THRESHOLD_NS].
> + *
> + * Returns 0, -ENODEV, -ERANGE, -EALREADY, -ENOMEM, or -ENOSPC.
> + */
> +int tlob_start_task(struct task_struct *task, u64 threshold_ns)
> +{
> +	struct tlob_task_state *ws;
> +
> +	if (!da_monitor_enabled())
> +		return -ENODEV;
> +
> +	if (threshold_ns < 1000 || threshold_ns >

Why not some TLOB_MIN_THRESHOLD_NS (that can of course be 1000)?

> TLOB_MAX_THRESHOLD_NS)
> +		return -ERANGE;
> +
> +	/* Serialise duplicate-check + pool-slot claim for the same
> pid. */
> +	guard(mutex)(&tlob_start_mutex);
> +
> +	if (da_get_target_by_id(task->pid))
> +		return -EALREADY;
> +
> +	ws = kmem_cache_zalloc(tlob_state_cache, GFP_KERNEL);
> +	if (!ws)
> +		return -ENOMEM;
> +
> +	ws->task = task;
> +	get_task_struct(task);
> +	ws->threshold_ns = threshold_ns;
> +	ws->last_ts = ktime_get();
> +	raw_spin_lock_init(&ws->entry_lock);
> +
> +	/*
> +	 * da_handle_start_run_event() claims a pool slot via
> da_prepare_storage(),
> +	 * initialises the monitor, and delivers start_tlob in one
> step: the
> +	 * generated ha_setup_invariants() resets clk_elapsed and
> arms the timer.
> +	 * Returns 0 if the pool is exhausted (-ENOSPC).
> +	 */
> +	if (!da_handle_start_run_event(task->pid, ws, start_tlob)) {
> +		put_task_struct(task);
> +		kmem_cache_free(tlob_state_cache, ws);
> +		return -ENOSPC;
> +	}
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(tlob_start_task);
> +
> +/**
> + * tlob_stop_task - stop monitoring @task.
> + * @task: Task to stop.
> + *
> + * CAS on ws->stopping (0->1) under RCU claims cleanup ownership;
> + * the winner cancels the timer synchronously and frees all
> resources.
> + *
> + * Returns 0, -EOVERFLOW (budget exceeded), -ESRCH (not monitored),
> + * or -EAGAIN (concurrent caller claimed cleanup).
> + */
> +int tlob_stop_task(struct task_struct *task)
> +{
> +	struct da_monitor *da_mon;
> +	struct ha_monitor *ha_mon;
> +	struct tlob_task_state *ws;
> +	bool budget_exceeded;
> +
> +	scoped_guard(rcu) {
> +		ws = da_get_target_by_id(task->pid);
> +		if (!ws)
> +			return -ESRCH;
> +
> +		da_mon = da_get_monitor(task->pid, NULL);
> +		if (unlikely(!da_mon)) {
> +			/* ws in hash but da_mon gone; internal
> inconsistency. */
> +			WARN_ON_ONCE(1);

	if (unlikely(WARN_ON_ONCE(!da_mon)))

> +			return -ESRCH;
> +		}
> +
> +		ha_mon = to_ha_monitor(da_mon);
> +
> +		/*
> +		 * CAS (0->1) claims cleanup ownership under RCU (ws
> guaranteed valid).
> +		 * _release pairs with atomic_read_acquire in
> ha_setup_invariants.
> +		 */
> +		if (atomic_cmpxchg_release(&ws->stopping, 0, 1) !=
> 0)
> +			return -EAGAIN;
> +	}
> +
> +	/* Wait for in-flight timer callback before reading
> da_monitoring. */
> +	ha_cancel_timer_sync(ha_mon);
> +
> +	/* Timer fired first -> budget exceeded; otherwise reset
> normally. */
> +	scoped_guard(rcu) {
> +		budget_exceeded = !da_monitoring(da_mon);
> +		if (!budget_exceeded)
> +			da_monitor_reset(da_mon);
> +	}
> +	da_destroy_storage(task->pid);
> +
> +	put_task_struct(ws->task);
> +	call_rcu(&ws->rcu, tlob_free_rcu);
> +	return budget_exceeded ? -EOVERFLOW : 0;
> +}
> +EXPORT_SYMBOL_GPL(tlob_stop_task);
> +
> +
> +static int tlob_uprobe_entry_handler(struct rv_uprobe *p, struct
> pt_regs *regs,
> +				     __u64 *data)
> +{
> +	struct tlob_uprobe_binding *b = p->priv;
> +
> +	tlob_start_task(current, b->threshold_ns);
> +	return 0;
> +}
> +
> +static int tlob_uprobe_stop_handler(struct rv_uprobe *p, struct
> pt_regs *regs,
> +				    __u64 *data)
> +{
> +	tlob_stop_task(current);
> +	return 0;
> +}
> +
> +/*
> + * Register start + stop entry uprobes for a binding.
> + * Called with tlob_uprobe_mutex held.
> + */
> +static int tlob_add_uprobe(u64 threshold_ns, const char *binpath,
> +			   loff_t offset_start, loff_t offset_stop)
> +{

You can use cleanup helpers here to save a couple of gotos (making the
code less error prone and more readable):

> +	struct tlob_uprobe_binding *b, *tmp_b;

struct tlob_uprobe_binding __free(kfree) *b = NULL, *tmp_b;

> +	char pathbuf[TLOB_MAX_PATH];
> +	struct path path;

struct path path __free(path_put) = {};

> +	char *canon;
> +	int ret;
> +
> +	if (binpath[0] != '/')
> +		return -EINVAL;
> +
> +	b = kzalloc_obj(*b, GFP_KERNEL);
> +	if (!b)
> +		return -ENOMEM;
> +
> +	b->threshold_ns = threshold_ns;
> +	b->offset_start = offset_start;
> +	b->offset_stop  = offset_stop;
> +
> +	ret = kern_path(binpath, LOOKUP_FOLLOW, &path);
> +	if (ret)
> +		goto err_free;
> +
> +	if (!d_is_reg(path.dentry)) {
> +		ret = -EINVAL;
> +		goto err_path;
> +	}
> +
> +	/* Reject duplicate start offset for the same binary. */
> +	list_for_each_entry(tmp_b, &tlob_uprobe_list, list) {
> +		if (tmp_b->offset_start == offset_start &&
> +		    tmp_b->start_probe->path.dentry == path.dentry)
> {
> +			ret = -EEXIST;
> +			goto err_path;
> +		}
> +	}
> +
> +	canon = d_path(&path, pathbuf, sizeof(pathbuf));
> +	if (IS_ERR(canon)) {
> +		ret = PTR_ERR(canon);
> +		goto err_path;
> +	}
> +	strscpy(b->binpath, canon, sizeof(b->binpath));
> +
> +	/* Both probes share b (priv) and path; attach_path refs
> path itself. */
> +	b->start_probe = rv_uprobe_attach_path(&path, offset_start,
> +					      
> tlob_uprobe_entry_handler, NULL, b);
> +	if (IS_ERR(b->start_probe)) {
> +		ret = PTR_ERR(b->start_probe);
> +		b->start_probe = NULL;
> +		goto err_path;
> +	}
> +
> +	b->stop_probe = rv_uprobe_attach_path(&path, offset_stop,
> +					     
> tlob_uprobe_stop_handler, NULL, b);
> +	if (IS_ERR(b->stop_probe)) {
> +		ret = PTR_ERR(b->stop_probe);
> +		b->stop_probe = NULL;
> +		goto err_start;
> +	}
> +
> +	path_put(&path);
> +	list_add_tail(&b->list, &tlob_uprobe_list);

And finally tell the compiler you want to keep b in the happy path.

list_add_tail(&no_free_ptr(b)->list, &tlob_uprobe_list);

Not tested, but you got the gist, check include/linux/cleanup.h and grep
around for users if it doesn't work as expected.

> +	return 0;
> +
> +err_start:
> +	rv_uprobe_detach(b->start_probe);

This will need to stay, unless you want to DEFINE_FREE() for this, may
be fun but not required.

> +err_path:
> +	path_put(&path);
> +err_free:
> +	kfree(b);
> +	return ret;
> +}

...

> +
> +/* Parse "-PATH:OFFSET_START" (ftrace uprobe_events removal
> convention). */

This isn't quite a standard function comment (and you don't really need
to make it so, it wouldn't add much value), but I'd say at least make it
a multi-line comment which is more common in this location:

/*
 * Parse ...
 */

...

> +static void __tlob_destroy_monitor(void)
> +{
> +	rv_this.enabled = 0;
> +	/*
> +	 * Remove uprobes first; rv_uprobe_sync() inside ensures all
> in-flight
> +	 * handlers have finished before we proceed.
> +	 */
> +	tlob_remove_all_uprobes();
> +
> +	/*
> +	 * da_monitor_destroy() iterates any remaining entries via
> da_extra_cleanup
> +	 * (tlob_extra_cleanup), cancels their timers, and frees
> their state.
> +	 * rcu_barrier() inside drains both da_pool_return_cb and
> tlob_free_rcu
> +	 * callbacks before the pool arrays are freed.
> +	 */

I'm not sure we need this many comments, the behaviour may change and we
don't need to describe what the function does before calling it.

If anything say /why/ you call a function before another, but I'd say
it's trivial here so you can just drop both comments.

> +	ha_monitor_destroy();
> +	kmem_cache_destroy(tlob_state_cache);
> +	tlob_state_cache = NULL;
> +}

...

> +static int __init register_tlob(void)
> +{
> +	int ret;
> +
> +	ret = rv_register_monitor(&rv_this, NULL);
> +	if (ret)
> +		return ret;
> +
> +	if (rv_this.root_d) {
> +		if (!tracefs_create_file("monitor", 0644,

We have rv_create_file() , it's exactly the same but let's be consistent
so if we ever decide to overload it, this won't stay behind.

> rv_this.root_d, NULL,
> +					 &tlob_monitor_fops)) {
> +			rv_unregister_monitor(&rv_this);
> +			return -ENOMEM;
> +		}
> +	}
> +
> +	return 0;
> +}

> diff --git a/kernel/trace/rv/monitors/tlob/tlob.h
> b/kernel/trace/rv/monitors/tlob/tlob.h
> new file mode 100644
> index 000000000000..b6724e629c69
> --- /dev/null
> +++ b/kernel/trace/rv/monitors/tlob/tlob.h
> @@ -0,0 +1,148 @@

...

> +
> +enum states_tlob {
> +	running_tlob,
> +	waiting_tlob,
> +	sleeping_tlob,
> +	state_max_tlob,
> +};

This is personal preference, so feel free to ignore it. The header file
is (mostly) automatically generated and rvgen currently gives you a
different order for enums/arrays. Maybe just reorder all those according
to what rvgen does (just run it without the -a option and diff what
comes out).

The main reason is again to make it easy to update the file in case the
model/framework changes.

Thanks,
Gabriele


  reply	other threads:[~2026-06-15 15:25 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-07 16:13 [PATCH v3 0/9] rv/tlob: Add task latency over budget RV monitor wen.yang
2026-06-07 16:13 ` [PATCH v3 1/9] rv/da: introduce DA_MON_ALLOCATION_STRATEGY wen.yang
2026-06-15  9:56   ` Gabriele Monaco
2026-06-07 16:13 ` [PATCH v3 2/9] rv: add generic uprobe infrastructure for RV monitors wen.yang
2026-06-07 16:13 ` [PATCH v3 3/9] rv/tlob: add tlob model DOT file wen.yang
2026-06-07 16:13 ` [PATCH v3 4/9] rv/ha: fix ha_invariant_passed_ns silent bypass of invariant check wen.yang
2026-06-15 10:12   ` Gabriele Monaco
2026-06-07 16:13 ` [PATCH v3 5/9] rv/ha: make da_monitor_reset_hook and EVENT_NONE_LBL overridable wen.yang
2026-06-15 10:16   ` Gabriele Monaco
2026-06-07 16:13 ` [PATCH v3 6/9] rv/tlob: add tlob hybrid automaton monitor wen.yang
2026-06-15 15:24   ` Gabriele Monaco [this message]
2026-06-07 16:13 ` [PATCH v3 7/9] rv/tlob: add KUnit tests for the tlob monitor wen.yang
2026-06-07 16:13 ` [PATCH v3 8/9] selftests/verification: fix verificationtest-ktap for out-of-tree execution wen.yang
2026-06-07 16:13 ` [PATCH v3 9/9] selftests/verification: add tlob selftests wen.yang
2026-06-13 16:00 ` [PATCH v3 0/9] rv/tlob: Add task latency over budget RV monitor Wen Yang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=04900b5700a2d5d344a60cb346cfffaf4e3d5fa4.camel@redhat.com \
    --to=gmonaco@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-trace-kernel@vger.kernel.org \
    --cc=rostedt@goodmis.org \
    --cc=wen.yang@linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox