* [PATCH v2 04/12] rv: Prevent task migration while handling per-CPU events
From: Gabriele Monaco @ 2026-05-27 6:23 UTC (permalink / raw)
To: linux-kernel, Steven Rostedt, Gabriele Monaco, linux-trace-kernel
Cc: Wen Yang, Nam Cao
In-Reply-To: <20260527062313.39908-1-gmonaco@redhat.com>
Tracepoint handlers are now fully preemptible. When a per-CPU monitor
handles an event, it retrieves the monitor state using a per-CPU
pointer. If the event itself doesn't disable preemption, the task can
migrate to a different CPU and we risk updating the wrong monitor.
Mitigate this by explicitly disabling task migration before acquiring
the monitor pointer. This cannot guarantee the monitor runs on the
correct CPU but reduces the race condition window and prevents warnings.
Reviewed-by: Wen Yang <wen.yang@linux.dev>
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
include/rv/da_monitor.h | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/include/rv/da_monitor.h b/include/rv/da_monitor.h
index 0b7028df08fb..a9fd284195ee 100644
--- a/include/rv/da_monitor.h
+++ b/include/rv/da_monitor.h
@@ -181,6 +181,10 @@ static inline void da_monitor_destroy(void)
da_monitor_reset_all();
}
+#ifndef da_implicit_guard
+#define da_implicit_guard()
+#endif
+
#elif RV_MON_TYPE == RV_MON_PER_CPU
/*
* Functions to define, init and get a per-cpu monitor.
@@ -230,6 +234,10 @@ static inline void da_monitor_destroy(void)
da_monitor_reset_all();
}
+#ifndef da_implicit_guard
+#define da_implicit_guard() guard(migrate)()
+#endif
+
#elif RV_MON_TYPE == RV_MON_PER_TASK
/*
* Functions to define, init and get a per-task monitor.
@@ -677,6 +685,7 @@ static inline bool __da_handle_start_run_event(struct da_monitor *da_mon,
*/
static inline void da_handle_event(enum events event)
{
+ da_implicit_guard();
__da_handle_event(da_get_monitor(), event, 0);
}
@@ -692,6 +701,7 @@ static inline void da_handle_event(enum events event)
*/
static inline bool da_handle_start_event(enum events event)
{
+ da_implicit_guard();
return __da_handle_start_event(da_get_monitor(), event, 0);
}
@@ -703,6 +713,7 @@ static inline bool da_handle_start_event(enum events event)
*/
static inline bool da_handle_start_run_event(enum events event)
{
+ da_implicit_guard();
return __da_handle_start_run_event(da_get_monitor(), event, 0);
}
--
2.54.0
^ permalink raw reply related
* [PATCH v2 05/12] rv: Prevent in-flight per-task handlers from using invalid slots
From: Gabriele Monaco @ 2026-05-27 6:23 UTC (permalink / raw)
To: linux-kernel, Steven Rostedt, Gabriele Monaco, Nam Cao,
linux-trace-kernel
Cc: Wen Yang
In-Reply-To: <20260527062313.39908-1-gmonaco@redhat.com>
Per-task monitors use a slot in the task_struct->rv[] array and store
that locally (e.g. task_mon_slot), this slot is returned during the
destruction process but currently hanlers can be running while that slot
is returning and this race may lead to accessing an invalid slot.
Synchronise with all in-flight tracepoint handlers using
tracepoint_synchronize_unregister() before returning the slot.
Fixes: f5587d1b6ec9 ("rv: Add Hybrid Automata monitor type")
Fixes: a9769a5b9878 ("rv: Add support for LTL monitors")
Suggested-by: Wen Yang <wen.yang@linux.dev>
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
include/rv/da_monitor.h | 4 ++++
include/rv/ltl_monitor.h | 1 +
2 files changed, 5 insertions(+)
diff --git a/include/rv/da_monitor.h b/include/rv/da_monitor.h
index a9fd284195ee..446a4d53d99c 100644
--- a/include/rv/da_monitor.h
+++ b/include/rv/da_monitor.h
@@ -310,6 +310,9 @@ static int da_monitor_init(void)
/*
* da_monitor_destroy - return the allocated slot
+ *
+ * Wait for all in-flight handlers before returning the slot to avoid
+ * out-of-bound accesses.
*/
static inline void da_monitor_destroy(void)
{
@@ -320,6 +323,7 @@ static inline void da_monitor_destroy(void)
da_monitor_reset_all();
+ tracepoint_synchronize_unregister();
rv_put_task_monitor_slot(task_mon_slot);
task_mon_slot = RV_PER_TASK_MONITOR_INIT;
}
diff --git a/include/rv/ltl_monitor.h b/include/rv/ltl_monitor.h
index eff60cd61106..38e792401f76 100644
--- a/include/rv/ltl_monitor.h
+++ b/include/rv/ltl_monitor.h
@@ -77,6 +77,7 @@ static void ltl_monitor_destroy(void)
{
rv_detach_trace_probe(name, task_newtask, handle_task_newtask);
+ tracepoint_synchronize_unregister();
rv_put_task_monitor_slot(ltl_monitor_slot);
ltl_monitor_slot = RV_PER_TASK_MONITOR_INIT;
}
--
2.54.0
^ permalink raw reply related
* [PATCH v2 06/12] rv: Ensure all pending probes terminate on per-obj monitor destroy
From: Gabriele Monaco @ 2026-05-27 6:23 UTC (permalink / raw)
To: linux-kernel, Steven Rostedt, Gabriele Monaco, Nam Cao,
linux-trace-kernel
Cc: Wen Yang
In-Reply-To: <20260527062313.39908-1-gmonaco@redhat.com>
The monitor disable/destroy sequence detaches all probes and resets the
monitor's data, however it doesn't wait for pending probes. This is an
issue with per-object monitors, which free the monitor storage.
Call tracepoint_synchronize_unregister() to make sure to wait for all
pending probes before destroying the monitor storage.
Fixes: 4a24127bd6cb ("rv: Add support for per-object monitors in DA/HA")
Reviewed-by: Wen Yang <wen.yang@linux.dev>
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
include/rv/da_monitor.h | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/include/rv/da_monitor.h b/include/rv/da_monitor.h
index 446a4d53d99c..5f790e1694b4 100644
--- a/include/rv/da_monitor.h
+++ b/include/rv/da_monitor.h
@@ -519,9 +519,10 @@ static inline void da_monitor_destroy(void)
struct hlist_node *tmp;
int bkt;
+ tracepoint_synchronize_unregister();
/*
- * This function is called after all probes are disabled, we need only
- * worry about concurrency against old events.
+ * This function is called after all probes are disabled and no longer
+ * pending, we can safely assume no concurrent user.
*/
synchronize_rcu();
hash_for_each_safe(da_monitor_ht, bkt, tmp, mon_storage, node) {
--
2.54.0
^ permalink raw reply related
* [PATCH v2 08/12] rv: Ensure synchronous cleanup for HA monitors
From: Gabriele Monaco @ 2026-05-27 6:23 UTC (permalink / raw)
To: linux-kernel, Steven Rostedt, Gabriele Monaco, Nam Cao,
linux-trace-kernel
Cc: Wen Yang
In-Reply-To: <20260527062313.39908-1-gmonaco@redhat.com>
HA monitors may start timers, all cleanup functions currently stop the
timers asynchronously to avoid sleeping in the wrong context.
Nothing makes sure running callbacks terminate on cleanup.
Run the entire HA timer callback in an RCU read-side critical section,
this way we can simply synchronize_rcu() with any pending timer and are
sure any cleanup using kfree_rcu() runs after callbacks terminated.
Additionally make sure any unlikely callback running late won't run any
code if the monitor is marked as disabled.
Use memory barriers to serialise with racing resets.
Fixes: f5587d1b6ec9 ("rv: Add Hybrid Automata monitor type")
Fixes: 4a24127bd6cb ("rv: Add support for per-object monitors in DA/HA")
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
include/rv/da_monitor.h | 26 +++++++++++++++++++++-----
include/rv/ha_monitor.h | 19 +++++++++++++++++--
2 files changed, 38 insertions(+), 7 deletions(-)
diff --git a/include/rv/da_monitor.h b/include/rv/da_monitor.h
index c1e347f9deaf..268f22933663 100644
--- a/include/rv/da_monitor.h
+++ b/include/rv/da_monitor.h
@@ -57,6 +57,15 @@ static struct rv_monitor rv_this;
#define da_monitor_reset_hook(da_mon)
#endif
+/*
+ * Hook to allow the implementation of hybrid automata: define it with a
+ * function that waits for the termination of all monitors background
+ * activities (e.g. all timers). This hook can sleep.
+ */
+#ifndef da_monitor_sync_hook
+#define da_monitor_sync_hook()
+#endif
+
/*
* Type for the target id, default to int but can be overridden.
* A long type can work as hash table key (PER_OBJ) but will be downgraded to
@@ -83,7 +92,8 @@ static inline void da_monitor_reset(struct da_monitor *da_mon)
{
da_monitor_reset_hook(da_mon);
WRITE_ONCE(da_mon->monitoring, 0);
- da_mon->curr_state = model_get_initial_state();
+ /* Pair with load in __ha_monitor_timer_callback */
+ smp_store_release(&da_mon->curr_state, model_get_initial_state());
}
/*
@@ -181,6 +191,7 @@ static inline int da_monitor_init(void)
static inline void da_monitor_destroy(void)
{
da_monitor_reset_all();
+ da_monitor_sync_hook();
}
#ifndef da_implicit_guard
@@ -234,6 +245,7 @@ static inline int da_monitor_init(void)
static inline void da_monitor_destroy(void)
{
da_monitor_reset_all();
+ da_monitor_sync_hook();
}
#ifndef da_implicit_guard
@@ -324,6 +336,7 @@ static inline void da_monitor_destroy(void)
}
da_monitor_reset_all();
+ da_monitor_sync_hook();
tracepoint_synchronize_unregister();
rv_put_task_monitor_slot(task_mon_slot);
@@ -503,10 +516,9 @@ static void da_monitor_reset_all(void)
struct da_monitor_storage *mon_storage;
int bkt;
- rcu_read_lock();
+ guard(rcu)();
hash_for_each_rcu(da_monitor_ht, bkt, mon_storage, node)
da_monitor_reset(&mon_storage->rv.da_mon);
- rcu_read_unlock();
}
static inline int da_monitor_init(void)
@@ -522,13 +534,17 @@ static inline void da_monitor_destroy(void)
int bkt;
tracepoint_synchronize_unregister();
+ scoped_guard(rcu) {
+ hash_for_each_rcu(da_monitor_ht, bkt, mon_storage, node) {
+ da_monitor_reset_hook(&mon_storage->rv.da_mon);
+ }
+ }
+ da_monitor_sync_hook();
/*
* This function is called after all probes are disabled and no longer
* pending, we can safely assume no concurrent user.
*/
- synchronize_rcu();
hash_for_each_safe(da_monitor_ht, bkt, tmp, mon_storage, node) {
- da_monitor_reset_hook(&mon_storage->rv.da_mon);
hash_del_rcu(&mon_storage->node);
kfree(mon_storage);
}
diff --git a/include/rv/ha_monitor.h b/include/rv/ha_monitor.h
index d59507e8cb30..661631bc933a 100644
--- a/include/rv/ha_monitor.h
+++ b/include/rv/ha_monitor.h
@@ -36,6 +36,7 @@ static bool ha_monitor_handle_constraint(struct da_monitor *da_mon,
#define da_monitor_event_hook ha_monitor_handle_constraint
#define da_monitor_init_hook ha_monitor_init_env
#define da_monitor_reset_hook ha_monitor_reset_env
+#define da_monitor_sync_hook() synchronize_rcu()
#include <rv/da_monitor.h>
#include <linux/seq_buf.h>
@@ -237,12 +238,26 @@ static bool ha_monitor_handle_constraint(struct da_monitor *da_mon,
return false;
}
+/*
+ * __ha_monitor_timer_callback - generic callback representation
+ *
+ * This callback runs in an RCU read-side critical section to allow the
+ * destruction sequence to easily synchronize_rcu() with all pending timer
+ * after asynchronously disabling them.
+ */
static inline void __ha_monitor_timer_callback(struct ha_monitor *ha_mon)
{
- enum states curr_state = READ_ONCE(ha_mon->da_mon.curr_state);
DECLARE_SEQ_BUF(env_string, ENV_BUFFER_SIZE);
- u64 time_ns = ha_get_ns();
+ enum states curr_state;
+ u64 time_ns;
+
+ guard(rcu)();
+ /* Ensure consistent curr_state if we race with da_monitor_reset */
+ curr_state = smp_load_acquire(&ha_mon->da_mon.curr_state);
+ if (unlikely(!da_monitor_handling_event(&ha_mon->da_mon)))
+ return;
+ time_ns = ha_get_ns();
ha_get_env_string(&env_string, ha_mon, time_ns);
ha_react(curr_state, EVENT_NONE, env_string.buffer);
ha_trace_error_env(ha_mon, model_get_state_name(curr_state),
--
2.54.0
^ permalink raw reply related
* [PATCH v2 07/12] rv: Fix monitor start ordering and memory ordering for monitoring flag
From: Gabriele Monaco @ 2026-05-27 6:23 UTC (permalink / raw)
To: linux-kernel, Steven Rostedt, Gabriele Monaco, linux-trace-kernel
Cc: Wen Yang, Nam Cao
In-Reply-To: <20260527062313.39908-1-gmonaco@redhat.com>
From: Wen Yang <wen.yang@linux.dev>
da_monitor_start() set monitoring=1 before calling da_monitor_init_hook(),
may racing with the sched_switch handler:
da_monitor_start() sched_switch handler
------------------------- ---------------------------------
da_mon->monitoring = 1;
if (da_monitoring(da_mon)) /* true */
ha_start_timer_ns(...);
/* hrtimer->base == NULL, crash */
da_monitor_init_hook(da_mon);
/* hrtimer_setup() sets base */
Fix the ordering and pair with release/acquire semantics:
da_monitor_init_hook(da_mon);
smp_store_release(&da_mon->monitoring, 1); /* da_monitor_start() */
return smp_load_acquire(&da_mon->monitoring); /* da_monitoring() */
On ARM64 a plain STR + LDR does not form a release-acquire pair, so
the load can observe monitoring=1 while hrtimer->base is still NULL.
The plain accesses are also data races under KCSAN.
Use WRITE_ONCE for the monitoring=0 store in da_monitor_reset() to
cover the reset path.
Fixes: 792575348ff7 ("rv/include: Add deterministic automata monitor definition via C macros")
Signed-off-by: Wen Yang <wen.yang@linux.dev>
Reviewed-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
include/rv/da_monitor.h | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/include/rv/da_monitor.h b/include/rv/da_monitor.h
index 5f790e1694b4..c1e347f9deaf 100644
--- a/include/rv/da_monitor.h
+++ b/include/rv/da_monitor.h
@@ -82,7 +82,7 @@ static void react(enum states curr_state, enum events event)
static inline void da_monitor_reset(struct da_monitor *da_mon)
{
da_monitor_reset_hook(da_mon);
- da_mon->monitoring = 0;
+ WRITE_ONCE(da_mon->monitoring, 0);
da_mon->curr_state = model_get_initial_state();
}
@@ -95,8 +95,9 @@ static inline void da_monitor_reset(struct da_monitor *da_mon)
static inline void da_monitor_start(struct da_monitor *da_mon)
{
da_mon->curr_state = model_get_initial_state();
- da_mon->monitoring = 1;
da_monitor_init_hook(da_mon);
+ /* Pairs with smp_load_acquire in da_monitoring(). */
+ smp_store_release(&da_mon->monitoring, 1);
}
/*
@@ -104,7 +105,8 @@ static inline void da_monitor_start(struct da_monitor *da_mon)
*/
static inline bool da_monitoring(struct da_monitor *da_mon)
{
- return da_mon->monitoring;
+ /* Pairs with smp_store_release in da_monitor_start(). */
+ return smp_load_acquire(&da_mon->monitoring);
}
/*
--
2.54.0
^ permalink raw reply related
* [PATCH v2 09/12] rv: Do not rely on clean monitor when initialising HA
From: Gabriele Monaco @ 2026-05-27 6:23 UTC (permalink / raw)
To: linux-kernel, Steven Rostedt, Gabriele Monaco, Masami Hiramatsu,
Nam Cao, linux-trace-kernel
Cc: Wen Yang
In-Reply-To: <20260527062313.39908-1-gmonaco@redhat.com>
Hybrid Automata monitors hook into the DA implementation when doing
da_monitor_reset(). This function is called both on initialisation and
teardown, HA monitors try to cancel a timer only when it's initialised
relying on the da_mon->monitoring flag. This flag could however be
corrupted during initialisation. This happens for instance on per-task
monitors that share the same storage with different type of monitors
like LTL or in case of races during a previous teardown.
Stop relying on the monitoring flag during initialisation, assume that
can have any value, so skip timer cancellation in any case when a local
flag is set. New monitors (e.g. new tasks) are always zero-initialised
so they are safe.
Reported-by: Wen Yang <wen.yang@linux.dev>
Closes: https://lore.kernel.org/lkml/d02c656aada7d071f083460a5c9a454363669b61.1778522945.git.wen.yang@linux.dev
Fixes: f5587d1b6ec9 ("rv: Add Hybrid Automata monitor type")
Reviewed-by: Wen Yang <wen.yang@linux.dev>
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
include/rv/ha_monitor.h | 31 ++++++++++++++++++-
kernel/trace/rv/monitors/nomiss/nomiss.c | 4 +--
kernel/trace/rv/monitors/opid/opid.c | 4 +--
kernel/trace/rv/monitors/stall/stall.c | 4 +--
.../rvgen/rvgen/templates/dot2k/main.c | 4 +--
5 files changed, 38 insertions(+), 9 deletions(-)
diff --git a/include/rv/ha_monitor.h b/include/rv/ha_monitor.h
index 661631bc933a..2d7187c16f55 100644
--- a/include/rv/ha_monitor.h
+++ b/include/rv/ha_monitor.h
@@ -116,6 +116,35 @@ static enum hrtimer_restart ha_monitor_timer_callback(struct hrtimer *hrtimer);
#define ha_get_ns() 0
#endif /* HA_CLK_NS */
+static bool ha_mon_initializing;
+
+static int ha_monitor_init(void)
+{
+ int ret;
+
+ ha_mon_initializing = true;
+ ret = da_monitor_init();
+ ha_mon_initializing = false;
+ return ret;
+}
+
+static void ha_monitor_destroy(void)
+{
+ da_monitor_destroy();
+}
+
+/*
+ * ha_monitor_uninitialized - are fields like the timer initialized?
+ *
+ * On a clean monitor, we can assume an active monitor (monitoring) is
+ * initialized, however the monitoring field cannot be trusted during
+ * initialization.
+ */
+static inline bool ha_monitor_uninitialized(struct da_monitor *da_mon)
+{
+ return ha_mon_initializing || !da_monitoring(da_mon);
+}
+
/* Should be supplied by the monitor */
static u64 ha_get_env(struct ha_monitor *ha_mon, enum envs env, u64 time_ns);
static bool ha_verify_constraint(struct ha_monitor *ha_mon,
@@ -160,7 +189,7 @@ static inline void ha_monitor_reset_env(struct da_monitor *da_mon)
struct ha_monitor *ha_mon = to_ha_monitor(da_mon);
/* Initialisation resets the monitor before initialising the timer */
- if (likely(da_monitoring(da_mon)))
+ if (likely(!ha_monitor_uninitialized(da_mon)))
ha_cancel_timer(ha_mon);
}
diff --git a/kernel/trace/rv/monitors/nomiss/nomiss.c b/kernel/trace/rv/monitors/nomiss/nomiss.c
index 31f90f3638d8..8ead8783c29f 100644
--- a/kernel/trace/rv/monitors/nomiss/nomiss.c
+++ b/kernel/trace/rv/monitors/nomiss/nomiss.c
@@ -227,7 +227,7 @@ static int enable_nomiss(void)
{
int retval;
- retval = da_monitor_init();
+ retval = ha_monitor_init();
if (retval)
return retval;
@@ -263,7 +263,7 @@ static void disable_nomiss(void)
rv_detach_trace_probe("nomiss", sched_switch, handle_sched_switch);
rv_detach_trace_probe("nomiss", sched_wakeup, handle_sched_wakeup);
- da_monitor_destroy();
+ ha_monitor_destroy();
}
static struct rv_monitor rv_this = {
diff --git a/kernel/trace/rv/monitors/opid/opid.c b/kernel/trace/rv/monitors/opid/opid.c
index 4594c7c46601..2922318c6112 100644
--- a/kernel/trace/rv/monitors/opid/opid.c
+++ b/kernel/trace/rv/monitors/opid/opid.c
@@ -73,7 +73,7 @@ static int enable_opid(void)
{
int retval;
- retval = da_monitor_init();
+ retval = ha_monitor_init();
if (retval)
return retval;
@@ -90,7 +90,7 @@ static void disable_opid(void)
rv_detach_trace_probe("opid", sched_set_need_resched_tp, handle_sched_need_resched);
rv_detach_trace_probe("opid", sched_waking, handle_sched_waking);
- da_monitor_destroy();
+ ha_monitor_destroy();
}
/*
diff --git a/kernel/trace/rv/monitors/stall/stall.c b/kernel/trace/rv/monitors/stall/stall.c
index 9ccfda6b0e73..3c38fb1a0159 100644
--- a/kernel/trace/rv/monitors/stall/stall.c
+++ b/kernel/trace/rv/monitors/stall/stall.c
@@ -103,7 +103,7 @@ static int enable_stall(void)
{
int retval;
- retval = da_monitor_init();
+ retval = ha_monitor_init();
if (retval)
return retval;
@@ -120,7 +120,7 @@ static void disable_stall(void)
rv_detach_trace_probe("stall", sched_switch, handle_sched_switch);
rv_detach_trace_probe("stall", sched_wakeup, handle_sched_wakeup);
- da_monitor_destroy();
+ ha_monitor_destroy();
}
static struct rv_monitor rv_this = {
diff --git a/tools/verification/rvgen/rvgen/templates/dot2k/main.c b/tools/verification/rvgen/rvgen/templates/dot2k/main.c
index bf0999f6657a..889446760e3c 100644
--- a/tools/verification/rvgen/rvgen/templates/dot2k/main.c
+++ b/tools/verification/rvgen/rvgen/templates/dot2k/main.c
@@ -35,7 +35,7 @@ static int enable_%%MODEL_NAME%%(void)
{
int retval;
- retval = da_monitor_init();
+ retval = %%MONITOR_CLASS%%_monitor_init();
if (retval)
return retval;
@@ -50,7 +50,7 @@ static void disable_%%MODEL_NAME%%(void)
%%TRACEPOINT_DETACH%%
- da_monitor_destroy();
+ %%MONITOR_CLASS%%_monitor_destroy();
}
/*
--
2.54.0
^ permalink raw reply related
* [PATCH v2 10/12] rv: Add automatic cleanup handlers for per-task HA monitors
From: Gabriele Monaco @ 2026-05-27 6:23 UTC (permalink / raw)
To: linux-kernel, Steven Rostedt, Gabriele Monaco, Nam Cao,
linux-trace-kernel
Cc: Wen Yang
In-Reply-To: <20260527062313.39908-1-gmonaco@redhat.com>
Hybrid automata monitors may start timers, depending on the model, these
may remain active on an exiting task and cause false positives or even
access freed memory.
Add an enable/disable hook in the HA code, currently only populated by
the per-task handler for registration and deregistration.
This hooks to the sched_process_exit event and ensures the timer is
stopped for every exiting task. The handler is enabled automatically but
may be disabled, for instance if the monitor uses the event for another
purpose (but should still manually ensure timers are stopped).
Fixes: f5587d1b6ec9 ("rv: Add Hybrid Automata monitor type")
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
include/rv/ha_monitor.h | 45 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 45 insertions(+)
diff --git a/include/rv/ha_monitor.h b/include/rv/ha_monitor.h
index 2d7187c16f55..df67d5e9d132 100644
--- a/include/rv/ha_monitor.h
+++ b/include/rv/ha_monitor.h
@@ -28,6 +28,7 @@ static inline void ha_monitor_init_env(struct da_monitor *da_mon);
static inline void ha_monitor_reset_env(struct da_monitor *da_mon);
static inline void ha_setup_timer(struct ha_monitor *ha_mon);
static inline bool ha_cancel_timer(struct ha_monitor *ha_mon);
+static inline void ha_cancel_timer_sync(struct ha_monitor *ha_mon);
static bool ha_monitor_handle_constraint(struct da_monitor *da_mon,
enum states curr_state,
enum events event,
@@ -38,6 +39,26 @@ static bool ha_monitor_handle_constraint(struct da_monitor *da_mon,
#define da_monitor_reset_hook ha_monitor_reset_env
#define da_monitor_sync_hook() synchronize_rcu()
+#if !defined(HA_SKIP_AUTO_CLEANUP) && RV_MON_TYPE == RV_MON_PER_TASK
+/*
+ * Automatic cleanup handlers for per-task HA monitors, only skip if you know
+ * what you are doing (e.g. you want to implement cleanup manually in another
+ * handler doing more things).
+ */
+static void ha_handle_sched_process_exit(void *data, struct task_struct *p,
+ bool group_dead);
+
+#define ha_monitor_enable_hook() \
+ rv_attach_trace_probe(__stringify(MONITOR_NAME), sched_process_exit, \
+ ha_handle_sched_process_exit)
+#define ha_monitor_disable_hook() \
+ rv_detach_trace_probe(__stringify(MONITOR_NAME), sched_process_exit, \
+ ha_handle_sched_process_exit)
+#else
+#define ha_monitor_enable_hook() ((void)0)
+#define ha_monitor_disable_hook() ((void)0)
+#endif
+
#include <rv/da_monitor.h>
#include <linux/seq_buf.h>
@@ -124,12 +145,15 @@ static int ha_monitor_init(void)
ha_mon_initializing = true;
ret = da_monitor_init();
+ if (ret == 0)
+ ha_monitor_enable_hook();
ha_mon_initializing = false;
return ret;
}
static void ha_monitor_destroy(void)
{
+ ha_monitor_disable_hook();
da_monitor_destroy();
}
@@ -230,6 +254,18 @@ static inline void ha_trace_error_env(struct ha_monitor *ha_mon,
{
CONCATENATE(trace_error_env_, MONITOR_NAME)(id, curr_state, event, env);
}
+
+#if !defined(HA_SKIP_AUTO_CLEANUP) && RV_MON_TYPE == RV_MON_PER_TASK
+static void ha_handle_sched_process_exit(void *data, struct task_struct *p,
+ bool group_dead)
+{
+ struct da_monitor *da_mon = da_get_monitor(p);
+
+ if (likely(!ha_monitor_uninitialized(da_mon)))
+ ha_cancel_timer_sync(to_ha_monitor(da_mon));
+}
+#endif
+
#endif /* RV_MON_TYPE */
/*
@@ -456,6 +492,10 @@ static inline bool ha_cancel_timer(struct ha_monitor *ha_mon)
{
return timer_delete(&ha_mon->timer);
}
+static inline void ha_cancel_timer_sync(struct ha_monitor *ha_mon)
+{
+ timer_delete_sync(&ha_mon->timer);
+}
#elif HA_TIMER_TYPE == HA_TIMER_HRTIMER
/*
* Helper functions to handle the monitor timer.
@@ -507,6 +547,10 @@ static inline bool ha_cancel_timer(struct ha_monitor *ha_mon)
{
return hrtimer_try_to_cancel(&ha_mon->hrtimer) == 1;
}
+static inline void ha_cancel_timer_sync(struct ha_monitor *ha_mon)
+{
+ hrtimer_cancel(&ha_mon->hrtimer);
+}
#else /* HA_TIMER_NONE */
/*
* Start function is intentionally not defined, monitors using timers must
@@ -517,6 +561,7 @@ static inline bool ha_cancel_timer(struct ha_monitor *ha_mon)
{
return false;
}
+static inline void ha_cancel_timer_sync(struct ha_monitor *ha_mon) { }
#endif
#endif
--
2.54.0
^ permalink raw reply related
* [PATCH v2 11/12] verification/rvgen: Generate cleanup hook for per-obj monitor
From: Gabriele Monaco @ 2026-05-27 6:23 UTC (permalink / raw)
To: linux-kernel, Steven Rostedt, Gabriele Monaco, linux-trace-kernel
Cc: Nam Cao, Wen Yang
In-Reply-To: <20260527062313.39908-1-gmonaco@redhat.com>
Per-object monitors can allocate memory dynamically and such memory is
required for the lifetime of the object, then it should be freed with
the appropriate call.
Force the generation scripts to add a cleanup function the user will
need to wire to the appropriate event (e.g. sched_process_exit for
tasks). This can be safely removed if the object will never cease to
exist before disabling the monitor (e.g. if following only static
variables).
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
tools/verification/rvgen/rvgen/dot2k.py | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/tools/verification/rvgen/rvgen/dot2k.py b/tools/verification/rvgen/rvgen/dot2k.py
index e6f476b903b0..a17294283a1f 100644
--- a/tools/verification/rvgen/rvgen/dot2k.py
+++ b/tools/verification/rvgen/rvgen/dot2k.py
@@ -17,6 +17,9 @@ from .automata import _EventConstraintKey, _StateConstraintKey, AutomataError
class dot2k(Monitor, Dot2c):
template_dir = "dot2k"
+ # only needed for the per-obj cleanup hook
+ cleanup_marker = "obj_cleanup"
+
def __init__(self, file_path, MonitorType, extra_params={}):
self.monitor_type = MonitorType
Monitor.__init__(self, extra_params)
@@ -56,18 +59,30 @@ class dot2k(Monitor, Dot2c):
buff.append(f"\tda_{handle}({event}{self.enum_suffix});")
buff.append("}")
buff.append("")
+ if self.monitor_type == "per_obj":
+ buff.append("/* XXX: obj is being destroyed, remove if not required (e.g. obj is static) */")
+ buff.append(f"static void handle_{self.cleanup_marker}(void *data, /* XXX: fill header */)")
+ buff.append("{")
+ buff.append("\tint id = /* XXX: how do I get the id? */;")
+ buff.append("\tda_destroy_storage(id);")
+ buff.append("}")
+ buff.append("")
return '\n'.join(buff)
def fill_tracepoint_attach_probe(self) -> str:
buff = []
for event in self.events:
buff.append(f"\trv_attach_trace_probe(\"{self.name}\", /* XXX: tracepoint */, handle_{event});")
+ if self.monitor_type == "per_obj":
+ buff.append(f"\trv_attach_trace_probe(\"{self.name}\", /* XXX: cleanup tracepoint */, handle_{self.cleanup_marker});")
return '\n'.join(buff)
def fill_tracepoint_detach_helper(self) -> str:
buff = []
for event in self.events:
buff.append(f"\trv_detach_trace_probe(\"{self.name}\", /* XXX: tracepoint */, handle_{event});")
+ if self.monitor_type == "per_obj":
+ buff.append(f"\trv_detach_trace_probe(\"{self.name}\", /* XXX: cleanup tracepoint */, handle_{self.cleanup_marker});")
return '\n'.join(buff)
def fill_model_h_header(self) -> list[str]:
--
2.54.0
^ permalink raw reply related
* [PATCH v2 12/12] verification/rvgen: Fix suffix strip in dot2k
From: Gabriele Monaco @ 2026-05-27 6:23 UTC (permalink / raw)
To: linux-kernel, Steven Rostedt, Gabriele Monaco, Nam Cao,
linux-trace-kernel
Cc: Wen Yang
In-Reply-To: <20260527062313.39908-1-gmonaco@redhat.com>
__start_to_invariant_check() and __get_constraint_env() parse the
environment variable's name from sources that have it padded with the
monitor name. This is removed using rstrip(), which is not meant to
strip a substring but rather a set of characters.
Use removesuffix() to actually get rid of the trailing _<monitor name>.
Fixes: a82adadb16894 ("verification/rvgen: Add support for Hybrid Automata")
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
tools/verification/rvgen/rvgen/dot2k.py | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/verification/rvgen/rvgen/dot2k.py b/tools/verification/rvgen/rvgen/dot2k.py
index a17294283a1f..3060aa4b945d 100644
--- a/tools/verification/rvgen/rvgen/dot2k.py
+++ b/tools/verification/rvgen/rvgen/dot2k.py
@@ -230,14 +230,14 @@ class ha2k(dot2k):
def __get_constraint_env(self, constr: str) -> str:
"""Extract the second argument from an ha_ function"""
env = constr.split("(")[1].split()[1].rstrip(")").rstrip(",")
- assert env.rstrip(f"_{self.name}") in self.envs
+ assert env.removesuffix(f"_{self.name}") in self.envs
return env
def __start_to_invariant_check(self, constr: str) -> str:
# by default assume the timer has ns expiration
env = self.__get_constraint_env(constr)
clock_type = "ns"
- if self.env_types.get(env.rstrip(f"_{self.name}")) == "j":
+ if self.env_types.get(env.removesuffix(f"_{self.name}")) == "j":
clock_type = "jiffy"
return f"return ha_check_invariant_{clock_type}(ha_mon, {env}, time_ns)"
--
2.54.0
^ permalink raw reply related
* [PATCH] rv: Use 0 to check for preemption enabled in opid
From: Gabriele Monaco @ 2026-05-27 7:24 UTC (permalink / raw)
To: linux-kernel, Steven Rostedt, Gabriele Monaco, Masami Hiramatsu,
linux-trace-kernel
Cc: Wen Yang, Nam Cao
Since tracepoint handlers no longer run with preemption disabled by
default, the opid monitor should count 1 in the preemption count as
preemption disabled.
Change the rule for preempt_off to preempt > 0.
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
Forgot to add this commit to the fixes series. This is required since
opid was merged with the commits enabling preemption in tracepoints,
similarly to "rv: Prevent task migration while handling per-CPU events"
---
kernel/trace/rv/monitors/opid/opid.c | 8 +-------
1 file changed, 1 insertion(+), 7 deletions(-)
diff --git a/kernel/trace/rv/monitors/opid/opid.c b/kernel/trace/rv/monitors/opid/opid.c
index 2922318c6112..3b6a85e815b8 100644
--- a/kernel/trace/rv/monitors/opid/opid.c
+++ b/kernel/trace/rv/monitors/opid/opid.c
@@ -22,14 +22,8 @@ static u64 ha_get_env(struct ha_monitor *ha_mon, enum envs_opid env, u64 time_ns
if (env == irq_off_opid)
return irqs_disabled();
else if (env == preempt_off_opid) {
- /*
- * If CONFIG_PREEMPTION is enabled, then the tracepoint itself disables
- * preemption (adding one to the preempt_count). Since we are
- * interested in the preempt_count at the time the tracepoint was
- * hit, we consider 1 as still enabled.
- */
if (IS_ENABLED(CONFIG_PREEMPTION))
- return (preempt_count() & PREEMPT_MASK) > 1;
+ return (preempt_count() & PREEMPT_MASK) > 0;
return true;
}
return ENV_INVALID_VALUE;
base-commit: 8bc67e4db64aa72732c474b44ea8622062c903f0
--
2.54.0
^ permalink raw reply related
* Re: [PATCH v3 2/2] serial: qcom-geni: Add tracepoints for Qualcomm GENI serial driver
From: Greg Kroah-Hartman @ 2026-05-27 8:17 UTC (permalink / raw)
To: Praveen Talari
Cc: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, Jiri Slaby,
Konrad Dybcio, linux-kernel, linux-trace-kernel, linux-arm-msm,
linux-serial, Mukesh Kumar Savaliya, Aniket Randive,
chandana.chiluveru, jyothi.seerapu
In-Reply-To: <c0daeac3-1c13-4389-b48d-92f3c3a1643c@oss.qualcomm.com>
On Tue, May 26, 2026 at 10:36:18AM +0530, Praveen Talari wrote:
> Hi
>
> On 22-05-2026 15:17, Greg Kroah-Hartman wrote:
> > On Mon, May 18, 2026 at 11:26:56PM +0530, Praveen Talari wrote:
> > > Add tracing to the Qualcomm GENI serial driver to improve runtime
> > > observability.
> > >
> > > Trace hooks are added at key points including termios and clock
> > > configuration, manual control get/set, interrupt handling, and data
> > > TX/RX paths.
> > >
> > > Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
> > > Signed-off-by: Praveen Talari <praveen.talari@oss.qualcomm.com>
> > > ---
> > > v2->v3:
> > > - Updated commit text(removed example as it was available on cover
> > > letter).
> > > ---
> > > drivers/tty/serial/qcom_geni_serial.c | 27 +++++++++++++++++++++++----
> > > 1 file changed, 23 insertions(+), 4 deletions(-)
> > This patch did not apply to my tree :(
> Do you mean these patches are not applied cleanly?
Yes.
> If yes, i will push on linux-next tip.
You mean rebase, right?
thanks,
greg k-h
^ permalink raw reply
* Re: [PATCH v6] tracing/eprobes: Allow use of BTF names to dereference pointers
From: Jiri Olsa @ 2026-05-27 8:59 UTC (permalink / raw)
To: Steven Rostedt
Cc: LKML, Linux trace kernel, Masami Hiramatsu, Mathieu Desnoyers,
Mark Rutland, Peter Zijlstra, Namhyung Kim, Takaya Saeki,
Douglas Raillard, Tom Zanussi, Andrew Morton, Thomas Gleixner,
Ian Rogers, Jiri Olsa
In-Reply-To: <20260521225033.56458336@fedora>
On Thu, May 21, 2026 at 10:50:33PM -0400, Steven Rostedt wrote:
> From: Steven Rostedt <rostedt@goodmis.org>
>
> Add syntax to the parsing of eprobes to be able to typecast a trace event
> field that is a pointer to a structure.
>
> Currently, a dereference must be a number, where the user has to figure
> out manually the offset of a member of a structure that they want to
> dereference.
>
> But for event probes that records a field that happens to be a pointer to
> a structure, it cannot dereference these values with BTF naming, but
> must use numerical offsets.
>
> For example, to find out what device a sk_buff is pointing to in the
> net_dev_xmit trace event, one must first use gdb to find the offsets of the
> members of the structures:
>
> (gdb) p &((struct sk_buff *)0)->dev
> $1 = (struct net_device **) 0x10
> (gdb) p &((struct net_device *)0)->name
> $2 = (char (*)[16]) 0x118
>
> And then use the raw numbers to dereference:
>
> # echo 'e:xmit net.net_dev_xmit +0x118(+0x10($skbaddr)):string' >> dynamic_events
>
> If BTF is in the kernel, then instead, the skbaddr can be typecast to
> sk_buff and use the normal dereference logic.
>
> # echo 'e:xmit net.net_dev_xmit (sk_buff)skbaddr->dev->name:string' >> dynamic_events
> # echo 1 > events/eprobes/xmit/enable
> # cat trace
> [..]
> sshd-session-1022 [000] b..2. 860.249343: xmit: (net.net_dev_xmit) arg1="enp7s0"
> sshd-session-1022 [000] b..2. 860.250061: xmit: (net.net_dev_xmit) arg1="enp7s0"
> sshd-session-1022 [000] b..2. 860.250142: xmit: (net.net_dev_xmit) arg1="enp7s0"
> sshd-session-1022 [000] b..2. 860.263553: xmit: (net.net_dev_xmit) arg1="enp7s0"
> sshd-session-1022 [000] b..2. 860.283820: xmit: (net.net_dev_xmit) arg1="enp7s0"
> sshd-session-1022 [000] b..2. 860.302716: xmit: (net.net_dev_xmit) arg1="enp7s0"
> sshd-session-1022 [000] b..2. 860.322905: xmit: (net.net_dev_xmit) arg1="enp7s0"
> sshd-session-1022 [000] b..2. 860.342828: xmit: (net.net_dev_xmit) arg1="enp7s0"
> sshd-session-1022 [000] b..2. 860.362268: xmit: (net.net_dev_xmit) arg1="enp7s0"
> sshd-session-1022 [000] b..2. 860.382335: xmit: (net.net_dev_xmit) arg1="enp7s0"
> sshd-session-1022 [000] b..2. 860.400856: xmit: (net.net_dev_xmit) arg1="enp7s0"
> sshd-session-1022 [000] b..2. 860.419893: xmit: (net.net_dev_xmit) arg1="enp7s0"
>
> The syntax is simply: (STRUCT)(FIELD)->MEMBER[->MEMBER..]
>
> Also add comments around the #else and #endif of #ifdef CONFIG_PROBE_EVENTS_BTF_ARGS
> to know what they are for.
hi,
this seems to be supported only for argument (pointer) stored in the trace record,
not the actual arguments to the tracepoint, is that right?
so I can deref worker from sched.sched_kthread_work_queue_work, like:
echo 'e:myprobe sched.sched_kthread_work_queue_work (kthread_worker)worker->flags (kthread_work)work->canceling' > dynamic_events
but I can't deref sched.sched_process_exec p->pid, like:
# echo 'e:myprobe sched.sched_process_exec (task_struct)p->pid' > dynamic_events
bash: echo: write error: Invalid argument
SNIP
> +static int handle_typecast(char *arg, struct fetch_insn **pcode,
> + struct fetch_insn *end,
> + struct traceprobe_parse_context *ctx)
> +{
> + char *tmp;
> + int ret;
> +
> + /* Currently this only works for eprobes */
> + if (!(ctx->flags & TPARG_FL_TEVENT)) {
> + trace_probe_log_err(ctx->offset, TYPECAST_NOT_EVENT);
> + return -EINVAL;
> + }
> +
> + tmp = strchr(arg, ')');
> + if (!tmp) {
> + trace_probe_log_err(ctx->offset + strlen(arg),
> + DEREF_OPEN_BRACE);
> + return -EINVAL;
> + }
> + *tmp = '\0';
> + ret = query_btf_struct(arg + 1, ctx);
> + *tmp = ')';
> +
> + if (ret < 0) {
> + trace_probe_log_err(ctx->offset + 1, NO_PTR_STRCT);
> + ret = -EINVAL;
> + goto out_put;
> + }
> +
> + ctx->flags |= TPARG_FL_TYPECAST;
> + tmp++;
> +
> + ctx->offset += tmp - arg;
> + ret = parse_btf_arg(tmp, pcode, end, ctx);
> + ctx->flags &= ~TPARG_FL_TYPECAST;
> + ctx->last_struct = NULL;
> +out_put:
> + btf_put(ctx->struct_btf);
should we zero ctx->struct_btf in case there's more type casts,
so query_btf_struct would re-init it?
jirka
^ permalink raw reply
* [PATCH] selftests/ftrace: Fix trace_marker_raw test on 64K page kernels
From: Tianchen Ding @ 2026-05-27 9:54 UTC (permalink / raw)
To: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, Shuah Khan
Cc: linux-kernel, linux-trace-kernel, linux-kselftest
On ARM64 kernels with 64K pages, the trace_marker_raw test fails because
bash's printf builtin uses stdio buffering which splits output into
multiple small write() calls to the tracefs file. Since each individual
write is within TRACE_MARKER_MAX_SIZE (4096), they all succeed, causing
the "too big" write test to incorrectly pass.
Fix by piping make_str output through dd with iflag=fullblock to
guarantee a single atomic write() syscall to trace_marker_raw.
Fixes: 37f46601383a ("selftests/tracing: Add basic test for trace_marker_raw file")
Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
---
.../selftests/ftrace/test.d/00basic/trace_marker_raw.tc | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/ftrace/test.d/00basic/trace_marker_raw.tc b/tools/testing/selftests/ftrace/test.d/00basic/trace_marker_raw.tc
index 8e905d4fe6dd..efd8263e6087 100644
--- a/tools/testing/selftests/ftrace/test.d/00basic/trace_marker_raw.tc
+++ b/tools/testing/selftests/ftrace/test.d/00basic/trace_marker_raw.tc
@@ -43,8 +43,10 @@ write_buffer() {
id=$1
size=$2
- # write the string into the raw marker
- make_str $id $size > trace_marker_raw
+ # Pipe through dd to ensure a single atomic write() syscall.
+ # Shell's printf builtin uses stdio buffering which may split the
+ # output into multiple writes.
+ make_str $id $size | dd of=trace_marker_raw bs=`expr $size + 4` iflag=fullblock
}
--
2.39.3
^ permalink raw reply related
* Re: [PATCHv4 04/13] uprobes/x86: Unmap trampoline vma object in case it's unused
From: Jiri Olsa @ 2026-05-27 9:57 UTC (permalink / raw)
To: bot+bpf-ci
Cc: oleg, peterz, mingo, mhiramat, andrii, bpf, linux-trace-kernel,
ast, daniel, martin.lau, eddyz87, yonghong.song, clm,
ihor.solodrai
In-Reply-To: <addf3472e04583c864492a9343420c1e0356b324b4a84a71145dd759d8ec367d@mail.kernel.org>
On Tue, May 26, 2026 at 09:46:49PM +0000, bot+bpf-ci@kernel.org wrote:
> > commit 68883203f0583fb1a8d79536f5c8945542974abd
> > Author: Jiri Olsa <jolsa@kernel.org>
> > Date: Tue May 26 22:58:31 2026 +0200
> >
> > uprobes/x86: Unmap trampoline vma object in case it's unused
> >
> > In case the optimization fails, we leak new-ly created trampoline
> > vma mapping (in case we just created it), let's unmap it.
> >
> > Fixes: ba2bfc97b462 ("uprobes/x86: Add support to optimize uprobes")
> > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
>
> Should this Fixes: tag point to a9fe6f54968c instead? The vma leak bug
> was actually introduced by commit a9fe6f54968c ("uprobes/x86: Remove
> struct uprobe_trampoline object"), not ba2bfc97b462. Commit a9fe6f54968c
> removed the destroy_uprobe_trampoline() cleanup call, and its commit
> message explicitly states: "The new vma leak is fixed in following
> change."
there's no a9fe6f54968c hash yet
>
> Suggested: Fixes: a9fe6f54968c ("uprobes/x86: Remove struct
> uprobe_trampoline object")
>
> > diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
> > index c8af41ed681a..af5af7d67999 100644
> > --- a/arch/x86/kernel/uprobes.c
> > +++ b/arch/x86/kernel/uprobes.c
> > @@ -677,11 +677,14 @@ static unsigned long find_nearest_trampoline(unsigned long vaddr)
> > return high_tramp;
> > }
> >
> > -static struct vm_area_struct *get_uprobe_trampoline(struct mm_struct *mm, unsigned long vaddr)
> > +static struct vm_area_struct *get_uprobe_trampoline(struct mm_struct *mm, unsigned long vaddr,
> > + bool *new_mapping)
> > {
> > VMA_ITERATOR(vmi, mm, 0);
> > struct vm_area_struct *vma;
> >
> > + *new_mapping = false;
> > +
> > if (vaddr > TASK_SIZE || vaddr < PAGE_SIZE)
> > return ERR_PTR(-EINVAL);
> >
>
> Does this fix address the VMA leak scenario across fork chains? In v2,
> sashiko-bot raised a concern about inherited-but-not-tracked VMAs:
>
> https://lore.kernel.org/bpf/20260518114208.CE128C2BCF6@smtp.kernel.org/
>
> When VM_DONTCOPY is removed, child processes inherit trampoline VMAs
> but their tracking list (mm->uprobes_state.head_tramps) is empty. When
> the child executes a uprobe, get_uprobe_trampoline() fails to find the
> inherited VMA and creates a duplicate trampoline.
>
> Child fork:
> dup_mmap()
> copies trampoline VMA to child
> mm_init_uprobes_state()
> initializes head_tramps as empty (parent's state not copied)
>
> Child executes uprobe:
> arch_uprobe_optimize()
> get_uprobe_trampoline()
> head_tramps is empty, fails to find inherited VMA
> create_uprobe_trampoline()
> find_nearest_trampoline() finds new unmapped area
> installs second duplicate trampoline VMA
>
> In a deep fork chain, each generation inherits all previous VMAs, has an
> empty head_tramps list, and creates a new VMA. Can this cause linear
> accumulation of redundant VMAs, eventually exhausting vm.max_map_count?
>
> While this patch addresses unmapping failed optimization attempts, it
> does not prevent the inherited-but-not-tracked VMA leak scenario in fork
> chains.
there's not head_tramps list anymore, was removed by earlier:
uprobes/x86: Remove struct uprobe_trampoline object
jirka
^ permalink raw reply
* Re: [PATCHv4 06/13] libbpf: Change has_nop_combo to work on top of nop10
From: Jiri Olsa @ 2026-05-27 9:57 UTC (permalink / raw)
To: bot+bpf-ci
Cc: oleg, peterz, mingo, mhiramat, andrii, jakub, bpf,
linux-trace-kernel, ast, daniel, martin.lau, eddyz87,
yonghong.song, clm, ihor.solodrai
In-Reply-To: <4ea9f26531f9e53da6c8e5c8a638ef7aeaffd6b3fb6ed20e742069783e660e05@mail.kernel.org>
On Tue, May 26, 2026 at 09:46:37PM +0000, bot+bpf-ci@kernel.org wrote:
> > commit 5ad96df841a0877eeca7d2cbe4b3f726b7fd0b72
> > Author: Jiri Olsa <jolsa@kernel.org>
> > Date: Tue May 26 22:58:33 2026 +0200
> >
> > libbpf: Change has_nop_combo to work on top of nop10
> >
> > We now expect nop combo with 10 bytes nop instead of 5 bytes nop,
> > fixing has_nop_combo to reflect that.
> >
> > Fixes: 41a5c7df4466 ("libbpf: Add support to detect nop,nop5 instructions combo for usdt probe")
> > Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
> > Acked-by: Andrii Nakryiko <andrii@kernel.org>
> > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
>
> The Fixes: tag references commit 41a5c7df4466 which originally added
> has_nop_combo() with correct nop5 support. However, the actual bug was
> introduced by commit af734a7ddb4e which changed the kernel's uprobe
> implementation from nop5 to nop10. Should the tag be:
>
> Fixes: af734a7ddb4e ("uprobes/x86: Move optimized uprobe from nop5 to nop10")
af734a7ddb4e hash is not available, the patch:
uprobes/x86: Move optimized uprobe from nop5 to nop10
is part of this patchset
>
> > diff --git a/tools/lib/bpf/usdt.c b/tools/lib/bpf/usdt.c
> > index e3710933fd52..484a4354e82b 100644
> > --- a/tools/lib/bpf/usdt.c
> > +++ b/tools/lib/bpf/usdt.c
> > @@ -305,7 +305,7 @@ struct usdt_manager *usdt_manager_new(struct bpf_object *obj)
> >
> > /*
> > * Detect kernel support for uprobe() syscall, it's presence means we can
> > - * take advantage of faster nop5 uprobe handling.
> > + * take advantage of faster nop10 uprobe handling.
> > * Added in: 56101b69c919 ("uprobes/x86: Add uprobe syscall to speed up uprobe")
> > */
>
> The comment at the end of this block references commit 56101b69c919 which
> added the uprobe syscall. Andrii Nakryiko noted in v3 review that it would
> be good to also reference the commit that switches nop5 to nop10, though he
> acknowledged the hash isn't stable until it lands. Now that af734a7ddb4e is
> available, could this be updated to reference both commits?
I proposed to change the comment after this gets merged
>
> > @@ -596,14 +596,14 @@ static int parse_usdt_spec(struct usdt_spec *spec, const struct usdt_note *note,
> > #if defined(__x86_64__)
> > static bool has_nop_combo(int fd, long off)
> > {
> > - unsigned char nop_combo[6] = {
> > - 0x90, 0x0f, 0x1f, 0x44, 0x00, 0x00 /* nop,nop5 */
> > + unsigned char nop_combo[11] = {
> > + 0x90, 0x66, 0x2e, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00,
> > };
> > - unsigned char buf[6];
> > + unsigned char buf[11];
> >
> > - if (pread(fd, buf, 6, off) != 6)
> > + if (pread(fd, buf, 11, off) != 11)
> > return false;
> > - return memcmp(buf, nop_combo, 6) == 0;
> > + return memcmp(buf, nop_combo, 11) == 0;
> > }
>
> This isn't a bug, but Jakub Sitnicki suggested in v1 review using
> ARRAY_SIZE(buf) instead of repeating the scalar value 11. The author
> preferred keeping the literal number for simplicity, though the value
> now appears in 5 locations (array declarations, pread size, pread check,
> memcmp size).
and I replied to that then
jirka
^ permalink raw reply
* Re: [PATCHv4 09/13] selftests/bpf: Change uprobe syscall tests to use nop10
From: Jiri Olsa @ 2026-05-27 9:58 UTC (permalink / raw)
To: bot+bpf-ci
Cc: oleg, peterz, mingo, mhiramat, andrii, bpf, linux-trace-kernel,
ast, daniel, martin.lau, eddyz87, yonghong.song, clm,
ihor.solodrai
In-Reply-To: <3bec8cfe67b2e52b7316d77f6907373a99ee0443a27c15ad26b90d6c5539ecdf@mail.kernel.org>
On Tue, May 26, 2026 at 09:46:38PM +0000, bot+bpf-ci@kernel.org wrote:
SNIP
> > diff --git a/tools/testing/selftests/bpf/prog_tests/usdt.c b/tools/testing/selftests/bpf/prog_tests/usdt.c
> > index 69759b27794d..fda3a298ccfc 100644
> > --- a/tools/testing/selftests/bpf/prog_tests/usdt.c
> > +++ b/tools/testing/selftests/bpf/prog_tests/usdt.c
> > @@ -252,7 +252,7 @@ extern void usdt_1(void);
> > extern void usdt_2(void);
> >
> > static unsigned char nop1[1] = { 0x90 };
> > -static unsigned char nop1_nop5_combo[6] = { 0x90, 0x0f, 0x1f, 0x44, 0x00, 0x00 };
> > +static unsigned char nop1_nop10_combo[11] = { 0x90, 0x66, 0x2e, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00 };
> >
> > static void *find_instr(void *fn, unsigned char *instr, size_t cnt)
> > {
>
> Is the loop in find_instr() adequate to find the updated instruction
> sequence? In uprobe_syscall.c, find_nop10() was updated to search up
> to 128 bytes to account for compiler-generated prologues:
>
> tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c:find_nop10() {
> for (i = 0; i < 128; i++) {
> if (!memcmp(nop10, fn + i, 10))
> return fn + i;
> }
> }
>
> But find_instr() in usdt.c only searches the first 10 bytes:
>
> tools/testing/selftests/bpf/prog_tests/usdt.c:find_instr() {
> for (i = 0; i < 10; i++) {
> if (!memcmp(instr, fn + i, cnt))
> return fn + i;
> }
> }
>
> If a modern compiler generates a prologue longer than 9 bytes for
> usdt_2(), find_instr() will prematurely terminate and cause
> subtest_optimized_attach() to fail. This concern was raised by
> reviewers in v2 and v3 of the patch series:
find_instr is ok for its usage in subtest_optimized_attach,
we first use it to verify the combo was not generated in usdt_1,
and having 128 bytes search we'd find combo from usdt_2 function
modern compilers seems to be smart emough not to generate long
prologue for usdt_2 function
jirka
^ permalink raw reply
* Re: [PATCHv4 03/13] uprobes/x86: Allow to copy uprobe trampolines on fork
From: Jiri Olsa @ 2026-05-27 9:58 UTC (permalink / raw)
To: bot+bpf-ci
Cc: oleg, peterz, mingo, mhiramat, andrii, bpf, linux-trace-kernel,
ast, daniel, martin.lau, eddyz87, yonghong.song, clm,
ihor.solodrai
In-Reply-To: <aa6bd23387138013e5b10a9d9bacf6dfbcf9a2981ddb132a3d29e3c8524c65fa@mail.kernel.org>
On Tue, May 26, 2026 at 09:46:40PM +0000, bot+bpf-ci@kernel.org wrote:
> > diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
> > index 733655bc610e..c8af41ed681a 100644
> > --- a/arch/x86/kernel/uprobes.c
> > +++ b/arch/x86/kernel/uprobes.c
> > @@ -697,7 +697,7 @@ static struct vm_area_struct *get_uprobe_trampoline(struct mm_struct *mm, unsign
> > return ERR_PTR(vaddr);
> >
> > return _install_special_mapping(mm, vaddr, PAGE_SIZE,
> > - VM_READ|VM_EXEC|VM_MAYEXEC|VM_MAYREAD|VM_DONTCOPY|VM_IO,
> > + VM_READ|VM_EXEC|VM_MAYEXEC|VM_MAYREAD|VM_IO,
> > &tramp_mapping);
> > }
>
> When a process with optimized uprobes is forked, the child inherits the
> optimized code and trampoline. This means sys_uprobe invocations in the
> child will go into the uprobe trampoline, and the syscall will keep
> returning -EPROTO since no handler is registered.
>
> Is there a way to restore the original memory pages for clones?
>
> This behavior seems suboptimal from a performance perspective. For
> standard int3-based uprobes, breakpoints are removed via
> UPROBE_HANDLER_REMOVE. Could something similar be implemented for
> optimized uprobes, perhaps with proper filtering in uprobe_multi?
>
> Reference: https://lore.kernel.org/bpf/CAEf4BzYo-8PAXFJt9MHoUn9ux1O2YVxJADC0tGSsacVu_R8Stw@mail.gmail.com/
and I replied in that thread that we can address that later:
https://lore.kernel.org/bpf/ahNzlAfUnymDDGOR@krava/
jirka
^ permalink raw reply
* Re: [PATCHv4 02/13] uprobes/x86: Remove struct uprobe_trampoline object
From: Jiri Olsa @ 2026-05-27 9:58 UTC (permalink / raw)
To: bot+bpf-ci
Cc: oleg, peterz, mingo, mhiramat, andrii, bpf, linux-trace-kernel,
ast, daniel, martin.lau, eddyz87, yonghong.song, clm,
ihor.solodrai
In-Reply-To: <22e6111c3aa950ec8eaa29c776670e6f834f82249190a652b2afa9a9cb551d76@mail.kernel.org>
On Tue, May 26, 2026 at 09:46:44PM +0000, bot+bpf-ci@kernel.org wrote:
> > diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
> > index 2be6707e3320..733655bc610e 100644
> > --- a/arch/x86/kernel/uprobes.c
> > +++ b/arch/x86/kernel/uprobes.c
>
> [ ... ]
>
> > @@ -1111,21 +1051,18 @@ int set_orig_insn(struct arch_uprobe *auprobe, struct vm_area_struct *vma,
> > static int __arch_uprobe_optimize(struct arch_uprobe *auprobe, struct mm_struct *mm,
> > unsigned long vaddr)
> > {
> > - struct uprobe_trampoline *tramp;
> > - struct vm_area_struct *vma;
> > - bool new = false;
> > - int err = 0;
> > + struct pt_regs *regs = task_pt_regs(current);
> > + struct vm_area_struct *vma, *tramp;
> >
> > + if (!user_64bit_mode(regs))
> > + return -EINVAL;
> > vma = find_vma(mm, vaddr);
> > if (!vma)
> > return -EINVAL;
> > - tramp = get_uprobe_trampoline(vaddr, &new);
> > - if (!tramp)
> > - return -EINVAL;
> > - err = swbp_optimize(auprobe, vma, vaddr, tramp->vaddr);
> > - if (WARN_ON_ONCE(err) && new)
> > - destroy_uprobe_trampoline(tramp);
> > - return err;
> > + tramp = get_uprobe_trampoline(mm, vaddr);
> > + if (IS_ERR(tramp))
> > + return PTR_ERR(tramp);
> > + return WARN_ON_ONCE(swbp_optimize(auprobe, vma, vaddr, tramp->vm_start));
> > }
>
> When swbp_optimize() fails in __arch_uprobe_optimize() and the preceding
> get_uprobe_trampoline() call created a new trampoline VMA via
> _install_special_mapping(), does the new VMA leak in the process address
> space?
>
> The pre-patch code tracked whether the trampoline was newly created and
> called destroy_uprobe_trampoline() on the failure path:
>
> err = swbp_optimize(auprobe, vma, vaddr, tramp->vaddr);
> if (WARN_ON_ONCE(err) && new)
> destroy_uprobe_trampoline(tramp);
> return err;
>
> The new code drops that cleanup entirely:
>
> return WARN_ON_ONCE(swbp_optimize(auprobe, vma, vaddr, tramp->vm_start));
>
> Although the old destroy_uprobe_trampoline only freed the struct (not the
> underlying VMA), the new code appears to introduce a VMA leak: the freshly
> mapped PAGE_SIZE special mapping in the user's address space stays mapped
> even though optimization failed. arch_uprobe_optimize() then sets
> ARCH_UPROBE_FLAG_OPTIMIZE_FAIL so subsequent calls won't retry, leaving the
> orphan trampoline mapping in the address space until exit_mmap() reaps it at
> process teardown.
>
> The commit message mentions: "Note the original code called
> destroy_uprobe_trampoline if the optimiation failed, but it only freed the
> struct uprobe_trampoline object, not the vma. The new vma leak is fixed in
> following change."
>
> Is the VMA leak addressed in the subsequent commit in this series?
yes, in:
[1] uprobes/x86: Unmap trampoline vma object in case it's unused
>
> A secondary behaviour change is that 'return WARN_ON_ONCE(swbp_optimize(...))'
> now returns the boolean truth value of the error (0 or 1) instead of the
> original errno. While the current caller (arch_uprobe_optimize) only treats
> the value as boolean, could this surprise a future caller that propagates the
> return code?
ah ok, this is actualy 'fixed' in [1] above, but yea we should
fix that directly in this change, will do
>
> Commit message:
>
> > uprobes/x86: Remove struct uprobe_trampoline object
> >
> > Removing struct uprobe_trampoline object and it's tracking code,
> > because it's not needed. We can do same thing directly on top of
> > struct vm_area_struct objects.
> >
> > This makes the code simpler and allows easy propagation of the
> > trampoline vma object into child process in following change.
> >
> > Note the original code called destroy_uprobe_trampoline if the
> > optimiation failed, but it only freed the struct uprobe_trampoline
> > object, not the vma. The new vma leak is fixed in following change.
> >
> > Acked-by: Andrii Nakryiko <andrii@kernel.org>
> > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
>
> CI bot suggested adding a Fixes tag in v3: 'Fixes: ba2bfc97b462
> ("uprobes/x86: Add support to optimize uprobes")' because the commit message
> notes a VMA resource leak in the original code.
>
> Reference: https://lore.kernel.org/bpf/8861cc7af8e0dfff2674ca4ae6381ea2d82c242dec115774e8e11048bde358c4@mail.kernel.org/
>
> The author responded that 'the logic stays' and a separate fix is needed, but
> did not explain why the Fixes tag should not be included. If this patch is
> documenting a pre-existing bug (even if the full fix comes later), should it
> include a Fixes tag for backporting purposes?
there's not functional change in this patch, the vma leak fix is in [1]
jirka
^ permalink raw reply
* Re: [PATCHv4 12/13] selftests/bpf: Add tests for uprobe nop10 red zone clobbering
From: Jiri Olsa @ 2026-05-27 10:26 UTC (permalink / raw)
To: bot+bpf-ci
Cc: oleg, peterz, mingo, mhiramat, andrii, bpf, linux-trace-kernel,
ast, daniel, martin.lau, eddyz87, yonghong.song, clm,
ihor.solodrai
In-Reply-To: <623e280bfab376da047f79dd9ef8f646c26f38fdb0348fce7f8f8108a1261ee4@mail.kernel.org>
On Tue, May 26, 2026 at 09:46:42PM +0000, bot+bpf-ci@kernel.org wrote:
> > diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
> > index 969f4deba9fd..efff0c515184 100644
> > --- a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
> > +++ b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
>
> [ ... ]
>
> > @@ -855,6 +897,37 @@ static void test_uprobe_race(void)
> > #define __NR_uprobe 336
> > #endif
> >
> > +static void test_uprobe_red_zone(void)
> > +{
> > + struct uprobe_syscall_executed *skel;
> > + struct bpf_link *link;
> > + void *nop10_addr;
> > + size_t offset;
> > + int i;
> > +
> > + nop10_addr = find_nop10(uprobe_red_zone_test);
> > + if (!ASSERT_NEQ(nop10_addr, NULL, "find_nop10"))
> > + return;
> > +
> > + skel = uprobe_syscall_executed__open_and_load();
> > + if (!ASSERT_OK_PTR(skel, "open_and_load"))
> > + return;
> > +
> > + offset = get_uprobe_offset(nop10_addr);
> > + link = bpf_program__attach_uprobe_opts(skel->progs.test_uprobe,
> > + 0, "/proc/self/exe", offset, NULL);
> > + if (!ASSERT_OK_PTR(link, "attach_uprobe"))
> > + goto cleanup;
> > +
> > + for (i = 0; i < 10; i++)
> > + ASSERT_EQ(uprobe_red_zone_test(), 0, "red_zone_intact");
>
> Does this loop execute too quickly to guarantee the uprobe optimization has
> completed?
>
> Uprobe optimization is performed asynchronously by a worker thread. Since this
> is a tight loop with no delays or blocking, the 10 iterations might finish
> before the background worker patches the code.
uprobe is optimized synchronously whithin its first hit
>
> If all hits execute on the unoptimized INT3 path, the test could falsely pass
> on kernels where the CALL-based optimization bug is present.
>
> (This concern was raised by sashiko-bot@kernel.org in v1 and remains
> unaddressed:
> https://lore.kernel.org/bpf/20260514212219.BF1F3C2BCB3@smtp.kernel.org/)
>
> > +
> > + bpf_link__destroy(link);
> > +
> > +cleanup:
> > + uprobe_syscall_executed__destroy(skel);
> > +}
>
> [ ... ]
>
> > diff --git a/tools/testing/selftests/bpf/usdt_2.c b/tools/testing/selftests/bpf/usdt_2.c
> > index b359b389f6c0..5e38f8605b02 100644
> > --- a/tools/testing/selftests/bpf/usdt_2.c
> > +++ b/tools/testing/selftests/bpf/usdt_2.c
> > @@ -13,4 +13,17 @@ void usdt_2(void)
> > USDT(optimized_attach, usdt_2);
> > }
> >
> > +static volatile unsigned long usdt_red_zone_arg1 = 0xDEADBEEF;
> > +static volatile unsigned long usdt_red_zone_arg2 = 0xCAFEBABE;
> > +static volatile unsigned long usdt_red_zone_arg3 = 0xFEEDFACE;
> > +
> > +void __attribute__((noinline)) usdt_red_zone_trigger(void)
> > +{
> > + unsigned long a1 = usdt_red_zone_arg1;
> > + unsigned long a2 = usdt_red_zone_arg2;
> > + unsigned long a3 = usdt_red_zone_arg3;
> > +
> > + USDT(optimized_attach, usdt_red_zone, a1, a2, a3);
> > +}
>
> Is this test reliable across different compiler versions and optimizations?
>
> The USDT macro uses the "nor" inline assembly constraint, which allows the
> compiler to allocate these operands to CPU registers.
>
> Because there is no register pressure in usdt_red_zone_trigger(), modern
> compilers might allocate a1, a2, and a3 directly to registers rather than the
> stack's red zone.
>
> If the operands are placed in registers, the uprobe CALL optimization
> clobbering [rsp-8] will not corrupt them. This could cause the test to pass
> even on buggy kernels, creating a false positive.
>
> Would it be safer to explicitly force these operands into the red zone using
> inline assembly constraints, rather than depending on the compiler's register
> allocator?
I think it's ok the way it is, usdt_red_zone_trigger uses redzone on x86_64 build:
0000000000830d02 <usdt_red_zone_trigger>:
830d02: 55 push %rbp
830d03: 48 89 e5 mov %rsp,%rbp
830d06: 48 8b 05 6b f3 44 03 mov 0x344f36b(%rip),%rax # 3c80078 <usdt_red_zone_arg1>
830d0d: 48 89 45 f8 mov %rax,-0x8(%rbp)
830d11: 48 8b 05 68 f3 44 03 mov 0x344f368(%rip),%rax # 3c80080 <usdt_red_zone_arg2>
830d18: 48 89 45 f0 mov %rax,-0x10(%rbp)
830d1c: 48 8b 05 65 f3 44 03 mov 0x344f365(%rip),%rax # 3c80088 <usdt_red_zone_arg3>
830d23: 48 89 45 e8 mov %rax,-0x18(%rbp)
...
jirka
^ permalink raw reply
* Re: [PATCHv4 09/13] selftests/bpf: Change uprobe syscall tests to use nop10
From: Jakub Sitnicki @ 2026-05-27 10:30 UTC (permalink / raw)
To: Jiri Olsa
Cc: Oleg Nesterov, Peter Zijlstra, Ingo Molnar, Masami Hiramatsu,
Andrii Nakryiko, bpf, linux-trace-kernel
In-Reply-To: <20260526205840.173790-10-jolsa@kernel.org>
On Tue, May 26, 2026 at 10:58 PM +02, Jiri Olsa wrote:
> Optimized uprobes are now on top of 10-bytes nop instructions,
> reflect that in existing tests.
>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
[...]
> @@ -430,9 +432,11 @@ static void *check_attach(struct uprobe_syscall_executed *skel, trigger_t trigge
>
> static void check_detach(void *addr, void *tramp)
> {
> + static const char nop10_prefix[] = { 0x66, 0x2e, 0x0f, 0x1f, 0x84 };
> +
> /* [uprobes_trampoline] stays after detach */
> ASSERT_OK(find_uprobes_trampoline(tramp), "uprobes_trampoline");
> - ASSERT_OK(memcmp(addr, nop5, 5), "nop5");
> + ASSERT_OK(memcmp(addr, nop10_prefix, 5), "nop10_prefix");
> }
>
> static void check(struct uprobe_syscall_executed *skel, struct bpf_link *link,
Nit: You could just do `memcmp(addr, nop10, 5)` to match the prefix.
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
^ permalink raw reply
* Re: [PATCHv4 10/13] selftests/bpf: Change uprobe/usdt trigger bench code to use nop10
From: Jakub Sitnicki @ 2026-05-27 10:46 UTC (permalink / raw)
To: Jiri Olsa
Cc: Oleg Nesterov, Peter Zijlstra, Ingo Molnar, Masami Hiramatsu,
Andrii Nakryiko, bpf, linux-trace-kernel
In-Reply-To: <20260526205840.173790-11-jolsa@kernel.org>
On Tue, May 26, 2026 at 10:58 PM +02, Jiri Olsa wrote:
> Changing uprobe/usdt trigger bench code to use nop10 instead
> of nop5. Also changing run_bench_uprobes.sh to use nop10 triggers.
>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
^ permalink raw reply
* Re: [PATCHv4 11/13] selftests/bpf: Add reattach tests for uprobe syscall
From: Jakub Sitnicki @ 2026-05-27 11:32 UTC (permalink / raw)
To: Jiri Olsa
Cc: Oleg Nesterov, Peter Zijlstra, Ingo Molnar, Masami Hiramatsu,
Andrii Nakryiko, bpf, linux-trace-kernel
In-Reply-To: <20260526205840.173790-12-jolsa@kernel.org>
On Tue, May 26, 2026 at 10:58 PM +02, Jiri Olsa wrote:
> Adding reattach tests for uprobe syscall tests to make sure
> we can re-attach and optimize same uprobe multiple times.
>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
> .../selftests/bpf/prog_tests/uprobe_syscall.c | 114 ++++++++++++++++--
> 1 file changed, 104 insertions(+), 10 deletions(-)
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
> index 0868fb9793e0..969f4deba9fd 100644
> --- a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
> +++ b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
> @@ -430,23 +430,28 @@ static void *check_attach(struct uprobe_syscall_executed *skel, trigger_t trigge
> return tramp;
> }
>
> -static void check_detach(void *addr, void *tramp)
> +static bool check_detach(void *addr, void *tramp)
> {
> static const char nop10_prefix[] = { 0x66, 0x2e, 0x0f, 0x1f, 0x84 };
> + bool ok = true;
>
> /* [uprobes_trampoline] stays after detach */
> - ASSERT_OK(find_uprobes_trampoline(tramp), "uprobes_trampoline");
> - ASSERT_OK(memcmp(addr, nop10_prefix, 5), "nop10_prefix");
> + if (!ASSERT_OK(find_uprobes_trampoline(tramp), "uprobes_trampoline"))
> + ok = false;
> + if (!ASSERT_OK(memcmp(addr, nop10_prefix, 5), "nop10_prefix"))
> + ok = false;
> + return ok;
> }
Nit: Maybe apply the same pattern you used in
progs/get_func_args_test.c?
ok &= ASSERT_OK(...)
ok &= ASSERT_OK(...)
>
> -static void check(struct uprobe_syscall_executed *skel, struct bpf_link *link,
> - trigger_t trigger, void *addr, int executed)
> +static void *check(struct uprobe_syscall_executed *skel, struct bpf_link *link,
> + trigger_t trigger, void *addr, int executed)
Nit: Kinda wish that was called check_attach_detach().
> {
> void *tramp;
>
> tramp = check_attach(skel, trigger, addr, executed);
> bpf_link__destroy(link);
> check_detach(addr, tramp);
> + return tramp;
> }
>
> static void test_uprobe_legacy(void)
> @@ -457,6 +462,7 @@ static void test_uprobe_legacy(void)
> );
> struct bpf_link *link;
> unsigned long offset;
> + void *tramp;
>
> offset = get_uprobe_offset(&uprobe_test);
> if (!ASSERT_GE(offset, 0, "get_uprobe_offset"))
> @@ -474,7 +480,28 @@ static void test_uprobe_legacy(void)
> if (!ASSERT_OK_PTR(link, "bpf_program__attach_uprobe_opts"))
> goto cleanup;
>
> - check(skel, link, uprobe_test, uprobe_test, 2);
> + tramp = check(skel, link, uprobe_test, uprobe_test, 2);
> +
> + /* reattach and detach without triggering optimization */
> + link = bpf_program__attach_uprobe_opts(skel->progs.test_uprobe,
> + 0, "/proc/self/exe", offset, NULL);
> + if (!ASSERT_OK_PTR(link, "bpf_program__attach_uprobe_opts"))
> + goto cleanup;
> +
In theory we're missing a check here that an unoptimized uprobe was
installed. If nothing happened at all between the last check() and
check_destroy() below, the test would still pass.
Applies to the three similar changes after that one as well.
> + bpf_link__destroy(link);
> + if (!check_detach(uprobe_test, tramp))
> + goto cleanup;
> +
> + uprobe_test();
> + ASSERT_EQ(skel->bss->executed, 2, "executed_no_probe");
> +
> + /* reattach with triggering optimization */
> + link = bpf_program__attach_uprobe_opts(skel->progs.test_uprobe,
> + 0, "/proc/self/exe", offset, NULL);
> + if (!ASSERT_OK_PTR(link, "bpf_program__attach_uprobe_opts"))
> + goto cleanup;
> +
> + check(skel, link, uprobe_test, uprobe_test, 4);
>
> /* uretprobe */
> skel->bss->executed = 0;
[...]
^ permalink raw reply
* [PATCHv6 bpf-next 00/29] bpf: tracing_multi link
From: Jiri Olsa @ 2026-05-27 11:39 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
Cc: Hengqi Chen, bpf, linux-trace-kernel, Martin KaFai Lau,
Eduard Zingerman, Song Liu, Yonghong Song, Menglong Dong,
Steven Rostedt
hi,
adding tracing_multi link support that allows fast attachment
of tracing program to many functions.
RFC: https://lore.kernel.org/bpf/20260203093819.2105105-1-jolsa@kernel.org/
v1: https://lore.kernel.org/bpf/20260220100649.628307-1-jolsa@kernel.org/
v2: https://lore.kernel.org/bpf/20260304222141.497203-1-jolsa@kernel.org/
v3: https://lore.kernel.org/bpf/20260316075138.465430-1-jolsa@kernel.org/
v4: https://lore.kernel.org/bpf/20260324081846.2334094-1-jolsa@kernel.org/
v5: https://lore.kernel.org/bpf/20260417192502.194548-1-jolsa@kernel.org/
v6 changes:
- move ftrace_hash_count declaration under CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS [sashiko]
- fix ftrace_hash_remove check/deref [sashiko]
- disable context access for multi programs by using stub function with no arguments
for verification [sashiko]
- add __used for bpf_multi_func, and removed arguments, we do not allow direct access [sashiko]
- rebased on latest loongarch changes, fix ppc build
- guard update_ftrace_direct_del with ftrace_hash_count on rollback [sashiko]
- fix noreturn attachment condition in bpf_check_attach_btf_id_multi [sashiko]
- fail early on multiple same IDs provided by user [sashiko]
- fix selftests error paths [sashiko]
- add MAX_RESOLVE_DEPTH check to btf_get_type_size [sashiko]
- use btf__pointer_size [sashiko]
- fixed compilation on powerpc [sashiko]
- added verifier fails selftest
- after discussing with Song, it was determined that cleaning up FTRACE_OPS_CMD_DISABLE_SHARE_IPMODIFY_PEER
is not strictly necessary — keeping the trampoline in the ipmodify_enabled state is acceptable.
The race condition this introduces remains unlikely, so the concern raised in [1] will not be
addressed at this time.
[1] https://lore.kernel.org/bpf/aec7bAbGlnEo3R1g@krava/
v5 changes:
- add dedicated hashes used for detach, so there's no need to allocate
them on detach [sashiko]
- safely release old trampoline images [sashiko]
- add cond_resched() to couple of loops [sashiko]
- validate attr->link_create.target_fd [sashiko]
- allow only bpf_get_func_ret() for return value retrieval [sashiko]
- do not allow attachment of fexit/fsession_multi for noreturn functions [sashiko]
- fixed double free/close in libbpf btf cleanup, in separate patch [sashiko]
- make btf_type_is_traceable_func closer to btf_distill_func_proto [sashiko]
- add prog->attach_btf_obj_fd check to collect_func_ids_by_glob,
to check we don't load module programs for kernel [sashiko]
- make sure program is loaded in bpf_program__attach_tracing_multi [sashiko]
- several selftests fixes [sashiko]
- add attach_type to fdinfo output [Leon Hwang]
- selftests cleanup fixes [Leon Hwang]
v4 changes:
- unlink rollback fix (added ftrace_hash_count) [bot]
- use const for some bpf_link_create_opts tracing_multi members [bot]
- adding missing comment for lockdep keys [bot]
- selftest error path fixes (leaks) and other assorted test fixes [Leon Hwang]
- several compile fixes wrt CONFIG_BPF_SYSCALL and CONFIG_BPF_JIT [kernel test robot]
- make ftrace_hash_clear global, because it's needed in rollback
v3 changes:
- fix module parsing [Leon Hwang]
- use function traceable check from libbpf [Leon Hwang]
- use ptr_to_u64 and fix/updated few comments [ci]
- display cookies as decimal numbers [ci]
- added link_create.flags check [ci]
- fix error path in bpf_trampoline_multi_detach [ci]
- make fentry/fexit.multi not extendable [ci]
- add missing OPTS_VALID to bpf_program__attach_tracing_multi [ci]
v2 changes:
- allocate data.unreg in bpf_trampoline_multi_attach for rollback path [ci]
and fixed link count setup in rollback path [ci]
- several small assorted fixes [ci]
- added loongarch and powerpc changes for struct bpf_tramp_node change
- added support to attach functions from modules
- added tests for sleepable programs
- added rollback tests
v1 changes:
- added ftrace_hash_count as wrapper for hash_count [Steven]
- added trampoline mutex pool [Andrii]
- reworked 'struct bpf_tramp_node' separatoin [Andrii]
- the 'struct bpf_tramp_node' now holds pointer to bpf_link,
which is similar to what we do for uprobe_multi;
I understand it's not a fundamental change compared to previous
version which used bpf_prog pointer instead, but I don't see better
way of doing this.. I'm happy to discuss this further if there's
better idea
- reworked 'struct bpf_fsession_link' based on bpf_tramp_node
- made btf__find_by_glob_kind function internal helper [Andrii]
- many small assorted fixes [Andrii,CI]
- added session support [Leon Hwang]
- added cookies support
- added more tests
Note I plan to send linkinfo support separately, the patchset is big enough.
thanks,
jirka
Cc: Hengqi Chen <hengqi.chen@gmail.com>
---
Jiri Olsa (29):
ftrace: Add ftrace_hash_count function
ftrace: Add ftrace_hash_remove function
ftrace: Add add_ftrace_hash_entry function
bpf: Use mutex lock pool for bpf trampolines
bpf: Add struct bpf_trampoline_ops object
bpf: Move trampoline image setup into bpf_trampoline_ops callbacks
bpf: Add bpf_trampoline_add/remove_prog functions
bpf: Add struct bpf_tramp_node object
bpf: Factor fsession link to use struct bpf_tramp_node
bpf: Add multi tracing attach types
bpf: Move sleepable verification code to btf_id_allow_sleepable
bpf: Add bpf_trampoline_multi_attach/detach functions
bpf: Add support for tracing multi link
bpf: Add support for tracing_multi link cookies
bpf: Add support for tracing_multi link session
bpf: Add support for tracing_multi link fdinfo
libbpf: Add bpf_object_cleanup_btf function
libbpf: Add bpf_link_create support for tracing_multi link
libbpf: Add btf_type_is_traceable_func function
libbpf: Add support to create tracing multi link
selftests/bpf: Add tracing multi skel/pattern/ids attach tests
selftests/bpf: Add tracing multi skel/pattern/ids module attach tests
selftests/bpf: Add tracing multi intersect tests
selftests/bpf: Add tracing multi cookies test
selftests/bpf: Add tracing multi session test
selftests/bpf: Add tracing multi attach fails test
selftests/bpf: Add tracing multi verifier fails test
selftests/bpf: Add tracing multi attach benchmark test
selftests/bpf: Add tracing multi attach rollback tests
arch/arm64/net/bpf_jit_comp.c | 58 ++--
arch/loongarch/net/bpf_jit.c | 52 ++--
arch/powerpc/net/bpf_jit_comp.c | 54 ++--
arch/riscv/net/bpf_jit_comp64.c | 52 ++--
arch/s390/net/bpf_jit_comp.c | 44 +--
arch/x86/net/bpf_jit_comp.c | 54 ++--
include/linux/bpf.h | 117 ++++++--
include/linux/bpf_types.h | 1 +
include/linux/bpf_verifier.h | 4 +
include/linux/btf_ids.h | 1 +
include/linux/ftrace.h | 4 +
include/linux/trace_events.h | 6 +
include/uapi/linux/bpf.h | 9 +
kernel/bpf/bpf_struct_ops.c | 27 +-
kernel/bpf/fixups.c | 2 +
kernel/bpf/syscall.c | 83 +++---
kernel/bpf/trampoline.c | 670 ++++++++++++++++++++++++++++++++++----------
kernel/bpf/verifier.c | 179 +++++++++---
kernel/trace/bpf_trace.c | 186 ++++++++++++-
kernel/trace/ftrace.c | 35 ++-
net/bpf/bpf_dummy_struct_ops.c | 14 +-
net/bpf/test_run.c | 3 +
tools/include/uapi/linux/bpf.h | 10 +
tools/lib/bpf/bpf.c | 9 +
tools/lib/bpf/bpf.h | 5 +
tools/lib/bpf/libbpf.c | 374 ++++++++++++++++++++++++-
tools/lib/bpf/libbpf.h | 15 +
tools/lib/bpf/libbpf.map | 1 +
tools/lib/bpf/libbpf_internal.h | 1 +
tools/testing/selftests/bpf/Makefile | 9 +-
tools/testing/selftests/bpf/prog_tests/tracing_multi.c | 930 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
tools/testing/selftests/bpf/progs/tracing_multi_attach.c | 39 +++
tools/testing/selftests/bpf/progs/tracing_multi_attach_module.c | 25 ++
tools/testing/selftests/bpf/progs/tracing_multi_bench.c | 12 +
tools/testing/selftests/bpf/progs/tracing_multi_check.c | 214 ++++++++++++++
tools/testing/selftests/bpf/progs/tracing_multi_fail.c | 18 ++
tools/testing/selftests/bpf/progs/tracing_multi_intersect_attach.c | 41 +++
tools/testing/selftests/bpf/progs/tracing_multi_rollback.c | 43 +++
tools/testing/selftests/bpf/progs/tracing_multi_session_attach.c | 47 ++++
tools/testing/selftests/bpf/progs/tracing_multi_verifier.c | 31 +++
tools/testing/selftests/bpf/trace_helpers.c | 6 +-
tools/testing/selftests/bpf/trace_helpers.h | 1 +
42 files changed, 3057 insertions(+), 429 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/tracing_multi.c
create mode 100644 tools/testing/selftests/bpf/progs/tracing_multi_attach.c
create mode 100644 tools/testing/selftests/bpf/progs/tracing_multi_attach_module.c
create mode 100644 tools/testing/selftests/bpf/progs/tracing_multi_bench.c
create mode 100644 tools/testing/selftests/bpf/progs/tracing_multi_check.c
create mode 100644 tools/testing/selftests/bpf/progs/tracing_multi_fail.c
create mode 100644 tools/testing/selftests/bpf/progs/tracing_multi_intersect_attach.c
create mode 100644 tools/testing/selftests/bpf/progs/tracing_multi_rollback.c
create mode 100644 tools/testing/selftests/bpf/progs/tracing_multi_session_attach.c
create mode 100644 tools/testing/selftests/bpf/progs/tracing_multi_verifier.c
^ permalink raw reply
* [PATCHv6 bpf-next 01/29] ftrace: Add ftrace_hash_count function
From: Jiri Olsa @ 2026-05-27 11:39 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
Cc: bpf, linux-trace-kernel, Martin KaFai Lau, Eduard Zingerman,
Song Liu, Yonghong Song, Menglong Dong, Steven Rostedt
In-Reply-To: <20260527113951.46265-1-jolsa@kernel.org>
Adding external ftrace_hash_count function so we could get hash
count outside of ftrace object.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
include/linux/ftrace.h | 2 ++
kernel/trace/ftrace.c | 7 ++++++-
2 files changed, 8 insertions(+), 1 deletion(-)
diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 28b30c6f1031..282da661f131 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -551,6 +551,8 @@ int update_ftrace_direct_mod(struct ftrace_ops *ops, struct ftrace_hash *hash, b
void ftrace_stub_direct_tramp(void);
+unsigned long ftrace_hash_count(struct ftrace_hash *hash);
+
#else
struct ftrace_ops;
static inline unsigned long ftrace_find_rec_direct(unsigned long ip)
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index b2611de3f594..57ab01fd00bd 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -6288,11 +6288,16 @@ int modify_ftrace_direct(struct ftrace_ops *ops, unsigned long addr)
}
EXPORT_SYMBOL_GPL(modify_ftrace_direct);
-static unsigned long hash_count(struct ftrace_hash *hash)
+static inline unsigned long hash_count(struct ftrace_hash *hash)
{
return hash ? hash->count : 0;
}
+unsigned long ftrace_hash_count(struct ftrace_hash *hash)
+{
+ return hash_count(hash);
+}
+
/**
* hash_add - adds two struct ftrace_hash and returns the result
* @a: struct ftrace_hash object
--
2.54.0
^ permalink raw reply related
* [PATCHv6 bpf-next 02/29] ftrace: Add ftrace_hash_remove function
From: Jiri Olsa @ 2026-05-27 11:39 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
Cc: bpf, linux-trace-kernel, Martin KaFai Lau, Eduard Zingerman,
Song Liu, Yonghong Song, Menglong Dong, Steven Rostedt
In-Reply-To: <20260527113951.46265-1-jolsa@kernel.org>
Adding ftrace_hash_remove function that removes all entries
from struct ftrace_hash object without freeing them.
It will be used in following changes where entries are allocated
as part of another structure and are free-ed separately.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
include/linux/ftrace.h | 1 +
kernel/trace/ftrace.c | 19 +++++++++++++++++++
2 files changed, 20 insertions(+)
diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 282da661f131..d3acfac4f8a7 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -415,6 +415,7 @@ struct ftrace_hash *alloc_ftrace_hash(int size_bits);
void free_ftrace_hash(struct ftrace_hash *hash);
struct ftrace_func_entry *add_ftrace_hash_entry_direct(struct ftrace_hash *hash,
unsigned long ip, unsigned long direct);
+void ftrace_hash_remove(struct ftrace_hash *hash);
/* The hash used to know what functions callbacks trace */
struct ftrace_ops_hash {
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 57ab01fd00bd..45548b0200eb 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -1249,6 +1249,25 @@ remove_hash_entry(struct ftrace_hash *hash,
hash->count--;
}
+void ftrace_hash_remove(struct ftrace_hash *hash)
+{
+ struct ftrace_func_entry *entry;
+ struct hlist_head *hhd;
+ struct hlist_node *tn;
+ int size;
+ int i;
+
+ if (!hash || !hash->count)
+ return;
+ size = 1 << hash->size_bits;
+ for (i = 0; i < size; i++) {
+ hhd = &hash->buckets[i];
+ hlist_for_each_entry_safe(entry, tn, hhd, hlist)
+ remove_hash_entry(hash, entry);
+ }
+ FTRACE_WARN_ON(hash->count);
+}
+
static void ftrace_hash_clear(struct ftrace_hash *hash)
{
struct hlist_head *hhd;
--
2.54.0
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox