From: Gabriele Monaco <gmonaco@redhat.com>
To: wen.yang@linux.dev
Cc: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
rostedt@goodmis.org, Gabriele Monaco <gmonaco@redhat.com>
Subject: [PATCH] Re: Re: [RFC PATCH v2 04/10] rv/da: add pre-allocated storage pool for per-object monitors
Date: Fri, 15 May 2026 10:30:01 +0200 [thread overview]
Message-ID: <20260515083002.106512-1-gmonaco@redhat.com> (raw)
In-Reply-To: <668f83581c58644a84cab5e6736864a439bb8e28.camel@redhat.com>
So this is what I meant. It's quick and dirty but seems to work as far
as I could test it.
I didn't change too much around to avoid confusing more, but it probably
needs a refactor for the functions positions and names. Some AI can do
that later after we agree on how it should look.
The main idea is (using current function names):
da_handle_start_[run_]_event() calls da_prepare_storage(), this
makes sure the storage is there and usable based on the strategy:
1. da_create_storage() plain allocation with kmalloc_nolock
2. da_create_or_get_pool() get a slot from the pool
3. da_fill_empty_storage() only set the target in a storage manually
allocated before
The reason why you'd need 3. is that since da_handle_start_event() is
called from a tracepoint, you may in no way be able to allocate from
there, then you use manually somewhere else with
da_create_empty_storage() if you don't have the target and
da_create_or_get() if you do (this one is misleading, we should probably
simplify further).
The newly created 2. might be useful if you aren't on preempt-rt and
cannot sleep but also don't want a manual allocation (beware of lock
dependencies, it doesn't always work).
Now, I left your da_create_or_get_kmalloc() unwired because I don't
really see the use case (you use kmalloc_nolock because you cannot lock,
so if it fails you don't try a kmalloc). But if we really want to offer
a possibility to allocate with GFP_KERNEL, we can make 1. more
configurable.
Does this make sense to you?
Thanks,
Gabriele
---
include/rv/da_monitor.h | 160 ++++++++++-------------
kernel/trace/rv/monitors/nomiss/nomiss.c | 2 +-
kernel/trace/rv/monitors/tlob/tlob.c | 15 +--
3 files changed, 74 insertions(+), 103 deletions(-)
diff --git a/include/rv/da_monitor.h b/include/rv/da_monitor.h
index 74aa95d9a284..3b4a36245531 100644
--- a/include/rv/da_monitor.h
+++ b/include/rv/da_monitor.h
@@ -21,6 +21,7 @@
#include <linux/sched.h>
#include <linux/slab.h>
#include <linux/hashtable.h>
+#include <linux/mempool.h>
/*
* Per-cpu variables require a unique name although static in some
@@ -67,6 +68,35 @@ static struct rv_monitor rv_this;
#define da_id_type int
#endif
+#define DA_ALLOC_AUTO 0
+#define DA_ALLOC_POOL 1
+#define DA_ALLOC_MANUAL 2
+
+/*
+ * Allow the per-object monitors to run allocation manually, necessary if the
+ * start condition is in a context problematic for allocation (e.g. scheduling).
+ * In such case, if the storage was pre-allocated without a target, set it now.
+ */
+#ifndef DA_MON_ALLOCATION_STRATEGY
+#define DA_MON_ALLOCATION_STRATEGY DA_ALLOC_AUTO
+#endif
+#ifndef DA_MON_POOL_SIZE
+#define DA_MON_POOL_SIZE 0
+#endif
+#if DA_MON_ALLOCATION_STRATEGY == DA_ALLOC_MANUAL
+#define da_prepare_storage da_fill_empty_storage
+
+#elif DA_MON_ALLOCATION_STRATEGY == DA_ALLOC_POOL
+#define da_prepare_storage da_create_or_get_pool
+#if DA_MON_POOL_SIZE == 0
+#error "DA_ALLOC_POOL requires DA_MON_POOL_SIZE to be non-zero"
+#endif
+
+#else
+#define da_prepare_storage da_create_storage
+#endif /* DA_MON_ALLOCATION_STRATEGY */
+
+
static void react(enum states curr_state, enum events event)
{
rv_react(&rv_this,
@@ -448,62 +478,38 @@ static inline monitor_target da_get_target_by_id(da_id_type id)
}
/*
- * Per-object pool state.
- *
- * Zero-initialised by default (storage == NULL ⟹ kmalloc mode). A monitor
- * opts into pool mode by calling da_monitor_init_prealloc(N) instead of
- * da_monitor_init(), which sets storage to a non-NULL kcalloc'd array.
- *
- * Because every field is wrapped in this struct and the struct itself is a
- * per-TU static, each monitor that includes this header gets a completely
- * independent pool. A kmalloc monitor (e.g. nomiss) and a pool monitor
- * (e.g. tlob) therefore coexist without any interference.
- *
- * da_pool_return_cb runs from softirq on non-PREEMPT_RT, so irqsave is
- * required to prevent deadlock with task-context callers. On PREEMPT_RT
- * it runs from an rcuc kthread where spinlock_t is a sleeping lock.
- */
-struct da_per_obj_pool {
- struct da_monitor_storage *storage; /* non-NULL ⟹ pool mode */
- struct da_monitor_storage **free; /* kmalloc'd pointer stack */
- unsigned int free_top;
- spinlock_t lock;
-};
-
-static struct da_per_obj_pool da_pool = {
- .lock = __SPIN_LOCK_UNLOCKED(da_pool.lock),
-};
+ * Per-object pool state using kmem_cache and mempool.
+ */
+static struct kmem_cache *da_mon_cache;
+static mempool_t *da_mon_pool;
static void da_pool_return_cb(struct rcu_head *head)
{
struct da_monitor_storage *ms =
container_of(head, struct da_monitor_storage, rcu);
- unsigned long flags;
-
- spin_lock_irqsave(&da_pool.lock, flags);
- da_pool.free[da_pool.free_top++] = ms;
- spin_unlock_irqrestore(&da_pool.lock, flags);
+ mempool_free(ms, da_mon_pool);
}
-/* Pops a slot from the pre-allocated pool; returns -ENOSPC if exhausted. */
-static inline int da_create_or_get_pool(da_id_type id, monitor_target target)
+/* Pops a slot from the pre-allocated pool; returns NULL if exhausted. */
+static inline struct da_monitor *da_create_or_get_pool(da_id_type id,
+ monitor_target target,
+ struct da_monitor *da_mon)
{
struct da_monitor_storage *mon_storage;
- unsigned long flags;
- spin_lock_irqsave(&da_pool.lock, flags);
- if (!da_pool.free_top) {
- spin_unlock_irqrestore(&da_pool.lock, flags);
- return -ENOSPC;
- }
- mon_storage = da_pool.free[--da_pool.free_top];
- spin_unlock_irqrestore(&da_pool.lock, flags);
+ if (da_mon)
+ return da_mon;
+ mon_storage = mempool_alloc_preallocated(da_mon_pool);
+ if (!mon_storage)
+ return NULL;
+
+ memset(mon_storage, 0, sizeof(*mon_storage));
mon_storage->id = id;
mon_storage->target = target;
guard(rcu)();
hash_add_rcu(da_monitor_ht, &mon_storage->node, id);
- return 0;
+ return &mon_storage->rv.da_mon;
}
/*
@@ -547,11 +553,12 @@ static inline int da_create_or_get_kmalloc(da_id_type id, monitor_target target)
}
/* Create the per-object storage if not already there. */
-static inline int da_create_or_get(da_id_type id, monitor_target target)
+// NOTE: this is only needed for manual allocation!
+// we can refactor to have it only defined there, leaving it for now
+static inline void da_create_or_get(da_id_type id, monitor_target target)
{
- if (da_pool.storage)
- return da_create_or_get_pool(id, target);
- return da_create_or_get_kmalloc(id, target);
+ guard(rcu)();
+ da_create_storage(id, target, da_get_monitor(id, target));
}
/*
@@ -573,7 +580,7 @@ static inline void da_destroy_storage(da_id_type id)
return;
da_monitor_reset_hook(&mon_storage->rv.da_mon);
hash_del_rcu(&mon_storage->node);
- if (da_pool.storage)
+ if (DA_MON_ALLOCATION_STRATEGY == DA_ALLOC_POOL)
call_rcu(&mon_storage->rcu, da_pool_return_cb);
else
kfree_rcu(mon_storage, rcu);
@@ -591,41 +598,26 @@ static __maybe_unused void da_monitor_reset_all(void)
}
/*
- * da_monitor_init_prealloc - initialise with a pre-allocated storage pool
- *
- * Allocates @prealloc_count storage slots up-front so that da_create_or_get()
- * and da_destroy_storage() never call kmalloc/kfree. Must be called instead
- * of da_monitor_init() for monitors that require pool mode.
+ * da_monitor_init - initialise in kmalloc mode (no pre-allocation)
*/
-static inline int da_monitor_init_prealloc(unsigned int prealloc_count)
+static inline int da_monitor_init(void)
{
hash_init(da_monitor_ht);
+ if (DA_MON_ALLOCATION_STRATEGY != DA_ALLOC_POOL)
+ return 0;
- da_pool.storage = kcalloc(prealloc_count, sizeof(*da_pool.storage),
- GFP_KERNEL);
- if (!da_pool.storage)
+ da_mon_cache = kmem_cache_create(__stringify(DA_MON_NAME) "_cache",
+ sizeof(struct da_monitor_storage),
+ 0, 0, NULL);
+ if (!da_mon_cache)
return -ENOMEM;
- da_pool.free = kmalloc_array(prealloc_count, sizeof(*da_pool.free),
- GFP_KERNEL);
- if (!da_pool.free) {
- kfree(da_pool.storage);
- da_pool.storage = NULL;
+ da_mon_pool = mempool_create_slab_pool(DA_MON_POOL_SIZE, da_mon_cache);
+ if (!da_mon_pool) {
+ kmem_cache_destroy(da_mon_cache);
+ da_mon_cache = NULL;
return -ENOMEM;
}
-
- da_pool.free_top = 0;
- for (unsigned int i = 0; i < prealloc_count; i++)
- da_pool.free[da_pool.free_top++] = &da_pool.storage[i];
- return 0;
-}
-
-/*
- * da_monitor_init - initialise in kmalloc mode (no pre-allocation)
- */
-static inline int da_monitor_init(void)
-{
- hash_init(da_monitor_ht);
return 0;
}
@@ -641,11 +633,10 @@ static inline void da_monitor_destroy_pool(void)
* pending callback.
*/
rcu_barrier();
- kfree(da_pool.storage);
- da_pool.storage = NULL;
- kfree(da_pool.free);
- da_pool.free = NULL;
- da_pool.free_top = 0;
+ mempool_destroy(da_mon_pool);
+ da_mon_pool = NULL;
+ kmem_cache_destroy(da_mon_cache);
+ da_mon_cache = NULL;
}
static inline void da_monitor_destroy_kmalloc(void)
@@ -676,23 +667,12 @@ static inline void da_monitor_destroy_kmalloc(void)
*/
static inline void da_monitor_destroy(void)
{
- if (da_pool.storage)
+ if (DA_MON_ALLOCATION_STRATEGY == DA_ALLOC_POOL)
da_monitor_destroy_pool();
else
da_monitor_destroy_kmalloc();
}
-/*
- * Allow the per-object monitors to run allocation manually, necessary if the
- * start condition is in a context problematic for allocation (e.g. scheduling).
- * In such case, if the storage was pre-allocated without a target, set it now.
- */
-#ifdef DA_SKIP_AUTO_ALLOC
-#define da_prepare_storage da_fill_empty_storage
-#else
-#define da_prepare_storage da_create_storage
-#endif /* DA_SKIP_AUTO_ALLOC */
-
#endif /* RV_MON_TYPE */
#if RV_MON_TYPE == RV_MON_GLOBAL || RV_MON_TYPE == RV_MON_PER_CPU
diff --git a/kernel/trace/rv/monitors/nomiss/nomiss.c b/kernel/trace/rv/monitors/nomiss/nomiss.c
index 31f90f3638d8..f089cc0e2f10 100644
--- a/kernel/trace/rv/monitors/nomiss/nomiss.c
+++ b/kernel/trace/rv/monitors/nomiss/nomiss.c
@@ -18,7 +18,7 @@
#define RV_MON_TYPE RV_MON_PER_OBJ
#define HA_TIMER_TYPE HA_TIMER_WHEEL
/* The start condition is on sched_switch, it's dangerous to allocate there */
-#define DA_SKIP_AUTO_ALLOC
+#define DA_MON_ALLOCATION_STRATEGY DA_ALLOC_MANUAL
typedef struct sched_dl_entity *monitor_target;
#include "nomiss.h"
#include <rv/ha_monitor.h>
diff --git a/kernel/trace/rv/monitors/tlob/tlob.c b/kernel/trace/rv/monitors/tlob/tlob.c
index 90e7035a0b55..486a6bac5cf9 100644
--- a/kernel/trace/rv/monitors/tlob/tlob.c
+++ b/kernel/trace/rv/monitors/tlob/tlob.c
@@ -88,8 +88,8 @@ struct tlob_task_state {
#define RV_MON_TYPE RV_MON_PER_OBJ
#define HA_TIMER_TYPE HA_TIMER_HRTIMER
-/* Pool mode: da_handle_start_event uses da_fill_empty_storage, not kmalloc. */
-#define DA_SKIP_AUTO_ALLOC
+#define DA_MON_ALLOCATION_STRATEGY DA_ALLOC_POOL
+#define DA_MON_POOL_SIZE TLOB_MAX_MONITORED
/* Type for da_monitor_storage.target; must be defined before the includes. */
typedef struct tlob_task_state *monitor_target;
@@ -428,7 +428,6 @@ int tlob_start_task(struct task_struct *task, u64 threshold_us)
struct da_monitor *da_mon;
struct ha_monitor *ha_mon;
u64 now_ns;
- int ret;
if (!da_monitor_enabled())
return -ENODEV;
@@ -457,14 +456,6 @@ int tlob_start_task(struct task_struct *task, u64 threshold_us)
ws->last_ts = ktime_get();
raw_spin_lock_init(&ws->entry_lock);
- /* Claim a pool slot (no kmalloc; DA_SKIP_AUTO_ALLOC + prealloc). */
- ret = da_create_or_get(task->pid, ws);
- if (ret) {
- put_task_struct(task);
- kmem_cache_free(tlob_state_cache, ws);
- return ret;
- }
-
atomic_inc(&tlob_num_monitored);
/* Hold RCU across handle + timer setup to keep da_mon valid. */
@@ -955,7 +946,7 @@ static int __tlob_init_monitor(void)
atomic_set(&tlob_num_monitored, 0);
- retval = da_monitor_init_prealloc(TLOB_MAX_MONITORED);
+ retval = da_monitor_init();
if (retval) {
kmem_cache_destroy(tlob_state_cache);
tlob_state_cache = NULL;
--
2.54.0
next prev parent reply other threads:[~2026-05-15 8:30 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-11 18:24 [RFC PATCH v2 00/10] rv/tlob: Add task latency over budget RV monitor wen.yang
2026-05-11 18:24 ` [RFC PATCH v2 01/10] rv/da: fix monitor start ordering and memory ordering for monitoring flag wen.yang
2026-05-13 12:39 ` Gabriele Monaco
2026-05-11 18:24 ` [RFC PATCH v2 02/10] rv/da: fix per-task da_monitor_destroy() ordering and sync wen.yang
2026-05-12 8:27 ` Gabriele Monaco
2026-05-12 9:09 ` Gabriele Monaco
2026-05-13 5:32 ` Wen Yang
2026-05-13 9:31 ` Gabriele Monaco
2026-05-11 18:24 ` [RFC PATCH v2 03/10] selftests/verification: fix verificationtest-ktap for out-of-tree execution wen.yang
2026-05-13 8:32 ` Gabriele Monaco
2026-05-11 18:24 ` [RFC PATCH v2 04/10] rv/da: add pre-allocated storage pool for per-object monitors wen.yang
2026-05-13 13:47 ` Gabriele Monaco
2026-05-13 13:50 ` Gabriele Monaco
2026-05-13 14:01 ` Gabriele Monaco
2026-05-15 8:30 ` Gabriele Monaco [this message]
2026-05-11 18:24 ` [RFC PATCH v2 05/10] rv: add generic uprobe infrastructure for RV monitors wen.yang
2026-05-11 18:24 ` [RFC PATCH v2 06/10] rvgen: support reset() on the __init arrow for global-window HA clocks wen.yang
2026-05-12 13:25 ` Gabriele Monaco
2026-05-11 18:24 ` [RFC PATCH v2 07/10] rv/tlob: add tlob model DOT file wen.yang
2026-05-11 18:24 ` [RFC PATCH v2 08/10] rv/tlob: add tlob hybrid automaton monitor wen.yang
2026-05-15 9:53 ` Gabriele Monaco
2026-05-15 13:08 ` Gabriele Monaco
2026-05-11 18:24 ` [RFC PATCH v2 09/10] rv/tlob: add KUnit tests for the tlob monitor wen.yang
2026-05-15 13:13 ` Gabriele Monaco
2026-05-11 18:24 ` [RFC PATCH v2 10/10] selftests/verification: add tlob selftests wen.yang
2026-05-13 7:46 ` Gabriele Monaco
2026-05-15 13:23 ` Gabriele Monaco
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260515083002.106512-1-gmonaco@redhat.com \
--to=gmonaco@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-trace-kernel@vger.kernel.org \
--cc=rostedt@goodmis.org \
--cc=wen.yang@linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox