From: Wen Yang <wen.yang@linux.dev>
To: Gabriele Monaco <gmonaco@redhat.com>
Cc: linux-kernel@vger.kernel.org,
Steven Rostedt <rostedt@goodmis.org>,
Nam Cao <namcao@linutronix.de>,
linux-trace-kernel@vger.kernel.org
Subject: Re: [PATCH 6/9] rv: Ensure synchronous cleanup for HA monitors
Date: Wed, 20 May 2026 00:48:11 +0800 [thread overview]
Message-ID: <5f8ceba5-ec25-4c19-a74b-ed98715e89b6@linux.dev> (raw)
In-Reply-To: <88a6fc5c08d18e3c1f6d29dc106db80fa688bf87.camel@redhat.com>
On 5/18/26 19:54, Gabriele Monaco wrote:
> On Sun, 2026-05-17 at 17:12 +0800, Wen Yang wrote:
>> The guard(rcu)() + synchronize_rcu() mechanism for HA timer callbacks
>> is correct.
>>
>> One concern: TOCTOU between the pre-check and guard(rcu)().
>
> Yes, this could happen, but I'm not sure it's really a big issue:
>
>>
>> da_monitor_reset() calls reset_hook BEFORE clearing monitoring:
>>
>> da_monitor_reset_hook(da_mon); /* ha_cancel_timer [async] */
>> WRITE_ONCE(da_mon->monitoring, 0); /* cleared AFTER reset_hook */
>> da_mon->curr_state = model_get_initial_state();
>>
>> This may creates a window where the callback pre-check passes but the
>> monitor is reset before guard(rcu)() is acquired:
>
> If a callback is running, there was a violation because the timer expired, so it
> isn't wrong to report, although we are unloading the monitor.
>
>>
>> /* __ha_monitor_timer_callback() */
>> if (unlikely(!da_monitor_handling_event(&ha_mon->da_mon)))
>> return;
>>
>> /* passes: monitoring=1
>> *
>> * WINDOW ─ CPU A runs da_monitor_reset_all() here:
>> * ha_cancel_timer() [returns: callback is running, cannot cancel]
>> * WRITE_ONCE(monitoring, 0)
>> * curr_state = model_get_initial_state()
>> */
>> guard(rcu)();
>> curr_state = READ_ONCE(ha_mon->da_mon.curr_state); /* initial_state */
>> /* no second da_monitoring() check */
>> ha_react(curr_state, EVENT_NONE, env_string.buffer); /* spurious call */
>> ha_trace_error_env(ha_mon, ...); /* fires
>> unconditionally */
>>
>> Result: spurious ha_trace_error_env() for initial_state. For existing
>> monitors (stall/nomiss/opid), model_should_send_event_env(initial, NONE)
>> returns false, so no false-positive reaction, but the trace event fires.
>> Monitors where initial_state carries a constraint would produce a false
>> positive.
>
> I'm not sure what you mean here, if I understand the situation correctly: the
> callback is running (so we should react), da_monitor_reset() is too late to stop
> it but somehow manages to reset curr_state on time for the callback to see it
> change: react reports the wrong state in an otherwise valid reaction.
>
>>
>> Proposed fix : re-check inside the RCU critical section:
>>
>> guard(rcu)();
>> if (unlikely(!da_monitoring(&ha_mon->da_mon))) /* re-check here */
>> return;
>> curr_state = READ_ONCE(ha_mon->da_mon.curr_state);
>
> I'm not sure that's going to fix it anyway, RCU cannot synchronise readers,
> checking again would at most (mildly) reduce the race window, not remove it.
>
> What we could do is to play with barriersin for the callback to either:
> * see monitoring = 1 AND the old curr_state
> * see monitoring = 0 AND the new curr_state
>
> Something like:
>
> void __ha_monitor_timer_callback() {
> guard(rcu)(); //this is only for waiters, let them wait more
>
> if (unlikely(!da_monitor_handling_event(&ha_mon->da_mon)))
> return;
> smp_rmb();
> curr_state = READ_ONCE(ha_mon->da_mon.curr_state);
> ...
> }
>
> void da_monitor_reset() {
> da_monitor_reset_hook(da_mon);
> WRITE_ONCE(da_mon->monitoring, 0);
> smp_wmb();
> WRITE_ONCE(da_mon->curr_state, model_get_initial_state());
> }
>
> Coupled with your patch [1] adding more atomic accesses to da_mon->monitoring
> should probably do the trick.
>
> Am I missing anything?
>
The goal is right. One thing worth double-checking is the load order
in the callback against the "SMP BARRIER PAIRING" section of
Documentation/memory-barriers.txt, which states:
[!] Note that the stores before the write barrier would normally be
expected to match the loads after the read barrier or the
address-dependency barrier, and vice versa ...
So, we should to swap the read order in the callback so that it matches
the standard pattern:
void __ha_monitor_timer_callback() {
guard(rcu)();
curr_state = READ_ONCE(ha_mon->da_mon.curr_state); /* B:
before rmb */
smp_rmb();
if (unlikely(!da_monitoring(&ha_mon->da_mon))) /* A:
after rmb */
return;
/*
* Reached here: monitoring = 1 (old_A).
* Standard wmb/rmb guarantee: curr_state (read before rmb) is also
* old, i.e. not initial_state.
*/
ha_react(curr_state, EVENT_NONE, env_string.buffer);
...
}
void da_monitor_reset() {
da_monitor_reset_hook(da_mon);
WRITE_ONCE(da_mon->monitoring, 0); /* A: before wmb */
smp_wmb();
WRITE_ONCE(da_mon->curr_state, model_get_initial_state()); /*
B: after wmb */
}
--
Best wishes,
Wen
>
> [1] -
> https://lore.kernel.org/lkml/8af5ba4bd93d2acb8a546e8e47ced974a87c1eb8.1778522945.git.wen.yang@linux.dev
>
>>
>>
>> --
>> Best wishes,
>> Wen
>>
>>
>> On 5/12/26 22:02, Gabriele Monaco wrote:
>>> HA monitors may start timers, all cleanup functions currently stop the
>>> timers asynchronously to avoid sleeping in the wrong context.
>>> Nothing makes sure running callbacks terminate on cleanup.
>>>
>>> Run the entire HA timer callback in an RCU read-side critical section,
>>> this way we can simply synchronize_rcu() with any pending timer and are
>>> sure any cleanup using kfree_rcu() runs after callbacks terminated.
>>> Additionally make sure any unlikely callback running late won't run any
>>> code if the monitor is marked as disabled.
>>>
>>> Fixes: f5587d1b6ec9 ("rv: Add Hybrid Automata monitor type")
>>> Fixes: 4a24127bd6cb ("rv: Add support for per-object monitors in DA/HA")
>>> Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
>>> ---
>>> include/rv/da_monitor.h | 23 +++++++++++++++++++----
>>> include/rv/ha_monitor.h | 18 ++++++++++++++++--
>>> 2 files changed, 35 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/include/rv/da_monitor.h b/include/rv/da_monitor.h
>>> index a4a13b62d1a4..402d3b935c08 100644
>>> --- a/include/rv/da_monitor.h
>>> +++ b/include/rv/da_monitor.h
>>> @@ -57,6 +57,15 @@ static struct rv_monitor rv_this;
>>> #define da_monitor_reset_hook(da_mon)
>>> #endif
>>>
>>> +/*
>>> + * Hook to allow the implementation of hybrid automata: define it with a
>>> + * function that waits for the termination of all monitors background
>>> + * activities (e.g. all timers). This hook can sleep.
>>> + */
>>> +#ifndef da_monitor_sync_hook
>>> +#define da_monitor_sync_hook()
>>> +#endif
>>> +
>>> /*
>>> * Type for the target id, default to int but can be overridden.
>>> * A long type can work as hash table key (PER_OBJ) but will be downgraded
>>> to
>>> @@ -179,6 +188,7 @@ static inline int da_monitor_init(void)
>>> static inline void da_monitor_destroy(void)
>>> {
>>> da_monitor_reset_all();
>>> + da_monitor_sync_hook();
>>> }
>>>
>>> #ifndef da_implicit_guard
>>> @@ -232,6 +242,7 @@ static inline int da_monitor_init(void)
>>> static inline void da_monitor_destroy(void)
>>> {
>>> da_monitor_reset_all();
>>> + da_monitor_sync_hook();
>>> }
>>>
>>> #ifndef da_implicit_guard
>>> @@ -319,6 +330,7 @@ static inline void da_monitor_destroy(void)
>>> }
>>>
>>> da_monitor_reset_all();
>>> + da_monitor_sync_hook();
>>>
>>> rv_put_task_monitor_slot(task_mon_slot);
>>> task_mon_slot = RV_PER_TASK_MONITOR_INIT;
>>> @@ -497,10 +509,9 @@ static void da_monitor_reset_all(void)
>>> struct da_monitor_storage *mon_storage;
>>> int bkt;
>>>
>>> - rcu_read_lock();
>>> + guard(rcu)();
>>> hash_for_each_rcu(da_monitor_ht, bkt, mon_storage, node)
>>> da_monitor_reset(&mon_storage->rv.da_mon);
>>> - rcu_read_unlock();
>>> }
>>>
>>> static inline int da_monitor_init(void)
>>> @@ -516,13 +527,17 @@ static inline void da_monitor_destroy(void)
>>> int bkt;
>>>
>>> tracepoint_synchronize_unregister();
>>> + scoped_guard(rcu) {
>>> + hash_for_each_rcu(da_monitor_ht, bkt, mon_storage, node) {
>>> + da_monitor_reset_hook(&mon_storage->rv.da_mon);
>>> + }
>>> + }
>>> + da_monitor_sync_hook();
>>> /*
>>> * This function is called after all probes are disabled and no
>>> longer
>>> * pending, we can safely assume no concurrent user.
>>> */
>>> - synchronize_rcu();
>>> hash_for_each_safe(da_monitor_ht, bkt, tmp, mon_storage, node) {
>>> - da_monitor_reset_hook(&mon_storage->rv.da_mon);
>>> hash_del_rcu(&mon_storage->node);
>>> kfree(mon_storage);
>>> }
>>> diff --git a/include/rv/ha_monitor.h b/include/rv/ha_monitor.h
>>> index d59507e8cb30..47ff1a41febe 100644
>>> --- a/include/rv/ha_monitor.h
>>> +++ b/include/rv/ha_monitor.h
>>> @@ -36,6 +36,7 @@ static bool ha_monitor_handle_constraint(struct da_monitor
>>> *da_mon,
>>> #define da_monitor_event_hook ha_monitor_handle_constraint
>>> #define da_monitor_init_hook ha_monitor_init_env
>>> #define da_monitor_reset_hook ha_monitor_reset_env
>>> +#define da_monitor_sync_hook() synchronize_rcu()
>>>
>>> #include <rv/da_monitor.h>
>>> #include <linux/seq_buf.h>
>>> @@ -237,12 +238,25 @@ static bool ha_monitor_handle_constraint(struct
>>> da_monitor *da_mon,
>>> return false;
>>> }
>>>
>>> +/*
>>> + * __ha_monitor_timer_callback - generic callback representation
>>> + *
>>> + * This callback runs in an RCU read-side critical section to allow the
>>> + * destruction sequence to easily synchronize_rcu() with all pending timer
>>> + * after asynchronously disabling them.
>>> + */
>>> static inline void __ha_monitor_timer_callback(struct ha_monitor *ha_mon)
>>> {
>>> - enum states curr_state = READ_ONCE(ha_mon->da_mon.curr_state);
>>> DECLARE_SEQ_BUF(env_string, ENV_BUFFER_SIZE);
>>> - u64 time_ns = ha_get_ns();
>>> + enum states curr_state;
>>> + u64 time_ns;
>>> +
>>> + if (unlikely(!da_monitor_handling_event(&ha_mon->da_mon)))
>>> + return;
>>>
>>> + guard(rcu)();
>>> + curr_state = READ_ONCE(ha_mon->da_mon.curr_state);
>>> + time_ns = ha_get_ns();
>>> ha_get_env_string(&env_string, ha_mon, time_ns);
>>> ha_react(curr_state, EVENT_NONE, env_string.buffer);
>>> ha_trace_error_env(ha_mon, model_get_state_name(curr_state),
>
next prev parent reply other threads:[~2026-05-19 16:48 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-12 14:02 [PATCH 0/9] rv: Fixes on Deterministic and Hybrid Automata Gabriele Monaco
2026-05-12 14:02 ` [PATCH 1/9] rv: Fix __user specifier usage in extract_params() Gabriele Monaco
2026-05-17 8:48 ` Wen Yang
2026-05-12 14:02 ` [PATCH 2/9] rv: Fix read_lock scope in per-task DA cleanup Gabriele Monaco
2026-05-17 8:51 ` Wen Yang
2026-05-12 14:02 ` [PATCH 3/9] rv: Reset per-task DA monitors before releasing the slot Gabriele Monaco
2026-05-17 8:55 ` Wen Yang
2026-05-12 14:02 ` [PATCH 4/9] rv: Prevent task migration while handling per-CPU events Gabriele Monaco
2026-05-17 8:57 ` Wen Yang
2026-05-12 14:02 ` [PATCH 5/9] rv: Ensure all pending probes terminate on per-obj monitor destroy Gabriele Monaco
2026-05-17 9:01 ` Wen Yang
2026-05-12 14:02 ` [PATCH 6/9] rv: Ensure synchronous cleanup for HA monitors Gabriele Monaco
2026-05-17 9:12 ` Wen Yang
2026-05-18 11:54 ` Gabriele Monaco
2026-05-19 9:31 ` Gabriele Monaco
2026-05-19 16:48 ` Wen Yang [this message]
2026-05-12 14:02 ` [PATCH 7/9] rv: Do not rely on clean monitor when initialising HA Gabriele Monaco
2026-05-17 9:15 ` Wen Yang
2026-05-12 14:02 ` [PATCH 8/9] rv: Add automatic cleanup handlers for per-task HA monitors Gabriele Monaco
2026-05-17 9:40 ` Wen Yang
2026-05-18 12:18 ` Gabriele Monaco
2026-05-12 14:02 ` [PATCH 9/9] rv: Mandate deallocation for per-obj monitors Gabriele Monaco
2026-05-17 9:52 ` Wen Yang
2026-05-18 6:36 ` Gabriele Monaco
2026-05-18 15:40 ` Wen Yang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5f8ceba5-ec25-4c19-a74b-ed98715e89b6@linux.dev \
--to=wen.yang@linux.dev \
--cc=gmonaco@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-trace-kernel@vger.kernel.org \
--cc=namcao@linutronix.de \
--cc=rostedt@goodmis.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox