Linux Trace Kernel
 help / color / mirror / Atom feed
From: Wen Yang <wen.yang@linux.dev>
To: Gabriele Monaco <gmonaco@redhat.com>
Cc: linux-kernel@vger.kernel.org,
	Steven Rostedt <rostedt@goodmis.org>,
	Nam Cao <namcao@linutronix.de>,
	linux-trace-kernel@vger.kernel.org
Subject: Re: [PATCH 6/9] rv: Ensure synchronous cleanup for HA monitors
Date: Wed, 20 May 2026 00:48:11 +0800	[thread overview]
Message-ID: <5f8ceba5-ec25-4c19-a74b-ed98715e89b6@linux.dev> (raw)
In-Reply-To: <88a6fc5c08d18e3c1f6d29dc106db80fa688bf87.camel@redhat.com>



On 5/18/26 19:54, Gabriele Monaco wrote:
> On Sun, 2026-05-17 at 17:12 +0800, Wen Yang wrote:
>> The guard(rcu)() + synchronize_rcu() mechanism for HA timer callbacks
>> is correct.
>>
>> One concern: TOCTOU between the pre-check and guard(rcu)().
> 
> Yes, this could happen, but I'm not sure it's really a big issue:
> 
>>
>> da_monitor_reset() calls reset_hook BEFORE clearing monitoring:
>>
>>     da_monitor_reset_hook(da_mon);        /* ha_cancel_timer [async]   */
>>     WRITE_ONCE(da_mon->monitoring, 0);    /* cleared AFTER reset_hook  */
>>     da_mon->curr_state = model_get_initial_state();
>>
>> This may creates a window where the callback pre-check passes but the
>> monitor is reset before guard(rcu)() is acquired:
> 
> If a callback is running, there was a violation because the timer expired, so it
> isn't wrong to report, although we are unloading the monitor.
> 
>>
>>     /* __ha_monitor_timer_callback() */
>>     if (unlikely(!da_monitor_handling_event(&ha_mon->da_mon)))
>>         return;
>>
>>     /* passes: monitoring=1
>>      *
>>      * WINDOW ─ CPU A runs da_monitor_reset_all() here:
>>      *   ha_cancel_timer()  [returns: callback is running, cannot cancel]
>>      *   WRITE_ONCE(monitoring, 0)
>>      *   curr_state = model_get_initial_state()
>>      */
>>     guard(rcu)();
>>     curr_state = READ_ONCE(ha_mon->da_mon.curr_state);  /* initial_state */
>>     /* no second da_monitoring() check */
>>     ha_react(curr_state, EVENT_NONE, env_string.buffer); /* spurious call */
>>     ha_trace_error_env(ha_mon, ...);                     /* fires
>> unconditionally */
>>
>> Result: spurious ha_trace_error_env() for initial_state.  For existing
>> monitors (stall/nomiss/opid), model_should_send_event_env(initial, NONE)
>> returns false, so no false-positive reaction, but the trace event fires.
>> Monitors where initial_state carries a constraint would produce a false
>> positive.
> 
> I'm not sure what you mean here, if I understand the situation correctly: the
> callback is running (so we should react), da_monitor_reset() is too late to stop
> it but somehow manages to reset curr_state on time for the callback to see it
> change: react reports the wrong state in an otherwise valid reaction.
> 
>>
>> Proposed fix : re-check inside the RCU critical section:
>>
>>     guard(rcu)();
>>     if (unlikely(!da_monitoring(&ha_mon->da_mon)))  /* re-check here */
>>         return;
>>     curr_state = READ_ONCE(ha_mon->da_mon.curr_state);
> 
> I'm not sure that's going to fix it anyway, RCU cannot synchronise readers,
> checking again would at most (mildly) reduce the race window, not remove it.
> 
> What we could do is to play with barriersin for the callback to either:
> * see monitoring = 1 AND the old curr_state
> * see monitoring = 0 AND the new curr_state
> 
> Something like:
> 
> void __ha_monitor_timer_callback() {
> 	guard(rcu)(); //this is only for waiters, let them wait more
> 
> 	if (unlikely(!da_monitor_handling_event(&ha_mon->da_mon)))
> 		return;
> 	smp_rmb();
> 	curr_state = READ_ONCE(ha_mon->da_mon.curr_state);
> 	...
> }
> 
> void da_monitor_reset() {
> 	da_monitor_reset_hook(da_mon);
> 	WRITE_ONCE(da_mon->monitoring, 0);
> 	smp_wmb();
> 	WRITE_ONCE(da_mon->curr_state, model_get_initial_state());
> }
> 
> Coupled with your patch [1] adding more atomic accesses to da_mon->monitoring
> should probably do the trick.
> 
> Am I missing anything?
> 

The goal is right.  One thing worth double-checking is the load order
in the callback against the "SMP BARRIER PAIRING" section of
Documentation/memory-barriers.txt, which states:

   [!] Note that the stores before the write barrier would normally be
   expected to match the loads after the read barrier or the
   address-dependency barrier, and vice versa ...

So, we should to swap the read order in the callback so that it matches
the standard pattern:

   void __ha_monitor_timer_callback() {
         guard(rcu)();
         curr_state = READ_ONCE(ha_mon->da_mon.curr_state);  /* B: 
before rmb */
         smp_rmb();
         if (unlikely(!da_monitoring(&ha_mon->da_mon)))       /* A: 
after rmb */
                 return;
         /*
          * Reached here: monitoring = 1 (old_A).
          * Standard wmb/rmb guarantee: curr_state (read before rmb) is also
          * old, i.e. not initial_state.
          */
         ha_react(curr_state, EVENT_NONE, env_string.buffer);
         ...
   }

   void da_monitor_reset() {
         da_monitor_reset_hook(da_mon);
         WRITE_ONCE(da_mon->monitoring, 0);   /* A: before wmb */
         smp_wmb();
         WRITE_ONCE(da_mon->curr_state, model_get_initial_state());  /* 
B: after wmb */
   }



--
Best wishes,
Wen

> 
> [1] -
> https://lore.kernel.org/lkml/8af5ba4bd93d2acb8a546e8e47ced974a87c1eb8.1778522945.git.wen.yang@linux.dev
> 
>>
>>
>> --
>> Best wishes,
>> Wen
>>
>>
>> On 5/12/26 22:02, Gabriele Monaco wrote:
>>> HA monitors may start timers, all cleanup functions currently stop the
>>> timers asynchronously to avoid sleeping in the wrong context.
>>> Nothing makes sure running callbacks terminate on cleanup.
>>>
>>> Run the entire HA timer callback in an RCU read-side critical section,
>>> this way we can simply synchronize_rcu() with any pending timer and are
>>> sure any cleanup using kfree_rcu() runs after callbacks terminated.
>>> Additionally make sure any unlikely callback running late won't run any
>>> code if the monitor is marked as disabled.
>>>
>>> Fixes: f5587d1b6ec9 ("rv: Add Hybrid Automata monitor type")
>>> Fixes: 4a24127bd6cb ("rv: Add support for per-object monitors in DA/HA")
>>> Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
>>> ---
>>>    include/rv/da_monitor.h | 23 +++++++++++++++++++----
>>>    include/rv/ha_monitor.h | 18 ++++++++++++++++--
>>>    2 files changed, 35 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/include/rv/da_monitor.h b/include/rv/da_monitor.h
>>> index a4a13b62d1a4..402d3b935c08 100644
>>> --- a/include/rv/da_monitor.h
>>> +++ b/include/rv/da_monitor.h
>>> @@ -57,6 +57,15 @@ static struct rv_monitor rv_this;
>>>    #define da_monitor_reset_hook(da_mon)
>>>    #endif
>>>    
>>> +/*
>>> + * Hook to allow the implementation of hybrid automata: define it with a
>>> + * function that waits for the termination of all monitors background
>>> + * activities (e.g. all timers). This hook can sleep.
>>> + */
>>> +#ifndef da_monitor_sync_hook
>>> +#define da_monitor_sync_hook()
>>> +#endif
>>> +
>>>    /*
>>>     * Type for the target id, default to int but can be overridden.
>>>     * A long type can work as hash table key (PER_OBJ) but will be downgraded
>>> to
>>> @@ -179,6 +188,7 @@ static inline int da_monitor_init(void)
>>>    static inline void da_monitor_destroy(void)
>>>    {
>>>    	da_monitor_reset_all();
>>> +	da_monitor_sync_hook();
>>>    }
>>>    
>>>    #ifndef da_implicit_guard
>>> @@ -232,6 +242,7 @@ static inline int da_monitor_init(void)
>>>    static inline void da_monitor_destroy(void)
>>>    {
>>>    	da_monitor_reset_all();
>>> +	da_monitor_sync_hook();
>>>    }
>>>    
>>>    #ifndef da_implicit_guard
>>> @@ -319,6 +330,7 @@ static inline void da_monitor_destroy(void)
>>>    	}
>>>    
>>>    	da_monitor_reset_all();
>>> +	da_monitor_sync_hook();
>>>    
>>>    	rv_put_task_monitor_slot(task_mon_slot);
>>>    	task_mon_slot = RV_PER_TASK_MONITOR_INIT;
>>> @@ -497,10 +509,9 @@ static void da_monitor_reset_all(void)
>>>    	struct da_monitor_storage *mon_storage;
>>>    	int bkt;
>>>    
>>> -	rcu_read_lock();
>>> +	guard(rcu)();
>>>    	hash_for_each_rcu(da_monitor_ht, bkt, mon_storage, node)
>>>    		da_monitor_reset(&mon_storage->rv.da_mon);
>>> -	rcu_read_unlock();
>>>    }
>>>    
>>>    static inline int da_monitor_init(void)
>>> @@ -516,13 +527,17 @@ static inline void da_monitor_destroy(void)
>>>    	int bkt;
>>>    
>>>    	tracepoint_synchronize_unregister();
>>> +	scoped_guard(rcu) {
>>> +		hash_for_each_rcu(da_monitor_ht, bkt, mon_storage, node) {
>>> +			da_monitor_reset_hook(&mon_storage->rv.da_mon);
>>> +		}
>>> +	}
>>> +	da_monitor_sync_hook();
>>>    	/*
>>>    	 * This function is called after all probes are disabled and no
>>> longer
>>>    	 * pending, we can safely assume no concurrent user.
>>>    	 */
>>> -	synchronize_rcu();
>>>    	hash_for_each_safe(da_monitor_ht, bkt, tmp, mon_storage, node) {
>>> -		da_monitor_reset_hook(&mon_storage->rv.da_mon);
>>>    		hash_del_rcu(&mon_storage->node);
>>>    		kfree(mon_storage);
>>>    	}
>>> diff --git a/include/rv/ha_monitor.h b/include/rv/ha_monitor.h
>>> index d59507e8cb30..47ff1a41febe 100644
>>> --- a/include/rv/ha_monitor.h
>>> +++ b/include/rv/ha_monitor.h
>>> @@ -36,6 +36,7 @@ static bool ha_monitor_handle_constraint(struct da_monitor
>>> *da_mon,
>>>    #define da_monitor_event_hook ha_monitor_handle_constraint
>>>    #define da_monitor_init_hook ha_monitor_init_env
>>>    #define da_monitor_reset_hook ha_monitor_reset_env
>>> +#define da_monitor_sync_hook() synchronize_rcu()
>>>    
>>>    #include <rv/da_monitor.h>
>>>    #include <linux/seq_buf.h>
>>> @@ -237,12 +238,25 @@ static bool ha_monitor_handle_constraint(struct
>>> da_monitor *da_mon,
>>>    	return false;
>>>    }
>>>    
>>> +/*
>>> + * __ha_monitor_timer_callback - generic callback representation
>>> + *
>>> + * This callback runs in an RCU read-side critical section to allow the
>>> + * destruction sequence to easily synchronize_rcu() with all pending timer
>>> + * after asynchronously disabling them.
>>> + */
>>>    static inline void __ha_monitor_timer_callback(struct ha_monitor *ha_mon)
>>>    {
>>> -	enum states curr_state = READ_ONCE(ha_mon->da_mon.curr_state);
>>>    	DECLARE_SEQ_BUF(env_string, ENV_BUFFER_SIZE);
>>> -	u64 time_ns = ha_get_ns();
>>> +	enum states curr_state;
>>> +	u64 time_ns;
>>> +
>>> +	if (unlikely(!da_monitor_handling_event(&ha_mon->da_mon)))
>>> +		return;
>>>    
>>> +	guard(rcu)();
>>> +	curr_state = READ_ONCE(ha_mon->da_mon.curr_state);
>>> +	time_ns = ha_get_ns();
>>>    	ha_get_env_string(&env_string, ha_mon, time_ns);
>>>    	ha_react(curr_state, EVENT_NONE, env_string.buffer);
>>>    	ha_trace_error_env(ha_mon, model_get_state_name(curr_state),
> 

  parent reply	other threads:[~2026-05-19 16:48 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-12 14:02 [PATCH 0/9] rv: Fixes on Deterministic and Hybrid Automata Gabriele Monaco
2026-05-12 14:02 ` [PATCH 1/9] rv: Fix __user specifier usage in extract_params() Gabriele Monaco
2026-05-17  8:48   ` Wen Yang
2026-05-12 14:02 ` [PATCH 2/9] rv: Fix read_lock scope in per-task DA cleanup Gabriele Monaco
2026-05-17  8:51   ` Wen Yang
2026-05-12 14:02 ` [PATCH 3/9] rv: Reset per-task DA monitors before releasing the slot Gabriele Monaco
2026-05-17  8:55   ` Wen Yang
2026-05-12 14:02 ` [PATCH 4/9] rv: Prevent task migration while handling per-CPU events Gabriele Monaco
2026-05-17  8:57   ` Wen Yang
2026-05-12 14:02 ` [PATCH 5/9] rv: Ensure all pending probes terminate on per-obj monitor destroy Gabriele Monaco
2026-05-17  9:01   ` Wen Yang
2026-05-12 14:02 ` [PATCH 6/9] rv: Ensure synchronous cleanup for HA monitors Gabriele Monaco
2026-05-17  9:12   ` Wen Yang
2026-05-18 11:54     ` Gabriele Monaco
2026-05-19  9:31       ` Gabriele Monaco
2026-05-19 16:48       ` Wen Yang [this message]
2026-05-12 14:02 ` [PATCH 7/9] rv: Do not rely on clean monitor when initialising HA Gabriele Monaco
2026-05-17  9:15   ` Wen Yang
2026-05-12 14:02 ` [PATCH 8/9] rv: Add automatic cleanup handlers for per-task HA monitors Gabriele Monaco
2026-05-17  9:40   ` Wen Yang
2026-05-18 12:18     ` Gabriele Monaco
2026-05-12 14:02 ` [PATCH 9/9] rv: Mandate deallocation for per-obj monitors Gabriele Monaco
2026-05-17  9:52   ` Wen Yang
2026-05-18  6:36     ` Gabriele Monaco
2026-05-18 15:40       ` Wen Yang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5f8ceba5-ec25-4c19-a74b-ed98715e89b6@linux.dev \
    --to=wen.yang@linux.dev \
    --cc=gmonaco@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-trace-kernel@vger.kernel.org \
    --cc=namcao@linutronix.de \
    --cc=rostedt@goodmis.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox