[PATCH 0/2] rcu: Random updates

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/2] rcu: Random updates
@ 2025-03-18 13:56 Frederic Weisbecker
  2025-03-18 13:56 ` [PATCH 1/2] rcu: Comment on the extraneous delta test on rcu_seq_done_exact() Frederic Weisbecker
  2025-03-18 13:56 ` [PATCH 2/2] rcu: Robustify rcu_is_cpu_rrupt_from_idle() Frederic Weisbecker
  0 siblings, 2 replies; 14+ messages in thread
From: Frederic Weisbecker @ 2025-03-18 13:56 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Boqun Feng, Joel Fernandes, Neeraj Upadhyay,
	Paul E . McKenney, Uladzislau Rezki, Zqiang, rcu

Hi,

Just a bunch of things I had lagging in my local queue.

Thanks.

Frederic Weisbecker (2):
  rcu: Comment on the extraneous delta test on rcu_seq_done_exact()
  rcu: Robustify rcu_is_cpu_rrupt_from_idle()

 kernel/rcu/rcu.h  |  7 +++++++
 kernel/rcu/tree.c | 27 +++++++++++++++++----------
 2 files changed, 24 insertions(+), 10 deletions(-)

-- 
2.48.1


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/2] rcu: Comment on the extraneous delta test on rcu_seq_done_exact()
  2025-03-18 13:56 [PATCH 0/2] rcu: Random updates Frederic Weisbecker
@ 2025-03-18 13:56 ` Frederic Weisbecker
  2025-03-18 18:37   ` Paul E. McKenney
  2025-03-18 13:56 ` [PATCH 2/2] rcu: Robustify rcu_is_cpu_rrupt_from_idle() Frederic Weisbecker
  1 sibling, 1 reply; 14+ messages in thread
From: Frederic Weisbecker @ 2025-03-18 13:56 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Boqun Feng, Joel Fernandes, Neeraj Upadhyay,
	Paul E . McKenney, Uladzislau Rezki, Zqiang, rcu

The numbers used in rcu_seq_done_exact() lack some explanation behind
their magic. Especially after the commit:

    85aad7cc4178 ("rcu: Fix get_state_synchronize_rcu_full() GP-start detection")

which reported a subtle issue where a new GP sequence snapshot was taken
on the root node state while a grace period had already been started and
reflected on the global state sequence but not yet on the root node
sequence, making a polling user waiting on a wrong already started grace
period that would ignore freshly online CPUs.

The fix involved taking the snaphot on the global state sequence and
waiting on the root node sequence. And since a grace period is first
started on the global state and only afterward reflected on the root
node, a snapshot taken on the global state sequence might be two full
grace periods ahead of the root node as in the following example:

rnp->gp_seq = rcu_state.gp_seq = 0

    CPU 0                                           CPU 1
    -----                                           -----
    // rcu_state.gp_seq = 1
    rcu_seq_start(&rcu_state.gp_seq)
                                                    // snap = 8
                                                    snap = rcu_seq_snap(&rcu_state.gp_seq)
                                                    // Two full GP differences
                                                    rcu_seq_done_exact(&rnp->gp_seq, snap)
    // rnp->gp_seq = 1
    WRITE_ONCE(rnp->gp_seq, rcu_state.gp_seq);

Add a comment about those expectations and to clarify the magic within
the relevant function.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/rcu/rcu.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index eed2951a4962..7acf1f36dd6c 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -157,6 +157,13 @@ static inline bool rcu_seq_done(unsigned long *sp, unsigned long s)
  * Given a snapshot from rcu_seq_snap(), determine whether or not a
  * full update-side operation has occurred, but do not allow the
  * (ULONG_MAX / 2) safety-factor/guard-band.
+ *
+ * The token returned by get_state_synchronize_rcu_full() is based on
+ * rcu_state.gp_seq but it is tested in poll_state_synchronize_rcu_full()
+ * against the root rnp->gp_seq. Since rcu_seq_start() is first called
+ * on rcu_state.gp_seq and only later reflected on the root rnp->gp_seq,
+ * it is possible that rcu_seq_snap(rcu_state.gp_seq) returns 2 full grace
+ * periods ahead of the root rnp->gp_seq.
  */
 static inline bool rcu_seq_done_exact(unsigned long *sp, unsigned long s)
 {
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/2] rcu: Robustify rcu_is_cpu_rrupt_from_idle()
  2025-03-18 13:56 [PATCH 0/2] rcu: Random updates Frederic Weisbecker
  2025-03-18 13:56 ` [PATCH 1/2] rcu: Comment on the extraneous delta test on rcu_seq_done_exact() Frederic Weisbecker
@ 2025-03-18 13:56 ` Frederic Weisbecker
  2025-03-22 17:00   ` Joel Fernandes
  1 sibling, 1 reply; 14+ messages in thread
From: Frederic Weisbecker @ 2025-03-18 13:56 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Boqun Feng, Joel Fernandes, Neeraj Upadhyay,
	Paul E . McKenney, Uladzislau Rezki, Zqiang, rcu

RCU relies on the context tracking nesting counter in order to determine
if it is running in extended quiescent state.

However the context tracking nesting counter is not completely
synchronized with the actual context tracking state:

* The nesting counter is set to 1 or incremented further _after_ the
  actual state is set to RCU not watching.

   (then we know for sure we interrupted RCU not watching)

* The nesting counter is set to 0 or decremented further _before_ the
  actual state is set to RCU watching.

Therefore it is safe to assume that if ct_nesting() > 0, RCU is not
watching. But if ct_nesting() <= 0, RCU is watching except for a tiny
window.

This hasn't been a problem so far because rcu_is_cpu_rrupt_from_idle()
has only been called from interrupts. However the code is confusing
and abuses the role of the context tracking nesting counter while there
are more accurate indicators available.

Clarify and robustify accordingly.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/rcu/tree.c | 27 +++++++++++++++++----------
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 79dced5fb72e..90c43061c981 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -367,7 +367,7 @@ EXPORT_SYMBOL_GPL(rcu_momentary_eqs);
  */
 static int rcu_is_cpu_rrupt_from_idle(void)
 {
-	long nesting;
+	long nmi_nesting = ct_nmi_nesting();
 
 	/*
 	 * Usually called from the tick; but also used from smp_function_call()
@@ -379,21 +379,28 @@ static int rcu_is_cpu_rrupt_from_idle(void)
 	/* Check for counter underflows */
 	RCU_LOCKDEP_WARN(ct_nesting() < 0,
 			 "RCU nesting counter underflow!");
-	RCU_LOCKDEP_WARN(ct_nmi_nesting() <= 0,
-			 "RCU nmi_nesting counter underflow/zero!");
 
-	/* Are we at first interrupt nesting level? */
-	nesting = ct_nmi_nesting();
-	if (nesting > 1)
+	/* Non-idle interrupt or nested idle interrupt */
+	if (nmi_nesting > 1)
 		return false;
 
 	/*
-	 * If we're not in an interrupt, we must be in the idle task!
+	 * Non nested idle interrupt (interrupting section where RCU
+	 * wasn't watching).
 	 */
-	WARN_ON_ONCE(!nesting && !is_idle_task(current));
+	if (nmi_nesting == 1)
+		return true;
 
-	/* Does CPU appear to be idle from an RCU standpoint? */
-	return ct_nesting() == 0;
+	/* Not in an interrupt */
+	if (!nmi_nesting) {
+		RCU_LOCKDEP_WARN(!in_task() || !is_idle_task(current),
+				 "RCU nmi_nesting counter not in idle task!");
+		return !rcu_is_watching_curr_cpu();
+	}
+
+	RCU_LOCKDEP_WARN(1, "RCU nmi_nesting counter underflow/zero!");
+
+	return false;
 }
 
 #define DEFAULT_RCU_BLIMIT (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD) ? 1000 : 10)
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] rcu: Comment on the extraneous delta test on rcu_seq_done_exact()
  2025-03-18 13:56 ` [PATCH 1/2] rcu: Comment on the extraneous delta test on rcu_seq_done_exact() Frederic Weisbecker
@ 2025-03-18 18:37   ` Paul E. McKenney
  2025-03-19 19:38     ` Joel Fernandes
  0 siblings, 1 reply; 14+ messages in thread
From: Paul E. McKenney @ 2025-03-18 18:37 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Boqun Feng, Joel Fernandes, Neeraj Upadhyay,
	Uladzislau Rezki, Zqiang, rcu

On Tue, Mar 18, 2025 at 02:56:18PM +0100, Frederic Weisbecker wrote:
> The numbers used in rcu_seq_done_exact() lack some explanation behind
> their magic. Especially after the commit:
> 
>     85aad7cc4178 ("rcu: Fix get_state_synchronize_rcu_full() GP-start detection")
> 
> which reported a subtle issue where a new GP sequence snapshot was taken
> on the root node state while a grace period had already been started and
> reflected on the global state sequence but not yet on the root node
> sequence, making a polling user waiting on a wrong already started grace
> period that would ignore freshly online CPUs.
> 
> The fix involved taking the snaphot on the global state sequence and
> waiting on the root node sequence. And since a grace period is first
> started on the global state and only afterward reflected on the root
> node, a snapshot taken on the global state sequence might be two full
> grace periods ahead of the root node as in the following example:
> 
> rnp->gp_seq = rcu_state.gp_seq = 0
> 
>     CPU 0                                           CPU 1
>     -----                                           -----
>     // rcu_state.gp_seq = 1
>     rcu_seq_start(&rcu_state.gp_seq)
>                                                     // snap = 8
>                                                     snap = rcu_seq_snap(&rcu_state.gp_seq)
>                                                     // Two full GP differences
>                                                     rcu_seq_done_exact(&rnp->gp_seq, snap)
>     // rnp->gp_seq = 1
>     WRITE_ONCE(rnp->gp_seq, rcu_state.gp_seq);
> 
> Add a comment about those expectations and to clarify the magic within
> the relevant function.
> 
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>

Reviewed-by: Paul E. McKenney <paulmck@kernel.org>

But it would of course be good to get reviews from the others.

> ---
>  kernel/rcu/rcu.h | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
> index eed2951a4962..7acf1f36dd6c 100644
> --- a/kernel/rcu/rcu.h
> +++ b/kernel/rcu/rcu.h
> @@ -157,6 +157,13 @@ static inline bool rcu_seq_done(unsigned long *sp, unsigned long s)
>   * Given a snapshot from rcu_seq_snap(), determine whether or not a
>   * full update-side operation has occurred, but do not allow the
>   * (ULONG_MAX / 2) safety-factor/guard-band.
> + *
> + * The token returned by get_state_synchronize_rcu_full() is based on
> + * rcu_state.gp_seq but it is tested in poll_state_synchronize_rcu_full()
> + * against the root rnp->gp_seq. Since rcu_seq_start() is first called
> + * on rcu_state.gp_seq and only later reflected on the root rnp->gp_seq,
> + * it is possible that rcu_seq_snap(rcu_state.gp_seq) returns 2 full grace
> + * periods ahead of the root rnp->gp_seq.
>   */
>  static inline bool rcu_seq_done_exact(unsigned long *sp, unsigned long s)
>  {
> -- 
> 2.48.1
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] rcu: Comment on the extraneous delta test on rcu_seq_done_exact()
  2025-03-18 18:37   ` Paul E. McKenney
@ 2025-03-19 19:38     ` Joel Fernandes
  2025-03-20 14:15       ` Frederic Weisbecker
  0 siblings, 1 reply; 14+ messages in thread
From: Joel Fernandes @ 2025-03-19 19:38 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Frederic Weisbecker, LKML, Boqun Feng, Neeraj Upadhyay,
	Uladzislau Rezki, Zqiang, rcu

On Tue, Mar 18, 2025 at 11:37:38AM -0700, Paul E. McKenney wrote:
> On Tue, Mar 18, 2025 at 02:56:18PM +0100, Frederic Weisbecker wrote:
> > The numbers used in rcu_seq_done_exact() lack some explanation behind
> > their magic. Especially after the commit:
> > 
> >     85aad7cc4178 ("rcu: Fix get_state_synchronize_rcu_full() GP-start detection")
> > 
> > which reported a subtle issue where a new GP sequence snapshot was taken
> > on the root node state while a grace period had already been started and
> > reflected on the global state sequence but not yet on the root node
> > sequence, making a polling user waiting on a wrong already started grace
> > period that would ignore freshly online CPUs.
> > 
> > The fix involved taking the snaphot on the global state sequence and
> > waiting on the root node sequence. And since a grace period is first
> > started on the global state and only afterward reflected on the root
> > node, a snapshot taken on the global state sequence might be two full
> > grace periods ahead of the root node as in the following example:
> > 
> > rnp->gp_seq = rcu_state.gp_seq = 0
> > 
> >     CPU 0                                           CPU 1
> >     -----                                           -----
> >     // rcu_state.gp_seq = 1
> >     rcu_seq_start(&rcu_state.gp_seq)
> >                                                     // snap = 8
> >                                                     snap = rcu_seq_snap(&rcu_state.gp_seq)
> >                                                     // Two full GP differences
> >                                                     rcu_seq_done_exact(&rnp->gp_seq, snap)
> >     // rnp->gp_seq = 1
> >     WRITE_ONCE(rnp->gp_seq, rcu_state.gp_seq);
> > 
> > Add a comment about those expectations and to clarify the magic within
> > the relevant function.
> > 
> > Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> 
> Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
> 
> But it would of course be good to get reviews from the others.

I actually don't agree that the magic in the rcu_seq_done_exact() function about the
~2 GPs is related to the lag between rcu_state.gp_seq and root rnp->gp_seq,
because the small lag can just as well survive with the rcu_seq_done()
function in the above sequence right?

The rcu_seq_done_exact() function on the other hand is more about not being
stuck in the ULONG_MAX/2 guard band, but to actually get to that, you need a
wrap around to happen and the delta between "rnp->gp_seq" and "snap" to be at
least ULONG_MAX/2 AFAIU.

So the only time this magic will matter is if you have a huge delta between
what is being compared, not just 2 GPs.

Or, did I miss something?

(Also sorry about slow email responses this week as I had my presentation
today and was busy preparing this week and attending other presentations at
OSPM, I'll provide an update on that soon!).

thanks,

 - Joel









> 
> > ---
> >  kernel/rcu/rcu.h | 7 +++++++
> >  1 file changed, 7 insertions(+)
> > 
> > diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
> > index eed2951a4962..7acf1f36dd6c 100644
> > --- a/kernel/rcu/rcu.h
> > +++ b/kernel/rcu/rcu.h
> > @@ -157,6 +157,13 @@ static inline bool rcu_seq_done(unsigned long *sp, unsigned long s)
> >   * Given a snapshot from rcu_seq_snap(), determine whether or not a
> >   * full update-side operation has occurred, but do not allow the
> >   * (ULONG_MAX / 2) safety-factor/guard-band.
> > + *
> > + * The token returned by get_state_synchronize_rcu_full() is based on
> > + * rcu_state.gp_seq but it is tested in poll_state_synchronize_rcu_full()
> > + * against the root rnp->gp_seq. Since rcu_seq_start() is first called
> > + * on rcu_state.gp_seq and only later reflected on the root rnp->gp_seq,
> > + * it is possible that rcu_seq_snap(rcu_state.gp_seq) returns 2 full grace
> > + * periods ahead of the root rnp->gp_seq.
> >   */
> >  static inline bool rcu_seq_done_exact(unsigned long *sp, unsigned long s)
> >  {
> > -- 
> > 2.48.1
> > 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] rcu: Comment on the extraneous delta test on rcu_seq_done_exact()
  2025-03-19 19:38     ` Joel Fernandes
@ 2025-03-20 14:15       ` Frederic Weisbecker
  2025-03-22  2:06         ` Joel Fernandes
  0 siblings, 1 reply; 14+ messages in thread
From: Frederic Weisbecker @ 2025-03-20 14:15 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Paul E. McKenney, LKML, Boqun Feng, Neeraj Upadhyay,
	Uladzislau Rezki, Zqiang, rcu

Le Wed, Mar 19, 2025 at 03:38:31PM -0400, Joel Fernandes a écrit :
> On Tue, Mar 18, 2025 at 11:37:38AM -0700, Paul E. McKenney wrote:
> > On Tue, Mar 18, 2025 at 02:56:18PM +0100, Frederic Weisbecker wrote:
> > > The numbers used in rcu_seq_done_exact() lack some explanation behind
> > > their magic. Especially after the commit:
> > > 
> > >     85aad7cc4178 ("rcu: Fix get_state_synchronize_rcu_full() GP-start detection")
> > > 
> > > which reported a subtle issue where a new GP sequence snapshot was taken
> > > on the root node state while a grace period had already been started and
> > > reflected on the global state sequence but not yet on the root node
> > > sequence, making a polling user waiting on a wrong already started grace
> > > period that would ignore freshly online CPUs.
> > > 
> > > The fix involved taking the snaphot on the global state sequence and
> > > waiting on the root node sequence. And since a grace period is first
> > > started on the global state and only afterward reflected on the root
> > > node, a snapshot taken on the global state sequence might be two full
> > > grace periods ahead of the root node as in the following example:
> > > 
> > > rnp->gp_seq = rcu_state.gp_seq = 0
> > > 
> > >     CPU 0                                           CPU 1
> > >     -----                                           -----
> > >     // rcu_state.gp_seq = 1
> > >     rcu_seq_start(&rcu_state.gp_seq)
> > >                                                     // snap = 8
> > >                                                     snap = rcu_seq_snap(&rcu_state.gp_seq)
> > >                                                     // Two full GP differences
> > >                                                     rcu_seq_done_exact(&rnp->gp_seq, snap)
> > >     // rnp->gp_seq = 1
> > >     WRITE_ONCE(rnp->gp_seq, rcu_state.gp_seq);
> > > 
> > > Add a comment about those expectations and to clarify the magic within
> > > the relevant function.
> > > 
> > > Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> > 
> > Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
> > 
> > But it would of course be good to get reviews from the others.
> 
> I actually don't agree that the magic in the rcu_seq_done_exact() function about the
> ~2 GPs is related to the lag between rcu_state.gp_seq and root rnp->gp_seq,
> because the small lag can just as well survive with the rcu_seq_done()
> function in the above sequence right?
> 
> The rcu_seq_done_exact() function on the other hand is more about not being
> stuck in the ULONG_MAX/2 guard band, but to actually get to that, you need a
> wrap around to happen and the delta between "rnp->gp_seq" and "snap" to be at
> least ULONG_MAX/2 AFAIU.
> 
> So the only time this magic will matter is if you have a huge delta between
> what is being compared, not just 2 GPs.

You're right, and perhaps I should have made it more specific that my comment
only explains the magic "3" number here, in that if it were "2" instead, there
could be accidents with 2 full GPs difference (which is possible) spuriously
accounted as a wrap around.

Thanks.

> 
> Or, did I miss something?
> 
> (Also sorry about slow email responses this week as I had my presentation
> today and was busy preparing this week and attending other presentations at
> OSPM, I'll provide an update on that soon!).
> 
> thanks,
> 
>  - Joel
> 
> 
> 
> 
> 
> 
> 
> 
> 
> > 
> > > ---
> > >  kernel/rcu/rcu.h | 7 +++++++
> > >  1 file changed, 7 insertions(+)
> > > 
> > > diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
> > > index eed2951a4962..7acf1f36dd6c 100644
> > > --- a/kernel/rcu/rcu.h
> > > +++ b/kernel/rcu/rcu.h
> > > @@ -157,6 +157,13 @@ static inline bool rcu_seq_done(unsigned long *sp, unsigned long s)
> > >   * Given a snapshot from rcu_seq_snap(), determine whether or not a
> > >   * full update-side operation has occurred, but do not allow the
> > >   * (ULONG_MAX / 2) safety-factor/guard-band.
> > > + *
> > > + * The token returned by get_state_synchronize_rcu_full() is based on
> > > + * rcu_state.gp_seq but it is tested in poll_state_synchronize_rcu_full()
> > > + * against the root rnp->gp_seq. Since rcu_seq_start() is first called
> > > + * on rcu_state.gp_seq and only later reflected on the root rnp->gp_seq,
> > > + * it is possible that rcu_seq_snap(rcu_state.gp_seq) returns 2 full grace
> > > + * periods ahead of the root rnp->gp_seq.
> > >   */
> > >  static inline bool rcu_seq_done_exact(unsigned long *sp, unsigned long s)
> > >  {
> > > -- 
> > > 2.48.1
> > > 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] rcu: Comment on the extraneous delta test on rcu_seq_done_exact()
  2025-03-20 14:15       ` Frederic Weisbecker
@ 2025-03-22  2:06         ` Joel Fernandes
  2025-03-22 10:25           ` Frederic Weisbecker
  0 siblings, 1 reply; 14+ messages in thread
From: Joel Fernandes @ 2025-03-22  2:06 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Paul E. McKenney, LKML, Boqun Feng, Neeraj Upadhyay,
	Uladzislau Rezki, Zqiang, rcu

Insomnia kicked in, so 3 am reply here (Zurich local time) ;-):

On 3/20/2025 3:15 PM, Frederic Weisbecker wrote:
> Le Wed, Mar 19, 2025 at 03:38:31PM -0400, Joel Fernandes a écrit :
>> On Tue, Mar 18, 2025 at 11:37:38AM -0700, Paul E. McKenney wrote:
>>> On Tue, Mar 18, 2025 at 02:56:18PM +0100, Frederic Weisbecker wrote:
>>>> The numbers used in rcu_seq_done_exact() lack some explanation behind
>>>> their magic. Especially after the commit:
>>>>
>>>>     85aad7cc4178 ("rcu: Fix get_state_synchronize_rcu_full() GP-start detection")
>>>>
>>>> which reported a subtle issue where a new GP sequence snapshot was taken
>>>> on the root node state while a grace period had already been started and
>>>> reflected on the global state sequence but not yet on the root node
>>>> sequence, making a polling user waiting on a wrong already started grace
>>>> period that would ignore freshly online CPUs.
>>>>
>>>> The fix involved taking the snaphot on the global state sequence and
>>>> waiting on the root node sequence. And since a grace period is first
>>>> started on the global state and only afterward reflected on the root
>>>> node, a snapshot taken on the global state sequence might be two full
>>>> grace periods ahead of the root node as in the following example:
>>>>
>>>> rnp->gp_seq = rcu_state.gp_seq = 0
>>>>
>>>>     CPU 0                                           CPU 1
>>>>     -----                                           -----
>>>>     // rcu_state.gp_seq = 1
>>>>     rcu_seq_start(&rcu_state.gp_seq)
>>>>                                                     // snap = 8
>>>>                                                     snap = rcu_seq_snap(&rcu_state.gp_seq)
>>>>                                                     // Two full GP differences
>>>>                                                     rcu_seq_done_exact(&rnp->gp_seq, snap)
>>>>     // rnp->gp_seq = 1
>>>>     WRITE_ONCE(rnp->gp_seq, rcu_state.gp_seq);
>>>>
>>>> Add a comment about those expectations and to clarify the magic within
>>>> the relevant function.
>>>>
>>>> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
>>> Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
>>>
>>> But it would of course be good to get reviews from the others.
>> I actually don't agree that the magic in the rcu_seq_done_exact() function about the
>> ~2 GPs is related to the lag between rcu_state.gp_seq and root rnp->gp_seq,
>> because the small lag can just as well survive with the rcu_seq_done()
>> function in the above sequence right?
>>
>> The rcu_seq_done_exact() function on the other hand is more about not being
>> stuck in the ULONG_MAX/2 guard band, but to actually get to that, you need a
>> wrap around to happen and the delta between "rnp->gp_seq" and "snap" to be at
>> least ULONG_MAX/2 AFAIU.
>>
>> So the only time this magic will matter is if you have a huge delta between
>> what is being compared, not just 2 GPs.
> You're right, and perhaps I should have made it more specific that my comment
> only explains the magic "3" number here, in that if it were "2" instead, there
> could be accidents with 2 full GPs difference (which is possible) spuriously
> accounted as a wrap around.

Ahh, so I guess I get it now and we are both right. The complete picture is - We
are trying to handle the case of "very large wrap" around but as a part of that,
we don't want to create false-positives for this "snap" case.

A "snap" can be atmost  (2 * RCU_SEQ_STATE_MASK + 1) away from a gp_seq.

That's within "2 GPs" worth of counts (about 8 counts)

Taking some numbers:

cur_s	s	delta (s - cur_s)
0	4	4
1	8	7
2	8	6
3	8	5
4	8	4
5	12	7

The maximum delta of a snap from actual gp_seq can be (2 * RCU_SEQ_STATE_MASK +
1) which in this case is 7.

So we adjust the comparison by adding the  ULONG_CMP_LT(cur_s, s - (2 *
RCU_SEQ_STATE_MASK + 1)). i.e.

after a snap, if we blindly do ULONG_CMP_LT without adjustment, we'll falsely
conclude that the GP has completed thinking it was due to wrap around, where as
it is possible we just snapped and got a false positive.

So I think your comment is mostly correct then. But I think it may be better to
clarify that the reason we need rcu_seq_done_exact() and that ULONG_CMP_LT is
because we want handle very large wrap around not being stuck in "false
negative" territory as we would with rcu_seq_done(). But that also means we
can't break the "snap" usecase to the introduction of ULONG_CMP_LT.

Unless you beat me to it, I may modify your patch for v6.16 augmented with this
reasoning ;) (Also since I am also working on adding that forced wrap around
test to rcutorture).

Also it is still not fully clear to me what the root node has to do with all
this in your example, because the rcu_seq_done_exact() needs to be what it is
(that is having that 2 GP adjustment) even if the rnp->gp_seq and
rcu_state.gp_seq were in sync?

thanks,

 - Joel


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] rcu: Comment on the extraneous delta test on rcu_seq_done_exact()
  2025-03-22  2:06         ` Joel Fernandes
@ 2025-03-22 10:25           ` Frederic Weisbecker
  2025-03-22 14:20             ` Joel Fernandes
  0 siblings, 1 reply; 14+ messages in thread
From: Frederic Weisbecker @ 2025-03-22 10:25 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Paul E. McKenney, LKML, Boqun Feng, Neeraj Upadhyay,
	Uladzislau Rezki, Zqiang, rcu

Le Sat, Mar 22, 2025 at 03:06:08AM +0100, Joel Fernandes a écrit :
> Insomnia kicked in, so 3 am reply here (Zurich local time) ;-):
> 
> On 3/20/2025 3:15 PM, Frederic Weisbecker wrote:
> > Le Wed, Mar 19, 2025 at 03:38:31PM -0400, Joel Fernandes a écrit :
> >> On Tue, Mar 18, 2025 at 11:37:38AM -0700, Paul E. McKenney wrote:
> >>> On Tue, Mar 18, 2025 at 02:56:18PM +0100, Frederic Weisbecker wrote:
> >>>> The numbers used in rcu_seq_done_exact() lack some explanation behind
> >>>> their magic. Especially after the commit:
> >>>>
> >>>>     85aad7cc4178 ("rcu: Fix get_state_synchronize_rcu_full() GP-start detection")
> >>>>
> >>>> which reported a subtle issue where a new GP sequence snapshot was taken
> >>>> on the root node state while a grace period had already been started and
> >>>> reflected on the global state sequence but not yet on the root node
> >>>> sequence, making a polling user waiting on a wrong already started grace
> >>>> period that would ignore freshly online CPUs.
> >>>>
> >>>> The fix involved taking the snaphot on the global state sequence and
> >>>> waiting on the root node sequence. And since a grace period is first
> >>>> started on the global state and only afterward reflected on the root
> >>>> node, a snapshot taken on the global state sequence might be two full
> >>>> grace periods ahead of the root node as in the following example:
> >>>>
> >>>> rnp->gp_seq = rcu_state.gp_seq = 0
> >>>>
> >>>>     CPU 0                                           CPU 1
> >>>>     -----                                           -----
> >>>>     // rcu_state.gp_seq = 1
> >>>>     rcu_seq_start(&rcu_state.gp_seq)
> >>>>                                                     // snap = 8
> >>>>                                                     snap = rcu_seq_snap(&rcu_state.gp_seq)
> >>>>                                                     // Two full GP differences
> >>>>                                                     rcu_seq_done_exact(&rnp->gp_seq, snap)
> >>>>     // rnp->gp_seq = 1
> >>>>     WRITE_ONCE(rnp->gp_seq, rcu_state.gp_seq);
> >>>>
> >>>> Add a comment about those expectations and to clarify the magic within
> >>>> the relevant function.
> >>>>
> >>>> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> >>> Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
> >>>
> >>> But it would of course be good to get reviews from the others.
> >> I actually don't agree that the magic in the rcu_seq_done_exact() function about the
> >> ~2 GPs is related to the lag between rcu_state.gp_seq and root rnp->gp_seq,
> >> because the small lag can just as well survive with the rcu_seq_done()
> >> function in the above sequence right?
> >>
> >> The rcu_seq_done_exact() function on the other hand is more about not being
> >> stuck in the ULONG_MAX/2 guard band, but to actually get to that, you need a
> >> wrap around to happen and the delta between "rnp->gp_seq" and "snap" to be at
> >> least ULONG_MAX/2 AFAIU.
> >>
> >> So the only time this magic will matter is if you have a huge delta between
> >> what is being compared, not just 2 GPs.
> > You're right, and perhaps I should have made it more specific that my comment
> > only explains the magic "3" number here, in that if it were "2" instead, there
> > could be accidents with 2 full GPs difference (which is possible) spuriously
> > accounted as a wrap around.
> 
> Ahh, so I guess I get it now and we are both right. The complete picture is - We
> are trying to handle the case of "very large wrap" around but as a part of that,
> we don't want to create false-positives for this "snap" case.
> 
> A "snap" can be atmost  (2 * RCU_SEQ_STATE_MASK + 1) away from a gp_seq.
> 
> That's within "2 GPs" worth of counts (about 8 counts)
> 
> Taking some numbers:
> 
> cur_s	s	delta (s - cur_s)
> 0	4	4
> 1	8	7
> 2	8	6
> 3	8	5
> 4	8	4
> 5	12	7
> 
> The maximum delta of a snap from actual gp_seq can be (2 * RCU_SEQ_STATE_MASK +
> 1) which in this case is 7.
> 
> So we adjust the comparison by adding the  ULONG_CMP_LT(cur_s, s - (2 *
> RCU_SEQ_STATE_MASK + 1)). i.e.

3, right?

> 
> after a snap, if we blindly do ULONG_CMP_LT without adjustment, we'll falsely
> conclude that the GP has completed thinking it was due to wrap around, where as
> it is possible we just snapped and got a false positive.
> 
> So I think your comment is mostly correct then. But I think it may be better to
> clarify that the reason we need rcu_seq_done_exact() and that ULONG_CMP_LT is
> because we want handle very large wrap around not being stuck in "false
> negative" territory as we would with rcu_seq_done(). But that also means we
> can't break the "snap" usecase to the introduction of ULONG_CMP_LT.

Indeed.

> 
> Unless you beat me to it, I may modify your patch for v6.16 augmented with this
> reasoning ;) (Also since I am also working on adding that forced wrap around
> test to rcutorture).

Please do, I appreciate!

> 
> Also it is still not fully clear to me what the root node has to do with all
> this in your example, because the rcu_seq_done_exact() needs to be what it is
> (that is having that 2 GP adjustment) even if the rnp->gp_seq and
> rcu_state.gp_seq were in sync?

Yes, this is only to explain that the maximum drift between the snap on rsp
and the current state on root rnp can be at most 2 full GP. And that explain the "3"
magic in the function. But if they were in sync it's all fine.

Thanks.


> 
> thanks,
> 
>  - Joel
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] rcu: Comment on the extraneous delta test on rcu_seq_done_exact()
  2025-03-22 10:25           ` Frederic Weisbecker
@ 2025-03-22 14:20             ` Joel Fernandes
  2025-03-22 14:40               ` Joel Fernandes
  0 siblings, 1 reply; 14+ messages in thread
From: Joel Fernandes @ 2025-03-22 14:20 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Paul E. McKenney, LKML, Boqun Feng, Neeraj Upadhyay,
	Uladzislau Rezki, Zqiang, rcu



On 3/22/2025 11:25 AM, Frederic Weisbecker wrote:
> Le Sat, Mar 22, 2025 at 03:06:08AM +0100, Joel Fernandes a écrit :
>> Insomnia kicked in, so 3 am reply here (Zurich local time) ;-):
>>
>> On 3/20/2025 3:15 PM, Frederic Weisbecker wrote:
>>> Le Wed, Mar 19, 2025 at 03:38:31PM -0400, Joel Fernandes a écrit :
>>>> On Tue, Mar 18, 2025 at 11:37:38AM -0700, Paul E. McKenney wrote:
>>>>> On Tue, Mar 18, 2025 at 02:56:18PM +0100, Frederic Weisbecker wrote:
>>>>>> The numbers used in rcu_seq_done_exact() lack some explanation behind
>>>>>> their magic. Especially after the commit:
>>>>>>
>>>>>>     85aad7cc4178 ("rcu: Fix get_state_synchronize_rcu_full() GP-start detection")
>>>>>>
>>>>>> which reported a subtle issue where a new GP sequence snapshot was taken
>>>>>> on the root node state while a grace period had already been started and
>>>>>> reflected on the global state sequence but not yet on the root node
>>>>>> sequence, making a polling user waiting on a wrong already started grace
>>>>>> period that would ignore freshly online CPUs.
>>>>>>
>>>>>> The fix involved taking the snaphot on the global state sequence and
>>>>>> waiting on the root node sequence. And since a grace period is first
>>>>>> started on the global state and only afterward reflected on the root
>>>>>> node, a snapshot taken on the global state sequence might be two full
>>>>>> grace periods ahead of the root node as in the following example:
>>>>>>
>>>>>> rnp->gp_seq = rcu_state.gp_seq = 0
>>>>>>
>>>>>>     CPU 0                                           CPU 1
>>>>>>     -----                                           -----
>>>>>>     // rcu_state.gp_seq = 1
>>>>>>     rcu_seq_start(&rcu_state.gp_seq)
>>>>>>                                                     // snap = 8
>>>>>>                                                     snap = rcu_seq_snap(&rcu_state.gp_seq)
>>>>>>                                                     // Two full GP differences
>>>>>>                                                     rcu_seq_done_exact(&rnp->gp_seq, snap)
>>>>>>     // rnp->gp_seq = 1
>>>>>>     WRITE_ONCE(rnp->gp_seq, rcu_state.gp_seq);
>>>>>>
>>>>>> Add a comment about those expectations and to clarify the magic within
>>>>>> the relevant function.
>>>>>>
>>>>>> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
>>>>> Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
>>>>>
>>>>> But it would of course be good to get reviews from the others.
>>>> I actually don't agree that the magic in the rcu_seq_done_exact() function about the
>>>> ~2 GPs is related to the lag between rcu_state.gp_seq and root rnp->gp_seq,
>>>> because the small lag can just as well survive with the rcu_seq_done()
>>>> function in the above sequence right?
>>>>
>>>> The rcu_seq_done_exact() function on the other hand is more about not being
>>>> stuck in the ULONG_MAX/2 guard band, but to actually get to that, you need a
>>>> wrap around to happen and the delta between "rnp->gp_seq" and "snap" to be at
>>>> least ULONG_MAX/2 AFAIU.
>>>>
>>>> So the only time this magic will matter is if you have a huge delta between
>>>> what is being compared, not just 2 GPs.
>>> You're right, and perhaps I should have made it more specific that my comment
>>> only explains the magic "3" number here, in that if it were "2" instead, there
>>> could be accidents with 2 full GPs difference (which is possible) spuriously
>>> accounted as a wrap around.
>>
>> Ahh, so I guess I get it now and we are both right. The complete picture is - We
>> are trying to handle the case of "very large wrap" around but as a part of that,
>> we don't want to create false-positives for this "snap" case.
>>
>> A "snap" can be atmost  (2 * RCU_SEQ_STATE_MASK + 1) away from a gp_seq.
>>
>> That's within "2 GPs" worth of counts (about 8 counts)
>>
>> Taking some numbers:
>>
>> cur_s	s	delta (s - cur_s)
>> 0	4	4
>> 1	8	7
>> 2	8	6
>> 3	8	5
>> 4	8	4
>> 5	12	7
>>
>> The maximum delta of a snap from actual gp_seq can be (2 * RCU_SEQ_STATE_MASK +
>> 1) which in this case is 7.
>>
>> So we adjust the comparison by adding the  ULONG_CMP_LT(cur_s, s - (2 *
>> RCU_SEQ_STATE_MASK + 1)). i.e.
> 
> 3, right?

Just to be absolutely sure, are you talking about the value of RCU_SEQ_STATE_MASK ?

That is indeed 3 (RCU_SEQ_STATE_MASK).

But if we're talking about number of GPs, my understanding is a count of 4 is
one GP worth. Per the above table, the delta between gp_seq and is snap is
always a count of 7 (hence less than 2 GPs).

Agreed?

If RCU_SEQ_STATE_MASK was 0x1 instead of 0x11, that is a single bit (or a count
of 2 instead of 4, for a GP), then the table would be:

 cur_s	s (snap)	delta (s - cur_s)
 0	2		2
 1	4		3
 2	4		2
 3	6		3
 4	6		2
 5	8		3

So delta is always <= 3,  Or more generally: <= (RCU_SEQ_STATE_MASK * 2) + 1

>> Unless you beat me to it, I may modify your patch for v6.16 augmented with this
>> reasoning ;) (Also since I am also working on adding that forced wrap around
>> test to rcutorture).
> 
> Please do, I appreciate!

Great, will do.

>> Also it is still not fully clear to me what the root node has to do with all
>> this in your example, because the rcu_seq_done_exact() needs to be what it is
>> (that is having that 2 GP adjustment) even if the rnp->gp_seq and
>> rcu_state.gp_seq were in sync?
> 
> Yes, this is only to explain that the maximum drift between the snap on rsp
> and the current state on root rnp can be at most 2 full GP. And that explain the "3"
> magic in the function. But if they were in sync it's all fine.

Right. But I would argue that, even if things were in sync, its not fine to drop
the magic number. Because you cannot take a gp_seq and its corresponding "snap"
and compare them instantly without the "ULONG_CMP_LT". If you do, you'd get a
false positive. But as you did, we could say that the lag between rnp and rsp's
gp_seq values is also covered by this "ULONG_CMP_LT" expression.

Glad we are discussing this in detail so that once we forget all this in 24
hours, we can go back to the archives ;-) ;-)

thanks,

 - Joel




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] rcu: Comment on the extraneous delta test on rcu_seq_done_exact()
  2025-03-22 14:20             ` Joel Fernandes
@ 2025-03-22 14:40               ` Joel Fernandes
  2025-03-23 22:05                 ` Frederic Weisbecker
  0 siblings, 1 reply; 14+ messages in thread
From: Joel Fernandes @ 2025-03-22 14:40 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Paul E. McKenney, LKML, Boqun Feng, Neeraj Upadhyay,
	Uladzislau Rezki, Zqiang, rcu



On 3/22/2025 3:20 PM, Joel Fernandes wrote:
> 
> On 3/22/2025 11:25 AM, Frederic Weisbecker wrote:
>> Le Sat, Mar 22, 2025 at 03:06:08AM +0100, Joel Fernandes a écrit :
>>> Insomnia kicked in, so 3 am reply here (Zurich local time) ;-):
>>>
>>> On 3/20/2025 3:15 PM, Frederic Weisbecker wrote:
>>>> Le Wed, Mar 19, 2025 at 03:38:31PM -0400, Joel Fernandes a écrit :
>>>>> On Tue, Mar 18, 2025 at 11:37:38AM -0700, Paul E. McKenney wrote:
>>>>>> On Tue, Mar 18, 2025 at 02:56:18PM +0100, Frederic Weisbecker wrote:
>>>>>>> The numbers used in rcu_seq_done_exact() lack some explanation behind
>>>>>>> their magic. Especially after the commit:
>>>>>>>
>>>>>>>     85aad7cc4178 ("rcu: Fix get_state_synchronize_rcu_full() GP-start detection")
>>>>>>>
>>>>>>> which reported a subtle issue where a new GP sequence snapshot was taken
>>>>>>> on the root node state while a grace period had already been started and
>>>>>>> reflected on the global state sequence but not yet on the root node
>>>>>>> sequence, making a polling user waiting on a wrong already started grace
>>>>>>> period that would ignore freshly online CPUs.
>>>>>>>
>>>>>>> The fix involved taking the snaphot on the global state sequence and
>>>>>>> waiting on the root node sequence. And since a grace period is first
>>>>>>> started on the global state and only afterward reflected on the root
>>>>>>> node, a snapshot taken on the global state sequence might be two full
>>>>>>> grace periods ahead of the root node as in the following example:
>>>>>>>
>>>>>>> rnp->gp_seq = rcu_state.gp_seq = 0
>>>>>>>
>>>>>>>     CPU 0                                           CPU 1
>>>>>>>     -----                                           -----
>>>>>>>     // rcu_state.gp_seq = 1
>>>>>>>     rcu_seq_start(&rcu_state.gp_seq)
>>>>>>>                                                     // snap = 8
>>>>>>>                                                     snap = rcu_seq_snap(&rcu_state.gp_seq)
>>>>>>>                                                     // Two full GP differences
>>>>>>>                                                     rcu_seq_done_exact(&rnp->gp_seq, snap)
>>>>>>>     // rnp->gp_seq = 1
>>>>>>>     WRITE_ONCE(rnp->gp_seq, rcu_state.gp_seq);
>>>>>>>
>>>>>>> Add a comment about those expectations and to clarify the magic within
>>>>>>> the relevant function.
>>>>>>>
>>>>>>> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
>>>>>> Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
>>>>>>
>>>>>> But it would of course be good to get reviews from the others.
>>>>> I actually don't agree that the magic in the rcu_seq_done_exact() function about the
>>>>> ~2 GPs is related to the lag between rcu_state.gp_seq and root rnp->gp_seq,
>>>>> because the small lag can just as well survive with the rcu_seq_done()
>>>>> function in the above sequence right?
>>>>>
>>>>> The rcu_seq_done_exact() function on the other hand is more about not being
>>>>> stuck in the ULONG_MAX/2 guard band, but to actually get to that, you need a
>>>>> wrap around to happen and the delta between "rnp->gp_seq" and "snap" to be at
>>>>> least ULONG_MAX/2 AFAIU.
>>>>>
>>>>> So the only time this magic will matter is if you have a huge delta between
>>>>> what is being compared, not just 2 GPs.
>>>> You're right, and perhaps I should have made it more specific that my comment
>>>> only explains the magic "3" number here, in that if it were "2" instead, there
>>>> could be accidents with 2 full GPs difference (which is possible) spuriously
>>>> accounted as a wrap around.
>>> Ahh, so I guess I get it now and we are both right. The complete picture is - We
>>> are trying to handle the case of "very large wrap" around but as a part of that,
>>> we don't want to create false-positives for this "snap" case.
>>>
>>> A "snap" can be atmost  (2 * RCU_SEQ_STATE_MASK + 1) away from a gp_seq.
>>>
>>> That's within "2 GPs" worth of counts (about 8 counts)
>>>
>>> Taking some numbers:
>>>
>>> cur_s	s	delta (s - cur_s)
>>> 0	4	4
>>> 1	8	7
>>> 2	8	6
>>> 3	8	5
>>> 4	8	4
>>> 5	12	7
>>>
>>> The maximum delta of a snap from actual gp_seq can be (2 * RCU_SEQ_STATE_MASK +
>>> 1) which in this case is 7.
>>>
>>> So we adjust the comparison by adding the  ULONG_CMP_LT(cur_s, s - (2 *
>>> RCU_SEQ_STATE_MASK + 1)). i.e.
>> 3, right?
> Just to be absolutely sure, are you talking about the value of RCU_SEQ_STATE_MASK ?
> 
> That is indeed 3 (RCU_SEQ_STATE_MASK).
> 
> But if we're talking about number of GPs, my understanding is a count of 4 is
> one GP worth. Per the above table, the delta between gp_seq and is snap is
> always a count of 7 (hence less than 2 GPs).
> 
> Agreed?
> 
> If RCU_SEQ_STATE_MASK was 0x1 instead of 0x11, that is a single bit (or a count
> of 2 instead of 4, for a GP), then the table would be:
> 
>  cur_s	s (snap)	delta (s - cur_s)
>  0	2		2
>  1	4		3
>  2	4		2
>  3	6		3
>  4	6		2
>  5	8		3
> 
> So delta is always <= 3,  Or more generally: <= (RCU_SEQ_STATE_MASK * 2) + 1

Oh man, I am wondering if we are on to a bug here:

From your example:

    CPU 0                                           CPU 1
    -----                                           -----
    // rcu_state.gp_seq = 1
    rcu_seq_start(&rcu_state.gp_seq)
                                      // snap = 8
                                      snap = rcu_seq_snap(&rcu_state.gp_seq)
                                      // Two full GP
                                      rcu_seq_done_exact(&rnp->gp_seq, snap)


Here, the
ULONG_CMP_LT(cur_s, s - (2 * RCU_SEQ_STATE_MASK + 1));

Will be
ULONG_CMP_LT(0, 8 - (2 * RCU_SEQ_STATE_MASK + 1));

 = ULONG_CMP_LT(0, 8 - 7)

 = TRUE.

Which means rcu_seq_done_exact() will return a false positive saying the GP has
completed even though it has not.

I think rcu_seq_done_exact() is off by one and should be doing:

ULONG_CMP_LT(cur_s, s - (2 * RCU_SEQ_STATE_MASK + 2));

?

Oh and I see from an old email that Frederic did ask about this: "Should it be
ULONG_CMP_LT(cur_s, s - (2 * (RCU_SEQ_STATE_MASK + 1))) ?"

thanks,

 - Joel





^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] rcu: Robustify rcu_is_cpu_rrupt_from_idle()
  2025-03-18 13:56 ` [PATCH 2/2] rcu: Robustify rcu_is_cpu_rrupt_from_idle() Frederic Weisbecker
@ 2025-03-22 17:00   ` Joel Fernandes
  2025-03-23 22:31     ` Frederic Weisbecker
  0 siblings, 1 reply; 14+ messages in thread
From: Joel Fernandes @ 2025-03-22 17:00 UTC (permalink / raw)
  To: Frederic Weisbecker, LKML
  Cc: Boqun Feng, Neeraj Upadhyay, Paul E . McKenney, Uladzislau Rezki,
	Zqiang, rcu



On 3/18/2025 2:56 PM, Frederic Weisbecker wrote:
> RCU relies on the context tracking nesting counter in order to determine
> if it is running in extended quiescent state.
> 
> However the context tracking nesting counter is not completely
> synchronized with the actual context tracking state:
> 
> * The nesting counter is set to 1 or incremented further _after_ the
>   actual state is set to RCU not watching.

I agree with patch, but this line is a bit confusing ->nesting is set to 1
*after* the RCU state is set to "watching".  Did you mean "watching" ?

But I think you meant "After RCU transitions from a state of not-watching to
watching' instead of 'actual state is set to RCU not watching'..

ct_kernel_entry():

	// RCU is not watching here ...
	ct_kernel_enter_state(offset);
	// ... but is watching here.
	WRITE_ONCE(ct->nesting, 1);

>    (then we know for sure we interrupted RCU not watching)
> 
> * The nesting counter is set to 0 or decremented further _before_ the
>   actual state is set to RCU watching.
> 
> Therefore it is safe to assume that if ct_nesting() > 0, RCU is not
> watching. But if ct_nesting() <= 0, RCU is watching except for a tiny
> window.
> 
> This hasn't been a problem so far because rcu_is_cpu_rrupt_from_idle()
> has only been called from interrupts. However the code is confusing

Agreed, and I could also see the existing code's snippet:
	WARN_ON_ONCE(!nesting && !is_idle_task(current));

.. not working if this function were to be called from non-interrupt kernel context.


> and abuses the role of the context tracking nesting counter while there
> are more accurate indicators available.
> 
> Clarify and robustify accordingly.
> 
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> ---
>  kernel/rcu/tree.c | 27 +++++++++++++++++----------
>  1 file changed, 17 insertions(+), 10 deletions(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 79dced5fb72e..90c43061c981 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -367,7 +367,7 @@ EXPORT_SYMBOL_GPL(rcu_momentary_eqs);
>   */
>  static int rcu_is_cpu_rrupt_from_idle(void)
>  {
> -	long nesting;
> +	long nmi_nesting = ct_nmi_nesting();
>  
>  	/*
>  	 * Usually called from the tick; but also used from smp_function_call()
> @@ -379,21 +379,28 @@ static int rcu_is_cpu_rrupt_from_idle(void)
>  	/* Check for counter underflows */
>  	RCU_LOCKDEP_WARN(ct_nesting() < 0,
>  			 "RCU nesting counter underflow!");
> -	RCU_LOCKDEP_WARN(ct_nmi_nesting() <= 0,
> -			 "RCU nmi_nesting counter underflow/zero!");
>  
> -	/* Are we at first interrupt nesting level? */
> -	nesting = ct_nmi_nesting();
> -	if (nesting > 1)
> +	/* Non-idle interrupt or nested idle interrupt */
> +	if (nmi_nesting > 1)
>  		return false;
>  
>  	/*
> -	 * If we're not in an interrupt, we must be in the idle task!
> +	 * Non nested idle interrupt (interrupting section where RCU
> +	 * wasn't watching).
>  	 */
> -	WARN_ON_ONCE(!nesting && !is_idle_task(current));
> +	if (nmi_nesting == 1)
> +		return true;
>  
> -	/* Does CPU appear to be idle from an RCU standpoint? */
> -	return ct_nesting() == 0;
> +	/* Not in an interrupt */
> +	if (!nmi_nesting) {
> +		RCU_LOCKDEP_WARN(!in_task() || !is_idle_task(current),
> +				 "RCU nmi_nesting counter not in idle task!");
> +		return !rcu_is_watching_curr_cpu();

Makes sense to me and it is also consistent with rcu_watching_snap_in_eqs().

thanks,

 - Joel



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] rcu: Comment on the extraneous delta test on rcu_seq_done_exact()
  2025-03-22 14:40               ` Joel Fernandes
@ 2025-03-23 22:05                 ` Frederic Weisbecker
  2025-03-23 22:57                   ` Joel Fernandes
  0 siblings, 1 reply; 14+ messages in thread
From: Frederic Weisbecker @ 2025-03-23 22:05 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Paul E. McKenney, LKML, Boqun Feng, Neeraj Upadhyay,
	Uladzislau Rezki, Zqiang, rcu

Le Sat, Mar 22, 2025 at 03:40:53PM +0100, Joel Fernandes a écrit :
> 
> 
> On 3/22/2025 3:20 PM, Joel Fernandes wrote:
> > 
> > On 3/22/2025 11:25 AM, Frederic Weisbecker wrote:
> >> Le Sat, Mar 22, 2025 at 03:06:08AM +0100, Joel Fernandes a écrit :
> >>> Insomnia kicked in, so 3 am reply here (Zurich local time) ;-):
> >>>
> >>> On 3/20/2025 3:15 PM, Frederic Weisbecker wrote:
> >>>> Le Wed, Mar 19, 2025 at 03:38:31PM -0400, Joel Fernandes a écrit :
> >>>>> On Tue, Mar 18, 2025 at 11:37:38AM -0700, Paul E. McKenney wrote:
> >>>>>> On Tue, Mar 18, 2025 at 02:56:18PM +0100, Frederic Weisbecker wrote:
> >>>>>>> The numbers used in rcu_seq_done_exact() lack some explanation behind
> >>>>>>> their magic. Especially after the commit:
> >>>>>>>
> >>>>>>>     85aad7cc4178 ("rcu: Fix get_state_synchronize_rcu_full() GP-start detection")
> >>>>>>>
> >>>>>>> which reported a subtle issue where a new GP sequence snapshot was taken
> >>>>>>> on the root node state while a grace period had already been started and
> >>>>>>> reflected on the global state sequence but not yet on the root node
> >>>>>>> sequence, making a polling user waiting on a wrong already started grace
> >>>>>>> period that would ignore freshly online CPUs.
> >>>>>>>
> >>>>>>> The fix involved taking the snaphot on the global state sequence and
> >>>>>>> waiting on the root node sequence. And since a grace period is first
> >>>>>>> started on the global state and only afterward reflected on the root
> >>>>>>> node, a snapshot taken on the global state sequence might be two full
> >>>>>>> grace periods ahead of the root node as in the following example:
> >>>>>>>
> >>>>>>> rnp->gp_seq = rcu_state.gp_seq = 0
> >>>>>>>
> >>>>>>>     CPU 0                                           CPU 1
> >>>>>>>     -----                                           -----
> >>>>>>>     // rcu_state.gp_seq = 1
> >>>>>>>     rcu_seq_start(&rcu_state.gp_seq)
> >>>>>>>                                                     // snap = 8
> >>>>>>>                                                     snap = rcu_seq_snap(&rcu_state.gp_seq)
> >>>>>>>                                                     // Two full GP differences
> >>>>>>>                                                     rcu_seq_done_exact(&rnp->gp_seq, snap)
> >>>>>>>     // rnp->gp_seq = 1
> >>>>>>>     WRITE_ONCE(rnp->gp_seq, rcu_state.gp_seq);
> >>>>>>>
> >>>>>>> Add a comment about those expectations and to clarify the magic within
> >>>>>>> the relevant function.
> >>>>>>>
> >>>>>>> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> >>>>>> Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
> >>>>>>
> >>>>>> But it would of course be good to get reviews from the others.
> >>>>> I actually don't agree that the magic in the rcu_seq_done_exact() function about the
> >>>>> ~2 GPs is related to the lag between rcu_state.gp_seq and root rnp->gp_seq,
> >>>>> because the small lag can just as well survive with the rcu_seq_done()
> >>>>> function in the above sequence right?
> >>>>>
> >>>>> The rcu_seq_done_exact() function on the other hand is more about not being
> >>>>> stuck in the ULONG_MAX/2 guard band, but to actually get to that, you need a
> >>>>> wrap around to happen and the delta between "rnp->gp_seq" and "snap" to be at
> >>>>> least ULONG_MAX/2 AFAIU.
> >>>>>
> >>>>> So the only time this magic will matter is if you have a huge delta between
> >>>>> what is being compared, not just 2 GPs.
> >>>> You're right, and perhaps I should have made it more specific that my comment
> >>>> only explains the magic "3" number here, in that if it were "2" instead, there
> >>>> could be accidents with 2 full GPs difference (which is possible) spuriously
> >>>> accounted as a wrap around.
> >>> Ahh, so I guess I get it now and we are both right. The complete picture is - We
> >>> are trying to handle the case of "very large wrap" around but as a part of that,
> >>> we don't want to create false-positives for this "snap" case.
> >>>
> >>> A "snap" can be atmost  (2 * RCU_SEQ_STATE_MASK + 1) away from a gp_seq.
> >>>
> >>> That's within "2 GPs" worth of counts (about 8 counts)
> >>>
> >>> Taking some numbers:
> >>>
> >>> cur_s	s	delta (s - cur_s)
> >>> 0	4	4
> >>> 1	8	7
> >>> 2	8	6
> >>> 3	8	5
> >>> 4	8	4
> >>> 5	12	7
> >>>
> >>> The maximum delta of a snap from actual gp_seq can be (2 * RCU_SEQ_STATE_MASK +
> >>> 1) which in this case is 7.
> >>>
> >>> So we adjust the comparison by adding the  ULONG_CMP_LT(cur_s, s - (2 *
> >>> RCU_SEQ_STATE_MASK + 1)). i.e.
> >> 3, right?
> > Just to be absolutely sure, are you talking about the value of RCU_SEQ_STATE_MASK ?
> > 
> > That is indeed 3 (RCU_SEQ_STATE_MASK).
> > 
> > But if we're talking about number of GPs, my understanding is a count of 4 is
> > one GP worth. Per the above table, the delta between gp_seq and is snap is
> > always a count of 7 (hence less than 2 GPs).
> > 
> > Agreed?
> > 
> > If RCU_SEQ_STATE_MASK was 0x1 instead of 0x11, that is a single bit (or a count
> > of 2 instead of 4, for a GP), then the table would be:
> > 
> >  cur_s	s (snap)	delta (s - cur_s)
> >  0	2		2
> >  1	4		3
> >  2	4		2
> >  3	6		3
> >  4	6		2
> >  5	8		3
> > 
> > So delta is always <= 3,  Or more generally: <= (RCU_SEQ_STATE_MASK * 2) + 1
> 
> Oh man, I am wondering if we are on to a bug here:
> 
> From your example:
> 
>     CPU 0                                           CPU 1
>     -----                                           -----
>     // rcu_state.gp_seq = 1
>     rcu_seq_start(&rcu_state.gp_seq)
>                                       // snap = 8
>                                       snap = rcu_seq_snap(&rcu_state.gp_seq)
>                                       // Two full GP
>                                       rcu_seq_done_exact(&rnp->gp_seq, snap)
> 
> 
> Here, the
> ULONG_CMP_LT(cur_s, s - (2 * RCU_SEQ_STATE_MASK + 1));
> 
> Will be
> ULONG_CMP_LT(0, 8 - (2 * RCU_SEQ_STATE_MASK + 1));
> 
>  = ULONG_CMP_LT(0, 8 - 7)
> 
>  = TRUE.
> 
> Which means rcu_seq_done_exact() will return a false positive saying the GP has
> completed even though it has not.
> 
> I think rcu_seq_done_exact() is off by one and should be doing:
> 
> ULONG_CMP_LT(cur_s, s - (2 * RCU_SEQ_STATE_MASK + 2));
> 
> ?

But it's ULONG_CMP_LT(cur_s, s - (3 * RCU_SEQ_STATE_MASK + 1) now since:

    85aad7cc4178 ("rcu: Fix get_state_synchronize_rcu_full() GP-start detection")

That's 10 so we are good.

However that magic value is arbitrary and doesn't mean much. It should be
like you said. Or rather for clarity:

diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index 7acf1f36dd6c..e53f0b687a83 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -57,6 +57,10 @@
 /* Low-order bit definition for polled grace-period APIs. */
 #define RCU_GET_STATE_COMPLETED	0x1
 
+/* A complete grace period count */
+#define RCU_SEQ_GP (RCU_SEQ_STATE_MASK + 1)
+
+
 extern int sysctl_sched_rt_runtime;
 
 /*
@@ -169,7 +173,7 @@ static inline bool rcu_seq_done_exact(unsigned long *sp, unsigned long s)
 {
 	unsigned long cur_s = READ_ONCE(*sp);
 
-	return ULONG_CMP_GE(cur_s, s) || ULONG_CMP_LT(cur_s, s - (3 * RCU_SEQ_STATE_MASK + 1));
+	return ULONG_CMP_GE(cur_s, s) || ULONG_CMP_LT(cur_s, s - (2 * RCU_SEQ_GP));
 }
 
 /*


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] rcu: Robustify rcu_is_cpu_rrupt_from_idle()
  2025-03-22 17:00   ` Joel Fernandes
@ 2025-03-23 22:31     ` Frederic Weisbecker
  0 siblings, 0 replies; 14+ messages in thread
From: Frederic Weisbecker @ 2025-03-23 22:31 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: LKML, Boqun Feng, Neeraj Upadhyay, Paul E . McKenney,
	Uladzislau Rezki, Zqiang, rcu

Le Sat, Mar 22, 2025 at 06:00:13PM +0100, Joel Fernandes a écrit :
> 
> 
> On 3/18/2025 2:56 PM, Frederic Weisbecker wrote:
> > RCU relies on the context tracking nesting counter in order to determine
> > if it is running in extended quiescent state.
> > 
> > However the context tracking nesting counter is not completely
> > synchronized with the actual context tracking state:
> > 
> > * The nesting counter is set to 1 or incremented further _after_ the
> >   actual state is set to RCU not watching.
> 
> I agree with patch, but this line is a bit confusing ->nesting is set to 1
> *after* the RCU state is set to "watching".  Did you mean "watching" ?
> 
> But I think you meant "After RCU transitions from a state of not-watching to
> watching' instead of 'actual state is set to RCU not watching'..
> 
> ct_kernel_entry():
> 
> 	// RCU is not watching here ...
> 	ct_kernel_enter_state(offset);
> 	// ... but is watching here.
> 	WRITE_ONCE(ct->nesting, 1);

Oh I completely inverted the thing in the changelog!

> 
> >    (then we know for sure we interrupted RCU not watching)
> > 
> > * The nesting counter is set to 0 or decremented further _before_ the
> >   actual state is set to RCU watching.
> > 
> > Therefore it is safe to assume that if ct_nesting() > 0, RCU is not
> > watching. But if ct_nesting() <= 0, RCU is watching except for a tiny
> > window.
> > 
> > This hasn't been a problem so far because rcu_is_cpu_rrupt_from_idle()
> > has only been called from interrupts. However the code is confusing
> 
> Agreed, and I could also see the existing code's snippet:
> 	WARN_ON_ONCE(!nesting && !is_idle_task(current));
> 
> .. not working if this function were to be called from non-interrupt kernel
> context.

Right.

I'll reissue that one.

Thanks!

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] rcu: Comment on the extraneous delta test on rcu_seq_done_exact()
  2025-03-23 22:05                 ` Frederic Weisbecker
@ 2025-03-23 22:57                   ` Joel Fernandes
  0 siblings, 0 replies; 14+ messages in thread
From: Joel Fernandes @ 2025-03-23 22:57 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Paul E. McKenney, LKML, Boqun Feng, Neeraj Upadhyay,
	Uladzislau Rezki, Zqiang, rcu



> On Mar 23, 2025, at 11:05 PM, Frederic Weisbecker <frederic@kernel.org> wrote:
> 
> Le Sat, Mar 22, 2025 at 03:40:53PM +0100, Joel Fernandes a écrit :
>> 
>> 
>>> On 3/22/2025 3:20 PM, Joel Fernandes wrote:
>>> 
>>> On 3/22/2025 11:25 AM, Frederic Weisbecker wrote:
>>>> Le Sat, Mar 22, 2025 at 03:06:08AM +0100, Joel Fernandes a écrit :
>>>>> Insomnia kicked in, so 3 am reply here (Zurich local time) ;-):
>>>>> 
>>>>> On 3/20/2025 3:15 PM, Frederic Weisbecker wrote:
>>>>>> Le Wed, Mar 19, 2025 at 03:38:31PM -0400, Joel Fernandes a écrit :
>>>>>>> On Tue, Mar 18, 2025 at 11:37:38AM -0700, Paul E. McKenney wrote:
>>>>>>>> On Tue, Mar 18, 2025 at 02:56:18PM +0100, Frederic Weisbecker wrote:
>>>>>>>>> The numbers used in rcu_seq_done_exact() lack some explanation behind
>>>>>>>>> their magic. Especially after the commit:
>>>>>>>>> 
>>>>>>>>>    85aad7cc4178 ("rcu: Fix get_state_synchronize_rcu_full() GP-start detection")
>>>>>>>>> 
>>>>>>>>> which reported a subtle issue where a new GP sequence snapshot was taken
>>>>>>>>> on the root node state while a grace period had already been started and
>>>>>>>>> reflected on the global state sequence but not yet on the root node
>>>>>>>>> sequence, making a polling user waiting on a wrong already started grace
>>>>>>>>> period that would ignore freshly online CPUs.
>>>>>>>>> 
>>>>>>>>> The fix involved taking the snaphot on the global state sequence and
>>>>>>>>> waiting on the root node sequence. And since a grace period is first
>>>>>>>>> started on the global state and only afterward reflected on the root
>>>>>>>>> node, a snapshot taken on the global state sequence might be two full
>>>>>>>>> grace periods ahead of the root node as in the following example:
>>>>>>>>> 
>>>>>>>>> rnp->gp_seq = rcu_state.gp_seq = 0
>>>>>>>>> 
>>>>>>>>>    CPU 0                                           CPU 1
>>>>>>>>>    -----                                           -----
>>>>>>>>>    // rcu_state.gp_seq = 1
>>>>>>>>>    rcu_seq_start(&rcu_state.gp_seq)
>>>>>>>>>                                                    // snap = 8
>>>>>>>>>                                                    snap = rcu_seq_snap(&rcu_state.gp_seq)
>>>>>>>>>                                                    // Two full GP differences
>>>>>>>>>                                                    rcu_seq_done_exact(&rnp->gp_seq, snap)
>>>>>>>>>    // rnp->gp_seq = 1
>>>>>>>>>    WRITE_ONCE(rnp->gp_seq, rcu_state.gp_seq);
>>>>>>>>> 
>>>>>>>>> Add a comment about those expectations and to clarify the magic within
>>>>>>>>> the relevant function.
>>>>>>>>> 
>>>>>>>>> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
>>>>>>>> Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
>>>>>>>> 
>>>>>>>> But it would of course be good to get reviews from the others.
>>>>>>> I actually don't agree that the magic in the rcu_seq_done_exact() function about the
>>>>>>> ~2 GPs is related to the lag between rcu_state.gp_seq and root rnp->gp_seq,
>>>>>>> because the small lag can just as well survive with the rcu_seq_done()
>>>>>>> function in the above sequence right?
>>>>>>> 
>>>>>>> The rcu_seq_done_exact() function on the other hand is more about not being
>>>>>>> stuck in the ULONG_MAX/2 guard band, but to actually get to that, you need a
>>>>>>> wrap around to happen and the delta between "rnp->gp_seq" and "snap" to be at
>>>>>>> least ULONG_MAX/2 AFAIU.
>>>>>>> 
>>>>>>> So the only time this magic will matter is if you have a huge delta between
>>>>>>> what is being compared, not just 2 GPs.
>>>>>> You're right, and perhaps I should have made it more specific that my comment
>>>>>> only explains the magic "3" number here, in that if it were "2" instead, there
>>>>>> could be accidents with 2 full GPs difference (which is possible) spuriously
>>>>>> accounted as a wrap around.
>>>>> Ahh, so I guess I get it now and we are both right. The complete picture is - We
>>>>> are trying to handle the case of "very large wrap" around but as a part of that,
>>>>> we don't want to create false-positives for this "snap" case.
>>>>> 
>>>>> A "snap" can be atmost  (2 * RCU_SEQ_STATE_MASK + 1) away from a gp_seq.
>>>>> 
>>>>> That's within "2 GPs" worth of counts (about 8 counts)
>>>>> 
>>>>> Taking some numbers:
>>>>> 
>>>>> cur_s    s    delta (s - cur_s)
>>>>> 0    4    4
>>>>> 1    8    7
>>>>> 2    8    6
>>>>> 3    8    5
>>>>> 4    8    4
>>>>> 5    12    7
>>>>> 
>>>>> The maximum delta of a snap from actual gp_seq can be (2 * RCU_SEQ_STATE_MASK +
>>>>> 1) which in this case is 7.
>>>>> 
>>>>> So we adjust the comparison by adding the  ULONG_CMP_LT(cur_s, s - (2 *
>>>>> RCU_SEQ_STATE_MASK + 1)). i.e.
>>>> 3, right?
>>> Just to be absolutely sure, are you talking about the value of RCU_SEQ_STATE_MASK ?
>>> 
>>> That is indeed 3 (RCU_SEQ_STATE_MASK).
>>> 
>>> But if we're talking about number of GPs, my understanding is a count of 4 is
>>> one GP worth. Per the above table, the delta between gp_seq and is snap is
>>> always a count of 7 (hence less than 2 GPs).
>>> 
>>> Agreed?
>>> 
>>> If RCU_SEQ_STATE_MASK was 0x1 instead of 0x11, that is a single bit (or a count
>>> of 2 instead of 4, for a GP), then the table would be:
>>> 
>>> cur_s    s (snap)    delta (s - cur_s)
>>> 0    2        2
>>> 1    4        3
>>> 2    4        2
>>> 3    6        3
>>> 4    6        2
>>> 5    8        3
>>> 
>>> So delta is always <= 3,  Or more generally: <= (RCU_SEQ_STATE_MASK * 2) + 1
>> 
>> Oh man, I am wondering if we are on to a bug here:
>> 
>> From your example:
>> 
>>    CPU 0                                           CPU 1
>>    -----                                           -----
>>    // rcu_state.gp_seq = 1
>>    rcu_seq_start(&rcu_state.gp_seq)
>>                                      // snap = 8
>>                                      snap = rcu_seq_snap(&rcu_state.gp_seq)
>>                                      // Two full GP
>>                                      rcu_seq_done_exact(&rnp->gp_seq, snap)
>> 
>> 
>> Here, the
>> ULONG_CMP_LT(cur_s, s - (2 * RCU_SEQ_STATE_MASK + 1));
>> 
>> Will be
>> ULONG_CMP_LT(0, 8 - (2 * RCU_SEQ_STATE_MASK + 1));
>> 
>> = ULONG_CMP_LT(0, 8 - 7)
>> 
>> = TRUE.
>> 
>> Which means rcu_seq_done_exact() will return a false positive saying the GP has
>> completed even though it has not.
>> 
>> I think rcu_seq_done_exact() is off by one and should be doing:
>> 
>> ULONG_CMP_LT(cur_s, s - (2 * RCU_SEQ_STATE_MASK + 2));
>> 
>> ?
> 
> But it's ULONG_CMP_LT(cur_s, s - (3 * RCU_SEQ_STATE_MASK + 1) now since:
> 
>    85aad7cc4178 ("rcu: Fix get_state_synchronize_rcu_full() GP-start detection")
> 
> That's 10 so we are good.
> 
> However that magic value is arbitrary and doesn't mean much. It should be
> like you said. Or rather for clarity:
> 
> diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
> index 7acf1f36dd6c..e53f0b687a83 100644
> --- a/kernel/rcu/rcu.h
> +++ b/kernel/rcu/rcu.h
> @@ -57,6 +57,10 @@
> /* Low-order bit definition for polled grace-period APIs. */
> #define RCU_GET_STATE_COMPLETED    0x1
> 
> +/* A complete grace period count */
> +#define RCU_SEQ_GP (RCU_SEQ_STATE_MASK + 1)
> +
> +
> extern int sysctl_sched_rt_runtime;
> 
> /*
> @@ -169,7 +173,7 @@ static inline bool rcu_seq_done_exact(unsigned long *sp, unsigned long s)
> {
>    unsigned long cur_s = READ_ONCE(*sp);
> 
> -    return ULONG_CMP_GE(cur_s, s) || ULONG_CMP_LT(cur_s, s - (3 * RCU_SEQ_STATE_MASK + 1));
> +    return ULONG_CMP_GE(cur_s, s) || ULONG_CMP_LT(cur_s, s - (2 * RCU_SEQ_GP));
> }

Ah, my kernel did not have the change at the time I commented, sorry. I agree that this is a much better and meaningful expression than the existing one. I shall create a patch with it and send it out with my other series on forcing the wrap for testing (along with the modification for the new comments you added).

Thanks!


> 
> /*
> 
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2025-03-23 22:57 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-03-18 13:56 [PATCH 0/2] rcu: Random updates Frederic Weisbecker
2025-03-18 13:56 ` [PATCH 1/2] rcu: Comment on the extraneous delta test on rcu_seq_done_exact() Frederic Weisbecker
2025-03-18 18:37   ` Paul E. McKenney
2025-03-19 19:38     ` Joel Fernandes
2025-03-20 14:15       ` Frederic Weisbecker
2025-03-22  2:06         ` Joel Fernandes
2025-03-22 10:25           ` Frederic Weisbecker
2025-03-22 14:20             ` Joel Fernandes
2025-03-22 14:40               ` Joel Fernandes
2025-03-23 22:05                 ` Frederic Weisbecker
2025-03-23 22:57                   ` Joel Fernandes
2025-03-18 13:56 ` [PATCH 2/2] rcu: Robustify rcu_is_cpu_rrupt_from_idle() Frederic Weisbecker
2025-03-22 17:00   ` Joel Fernandes
2025-03-23 22:31     ` Frederic Weisbecker

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).