* [PATCH] rcu: Fix race in set and clear TICK_DEP_BIT_RCU_EXP bitmask
@ 2022-12-20 11:25 Zqiang
2022-12-21 20:08 ` Paul E. McKenney
From: Zqiang @ 2022-12-20 11:25 UTC (permalink / raw)
To: paulmck, frederic, quic_neeraju, joel; +Cc: rcu, linux-kernel
For kernels built with CONFIG_NO_HZ_FULL enabled, where the CPUs below are
nohz_full CPUs:
CPU1                                          CPU2
rcu_report_exp_cpu_mult                       synchronize_rcu_expedited_wait
  acquires rnp->lock                            mask = rnp->expmask;
                                                for_each_leaf_node_cpu_mask(rnp, cpu, mask)
  rnp->expmask = rnp->expmask & ~mask;            rdp = per_cpu_ptr(&rcu_data, cpu1);
  for_each_leaf_node_cpu_mask(rnp, cpu, mask)
    rdp = per_cpu_ptr(&rcu_data, cpu1);
    if (!rdp->rcu_forced_tick_exp)
      continue;                                   rdp->rcu_forced_tick_exp = true;
                                                  tick_dep_set_cpu(cpu1, TICK_DEP_BIT_RCU_EXP);
In the above scenario, after CPU1 reports the quiescent state, it misses the
opportunity to clear the TICK_DEP_BIT_RCU_EXP bitmask, which will not be
cleared until the next expedited grace period starts and CPU1's quiescent
state is reported again. During this window, CPU1's tick cannot be stopped.
If CPU1 has only one runnable task and that task has aggressive real-time
response constraints, the task may suffer one of the worst-case response
times.
Therefore, this commit acquires rnp->lock when setting the
TICK_DEP_BIT_RCU_EXP bitmask in order to fix this race.
Signed-off-by: Zqiang <qiang1.zhang@intel.com>
---
kernel/rcu/tree_exp.h | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 927abaf6c822..e5fe0099488b 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -593,6 +593,7 @@ static void synchronize_rcu_expedited_wait(void)
struct rcu_data *rdp;
struct rcu_node *rnp;
struct rcu_node *rnp_root = rcu_get_root();
+ unsigned long flags;
trace_rcu_exp_grace_period(rcu_state.name, rcu_exp_gp_seq_endval(), TPS("startwait"));
jiffies_stall = rcu_exp_jiffies_till_stall_check();
@@ -601,17 +602,17 @@ static void synchronize_rcu_expedited_wait(void)
if (synchronize_rcu_expedited_wait_once(1))
return;
rcu_for_each_leaf_node(rnp) {
+ raw_spin_lock_irqsave_rcu_node(rnp, flags);
mask = READ_ONCE(rnp->expmask);
for_each_leaf_node_cpu_mask(rnp, cpu, mask) {
rdp = per_cpu_ptr(&rcu_data, cpu);
if (rdp->rcu_forced_tick_exp)
continue;
rdp->rcu_forced_tick_exp = true;
- preempt_disable();
if (cpu_online(cpu))
tick_dep_set_cpu(cpu, TICK_DEP_BIT_RCU_EXP);
- preempt_enable();
}
+ raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
}
j = READ_ONCE(jiffies_till_first_fqs);
if (synchronize_rcu_expedited_wait_once(j + HZ))
--
2.25.1
* Re: [PATCH] rcu: Fix race in set and clear TICK_DEP_BIT_RCU_EXP bitmask
2022-12-20 11:25 [PATCH] rcu: Fix race in set and clear TICK_DEP_BIT_RCU_EXP bitmask Zqiang
@ 2022-12-21 20:08 ` Paul E. McKenney
2022-12-22 9:48 ` Zhang, Qiang1
2022-12-31 18:25 ` Frederic Weisbecker
From: Paul E. McKenney @ 2022-12-21 20:08 UTC (permalink / raw)
To: Zqiang; +Cc: frederic, quic_neeraju, joel, rcu, linux-kernel
On Tue, Dec 20, 2022 at 07:25:20PM +0800, Zqiang wrote:
> For kernels built with CONFIG_NO_HZ_FULL enabled, where the CPUs below are
> nohz_full CPUs:
>
> CPU1                                          CPU2
> rcu_report_exp_cpu_mult                       synchronize_rcu_expedited_wait
>   acquires rnp->lock                            mask = rnp->expmask;
>                                                 for_each_leaf_node_cpu_mask(rnp, cpu, mask)
>   rnp->expmask = rnp->expmask & ~mask;            rdp = per_cpu_ptr(&rcu_data, cpu1);
>   for_each_leaf_node_cpu_mask(rnp, cpu, mask)
>     rdp = per_cpu_ptr(&rcu_data, cpu1);
>     if (!rdp->rcu_forced_tick_exp)
>       continue;                                   rdp->rcu_forced_tick_exp = true;
>                                                   tick_dep_set_cpu(cpu1, TICK_DEP_BIT_RCU_EXP);
>
> In the above scenario, after CPU1 reports the quiescent state, it misses the
> opportunity to clear the TICK_DEP_BIT_RCU_EXP bitmask, which will not be
> cleared until the next expedited grace period starts and CPU1's quiescent
> state is reported again. During this window, CPU1's tick cannot be stopped.
> If CPU1 has only one runnable task and that task has aggressive real-time
> response constraints, the task may suffer one of the worst-case response
> times.
>
> Therefore, this commit acquires rnp->lock when setting the
> TICK_DEP_BIT_RCU_EXP bitmask in order to fix this race.
>
> Signed-off-by: Zqiang <qiang1.zhang@intel.com>
Good eyes, thank you!!!
Queued for testing and further review as follows, as always, please
check for errors.
Thanx, Paul
------------------------------------------------------------------------
commit acfe689f2e473fb59b6d2c95af5fe36198bb9a84
Author: Zqiang <qiang1.zhang@intel.com>
Date: Tue Dec 20 19:25:20 2022 +0800
rcu: Fix set/clear TICK_DEP_BIT_RCU_EXP bitmask race
For kernels built with CONFIG_NO_HZ_FULL=y, the following scenario can result
in the scheduling-clock interrupt remaining enabled on a holdout CPU after
its quiescent state has been reported:
CPU1                                          CPU2
rcu_report_exp_cpu_mult                       synchronize_rcu_expedited_wait
  acquires rnp->lock                            mask = rnp->expmask;
                                                for_each_leaf_node_cpu_mask(rnp, cpu, mask)
  rnp->expmask = rnp->expmask & ~mask;            rdp = per_cpu_ptr(&rcu_data, cpu1);
  for_each_leaf_node_cpu_mask(rnp, cpu, mask)
    rdp = per_cpu_ptr(&rcu_data, cpu1);
    if (!rdp->rcu_forced_tick_exp)
      continue;                                   rdp->rcu_forced_tick_exp = true;
                                                  tick_dep_set_cpu(cpu1, TICK_DEP_BIT_RCU_EXP);
The problem is that CPU2's sampling of rnp->expmask is obsolete by the
time it invokes tick_dep_set_cpu(), and CPU1 is not guaranteed to see
CPU2's store to ->rcu_forced_tick_exp in time to clear it. And even if
CPU1 does see that store, it might invoke tick_dep_clear_cpu() before
CPU2 got around to executing its tick_dep_set_cpu(), which would still
leave the victim CPU with its scheduler-clock tick running.
Either way, an nohz_full real-time application running on the victim
CPU would have its latency needlessly degraded.
Note that expedited RCU grace periods look at context-tracking
information, and so if the CPU is executing in nohz_full usermode
throughout, that CPU cannot be victimized in this manner.
This commit therefore causes synchronize_rcu_expedited_wait to hold
the rcu_node structure's ->lock when checking for holdout CPUs, setting
TICK_DEP_BIT_RCU_EXP, and invoking tick_dep_set_cpu(), thus preventing
this race.
Signed-off-by: Zqiang <qiang1.zhang@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 249c2967d9e6c..7cc4856da0817 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -594,6 +594,7 @@ static void synchronize_rcu_expedited_wait(void)
struct rcu_data *rdp;
struct rcu_node *rnp;
struct rcu_node *rnp_root = rcu_get_root();
+ unsigned long flags;
trace_rcu_exp_grace_period(rcu_state.name, rcu_exp_gp_seq_endval(), TPS("startwait"));
jiffies_stall = rcu_exp_jiffies_till_stall_check();
@@ -602,17 +603,17 @@ static void synchronize_rcu_expedited_wait(void)
if (synchronize_rcu_expedited_wait_once(1))
return;
rcu_for_each_leaf_node(rnp) {
+ raw_spin_lock_irqsave_rcu_node(rnp, flags);
mask = READ_ONCE(rnp->expmask);
for_each_leaf_node_cpu_mask(rnp, cpu, mask) {
rdp = per_cpu_ptr(&rcu_data, cpu);
if (rdp->rcu_forced_tick_exp)
continue;
rdp->rcu_forced_tick_exp = true;
- preempt_disable();
if (cpu_online(cpu))
tick_dep_set_cpu(cpu, TICK_DEP_BIT_RCU_EXP);
- preempt_enable();
}
+ raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
}
j = READ_ONCE(jiffies_till_first_fqs);
if (synchronize_rcu_expedited_wait_once(j + HZ))
* RE: [PATCH] rcu: Fix race in set and clear TICK_DEP_BIT_RCU_EXP bitmask
2022-12-21 20:08 ` Paul E. McKenney
@ 2022-12-22 9:48 ` Zhang, Qiang1
2022-12-22 15:18 ` Paul E. McKenney
2022-12-31 18:25 ` Frederic Weisbecker
From: Zhang, Qiang1 @ 2022-12-22 9:48 UTC (permalink / raw)
To: paulmck@kernel.org
Cc: frederic@kernel.org, quic_neeraju@quicinc.com,
joel@joelfernandes.org, rcu@vger.kernel.org,
linux-kernel@vger.kernel.org
> For kernels built with CONFIG_NO_HZ_FULL enabled, where the CPUs below are
> nohz_full CPUs:
>
> CPU1                                          CPU2
> rcu_report_exp_cpu_mult                       synchronize_rcu_expedited_wait
>   acquires rnp->lock                            mask = rnp->expmask;
>                                                 for_each_leaf_node_cpu_mask(rnp, cpu, mask)
>   rnp->expmask = rnp->expmask & ~mask;            rdp = per_cpu_ptr(&rcu_data, cpu1);
>   for_each_leaf_node_cpu_mask(rnp, cpu, mask)
>     rdp = per_cpu_ptr(&rcu_data, cpu1);
>     if (!rdp->rcu_forced_tick_exp)
>       continue;                                   rdp->rcu_forced_tick_exp = true;
>                                                   tick_dep_set_cpu(cpu1, TICK_DEP_BIT_RCU_EXP);
>
> In the above scenario, after CPU1 reports the quiescent state, it misses the
> opportunity to clear the TICK_DEP_BIT_RCU_EXP bitmask, which will not be
> cleared until the next expedited grace period starts and CPU1's quiescent
> state is reported again. During this window, CPU1's tick cannot be stopped.
> If CPU1 has only one runnable task and that task has aggressive real-time
> response constraints, the task may suffer one of the worst-case response
> times.
>
> Therefore, this commit acquires rnp->lock when setting the
> TICK_DEP_BIT_RCU_EXP bitmask in order to fix this race.
>
> Signed-off-by: Zqiang <qiang1.zhang@intel.com>
>
>Good eyes, thank you!!!
>
>Queued for testing and further review as follows, as always, please check for errors.
>
It looks more clear now, thank you!
Thanks
Zqiang
> Thanx, Paul
>
>------------------------------------------------------------------------
commit acfe689f2e473fb59b6d2c95af5fe36198bb9a84
Author: Zqiang <qiang1.zhang@intel.com>
Date: Tue Dec 20 19:25:20 2022 +0800
rcu: Fix set/clear TICK_DEP_BIT_RCU_EXP bitmask race
For kernels built with CONFIG_NO_HZ_FULL=y, the following scenario can result
in the scheduling-clock interrupt remaining enabled on a holdout CPU after
its quiescent state has been reported:
CPU1                                          CPU2
rcu_report_exp_cpu_mult                       synchronize_rcu_expedited_wait
  acquires rnp->lock                            mask = rnp->expmask;
                                                for_each_leaf_node_cpu_mask(rnp, cpu, mask)
  rnp->expmask = rnp->expmask & ~mask;            rdp = per_cpu_ptr(&rcu_data, cpu1);
  for_each_leaf_node_cpu_mask(rnp, cpu, mask)
    rdp = per_cpu_ptr(&rcu_data, cpu1);
    if (!rdp->rcu_forced_tick_exp)
      continue;                                   rdp->rcu_forced_tick_exp = true;
                                                  tick_dep_set_cpu(cpu1, TICK_DEP_BIT_RCU_EXP);
The problem is that CPU2's sampling of rnp->expmask is obsolete by the
time it invokes tick_dep_set_cpu(), and CPU1 is not guaranteed to see
CPU2's store to ->rcu_forced_tick_exp in time to clear it. And even if
CPU1 does see that store, it might invoke tick_dep_clear_cpu() before
CPU2 got around to executing its tick_dep_set_cpu(), which would still
leave the victim CPU with its scheduler-clock tick running.
Either way, an nohz_full real-time application running on the victim
CPU would have its latency needlessly degraded.
Note that expedited RCU grace periods look at context-tracking
information, and so if the CPU is executing in nohz_full usermode
throughout, that CPU cannot be victimized in this manner.
This commit therefore causes synchronize_rcu_expedited_wait to hold
the rcu_node structure's ->lock when checking for holdout CPUs, setting
TICK_DEP_BIT_RCU_EXP, and invoking tick_dep_set_cpu(), thus preventing
this race.
Signed-off-by: Zqiang <qiang1.zhang@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h index 249c2967d9e6c..7cc4856da0817 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -594,6 +594,7 @@ static void synchronize_rcu_expedited_wait(void)
struct rcu_data *rdp;
struct rcu_node *rnp;
struct rcu_node *rnp_root = rcu_get_root();
+ unsigned long flags;
trace_rcu_exp_grace_period(rcu_state.name, rcu_exp_gp_seq_endval(), TPS("startwait"));
jiffies_stall = rcu_exp_jiffies_till_stall_check();
@@ -602,17 +603,17 @@ static void synchronize_rcu_expedited_wait(void)
if (synchronize_rcu_expedited_wait_once(1))
return;
rcu_for_each_leaf_node(rnp) {
+ raw_spin_lock_irqsave_rcu_node(rnp, flags);
mask = READ_ONCE(rnp->expmask);
for_each_leaf_node_cpu_mask(rnp, cpu, mask) {
rdp = per_cpu_ptr(&rcu_data, cpu);
if (rdp->rcu_forced_tick_exp)
continue;
rdp->rcu_forced_tick_exp = true;
- preempt_disable();
if (cpu_online(cpu))
tick_dep_set_cpu(cpu, TICK_DEP_BIT_RCU_EXP);
- preempt_enable();
}
+ raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
}
j = READ_ONCE(jiffies_till_first_fqs);
if (synchronize_rcu_expedited_wait_once(j + HZ))
* Re: [PATCH] rcu: Fix race in set and clear TICK_DEP_BIT_RCU_EXP bitmask
2022-12-22 9:48 ` Zhang, Qiang1
@ 2022-12-22 15:18 ` Paul E. McKenney
From: Paul E. McKenney @ 2022-12-22 15:18 UTC (permalink / raw)
To: Zhang, Qiang1
Cc: frederic@kernel.org, quic_neeraju@quicinc.com,
joel@joelfernandes.org, rcu@vger.kernel.org,
linux-kernel@vger.kernel.org
On Thu, Dec 22, 2022 at 09:48:14AM +0000, Zhang, Qiang1 wrote:
> > For kernels built with CONFIG_NO_HZ_FULL enabled, where the CPUs below
> > are nohz_full CPUs:
> >
> > CPU1                                          CPU2
> > rcu_report_exp_cpu_mult                       synchronize_rcu_expedited_wait
> >   acquires rnp->lock                            mask = rnp->expmask;
> >                                                 for_each_leaf_node_cpu_mask(rnp, cpu, mask)
> >   rnp->expmask = rnp->expmask & ~mask;            rdp = per_cpu_ptr(&rcu_data, cpu1);
> >   for_each_leaf_node_cpu_mask(rnp, cpu, mask)
> >     rdp = per_cpu_ptr(&rcu_data, cpu1);
> >     if (!rdp->rcu_forced_tick_exp)
> >       continue;                                   rdp->rcu_forced_tick_exp = true;
> >                                                   tick_dep_set_cpu(cpu1, TICK_DEP_BIT_RCU_EXP);
> >
> > In the above scenario, after CPU1 reports the quiescent state, it misses
> > the opportunity to clear the TICK_DEP_BIT_RCU_EXP bitmask, which will not
> > be cleared until the next expedited grace period starts and CPU1's
> > quiescent state is reported again. During this window, CPU1's tick cannot
> > be stopped. If CPU1 has only one runnable task and that task has
> > aggressive real-time response constraints, the task may suffer one of the
> > worst-case response times.
> >
> > Therefore, this commit acquires rnp->lock when setting the
> > TICK_DEP_BIT_RCU_EXP bitmask in order to fix this race.
> >
> > Signed-off-by: Zqiang <qiang1.zhang@intel.com>
> >
> >Good eyes, thank you!!!
> >
> >Queued for testing and further review as follows, as always, please check for errors.
> >
>
> It looks more clear now, thank you!
Thank you for checking them both!
Thanx, Paul
> Thanks
> Zqiang
>
> > Thanx, Paul
> >
> >------------------------------------------------------------------------
>
> commit acfe689f2e473fb59b6d2c95af5fe36198bb9a84
> Author: Zqiang <qiang1.zhang@intel.com>
> Date: Tue Dec 20 19:25:20 2022 +0800
>
> rcu: Fix set/clear TICK_DEP_BIT_RCU_EXP bitmask race
>
> For kernels built with CONFIG_NO_HZ_FULL=y, the following scenario can result
> in the scheduling-clock interrupt remaining enabled on a holdout CPU after
> its quiescent state has been reported:
>
> CPU1                                          CPU2
> rcu_report_exp_cpu_mult                       synchronize_rcu_expedited_wait
>   acquires rnp->lock                            mask = rnp->expmask;
>                                                 for_each_leaf_node_cpu_mask(rnp, cpu, mask)
>   rnp->expmask = rnp->expmask & ~mask;            rdp = per_cpu_ptr(&rcu_data, cpu1);
>   for_each_leaf_node_cpu_mask(rnp, cpu, mask)
>     rdp = per_cpu_ptr(&rcu_data, cpu1);
>     if (!rdp->rcu_forced_tick_exp)
>       continue;                                   rdp->rcu_forced_tick_exp = true;
>                                                   tick_dep_set_cpu(cpu1, TICK_DEP_BIT_RCU_EXP);
>
> The problem is that CPU2's sampling of rnp->expmask is obsolete by the
> time it invokes tick_dep_set_cpu(), and CPU1 is not guaranteed to see
> CPU2's store to ->rcu_forced_tick_exp in time to clear it. And even if
> CPU1 does see that store, it might invoke tick_dep_clear_cpu() before
> CPU2 got around to executing its tick_dep_set_cpu(), which would still
> leave the victim CPU with its scheduler-clock tick running.
>
> Either way, an nohz_full real-time application running on the victim
> CPU would have its latency needlessly degraded.
>
> Note that expedited RCU grace periods look at context-tracking
> information, and so if the CPU is executing in nohz_full usermode
> throughout, that CPU cannot be victimized in this manner.
>
> This commit therefore causes synchronize_rcu_expedited_wait to hold
> the rcu_node structure's ->lock when checking for holdout CPUs, setting
> TICK_DEP_BIT_RCU_EXP, and invoking tick_dep_set_cpu(), thus preventing
> this race.
>
> Signed-off-by: Zqiang <qiang1.zhang@intel.com>
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
>
> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h index 249c2967d9e6c..7cc4856da0817 100644
> --- a/kernel/rcu/tree_exp.h
> +++ b/kernel/rcu/tree_exp.h
> @@ -594,6 +594,7 @@ static void synchronize_rcu_expedited_wait(void)
> struct rcu_data *rdp;
> struct rcu_node *rnp;
> struct rcu_node *rnp_root = rcu_get_root();
> + unsigned long flags;
>
> trace_rcu_exp_grace_period(rcu_state.name, rcu_exp_gp_seq_endval(), TPS("startwait"));
> jiffies_stall = rcu_exp_jiffies_till_stall_check();
> @@ -602,17 +603,17 @@ static void synchronize_rcu_expedited_wait(void)
> if (synchronize_rcu_expedited_wait_once(1))
> return;
> rcu_for_each_leaf_node(rnp) {
> + raw_spin_lock_irqsave_rcu_node(rnp, flags);
> mask = READ_ONCE(rnp->expmask);
> for_each_leaf_node_cpu_mask(rnp, cpu, mask) {
> rdp = per_cpu_ptr(&rcu_data, cpu);
> if (rdp->rcu_forced_tick_exp)
> continue;
> rdp->rcu_forced_tick_exp = true;
> - preempt_disable();
> if (cpu_online(cpu))
> tick_dep_set_cpu(cpu, TICK_DEP_BIT_RCU_EXP);
> - preempt_enable();
> }
> + raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> }
> j = READ_ONCE(jiffies_till_first_fqs);
> if (synchronize_rcu_expedited_wait_once(j + HZ))
* Re: [PATCH] rcu: Fix race in set and clear TICK_DEP_BIT_RCU_EXP bitmask
2022-12-21 20:08 ` Paul E. McKenney
2022-12-22 9:48 ` Zhang, Qiang1
@ 2022-12-31 18:25 ` Frederic Weisbecker
2022-12-31 19:01 ` Paul E. McKenney
From: Frederic Weisbecker @ 2022-12-31 18:25 UTC (permalink / raw)
To: Paul E. McKenney; +Cc: Zqiang, quic_neeraju, joel, rcu, linux-kernel
On Wed, Dec 21, 2022 at 12:08:49PM -0800, Paul E. McKenney wrote:
> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> index 249c2967d9e6c..7cc4856da0817 100644
> --- a/kernel/rcu/tree_exp.h
> +++ b/kernel/rcu/tree_exp.h
> @@ -594,6 +594,7 @@ static void synchronize_rcu_expedited_wait(void)
> struct rcu_data *rdp;
> struct rcu_node *rnp;
> struct rcu_node *rnp_root = rcu_get_root();
> + unsigned long flags;
>
> trace_rcu_exp_grace_period(rcu_state.name, rcu_exp_gp_seq_endval(), TPS("startwait"));
> jiffies_stall = rcu_exp_jiffies_till_stall_check();
> @@ -602,17 +603,17 @@ static void synchronize_rcu_expedited_wait(void)
> if (synchronize_rcu_expedited_wait_once(1))
> return;
> rcu_for_each_leaf_node(rnp) {
> + raw_spin_lock_irqsave_rcu_node(rnp, flags);
> mask = READ_ONCE(rnp->expmask);
> for_each_leaf_node_cpu_mask(rnp, cpu, mask) {
> rdp = per_cpu_ptr(&rcu_data, cpu);
> if (rdp->rcu_forced_tick_exp)
> continue;
> rdp->rcu_forced_tick_exp = true;
> - preempt_disable();
> if (cpu_online(cpu))
> tick_dep_set_cpu(cpu, TICK_DEP_BIT_RCU_EXP);
> - preempt_enable();
> }
> + raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> }
> j = READ_ONCE(jiffies_till_first_fqs);
> if (synchronize_rcu_expedited_wait_once(j + HZ))
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
BTW why are we forcing the tick on the whole node?
And shouldn't we set the tick dependency from rcu_exp_handler() instead?
Thanks.
* Re: [PATCH] rcu: Fix race in set and clear TICK_DEP_BIT_RCU_EXP bitmask
2022-12-31 18:25 ` Frederic Weisbecker
@ 2022-12-31 19:01 ` Paul E. McKenney
2023-01-01 9:41 ` Zhang, Qiang1
From: Paul E. McKenney @ 2022-12-31 19:01 UTC (permalink / raw)
To: Frederic Weisbecker; +Cc: Zqiang, quic_neeraju, joel, rcu, linux-kernel
On Sat, Dec 31, 2022 at 07:25:08PM +0100, Frederic Weisbecker wrote:
> On Wed, Dec 21, 2022 at 12:08:49PM -0800, Paul E. McKenney wrote:
> > diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> > index 249c2967d9e6c..7cc4856da0817 100644
> > --- a/kernel/rcu/tree_exp.h
> > +++ b/kernel/rcu/tree_exp.h
> > @@ -594,6 +594,7 @@ static void synchronize_rcu_expedited_wait(void)
> > struct rcu_data *rdp;
> > struct rcu_node *rnp;
> > struct rcu_node *rnp_root = rcu_get_root();
> > + unsigned long flags;
> >
> > trace_rcu_exp_grace_period(rcu_state.name, rcu_exp_gp_seq_endval(), TPS("startwait"));
> > jiffies_stall = rcu_exp_jiffies_till_stall_check();
> > @@ -602,17 +603,17 @@ static void synchronize_rcu_expedited_wait(void)
> > if (synchronize_rcu_expedited_wait_once(1))
> > return;
> > rcu_for_each_leaf_node(rnp) {
> > + raw_spin_lock_irqsave_rcu_node(rnp, flags);
> > mask = READ_ONCE(rnp->expmask);
> > for_each_leaf_node_cpu_mask(rnp, cpu, mask) {
> > rdp = per_cpu_ptr(&rcu_data, cpu);
> > if (rdp->rcu_forced_tick_exp)
> > continue;
> > rdp->rcu_forced_tick_exp = true;
> > - preempt_disable();
> > if (cpu_online(cpu))
> > tick_dep_set_cpu(cpu, TICK_DEP_BIT_RCU_EXP);
> > - preempt_enable();
> > }
> > + raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> > }
> > j = READ_ONCE(jiffies_till_first_fqs);
> > if (synchronize_rcu_expedited_wait_once(j + HZ))
>
> Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Thank you!
> BTW why are we forcing the tick on the whole node?
Now that you mention it, that would be more precise.
> And shouldn't we set the tick dependency from rcu_exp_handler() instead?
Because it never occurred to me to check whether this could be invoked
from an interrupt handler? ;-)
But that does sound like it might be a better approach.
Zqiang, would you be willing to look into this?
Thanx, Paul
* RE: [PATCH] rcu: Fix race in set and clear TICK_DEP_BIT_RCU_EXP bitmask
2022-12-31 19:01 ` Paul E. McKenney
@ 2023-01-01 9:41 ` Zhang, Qiang1
2023-01-03 12:13 ` Frederic Weisbecker
From: Zhang, Qiang1 @ 2023-01-01 9:41 UTC (permalink / raw)
To: paulmck@kernel.org, Frederic Weisbecker
Cc: quic_neeraju@quicinc.com, joel@joelfernandes.org,
rcu@vger.kernel.org, linux-kernel@vger.kernel.org
> > >On Sat, Dec 31, 2022 at 07:25:08PM +0100, Frederic Weisbecker wrote:
> On Wed, Dec 21, 2022 at 12:08:49PM -0800, Paul E. McKenney wrote:
> > diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h index
> > 249c2967d9e6c..7cc4856da0817 100644
> > --- a/kernel/rcu/tree_exp.h
> > +++ b/kernel/rcu/tree_exp.h
> > @@ -594,6 +594,7 @@ static void synchronize_rcu_expedited_wait(void)
> > struct rcu_data *rdp;
> > struct rcu_node *rnp;
> > struct rcu_node *rnp_root = rcu_get_root();
> > + unsigned long flags;
> >
> > trace_rcu_exp_grace_period(rcu_state.name, rcu_exp_gp_seq_endval(), TPS("startwait"));
> > jiffies_stall = rcu_exp_jiffies_till_stall_check();
> > @@ -602,17 +603,17 @@ static void synchronize_rcu_expedited_wait(void)
> > if (synchronize_rcu_expedited_wait_once(1))
> > return;
> > rcu_for_each_leaf_node(rnp) {
> > + raw_spin_lock_irqsave_rcu_node(rnp, flags);
> > mask = READ_ONCE(rnp->expmask);
> > for_each_leaf_node_cpu_mask(rnp, cpu, mask) {
> > rdp = per_cpu_ptr(&rcu_data, cpu);
> > if (rdp->rcu_forced_tick_exp)
> > continue;
> > rdp->rcu_forced_tick_exp = true;
> > - preempt_disable();
> > if (cpu_online(cpu))
> > tick_dep_set_cpu(cpu, TICK_DEP_BIT_RCU_EXP);
> > - preempt_enable();
> > }
> > + raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> > }
> > j = READ_ONCE(jiffies_till_first_fqs);
> > if (synchronize_rcu_expedited_wait_once(j + HZ))
>
> Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
>
>Thank you!
>
> BTW why are we forcing the tick on the whole node?
>
>Now that you mention it, that would be more precise.
>
> And shouldn't we set the tick dependency from rcu_exp_handler() instead?
>
>Because it never occurred to me to check whether this could be invoked from an interrupt handler? ;-)
>
>But that does sound like it might be a better approach.
>
>Zqiang, would you be willing to look into this?
Yes, and I have a question. We force the tick dependency only when the
expedited grace period has failed to end within the first wait, which means
the tick dependency is not set every time. If we set the tick dependency in
rcu_exp_handler(), does that mean it will be set unconditionally every time
an expedited grace period starts?
Thoughts?
Thanks
Zqiang
>
> Thanx, Paul
* Re: [PATCH] rcu: Fix race in set and clear TICK_DEP_BIT_RCU_EXP bitmask
2023-01-01 9:41 ` Zhang, Qiang1
@ 2023-01-03 12:13 ` Frederic Weisbecker
From: Frederic Weisbecker @ 2023-01-03 12:13 UTC (permalink / raw)
To: Zhang, Qiang1
Cc: paulmck@kernel.org, quic_neeraju@quicinc.com,
joel@joelfernandes.org, rcu@vger.kernel.org,
linux-kernel@vger.kernel.org
On Sun, Jan 01, 2023 at 09:41:57AM +0000, Zhang, Qiang1 wrote:
> > > >On Sat, Dec 31, 2022 at 07:25:08PM +0100, Frederic Weisbecker wrote:
>
> Yes, and I have a question, we forcing the tick dependency because the expedited grace period
> is not end for the first time, this means that it is not to set the tick dependency every time.
> if we set the tick dependency in rcu_exp_handler(), does this mean that every time the expedited
> grace period starts the tick dependency will be set unconditionally ?
>
> Thoughts ?
Only if rcu_exp_handler() fails to report a quiescent state. Then it means we
must poll on the CPU looking for a future one.
In fact the tick dependency should be set when rdp->cpu_no_qs.b.exp is set to
true and cleared when rdp->cpu_no_qs.b.exp is set to false.
Thanks.
>
> Thanks
> Zqiang
>
> >
> > Thanx, Paul
Thread overview: 8+ messages
2022-12-20 11:25 [PATCH] rcu: Fix race in set and clear TICK_DEP_BIT_RCU_EXP bitmask Zqiang
2022-12-21 20:08 ` Paul E. McKenney
2022-12-22 9:48 ` Zhang, Qiang1
2022-12-22 15:18 ` Paul E. McKenney
2022-12-31 18:25 ` Frederic Weisbecker
2022-12-31 19:01 ` Paul E. McKenney
2023-01-01 9:41 ` Zhang, Qiang1
2023-01-03 12:13 ` Frederic Weisbecker