public inbox for rcu@vger.kernel.org
* [PATCH 1/3] rcu: remove redundant cpu affinity setting during teardown
@ 2022-09-05  3:38 Pingfan Liu
  2022-09-05  3:38 ` [PATCH 2/3] rcu: simplify the prototype of rcu_boost_kthread_setaffinity() Pingfan Liu
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Pingfan Liu @ 2022-09-05  3:38 UTC (permalink / raw)
  To: rcu
  Cc: Pingfan Liu, Paul E. McKenney, David Woodhouse,
	Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
	Jason A. Donenfeld

At present, during cpu teardown, rcu_boost_kthread_setaffinity() is
called twice: first by rcutree_offline_cpu(), then by
rcutree_dead_cpu() as the CPUHP_RCUTREE_PREP cpuhp_step teardown callback.

From the scheduler's perspective, a bit in cpu_online_mask means that the cpu
is visible to the scheduler. Furthermore, a bit in cpu_active_mask
means that the cpu is suitable as a migration destination.

Now turning back to the case in rcu offlining. sched_cpu_deactivate()
has disabled the dying cpu as the migration destination before
rcutree_offline_cpu().  Furthermore, if the boost kthread is on the dying
cpu, it will be migrated to another suitable online cpu by the scheduler.
So the affinity setting by rcutree_offline_cpu() is redundant and can be
eliminated.

Besides, this patch does a trivial code rearrangement by unfolding
rcutree_affinity_setting() into rcutree_online_cpu(), considering that
the latter is the only user of the former.

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
To: rcu@vger.kernel.org
---
 kernel/rcu/tree.c | 14 +-------------
 1 file changed, 1 insertion(+), 13 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 79aea7df4345..b90f6487fd45 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3978,16 +3978,6 @@ int rcutree_prepare_cpu(unsigned int cpu)
 	return 0;
 }
 
-/*
- * Update RCU priority boot kthread affinity for CPU-hotplug changes.
- */
-static void rcutree_affinity_setting(unsigned int cpu, int outgoing)
-{
-	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
-
-	rcu_boost_kthread_setaffinity(rdp->mynode, outgoing);
-}
-
 /*
  * Near the end of the CPU-online process.  Pretty much all services
  * enabled, and the CPU is now very much alive.
@@ -4006,7 +3996,7 @@ int rcutree_online_cpu(unsigned int cpu)
 	if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
 		return 0; /* Too early in boot for scheduler work. */
 	sync_sched_exp_online_cleanup(cpu);
-	rcutree_affinity_setting(cpu, -1);
+	rcu_boost_kthread_setaffinity(rdp->mynode, -1);
 
 	// Stop-machine done, so allow nohz_full to disable tick.
 	tick_dep_clear(TICK_DEP_BIT_RCU);
@@ -4029,8 +4019,6 @@ int rcutree_offline_cpu(unsigned int cpu)
 	rnp->ffmask &= ~rdp->grpmask;
 	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 
-	rcutree_affinity_setting(cpu, cpu);
-
 	// nohz_full CPUs need the tick for stop-machine to work quickly
 	tick_dep_set(TICK_DEP_BIT_RCU);
 	return 0;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/3] rcu: simplify the prototype of rcu_boost_kthread_setaffinity()
  2022-09-05  3:38 [PATCH 1/3] rcu: remove redundant cpu affinity setting during teardown Pingfan Liu
@ 2022-09-05  3:38 ` Pingfan Liu
  2022-09-05  3:38 ` [PATCH 3/3] rcu: Keep qsmaskinitnext fresh for rcu_boost_kthread_setaffinity() Pingfan Liu
  2022-09-06 17:24 ` [PATCH 1/3] rcu: remove redundant cpu affinity setting during teardown Paul E. McKenney
  2 siblings, 0 replies; 10+ messages in thread
From: Pingfan Liu @ 2022-09-05  3:38 UTC (permalink / raw)
  To: rcu
  Cc: Pingfan Liu, Paul E. McKenney, David Woodhouse,
	Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
	Jason A. Donenfeld

Previously, the outgoing parameter was needed by rcutree_affinity_setting()
because, when rcutree_offline_cpu() is called, ->qsmaskinitnext still
contains the outgoing cpu.

Now that parameter is no longer necessary, so the prototype of
rcu_boost_kthread_setaffinity() can be simplified.

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
To: rcu@vger.kernel.org
---
 kernel/rcu/tree.c        |  6 +++---
 kernel/rcu/tree_plugin.h | 12 +++++-------
 2 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index b90f6487fd45..f0535bd210b6 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -145,7 +145,7 @@ static void rcu_report_qs_rnp(unsigned long mask, struct rcu_node *rnp,
 			      unsigned long gps, unsigned long flags);
 static void rcu_init_new_rnp(struct rcu_node *rnp_leaf);
 static void rcu_cleanup_dead_rnp(struct rcu_node *rnp_leaf);
-static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu);
+static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp);
 static void invoke_rcu_core(void);
 static void rcu_report_exp_rdp(struct rcu_data *rdp);
 static void sync_sched_exp_online_cleanup(int cpu);
@@ -2170,7 +2170,7 @@ int rcutree_dead_cpu(unsigned int cpu)
 
 	WRITE_ONCE(rcu_state.n_online_cpus, rcu_state.n_online_cpus - 1);
 	/* Adjust any no-longer-needed kthreads. */
-	rcu_boost_kthread_setaffinity(rnp, -1);
+	rcu_boost_kthread_setaffinity(rnp);
 	// Stop-machine done, so allow nohz_full to disable tick.
 	tick_dep_clear(TICK_DEP_BIT_RCU);
 	return 0;
@@ -3996,7 +3996,7 @@ int rcutree_online_cpu(unsigned int cpu)
 	if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
 		return 0; /* Too early in boot for scheduler work. */
 	sync_sched_exp_online_cleanup(cpu);
-	rcu_boost_kthread_setaffinity(rdp->mynode, -1);
+	rcu_boost_kthread_setaffinity(rdp->mynode);
 
 	// Stop-machine done, so allow nohz_full to disable tick.
 	tick_dep_clear(TICK_DEP_BIT_RCU);
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 438ecae6bd7e..0bf6de185af5 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1217,11 +1217,10 @@ static void rcu_spawn_one_boost_kthread(struct rcu_node *rnp)
  * served by the rcu_node in question.  The CPU hotplug lock is still
  * held, so the value of rnp->qsmaskinit will be stable.
  *
- * We don't include outgoingcpu in the affinity set, use -1 if there is
- * no outgoing CPU.  If there are no CPUs left in the affinity set,
- * this function allows the kthread to execute on any CPU.
+ * If there are no CPUs left in the affinity set, this function allows the
+ * kthread to execute on any CPU.
  */
-static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
+static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp)
 {
 	struct task_struct *t = rnp->boost_kthread_task;
 	unsigned long mask = rcu_rnp_online_cpus(rnp);
@@ -1234,8 +1233,7 @@ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
 		return;
 	mutex_lock(&rnp->boost_kthread_mutex);
 	for_each_leaf_node_possible_cpu(rnp, cpu)
-		if ((mask & leaf_node_cpu_bit(rnp, cpu)) &&
-		    cpu != outgoingcpu)
+		if ((mask & leaf_node_cpu_bit(rnp, cpu)))
 			cpumask_set_cpu(cpu, cm);
 	cpumask_and(cm, cm, housekeeping_cpumask(HK_TYPE_RCU));
 	if (cpumask_empty(cm))
@@ -1261,7 +1259,7 @@ static void rcu_spawn_one_boost_kthread(struct rcu_node *rnp)
 {
 }
 
-static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
+static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp)
 {
 }
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 3/3] rcu: Keep qsmaskinitnext fresh for rcu_boost_kthread_setaffinity()
  2022-09-05  3:38 [PATCH 1/3] rcu: remove redundant cpu affinity setting during teardown Pingfan Liu
  2022-09-05  3:38 ` [PATCH 2/3] rcu: simplify the prototype of rcu_boost_kthread_setaffinity() Pingfan Liu
@ 2022-09-05  3:38 ` Pingfan Liu
  2022-09-06 18:42   ` Paul E. McKenney
  2022-09-06 17:24 ` [PATCH 1/3] rcu: remove redundant cpu affinity setting during teardown Paul E. McKenney
  2 siblings, 1 reply; 10+ messages in thread
From: Pingfan Liu @ 2022-09-05  3:38 UTC (permalink / raw)
  To: rcu
  Cc: Pingfan Liu, Paul E. McKenney, David Woodhouse,
	Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes,
	Jason A. Donenfeld

rcutree_online_cpu() can run concurrently, so the following race
scenario is possible:

        CPU 1                               CPU2
mask_old = rcu_rnp_online_cpus(rnp);
...

                                   mask_new = rcu_rnp_online_cpus(rnp);
                                   ...
                                   set_cpus_allowed_ptr(t, cm);

set_cpus_allowed_ptr(t, cm);

Consequently, the old mask will overwrite the new mask in the task's cpus_ptr.

Since the mutex ->boost_kthread_mutex already exists, use it to order
these updates, so that the latest ->qsmaskinitnext is fetched when
updating cpus_ptr.

A note about cpu teardown: the cpu hot-removing initiator executes
rcutree_dead_cpu() for the torn-down cpu. Even if, in the future, an
initiator can hot-remove several cpus at a time, it will still execute
rcutree_dead_cpu() serially for each cpu, so that path remains race-free
without the mutex. But since this is a cold path, taking the mutex
redundantly there is harmless.

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
To: rcu@vger.kernel.org
---
 kernel/rcu/tree_plugin.h | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 0bf6de185af5..b868ac6c6ac8 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1223,7 +1223,7 @@ static void rcu_spawn_one_boost_kthread(struct rcu_node *rnp)
 static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp)
 {
 	struct task_struct *t = rnp->boost_kthread_task;
-	unsigned long mask = rcu_rnp_online_cpus(rnp);
+	unsigned long mask;
 	cpumask_var_t cm;
 	int cpu;
 
@@ -1232,6 +1232,11 @@ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp)
 	if (!zalloc_cpumask_var(&cm, GFP_KERNEL))
 		return;
 	mutex_lock(&rnp->boost_kthread_mutex);
+	/*
+	 * Rely on ->boost_kthread_mutex to serialize concurrent updates,
+	 * so that the latest ->qsmaskinitnext is used for cpus_ptr.
+	 */
+	mask = rcu_rnp_online_cpus(rnp);
 	for_each_leaf_node_possible_cpu(rnp, cpu)
 		if ((mask & leaf_node_cpu_bit(rnp, cpu)))
 			cpumask_set_cpu(cpu, cm);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/3] rcu: remove redundant cpu affinity setting during teardown
  2022-09-05  3:38 [PATCH 1/3] rcu: remove redundant cpu affinity setting during teardown Pingfan Liu
  2022-09-05  3:38 ` [PATCH 2/3] rcu: simplify the prototype of rcu_boost_kthread_setaffinity() Pingfan Liu
  2022-09-05  3:38 ` [PATCH 3/3] rcu: Keep qsmaskinitnext fresh for rcu_boost_kthread_setaffinity() Pingfan Liu
@ 2022-09-06 17:24 ` Paul E. McKenney
  2022-09-07  1:40   ` Pingfan Liu
  2 siblings, 1 reply; 10+ messages in thread
From: Paul E. McKenney @ 2022-09-06 17:24 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: rcu, David Woodhouse, Frederic Weisbecker, Neeraj Upadhyay,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Jason A. Donenfeld

On Mon, Sep 05, 2022 at 11:38:50AM +0800, Pingfan Liu wrote:
> At present, during cpu teardown, rcu_boost_kthread_setaffinity() is
> called twice: first by rcutree_offline_cpu(), then by
> rcutree_dead_cpu() as the CPUHP_RCUTREE_PREP cpuhp_step teardown callback.
> 
> From the scheduler's perspective, a bit in cpu_online_mask means that the cpu
> is visible to the scheduler. Furthermore, a bit in cpu_active_mask
> means that the cpu is suitable as a migration destination.
> 
> Now turning back to the case in rcu offlining. sched_cpu_deactivate()
> has disabled the dying cpu as the migration destination before
> rcutree_offline_cpu().  Furthermore, if the boost kthread is on the dying
> cpu, it will be migrated to another suitable online cpu by the scheduler.
> So the affinity setting by rcutree_offline_cpu() is redundant and can be
> eliminated.
> 
> Besides, this patch does a trivial code rearrangement by unfolding
> rcutree_affinity_setting() into rcutree_online_cpu(), considering that
> the latter is the only user of the former.
> 
> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> Cc: "Paul E. McKenney" <paulmck@kernel.org>
> Cc: David Woodhouse <dwmw@amazon.co.uk>
> Cc: Frederic Weisbecker <frederic@kernel.org>
> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
> Cc: Josh Triplett <josh@joshtriplett.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> Cc: Joel Fernandes <joel@joelfernandes.org>
> Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
> To: rcu@vger.kernel.org
> ---
>  kernel/rcu/tree.c | 14 +-------------
>  1 file changed, 1 insertion(+), 13 deletions(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 79aea7df4345..b90f6487fd45 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3978,16 +3978,6 @@ int rcutree_prepare_cpu(unsigned int cpu)
>  	return 0;
>  }
>  
> -/*
> - * Update RCU priority boot kthread affinity for CPU-hotplug changes.
> - */
> -static void rcutree_affinity_setting(unsigned int cpu, int outgoing)
> -{
> -	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> -
> -	rcu_boost_kthread_setaffinity(rdp->mynode, outgoing);
> -}

Good point, given how simple a wrapper this is and how little it is used,
getting rid of it does sound like a reasonable idea.

>  /*
>   * Near the end of the CPU-online process.  Pretty much all services
>   * enabled, and the CPU is now very much alive.
> @@ -4006,7 +3996,7 @@ int rcutree_online_cpu(unsigned int cpu)
>  	if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
>  		return 0; /* Too early in boot for scheduler work. */
>  	sync_sched_exp_online_cleanup(cpu);
> -	rcutree_affinity_setting(cpu, -1);
> +	rcu_boost_kthread_setaffinity(rdp->mynode, -1);
>  
>  	// Stop-machine done, so allow nohz_full to disable tick.
>  	tick_dep_clear(TICK_DEP_BIT_RCU);
> @@ -4029,8 +4019,6 @@ int rcutree_offline_cpu(unsigned int cpu)
>  	rnp->ffmask &= ~rdp->grpmask;
>  	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
>  
> -	rcutree_affinity_setting(cpu, cpu);

We do need to keep this one because the CPU is going away.

On the other hand, it might well be that we could get rid of the call
to rcutree_affinity_setting() in rcutree_dead_cpu().

Or am I missing something subtle here?

							Thanx, Paul

> -
>  	// nohz_full CPUs need the tick for stop-machine to work quickly
>  	tick_dep_set(TICK_DEP_BIT_RCU);
>  	return 0;
> -- 
> 2.31.1
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 3/3] rcu: Keep qsmaskinitnext fresh for rcu_boost_kthread_setaffinity()
  2022-09-05  3:38 ` [PATCH 3/3] rcu: Keep qsmaskinitnext fresh for rcu_boost_kthread_setaffinity() Pingfan Liu
@ 2022-09-06 18:42   ` Paul E. McKenney
  0 siblings, 0 replies; 10+ messages in thread
From: Paul E. McKenney @ 2022-09-06 18:42 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: rcu, David Woodhouse, Frederic Weisbecker, Neeraj Upadhyay,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Jason A. Donenfeld

On Mon, Sep 05, 2022 at 11:38:52AM +0800, Pingfan Liu wrote:
> rcutree_online_cpu() can run concurrently, so the following race
> scenario is possible:
> 
>         CPU 1                               CPU2
> mask_old = rcu_rnp_online_cpus(rnp);
> ...
> 
>                                    mask_new = rcu_rnp_online_cpus(rnp);
>                                    ...
>                                    set_cpus_allowed_ptr(t, cm);
> 
> set_cpus_allowed_ptr(t, cm);
> 
> Consequently, the old mask will overwrite the new mask in the task's cpus_ptr.
> 
> Since the mutex ->boost_kthread_mutex already exists, use it to order
> these updates, so that the latest ->qsmaskinitnext is fetched when
> updating cpus_ptr.
> 
> A note about cpu teardown: the cpu hot-removing initiator executes
> rcutree_dead_cpu() for the torn-down cpu. Even if, in the future, an
> initiator can hot-remove several cpus at a time, it will still execute
> rcutree_dead_cpu() serially for each cpu, so that path remains race-free
> without the mutex. But since this is a cold path, taking the mutex
> redundantly there is harmless.
> 
> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> Cc: "Paul E. McKenney" <paulmck@kernel.org>
> Cc: David Woodhouse <dwmw@amazon.co.uk>
> Cc: Frederic Weisbecker <frederic@kernel.org>
> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
> Cc: Josh Triplett <josh@joshtriplett.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> Cc: Joel Fernandes <joel@joelfernandes.org>
> Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
> To: rcu@vger.kernel.org

Good choice, thank you!

Because I did not take your patch #2, I had to hand-apply this one.
Please check the result below and let me know if I messed something up.

							Thanx, Paul

------------------------------------------------------------------------

commit 7139c19d2ea117976aa892de4fb75682e989ba12
Author: Pingfan Liu <kernelfans@gmail.com>
Date:   Tue Sep 6 11:36:42 2022 -0700

    rcu: Synchronize ->qsmaskinitnext in rcu_boost_kthread_setaffinity()
    
    Once either rcutree_online_cpu() or rcutree_dead_cpu() is invoked
    concurrently, the following rcu_boost_kthread_setaffinity() race can
    occur:
    
            CPU 1                               CPU2
    mask = rcu_rnp_online_cpus(rnp);
    ...
    
                                       mask = rcu_rnp_online_cpus(rnp);
                                       ...
                                       set_cpus_allowed_ptr(t, cm);
    
    set_cpus_allowed_ptr(t, cm);
    
    This results in CPU2's update being overwritten by that of CPU1, and
    thus the possibility of ->boost_kthread_task continuing to run on a
    to-be-offlined CPU.
    
    This commit therefore eliminates this race by relying on the pre-existing
    acquisition of ->boost_kthread_mutex to serialize the full process of
    changing the affinity of ->boost_kthread_task.
    
    Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
    Cc: David Woodhouse <dwmw@amazon.co.uk>
    Cc: Frederic Weisbecker <frederic@kernel.org>
    Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Cc: Josh Triplett <josh@joshtriplett.org>
    Cc: Steven Rostedt <rostedt@goodmis.org>
    Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Cc: Lai Jiangshan <jiangshanlai@gmail.com>
    Cc: Joel Fernandes <joel@joelfernandes.org>
    Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index e3142ee35fc6a..7b0fe741a0886 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1221,11 +1221,13 @@ static void rcu_spawn_one_boost_kthread(struct rcu_node *rnp)
  * We don't include outgoingcpu in the affinity set, use -1 if there is
  * no outgoing CPU.  If there are no CPUs left in the affinity set,
  * this function allows the kthread to execute on any CPU.
+ *
+ * Any future concurrent calls are serialized via ->boost_kthread_mutex.
  */
 static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
 {
 	struct task_struct *t = rnp->boost_kthread_task;
-	unsigned long mask = rcu_rnp_online_cpus(rnp);
+	unsigned long mask;
 	cpumask_var_t cm;
 	int cpu;
 
@@ -1234,6 +1236,7 @@ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
 	if (!zalloc_cpumask_var(&cm, GFP_KERNEL))
 		return;
 	mutex_lock(&rnp->boost_kthread_mutex);
+	mask = rcu_rnp_online_cpus(rnp);
 	for_each_leaf_node_possible_cpu(rnp, cpu)
 		if ((mask & leaf_node_cpu_bit(rnp, cpu)) &&
 		    cpu != outgoingcpu)

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/3] rcu: remove redundant cpu affinity setting during teardown
  2022-09-06 17:24 ` [PATCH 1/3] rcu: remove redundant cpu affinity setting during teardown Paul E. McKenney
@ 2022-09-07  1:40   ` Pingfan Liu
  2022-09-10 20:36     ` Paul E. McKenney
  0 siblings, 1 reply; 10+ messages in thread
From: Pingfan Liu @ 2022-09-07  1:40 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: rcu, David Woodhouse, Frederic Weisbecker, Neeraj Upadhyay,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Jason A. Donenfeld

On Tue, Sep 06, 2022 at 10:24:41AM -0700, Paul E. McKenney wrote:
> On Mon, Sep 05, 2022 at 11:38:50AM +0800, Pingfan Liu wrote:
> > At present, during cpu teardown, rcu_boost_kthread_setaffinity() is
> > called twice: first by rcutree_offline_cpu(), then by
> > rcutree_dead_cpu() as the CPUHP_RCUTREE_PREP cpuhp_step teardown callback.
> > 
> > From the scheduler's perspective, a bit in cpu_online_mask means that the cpu
> > is visible to the scheduler. Furthermore, a bit in cpu_active_mask
> > means that the cpu is suitable as a migration destination.
> > 
> > Now turning back to the case in rcu offlining. sched_cpu_deactivate()
> > has disabled the dying cpu as the migration destination before
> > rcutree_offline_cpu().  Furthermore, if the boost kthread is on the dying
> > cpu, it will be migrated to another suitable online cpu by the scheduler.
> > So the affinity setting by rcutree_offline_cpu() is redundant and can be
> > eliminated.
> > 
> > Besides, this patch does a trivial code rearrangement by unfolding
> > rcutree_affinity_setting() into rcutree_online_cpu(), considering that
> > the latter is the only user of the former.
> > 
> > Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > Cc: David Woodhouse <dwmw@amazon.co.uk>
> > Cc: Frederic Weisbecker <frederic@kernel.org>
> > Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
> > Cc: Josh Triplett <josh@joshtriplett.org>
> > Cc: Steven Rostedt <rostedt@goodmis.org>
> > Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> > Cc: Joel Fernandes <joel@joelfernandes.org>
> > Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
> > To: rcu@vger.kernel.org
> > ---
> >  kernel/rcu/tree.c | 14 +-------------
> >  1 file changed, 1 insertion(+), 13 deletions(-)
> > 
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 79aea7df4345..b90f6487fd45 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -3978,16 +3978,6 @@ int rcutree_prepare_cpu(unsigned int cpu)
> >  	return 0;
> >  }
> >  
> > -/*
> > - * Update RCU priority boot kthread affinity for CPU-hotplug changes.
> > - */
> > -static void rcutree_affinity_setting(unsigned int cpu, int outgoing)
> > -{
> > -	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> > -
> > -	rcu_boost_kthread_setaffinity(rdp->mynode, outgoing);
> > -}
> 
> > Good point, given how simple a wrapper this is and how little it is used,
> getting rid of it does sound like a reasonable idea.
> 
> >  /*
> >   * Near the end of the CPU-online process.  Pretty much all services
> >   * enabled, and the CPU is now very much alive.
> > @@ -4006,7 +3996,7 @@ int rcutree_online_cpu(unsigned int cpu)
> >  	if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
> >  		return 0; /* Too early in boot for scheduler work. */
> >  	sync_sched_exp_online_cleanup(cpu);
> > -	rcutree_affinity_setting(cpu, -1);
> > +	rcu_boost_kthread_setaffinity(rdp->mynode, -1);
> >  
> >  	// Stop-machine done, so allow nohz_full to disable tick.
> >  	tick_dep_clear(TICK_DEP_BIT_RCU);
> > @@ -4029,8 +4019,6 @@ int rcutree_offline_cpu(unsigned int cpu)
> >  	rnp->ffmask &= ~rdp->grpmask;
> >  	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> >  
> > -	rcutree_affinity_setting(cpu, cpu);
> 
> We do need to keep this one because the CPU is going away.
> 
> On the other hand, it might well be that we could get rid of the call
> to rcutree_affinity_setting() in rcutree_dead_cpu().
> 
> Or am I missing something subtle here?
> 

Oops, I think I need to rephrase my commit log to describe this nuance.
The key point is whether ->qsmaskinitnext is stable.

The teardown code path on a single dying cpu looks like:

sched_cpu_deactivate() // prevent this dying cpu as a migration dst.

rcutree_offline_cpu()  // as a result, the scheduler core will take care
                       // of the transient affinity mismatching until
		       // rcutree_dead_cpu(). (I think this also holds for
		       // concurrent offlining)

rcu_report_dead()      // running on the dying cpu, and clear its bit in ->qsmaskinitnext

rcutree_dead_cpu()     // running on the initiator (an initiator cpu will
                       // execute this function for each dying cpu)
		       // At this point, ->qsmaskinitnext reflects the
		       // offlining, and the affinity can get right.

Sorry that my commit log emphasized the first part but forgot to
mention ->qsmaskinitnext.


Does this justification stand?

Thanks,

	Pingfan

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/3] rcu: remove redundant cpu affinity setting during teardown
  2022-09-07  1:40   ` Pingfan Liu
@ 2022-09-10 20:36     ` Paul E. McKenney
  2022-09-14 10:12       ` Pingfan Liu
  0 siblings, 1 reply; 10+ messages in thread
From: Paul E. McKenney @ 2022-09-10 20:36 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: rcu, David Woodhouse, Frederic Weisbecker, Neeraj Upadhyay,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Jason A. Donenfeld

On Wed, Sep 07, 2022 at 09:40:29AM +0800, Pingfan Liu wrote:
> On Tue, Sep 06, 2022 at 10:24:41AM -0700, Paul E. McKenney wrote:
> > On Mon, Sep 05, 2022 at 11:38:50AM +0800, Pingfan Liu wrote:
> > > At present, during cpu teardown, rcu_boost_kthread_setaffinity() is
> > > called twice: first by rcutree_offline_cpu(), then by
> > > rcutree_dead_cpu() as the CPUHP_RCUTREE_PREP cpuhp_step teardown callback.
> > > 
> > > From the scheduler's perspective, a bit in cpu_online_mask means that the cpu
> > > is visible to the scheduler. Furthermore, a bit in cpu_active_mask
> > > means that the cpu is suitable as a migration destination.
> > > 
> > > Now turning back to the case in rcu offlining. sched_cpu_deactivate()
> > > has disabled the dying cpu as the migration destination before
> > > rcutree_offline_cpu().  Furthermore, if the boost kthread is on the dying
> > > cpu, it will be migrated to another suitable online cpu by the scheduler.
> > > So the affinity setting by rcutree_offline_cpu() is redundant and can be
> > > eliminated.
> > > 
> > > Besides, this patch does a trivial code rearrangement by unfolding
> > > rcutree_affinity_setting() into rcutree_online_cpu(), considering that
> > > the latter is the only user of the former.
> > > 
> > > Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> > > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > > Cc: David Woodhouse <dwmw@amazon.co.uk>
> > > Cc: Frederic Weisbecker <frederic@kernel.org>
> > > Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
> > > Cc: Josh Triplett <josh@joshtriplett.org>
> > > Cc: Steven Rostedt <rostedt@goodmis.org>
> > > Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > > Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> > > Cc: Joel Fernandes <joel@joelfernandes.org>
> > > Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
> > > To: rcu@vger.kernel.org
> > > ---
> > >  kernel/rcu/tree.c | 14 +-------------
> > >  1 file changed, 1 insertion(+), 13 deletions(-)
> > > 
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index 79aea7df4345..b90f6487fd45 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -3978,16 +3978,6 @@ int rcutree_prepare_cpu(unsigned int cpu)
> > >  	return 0;
> > >  }
> > >  
> > > -/*
> > > - * Update RCU priority boot kthread affinity for CPU-hotplug changes.
> > > - */
> > > -static void rcutree_affinity_setting(unsigned int cpu, int outgoing)
> > > -{
> > > -	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> > > -
> > > -	rcu_boost_kthread_setaffinity(rdp->mynode, outgoing);
> > > -}
> > 
> > > Good point, given how simple a wrapper this is and how little it is used,
> > getting rid of it does sound like a reasonable idea.
> > 
> > >  /*
> > >   * Near the end of the CPU-online process.  Pretty much all services
> > >   * enabled, and the CPU is now very much alive.
> > > @@ -4006,7 +3996,7 @@ int rcutree_online_cpu(unsigned int cpu)
> > >  	if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
> > >  		return 0; /* Too early in boot for scheduler work. */
> > >  	sync_sched_exp_online_cleanup(cpu);
> > > -	rcutree_affinity_setting(cpu, -1);
> > > +	rcu_boost_kthread_setaffinity(rdp->mynode, -1);
> > >  
> > >  	// Stop-machine done, so allow nohz_full to disable tick.
> > >  	tick_dep_clear(TICK_DEP_BIT_RCU);
> > > @@ -4029,8 +4019,6 @@ int rcutree_offline_cpu(unsigned int cpu)
> > >  	rnp->ffmask &= ~rdp->grpmask;
> > >  	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> > >  
> > > -	rcutree_affinity_setting(cpu, cpu);
> > 
> > We do need to keep this one because the CPU is going away.
> > 
> > On the other hand, it might well be that we could get rid of the call
> > to rcutree_affinity_setting() in rcutree_dead_cpu().
> > 
> > Or am I missing something subtle here?
> 
> Oops, I think I need to rephrase my commit log to describe this nuance.
> The key point is whether ->qsmaskinitnext is stable.
> 
> The teardown code path on a single dying cpu looks like:
> 
> sched_cpu_deactivate() // prevent this dying cpu as a migration dst.

Suppose that this was the last CPU that the task was permitted to
run on.

> rcutree_offline_cpu()  // as a result, the scheduler core will take care
>                        // of the transient affinity mismatching until
> 		       // rcutree_dead_cpu(). (I think this also holds for
> 		       // concurrent offlining)
> 
> rcu_report_dead()      // running on the dying cpu, and clear its bit in ->qsmaskinitnext
> 
> rcutree_dead_cpu()     // running on the initiator (an initiator cpu will
>                        // execute this function for each dying cpu)
> 		       // At this point, ->qsmaskinitnext reflects the
> 		       // offlining, and the affinity can get right.
> 
> Sorry that my commit log emphasized the first part but forgot to
> mention ->qsmaskinitnext.
> 
> 
> Does this justification stand?

We should ensure that the task's permitted set of CPUs always contained
at least one online CPU.  Unless I am missing something, your suggested
change will sometimes end up with the task having no online CPUs in
its mask.

So what am I missing here?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/3] rcu: remove redundant cpu affinity setting during teardown
  2022-09-10 20:36     ` Paul E. McKenney
@ 2022-09-14 10:12       ` Pingfan Liu
  2022-09-14 12:27         ` Frederic Weisbecker
  0 siblings, 1 reply; 10+ messages in thread
From: Pingfan Liu @ 2022-09-14 10:12 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: rcu, David Woodhouse, Frederic Weisbecker, Neeraj Upadhyay,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Jason A. Donenfeld

On Sat, Sep 10, 2022 at 01:36:22PM -0700, Paul E. McKenney wrote:
> On Wed, Sep 07, 2022 at 09:40:29AM +0800, Pingfan Liu wrote:
> > On Tue, Sep 06, 2022 at 10:24:41AM -0700, Paul E. McKenney wrote:
> > > On Mon, Sep 05, 2022 at 11:38:50AM +0800, Pingfan Liu wrote:
> > > > At present, during the cpu teardown, rcu_boost_kthread_setaffinity() is
> > > > called twice. Firstly by rcutree_offline_cpu(), then by
> > > > rcutree_dead_cpu() as the CPUHP_RCUTREE_PREP  cpuhp_step callback.
> > > > 
> > > > From the scheduler's perspective, a bit in cpu_online_mask means that the cpu
> > > > is visible to the scheduler. Furthermore, a bit in cpu_active_mask
> > > > means that the cpu is suitable as a migration destination.
> > > > 
> > > > Now turning back to the case in rcu offlining. sched_cpu_deactivate()
> > > > has disabled the dying cpu as the migration destination before
> > > > rcutree_offline_cpu().  Furthermore, if the boost kthread is on the dying
> > > > cpu, it will be migrated to another suitable online cpu by the scheduler.
> > > > So the affinity setting by rcutree_offline_cpu() is redundant and can be
> > > > eliminated.
> > > > 
> > > > Besides, this patch does a trivial code rearrangement by unfolding
> > > > rcutree_affinity_setting() into rcutree_online_cpu(), considering that
> > > > the latter one is the only user of the former.
> > > > 
> > > > Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> > > > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > > > Cc: David Woodhouse <dwmw@amazon.co.uk>
> > > > Cc: Frederic Weisbecker <frederic@kernel.org>
> > > > Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
> > > > Cc: Josh Triplett <josh@joshtriplett.org>
> > > > Cc: Steven Rostedt <rostedt@goodmis.org>
> > > > Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > > > Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> > > > Cc: Joel Fernandes <joel@joelfernandes.org>
> > > > Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
> > > > To: rcu@vger.kernel.org
> > > > ---
> > > >  kernel/rcu/tree.c | 14 +-------------
> > > >  1 file changed, 1 insertion(+), 13 deletions(-)
> > > > 
> > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > index 79aea7df4345..b90f6487fd45 100644
> > > > --- a/kernel/rcu/tree.c
> > > > +++ b/kernel/rcu/tree.c
> > > > @@ -3978,16 +3978,6 @@ int rcutree_prepare_cpu(unsigned int cpu)
> > > >  	return 0;
> > > >  }
> > > >  
> > > > -/*
> > > > - * Update RCU priority boot kthread affinity for CPU-hotplug changes.
> > > > - */
> > > > -static void rcutree_affinity_setting(unsigned int cpu, int outgoing)
> > > > -{
> > > > -	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> > > > -
> > > > -	rcu_boost_kthread_setaffinity(rdp->mynode, outgoing);
> > > > -}
> > > 
> > > Good point, given how simple a wrapper this is and how little it is used,
> > > getting rid of it does sound like a reasonable idea.
> > > 
> > > >  /*
> > > >   * Near the end of the CPU-online process.  Pretty much all services
> > > >   * enabled, and the CPU is now very much alive.
> > > > @@ -4006,7 +3996,7 @@ int rcutree_online_cpu(unsigned int cpu)
> > > >  	if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
> > > >  		return 0; /* Too early in boot for scheduler work. */
> > > >  	sync_sched_exp_online_cleanup(cpu);
> > > > -	rcutree_affinity_setting(cpu, -1);
> > > > +	rcu_boost_kthread_setaffinity(rdp->mynode, -1);
> > > >  
> > > >  	// Stop-machine done, so allow nohz_full to disable tick.
> > > >  	tick_dep_clear(TICK_DEP_BIT_RCU);
> > > > @@ -4029,8 +4019,6 @@ int rcutree_offline_cpu(unsigned int cpu)
> > > >  	rnp->ffmask &= ~rdp->grpmask;
> > > >  	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> > > >  
> > > > -	rcutree_affinity_setting(cpu, cpu);
> > > 
> > > We do need to keep this one because the CPU is going away.
> > > 
> > > On the other hand, it might well be that we could get rid of the call
> > > to rcutree_affinity_setting() in rcutree_dead_cpu().
> > > 
> > > Or am I missing something subtle here?
> > 
> > Oops, I think I need to rephrase my commit log to describe this nuance.
> > The key point is whether ->qsmaskinitnext is stable.
> > 
> > The teardown code path on a single dying cpu looks like:
> > 
> > sched_cpu_deactivate() // prevent this dying cpu as a migration dst.
> 
> Suppose that this was the last CPU that the task was permitted to
> run on.
> 

I get your concern; please see the comment below.

> > rcutree_offline_cpu()  // as a result, the scheduler core will take care
> >                        // of the transient affinity mismatch until
> > 		       // rcutree_dead_cpu(). (I think this also holds for
> > 		       // concurrent offlining.)
> > 
> > rcu_report_dead()      // running on the dying cpu, and clear its bit in ->qsmaskinitnext
> > 
> > rcutree_dead_cpu()     // running on the initiator (an initiator cpu will
> >                        // execute this function for each dying cpu)
> > 		       // At this point, ->qsmaskinitnext reflects the
> > 		       // offlining, and the affinity can get right.
> > 
> > Sorry that my commit log emphasized the first part, but forgot to
> > mention the ->qsmaskinitnext.
> > 
> > 
> > Does this justification stand?
> 
> We should ensure that the task's permitted set of CPUs always contains
> at least one online CPU.  Unless I am missing something, your suggested
> change will sometimes end up with the task having no online CPUs in
> its mask.
> 

Actually, the scheduler will pick up one online cpu for the boost
thread.

On the last cpu, the boost kthread is subject to migration, and is pushed away by
__balance_push_cpu_stop()
{
        if (task_rq(p) == rq && task_on_rq_queued(p)) {
                cpu = select_fallback_rq(rq->cpu, p);
                rq = __migrate_task(rq, &rf, p, cpu);
        }
}

Now, turning to select_fallback_rq(): inside that function, if
p->cpus_ptr has no suitable cpus, then the cpuset or possible case will
make it point to the cpuset mask or cpu_possible_mask.  So finally a
valid cpu is picked. (I have not found where an online cpu is guaranteed
there; the trick must hide elsewhere. But that renders the same result
as in v5.16, where rcu_boost_kthread_setaffinity()->cpumask_setall(cm).)

But now, in 6.0, it changes into housekeeping_cpumask(HK_TYPE_RCU),
which calls for more careful thought.
Should there be a cpuset for housekeeping so that select_fallback_rq()
can pick one from that cpuset?

To Frederic: could you shed some light here? Is it worth introducing
such a cpuset so that select_fallback_rq() can pick a cpu from the
housekeeping set in this case?


Anyway, as a last resort, TICK_DEP_BIT_RCU already demands awareness of
the concurrent rcutree_offline_cpu(), so the affinity setting could be
done there if select_fallback_rq() cannot do the job.


Thanks,

	Pingfan


* Re: [PATCH 1/3] rcu: remove redundant cpu affinity setting during teardown
  2022-09-14 10:12       ` Pingfan Liu
@ 2022-09-14 12:27         ` Frederic Weisbecker
  2022-09-14 13:34           ` Pingfan Liu
  0 siblings, 1 reply; 10+ messages in thread
From: Frederic Weisbecker @ 2022-09-14 12:27 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: Paul E. McKenney, rcu, David Woodhouse, Neeraj Upadhyay,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Jason A. Donenfeld

On Wed, Sep 14, 2022 at 06:12:27PM +0800, Pingfan Liu wrote:
> On Sat, Sep 10, 2022 at 01:36:22PM -0700, Paul E. McKenney wrote:
> Actually, the scheduler will pick up one online cpu for the boost
> thread.
> 
> On the last cpu, the boost kthread is subject to migration, and is pushed away by
> __balance_push_cpu_stop()
> {
>         if (task_rq(p) == rq && task_on_rq_queued(p)) {
>                 cpu = select_fallback_rq(rq->cpu, p);
>                 rq = __migrate_task(rq, &rf, p, cpu);
>         }
> }
> 
> Now, turning to select_fallback_rq(): inside that function, if
> p->cpus_ptr has no suitable cpus, then the cpuset or possible case will
> make it point to the cpuset mask or cpu_possible_mask.  So finally a
> valid cpu is picked. (I have not found where an online cpu is guaranteed
> there; the trick must hide elsewhere. But that renders the same result
> as in v5.16, where rcu_boost_kthread_setaffinity()->cpumask_setall(cm).)
> 
> But now, in 6.0, it changes into housekeeping_cpumask(HK_TYPE_RCU),
> which calls for more careful thought.
> Should there be a cpuset for housekeeping so that select_fallback_rq()
> can pick one from that cpuset?
> 
> To Frederic: could you shed some light here? Is it worth introducing
> such a cpuset so that select_fallback_rq() can pick a cpu from the
> housekeeping set in this case?
> 
> 
> Anyway, as a last resort, TICK_DEP_BIT_RCU already demands awareness of
> the concurrent rcutree_offline_cpu(), so the affinity setting could be
> done there if select_fallback_rq() cannot do the job.

Here is the way I understand it: suppose the outgoing CPU was the last online
one in this rcu node, in this case the migration performed in
sched_cpu_deactivate() will find only one CPU in p->cpus_mask, but that CPU is
now out of cpu_active_mask, so the task is affined by default to cpu_possible_mask.

So there is a short window between sched_cpu_deactivate() and
rcutree_offline_cpu() where the task may run on all cpu_possible_mask. Does it
matter? Maybe, I don't know...

After that we reach rcutree_offline_cpu() -> rcu_boost_kthread_setaffinity(),
which fails to fill the cpumask since the node is now empty. As the resulting
cpumask is empty, it falls back as a last resort to the HK_TYPE_RCU housekeeping
set, which must have at least one online CPU (nohz_full constraint). So the task
is finally affine to that housekeeping set.

Like Paul said, I'd rather indeed remove the rcu_boost_kthread_setaffinity()
call from rcutree_dead_cpu() as this one looks useless.

Thanks.




> 
> Thanks,
> 
> 	Pingfan


* Re: [PATCH 1/3] rcu: remove redundant cpu affinity setting during teardown
  2022-09-14 12:27         ` Frederic Weisbecker
@ 2022-09-14 13:34           ` Pingfan Liu
  0 siblings, 0 replies; 10+ messages in thread
From: Pingfan Liu @ 2022-09-14 13:34 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Paul E. McKenney, rcu, David Woodhouse, Neeraj Upadhyay,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes, Jason A. Donenfeld

On Wed, Sep 14, 2022 at 8:27 PM Frederic Weisbecker <frederic@kernel.org> wrote:
>
> On Wed, Sep 14, 2022 at 06:12:27PM +0800, Pingfan Liu wrote:
> > On Sat, Sep 10, 2022 at 01:36:22PM -0700, Paul E. McKenney wrote:
> > Actually, the scheduler will pick up one online cpu for the boost
> > thread.
> >
> > On the last cpu, the boost kthread is subject to migration, and is pushed away by
> > __balance_push_cpu_stop()
> > {
> >         if (task_rq(p) == rq && task_on_rq_queued(p)) {
> >                 cpu = select_fallback_rq(rq->cpu, p);
> >                 rq = __migrate_task(rq, &rf, p, cpu);
> >         }
> > }
> >
> > Now, turning to select_fallback_rq(): inside that function, if
> > p->cpus_ptr has no suitable cpus, then the cpuset or possible case will
> > make it point to the cpuset mask or cpu_possible_mask.  So finally a
> > valid cpu is picked. (I have not found where an online cpu is guaranteed
> > there; the trick must hide elsewhere. But that renders the same result
> > as in v5.16, where rcu_boost_kthread_setaffinity()->cpumask_setall(cm).)
> >
> > But now, in 6.0, it changes into housekeeping_cpumask(HK_TYPE_RCU),
> > which calls for more careful thought.
> > Should there be a cpuset for housekeeping so that select_fallback_rq()
> > can pick one from that cpuset?
> >
> > To Frederic: could you shed some light here? Is it worth introducing
> > such a cpuset so that select_fallback_rq() can pick a cpu from the
> > housekeeping set in this case?
> >
> >
> > Anyway, as a last resort, TICK_DEP_BIT_RCU already demands awareness of
> > the concurrent rcutree_offline_cpu(), so the affinity setting could be
> > done there if select_fallback_rq() cannot do the job.
>
> Here is the way I understand it: suppose the outgoing CPU was the last online
> one in this rcu node, in this case the migration performed in
> sched_cpu_deactivate() will find only one CPU in p->cpus_mask, but that CPU is
> now out of cpu_active_mask, so the task is affined by default to cpu_possible_mask.
>
> So there is a short window between sched_cpu_deactivate() and
> rcutree_offline_cpu() where the task may run on all cpu_possible_mask. Does it
> matter? Maybe, I don't know...
>

OK, then it is pointless to introduce a housekeeping cpuset.

> After that we reach rcutree_offline_cpu() -> rcu_boost_kthread_setaffinity(),
> which fails to fill the cpumask since the node is now empty. As the resulting
> cpumask is empty, it falls back as a last resort to the HK_TYPE_RCU housekeeping
> set, which must have at least one online CPU (nohz_full constraint). So the task
> is finally affine to that housekeeping set.
>

Yes, exactly.

> Like Paul said, I'd rather indeed remove the rcu_boost_kthread_setaffinity()
> call from rcutree_dead_cpu() as this one looks useless.
>

OK, I will go that way.

Thanks to both of you for the generous help. I will bring up V2 soon.


Thanks,

    Pingfan


Thread overview: 10+ messages
2022-09-05  3:38 [PATCH 1/3] rcu: remove redundant cpu affinity setting during teardown Pingfan Liu
2022-09-05  3:38 ` [PATCH 2/3] rcu: simplify the prototype of rcu_boost_kthread_setaffinity() Pingfan Liu
2022-09-05  3:38 ` [PATCH 3/3] rcu: Keep qsmaskinitnext fresh for rcu_boost_kthread_setaffinity() Pingfan Liu
2022-09-06 18:42   ` Paul E. McKenney
2022-09-06 17:24 ` [PATCH 1/3] rcu: remove redundant cpu affinity setting during teardown Paul E. McKenney
2022-09-07  1:40   ` Pingfan Liu
2022-09-10 20:36     ` Paul E. McKenney
2022-09-14 10:12       ` Pingfan Liu
2022-09-14 12:27         ` Frederic Weisbecker
2022-09-14 13:34           ` Pingfan Liu
