public inbox for linux-kernel@vger.kernel.org
* [PATCH v2 0/6] RCU tasks fixes for v6.9
@ 2024-02-17  1:27 Boqun Feng
  2024-02-17  1:27 ` [PATCH v2 1/6] rcu-tasks: Repair RCU Tasks Trace quiescence check Boqun Feng
                   ` (6 more replies)
  0 siblings, 7 replies; 28+ messages in thread
From: Boqun Feng @ 2024-02-17  1:27 UTC (permalink / raw)
  To: linux-kernel, rcu; +Cc: Neeraj Upadhyay, Boqun Feng

Hi,

This series contains RCU tasks fixes for v6.9. You can also find
the series at:

	git://git.kernel.org/pub/scm/linux/kernel/git/boqun/linux.git rcu-tasks.2024.02.14a

Changes since v1:

*	Update with Paul's rework on "Eliminate deadlocks involving
	do_exit() and RCU tasks"

The detailed list of changes:

Paul E. McKenney (6):
  rcu-tasks: Repair RCU Tasks Trace quiescence check
  rcu-tasks: Add data to eliminate RCU-tasks/do_exit() deadlocks
  rcu-tasks: Initialize data to eliminate RCU-tasks/do_exit() deadlocks
  rcu-tasks: Maintain lists to eliminate RCU-tasks/do_exit() deadlocks
  rcu-tasks: Eliminate deadlocks involving do_exit() and RCU tasks
  rcu-tasks: Maintain real-time response in rcu_tasks_postscan()

 include/linux/rcupdate.h |   4 +-
 include/linux/sched.h    |   2 +
 init/init_task.c         |   1 +
 kernel/fork.c            |   1 +
 kernel/rcu/tasks.h       | 110 ++++++++++++++++++++++++++++++---------
 5 files changed, 90 insertions(+), 28 deletions(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v2 1/6] rcu-tasks: Repair RCU Tasks Trace quiescence check
  2024-02-17  1:27 [PATCH v2 0/6] RCU tasks fixes for v6.9 Boqun Feng
@ 2024-02-17  1:27 ` Boqun Feng
  2024-02-17  1:27 ` [PATCH v2 2/6] rcu-tasks: Add data to eliminate RCU-tasks/do_exit() deadlocks Boqun Feng
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 28+ messages in thread
From: Boqun Feng @ 2024-02-17  1:27 UTC (permalink / raw)
  To: linux-kernel, rcu
  Cc: Neeraj Upadhyay, Paul E. McKenney, Steven Rostedt, Boqun Feng,
	Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
	Josh Triplett, Mathieu Desnoyers, Lai Jiangshan, Zqiang

From: "Paul E. McKenney" <paulmck@kernel.org>

The context-switch-time check for RCU Tasks Trace quiescence expects
current->trc_reader_special.b.need_qs to be zero, and if so, updates
it to TRC_NEED_QS_CHECKED.  This is backwards, because if this value
is zero, there is no RCU Tasks Trace grace period in flight, and thus
no need for a quiescent state.  Instead, when a grace period starts,
this field is set to TRC_NEED_QS.

This commit therefore changes the check from zero to TRC_NEED_QS.

Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
---
 include/linux/rcupdate.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 0746b1b0b663..16f519914415 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -184,9 +184,9 @@ void rcu_tasks_trace_qs_blkd(struct task_struct *t);
 	do {									\
 		int ___rttq_nesting = READ_ONCE((t)->trc_reader_nesting);	\
 										\
-		if (likely(!READ_ONCE((t)->trc_reader_special.b.need_qs)) &&	\
+		if (unlikely(READ_ONCE((t)->trc_reader_special.b.need_qs) == TRC_NEED_QS) &&	\
 		    likely(!___rttq_nesting)) {					\
-			rcu_trc_cmpxchg_need_qs((t), 0,	TRC_NEED_QS_CHECKED);	\
+			rcu_trc_cmpxchg_need_qs((t), TRC_NEED_QS, TRC_NEED_QS_CHECKED);	\
 		} else if (___rttq_nesting && ___rttq_nesting != INT_MIN &&	\
 			   !READ_ONCE((t)->trc_reader_special.b.blocked)) {	\
 			rcu_tasks_trace_qs_blkd(t);				\
-- 
2.43.0



* [PATCH v2 2/6] rcu-tasks: Add data to eliminate RCU-tasks/do_exit() deadlocks
  2024-02-17  1:27 [PATCH v2 0/6] RCU tasks fixes for v6.9 Boqun Feng
  2024-02-17  1:27 ` [PATCH v2 1/6] rcu-tasks: Repair RCU Tasks Trace quiescence check Boqun Feng
@ 2024-02-17  1:27 ` Boqun Feng
  2024-02-22 16:54   ` Frederic Weisbecker
  2024-02-17  1:27 ` [PATCH v2 3/6] rcu-tasks: Initialize " Boqun Feng
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 28+ messages in thread
From: Boqun Feng @ 2024-02-17  1:27 UTC (permalink / raw)
  To: linux-kernel, rcu
  Cc: Neeraj Upadhyay, Paul E. McKenney, Chen Zhongjin, Yang Jihong,
	Boqun Feng, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Valentin Schneider,
	Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
	Josh Triplett, Mathieu Desnoyers, Lai Jiangshan, Zqiang

From: "Paul E. McKenney" <paulmck@kernel.org>

Holding a mutex across synchronize_rcu_tasks() and acquiring
that same mutex in code called from do_exit() after its call to
exit_tasks_rcu_start() but before its call to exit_tasks_rcu_stop()
results in deadlock.  This is by design, because tasks that are far
enough into do_exit() are no longer present on the tasks list, making
it a bit difficult for RCU Tasks to find them, let alone wait on them
to do a voluntary context switch.  However, such deadlocks are becoming
more frequent.  In addition, lockdep currently does not detect such
deadlocks and they can be difficult to reproduce.

In addition, if a task voluntarily context switches during that time
(for example, if it blocks acquiring a mutex), then this task is in an
RCU Tasks quiescent state.  And with some adjustments, RCU Tasks could
just as well take advantage of that fact.

This commit therefore adds the data structures that will be needed
to rely on these quiescent states and to eliminate these deadlocks.

Link: https://lore.kernel.org/all/20240118021842.290665-1-chenzhongjin@huawei.com/

Reported-by: Chen Zhongjin <chenzhongjin@huawei.com>
Reported-by: Yang Jihong <yangjihong1@huawei.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Yang Jihong <yangjihong1@huawei.com>
Tested-by: Chen Zhongjin <chenzhongjin@huawei.com>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
---
 include/linux/sched.h | 2 ++
 kernel/rcu/tasks.h    | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index ffe8f618ab86..5eeebed2dd9b 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -858,6 +858,8 @@ struct task_struct {
 	u8				rcu_tasks_idx;
 	int				rcu_tasks_idle_cpu;
 	struct list_head		rcu_tasks_holdout_list;
+	int				rcu_tasks_exit_cpu;
+	struct list_head		rcu_tasks_exit_list;
 #endif /* #ifdef CONFIG_TASKS_RCU */
 
 #ifdef CONFIG_TASKS_TRACE_RCU
diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index 732ad5b39946..b7d5f2757053 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -32,6 +32,7 @@ typedef void (*postgp_func_t)(struct rcu_tasks *rtp);
  * @rtp_irq_work: IRQ work queue for deferred wakeups.
  * @barrier_q_head: RCU callback for barrier operation.
  * @rtp_blkd_tasks: List of tasks blocked as readers.
+ * @rtp_exit_list: List of tasks in the latter portion of do_exit().
  * @cpu: CPU number corresponding to this entry.
  * @rtpp: Pointer to the rcu_tasks structure.
  */
@@ -46,6 +47,7 @@ struct rcu_tasks_percpu {
 	struct irq_work rtp_irq_work;
 	struct rcu_head barrier_q_head;
 	struct list_head rtp_blkd_tasks;
+	struct list_head rtp_exit_list;
 	int cpu;
 	struct rcu_tasks *rtpp;
 };
-- 
2.43.0



* [PATCH v2 3/6] rcu-tasks: Initialize data to eliminate RCU-tasks/do_exit() deadlocks
  2024-02-17  1:27 [PATCH v2 0/6] RCU tasks fixes for v6.9 Boqun Feng
  2024-02-17  1:27 ` [PATCH v2 1/6] rcu-tasks: Repair RCU Tasks Trace quiescence check Boqun Feng
  2024-02-17  1:27 ` [PATCH v2 2/6] rcu-tasks: Add data to eliminate RCU-tasks/do_exit() deadlocks Boqun Feng
@ 2024-02-17  1:27 ` Boqun Feng
  2024-02-22 16:21   ` Frederic Weisbecker
  2024-02-17  1:27 ` [PATCH v2 4/6] rcu-tasks: Maintain lists " Boqun Feng
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 28+ messages in thread
From: Boqun Feng @ 2024-02-17  1:27 UTC (permalink / raw)
  To: linux-kernel, rcu
  Cc: Neeraj Upadhyay, Paul E. McKenney, Chen Zhongjin, Yang Jihong,
	Boqun Feng, Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Zqiang, Andrew Morton, Kent Overstreet, Oleg Nesterov,
	Heiko Carstens, Christian Brauner, Suren Baghdasaryan,
	Michael S. Tsirkin, Mike Christie, Mateusz Guzik, Nicholas Piggin,
	Peng Zhang

From: "Paul E. McKenney" <paulmck@kernel.org>

Holding a mutex across synchronize_rcu_tasks() and acquiring
that same mutex in code called from do_exit() after its call to
exit_tasks_rcu_start() but before its call to exit_tasks_rcu_stop()
results in deadlock.  This is by design, because tasks that are far
enough into do_exit() are no longer present on the tasks list, making
it a bit difficult for RCU Tasks to find them, let alone wait on them
to do a voluntary context switch.  However, such deadlocks are becoming
more frequent.  In addition, lockdep currently does not detect such
deadlocks and they can be difficult to reproduce.

In addition, if a task voluntarily context switches during that time
(for example, if it blocks acquiring a mutex), then this task is in an
RCU Tasks quiescent state.  And with some adjustments, RCU Tasks could
just as well take advantage of that fact.

This commit therefore initializes the data structures that will be needed
to rely on these quiescent states and to eliminate these deadlocks.

Link: https://lore.kernel.org/all/20240118021842.290665-1-chenzhongjin@huawei.com/

Reported-by: Chen Zhongjin <chenzhongjin@huawei.com>
Reported-by: Yang Jihong <yangjihong1@huawei.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Yang Jihong <yangjihong1@huawei.com>
Tested-by: Chen Zhongjin <chenzhongjin@huawei.com>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
---
 init/init_task.c   | 1 +
 kernel/fork.c      | 1 +
 kernel/rcu/tasks.h | 2 ++
 3 files changed, 4 insertions(+)

diff --git a/init/init_task.c b/init/init_task.c
index 7ecb458eb3da..4daee6d761c8 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -147,6 +147,7 @@ struct task_struct init_task __aligned(L1_CACHE_BYTES) = {
 	.rcu_tasks_holdout = false,
 	.rcu_tasks_holdout_list = LIST_HEAD_INIT(init_task.rcu_tasks_holdout_list),
 	.rcu_tasks_idle_cpu = -1,
+	.rcu_tasks_exit_list = LIST_HEAD_INIT(init_task.rcu_tasks_exit_list),
 #endif
 #ifdef CONFIG_TASKS_TRACE_RCU
 	.trc_reader_nesting = 0,
diff --git a/kernel/fork.c b/kernel/fork.c
index 0d944e92a43f..af7203be1d2d 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1976,6 +1976,7 @@ static inline void rcu_copy_process(struct task_struct *p)
 	p->rcu_tasks_holdout = false;
 	INIT_LIST_HEAD(&p->rcu_tasks_holdout_list);
 	p->rcu_tasks_idle_cpu = -1;
+	INIT_LIST_HEAD(&p->rcu_tasks_exit_list);
 #endif /* #ifdef CONFIG_TASKS_RCU */
 #ifdef CONFIG_TASKS_TRACE_RCU
 	p->trc_reader_nesting = 0;
diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index b7d5f2757053..4a5d562e3189 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -277,6 +277,8 @@ static void cblist_init_generic(struct rcu_tasks *rtp)
 		rtpcp->rtpp = rtp;
 		if (!rtpcp->rtp_blkd_tasks.next)
 			INIT_LIST_HEAD(&rtpcp->rtp_blkd_tasks);
+		if (!rtpcp->rtp_exit_list.next)
+			INIT_LIST_HEAD(&rtpcp->rtp_exit_list);
 	}
 
 	pr_info("%s: Setting shift to %d and lim to %d rcu_task_cb_adjust=%d.\n", rtp->name,
-- 
2.43.0



* [PATCH v2 4/6] rcu-tasks: Maintain lists to eliminate RCU-tasks/do_exit() deadlocks
  2024-02-17  1:27 [PATCH v2 0/6] RCU tasks fixes for v6.9 Boqun Feng
                   ` (2 preceding siblings ...)
  2024-02-17  1:27 ` [PATCH v2 3/6] rcu-tasks: Initialize " Boqun Feng
@ 2024-02-17  1:27 ` Boqun Feng
  2024-02-22 16:32   ` Frederic Weisbecker
  2024-02-23 12:19   ` Frederic Weisbecker
  2024-02-17  1:27 ` [PATCH v2 5/6] rcu-tasks: Eliminate deadlocks involving do_exit() and RCU tasks Boqun Feng
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 28+ messages in thread
From: Boqun Feng @ 2024-02-17  1:27 UTC (permalink / raw)
  To: linux-kernel, rcu
  Cc: Neeraj Upadhyay, Paul E. McKenney, Chen Zhongjin, Yang Jihong,
	Boqun Feng, Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Zqiang

From: "Paul E. McKenney" <paulmck@kernel.org>

This commit continues the elimination of deadlocks involving do_exit()
and RCU tasks by causing exit_tasks_rcu_start() to add the current
task to a per-CPU list and causing exit_tasks_rcu_stop() to remove the
current task from whatever list it is on.  These lists will be used to
track tasks that are exiting, while still accounting for any RCU-tasks
quiescent states that these tasks pass through.

[ paulmck: Apply Frederic Weisbecker feedback. ]

Link: https://lore.kernel.org/all/20240118021842.290665-1-chenzhongjin@huawei.com/

Reported-by: Chen Zhongjin <chenzhongjin@huawei.com>
Reported-by: Yang Jihong <yangjihong1@huawei.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Yang Jihong <yangjihong1@huawei.com>
Tested-by: Chen Zhongjin <chenzhongjin@huawei.com>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
---
 kernel/rcu/tasks.h | 43 +++++++++++++++++++++++++++++++++----------
 1 file changed, 33 insertions(+), 10 deletions(-)

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index 4a5d562e3189..68a8adf7de8e 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -1151,25 +1151,48 @@ struct task_struct *get_rcu_tasks_gp_kthread(void)
 EXPORT_SYMBOL_GPL(get_rcu_tasks_gp_kthread);
 
 /*
- * Contribute to protect against tasklist scan blind spot while the
- * task is exiting and may be removed from the tasklist. See
- * corresponding synchronize_srcu() for further details.
+ * Protect against tasklist scan blind spot while the task is exiting and
+ * may be removed from the tasklist.  Do this by adding the task to yet
+ * another list.
+ *
+ * Note that the task will remove itself from this list, so there is no
+ * need for get_task_struct(), except in the case where rcu_tasks_pertask()
+ * adds it to the holdout list, in which case rcu_tasks_pertask() supplies
+ * the needed get_task_struct().
  */
-void exit_tasks_rcu_start(void) __acquires(&tasks_rcu_exit_srcu)
+void exit_tasks_rcu_start(void)
 {
-	current->rcu_tasks_idx = __srcu_read_lock(&tasks_rcu_exit_srcu);
+	unsigned long flags;
+	struct rcu_tasks_percpu *rtpcp;
+	struct task_struct *t = current;
+
+	WARN_ON_ONCE(!list_empty(&t->rcu_tasks_exit_list));
+	preempt_disable();
+	rtpcp = this_cpu_ptr(rcu_tasks.rtpcpu);
+	t->rcu_tasks_exit_cpu = smp_processor_id();
+	raw_spin_lock_irqsave_rcu_node(rtpcp, flags);
+	if (!rtpcp->rtp_exit_list.next)
+		INIT_LIST_HEAD(&rtpcp->rtp_exit_list);
+	list_add(&t->rcu_tasks_exit_list, &rtpcp->rtp_exit_list);
+	raw_spin_unlock_irqrestore_rcu_node(rtpcp, flags);
+	preempt_enable();
 }
 
 /*
- * Contribute to protect against tasklist scan blind spot while the
- * task is exiting and may be removed from the tasklist. See
- * corresponding synchronize_srcu() for further details.
+ * Remove the task from the "yet another list" because do_exit() is now
+ * non-preemptible, allowing synchronize_rcu() to wait beyond this point.
  */
-void exit_tasks_rcu_stop(void) __releases(&tasks_rcu_exit_srcu)
+void exit_tasks_rcu_stop(void)
 {
+	unsigned long flags;
+	struct rcu_tasks_percpu *rtpcp;
 	struct task_struct *t = current;
 
-	__srcu_read_unlock(&tasks_rcu_exit_srcu, t->rcu_tasks_idx);
+	WARN_ON_ONCE(list_empty(&t->rcu_tasks_exit_list));
+	rtpcp = per_cpu_ptr(rcu_tasks.rtpcpu, t->rcu_tasks_exit_cpu);
+	raw_spin_lock_irqsave_rcu_node(rtpcp, flags);
+	list_del_init(&t->rcu_tasks_exit_list);
+	raw_spin_unlock_irqrestore_rcu_node(rtpcp, flags);
 }
 
 /*
-- 
2.43.0



* [PATCH v2 5/6] rcu-tasks: Eliminate deadlocks involving do_exit() and RCU tasks
  2024-02-17  1:27 [PATCH v2 0/6] RCU tasks fixes for v6.9 Boqun Feng
                   ` (3 preceding siblings ...)
  2024-02-17  1:27 ` [PATCH v2 4/6] rcu-tasks: Maintain lists " Boqun Feng
@ 2024-02-17  1:27 ` Boqun Feng
  2024-02-22 16:46   ` Frederic Weisbecker
  2024-02-17  1:27 ` [PATCH v2 6/6] rcu-tasks: Maintain real-time response in rcu_tasks_postscan() Boqun Feng
  2024-02-22 16:52 ` [PATCH v2 0/6] RCU tasks fixes for v6.9 Frederic Weisbecker
  6 siblings, 1 reply; 28+ messages in thread
From: Boqun Feng @ 2024-02-17  1:27 UTC (permalink / raw)
  To: linux-kernel, rcu
  Cc: Neeraj Upadhyay, Paul E. McKenney, Chen Zhongjin, Yang Jihong,
	Boqun Feng, Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Zqiang

From: "Paul E. McKenney" <paulmck@kernel.org>

Holding a mutex across synchronize_rcu_tasks() and acquiring
that same mutex in code called from do_exit() after its call to
exit_tasks_rcu_start() but before its call to exit_tasks_rcu_stop()
results in deadlock.  This is by design, because tasks that are far
enough into do_exit() are no longer present on the tasks list, making
it a bit difficult for RCU Tasks to find them, let alone wait on them
to do a voluntary context switch.  However, such deadlocks are becoming
more frequent.  In addition, lockdep currently does not detect such
deadlocks and they can be difficult to reproduce.

In addition, if a task voluntarily context switches during that time
(for example, if it blocks acquiring a mutex), then this task is in an
RCU Tasks quiescent state.  And with some adjustments, RCU Tasks could
just as well take advantage of that fact.

This commit therefore eliminates these deadlocks by replacing the
SRCU-based wait for do_exit() completion with per-CPU lists of tasks
currently exiting.  A given task will be on one of these per-CPU lists for
the same period of time that this task would previously have been in the
previous SRCU read-side critical section.  These lists enable RCU Tasks
to find the tasks that have already been removed from the tasks list,
but that must nevertheless be waited upon.

The RCU Tasks grace period gathers any of these do_exit() tasks that it
must wait on, and adds them to the list of holdouts.  Per-CPU locking
and get_task_struct() are used to synchronize addition to and removal
from these lists.

Link: https://lore.kernel.org/all/20240118021842.290665-1-chenzhongjin@huawei.com/

Reported-by: Chen Zhongjin <chenzhongjin@huawei.com>
Reported-by: Yang Jihong <yangjihong1@huawei.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Yang Jihong <yangjihong1@huawei.com>
Tested-by: Chen Zhongjin <chenzhongjin@huawei.com>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
---
 kernel/rcu/tasks.h | 44 ++++++++++++++++++++++++++++----------------
 1 file changed, 28 insertions(+), 16 deletions(-)

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index 68a8adf7de8e..4dc355b2ac22 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -146,8 +146,6 @@ static struct rcu_tasks rt_name =							\
 }
 
 #ifdef CONFIG_TASKS_RCU
-/* Track exiting tasks in order to allow them to be waited for. */
-DEFINE_STATIC_SRCU(tasks_rcu_exit_srcu);
 
 /* Report delay in synchronize_srcu() completion in rcu_tasks_postscan(). */
 static void tasks_rcu_exit_srcu_stall(struct timer_list *unused);
@@ -855,10 +853,12 @@ static void rcu_tasks_wait_gp(struct rcu_tasks *rtp)
 //	number of voluntary context switches, and add that task to the
 //	holdout list.
 // rcu_tasks_postscan():
-//	Invoke synchronize_srcu() to ensure that all tasks that were
-//	in the process of exiting (and which thus might not know to
-//	synchronize with this RCU Tasks grace period) have completed
-//	exiting.
+//	Gather per-CPU lists of tasks in do_exit() to ensure that all
+//	tasks that were in the process of exiting (and which thus might
+//	not know to synchronize with this RCU Tasks grace period) have
+//	completed exiting.  The synchronize_rcu() in rcu_tasks_postgp()
+//	will take care of any tasks stuck in the non-preemptible region
+//	of do_exit() following its call to exit_tasks_rcu_stop().
 // check_all_holdout_tasks(), repeatedly until holdout list is empty:
 //	Scans the holdout list, attempting to identify a quiescent state
 //	for each task on the list.  If there is a quiescent state, the
@@ -871,8 +871,10 @@ static void rcu_tasks_wait_gp(struct rcu_tasks *rtp)
 //	with interrupts disabled.
 //
 // For each exiting task, the exit_tasks_rcu_start() and
-// exit_tasks_rcu_finish() functions begin and end, respectively, the SRCU
-// read-side critical sections waited for by rcu_tasks_postscan().
+// exit_tasks_rcu_finish() functions add and remove, respectively, the
+// current task to a per-CPU list of tasks that rcu_tasks_postscan() must
+// wait on.  This is necessary because rcu_tasks_postscan() must wait on
+// tasks that have already been removed from the global list of tasks.
 //
 // Pre-grace-period update-side code is ordered before the grace
 // via the raw_spin_lock.*rcu_node().  Pre-grace-period read-side code
@@ -936,9 +938,13 @@ static void rcu_tasks_pertask(struct task_struct *t, struct list_head *hop)
 	}
 }
 
+void call_rcu_tasks(struct rcu_head *rhp, rcu_callback_t func);
+DEFINE_RCU_TASKS(rcu_tasks, rcu_tasks_wait_gp, call_rcu_tasks, "RCU Tasks");
+
 /* Processing between scanning taskslist and draining the holdout list. */
 static void rcu_tasks_postscan(struct list_head *hop)
 {
+	int cpu;
 	int rtsi = READ_ONCE(rcu_task_stall_info);
 
 	if (!IS_ENABLED(CONFIG_TINY_RCU)) {
@@ -952,9 +958,9 @@ static void rcu_tasks_postscan(struct list_head *hop)
 	 * this, divide the fragile exit path part in two intersecting
 	 * read side critical sections:
 	 *
-	 * 1) An _SRCU_ read side starting before calling exit_notify(),
-	 *    which may remove the task from the tasklist, and ending after
-	 *    the final preempt_disable() call in do_exit().
+	 * 1) A task_struct list addition before calling exit_notify(),
+	 *    which may remove the task from the tasklist, with the
+	 *    removal after the final preempt_disable() call in do_exit().
 	 *
 	 * 2) An _RCU_ read side starting with the final preempt_disable()
 	 *    call in do_exit() and ending with the final call to schedule()
@@ -963,7 +969,17 @@ static void rcu_tasks_postscan(struct list_head *hop)
 	 * This handles the part 1). And postgp will handle part 2) with a
 	 * call to synchronize_rcu().
 	 */
-	synchronize_srcu(&tasks_rcu_exit_srcu);
+
+	for_each_possible_cpu(cpu) {
+		struct rcu_tasks_percpu *rtpcp = per_cpu_ptr(rcu_tasks.rtpcpu, cpu);
+		struct task_struct *t;
+
+		raw_spin_lock_irq_rcu_node(rtpcp);
+		list_for_each_entry(t, &rtpcp->rtp_exit_list, rcu_tasks_exit_list)
+			if (list_empty(&t->rcu_tasks_holdout_list))
+				rcu_tasks_pertask(t, hop);
+		raw_spin_unlock_irq_rcu_node(rtpcp);
+	}
 
 	if (!IS_ENABLED(CONFIG_TINY_RCU))
 		del_timer_sync(&tasks_rcu_exit_srcu_stall_timer);
@@ -1031,7 +1047,6 @@ static void rcu_tasks_postgp(struct rcu_tasks *rtp)
 	 *
 	 * In addition, this synchronize_rcu() waits for exiting tasks
 	 * to complete their final preempt_disable() region of execution,
-	 * cleaning up after synchronize_srcu(&tasks_rcu_exit_srcu),
 	 * enforcing the whole region before tasklist removal until
 	 * the final schedule() with TASK_DEAD state to be an RCU TASKS
 	 * read side critical section.
@@ -1039,9 +1054,6 @@ static void rcu_tasks_postgp(struct rcu_tasks *rtp)
 	synchronize_rcu();
 }
 
-void call_rcu_tasks(struct rcu_head *rhp, rcu_callback_t func);
-DEFINE_RCU_TASKS(rcu_tasks, rcu_tasks_wait_gp, call_rcu_tasks, "RCU Tasks");
-
 static void tasks_rcu_exit_srcu_stall(struct timer_list *unused)
 {
 #ifndef CONFIG_TINY_RCU
-- 
2.43.0



* [PATCH v2 6/6] rcu-tasks: Maintain real-time response in rcu_tasks_postscan()
  2024-02-17  1:27 [PATCH v2 0/6] RCU tasks fixes for v6.9 Boqun Feng
                   ` (4 preceding siblings ...)
  2024-02-17  1:27 ` [PATCH v2 5/6] rcu-tasks: Eliminate deadlocks involving do_exit() and RCU tasks Boqun Feng
@ 2024-02-17  1:27 ` Boqun Feng
  2024-02-22 17:48   ` Frederic Weisbecker
  2024-02-22 16:52 ` [PATCH v2 0/6] RCU tasks fixes for v6.9 Frederic Weisbecker
  6 siblings, 1 reply; 28+ messages in thread
From: Boqun Feng @ 2024-02-17  1:27 UTC (permalink / raw)
  To: linux-kernel, rcu
  Cc: Neeraj Upadhyay, Paul E. McKenney, Thomas Gleixner,
	Sebastian Siewior, Anna-Maria Behnsen, Steven Rostedt, Boqun Feng,
	Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
	Josh Triplett, Mathieu Desnoyers, Lai Jiangshan, Zqiang

From: "Paul E. McKenney" <paulmck@kernel.org>

The current code will scan the entirety of each per-CPU list of exiting
tasks in ->rtp_exit_list with interrupts disabled.  This is normally just
fine, because each CPU typically won't have very many tasks in this state.
However, if a large number of tasks block late in do_exit(), these lists
could be arbitrarily long.  Low probability, perhaps, but it really
could happen.

This commit therefore occasionally re-enables interrupts while traversing
these lists, inserting a dummy element to hold the current place in the
list.  In kernels built with CONFIG_PREEMPT_RT=y, this re-enabling happens
after each list element is processed, otherwise every one-to-two jiffies.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Anna-Maria Behnsen <anna-maria@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
---
 kernel/rcu/tasks.h | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index 4dc355b2ac22..866743e0796f 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -971,13 +971,32 @@ static void rcu_tasks_postscan(struct list_head *hop)
 	 */
 
 	for_each_possible_cpu(cpu) {
+		unsigned long j = jiffies + 1;
 		struct rcu_tasks_percpu *rtpcp = per_cpu_ptr(rcu_tasks.rtpcpu, cpu);
 		struct task_struct *t;
+		struct task_struct *t1;
+		struct list_head tmp;
 
 		raw_spin_lock_irq_rcu_node(rtpcp);
-		list_for_each_entry(t, &rtpcp->rtp_exit_list, rcu_tasks_exit_list)
+		list_for_each_entry_safe(t, t1, &rtpcp->rtp_exit_list, rcu_tasks_exit_list) {
 			if (list_empty(&t->rcu_tasks_holdout_list))
 				rcu_tasks_pertask(t, hop);
+
+			// RT kernels need frequent pauses, otherwise
+			// pause at least once per pair of jiffies.
+			if (!IS_ENABLED(CONFIG_PREEMPT_RT) && time_before(jiffies, j))
+				continue;
+
+			// Keep our place in the list while pausing.
+			// Nothing else traverses this list, so adding a
+			// bare list_head is OK.
+			list_add(&tmp, &t->rcu_tasks_exit_list);
+			raw_spin_unlock_irq_rcu_node(rtpcp);
+			cond_resched(); // For CONFIG_PREEMPT=n kernels
+			raw_spin_lock_irq_rcu_node(rtpcp);
+			list_del(&tmp);
+			j = jiffies + 1;
+		}
 		raw_spin_unlock_irq_rcu_node(rtpcp);
 	}
 
-- 
2.43.0



* Re: [PATCH v2 3/6] rcu-tasks: Initialize data to eliminate RCU-tasks/do_exit() deadlocks
  2024-02-17  1:27 ` [PATCH v2 3/6] rcu-tasks: Initialize " Boqun Feng
@ 2024-02-22 16:21   ` Frederic Weisbecker
  2024-02-22 20:41     ` Paul E. McKenney
  0 siblings, 1 reply; 28+ messages in thread
From: Frederic Weisbecker @ 2024-02-22 16:21 UTC (permalink / raw)
  To: Boqun Feng
  Cc: linux-kernel, rcu, Neeraj Upadhyay, Paul E. McKenney,
	Chen Zhongjin, Yang Jihong, Neeraj Upadhyay, Joel Fernandes,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Zqiang, Andrew Morton, Kent Overstreet, Oleg Nesterov,
	Heiko Carstens, Christian Brauner, Suren Baghdasaryan,
	Michael S. Tsirkin, Mike Christie, Mateusz Guzik, Nicholas Piggin,
	Peng Zhang

On Fri, Feb 16, 2024 at 05:27:38PM -0800, Boqun Feng wrote:
> From: "Paul E. McKenney" <paulmck@kernel.org>
> 
> Holding a mutex across synchronize_rcu_tasks() and acquiring
> that same mutex in code called from do_exit() after its call to
> exit_tasks_rcu_start() but before its call to exit_tasks_rcu_stop()
> results in deadlock.  This is by design, because tasks that are far
> enough into do_exit() are no longer present on the tasks list, making
> it a bit difficult for RCU Tasks to find them, let alone wait on them
> to do a voluntary context switch.  However, such deadlocks are becoming
> more frequent.  In addition, lockdep currently does not detect such
> deadlocks and they can be difficult to reproduce.
> 
> In addition, if a task voluntarily context switches during that time
> (for example, if it blocks acquiring a mutex), then this task is in an
> RCU Tasks quiescent state.  And with some adjustments, RCU Tasks could
> just as well take advantage of that fact.
> 
> This commit therefore initializes the data structures that will be needed
> to rely on these quiescent states and to eliminate these deadlocks.
> 
> Link: https://lore.kernel.org/all/20240118021842.290665-1-chenzhongjin@huawei.com/
> 
> Reported-by: Chen Zhongjin <chenzhongjin@huawei.com>
> Reported-by: Yang Jihong <yangjihong1@huawei.com>
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> Tested-by: Yang Jihong <yangjihong1@huawei.com>
> Tested-by: Chen Zhongjin <chenzhongjin@huawei.com>
> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
> ---
>  init/init_task.c   | 1 +
>  kernel/fork.c      | 1 +
>  kernel/rcu/tasks.h | 2 ++
>  3 files changed, 4 insertions(+)
> 
> diff --git a/init/init_task.c b/init/init_task.c
> index 7ecb458eb3da..4daee6d761c8 100644
> --- a/init/init_task.c
> +++ b/init/init_task.c
> @@ -147,6 +147,7 @@ struct task_struct init_task __aligned(L1_CACHE_BYTES) = {
>  	.rcu_tasks_holdout = false,
>  	.rcu_tasks_holdout_list = LIST_HEAD_INIT(init_task.rcu_tasks_holdout_list),
>  	.rcu_tasks_idle_cpu = -1,
> +	.rcu_tasks_exit_list = LIST_HEAD_INIT(init_task.rcu_tasks_exit_list),
>  #endif
>  #ifdef CONFIG_TASKS_TRACE_RCU
>  	.trc_reader_nesting = 0,
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 0d944e92a43f..af7203be1d2d 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -1976,6 +1976,7 @@ static inline void rcu_copy_process(struct task_struct *p)
>  	p->rcu_tasks_holdout = false;
>  	INIT_LIST_HEAD(&p->rcu_tasks_holdout_list);
>  	p->rcu_tasks_idle_cpu = -1;
> +	INIT_LIST_HEAD(&p->rcu_tasks_exit_list);
>  #endif /* #ifdef CONFIG_TASKS_RCU */
>  #ifdef CONFIG_TASKS_TRACE_RCU
>  	p->trc_reader_nesting = 0;
> diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
> index b7d5f2757053..4a5d562e3189 100644
> --- a/kernel/rcu/tasks.h
> +++ b/kernel/rcu/tasks.h
> @@ -277,6 +277,8 @@ static void cblist_init_generic(struct rcu_tasks *rtp)
>  		rtpcp->rtpp = rtp;
>  		if (!rtpcp->rtp_blkd_tasks.next)
>  			INIT_LIST_HEAD(&rtpcp->rtp_blkd_tasks);
> +		if (!rtpcp->rtp_exit_list.next)

I assume there can't be an exiting task concurrently at this point on
boot. Because kthreadd just got created and workqueues as well but that's it,
right? Or workqueues can die that early? Probably not.

> +			INIT_LIST_HEAD(&rtpcp->rtp_exit_list);

Because if tasks can exit concurrently, then we are in trouble :-)

Thanks.

>  	}
>  
>  	pr_info("%s: Setting shift to %d and lim to %d rcu_task_cb_adjust=%d.\n", rtp->name,
> -- 
> 2.43.0
> 


* Re: [PATCH v2 4/6] rcu-tasks: Maintain lists to eliminate RCU-tasks/do_exit() deadlocks
  2024-02-17  1:27 ` [PATCH v2 4/6] rcu-tasks: Maintain lists " Boqun Feng
@ 2024-02-22 16:32   ` Frederic Weisbecker
  2024-02-23 12:19   ` Frederic Weisbecker
  1 sibling, 0 replies; 28+ messages in thread
From: Frederic Weisbecker @ 2024-02-22 16:32 UTC (permalink / raw)
  To: Boqun Feng
  Cc: linux-kernel, rcu, Neeraj Upadhyay, Paul E. McKenney,
	Chen Zhongjin, Yang Jihong, Neeraj Upadhyay, Joel Fernandes,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Zqiang

Le Fri, Feb 16, 2024 at 05:27:39PM -0800, Boqun Feng a écrit :
> From: "Paul E. McKenney" <paulmck@kernel.org>
> 
> This commit continues the elimination of deadlocks involving do_exit()
> and RCU tasks by causing exit_tasks_rcu_start() to add the current
> task to a per-CPU list and causing exit_tasks_rcu_stop() to remove the
> current task from whatever list it is on.  These lists will be used to
> track tasks that are exiting, while still accounting for any RCU-tasks
> quiescent states that these tasks pass through.
> 
> [ paulmck: Apply Frederic Weisbecker feedback. ]
> 
> Link: https://lore.kernel.org/all/20240118021842.290665-1-chenzhongjin@huawei.com/
> 
> Reported-by: Chen Zhongjin <chenzhongjin@huawei.com>
> Reported-by: Yang Jihong <yangjihong1@huawei.com>
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> Tested-by: Yang Jihong <yangjihong1@huawei.com>
> Tested-by: Chen Zhongjin <chenzhongjin@huawei.com>
> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>

Reviewed-by: Frederic Weisbecker <frederic@kernel.org>


* Re: [PATCH v2 5/6] rcu-tasks: Eliminate deadlocks involving do_exit() and RCU tasks
  2024-02-17  1:27 ` [PATCH v2 5/6] rcu-tasks: Eliminate deadlocks involving do_exit() and RCU tasks Boqun Feng
@ 2024-02-22 16:46   ` Frederic Weisbecker
  0 siblings, 0 replies; 28+ messages in thread
From: Frederic Weisbecker @ 2024-02-22 16:46 UTC (permalink / raw)
  To: Boqun Feng
  Cc: linux-kernel, rcu, Neeraj Upadhyay, Paul E. McKenney,
	Chen Zhongjin, Yang Jihong, Neeraj Upadhyay, Joel Fernandes,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Zqiang

Le Fri, Feb 16, 2024 at 05:27:40PM -0800, Boqun Feng a écrit :
> From: "Paul E. McKenney" <paulmck@kernel.org>
> 
> Holding a mutex across synchronize_rcu_tasks() and acquiring
> that same mutex in code called from do_exit() after its call to
> exit_tasks_rcu_start() but before its call to exit_tasks_rcu_stop()
> results in deadlock.  This is by design, because tasks that are far
> enough into do_exit() are no longer present on the tasks list, making
> it a bit difficult for RCU Tasks to find them, let alone wait on them
> to do a voluntary context switch.  However, such deadlocks are becoming
> more frequent.  In addition, lockdep currently does not detect such
> deadlocks and they can be difficult to reproduce.
> 
> In addition, if a task voluntarily context switches during that time
> (for example, if it blocks acquiring a mutex), then this task is in an
> RCU Tasks quiescent state.  And with some adjustments, RCU Tasks could
> just as well take advantage of that fact.
> 
> This commit therefore eliminates these deadlocks by replacing the
> SRCU-based wait for do_exit() completion with per-CPU lists of tasks
> currently exiting.  A given task will be on one of these per-CPU lists for
> the same period of time that this task would previously have been in the
> previous SRCU read-side critical section.  These lists enable RCU Tasks
> to find the tasks that have already been removed from the tasks list,
> but that must nevertheless be waited upon.
> 
> The RCU Tasks grace period gathers any of these do_exit() tasks that it
> must wait on, and adds them to the list of holdouts.  Per-CPU locking
> and get_task_struct() are used to synchronize addition to and removal
> from these lists.
> 
> Link: https://lore.kernel.org/all/20240118021842.290665-1-chenzhongjin@huawei.com/
> 
> Reported-by: Chen Zhongjin <chenzhongjin@huawei.com>
> Reported-by: Yang Jihong <yangjihong1@huawei.com>
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> Tested-by: Yang Jihong <yangjihong1@huawei.com>
> Tested-by: Chen Zhongjin <chenzhongjin@huawei.com>
> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>

Reviewed-by: Frederic Weisbecker <frederic@kernel.org>


* Re: [PATCH v2 0/6] RCU tasks fixes for v6.9
  2024-02-17  1:27 [PATCH v2 0/6] RCU tasks fixes for v6.9 Boqun Feng
                   ` (5 preceding siblings ...)
  2024-02-17  1:27 ` [PATCH v2 6/6] rcu-tasks: Maintain real-time response in rcu_tasks_postscan() Boqun Feng
@ 2024-02-22 16:52 ` Frederic Weisbecker
  2024-02-22 22:09   ` Paul E. McKenney
  6 siblings, 1 reply; 28+ messages in thread
From: Frederic Weisbecker @ 2024-02-22 16:52 UTC (permalink / raw)
  To: Boqun Feng; +Cc: linux-kernel, rcu, Neeraj Upadhyay

Le Fri, Feb 16, 2024 at 05:27:35PM -0800, Boqun Feng a écrit :
> Hi,
> 
> This series contains the fixes of RCU tasks for v6.9. You can also find
> the series at:
> 
> 	git://git.kernel.org/pub/scm/linux/kernel/git/boqun/linux.git rcu-tasks.2024.02.14a
> 
> Changes since v1:
> 
> *	Update with Paul's rework on "Eliminate deadlocks involving
> 	do_exit() and RCU task"
> 
> The detailed list of changes:
> 
> Paul E. McKenney (6):
>   rcu-tasks: Repair RCU Tasks Trace quiescence check
>   rcu-tasks: Add data to eliminate RCU-tasks/do_exit() deadlocks
>   rcu-tasks: Initialize data to eliminate RCU-tasks/do_exit() deadlocks
>   rcu-tasks: Maintain lists to eliminate RCU-tasks/do_exit() deadlocks
>   rcu-tasks: Eliminate deadlocks involving do_exit() and RCU tasks

Food for later thought and further improvement: would it make sense to
call exit_tasks_rcu_start() on fork() instead and rely solely on
each CPU's rtp_exit_list instead of the tasklist?

Thanks.

>   rcu-tasks: Maintain real-time response in rcu_tasks_postscan()
> 
>  include/linux/rcupdate.h |   4 +-
>  include/linux/sched.h    |   2 +
>  init/init_task.c         |   1 +
>  kernel/fork.c            |   1 +
>  kernel/rcu/tasks.h       | 110 ++++++++++++++++++++++++++++++---------
>  5 files changed, 90 insertions(+), 28 deletions(-)
> 
> -- 
> 2.43.0
> 
> 


* Re: [PATCH v2 2/6] rcu-tasks: Add data to eliminate RCU-tasks/do_exit() deadlocks
  2024-02-17  1:27 ` [PATCH v2 2/6] rcu-tasks: Add data to eliminate RCU-tasks/do_exit() deadlocks Boqun Feng
@ 2024-02-22 16:54   ` Frederic Weisbecker
  2024-02-22 20:46     ` Paul E. McKenney
  0 siblings, 1 reply; 28+ messages in thread
From: Frederic Weisbecker @ 2024-02-22 16:54 UTC (permalink / raw)
  To: Boqun Feng
  Cc: linux-kernel, rcu, Neeraj Upadhyay, Paul E. McKenney,
	Chen Zhongjin, Yang Jihong, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira,
	Valentin Schneider, Neeraj Upadhyay, Joel Fernandes,
	Josh Triplett, Mathieu Desnoyers, Lai Jiangshan, Zqiang

Le Fri, Feb 16, 2024 at 05:27:37PM -0800, Boqun Feng a écrit :
> From: "Paul E. McKenney" <paulmck@kernel.org>
> 
> Holding a mutex across synchronize_rcu_tasks() and acquiring
> that same mutex in code called from do_exit() after its call to
> exit_tasks_rcu_start() but before its call to exit_tasks_rcu_stop()
> results in deadlock.  This is by design, because tasks that are far
> enough into do_exit() are no longer present on the tasks list, making
> it a bit difficult for RCU Tasks to find them, let alone wait on them
> to do a voluntary context switch.  However, such deadlocks are becoming
> more frequent.  In addition, lockdep currently does not detect such
> deadlocks and they can be difficult to reproduce.
> 
> In addition, if a task voluntarily context switches during that time
> (for example, if it blocks acquiring a mutex), then this task is in an
> RCU Tasks quiescent state.  And with some adjustments, RCU Tasks could
> just as well take advantage of that fact.
> 
> This commit therefore adds the data structures that will be needed
> to rely on these quiescent states and to eliminate these deadlocks.
> 
> Link: https://lore.kernel.org/all/20240118021842.290665-1-chenzhongjin@huawei.com/
> 
> Reported-by: Chen Zhongjin <chenzhongjin@huawei.com>
> Reported-by: Yang Jihong <yangjihong1@huawei.com>
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> Tested-by: Yang Jihong <yangjihong1@huawei.com>
> Tested-by: Chen Zhongjin <chenzhongjin@huawei.com>
> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>

Reviewed-by: Frederic Weisbecker <frederic@kernel.org>


* Re: [PATCH v2 6/6] rcu-tasks: Maintain real-time response in rcu_tasks_postscan()
  2024-02-17  1:27 ` [PATCH v2 6/6] rcu-tasks: Maintain real-time response in rcu_tasks_postscan() Boqun Feng
@ 2024-02-22 17:48   ` Frederic Weisbecker
  2024-02-22 20:52     ` Paul E. McKenney
  0 siblings, 1 reply; 28+ messages in thread
From: Frederic Weisbecker @ 2024-02-22 17:48 UTC (permalink / raw)
  To: Boqun Feng
  Cc: linux-kernel, rcu, Neeraj Upadhyay, Paul E. McKenney,
	Thomas Gleixner, Sebastian Siewior, Anna-Maria Behnsen,
	Steven Rostedt, Neeraj Upadhyay, Joel Fernandes, Josh Triplett,
	Mathieu Desnoyers, Lai Jiangshan, Zqiang

Le Fri, Feb 16, 2024 at 05:27:41PM -0800, Boqun Feng a écrit :
> From: "Paul E. McKenney" <paulmck@kernel.org>
> 
> The current code will scan the entirety of each per-CPU list of exiting
> tasks in ->rtp_exit_list with interrupts disabled.  This is normally just
> fine, because each CPU typically won't have very many tasks in this state.
> However, if a large number of tasks block late in do_exit(), these lists
> could be arbitrarily long.  Low probability, perhaps, but it really
> could happen.
> 
> This commit therefore occasionally re-enables interrupts while traversing
> these lists, inserting a dummy element to hold the current place in the
> list.  In kernels built with CONFIG_PREEMPT_RT=y, this re-enabling happens
> after each list element is processed, otherwise every one-to-two jiffies.
> 
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Sebastian Siewior <bigeasy@linutronix.de>
> Cc: Anna-Maria Behnsen <anna-maria@linutronix.de>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
> ---
>  kernel/rcu/tasks.h | 21 ++++++++++++++++++++-
>  1 file changed, 20 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
> index 4dc355b2ac22..866743e0796f 100644
> --- a/kernel/rcu/tasks.h
> +++ b/kernel/rcu/tasks.h
> @@ -971,13 +971,32 @@ static void rcu_tasks_postscan(struct list_head *hop)
>  	 */
>  
>  	for_each_possible_cpu(cpu) {
> +		unsigned long j = jiffies + 1;
>  		struct rcu_tasks_percpu *rtpcp = per_cpu_ptr(rcu_tasks.rtpcpu, cpu);
>  		struct task_struct *t;
> +		struct task_struct *t1;
> +		struct list_head tmp;
>  
>  		raw_spin_lock_irq_rcu_node(rtpcp);
> -		list_for_each_entry(t, &rtpcp->rtp_exit_list, rcu_tasks_exit_list)
> +		list_for_each_entry_safe(t, t1, &rtpcp->rtp_exit_list, rcu_tasks_exit_list) {
>  			if (list_empty(&t->rcu_tasks_holdout_list))
>  				rcu_tasks_pertask(t, hop);
> +
> +			// RT kernels need frequent pauses, otherwise
> +			// pause at least once per pair of jiffies.
> +			if (!IS_ENABLED(CONFIG_PREEMPT_RT) && time_before(jiffies, j))
> +				continue;
> +
> +			// Keep our place in the list while pausing.
> +			// Nothing else traverses this list, so adding a
> +			// bare list_head is OK.
> +			list_add(&tmp, &t->rcu_tasks_exit_list);

I'm a bit confused about what this does...

> +			raw_spin_unlock_irq_rcu_node(rtpcp);
> +			cond_resched(); // For CONFIG_PREEMPT=n kernels
> +			raw_spin_lock_irq_rcu_node(rtpcp);
> +			list_del(&tmp);

Isn't there a risk that t is reaped by then? If it was not observed on_rq
while calling rcu_tasks_pertask() then there is no get_task_struct.

And what about t1? Can't it be reaped as well?

Thanks.


> +			j = jiffies + 1;
> +		}
>  		raw_spin_unlock_irq_rcu_node(rtpcp);
>  	}
>  
> -- 
> 2.43.0
> 


* Re: [PATCH v2 3/6] rcu-tasks: Initialize data to eliminate RCU-tasks/do_exit() deadlocks
  2024-02-22 16:21   ` Frederic Weisbecker
@ 2024-02-22 20:41     ` Paul E. McKenney
  2024-02-23 11:39       ` Frederic Weisbecker
  0 siblings, 1 reply; 28+ messages in thread
From: Paul E. McKenney @ 2024-02-22 20:41 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Boqun Feng, linux-kernel, rcu, Neeraj Upadhyay, Chen Zhongjin,
	Yang Jihong, Neeraj Upadhyay, Joel Fernandes, Josh Triplett,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang,
	Andrew Morton, Kent Overstreet, Oleg Nesterov, Heiko Carstens,
	Christian Brauner, Suren Baghdasaryan, Michael S. Tsirkin,
	Mike Christie, Mateusz Guzik, Nicholas Piggin, Peng Zhang

On Thu, Feb 22, 2024 at 05:21:03PM +0100, Frederic Weisbecker wrote:
> Le Fri, Feb 16, 2024 at 05:27:38PM -0800, Boqun Feng a écrit :
> > From: "Paul E. McKenney" <paulmck@kernel.org>
> > 
> > Holding a mutex across synchronize_rcu_tasks() and acquiring
> > that same mutex in code called from do_exit() after its call to
> > exit_tasks_rcu_start() but before its call to exit_tasks_rcu_stop()
> > results in deadlock.  This is by design, because tasks that are far
> > enough into do_exit() are no longer present on the tasks list, making
> > it a bit difficult for RCU Tasks to find them, let alone wait on them
> > to do a voluntary context switch.  However, such deadlocks are becoming
> > more frequent.  In addition, lockdep currently does not detect such
> > deadlocks and they can be difficult to reproduce.
> > 
> > In addition, if a task voluntarily context switches during that time
> > (for example, if it blocks acquiring a mutex), then this task is in an
> > RCU Tasks quiescent state.  And with some adjustments, RCU Tasks could
> > just as well take advantage of that fact.
> > 
> > This commit therefore initializes the data structures that will be needed
> > to rely on these quiescent states and to eliminate these deadlocks.
> > 
> > Link: https://lore.kernel.org/all/20240118021842.290665-1-chenzhongjin@huawei.com/
> > 
> > Reported-by: Chen Zhongjin <chenzhongjin@huawei.com>
> > Reported-by: Yang Jihong <yangjihong1@huawei.com>
> > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > Tested-by: Yang Jihong <yangjihong1@huawei.com>
> > Tested-by: Chen Zhongjin <chenzhongjin@huawei.com>
> > Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
> > ---
> >  init/init_task.c   | 1 +
> >  kernel/fork.c      | 1 +
> >  kernel/rcu/tasks.h | 2 ++
> >  3 files changed, 4 insertions(+)
> > 
> > diff --git a/init/init_task.c b/init/init_task.c
> > index 7ecb458eb3da..4daee6d761c8 100644
> > --- a/init/init_task.c
> > +++ b/init/init_task.c
> > @@ -147,6 +147,7 @@ struct task_struct init_task __aligned(L1_CACHE_BYTES) = {
> >  	.rcu_tasks_holdout = false,
> >  	.rcu_tasks_holdout_list = LIST_HEAD_INIT(init_task.rcu_tasks_holdout_list),
> >  	.rcu_tasks_idle_cpu = -1,
> > +	.rcu_tasks_exit_list = LIST_HEAD_INIT(init_task.rcu_tasks_exit_list),
> >  #endif
> >  #ifdef CONFIG_TASKS_TRACE_RCU
> >  	.trc_reader_nesting = 0,
> > diff --git a/kernel/fork.c b/kernel/fork.c
> > index 0d944e92a43f..af7203be1d2d 100644
> > --- a/kernel/fork.c
> > +++ b/kernel/fork.c
> > @@ -1976,6 +1976,7 @@ static inline void rcu_copy_process(struct task_struct *p)
> >  	p->rcu_tasks_holdout = false;
> >  	INIT_LIST_HEAD(&p->rcu_tasks_holdout_list);
> >  	p->rcu_tasks_idle_cpu = -1;
> > +	INIT_LIST_HEAD(&p->rcu_tasks_exit_list);
> >  #endif /* #ifdef CONFIG_TASKS_RCU */
> >  #ifdef CONFIG_TASKS_TRACE_RCU
> >  	p->trc_reader_nesting = 0;
> > diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
> > index b7d5f2757053..4a5d562e3189 100644
> > --- a/kernel/rcu/tasks.h
> > +++ b/kernel/rcu/tasks.h
> > @@ -277,6 +277,8 @@ static void cblist_init_generic(struct rcu_tasks *rtp)
> >  		rtpcp->rtpp = rtp;
> >  		if (!rtpcp->rtp_blkd_tasks.next)
> >  			INIT_LIST_HEAD(&rtpcp->rtp_blkd_tasks);
> > +		if (!rtpcp->rtp_exit_list.next)
> 
> I assume there can't be a concurrently exiting task at this point during
> boot, because only kthreadd and the workqueues have been created so far,
> right? Or can workqueues die that early? Probably not.
> 
> > +			INIT_LIST_HEAD(&rtpcp->rtp_exit_list);
> 
> Because if tasks can exit concurrently, then we are in trouble :-)

Tasks exiting at that point might be unconventional, but I don't see
anything that prevents them from doing so.

So excellent catch, and thank you very much!!!

My thought is to add the following patch to precede this one, which
initializes those lists at rcu_init() time.  Would that work?

						Thanx, Paul

------------------------------------------------------------------------

commit 9a876aac8064dfd46c840e4bb6177e65f7964bb4
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Thu Feb 22 12:29:54 2024 -0800

    rcu-tasks: Initialize callback lists at rcu_init() time
    
    In order for RCU Tasks to reliably maintain per-CPU lists of exiting
    tasks, those lists must be initialized before it is possible for tasks
    to exit, especially given that the boot CPU is not necessarily CPU 0
    (an example being, powerpc kexec() kernels).  And at the time that
    rcu_init_tasks_generic() is called, a task could potentially exit,
    unconventional though that sort of thing might be.
    
    This commit therefore moves the calls to cblist_init_generic() from
    functions called from rcu_init_tasks_generic() to a new function named
    tasks_cblist_init_generic() that is invoked from rcu_init().
    
    This constituted a bug in a commit that never went to mainline, so
    there is no need for any backporting to -stable.
    
    Reported-by: Frederic Weisbecker <frederic@kernel.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index 4e65a92e528e5..86fce206560e8 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -528,6 +528,12 @@ struct task_struct *get_rcu_tasks_gp_kthread(void);
 struct task_struct *get_rcu_tasks_rude_gp_kthread(void);
 #endif // # ifdef CONFIG_TASKS_RUDE_RCU
 
+#ifdef CONFIG_TASKS_RCU_GENERIC
+void tasks_cblist_init_generic(void);
+#else /* #ifdef CONFIG_TASKS_RCU_GENERIC */
+static inline void tasks_cblist_init_generic(void) { }
+#endif /* #else #ifdef CONFIG_TASKS_RCU_GENERIC */
+
 #define RCU_SCHEDULER_INACTIVE	0
 #define RCU_SCHEDULER_INIT	1
 #define RCU_SCHEDULER_RUNNING	2
diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index 866743e0796f4..e06e388e7c7e6 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -240,7 +240,6 @@ static const char *tasks_gp_state_getname(struct rcu_tasks *rtp)
 static void cblist_init_generic(struct rcu_tasks *rtp)
 {
 	int cpu;
-	unsigned long flags;
 	int lim;
 	int shift;
 
@@ -266,10 +265,8 @@ static void cblist_init_generic(struct rcu_tasks *rtp)
 		WARN_ON_ONCE(!rtpcp);
 		if (cpu)
 			raw_spin_lock_init(&ACCESS_PRIVATE(rtpcp, lock));
-		local_irq_save(flags);  // serialize initialization
 		if (rcu_segcblist_empty(&rtpcp->cblist))
 			rcu_segcblist_init(&rtpcp->cblist);
-		local_irq_restore(flags);
 		INIT_WORK(&rtpcp->rtp_work, rcu_tasks_invoke_cbs_wq);
 		rtpcp->cpu = cpu;
 		rtpcp->rtpp = rtp;
@@ -1153,7 +1150,6 @@ module_param(rcu_tasks_lazy_ms, int, 0444);
 
 static int __init rcu_spawn_tasks_kthread(void)
 {
-	cblist_init_generic(&rcu_tasks);
 	rcu_tasks.gp_sleep = HZ / 10;
 	rcu_tasks.init_fract = HZ / 10;
 	if (rcu_tasks_lazy_ms >= 0)
@@ -1340,7 +1336,6 @@ module_param(rcu_tasks_rude_lazy_ms, int, 0444);
 
 static int __init rcu_spawn_tasks_rude_kthread(void)
 {
-	cblist_init_generic(&rcu_tasks_rude);
 	rcu_tasks_rude.gp_sleep = HZ / 10;
 	if (rcu_tasks_rude_lazy_ms >= 0)
 		rcu_tasks_rude.lazy_jiffies = msecs_to_jiffies(rcu_tasks_rude_lazy_ms);
@@ -1972,7 +1967,6 @@ module_param(rcu_tasks_trace_lazy_ms, int, 0444);
 
 static int __init rcu_spawn_tasks_trace_kthread(void)
 {
-	cblist_init_generic(&rcu_tasks_trace);
 	if (IS_ENABLED(CONFIG_TASKS_TRACE_RCU_READ_MB)) {
 		rcu_tasks_trace.gp_sleep = HZ / 10;
 		rcu_tasks_trace.init_fract = HZ / 10;
@@ -2144,6 +2138,24 @@ late_initcall(rcu_tasks_verify_schedule_work);
 static void rcu_tasks_initiate_self_tests(void) { }
 #endif /* #else #ifdef CONFIG_PROVE_RCU */
 
+void __init tasks_cblist_init_generic(void)
+{
+	lockdep_assert_irqs_disabled();
+	WARN_ON(num_online_cpus() > 1);
+
+#ifdef CONFIG_TASKS_RCU
+	cblist_init_generic(&rcu_tasks);
+#endif
+
+#ifdef CONFIG_TASKS_RUDE_RCU
+	cblist_init_generic(&rcu_tasks_rude);
+#endif
+
+#ifdef CONFIG_TASKS_TRACE_RCU
+	cblist_init_generic(&rcu_tasks_trace);
+#endif
+}
+
 void __init rcu_init_tasks_generic(void)
 {
 #ifdef CONFIG_TASKS_RCU
diff --git a/kernel/rcu/tiny.c b/kernel/rcu/tiny.c
index fec804b790803..705c0d16850aa 100644
--- a/kernel/rcu/tiny.c
+++ b/kernel/rcu/tiny.c
@@ -261,4 +261,5 @@ void __init rcu_init(void)
 {
 	open_softirq(RCU_SOFTIRQ, rcu_process_callbacks);
 	rcu_early_boot_tests();
+	tasks_cblist_init_generic();
 }
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 31f3a61f9c384..4f4aec64039f0 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -5601,6 +5601,8 @@ void __init rcu_init(void)
 	(void)start_poll_synchronize_rcu_expedited();
 
 	rcu_test_sync_prims();
+
+	tasks_cblist_init_generic();
 }
 
 #include "tree_stall.h"


* Re: [PATCH v2 2/6] rcu-tasks: Add data to eliminate RCU-tasks/do_exit() deadlocks
  2024-02-22 16:54   ` Frederic Weisbecker
@ 2024-02-22 20:46     ` Paul E. McKenney
  0 siblings, 0 replies; 28+ messages in thread
From: Paul E. McKenney @ 2024-02-22 20:46 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Boqun Feng, linux-kernel, rcu, Neeraj Upadhyay, Chen Zhongjin,
	Yang Jihong, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Valentin Schneider,
	Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Mathieu Desnoyers,
	Lai Jiangshan, Zqiang

On Thu, Feb 22, 2024 at 05:54:48PM +0100, Frederic Weisbecker wrote:
> Le Fri, Feb 16, 2024 at 05:27:37PM -0800, Boqun Feng a écrit :
> > From: "Paul E. McKenney" <paulmck@kernel.org>
> > 
> > Holding a mutex across synchronize_rcu_tasks() and acquiring
> > that same mutex in code called from do_exit() after its call to
> > exit_tasks_rcu_start() but before its call to exit_tasks_rcu_stop()
> > results in deadlock.  This is by design, because tasks that are far
> > enough into do_exit() are no longer present on the tasks list, making
> > it a bit difficult for RCU Tasks to find them, let alone wait on them
> > to do a voluntary context switch.  However, such deadlocks are becoming
> > more frequent.  In addition, lockdep currently does not detect such
> > deadlocks and they can be difficult to reproduce.
> > 
> > In addition, if a task voluntarily context switches during that time
> > (for example, if it blocks acquiring a mutex), then this task is in an
> > RCU Tasks quiescent state.  And with some adjustments, RCU Tasks could
> > just as well take advantage of that fact.
> > 
> > This commit therefore adds the data structures that will be needed
> > to rely on these quiescent states and to eliminate these deadlocks.
> > 
> > Link: https://lore.kernel.org/all/20240118021842.290665-1-chenzhongjin@huawei.com/
> > 
> > Reported-by: Chen Zhongjin <chenzhongjin@huawei.com>
> > Reported-by: Yang Jihong <yangjihong1@huawei.com>
> > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > Tested-by: Yang Jihong <yangjihong1@huawei.com>
> > Tested-by: Chen Zhongjin <chenzhongjin@huawei.com>
> > Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
> 
> Reviewed-by: Frederic Weisbecker <frederic@kernel.org>

Thank you, I have recorded your three review tags.

							Thanx, Paul


* Re: [PATCH v2 6/6] rcu-tasks: Maintain real-time response in rcu_tasks_postscan()
  2024-02-22 17:48   ` Frederic Weisbecker
@ 2024-02-22 20:52     ` Paul E. McKenney
  2024-02-22 22:56       ` Paul E. McKenney
  0 siblings, 1 reply; 28+ messages in thread
From: Paul E. McKenney @ 2024-02-22 20:52 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Boqun Feng, linux-kernel, rcu, Neeraj Upadhyay, Thomas Gleixner,
	Sebastian Siewior, Anna-Maria Behnsen, Steven Rostedt,
	Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Mathieu Desnoyers,
	Lai Jiangshan, Zqiang

On Thu, Feb 22, 2024 at 06:48:47PM +0100, Frederic Weisbecker wrote:
> Le Fri, Feb 16, 2024 at 05:27:41PM -0800, Boqun Feng a écrit :
> > From: "Paul E. McKenney" <paulmck@kernel.org>
> > 
> > The current code will scan the entirety of each per-CPU list of exiting
> > tasks in ->rtp_exit_list with interrupts disabled.  This is normally just
> > fine, because each CPU typically won't have very many tasks in this state.
> > However, if a large number of tasks block late in do_exit(), these lists
> > could be arbitrarily long.  Low probability, perhaps, but it really
> > could happen.
> > 
> > This commit therefore occasionally re-enables interrupts while traversing
> > these lists, inserting a dummy element to hold the current place in the
> > list.  In kernels built with CONFIG_PREEMPT_RT=y, this re-enabling happens
> > after each list element is processed, otherwise every one-to-two jiffies.
> > 
> > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Sebastian Siewior <bigeasy@linutronix.de>
> > Cc: Anna-Maria Behnsen <anna-maria@linutronix.de>
> > Cc: Steven Rostedt <rostedt@goodmis.org>
> > Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
> > ---
> >  kernel/rcu/tasks.h | 21 ++++++++++++++++++++-
> >  1 file changed, 20 insertions(+), 1 deletion(-)
> > 
> > diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
> > index 4dc355b2ac22..866743e0796f 100644
> > --- a/kernel/rcu/tasks.h
> > +++ b/kernel/rcu/tasks.h
> > @@ -971,13 +971,32 @@ static void rcu_tasks_postscan(struct list_head *hop)
> >  	 */
> >  
> >  	for_each_possible_cpu(cpu) {
> > +		unsigned long j = jiffies + 1;
> >  		struct rcu_tasks_percpu *rtpcp = per_cpu_ptr(rcu_tasks.rtpcpu, cpu);
> >  		struct task_struct *t;
> > +		struct task_struct *t1;
> > +		struct list_head tmp;
> >  
> >  		raw_spin_lock_irq_rcu_node(rtpcp);
> > -		list_for_each_entry(t, &rtpcp->rtp_exit_list, rcu_tasks_exit_list)
> > +		list_for_each_entry_safe(t, t1, &rtpcp->rtp_exit_list, rcu_tasks_exit_list) {
> >  			if (list_empty(&t->rcu_tasks_holdout_list))
> >  				rcu_tasks_pertask(t, hop);
> > +
> > +			// RT kernels need frequent pauses, otherwise
> > +			// pause at least once per pair of jiffies.
> > +			if (!IS_ENABLED(CONFIG_PREEMPT_RT) && time_before(jiffies, j))
> > +				continue;
> > +
> > +			// Keep our place in the list while pausing.
> > +			// Nothing else traverses this list, so adding a
> > +			// bare list_head is OK.
> > +			list_add(&tmp, &t->rcu_tasks_exit_list);
> 
> I'm a bit confused about what this does...
> 
> > +			raw_spin_unlock_irq_rcu_node(rtpcp);
> > +			cond_resched(); // For CONFIG_PREEMPT=n kernels
> > +			raw_spin_lock_irq_rcu_node(rtpcp);
> > +			list_del(&tmp);
> 
> Isn't there a risk that t is reaped by then? If it was not observed on_rq
> while calling rcu_tasks_pertask() then there is no get_task_struct.

That is OK, courtesy of the _safe in list_for_each_entry_safe().

> And what about t1? Can't it be reaped as well?

It can, and that is a problem, good catch!

My current thought is to add this before the list_del(), which is
admittedly a bit crude:

			t1 = tmp.next;

Is there a better way?

							Thanx, Paul

> Thanks.
> 
> 
> > +			j = jiffies + 1;
> > +		}
> >  		raw_spin_unlock_irq_rcu_node(rtpcp);
> >  	}
> >  
> > -- 
> > 2.43.0
> > 


* Re: [PATCH v2 0/6] RCU tasks fixes for v6.9
  2024-02-22 16:52 ` [PATCH v2 0/6] RCU tasks fixes for v6.9 Frederic Weisbecker
@ 2024-02-22 22:09   ` Paul E. McKenney
  2024-02-23 12:25     ` Frederic Weisbecker
  0 siblings, 1 reply; 28+ messages in thread
From: Paul E. McKenney @ 2024-02-22 22:09 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: Boqun Feng, linux-kernel, rcu, Neeraj Upadhyay

On Thu, Feb 22, 2024 at 05:52:23PM +0100, Frederic Weisbecker wrote:
> Le Fri, Feb 16, 2024 at 05:27:35PM -0800, Boqun Feng a écrit :
> > Hi,
> > 
> > This series contains the fixes of RCU tasks for v6.9. You can also find
> > the series at:
> > 
> > 	git://git.kernel.org/pub/scm/linux/kernel/git/boqun/linux.git rcu-tasks.2024.02.14a
> > 
> > Changes since v1:
> > 
> > *	Update with Paul's rework on "Eliminate deadlocks involving
> > 	do_exit() and RCU task"
> > 
> > The detailed list of changes:
> > 
> > Paul E. McKenney (6):
> >   rcu-tasks: Repair RCU Tasks Trace quiescence check
> >   rcu-tasks: Add data to eliminate RCU-tasks/do_exit() deadlocks
> >   rcu-tasks: Initialize data to eliminate RCU-tasks/do_exit() deadlocks
> >   rcu-tasks: Maintain lists to eliminate RCU-tasks/do_exit() deadlocks
> >   rcu-tasks: Eliminate deadlocks involving do_exit() and RCU tasks
> 
> Food for later thought and further improvement: would it make sense to
> call exit_tasks_rcu_start() on fork() instead and rely solely on
> each CPU's rtp_exit_list instead of the tasklist?

It might well.

One big advantage of doing that is the ability to incrementally traverse
the tasks.  But is there some good way of doing that for the full tasks
list?  If so, everyone could benefit.

							Thanx, Paul

> Thanks.
> 
> >   rcu-tasks: Maintain real-time response in rcu_tasks_postscan()
> > 
> >  include/linux/rcupdate.h |   4 +-
> >  include/linux/sched.h    |   2 +
> >  init/init_task.c         |   1 +
> >  kernel/fork.c            |   1 +
> >  kernel/rcu/tasks.h       | 110 ++++++++++++++++++++++++++++++---------
> >  5 files changed, 90 insertions(+), 28 deletions(-)
> > 
> > -- 
> > 2.43.0
> > 
> > 
> 


* Re: [PATCH v2 6/6] rcu-tasks: Maintain real-time response in rcu_tasks_postscan()
  2024-02-22 20:52     ` Paul E. McKenney
@ 2024-02-22 22:56       ` Paul E. McKenney
  2024-02-23 12:17         ` Frederic Weisbecker
  0 siblings, 1 reply; 28+ messages in thread
From: Paul E. McKenney @ 2024-02-22 22:56 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Boqun Feng, linux-kernel, rcu, Neeraj Upadhyay, Thomas Gleixner,
	Sebastian Siewior, Anna-Maria Behnsen, Steven Rostedt,
	Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Mathieu Desnoyers,
	Lai Jiangshan, Zqiang

On Thu, Feb 22, 2024 at 12:52:24PM -0800, Paul E. McKenney wrote:
> On Thu, Feb 22, 2024 at 06:48:47PM +0100, Frederic Weisbecker wrote:
> > Le Fri, Feb 16, 2024 at 05:27:41PM -0800, Boqun Feng a écrit :
> > > From: "Paul E. McKenney" <paulmck@kernel.org>
> > > 
> > > The current code will scan the entirety of each per-CPU list of exiting
> > > tasks in ->rtp_exit_list with interrupts disabled.  This is normally just
> > > fine, because each CPU typically won't have very many tasks in this state.
> > > However, if a large number of tasks block late in do_exit(), these lists
> > > could be arbitrarily long.  Low probability, perhaps, but it really
> > > could happen.
> > > 
> > > This commit therefore occasionally re-enables interrupts while traversing
> > > these lists, inserting a dummy element to hold the current place in the
> > > list.  In kernels built with CONFIG_PREEMPT_RT=y, this re-enabling happens
> > > after each list element is processed, otherwise every one-to-two jiffies.
> > > 
> > > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > Cc: Sebastian Siewior <bigeasy@linutronix.de>
> > > Cc: Anna-Maria Behnsen <anna-maria@linutronix.de>
> > > Cc: Steven Rostedt <rostedt@goodmis.org>
> > > Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
> > > ---
> > >  kernel/rcu/tasks.h | 21 ++++++++++++++++++++-
> > >  1 file changed, 20 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
> > > index 4dc355b2ac22..866743e0796f 100644
> > > --- a/kernel/rcu/tasks.h
> > > +++ b/kernel/rcu/tasks.h
> > > @@ -971,13 +971,32 @@ static void rcu_tasks_postscan(struct list_head *hop)
> > >  	 */
> > >  
> > >  	for_each_possible_cpu(cpu) {
> > > +		unsigned long j = jiffies + 1;
> > >  		struct rcu_tasks_percpu *rtpcp = per_cpu_ptr(rcu_tasks.rtpcpu, cpu);
> > >  		struct task_struct *t;
> > > +		struct task_struct *t1;
> > > +		struct list_head tmp;
> > >  
> > >  		raw_spin_lock_irq_rcu_node(rtpcp);
> > > -		list_for_each_entry(t, &rtpcp->rtp_exit_list, rcu_tasks_exit_list)
> > > +		list_for_each_entry_safe(t, t1, &rtpcp->rtp_exit_list, rcu_tasks_exit_list) {
> > >  			if (list_empty(&t->rcu_tasks_holdout_list))
> > >  				rcu_tasks_pertask(t, hop);
> > > +
> > > +			// RT kernels need frequent pauses, otherwise
> > > +			// pause at least once per pair of jiffies.
> > > +			if (!IS_ENABLED(CONFIG_PREEMPT_RT) && time_before(jiffies, j))
> > > +				continue;
> > > +
> > > +			// Keep our place in the list while pausing.
> > > +			// Nothing else traverses this list, so adding a
> > > +			// bare list_head is OK.
> > > +			list_add(&tmp, &t->rcu_tasks_exit_list);
> > 
> > I'm a bit confused about what this does...
> > 
> > > +			raw_spin_unlock_irq_rcu_node(rtpcp);
> > > +			cond_resched(); // For CONFIG_PREEMPT=n kernels
> > > +			raw_spin_lock_irq_rcu_node(rtpcp);
> > > +			list_del(&tmp);
> > 
> > Isn't there a risk that t is reaped by then? If it was not observed on_rq
> > while calling rcu_tasks_pertask() then there is no get_task_struct.
> 
> That is OK, courtesy of the _safe in list_for_each_entry_safe().
> 
> > And what about t1? Can't it be reaped as well?
> 
> It can, and that is a problem, good catch!
> 
> My current thought is to add this before the list_del(), which is
> admittedly a bit crude:
> 
> 			t1 = tmp.next;

OK, OK...  ;-)

			t1 = list_entry(tmp.next, struct task_struct, rcu_tasks_exit_list);

Is there still a better way?

							Thanx, Paul

> > Thanks.
> > 
> > 
> > > +			j = jiffies + 1;
> > > +		}
> > >  		raw_spin_unlock_irq_rcu_node(rtpcp);
> > >  	}
> > >  
> > > -- 
> > > 2.43.0
> > > 


* Re: [PATCH v2 3/6] rcu-tasks: Initialize data to eliminate RCU-tasks/do_exit() deadlocks
  2024-02-22 20:41     ` Paul E. McKenney
@ 2024-02-23 11:39       ` Frederic Weisbecker
  0 siblings, 0 replies; 28+ messages in thread
From: Frederic Weisbecker @ 2024-02-23 11:39 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Boqun Feng, linux-kernel, rcu, Neeraj Upadhyay, Chen Zhongjin,
	Yang Jihong, Neeraj Upadhyay, Joel Fernandes, Josh Triplett,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang,
	Andrew Morton, Kent Overstreet, Oleg Nesterov, Heiko Carstens,
	Christian Brauner, Suren Baghdasaryan, Michael S. Tsirkin,
	Mike Christie, Mateusz Guzik, Nicholas Piggin, Peng Zhang

On Thu, Feb 22, 2024 at 12:41:55PM -0800, Paul E. McKenney wrote:
> On Thu, Feb 22, 2024 at 05:21:03PM +0100, Frederic Weisbecker wrote:
> > On Fri, Feb 16, 2024 at 05:27:38PM -0800, Boqun Feng wrote:
> > > From: "Paul E. McKenney" <paulmck@kernel.org>
> > > 
> > > Holding a mutex across synchronize_rcu_tasks() and acquiring
> > > that same mutex in code called from do_exit() after its call to
> > > exit_tasks_rcu_start() but before its call to exit_tasks_rcu_stop()
> > > results in deadlock.  This is by design, because tasks that are far
> > > enough into do_exit() are no longer present on the tasks list, making
> > > it a bit difficult for RCU Tasks to find them, let alone wait on them
> > > to do a voluntary context switch.  However, such deadlocks are becoming
> > > more frequent.  In addition, lockdep currently does not detect such
> > > deadlocks and they can be difficult to reproduce.
> > > 
> > > In addition, if a task voluntarily context switches during that time
> > > (for example, if it blocks acquiring a mutex), then this task is in an
> > > RCU Tasks quiescent state.  And with some adjustments, RCU Tasks could
> > > just as well take advantage of that fact.
> > > 
> > > This commit therefore initializes the data structures that will be needed
> > > to rely on these quiescent states and to eliminate these deadlocks.
> > > 
> > > Link: https://lore.kernel.org/all/20240118021842.290665-1-chenzhongjin@huawei.com/
> > > 
> > > Reported-by: Chen Zhongjin <chenzhongjin@huawei.com>
> > > Reported-by: Yang Jihong <yangjihong1@huawei.com>
> > > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > > Tested-by: Yang Jihong <yangjihong1@huawei.com>
> > > Tested-by: Chen Zhongjin <chenzhongjin@huawei.com>
> > > Signed-off-by: Boqun Feng <boqun.feng@gmail.com>

Looks good, thanks!

Reviewed-by: Frederic Weisbecker <frederic@kernel.org>


* Re: [PATCH v2 6/6] rcu-tasks: Maintain real-time response in rcu_tasks_postscan()
  2024-02-22 22:56       ` Paul E. McKenney
@ 2024-02-23 12:17         ` Frederic Weisbecker
  2024-02-23 15:14           ` Frederic Weisbecker
  0 siblings, 1 reply; 28+ messages in thread
From: Frederic Weisbecker @ 2024-02-23 12:17 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Boqun Feng, linux-kernel, rcu, Neeraj Upadhyay, Thomas Gleixner,
	Sebastian Siewior, Anna-Maria Behnsen, Steven Rostedt,
	Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Mathieu Desnoyers,
	Lai Jiangshan, Zqiang

On Thu, Feb 22, 2024 at 02:56:46PM -0800, Paul E. McKenney wrote:
> On Thu, Feb 22, 2024 at 12:52:24PM -0800, Paul E. McKenney wrote:
> > > > diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
> > > > index 4dc355b2ac22..866743e0796f 100644
> > > > --- a/kernel/rcu/tasks.h
> > > > +++ b/kernel/rcu/tasks.h
> > > > @@ -971,13 +971,32 @@ static void rcu_tasks_postscan(struct list_head *hop)
> > > >  	 */
> > > >  
> > > >  	for_each_possible_cpu(cpu) {
> > > > +		unsigned long j = jiffies + 1;
> > > >  		struct rcu_tasks_percpu *rtpcp = per_cpu_ptr(rcu_tasks.rtpcpu, cpu);
> > > >  		struct task_struct *t;
> > > > +		struct task_struct *t1;
> > > > +		struct list_head tmp;
> > > >  
> > > >  		raw_spin_lock_irq_rcu_node(rtpcp);
> > > > -		list_for_each_entry(t, &rtpcp->rtp_exit_list, rcu_tasks_exit_list)
> > > > +		list_for_each_entry_safe(t, t1, &rtpcp->rtp_exit_list, rcu_tasks_exit_list) {
> > > >  			if (list_empty(&t->rcu_tasks_holdout_list))
> > > >  				rcu_tasks_pertask(t, hop);
> > > > +
> > > > +			// RT kernels need frequent pauses, otherwise
> > > > +			// pause at least once per pair of jiffies.
> > > > +			if (!IS_ENABLED(CONFIG_PREEMPT_RT) && time_before(jiffies, j))
> > > > +				continue;
> > > > +
> > > > +			// Keep our place in the list while pausing.
> > > > +			// Nothing else traverses this list, so adding a
> > > > +			// bare list_head is OK.
> > > > +			list_add(&tmp, &t->rcu_tasks_exit_list);
> > > 
> > > I'm a bit confused about what this does...

Oh, OK, now I see what you're doing! My fear was that t goes away without
removing itself, and then the list_del() crashes. But no: tmp places itself
after t, and if t exits it removes itself from in front of tmp, so that's fine.

> > > 
> > > > +			raw_spin_unlock_irq_rcu_node(rtpcp);
> > > > +			cond_resched(); // For CONFIG_PREEMPT=n kernels
> > > > +			raw_spin_lock_irq_rcu_node(rtpcp);
> > > > +			list_del(&tmp);
> > > 
> > > Isn't there a risk that t is reaped by then? If it was not observed on_rq
> > > while calling rcu_tasks_pertask() then there is no get_task_struct.
> > 
> > That is OK, courtesy of the _safe in list_for_each_entry_safe().
> > 
> > > And what about t1? Can't it be reaped as well?
> > 
> > It can, and that is a problem, good catch!
> > 
> > My current thought is to add this before the list_del(), which is
> > admittedly a bit crude:
> > 
> > 			t1 = tmp.next;
> 
> OK, OK...  ;-)
> 
> 			t1 = list_entry(tmp.next, struct task_struct, rcu_tasks_exit_list);
> 
> Is there still a better way?

That should work.

An (untested) alternative that fiddles a bit less with list internals could look like this:

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index 866743e0796f..0ff2b554f5b5 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -973,12 +973,13 @@ static void rcu_tasks_postscan(struct list_head *hop)
 	for_each_possible_cpu(cpu) {
 		unsigned long j = jiffies + 1;
 		struct rcu_tasks_percpu *rtpcp = per_cpu_ptr(rcu_tasks.rtpcpu, cpu);
-		struct task_struct *t;
-		struct task_struct *t1;
-		struct list_head tmp;
 
 		raw_spin_lock_irq_rcu_node(rtpcp);
-		list_for_each_entry_safe(t, t1, &rtpcp->rtp_exit_list, rcu_tasks_exit_list) {
+		while (!list_empty(&rtpcp->rtp_exit_list)) {
+			struct task_struct *t;
+			t = list_first_entry(&rtpcp->rtp_exit_list, typeof(*t), rcu_tasks_exit_list);
+			list_del_init(&t->rcu_tasks_exit_list);
+
 			if (list_empty(&t->rcu_tasks_holdout_list))
 				rcu_tasks_pertask(t, hop);
 
@@ -987,14 +988,9 @@ static void rcu_tasks_postscan(struct list_head *hop)
 			if (!IS_ENABLED(CONFIG_PREEMPT_RT) && time_before(jiffies, j))
 				continue;
 
-			// Keep our place in the list while pausing.
-			// Nothing else traverses this list, so adding a
-			// bare list_head is OK.
-			list_add(&tmp, &t->rcu_tasks_exit_list);
 			raw_spin_unlock_irq_rcu_node(rtpcp);
 			cond_resched(); // For CONFIG_PREEMPT=n kernels
 			raw_spin_lock_irq_rcu_node(rtpcp);
-			list_del(&tmp);
 			j = jiffies + 1;
 		}
 		raw_spin_unlock_irq_rcu_node(rtpcp);
@@ -1219,7 +1215,6 @@ void exit_tasks_rcu_stop(void)
 	struct rcu_tasks_percpu *rtpcp;
 	struct task_struct *t = current;
 
-	WARN_ON_ONCE(list_empty(&t->rcu_tasks_exit_list));
 	rtpcp = per_cpu_ptr(rcu_tasks.rtpcpu, t->rcu_tasks_exit_cpu);
 	raw_spin_lock_irqsave_rcu_node(rtpcp, flags);
 	list_del_init(&t->rcu_tasks_exit_list);


* Re: [PATCH v2 4/6] rcu-tasks: Maintain lists to eliminate RCU-tasks/do_exit() deadlocks
  2024-02-17  1:27 ` [PATCH v2 4/6] rcu-tasks: Maintain lists " Boqun Feng
  2024-02-22 16:32   ` Frederic Weisbecker
@ 2024-02-23 12:19   ` Frederic Weisbecker
  2024-02-24  0:28     ` Paul E. McKenney
  1 sibling, 1 reply; 28+ messages in thread
From: Frederic Weisbecker @ 2024-02-23 12:19 UTC (permalink / raw)
  To: Boqun Feng
  Cc: linux-kernel, rcu, Neeraj Upadhyay, Paul E. McKenney,
	Chen Zhongjin, Yang Jihong, Neeraj Upadhyay, Joel Fernandes,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Zqiang

On Fri, Feb 16, 2024 at 05:27:39PM -0800, Boqun Feng wrote:
> -void exit_tasks_rcu_start(void) __acquires(&tasks_rcu_exit_srcu)
> +void exit_tasks_rcu_start(void)
>  {
> -	current->rcu_tasks_idx = __srcu_read_lock(&tasks_rcu_exit_srcu);
> +	unsigned long flags;
> +	struct rcu_tasks_percpu *rtpcp;
> +	struct task_struct *t = current;
> +
> +	WARN_ON_ONCE(!list_empty(&t->rcu_tasks_exit_list));
> +	preempt_disable();
> +	rtpcp = this_cpu_ptr(rcu_tasks.rtpcpu);
> +	t->rcu_tasks_exit_cpu = smp_processor_id();
> +	raw_spin_lock_irqsave_rcu_node(rtpcp, flags);
> +	if (!rtpcp->rtp_exit_list.next)

And then you might want to turn that into a WARN_ONCE.

Thanks.

> +		INIT_LIST_HEAD(&rtpcp->rtp_exit_list);
> +	list_add(&t->rcu_tasks_exit_list, &rtpcp->rtp_exit_list);
> +	raw_spin_unlock_irqrestore_rcu_node(rtpcp, flags);
> +	preempt_enable();
>  }


* Re: [PATCH v2 0/6] RCU tasks fixes for v6.9
  2024-02-22 22:09   ` Paul E. McKenney
@ 2024-02-23 12:25     ` Frederic Weisbecker
  2024-02-24  0:43       ` Paul E. McKenney
  0 siblings, 1 reply; 28+ messages in thread
From: Frederic Weisbecker @ 2024-02-23 12:25 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Boqun Feng, linux-kernel, rcu, Neeraj Upadhyay

On Thu, Feb 22, 2024 at 02:09:17PM -0800, Paul E. McKenney wrote:
> On Thu, Feb 22, 2024 at 05:52:23PM +0100, Frederic Weisbecker wrote:
> > On Fri, Feb 16, 2024 at 05:27:35PM -0800, Boqun Feng wrote:
> > > Hi,
> > > 
> > > This series contains the fixes of RCU tasks for v6.9. You can also find
> > > the series at:
> > > 
> > > 	git://git.kernel.org/pub/scm/linux/kernel/git/boqun/linux.git rcu-tasks.2024.02.14a
> > > 
> > > Changes since v1:
> > > 
> > > *	Update with Paul's rework on "Eliminate deadlocks involving
> > > 	do_exit() and RCU task"
> > > 
> > > The detailed list of changes:
> > > 
> > > Paul E. McKenney (6):
> > >   rcu-tasks: Repair RCU Tasks Trace quiescence check
> > >   rcu-tasks: Add data to eliminate RCU-tasks/do_exit() deadlocks
> > >   rcu-tasks: Initialize data to eliminate RCU-tasks/do_exit() deadlocks
> > >   rcu-tasks: Maintain lists to eliminate RCU-tasks/do_exit() deadlocks
> > >   rcu-tasks: Eliminate deadlocks involving do_exit() and RCU tasks
> > 
> > Food for later thoughts and further improvements: would it make sense to
> > call exit_rcu_tasks_start() on fork() instead and rely solely on
> > each CPUs' rtp_exit_list instead of the tasklist?
> 
> It might well.
> 
> One big advantage of doing that is the ability to incrementally traverse
> the tasks.  But is there some good way of doing that to the full task
> lists?  If so, everyone could benefit.

What do you mean by incrementally? You mean being able to cond_resched()
in the middle of the tasks iteration? Yeah not sure that's possible...

Thanks.

> 
> 							Thanx, Paul
> 
> > Thanks.
> > 
> > >   rcu-tasks: Maintain real-time response in rcu_tasks_postscan()
> > > 
> > >  include/linux/rcupdate.h |   4 +-
> > >  include/linux/sched.h    |   2 +
> > >  init/init_task.c         |   1 +
> > >  kernel/fork.c            |   1 +
> > >  kernel/rcu/tasks.h       | 110 ++++++++++++++++++++++++++++++---------
> > >  5 files changed, 90 insertions(+), 28 deletions(-)
> > > 
> > > -- 
> > > 2.43.0
> > > 
> > > 
> > 


* Re: [PATCH v2 6/6] rcu-tasks: Maintain real-time response in rcu_tasks_postscan()
  2024-02-23 12:17         ` Frederic Weisbecker
@ 2024-02-23 15:14           ` Frederic Weisbecker
  2024-02-24  0:23             ` Paul E. McKenney
  0 siblings, 1 reply; 28+ messages in thread
From: Frederic Weisbecker @ 2024-02-23 15:14 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Boqun Feng, linux-kernel, rcu, Neeraj Upadhyay, Thomas Gleixner,
	Sebastian Siewior, Anna-Maria Behnsen, Steven Rostedt,
	Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Mathieu Desnoyers,
	Lai Jiangshan, Zqiang

On Fri, Feb 23, 2024 at 01:17:14PM +0100, Frederic Weisbecker wrote:
> diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
> index 866743e0796f..0ff2b554f5b5 100644
> --- a/kernel/rcu/tasks.h
> +++ b/kernel/rcu/tasks.h
> @@ -973,12 +973,13 @@ static void rcu_tasks_postscan(struct list_head *hop)
>  	for_each_possible_cpu(cpu) {
>  		unsigned long j = jiffies + 1;
>  		struct rcu_tasks_percpu *rtpcp = per_cpu_ptr(rcu_tasks.rtpcpu, cpu);
> -		struct task_struct *t;
> -		struct task_struct *t1;
> -		struct list_head tmp;
>  
>  		raw_spin_lock_irq_rcu_node(rtpcp);
> -		list_for_each_entry_safe(t, t1, &rtpcp->rtp_exit_list, rcu_tasks_exit_list) {
> +		while (!list_empty(&rtpcp->rtp_exit_list)) {
> +			struct task_struct *t;
> +			t = list_first_entry(&rtpcp->rtp_exit_list, typeof(*t), rcu_tasks_exit_list);
> +			list_del_init(&t->rcu_tasks_exit_list);

Oh no! The task has to stay in the list for subsequent grace periods! Please
forget that suggestion... Yours looks good!

Thanks.

> +
>  			if (list_empty(&t->rcu_tasks_holdout_list))
>  				rcu_tasks_pertask(t, hop);
>  
> @@ -987,14 +988,9 @@ static void rcu_tasks_postscan(struct list_head *hop)
>  			if (!IS_ENABLED(CONFIG_PREEMPT_RT) && time_before(jiffies, j))
>  				continue;
>  
> -			// Keep our place in the list while pausing.
> -			// Nothing else traverses this list, so adding a
> -			// bare list_head is OK.
> -			list_add(&tmp, &t->rcu_tasks_exit_list);
>  			raw_spin_unlock_irq_rcu_node(rtpcp);
>  			cond_resched(); // For CONFIG_PREEMPT=n kernels
>  			raw_spin_lock_irq_rcu_node(rtpcp);
> -			list_del(&tmp);
>  			j = jiffies + 1;
>  		}
>  		raw_spin_unlock_irq_rcu_node(rtpcp);
> @@ -1219,7 +1215,6 @@ void exit_tasks_rcu_stop(void)
>  	struct rcu_tasks_percpu *rtpcp;
>  	struct task_struct *t = current;
>  
> -	WARN_ON_ONCE(list_empty(&t->rcu_tasks_exit_list));
>  	rtpcp = per_cpu_ptr(rcu_tasks.rtpcpu, t->rcu_tasks_exit_cpu);
>  	raw_spin_lock_irqsave_rcu_node(rtpcp, flags);
>  	list_del_init(&t->rcu_tasks_exit_list);
> 


* Re: [PATCH v2 6/6] rcu-tasks: Maintain real-time response in rcu_tasks_postscan()
  2024-02-23 15:14           ` Frederic Weisbecker
@ 2024-02-24  0:23             ` Paul E. McKenney
  0 siblings, 0 replies; 28+ messages in thread
From: Paul E. McKenney @ 2024-02-24  0:23 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Boqun Feng, linux-kernel, rcu, Neeraj Upadhyay, Thomas Gleixner,
	Sebastian Siewior, Anna-Maria Behnsen, Steven Rostedt,
	Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Mathieu Desnoyers,
	Lai Jiangshan, Zqiang

On Fri, Feb 23, 2024 at 04:14:49PM +0100, Frederic Weisbecker wrote:
> On Fri, Feb 23, 2024 at 01:17:14PM +0100, Frederic Weisbecker wrote:
> > diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
> > index 866743e0796f..0ff2b554f5b5 100644
> > --- a/kernel/rcu/tasks.h
> > +++ b/kernel/rcu/tasks.h
> > @@ -973,12 +973,13 @@ static void rcu_tasks_postscan(struct list_head *hop)
> >  	for_each_possible_cpu(cpu) {
> >  		unsigned long j = jiffies + 1;
> >  		struct rcu_tasks_percpu *rtpcp = per_cpu_ptr(rcu_tasks.rtpcpu, cpu);
> > -		struct task_struct *t;
> > -		struct task_struct *t1;
> > -		struct list_head tmp;
> >  
> >  		raw_spin_lock_irq_rcu_node(rtpcp);
> > -		list_for_each_entry_safe(t, t1, &rtpcp->rtp_exit_list, rcu_tasks_exit_list) {
> > +		while (!list_empty(&rtpcp->rtp_exit_list)) {
> > +			struct task_struct *t;
> > +			t = list_first_entry(&rtpcp->rtp_exit_list, typeof(*t), rcu_tasks_exit_list);
> > +			list_del_init(&t->rcu_tasks_exit_list);
> 
> Oh no! The task has to stay in the list for subsequent grace periods! Please
> forget that suggestion... Yours looks good!

You had me going for a bit, and I do know that feeling!  ;-)

							Thanx, Paul

> Thanks.
> 
> > +
> >  			if (list_empty(&t->rcu_tasks_holdout_list))
> >  				rcu_tasks_pertask(t, hop);
> >  
> > @@ -987,14 +988,9 @@ static void rcu_tasks_postscan(struct list_head *hop)
> >  			if (!IS_ENABLED(CONFIG_PREEMPT_RT) && time_before(jiffies, j))
> >  				continue;
> >  
> > -			// Keep our place in the list while pausing.
> > -			// Nothing else traverses this list, so adding a
> > -			// bare list_head is OK.
> > -			list_add(&tmp, &t->rcu_tasks_exit_list);
> >  			raw_spin_unlock_irq_rcu_node(rtpcp);
> >  			cond_resched(); // For CONFIG_PREEMPT=n kernels
> >  			raw_spin_lock_irq_rcu_node(rtpcp);
> > -			list_del(&tmp);
> >  			j = jiffies + 1;
> >  		}
> >  		raw_spin_unlock_irq_rcu_node(rtpcp);
> > @@ -1219,7 +1215,6 @@ void exit_tasks_rcu_stop(void)
> >  	struct rcu_tasks_percpu *rtpcp;
> >  	struct task_struct *t = current;
> >  
> > -	WARN_ON_ONCE(list_empty(&t->rcu_tasks_exit_list));
> >  	rtpcp = per_cpu_ptr(rcu_tasks.rtpcpu, t->rcu_tasks_exit_cpu);
> >  	raw_spin_lock_irqsave_rcu_node(rtpcp, flags);
> >  	list_del_init(&t->rcu_tasks_exit_list);
> > 


* Re: [PATCH v2 4/6] rcu-tasks: Maintain lists to eliminate RCU-tasks/do_exit() deadlocks
  2024-02-23 12:19   ` Frederic Weisbecker
@ 2024-02-24  0:28     ` Paul E. McKenney
  0 siblings, 0 replies; 28+ messages in thread
From: Paul E. McKenney @ 2024-02-24  0:28 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Boqun Feng, linux-kernel, rcu, Neeraj Upadhyay, Chen Zhongjin,
	Yang Jihong, Neeraj Upadhyay, Joel Fernandes, Josh Triplett,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang

On Fri, Feb 23, 2024 at 01:19:42PM +0100, Frederic Weisbecker wrote:
> On Fri, Feb 16, 2024 at 05:27:39PM -0800, Boqun Feng wrote:
> > -void exit_tasks_rcu_start(void) __acquires(&tasks_rcu_exit_srcu)
> > +void exit_tasks_rcu_start(void)
> >  {
> > -	current->rcu_tasks_idx = __srcu_read_lock(&tasks_rcu_exit_srcu);
> > +	unsigned long flags;
> > +	struct rcu_tasks_percpu *rtpcp;
> > +	struct task_struct *t = current;
> > +
> > +	WARN_ON_ONCE(!list_empty(&t->rcu_tasks_exit_list));
> > +	preempt_disable();
> > +	rtpcp = this_cpu_ptr(rcu_tasks.rtpcpu);
> > +	t->rcu_tasks_exit_cpu = smp_processor_id();
> > +	raw_spin_lock_irqsave_rcu_node(rtpcp, flags);
> > +	if (!rtpcp->rtp_exit_list.next)
> 
> And then you might want to turn that into a WARN_ONCE.

Excellent point, thank you!

I am queueing a separate patch for this with your Reported-by.  (This
change can lag behind, just in case this series needs to go in quickly.)

						Thanx, Paul

> Thanks.
> 
> > +		INIT_LIST_HEAD(&rtpcp->rtp_exit_list);
> > +	list_add(&t->rcu_tasks_exit_list, &rtpcp->rtp_exit_list);
> > +	raw_spin_unlock_irqrestore_rcu_node(rtpcp, flags);
> > +	preempt_enable();
> >  }


* Re: [PATCH v2 0/6] RCU tasks fixes for v6.9
  2024-02-23 12:25     ` Frederic Weisbecker
@ 2024-02-24  0:43       ` Paul E. McKenney
  2024-02-26 13:56         ` Frederic Weisbecker
  0 siblings, 1 reply; 28+ messages in thread
From: Paul E. McKenney @ 2024-02-24  0:43 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: Boqun Feng, linux-kernel, rcu, Neeraj Upadhyay

On Fri, Feb 23, 2024 at 01:25:06PM +0100, Frederic Weisbecker wrote:
> On Thu, Feb 22, 2024 at 02:09:17PM -0800, Paul E. McKenney wrote:
> > On Thu, Feb 22, 2024 at 05:52:23PM +0100, Frederic Weisbecker wrote:
> > > On Fri, Feb 16, 2024 at 05:27:35PM -0800, Boqun Feng wrote:
> > > > Hi,
> > > > 
> > > > This series contains the fixes of RCU tasks for v6.9. You can also find
> > > > the series at:
> > > > 
> > > > 	git://git.kernel.org/pub/scm/linux/kernel/git/boqun/linux.git rcu-tasks.2024.02.14a
> > > > 
> > > > Changes since v1:
> > > > 
> > > > *	Update with Paul's rework on "Eliminate deadlocks involving
> > > > 	do_exit() and RCU task"
> > > > 
> > > > The detailed list of changes:
> > > > 
> > > > Paul E. McKenney (6):
> > > >   rcu-tasks: Repair RCU Tasks Trace quiescence check
> > > >   rcu-tasks: Add data to eliminate RCU-tasks/do_exit() deadlocks
> > > >   rcu-tasks: Initialize data to eliminate RCU-tasks/do_exit() deadlocks
> > > >   rcu-tasks: Maintain lists to eliminate RCU-tasks/do_exit() deadlocks
> > > >   rcu-tasks: Eliminate deadlocks involving do_exit() and RCU tasks
> > > 
> > > Food for later thoughts and further improvements: would it make sense to
> > > call exit_rcu_tasks_start() on fork() instead and rely solely on
> > > each CPUs' rtp_exit_list instead of the tasklist?
> > 
> > It might well.
> > 
> > One big advantage of doing that is the ability to incrementally traverse
> > the tasks.  But is there some good way of doing that to the full task
> > lists?  If so, everyone could benefit.
> 
> What do you mean by incrementally? You mean being able to cond_resched()
> in the middle of the tasks iteration? Yeah not sure that's possible...

I do indeed mean doing cond_resched() mid-stream.

One way to make this happen would be to do something like this:

struct task_struct_anchor {
	struct list_head tsa_list;
	struct list_head tsa_adjust_list;
	atomic_t tsa_ref;  // Or use an appropriate API.
	bool tsa_is_anchor;
};

Each task structure would contain one of these, though there are a
number of ways to conserve space if needed.

These anchors would be placed perhaps every 1,000 tasks or so.  When a
traversal encountered one, it could atomic_inc_not_zero() the reference
count, and if that succeeded, exit the RCU read-side critical section and
do a cond_resched().  It could then enter a new RCU read-side critical
section, drop the reference, and continue.

A traversal might container_of() its way from ->tsa_list to the
task_struct_anchor structure, then if ->tsa_is_anchor is false,
container_of() its way to the enclosing task structure.

How to maintain proper spacing of the anchors?

One way is to make the traversals do the checking.  If the space between a
pair of anchors was too large or too small, it could add the first of the
pair to a list to be adjusted.  This list could periodically be processed,
perhaps with more urgency if a huge gap had opened up.

Freeing an anchor requires decrementing the reference count, waiting for
it to go to zero, removing the anchor, waiting for a grace period (perhaps
asynchronously), and only then freeing the anchor.

Anchors cannot be moved, only added or removed.

So it is possible.  But is it reasonable?  ;-)

							Thanx, Paul

> > > Thanks.
> > > 
> > > >   rcu-tasks: Maintain real-time response in rcu_tasks_postscan()
> > > > 
> > > >  include/linux/rcupdate.h |   4 +-
> > > >  include/linux/sched.h    |   2 +
> > > >  init/init_task.c         |   1 +
> > > >  kernel/fork.c            |   1 +
> > > >  kernel/rcu/tasks.h       | 110 ++++++++++++++++++++++++++++++---------
> > > >  5 files changed, 90 insertions(+), 28 deletions(-)
> > > > 
> > > > -- 
> > > > 2.43.0
> > > > 
> > > > 
> > > 


* Re: [PATCH v2 0/6] RCU tasks fixes for v6.9
  2024-02-24  0:43       ` Paul E. McKenney
@ 2024-02-26 13:56         ` Frederic Weisbecker
  2024-02-26 14:37           ` Paul E. McKenney
  0 siblings, 1 reply; 28+ messages in thread
From: Frederic Weisbecker @ 2024-02-26 13:56 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Boqun Feng, linux-kernel, rcu, Neeraj Upadhyay

On Fri, Feb 23, 2024 at 04:43:04PM -0800, Paul E. McKenney wrote:
> I do indeed mean doing cond_resched() mid-stream.
> 
> One way to make this happen would be to do something like this:
> 
> struct task_struct_anchor {
> 	struct list_head tsa_list;
> 	struct list_head tsa_adjust_list;
> 	atomic_t tsa_ref;  // Or use an appropriate API.
> 	bool tsa_is_anchor;
> };
> 
> Each task structure would contain one of these, though there are a
> number of ways to conserve space if needed.
> 
> These anchors would be placed perhaps every 1,000 tasks or so.  When a
> traversal encountered one, it could atomic_inc_not_zero() the reference
> count, and if that succeeded, exit the RCU read-side critical section and
> do a cond_resched().  It could then enter a new RCU read-side critical
> section, drop the reference, and continue.
> 
> A traversal might container_of() its way from ->tsa_list to the
> task_struct_anchor structure, then if ->tsa_is_anchor is false,
> container_of() its way to the enclosing task structure.
> 
> How to maintain proper spacing of the anchors?
> 
> One way is to make the traversals do the checking.  If the space between a
> pair of anchors was too large or too small, it could add the first of the
> pair to a list to be adjusted.  This list could periodically be processed,
> perhaps with more urgency if a huge gap had opened up.
> 
> Freeing an anchor requires decrementing the reference count, waiting for
> it to go to zero, removing the anchor, waiting for a grace period (perhaps
> asynchronously), and only then freeing the anchor.
> 
> Anchors cannot be moved, only added or removed.
> 
> So it is possible.  But is it reasonable?  ;-)

Wow! And this will need to be done both for process leaders (p->tasks)
and for threads (p->thread_node) :-)


* Re: [PATCH v2 0/6] RCU tasks fixes for v6.9
  2024-02-26 13:56         ` Frederic Weisbecker
@ 2024-02-26 14:37           ` Paul E. McKenney
  0 siblings, 0 replies; 28+ messages in thread
From: Paul E. McKenney @ 2024-02-26 14:37 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: Boqun Feng, linux-kernel, rcu, Neeraj Upadhyay

On Mon, Feb 26, 2024 at 02:56:06PM +0100, Frederic Weisbecker wrote:
> On Fri, Feb 23, 2024 at 04:43:04PM -0800, Paul E. McKenney wrote:
> > I do indeed mean doing cond_resched() mid-stream.
> > 
> > One way to make this happen would be to do something like this:
> > 
> > struct task_struct_anchor {
> > 	struct list_head tsa_list;
> > 	struct list_head tsa_adjust_list;
> > 	atomic_t tsa_ref;  // Or use an appropriate API.
> > 	bool tsa_is_anchor;
> > };
> > 
> > Each task structure would contain one of these, though there are a
> > number of ways to conserve space if needed.
> > 
> > These anchors would be placed perhaps every 1,000 tasks or so.  When a
> > traversal encountered one, it could atomic_inc_not_zero() the reference
> > count, and if that succeeded, exit the RCU read-side critical section and
> > do a cond_resched().  It could then enter a new RCU read-side critical
> > section, drop the reference, and continue.
> > 
> > A traversal might container_of() its way from ->tsa_list to the
> > task_struct_anchor structure, then if ->tsa_is_anchor is false,
> > container_of() its way to the enclosing task structure.
> > 
> > How to maintain proper spacing of the anchors?
> > 
> > One way is to make the traversals do the checking.  If the space between a
> > pair of anchors was too large or too small, it could add the first of the
> > pair to a list to be adjusted.  This list could periodically be processed,
> > perhaps with more urgency if a huge gap had opened up.
> > 
> > Freeing an anchor requires decrementing the reference count, waiting for
> > it to go to zero, removing the anchor, waiting for a grace period (perhaps
> > asynchronously), and only then freeing the anchor.
> > 
> > Anchors cannot be moved, only added or removed.
> > 
> > So it is possible.  But is it reasonable?  ;-)
> 
> Wow! And this will need to be done both for process leaders (p->tasks)
> and for threads (p->thread_node) :-)

True enough!  Your point being?  ;-)

							Thanx, Paul


end of thread, other threads:[~2024-02-26 14:37 UTC | newest]

Thread overview: 28+ messages
2024-02-17  1:27 [PATCH v2 0/6] RCU tasks fixes for v6.9 Boqun Feng
2024-02-17  1:27 ` [PATCH v2 1/6] rcu-tasks: Repair RCU Tasks Trace quiescence check Boqun Feng
2024-02-17  1:27 ` [PATCH v2 2/6] rcu-tasks: Add data to eliminate RCU-tasks/do_exit() deadlocks Boqun Feng
2024-02-22 16:54   ` Frederic Weisbecker
2024-02-22 20:46     ` Paul E. McKenney
2024-02-17  1:27 ` [PATCH v2 3/6] rcu-tasks: Initialize " Boqun Feng
2024-02-22 16:21   ` Frederic Weisbecker
2024-02-22 20:41     ` Paul E. McKenney
2024-02-23 11:39       ` Frederic Weisbecker
2024-02-17  1:27 ` [PATCH v2 4/6] rcu-tasks: Maintain lists " Boqun Feng
2024-02-22 16:32   ` Frederic Weisbecker
2024-02-23 12:19   ` Frederic Weisbecker
2024-02-24  0:28     ` Paul E. McKenney
2024-02-17  1:27 ` [PATCH v2 5/6] rcu-tasks: Eliminate deadlocks involving do_exit() and RCU tasks Boqun Feng
2024-02-22 16:46   ` Frederic Weisbecker
2024-02-17  1:27 ` [PATCH v2 6/6] rcu-tasks: Maintain real-time response in rcu_tasks_postscan() Boqun Feng
2024-02-22 17:48   ` Frederic Weisbecker
2024-02-22 20:52     ` Paul E. McKenney
2024-02-22 22:56       ` Paul E. McKenney
2024-02-23 12:17         ` Frederic Weisbecker
2024-02-23 15:14           ` Frederic Weisbecker
2024-02-24  0:23             ` Paul E. McKenney
2024-02-22 16:52 ` [PATCH v2 0/6] RCU tasks fixes for v6.9 Frederic Weisbecker
2024-02-22 22:09   ` Paul E. McKenney
2024-02-23 12:25     ` Frederic Weisbecker
2024-02-24  0:43       ` Paul E. McKenney
2024-02-26 13:56         ` Frederic Weisbecker
2024-02-26 14:37           ` Paul E. McKenney

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox