From: Frederic Weisbecker <frederic@kernel.org>
To: Boqun Feng <boqun.feng@gmail.com>
Cc: linux-kernel@vger.kernel.org, rcu@vger.kernel.org,
linux-doc@vger.kernel.org,
"Paul E. McKenney" <paulmck@kernel.org>,
Chen Zhongjin <chenzhongjin@huawei.com>,
Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Juri Lelli <juri.lelli@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Steven Rostedt <rostedt@goodmis.org>,
Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
Daniel Bristot de Oliveira <bristot@redhat.com>,
Valentin Schneider <vschneid@redhat.com>,
Neeraj Upadhyay <neeraj.iitr10@gmail.com>,
Joel Fernandes <joel@joelfernandes.org>,
Josh Triplett <josh@joshtriplett.org>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
Lai Jiangshan <jiangshanlai@gmail.com>,
Zqiang <qiang.zhang1211@gmail.com>,
Kent Overstreet <kent.overstreet@linux.dev>,
Andrew Morton <akpm@linux-foundation.org>,
Heiko Carstens <hca@linux.ibm.com>, Arnd Bergmann <arnd@arndb.de>,
Oleg Nesterov <oleg@redhat.com>,
Christian Brauner <brauner@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Mike Christie <michael.christie@oracle.com>,
"Michael S. Tsirkin" <mst@redhat.com>,
Mateusz Guzik <mjguzik@gmail.com>,
Nicholas Piggin <npiggin@gmail.com>,
Peng Zhang <zhangpeng.00@bytedance.com>
Subject: Re: [PATCH 2/2] rcu-tasks: Eliminate deadlocks involving do_exit() and RCU tasks
Date: Wed, 7 Feb 2024 23:53:13 +0100
Message-ID: <ZcQJ2Vec1_b5ooS_@pavilion.home>
In-Reply-To: <20240129225730.3168681-3-boqun.feng@gmail.com>

On Mon, Jan 29, 2024 at 02:57:27PM -0800, Boqun Feng wrote:
> From: "Paul E. McKenney" <paulmck@kernel.org>
>
> Holding a mutex across synchronize_rcu_tasks() and acquiring
> that same mutex in code called from do_exit() after its call to
> exit_tasks_rcu_start() but before its call to exit_tasks_rcu_stop()
> results in deadlock. This is by design, because tasks that are far
> enough into do_exit() are no longer present on the tasks list, making
> it a bit difficult for RCU Tasks to find them, let alone wait on them
> to do a voluntary context switch. However, such deadlocks are becoming
> more frequent. In addition, lockdep currently does not detect such
> deadlocks and they can be difficult to reproduce.
>
> In addition, if a task voluntarily context switches during that time
> (for example, if it blocks acquiring a mutex), then this task is in an
> RCU Tasks quiescent state. And with some adjustments, RCU Tasks could
> just as well take advantage of that fact.
>
> This commit therefore eliminates these deadlocks by replacing the
> SRCU-based wait for do_exit() completion with per-CPU lists of tasks
> currently exiting. A given task will be on one of these per-CPU lists for
> the same period of time that this task would previously have been in the
> previous SRCU read-side critical section. These lists enable RCU Tasks
> to find the tasks that have already been removed from the tasks list,
> but that must nevertheless be waited upon.
>
> The RCU Tasks grace period gathers any of these do_exit() tasks that it
> must wait on, and adds them to the list of holdouts. Per-CPU locking
> and get_task_struct() are used to synchronize addition to and removal
> from these lists.
>
> Link: https://lore.kernel.org/all/20240118021842.290665-1-chenzhongjin@huawei.com/
>
> Reported-by: Chen Zhongjin <chenzhongjin@huawei.com>
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

With that, I think we can now revert 28319d6dc5e2 ("rcu-tasks: Fix
synchronize_rcu_tasks() VS zap_pid_ns_processes()"). If the task is on
rcu_tasks_exit_list, it is treated just like any other task and must go
through check_holdout_task(). So, unlike with the previous SRCU-based scheme,
a task sleeping between exit_tasks_rcu_start() and exit_tasks_rcu_finish() is
now in a quiescent state, which removes the possible deadlock.

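To make the difference concrete, here is a rough userspace model of the two
waits. It is only a sketch under simplifying assumptions: every model_* name
is made up for illustration and is not a kernel API, the old scheme is
reduced to "wait until no task is inside the exit section", and the new
scheme is reduced to "a voluntary context switch since the grace period
started counts as a quiescent state".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a task; not the kernel's struct task_struct. */
struct model_task {
	bool in_exit_section;     /* between exit start and exit finish */
	unsigned long nvcsw;      /* voluntary context switches so far */
	unsigned long nvcsw_snap; /* nvcsw sampled at grace-period start */
};

/* Grace-period start: snapshot each task's voluntary switch count. */
static void model_gp_start(struct model_task *tasks, size_t n)
{
	for (size_t i = 0; i < n; i++)
		tasks[i].nvcsw_snap = tasks[i].nvcsw;
}

/* Blocking on a mutex is modeled as one voluntary context switch. */
static void model_block_on_mutex(struct model_task *t)
{
	t->nvcsw++;
}

/* Old scheme (assumed simplification): wait for the exit section to end. */
static bool model_old_gp_done(const struct model_task *tasks, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (tasks[i].in_exit_section)
			return false;
	}
	return true;
}

/* New scheme (assumed simplification): holdout check on context switches. */
static bool model_new_gp_done(const struct model_task *tasks, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (tasks[i].nvcsw == tasks[i].nvcsw_snap)
			return false;
	}
	return true;
}

/* One exiting task that blocks on a mutex held by the grace-period waiter. */
static void model_selfcheck(void)
{
	struct model_task t = { .in_exit_section = true };

	model_gp_start(&t, 1);
	assert(!model_old_gp_done(&t, 1));
	assert(!model_new_gp_done(&t, 1));

	model_block_on_mutex(&t);
	assert(!model_old_gp_done(&t, 1)); /* still stuck: deadlock pattern */
	assert(model_new_gp_done(&t, 1));  /* sleeping task is quiescent */
}
```

In the model, the old wait can never finish while the exiting task sleeps on
the mutex, whereas the holdout-style check completes as soon as that task
blocks.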
> -void exit_tasks_rcu_start(void) __acquires(&tasks_rcu_exit_srcu)
> +void exit_tasks_rcu_start(void)
>  {
> -	current->rcu_tasks_idx = __srcu_read_lock(&tasks_rcu_exit_srcu);
> +	unsigned long flags;
> +	struct rcu_tasks_percpu *rtpcp;
> +	struct task_struct *t = current;
> +
> +	WARN_ON_ONCE(!list_empty(&t->rcu_tasks_exit_list));
> +	get_task_struct(t);

Is this get_task_struct() necessary?

> +	preempt_disable();
> +	rtpcp = this_cpu_ptr(rcu_tasks.rtpcpu);
> +	t->rcu_tasks_exit_cpu = smp_processor_id();
> +	raw_spin_lock_irqsave_rcu_node(rtpcp, flags);

Do we really need the smp_mb__after_unlock_lock() that
raw_spin_lock_irqsave_rcu_node() implies?

> +	if (!rtpcp->rtp_exit_list.next)
> +		INIT_LIST_HEAD(&rtpcp->rtp_exit_list);
> +	list_add(&t->rcu_tasks_exit_list, &rtpcp->rtp_exit_list);
> +	raw_spin_unlock_irqrestore_rcu_node(rtpcp, flags);
> +	preempt_enable();
>  }
>
>  /*
> - * Contribute to protect against tasklist scan blind spot while the
> - * task is exiting and may be removed from the tasklist. See
> - * corresponding synchronize_srcu() for further details.
> + * Remove the task from the "yet another list" because do_exit() is now
> + * non-preemptible, allowing synchronize_rcu() to wait beyond this point.
>   */
> -void exit_tasks_rcu_stop(void) __releases(&tasks_rcu_exit_srcu)
> +void exit_tasks_rcu_stop(void)
>  {
> +	unsigned long flags;
> +	struct rcu_tasks_percpu *rtpcp;
>  	struct task_struct *t = current;
>
> -	__srcu_read_unlock(&tasks_rcu_exit_srcu, t->rcu_tasks_idx);
> +	WARN_ON_ONCE(list_empty(&t->rcu_tasks_exit_list));
> +	rtpcp = per_cpu_ptr(rcu_tasks.rtpcpu, t->rcu_tasks_exit_cpu);
> +	raw_spin_lock_irqsave_rcu_node(rtpcp, flags);
> +	list_del_init(&t->rcu_tasks_exit_list);
> +	raw_spin_unlock_irqrestore_rcu_node(rtpcp, flags);
> +	put_task_struct(t);

And conversely this put_task_struct()?

Thanks.

> }
>
> /*
> --
> 2.43.0
>
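
For readers following the two hunks above, the add/remove pairing can be
sketched as a small userspace model. This is an illustration under stated
assumptions, not the patch itself: all model_* names and MODEL_NR_CPUS are
hypothetical, a C11 atomic_flag spinlock stands in for the per-CPU rcu_node
style lock, and a plain integer stands in for the task reference count. The
point it shows is that the task records which CPU's list it joined, so that
removal takes the same lock even if the task has since migrated. The
reference-count bump simply brackets list membership; whether it is needed
is the open question above and the model takes no position on it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define MODEL_NR_CPUS 4 /* hypothetical CPU count for this model */

/* Minimal circular doubly-linked list, in the spirit of list_head. */
struct model_list { struct model_list *prev, *next; };

static void model_list_init(struct model_list *h) { h->prev = h->next = h; }
static int model_list_empty(const struct model_list *h) { return h->next == h; }

static void model_list_add(struct model_list *n, struct model_list *h)
{
	n->next = h->next;
	n->prev = h;
	h->next->prev = n;
	h->next = n;
}

static void model_list_del_init(struct model_list *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	model_list_init(n);
}

struct model_task {
	struct model_list exit_link; /* linkage on a per-CPU exit list */
	int exit_cpu;                /* CPU whose list the task joined */
	int refcount;                /* stand-in for the task refcount */
};

struct model_percpu {
	atomic_flag lock;            /* stand-in for the per-CPU lock */
	struct model_list exit_list;
};

static struct model_percpu model_cpus[MODEL_NR_CPUS];

static void model_init(void)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		atomic_flag_clear(&model_cpus[cpu].lock);
		model_list_init(&model_cpus[cpu].exit_list);
	}
}

static void model_lock(struct model_percpu *p)
{
	while (atomic_flag_test_and_set_explicit(&p->lock, memory_order_acquire))
		; /* spin */
}

static void model_unlock(struct model_percpu *p)
{
	atomic_flag_clear_explicit(&p->lock, memory_order_release);
}

/* Model of the start side: join the list of the CPU we are "running" on. */
static void model_exit_start(struct model_task *t, int cpu)
{
	struct model_percpu *p = &model_cpus[cpu];

	assert(model_list_empty(&t->exit_link));
	t->refcount++;     /* stands in for get_task_struct() */
	t->exit_cpu = cpu; /* remembered for the stop side */
	model_lock(p);
	model_list_add(&t->exit_link, &p->exit_list);
	model_unlock(p);
}

/* Model of the stop side: leave the list recorded at start time. */
static void model_exit_stop(struct model_task *t)
{
	struct model_percpu *p = &model_cpus[t->exit_cpu];

	assert(!model_list_empty(&t->exit_link));
	model_lock(p);
	model_list_del_init(&t->exit_link);
	model_unlock(p);
	t->refcount--;     /* stands in for put_task_struct() */
}

/* Model of the grace-period side: scan every per-CPU list under its lock. */
static int model_count_exiting(void)
{
	int count = 0;

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		struct model_percpu *p = &model_cpus[cpu];

		model_lock(p);
		for (struct model_list *pos = p->exit_list.next;
		     pos != &p->exit_list; pos = pos->next)
			count++;
		model_unlock(p);
	}
	return count;
}
```

A task initialized with model_list_init(&t.exit_link) can be passed to
model_exit_start() and later to model_exit_stop(); model_count_exiting()
reports how many tasks a grace period would have to gather in between.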