From: Wei Fu <fuweid89@gmail.com>
To: oleg@redhat.com
Cc: Sudhanva.Huruli@microsoft.com, akpm@linux-foundation.org,
apais@linux.microsoft.com, axboe@kernel.dk, boqun.feng@gmail.com,
brauner@kernel.org, frederic@kernel.org, fuweid89@gmail.com,
j.granados@samsung.com, jiangshanlai@gmail.com,
joel@joelfernandes.org, josh@joshtriplett.org,
linux-kernel@vger.kernel.org, mathieu.desnoyers@efficios.com,
michael.christie@oracle.com, mjguzik@gmail.com,
neeraj.upadhyay@kernel.org, paulmck@kernel.org,
qiang.zhang1211@gmail.com, rachelmenge@linux.microsoft.com,
rcu@vger.kernel.org, rostedt@goodmis.org
Subject: [PATCH] zap_pid_ns_processes: clear TIF_NOTIFY_SIGNAL along with TIF_SIGPENDING
Date: Sun, 9 Jun 2024 22:12:44 +0800 [thread overview]
Message-ID: <20240609141244.1770896-1-fuweid89@gmail.com> (raw)
In-Reply-To: <20240608120616.GB7947@redhat.com>
> kernel_wait4() doesn't sleep and returns -EINTR if there is no
> eligible child and signal_pending() is true.
>
> That is why zap_pid_ns_processes() clears TIF_SIGPENDING but this is not
> enough, it should also clear TIF_NOTIFY_SIGNAL to make signal_pending()
> return false and avoid a busy-wait loop.
>
> Fixes: 12db8b690010 ("entry: Add support for TIF_NOTIFY_SIGNAL")
> Reported-by: Rachel Menge <rachelmenge@linux.microsoft.com>
> Closes: https://lore.kernel.org/all/1386cd49-36d0-4a5c-85e9-bc42056a5a38@linux.microsoft.com/
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Tested-By: Wei Fu <fuweid89@gmail.com>
This change looks good to me!
I used [rcudeadlock-v1][1] to verify this patch on v5.15.160 for more than 30
hours. The soft lockup didn't show up. If there is no such patch, that
test will trigger soft-lockup in 10 minutes.
```
root@(none):/# uname -a
Linux (none) 5.15.160-dirty #7 SMP Fri Jun 7 15:25:30 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
root@(none):/# ps -ef | grep rcu
root 3 2 0 Jun07 ? 00:00:00 [rcu_gp]
root 4 2 0 Jun07 ? 00:00:00 [rcu_par_gp]
root 11 2 0 Jun07 ? 00:00:00 [rcu_tasks_rude_]
root 12 2 0 Jun07 ? 00:00:00 [rcu_tasks_trace]
root 15 2 0 Jun07 ? 00:03:31 [rcu_sched]
root 145 141 0 Jun07 ? 00:15:29 ./rcudeadlock
root 5372 141 0 13:37 ? 00:00:00 grep rcu
root@(none):/# date
Sun Jun 9 13:37:38 UTC 2024
```
I used [rcudeadlock-v2][2] to verify this patch on v6.10-rc2 for more than 2
hours. The soft lockup didn't show up. If there is no such patch, that
test will trigger soft-lockup in 1 minute.
```
root@(none):/# uname -a
Linux (none) 6.10.0-rc2-dirty #4 SMP Sun Jun 9 11:19:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
root@(none):/# ps -ef | grep rcu
root 4 2 0 11:20 ? 00:00:00 [kworker/R-rcu_g]
root 13 2 0 11:20 ? 00:00:00 [rcu_tasks_rude_kthread]
root 14 2 0 11:20 ? 00:00:00 [rcu_tasks_trace_kthread]
root 16 2 0 11:20 ? 00:00:03 [rcu_sched]
root 17 2 0 11:20 ? 00:00:00 [rcu_exp_par_gp_kthread_worker/0]
root 18 2 0 11:20 ? 00:00:12 [rcu_exp_gp_kthread_worker]
root 117 108 0 11:21 ? 00:01:06 ./rcudeadlock
root 14451 108 0 13:37 ? 00:00:00 grep rcu
root@(none):/# date
Sun Jun 9 13:37:15 UTC 2024
```
It's about data-race during cleanup active iou-wrk-thread. I shares that idea
about how to verify this patch.
> ---
> kernel/pid_namespace.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
> index dc48fecfa1dc..25f3cf679b35 100644
> --- a/kernel/pid_namespace.c
> +++ b/kernel/pid_namespace.c
> @@ -218,6 +218,7 @@ void zap_pid_ns_processes(struct pid_namespace *pid_ns)
> */
> do {
> clear_thread_flag(TIF_SIGPENDING);
> + clear_thread_flag(TIF_NOTIFY_SIGNAL);
> rc = kernel_wait4(-1, NULL, __WALL, NULL);
> } while (rc != -ECHILD);
>
> --
> 2.25.1.362.g51ebf55
>
>
>
Let's assume that there is new pid namespace unshared by host pid namespace,
named by `PA`. There are two processes in `PA`. The init process is named by
`X` and its child is named by `Y`.
```
unshare(CLONE_NEWPID|CLONE_NEWNS)
X
|__ Y
```
The main-thread of process X creates one active iouring worker thread
`iou-wrk-X`. When process X exits, that main-thread of process X wakes up and
set `TIF_NOTIFY_SIGNAL` flag on `iou-wrk-X` thread.
However, if `iou-wrk-X` thread receives signal from main-thread and wakes up,
that thread isn't able to clear `TIF_NOTIFY_SIGNAL` flag. And that `iou-wrk-X`
thread is last thread in process-X and it will carry `TIF_NOTIFY_SIGNAL` flag
to enter `zap_pid_ns_processes`. It can be described by the following comment.
```
== X main-thread == == X iou-wrk-X == == Y main-thread ==
do_exit
kill iou-wrk-X thread
io_uring_files_cancel io_wq_worker
set TIF_NOTIFY_SIGNAL on
iou-wrk-X thread
do_exit(0)
exit_task_namespace
exit_task_namespace
do_task_dead
exit_notify
forget_original_parent
find_child_reaper
zap_pid_ns_processes do_exit
exit_task_namespace
...
namespace_unlock
synchronize_rcu_expedited
```
The `iou-wrk-X` thread kills process-Y which is only one holding the mount
namespace reference. The process-Y will get into `synchronize_rcu_expedited`.
Since kernel doesn't enable preempt and `iou-wrk-X` thread has
`TIF_NOTIFY_SIGNAL` flag, the `iou-wrk-X` thread will get into infinity loop,
which cause soft lockup.
So, in [rcudeadlock-v2][2] test, I create more active iou-wrk- threads in
init process so that there is high chance to have iou-wrk- thread in
`zap_pid_ns_processes` function.
Hope it can help.
Thanks,
Wei
[1]: https://github.com/rlmenge/rcu-soft-lock-issue-repro/blob/662b8e414ff15d75419e2286b8121b7c2049a37c/rcudeadlock.go#L1
[2]: https://github.com/rlmenge/rcu-soft-lock-issue-repro/pull/1
next prev parent reply other threads:[~2024-06-09 14:13 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-06-05 23:42 [RCU] zombie task hung in synchronize_rcu_expedited Rachel Menge
2024-06-06 11:10 ` Oleg Nesterov
2024-06-06 15:45 ` Wei Fu
2024-06-06 17:28 ` Oleg Nesterov
2024-06-07 3:02 ` Wei Fu
2024-06-07 6:25 ` Oleg Nesterov
2024-06-07 15:04 ` Wei Fu
2024-06-07 21:22 ` Oleg Nesterov
2024-06-08 12:42 ` Oleg Nesterov
2024-06-10 0:07 ` Wei Fu
2024-06-08 12:06 ` [PATCH] zap_pid_ns_processes: clear TIF_NOTIFY_SIGNAL along with TIF_SIGPENDING Oleg Nesterov
2024-06-08 17:00 ` Boqun Feng
2024-06-09 14:12 ` Wei Fu [this message]
2024-06-12 16:57 ` Jens Axboe
2024-06-13 12:40 ` Eric W. Biederman
2024-06-13 14:02 ` Wei Fu
2024-06-13 14:49 ` Oleg Nesterov
2024-06-13 15:30 ` Oleg Nesterov
2024-06-08 15:48 ` [PATCH] zap_pid_ns_processes: don't send SIGKILL to sub-threads Oleg Nesterov
2024-06-13 13:01 ` Eric W. Biederman
2024-06-13 15:00 ` Oleg Nesterov
2024-06-13 16:23 ` Eric W. Biederman
2024-07-05 16:08 ` Oleg Nesterov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240609141244.1770896-1-fuweid89@gmail.com \
--to=fuweid89@gmail.com \
--cc=Sudhanva.Huruli@microsoft.com \
--cc=akpm@linux-foundation.org \
--cc=apais@linux.microsoft.com \
--cc=axboe@kernel.dk \
--cc=boqun.feng@gmail.com \
--cc=brauner@kernel.org \
--cc=frederic@kernel.org \
--cc=j.granados@samsung.com \
--cc=jiangshanlai@gmail.com \
--cc=joel@joelfernandes.org \
--cc=josh@joshtriplett.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mathieu.desnoyers@efficios.com \
--cc=michael.christie@oracle.com \
--cc=mjguzik@gmail.com \
--cc=neeraj.upadhyay@kernel.org \
--cc=oleg@redhat.com \
--cc=paulmck@kernel.org \
--cc=qiang.zhang1211@gmail.com \
--cc=rachelmenge@linux.microsoft.com \
--cc=rcu@vger.kernel.org \
--cc=rostedt@goodmis.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox