[PATCH] zap_pid_ns_processes: clear TIF_NOTIFY_SIGNAL along with TIF_SIGPENDING

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Wei Fu <fuweid89@gmail.com>
To: oleg@redhat.com
Cc: Sudhanva.Huruli@microsoft.com, akpm@linux-foundation.org,
	apais@linux.microsoft.com, axboe@kernel.dk, boqun.feng@gmail.com,
	brauner@kernel.org, frederic@kernel.org, fuweid89@gmail.com,
	j.granados@samsung.com, jiangshanlai@gmail.com,
	joel@joelfernandes.org, josh@joshtriplett.org,
	linux-kernel@vger.kernel.org, mathieu.desnoyers@efficios.com,
	michael.christie@oracle.com, mjguzik@gmail.com,
	neeraj.upadhyay@kernel.org, paulmck@kernel.org,
	qiang.zhang1211@gmail.com, rachelmenge@linux.microsoft.com,
	rcu@vger.kernel.org, rostedt@goodmis.org
Subject: [PATCH] zap_pid_ns_processes: clear TIF_NOTIFY_SIGNAL along with TIF_SIGPENDING
Date: Sun,  9 Jun 2024 22:12:44 +0800	[thread overview]
Message-ID: <20240609141244.1770896-1-fuweid89@gmail.com> (raw)
In-Reply-To: <20240608120616.GB7947@redhat.com>

> kernel_wait4() doesn't sleep and returns -EINTR if there is no
> eligible child and signal_pending() is true.
> 
> That is why zap_pid_ns_processes() clears TIF_SIGPENDING but this is not
> enough, it should also clear TIF_NOTIFY_SIGNAL to make signal_pending()
> return false and avoid a busy-wait loop.
> 
> Fixes: 12db8b690010 ("entry: Add support for TIF_NOTIFY_SIGNAL")
> Reported-by: Rachel Menge <rachelmenge@linux.microsoft.com>
> Closes: https://lore.kernel.org/all/1386cd49-36d0-4a5c-85e9-bc42056a5a38@linux.microsoft.com/
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>

Tested-By: Wei Fu <fuweid89@gmail.com>

This change looks good to me!

I used [rcudeadlock-v1][1] to verify this patch on v5.15.160 for more than 30
hours. The soft lockup didn't show up. If there is no such patch, that
test will trigger soft-lockup in 10 minutes.

```
root@(none):/# uname -a
Linux (none) 5.15.160-dirty #7 SMP Fri Jun 7 15:25:30 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

root@(none):/# ps -ef | grep rcu
root 3 2 0 Jun07 ? 00:00:00 [rcu_gp]
root 4 2 0 Jun07 ? 00:00:00 [rcu_par_gp]
root 11 2 0 Jun07 ? 00:00:00 [rcu_tasks_rude_]
root 12 2 0 Jun07 ? 00:00:00 [rcu_tasks_trace]
root 15 2 0 Jun07 ? 00:03:31 [rcu_sched]
root 145 141 0 Jun07 ? 00:15:29 ./rcudeadlock
root 5372 141 0 13:37 ? 00:00:00 grep rcu

root@(none):/# date
Sun Jun 9 13:37:38 UTC 2024
```

I used [rcudeadlock-v2][2] to verify this patch on v6.10-rc2 for more than 2
hours. The soft lockup didn't show up. If there is no such patch, that
test will trigger soft-lockup in 1 minute.

```
root@(none):/# uname -a
Linux (none) 6.10.0-rc2-dirty #4 SMP Sun Jun 9 11:19:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

root@(none):/# ps -ef | grep rcu
root 4 2 0 11:20 ? 00:00:00 [kworker/R-rcu_g]
root 13 2 0 11:20 ? 00:00:00 [rcu_tasks_rude_kthread]
root 14 2 0 11:20 ? 00:00:00 [rcu_tasks_trace_kthread]
root 16 2 0 11:20 ? 00:00:03 [rcu_sched]
root 17 2 0 11:20 ? 00:00:00 [rcu_exp_par_gp_kthread_worker/0]
root 18 2 0 11:20 ? 00:00:12 [rcu_exp_gp_kthread_worker]
root 117 108 0 11:21 ? 00:01:06 ./rcudeadlock
root 14451 108 0 13:37 ? 00:00:00 grep rcu

root@(none):/# date
Sun Jun 9 13:37:15 UTC 2024
```

It's about data-race during cleanup active iou-wrk-thread. I shares that idea
about how to verify this patch.

> ---
>  kernel/pid_namespace.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
> index dc48fecfa1dc..25f3cf679b35 100644
> --- a/kernel/pid_namespace.c
> +++ b/kernel/pid_namespace.c
> @@ -218,6 +218,7 @@ void zap_pid_ns_processes(struct pid_namespace *pid_ns)
>  	 */
>  	do {
>  		clear_thread_flag(TIF_SIGPENDING);
> +		clear_thread_flag(TIF_NOTIFY_SIGNAL);
>  		rc = kernel_wait4(-1, NULL, __WALL, NULL);
>  	} while (rc != -ECHILD);
>  
> -- 
> 2.25.1.362.g51ebf55
> 
> 
> 

Let's assume that there is new pid namespace unshared by host pid namespace,
named by `PA`. There are two processes in `PA`. The init process is named by
`X` and its child is named by `Y`.

```
unshare(CLONE_NEWPID|CLONE_NEWNS)

	X
	|__ Y
```

The main-thread of process X creates one active iouring worker thread
`iou-wrk-X`. When process X exits, that main-thread of process X wakes up and
set `TIF_NOTIFY_SIGNAL` flag on `iou-wrk-X` thread.

However, if `iou-wrk-X` thread receives signal from main-thread and wakes up,
that thread isn't able to clear `TIF_NOTIFY_SIGNAL` flag. And that `iou-wrk-X`
thread is last thread in process-X and it will carry `TIF_NOTIFY_SIGNAL` flag
to enter `zap_pid_ns_processes`. It can be described by the following comment.

```
== X main-thread ==		== X iou-wrk-X ==	== Y main-thread ==

do_exit
  kill iou-wrk-X thread
  io_uring_files_cancel		io_wq_worker
    set TIF_NOTIFY_SIGNAL on
     iou-wrk-X thread
				do_exit(0)
  exit_task_namespace		

				  exit_task_namespace
  do_task_dead

				  exit_notify
				    forget_original_parent
				      find_child_reaper
				        zap_pid_ns_processes	do_exit
								  exit_task_namespace
								    ...
								    namespace_unlock
								      synchronize_rcu_expedited
```

The `iou-wrk-X` thread kills process-Y which is only one holding the mount
namespace reference. The process-Y will get into `synchronize_rcu_expedited`.

Since kernel doesn't enable preempt and `iou-wrk-X` thread has
`TIF_NOTIFY_SIGNAL` flag, the `iou-wrk-X` thread will get into infinity loop,
which cause soft lockup.

So, in [rcudeadlock-v2][2] test, I create more active iou-wrk- threads in
init process so that there is high chance to have iou-wrk- thread in
`zap_pid_ns_processes` function.

Hope it can help.
Thanks,

Wei

[1]: https://github.com/rlmenge/rcu-soft-lock-issue-repro/blob/662b8e414ff15d75419e2286b8121b7c2049a37c/rcudeadlock.go#L1
[2]: https://github.com/rlmenge/rcu-soft-lock-issue-repro/pull/1

next prev parent reply	other threads:[~2024-06-09 14:13 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-06-05 23:42 [RCU] zombie task hung in synchronize_rcu_expedited Rachel Menge
2024-06-06 11:10 ` Oleg Nesterov
2024-06-06 15:45   ` Wei Fu
2024-06-06 17:28     ` Oleg Nesterov
2024-06-07  3:02       ` Wei Fu
2024-06-07  6:25         ` Oleg Nesterov
2024-06-07 15:04           ` Wei Fu
2024-06-07 21:22             ` Oleg Nesterov
2024-06-08 12:42               ` Oleg Nesterov
2024-06-10  0:07                 ` Wei Fu
2024-06-08 12:06 ` [PATCH] zap_pid_ns_processes: clear TIF_NOTIFY_SIGNAL along with TIF_SIGPENDING Oleg Nesterov
2024-06-08 17:00   ` Boqun Feng
2024-06-09 14:12   ` Wei Fu [this message]
2024-06-12 16:57   ` Jens Axboe
2024-06-13 12:40   ` Eric W. Biederman
2024-06-13 14:02     ` Wei Fu
2024-06-13 14:49       ` Oleg Nesterov
2024-06-13 15:30     ` Oleg Nesterov
2024-06-08 15:48 ` [PATCH] zap_pid_ns_processes: don't send SIGKILL to sub-threads Oleg Nesterov
2024-06-13 13:01   ` Eric W. Biederman
2024-06-13 15:00     ` Oleg Nesterov
2024-06-13 16:23       ` Eric W. Biederman
2024-07-05 16:08       ` Oleg Nesterov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240609141244.1770896-1-fuweid89@gmail.com \
    --to=fuweid89@gmail.com \
    --cc=Sudhanva.Huruli@microsoft.com \
    --cc=akpm@linux-foundation.org \
    --cc=apais@linux.microsoft.com \
    --cc=axboe@kernel.dk \
    --cc=boqun.feng@gmail.com \
    --cc=brauner@kernel.org \
    --cc=frederic@kernel.org \
    --cc=j.granados@samsung.com \
    --cc=jiangshanlai@gmail.com \
    --cc=joel@joelfernandes.org \
    --cc=josh@joshtriplett.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=michael.christie@oracle.com \
    --cc=mjguzik@gmail.com \
    --cc=neeraj.upadhyay@kernel.org \
    --cc=oleg@redhat.com \
    --cc=paulmck@kernel.org \
    --cc=qiang.zhang1211@gmail.com \
    --cc=rachelmenge@linux.microsoft.com \
    --cc=rcu@vger.kernel.org \
    --cc=rostedt@goodmis.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox