From: Oleg Nesterov <oleg@redhat.com>
To: Wei Fu <fuweid89@gmail.com>
Cc: Sudhanva.Huruli@microsoft.com, akpm@linux-foundation.org,
apais@linux.microsoft.com, axboe@kernel.dk, boqun.feng@gmail.com,
brauner@kernel.org, ebiederm@xmission.com, frederic@kernel.org,
j.granados@samsung.com, jiangshanlai@gmail.com,
joel@joelfernandes.org, josh@joshtriplett.org,
linux-kernel@vger.kernel.org, mathieu.desnoyers@efficios.com,
michael.christie@oracle.com, mjguzik@gmail.com,
neeraj.upadhyay@kernel.org, paulmck@kernel.org,
qiang.zhang1211@gmail.com, rachelmenge@linux.microsoft.com,
rcu@vger.kernel.org, rostedt@goodmis.org, weifu@microsoft.com
Subject: Re: [RCU] zombie task hung in synchronize_rcu_expedited
Date: Thu, 6 Jun 2024 19:28:49 +0200
Message-ID: <20240606172848.GC22450@redhat.com>
In-Reply-To: <20240606154553.53514-1-fuweid89@gmail.com>

Hi Wei, thanks for more info.

On 06/06, Wei Fu wrote:
>
> > Well, due to unfortunate design zap_pid_ns_processes() can hang "forever"
> > if this namespace has a (zombie) task injected from the parent ns, this
> > task should be reaped by its parent.
>
> That zombie task was cloned by pid-1 process in that pid namespace. In my last
> reproduced log, the process tree in that pid namespace looks like
OK,
> ```
> # unshare(CLONE_NEWPID | CLONE_NEWNS)
>
> npm start (pid 2522045)
> |__npm run zombie (pid 2522605)
> |__ sh -c "while true; do echo zombie; sleep 1; done" (pid 2522869)
> ```

Only 3 processes? Is nothing else running? Is the last process, 2522869,
a zombie too?

Could you show your .config? In particular, CONFIG_PREEMPT...

> The `npm start (pid 2522045)` was stuck in kernel_wait4. And its child,
so this is the init task in this namespace,
> `npm run zombie (pid 2522605)`, has two threads. One of them was in D status.
...
> $ sudo cat /proc/2522605/task/*/stack
> [<0>] synchronize_rcu_expedited+0x177/0x1f0
> [<0>] namespace_unlock+0xd6/0x1b0
> [<0>] put_mnt_ns+0x73/0xa0
> [<0>] free_nsproxy+0x1c/0x1b0
> [<0>] switch_task_namespaces+0x5d/0x70
> [<0>] exit_task_namespaces+0x10/0x20
> [<0>] do_exit+0x2ce/0x500
> [<0>] io_sq_thread+0x48e/0x5a0
> [<0>] ret_from_fork+0x3c/0x60
> [<0>] ret_from_fork_asm+0x1b/0x30
so I guess this is the trace of its sub-thread 2522645.
What about the process 2522605? Has it exited too?
> > But zap_pid_ns_processes() shouldn't cause the soft-lockup, it should
> > sleep in kernel_wait4().
>
> I ran `cat /proc/2522045/status` and found that the status kept switching
> between running and sleeping.

OK, this shouldn't happen in this case. So it really looks like
zap_pid_ns_processes() spins in a busy-wait loop because TIF_NOTIFY_SIGNAL
is never cleared. The task can still be reported as sleeping because
do_wait() sets and clears TASK_INTERRUPTIBLE, although the window is small...
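
The suspected busy-wait can be illustrated with a small user-space model.
This is only a sketch, not the kernel code: `model_task`, `model_wait()`,
`run_zap_loop()` and the `FLAG_*` constants are hypothetical names, `-EINTR`
stands in for whatever the kernel-internal wait returns when it is
interrupted, and the model assumes that a set TIF_NOTIFY_SIGNAL makes the
task look "signal pending" to the wait path. Under those assumptions, a
reaping loop that clears only the SIGPENDING flag never gets to sleep.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's per-thread flags. */
enum {
    FLAG_SIGPENDING    = 1u << 0,  /* models TIF_SIGPENDING */
    FLAG_NOTIFY_SIGNAL = 1u << 1,  /* models TIF_NOTIFY_SIGNAL */
};

struct model_task {
    unsigned int flags;
    int live_children;             /* children not yet reaped */
};

/* Assumption of this model: either flag makes the task look signalled. */
static bool model_signal_pending(const struct model_task *t)
{
    return (t->flags & (FLAG_SIGPENDING | FLAG_NOTIFY_SIGNAL)) != 0;
}

/*
 * Model of an interruptible wait for a child:
 *  - no children left        -> -ECHILD
 *  - a "signal" is pending   -> -EINTR at once, without sleeping
 *  - otherwise               -> pretend we slept until one child exited
 */
static int model_wait(struct model_task *t)
{
    if (t->live_children == 0)
        return -ECHILD;
    if (model_signal_pending(t))
        return -EINTR;
    t->live_children--;
    return 0;
}

/*
 * Model of the reaping loop. When clear_notify is false only the
 * SIGPENDING flag is cleared before each wait. *spins counts the waits
 * that returned immediately; max_iters only keeps the model finite.
 */
int run_zap_loop(struct model_task *t, bool clear_notify,
                 int max_iters, int *spins)
{
    int rc, iters = 0;

    assert(max_iters > 0);
    *spins = 0;
    do {
        t->flags &= ~(unsigned int)FLAG_SIGPENDING;
        if (clear_notify)
            t->flags &= ~(unsigned int)FLAG_NOTIFY_SIGNAL;
        rc = model_wait(t);
        if (rc == -EINTR)
            (*spins)++;
    } while (rc != -ECHILD && ++iters < max_iters);

    return rc;
}
```

In this model, a task that starts with `FLAG_NOTIFY_SIGNAL` set and one
live child hits the iteration cap with every wait returning immediately
when `clear_notify` is false, and reaps the child without spinning when it
is true. Whether the real loop behaves this way is exactly what the
questions above are trying to establish.
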
Oleg.