From: Oleg Nesterov <oleg@redhat.com>
To: Wei Fu <fuweid89@gmail.com>
Cc: Sudhanva.Huruli@microsoft.com, akpm@linux-foundation.org,
	apais@linux.microsoft.com, axboe@kernel.dk, boqun.feng@gmail.com,
	brauner@kernel.org, ebiederm@xmission.com, frederic@kernel.org,
	j.granados@samsung.com, jiangshanlai@gmail.com,
	joel@joelfernandes.org, josh@joshtriplett.org,
	linux-kernel@vger.kernel.org, mathieu.desnoyers@efficios.com,
	michael.christie@oracle.com, mjguzik@gmail.com,
	neeraj.upadhyay@kernel.org, paulmck@kernel.org,
	qiang.zhang1211@gmail.com, rachelmenge@linux.microsoft.com,
	rcu@vger.kernel.org, rostedt@goodmis.org, weifu@microsoft.com
Subject: Re: [RCU] zombie task hung in synchronize_rcu_expedited
Date: Thu, 6 Jun 2024 19:28:49 +0200
Message-ID: <20240606172848.GC22450@redhat.com>
In-Reply-To: <20240606154553.53514-1-fuweid89@gmail.com>

Hi Wei, thanks for more info.

On 06/06, Wei Fu wrote:
>
> > Well, due to unfortunate design zap_pid_ns_processes() can hang "forever"
> > if this namespace has a (zombie) task injected from the parent ns, this
> > task should be reaped by its parent.
>
> That zombie task was cloned by pid-1 process in that pid namespace. In my last
> reproduced log, the process tree in that pid namespace looks like

OK,

> ```
> # unshare(CLONE_NEWPID | CLONE_NEWNS)
>
> npm start (pid 2522045)
>     |__npm run zombie (pid 2522605)
>        |__ sh -c "while true; do echo zombie; sleep 1; done" (pid 2522869)
> ```

Only 3 processes? Nothing is running? Is the last process 2522869 a
zombie too?

Could you show your .config? In particular, CONFIG_PREEMPT...

> The `npm start (pid 2522045)` was stuck in kernel_wait4. And its child,

so this is the init task in this namespace,

> `npm run zombie (pid 2522605)`, has two threads. One of them was in D status.
...
> $ sudo cat /proc/2522605/task/*/stack
> [<0>] synchronize_rcu_expedited+0x177/0x1f0
> [<0>] namespace_unlock+0xd6/0x1b0
> [<0>] put_mnt_ns+0x73/0xa0
> [<0>] free_nsproxy+0x1c/0x1b0
> [<0>] switch_task_namespaces+0x5d/0x70
> [<0>] exit_task_namespaces+0x10/0x20
> [<0>] do_exit+0x2ce/0x500
> [<0>] io_sq_thread+0x48e/0x5a0
> [<0>] ret_from_fork+0x3c/0x60
> [<0>] ret_from_fork_asm+0x1b/0x30

so I guess this is the trace of its sub-thread 2522645.

What about the process 2522605? Has it exited too?

> > But zap_pid_ns_processes() shouldn't cause the soft-lockup, it should
> > sleep in kernel_wait4().
>
> I ran `cat /proc/2522045/status` and found that the state kept switching
> between running and sleeping.

OK, this shouldn't happen in this case. So it really looks like it spins
in a busy-wait loop because TIF_NOTIFY_SIGNAL is not cleared. It can be
reported as sleeping because do_wait() sets/clears TASK_INTERRUPTIBLE,
although the window is small...
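
To illustrate the suspected spin, here is a minimal user-space sketch. It
is not the kernel code: every sim_* name is hypothetical, the flag values
are arbitrary, and the wait is reduced to "refuse to sleep while a signal
looks pending". It assumes signal_pending() reports true when either
TIF_SIGPENDING or TIF_NOTIFY_SIGNAL is set, and that the reaping loop
clears only TIF_SIGPENDING before each wait.

```c
#include <errno.h>
#include <stdbool.h>

/* Stand-ins for the kernel's thread flags; the bit values are arbitrary. */
enum {
	SIM_TIF_SIGPENDING    = 1u << 0,
	SIM_TIF_NOTIFY_SIGNAL = 1u << 1,
};

/* ERESTARTSYS is kernel-internal and not exposed by <errno.h>. */
#define SIM_ERESTARTSYS 512

struct sim_task {
	unsigned int flags;
	int children_left;	/* children still to be reaped */
};

/* Assumed model of signal_pending(): true if either flag is set. */
static bool sim_signal_pending(const struct sim_task *t)
{
	return (t->flags & (SIM_TIF_SIGPENDING | SIM_TIF_NOTIFY_SIGNAL)) != 0;
}

/* Simplified interruptible wait: it never sleeps while a signal looks pending. */
static int sim_wait(struct sim_task *t, int *sleeps)
{
	if (t->children_left == 0)
		return -ECHILD;
	if (sim_signal_pending(t))
		return -SIM_ERESTARTSYS;	/* returns at once, no sleep */
	++*sleeps;
	--t->children_left;		/* pretend a child exited while we slept */
	return 1;			/* hypothetical pid of the reaped child */
}

/*
 * Sketch of the reaping loop. With clear_notify == false only SIGPENDING
 * is cleared, so a stale NOTIFY_SIGNAL makes every wait return immediately.
 * max_iters exists only so that the simulation terminates.
 */
static int sim_zap_loop(struct sim_task *t, bool clear_notify,
			int max_iters, int *sleeps)
{
	int iters = 0;
	int rc;

	do {
		t->flags &= ~SIM_TIF_SIGPENDING;
		if (clear_notify)
			t->flags &= ~SIM_TIF_NOTIFY_SIGNAL;
		rc = sim_wait(t, sleeps);
		++iters;
	} while (rc != -ECHILD && iters < max_iters);

	return iters;
}
```

Starting with both flags set and one child left, sim_zap_loop() with
clear_notify == false runs until max_iters without ever sleeping, which
is the busy-wait pattern described above. With clear_notify == true it
sleeps once and then exits the loop on -ECHILD.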

Oleg.


