From: Guenter Roeck <linux@roeck-us.net>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@kernel.org>,
linux-kernel@vger.kernel.org, Vovo Yang <vovoy@google.com>
Subject: Re: Threads stuck in zap_pid_ns_processes()
Date: Thu, 11 May 2017 13:21:04 -0700
Message-ID: <20170511202104.GA14720@roeck-us.net>
In-Reply-To: <87shkbfggm.fsf@xmission.com>

On Thu, May 11, 2017 at 12:31:21PM -0500, Eric W. Biederman wrote:
> Guenter Roeck <linux@roeck-us.net> writes:
>
> > Hi all,
> >
> > the test program attached below almost always results in one of the child
> > processes being stuck in zap_pid_ns_processes(). When this happens, I can
> > see from test logs that nr_hashed == 2 and init_pids==1, but there is only
> > a single thread left in the pid namespace (the one that is stuck).
> > Traceback from /proc/<pid>/stack is
> >
> > [<ffffffff811c385e>] zap_pid_ns_processes+0x1ee/0x2a0
> > [<ffffffff810c1ba4>] do_exit+0x10d4/0x1330
> > [<ffffffff810c1ee6>] do_group_exit+0x86/0x130
> > [<ffffffff810d4347>] get_signal+0x367/0x8a0
> > [<ffffffff81046e73>] do_signal+0x83/0xb90
> > [<ffffffff81004475>] exit_to_usermode_loop+0x75/0xc0
> > [<ffffffff810055b6>] syscall_return_slowpath+0xc6/0xd0
> > [<ffffffff81ced488>] entry_SYSCALL_64_fastpath+0xab/0xad
> > [<ffffffffffffffff>] 0xffffffffffffffff
> >
> > After 120 seconds, I get the "hung task" message.
> >
> > Example from v4.11:
> >
> > ...
> > [ 3263.379545] INFO: task clone:27910 blocked for more than 120 seconds.
> > [ 3263.379561] Not tainted 4.11.0+ #1
> > [ 3263.379569] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [ 3263.379577] clone D 0 27910 27909 0x00000000
> > [ 3263.379587] Call Trace:
> > [ 3263.379608] __schedule+0x677/0xda0
> > [ 3263.379621] ? pci_mmcfg_check_reserved+0xc0/0xc0
> > [ 3263.379634] ? task_stopped_code+0x70/0x70
> > [ 3263.379643] schedule+0x4d/0xd0
> > [ 3263.379653] zap_pid_ns_processes+0x1ee/0x2a0
> > [ 3263.379659] ? copy_pid_ns+0x4d0/0x4d0
> > [ 3263.379670] do_exit+0x10d4/0x1330
> > ...
> >
> > The problem is seen in all kernels up to v4.11.
> >
> > Any idea what might be going on and how to fix the problem ?
>
> Let me see. Reading the code it looks like we have three tasks
> let's call them main, child1, and child2.
>
> child1 and child2 are started using CLONE_THREAD and are
> thus clones of one another.
>
> child2 exits first but is ptraced by main so is not reaped.
> Further child2 calls do_group_exit forcing child1 to
> exit making for fun races.
>
> A pthread_exit() or syscall(SYS_exit, 0); would skip
> the group exit and make the window larger.
>
> child1 exits next and calls zap_pid_ns_processes and is
> waiting for child2 to be reaped by main.
>
> main is just sitting around doing nothing for 3600 seconds
> not reaping anyone.
>
> I would expect that when main exits everything would be cleaned up
> and the only real issue is that we have a hung task warning.
>
> Does everything cleanup when main exits?
>

As an add-on to my previous mail: I added a function to count
the number of threads in the pid namespace, using next_pidmap().
Even though nr_hashed == 2, only the hanging thread is still
present.

Is there maybe a better way to terminate the wait loop than
with "if (pid_ns->nr_hashed == init_pids)" ?
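
A rough userspace cross-check of the thread count, as a sketch only:
count the entries under /proc/<pid>/task. Note that this looks at a
single thread group as seen from the caller's pid namespace, not at the
whole pid namespace the way the next_pidmap() walk does, and that
count_threads() is a hypothetical helper name, not something from the
kernel or from the test program.

```c
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * Count the entries in /proc/<pid>/task, i.e. the threads of one thread
 * group as visible from the caller's pid namespace.
 * Returns -1 if the directory cannot be opened.
 */
static int count_threads(pid_t pid)
{
	char path[64];
	struct dirent *de;
	DIR *dir;
	int count = 0;

	snprintf(path, sizeof(path), "/proc/%d/task", (int)pid);
	dir = opendir(path);
	if (!dir)
		return -1;

	while ((de = readdir(dir)) != NULL) {
		/* Skip "." and ".."; thread entries are purely numeric. */
		if (isdigit((unsigned char)de->d_name[0]))
			count++;
	}
	closedir(dir);
	return count;
}
```

Running count_threads() against the pid of the stuck task from the
parent namespace would give a second opinion on how many threads of
that group are still around.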

Thanks,
Guenter