From: Guenter Roeck <linux@roeck-us.net>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@kernel.org>,
linux-kernel@vger.kernel.org, Vovo Yang <vovoy@google.com>
Subject: Re: Threads stuck in zap_pid_ns_processes()
Date: Thu, 11 May 2017 13:21:04 -0700
Message-ID: <20170511202104.GA14720@roeck-us.net>
In-Reply-To: <87shkbfggm.fsf@xmission.com>

On Thu, May 11, 2017 at 12:31:21PM -0500, Eric W. Biederman wrote:
> Guenter Roeck <linux@roeck-us.net> writes:
>
> > Hi all,
> >
> > the test program attached below almost always results in one of the child
> > processes being stuck in zap_pid_ns_processes(). When this happens, I can
> > see from test logs that nr_hashed == 2 and init_pids == 1, but there is only
> > a single thread left in the pid namespace (the one that is stuck).
> > Traceback from /proc/<pid>/stack is
> >
> > [<ffffffff811c385e>] zap_pid_ns_processes+0x1ee/0x2a0
> > [<ffffffff810c1ba4>] do_exit+0x10d4/0x1330
> > [<ffffffff810c1ee6>] do_group_exit+0x86/0x130
> > [<ffffffff810d4347>] get_signal+0x367/0x8a0
> > [<ffffffff81046e73>] do_signal+0x83/0xb90
> > [<ffffffff81004475>] exit_to_usermode_loop+0x75/0xc0
> > [<ffffffff810055b6>] syscall_return_slowpath+0xc6/0xd0
> > [<ffffffff81ced488>] entry_SYSCALL_64_fastpath+0xab/0xad
> > [<ffffffffffffffff>] 0xffffffffffffffff
> >
> > After 120 seconds, I get the "hung task" message.
> >
> > Example from v4.11:
> >
> > ...
> > [ 3263.379545] INFO: task clone:27910 blocked for more than 120 seconds.
> > [ 3263.379561] Not tainted 4.11.0+ #1
> > [ 3263.379569] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [ 3263.379577] clone D 0 27910 27909 0x00000000
> > [ 3263.379587] Call Trace:
> > [ 3263.379608] __schedule+0x677/0xda0
> > [ 3263.379621] ? pci_mmcfg_check_reserved+0xc0/0xc0
> > [ 3263.379634] ? task_stopped_code+0x70/0x70
> > [ 3263.379643] schedule+0x4d/0xd0
> > [ 3263.379653] zap_pid_ns_processes+0x1ee/0x2a0
> > [ 3263.379659] ? copy_pid_ns+0x4d0/0x4d0
> > [ 3263.379670] do_exit+0x10d4/0x1330
> > ...
> >
> > The problem is seen in all kernels up to v4.11.
> >
> > Any idea what might be going on and how to fix the problem?
>
> Let me see. Reading the code it looks like we have three tasks
> let's call them main, child1, and child2.
>
> child1 and child2 are started using CLONE_THREAD and are
> thus clones of one another.
>
> child2 exits first but is ptraced by main so is not reaped.
> Further child2 calls do_group_exit forcing child1 to
> exit making for fun races.
>
> A pthread_exit() or syscall(SYS_exit, 0); would skip
> the group exit and make the window larger.
>
> child1 exits next and calls zap_pid_ns_processes and is
> waiting for child2 to be reaped by main.
>
> main is just sitting around doing nothing for 3600 seconds
> not reaping anyone.
>
> I would expect that when main exits everything would be cleaned up
> and the only real issue is that we have a hung task warning.
>
> Does everything clean up when main exits?
>

As an add-on to my previous mail: I added a function to count
the number of threads in the pid namespace, using next_pidmap().
Even though nr_hashed == 2, only the hanging thread is still
present.

Is there maybe a better way to terminate the wait loop than
with "if (pid_ns->nr_hashed == init_pids)"?
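
To make the condition concrete, here is a small userspace model of it.
This is not the kernel code. The struct and helper names are made up for
illustration, and it assumes (as Eric suggests above) that the extra
count in nr_hashed belongs to an exited but not yet reaped thread, and
that reaping it drops the count by one.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two counters seen in the test logs. */
struct pid_ns_model {
	int nr_hashed;	/* pids still hashed, unreaped zombies included (assumption) */
	int init_pids;	/* value the exiting task waits for nr_hashed to reach */
};

/* Same comparison as the quoted exit condition of the wait loop. */
static bool wait_loop_done(const struct pid_ns_model *ns)
{
	return ns->nr_hashed == ns->init_pids;
}

/* Model of a parent or tracer reaping one zombie: one pid gets unhashed. */
static void reap_one(struct pid_ns_model *ns)
{
	assert(ns->nr_hashed > 0);
	ns->nr_hashed--;
}
```

With nr_hashed == 2 and init_pids == 1, wait_loop_done() stays false no
matter how often the waiter wakes up. It only becomes true after
something else calls reap_one(), which in the test case nobody does
while main is sleeping.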

Thanks,
Guenter