From: ebiederm@xmission.com (Eric W. Biederman)
To: Guenter Roeck <linux@roeck-us.net>
Cc: Ingo Molnar <mingo@kernel.org>,
	linux-kernel@vger.kernel.org, Vovo Yang <vovoy@google.com>
Subject: Re: Threads stuck in zap_pid_ns_processes()
Date: Thu, 11 May 2017 12:31:21 -0500
Message-ID: <87shkbfggm.fsf@xmission.com>
In-Reply-To: <20170511171108.GB15063@roeck-us.net> (Guenter Roeck's message of "Thu, 11 May 2017 10:11:08 -0700")

Guenter Roeck <linux@roeck-us.net> writes:

> Hi all,
>
> the test program attached below almost always results in one of the child
> processes being stuck in zap_pid_ns_processes(). When this happens, I can
> see from test logs that nr_hashed == 2 and init_pids==1, but there is only
> a single thread left in the pid namespace (the one that is stuck).
> Traceback from /proc/<pid>/stack is
>
> [<ffffffff811c385e>] zap_pid_ns_processes+0x1ee/0x2a0
> [<ffffffff810c1ba4>] do_exit+0x10d4/0x1330
> [<ffffffff810c1ee6>] do_group_exit+0x86/0x130
> [<ffffffff810d4347>] get_signal+0x367/0x8a0
> [<ffffffff81046e73>] do_signal+0x83/0xb90
> [<ffffffff81004475>] exit_to_usermode_loop+0x75/0xc0
> [<ffffffff810055b6>] syscall_return_slowpath+0xc6/0xd0
> [<ffffffff81ced488>] entry_SYSCALL_64_fastpath+0xab/0xad
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> After 120 seconds, I get the "hung task" message.
>
> Example from v4.11:
>
> ...
> [ 3263.379545] INFO: task clone:27910 blocked for more than 120 seconds.
> [ 3263.379561]       Not tainted 4.11.0+ #1
> [ 3263.379569] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 3263.379577] clone           D    0 27910  27909 0x00000000
> [ 3263.379587] Call Trace:
> [ 3263.379608]  __schedule+0x677/0xda0
> [ 3263.379621]  ? pci_mmcfg_check_reserved+0xc0/0xc0
> [ 3263.379634]  ? task_stopped_code+0x70/0x70
> [ 3263.379643]  schedule+0x4d/0xd0
> [ 3263.379653]  zap_pid_ns_processes+0x1ee/0x2a0
> [ 3263.379659]  ? copy_pid_ns+0x4d0/0x4d0
> [ 3263.379670]  do_exit+0x10d4/0x1330
> ...
>
> The problem is seen in all kernels up to v4.11.
>
> Any idea what might be going on and how to fix the problem ?

Let me see.  Reading the code, it looks like we have three tasks;
let's call them main, child1, and child2.

child1 is started with CLONE_NEWPID, and child2 is started from
child1 using CLONE_THREAD, so the two are threads in the same
thread group.

child2 exits first but is ptraced by main so is not reaped.
       Further, child2 calls do_group_exit, forcing child1 to
       exit, making for fun races.

       A pthread_exit() or syscall(SYS_exit, 0); would skip
       the group exit and make the window larger.

child1 exits next and calls zap_pid_ns_processes and is
       waiting for child2 to be reaped by main.

main is just sitting around doing nothing for 3600 seconds
not reaping anyone.
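
To illustrate the syscall(SYS_exit, 0) variant mentioned above, here is
a minimal sketch of a CLONE_THREAD thread that exits only itself instead
of the whole thread group.  The names exit_thread_only(),
spawn_and_survive() and thread_ran are made up for this example and are
not part of the test program; the clone flags are copied from child1().
It is not a reproducer for the hang, just a demonstration that the
caller keeps running after the thread is gone.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <unistd.h>

#define STACK_SIZE 65536

/* Set by the new thread so the caller knows it has started (hypothetical helper state). */
static volatile int thread_ran;

/* Thread body: issue a raw exit(2), which ends this thread only. */
static int exit_thread_only(void *arg)
{
    (void)arg;
    thread_ran = 1;
    syscall(SYS_exit, 0);   /* thread-only exit, not exit_group(2) */
    return 0;               /* not reached */
}

/*
 * Start a thread with the same flags child1() uses and wait for it to run.
 * Returns 1 if the caller is still alive afterwards, -1 on setup failure.
 */
int spawn_and_survive(void)
{
    int flags = CLONE_FILES | CLONE_FS | CLONE_VM | CLONE_SIGHAND | CLONE_THREAD;
    char *stack = malloc(STACK_SIZE);

    if (stack == NULL)
        return -1;

    thread_ran = 0;
    if (clone(exit_thread_only, stack + STACK_SIZE, flags, NULL) == -1) {
        free(stack);
        return -1;
    }

    /* Crude synchronization for a sketch: poll until the thread has started. */
    while (!thread_ran)
        usleep(1000);
    usleep(100000);         /* give the thread time to finish exiting */

    /* The stack is deliberately leaked; we cannot tell when it is safe to free. */
    return 1;
}
```

Swapping the "return EXIT_SUCCESS;" at the end of child2() for
syscall(SYS_exit, 0) would be the corresponding change in the test
program, under the assumption that the goal is to widen the window.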

I would expect that when main exits everything would be cleaned up
and the only real issue is that we have a hung task warning.
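
For comparison, a rough sketch of what a main that does reap might look
like, instead of sleeping for 3600 seconds.  reap_all() is a
hypothetical helper, not something from the test program, and I have
not checked that it makes the hang go away; it only shows the
waitpid(2) loop a tracing parent would normally run.  __WALL is used so
that threads created with clone() are waited for as well.  Resuming a
stopped tracee with no signal is a simplification.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Reap every child and tracee of the caller.
 * Returns the number of tasks that exited, or -1 on an unexpected error.
 */
int reap_all(void)
{
    int reaped = 0;

    for (;;) {
        int status;
        pid_t pid = waitpid(-1, &status, __WALL);

        if (pid > 0) {
            if (WIFEXITED(status) || WIFSIGNALED(status)) {
                reaped++;                       /* task is gone and reaped */
            } else if (WIFSTOPPED(status)) {
                /* ptrace stop: resume the tracee, dropping the signal. */
                ptrace(PTRACE_CONT, pid, NULL, NULL);
            }
            continue;
        }

        if (errno == EINTR)
            continue;                           /* interrupted, try again */
        if (errno == ECHILD)
            return reaped;                      /* nothing left to wait for */
        return -1;
    }
}
```

In the test program this would be called from main() in place of
sleep(3600), so that child2 does not stay around as an unreaped zombie.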

Does everything get cleaned up when main exits?

Eric


>
> Thanks,
> Guenter
>
> ---
> This test program was kindly provided by Vovo Yang <vovoy@google.com>.
>
> Note that the ptrace() call in child1() is not necessary for the problem
> to be seen, though it seems to make it a bit more likely.

That would appear to just slow things down a smidge, as nothing
substantial happens ptrace-wise until after zap_pid_ns_processes.


> ---
>
> #define _GNU_SOURCE
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
> #include <sys/ptrace.h>
> #include <errno.h>
> #include <string.h>
> #include <sched.h>
>
> #define STACK_SIZE 65536
>
> int child1(void* arg);
> int child2(void* arg);
>
> int main(int argc, char **argv)
> {
>   int child_pid;
>   char* child_stack = malloc(STACK_SIZE);
>   char* stack_top = child_stack + STACK_SIZE;
>   char command[256];
>
>   child_pid = clone(&child1, stack_top, CLONE_NEWPID, NULL);
>   if (child_pid == -1) {
>     printf("parent: clone failed: %s\n", strerror(errno));
>     return EXIT_FAILURE;
>   }
>   printf("parent: child1_pid: %d\n", child_pid);
>
>   sleep(2);
>   printf("child state, if it's D (disk sleep), the child process is hung\n");
>   sprintf(command, "cat /proc/%d/status | grep State:", child_pid);
>   system(command);
>   sleep(3600);
>   return EXIT_SUCCESS;
> }
>
> int child1(void* arg)
> {
>   int flags = CLONE_FILES | CLONE_FS | CLONE_VM | CLONE_SIGHAND | CLONE_THREAD;
>   char* child_stack = malloc(STACK_SIZE);
>   char* stack_top = child_stack + STACK_SIZE;
>   long ret;
>
>   ret = ptrace(PTRACE_TRACEME, 0, NULL, NULL);
>   if (ret == -1) {
>     printf("child1: ptrace failed: %s\n", strerror(errno));
>     return EXIT_FAILURE;
>   }
>
>   ret = clone(&child2, stack_top, flags, NULL);
>   if (ret == -1) {
>     printf("child1: clone failed: %s\n", strerror(errno));
>     return EXIT_FAILURE;
>   }
>   printf("child1: child2 pid: %ld\n", ret);
>
>   sleep(1);
>   printf("child1: end\n");
>   return EXIT_SUCCESS;
> }
>
> int child2(void* arg)
> {
>   long ret = ptrace(PTRACE_TRACEME, 0, NULL, NULL);
>   if (ret == -1) {
>     printf("child2: ptrace failed: %s\n", strerror(errno));
>     return EXIT_FAILURE;
>   }
>
>   printf("child2: end\n");
>   return EXIT_SUCCESS;
> }
