public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: ebiederm@xmission.com (Eric W. Biederman)
To: Guenter Roeck <linux@roeck-us.net>
Cc: Vovo Yang <vovoy@google.com>, Ingo Molnar <mingo@kernel.org>,
	linux-kernel@vger.kernel.org
Subject: Re: Threads stuck in zap_pid_ns_processes()
Date: Fri, 12 May 2017 15:03:29 -0500	[thread overview]
Message-ID: <87wp9lvo4u.fsf@xmission.com> (raw)
In-Reply-To: <20170512194304.GE12960@roeck-us.net> (Guenter Roeck's message of "Fri, 12 May 2017 12:43:04 -0700")

Guenter Roeck <linux@roeck-us.net> writes:

> Hi Eric,
>
> On Fri, May 12, 2017 at 12:33:01PM -0500, Eric W. Biederman wrote:
>> Guenter Roeck <linux@roeck-us.net> writes:
>> 
>> > Hi Eric,
>> >
>> > On Fri, May 12, 2017 at 08:26:27AM -0500, Eric W. Biederman wrote:
>> >> Vovo Yang <vovoy@google.com> writes:
>> >> 
>> >> > On Fri, May 12, 2017 at 7:19 AM, Eric W. Biederman
>> >> > <ebiederm@xmission.com> wrote:
>> >> >> Guenter Roeck <linux@roeck-us.net> writes:
>> >> >>
>> >> >>> What I know so far is
>> >> >>> - We see this condition on a regular basis in the field. Regular is
>> >> >>>   relative, of course - let's say maybe 1 in a Milion Chromebooks
>> >> >>>   per day reports a crash because of it. That is not that many,
>> >> >>>   but it adds up.
>> >> >>> - We are able to reproduce the problem with a performance benchmark
>> >> >>>   which opens 100 chrome tabs. While that is a lot, it should not
>> >> >>>   result in a kernel hang/crash.
>> >> >>> - Vovo proviced the test code last night. I don't know if this is
>> >> >>>   exactly what is observed in the benchmark, or how it relates to the
>> >> >>>   benchmark in the first place, but it is the first time we are actually
>> >> >>>   able to reliably create a condition where the problem is seen.
>> >> >>
>> >> >> Thank you.  I will be interesting to hear what is happening in the
>> >> >> chrome perfomance benchmark that triggers this.
>> >> >>
>> >> > What's happening in the benchmark:
>> >> > 1. A chrome renderer process was created with CLONE_NEWPID
>> >> > 2. The process crashed
>> >> > 3. Chrome breakpad service calls ptrace(PTRACE_ATTACH, ..) to attach to every
>> >> >   threads of the crashed process to dump info
>> >> > 4. When breakpad detach the crashed process, the crashed process stuck in
>> >> >   zap_pid_ns_processes()
>> >> 
>> >> Very interesting thank you.
>> >> 
>> >> So the question is specifically which interaction is causing this.
>> >> 
>> >> In the test case provided it was a sibling task in the pid namespace
>> >> dying and not being reaped.  Which may be what is happening with
>> >> breakpad.  So far I have yet to see kernel bug but I won't rule one out.
>> >> 
>> >
>> > I am trying to understand what you are looking for. I would have thought
>> > that both the test application as well as the Chrome functionality
>> > described above show that there are situations where zap_pid_ns_processes()
>> > can get stuck and cause hung task timeouts in conjunction with the use of
>> > ptrace().
>> >
>> > Your last sentence seems to suggest that you believe that the kernel might
>> > do what it is expected to do. Assuming this is the case, what else would
>> > you like to see ? A test application which matches exactly the Chrome use
>> > case ? We can try to provide that, but I don't entirely understand how
>> > that would change the situation. After all, we already know that it is
>> > possible to get a thread into this condition, and we already have one
>> > means to reproduce it.
>> >
>> > Replacing TASK_UNINTERRUPTIBLE with TASK_INTERRUPTABLE works for both the
>> > test application and the Chrome benchmark. The thread is still stuck in
>> > zap_pid_ns_processes(), but it is now in S (sleep) state instead of D,
>> > and no longer results in a hung task timeout. It remains in that state
>> > until the parent process terminates. I am not entirely happy with it
>> > since the processes are still stuck and may pile up over time, but at
>> > least it solves the immediate problem for us.
>> >
>> > Question now is what to do with that solution. We can of course apply
>> > it locally to Chrome OS, but I would rather have it upstream - especially
>> > since we have to assume that any users of Chrome on Linux, or more
>> > generically anyone using ptrace in conjunction with CLONE_NEWPID, may
>> > experience the same problem. Right now I have no idea how to get there,
>> > though. Can you provide some guidance ?
>> 
>> Apologies for not being clear.  I intend to send a pull request with the
>
> No worries.
>
>> the TASK_UINTERRUPTIBLE to TASK_INTERRUPTIBLE change to Linus in the
>> next week or so with a Cc stable and an appropriate Fixes tag.  So the
>> fix can be backported.
>> 
> Great, that is good to know.
>
>> I have a more comprehensive change queued I will probably merge for 4.13
>> already but it just changes what kind of zombies you see.  It won't
>> remove the ``stuck'' zombies.
>> 
>> So what I am looking for now is:
>> Why are things getting stuck in your benchmark?
>> 
>> -  Is it a userspace bug?
>> 
>>   In which case we can figure out what userspace (aka breakpad) needs
>>    to do to avoid the problem.
>>    
> I spent some time trying to understand what breakpad is doing. I don't
> really see how it can be blamed. It attaches itself to a crashing thread,
> gets the information it is looking for, and detaches itself. At the
> same time, whatever it does, it seems to me that userspace should not
> be able to create such a situation in the first place.

I suspect what is happening is that the thread breakpad is attached to
is killed by zap_pid_ns_processes.  At which point PTRACE_DETACH won't
work because the thread has been killed and waitpid(thread_pid,...)
needs to be used instead.

Certainly that is what is happening in the reproducer.

So I suspect breakpad just needs a waitpid whend PTRACE_DETACH fails
or possible just a general loop that listens for SIGCHLD and calls
some variant of wait.

>> -  Is it a kernel bug with ptrace?
>> 
>>    There have been a lot of little subtle bugs with ptrace over the
>>    years so one more would not surprise
>> 
> Good question. I suspect that PTRACE_ATTACH (or PTRACE_TRACEME)
> changes the reaper to be the calling process, and a subsequent
> PTRACE_DETACH or exit of the process calling PTRACE_TRACEME doesn't
> change it back or changes it to the process group owner. Is that
> what happens, and if so is it what should happen ? I don't know (yet).

Not the reaper but ptrace does change the parent to be the tracer.
PTRACE_DETACH won't work if the process is not stopped waiting
for the tracer.  And being killed and being a zombie is the ultimate
not waiting for the tracer state.

What needs to happen is for the tracer (aka breakpad) to call waitpid.

>> So I am just looking to make certain we fix the root issue not just
>> the hung task timeout warning.
>> 
> Makes sense. At least with TASK_INTERRUPTIBLE, I suspect we are just
> fixing the symptom, but on the other side that isn't really my area
> of expertise, so I don't really know.
>
> One additional data point: If I time the threads in the test code by
> adding various calls to sleep(), it is still more or less random if the
> task status ends up in Z or D, meaning the child thread is sometimes
> reaped and sometimes not. I would have expected it to be well defined.
> I may be wrong, but that smells racy to me.

If you have the threads exit with the exit system call rather than
what in the kernel winds up being do_group_exit it won't be racy.
The easy version is to use pthreads from child1 to create child2
and have them exit with pthread_exit.

Alternatively you can wrap call syscall(SYS_exit, exit_code); which
should also do the trick.

> Anyway, I'll spend some more time on this. I'll let you know if I find
> anything.

Thanks.

Eric

  reply	other threads:[~2017-05-12 20:10 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-05-11 17:11 Threads stuck in zap_pid_ns_processes() Guenter Roeck
2017-05-11 17:31 ` Eric W. Biederman
2017-05-11 18:35   ` Guenter Roeck
2017-05-11 20:23     ` Eric W. Biederman
2017-05-11 20:48       ` Guenter Roeck
2017-05-11 21:39         ` Eric W. Biederman
2017-05-11 20:21   ` Guenter Roeck
2017-05-11 21:25     ` Eric W. Biederman
2017-05-11 22:47       ` Guenter Roeck
2017-05-11 23:19         ` Eric W. Biederman
2017-05-12  9:30           ` Vovo Yang
2017-05-12 13:26             ` Eric W. Biederman
2017-05-12 16:52               ` Guenter Roeck
2017-05-12 17:33                 ` Eric W. Biederman
2017-05-12 17:55                   ` [REVIEW][PATCH] pid_ns: Sleep in TASK_INTERRUPTIBLE in zap_pid_ns_processes Eric W. Biederman
2017-05-12 19:33                     ` Guenter Roeck
2017-05-12 19:43                   ` Threads stuck in zap_pid_ns_processes() Guenter Roeck
2017-05-12 20:03                     ` Eric W. Biederman [this message]
2017-05-13 14:34                       ` Guenter Roeck
2017-05-13 18:21                         ` Eric W. Biederman
2017-06-01 17:08                         ` Eric W. Biederman
2017-06-01 18:45                           ` Guenter Roeck
2017-06-01 19:36                             ` Eric W. Biederman
2017-06-01 21:43                               ` Guenter Roeck
2017-06-02  1:06                                 ` Eric W. Biederman
2017-05-12  3:42         ` Eric W. Biederman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87wp9lvo4u.fsf@xmission.com \
    --to=ebiederm@xmission.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@roeck-us.net \
    --cc=mingo@kernel.org \
    --cc=vovoy@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox