Date: Thu, 1 Jun 2017 11:45:49 -0700
From: Guenter Roeck
To: "Eric W. Biederman"
Cc: Vovo Yang, Ingo Molnar, linux-kernel@vger.kernel.org
Subject: Re: Threads stuck in zap_pid_ns_processes()
Message-ID: <20170601184549.GA28522@roeck-us.net>
In-Reply-To: <87inkfab4l.fsf@xmission.com>

On Thu, Jun 01, 2017 at 12:08:58PM -0500, Eric W. Biederman wrote:
> Guenter Roeck writes:
> >
> > I think you nailed it. If I drop CLONE_NEWPID from the reproducer I get
> > a zombie process.
> >
> > I guess the only question left is if zap_pid_ns_processes() should (or could)
> > somehow detect that situation and return instead of waiting forever.
> > What do you think ?
>
> Any chance you can point me at the chromium code that is performing the
> ptrace?
>
> I want to conduct a review of the kernel semantics to see if the current
> semantics make it unnecessarily easy to get into hang situations. If
> the semantics make it really easy to get into a hang situation I want
> to see if there is anything we can do to delicately change the semantics
> to avoid the hangs without breaking existing userspace.
>

The internal bug should be accessible to you.

https://bugs.chromium.org/p/chromium/issues/detail?id=721298&desc=2

It has some additional information, and points to the following code in Chrome.

https://cs.chromium.org/chromium/src/breakpad/src/client/linux/minidump_writer/linux_ptrace_dumper.cc?rcl=47e51739fd00badbceba5bc26b8abc8bbd530989&l=85

With the information we have, I don't really have a good idea what we could or
should change in Chrome to make the problem disappear, so I just concluded that
we'll have to live with the forever-sleeping task.

Thanks,
Guenter