Date: Thu, 11 May 2017 13:21:04 -0700
From: Guenter Roeck
To: "Eric W. Biederman"
Cc: Ingo Molnar, linux-kernel@vger.kernel.org, Vovo Yang
Subject: Re: Threads stuck in zap_pid_ns_processes()
Message-ID: <20170511202104.GA14720@roeck-us.net>
References: <20170511171108.GB15063@roeck-us.net> <87shkbfggm.fsf@xmission.com>
In-Reply-To: <87shkbfggm.fsf@xmission.com>

On Thu, May 11, 2017 at 12:31:21PM -0500, Eric W. Biederman wrote:
> Guenter Roeck writes:
>
> > Hi all,
> >
> > the test program attached below almost always results in one of the child
> > processes being stuck in zap_pid_ns_processes(). When this happens, I can
> > see from test logs that nr_hashed == 2 and init_pids == 1, but there is only
> > a single thread left in the pid namespace (the one that is stuck).
> >
> > Traceback from /proc//stack is
> >
> > [] zap_pid_ns_processes+0x1ee/0x2a0
> > [] do_exit+0x10d4/0x1330
> > [] do_group_exit+0x86/0x130
> > [] get_signal+0x367/0x8a0
> > [] do_signal+0x83/0xb90
> > [] exit_to_usermode_loop+0x75/0xc0
> > [] syscall_return_slowpath+0xc6/0xd0
> > [] entry_SYSCALL_64_fastpath+0xab/0xad
> > [] 0xffffffffffffffff
> >
> > After 120 seconds, I get the "hung task" message.
> >
> > Example from v4.11:
> >
> > ...
> > [ 3263.379545] INFO: task clone:27910 blocked for more than 120 seconds.
> > [ 3263.379561] Not tainted 4.11.0+ #1
> > [ 3263.379569] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [ 3263.379577] clone D 0 27910 27909 0x00000000
> > [ 3263.379587] Call Trace:
> > [ 3263.379608] __schedule+0x677/0xda0
> > [ 3263.379621] ? pci_mmcfg_check_reserved+0xc0/0xc0
> > [ 3263.379634] ? task_stopped_code+0x70/0x70
> > [ 3263.379643] schedule+0x4d/0xd0
> > [ 3263.379653] zap_pid_ns_processes+0x1ee/0x2a0
> > [ 3263.379659] ? copy_pid_ns+0x4d0/0x4d0
> > [ 3263.379670] do_exit+0x10d4/0x1330
> > ...
> >
> > The problem is seen in all kernels up to v4.11.
> >
> > Any idea what might be going on and how to fix the problem?
>
> Let me see. Reading the code, it looks like we have three tasks;
> let's call them main, child1, and child2.
>
> child1 and child2 are started using CLONE_THREAD and are
> thus threads of the same thread group.
>
> child2 exits first but is ptraced by main, so it is not reaped.
> Further, child2 calls do_group_exit(), forcing child1 to
> exit, which makes for fun races.
>
> A pthread_exit() or syscall(SYS_exit, 0) would skip
> the group exit and make the window larger.
>
> child1 exits next, calls zap_pid_ns_processes(), and is
> waiting for child2 to be reaped by main.
>
> main is just sitting around doing nothing for 3600 seconds,
> not reaping anyone.
>
> I would expect that when main exits everything would be cleaned up
> and the only real issue is that we have a hung task warning.
>
> Does everything clean up when main exits?
>

As an add-on to my previous mail: I added a function to count the number
of threads in the pid namespace, using next_pidmap(). Even though
nr_hashed == 2, only the hanging thread is still present.

Is there maybe a better way to terminate the wait loop than with
"if (pid_ns->nr_hashed == init_pids)"?

Thanks,
Guenter
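Illustration of the distinction Eric draws between a group exit and a single-thread exit: a small userspace sketch, not the test program from the original report and not kernel code. It assumes Linux with glibc, built with -pthread; run_scenario(), worker() and enum exit_mode are hypothetical names. The worker thread either calls pthread_exit(), which ends only that thread, or _exit(), which in glibc issues exit_group(2) and therefore goes through do_group_exit() in the kernel, taking down every thread in the group.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical: how the worker thread leaves. */
enum exit_mode { THREAD_ONLY_EXIT, GROUP_EXIT };

static void *worker(void *arg)
{
    enum exit_mode mode = *(const enum exit_mode *)arg;

    if (mode == GROUP_EXIT)
        _exit(7);        /* exit_group(2): every thread in the group dies */
    pthread_exit(NULL);  /* only this thread exits; the main thread keeps running */
}

/*
 * Runs one scenario in a forked child (so a group exit cannot kill the
 * caller) and returns the child's exit code, or -1 on failure.
 */
static int run_scenario(enum exit_mode mode)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        pthread_t t;
        if (pthread_create(&t, NULL, worker, &mode) != 0)
            _exit(1);
        if (mode == GROUP_EXIT) {
            /* Never returns on its own; only the worker's group exit ends us. */
            for (;;)
                pause();
        }
        pthread_join(t, NULL);
        _exit(3);        /* main thread survived the worker's exit */
    }

    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

run_scenario(THREAD_ONLY_EXIT) should return 3, because the main thread outlives the worker and joins it; run_scenario(GROUP_EXIT) should return 7, because the worker's exit_group() also kills the main thread blocked in pause().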