From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751407Ab3LJUuW (ORCPT ); Tue, 10 Dec 2013 15:50:22 -0500
Received: from mx1.redhat.com ([209.132.183.28]:64209 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1750836Ab3LJUuV (ORCPT ); Tue, 10 Dec 2013 15:50:21 -0500
Date: Tue, 10 Dec 2013 15:49:25 -0500
From: Dave Jones
To: Oleg Nesterov
Cc: Linus Torvalds , Thomas Gleixner , Darren Hart , Andrea Arcangeli ,
	Linux Kernel Mailing List , Peter Zijlstra , Mel Gorman
Subject: Re: process 'stuck' at exit.
Message-ID: <20131210204925.GB27373@redhat.com>
Mail-Followup-To: Dave Jones , Oleg Nesterov , Linus Torvalds ,
	Thomas Gleixner , Darren Hart , Andrea Arcangeli ,
	Linux Kernel Mailing List , Peter Zijlstra , Mel Gorman
References: <20131210154724.GA30020@redhat.com> <20131210203559.GA1209@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20131210203559.GA1209@redhat.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Dec 10, 2013 at 09:35:59PM +0100, Oleg Nesterov wrote:
 > Dave, I must have missed something, help.
 >
 > I am looking at the first message and I can't understand who stuck
 > "at exit".
 >
 > The trace shows that the task with pid=10818 called sys_futex() ?
 >
 > Perhaps "exit" means the userspace paths?

pid 1131 is wait()'ing for 10818 to exit.
pid 1130 is periodically sending SIGKILL to 10818 because it's gotten
tired of waiting. 10818 is ignoring these because it's stuck in a loop
somewhere in the kernel.

I tried attaching to 10818 with gdb, and it just hangs
(possibly because of its weird stack situation [see 1st post]).

By inspecting the shared mapping that all processes have (by gdb'ing 1130),
I can see that 10818 did its full run without incident, and the
"exit child" flag in the fuzzer had been set.

The last 'random syscall' the fuzzer did was sys_accept4, so the
futex call must come from somewhere in libc, maybe?

	Dave
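
For readers unfamiliar with the arrangement described above (one process wait()'ing on a child while another keeps sending it SIGKILL), here is a minimal userspace sketch of that pattern. It is not the fuzzer's actual code: the function names, timeouts, and the choice to merge the waiter and the killer into a single process are all assumptions made for illustration. An ordinary child dies on the first SIGKILL; a child looping inside the kernel, like 10818 here, would make the function give up and return False.

```python
import os
import signal
import time


def kill_until_reaped(pid, grace=0.2, interval=0.1, max_kills=5):
    """Wait for child `pid` (hypothetical helper, not from the fuzzer).

    After `grace` seconds, send SIGKILL every `interval` seconds until the
    child is reaped or `max_kills` signals have been sent.
    Returns (reaped, kills_sent).
    """
    deadline = time.monotonic() + grace
    kills = 0
    while True:
        # WNOHANG: waitpid returns (0, 0) while the child is still running.
        done, _status = os.waitpid(pid, os.WNOHANG)
        if done == pid:
            return True, kills
        if time.monotonic() >= deadline:
            if kills >= max_kills:
                # SIGKILL had no visible effect: the child may be stuck
                # in the kernel and never returning to handle the signal.
                return False, kills
            os.kill(pid, signal.SIGKILL)
            kills += 1
            deadline = time.monotonic() + interval
        time.sleep(0.01)


def spawn_sleeper(seconds):
    """Fork a child that sleeps for `seconds` and then exits (illustrative)."""
    pid = os.fork()
    if pid == 0:
        time.sleep(seconds)
        os._exit(0)
    return pid
```

Calling `kill_until_reaped(spawn_sleeper(30))` should reap the child after at least one SIGKILL, whereas a child that exits on its own within the grace period is reaped with no signal sent.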