From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1760070AbZE0GDP (ORCPT ); Wed, 27 May 2009 02:03:15 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S1755723AbZE0GDA (ORCPT ); Wed, 27 May 2009 02:03:00 -0400
Received: from smtp02.lnh.mail.rcn.net ([207.172.157.102]:28651 "EHLO
	smtp02.lnh.mail.rcn.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1753406AbZE0GDA (ORCPT );
	Wed, 27 May 2009 02:03:00 -0400
Subject: Re: [2.6.27.24] Kernel coredump to a pipe is failing
From: Paul Smith
Reply-To: paul@mad-scientist.net
To: Andrew Morton
Cc: Andi Kleen , linux-kernel@vger.kernel.org
In-Reply-To: <20090526172935.fad52c49.akpm@linux-foundation.org>
References: <1243355634.29250.331.camel@psmith-ubeta.netezza.com>
	<878wkjobbm.fsf@basil.nowhere.org>
	<20090526160017.98fc62e4.akpm@linux-foundation.org>
	<20090526231428.GK846@one.firstfloor.org>
	<20090526162821.02e11d5b.akpm@linux-foundation.org>
	<20090526234109.GL846@one.firstfloor.org>
	<20090526164532.6c780234.akpm@linux-foundation.org>
	<20090527001104.GN846@one.firstfloor.org>
	<20090526172935.fad52c49.akpm@linux-foundation.org>
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
Organization: GNU's Not Unix!
Date: Wed, 27 May 2009 02:02:53 -0400
Message-Id: <1243404173.7369.158.camel@homebase.localnet>
Mime-Version: 1.0
X-Mailer: Evolution 2.22.3.1
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 2009-05-26 at 17:29 -0700, Andrew Morton wrote:
> On Wed, 27 May 2009 02:11:04 +0200 Andi Kleen wrote:
>
> > > I dunno. Is this true of all linux filesystems in all cases? Maybe.
> >
> > Assuming one of them is not would you rather want to fix that file system
> > or 10 zillion user programs (including the kernel core dumper) that
> > get it wrong? @)
>
> I think that removing one bug is better than adding one.
>
> Many filesystems will return a short write if they hit a memory
> allocation failure, for example. pipe_write() sure will. Retrying
> is appropriate in such a case.

As a mainly userspace guy maybe I'm missing some details for kernel
behavior, but I know I would never write a program that used write(2)
and assumed it would never return a short write. The documentation for
write(2) is very clear that short writes are possible and any reasonably
robust program will handle this. Consider things like NFS filesystems,
etc. where who knows what behavior is found.

I'm more concerned with the loss of the signal mask settings in the core
dump in Andi's patch. This seems to be losing important information.
Andi, why did you prefer that to clearing the pending signal and
retrying the write?

I'm definitely not familiar enough with signal management in the kernel
to know what side-effects there might be from just clearing the pending
flag without doing anything else: I did it that way because
fs/exec.c:do_coredump() does this before it runs the ->core_dump
function.

I wonder whether dump_write() shouldn't be rewritten along the lines of
a normal, robust userspace writer, where we handle EAGAIN and EINTR (can
we ever get these at this level, or do we ever just get ERESTARTSYS?),
short writes, etc.

PS. I have a thought about why this happens for me; I doubt I'm getting
SIGPIPE. In our system it's almost certain that these worker processes
will get a signal (SIGUSR1 or something: I forget exactly which one) if
they are still alive after a few seconds. I suspect that the core dump
takes long enough that this signal is received in the middle of the core
dump.

It may be that this problem hasn't been noticed before because it's
unlikely you'll receive a signal in the middle of dumping core, and if
you do get one every now and then, and get a short core, it's not easily
reproducible.
I left my debugging in the kernel and I get exactly one instance of signal_pending() per process, so having the signal be SIGPIPE seems unlikely.
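
For reference, here is a minimal userspace sketch of the kind of "normal,
robust writer" mentioned above: it retries on EINTR and continues after a
short write. It is only an illustration under stated assumptions, not a
proposed dump_write() replacement. write_all() is a hypothetical helper
name (not an existing libc or kernel function), and the sketch assumes a
blocking file descriptor, so EAGAIN is simply reported as an error rather
than waited on.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * write_all() is a hypothetical helper, named here for illustration only.
 * It keeps calling write(2) until all len bytes have been accepted or a
 * real error occurs. Assumes fd is a blocking descriptor.
 *
 * Returns 0 on success, -1 on error with errno left as set by write(2).
 */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    assert(buf != NULL || len == 0);

    while (len > 0) {
        ssize_t n = write(fd, p, len);

        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted before any data moved: retry */
            return -1;      /* real error; EAGAIN on a non-blocking fd ends up here too */
        }

        /* Possibly a short write: skip what was accepted and go around again. */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

A caller would use write_all(fd, buf, len) in place of a bare write() call
and treat any nonzero return as a failed write. The in-kernel equivalent
would have to deal with signal_pending() and ERESTARTSYS instead of EINTR,
which this userspace sketch does not attempt to model.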