From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S261687AbULBSIo (ORCPT ); Thu, 2 Dec 2004 13:08:44 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S261702AbULBSIo (ORCPT ); Thu, 2 Dec 2004 13:08:44 -0500 Received: from mail-relay-3.tiscali.it ([213.205.33.43]:49820 "EHLO mail-relay-3.tiscali.it") by vger.kernel.org with ESMTP id S261687AbULBSIi (ORCPT ); Thu, 2 Dec 2004 13:08:38 -0500 Date: Thu, 2 Dec 2004 19:08:23 +0100 From: Andrea Arcangeli To: Andrew Morton Cc: tglx@linutronix.de, marcelo.tosatti@cyclades.com, linux-kernel@vger.kernel.org, nickpiggin@yahoo.com.au Subject: Re: [PATCH] oom killer (Core) Message-ID: <20041202180823.GD32635@dualathlon.random> References: <20041201104820.1.patchmail@tglx> <20041201211638.GB4530@dualathlon.random> <1101938767.13353.62.camel@tglx.tec.linutronix.de> <20041202033619.GA32635@dualathlon.random> <1101985759.13353.102.camel@tglx.tec.linutronix.de> <1101995280.13353.124.camel@tglx.tec.linutronix.de> <20041202164725.GB32635@dualathlon.random> <20041202085518.58e0e8eb.akpm@osdl.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20041202085518.58e0e8eb.akpm@osdl.org> X-GPG-Key: 1024D/68B9CB43 13D9 8355 295F 4823 7C49 C012 DFA1 686E 68B9 CB43 X-PGP-Key: 1024R/CB4660B9 CC A0 71 81 F4 A0 63 AC C0 4B 81 1D 8C 15 C8 E5 User-Agent: Mutt/1.5.6i Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Dec 02, 2004 at 08:55:18AM -0800, Andrew Morton wrote: > Andrea Arcangeli wrote: > > > > I believe the thing you're hiding with the callback, is some screwup in > > the VM. It shouldn't fire oom 300 times in a row. > > Well no ;) Could you explain why do we need all_unreclaimable? What is the point of all_unreclaimable if we bypass it at priority = 0? Just to loop a few times (between DEF_PRIORITY and 1) at 100% cpu load? OTOH we must not forget 2.4(-aa) calls do_exit synchronously and it never sends signals. That might be why 2.4 doesn't kill more than one task by mistake, even without a callback-wakeup. So if we keep sending signals I certainly agree with Thomas that using a callback to stop the VM until the task is rescheduled is a good idea, and potentially it may be even the only possible fix when the oom killer is enabled like in 2.6 (though the 300 kills in between SIGKILL and the reschedule sounds like the VM isn't even trying anymore). Otherwise perhaps his workload is spawning enough tasks, that it takes an huge time for the rechedule (that would explain it too). Actually this should fix it too btw: - if (p->flags & PF_MEMDIE) - return 0; Thomas can you try the above? I'd rather fix this by removing buggy code, than by adding additional logics on top of already buggy code (i.e. setting PF_MEMDIE is a smp race and can corrupt other bitflags), but at least the oom-wakeup-callback from do_exit still makes a lot of sense (even if PF_MEMDIE must be dropped since it's buggy, and something else should be used instead). Whatever we change I'd like to change it on top of my last patch that already removes the 5 seconds fixed waits, and does the right watermark checks before killing the next task (Thomas already attempted that with a not accurate nr_free_pages check, so he at least agrees about the need of checking watermarks before firing up the oom killer). BTW, checking for pid == 1 like in Thomas's patch is a must, I advocated it for years but nobody listened yet, hope Thomas will be better at convincing the VM mainline folks than me.