Date: Sat, 4 Dec 2004 17:43:53 +0100
From: Andrea Arcangeli
To: Voluspa
Cc: linux-kernel@vger.kernel.org
Subject: Re: [PATCH] oom killer (Core)
Message-ID: <20041204164353.GE32635@dualathlon.random>
References: <200412041242.iB4CgsN07246@d1o408.telia.com>
In-Reply-To: <200412041242.iB4CgsN07246@d1o408.telia.com>

On Sat, Dec 04, 2004 at 01:42:54PM +0100, Voluspa wrote:
>
> On 2004-12-04 8:08:40 Andrea Arcangeli wrote:
>
> > You can try to put back a might_slee_if(wait), but if it deadlocks with
> > that change sure it's not a bug in my patch, it's instead a bug
> > somewhere else that calls alloc_pages w/o GFP_ATOMIC. Ingo's
> > lowlatency patch would expose the same bug too since they're aliasing
> > the might_sleep to cond_resched.
>
> Putting it back doesn't alter the outcome - hanging. And the original
> patch, (hope it was the right one) from:
>
> http://marc.theaimsgroup.com/?l=linux-kernel&m=110204117506557&w=2

yes it's the right one ;)

> root:loke:/usr/src/linux-2.6.9-oomkill# patch -Np1 -i ../oomkill.patch
> patching file mm/oom_kill.c
> patching file mm/page_alloc.c
> Hunk #1 succeeded at 608 (offset -3 lines).
> Hunk #3 succeeded at 681 (offset -3 lines).
> patching file mm/swap_state.c
> patching file mm/vmscan.c
>
> has been tried with the following variations. With and without
> optimizing for size, with and without preempt, with and without kernel
> boot params (cfq, lapic), cold and hot starts, and then I threw in a smp
> compile for measure. All have the same behaviour:
>
> [...]
> Checking 'hlt' instruction... OK.
>
> [10 minutes wait. Then a long callback trace
> scrolls off the screen ending like Thomas']
>
> <0>Kernel panic - not syncing: Fatal exception in interrupt
>
> My toolchain (well, the whole software system) is quite contemporary
> within the stable branches. Built from scratch with gcc-3.4.3,
> glibc-20041011 (nptl) and binutils-2.15.92.0.2
>
> No energy control, acpi-pm or whatever it's called, is used here. The
> machine is extremely stable. Running with 100 percent utilization 24/7.
>
> Don't shoot the messenger ;)

I trust you of course, but I have absolutely no idea how my patch could
ever change any code that runs at that point during boot.
mm/oom_kill.c can obviously be ruled out. The changes in
mm/swap_state.c (two printk in show_swap_cache_info) can obviously be
ruled out as well. The change in mm/vmscan.c also only makes a
difference during an oom condition.

This means it has to be the change in mm/page_alloc.c that broke
something. But even that should never run during boot (except for the
cond_resched instead of might_sleep_if, which you already tried to back
out separately from the rest). There's simply not enough memory
pressure at boot to call try_to_free_pages and run the modified code.
If try_to_free_pages is being called during boot then we have a problem
somewhere else, it should never happen!

Plus it works like a charm here. Can you send me your .config so that I
can try to send you privately a kernel image built on my machine? (And
before sending it I'll try to boot it locally ;) My .config sure is
happily running.

Many thanks.