From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S265161AbUEYW05 (ORCPT );
	Tue, 25 May 2004 18:26:57 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S265243AbUEYWYW (ORCPT );
	Tue, 25 May 2004 18:24:22 -0400
Received: from v-2020.easyco.net ([63.209.183.37]:7872 "HELO v-2020.easyco.net")
	by vger.kernel.org with SMTP id S265169AbUEYWV2 (ORCPT );
	Tue, 25 May 2004 18:21:28 -0400
X-VScan: EasyCo VirusScan - clear - uvscan v4.1.60-4362 May 19 2004
Message-ID: <40B3C816.6030802@easyco.com>
Date: Tue, 25 May 2004 15:26:30 -0700
From: Doug Dumitru
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7b) Gecko/20040316
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: linux-kernel@vger.kernel.org, "David S. Miller"
Subject: Re: Hard Hang with __alloc_pages: 0-order allocation failed (gfp=0x20/1) - Not out of memory
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

This is the original trap dump from an __alloc_pages error:

__alloc_pages: 0-order allocation failed (gfp=0x20/1)

This was the first error that took the machine down entirely (it should
be noted that the machine was already in a soft hang, with a load average
above 100, before this error).

This is with "echo 1 > /proc/sys/vm/vm_gfp_debug", and I ran it through
ksymoops to try to decode the addresses (I hope I did this right).

ksymoops 2.4.4 on i686 2.4.25.
Options used
     -V (default)
     -k ksyms.5 (specified)
     -l /proc/modules (default)
     -o /lib/modules/2.4.26/ (specified)
     -m /boot/System.map-2.4.26 (specified)

Warning (expand_objects): object /lib/modules/2.4.26/kernel/drivers/md/lvm-mod.o for module lvm-mod has changed since load
Warning (expand_objects): object /lib/modules/2.4.26/kernel/drivers/md/md.o for module md has changed since load
cc68bad8 c0135289 00000000 011410ac 00000001 0000000c c03689dc 0000 cbccb780 cbccb780 c02d23ba c7c5b838
Call Trace: [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] []
Warning (Oops_read): Code line not seen, dumping what data is available

Trace; c0135289 <__alloc_pages+2d9/2f0>
Trace; c01352b0 <__get_free_pages+10/20>
Trace; c0132214
Trace; c02d23ba
Trace; c01327f1
Trace; c029923f
Trace; c01f0d3c
Trace; c01f0c52
Trace; c0121786
Trace; c01219d9
Trace; c01f05ec
Trace; c010a4de
Trace; c010a6f4
Trace; c0133ce6
Trace; c0134152
Trace; c01341fc
Trace; c0134271
Trace; c0134dff
Trace; c0135169 <__alloc_pages+1b9/2f0>
Trace; c01352b0 <__get_free_pages+10/20>
Trace; c014c203 <__pollwait+33/90>
Trace; c02b765e
Trace; c029634f
Trace; c014c467
Trace; c014c8e9
Trace; c010a72d
Trace; c0108b63

If I am reading this correctly, the system was ...

     in an interrupt processing some TCP select(...) stuff
     asking for a page
     doing a zone rebalance
     trying to shrink cache
     and interrupted again by the ethernet driver
     which wanted to allocate an skb
     which wanted a page

Thus __alloc_pages appears to be called recursively, with the second call
arriving during a rebalance in the first one, and both calls
non-interruptible (on interrupts).

Is this allowable?

--------------------------------------------------------------------
Doug Dumitru 800-470-2756 (610-237-2000)
EasyCo LLC doug@easyco.com http://easyco.com
--------------------------------------------------------------------
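
The reading of the trace above (one __alloc_pages call nested inside another) can be cross-checked mechanically. What follows is only a minimal sketch in Python, assuming the decoded "Trace;" lines have been saved as text in the format shown in this message. The names parse_trace and reentered_symbols are hypothetical helpers written for illustration; they are not part of ksymoops. The sample uses a handful of lines copied from the dump, including one that carries only an address.

```python
import re
from collections import defaultdict

# One decoded frame per line, e.g. "Trace; c0135289 <__alloc_pages+2d9/2f0>".
# The "<symbol+offset/size>" part is optional because some lines in the
# dump carry only an address.
TRACE_RE = re.compile(
    r"^Trace;\s+(?P<addr>[0-9a-f]{8})"
    r"(?:\s+<(?P<sym>[^+>]+)\+(?P<off>[0-9a-f]+)/(?P<size>[0-9a-f]+)>)?\s*$"
)


def parse_trace(text):
    """Return a list of (address, symbol or None, offset or None) tuples,
    in the order the frames appear in the dump."""
    frames = []
    for line in text.splitlines():
        m = TRACE_RE.match(line.strip())
        if m is None:
            continue  # not a "Trace;" line (warnings, stack words, prose)
        off = int(m.group("off"), 16) if m.group("off") else None
        frames.append((int(m.group("addr"), 16), m.group("sym"), off))
    return frames


def reentered_symbols(frames):
    """Map each symbol that appears in more than one frame to the list of
    offsets at which it appears, in dump order."""
    seen = defaultdict(list)
    for _addr, sym, off in frames:
        if sym is not None:
            seen[sym].append(off)
    return {sym: offs for sym, offs in seen.items() if len(offs) > 1}


# A few lines copied from the dump in this message (hypothetical variable name).
SAMPLE = """\
Trace; c0135289 <__alloc_pages+2d9/2f0>
Trace; c01352b0 <__get_free_pages+10/20>
Trace; c0132214
Trace; c0135169 <__alloc_pages+1b9/2f0>
Trace; c01352b0 <__get_free_pages+10/20>
Trace; c014c203 <__pollwait+33/90>
"""

if __name__ == "__main__":
    for sym, offs in reentered_symbols(parse_trace(SAMPLE)).items():
        print(sym, [hex(o) for o in offs])
```

A symbol that shows up twice is only a hint of re-entry, not proof: a stack dump of this kind can also contain stale return addresses left over from earlier calls, so the result still needs to be read against the source.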