From: John Heffner
Subject: Re: Fw: [Bug 9189] New: Oops in kernel 2.6.21-rc4 through 2.6.23, page allocation failure
Date: Fri, 19 Oct 2007 10:59:29 -0400
Message-ID: <4718C651.5040804@psc.edu>
References: <20071019073917.1d15fdbb@oldman>
Cc: "David S. Miller", netdev@vger.kernel.org
To: Stephen Hemminger
In-Reply-To: <20071019073917.1d15fdbb@oldman>

Stephen Hemminger wrote:
> Looks like a memory overcommit with small machines??
>
> Begin forwarded message:
>
> Date: Fri, 19 Oct 2007 01:35:33 -0700 (PDT)
> From: bugme-daemon@bugzilla.kernel.org
> To: shemminger@linux-foundation.org
> Subject: [Bug 9189] New: Oops in kernel 2.6.21-rc4 through 2.6.23, page allocation failure

[snip]

> Problem Description: After a recent upgrade to kernel 2.6.23 (from 2.6.20) I have
> started seeing kernel oopses in networking code. The problem is 100%
> reproducible in my environment. I've seen two slightly different backtraces but
> both seem to be caused by the same commit.
>
> I've performed the git bisect and tracked down the problem to the commit:
> 53cdcc04c1e85d4e423b2822b66149b6f2e52c2c [TCP]: Fix tcp_mem[] initialization
>
> Once I reverse this commit in 2.6.23 the problem goes away (this is true also
> for the kernel version generated by git bisect, 2.6.21-rc4).
>
> Backtrace #1:
> page allocation failure. order:1, mode:0x20
> [] __alloc_pages+0x2e1/0x300
> [] cache_alloc_refill+0x29e/0x4b0
> [] __kmalloc+0x6e/0x80
> [] __alloc_skb+0x53/0x110
> [] tcp_collapse+0x1ac/0x370
> [] tcp_prune_queue+0xfd/0x2c0
> [] tcp_data_queue+0x7cd/0xbb0
> [] skb_checksum+0x4d/0x2a0
> [] tcp_rcv_established+0x36e/0x6a0
> [] tcp_v4_do_rcv+0xb4/0x2a0
> [] __alloc_pages+0xd9/0x300
> [] tcp_v4_rcv+0x6a9/0x6c0
> [] ip_local_deliver+0x91/0x110
> [] ip_rcv+0x230/0x3c0
> [] __alloc_skb+0x53/0x110
> [] netif_receive_skb+0x152/0x1e0
> [] process_backlog+0x6f/0xe0
> [] net_rx_action+0x5c/0xf0
> [] __do_softirq+0x42/0x90
> [] do_softirq+0x27/0x30
> [] do_IRQ+0x3d/0x70
> [] sys_gettimeofday+0x28/0x80
> [] common_interrupt+0x23/0x28
> =======================

I'm not surprised that this commit would make a difference in this
situation, since it does change the fraction of memory TCP is allowed to
use. (If it really is too much in this situation, we should tweak the
function.) However, I don't think this is the root cause. Why does it
oops here when the allocation fails?

-John
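Here is the point in userspace terms, for illustration. As far as I can tell, mode:0x20 is GFP_ATOMIC on these kernels. That means this allocation cannot sleep to reclaim memory and is allowed to fail under pressure, so the caller has to treat NULL as a normal outcome. The sketch below is not kernel code. struct buf, buf_alloc(), collapse() and the fail_next_alloc hook are all hypothetical stand-ins for the skb allocation made while collapsing the receive queue.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an skb: a length plus a heap buffer. */
struct buf {
	size_t len;
	char *data;
};

/* Hypothetical test hook: when nonzero, buf_alloc() fails, simulating
 * an atomic allocation that cannot sleep to reclaim memory. */
static int fail_next_alloc;

static struct buf *buf_alloc(size_t len)
{
	struct buf *b;

	if (fail_next_alloc)
		return NULL;
	b = malloc(sizeof(*b));
	if (!b)
		return NULL;
	b->data = malloc(len ? len : 1);	/* avoid malloc(0) ambiguity */
	if (!b->data) {
		free(b);
		return NULL;
	}
	b->len = len;
	return b;
}

static void buf_free(struct buf *b)
{
	if (b) {
		free(b->data);
		free(b);
	}
}

/* Merge n queued buffers into one freshly allocated buffer.
 * On allocation failure, return NULL and leave the queue untouched,
 * so the caller can keep the uncollapsed queue (or drop new data)
 * instead of dereferencing a NULL pointer. */
static struct buf *collapse(struct buf **q, size_t n)
{
	struct buf *merged;
	size_t total = 0, off = 0, i;

	assert(q != NULL || n == 0);
	for (i = 0; i < n; i++)
		total += q[i]->len;

	merged = buf_alloc(total);
	if (!merged)
		return NULL;	/* expected under memory pressure; must not crash */

	for (i = 0; i < n; i++) {
		memcpy(merged->data + off, q[i]->data, q[i]->len);
		off += q[i]->len;
	}
	return merged;
}
```

With fail_next_alloc set, collapse() returns NULL and the original buffers are left as they were. In the kernel, the "page allocation failure" message quoted above is only a warning that such an allocation failed. By itself it is not an oops, which is why the crash points at how the failure is handled rather than at the allocation.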