From mboxrd@z Thu Jan 1 00:00:00 1970
From: Konrad Rzeszutek Wilk
Subject: Re: 2.6.38 x86_64 domU null pointer in xennet_alloc_rx_buffers
Date: Tue, 12 Apr 2011 17:06:26 -0400
Message-ID: <20110412210626.GA21002@dumpdata.com>
References: <875AC862-8CFC-4583-8BDC-45ECE189DE53@linode.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <875AC862-8CFC-4583-8BDC-45ECE189DE53@linode.com>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Peter Sandin
Cc: xen-devel@lists.xensource.com
List-Id: xen-devel@lists.xenproject.org

On Tue, Apr 12, 2011 at 11:58:35AM -0400, Peter Sandin wrote:
>
> We've got some 64 bit guests that have been trying to dereference a null
> pointer in xennet_alloc_rx_buffers. We have only been receiving reports of
> this issue since introducing 2.6.38 guest kernels. The only reports that we
> have received of this are on guests that are running 64 bit kernels. These
> reports have come from multiple separate physical machines. One of the
> instances that ran into this issue was repeatedly restarting the nginx web
> server, and failing because port 80 was already in use, however we were
> unable to replicate the issue using this method in a controlled environment.
> Any suggestions on replicating or resolving this issue would be appreciated.
>
> More traces, the .config and kernel binary can be found at:
>
> http://thesandins.net/xen/2.6.38-x86_64/

Nothing in the Xen hypervisor console?

>
> --
>
> BUG: Bad page state in process swapper pfn:5bb31
> page:ffffea000140f2b8 count:-1 mapcount:0 mapping: (null) index:0xffff88005b8bdf80
> page flags: 0x100000000000000()
> BUG: unable to handle kernel NULL pointer dereference at (null)
> IP: [] xennet_alloc_rx_buffers+0xe1/0x2d9

So it looks as if it just does an alloc_page, and alloc_page does a
check_new_page(), which checks the values mentioned above. The one that is
odd is page->_count (it should have been zero, it is -1).

.. which sadly is not getting us closer to trying to reproduce this. But it
looks familiar..

> PGD 7bacb067 PUD 7b930067 PMD 0
> Oops: 0002 [#1] SMP
> last sysfs file: /sys/kernel/uevent_seqnum
> CPU 0
> Modules linked in:
>
> Pid: 0, comm: swapper Not tainted 2.6.38-x86_64-linode17 #1
> RIP: e030:[] [] xennet_alloc_rx_buffers+0xe1/0x2d9
> RSP: e02b:ffff88007ff7fcf0 EFLAGS: 00010202
> RAX: 0000000000000000 RBX: ffff88007bfa85c0 RCX: 0000000000000000
> RDX: ffff88007d36bf00 RSI: ffff88007b309400 RDI: ffff88007b309400
> RBP: ffff88007ff7fd50 R08: 0000000000000000 R09: 000000000007195a
> R10: 0000000000000001 R11: 00000000000006fa R12: ffff88007bfa92b0
> R13: ffff88007bfa8000 R14: 0000000000000001 R15: 00000000000002cd
> FS: 00007f4de5d42760(0000) GS:ffff88007ff7c000(0000) knlGS:0000000000000000
> CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 0000000000000000 CR3: 000000007bb74000 CR4: 0000000000002660
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process swapper (pid: 0, threadinfo ffffffff81a00000, task ffffffff81a9b020)
> Stack:
> ffff88007d36bf00 ffff88007bfa8000 ffff88007d36bf00 ffff88007bfa85c0
> ffff88007ff7fd50 00000017813f46c5 ffff88007d36bf00 ffff88007bfa85c0
> ffff88007ff7fe10 ffff88007bfa8000 0000000000000001 ffff88007bfa85c0
> Call Trace:
>
> [] xennet_poll+0xbef/0xc85
> [] ? _raw_spin_unlock_irqrestore+0x19/0x1c
> [] net_rx_action+0xb6/0x1dc
> [] ? unmask_evtchn+0x1f/0xa3
> [] __do_softirq+0xc7/0x1a3
> [] ? handle_fasteoi_irq+0xd2/0xe1
> [] ? check_events+0x12/0x20
> [] call_softirq+0x1c/0x30
> [] do_softirq+0x41/0x7e
> [] irq_exit+0x36/0x78
> [] xen_evtchn_do_upcall+0x2f/0x3c
> [] xen_do_hypervisor_callback+0x1e/0x30
>
> [] ? hypercall_page+0x3aa/0x1006
> [] ? hypercall_page+0x3aa/0x1006
> [] ? hypercall_page+0x3aa/0x1006
> [] ? xen_safe_halt+0x10/0x1a
> [] ? default_idle+0x4b/0x85
> [] ? cpu_idle+0x60/0x97
> [] ? rest_init+0x6d/0x6f
> [] ? start_kernel+0x37f/0x38a
> [] ? x86_64_start_reservations+0xb8/0xbc
> [] ? xen_start_kernel+0x528/0x52f
> Code: c8 00 00 00 41 ff c6 48 89 44 37 38 8b 82 c4 00 00 00 48 8b b2 c8 00 00 00 66 c7 04 06 01 00 49 8b 44 24 08 4c 89 22 48 89 42 08 <48> 89 10 49 89 54 24 08 ff 83 00 0d 00 00 44 3b 75 cc 0f 8c 5a
> RIP [] xennet_alloc_rx_buffers+0xe1/0x2d9
> RSP
> CR2: 0000000000000000
> ---[ end trace e0e245c8a8426fde ]---
> Kernel panic - not syncing: Fatal exception in interrupt
> Pid: 0, comm: swapper Tainted: G D 2.6.38-x86_64-linode17 #1
> Call Trace:
> [] ? panic+0x8c/0x195
> [] ? oops_end+0xb7/0xc7
> [] ? no_context+0x1f7/0x206
> [] ? get_page_from_freelist+0x445/0x715
> [] ? __bad_area_nosemaphore+0x188/0x1ab
> [] ? tcp_v4_rcv+0x521/0x681
> [] ? bad_area_nosemaphore+0xe/0x10
> [] ? do_page_fault+0x1ef/0x3ee
> [] ? tcp_v4_rcv+0x521/0x681
> [] ? __alloc_pages_nodemask+0x14d/0x6ab
> [] ? __netdev_alloc_skb+0x1d/0x3a
> [] ? page_fault+0x25/0x30
> [] ? xennet_alloc_rx_buffers+0xe1/0x2d9
> [] ? xennet_poll+0xbef/0xc85
> [] ? _raw_spin_unlock_irqrestore+0x19/0x1c
> [] ? net_rx_action+0xb6/0x1dc
> [] ? unmask_evtchn+0x1f/0xa3
> [] ? __do_softirq+0xc7/0x1a3
> [] ? handle_fasteoi_irq+0xd2/0xe1
> [] ? check_events+0x12/0x20
> [] ? call_softirq+0x1c/0x30
> [] ? do_softirq+0x41/0x7e
> [] ? irq_exit+0x36/0x78
> [] ? xen_evtchn_do_upcall+0x2f/0x3c
> [] ? xen_do_hypervisor_callback+0x1e/0x30
> [] ? hypercall_page+0x3aa/0x1006
> [] ? hypercall_page+0x3aa/0x1006
> [] ? hypercall_page+0x3aa/0x1006
> [] ? xen_safe_halt+0x10/0x1a
> [] ? default_idle+0x4b/0x85
> [] ? cpu_idle+0x60/0x97
> [] ? rest_init+0x6d/0x6f
> [] ? start_kernel+0x37f/0x38a
> [] ? x86_64_start_reservations+0xb8/0xbc
> [] ? xen_start_kernel+0x528/0x52f
>
> --Peter
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
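
For reference, the kind of sanity test that check_new_page() applies to a freshly allocated page can be modelled in user space roughly as below. This is only a sketch under assumptions, not the kernel source: struct page_sketch, SKETCH_FLAGS_CHECK, the bad_reason names and both functions are made up for illustration, and the flag mask value is an assumed placeholder rather than the kernel's real mask. It just shows why, of the values printed in the "Bad page state" line, the count of -1 is the one that trips the check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct page; not the kernel layout. */
struct page_sketch {
    unsigned long long flags;
    int count;            /* models page->_count */
    int mapcount;         /* models the "mapcount:" value in the dump */
    const void *mapping;  /* models page->mapping */
};

/* Assumed mask of flag bits that must be clear on a free page.
 * The value is illustrative only; the kernel derives its own mask. */
#define SKETCH_FLAGS_CHECK 0x3fffffULL

/* Hypothetical reason bits, one per field that is checked. */
enum bad_reason {
    BAD_MAPCOUNT = 1 << 0,
    BAD_MAPPING  = 1 << 1,
    BAD_COUNT    = 1 << 2,
    BAD_FLAGS    = 1 << 3,
};

/* Returns 0 for a sane free page, else a bitmask of the fields that failed. */
static int check_new_page_sketch(const struct page_sketch *p)
{
    int bad = 0;

    if (p->mapcount != 0)
        bad |= BAD_MAPCOUNT;
    if (p->mapping != NULL)
        bad |= BAD_MAPPING;
    if (p->count != 0)
        bad |= BAD_COUNT;
    if ((p->flags & SKETCH_FLAGS_CHECK) != 0)
        bad |= BAD_FLAGS;
    return bad;
}

/* Feeds in the values from the "Bad page state" line of the report. */
static void selfcheck_reported_page(void)
{
    struct page_sketch p = {
        .flags = 0x100000000000000ULL,
        .count = -1,
        .mapcount = 0,
        .mapping = NULL,
    };

    /* Only the count check fires for the reported values. */
    assert(check_new_page_sketch(&p) == BAD_COUNT);
}
```

Calling selfcheck_reported_page() from a small main() is enough to try it. With the assumed mask, the reported flags value does not intersect the checked bits, so the sketch agrees with the reading above that the -1 count is the odd one out.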