From mboxrd@z Thu Jan 1 00:00:00 1970 From: Terry Loftin Subject: [PATCH 0/1][RFC] e1000e: stop cleaning when we reach tx_ring->next_to_use Date: Wed, 31 Mar 2010 14:01:28 -0600 Message-ID: <4BB3AA18.3050301@hp.com> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit To: netdev@vger.kernel.org, e1000-devel@lists.sourceforge.net, Jeff Kirsher , Jesse Brandeburg Return-path: Received: from g1t0027.austin.hp.com ([15.216.28.34]:19012 "EHLO g1t0027.austin.hp.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757566Ab0CaUBb (ORCPT ); Wed, 31 Mar 2010 16:01:31 -0400 Sender: netdev-owner@vger.kernel.org List-ID: During long test runs with heavy network traffic, we have had a number of crashes in e1000e with backtraces like this: BUG: unable to handle kernel NULL pointer dereference at 00000000000000cc IP: [] e1000_clean_tx_irq+0x81/0x2db [e1000e] Pid: 0, comm: swapper Not tainted 2.6.32-4-amd64 #1 ProLiant DL380 G6 RIP: 0010:[] [] e1000_clean_tx_irq+0x81/0x2db [e1000e] RSP: 0018:ffff8800282039a0 EFLAGS: 00010246 RAX: ffff8803259e0000 RBX: 0000000000000046 RCX: 0000000000000000 RDX: ffff8803259e0000 RSI: 0000000000000000 RDI: ffffc90006b20af0 RBP: ffff880028203a10 R08: 0000000000000000 R09: 0000000000003d5c R10: 0000000000000000 R11: 0000000000000010 R12: 0000000000000046 R13: ffff8801a4bc45c0 R14: ffff8801a4b5cd40 R15: 0000000000000046 FS: 0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 00000000000000cc CR3: 0000000001001000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process swapper (pid: 0, threadinfo ffffffff81420000, task ffffffff8145e4b0) Stack: 0000000000000008 0000000000000001 ffff8802fa2d8200 ffff8801a6966880 <0> ffff8800282039d0 ffff8801a4bc4000 0000000000000003 01ffffff00000000 <0> ffff8803259e0000 ffff8801a4bc45c0 ffff8801a4b5cd40 ffff880326cbccc0 Call Trace: [] e1000_intr_msix_tx+0x31/0x53 [e1000e] (eth9 tx) [] handle_IRQ_event+0x61/0x13b [] handle_edge_irq+0xeb/0x130 [] handle_irq+0x1f/0x27 [] do_IRQ+0x5a/0xba [] ret_from_intr+0x0/0x11 [] ? e1000_clean_tx_irq+0x71/0x2db [e1000e] [] ? ret_from_intr+0x0/0x11 [] ? e1000_intr_msix_tx+0x31/0x53 [e1000e] (eth6 tx) [] ? handle_IRQ_event+0x61/0x13b [] ? handle_edge_irq+0xeb/0x130 [] ? handle_irq+0x1f/0x27 [] ? do_IRQ+0x5a/0xba [] ? ret_from_intr+0x0/0x11 [] ? e1000_clean_rx_irq+0x1fb/0x2fb [e1000e] (eth6 rx) [] ? is_swiotlb_buffer+0x2b/0x39 [] ? e1000_clean+0x75/0x22b [e1000e] [] ? net_rx_action+0xb8/0x1e3 [] ? __do_softirq+0xde/0x19f [] ? call_softirq+0x1c/0x28 [] ? do_softirq+0x41/0x81 [] ? irq_exit+0x36/0x75 [] ? do_IRQ+0xa3/0xba [] ? ret_from_intr+0x0/0x11 [] ? acpi_idle_enter_bm+0x2bb/0x2f2 [processor] [] ? acpi_idle_enter_bm+0x2b4/0x2f2 [processor] [] ? cpuidle_idle_call+0x9b/0xf9 [] ? cpu_idle+0x5b/0x93 [] ? rest_init+0x66/0x68 [] ? start_kernel+0x381/0x38c [] ? early_idt_handler+0x0/0x71 [] ? x86_64_start_reservations+0xaa/0xae [] ? x86_64_start_kernel+0xf7/0x106 Typically, we find several nested interrupts. Each interrupt is for a different interface and tx or rx combination as noted above in parenthesis. This problem happens on about 30% of our 4-day CHO runs. The crash occurs in e1000_clean_tx_irq(), on this line: segs = skb_shinfo(skb)->gso_segs ?: 1; because the skb is null (0xcc is the offset of gso_segs). The problem is that we clean the tx_ring until we hit an entry that does not have (eop_desc->upper.data & E1000_TXD_STAT_DD). In other words, we keep cleaning the ring until we find an entry that the hardware hasn't marked as done. The crash always occurs when i >= tx_ring->next_to_use. In the crash above, we set eop to the ring entries next_to_watch index at the bottom of the while loop: index next_to_watch skb descriptor->upper.data 0x46 0x46 null 0 0x47 0x47 null 0 That is, eop = 0x46. By the time we get to the test at the top of the while loop, the ring now looks like this: index next_to_watch skb descriptor->upper.data 0x46 0x47 null E1000_TXD_STAT_DD 0x47 0x47 not-null E1000_TXD_STAT_DD Because descriptor->upper.data now has E1000_TXD_STAT_DD, we assume this entry can be cleaned, and since we're using the old next_to_watch value, we assume it has an skb. Apparently, we've been interrupted long enough handling interrupts from other interfaces that another cpu has had time to call e1000_start_xmit(), queue up more tx's and the hardware has had time to xmit some of them and mark them as E1000_TXD_STAT_DD. I've been able to make this occur much more frequently (within 10 minutes) by inserting a delay loop after we set eop, similar to: if (i == tx_ring->next_to_use) for (j = 0; j < 5000000; j++ ) ; The fix is to just bail out when (i == tx_ring->next_to_use). With the fix and the delay loop, the problem no longer occurred for me. A patch follows. If you find it acceptable, please considerate it. Thanks, -T