From: "Doug Bazarnic"
Subject: e1000 tx Unit hang on 2.6.28 and patch that works...
Date: Fri, 9 Jan 2009 14:47:23 -0700
Message-ID: <007e01c972a3$db2301b0$91690510$@net>
X-Mailing-List: linux-kernel@vger.kernel.org

Asus P5K, Intel Q9450, 8 GB RAM, x86_64, running 2.6.28.

The onboard eth0 (atl1) never has a problem. Only eth1 hangs, and that is an Intel e1000 card plugged into a PCI slot. On a couple of machines I got the hangs to stop by using the Intel e1000 driver version 8.0.6 with the patch below. That setup works on the same hardware running 2.6.27 kernels. I can't remember where I downloaded this patch from, but I haven't had an issue with it yet.
[root@web6 SCRIPTS]# cat e1000-8-0-6-stepfix.patch
diff -up e1000-8-0-6base/src/e1000_main.c e1000-8-0-6stepfix/src/e1000_main.c
--- e1000-8-0-6base/src/e1000_main.c    2008-10-29 17:38:38.000000000 -0700
+++ e1000-8-0-6stepfix/src/e1000_main.c 2008-10-29 17:43:52.000000000 -0700
@@ -65,7 +65,7 @@ static char e1000_driver_string[] = "Int
 
 #define DRV_HW_PERF
 
-#define DRV_VERSION "8.0.6" DRV_NAPI DRV_DEBUG DRV_HW_PERF
+#define DRV_VERSION "8.0.6a" DRV_NAPI DRV_DEBUG DRV_HW_PERF
 const char e1000_driver_version[] = DRV_VERSION;
 static const char e1000_copyright[] = "Copyright (c) 1999-2008 Intel Corporation.";
 
@@ -2364,8 +2364,8 @@ link_up:
        if (!netif_carrier_ok(netdev)) {
                for (i = 0 ; i < adapter->num_tx_queues ; i++) {
                        tx_ring = &adapter->tx_ring[i];
-                       tx_pending |= (E1000_DESC_UNUSED(tx_ring) + 1 <
-                                                              tx_ring->count);
+                       tx_pending |= (E1000_DESC_UNUSED(tx_ring) +
+                                     tx_ring->step  < tx_ring->count);
                }
                if (tx_pending) {
                        /* We've lost link, so the controller stops DMA,
@@ -2892,7 +2892,7 @@ static int __e1000_maybe_stop_tx(struct
 
        /* We need to check again in a case another CPU has just
         * made room available. */
-       if (likely(E1000_DESC_UNUSED(tx_ring) < size))
+       if (likely(E1000_DESC_UNUSED(tx_ring) < ((size) * tx_ring->step)))
                return -EBUSY;
 
        /* A reprieve! */
@@ -2909,7 +2909,7 @@ static int __e1000_maybe_stop_tx(struct
 static int e1000_maybe_stop_tx(struct net_device *netdev,
                                struct e1000_tx_ring *tx_ring, int size)
 {
-       if (likely(E1000_DESC_UNUSED(tx_ring) >= size))
+       if (likely(E1000_DESC_UNUSED(tx_ring) >= ((size) * tx_ring->step)))
                return 0;
        return __e1000_maybe_stop_tx(netdev, tx_ring, size);
 }
---------------------------------

This has been an ongoing problem on over 20 identical machines.

Jan  7 13:26:41 web2 kernel: e1000: eth1: e1000_clean_tx_irq: Detected Tx Unit Hang
Jan  7 13:26:41 web2 kernel:   Tx Queue             <0>
Jan  7 13:26:41 web2 kernel:   TDH                  <2a>
Jan  7 13:26:41 web2 kernel:   TDT                  <2d>
Jan  7 13:26:41 web2 kernel:   next_to_use          <2d>
Jan  7 13:26:41 web2 kernel:   next_to_clean        <2a>
Jan  7 13:26:41 web2 kernel: buffer_info[next_to_clean]
Jan  7 13:26:41 web2 kernel:   time_stamp           <103105c76>
Jan  7 13:26:41 web2 kernel:   next_to_watch        <2a>
Jan  7 13:26:41 web2 kernel:   jiffies              <103106770>
Jan  7 13:26:41 web2 kernel:   next_to_watch.status <0>
Jan  7 13:26:43 web2 kernel: e1000: eth1: e1000_clean_tx_irq: Detected Tx Unit Hang
Jan  7 13:26:43 web2 kernel:   Tx Queue             <0>
Jan  7 13:26:43 web2 kernel:   TDH                  <2a>
Jan  7 13:26:43 web2 kernel:   TDT                  <2d>
Jan  7 13:26:43 web2 kernel:   next_to_use          <2d>
Jan  7 13:26:43 web2 kernel:   next_to_clean        <2a>
Jan  7 13:26:43 web2 kernel: buffer_info[next_to_clean]
Jan  7 13:26:43 web2 kernel:   time_stamp           <103105c76>
Jan  7 13:26:43 web2 kernel:   next_to_watch        <2a>
Jan  7 13:26:43 web2 kernel:   jiffies              <103106f41>
Jan  7 13:26:43 web2 kernel:   next_to_watch.status <0>
Jan  7 13:26:45 web2 kernel: e1000: eth1: e1000_clean_tx_irq: Detected Tx Unit Hang
Jan  7 13:26:45 web2 kernel:   Tx Queue             <0>
Jan  7 13:26:45 web2 kernel:   TDH                  <2a>
Jan  7 13:26:45 web2 kernel:   TDT                  <2d>
Jan  7 13:26:45 web2 kernel:   next_to_use          <2d>
Jan  7 13:26:45 web2 kernel:   next_to_clean        <2a>
Jan  7 13:26:45 web2 kernel: buffer_info[next_to_clean]
Jan  7 13:26:45 web2 kernel:   time_stamp           <103105c76>
Jan  7 13:26:45 web2 kernel:   next_to_watch        <2a>
Jan  7 13:26:45 web2 kernel:   jiffies              <103107711>
Jan  7 13:26:45 web2 kernel:   next_to_watch.status <0>
Jan  7 13:26:47 web2 kernel: ------------[ cut here ]------------
Jan  7 13:26:47 web2 kernel: WARNING: at net/sched/sch_generic.c:226 dev_watchdog+0x206/0x220()
Jan  7 13:26:47 web2 kernel: NETDEV WATCHDOG: eth1 (e1000): transmit timed out
Jan  7 13:26:47 web2 kernel: Modules linked in: nfs lockd nfs_acl sunrpc dm_mirror dm_region_hash dm_log dm_multipath dm_mod serio_raw e1000 pata_jmicron pcspkr
Jan  7 13:26:47 web2 kernel: Pid: 0, comm: swapper Not tainted 2.6.28 #1
Jan  7 13:26:47 web2 kernel: Call Trace:
Jan  7 13:26:47 web2 kernel:    [] warn_slowpath+0x10c/0x150
Jan  7 13:26:47 web2 kernel:  [] enqueue_task_fair+0xf1/0x110
Jan  7 13:26:47 web2 kernel:  [] source_load+0x37/0x70
Jan  7 13:26:47 web2 kernel:  [] __next_cpu+0x1a/0x30
Jan  7 13:26:47 web2 kernel:  [] find_busiest_group+0x18c/0x820
Jan  7 13:26:47 web2 kernel:  [] read_tsc+0x9/0x20
Jan  7 13:26:47 web2 kernel:  [] getnstimeofday+0x41/0xc0
Jan  7 13:26:47 web2 kernel:  [] strlcpy+0x4e/0x80
Jan  7 13:26:47 web2 kernel:  [] dev_watchdog+0x206/0x220
Jan  7 13:26:47 web2 kernel:  [] read_tsc+0x9/0x20
Jan  7 13:26:47 web2 kernel:  [] getnstimeofday+0x41/0xc0
Jan  7 13:26:47 web2 kernel:  [] dev_watchdog+0x0/0x220
Jan  7 13:26:47 web2 kernel:  [] run_timer_softirq+0x15f/0x1c0
Jan  7 13:26:47 web2 kernel:  [] lapic_next_event+0x15/0x20
Jan  7 13:26:47 web2 kernel:  [] __do_softirq+0x9c/0x170
Jan  7 13:26:47 web2 kernel:  [] call_softirq+0x1c/0x30
Jan  7 13:26:47 web2 kernel:  [] do_softirq+0x35/0x70
Jan  7 13:26:47 web2 kernel:  [] smp_apic_timer_interrupt+0x85/0xd0
Jan  7 13:26:47 web2 kernel:  [] apic_timer_interrupt+0x6b/0x70
Jan  7 13:26:47 web2 kernel:    [] mwait_idle+0x41/0x50
Jan  7 13:26:47 web2 kernel:  [] cpu_idle+0x3a/0x70
Jan  7 13:26:47 web2 kernel: ---[ end trace 06fe492fa8302a12 ]---
Jan  7 13:26:50 web2 kernel: e1000: eth1: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX

00:00.0 Host bridge: Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller (rev 02)
        Subsystem: ASUSTeK Computer Inc. Unknown device 8276
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- Reset- FastB2B-
        Capabilities: [40] Express Root Port (Slot+) IRQ 0
                Device: Supported: MaxPayload 128 bytes, PhantFunc 0, ExtTag-
                Device: Latency L0s <64ns, L1 <1us
                Device: Errors: Correctable- Non-Fatal- Fatal- Unsupported-
                Device: RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                Device: MaxPayload 128 bytes, MaxReadReq 128 bytes
                Link: Supported Speed 2.5Gb/s, Width x4, ASPM L0s L1, Port 1
                Link: Latency L0s <1us, L1 <4us
                Link: ASPM Disabled RCB 64 bytes CommClk- ExtSynch-
                Link: Speed 2.5Gb/s, Width x0
                Slot: AtnBtn- PwrCtrl- MRL- AtnInd- PwrInd- HotPlug+ Surpise+
                Slot: Number 0, PowerLimit 10.000000
                Slot: Enabled AtnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq-
                Slot: AttnInd Unknown, PwrInd Unknown, Power-
                Root: Correctable- Non-Fatal- Fatal- PME-
        Capabilities: [80] Message Signalled Interrupts: 64bit- Queue=0/0 Enable+
                Address: fee0f00c  Data: 4159
        Capabilities: [90] #0d [0000]
        Capabilities: [a0] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [100] Virtual Channel
        Capabilities: [180] Unknown (5)
00:1c.4 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 5 (rev 02) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- TAbort- Reset- FastB2B-
        Capabilities: [40] Express Root Port (Slot+) IRQ 0
                Device: Supported: MaxPayload 128 bytes, PhantFunc 0, ExtTag-
                Device: Latency L0s <64ns, L1 <1us
                Device: Errors: Correctable- Non-Fatal- Fatal- Unsupported-
                Device: RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                Device: MaxPayload 128 bytes, MaxReadReq 128 bytes
                Link: Supported Speed 2.5Gb/s, Width x1, ASPM L0s L1, Port 5
                Link: Latency L0s <256ns, L1 <4us
                Link: ASPM Disabled RCB 64 bytes CommClk+ ExtSynch-
                Link: Speed 2.5Gb/s, Width x1
                Slot: AtnBtn- PwrCtrl- MRL- AtnInd- PwrInd- HotPlug+ Surpise+
                Slot: Number 0, PowerLimit 10.000000
                Slot: Enabled AtnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq-
                Slot: AttnInd Unknown, PwrInd Unknown, Power-
                Root: Correctable- Non-Fatal- Fatal- PME-
        Capabilities: [80] Message Signalled Interrupts: 64bit- Queue=0/0 Enable+
                Address: fee0f00c  Data: 4161
        Capabilities: [90] #0d [0000]
        Capabilities: [a0] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [100] Virtual Channel
        Capabilities: [180] Unknown (5)
00:1c.5 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 6 (rev 02) (prog-if 00 [Normal decode])
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- TAbort- Reset- FastB2B-
        Capabilities: [40] Express Root Port (Slot+) IRQ 0
                Device: Supported: MaxPayload 128 bytes, PhantFunc 0, ExtTag-
                Device: Latency L0s <64ns, L1 <1us
                Device: Errors: Correctable- Non-Fatal- Fatal- Unsupported-
                Device: RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                Device: MaxPayload 128 bytes, MaxReadReq 128 bytes
                Link: Supported Speed 2.5Gb/s, Width x1, ASPM L0s L1, Port 6
                Link: Latency L0s <256ns, L1 <4us
                Link: ASPM Disabled RCB 64 bytes CommClk+ ExtSynch-
                Link: Speed 2.5Gb/s, Width x1
                Slot: AtnBtn- PwrCtrl- MRL- AtnInd- PwrInd- HotPlug+ Surpise+
                Slot: Number 0, PowerLimit 10.000000
                Slot: Enabled AtnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq-
                Slot: AttnInd Unknown, PwrInd Unknown, Power-
                Root: Correctable- Non-Fatal- Fatal- PME-
        Capabilities: [80] Message Signalled Interrupts: 64bit- Queue=0/0 Enable+
                Address: fee0f00c  Data: 4169
        Capabilities: [90] #0d [0000]
        Capabilities: [a0] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [100] Virtual Channel
        Capabilities: [180] Unknown (5)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 92) (prog-if 01 [Subtractive decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- TAbort- Reset- FastB2B-
        Capabilities: [50] #0d [0000]
00:1f.0 ISA bridge: Intel Corporation 82801IB (ICH9) LPC Interface Controller (rev 02)
        Subsystem: ASUSTeK Computer Inc. Unknown device 8277
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR-