* tg3: tg3_stop_block timed out
@ 2006-08-07 22:43 Bernd Schubert
  2006-08-07 23:07 ` Michael Chan
  0 siblings, 1 reply; 10+ messages in thread
From: Bernd Schubert @ 2006-08-07 22:43 UTC (permalink / raw)
To: netdev

Hi,

I have seen a few reports like this, but now Broadcom seems to actively
support tg3, so I decided to send this.

... [many hamilton not responding messages]
[4554928.798000] nfs: server hamilton not responding, still trying
[4554935.319000] nfs: server hamilton not responding, still trying
[4555468.940000] NETDEV WATCHDOG: eth1: transmit timed out
[4555468.940000] tg3: eth1: transmit timed out, resetting
[4555469.044000] tg3: tg3_stop_block timed out, ofs=3400 enable_bit=2
[4555469.147000] tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2
[4555469.251000] tg3: tg3_stop_block timed out, ofs=1400 enable_bit=2
[4555469.354000] tg3: tg3_stop_block timed out, ofs=c00 enable_bit=2
[4555469.433000] tg3: eth1: Link is down.
[4555472.593000] tg3: eth1: Link is up at 1000 Mbps, full duplex.
[4555472.594000] tg3: eth1: Flow control is on for TX and on for RX.
[4555498.016000] nfs: server 129.206.21.200 OK
[4555648.015000] nfs: server 129.206.21.200 OK
... [many ok messages]

This seems to be the first time that something like this has happened; at
least I can't find anything in the previous logs. This is with 2.6.16. Would
it be worth trying a more recent tg3 driver, e.g. from Broadcom (3.58) or
backported from 2.6.17 (3.59)?

Thanks,
Bernd

^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: tg3: tg3_stop_block timed out 2006-08-07 22:43 tg3: tg3_stop_block timed out Bernd Schubert @ 2006-08-07 23:07 ` Michael Chan 2006-08-07 23:24 ` Bernd Schubert 0 siblings, 1 reply; 10+ messages in thread From: Michael Chan @ 2006-08-07 23:07 UTC (permalink / raw) To: Bernd Schubert; +Cc: netdev On Tue, 2006-08-08 at 00:43 +0200, Bernd Schubert wrote: > Hi, > > I have seen a few reports like this, but now broadcom seems to actively > support tg3, so I decided to send this. > > ... [many hamilton not responding messages] > 4554928.798000] nfs: server hamilton not responding, still trying > [4554935.319000] nfs: server hamilton not responding, still trying > [4555468.940000] NETDEV WATCHDOG: eth1: transmit timed out > [4555468.940000] tg3: eth1: transmit timed out, resetting > [4555469.044000] tg3: tg3_stop_block timed out, ofs=3400 enable_bit=2 > [4555469.147000] tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2 > [4555469.251000] tg3: tg3_stop_block timed out, ofs=1400 enable_bit=2 > [4555469.354000] tg3: tg3_stop_block timed out, ofs=c00 enable_bit=2 > [4555469.433000] tg3: eth1: Link is down. > [4555472.593000] tg3: eth1: Link is up at 1000 Mbps, full duplex. > [4555472.594000] tg3: eth1: Flow control is on for TX and on for RX. > [4555498.016000] nfs: server 129.206.21.200 OK > [4555648.015000] nfs: server 129.206.21.200 OK > ... [many ok messages] > I need to know what hardware you're using so please send me the tg3 probing output for eth1 when you load the driver. Do you have TSO enabled? ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: tg3: tg3_stop_block timed out 2006-08-07 23:07 ` Michael Chan @ 2006-08-07 23:24 ` Bernd Schubert 2006-08-07 23:46 ` Michael Chan 0 siblings, 1 reply; 10+ messages in thread From: Bernd Schubert @ 2006-08-07 23:24 UTC (permalink / raw) To: Michael Chan; +Cc: netdev Hi Michael, thanks for your help! On Tuesday 08 August 2006 01:07, Michael Chan wrote: > > ... [many hamilton not responding messages] > > 4554928.798000] nfs: server hamilton not responding, still trying > > [4554935.319000] nfs: server hamilton not responding, still trying > > [4555468.940000] NETDEV WATCHDOG: eth1: transmit timed out > > [4555468.940000] tg3: eth1: transmit timed out, resetting > > [4555469.044000] tg3: tg3_stop_block timed out, ofs=3400 enable_bit=2 > > [4555469.147000] tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2 > > [4555469.251000] tg3: tg3_stop_block timed out, ofs=1400 enable_bit=2 > > [4555469.354000] tg3: tg3_stop_block timed out, ofs=c00 enable_bit=2 > > [4555469.433000] tg3: eth1: Link is down. > > [4555472.593000] tg3: eth1: Link is up at 1000 Mbps, full duplex. > > [4555472.594000] tg3: eth1: Flow control is on for TX and on for RX. > > [4555498.016000] nfs: server 129.206.21.200 OK > > [4555648.015000] nfs: server 129.206.21.200 OK > > ... [many ok messages] > > I need to know what hardware you're using so please send me the tg3 > probing output for eth1 when you load the driver. Do you have TSO > enabled? 
tg3.c:v3.49 (Feb 2, 2006)
acpi_bus-0201 [01] bus_set_power : Device is not power manageable
eth1: Tigon3 [partno(BCM95704A6) rev 2003 PHY(5704)] (PCIX:100MHz:64-bit) 10/100/1000BaseT Ethernet 00:e0:81:2b:aa:28
eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] Split[0] WireSpeed[1] TSOcap[0]
eth1: dma_rwctrl[769f4000] dma_mask[64-bit]
eth2: Tigon3 [partno(BCM95704A6) rev 2003 PHY(5704)] (PCIX:100MHz:64-bit) 10/100/1000BaseT Ethernet 00:e0:81:2b:aa:29
eth2: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] Split[0] WireSpeed[1] TSOcap[1]
eth2: dma_rwctrl[769f4000] dma_mask[64-bit]

The NIC is onboard a Tyan S2882.

0000:02:09.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 03)
        Subsystem: Broadcom Corporation: Unknown device 1644
        Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 24
        Memory at fc8c0000 (64-bit, non-prefetchable) [size=64K]
        Memory at fc8b0000 (64-bit, non-prefetchable) [size=64K]
        Capabilities: [40]
        Capabilities: [48] Power Management version 2
        Capabilities: [50] Vital Product Data
        Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/3 Enable-

0000:02:09.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 03)
        Subsystem: Broadcom Corporation: Unknown device 1644
        Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 25
        Memory at fc8e0000 (64-bit, non-prefetchable) [size=64K]
        Memory at fc8d0000 (64-bit, non-prefetchable) [size=64K]
        Capabilities: [40]
        Capabilities: [48] Power Management version 2
        Capabilities: [50] Vital Product Data
        Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/3 Enable-

The driver is compiled into the kernel (it's an nfs-root booted system, and
NIC modules are presently not supported by our initrd), so the default TSO
setting is in effect. Is there any way to determine the present TSO setting?
With ethtool I only find the options to turn it off or on, but none to query
the current state.

Thanks a lot, Bernd -- Bernd Schubert PCI / Theoretische Chemie Universität Heidelberg INF 229 69120 Heidelberg ^ permalink raw reply [flat|nested] 10+ messages in thread
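On the question of querying the current TSO state: "ethtool -k <dev>" (lowercase k) prints the offload settings, while uppercase "-K" changes them. The sketch below wraps that call and parses the result. The function names and the sample text are hypothetical and only illustrative; the sample was not captured from the systems in this thread, and the exact label spelling is assumed to vary between ethtool versions (spaces in older ones, hyphens in newer ones).

```python
import subprocess


def parse_offloads(text):
    """Parse 'name: on|off' lines (as printed by `ethtool -k`) into {name: bool}."""
    result = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        words = value.split()
        if sep and words and words[0] in ("on", "off"):
            # Normalise spaces to hyphens so both label spellings map to one key.
            key = name.strip().lower().replace(" ", "-")
            result[key] = words[0] == "on"
    return result


def tso_enabled(dev):
    """Return True/False for the TSO state of `dev`, or None if it is not listed."""
    out = subprocess.run(
        ["ethtool", "-k", dev], capture_output=True, text=True, check=True
    ).stdout
    return parse_offloads(out).get("tcp-segmentation-offload")


# Illustrative text only (assumed layout), not output from the reporter's machine.
SAMPLE = """\
Offload parameters for eth1:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: off
"""

if __name__ == "__main__":
    print(parse_offloads(SAMPLE)["tcp-segmentation-offload"])
```

On a live system one would call tso_enabled("eth1") instead of parsing the sample; that needs the ethtool binary and usually sufficient privileges.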
* Re: tg3: tg3_stop_block timed out 2006-08-07 23:24 ` Bernd Schubert @ 2006-08-07 23:46 ` Michael Chan 2006-08-09 14:44 ` Philip Molter 2006-08-09 15:20 ` Bernd Schubert 0 siblings, 2 replies; 10+ messages in thread From: Michael Chan @ 2006-08-07 23:46 UTC (permalink / raw) To: Bernd Schubert; +Cc: netdev On Tue, 2006-08-08 at 01:24 +0200, Bernd Schubert wrote: > > tg3.c:v3.49 (Feb 2, 2006) > acpi_bus-0201 [01] bus_set_power : Device is not power manageable > eth1: Tigon3 [partno(BCM95704A6) rev 2003 PHY(5704)] (PCIX:100MHz:64-bit) 10/100/1000BaseT Ethernet 00:e0:81:2b:aa:28 > eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] Split[0] WireSpeed[1] TSOcap[0] > eth1: dma_rwctrl[769f4000] dma_mask[64-bit] > eth2: Tigon3 [partno(BCM95704A6) rev 2003 PHY(5704)] (PCIX:100MHz:64-bit) 10/100/1000BaseT Ethernet 00:e0:81:2b:aa:29 > eth2: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] Split[0] WireSpeed[1] TSOcap[1] > eth2: dma_rwctrl[769f4000] dma_mask[64-bit] > You have ASF enabled on eth1 but not on eth2 so I wonder if ASF is causing the problem. Can you run the same traffic on eth2 and see if you get the same timeout problem? Thanks. ^ permalink raw reply [flat|nested] 10+ messages in thread
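The eth1/eth2 comparison above can be made mechanical. The following is a minimal sketch, assuming the probe flag line always has the "name[value]" layout quoted in this thread; the helper names are hypothetical and not part of the tg3 driver or any tool.

```python
import re

# Matches "name[value]" pairs such as "ASF[1]" in a tg3 probe line.
FLAG_RE = re.compile(r"(\w+)\[(\d+)\]")


def parse_probe_flags(line):
    """Return (interface, {flag: int}) for a line like 'eth1: RXcsums[1] ... TSOcap[0]'."""
    iface, _, rest = line.partition(":")
    flags = {name: int(value) for name, value in FLAG_RE.findall(rest)}
    return iface.strip(), flags


def diff_flags(a, b):
    """Return {flag: (a_value, b_value)} for flags present in both whose values differ."""
    return {k: (a[k], b[k]) for k in sorted(a.keys() & b.keys()) if a[k] != b[k]}


# Flag lines copied from the probe output quoted in this thread.
ETH1 = "eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] Split[0] WireSpeed[1] TSOcap[0]"
ETH2 = "eth2: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] Split[0] WireSpeed[1] TSOcap[1]"

if __name__ == "__main__":
    _, eth1_flags = parse_probe_flags(ETH1)
    _, eth2_flags = parse_probe_flags(ETH2)
    print(diff_flags(eth1_flags, eth2_flags))
```

For the two lines above, the differing flags are ASF (1 on eth1, 0 on eth2) and TSOcap (0 on eth1, 1 on eth2).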
* Re: tg3: tg3_stop_block timed out 2006-08-07 23:46 ` Michael Chan @ 2006-08-09 14:44 ` Philip Molter 2006-09-03 22:35 ` Philip Molter 2006-08-09 15:20 ` Bernd Schubert 1 sibling, 1 reply; 10+ messages in thread From: Philip Molter @ 2006-08-09 14:44 UTC (permalink / raw) To: Michael Chan; +Cc: Bernd Schubert, netdev Michael Chan wrote: > On Tue, 2006-08-08 at 01:24 +0200, Bernd Schubert wrote: > >> tg3.c:v3.49 (Feb 2, 2006) >> acpi_bus-0201 [01] bus_set_power : Device is not power manageable >> eth1: Tigon3 [partno(BCM95704A6) rev 2003 PHY(5704)] (PCIX:100MHz:64-bit) 10/100/1000BaseT Ethernet 00:e0:81:2b:aa:28 >> eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] Split[0] WireSpeed[1] TSOcap[0] >> eth1: dma_rwctrl[769f4000] dma_mask[64-bit] >> eth2: Tigon3 [partno(BCM95704A6) rev 2003 PHY(5704)] (PCIX:100MHz:64-bit) 10/100/1000BaseT Ethernet 00:e0:81:2b:aa:29 >> eth2: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] Split[0] WireSpeed[1] TSOcap[1] >> eth2: dma_rwctrl[769f4000] dma_mask[64-bit] >> > > You have ASF enabled on eth1 but not on eth2 so I wonder if ASF is > causing the problem. Can you run the same traffic on eth2 and see if > you get the same timeout problem? Thanks. 
I'm also having this same problem: divert: allocating divert_blk for bond0 tg3.c:v3.14 (November 15, 2004) divert: allocating divert_blk for eth0 eth0: Tigon3 [partno(BCM95704A7) rev 2003 PHY(5704)] (PCIX:100MHz:64-bit) 10/100/1000BaseT Ethernet 00:e0:81:2e:82:1a eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] Split[0] WireSpeed[1] TSOcap[1] divert: allocating divert_blk for eth1 eth1: Tigon3 [partno(BCM95704A7) rev 2003 PHY(5704)] (PCIX:100MHz:64-bit) 10/100/1000BaseT Ethernet 00:e0:81:2e:82:1b eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] Split[0] WireSpeed[1] TSOcap[1] divert: freeing divert_blk for bond0 divert: freeing divert_blk for eth0 divert: freeing divert_blk for eth1 02:09.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 03) Subsystem: Broadcom Corporation: Unknown device 1644 Flags: bus master, 66Mhz, medium devsel, latency 64, IRQ 161 Memory at fc8c0000 (64-bit, non-prefetchable) [size=fc8a0000] Memory at fc8b0000 (64-bit, non-prefetchable) [size=64K] Expansion ROM at 00010000 [disabled] Capabilities: [40] PCI-X non-bridge device. Capabilities: [48] Power Management version 2 Capabilities: [50] Vital Product Data Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/3 Enable- 02:09.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 03) Subsystem: Broadcom Corporation: Unknown device 1644 Flags: bus master, 66Mhz, medium devsel, latency 64, IRQ 169 Memory at fc8f0000 (64-bit, non-prefetchable) [size=fc8d0000] Memory at fc8e0000 (64-bit, non-prefetchable) [size=64K] Expansion ROM at 00010000 [disabled] Capabilities: [40] PCI-X non-bridge device. Capabilities: [48] Power Management version 2 Capabilities: [50] Vital Product Data Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/3 Enable- I run these things with jumbo frames and bonding. 
In the case last night, our machine completely locked up because both interfaces stopped working and the channel bond between them went down. These guys are pushing a little over 1Gb/s total traffic between them (500Mb/s each) and one of them will take in about 300Mb/s. Outgoing packets average 20kpkts/s and incoming packets on the one interface average about 45kpkts/s (most incoming traffic is not jumbo). This was on console: tg3: tg3_stop_block timed out, ofs=4800 enable_bit=2 tg3: eth1: transmit timed out, resetting tg3: tg3_stop_block timed out, ofs=3400 enable_bit=2 tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2 tg3: tg3_stop_block timed out, ofs=1800 enable_bit=2 tg3: tg3_stop_block timed out, ofs=4800 enable_bit=2 tg3: eth0: transmit timed out, resetting tg3: tg3_stop_block timed out, ofs=3400 enable_bit=2 tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2 tg3: tg3_stop_block timed out, ofs=1800 enable_bit=2 tg3: tg3_stop_block timed out, ofs=4800 enable_bit=2 tg3: eth1: transmit timed out, resetting tg3: tg3_stop_block timed out, ofs=3400 enable_bit=2 tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2 tg3: tg3_stop_block timed out, ofs=1800 enable_bit=2 tg3: tg3_stop_block timed out, ofs=4800 enable_bit=2 tg3: eth0: transmit timed out, resetting tg3: tg3_stop_block timed out, ofs=3400 enable_bit=2 tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2 tg3: tg3_stop_block timed out, ofs=1800 enable_bit=2 tg3: tg3_stop_block timed out, ofs=4800 enable_bit=2 We tried restarting networking. We tried unloading all network-related modules and reloading them. We eventually had to reboot the box to get networking started again. The kernel is 2.6.10, via FC2 (2.6.10-2.3.legacy). We've also had the problem with the latest FC4 kernel. Any information would be greatly appreciated. Philip ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: tg3: tg3_stop_block timed out
  2006-08-09 14:44 ` Philip Molter
@ 2006-09-03 22:35 ` Philip Molter
  2006-09-04 18:25 ` Michael Chan
  0 siblings, 1 reply; 10+ messages in thread
From: Philip Molter @ 2006-09-03 22:35 UTC (permalink / raw)
To: Philip Molter; +Cc: Michael Chan, Bernd Schubert, netdev

Philip Molter wrote:
> Michael Chan wrote:
>> On Tue, 2006-08-08 at 01:24 +0200, Bernd Schubert wrote:
>>
>>> tg3.c:v3.49 (Feb 2, 2006)
>>> acpi_bus-0201 [01] bus_set_power : Device is not power
>>> manageable
>>> eth1: Tigon3 [partno(BCM95704A6) rev 2003 PHY(5704)]
>>> (PCIX:100MHz:64-bit) 10/100/1000BaseT Ethernet 00:e0:81:2b:aa:28
>>> eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] Split[0] WireSpeed[1]
>>> TSOcap[0]
>>> eth1: dma_rwctrl[769f4000] dma_mask[64-bit]
>>> eth2: Tigon3 [partno(BCM95704A6) rev 2003 PHY(5704)]
>>> (PCIX:100MHz:64-bit) 10/100/1000BaseT Ethernet 00:e0:81:2b:aa:29
>>> eth2: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] Split[0] WireSpeed[1]
>>> TSOcap[1]
>>> eth2: dma_rwctrl[769f4000] dma_mask[64-bit]
>>>
>>
>> You have ASF enabled on eth1 but not on eth2 so I wonder if ASF is
>> causing the problem. Can you run the same traffic on eth2 and see if
>> you get the same timeout problem? Thanks.
>
> I'm also having this same problem:

Is there any additional information that I can give to help get some
more work targeted at this bug? I've been getting this lockup three or
four times a week per server (I have four of them exhibiting this
behavior).

The network setup is fairly complicated and, unfortunately, these are
production machines pushing multi-gigabit traffic loads. We're using
VLANs on top of bonding on top of anywhere from 2 to 6 Broadcom NICs,
but it appears that the problem is unrelated to the bonding and VLANs,
as others are reporting similar problems without those enabled.

Any assistance would be appreciated. I've left the original information
below for reference.

If anyone could even explain what this error means, that would be helpful.
Maybe we can change something to work around it. Philip > divert: allocating divert_blk for bond0 > tg3.c:v3.14 (November 15, 2004) > divert: allocating divert_blk for eth0 > eth0: Tigon3 [partno(BCM95704A7) rev 2003 PHY(5704)] > (PCIX:100MHz:64-bit) 10/100/1000BaseT Ethernet 00:e0:81:2e:82:1a > eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] Split[0] WireSpeed[1] > TSOcap[1] > divert: allocating divert_blk for eth1 > eth1: Tigon3 [partno(BCM95704A7) rev 2003 PHY(5704)] > (PCIX:100MHz:64-bit) 10/100/1000BaseT Ethernet 00:e0:81:2e:82:1b > eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] Split[0] WireSpeed[1] > TSOcap[1] > divert: freeing divert_blk for bond0 > divert: freeing divert_blk for eth0 > divert: freeing divert_blk for eth1 > > 02:09.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 > Gigabit Ethernet (rev 03) > Subsystem: Broadcom Corporation: Unknown device 1644 > Flags: bus master, 66Mhz, medium devsel, latency 64, IRQ 161 > Memory at fc8c0000 (64-bit, non-prefetchable) [size=fc8a0000] > Memory at fc8b0000 (64-bit, non-prefetchable) [size=64K] > Expansion ROM at 00010000 [disabled] > Capabilities: [40] PCI-X non-bridge device. > Capabilities: [48] Power Management version 2 > Capabilities: [50] Vital Product Data > Capabilities: [58] Message Signalled Interrupts: 64bit+ > Queue=0/3 Enable- > > 02:09.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 > Gigabit Ethernet (rev 03) > Subsystem: Broadcom Corporation: Unknown device 1644 > Flags: bus master, 66Mhz, medium devsel, latency 64, IRQ 169 > Memory at fc8f0000 (64-bit, non-prefetchable) [size=fc8d0000] > Memory at fc8e0000 (64-bit, non-prefetchable) [size=64K] > Expansion ROM at 00010000 [disabled] > Capabilities: [40] PCI-X non-bridge device. > Capabilities: [48] Power Management version 2 > Capabilities: [50] Vital Product Data > Capabilities: [58] Message Signalled Interrupts: 64bit+ > Queue=0/3 Enable- > > I run these things with jumbo frames and bonding. 
In the case last > night, our machine completely locked up because both interfaces stopped > working and the channel bond between them went down. These guys are > pushing a little over 1Gb/s total traffic between them (500Mb/s each) > and one of them will take in about 300Mb/s. Outgoing packets average > 20kpkts/s and incoming packets on the one interface average about > 45kpkts/s (most incoming traffic is not jumbo). > > This was on console: > > tg3: tg3_stop_block timed out, ofs=4800 enable_bit=2 > tg3: eth1: transmit timed out, resetting > tg3: tg3_stop_block timed out, ofs=3400 enable_bit=2 > tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2 > tg3: tg3_stop_block timed out, ofs=1800 enable_bit=2 > tg3: tg3_stop_block timed out, ofs=4800 enable_bit=2 > tg3: eth0: transmit timed out, resetting > tg3: tg3_stop_block timed out, ofs=3400 enable_bit=2 > tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2 > tg3: tg3_stop_block timed out, ofs=1800 enable_bit=2 > tg3: tg3_stop_block timed out, ofs=4800 enable_bit=2 > tg3: eth1: transmit timed out, resetting > tg3: tg3_stop_block timed out, ofs=3400 enable_bit=2 > tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2 > tg3: tg3_stop_block timed out, ofs=1800 enable_bit=2 > tg3: tg3_stop_block timed out, ofs=4800 enable_bit=2 > tg3: eth0: transmit timed out, resetting > tg3: tg3_stop_block timed out, ofs=3400 enable_bit=2 > tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2 > tg3: tg3_stop_block timed out, ofs=1800 enable_bit=2 > tg3: tg3_stop_block timed out, ofs=4800 enable_bit=2 > > We tried restarting networking. We tried unloading all network-related > modules and reloading them. We eventually had to reboot the box to get > networking started again. The kernel is 2.6.10, via FC2 > (2.6.10-2.3.legacy). We've also had the problem with the latest FC4 > kernel. > > Any information would be greatly appreciated. 
> > Philip ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: tg3: tg3_stop_block timed out 2006-09-03 22:35 ` Philip Molter @ 2006-09-04 18:25 ` Michael Chan 2006-09-04 21:27 ` Philip Molter 2006-09-12 17:22 ` Philip Molter 0 siblings, 2 replies; 10+ messages in thread From: Michael Chan @ 2006-09-04 18:25 UTC (permalink / raw) To: Philip Molter; +Cc: Bernd Schubert, netdev Philip Molter wrote: > Is there any additional information that I can give to help get some > more work targeted at this bug? I've been getting this > lockup three or > four times a week per server (I have four of them exhibiting > this behavior). > > The network setup is fairly complicated, but unfortunately, these are > production machines pushing multi-gigabit traffic loads. We're using > vlans on top of bonding on top of anywhere from 2-to-6 > broadcomm NICs, > but it appears that the problem is unrelated to the bonding > and vlans, > as others are reporting similar problems without those enabled. > > Any assistance would be appreciated. I've left the original > information > below for reference. Since you're using a rather old version of tg3, I suggest that you upgrade to a newer version first. Your problem is probably different from Bernd Schubert's since he has ASF enabled and you don't. > > If anyone could even explain what this error means, that would be > helpful. Maybe we can change something to work around it. > The stop_block error messages are not too important. The important thing is that you're getting a transmit timeout. It means that the tx queue is getting full because the NIC is no longer getting interrupts. When this condition is detected, the NIC will get reset which should normally bring the NIC back to life. It seems that in your case, it doesn't come back. Do you get these timeouts on both ports at the same time? Please try the latest driver. If you still get the timeouts, I'll need to send you some debug patches to dump the state when these timeouts occur. ^ permalink raw reply [flat|nested] 10+ messages in thread
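Since the transmit timeout line is the one that matters and it names the interface, a captured console or kernel log can be summarised per port. This is a rough sketch, assuming the message formats match the ones quoted in this thread (other driver versions may word them differently); the function name is hypothetical.

```python
import re
from collections import Counter

# Patterns taken from the log lines quoted in this thread.
TX_TIMEOUT_RE = re.compile(r"tg3: (\w+): transmit timed out, resetting")
STOP_BLOCK_RE = re.compile(r"tg3_stop_block timed out, ofs=([0-9a-f]+) enable_bit=(\d+)")


def summarize(log_lines):
    """Count transmit timeouts per interface and stop_block timeouts per register offset."""
    timeouts = Counter()
    offsets = Counter()
    for line in log_lines:
        match = TX_TIMEOUT_RE.search(line)
        if match:
            timeouts[match.group(1)] += 1
            continue
        match = STOP_BLOCK_RE.search(line)
        if match:
            # Offsets are printed in hex without a 0x prefix.
            offsets[int(match.group(1), 16)] += 1
    return timeouts, offsets


# Excerpt from the first report in this thread.
SAMPLE = """\
[4555468.940000] NETDEV WATCHDOG: eth1: transmit timed out
[4555468.940000] tg3: eth1: transmit timed out, resetting
[4555469.044000] tg3: tg3_stop_block timed out, ofs=3400 enable_bit=2
[4555469.147000] tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2
[4555469.251000] tg3: tg3_stop_block timed out, ofs=1400 enable_bit=2
[4555469.354000] tg3: tg3_stop_block timed out, ofs=c00 enable_bit=2
""".splitlines()

if __name__ == "__main__":
    per_iface, per_offset = summarize(SAMPLE)
    for iface, count in sorted(per_iface.items()):
        print(f"{iface}: {count} transmit timeout(s)")
    for ofs, count in sorted(per_offset.items()):
        print(f"ofs={ofs:#x}: {count} stop_block timeout(s)")
```

Run over the alternating eth0/eth1 console output from the second report, the per-interface counts would show whether both ports time out together.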
* Re: tg3: tg3_stop_block timed out 2006-09-04 18:25 ` Michael Chan @ 2006-09-04 21:27 ` Philip Molter 2006-09-12 17:22 ` Philip Molter 1 sibling, 0 replies; 10+ messages in thread From: Philip Molter @ 2006-09-04 21:27 UTC (permalink / raw) To: Michael Chan; +Cc: Bernd Schubert, netdev Michael Chan wrote: > Philip Molter wrote: > >> Is there any additional information that I can give to help get some >> more work targeted at this bug? I've been getting this >> lockup three or >> four times a week per server (I have four of them exhibiting >> this behavior). >> >> The network setup is fairly complicated, but unfortunately, these are >> production machines pushing multi-gigabit traffic loads. We're using >> vlans on top of bonding on top of anywhere from 2-to-6 >> broadcomm NICs, >> but it appears that the problem is unrelated to the bonding >> and vlans, >> as others are reporting similar problems without those enabled. >> >> Any assistance would be appreciated. I've left the original >> information >> below for reference. > > Since you're using a rather old version of tg3, I suggest that you > upgrade to a newer version first. Your problem is probably > different from Bernd Schubert's since he has ASF enabled and you > don't. > >> If anyone could even explain what this error means, that would be >> helpful. Maybe we can change something to work around it. >> > > The stop_block error messages are not too important. The important > thing is that you're getting a transmit timeout. It means that > the tx queue is getting full because the NIC is no longer getting > interrupts. When this condition is detected, the NIC will get reset > which should normally bring the NIC back to life. It seems that > in your case, it doesn't come back. Do you get these timeouts on > both ports at the same time? It's hard to tell. When the error gets logged, it doesn't say which interface it's happening on. The box is locked up by the time we get to it, but I think it's happening on both. 
I've had NICs lock up with queue issues before, but I've never had it lock up a box completely, unresponsive on console even. Normally, network just breaks, and sure, it requires a reboot, but at least we can do a controlled reboot. This only started happening when we moved these NICs to jumbo frames. We've used the exact same hardware in less demanding applications (up to 500Mbits vs. 750+Mbits) with jumbo without issue, but these particular machines, these pushers, only started locking up when we switched to jumbo. > Please try the latest driver. If you still get the timeouts, I'll > need to send you some debug patches to dump the state when these > timeouts occur. Will do. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: tg3: tg3_stop_block timed out 2006-09-04 18:25 ` Michael Chan 2006-09-04 21:27 ` Philip Molter @ 2006-09-12 17:22 ` Philip Molter 1 sibling, 0 replies; 10+ messages in thread From: Philip Molter @ 2006-09-12 17:22 UTC (permalink / raw) To: Michael Chan; +Cc: netdev > Please try the latest driver. If you still get the timeouts, I'll > need to send you some debug patches to dump the state when these > timeouts occur. So far, with the new driver, I haven't seen any of these timeouts. Normally, we'd have seen them by now. If they recur at some point in the future, I'll post a new message. Thanks for your help, Michael. Philip ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: tg3: tg3_stop_block timed out
  2006-08-07 23:46 ` Michael Chan
  2006-08-09 14:44 ` Philip Molter
@ 2006-08-09 15:20 ` Bernd Schubert
  1 sibling, 0 replies; 10+ messages in thread
From: Bernd Schubert @ 2006-08-09 15:20 UTC (permalink / raw)
To: Michael Chan; +Cc: netdev

On Tuesday 08 August 2006 01:46, Michael Chan wrote:
> On Tue, 2006-08-08 at 01:24 +0200, Bernd Schubert wrote:
> > tg3.c:v3.49 (Feb 2, 2006)
> > acpi_bus-0201 [01] bus_set_power : Device is not power manageable
> > eth1: Tigon3 [partno(BCM95704A6) rev 2003 PHY(5704)] (PCIX:100MHz:64-bit)
> > 10/100/1000BaseT Ethernet 00:e0:81:2b:aa:28 eth1: RXcsums[1]
> > LinkChgREG[0] MIirq[0] ASF[1] Split[0] WireSpeed[1] TSOcap[0] eth1:
> > dma_rwctrl[769f4000] dma_mask[64-bit]
> > eth2: Tigon3 [partno(BCM95704A6) rev 2003 PHY(5704)] (PCIX:100MHz:64-bit)
> > 10/100/1000BaseT Ethernet 00:e0:81:2b:aa:29 eth2: RXcsums[1]
> > LinkChgREG[0] MIirq[0] ASF[0] Split[0] WireSpeed[1] TSOcap[1] eth2:
> > dma_rwctrl[769f4000] dma_mask[64-bit]
>
> You have ASF enabled on eth1 but not on eth2 so I wonder if ASF is
> causing the problem. Can you run the same traffic on eth2 and see if
> you get the same timeout problem? Thanks.

Currently I have no physical access to the system, and eth2 is not
connected to our switch. I will connect it and run a test as soon as
possible (sometime this week).

However, I don't think I can easily reproduce it even with eth1: it's the
first time I have noticed those error messages in seven months. That said,
we are experiencing random crashes of the system, and maybe this is the
cause of them, who knows. The system runs its root file system over NFS,
so a network failure will lock up the entire system.

Thanks,
Bernd

^ permalink raw reply [flat|nested] 10+ messages in thread