netdev.vger.kernel.org archive mirror
* tg3: tg3_stop_block timed out
@ 2006-08-07 22:43 Bernd Schubert
  2006-08-07 23:07 ` Michael Chan
  0 siblings, 1 reply; 10+ messages in thread
From: Bernd Schubert @ 2006-08-07 22:43 UTC (permalink / raw)
  To: netdev

Hi,

I have seen a few reports like this, but now Broadcom seems to actively
support tg3, so I decided to send this.

... [many hamilton not responding messages]
[4554928.798000] nfs: server hamilton not responding, still trying
[4554935.319000] nfs: server hamilton not responding, still trying
[4555468.940000] NETDEV WATCHDOG: eth1: transmit timed out
[4555468.940000] tg3: eth1: transmit timed out, resetting
[4555469.044000] tg3: tg3_stop_block timed out, ofs=3400 enable_bit=2
[4555469.147000] tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2
[4555469.251000] tg3: tg3_stop_block timed out, ofs=1400 enable_bit=2
[4555469.354000] tg3: tg3_stop_block timed out, ofs=c00 enable_bit=2
[4555469.433000] tg3: eth1: Link is down.
[4555472.593000] tg3: eth1: Link is up at 1000 Mbps, full duplex.
[4555472.594000] tg3: eth1: Flow control is on for TX and on for RX.
[4555498.016000] nfs: server 129.206.21.200 OK
[4555648.015000] nfs: server 129.206.21.200 OK
... [many ok messages]

It seems to be the first time that something like this has happened; at least
I can't find anything in the previous logs.

This is with 2.6.16. Would it be worth trying a more recent tg3 driver (e.g.
from Broadcom (3.58) or backported from 2.6.17 (3.59))?


Thanks, 
Bernd






* Re: tg3: tg3_stop_block timed out
  2006-08-07 22:43 tg3: tg3_stop_block timed out Bernd Schubert
@ 2006-08-07 23:07 ` Michael Chan
  2006-08-07 23:24   ` Bernd Schubert
  0 siblings, 1 reply; 10+ messages in thread
From: Michael Chan @ 2006-08-07 23:07 UTC (permalink / raw)
  To: Bernd Schubert; +Cc: netdev

On Tue, 2006-08-08 at 00:43 +0200, Bernd Schubert wrote:
> Hi,
> 
> I have seen a few reports like this, but now broadcom seems to actively
> support tg3, so I decided to send this.
> 
> ... [many hamilton not responding messages]
> [4554928.798000] nfs: server hamilton not responding, still trying
> [4554935.319000] nfs: server hamilton not responding, still trying
> [4555468.940000] NETDEV WATCHDOG: eth1: transmit timed out
> [4555468.940000] tg3: eth1: transmit timed out, resetting
> [4555469.044000] tg3: tg3_stop_block timed out, ofs=3400 enable_bit=2
> [4555469.147000] tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2
> [4555469.251000] tg3: tg3_stop_block timed out, ofs=1400 enable_bit=2
> [4555469.354000] tg3: tg3_stop_block timed out, ofs=c00 enable_bit=2
> [4555469.433000] tg3: eth1: Link is down.
> [4555472.593000] tg3: eth1: Link is up at 1000 Mbps, full duplex.
> [4555472.594000] tg3: eth1: Flow control is on for TX and on for RX.
> [4555498.016000] nfs: server 129.206.21.200 OK
> [4555648.015000] nfs: server 129.206.21.200 OK
> ... [many ok messages]
> 
I need to know what hardware you're using so please send me the tg3
probing output for eth1 when you load the driver. Do you have TSO
enabled?



* Re: tg3: tg3_stop_block timed out
  2006-08-07 23:07 ` Michael Chan
@ 2006-08-07 23:24   ` Bernd Schubert
  2006-08-07 23:46     ` Michael Chan
  0 siblings, 1 reply; 10+ messages in thread
From: Bernd Schubert @ 2006-08-07 23:24 UTC (permalink / raw)
  To: Michael Chan; +Cc: netdev

Hi Michael,

thanks for your help!

On Tuesday 08 August 2006 01:07, Michael Chan wrote:
> > ... [many hamilton not responding messages]
> > [4554928.798000] nfs: server hamilton not responding, still trying
> > [4554935.319000] nfs: server hamilton not responding, still trying
> > [4555468.940000] NETDEV WATCHDOG: eth1: transmit timed out
> > [4555468.940000] tg3: eth1: transmit timed out, resetting
> > [4555469.044000] tg3: tg3_stop_block timed out, ofs=3400 enable_bit=2
> > [4555469.147000] tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2
> > [4555469.251000] tg3: tg3_stop_block timed out, ofs=1400 enable_bit=2
> > [4555469.354000] tg3: tg3_stop_block timed out, ofs=c00 enable_bit=2
> > [4555469.433000] tg3: eth1: Link is down.
> > [4555472.593000] tg3: eth1: Link is up at 1000 Mbps, full duplex.
> > [4555472.594000] tg3: eth1: Flow control is on for TX and on for RX.
> > [4555498.016000] nfs: server 129.206.21.200 OK
> > [4555648.015000] nfs: server 129.206.21.200 OK
> > ... [many ok messages]
>
> I need to know what hardware you're using so please send me the tg3
> probing output for eth1 when you load the driver. Do you have TSO
> enabled?


tg3.c:v3.49 (Feb 2, 2006)
acpi_bus-0201 [01] bus_set_power         : Device is not power manageable
eth1: Tigon3 [partno(BCM95704A6) rev 2003 PHY(5704)] (PCIX:100MHz:64-bit) 10/100/1000BaseT Ethernet 00:e0:81:2b:aa:28
eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] Split[0] WireSpeed[1] TSOcap[0]
eth1: dma_rwctrl[769f4000] dma_mask[64-bit]
eth2: Tigon3 [partno(BCM95704A6) rev 2003 PHY(5704)] (PCIX:100MHz:64-bit) 10/100/1000BaseT Ethernet 00:e0:81:2b:aa:29
eth2: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] Split[0] WireSpeed[1] TSOcap[1]
eth2: dma_rwctrl[769f4000] dma_mask[64-bit]

The NIC is onboard a Tyan S2882. 

0000:02:09.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 03)
        Subsystem: Broadcom Corporation: Unknown device 1644
        Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 24
        Memory at fc8c0000 (64-bit, non-prefetchable) [size=64K]
        Memory at fc8b0000 (64-bit, non-prefetchable) [size=64K]
        Capabilities: [40]
        Capabilities: [48] Power Management version 2
        Capabilities: [50] Vital Product Data
        Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/3 Enable-

0000:02:09.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 03)
        Subsystem: Broadcom Corporation: Unknown device 1644
        Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 25
        Memory at fc8e0000 (64-bit, non-prefetchable) [size=64K]
        Memory at fc8d0000 (64-bit, non-prefetchable) [size=64K]
        Capabilities: [40]
        Capabilities: [48] Power Management version 2
        Capabilities: [50] Vital Product Data
        Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/3 Enable-


The driver is compiled into the kernel (it's an NFS-root booted system and
NIC modules are presently not supported by our initrd),
so TSO is at its default setting. Is there any way to determine the
current TSO setting? With ethtool I only find the options to turn it off/on,
but none to query the current state.
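
On the query question: ethtool's lower-case `-k` option prints the current
offload settings, while upper-case `-K` changes them. A minimal sketch of
reading the TSO state from that output follows. The helper names are
hypothetical, and the sample text only illustrates the shape of the output; it
was not captured from the system described here.

```python
import subprocess


def parse_offload_settings(text):
    """Parse 'name: value' lines from `ethtool -k` output into a dict."""
    settings = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and value.strip():
            # Normalise "tcp segmentation offload" and "tcp-segmentation-offload"
            # to one key, and drop trailing notes such as "[fixed]".
            key = name.strip().lower().replace(" ", "-")
            settings[key] = value.strip().split()[0]
    return settings


def tso_enabled(iface):
    """Run `ethtool -k IFACE`; return True/False, or None if TSO is not listed."""
    out = subprocess.run(["ethtool", "-k", iface], capture_output=True,
                         text=True, check=True).stdout
    state = parse_offload_settings(out).get("tcp-segmentation-offload")
    return None if state is None else state == "on"


# Illustrative sample only (assumed shape of the output, not real data).
SAMPLE = """Offload parameters for eth1:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: off
"""

print(parse_offload_settings(SAMPLE)["tcp-segmentation-offload"])
```

On a live system, `tso_enabled("eth1")` would run the command itself; it needs
ethtool installed and may require root.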


Thanks a lot,
Bernd


-- 
Bernd Schubert
PCI / Theoretische Chemie
Universität Heidelberg
INF 229
69120 Heidelberg



* Re: tg3: tg3_stop_block timed out
  2006-08-07 23:24   ` Bernd Schubert
@ 2006-08-07 23:46     ` Michael Chan
  2006-08-09 14:44       ` Philip Molter
  2006-08-09 15:20       ` Bernd Schubert
  0 siblings, 2 replies; 10+ messages in thread
From: Michael Chan @ 2006-08-07 23:46 UTC (permalink / raw)
  To: Bernd Schubert; +Cc: netdev

On Tue, 2006-08-08 at 01:24 +0200, Bernd Schubert wrote:

> 
> tg3.c:v3.49 (Feb 2, 2006)
> acpi_bus-0201 [01] bus_set_power         : Device is not power manageable
> eth1: Tigon3 [partno(BCM95704A6) rev 2003 PHY(5704)] (PCIX:100MHz:64-bit) 10/100/1000BaseT Ethernet 00:e0:81:2b:aa:28
> eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] Split[0] WireSpeed[1] TSOcap[0]
> eth1: dma_rwctrl[769f4000] dma_mask[64-bit]
> eth2: Tigon3 [partno(BCM95704A6) rev 2003 PHY(5704)] (PCIX:100MHz:64-bit) 10/100/1000BaseT Ethernet 00:e0:81:2b:aa:29
> eth2: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] Split[0] WireSpeed[1] TSOcap[1]
> eth2: dma_rwctrl[769f4000] dma_mask[64-bit]
> 

You have ASF enabled on eth1 but not on eth2 so I wonder if ASF is
causing the problem.  Can you run the same traffic on eth2 and see if
you get the same timeout problem?  Thanks.
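
The comparison between the two ports can also be done mechanically. The sketch
below parses the two capability lines quoted above and lists the flags that
differ; the function names are made up for illustration and are not part of
any tg3 tooling.

```python
import re

# Capability lines copied from the tg3 probe output quoted above.
PROBE_LINES = [
    "eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] Split[0] WireSpeed[1] TSOcap[0]",
    "eth2: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] Split[0] WireSpeed[1] TSOcap[1]",
]

FLAG_RE = re.compile(r"(\w+)\[(\d)\]")


def parse_probe_line(line):
    """Return (interface, {flag: bool}) for one tg3 capability line."""
    iface, _, rest = line.partition(":")
    flags = {name: value == "1" for name, value in FLAG_RE.findall(rest)}
    return iface.strip(), flags


def differing_flags(a, b):
    """Return the sorted names of flags whose values differ between two ports."""
    return sorted(name for name in a if name in b and a[name] != b[name])


ports = dict(parse_probe_line(line) for line in PROBE_LINES)
print(differing_flags(ports["eth1"], ports["eth2"]))  # ['ASF', 'TSOcap']
```

Besides ASF, the TSOcap flag also differs between the two ports in this output.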



* Re: tg3: tg3_stop_block timed out
  2006-08-07 23:46     ` Michael Chan
@ 2006-08-09 14:44       ` Philip Molter
  2006-09-03 22:35         ` Philip Molter
  2006-08-09 15:20       ` Bernd Schubert
  1 sibling, 1 reply; 10+ messages in thread
From: Philip Molter @ 2006-08-09 14:44 UTC (permalink / raw)
  To: Michael Chan; +Cc: Bernd Schubert, netdev

Michael Chan wrote:
> On Tue, 2006-08-08 at 01:24 +0200, Bernd Schubert wrote:
> 
>> tg3.c:v3.49 (Feb 2, 2006)
>> acpi_bus-0201 [01] bus_set_power         : Device is not power manageable
>> eth1: Tigon3 [partno(BCM95704A6) rev 2003 PHY(5704)] (PCIX:100MHz:64-bit) 10/100/1000BaseT Ethernet 00:e0:81:2b:aa:28
>> eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] Split[0] WireSpeed[1] TSOcap[0]
>> eth1: dma_rwctrl[769f4000] dma_mask[64-bit]
>> eth2: Tigon3 [partno(BCM95704A6) rev 2003 PHY(5704)] (PCIX:100MHz:64-bit) 10/100/1000BaseT Ethernet 00:e0:81:2b:aa:29
>> eth2: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] Split[0] WireSpeed[1] TSOcap[1]
>> eth2: dma_rwctrl[769f4000] dma_mask[64-bit]
>>
> 
> You have ASF enabled on eth1 but not on eth2 so I wonder if ASF is
> causing the problem.  Can you run the same traffic on eth2 and see if
> you get the same timeout problem?  Thanks.

I'm also having this same problem:

divert: allocating divert_blk for bond0
tg3.c:v3.14 (November 15, 2004)
divert: allocating divert_blk for eth0
eth0: Tigon3 [partno(BCM95704A7) rev 2003 PHY(5704)] (PCIX:100MHz:64-bit) 10/100/1000BaseT Ethernet 00:e0:81:2e:82:1a
eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] Split[0] WireSpeed[1] TSOcap[1]
divert: allocating divert_blk for eth1
eth1: Tigon3 [partno(BCM95704A7) rev 2003 PHY(5704)] (PCIX:100MHz:64-bit) 10/100/1000BaseT Ethernet 00:e0:81:2e:82:1b
eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] Split[0] WireSpeed[1] TSOcap[1]
divert: freeing divert_blk for bond0
divert: freeing divert_blk for eth0
divert: freeing divert_blk for eth1

02:09.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 03)
         Subsystem: Broadcom Corporation: Unknown device 1644
         Flags: bus master, 66Mhz, medium devsel, latency 64, IRQ 161
         Memory at fc8c0000 (64-bit, non-prefetchable) [size=fc8a0000]
         Memory at fc8b0000 (64-bit, non-prefetchable) [size=64K]
         Expansion ROM at 00010000 [disabled]
         Capabilities: [40] PCI-X non-bridge device.
         Capabilities: [48] Power Management version 2
         Capabilities: [50] Vital Product Data
         Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/3 Enable-

02:09.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 03)
         Subsystem: Broadcom Corporation: Unknown device 1644
         Flags: bus master, 66Mhz, medium devsel, latency 64, IRQ 169
         Memory at fc8f0000 (64-bit, non-prefetchable) [size=fc8d0000]
         Memory at fc8e0000 (64-bit, non-prefetchable) [size=64K]
         Expansion ROM at 00010000 [disabled]
         Capabilities: [40] PCI-X non-bridge device.
         Capabilities: [48] Power Management version 2
         Capabilities: [50] Vital Product Data
         Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/3 Enable-

I run these things with jumbo frames and bonding.  In the case last 
night, our machine completely locked up because both interfaces stopped 
working and the channel bond between them went down.  These guys are 
pushing a little over 1Gb/s total traffic between them (500Mb/s each) 
and one of them will take in about 300Mb/s.  Outgoing packets average 
20kpkts/s and incoming packets on the one interface average about 
45kpkts/s (most incoming traffic is not jumbo).

This was on console:

tg3: tg3_stop_block timed out, ofs=4800 enable_bit=2
tg3: eth1: transmit timed out, resetting
tg3: tg3_stop_block timed out, ofs=3400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=1800 enable_bit=2
tg3: tg3_stop_block timed out, ofs=4800 enable_bit=2
tg3: eth0: transmit timed out, resetting
tg3: tg3_stop_block timed out, ofs=3400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=1800 enable_bit=2
tg3: tg3_stop_block timed out, ofs=4800 enable_bit=2
tg3: eth1: transmit timed out, resetting
tg3: tg3_stop_block timed out, ofs=3400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=1800 enable_bit=2
tg3: tg3_stop_block timed out, ofs=4800 enable_bit=2
tg3: eth0: transmit timed out, resetting
tg3: tg3_stop_block timed out, ofs=3400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=1800 enable_bit=2
tg3: tg3_stop_block timed out, ofs=4800 enable_bit=2

We tried restarting networking.  We tried unloading all network-related 
modules and reloading them.  We eventually had to reboot the box to get 
networking started again.  The kernel is 2.6.10, via FC2 
(2.6.10-2.3.legacy).  We've also had the problem with the latest FC4 kernel.

Any information would be greatly appreciated.

Philip


* Re: tg3: tg3_stop_block timed out
  2006-08-07 23:46     ` Michael Chan
  2006-08-09 14:44       ` Philip Molter
@ 2006-08-09 15:20       ` Bernd Schubert
  1 sibling, 0 replies; 10+ messages in thread
From: Bernd Schubert @ 2006-08-09 15:20 UTC (permalink / raw)
  To: Michael Chan; +Cc: netdev

On Tuesday 08 August 2006 01:46, Michael Chan wrote:
> On Tue, 2006-08-08 at 01:24 +0200, Bernd Schubert wrote:
> > tg3.c:v3.49 (Feb 2, 2006)
> > acpi_bus-0201 [01] bus_set_power         : Device is not power manageable
> > eth1: Tigon3 [partno(BCM95704A6) rev 2003 PHY(5704)] (PCIX:100MHz:64-bit)
> > 10/100/1000BaseT Ethernet 00:e0:81:2b:aa:28 eth1: RXcsums[1]
> > LinkChgREG[0] MIirq[0] ASF[1] Split[0] WireSpeed[1] TSOcap[0] eth1:
> > dma_rwctrl[769f4000] dma_mask[64-bit]
> > eth2: Tigon3 [partno(BCM95704A6) rev 2003 PHY(5704)] (PCIX:100MHz:64-bit)
> > 10/100/1000BaseT Ethernet 00:e0:81:2b:aa:29 eth2: RXcsums[1]
> > LinkChgREG[0] MIirq[0] ASF[0] Split[0] WireSpeed[1] TSOcap[1] eth2:
> > dma_rwctrl[769f4000] dma_mask[64-bit]
>
> You have ASF enabled on eth1 but not on eth2 so I wonder if ASF is
> causing the problem.  Can you run the same traffic on eth2 and see if
> you get the same timeout problem?  Thanks.

Currently I have no physical access to the system and eth2 is not connected to
our switch. I will connect it and run a test as soon as possible (sometime
this week). However, I don't think I can easily reproduce it even with eth1.
It's the first time I have noticed those error messages in seven months. However,
we are experiencing random crashes of the system and maybe that's the cause of
it, who knows. The system is running its root file system over NFS, so a
network failure will lock up the entire system.

Thanks,
	Bernd


* Re: tg3: tg3_stop_block timed out
  2006-08-09 14:44       ` Philip Molter
@ 2006-09-03 22:35         ` Philip Molter
  2006-09-04 18:25           ` Michael Chan
  0 siblings, 1 reply; 10+ messages in thread
From: Philip Molter @ 2006-09-03 22:35 UTC (permalink / raw)
  To: Philip Molter; +Cc: Michael Chan, Bernd Schubert, netdev

Philip Molter wrote:
> Michael Chan wrote:
>> On Tue, 2006-08-08 at 01:24 +0200, Bernd Schubert wrote:
>>
>>> tg3.c:v3.49 (Feb 2, 2006)
>>> acpi_bus-0201 [01] bus_set_power         : Device is not power 
>>> manageable
>>> eth1: Tigon3 [partno(BCM95704A6) rev 2003 PHY(5704)] 
>>> (PCIX:100MHz:64-bit) 10/100/1000BaseT Ethernet 00:e0:81:2b:aa:28
>>> eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] Split[0] WireSpeed[1] 
>>> TSOcap[0]
>>> eth1: dma_rwctrl[769f4000] dma_mask[64-bit]
>>> eth2: Tigon3 [partno(BCM95704A6) rev 2003 PHY(5704)] 
>>> (PCIX:100MHz:64-bit) 10/100/1000BaseT Ethernet 00:e0:81:2b:aa:29
>>> eth2: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] Split[0] WireSpeed[1] 
>>> TSOcap[1]
>>> eth2: dma_rwctrl[769f4000] dma_mask[64-bit]
>>>
>>
>> You have ASF enabled on eth1 but not on eth2 so I wonder if ASF is
>> causing the problem.  Can you run the same traffic on eth2 and see if
>> you get the same timeout problem?  Thanks.
> 
> I'm also having this same problem:

Is there any additional information that I can give to help get some 
more work targeted at this bug?  I've been getting this lockup three or 
four times a week per server (I have four of them exhibiting this behavior).

The network setup is fairly complicated, but unfortunately, these are
production machines pushing multi-gigabit traffic loads.  We're using
VLANs on top of bonding on top of anywhere from 2 to 6 Broadcom NICs,
but it appears that the problem is unrelated to the bonding and VLANs,
as others are reporting similar problems without those enabled.

Any assistance would be appreciated.  I've left the original information 
below for reference.

If anyone could even explain what this error means, that would be 
helpful.  Maybe we can change something to work around it.

Philip

> divert: allocating divert_blk for bond0
> tg3.c:v3.14 (November 15, 2004)
> divert: allocating divert_blk for eth0
> eth0: Tigon3 [partno(BCM95704A7) rev 2003 PHY(5704)] 
> (PCIX:100MHz:64-bit) 10/100/1000BaseT Ethernet 00:e0:81:2e:82:1a
> eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] Split[0] WireSpeed[1] 
> TSOcap[1]
> divert: allocating divert_blk for eth1
> eth1: Tigon3 [partno(BCM95704A7) rev 2003 PHY(5704)] 
> (PCIX:100MHz:64-bit) 10/100/1000BaseT Ethernet 00:e0:81:2e:82:1b
> eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] Split[0] WireSpeed[1] 
> TSOcap[1]
> divert: freeing divert_blk for bond0
> divert: freeing divert_blk for eth0
> divert: freeing divert_blk for eth1
> 
> 02:09.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 
> Gigabit Ethernet (rev 03)
>         Subsystem: Broadcom Corporation: Unknown device 1644
>         Flags: bus master, 66Mhz, medium devsel, latency 64, IRQ 161
>         Memory at fc8c0000 (64-bit, non-prefetchable) [size=fc8a0000]
>         Memory at fc8b0000 (64-bit, non-prefetchable) [size=64K]
>         Expansion ROM at 00010000 [disabled]
>         Capabilities: [40] PCI-X non-bridge device.
>         Capabilities: [48] Power Management version 2
>         Capabilities: [50] Vital Product Data
>         Capabilities: [58] Message Signalled Interrupts: 64bit+ 
> Queue=0/3 Enable-
> 
> 02:09.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 
> Gigabit Ethernet (rev 03)
>         Subsystem: Broadcom Corporation: Unknown device 1644
>         Flags: bus master, 66Mhz, medium devsel, latency 64, IRQ 169
>         Memory at fc8f0000 (64-bit, non-prefetchable) [size=fc8d0000]
>         Memory at fc8e0000 (64-bit, non-prefetchable) [size=64K]
>         Expansion ROM at 00010000 [disabled]
>         Capabilities: [40] PCI-X non-bridge device.
>         Capabilities: [48] Power Management version 2
>         Capabilities: [50] Vital Product Data
>         Capabilities: [58] Message Signalled Interrupts: 64bit+ 
> Queue=0/3 Enable-
> 
> I run these things with jumbo frames and bonding.  In the case last 
> night, our machine completely locked up because both interfaces stopped 
> working and the channel bond between them went down.  These guys are 
> pushing a little over 1Gb/s total traffic between them (500Mb/s each) 
> and one of them will take in about 300Mb/s.  Outgoing packets average 
> 20kpkts/s and incoming packets on the one interface average about 
> 45kpkts/s (most incoming traffic is not jumbo).
> 
> This was on console:
> 
> tg3: tg3_stop_block timed out, ofs=4800 enable_bit=2
> tg3: eth1: transmit timed out, resetting
> tg3: tg3_stop_block timed out, ofs=3400 enable_bit=2
> tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2
> tg3: tg3_stop_block timed out, ofs=1800 enable_bit=2
> tg3: tg3_stop_block timed out, ofs=4800 enable_bit=2
> tg3: eth0: transmit timed out, resetting
> tg3: tg3_stop_block timed out, ofs=3400 enable_bit=2
> tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2
> tg3: tg3_stop_block timed out, ofs=1800 enable_bit=2
> tg3: tg3_stop_block timed out, ofs=4800 enable_bit=2
> tg3: eth1: transmit timed out, resetting
> tg3: tg3_stop_block timed out, ofs=3400 enable_bit=2
> tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2
> tg3: tg3_stop_block timed out, ofs=1800 enable_bit=2
> tg3: tg3_stop_block timed out, ofs=4800 enable_bit=2
> tg3: eth0: transmit timed out, resetting
> tg3: tg3_stop_block timed out, ofs=3400 enable_bit=2
> tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2
> tg3: tg3_stop_block timed out, ofs=1800 enable_bit=2
> tg3: tg3_stop_block timed out, ofs=4800 enable_bit=2
> 
> We tried restarting networking.  We tried unloading all network-related 
> modules and reloading them.  We eventually had to reboot the box to get 
> networking started again.  The kernel is 2.6.10, via FC2 
> (2.6.10-2.3.legacy).  We've also had the problem with the latest FC4 
> kernel.
> 
> Any information would be greatly appreciated.
> 
> Philip

* Re: tg3: tg3_stop_block timed out
  2006-09-03 22:35         ` Philip Molter
@ 2006-09-04 18:25           ` Michael Chan
  2006-09-04 21:27             ` Philip Molter
  2006-09-12 17:22             ` Philip Molter
  0 siblings, 2 replies; 10+ messages in thread
From: Michael Chan @ 2006-09-04 18:25 UTC (permalink / raw)
  To: Philip Molter; +Cc: Bernd Schubert, netdev

Philip Molter wrote:

> Is there any additional information that I can give to help get some
> more work targeted at this bug?  I've been getting this lockup three or
> four times a week per server (I have four of them exhibiting this
> behavior).
>
> The network setup is fairly complicated, but unfortunately, these are
> production machines pushing multi-gigabit traffic loads.  We're using
> VLANs on top of bonding on top of anywhere from 2 to 6 Broadcom NICs,
> but it appears that the problem is unrelated to the bonding and VLANs,
> as others are reporting similar problems without those enabled.
>
> Any assistance would be appreciated.  I've left the original information
> below for reference.

Since you're using a rather old version of tg3, I suggest that you
upgrade to a newer version first.  Your problem is probably
different from Bernd Schubert's since he has ASF enabled and you
don't.

> 
> If anyone could even explain what this error means, that would be 
> helpful.  Maybe we can change something to work around it.
> 

The stop_block error messages are not too important.  The important
thing is that you're getting a transmit timeout.  It means that
the tx queue is getting full because the NIC is no longer getting
interrupts.  When this condition is detected, the NIC will get reset
which should normally bring the NIC back to life.  It seems that
in your case, it doesn't come back.  Do you get these timeouts on
both ports at the same time?

Please try the latest driver.  If you still get the timeouts, I'll
need to send you some debug patches to dump the state when these
timeouts occur.
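
To make the mechanism above concrete, here is a toy model of a transmit
watchdog. It is only a sketch of the idea (a stopped queue that makes no
progress for longer than a timeout triggers a reset); it is not the kernel's
or tg3's code, and every name and number in it is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TxQueue:
    """Toy stand-in for a NIC transmit queue (hypothetical, for illustration)."""
    stopped: bool = False        # True once the ring is full and queuing has stopped
    last_tx_start: float = 0.0   # time the last packet was handed to the hardware


def tx_timed_out(queue, now, timeout):
    """A stopped queue with no progress for more than `timeout` seconds is hung."""
    return queue.stopped and (now - queue.last_tx_start) > timeout


def watchdog_tick(queue, now, timeout, reset):
    """Call `reset` if the queue looks hung; return True if a reset was triggered."""
    if tx_timed_out(queue, now, timeout):
        reset()
        # In this model a reset always succeeds: the ring is emptied and
        # the queue is restarted. The reports in this thread are cases
        # where the real reset did not bring the NIC back.
        queue.stopped = False
        queue.last_tx_start = now
        return True
    return False


events = []
q = TxQueue(stopped=True, last_tx_start=100.0)
# 3 s without progress: still within the (arbitrary) 5 s timeout.
print(watchdog_tick(q, now=103.0, timeout=5.0, reset=lambda: events.append("reset")))
# 6 s without progress: the watchdog fires and the reset callback runs.
print(watchdog_tick(q, now=106.0, timeout=5.0, reset=lambda: events.append("reset")))
print(events)
```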



* Re: tg3: tg3_stop_block timed out
  2006-09-04 18:25           ` Michael Chan
@ 2006-09-04 21:27             ` Philip Molter
  2006-09-12 17:22             ` Philip Molter
  1 sibling, 0 replies; 10+ messages in thread
From: Philip Molter @ 2006-09-04 21:27 UTC (permalink / raw)
  To: Michael Chan; +Cc: Bernd Schubert, netdev

Michael Chan wrote:
> Philip Molter wrote:
> 
>> Is there any additional information that I can give to help get some
>> more work targeted at this bug?  I've been getting this lockup three or
>> four times a week per server (I have four of them exhibiting this
>> behavior).
>>
>> The network setup is fairly complicated, but unfortunately, these are
>> production machines pushing multi-gigabit traffic loads.  We're using
>> VLANs on top of bonding on top of anywhere from 2 to 6 Broadcom NICs,
>> but it appears that the problem is unrelated to the bonding and VLANs,
>> as others are reporting similar problems without those enabled.
>>
>> Any assistance would be appreciated.  I've left the original information
>> below for reference.
> 
> Since you're using a rather old version of tg3, I suggest that you
> upgrade to a newer version first.  Your problem is probably
> different from Bernd Schubert's since he has ASF enabled and you
> don't.
> 
>> If anyone could even explain what this error means, that would be 
>> helpful.  Maybe we can change something to work around it.
>>
> 
> The stop_block error messages are not too important.  The important
> thing is that you're getting a transmit timeout.  It means that
> the tx queue is getting full because the NIC is no longer getting
> interrupts.  When this condition is detected, the NIC will get reset
> which should normally bring the NIC back to life.  It seems that
> in your case, it doesn't come back.  Do you get these timeouts on
> both ports at the same time?

It's hard to tell.  When the error gets logged, it doesn't say which 
interface it's happening on.  The box is locked up by the time we get to 
it, but I think it's happening on both.
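
The "transmit timed out, resetting" lines in the console output posted earlier
in this thread do name an interface, so one way to approach the question is to
tally them. The sketch below does that for the first part of that output; the
function name is hypothetical, and the input would need to be replaced with a
full captured log to say anything about timing.

```python
import re
from collections import Counter

# Console lines copied from the report earlier in this thread (first part only).
CONSOLE = """\
tg3: tg3_stop_block timed out, ofs=4800 enable_bit=2
tg3: eth1: transmit timed out, resetting
tg3: tg3_stop_block timed out, ofs=3400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=1800 enable_bit=2
tg3: tg3_stop_block timed out, ofs=4800 enable_bit=2
tg3: eth0: transmit timed out, resetting
tg3: tg3_stop_block timed out, ofs=3400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=1800 enable_bit=2
tg3: tg3_stop_block timed out, ofs=4800 enable_bit=2
"""

RESET_RE = re.compile(r"tg3: (\w+): transmit timed out, resetting")
BLOCK_RE = re.compile(r"tg3_stop_block timed out, ofs=([0-9a-f]+)")


def summarise(log_text):
    """Count transmit-timeout resets per interface and stop_block timeouts per offset."""
    resets = Counter()
    blocks = Counter()
    for line in log_text.splitlines():
        match = RESET_RE.search(line)
        if match:
            resets[match.group(1)] += 1
            continue
        match = BLOCK_RE.search(line)
        if match:
            blocks[match.group(1)] += 1
    return resets, blocks


resets, blocks = summarise(CONSOLE)
print(sorted(resets.items()))
print(sorted(blocks.items()))
```

In this excerpt both eth0 and eth1 report a transmit timeout, which is
consistent with the problem occurring on both ports.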

I've had NICs lock up with queue issues before, but I've never had it 
lock up a box completely, unresponsive on console even.  Normally, 
network just breaks, and sure, it requires a reboot, but at least we can 
do a controlled reboot.

This only started happening when we moved these NICs to jumbo frames.
We've used the exact same hardware in less demanding applications (up to
500Mbits vs. 750+Mbits) with jumbo without issue, but these particular
machines, these pushers, only started locking up when we switched to jumbo.

> Please try the latest driver.  If you still get the timeouts, I'll
> need to send you some debug patches to dump the state when these
> timeouts occur.

Will do.


* Re: tg3: tg3_stop_block timed out
  2006-09-04 18:25           ` Michael Chan
  2006-09-04 21:27             ` Philip Molter
@ 2006-09-12 17:22             ` Philip Molter
  1 sibling, 0 replies; 10+ messages in thread
From: Philip Molter @ 2006-09-12 17:22 UTC (permalink / raw)
  To: Michael Chan; +Cc: netdev

> Please try the latest driver.  If you still get the timeouts, I'll
> need to send you some debug patches to dump the state when these
> timeouts occur.

So far, with the new driver, I haven't seen any of these timeouts. 
Normally, we'd have seen them by now.  If they recur at some point in 
the future, I'll post a new message.

Thanks for your help, Michael.
Philip

