public inbox for linux-kernel@vger.kernel.org
* NETDEV WATCHDOG on U60/SMP
From: BERTRAND Joël @ 2008-06-20  7:54 UTC
  To: linux-kernel

	Hello,

	This mail was first posted to the sparclinux mailing list. I am
reposting it to the general linux-kernel list because I am not sure that
this bug is sparc-specific. Nevertheless, I can only reproduce it on
sparc64/SMP.

	My U60 runs Debian Linux with the official 2.6.25 kernel (I'm
currently trying 2.6.25.7) and sometimes, when eth2 is under heavy load,
it hangs with NETDEV WATCHDOG:

NETDEV WATCHDOG: eth2: transmit timed out
eth2: transmit timed out, tx_status 00 status 8601.
   diagnostics: net 0ccc media 8880 dma 0000003a fifo 0000
eth2: Interrupt posted but not delivered -- IRQ blocked by another device?
   Flags; bus-master 1, dirty 2283344(0) current 2283344(0)
   Transmit list 00000000 vs. fffff800af098200.
   0: @fffff800af098200  length 00000042 status 0c01059a
   1: @fffff800af098260  length 00000042 status 0c01059a
   2: @fffff800af0982c0  length 00000042 status 0c01059a
   3: @fffff800af098320  length 00000042 status 0c01059a
   4: @fffff800af098380  length 00000042 status 0c01059a
   5: @fffff800af0983e0  length 00000042 status 0c01059a
   6: @fffff800af098440  length 00000042 status 0c01059a
   7: @fffff800af0984a0  length 00000042 status 0c01059a
   8: @fffff800af098500  length 8000002a status 0001002a
   9: @fffff800af098560  length 8000002a status 0001002a
   10: @fffff800af0985c0  length 8000002a status 0001002a
   11: @fffff800af098620  length 8000002a status 0001002a
   12: @fffff800af098680  length 8000002a status 0001002a
   13: @fffff800af0986e0  length 8000002a status 0001002a
   14: @fffff800af098740  length 8000002a status 8001002a
   15: @fffff800af0987a0  length 8000002a status 8001002a
eth2: Resetting the Tx ring pointer.
eth2:  setting full-duplex.
NETDEV WATCHDOG: eth2: transmit timed out
eth2: transmit timed out, tx_status 00 status 8601.
   diagnostics: net 0ccc media 8880 dma 0000003a fifo 0000
eth2: Interrupt posted but not delivered -- IRQ blocked by another device?
   Flags; bus-master 1, dirty 16(0) current 16(0)
   Transmit list 00000000 vs. fffff800af098200.
   0: @fffff800af098200  length 8000002a status 0001002a
   1: @fffff800af098260  length 8000002a status 0001002a
   2: @fffff800af0982c0  length 8000002a status 0001002a
   3: @fffff800af098320  length 8000002a status 0001002a
   4: @fffff800af098380  length 8000002a status 0001002a
   5: @fffff800af0983e0  length 8000002a status 0001002a
   6: @fffff800af098440  length 8000002a status 0001002a
   7: @fffff800af0984a0  length 8000002a status 0001002a
   8: @fffff800af098500  length 8000002a status 0001002a
   9: @fffff800af098560  length 8000002a status 0001002a
   10: @fffff800af0985c0  length 8000002a status 0001002a
   11: @fffff800af098620  length 8000002a status 0001002a
   12: @fffff800af098680  length 8000002a status 0001002a
   13: @fffff800af0986e0  length 8000002a status 0001002a
   14: @fffff800af098740  length 8000002a status 8001002a
   15: @fffff800af0987a0  length 8000002a status 8001002a
eth2: Resetting the Tx ring pointer.
eth2:  setting full-duplex.
...
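
To compare several of these reports, the dump above can be summarized mechanically. The following is a minimal Python sketch, not part of the driver or of the original report. The regular expressions and the summarize() helper are made up for illustration, and reading the number in parentheses as the counter modulo the ring size is an assumption that merely happens to fit these numbers. It deliberately does not try to decode the individual length/status bits.

```python
import re
from collections import Counter

# First watchdog report from this mail, copied verbatim.
LOG = """\
   Flags; bus-master 1, dirty 2283344(0) current 2283344(0)
   Transmit list 00000000 vs. fffff800af098200.
   0: @fffff800af098200  length 00000042 status 0c01059a
   1: @fffff800af098260  length 00000042 status 0c01059a
   2: @fffff800af0982c0  length 00000042 status 0c01059a
   3: @fffff800af098320  length 00000042 status 0c01059a
   4: @fffff800af098380  length 00000042 status 0c01059a
   5: @fffff800af0983e0  length 00000042 status 0c01059a
   6: @fffff800af098440  length 00000042 status 0c01059a
   7: @fffff800af0984a0  length 00000042 status 0c01059a
   8: @fffff800af098500  length 8000002a status 0001002a
   9: @fffff800af098560  length 8000002a status 0001002a
   10: @fffff800af0985c0  length 8000002a status 0001002a
   11: @fffff800af098620  length 8000002a status 0001002a
   12: @fffff800af098680  length 8000002a status 0001002a
   13: @fffff800af0986e0  length 8000002a status 0001002a
   14: @fffff800af098740  length 8000002a status 8001002a
   15: @fffff800af0987a0  length 8000002a status 8001002a
"""

# Hypothetical patterns written against the text above, not taken from the driver.
FLAGS_RE = re.compile(r"dirty (\d+)\((\d+)\) current (\d+)\((\d+)\)")
DESC_RE = re.compile(
    r"^[ \t]*(\d+): @([0-9a-f]+)[ \t]+length ([0-9a-f]{8}) status ([0-9a-f]{8})",
    re.M,
)


def summarize(log):
    """Summarize one watchdog dump: ring size, counters, stride, value groups."""
    dirty, dirty_idx, cur, cur_idx = map(int, FLAGS_RE.search(log).groups())
    descs = [
        (int(i), int(addr, 16), int(length, 16), int(status, 16))
        for i, addr, length, status in DESC_RE.findall(log)
    ]
    ring_size = len(descs)
    return {
        "ring_size": ring_size,
        # Difference between the two counters printed on the Flags line.
        "outstanding": cur - dirty,
        # Assumption: the value in parentheses is the counter modulo ring size.
        "index_matches_modulo": (dirty % ring_size == dirty_idx
                                 and cur % ring_size == cur_idx),
        # Address distance between consecutive descriptors.
        "strides": {b[1] - a[1] for a, b in zip(descs, descs[1:])},
        # How many descriptors share each (length, status) pair.
        "groups": Counter((length, status) for _, _, length, status in descs),
    }


if __name__ == "__main__":
    for key, value in summarize(LOG).items():
        print(key, value)
```

On the first report this finds 16 descriptors spaced 0x60 bytes apart and a difference of 0 between "current" and "dirty". If those two values are the driver's queued and completed Tx counters, the ring held no outstanding packets when the watchdog fired.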

	I have to reboot this server to restore eth2.
This adapter is a 3Com NIC (3C905). I have tried several different
3Com adapters with the same result. If I replace this NIC (for example
with an HME or any PCI 2.1 adapter), I cannot reproduce the bug.

	It only occurs when ethernet traffic is high on eth2.

	I have seen this bug since 2.6.20, even on amd64. I'm not sure
whether it is still present in the amd64 kernel: I don't have an amd64
workstation to test on, and I have not seen it on amd64 since 2.6.24.
Maybe it is fixed on amd64.

lspci returns:
0000:00:00.0 Host bridge: Sun Microsystems Computer Corp. Psycho PCI Bus Module
0000:00:01.0 Bridge: Sun Microsystems Computer Corp. EBUS (rev 01)
0000:00:01.1 Ethernet controller: Sun Microsystems Computer Corp. Happy Meal 10/100 Ethernet [hme] (rev 01)
0000:00:02.0 Ethernet controller: 3Com Corporation 3c905C-TX/TX-M [Tornado] (rev 78)
0000:00:03.0 SCSI storage controller: LSI Logic / Symbios Logic 53c875 (rev 14)
0000:00:03.1 SCSI storage controller: LSI Logic / Symbios Logic 53c875 (rev 14)
0000:00:04.0 SCSI storage controller: Adaptec AIC-7892A U160/m (rev 02)
0000:00:05.0 USB Controller: NEC Corporation USB (rev 43)
0000:00:05.1 USB Controller: NEC Corporation USB (rev 43)
0000:00:05.2 USB Controller: NEC Corporation USB 2.0 (rev 04)
0001:00:00.0 Host bridge: Sun Microsystems Computer Corp. Psycho PCI Bus Module
0001:80:01.0 Bridge: Sun Microsystems Computer Corp. EBUS (rev 01)
0001:80:01.1 Ethernet controller: Sun Microsystems Computer Corp. Happy Meal 10/100 Ethernet [hme] (rev 01)

ifconfig:
eth0      Link encap:Ethernet  HWaddr 08:00:20:a1:4b:33
           inet addr:192.168.0.128  Bcast:192.168.0.255  Mask:255.255.255.0
           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
           RX packets:16709366 errors:0 dropped:0 overruns:0 frame:1
           TX packets:21355942 errors:0 dropped:0 overruns:0 carrier:0
           collisions:0 txqueuelen:1000
           RX bytes:2391901923 (2.2 GiB)  TX bytes:21605391421 (20.1 GiB)
           Interrupt:14 Base address:0x3000

eth1      Link encap:Ethernet  HWaddr 08:00:20:a1:4b:33
           inet addr:192.168.254.1  Bcast:192.168.254.255  Mask:255.255.255.0
           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
           RX packets:20207169 errors:0 dropped:0 overruns:0 frame:0
           TX packets:17280402 errors:0 dropped:0 overruns:0 carrier:0
           collisions:0 txqueuelen:1000
           RX bytes:19068335140 (17.7 GiB)  TX bytes:8246313479 (7.6 GiB)
           Interrupt:24 Base address:0x1800

eth2      Link encap:Ethernet  HWaddr 00:04:75:df:1c:6d
           inet addr:192.168.253.1  Bcast:192.168.253.255  Mask:255.255.255.0
           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
           RX packets:1843643 errors:0 dropped:0 overruns:0 frame:0
           TX packets:2416959 errors:13 dropped:0 overruns:0 carrier:0
           collisions:0 txqueuelen:1000
           RX bytes:157416047 (150.1 MiB)  TX bytes:2313298605 (2.1 GiB)
           Interrupt:17 Base address:0x8000

lo        Link encap:Local Loopback
           inet addr:127.0.0.1  Mask:255.0.0.0
           UP LOOPBACK RUNNING  MTU:16436  Metric:1
           RX packets:7839862 errors:0 dropped:0 overruns:0 frame:0
           TX packets:7839862 errors:0 dropped:0 overruns:0 carrier:0
           collisions:0 txqueuelen:0
           RX bytes:3713209874 (3.4 GiB)  TX bytes:3713209874 (3.4 GiB)

Interrupts:
            CPU0       CPU2
   0: 1253580857 1253580260     <NULL>  timer
   1:          0          0      sun4u  PSYCHO_PCIERR
   2:          0          0      sun4u  PSYCHO_UE
   3:          0          0      sun4u  PSYCHO_CE
   8:     733411          0      sun4u  su(kbd)
   9:          0    4396224      sun4u  su(mouse)
  10:          0          0      sun4u  parport0
  11:          4          0      sun4u  floppy
  12:          0          0      sun4u  cs4231(capture)
  13:          0          0      sun4u  cs4231(play)
  14:          0   37976886      sun4u  eth0
  15:          0  218660455      sun4u  sym53c8xx
  16:         30          0      sun4u  sym53c8xx
  17:    2042976    2011664      sun4u  eth2
  18:  137883796          0      sun4u  aic7xxx
  19:          0    1208028      sun4u  ohci_hcd:usb2
  20:          0     650947      sun4u  ohci_hcd:usb3
  21:          1          4      sun4u  ehci_hcd:usb1
  22:          0          0      sun4u  PSYCHO_PCIERR
  24:    4957716   33460983      sun4u  eth1
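
Since the driver message asks whether the IRQ is blocked, it may help to look at how each device's interrupts are spread over the two CPUs. The following is a small Python sketch under stated assumptions: parse_interrupts() and cpu_share() are hypothetical helpers written for this table layout (CPU header line, then "IRQ: counts... controller name"), and only a subset of the rows above is embedded as sample input.

```python
# Subset of the interrupt table above, copied verbatim.
INTERRUPTS = """\
            CPU0       CPU2
   0: 1253580857 1253580260     <NULL>  timer
  14:          0   37976886      sun4u  eth0
  15:          0  218660455      sun4u  sym53c8xx
  17:    2042976    2011664      sun4u  eth2
  18:  137883796          0      sun4u  aic7xxx
  24:    4957716   33460983      sun4u  eth1
"""


def parse_interrupts(text):
    """Map IRQ number -> (device name, {cpu: count}) for a table like the one above."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    cpus = lines[0].split()  # header row, e.g. ["CPU0", "CPU2"]
    table = {}
    for line in lines[1:]:
        irq, rest = line.split(":", 1)  # split on the first colon only
        fields = rest.split()
        counts = [int(n) for n in fields[:len(cpus)]]
        table[int(irq)] = (fields[-1], dict(zip(cpus, counts)))
    return table


def cpu_share(counts):
    """Fraction of a device's interrupts handled by each CPU."""
    total = sum(counts.values())
    return {cpu: (n / total if total else 0.0) for cpu, n in counts.items()}


if __name__ == "__main__":
    for irq, (name, counts) in sorted(parse_interrupts(INTERRUPTS).items()):
        shares = cpu_share(counts)
        print(irq, name, {cpu: round(s, 3) for cpu, s in shares.items()})
```

With these numbers, eth2 (IRQ 17) is split almost evenly between CPU0 and CPU2, while eth0 is handled entirely by CPU2. That is only an observation about the counters; it does not by itself show that the distribution is related to the hang.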

	Any ideas?

	Regards,

	JKB


* Re: NETDEV WATCHDOG on U60/SMP
From: Steffen Klassert @ 2008-06-20  8:52 UTC
  To: mt, linux-kernel

On Fri, Jun 20, 2008 at 09:54:00AM +0200, BERTRAND Joël wrote:
> 	Hello,
> 
> 	This mail comes from sparclinux mailing list. I repost it on general 
> linux kernel mailing list because I'm not sure that this bug is sparc 
> specific. Nevertheless, I can only reproduce it on sparc64/SMP.
> 
> 	My U60 runs linux debian with official 2.6.25 linux kernel (I'm
> currently trying 2.6.25.7) and sometimes, when eth2 is stressed, eth2
> hangs with NETDEV WATCHDOG :
> 
> NETDEV WATCHDOG: eth2: transmit timed out
> eth2: transmit timed out, tx_status 00 status 8601.
>   diagnostics: net 0ccc media 8880 dma 0000003a fifo 0000
> eth2: Interrupt posted but not delivered -- IRQ blocked by another device?
>   Flags; bus-master 1, dirty 2283344(0) current 2283344(0)
>   Transmit list 00000000 vs. fffff800af098200.
>   0: @fffff800af098200  length 00000042 status 0c01059a
>   1: @fffff800af098260  length 00000042 status 0c01059a
>   2: @fffff800af0982c0  length 00000042 status 0c01059a
>   3: @fffff800af098320  length 00000042 status 0c01059a
>   4: @fffff800af098380  length 00000042 status 0c01059a
>   5: @fffff800af0983e0  length 00000042 status 0c01059a
>   6: @fffff800af098440  length 00000042 status 0c01059a
>   7: @fffff800af0984a0  length 00000042 status 0c01059a
>   8: @fffff800af098500  length 8000002a status 0001002a
>   9: @fffff800af098560  length 8000002a status 0001002a
>   10: @fffff800af0985c0  length 8000002a status 0001002a
>   11: @fffff800af098620  length 8000002a status 0001002a
>   12: @fffff800af098680  length 8000002a status 0001002a
>   13: @fffff800af0986e0  length 8000002a status 0001002a
>   14: @fffff800af098740  length 8000002a status 8001002a
>   15: @fffff800af0987a0  length 8000002a status 8001002a
> eth2: Resetting the Tx ring pointer.
> eth2:  setting full-duplex.

Some people with similar problems reported that they could work around
them by increasing the rx/tx ring sizes of the 3c59x driver.
See http://bugzilla.kernel.org/show_bug.cgi?id=6444 for details.

It would be good to know whether this helps with your problem too.
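
For anyone who wants to script such a test, here is a hedged Python sketch of the edit. It assumes the ring sizes are compile-time macros named TX_RING_SIZE and RX_RING_SIZE in the driver source, so the driver has to be rebuilt afterwards. set_ring_sizes() is a hypothetical helper, SAMPLE is a stand-in fragment rather than a copy of the driver (16 matches the number of Tx descriptors in the dump, the Rx value is a placeholder), and 256 is just an example value.

```python
import re

# Stand-in fragment, NOT copied from the driver source.
SAMPLE = """\
#define TX_RING_SIZE 16
#define RX_RING_SIZE 32
"""


def set_ring_sizes(source, tx, rx):
    """Return `source` with the TX_RING_SIZE / RX_RING_SIZE #defines set to tx / rx.

    Raises ValueError unless each macro is defined exactly once, so a source
    file with a different layout is reported instead of being silently skipped.
    """
    for name, value in (("TX_RING_SIZE", tx), ("RX_RING_SIZE", rx)):
        pattern = re.compile(r"^(#define\s+" + re.escape(name) + r"\s+)\d+", re.M)
        # A function replacement avoids ambiguity between the group
        # reference and the digits of the new value.
        source, count = pattern.subn(
            lambda m, v=value: m.group(1) + str(v), source
        )
        if count != 1:
            raise ValueError(
                "expected exactly one #define for %s, found %d" % (name, count)
            )
    return source


if __name__ == "__main__":
    print(set_ring_sizes(SAMPLE, 256, 256))
```

In practice the function would be applied to the text of the driver source file in the kernel tree under test, and the result written back before rebuilding the module.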

Steffen



* Re: NETDEV WATCHDOG on U60/SMP
From: BERTRAND Joel @ 2008-06-20 10:19 UTC
  To: Steffen Klassert; +Cc: linux-kernel

Steffen Klassert wrote:
> Some people with similar problems reported that they could work around
> them by increasing the rx/tx ring sizes of the 3c59x driver.
> See http://bugzilla.kernel.org/show_bug.cgi?id=6444 for details.

	Thanks. I'm testing with RX/TX=256/256 and max_interrupt=1024.

	Regards,

	JKB
