netdev.vger.kernel.org archive mirror
* niu driver - Transmit timed out - 2.6.29
@ 2009-03-26 12:44 Jesper Krogh
  2009-03-27 19:31 ` Jesper Krogh
  0 siblings, 1 reply; 6+ messages in thread
From: Jesper Krogh @ 2009-03-26 12:44 UTC
  To: netdev@vger.kernel.org

Ok. I was just so happy .. (See "Status update on Sun Neptune 10Gbit 
driver" earlier).

But then it "blew up" again:

Mar 26 13:25:49 hest kernel: [25335.505049] ------------[ cut here 
]------------
Mar 26 13:25:49 hest kernel: [25335.505055] WARNING: at 
net/sched/sch_generic.c:226 dev_watchdog+0x1fd/0x210()
Mar 26 13:25:49 hest kernel: [25335.505057] Hardware name: Sun Fire X4600 M2
Mar 26 13:25:49 hest kernel: [25335.505059] NETDEV WATCHDOG: eth4 (niu): 
transmit timed out
Mar 26 13:25:49 hest kernel: [25335.505060] Modules linked in: af_packet 
ext4 jbd2 crc16 nfsd exportfs autofs4 nfs lockd auth_rpcgss sunrpc 
iptable_filter ip_tables x_tables ib_iser rdma_cm ib_cm iw_cm ib_sa 
ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi 
scsi_transport_iscsi ipv6 parport_pc lp parport loop sr_mod joydev 
psmouse niu usb_storage usbhid i2c_nforce2 libusual hid serio_raw pcspkr 
shpchp k8temp pci_hotplug i2c_core button evdev ext3 jbd mbcache 
ide_cd_mod cdrom sg sd_mod ata_generic libata mptsas mptspi mptscsih 
qla2xxx mptbase scsi_transport_sas scsi_transport_fc ehci_hcd 
scsi_transport_spi ohci_hcd e1000 scsi_mod amd74xx usbcore dm_mirror 
dm_region_hash dm_log dm_snapshot dm_mod thermal processor fan 
thermal_sys fuse
Mar 26 13:25:49 hest kernel: [25335.505109] Pid: 0, comm: swapper Not 
tainted 2.6.29 #30
Mar 26 13:25:49 hest kernel: [25335.505111] Call Trace:
Mar 26 13:25:49 hest kernel: [25335.505113]  <IRQ>  [<ffffffff8023d5c2>] 
warn_slowpath+0xf2/0x130
Mar 26 13:25:49 hest kernel: [25335.505124]  [<ffffffff80239d2d>] 
task_tick_fair+0x4d/0xd0
Mar 26 13:25:49 hest kernel: [25335.505130]  [<ffffffff80355e33>] 
cpumask_next_and+0x23/0x40
Mar 26 13:25:49 hest kernel: [25335.505132]  [<ffffffff80233f84>] 
find_busiest_group+0x204/0x870
Mar 26 13:25:49 hest kernel: [25335.505136]  [<ffffffff8035b65e>] 
strlcpy+0x4e/0x80
Mar 26 13:25:49 hest kernel: [25335.505138]  [<ffffffff8041f11d>] 
dev_watchdog+0x1fd/0x210
Mar 26 13:25:49 hest kernel: [25335.505141]  [<ffffffff80235ac5>] 
run_rebalance_domains+0x3c5/0x530
Mar 26 13:25:49 hest kernel: [25335.505143]  [<ffffffff802474bb>] 
run_timer_softirq+0x1bb/0x230
Mar 26 13:25:49 hest kernel: [25335.505148]  [<ffffffff802574e1>] 
sched_clock_cpu+0x131/0x180
Mar 26 13:25:49 hest kernel: [25335.505151]  [<ffffffff80242cdb>] 
__do_softirq+0x8b/0x150
Mar 26 13:25:49 hest kernel: [25335.505155]  [<ffffffff8020d3bc>] 
call_softirq+0x1c/0x30
Mar 26 13:25:49 hest kernel: [25335.505157]  [<ffffffff8020e505>] 
do_softirq+0x35/0x80
Mar 26 13:25:49 hest kernel: [25335.505161]  [<ffffffff8021f715>] 
smp_apic_timer_interrupt+0x85/0xd0
Mar 26 13:25:49 hest kernel: [25335.505163]  [<ffffffff8020cdf3>] 
apic_timer_interrupt+0x13/0x20
Mar 26 13:25:49 hest kernel: [25335.505164]  <EOI>  [<ffffffff80212dc7>] 
default_idle+0x27/0x40
Mar 26 13:25:49 hest kernel: [25335.505169]  [<ffffffff80212fea>] 
c1e_idle+0xba/0x100
Mar 26 13:25:49 hest kernel: [25335.505171]  [<ffffffff8020ae80>] 
cpu_idle+0x40/0x70
Mar 26 13:25:49 hest kernel: [25335.505173] ---[ end trace 
e6e4f250dc22390d ]---

It is fairly hard to reproduce and generally pops up after a few days 
of production. But I am willing to test patches that would help resolve 
this problem, as both the niu driver and NFSD on 2.6.29 really 
outperform the 2.6.26-rc4 + nxge driver I'm currently using.

Hardware: Sun Fire X4600, 32GB of memory

-- 
Jesper
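
The log excerpt above was wrapped by the mail client, so each kernel record is split over two or more lines, which makes it awkward to grep. A minimal sketch of rejoining such records, assuming every record starts with a syslog timestamp and host name as in this thread (the function name `rejoin` is hypothetical):

```python
import re

# A syslog record in this thread starts with "Mon DD HH:MM:SS host ".
RECORD_START = re.compile(r"^[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} \S+ ")

def rejoin(lines):
    """Merge mail-wrapped continuation lines back into one record each."""
    records = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # blank lines separate paragraphs, not records
        if RECORD_START.match(line) or not records:
            records.append(line)
        else:
            # Continuation: assume the mail client broke the line at a space.
            records[-1] = records[-1].rstrip() + " " + line.strip()
    return records

wrapped = [
    "Mar 26 13:25:49 hest kernel: [25335.505059] NETDEV WATCHDOG: eth4 (niu): ",
    "transmit timed out",
    "Mar 26 13:25:49 hest kernel: [25335.505111] Call Trace:",
]
for record in rejoin(wrapped):
    print(record)
```

A line that was broken in the middle of a word would gain a spurious space with this approach, so the result is best treated as a reading aid rather than an exact reconstruction.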


* Re: niu driver - Transmit timed out - 2.6.29
  2009-03-26 12:44 niu driver - Transmit timed out - 2.6.29 Jesper Krogh
@ 2009-03-27 19:31 ` Jesper Krogh
  2009-03-28  0:42   ` Matheos Worku
  0 siblings, 1 reply; 6+ messages in thread
From: Jesper Krogh @ 2009-03-27 19:31 UTC
  To: netdev@vger.kernel.org

Jesper Krogh wrote:
> Ok. I was just so happy .. (See "Status update on Sun Neptune 10Gbit 
> driver" earlier).
> 
> But then it "blew up" again:
> 
> Mar 26 13:25:49 hest kernel: [25335.505049] ------------[ cut here 
> ]------------
> Mar 26 13:25:49 hest kernel: [25335.505055] WARNING: at 
> net/sched/sch_generic.c:226 dev_watchdog+0x1fd/0x210()
> Mar 26 13:25:49 hest kernel: [25335.505057] Hardware name: Sun Fire 
> X4600 M2
> Mar 26 13:25:49 hest kernel: [25335.505059] NETDEV WATCHDOG: eth4 (niu): 
> transmit timed out
> Mar 26 13:25:49 hest kernel: [25335.505060] Modules linked in: af_packet 
> ext4 jbd2 crc16 nfsd exportfs autofs4 nfs lockd auth_rpcgss sunrpc 
> iptable_filter ip_tables x_tables ib_iser rdma_cm ib_cm iw_cm ib_sa 
> ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi 
> scsi_transport_iscsi ipv6 parport_pc lp parport loop sr_mod joydev 
> psmouse niu usb_storage usbhid i2c_nforce2 libusual hid serio_raw pcspkr 
> shpchp k8temp pci_hotplug i2c_core button evdev ext3 jbd mbcache 
> ide_cd_mod cdrom sg sd_mod ata_generic libata mptsas mptspi mptscsih 
> qla2xxx mptbase scsi_transport_sas scsi_transport_fc ehci_hcd 
> scsi_transport_spi ohci_hcd e1000 scsi_mod amd74xx usbcore dm_mirror 
> dm_region_hash dm_log dm_snapshot dm_mod thermal processor fan 
> thermal_sys fuse
> Mar 26 13:25:49 hest kernel: [25335.505109] Pid: 0, comm: swapper Not 
> tainted 2.6.29 #30
> Mar 26 13:25:49 hest kernel: [25335.505111] Call Trace:
> Mar 26 13:25:49 hest kernel: [25335.505113]  <IRQ>  [<ffffffff8023d5c2>] 
> warn_slowpath+0xf2/0x130
> Mar 26 13:25:49 hest kernel: [25335.505124]  [<ffffffff80239d2d>] 
> task_tick_fair+0x4d/0xd0
> Mar 26 13:25:49 hest kernel: [25335.505130]  [<ffffffff80355e33>] 
> cpumask_next_and+0x23/0x40
> Mar 26 13:25:49 hest kernel: [25335.505132]  [<ffffffff80233f84>] 
> find_busiest_group+0x204/0x870
> Mar 26 13:25:49 hest kernel: [25335.505136]  [<ffffffff8035b65e>] 
> strlcpy+0x4e/0x80
> Mar 26 13:25:49 hest kernel: [25335.505138]  [<ffffffff8041f11d>] 
> dev_watchdog+0x1fd/0x210
> Mar 26 13:25:49 hest kernel: [25335.505141]  [<ffffffff80235ac5>] 
> run_rebalance_domains+0x3c5/0x530
> Mar 26 13:25:49 hest kernel: [25335.505143]  [<ffffffff802474bb>] 
> run_timer_softirq+0x1bb/0x230
> Mar 26 13:25:49 hest kernel: [25335.505148]  [<ffffffff802574e1>] 
> sched_clock_cpu+0x131/0x180
> Mar 26 13:25:49 hest kernel: [25335.505151]  [<ffffffff80242cdb>] 
> __do_softirq+0x8b/0x150
> Mar 26 13:25:49 hest kernel: [25335.505155]  [<ffffffff8020d3bc>] 
> call_softirq+0x1c/0x30
> Mar 26 13:25:49 hest kernel: [25335.505157]  [<ffffffff8020e505>] 
> do_softirq+0x35/0x80
> Mar 26 13:25:49 hest kernel: [25335.505161]  [<ffffffff8021f715>] 
> smp_apic_timer_interrupt+0x85/0xd0
> Mar 26 13:25:49 hest kernel: [25335.505163]  [<ffffffff8020cdf3>] 
> apic_timer_interrupt+0x13/0x20
> Mar 26 13:25:49 hest kernel: [25335.505164]  <EOI>  [<ffffffff80212dc7>] 
> default_idle+0x27/0x40
> Mar 26 13:25:49 hest kernel: [25335.505169]  [<ffffffff80212fea>] 
> c1e_idle+0xba/0x100
> Mar 26 13:25:49 hest kernel: [25335.505171]  [<ffffffff8020ae80>] 
> cpu_idle+0x40/0x70
> Mar 26 13:25:49 hest kernel: [25335.505173] ---[ end trace 
> e6e4f250dc22390d ]---

There was actually a bit more in the log:

Mar 26 13:25:49 hest kernel: [25335.505176] niu 0000:84:00.0: niu: eth4: 
Transmit timed out, resetting
Mar 26 13:25:49 hest kernel: [25335.587191] niu 0000:84:00.0: niu: eth4: 
bits (40000000) of register RXDMA_CFIG1 would not clear, val[c0000000]
Mar 26 13:25:49 hest last message repeated 4 times
Mar 26 13:25:58 hest kernel: [25345.504898] niu 0000:84:00.0: niu: eth4: 
Transmit timed out, resetting
Mar 26 13:26:08 hest kernel: [25355.504758] niu 0000:84:00.0: niu: eth4: 
Transmit timed out, resetting
Mar 26 13:26:13 hest kernel: [25360.504687] niu 0000:84:00.0: niu: eth4: 
Transmit timed out, resetting
Mar 26 13:26:18 hest kernel: [25365.504619] niu 0000:84:00.0: niu: eth4: 
Transmit timed out, resetting
Mar 26 13:26:23 hest kernel: [25370.504549] niu 0000:84:00.0: niu: eth4: 
Transmit timed out, resetting
Mar 26 13:26:28 hest kernel: [25375.504479] niu 0000:84:00.0: niu: eth4: 
Transmit timed out, resetting
Mar 26 13:26:33 hest kernel: [25380.504409] niu 0000:84:00.0: niu: eth4: 
Transmit timed out, resetting
Mar 26 13:26:38 hest kernel: [25385.504340] niu 0000:84:00.0: niu: eth4: 
Transmit timed out, resetting

This is probably the interesting part:
Mar 26 13:25:49 hest kernel: [25335.587191] niu 0000:84:00.0: niu: eth4: 
bits (40000000) of register RXDMA_CFIG1 would not clear, val[c0000000]

Any suggestions?

Is this perhaps just broken hardware, or a driver issue? (I had the 
Sun nxge driver working for around 180 days on the same card, so I 
would assume the hardware is OK.)

-- 
Jesper
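
The "would not clear" message pairs a mask (40000000) with the value read back (c0000000), so it can be decoded bit by bit. A minimal sketch follows; the bit names in `ASSUMED_BITS` are an assumption based on how the niu driver header is believed to name these bits, and should be checked against the driver source before relying on them:

```python
# Assumed bit names for RXDMA_CFIG1; verify against the driver source.
ASSUMED_BITS = {
    0x80000000: "EN",
    0x40000000: "RST",
    0x20000000: "QST",
}

def stuck_bits(mask, value):
    """Return the bits from mask that are still set in value."""
    return mask & value

def describe(value):
    """List the assumed names of the bits set in value, high bit first."""
    return [name for bit, name in sorted(ASSUMED_BITS.items(), reverse=True)
            if value & bit]

mask, value = 0x40000000, 0xC0000000  # taken from the log line
print(hex(stuck_bits(mask, value)))
print(describe(value))
```

Under those assumed names, the value read back has both the enable bit and the reset bit set, and the bit the driver waited on is the reset bit.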



* Re: niu driver - Transmit timed out - 2.6.29
  2009-03-27 19:31 ` Jesper Krogh
@ 2009-03-28  0:42   ` Matheos Worku
  2009-03-28  6:05     ` Jesper Krogh
  0 siblings, 1 reply; 6+ messages in thread
From: Matheos Worku @ 2009-03-28  0:42 UTC
  To: Jesper Krogh; +Cc: netdev@vger.kernel.org

Jesper Krogh wrote:
> This is probably the interesting part:
> Mar 26 13:25:49 hest kernel: [25335.587191] niu 0000:84:00.0: niu: 
> eth4: bits (40000000) of register RXDMA_CFIG1 would not clear, 
> val[c0000000]
Jesper,

One of the RX ring DMAs is failing to reset. I guess whatever is 
hanging the TX side is affecting the RX side as well. Can you run lspci 
on the function and its siblings?
Regards
Matheos

>
> Any suggestions?
>
> Is this perhaps just broken hardware.. or a driver issue?  (I had the 
> Sun nxge driver working for around 180 days on the same card.. so I 
> would assume the hardware is ok).
>
> Jesper
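
The "siblings" of a PCI function are the other functions that share its domain, bus and device number, here everything under 0000:84:00. A minimal sketch of selecting them from a list of addresses; the helper name `siblings` and the sample list are hypothetical, and on a live Linux system the names could instead be read from `/sys/bus/pci/devices`:

```python
import re

# Full PCI address: domain:bus:device.function
BDF = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")

def siblings(address, all_devices):
    """Return every function sharing domain, bus and device with address."""
    if BDF.match(address) is None:
        raise ValueError(f"not a PCI address: {address!r}")
    prefix = address.rsplit(".", 1)[0] + "."
    return sorted(d for d in all_devices if d.startswith(prefix))

# A fixed list keeps the sketch portable; os.listdir("/sys/bus/pci/devices")
# would supply real names on Linux.
devices = ["0000:84:00.0", "0000:84:00.1", "0000:84:00.2",
           "0000:84:00.3", "0000:85:00.0"]
print(siblings("0000:84:00.0", devices))
```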



* Re: niu driver - Transmit timed out - 2.6.29
  2009-03-28  0:42   ` Matheos Worku
@ 2009-03-28  6:05     ` Jesper Krogh
  2009-03-28  6:18       ` Matheos Worku
  0 siblings, 1 reply; 6+ messages in thread
From: Jesper Krogh @ 2009-03-28  6:05 UTC
  To: Matheos Worku; +Cc: netdev@vger.kernel.org

Matheos Worku wrote:
>> This is probably the interesting part:
>> Mar 26 13:25:49 hest kernel: [25335.587191] niu 0000:84:00.0: niu: 
>> eth4: bits (40000000) of register RXDMA_CFIG1 would not clear, 
>> val[c0000000]
> Jesper,
> 
> One of the RX  ring DMAs  is failing to reset. I guess whatever is 
> hanging the TX side is affecting the RX side as well. Can you do lspci 
> on the function  and its siblings?

Like this? (Please tell me if this isn't the lspci output you wanted.)

k# lspci -vvv -s 84:00
84:00.0 Ethernet controller: Sun Microsystems Computer Corp. 
Multithreaded 10 Gigabit Ethernet Network Controller (rev 01)
         Subsystem: Sun Microsystems Computer Corp. Unknown device 0000
         Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- 
ParErr- Stepping- SERR- FastB2B-
         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- 
<TAbort- <MAbort- >SERR- <PERR-
         Latency: 0, Cache Line Size: 64 bytes
         Interrupt: pin A routed to IRQ 43
         Region 0: Memory at fd000000 (64-bit, non-prefetchable) [size=16M]
         Region 2: Memory at fe9f8000 (64-bit, non-prefetchable) [size=32K]
         Region 4: Memory at fe9f0000 (64-bit, non-prefetchable) [size=32K]
         Expansion ROM at fe800000 [disabled] [size=1M]
         Capabilities: [40] Power Management version 2
                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA 
PME(D0-,D1-,D2-,D3hot-,D3cold-)
                 Status: D0 PME-Enable- DSel=0 DScale=0 PME-
         Capabilities: [50] Message Signalled Interrupts: Mask+ 64bit+ 
Queue=0/5 Enable-
                 Address: 0000000000000000  Data: 0000
                 Masking: 00000000  Pending: 00000000
         Capabilities: [70] MSI-X: Enable+ Mask- TabSize=32
                 Vector table: BAR=2 offset=00000000
                 PBA: BAR=2 offset=00004000
         Capabilities: [80] Express Endpoint IRQ 0
                 Device: Supported: MaxPayload 1024 bytes, PhantFunc 0, 
ExtTag-
                 Device: Latency L0s <4us, L1 <8us
                 Device: AtnBtn- AtnInd- PwrInd-
                 Device: Errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
                 Device: RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
                 Device: MaxPayload 128 bytes, MaxReadReq 128 bytes
                 Link: Supported Speed 2.5Gb/s, Width x8, ASPM L0s, Port 1
                 Link: Latency L0s <512ns, L1 <64us
                 Link: ASPM Disabled RCB 128 bytes CommClk- ExtSynch-
                 Link: Speed 2.5Gb/s, Width x8
         Capabilities: [94] Vendor Specific Information
         Capabilities: [9c] Vendor Specific Information

84:00.1 Ethernet controller: Sun Microsystems Computer Corp. 
Multithreaded 10 Gigabit Ethernet Network Controller (rev 01)
         Subsystem: Sun Microsystems Computer Corp. Unknown device 0000
         Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- 
ParErr- Stepping- SERR- FastB2B-
         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- 
<TAbort- <MAbort- >SERR- <PERR-
         Latency: 0, Cache Line Size: 64 bytes
         Interrupt: pin B routed to IRQ 42
         Region 0: Memory at fc000000 (64-bit, non-prefetchable) [size=16M]
         Region 2: Memory at fe9e8000 (64-bit, non-prefetchable) [size=32K]
         Region 4: Memory at fe9e0000 (64-bit, non-prefetchable) [size=32K]
         Expansion ROM at fe700000 [disabled] [size=1M]
         Capabilities: [40] Power Management version 2
                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA 
PME(D0-,D1-,D2-,D3hot-,D3cold-)
                 Status: D0 PME-Enable- DSel=0 DScale=0 PME-
         Capabilities: [50] Message Signalled Interrupts: Mask+ 64bit+ 
Queue=0/5 Enable-
                 Address: 0000000000000000  Data: 0000
                 Masking: 00000000  Pending: 00000000
         Capabilities: [70] MSI-X: Enable- Mask- TabSize=32
                 Vector table: BAR=2 offset=00000000
                 PBA: BAR=2 offset=00004000
         Capabilities: [80] Express Endpoint IRQ 0
                 Device: Supported: MaxPayload 1024 bytes, PhantFunc 0, 
ExtTag-
                 Device: Latency L0s <4us, L1 <8us
                 Device: AtnBtn- AtnInd- PwrInd-
                 Device: Errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
                 Device: RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
                 Device: MaxPayload 128 bytes, MaxReadReq 128 bytes
                 Link: Supported Speed 2.5Gb/s, Width x8, ASPM L0s, Port 1
                 Link: Latency L0s <512ns, L1 <64us
                 Link: ASPM Disabled RCB 128 bytes CommClk- ExtSynch-
                 Link: Speed 2.5Gb/s, Width x8
         Capabilities: [94] Vendor Specific Information
         Capabilities: [9c] Vendor Specific Information

84:00.2 Ethernet controller: Sun Microsystems Computer Corp. 
Multithreaded 10 Gigabit Ethernet Network Controller (rev 01)
         Subsystem: Sun Microsystems Computer Corp. Unknown device 0000
         Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- 
ParErr- Stepping- SERR- FastB2B-
         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- 
<TAbort- <MAbort- >SERR- <PERR-
         Interrupt: pin C routed to IRQ 41
         Region 0: Memory at fb000000 (64-bit, non-prefetchable) [size=16M]
         Region 2: Memory at fe9d8000 (64-bit, non-prefetchable) [size=32K]
         Region 4: Memory at fe9d0000 (64-bit, non-prefetchable) [size=32K]
         Expansion ROM at fe600000 [disabled] [size=1M]
         Capabilities: [40] Power Management version 2
                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA 
PME(D0-,D1-,D2-,D3hot-,D3cold-)
                 Status: D0 PME-Enable- DSel=0 DScale=0 PME-
         Capabilities: [50] Message Signalled Interrupts: Mask+ 64bit+ 
Queue=0/5 Enable-
                 Address: 0000000000000000  Data: 0000
                 Masking: 00000000  Pending: 00000000
         Capabilities: [70] MSI-X: Enable- Mask- TabSize=32
                 Vector table: BAR=2 offset=00000000
                 PBA: BAR=2 offset=00004000
         Capabilities: [80] Express Endpoint IRQ 0
                 Device: Supported: MaxPayload 1024 bytes, PhantFunc 0, 
ExtTag-
                 Device: Latency L0s <4us, L1 <8us
                 Device: AtnBtn- AtnInd- PwrInd-
                 Device: Errors: Correctable- Non-Fatal+ Fatal+ Unsupported-
                 Device: RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                 Device: MaxPayload 128 bytes, MaxReadReq 128 bytes
                 Link: Supported Speed 2.5Gb/s, Width x8, ASPM L0s, Port 1
                 Link: Latency L0s <512ns, L1 <64us
                 Link: ASPM Disabled RCB 128 bytes CommClk- ExtSynch-
                 Link: Speed 2.5Gb/s, Width x8
         Capabilities: [94] Vendor Specific Information
         Capabilities: [9c] Vendor Specific Information

84:00.3 Ethernet controller: Sun Microsystems Computer Corp. 
Multithreaded 10 Gigabit Ethernet Network Controller (rev 01)
         Subsystem: Sun Microsystems Computer Corp. Unknown device 0000
         Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- 
ParErr- Stepping- SERR- FastB2B-
         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- 
<TAbort- <MAbort- >SERR- <PERR-
         Interrupt: pin D routed to IRQ 40
         Region 0: Memory at fa000000 (64-bit, non-prefetchable) [size=16M]
         Region 2: Memory at fe9c8000 (64-bit, non-prefetchable) [size=32K]
         Region 4: Memory at fe9c0000 (64-bit, non-prefetchable) [size=32K]
         Expansion ROM at fe500000 [disabled] [size=1M]
         Capabilities: [40] Power Management version 2
                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA 
PME(D0-,D1-,D2-,D3hot-,D3cold-)
                 Status: D0 PME-Enable- DSel=0 DScale=0 PME-
         Capabilities: [50] Message Signalled Interrupts: Mask+ 64bit+ 
Queue=0/5 Enable-
                 Address: 0000000000000000  Data: 0000
                 Masking: 00000000  Pending: 00000000
         Capabilities: [70] MSI-X: Enable- Mask- TabSize=32
                 Vector table: BAR=2 offset=00000000
                 PBA: BAR=2 offset=00004000
         Capabilities: [80] Express Endpoint IRQ 0
                 Device: Supported: MaxPayload 1024 bytes, PhantFunc 0, 
ExtTag-
                 Device: Latency L0s <4us, L1 <8us
                 Device: AtnBtn- AtnInd- PwrInd-
                 Device: Errors: Correctable- Non-Fatal+ Fatal+ Unsupported-
                 Device: RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                 Device: MaxPayload 128 bytes, MaxReadReq 128 bytes
                 Link: Supported Speed 2.5Gb/s, Width x8, ASPM L0s, Port 1
                 Link: Latency L0s <512ns, L1 <64us
                 Link: ASPM Disabled RCB 128 bytes CommClk- ExtSynch-
                 Link: Speed 2.5Gb/s, Width x8
         Capabilities: [94] Vendor Specific Information
         Capabilities: [9c] Vendor Specific Information

-- 
Jesper
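
In the output above, functions 84:00.0 and 84:00.1 report BusMaster+ while 84:00.2 and 84:00.3 report BusMaster-, and only 84:00.0 shows MSI-X Enable+. A minimal sketch of pulling those two flags out of `lspci -vvv` text so that dumps taken before and after a hang can be compared; the function name `summarize` is hypothetical and the sample is abbreviated from the output above:

```python
import re

FUNC = re.compile(r"^([0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) ")
BUSMASTER = re.compile(r"BusMaster([+-])")
MSIX = re.compile(r"MSI-X: Enable([+-])")

def summarize(text):
    """Map each function address to its BusMaster and MSI-X enable flags."""
    summary, current = {}, None
    for line in text.splitlines():
        m = FUNC.match(line)
        if m:
            # An unindented "bus:dev.fn" line starts a new function.
            current = m.group(1)
            summary[current] = {}
            continue
        if current is None:
            continue
        for key, pattern in (("busmaster", BUSMASTER), ("msix", MSIX)):
            found = pattern.search(line)
            if found:
                summary[current][key] = found.group(1) == "+"
    return summary

sample = """\
84:00.0 Ethernet controller: Sun Microsystems Computer Corp.
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
        Capabilities: [70] MSI-X: Enable+ Mask- TabSize=32
84:00.2 Ethernet controller: Sun Microsystems Computer Corp.
        Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop-
        Capabilities: [70] MSI-X: Enable- Mask- TabSize=32
"""
for addr, flags in summarize(sample).items():
    print(addr, flags)
```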


* Re: niu driver - Transmit timed out - 2.6.29
  2009-03-28  6:05     ` Jesper Krogh
@ 2009-03-28  6:18       ` Matheos Worku
  2009-03-28  7:25         ` Jesper Krogh
  0 siblings, 1 reply; 6+ messages in thread
From: Matheos Worku @ 2009-03-28  6:18 UTC
  To: Jesper Krogh; +Cc: netdev@vger.kernel.org

Jesper Krogh wrote:
> Matheos Worku wrote:
>>> This is probably the interesting part:
>>> Mar 26 13:25:49 hest kernel: [25335.587191] niu 0000:84:00.0: niu: 
>>> eth4: bits (40000000) of register RXDMA_CFIG1 would not clear, 
>>> val[c0000000]
>> Jesper,
>>
>> One of the RX  ring DMAs  is failing to reset. I guess whatever is 
>> hanging the TX side is affecting the RX side as well. Can you do lspci 
>> on the function  and its siblings?
> 
> Like this(please guide me if that wasn't the correct lspci output):

Jesper,

I was wondering if you could get the register dump just after the NIC hangs:

lspci -vvv -xxx -s 84:0

Regards
Matheos
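
The `-xxx` option asks lspci for a hex dump of the function's PCI configuration space, which generally requires root. On Linux the same bytes are also exposed through sysfs, so a dump can be saved and decoded offline. A minimal sketch, assuming the standard configuration header layout; `parse_header` and `read_config` are hypothetical helper names, and the example bytes are made up for illustration, not taken from the thread:

```python
import struct

def parse_header(config):
    """Decode the first fields of a PCI configuration header (little-endian)."""
    if len(config) < 8:
        raise ValueError("need at least 8 bytes of configuration space")
    vendor, device, command, status = struct.unpack_from("<HHHH", config, 0)
    return {
        "vendor": vendor,
        "device": device,
        "bus_master": bool(command & 0x0004),  # Command register bit 2
        "cap_list": bool(status & 0x0010),     # Status register bit 4
    }

def read_config(address, root="/sys/bus/pci/devices"):
    """Read raw configuration space for one function from sysfs (Linux)."""
    with open(f"{root}/{address}/config", "rb") as fh:
        return fh.read()

# Hypothetical bytes for illustration only.
example = bytes.fromhex("8e10cdab06001000")
print(parse_header(example))
```

Calling `parse_header(read_config("0000:84:00.0"))` right after a hang would give a compact record to set beside the lspci text.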




* Re: niu driver - Transmit timed out - 2.6.29
  2009-03-28  6:18       ` Matheos Worku
@ 2009-03-28  7:25         ` Jesper Krogh
  0 siblings, 0 replies; 6+ messages in thread
From: Jesper Krogh @ 2009-03-28  7:25 UTC
  To: Matheos Worku; +Cc: netdev@vger.kernel.org

Matheos Worku wrote:
> Jesper Krogh wrote:
>> Matheos Worku wrote:
>>>> This is probably the interesting part:
>>>> Mar 26 13:25:49 hest kernel: [25335.587191] niu 0000:84:00.0: niu: 
>>>> eth4: bits (40000000) of register RXDMA_CFIG1 would not clear, 
>>>> val[c0000000]
>>> Jesper,
>>>
>>> One of the RX  ring DMAs  is failing to reset. I guess whatever is 
>>> hanging the TX side is affecting the RX side as well. Can you do 
>>> lspci on the function  and its siblings?
>>
>> Like this(please guide me if that wasn't the correct lspci output):
> 
> Jesper,
> 
> I was wondering if you can get the register dump just after the NIC hangs.
> 
> lspci -vvv -xxx -s 84:0

I will try to do that, but it more or less involves putting a known-bad 
driver into production and waiting X days (where X is usually more than 2 
and less than 7). So if there is more debugging code that would be 
helpful to have in the driver/kernel, it would be preferable to get 
it in at the same time, to reduce the number of trial-and-error 
cycles.

-- 
Jesper
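
Since the hang takes days to appear, one way to avoid missing the moment is to trigger the dump automatically when the timeout message shows up in the kernel log. A minimal sketch under stated assumptions: `watch` and `lspci_command` are hypothetical names, the caller supplies the log lines from whatever source is available on the system, and the `run` parameter exists only so the dump command can be replaced in a dry run:

```python
import subprocess

TRIGGER = "Transmit timed out"

def lspci_command(slot):
    """Build the register-dump command suggested in the thread."""
    return ["lspci", "-vvv", "-xxx", "-s", slot]

def watch(lines, slot, run=subprocess.run, limit=1):
    """Run the dump each time a line contains TRIGGER, at most limit times."""
    fired = 0
    for line in lines:
        if TRIGGER in line:
            run(lspci_command(slot), check=False)
            fired += 1
            if fired >= limit:
                break
    return fired

# Dry run: record the command instead of executing it.
calls = []
sample = ["niu 0000:84:00.0: niu: eth4: Transmit timed out, resetting\n"]
watch(sample, "84:0", run=lambda cmd, check: calls.append(cmd))
print(calls)
```

The `limit` argument keeps the watcher from re-running the dump on every repeated timeout message; the first dump after the hang is the one of interest.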


