* niu driver - Transmit timed out - 2.6.29
@ 2009-03-26 12:44 Jesper Krogh
2009-03-27 19:31 ` Jesper Krogh
0 siblings, 1 reply; 6+ messages in thread
From: Jesper Krogh @ 2009-03-26 12:44 UTC (permalink / raw)
To: netdev@vger.kernel.org
Ok. I was just so happy .. (See "Status update on Sun Neptune 10Gbit
driver" earlier).
But then it "blew up" again:
Mar 26 13:25:49 hest kernel: [25335.505049] ------------[ cut here
]------------
Mar 26 13:25:49 hest kernel: [25335.505055] WARNING: at
net/sched/sch_generic.c:226 dev_watchdog+0x1fd/0x210()
Mar 26 13:25:49 hest kernel: [25335.505057] Hardware name: Sun Fire X4600 M2
Mar 26 13:25:49 hest kernel: [25335.505059] NETDEV WATCHDOG: eth4 (niu):
transmit timed out
Mar 26 13:25:49 hest kernel: [25335.505060] Modules linked in: af_packet
ext4 jbd2 crc16 nfsd exportfs autofs4 nfs lockd auth_rpcgss sunrpc
iptable_filter ip_tables x_tables ib_iser rdma_cm ib_cm iw_cm ib_sa
ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi
scsi_transport_iscsi ipv6 parport_pc lp parport loop sr_mod joydev
psmouse niu usb_storage usbhid i2c_nforce2 libusual hid serio_raw pcspkr
shpchp k8temp pci_hotplug i2c_core button evdev ext3 jbd mbcache
ide_cd_mod cdrom sg sd_mod ata_generic libata mptsas mptspi mptscsih
qla2xxx mptbase scsi_transport_sas scsi_transport_fc ehci_hcd
scsi_transport_spi ohci_hcd e1000 scsi_mod amd74xx usbcore dm_mirror
dm_region_hash dm_log dm_snapshot dm_mod thermal processor fan
thermal_sys fuse
Mar 26 13:25:49 hest kernel: [25335.505109] Pid: 0, comm: swapper Not
tainted 2.6.29 #30
Mar 26 13:25:49 hest kernel: [25335.505111] Call Trace:
Mar 26 13:25:49 hest kernel: [25335.505113] <IRQ> [<ffffffff8023d5c2>]
warn_slowpath+0xf2/0x130
Mar 26 13:25:49 hest kernel: [25335.505124] [<ffffffff80239d2d>]
task_tick_fair+0x4d/0xd0
Mar 26 13:25:49 hest kernel: [25335.505130] [<ffffffff80355e33>]
cpumask_next_and+0x23/0x40
Mar 26 13:25:49 hest kernel: [25335.505132] [<ffffffff80233f84>]
find_busiest_group+0x204/0x870
Mar 26 13:25:49 hest kernel: [25335.505136] [<ffffffff8035b65e>]
strlcpy+0x4e/0x80
Mar 26 13:25:49 hest kernel: [25335.505138] [<ffffffff8041f11d>]
dev_watchdog+0x1fd/0x210
Mar 26 13:25:49 hest kernel: [25335.505141] [<ffffffff80235ac5>]
run_rebalance_domains+0x3c5/0x530
Mar 26 13:25:49 hest kernel: [25335.505143] [<ffffffff802474bb>]
run_timer_softirq+0x1bb/0x230
Mar 26 13:25:49 hest kernel: [25335.505148] [<ffffffff802574e1>]
sched_clock_cpu+0x131/0x180
Mar 26 13:25:49 hest kernel: [25335.505151] [<ffffffff80242cdb>]
__do_softirq+0x8b/0x150
Mar 26 13:25:49 hest kernel: [25335.505155] [<ffffffff8020d3bc>]
call_softirq+0x1c/0x30
Mar 26 13:25:49 hest kernel: [25335.505157] [<ffffffff8020e505>]
do_softirq+0x35/0x80
Mar 26 13:25:49 hest kernel: [25335.505161] [<ffffffff8021f715>]
smp_apic_timer_interrupt+0x85/0xd0
Mar 26 13:25:49 hest kernel: [25335.505163] [<ffffffff8020cdf3>]
apic_timer_interrupt+0x13/0x20
Mar 26 13:25:49 hest kernel: [25335.505164] <EOI> [<ffffffff80212dc7>]
default_idle+0x27/0x40
Mar 26 13:25:49 hest kernel: [25335.505169] [<ffffffff80212fea>]
c1e_idle+0xba/0x100
Mar 26 13:25:49 hest kernel: [25335.505171] [<ffffffff8020ae80>]
cpu_idle+0x40/0x70
Mar 26 13:25:49 hest kernel: [25335.505173] ---[ end trace
e6e4f250dc22390d ]---
It is fairly hard to reproduce and generally pops up after a few days
of production. But I am willing to test patches that would help resolve
this problem, as both the niu driver and the NFSD on 2.6.29 really
outperform the 2.6.26-rc4 + nxge driver I'm currently using.
Hardware: Sun Fire X4600, 32GB of memory
--
Jesper
* Re: niu driver - Transmit timed out - 2.6.29
2009-03-26 12:44 niu driver - Transmit timed out - 2.6.29 Jesper Krogh
@ 2009-03-27 19:31 ` Jesper Krogh
2009-03-28 0:42 ` Matheos Worku
0 siblings, 1 reply; 6+ messages in thread
From: Jesper Krogh @ 2009-03-27 19:31 UTC (permalink / raw)
To: netdev@vger.kernel.org
There was actually a bit more in the log:
Mar 26 13:25:49 hest kernel: [25335.505176] niu 0000:84:00.0: niu: eth4:
Transmit timed out, resetting
Mar 26 13:25:49 hest kernel: [25335.587191] niu 0000:84:00.0: niu: eth4:
bits (40000000) of register RXDMA_CFIG1 would not clear, val[c0000000]
Mar 26 13:25:49 hest last message repeated 4 times
Mar 26 13:25:58 hest kernel: [25345.504898] niu 0000:84:00.0: niu: eth4:
Transmit timed out, resetting
Mar 26 13:26:08 hest kernel: [25355.504758] niu 0000:84:00.0: niu: eth4:
Transmit timed out, resetting
Mar 26 13:26:13 hest kernel: [25360.504687] niu 0000:84:00.0: niu: eth4:
Transmit timed out, resetting
Mar 26 13:26:18 hest kernel: [25365.504619] niu 0000:84:00.0: niu: eth4:
Transmit timed out, resetting
Mar 26 13:26:23 hest kernel: [25370.504549] niu 0000:84:00.0: niu: eth4:
Transmit timed out, resetting
Mar 26 13:26:28 hest kernel: [25375.504479] niu 0000:84:00.0: niu: eth4:
Transmit timed out, resetting
Mar 26 13:26:33 hest kernel: [25380.504409] niu 0000:84:00.0: niu: eth4:
Transmit timed out, resetting
Mar 26 13:26:38 hest kernel: [25385.504340] niu 0000:84:00.0: niu: eth4:
Transmit timed out, resetting
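As a rough aid for reading the log above, the following Python sketch computes the spacing between successive "Transmit timed out, resetting" messages from the kernel timestamps. The helper name `reset_intervals` is hypothetical, and the log lines are the ones quoted above with the wrapped lines rejoined.

```python
import re

# Kernel-timestamped reset messages copied from the log above (lines rejoined).
LOG = """\
[25335.505176] niu 0000:84:00.0: niu: eth4: Transmit timed out, resetting
[25345.504898] niu 0000:84:00.0: niu: eth4: Transmit timed out, resetting
[25355.504758] niu 0000:84:00.0: niu: eth4: Transmit timed out, resetting
[25360.504687] niu 0000:84:00.0: niu: eth4: Transmit timed out, resetting
[25365.504619] niu 0000:84:00.0: niu: eth4: Transmit timed out, resetting
[25370.504549] niu 0000:84:00.0: niu: eth4: Transmit timed out, resetting
[25375.504479] niu 0000:84:00.0: niu: eth4: Transmit timed out, resetting
[25380.504409] niu 0000:84:00.0: niu: eth4: Transmit timed out, resetting
[25385.504340] niu 0000:84:00.0: niu: eth4: Transmit timed out, resetting
"""

STAMP = re.compile(r"\[\s*(\d+\.\d+)\]")


def reset_intervals(log_text):
    """Return the gaps (seconds, rounded to 0.1) between reset messages."""
    stamps = []
    for line in log_text.splitlines():
        if "Transmit timed out, resetting" not in line:
            continue
        match = STAMP.search(line)
        if match:
            stamps.append(float(match.group(1)))
    return [round(later - earlier, 1) for earlier, later in zip(stamps, stamps[1:])]


print(reset_intervals(LOG))
```

The gaps come out as two of roughly 10 seconds followed by a steady 5 seconds, so the watchdog keeps firing after every reset attempt, which suggests the resets are not bringing the transmit side back.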
This is probably the interesting part:
Mar 26 13:25:49 hest kernel: [25335.587191] niu 0000:84:00.0: niu: eth4:
bits (40000000) of register RXDMA_CFIG1 would not clear, val[c0000000]
Any suggestions?
Is this perhaps just broken hardware.. or a driver issue? (I had the
Sun nxge driver working for around 180 days on the same card.. so I
would assume the hardware is ok).
Jesper
--
Jesper
* Re: niu driver - Transmit timed out - 2.6.29
2009-03-27 19:31 ` Jesper Krogh
@ 2009-03-28 0:42 ` Matheos Worku
2009-03-28 6:05 ` Jesper Krogh
0 siblings, 1 reply; 6+ messages in thread
From: Matheos Worku @ 2009-03-28 0:42 UTC (permalink / raw)
To: Jesper Krogh; +Cc: netdev@vger.kernel.org
Jesper Krogh wrote:
> This is probably the interesting part:
> Mar 26 13:25:49 hest kernel: [25335.587191] niu 0000:84:00.0: niu:
> eth4: bits (40000000) of register RXDMA_CFIG1 would not clear,
> val[c0000000]
Jesper,
One of the RX ring DMAs is failing to reset. I guess whatever is
hanging the TX side is affecting the RX side as well. Can you do lspci
on the function and its siblings?
Regards
Matheos
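To make the register message concrete, here is a minimal Python sketch that decodes it and models a "set the reset bit, then poll until it clears" loop. The bit names and values (`RXDMA_CFIG1_EN`, `RXDMA_CFIG1_RST`) are assumptions based on a reading of the niu driver header and should be checked against niu.h; `stuck_bits` and `wait_clear` are hypothetical helpers, not driver functions.

```python
import re

# Assumed bit layout of RXDMA_CFIG1 (verify against the driver's niu.h).
RXDMA_CFIG1_EN = 0x80000000   # assumed: channel enable
RXDMA_CFIG1_RST = 0x40000000  # assumed: channel reset, expected to self-clear

MSG = re.compile(
    r"bits \(([0-9a-f]+)\) of register (\w+) would not clear, val\[([0-9a-f]+)\]"
)


def stuck_bits(line):
    """Return (register name, bits still set) for a 'would not clear' message."""
    match = MSG.search(line)
    if match is None:
        return None
    bits = int(match.group(1), 16)
    value = int(match.group(3), 16)
    return match.group(2), bits & value


def wait_clear(read_reg, bits, limit):
    """Poll read_reg() up to `limit` times; True once all of `bits` read as 0."""
    for _ in range(limit):
        if (read_reg() & bits) == 0:
            return True
    return False


line = ("niu 0000:84:00.0: niu: eth4: bits (40000000) of register "
        "RXDMA_CFIG1 would not clear, val[c0000000]")
reg, stuck = stuck_bits(line)
print(reg, hex(stuck), "reset bit stuck:", stuck == RXDMA_CFIG1_RST)
```

Under these assumptions, the value c0000000 means that both the enable bit and the reset bit are still set when polling gives up, which matches the description of an RX DMA channel that fails to complete its reset.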
* Re: niu driver - Transmit timed out - 2.6.29
2009-03-28 0:42 ` Matheos Worku
@ 2009-03-28 6:05 ` Jesper Krogh
2009-03-28 6:18 ` Matheos Worku
0 siblings, 1 reply; 6+ messages in thread
From: Jesper Krogh @ 2009-03-28 6:05 UTC (permalink / raw)
To: Matheos Worku; +Cc: netdev@vger.kernel.org
Matheos Worku wrote:
>> This is probably the interesting part:
>> Mar 26 13:25:49 hest kernel: [25335.587191] niu 0000:84:00.0: niu:
>> eth4: bits (40000000) of register RXDMA_CFIG1 would not clear,
>> val[c0000000]
> Jesper,
>
> One of the RX ring DMAs is failing to reset. I guess whatever is
> hanging the TX side is affecting the RX side as well. Can you do lspci
> on the function and its siblings?
Like this (please guide me if that wasn't the correct lspci output):
k# lspci -vvv -s 84:00
84:00.0 Ethernet controller: Sun Microsystems Computer Corp.
Multithreaded 10 Gigabit Ethernet Network Controller (rev 01)
Subsystem: Sun Microsystems Computer Corp. Unknown device 0000
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 43
Region 0: Memory at fd000000 (64-bit, non-prefetchable) [size=16M]
Region 2: Memory at fe9f8000 (64-bit, non-prefetchable) [size=32K]
Region 4: Memory at fe9f0000 (64-bit, non-prefetchable) [size=32K]
Expansion ROM at fe800000 [disabled] [size=1M]
Capabilities: [40] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [50] Message Signalled Interrupts: Mask+ 64bit+
Queue=0/5 Enable-
Address: 0000000000000000 Data: 0000
Masking: 00000000 Pending: 00000000
Capabilities: [70] MSI-X: Enable+ Mask- TabSize=32
Vector table: BAR=2 offset=00000000
PBA: BAR=2 offset=00004000
Capabilities: [80] Express Endpoint IRQ 0
Device: Supported: MaxPayload 1024 bytes, PhantFunc 0,
ExtTag-
Device: Latency L0s <4us, L1 <8us
Device: AtnBtn- AtnInd- PwrInd-
Device: Errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
Device: RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
Device: MaxPayload 128 bytes, MaxReadReq 128 bytes
Link: Supported Speed 2.5Gb/s, Width x8, ASPM L0s, Port 1
Link: Latency L0s <512ns, L1 <64us
Link: ASPM Disabled RCB 128 bytes CommClk- ExtSynch-
Link: Speed 2.5Gb/s, Width x8
Capabilities: [94] Vendor Specific Information
Capabilities: [9c] Vendor Specific Information
84:00.1 Ethernet controller: Sun Microsystems Computer Corp.
Multithreaded 10 Gigabit Ethernet Network Controller (rev 01)
Subsystem: Sun Microsystems Computer Corp. Unknown device 0000
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin B routed to IRQ 42
Region 0: Memory at fc000000 (64-bit, non-prefetchable) [size=16M]
Region 2: Memory at fe9e8000 (64-bit, non-prefetchable) [size=32K]
Region 4: Memory at fe9e0000 (64-bit, non-prefetchable) [size=32K]
Expansion ROM at fe700000 [disabled] [size=1M]
Capabilities: [40] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [50] Message Signalled Interrupts: Mask+ 64bit+
Queue=0/5 Enable-
Address: 0000000000000000 Data: 0000
Masking: 00000000 Pending: 00000000
Capabilities: [70] MSI-X: Enable- Mask- TabSize=32
Vector table: BAR=2 offset=00000000
PBA: BAR=2 offset=00004000
Capabilities: [80] Express Endpoint IRQ 0
Device: Supported: MaxPayload 1024 bytes, PhantFunc 0,
ExtTag-
Device: Latency L0s <4us, L1 <8us
Device: AtnBtn- AtnInd- PwrInd-
Device: Errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
Device: RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
Device: MaxPayload 128 bytes, MaxReadReq 128 bytes
Link: Supported Speed 2.5Gb/s, Width x8, ASPM L0s, Port 1
Link: Latency L0s <512ns, L1 <64us
Link: ASPM Disabled RCB 128 bytes CommClk- ExtSynch-
Link: Speed 2.5Gb/s, Width x8
Capabilities: [94] Vendor Specific Information
Capabilities: [9c] Vendor Specific Information
84:00.2 Ethernet controller: Sun Microsystems Computer Corp.
Multithreaded 10 Gigabit Ethernet Network Controller (rev 01)
Subsystem: Sun Microsystems Computer Corp. Unknown device 0000
Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Interrupt: pin C routed to IRQ 41
Region 0: Memory at fb000000 (64-bit, non-prefetchable) [size=16M]
Region 2: Memory at fe9d8000 (64-bit, non-prefetchable) [size=32K]
Region 4: Memory at fe9d0000 (64-bit, non-prefetchable) [size=32K]
Expansion ROM at fe600000 [disabled] [size=1M]
Capabilities: [40] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [50] Message Signalled Interrupts: Mask+ 64bit+
Queue=0/5 Enable-
Address: 0000000000000000 Data: 0000
Masking: 00000000 Pending: 00000000
Capabilities: [70] MSI-X: Enable- Mask- TabSize=32
Vector table: BAR=2 offset=00000000
PBA: BAR=2 offset=00004000
Capabilities: [80] Express Endpoint IRQ 0
Device: Supported: MaxPayload 1024 bytes, PhantFunc 0,
ExtTag-
Device: Latency L0s <4us, L1 <8us
Device: AtnBtn- AtnInd- PwrInd-
Device: Errors: Correctable- Non-Fatal+ Fatal+ Unsupported-
Device: RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
Device: MaxPayload 128 bytes, MaxReadReq 128 bytes
Link: Supported Speed 2.5Gb/s, Width x8, ASPM L0s, Port 1
Link: Latency L0s <512ns, L1 <64us
Link: ASPM Disabled RCB 128 bytes CommClk- ExtSynch-
Link: Speed 2.5Gb/s, Width x8
Capabilities: [94] Vendor Specific Information
Capabilities: [9c] Vendor Specific Information
84:00.3 Ethernet controller: Sun Microsystems Computer Corp.
Multithreaded 10 Gigabit Ethernet Network Controller (rev 01)
Subsystem: Sun Microsystems Computer Corp. Unknown device 0000
Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Interrupt: pin D routed to IRQ 40
Region 0: Memory at fa000000 (64-bit, non-prefetchable) [size=16M]
Region 2: Memory at fe9c8000 (64-bit, non-prefetchable) [size=32K]
Region 4: Memory at fe9c0000 (64-bit, non-prefetchable) [size=32K]
Expansion ROM at fe500000 [disabled] [size=1M]
Capabilities: [40] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [50] Message Signalled Interrupts: Mask+ 64bit+
Queue=0/5 Enable-
Address: 0000000000000000 Data: 0000
Masking: 00000000 Pending: 00000000
Capabilities: [70] MSI-X: Enable- Mask- TabSize=32
Vector table: BAR=2 offset=00000000
PBA: BAR=2 offset=00004000
Capabilities: [80] Express Endpoint IRQ 0
Device: Supported: MaxPayload 1024 bytes, PhantFunc 0,
ExtTag-
Device: Latency L0s <4us, L1 <8us
Device: AtnBtn- AtnInd- PwrInd-
Device: Errors: Correctable- Non-Fatal+ Fatal+ Unsupported-
Device: RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
Device: MaxPayload 128 bytes, MaxReadReq 128 bytes
Link: Supported Speed 2.5Gb/s, Width x8, ASPM L0s, Port 1
Link: Latency L0s <512ns, L1 <64us
Link: ASPM Disabled RCB 128 bytes CommClk- ExtSynch-
Link: Speed 2.5Gb/s, Width x8
Capabilities: [94] Vendor Specific Information
Capabilities: [9c] Vendor Specific Information
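Since the four functions differ in only a few flags, a small Python sketch can condense `lspci -vvv` text into a per-function summary. The helper name `summarize` is hypothetical, and the parsing assumes the flag spellings shown in the output above (`BusMaster+`/`BusMaster-`, `MSI-X: Enable+`/`Enable-`).

```python
import re

FUNCTION_HEADER = re.compile(r"([0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) ")


def summarize(lspci_text):
    """Map each PCI function to its BusMaster and MSI-X enable flags ('+' or '-')."""
    summary = {}
    current = None
    for raw in lspci_text.splitlines():
        line = raw.strip()
        header = FUNCTION_HEADER.match(line)
        if header:
            # A new "bus:dev.fn" header starts the section for that function.
            current = header.group(1)
            summary[current] = {}
            continue
        if current is None:
            continue
        bus_master = re.search(r"BusMaster([+-])", line)
        if bus_master:
            summary[current]["BusMaster"] = bus_master.group(1)
        msix = re.search(r"MSI-X: Enable([+-])", line)
        if msix:
            summary[current]["MSI-X"] = msix.group(1)
    return summary
```

Applied to the output above, this would report BusMaster+ for 84:00.0 and 84:00.1, BusMaster- for 84:00.2 and 84:00.3, and MSI-X enabled only on 84:00.0.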
--
Jesper
* Re: niu driver - Transmit timed out - 2.6.29
2009-03-28 6:05 ` Jesper Krogh
@ 2009-03-28 6:18 ` Matheos Worku
2009-03-28 7:25 ` Jesper Krogh
0 siblings, 1 reply; 6+ messages in thread
From: Matheos Worku @ 2009-03-28 6:18 UTC (permalink / raw)
To: Jesper Krogh; +Cc: netdev@vger.kernel.org
Jesper Krogh wrote:
> Matheos Worku wrote:
>>> This is probably the interesting part:
>>> Mar 26 13:25:49 hest kernel: [25335.587191] niu 0000:84:00.0: niu:
>>> eth4: bits (40000000) of register RXDMA_CFIG1 would not clear,
>>> val[c0000000]
>> Jesper,
>>
>> One of the RX ring DMAs is failing to reset. I guess whatever is
>> hanging the TX side is affecting the RX side as well. Can you do lspci
>> on the function and its siblings?
>
> Like this(please guide me if that wasn't the correct lspci output):
Jesper,
I was wondering if you can get the register dump just after the NIC hangs.
lspci -vvv -xxx -s 84:0
Regards
Matheos
* Re: niu driver - Transmit timed out - 2.6.29
2009-03-28 6:18 ` Matheos Worku
@ 2009-03-28 7:25 ` Jesper Krogh
0 siblings, 0 replies; 6+ messages in thread
From: Jesper Krogh @ 2009-03-28 7:25 UTC (permalink / raw)
To: Matheos Worku; +Cc: netdev@vger.kernel.org
Matheos Worku wrote:
> Jesper Krogh wrote:
>> Matheos Worku wrote:
>>>> This is probably the interesting part:
>>>> Mar 26 13:25:49 hest kernel: [25335.587191] niu 0000:84:00.0: niu:
>>>> eth4: bits (40000000) of register RXDMA_CFIG1 would not clear,
>>>> val[c0000000]
>>> Jesper,
>>>
>>> One of the RX ring DMAs is failing to reset. I guess whatever is
>>> hanging the TX side is affecting the RX side as well. Can you do
>>> lspci on the function and its siblings?
>>
>> Like this(please guide me if that wasn't the correct lspci output):
>
> Jesper,
>
> I was wondering if you can get the register dump just after the NIC hangs.
>
> lspci -vvv -xxx -s 84:0
I will try to do that, but it involves more or less "putting a known bad
driver" in production and waiting X days (where X is usually more than 2
and less than 7). So if there is more debugging code that would be
helpful to have in the driver/kernel, it would be preferable to get it
in at the same time, in order to reduce the number of trial-and-error
cycles.
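Because the hang may take days to appear, one way to avoid missing the moment is to trigger the requested dump automatically. The Python sketch below watches a stream of kernel log lines and runs the `lspci -vvv -xxx -s 84:0` command from this thread on the first transmit timeout. The names `capture_on_hang` and `default_dump` and the output file name are hypothetical; `-xxx` generally needs root.

```python
import subprocess


def default_dump():
    """Run the lspci command suggested in this thread and return its output."""
    result = subprocess.run(
        ["lspci", "-vvv", "-xxx", "-s", "84:0"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout


def capture_on_hang(log_lines, dump=default_dump, out_path="niu-hang-lspci.txt"):
    """Save one register dump at the first niu transmit-timeout log line.

    Returns the triggering line, or None if the input ends without a match.
    """
    for line in log_lines:
        lowered = line.lower()
        if "niu" in lowered and "transmit timed out" in lowered:
            with open(out_path, "w") as handle:
                # Keep the triggering message next to the dump for context.
                handle.write(line.rstrip("\n") + "\n")
                handle.write(dump())
            return line
    return None
```

A possible way to use it is to pipe a followed kernel log into a script that calls `capture_on_hang(sys.stdin)`; the log source is an assumption and depends on the syslog setup. The sketch takes a single dump at the first timeout and then stops, so the file reflects the state closest to the hang.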
--
Jesper