* niu driver - Transmit timed out - 2.6.29
From: Jesper Krogh @ 2009-03-26 12:44 UTC
To: netdev@vger.kernel.org

Ok. I was just so happy .. (See "Status update on Sun Neptune 10Gbit
driver earlier).

But then it "blew up" again:

Mar 26 13:25:49 hest kernel: [25335.505049] ------------[ cut here ]------------
Mar 26 13:25:49 hest kernel: [25335.505055] WARNING: at net/sched/sch_generic.c:226 dev_watchdog+0x1fd/0x210()
Mar 26 13:25:49 hest kernel: [25335.505057] Hardware name: Sun Fire X4600 M2
Mar 26 13:25:49 hest kernel: [25335.505059] NETDEV WATCHDOG: eth4 (niu): transmit timed out
Mar 26 13:25:49 hest kernel: [25335.505060] Modules linked in: af_packet ext4 jbd2 crc16 nfsd exportfs autofs4 nfs lockd auth_rpcgss sunrpc iptable_filter ip_tables x_tables ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipv6 parport_pc lp parport loop sr_mod joydev psmouse niu usb_storage usbhid i2c_nforce2 libusual hid serio_raw pcspkr shpchp k8temp pci_hotplug i2c_core button evdev ext3 jbd mbcache ide_cd_mod cdrom sg sd_mod ata_generic libata mptsas mptspi mptscsih qla2xxx mptbase scsi_transport_sas scsi_transport_fc ehci_hcd scsi_transport_spi ohci_hcd e1000 scsi_mod amd74xx usbcore dm_mirror dm_region_hash dm_log dm_snapshot dm_mod thermal processor fan thermal_sys fuse
Mar 26 13:25:49 hest kernel: [25335.505109] Pid: 0, comm: swapper Not tainted 2.6.29 #30
Mar 26 13:25:49 hest kernel: [25335.505111] Call Trace:
Mar 26 13:25:49 hest kernel: [25335.505113] <IRQ> [<ffffffff8023d5c2>] warn_slowpath+0xf2/0x130
Mar 26 13:25:49 hest kernel: [25335.505124] [<ffffffff80239d2d>] task_tick_fair+0x4d/0xd0
Mar 26 13:25:49 hest kernel: [25335.505130] [<ffffffff80355e33>] cpumask_next_and+0x23/0x40
Mar 26 13:25:49 hest kernel: [25335.505132] [<ffffffff80233f84>] find_busiest_group+0x204/0x870
Mar 26 13:25:49 hest kernel: [25335.505136] [<ffffffff8035b65e>] strlcpy+0x4e/0x80
Mar 26 13:25:49 hest kernel: [25335.505138] [<ffffffff8041f11d>] dev_watchdog+0x1fd/0x210
Mar 26 13:25:49 hest kernel: [25335.505141] [<ffffffff80235ac5>] run_rebalance_domains+0x3c5/0x530
Mar 26 13:25:49 hest kernel: [25335.505143] [<ffffffff802474bb>] run_timer_softirq+0x1bb/0x230
Mar 26 13:25:49 hest kernel: [25335.505148] [<ffffffff802574e1>] sched_clock_cpu+0x131/0x180
Mar 26 13:25:49 hest kernel: [25335.505151] [<ffffffff80242cdb>] __do_softirq+0x8b/0x150
Mar 26 13:25:49 hest kernel: [25335.505155] [<ffffffff8020d3bc>] call_softirq+0x1c/0x30
Mar 26 13:25:49 hest kernel: [25335.505157] [<ffffffff8020e505>] do_softirq+0x35/0x80
Mar 26 13:25:49 hest kernel: [25335.505161] [<ffffffff8021f715>] smp_apic_timer_interrupt+0x85/0xd0
Mar 26 13:25:49 hest kernel: [25335.505163] [<ffffffff8020cdf3>] apic_timer_interrupt+0x13/0x20
Mar 26 13:25:49 hest kernel: [25335.505164] <EOI> [<ffffffff80212dc7>] default_idle+0x27/0x40
Mar 26 13:25:49 hest kernel: [25335.505169] [<ffffffff80212fea>] c1e_idle+0xba/0x100
Mar 26 13:25:49 hest kernel: [25335.505171] [<ffffffff8020ae80>] cpu_idle+0x40/0x70
Mar 26 13:25:49 hest kernel: [25335.505173] ---[ end trace e6e4f250dc22390d ]---

It is fairly hard to reproduce and generally pops up after a few days of
production. But I am willing to test patches that would help resolve this
problem, as both the niu-driver and the NFSD on 2.6.29 really outperform
the 2.6.26-rc4 + nxge driver I'm currently using.

Hardware: Sun Fire X4600, 32GB of memory

--
Jesper
* Re: niu driver - Transmit timed out - 2.6.29
From: Jesper Krogh @ 2009-03-27 19:31 UTC
To: netdev@vger.kernel.org

Jesper Krogh wrote:
> Ok. I was just so happy .. (See "Status update on Sun Neptune 10Gbit
> driver earlier).
>
> But then it "blew up" again:
>
> Mar 26 13:25:49 hest kernel: [25335.505049] ------------[ cut here ]------------
> Mar 26 13:25:49 hest kernel: [25335.505055] WARNING: at net/sched/sch_generic.c:226 dev_watchdog+0x1fd/0x210()
> Mar 26 13:25:49 hest kernel: [25335.505057] Hardware name: Sun Fire X4600 M2
> Mar 26 13:25:49 hest kernel: [25335.505059] NETDEV WATCHDOG: eth4 (niu): transmit timed out
> Mar 26 13:25:49 hest kernel: [25335.505060] Modules linked in: af_packet ext4 jbd2 crc16 nfsd exportfs autofs4 nfs lockd auth_rpcgss sunrpc iptable_filter ip_tables x_tables ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipv6 parport_pc lp parport loop sr_mod joydev psmouse niu usb_storage usbhid i2c_nforce2 libusual hid serio_raw pcspkr shpchp k8temp pci_hotplug i2c_core button evdev ext3 jbd mbcache ide_cd_mod cdrom sg sd_mod ata_generic libata mptsas mptspi mptscsih qla2xxx mptbase scsi_transport_sas scsi_transport_fc ehci_hcd scsi_transport_spi ohci_hcd e1000 scsi_mod amd74xx usbcore dm_mirror dm_region_hash dm_log dm_snapshot dm_mod thermal processor fan thermal_sys fuse
> Mar 26 13:25:49 hest kernel: [25335.505109] Pid: 0, comm: swapper Not tainted 2.6.29 #30
> Mar 26 13:25:49 hest kernel: [25335.505111] Call Trace:
> Mar 26 13:25:49 hest kernel: [25335.505113] <IRQ> [<ffffffff8023d5c2>] warn_slowpath+0xf2/0x130
> Mar 26 13:25:49 hest kernel: [25335.505124] [<ffffffff80239d2d>] task_tick_fair+0x4d/0xd0
> Mar 26 13:25:49 hest kernel: [25335.505130] [<ffffffff80355e33>] cpumask_next_and+0x23/0x40
> Mar 26 13:25:49 hest kernel: [25335.505132] [<ffffffff80233f84>] find_busiest_group+0x204/0x870
> Mar 26 13:25:49 hest kernel: [25335.505136] [<ffffffff8035b65e>] strlcpy+0x4e/0x80
> Mar 26 13:25:49 hest kernel: [25335.505138] [<ffffffff8041f11d>] dev_watchdog+0x1fd/0x210
> Mar 26 13:25:49 hest kernel: [25335.505141] [<ffffffff80235ac5>] run_rebalance_domains+0x3c5/0x530
> Mar 26 13:25:49 hest kernel: [25335.505143] [<ffffffff802474bb>] run_timer_softirq+0x1bb/0x230
> Mar 26 13:25:49 hest kernel: [25335.505148] [<ffffffff802574e1>] sched_clock_cpu+0x131/0x180
> Mar 26 13:25:49 hest kernel: [25335.505151] [<ffffffff80242cdb>] __do_softirq+0x8b/0x150
> Mar 26 13:25:49 hest kernel: [25335.505155] [<ffffffff8020d3bc>] call_softirq+0x1c/0x30
> Mar 26 13:25:49 hest kernel: [25335.505157] [<ffffffff8020e505>] do_softirq+0x35/0x80
> Mar 26 13:25:49 hest kernel: [25335.505161] [<ffffffff8021f715>] smp_apic_timer_interrupt+0x85/0xd0
> Mar 26 13:25:49 hest kernel: [25335.505163] [<ffffffff8020cdf3>] apic_timer_interrupt+0x13/0x20
> Mar 26 13:25:49 hest kernel: [25335.505164] <EOI> [<ffffffff80212dc7>] default_idle+0x27/0x40
> Mar 26 13:25:49 hest kernel: [25335.505169] [<ffffffff80212fea>] c1e_idle+0xba/0x100
> Mar 26 13:25:49 hest kernel: [25335.505171] [<ffffffff8020ae80>] cpu_idle+0x40/0x70
> Mar 26 13:25:49 hest kernel: [25335.505173] ---[ end trace e6e4f250dc22390d ]---

There was actually a bit more in the log:

Mar 26 13:25:49 hest kernel: [25335.505176] niu 0000:84:00.0: niu: eth4: Transmit timed out, resetting
Mar 26 13:25:49 hest kernel: [25335.587191] niu 0000:84:00.0: niu: eth4: bits (40000000) of register RXDMA_CFIG1 would not clear, val[c0000000]
Mar 26 13:25:49 hest last message repeated 4 times
Mar 26 13:25:58 hest kernel: [25345.504898] niu 0000:84:00.0: niu: eth4: Transmit timed out, resetting
Mar 26 13:26:08 hest kernel: [25355.504758] niu 0000:84:00.0: niu: eth4: Transmit timed out, resetting
Mar 26 13:26:13 hest kernel: [25360.504687] niu 0000:84:00.0: niu: eth4: Transmit timed out, resetting
Mar 26 13:26:18 hest kernel: [25365.504619] niu 0000:84:00.0: niu: eth4: Transmit timed out, resetting
Mar 26 13:26:23 hest kernel: [25370.504549] niu 0000:84:00.0: niu: eth4: Transmit timed out, resetting
Mar 26 13:26:28 hest kernel: [25375.504479] niu 0000:84:00.0: niu: eth4: Transmit timed out, resetting
Mar 26 13:26:33 hest kernel: [25380.504409] niu 0000:84:00.0: niu: eth4: Transmit timed out, resetting
Mar 26 13:26:38 hest kernel: [25385.504340] niu 0000:84:00.0: niu: eth4: Transmit timed out, resetting

This is probably the interesting part:
Mar 26 13:25:49 hest kernel: [25335.587191] niu 0000:84:00.0: niu: eth4: bits (40000000) of register RXDMA_CFIG1 would not clear, val[c0000000]

Any suggestions?

Is this perhaps just broken hardware.. or a driver issue? (I had the
Sun nxge driver working for around 180 days on the same card.. so I
would assume the hardware is ok).

Jesper

--
Jesper
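A minimal sketch of how the "would not clear" line can be read, assuming the message prints the mask the driver was waiting on (`bits`) followed by the last value read back (`val`). The bit names in the table are an assumption recalled from the niu driver's register header and should be checked against the kernel tree; `parse_stuck_bits` is a hypothetical helper, not part of any existing tool.

```python
import re

# Assumed bit names for RXDMA_CFIG1 (recalled from the niu driver's header;
# verify against the kernel source before relying on them).
ASSUMED_RXDMA_CFIG1_BITS = {
    31: "EN",   # channel enable
    30: "RST",  # channel reset, expected to clear once the reset completes
}

LINE_RE = re.compile(
    r"bits \((?P<bits>[0-9a-fA-F]+)\) of register (?P<reg>\w+) "
    r"would not clear, val\[(?P<val>[0-9a-fA-F]+)\]"
)


def parse_stuck_bits(line):
    """Return (register, [stuck bit positions]), or None if the line does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    mask = int(m.group("bits"), 16)
    val = int(m.group("val"), 16)
    stuck = mask & val  # bits that were waited on and are still set
    positions = [i for i in range(64) if (stuck >> i) & 1]
    return m.group("reg"), positions


line = ("niu 0000:84:00.0: niu: eth4: bits (40000000) of register "
        "RXDMA_CFIG1 would not clear, val[c0000000]")
reg, positions = parse_stuck_bits(line)
for pos in positions:
    print(reg, "bit", pos, ASSUMED_RXDMA_CFIG1_BITS.get(pos, "unknown"))
```

For the line quoted above, the mask 0x40000000 selects bit 30, and the value 0xc0000000 still has that bit set. Under the assumed naming, that reads as a reset request that never completed.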
* Re: niu driver - Transmit timed out - 2.6.29
From: Matheos Worku @ 2009-03-28 0:42 UTC
To: Jesper Krogh
Cc: netdev@vger.kernel.org

Jesper Krogh wrote:
> Jesper Krogh wrote:
>> Ok. I was just so happy .. (See "Status update on Sun Neptune 10Gbit
>> driver earlier).
>>
>> But then it "blew up" again:
>>
>> Mar 26 13:25:49 hest kernel: [25335.505049] ------------[ cut here ]------------
>> Mar 26 13:25:49 hest kernel: [25335.505055] WARNING: at net/sched/sch_generic.c:226 dev_watchdog+0x1fd/0x210()
>> Mar 26 13:25:49 hest kernel: [25335.505057] Hardware name: Sun Fire X4600 M2
>> Mar 26 13:25:49 hest kernel: [25335.505059] NETDEV WATCHDOG: eth4 (niu): transmit timed out
>> Mar 26 13:25:49 hest kernel: [25335.505060] Modules linked in: af_packet ext4 jbd2 crc16 nfsd exportfs autofs4 nfs lockd auth_rpcgss sunrpc iptable_filter ip_tables x_tables ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipv6 parport_pc lp parport loop sr_mod joydev psmouse niu usb_storage usbhid i2c_nforce2 libusual hid serio_raw pcspkr shpchp k8temp pci_hotplug i2c_core button evdev ext3 jbd mbcache ide_cd_mod cdrom sg sd_mod ata_generic libata mptsas mptspi mptscsih qla2xxx mptbase scsi_transport_sas scsi_transport_fc ehci_hcd scsi_transport_spi ohci_hcd e1000 scsi_mod amd74xx usbcore dm_mirror dm_region_hash dm_log dm_snapshot dm_mod thermal processor fan thermal_sys fuse
>> Mar 26 13:25:49 hest kernel: [25335.505109] Pid: 0, comm: swapper Not tainted 2.6.29 #30
>> Mar 26 13:25:49 hest kernel: [25335.505111] Call Trace:
>> Mar 26 13:25:49 hest kernel: [25335.505113] <IRQ> [<ffffffff8023d5c2>] warn_slowpath+0xf2/0x130
>> Mar 26 13:25:49 hest kernel: [25335.505124] [<ffffffff80239d2d>] task_tick_fair+0x4d/0xd0
>> Mar 26 13:25:49 hest kernel: [25335.505130] [<ffffffff80355e33>] cpumask_next_and+0x23/0x40
>> Mar 26 13:25:49 hest kernel: [25335.505132] [<ffffffff80233f84>] find_busiest_group+0x204/0x870
>> Mar 26 13:25:49 hest kernel: [25335.505136] [<ffffffff8035b65e>] strlcpy+0x4e/0x80
>> Mar 26 13:25:49 hest kernel: [25335.505138] [<ffffffff8041f11d>] dev_watchdog+0x1fd/0x210
>> Mar 26 13:25:49 hest kernel: [25335.505141] [<ffffffff80235ac5>] run_rebalance_domains+0x3c5/0x530
>> Mar 26 13:25:49 hest kernel: [25335.505143] [<ffffffff802474bb>] run_timer_softirq+0x1bb/0x230
>> Mar 26 13:25:49 hest kernel: [25335.505148] [<ffffffff802574e1>] sched_clock_cpu+0x131/0x180
>> Mar 26 13:25:49 hest kernel: [25335.505151] [<ffffffff80242cdb>] __do_softirq+0x8b/0x150
>> Mar 26 13:25:49 hest kernel: [25335.505155] [<ffffffff8020d3bc>] call_softirq+0x1c/0x30
>> Mar 26 13:25:49 hest kernel: [25335.505157] [<ffffffff8020e505>] do_softirq+0x35/0x80
>> Mar 26 13:25:49 hest kernel: [25335.505161] [<ffffffff8021f715>] smp_apic_timer_interrupt+0x85/0xd0
>> Mar 26 13:25:49 hest kernel: [25335.505163] [<ffffffff8020cdf3>] apic_timer_interrupt+0x13/0x20
>> Mar 26 13:25:49 hest kernel: [25335.505164] <EOI> [<ffffffff80212dc7>] default_idle+0x27/0x40
>> Mar 26 13:25:49 hest kernel: [25335.505169] [<ffffffff80212fea>] c1e_idle+0xba/0x100
>> Mar 26 13:25:49 hest kernel: [25335.505171] [<ffffffff8020ae80>] cpu_idle+0x40/0x70
>> Mar 26 13:25:49 hest kernel: [25335.505173] ---[ end trace e6e4f250dc22390d ]---
>
> There was actually a bit more in the log:
>
> Mar 26 13:25:49 hest kernel: [25335.505176] niu 0000:84:00.0: niu: eth4: Transmit timed out, resetting
> Mar 26 13:25:49 hest kernel: [25335.587191] niu 0000:84:00.0: niu: eth4: bits (40000000) of register RXDMA_CFIG1 would not clear, val[c0000000]
> Mar 26 13:25:49 hest last message repeated 4 times
> Mar 26 13:25:58 hest kernel: [25345.504898] niu 0000:84:00.0: niu: eth4: Transmit timed out, resetting
> Mar 26 13:26:08 hest kernel: [25355.504758] niu 0000:84:00.0: niu: eth4: Transmit timed out, resetting
> Mar 26 13:26:13 hest kernel: [25360.504687] niu 0000:84:00.0: niu: eth4: Transmit timed out, resetting
> Mar 26 13:26:18 hest kernel: [25365.504619] niu 0000:84:00.0: niu: eth4: Transmit timed out, resetting
> Mar 26 13:26:23 hest kernel: [25370.504549] niu 0000:84:00.0: niu: eth4: Transmit timed out, resetting
> Mar 26 13:26:28 hest kernel: [25375.504479] niu 0000:84:00.0: niu: eth4: Transmit timed out, resetting
> Mar 26 13:26:33 hest kernel: [25380.504409] niu 0000:84:00.0: niu: eth4: Transmit timed out, resetting
> Mar 26 13:26:38 hest kernel: [25385.504340] niu 0000:84:00.0: niu: eth4: Transmit timed out, resetting
>
> This is probably the interesting part:
> Mar 26 13:25:49 hest kernel: [25335.587191] niu 0000:84:00.0: niu: eth4: bits (40000000) of register RXDMA_CFIG1 would not clear, val[c0000000]

Jesper,

One of the RX ring DMAs is failing to reset. I guess whatever is
hanging the TX side is affecting the RX side as well. Can you do lspci
on the function and its siblings?

Regards
Matheos

>
> Any suggestions?
>
> Is this perhaps just broken hardware.. or a driver issue? (I had the
> Sun nxge driver working for around 180 days on the same card.. so I
> would assume the hardware is ok).
>
> Jesper
* Re: niu driver - Transmit timed out - 2.6.29
From: Jesper Krogh @ 2009-03-28 6:05 UTC
To: Matheos Worku
Cc: netdev@vger.kernel.org

Matheos Worku wrote:
>> This is probably the interesting part:
>> Mar 26 13:25:49 hest kernel: [25335.587191] niu 0000:84:00.0: niu: eth4: bits (40000000) of register RXDMA_CFIG1 would not clear, val[c0000000]
> Jesper,
>
> One of the RX ring DMAs is failing to reset. I guess whatever is
> hanging the TX side is affecting the RX side as well. Can you do lspci
> on the function and its siblings?

Like this (please guide me if that wasn't the correct lspci output):

k# lspci -vvv -s 84:00
84:00.0 Ethernet controller: Sun Microsystems Computer Corp. Multithreaded 10 Gigabit Ethernet Network Controller (rev 01)
    Subsystem: Sun Microsystems Computer Corp. Unknown device 0000
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 43
    Region 0: Memory at fd000000 (64-bit, non-prefetchable) [size=16M]
    Region 2: Memory at fe9f8000 (64-bit, non-prefetchable) [size=32K]
    Region 4: Memory at fe9f0000 (64-bit, non-prefetchable) [size=32K]
    Expansion ROM at fe800000 [disabled] [size=1M]
    Capabilities: [40] Power Management version 2
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [50] Message Signalled Interrupts: Mask+ 64bit+ Queue=0/5 Enable-
        Address: 0000000000000000 Data: 0000
        Masking: 00000000 Pending: 00000000
    Capabilities: [70] MSI-X: Enable+ Mask- TabSize=32
        Vector table: BAR=2 offset=00000000
        PBA: BAR=2 offset=00004000
    Capabilities: [80] Express Endpoint IRQ 0
        Device: Supported: MaxPayload 1024 bytes, PhantFunc 0, ExtTag-
        Device: Latency L0s <4us, L1 <8us
        Device: AtnBtn- AtnInd- PwrInd-
        Device: Errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
        Device: RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
        Device: MaxPayload 128 bytes, MaxReadReq 128 bytes
        Link: Supported Speed 2.5Gb/s, Width x8, ASPM L0s, Port 1
        Link: Latency L0s <512ns, L1 <64us
        Link: ASPM Disabled RCB 128 bytes CommClk- ExtSynch-
        Link: Speed 2.5Gb/s, Width x8
    Capabilities: [94] Vendor Specific Information
    Capabilities: [9c] Vendor Specific Information

84:00.1 Ethernet controller: Sun Microsystems Computer Corp. Multithreaded 10 Gigabit Ethernet Network Controller (rev 01)
    Subsystem: Sun Microsystems Computer Corp. Unknown device 0000
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin B routed to IRQ 42
    Region 0: Memory at fc000000 (64-bit, non-prefetchable) [size=16M]
    Region 2: Memory at fe9e8000 (64-bit, non-prefetchable) [size=32K]
    Region 4: Memory at fe9e0000 (64-bit, non-prefetchable) [size=32K]
    Expansion ROM at fe700000 [disabled] [size=1M]
    Capabilities: [40] Power Management version 2
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [50] Message Signalled Interrupts: Mask+ 64bit+ Queue=0/5 Enable-
        Address: 0000000000000000 Data: 0000
        Masking: 00000000 Pending: 00000000
    Capabilities: [70] MSI-X: Enable- Mask- TabSize=32
        Vector table: BAR=2 offset=00000000
        PBA: BAR=2 offset=00004000
    Capabilities: [80] Express Endpoint IRQ 0
        Device: Supported: MaxPayload 1024 bytes, PhantFunc 0, ExtTag-
        Device: Latency L0s <4us, L1 <8us
        Device: AtnBtn- AtnInd- PwrInd-
        Device: Errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
        Device: RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
        Device: MaxPayload 128 bytes, MaxReadReq 128 bytes
        Link: Supported Speed 2.5Gb/s, Width x8, ASPM L0s, Port 1
        Link: Latency L0s <512ns, L1 <64us
        Link: ASPM Disabled RCB 128 bytes CommClk- ExtSynch-
        Link: Speed 2.5Gb/s, Width x8
    Capabilities: [94] Vendor Specific Information
    Capabilities: [9c] Vendor Specific Information

84:00.2 Ethernet controller: Sun Microsystems Computer Corp. Multithreaded 10 Gigabit Ethernet Network Controller (rev 01)
    Subsystem: Sun Microsystems Computer Corp. Unknown device 0000
    Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
    Interrupt: pin C routed to IRQ 41
    Region 0: Memory at fb000000 (64-bit, non-prefetchable) [size=16M]
    Region 2: Memory at fe9d8000 (64-bit, non-prefetchable) [size=32K]
    Region 4: Memory at fe9d0000 (64-bit, non-prefetchable) [size=32K]
    Expansion ROM at fe600000 [disabled] [size=1M]
    Capabilities: [40] Power Management version 2
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [50] Message Signalled Interrupts: Mask+ 64bit+ Queue=0/5 Enable-
        Address: 0000000000000000 Data: 0000
        Masking: 00000000 Pending: 00000000
    Capabilities: [70] MSI-X: Enable- Mask- TabSize=32
        Vector table: BAR=2 offset=00000000
        PBA: BAR=2 offset=00004000
    Capabilities: [80] Express Endpoint IRQ 0
        Device: Supported: MaxPayload 1024 bytes, PhantFunc 0, ExtTag-
        Device: Latency L0s <4us, L1 <8us
        Device: AtnBtn- AtnInd- PwrInd-
        Device: Errors: Correctable- Non-Fatal+ Fatal+ Unsupported-
        Device: RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
        Device: MaxPayload 128 bytes, MaxReadReq 128 bytes
        Link: Supported Speed 2.5Gb/s, Width x8, ASPM L0s, Port 1
        Link: Latency L0s <512ns, L1 <64us
        Link: ASPM Disabled RCB 128 bytes CommClk- ExtSynch-
        Link: Speed 2.5Gb/s, Width x8
    Capabilities: [94] Vendor Specific Information
    Capabilities: [9c] Vendor Specific Information

84:00.3 Ethernet controller: Sun Microsystems Computer Corp. Multithreaded 10 Gigabit Ethernet Network Controller (rev 01)
    Subsystem: Sun Microsystems Computer Corp. Unknown device 0000
    Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
    Interrupt: pin D routed to IRQ 40
    Region 0: Memory at fa000000 (64-bit, non-prefetchable) [size=16M]
    Region 2: Memory at fe9c8000 (64-bit, non-prefetchable) [size=32K]
    Region 4: Memory at fe9c0000 (64-bit, non-prefetchable) [size=32K]
    Expansion ROM at fe500000 [disabled] [size=1M]
    Capabilities: [40] Power Management version 2
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [50] Message Signalled Interrupts: Mask+ 64bit+ Queue=0/5 Enable-
        Address: 0000000000000000 Data: 0000
        Masking: 00000000 Pending: 00000000
    Capabilities: [70] MSI-X: Enable- Mask- TabSize=32
        Vector table: BAR=2 offset=00000000
        PBA: BAR=2 offset=00004000
    Capabilities: [80] Express Endpoint IRQ 0
        Device: Supported: MaxPayload 1024 bytes, PhantFunc 0, ExtTag-
        Device: Latency L0s <4us, L1 <8us
        Device: AtnBtn- AtnInd- PwrInd-
        Device: Errors: Correctable- Non-Fatal+ Fatal+ Unsupported-
        Device: RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
        Device: MaxPayload 128 bytes, MaxReadReq 128 bytes
        Link: Supported Speed 2.5Gb/s, Width x8, ASPM L0s, Port 1
        Link: Latency L0s <512ns, L1 <64us
        Link: ASPM Disabled RCB 128 bytes CommClk- ExtSynch-
        Link: Speed 2.5Gb/s, Width x8
    Capabilities: [94] Vendor Specific Information
    Capabilities: [9c] Vendor Specific Information

--
Jesper
* Re: niu driver - Transmit timed out - 2.6.29
From: Matheos Worku @ 2009-03-28 6:18 UTC
To: Jesper Krogh
Cc: netdev@vger.kernel.org

Jesper Krogh wrote:
> Matheos Worku wrote:
>>> This is probably the interesting part:
>>> Mar 26 13:25:49 hest kernel: [25335.587191] niu 0000:84:00.0: niu: eth4: bits (40000000) of register RXDMA_CFIG1 would not clear, val[c0000000]
>> Jesper,
>>
>> One of the RX ring DMAs is failing to reset. I guess whatever is
>> hanging the TX side is affecting the RX side as well. Can you do lspci
>> on the function and its siblings?
>
> Like this(please guide me if that wasn't the correct lspci output):

Jesper,

I was wondering if you can get the register dump just after the NIC hangs.

lspci -vvv -xxx -s 84:0

Regards
Matheos

>
> k# lspci -vvv -s 84:00
> 84:00.0 Ethernet controller: Sun Microsystems Computer Corp. Multithreaded 10 Gigabit Ethernet Network Controller (rev 01)
>     Subsystem: Sun Microsystems Computer Corp. Unknown device 0000
>     Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
>     Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
>     Latency: 0, Cache Line Size: 64 bytes
>     Interrupt: pin A routed to IRQ 43
>     Region 0: Memory at fd000000 (64-bit, non-prefetchable) [size=16M]
>     Region 2: Memory at fe9f8000 (64-bit, non-prefetchable) [size=32K]
>     Region 4: Memory at fe9f0000 (64-bit, non-prefetchable) [size=32K]
>     Expansion ROM at fe800000 [disabled] [size=1M]
>     Capabilities: [40] Power Management version 2
>         Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
>         Status: D0 PME-Enable- DSel=0 DScale=0 PME-
>     Capabilities: [50] Message Signalled Interrupts: Mask+ 64bit+ Queue=0/5 Enable-
>         Address: 0000000000000000 Data: 0000
>         Masking: 00000000 Pending: 00000000
>     Capabilities: [70] MSI-X: Enable+ Mask- TabSize=32
>         Vector table: BAR=2 offset=00000000
>         PBA: BAR=2 offset=00004000
>     Capabilities: [80] Express Endpoint IRQ 0
>         Device: Supported: MaxPayload 1024 bytes, PhantFunc 0, ExtTag-
>         Device: Latency L0s <4us, L1 <8us
>         Device: AtnBtn- AtnInd- PwrInd-
>         Device: Errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
>         Device: RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
>         Device: MaxPayload 128 bytes, MaxReadReq 128 bytes
>         Link: Supported Speed 2.5Gb/s, Width x8, ASPM L0s, Port 1
>         Link: Latency L0s <512ns, L1 <64us
>         Link: ASPM Disabled RCB 128 bytes CommClk- ExtSynch-
>         Link: Speed 2.5Gb/s, Width x8
>     Capabilities: [94] Vendor Specific Information
>     Capabilities: [9c] Vendor Specific Information
>
> 84:00.1 Ethernet controller: Sun Microsystems Computer Corp. Multithreaded 10 Gigabit Ethernet Network Controller (rev 01)
>     Subsystem: Sun Microsystems Computer Corp. Unknown device 0000
>     Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
>     Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
>     Latency: 0, Cache Line Size: 64 bytes
>     Interrupt: pin B routed to IRQ 42
>     Region 0: Memory at fc000000 (64-bit, non-prefetchable) [size=16M]
>     Region 2: Memory at fe9e8000 (64-bit, non-prefetchable) [size=32K]
>     Region 4: Memory at fe9e0000 (64-bit, non-prefetchable) [size=32K]
>     Expansion ROM at fe700000 [disabled] [size=1M]
>     Capabilities: [40] Power Management version 2
>         Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
>         Status: D0 PME-Enable- DSel=0 DScale=0 PME-
>     Capabilities: [50] Message Signalled Interrupts: Mask+ 64bit+ Queue=0/5 Enable-
>         Address: 0000000000000000 Data: 0000
>         Masking: 00000000 Pending: 00000000
>     Capabilities: [70] MSI-X: Enable- Mask- TabSize=32
>         Vector table: BAR=2 offset=00000000
>         PBA: BAR=2 offset=00004000
>     Capabilities: [80] Express Endpoint IRQ 0
>         Device: Supported: MaxPayload 1024 bytes, PhantFunc 0, ExtTag-
>         Device: Latency L0s <4us, L1 <8us
>         Device: AtnBtn- AtnInd- PwrInd-
>         Device: Errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
>         Device: RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
>         Device: MaxPayload 128 bytes, MaxReadReq 128 bytes
>         Link: Supported Speed 2.5Gb/s, Width x8, ASPM L0s, Port 1
>         Link: Latency L0s <512ns, L1 <64us
>         Link: ASPM Disabled RCB 128 bytes CommClk- ExtSynch-
>         Link: Speed 2.5Gb/s, Width x8
>     Capabilities: [94] Vendor Specific Information
>     Capabilities: [9c] Vendor Specific Information
>
> 84:00.2 Ethernet controller: Sun Microsystems Computer Corp. Multithreaded 10 Gigabit Ethernet Network Controller (rev 01)
>     Subsystem: Sun Microsystems Computer Corp. Unknown device 0000
>     Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
>     Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
>     Interrupt: pin C routed to IRQ 41
>     Region 0: Memory at fb000000 (64-bit, non-prefetchable) [size=16M]
>     Region 2: Memory at fe9d8000 (64-bit, non-prefetchable) [size=32K]
>     Region 4: Memory at fe9d0000 (64-bit, non-prefetchable) [size=32K]
>     Expansion ROM at fe600000 [disabled] [size=1M]
>     Capabilities: [40] Power Management version 2
>         Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
>         Status: D0 PME-Enable- DSel=0 DScale=0 PME-
>     Capabilities: [50] Message Signalled Interrupts: Mask+ 64bit+ Queue=0/5 Enable-
>         Address: 0000000000000000 Data: 0000
>         Masking: 00000000 Pending: 00000000
>     Capabilities: [70] MSI-X: Enable- Mask- TabSize=32
>         Vector table: BAR=2 offset=00000000
>         PBA: BAR=2 offset=00004000
>     Capabilities: [80] Express Endpoint IRQ 0
>         Device: Supported: MaxPayload 1024 bytes, PhantFunc 0, ExtTag-
>         Device: Latency L0s <4us, L1 <8us
>         Device: AtnBtn- AtnInd- PwrInd-
>         Device: Errors: Correctable- Non-Fatal+ Fatal+ Unsupported-
>         Device: RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
>         Device: MaxPayload 128 bytes, MaxReadReq 128 bytes
>         Link: Supported Speed 2.5Gb/s, Width x8, ASPM L0s, Port 1
>         Link: Latency L0s <512ns, L1 <64us
>         Link: ASPM Disabled RCB 128 bytes CommClk- ExtSynch-
>         Link: Speed 2.5Gb/s, Width x8
>     Capabilities: [94] Vendor Specific Information
>     Capabilities: [9c] Vendor Specific Information
>
> 84:00.3 Ethernet controller: Sun Microsystems Computer Corp. Multithreaded 10 Gigabit Ethernet Network Controller (rev 01)
>     Subsystem: Sun Microsystems Computer Corp. Unknown device 0000
>     Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
>     Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
>     Interrupt: pin D routed to IRQ 40
>     Region 0: Memory at fa000000 (64-bit, non-prefetchable) [size=16M]
>     Region 2: Memory at fe9c8000 (64-bit, non-prefetchable) [size=32K]
>     Region 4: Memory at fe9c0000 (64-bit, non-prefetchable) [size=32K]
>     Expansion ROM at fe500000 [disabled] [size=1M]
>     Capabilities: [40] Power Management version 2
>         Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
>         Status: D0 PME-Enable- DSel=0 DScale=0 PME-
>     Capabilities: [50] Message Signalled Interrupts: Mask+ 64bit+ Queue=0/5 Enable-
>         Address: 0000000000000000 Data: 0000
>         Masking: 00000000 Pending: 00000000
>     Capabilities: [70] MSI-X: Enable- Mask- TabSize=32
>         Vector table: BAR=2 offset=00000000
>         PBA: BAR=2 offset=00004000
>     Capabilities: [80] Express Endpoint IRQ 0
>         Device: Supported: MaxPayload 1024 bytes, PhantFunc 0, ExtTag-
>         Device: Latency L0s <4us, L1 <8us
>         Device: AtnBtn- AtnInd- PwrInd-
>         Device: Errors: Correctable- Non-Fatal+ Fatal+ Unsupported-
>         Device: RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
>         Device: MaxPayload 128 bytes, MaxReadReq 128 bytes
>         Link: Supported Speed 2.5Gb/s, Width x8, ASPM L0s, Port 1
>         Link: Latency L0s <512ns, L1 <64us
>         Link: ASPM Disabled RCB 128 bytes CommClk- ExtSynch-
>         Link: Speed 2.5Gb/s, Width x8
>     Capabilities: [94] Vendor Specific Information
>     Capabilities: [9c] Vendor Specific Information
>
* Re: niu driver - Transmit timed out - 2.6.29
From: Jesper Krogh @ 2009-03-28 7:25 UTC
To: Matheos Worku
Cc: netdev@vger.kernel.org

Matheos Worku wrote:
> Jesper Krogh wrote:
>> Matheos Worku wrote:
>>>> This is probably the interesting part:
>>>> Mar 26 13:25:49 hest kernel: [25335.587191] niu 0000:84:00.0: niu: eth4: bits (40000000) of register RXDMA_CFIG1 would not clear, val[c0000000]
>>> Jesper,
>>>
>>> One of the RX ring DMAs is failing to reset. I guess whatever is
>>> hanging the TX side is affecting the RX side as well. Can you do
>>> lspci on the function and its siblings?
>>
>> Like this(please guide me if that wasn't the correct lspci output):
>
> Jesper,
>
> I was wondering if you can get the register dump just after the NIC hangs.
>
> lspci -vvv -xxx -s 84:0

I will try to do that, but it involves more or less "putting a known bad
driver" in production and waiting X days (where X is usually less than 7
and more than 2).

So if there is more debugging code that would be helpful to have in the
driver/kernel, it would be preferable to get it in at the same time, in
order to reduce the number of trial-and-error cycles.

--
Jesper
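Since the hang takes days to appear, the requested dump could be captured unattended. The following is a minimal sketch under stated assumptions: the timeout pattern is taken from the log lines quoted in this thread, the `lspci` flags are the ones Matheos asked for, and the helper names (`lspci_command_for`, `watch`) and any file paths are hypothetical. `lspci -xxx` generally needs root.

```python
import re
import subprocess

# Pattern modelled on the "Transmit timed out" lines quoted in this thread.
TIMEOUT_RE = re.compile(
    r"niu (?P<addr>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2})\.[0-7]: .*Transmit timed out"
)


def lspci_command_for(line):
    """Build the lspci invocation for the device named in a timeout line, else None."""
    m = TIMEOUT_RE.search(line)
    if m is None:
        return None
    # Drop the PCI domain; "-s bus:slot" then selects every function of the slot.
    bus_slot = m.group("addr").split(":", 1)[1]
    return ["lspci", "-vvv", "-xxx", "-s", bus_slot]


def watch(log_lines, out_path):
    """Write one config-space dump to out_path at the first timeout line seen."""
    for line in log_lines:
        cmd = lspci_command_for(line)
        if cmd is not None:
            with open(out_path, "w") as out:
                subprocess.run(cmd, stdout=out, stderr=subprocess.STDOUT, check=False)
            return cmd
    return None
```

One way to use it would be to pipe a followed kernel log into a small script that calls `watch(sys.stdin, "/tmp/niu-hang.txt")`; both the log source and the output path are assumptions about the local setup. The dump is taken once, at the first timeout, since later resets may change the device state.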