From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jesper Krogh
Subject: Re: niu driver - Transmit timed out - 2.6.29
Date: Fri, 27 Mar 2009 20:31:34 +0100
Message-ID: <49CD2996.60502@krogh.cc>
References: <49CB78A4.3020406@krogh.cc>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
To: "netdev@vger.kernel.org"
Return-path:
Received: from 2605ds1-ynoe.1.fullrate.dk ([90.184.12.24]:49850 "EHLO shrek.krogh.cc" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751215AbZC0Tbs (ORCPT ); Fri, 27 Mar 2009 15:31:48 -0400
Received: from localhost (localhost.localdomain [127.0.0.1]) by shrek.krogh.cc (Postfix) with ESMTP id F34013BF58D for ; Fri, 27 Mar 2009 20:31:45 +0100 (CET)
Received: from shrek.krogh.cc ([127.0.0.1]) by localhost (shrek.krogh.cc [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id xuVBFN8yTjlZ for ; Fri, 27 Mar 2009 20:31:33 +0100 (CET)
Received: from [192.168.1.4] (unknown [90.184.13.46]) by shrek.krogh.cc (Postfix) with ESMTP id 592AC3BF441 for ; Fri, 27 Mar 2009 20:31:33 +0100 (CET)
In-Reply-To: <49CB78A4.3020406@krogh.cc>
Sender: netdev-owner@vger.kernel.org
List-ID:

Jesper Krogh wrote:
> Ok. I was just so happy .. (See "Status update on Sun Neptune 10Gbit
> driver earlier).
>
> But then it "blew up" again:
>
> Mar 26 13:25:49 hest kernel: [25335.505049] ------------[ cut here ]------------
> Mar 26 13:25:49 hest kernel: [25335.505055] WARNING: at net/sched/sch_generic.c:226 dev_watchdog+0x1fd/0x210()
> Mar 26 13:25:49 hest kernel: [25335.505057] Hardware name: Sun Fire X4600 M2
> Mar 26 13:25:49 hest kernel: [25335.505059] NETDEV WATCHDOG: eth4 (niu): transmit timed out
> Mar 26 13:25:49 hest kernel: [25335.505060] Modules linked in: af_packet
> ext4 jbd2 crc16 nfsd exportfs autofs4 nfs lockd auth_rpcgss sunrpc
> iptable_filter ip_tables x_tables ib_iser rdma_cm ib_cm iw_cm ib_sa
> ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi
> scsi_transport_iscsi ipv6 parport_pc lp parport loop sr_mod joydev
> psmouse niu usb_storage usbhid i2c_nforce2 libusual hid serio_raw pcspkr
> shpchp k8temp pci_hotplug i2c_core button evdev ext3 jbd mbcache
> ide_cd_mod cdrom sg sd_mod ata_generic libata mptsas mptspi mptscsih
> qla2xxx mptbase scsi_transport_sas scsi_transport_fc ehci_hcd
> scsi_transport_spi ohci_hcd e1000 scsi_mod amd74xx usbcore dm_mirror
> dm_region_hash dm_log dm_snapshot dm_mod thermal processor fan
> thermal_sys fuse
> Mar 26 13:25:49 hest kernel: [25335.505109] Pid: 0, comm: swapper Not tainted 2.6.29 #30
> Mar 26 13:25:49 hest kernel: [25335.505111] Call Trace:
> Mar 26 13:25:49 hest kernel: [25335.505113] [] warn_slowpath+0xf2/0x130
> Mar 26 13:25:49 hest kernel: [25335.505124] [] task_tick_fair+0x4d/0xd0
> Mar 26 13:25:49 hest kernel: [25335.505130] [] cpumask_next_and+0x23/0x40
> Mar 26 13:25:49 hest kernel: [25335.505132] [] find_busiest_group+0x204/0x870
> Mar 26 13:25:49 hest kernel: [25335.505136] [] strlcpy+0x4e/0x80
> Mar 26 13:25:49 hest kernel: [25335.505138] [] dev_watchdog+0x1fd/0x210
> Mar 26 13:25:49 hest kernel: [25335.505141] [] run_rebalance_domains+0x3c5/0x530
> Mar 26 13:25:49 hest kernel: [25335.505143] [] run_timer_softirq+0x1bb/0x230
> Mar 26 13:25:49 hest kernel: [25335.505148] [] sched_clock_cpu+0x131/0x180
> Mar 26 13:25:49 hest kernel: [25335.505151] [] __do_softirq+0x8b/0x150
> Mar 26 13:25:49 hest kernel: [25335.505155] [] call_softirq+0x1c/0x30
> Mar 26 13:25:49 hest kernel: [25335.505157] [] do_softirq+0x35/0x80
> Mar 26 13:25:49 hest kernel: [25335.505161] [] smp_apic_timer_interrupt+0x85/0xd0
> Mar 26 13:25:49 hest kernel: [25335.505163] [] apic_timer_interrupt+0x13/0x20
> Mar 26 13:25:49 hest kernel: [25335.505164] [] default_idle+0x27/0x40
> Mar 26 13:25:49 hest kernel: [25335.505169] [] c1e_idle+0xba/0x100
> Mar 26 13:25:49 hest kernel: [25335.505171] [] cpu_idle+0x40/0x70
> Mar 26 13:25:49 hest kernel: [25335.505173] ---[ end trace e6e4f250dc22390d ]---

There was actually a bit more in the log:

Mar 26 13:25:49 hest kernel: [25335.505176] niu 0000:84:00.0: niu: eth4: Transmit timed out, resetting
Mar 26 13:25:49 hest kernel: [25335.587191] niu 0000:84:00.0: niu: eth4: bits (40000000) of register RXDMA_CFIG1 would not clear, val[c0000000]
Mar 26 13:25:49 hest last message repeated 4 times
Mar 26 13:25:58 hest kernel: [25345.504898] niu 0000:84:00.0: niu: eth4: Transmit timed out, resetting
Mar 26 13:26:08 hest kernel: [25355.504758] niu 0000:84:00.0: niu: eth4: Transmit timed out, resetting
Mar 26 13:26:13 hest kernel: [25360.504687] niu 0000:84:00.0: niu: eth4: Transmit timed out, resetting
Mar 26 13:26:18 hest kernel: [25365.504619] niu 0000:84:00.0: niu: eth4: Transmit timed out, resetting
Mar 26 13:26:23 hest kernel: [25370.504549] niu 0000:84:00.0: niu: eth4: Transmit timed out, resetting
Mar 26 13:26:28 hest kernel: [25375.504479] niu 0000:84:00.0: niu: eth4: Transmit timed out, resetting
Mar 26 13:26:33 hest kernel: [25380.504409] niu 0000:84:00.0: niu: eth4: Transmit timed out, resetting
Mar 26 13:26:38 hest kernel: [25385.504340] niu 0000:84:00.0: niu: eth4: Transmit timed out, resetting

This is probably the interesting part:

Mar 26 13:25:49 hest kernel: [25335.587191] niu 0000:84:00.0: niu: eth4: bits (40000000) of register RXDMA_CFIG1 would not clear, val[c0000000]

Any suggestions? Is this just broken hardware, or a driver issue? (I had Sun's nxge driver running for around 180 days on the same card, so I would assume the hardware is OK.)

Jesper
-- 
Jesper