From: "Paweł Staszewski" <pstaszewski@itcare.pl>
To: "Allan, Bruce W" <bruce.w.allan@intel.com>
Cc: Linux Network Development list <netdev@vger.kernel.org>,
"e1000-devel@lists.sourceforge.net"
<e1000-devel@lists.sourceforge.net>
Subject: Re: eth1: Detected Hardware Unit Hang
Date: Wed, 31 Mar 2010 09:47:15 +0200
Message-ID: <4BB2FE03.4090608@itcare.pl>
In-Reply-To: <4BB0E394.2060908@itcare.pl>
Hello,
I reproduced this problem on another machine with the same hardware;
here is the dmesg output (kernel 2.6.33):
Mar 27 18:19:16 TM_01_C1 [1817894.769395] 0000:04:00.0: eth0: Detected Hardware Unit Hang:
Mar 27 18:19:16 TM_01_C1 [1817894.769396] TDH <2e>
Mar 27 18:19:16 TM_01_C1 [1817894.769397] TDT <1a>
Mar 27 18:19:16 TM_01_C1 [1817894.769397] next_to_use <1a>
Mar 27 18:19:16 TM_01_C1 [1817894.769398] next_to_clean <2d>
Mar 27 18:19:16 TM_01_C1 [1817894.769398] buffer_info[next_to_clean]:
Mar 27 18:19:16 TM_01_C1 [1817894.769399] time_stamp <11b1591e9>
Mar 27 18:19:16 TM_01_C1 [1817894.769399] next_to_watch <2f>
Mar 27 18:19:16 TM_01_C1 [1817894.769400] jiffies <11b1592e4>
Mar 27 18:19:16 TM_01_C1 [1817894.769401] next_to_watch.status <0>
Mar 27 18:19:16 TM_01_C1 [1817894.769401] MAC Status <80080783>
Mar 27 18:19:16 TM_01_C1 [1817894.769402] PHY Status <796d>
Mar 27 18:19:16 TM_01_C1 [1817894.769402] PHY 1000BASE-T Status <3800>
Mar 27 18:19:16 TM_01_C1 [1817894.769403] PHY Extended Status <3000>
Mar 27 18:19:16 TM_01_C1 [1817894.769404] PCI Status <10>
Mar 27 18:19:18 TM_01_C1 [1817896.773365] 0000:04:00.0: eth0: Detected Hardware Unit Hang:
Mar 27 18:19:18 TM_01_C1 [1817896.773367] TDH <2e>
Mar 27 18:19:18 TM_01_C1 [1817896.773368] TDT <1a>
Mar 27 18:19:18 TM_01_C1 [1817896.773368] next_to_use <1a>
Mar 27 18:19:18 TM_01_C1 [1817896.773369] next_to_clean <2d>
Mar 27 18:19:18 TM_01_C1 [1817896.773369] buffer_info[next_to_clean]:
Mar 27 18:19:18 TM_01_C1 [1817896.773370] time_stamp <11b1591e9>
Mar 27 18:19:18 TM_01_C1 [1817896.773370] next_to_watch <2f>
Mar 27 18:19:18 TM_01_C1 [1817896.773371] jiffies <11b1594d8>
Mar 27 18:19:18 TM_01_C1 [1817896.773372] next_to_watch.status <0>
Mar 27 18:19:18 TM_01_C1 [1817896.773372] MAC Status <80080783>
Mar 27 18:19:18 TM_01_C1 [1817896.773373] PHY Status <796d>
Mar 27 18:19:18 TM_01_C1 [1817896.773373] PHY 1000BASE-T Status <3800>
Mar 27 18:19:18 TM_01_C1 [1817896.773374] PHY Extended Status <3000>
Mar 27 18:19:18 TM_01_C1 [1817896.773375] PCI Status <10>
Mar 27 18:19:20 TM_01_C1 [1817898.769353] 0000:04:00.0: eth0: Detected Hardware Unit Hang:
Mar 27 18:19:20 TM_01_C1 [1817898.769355] TDH <2e>
Mar 27 18:19:20 TM_01_C1 [1817898.769355] TDT <1a>
Mar 27 18:19:20 TM_01_C1 [1817898.769356] next_to_use <1a>
Mar 27 18:19:20 TM_01_C1 [1817898.769356] next_to_clean <2d>
Mar 27 18:19:20 TM_01_C1 [1817898.769357] buffer_info[next_to_clean]:
Mar 27 18:19:20 TM_01_C1 [1817898.769358] time_stamp <11b1591e9>
Mar 27 18:19:20 TM_01_C1 [1817898.769358] next_to_watch <2f>
Mar 27 18:19:20 TM_01_C1 [1817898.769359] jiffies <11b1596cc>
Mar 27 18:19:20 TM_01_C1 [1817898.769359] next_to_watch.status <0>
Mar 27 18:19:20 TM_01_C1 [1817898.769360] MAC Status <80080783>
Mar 27 18:19:20 TM_01_C1 [1817898.769361] PHY Status <796d>
Mar 27 18:19:20 TM_01_C1 [1817898.769361] PHY 1000BASE-T Status <3800>
Mar 27 18:19:20 TM_01_C1 [1817898.769362] PHY Extended Status <3000>
Mar 27 18:19:20 TM_01_C1 [1817898.769362] PCI Status <18>
Mar 27 18:19:21 TM_01_C1 [1817899.773012] ------------[ cut here ]------------
Mar 27 18:19:21 TM_01_C1 [1817899.773023] WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0x130/0x1d3()
Mar 27 18:19:21 TM_01_C1 [1817899.773026] Hardware name: X7DCT
Mar 27 18:19:21 TM_01_C1 [1817899.773028] NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out
Mar 27 18:19:21 TM_01_C1 [1817899.773030] Modules linked in: coretemp hwmon_vid hwmon [last unloaded: w83627hf]
Mar 27 18:19:21 TM_01_C1 [1817899.773038] Pid: 0, comm: swapper Not tainted 2.6.33 #2
Mar 27 18:19:21 TM_01_C1 [1817899.773040] Call Trace:
Mar 27 18:19:21 TM_01_C1 [1817899.773042] <IRQ> [<ffffffff813003b3>] ? dev_watchdog+0x130/0x1d3
Mar 27 18:19:21 TM_01_C1 [1817899.773050] [<ffffffff813003b3>] ? dev_watchdog+0x130/0x1d3
Mar 27 18:19:21 TM_01_C1 [1817899.773055] [<ffffffff81032d1a>] ? warn_slowpath_common+0x77/0xa3
Mar 27 18:19:21 TM_01_C1 [1817899.773059] [<ffffffff81032da2>] ? warn_slowpath_fmt+0x51/0x59
Mar 27 18:19:21 TM_01_C1 [1817899.773064] [<ffffffff8102910c>] ? enqueue_task_fair+0x3e/0xa1
Mar 27 18:19:21 TM_01_C1 [1817899.773068] [<ffffffff8102f0c2>] ? try_to_wake_up+0x368/0x379
Mar 27 18:19:21 TM_01_C1 [1817899.773072] [<ffffffff812ee612>] ? netdev_drivername+0x3b/0x40
Mar 27 18:19:21 TM_01_C1 [1817899.773075] [<ffffffff813003b3>] ? dev_watchdog+0x130/0x1d3
Mar 27 18:19:21 TM_01_C1 [1817899.773079] [<ffffffff81026d60>] ? __wake_up+0x30/0x44
Mar 27 18:19:21 TM_01_C1 [1817899.773082] [<ffffffff81300283>] ? dev_watchdog+0x0/0x1d3
Mar 27 18:19:21 TM_01_C1 [1817899.773087] [<ffffffff8103f5d1>] ? run_timer_softirq+0x200/0x29e
Mar 27 18:19:21 TM_01_C1 [1817899.773091] [<ffffffff810386f6>] ? __do_softirq+0xd7/0x195
Mar 27 18:19:21 TM_01_C1 [1817899.773099] [<ffffffff810152b3>] ? lapic_next_event+0x18/0x1d
Mar 27 18:19:21 TM_01_C1 [1817899.773104] [<ffffffff81002e0c>] ? call_softirq+0x1c/0x28
Mar 27 18:19:21 TM_01_C1 [1817899.773107] [<ffffffff81004811>] ? do_softirq+0x31/0x63
Mar 27 18:19:21 TM_01_C1 [1817899.773110] [<ffffffff810384eb>] ? irq_exit+0x36/0x78
Mar 27 18:19:21 TM_01_C1 [1817899.773113] [<ffffffff81015d0b>] ? smp_apic_timer_interrupt+0x87/0x95
Mar 27 18:19:21 TM_01_C1 [1817899.773117] [<ffffffff810028d3>] ? apic_timer_interrupt+0x13/0x20
Mar 27 18:19:21 TM_01_C1 [1817899.773119] <EOI> [<ffffffff81008bdd>] ? mwait_idle+0x9b/0xa0
Mar 27 18:19:21 TM_01_C1 [1817899.773126] [<ffffffff81001385>] ? cpu_idle+0x53/0x8b
Mar 27 18:19:21 TM_01_C1 [1817899.773128] ---[ end trace 4ac842842c6f54b3 ]---
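The Tx-ring fields in the three eth0 dumps can be read with a little arithmetic. The sketch below is a minimal illustration, not driver code: it ASSUMES a 256-descriptor Tx ring (the usual e1000e default; the real ring size is not in the logs) and infers HZ from the roughly 2 s spacing of the dumps (kernel timestamps 1817894.77, 1817896.77, 1817898.77).

```python
# Minimal sketch: interpret the Tx-ring fields of an e1000e
# "Detected Hardware Unit Hang" dump.
# ASSUMPTION: the Tx ring holds 256 descriptors (usual e1000e default);
# the actual ring size is not shown in the logs.
RING_SIZE = 256


def pending_descriptors(tdh, tdt, ring_size=RING_SIZE):
    """Descriptors queued by software (tail, TDT) but not yet consumed
    by the hardware (head, TDH), counted around the ring."""
    return (tdt - tdh) % ring_size


def stuck_age(jiffies, time_stamp):
    """Jiffies elapsed since the buffer at next_to_clean was queued."""
    return jiffies - time_stamp


# Values copied from the three eth0 dumps (kernel 2.6.33).
TDH, TDT = 0x2E, 0x1A
TIME_STAMP = 0x11B1591E9
JIFFIES = [0x11B1592E4, 0x11B1594D8, 0x11B1596CC]

ages = [stuck_age(j, TIME_STAMP) for j in JIFFIES]

# The dumps are logged about 2 s apart, so the jiffies step between
# consecutive dumps divided by 2 estimates HZ.
hz = (ages[1] - ages[0]) // 2

print("pending descriptors:", pending_descriptors(TDH, TDT))
print("age of stuck buffer (jiffies):", ages)
print("inferred HZ:", hz)
print("age at first report (s): %.2f" % (ages[0] / hz))
```

Under these assumptions 236 descriptors are outstanding, TDH stays at 2e across all three dumps, and the stuck buffer is about one second old (251 jiffies at HZ=250) at the first report. The eth1 values from the original report quoted below (TDH <1e>, TDT <a>, time_stamp <33bae15>, jiffies <33bafaf>) give the same 236 pending descriptors and an age of 410 jiffies.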
ethtool -i eth0
driver: e1000e
version: 1.0.2-k2
firmware-version: 0.15-5
bus-info: 0000:04:00.0
NIC statistics:
rx_packets: 8202754725
tx_packets: 7398272195
rx_bytes: 4373145698252
tx_bytes: 5234354904619
rx_broadcast: 59775
tx_broadcast: 405
rx_multicast: 0
tx_multicast: 0
rx_errors: 0
tx_errors: 0
tx_dropped: 0
multicast: 0
collisions: 0
rx_length_errors: 0
rx_over_errors: 0
rx_crc_errors: 0
rx_frame_errors: 0
rx_no_buffer_count: 1185
rx_missed_errors: 1466
tx_aborted_errors: 0
tx_carrier_errors: 0
tx_fifo_errors: 0
tx_heartbeat_errors: 0
tx_window_errors: 0
tx_abort_late_coll: 0
tx_deferred_ok: 0
tx_single_coll_ok: 0
tx_multi_coll_ok: 0
tx_timeout_count: 0
tx_restart_queue: 12
rx_long_length_errors: 0
rx_short_length_errors: 0
rx_align_errors: 0
tx_tcp_seg_good: 0
tx_tcp_seg_failed: 0
rx_flow_control_xon: 0
rx_flow_control_xoff: 0
tx_flow_control_xon: 0
tx_flow_control_xoff: 0
rx_long_byte_count: 4373145698252
rx_csum_offload_good: 8084424290
rx_csum_offload_errors: 5690
rx_header_split: 0
alloc_rx_buff_failed: 0
tx_smbus: 0
rx_smbus: 48588
dropped_smbus: 0
rx_dma_failed: 0
tx_dma_failed: 0
When this occurred, traffic on the eth0 interface was about RX: 360 Mbit/s
and TX: 340 Mbit/s.
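Bruce asked about the workload; the ethtool -S counters above also give the mean packet size, which together with the traffic rates indicates the packet load. A rough sketch, ASSUMING "Mbit/s" means 10**6 bit/s and that the byte counters count frames the same way as the traffic figures, so the results are approximate:

```python
# Rough sketch: average packet sizes from the eth0 "ethtool -S" counters
# and the packet rates implied by the quoted traffic levels.
# ASSUMPTION: "Mbit/s" means 10**6 bit/s and the byte counters use the
# same framing as the traffic figures; results are approximate.
rx_packets, tx_packets = 8202754725, 7398272195
rx_bytes, tx_bytes = 4373145698252, 5234354904619
rx_missed_errors = 1466


def packets_per_second(mbit_per_s, avg_packet_bytes):
    """Convert a bit rate into packets/s for a given mean packet size."""
    return mbit_per_s * 1_000_000 / 8 / avg_packet_bytes


avg_rx = rx_bytes / rx_packets  # mean received packet size, bytes
avg_tx = tx_bytes / tx_packets  # mean transmitted packet size, bytes

rx_pps = packets_per_second(360, avg_rx)
tx_pps = packets_per_second(340, avg_tx)

print("avg RX packet: %.0f bytes, avg TX packet: %.0f bytes" % (avg_rx, avg_tx))
print("approx. RX %.0f pkt/s, TX %.0f pkt/s" % (rx_pps, tx_pps))
print("rx_missed_errors per received packet: %.1e" % (rx_missed_errors / rx_packets))
```

With these counters the mean packet sizes are roughly 533 bytes received and 708 bytes transmitted, so the quoted traffic corresponds to on the order of 84k RX and 60k TX packets per second, and rx_missed_errors amounts to fewer than one in a million received packets.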
On 2010-03-29 19:29, Paweł Staszewski wrote:
> The lspci -vvv and ethtool -S outputs are in the attached files.
>
> Network traffic when I got this info:
> eth1: RX: 157.22 Mb/s TX: 379.27 Mb/s
>
> ethtool -i eth1
> driver: e1000e
> version: 1.0.2-k2
> firmware-version: 0.5-7
> bus-info: 0000:05:00.0
> This is an Intel Corporation 82573L Gigabit Ethernet Controller.
>
>
> But this server also has another gigabit interface:
> Intel Corporation 82573E Gigabit Ethernet Controller
> This interface carries twice as much traffic as eth1 (82573L).
> ethtool -i eth0
> driver: e1000e
> version: 1.0.2-k2
> firmware-version: 0.15-5
> bus-info: 0000:04:00.0
>
> Also, this server ran for 4 months without problems on the 2.6.29.1
> kernel.
>
> The e1000e driver I use is the one that ships with the kernel (the
> standard built-in e1000e driver).
> I haven't tried other drivers.
>
> This is a production server, so I can't run many tests.
>
>
> On 2010-03-29 18:41, Allan, Bruce W wrote:
>> [adding e1000-devel]
>>
>> Please provide more information:
>> * what NIC/LOM is this on (preferably send full output from lspci -vvv)
>> * what type of networking workload is running at the time the hang
>> occurred
>> * a dump of the NIC/LOM statistics might also help (ethtool -S eth1)
>>
>> Have you tried the latest standalone e1000e driver on e1000.sf.net?
>> Does it reproduce the issue?
>>
>> If we cannot reproduce the hang in-house, would you be able/willing
>> to run a debug driver to gather more information?
>>
>> Thanks,
>> Bruce.
>>
>> -----Original Message-----
>> From: netdev-owner@vger.kernel.org
>> [mailto:netdev-owner@vger.kernel.org] On Behalf Of Pawel Staszewski
>> Sent: Monday, March 29, 2010 8:34 AM
>> To: Linux Network Development list
>> Subject: eth1: Detected Hardware Unit Hang
>>
>> After updating the kernel from 2.6.29.1 to 2.6.33.1, I get this in
>> dmesg:
>>
>> 0000:05:00.0: eth1: Detected Hardware Unit Hang:
>> TDH<1e>
>> TDT<a>
>> next_to_use<a>
>> next_to_clean<1d>
>> buffer_info[next_to_clean]:
>> time_stamp<33bae15>
>> next_to_watch<20>
>> jiffies<33bafaf>
>> next_to_watch.status<0>
>> MAC Status<80080783>
>> PHY Status<796d>
>> PHY 1000BASE-T Status<3800>
>> PHY Extended Status<3000>
>> PCI Status<10>
>> 0000:05:00.0: eth1: Detected Hardware Unit Hang:
>> TDH<1e>
>> TDT<a>
>> next_to_use<a>
>> next_to_clean<1d>
>> buffer_info[next_to_clean]:
>> time_stamp<33bae15>
>> next_to_watch<20>
>> jiffies<33bb1a3>
>> next_to_watch.status<0>
>> MAC Status<80080783>
>> PHY Status<796d>
>> PHY 1000BASE-T Status<3800>
>> PHY Extended Status<3000>
>> PCI Status<10>
>> 0000:05:00.0: eth1: Detected Hardware Unit Hang:
>> TDH<1e>
>> TDT<a>
>> next_to_use<a>
>> next_to_clean<1d>
>> buffer_info[next_to_clean]:
>> time_stamp<33bae15>
>> next_to_watch<20>
>> jiffies<33bb397>
>> next_to_watch.status<0>
>> MAC Status<80080783>
>> PHY Status<796d>
>> PHY 1000BASE-T Status<3800>
>> PHY Extended Status<3000>
>> PCI Status<10>
>> ------------[ cut here ]------------
>> WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0x118/0x19c()
>> Hardware name: X7DCT
>> NETDEV WATCHDOG: eth1 (e1000e): transmit queue 0 timed out
>> Modules linked in:
>> Pid: 0, comm: swapper Not tainted 2.6.33.1 #2
>> Call Trace:
>> [<c1024e3d>] ? warn_slowpath_common+0x52/0x71
>> [<c1024e49>] ? warn_slowpath_common+0x5e/0x71
>> [<c1024e8e>] ? warn_slowpath_fmt+0x26/0x2a
>> [<c1261f54>] ? dev_watchdog+0x118/0x19c
>> [<c102135c>] ? __wake_up+0x29/0x39
>> [<c10320c6>] ? insert_work+0x40/0x44
>> [<c1261e3c>] ? dev_watchdog+0x0/0x19c
>> [<c102cc15>] ? run_timer_softirq+0x11a/0x173
>> [<c1028e5b>] ? __do_softirq+0x74/0xdf
>> [<c1028ee9>] ? do_softirq+0x23/0x27
>> [<c10290be>] ? irq_exit+0x26/0x58
>> [<c10102d7>] ? smp_apic_timer_interrupt+0x6c/0x76
>> [<c12c5f9a>] ? apic_timer_interrupt+0x2a/0x30
>> [<c1007e06>] ? mwait_idle+0x49/0x4e
>> [<c10017e8>] ? cpu_idle+0x41/0x5a
>> ---[ end trace bcca9926a046332c ]---
>>
>>
>> With kernel 2.6.29.1 everything was OK.
>>
>>
>
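Every hang dump in this thread, for eth0 on 2.6.33 and for eth1 on 2.6.33.1, shows the same MAC Status <80080783>, PHY Status <796d> and PHY 1000BASE-T Status <3800>; only PCI Status differs (10 vs 18). A minimal sketch decoding a few well-known bits, with masks taken from the Linux headers (E1000_STATUS_* in e1000e, BMSR_* and LPA_1000* in <linux/mii.h>, PCI_STATUS_* in <linux/pci_regs.h>); bits not listed are simply ignored:

```python
# Minimal sketch: decode selected bits of the registers printed in the
# hang dumps. Only the bits listed below are decoded; all others are ignored.
MAC_STATUS = {
    "full duplex": 0x00000001,        # E1000_STATUS_FD
    "link up": 0x00000002,            # E1000_STATUS_LU
    "TXOFF (Tx paused)": 0x00000010,  # E1000_STATUS_TXOFF
    "speed 100": 0x00000040,          # E1000_STATUS_SPEED_100
    "speed 1000": 0x00000080,         # E1000_STATUS_SPEED_1000
    "GIO master enable": 0x00080000,  # E1000_STATUS_GIO_MASTER_ENABLE
}
PHY_STATUS = {  # MII basic mode status register (BMSR)
    "link status": 0x0004,            # BMSR_LSTATUS
    "autoneg complete": 0x0020,       # BMSR_ANEGCOMPLETE
}
PHY_1000T_STATUS = {  # MII 1000BASE-T status register
    "local receiver OK": 0x2000,      # LPA_1000LOCALRXOK
    "remote receiver OK": 0x1000,     # LPA_1000REMRXOK
    "partner 1000 full": 0x0800,      # LPA_1000FULL
    "partner 1000 half": 0x0400,      # LPA_1000HALF
}
PCI_STATUS = {  # PCI configuration-space status register
    "interrupt pending": 0x0008,      # PCI_STATUS_INTERRUPT
    "capabilities list": 0x0010,      # PCI_STATUS_CAP_LIST
}


def decode(value, bits):
    """Return the names of the listed bits that are set in value."""
    return [name for name, mask in bits.items() if value & mask]


print("MAC Status 80080783:", decode(0x80080783, MAC_STATUS))
print("PHY Status 796d:", decode(0x796D, PHY_STATUS))
print("PHY 1000BASE-T Status 3800:", decode(0x3800, PHY_1000T_STATUS))
print("PCI Status 10:", decode(0x10, PCI_STATUS))
print("PCI Status 18:", decode(0x18, PCI_STATUS))
```

For both reports this reads as link up at 1000 Mb/s full duplex with autonegotiation complete and TXOFF clear, so the link and pause state appear normal while the Tx head is stuck. The one difference, PCI Status <18> in the third eth0 dump, adds the interrupt-pending bit.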
Thread overview: 9+ messages
2010-03-29 15:33 eth1: Detected Hardware Unit Hang Paweł Staszewski
2010-03-29 16:41 ` Allan, Bruce W
2010-03-29 17:29 ` Paweł Staszewski
2010-03-29 17:36 ` Paweł Staszewski
2010-03-31 7:47 ` Paweł Staszewski [this message]
2010-03-31 18:03 ` Tantilov, Emil S
2010-03-31 19:16 ` Paweł Staszewski
2010-03-31 19:59 ` Tantilov, Emil S
2010-03-31 20:06 ` Paweł Staszewski