Re: eth1: Detected Hardware Unit Hang


From: "Paweł Staszewski" <pstaszewski@itcare.pl>
To: "Allan, Bruce W" <bruce.w.allan@intel.com>
Cc: Linux Network Development list <netdev@vger.kernel.org>,
	"e1000-devel@lists.sourceforge.net"
	<e1000-devel@lists.sourceforge.net>
Subject: Re: eth1: Detected Hardware Unit Hang
Date: Wed, 31 Mar 2010 09:47:15 +0200
Message-ID: <4BB2FE03.4090608@itcare.pl>
In-Reply-To: <4BB0E394.2060908@itcare.pl>

Hello

I reproduced this problem on another machine with the same hardware
(kernel 2.6.33). Here is the dmesg output:

Mar 27 18:19:16 TM_01_C1 [1817894.769395] 0000:04:00.0: eth0: Detected Hardware Unit Hang:
Mar 27 18:19:16 TM_01_C1 [1817894.769396]   TDH <2e>
Mar 27 18:19:16 TM_01_C1 [1817894.769397]   TDT <1a>
Mar 27 18:19:16 TM_01_C1 [1817894.769397]   next_to_use <1a>
Mar 27 18:19:16 TM_01_C1 [1817894.769398]   next_to_clean <2d>
Mar 27 18:19:16 TM_01_C1 [1817894.769398] buffer_info[next_to_clean]:
Mar 27 18:19:16 TM_01_C1 [1817894.769399]   time_stamp <11b1591e9>
Mar 27 18:19:16 TM_01_C1 [1817894.769399]   next_to_watch <2f>
Mar 27 18:19:16 TM_01_C1 [1817894.769400]   jiffies <11b1592e4>
Mar 27 18:19:16 TM_01_C1 [1817894.769401]   next_to_watch.status <0>
Mar 27 18:19:16 TM_01_C1 [1817894.769401] MAC Status <80080783>
Mar 27 18:19:16 TM_01_C1 [1817894.769402] PHY Status <796d>
Mar 27 18:19:16 TM_01_C1 [1817894.769402] PHY 1000BASE-T Status <3800>
Mar 27 18:19:16 TM_01_C1 [1817894.769403] PHY Extended Status <3000>
Mar 27 18:19:16 TM_01_C1 [1817894.769404] PCI Status <10>
Mar 27 18:19:18 TM_01_C1 [1817896.773365] 0000:04:00.0: eth0: Detected Hardware Unit Hang:
Mar 27 18:19:18 TM_01_C1 [1817896.773367]   TDH <2e>
Mar 27 18:19:18 TM_01_C1 [1817896.773368]   TDT <1a>
Mar 27 18:19:18 TM_01_C1 [1817896.773368]   next_to_use <1a>
Mar 27 18:19:18 TM_01_C1 [1817896.773369]   next_to_clean <2d>
Mar 27 18:19:18 TM_01_C1 [1817896.773369] buffer_info[next_to_clean]:
Mar 27 18:19:18 TM_01_C1 [1817896.773370]   time_stamp <11b1591e9>
Mar 27 18:19:18 TM_01_C1 [1817896.773370]   next_to_watch <2f>
Mar 27 18:19:18 TM_01_C1 [1817896.773371]   jiffies <11b1594d8>
Mar 27 18:19:18 TM_01_C1 [1817896.773372]   next_to_watch.status <0>
Mar 27 18:19:18 TM_01_C1 [1817896.773372] MAC Status <80080783>
Mar 27 18:19:18 TM_01_C1 [1817896.773373] PHY Status <796d>
Mar 27 18:19:18 TM_01_C1 [1817896.773373] PHY 1000BASE-T Status <3800>
Mar 27 18:19:18 TM_01_C1 [1817896.773374] PHY Extended Status <3000>
Mar 27 18:19:18 TM_01_C1 [1817896.773375] PCI Status <10>
Mar 27 18:19:20 TM_01_C1 [1817898.769353] 0000:04:00.0: eth0: Detected Hardware Unit Hang:
Mar 27 18:19:20 TM_01_C1 [1817898.769355]   TDH <2e>
Mar 27 18:19:20 TM_01_C1 [1817898.769355]   TDT <1a>
Mar 27 18:19:20 TM_01_C1 [1817898.769356]   next_to_use <1a>
Mar 27 18:19:20 TM_01_C1 [1817898.769356]   next_to_clean <2d>
Mar 27 18:19:20 TM_01_C1 [1817898.769357] buffer_info[next_to_clean]:
Mar 27 18:19:20 TM_01_C1 [1817898.769358]   time_stamp <11b1591e9>
Mar 27 18:19:20 TM_01_C1 [1817898.769358]   next_to_watch <2f>
Mar 27 18:19:20 TM_01_C1 [1817898.769359]   jiffies <11b1596cc>
Mar 27 18:19:20 TM_01_C1 [1817898.769359]   next_to_watch.status <0>
Mar 27 18:19:20 TM_01_C1 [1817898.769360] MAC Status <80080783>
Mar 27 18:19:20 TM_01_C1 [1817898.769361] PHY Status <796d>
Mar 27 18:19:20 TM_01_C1 [1817898.769361] PHY 1000BASE-T Status <3800>
Mar 27 18:19:20 TM_01_C1 [1817898.769362] PHY Extended Status <3000>
Mar 27 18:19:20 TM_01_C1 [1817898.769362] PCI Status <18>
Mar 27 18:19:21 TM_01_C1 [1817899.773012] ------------[ cut here ]------------
Mar 27 18:19:21 TM_01_C1 [1817899.773023] WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0x130/0x1d3()
Mar 27 18:19:21 TM_01_C1 [1817899.773026] Hardware name: X7DCT
Mar 27 18:19:21 TM_01_C1 [1817899.773028] NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out
Mar 27 18:19:21 TM_01_C1 [1817899.773030] Modules linked in: coretemp hwmon_vid hwmon [last unloaded: w83627hf]
Mar 27 18:19:21 TM_01_C1 [1817899.773038] Pid: 0, comm: swapper Not tainted 2.6.33 #2
Mar 27 18:19:21 TM_01_C1 [1817899.773040] Call Trace:
Mar 27 18:19:21 TM_01_C1 [1817899.773042] <IRQ>  [<ffffffff813003b3>] ? dev_watchdog+0x130/0x1d3
Mar 27 18:19:21 TM_01_C1 [1817899.773050]  [<ffffffff813003b3>] ? dev_watchdog+0x130/0x1d3
Mar 27 18:19:21 TM_01_C1 [1817899.773055]  [<ffffffff81032d1a>] ? warn_slowpath_common+0x77/0xa3
Mar 27 18:19:21 TM_01_C1 [1817899.773059]  [<ffffffff81032da2>] ? warn_slowpath_fmt+0x51/0x59
Mar 27 18:19:21 TM_01_C1 [1817899.773064]  [<ffffffff8102910c>] ? enqueue_task_fair+0x3e/0xa1
Mar 27 18:19:21 TM_01_C1 [1817899.773068]  [<ffffffff8102f0c2>] ? try_to_wake_up+0x368/0x379
Mar 27 18:19:21 TM_01_C1 [1817899.773072]  [<ffffffff812ee612>] ? netdev_drivername+0x3b/0x40
Mar 27 18:19:21 TM_01_C1 [1817899.773075]  [<ffffffff813003b3>] ? dev_watchdog+0x130/0x1d3
Mar 27 18:19:21 TM_01_C1 [1817899.773079]  [<ffffffff81026d60>] ? __wake_up+0x30/0x44
Mar 27 18:19:21 TM_01_C1 [1817899.773082]  [<ffffffff81300283>] ? dev_watchdog+0x0/0x1d3
Mar 27 18:19:21 TM_01_C1 [1817899.773087]  [<ffffffff8103f5d1>] ? run_timer_softirq+0x200/0x29e
Mar 27 18:19:21 TM_01_C1 [1817899.773091]  [<ffffffff810386f6>] ? __do_softirq+0xd7/0x195
Mar 27 18:19:21 TM_01_C1 [1817899.773099]  [<ffffffff810152b3>] ? lapic_next_event+0x18/0x1d
Mar 27 18:19:21 TM_01_C1 [1817899.773104]  [<ffffffff81002e0c>] ? call_softirq+0x1c/0x28
Mar 27 18:19:21 TM_01_C1 [1817899.773107]  [<ffffffff81004811>] ? do_softirq+0x31/0x63
Mar 27 18:19:21 TM_01_C1 [1817899.773110]  [<ffffffff810384eb>] ? irq_exit+0x36/0x78
Mar 27 18:19:21 TM_01_C1 [1817899.773113]  [<ffffffff81015d0b>] ? smp_apic_timer_interrupt+0x87/0x95
Mar 27 18:19:21 TM_01_C1 [1817899.773117]  [<ffffffff810028d3>] ? apic_timer_interrupt+0x13/0x20
Mar 27 18:19:21 TM_01_C1 [1817899.773119] <EOI>  [<ffffffff81008bdd>] ? mwait_idle+0x9b/0xa0
Mar 27 18:19:21 TM_01_C1 [1817899.773126]  [<ffffffff81001385>] ? cpu_idle+0x53/0x8b
Mar 27 18:19:21 TM_01_C1 [1817899.773128] ---[ end trace 4ac842842c6f54b3 ]---
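
Each hang report in the log above prints the transmit ring head and tail (TDH, TDT) together with two jiffies values. The Python sketch below is not part of the original report; it parses the first report and computes how long the oldest buffer has been waiting and how many descriptors sit between head and tail. The function names are made up for illustration, and the ring size of 256 descriptors is an assumption, since the actual ring size on this machine is not given in the thread.

```python
import re

# First "Detected Hardware Unit Hang" report from the log above,
# with the syslog prefixes removed.
DUMP = """\
TDH <2e>
TDT <1a>
next_to_use <1a>
next_to_clean <2d>
time_stamp <11b1591e9>
next_to_watch <2f>
jiffies <11b1592e4>
next_to_watch.status <0>
"""

def parse_hang_dump(text):
    """Return {field: int} for every 'name <hex>' line in a hang report."""
    pattern = r"^\s*([\w.]+)\s*<([0-9a-fA-F]+)>\s*$"
    return {name: int(value, 16)
            for name, value in re.findall(pattern, text, re.M)}

def stalled_jiffies(fields):
    """Jiffies elapsed since the buffer at next_to_clean was time-stamped."""
    return fields["jiffies"] - fields["time_stamp"]

def outstanding_descriptors(fields, ring_size=256):
    """Descriptors from head up to tail, modulo the ring.

    ring_size=256 is an assumed value, not taken from the report.
    """
    return (fields["TDT"] - fields["TDH"]) % ring_size

fields = parse_hang_dump(DUMP)
print(stalled_jiffies(fields))          # 251
print(outstanding_descriptors(fields))  # 236 with the assumed ring size
```

In the three reports, time_stamp, TDH and TDT stay the same while jiffies advances by 500 between reports logged about 2 seconds apart. That would be consistent with HZ=250, although the kernel configuration is not stated in the thread.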

ethtool -i eth0
driver: e1000e
version: 1.0.2-k2
firmware-version: 0.15-5
bus-info: 0000:04:00.0

NIC statistics:
      rx_packets: 8202754725
      tx_packets: 7398272195
      rx_bytes: 4373145698252
      tx_bytes: 5234354904619
      rx_broadcast: 59775
      tx_broadcast: 405
      rx_multicast: 0
      tx_multicast: 0
      rx_errors: 0
      tx_errors: 0
      tx_dropped: 0
      multicast: 0
      collisions: 0
      rx_length_errors: 0
      rx_over_errors: 0
      rx_crc_errors: 0
      rx_frame_errors: 0
      rx_no_buffer_count: 1185
      rx_missed_errors: 1466
      tx_aborted_errors: 0
      tx_carrier_errors: 0
      tx_fifo_errors: 0
      tx_heartbeat_errors: 0
      tx_window_errors: 0
      tx_abort_late_coll: 0
      tx_deferred_ok: 0
      tx_single_coll_ok: 0
      tx_multi_coll_ok: 0
      tx_timeout_count: 0
      tx_restart_queue: 12
      rx_long_length_errors: 0
      rx_short_length_errors: 0
      rx_align_errors: 0
      tx_tcp_seg_good: 0
      tx_tcp_seg_failed: 0
      rx_flow_control_xon: 0
      rx_flow_control_xoff: 0
      tx_flow_control_xon: 0
      tx_flow_control_xoff: 0
      rx_long_byte_count: 4373145698252
      rx_csum_offload_good: 8084424290
      rx_csum_offload_errors: 5690
      rx_header_split: 0
      alloc_rx_buff_failed: 0
      tx_smbus: 0
      rx_smbus: 48588
      dropped_smbus: 0
      rx_dma_failed: 0
      tx_dma_failed: 0
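
With this many counters it is easy to miss the few that are not zero. As a rough sketch (not part of the original report), the statistics above can be parsed and filtered in Python. Only a subset of the counters is pasted here, and the list of "suspect" name fragments is an illustrative choice, not something defined by the driver.

```python
# A subset of the counters above; the full output can be pasted
# in the same "name: value" form.
STATS = """\
rx_packets: 8202754725
tx_packets: 7398272195
rx_errors: 0
tx_errors: 0
rx_no_buffer_count: 1185
rx_missed_errors: 1466
tx_timeout_count: 0
tx_restart_queue: 12
rx_csum_offload_errors: 5690
rx_dma_failed: 0
tx_dma_failed: 0
"""

# Name fragments treated as problem indicators (assumed for illustration).
SUSPECT = ("error", "missed", "no_buffer", "failed", "timeout", "restart")

def parse_stats(text):
    """Parse 'name: value' lines into a dict of integer counters."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            stats[name.strip()] = int(value)
    return stats

def nonzero_suspects(stats):
    """Return the suspect counters whose value is not zero."""
    return {name: value for name, value in stats.items()
            if value and any(tag in name for tag in SUSPECT)}

for name, value in nonzero_suspects(parse_stats(STATS)).items():
    print(f"{name}: {value}")
```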


When this occurred, traffic on the eth0 interface was about RX: 360 Mbit/s
and TX: 340 Mbit/s.
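
To express these rates in packets per second, the byte and packet counters from the statistics above give an average frame size. This is only a rough sketch: the function names are illustrative, and the averages cover the whole lifetime of the counters, not the moment of the hang.

```python
# Counters copied from the NIC statistics above.
RX_PACKETS = 8202754725
RX_BYTES = 4373145698252
TX_PACKETS = 7398272195
TX_BYTES = 5234354904619

def average_frame_bytes(total_bytes, total_packets):
    """Average bytes per packet over the lifetime of the counters."""
    return total_bytes / total_packets

def packets_per_second(mbit_per_s, frame_bytes):
    """Approximate packet rate for a bit rate and an average frame size."""
    return mbit_per_s * 1_000_000 / 8 / frame_bytes

rx_avg = average_frame_bytes(RX_BYTES, RX_PACKETS)
tx_avg = average_frame_bytes(TX_BYTES, TX_PACKETS)

print(f"RX: ~{rx_avg:.0f} B/packet, ~{packets_per_second(360, rx_avg):.0f} packets/s")
print(f"TX: ~{tx_avg:.0f} B/packet, ~{packets_per_second(340, tx_avg):.0f} packets/s")
```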



On 2010-03-29 19:29, Paweł Staszewski wrote:
> lspci -vvv + ethtool -S in attached files.
>
> Network traffic when I got this info:
> eth1:    RX:    157.22 Mb/s    TX:    379.27 Mb/s
>
> ethtool -i eth1
> driver: e1000e
> version: 1.0.2-k2
> firmware-version: 0.5-7
> bus-info: 0000:05:00.0
> This is: Intel Corporation 82573L Gigabit Ethernet Controller
>
>
> But in this server I have another gigabit interface:
> Intel Corporation 82573E Gigabit Ethernet Controller
> this interface has two times more traffic than eth0 (82573L)
> ethtool -i eth0
> driver: e1000e
> version: 1.0.2-k2
> firmware-version: 0.15-5
> bus-info: 0000:04:00.0
>
> Also, this server ran for 4 months without problems on the 2.6.29.1
> kernel.
>
> The e1000e driver I use is the one from the kernel (the standard
> built-in e1000e driver).
> I haven't tried other drivers.
>
> This is a production server, so I can't run many tests.
>
>
> On 2010-03-29 18:41, Allan, Bruce W wrote:
>> [adding e1000-devel]
>>
>> Please provide more information:
>> * what NIC/LOM is this on (preferably send full output from lspci -vvv)
>> * what type of networking workload is running at the time the hang 
>> occurred
>> * a dump of the NIC/LOM statistics might also help (ethtool -S eth1)
>>
>> Have you tried the latest standalone e1000e driver on e1000.sf.net?  
>> Does it reproduce the issue?
>>
>> If we cannot reproduce the hang in-house, would you be able/willing 
>> to run a debug driver to gather more information?
>>
>> Thanks,
>> Bruce.
>>
>> -----Original Message-----
>> From: netdev-owner@vger.kernel.org 
>> [mailto:netdev-owner@vger.kernel.org] On Behalf Of Pawel Staszewski
>> Sent: Monday, March 29, 2010 8:34 AM
>> To: Linux Network Development list
>> Subject: eth1: Detected Hardware Unit Hang
>>
>> After updating the kernel from 2.6.29.1 to 2.6.33.1, I get this in
>> dmesg:
>>
>> 0000:05:00.0: eth1: Detected Hardware Unit Hang:
>>     TDH<1e>
>>     TDT<a>
>>     next_to_use<a>
>>     next_to_clean<1d>
>> buffer_info[next_to_clean]:
>>     time_stamp<33bae15>
>>     next_to_watch<20>
>>     jiffies<33bafaf>
>>     next_to_watch.status<0>
>> MAC Status<80080783>
>> PHY Status<796d>
>> PHY 1000BASE-T Status<3800>
>> PHY Extended Status<3000>
>> PCI Status<10>
>> 0000:05:00.0: eth1: Detected Hardware Unit Hang:
>>     TDH<1e>
>>     TDT<a>
>>     next_to_use<a>
>>     next_to_clean<1d>
>> buffer_info[next_to_clean]:
>>     time_stamp<33bae15>
>>     next_to_watch<20>
>>     jiffies<33bb1a3>
>>     next_to_watch.status<0>
>> MAC Status<80080783>
>> PHY Status<796d>
>> PHY 1000BASE-T Status<3800>
>> PHY Extended Status<3000>
>> PCI Status<10>
>> 0000:05:00.0: eth1: Detected Hardware Unit Hang:
>>     TDH<1e>
>>     TDT<a>
>>     next_to_use<a>
>>     next_to_clean<1d>
>> buffer_info[next_to_clean]:
>>     time_stamp<33bae15>
>>     next_to_watch<20>
>>     jiffies<33bb397>
>>     next_to_watch.status<0>
>> MAC Status<80080783>
>> PHY Status<796d>
>> PHY 1000BASE-T Status<3800>
>> PHY Extended Status<3000>
>> PCI Status<10>
>> ------------[ cut here ]------------
>> WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0x118/0x19c()
>> Hardware name: X7DCT
>> NETDEV WATCHDOG: eth1 (e1000e): transmit queue 0 timed out
>> Modules linked in:
>> Pid: 0, comm: swapper Not tainted 2.6.33.1 #2
>> Call Trace:
>>    [<c1024e3d>] ? warn_slowpath_common+0x52/0x71
>>    [<c1024e49>] ? warn_slowpath_common+0x5e/0x71
>>    [<c1024e8e>] ? warn_slowpath_fmt+0x26/0x2a
>>    [<c1261f54>] ? dev_watchdog+0x118/0x19c
>>    [<c102135c>] ? __wake_up+0x29/0x39
>>    [<c10320c6>] ? insert_work+0x40/0x44
>>    [<c1261e3c>] ? dev_watchdog+0x0/0x19c
>>    [<c102cc15>] ? run_timer_softirq+0x11a/0x173
>>    [<c1028e5b>] ? __do_softirq+0x74/0xdf
>>    [<c1028ee9>] ? do_softirq+0x23/0x27
>>    [<c10290be>] ? irq_exit+0x26/0x58
>>    [<c10102d7>] ? smp_apic_timer_interrupt+0x6c/0x76
>>    [<c12c5f9a>] ? apic_timer_interrupt+0x2a/0x30
>>    [<c1007e06>] ? mwait_idle+0x49/0x4e
>>    [<c10017e8>] ? cpu_idle+0x41/0x5a
>> ---[ end trace bcca9926a046332c ]---
>>
>>
>> With kernel 2.6.29.1 everything was OK.

