* [pv_ops] e1000e: "Detected Tx Unit Hang"
@ 2010-05-20 21:45 Stefan Kuhne
2010-05-20 22:18 ` Jeremy Fitzhardinge
2010-05-22 15:32 ` Thomas Goirand
0 siblings, 2 replies; 8+ messages in thread
From: Stefan Kuhne @ 2010-05-20 21:45 UTC (permalink / raw)
To: xen-devel
Hello,
my server has massive problems with my NIC.
I got: "Detected Tx Unit Hang".
At the moment I use 2.6.31 from Jeremy, does anyone know if it's fixed
in 2.6.32 or newer tree?
Regards,
Stefan Kuhne
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
* Re: [pv_ops] e1000e: "Detected Tx Unit Hang"
2010-05-20 21:45 [pv_ops] e1000e: "Detected Tx Unit Hang" Stefan Kuhne
@ 2010-05-20 22:18 ` Jeremy Fitzhardinge
2010-05-20 22:58 ` Stefan Kuhne
2010-05-22 15:32 ` Thomas Goirand
1 sibling, 1 reply; 8+ messages in thread
From: Jeremy Fitzhardinge @ 2010-05-20 22:18 UTC (permalink / raw)
To: xen-devel; +Cc: Stefan Kuhne
On 05/20/2010 02:45 PM, Stefan Kuhne wrote:
> Hello,
>
> my server has massive problems with my NIC.
> I got: "Detected Tx Unit Hang".
>
> At the moment I use 2.6.31 from Jeremy, does anyone know if it's fixed
> in 2.6.32 or newer tree?
>
e1000e works fine for me. However, I did have problems with my Ibex
Peak-based system and the integrated ethernet devices; they would drop
off the PCIe bus (lspci -vx would show all 0xff for the config space),
which turned out to be some problem with ALPM (PCIe active link power
management). Could this be what you're seeing?
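The all-0xff symptom described above can also be checked from a script. This is a minimal sketch, assuming a Linux host that exposes PCI configuration space through sysfs at /sys/bus/pci/devices/<address>/config. The helper name fell_off_bus and the example address are illustrative and not taken from this thread's tooling.

```python
from pathlib import Path


def fell_off_bus(config_path: str, header_len: int = 64) -> bool:
    """Return True if the first header_len bytes of config space are all 0xff."""
    data = Path(config_path).read_bytes()[:header_len]
    # A device that no longer answers on the bus reads back as all ones.
    return len(data) > 0 and all(b == 0xFF for b in data)


if __name__ == "__main__":
    # Hypothetical address; substitute the one lspci reports for the NIC.
    path = "/sys/bus/pci/devices/0000:02:00.0/config"
    if Path(path).exists():
        state = "dropped off the bus" if fell_off_bus(path) else "config space readable"
        print(f"{path}: {state}")
```

Running this periodically while the problem reproduces would show whether the device disappears from the bus or stays visible while only the transmit path stalls.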
J
* Re: [pv_ops] e1000e: "Detected Tx Unit Hang"
2010-05-20 22:18 ` Jeremy Fitzhardinge
@ 2010-05-20 22:58 ` Stefan Kuhne
2010-05-20 23:01 ` Jeremy Fitzhardinge
0 siblings, 1 reply; 8+ messages in thread
From: Stefan Kuhne @ 2010-05-20 22:58 UTC (permalink / raw)
To: xen-devel
On 21.05.2010 00:18, Jeremy Fitzhardinge wrote:
Hello Jeremy,
> e1000e works fine for me. However, I did have problems with my Ibex
> Peak-based system and the integrated ethernet devices; they would drop
> off the PCIe bus (lspci -vx would show all 0xff for the config space),
> which turned out to be some problem with ALPM (PCIe active link power
> management). Could this be what you're seeing?
>
my "lspci -vx" output:
02:00.0 Ethernet controller: Intel Corporation 82573E Gigabit Ethernet Controller (Copper)
Subsystem: FIRST INTERNATIONAL Computer Inc Unknown device 4720
Flags: bus master, fast devsel, latency 0, IRQ 409
Memory at d0000000 (32-bit, non-prefetchable) [size=128K]
I/O ports at 2000 [size=32]
Capabilities: [c8] Power Management version 2
Capabilities: [d0] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable+
Capabilities: [e0] Express Endpoint IRQ 0
Capabilities: [100] Advanced Error Reporting
Capabilities: [140] Device Serial Number c6-a9-09-ff-ff-0b-14-00
00: 86 80 8c 10 07 05 10 00 00 00 00 02 10 00 00 00
10: 00 00 00 d0 00 00 00 00 01 20 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 09 15 20 47
30: 00 00 00 00 c8 00 00 00 00 00 00 00 0b 01 00 00
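The hex rows above are the first 64 bytes of the device's PCI configuration space. As a rough sketch of how to read them (the function names are illustrative; the field offsets are the standard PCI header layout), the following decodes the vendor and device IDs and applies the all-0xff test Jeremy mentioned:

```python
DUMP = """\
00: 86 80 8c 10 07 05 10 00 00 00 00 02 10 00 00 00
10: 00 00 00 d0 00 00 00 00 01 20 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 09 15 20 47
30: 00 00 00 00 c8 00 00 00 00 00 00 00 0b 01 00 00
"""


def parse_hex_rows(dump: str) -> bytes:
    """Convert 'offset: xx xx ...' rows (hex rows only) into raw bytes."""
    raw = bytearray()
    for line in dump.splitlines():
        _offset, sep, rest = line.partition(":")
        if sep:
            raw.extend(int(tok, 16) for tok in rest.split())
    return bytes(raw)


def summarize(cfg: bytes) -> dict:
    """Pick a few standard header fields out of the config space."""
    return {
        # Multi-byte PCI header fields are stored little-endian.
        "vendor_id": int.from_bytes(cfg[0x00:0x02], "little"),
        "device_id": int.from_bytes(cfg[0x02:0x04], "little"),
        "base_class": cfg[0x0B],
        "all_ff": all(b == 0xFF for b in cfg),
    }


if __name__ == "__main__":
    info = summarize(parse_hex_rows(DUMP))
    print(f"vendor={info['vendor_id']:04x} device={info['device_id']:04x} "
          f"class={info['base_class']:02x} all_ff={info['all_ff']}")
```

For this dump the vendor ID decodes to 8086 (Intel), the device ID to 108c, and the base class to 02 (network controller), and the space is not all 0xff, so the device is still present on the bus.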
and the complete dmesg output:
[ 9620.997466] 0000:02:00.0: peth0: Detected Tx Unit Hang:
[ 9620.997469] TDH <fc>
[ 9620.997471] TDT <1f>
[ 9620.997473] next_to_use <1f>
[ 9620.997475] next_to_clean <fc>
[ 9620.997477] buffer_info[next_to_clean]:
[ 9620.997479] time_stamp <8e2ec3>
[ 9620.997481] next_to_watch <fc>
[ 9620.997483] jiffies <8e3a25>
[ 9620.997485] next_to_watch.status <0>
[ 9622.997490] 0000:02:00.0: peth0: Detected Tx Unit Hang:
[ 9622.997496] TDH <fc>
[ 9622.997500] TDT <1f>
[ 9622.997503] next_to_use <1f>
[ 9622.997507] next_to_clean <fc>
[ 9622.997511] buffer_info[next_to_clean]:
[ 9622.997515] time_stamp <8e2ec3>
[ 9622.997519] next_to_watch <fc>
[ 9622.997522] jiffies <8e41f5>
[ 9622.997526] next_to_watch.status <0>
[ 9624.997536] 0000:02:00.0: peth0: Detected Tx Unit Hang:
[ 9624.997541] TDH <fc>
[ 9624.997545] TDT <1f>
[ 9624.997549] next_to_use <1f>
[ 9624.997553] next_to_clean <fc>
[ 9624.997557] buffer_info[next_to_clean]:
[ 9624.997561] time_stamp <8e2ec3>
[ 9624.997565] next_to_watch <fc>
[ 9624.997568] jiffies <8e49c5>
[ 9624.997572] next_to_watch.status <0>
[ 9626.065848] eth0: port 1(peth0) entering disabled state
[ 9629.910292] e1000e: peth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[ 9629.910854] eth0: port 1(peth0) entering forwarding state
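The hang reports above can be read numerically. The sketch below parses the first report and derives two figures: how long the descriptor at next_to_clean has been waiting, and how many descriptors sit between the hardware head (TDH) and the tail (TDT). Two values are assumptions and not stated in the log: HZ=1000 (suggested by the jiffies counter advancing by 2000 between reports logged two seconds apart) and a transmit ring of 256 descriptors.

```python
import re

HANG_REPORT = """\
TDH <fc>
TDT <1f>
next_to_use <1f>
next_to_clean <fc>
time_stamp <8e2ec3>
next_to_watch <fc>
jiffies <8e3a25>
next_to_watch.status <0>
"""


def parse_hang(report: str) -> dict:
    """Map each 'name <hex>' field of a Tx hang report to an integer."""
    pairs = re.findall(r"([\w.]+)\s+<([0-9a-f]+)>", report)
    return {name: int(value, 16) for name, value in pairs}


def stalled_seconds(fields: dict, hz: int = 1000) -> float:
    """Age of the oldest unfinished buffer; hz is an assumed tick rate."""
    return (fields["jiffies"] - fields["time_stamp"]) / hz


def pending_descriptors(fields: dict, ring_size: int = 256) -> int:
    """Descriptors queued between head and tail; ring_size is assumed."""
    return (fields["TDT"] - fields["TDH"]) % ring_size


if __name__ == "__main__":
    f = parse_hang(HANG_REPORT)
    print("stalled for", stalled_seconds(f), "s")
    print("pending descriptors:", pending_descriptors(f))
```

Under those assumptions the first report corresponds to a buffer that has waited about 2.9 seconds with 35 descriptors queued. TDH and time_stamp are identical in all three reports and next_to_watch.status stays 0, so the log shows no descriptor completing between the reports.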
Regards,
Stefan Kuhne
* Re: [pv_ops] e1000e: "Detected Tx Unit Hang"
2010-05-20 22:58 ` Stefan Kuhne
@ 2010-05-20 23:01 ` Jeremy Fitzhardinge
2010-05-20 23:21 ` AW: " Heiko Wundram
2010-05-20 23:22 ` Stefan Kuhne
0 siblings, 2 replies; 8+ messages in thread
From: Jeremy Fitzhardinge @ 2010-05-20 23:01 UTC (permalink / raw)
To: xen-devel; +Cc: Stefan Kuhne
On 05/20/2010 03:58 PM, Stefan Kuhne wrote:
> On 21.05.2010 00:18, Jeremy Fitzhardinge wrote:
>
> Hello Jeremy,
>
>
>> e1000e works fine for me. However, I did have problems with my Ibex
>> Peak-based system and the integrated ethernet devices; they would drop
>> off the PCIe bus (lspci -vx would show all 0xff for the config space),
>> which turned out to be some problem with ALPM (PCIe active link power
>> management). Could this be what you're seeing?
>>
>>
> my "lspci -vx" output:
>
> 02:00.0 Ethernet controller: Intel Corporation 82573E Gigabit Ethernet
> Controller (Copper)
> Subsystem: FIRST INTERNATIONAL Computer Inc Unknown device 4720
> Flags: bus master, fast devsel, latency 0, IRQ 409
> Memory at d0000000 (32-bit, non-prefetchable) [size=128K]
> I/O ports at 2000 [size=32]
> Capabilities: [c8] Power Management version 2
> Capabilities: [d0] Message Signalled Interrupts: Mask- 64bit+
> Queue=0/0 Enable+
> Capabilities: [e0] Express Endpoint IRQ 0
> Capabilities: [100] Advanced Error Reporting
> Capabilities: [140] Device Serial Number c6-a9-09-ff-ff-0b-14-00
> 00: 86 80 8c 10 07 05 10 00 00 00 00 02 10 00 00 00
> 10: 00 00 00 d0 00 00 00 00 01 20 00 00 00 00 00 00
> 20: 00 00 00 00 00 00 00 00 00 00 00 00 09 15 20 47
> 30: 00 00 00 00 c8 00 00 00 00 00 00 00 0b 01 00 00
>
> and the complete dmesg output:
> [ 9620.997466] 0000:02:00.0: peth0: Detected Tx Unit Hang:
> [ 9620.997469] TDH <fc>
> [ 9620.997471] TDT <1f>
> [ 9620.997473] next_to_use <1f>
> [ 9620.997475] next_to_clean <fc>
> [ 9620.997477] buffer_info[next_to_clean]:
> [ 9620.997479] time_stamp <8e2ec3>
> [ 9620.997481] next_to_watch <fc>
> [ 9620.997483] jiffies <8e3a25>
> [ 9620.997485] next_to_watch.status <0>
> [ 9622.997490] 0000:02:00.0: peth0: Detected Tx Unit Hang:
> [ 9622.997496] TDH <fc>
> [ 9622.997500] TDT <1f>
> [ 9622.997503] next_to_use <1f>
> [ 9622.997507] next_to_clean <fc>
> [ 9622.997511] buffer_info[next_to_clean]:
> [ 9622.997515] time_stamp <8e2ec3>
> [ 9622.997519] next_to_watch <fc>
> [ 9622.997522] jiffies <8e41f5>
> [ 9622.997526] next_to_watch.status <0>
> [ 9624.997536] 0000:02:00.0: peth0: Detected Tx Unit Hang:
> [ 9624.997541] TDH <fc>
> [ 9624.997545] TDT <1f>
> [ 9624.997549] next_to_use <1f>
> [ 9624.997553] next_to_clean <fc>
> [ 9624.997557] buffer_info[next_to_clean]:
> [ 9624.997561] time_stamp <8e2ec3>
> [ 9624.997565] next_to_watch <fc>
> [ 9624.997568] jiffies <8e49c5>
> [ 9624.997572] next_to_watch.status <0>
> [ 9626.065848] eth0: port 1(peth0) entering disabled state
> [ 9629.910292] e1000e: peth0 NIC Link is Up 1000 Mbps Full Duplex, Flow
> Control: None
> [ 9629.910854] eth0: port 1(peth0) entering forwarding state
>
OK, definitely different problem. Does it happen immediately, or after
a while? Under load? Can you provide the full boot output, and cat
/proc/interrupts?
Thanks,
J
* AW: [pv_ops] e1000e: "Detected Tx Unit Hang"
2010-05-20 23:01 ` Jeremy Fitzhardinge
@ 2010-05-20 23:21 ` Heiko Wundram
2010-05-23 0:16 ` Stefan Kuhne
2010-05-20 23:22 ` Stefan Kuhne
1 sibling, 1 reply; 8+ messages in thread
From: Heiko Wundram @ 2010-05-20 23:21 UTC (permalink / raw)
To: 'Jeremy Fitzhardinge', xen-devel; +Cc: 'Stefan Kuhne'
I'm pretty sure the problem you're seeing is caused by broken firmware in the
specific chipset used on this Intel network card, not by the Xen/pv_ops
kernel. I've had the same problem under high load on some "semi-old"
Supermicro boxes I administer.
There's an Intel utility that patches the firmware issue (i.e., it updates the
network controller EEPROM), but it's no longer available online (at least,
the last time I looked I couldn't find it on the Intel site, where it
was prominently featured when I first looked for it).
I'll try to retrieve it from the last machine I applied this patch
to, but I'll only be able to do that some time during the (European) day
tomorrow.
--- Heiko.
-----Original Message-----
From: xen-devel-bounces@lists.xensource.com
[mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Jeremy
Fitzhardinge
Sent: Friday, 21 May 2010 01:01
To: xen-devel@lists.xensource.com
Cc: Stefan Kuhne
Subject: Re: [Xen-devel] [pv_ops] e1000e: "Detected Tx Unit Hang"
On 05/20/2010 03:58 PM, Stefan Kuhne wrote:
> On 21.05.2010 00:18, Jeremy Fitzhardinge wrote:
>
> Hello Jeremy,
>
>
>> e1000e works fine for me. However, I did have problems with my Ibex
>> Peak-based system and the integrated ethernet devices; they would drop
>> off the PCIe bus (lspci -vx would show all 0xff for the config space),
>> which turned out to be some problem with ALPM (PCIe active link power
>> management). Could this be what you're seeing?
>>
>>
> my "lspci -vx" output:
>
> 02:00.0 Ethernet controller: Intel Corporation 82573E Gigabit Ethernet
> Controller (Copper)
> Subsystem: FIRST INTERNATIONAL Computer Inc Unknown device 4720
> Flags: bus master, fast devsel, latency 0, IRQ 409
> Memory at d0000000 (32-bit, non-prefetchable) [size=128K]
> I/O ports at 2000 [size=32]
> Capabilities: [c8] Power Management version 2
> Capabilities: [d0] Message Signalled Interrupts: Mask- 64bit+
> Queue=0/0 Enable+
> Capabilities: [e0] Express Endpoint IRQ 0
> Capabilities: [100] Advanced Error Reporting
> Capabilities: [140] Device Serial Number c6-a9-09-ff-ff-0b-14-00
> 00: 86 80 8c 10 07 05 10 00 00 00 00 02 10 00 00 00
> 10: 00 00 00 d0 00 00 00 00 01 20 00 00 00 00 00 00
> 20: 00 00 00 00 00 00 00 00 00 00 00 00 09 15 20 47
> 30: 00 00 00 00 c8 00 00 00 00 00 00 00 0b 01 00 00
>
> and the complete dmesg output:
> [ 9620.997466] 0000:02:00.0: peth0: Detected Tx Unit Hang:
> [ 9620.997469] TDH <fc>
> [ 9620.997471] TDT <1f>
> [ 9620.997473] next_to_use <1f>
> [ 9620.997475] next_to_clean <fc>
> [ 9620.997477] buffer_info[next_to_clean]:
> [ 9620.997479] time_stamp <8e2ec3>
> [ 9620.997481] next_to_watch <fc>
> [ 9620.997483] jiffies <8e3a25>
> [ 9620.997485] next_to_watch.status <0>
> [ 9622.997490] 0000:02:00.0: peth0: Detected Tx Unit Hang:
> [ 9622.997496] TDH <fc>
> [ 9622.997500] TDT <1f>
> [ 9622.997503] next_to_use <1f>
> [ 9622.997507] next_to_clean <fc>
> [ 9622.997511] buffer_info[next_to_clean]:
> [ 9622.997515] time_stamp <8e2ec3>
> [ 9622.997519] next_to_watch <fc>
> [ 9622.997522] jiffies <8e41f5>
> [ 9622.997526] next_to_watch.status <0>
> [ 9624.997536] 0000:02:00.0: peth0: Detected Tx Unit Hang:
> [ 9624.997541] TDH <fc>
> [ 9624.997545] TDT <1f>
> [ 9624.997549] next_to_use <1f>
> [ 9624.997553] next_to_clean <fc>
> [ 9624.997557] buffer_info[next_to_clean]:
> [ 9624.997561] time_stamp <8e2ec3>
> [ 9624.997565] next_to_watch <fc>
> [ 9624.997568] jiffies <8e49c5>
> [ 9624.997572] next_to_watch.status <0>
> [ 9626.065848] eth0: port 1(peth0) entering disabled state
> [ 9629.910292] e1000e: peth0 NIC Link is Up 1000 Mbps Full Duplex, Flow
> Control: None
> [ 9629.910854] eth0: port 1(peth0) entering forwarding state
>
OK, definitely different problem. Does it happen immediately, or after
a while? Under load? Can you provide the full boot output, and cat
/proc/interrupts?
Thanks,
J
* Re: [pv_ops] e1000e: "Detected Tx Unit Hang"
2010-05-20 23:01 ` Jeremy Fitzhardinge
2010-05-20 23:21 ` AW: " Heiko Wundram
@ 2010-05-20 23:22 ` Stefan Kuhne
1 sibling, 0 replies; 8+ messages in thread
From: Stefan Kuhne @ 2010-05-20 23:22 UTC (permalink / raw)
To: xen-devel
On 21.05.2010 01:01, Jeremy Fitzhardinge wrote:
Hello Jeremy,
> OK, definitely different problem. Does it happen immediately, or after
> a while? Under load? Can you provide the full boot output, and cat
> /proc/interrupts?
>
It happens when copying from a domU to a physical PC.
Do you mean the boot output from dom0 (dmesg)?
root@Overmind:~# cat /proc/interrupts
CPU0 CPU1
1: 8 0 xen-pirq-ioapic-edge i8042
8: 0 0 xen-pirq-ioapic-edge rtc0
9: 926014 0 xen-pirq-ioapic-level acpi
12: 22 0 xen-pirq-ioapic-edge i8042
14: 129 0 xen-pirq-ioapic-edge ide0
16: 887 0 xen-pirq-ioapic-level uhci_hcd:usb5, firewire_ohci
17: 0 0 xen-pirq-ioapic-level mmc0
18: 0 0 xen-pirq-ioapic-level uhci_hcd:usb4
19: 0 0 xen-pirq-ioapic-level uhci_hcd:usb3
22: 72 0 xen-pirq-ioapic-level HDA Intel
23: 31 0 xen-pirq-ioapic-level ehci_hcd:usb1, uhci_hcd:usb2
381: 15770007 0 xen-dyn-event vif6.0
382: 85431 0 xen-dyn-event blkif-backend
383: 386 0 xen-dyn-event evtchn:xenconsoled
384: 139 0 xen-dyn-event evtchn:xenstored
385: 42592 0 xen-dyn-event vif5.0
386: 65335 0 xen-dyn-event blkif-backend
387: 315 0 xen-dyn-event evtchn:xenconsoled
388: 139 0 xen-dyn-event evtchn:xenstored
389: 43 0 xen-dyn-event vif4.0
390: 1306 0 xen-dyn-event blkif-backend
391: 123 0 xen-dyn-event evtchn:xenconsoled
392: 135 0 xen-dyn-event evtchn:xenstored
393: 6588 0 xen-dyn-event vif3.0
394: 6723 0 xen-dyn-event blkif-backend
395: 319 0 xen-dyn-event evtchn:xenconsoled
396: 265 0 xen-dyn-event evtchn:xenstored
397: 108544 0 xen-dyn-event vif2.0
398: 315 0 xen-dyn-event blkif-backend
399: 87 0 xen-dyn-event evtchn:xenconsoled
400: 128 0 xen-dyn-event evtchn:xenstored
401: 13477877 0 xen-dyn-event vif1.0
402: 866835 0 xen-dyn-event blkif-backend
403: 28802 0 xen-dyn-event blkif-backend
404: 300 0 xen-dyn-event evtchn:xenconsoled
405: 220 0 xen-dyn-event evtchn:xenstored
406: 0 0 xen-dyn-event evtchn:xenstored
407: 2460 0 xen-dyn-event evtchn:xenstored
408: 2953808 0 xen-pirq-msi ahci
409: 8689919 0 xen-pirq-msi peth0
412: 0 0 xen-dyn-virq pcpu
413: 4550 0 xen-dyn-event xenbus
414: 0 403 xen-dyn-ipi callfuncsingle1
415: 0 0 xen-dyn-virq debug1
416: 0 0 xen-dyn-ipi callfunc1
417: 0 104331 xen-dyn-ipi resched1
418: 0 53769606 xen-dyn-virq timer1
419: 221 0 xen-dyn-ipi callfuncsingle0
420: 0 0 xen-dyn-virq debug0
421: 0 0 xen-dyn-ipi callfunc0
422: 264761 0 xen-dyn-ipi resched0
423: 53761166 0 xen-dyn-virq timer0
NMI: 0 0 Non-maskable interrupts
LOC: 0 0 Local timer interrupts
SPU: 0 0 Spurious interrupts
CNT: 0 0 Performance counter interrupts
PND: 0 0 Performance pending work
RES: 264761 104331 Rescheduling interrupts
CAL: 221 403 Function call interrupts
TLB: 0 0 TLB shootdowns
TRM: 0 0 Thermal event interrupts
THR: 0 0 Threshold APIC interrupts
MCE: 0 0 Machine check exceptions
MCP: 180 180 Machine check polls
ERR: 0
MIS: 0
root@Overmind:~#
I have no PCI devices forwarded.
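To make the table above easier to scan, the following sketch ranks interrupt sources by total count and keeps the per-CPU split. It assumes the usual /proc/interrupts layout (a header row naming the CPUs, then one "label: counts... description" row per source). The sample embeds only a handful of rows copied from the output above, and the function names are illustrative.

```python
SAMPLE = """\
 CPU0 CPU1
381: 15770007 0 xen-dyn-event vif6.0
401: 13477877 0 xen-dyn-event vif1.0
408: 2953808 0 xen-pirq-msi ahci
409: 8689919 0 xen-pirq-msi peth0
417: 0 104331 xen-dyn-ipi resched1
ERR: 0
"""


def parse_interrupts(text: str) -> list:
    """Return (label, per_cpu_counts, description) for each row."""
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    rows = []
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        parts = rest.split()
        counts = []
        for tok in parts[:ncpus]:
            if not tok.isdigit():
                break  # rows such as ERR carry fewer counters than CPUs
            counts.append(int(tok))
        rows.append((label.strip(), counts, " ".join(parts[len(counts):])))
    return rows


def busiest(rows: list, n: int = 3) -> list:
    """Top n sources as (label, total, description), highest total first."""
    ranked = sorted(rows, key=lambda r: sum(r[1]), reverse=True)
    return [(label, sum(counts), desc) for label, counts, desc in ranked[:n]]


if __name__ == "__main__":
    for label, total, desc in busiest(parse_interrupts(SAMPLE)):
        print(f"{label:>4} {total:>10} {desc}")
```

On the sample rows the three busiest sources are vif6.0, vif1.0 and peth0. In the full output every device and event-channel interrupt is counted on CPU0, while CPU1 shows only its own IPI and timer rows.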
Regards,
Stefan Kuhne
* Re: [pv_ops] e1000e: "Detected Tx Unit Hang"
2010-05-20 21:45 [pv_ops] e1000e: "Detected Tx Unit Hang" Stefan Kuhne
2010-05-20 22:18 ` Jeremy Fitzhardinge
@ 2010-05-22 15:32 ` Thomas Goirand
1 sibling, 0 replies; 8+ messages in thread
From: Thomas Goirand @ 2010-05-22 15:32 UTC (permalink / raw)
To: xen-devel
Stefan Kuhne wrote:
> Hello,
>
> my server has massive problems with my NIC.
> I got: "Detected Tx Unit Hang".
>
> At the moment I use 2.6.31 from Jeremy, does anyone know if it's fixed
> in 2.6.32 or newer tree?
>
> Regards,
> Stefan Kuhne
>
We had this issue with many Supermicro servers as well. It seems that
Supermicro doesn't often upgrade BIOS/ROMs/etc. when they sell their
hardware. For us, this has often fixed the issue:
ethtool -K peth0 tso off
You might want to try it as well.
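Since an offload setting changed with ethtool -K is not kept across a reboot, it may be convenient to wrap the command in a small script that runs at startup. The sketch below only builds and optionally runs the same command shown above. The function names and the dry_run default are illustrative choices, and actually applying the setting requires root and an installed ethtool.

```python
import shutil
import subprocess


def tso_command(iface: str, enable: bool) -> list:
    """Build the ethtool argument list that toggles TCP segmentation offload."""
    return ["ethtool", "-K", iface, "tso", "on" if enable else "off"]


def set_tso(iface: str, enable: bool, dry_run: bool = True) -> list:
    """Return the command; execute it only when dry_run is False."""
    cmd = tso_command(iface, enable)
    if not dry_run:
        if shutil.which("ethtool") is None:
            raise RuntimeError("ethtool not found in PATH")
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmd


if __name__ == "__main__":
    # Dry run: print the command that would disable TSO on peth0.
    print(" ".join(set_tso("peth0", enable=False)))
```

The dry-run default makes it possible to review the exact command before changing settings on a NIC that carries live traffic.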
Thomas
* Re: [pv_ops] e1000e: "Detected Tx Unit Hang"
2010-05-20 23:21 ` AW: " Heiko Wundram
@ 2010-05-23 0:16 ` Stefan Kuhne
0 siblings, 0 replies; 8+ messages in thread
From: Stefan Kuhne @ 2010-05-23 0:16 UTC (permalink / raw)
To: xen-devel
Hello Heiko,
thanks for this script.
It seems to work fine now.
Thanks,
Stefan Kuhne