From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Heiko Wundram" <modelnine@modelnine.org>
Subject: AW: [pv_ops] e1000e: "Detected Tx Unit Hang"
Date: Fri, 21 May 2010 01:21:29 +0200
Message-ID: <00a301caf873$2d76d350$886479f0$@org>
References: <4BF5AD97.4000907@access.denied>
	<4BF5B547.60700@goop.org>	<4BF5BE8A.9060509@access.denied>
	<4BF5BF3E.2010708@goop.org>
Mime-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
Return-path: <xen-devel-bounces@lists.xensource.com>
In-Reply-To: <4BF5BF3E.2010708@goop.org>
Content-Language: de
List-Unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xensource.com>
List-Help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-Subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: 'Jeremy Fitzhardinge' <jeremy@goop.org>, xen-devel@lists.xensource.com
Cc: 'Stefan Kuhne' <stefan.kuhne@gmx.net>
List-Id: xen-devel@lists.xenproject.org

I'm pretty sure the problem you're seeing is caused by broken firmware in
the specific chipset used on this Intel network card, not by the Xen/pv_ops
kernel. I've had the same problem under high load on the "semi-old"
Supermicro boxes I administer.

There's an Intel utility that patches the firmware issue (i.e., it updates
the network controller's EEPROM), but it no longer seems to be available
online: the last time I looked, I couldn't find it on the Intel site, where
it was prominently featured when I first looked for it.

I'll try to retrieve it from the last machine I applied this patch to, but
I'll only be able to do so some time during the (European) day tomorrow.
--- Heiko.


-----Original Message-----
From: xen-devel-bounces@lists.xensource.com
[mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Jeremy
Fitzhardinge
Sent: Friday, 21 May 2010 01:01
To: xen-devel@lists.xensource.com
Cc: Stefan Kuhne
Subject: Re: [Xen-devel] [pv_ops] e1000e: "Detected Tx Unit Hang"

On 05/20/2010 03:58 PM, Stefan Kuhne wrote:
> On 21.05.2010 00:18, Jeremy Fitzhardinge wrote:
>
> Hello Jeremy,
>
>
>> e1000e works fine for me.  However, I did have problems with my Ibex
>> Peak-based system and the integrated ethernet devices; they would drop
>> off the PCIe bus (lspci -vx would show all 0xff for the config space),
>> which turned out to be some problem with ALPM (PCIe active link power
>> management).  Could this be what you're seeing?
>>
>>
> my "lspci -vx" output:
>
> 02:00.0 Ethernet controller: Intel Corporation 82573E Gigabit Ethernet
> Controller (Copper)
>         Subsystem: FIRST INTERNATIONAL Computer Inc Unknown device 4720
>         Flags: bus master, fast devsel, latency 0, IRQ 409
>         Memory at d0000000 (32-bit, non-prefetchable) [size=128K]
>         I/O ports at 2000 [size=32]
>         Capabilities: [c8] Power Management version 2
>         Capabilities: [d0] Message Signalled Interrupts: Mask- 64bit+
> Queue=0/0 Enable+
>         Capabilities: [e0] Express Endpoint IRQ 0
>         Capabilities: [100] Advanced Error Reporting
>         Capabilities: [140] Device Serial Number c6-a9-09-ff-ff-0b-14-00
> 00: 86 80 8c 10 07 05 10 00 00 00 00 02 10 00 00 00
> 10: 00 00 00 d0 00 00 00 00 01 20 00 00 00 00 00 00
> 20: 00 00 00 00 00 00 00 00 00 00 00 00 09 15 20 47
> 30: 00 00 00 00 c8 00 00 00 00 00 00 00 0b 01 00 00
>
> and the complete dmesg output:
> [ 9620.997466] 0000:02:00.0: peth0: Detected Tx Unit Hang:
> [ 9620.997469]   TDH                  <fc>
> [ 9620.997471]   TDT                  <1f>
> [ 9620.997473]   next_to_use          <1f>
> [ 9620.997475]   next_to_clean        <fc>
> [ 9620.997477] buffer_info[next_to_clean]:
> [ 9620.997479]   time_stamp           <8e2ec3>
> [ 9620.997481]   next_to_watch        <fc>
> [ 9620.997483]   jiffies              <8e3a25>
> [ 9620.997485]   next_to_watch.status <0>
> [ 9622.997490] 0000:02:00.0: peth0: Detected Tx Unit Hang:
> [ 9622.997496]   TDH                  <fc>
> [ 9622.997500]   TDT                  <1f>
> [ 9622.997503]   next_to_use          <1f>
> [ 9622.997507]   next_to_clean        <fc>
> [ 9622.997511] buffer_info[next_to_clean]:
> [ 9622.997515]   time_stamp           <8e2ec3>
> [ 9622.997519]   next_to_watch        <fc>
> [ 9622.997522]   jiffies              <8e41f5>
> [ 9622.997526]   next_to_watch.status <0>
> [ 9624.997536] 0000:02:00.0: peth0: Detected Tx Unit Hang:
> [ 9624.997541]   TDH                  <fc>
> [ 9624.997545]   TDT                  <1f>
> [ 9624.997549]   next_to_use          <1f>
> [ 9624.997553]   next_to_clean        <fc>
> [ 9624.997557] buffer_info[next_to_clean]:
> [ 9624.997561]   time_stamp           <8e2ec3>
> [ 9624.997565]   next_to_watch        <fc>
> [ 9624.997568]   jiffies              <8e49c5>
> [ 9624.997572]   next_to_watch.status <0>
> [ 9626.065848] eth0: port 1(peth0) entering disabled state
> [ 9629.910292] e1000e: peth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
> [ 9629.910854] eth0: port 1(peth0) entering forwarding state
>


OK, definitely a different problem.  Does it happen immediately, or after
a while?  Under load?  Can you provide the full boot output and the output
of cat /proc/interrupts?
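
For reference, the reasoning behind "different problem" can be checked
against the quoted output with a short script. This is only a sketch: it
parses the config-space rows from the lspci -vx dump, tests for the
"all 0xff" pattern described earlier in the thread, and does some arithmetic
on the values from the Tx hang reports. The helper names are made up for
illustration, and RING_SIZE = 256 is an assumption (the thread does not state
the Tx ring size).

```python
# Sketch only. The numbers are copied from the lspci and dmesg output quoted
# above; RING_SIZE is an assumption, not something stated in the thread.

CONFIG_DUMP = """\
00: 86 80 8c 10 07 05 10 00 00 00 00 02 10 00 00 00
10: 00 00 00 d0 00 00 00 00 01 20 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 09 15 20 47
30: 00 00 00 00 c8 00 00 00 00 00 00 00 0b 01 00 00
"""

# (dmesg seconds, time_stamp, jiffies) for the three hang reports.
HANG_REPORTS = [
    (9620.997466, 0x8E2EC3, 0x8E3A25),
    (9622.997490, 0x8E2EC3, 0x8E41F5),
    (9624.997536, 0x8E2EC3, 0x8E49C5),
]
TDH, TDT = 0xFC, 0x1F
RING_SIZE = 256  # assumed Tx ring size (hypothetical for this system)


def parse_config_space(dump):
    """Turn 'offset: xx xx ...' hex rows into a bytes object."""
    data = bytearray()
    for line in dump.splitlines():
        _offset, _, hex_bytes = line.partition(":")
        data.extend(int(b, 16) for b in hex_bytes.split())
    return bytes(data)


def device_fell_off_bus(cfg):
    """True if every config-space byte reads back as 0xff."""
    return len(cfg) > 0 and all(b == 0xFF for b in cfg)


def pending_descriptors(tdh, tdt, ring_size):
    """Descriptors between head (TDH) and tail (TDT), allowing for wrap."""
    return (tdt - tdh) % ring_size


cfg = parse_config_space(CONFIG_DUMP)
vendor_id = int.from_bytes(cfg[0:2], "little")
device_id = int.from_bytes(cfg[2:4], "little")
fell_off = device_fell_off_bus(cfg)

# Age of the stuck descriptor in jiffies at each report.
ages = [jiffies - stamp for _, stamp, jiffies in HANG_REPORTS]

# Estimate HZ from the first two reports (jiffies elapsed per second).
(t0, _, j0), (t1, _, j1) = HANG_REPORTS[0], HANG_REPORTS[1]
hz_estimate = round((j1 - j0) / (t1 - t0))

pending = pending_descriptors(TDH, TDT, RING_SIZE)

print(f"vendor={vendor_id:#06x} device={device_id:#06x} fell_off={fell_off}")
print(f"descriptor age (jiffies): {ages}, HZ estimate: {hz_estimate}")
print(f"pending descriptors (assuming ring of {RING_SIZE}): {pending}")
```

With the quoted data, the config space is clearly not all 0xff (vendor
0x8086, device 0x108c), so the device is still visible on the bus. The
time_stamp never changes across the three reports while jiffies advances by
2000 per two-second interval, which suggests HZ is about 1000 and that the
same descriptor has been stuck for roughly 2.9, 4.9 and 6.9 seconds.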

Thanks,
    J
