* e1000 TX unit hang (redux)
@ 2006-07-11 18:16 shaw
2006-07-11 21:09 ` Auke Kok
0 siblings, 1 reply; 3+ messages in thread
From: shaw @ 2006-07-11 18:16 UTC
To: auke-jan.h.kok, kernel, jesse.brandeburg; +Cc: netdev
Hello All,
I have an e1000 card periodically misbehaving with the message 'Detected Tx
unit hang'. I've noticed this problem come up on netdev a couple of times
and found the link to the bug tracking page--
http://sourceforge.net/tracker/index.php?func=detail&aid=1463045&group_id=42302&atid=447449
I've also seen the patch that I believe was placed in 2.6.16 and subsequently
brought down to 2.4.2? that seems to address this problem by creating a
tx_timeout_factor relative to the speed of the NIC. However, there is no
mention of this workaround/fix on the bug at the link above and I haven't
found any discussion of it here on netdev. Auke recommends turning off TSO
to see if that resolves the problem, and this also seems to work, though I
have not yet been able to confirm this and would prefer a more
performance-friendly fix, if available ;)
Would one of you please give an update on the status of the bug? Was a cause
ever found, and was the tx_timeout_factor intended as a fix or a temporary
workaround? I feel like I must have missed something, because I never saw
the tx_timeout_factor patch go through netdev at all.
Thanks again for your help,
Shaw
* Re: e1000 TX unit hang (redux)
From: Auke Kok @ 2006-07-11 21:09 UTC
To: shaw; +Cc: auke-jan.h.kok, kernel, jesse.brandeburg, netdev
shaw@vranix.com wrote:
> I have an e1000 card periodically misbehaving with the message 'Detected Tx
> unit hang'. I've noticed this problem come up on netdev a couple of times
> and found the link to the bug tracking page--
> http://sourceforge.net/tracker/index.php?func=detail&aid=1463045&group_id=42302&atid=447449
>
> I've also seen the patch that I believe was placed in 2.6.16 and subsequently
> brought down to 2.4.2?
That's unlikely, if not impossible - we rarely push changes to 2.4 kernels
anymore; I think the last change is likely older than 2.4.28 or so.
> that seems to address this problem by creating a
> tx_timeout_factor relative to the speed of the NIC. However, there is no
> mention of this workaround/fix on the bug at the link above and I haven't
> found any discussion of it here on netdev.
I wouldn't even know what patch you are talking about (?!)
> Auke recommends turning off TSO
> to see if that resolves the problem, and this also seems to work, though I
> have not yet been able to confirm this and would prefer a more
> performance-friendly fix, if available ;)
>
> Would one of you please give an update on the status of the bug? Was a cause
> ever found, and was the tx_timeout_factor intended as a fix or a temporary
> workaround? I feel like I must have missed something, because I never saw
> the tx_timeout_factor patch go through netdev at all.
One possible problem is a bad EEPROM bit, where the hardware might have been
misconfigured. This only affects _some_ older e1000s. Any bug report therefore
should include the output of `ethtool -e ethX` (as well as the `lspci -vv`
output, of course). If you haven't already done so, please submit this to the
bugtracker or to us by e-mail.
Cheers,
Auke
* Re: e1000 TX unit hang (redux)
From: shawvrana @ 2006-07-11 21:53 UTC
To: Auke Kok; +Cc: kernel, jesse.brandeburg, netdev
Hi Auke,
On Tuesday 11 July 2006 14:09, Auke Kok wrote:
> > that seems to address this problem by creating a
> > tx_timeout_factor relative to the speed of the NIC. However, there is no
> > mention of this workaround/fix on the bug at the link above and I haven't
> > found any discussion of it here on netdev.
>
> I wouldn't even know what patch you are talking about (?!)
Ok, well, the patch is in 2.6.17.4 and looks to have been announced in the
2.6.16-rc2 changelog -- http://lwn.net/Articles/170529/ -- and written by Jeff
Kirsher. I haven't been able to find a link to the original patch submission
anywhere. The code looks something like this now:
        /* Detect a transmit hang in hardware, this serializes the
         * check with the clearing of time_stamp and movement of i */
        adapter->detect_tx_hung = FALSE;
        if (tx_ring->buffer_info[eop].dma &&
            time_after(jiffies, tx_ring->buffer_info[eop].time_stamp +
                       (adapter->tx_timeout_factor * HZ))
            && !(E1000_READ_REG(&adapter->hw, STATUS) &
                 E1000_STATUS_TXOFF)) {
...where tx_timeout_factor has been added; it is set in the watchdog code
based on the link speed.
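To make the effect of the factor concrete, here is a minimal standalone sketch of the same check in plain C. It is not the driver code: every sketch_* name, the SKETCH_HZ value, and the per-speed multipliers are assumptions chosen for illustration, and the tx_paused flag stands in for the E1000_STATUS_TXOFF test on the assumption that the bit means transmission is paused. It only shows how a larger factor on a slower link postpones the point at which a pending descriptor is declared hung.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel tick rate (HZ); an assumption. */
#define SKETCH_HZ 250UL

/* Wraparound-safe "a is later than b", the same idea as the kernel's
 * time_after() macro, written out for plain user-space C. */
static bool sketch_time_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/* Illustrative mapping from link speed (Mbit/s) to a timeout multiplier.
 * The values are assumed for this sketch, not taken from the driver. */
static unsigned int sketch_timeout_factor(unsigned int speed_mbps)
{
    switch (speed_mbps) {
    case 10:
        return 8;
    case 100:
        return 2;
    default:
        return 1;
    }
}

/* Mirrors the shape of the quoted condition: a buffer is still mapped,
 * its time stamp is older than factor * HZ ticks, and the transmitter
 * is not paused. */
static bool sketch_tx_hung(bool dma_pending, bool tx_paused,
                           unsigned long now, unsigned long time_stamp,
                           unsigned int timeout_factor)
{
    assert(timeout_factor > 0);
    return dma_pending &&
           sketch_time_after(now, time_stamp + timeout_factor * SKETCH_HZ) &&
           !tx_paused;
}
```

With these assumed numbers, a descriptor stamped at tick 0 and checked at tick 1000 counts as hung with a factor of 1 (deadline at tick 250) but not with a factor of 8 (deadline at tick 2000), which is the kind of slack a slow link would get.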
> That's unlikely, if not impossible - we rarely push changes to 2.4 kernels
> anymore; I think the last change is likely older than 2.4.28.
I'm sure you're right. I jumped to conclusions about a patch I saw posted at
Red Hat. I'll be more careful next time :)
I'll also try to get some better debugging info from my side.
Thanks.
Shaw