Netdev List
From: "Ruinskiy, Dima" <dima.ruinskiy@intel.com>
To: Andrew Lunn <andrew@lunn.ch>, Helge Deller <deller@gmx.de>,
	Helge Deller <deller@kernel.org>
Cc: Tony Nguyen <anthony.l.nguyen@intel.com>,
	Przemek Kitszel <przemyslaw.kitszel@intel.com>,
	<intel-wired-lan@lists.osuosl.org>, <netdev@vger.kernel.org>
Subject: Re: [Intel-wired-lan] e1000e: Report link down after "Detected Hardware Unit Hang" ?
Date: Sun, 21 Jun 2026 16:22:53 +0300
Message-ID: <dce2391f-8e8d-4eeb-97e1-c7070f340d64@intel.com>
In-Reply-To: <d86c0dd8-8bd8-495a-b750-2a0036fbbee4@lunn.ch>

On 17/06/2026 0:59, Andrew Lunn wrote:
>> This does not seem like the right direction to me.
>>
>> The "Detected Hardware Unit Hang" print does not indicate that the interface
>> is dead, but that the transmitter is stalled.
>>
>> This can be due to an unusually high load, or a HW fault / race condition
>> with another component, etc.
>>
>> When a hang is detected, the transmitter is stopped with netif_stop_queue()
>> and eventually ndo_tx_timeout triggers a full reset to the device, which in
>> many cases recovers it from the hang.
> 
> Does a full reset cause the link to be negotiated again? If so, there
> is no harm in setting the carrier down. If the reset is successful,
> the carrier will be restored. However, if the reset does not recover
> the system, does the carrier stay down?
> 
>      Andrew
> 

As currently written, a reset triggered by the Tx timeout path goes
through e1000e_reinit_locked(), which calls e1000e_down() followed by
e1000e_up().

e1000e_down() calls netif_carrier_off() at the start, and e1000e_reset() 
later. e1000e_up() triggers a link state recheck, which should restore 
the carrier.

So if everything works as it should, the change proposed here would be
redundant. However, we have been getting occasional reports of these
unrecoverable hangs, so I suspect things do not always work as they
should.

One issue currently under investigation involves a persistent hang
reported after an aborted hibernation attempt. We are testing a patch
for it.

I did not see anything in the original description of this report tying 
the hang to a power state change, but I will happily share the patch 
once we get preliminary positive results.

--Dima

