Netdev List
 help / color / mirror / Atom feed
From: Paolo Abeni <pabeni@redhat.com>
To: jiawenwu@trustnetic.com
Cc: netdev@vger.kernel.org, mengyuanlou@net-swift.com,
	andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com,
	kuba@kernel.org, pabeni@redhat.com, richardcochran@gmail.com,
	linux@armlinux.org.uk, horms@kernel.org, kees@kernel.org,
	larysa.zaremba@intel.com, leitao@debian.org, joe@dama.to,
	jacob.e.keller@intel.com, fabio.baltieri@gmail.com
Subject: Re: [PATCH net-next v3 2/5] net: wangxun: add Tx timeout process
Date: Thu, 14 May 2026 11:23:46 +0200	[thread overview]
Message-ID: <20260514092346.90833-1-pabeni@redhat.com> (raw)
In-Reply-To: <20260509100540.32612-3-jiawenwu@trustnetic.com>

From: AI Reviewer <ai@example.com>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://netdev-ai.bots.linux.dev/sashiko/
---
net: wangxun: add Tx timeout process

This patch implements .ndo_tx_timeout for the wangxun drivers (ngbe and
txgbe). On a Tx timeout it sets WX_FLAG_NEED_PF_RESET and the service
task picks the flag up and calls wx->do_reset() under rtnl_lock().

> diff --git a/drivers/net/ethernet/wangxun/libwx/wx_err.c b/drivers/net/ethernet/wangxun/libwx/wx_err.c
> new file mode 100644
> index 000000000000..bb61355447a8
> --- /dev/null
> +++ b/drivers/net/ethernet/wangxun/libwx/wx_err.c
> @@ -0,0 +1,154 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2015 - 2026 Beijing WangXun Technology Co., Ltd. */
> +/* Copyright (c) 1999 - 2026 Intel Corporation. */
> +
> +#include <linux/netdevice.h>
> +#include <linux/pci.h>
> +
> +#include "wx_type.h"
> +#include "wx_lib.h"
> +#include "wx_err.h"
> +
> +void wx_reset_subtask(struct wx *wx)
> +{
> +	if (!test_bit(WX_FLAG_NEED_PF_RESET, wx->flags))
> +		return;
> +
> +	rtnl_lock();
> +
> +	if (!netif_running(wx->netdev) ||
> +	    test_bit(WX_STATE_RESETTING, wx->state))
> +		goto out;
> +
> +	wx_warn(wx, "Reset adapter.\n");
> +
> +	if (test_and_clear_bit(WX_FLAG_NEED_PF_RESET, wx->flags)) {
> +		if (wx->do_reset)
> +			wx->do_reset(wx->netdev);
> +	}
> +
> +out:
> +	rtnl_unlock();
> +}
> +EXPORT_SYMBOL(wx_reset_subtask);

[Critical]
Can wx_reset_subtask() deadlock against itself on ngbe?

wx_reset_subtask() runs as the first step of ngbe_service_task and
synchronously calls wx->do_reset(), which for ngbe is bound to
ngbe_do_reset():

  ngbe_sw_init():
      wx->do_reset = ngbe_do_reset;

  ngbe_do_reset():
      if (netif_running(netdev))
          ngbe_reinit_locked(wx);

  ngbe_reinit_locked():
      ngbe_down(wx);
      ngbe_up(wx);

  ngbe_down() -> ngbe_disable_device():
      ...
      cancel_work_sync(&wx->service_task);

So the running service_task ends up calling cancel_work_sync() on
itself. cancel_work_sync() inserts a barrier behind the currently
executing work and waits for completion; the barrier cannot run until
the current work returns, and the current work cannot return until the
barrier completes. Would the worker hang on the very first Tx timeout?

There is also an rtnl ordering question. wx_reset_subtask() takes
rtnl_lock() before calling do_reset, while ngbe_close() runs under
RTNL and calls ngbe_disable_device() (and therefore cancel_work_sync())
with RTNL still held. If the service task is already blocked on
rtnl_lock() in wx_reset_subtask() while close holds RTNL and waits for
the work to finish, can the two paths deadlock on each other?

The same hazard is reachable via ngbe_dev_shutdown(), which wraps
ngbe_close() in rtnl_lock()/rtnl_unlock().

Note that txgbe_disable_device() does not call cancel_work_sync(), so
adding it there for symmetry would import the same self-cancel
deadlock; the fix likely belongs elsewhere (for example, having
wx_reset_subtask() schedule the reset outside of the worker, or using
a separate work item for the reset).

> +static void wx_watchdog_flush_tx(struct wx *wx)
> +{
> +	if (netif_carrier_ok(wx->netdev))
> +		return;
> +
> +	if (wx_ring_tx_pending(wx) || wx_vf_tx_pending(wx)) {
> +		/* We've lost link, so the controller stops DMA,
> +		 * but we've got queued Tx work that's never going
> +		 * to get done, so reset controller to flush Tx.
> +		 * (Do the reset outside of interrupt context).
> +		 */
> +		wx_warn(wx, "initiating reset due to lost link with pending Tx work\n");
> +		set_bit(WX_FLAG_NEED_PF_RESET, wx->flags);
> +	}
> +}

[Medium]
On txgbe, can WX_FLAG_NEED_PF_RESET survive across a close and trigger
a spurious reset on the next open?

txgbe_disable_device() clears the flag and then calls
timer_delete_sync() but, unlike ngbe_disable_device(), does not call
cancel_work_sync(&wx->service_task):

  clear_bit(WX_FLAG_NEED_PF_RESET, wx->flags);
  timer_delete_sync(&wx->service_timer);

timer_delete_sync() only stops future timer fires; an already-queued
or in-flight service_task still runs. wx_watchdog_flush_tx() sets
WX_FLAG_NEED_PF_RESET based only on netif_carrier_ok() and
ntu != ntc, without consulting netif_running(). After phylink_stop()
drops carrier and before wx_clean_all_tx_rings() resets ntu/ntc, an
in-flight service_task can re-set the flag after the close path has
cleared it.

On the next open, txgbe_up_complete() does
mod_timer(&wx->service_timer, jiffies); the service_task fires, sees
WX_FLAG_NEED_PF_RESET still set, and wx_reset_subtask() runs
txgbe_do_reset() against the just-opened device.

Would gating wx_watchdog_flush_tx() on netif_running(wx->netdev), or
clearing the flag in the open path, address this?

> diff --git a/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c b/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c
> index 8b7c3753bb6a..ba55a17da3b8 100644
> --- a/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c
> +++ b/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c
> @@ -228,6 +231,7 @@ static void txgbe_disable_device(struct wx *wx)
>  	wx_irq_disable(wx);
>  	wx_napi_disable_all(wx);
>  
> +	clear_bit(WX_FLAG_NEED_PF_RESET, wx->flags);
>  	timer_delete_sync(&wx->service_timer);
>  
>  	if (wx->bus.func < 2)
-- 
This is an AI-generated review.


  reply	other threads:[~2026-05-14  9:24 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-09 10:05 [PATCH net-next v3 0/5] net: wangxun: timeout and error Jiawen Wu
2026-05-09 10:05 ` [PATCH net-next v3 1/5] net: ngbe: implement libwx reset ops Jiawen Wu
2026-05-09 10:05 ` [PATCH net-next v3 2/5] net: wangxun: add Tx timeout process Jiawen Wu
2026-05-14  9:23   ` Paolo Abeni [this message]
2026-05-09 10:05 ` [PATCH net-next v3 3/5] net: wangxun: add reinit parameter to wx->do_reset callback Jiawen Wu
2026-05-09 10:05 ` [PATCH net-next v3 4/5] net: wangxun: extract the close_suspend sequence Jiawen Wu
2026-05-09 10:05 ` [PATCH net-next v3 5/5] net: wangxun: implement pci_error_handlers ops Jiawen Wu
2026-05-14  9:24   ` Paolo Abeni

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260514092346.90833-1-pabeni@redhat.com \
    --to=pabeni@redhat.com \
    --cc=andrew+netdev@lunn.ch \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=fabio.baltieri@gmail.com \
    --cc=horms@kernel.org \
    --cc=jacob.e.keller@intel.com \
    --cc=jiawenwu@trustnetic.com \
    --cc=joe@dama.to \
    --cc=kees@kernel.org \
    --cc=kuba@kernel.org \
    --cc=larysa.zaremba@intel.com \
    --cc=leitao@debian.org \
    --cc=linux@armlinux.org.uk \
    --cc=mengyuanlou@net-swift.com \
    --cc=netdev@vger.kernel.org \
    --cc=richardcochran@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox