From: Jakub Kicinski <kuba@kernel.org>
To: jiawenwu@trustnetic.com
Cc: Jakub Kicinski <kuba@kernel.org>,
	netdev@vger.kernel.org, mengyuanlou@net-swift.com,
	andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com,
	pabeni@redhat.com, richardcochran@gmail.com,
	linux@armlinux.org.uk, horms@kernel.org, kees@kernel.org,
	larysa.zaremba@intel.com, leitao@debian.org, joe@dama.to,
	jacob.e.keller@intel.com, fabio.baltieri@gmail.com
Subject: Re: [PATCH net-next v2 2/6] net: wangxun: add Tx timeout process
Date: Sat,  2 May 2026 19:15:32 -0700
Message-ID: <20260503021532.4127268-1-kuba@kernel.org>
In-Reply-To: <20260430082517.19612-3-jiawenwu@trustnetic.com>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
net: wangxun: add Tx timeout process

Implement .ndo_tx_timeout to handle Tx side timeout events, triggering
a driver reset process. It also adds logic to track potential hangs and
clear false detections during 802.3 pause frames.

> diff --git a/drivers/net/ethernet/wangxun/libwx/wx_err.c b/drivers/net/ethernet/wangxun/libwx/wx_err.c
> new file mode 100644
> index 0000000000000..ba5f23cefc0f9
> --- /dev/null
> +++ b/drivers/net/ethernet/wangxun/libwx/wx_err.c
> @@ -0,0 +1,126 @@

[ ... ]

> +static void wx_reset_subtask(struct wx *wx)
> +{
> +	if (!test_bit(WX_FLAG_NEED_PF_RESET, wx->flags))
> +		return;
> +
> +	rtnl_lock();
> +
> +	if (!netif_running(wx->netdev) ||
> +	    test_bit(WX_STATE_RESETTING, wx->state))
> +		goto out;

If the interface is being brought down or a reset is in progress, does
this leave WX_FLAG_NEED_PF_RESET set indefinitely? When the interface is
brought back up, could this stale flag trigger an immediate, redundant
hardware reset?
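
To make the stale-flag concern concrete, here is a small userspace model,
not driver code. Every name in it (struct model_wx and both functions) is
hypothetical; the booleans only stand in for WX_FLAG_NEED_PF_RESET,
netif_running() and WX_STATE_RESETTING. It assumes that dropping the
request while the interface is down is acceptable, which the patch author
would need to confirm.

```c
#include <stdbool.h>

/* Hypothetical userspace model of the flag handling, not the driver. */
struct model_wx {
	bool need_pf_reset;	/* stands in for WX_FLAG_NEED_PF_RESET */
	bool running;		/* stands in for netif_running() */
	bool resetting;		/* stands in for WX_STATE_RESETTING */
	int resets_done;	/* counts calls to the do_reset stand-in */
};

/* Mirrors the quoted early exit: the request flag survives the bail-out. */
static void reset_subtask_as_posted(struct model_wx *wx)
{
	if (!wx->need_pf_reset)
		return;
	if (!wx->running || wx->resetting)
		return;
	wx->need_pf_reset = false;
	wx->resets_done++;
}

/* Variant that drops the request when the interface is down. */
static void reset_subtask_clearing(struct model_wx *wx)
{
	if (!wx->need_pf_reset)
		return;
	if (!wx->running) {
		wx->need_pf_reset = false;	/* nothing left to reset */
		return;
	}
	if (wx->resetting)
		return;				/* keep the request pending */
	wx->need_pf_reset = false;
	wx->resets_done++;
}
```

In this model the posted version performs a reset on the first pass after
the interface comes back up, while the clearing variant does not.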

Also, could this rtnl_lock() call cause an AB-BA deadlock? If the network
interface is administratively brought down, the networking core acquires
rtnl_lock() and invokes the driver's .ndo_stop routine.

If the teardown path calls cancel_work_sync(&wx->service_task), the close
path holding rtnl_lock() will wait for service_task to finish, while this
service_task waits indefinitely to acquire rtnl_lock().
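
One way to avoid blocking in the work item is to try the lock and back
off when it is contended. The sketch below is a userspace model of that
idea only: model_rtnl is a C11 atomic_flag standing in for the RTNL
mutex, and struct model_work and the model_* helpers are hypothetical.
Whether the real service task can safely defer and be rescheduled is an
assumption that is not verified against the driver.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for the RTNL mutex; set means "held". */
static atomic_flag model_rtnl = ATOMIC_FLAG_INIT;

static bool model_rtnl_trylock(void)
{
	/* test_and_set returns the previous value: false means we got it */
	return !atomic_flag_test_and_set(&model_rtnl);
}

static void model_rtnl_unlock(void)
{
	atomic_flag_clear(&model_rtnl);
}

/* Hypothetical work-item state. */
struct model_work {
	bool need_reset;
	int resets_done;
	int deferred;	/* times the work backed off instead of blocking */
};

static void model_reset_subtask(struct model_work *w)
{
	if (!w->need_reset)
		return;

	if (!model_rtnl_trylock()) {
		/* Lock holder may be waiting for us to finish: do not block. */
		w->deferred++;
		return;
	}

	w->need_reset = false;
	w->resets_done++;
	model_rtnl_unlock();
}
```

With the lock held by the modeled close path, the work item returns
immediately and leaves the request pending, so a waiter on the work item
is never stuck behind it.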

> +
> +	wx_warn(wx, "Reset adapter.\n");
> +
> +	if (test_and_clear_bit(WX_FLAG_NEED_PF_RESET, wx->flags)) {
> +		if (wx->do_reset)
> +			wx->do_reset(wx->netdev);
> +	}
> +
> +out:
> +	rtnl_unlock();
> +}

[ ... ]

> +void wx_tx_timeout(struct net_device *netdev, unsigned int txqueue)
> +{
> +	struct wx *wx = netdev_priv(netdev);
> +	u32 head, tail;
> +	int i;
> +
> +	for (i = 0; i < wx->num_tx_queues; i++) {
> +		struct wx_ring *tx_ring = wx->tx_ring[i];
> +
> +		if (test_bit(WX_TX_DETECT_HANG, tx_ring->state) &&
> +		    wx_check_tx_hang(tx_ring))

Since wx_tx_timeout() can execute concurrently with NAPI's wx_clean_tx_irq()
on healthy queues, is it safe to call wx_check_tx_hang() here?

wx_check_tx_hang() mutates per-queue NAPI state like
ring->tx_stats.tx_done_old and ring->state without synchronization. Modifying
this state concurrently could corrupt the hang detection state machine.
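
If the timeout handler only needs the verdict for its log message, a
read-only check would avoid touching state that NAPI owns. The model
below illustrates the difference. The body of wx_check_tx_hang() is not
shown in this excerpt, so the arm-then-confirm logic here is an
assumption, and struct model_ring and both functions are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical per-ring hang detection state. */
struct model_ring {
	unsigned long tx_done;		/* completions seen so far */
	unsigned long tx_done_old;	/* snapshot from the previous check */
	unsigned int pending;		/* descriptors still outstanding */
	bool armed;			/* stands in for WX_HANG_CHECK_ARMED */
};

/* Mutating check: arms on the first stall, reports on the second. */
static bool model_check_tx_hang(struct model_ring *r)
{
	if (r->tx_done_old == r->tx_done && r->pending) {
		bool was_armed = r->armed;

		r->armed = true;
		return was_armed;
	}
	r->tx_done_old = r->tx_done;
	r->armed = false;
	return false;
}

/* Read-only variant: same verdict for an armed ring, no writes. */
static bool model_peek_tx_hang(const struct model_ring *r)
{
	return r->armed && r->tx_done_old == r->tx_done && r->pending;
}
```

The peek variant never arms or disarms the detector, so in this model a
caller outside NAPI cannot disturb the sequence the cleaner depends on.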

> +			wx_warn(wx, "Real tx hang detected on queue %d\n", i);
> +
> +		head = rd32(wx, WX_PX_TR_RP(tx_ring->reg_idx));
> +		tail = rd32(wx, WX_PX_TR_WP(tx_ring->reg_idx));
> +		wx_warn(wx,
> +			"tx ring %d next_to_use is %d, next_to_clean is %d\n",
> +			i, tx_ring->next_to_use,
> +			tx_ring->next_to_clean);
> +		wx_warn(wx, "tx ring %d hw rp is 0x%x, wp is 0x%x\n",
> +			i, head, tail);
> +	}
> +
> +	wx_tx_timeout_reset(wx);
> +}

[ ... ]

> diff --git a/drivers/net/ethernet/wangxun/ngbe/ngbe_main.c b/drivers/net/ethernet/wangxun/ngbe/ngbe_main.c
> index bd905e2675758..e9561996b970f 100644
> --- a/drivers/net/ethernet/wangxun/ngbe/ngbe_main.c
> +++ b/drivers/net/ethernet/wangxun/ngbe/ngbe_main.c
> @@ -147,6 +148,7 @@ static void ngbe_service_task(struct work_struct *work)
>  {
>  	struct wx *wx = container_of(work, struct wx, service_task);
>  
> +	wx_handle_errors_subtask(wx);
>  	wx_update_stats(wx);

Is there a race condition here that could cause spurious Tx hang resets?

wx_handle_errors_subtask() sets the WX_TX_DETECT_HANG bit.
wx_update_stats() clears the WX_HANG_CHECK_ARMED bit when XOFF pause
frames are received.

If a hardware interrupt schedules NAPI between these two calls,
wx_clean_tx_irq() will execute wx_check_tx_hang() and might test the
WX_HANG_CHECK_ARMED bit from the previous interval (since wx_update_stats()
hasn't cleared it yet).

Would calling wx_update_stats() before wx_handle_errors_subtask() prevent
this false hang detection?
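
The ordering question can be modeled in a few lines of userspace C. This
is a single-threaded sketch of one interleaving, with the NAPI poll
placed right after the DETECT_HANG bit is set. All names are
hypothetical, and the hang check logic is an assumed arm-then-confirm
scheme, since the real functions are not quoted here.

```c
#include <stdbool.h>

/* Hypothetical combined state for one queue. */
struct model_state {
	bool detect_hang;	/* stands in for WX_TX_DETECT_HANG */
	bool armed;		/* stands in for WX_HANG_CHECK_ARMED */
	bool xoff_seen;		/* XOFF pause frames seen this interval */
	unsigned long tx_done, tx_done_old;
	unsigned int pending;
	int hangs_reported;
};

static void model_handle_errors(struct model_state *s)
{
	s->detect_hang = true;
}

static void model_update_stats(struct model_state *s)
{
	if (s->xoff_seen) {
		s->armed = false;	/* stall was caused by pause frames */
		s->xoff_seen = false;
	}
}

static void model_napi_clean(struct model_state *s)
{
	if (!s->detect_hang)
		return;
	s->detect_hang = false;
	if (s->tx_done_old == s->tx_done && s->pending) {
		if (s->armed)
			s->hangs_reported++;
		s->armed = true;
		return;
	}
	s->tx_done_old = s->tx_done;
	s->armed = false;
}

/* NAPI is injected immediately after the detect bit is set. */
static void model_service_task(struct model_state *s, bool stats_first)
{
	if (stats_first) {
		model_update_stats(s);
		model_handle_errors(s);
		model_napi_clean(s);
	} else {
		model_handle_errors(s);
		model_napi_clean(s);
		model_update_stats(s);
	}
}
```

Starting from an armed ring that stalled only because of XOFF, the posted
order reports a hang in this model and the swapped order does not.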

>  
>  	wx_service_event_complete(wx);

[ ... ]

> diff --git a/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c b/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c
> index 8b7c3753bb6ac..5793da5b7bab4 100644
> --- a/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c
> +++ b/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c
> @@ -128,6 +129,7 @@ static void txgbe_service_task(struct work_struct *work)
>  {
>  	struct wx *wx = container_of(work, struct wx, service_task);
>  
> +	wx_handle_errors_subtask(wx);
>  	txgbe_module_detection_subtask(wx);
>  	txgbe_link_config_subtask(wx);
>  	wx_update_stats(wx);

Similar to ngbe_service_task(), could NAPI execution between
wx_handle_errors_subtask() and wx_update_stats() trigger a false Tx hang
here as well?
