Netdev List
 help / color / mirror / Atom feed
From: "Jiawen Wu" <jiawenwu@trustnetic.com>
To: "'Larysa Zaremba'" <larysa.zaremba@intel.com>
Cc: netdev@vger.kernel.org,
	"'Mengyuan Lou'" <mengyuanlou@net-swift.com>,
	"'Andrew Lunn'" <andrew+netdev@lunn.ch>,
	"'David S. Miller'" <davem@davemloft.net>,
	"'Eric Dumazet'" <edumazet@google.com>,
	"'Jakub Kicinski'" <kuba@kernel.org>,
	"'Paolo Abeni'" <pabeni@redhat.com>,
	"'Richard Cochran'" <richardcochran@gmail.com>,
	"'Russell King'" <linux@armlinux.org.uk>,
	"'Jacob Keller'" <jacob.e.keller@intel.com>,
	"'Michal Swiatkowski'" <michal.swiatkowski@linux.intel.com>,
	"'Simon Horman'" <horms@kernel.org>,
	"'Kees Cook'" <kees@kernel.org>,
	"'Ingo Molnar'" <mingo@kernel.org>, "'Joe Damato'" <joe@dama.to>,
	"'Breno Leitao'" <leitao@debian.org>,
	"'Aleksandr Loktionov'" <aleksandr.loktionov@intel.com>,
	"'Uwe Kleine-König (The Capable Hub)'"
	<u.kleine-koenig@baylibre.com>,
	"'Johannes Berg'" <johannes@sipsolutions.net>,
	"'Fabio Baltieri'" <fabio.baltieri@gmail.com>,
	netdev@vger.kernel.org,
	"'Mengyuan Lou'" <mengyuanlou@net-swift.com>,
	"'Andrew Lunn'" <andrew+netdev@lunn.ch>,
	"'David S. Miller'" <davem@davemloft.net>,
	"'Eric Dumazet'" <edumazet@google.com>,
	"'Jakub Kicinski'" <kuba@kernel.org>,
	"'Paolo Abeni'" <pabeni@redhat.com>,
	"'Richard Cochran'" <richardcochran@gmail.com>,
	"'Russell King'" <linux@armlinux.org.uk>,
	"'Jacob Keller'" <jacob.e.keller@intel.com>,
	"'Michal Swiatkowski'" <michal.swiatkowski@linux.intel.com>,
	"'Simon Horman'" <horms@kernel.org>,
	"'Kees Cook'" <kees@kernel.org>,
	"'Ingo Molnar'" <mingo@kernel.org>, "'Joe Damato'" <joe@dama.to>,
	"'Breno Leitao'" <leitao@debian.org>,
	"'Aleksandr Loktionov'" <aleksandr.loktionov@intel.com>,
	"'Uwe Kleine-König (The Capable Hub)'"
	<u.kleine-koenig@baylibre.com>,
	"'Johannes Berg'" <johannes@sipsolutions.net>,
	"'Fabio Baltieri'" <fabio.baltieri@gmail.com>
Subject: RE: [PATCH net-next v4 2/5] net: wangxun: add Tx timeout process
Date: Wed, 3 Jun 2026 10:22:27 +0800	[thread overview]
Message-ID: <093d01dcf2ff$d2792810$776b7830$@trustnetic.com> (raw)
In-Reply-To: <ah6xJgZjplSbVKvl@soc-5CG4396X81.clients.intel.com>

On Tue, Jun 2, 2026 6:32 PM, Larysa Zaremba wrote:
> On Mon, Jun 01, 2026 at 03:22:18PM +0800, Jiawen Wu wrote:
> > Implement .ndo_tx_timeout to handle Tx side timeout event. When a Tx
> > timeout event occur, it will trigger driver into reset process.
> >
> > The WX_HANG_CHECK_ARMED bit is set to indicate a potential hang. It will
> > be cleared if a pause frame is received to avoid false hang detection
> > caused by pause frames.
> 
> In general, logic seems sound, below 1 small nit.
> There is also a seemingly sensible comment from Sashiko (pasted below), which I
> agree with. If you not schedule NAPI every time to check for hang, then without
> Rx you are relying on dev_watchdog anyway, so maybe internal hang detection is
> not that necessary?

The purpose of WX_TX_DETECT_HANG is to reuse the existing Tx cleanup path and
detect hangs opportunistically when NAPI is already running due to normal Tx/Rx
interrupt activity.

For workloads with active traffic, this allows earlier detection than the
generic dev_watchdog() timeout and provides driver-specific diagnostics before
a full reset is triggered.

> 
> >
> > Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
> > ---
> >  drivers/net/ethernet/wangxun/libwx/Makefile   |   2 +-
> >  drivers/net/ethernet/wangxun/libwx/wx_err.c   | 170 ++++++++++++++++++
> >  drivers/net/ethernet/wangxun/libwx/wx_err.h   |  16 ++
> >  drivers/net/ethernet/wangxun/libwx/wx_hw.c    |  17 +-
> >  drivers/net/ethernet/wangxun/libwx/wx_lib.c   |  37 ++++
> >  drivers/net/ethernet/wangxun/libwx/wx_type.h  |  19 +-
> >  drivers/net/ethernet/wangxun/ngbe/ngbe_main.c |  14 ++
> >  .../net/ethernet/wangxun/txgbe/txgbe_main.c   |  14 ++
> >  8 files changed, 284 insertions(+), 5 deletions(-)
> >  create mode 100644 drivers/net/ethernet/wangxun/libwx/wx_err.c
> >  create mode 100644 drivers/net/ethernet/wangxun/libwx/wx_err.h
> >
> > diff --git a/drivers/net/ethernet/wangxun/libwx/Makefile b/drivers/net/ethernet/wangxun/libwx/Makefile
> > index a71b0ad77de3..c8724bb129aa 100644
> > --- a/drivers/net/ethernet/wangxun/libwx/Makefile
> > +++ b/drivers/net/ethernet/wangxun/libwx/Makefile
> > @@ -4,5 +4,5 @@
> >
> >  obj-$(CONFIG_LIBWX) += libwx.o
> >
> > -libwx-objs := wx_hw.o wx_lib.o wx_ethtool.o wx_ptp.o wx_mbx.o wx_sriov.o
> > +libwx-objs := wx_hw.o wx_lib.o wx_ethtool.o wx_ptp.o wx_mbx.o wx_sriov.o wx_err.o
> >  libwx-objs += wx_vf.o wx_vf_lib.o wx_vf_common.o
> > diff --git a/drivers/net/ethernet/wangxun/libwx/wx_err.c b/drivers/net/ethernet/wangxun/libwx/wx_err.c
> > new file mode 100644
> > index 000000000000..982a438d009e
> > --- /dev/null
> > +++ b/drivers/net/ethernet/wangxun/libwx/wx_err.c
> > @@ -0,0 +1,170 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/* Copyright (c) 2015 - 2026 Beijing WangXun Technology Co., Ltd. */
> > +/* Copyright (c) 1999 - 2026 Intel Corporation. */
> > +
> > +#include <linux/netdevice.h>
> > +#include <linux/pci.h>
> > +
> > +#include "wx_type.h"
> > +#include "wx_lib.h"
> > +#include "wx_err.h"
> > +
> > +static void wx_pf_reset_subtask(struct wx *wx)
> > +{
> > +	if (!test_and_clear_bit(WX_FLAG_NEED_PF_RESET, wx->flags))
> > +		return;
> > +
> > +	wx_warn(wx, "Reset adapter.\n");
> > +	if (wx->do_reset)
> > +		wx->do_reset(wx->netdev);
> > +}
> > +
> > +static void wx_reset_task(struct work_struct *work)
> > +{
> > +	struct wx *wx = container_of(work, struct wx, reset_task);
> > +
> > +	rtnl_lock();
> > +
> > +	if (test_bit(WX_STATE_DOWN, wx->state) ||
> > +	    test_bit(WX_STATE_RESETTING, wx->state))
> > +		goto out;
> > +
> > +	wx_pf_reset_subtask(wx);
> > +
> > +out:
> > +	rtnl_unlock();
> > +}
> > +
> > +void wx_check_err_subtask(struct wx *wx)
> > +{
> > +	if (test_bit(WX_FLAG_NEED_PF_RESET, wx->flags))
> > +		queue_work(wx->reset_wq, &wx->reset_task);
> > +}
> > +EXPORT_SYMBOL(wx_check_err_subtask);
> > +
> > +int wx_init_err_task(struct wx *wx)
> > +{
> > +	wx->reset_wq = alloc_workqueue("wx_reset_wq", WQ_UNBOUND | WQ_HIGHPRI, 1);
> > +	if (!wx->reset_wq) {
> > +		pr_err("Failed to create wx_reset_wq workqueue\n");
> 
> This driver does not generally use pr_err().

I'll change it to wx_err().

> 
> > +		return -ENOMEM;
> > +	}
> > +
> > +	INIT_WORK(&wx->reset_task, wx_reset_task);
> > +	return 0;
> > +}
> > +EXPORT_SYMBOL(wx_init_err_task);
> > +
> > +static bool wx_ring_tx_pending(struct wx *wx)
> > +{
> > +	int i;
> > +
> > +	for (i = 0; i < wx->num_tx_queues; i++) {
> > +		struct wx_ring *tx_ring = wx->tx_ring[i];
> > +
> > +		if (tx_ring->next_to_use != tx_ring->next_to_clean)
> > +			return true;
> > +	}
> > +
> > +	return false;
> > +}
> > +
> > +static bool wx_vf_tx_pending(struct wx *wx)
> > +{
> > +	struct wx_ring_feature *vmdq = &wx->ring_feature[RING_F_VMDQ];
> > +	u32 q_per_pool = __ALIGN_MASK(1, ~vmdq->mask);
> > +	u32 i, j;
> > +
> > +	if (!wx->num_vfs)
> > +		return false;
> > +
> > +	for (i = 0; i < wx->num_vfs; i++) {
> > +		for (j = 0; j < q_per_pool; j++) {
> > +			u32 h, t;
> > +
> > +			h = rd32(wx, WX_PX_TR_RP_PV(q_per_pool, i, j));
> > +			t = rd32(wx, WX_PX_TR_WP_PV(q_per_pool, i, j));
> > +
> > +			if (h != t)
> > +				return true;
> > +		}
> > +	}
> > +
> > +	return false;
> > +}
> > +
> > +static void wx_watchdog_flush_tx(struct wx *wx)
> > +{
> > +	if (!netif_running(wx->netdev))
> > +		return;
> > +	if (netif_carrier_ok(wx->netdev))
> > +		return;
> > +
> > +	if (wx_ring_tx_pending(wx) || wx_vf_tx_pending(wx)) {
> > +		/* We've lost link, so the controller stops DMA,
> > +		 * but we've got queued Tx work that's never going
> > +		 * to get done, so reset controller to flush Tx.
> > +		 * (Do the reset outside of interrupt context).
> > +		 */
> > +		wx_warn(wx, "initiating reset due to lost link with pending Tx work\n");
> > +		set_bit(WX_FLAG_NEED_PF_RESET, wx->flags);
> > +	}
> > +}
> > +
> > +static void wx_check_tx_hang(struct wx *wx)
> > +{
> > +	int i;
> > +
> > +	/* If we're down or resetting, just bail */
> > +	if (!netif_running(wx->netdev) ||
> > +	    test_bit(WX_STATE_RESETTING, wx->state))
> > +		return;
> > +
> > +	/* Force detection of hung controller */
> > +	if (netif_carrier_ok(wx->netdev)) {
> > +		for (i = 0; i < wx->num_tx_queues; i++)
> > +			set_bit(WX_TX_DETECT_HANG, wx->tx_ring[i]->state);
> 
> Sashiko says:
> 
> If a software interrupt is not triggered, wouldn't the evaluation of this
> flag in wx_clean_tx_irq() fail to run unless incoming Rx traffic happens
> to arrive on the same interrupt vector?
> 
> > +	}
> > +}
> > +
> > +void wx_check_tx_hang_subtask(struct wx *wx)
> > +{
> > +	wx_watchdog_flush_tx(wx);
> > +	wx_check_tx_hang(wx);
> > +}
> > +EXPORT_SYMBOL(wx_check_tx_hang_subtask);
> > +
> > +static void wx_tx_timeout_reset(struct wx *wx)
> > +{
> > +	if (test_bit(WX_STATE_DOWN, wx->state))
> > +		return;
> > +
> > +	set_bit(WX_FLAG_NEED_PF_RESET, wx->flags);
> > +	wx_warn(wx, "initiating reset due to tx timeout\n");
> > +	wx_service_event_schedule(wx);
> > +}
> > +
> > +void wx_tx_timeout(struct net_device *netdev, unsigned int __always_unused txqueue)
> > +{
> > +	struct wx *wx = netdev_priv(netdev);
> > +
> > +	wx_tx_timeout_reset(wx);
> > +}
> > +EXPORT_SYMBOL(wx_tx_timeout);
> > +
> > +void wx_handle_tx_hang(struct wx_ring *tx_ring, unsigned int next)
> > +{
> > +	struct wx *wx = netdev_priv(tx_ring->netdev);
> > +
> > +	wx_warn(wx,
> > +		"Detected Tx Unit Hang: Queue %d, TDH %x, TDT %x, ntu %x, ntc %x, ntc.time_stamp %lx, jiffies %lx\n",
> > +		tx_ring->queue_index,
> > +		rd32(wx, WX_PX_TR_RP(tx_ring->reg_idx)),
> > +		rd32(wx, WX_PX_TR_WP(tx_ring->reg_idx)),
> > +		tx_ring->next_to_use, next,
> > +		tx_ring->tx_buffer_info[next].time_stamp, jiffies);
> > +
> > +	netif_stop_subqueue(tx_ring->netdev, tx_ring->queue_index);
> > +
> > +	wx_tx_timeout_reset(wx);
> > +}
> 
> [...]
> 


  reply	other threads:[~2026-06-03  2:23 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-01  7:22 [PATCH net-next v4 0/5] net: wangxun: timeout and error Jiawen Wu
2026-06-01  7:22 ` [PATCH net-next v4 1/5] net: ngbe: implement libwx reset ops Jiawen Wu
2026-06-02 10:13   ` Larysa Zaremba
2026-06-01  7:22 ` [PATCH net-next v4 2/5] net: wangxun: add Tx timeout process Jiawen Wu
2026-06-01  9:26   ` Loktionov, Aleksandr
2026-06-02 10:32   ` Larysa Zaremba
2026-06-03  2:22     ` Jiawen Wu [this message]
2026-06-01  7:22 ` [PATCH net-next v4 3/5] net: wangxun: add reinit parameter to wx->do_reset callback Jiawen Wu
2026-06-01  9:05   ` Loktionov, Aleksandr
2026-06-01  7:22 ` [PATCH net-next v4 4/5] net: wangxun: introduce soft quiesce callbacks for AER recovery Jiawen Wu
2026-06-01  9:32   ` Loktionov, Aleksandr
2026-06-01  9:37     ` Jiawen Wu
2026-06-01 10:09   ` Loktionov, Aleksandr
2026-06-02  2:12     ` Jiawen Wu
2026-06-02 11:16   ` Larysa Zaremba
2026-06-03  2:45     ` Jiawen Wu
2026-06-01  7:22 ` [PATCH net-next v4 5/5] net: wangxun: implement pci_error_handlers ops Jiawen Wu
2026-06-01  9:37   ` Loktionov, Aleksandr
2026-06-02  2:28     ` Jiawen Wu
2026-06-02 11:30   ` Larysa Zaremba
2026-06-03  2:51     ` Jiawen Wu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='093d01dcf2ff$d2792810$776b7830$@trustnetic.com' \
    --to=jiawenwu@trustnetic.com \
    --cc=aleksandr.loktionov@intel.com \
    --cc=andrew+netdev@lunn.ch \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=fabio.baltieri@gmail.com \
    --cc=horms@kernel.org \
    --cc=jacob.e.keller@intel.com \
    --cc=joe@dama.to \
    --cc=johannes@sipsolutions.net \
    --cc=kees@kernel.org \
    --cc=kuba@kernel.org \
    --cc=larysa.zaremba@intel.com \
    --cc=leitao@debian.org \
    --cc=linux@armlinux.org.uk \
    --cc=mengyuanlou@net-swift.com \
    --cc=michal.swiatkowski@linux.intel.com \
    --cc=mingo@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=richardcochran@gmail.com \
    --cc=u.kleine-koenig@baylibre.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox