* [PATCH net-next v1 0/5] net: wangxun: timeout and error
@ 2026-04-28 2:11 Jiawen Wu
From: Jiawen Wu @ 2026-04-28 2:11 UTC
To: netdev
Cc: Mengyuan Lou, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Richard Cochran, Russell King,
Simon Horman, Kees Cook, Larysa Zaremba, Breno Leitao, Joe Damato,
Jacob Keller, Fabio Baltieri, Jiawen Wu
This series is split out of the previous series:
https://lore.kernel.org/all/20260326021406.30444-1-jiawenwu@trustnetic.com
It adds Tx timeout handling and pci_error_handlers.
Changes since the last full patch set (v6):
- Add 'else' handling in ngbe_do_reset().
- Acquire rtnl_lock() before checking netif_running() in
wx_reset_subtask().
- Use test_and_clear_bit() instead of test_bit() followed by clear_bit()
to avoid losing another reset request.
- Change 'u64 tx_done_old' to 'u32' to avoid a data race between
dev_watchdog and NAPI polling.
- Check the return value of ndo_open() in wx_io_resume().
- Drop pci_save_state().
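The test_and_clear_bit() change can be illustrated with a small user-space sketch. It assumes C11 <stdatomic.h>, with atomic_fetch_and() standing in for the kernel's test_and_clear_bit() and a plain counter standing in for rtnl_lock(); every name below is hypothetical. Claiming the request atomically before the reset runs means a request raised during the reset stays pending for the next pass.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model: this mask plays the role of WX_FLAG_NEED_PF_RESET. */
#define NEED_PF_RESET (1u << 0)

static atomic_uint pf_flags;     /* stands in for wx->flags */
static int lock_depth;           /* > 0 while the "rtnl" lock is held */
static int resets_done;
static bool if_running = true;   /* stands in for netif_running() */
static bool raise_during_reset;  /* inject a request racing with a reset */

static void model_rtnl_lock(void)   { lock_depth++; }
static void model_rtnl_unlock(void) { lock_depth--; }

/* Atomically clear the bit and report whether it was set,
 * which is what test_and_clear_bit() does in the kernel. */
static bool model_test_and_clear(atomic_uint *word, unsigned int mask)
{
	return (atomic_fetch_and(word, ~mask) & mask) != 0;
}

static void fake_reset(void)
{
	resets_done++;
	if (raise_during_reset) {
		raise_during_reset = false;
		/* another context asks for a reset while this one runs */
		atomic_fetch_or(&pf_flags, NEED_PF_RESET);
	}
}

static void model_reset_subtask(void)
{
	if (!(atomic_load(&pf_flags) & NEED_PF_RESET))
		return;

	model_rtnl_lock();
	if (!if_running)
		goto unlock;	/* every exit path drops the lock */

	/* Claim the request before resetting: a request raised during
	 * fake_reset() stays set and is handled on the next pass. */
	if (model_test_and_clear(&pf_flags, NEED_PF_RESET))
		fake_reset();
unlock:
	model_rtnl_unlock();
}
```

With a test_bit() check and a clear_bit() issued after the reset instead, the request raised inside fake_reset() would be wiped without ever being serviced.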
Jiawen Wu (5):
net: ngbe: implement libwx reset ops
net: wangxun: add Tx timeout process
net: wangxun: add reinit parameter to wx->do_reset callback
net: wangxun: extract the close_suspend sequence
net: wangxun: implement pci_error_handlers ops
drivers/net/ethernet/wangxun/libwx/Makefile | 2 +-
drivers/net/ethernet/wangxun/libwx/wx_err.c | 232 ++++++++++++++++++
drivers/net/ethernet/wangxun/libwx/wx_err.h | 16 ++
.../net/ethernet/wangxun/libwx/wx_ethtool.c | 2 +-
drivers/net/ethernet/wangxun/libwx/wx_hw.c | 17 +-
drivers/net/ethernet/wangxun/libwx/wx_lib.c | 41 +++-
drivers/net/ethernet/wangxun/libwx/wx_lib.h | 1 +
drivers/net/ethernet/wangxun/libwx/wx_type.h | 16 +-
.../net/ethernet/wangxun/ngbe/ngbe_ethtool.c | 1 -
drivers/net/ethernet/wangxun/ngbe/ngbe_main.c | 68 ++++-
drivers/net/ethernet/wangxun/ngbe/ngbe_type.h | 2 +
.../net/ethernet/wangxun/txgbe/txgbe_main.c | 26 +-
.../net/ethernet/wangxun/txgbe/txgbe_type.h | 3 +-
13 files changed, 398 insertions(+), 29 deletions(-)
create mode 100644 drivers/net/ethernet/wangxun/libwx/wx_err.c
create mode 100644 drivers/net/ethernet/wangxun/libwx/wx_err.h
--
2.51.0
* [PATCH net-next v1 1/5] net: ngbe: implement libwx reset ops
From: Jiawen Wu @ 2026-04-28 2:11 UTC
To: netdev
Cc: Mengyuan Lou, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Richard Cochran, Russell King,
Simon Horman, Kees Cook, Larysa Zaremba, Breno Leitao, Joe Damato,
Jacob Keller, Fabio Baltieri, Jiawen Wu
Implement the wx->do_reset() callback in ngbe so that the libwx library
module can trigger a device reset.
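A user-space sketch of how the callback split works, with hypothetical names throughout: model_ngbe_do_reset() mirrors the branch in ngbe_do_reset() in this patch (full reinit when the interface is running, plain hardware reset otherwise), and model_lib_request_reset() mirrors how libwx callers such as wx_set_features() check that wx->do_reset is set before calling it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct net_device plus struct wx. */
struct model_dev {
	bool running;      /* netif_running() */
	int full_reinits;  /* down + up cycles (ngbe_reinit_locked()) */
	int hw_resets;     /* plain hardware resets (ngbe_reset()) */
	void (*do_reset)(struct model_dev *dev);  /* wx->do_reset */
};

/* Driver side: reinitialize the datapath if the interface is up,
 * otherwise only reset the hardware. */
static void model_ngbe_do_reset(struct model_dev *dev)
{
	if (dev->running)
		dev->full_reinits++;
	else
		dev->hw_resets++;
}

/* Library side: libwx only sees the callback pointer and must
 * tolerate a driver that does not provide one. */
static void model_lib_request_reset(struct model_dev *dev)
{
	if (dev->do_reset)
		dev->do_reset(dev);
}
```

The indirection keeps libwx free of any direct dependency on ngbe or txgbe symbols; each driver decides what a "reset" means for its own hardware.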
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
---
.../net/ethernet/wangxun/ngbe/ngbe_ethtool.c | 1 -
drivers/net/ethernet/wangxun/ngbe/ngbe_main.c | 37 ++++++++++++++++++-
drivers/net/ethernet/wangxun/ngbe/ngbe_type.h | 1 +
3 files changed, 36 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/wangxun/ngbe/ngbe_ethtool.c b/drivers/net/ethernet/wangxun/ngbe/ngbe_ethtool.c
index b2e191982803..1960f7154151 100644
--- a/drivers/net/ethernet/wangxun/ngbe/ngbe_ethtool.c
+++ b/drivers/net/ethernet/wangxun/ngbe/ngbe_ethtool.c
@@ -59,7 +59,6 @@ static int ngbe_set_ringparam(struct net_device *netdev,
wx_set_ring(wx, new_tx_count, new_rx_count, temp_ring);
kvfree(temp_ring);
- wx_configure(wx);
ngbe_up(wx);
clear_reset:
diff --git a/drivers/net/ethernet/wangxun/ngbe/ngbe_main.c b/drivers/net/ethernet/wangxun/ngbe/ngbe_main.c
index d8e3827a8b1f..bd905e267575 100644
--- a/drivers/net/ethernet/wangxun/ngbe/ngbe_main.c
+++ b/drivers/net/ethernet/wangxun/ngbe/ngbe_main.c
@@ -133,6 +133,7 @@ static int ngbe_sw_init(struct wx *wx)
wx->mbx.size = WX_VXMAILBOX_SIZE;
wx->setup_tc = ngbe_setup_tc;
+ wx->do_reset = ngbe_do_reset;
set_bit(0, &wx->fwd_bitmask);
return 0;
@@ -422,7 +423,7 @@ void ngbe_down(struct wx *wx)
wx_clean_all_rx_rings(wx);
}
-void ngbe_up(struct wx *wx)
+static void ngbe_up_complete(struct wx *wx)
{
wx_configure_vectors(wx);
@@ -488,7 +489,7 @@ static int ngbe_open(struct net_device *netdev)
wx_ptp_init(wx);
- ngbe_up(wx);
+ ngbe_up_complete(wx);
return 0;
err_dis_phy:
@@ -501,6 +502,12 @@ static int ngbe_open(struct net_device *netdev)
return err;
}
+void ngbe_up(struct wx *wx)
+{
+ wx_configure(wx);
+ ngbe_up_complete(wx);
+}
+
/**
* ngbe_close - Disables a network interface
* @netdev: network interface device structure
@@ -588,6 +595,8 @@ int ngbe_setup_tc(struct net_device *dev, u8 tc)
*/
if (netif_running(dev))
ngbe_close(dev);
+ else
+ ngbe_reset(wx);
wx_clear_interrupt_scheme(wx);
@@ -604,6 +613,30 @@ int ngbe_setup_tc(struct net_device *dev, u8 tc)
return 0;
}
+static void ngbe_reinit_locked(struct wx *wx)
+{
+ netif_trans_update(wx->netdev);
+
+ mutex_lock(&wx->reset_lock);
+ set_bit(WX_STATE_RESETTING, wx->state);
+
+ ngbe_down(wx);
+ ngbe_up(wx);
+
+ clear_bit(WX_STATE_RESETTING, wx->state);
+ mutex_unlock(&wx->reset_lock);
+}
+
+void ngbe_do_reset(struct net_device *netdev)
+{
+ struct wx *wx = netdev_priv(netdev);
+
+ if (netif_running(netdev))
+ ngbe_reinit_locked(wx);
+ else
+ ngbe_reset(wx);
+}
+
static const struct net_device_ops ngbe_netdev_ops = {
.ndo_open = ngbe_open,
.ndo_stop = ngbe_close,
diff --git a/drivers/net/ethernet/wangxun/ngbe/ngbe_type.h b/drivers/net/ethernet/wangxun/ngbe/ngbe_type.h
index 7077a0da4c98..4f648f272c08 100644
--- a/drivers/net/ethernet/wangxun/ngbe/ngbe_type.h
+++ b/drivers/net/ethernet/wangxun/ngbe/ngbe_type.h
@@ -125,5 +125,6 @@ extern char ngbe_driver_name[];
void ngbe_down(struct wx *wx);
void ngbe_up(struct wx *wx);
int ngbe_setup_tc(struct net_device *dev, u8 tc);
+void ngbe_do_reset(struct net_device *netdev);
#endif /* _NGBE_TYPE_H_ */
--
2.51.0
* [PATCH net-next v1 2/5] net: wangxun: add Tx timeout process
From: Jiawen Wu @ 2026-04-28 2:11 UTC
To: netdev
Cc: Mengyuan Lou, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Richard Cochran, Russell King,
Simon Horman, Kees Cook, Larysa Zaremba, Breno Leitao, Joe Damato,
Jacob Keller, Fabio Baltieri, Jiawen Wu
Implement .ndo_tx_timeout to handle Tx timeout events. When a Tx
timeout occurs, the driver flags a PF reset and schedules the service
task to carry it out.
The WX_HANG_CHECK_ARMED bit is set to indicate a potential hang. It is
cleared when an XOFF pause frame is received, so that transmission
stalled by 802.3x flow control is not falsely detected as a hang.
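The two-strike check in wx_check_tx_hang() and the pause-frame disarm in wx_update_xoff_rx_lfc() can be modeled in user space as below. The struct and function names are hypothetical; a plain bool stands in for the WX_HANG_CHECK_ARMED bit, and the atomic bitops and the WX_TX_DETECT_HANG trigger are left out because the model is single-threaded.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one Tx ring's hang-check state. */
struct hang_ring {
	unsigned int count;          /* descriptors in the ring */
	unsigned int next_to_clean;  /* head: oldest in-flight descriptor */
	unsigned int next_to_use;    /* tail: next free descriptor */
	uint32_t packets;            /* completed packets (stats.packets) */
	uint32_t tx_done_old;        /* packets seen at the previous check */
	bool armed;                  /* WX_HANG_CHECK_ARMED */
};

/* Descriptors posted but not yet cleaned, handling wrap-around. */
static uint32_t hang_tx_pending(const struct hang_ring *r)
{
	unsigned int head = r->next_to_clean, tail = r->next_to_use;

	return ((head <= tail) ? tail : tail + r->count) - head;
}

/* Report a hang only if no progress is seen on two consecutive checks
 * while work is pending: the first stall arms, the second fires. */
static bool hang_check(struct hang_ring *r)
{
	if (r->tx_done_old == r->packets && hang_tx_pending(r)) {
		bool was_armed = r->armed;  /* test_and_set_bit() */

		r->armed = true;
		return was_armed;
	}
	r->tx_done_old = r->packets;  /* progress: remember it and disarm */
	r->armed = false;
	return false;
}

/* A received XOFF frame disarms the check, because a paused link
 * legitimately makes no Tx progress. */
static void hang_on_xoff(struct hang_ring *r)
{
	r->armed = false;
}
```

Requiring two consecutive stalled checks, and disarming on XOFF, trades a little detection latency for far fewer false resets on a link that is merely flow-controlled.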
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
---
drivers/net/ethernet/wangxun/libwx/Makefile | 2 +-
drivers/net/ethernet/wangxun/libwx/wx_err.c | 125 ++++++++++++++++++
drivers/net/ethernet/wangxun/libwx/wx_err.h | 14 ++
drivers/net/ethernet/wangxun/libwx/wx_hw.c | 17 ++-
drivers/net/ethernet/wangxun/libwx/wx_lib.c | 37 ++++++
drivers/net/ethernet/wangxun/libwx/wx_lib.h | 1 +
drivers/net/ethernet/wangxun/libwx/wx_type.h | 12 +-
drivers/net/ethernet/wangxun/ngbe/ngbe_main.c | 4 +
.../net/ethernet/wangxun/txgbe/txgbe_main.c | 4 +
9 files changed, 211 insertions(+), 5 deletions(-)
create mode 100644 drivers/net/ethernet/wangxun/libwx/wx_err.c
create mode 100644 drivers/net/ethernet/wangxun/libwx/wx_err.h
diff --git a/drivers/net/ethernet/wangxun/libwx/Makefile b/drivers/net/ethernet/wangxun/libwx/Makefile
index a71b0ad77de3..c8724bb129aa 100644
--- a/drivers/net/ethernet/wangxun/libwx/Makefile
+++ b/drivers/net/ethernet/wangxun/libwx/Makefile
@@ -4,5 +4,5 @@
obj-$(CONFIG_LIBWX) += libwx.o
-libwx-objs := wx_hw.o wx_lib.o wx_ethtool.o wx_ptp.o wx_mbx.o wx_sriov.o
+libwx-objs := wx_hw.o wx_lib.o wx_ethtool.o wx_ptp.o wx_mbx.o wx_sriov.o wx_err.o
libwx-objs += wx_vf.o wx_vf_lib.o wx_vf_common.o
diff --git a/drivers/net/ethernet/wangxun/libwx/wx_err.c b/drivers/net/ethernet/wangxun/libwx/wx_err.c
new file mode 100644
index 000000000000..42e00f0bd8da
--- /dev/null
+++ b/drivers/net/ethernet/wangxun/libwx/wx_err.c
@@ -0,0 +1,125 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2015 - 2026 Beijing WangXun Technology Co., Ltd. */
+
+#include <linux/netdevice.h>
+#include <linux/pci.h>
+
+#include "wx_type.h"
+#include "wx_lib.h"
+#include "wx_err.h"
+
+static void wx_reset_subtask(struct wx *wx)
+{
+ if (!test_bit(WX_FLAG_NEED_PF_RESET, wx->flags))
+ return;
+
+ rtnl_lock();
+
+ if (!netif_running(wx->netdev) ||
+ test_bit(WX_STATE_RESETTING, wx->state))
+ return;
+
+ wx_warn(wx, "Reset adapter.\n");
+
+ if (test_and_clear_bit(WX_FLAG_NEED_PF_RESET, wx->flags)) {
+ if (wx->do_reset)
+ wx->do_reset(wx->netdev);
+ }
+
+ rtnl_unlock();
+}
+
+/*
+ * wx_check_tx_hang_subtask - check for hung queues and dropped interrupts
+ * @wx - pointer to the device wx structure
+ *
+ * This function serves two purposes. First it strobes the interrupt lines
+ * in order to make certain interrupts are occurring. Secondly it sets the
+ * bits needed to check for TX hangs. As a result we should immediately
+ * determine if a hang has occurred.
+ */
+static void wx_check_tx_hang_subtask(struct wx *wx)
+{
+ int i;
+
+ /* If we're down or resetting, just bail */
+ if (!netif_running(wx->netdev) ||
+ test_bit(WX_STATE_RESETTING, wx->state))
+ return;
+
+ /* Force detection of hung controller */
+ if (netif_carrier_ok(wx->netdev)) {
+ for (i = 0; i < wx->num_tx_queues; i++)
+ set_bit(WX_TX_DETECT_HANG, wx->tx_ring[i]->state);
+ }
+}
+
+void wx_handle_errors_subtask(struct wx *wx)
+{
+ wx_reset_subtask(wx);
+ wx_check_tx_hang_subtask(wx);
+}
+EXPORT_SYMBOL(wx_handle_errors_subtask);
+
+static void wx_tx_timeout_reset(struct wx *wx)
+{
+ if (!netif_running(wx->netdev))
+ return;
+
+ set_bit(WX_FLAG_NEED_PF_RESET, wx->flags);
+ wx_warn(wx, "initiating reset due to tx timeout\n");
+ wx_service_event_schedule(wx);
+}
+
+void wx_tx_timeout(struct net_device *netdev, unsigned int txqueue)
+{
+ struct wx *wx = netdev_priv(netdev);
+ u32 head, tail;
+ int i;
+
+ for (i = 0; i < wx->num_tx_queues; i++) {
+ struct wx_ring *tx_ring = wx->tx_ring[i];
+
+ if (test_bit(WX_TX_DETECT_HANG, tx_ring->state) &&
+ wx_check_tx_hang(tx_ring))
+ wx_warn(wx, "Real tx hang detected on queue %d\n", i);
+
+ head = rd32(wx, WX_PX_TR_RP(tx_ring->reg_idx));
+ tail = rd32(wx, WX_PX_TR_WP(tx_ring->reg_idx));
+ wx_warn(wx,
+ "tx ring %d next_to_use is %d, next_to_clean is %d\n",
+ i, tx_ring->next_to_use,
+ tx_ring->next_to_clean);
+ wx_warn(wx, "tx ring %d hw rp is 0x%x, wp is 0x%x\n",
+ i, head, tail);
+ }
+
+ wx_tx_timeout_reset(wx);
+}
+EXPORT_SYMBOL(wx_tx_timeout);
+
+void wx_handle_tx_hang(struct wx_ring *tx_ring, unsigned int next)
+{
+ struct wx *wx = netdev_priv(tx_ring->netdev);
+
+ wx_warn(wx, "Detected Tx Unit Hang\n"
+ " Tx Queue <%d>\n"
+ " TDH, TDT <%x>, <%x>\n"
+ " next_to_use <%x>\n"
+ " next_to_clean <%x>\n"
+ "tx_buffer_info[next_to_clean]\n"
+ " time_stamp <%lx>\n"
+ " jiffies <%lx>\n",
+ tx_ring->queue_index,
+ rd32(wx, WX_PX_TR_RP(tx_ring->reg_idx)),
+ rd32(wx, WX_PX_TR_WP(tx_ring->reg_idx)),
+ tx_ring->next_to_use, next,
+ tx_ring->tx_buffer_info[next].time_stamp, jiffies);
+
+ netif_stop_subqueue(tx_ring->netdev, tx_ring->queue_index);
+
+ wx_warn(wx, "tx hang detected on queue %d, resetting adapter\n",
+ tx_ring->queue_index);
+
+ wx_tx_timeout_reset(wx);
+}
diff --git a/drivers/net/ethernet/wangxun/libwx/wx_err.h b/drivers/net/ethernet/wangxun/libwx/wx_err.h
new file mode 100644
index 000000000000..e317e6c8d928
--- /dev/null
+++ b/drivers/net/ethernet/wangxun/libwx/wx_err.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * WangXun Gigabit PCI Express Linux driver
+ * Copyright (c) 2015 - 2026 Beijing WangXun Technology Co., Ltd.
+ */
+
+#ifndef _WX_ERR_H_
+#define _WX_ERR_H_
+
+void wx_handle_errors_subtask(struct wx *wx);
+void wx_tx_timeout(struct net_device *netdev, unsigned int txqueue);
+void wx_handle_tx_hang(struct wx_ring *tx_ring, unsigned int next);
+
+#endif /* _WX_ERR_H_ */
diff --git a/drivers/net/ethernet/wangxun/libwx/wx_hw.c b/drivers/net/ethernet/wangxun/libwx/wx_hw.c
index d3772d01e00b..401dc7eb1137 100644
--- a/drivers/net/ethernet/wangxun/libwx/wx_hw.c
+++ b/drivers/net/ethernet/wangxun/libwx/wx_hw.c
@@ -1932,6 +1932,7 @@ static void wx_configure_tx_ring(struct wx *wx,
else
ring->atr_sample_rate = 0;
+ bitmap_zero(ring->state, WX_RING_STATE_NBITS);
/* reinitialize tx_buffer_info */
memset(ring->tx_buffer_info, 0,
sizeof(struct wx_tx_buffer) * ring->count);
@@ -2847,16 +2848,26 @@ EXPORT_SYMBOL(wx_fc_enable);
static void wx_update_xoff_rx_lfc(struct wx *wx)
{
struct wx_hw_stats *hwstats = &wx->stats;
+ u64 data;
+ int i;
if (wx->fc.mode != wx_fc_full &&
wx->fc.mode != wx_fc_rx_pause)
return;
if (wx->mac.type >= wx_mac_aml)
- hwstats->lxoffrxc += rd32_wrap(wx, WX_MAC_LXOFFRXC_AML,
- &wx->last_stats.lxoffrxc);
+ data = rd32_wrap(wx, WX_MAC_LXOFFRXC_AML,
+ &wx->last_stats.lxoffrxc);
else
- hwstats->lxoffrxc += rd64(wx, WX_MAC_LXOFFRXC);
+ data = rd64(wx, WX_MAC_LXOFFRXC);
+ hwstats->lxoffrxc += data;
+
+ /* refill credits (no tx hang) if we received xoff */
+ if (!data)
+ return;
+
+ for (i = 0; i < wx->num_tx_queues; i++)
+ clear_bit(WX_HANG_CHECK_ARMED, wx->tx_ring[i]->state);
}
/**
diff --git a/drivers/net/ethernet/wangxun/libwx/wx_lib.c b/drivers/net/ethernet/wangxun/libwx/wx_lib.c
index 746623fa59b4..9e6167b43f75 100644
--- a/drivers/net/ethernet/wangxun/libwx/wx_lib.c
+++ b/drivers/net/ethernet/wangxun/libwx/wx_lib.c
@@ -14,6 +14,7 @@
#include "wx_type.h"
#include "wx_lib.h"
+#include "wx_err.h"
#include "wx_ptp.h"
#include "wx_hw.h"
#include "wx_vf_lib.h"
@@ -742,6 +743,36 @@ static struct netdev_queue *wx_txring_txq(const struct wx_ring *ring)
return netdev_get_tx_queue(ring->netdev, ring->queue_index);
}
+static u32 wx_get_tx_pending(struct wx_ring *ring)
+{
+ unsigned int head, tail;
+
+ head = ring->next_to_clean;
+ tail = ring->next_to_use;
+
+ return ((head <= tail) ? tail : tail + ring->count) - head;
+}
+
+bool wx_check_tx_hang(struct wx_ring *ring)
+{
+ u32 tx_done_old = ring->tx_stats.tx_done_old;
+ u32 tx_pending = wx_get_tx_pending(ring);
+ u32 tx_done = ring->stats.packets;
+
+ clear_bit(WX_TX_DETECT_HANG, ring->state);
+
+ if (tx_done_old == tx_done && tx_pending)
+ /* make sure it is true for two checks in a row */
+ return test_and_set_bit(WX_HANG_CHECK_ARMED, ring->state);
+
+ /* update completed stats and continue */
+ ring->tx_stats.tx_done_old = tx_done;
+ /* reset the countdown */
+ clear_bit(WX_HANG_CHECK_ARMED, ring->state);
+
+ return false;
+}
+
/**
* wx_clean_tx_irq - Reclaim resources after transmit completes
* @q_vector: structure containing interrupt and ring information
@@ -866,6 +897,12 @@ static bool wx_clean_tx_irq(struct wx_q_vector *q_vector,
netdev_tx_completed_queue(wx_txring_txq(tx_ring),
total_packets, total_bytes);
+ if (test_bit(WX_TX_DETECT_HANG, tx_ring->state) &&
+ wx_check_tx_hang(tx_ring)) {
+ wx_handle_tx_hang(tx_ring, i);
+ return true;
+ }
+
#define TX_WAKE_THRESHOLD (DESC_NEEDED * 2)
if (unlikely(total_packets && netif_carrier_ok(tx_ring->netdev) &&
(wx_desc_unused(tx_ring) >= TX_WAKE_THRESHOLD))) {
diff --git a/drivers/net/ethernet/wangxun/libwx/wx_lib.h b/drivers/net/ethernet/wangxun/libwx/wx_lib.h
index aed6ea8cf0d6..e373cd7f05d3 100644
--- a/drivers/net/ethernet/wangxun/libwx/wx_lib.h
+++ b/drivers/net/ethernet/wangxun/libwx/wx_lib.h
@@ -10,6 +10,7 @@
struct wx_dec_ptype wx_decode_ptype(const u8 ptype);
void wx_alloc_rx_buffers(struct wx_ring *rx_ring, u16 cleaned_count);
u16 wx_desc_unused(struct wx_ring *ring);
+bool wx_check_tx_hang(struct wx_ring *ring);
netdev_tx_t wx_xmit_frame(struct sk_buff *skb,
struct net_device *netdev);
void wx_napi_enable_all(struct wx *wx);
diff --git a/drivers/net/ethernet/wangxun/libwx/wx_type.h b/drivers/net/ethernet/wangxun/libwx/wx_type.h
index 0da5565ee4ff..f65c2d7bae39 100644
--- a/drivers/net/ethernet/wangxun/libwx/wx_type.h
+++ b/drivers/net/ethernet/wangxun/libwx/wx_type.h
@@ -1039,6 +1039,7 @@ struct wx_queue_stats {
struct wx_tx_queue_stats {
u64 restart_queue;
u64 tx_busy;
+ u32 tx_done_old;
};
struct wx_rx_queue_stats {
@@ -1054,6 +1055,12 @@ struct wx_rx_queue_stats {
#define wx_for_each_ring(posm, headm) \
for (posm = (headm).ring; posm; posm = posm->next)
+enum wx_ring_state {
+ WX_TX_DETECT_HANG,
+ WX_HANG_CHECK_ARMED,
+ WX_RING_STATE_NBITS
+};
+
struct wx_ring_container {
struct wx_ring *ring; /* pointer to linked list of rings */
unsigned int total_bytes; /* total bytes processed this int */
@@ -1073,6 +1080,7 @@ struct wx_ring {
struct wx_tx_buffer *tx_buffer_info;
struct wx_rx_buffer *rx_buffer_info;
};
+ DECLARE_BITMAP(state, WX_RING_STATE_NBITS);
u8 __iomem *tail;
dma_addr_t dma; /* phys. address of descriptor ring */
dma_addr_t headwb_dma;
@@ -1273,6 +1281,7 @@ enum wx_pf_flags {
WX_FLAG_NEED_DO_RESET,
WX_FLAG_RX_MERGE_ENABLED,
WX_FLAG_TXHEAD_WB_ENABLED,
+ WX_FLAG_NEED_PF_RESET,
WX_PF_FLAGS_NBITS /* must be last */
};
@@ -1503,7 +1512,8 @@ rd32_wrap(struct wx *wx, u32 reg, u32 *last)
#define wx_err(wx, fmt, arg...) \
dev_err(&(wx)->pdev->dev, fmt, ##arg)
-
+#define wx_warn(wx, fmt, arg...) \
+ dev_warn(&(wx)->pdev->dev, fmt, ##arg)
#define wx_dbg(wx, fmt, arg...) \
dev_dbg(&(wx)->pdev->dev, fmt, ##arg)
diff --git a/drivers/net/ethernet/wangxun/ngbe/ngbe_main.c b/drivers/net/ethernet/wangxun/ngbe/ngbe_main.c
index bd905e267575..e9561996b970 100644
--- a/drivers/net/ethernet/wangxun/ngbe/ngbe_main.c
+++ b/drivers/net/ethernet/wangxun/ngbe/ngbe_main.c
@@ -14,6 +14,7 @@
#include "../libwx/wx_type.h"
#include "../libwx/wx_hw.h"
#include "../libwx/wx_lib.h"
+#include "../libwx/wx_err.h"
#include "../libwx/wx_ptp.h"
#include "../libwx/wx_mbx.h"
#include "../libwx/wx_sriov.h"
@@ -147,6 +148,7 @@ static void ngbe_service_task(struct work_struct *work)
{
struct wx *wx = container_of(work, struct wx, service_task);
+ wx_handle_errors_subtask(wx);
wx_update_stats(wx);
wx_service_event_complete(wx);
@@ -642,6 +644,7 @@ static const struct net_device_ops ngbe_netdev_ops = {
.ndo_stop = ngbe_close,
.ndo_change_mtu = wx_change_mtu,
.ndo_start_xmit = wx_xmit_frame,
+ .ndo_tx_timeout = wx_tx_timeout,
.ndo_set_rx_mode = wx_set_rx_mode,
.ndo_set_features = wx_set_features,
.ndo_fix_features = wx_fix_features,
@@ -731,6 +734,7 @@ static int ngbe_probe(struct pci_dev *pdev,
wx->driver_name = ngbe_driver_name;
ngbe_set_ethtool_ops(netdev);
netdev->netdev_ops = &ngbe_netdev_ops;
+ netdev->watchdog_timeo = 5 * HZ;
netdev->features = NETIF_F_SG | NETIF_F_IP_CSUM |
NETIF_F_TSO | NETIF_F_TSO6 |
diff --git a/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c b/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c
index 8b7c3753bb6a..5793da5b7bab 100644
--- a/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c
+++ b/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c
@@ -14,6 +14,7 @@
#include "../libwx/wx_type.h"
#include "../libwx/wx_lib.h"
+#include "../libwx/wx_err.h"
#include "../libwx/wx_ptp.h"
#include "../libwx/wx_hw.h"
#include "../libwx/wx_mbx.h"
@@ -128,6 +129,7 @@ static void txgbe_service_task(struct work_struct *work)
{
struct wx *wx = container_of(work, struct wx, service_task);
+ wx_handle_errors_subtask(wx);
txgbe_module_detection_subtask(wx);
txgbe_link_config_subtask(wx);
wx_update_stats(wx);
@@ -659,6 +661,7 @@ static const struct net_device_ops txgbe_netdev_ops = {
.ndo_stop = txgbe_close,
.ndo_change_mtu = wx_change_mtu,
.ndo_start_xmit = wx_xmit_frame,
+ .ndo_tx_timeout = wx_tx_timeout,
.ndo_set_rx_mode = wx_set_rx_mode,
.ndo_set_features = wx_set_features,
.ndo_fix_features = wx_fix_features,
@@ -750,6 +753,7 @@ static int txgbe_probe(struct pci_dev *pdev,
wx->driver_name = txgbe_driver_name;
txgbe_set_ethtool_ops(netdev);
netdev->netdev_ops = &txgbe_netdev_ops;
+ netdev->watchdog_timeo = 5 * HZ;
netdev->udp_tunnel_nic_info = &txgbe_udp_tunnels;
/* setup the private structure */
--
2.51.0
* [PATCH net-next v1 3/5] net: wangxun: add reinit parameter to wx->do_reset callback
From: Jiawen Wu @ 2026-04-28 2:11 UTC
To: netdev
Cc: Mengyuan Lou, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Richard Cochran, Russell King,
Simon Horman, Kees Cook, Larysa Zaremba, Breno Leitao, Joe Damato,
Jacob Keller, Fabio Baltieri, Jiawen Wu
To allow a plain hardware reset without tearing down the network
interface state, add a boolean 'reinit' parameter to the wx->do_reset
callback. The full down/up reinitialization is done only when the
interface is running and reinit is true.
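The resulting decision in ngbe_do_reset() and txgbe_do_reset() reduces to a small truth table, sketched below with hypothetical names. All callers converted in this patch pass true, so their behavior is unchanged; a caller passing false gets only the ngbe_reset()/txgbe_reset() path even while the interface is up.

```c
#include <stdbool.h>

/* Hypothetical outcome of a do_reset() call, for illustration only. */
enum reset_action { ACTION_HW_RESET_ONLY, ACTION_REINIT };

/* Mirrors the condition "netif_running(netdev) && reinit" used after
 * this patch: only that combination gets the down/up cycle. */
static enum reset_action pick_reset_action(bool running, bool reinit)
{
	if (running && reinit)
		return ACTION_REINIT;
	return ACTION_HW_RESET_ONLY;
}
```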
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
---
drivers/net/ethernet/wangxun/libwx/wx_err.c | 2 +-
drivers/net/ethernet/wangxun/libwx/wx_ethtool.c | 2 +-
drivers/net/ethernet/wangxun/libwx/wx_lib.c | 4 ++--
drivers/net/ethernet/wangxun/libwx/wx_type.h | 2 +-
drivers/net/ethernet/wangxun/ngbe/ngbe_main.c | 4 ++--
drivers/net/ethernet/wangxun/ngbe/ngbe_type.h | 2 +-
drivers/net/ethernet/wangxun/txgbe/txgbe_main.c | 4 ++--
drivers/net/ethernet/wangxun/txgbe/txgbe_type.h | 2 +-
8 files changed, 11 insertions(+), 11 deletions(-)
diff --git a/drivers/net/ethernet/wangxun/libwx/wx_err.c b/drivers/net/ethernet/wangxun/libwx/wx_err.c
index 42e00f0bd8da..e7c9dcb148b5 100644
--- a/drivers/net/ethernet/wangxun/libwx/wx_err.c
+++ b/drivers/net/ethernet/wangxun/libwx/wx_err.c
@@ -23,7 +23,7 @@ static void wx_reset_subtask(struct wx *wx)
if (test_and_clear_bit(WX_FLAG_NEED_PF_RESET, wx->flags)) {
if (wx->do_reset)
- wx->do_reset(wx->netdev);
+ wx->do_reset(wx->netdev, true);
}
rtnl_unlock();
diff --git a/drivers/net/ethernet/wangxun/libwx/wx_ethtool.c b/drivers/net/ethernet/wangxun/libwx/wx_ethtool.c
index 5df971aca9e3..d1356ff5d69b 100644
--- a/drivers/net/ethernet/wangxun/libwx/wx_ethtool.c
+++ b/drivers/net/ethernet/wangxun/libwx/wx_ethtool.c
@@ -395,7 +395,7 @@ static void wx_update_rsc(struct wx *wx)
/* reset the device to apply the new RSC setting */
if (need_reset && wx->do_reset)
- wx->do_reset(netdev);
+ wx->do_reset(netdev, true);
}
int wx_set_coalesce(struct net_device *netdev,
diff --git a/drivers/net/ethernet/wangxun/libwx/wx_lib.c b/drivers/net/ethernet/wangxun/libwx/wx_lib.c
index 9e6167b43f75..3216dee778be 100644
--- a/drivers/net/ethernet/wangxun/libwx/wx_lib.c
+++ b/drivers/net/ethernet/wangxun/libwx/wx_lib.c
@@ -3146,7 +3146,7 @@ int wx_set_features(struct net_device *netdev, netdev_features_t features)
netdev->features = features;
if (changed & NETIF_F_HW_VLAN_CTAG_RX && wx->do_reset)
- wx->do_reset(netdev);
+ wx->do_reset(netdev, true);
else if (changed & (NETIF_F_HW_VLAN_CTAG_RX | NETIF_F_HW_VLAN_CTAG_FILTER))
wx_set_rx_mode(netdev);
@@ -3196,7 +3196,7 @@ int wx_set_features(struct net_device *netdev, netdev_features_t features)
out:
if (need_reset && wx->do_reset)
- wx->do_reset(netdev);
+ wx->do_reset(netdev, true);
return 0;
}
diff --git a/drivers/net/ethernet/wangxun/libwx/wx_type.h b/drivers/net/ethernet/wangxun/libwx/wx_type.h
index f65c2d7bae39..671ac0a19dee 100644
--- a/drivers/net/ethernet/wangxun/libwx/wx_type.h
+++ b/drivers/net/ethernet/wangxun/libwx/wx_type.h
@@ -1402,7 +1402,7 @@ struct wx {
void (*atr)(struct wx_ring *ring, struct wx_tx_buffer *first, u8 ptype);
void (*configure_fdir)(struct wx *wx);
int (*setup_tc)(struct net_device *netdev, u8 tc);
- void (*do_reset)(struct net_device *netdev);
+ void (*do_reset)(struct net_device *netdev, bool reinit);
int (*ptp_setup_sdp)(struct wx *wx);
void (*set_num_queues)(struct wx *wx);
diff --git a/drivers/net/ethernet/wangxun/ngbe/ngbe_main.c b/drivers/net/ethernet/wangxun/ngbe/ngbe_main.c
index e9561996b970..ec14dd47cd42 100644
--- a/drivers/net/ethernet/wangxun/ngbe/ngbe_main.c
+++ b/drivers/net/ethernet/wangxun/ngbe/ngbe_main.c
@@ -629,11 +629,11 @@ static void ngbe_reinit_locked(struct wx *wx)
mutex_unlock(&wx->reset_lock);
}
-void ngbe_do_reset(struct net_device *netdev)
+void ngbe_do_reset(struct net_device *netdev, bool reinit)
{
struct wx *wx = netdev_priv(netdev);
- if (netif_running(netdev))
+ if (netif_running(netdev) && reinit)
ngbe_reinit_locked(wx);
else
ngbe_reset(wx);
diff --git a/drivers/net/ethernet/wangxun/ngbe/ngbe_type.h b/drivers/net/ethernet/wangxun/ngbe/ngbe_type.h
index 4f648f272c08..c9233dc7ae50 100644
--- a/drivers/net/ethernet/wangxun/ngbe/ngbe_type.h
+++ b/drivers/net/ethernet/wangxun/ngbe/ngbe_type.h
@@ -125,6 +125,6 @@ extern char ngbe_driver_name[];
void ngbe_down(struct wx *wx);
void ngbe_up(struct wx *wx);
int ngbe_setup_tc(struct net_device *dev, u8 tc);
-void ngbe_do_reset(struct net_device *netdev);
+void ngbe_do_reset(struct net_device *netdev, bool reinit);
#endif /* _NGBE_TYPE_H_ */
diff --git a/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c b/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c
index 5793da5b7bab..9887638203cb 100644
--- a/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c
+++ b/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c
@@ -613,11 +613,11 @@ static void txgbe_reinit_locked(struct wx *wx)
mutex_unlock(&wx->reset_lock);
}
-void txgbe_do_reset(struct net_device *netdev)
+void txgbe_do_reset(struct net_device *netdev, bool reinit)
{
struct wx *wx = netdev_priv(netdev);
- if (netif_running(netdev))
+ if (netif_running(netdev) && reinit)
txgbe_reinit_locked(wx);
else
txgbe_reset(wx);
diff --git a/drivers/net/ethernet/wangxun/txgbe/txgbe_type.h b/drivers/net/ethernet/wangxun/txgbe/txgbe_type.h
index 6b05f32b4a01..1e373f7fd9b5 100644
--- a/drivers/net/ethernet/wangxun/txgbe/txgbe_type.h
+++ b/drivers/net/ethernet/wangxun/txgbe/txgbe_type.h
@@ -313,7 +313,7 @@ extern char txgbe_driver_name[];
void txgbe_down(struct wx *wx);
void txgbe_up(struct wx *wx);
int txgbe_setup_tc(struct net_device *dev, u8 tc);
-void txgbe_do_reset(struct net_device *netdev);
+void txgbe_do_reset(struct net_device *netdev, bool reinit);
#define TXGBE_LINK_SPEED_UNKNOWN 0
#define TXGBE_LINK_SPEED_10GB_FULL 4
--
2.51.0
* [PATCH net-next v1 4/5] net: wangxun: extract the close_suspend sequence
From: Jiawen Wu @ 2026-04-28 2:11 UTC
To: netdev
Cc: Mengyuan Lou, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Richard Cochran, Russell King,
Simon Horman, Kees Cook, Larysa Zaremba, Breno Leitao, Joe Damato,
Jacob Keller, Fabio Baltieri, Jiawen Wu
Refactor the .ndo_stop implementation by extracting the hardware
shutdown sequence into a dedicated close_suspend function, exposed to
libwx through the new wx->close_suspend callback. This prepares for the
PCIe error callbacks implemented in libwx later in this series.
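The intent of the netif_device_present() check in ngbe_close() and txgbe_close() can be sketched as a user-space model with hypothetical names. It assumes (the error-handling code is in patch 5, which is cut off here) that the PCIe error path detaches the netdev and calls wx->close_suspend() itself; under that assumption, a later .ndo_stop must not free the same IRQs and rings a second time.

```c
#include <stdbool.h>

/* Hypothetical model of the teardown split; not driver code. */
struct stop_nic {
	bool present;   /* netif_device_present() */
	int teardowns;  /* times close_suspend() ran */
	int ptp_stops;  /* times the close path stopped PTP */
	void (*close_suspend)(struct stop_nic *nic);  /* wx->close_suspend */
};

/* Stand-in for ngbe_close_suspend()/txgbe_close_suspend():
 * stop the datapath and free IRQs and rings. */
static void stop_close_suspend(struct stop_nic *nic)
{
	nic->teardowns++;
}

/* Assumed error path: detach the device and tear it down through
 * the callback, without going through .ndo_stop. */
static void stop_error_detach(struct stop_nic *nic)
{
	nic->present = false;  /* netif_device_detach() */
	if (nic->close_suspend)
		nic->close_suspend(nic);
}

/* .ndo_stop: PTP is always stopped, but the hardware teardown is
 * skipped when the device has already been detached. */
static void stop_ndo_stop(struct stop_nic *nic)
{
	nic->ptp_stops++;
	if (nic->present)
		nic->close_suspend(nic);
}
```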
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
---
drivers/net/ethernet/wangxun/libwx/wx_type.h | 1 +
drivers/net/ethernet/wangxun/ngbe/ngbe_main.c | 18 +++++++++++++-----
drivers/net/ethernet/wangxun/ngbe/ngbe_type.h | 1 +
.../net/ethernet/wangxun/txgbe/txgbe_main.c | 13 +++++++------
.../net/ethernet/wangxun/txgbe/txgbe_type.h | 1 +
5 files changed, 23 insertions(+), 11 deletions(-)
diff --git a/drivers/net/ethernet/wangxun/libwx/wx_type.h b/drivers/net/ethernet/wangxun/libwx/wx_type.h
index 671ac0a19dee..4b72835ddec1 100644
--- a/drivers/net/ethernet/wangxun/libwx/wx_type.h
+++ b/drivers/net/ethernet/wangxun/libwx/wx_type.h
@@ -1403,6 +1403,7 @@ struct wx {
void (*configure_fdir)(struct wx *wx);
int (*setup_tc)(struct net_device *netdev, u8 tc);
void (*do_reset)(struct net_device *netdev, bool reinit);
+ void (*close_suspend)(struct wx *wx);
int (*ptp_setup_sdp)(struct wx *wx);
void (*set_num_queues)(struct wx *wx);
diff --git a/drivers/net/ethernet/wangxun/ngbe/ngbe_main.c b/drivers/net/ethernet/wangxun/ngbe/ngbe_main.c
index ec14dd47cd42..bd6c0c9c51ba 100644
--- a/drivers/net/ethernet/wangxun/ngbe/ngbe_main.c
+++ b/drivers/net/ethernet/wangxun/ngbe/ngbe_main.c
@@ -135,6 +135,7 @@ static int ngbe_sw_init(struct wx *wx)
wx->mbx.size = WX_VXMAILBOX_SIZE;
wx->setup_tc = ngbe_setup_tc;
wx->do_reset = ngbe_do_reset;
+ wx->close_suspend = ngbe_close_suspend;
set_bit(0, &wx->fwd_bitmask);
return 0;
@@ -510,6 +511,16 @@ void ngbe_up(struct wx *wx)
ngbe_up_complete(wx);
}
+void ngbe_close_suspend(struct wx *wx)
+{
+ wx_ptp_suspend(wx);
+ ngbe_down(wx);
+ wx_free_irq(wx);
+ wx_free_isb_resources(wx);
+ wx_free_resources(wx);
+ phylink_disconnect_phy(wx->phylink);
+}
+
/**
* ngbe_close - Disables a network interface
* @netdev: network interface device structure
@@ -526,11 +537,8 @@ static int ngbe_close(struct net_device *netdev)
struct wx *wx = netdev_priv(netdev);
wx_ptp_stop(wx);
- ngbe_down(wx);
- wx_free_irq(wx);
- wx_free_isb_resources(wx);
- wx_free_resources(wx);
- phylink_disconnect_phy(wx->phylink);
+ if (netif_device_present(netdev))
+ ngbe_close_suspend(wx);
wx_control_hw(wx, false);
return 0;
diff --git a/drivers/net/ethernet/wangxun/ngbe/ngbe_type.h b/drivers/net/ethernet/wangxun/ngbe/ngbe_type.h
index c9233dc7ae50..eb5c92edae06 100644
--- a/drivers/net/ethernet/wangxun/ngbe/ngbe_type.h
+++ b/drivers/net/ethernet/wangxun/ngbe/ngbe_type.h
@@ -126,5 +126,6 @@ void ngbe_down(struct wx *wx);
void ngbe_up(struct wx *wx);
int ngbe_setup_tc(struct net_device *dev, u8 tc);
void ngbe_do_reset(struct net_device *netdev, bool reinit);
+void ngbe_close_suspend(struct wx *wx);
#endif /* _NGBE_TYPE_H_ */
diff --git a/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c b/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c
index 9887638203cb..3bfb3328b8f3 100644
--- a/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c
+++ b/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c
@@ -415,6 +415,7 @@ static int txgbe_sw_init(struct wx *wx)
wx->setup_tc = txgbe_setup_tc;
wx->do_reset = txgbe_do_reset;
+ wx->close_suspend = txgbe_close_suspend;
set_bit(0, &wx->fwd_bitmask);
switch (wx->mac.type) {
@@ -503,10 +504,12 @@ static int txgbe_open(struct net_device *netdev)
* This function should contain the necessary work common to both suspending
* and closing of the device.
*/
-static void txgbe_close_suspend(struct wx *wx)
+void txgbe_close_suspend(struct wx *wx)
{
wx_ptp_suspend(wx);
- txgbe_disable_device(wx);
+ txgbe_down(wx);
+ wx_free_irq(wx);
+ txgbe_free_misc_irq(wx->priv);
wx_free_resources(wx);
}
@@ -526,10 +529,8 @@ static int txgbe_close(struct net_device *netdev)
struct wx *wx = netdev_priv(netdev);
wx_ptp_stop(wx);
- txgbe_down(wx);
- wx_free_irq(wx);
- txgbe_free_misc_irq(wx->priv);
- wx_free_resources(wx);
+ if (netif_device_present(netdev))
+ txgbe_close_suspend(wx);
txgbe_fdir_filter_exit(wx);
wx_control_hw(wx, false);
diff --git a/drivers/net/ethernet/wangxun/txgbe/txgbe_type.h b/drivers/net/ethernet/wangxun/txgbe/txgbe_type.h
index 1e373f7fd9b5..cd50ff1ef2ed 100644
--- a/drivers/net/ethernet/wangxun/txgbe/txgbe_type.h
+++ b/drivers/net/ethernet/wangxun/txgbe/txgbe_type.h
@@ -314,6 +314,7 @@ void txgbe_down(struct wx *wx);
void txgbe_up(struct wx *wx);
int txgbe_setup_tc(struct net_device *dev, u8 tc);
void txgbe_do_reset(struct net_device *netdev, bool reinit);
+void txgbe_close_suspend(struct wx *wx);
#define TXGBE_LINK_SPEED_UNKNOWN 0
#define TXGBE_LINK_SPEED_10GB_FULL 4
--
2.51.0
* [PATCH net-next v1 5/5] net: wangxun: implement pci_error_handlers ops
2026-04-28 2:11 [PATCH net-next v1 0/5] net: wangxun: timeout and error Jiawen Wu
` (3 preceding siblings ...)
2026-04-28 2:11 ` [PATCH net-next v1 4/5] net: wangxun: extract the close_suspend sequence Jiawen Wu
@ 2026-04-28 2:11 ` Jiawen Wu
2026-04-30 8:34 ` Paolo Abeni
4 siblings, 1 reply; 10+ messages in thread
From: Jiawen Wu @ 2026-04-28 2:11 UTC (permalink / raw)
To: netdev
Cc: Mengyuan Lou, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Richard Cochran, Russell King,
Simon Horman, Kees Cook, Larysa Zaremba, Breno Leitao, Joe Damato,
Jacob Keller, Fabio Baltieri, Jiawen Wu
Implement pci_error_handlers so that the AER driver can handle PCIe errors.
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
---
drivers/net/ethernet/wangxun/libwx/wx_err.c | 107 ++++++++++++++++++
drivers/net/ethernet/wangxun/libwx/wx_err.h | 2 +
drivers/net/ethernet/wangxun/libwx/wx_type.h | 1 +
drivers/net/ethernet/wangxun/ngbe/ngbe_main.c | 9 +-
.../net/ethernet/wangxun/txgbe/txgbe_main.c | 5 +-
5 files changed, 121 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/wangxun/libwx/wx_err.c b/drivers/net/ethernet/wangxun/libwx/wx_err.c
index e7c9dcb148b5..1aefae402c8e 100644
--- a/drivers/net/ethernet/wangxun/libwx/wx_err.c
+++ b/drivers/net/ethernet/wangxun/libwx/wx_err.c
@@ -3,11 +3,118 @@
#include <linux/netdevice.h>
#include <linux/pci.h>
+#include <linux/aer.h>
#include "wx_type.h"
#include "wx_lib.h"
#include "wx_err.h"
+/**
+ * wx_io_error_detected - called when PCI error is detected
+ * @pdev: Pointer to PCI device
+ * @state: The current pci connection state
+ *
+ * Return: pci_ers_result_t.
+ *
+ * This function is called after a PCI bus error affecting
+ * this device has been detected.
+ */
+static pci_ers_result_t wx_io_error_detected(struct pci_dev *pdev,
+ pci_channel_state_t state)
+{
+ struct wx *wx = pci_get_drvdata(pdev);
+ struct net_device *netdev;
+
+ netdev = wx->netdev;
+ if (!netif_device_present(netdev))
+ return PCI_ERS_RESULT_DISCONNECT;
+
+ rtnl_lock();
+ netif_device_detach(netdev);
+
+ if (netif_running(netdev))
+ wx->close_suspend(wx);
+
+ if (state == pci_channel_io_perm_failure) {
+ rtnl_unlock();
+ return PCI_ERS_RESULT_DISCONNECT;
+ }
+
+ if (!test_and_set_bit(WX_STATE_DISABLED, wx->state))
+ pci_disable_device(pdev);
+ rtnl_unlock();
+
+ /* Request a slot reset. */
+ return PCI_ERS_RESULT_NEED_RESET;
+}
+
+/**
+ * wx_io_slot_reset - called after the pci bus has been reset.
+ * @pdev: Pointer to PCI device
+ *
+ * Return: pci_ers_result_t.
+ *
+ * Restart the card from scratch, as if from a cold-boot.
+ */
+static pci_ers_result_t wx_io_slot_reset(struct pci_dev *pdev)
+{
+ struct wx *wx = pci_get_drvdata(pdev);
+ pci_ers_result_t result;
+
+ if (pci_enable_device_mem(pdev)) {
+ wx_err(wx, "Cannot re-enable PCI device after reset.\n");
+ result = PCI_ERS_RESULT_DISCONNECT;
+ } else {
+ /* make all bar access done before reset. */
+ smp_mb__before_atomic();
+ clear_bit(WX_STATE_DISABLED, wx->state);
+ pci_set_master(pdev);
+ pci_restore_state(pdev);
+ pci_wake_from_d3(pdev, false);
+
+ wx->do_reset(wx->netdev, false);
+ result = PCI_ERS_RESULT_RECOVERED;
+ }
+
+ pci_aer_clear_nonfatal_status(pdev);
+
+ return result;
+}
+
+/**
+ * wx_io_resume - called when traffic can start flowing again.
+ * @pdev: Pointer to PCI device
+ *
+ * This callback is called when the error recovery driver tells us that
+ * it's OK to resume normal operation.
+ */
+static void wx_io_resume(struct pci_dev *pdev)
+{
+ struct wx *wx = pci_get_drvdata(pdev);
+ struct net_device *netdev;
+ int err;
+
+ netdev = wx->netdev;
+ rtnl_lock();
+ if (netif_running(netdev)) {
+ err = netdev->netdev_ops->ndo_open(netdev);
+ if (err) {
+ wx_err(wx, "Failed to open netdev after reset\n");
+ goto out;
+ }
+ }
+ netif_device_attach(netdev);
+out:
+ rtnl_unlock();
+}
+
+const struct pci_error_handlers wx_err_handler = {
+ .error_detected = wx_io_error_detected,
+ .slot_reset = wx_io_slot_reset,
+ .resume = wx_io_resume,
+};
+EXPORT_SYMBOL(wx_err_handler);
+
static void wx_reset_subtask(struct wx *wx)
{
if (!test_bit(WX_FLAG_NEED_PF_RESET, wx->flags))
diff --git a/drivers/net/ethernet/wangxun/libwx/wx_err.h b/drivers/net/ethernet/wangxun/libwx/wx_err.h
index e317e6c8d928..8b1a7863b5b1 100644
--- a/drivers/net/ethernet/wangxun/libwx/wx_err.h
+++ b/drivers/net/ethernet/wangxun/libwx/wx_err.h
@@ -7,6 +7,8 @@
#ifndef _WX_ERR_H_
#define _WX_ERR_H_
+extern const struct pci_error_handlers wx_err_handler;
+
void wx_handle_errors_subtask(struct wx *wx);
void wx_tx_timeout(struct net_device *netdev, unsigned int txqueue);
void wx_handle_tx_hang(struct wx_ring *tx_ring, unsigned int next);
diff --git a/drivers/net/ethernet/wangxun/libwx/wx_type.h b/drivers/net/ethernet/wangxun/libwx/wx_type.h
index 4b72835ddec1..81e12609d3fa 100644
--- a/drivers/net/ethernet/wangxun/libwx/wx_type.h
+++ b/drivers/net/ethernet/wangxun/libwx/wx_type.h
@@ -1215,6 +1215,7 @@ enum wx_state {
WX_STATE_PTP_RUNNING,
WX_STATE_PTP_TX_IN_PROGRESS,
WX_STATE_SERVICE_SCHED,
+ WX_STATE_DISABLED,
WX_STATE_NBITS /* must be last */
};
diff --git a/drivers/net/ethernet/wangxun/ngbe/ngbe_main.c b/drivers/net/ethernet/wangxun/ngbe/ngbe_main.c
index bd6c0c9c51ba..a174605d1105 100644
--- a/drivers/net/ethernet/wangxun/ngbe/ngbe_main.c
+++ b/drivers/net/ethernet/wangxun/ngbe/ngbe_main.c
@@ -570,7 +570,8 @@ static void ngbe_dev_shutdown(struct pci_dev *pdev, bool *enable_wake)
*enable_wake = !!wufc;
wx_control_hw(wx, false);
- pci_disable_device(pdev);
+ if (!test_and_set_bit(WX_STATE_DISABLED, wx->state))
+ pci_disable_device(pdev);
}
static void ngbe_shutdown(struct pci_dev *pdev)
@@ -856,6 +857,7 @@ static int ngbe_probe(struct pci_dev *pdev,
goto err_register;
pci_set_drvdata(pdev, wx);
+ pci_save_state(pdev);
return 0;
@@ -907,7 +909,8 @@ static void ngbe_remove(struct pci_dev *pdev)
kfree(wx->mac_table);
wx_clear_interrupt_scheme(wx);
- pci_disable_device(pdev);
+ if (!test_and_set_bit(WX_STATE_DISABLED, wx->state))
+ pci_disable_device(pdev);
}
static int ngbe_suspend(struct pci_dev *pdev, pm_message_t state)
@@ -934,6 +937,7 @@ static int ngbe_resume(struct pci_dev *pdev)
wx_err(wx, "Cannot enable PCI device from suspend\n");
return err;
}
+ clear_bit(WX_STATE_DISABLED, wx->state);
pci_set_master(pdev);
device_wakeup_disable(&pdev->dev);
@@ -958,6 +962,7 @@ static struct pci_driver ngbe_driver = {
.resume = ngbe_resume,
.shutdown = ngbe_shutdown,
.sriov_configure = wx_pci_sriov_configure,
+ .err_handler = &wx_err_handler,
};
module_pci_driver(ngbe_driver);
diff --git a/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c b/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c
index 3bfb3328b8f3..f992a345af46 100644
--- a/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c
+++ b/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c
@@ -900,6 +900,7 @@ static int txgbe_probe(struct pci_dev *pdev,
goto err_remove_phy;
pci_set_drvdata(pdev, wx);
+ pci_save_state(pdev);
netif_tx_stop_all_queues(netdev);
@@ -970,7 +971,8 @@ static void txgbe_remove(struct pci_dev *pdev)
kfree(wx->mac_table);
wx_clear_interrupt_scheme(wx);
- pci_disable_device(pdev);
+ if (!test_and_set_bit(WX_STATE_DISABLED, wx->state))
+ pci_disable_device(pdev);
}
static struct pci_driver txgbe_driver = {
@@ -980,6 +982,7 @@ static struct pci_driver txgbe_driver = {
.remove = txgbe_remove,
.shutdown = txgbe_shutdown,
.sriov_configure = wx_pci_sriov_configure,
+ .err_handler = &wx_err_handler,
};
module_pci_driver(txgbe_driver);
--
2.51.0
* Re: [PATCH net-next v1 2/5] net: wangxun: add Tx timeout process
2026-04-28 2:11 ` [PATCH net-next v1 2/5] net: wangxun: add Tx timeout process Jiawen Wu
@ 2026-04-30 8:24 ` Paolo Abeni
2026-04-30 8:33 ` Jiawen Wu
0 siblings, 1 reply; 10+ messages in thread
From: Paolo Abeni @ 2026-04-30 8:24 UTC (permalink / raw)
To: Jiawen Wu, netdev
Cc: Mengyuan Lou, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Richard Cochran, Russell King, Simon Horman,
Kees Cook, Larysa Zaremba, Breno Leitao, Joe Damato, Jacob Keller,
Fabio Baltieri
On 4/28/26 4:11 AM, Jiawen Wu wrote:
> +static void wx_reset_subtask(struct wx *wx)
> +{
> + if (!test_bit(WX_FLAG_NEED_PF_RESET, wx->flags))
> + return;
> +
> + rtnl_lock();
> +
> + if (!netif_running(wx->netdev) ||
> + test_bit(WX_STATE_RESETTING, wx->state))
> + return;
Sashiko says:
Does this early return path leak the rtnl_lock?
If the interface is brought down concurrently while a reset is scheduled,
it appears this would return without calling rtnl_unlock(). Since all
network configuration operations require the RTNL lock, could this lead to
a system-wide deadlock in the networking subsystem?
> +
> + wx_warn(wx, "Reset adapter.\n");
> +
> + if (test_and_clear_bit(WX_FLAG_NEED_PF_RESET, wx->flags)) {
> + if (wx->do_reset)
> + wx->do_reset(wx->netdev);
> + }
> +
> + rtnl_unlock();
> +}
> +
> +/*
> + * wx_check_tx_hang_subtask - check for hung queues and dropped interrupts
> + * @wx - pointer to the device wx structure
> + *
> + * This function serves two purposes. First it strobes the interrupt lines
> + * in order to make certain interrupts are occurring. Secondly it sets the
> + * bits needed to check for TX hangs. As a result we should immediately
> + * determine if a hang has occurred.
> + */
> +static void wx_check_tx_hang_subtask(struct wx *wx)
> +{
> + int i;
> +
> + /* If we're down or resetting, just bail */
> + if (!netif_running(wx->netdev) ||
> + test_bit(WX_STATE_RESETTING, wx->state))
> + return;
> +
> + /* Force detection of hung controller */
> + if (netif_carrier_ok(wx->netdev)) {
> + for (i = 0; i < wx->num_tx_queues; i++)
> + set_bit(WX_TX_DETECT_HANG, wx->tx_ring[i]->state);
> + }
> +}
> +
> +void wx_handle_errors_subtask(struct wx *wx)
> +{
> + wx_reset_subtask(wx);
> + wx_check_tx_hang_subtask(wx);
> +}
> +EXPORT_SYMBOL(wx_handle_errors_subtask);
> +
> +static void wx_tx_timeout_reset(struct wx *wx)
> +{
> + if (!netif_running(wx->netdev))
> + return;
> +
> + set_bit(WX_FLAG_NEED_PF_RESET, wx->flags);
> + wx_warn(wx, "initiating reset due to tx timeout\n");
> + wx_service_event_schedule(wx);
> +}
> +
> +void wx_tx_timeout(struct net_device *netdev, unsigned int txqueue)
> +{
> + struct wx *wx = netdev_priv(netdev);
> + u32 head, tail;
> + int i;
> +
> + for (i = 0; i < wx->num_tx_queues; i++) {
> + struct wx_ring *tx_ring = wx->tx_ring[i];
> +
> + if (test_bit(WX_TX_DETECT_HANG, tx_ring->state) &&
> + wx_check_tx_hang(tx_ring))
> + wx_warn(wx, "Real tx hang detected on queue %d\n", i);
> +
> + head = rd32(wx, WX_PX_TR_RP(tx_ring->reg_idx));
> + tail = rd32(wx, WX_PX_TR_WP(tx_ring->reg_idx));
> + wx_warn(wx,
> + "tx ring %d next_to_use is %d, next_to_clean is %d\n",
> + i, tx_ring->next_to_use,
> + tx_ring->next_to_clean);
> + wx_warn(wx, "tx ring %d hw rp is 0x%x, wp is 0x%x\n",
> + i, head, tail);
> + }
> +
> + wx_tx_timeout_reset(wx);
> +}
> +EXPORT_SYMBOL(wx_tx_timeout);
> +
> +void wx_handle_tx_hang(struct wx_ring *tx_ring, unsigned int next)
> +{
> + struct wx *wx = netdev_priv(tx_ring->netdev);
> +
> + wx_warn(wx, "Detected Tx Unit Hang\n"
> + " Tx Queue <%d>\n"
> + " TDH, TDT <%x>, <%x>\n"
> + " next_to_use <%x>\n"
> + " next_to_clean <%x>\n"
> + "tx_buffer_info[next_to_clean]\n"
> + " time_stamp <%lx>\n"
> + " jiffies <%lx>\n",
It's better to use a single string for the whole message, even if it
would exceed the 80-character limit.
> + tx_ring->queue_index,
> + rd32(wx, WX_PX_TR_RP(tx_ring->reg_idx)),
> + rd32(wx, WX_PX_TR_WP(tx_ring->reg_idx)),
> + tx_ring->next_to_use, next,
> + tx_ring->tx_buffer_info[next].time_stamp, jiffies);
> +
> + netif_stop_subqueue(tx_ring->netdev, tx_ring->queue_index);
> +
> + wx_warn(wx, "tx hang detected on queue %d, resetting adapter\n",
> + tx_ring->queue_index);
Two warning messages for the same cause are possibly a bit too verbose
(same in wx_tx_timeout()).
> +bool wx_check_tx_hang(struct wx_ring *ring)
> +{
> + u32 tx_done_old = ring->tx_stats.tx_done_old;
> + u32 tx_pending = wx_get_tx_pending(ring);
> + u32 tx_done = ring->stats.packets;
> +
> + clear_bit(WX_TX_DETECT_HANG, ring->state);
It looks like every caller checks WX_TX_DETECT_HANG; it would probably
be better to use test_and_clear_bit() here and drop the test from the
caller.
/P
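A minimal userspace sketch of the two lock and bit-handling points raised in the review above, assuming a depth counter in place of the RTNL lock and C11 atomics in place of the kernel bitops. All names (`fake_wx`, `fake_ring`, `reset_subtask`, `check_tx_hang`, `NEED_PF_RESET`, `TX_DETECT_HANG`) are hypothetical stand-ins, not driver code. Every exit taken after the lock is acquired goes through a single `unlock` label, and the hang checker consumes the detect bit with a test-and-clear so callers need no separate `test_bit()`. The hang condition itself is simplified to "detection armed and descriptors still pending".

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins: a depth counter plays the RTNL lock so that a
 * leaked lock is observable, and C11 atomics replace the kernel bitops. */
static int rtnl_depth;
static void fake_rtnl_lock(void)   { rtnl_depth++; }
static void fake_rtnl_unlock(void) { rtnl_depth--; }

#define NEED_PF_RESET  (1UL << 0) /* models WX_FLAG_NEED_PF_RESET */
#define TX_DETECT_HANG (1UL << 0) /* models WX_TX_DETECT_HANG */

struct fake_wx {
	atomic_ulong flags;
	bool running;   /* netif_running() */
	bool resetting; /* WX_STATE_RESETTING */
	int resets;     /* times the do_reset() callback would have run */
};

struct fake_ring {
	atomic_ulong state;
	unsigned int pending; /* descriptors not yet cleaned */
};

/* Atomically clear @mask in *word and report whether it was set. */
static bool fake_test_and_clear(atomic_ulong *word, unsigned long mask)
{
	return (atomic_fetch_and(word, ~mask) & mask) != 0;
}

/* Every path taken after the lock is acquired leaves through "unlock". */
static void reset_subtask(struct fake_wx *wx)
{
	if (!(atomic_load(&wx->flags) & NEED_PF_RESET))
		return;

	fake_rtnl_lock();
	if (!wx->running || wx->resetting)
		goto unlock;

	if (fake_test_and_clear(&wx->flags, NEED_PF_RESET))
		wx->resets++; /* wx->do_reset(netdev, true) in the driver */
unlock:
	fake_rtnl_unlock();
}

/* The checker consumes the detect bit itself, so callers can drop their own
 * test_bit(). Simplified hang condition: detection armed and work pending. */
static bool check_tx_hang(struct fake_ring *ring)
{
	if (!fake_test_and_clear(&ring->state, TX_DETECT_HANG))
		return false;
	return ring->pending != 0;
}
```

With this shape, bailing out on a stopped or resetting interface leaves the lock depth at zero and keeps the reset request pending for a later service-task tick.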
* Re: [PATCH net-next v1 4/5] net: wangxun: extract the close_suspend sequence
2026-04-28 2:11 ` [PATCH net-next v1 4/5] net: wangxun: extract the close_suspend sequence Jiawen Wu
@ 2026-04-30 8:29 ` Paolo Abeni
0 siblings, 0 replies; 10+ messages in thread
From: Paolo Abeni @ 2026-04-30 8:29 UTC (permalink / raw)
To: Jiawen Wu, netdev
Cc: Mengyuan Lou, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Richard Cochran, Russell King, Simon Horman,
Kees Cook, Larysa Zaremba, Breno Leitao, Joe Damato, Jacob Keller,
Fabio Baltieri
On 4/28/26 4:11 AM, Jiawen Wu wrote:
> @@ -510,6 +511,16 @@ void ngbe_up(struct wx *wx)
> ngbe_up_complete(wx);
> }
>
> +void ngbe_close_suspend(struct wx *wx)
> +{
> + wx_ptp_suspend(wx);
Sashiko says:
Does this introduce a redundant call to wx_ptp_suspend()?
Both ngbe_close() and txgbe_close() call wx_ptp_stop() immediately before
conditionally calling ngbe_close_suspend() or txgbe_close_suspend().
Looking at wx_ptp_stop(), it already executes wx_ptp_suspend() as its first
operation.
While test_and_clear_bit(WX_STATE_PTP_RUNNING, wx->state) makes the second
call safe, the execution still seems redundant.
> + ngbe_down(wx);
> + wx_free_irq(wx);
> + wx_free_isb_resources(wx);
> + wx_free_resources(wx);
> + phylink_disconnect_phy(wx->phylink);
> +}
> +
> /**
> * ngbe_close - Disables a network interface
> * @netdev: network interface device structure
> @@ -526,11 +537,8 @@ static int ngbe_close(struct net_device *netdev)
> struct wx *wx = netdev_priv(netdev);
>
> wx_ptp_stop(wx);
> - ngbe_down(wx);
> - wx_free_irq(wx);
> - wx_free_isb_resources(wx);
> - wx_free_resources(wx);
> - phylink_disconnect_phy(wx->phylink);
> + if (netif_device_present(netdev))
> + ngbe_close_suspend(wx);
Sashiko says:
On the ngbe side, can this skip the teardown sequence entirely on the
suspend/shutdown paths?
ngbe_dev_shutdown() (shared by .suspend and .shutdown) does:
static void ngbe_dev_shutdown(struct pci_dev *pdev, bool *enable_wake)
{
...
netif_device_detach(netdev);
if (netif_running(netdev))
ngbe_close(netdev);
wx_clear_interrupt_scheme(wx);
...
}
netif_device_detach() clears __LINK_STATE_PRESENT, so by the time
ngbe_close() runs, netif_device_present(netdev) returns false and the new guard
if (netif_device_present(netdev))
ngbe_close_suspend(wx);
skips ngbe_close_suspend() on the suspend and shutdown paths. Before this
patch ngbe_close() unconditionally ran ngbe_down(), wx_free_irq(),
wx_free_isb_resources(), wx_free_resources() and
phylink_disconnect_phy() on those paths.
Does that mean after this patch, suspend and shutdown:
- leave the requested MSI-X / legacy IRQ handlers registered while
wx_clear_interrupt_scheme() subsequently frees the vector pool,
- leak the coherent ISB DMA buffer (isb_mem / isb_dma) freed in
wx_free_isb_resources(),
- leak the TX/RX ring DMA allocations freed in wx_free_resources(),
- leave phylink connected (no phylink_disconnect_phy()), so resume
re-enters phylink_connect_phy on an already connected state, and
- skip ngbe_down() / ngbe_reset(), so the MAC is not quiesced before
pci_disable_device()?
The txgbe shutdown path avoids this because txgbe_dev_shutdown() calls
txgbe_close_suspend() directly rather than going through txgbe_close():
static void txgbe_dev_shutdown(struct pci_dev *pdev)
{
...
netif_device_detach(netdev);
rtnl_lock();
if (netif_running(netdev))
txgbe_close_suspend(wx);
rtnl_unlock();
...
}
Would an equivalent change in ngbe_dev_shutdown() (call
ngbe_close_suspend() directly instead of ngbe_close()) be the intended
pairing for the new guard in ngbe_close()?
/P
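The ordering problem described in the review above can be reproduced with a small userspace model. This is a sketch under stated assumptions: `present` stands in for `__LINK_STATE_PRESENT`, `running` for `__LINK_STATE_START`, and `torn_down` records whether the close_suspend sequence ran. `fake_close()`, `shutdown_via_close()` and `shutdown_direct()` are hypothetical helpers, and RTNL locking is omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the netdev state bits involved. */
struct fake_netdev {
	bool present;   /* __LINK_STATE_PRESENT */
	bool running;   /* __LINK_STATE_START */
	bool torn_down; /* did the close_suspend sequence run? */
};

/* Stands in for down + free IRQs + free rings + PHY disconnect. */
static void close_suspend(struct fake_netdev *nd)
{
	nd->torn_down = true;
}

/* ndo_stop with the netif_device_present() guard from patch 4/5. */
static void fake_close(struct fake_netdev *nd)
{
	if (nd->present)
		close_suspend(nd);
}

/* ngbe_dev_shutdown() as posted: detach first, then go through ndo_stop. */
static void shutdown_via_close(struct fake_netdev *nd)
{
	nd->present = false; /* netif_device_detach() */
	if (nd->running)
		fake_close(nd);
}

/* The pairing suggested in the review (as txgbe_dev_shutdown() does):
 * call the teardown helper directly after detaching. */
static void shutdown_direct(struct fake_netdev *nd)
{
	nd->present = false; /* netif_device_detach() */
	if (nd->running)
		close_suspend(nd);
}
```

Detaching and then going through the guarded close skips teardown entirely, whereas calling the teardown helper directly after detaching still runs it; a plain close on an attached device is unaffected by the guard.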
* RE: [PATCH net-next v1 2/5] net: wangxun: add Tx timeout process
2026-04-30 8:24 ` Paolo Abeni
@ 2026-04-30 8:33 ` Jiawen Wu
0 siblings, 0 replies; 10+ messages in thread
From: Jiawen Wu @ 2026-04-30 8:33 UTC (permalink / raw)
To: 'Paolo Abeni', netdev
Cc: 'Mengyuan Lou', 'Andrew Lunn',
'David S. Miller', 'Eric Dumazet',
'Jakub Kicinski', 'Richard Cochran',
'Russell King', 'Simon Horman',
'Kees Cook', 'Larysa Zaremba',
'Breno Leitao', 'Joe Damato',
'Jacob Keller', 'Fabio Baltieri'
On Thu, April 30, 2026 4:24 PM, Paolo Abeni wrote:
> On 4/28/26 4:11 AM, Jiawen Wu wrote:
> > +static void wx_reset_subtask(struct wx *wx)
> > +{
> > + if (!test_bit(WX_FLAG_NEED_PF_RESET, wx->flags))
> > + return;
> > +
> > + rtnl_lock();
> > +
> > + if (!netif_running(wx->netdev) ||
> > + test_bit(WX_STATE_RESETTING, wx->state))
> > + return;
>
> Sashiko says:
>
> Does this early return path leak the rtnl_lock?
> If the interface is brought down concurrently while a reset is scheduled,
> it appears this would return without calling rtnl_unlock(). Since all
> network configuration operations require the RTNL lock, could this lead to
> a system-wide deadlock in the networking subsystem?
Thanks for your review.
Unfortunately, I had just sent the V2 patch set addressing Sashiko's comments...
I'll prepare V3 according to your follow-up comments.
>
> > +
> > + wx_warn(wx, "Reset adapter.\n");
> > +
> > + if (test_and_clear_bit(WX_FLAG_NEED_PF_RESET, wx->flags)) {
> > + if (wx->do_reset)
> > + wx->do_reset(wx->netdev);
> > + }
> > +
> > + rtnl_unlock();
> > +}
> > +
> > +/*
> > + * wx_check_tx_hang_subtask - check for hung queues and dropped interrupts
> > + * @wx - pointer to the device wx structure
> > + *
> > + * This function serves two purposes. First it strobes the interrupt lines
> > + * in order to make certain interrupts are occurring. Secondly it sets the
> > + * bits needed to check for TX hangs. As a result we should immediately
> > + * determine if a hang has occurred.
> > + */
> > +static void wx_check_tx_hang_subtask(struct wx *wx)
> > +{
> > + int i;
> > +
> > + /* If we're down or resetting, just bail */
> > + if (!netif_running(wx->netdev) ||
> > + test_bit(WX_STATE_RESETTING, wx->state))
> > + return;
> > +
> > + /* Force detection of hung controller */
> > + if (netif_carrier_ok(wx->netdev)) {
> > + for (i = 0; i < wx->num_tx_queues; i++)
> > + set_bit(WX_TX_DETECT_HANG, wx->tx_ring[i]->state);
> > + }
> > +}
> > +
> > +void wx_handle_errors_subtask(struct wx *wx)
> > +{
> > + wx_reset_subtask(wx);
> > + wx_check_tx_hang_subtask(wx);
> > +}
> > +EXPORT_SYMBOL(wx_handle_errors_subtask);
> > +
> > +static void wx_tx_timeout_reset(struct wx *wx)
> > +{
> > + if (!netif_running(wx->netdev))
> > + return;
> > +
> > + set_bit(WX_FLAG_NEED_PF_RESET, wx->flags);
> > + wx_warn(wx, "initiating reset due to tx timeout\n");
> > + wx_service_event_schedule(wx);
> > +}
> > +
> > +void wx_tx_timeout(struct net_device *netdev, unsigned int txqueue)
> > +{
> > + struct wx *wx = netdev_priv(netdev);
> > + u32 head, tail;
> > + int i;
> > +
> > + for (i = 0; i < wx->num_tx_queues; i++) {
> > + struct wx_ring *tx_ring = wx->tx_ring[i];
> > +
> > + if (test_bit(WX_TX_DETECT_HANG, tx_ring->state) &&
> > + wx_check_tx_hang(tx_ring))
> > + wx_warn(wx, "Real tx hang detected on queue %d\n", i);
> > +
> > + head = rd32(wx, WX_PX_TR_RP(tx_ring->reg_idx));
> > + tail = rd32(wx, WX_PX_TR_WP(tx_ring->reg_idx));
> > + wx_warn(wx,
> > + "tx ring %d next_to_use is %d, next_to_clean is %d\n",
> > + i, tx_ring->next_to_use,
> > + tx_ring->next_to_clean);
> > + wx_warn(wx, "tx ring %d hw rp is 0x%x, wp is 0x%x\n",
> > + i, head, tail);
> > + }
> > +
> > + wx_tx_timeout_reset(wx);
> > +}
> > +EXPORT_SYMBOL(wx_tx_timeout);
> > +
> > +void wx_handle_tx_hang(struct wx_ring *tx_ring, unsigned int next)
> > +{
> > + struct wx *wx = netdev_priv(tx_ring->netdev);
> > +
> > + wx_warn(wx, "Detected Tx Unit Hang\n"
> > + " Tx Queue <%d>\n"
> > + " TDH, TDT <%x>, <%x>\n"
> > + " next_to_use <%x>\n"
> > + " next_to_clean <%x>\n"
> > + "tx_buffer_info[next_to_clean]\n"
> > + " time_stamp <%lx>\n"
> > + " jiffies <%lx>\n",
>
> It's better to use a single string for the whole message, even if it
> would exceed the 80 chars limit
>
> > + tx_ring->queue_index,
> > + rd32(wx, WX_PX_TR_RP(tx_ring->reg_idx)),
> > + rd32(wx, WX_PX_TR_WP(tx_ring->reg_idx)),
> > + tx_ring->next_to_use, next,
> > + tx_ring->tx_buffer_info[next].time_stamp, jiffies);
> > +
> > + netif_stop_subqueue(tx_ring->netdev, tx_ring->queue_index);
> > +
> > + wx_warn(wx, "tx hang detected on queue %d, resetting adapter\n",
> > + tx_ring->queue_index);
>
> Possibly two warn messages for the same cause is a bit too verbose (same
> in wx_tx_timeout()).
>
> > +bool wx_check_tx_hang(struct wx_ring *ring)
> > +{
> > + u32 tx_done_old = ring->tx_stats.tx_done_old;
> > + u32 tx_pending = wx_get_tx_pending(ring);
> > + u32 tx_done = ring->stats.packets;
> > +
> > + clear_bit(WX_TX_DETECT_HANG, ring->state);
>
> It looks like every caller checks WX_TX_DETECT_HANG, it would be
> probably better to use test_and_clear_bit() here, and drop the test from
> the caller.
* Re: [PATCH net-next v1 5/5] net: wangxun: implement pci_error_handlers ops
2026-04-28 2:11 ` [PATCH net-next v1 5/5] net: wangxun: implement pci_error_handlers ops Jiawen Wu
@ 2026-04-30 8:34 ` Paolo Abeni
0 siblings, 0 replies; 10+ messages in thread
From: Paolo Abeni @ 2026-04-30 8:34 UTC (permalink / raw)
To: Jiawen Wu, netdev
Cc: Mengyuan Lou, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Richard Cochran, Russell King, Simon Horman,
Kees Cook, Larysa Zaremba, Breno Leitao, Joe Damato, Jacob Keller,
Fabio Baltieri
On 4/28/26 4:11 AM, Jiawen Wu wrote:
> diff --git a/drivers/net/ethernet/wangxun/libwx/wx_err.c b/drivers/net/ethernet/wangxun/libwx/wx_err.c
> index e7c9dcb148b5..1aefae402c8e 100644
> --- a/drivers/net/ethernet/wangxun/libwx/wx_err.c
> +++ b/drivers/net/ethernet/wangxun/libwx/wx_err.c
> @@ -3,11 +3,118 @@
>
> #include <linux/netdevice.h>
> #include <linux/pci.h>
> +#include <linux/aer.h>
>
> #include "wx_type.h"
> #include "wx_lib.h"
> #include "wx_err.h"
>
> +/**
> + * wx_io_error_detected - called when PCI error is detected
> + * @pdev: Pointer to PCI device
> + * @state: The current pci connection state
> + *
> + * Return: pci_ers_result_t.
> + *
> + * This function is called after a PCI bus error affecting
> + * this device has been detected.
> + */
> +static pci_ers_result_t wx_io_error_detected(struct pci_dev *pdev,
> + pci_channel_state_t state)
> +{
> + struct wx *wx = pci_get_drvdata(pdev);
> + struct net_device *netdev;
> +
> + netdev = wx->netdev;
> + if (!netif_device_present(netdev))
> + return PCI_ERS_RESULT_DISCONNECT;
> +
> + rtnl_lock();
> + netif_device_detach(netdev);
> +
> + if (netif_running(netdev))
> + wx->close_suspend(wx);
> +
> + if (state == pci_channel_io_perm_failure) {
> + rtnl_unlock();
> + return PCI_ERS_RESULT_DISCONNECT;
Sashiko says:
On the pci_channel_io_perm_failure path here, WX_STATE_DISABLED is not
set and pci_disable_device() is not called. When the PCI core then
follows up with .remove(), ngbe_remove()/txgbe_remove() do:
if (!test_and_set_bit(WX_STATE_DISABLED, wx->state))
pci_disable_device(pdev);
Since the bit is still clear, pci_disable_device() is invoked on a
device that has already been torn down by the PCI core on the
perm_failure path. Should the perm_failure branch also set
WX_STATE_DISABLED (and arguably call pci_disable_device()) for symmetry
with the NEED_RESET branch below and with how drivers like ixgbe handle
this case?
> + }
> +
> + if (!test_and_set_bit(WX_STATE_DISABLED, wx->state))
> + pci_disable_device(pdev);
> + rtnl_unlock();
> +
> + /* Request a slot reset. */
> + return PCI_ERS_RESULT_NEED_RESET;
> +}
> +
> +/**
> + * wx_io_slot_reset - called after the pci bus has been reset.
> + * @pdev: Pointer to PCI device
> + *
> + * Return: pci_ers_result_t.
> + *
> + * Restart the card from scratch, as if from a cold-boot.
> + */
> +static pci_ers_result_t wx_io_slot_reset(struct pci_dev *pdev)
> +{
> + struct wx *wx = pci_get_drvdata(pdev);
> + pci_ers_result_t result;
> +
> + if (pci_enable_device_mem(pdev)) {
> + wx_err(wx, "Cannot re-enable PCI device after reset.\n");
> + result = PCI_ERS_RESULT_DISCONNECT;
> + } else {
> + /* make all bar access done before reset. */
> + smp_mb__before_atomic();
> + clear_bit(WX_STATE_DISABLED, wx->state);
> + pci_set_master(pdev);
> + pci_restore_state(pdev);
> + pci_wake_from_d3(pdev, false);
> +
> + wx->do_reset(wx->netdev, false);
> + result = PCI_ERS_RESULT_RECOVERED;
> + }
> +
> + pci_aer_clear_nonfatal_status(pdev);
Sashiko says:
Should pci_aer_clear_nonfatal_status() be called on the
PCI_ERS_RESULT_DISCONNECT path where pci_enable_device_mem() failed?
It runs unconditionally here, and when CONFIG_PCIEAER=n the stub in
include/linux/aer.h returns -EINVAL, which is also ignored. Would it
be cleaner to only call this on the recovered path?
> +
> + return result;
> +}
> +
> +/**
> + * wx_io_resume - called when traffic can start flowing again.
> + * @pdev: Pointer to PCI device
> + *
> + * This callback is called when the error recovery driver tells us that
> + * it's OK to resume normal operation.
> + */
> +static void wx_io_resume(struct pci_dev *pdev)
> +{
> + struct wx *wx = pci_get_drvdata(pdev);
> + struct net_device *netdev;
> + int err;
> +
> + netdev = wx->netdev;
> + rtnl_lock();
> + if (netif_running(netdev)) {
> + err = netdev->netdev_ops->ndo_open(netdev);
> + if (err) {
> + wx_err(wx, "Failed to open netdev after reset\n");
> + goto out;
> + }
> + }
> + netif_device_attach(netdev);
> +out:
> + rtnl_unlock();
Sashiko says:
What state is the netdev left in when ndo_open() fails here?
wx_io_error_detected() called netif_device_detach(), which clears
__LINK_STATE_PRESENT but leaves __LINK_STATE_START set, so
netif_running() still returns true. On ndo_open() failure, the 'out'
label skips netif_device_attach() and returns void (so the AER core
gets no error signal).
If ngbe_open()/txgbe_open() partially succeeded and then failed on an
internal error path that freed rings/IRQs/phylink resources, the netdev
is then left with netif_running()==true, __LINK_STATE_PRESENT clear,
and the hardware resources released. The service_timer and
service_task are still running, and wx_reset_subtask() does:
if (!netif_running(wx->netdev) ||
test_bit(WX_STATE_RESETTING, wx->state))
return;
...
if (test_and_clear_bit(WX_FLAG_NEED_PF_RESET, wx->flags)) {
if (wx->do_reset)
wx->do_reset(wx->netdev, true);
}
Can a subsequent wx_reset_subtask() tick then dereference the already
freed ring/IRQ state through wx->do_reset()?
/P
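A userspace sketch of the two adjustments suggested in the review above for wx_io_error_detected() and wx_io_slot_reset(), assuming an `atomic_bool` in place of the WX_STATE_DISABLED bit and counters in place of pci_disable_device() and pci_aer_clear_nonfatal_status(). All names here are hypothetical, and the model only illustrates the reviewer's suggestion (disable at most once on both the perm_failure and the reset branch; clear AER status only after a successful re-enable). It is not a confirmed fix for the driver.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum ers_result { ERS_NEED_RESET, ERS_DISCONNECT, ERS_RECOVERED };

/* Hypothetical model of the state touched by the error handlers. */
struct fake_dev {
	atomic_bool disabled; /* WX_STATE_DISABLED */
	int disable_calls;    /* pci_disable_device() invocations */
	int aer_clears;       /* pci_aer_clear_nonfatal_status() invocations */
	bool enable_fails;    /* pci_enable_device_mem() fails in slot_reset */
};

/* Disable at most once, whichever path gets there first. */
static void disable_once(struct fake_dev *d)
{
	if (!atomic_exchange(&d->disabled, true))
		d->disable_calls++;
}

static enum ers_result error_detected(struct fake_dev *d, bool perm_failure)
{
	/* Marking the device disabled on both branches keeps a later
	 * remove() from disabling it a second time. */
	disable_once(d);
	return perm_failure ? ERS_DISCONNECT : ERS_NEED_RESET;
}

static enum ers_result slot_reset(struct fake_dev *d)
{
	if (d->enable_fails)
		return ERS_DISCONNECT; /* leave AER status untouched */
	atomic_store(&d->disabled, false);
	d->aer_clears++;
	return ERS_RECOVERED;
}

/* remove() uses the same guard as the patch. */
static void fake_remove(struct fake_dev *d)
{
	disable_once(d);
}
```

After a perm_failure followed by remove(), the disable counter stays at one; after a successful slot reset the bit is cleared again, so a later remove() disables the re-enabled device exactly once more.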
end of thread, other threads:[~2026-04-30 8:34 UTC | newest]
Thread overview: 10+ messages
2026-04-28 2:11 [PATCH net-next v1 0/5] net: wangxun: timeout and error Jiawen Wu
2026-04-28 2:11 ` [PATCH net-next v1 1/5] net: ngbe: implement libwx reset ops Jiawen Wu
2026-04-28 2:11 ` [PATCH net-next v1 2/5] net: wangxun: add Tx timeout process Jiawen Wu
2026-04-30 8:24 ` Paolo Abeni
2026-04-30 8:33 ` Jiawen Wu
2026-04-28 2:11 ` [PATCH net-next v1 3/5] net: wangxun: add reinit parameter to wx->do_reset callback Jiawen Wu
2026-04-28 2:11 ` [PATCH net-next v1 4/5] net: wangxun: extract the close_suspend sequence Jiawen Wu
2026-04-30 8:29 ` Paolo Abeni
2026-04-28 2:11 ` [PATCH net-next v1 5/5] net: wangxun: implement pci_error_handlers ops Jiawen Wu
2026-04-30 8:34 ` Paolo Abeni