Re: [PATCH net-next v3 5/5] net: wangxun: implement pci_error_handlers ops

Netdev List
 help / color / mirror / Atom feed

From: Paolo Abeni <pabeni@redhat.com>
To: jiawenwu@trustnetic.com
Cc: netdev@vger.kernel.org, mengyuanlou@net-swift.com,
	andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com,
	kuba@kernel.org, pabeni@redhat.com, richardcochran@gmail.com,
	linux@armlinux.org.uk, horms@kernel.org, kees@kernel.org,
	larysa.zaremba@intel.com, leitao@debian.org, joe@dama.to,
	jacob.e.keller@intel.com, fabio.baltieri@gmail.com
Subject: Re: [PATCH net-next v3 5/5] net: wangxun: implement pci_error_handlers ops
Date: Thu, 14 May 2026 11:24:00 +0200	[thread overview]
Message-ID: <20260514092415.90852-1-pabeni@redhat.com> (raw)
In-Reply-To: <20260509100540.32612-6-jiawenwu@trustnetic.com>

From: AI Reviewer <ai@example.com>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://netdev-ai.bots.linux.dev/sashiko/
---
net: wangxun: implement pci_error_handlers ops

This patch adds AER error_detected, slot_reset, and resume callbacks
for the libwx, ngbe, and txgbe drivers, gates pci_disable_device()
behind a new WX_STATE_DISABLED bit, and adds pci_save_state() at the
end of probe.

> diff --git a/drivers/net/ethernet/wangxun/libwx/wx_err.c b/drivers/net/ethernet/wangxun/libwx/wx_err.c
> index deb9fac2e1a9..64d43bf6e9da 100644
> --- a/drivers/net/ethernet/wangxun/libwx/wx_err.c
> +++ b/drivers/net/ethernet/wangxun/libwx/wx_err.c
> @@ -4,11 +4,119 @@
>
>  #include <linux/netdevice.h>
>  #include <linux/pci.h>
> +#include <linux/aer.h>
>
>  #include "wx_type.h"
>  #include "wx_lib.h"
>  #include "wx_err.h"
>
> +/**
> + * wx_io_error_detected - called when PCI error is detected
> + * @pdev: Pointer to PCI device
> + * @state: The current pci connection state
> + *
> + * Return: pci_ers_result_t.
> + *
> + * This function is called after a PCI bus error affecting
> + * this device has been detected.
> + */
> +static pci_ers_result_t wx_io_error_detected(struct pci_dev *pdev,
> +					     pci_channel_state_t state)
> +{
> +	struct wx *wx = pci_get_drvdata(pdev);
> +	struct net_device *netdev;
> +
> +	if (!wx)
> +		return PCI_ERS_RESULT_DISCONNECT;
> +
> +	netdev = wx->netdev;
> +	if (!netif_device_present(netdev))
> +		return PCI_ERS_RESULT_DISCONNECT;
> +
> +	if (state == pci_channel_io_perm_failure)
> +		return PCI_ERS_RESULT_DISCONNECT;
> +
> +	rtnl_lock();
> +	netif_device_detach(netdev);
> +
> +	if (netif_running(netdev))
> +		wx->close_suspend(wx);

[High]

Should this path consult pci_channel_offline(pdev) (or the state
argument) before invoking close_suspend()?  When state ==
pci_channel_io_frozen, MMIO is unreliable: reads return all-ones and
writes are dropped.

close_suspend() walks into ngbe_down()/txgbe_down(), which call
wx_disable_pcie_master() and wx_disable_rx_queue() in
drivers/net/ethernet/wangxun/libwx/wx_hw.c.  Both use read_poll_timeout
on MMIO:

  wx_disable_pcie_master() polls WX_PX_TRANSACTION_PENDING with
  WX_PCI_MASTER_DISABLE_TIMEOUT (around 80 ms)

  wx_disable_rx_queue() polls WX_PX_RR_CFG (~100 us per queue)

When the channel is frozen, those polls cannot succeed and will spin
the full timeout for every queue.  wx_update_stats() at the tail of
disable_device additionally accumulates 0xFFFFFFFF reads as garbage
counters.

Documentation/PCI/pci-error-recovery.rst suggests error_detected
should avoid MMIO when the channel is frozen and only quiesce (stop
queues, free IRQs, disable the device).  Would skipping close_suspend
or substituting an MMIO-free quiesce path be appropriate when
pci_channel_offline(pdev) is true?

> +
> +	if (!test_and_set_bit(WX_STATE_DISABLED, wx->state))
> +		pci_disable_device(pdev);
> +	rtnl_unlock();
> +
> +	/* Request a slot reset. */
> +	return PCI_ERS_RESULT_NEED_RESET;
> +}
> +
> +/**
> + * wx_io_slot_reset - called after the pci bus has been reset.
> + * @pdev: Pointer to PCI device
> + *
> + * Return: pci_ers_result_t.
> + *
> + * Restart the card from scratch, as if from a cold-boot.
> + */
> +static pci_ers_result_t wx_io_slot_reset(struct pci_dev *pdev)
> +{
> +	struct wx *wx = pci_get_drvdata(pdev);
> +	pci_ers_result_t result;
> +
> +	if (pci_enable_device_mem(pdev)) {
> +		wx_err(wx, "Cannot re-enable PCI device after reset.\n");
> +		result = PCI_ERS_RESULT_DISCONNECT;

[Medium]

What unwinds the resources released by close_suspend() in
wx_io_error_detected() when this path returns
PCI_ERS_RESULT_DISCONNECT?

netif_device_detach() only clears __LINK_STATE_PRESENT; it does not
clear __LINK_STATE_START, so netif_running() still reports true.
After this failure, the AER core calls error_detected again with
pci_channel_io_perm_failure and the driver short-circuits, leaving the
netdev in a half-torn-down state until ngbe_remove/txgbe_remove
eventually runs.

In the meantime, any rtnl path gated on netif_running() (for example
wx_set_features and wx_update_rsc, which call wx->do_reset ->
reinit_locked -> *_down -> *_disable_device touching freed
wx->rx_ring[i]/tx_ring[i]) would dereference freed rings.  Should this
failure path also clear LINK_STATE_START (e.g., via dev_close()) so
those callers see the netdev as down?

> +	} else {
> +		/* make all bar access done before reset. */
> +		smp_mb__before_atomic();

[Low]

Is the comment "make all bar access done before reset." accurate?

The PCI bus reset happens before slot_reset is invoked by the AER
core, so "before reset" reads temporally backwards.  There is also no
BAR/MMIO access in this function prior to the barrier — only
pci_enable_device_mem(), which touches config space.

smp_mb__before_atomic() is a CPU barrier paired with the following
non-RMW atomic (clear_bit) per Documentation/atomic_t.txt and
memory-barriers.txt; it does not order MMIO/BAR accesses (which would
require a readback or wmb()-class primitive).  Could the comment be
updated to reflect what the barrier actually does?

> +		clear_bit(WX_STATE_DISABLED, wx->state);
> +		pci_set_master(pdev);
> +		pci_restore_state(pdev);

[Medium]

Should pci_save_state(pdev) be called again here after
pci_restore_state(pdev)?

pci_restore_state() ends with dev->state_saved = false, and the
canonical pattern in other AER slot_reset implementations
(ixgbe_io_slot_reset, igb, e1000e) is restore-then-save so that
subsequent recovery cycles start from a known-good baseline.  Without
re-saving, repeated AER events on the same device would rely on stale
or partially-cleared per-cap save state.

> +		pci_wake_from_d3(pdev, false);
> +
> +		wx->do_reset(wx->netdev, false);

[Medium]

Should rtnl_lock be held around this do_reset() call?

Every other call site of wx->do_reset() runs under rtnl_lock:
wx_reset_subtask() in wx_err.c takes it explicitly, wx_set_features
and wx_update_rsc are invoked under rtnl by netdev_ops, and
wx_io_resume() takes rtnl_lock around ndo_open.

txgbe_do_reset -> txgbe_reset writes wx->mac_table[]
(wx_flush_sw_mac_table + wx_mac_set_default_filter) and reprograms HW
(txgbe_reset_hw, wx_start_hw); ngbe_reset also writes mac_table[].
netif_device_detach() in error_detected only clears
__LINK_STATE_PRESENT and stops TX queues — it does not block
rtnl-protected ndo callbacks (ndo_set_mac_address, ndo_set_features,
ndo_vlan_rx_add_vid, ethtool_ops).  Those userspace-triggered paths
run under rtnl_lock and would concurrently modify mac_table[] / HW
registers, racing with do_reset.

> +		result = PCI_ERS_RESULT_RECOVERED;
> +	}
> +
> +	pci_aer_clear_nonfatal_status(pdev);
> +
> +	return result;
> +}
> +
> +/**
> + * wx_io_resume - called when traffic can start flowing again.
> + * @pdev: Pointer to PCI device
> + *
> + * This callback is called when the error recovery driver tells us that
> + * its OK to resume normal operation.
> + */
> +static void wx_io_resume(struct pci_dev *pdev)
> +{
> +	struct wx *wx = pci_get_drvdata(pdev);
> +	struct net_device *netdev;
> +	int err;
> +
> +	netdev = wx->netdev;
> +	rtnl_lock();
> +	if (netif_running(netdev)) {
> +		err = netdev->netdev_ops->ndo_open(netdev);
> +		if (err) {
> +			wx_err(wx, "Failed to open netdev after reset\n");
> +			goto out;
> +		}
> +	}
> +	netif_device_attach(netdev);
> +out:
> +	rtnl_unlock();
> +}

[High]

When ndo_open() fails here, what state is the netdev left in?

ngbe_open()/txgbe_open() free their resources on failure paths and
return without clearing __LINK_STATE_START (only dev_close() does
that, and ndo_open is being called directly here).  netif_device_attach
is then skipped, so __LINK_STATE_PRESENT stays cleared.

The result is a netdev where netif_running() returns true (LINK_START
set) but PRESENT is clear, with rings/IRQs/phylink already freed by
close_suspend in error_detected and only partially re-allocated.  AER
treats recovery as complete (resume returns void).

Code paths gated only on netif_running() — wx_reset_subtask in
wx_err.c, wx_tx_timeout, wx_set_features -> wx->do_reset, ethtool
callbacks — would then dereference freed Tx/Rx rings, freed IRQ
handlers, or a disconnected phylink.  For example:

  void wx_reset_subtask(struct wx *wx)
  {
        ...
        if (!netif_running(wx->netdev) ||
            test_bit(WX_STATE_RESETTING, wx->state))
                goto out;
        ...
        if (wx->do_reset)
                wx->do_reset(wx->netdev, true);
  }

would call into reinit_locked -> *_down -> *_disable_device on freed
rings.  Should the failure path here force the netdev fully down (e.g.,
dev_close_many or equivalent) so netif_running() reports false?
-- 
This is an AI-generated review.

     prev parent reply	other threads:[~2026-05-14  9:24 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-09 10:05 [PATCH net-next v3 0/5] net: wangxun: timeout and error Jiawen Wu
2026-05-09 10:05 ` [PATCH net-next v3 1/5] net: ngbe: implement libwx reset ops Jiawen Wu
2026-05-09 10:05 ` [PATCH net-next v3 2/5] net: wangxun: add Tx timeout process Jiawen Wu
2026-05-14  9:23   ` Paolo Abeni
2026-05-09 10:05 ` [PATCH net-next v3 3/5] net: wangxun: add reinit parameter to wx->do_reset callback Jiawen Wu
2026-05-09 10:05 ` [PATCH net-next v3 4/5] net: wangxun: extract the close_suspend sequence Jiawen Wu
2026-05-09 10:05 ` [PATCH net-next v3 5/5] net: wangxun: implement pci_error_handlers ops Jiawen Wu
2026-05-14  9:24   ` Paolo Abeni [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260514092415.90852-1-pabeni@redhat.com \
    --to=pabeni@redhat.com \
    --cc=andrew+netdev@lunn.ch \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=fabio.baltieri@gmail.com \
    --cc=horms@kernel.org \
    --cc=jacob.e.keller@intel.com \
    --cc=jiawenwu@trustnetic.com \
    --cc=joe@dama.to \
    --cc=kees@kernel.org \
    --cc=kuba@kernel.org \
    --cc=larysa.zaremba@intel.com \
    --cc=leitao@debian.org \
    --cc=linux@armlinux.org.uk \
    --cc=mengyuanlou@net-swift.com \
    --cc=netdev@vger.kernel.org \
    --cc=richardcochran@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox