Re: [PATCH net-next v3 5/5] net: wangxun: implement pci_error_handlers ops

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Paolo Abeni <pabeni@redhat.com>
To: jiawenwu@trustnetic.com
Cc: netdev@vger.kernel.org, mengyuanlou@net-swift.com,
	andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com,
	kuba@kernel.org, pabeni@redhat.com, richardcochran@gmail.com,
	linux@armlinux.org.uk, horms@kernel.org, kees@kernel.org,
	larysa.zaremba@intel.com, leitao@debian.org, joe@dama.to,
	jacob.e.keller@intel.com, fabio.baltieri@gmail.com
Subject: Re: [PATCH net-next v3 5/5] net: wangxun: implement pci_error_handlers ops
Date: Thu, 14 May 2026 11:24:00 +0200	[thread overview]
Message-ID: <20260514092415.90852-1-pabeni@redhat.com> (raw)
In-Reply-To: <20260509100540.32612-6-jiawenwu@trustnetic.com>

From: AI Reviewer <ai@example.com>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://netdev-ai.bots.linux.dev/sashiko/
---
net: wangxun: implement pci_error_handlers ops

This patch adds AER error_detected, slot_reset, and resume callbacks
for the libwx, ngbe, and txgbe drivers, gates pci_disable_device()
behind a new WX_STATE_DISABLED bit, and adds pci_save_state() at the
end of probe.

> diff --git a/drivers/net/ethernet/wangxun/libwx/wx_err.c b/drivers/net/ethernet/wangxun/libwx/wx_err.c
> index deb9fac2e1a9..64d43bf6e9da 100644
> --- a/drivers/net/ethernet/wangxun/libwx/wx_err.c
> +++ b/drivers/net/ethernet/wangxun/libwx/wx_err.c
> @@ -4,11 +4,119 @@
>
>  #include <linux/netdevice.h>
>  #include <linux/pci.h>
> +#include <linux/aer.h>
>
>  #include "wx_type.h"
>  #include "wx_lib.h"
>  #include "wx_err.h"
>
> +/**
> + * wx_io_error_detected - called when PCI error is detected
> + * @pdev: Pointer to PCI device
> + * @state: The current pci connection state
> + *
> + * Return: pci_ers_result_t.
> + *
> + * This function is called after a PCI bus error affecting
> + * this device has been detected.
> + */
> +static pci_ers_result_t wx_io_error_detected(struct pci_dev *pdev,
> +					     pci_channel_state_t state)
> +{
> +	struct wx *wx = pci_get_drvdata(pdev);
> +	struct net_device *netdev;
> +
> +	if (!wx)
> +		return PCI_ERS_RESULT_DISCONNECT;
> +
> +	netdev = wx->netdev;
> +	if (!netif_device_present(netdev))
> +		return PCI_ERS_RESULT_DISCONNECT;
> +
> +	if (state == pci_channel_io_perm_failure)
> +		return PCI_ERS_RESULT_DISCONNECT;
> +
> +	rtnl_lock();
> +	netif_device_detach(netdev);
> +
> +	if (netif_running(netdev))
> +		wx->close_suspend(wx);

[High]

Should this path consult pci_channel_offline(pdev) (or the state
argument) before invoking close_suspend()?  When state ==
pci_channel_io_frozen, MMIO is unreliable: reads return all-ones and
writes are dropped.

close_suspend() walks into ngbe_down()/txgbe_down(), which call
wx_disable_pcie_master() and wx_disable_rx_queue() in
drivers/net/ethernet/wangxun/libwx/wx_hw.c.  Both use read_poll_timeout
on MMIO:

  wx_disable_pcie_master() polls WX_PX_TRANSACTION_PENDING with
  WX_PCI_MASTER_DISABLE_TIMEOUT (around 80 ms)

  wx_disable_rx_queue() polls WX_PX_RR_CFG (~100 us per queue)

When the channel is frozen, those polls cannot succeed and will spin
the full timeout for every queue.  wx_update_stats() at the tail of
disable_device additionally accumulates 0xFFFFFFFF reads as garbage
counters.

Documentation/PCI/pci-error-recovery.rst suggests error_detected
should avoid MMIO when the channel is frozen and only quiesce (stop
queues, free IRQs, disable the device).  Would skipping close_suspend
or substituting an MMIO-free quiesce path be appropriate when
pci_channel_offline(pdev) is true?

> +
> +	if (!test_and_set_bit(WX_STATE_DISABLED, wx->state))
> +		pci_disable_device(pdev);
> +	rtnl_unlock();
> +
> +	/* Request a slot reset. */
> +	return PCI_ERS_RESULT_NEED_RESET;
> +}
> +
> +/**
> + * wx_io_slot_reset - called after the pci bus has been reset.
> + * @pdev: Pointer to PCI device
> + *
> + * Return: pci_ers_result_t.
> + *
> + * Restart the card from scratch, as if from a cold-boot.
> + */
> +static pci_ers_result_t wx_io_slot_reset(struct pci_dev *pdev)
> +{
> +	struct wx *wx = pci_get_drvdata(pdev);
> +	pci_ers_result_t result;
> +
> +	if (pci_enable_device_mem(pdev)) {
> +		wx_err(wx, "Cannot re-enable PCI device after reset.\n");
> +		result = PCI_ERS_RESULT_DISCONNECT;

[Medium]

What unwinds the resources released by close_suspend() in
wx_io_error_detected() when this path returns
PCI_ERS_RESULT_DISCONNECT?

netif_device_detach() only clears __LINK_STATE_PRESENT; it does not
clear __LINK_STATE_START, so netif_running() still reports true.
After this failure, the AER core calls error_detected again with
pci_channel_io_perm_failure and the driver short-circuits, leaving the
netdev in a half-torn-down state until ngbe_remove/txgbe_remove
eventually runs.

In the meantime, any rtnl path gated on netif_running() (for example
wx_set_features and wx_update_rsc, which call wx->do_reset ->
reinit_locked -> *_down -> *_disable_device touching freed
wx->rx_ring[i]/tx_ring[i]) would dereference freed rings.  Should this
failure path also clear LINK_STATE_START (e.g., via dev_close()) so
those callers see the netdev as down?

> +	} else {
> +		/* make all bar access done before reset. */
> +		smp_mb__before_atomic();

[Low]

Is the comment "make all bar access done before reset." accurate?

The PCI bus reset happens before slot_reset is invoked by the AER
core, so "before reset" reads temporally backwards.  There is also no
BAR/MMIO access in this function prior to the barrier — only
pci_enable_device_mem(), which touches config space.

smp_mb__before_atomic() is a CPU barrier paired with the following
non-RMW atomic (clear_bit) per Documentation/atomic_t.txt and
memory-barriers.txt; it does not order MMIO/BAR accesses (which would
require a readback or wmb()-class primitive).  Could the comment be
updated to reflect what the barrier actually does?

> +		clear_bit(WX_STATE_DISABLED, wx->state);
> +		pci_set_master(pdev);
> +		pci_restore_state(pdev);

[Medium]

Should pci_save_state(pdev) be called again here after
pci_restore_state(pdev)?

pci_restore_state() ends with dev->state_saved = false, and the
canonical pattern in other AER slot_reset implementations
(ixgbe_io_slot_reset, igb, e1000e) is restore-then-save so that
subsequent recovery cycles start from a known-good baseline.  Without
re-saving, repeated AER events on the same device would rely on stale
or partially-cleared per-cap save state.

> +		pci_wake_from_d3(pdev, false);
> +
> +		wx->do_reset(wx->netdev, false);

[Medium]

Should rtnl_lock be held around this do_reset() call?

Every other call site of wx->do_reset() runs under rtnl_lock:
wx_reset_subtask() in wx_err.c takes it explicitly, wx_set_features
and wx_update_rsc are invoked under rtnl by netdev_ops, and
wx_io_resume() takes rtnl_lock around ndo_open.

txgbe_do_reset -> txgbe_reset writes wx->mac_table[]
(wx_flush_sw_mac_table + wx_mac_set_default_filter) and reprograms HW
(txgbe_reset_hw, wx_start_hw); ngbe_reset also writes mac_table[].
netif_device_detach() in error_detected only clears
__LINK_STATE_PRESENT and stops TX queues — it does not block
rtnl-protected ndo callbacks (ndo_set_mac_address, ndo_set_features,
ndo_vlan_rx_add_vid, ethtool_ops).  Those userspace-triggered paths
run under rtnl_lock and would concurrently modify mac_table[] / HW
registers, racing with do_reset.

> +		result = PCI_ERS_RESULT_RECOVERED;
> +	}
> +
> +	pci_aer_clear_nonfatal_status(pdev);
> +
> +	return result;
> +}
> +
> +/**
> + * wx_io_resume - called when traffic can start flowing again.
> + * @pdev: Pointer to PCI device
> + *
> + * This callback is called when the error recovery driver tells us that
> + * its OK to resume normal operation.
> + */
> +static void wx_io_resume(struct pci_dev *pdev)
> +{
> +	struct wx *wx = pci_get_drvdata(pdev);
> +	struct net_device *netdev;
> +	int err;
> +
> +	netdev = wx->netdev;
> +	rtnl_lock();
> +	if (netif_running(netdev)) {
> +		err = netdev->netdev_ops->ndo_open(netdev);
> +		if (err) {
> +			wx_err(wx, "Failed to open netdev after reset\n");
> +			goto out;
> +		}
> +	}
> +	netif_device_attach(netdev);
> +out:
> +	rtnl_unlock();
> +}

[High]

When ndo_open() fails here, what state is the netdev left in?

ngbe_open()/txgbe_open() free their resources on failure paths and
return without clearing __LINK_STATE_START (only dev_close() does
that, and ndo_open is being called directly here).  netif_device_attach
is then skipped, so __LINK_STATE_PRESENT stays cleared.

The result is a netdev where netif_running() returns true (LINK_START
set) but PRESENT is clear, with rings/IRQs/phylink already freed by
close_suspend in error_detected and only partially re-allocated.  AER
treats recovery as complete (resume returns void).

Code paths gated only on netif_running() — wx_reset_subtask in
wx_err.c, wx_tx_timeout, wx_set_features -> wx->do_reset, ethtool
callbacks — would then dereference freed Tx/Rx rings, freed IRQ
handlers, or a disconnected phylink.  For example:

  void wx_reset_subtask(struct wx *wx)
  {
        ...
        if (!netif_running(wx->netdev) ||
            test_bit(WX_STATE_RESETTING, wx->state))
                goto out;
        ...
        if (wx->do_reset)
                wx->do_reset(wx->netdev, true);
  }

would call into reinit_locked -> *_down -> *_disable_device on freed
rings.  Should the failure path here force the netdev fully down (e.g.,
dev_close_many or equivalent) so netif_running() reports false?
-- 
This is an AI-generated review.

     prev parent reply	other threads:[~2026-05-14  9:24 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-09 10:05 [PATCH net-next v3 0/5] net: wangxun: timeout and error Jiawen Wu
2026-05-09 10:05 ` [PATCH net-next v3 1/5] net: ngbe: implement libwx reset ops Jiawen Wu
2026-05-09 10:05 ` [PATCH net-next v3 2/5] net: wangxun: add Tx timeout process Jiawen Wu
2026-05-14  9:23   ` Paolo Abeni
2026-05-09 10:05 ` [PATCH net-next v3 3/5] net: wangxun: add reinit parameter to wx->do_reset callback Jiawen Wu
2026-05-09 10:05 ` [PATCH net-next v3 4/5] net: wangxun: extract the close_suspend sequence Jiawen Wu
2026-05-09 10:05 ` [PATCH net-next v3 5/5] net: wangxun: implement pci_error_handlers ops Jiawen Wu
2026-05-14  9:24   ` Paolo Abeni [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260514092415.90852-1-pabeni@redhat.com \
    --to=pabeni@redhat.com \
    --cc=andrew+netdev@lunn.ch \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=fabio.baltieri@gmail.com \
    --cc=horms@kernel.org \
    --cc=jacob.e.keller@intel.com \
    --cc=jiawenwu@trustnetic.com \
    --cc=joe@dama.to \
    --cc=kees@kernel.org \
    --cc=kuba@kernel.org \
    --cc=larysa.zaremba@intel.com \
    --cc=leitao@debian.org \
    --cc=linux@armlinux.org.uk \
    --cc=mengyuanlou@net-swift.com \
    --cc=netdev@vger.kernel.org \
    --cc=richardcochran@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.