From: Paolo Abeni <pabeni@redhat.com>
To: Jiawen Wu <jiawenwu@trustnetic.com>, netdev@vger.kernel.org
Cc: Mengyuan Lou <mengyuanlou@net-swift.com>,
	Andrew Lunn <andrew+netdev@lunn.ch>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>,
	Richard Cochran <richardcochran@gmail.com>,
	Russell King <linux@armlinux.org.uk>,
	Simon Horman <horms@kernel.org>, Kees Cook <kees@kernel.org>,
	Larysa Zaremba <larysa.zaremba@intel.com>,
	Breno Leitao <leitao@debian.org>, Joe Damato <joe@dama.to>,
	Jacob Keller <jacob.e.keller@intel.com>,
	Fabio Baltieri <fabio.baltieri@gmail.com>
Subject: Re: [PATCH net-next v1 5/5] net: wangxun: implement pci_error_handlers ops
Date: Thu, 30 Apr 2026 10:34:18 +0200
Message-ID: <799458d9-729d-4589-987d-2518e31a7bde@redhat.com>
In-Reply-To: <20260428021156.13564-6-jiawenwu@trustnetic.com>

On 4/28/26 4:11 AM, Jiawen Wu wrote:
> diff --git a/drivers/net/ethernet/wangxun/libwx/wx_err.c b/drivers/net/ethernet/wangxun/libwx/wx_err.c
> index e7c9dcb148b5..1aefae402c8e 100644
> --- a/drivers/net/ethernet/wangxun/libwx/wx_err.c
> +++ b/drivers/net/ethernet/wangxun/libwx/wx_err.c
> @@ -3,11 +3,118 @@
>  
>  #include <linux/netdevice.h>
>  #include <linux/pci.h>
> +#include <linux/aer.h>
>  
>  #include "wx_type.h"
>  #include "wx_lib.h"
>  #include "wx_err.h"
>  
> +/**
> + * wx_io_error_detected - called when PCI error is detected
> + * @pdev: Pointer to PCI device
> + * @state: The current pci connection state
> + *
> + * Return: pci_ers_result_t.
> + *
> + * This function is called after a PCI bus error affecting
> + * this device has been detected.
> + */
> +static pci_ers_result_t wx_io_error_detected(struct pci_dev *pdev,
> +					     pci_channel_state_t state)
> +{
> +	struct wx *wx = pci_get_drvdata(pdev);
> +	struct net_device *netdev;
> +
> +	netdev = wx->netdev;
> +	if (!netif_device_present(netdev))
> +		return PCI_ERS_RESULT_DISCONNECT;
> +
> +	rtnl_lock();
> +	netif_device_detach(netdev);
> +
> +	if (netif_running(netdev))
> +		wx->close_suspend(wx);
> +
> +	if (state == pci_channel_io_perm_failure) {
> +		rtnl_unlock();
> +		return PCI_ERS_RESULT_DISCONNECT;

Sashiko says:

On the pci_channel_io_perm_failure path here, WX_STATE_DISABLED is not
set and pci_disable_device() is not called.  When the PCI core then
follows up with .remove(), ngbe_remove()/txgbe_remove() do:
	if (!test_and_set_bit(WX_STATE_DISABLED, wx->state))
		pci_disable_device(pdev);
Since the bit is still clear, pci_disable_device() is invoked on a
device that has already been torn down by the PCI core on the
perm_failure path.  Should the perm_failure branch also set
WX_STATE_DISABLED (and arguably call pci_disable_device()) for symmetry
with the NEED_RESET branch below and with how drivers like ixgbe handle
this case?
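
To make the question concrete, here is a small userspace model of the
flag handling. It is only a sketch, not kernel code: struct model_dev,
model_test_and_set() and the other model_* helpers are hypothetical
stand-ins for WX_STATE_DISABLED, test_and_set_bit() and the driver
callbacks, and the 'fixed' variant assumes the perm_failure branch is
changed to go through the same test_and_set_bit() check as the
NEED_RESET branch.

```c
#include <stdbool.h>

/* Userspace stand-ins only; none of these are kernel APIs. */
struct model_dev {
	bool disabled_bit;        /* models WX_STATE_DISABLED */
	int disable_from_handler; /* disable calls made by error_detected */
	int disable_from_remove;  /* disable calls made by remove */
};

/* Models test_and_set_bit(): sets the flag, returns the previous value. */
static bool model_test_and_set(bool *bit)
{
	bool old = *bit;

	*bit = true;
	return old;
}

/* Models wx_io_error_detected(); 'fixed' selects the suggested variant. */
static void model_error_detected(struct model_dev *d, bool perm_failure,
				 bool fixed)
{
	if (perm_failure && !fixed)
		return; /* as posted: the bit stays clear */

	if (!model_test_and_set(&d->disabled_bit))
		d->disable_from_handler++;
}

/* Models the check quoted from ngbe_remove()/txgbe_remove(). */
static void model_remove(struct model_dev *d)
{
	if (!model_test_and_set(&d->disabled_bit))
		d->disable_from_remove++;
}

/* How many times remove() disables the device after a perm failure. */
static int model_remove_disables_after_perm_failure(bool fixed)
{
	struct model_dev d = { 0 };

	model_error_detected(&d, true, fixed);
	model_remove(&d);
	return d.disable_from_remove;
}
```

In this model the posted flow leaves the disable call to remove(),
while the variant that sets the bit in the perm_failure branch makes
remove() skip it. Whether the real driver should also call
pci_disable_device() in that branch is left open above.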

> +	}
> +
> +	if (!test_and_set_bit(WX_STATE_DISABLED, wx->state))
> +		pci_disable_device(pdev);
> +	rtnl_unlock();
> +
> +	/* Request a slot reset. */
> +	return PCI_ERS_RESULT_NEED_RESET;
> +}
> +
> +/**
> + * wx_io_slot_reset - called after the pci bus has been reset.
> + * @pdev: Pointer to PCI device
> + *
> + * Return: pci_ers_result_t.
> + *
> + * Restart the card from scratch, as if from a cold-boot.
> + */
> +static pci_ers_result_t wx_io_slot_reset(struct pci_dev *pdev)
> +{
> +	struct wx *wx = pci_get_drvdata(pdev);
> +	pci_ers_result_t result;
> +
> +	if (pci_enable_device_mem(pdev)) {
> +		wx_err(wx, "Cannot re-enable PCI device after reset.\n");
> +		result = PCI_ERS_RESULT_DISCONNECT;
> +	} else {
> +		/* make all bar access done before reset. */
> +		smp_mb__before_atomic();
> +		clear_bit(WX_STATE_DISABLED, wx->state);
> +		pci_set_master(pdev);
> +		pci_restore_state(pdev);
> +		pci_wake_from_d3(pdev, false);
> +
> +		wx->do_reset(wx->netdev, false);
> +		result = PCI_ERS_RESULT_RECOVERED;
> +	}
> +
> +	pci_aer_clear_nonfatal_status(pdev);

Sashiko says:

Should pci_aer_clear_nonfatal_status() be called on the
PCI_ERS_RESULT_DISCONNECT path where pci_enable_device_mem() failed?
It runs unconditionally here, and when CONFIG_PCIEAER=n the stub in
include/linux/aer.h returns -EINVAL, which is also ignored.  Would it
be cleaner to only call this on the recovered path?
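
A minimal userspace sketch of the two control flows, under the
assumption that the only change is where the AER status clear sits.
The model_* names are hypothetical; enable_fails stands in for
pci_enable_device_mem() returning an error and aer_clear_calls counts
calls to pci_aer_clear_nonfatal_status().

```c
#include <stdbool.h>

enum model_result { MODEL_DISCONNECT, MODEL_RECOVERED };

/* Userspace stand-ins only; none of these are kernel APIs. */
struct model_slot {
	bool enable_fails;   /* models pci_enable_device_mem() failing */
	int aer_clear_calls; /* models pci_aer_clear_nonfatal_status() calls */
};

/*
 * Models wx_io_slot_reset(). With clear_only_on_recovery == false the
 * status clear runs unconditionally, as in the posted patch; with true
 * it runs only when the device was re-enabled.
 */
static enum model_result model_slot_reset(struct model_slot *s,
					  bool clear_only_on_recovery)
{
	enum model_result result;

	if (s->enable_fails)
		result = MODEL_DISCONNECT;
	else
		result = MODEL_RECOVERED;

	if (!clear_only_on_recovery || result == MODEL_RECOVERED)
		s->aer_clear_calls++;

	return result;
}

/* Number of status-clear calls for one slot reset. */
static int model_aer_clears(bool enable_fails, bool clear_only_on_recovery)
{
	struct model_slot s = { .enable_fails = enable_fails };

	model_slot_reset(&s, clear_only_on_recovery);
	return s.aer_clear_calls;
}
```

The only behavioural difference in the model is the failed-enable case,
where the conditional variant makes no clear call.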

> +
> +	return result;
> +}
> +
> +/**
> + * wx_io_resume - called when traffic can start flowing again.
> + * @pdev: Pointer to PCI device
> + *
> + * This callback is called when the error recovery driver tells us that
> + * its OK to resume normal operation.
> + */
> +static void wx_io_resume(struct pci_dev *pdev)
> +{
> +	struct wx *wx = pci_get_drvdata(pdev);
> +	struct net_device *netdev;
> +	int err;
> +
> +	netdev = wx->netdev;
> +	rtnl_lock();
> +	if (netif_running(netdev)) {
> +		err = netdev->netdev_ops->ndo_open(netdev);
> +		if (err) {
> +			wx_err(wx, "Failed to open netdev after reset\n");
> +			goto out;
> +		}
> +	}
> +	netif_device_attach(netdev);
> +out:
> +	rtnl_unlock();

Sashiko says:

What state is the netdev left in when ndo_open() fails here?
wx_io_error_detected() called netif_device_detach(), which clears
__LINK_STATE_PRESENT but leaves __LINK_STATE_START set, so
netif_running() still returns true.  On ndo_open() failure, the jump to
'out' skips netif_device_attach(), and because the callback returns
void, the AER core gets no error signal.
If ngbe_open()/txgbe_open() partially succeeded and then failed on an
internal error path that freed rings/IRQs/phylink resources, the netdev
is then left with netif_running()==true, __LINK_STATE_PRESENT clear,
and the hardware resources released.  The service_timer and
service_task are still running, and wx_reset_subtask() does:
	if (!netif_running(wx->netdev) ||
	    test_bit(WX_STATE_RESETTING, wx->state))
		return;
	...
	if (test_and_clear_bit(WX_FLAG_NEED_PF_RESET, wx->flags)) {
		if (wx->do_reset)
			wx->do_reset(wx->netdev, true);
	}
Can a subsequent wx_reset_subtask() tick then dereference the already
freed ring/IRQ state through wx->do_reset()?
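
The sequence in question can be walked through with a small userspace
model. This is a sketch under stated assumptions, not driver code: the
resume_model_* names are hypothetical, 'resources' stands in for the
rings/IRQs/phylink state, WX_FLAG_NEED_PF_RESET is assumed to be
pending, and the service task is assumed to keep running after the
failed open. Whether those assumptions hold in the real driver is
exactly what is being asked.

```c
#include <stdbool.h>

/* Userspace stand-ins only; none of these are kernel APIs. */
struct resume_model {
	bool present;       /* models __LINK_STATE_PRESENT */
	bool start;         /* models __LINK_STATE_START (netif_running()) */
	bool resources;     /* models allocated rings/IRQs */
	bool resetting;     /* models WX_STATE_RESETTING */
	bool need_pf_reset; /* models WX_FLAG_NEED_PF_RESET */
};

/* Models wx_io_error_detected(): detach, then close_suspend if running. */
static void resume_model_error_detected(struct resume_model *n)
{
	n->present = false;
	if (n->start)
		n->resources = false;
}

/* Models wx_io_resume(); a failed open jumps past the attach step. */
static void resume_model_resume(struct resume_model *n, bool open_fails)
{
	if (n->start) {
		if (open_fails)
			return; /* 'out': no attach, resources stay released */
		n->resources = true;
	}
	n->present = true;
}

/* Models the wx_reset_subtask() guard; true means do_reset() would run
 * while the resources are released.
 */
static bool resume_model_reset_hits_freed_state(struct resume_model *n)
{
	if (!n->start || n->resetting)
		return false;

	if (n->need_pf_reset) {
		n->need_pf_reset = false;
		return !n->resources;
	}
	return false;
}

/* Runs error_detected -> resume -> one reset subtask tick. */
static bool resume_model_failure_is_unsafe(bool open_fails)
{
	struct resume_model n = {
		.present = true, .start = true,
		.resources = true, .need_pf_reset = true,
	};

	resume_model_error_detected(&n);
	resume_model_resume(&n, open_fails);
	return resume_model_reset_hits_freed_state(&n);
}
```

In the model, a failed open leaves 'start' set with the resources
released, so the guard in the reset subtask does not stop the
do_reset() call.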

/P

