Re: [PATCH net-next v5 5/5] net: wangxun: add pcie error handler

Netdev List
 help / color / mirror / Atom feed

From: Simon Horman <horms@kernel.org>
To: jiawenwu@trustnetic.com
Cc: 'Simon Horman' <horms@kernel.org>,
	netdev@vger.kernel.org, mengyuanlou@net-swift.com,
	andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com,
	kuba@kernel.org, pabeni@redhat.com, richardcochran@gmail.com,
	linux@armlinux.org.uk, jacob.e.keller@intel.com,
	michal.swiatkowski@linux.intel.com, kees@kernel.org,
	larysa.zaremba@intel.com, joe@dama.to, leitao@debian.org,
	aleksandr.loktionov@intel.com, u.kleine-koenig@baylibre.com,
	fabio.baltieri@gmail.com, tglx@kernel.org,
	gregkh@linuxfoundation.org
Subject: Re: [PATCH net-next v5 5/5] net: wangxun: add pcie error handler
Date: Mon,  8 Jun 2026 16:05:31 +0100	[thread overview]
Message-ID: <20260608150529.149510-3-horms@kernel.org> (raw)
In-Reply-To: <20260604085631.12720-6-jiawenwu@trustnetic.com>

From: 'Simon Horman' <horms@kernel.org>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://sashiko.dev

Simon says: there is also AI-generated review available of other patches of
	    this patch-set. But I believe those flagged below are the ones
	    that warrant investigation in the context of this patchset.

---
net: wangxun: add pcie error handler

Support AER driver to handle the PCIe errors. Sometimes netdev watchdog
Tx timeout happens before the AER error report when a PCIe error occurs,
CPU blocking would be caused by MMIO during the reset process. To
prevent it, error return is added in reset path. The current function of
ngbe is not yet fully developed, it will be completed in the future.

> diff --git a/drivers/net/ethernet/wangxun/libwx/wx_err.c b/drivers/net/ethernet/wangxun/libwx/wx_err.c
> index ee27f96735dc7..9a4dbb3427d93 100644
> --- a/drivers/net/ethernet/wangxun/libwx/wx_err.c
> +++ b/drivers/net/ethernet/wangxun/libwx/wx_err.c
[ ... ]
> @@ -4,11 +4,125 @@
>  static pci_ers_result_t wx_io_slot_reset(struct pci_dev *pdev)
>  {
>  	struct wx *wx = pci_get_drvdata(pdev);
>  	pci_ers_result_t result;
>  
>  	if (pci_enable_device_mem(pdev)) {
>  		wx_err(wx, "Cannot re-enable PCI device after reset.\n");
>  		result = PCI_ERS_RESULT_DISCONNECT;
>  	} else {
>  		/* make all memory operations done before clearing the flag */
>  		smp_mb__before_atomic();
>  		clear_bit(WX_STATE_DISABLED, wx->state);
>  		pci_set_master(pdev);
>  		pci_restore_state(pdev);
>  		pci_wake_from_d3(pdev, false);
>  
> +		rtnl_lock();
> +		if (netif_running(wx->netdev) && wx->down_suspend)
> +			wx->down_suspend(wx);
> +		if (wx->do_reset)
> +			wx->do_reset(wx->netdev, false);
> +		rtnl_unlock();

[Severity: Critical]
Does this sequence leave the netdev state out of sync if wx_io_resume()
later fails?

If wx_io_resume() fails its ndo_open() call, it returns with netif_running()
still true. If an administrator later sets the interface down, the kernel
will call ndo_stop (txgbe_close).

txgbe_close() unconditionally calls wx_free_irq(), txgbe_free_misc_irq(), and
wx_free_resources(). Because txgbe_free_misc_irq() does not set
txgbe->misc.domain to NULL, calling it a second time causes a double free.

> diff --git a/drivers/net/ethernet/wangxun/txgbe/txgbe_ethtool.c b/drivers/net/ethernet/wangxun/txgbe/txgbe_ethtool.c
> index 3e32aca728065..80811947d5acf 100644
> --- a/drivers/net/ethernet/wangxun/txgbe/txgbe_ethtool.c
> +++ b/drivers/net/ethernet/wangxun/txgbe/txgbe_ethtool.c
[ ... ]
> @@ -78,13 +78,17 @@ static int txgbe_set_ringparam(struct net_device *netdev,
>  		goto clear_reset;
>  	}
>  
> -	txgbe_down(wx);
> +	err = txgbe_down(wx);
> +	if (err)
> +		goto free_temp;
>  
>  	wx_set_ring(wx, new_tx_count, new_rx_count, temp_ring);
>  	kvfree(temp_ring);
>  
>  	txgbe_up(wx);
>  
> +free_temp:
> +	kvfree(temp_ring);
>  clear_reset:
>  	clear_bit(WX_STATE_RESETTING, wx->state);

[Severity: High]
If txgbe_down() returns an error here, the function jumps to free_temp and
skips calling txgbe_up().

Because txgbe_disable_device() has already called wx_soft_quiesce() to stop
TX queues and disable NAPI, does this leave the interface logically stuck
down while the OS still considers it up?

> diff --git a/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c b/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c
> index f6e596eb92178..98786efbe871d 100644
> --- a/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c
> +++ b/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c
[ ... ]
> @@ -212,15 +212,21 @@ static void txgbe_reset(struct wx *wx)
>  		wx_ptp_reset(wx);
>  }
>  
> -static void txgbe_disable_device(struct wx *wx)
> +static int txgbe_disable_device(struct wx *wx)
>  {
>  	struct net_device *netdev = wx->netdev;
> +	int ret = 0;
>  	u32 i;
>  
>  	if (test_and_set_bit(WX_STATE_DOWN, wx->state))
> -		return;
> +		return 0;
> +
> +	ret = wx_disable_pcie_master(wx);
> +	if (ret) {
> +		wx_soft_quiesce(wx);
> +		return ret;
> +	}

[Severity: High]
If wx_disable_pcie_master() times out and returns an error, the function
exits without clearing the WX_STATE_DOWN bit. If a subsequent teardown
attempt occurs, will the WX_STATE_DOWN check falsely assume the device is
already disabled and return 0 (success)?

[Severity: Critical]
Returning early here skips critical hardware teardown like wx_disable_rx()
and the MAC shutdown.

Because callers like txgbe_close() do not check the return value of
txgbe_down(), they proceed to unconditionally free all DMA rings via
wx_free_resources(). Can this allow the hardware to DMA incoming packets into
freed memory pages?

[ ... ]
> @@ -606,13 +635,16 @@ int txgbe_setup_tc(struct net_device *dev, u8 tc)
>  
>  static void txgbe_reinit_locked(struct wx *wx)
>  {
> +	int ret;
> +
>  	netif_trans_update(wx->netdev);
>  
>  	mutex_lock(&wx->reset_lock);
>  	set_bit(WX_STATE_RESETTING, wx->state);
>  
> -	txgbe_down(wx);
> -	txgbe_up(wx);
> +	ret = txgbe_down(wx);
> +	if (!ret)
> +		txgbe_up(wx);
>  
>  	clear_bit(WX_STATE_RESETTING, wx->state);

[Severity: High]
Similar to the ethtool path, if txgbe_down() fails here, txgbe_up() is
skipped. Does this leave the interface in a broken state until the next
administrative down/up cycle?

     prev parent reply	other threads:[~2026-06-08 15:07 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-04  8:56 [PATCH net-next v5 0/5] net: wangxun: timeout and error Jiawen Wu
2026-06-04  8:56 ` [PATCH net-next v5 1/5] net: ngbe: implement libwx reset ops Jiawen Wu
2026-06-08  8:47   ` Loktionov, Aleksandr
2026-06-04  8:56 ` [PATCH net-next v5 2/5] net: wangxun: add Tx timeout process Jiawen Wu
2026-06-04  8:56 ` [PATCH net-next v5 3/5] net: wangxun: add reinit parameter to wx->do_reset callback Jiawen Wu
2026-06-04  8:56 ` [PATCH net-next v5 4/5] net: wangxun: implement soft quiesce for PCIe error recovery Jiawen Wu
2026-06-08  8:47   ` Loktionov, Aleksandr
2026-06-04  8:56 ` [PATCH net-next v5 5/5] net: wangxun: add pcie error handler Jiawen Wu
2026-06-08 15:05   ` Simon Horman [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260608150529.149510-3-horms@kernel.org \
    --to=horms@kernel.org \
    --cc=aleksandr.loktionov@intel.com \
    --cc=andrew+netdev@lunn.ch \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=fabio.baltieri@gmail.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=jacob.e.keller@intel.com \
    --cc=jiawenwu@trustnetic.com \
    --cc=joe@dama.to \
    --cc=kees@kernel.org \
    --cc=kuba@kernel.org \
    --cc=larysa.zaremba@intel.com \
    --cc=leitao@debian.org \
    --cc=linux@armlinux.org.uk \
    --cc=mengyuanlou@net-swift.com \
    --cc=michal.swiatkowski@linux.intel.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=richardcochran@gmail.com \
    --cc=tglx@kernel.org \
    --cc=u.kleine-koenig@baylibre.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox