From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net [100.103.45.18])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 46D1830675C
	for <netdev@vger.kernel.org>; Mon,  8 Jun 2026 15:07:50 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18
ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1780931271; cv=none; b=TVv/LlNYxdCEfhBD7eimn1mfzrsmmm2tZz2w37X5I28bNpxFbAOzdZTNprAAQ33ey/GquJa6th5lu4JjGNE0grWvg3HGxdd3pk8rMYq9QXrbr5eoLp1ZtnokG58kECWN3upRx+hieaNLbWGPGg98swsyGO+c3a19zkS2Xzspby4=
ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1780931271; c=relaxed/simple;
	bh=3ZAi6k7uot/P9VWf4aeXBkjh3qrMw04RVar/N/MtQ28=;
	h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References:
	 MIME-Version; b=WvDx4lK7wFcPuBY9JA4a1BdqA3FoT0DPc5yWflEzlweQpfcifIomMLmdRyvD2+9xTCP6yefovFrDoBJio/kHsjd3U7XMNPsaT42JapkPNHk22bLUJUd+HSUn8pu6LFVRiAK4yFT7Law6FbyviZEIByyi4i+7Oy6pViURsKD2sGI=
ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=SNRpde6k; arc=none smtp.client-ip=100.103.45.18
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="SNRpde6k"
Received: by smtp.kernel.org (Postfix) with ESMTPSA id E0C421F00893;
	Mon,  8 Jun 2026 15:07:45 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org;
	s=k20260515; t=1780931269;
	bh=4LLNpNp2396w2WmhF3i7U1VNXInpT/sRq/l2UGIIvV0=;
	h=From:To:Cc:Subject:Date:In-Reply-To:References;
	b=SNRpde6krlDWnxsgZ3jauKZP99qdShWOV1LWKqMYoXGC5wrmTkgymuPwjmEHv5PI0
	 z285kLrrIyoctoIIkiEvN3n8gNhBepqgQSBZaxI5rT9ZrdwXAt8G8wcBzEIQIvAemC
	 2JMju0QFD0/wKNHJNQ+2JfVgpIzWO+dHCPmdgb4DmRHQqbV29kjiv1QFNd50Euw8CA
	 GwdpgIgcAk9+1XRrW2tSvrwHcVwvnXmco7nHypkxe8bfp94p096IG3CvrCdGL6xUIr
	 uh6jNSRDFhlx8qf3D5CQ8gHGZZFNPXoX8VIUbWGLPI7fnM3WkZpUk9YfIthI1EfwuC
	 U2diiPT06INOw==
From: Simon Horman <horms@kernel.org>
To: jiawenwu@trustnetic.com
Cc: 'Simon Horman' <horms@kernel.org>,
	netdev@vger.kernel.org,
	mengyuanlou@net-swift.com,
	andrew+netdev@lunn.ch,
	davem@davemloft.net,
	edumazet@google.com,
	kuba@kernel.org,
	pabeni@redhat.com,
	richardcochran@gmail.com,
	linux@armlinux.org.uk,
	jacob.e.keller@intel.com,
	michal.swiatkowski@linux.intel.com,
	kees@kernel.org,
	larysa.zaremba@intel.com,
	joe@dama.to,
	leitao@debian.org,
	aleksandr.loktionov@intel.com,
	u.kleine-koenig@baylibre.com,
	fabio.baltieri@gmail.com,
	tglx@kernel.org,
	gregkh@linuxfoundation.org
Subject: Re: [PATCH net-next v5 5/5] net: wangxun: add pcie error handler
Date: Mon,  8 Jun 2026 16:05:31 +0100
Message-ID: <20260608150529.149510-3-horms@kernel.org>
X-Mailer: git-send-email 2.54.0
In-Reply-To: <20260604085631.12720-6-jiawenwu@trustnetic.com>
References: <20260604085631.12720-6-jiawenwu@trustnetic.com>
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org
List-Id: <netdev.vger.kernel.org>
List-Subscribe: <mailto:netdev+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:netdev+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

From: 'Simon Horman' <horms@kernel.org>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://sashiko.dev

Simon says: there is also AI-generated review available of other patches of
	    this patch-set. But I believe those flagged below are the ones
	    that warrant investigation in the context of this patchset.

---
net: wangxun: add pcie error handler

Support AER driver to handle the PCIe errors. Sometimes netdev watchdog
Tx timeout happens before the AER error report when a PCIe error occurs,
CPU blocking would be caused by MMIO during the reset process. To
prevent it, error return is added in reset path. The current function of
ngbe is not yet fully developed, it will be completed in the future.

> diff --git a/drivers/net/ethernet/wangxun/libwx/wx_err.c b/drivers/net/ethernet/wangxun/libwx/wx_err.c
> index ee27f96735dc7..9a4dbb3427d93 100644
> --- a/drivers/net/ethernet/wangxun/libwx/wx_err.c
> +++ b/drivers/net/ethernet/wangxun/libwx/wx_err.c
[ ... ]
> @@ -4,11 +4,125 @@
>  static pci_ers_result_t wx_io_slot_reset(struct pci_dev *pdev)
>  {
>  	struct wx *wx = pci_get_drvdata(pdev);
>  	pci_ers_result_t result;
>  
>  	if (pci_enable_device_mem(pdev)) {
>  		wx_err(wx, "Cannot re-enable PCI device after reset.\n");
>  		result = PCI_ERS_RESULT_DISCONNECT;
>  	} else {
>  		/* make all memory operations done before clearing the flag */
>  		smp_mb__before_atomic();
>  		clear_bit(WX_STATE_DISABLED, wx->state);
>  		pci_set_master(pdev);
>  		pci_restore_state(pdev);
>  		pci_wake_from_d3(pdev, false);
>  
> +		rtnl_lock();
> +		if (netif_running(wx->netdev) && wx->down_suspend)
> +			wx->down_suspend(wx);
> +		if (wx->do_reset)
> +			wx->do_reset(wx->netdev, false);
> +		rtnl_unlock();

[Severity: Critical]
Does this sequence leave the netdev state out of sync if wx_io_resume()
later fails?

If wx_io_resume() fails its ndo_open() call, it returns with netif_running()
still true. If an administrator later sets the interface down, the kernel
will call ndo_stop (txgbe_close).

txgbe_close() unconditionally calls wx_free_irq(), txgbe_free_misc_irq(), and
wx_free_resources(). Because txgbe_free_misc_irq() does not set
txgbe->misc.domain to NULL, calling it a second time causes a double free.

> diff --git a/drivers/net/ethernet/wangxun/txgbe/txgbe_ethtool.c b/drivers/net/ethernet/wangxun/txgbe/txgbe_ethtool.c
> index 3e32aca728065..80811947d5acf 100644
> --- a/drivers/net/ethernet/wangxun/txgbe/txgbe_ethtool.c
> +++ b/drivers/net/ethernet/wangxun/txgbe/txgbe_ethtool.c
[ ... ]
> @@ -78,13 +78,17 @@ static int txgbe_set_ringparam(struct net_device *netdev,
>  		goto clear_reset;
>  	}
>  
> -	txgbe_down(wx);
> +	err = txgbe_down(wx);
> +	if (err)
> +		goto free_temp;
>  
>  	wx_set_ring(wx, new_tx_count, new_rx_count, temp_ring);
>  	kvfree(temp_ring);
>  
>  	txgbe_up(wx);
>  
> +free_temp:
> +	kvfree(temp_ring);
>  clear_reset:
>  	clear_bit(WX_STATE_RESETTING, wx->state);

[Severity: High]
If txgbe_down() returns an error here, the function jumps to free_temp and
skips calling txgbe_up().

Because txgbe_disable_device() has already called wx_soft_quiesce() to stop
TX queues and disable NAPI, does this leave the interface logically stuck
down while the OS still considers it up?

> diff --git a/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c b/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c
> index f6e596eb92178..98786efbe871d 100644
> --- a/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c
> +++ b/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c
[ ... ]
> @@ -212,15 +212,21 @@ static void txgbe_reset(struct wx *wx)
>  		wx_ptp_reset(wx);
>  }
>  
> -static void txgbe_disable_device(struct wx *wx)
> +static int txgbe_disable_device(struct wx *wx)
>  {
>  	struct net_device *netdev = wx->netdev;
> +	int ret = 0;
>  	u32 i;
>  
>  	if (test_and_set_bit(WX_STATE_DOWN, wx->state))
> -		return;
> +		return 0;
> +
> +	ret = wx_disable_pcie_master(wx);
> +	if (ret) {
> +		wx_soft_quiesce(wx);
> +		return ret;
> +	}

[Severity: High]
If wx_disable_pcie_master() times out and returns an error, the function
exits without clearing the WX_STATE_DOWN bit. If a subsequent teardown
attempt occurs, will the WX_STATE_DOWN check falsely assume the device is
already disabled and return 0 (success)?

[Severity: Critical]
Returning early here skips critical hardware teardown like wx_disable_rx()
and the MAC shutdown.

Because callers like txgbe_close() do not check the return value of
txgbe_down(), they proceed to unconditionally free all DMA rings via
wx_free_resources(). Can this allow the hardware to DMA incoming packets into
freed memory pages?

[ ... ]
> @@ -606,13 +635,16 @@ int txgbe_setup_tc(struct net_device *dev, u8 tc)
>  
>  static void txgbe_reinit_locked(struct wx *wx)
>  {
> +	int ret;
> +
>  	netif_trans_update(wx->netdev);
>  
>  	mutex_lock(&wx->reset_lock);
>  	set_bit(WX_STATE_RESETTING, wx->state);
>  
> -	txgbe_down(wx);
> -	txgbe_up(wx);
> +	ret = txgbe_down(wx);
> +	if (!ret)
> +		txgbe_up(wx);
>  
>  	clear_bit(WX_STATE_RESETTING, wx->state);

[Severity: High]
Similar to the ethtool path, if txgbe_down() fails here, txgbe_up() is
skipped. Does this leave the interface in a broken state until the next
administrative down/up cycle?