Netdev List
From: "Jiawen Wu" <jiawenwu@trustnetic.com>
To: "'Simon Horman'" <horms@kernel.org>
Cc: netdev@vger.kernel.org,
	"'Mengyuan Lou'" <mengyuanlou@net-swift.com>,
	"'Andrew Lunn'" <andrew+netdev@lunn.ch>,
	"'David S. Miller'" <davem@davemloft.net>,
	"'Eric Dumazet'" <edumazet@google.com>,
	"'Jakub Kicinski'" <kuba@kernel.org>,
	"'Paolo Abeni'" <pabeni@redhat.com>,
	"'Richard Cochran'" <richardcochran@gmail.com>,
	"'Russell King'" <linux@armlinux.org.uk>,
	"'Jacob Keller'" <jacob.e.keller@intel.com>,
	"'Michal Swiatkowski'" <michal.swiatkowski@linux.intel.com>,
	"'Kees Cook'" <kees@kernel.org>,
	"'Larysa Zaremba'" <larysa.zaremba@intel.com>,
	"'Joe Damato'" <joe@dama.to>,
	"'Breno Leitao'" <leitao@debian.org>,
	"'Aleksandr Loktionov'" <aleksandr.loktionov@intel.com>,
	"'Uwe Kleine-König (The Capable Hub)'"
	<u.kleine-koenig@baylibre.com>,
	"'Fabio Baltieri'" <fabio.baltieri@gmail.com>,
	"'Thomas Gleixner'" <tglx@kernel.org>,
	"'Greg Kroah-Hartman'" <gregkh@linuxfoundation.org>,
Subject: RE: [PATCH net-next v6 5/5] net: wangxun: add pcie error handler
Date: Mon, 15 Jun 2026 10:13:42 +0800
Message-ID: <018201dcfc6c$96419460$c2c4bd20$@trustnetic.com>
In-Reply-To: <20260612154117.GC671640@horms.kernel.org>

On Fri, Jun 12, 2026 11:41 PM, Simon Horman wrote:
> On Wed, Jun 10, 2026 at 02:09:17PM +0800, Jiawen Wu wrote:
> > Support the AER driver in handling PCIe errors. When a PCIe error
> > occurs, the netdev watchdog Tx timeout sometimes fires before the AER
> > error report, and MMIO accesses during the reset process can then block
> > the CPU. To prevent this, check the PCIe error status in
> > .ndo_tx_timeout. Support for ngbe is not yet complete; it will be
> > finished in a future patch.
> >
> > Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
> 
> Hi Jiawen,
> 
> There is AI-generated review of this patch-set available on both
> https://sashiko.dev and https://netdev-ai.bots.linux.dev/sashiko/
> 
> I've tried to filter out only those that seem strictly relevant
> to the progress of this patch and have included them inline.
> 
> I'd appreciate it if you could take a look over this.
> 
> > diff --git a/drivers/net/ethernet/wangxun/libwx/wx_err.c b/drivers/net/ethernet/wangxun/libwx/wx_err.c
> 
> ...
> 
> > +/**
> > + * wx_io_slot_reset - called after the pci bus has been reset.
> > + * @pdev: Pointer to PCI device
> > + *
> > + * Return: pci_ers_result_t.
> > + *
> > + * Restart the card from scratch, as if from a cold-boot.
> > + */
> > +static pci_ers_result_t wx_io_slot_reset(struct pci_dev *pdev)
> > +{
> > +	struct wx *wx = pci_get_drvdata(pdev);
> > +	pci_ers_result_t result;
> > +
> > +	if (pci_enable_device_mem(pdev)) {
> > +		wx_err(wx, "Cannot re-enable PCI device after reset.\n");
> > +		result = PCI_ERS_RESULT_DISCONNECT;
> > +	} else {
> > +		/* make all memory operations done before clearing the flag */
> > +		smp_mb__before_atomic();
> > +		clear_bit(WX_STATE_DISABLED, wx->state);
> > +		clear_bit(WX_FLAG_NEED_PCIE_RECOVERY, wx->flags);
> > +		pci_set_master(pdev);
> > +		pci_restore_state(pdev);
> > +		pci_wake_from_d3(pdev, false);
> > +
> > +		rtnl_lock();
> > +		if (netif_running(wx->netdev) && wx->down_suspend)
> > +			wx->down_suspend(wx);
> > +		if (wx->do_reset)
> > +			wx->do_reset(wx->netdev, false);
> 
> The AI-generated review on sashiko.dev flags the following.
> 
> 	"Does this properly initialize the hardware MAC during PCIe AER
> 	 slot reset recovery for the ngbe driver?
> 
> 	"For txgbe, this correctly maps to txgbe_reset(), which calls
> 	 txgbe_reset_hw().  However, for ngbe, wx->do_reset maps to
> 	 ngbe_do_reset() which calls ngbe_reset().
> 
> 	"Looking at ngbe_reset():
> 
> 	"drivers/net/ethernet/wangxun/ngbe/ngbe_main.c:ngbe_reset() {
> 	    if (test_bit(WX_FLAG_NEED_PCIE_RECOVERY, wx->flags))
> 	        return;
> 	    wx_flush_sw_mac_table(wx);
> 	    wx_mac_set_default_filter(wx, wx->mac.addr);
> 	    if (test_bit(WX_STATE_PTP_RUNNING, wx->state))
> 	        wx_ptp_reset(wx);
> 
> 	"This skips ngbe_reset_hw() and fails to fully reset the hardware.
> 	 Since a PCIe slot reset clears the device's internal state, when
> 	 wx_io_resume() subsequently calls ndo_open(), could the device fail
> 	 to function because its hardware registers and PHY state were never
> 	 reset?

As stated in the commit message, ngbe support is not yet complete because of
existing issues with its reset mechanism. These will be addressed in a future patch.

> > +		rtnl_unlock();
> > +		result = PCI_ERS_RESULT_RECOVERED;
> > +	}
> > +
> > +	pci_aer_clear_nonfatal_status(pdev);
> > +
> > +	return result;
> > +}
> 
> ...
> 
> > +static void wx_tx_timeout_recovery(struct wx *wx)
> > +{
> > +	/*
> > +	 * When a PCIe hardware error occurs, the driver should initiate a PCIe
> > +	 * recovery mechanism. However, this recovery flow relies on the AER
> > +	 * driver for current kernel policy. Therefore, a self-contained
> > +	 * recovery mechanism is not implemented yet.
> > +	 */
> > +	set_bit(WX_FLAG_NEED_PCIE_RECOVERY, wx->flags);
> > +	wx_err(wx, "PCIe error detected during tx timeout\n");
> > +	queue_work(wx->reset_wq, &wx->reset_task);
> > +}
> 
> The AI-generated review on https://netdev-ai.bots.linux.dev/sashiko/ flags:
> 
> 	"What clears WX_FLAG_NEED_PCIE_RECOVERY if AER never follows up?
> 	 wx_tx_timeout_recovery() sets the flag and queues reset_task.
> 	 reset_task only calls wx_soft_quiesce() and does not clear the
> 	 flag:
> 
> 	    if (test_bit(WX_FLAG_NEED_PCIE_RECOVERY, wx->flags))
> 	        wx_soft_quiesce(wx);
> 
> 	"The flag is cleared only on the success branch of
> 	 wx_io_slot_reset(). If the PCIe error never escalates to AER (AER
> 	 disabled in firmware or kernel, error not reportable, slot_reset
> 	 returns PCI_ERS_RESULT_DISCONNECT), the flag remains set
> 	 indefinitely.
> 
> 	"Both ngbe_reset() and txgbe_reset() now early-return on this flag:
> 
> 	    static void ngbe_reset(struct wx *wx)
> 	    {
> 	        if (test_bit(WX_FLAG_NEED_PCIE_RECOVERY, wx->flags))
> 	            return;
> 	        ...
> 	    }
> 
> 	"so any subsequent reset path (ethtool reset, ndo_set_features
> 	 triggered reset, watchdog timeout reset, FDIR reinit,
> 	 ngbe_reinit_locked / txgbe_reinit_locked) is silently a no-op. The
> 	 hardware reset capability is effectively disabled until the module
> 	 is reloaded.
> 
> 	"Should wx_tx_timeout_recovery() (or reset_task after the quiesce)
> 	 clear the flag once the soft-quiesce completes, so the driver's
> 	 reset machinery remains functional when AER does not follow up?

If a PCIe error occurs and AER never follows up, the driver currently has no
alternative recovery mechanism. The hardware is then unusable, so further reset
operations are deliberately disallowed.




Thread overview: 10+ messages
2026-06-10  6:09 [PATCH net-next v6 0/5] net: wangxun: timeout and error Jiawen Wu
2026-06-10  6:09 ` [PATCH net-next v6 1/5] net: ngbe: implement libwx reset ops Jiawen Wu
2026-06-10  6:09 ` [PATCH net-next v6 2/5] net: wangxun: add Tx timeout process Jiawen Wu
2026-06-10  6:09 ` [PATCH net-next v6 3/5] net: wangxun: add reinit parameter to wx->do_reset callback Jiawen Wu
2026-06-10  6:09 ` [PATCH net-next v6 4/5] net: wangxun: implement soft quiesce for PCIe error recovery Jiawen Wu
2026-06-12 15:49   ` Simon Horman
2026-06-15  3:06     ` Jiawen Wu
2026-06-10  6:09 ` [PATCH net-next v6 5/5] net: wangxun: add pcie error handler Jiawen Wu
2026-06-12 15:41   ` Simon Horman
2026-06-15  2:13     ` Jiawen Wu [this message]
