From: "Jiawen Wu" <jiawenwu@trustnetic.com>
To: "'Simon Horman'" <horms@kernel.org>
Cc: netdev@vger.kernel.org,
"'Mengyuan Lou'" <mengyuanlou@net-swift.com>,
"'Andrew Lunn'" <andrew+netdev@lunn.ch>,
"'David S. Miller'" <davem@davemloft.net>,
"'Eric Dumazet'" <edumazet@google.com>,
"'Jakub Kicinski'" <kuba@kernel.org>,
"'Paolo Abeni'" <pabeni@redhat.com>,
"'Richard Cochran'" <richardcochran@gmail.com>,
"'Russell King'" <linux@armlinux.org.uk>,
"'Jacob Keller'" <jacob.e.keller@intel.com>,
"'Michal Swiatkowski'" <michal.swiatkowski@linux.intel.com>,
"'Kees Cook'" <kees@kernel.org>,
"'Larysa Zaremba'" <larysa.zaremba@intel.com>,
"'Joe Damato'" <joe@dama.to>,
"'Breno Leitao'" <leitao@debian.org>,
"'Aleksandr Loktionov'" <aleksandr.loktionov@intel.com>,
"'Uwe Kleine-König (The Capable Hub)'"
<u.kleine-koenig@baylibre.com>,
"'Fabio Baltieri'" <fabio.baltieri@gmail.com>,
"'Thomas Gleixner'" <tglx@kernel.org>,
"'Greg Kroah-Hartman'" <gregkh@linuxfoundation.org>,
netdev@vger.kernel.org,
"'Mengyuan Lou'" <mengyuanlou@net-swift.com>,
"'Andrew Lunn'" <andrew+netdev@lunn.ch>,
"'David S. Miller'" <davem@davemloft.net>,
"'Eric Dumazet'" <edumazet@google.com>,
"'Jakub Kicinski'" <kuba@kernel.org>,
"'Paolo Abeni'" <pabeni@redhat.com>,
"'Richard Cochran'" <richardcochran@gmail.com>,
"'Russell King'" <linux@armlinux.org.uk>,
"'Jacob Keller'" <jacob.e.keller@intel.com>,
"'Michal Swiatkowski'" <michal.swiatkowski@linux.intel.com>,
"'Kees Cook'" <kees@kernel.org>,
"'Larysa Zaremba'" <larysa.zaremba@intel.com>,
"'Joe Damato'" <joe@dama.to>,
"'Breno Leitao'" <leitao@debian.org>,
"'Aleksandr Loktionov'" <aleksandr.loktionov@intel.com>,
"'Uwe Kleine-König (The Capable Hub)'"
<u.kleine-koenig@baylibre.com>,
"'Fabio Baltieri'" <fabio.baltieri@gmail.com>,
"'Thomas Gleixner'" <tglx@kernel.org>,
"'Greg Kroah-Hartman'" <gregkh@linuxfoundation.org>
Subject: RE: [PATCH net-next v6 5/5] net: wangxun: add pcie error handler
Date: Mon, 15 Jun 2026 10:13:42 +0800 [thread overview]
Message-ID: <018201dcfc6c$96419460$c2c4bd20$@trustnetic.com> (raw)
In-Reply-To: <20260612154117.GC671640@horms.kernel.org>
On Fri, Jun 12, 2026 11:41 PM, Simon Horman wrote:
> On Wed, Jun 10, 2026 at 02:09:17PM +0800, Jiawen Wu wrote:
> > Support AER driver to handle the PCIe errors. Sometimes netdev watchdog
> > Tx timeout happens before the AER error report when a PCIe error occurs,
> > CPU blocking would be caused by MMIO during the reset process. To
> > prevent it, check PCIe error status in .ndo_tx_timeout. The current
> > function of ngbe is not yet fully developed, it will be completed in the
> > future.
> >
> > Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
>
> Hi Jiawen,
>
> There is AI-generated review of this patch-set available on both
> https://sashiko.dev and https://netdev-ai.bots.linux.dev/sashiko/
>
> I've tried to filter out only those that seem strictly relevant
> to the progress of this patch and have included them inline.
>
> I'd appreciate it if you could take a look over this.
>
> > diff --git a/drivers/net/ethernet/wangxun/libwx/wx_err.c b/drivers/net/ethernet/wangxun/libwx/wx_err.c
>
> ...
>
> > +/**
> > + * wx_io_slot_reset - called after the pci bus has been reset.
> > + * @pdev: Pointer to PCI device
> > + *
> > + * Return: pci_ers_result_t.
> > + *
> > + * Restart the card from scratch, as if from a cold-boot.
> > + */
> > +static pci_ers_result_t wx_io_slot_reset(struct pci_dev *pdev)
> > +{
> > + struct wx *wx = pci_get_drvdata(pdev);
> > + pci_ers_result_t result;
> > +
> > + if (pci_enable_device_mem(pdev)) {
> > + wx_err(wx, "Cannot re-enable PCI device after reset.\n");
> > + result = PCI_ERS_RESULT_DISCONNECT;
> > + } else {
> > + /* make all memory operations done before clearing the flag */
> > + smp_mb__before_atomic();
> > + clear_bit(WX_STATE_DISABLED, wx->state);
> > + clear_bit(WX_FLAG_NEED_PCIE_RECOVERY, wx->flags);
> > + pci_set_master(pdev);
> > + pci_restore_state(pdev);
> > + pci_wake_from_d3(pdev, false);
> > +
> > + rtnl_lock();
> > + if (netif_running(wx->netdev) && wx->down_suspend)
> > + wx->down_suspend(wx);
> > + if (wx->do_reset)
> > + wx->do_reset(wx->netdev, false);
>
> The AI-generated review on sashiko.dev flags the following.
>
> "Does this properly initialize the hardware MAC during PCIe AER
> slot reset recovery for the ngbe driver?
>
> "For txgbe, this correctly maps to txgbe_reset(), which calls
> txgbe_reset_hw(). However, for ngbe, wx->do_reset maps to
> ngbe_do_reset() which calls ngbe_reset().
>
> "Looking at ngbe_reset():
>
> "drivers/net/ethernet/wangxun/ngbe/ngbe_main.c:ngbe_reset() {
> if (test_bit(WX_FLAG_NEED_PCIE_RECOVERY, wx->flags))
> return;
> wx_flush_sw_mac_table(wx);
> wx_mac_set_default_filter(wx, wx->mac.addr);
> if (test_bit(WX_STATE_PTP_RUNNING, wx->state))
> wx_ptp_reset(wx);
>
> "This skips ngbe_reset_hw() and fails to fully reset the hardware.
> Since a PCIe slot reset clears the device's internal state, when
> wx_io_resume() subsequently calls ndo_open(), could the device fail
> to function because its hardware registers and PHY state were never
> reset?
As stated in the commit message, ngbe is not yet fully implemented due to some
existing issues with its reset mechanism. These will be refined in a future patch.
> > + rtnl_unlock();
> > + result = PCI_ERS_RESULT_RECOVERED;
> > + }
> > +
> > + pci_aer_clear_nonfatal_status(pdev);
> > +
> > + return result;
> > +}
>
> ...
>
> > +static void wx_tx_timeout_recovery(struct wx *wx)
> > +{
> > + /*
> > + * When a PCIe hardware error occurs, the driver should initiate a PCIe
> > + * recovery mechanism. However, this recovery flow relies on the AER
> > + * driver for current kernel policy. Therefore, a self-contained
> > + * recovery mechanism is not implemented yet.
> > + */
> > + set_bit(WX_FLAG_NEED_PCIE_RECOVERY, wx->flags);
> > + wx_err(wx, "PCIe error detected during tx timeout\n");
> > + queue_work(wx->reset_wq, &wx->reset_task);
> > +}
>
> The AI-generated review on https://netdev-ai.bots.linux.dev/sashiko/ flags:
>
> "What clears WX_FLAG_NEED_PCIE_RECOVERY if AER never follows up?
> wx_tx_timeout_recovery() sets the flag and queues reset_task.
> reset_task only calls wx_soft_quiesce() and does not clear the
> flag:
>
> if (test_bit(WX_FLAG_NEED_PCIE_RECOVERY, wx->flags))
> wx_soft_quiesce(wx);
>
> "The flag is cleared only on the success branch of
> wx_io_slot_reset(). If the PCIe error never escalates to AER (AER
> disabled in firmware or kernel, error not reportable, slot_reset
> returns PCI_ERS_RESULT_DISCONNECT), the flag remains set
> indefinitely.
>
> "Both ngbe_reset() and txgbe_reset() now early-return on this flag:
>
> static void ngbe_reset(struct wx *wx)
> {
> if (test_bit(WX_FLAG_NEED_PCIE_RECOVERY, wx->flags))
> return;
> ...
> }
>
> "so any subsequent reset path (ethtool reset, ndo_set_features
> triggered reset, watchdog timeout reset, FDIR reinit,
> ngbe_reinit_locked / txgbe_reinit_locked) is silently a no-op. The
> hardware reset capability is effectively disabled until the module
> is reloaded.
>
> "Should wx_tx_timeout_recovery() (or reset_task after the quiesce)
> clear the flag once the soft-quiesce completes, so the driver's
> reset machinery remains functional when AER does not follow up?
If a PCIe error happens and AER is unresponsive, the driver currently lacks
alternative recovery mechanisms. This renders the hardware unusable, making
any further reset operations disallowed.
prev parent reply other threads:[~2026-06-15 2:15 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-10 6:09 [PATCH net-next v6 0/5] net: wangxun: timeout and error Jiawen Wu
2026-06-10 6:09 ` [PATCH net-next v6 1/5] net: ngbe: implement libwx reset ops Jiawen Wu
2026-06-10 6:09 ` [PATCH net-next v6 2/5] net: wangxun: add Tx timeout process Jiawen Wu
2026-06-10 6:09 ` [PATCH net-next v6 3/5] net: wangxun: add reinit parameter to wx->do_reset callback Jiawen Wu
2026-06-10 6:09 ` [PATCH net-next v6 4/5] net: wangxun: implement soft quiesce for PCIe error recovery Jiawen Wu
2026-06-12 15:49 ` Simon Horman
2026-06-15 3:06 ` Jiawen Wu
2026-06-10 6:09 ` [PATCH net-next v6 5/5] net: wangxun: add pcie error handler Jiawen Wu
2026-06-12 15:41 ` Simon Horman
2026-06-15 2:13 ` Jiawen Wu [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='018201dcfc6c$96419460$c2c4bd20$@trustnetic.com' \
--to=jiawenwu@trustnetic.com \
--cc=aleksandr.loktionov@intel.com \
--cc=andrew+netdev@lunn.ch \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=fabio.baltieri@gmail.com \
--cc=gregkh@linuxfoundation.org \
--cc=horms@kernel.org \
--cc=jacob.e.keller@intel.com \
--cc=joe@dama.to \
--cc=kees@kernel.org \
--cc=kuba@kernel.org \
--cc=larysa.zaremba@intel.com \
--cc=leitao@debian.org \
--cc=linux@armlinux.org.uk \
--cc=mengyuanlou@net-swift.com \
--cc=michal.swiatkowski@linux.intel.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=richardcochran@gmail.com \
--cc=tglx@kernel.org \
--cc=u.kleine-koenig@baylibre.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox