From: "Jiawen Wu" <jiawenwu@trustnetic.com>
To: "'Jakub Kicinski'" <kuba@kernel.org>
Cc: <netdev@vger.kernel.org>, <mengyuanlou@net-swift.com>,
<andrew+netdev@lunn.ch>, <davem@davemloft.net>,
<edumazet@google.com>, <pabeni@redhat.com>,
<richardcochran@gmail.com>, <linux@armlinux.org.uk>,
<horms@kernel.org>, <kees@kernel.org>, <larysa.zaremba@intel.com>,
<leitao@debian.org>, <joe@dama.to>, <jacob.e.keller@intel.com>,
<fabio.baltieri@gmail.com>, <netdev@vger.kernel.org>,
<mengyuanlou@net-swift.com>, <andrew+netdev@lunn.ch>,
<davem@davemloft.net>, <edumazet@google.com>, <pabeni@redhat.com>,
<richardcochran@gmail.com>, <linux@armlinux.org.uk>,
<horms@kernel.org>, <kees@kernel.org>, <larysa.zaremba@intel.com>,
<leitao@debian.org>, <joe@dama.to>, <jacob.e.keller@intel.com>,
<fabio.baltieri@gmail.com>
Subject: RE: [PATCH net-next v2 6/6] net: wangxun: implement pci_error_handlers ops
Date: Sat, 9 May 2026 16:29:35 +0800 [thread overview]
Message-ID: <065501dcdf8d$f8473030$e8d59090$@trustnetic.com> (raw)
In-Reply-To: <20260503021538.4127382-1-kuba@kernel.org>
On Sun, May 3, 2026 10:16 AM, Jakub Kicinski wrote:
> This is an AI-generated review of your patch. The human sending this
> email has considered the AI review valid, or at least plausible.
> ---
> net: wangxun: implement pci_error_handlers ops
>
> Support AER driver to handle the PCIe errors.
>
> > diff --git a/drivers/net/ethernet/wangxun/libwx/wx_err.c b/drivers/net/ethernet/wangxun/libwx/wx_err.c
> > --- a/drivers/net/ethernet/wangxun/libwx/wx_err.c
> > +++ b/drivers/net/ethernet/wangxun/libwx/wx_err.c
> [ ... ]
> > +static pci_ers_result_t wx_io_error_detected(struct pci_dev *pdev,
> > + pci_channel_state_t state)
> > +{
> > + struct wx *wx = pci_get_drvdata(pdev);
> > + struct net_device *netdev;
> > +
> > + netdev = wx->netdev;
>
> If an AER event triggers during ngbe_probe() or txgbe_probe() before
> pci_set_drvdata() is called, wx will be NULL here.
>
> Could this lead to a NULL pointer dereference when accessing wx->netdev?
>
> Should the handler validate the wx pointer, or should pci_set_drvdata()
> be moved earlier in the probe sequence?
I'll validate the wx pointer first.
>
> > + if (!netif_device_present(netdev))
> > + return PCI_ERS_RESULT_DISCONNECT;
> > +
> > + rtnl_lock();
> > + netif_device_detach(netdev);
> > +
> > + if (netif_running(netdev))
> > + wx->close_suspend(wx);
> > +
> > + if (state == pci_channel_io_perm_failure) {
> > + rtnl_unlock();
> > + return PCI_ERS_RESULT_DISCONNECT;
> > + }
>
> Does calling wx->close_suspend(wx) before checking for
> pci_channel_io_perm_failure cause illegal MMIO accesses?
>
> If the device is in a permanent failure state, it is physically
> disconnected, and MMIO accesses could result in Master Aborts or system
> lockups.
>
> Should the check for permanent failure precede the hardware teardown?
I'll fix it.
>
> > + if (!test_and_set_bit(WX_STATE_DISABLED, wx->state))
> > + pci_disable_device(pdev);
> > + rtnl_unlock();
> > +
> > + /* Request a slot reset. */
> > + return PCI_ERS_RESULT_NEED_RESET;
> > +}
>
> Should the background service task (wx->service_task) be canceled here
> using cancel_work_sync()?
>
> Since WX_STATE_DISABLED is not checked within the service task's
> execution path, it appears the task could wake up and attempt to interact
> with the device while its MMIO space is disabled.
Service task is canceled in the device close path, and no event triggers it
when netif is not running.
>
> [ ... ]
> > +static pci_ers_result_t wx_io_slot_reset(struct pci_dev *pdev)
> > +{
> > + struct wx *wx = pci_get_drvdata(pdev);
> > + pci_ers_result_t result;
> > +
> > + if (pci_enable_device_mem(pdev)) {
> > + wx_err(wx, "Cannot re-enable PCI device after reset.\n");
> > + result = PCI_ERS_RESULT_DISCONNECT;
> > + } else {
> > + /* make all bar access done before reset. */
> > + smp_mb__before_atomic();
> > + clear_bit(WX_STATE_DISABLED, wx->state);
> > + pci_set_master(pdev);
> > + pci_restore_state(pdev);
> > + pci_wake_from_d3(pdev, false);
> > +
> > + wx->do_reset(wx->netdev, false);
>
> Is it safe to call wx->do_reset() here without holding the rtnl_lock()?
>
> Other paths that reset the hardware (such as wx_reset_subtask()) appear
> to acquire rtnl_lock() to serialize against concurrent network
> configurations.
I think there is no need to acquire rtnl_lock().
>
> Could failing to hold the lock here allow concurrent userspace requests
> to access hardware registers while the device is in a reset state?
>
> > + result = PCI_ERS_RESULT_RECOVERED;
> > + }
> > +
> > + pci_aer_clear_nonfatal_status(pdev);
> > +
> > + return result;
> > +}
> [ ... ]
> > +static void wx_io_resume(struct pci_dev *pdev)
> > +{
> > + struct wx *wx = pci_get_drvdata(pdev);
> > + struct net_device *netdev;
> > + int err;
> > +
> > + netdev = wx->netdev;
> > + rtnl_lock();
> > + if (netif_running(netdev)) {
> > + err = netdev->netdev_ops->ndo_open(netdev);
>
> By calling ndo_open() directly instead of dev_open(), is there a risk of
> bypassing the core network state machine?
>
> If ndo_open() fails, it cleans up its allocations, but the core network
> stack's __LINK_STATE_START bit remains set.
>
> If a user subsequently administratively downs the interface, the network
> core might unconditionally call ndo_stop(), potentially leading to a
> double-free when it attempts to clean up the already-freed resources.
If ndo_open() fails, the hardware state must have been wrong beyond
recovery. At this point, all further actions are invalid.
>
> > + if (err) {
> > + wx_err(wx, "Failed to open netdev after reset\n");
> > + goto out;
> > + }
> > + }
> > + netif_device_attach(netdev);
> > +out:
> > + rtnl_unlock();
> > +}
> [ ... ]
> > diff --git a/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c b/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c
> > --- a/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c
> > +++ b/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c
> [ ... ]
> > @@ -900,6 +901,7 @@ static int txgbe_probe(struct pci_dev *pdev,
> > goto err_remove_phy;
> >
> > pci_set_drvdata(pdev, wx);
> > + pci_save_state(pdev);
> >
> > netif_tx_stop_all_queues(netdev);
>
> This isn't a bug introduced by this patch, but does the ordering here
> expose the interface to userspace before queue initialization?
>
> Since register_netdev() is called prior to this code, the interface
> becomes visible to userspace.
>
> If a management daemon immediately brings the interface UP, invoking
> ndo_open() and waking the TX queues, could this netif_tx_stop_all_queues()
> erroneously stop the queues of the logically UP interface, resulting in a
> silent TX hang?
I'll consider this in another patch.
Thanks.
prev parent reply other threads:[~2026-05-09 8:30 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-30 8:25 [PATCH net-next v2 0/6] net: wangxun: timeout and error Jiawen Wu
2026-04-30 8:25 ` [PATCH net-next v2 1/6] net: ngbe: implement libwx reset ops Jiawen Wu
2026-05-03 2:15 ` Jakub Kicinski
2026-05-06 9:05 ` Jiawen Wu
2026-04-30 8:25 ` [PATCH net-next v2 2/6] net: wangxun: add Tx timeout process Jiawen Wu
2026-05-03 2:15 ` Jakub Kicinski
2026-04-30 8:25 ` [PATCH net-next v2 3/6] net: wangxun: add reinit parameter to wx->do_reset callback Jiawen Wu
2026-04-30 8:25 ` [PATCH net-next v2 4/6] net: wangxun: extract the close_suspend sequence Jiawen Wu
2026-05-03 2:15 ` Jakub Kicinski
2026-04-30 8:25 ` [PATCH net-next v2 5/6] net: wangxun: clear stored DMA addresses after dma_free_coherent() Jiawen Wu
2026-05-03 2:15 ` Jakub Kicinski
2026-05-08 8:43 ` Jiawen Wu
2026-04-30 8:25 ` [PATCH net-next v2 6/6] net: wangxun: implement pci_error_handlers ops Jiawen Wu
2026-05-03 2:15 ` Jakub Kicinski
2026-05-09 8:29 ` Jiawen Wu [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='065501dcdf8d$f8473030$e8d59090$@trustnetic.com' \
--to=jiawenwu@trustnetic.com \
--cc=andrew+netdev@lunn.ch \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=fabio.baltieri@gmail.com \
--cc=horms@kernel.org \
--cc=jacob.e.keller@intel.com \
--cc=joe@dama.to \
--cc=kees@kernel.org \
--cc=kuba@kernel.org \
--cc=larysa.zaremba@intel.com \
--cc=leitao@debian.org \
--cc=linux@armlinux.org.uk \
--cc=mengyuanlou@net-swift.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=richardcochran@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox