All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Jiawen Wu" <jiawenwu@trustnetic.com>
To: "'Jakub Kicinski'" <kuba@kernel.org>
Cc: <netdev@vger.kernel.org>, <mengyuanlou@net-swift.com>,
	<andrew+netdev@lunn.ch>, <davem@davemloft.net>,
	<edumazet@google.com>, <pabeni@redhat.com>,
	<richardcochran@gmail.com>, <linux@armlinux.org.uk>,
	<horms@kernel.org>, <kees@kernel.org>, <larysa.zaremba@intel.com>,
	<leitao@debian.org>, <joe@dama.to>, <jacob.e.keller@intel.com>,
	<fabio.baltieri@gmail.com>, <netdev@vger.kernel.org>,
	<mengyuanlou@net-swift.com>, <andrew+netdev@lunn.ch>,
	<davem@davemloft.net>, <edumazet@google.com>, <pabeni@redhat.com>,
	<richardcochran@gmail.com>, <linux@armlinux.org.uk>,
	<horms@kernel.org>, <kees@kernel.org>, <larysa.zaremba@intel.com>,
	<leitao@debian.org>, <joe@dama.to>, <jacob.e.keller@intel.com>,
	<fabio.baltieri@gmail.com>
Subject: RE: [PATCH net-next v2 6/6] net: wangxun: implement pci_error_handlers ops
Date: Sat, 9 May 2026 16:29:35 +0800	[thread overview]
Message-ID: <065501dcdf8d$f8473030$e8d59090$@trustnetic.com> (raw)
In-Reply-To: <20260503021538.4127382-1-kuba@kernel.org>

On Sun, May 3, 2026 10:16 AM, Jakub Kicinski wrote:
> This is an AI-generated review of your patch. The human sending this
> email has considered the AI review valid, or at least plausible.
> ---
> net: wangxun: implement pci_error_handlers ops
> 
> Support AER driver to handle the PCIe errors.
> 
> > diff --git a/drivers/net/ethernet/wangxun/libwx/wx_err.c b/drivers/net/ethernet/wangxun/libwx/wx_err.c
> > --- a/drivers/net/ethernet/wangxun/libwx/wx_err.c
> > +++ b/drivers/net/ethernet/wangxun/libwx/wx_err.c
> [ ... ]
> > +static pci_ers_result_t wx_io_error_detected(struct pci_dev *pdev,
> > +					     pci_channel_state_t state)
> > +{
> > +	struct wx *wx = pci_get_drvdata(pdev);
> > +	struct net_device *netdev;
> > +
> > +	netdev = wx->netdev;
> 
> If an AER event triggers during ngbe_probe() or txgbe_probe() before
> pci_set_drvdata() is called, wx will be NULL here.
> 
> Could this lead to a NULL pointer dereference when accessing wx->netdev?
> 
> Should the handler validate the wx pointer, or should pci_set_drvdata()
> be moved earlier in the probe sequence?

I'll validate the wx pointer first.

> 
> > +	if (!netif_device_present(netdev))
> > +		return PCI_ERS_RESULT_DISCONNECT;
> > +
> > +	rtnl_lock();
> > +	netif_device_detach(netdev);
> > +
> > +	if (netif_running(netdev))
> > +		wx->close_suspend(wx);
> > +
> > +	if (state == pci_channel_io_perm_failure) {
> > +		rtnl_unlock();
> > +		return PCI_ERS_RESULT_DISCONNECT;
> > +	}
> 
> Does calling wx->close_suspend(wx) before checking for
> pci_channel_io_perm_failure cause illegal MMIO accesses?
> 
> If the device is in a permanent failure state, it is physically
> disconnected, and MMIO accesses could result in Master Aborts or system
> lockups.
> 
> Should the check for permanent failure precede the hardware teardown?

I'll fix it.

> 
> > +	if (!test_and_set_bit(WX_STATE_DISABLED, wx->state))
> > +		pci_disable_device(pdev);
> > +	rtnl_unlock();
> > +
> > +	/* Request a slot reset. */
> > +	return PCI_ERS_RESULT_NEED_RESET;
> > +}
> 
> Should the background service task (wx->service_task) be canceled here
> using cancel_work_sync()?
> 
> Since WX_STATE_DISABLED is not checked within the service task's
> execution path, it appears the task could wake up and attempt to interact
> with the device while its MMIO space is disabled.

Service task is canceled in the device close path, and no event triggers it
when netif is not running.

> 
> [ ... ]
> > +static pci_ers_result_t wx_io_slot_reset(struct pci_dev *pdev)
> > +{
> > +	struct wx *wx = pci_get_drvdata(pdev);
> > +	pci_ers_result_t result;
> > +
> > +	if (pci_enable_device_mem(pdev)) {
> > +		wx_err(wx, "Cannot re-enable PCI device after reset.\n");
> > +		result = PCI_ERS_RESULT_DISCONNECT;
> > +	} else {
> > +		/* make all bar access done before reset. */
> > +		smp_mb__before_atomic();
> > +		clear_bit(WX_STATE_DISABLED, wx->state);
> > +		pci_set_master(pdev);
> > +		pci_restore_state(pdev);
> > +		pci_wake_from_d3(pdev, false);
> > +
> > +		wx->do_reset(wx->netdev, false);
> 
> Is it safe to call wx->do_reset() here without holding the rtnl_lock()?
> 
> Other paths that reset the hardware (such as wx_reset_subtask()) appear
> to acquire rtnl_lock() to serialize against concurrent network
> configurations.

I think there is no need to acquire rtnl_lock().

> 
> Could failing to hold the lock here allow concurrent userspace requests
> to access hardware registers while the device is in a reset state?
> 
> > +		result = PCI_ERS_RESULT_RECOVERED;
> > +	}
> > +
> > +	pci_aer_clear_nonfatal_status(pdev);
> > +
> > +	return result;
> > +}
> [ ... ]
> > +static void wx_io_resume(struct pci_dev *pdev)
> > +{
> > +	struct wx *wx = pci_get_drvdata(pdev);
> > +	struct net_device *netdev;
> > +	int err;
> > +
> > +	netdev = wx->netdev;
> > +	rtnl_lock();
> > +	if (netif_running(netdev)) {
> > +		err = netdev->netdev_ops->ndo_open(netdev);
> 
> By calling ndo_open() directly instead of dev_open(), is there a risk of
> bypassing the core network state machine?
> 
> If ndo_open() fails, it cleans up its allocations, but the core network
> stack's __LINK_STATE_START bit remains set.
> 
> If a user subsequently administratively downs the interface, the network
> core might unconditionally call ndo_stop(), potentially leading to a
> double-free when it attempts to clean up the already-freed resources.

If ndo_open() fails, the hardware state must have been wrong beyond
recovery. At this point, all further actions are invalid.

> 
> > +		if (err) {
> > +			wx_err(wx, "Failed to open netdev after reset\n");
> > +			goto out;
> > +		}
> > +	}
> > +	netif_device_attach(netdev);
> > +out:
> > +	rtnl_unlock();
> > +}
> [ ... ]
> > diff --git a/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c b/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c
> > --- a/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c
> > +++ b/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c
> [ ... ]
> > @@ -900,6 +901,7 @@ static int txgbe_probe(struct pci_dev *pdev,
> >  		goto err_remove_phy;
> >
> >  	pci_set_drvdata(pdev, wx);
> > +	pci_save_state(pdev);
> >
> >  	netif_tx_stop_all_queues(netdev);
> 
> This isn't a bug introduced by this patch, but does the ordering here
> expose the interface to userspace before queue initialization?
> 
> Since register_netdev() is called prior to this code, the interface
> becomes visible to userspace.
> 
> If a management daemon immediately brings the interface UP, invoking
> ndo_open() and waking the TX queues, could this netif_tx_stop_all_queues()
> erroneously stop the queues of the logically UP interface, resulting in a
> silent TX hang?

I'll consider this in another patch.
Thanks.


      reply	other threads:[~2026-05-09  8:30 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-30  8:25 [PATCH net-next v2 0/6] net: wangxun: timeout and error Jiawen Wu
2026-04-30  8:25 ` [PATCH net-next v2 1/6] net: ngbe: implement libwx reset ops Jiawen Wu
2026-05-03  2:15   ` Jakub Kicinski
2026-05-06  9:05     ` Jiawen Wu
2026-04-30  8:25 ` [PATCH net-next v2 2/6] net: wangxun: add Tx timeout process Jiawen Wu
2026-05-03  2:15   ` Jakub Kicinski
2026-04-30  8:25 ` [PATCH net-next v2 3/6] net: wangxun: add reinit parameter to wx->do_reset callback Jiawen Wu
2026-04-30  8:25 ` [PATCH net-next v2 4/6] net: wangxun: extract the close_suspend sequence Jiawen Wu
2026-05-03  2:15   ` Jakub Kicinski
2026-04-30  8:25 ` [PATCH net-next v2 5/6] net: wangxun: clear stored DMA addresses after dma_free_coherent() Jiawen Wu
2026-05-03  2:15   ` Jakub Kicinski
2026-05-08  8:43     ` Jiawen Wu
2026-04-30  8:25 ` [PATCH net-next v2 6/6] net: wangxun: implement pci_error_handlers ops Jiawen Wu
2026-05-03  2:15   ` Jakub Kicinski
2026-05-09  8:29     ` Jiawen Wu [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='065501dcdf8d$f8473030$e8d59090$@trustnetic.com' \
    --to=jiawenwu@trustnetic.com \
    --cc=andrew+netdev@lunn.ch \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=fabio.baltieri@gmail.com \
    --cc=horms@kernel.org \
    --cc=jacob.e.keller@intel.com \
    --cc=joe@dama.to \
    --cc=kees@kernel.org \
    --cc=kuba@kernel.org \
    --cc=larysa.zaremba@intel.com \
    --cc=leitao@debian.org \
    --cc=linux@armlinux.org.uk \
    --cc=mengyuanlou@net-swift.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=richardcochran@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.