From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8D9AB35E549 for ; Sun, 3 May 2026 02:15:40 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777774540; cv=none; b=OnDOcZYYMw0zm9HtHaVw6aQ+gnQZV8I04thSnRIbKp6A8POdVpqxIOYLZkmyzh6kRB0/pc2pceGiGWBDeQ37mc34Yf+KqLLfqr0u5bC3CeVmB5N77cKzjxh+SGDG7UX7p3j/W4SbPngptUzG5gsg3Anqd+Xya7gdjRrK/U8y4lI= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777774540; c=relaxed/simple; bh=XcIkrF9u8KEvBcFwQuOzXM8O898ziH5rvIIDFJaig0c=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Lmo9iZMh1mVR1XR4gmu7kzuAOVP6g/4mo9u912uLE1RgAhDx77t6j/9zrpuZPHQd2b8n8X8120l13PPTMCuwfckdmGAFRcx10GWibDLU6t9vxcDDO407hI0gBGVNwnMqNflZDhEzVFV4Kltz51JFiWgiKaOHDXY4C0Ed0gnaE5Y= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=slI1z2sl; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="slI1z2sl" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 79186C19425; Sun, 3 May 2026 02:15:39 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1777774540; bh=XcIkrF9u8KEvBcFwQuOzXM8O898ziH5rvIIDFJaig0c=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=slI1z2slKAhZnLWsMYeu3pu9OphAl8v1dMPebwvW05KWQXWk3+RUqp1j74PZs336y O3Nj1QxUM0ERFlkV/8B9dfJoTC1j6fGn4i1wT/4MZ1swlFA8LlqyaCLibR+8XyIYqD PhDvVTvKyMsF8mdf6zP6heMimsHpyifhqTK5k1Ey7wfUGlz2lhx1O5C9GQrT20Ecjb Br2ws6zgn1llHuKoyNWE8dgrrnqxmgOaXCpncdoYMwSu5P9ty/fIUPf9oLCwtXffV6 Qa6yIw7zpgAQPZcesYbpWkgqQwpDuiyI6vEb7iOVEm1N3bRbV1a5M85Tma7tDOtSyL pu7PC8/pz1Uww== From: Jakub Kicinski To: jiawenwu@trustnetic.com Cc: Jakub Kicinski , netdev@vger.kernel.org, mengyuanlou@net-swift.com, andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com, pabeni@redhat.com, richardcochran@gmail.com, linux@armlinux.org.uk, horms@kernel.org, kees@kernel.org, larysa.zaremba@intel.com, leitao@debian.org, joe@dama.to, jacob.e.keller@intel.com, fabio.baltieri@gmail.com Subject: Re: [PATCH net-next v2 6/6] net: wangxun: implement pci_error_handlers ops Date: Sat, 2 May 2026 19:15:38 -0700 Message-ID: <20260503021538.4127382-1-kuba@kernel.org> X-Mailer: git-send-email 2.54.0 In-Reply-To: <20260430082517.19612-7-jiawenwu@trustnetic.com> References: <20260430082517.19612-7-jiawenwu@trustnetic.com> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: 8bit This is an AI-generated review of your patch. The human sending this email has considered the AI review valid, or at least plausible. --- net: wangxun: implement pci_error_handlers ops Support AER driver to handle the PCIe errors. > diff --git a/drivers/net/ethernet/wangxun/libwx/wx_err.c b/drivers/net/ethernet/wangxun/libwx/wx_err.c > --- a/drivers/net/ethernet/wangxun/libwx/wx_err.c > +++ b/drivers/net/ethernet/wangxun/libwx/wx_err.c [ ... ] > +static pci_ers_result_t wx_io_error_detected(struct pci_dev *pdev, > + pci_channel_state_t state) > +{ > + struct wx *wx = pci_get_drvdata(pdev); > + struct net_device *netdev; > + > + netdev = wx->netdev; If an AER event triggers during ngbe_probe() or txgbe_probe() before pci_set_drvdata() is called, wx will be NULL here. Could this lead to a NULL pointer dereference when accessing wx->netdev? Should the handler validate the wx pointer, or should pci_set_drvdata() be moved earlier in the probe sequence? > + if (!netif_device_present(netdev)) > + return PCI_ERS_RESULT_DISCONNECT; > + > + rtnl_lock(); > + netif_device_detach(netdev); > + > + if (netif_running(netdev)) > + wx->close_suspend(wx); > + > + if (state == pci_channel_io_perm_failure) { > + rtnl_unlock(); > + return PCI_ERS_RESULT_DISCONNECT; > + } Does calling wx->close_suspend(wx) before checking for pci_channel_io_perm_failure cause illegal MMIO accesses? If the device is in a permanent failure state, it is physically disconnected, and MMIO accesses could result in Master Aborts or system lockups. Should the check for permanent failure precede the hardware teardown? > + if (!test_and_set_bit(WX_STATE_DISABLED, wx->state)) > + pci_disable_device(pdev); > + rtnl_unlock(); > + > + /* Request a slot reset. */ > + return PCI_ERS_RESULT_NEED_RESET; > +} Should the background service task (wx->service_task) be canceled here using cancel_work_sync()? Since WX_STATE_DISABLED is not checked within the service task's execution path, it appears the task could wake up and attempt to interact with the device while its MMIO space is disabled. [ ... ] > +static pci_ers_result_t wx_io_slot_reset(struct pci_dev *pdev) > +{ > + struct wx *wx = pci_get_drvdata(pdev); > + pci_ers_result_t result; > + > + if (pci_enable_device_mem(pdev)) { > + wx_err(wx, "Cannot re-enable PCI device after reset.\n"); > + result = PCI_ERS_RESULT_DISCONNECT; > + } else { > + /* make all bar access done before reset. */ > + smp_mb__before_atomic(); > + clear_bit(WX_STATE_DISABLED, wx->state); > + pci_set_master(pdev); > + pci_restore_state(pdev); > + pci_wake_from_d3(pdev, false); > + > + wx->do_reset(wx->netdev, false); Is it safe to call wx->do_reset() here without holding the rtnl_lock()? Other paths that reset the hardware (such as wx_reset_subtask()) appear to acquire rtnl_lock() to serialize against concurrent network configurations. Could failing to hold the lock here allow concurrent userspace requests to access hardware registers while the device is in a reset state? > + result = PCI_ERS_RESULT_RECOVERED; > + } > + > + pci_aer_clear_nonfatal_status(pdev); > + > + return result; > +} [ ... ] > +static void wx_io_resume(struct pci_dev *pdev) > +{ > + struct wx *wx = pci_get_drvdata(pdev); > + struct net_device *netdev; > + int err; > + > + netdev = wx->netdev; > + rtnl_lock(); > + if (netif_running(netdev)) { > + err = netdev->netdev_ops->ndo_open(netdev); By calling ndo_open() directly instead of dev_open(), is there a risk of bypassing the core network state machine? If ndo_open() fails, it cleans up its allocations, but the core network stack's __LINK_STATE_START bit remains set. If a user subsequently administratively downs the interface, the network core might unconditionally call ndo_stop(), potentially leading to a double-free when it attempts to clean up the already-freed resources. > + if (err) { > + wx_err(wx, "Failed to open netdev after reset\n"); > + goto out; > + } > + } > + netif_device_attach(netdev); > +out: > + rtnl_unlock(); > +} [ ... ] > diff --git a/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c b/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c > --- a/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c > +++ b/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c [ ... ] > @@ -900,6 +901,7 @@ static int txgbe_probe(struct pci_dev *pdev, > goto err_remove_phy; > > pci_set_drvdata(pdev, wx); > + pci_save_state(pdev); > > netif_tx_stop_all_queues(netdev); This isn't a bug introduced by this patch, but does the ordering here expose the interface to userspace before queue initialization? Since register_netdev() is called prior to this code, the interface becomes visible to userspace. If a management daemon immediately brings the interface UP, invoking ndo_open() and waking the TX queues, could this netif_tx_stop_all_queues() erroneously stop the queues of the logically UP interface, resulting in a silent TX hang?