public inbox for stable@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH iwl-net v2] idpf: do not perform flow ops when netdev is detached
@ 2026-04-21  5:16 Li Li
  2026-04-23 16:33 ` Simon Horman
  0 siblings, 1 reply; 2+ messages in thread
From: Li Li @ 2026-04-21  5:16 UTC (permalink / raw)
  To: Tony Nguyen, Przemek Kitszel, David S. Miller, Jakub Kicinski,
	Eric Dumazet, intel-wired-lan
  Cc: netdev, linux-kernel, David Decotigny, Anjali Singhai,
	Sridhar Samudrala, Brian Vazquez, Li Li, emil.s.tantilov, stable

Even though commit 2e281e1155fc ("idpf: detach and close netdevs while
handling a reset") prevents ethtool -N/-n operations to operate on
detached netdevs, we found that out-of-tree workflows like OpenOnload
can bypass ethtool core locks and call idpf_set_rxnfc directly during
an idpf HW reset. When this happens, we could get kernel crashes like
the following:

[ 4045.787439] BUG: kernel NULL pointer dereference, address: 0000000000000070
[ 4045.794420] #PF: supervisor read access in kernel mode
[ 4045.799580] #PF: error_code(0x0000) - not-present page
[ 4045.804739] PGD 0
[ 4045.806772] Oops: Oops: 0000 [#1] SMP NOPTI
...
[ 4045.836425] Workqueue: onload-wqueue oof_do_deferred_work_fn [onload]
[ 4045.842926] RIP: 0010:idpf_del_flow_steer+0x24/0x170 [idpf]
...
[ 4045.946323] Call Trace:
[ 4045.948796]  <TASK>
[ 4045.950915]  ? show_trace_log_lvl+0x1b0/0x2f0
[ 4045.955293]  ? show_trace_log_lvl+0x1b0/0x2f0
[ 4045.959672]  ? idpf_set_rxnfc+0x6f/0x80 [idpf]
[ 4046.063613]  </TASK>

To prevent this, we need to add checks in idpf_set_rxnfc and
idpf_get_rxnfc to error out if the netdev is already detached.

Tested: synthetically forced idpf into a HW reset by introducing module
parameters to simulate a Tx timeout and force virtual channel
initialization failure. This was done by skipping completion cleaning for
specific queues and returning -EIO during core initialization.
The failure was then triggered by writing 1 to the corresponding sysfs
parameters and calling idpf_get_rxnfc() during the reset process.

Without the patch: encountered NULL pointer and kernel crash.

With the patch: no crashes.

Fixes: 2e281e1155fc ("idpf: detach and close netdevs while handling a reset")
Cc: stable@vger.kernel.org
Signed-off-by: Li Li <boolli@google.com>
---
v2:
- Removed the raw code block from the commit message and replaced it with
  a textual description of the test modifications.

 drivers/net/ethernet/intel/idpf/idpf_ethtool.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/intel/idpf/idpf_ethtool.c b/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
index bb99d9e7c65d..8368a7e6a754 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
@@ -43,6 +43,9 @@ static int idpf_get_rxnfc(struct net_device *netdev, struct ethtool_rxnfc *cmd,
 	unsigned int cnt = 0;
 	int err = 0;
 
+	if (!netdev || !netif_device_present(netdev))
+		return -ENODEV;
+
 	idpf_vport_ctrl_lock(netdev);
 	vport = idpf_netdev_to_vport(netdev);
 	vport_config = np->adapter->vport_config[np->vport_idx];
@@ -349,6 +352,9 @@ static int idpf_set_rxnfc(struct net_device *netdev, struct ethtool_rxnfc *cmd)
 {
 	int ret = -EOPNOTSUPP;
 
+	if (!netdev || !netif_device_present(netdev))
+		return -ENODEV;
+
 	idpf_vport_ctrl_lock(netdev);
 	switch (cmd->cmd) {
 	case ETHTOOL_SRXCLSRLINS:
-- 
2.54.0.rc1.555.g9c883467ad-goog


^ permalink raw reply related	[flat|nested] 2+ messages in thread

* Re: [PATCH iwl-net v2] idpf: do not perform flow ops when netdev is detached
  2026-04-21  5:16 [PATCH iwl-net v2] idpf: do not perform flow ops when netdev is detached Li Li
@ 2026-04-23 16:33 ` Simon Horman
  0 siblings, 0 replies; 2+ messages in thread
From: Simon Horman @ 2026-04-23 16:33 UTC (permalink / raw)
  To: boolli
  Cc: 'Simon Horman', anthony.l.nguyen, przemyslaw.kitszel,
	davem, kuba, edumazet, intel-wired-lan, netdev, linux-kernel,
	decot, anjali.singhai, sridhar.samudrala, brianvv,
	emil.s.tantilov, stable

From: 'Simon Horman' <horms@kernel.org>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://sashiko.dev
---
idpf: do not perform flow ops when netdev is detached

This commit adds checks to prevent crashes when out-of-tree modules bypass
ethtool core locks and call flow operations on detached netdevs during a
hardware reset. It introduces netdev and netif_device_present checks in
idpf_get_rxnfc and idpf_set_rxnfc.

> Even though commit 2e281e1155fc ("idpf: detach and close netdevs while
> handling a reset") prevents ethtool -N/-n operations to operate on
> detached netdevs, we found that out-of-tree workflows like OpenOnload
> can bypass ethtool core locks and call idpf_set_rxnfc directly during
> an idpf HW reset.

Is it acceptable to add defensive checks upstream solely to accommodate
out-of-tree modules that bypass core kernel locking?

The core ethtool infrastructure manages locks like rtnl_lock for in-tree
callers to guarantee safety. Modifying the upstream driver to paper over
locking violations in out-of-tree modules might contradict upstream policies.

Simon says: If this is to address OOT module behaviour then I am concerned.
            And I would suggest that the correct approach is to fix
	    the OOT kernel module.

> diff --git a/drivers/net/ethernet/intel/idpf/idpf_ethtool.c b/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
> index bb99d9e7c65d5..8368a7e6a7544 100644
> --- a/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
> +++ b/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
> @@ -43,6 +43,9 @@ static int idpf_get_rxnfc(struct net_device *netdev, struct ethtool_rxnfc *cmd,
>  	unsigned int cnt = 0;
>  	int err = 0;
>  
> +	if (!netdev || !netif_device_present(netdev))
> +		return -ENODEV;
> +

Could this result in undefined behavior if netdev is NULL? Before this check,
np is initialized using netdev_priv(netdev), which does pointer arithmetic.
If netdev can genuinely be NULL from out-of-tree callers, should the
initialization of np be moved after the validation?

Simon says: I think the actual problem here is that netdev is never NULL.
            And thus the !netdev check should not be added.

	    Likewise for idpf_set_rxnfc()

Also, is there a time-of-check to time-of-use race here?

The netif_device_present() check is performed outside the vport_ctrl_lock
critical section.

If a concurrent hardware reset detaches the device and clears the present bit
before acquiring the vport_ctrl_lock to set np->vport = NULL, this thread
might pass the present check, block on idpf_vport_ctrl_lock(netdev), and
then dereference the newly set NULL vport once the lock is acquired.

Should the device presence check be moved inside the vport_ctrl_lock
critical section to prevent the NULL pointer dereference?

Simon says: This seems to be a legitimate concern.
            Which seems to further suggest that the problem
	    should be addressed by the caller.

>  	idpf_vport_ctrl_lock(netdev);

[ ... ]

> @@ -349,6 +352,9 @@ static int idpf_set_rxnfc(struct net_device *netdev, struct ethtool_rxnfc *cmd)
>  {
>  	int ret = -EOPNOTSUPP;
>  
> +	if (!netdev || !netif_device_present(netdev))
> +		return -ENODEV;
> +
>  	idpf_vport_ctrl_lock(netdev);

Would this be vulnerable to the same time-of-check to time-of-use race
mentioned above, since the present check is outside the lock?

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2026-04-23 16:39 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-21  5:16 [PATCH iwl-net v2] idpf: do not perform flow ops when netdev is detached Li Li
2026-04-23 16:33 ` Simon Horman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox