* [PATCH iwl-net v2] idpf: do not perform flow ops when netdev is detached
@ 2026-04-21 5:16 Li Li
2026-04-23 16:33 ` Simon Horman
0 siblings, 1 reply; 2+ messages in thread
From: Li Li @ 2026-04-21 5:16 UTC (permalink / raw)
To: Tony Nguyen, Przemek Kitszel, David S. Miller, Jakub Kicinski,
Eric Dumazet, intel-wired-lan
Cc: netdev, linux-kernel, David Decotigny, Anjali Singhai,
Sridhar Samudrala, Brian Vazquez, Li Li, emil.s.tantilov, stable
Even though commit 2e281e1155fc ("idpf: detach and close netdevs while
handling a reset") prevents ethtool -N/-n operations to operate on
detached netdevs, we found that out-of-tree workflows like OpenOnload
can bypass ethtool core locks and call idpf_set_rxnfc directly during
an idpf HW reset. When this happens, we could get kernel crashes like
the following:
[ 4045.787439] BUG: kernel NULL pointer dereference, address: 0000000000000070
[ 4045.794420] #PF: supervisor read access in kernel mode
[ 4045.799580] #PF: error_code(0x0000) - not-present page
[ 4045.804739] PGD 0
[ 4045.806772] Oops: Oops: 0000 [#1] SMP NOPTI
...
[ 4045.836425] Workqueue: onload-wqueue oof_do_deferred_work_fn [onload]
[ 4045.842926] RIP: 0010:idpf_del_flow_steer+0x24/0x170 [idpf]
...
[ 4045.946323] Call Trace:
[ 4045.948796] <TASK>
[ 4045.950915] ? show_trace_log_lvl+0x1b0/0x2f0
[ 4045.955293] ? show_trace_log_lvl+0x1b0/0x2f0
[ 4045.959672] ? idpf_set_rxnfc+0x6f/0x80 [idpf]
[ 4046.063613] </TASK>
To prevent this, we need to add checks in idpf_set_rxnfc and
idpf_get_rxnfc to error out if the netdev is already detached.
Tested: synthetically forced idpf into a HW reset by introducing module
parameters to simulate a Tx timeout and force virtual channel
initialization failure. This was done by skipping completion cleaning for
specific queues and returning -EIO during core initialization.
The failure was then triggered by writing 1 to the corresponding sysfs
parameters and calling idpf_get_rxnfc() during the reset process.
Without the patch: encountered NULL pointer and kernel crash.
With the patch: no crashes.
Fixes: 2e281e1155fc ("idpf: detach and close netdevs while handling a reset")
Cc: stable@vger.kernel.org
Signed-off-by: Li Li <boolli@google.com>
---
v2:
- Removed the raw code block from the commit message and replaced it with
a textual description of the test modifications.
drivers/net/ethernet/intel/idpf/idpf_ethtool.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/net/ethernet/intel/idpf/idpf_ethtool.c b/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
index bb99d9e7c65d..8368a7e6a754 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
@@ -43,6 +43,9 @@ static int idpf_get_rxnfc(struct net_device *netdev, struct ethtool_rxnfc *cmd,
unsigned int cnt = 0;
int err = 0;
+ if (!netdev || !netif_device_present(netdev))
+ return -ENODEV;
+
idpf_vport_ctrl_lock(netdev);
vport = idpf_netdev_to_vport(netdev);
vport_config = np->adapter->vport_config[np->vport_idx];
@@ -349,6 +352,9 @@ static int idpf_set_rxnfc(struct net_device *netdev, struct ethtool_rxnfc *cmd)
{
int ret = -EOPNOTSUPP;
+ if (!netdev || !netif_device_present(netdev))
+ return -ENODEV;
+
idpf_vport_ctrl_lock(netdev);
switch (cmd->cmd) {
case ETHTOOL_SRXCLSRLINS:
--
2.54.0.rc1.555.g9c883467ad-goog
^ permalink raw reply related [flat|nested] 2+ messages in thread
* Re: [PATCH iwl-net v2] idpf: do not perform flow ops when netdev is detached
2026-04-21 5:16 [PATCH iwl-net v2] idpf: do not perform flow ops when netdev is detached Li Li
@ 2026-04-23 16:33 ` Simon Horman
0 siblings, 0 replies; 2+ messages in thread
From: Simon Horman @ 2026-04-23 16:33 UTC (permalink / raw)
To: boolli
Cc: 'Simon Horman', anthony.l.nguyen, przemyslaw.kitszel,
davem, kuba, edumazet, intel-wired-lan, netdev, linux-kernel,
decot, anjali.singhai, sridhar.samudrala, brianvv,
emil.s.tantilov, stable
From: 'Simon Horman' <horms@kernel.org>
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://sashiko.dev
---
idpf: do not perform flow ops when netdev is detached
This commit adds checks to prevent crashes when out-of-tree modules bypass
ethtool core locks and call flow operations on detached netdevs during a
hardware reset. It introduces netdev and netif_device_present checks in
idpf_get_rxnfc and idpf_set_rxnfc.
> Even though commit 2e281e1155fc ("idpf: detach and close netdevs while
> handling a reset") prevents ethtool -N/-n operations to operate on
> detached netdevs, we found that out-of-tree workflows like OpenOnload
> can bypass ethtool core locks and call idpf_set_rxnfc directly during
> an idpf HW reset.
Is it acceptable to add defensive checks upstream solely to accommodate
out-of-tree modules that bypass core kernel locking?
The core ethtool infrastructure manages locks like rtnl_lock for in-tree
callers to guarantee safety. Modifying the upstream driver to paper over
locking violations in out-of-tree modules might contradict upstream policies.
Simon says: If this is to address OOT module behaviour then I am concerned.
And I would suggest that the correct approach is to fix
the OOT kernel module.
> diff --git a/drivers/net/ethernet/intel/idpf/idpf_ethtool.c b/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
> index bb99d9e7c65d5..8368a7e6a7544 100644
> --- a/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
> +++ b/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
> @@ -43,6 +43,9 @@ static int idpf_get_rxnfc(struct net_device *netdev, struct ethtool_rxnfc *cmd,
> unsigned int cnt = 0;
> int err = 0;
>
> + if (!netdev || !netif_device_present(netdev))
> + return -ENODEV;
> +
Could this result in undefined behavior if netdev is NULL? Before this check,
np is initialized using netdev_priv(netdev), which does pointer arithmetic.
If netdev can genuinely be NULL from out-of-tree callers, should the
initialization of np be moved after the validation?
Simon says: I think the actual problem here is that netdev is never NULL.
And thus the !netdev check should not be added.
Likewise for idpf_set_rxnfc()
Also, is there a time-of-check to time-of-use race here?
The netif_device_present() check is performed outside the vport_ctrl_lock
critical section.
If a concurrent hardware reset detaches the device and clears the present bit
before acquiring the vport_ctrl_lock to set np->vport = NULL, this thread
might pass the present check, block on idpf_vport_ctrl_lock(netdev), and
then dereference the newly set NULL vport once the lock is acquired.
Should the device presence check be moved inside the vport_ctrl_lock
critical section to prevent the NULL pointer dereference?
Simon says: This seems to be a legitimate concern.
Which seems to further suggest that the problem
should be addressed by the caller.
> idpf_vport_ctrl_lock(netdev);
[ ... ]
> @@ -349,6 +352,9 @@ static int idpf_set_rxnfc(struct net_device *netdev, struct ethtool_rxnfc *cmd)
> {
> int ret = -EOPNOTSUPP;
>
> + if (!netdev || !netif_device_present(netdev))
> + return -ENODEV;
> +
> idpf_vport_ctrl_lock(netdev);
Would this be vulnerable to the same time-of-check to time-of-use race
mentioned above, since the present check is outside the lock?
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2026-04-23 16:39 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-21 5:16 [PATCH iwl-net v2] idpf: do not perform flow ops when netdev is detached Li Li
2026-04-23 16:33 ` Simon Horman
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox