* [PATCH iwl-net v2] idpf: do not perform flow ops when netdev is detached
@ 2026-04-21 5:16 Li Li
2026-04-23 16:33 ` Simon Horman
0 siblings, 1 reply; 3+ messages in thread
From: Li Li @ 2026-04-21 5:16 UTC (permalink / raw)
To: Tony Nguyen, Przemek Kitszel, David S. Miller, Jakub Kicinski,
Eric Dumazet, intel-wired-lan
Cc: netdev, linux-kernel, David Decotigny, Anjali Singhai,
Sridhar Samudrala, Brian Vazquez, Li Li, emil.s.tantilov, stable
Even though commit 2e281e1155fc ("idpf: detach and close netdevs while
handling a reset") prevents ethtool -N/-n operations to operate on
detached netdevs, we found that out-of-tree workflows like OpenOnload
can bypass ethtool core locks and call idpf_set_rxnfc directly during
an idpf HW reset. When this happens, we could get kernel crashes like
the following:
[ 4045.787439] BUG: kernel NULL pointer dereference, address: 0000000000000070
[ 4045.794420] #PF: supervisor read access in kernel mode
[ 4045.799580] #PF: error_code(0x0000) - not-present page
[ 4045.804739] PGD 0
[ 4045.806772] Oops: Oops: 0000 [#1] SMP NOPTI
...
[ 4045.836425] Workqueue: onload-wqueue oof_do_deferred_work_fn [onload]
[ 4045.842926] RIP: 0010:idpf_del_flow_steer+0x24/0x170 [idpf]
...
[ 4045.946323] Call Trace:
[ 4045.948796] <TASK>
[ 4045.950915] ? show_trace_log_lvl+0x1b0/0x2f0
[ 4045.955293] ? show_trace_log_lvl+0x1b0/0x2f0
[ 4045.959672] ? idpf_set_rxnfc+0x6f/0x80 [idpf]
[ 4046.063613] </TASK>
To prevent this, we need to add checks in idpf_set_rxnfc and
idpf_get_rxnfc to error out if the netdev is already detached.
Tested: synthetically forced idpf into a HW reset by introducing module
parameters to simulate a Tx timeout and force virtual channel
initialization failure. This was done by skipping completion cleaning for
specific queues and returning -EIO during core initialization.
The failure was then triggered by writing 1 to the corresponding sysfs
parameters and calling idpf_get_rxnfc() during the reset process.
Without the patch: encountered NULL pointer and kernel crash.
With the patch: no crashes.
Fixes: 2e281e1155fc ("idpf: detach and close netdevs while handling a reset")
Cc: stable@vger.kernel.org
Signed-off-by: Li Li <boolli@google.com>
---
v2:
- Removed the raw code block from the commit message and replaced it with
a textual description of the test modifications.
drivers/net/ethernet/intel/idpf/idpf_ethtool.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/net/ethernet/intel/idpf/idpf_ethtool.c b/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
index bb99d9e7c65d..8368a7e6a754 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
@@ -43,6 +43,9 @@ static int idpf_get_rxnfc(struct net_device *netdev, struct ethtool_rxnfc *cmd,
unsigned int cnt = 0;
int err = 0;
+ if (!netdev || !netif_device_present(netdev))
+ return -ENODEV;
+
idpf_vport_ctrl_lock(netdev);
vport = idpf_netdev_to_vport(netdev);
vport_config = np->adapter->vport_config[np->vport_idx];
@@ -349,6 +352,9 @@ static int idpf_set_rxnfc(struct net_device *netdev, struct ethtool_rxnfc *cmd)
{
int ret = -EOPNOTSUPP;
+ if (!netdev || !netif_device_present(netdev))
+ return -ENODEV;
+
idpf_vport_ctrl_lock(netdev);
switch (cmd->cmd) {
case ETHTOOL_SRXCLSRLINS:
--
2.54.0.rc1.555.g9c883467ad-goog
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH iwl-net v2] idpf: do not perform flow ops when netdev is detached
2026-04-21 5:16 [PATCH iwl-net v2] idpf: do not perform flow ops when netdev is detached Li Li
@ 2026-04-23 16:33 ` Simon Horman
2026-04-23 18:40 ` [Intel-wired-lan] " Jacob Keller
0 siblings, 1 reply; 3+ messages in thread
From: Simon Horman @ 2026-04-23 16:33 UTC (permalink / raw)
To: boolli
Cc: 'Simon Horman', anthony.l.nguyen, przemyslaw.kitszel,
davem, kuba, edumazet, intel-wired-lan, netdev, linux-kernel,
decot, anjali.singhai, sridhar.samudrala, brianvv,
emil.s.tantilov, stable
From: 'Simon Horman' <horms@kernel.org>
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://sashiko.dev
---
idpf: do not perform flow ops when netdev is detached
This commit adds checks to prevent crashes when out-of-tree modules bypass
ethtool core locks and call flow operations on detached netdevs during a
hardware reset. It introduces netdev and netif_device_present checks in
idpf_get_rxnfc and idpf_set_rxnfc.
> Even though commit 2e281e1155fc ("idpf: detach and close netdevs while
> handling a reset") prevents ethtool -N/-n operations to operate on
> detached netdevs, we found that out-of-tree workflows like OpenOnload
> can bypass ethtool core locks and call idpf_set_rxnfc directly during
> an idpf HW reset.
Is it acceptable to add defensive checks upstream solely to accommodate
out-of-tree modules that bypass core kernel locking?
The core ethtool infrastructure manages locks like rtnl_lock for in-tree
callers to guarantee safety. Modifying the upstream driver to paper over
locking violations in out-of-tree modules might contradict upstream policies.
Simon says: If this is to address OOT module behaviour then I am concerned.
And I would suggest that the correct approach is to fix
the OOT kernel module.
> diff --git a/drivers/net/ethernet/intel/idpf/idpf_ethtool.c b/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
> index bb99d9e7c65d5..8368a7e6a7544 100644
> --- a/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
> +++ b/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
> @@ -43,6 +43,9 @@ static int idpf_get_rxnfc(struct net_device *netdev, struct ethtool_rxnfc *cmd,
> unsigned int cnt = 0;
> int err = 0;
>
> + if (!netdev || !netif_device_present(netdev))
> + return -ENODEV;
> +
Could this result in undefined behavior if netdev is NULL? Before this check,
np is initialized using netdev_priv(netdev), which does pointer arithmetic.
If netdev can genuinely be NULL from out-of-tree callers, should the
initialization of np be moved after the validation?
Simon says: I think the actual problem here is that netdev is never NULL.
And thus the !netdev check should not be added.
Likewise for idpf_set_rxnfc()
Also, is there a time-of-check to time-of-use race here?
The netif_device_present() check is performed outside the vport_ctrl_lock
critical section.
If a concurrent hardware reset detaches the device and clears the present bit
before acquiring the vport_ctrl_lock to set np->vport = NULL, this thread
might pass the present check, block on idpf_vport_ctrl_lock(netdev), and
then dereference the newly set NULL vport once the lock is acquired.
Should the device presence check be moved inside the vport_ctrl_lock
critical section to prevent the NULL pointer dereference?
Simon says: This seems to be a legitimate concern.
Which seems to further suggest that the problem
should be addressed by the caller.
> idpf_vport_ctrl_lock(netdev);
[ ... ]
> @@ -349,6 +352,9 @@ static int idpf_set_rxnfc(struct net_device *netdev, struct ethtool_rxnfc *cmd)
> {
> int ret = -EOPNOTSUPP;
>
> + if (!netdev || !netif_device_present(netdev))
> + return -ENODEV;
> +
> idpf_vport_ctrl_lock(netdev);
Would this be vulnerable to the same time-of-check to time-of-use race
mentioned above, since the present check is outside the lock?
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [Intel-wired-lan] [PATCH iwl-net v2] idpf: do not perform flow ops when netdev is detached
2026-04-23 16:33 ` Simon Horman
@ 2026-04-23 18:40 ` Jacob Keller
0 siblings, 0 replies; 3+ messages in thread
From: Jacob Keller @ 2026-04-23 18:40 UTC (permalink / raw)
To: Simon Horman, boolli
Cc: anthony.l.nguyen, przemyslaw.kitszel, davem, kuba, edumazet,
intel-wired-lan, netdev, linux-kernel, decot, anjali.singhai,
sridhar.samudrala, brianvv, emil.s.tantilov, stable
On 4/23/2026 9:33 AM, Simon Horman wrote:
> From: 'Simon Horman' <horms@kernel.org>
>
> This is an AI-generated review of your patch. The human sending this
> email has considered the AI review valid, or at least plausible.
> Full review at: https://sashiko.dev
> ---
> idpf: do not perform flow ops when netdev is detached
>
> This commit adds checks to prevent crashes when out-of-tree modules bypass
> ethtool core locks and call flow operations on detached netdevs during a
> hardware reset. It introduces netdev and netif_device_present checks in
> idpf_get_rxnfc and idpf_set_rxnfc.
>
>> Even though commit 2e281e1155fc ("idpf: detach and close netdevs while
>> handling a reset") prevents ethtool -N/-n operations to operate on
>> detached netdevs, we found that out-of-tree workflows like OpenOnload
>> can bypass ethtool core locks and call idpf_set_rxnfc directly during
>> an idpf HW reset.
>
> Is it acceptable to add defensive checks upstream solely to accommodate
> out-of-tree modules that bypass core kernel locking?
>
> The core ethtool infrastructure manages locks like rtnl_lock for in-tree
> callers to guarantee safety. Modifying the upstream driver to paper over
> locking violations in out-of-tree modules might contradict upstream policies.
>
> Simon says: If this is to address OOT module behaviour then I am concerned.
> And I would suggest that the correct approach is to fix
> the OOT kernel module.
>
The commit message says: "OpenOnload can bypass ethtool core locks and
call idpf_set_rxnfc directly". But if it somehow calls
idpf_set_rxnfc_directly without holding the appropriate locks, then no
amount of changes to the idpf driver will fix that issue. It is simply
unsafe for that to be occurring, and this feels like it just patches one
specific problem without addressing the root cause that something is
calling the drivers ethtool function without correctly holding the
expected locks.
@Li Li, could you please explain more details about the workflow that
triggers these behaviors? If it can't be reproduced with in-tree modules
then I don't think we can accept this fix.
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2026-04-23 18:40 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-21 5:16 [PATCH iwl-net v2] idpf: do not perform flow ops when netdev is detached Li Li
2026-04-23 16:33 ` Simon Horman
2026-04-23 18:40 ` [Intel-wired-lan] " Jacob Keller
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox