* RE: [Intel-wired-lan] [PATCH] idpf: do not perform flow ops when netdev is detached
2026-04-19 19:25 [PATCH] idpf: do not perform flow ops when netdev is detached Li Li
@ 2026-04-20 6:20 ` Loktionov, Aleksandr
2026-04-20 8:22 ` Kwapulinski, Piotr
2026-04-20 23:44 ` Jacob Keller
2 siblings, 0 replies; 5+ messages in thread
From: Loktionov, Aleksandr @ 2026-04-20 6:20 UTC (permalink / raw)
To: Li Li, Nguyen, Anthony L, Kitszel, Przemyslaw, David S. Miller,
Jakub Kicinski, Eric Dumazet, intel-wired-lan@lists.osuosl.org
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
David Decotigny, Singhai, Anjali, Samudrala, Sridhar,
Brian Vazquez, Tantilov, Emil S
> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf
> Of Li Li via Intel-wired-lan
> Sent: Sunday, April 19, 2026 9:26 PM
> To: Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Kitszel,
> Przemyslaw <przemyslaw.kitszel@intel.com>; David S. Miller
> <davem@davemloft.net>; Jakub Kicinski <kuba@kernel.org>; Eric Dumazet
> <edumazet@google.com>; intel-wired-lan@lists.osuosl.org
> Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; David
> Decotigny <decot@google.com>; Singhai, Anjali
> <anjali.singhai@intel.com>; Samudrala, Sridhar
> <sridhar.samudrala@intel.com>; Brian Vazquez <brianvv@google.com>; Li
> Li <boolli@google.com>; Tantilov, Emil S <emil.s.tantilov@intel.com>
> Subject: [Intel-wired-lan] [PATCH] idpf: do not perform flow ops when
> netdev is detached
>
> Even though commit 2e281e1155fc ("idpf: detach and close netdevs while
> handling a reset") prevents ethtool -N/-n operations to operate on
> detached netdevs, we found that out-of-tree workflows like OpenOnload
> can bypass ethtool core locks and call idpf_set_rxnfc directly during
> an idpf HW reset. When this happens, we could get kernel crashes like
> the following:
>
> [ 4045.787439] BUG: kernel NULL pointer dereference, address:
> 0000000000000070 [ 4045.794420] #PF: supervisor read access in kernel
> mode [ 4045.799580] #PF: error_code(0x0000) - not-present page [
> 4045.804739] PGD 0 [ 4045.806772] Oops: Oops: 0000 [#1] SMP NOPTI ...
> [ 4045.836425] Workqueue: onload-wqueue oof_do_deferred_work_fn
> [onload] [ 4045.842926] RIP: 0010:idpf_del_flow_steer+0x24/0x170
> [idpf] ...
> [ 4045.946323] Call Trace:
> [ 4045.948796] <TASK>
> [ 4045.950915] ? show_trace_log_lvl+0x1b0/0x2f0 [ 4045.955293] ?
> show_trace_log_lvl+0x1b0/0x2f0 [ 4045.959672] ?
> idpf_set_rxnfc+0x6f/0x80 [idpf] [ 4045.964142] ?
> __die_body.cold+0x8/0x12 [ 4045.968000] ? page_fault_oops+0x148/0x160
> [ 4045.972117] ? exc_page_fault+0x6f/0x160 [ 4045.976060] ?
> asm_exc_page_fault+0x22/0x30 [ 4045.980262] ?
> idpf_del_flow_steer+0x24/0x170 [idpf] [ 4045.985245]
> idpf_set_rxnfc+0x6f/0x80 [idpf] [ 4045.989535]
> af_xdp_filter_remove+0x7c/0xb0 [sfc_resource] [ 4045.995069]
> oo_hw_filter_clear_hwports+0x6f/0xa0 [onload] [ 4046.000589]
> oo_hw_filter_update+0x65/0x210 [onload] [ 4046.005587]
> oof_hw_filter_update.constprop.0+0xe7/0x140 [onload] [ 4046.011716]
> oof_manager_update_all_filters+0xad/0x270 [onload] [ 4046.017671]
> __oof_do_deferred_work+0x15e/0x190 [onload] [ 4046.023014]
> oof_do_deferred_work+0x2c/0x40 [onload] [ 4046.028018]
> oof_do_deferred_work_fn+0x12/0x30 [onload] [ 4046.033277]
> process_one_work+0x174/0x330 [ 4046.037304] worker_thread+0x246/0x390
> [ 4046.041074] ? __pfx_worker_thread+0x10/0x10 [ 4046.045364]
> kthread+0xf6/0x240 [ 4046.048530] ? __pfx_kthread+0x10/0x10 [
> 4046.052297] ret_from_fork+0x2d/0x50 [ 4046.055896] ?
> __pfx_kthread+0x10/0x10 [ 4046.059664] ret_from_fork_asm+0x1a/0x30 [
> 4046.063613] </TASK>
>
> To prevent this, we need to add checks in idpf_set_rxnfc and
> idpf_get_rxnfc to error out if the netdev is already detached.
>
> Tested: implemented the following patch to synthetically force idpf
> into a HW reset:
>
> diff --git a/drivers/net/ethernet/intel/idpf/idpf_txrx.c
> b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
> index 4fc0bb14c5b1..27476d57bcf0 100644
> --- a/drivers/net/ethernet/intel/idpf/idpf_txrx.c
> +++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
> @@ -10,6 +10,9 @@
> #define idpf_tx_buf_next(buf) (*(u32 *)&(buf)->priv)
> LIBETH_SQE_CHECK_PRIV(u32);
>
> +static bool SIMULATE_TX_TIMEOUT;
> +module_param(SIMULATE_TX_TIMEOUT, bool, 0644);
> +
> /**
> * idpf_chk_linearize - Check if skb exceeds max descriptors per
> packet
> * @skb: send buffer
> @@ -46,6 +49,8 @@ void idpf_tx_timeout(struct net_device *netdev,
> unsigned int txqueue)
>
> adapter->tx_timeout_count++;
>
> + SIMULATE_TX_TIMEOUT = false;
> +
> netdev_err(netdev, "Detected Tx timeout: Count %d, Queue %d\n",
> adapter->tx_timeout_count, txqueue);
> if (!idpf_is_reset_in_prog(adapter)) { @@ -2225,6 +2230,8 @@
> static bool idpf_tx_clean_complq(struct idpf_compl_queue *complq, int
> budget,
> goto fetch_next_desc;
> }
> tx_q = complq->txq_grp->txqs[rel_tx_qid];
> + if (unlikely(SIMULATE_TX_TIMEOUT && (tx_q->idx % 2 ==
> 1)))
> + goto fetch_next_desc;
>
> /* Determine completion type */
> ctype = le16_get_bits(tx_desc->common.qid_comptype_gen,
> diff --git a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
> b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
> index be66f9b2e101..ba5da2a86c15 100644
> --- a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
> +++ b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
> @@ -8,6 +8,9 @@
> #include "idpf_virtchnl.h"
> #include "idpf_ptp.h"
>
> +static bool VIRTCHNL_FAILED;
> +module_param(VIRTCHNL_FAILED, bool, 0644);
> +
> /**
> * struct idpf_vc_xn_manager - Manager for tracking transactions
> * @ring: backing and lookup for transactions @@ -3496,6 +3499,11 @@
> int idpf_vc_core_init(struct idpf_adapter *adapter)
> switch (adapter->state) {
> case __IDPF_VER_CHECK:
> err = idpf_send_ver_msg(adapter);
> +
> + if (unlikely(VIRTCHNL_FAILED)) {
> + err = -EIO;
> + }
> +
> switch (err) {
> case 0:
> /* success, move state machine forward */
>
> And tested by writing 1 to /sys/module/idpf/parameters/VIRTCHNL_FAILED
> and /sys/module/idpf/parameters/SIMULATE_TX_TIMEOUT, and running
> idpf_get_rxnfc() right after the HW reset.
>
> Without the patch: encountered NULL pointer and kernel crash.
>
> With the patch: no crashes.
>
> Fixes: 2e281e1155fc ("idpf: detach and close netdevs while handling a
> reset")
> Signed-off-by: Li Li <boolli@google.com>
> ---
> drivers/net/ethernet/intel/idpf/idpf_ethtool.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
> b/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
> index bb99d9e7c65d..8368a7e6a754 100644
> --- a/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
> +++ b/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
> @@ -43,6 +43,9 @@ static int idpf_get_rxnfc(struct net_device *netdev,
> struct ethtool_rxnfc *cmd,
> unsigned int cnt = 0;
> int err = 0;
>
> + if (!netdev || !netif_device_present(netdev))
> + return -ENODEV;
> +
> idpf_vport_ctrl_lock(netdev);
> vport = idpf_netdev_to_vport(netdev);
> vport_config = np->adapter->vport_config[np->vport_idx];
> @@ -349,6 +352,9 @@ static int idpf_set_rxnfc(struct net_device
> *netdev, struct ethtool_rxnfc *cmd) {
> int ret = -EOPNOTSUPP;
>
> + if (!netdev || !netif_device_present(netdev))
> + return -ENODEV;
> +
> idpf_vport_ctrl_lock(netdev);
> switch (cmd->cmd) {
> case ETHTOOL_SRXCLSRLINS:
> --
> 2.54.0.rc1.513.gad8abe7a5a-goog
Please add Cc: stable@vger.kernel.org
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
^ permalink raw reply [flat|nested] 5+ messages in thread* RE: [Intel-wired-lan] [PATCH] idpf: do not perform flow ops when netdev is detached
2026-04-19 19:25 [PATCH] idpf: do not perform flow ops when netdev is detached Li Li
2026-04-20 6:20 ` [Intel-wired-lan] " Loktionov, Aleksandr
@ 2026-04-20 8:22 ` Kwapulinski, Piotr
2026-04-20 17:03 ` Li Li
2026-04-20 23:44 ` Jacob Keller
2 siblings, 1 reply; 5+ messages in thread
From: Kwapulinski, Piotr @ 2026-04-20 8:22 UTC (permalink / raw)
To: Li Li, Nguyen, Anthony L, Kitszel, Przemyslaw, David S. Miller,
Jakub Kicinski, Eric Dumazet, intel-wired-lan@lists.osuosl.org
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
David Decotigny, Singhai, Anjali, Samudrala, Sridhar,
Brian Vazquez, Tantilov, Emil S
>-----Original Message-----
>From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of Li Li via Intel-wired-lan
>Sent: Sunday, April 19, 2026 9:26 PM
>To: Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Kitszel, Przemyslaw <przemyslaw.kitszel@intel.com>; David S. Miller <davem@davemloft.net>; Jakub Kicinski <kuba@kernel.org>; Eric Dumazet <edumazet@google.com>; intel-wired-lan@lists.osuosl.org
>Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; David Decotigny <decot@google.com>; Singhai, Anjali <anjali.singhai@intel.com>; Samudrala, Sridhar <sridhar.samudrala@intel.com>; Brian Vazquez <brianvv@google.com>; Li Li <boolli@google.com>; Tantilov, Emil S <emil.s.tantilov@intel.com>
>Subject: [Intel-wired-lan] [PATCH] idpf: do not perform flow ops when netdev is detached
>
>Even though commit 2e281e1155fc ("idpf: detach and close netdevs while handling a reset") prevents ethtool -N/-n operations to operate on detached netdevs, we found that out-of-tree workflows like OpenOnload can bypass ethtool core locks and call idpf_set_rxnfc directly during an idpf HW reset. When this happens, we could get kernel crashes like the following:
>
>[ 4045.787439] BUG: kernel NULL pointer dereference, address: 0000000000000070 [ 4045.794420] #PF: supervisor read access in kernel mode [ 4045.799580] #PF: error_code(0x0000) - not-present page [ 4045.804739] PGD 0 [ 4045.806772] Oops: Oops: 0000 [#1] SMP NOPTI ...
>[ 4045.836425] Workqueue: onload-wqueue oof_do_deferred_work_fn [onload] [ 4045.842926] RIP: 0010:idpf_del_flow_steer+0x24/0x170 [idpf] ...
>[ 4045.946323] Call Trace:
>[ 4045.948796] <TASK>
>[ 4045.950915] ? show_trace_log_lvl+0x1b0/0x2f0 [ 4045.955293] ? show_trace_log_lvl+0x1b0/0x2f0 [ 4045.959672] ? idpf_set_rxnfc+0x6f/0x80 [idpf] [ 4045.964142] ? __die_body.cold+0x8/0x12 [ 4045.968000] ? page_fault_oops+0x148/0x160 [ 4045.972117] ? exc_page_fault+0x6f/0x160 [ 4045.976060] ? asm_exc_page_fault+0x22/0x30 [ 4045.980262] ? idpf_del_flow_steer+0x24/0x170 [idpf] [ 4045.985245] idpf_set_rxnfc+0x6f/0x80 [idpf] [ 4045.989535] af_xdp_filter_remove+0x7c/0xb0 [sfc_resource] [ 4045.995069] oo_hw_filter_clear_hwports+0x6f/0xa0 [onload] [ 4046.000589] oo_hw_filter_update+0x65/0x210 [onload] [ 4046.005587] oof_hw_filter_update.constprop.0+0xe7/0x140 [onload] [ 4046.011716] oof_manager_update_all_filters+0xad/0x270 [onload] [ 4046.017671] __oof_do_deferred_work+0x15e/0x190 [onload] [ 4046.023014] oof_do_deferred_work+0x2c/0x40 [onload] [ 4046.028018] oof_do_deferred_work_fn+0x12/0x30 [onload] [ 4046.033277] process_one_work+0x174/0x330 [ 4046.037304] worker_thread+0x246/0x390 [ 4046.041074] ? __pfx_worker_thread+0x10/0x10 [ 4046.045364] kthread+0xf6/0x240 [ 4046.048530] ? __pfx_kthread+0x10/0x10 [ 4046.052297] ret_from_fork+0x2d/0x50 [ 4046.055896] ? __pfx_kthread+0x10/0x10 [ 4046.059664] ret_from_fork_asm+0x1a/0x30 [ 4046.063613] </TASK>
>
>To prevent this, we need to add checks in idpf_set_rxnfc and idpf_get_rxnfc to error out if the netdev is already detached.
>
>Tested: implemented the following patch to synthetically force idpf into a HW reset:
>
>diff --git a/drivers/net/ethernet/intel/idpf/idpf_txrx.c b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
>index 4fc0bb14c5b1..27476d57bcf0 100644
>--- a/drivers/net/ethernet/intel/idpf/idpf_txrx.c
>+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
>@@ -10,6 +10,9 @@
> #define idpf_tx_buf_next(buf) (*(u32 *)&(buf)->priv)
> LIBETH_SQE_CHECK_PRIV(u32);
>
>+static bool SIMULATE_TX_TIMEOUT;
>+module_param(SIMULATE_TX_TIMEOUT, bool, 0644);
>+
> /**
> * idpf_chk_linearize - Check if skb exceeds max descriptors per packet
> * @skb: send buffer
>@@ -46,6 +49,8 @@ void idpf_tx_timeout(struct net_device *netdev, unsigned int txqueue)
>
> adapter->tx_timeout_count++;
>
>+ SIMULATE_TX_TIMEOUT = false;
>+
> netdev_err(netdev, "Detected Tx timeout: Count %d, Queue %d\n",
> adapter->tx_timeout_count, txqueue);
> if (!idpf_is_reset_in_prog(adapter)) { @@ -2225,6 +2230,8 @@ static bool idpf_tx_clean_complq(struct idpf_compl_queue *complq, int budget,
> goto fetch_next_desc;
> }
> tx_q = complq->txq_grp->txqs[rel_tx_qid];
>+ if (unlikely(SIMULATE_TX_TIMEOUT && (tx_q->idx % 2 == 1)))
>+ goto fetch_next_desc;
>
> /* Determine completion type */
> ctype = le16_get_bits(tx_desc->common.qid_comptype_gen,
>diff --git a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
>index be66f9b2e101..ba5da2a86c15 100644
>--- a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
>+++ b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
>@@ -8,6 +8,9 @@
> #include "idpf_virtchnl.h"
> #include "idpf_ptp.h"
>
>+static bool VIRTCHNL_FAILED;
>+module_param(VIRTCHNL_FAILED, bool, 0644);
>+
> /**
> * struct idpf_vc_xn_manager - Manager for tracking transactions
> * @ring: backing and lookup for transactions @@ -3496,6 +3499,11 @@ int idpf_vc_core_init(struct idpf_adapter *adapter)
> switch (adapter->state) {
> case __IDPF_VER_CHECK:
> err = idpf_send_ver_msg(adapter);
>+
>+ if (unlikely(VIRTCHNL_FAILED)) {
>+ err = -EIO;
>+ }
Please remove redundant parenthesis
Piotr
>+
> switch (err) {
> case 0:
> /* success, move state machine forward */
>
>And tested by writing 1 to /sys/module/idpf/parameters/VIRTCHNL_FAILED
>and /sys/module/idpf/parameters/SIMULATE_TX_TIMEOUT, and running
>idpf_get_rxnfc() right after the HW reset.
>
>Without the patch: encountered NULL pointer and kernel crash.
>
>With the patch: no crashes.
>
>Fixes: 2e281e1155fc ("idpf: detach and close netdevs while handling a reset")
>Signed-off-by: Li Li <boolli@google.com>
>---
> drivers/net/ethernet/intel/idpf/idpf_ethtool.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
>diff --git a/drivers/net/ethernet/intel/idpf/idpf_ethtool.c b/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
>index bb99d9e7c65d..8368a7e6a754 100644
>--- a/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
>+++ b/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
>@@ -43,6 +43,9 @@ static int idpf_get_rxnfc(struct net_device *netdev, struct ethtool_rxnfc *cmd,
> unsigned int cnt = 0;
> int err = 0;
>
>+ if (!netdev || !netif_device_present(netdev))
>+ return -ENODEV;
>+
> idpf_vport_ctrl_lock(netdev);
> vport = idpf_netdev_to_vport(netdev);
> vport_config = np->adapter->vport_config[np->vport_idx];
>@@ -349,6 +352,9 @@ static int idpf_set_rxnfc(struct net_device *netdev, struct ethtool_rxnfc *cmd) {
> int ret = -EOPNOTSUPP;
>
>+ if (!netdev || !netif_device_present(netdev))
>+ return -ENODEV;
>+
> idpf_vport_ctrl_lock(netdev);
> switch (cmd->cmd) {
> case ETHTOOL_SRXCLSRLINS:
>--
>2.54.0.rc1.513.gad8abe7a5a-goog
^ permalink raw reply [flat|nested] 5+ messages in thread* Re: [Intel-wired-lan] [PATCH] idpf: do not perform flow ops when netdev is detached
2026-04-20 8:22 ` Kwapulinski, Piotr
@ 2026-04-20 17:03 ` Li Li
0 siblings, 0 replies; 5+ messages in thread
From: Li Li @ 2026-04-20 17:03 UTC (permalink / raw)
To: Kwapulinski, Piotr
Cc: Nguyen, Anthony L, Kitszel, Przemyslaw, David S. Miller,
Jakub Kicinski, Eric Dumazet, intel-wired-lan@lists.osuosl.org,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
David Decotigny, Singhai, Anjali, Samudrala, Sridhar,
Brian Vazquez, Tantilov, Emil S, stable
On Mon, Apr 20, 2026 at 1:23 AM Kwapulinski, Piotr
<piotr.kwapulinski@intel.com> wrote:
>
> >-----Original Message-----
> >From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of Li Li via Intel-wired-lan
> >Sent: Sunday, April 19, 2026 9:26 PM
> >To: Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Kitszel, Przemyslaw <przemyslaw.kitszel@intel.com>; David S. Miller <davem@davemloft.net>; Jakub Kicinski <kuba@kernel.org>; Eric Dumazet <edumazet@google.com>; intel-wired-lan@lists.osuosl.org
> >Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; David Decotigny <decot@google.com>; Singhai, Anjali <anjali.singhai@intel.com>; Samudrala, Sridhar <sridhar.samudrala@intel.com>; Brian Vazquez <brianvv@google.com>; Li Li <boolli@google.com>; Tantilov, Emil S <emil.s.tantilov@intel.com>
> >Subject: [Intel-wired-lan] [PATCH] idpf: do not perform flow ops when netdev is detached
> >
> >Even though commit 2e281e1155fc ("idpf: detach and close netdevs while handling a reset") prevents ethtool -N/-n operations to operate on detached netdevs, we found that out-of-tree workflows like OpenOnload can bypass ethtool core locks and call idpf_set_rxnfc directly during an idpf HW reset. When this happens, we could get kernel crashes like the following:
> >
> >[ 4045.787439] BUG: kernel NULL pointer dereference, address: 0000000000000070 [ 4045.794420] #PF: supervisor read access in kernel mode [ 4045.799580] #PF: error_code(0x0000) - not-present page [ 4045.804739] PGD 0 [ 4045.806772] Oops: Oops: 0000 [#1] SMP NOPTI ...
> >[ 4045.836425] Workqueue: onload-wqueue oof_do_deferred_work_fn [onload] [ 4045.842926] RIP: 0010:idpf_del_flow_steer+0x24/0x170 [idpf] ...
> >[ 4045.946323] Call Trace:
> >[ 4045.948796] <TASK>
> >[ 4045.950915] ? show_trace_log_lvl+0x1b0/0x2f0 [ 4045.955293] ? show_trace_log_lvl+0x1b0/0x2f0 [ 4045.959672] ? idpf_set_rxnfc+0x6f/0x80 [idpf] [ 4045.964142] ? __die_body.cold+0x8/0x12 [ 4045.968000] ? page_fault_oops+0x148/0x160 [ 4045.972117] ? exc_page_fault+0x6f/0x160 [ 4045.976060] ? asm_exc_page_fault+0x22/0x30 [ 4045.980262] ? idpf_del_flow_steer+0x24/0x170 [idpf] [ 4045.985245] idpf_set_rxnfc+0x6f/0x80 [idpf] [ 4045.989535] af_xdp_filter_remove+0x7c/0xb0 [sfc_resource] [ 4045.995069] oo_hw_filter_clear_hwports+0x6f/0xa0 [onload] [ 4046.000589] oo_hw_filter_update+0x65/0x210 [onload] [ 4046.005587] oof_hw_filter_update.constprop.0+0xe7/0x140 [onload] [ 4046.011716] oof_manager_update_all_filters+0xad/0x270 [onload] [ 4046.017671] __oof_do_deferred_work+0x15e/0x190 [onload] [ 4046.023014] oof_do_deferred_work+0x2c/0x40 [onload] [ 4046.028018] oof_do_deferred_work_fn+0x12/0x30 [onload] [ 4046.033277] process_one_work+0x174/0x330 [ 4046.037304] worker_thread+0x246/0x390 [ 4046.041074] ? __pfx_worker_thread+0x10/0x10 [ 4046.045364] kthread+0xf6/0x240 [ 4046.048530] ? __pfx_kthread+0x10/0x10 [ 4046.052297] ret_from_fork+0x2d/0x50 [ 4046.055896] ? __pfx_kthread+0x10/0x10 [ 4046.059664] ret_from_fork_asm+0x1a/0x30 [ 4046.063613] </TASK>
> >
> >To prevent this, we need to add checks in idpf_set_rxnfc and idpf_get_rxnfc to error out if the netdev is already detached.
> >
> >Tested: implemented the following patch to synthetically force idpf into a HW reset:
> >
> >diff --git a/drivers/net/ethernet/intel/idpf/idpf_txrx.c b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
> >index 4fc0bb14c5b1..27476d57bcf0 100644
> >--- a/drivers/net/ethernet/intel/idpf/idpf_txrx.c
> >+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
> >@@ -10,6 +10,9 @@
> > #define idpf_tx_buf_next(buf) (*(u32 *)&(buf)->priv)
> > LIBETH_SQE_CHECK_PRIV(u32);
> >
> >+static bool SIMULATE_TX_TIMEOUT;
> >+module_param(SIMULATE_TX_TIMEOUT, bool, 0644);
> >+
> > /**
> > * idpf_chk_linearize - Check if skb exceeds max descriptors per packet
> > * @skb: send buffer
> >@@ -46,6 +49,8 @@ void idpf_tx_timeout(struct net_device *netdev, unsigned int txqueue)
> >
> > adapter->tx_timeout_count++;
> >
> >+ SIMULATE_TX_TIMEOUT = false;
> >+
> > netdev_err(netdev, "Detected Tx timeout: Count %d, Queue %d\n",
> > adapter->tx_timeout_count, txqueue);
> > if (!idpf_is_reset_in_prog(adapter)) { @@ -2225,6 +2230,8 @@ static bool idpf_tx_clean_complq(struct idpf_compl_queue *complq, int budget,
> > goto fetch_next_desc;
> > }
> > tx_q = complq->txq_grp->txqs[rel_tx_qid];
> >+ if (unlikely(SIMULATE_TX_TIMEOUT && (tx_q->idx % 2 == 1)))
> >+ goto fetch_next_desc;
> >
> > /* Determine completion type */
> > ctype = le16_get_bits(tx_desc->common.qid_comptype_gen,
> >diff --git a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
> >index be66f9b2e101..ba5da2a86c15 100644
> >--- a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
> >+++ b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
> >@@ -8,6 +8,9 @@
> > #include "idpf_virtchnl.h"
> > #include "idpf_ptp.h"
> >
> >+static bool VIRTCHNL_FAILED;
> >+module_param(VIRTCHNL_FAILED, bool, 0644);
> >+
> > /**
> > * struct idpf_vc_xn_manager - Manager for tracking transactions
> > * @ring: backing and lookup for transactions @@ -3496,6 +3499,11 @@ int idpf_vc_core_init(struct idpf_adapter *adapter)
> > switch (adapter->state) {
> > case __IDPF_VER_CHECK:
> > err = idpf_send_ver_msg(adapter);
> >+
> >+ if (unlikely(VIRTCHNL_FAILED)) {
> >+ err = -EIO;
> >+ }
> Please remove redundant parenthesis
> Piotr
Hi Piotr,
The block you are commenting on is not part of the patch; it's just a
block of test code in the commit message I used to reproduce the
failures.
Thanks!
>
> >+
> > switch (err) {
> > case 0:
> > /* success, move state machine forward */
> >
> >And tested by writing 1 to /sys/module/idpf/parameters/VIRTCHNL_FAILED
> >and /sys/module/idpf/parameters/SIMULATE_TX_TIMEOUT, and running
> >idpf_get_rxnfc() right after the HW reset.
> >
> >Without the patch: encountered NULL pointer and kernel crash.
> >
> >With the patch: no crashes.
> >
> >Fixes: 2e281e1155fc ("idpf: detach and close netdevs while handling a reset")
> >Signed-off-by: Li Li <boolli@google.com>
> >---
> > drivers/net/ethernet/intel/idpf/idpf_ethtool.c | 6 ++++++
> > 1 file changed, 6 insertions(+)
> >
> >diff --git a/drivers/net/ethernet/intel/idpf/idpf_ethtool.c b/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
> >index bb99d9e7c65d..8368a7e6a754 100644
> >--- a/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
> >+++ b/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
> >@@ -43,6 +43,9 @@ static int idpf_get_rxnfc(struct net_device *netdev, struct ethtool_rxnfc *cmd,
> > unsigned int cnt = 0;
> > int err = 0;
> >
> >+ if (!netdev || !netif_device_present(netdev))
> >+ return -ENODEV;
> >+
> > idpf_vport_ctrl_lock(netdev);
> > vport = idpf_netdev_to_vport(netdev);
> > vport_config = np->adapter->vport_config[np->vport_idx];
> >@@ -349,6 +352,9 @@ static int idpf_set_rxnfc(struct net_device *netdev, struct ethtool_rxnfc *cmd) {
> > int ret = -EOPNOTSUPP;
> >
> >+ if (!netdev || !netif_device_present(netdev))
> >+ return -ENODEV;
> >+
> > idpf_vport_ctrl_lock(netdev);
> > switch (cmd->cmd) {
> > case ETHTOOL_SRXCLSRLINS:
> >--
> >2.54.0.rc1.513.gad8abe7a5a-goog
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] idpf: do not perform flow ops when netdev is detached
2026-04-19 19:25 [PATCH] idpf: do not perform flow ops when netdev is detached Li Li
2026-04-20 6:20 ` [Intel-wired-lan] " Loktionov, Aleksandr
2026-04-20 8:22 ` Kwapulinski, Piotr
@ 2026-04-20 23:44 ` Jacob Keller
2 siblings, 0 replies; 5+ messages in thread
From: Jacob Keller @ 2026-04-20 23:44 UTC (permalink / raw)
To: Li Li, Tony Nguyen, Przemek Kitszel, David S. Miller,
Jakub Kicinski, Eric Dumazet, intel-wired-lan
Cc: netdev, linux-kernel, David Decotigny, Anjali Singhai,
Sridhar Samudrala, Brian Vazquez, emil.s.tantilov
On 4/19/2026 12:25 PM, Li Li wrote:
> Even though commit 2e281e1155fc ("idpf: detach and close netdevs while
> handling a reset") prevents ethtool -N/-n operations to operate on
> detached netdevs, we found that out-of-tree workflows like OpenOnload
> can bypass ethtool core locks and call idpf_set_rxnfc directly during
> an idpf HW reset. When this happens, we could get kernel crashes like
> the following:
>
> [ 4045.787439] BUG: kernel NULL pointer dereference, address: 0000000000000070
> [ 4045.794420] #PF: supervisor read access in kernel mode
> [ 4045.799580] #PF: error_code(0x0000) - not-present page
> [ 4045.804739] PGD 0
> [ 4045.806772] Oops: Oops: 0000 [#1] SMP NOPTI
> ...
> [ 4045.836425] Workqueue: onload-wqueue oof_do_deferred_work_fn [onload]
> [ 4045.842926] RIP: 0010:idpf_del_flow_steer+0x24/0x170 [idpf]
> ...
> [ 4045.946323] Call Trace:
> [ 4045.948796] <TASK>
> [ 4045.950915] ? show_trace_log_lvl+0x1b0/0x2f0
> [ 4045.955293] ? show_trace_log_lvl+0x1b0/0x2f0
> [ 4045.959672] ? idpf_set_rxnfc+0x6f/0x80 [idpf]
> [ 4045.964142] ? __die_body.cold+0x8/0x12
> [ 4045.968000] ? page_fault_oops+0x148/0x160
> [ 4045.972117] ? exc_page_fault+0x6f/0x160
> [ 4045.976060] ? asm_exc_page_fault+0x22/0x30
> [ 4045.980262] ? idpf_del_flow_steer+0x24/0x170 [idpf]
> [ 4045.985245] idpf_set_rxnfc+0x6f/0x80 [idpf]
> [ 4045.989535] af_xdp_filter_remove+0x7c/0xb0 [sfc_resource]
> [ 4045.995069] oo_hw_filter_clear_hwports+0x6f/0xa0 [onload]
> [ 4046.000589] oo_hw_filter_update+0x65/0x210 [onload]
> [ 4046.005587] oof_hw_filter_update.constprop.0+0xe7/0x140 [onload]
> [ 4046.011716] oof_manager_update_all_filters+0xad/0x270 [onload]
> [ 4046.017671] __oof_do_deferred_work+0x15e/0x190 [onload]
> [ 4046.023014] oof_do_deferred_work+0x2c/0x40 [onload]
> [ 4046.028018] oof_do_deferred_work_fn+0x12/0x30 [onload]
> [ 4046.033277] process_one_work+0x174/0x330
> [ 4046.037304] worker_thread+0x246/0x390
> [ 4046.041074] ? __pfx_worker_thread+0x10/0x10
> [ 4046.045364] kthread+0xf6/0x240
> [ 4046.048530] ? __pfx_kthread+0x10/0x10
> [ 4046.052297] ret_from_fork+0x2d/0x50
> [ 4046.055896] ? __pfx_kthread+0x10/0x10
> [ 4046.059664] ret_from_fork_asm+0x1a/0x30
> [ 4046.063613] </TASK>
>
> To prevent this, we need to add checks in idpf_set_rxnfc and
> idpf_get_rxnfc to error out if the netdev is already detached.
>
> Tested: implemented the following patch to synthetically force idpf into
> a HW reset:
>
> diff --git a/drivers/net/ethernet/intel/idpf/idpf_txrx.c b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
> index 4fc0bb14c5b1..27476d57bcf0 100644
> --- a/drivers/net/ethernet/intel/idpf/idpf_txrx.c
> +++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
> @@ -10,6 +10,9 @@
> #define idpf_tx_buf_next(buf) (*(u32 *)&(buf)->priv)
> LIBETH_SQE_CHECK_PRIV(u32);
>
Patchwork, and likely other git tools based around plain text mail do
not work kindly with an embedded diff inside the commit message. Could
you please resubmit with an updated commit message that doesn't simply
insert the raw diff? Perhaps you could indent the diff by a few spaces,
or simply describe what modifications were required to force the failure.
Also you didn't mention the target tree, which I think should be iwl-net.
Thanks,
Jake
> +static bool SIMULATE_TX_TIMEOUT;
> +module_param(SIMULATE_TX_TIMEOUT, bool, 0644);
> +
> /**
> * idpf_chk_linearize - Check if skb exceeds max descriptors per packet
> * @skb: send buffer
> @@ -46,6 +49,8 @@ void idpf_tx_timeout(struct net_device *netdev, unsigned int txqueue)
>
> adapter->tx_timeout_count++;
>
> + SIMULATE_TX_TIMEOUT = false;
> +
> netdev_err(netdev, "Detected Tx timeout: Count %d, Queue %d\n",
> adapter->tx_timeout_count, txqueue);
> if (!idpf_is_reset_in_prog(adapter)) {
> @@ -2225,6 +2230,8 @@ static bool idpf_tx_clean_complq(struct idpf_compl_queue *complq, int budget,
> goto fetch_next_desc;
> }
> tx_q = complq->txq_grp->txqs[rel_tx_qid];
> + if (unlikely(SIMULATE_TX_TIMEOUT && (tx_q->idx % 2 == 1)))
> + goto fetch_next_desc;
>
> /* Determine completion type */
> ctype = le16_get_bits(tx_desc->common.qid_comptype_gen,
> diff --git a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
> index be66f9b2e101..ba5da2a86c15 100644
> --- a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
> +++ b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
> @@ -8,6 +8,9 @@
> #include "idpf_virtchnl.h"
> #include "idpf_ptp.h"
>
> +static bool VIRTCHNL_FAILED;
> +module_param(VIRTCHNL_FAILED, bool, 0644);
> +
> /**
> * struct idpf_vc_xn_manager - Manager for tracking transactions
> * @ring: backing and lookup for transactions
> @@ -3496,6 +3499,11 @@ int idpf_vc_core_init(struct idpf_adapter *adapter)
> switch (adapter->state) {
> case __IDPF_VER_CHECK:
> err = idpf_send_ver_msg(adapter);
> +
> + if (unlikely(VIRTCHNL_FAILED)) {
> + err = -EIO;
> + }
> +
> switch (err) {
> case 0:
> /* success, move state machine forward */
>
> And tested by writing 1 to /sys/module/idpf/parameters/VIRTCHNL_FAILED
> and /sys/module/idpf/parameters/SIMULATE_TX_TIMEOUT, and running
> idpf_get_rxnfc() right after the HW reset.
>
> Without the patch: encountered NULL pointer and kernel crash.
>
> With the patch: no crashes.
>
> Fixes: 2e281e1155fc ("idpf: detach and close netdevs while handling a reset")
> Signed-off-by: Li Li <boolli@google.com>
> ---
> drivers/net/ethernet/intel/idpf/idpf_ethtool.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/drivers/net/ethernet/intel/idpf/idpf_ethtool.c b/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
> index bb99d9e7c65d..8368a7e6a754 100644
> --- a/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
> +++ b/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
> @@ -43,6 +43,9 @@ static int idpf_get_rxnfc(struct net_device *netdev, struct ethtool_rxnfc *cmd,
> unsigned int cnt = 0;
> int err = 0;
>
> + if (!netdev || !netif_device_present(netdev))
> + return -ENODEV;
> +
> idpf_vport_ctrl_lock(netdev);
> vport = idpf_netdev_to_vport(netdev);
> vport_config = np->adapter->vport_config[np->vport_idx];
> @@ -349,6 +352,9 @@ static int idpf_set_rxnfc(struct net_device *netdev, struct ethtool_rxnfc *cmd)
> {
> int ret = -EOPNOTSUPP;
>
> + if (!netdev || !netif_device_present(netdev))
> + return -ENODEV;
> +
> idpf_vport_ctrl_lock(netdev);
> switch (cmd->cmd) {
> case ETHTOOL_SRXCLSRLINS:
^ permalink raw reply [flat|nested] 5+ messages in thread