From: Li Li <boolli@google.com>
To: Tony Nguyen <anthony.l.nguyen@intel.com>,
Przemek Kitszel <przemyslaw.kitszel@intel.com>,
"David S. Miller" <davem@davemloft.net>,
Jakub Kicinski <kuba@kernel.org>,
Eric Dumazet <edumazet@google.com>,
intel-wired-lan@lists.osuosl.org
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
David Decotigny <decot@google.com>,
Anjali Singhai <anjali.singhai@intel.com>,
Sridhar Samudrala <sridhar.samudrala@intel.com>,
Brian Vazquez <brianvv@google.com>, Li Li <boolli@google.com>,
emil.s.tantilov@intel.com
Subject: [PATCH] idpf: do not perform flow ops when netdev is detached
Date: Sun, 19 Apr 2026 19:25:55 +0000 [thread overview]
Message-ID: <20260419192555.3631327-1-boolli@google.com> (raw)
Even though commit 2e281e1155fc ("idpf: detach and close netdevs while
handling a reset") prevents ethtool -N/-n operations to operate on
detached netdevs, we found that out-of-tree workflows like OpenOnload
can bypass ethtool core locks and call idpf_set_rxnfc directly during
an idpf HW reset. When this happens, we could get kernel crashes like
the following:
[ 4045.787439] BUG: kernel NULL pointer dereference, address: 0000000000000070
[ 4045.794420] #PF: supervisor read access in kernel mode
[ 4045.799580] #PF: error_code(0x0000) - not-present page
[ 4045.804739] PGD 0
[ 4045.806772] Oops: Oops: 0000 [#1] SMP NOPTI
...
[ 4045.836425] Workqueue: onload-wqueue oof_do_deferred_work_fn [onload]
[ 4045.842926] RIP: 0010:idpf_del_flow_steer+0x24/0x170 [idpf]
...
[ 4045.946323] Call Trace:
[ 4045.948796] <TASK>
[ 4045.950915] ? show_trace_log_lvl+0x1b0/0x2f0
[ 4045.955293] ? show_trace_log_lvl+0x1b0/0x2f0
[ 4045.959672] ? idpf_set_rxnfc+0x6f/0x80 [idpf]
[ 4045.964142] ? __die_body.cold+0x8/0x12
[ 4045.968000] ? page_fault_oops+0x148/0x160
[ 4045.972117] ? exc_page_fault+0x6f/0x160
[ 4045.976060] ? asm_exc_page_fault+0x22/0x30
[ 4045.980262] ? idpf_del_flow_steer+0x24/0x170 [idpf]
[ 4045.985245] idpf_set_rxnfc+0x6f/0x80 [idpf]
[ 4045.989535] af_xdp_filter_remove+0x7c/0xb0 [sfc_resource]
[ 4045.995069] oo_hw_filter_clear_hwports+0x6f/0xa0 [onload]
[ 4046.000589] oo_hw_filter_update+0x65/0x210 [onload]
[ 4046.005587] oof_hw_filter_update.constprop.0+0xe7/0x140 [onload]
[ 4046.011716] oof_manager_update_all_filters+0xad/0x270 [onload]
[ 4046.017671] __oof_do_deferred_work+0x15e/0x190 [onload]
[ 4046.023014] oof_do_deferred_work+0x2c/0x40 [onload]
[ 4046.028018] oof_do_deferred_work_fn+0x12/0x30 [onload]
[ 4046.033277] process_one_work+0x174/0x330
[ 4046.037304] worker_thread+0x246/0x390
[ 4046.041074] ? __pfx_worker_thread+0x10/0x10
[ 4046.045364] kthread+0xf6/0x240
[ 4046.048530] ? __pfx_kthread+0x10/0x10
[ 4046.052297] ret_from_fork+0x2d/0x50
[ 4046.055896] ? __pfx_kthread+0x10/0x10
[ 4046.059664] ret_from_fork_asm+0x1a/0x30
[ 4046.063613] </TASK>
To prevent this, we need to add checks in idpf_set_rxnfc and
idpf_get_rxnfc to error out if the netdev is already detached.
Tested: implemented the following patch to synthetically force idpf into
a HW reset:
diff --git a/drivers/net/ethernet/intel/idpf/idpf_txrx.c b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
index 4fc0bb14c5b1..27476d57bcf0 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_txrx.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
@@ -10,6 +10,9 @@
#define idpf_tx_buf_next(buf) (*(u32 *)&(buf)->priv)
LIBETH_SQE_CHECK_PRIV(u32);
+static bool SIMULATE_TX_TIMEOUT;
+module_param(SIMULATE_TX_TIMEOUT, bool, 0644);
+
/**
* idpf_chk_linearize - Check if skb exceeds max descriptors per packet
* @skb: send buffer
@@ -46,6 +49,8 @@ void idpf_tx_timeout(struct net_device *netdev, unsigned int txqueue)
adapter->tx_timeout_count++;
+ SIMULATE_TX_TIMEOUT = false;
+
netdev_err(netdev, "Detected Tx timeout: Count %d, Queue %d\n",
adapter->tx_timeout_count, txqueue);
if (!idpf_is_reset_in_prog(adapter)) {
@@ -2225,6 +2230,8 @@ static bool idpf_tx_clean_complq(struct idpf_compl_queue *complq, int budget,
goto fetch_next_desc;
}
tx_q = complq->txq_grp->txqs[rel_tx_qid];
+ if (unlikely(SIMULATE_TX_TIMEOUT && (tx_q->idx % 2 == 1)))
+ goto fetch_next_desc;
/* Determine completion type */
ctype = le16_get_bits(tx_desc->common.qid_comptype_gen,
diff --git a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
index be66f9b2e101..ba5da2a86c15 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
@@ -8,6 +8,9 @@
#include "idpf_virtchnl.h"
#include "idpf_ptp.h"
+static bool VIRTCHNL_FAILED;
+module_param(VIRTCHNL_FAILED, bool, 0644);
+
/**
* struct idpf_vc_xn_manager - Manager for tracking transactions
* @ring: backing and lookup for transactions
@@ -3496,6 +3499,11 @@ int idpf_vc_core_init(struct idpf_adapter *adapter)
switch (adapter->state) {
case __IDPF_VER_CHECK:
err = idpf_send_ver_msg(adapter);
+
+ if (unlikely(VIRTCHNL_FAILED)) {
+ err = -EIO;
+ }
+
switch (err) {
case 0:
/* success, move state machine forward */
And tested by writing 1 to /sys/module/idpf/parameters/VIRTCHNL_FAILED
and /sys/module/idpf/parameters/SIMULATE_TX_TIMEOUT, and running
idpf_get_rxnfc() right after the HW reset.
Without the patch: encountered NULL pointer and kernel crash.
With the patch: no crashes.
Fixes: 2e281e1155fc ("idpf: detach and close netdevs while handling a reset")
Signed-off-by: Li Li <boolli@google.com>
---
drivers/net/ethernet/intel/idpf/idpf_ethtool.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/net/ethernet/intel/idpf/idpf_ethtool.c b/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
index bb99d9e7c65d..8368a7e6a754 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
@@ -43,6 +43,9 @@ static int idpf_get_rxnfc(struct net_device *netdev, struct ethtool_rxnfc *cmd,
unsigned int cnt = 0;
int err = 0;
+ if (!netdev || !netif_device_present(netdev))
+ return -ENODEV;
+
idpf_vport_ctrl_lock(netdev);
vport = idpf_netdev_to_vport(netdev);
vport_config = np->adapter->vport_config[np->vport_idx];
@@ -349,6 +352,9 @@ static int idpf_set_rxnfc(struct net_device *netdev, struct ethtool_rxnfc *cmd)
{
int ret = -EOPNOTSUPP;
+ if (!netdev || !netif_device_present(netdev))
+ return -ENODEV;
+
idpf_vport_ctrl_lock(netdev);
switch (cmd->cmd) {
case ETHTOOL_SRXCLSRLINS:
--
2.54.0.rc1.513.gad8abe7a5a-goog
next reply other threads:[~2026-04-19 19:26 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-19 19:25 Li Li [this message]
2026-04-20 6:20 ` [Intel-wired-lan] [PATCH] idpf: do not perform flow ops when netdev is detached Loktionov, Aleksandr
2026-04-20 8:22 ` Kwapulinski, Piotr
2026-04-20 17:03 ` Li Li
2026-04-20 23:44 ` Jacob Keller
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260419192555.3631327-1-boolli@google.com \
--to=boolli@google.com \
--cc=anjali.singhai@intel.com \
--cc=anthony.l.nguyen@intel.com \
--cc=brianvv@google.com \
--cc=davem@davemloft.net \
--cc=decot@google.com \
--cc=edumazet@google.com \
--cc=emil.s.tantilov@intel.com \
--cc=intel-wired-lan@lists.osuosl.org \
--cc=kuba@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=przemyslaw.kitszel@intel.com \
--cc=sridhar.samudrala@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox