public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
From: Jacob Keller <jacob.e.keller@intel.com>
To: Li Li <boolli@google.com>,
	Tony Nguyen <anthony.l.nguyen@intel.com>,
	Przemek Kitszel <przemyslaw.kitszel@intel.com>,
	"David S. Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>,
	Eric Dumazet <edumazet@google.com>,
	<intel-wired-lan@lists.osuosl.org>
Cc: <netdev@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	David Decotigny <decot@google.com>,
	Anjali Singhai <anjali.singhai@intel.com>,
	"Sridhar Samudrala" <sridhar.samudrala@intel.com>,
	Brian Vazquez <brianvv@google.com>, <emil.s.tantilov@intel.com>
Subject: Re: [PATCH] idpf: do not perform flow ops when netdev is detached
Date: Mon, 20 Apr 2026 16:44:37 -0700	[thread overview]
Message-ID: <0119d579-f847-4c1c-bf4d-a31b352420ea@intel.com> (raw)
In-Reply-To: <20260419192555.3631327-1-boolli@google.com>

On 4/19/2026 12:25 PM, Li Li wrote:
> Even though commit 2e281e1155fc ("idpf: detach and close netdevs while
> handling a reset") prevents ethtool -N/-n operations to operate on
> detached netdevs, we found that out-of-tree workflows like OpenOnload
> can bypass ethtool core locks and call idpf_set_rxnfc directly during
> an idpf HW reset. When this happens, we could get kernel crashes like
> the following:
> 
> [ 4045.787439] BUG: kernel NULL pointer dereference, address: 0000000000000070
> [ 4045.794420] #PF: supervisor read access in kernel mode
> [ 4045.799580] #PF: error_code(0x0000) - not-present page
> [ 4045.804739] PGD 0
> [ 4045.806772] Oops: Oops: 0000 [#1] SMP NOPTI
> ...
> [ 4045.836425] Workqueue: onload-wqueue oof_do_deferred_work_fn [onload]
> [ 4045.842926] RIP: 0010:idpf_del_flow_steer+0x24/0x170 [idpf]
> ...
> [ 4045.946323] Call Trace:
> [ 4045.948796]  <TASK>
> [ 4045.950915]  ? show_trace_log_lvl+0x1b0/0x2f0
> [ 4045.955293]  ? show_trace_log_lvl+0x1b0/0x2f0
> [ 4045.959672]  ? idpf_set_rxnfc+0x6f/0x80 [idpf]
> [ 4045.964142]  ? __die_body.cold+0x8/0x12
> [ 4045.968000]  ? page_fault_oops+0x148/0x160
> [ 4045.972117]  ? exc_page_fault+0x6f/0x160
> [ 4045.976060]  ? asm_exc_page_fault+0x22/0x30
> [ 4045.980262]  ? idpf_del_flow_steer+0x24/0x170 [idpf]
> [ 4045.985245]  idpf_set_rxnfc+0x6f/0x80 [idpf]
> [ 4045.989535]  af_xdp_filter_remove+0x7c/0xb0 [sfc_resource]
> [ 4045.995069]  oo_hw_filter_clear_hwports+0x6f/0xa0 [onload]
> [ 4046.000589]  oo_hw_filter_update+0x65/0x210 [onload]
> [ 4046.005587]  oof_hw_filter_update.constprop.0+0xe7/0x140 [onload]
> [ 4046.011716]  oof_manager_update_all_filters+0xad/0x270 [onload]
> [ 4046.017671]  __oof_do_deferred_work+0x15e/0x190 [onload]
> [ 4046.023014]  oof_do_deferred_work+0x2c/0x40 [onload]
> [ 4046.028018]  oof_do_deferred_work_fn+0x12/0x30 [onload]
> [ 4046.033277]  process_one_work+0x174/0x330
> [ 4046.037304]  worker_thread+0x246/0x390
> [ 4046.041074]  ? __pfx_worker_thread+0x10/0x10
> [ 4046.045364]  kthread+0xf6/0x240
> [ 4046.048530]  ? __pfx_kthread+0x10/0x10
> [ 4046.052297]  ret_from_fork+0x2d/0x50
> [ 4046.055896]  ? __pfx_kthread+0x10/0x10
> [ 4046.059664]  ret_from_fork_asm+0x1a/0x30
> [ 4046.063613]  </TASK>
> 
> To prevent this, we need to add checks in idpf_set_rxnfc and
> idpf_get_rxnfc to error out if the netdev is already detached.
> 
> Tested: implemented the following patch to synthetically force idpf into
> a HW reset:
> 
> diff --git a/drivers/net/ethernet/intel/idpf/idpf_txrx.c b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
> index 4fc0bb14c5b1..27476d57bcf0 100644
> --- a/drivers/net/ethernet/intel/idpf/idpf_txrx.c
> +++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
> @@ -10,6 +10,9 @@
>  #define idpf_tx_buf_next(buf)		(*(u32 *)&(buf)->priv)
>  LIBETH_SQE_CHECK_PRIV(u32);
> 

Patchwork, and likely other git tools based around plain text mail do
not work kindly with an embedded diff inside the commit message. Could
you please resubmit with an updated commit message that doesn't simply
insert the raw diff? Perhaps you could indent the diff by a few spaces,
or simply describe what modifications were required to force the failure.

Also you didn't mention the target tree, which I think should be iwl-net.

Thanks,
Jake

> +static bool SIMULATE_TX_TIMEOUT;
> +module_param(SIMULATE_TX_TIMEOUT, bool, 0644);
> +
>  /**
>   * idpf_chk_linearize - Check if skb exceeds max descriptors per packet
>   * @skb: send buffer
> @@ -46,6 +49,8 @@ void idpf_tx_timeout(struct net_device *netdev, unsigned int txqueue)
> 
>  	adapter->tx_timeout_count++;
> 
> +	SIMULATE_TX_TIMEOUT = false;
> +
>  	netdev_err(netdev, "Detected Tx timeout: Count %d, Queue %d\n",
>  		   adapter->tx_timeout_count, txqueue);
>  	if (!idpf_is_reset_in_prog(adapter)) {
> @@ -2225,6 +2230,8 @@ static bool idpf_tx_clean_complq(struct idpf_compl_queue *complq, int budget,
>  			goto fetch_next_desc;
>  		}
>  		tx_q = complq->txq_grp->txqs[rel_tx_qid];
> +		if (unlikely(SIMULATE_TX_TIMEOUT && (tx_q->idx % 2 == 1)))
> +			goto fetch_next_desc;
> 
>  		/* Determine completion type */
>  		ctype = le16_get_bits(tx_desc->common.qid_comptype_gen,
> diff --git a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
> index be66f9b2e101..ba5da2a86c15 100644
> --- a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
> +++ b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
> @@ -8,6 +8,9 @@
>  #include "idpf_virtchnl.h"
>  #include "idpf_ptp.h"
> 
> +static bool VIRTCHNL_FAILED;
> +module_param(VIRTCHNL_FAILED, bool, 0644);
> +
>  /**
>   * struct idpf_vc_xn_manager - Manager for tracking transactions
>   * @ring: backing and lookup for transactions
> @@ -3496,6 +3499,11 @@ int idpf_vc_core_init(struct idpf_adapter *adapter)
>  		switch (adapter->state) {
>  		case __IDPF_VER_CHECK:
>  			err = idpf_send_ver_msg(adapter);
> +
> +			if (unlikely(VIRTCHNL_FAILED)) {
> +				err = -EIO;
> +			}
> +
>  			switch (err) {
>  			case 0:
>  				/* success, move state machine forward */
> 
> And tested by writing 1 to /sys/module/idpf/parameters/VIRTCHNL_FAILED
> and /sys/module/idpf/parameters/SIMULATE_TX_TIMEOUT, and running
> idpf_get_rxnfc() right after the HW reset.
> 
> Without the patch: encountered NULL pointer and kernel crash.
> 
> With the patch: no crashes.
> 
> Fixes: 2e281e1155fc ("idpf: detach and close netdevs while handling a reset")
> Signed-off-by: Li Li <boolli@google.com>
> ---
>  drivers/net/ethernet/intel/idpf/idpf_ethtool.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/net/ethernet/intel/idpf/idpf_ethtool.c b/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
> index bb99d9e7c65d..8368a7e6a754 100644
> --- a/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
> +++ b/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
> @@ -43,6 +43,9 @@ static int idpf_get_rxnfc(struct net_device *netdev, struct ethtool_rxnfc *cmd,
>  	unsigned int cnt = 0;
>  	int err = 0;
>  
> +	if (!netdev || !netif_device_present(netdev))
> +		return -ENODEV;
> +
>  	idpf_vport_ctrl_lock(netdev);
>  	vport = idpf_netdev_to_vport(netdev);
>  	vport_config = np->adapter->vport_config[np->vport_idx];
> @@ -349,6 +352,9 @@ static int idpf_set_rxnfc(struct net_device *netdev, struct ethtool_rxnfc *cmd)
>  {
>  	int ret = -EOPNOTSUPP;
>  
> +	if (!netdev || !netif_device_present(netdev))
> +		return -ENODEV;
> +
>  	idpf_vport_ctrl_lock(netdev);
>  	switch (cmd->cmd) {
>  	case ETHTOOL_SRXCLSRLINS:


      parent reply	other threads:[~2026-04-20 23:44 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-19 19:25 [PATCH] idpf: do not perform flow ops when netdev is detached Li Li
2026-04-20  6:20 ` [Intel-wired-lan] " Loktionov, Aleksandr
2026-04-20  8:22 ` Kwapulinski, Piotr
2026-04-20 17:03   ` Li Li
2026-04-20 23:44 ` Jacob Keller [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0119d579-f847-4c1c-bf4d-a31b352420ea@intel.com \
    --to=jacob.e.keller@intel.com \
    --cc=anjali.singhai@intel.com \
    --cc=anthony.l.nguyen@intel.com \
    --cc=boolli@google.com \
    --cc=brianvv@google.com \
    --cc=davem@davemloft.net \
    --cc=decot@google.com \
    --cc=edumazet@google.com \
    --cc=emil.s.tantilov@intel.com \
    --cc=intel-wired-lan@lists.osuosl.org \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=przemyslaw.kitszel@intel.com \
    --cc=sridhar.samudrala@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox