Netdev List
From: Jacob Keller <jacob.e.keller@intel.com>
To: Przemek Kitszel <przemyslaw.kitszel@intel.com>,
	Andrew Lunn <andrew+netdev@lunn.ch>,
	"David S. Miller" <davem@davemloft.net>,
	"Eric Dumazet" <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Piotr Kwapulinski <piotr.kwapulinski@intel.com>,
	Aleksandr Loktionov <aleksandr.loktionov@intel.com>,
	Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>,
	Maciej Fijalkowski <maciej.fijalkowski@intel.com>,
	Joshua Hay <joshua.a.hay@intel.com>,
	"Madhu Chittim" <madhu.chittim@intel.com>,
	Willem de Bruijn <willemb@google.com>,
	Dave Ertman <david.m.ertman@intel.com>,
	Ivan Vecera <ivecera@redhat.com>,
	Grzegorz Nitka <grzegorz.nitka@intel.com>
Cc: <netdev@vger.kernel.org>, <stable@vger.kernel.org>,
	Simon Horman <horms@kernel.org>,
	Sunitha Mekala <sunithax.d.mekala@intel.com>
Subject: Re: [PATCH net 03/13] i40e: keep q_vectors array in sync with channel count changes
Date: Wed, 6 May 2026 13:53:13 -0700	[thread overview]
Message-ID: <329453de-c300-441e-934c-cef7682fd55a@intel.com> (raw)
In-Reply-To: <20260504-jk-iwl-net-2026-05-04-v1-3-a222a88bd962@intel.com>

On 5/4/2026 10:14 PM, Jacob Keller wrote:
> From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> 
> For the main VSI, i40e_set_num_rings_in_vsi() always derives
> num_q_vectors from pf->num_lan_msix. At the same time, ethtool -L stores
> the user requested channel count in vsi->req_queue_pairs and the queue
> setup path uses that value for the effective number of queue pairs.
> 
> This leaves queue and vector counts out of sync after shrinking channel
> count via ethtool -L. The active queue configuration is reduced, but the
> VSI still keeps the full PF-sized q_vector topology.
> 
> That mismatch breaks reconfiguration flows which rely on vector/NAPI
> state matching the effective channel configuration. In particular,
> toggling /sys/class/net/<dev>/threaded after reducing the channel count
> can hang, and later channel-count changes can fail because VSI reinit
> does not rebuild q_vectors to match the new vector count.
> 
> Fix this by making the main VSI num_q_vectors follow the effective
> requested channel count, capped by the available MSI-X vectors. Update
> i40e_vsi_reinit_setup() to rebuild q_vectors during VSI reinit so the
> vector topology is refreshed together with the ring arrays when channel
> count changes.
> 
> Keep alloc_queue_pairs unchanged and based on pf->num_lan_qps so the VSI
> retains its full queue capacity.
> 
> Selftest napi_threaded.py was originally used when Jakub reported hang
> on /sys/class/net/<dev>/threaded toggle. In order to make it pass on
> i40e, use persistent NAPI configuration for q_vector NAPIs so NAPI
> identity and threaded settings survive q_vector reallocation across
> channel-count changes. This is achieved by using netif_napi_add_config()
> when configuring q_vectors.
> 
> $ export NETIF=ens259f1np1
> $ sudo -E env PATH="$PATH" ./tools/testing/selftests/drivers/net/napi_threaded.py
> TAP version 13
> 1..3
> ok 1 napi_threaded.napi_init
> ok 2 napi_threaded.change_num_queues
> ok 3 napi_threaded.enable_dev_threaded_disable_napi_threaded
> Totals: pass:3 fail:0 xfail:0 xpass:0 skip:0 error:0
> 
> [Jake: use min() and clamp() as suggested by Simon on Intel Wired LAN]
> 
> Reported-by: Jakub Kicinski <kuba@kernel.org>
> Closes: https://lore.kernel.org/intel-wired-lan/20260316133100.6054a11f@kernel.org/
> Fixes: d2a69fefd756 ("i40e: Fix changing previously set num_queue_pairs for PFs")
> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> Reviewed-by: Simon Horman <horms@kernel.org>
> Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com>
> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
> ---

Sashiko points out some possible issues that might need to be addressed:

>  drivers/net/ethernet/intel/i40e/i40e_main.c | 34 ++++++++++++++++++++++++-----
>  1 file changed, 28 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
> index 6d4f9218dc68..23156015ed86 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
> @@ -11403,10 +11403,14 @@ static void i40e_service_timer(struct timer_list *t)
>  static int i40e_set_num_rings_in_vsi(struct i40e_vsi *vsi)
>  {
>  	struct i40e_pf *pf = vsi->back;
> +	u16 qps;
>  
>  	switch (vsi->type) {
>  	case I40E_VSI_MAIN:
>  		vsi->alloc_queue_pairs = pf->num_lan_qps;
> +		qps = vsi->req_queue_pairs ?
> +		      min(vsi->req_queue_pairs, pf->num_lan_qps) :
> +		      pf->num_lan_qps;
>  		if (!vsi->num_tx_desc)
>  			vsi->num_tx_desc = ALIGN(I40E_DEFAULT_NUM_DESCRIPTORS,
>  						 I40E_REQ_DESCRIPTOR_MULTIPLE);
> @@ -11414,7 +11418,7 @@ static int i40e_set_num_rings_in_vsi(struct i40e_vsi *vsi)
>  			vsi->num_rx_desc = ALIGN(I40E_DEFAULT_NUM_DESCRIPTORS,
>  						 I40E_REQ_DESCRIPTOR_MULTIPLE);
>  		if (test_bit(I40E_FLAG_MSIX_ENA, pf->flags))
> -			vsi->num_q_vectors = pf->num_lan_msix;
> +			vsi->num_q_vectors = clamp(qps, 1, pf->num_lan_msix);
>  		else
>  			vsi->num_q_vectors = 1;
>  
> @@ -11503,6 +11507,7 @@ static int i40e_vsi_alloc_arrays(struct i40e_vsi *vsi, bool alloc_qvectors)
>  
>  err_vectors:
>  	kfree(vsi->tx_rings);
> +	vsi->tx_rings = NULL;
>  	return ret;
>  }
>  
> @@ -12043,7 +12048,8 @@ static int i40e_vsi_alloc_q_vector(struct i40e_vsi *vsi, int v_idx)
>  	cpumask_copy(&q_vector->affinity_mask, cpu_possible_mask);
>  
>  	if (vsi->netdev)
> -		netif_napi_add(vsi->netdev, &q_vector->napi, i40e_napi_poll);
> +		netif_napi_add_config(vsi->netdev, &q_vector->napi,
> +				      i40e_napi_poll, v_idx);
>  
>  	/* tie q_vector and vsi together */
>  	vsi->q_vectors[v_idx] = q_vector;
> @@ -14264,12 +14270,27 @@ static struct i40e_vsi *i40e_vsi_reinit_setup(struct i40e_vsi *vsi)
>  
>  	pf = vsi->back;
>  
> +	if (test_bit(I40E_FLAG_MSIX_ENA, pf->flags)) {
> +		i40e_put_lump(pf->irq_pile, vsi->base_vector, vsi->idx);
> +		vsi->base_vector = 0;
> +	}
> +


It says:
> 
> Does vsi->base_vector need to be cleared unconditionally?
> If the hardware falls back to legacy/MSI mode, the I40E_FLAG_MSIX_ENA flag is
> false, which leaves vsi->base_vector with a stale non-zero value.
> Subsequent calls to i40e_vsi_setup_vectors() check if (vsi->base_vector) and
> unconditionally return -EEXIST, which causes the VSI reinitialization to
> fail and the VSI to be destroyed.

I'm not sure this is an issue. base_vector should only be non-zero when
MSI-X is enabled, so this will clear it to 0 in that case. In MSI or
legacy mode it will already be zero, so it doesn't need to be cleared.
When switching modes we have to go through this function to
reinitialize, so we should clear it.

It might nominally make more sense to always clear the value to ensure
it's zero regardless of whether it ever got a stale value, but I don't
think it's actually required.


>  	i40e_put_lump(pf->qp_pile, vsi->base_queue, vsi->idx);
>  	i40e_vsi_clear_rings(vsi);
>  
> -	i40e_vsi_free_arrays(vsi, false);
> +	i40e_vsi_free_q_vectors(vsi);
> +	i40e_vsi_free_arrays(vsi, true);

Sashiko says:

> Can this result in a use-after-free on the ring structures?
> The i40e_vsi_clear_rings() function passes the ring structures to kfree_rcu().
> Immediately after, i40e_vsi_free_q_vectors() iterates over the q_vector->tx
> list, which still points to these RCU-freed rings, and attempts to modify
> them.

I think this is correct, and we'll need to free these in the order
i40e_vsi_free_q_vectors -> i40e_vsi_clear_rings -> i40e_vsi_free_arrays.

It does seem like the q_vectors maintain their own pointers to the
rings, and i40e_for_each_ring would access those values. The
i40e_vsi_clear_rings function does assign the VSI's ring pointers to
NULL, but those don't appear to be the same pointers checked in the
q_vector.

I will note that there is at least one other teardown flow that also
appears incorrect and calls i40e_vsi_clear_rings before
i40e_vsi_free_q_vectors: the teardown flow of i40e_vsi_setup().

This needs to be addressed, so this patch should be dropped from the series.

>  	i40e_set_num_rings_in_vsi(vsi);
> -	ret = i40e_vsi_alloc_arrays(vsi, false);
> +
> +	ret = i40e_vsi_alloc_arrays(vsi, true);
> +	if (ret)
> +		goto err_vsi;
> +
> +	/* Rebuild q_vectors during VSI reinit because the effective channel
> +	 * count may change num_q_vectors. Keep vector topology aligned with the
> +	 * queue configuration after ethtool's .set_channels() callback.
> +	 */
> +	ret = i40e_vsi_setup_vectors(vsi);
>  	if (ret)
>  		goto err_vsi;
>  


Finally, Sashiko says this:

> Is it possible for the dynamic reallocation of vsi->num_q_vectors to cause
> irq_pile fragmentation?
> Because i40e_get_lump() requires a contiguous block of vectors, reducing
> the channel count and later increasing it might fail if another VSI
> (such as a Macvlan interface) allocated vectors from the freed hole in
> the meantime.
> If this contiguous allocation fails, i40e_vsi_reinit_setup() aborts and
> destroys the VSI, which could render the interface unusable until a driver
> reload.
> Also, if i40e_vsi_setup_vectors() fails here and jumps to err_vsi, will the
> driver skip unregister_netdev()?
> The err_vsi label is placed after the err_rings block, which skips cleaning up
> the netdev. The code then falls through to i40e_vsi_clear() which frees the
> VSI memory.
> Could this leave the active net_device registered in the kernel with a
> dangling pointer to the freed VSI memory?


I'm not sure exactly what it's trying to say here, since it seems to
mix a few separate issues. However, I suspect that even if the
fragmentation issue is real, it shouldn't be solved by this fix.

I am also not certain the err_vsi issue is real. The function returns
NULL when it fails to reinit, and that will stop probe or rebuild. It is
likely that this entire chunk of code is flawed and should be redone,
but I think that is also out of scope for this fix.

Still, we need a new version that fixes the use-after-free in the
q_vectors teardown ordering.


Thread overview: 26+ messages
2026-05-05  5:14 [PATCH net 00/13] Intel Wired LAN Driver Updates 2026-05-04 (i40e, ice, idpf) Jacob Keller
2026-05-05  5:14 ` [PATCH net 01/13] i40e: Cleanup PTP registration on probe failure Jacob Keller
2026-05-06 20:24   ` Jacob Keller
2026-05-05  5:14 ` [PATCH net 02/13] i40e: Cleanup PTP pins " Jacob Keller
2026-05-06 20:28   ` Jacob Keller
2026-05-05  5:14 ` [PATCH net 03/13] i40e: keep q_vectors array in sync with channel count changes Jacob Keller
2026-05-06 20:53   ` Jacob Keller [this message]
2026-05-05  5:14 ` [PATCH net 04/13] idpf: fix read_dev_clk_lock spinlock init in idpf_ptp_init() Jacob Keller
2026-05-05  5:14 ` [PATCH net 05/13] idpf: do not enable XDP if queue based scheduling is not supported Jacob Keller
2026-05-06 20:59   ` Jacob Keller
2026-05-05  5:14 ` [PATCH net 06/13] idpf: fix skb datapath queue based scheduling crashes and timeouts Jacob Keller
2026-05-05  5:14 ` [PATCH net 07/13] idpf: fix xdp crash in soft reset error path Jacob Keller
2026-05-05  5:14 ` [PATCH net 08/13] idpf: fix double free and use-after-free in aux device error paths Jacob Keller
2026-05-06 21:04   ` Jacob Keller
2026-05-05  5:14 ` [PATCH net 09/13] ice: fix setting RSS VSI hash for E830 Jacob Keller
2026-05-06 21:06   ` Jacob Keller
2026-05-07 11:47     ` Marcin Szycik
2026-05-07 16:59       ` Marcin Szycik
2026-05-07 21:13         ` Jacob Keller
2026-05-05  5:14 ` [PATCH net 10/13] ice: fix locking in ice_dcb_rebuild() Jacob Keller
2026-05-06 21:13   ` Jacob Keller
2026-05-05  5:14 ` [PATCH net 11/13] ice: fix PTP hang for E825C devices Jacob Keller
2026-05-06 21:16   ` Jacob Keller
2026-05-05  5:14 ` [PATCH net 12/13] ice: dpll: fix rclk pin state get for E810 Jacob Keller
2026-05-05  5:14 ` [PATCH net 13/13] ice: dpll: fix misplaced header macros Jacob Keller
2026-05-06 21:21 ` [PATCH net 00/13] Intel Wired LAN Driver Updates 2026-05-04 (i40e, ice, idpf) Jacob Keller
