* [PATCH v5 net 0/7] i40e: re-init and UAF fixes
@ 2026-07-01 12:45 Maciej Fijalkowski
2026-07-01 12:45 ` [PATCH v5 net 1/7] i40e: unregister netdev before clearing VSI on reinit failure Maciej Fijalkowski
` (6 more replies)
0 siblings, 7 replies; 8+ messages in thread
From: Maciej Fijalkowski @ 2026-07-01 12:45 UTC (permalink / raw)
To: intel-wired-lan
Cc: netdev, magnus.karlsson, kuba, pabeni, horms, przemyslaw.kitszel,
jacob.e.keller, Maciej Fijalkowski
v5:
- include three new patches to address last Sashiko review
- do not release the irq lump in rebuild path in patch 7
- clear dangling pointers from rx and xdp rings arrays
v4:
- add a preceding patch that fixes the case where a re-init allocation
failed and the failure path did not unregister the netdev
- pull the i40e_vsi_setup() changes out into a separate patch
v3:
- address UAF when ring arrays were freed before q_vector's ring
containers (Sashiko, Jacob)
- remove bool params from alloc/free array routines (Simon)
v2:
- NULL vsi->tx_rings in i40e_vsi_alloc_arrays() (Sashiko)
Maciej Fijalkowski (7):
i40e: unregister netdev before clearing VSI on reinit failure
i40e: avoid null ptr dereference in i40e_ptp_stop()
i40e: make ring pointers unreachable before freeing via rcu
i40e: avoid deadlock when calling unregister_netdev()
i40e: fix potential UAF in i40e_vsi_setup()'s error path
i40e: do not expose netdev too early
i40e: keep q_vectors array in sync with channel count changes
drivers/net/ethernet/intel/i40e/i40e_main.c | 128 ++++++++++++--------
drivers/net/ethernet/intel/i40e/i40e_ptp.c | 5 +-
2 files changed, 82 insertions(+), 51 deletions(-)
--
2.43.0
* [PATCH v5 net 1/7] i40e: unregister netdev before clearing VSI on reinit failure
From: Maciej Fijalkowski @ 2026-07-01 12:45 UTC (permalink / raw)
To: intel-wired-lan
Cc: netdev, magnus.karlsson, kuba, pabeni, horms, przemyslaw.kitszel,
jacob.e.keller, Maciej Fijalkowski
i40e_vsi_reinit_setup() tears down the existing VSI queue/ring backing
state before allocating replacement arrays and queue tracking. If one of
these early allocations fails, the function jumps directly to err_vsi
and calls i40e_vsi_clear().
For a registered netdev, this frees the VSI while
netdev_priv(netdev)->vsi can still point at it, leaving the registered
netdev with dangling private driver state.
Split the error path so failures after destructive reinit teardown first
unregister and free the netdev before clearing the VSI.
Fixes: d2a69fefd756 ("i40e: Fix changing previously set num_queue_pairs for PFs")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
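The label split described above can be pictured with a small userspace model. This is only a sketch: `model_vsi`, `model_reinit()` and the `use_fix` switch are hypothetical stand-ins, not driver code, and the model assumes that clearing a VSI while its netdev is still registered is exactly the dangling-state case the commit message describes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the VSI; not the driver's struct i40e_vsi. */
struct model_vsi {
	bool netdev_registered;	/* the stack can still reach the netdev */
	bool cleared;		/* VSI memory has been released */
	bool dangling;		/* VSI was cleared under a registered netdev */
};

static void model_unregister(struct model_vsi *vsi)
{
	vsi->netdev_registered = false;
}

static void model_clear(struct model_vsi *vsi)
{
	assert(!vsi->cleared);	/* clearing twice would be a double free */
	if (vsi->netdev_registered)
		vsi->dangling = true;
	vsi->cleared = true;
}

/*
 * fail_alloc simulates an allocation failure after the destructive teardown.
 * use_fix selects the patched label (err_netdev) over the old one (err_vsi).
 * Returns 0 on success, -1 on failure.
 */
static int model_reinit(struct model_vsi *vsi, bool fail_alloc, bool use_fix)
{
	if (fail_alloc) {
		if (use_fix)
			goto err_netdev;
		goto err_vsi;
	}
	return 0;

err_netdev:
	if (vsi->netdev_registered)
		model_unregister(vsi);
err_vsi:
	model_clear(vsi);
	return -1;
}
```

With `use_fix` set to false the model ends with `dangling` set; with it set to true the netdev is unregistered before the VSI is cleared.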
drivers/net/ethernet/intel/i40e/i40e_main.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index a04683004a56..471fa7f7b643 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -14274,7 +14274,7 @@ static struct i40e_vsi *i40e_vsi_reinit_setup(struct i40e_vsi *vsi)
i40e_set_num_rings_in_vsi(vsi);
ret = i40e_vsi_alloc_arrays(vsi, false);
if (ret)
- goto err_vsi;
+ goto err_netdev;
alloc_queue_pairs = vsi->alloc_queue_pairs *
(i40e_enabled_xdp_vsi(vsi) ? 2 : 1);
@@ -14284,7 +14284,7 @@ static struct i40e_vsi *i40e_vsi_reinit_setup(struct i40e_vsi *vsi)
dev_info(&pf->pdev->dev,
"failed to get tracking for %d queues for VSI %d err %d\n",
alloc_queue_pairs, vsi->seid, ret);
- goto err_vsi;
+ goto err_netdev;
}
vsi->base_queue = ret;
@@ -14309,6 +14309,7 @@ static struct i40e_vsi *i40e_vsi_reinit_setup(struct i40e_vsi *vsi)
err_rings:
i40e_vsi_free_q_vectors(vsi);
+err_netdev:
if (vsi->netdev_registered) {
vsi->netdev_registered = false;
unregister_netdev(vsi->netdev);
@@ -14318,7 +14319,6 @@ static struct i40e_vsi *i40e_vsi_reinit_setup(struct i40e_vsi *vsi)
if (vsi->type == I40E_VSI_MAIN)
i40e_devlink_destroy_port(pf);
i40e_aq_delete_element(&pf->hw, vsi->seid, NULL);
-err_vsi:
i40e_vsi_clear(vsi);
return NULL;
}
--
2.43.0
* [PATCH v5 net 2/7] i40e: avoid null ptr dereference in i40e_ptp_stop()
From: Maciej Fijalkowski @ 2026-07-01 12:45 UTC (permalink / raw)
To: intel-wired-lan
Cc: netdev, magnus.karlsson, kuba, pabeni, horms, przemyslaw.kitszel,
jacob.e.keller, Maciej Fijalkowski, Sashiko AI Review
Sashiko reports:
***
If an allocation fails here during i40e_rebuild(), i40e_vsi_clear() frees
the main VSI and sets pf->vsi[vsi->idx] = NULL, and the rebuild will abort
without stopping the PTP clock.
Later, if the device is removed or unbound, i40e_remove() unconditionally
calls i40e_ptp_stop(), which does:
drivers/net/ethernet/intel/i40e/i40e_ptp.c:i40e_ptp_stop() {
...
struct i40e_vsi *main_vsi = i40e_pf_get_main_vsi(pf);
...
dev_info(&pf->pdev->dev, "%s: removed PHC on %s\n", __func__,
main_vsi->netdev->name);
...
}
Would this cause a NULL pointer dereference since main_vsi is now NULL?
***
Check that main_vsi is non-NULL before dereferencing it in the dev_info()
call; the PTP clock is still unregistered either way.
Fixes: beb0dff1251d ("i40e: enable PTP")
Reported-by: Sashiko AI Review <sashiko-bot@kernel.org>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
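As a rough illustration of the guard, the userspace sketch below logs the message only when the main VSI still exists. All `model_*` names are hypothetical, and the log text is formatted into a caller-supplied buffer purely so the behaviour can be checked outside the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct model_netdev { const char *name; };
struct model_vsi { struct model_netdev *netdev; };
struct model_pf {
	struct model_vsi *main_vsi;	/* NULL after a failed rebuild */
	bool ptp_clock;			/* true while the clock is registered */
};

/*
 * Stops the modeled clock. Writes the log line into buf only when the main
 * VSI still exists; returns the length written, or 0 when nothing was logged.
 */
static int model_ptp_stop(struct model_pf *pf, char *buf, size_t len)
{
	int n = 0;

	assert(buf && len > 0);
	buf[0] = '\0';
	if (pf->ptp_clock) {
		pf->ptp_clock = false;	/* always unregister the clock */
		if (pf->main_vsi)	/* only the message needs the VSI */
			n = snprintf(buf, len, "removed PHC on %s",
				     pf->main_vsi->netdev->name);
	}
	return n;
}
```

The point of the model is that the clock teardown is unconditional while the log line is optional.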
drivers/net/ethernet/intel/i40e/i40e_ptp.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_ptp.c b/drivers/net/ethernet/intel/i40e/i40e_ptp.c
index ff62b5f2c815..ca93df4d6785 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ptp.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ptp.c
@@ -1556,8 +1556,9 @@ void i40e_ptp_stop(struct i40e_pf *pf)
if (pf->ptp_clock) {
ptp_clock_unregister(pf->ptp_clock);
pf->ptp_clock = NULL;
- dev_info(&pf->pdev->dev, "%s: removed PHC on %s\n", __func__,
- main_vsi->netdev->name);
+ if (main_vsi)
+ dev_info(&pf->pdev->dev, "%s: removed PHC on %s\n", __func__,
+ main_vsi->netdev->name);
}
if (i40e_is_ptp_pin_dev(&pf->hw)) {
--
2.43.0
* [PATCH v5 net 3/7] i40e: make ring pointers unreachable before freeing via rcu
From: Maciej Fijalkowski @ 2026-07-01 12:45 UTC (permalink / raw)
To: intel-wired-lan
Cc: netdev, magnus.karlsson, kuba, pabeni, horms, przemyslaw.kitszel,
jacob.e.keller, Maciej Fijalkowski, Sashiko AI Review
Sashiko reports:
***
> err_config:
> + i40e_vsi_free_q_vectors(vsi);
> +err_qvec:
> i40e_vsi_clear_rings(vsi);
This is a pre-existing issue, but can the sequence in i40e_vsi_clear_rings()
lead to an RCU ordering violation?
In i40e_vsi_clear_rings(), the rings are freed before the array pointers are
nullified:
kfree_rcu(vsi->tx_rings[i], rcu);
WRITE_ONCE(vsi->tx_rings[i], NULL);
Under RCU rules, a pointer must be made unreachable to new readers before it
is handed off to kfree_rcu(). Could a new RCU reader (like
i40e_get_netdev_stats_struct_tx()) fetch the pointer after kfree_rcu() is
invoked, and access freed memory if the grace period expires while the
reader is still active?
***
Save the Tx ring pointer before clearing the published ring array slots
and pass the saved pointer to kfree_rcu(). This preserves the intended
RCU ordering, where new readers can no longer discover the ring through
vsi->tx_rings/rx_rings/xdp_rings before the object is queued for
deferred freeing, while avoiding a NULL kfree_rcu() argument after the
slot has already been cleared. Since the Tx pointer is the base of the
per-queue-pair allocation block, re-reading vsi->tx_rings[i] after
WRITE_ONCE(..., NULL) would otherwise turn the free into a no-op and
leak the whole ring block.
Fixes: 9f65e15b4f98 ("i40e: Move rings from pointer to array to array of pointers")
Reported-by: Sashiko AI Review <sashiko-bot@kernel.org>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
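The save, unpublish, then queue ordering can be sketched in userspace without real RCU. In this model `model_defer_free()` is a hypothetical stand-in for kfree_rcu() that only records pointers, and it treats NULL as a no-op, matching the leak scenario described in the commit message; none of the names below exist in the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_MAX_DEFERRED 8

struct model_ring { int id; };

/* Hypothetical deferred-free queue; pointers are freed after a grace period. */
struct model_deferred {
	void *ptrs[MODEL_MAX_DEFERRED];
	size_t count;
};

static void model_defer_free(struct model_deferred *d, void *p)
{
	if (!p)
		return;		/* a NULL argument frees nothing */
	assert(d->count < MODEL_MAX_DEFERRED);
	d->ptrs[d->count++] = p;
}

/* Models the end of the grace period: no reader can hold the pointers now. */
static void model_grace_period_end(struct model_deferred *d)
{
	for (size_t i = 0; i < d->count; i++)
		free(d->ptrs[i]);
	d->count = 0;
}

static void model_clear_rings(struct model_ring **slots, size_t n,
			      struct model_deferred *d)
{
	for (size_t i = 0; i < n; i++) {
		struct model_ring *ring = slots[i];	/* save first */

		slots[i] = NULL;		/* new readers now see NULL */
		model_defer_free(d, ring);	/* queue the saved pointer */
	}
}
```

Passing `slots[i]` to `model_defer_free()` after the slot was cleared would queue nothing, which is the leak the saved local pointer avoids.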
drivers/net/ethernet/intel/i40e/i40e_main.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 471fa7f7b643..a29a89192a7a 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -11699,11 +11699,13 @@ static void i40e_vsi_clear_rings(struct i40e_vsi *vsi)
if (vsi->tx_rings && vsi->tx_rings[0]) {
for (i = 0; i < vsi->alloc_queue_pairs; i++) {
- kfree_rcu(vsi->tx_rings[i], rcu);
+ struct i40e_ring *tx_ring = vsi->tx_rings[i];
+
WRITE_ONCE(vsi->tx_rings[i], NULL);
WRITE_ONCE(vsi->rx_rings[i], NULL);
if (vsi->xdp_rings)
WRITE_ONCE(vsi->xdp_rings[i], NULL);
+ kfree_rcu(tx_ring, rcu);
}
}
}
--
2.43.0
* [PATCH v5 net 4/7] i40e: avoid deadlock when calling unregister_netdev()
From: Maciej Fijalkowski @ 2026-07-01 12:45 UTC (permalink / raw)
To: intel-wired-lan
Cc: netdev, magnus.karlsson, kuba, pabeni, horms, przemyslaw.kitszel,
jacob.e.keller, Maciej Fijalkowski, Sashiko AI Review
Sashiko reports:
***
> +err_netdev:
> if (vsi->netdev_registered) {
> vsi->netdev_registered = false;
> unregister_netdev(vsi->netdev);
Could this result in a deadlock when called during a device rebuild?
Looking at i40e_rebuild(), it explicitly acquires the RTNL lock before
proceeding:
drivers/net/ethernet/intel/i40e/i40e_main.c:i40e_rebuild() {
...
if (!lock_acquired)
rtnl_lock();
ret = i40e_setup_pf_switch(pf, reinit, true);
...
}
If i40e_setup_pf_switch() calls i40e_vsi_reinit_setup() and takes this new
err_netdev path, unregister_netdev() will unconditionally attempt to acquire
rtnl_lock(), leading to a deadlock on the non-recursive mutex.
***
Use unregister_netdevice() when the rebuild path already holds RTNL, and
keep unregister_netdev() for callers that do not. This avoids both
recursive RTNL locking and dropping RTNL in the middle of the VSI unwind
path.
Fixes: bc7d338fbb3f ("i40e: reinit flow for the main VSI")
Reported-by: Sashiko AI Review <sashiko-bot@kernel.org>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
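The choice between the two unregister variants can be modeled with a non-recursive lock that reports, rather than performs, the deadlock. This is a sketch under stated assumptions: `model_lock` stands in for RTNL, `model_unregister_locked()` plays the unregister_netdevice() role and `model_unregister()` the unregister_netdev() role; all of these names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical non-recursive lock standing in for RTNL. */
struct model_lock { bool held; };

/* Returns false where a real non-recursive mutex would deadlock. */
static bool model_lock_acquire(struct model_lock *l)
{
	if (l->held)
		return false;
	l->held = true;
	return true;
}

static void model_lock_release(struct model_lock *l)
{
	assert(l->held);
	l->held = false;
}

/* Caller must already hold the lock. */
static void model_unregister_locked(struct model_lock *l, bool *registered)
{
	assert(l->held);
	*registered = false;
}

/* Takes the lock itself; -1 marks the would-be deadlock. */
static int model_unregister(struct model_lock *l, bool *registered)
{
	if (!model_lock_acquire(l))
		return -1;
	model_unregister_locked(l, registered);
	model_lock_release(l);
	return 0;
}

/* Unwind step: pick the variant that matches the caller's lock state. */
static int model_unwind(struct model_lock *l, bool *registered,
			bool lock_acquired)
{
	if (!*registered)
		return 0;
	if (lock_acquired) {
		model_unregister_locked(l, registered);
		return 0;
	}
	return model_unregister(l, registered);
}
```

In the locked case the unwind leaves the lock held for the caller, so the lock is neither taken twice nor dropped midway.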
drivers/net/ethernet/intel/i40e/i40e_main.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index a29a89192a7a..e88cf7cfbd84 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -14257,7 +14257,8 @@ static int i40e_vsi_setup_vectors(struct i40e_vsi *vsi)
* Returns pointer to the successfully allocated and configured VSI sw struct
* on success, otherwise returns NULL on failure.
**/
-static struct i40e_vsi *i40e_vsi_reinit_setup(struct i40e_vsi *vsi)
+static struct i40e_vsi *i40e_vsi_reinit_setup(struct i40e_vsi *vsi,
+ bool lock_acquired)
{
struct i40e_vsi *main_vsi;
u16 alloc_queue_pairs;
@@ -14314,7 +14315,10 @@ static struct i40e_vsi *i40e_vsi_reinit_setup(struct i40e_vsi *vsi)
err_netdev:
if (vsi->netdev_registered) {
vsi->netdev_registered = false;
- unregister_netdev(vsi->netdev);
+ if (lock_acquired)
+ unregister_netdevice(vsi->netdev);
+ else
+ unregister_netdev(vsi->netdev);
free_netdev(vsi->netdev);
vsi->netdev = NULL;
}
@@ -15036,7 +15040,7 @@ static int i40e_setup_pf_switch(struct i40e_pf *pf, bool reinit, bool lock_acqui
main_vsi = i40e_vsi_setup(pf, I40E_VSI_MAIN,
uplink_seid, 0);
else if (reinit)
- main_vsi = i40e_vsi_reinit_setup(main_vsi);
+ main_vsi = i40e_vsi_reinit_setup(main_vsi, lock_acquired);
if (!main_vsi) {
dev_info(&pf->pdev->dev, "setup of MAIN VSI failed\n");
i40e_cloud_filter_exit(pf);
--
2.43.0
* [PATCH v5 net 5/7] i40e: fix potential UAF in i40e_vsi_setup()'s error path
From: Maciej Fijalkowski @ 2026-07-01 12:45 UTC (permalink / raw)
To: intel-wired-lan
Cc: netdev, magnus.karlsson, kuba, pabeni, horms, przemyslaw.kitszel,
jacob.e.keller, Maciej Fijalkowski
Sashiko pointed out that the error path in i40e_vsi_reinit_setup()
released ring memory first and then, while freeing the q_vectors, touched
the rings mapped to them, which is a plain use-after-free.
i40e_vsi_setup() has the same problem, so swap the allocation and freeing
order there and fix this 13-year-old bug.
Fixes: 41c445ff0f48 ("i40e: main driver core")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
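The reordered allocation and unwind can be sketched as follows. The model is an assumption-laden illustration: the `model_*` types are hypothetical, the ring and vector live inside one struct so nothing is really freed, and an assert in `model_clear_rings()` stands in for the rule that no vector may still reference a ring when the ring goes away.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_ring { bool alive; bool mapped; };
struct model_qvec { bool alive; struct model_ring *ring; };
struct model_vsi {
	struct model_ring ring;
	struct model_qvec qvec;
};

/* Freeing a vector touches the ring mapped to it, as in the driver. */
static void model_free_q_vectors(struct model_vsi *vsi)
{
	struct model_ring *ring = vsi->qvec.ring;

	if (ring)
		ring->mapped = false;
	vsi->qvec.ring = NULL;
	vsi->qvec.alive = false;
}

static void model_clear_rings(struct model_vsi *vsi)
{
	assert(!vsi->qvec.ring);	/* vectors must be gone already */
	vsi->ring.alive = false;
}

/* fail_vectors and fail_config pick the stage at which setup fails. */
static int model_setup(struct model_vsi *vsi, bool fail_vectors,
		       bool fail_config)
{
	vsi->ring.alive = true;			/* rings first */
	if (fail_vectors)
		goto err_qvec;
	vsi->qvec.alive = true;			/* then vectors */
	vsi->qvec.ring = &vsi->ring;		/* map ring to vector */
	vsi->ring.mapped = true;
	if (fail_config)
		goto err_config;
	return 0;

err_config:
	model_free_q_vectors(vsi);		/* vectors are released first */
err_qvec:
	model_clear_rings(vsi);			/* rings only after them */
	return -1;
}
```

Because the labels unwind in the reverse of the setup order, the ring is still alive whenever the vector teardown touches it.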
drivers/net/ethernet/intel/i40e/i40e_main.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index e88cf7cfbd84..fcdd13af08ea 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -14466,14 +14466,14 @@ struct i40e_vsi *i40e_vsi_setup(struct i40e_pf *pf, u8 type,
fallthrough;
case I40E_VSI_FDIR:
/* set up vectors and rings if needed */
- ret = i40e_vsi_setup_vectors(vsi);
- if (ret)
- goto err_msix;
-
ret = i40e_alloc_rings(vsi);
if (ret)
goto err_rings;
+ ret = i40e_vsi_setup_vectors(vsi);
+ if (ret)
+ goto err_qvec;
+
/* map all of the rings to the q_vectors */
i40e_vsi_map_rings_to_vectors(vsi);
@@ -14493,10 +14493,10 @@ struct i40e_vsi *i40e_vsi_setup(struct i40e_pf *pf, u8 type,
return vsi;
err_config:
+ i40e_vsi_free_q_vectors(vsi);
+err_qvec:
i40e_vsi_clear_rings(vsi);
err_rings:
- i40e_vsi_free_q_vectors(vsi);
-err_msix:
if (vsi->netdev_registered) {
vsi->netdev_registered = false;
unregister_netdev(vsi->netdev);
--
2.43.0
* [PATCH v5 net 6/7] i40e: do not expose netdev too early
From: Maciej Fijalkowski @ 2026-07-01 12:45 UTC (permalink / raw)
To: intel-wired-lan
Cc: netdev, magnus.karlsson, kuba, pabeni, horms, przemyslaw.kitszel,
jacob.e.keller, Maciej Fijalkowski, Sashiko AI Review
i40e_vsi_setup() registers the netdev before rings and q_vectors are fully
allocated and mapped. Once register_netdev() returns, userspace can reach
netdev callbacks such as ndo_open(), so the VSI backing state must already
be ready.
Move register_netdev() to the end of the setup path, after ring/q_vector
allocation, ring mapping and RSS configuration. Keep freeing an allocated
but not registered netdev on the error path.
Fixes: 41c445ff0f48 ("i40e: main driver core")
Reported-by: Sashiko AI Review <sashiko-bot@kernel.org>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
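Why registration has to come last can be shown with a toy model in which registering immediately triggers an open attempt, mimicking userspace racing with setup. Everything here is hypothetical (`model_vsi`, `model_open()`, `model_register()`, the `register_first` switch); it only assumes, as the commit message states, that callbacks become reachable once registration returns.

```c
#include <assert.h>
#include <stdbool.h>

struct model_vsi {
	bool rings_ready;
	bool vectors_ready;
	bool registered;
	int early_opens;	/* open attempts that saw incomplete state */
};

/* Stand-in for ndo_open(): reachable as soon as the netdev is registered. */
static int model_open(struct model_vsi *vsi)
{
	assert(vsi->registered);
	if (!vsi->rings_ready || !vsi->vectors_ready) {
		vsi->early_opens++;
		return -1;
	}
	return 0;
}

/* Registration; the model opens the device right away to mimic userspace. */
static int model_register(struct model_vsi *vsi)
{
	vsi->registered = true;
	return model_open(vsi);
}

/* register_first selects the old ordering; false selects the patched one. */
static int model_setup(struct model_vsi *vsi, bool register_first)
{
	int ret = 0;

	if (register_first)
		ret = model_register(vsi);
	vsi->rings_ready = true;
	vsi->vectors_ready = true;
	if (!register_first)
		ret = model_register(vsi);
	return ret;
}
```

With registration at the end, the open attempt always observes fully prepared backing state.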
drivers/net/ethernet/intel/i40e/i40e_main.c | 29 ++++++++++++---------
1 file changed, 17 insertions(+), 12 deletions(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index fcdd13af08ea..a9ec53cfd905 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -14454,15 +14454,6 @@ struct i40e_vsi *i40e_vsi_setup(struct i40e_pf *pf, u8 type,
goto err_netdev;
SET_NETDEV_DEVLINK_PORT(vsi->netdev, &pf->devlink_port);
}
- ret = register_netdev(vsi->netdev);
- if (ret)
- goto err_dl_port;
- vsi->netdev_registered = true;
- netif_carrier_off(vsi->netdev);
-#ifdef CONFIG_I40E_DCB
- /* Setup DCB netlink interface */
- i40e_dcbnl_setup(vsi);
-#endif /* CONFIG_I40E_DCB */
fallthrough;
case I40E_VSI_FDIR:
/* set up vectors and rings if needed */
@@ -14490,6 +14481,19 @@ struct i40e_vsi *i40e_vsi_setup(struct i40e_pf *pf, u8 type,
if (ret)
goto err_config;
}
+
+ if (vsi->netdev) {
+ ret = register_netdev(vsi->netdev);
+ if (ret)
+ goto err_config;
+ vsi->netdev_registered = true;
+ netif_carrier_off(vsi->netdev);
+#ifdef CONFIG_I40E_DCB
+ /* Setup DCB netlink interface */
+ i40e_dcbnl_setup(vsi);
+#endif /* CONFIG_I40E_DCB */
+ }
+
return vsi;
err_config:
@@ -14500,13 +14504,14 @@ struct i40e_vsi *i40e_vsi_setup(struct i40e_pf *pf, u8 type,
if (vsi->netdev_registered) {
vsi->netdev_registered = false;
unregister_netdev(vsi->netdev);
- free_netdev(vsi->netdev);
- vsi->netdev = NULL;
}
-err_dl_port:
if (vsi->type == I40E_VSI_MAIN)
i40e_devlink_destroy_port(pf);
err_netdev:
+ if (vsi->netdev) {
+ free_netdev(vsi->netdev);
+ vsi->netdev = NULL;
+ }
i40e_aq_delete_element(&pf->hw, vsi->seid, NULL);
err_vsi:
i40e_vsi_clear(vsi);
--
2.43.0
* [PATCH v5 net 7/7] i40e: keep q_vectors array in sync with channel count changes
From: Maciej Fijalkowski @ 2026-07-01 12:45 UTC (permalink / raw)
To: intel-wired-lan
Cc: netdev, magnus.karlsson, kuba, pabeni, horms, przemyslaw.kitszel,
jacob.e.keller, Maciej Fijalkowski
For the main VSI, i40e_set_num_rings_in_vsi() always derives
num_q_vectors from pf->num_lan_msix. At the same time, ethtool -L stores
the user-requested channel count in vsi->req_queue_pairs and the queue
setup path uses that value for the effective number of queue pairs.
This leaves queue and vector counts out of sync after shrinking channel
count via ethtool -L. The active queue configuration is reduced, but the
VSI still keeps the full PF-sized q_vector topology.
That mismatch breaks reconfiguration flows which rely on vector/NAPI
state matching the effective channel configuration. In particular,
toggling /sys/class/net/<dev>/threaded after reducing the channel count
can hang, and later channel-count changes can fail because VSI reinit
does not rebuild q_vectors to match the new vector count.
Fix this by making the main VSI num_q_vectors follow the effective
requested channel count, capped by the available MSI-X vectors. Update
i40e_vsi_reinit_setup() to rebuild q_vectors during VSI reinit so the
vector topology is refreshed together with the ring arrays when channel
count changes.
Keep alloc_queue_pairs unchanged and based on pf->num_lan_qps so the VSI
retains its full queue capacity. Also do not touch irq_pile when
rebuilding vectors.
The napi_threaded.py selftest was originally used when Jakub reported the
hang on the /sys/class/net/<dev>/threaded toggle. To make it pass on
i40e, use persistent NAPI configuration for the q_vector NAPIs so that
NAPI identity and threaded settings survive q_vector reallocation across
channel-count changes. This is achieved by calling
netif_napi_add_config() when configuring q_vectors.
$ export NETIF=ens259f1np1
$ sudo -E env PATH="$PATH" ./tools/testing/selftests/drivers/net/napi_threaded.py
TAP version 13
1..3
ok 1 napi_threaded.napi_init
ok 2 napi_threaded.change_num_queues
ok 3 napi_threaded.enable_dev_threaded_disable_napi_threaded
Totals: pass:3 fail:0 xfail:0 xpass:0 skip:0 error:0
Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/intel-wired-lan/20260316133100.6054a11f@kernel.org/
Fixes: d2a69fefd756 ("i40e: Fix changing previously set num_queue_pairs for PFs")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
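The vector count rule for the main VSI can be written out as a standalone function. This is a userspace sketch of the arithmetic only: `model_num_q_vectors()` and its parameters are hypothetical names mirroring req_queue_pairs, num_lan_qps and num_lan_msix, and the MSI-X flag is passed in as a plain bool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Effective q_vector count: the requested channel count (or the full queue
 * capacity when nothing was requested), kept within [1, num_lan_msix].
 * Without MSI-X a single vector is used.
 */
static uint16_t model_num_q_vectors(uint16_t req_queue_pairs,
				    uint16_t num_lan_qps,
				    uint16_t num_lan_msix,
				    bool msix_enabled)
{
	uint16_t qps;

	if (!msix_enabled)
		return 1;
	assert(num_lan_msix >= 1);	/* the clamp needs a valid upper bound */

	/* A request of 0 means "not set"; never exceed the queue capacity. */
	qps = num_lan_qps;
	if (req_queue_pairs && req_queue_pairs < num_lan_qps)
		qps = req_queue_pairs;

	if (qps < 1)
		return 1;
	if (qps > num_lan_msix)
		return num_lan_msix;
	return qps;
}
```

For example, with 64 queue pairs and 32 MSI-X vectors, a request for 4 channels yields 4 vectors, while no request yields 32.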
drivers/net/ethernet/intel/i40e/i40e_main.c | 69 +++++++++++++--------
1 file changed, 44 insertions(+), 25 deletions(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index a9ec53cfd905..8a23bd99bd12 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -11406,10 +11406,14 @@ static void i40e_service_timer(struct timer_list *t)
static int i40e_set_num_rings_in_vsi(struct i40e_vsi *vsi)
{
struct i40e_pf *pf = vsi->back;
+ u16 qps;
switch (vsi->type) {
case I40E_VSI_MAIN:
vsi->alloc_queue_pairs = pf->num_lan_qps;
+ qps = vsi->req_queue_pairs ?
+ min(vsi->req_queue_pairs, pf->num_lan_qps) :
+ pf->num_lan_qps;
if (!vsi->num_tx_desc)
vsi->num_tx_desc = ALIGN(I40E_DEFAULT_NUM_DESCRIPTORS,
I40E_REQ_DESCRIPTOR_MULTIPLE);
@@ -11417,7 +11421,7 @@ static int i40e_set_num_rings_in_vsi(struct i40e_vsi *vsi)
vsi->num_rx_desc = ALIGN(I40E_DEFAULT_NUM_DESCRIPTORS,
I40E_REQ_DESCRIPTOR_MULTIPLE);
if (test_bit(I40E_FLAG_MSIX_ENA, pf->flags))
- vsi->num_q_vectors = pf->num_lan_msix;
+ vsi->num_q_vectors = clamp(qps, 1, pf->num_lan_msix);
else
vsi->num_q_vectors = 1;
@@ -11469,12 +11473,11 @@ static int i40e_set_num_rings_in_vsi(struct i40e_vsi *vsi)
/**
* i40e_vsi_alloc_arrays - Allocate queue and vector pointer arrays for the vsi
* @vsi: VSI pointer
- * @alloc_qvectors: a bool to specify if q_vectors need to be allocated.
*
* On error: returns error code (negative)
* On success: returns 0
**/
-static int i40e_vsi_alloc_arrays(struct i40e_vsi *vsi, bool alloc_qvectors)
+static int i40e_vsi_alloc_arrays(struct i40e_vsi *vsi)
{
struct i40e_ring **next_rings;
int size;
@@ -11493,19 +11496,20 @@ static int i40e_vsi_alloc_arrays(struct i40e_vsi *vsi, bool alloc_qvectors)
}
vsi->rx_rings = next_rings;
- if (alloc_qvectors) {
- /* allocate memory for q_vector pointers */
- size = sizeof(struct i40e_q_vector *) * vsi->num_q_vectors;
- vsi->q_vectors = kzalloc(size, GFP_KERNEL);
- if (!vsi->q_vectors) {
- ret = -ENOMEM;
- goto err_vectors;
- }
+ /* allocate memory for q_vector pointers */
+ size = sizeof(struct i40e_q_vector *) * vsi->num_q_vectors;
+ vsi->q_vectors = kzalloc(size, GFP_KERNEL);
+ if (!vsi->q_vectors) {
+ ret = -ENOMEM;
+ goto err_vectors;
}
return ret;
err_vectors:
kfree(vsi->tx_rings);
+ vsi->tx_rings = NULL;
+ vsi->rx_rings = NULL;
+ vsi->xdp_rings = NULL;
return ret;
}
@@ -11578,7 +11582,7 @@ static int i40e_vsi_mem_alloc(struct i40e_pf *pf, enum i40e_vsi_type type)
if (ret)
goto err_rings;
- ret = i40e_vsi_alloc_arrays(vsi, true);
+ ret = i40e_vsi_alloc_arrays(vsi);
if (ret)
goto err_rings;
@@ -11603,18 +11607,15 @@ static int i40e_vsi_mem_alloc(struct i40e_pf *pf, enum i40e_vsi_type type)
/**
* i40e_vsi_free_arrays - Free queue and vector pointer arrays for the VSI
* @vsi: VSI pointer
- * @free_qvectors: a bool to specify if q_vectors need to be freed.
*
* On error: returns error code (negative)
* On success: returns 0
**/
-static void i40e_vsi_free_arrays(struct i40e_vsi *vsi, bool free_qvectors)
+static void i40e_vsi_free_arrays(struct i40e_vsi *vsi)
{
/* free the ring and vector containers */
- if (free_qvectors) {
- kfree(vsi->q_vectors);
- vsi->q_vectors = NULL;
- }
+ kfree(vsi->q_vectors);
+ vsi->q_vectors = NULL;
kfree(vsi->tx_rings);
vsi->tx_rings = NULL;
vsi->rx_rings = NULL;
@@ -11674,7 +11675,7 @@ static int i40e_vsi_clear(struct i40e_vsi *vsi)
i40e_put_lump(pf->irq_pile, vsi->base_vector, vsi->idx);
bitmap_free(vsi->af_xdp_zc_qps);
- i40e_vsi_free_arrays(vsi, true);
+ i40e_vsi_free_arrays(vsi);
i40e_clear_rss_config_user(vsi);
pf->vsi[vsi->idx] = NULL;
@@ -12048,7 +12049,8 @@ static int i40e_vsi_alloc_q_vector(struct i40e_vsi *vsi, int v_idx)
cpumask_copy(&q_vector->affinity_mask, cpu_possible_mask);
if (vsi->netdev)
- netif_napi_add(vsi->netdev, &q_vector->napi, i40e_napi_poll);
+ netif_napi_add_config(vsi->netdev, &q_vector->napi,
+ i40e_napi_poll, v_idx);
/* tie q_vector and vsi together */
vsi->q_vectors[v_idx] = q_vector;
@@ -14203,8 +14205,9 @@ int i40e_vsi_release(struct i40e_vsi *vsi)
**/
static int i40e_vsi_setup_vectors(struct i40e_vsi *vsi)
{
- int ret = -ENOENT;
struct i40e_pf *pf = vsi->back;
+ bool reuse_irq_lump = false;
+ int ret = -ENOENT;
if (vsi->q_vectors[0]) {
dev_info(&pf->pdev->dev, "VSI %d has existing q_vectors\n",
@@ -14212,7 +14215,10 @@ static int i40e_vsi_setup_vectors(struct i40e_vsi *vsi)
return -EEXIST;
}
- if (vsi->base_vector) {
+ if (vsi->type == I40E_VSI_MAIN && vsi->base_vector)
+ reuse_irq_lump = true;
+
+ if (vsi->base_vector && !reuse_irq_lump) {
dev_info(&pf->pdev->dev, "VSI %d has non-zero base vector %d\n",
vsi->seid, vsi->base_vector);
return -EEXIST;
@@ -14232,6 +14238,10 @@ static int i40e_vsi_setup_vectors(struct i40e_vsi *vsi)
*/
if (!test_bit(I40E_FLAG_MSIX_ENA, pf->flags))
return ret;
+
+ if (reuse_irq_lump)
+ return ret;
+
if (vsi->num_q_vectors)
vsi->base_vector = i40e_get_lump(pf, pf->irq_pile,
vsi->num_q_vectors, vsi->idx);
@@ -14271,11 +14281,20 @@ static struct i40e_vsi *i40e_vsi_reinit_setup(struct i40e_vsi *vsi,
pf = vsi->back;
i40e_put_lump(pf->qp_pile, vsi->base_queue, vsi->idx);
+ i40e_vsi_free_q_vectors(vsi);
i40e_vsi_clear_rings(vsi);
+ i40e_vsi_free_arrays(vsi);
- i40e_vsi_free_arrays(vsi, false);
i40e_set_num_rings_in_vsi(vsi);
- ret = i40e_vsi_alloc_arrays(vsi, false);
+ ret = i40e_vsi_alloc_arrays(vsi);
+ if (ret)
+ goto err_netdev;
+
+ /* Rebuild q_vectors during VSI reinit because the effective channel
+ * count may change num_q_vectors. Keep vector topology aligned with the
+ * queue configuration after ethtool's .set_channels() callback.
+ */
+ ret = i40e_vsi_setup_vectors(vsi);
if (ret)
goto err_netdev;
@@ -14287,7 +14306,7 @@ static struct i40e_vsi *i40e_vsi_reinit_setup(struct i40e_vsi *vsi,
dev_info(&pf->pdev->dev,
"failed to get tracking for %d queues for VSI %d err %d\n",
alloc_queue_pairs, vsi->seid, ret);
- goto err_netdev;
+ goto err_rings;
}
vsi->base_queue = ret;
--
2.43.0