[PATCH net 0/8][pull request] Intel Wired LAN Driver Updates 2026-05-20 (ice, iavf, i40e, ixgbe)

* [PATCH net 0/8][pull request] Intel Wired LAN Driver Updates 2026-05-20 (ice, iavf, i40e, ixgbe)
@ 2026-05-20 18:34 Tony Nguyen
  2026-05-20 18:34 ` [PATCH net 1/8] ice: fix UAF/NULL deref when VSI rebuild and XDP attach race Tony Nguyen
                   ` (7 more replies)
  0 siblings, 8 replies; 21+ messages in thread
From: Tony Nguyen @ 2026-05-20 18:34 UTC (permalink / raw)
  To: davem, kuba, pabeni, edumazet, andrew+netdev, netdev; +Cc: Tony Nguyen

Kohei Enju disallows XDP attach when VSI is rebuilding to prevent
possible NULL dereference on ice.

Michal Schmidt adds a call to ice_vsi_realloc_stat_arrays() when
reconfiguring VF VSIs on ice, preventing out-of-bounds writes to the
stats arrays when a VF increases its queue count.

Jose Ignacio Tornos Martinez fixes issues with VF bonding that came
about with commit ad7c7b2172c3 ("net: hold netdev instance lock during
sysfs operations").

Further details:
https://lore.kernel.org/all/20260429102426.210750-1-jtornosm@redhat.com/

Przemyslaw Korba sets the proper PTP extts flags for i40e.

Corinna Vinschen moves from VF spinlock to RCU to prevent races in
structure accesses in ixgbe.

The following are changes since commit edc502717be153674b0b3eefb8b40734c747c138:
  Merge branch 'mptcp-misc-fixes-for-v7-1-rc4'
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue 100GbE

Corinna Vinschen (1):
  ixgbe: only access vfinfo and mv_list under RCU lock

Jose Ignacio Tornos Martinez (4):
  iavf: return EBUSY if reset in progress or not ready during MAC change
  i40e: skip unnecessary VF reset when setting trust
  iavf: send MAC change request synchronously
  ice: skip unnecessary VF reset when setting trust

Kohei Enju (1):
  ice: fix UAF/NULL deref when VSI rebuild and XDP attach race

Michal Schmidt (1):
  ice: fix stats array overflow when VF requests more queues

Przemyslaw Korba (1):
  i40e: set supported_extts_flags for rising edge

 drivers/net/ethernet/intel/i40e/i40e_ptp.c    |   2 +
 .../ethernet/intel/i40e/i40e_virtchnl_pf.c    |  38 +-
 drivers/net/ethernet/intel/iavf/iavf.h        |  10 +-
 drivers/net/ethernet/intel/iavf/iavf_main.c   |  74 ++-
 .../net/ethernet/intel/iavf/iavf_virtchnl.c   | 100 +++-
 drivers/net/ethernet/intel/ice/ice_lib.c      |   2 +-
 drivers/net/ethernet/intel/ice/ice_lib.h      |   1 +
 drivers/net/ethernet/intel/ice/ice_main.c     |  13 +-
 drivers/net/ethernet/intel/ice/ice_sriov.c    |  33 +-
 drivers/net/ethernet/intel/ice/ice_vf_lib.c   |   7 +
 drivers/net/ethernet/intel/ixgbe/ixgbe.h      |   7 +-
 .../net/ethernet/intel/ixgbe/ixgbe_dcb_nl.c   |  36 +-
 .../net/ethernet/intel/ixgbe/ixgbe_ethtool.c  |  44 +-
 .../net/ethernet/intel/ixgbe/ixgbe_ipsec.c    |  17 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 227 +++++---
 .../net/ethernet/intel/ixgbe/ixgbe_sriov.c    | 547 ++++++++++++------
 16 files changed, 825 insertions(+), 333 deletions(-)

-- 
2.47.1



* [PATCH net 1/8] ice: fix UAF/NULL deref when VSI rebuild and XDP attach race
  2026-05-20 18:34 [PATCH net 0/8][pull request] Intel Wired LAN Driver Updates 2026-05-20 (ice, iavf, i40e, ixgbe) Tony Nguyen
@ 2026-05-20 18:34 ` Tony Nguyen
  2026-05-21 15:37   ` Jakub Kicinski
  2026-05-23  0:16   ` Jakub Kicinski
  2026-05-20 18:34 ` [PATCH net 2/8] ice: fix stats array overflow when VF requests more queues Tony Nguyen
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 21+ messages in thread
From: Tony Nguyen @ 2026-05-20 18:34 UTC (permalink / raw)
  To: davem, kuba, pabeni, edumazet, andrew+netdev, netdev
  Cc: Kohei Enju, Simon Horman, Patryk Holda, Tony Nguyen

From: Kohei Enju <kohei@enjuk.jp>

ice_xdp_setup_prog() unconditionally hot-swaps xdp_prog when
ICE_VSI_REBUILD_PENDING is set. In the attach path, this can publish a
new rx_ring->xdp_prog before rx_ring->xdp_ring becomes valid while the
rebuild is pending. As a result, ice_clean_rx_irq() may dereference
rx_ring->xdp_ring too early.

With high-volume RX packets, running these commands in parallel
triggered a KASAN splat [1].
 # ethtool --reset $DEV irq dma filter offload
 # ip link set dev $DEV xdp {obj $OBJ sec xdp,off}

Fix this by rejecting XDP attach while rebuild is pending.
Keep XDP detach allowed in this window. Detach clears rx_ring->xdp_prog,
so the RX path will not attempt to access rx_ring->xdp_ring.

[1]
BUG: KASAN: slab-use-after-free in ice_napi_poll+0x3921/0x41a0
Read of size 2 at addr ffff88812475b880 by task ksoftirqd/1/23
[...]
Call Trace:
 <TASK>
 ice_napi_poll+0x3921/0x41a0
 __napi_poll+0x98/0x520
 net_rx_action+0x8f2/0xfa0
 handle_softirqs+0x1cb/0x7f0
[...]
 </TASK>

Allocated by task 7246:
 ice_prepare_xdp_rings+0x3de/0x12d0
 ice_xdp+0x61c/0xef0
 dev_xdp_install+0x3c4/0x840
 dev_xdp_attach+0x50a/0x10a0
 dev_change_xdp_fd+0x175/0x210
[...]

Freed by task 7251:
 __rcu_free_sheaf_prepare+0x5f/0x230
 rcu_free_sheaf+0x1a/0xf0
 rcu_core+0x567/0x1d80
 handle_softirqs+0x1cb/0x7f0

Fixes: 2504b8405768 ("ice: protect XDP configuration with a mutex")
Signed-off-by: Kohei Enju <kohei@enjuk.jp>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Patryk Holda <patryk.holda@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_main.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index e2fbe111f849..f5aa31886e37 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -2912,12 +2912,21 @@ ice_xdp_setup_prog(struct ice_vsi *vsi, struct bpf_prog *prog,
 	}
 
 	/* hot swap progs and avoid toggling link */
-	if (ice_is_xdp_ena_vsi(vsi) == !!prog ||
-	    test_bit(ICE_VSI_REBUILD_PENDING, vsi->state)) {
+	if (ice_is_xdp_ena_vsi(vsi) == !!prog) {
 		ice_vsi_assign_bpf_prog(vsi, prog);
 		return 0;
 	}
 
+	if (test_bit(ICE_VSI_REBUILD_PENDING, vsi->state)) {
+		if (prog) {
+			NL_SET_ERR_MSG_MOD(extack, "VSI rebuild is pending");
+			return -EAGAIN;
+		}
+
+		ice_vsi_assign_bpf_prog(vsi, NULL);
+		return 0;
+	}
+
 	if_running = netif_running(vsi->netdev) &&
 		     !test_and_set_bit(ICE_VSI_DOWN, vsi->state);
 
-- 
2.47.1



* Re: [PATCH net 1/8] ice: fix UAF/NULL deref when VSI rebuild and XDP attach race
  2026-05-20 18:34 ` [PATCH net 1/8] ice: fix UAF/NULL deref when VSI rebuild and XDP attach race Tony Nguyen
@ 2026-05-21 15:37   ` Jakub Kicinski
  2026-05-23  0:16   ` Jakub Kicinski
  1 sibling, 0 replies; 21+ messages in thread
From: Jakub Kicinski @ 2026-05-21 15:37 UTC (permalink / raw)
  To: Tony Nguyen
  Cc: davem, pabeni, edumazet, andrew+netdev, netdev, Kohei Enju,
	Simon Horman, Patryk Holda

On Wed, 20 May 2026 11:34:49 -0700 Tony Nguyen wrote:
> +	if (test_bit(ICE_VSI_REBUILD_PENDING, vsi->state)) {
> +		if (prog) {
> +			NL_SET_ERR_MSG_MOD(extack, "VSI rebuild is pending");
> +			return -EAGAIN;
> +		}

I seem to recall complaining about this sort of error handling to you
folks in the past. Chances are user space will just fail seeing this
error, rather than retrying. The correct behavior would be to wait.
Is that not possible?

(I think Sashiko is brewing a lot of comments for this series so it
won't make today's PR either way :()


* Re: [PATCH net 1/8] ice: fix UAF/NULL deref when VSI rebuild and XDP attach race
  2026-05-20 18:34 ` [PATCH net 1/8] ice: fix UAF/NULL deref when VSI rebuild and XDP attach race Tony Nguyen
  2026-05-21 15:37   ` Jakub Kicinski
@ 2026-05-23  0:16   ` Jakub Kicinski
  1 sibling, 0 replies; 21+ messages in thread
From: Jakub Kicinski @ 2026-05-23  0:16 UTC (permalink / raw)
  To: anthony.l.nguyen
  Cc: Jakub Kicinski, davem, pabeni, edumazet, andrew+netdev, netdev,
	vinschen, alexander.nowlin

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
ice: fix UAF/NULL deref when VSI rebuild and XDP attach race

This commit addresses a race condition between a VSI rebuild and an XDP
attach operation. It prevents rx_ring->xdp_prog from being populated before
rx_ring->xdp_ring is valid by rejecting the XDP attach when a VSI rebuild
is pending. XDP detachment is still permitted during this window to clear
the existing program.

Fixes: 2504b8405768 ("ice: protect XDP configuration with a mutex")

> diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
> --- a/drivers/net/ethernet/intel/ice/ice_main.c
> +++ b/drivers/net/ethernet/intel/ice/ice_main.c
[ ... ]
> @@ -2912,12 +2912,21 @@ ice_xdp_setup_prog(struct ice_vsi *vsi, struct bpf_prog *prog,
>  	}
>  
>  	/* hot swap progs and avoid toggling link */
> -	if (ice_is_xdp_ena_vsi(vsi) == !!prog ||
> -	    test_bit(ICE_VSI_REBUILD_PENDING, vsi->state)) {
> +	if (ice_is_xdp_ena_vsi(vsi) == !!prog) {
>  		ice_vsi_assign_bpf_prog(vsi, prog);
>  		return 0;
>  	}
>  
> +	if (test_bit(ICE_VSI_REBUILD_PENDING, vsi->state)) {

[Severity: High]
This is a pre-existing issue, but is the check for ICE_VSI_REBUILD_PENDING
prone to a TOCTOU race because it relies on ad-hoc flag synchronization
instead of proper locking with the device reset path?

The asynchronous reset path (ice_prepare_for_reset()) sets
ICE_VSI_REBUILD_PENDING without holding vsi->xdp_state_lock or rtnl_lock().

If ice_xdp_setup_prog() (which holds both locks) checks the bit before it
is set, it proceeds to configure XDP and sets ICE_VSI_DOWN. If the reset
path then sets the bit and calls ice_pf_dis_all_vsi(), it will skip this
VSI because ICE_VSI_DOWN is already set.

When ice_xdp_setup_prog() finishes, it calls ice_up() which clears
ICE_VSI_DOWN and starts enabling hardware queues while the device is
actively undergoing a hardware reset (ice_do_reset()).

Can writing to hardware registers during an active reset cause hardware
hangs or PCI master aborts?

> +		if (prog) {
> +			NL_SET_ERR_MSG_MOD(extack, "VSI rebuild is pending");
> +			return -EAGAIN;
> +		}
> +
> +		ice_vsi_assign_bpf_prog(vsi, NULL);

[Severity: High]
This is a pre-existing issue, but does detaching an XDP program while a VSI
rebuild has failed lead to a NULL pointer dereference here?

If a VSI rebuild fails, ice_vsi_decfg() frees vsi->rx_rings and sets the
pointer to NULL via ice_vsi_free_arrays(), but leaves vsi->num_rxq
non-zero. The ICE_VSI_REBUILD_PENDING bit remains set because it is only
cleared on the success path in ice_vsi_rebuild().

If a user subsequently detaches XDP, ice_xdp_setup_prog() intercepts it
because ICE_VSI_REBUILD_PENDING is set, and it calls
ice_vsi_assign_bpf_prog(vsi, NULL).

This function loops up to vsi->num_rxq:

ice_vsi_assign_bpf_prog() {
    ...
    ice_for_each_rxq(vsi, i)
        WRITE_ONCE(vsi->rx_rings[i]->xdp_prog, vsi->xdp_prog);
    ...
}

Since vsi->rx_rings is NULL, does this cause a guaranteed NULL pointer
dereference panic?
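
To make the concern above concrete, here is a minimal user-space sketch,
not driver code. The types model_ring and model_vsi and the function
model_assign_prog() are hypothetical stand-ins; only the shape of the
loop (walking num_rxq entries of rx_rings) is taken from the excerpt
above. The NULL guard is one possible mitigation, assumed purely for
illustration, and is not something the patch proposes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver structures; not the real layout. */
struct model_ring {
	const void *xdp_prog;
};

struct model_vsi {
	struct model_ring **rx_rings; /* NULL after a failed rebuild */
	unsigned int num_rxq;         /* stays non-zero in that case */
	const void *xdp_prog;
};

/*
 * Publish prog to every RX ring and return the number of rings updated.
 * Without the rx_rings check, the loop would index through a NULL array
 * pointer whenever num_rxq is non-zero.
 */
static unsigned int model_assign_prog(struct model_vsi *vsi, const void *prog)
{
	unsigned int i, updated = 0;

	assert(vsi != NULL);
	vsi->xdp_prog = prog;

	if (!vsi->rx_rings)
		return 0;

	for (i = 0; i < vsi->num_rxq; i++) {
		if (!vsi->rx_rings[i])
			continue;
		vsi->rx_rings[i]->xdp_prog = prog;
		updated++;
	}

	return updated;
}
```

In this model a VSI whose ring array was freed simply records the new
program and touches no rings, which is the behaviour a detach would need
in the failed-rebuild state described above.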

> +		return 0;
> +	}
> +
>  	if_running = netif_running(vsi->netdev) &&
>  		     !test_and_set_bit(ICE_VSI_DOWN, vsi->state);
-- 
pw-bot: cr


* [PATCH net 2/8] ice: fix stats array overflow when VF requests more queues
  2026-05-20 18:34 [PATCH net 0/8][pull request] Intel Wired LAN Driver Updates 2026-05-20 (ice, iavf, i40e, ixgbe) Tony Nguyen
  2026-05-20 18:34 ` [PATCH net 1/8] ice: fix UAF/NULL deref when VSI rebuild and XDP attach race Tony Nguyen
@ 2026-05-20 18:34 ` Tony Nguyen
  2026-05-23  0:16   ` Jakub Kicinski
  2026-05-20 18:34 ` [PATCH net 3/8] iavf: return EBUSY if reset in progress or not ready during MAC change Tony Nguyen
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 21+ messages in thread
From: Tony Nguyen @ 2026-05-20 18:34 UTC (permalink / raw)
  To: davem, kuba, pabeni, edumazet, andrew+netdev, netdev
  Cc: Michal Schmidt, anthony.l.nguyen, przemyslaw.kitszel,
	jacob.e.keller, poros, Aleksandr Loktionov, Simon Horman,
	Rafal Romanowski

From: Michal Schmidt <mschmidt@redhat.com>

When a VF increases its queue count via VIRTCHNL_OP_REQUEST_QUEUES,
ice_vc_request_qs_msg() sets vf->num_req_qs and triggers a VF reset.
The reset calls ice_vf_reconfig_vsi(), which does ice_vsi_decfg()
followed by ice_vsi_cfg(). ice_vsi_decfg() does not free the per-ring
stats arrays. Inside ice_vsi_cfg_def(), ice_vsi_set_num_qs() updates
alloc_txq/alloc_rxq to the new larger value, but
ice_vsi_alloc_stat_arrays() returns early because the stats already
exist. ice_vsi_alloc_ring_stats() then iterates using the new larger
alloc_txq and writes beyond the bounds of the old, smaller
tx_ring_stats/rx_ring_stats pointer arrays, corrupting adjacent SLUB
metadata.

KASAN detects the bug:
 ==================================================================
 BUG: KASAN: slab-out-of-bounds in ice_vsi_alloc_ring_stats+0x385/0x4a0 [ice]
 Read of size 8 at addr ffff88810affea60 by task kworker/u131:7/221

 CPU: 24 UID: 0 PID: 221 Comm: kworker/u131:7 Not tainted 7.1.0-rc1+ #1 PREEMPT(lazy)
 ...
 Workqueue: ice ice_service_task [ice]
 Call Trace:
  <TASK>
  ...
  kasan_report+0xd7/0x120
  ice_vsi_alloc_ring_stats+0x385/0x4a0 [ice]
  ice_vsi_cfg_def+0x12e2/0x2060 [ice]
  ice_vsi_cfg+0xb5/0x3c0 [ice]
  ice_reset_vf+0x858/0xf80 [ice]
  ice_vc_request_qs_msg+0x1da/0x290 [ice]
  ice_vc_process_vf_msg+0xb15/0x1430 [ice]
  __ice_clean_ctrlq+0x70d/0x9d0 [ice]
  ice_service_task+0x840/0xf20 [ice]
  process_one_work+0x690/0xff0
  worker_thread+0x4d9/0xd20
  kthread+0x322/0x410
  ret_from_fork+0x332/0x660
  ret_from_fork_asm+0x1a/0x30
  </TASK>

 Allocated by task 2439:
  kasan_save_stack+0x1c/0x40
  kasan_save_track+0x10/0x30
  __kasan_kmalloc+0x96/0xb0
  __kmalloc_noprof+0x1d8/0x580
  ice_vsi_cfg_def+0x115c/0x2060 [ice]
  ice_vsi_cfg+0xb5/0x3c0 [ice]
  ice_vsi_setup+0x180/0x320 [ice]
  ice_start_vfs+0x1f3/0x590 [ice]
  ice_ena_vfs+0x66d/0x798 [ice]
  ice_sriov_configure.cold+0xe4/0x121 [ice]
  sriov_numvfs_store+0x279/0x480
  kernfs_fop_write_iter+0x331/0x4f0
  vfs_write+0x4c4/0xe40
  ksys_write+0x10c/0x240
  do_syscall_64+0xd9/0x650
  entry_SYSCALL_64_after_hwframe+0x76/0x7e

 The buggy address belongs to the object at ffff88810affea40
                which belongs to the cache kmalloc-32 of size 32
 The buggy address is located 0 bytes to the right of
                allocated 32-byte region [ffff88810affea40, ffff88810affea60)
 ...
 ==================================================================

ice_vsi_rebuild() handles this correctly by calling
ice_vsi_realloc_stat_arrays() before reconfiguration, but
ice_vf_reconfig_vsi() was missing this call.

Fix by calling ice_vsi_realloc_stat_arrays() in ice_vf_reconfig_vsi()
before ice_vsi_decfg(), mirroring the ice_vsi_rebuild() pattern. Set
vsi->req_txq/req_rxq from vf->num_req_qs so the realloc function knows
the target array size.

See the linked RHEL Jira item for a reproducer.

Fixes: 2a2cb4c6c181 ("ice: replace ice_vf_recreate_vsi() with ice_vf_reconfig_vsi()")
Closes: https://redhat.atlassian.net/browse/RHEL-164321
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Assisted-by: Claude:claude-opus-4-6 semcode
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_lib.c    | 2 +-
 drivers/net/ethernet/intel/ice/ice_lib.h    | 1 +
 drivers/net/ethernet/intel/ice/ice_vf_lib.c | 7 +++++++
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index 837b71b7b2b7..fc78176a2a8d 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -3015,7 +3015,7 @@ ice_vsi_rebuild_set_coalesce(struct ice_vsi *vsi,
  * ice_vsi_realloc_stat_arrays - Frees unused stat structures or alloc new ones
  * @vsi: VSI pointer
  */
-static int
+int
 ice_vsi_realloc_stat_arrays(struct ice_vsi *vsi)
 {
 	u16 req_txq = vsi->req_txq ? vsi->req_txq : vsi->alloc_txq;
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.h b/drivers/net/ethernet/intel/ice/ice_lib.h
index 49454d98dcfe..6f7da84384e5 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.h
+++ b/drivers/net/ethernet/intel/ice/ice_lib.h
@@ -66,6 +66,7 @@ int ice_ena_vsi(struct ice_vsi *vsi, bool locked);
 void ice_vsi_decfg(struct ice_vsi *vsi);
 void ice_dis_vsi(struct ice_vsi *vsi, bool locked);
 
+int ice_vsi_realloc_stat_arrays(struct ice_vsi *vsi);
 int ice_vsi_rebuild(struct ice_vsi *vsi, u32 vsi_flags);
 int ice_vsi_cfg(struct ice_vsi *vsi);
 struct ice_vsi *ice_vsi_alloc(struct ice_pf *pf);
diff --git a/drivers/net/ethernet/intel/ice/ice_vf_lib.c b/drivers/net/ethernet/intel/ice/ice_vf_lib.c
index b1f46707dcc0..cf4ed034ad8a 100644
--- a/drivers/net/ethernet/intel/ice/ice_vf_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_vf_lib.c
@@ -268,6 +268,13 @@ static int ice_vf_reconfig_vsi(struct ice_vf *vf)
 
 	vsi->flags = ICE_VSI_FLAG_NO_INIT;
 
+	vsi->req_txq = vf->num_req_qs;
+	vsi->req_rxq = vf->num_req_qs;
+
+	err = ice_vsi_realloc_stat_arrays(vsi);
+	if (err)
+		return err;
+
 	ice_vsi_decfg(vsi);
 	ice_fltr_remove_all(vsi);
 
-- 
2.47.1



* Re: [PATCH net 2/8] ice: fix stats array overflow when VF requests more queues
  2026-05-20 18:34 ` [PATCH net 2/8] ice: fix stats array overflow when VF requests more queues Tony Nguyen
@ 2026-05-23  0:16   ` Jakub Kicinski
  0 siblings, 0 replies; 21+ messages in thread
From: Jakub Kicinski @ 2026-05-23  0:16 UTC (permalink / raw)
  To: anthony.l.nguyen
  Cc: Jakub Kicinski, davem, pabeni, edumazet, andrew+netdev, netdev,
	vinschen, alexander.nowlin

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
ice: fix stats array overflow when VF requests more queues

When a VF increases its queue count via VIRTCHNL_OP_REQUEST_QUEUES,
ice_vc_request_qs_msg() sets vf->num_req_qs and triggers a VF reset.
The reset calls ice_vf_reconfig_vsi(), which does ice_vsi_decfg()
followed by ice_vsi_cfg(). This patch fixes a stats array overflow by
calling ice_vsi_realloc_stat_arrays() in ice_vf_reconfig_vsi() before
ice_vsi_decfg().

> diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
> index 837b71b7b2b7c..fc78176a2a8d0 100644
> --- a/drivers/net/ethernet/intel/ice/ice_lib.c
> +++ b/drivers/net/ethernet/intel/ice/ice_lib.c
> @@ -3015,7 +3015,7 @@ ice_vsi_rebuild_set_coalesce(struct ice_vsi *vsi,
>   * ice_vsi_realloc_stat_arrays - Frees unused stat structures or alloc new ones
>   * @vsi: VSI pointer
>   */
> -static int
> +int
>  ice_vsi_realloc_stat_arrays(struct ice_vsi *vsi)
>  {

[Severity: High]
This isn't a bug introduced by this patch, but making this function public and
using it in more paths exposes a potential NULL pointer dereference.

Does this code safely handle ADQ channel VSIs?

For ADQ channel VSIs (where vsi->type == ICE_VSI_CHNL),
ice_vsi_alloc_stat_arrays() bypasses stats allocation, leaving
pf->vsi_stats[vsi->idx] as NULL.

During a PF reset (ICE_RESET_PFR), ice_rebuild_channels() iterates over
ICE_VSI_CHNL VSIs and calls ice_vsi_rebuild(), which triggers
ice_vsi_realloc_stat_arrays().

This will unconditionally dereference vsi_stat:

drivers/net/ethernet/intel/ice/ice_lib.c:ice_vsi_realloc_stat_arrays() {
	...
	vsi_stat = pf->vsi_stats[vsi->idx];

	if (req_txq < prev_txq) {
		for (i = req_txq; i < prev_txq; i++) {
			if (vsi_stat->tx_ring_stats[i]) {
				...

Can this result in a NULL pointer dereference when rebuilding channel VSIs?

[Severity: High]
This is another pre-existing issue, but the error handling when reallocating
arrays might lead to out-of-bounds reads and memory corruption.

When a VF requests fewer queues (req_txq < prev_txq), the function successfully
shrinks vsi_stat->tx_ring_stats using krealloc_array().

However, if the subsequent reallocation for rx_ring_stats fails (e.g., due to
memory pressure returning -ENOMEM), the function returns an error and aborts
before vsi->alloc_txq is updated to the new smaller size:

drivers/net/ethernet/intel/ice/ice_lib.c:ice_vsi_realloc_stat_arrays() {
	...
	vsi_stat->tx_ring_stats = krealloc_array(...);
	if (!vsi_stat->tx_ring_stats) { ... }

	...
	vsi_stat->rx_ring_stats = krealloc_array(...);
	if (!vsi_stat->rx_ring_stats) {
		vsi_stat->rx_ring_stats = rx_ring_stats;
		return -ENOMEM;
	}
	...
}

Later, during VSI teardown in ice_vsi_free_stats(), ice_for_each_alloc_txq(vsi, i)
iterates up to the stale, larger vsi->alloc_txq value.

Is it possible for this to read past the bounds of the newly shrunk
tx_ring_stats array and pass non-zero garbage to kfree_rcu()?
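
The partial-failure hazard described above can be modelled in user space.
The sketch below is an assumption-laden illustration, not the driver's
code: model_stats, model_resize() and the failure hook
model_fail_second_alloc are all hypothetical names. It shows one way to
keep the two arrays and their shared length consistent, by allocating
both replacements before committing either.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model: two pointer arrays that share one length. */
struct model_stats {
	void **tx;
	void **rx;
	size_t n; /* number of valid slots in both arrays */
};

/* Test hook (hypothetical): when non-zero, the second allocation fails. */
static int model_fail_second_alloc;

static void **model_alloc_array(size_t n, int is_second)
{
	if (is_second && model_fail_second_alloc)
		return NULL;
	return calloc(n ? n : 1, sizeof(void *));
}

/*
 * Resize both arrays to new_n, or leave everything untouched.
 * Returns 0 on success and -1 on allocation failure. Entries past new_n
 * are assumed to have been released by the caller already.
 */
static int model_resize(struct model_stats *s, size_t new_n)
{
	size_t keep;
	void **tx, **rx;

	assert(s != NULL);
	keep = new_n < s->n ? new_n : s->n;
	tx = model_alloc_array(new_n, 0);
	rx = model_alloc_array(new_n, 1);

	if (!tx || !rx) {
		free(tx);
		free(rx);
		return -1; /* s->tx, s->rx and s->n still agree */
	}

	if (keep) {
		memcpy(tx, s->tx, keep * sizeof(*tx));
		memcpy(rx, s->rx, keep * sizeof(*rx));
	}

	free(s->tx);
	free(s->rx);
	s->tx = tx;
	s->rx = rx;
	s->n = new_n;
	return 0;
}
```

Because nothing is committed until both allocations succeed, a later
teardown loop bounded by n can never walk past the end of an array that
was shrunk on its own.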


* [PATCH net 3/8] iavf: return EBUSY if reset in progress or not ready during MAC change
  2026-05-20 18:34 [PATCH net 0/8][pull request] Intel Wired LAN Driver Updates 2026-05-20 (ice, iavf, i40e, ixgbe) Tony Nguyen
  2026-05-20 18:34 ` [PATCH net 1/8] ice: fix UAF/NULL deref when VSI rebuild and XDP attach race Tony Nguyen
  2026-05-20 18:34 ` [PATCH net 2/8] ice: fix stats array overflow when VF requests more queues Tony Nguyen
@ 2026-05-20 18:34 ` Tony Nguyen
  2026-05-20 18:34 ` [PATCH net 4/8] i40e: skip unnecessary VF reset when setting trust Tony Nguyen
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 21+ messages in thread
From: Tony Nguyen @ 2026-05-20 18:34 UTC (permalink / raw)
  To: davem, kuba, pabeni, edumazet, andrew+netdev, netdev
  Cc: Jose Ignacio Tornos Martinez, anthony.l.nguyen,
	przemyslaw.kitszel, jacob.e.keller, horms, Aleksandr Loktionov,
	Rafal Romanowski

From: Jose Ignacio Tornos Martinez <jtornosm@redhat.com>

When a MAC address change is requested while the VF is resetting or still
initializing, return -EBUSY immediately instead of attempting the
operation.

Additionally, during early initialization states (before __IAVF_DOWN),
the PF may be slow to respond to MAC change requests, causing long
delays. Only allow MAC changes once the VF reaches __IAVF_DOWN state or
later, when the watchdog is running and the VF is ready for operations.

After commit ad7c7b2172c3 ("net: hold netdev instance lock
during sysfs operations"), MAC changes are called with the netdev lock
held, so we should not wait with the lock held during reset or
initialization. This allows the caller to retry or handle the busy state
appropriately without blocking other operations.

Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
 drivers/net/ethernet/intel/iavf/iavf_main.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index d2914c511e1e..78c59a58e0b2 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -1042,6 +1042,9 @@ static int iavf_set_mac(struct net_device *netdev, void *p)
 	struct sockaddr *addr = p;
 	int ret;
 
+	if (iavf_is_reset_in_progress(adapter) || adapter->state < __IAVF_DOWN)
+		return -EBUSY;
+
 	if (!is_valid_ether_addr(addr->sa_data))
 		return -EADDRNOTAVAIL;
 
-- 
2.47.1



* [PATCH net 4/8] i40e: skip unnecessary VF reset when setting trust
  2026-05-20 18:34 [PATCH net 0/8][pull request] Intel Wired LAN Driver Updates 2026-05-20 (ice, iavf, i40e, ixgbe) Tony Nguyen
                   ` (2 preceding siblings ...)
  2026-05-20 18:34 ` [PATCH net 3/8] iavf: return EBUSY if reset in progress or not ready during MAC change Tony Nguyen
@ 2026-05-20 18:34 ` Tony Nguyen
  2026-05-23  0:16   ` Jakub Kicinski
  2026-05-23  0:16   ` Jakub Kicinski
  2026-05-20 18:34 ` [PATCH net 5/8] iavf: send MAC change request synchronously Tony Nguyen
                   ` (3 subsequent siblings)
  7 siblings, 2 replies; 21+ messages in thread
From: Tony Nguyen @ 2026-05-20 18:34 UTC (permalink / raw)
  To: davem, kuba, pabeni, edumazet, andrew+netdev, netdev
  Cc: Jose Ignacio Tornos Martinez, anthony.l.nguyen,
	przemyslaw.kitszel, jacob.e.keller, horms, Rafal Romanowski

From: Jose Ignacio Tornos Martinez <jtornosm@redhat.com>

The current implementation triggers a VF reset whenever the trust
setting changes. The reset takes about 10 seconds, which is noticeable
during bonding setup. During that window:
- The VF must reinitialize completely
- Any in-progress operations (like bonding enslave) fail with timeouts
- The VF is unavailable

When granting trust, no reset is needed - we can just set the capability
flag to allow privileged operations.

When revoking trust, we only need to reset (conservative approach) if
the VF has actually configured advanced features that require cleanup
(ADQ/cloud filters, promiscuous mode). For VFs in a clean state, we can
safely change the trust setting without the disruptive reset.

When we don't reset, we update the capability flag directly via a
helper function, eliminating the delay.

Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
 .../ethernet/intel/i40e/i40e_virtchnl_pf.c    | 38 ++++++++++++++-----
 1 file changed, 28 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index a26c3d47ec15..0cc434b26eb8 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -4943,6 +4943,23 @@ int i40e_ndo_set_vf_spoofchk(struct net_device *netdev, int vf_id, bool enable)
 	return ret;
 }
 
+/**
+ * i40e_setup_vf_trust - Enable/disable VF trust mode without reset
+ * @vf: VF to configure
+ * @setting: trust setting
+ *
+ * Update VF flags when changing trust without performing a VF reset.
+ * This is only called when it's safe to skip the reset (VF has no advanced
+ * features configured that need cleanup).
+ */
+static void i40e_setup_vf_trust(struct i40e_vf *vf, bool setting)
+{
+	if (setting)
+		set_bit(I40E_VIRTCHNL_VF_CAP_PRIVILEGE, &vf->vf_caps);
+	else
+		clear_bit(I40E_VIRTCHNL_VF_CAP_PRIVILEGE, &vf->vf_caps);
+}
+
 /**
  * i40e_ndo_set_vf_trust
  * @netdev: network interface device structure of the pf
@@ -4987,19 +5004,20 @@ int i40e_ndo_set_vf_trust(struct net_device *netdev, int vf_id, bool setting)
 	set_bit(__I40E_MACVLAN_SYNC_PENDING, pf->state);
 	pf->vsi[vf->lan_vsi_idx]->flags |= I40E_VSI_FLAG_FILTER_CHANGED;
 
-	i40e_vc_reset_vf(vf, true);
+	/* Reset only if revoking trust and VF has advanced features configured */
+	if (!setting &&
+	    (vf->adq_enabled || vf->num_cloud_filters > 0 ||
+	     test_bit(I40E_VF_STATE_UC_PROMISC, &vf->vf_states) ||
+	     test_bit(I40E_VF_STATE_MC_PROMISC, &vf->vf_states))) {
+		i40e_vc_reset_vf(vf, true);
+		i40e_del_all_cloud_filters(vf);
+	} else {
+		i40e_setup_vf_trust(vf, setting);
+	}
+
 	dev_info(&pf->pdev->dev, "VF %u is now %strusted\n",
 		 vf_id, setting ? "" : "un");
 
-	if (vf->adq_enabled) {
-		if (!vf->trusted) {
-			dev_info(&pf->pdev->dev,
-				 "VF %u no longer Trusted, deleting all cloud filters\n",
-				 vf_id);
-			i40e_del_all_cloud_filters(vf);
-		}
-	}
-
 out:
 	clear_bit(__I40E_VIRTCHNL_OP_PENDING, pf->state);
 	return ret;
-- 
2.47.1



* Re: [PATCH net 4/8] i40e: skip unnecessary VF reset when setting trust
  2026-05-20 18:34 ` [PATCH net 4/8] i40e: skip unnecessary VF reset when setting trust Tony Nguyen
@ 2026-05-23  0:16   ` Jakub Kicinski
  2026-05-23  0:16   ` Jakub Kicinski
  1 sibling, 0 replies; 21+ messages in thread
From: Jakub Kicinski @ 2026-05-23  0:16 UTC (permalink / raw)
  To: anthony.l.nguyen
  Cc: Jakub Kicinski, davem, pabeni, edumazet, andrew+netdev, netdev,
	vinschen, alexander.nowlin

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
i40e: skip unnecessary VF reset when setting trust

This commit changes i40e_ndo_set_vf_trust() to skip the VF reset when
granting trust, and when revoking trust unless one of a few advanced
features (ADQ/cloud filters, UC/MC promisc) is currently set on the VF.
On the no-reset path the new helper i40e_setup_vf_trust() flips the
I40E_VIRTCHNL_VF_CAP_PRIVILEGE bit directly.

> diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
> index a26c3d47ec15..0cc434b26eb8 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
> @@ -4943,6 +4943,23 @@ int i40e_ndo_set_vf_spoofchk(struct net_device *netdev, int vf_id, bool enable)
>  	return ret;
>  }
>  
> +/**
> + * i40e_setup_vf_trust - Enable/disable VF trust mode without reset
> + * @vf: VF to configure
> + * @setting: trust setting
> + *
> + * Update VF flags when changing trust without performing a VF reset.
> + * This is only called when it's safe to skip the reset (VF has no advanced
> + * features configured that need cleanup).
> + */
> +static void i40e_setup_vf_trust(struct i40e_vf *vf, bool setting)
> +{
> +	if (setting)
> +		set_bit(I40E_VIRTCHNL_VF_CAP_PRIVILEGE, &vf->vf_caps);
> +	else
> +		clear_bit(I40E_VIRTCHNL_VF_CAP_PRIVILEGE, &vf->vf_caps);
> +}
> +

[ ... ]

> @@ -4987,19 +5004,20 @@ int i40e_ndo_set_vf_trust(struct net_device *netdev, int vf_id, bool setting)
>  	set_bit(__I40E_MACVLAN_SYNC_PENDING, pf->state);
>  	pf->vsi[vf->lan_vsi_idx]->flags |= I40E_VSI_FLAG_FILTER_CHANGED;
>  
> -	i40e_vc_reset_vf(vf, true);
> +	/* Reset only if revoking trust and VF has advanced features configured */
> +	if (!setting &&
> +	    (vf->adq_enabled || vf->num_cloud_filters > 0 ||
> +	     test_bit(I40E_VF_STATE_UC_PROMISC, &vf->vf_states) ||
> +	     test_bit(I40E_VF_STATE_MC_PROMISC, &vf->vf_states))) {
> +		i40e_vc_reset_vf(vf, true);
> +		i40e_del_all_cloud_filters(vf);
> +	} else {
> +		i40e_setup_vf_trust(vf, setting);
> +	}

[High]
Are the four conditions used here a complete description of "VF in a
clean state" for the purposes of trust revocation?

A trusted VF can install several pieces of privileged state that this
predicate does not consult:

  - more than I40E_VC_MAX_VLAN_PER_VF (16) VLAN filters, accepted by
    i40e_vc_add_vlan_msg() only because the VF is trusted

  - more than I40E_VC_MAX_MAC_ADDR_PER_VF (18) unicast MAC filters,
    permitted by i40e_check_vf_permission() only because the VF is
    trusted

  - a VF-side override of an administratively-set MAC (vf->pf_set_mac),
    where the comment in i40e_check_vf_permission() reads:

        /* ... Unless the VF is privileged and then it can do whatever */

Before this patch, the unconditional i40e_vc_reset_vf() drove
i40e_cleanup_reset_vf() -> i40e_free_vf_res(), which wiped the VF
filter table and re-derived the privilege bit from vf->trusted in
i40e_alloc_vf_resources().

On the new no-reset branch, only the I40E_VIRTCHNL_VF_CAP_PRIVILEGE bit
is cleared, so over-limit VLAN filters, over-limit MAC filters, and the
admin-MAC override remain installed in hardware while the driver
prints "VF %u is now untrusted".

Should this predicate also cover those cases (for example by checking
the VLAN/MAC filter counts on the VF VSI and vf->pf_set_mac), or should
the no-reset path perform the resource cleanup explicitly?
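
As a rough illustration of the broader predicate asked about here, the
following userspace model extends the four existing conditions with the
three cases listed above. It is only a sketch: struct vf_model, its
num_vlan_filters / num_mac_filters counters, admin_mac_overridden and
vf_needs_reset_on_untrust() are hypothetical names, not i40e driver
fields or functions. Only the limits (16 and 18) come from the text
above.

```c
#include <stdbool.h>

/* Limits quoted in the review text; the macro names are illustrative. */
#define MODEL_MAX_VLAN_PER_VF 16
#define MODEL_MAX_MAC_PER_VF  18

/* Hypothetical userspace stand-in for the per-VF state discussed above. */
struct vf_model {
	bool adq_enabled;
	unsigned int num_cloud_filters;
	bool uc_promisc;
	bool mc_promisc;
	unsigned int num_vlan_filters; /* assumed counter */
	unsigned int num_mac_filters;  /* assumed counter */
	bool admin_mac_overridden;     /* assumed: VF replaced a PF-set MAC */
};

/* Return true if revoking trust should still go through a full reset. */
bool vf_needs_reset_on_untrust(const struct vf_model *vf)
{
	/* The four conditions the patch already checks. */
	if (vf->adq_enabled || vf->num_cloud_filters > 0 ||
	    vf->uc_promisc || vf->mc_promisc)
		return true;

	/* Extra privileged state named in the review. */
	return vf->num_vlan_filters > MODEL_MAX_VLAN_PER_VF ||
	       vf->num_mac_filters > MODEL_MAX_MAC_PER_VF ||
	       vf->admin_mac_overridden;
}
```

A VF sitting exactly at the untrusted limits would still take the fast
path in this model; only state that required trust forces the reset.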

[High]
Can the no-reset branch race with concurrent virtchnl message handling?

i40e_ndo_set_vf_trust() serializes against other ndo callbacks via
__I40E_VIRTCHNL_OP_PENDING, but i40e_vc_process_vf_msg() (driven from
i40e_clean_adminq_subtask() in the service workqueue) does not test
that bit. It rejects messages only when I40E_VF_STATE_DISABLED is set,
which the no-reset path never sets.

A revoke-trust sequence could then look like:

  CPU0 (ndo_set_vf_trust, setting=false)
     reads vf->num_cloud_filters == 0, !vf->adq_enabled,
           !UC_PROMISC, !MC_PROMISC
     selects the else branch

  CPU1 (i40e_vc_process_vf_msg, e.g. ADD_CLOUD_FILTER /
        ADD_VLAN / ADD_ETH_ADDR / CONFIG_PROMISCUOUS_MODE)
     tests I40E_VIRTCHNL_VF_CAP_PRIVILEGE -- still set
     installs privileged state (++num_cloud_filters,
        sets UC/MC_PROMISC, adds an over-limit VLAN/MAC, ...)

  CPU0
     i40e_setup_vf_trust(vf, false)
        clear_bit(I40E_VIRTCHNL_VF_CAP_PRIVILEGE, &vf->vf_caps);

End state: trust is "revoked" and the privilege bit is cleared, but
privileged state was installed during the window and is never cleaned
up.

The previous unconditional i40e_vc_reset_vf() path masked this because
i40e_trigger_vf_reset() clears I40E_VF_STATE_ACTIVE (so the
i40e_sync_vf_state(I40E_VF_STATE_ACTIVE) check at the top of most
virtchnl handlers fails) and i40e_cleanup_reset_vf() then wipes any
state that did get installed.

Should the no-reset path take a stronger form of synchronisation
against virtchnl processing, and both sample the gating conditions and
clear the privilege bit under it?
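
The shape of such synchronisation can be sketched in userspace with a
plain mutex. This is a model under stated assumptions, not driver code:
struct vf_sync_model, its lock and both functions are hypothetical, and
the real driver would need a lock that the virtchnl message path can
actually take. The point is only that the check of the privilege bit and
the installation of privileged state happen in the same critical section
as the sampling of the gating condition and the clearing of the bit.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of the state shared by both paths. */
struct vf_sync_model {
	pthread_mutex_t lock;           /* assumed lock, taken by both sides */
	bool privileged;                /* stands in for the privilege bit */
	unsigned int num_cloud_filters; /* stands in for the gating state */
};

/* Message-handler side: install privileged state only while privileged. */
bool model_add_cloud_filter(struct vf_sync_model *vf)
{
	bool ok;

	pthread_mutex_lock(&vf->lock);
	ok = vf->privileged;
	if (ok)
		vf->num_cloud_filters++;
	pthread_mutex_unlock(&vf->lock);
	return ok;
}

/*
 * Trust-revocation side: sample the gating condition and clear the
 * privilege bit in one critical section. Returns true if a full reset
 * is still needed to clean up state installed earlier.
 */
bool model_revoke_trust(struct vf_sync_model *vf)
{
	bool needs_reset;

	pthread_mutex_lock(&vf->lock);
	needs_reset = vf->num_cloud_filters > 0;
	vf->privileged = false;
	pthread_mutex_unlock(&vf->lock);
	return needs_reset;
}
```

In this model a handler either runs before the revocation, in which case
the revocation observes the new filter and asks for a reset, or after
it, in which case the handler sees the cleared bit and refuses.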

>  
>  	dev_info(&pf->pdev->dev, "VF %u is now %strusted\n",
>  		 vf_id, setting ? "" : "un");
>  
> -	if (vf->adq_enabled) {
> -		if (!vf->trusted) {
> -			dev_info(&pf->pdev->dev,
> -				 "VF %u no longer Trusted, deleting all cloud filters\n",
> -				 vf_id);
> -			i40e_del_all_cloud_filters(vf);
> -		}
> -	}
> -

[Low]
The dev_info() "VF %u no longer Trusted, deleting all cloud filters"
is removed here. The new code still calls i40e_del_all_cloud_filters()
in the equivalent scenario (inside the conditional reset branch), but
the message is no longer emitted and the commit message does not
mention this change.

Was the removal intentional, and if so should it be noted in the
changelog?

>  out:
>  	clear_bit(__I40E_VIRTCHNL_OP_PENDING, pf->state);
>  	return ret;
>  }
-- 
pw-bot: cr


* Re: [PATCH net 4/8] i40e: skip unnecessary VF reset when setting trust
  2026-05-20 18:34 ` [PATCH net 4/8] i40e: skip unnecessary VF reset when setting trust Tony Nguyen
  2026-05-23  0:16   ` Jakub Kicinski
@ 2026-05-23  0:16   ` Jakub Kicinski
  1 sibling, 0 replies; 21+ messages in thread
From: Jakub Kicinski @ 2026-05-23  0:16 UTC (permalink / raw)
  To: anthony.l.nguyen
  Cc: Jakub Kicinski, davem, pabeni, edumazet, andrew+netdev, netdev,
	vinschen, alexander.nowlin

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
i40e: skip unnecessary VF reset when setting trust

This commit attempts to skip the ~10 second VF reset delay when changing
the trust setting. It bypasses the reset and only updates the privilege
capability bit when revoking trust if no advanced features are active.

> diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
> index a26c3d47ec156..0cc434b26eb88 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
[ ... ]
> @@ -4987,19 +5004,20 @@ int i40e_ndo_set_vf_trust(struct net_device *netdev, int vf_id, bool setting)
>  	set_bit(__I40E_MACVLAN_SYNC_PENDING, pf->state);
>  	pf->vsi[vf->lan_vsi_idx]->flags |= I40E_VSI_FLAG_FILTER_CHANGED;
>  
> -	i40e_vc_reset_vf(vf, true);
> +	/* Reset only if revoking trust and VF has advanced features configured */
> +	if (!setting &&
> +	    (vf->adq_enabled || vf->num_cloud_filters > 0 ||
> +	     test_bit(I40E_VF_STATE_UC_PROMISC, &vf->vf_states) ||
> +	     test_bit(I40E_VF_STATE_MC_PROMISC, &vf->vf_states))) {

[Severity: High]
Does skipping the VF reset when revoking trust allow the Virtual Function
to permanently retain privileged resources?

A trusted VF can bypass untrusted limits, allocating a large number of MAC
and VLAN filters and overriding administratively set MAC addresses. By
skipping the reset upon trust revocation if ADq and promiscuous mode are
inactive, the VSI is not destroyed. 

Since i40e_setup_vf_trust() only updates the capability bit without removing
excess filters, could an untrusted VF retain these configurations and bypass
isolation limits?

[Severity: High]
Can this introduce a race condition with concurrent virtchnl messages?

Virtchnl messages are processed in the ARQ workqueue without holding
rtnl_lock or __I40E_VIRTCHNL_OP_PENDING. A concurrent virtchnl request could
verify the privilege capability bit before it is cleared, and successfully
configure an advanced feature like promiscuous mode after these checks have
bypassed the reset.

Will the VF end up untrusted but with advanced features active?

> +		i40e_vc_reset_vf(vf, true);
> +		i40e_del_all_cloud_filters(vf);

[Severity: Critical]
This is a pre-existing issue, but is vf->cloud_filter_list accessed
concurrently without synchronization?

i40e_del_all_cloud_filters() iterates over and deletes elements from
vf->cloud_filter_list. Concurrently, the ARQ processing thread processes
virtchnl messages and adds elements to the same list. 

Since there is no lock protecting this list across the netlink and ARQ
thread contexts, could concurrent modifications corrupt the list pointers
and lead to a use-after-free?

> +	} else {
> +		i40e_setup_vf_trust(vf, setting);
> +	}
> +
>  	dev_info(&pf->pdev->dev, "VF %u is now %strusted\n",
>  		 vf_id, setting ? "" : "un");
>  
> -	if (vf->adq_enabled) {
> -		if (!vf->trusted) {
> -			dev_info(&pf->pdev->dev,
> -				 "VF %u no longer Trusted, deleting all cloud filters\n",
> -				 vf_id);
> -			i40e_del_all_cloud_filters(vf);
> -		}
> -	}
> -
>  out:
>  	clear_bit(__I40E_VIRTCHNL_OP_PENDING, pf->state);

[Severity: High]
This isn't a bug introduced by this patch, but does using the
__I40E_VIRTCHNL_OP_PENDING bit flag as an ad-hoc lock violate networking
driver concurrency guidelines?

The guidelines explicitly forbid using bit flags to guard code regions. This
open-coded locking scheme provides no deadlock detection, lockdep visibility,
or strict memory ordering guarantees. Could this be replaced with a proper
mutex or lock?

>  	return ret;
>  }


* [PATCH net 5/8] iavf: send MAC change request synchronously
  2026-05-20 18:34 [PATCH net 0/8][pull request] Intel Wired LAN Driver Updates 2026-05-20 (ice, iavf, i40e, ixgbe) Tony Nguyen
                   ` (3 preceding siblings ...)
  2026-05-20 18:34 ` [PATCH net 4/8] i40e: skip unnecessary VF reset when setting trust Tony Nguyen
@ 2026-05-20 18:34 ` Tony Nguyen
  2026-05-23  0:16   ` Jakub Kicinski
  2026-05-23  0:16   ` Jakub Kicinski
  2026-05-20 18:34 ` [PATCH net 6/8] ice: skip unnecessary VF reset when setting trust Tony Nguyen
                   ` (2 subsequent siblings)
  7 siblings, 2 replies; 21+ messages in thread
From: Tony Nguyen @ 2026-05-20 18:34 UTC (permalink / raw)
  To: davem, kuba, pabeni, edumazet, andrew+netdev, netdev
  Cc: Jose Ignacio Tornos Martinez, anthony.l.nguyen,
	przemyslaw.kitszel, jacob.e.keller, horms, stable,
	Aleksandr Loktionov, Rafal Romanowski

From: Jose Ignacio Tornos Martinez <jtornosm@redhat.com>

After commit ad7c7b2172c3 ("net: hold netdev instance lock during sysfs
operations"), iavf_set_mac() is called with the netdev instance lock
already held.

The function queues a MAC address change request via
iavf_replace_primary_mac() and then waits for completion. However, in
the current flow, the actual virtchnl message is sent by the watchdog
task, which also needs to acquire the netdev lock to run. Additionally,
the adminq_task which processes virtchnl responses also needs the netdev
lock.

This creates a deadlock scenario:
1. iavf_set_mac() holds netdev lock and waits for MAC change
2. Watchdog needs netdev lock to send the request -> blocked
3. Even if request is sent, adminq_task needs netdev lock to process
   PF response -> blocked
4. MAC change times out after 2.5 seconds
5. iavf_set_mac() returns -EAGAIN

This particularly affects VFs during bonding setup when multiple VFs are
enslaved in quick succession.

Fix by implementing a synchronous MAC change operation similar to the
approach used in commit fdadbf6e84c4 ("iavf: fix incorrect reset handling
in callbacks").

The solution:
1. Send the virtchnl ADD_ETH_ADDR message directly (not via watchdog)
2. Poll the admin queue hardware directly for responses
3. Process all received messages (including non-MAC messages)
4. Return when MAC change completes or times out

A new generic function iavf_poll_virtchnl_response() is introduced that
can be reused for any future synchronous virtchnl operations. It takes a
callback to check completion, allowing flexible condition checking.

This allows the operation to complete synchronously while holding
netdev_lock, without relying on watchdog or adminq_task. The function
can sleep for up to 2.5 seconds polling hardware, but this is acceptable
since netdev_lock is per-device and only serializes operations on the
same interface.

To support this, change iavf_add_ether_addrs() to return an error code
instead of void, allowing callers to detect failures. Additionally,
export iavf_mac_add_reject() to enable proper rollback on local failures
(timeouts, send errors) - PF rejections are already handled automatically
by iavf_virtchnl_completion().

Remove vc_waitqueue entirely: iavf_set_mac() was its only waiter, so it
is no longer needed after this change.

Fixes: ad7c7b2172c3 ("net: hold netdev instance lock during sysfs operations")
cc: stable@vger.kernel.org
Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
 drivers/net/ethernet/intel/iavf/iavf.h        |  10 +-
 drivers/net/ethernet/intel/iavf/iavf_main.c   |  71 +++++++++----
 .../net/ethernet/intel/iavf/iavf_virtchnl.c   | 100 ++++++++++++++++--
 3 files changed, 151 insertions(+), 30 deletions(-)

diff --git a/drivers/net/ethernet/intel/iavf/iavf.h b/drivers/net/ethernet/intel/iavf/iavf.h
index 050f8241ef5e..06eb19b00527 100644
--- a/drivers/net/ethernet/intel/iavf/iavf.h
+++ b/drivers/net/ethernet/intel/iavf/iavf.h
@@ -259,7 +259,6 @@ struct iavf_adapter {
 	struct work_struct adminq_task;
 	struct work_struct finish_config;
 	wait_queue_head_t down_waitqueue;
-	wait_queue_head_t vc_waitqueue;
 	struct iavf_q_vector *q_vectors;
 	struct list_head vlan_filter_list;
 	int num_vlan_filters;
@@ -588,8 +587,9 @@ void iavf_configure_queues(struct iavf_adapter *adapter);
 void iavf_enable_queues(struct iavf_adapter *adapter);
 void iavf_disable_queues(struct iavf_adapter *adapter);
 void iavf_map_queues(struct iavf_adapter *adapter);
-void iavf_add_ether_addrs(struct iavf_adapter *adapter);
+int iavf_add_ether_addrs(struct iavf_adapter *adapter);
 void iavf_del_ether_addrs(struct iavf_adapter *adapter);
+void iavf_mac_add_reject(struct iavf_adapter *adapter);
 void iavf_add_vlans(struct iavf_adapter *adapter);
 void iavf_del_vlans(struct iavf_adapter *adapter);
 void iavf_set_promiscuous(struct iavf_adapter *adapter);
@@ -606,6 +606,12 @@ void iavf_disable_vlan_stripping(struct iavf_adapter *adapter);
 void iavf_virtchnl_completion(struct iavf_adapter *adapter,
 			      enum virtchnl_ops v_opcode,
 			      enum iavf_status v_retval, u8 *msg, u16 msglen);
+int iavf_poll_virtchnl_response(struct iavf_adapter *adapter,
+				bool (*condition)(struct iavf_adapter *adapter,
+						  const void *data,
+						  enum virtchnl_ops v_op),
+				const void *cond_data,
+				unsigned int timeout_ms);
 int iavf_config_rss(struct iavf_adapter *adapter);
 void iavf_cfg_queues_bw(struct iavf_adapter *adapter);
 void iavf_cfg_queues_quanta_size(struct iavf_adapter *adapter);
diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index 78c59a58e0b2..ed790dc3de6b 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -1029,6 +1029,48 @@ static bool iavf_is_mac_set_handled(struct net_device *netdev,
 	return ret;
 }
 
+/**
+ * iavf_mac_change_done - Check if MAC change completed
+ * @adapter: board private structure
+ * @data: MAC address being checked (as const void *)
+ * @v_op: virtchnl opcode from processed message
+ *
+ * Callback for iavf_poll_virtchnl_response() to check if MAC change completed.
+ *
+ * Return: true if MAC change completed, false otherwise
+ */
+static bool iavf_mac_change_done(struct iavf_adapter *adapter,
+				 const void *data, enum virtchnl_ops v_op)
+{
+	const u8 *addr = data;
+
+	return iavf_is_mac_set_handled(adapter->netdev, addr);
+}
+
+/**
+ * iavf_set_mac_sync - Synchronously change MAC address
+ * @adapter: board private structure
+ * @addr: MAC address to set
+ *
+ * Send MAC change request to PF and poll admin queue for response.
+ * Caller must hold netdev_lock. This can sleep for up to 2.5 seconds.
+ *
+ * Return: 0 on success, negative on failure
+ */
+static int iavf_set_mac_sync(struct iavf_adapter *adapter, const u8 *addr)
+{
+	int ret;
+
+	netdev_assert_locked(adapter->netdev);
+
+	ret = iavf_add_ether_addrs(adapter);
+	if (ret)
+		return ret;
+
+	return iavf_poll_virtchnl_response(adapter, iavf_mac_change_done,
+					   addr, 2500);
+}
+
 /**
  * iavf_set_mac - NDO callback to set port MAC address
  * @netdev: network interface device structure
@@ -1049,25 +1091,21 @@ static int iavf_set_mac(struct net_device *netdev, void *p)
 		return -EADDRNOTAVAIL;
 
 	ret = iavf_replace_primary_mac(adapter, addr->sa_data);
-
 	if (ret)
 		return ret;
 
-	ret = wait_event_interruptible_timeout(adapter->vc_waitqueue,
-					       iavf_is_mac_set_handled(netdev, addr->sa_data),
-					       msecs_to_jiffies(2500));
-
-	/* If ret < 0 then it means wait was interrupted.
-	 * If ret == 0 then it means we got a timeout.
-	 * else it means we got response for set MAC from PF,
-	 * check if netdev MAC was updated to requested MAC,
-	 * if yes then set MAC succeeded otherwise it failed return -EACCES
-	 */
-	if (ret < 0)
+	ret = iavf_set_mac_sync(adapter, addr->sa_data);
+	if (ret) {
+		/* Rollback for local failures (timeout, send error, -EBUSY).
+		 * Note: If PF rejects the request (sends error response),
+		 * iavf_virtchnl_completion() automatically calls
+		 * iavf_mac_add_reject(), ret=0, and this is not executed.
+		 * Only local failures (no PF response received) need manual rollback.
+		 */
+		iavf_mac_add_reject(adapter);
+		ether_addr_copy(adapter->hw.mac.addr, netdev->dev_addr);
 		return ret;
-
-	if (!ret)
-		return -EAGAIN;
+	}
 
 	if (!ether_addr_equal(netdev->dev_addr, addr->sa_data))
 		return -EACCES;
@@ -5393,9 +5431,6 @@ static int iavf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	/* Setup the wait queue for indicating transition to down status */
 	init_waitqueue_head(&adapter->down_waitqueue);
 
-	/* Setup the wait queue for indicating virtchannel events */
-	init_waitqueue_head(&adapter->vc_waitqueue);
-
 	INIT_LIST_HEAD(&adapter->ptp.aq_cmds);
 	init_waitqueue_head(&adapter->ptp.phc_time_waitqueue);
 	mutex_init(&adapter->ptp.aq_cmd_lock);
diff --git a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
index 4f2defd2331b..cd5211b9a798 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
@@ -2,6 +2,7 @@
 /* Copyright(c) 2013 - 2018 Intel Corporation. */
 
 #include <linux/net/intel/libie/rx.h>
+#include <net/netdev_lock.h>
 
 #include "iavf.h"
 #include "iavf_ptp.h"
@@ -555,20 +556,23 @@ iavf_set_mac_addr_type(struct virtchnl_ether_addr *virtchnl_ether_addr,
  * @adapter: adapter structure
  *
  * Request that the PF add one or more addresses to our filters.
- **/
-void iavf_add_ether_addrs(struct iavf_adapter *adapter)
+ *
+ * Return: 0 on success, negative on failure
+ */
+int iavf_add_ether_addrs(struct iavf_adapter *adapter)
 {
 	struct virtchnl_ether_addr_list *veal;
 	struct iavf_mac_filter *f;
 	int i = 0, count = 0;
 	bool more = false;
 	size_t len;
+	int ret;
 
 	if (adapter->current_op != VIRTCHNL_OP_UNKNOWN) {
 		/* bail because we already have a command pending */
 		dev_err(&adapter->pdev->dev, "Cannot add filters, command %d pending\n",
 			adapter->current_op);
-		return;
+		return -EBUSY;
 	}
 
 	spin_lock_bh(&adapter->mac_vlan_list_lock);
@@ -580,7 +584,7 @@ void iavf_add_ether_addrs(struct iavf_adapter *adapter)
 	if (!count) {
 		adapter->aq_required &= ~IAVF_FLAG_AQ_ADD_MAC_FILTER;
 		spin_unlock_bh(&adapter->mac_vlan_list_lock);
-		return;
+		return 0;
 	}
 	adapter->current_op = VIRTCHNL_OP_ADD_ETH_ADDR;
 
@@ -594,8 +598,9 @@ void iavf_add_ether_addrs(struct iavf_adapter *adapter)
 
 	veal = kzalloc(len, GFP_ATOMIC);
 	if (!veal) {
+		adapter->current_op = VIRTCHNL_OP_UNKNOWN;
 		spin_unlock_bh(&adapter->mac_vlan_list_lock);
-		return;
+		return -ENOMEM;
 	}
 
 	veal->vsi_id = adapter->vsi_res->vsi_id;
@@ -615,8 +620,15 @@ void iavf_add_ether_addrs(struct iavf_adapter *adapter)
 
 	spin_unlock_bh(&adapter->mac_vlan_list_lock);
 
-	iavf_send_pf_msg(adapter, VIRTCHNL_OP_ADD_ETH_ADDR, (u8 *)veal, len);
+	ret = iavf_send_pf_msg(adapter, VIRTCHNL_OP_ADD_ETH_ADDR, (u8 *)veal, len);
 	kfree(veal);
+	if (ret) {
+		dev_err(&adapter->pdev->dev,
+			"Unable to send ADD_ETH_ADDR message to PF, error %d\n", ret);
+		adapter->current_op = VIRTCHNL_OP_UNKNOWN;
+	}
+
+	return ret;
 }
 
 /**
@@ -712,8 +724,8 @@ static void iavf_mac_add_ok(struct iavf_adapter *adapter)
  * @adapter: adapter structure
  *
  * Remove filters from list based on PF response.
- **/
-static void iavf_mac_add_reject(struct iavf_adapter *adapter)
+ */
+void iavf_mac_add_reject(struct iavf_adapter *adapter)
 {
 	struct net_device *netdev = adapter->netdev;
 	struct iavf_mac_filter *f, *ftmp;
@@ -2364,7 +2376,6 @@ void iavf_virtchnl_completion(struct iavf_adapter *adapter,
 			iavf_mac_add_reject(adapter);
 			/* restore administratively set MAC address */
 			ether_addr_copy(adapter->hw.mac.addr, netdev->dev_addr);
-			wake_up(&adapter->vc_waitqueue);
 			break;
 		case VIRTCHNL_OP_DEL_ETH_ADDR:
 			dev_err(&adapter->pdev->dev, "Failed to delete MAC filter, error %s\n",
@@ -2557,7 +2568,6 @@ void iavf_virtchnl_completion(struct iavf_adapter *adapter,
 				eth_hw_addr_set(netdev, adapter->hw.mac.addr);
 				netif_addr_unlock_bh(netdev);
 			}
-		wake_up(&adapter->vc_waitqueue);
 		break;
 	case VIRTCHNL_OP_GET_STATS: {
 		struct iavf_eth_stats *stats =
@@ -2952,3 +2962,73 @@ void iavf_virtchnl_completion(struct iavf_adapter *adapter,
 	} /* switch v_opcode */
 	adapter->current_op = VIRTCHNL_OP_UNKNOWN;
 }
+
+/**
+ * iavf_poll_virtchnl_response - Poll admin queue for virtchnl response
+ * @adapter: adapter structure
+ * @condition: callback to check if desired response received
+ * @cond_data: context data passed to condition callback
+ * @timeout_ms: maximum time to wait in milliseconds
+ *
+ * Polls the admin queue and processes all incoming virtchnl messages.
+ * After processing each valid message, calls the condition callback to check
+ * if the expected response has been received. The callback receives the opcode
+ * of the processed message to identify which response was received. Continues
+ * polling until the callback returns true or timeout expires.
+ * Caller must hold netdev_lock. This can sleep for up to timeout_ms while
+ * polling hardware.
+ *
+ * Return: 0 on success (condition met), -EAGAIN on timeout, or error code
+ */
+int iavf_poll_virtchnl_response(struct iavf_adapter *adapter,
+				bool (*condition)(struct iavf_adapter *adapter,
+						  const void *data,
+						  enum virtchnl_ops v_op),
+				const void *cond_data,
+				unsigned int timeout_ms)
+{
+	struct iavf_hw *hw = &adapter->hw;
+	struct iavf_arq_event_info event;
+	enum virtchnl_ops received_op;
+	unsigned long timeout;
+	int ret = -EAGAIN;
+	u16 pending = 0;
+	u32 v_retval;
+
+	netdev_assert_locked(adapter->netdev);
+
+	event.buf_len = IAVF_MAX_AQ_BUF_SIZE;
+	event.msg_buf = kzalloc(event.buf_len, GFP_KERNEL);
+	if (!event.msg_buf)
+		return -ENOMEM;
+
+	timeout = jiffies + msecs_to_jiffies(timeout_ms);
+	do {
+		if (!pending)
+			usleep_range(50, 75);
+
+		if (iavf_clean_arq_element(hw, &event, &pending) == IAVF_SUCCESS) {
+			received_op = (enum virtchnl_ops)le32_to_cpu(event.desc.cookie_high);
+			if (received_op != VIRTCHNL_OP_UNKNOWN) {
+				v_retval = le32_to_cpu(event.desc.cookie_low);
+
+				iavf_virtchnl_completion(adapter, received_op,
+							 (enum iavf_status)v_retval,
+							 event.msg_buf, event.msg_len);
+
+				if (condition(adapter, cond_data, received_op)) {
+					ret = 0;
+					break;
+				}
+			}
+
+			memset(event.msg_buf, 0, IAVF_MAX_AQ_BUF_SIZE);
+
+			if (pending)
+				continue;
+		}
+	} while (time_before(jiffies, timeout));
+
+	kfree(event.msg_buf);
+	return ret;
+}
-- 
2.47.1



* Re: [PATCH net 5/8] iavf: send MAC change request synchronously
  2026-05-20 18:34 ` [PATCH net 5/8] iavf: send MAC change request synchronously Tony Nguyen
@ 2026-05-23  0:16   ` Jakub Kicinski
  2026-05-23  0:16   ` Jakub Kicinski
  1 sibling, 0 replies; 21+ messages in thread
From: Jakub Kicinski @ 2026-05-23  0:16 UTC (permalink / raw)
  To: anthony.l.nguyen
  Cc: Jakub Kicinski, davem, pabeni, edumazet, andrew+netdev, netdev,
	vinschen, alexander.nowlin

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
iavf: send MAC change request synchronously

This patch reworks iavf_set_mac() to send the ADD_ETH_ADDR virtchnl
message directly and poll the admin queue for the response, instead of
relying on the watchdog and adminq tasks (which now block on the netdev
instance lock held by iavf_set_mac()). It also exports
iavf_mac_add_reject() so the new path can roll back local failures, and
removes vc_waitqueue.

> diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
> index 78c59a58e0b2..ed790dc3de6b 100644
> --- a/drivers/net/ethernet/intel/iavf/iavf_main.c
> +++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
> @@ -1029,6 +1029,48 @@ static bool iavf_is_mac_set_handled(struct net_device *netdev,
>  	return ret;
>  }
>  

[ ... ]

> +static int iavf_set_mac_sync(struct iavf_adapter *adapter, const u8 *addr)
> +{
> +	int ret;
> +
> +	netdev_assert_locked(adapter->netdev);
> +
> +	ret = iavf_add_ether_addrs(adapter);
> +	if (ret)
> +		return ret;
> +
> +	return iavf_poll_virtchnl_response(adapter, iavf_mac_change_done,
> +					   addr, 2500);
> +}

[Medium]
Can this path mishandle the multi-batch case in iavf_add_ether_addrs()?

When the number of filters with f->add=true exceeds what fits in one
IAVF_MAX_AQ_BUF_SIZE message, iavf_add_ether_addrs() trims the batch and
sets more=true:

	len = virtchnl_struct_size(veal, list, count);
	if (len > IAVF_MAX_AQ_BUF_SIZE) {
		dev_warn(&adapter->pdev->dev, "Too many add MAC changes in one request\n");
		while (len > IAVF_MAX_AQ_BUF_SIZE)
			len = virtchnl_struct_size(veal, list, --count);
		more = true;
	}
	...
	if (!more)
		adapter->aq_required &= ~IAVF_FLAG_AQ_ADD_MAC_FILTER;

Only the first batch is sent synchronously here; the remainder is left
for the watchdog. iavf_replace_primary_mac() calls iavf_add_filter()
which uses list_add_tail(), so the new primary MAC is at the end of
mac_filter_list and ends up in the last batch.

In that scenario, iavf_poll_virtchnl_response() polls
iavf_is_mac_set_handled() for the new MAC, which never completes within
2.5 seconds because the message that would change it has not been sent
yet. iavf_set_mac() then returns -EAGAIN and the rollback below removes
the new filter via iavf_mac_add_reject() because is_new_mac is true.

Pre-patch, the wait_event_interruptible_timeout() on vc_waitqueue would
eventually be satisfied because the watchdog drained subsequent batches
and the PF response for the new MAC arrived. Should iavf_set_mac_sync()
loop while IAVF_FLAG_AQ_ADD_MAC_FILTER is still set (that is, while more
batches remain), so the new MAC is actually transmitted before the
timeout?
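
The looping idea can be sketched with a small userspace model of the
batching. Everything here is an assumption made for illustration:
MODEL_BATCH_MAX stands in for however many entries fit in one
IAVF_MAX_AQ_BUF_SIZE message, struct filter_model keeps only the add
flag, and the polling for the PF response between batches is omitted.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_BATCH_MAX 4 /* assumed per-message capacity for the model */

/* Hypothetical stand-in for a MAC filter list entry. */
struct filter_model {
	bool add; /* still waiting to be sent */
};

/* Send one batch; return true if entries with add == true remain. */
bool model_send_batch(struct filter_model *f, size_t n)
{
	size_t sent = 0, i;
	bool more = false;

	for (i = 0; i < n; i++) {
		if (!f[i].add)
			continue;
		if (sent == MODEL_BATCH_MAX) {
			more = true; /* does not fit, leave for next batch */
			break;
		}
		f[i].add = false;
		sent++;
	}
	return more;
}

/* Keep sending until nothing is pending; return the number of batches. */
unsigned int model_send_all(struct filter_model *f, size_t n)
{
	unsigned int batches = 0;
	bool more;

	do {
		more = model_send_batch(f, n);
		batches++;
	} while (more);
	return batches;
}
```

With ten pending entries and a capacity of four, the entry at the tail
of the list goes out only in the third batch, which is the case the
single-shot path would miss.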

> @@ -1049,25 +1091,21 @@ static int iavf_set_mac(struct net_device *netdev, void *p)
>  		return -EADDRNOTAVAIL;
>  
>  	ret = iavf_replace_primary_mac(adapter, addr->sa_data);
> -
>  	if (ret)
>  		return ret;
>  
> -	ret = wait_event_interruptible_timeout(adapter->vc_waitqueue,
> -					       iavf_is_mac_set_handled(netdev, addr->sa_data),
> -					       msecs_to_jiffies(2500));
> -
> -	/* If ret < 0 then it means wait was interrupted.
> -	 * If ret == 0 then it means we got a timeout.
> -	 * else it means we got response for set MAC from PF,
> -	 * check if netdev MAC was updated to requested MAC,
> -	 * if yes then set MAC succeeded otherwise it failed return -EACCES
> -	 */
> -	if (ret < 0)
> +	ret = iavf_set_mac_sync(adapter, addr->sa_data);
> +	if (ret) {
> +		/* Rollback for local failures (timeout, send error, -EBUSY).
> +		 * Note: If PF rejects the request (sends error response),
> +		 * iavf_virtchnl_completion() automatically calls
> +		 * iavf_mac_add_reject(), ret=0, and this is not executed.
> +		 * Only local failures (no PF response received) need manual rollback.
> +		 */
> +		iavf_mac_add_reject(adapter);
> +		ether_addr_copy(adapter->hw.mac.addr, netdev->dev_addr);
>  		return ret;
> -
> -	if (!ret)
> -		return -EAGAIN;
> +	}

[Medium]
Is iavf_mac_add_reject() safe to call when the AQ message never reached
the PF?

iavf_add_ether_addrs() sets f->add = false on every filter that it packs
into the batch, before iavf_send_pf_msg() is invoked:

	list_for_each_entry(f, &adapter->mac_filter_list, list) {
		if (f->add) {
			ether_addr_copy(veal->list[i].addr, f->macaddr);
			iavf_set_mac_addr_type(&veal->list[i], f);
			i++;
			f->add = false;
			...
		}
	}
	...
	ret = iavf_send_pf_msg(adapter, VIRTCHNL_OP_ADD_ETH_ADDR, (u8 *)veal, len);

iavf_mac_add_reject() then walks the entire mac_filter_list and marks
every filter with !f->add && !f->add_handled as add_handled=true:

	list_for_each_entry_safe(f, ftmp, &adapter->mac_filter_list, list) {
		...
		if (!f->add && !f->add_handled)
			f->add_handled = true;
		...
	}

On a local failure (iavf_send_pf_msg() error, IAVF_FLAG_PF_COMMS_FAILED,
-EBUSY, -ENOMEM or polling timeout) any unrelated filters that were in
the batch (e.g., entries added by a prior ndo_set_rx_mode() that the
watchdog had not yet drained) get add_handled=true even though the PF
never received them. They are not retried since f->add was cleared.

Before this patch, iavf_mac_add_reject() was called only from
iavf_virtchnl_completion() after the PF actually rejected an
ADD_ETH_ADDR, so marking these filters as handled was correct because
the PF had seen them. Should the new local-failure rollback be scoped to
only the new primary MAC filter, rather than reusing the PF-reject
helper unchanged?
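
One possible shape of a scoped rollback, as a userspace sketch. The
in_batch marker, struct mac_filter_model and
model_rollback_local_failure() are hypothetical and exist only to make
the idea concrete; the driver has no such field today. The sketch gives
up on the new primary MAC and requeues every other filter that was
packed into the message that never reached the PF.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_ETH_ALEN 6

/* Hypothetical stand-in for a MAC filter list entry. */
struct mac_filter_model {
	unsigned char macaddr[MODEL_ETH_ALEN];
	bool add;         /* needs to be sent to the PF */
	bool add_handled; /* request was answered or abandoned */
	bool in_batch;    /* assumed marker: packed into the failed message */
};

/* Roll back a local failure without touching unrelated filters. */
void model_rollback_local_failure(struct mac_filter_model *f, size_t n,
				  const unsigned char *new_mac)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (!f[i].in_batch)
			continue;
		f[i].in_batch = false;
		if (memcmp(f[i].macaddr, new_mac, MODEL_ETH_ALEN) == 0)
			f[i].add_handled = true; /* abandon the new MAC */
		else
			f[i].add = true; /* requeue for a later send */
	}
}
```

Filters that were not part of the failed message are left exactly as
they were.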

> diff --git a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
> index 4f2defd2331b..cd5211b9a798 100644
> --- a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
> +++ b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
> @@ -555,20 +556,23 @@ iavf_set_mac_addr_type(struct virtchnl_ether_addr *virtchnl_ether_addr,
>   * @adapter: adapter structure
>   *
>   * Request that the PF add one or more addresses to our filters.
> - **/
> -void iavf_add_ether_addrs(struct iavf_adapter *adapter)
> +int iavf_add_ether_addrs(struct iavf_adapter *adapter)
>  {
>  	struct virtchnl_ether_addr_list *veal;
>  	struct iavf_mac_filter *f;
>  	int i = 0, count = 0;
>  	bool more = false;

[ ... ]


* Re: [PATCH net 5/8] iavf: send MAC change request synchronously
  2026-05-20 18:34 ` [PATCH net 5/8] iavf: send MAC change request synchronously Tony Nguyen
  2026-05-23  0:16   ` Jakub Kicinski
@ 2026-05-23  0:16   ` Jakub Kicinski
  1 sibling, 0 replies; 21+ messages in thread
From: Jakub Kicinski @ 2026-05-23  0:16 UTC (permalink / raw)
  To: anthony.l.nguyen
  Cc: Jakub Kicinski, davem, pabeni, edumazet, andrew+netdev, netdev,
	vinschen, alexander.nowlin, stable

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
iavf: send MAC change request synchronously

This commit implements a synchronous MAC address change operation that sends
the ADD_ETH_ADDR message directly and polls for the response. This avoids
deadlocks involving the netdev lock, watchdog task, and adminq task.

cc: stable@vger.kernel.org

> diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
> index 78c59a58e0b2b..ed790dc3de6b7 100644
> --- a/drivers/net/ethernet/intel/iavf/iavf_main.c
> +++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
[ ... ]
> @@ -1029,6 +1029,48 @@ static bool iavf_is_mac_set_handled(struct net_device *netdev,
[ ... ]
> +static int iavf_set_mac_sync(struct iavf_adapter *adapter, const u8 *addr)
> +{
> +	int ret;
> +
> +	netdev_assert_locked(adapter->netdev);
> +
> +	ret = iavf_add_ether_addrs(adapter);
> +	if (ret)
> +		return ret;

[Severity: Medium]
Will this reliably fail with -EBUSY if a background operation (such as
VIRTCHNL_OP_GET_STATS sent by the watchdog) is already in flight?

If adapter->current_op != VIRTCHNL_OP_UNKNOWN, iavf_add_ether_addrs()
instantly returns -EBUSY. Because iavf_set_mac() holds the netdev_lock,
the adminq_task cannot run to process the pending response and clear
current_op, resulting in a spurious failure rather than waiting for the
background operation to complete.

> +
> +	return iavf_poll_virtchnl_response(adapter, iavf_mac_change_done,
> +					   addr, 2500);
> +}
[ ... ]
> @@ -1049,25 +1091,21 @@ static int iavf_set_mac(struct net_device *netdev, void *p)
>  		return -EADDRNOTAVAIL;
>  
>  	ret = iavf_replace_primary_mac(adapter, addr->sa_data);
> -
>  	if (ret)
>  		return ret;
>  
> -	ret = wait_event_interruptible_timeout(adapter->vc_waitqueue,
> -					       iavf_is_mac_set_handled(netdev, addr->sa_data),
> -					       msecs_to_jiffies(2500));
> -
> -	/* If ret < 0 then it means wait was interrupted.
> -	 * If ret == 0 then it means we got a timeout.
> -	 * else it means we got response for set MAC from PF,
> -	 * check if netdev MAC was updated to requested MAC,
> -	 * if yes then set MAC succeeded otherwise it failed return -EACCES
> -	 */
> -	if (ret < 0)
> +	ret = iavf_set_mac_sync(adapter, addr->sa_data);
> +	if (ret) {
> +		/* Rollback for local failures (timeout, send error, -EBUSY).
> +		 * Note: If PF rejects the request (sends error response),
> +		 * iavf_virtchnl_completion() automatically calls
> +		 * iavf_mac_add_reject(), ret=0, and this is not executed.
> +		 * Only local failures (no PF response received) need manual rollback.
> +		 */
> +		iavf_mac_add_reject(adapter);
> +		ether_addr_copy(adapter->hw.mac.addr, netdev->dev_addr);
>  		return ret;
> -
> -	if (!ret)
> -		return -EAGAIN;

[Severity: High]
Does rolling back on local failure here cause a permanent state mismatch
between the PF and VF?

If the MAC change request was already sent to the PF by
iavf_add_ether_addrs(), but iavf_poll_virtchnl_response() subsequently
fails (e.g., memory allocation failure for event.msg_buf, or a timeout
occurs), this local rollback deletes the new MAC filter and restores the
old MAC.

However, the PF might still asynchronously process the request and
configure the new MAC. Later, the VF's watchdog will send a pending
deletion for the old MAC, leaving the PF configured with the new MAC
while the VF uses the old MAC, causing a complete loss of network
connectivity.

Should the buffer allocation in iavf_poll_virtchnl_response() be done
before sending the request to the PF, and can timeouts be handled without
rolling back in-flight requests?
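
The ordering question can be illustrated with a user-space sketch. Nothing
here is driver code: model_pf, model_send_request() and the two
model_set_mac_*() helpers are hypothetical, and the alloc_fails parameter
exists only to force the failure path. The point is that an allocation
failure after the send leaves a request in flight, whereas allocating first
fails before anything reaches the PF.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

struct model_pf {
	bool request_seen;	/* true once a request has reached the PF */
};

static int model_send_request(struct model_pf *pf)
{
	assert(pf);
	pf->request_seen = true;
	return 0;
}

/* alloc_fails lets a caller force the allocation failure path. */
static void *model_alloc_msg_buf(bool alloc_fails)
{
	return alloc_fails ? NULL : malloc(64);
}

/* Order questioned above: send first, allocate the response buffer later. */
static int model_set_mac_send_first(struct model_pf *pf, bool alloc_fails)
{
	void *buf;
	int ret;

	ret = model_send_request(pf);
	if (ret)
		return ret;

	buf = model_alloc_msg_buf(alloc_fails);
	if (!buf)
		return -ENOMEM;	/* the request is already in flight */

	free(buf);
	return 0;
}

/* Assumed alternative: allocate first, so failure happens before sending. */
static int model_set_mac_alloc_first(struct model_pf *pf, bool alloc_fails)
{
	void *buf;
	int ret;

	buf = model_alloc_msg_buf(alloc_fails);
	if (!buf)
		return -ENOMEM;	/* nothing was sent; local rollback is safe */

	ret = model_send_request(pf);
	free(buf);
	return ret;
}
```

Both variants return -ENOMEM when the allocation fails, but only in the
send-first variant has the PF already seen the request at that point.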

* [PATCH net 6/8] ice: skip unnecessary VF reset when setting trust
  2026-05-20 18:34 [PATCH net 0/8][pull request] Intel Wired LAN Driver Updates 2026-05-20 (ice, iavf, i40e, ixgbe) Tony Nguyen
                   ` (4 preceding siblings ...)
  2026-05-20 18:34 ` [PATCH net 5/8] iavf: send MAC change request synchronously Tony Nguyen
@ 2026-05-20 18:34 ` Tony Nguyen
  2026-05-23  0:16   ` Jakub Kicinski
  2026-05-23  0:16   ` Jakub Kicinski
  2026-05-20 18:34 ` [PATCH net 7/8] i40e: set supported_extts_flags for rising edge Tony Nguyen
  2026-05-20 18:34 ` [PATCH net 8/8] ixgbe: only access vfinfo and mv_list under RCU lock Tony Nguyen
  7 siblings, 2 replies; 21+ messages in thread
From: Tony Nguyen @ 2026-05-20 18:34 UTC (permalink / raw)
  To: davem, kuba, pabeni, edumazet, andrew+netdev, netdev
  Cc: Jose Ignacio Tornos Martinez, anthony.l.nguyen,
	przemyslaw.kitszel, jacob.e.keller, horms, Rafal Romanowski

From: Jose Ignacio Tornos Martinez <jtornosm@redhat.com>

Similar to the i40e fix, ice_set_vf_trust() unconditionally calls
ice_reset_vf() when the trust setting changes. While the delay is smaller
than on i40e, this reset is still unnecessary in most cases.

Additionally, the original code has a race condition: it deletes MAC LLDP
filters BEFORE resetting the VF. During this deletion, the VF is still
ACTIVE and can add new MAC LLDP filters concurrently, potentially
corrupting the filter list.

When granting trust, no reset is needed - we can just set the capability
flag to allow privileged operations.

When revoking trust, we only need to reset (conservative approach) if
the VF has actually configured advanced features that require cleanup
(MAC LLDP filters, promiscuous mode). For VFs in a clean state, we can
safely change the trust setting without the disruptive reset.

When we do reset (MAC LLDP case), we fix the race condition by resetting
first to clear VF state (which blocks new MAC LLDP filter additions), then
delete existing filters safely. During cleanup, vf->trusted remains true so
ice_vf_is_lldp_ena() works properly. Only after cleanup do we set
vf->trusted = false.

When we don't reset, we update the capability flag manually via a helper
function, eliminating the delay.

Fixes: 2296345416b0 ("ice: receive LLDP on trusted VFs")
Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_sriov.c | 33 +++++++++++++++++++---
 1 file changed, 29 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_sriov.c b/drivers/net/ethernet/intel/ice/ice_sriov.c
index 7e00e091756d..3c64ed1b41a8 100644
--- a/drivers/net/ethernet/intel/ice/ice_sriov.c
+++ b/drivers/net/ethernet/intel/ice/ice_sriov.c
@@ -1364,6 +1364,23 @@ int ice_set_vf_mac(struct net_device *netdev, int vf_id, u8 *mac)
 	return __ice_set_vf_mac(ice_netdev_to_pf(netdev), vf_id, mac);
 }
 
+/**
+ * ice_setup_vf_trust - Enable/disable VF trust mode without reset
+ * @vf: VF to configure
+ * @setting: trust setting
+ *
+ * Update VF flags when changing trust without performing a VF reset.
+ * This is only called when it's safe to skip the reset (VF has no advanced
+ * features configured that need cleanup).
+ */
+static void ice_setup_vf_trust(struct ice_vf *vf, bool setting)
+{
+	if (setting)
+		set_bit(ICE_VIRTCHNL_VF_CAP_PRIVILEGE, &vf->vf_caps);
+	else
+		clear_bit(ICE_VIRTCHNL_VF_CAP_PRIVILEGE, &vf->vf_caps);
+}
+
 /**
  * ice_set_vf_trust
  * @netdev: network interface device structure
@@ -1399,11 +1416,19 @@ int ice_set_vf_trust(struct net_device *netdev, int vf_id, bool trusted)
 
 	mutex_lock(&vf->cfg_lock);
 
-	while (!trusted && vf->num_mac_lldp)
-		ice_vf_update_mac_lldp_num(vf, ice_get_vf_vsi(vf), false);
-
+	/* Reset only if revoking trust and VF has advanced features configured */
+	if (!trusted &&
+	    (vf->num_mac_lldp > 0 ||
+	     test_bit(ICE_VF_STATE_UC_PROMISC, vf->vf_states) ||
+	     test_bit(ICE_VF_STATE_MC_PROMISC, vf->vf_states))) {
+		ice_reset_vf(vf, ICE_VF_RESET_NOTIFY);
+		while (vf->num_mac_lldp)
+			ice_vf_update_mac_lldp_num(vf, ice_get_vf_vsi(vf), false);
+	} else {
+		ice_setup_vf_trust(vf, trusted);
+	}
 	vf->trusted = trusted;
-	ice_reset_vf(vf, ICE_VF_RESET_NOTIFY);
+
 	dev_info(ice_pf_to_dev(pf), "VF %u is now %strusted\n",
 		 vf_id, trusted ? "" : "un");
 
-- 
2.47.1


* Re: [PATCH net 6/8] ice: skip unnecessary VF reset when setting trust
  2026-05-20 18:34 ` [PATCH net 6/8] ice: skip unnecessary VF reset when setting trust Tony Nguyen
@ 2026-05-23  0:16   ` Jakub Kicinski
  2026-05-23  0:16   ` Jakub Kicinski
  1 sibling, 0 replies; 21+ messages in thread
From: Jakub Kicinski @ 2026-05-23  0:16 UTC (permalink / raw)
  To: anthony.l.nguyen
  Cc: Jakub Kicinski, davem, pabeni, edumazet, andrew+netdev, netdev,
	vinschen, alexander.nowlin

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
commit 848ee33529de0af7967f298f70c37bfb75ea5498
Author: Jose Ignacio Tornos Martinez <jtornosm@redhat.com>

ice: skip unnecessary VF reset when setting trust

This patch makes ice_set_vf_trust() skip the VF reset for both the
grant-trust path and for revocation when the VF has not configured
LLDP filters or promiscuous mode. When a reset is still needed, it
reorders the work so the reset runs before the LLDP filter teardown.

> diff --git a/drivers/net/ethernet/intel/ice/ice_sriov.c b/drivers/net/ethernet/intel/ice/ice_sriov.c
> index 7e00e091756d..3c64ed1b41a8 100644
> --- a/drivers/net/ethernet/intel/ice/ice_sriov.c
> +++ b/drivers/net/ethernet/intel/ice/ice_sriov.c
> @@ -1399,11 +1416,19 @@ int ice_set_vf_trust(struct net_device *netdev, int vf_id, bool trusted)
>  
>  	mutex_lock(&vf->cfg_lock);
>  
> -	while (!trusted && vf->num_mac_lldp)
> -		ice_vf_update_mac_lldp_num(vf, ice_get_vf_vsi(vf), false);
> -
> +	/* Reset only if revoking trust and VF has advanced features configured */
> +	if (!trusted &&
> +	    (vf->num_mac_lldp > 0 ||
> +	     test_bit(ICE_VF_STATE_UC_PROMISC, vf->vf_states) ||
> +	     test_bit(ICE_VF_STATE_MC_PROMISC, vf->vf_states))) {
> +		ice_reset_vf(vf, ICE_VF_RESET_NOTIFY);

[Critical]
Can the privilege capability bit get left set after revocation here?

The actual trust check used at runtime is ice_is_vf_trusted(), which
reads the bit, not vf->trusted:

  bool ice_is_vf_trusted(struct ice_vf *vf)
  {
          return test_bit(ICE_VIRTCHNL_VF_CAP_PRIVILEGE, &vf->vf_caps);
  }

The new code calls ice_reset_vf() while vf->trusted is still true (the
assignment is deferred to after the reset so ice_vf_is_lldp_ena() works
during cleanup). Inside the reset, the rebuild path runs:

  ice_reset_vf()
    -> ice_vf_post_vsi_rebuild()
       -> ice_vf_rebuild_host_cfg()
          -> ice_vf_set_host_trust_cfg()
             assign_bit(ICE_VIRTCHNL_VF_CAP_PRIVILEGE,
                        &vf->vf_caps, vf->trusted);

Because vf->trusted is still true at that moment, the bit gets
re-asserted to 1. After the reset returns, vf->trusted is set to false,
but ice_setup_vf_trust(vf, false) is only called in the else branch, so
the privilege bit is never cleared on this path.

After this returns, ice_is_vf_trusted() still returns true, which gates
ice_can_vf_change_mac(), ice_vc_handle_mac_addr_msg(),
ice_vc_cfg_promiscuous_mode_msg() and ice_vf_has_max_vlans(). The kernel
log says "VF X is now untrusted" but the trust-gated capabilities remain
available to the guest.

Would calling ice_setup_vf_trust(vf, false) (or assigning vf->trusted
before the reset, as the previous code did) be necessary to actually
clear the bit on this path?
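
The ordering problem can be reproduced in a minimal user-space model. This
is only a sketch under stated assumptions: model_vf and the helpers below are
hypothetical, with model_reset_vf() standing in for the part of the rebuild
that syncs the capability bit from vf->trusted and model_is_vf_trusted()
standing in for ice_is_vf_trusted().

```c
#include <assert.h>
#include <stdbool.h>

struct model_vf {
	bool trusted;		/* models vf->trusted */
	bool cap_privilege;	/* models the privilege bit in vf->vf_caps */
};

/* Models the rebuild step: the bit is re-derived from vf->trusted. */
static void model_reset_vf(struct model_vf *vf)
{
	assert(vf);
	vf->cap_privilege = vf->trusted;
}

/* Models the runtime trust check, which reads the bit only. */
static bool model_is_vf_trusted(const struct model_vf *vf)
{
	return vf->cap_privilege;
}

/* Order in the patch's reset branch: reset, then clear vf->trusted. */
static void model_revoke_reset_then_assign(struct model_vf *vf)
{
	model_reset_vf(vf);
	vf->trusted = false;
}

/* Order in the previous code: clear vf->trusted, then reset. */
static void model_revoke_assign_then_reset(struct model_vf *vf)
{
	vf->trusted = false;
	model_reset_vf(vf);
}
```

In this model the first ordering leaves model_is_vf_trusted() returning true
after revocation, while the second clears it.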

> +		while (vf->num_mac_lldp)
> +			ice_vf_update_mac_lldp_num(vf, ice_get_vf_vsi(vf), false);

[High]
Is this loop reachable after the reset above?

ice_reset_vf() goes through ice_vf_pre_vsi_rebuild() which calls
ice_vf_clear_counters():

  static void ice_vf_clear_counters(struct ice_vf *vf)
  {
          struct ice_vsi *vsi = ice_get_vf_vsi(vf);

          if (vsi)
                  vsi->num_vlan = 0;

          vf->num_mac = 0;
          vf->num_mac_lldp = 0;
          ...
  }

By the time control returns to ice_set_vf_trust(), vf->num_mac_lldp is
already zero, so the while loop never iterates and
ice_vf_update_mac_lldp_num() -> ice_vsi_cfg_sw_lldp(vsi, false, false)
never runs.

ice_vf_reconfig_vsi() inside the reset does invoke ice_fltr_remove_all()
for per-VSI rules, but ice_vsi_cfg_sw_lldp() also drives the AQ-based
ice_lldp_fltr_add_remove() removal, gated by the PF-wide
ICE_FLAG_LLDP_AQ_FLTR flag, which is not torn down by ice_fltr_remove_all().
On firmware that uses the AQ fallback (set when the per-VSI rule add
fails), the AQ LLDP filter rule installed when trust was granted stays
in place, and the VF can keep receiving LLDP frames after revocation.

The commit message says:

    When we do reset (MAC LLDP case), we fix the race condition by
    resetting first to clear VF state (which blocks new MAC LLDP filter
    additions), then delete existing filters safely.

Does the deletion step actually run, given that the loop counter is
zeroed by the reset before the loop is reached? Would saving
num_mac_lldp before ice_reset_vf() and using that saved value (or
unconditionally calling ice_vsi_cfg_sw_lldp(vsi, false, false) when an
LLDP rule was present) be needed for the intended teardown to happen?
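
A user-space sketch of the two orderings, assuming only what is described
above: the reset zeroes the counter, and the teardown helper is what removes
the LLDP rule. model_vf, model_reset_vf() and both teardown helpers are
hypothetical names; sw_lldp_rule stands in for whatever
ice_vsi_cfg_sw_lldp(vsi, false, false) would remove.

```c
#include <assert.h>
#include <stdbool.h>

struct model_vf {
	unsigned int num_mac_lldp;	/* models vf->num_mac_lldp */
	bool sw_lldp_rule;		/* models an installed LLDP rule */
};

/* Models the counter clearing done inside the reset. */
static void model_reset_vf(struct model_vf *vf)
{
	assert(vf);
	vf->num_mac_lldp = 0;
}

/* Patch order: the loop condition reads the already-zeroed counter. */
static void model_teardown_after_reset(struct model_vf *vf)
{
	model_reset_vf(vf);
	while (vf->num_mac_lldp) {
		vf->num_mac_lldp--;
		if (!vf->num_mac_lldp)
			vf->sw_lldp_rule = false;
	}
}

/* Assumed alternative: remember the state before the reset clears it. */
static void model_teardown_with_saved_state(struct model_vf *vf)
{
	bool had_lldp = vf->num_mac_lldp > 0;

	model_reset_vf(vf);
	if (had_lldp)
		vf->sw_lldp_rule = false;
}
```

Starting from two LLDP MAC filters and an installed rule, the first helper
leaves the rule in place and the second removes it.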

> +	} else {
> +		ice_setup_vf_trust(vf, trusted);
> +	}
>  	vf->trusted = trusted;
> -	ice_reset_vf(vf, ICE_VF_RESET_NOTIFY);
> +
>  	dev_info(ice_pf_to_dev(pf), "VF %u is now %strusted\n",
>  		 vf_id, trusted ? "" : "un");

[Medium]
Is the predicate that selects the reset path complete?

The condition only covers num_mac_lldp and the UC/MC promisc state bits.
While trusted, however, a VF can also:

  - override the host-set MAC and add unicast MACs beyond
    ICE_MAX_MACADDR_PER_VF (ice_can_vf_change_mac() and
    ice_vc_handle_mac_addr_msg() consult ice_is_vf_trusted())
  - add VLAN filters beyond ICE_MAX_VLAN_PER_VF
    (ice_vf_has_max_vlans() / ice_vc_get_max_vlan_fltrs() grant a higher
    cap when ice_is_vf_trusted() is true)

A VF that used these higher quotas but is not currently in promisc mode
and has no LLDP filter takes the no-reset path, and the over-quota MAC
and VLAN filters carry over past trust revocation. The previous
unconditional ice_reset_vf() rebuilt the VSI through
ice_vf_rebuild_host_mac_cfg(), which restored only the host-set MAC and
dropped VF-added extras.

The commit message says the reset is taken "if the VF has actually
configured advanced features that require cleanup". Should the
condition also account for VF-installed MAC and VLAN filters that
exceed the untrusted limits?
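
One possible shape for a wider predicate, as a user-space sketch rather than
a proposed patch. The struct, both helpers and the two MODEL_MAX_* limits are
hypothetical placeholders; the driver's real limits and counters may differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder limits for the sketch; the driver's real values may differ. */
#define MODEL_MAX_MACADDR_PER_VF 18
#define MODEL_MAX_VLAN_PER_VF 8

struct model_vf {
	unsigned int num_mac_lldp;
	unsigned int num_mac;
	unsigned int num_vlan;
	bool uc_promisc;
	bool mc_promisc;
};

/* Models the condition in the patch: LLDP filters or promiscuous mode. */
static bool model_needs_reset_patch(const struct model_vf *vf)
{
	assert(vf);
	return vf->num_mac_lldp > 0 || vf->uc_promisc || vf->mc_promisc;
}

/* Assumed extension: also reset when filters exceed the untrusted limits. */
static bool model_needs_reset_extended(const struct model_vf *vf)
{
	return model_needs_reset_patch(vf) ||
	       vf->num_mac > MODEL_MAX_MACADDR_PER_VF ||
	       vf->num_vlan > MODEL_MAX_VLAN_PER_VF;
}
```

A VF holding 30 MAC filters with no LLDP filter and no promiscuous mode takes
the no-reset path under the first predicate and the reset path under the
second.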

* Re: [PATCH net 6/8] ice: skip unnecessary VF reset when setting trust
  2026-05-20 18:34 ` [PATCH net 6/8] ice: skip unnecessary VF reset when setting trust Tony Nguyen
  2026-05-23  0:16   ` Jakub Kicinski
@ 2026-05-23  0:16   ` Jakub Kicinski
  1 sibling, 0 replies; 21+ messages in thread
From: Jakub Kicinski @ 2026-05-23  0:16 UTC (permalink / raw)
  To: anthony.l.nguyen
  Cc: Jakub Kicinski, davem, pabeni, edumazet, andrew+netdev, netdev,
	vinschen, alexander.nowlin

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
ice: skip unnecessary VF reset when setting trust

Similar to the i40e fix, ice_set_vf_trust() unconditionally calls
ice_reset_vf() when the trust setting changes. While the delay is smaller
than on i40e, this reset is still unnecessary in most cases.

> diff --git a/drivers/net/ethernet/intel/ice/ice_sriov.c b/drivers/net/ethernet/intel/ice/ice_sriov.c
> --- a/drivers/net/ethernet/intel/ice/ice_sriov.c
> +++ b/drivers/net/ethernet/intel/ice/ice_sriov.c

[ ... ]

> @@ -1399,11 +1416,19 @@ int ice_set_vf_trust(struct net_device *netdev, int vf_id, bool trusted)
>  
>  	mutex_lock(&vf->cfg_lock);
>  
> -	while (!trusted && vf->num_mac_lldp)
> -		ice_vf_update_mac_lldp_num(vf, ice_get_vf_vsi(vf), false);
> -
> +	/* Reset only if revoking trust and VF has advanced features configured */
> +	if (!trusted &&
> +	    (vf->num_mac_lldp > 0 ||
> +	     test_bit(ICE_VF_STATE_UC_PROMISC, vf->vf_states) ||
> +	     test_bit(ICE_VF_STATE_MC_PROMISC, vf->vf_states))) {
> +		ice_reset_vf(vf, ICE_VF_RESET_NOTIFY);

[Severity: High]
When revoking trust, ice_reset_vf() is called before updating vf->trusted
to false.

During ice_reset_vf(), the VSI is rebuilt and synchronizes the
ICE_VIRTCHNL_VF_CAP_PRIVILEGE bit in vf->vf_caps with the current value of
vf->trusted, which is still true. The else block below is bypassed, and
although vf->trusted is later set to false, vf_caps will retain the
privileged capability bit.

Because ice_is_vf_trusted() strictly checks vf_caps, won't the VF
incorrectly retain privileged capabilities indefinitely after this operation?

> +		while (vf->num_mac_lldp)
> +			ice_vf_update_mac_lldp_num(vf, ice_get_vf_vsi(vf), false);

[Severity: High]
Is the MAC LLDP filter cleanup loop dead code?

ice_reset_vf() internally calls ice_vf_pre_vsi_rebuild(), which
unconditionally clears vf->num_mac_lldp to 0 via ice_vf_clear_counters().
As a result, the while condition evaluates to false immediately upon
returning from the reset, completely bypassing the intended cleanup logic.

Furthermore, because vf->num_mac_lldp is abruptly zeroed without going
through ice_vf_update_mac_lldp_num(), does the driver skip calling
ice_vsi_cfg_sw_lldp(), silently leaking the software LLDP switch filters
from its bookkeeping?

> +	} else {
> +		ice_setup_vf_trust(vf, trusted);
> +	}

[Severity: High]
If we skip the VF reset when granting trust, what happens to expected VM
configuration workflows for promiscuous mode?

The PF driver intentionally lies to untrusted VFs that attempt to enable
promiscuous mode: it returns success to the VF without actually programming
the hardware. Historically, granting trust forced a full VF reset. This reset
prompted the VF driver to re-initialize and replay its configuration (including
promiscuous mode) to the PF, which would then correctly program the hardware
because the VF was now trusted.

By skipping the reset when granting trust, it appears a VF that previously
requested promiscuous mode will remain permanently desynchronized. The guest
OS believes it is in promiscuous mode, but the hardware is not programmed to
reflect this, which could result in dropped traffic.
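
The desynchronization described above can be modelled in user space. This is
a sketch under the assumptions stated in the review text (the PF reports
success to an untrusted VF without programming hardware, and a reset makes
the guest replay its configuration); all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

struct model_vf {
	bool trusted;			/* PF-side trust setting */
	bool guest_wants_promisc;	/* what the guest believes it enabled */
	bool hw_promisc;		/* what is actually programmed */
};

/* Models the PF handler: always reports success, programs only if trusted. */
static int model_pf_cfg_promisc(struct model_vf *vf, bool enable)
{
	assert(vf);
	vf->guest_wants_promisc = enable;
	if (vf->trusted)
		vf->hw_promisc = enable;
	return 0;
}

/* Previous behaviour: the reset makes the guest replay its configuration. */
static void model_grant_trust_with_reset(struct model_vf *vf)
{
	vf->trusted = true;
	vf->hw_promisc = false;	/* reset clears programmed state */
	model_pf_cfg_promisc(vf, vf->guest_wants_promisc);
}

/* Behaviour with the patch: only the trust setting changes. */
static void model_grant_trust_without_reset(struct model_vf *vf)
{
	vf->trusted = true;
}
```

In the no-reset case the guest's view and the programmed state stay out of
step until the guest sends the request again.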

[Severity: High]
When skipping the VF reset on trust revocation (because the VF lacks MAC
LLDP or promiscuous mode), are we allowing the VF to persistently retain
resources allocated while it was trusted?

The 'trusted' capability grants a VF the ability to bypass administrative
limits, such as ICE_MAX_MACADDR_PER_VF (which restricts standard VFs to 18
MAC addresses) and the maximum VLAN limit checked in ice_vf_has_max_vlans().

Prior to this patch, revoking trust unconditionally triggered a VF reset,
which rebuilt the VSI and completely wiped all dynamically allocated hardware
filters, effectively enforcing the demoted limits.

Without the reset, might a VF that allocated an excessive number of MAC
addresses or VLANs while trusted retain them in hardware indefinitely after
trust is revoked, creating a persistent resource limit bypass?

>  	vf->trusted = trusted;
> -	ice_reset_vf(vf, ICE_VF_RESET_NOTIFY);
> +
>  	dev_info(ice_pf_to_dev(pf), "VF %u is now %strusted\n",
>  		 vf_id, trusted ? "" : "un");

* [PATCH net 7/8] i40e: set supported_extts_flags for rising edge
  2026-05-20 18:34 [PATCH net 0/8][pull request] Intel Wired LAN Driver Updates 2026-05-20 (ice, iavf, i40e, ixgbe) Tony Nguyen
                   ` (5 preceding siblings ...)
  2026-05-20 18:34 ` [PATCH net 6/8] ice: skip unnecessary VF reset when setting trust Tony Nguyen
@ 2026-05-20 18:34 ` Tony Nguyen
  2026-05-23  0:16   ` Jakub Kicinski
  2026-05-20 18:34 ` [PATCH net 8/8] ixgbe: only access vfinfo and mv_list under RCU lock Tony Nguyen
  7 siblings, 1 reply; 21+ messages in thread
From: Tony Nguyen @ 2026-05-20 18:34 UTC (permalink / raw)
  To: davem, kuba, pabeni, edumazet, andrew+netdev, netdev
  Cc: Przemyslaw Korba, anthony.l.nguyen, richardcochran, kory.maincent,
	Arkadiusz Kubalewski, Aleksandr Loktionov, Jacob Keller,
	Simon Horman, Sunitha Mekala

From: Przemyslaw Korba <przemyslaw.korba@intel.com>

The i40e driver has always supported only rising edge detection, so
advertise PTP_RISING_EDGE and PTP_STRICT_FLAGS to ensure the
PTP core properly validates user requests.

Fixes: 7c571ac57d9d ("net: ptp: introduce .supported_extts_flags to ptp_clock_info")
Signed-off-by: Przemyslaw Korba <przemyslaw.korba@intel.com>
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_ptp.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_ptp.c b/drivers/net/ethernet/intel/i40e/i40e_ptp.c
index 7d07c389bb23..c4525bfab09c 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ptp.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ptp.c
@@ -1344,6 +1344,8 @@ static int i40e_init_pin_config(struct i40e_pf *pf)
 	pf->ptp_caps.n_ext_ts = 2;
 	pf->ptp_caps.pps = 1;
 	pf->ptp_caps.n_per_out = 2;
+	pf->ptp_caps.supported_extts_flags = PTP_RISING_EDGE |
+					     PTP_STRICT_FLAGS;
 
 	pf->ptp_caps.pin_config = kzalloc_objs(*pf->ptp_caps.pin_config,
 					       pf->ptp_caps.n_pins);
-- 
2.47.1


* Re: [PATCH net 7/8] i40e: set supported_extts_flags for rising edge
  2026-05-20 18:34 ` [PATCH net 7/8] i40e: set supported_extts_flags for rising edge Tony Nguyen
@ 2026-05-23  0:16   ` Jakub Kicinski
  0 siblings, 0 replies; 21+ messages in thread
From: Jakub Kicinski @ 2026-05-23  0:16 UTC (permalink / raw)
  To: anthony.l.nguyen
  Cc: Jakub Kicinski, davem, pabeni, edumazet, andrew+netdev, netdev,
	vinschen, alexander.nowlin

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
i40e: set supported_extts_flags for rising edge

The i40e driver has always supported only rising edge detection, so
advertise PTP_RISING_EDGE and PTP_STRICT_FLAGS to ensure the
PTP core properly validates user requests.

> diff --git a/drivers/net/ethernet/intel/i40e/i40e_ptp.c b/drivers/net/ethernet/intel/i40e/i40e_ptp.c
> index 7d07c389bb231..c4525bfab09c0 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_ptp.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_ptp.c
> @@ -1344,6 +1344,8 @@ static int i40e_init_pin_config(struct i40e_pf *pf)
>  	pf->ptp_caps.n_ext_ts = 2;
>  	pf->ptp_caps.pps = 1;
>  	pf->ptp_caps.n_per_out = 2;
> +	pf->ptp_caps.supported_extts_flags = PTP_RISING_EDGE |
> +					     PTP_STRICT_FLAGS;

[Severity: High]
This isn't a bug introduced by this patch, but does the PTP external timestamp
event in i40e_ptp_extts0_work() incorrectly assign the PCI Physical Function
(PF) ID to the event index instead of the PTP hardware channel index?

Looking at i40e_ptp_extts0_work():

drivers/net/ethernet/intel/i40e/i40e_ptp.c:i40e_ptp_extts0_work() {
	...
	event.type = PTP_CLOCK_EXTTS;
	event.index = hw->pf_id;

	/* fire event */
	ptp_clock_event(pf->ptp_clock, &event);
}

The PTP core API expects the index field to correspond to the requested EXTTS
channel index (which should be 0 here).

Userspace applications rely on this index to associate the received timestamp
event with the requested channel.

For interfaces where pf_id is greater than 0 (e.g., dual or quad port NICs),
this dispatches the event to userspace with the wrong channel index, causing
applications to ignore the timestamp or misbehave, completely breaking EXTTS
functionality for those ports.
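
A user-space sketch of the mismatch, assuming a listener that matches events
strictly by the channel index it requested. The struct and helpers are
hypothetical and do not use the real PTP API; they only model how the index
is filled in.

```c
#include <assert.h>
#include <stdbool.h>

struct model_extts_event {
	int index;	/* models the index field of the PTP clock event */
};

/* Models the current code: the index is taken from the PF id. */
static struct model_extts_event model_event_from_pf_id(int pf_id)
{
	struct model_extts_event ev = { .index = pf_id };

	return ev;
}

/* Assumed alternative: the index is the EXTTS channel that fired. */
static struct model_extts_event model_event_from_channel(int channel)
{
	struct model_extts_event ev = { .index = channel };

	return ev;
}

/* A listener that only accepts events for the channel it requested. */
static bool model_listener_accepts(int requested_channel,
				   const struct model_extts_event *ev)
{
	assert(ev);
	return ev->index == requested_channel;
}
```

For a port with PF id 1 and a request on channel 0, the PF-id variant
produces an event the listener rejects, while the channel variant matches.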

>  
>  	pf->ptp_caps.pin_config = kzalloc_objs(*pf->ptp_caps.pin_config,
>  					       pf->ptp_caps.n_pins);

* [PATCH net 8/8] ixgbe: only access vfinfo and mv_list under RCU lock
  2026-05-20 18:34 [PATCH net 0/8][pull request] Intel Wired LAN Driver Updates 2026-05-20 (ice, iavf, i40e, ixgbe) Tony Nguyen
                   ` (6 preceding siblings ...)
  2026-05-20 18:34 ` [PATCH net 7/8] i40e: set supported_extts_flags for rising edge Tony Nguyen
@ 2026-05-20 18:34 ` Tony Nguyen
  2026-05-23  0:16   ` Jakub Kicinski
  2026-05-23  0:16   ` Jakub Kicinski
  7 siblings, 2 replies; 21+ messages in thread
From: Tony Nguyen @ 2026-05-20 18:34 UTC (permalink / raw)
  To: davem, kuba, pabeni, edumazet, andrew+netdev, netdev
  Cc: Corinna Vinschen, anthony.l.nguyen, Alexander Nowlin

From: Corinna Vinschen <vinschen@redhat.com>

Commit 1e53834ce541d ("ixgbe: Add locking to prevent panic when setting
sriov_numvfs to zero") added a spinlock to the adapter info.  The reason
at the time was an observed crash when ixgbe_disable_sriov() freed the
adapter->vfinfo array while the interrupt driven function ixgbe_msg_task()
was handling VF messages.

Recent stability testing turned up another crash, which is very easily
reproducible:

  while true
  do
    for numvfs in 5 0
    do
      echo $numvfs > /sys/class/net/eth0/device/sriov_numvfs
    done
  done

This crashed almost always within the first two hundred runs with
a NULL pointer deref while running the ixgbe_service_task() workqueue:

[ 5052.036491] BUG: kernel NULL pointer dereference, address: 0000000000000258
[ 5052.043454] #PF: supervisor read access in kernel mode
[ 5052.048594] #PF: error_code(0x0000) - not-present page
[ 5052.053734] PGD 0 P4D 0
[ 5052.056272] Oops: Oops: 0000 #1 SMP NOPTI
[ 5052.060459] CPU: 2 UID: 0 PID: 132253 Comm: kworker/u96:0 Kdump: loaded Not tainted 6.12.0-180.el10.x86_64 #1 PREEMPT(voluntary)
[ 5052.072100] Hardware name: Dell Inc. PowerEdge R740/0DY2X0, BIOS 2.12.2 07/09/2021
[ 5052.079664] Workqueue: ixgbe ixgbe_service_task [ixgbe]
[ 5052.084907] RIP: 0010:ixgbe_update_stats+0x8b1/0xb40 [ixgbe]
[ 5052.090585] Code: 21 56 50 49 8b b6 18 26 00 00 4c 01 fe 48 09 46 50 42 8d 34 a5 00 83 00 00 e8 cb 7a ff ff 49 8b b6 18 26 00 00 89 c0 4c 01 fe <48> 3b 86 88 00 00 00 73 18 48 b9 00 00 00 00 01 00 00 00 48 01 4e
[ 5052.109331] RSP: 0018:ffffd5f1e8a6bd88 EFLAGS: 00010202
[ 5052.114558] RAX: 0000000000000000 RBX: ffff8f49b22b14a0 RCX: 000000000000023c
[ 5052.121689] RDX: ffffffff00000000 RSI: 00000000000001d0 RDI: ffff8f49b22b14a0
[ 5052.128823] RBP: 000000000000109c R08: 0000000000000000 R09: 0000000000000000
[ 5052.135955] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002
[ 5052.143086] R13: 0000000000008410 R14: ffff8f49b22b01a0 R15: 00000000000001d0
[ 5052.150221] FS:  0000000000000000(0000) GS:ffff8f58bfc80000(0000) knlGS:0000000000000000
[ 5052.158307] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5052.164054] CR2: 0000000000000258 CR3: 0000000bf2624006 CR4: 00000000007726f0
[ 5052.171187] PKRU: 55555554
[ 5052.173898] Call Trace:
[ 5052.176351]  <TASK>
[ 5052.178457]  ? show_trace_log_lvl+0x1b0/0x2f0
[ 5052.182816]  ? show_trace_log_lvl+0x1b0/0x2f0
[ 5052.187177]  ? ixgbe_watchdog_subtask+0x1a1/0x230 [ixgbe]
[ 5052.192591]  ? __die_body.cold+0x8/0x12
[ 5052.196433]  ? page_fault_oops+0x148/0x160
[ 5052.200532]  ? exc_page_fault+0x7f/0x150
[ 5052.204458]  ? asm_exc_page_fault+0x26/0x30
[ 5052.208643]  ? ixgbe_update_stats+0x8b1/0xb40 [ixgbe]
[ 5052.213714]  ? ixgbe_update_stats+0x8a5/0xb40 [ixgbe]
[ 5052.218784]  ixgbe_watchdog_subtask+0x1a1/0x230 [ixgbe]
[ 5052.224026]  ixgbe_service_task+0x15a/0x3f0 [ixgbe]
[ 5052.228916]  process_one_work+0x177/0x330
[ 5052.232928]  worker_thread+0x256/0x3a0
[ 5052.236681]  ? __pfx_worker_thread+0x10/0x10
[ 5052.240952]  kthread+0xfa/0x240
[ 5052.244099]  ? __pfx_kthread+0x10/0x10
[ 5052.247852]  ret_from_fork+0x34/0x50
[ 5052.251429]  ? __pfx_kthread+0x10/0x10
[ 5052.255185]  ret_from_fork_asm+0x1a/0x30
[ 5052.259112]  </TASK>

The first simple patch, which just added spinlocking to ixgbe_update_stats()
while reading from adapter->vfinfo, did not fix the problem. It just
moved it elsewhere: I could now reproduce the same kind of crash in
ixgbe_restore_vf_multicasts().

But adding more spinlocking doesn't really cut it.  One reason is that
ixgbe_restore_vf_multicasts() is called from within ixgbe_msg_task()
with the spinlock held, as well as from outside without locking.

Additionally, given that ixgbe_disable_sriov() is the only call changing
adapter->vfinfo, and given that ixgbe_disable_sriov() is called very
rarely compared to other actions in the driver, just adding more
spinlocks would unnecessarily occupy the driver with spinning when
multiple functions accessing adapter->vfinfo are running in parallel.

So this patch drops the spinlock in favor of RCU and uses it throughout
the driver.

While changing this, it seems prudent to do the same for the
adapter->mv_list array, which is allocated and freed at the same time as
adapter->vfinfo, although no crash was observed there.

Fixes: 1e53834ce541d ("ixgbe: Add locking to prevent panic when setting sriov_numvfs to zero")
Signed-off-by: Corinna Vinschen <vinschen@redhat.com>
Tested-by: Alexander Nowlin <alexander.nowlin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe.h      |   7 +-
 .../net/ethernet/intel/ixgbe/ixgbe_dcb_nl.c   |  36 +-
 .../net/ethernet/intel/ixgbe/ixgbe_ethtool.c  |  44 +-
 .../net/ethernet/intel/ixgbe/ixgbe_ipsec.c    |  17 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 227 +++++---
 .../net/ethernet/intel/ixgbe/ixgbe_sriov.c    | 547 ++++++++++++------
 6 files changed, 592 insertions(+), 286 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 9b8217523fd2..8849b9f42bf6 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -210,6 +210,7 @@ struct vf_stats {
 };
 
 struct vf_data_storage {
+	struct rcu_head rcu_head;
 	struct pci_dev *vfdev;
 	unsigned char vf_mac_addresses[ETH_ALEN];
 	u16 vf_mc_hashes[IXGBE_MAX_VF_MC_ENTRIES];
@@ -240,6 +241,7 @@ enum ixgbevf_xcast_modes {
 };
 
 struct vf_macvlans {
+	struct rcu_head rcu_head;
 	struct list_head l;
 	int vf;
 	bool free;
@@ -808,10 +810,10 @@ struct ixgbe_adapter {
 	/* SR-IOV */
 	DECLARE_BITMAP(active_vfs, IXGBE_MAX_VF_FUNCTIONS);
 	unsigned int num_vfs;
-	struct vf_data_storage *vfinfo;
+	struct vf_data_storage __rcu *vfinfo;
 	int vf_rate_link_speed;
 	struct vf_macvlans vf_mvs;
-	struct vf_macvlans *mv_list;
+	struct vf_macvlans __rcu *mv_list;
 
 	u32 timer_event_accumulator;
 	u32 vferr_refcount;
@@ -844,7 +846,6 @@ struct ixgbe_adapter {
 #ifdef CONFIG_IXGBE_IPSEC
 	struct ixgbe_ipsec *ipsec;
 #endif /* CONFIG_IXGBE_IPSEC */
-	spinlock_t vfs_lock;
 };
 
 struct ixgbe_netdevice_priv {
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb_nl.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb_nl.c
index 382d097e4b11..9a84cfc09120 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb_nl.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb_nl.c
@@ -640,17 +640,21 @@ static int ixgbe_dcbnl_ieee_setapp(struct net_device *dev,
 	/* VF devices should use default UP when available */
 	if (app->selector == IEEE_8021QAZ_APP_SEL_ETHERTYPE &&
 	    app->protocol == 0) {
+		struct vf_data_storage *vfinfo;
 		int vf;
 
 		adapter->default_up = app->priority;
 
-		for (vf = 0; vf < adapter->num_vfs; vf++) {
-			struct vf_data_storage *vfinfo = &adapter->vfinfo[vf];
-
-			if (!vfinfo->pf_qos)
-				ixgbe_set_vmvir(adapter, vfinfo->pf_vlan,
-						app->priority, vf);
-		}
+		rcu_read_lock();
+		vfinfo = rcu_dereference(adapter->vfinfo);
+		if (vfinfo)
+			for (vf = 0; vf < adapter->num_vfs; vf++) {
+				if (!vfinfo[vf].pf_qos)
+					ixgbe_set_vmvir(adapter,
+							vfinfo[vf].pf_vlan,
+							app->priority, vf);
+			}
+		rcu_read_unlock();
 	}
 
 	return 0;
@@ -683,19 +687,23 @@ static int ixgbe_dcbnl_ieee_delapp(struct net_device *dev,
 	/* IF default priority is being removed clear VF default UP */
 	if (app->selector == IEEE_8021QAZ_APP_SEL_ETHERTYPE &&
 	    app->protocol == 0 && adapter->default_up == app->priority) {
+		struct vf_data_storage *vfinfo;
 		int vf;
 		long unsigned int app_mask = dcb_ieee_getapp_mask(dev, app);
 		int qos = app_mask ? find_first_bit(&app_mask, 8) : 0;
 
 		adapter->default_up = qos;
 
-		for (vf = 0; vf < adapter->num_vfs; vf++) {
-			struct vf_data_storage *vfinfo = &adapter->vfinfo[vf];
-
-			if (!vfinfo->pf_qos)
-				ixgbe_set_vmvir(adapter, vfinfo->pf_vlan,
-						qos, vf);
-		}
+		rcu_read_lock();
+		vfinfo = rcu_dereference(adapter->vfinfo);
+		if (vfinfo)
+			for (vf = 0; vf < adapter->num_vfs; vf++) {
+				if (!vfinfo[vf].pf_qos)
+					ixgbe_set_vmvir(adapter,
+							vfinfo[vf].pf_vlan,
+							qos, vf);
+			}
+		rcu_read_unlock();
 	}
 
 	return err;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
index ba049b3a9609..b77317476af4 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
@@ -2265,21 +2265,28 @@ static void ixgbe_diag_test(struct net_device *netdev,
 		struct ixgbe_hw *hw = &adapter->hw;
 
 		if (adapter->flags & IXGBE_FLAG_SRIOV_ENABLED) {
+			struct vf_data_storage *vfinfo;
 			int i;
-			for (i = 0; i < adapter->num_vfs; i++) {
-				if (adapter->vfinfo[i].clear_to_send) {
-					netdev_warn(netdev, "offline diagnostic is not supported when VFs are present\n");
-					data[0] = 1;
-					data[1] = 1;
-					data[2] = 1;
-					data[3] = 1;
-					data[4] = 1;
-					eth_test->flags |= ETH_TEST_FL_FAILED;
-					clear_bit(__IXGBE_TESTING,
-						  &adapter->state);
-					return;
+
+			rcu_read_lock();
+			vfinfo = rcu_dereference(adapter->vfinfo);
+			if (vfinfo)
+				for (i = 0; i < adapter->num_vfs; i++) {
+					if (vfinfo[i].clear_to_send) {
+						netdev_warn(netdev, "offline diagnostic is not supported when VFs are present\n");
+						data[0] = 1;
+						data[1] = 1;
+						data[2] = 1;
+						data[3] = 1;
+						data[4] = 1;
+						eth_test->flags |= ETH_TEST_FL_FAILED;
+						clear_bit(__IXGBE_TESTING,
+							  &adapter->state);
+						rcu_read_unlock();
+						return;
+					}
 				}
-			}
+			rcu_read_unlock();
 		}
 
 		/* Offline tests */
@@ -3700,9 +3707,14 @@ static int ixgbe_set_priv_flags(struct net_device *netdev, u32 priv_flags)
 	if (priv_flags & IXGBE_PRIV_FLAGS_AUTO_DISABLE_VF) {
 		if (adapter->hw.mac.type == ixgbe_mac_82599EB) {
 			/* Reset primary abort counter */
-			for (i = 0; i < adapter->num_vfs; i++)
-				adapter->vfinfo[i].primary_abort_count = 0;
-
+			struct vf_data_storage *vfinfo;
+
+			rcu_read_lock();
+			vfinfo = rcu_dereference(adapter->vfinfo);
+			if (vfinfo)
+				for (i = 0; i < adapter->num_vfs; i++)
+					vfinfo[i].primary_abort_count = 0;
+			rcu_read_unlock();
 			flags2 |= IXGBE_FLAG2_AUTO_DISABLE_VF;
 		} else {
 			e_info(probe,
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
index bd397b3d7dea..b524a3a61eb6 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
@@ -874,6 +874,7 @@ void ixgbe_ipsec_vf_clear(struct ixgbe_adapter *adapter, u32 vf)
 int ixgbe_ipsec_vf_add_sa(struct ixgbe_adapter *adapter, u32 *msgbuf, u32 vf)
 {
 	struct ixgbe_ipsec *ipsec = adapter->ipsec;
+	struct vf_data_storage *vfinfo;
 	struct xfrm_algo_desc *algo;
 	struct sa_mbx_msg *sam;
 	struct xfrm_state *xs;
@@ -883,7 +884,13 @@ int ixgbe_ipsec_vf_add_sa(struct ixgbe_adapter *adapter, u32 *msgbuf, u32 vf)
 	int err;
 
 	sam = (struct sa_mbx_msg *)(&msgbuf[1]);
-	if (!adapter->vfinfo[vf].trusted ||
+
+	lockdep_assert_in_rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (!vfinfo)
+		return 0;
+
+	if (!vfinfo[vf].trusted ||
 	    !(adapter->flags2 & IXGBE_FLAG2_VF_IPSEC_ENABLED)) {
 		e_warn(drv, "VF %d attempted to add an IPsec SA\n", vf);
 		err = -EACCES;
@@ -984,11 +991,17 @@ int ixgbe_ipsec_vf_add_sa(struct ixgbe_adapter *adapter, u32 *msgbuf, u32 vf)
 int ixgbe_ipsec_vf_del_sa(struct ixgbe_adapter *adapter, u32 *msgbuf, u32 vf)
 {
 	struct ixgbe_ipsec *ipsec = adapter->ipsec;
+	struct vf_data_storage *vfinfo;
 	struct xfrm_state *xs;
 	u32 pfsa = msgbuf[1];
 	u16 sa_idx;
 
-	if (!adapter->vfinfo[vf].trusted) {
+	lockdep_assert_in_rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (!vfinfo)
+		return 0;
+
+	if (!vfinfo[vf].trusted) {
 		e_err(drv, "vf %d attempted to delete an SA\n", vf);
 		return -EPERM;
 	}
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 2646ee6f295f..d82c7dfc6580 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1240,20 +1240,26 @@ static void ixgbe_pf_handle_tx_hang(struct ixgbe_ring *tx_ring,
 static void ixgbe_vf_handle_tx_hang(struct ixgbe_adapter *adapter, u16 vf)
 {
 	struct ixgbe_hw *hw = &adapter->hw;
+	struct vf_data_storage *vfinfo;
 
 	if (adapter->hw.mac.type != ixgbe_mac_e610)
 		return;
 
-	e_warn(drv,
-	       "Malicious Driver Detection tx hang detected on PF %d VF %d MAC: %pM",
-	       hw->bus.func, vf, adapter->vfinfo[vf].vf_mac_addresses);
-
-	adapter->tx_hang_count[vf]++;
-	if (adapter->tx_hang_count[vf] == IXGBE_MAX_TX_VF_HANGS) {
-		ixgbe_set_vf_link_state(adapter, vf,
-					IFLA_VF_LINK_STATE_DISABLE);
-		adapter->tx_hang_count[vf] = 0;
+	rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (vfinfo) {
+		e_warn(drv,
+		       "Malicious Driver Detection tx hang detected on PF %d VF %d MAC: %pM",
+		       hw->bus.func, vf, vfinfo[vf].vf_mac_addresses);
+
+		adapter->tx_hang_count[vf]++;
+		if (adapter->tx_hang_count[vf] == IXGBE_MAX_TX_VF_HANGS) {
+			ixgbe_set_vf_link_state(adapter, vf,
+						IFLA_VF_LINK_STATE_DISABLE);
+			adapter->tx_hang_count[vf] = 0;
+		}
 	}
+	rcu_read_unlock();
 }
 
 static u32 ixgbe_poll_tx_icache(struct ixgbe_hw *hw, u16 queue, u16 idx)
@@ -4625,6 +4631,7 @@ static void ixgbe_configure_virtualization(struct ixgbe_adapter *adapter)
 	struct ixgbe_hw *hw = &adapter->hw;
 	u16 pool = adapter->num_rx_pools;
 	u32 reg_offset, vf_shift, vmolr;
+	struct vf_data_storage *vfinfo;
 	u32 gcr_ext, vmdctl;
 	int i;
 
@@ -4680,15 +4687,19 @@ static void ixgbe_configure_virtualization(struct ixgbe_adapter *adapter)
 
 	IXGBE_WRITE_REG(hw, IXGBE_GCR_EXT, gcr_ext);
 
-	for (i = 0; i < adapter->num_vfs; i++) {
-		/* configure spoof checking */
-		ixgbe_ndo_set_vf_spoofchk(adapter->netdev, i,
-					  adapter->vfinfo[i].spoofchk_enabled);
+	rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (vfinfo)
+		for (i = 0; i < adapter->num_vfs; i++) {
+			/* configure spoof checking */
+			ixgbe_ndo_set_vf_spoofchk(adapter->netdev, i,
+						  vfinfo[i].spoofchk_enabled);
 
-		/* Enable/Disable RSS query feature  */
-		ixgbe_ndo_set_vf_rss_query_en(adapter->netdev, i,
-					  adapter->vfinfo[i].rss_query_enabled);
-	}
+			/* Enable/Disable RSS query feature  */
+			ixgbe_ndo_set_vf_rss_query_en(adapter->netdev, i,
+						      vfinfo[i].rss_query_enabled);
+		}
+	rcu_read_unlock();
 }
 
 static void ixgbe_set_rx_buffer_len(struct ixgbe_adapter *adapter)
@@ -6093,35 +6104,40 @@ static void ixgbe_check_media_subtask(struct ixgbe_adapter *adapter)
 static void ixgbe_clear_vf_stats_counters(struct ixgbe_adapter *adapter)
 {
 	struct ixgbe_hw *hw = &adapter->hw;
+	struct vf_data_storage *vfinfo;
 	int i;
 
-	for (i = 0; i < adapter->num_vfs; i++) {
-		adapter->vfinfo[i].last_vfstats.gprc =
-			IXGBE_READ_REG(hw, IXGBE_PVFGPRC(i));
-		adapter->vfinfo[i].saved_rst_vfstats.gprc +=
-			adapter->vfinfo[i].vfstats.gprc;
-		adapter->vfinfo[i].vfstats.gprc = 0;
-		adapter->vfinfo[i].last_vfstats.gptc =
-			IXGBE_READ_REG(hw, IXGBE_PVFGPTC(i));
-		adapter->vfinfo[i].saved_rst_vfstats.gptc +=
-			adapter->vfinfo[i].vfstats.gptc;
-		adapter->vfinfo[i].vfstats.gptc = 0;
-		adapter->vfinfo[i].last_vfstats.gorc =
-			IXGBE_READ_REG(hw, IXGBE_PVFGORC_LSB(i));
-		adapter->vfinfo[i].saved_rst_vfstats.gorc +=
-			adapter->vfinfo[i].vfstats.gorc;
-		adapter->vfinfo[i].vfstats.gorc = 0;
-		adapter->vfinfo[i].last_vfstats.gotc =
-			IXGBE_READ_REG(hw, IXGBE_PVFGOTC_LSB(i));
-		adapter->vfinfo[i].saved_rst_vfstats.gotc +=
-			adapter->vfinfo[i].vfstats.gotc;
-		adapter->vfinfo[i].vfstats.gotc = 0;
-		adapter->vfinfo[i].last_vfstats.mprc =
-			IXGBE_READ_REG(hw, IXGBE_PVFMPRC(i));
-		adapter->vfinfo[i].saved_rst_vfstats.mprc +=
-			adapter->vfinfo[i].vfstats.mprc;
-		adapter->vfinfo[i].vfstats.mprc = 0;
-	}
+	rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (vfinfo)
+		for (i = 0; i < adapter->num_vfs; i++) {
+			vfinfo[i].last_vfstats.gprc =
+				IXGBE_READ_REG(hw, IXGBE_PVFGPRC(i));
+			vfinfo[i].saved_rst_vfstats.gprc +=
+				vfinfo[i].vfstats.gprc;
+			vfinfo[i].vfstats.gprc = 0;
+			vfinfo[i].last_vfstats.gptc =
+				IXGBE_READ_REG(hw, IXGBE_PVFGPTC(i));
+			vfinfo[i].saved_rst_vfstats.gptc +=
+				vfinfo[i].vfstats.gptc;
+			vfinfo[i].vfstats.gptc = 0;
+			vfinfo[i].last_vfstats.gorc =
+				IXGBE_READ_REG(hw, IXGBE_PVFGORC_LSB(i));
+			vfinfo[i].saved_rst_vfstats.gorc +=
+				vfinfo[i].vfstats.gorc;
+			vfinfo[i].vfstats.gorc = 0;
+			vfinfo[i].last_vfstats.gotc =
+				IXGBE_READ_REG(hw, IXGBE_PVFGOTC_LSB(i));
+			vfinfo[i].saved_rst_vfstats.gotc +=
+				vfinfo[i].vfstats.gotc;
+			vfinfo[i].vfstats.gotc = 0;
+			vfinfo[i].last_vfstats.mprc =
+				IXGBE_READ_REG(hw, IXGBE_PVFMPRC(i));
+			vfinfo[i].saved_rst_vfstats.mprc +=
+				vfinfo[i].vfstats.mprc;
+			vfinfo[i].vfstats.mprc = 0;
+		}
+	rcu_read_unlock();
 }
 
 static void ixgbe_setup_gpie(struct ixgbe_adapter *adapter)
@@ -6729,15 +6745,22 @@ void ixgbe_down(struct ixgbe_adapter *adapter)
 	timer_delete_sync(&adapter->service_timer);
 
 	if (adapter->num_vfs) {
+		struct vf_data_storage *vfinfo;
+
 		/* Clear EITR Select mapping */
 		IXGBE_WRITE_REG(&adapter->hw, IXGBE_EITRSEL, 0);
 
+		rcu_read_lock();
+		vfinfo = rcu_dereference(adapter->vfinfo);
 		/* Mark all the VFs as inactive */
-		for (i = 0 ; i < adapter->num_vfs; i++)
-			adapter->vfinfo[i].clear_to_send = false;
+		if (vfinfo) {
+			for (i = 0 ; i < adapter->num_vfs; i++)
+				vfinfo[i].clear_to_send = false;
 
-		/* update setting rx tx for all active vfs */
-		ixgbe_set_all_vfs(adapter);
+			/* update setting rx tx for all active vfs */
+			ixgbe_set_all_vfs(adapter);
+		}
+		rcu_read_unlock();
 	}
 
 	/* disable transmits in the hardware now that interrupts are off */
@@ -7001,9 +7024,6 @@ static int ixgbe_sw_init(struct ixgbe_adapter *adapter,
 	/* n-tuple support exists, always init our spinlock */
 	spin_lock_init(&adapter->fdir_perfect_lock);
 
-	/* init spinlock to avoid concurrency of VF resources */
-	spin_lock_init(&adapter->vfs_lock);
-
 #ifdef CONFIG_IXGBE_DCB
 	ixgbe_init_dcb(adapter);
 #endif
@@ -7905,25 +7925,31 @@ void ixgbe_update_stats(struct ixgbe_adapter *adapter)
 	 * crazy values.
 	 */
 	if (!test_bit(__IXGBE_RESETTING, &adapter->state)) {
-		for (i = 0; i < adapter->num_vfs; i++) {
-			UPDATE_VF_COUNTER_32bit(IXGBE_PVFGPRC(i),
-						adapter->vfinfo[i].last_vfstats.gprc,
-						adapter->vfinfo[i].vfstats.gprc);
-			UPDATE_VF_COUNTER_32bit(IXGBE_PVFGPTC(i),
-						adapter->vfinfo[i].last_vfstats.gptc,
-						adapter->vfinfo[i].vfstats.gptc);
-			UPDATE_VF_COUNTER_36bit(IXGBE_PVFGORC_LSB(i),
-						IXGBE_PVFGORC_MSB(i),
-						adapter->vfinfo[i].last_vfstats.gorc,
-						adapter->vfinfo[i].vfstats.gorc);
-			UPDATE_VF_COUNTER_36bit(IXGBE_PVFGOTC_LSB(i),
-						IXGBE_PVFGOTC_MSB(i),
-						adapter->vfinfo[i].last_vfstats.gotc,
-						adapter->vfinfo[i].vfstats.gotc);
-			UPDATE_VF_COUNTER_32bit(IXGBE_PVFMPRC(i),
-						adapter->vfinfo[i].last_vfstats.mprc,
-						adapter->vfinfo[i].vfstats.mprc);
-		}
+		struct vf_data_storage *vfinfo;
+
+		rcu_read_lock();
+		vfinfo = rcu_dereference(adapter->vfinfo);
+		if (vfinfo)
+			for (i = 0; i < adapter->num_vfs; i++) {
+				UPDATE_VF_COUNTER_32bit(IXGBE_PVFGPRC(i),
+							vfinfo[i].last_vfstats.gprc,
+							vfinfo[i].vfstats.gprc);
+				UPDATE_VF_COUNTER_32bit(IXGBE_PVFGPTC(i),
+							vfinfo[i].last_vfstats.gptc,
+							vfinfo[i].vfstats.gptc);
+				UPDATE_VF_COUNTER_36bit(IXGBE_PVFGORC_LSB(i),
+							IXGBE_PVFGORC_MSB(i),
+							vfinfo[i].last_vfstats.gorc,
+							vfinfo[i].vfstats.gorc);
+				UPDATE_VF_COUNTER_36bit(IXGBE_PVFGOTC_LSB(i),
+							IXGBE_PVFGOTC_MSB(i),
+							vfinfo[i].last_vfstats.gotc,
+							vfinfo[i].vfstats.gotc);
+				UPDATE_VF_COUNTER_32bit(IXGBE_PVFMPRC(i),
+							vfinfo[i].last_vfstats.mprc,
+							vfinfo[i].vfstats.mprc);
+			}
+		rcu_read_unlock();
 	}
 }
 
@@ -8267,22 +8293,27 @@ static void ixgbe_watchdog_flush_tx(struct ixgbe_adapter *adapter)
 static void ixgbe_bad_vf_abort(struct ixgbe_adapter *adapter, u32 vf)
 {
 	struct ixgbe_hw *hw = &adapter->hw;
+	struct vf_data_storage *vfinfo;
 
-	if (adapter->hw.mac.type == ixgbe_mac_82599EB &&
+	rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (vfinfo &&
+	    adapter->hw.mac.type == ixgbe_mac_82599EB &&
 	    adapter->flags2 & IXGBE_FLAG2_AUTO_DISABLE_VF) {
-		adapter->vfinfo[vf].primary_abort_count++;
-		if (adapter->vfinfo[vf].primary_abort_count ==
+		vfinfo[vf].primary_abort_count++;
+		if (vfinfo[vf].primary_abort_count ==
 		    IXGBE_PRIMARY_ABORT_LIMIT) {
 			ixgbe_set_vf_link_state(adapter, vf,
 						IFLA_VF_LINK_STATE_DISABLE);
-			adapter->vfinfo[vf].primary_abort_count = 0;
+			vfinfo[vf].primary_abort_count = 0;
 
 			e_info(drv,
 			       "Malicious Driver Detection event detected on PF %d VF %d MAC: %pM mdd-disable-vf=on",
 			       hw->bus.func, vf,
-			       adapter->vfinfo[vf].vf_mac_addresses);
+			       vfinfo[vf].vf_mac_addresses);
 		}
 	}
+	rcu_read_unlock();
 }
 
 static void ixgbe_check_for_bad_vf(struct ixgbe_adapter *adapter)
@@ -8309,9 +8340,15 @@ static void ixgbe_check_for_bad_vf(struct ixgbe_adapter *adapter)
 
 	/* check status reg for all VFs owned by this PF */
 	for (vf = 0; vf < adapter->num_vfs; ++vf) {
-		struct pci_dev *vfdev = adapter->vfinfo[vf].vfdev;
+		struct vf_data_storage *vfinfo;
+		struct pci_dev *vfdev = NULL;
 		u16 status_reg;
 
+		rcu_read_lock();
+		vfinfo = rcu_dereference(adapter->vfinfo);
+		if (vfinfo)
+			vfdev = vfinfo[vf].vfdev;
+		rcu_read_unlock();
 		if (!vfdev)
 			continue;
 		pci_read_config_word(vfdev, PCI_STATUS, &status_reg);
@@ -9744,15 +9781,21 @@ static int ixgbe_ndo_get_vf_stats(struct net_device *netdev, int vf,
 				  struct ifla_vf_stats *vf_stats)
 {
 	struct ixgbe_adapter *adapter = ixgbe_from_netdev(netdev);
+	struct vf_data_storage *vfinfo;
 
 	if (vf < 0 || vf >= adapter->num_vfs)
 		return -EINVAL;
 
-	vf_stats->rx_packets = adapter->vfinfo[vf].vfstats.gprc;
-	vf_stats->rx_bytes   = adapter->vfinfo[vf].vfstats.gorc;
-	vf_stats->tx_packets = adapter->vfinfo[vf].vfstats.gptc;
-	vf_stats->tx_bytes   = adapter->vfinfo[vf].vfstats.gotc;
-	vf_stats->multicast  = adapter->vfinfo[vf].vfstats.mprc;
+	rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (vfinfo) {
+		vf_stats->rx_packets = vfinfo[vf].vfstats.gprc;
+		vf_stats->rx_bytes   = vfinfo[vf].vfstats.gorc;
+		vf_stats->tx_packets = vfinfo[vf].vfstats.gptc;
+		vf_stats->tx_bytes   = vfinfo[vf].vfstats.gotc;
+		vf_stats->multicast  = vfinfo[vf].vfstats.mprc;
+	}
+	rcu_read_unlock();
 
 	return 0;
 }
@@ -10071,20 +10114,26 @@ static int handle_redirect_action(struct ixgbe_adapter *adapter, int ifindex,
 {
 	struct ixgbe_ring_feature *vmdq = &adapter->ring_feature[RING_F_VMDQ];
 	unsigned int num_vfs = adapter->num_vfs, vf;
+	struct vf_data_storage *vfinfo;
 	struct netdev_nested_priv priv;
 	struct upper_walk_data data;
 	struct net_device *upper;
 
 	/* redirect to a SRIOV VF */
-	for (vf = 0; vf < num_vfs; ++vf) {
-		upper = pci_get_drvdata(adapter->vfinfo[vf].vfdev);
-		if (upper->ifindex == ifindex) {
-			*queue = vf * __ALIGN_MASK(1, ~vmdq->mask);
-			*action = vf + 1;
-			*action <<= ETHTOOL_RX_FLOW_SPEC_RING_VF_OFF;
-			return 0;
+	rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (vfinfo)
+		for (vf = 0; vf < num_vfs; ++vf) {
+			upper = pci_get_drvdata(vfinfo[vf].vfdev);
+			if (upper->ifindex == ifindex) {
+				*queue = vf * __ALIGN_MASK(1, ~vmdq->mask);
+				*action = vf + 1;
+				*action <<= ETHTOOL_RX_FLOW_SPEC_RING_VF_OFF;
+				rcu_read_unlock();
+				return 0;
+			}
 		}
-	}
+	rcu_read_unlock();
 
 	/* redirect to a offloaded macvlan netdev */
 	data.adapter = adapter;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
index 431d77da15a5..80f22a8e7af4 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
@@ -44,7 +44,7 @@ static inline void ixgbe_alloc_vf_macvlans(struct ixgbe_adapter *adapter,
 			mv_list[i].free = true;
 			list_add(&mv_list[i].l, &adapter->vf_mvs.l);
 		}
-		adapter->mv_list = mv_list;
+		rcu_assign_pointer(adapter->mv_list, mv_list);
 	}
 }
 
@@ -52,6 +52,7 @@ static int __ixgbe_enable_sriov(struct ixgbe_adapter *adapter,
 				unsigned int num_vfs)
 {
 	struct ixgbe_hw *hw = &adapter->hw;
+	struct vf_data_storage *vfinfo;
 	int i;
 
 	if (adapter->xdp_prog) {
@@ -64,14 +65,11 @@ static int __ixgbe_enable_sriov(struct ixgbe_adapter *adapter,
 			  IXGBE_FLAG_VMDQ_ENABLED;
 
 	/* Allocate memory for per VF control structures */
-	adapter->vfinfo = kzalloc_objs(struct vf_data_storage, num_vfs);
-	if (!adapter->vfinfo)
+	vfinfo = kzalloc_objs(struct vf_data_storage, num_vfs);
+	if (!vfinfo)
 		return -ENOMEM;
 
-	adapter->num_vfs = num_vfs;
-
 	ixgbe_alloc_vf_macvlans(adapter, num_vfs);
-	adapter->ring_feature[RING_F_VMDQ].offset = num_vfs;
 
 	/* Initialize default switching mode VEB */
 	IXGBE_WRITE_REG(hw, IXGBE_PFDTXGSWC, IXGBE_PFDTXGSWC_VT_LBEN);
@@ -95,23 +93,27 @@ static int __ixgbe_enable_sriov(struct ixgbe_adapter *adapter,
 
 	for (i = 0; i < num_vfs; i++) {
 		/* enable spoof checking for all VFs */
-		adapter->vfinfo[i].spoofchk_enabled = true;
-		adapter->vfinfo[i].link_enable = true;
+		vfinfo[i].spoofchk_enabled = true;
+		vfinfo[i].link_enable = true;
 
 		/* We support VF RSS querying only for 82599 and x540
 		 * devices at the moment. These devices share RSS
 		 * indirection table and RSS hash key with PF therefore
 		 * we want to disable the querying by default.
 		 */
-		adapter->vfinfo[i].rss_query_enabled = false;
+		vfinfo[i].rss_query_enabled = false;
 
 		/* Untrust all VFs */
-		adapter->vfinfo[i].trusted = false;
+		vfinfo[i].trusted = false;
 
 		/* set the default xcast mode */
-		adapter->vfinfo[i].xcast_mode = IXGBEVF_XCAST_MODE_NONE;
+		vfinfo[i].xcast_mode = IXGBEVF_XCAST_MODE_NONE;
 	}
 
+	rcu_assign_pointer(adapter->vfinfo, vfinfo);
+	adapter->num_vfs = num_vfs;
+	adapter->ring_feature[RING_F_VMDQ].offset = num_vfs;
+
 	e_info(probe, "SR-IOV enabled with %d VFs\n", num_vfs);
 	return 0;
 }
@@ -123,6 +125,7 @@ static int __ixgbe_enable_sriov(struct ixgbe_adapter *adapter,
 static void ixgbe_get_vfs(struct ixgbe_adapter *adapter)
 {
 	struct pci_dev *pdev = adapter->pdev;
+	struct vf_data_storage *vfinfo;
 	u16 vendor = pdev->vendor;
 	struct pci_dev *vfdev;
 	int vf = 0;
@@ -134,18 +137,23 @@ static void ixgbe_get_vfs(struct ixgbe_adapter *adapter)
 		return;
 	pci_read_config_word(pdev, pos + PCI_SRIOV_VF_DID, &vf_id);
 
-	vfdev = pci_get_device(vendor, vf_id, NULL);
-	for (; vfdev; vfdev = pci_get_device(vendor, vf_id, vfdev)) {
-		if (!vfdev->is_virtfn)
-			continue;
-		if (vfdev->physfn != pdev)
-			continue;
-		if (vf >= adapter->num_vfs)
-			continue;
-		pci_dev_get(vfdev);
-		adapter->vfinfo[vf].vfdev = vfdev;
-		++vf;
+	rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (vfinfo) {
+		vfdev = pci_get_device(vendor, vf_id, NULL);
+		for (; vfdev; vfdev = pci_get_device(vendor, vf_id, vfdev)) {
+			if (!vfdev->is_virtfn)
+				continue;
+			if (vfdev->physfn != pdev)
+				continue;
+			if (vf >= adapter->num_vfs)
+				continue;
+			pci_dev_get(vfdev);
+			vfinfo[vf].vfdev = vfdev;
+			++vf;
+		}
 	}
+	rcu_read_unlock();
 }
 
 /* Note this function is called when the user wants to enable SR-IOV
@@ -206,31 +214,28 @@ int ixgbe_disable_sriov(struct ixgbe_adapter *adapter)
 {
 	unsigned int num_vfs = adapter->num_vfs, vf;
 	struct ixgbe_hw *hw = &adapter->hw;
-	unsigned long flags;
+	struct vf_data_storage *vfinfo;
+	struct vf_macvlans *mv_list;
 	int rss;
 
-	spin_lock_irqsave(&adapter->vfs_lock, flags);
-	/* set num VFs to 0 to prevent access to vfinfo */
+	/* set num VFs to 0 so readers bail out early */
 	adapter->num_vfs = 0;
-	spin_unlock_irqrestore(&adapter->vfs_lock, flags);
+
+	vfinfo = rcu_replace_pointer(adapter->vfinfo, NULL, 1);
+	mv_list = rcu_replace_pointer(adapter->mv_list, NULL, 1);
 
 	/* put the reference to all of the vf devices */
 	for (vf = 0; vf < num_vfs; ++vf) {
-		struct pci_dev *vfdev = adapter->vfinfo[vf].vfdev;
+		struct pci_dev *vfdev = vfinfo[vf].vfdev;
 
 		if (!vfdev)
 			continue;
-		adapter->vfinfo[vf].vfdev = NULL;
+		vfinfo[vf].vfdev = NULL;
 		pci_dev_put(vfdev);
 	}
 
-	/* free VF control structures */
-	kfree(adapter->vfinfo);
-	adapter->vfinfo = NULL;
-
-	/* free macvlan list */
-	kfree(adapter->mv_list);
-	adapter->mv_list = NULL;
+	kfree_rcu(vfinfo, rcu_head);
+	kfree_rcu(mv_list, rcu_head);
 
 	/* if SR-IOV is already disabled then there is nothing to do */
 	if (!(adapter->flags & IXGBE_FLAG_SRIOV_ENABLED))
@@ -368,8 +373,8 @@ static int ixgbe_set_vf_multicasts(struct ixgbe_adapter *adapter,
 {
 	int entries = FIELD_GET(IXGBE_VT_MSGINFO_MASK, msgbuf[0]);
 	u16 *hash_list = (u16 *)&msgbuf[1];
-	struct vf_data_storage *vfinfo = &adapter->vfinfo[vf];
 	struct ixgbe_hw *hw = &adapter->hw;
+	struct vf_data_storage *vfinfo;
 	int i;
 	u32 vector_bit;
 	u32 vector_reg;
@@ -379,28 +384,34 @@ static int ixgbe_set_vf_multicasts(struct ixgbe_adapter *adapter,
 	/* only so many hash values supported */
 	entries = min(entries, IXGBE_MAX_VF_MC_ENTRIES);
 
+	lockdep_assert_in_rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (!vfinfo)
+		return 0;
+
 	/*
 	 * salt away the number of multi cast addresses assigned
 	 * to this VF for later use to restore when the PF multi cast
 	 * list changes
 	 */
-	vfinfo->num_vf_mc_hashes = entries;
+	vfinfo[vf].num_vf_mc_hashes = entries;
 
 	/*
 	 * VFs are limited to using the MTA hash table for their multicast
 	 * addresses
 	 */
 	for (i = 0; i < entries; i++) {
-		vfinfo->vf_mc_hashes[i] = hash_list[i];
+		vfinfo[vf].vf_mc_hashes[i] = hash_list[i];
 	}
 
-	for (i = 0; i < vfinfo->num_vf_mc_hashes; i++) {
-		vector_reg = (vfinfo->vf_mc_hashes[i] >> 5) & 0x7F;
-		vector_bit = vfinfo->vf_mc_hashes[i] & 0x1F;
+	for (i = 0; i < vfinfo[vf].num_vf_mc_hashes; i++) {
+		vector_reg = (vfinfo[vf].vf_mc_hashes[i] >> 5) & 0x7F;
+		vector_bit = vfinfo[vf].vf_mc_hashes[i] & 0x1F;
 		mta_reg = IXGBE_READ_REG(hw, IXGBE_MTA(vector_reg));
 		mta_reg |= BIT(vector_bit);
 		IXGBE_WRITE_REG(hw, IXGBE_MTA(vector_reg), mta_reg);
 	}
+
 	vmolr |= IXGBE_VMOLR_ROMPE;
 	IXGBE_WRITE_REG(hw, IXGBE_VMOLR(vf), vmolr);
 
@@ -410,32 +421,39 @@ static int ixgbe_set_vf_multicasts(struct ixgbe_adapter *adapter,
 #ifdef CONFIG_PCI_IOV
 void ixgbe_restore_vf_multicasts(struct ixgbe_adapter *adapter)
 {
-	struct ixgbe_hw *hw = &adapter->hw;
 	struct vf_data_storage *vfinfo;
+	struct ixgbe_hw *hw = &adapter->hw;
 	int i, j;
 	u32 vector_bit;
 	u32 vector_reg;
 	u32 mta_reg;
 
+	rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (!vfinfo)
+		goto no_vfs;
+
 	for (i = 0; i < adapter->num_vfs; i++) {
 		u32 vmolr = IXGBE_READ_REG(hw, IXGBE_VMOLR(i));
-		vfinfo = &adapter->vfinfo[i];
-		for (j = 0; j < vfinfo->num_vf_mc_hashes; j++) {
+		for (j = 0; j < vfinfo[i].num_vf_mc_hashes; j++) {
 			hw->addr_ctrl.mta_in_use++;
-			vector_reg = (vfinfo->vf_mc_hashes[j] >> 5) & 0x7F;
-			vector_bit = vfinfo->vf_mc_hashes[j] & 0x1F;
+			vector_reg = (vfinfo[i].vf_mc_hashes[j] >> 5) & 0x7F;
+			vector_bit = vfinfo[i].vf_mc_hashes[j] & 0x1F;
 			mta_reg = IXGBE_READ_REG(hw, IXGBE_MTA(vector_reg));
 			mta_reg |= BIT(vector_bit);
 			IXGBE_WRITE_REG(hw, IXGBE_MTA(vector_reg), mta_reg);
 		}
 
-		if (vfinfo->num_vf_mc_hashes)
+		if (vfinfo[i].num_vf_mc_hashes)
 			vmolr |= IXGBE_VMOLR_ROMPE;
 		else
 			vmolr &= ~IXGBE_VMOLR_ROMPE;
 		IXGBE_WRITE_REG(hw, IXGBE_VMOLR(i), vmolr);
 	}
 
+no_vfs:
+	rcu_read_unlock();
+
 	/* Restore any VF macvlans */
 	ixgbe_full_sync_mac_table(adapter);
 }
@@ -493,7 +511,9 @@ static int ixgbe_set_vf_lpe(struct ixgbe_adapter *adapter, u32 max_frame, u32 vf
 	 */
 	if (adapter->hw.mac.type == ixgbe_mac_82599EB) {
 		struct net_device *dev = adapter->netdev;
+		unsigned int vf_api = ixgbe_mbox_api_10;
 		int pf_max_frame = dev->mtu + ETH_HLEN;
+		struct vf_data_storage *vfinfo;
 		u32 reg_offset, vf_shift, vfre;
 		int err = 0;
 
@@ -503,7 +523,12 @@ static int ixgbe_set_vf_lpe(struct ixgbe_adapter *adapter, u32 max_frame, u32 vf
 					     IXGBE_FCOE_JUMBO_FRAME_SIZE);
 
 #endif /* CONFIG_FCOE */
-		switch (adapter->vfinfo[vf].vf_api) {
+		lockdep_assert_in_rcu_read_lock();
+		vfinfo = rcu_dereference(adapter->vfinfo);
+		if (vfinfo)
+			vf_api = vfinfo[vf].vf_api;
+
+		switch (vf_api) {
 		case ixgbe_mbox_api_11:
 		case ixgbe_mbox_api_12:
 		case ixgbe_mbox_api_13:
@@ -643,10 +668,16 @@ static void ixgbe_clear_vf_vlans(struct ixgbe_adapter *adapter, u32 vf)
 static int ixgbe_set_vf_macvlan(struct ixgbe_adapter *adapter,
 				int vf, int index, unsigned char *mac_addr)
 {
-	struct vf_macvlans *entry;
+	struct vf_macvlans *mv_list, *entry;
 	bool found = false;
 	int retval = 0;
 
+	lockdep_assert_in_rcu_read_lock();
+	/* vf_mvs entries point into the mv_list array */
+	mv_list = rcu_dereference(adapter->mv_list);
+	if (!mv_list)
+		return 0;
+
 	if (index <= 1) {
 		list_for_each_entry(entry, &adapter->vf_mvs.l, l) {
 			if (entry->vf == vf) {
@@ -700,7 +731,7 @@ static inline void ixgbe_vf_reset_event(struct ixgbe_adapter *adapter, u32 vf)
 {
 	struct ixgbe_hw *hw = &adapter->hw;
 	struct ixgbe_ring_feature *vmdq = &adapter->ring_feature[RING_F_VMDQ];
-	struct vf_data_storage *vfinfo = &adapter->vfinfo[vf];
+	struct vf_data_storage *vfinfo;
 	u32 q_per_pool = __ALIGN_MASK(1, ~vmdq->mask);
 	u8 num_tcs = adapter->hw_tcs;
 	u32 reg_val;
@@ -709,31 +740,36 @@ static inline void ixgbe_vf_reset_event(struct ixgbe_adapter *adapter, u32 vf)
 	/* remove VLAN filters belonging to this VF */
 	ixgbe_clear_vf_vlans(adapter, vf);
 
+	lockdep_assert_in_rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (!vfinfo)
+		return;
+
 	/* add back PF assigned VLAN or VLAN 0 */
-	ixgbe_set_vf_vlan(adapter, true, vfinfo->pf_vlan, vf);
+	ixgbe_set_vf_vlan(adapter, true, vfinfo[vf].pf_vlan, vf);
 
 	/* reset offloads to defaults */
-	ixgbe_set_vmolr(hw, vf, !vfinfo->pf_vlan);
+	ixgbe_set_vmolr(hw, vf, !vfinfo[vf].pf_vlan);
 
 	/* set outgoing tags for VFs */
-	if (!vfinfo->pf_vlan && !vfinfo->pf_qos && !num_tcs) {
+	if (!vfinfo[vf].pf_vlan && !vfinfo[vf].pf_qos && !num_tcs) {
 		ixgbe_clear_vmvir(adapter, vf);
 	} else {
-		if (vfinfo->pf_qos || !num_tcs)
-			ixgbe_set_vmvir(adapter, vfinfo->pf_vlan,
-					vfinfo->pf_qos, vf);
+		if (vfinfo[vf].pf_qos || !num_tcs)
+			ixgbe_set_vmvir(adapter, vfinfo[vf].pf_vlan,
+					vfinfo[vf].pf_qos, vf);
 		else
-			ixgbe_set_vmvir(adapter, vfinfo->pf_vlan,
+			ixgbe_set_vmvir(adapter, vfinfo[vf].pf_vlan,
 					adapter->default_up, vf);
 
-		if (vfinfo->spoofchk_enabled) {
+		if (vfinfo[vf].spoofchk_enabled) {
 			hw->mac.ops.set_vlan_anti_spoofing(hw, true, vf);
 			hw->mac.ops.set_mac_anti_spoofing(hw, true, vf);
 		}
 	}
 
 	/* reset multicast table array for vf */
-	adapter->vfinfo[vf].num_vf_mc_hashes = 0;
+	vfinfo[vf].num_vf_mc_hashes = 0;
 
 	/* clear any ipsec table info */
 	ixgbe_ipsec_vf_clear(adapter, vf);
@@ -741,11 +777,11 @@ static inline void ixgbe_vf_reset_event(struct ixgbe_adapter *adapter, u32 vf)
 	/* Flush and reset the mta with the new values */
 	ixgbe_set_rx_mode(adapter->netdev);
 
-	ixgbe_del_mac_filter(adapter, adapter->vfinfo[vf].vf_mac_addresses, vf);
+	ixgbe_del_mac_filter(adapter, vfinfo[vf].vf_mac_addresses, vf);
 	ixgbe_set_vf_macvlan(adapter, vf, 0, NULL);
 
 	/* reset VF api back to unknown */
-	adapter->vfinfo[vf].vf_api = ixgbe_mbox_api_10;
+	vfinfo[vf].vf_api = ixgbe_mbox_api_10;
 
 	/* Restart each queue for given VF */
 	for (queue = 0; queue < q_per_pool; queue++) {
@@ -780,16 +816,25 @@ static void ixgbe_vf_clear_mbx(struct ixgbe_adapter *adapter, u32 vf)
 static int ixgbe_set_vf_mac(struct ixgbe_adapter *adapter,
 			    int vf, unsigned char *mac_addr)
 {
+	struct vf_data_storage *vfinfo;
 	int retval;
 
-	ixgbe_del_mac_filter(adapter, adapter->vfinfo[vf].vf_mac_addresses, vf);
+	rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (!vfinfo) {
+		rcu_read_unlock();
+		return -EINVAL;
+	}
+
+	ixgbe_del_mac_filter(adapter, vfinfo[vf].vf_mac_addresses, vf);
 	retval = ixgbe_add_mac_filter(adapter, mac_addr, vf);
 	if (retval >= 0)
-		memcpy(adapter->vfinfo[vf].vf_mac_addresses, mac_addr,
+		memcpy(vfinfo[vf].vf_mac_addresses, mac_addr,
 		       ETH_ALEN);
 	else
-		eth_zero_addr(adapter->vfinfo[vf].vf_mac_addresses);
+		eth_zero_addr(vfinfo[vf].vf_mac_addresses);
 
+	rcu_read_unlock();
 	return retval;
 }
 
@@ -797,12 +842,17 @@ int ixgbe_vf_configuration(struct pci_dev *pdev, unsigned int event_mask)
 {
 	struct ixgbe_adapter *adapter = pci_get_drvdata(pdev);
 	unsigned int vfn = (event_mask & 0x3f);
+	struct vf_data_storage *vfinfo;
 
 	bool enable = ((event_mask & 0x10000000U) != 0);
 
-	if (enable)
-		eth_zero_addr(adapter->vfinfo[vfn].vf_mac_addresses);
-
+	if (enable) {
+		rcu_read_lock();
+		vfinfo = rcu_dereference(adapter->vfinfo);
+		if (vfinfo)
+			eth_zero_addr(vfinfo[vfn].vf_mac_addresses);
+		rcu_read_unlock();
+	}
 	return 0;
 }
 
@@ -838,6 +888,7 @@ static void ixgbe_set_vf_rx_tx(struct ixgbe_adapter *adapter, int vf)
 {
 	u32 reg_cur_tx, reg_cur_rx, reg_req_tx, reg_req_rx;
 	struct ixgbe_hw *hw = &adapter->hw;
+	struct vf_data_storage *vfinfo;
 	u32 reg_offset, vf_shift;
 
 	vf_shift = vf % 32;
@@ -846,7 +897,9 @@ static void ixgbe_set_vf_rx_tx(struct ixgbe_adapter *adapter, int vf)
 	reg_cur_tx = IXGBE_READ_REG(hw, IXGBE_VFTE(reg_offset));
 	reg_cur_rx = IXGBE_READ_REG(hw, IXGBE_VFRE(reg_offset));
 
-	if (adapter->vfinfo[vf].link_enable) {
+	lockdep_assert_in_rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (vfinfo && vfinfo[vf].link_enable) {
 		reg_req_tx = reg_cur_tx | 1 << vf_shift;
 		reg_req_rx = reg_cur_rx | 1 << vf_shift;
 	} else {
@@ -882,11 +935,12 @@ static int ixgbe_vf_reset_msg(struct ixgbe_adapter *adapter, u32 vf)
 {
 	struct ixgbe_ring_feature *vmdq = &adapter->ring_feature[RING_F_VMDQ];
 	struct ixgbe_hw *hw = &adapter->hw;
-	unsigned char *vf_mac = adapter->vfinfo[vf].vf_mac_addresses;
+	struct vf_data_storage *vfinfo;
 	u32 reg, reg_offset, vf_shift;
 	u32 msgbuf[4] = {0, 0, 0, 0};
 	u8 *addr = (u8 *)(&msgbuf[1]);
 	u32 q_per_pool = __ALIGN_MASK(1, ~vmdq->mask);
+	unsigned char *vf_mac;
 	int i;
 
 	e_info(probe, "VF Reset msg received from vf %d\n", vf);
@@ -896,6 +950,13 @@ static int ixgbe_vf_reset_msg(struct ixgbe_adapter *adapter, u32 vf)
 
 	ixgbe_vf_clear_mbx(adapter, vf);
 
+	lockdep_assert_in_rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (!vfinfo)
+		return 0;
+
+	vf_mac = vfinfo[vf].vf_mac_addresses;
+
 	/* set vf mac address */
 	if (!is_zero_ether_addr(vf_mac))
 		ixgbe_set_vf_mac(adapter, vf, vf_mac);
@@ -905,7 +966,7 @@ static int ixgbe_vf_reset_msg(struct ixgbe_adapter *adapter, u32 vf)
 
 	/* force drop enable for all VF Rx queues */
 	reg = IXGBE_QDE_ENABLE;
-	if (adapter->vfinfo[vf].pf_vlan)
+	if (vfinfo[vf].pf_vlan)
 		reg |= IXGBE_QDE_HIDE_VLAN;
 
 	ixgbe_write_qde(adapter, vf, reg);
@@ -913,7 +974,7 @@ static int ixgbe_vf_reset_msg(struct ixgbe_adapter *adapter, u32 vf)
 	ixgbe_set_vf_rx_tx(adapter, vf);
 
 	/* enable VF mailbox for further messages */
-	adapter->vfinfo[vf].clear_to_send = true;
+	vfinfo[vf].clear_to_send = true;
 
 	/* Enable counting of spoofed packets in the SSVPC register */
 	reg = IXGBE_READ_REG(hw, IXGBE_VMECM(reg_offset));
@@ -931,7 +992,7 @@ static int ixgbe_vf_reset_msg(struct ixgbe_adapter *adapter, u32 vf)
 
 	/* reply to reset with ack and vf mac address */
 	msgbuf[0] = IXGBE_VF_RESET;
-	if (!is_zero_ether_addr(vf_mac) && adapter->vfinfo[vf].pf_set_mac) {
+	if (!is_zero_ether_addr(vf_mac) && vfinfo[vf].pf_set_mac) {
 		msgbuf[0] |= IXGBE_VT_MSGTYPE_ACK;
 		memcpy(addr, vf_mac, ETH_ALEN);
 	} else {
@@ -952,14 +1013,20 @@ static int ixgbe_set_vf_mac_addr(struct ixgbe_adapter *adapter,
 				 u32 *msgbuf, u32 vf)
 {
 	u8 *new_mac = ((u8 *)(&msgbuf[1]));
+	struct vf_data_storage *vfinfo;
 
 	if (!is_valid_ether_addr(new_mac)) {
 		e_warn(drv, "VF %d attempted to set invalid mac\n", vf);
 		return -1;
 	}
 
-	if (adapter->vfinfo[vf].pf_set_mac && !adapter->vfinfo[vf].trusted &&
-	    !ether_addr_equal(adapter->vfinfo[vf].vf_mac_addresses, new_mac)) {
+	lockdep_assert_in_rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (!vfinfo)
+		return 0;
+
+	if (vfinfo[vf].pf_set_mac && !vfinfo[vf].trusted &&
+	    !ether_addr_equal(vfinfo[vf].vf_mac_addresses, new_mac)) {
 		e_warn(drv,
 		       "VF %d attempted to override administratively set MAC address\n"
 		       "Reload the VF driver to resume operations\n",
@@ -975,9 +1042,15 @@ static int ixgbe_set_vf_vlan_msg(struct ixgbe_adapter *adapter,
 {
 	u32 add = FIELD_GET(IXGBE_VT_MSGINFO_MASK, msgbuf[0]);
 	u32 vid = (msgbuf[1] & IXGBE_VLVF_VLANID_MASK);
+	struct vf_data_storage *vfinfo;
 	u8 tcs = adapter->hw_tcs;
 
-	if (adapter->vfinfo[vf].pf_vlan || tcs) {
+	lockdep_assert_in_rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (!vfinfo)
+		return 0;
+
+	if (vfinfo[vf].pf_vlan || tcs) {
 		e_warn(drv,
 		       "VF %d attempted to override administratively set VLAN configuration\n"
 		       "Reload the VF driver to resume operations\n",
@@ -997,9 +1070,15 @@ static int ixgbe_set_vf_macvlan_msg(struct ixgbe_adapter *adapter,
 {
 	u8 *new_mac = ((u8 *)(&msgbuf[1]));
 	int index = FIELD_GET(IXGBE_VT_MSGINFO_MASK, msgbuf[0]);
+	struct vf_data_storage *vfinfo;
 	int err;
 
-	if (adapter->vfinfo[vf].pf_set_mac && !adapter->vfinfo[vf].trusted &&
+	lockdep_assert_in_rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (!vfinfo)
+		return 0;
+
+	if (vfinfo[vf].pf_set_mac && !vfinfo[vf].trusted &&
 	    index > 0) {
 		e_warn(drv,
 		       "VF %d requested MACVLAN filter but is administratively denied\n",
@@ -1018,7 +1097,7 @@ static int ixgbe_set_vf_macvlan_msg(struct ixgbe_adapter *adapter,
 		 * If the VF is allowed to set MAC filters then turn off
 		 * anti-spoofing to avoid false positives.
 		 */
-		if (adapter->vfinfo[vf].spoofchk_enabled) {
+		if (vfinfo[vf].spoofchk_enabled) {
 			struct ixgbe_hw *hw = &adapter->hw;
 
 			hw->mac.ops.set_mac_anti_spoofing(hw, false, vf);
@@ -1038,6 +1117,7 @@ static int ixgbe_set_vf_macvlan_msg(struct ixgbe_adapter *adapter,
 static int ixgbe_negotiate_vf_api(struct ixgbe_adapter *adapter,
 				  u32 *msgbuf, u32 vf)
 {
+	struct vf_data_storage *vfinfo;
 	int api = msgbuf[1];
 
 	switch (api) {
@@ -1048,7 +1128,10 @@ static int ixgbe_negotiate_vf_api(struct ixgbe_adapter *adapter,
 	case ixgbe_mbox_api_14:
 	case ixgbe_mbox_api_16:
 	case ixgbe_mbox_api_17:
-		adapter->vfinfo[vf].vf_api = api;
+		lockdep_assert_in_rcu_read_lock();
+		vfinfo = rcu_dereference(adapter->vfinfo);
+		if (vfinfo)
+			vfinfo[vf].vf_api = api;
 		return 0;
 	default:
 		break;
@@ -1064,11 +1147,17 @@ static int ixgbe_get_vf_queues(struct ixgbe_adapter *adapter,
 {
 	struct net_device *dev = adapter->netdev;
 	struct ixgbe_ring_feature *vmdq = &adapter->ring_feature[RING_F_VMDQ];
+	struct vf_data_storage *vfinfo;
 	unsigned int default_tc = 0;
 	u8 num_tcs = adapter->hw_tcs;
 
+	lockdep_assert_in_rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (!vfinfo)
+		return 0;
+
 	/* verify the PF is supporting the correct APIs */
-	switch (adapter->vfinfo[vf].vf_api) {
+	switch (vfinfo[vf].vf_api) {
 	case ixgbe_mbox_api_20:
 	case ixgbe_mbox_api_11:
 	case ixgbe_mbox_api_12:
@@ -1092,7 +1181,7 @@ static int ixgbe_get_vf_queues(struct ixgbe_adapter *adapter,
 	/* notify VF of need for VLAN tag stripping, and correct queue */
 	if (num_tcs)
 		msgbuf[IXGBE_VF_TRANS_VLAN] = num_tcs;
-	else if (adapter->vfinfo[vf].pf_vlan || adapter->vfinfo[vf].pf_qos)
+	else if (vfinfo[vf].pf_vlan || vfinfo[vf].pf_qos)
 		msgbuf[IXGBE_VF_TRANS_VLAN] = 1;
 	else
 		msgbuf[IXGBE_VF_TRANS_VLAN] = 0;
@@ -1105,17 +1194,23 @@ static int ixgbe_get_vf_queues(struct ixgbe_adapter *adapter,
 
 static int ixgbe_get_vf_reta(struct ixgbe_adapter *adapter, u32 *msgbuf, u32 vf)
 {
-	u32 i, j;
-	u32 *out_buf = &msgbuf[1];
-	const u8 *reta = adapter->rss_indir_tbl;
 	u32 reta_size = ixgbe_rss_indir_tbl_entries(adapter);
+	const u8 *reta = adapter->rss_indir_tbl;
+	struct vf_data_storage *vfinfo;
+	u32 *out_buf = &msgbuf[1];
+	u32 i, j;
+
+	lockdep_assert_in_rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (!vfinfo)
+		return 0;
 
 	/* Check if operation is permitted */
-	if (!adapter->vfinfo[vf].rss_query_enabled)
+	if (!vfinfo[vf].rss_query_enabled)
 		return -EPERM;
 
 	/* verify the PF is supporting the correct API */
-	switch (adapter->vfinfo[vf].vf_api) {
+	switch (vfinfo[vf].vf_api) {
 	case ixgbe_mbox_api_17:
 	case ixgbe_mbox_api_16:
 	case ixgbe_mbox_api_14:
@@ -1143,14 +1238,20 @@ static int ixgbe_get_vf_reta(struct ixgbe_adapter *adapter, u32 *msgbuf, u32 vf)
 static int ixgbe_get_vf_rss_key(struct ixgbe_adapter *adapter,
 				u32 *msgbuf, u32 vf)
 {
+	struct vf_data_storage *vfinfo;
 	u32 *rss_key = &msgbuf[1];
 
+	lockdep_assert_in_rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (!vfinfo)
+		return 0;
+
 	/* Check if the operation is permitted */
-	if (!adapter->vfinfo[vf].rss_query_enabled)
+	if (!vfinfo[vf].rss_query_enabled)
 		return -EPERM;
 
 	/* verify the PF is supporting the correct API */
-	switch (adapter->vfinfo[vf].vf_api) {
+	switch (vfinfo[vf].vf_api) {
 	case ixgbe_mbox_api_17:
 	case ixgbe_mbox_api_16:
 	case ixgbe_mbox_api_14:
@@ -1170,11 +1271,17 @@ static int ixgbe_update_vf_xcast_mode(struct ixgbe_adapter *adapter,
 				      u32 *msgbuf, u32 vf)
 {
 	struct ixgbe_hw *hw = &adapter->hw;
+	struct vf_data_storage *vfinfo;
 	int xcast_mode = msgbuf[1];
 	u32 vmolr, fctrl, disable, enable;
 
+	lockdep_assert_in_rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (!vfinfo)
+		return 0;
+
 	/* verify the PF is supporting the correct APIs */
-	switch (adapter->vfinfo[vf].vf_api) {
+	switch (vfinfo[vf].vf_api) {
 	case ixgbe_mbox_api_12:
 		/* promisc introduced in 1.3 version */
 		if (xcast_mode == IXGBEVF_XCAST_MODE_PROMISC)
@@ -1190,11 +1297,11 @@ static int ixgbe_update_vf_xcast_mode(struct ixgbe_adapter *adapter,
 	}
 
 	if (xcast_mode > IXGBEVF_XCAST_MODE_MULTI &&
-	    !adapter->vfinfo[vf].trusted) {
+	    !vfinfo[vf].trusted) {
 		xcast_mode = IXGBEVF_XCAST_MODE_MULTI;
 	}
 
-	if (adapter->vfinfo[vf].xcast_mode == xcast_mode)
+	if (vfinfo[vf].xcast_mode == xcast_mode)
 		goto out;
 
 	switch (xcast_mode) {
@@ -1236,7 +1343,7 @@ static int ixgbe_update_vf_xcast_mode(struct ixgbe_adapter *adapter,
 	vmolr |= enable;
 	IXGBE_WRITE_REG(hw, IXGBE_VMOLR(vf), vmolr);
 
-	adapter->vfinfo[vf].xcast_mode = xcast_mode;
+	vfinfo[vf].xcast_mode = xcast_mode;
 
 out:
 	msgbuf[1] = xcast_mode;
@@ -1247,10 +1354,16 @@ static int ixgbe_update_vf_xcast_mode(struct ixgbe_adapter *adapter,
 static int ixgbe_get_vf_link_state(struct ixgbe_adapter *adapter,
 				   u32 *msgbuf, u32 vf)
 {
+	struct vf_data_storage *vfinfo;
 	u32 *link_state = &msgbuf[1];
 
+	lockdep_assert_in_rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (!vfinfo)
+		return 0;
+
 	/* verify the PF is supporting the correct API */
-	switch (adapter->vfinfo[vf].vf_api) {
+	switch (vfinfo[vf].vf_api) {
 	case ixgbe_mbox_api_12:
 	case ixgbe_mbox_api_13:
 	case ixgbe_mbox_api_14:
@@ -1261,7 +1374,7 @@ static int ixgbe_get_vf_link_state(struct ixgbe_adapter *adapter,
 		return -EOPNOTSUPP;
 	}
 
-	*link_state = adapter->vfinfo[vf].link_enable;
+	*link_state = vfinfo[vf].link_enable;
 
 	return 0;
 }
@@ -1280,8 +1393,14 @@ static int ixgbe_send_vf_link_status(struct ixgbe_adapter *adapter,
 				     u32 *msgbuf, u32 vf)
 {
 	struct ixgbe_hw *hw = &adapter->hw;
+	struct vf_data_storage *vfinfo;
+
+	lockdep_assert_in_rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (!vfinfo)
+		return 0;
 
-	switch (adapter->vfinfo[vf].vf_api) {
+	switch (vfinfo[vf].vf_api) {
 	case ixgbe_mbox_api_16:
 	case ixgbe_mbox_api_17:
 		if (hw->mac.type != ixgbe_mac_e610)
@@ -1310,9 +1429,15 @@ static int ixgbe_send_vf_link_status(struct ixgbe_adapter *adapter,
 static int ixgbe_negotiate_vf_features(struct ixgbe_adapter *adapter,
 				       u32 *msgbuf, u32 vf)
 {
+	struct vf_data_storage *vfinfo;
 	u32 features = msgbuf[1];
 
-	switch (adapter->vfinfo[vf].vf_api) {
+	lockdep_assert_in_rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (!vfinfo)
+		return 0;
+
+	switch (vfinfo[vf].vf_api) {
 	case ixgbe_mbox_api_17:
 		break;
 	default:
@@ -1330,6 +1455,7 @@ static int ixgbe_rcv_msg_from_vf(struct ixgbe_adapter *adapter, u32 vf)
 	u32 mbx_size = IXGBE_VFMAILBOX_SIZE;
 	u32 msgbuf[IXGBE_VFMAILBOX_SIZE];
 	struct ixgbe_hw *hw = &adapter->hw;
+	struct vf_data_storage *vfinfo;
 	int retval;
 
 	retval = ixgbe_read_mbx(hw, msgbuf, mbx_size, vf);
@@ -1349,11 +1475,16 @@ static int ixgbe_rcv_msg_from_vf(struct ixgbe_adapter *adapter, u32 vf)
 	if (msgbuf[0] == IXGBE_VF_RESET)
 		return ixgbe_vf_reset_msg(adapter, vf);
 
+	lockdep_assert_in_rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (!vfinfo)
+		return 0;
+
 	/*
 	 * until the vf completes a virtual function reset it should not be
 	 * allowed to start any configuration.
 	 */
-	if (!adapter->vfinfo[vf].clear_to_send) {
+	if (!vfinfo[vf].clear_to_send) {
 		msgbuf[0] |= IXGBE_VT_MSGTYPE_NACK;
 		ixgbe_write_mbx(hw, msgbuf, 1, vf);
 		return 0;
@@ -1426,11 +1557,12 @@ static int ixgbe_rcv_msg_from_vf(struct ixgbe_adapter *adapter, u32 vf)
 
 static void ixgbe_rcv_ack_from_vf(struct ixgbe_adapter *adapter, u32 vf)
 {
+	struct vf_data_storage *vfinfo = rcu_dereference(adapter->vfinfo);
 	struct ixgbe_hw *hw = &adapter->hw;
 	u32 msg = IXGBE_VT_MSGTYPE_NACK;
 
 	/* if device isn't clear to send it shouldn't be reading either */
-	if (!adapter->vfinfo[vf].clear_to_send)
+	if (vfinfo && !vfinfo[vf].clear_to_send)
 		ixgbe_write_mbx(hw, &msg, 1, vf);
 }
 
@@ -1462,15 +1594,21 @@ bool ixgbe_check_mdd_event(struct ixgbe_adapter *adapter)
 			 IXGBE_READ_REG(hw, IXGBE_LVMMC_RX));
 
 		if (hw->mac.ops.restore_mdd_vf) {
+			struct vf_data_storage *vfinfo;
 			u32 ping;
 
 			hw->mac.ops.restore_mdd_vf(hw, i);
 
 			/* get the VF to rebuild its queues */
-			adapter->vfinfo[i].clear_to_send = 0;
-			ping = IXGBE_PF_CONTROL_MSG |
-			       IXGBE_VT_MSGTYPE_CTS;
-			ixgbe_write_mbx(hw, &ping, 1, i);
+			rcu_read_lock();
+			vfinfo = rcu_dereference(adapter->vfinfo);
+			if (vfinfo) {
+				vfinfo[i].clear_to_send = false;
+				ping = IXGBE_PF_CONTROL_MSG |
+				       IXGBE_VT_MSGTYPE_CTS;
+				ixgbe_write_mbx(hw, &ping, 1, i);
+			}
+			rcu_read_unlock();
 		}
 
 		ret = true;
@@ -1482,12 +1620,11 @@ bool ixgbe_check_mdd_event(struct ixgbe_adapter *adapter)
 void ixgbe_msg_task(struct ixgbe_adapter *adapter)
 {
 	struct ixgbe_hw *hw = &adapter->hw;
-	unsigned long flags;
 	u32 vf;
 
 	ixgbe_check_mdd_event(adapter);
 
-	spin_lock_irqsave(&adapter->vfs_lock, flags);
+	rcu_read_lock();
 	for (vf = 0; vf < adapter->num_vfs; vf++) {
 		/* process any reset requests */
 		if (!ixgbe_check_for_rst(hw, vf))
@@ -1501,7 +1638,7 @@ void ixgbe_msg_task(struct ixgbe_adapter *adapter)
 		if (!ixgbe_check_for_ack(hw, vf))
 			ixgbe_rcv_ack_from_vf(adapter, vf);
 	}
-	spin_unlock_irqrestore(&adapter->vfs_lock, flags);
+	rcu_read_unlock();
 }
 
 static inline void ixgbe_ping_vf(struct ixgbe_adapter *adapter, int vf)
@@ -1510,23 +1647,26 @@ static inline void ixgbe_ping_vf(struct ixgbe_adapter *adapter, int vf)
 	u32 ping;
 
 	ping = IXGBE_PF_CONTROL_MSG;
-	if (adapter->vfinfo[vf].clear_to_send)
-		ping |= IXGBE_VT_MSGTYPE_CTS;
 	ixgbe_write_mbx(hw, &ping, 1, vf);
 }
 
 void ixgbe_ping_all_vfs(struct ixgbe_adapter *adapter)
 {
 	struct ixgbe_hw *hw = &adapter->hw;
+	struct vf_data_storage *vfinfo;
 	u32 ping;
 	int i;
 
-	for (i = 0 ; i < adapter->num_vfs; i++) {
-		ping = IXGBE_PF_CONTROL_MSG;
-		if (adapter->vfinfo[i].clear_to_send)
-			ping |= IXGBE_VT_MSGTYPE_CTS;
-		ixgbe_write_mbx(hw, &ping, 1, i);
-	}
+	rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (vfinfo)
+		for (i = 0 ; i < adapter->num_vfs; i++) {
+			ping = IXGBE_PF_CONTROL_MSG;
+			if (vfinfo[i].clear_to_send)
+				ping |= IXGBE_VT_MSGTYPE_CTS;
+			ixgbe_write_mbx(hw, &ping, 1, i);
+		}
+	rcu_read_unlock();
 }
 
 /**
@@ -1537,21 +1677,34 @@ void ixgbe_ping_all_vfs(struct ixgbe_adapter *adapter)
  **/
 void ixgbe_set_all_vfs(struct ixgbe_adapter *adapter)
 {
+	struct vf_data_storage *vfinfo;
 	int i;
 
-	for (i = 0 ; i < adapter->num_vfs; i++)
-		ixgbe_set_vf_link_state(adapter, i,
-					adapter->vfinfo[i].link_state);
+	rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (vfinfo)
+		for (i = 0 ; i < adapter->num_vfs; i++)
+			ixgbe_set_vf_link_state(adapter, i,
+						vfinfo[i].link_state);
+	rcu_read_unlock();
 }
 
 int ixgbe_ndo_set_vf_mac(struct net_device *netdev, int vf, u8 *mac)
 {
 	struct ixgbe_adapter *adapter = ixgbe_from_netdev(netdev);
+	struct vf_data_storage *vfinfo;
 	int retval;
 
 	if (vf >= adapter->num_vfs)
 		return -EINVAL;
 
+	rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (!vfinfo) {
+		rcu_read_unlock();
+		return 0;
+	}
+
 	if (is_valid_ether_addr(mac)) {
 		dev_info(&adapter->pdev->dev, "setting MAC %pM on VF %d\n",
 			 mac, vf);
@@ -1559,7 +1712,7 @@ int ixgbe_ndo_set_vf_mac(struct net_device *netdev, int vf, u8 *mac)
 
 		retval = ixgbe_set_vf_mac(adapter, vf, mac);
 		if (retval >= 0) {
-			adapter->vfinfo[vf].pf_set_mac = true;
+			vfinfo[vf].pf_set_mac = true;
 
 			if (test_bit(__IXGBE_DOWN, &adapter->state)) {
 				dev_warn(&adapter->pdev->dev, "The VF MAC address has been set, but the PF device is not up.\n");
@@ -1569,18 +1722,19 @@ int ixgbe_ndo_set_vf_mac(struct net_device *netdev, int vf, u8 *mac)
 			dev_warn(&adapter->pdev->dev, "The VF MAC address was NOT set due to invalid or duplicate MAC address.\n");
 		}
 	} else if (is_zero_ether_addr(mac)) {
-		unsigned char *vf_mac_addr =
-					   adapter->vfinfo[vf].vf_mac_addresses;
+		unsigned char *vf_mac_addr = vfinfo[vf].vf_mac_addresses;
 
 		/* nothing to do */
-		if (is_zero_ether_addr(vf_mac_addr))
+		if (is_zero_ether_addr(vf_mac_addr)) {
+			rcu_read_unlock();
 			return 0;
+		}
 
 		dev_info(&adapter->pdev->dev, "removing MAC on VF %d\n", vf);
 
 		retval = ixgbe_del_mac_filter(adapter, vf_mac_addr, vf);
 		if (retval >= 0) {
-			adapter->vfinfo[vf].pf_set_mac = false;
+			vfinfo[vf].pf_set_mac = false;
 			memcpy(vf_mac_addr, mac, ETH_ALEN);
 		} else {
 			dev_warn(&adapter->pdev->dev, "Could NOT remove the VF MAC address.\n");
@@ -1589,10 +1743,12 @@ int ixgbe_ndo_set_vf_mac(struct net_device *netdev, int vf, u8 *mac)
 		retval = -EINVAL;
 	}
 
+	rcu_read_unlock();
 	return retval;
 }
 
 static int ixgbe_enable_port_vlan(struct ixgbe_adapter *adapter, int vf,
+				  struct vf_data_storage *vfinfo,
 				  u16 vlan, u8 qos)
 {
 	struct ixgbe_hw *hw = &adapter->hw;
@@ -1613,8 +1769,8 @@ static int ixgbe_enable_port_vlan(struct ixgbe_adapter *adapter, int vf,
 		ixgbe_write_qde(adapter, vf, IXGBE_QDE_ENABLE |
 				IXGBE_QDE_HIDE_VLAN);
 
-	adapter->vfinfo[vf].pf_vlan = vlan;
-	adapter->vfinfo[vf].pf_qos = qos;
+	vfinfo[vf].pf_vlan = vlan;
+	vfinfo[vf].pf_qos = qos;
 	dev_info(&adapter->pdev->dev,
 		 "Setting VLAN %d, QOS 0x%x on VF %d\n", vlan, qos, vf);
 	if (test_bit(__IXGBE_DOWN, &adapter->state)) {
@@ -1628,13 +1784,14 @@ static int ixgbe_enable_port_vlan(struct ixgbe_adapter *adapter, int vf,
 	return err;
 }
 
-static int ixgbe_disable_port_vlan(struct ixgbe_adapter *adapter, int vf)
+static int ixgbe_disable_port_vlan(struct ixgbe_adapter *adapter, int vf,
+				   struct vf_data_storage *vfinfo)
 {
 	struct ixgbe_hw *hw = &adapter->hw;
 	int err;
 
 	err = ixgbe_set_vf_vlan(adapter, false,
-				adapter->vfinfo[vf].pf_vlan, vf);
+				vfinfo[vf].pf_vlan, vf);
 	/* Restore tagless access via VLAN 0 */
 	ixgbe_set_vf_vlan(adapter, true, 0, vf);
 	ixgbe_clear_vmvir(adapter, vf);
@@ -1644,8 +1801,8 @@ static int ixgbe_disable_port_vlan(struct ixgbe_adapter *adapter, int vf)
 	if (hw->mac.type >= ixgbe_mac_X550)
 		ixgbe_write_qde(adapter, vf, IXGBE_QDE_ENABLE);
 
-	adapter->vfinfo[vf].pf_vlan = 0;
-	adapter->vfinfo[vf].pf_qos = 0;
+	vfinfo[vf].pf_vlan = 0;
+	vfinfo[vf].pf_qos = 0;
 
 	return err;
 }
@@ -1653,13 +1810,20 @@ static int ixgbe_disable_port_vlan(struct ixgbe_adapter *adapter, int vf)
 int ixgbe_ndo_set_vf_vlan(struct net_device *netdev, int vf, u16 vlan,
 			  u8 qos, __be16 vlan_proto)
 {
-	int err = 0;
 	struct ixgbe_adapter *adapter = ixgbe_from_netdev(netdev);
+	struct vf_data_storage *vfinfo;
+	int err = 0;
 
 	if ((vf >= adapter->num_vfs) || (vlan > 4095) || (qos > 7))
 		return -EINVAL;
 	if (vlan_proto != htons(ETH_P_8021Q))
 		return -EPROTONOSUPPORT;
+
+	rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (!vfinfo)
+		goto out;
+
 	if (vlan || qos) {
 		/* Check if there is already a port VLAN set, if so
 		 * we have to delete the old one first before we
@@ -1668,16 +1832,17 @@ int ixgbe_ndo_set_vf_vlan(struct net_device *netdev, int vf, u16 vlan,
 		 * old port VLAN before setting a new one but this
 		 * is not necessarily the case.
 		 */
-		if (adapter->vfinfo[vf].pf_vlan)
-			err = ixgbe_disable_port_vlan(adapter, vf);
+		if (vfinfo[vf].pf_vlan)
+			err = ixgbe_disable_port_vlan(adapter, vf, vfinfo);
 		if (err)
 			goto out;
-		err = ixgbe_enable_port_vlan(adapter, vf, vlan, qos);
+		err = ixgbe_enable_port_vlan(adapter, vf, vfinfo, vlan, qos);
 	} else {
-		err = ixgbe_disable_port_vlan(adapter, vf);
+		err = ixgbe_disable_port_vlan(adapter, vf, vfinfo);
 	}
 
 out:
+	rcu_read_unlock();
 	return err;
 }
 
@@ -1695,13 +1860,13 @@ int ixgbe_link_mbps(struct ixgbe_adapter *adapter)
 	}
 }
 
-static void ixgbe_set_vf_rate_limit(struct ixgbe_adapter *adapter, int vf)
+static void ixgbe_set_vf_rate_limit(struct ixgbe_adapter *adapter, int vf,
+				    u16 tx_rate)
 {
 	struct ixgbe_ring_feature *vmdq = &adapter->ring_feature[RING_F_VMDQ];
 	struct ixgbe_hw *hw = &adapter->hw;
 	u32 bcnrc_val = 0;
 	u16 queue, queues_per_pool;
-	u16 tx_rate = adapter->vfinfo[vf].tx_rate;
 
 	if (tx_rate) {
 		/* start with base link speed value */
@@ -1749,6 +1914,7 @@ static void ixgbe_set_vf_rate_limit(struct ixgbe_adapter *adapter, int vf)
 
 void ixgbe_check_vf_rate_limit(struct ixgbe_adapter *adapter)
 {
+	struct vf_data_storage *vfinfo;
 	int i;
 
 	/* VF Tx rate limit was not set */
@@ -1761,18 +1927,23 @@ void ixgbe_check_vf_rate_limit(struct ixgbe_adapter *adapter)
 			 "Link speed has been changed. VF Transmit rate is disabled\n");
 	}
 
-	for (i = 0; i < adapter->num_vfs; i++) {
-		if (!adapter->vf_rate_link_speed)
-			adapter->vfinfo[i].tx_rate = 0;
+	rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (vfinfo)
+		for (i = 0; i < adapter->num_vfs; i++) {
+			if (!adapter->vf_rate_link_speed)
+				vfinfo[i].tx_rate = 0;
 
-		ixgbe_set_vf_rate_limit(adapter, i);
-	}
+			ixgbe_set_vf_rate_limit(adapter, i, vfinfo[i].tx_rate);
+		}
+	rcu_read_unlock();
 }
 
 int ixgbe_ndo_set_vf_bw(struct net_device *netdev, int vf, int min_tx_rate,
 			int max_tx_rate)
 {
 	struct ixgbe_adapter *adapter = ixgbe_from_netdev(netdev);
+	struct vf_data_storage *vfinfo;
 	int link_speed;
 
 	/* verify VF is active */
@@ -1795,12 +1966,17 @@ int ixgbe_ndo_set_vf_bw(struct net_device *netdev, int vf, int min_tx_rate,
 	if (max_tx_rate && ((max_tx_rate <= 10) || (max_tx_rate > link_speed)))
 		return -EINVAL;
 
-	/* store values */
-	adapter->vf_rate_link_speed = link_speed;
-	adapter->vfinfo[vf].tx_rate = max_tx_rate;
+	rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (vfinfo) {
+		/* store values */
+		adapter->vf_rate_link_speed = link_speed;
+		vfinfo[vf].tx_rate = max_tx_rate;
 
-	/* update hardware configuration */
-	ixgbe_set_vf_rate_limit(adapter, vf);
+		/* update hardware configuration */
+		ixgbe_set_vf_rate_limit(adapter, vf, vfinfo[vf].tx_rate);
+	}
+	rcu_read_unlock();
 
 	return 0;
 }
@@ -1809,11 +1985,18 @@ int ixgbe_ndo_set_vf_spoofchk(struct net_device *netdev, int vf, bool setting)
 {
 	struct ixgbe_adapter *adapter = ixgbe_from_netdev(netdev);
 	struct ixgbe_hw *hw = &adapter->hw;
+	struct vf_data_storage *vfinfo;
 
 	if (vf >= adapter->num_vfs)
 		return -EINVAL;
 
-	adapter->vfinfo[vf].spoofchk_enabled = setting;
+	rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (vfinfo)
+		vfinfo[vf].spoofchk_enabled = setting;
+	rcu_read_unlock();
+	if (!vfinfo)
+		return 0;
 
 	/* configure MAC spoofing */
 	hw->mac.ops.set_mac_anti_spoofing(hw, setting, vf);
@@ -1851,28 +2034,37 @@ int ixgbe_ndo_set_vf_spoofchk(struct net_device *netdev, int vf, bool setting)
  **/
 void ixgbe_set_vf_link_state(struct ixgbe_adapter *adapter, int vf, int state)
 {
-	adapter->vfinfo[vf].link_state = state;
+	struct vf_data_storage *vfinfo;
+
+	rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (!vfinfo) {
+		rcu_read_unlock();
+		return;
+	}
+	vfinfo[vf].link_state = state;
 
 	switch (state) {
 	case IFLA_VF_LINK_STATE_AUTO:
 		if (test_bit(__IXGBE_DOWN, &adapter->state))
-			adapter->vfinfo[vf].link_enable = false;
+			vfinfo[vf].link_enable = false;
 		else
-			adapter->vfinfo[vf].link_enable = true;
+			vfinfo[vf].link_enable = true;
 		break;
 	case IFLA_VF_LINK_STATE_ENABLE:
-		adapter->vfinfo[vf].link_enable = true;
+		vfinfo[vf].link_enable = true;
 		break;
 	case IFLA_VF_LINK_STATE_DISABLE:
-		adapter->vfinfo[vf].link_enable = false;
+		vfinfo[vf].link_enable = false;
 		break;
 	}
 
 	ixgbe_set_vf_rx_tx(adapter, vf);
 
 	/* restart the VF */
-	adapter->vfinfo[vf].clear_to_send = false;
+	vfinfo[vf].clear_to_send = false;
 	ixgbe_ping_vf(adapter, vf);
+	rcu_read_unlock();
 }
 
 /**
@@ -1923,6 +2115,7 @@ int ixgbe_ndo_set_vf_rss_query_en(struct net_device *netdev, int vf,
 				  bool setting)
 {
 	struct ixgbe_adapter *adapter = ixgbe_from_netdev(netdev);
+	struct vf_data_storage *vfinfo;
 
 	/* This operation is currently supported only for 82599 and x540
 	 * devices.
@@ -1934,7 +2127,11 @@ int ixgbe_ndo_set_vf_rss_query_en(struct net_device *netdev, int vf,
 	if (vf >= adapter->num_vfs)
 		return -EINVAL;
 
-	adapter->vfinfo[vf].rss_query_enabled = setting;
+	rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (vfinfo)
+		vfinfo[vf].rss_query_enabled = setting;
+	rcu_read_unlock();
 
 	return 0;
 }
@@ -1942,18 +2139,31 @@ int ixgbe_ndo_set_vf_rss_query_en(struct net_device *netdev, int vf,
 int ixgbe_ndo_set_vf_trust(struct net_device *netdev, int vf, bool setting)
 {
 	struct ixgbe_adapter *adapter = ixgbe_from_netdev(netdev);
+	struct vf_data_storage *vfinfo;
 
 	if (vf >= adapter->num_vfs)
 		return -EINVAL;
 
+	rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (!vfinfo) {
+		rcu_read_unlock();
+		return 0;
+	}
+
 	/* nothing to do */
-	if (adapter->vfinfo[vf].trusted == setting)
+	if (vfinfo[vf].trusted == setting) {
+		rcu_read_unlock();
 		return 0;
+	}
 
-	adapter->vfinfo[vf].trusted = setting;
+	vfinfo[vf].trusted = setting;
 
 	/* reset VF to reconfigure features */
-	adapter->vfinfo[vf].clear_to_send = false;
+	vfinfo[vf].clear_to_send = false;
+
+	rcu_read_unlock();
+
 	ixgbe_ping_vf(adapter, vf);
 
 	e_info(drv, "VF %u is %strusted\n", vf, setting ? "" : "not ");
@@ -1965,17 +2175,30 @@ int ixgbe_ndo_get_vf_config(struct net_device *netdev,
 			    int vf, struct ifla_vf_info *ivi)
 {
 	struct ixgbe_adapter *adapter = ixgbe_from_netdev(netdev);
+	struct vf_data_storage *vfinfo;
+
 	if (vf >= adapter->num_vfs)
 		return -EINVAL;
 	ivi->vf = vf;
-	memcpy(&ivi->mac, adapter->vfinfo[vf].vf_mac_addresses, ETH_ALEN);
-	ivi->max_tx_rate = adapter->vfinfo[vf].tx_rate;
+
+	rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (!vfinfo) {
+		rcu_read_unlock();
+		return -EINVAL;
+	}
+
+	memcpy(&ivi->mac, vfinfo[vf].vf_mac_addresses, ETH_ALEN);
+	ivi->max_tx_rate = vfinfo[vf].tx_rate;
 	ivi->min_tx_rate = 0;
-	ivi->vlan = adapter->vfinfo[vf].pf_vlan;
-	ivi->qos = adapter->vfinfo[vf].pf_qos;
-	ivi->spoofchk = adapter->vfinfo[vf].spoofchk_enabled;
-	ivi->rss_query_en = adapter->vfinfo[vf].rss_query_enabled;
-	ivi->trusted = adapter->vfinfo[vf].trusted;
-	ivi->linkstate = adapter->vfinfo[vf].link_state;
+	ivi->vlan = vfinfo[vf].pf_vlan;
+	ivi->qos = vfinfo[vf].pf_qos;
+	ivi->spoofchk = vfinfo[vf].spoofchk_enabled;
+	ivi->rss_query_en = vfinfo[vf].rss_query_enabled;
+	ivi->trusted = vfinfo[vf].trusted;
+	ivi->linkstate = vfinfo[vf].link_state;
+
+	rcu_read_unlock();
+
 	return 0;
 }
-- 
2.47.1



* Re: [PATCH net 8/8] ixgbe: only access vfinfo and mv_list under RCU lock
  2026-05-20 18:34 ` [PATCH net 8/8] ixgbe: only access vfinfo and mv_list under RCU lock Tony Nguyen
@ 2026-05-23  0:16   ` Jakub Kicinski
  2026-05-23  0:16   ` Jakub Kicinski
  1 sibling, 0 replies; 21+ messages in thread
From: Jakub Kicinski @ 2026-05-23  0:16 UTC (permalink / raw)
  To: anthony.l.nguyen
  Cc: Jakub Kicinski, davem, pabeni, edumazet, andrew+netdev, netdev,
	vinschen, alexander.nowlin

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
ixgbe: only access vfinfo and mv_list under RCU lock

This change drops the vfs_lock spinlock in favour of RCU protection
for adapter->vfinfo and adapter->mv_list, marking them __rcu and
converting readers to rcu_dereference()/rcu_read_lock() and writers
to rcu_replace_pointer()/kfree_rcu(). The goal is to address a
crash observed when ixgbe_disable_sriov() races with
ixgbe_service_task() / ixgbe_msg_task() readers.

> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> index 9b8217523fd2..8849b9f42bf6 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> @@ -808,10 +810,10 @@ struct ixgbe_adapter {
>  	/* SR-IOV */
>  	DECLARE_BITMAP(active_vfs, IXGBE_MAX_VF_FUNCTIONS);
>  	unsigned int num_vfs;
> -	struct vf_data_storage *vfinfo;
> +	struct vf_data_storage __rcu *vfinfo;
>  	int vf_rate_link_speed;
>  	struct vf_macvlans vf_mvs;
> -	struct vf_macvlans *mv_list;
> +	struct vf_macvlans __rcu *mv_list;

[Medium]
After this conversion, is ixgbe_handle_mdd_event() in
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c still consistent
with the new annotations? It contains:

    if (adapter->vfinfo && ixgbe_check_mdd_event(adapter)) {

That bare load of an __rcu pointer is not wrapped in
rcu_read_lock() and does not use rcu_access_pointer() or
rcu_dereference(), so sparse will flag it and the access
bypasses the RCU discipline used elsewhere in the file. The
commit message says RCU is used "throughout the driver" — was
this site simply missed during the conversion?
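
As a rough illustration of the distinction, here is a minimal
userspace sketch, not kernel code. C11 atomics stand in for the RCU
accessors: a relaxed load models rcu_access_pointer() (pointer value
tested, never dereferenced) and an acquire load is a conservative
model of rcu_dereference(). The names vf_slot, slot_is_published()
and slot_read_first() are hypothetical and exist only for this
example.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one vfinfo entry. */
struct vf_data {
	int vf_api;
};

/* Hypothetical stand-in for the __rcu-annotated adapter->vfinfo slot. */
struct vf_slot {
	_Atomic(struct vf_data *) vfinfo;
};

/*
 * NULL test only: the pointer value is inspected but never dereferenced,
 * so a relaxed load is enough. This models rcu_access_pointer().
 */
static bool slot_is_published(struct vf_slot *slot)
{
	return atomic_load_explicit(&slot->vfinfo, memory_order_relaxed) != NULL;
}

/*
 * Dereferencing load: ordering is needed so the pointed-to data is
 * visible. Acquire is a conservative model of rcu_dereference(); a real
 * reader would also have to be inside a read-side critical section.
 */
static int slot_read_first(struct vf_slot *slot, int *api)
{
	struct vf_data *vfinfo;

	vfinfo = atomic_load_explicit(&slot->vfinfo, memory_order_acquire);
	if (!vfinfo)
		return -1;
	*api = vfinfo[0].vf_api;
	return 0;
}
```

Under these assumptions, a check such as the one in
ixgbe_handle_mdd_event() corresponds to slot_is_published(): it only
asks whether an array exists and leaves the dereference to the callee.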

[ ... ]

> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> index 2646ee6f295f..d82c7dfc6580 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c

[ ... ]

> @@ -8267,22 +8293,27 @@ static void ixgbe_watchdog_flush_tx(struct ixgbe_adapter *adapter)
> @@ -8309,9 +8340,15 @@ static void ixgbe_check_for_bad_vf(struct ixgbe_adapter *adapter)
>  
>  	/* check status reg for all VFs owned by this PF */
>  	for (vf = 0; vf < adapter->num_vfs; ++vf) {
> -		struct pci_dev *vfdev = adapter->vfinfo[vf].vfdev;
> +		struct vf_data_storage *vfinfo;
> +		struct pci_dev *vfdev = NULL;
>  		u16 status_reg;
>  
> +		rcu_read_lock();
> +		vfinfo = rcu_dereference(adapter->vfinfo);
> +		if (vfinfo)
> +			vfdev = vfinfo[vf].vfdev;
> +		rcu_read_unlock();
>  		if (!vfdev)
>  			continue;
>  		pci_read_config_word(vfdev, PCI_STATUS, &status_reg);

[High]
Can vfdev still be used safely after rcu_read_unlock() here?

The RCU section only protects the lifetime of the vfinfo array,
not the lifetime of the pci_dev whose pointer was stored in it.
ixgbe_disable_sriov() drops the reference taken by
ixgbe_get_vfs() with pci_dev_put(vfdev) and then proceeds to
pci_disable_sriov(), either of which can free the pci_dev
concurrently with the subsequent pci_read_config_word() and the
pcie_flr(vfdev) reached through ixgbe_bad_vf_abort().

Should this path call pci_dev_get(vfdev) before rcu_read_unlock(),
and pci_dev_put() once the config-space access and FLR are
complete?

This appears to predate this patch (the prior code dereferenced
vfdev outside any lifetime-extending section as well), but the
RCU conversion seems like a natural opportunity to address it.
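
The pin-before-unlock idea can be sketched in userspace as follows.
This is only a model under stated assumptions: fake_dev, vf_entry,
fake_dev_get(), fake_dev_put() and read_vf_status() are hypothetical
names, the read-side section is marked by comments rather than real
locking, and the sketch assumes the table still holds its own
reference at the moment the reader takes one.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical refcounted device, standing in for struct pci_dev. */
struct fake_dev {
	atomic_int refs;
	bool released;		/* set when the last reference is dropped */
	unsigned short status;
};

/* Model of pci_dev_get(): take an extra reference, tolerate NULL. */
static struct fake_dev *fake_dev_get(struct fake_dev *dev)
{
	if (dev)
		atomic_fetch_add(&dev->refs, 1);
	return dev;
}

/* Model of pci_dev_put(): drop a reference, release on the last one. */
static void fake_dev_put(struct fake_dev *dev)
{
	if (dev && atomic_fetch_sub(&dev->refs, 1) == 1)
		dev->released = true;	/* a real device would be freed here */
}

/* Hypothetical table entry holding a device pointer. */
struct vf_entry {
	struct fake_dev *vfdev;
};

/*
 * Reader: pin the device while the table is still known to be valid,
 * then use it after the read-side section has ended.
 * Returns 0 and stores the status, or -1 if no device was present.
 */
static int read_vf_status(struct vf_entry *entry, unsigned short *status)
{
	struct fake_dev *dev;

	/* ...read-side section begins... */
	dev = fake_dev_get(entry->vfdev);
	/* ...read-side section ends; the table may go away from here on... */
	if (!dev)
		return -1;

	*status = dev->status;	/* our own reference keeps dev alive */
	fake_dev_put(dev);
	return 0;
}
```

The point of the design is that the reader's own reference, not the
lifetime of the array, covers the later config-space access.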

[ ... ]

> @@ -10071,20 +10114,26 @@ static int handle_redirect_action(struct ixgbe_adapter *adapter, int ifindex,
>  {
>  	struct ixgbe_ring_feature *vmdq = &adapter->ring_feature[RING_F_VMDQ];
>  	unsigned int num_vfs = adapter->num_vfs, vf;
> +	struct vf_data_storage *vfinfo;
>  	struct netdev_nested_priv priv;
>  	struct upper_walk_data data;
>  	struct net_device *upper;
>  
>  	/* redirect to a SRIOV VF */
> -	for (vf = 0; vf < num_vfs; ++vf) {
> -		upper = pci_get_drvdata(adapter->vfinfo[vf].vfdev);
> -		if (upper->ifindex == ifindex) {
> -			*queue = vf * __ALIGN_MASK(1, ~vmdq->mask);
> -			*action = vf + 1;
> -			*action <<= ETHTOOL_RX_FLOW_SPEC_RING_VF_OFF;
> -			return 0;
> +	rcu_read_lock();
> +	vfinfo = rcu_dereference(adapter->vfinfo);
> +	if (vfinfo)
> +		for (vf = 0; vf < num_vfs; ++vf) {
> +			upper = pci_get_drvdata(vfinfo[vf].vfdev);

[High]
Can the captured num_vfs and the freshly-dereferenced vfinfo
disagree in size here?

num_vfs is sampled from adapter->num_vfs before rcu_read_lock(),
while vfinfo is observed inside the RCU section. ixgbe_disable_sriov()
runs under PCI device_lock (not rtnl), sets adapter->num_vfs = 0,
and replaces vfinfo with NULL via rcu_replace_pointer(). A subsequent
__ixgbe_enable_sriov(N2) can publish a smaller vfinfo array of
size N2 < num_vfs before the loop runs. The access vfinfo[vf]
for vf >= N2 is then a heap out-of-bounds read.

Should num_vfs be sampled inside the rcu_read_lock() and bounded
by the actual size of the published vfinfo, or should
adapter->num_vfs be updated as part of the same rcu_assign_pointer
publication so readers can rely on a consistent pair?
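
One way to get a self-consistent pair is to store the count in the
same allocation as the array, so that a single pointer load yields
both. The following is a minimal userspace sketch of that layout, not
a proposal for the actual driver structures: vf_table, adapter_model,
vf_table_alloc(), vf_table_publish() and vf_get_api() are
hypothetical, C11 atomics stand in for
rcu_assign_pointer()/rcu_dereference(), and grace-period handling is
left out entirely.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

struct vf_data {
	int vf_api;
};

/* Hypothetical container: the count travels with the array it bounds. */
struct vf_table {
	unsigned int num_vfs;
	struct vf_data vf[];	/* flexible array member, num_vfs entries */
};

/* Hypothetical stand-in for the adapter's published pointer. */
struct adapter_model {
	_Atomic(struct vf_table *) table;
};

/* Allocate a zeroed table for n VFs; returns NULL on failure. */
static struct vf_table *vf_table_alloc(unsigned int n)
{
	struct vf_table *t = calloc(1, sizeof(*t) + n * sizeof(t->vf[0]));

	if (t)
		t->num_vfs = n;
	return t;
}

/*
 * Publish a new table (or NULL) and return the old one. The release
 * half of the exchange makes num_vfs and the entries visible before
 * the pointer. In the kernel the old table would only be freed after
 * a grace period.
 */
static struct vf_table *vf_table_publish(struct adapter_model *a,
					 struct vf_table *new_table)
{
	return atomic_exchange_explicit(&a->table, new_table,
					memory_order_acq_rel);
}

/* Reader: one load yields both the array and the bound that matches it. */
static int vf_get_api(struct adapter_model *a, unsigned int vf, int *api)
{
	struct vf_table *t;

	t = atomic_load_explicit(&a->table, memory_order_acquire);
	if (!t || vf >= t->num_vfs)
		return -EINVAL;
	*api = t->vf[vf].vf_api;
	return 0;
}
```

With this layout, a reader that raced with a shrink from four VFs to
two gets -EINVAL for index 3 instead of reading past the new array.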

[ ... ]

> @@ -9744,15 +9781,21 @@ static int ixgbe_ndo_get_vf_stats(struct net_device *netdev, int vf,
>  				  struct ifla_vf_stats *vf_stats)
>  {
>  	struct ixgbe_adapter *adapter = ixgbe_from_netdev(netdev);
> +	struct vf_data_storage *vfinfo;
>  
>  	if (vf < 0 || vf >= adapter->num_vfs)
>  		return -EINVAL;
>  
> -	vf_stats->rx_packets = adapter->vfinfo[vf].vfstats.gprc;
> -	vf_stats->rx_bytes   = adapter->vfinfo[vf].vfstats.gorc;
> -	vf_stats->tx_packets = adapter->vfinfo[vf].vfstats.gptc;
> -	vf_stats->tx_bytes   = adapter->vfinfo[vf].vfstats.gotc;
> -	vf_stats->multicast  = adapter->vfinfo[vf].vfstats.mprc;
> +	rcu_read_lock();
> +	vfinfo = rcu_dereference(adapter->vfinfo);
> +	if (vfinfo) {
> +		vf_stats->rx_packets = vfinfo[vf].vfstats.gprc;
> +		vf_stats->rx_bytes   = vfinfo[vf].vfstats.gorc;
> +		vf_stats->tx_packets = vfinfo[vf].vfstats.gptc;
> +		vf_stats->tx_bytes   = vfinfo[vf].vfstats.gotc;
> +		vf_stats->multicast  = vfinfo[vf].vfstats.mprc;
> +	}
> +	rcu_read_unlock();

[High]
This pattern recurs in many of the rtnl ndo handlers (set_vf_mac,
set_vf_spoofchk, set_vf_rss_query_en, set_vf_trust, set_vf_vlan,
set_vf_bw, get_vf_config, get_vf_stats): the bound check

    if (vf >= adapter->num_vfs)
        return -EINVAL;

is performed before rcu_read_lock(). Between that check and the
later rcu_dereference, ixgbe_disable_sriov() (under PCI device_lock,
not rtnl) can set num_vfs = 0 and __ixgbe_enable_sriov() can
republish vfinfo at a smaller size N2.

For ixgbe_ndo_get_vf_config() the resulting OOB read is then
memcpy'd into struct ifla_vf_info and returned to userspace via
netlink, which would expose adjacent kernel heap. For the setter
ndos and the mailbox-driven setters, this is an out-of-bounds
write.

Should the vf range check be repeated against the size of the
freshly observed vfinfo, ideally with num_vfs and vfinfo
published together so the pair is self-consistent?

> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
> index 431d77da15a5..80f22a8e7af4 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c

[ ... ]

> @@ -206,31 +214,28 @@ int ixgbe_disable_sriov(struct ixgbe_adapter *adapter)
>  {
>  	unsigned int num_vfs = adapter->num_vfs, vf;
>  	struct ixgbe_hw *hw = &adapter->hw;
> -	unsigned long flags;
> +	struct vf_data_storage *vfinfo;
> +	struct vf_macvlans *mv_list;
>  	int rss;
>  
> -	spin_lock_irqsave(&adapter->vfs_lock, flags);
> -	/* set num VFs to 0 to prevent access to vfinfo */
> +	/* set num VFs to 0 so readers bail out early */
>  	adapter->num_vfs = 0;
> -	spin_unlock_irqrestore(&adapter->vfs_lock, flags);
> +
> +	vfinfo = rcu_replace_pointer(adapter->vfinfo, NULL, 1);
> +	mv_list = rcu_replace_pointer(adapter->mv_list, NULL, 1);

[Low]
The lockdep predicate passed to rcu_replace_pointer() is the
literal 1, which makes the writer-side assertion always succeed
and provides no verification. The patch simultaneously removes
vfs_lock and shifts to an implicit (undocumented) reliance on
some caller serialization for the writer.

Could lockdep_is_held() of the actual writer-side lock (for
example the PCI device_lock or rtnl_lock, whichever is intended)
be used here so a future regression that calls
ixgbe_disable_sriov() outside that context is caught by lockdep?

[ ... ]

> @@ -643,10 +668,16 @@ static void ixgbe_clear_vf_vlans(struct ixgbe_adapter *adapter, u32 vf)
>  static int ixgbe_set_vf_macvlan(struct ixgbe_adapter *adapter,
>  				int vf, int index, unsigned char *mac_addr)
>  {
> -	struct vf_macvlans *entry;
> +	struct vf_macvlans *mv_list, *entry;
>  	bool found = false;
>  	int retval = 0;
>  
> +	lockdep_assert_in_rcu_read_lock();
> +	/* vf_mvs entries point into the mv_list array */
> +	mv_list = rcu_dereference(adapter->mv_list);
> +	if (!mv_list)
> +		return 0;
> +
>  	if (index <= 1) {
>  		list_for_each_entry(entry, &adapter->vf_mvs.l, l) {

[High]
Is non-RCU list iteration safe here while only rcu_read_lock()
is held?

The list head &adapter->vf_mvs.l is mutated by the writer side
without RCU-aware primitives. ixgbe_alloc_vf_macvlans() does:

    INIT_LIST_HEAD(&adapter->vf_mvs.l);
    ...
    list_add(&mv_list[i].l, &adapter->vf_mvs.l);

and ixgbe_disable_sriov() kfree_rcu()s mv_list but never
reinitialises &adapter->vf_mvs.l, so on a subsequent
re-enable the writer resets and re-links the head with plain
stores while a reader may still be walking it.

Should list_for_each_entry() be list_for_each_entry_rcu(), and
the writer use INIT_LIST_HEAD_RCU() / list_add_rcu()? Otherwise
the reader can observe torn next/prev loads or follow into
freed/re-used entries.
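
The publish/traverse ordering that the _rcu list primitives provide
can be sketched in userspace with C11 atomics. This is only a model
under stated assumptions: it uses a singly linked list with a single
writer, mv_node, mv_add() and mv_count_for_vf() are hypothetical
names, and node reclamation is not modelled at all.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical list node; stands in for struct vf_macvlans. */
struct mv_node {
	int vf;
	_Atomic(struct mv_node *) next;
};

/* Models the list head that readers walk. */
static _Atomic(struct mv_node *) mv_head;

/*
 * Writer (single writer assumed): set up the node's next pointer
 * first, then publish the node with release ordering, so a reader
 * that sees the node also sees its initialised fields.
 */
static void mv_add(struct mv_node *n)
{
	struct mv_node *first;

	first = atomic_load_explicit(&mv_head, memory_order_relaxed);
	atomic_store_explicit(&n->next, first, memory_order_relaxed);
	atomic_store_explicit(&mv_head, n, memory_order_release);
}

/* Reader: every pointer hop is an acquire load, never a plain load. */
static int mv_count_for_vf(int vf)
{
	struct mv_node *n;
	int count = 0;

	n = atomic_load_explicit(&mv_head, memory_order_acquire);
	while (n) {
		if (n->vf == vf)
			count++;
		n = atomic_load_explicit(&n->next, memory_order_acquire);
	}
	return count;
}
```

The point of the sketch is the ordering: a node becomes reachable only
after it is fully set up, and the reader never follows a pointer it
loaded with a plain, possibly torn, access.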

[ ... ]

> @@ -780,16 +816,25 @@ static void ixgbe_vf_clear_mbx(struct ixgbe_adapter *adapter, u32 vf)
>  static int ixgbe_set_vf_mac(struct ixgbe_adapter *adapter,
>  			    int vf, unsigned char *mac_addr)
>  {
> +	struct vf_data_storage *vfinfo;
>  	int retval;
>  
> -	ixgbe_del_mac_filter(adapter, adapter->vfinfo[vf].vf_mac_addresses, vf);
> +	rcu_read_lock();
> +	vfinfo = rcu_dereference(adapter->vfinfo);
> +	if (!vfinfo) {
> +		rcu_read_unlock();
> +		return -EINVAL;
> +	}
> +
> +	ixgbe_del_mac_filter(adapter, vfinfo[vf].vf_mac_addresses, vf);
>  	retval = ixgbe_add_mac_filter(adapter, mac_addr, vf);
>  	if (retval >= 0)
> -		memcpy(adapter->vfinfo[vf].vf_mac_addresses, mac_addr,
> +		memcpy(vfinfo[vf].vf_mac_addresses, mac_addr,
>  		       ETH_ALEN);

[ ... ]

> @@ -1537,21 +1677,34 @@ void ixgbe_ping_all_vfs(struct ixgbe_adapter *adapter)
>  int ixgbe_ndo_set_vf_mac(struct net_device *netdev, int vf, u8 *mac)
>  {
>  	struct ixgbe_adapter *adapter = ixgbe_from_netdev(netdev);
> +	struct vf_data_storage *vfinfo;
>  	int retval;
>  
>  	if (vf >= adapter->num_vfs)
>  		return -EINVAL;
>  
> +	rcu_read_lock();
> +	vfinfo = rcu_dereference(adapter->vfinfo);
> +	if (!vfinfo) {
> +		rcu_read_unlock();
> +		return 0;
> +	}
> +
>  	if (is_valid_ether_addr(mac)) {
>  		dev_info(&adapter->pdev->dev, "setting MAC %pM on VF %d\n",
>  			 mac, vf);
>  		dev_info(&adapter->pdev->dev, "Reload the VF driver to make this change effective.");
>  
>  		retval = ixgbe_set_vf_mac(adapter, vf, mac);
>  		if (retval >= 0) {
> -			adapter->vfinfo[vf].pf_set_mac = true;
> +			vfinfo[vf].pf_set_mac = true;

[High]
Can vfinfo here and the vfinfo observed inside ixgbe_set_vf_mac()
end up pointing at different objects?

ixgbe_ndo_set_vf_mac() captures vfinfo via its own rcu_dereference,
then calls ixgbe_set_vf_mac() which independently does:

    rcu_read_lock();
    vfinfo = rcu_dereference(adapter->vfinfo);
    ...
    memcpy(vfinfo[vf].vf_mac_addresses, mac_addr, ETH_ALEN);

Nesting rcu_read_lock() is fine, but each rcu_dereference()
observes whatever pointer is currently published. If
ixgbe_disable_sriov() followed by __ixgbe_enable_sriov() races
between the two dereferences, the inner copy lands in the new
array while the outer pf_set_mac = true write lands in the old
array. The two pieces of state then disagree, which seems to
break the "attempted to override administratively set MAC"
enforcement on subsequent VF mailbox messages. If the new
vfinfo array has no entry at index vf, the inner memcpy is also
a heap out-of-bounds write.

Should ixgbe_ndo_set_vf_mac() pass its already-captured vfinfo
into ixgbe_set_vf_mac() so both writes go to the same object,
or otherwise structure the call so a single rcu_dereference
covers all writes for one operation?
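
A minimal userspace sketch of the single-snapshot structure follows,
again with C11 atomics standing in for rcu_dereference(). The names
set_vf_mac_on() and ndo_set_vf_mac() are hypothetical, the hardware
filter update is omitted, and the vf bounds check is assumed to have
been done by the caller.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical per-VF state; stands in for struct vf_data_storage. */
struct vf_state {
	unsigned char mac[ETH_ALEN];
	bool pf_set_mac;
};

/* Models the RCU-protected adapter->vfinfo pointer. */
static _Atomic(struct vf_state *) cur_vfinfo;

/* Inner helper: works on the caller's snapshot and never reloads it. */
static int set_vf_mac_on(struct vf_state *vfinfo, int vf,
			 const unsigned char *mac)
{
	memcpy(vfinfo[vf].mac, mac, ETH_ALEN);
	return 0;
}

/* Outer operation: one load of the pointer covers both writes. */
static int ndo_set_vf_mac(int vf, const unsigned char *mac)
{
	struct vf_state *vfinfo;

	vfinfo = atomic_load_explicit(&cur_vfinfo, memory_order_acquire);
	if (!vfinfo)
		return -1;
	if (set_vf_mac_on(vfinfo, vf, mac) < 0)
		return -1;
	vfinfo[vf].pf_set_mac = true;
	return 0;
}
```

Because the helper receives the snapshot as an argument, the MAC copy
and the pf_set_mac flag always land in the same array, whichever array
happens to be published when the operation starts.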


* Re: [PATCH net 8/8] ixgbe: only access vfinfo and mv_list under RCU lock
  2026-05-20 18:34 ` [PATCH net 8/8] ixgbe: only access vfinfo and mv_list under RCU lock Tony Nguyen
  2026-05-23  0:16   ` Jakub Kicinski
@ 2026-05-23  0:16   ` Jakub Kicinski
  1 sibling, 0 replies; 21+ messages in thread
From: Jakub Kicinski @ 2026-05-23  0:16 UTC (permalink / raw)
  To: anthony.l.nguyen
  Cc: Jakub Kicinski, davem, pabeni, edumazet, andrew+netdev, netdev,
	vinschen, alexander.nowlin

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
Subject: ixgbe: only access vfinfo and mv_list under RCU lock

This patch drops the spinlock protecting the adapter info in favor of RCU to
avoid lock contention and crashes when setting sriov_numvfs to zero. It
modifies the accesses to vfinfo and mv_list across the driver to use RCU
primitives.

> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> index 2646ee6f295f0..d82c7dfc6580a 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
[ ... ]
> @@ -8309,9 +8340,15 @@ static void ixgbe_check_for_bad_vf(struct ixgbe_adapter *adapter)
>  
>  	/* check status reg for all VFs owned by this PF */
>  	for (vf = 0; vf < adapter->num_vfs; ++vf) {
> -		struct pci_dev *vfdev = adapter->vfinfo[vf].vfdev;
> +		struct vf_data_storage *vfinfo;
> +		struct pci_dev *vfdev = NULL;
>  		u16 status_reg;
>  
> +		rcu_read_lock();
> +		vfinfo = rcu_dereference(adapter->vfinfo);
> +		if (vfinfo)
> +			vfdev = vfinfo[vf].vfdev;
> +		rcu_read_unlock();
>  		if (!vfdev)
>  			continue;
>  		pci_read_config_word(vfdev, PCI_STATUS, &status_reg);

[Severity: High]
Is it safe to dereference vfdev outside the RCU read-side critical
section? By dropping the RCU read lock before calling pci_read_config_word(),
the grace period can expire and a concurrent execution of
ixgbe_disable_sriov() could free the pci_dev object, resulting in a
use-after-free.

[ ... ]
> @@ -10071,20 +10114,26 @@ static int handle_redirect_action(struct ixgbe_adapter *adapter, int ifindex,
>  {
>  	struct ixgbe_ring_feature *vmdq = &adapter->ring_feature[RING_F_VMDQ];
>  	unsigned int num_vfs = adapter->num_vfs, vf;
> +	struct vf_data_storage *vfinfo;
>  	struct netdev_nested_priv priv;
>  	struct upper_walk_data data;
>  	struct net_device *upper;
>  
>  	/* redirect to a SRIOV VF */
> -	for (vf = 0; vf < num_vfs; ++vf) {
> -		upper = pci_get_drvdata(adapter->vfinfo[vf].vfdev);
> -		if (upper->ifindex == ifindex) {
> -			*queue = vf * __ALIGN_MASK(1, ~vmdq->mask);
> -			*action = vf + 1;
> -			*action <<= ETHTOOL_RX_FLOW_SPEC_RING_VF_OFF;
> -			return 0;
> +	rcu_read_lock();
> +	vfinfo = rcu_dereference(adapter->vfinfo);
> +	if (vfinfo)
> +		for (vf = 0; vf < num_vfs; ++vf) {

[Severity: High]
Does reading adapter->num_vfs outside the RCU lock create a time-of-check to
time-of-use (TOCTOU) race? If SR-IOV is disabled and rapidly re-enabled, a
reader can cache an older, larger num_vfs value before acquiring the RCU
lock, but then fetch the newer, smaller vfinfo array. Iterating up to the old
num_vfs would then cause an out-of-bounds memory access.

> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
> index 431d77da15a5f..80f22a8e7af43 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
[ ... ]
> @@ -123,6 +125,7 @@ static int __ixgbe_enable_sriov(struct ixgbe_adapter *adapter,
>  static void ixgbe_get_vfs(struct ixgbe_adapter *adapter)
>  {
>  	struct pci_dev *pdev = adapter->pdev;
> +	struct vf_data_storage *vfinfo;
>  	u16 vendor = pdev->vendor;
>  	struct pci_dev *vfdev;
>  	int vf = 0;
> @@ -134,18 +137,23 @@ static void ixgbe_get_vfs(struct ixgbe_adapter *adapter)
>  		return;
>  	pci_read_config_word(pdev, pos + PCI_SRIOV_VF_DID, &vf_id);
>  
> -	vfdev = pci_get_device(vendor, vf_id, NULL);
> -	for (; vfdev; vfdev = pci_get_device(vendor, vf_id, vfdev)) {
> -		if (!vfdev->is_virtfn)
> -			continue;
> -		if (vfdev->physfn != pdev)
> -			continue;
> -		if (vf >= adapter->num_vfs)
> -			continue;
> -		pci_dev_get(vfdev);
> -		adapter->vfinfo[vf].vfdev = vfdev;
> -		++vf;
> +	rcu_read_lock();
> +	vfinfo = rcu_dereference(adapter->vfinfo);
> +	if (vfinfo) {
> +		vfdev = pci_get_device(vendor, vf_id, NULL);
> +		for (; vfdev; vfdev = pci_get_device(vendor, vf_id, vfdev)) {

[Severity: High]
Does this code call a sleeping function inside an RCU read-side critical
section? pci_get_device() internally acquires the pci_bus_sem read-write
semaphore, which can sleep. Sleeping inside an RCU read-side critical
section is not permitted and can trigger a 'scheduling while atomic' BUG.

[ ... ]
> @@ -206,31 +214,28 @@ int ixgbe_disable_sriov(struct ixgbe_adapter *adapter)
>  {
>  	unsigned int num_vfs = adapter->num_vfs, vf;
>  	struct ixgbe_hw *hw = &adapter->hw;
> -	unsigned long flags;
> +	struct vf_data_storage *vfinfo;
> +	struct vf_macvlans *mv_list;
>  	int rss;
>  
> -	spin_lock_irqsave(&adapter->vfs_lock, flags);
> -	/* set num VFs to 0 to prevent access to vfinfo */
> +	/* set num VFs to 0 so readers bail out early */
>  	adapter->num_vfs = 0;
> -	spin_unlock_irqrestore(&adapter->vfs_lock, flags);
> +
> +	vfinfo = rcu_replace_pointer(adapter->vfinfo, NULL, 1);
> +	mv_list = rcu_replace_pointer(adapter->mv_list, NULL, 1);
>  
>  	/* put the reference to all of the vf devices */
>  	for (vf = 0; vf < num_vfs; ++vf) {
> -		struct pci_dev *vfdev = adapter->vfinfo[vf].vfdev;
> +		struct pci_dev *vfdev = vfinfo[vf].vfdev;
>  
>  		if (!vfdev)
>  			continue;
> -		adapter->vfinfo[vf].vfdev = NULL;
> +		vfinfo[vf].vfdev = NULL;
>  		pci_dev_put(vfdev);
>  	}

[Severity: High]
Can RCU readers observe these synchronous mutations before the grace
period ends? The vfinfo array is modified in-place and pci_dev_put() is
called on the devices synchronously. If the final reference is dropped,
the pci_dev could be freed while an RCU reader (like
handle_redirect_action()) is concurrently accessing it, leading to a
use-after-free.

[ ... ]
> @@ -643,10 +668,16 @@ static void ixgbe_clear_vf_vlans(struct ixgbe_adapter *adapter, u32 vf)
>  static int ixgbe_set_vf_macvlan(struct ixgbe_adapter *adapter,
>  				int vf, int index, unsigned char *mac_addr)
>  {
> -	struct vf_macvlans *entry;
> +	struct vf_macvlans *mv_list, *entry;
>  	bool found = false;
>  	int retval = 0;
>  
> +	lockdep_assert_in_rcu_read_lock();
> +	/* vf_mvs entries point into the mv_list array */
> +	mv_list = rcu_dereference(adapter->mv_list);
> +	if (!mv_list)
> +		return 0;
> +
>  	if (index <= 1) {
>  		list_for_each_entry(entry, &adapter->vf_mvs.l, l) {

[Severity: High]
Are standard list operations safe here under RCU? The adapter->vf_mvs.l
list is traversed using list_for_each_entry() under rcu_read_lock().
Concurrently, ixgbe_alloc_vf_macvlans() can re-initialize the list head
and add entries using list_add(). Without the appropriate RCU list
primitives, concurrent list modifications can cause data races and
corrupted pointers.

[ ... ]
> @@ -952,14 +1013,20 @@ static int ixgbe_set_vf_mac_addr(struct ixgbe_adapter *adapter,
>  				 u32 *msgbuf, u32 vf)
>  {
>  	u8 *new_mac = ((u8 *)(&msgbuf[1]));
> +	struct vf_data_storage *vfinfo;
>  
>  	if (!is_valid_ether_addr(new_mac)) {
>  		e_warn(drv, "VF %d attempted to set invalid mac\n", vf);
>  		return -1;
>  	}
>  
> -	if (adapter->vfinfo[vf].pf_set_mac && !adapter->vfinfo[vf].trusted &&
> -	    !ether_addr_equal(adapter->vfinfo[vf].vf_mac_addresses, new_mac)) {
> +	lockdep_assert_in_rcu_read_lock();
> +	vfinfo = rcu_dereference(adapter->vfinfo);
> +	if (!vfinfo)
> +		return 0;

[Severity: Medium]
Will this send false ACKs to the VF when vfinfo is NULL? The caller,
ixgbe_rcv_msg_from_vf(), interprets a return value of 0 as success and
sends an ACK (IXGBE_VT_MSGTYPE_ACK) to the VF. Returning 0 here means
the operation is silently ignored but acknowledged as successful. Could
this return an error like -EINVAL instead?
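
The effect of the return value can be shown with a small userspace
sketch, modelled on the caller behaviour described above. MSG_ACK,
MSG_NACK, set_mac_handler() and reply_for() are hypothetical names and
values chosen for illustration; they are not the driver's definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reply flags for this sketch only. */
#define MSG_ACK  0x80000000u
#define MSG_NACK 0x40000000u

/* Handler: report an error when the VF table is gone. */
static int set_mac_handler(const void *vfinfo)
{
	if (!vfinfo)
		return -EINVAL;	/* returning 0 here would be ACKed */
	return 0;
}

/* Dispatcher: zero means success (ACK), anything else is a NACK. */
static uint32_t reply_for(uint32_t msg, int retval)
{
	return msg | (retval ? MSG_NACK : MSG_ACK);
}
```

With the error return, a request that arrives while vfinfo is NULL is
answered with a NACK, so the VF does not assume the MAC change took
effect.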


end of thread, other threads:[~2026-05-23  0:16 UTC | newest]

Thread overview: 21+ messages
2026-05-20 18:34 [PATCH net 0/8][pull request] Intel Wired LAN Driver Updates 2026-05-20 (ice, iavf, i40e, ixgbe) Tony Nguyen
2026-05-20 18:34 ` [PATCH net 1/8] ice: fix UAF/NULL deref when VSI rebuild and XDP attach race Tony Nguyen
2026-05-21 15:37   ` Jakub Kicinski
2026-05-23  0:16   ` Jakub Kicinski
2026-05-20 18:34 ` [PATCH net 2/8] ice: fix stats array overflow when VF requests more queues Tony Nguyen
2026-05-23  0:16   ` Jakub Kicinski
2026-05-20 18:34 ` [PATCH net 3/8] iavf: return EBUSY if reset in progress or not ready during MAC change Tony Nguyen
2026-05-20 18:34 ` [PATCH net 4/8] i40e: skip unnecessary VF reset when setting trust Tony Nguyen
2026-05-23  0:16   ` Jakub Kicinski
2026-05-23  0:16   ` Jakub Kicinski
2026-05-20 18:34 ` [PATCH net 5/8] iavf: send MAC change request synchronously Tony Nguyen
2026-05-23  0:16   ` Jakub Kicinski
2026-05-23  0:16   ` Jakub Kicinski
2026-05-20 18:34 ` [PATCH net 6/8] ice: skip unnecessary VF reset when setting trust Tony Nguyen
2026-05-23  0:16   ` Jakub Kicinski
2026-05-23  0:16   ` Jakub Kicinski
2026-05-20 18:34 ` [PATCH net 7/8] i40e: set supported_extts_flags for rising edge Tony Nguyen
2026-05-23  0:16   ` Jakub Kicinski
2026-05-20 18:34 ` [PATCH net 8/8] ixgbe: only access vfinfo and mv_list under RCU lock Tony Nguyen
2026-05-23  0:16   ` Jakub Kicinski
2026-05-23  0:16   ` Jakub Kicinski
