Netdev List

* Re: [Intel-wired-lan] [PATCH net] ixgbe: only access vfinfo and mv_list under RCU lock
From: Corinna Vinschen @ 2026-04-16 10:42 UTC (permalink / raw)
  To: Loktionov, Aleksandr
  Cc: intel-wired-lan@lists.osuosl.org, netdev@vger.kernel.org,
	Corinna Vinschen
In-Reply-To: <IA3PR11MB898633AB0B2D010F495944E8E5232@IA3PR11MB8986.namprd11.prod.outlook.com>

On Apr 16 09:23, Loktionov, Aleksandr wrote:
> > From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf
> > [...]
> > @@ -9744,17 +9781,23 @@ static int ixgbe_ndo_get_vf_stats(struct
> > net_device *netdev, int vf,
> >  				  struct ifla_vf_stats *vf_stats)
> >  {
> >  	struct ixgbe_adapter *adapter = ixgbe_from_netdev(netdev);
> > +	struct vf_data_storage *vfinfo;
> > 
> >  	if (vf < 0 || vf >= adapter->num_vfs)
> >  		return -EINVAL;
> > 
> > -	vf_stats->rx_packets = adapter->vfinfo[vf].vfstats.gprc;
> > -	vf_stats->rx_bytes   = adapter->vfinfo[vf].vfstats.gorc;
> > -	vf_stats->tx_packets = adapter->vfinfo[vf].vfstats.gptc;
> > -	vf_stats->tx_bytes   = adapter->vfinfo[vf].vfstats.gotc;
> > -	vf_stats->multicast  = adapter->vfinfo[vf].vfstats.mprc;
> > +	rcu_read_lock();
> > +	vfinfo = rcu_dereference(adapter->vfinfo);
> > +	if (vfinfo) {
> > +		vf_stats->rx_packets = vfinfo[vf].vfstats.gprc;
> > +		vf_stats->rx_bytes   = vfinfo[vf].vfstats.gorc;
> > +		vf_stats->tx_packets = vfinfo[vf].vfstats.gptc;
> > +		vf_stats->tx_bytes   = vfinfo[vf].vfstats.gotc;
> > +		vf_stats->multicast  = vfinfo[vf].vfstats.mprc;
> > +	}
> > +	rcu_read_unlock();
> > 
> > -	return 0;
> > +	return vfinfo ? 0 : -EINVAL;
> Previously this always returned success, but now it will break 'ip link show dev' in the short window while SR-IOV is being torn down.
> To me this looks like a UAPI regression.

Good point.  I'll change that back for a v2, just waiting for more
feedback.


Thanks,
Corinna



* Re: [PATCH net 1/1] net: bridge: use a stable FDB dst snapshot in RCU readers
From: Paolo Abeni @ 2026-04-16 10:41 UTC (permalink / raw)
  To: Ido Schimmel, Ren Wei
  Cc: bridge, netdev, razor, davem, edumazet, kuba, horms,
	makita.toshiaki, vyasevic, yifanwucs, tomapufckgml, yuantan098,
	bird, enjou1224z, zcliangcn
In-Reply-To: <20260414074722.GA321402@shredder>

On 4/14/26 10:05 AM, Ido Schimmel wrote:
> On Mon, Apr 13, 2026 at 05:08:46PM +0800, Ren Wei wrote:
>> From: Zhengchuan Liang <zcliangcn@gmail.com>
>>
>> Local FDB entries can be rewritten in place by `fdb_delete_local()`, which
>> updates `f->dst` to another port or to `NULL` while keeping the entry
>> alive. Several bridge RCU readers inspect `f->dst`, including
>> `br_fdb_fillbuf()` through the `brforward_read()` sysfs path.
>>
>> These readers currently load `f->dst` multiple times and can therefore
>> observe inconsistent values across the check and later dereference.
>> In `br_fdb_fillbuf()`, this means a concurrent local-FDB update can change
>> `f->dst` after the NULL check and before the `port_no` dereference,
>> leading to a NULL-ptr-deref.
>>
>> Fix this by taking a single `READ_ONCE()` snapshot of `f->dst` in each
>> affected RCU reader and using that snapshot for the rest of the access
>> sequence. Also publish the in-place `f->dst` updates in `fdb_delete_local()`
>> with `WRITE_ONCE()` so the readers and writer use matching access patterns.
> 
> Sashiko is complaining [1] about missing READ_ONCE() annotations in some
> places, but I can handle them in net-next in a similar fashion to commit
> 3e19ae7c6fd6 ("net: bridge: use READ_ONCE() and WRITE_ONCE() compiler
> barriers for fdb->dst").

I agree they can be handled separately, because they don't look harmful.
I think a 'net' patch could be used for such a follow-up (data race).

/P
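
A minimal userspace sketch of the single-snapshot pattern described in the quoted commit message, using C11 relaxed atomics as a rough analogue of READ_ONCE()/WRITE_ONCE(). The types `struct port` and `struct fdb_entry` and the helpers `fdb_set_dst()` and `fdb_port_no()` are hypothetical stand-ins for illustration, not the bridge code, and the -1 return for a NULL destination is an assumption of this sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the bridge structures; not the kernel types. */
struct port {
	int port_no;
};

struct fdb_entry {
	_Atomic(struct port *) dst;	/* may be rewritten in place by a writer */
};

/* Writer side: publish the new destination with one store, the userspace
 * analogue of WRITE_ONCE().
 */
static void fdb_set_dst(struct fdb_entry *f, struct port *p)
{
	assert(f != NULL);
	atomic_store_explicit(&f->dst, p, memory_order_relaxed);
}

/* Reader side: load f->dst exactly once and use only that snapshot for
 * both the NULL check and the dereference. Returns the port number, or
 * -1 when the entry has no destination port.
 */
static int fdb_port_no(struct fdb_entry *f)
{
	struct port *dst;

	assert(f != NULL);
	dst = atomic_load_explicit(&f->dst, memory_order_relaxed);
	if (!dst)
		return -1;
	return dst->port_no;
}
```

Because the check and the dereference both use the local `dst`, a concurrent `fdb_set_dst(f, NULL)` cannot slip in between them. The lifetime of the pointed-to port is a separate question, which RCU covers in the kernel and which this sketch does not model.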



* Re: [PATCH net 00/14] Netfilter/IPVS fixes for net
From: Florian Westphal @ 2026-04-16 10:40 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: netfilter-devel, davem, netdev, kuba, pabeni, edumazet, horms
In-Reply-To: <aeC4A75gYD9qT5OR@chamomile>

Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> I cannot send a batch before 16h my local time, I need a bit more
> time.
>
> Sorry.

No problem.  The alternative is to drop patches, which is what I did in
the past: when some LLM comment indicates a problem, remove the patch
from v2 and defer it to next week.

But that was before LLM reviews flagged 50% of patches.
I'll pick up anything left behind for next week's batch(es).


* Re: [PATCH v25 06/11] cxl/hdm: Add support for getting region from committed decoder
From: Alejandro Lucero Palau @ 2026-04-16 10:37 UTC (permalink / raw)
  To: Dan Williams, alejandro.lucero-palau, linux-cxl, netdev,
	dave.jiang, edward.cree, davem, kuba, pabeni, edumazet
In-Reply-To: <69ccaabf8d9cc_1b0cc6100fd@dwillia2-mobl4.notmuch>


On 4/1/26 06:18, Dan Williams wrote:
> alejandro.lucero-palau@ wrote:
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> A Type2 device configured by the BIOS can have its HDM committed and
>> a cxl region linked by auto discovery when the device memdev is created.
>>
>> Add a function for a Type2 driver to obtain such a region.
>>
>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> [..]
>> +/**
>> + * cxl_get_region_from_committed_decoder - obtain a pointer to a region
>> + * @cxlmd: CXL memdev from an endpoint device
>> + *
>> + * An accelerator decoder can be set up by the firmware/BIOS and the auto
>> + * discovery region creation triggered by the memdev object initialization.
>> + * Using this function the related driver can obtain such a region.
>> + *
>> + * Only one committed HDM is expected, returning the first one found.
>> + *
>> + * Return pointer to a region or NULL
>> + */
>> +struct cxl_region *cxl_get_region_from_committed_decoder(struct cxl_memdev *cxlmd)
>> +{
>> +	struct cxl_port *endpoint = cxlmd->endpoint;
>> +	struct cxl_endpoint_decoder *cxled;
>> +
>> +	if (!endpoint)
>> +		return NULL;
>> +
>> +	guard(rwsem_read)(&cxl_rwsem.dpa);
>> +	struct device *cxled_dev __free(put_device) =
>> +		device_find_child(&endpoint->dev, NULL,
>> +				  find_committed_endpoint_decoder);
>> +
>> +	if (!cxled_dev)
>> +		return NULL;
>> +
>> +	cxled = to_cxl_endpoint_decoder(cxled_dev);
>> +
>> +	return cxled->cxld.region;
>> +}
>> +EXPORT_SYMBOL_NS_GPL(cxl_get_region_from_committed_decoder, "CXL");
> As soon as this function returns there is no guarantee that the region
> is still valid.


From my comment on your patch trying to solve this problem:


https://lore.kernel.org/linux-cxl/20260403210050.1058650-1-dan.j.williams@intel.com/T/#m467dc88199645865a05b5fef858a67f3a608895b


I think the protection for getting the region needs to cover its use
by the type2 driver, which is likely the ioremapping and some internal
updates based on the success of these "atomic" actions. If the region
disappears after that point, the same type2 driver should get an event
to stop that usage, which can be implemented as a callback/action
for the cxl region device removal, as done in v17.


If we agree on this, the main issue is how to implement it. I do not
think adding specific functions or params covering the different
possibilities (ioremap, ioremap_wc, ...) is desirable, and it would lack
the flexibility the type2 driver needs for syncing things internally
against the region removal race. I wonder if we could have something
like this:


cxl_get_locked_region_from_committed_decoder()   --> implying locking on 
cxlrd->dev, if the region exists


cxl_release_region_lock  --> implying unlocking cxlrd->dev


  to be used by the type2 driver allowing the safe ioremap and any 
internal variable updates between them.



* [PATCH net] ipv6: fix possible UAF in icmpv6_rcv()
From: Eric Dumazet @ 2026-04-16 10:35 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, David Ahern, Ido Schimmel, netdev, eric.dumazet,
	Eric Dumazet

Caching saddr and daddr before pskb_pull() is problematic
since skb->head can change.

Remove these temporary variables:

- We only access &ipv6_hdr(skb)->saddr and &ipv6_hdr(skb)->daddr
  when net_dbg_ratelimited() is called in the slow path.

- Avoid potential future misuse after pskb_pull() call.

Fixes: 4b3418fba0fe ("ipv6: icmp: include addresses in debug messages")
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv6/icmp.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c
index 799d9e9ac45d11f7b460da7d8a7aeeaf0eb50f2f..efb23807a0262e8d68aa1afc8d96ee94eab89d50 100644
--- a/net/ipv6/icmp.c
+++ b/net/ipv6/icmp.c
@@ -1104,7 +1104,6 @@ static int icmpv6_rcv(struct sk_buff *skb)
 	struct net *net = dev_net_rcu(skb->dev);
 	struct net_device *dev = icmp6_dev(skb);
 	struct inet6_dev *idev = __in6_dev_get(dev);
-	const struct in6_addr *saddr, *daddr;
 	struct icmp6hdr *hdr;
 	u8 type;
 
@@ -1135,12 +1134,10 @@ static int icmpv6_rcv(struct sk_buff *skb)
 
 	__ICMP6_INC_STATS(dev_net_rcu(dev), idev, ICMP6_MIB_INMSGS);
 
-	saddr = &ipv6_hdr(skb)->saddr;
-	daddr = &ipv6_hdr(skb)->daddr;
-
 	if (skb_checksum_validate(skb, IPPROTO_ICMPV6, ip6_compute_pseudo)) {
 		net_dbg_ratelimited("ICMPv6 checksum failed [%pI6c > %pI6c]\n",
-				    saddr, daddr);
+				    &ipv6_hdr(skb)->saddr,
+				    &ipv6_hdr(skb)->daddr);
 		goto csum_error;
 	}
 
@@ -1220,7 +1217,8 @@ static int icmpv6_rcv(struct sk_buff *skb)
 			break;
 
 		net_dbg_ratelimited("icmpv6: msg of unknown type [%pI6c > %pI6c]\n",
-				    saddr, daddr);
+				    &ipv6_hdr(skb)->saddr,
+				    &ipv6_hdr(skb)->daddr);
 
 		/*
 		 * error of unknown type.
-- 
2.54.0.rc1.513.gad8abe7a5a-goog
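
A minimal userspace sketch of the hazard described in the commit message above, assuming a simplified buffer whose storage can move. `struct buf`, `buf_hdr()` and `buf_pull_realloc()` are hypothetical stand-ins for an skb, ipv6_hdr() and a pull that reallocates skb->head; they are not kernel APIs.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an skb: a buffer whose storage may move. */
struct buf {
	unsigned char *head;
	size_t len;
};

/* Re-derive the header pointer from b->head on every use, instead of
 * caching it across calls that may replace the storage.
 */
static const unsigned char *buf_hdr(const struct buf *b)
{
	assert(b && b->head);
	return b->head;
}

/* Model of a pull that needs more room: the data always moves to a fresh
 * allocation and the old storage is freed, so any pointer taken before
 * the call dangles afterwards. Returns 0 on success, -1 on failure.
 */
static int buf_pull_realloc(struct buf *b, size_t extra)
{
	unsigned char *n = malloc(b->len + extra);

	if (!n)
		return -1;
	memcpy(n, b->head, b->len);
	memset(n + b->len, 0, extra);
	free(b->head);
	b->head = n;
	b->len += extra;
	return 0;
}
```

A pointer saved from `buf_hdr()` before `buf_pull_realloc()` would refer to freed memory afterwards. Calling `buf_hdr()` again at the point of use, as the patch does with `&ipv6_hdr(skb)->saddr`, avoids that.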



* Re: [RFC PATCH 1/2] kernel/notifier: replace single-linked list with double-linked list for reverse traversal
From: Petr Mladek @ 2026-04-16 10:33 UTC (permalink / raw)
  To: chensong_2000
  Cc: rafael, lenb, mturquette, sboyd, viresh.kumar, agk, snitzer,
	mpatocka, bmarzins, song, yukuai, linan122, jason.wessel, danielt,
	dianders, horms, davem, edumazet, kuba, pabeni, paulmck, frederic,
	mcgrof, petr.pavlu, da.gomez, samitolvanen, atomlin, jpoimboe,
	jikos, mbenes, joe.lawrence, rostedt, mhiramat, mark.rutland,
	mathieu.desnoyers, linux-modules, linux-kernel,
	linux-trace-kernel, linux-acpi, linux-clk, linux-pm,
	live-patching, dm-devel, linux-raid, kgdb-bugreport, netdev
In-Reply-To: <20260415070137.17860-1-chensong_2000@189.cn>

On Wed 2026-04-15 15:01:37, chensong_2000@189.cn wrote:
> From: Song Chen <chensong_2000@189.cn>
> 
> The current notifier chain implementation uses a single-linked list
> (struct notifier_block *next), which only supports forward traversal
> in priority order. This makes it difficult to handle cleanup/teardown
> scenarios that require notifiers to be called in reverse priority order.
> 
> A concrete example is the ordering dependency between ftrace and
> livepatch during module load/unload. See the details in [1].
> 
> This patch replaces the single-linked list in struct notifier_block
> with a struct list_head, converting the notifier chain into a
> doubly-linked list sorted in descending priority order. Based on
> this, a new function notifier_call_chain_reverse() is introduced,
> which traverses the chain in reverse (ascending priority order).
> The corresponding blocking_notifier_call_chain_reverse() is also
> added as the locking wrapper for blocking notifier chains.
> 
> The internal notifier_call_chain_robust() is updated to use
> notifier_call_chain_reverse() for rollback: on error, it records
> the failing notifier (last_nb) and the count of successfully called
> notifiers (nr), then rolls back exactly those nr-1 notifiers in
> reverse order starting from last_nb's predecessor, without needing
> to know the total length of the chain.
> 
> With this change, subsystems with symmetric setup/teardown ordering
> requirements can register a single notifier_block with one priority
> value, and rely on blocking_notifier_call_chain() for forward
> traversal and blocking_notifier_call_chain_reverse() for reverse
> traversal, without needing hard-coded call sequences or separate
> notifier registrations for each direction.
> 
> [1]: https://lore.kernel.org/all/alpine.LNX.2.00.1602172216491.22700@cbobk.fhfr.pm/
> 
> --- a/include/linux/notifier.h
> +++ b/include/linux/notifier.h
> @@ -53,41 +53,41 @@ typedef	int (*notifier_fn_t)(struct notifier_block *nb,
[...]
>  struct notifier_block {
>  	notifier_fn_t notifier_call;
> -	struct notifier_block __rcu *next;
> +	struct list_head __rcu entry;
>  	int priority;
>  };
[...]
>  #define ATOMIC_INIT_NOTIFIER_HEAD(name) do {	\
>  		spin_lock_init(&(name)->lock);	\
> -		(name)->head = NULL;		\
> +		INIT_LIST_HEAD(&(name)->head);		\

I would expect the RCU variant here, aka INIT_LIST_HEAD_RCU().

> --- a/kernel/notifier.c
> +++ b/kernel/notifier.c
> @@ -14,39 +14,47 @@
>   *	are layered on top of these, with appropriate locking added.
>   */
>  
> -static int notifier_chain_register(struct notifier_block **nl,
> +static int notifier_chain_register(struct list_head *nl,
>  				   struct notifier_block *n,
>  				   bool unique_priority)
>  {
> -	while ((*nl) != NULL) {
> -		if (unlikely((*nl) == n)) {
> +	struct notifier_block *cur;
> +
> +	list_for_each_entry(cur, nl, entry) {
> +		if (unlikely(cur == n)) {
>  			WARN(1, "notifier callback %ps already registered",
>  			     n->notifier_call);
>  			return -EEXIST;
>  		}
> -		if (n->priority > (*nl)->priority)
> -			break;
> -		if (n->priority == (*nl)->priority && unique_priority)
> +
> +		if (n->priority == cur->priority && unique_priority)
>  			return -EBUSY;
> -		nl = &((*nl)->next);
> +
> +		if (n->priority > cur->priority) {
> +			list_add_tail(&n->entry, &cur->entry);
> +			goto out;
> +		}
>  	}
> -	n->next = *nl;
> -	rcu_assign_pointer(*nl, n);
> +
> +	list_add_tail(&n->entry, nl);

I would expect list_add_tail_rcu() here.

> @@ -59,25 +67,25 @@ static int notifier_chain_unregister(struct notifier_block **nl,
>   *			value of this parameter is -1.
>   *	@nr_calls:	Records the number of notifications sent. Don't care
>   *			value of this field is NULL.
> + *	@last_nb:  Records the last called notifier block for rolling back
>   *	Return:		notifier_call_chain returns the value returned by the
>   *			last notifier function called.
>   */
> -static int notifier_call_chain(struct notifier_block **nl,
> +static int notifier_call_chain(struct list_head *nl,
>  			       unsigned long val, void *v,
> -			       int nr_to_call, int *nr_calls)
> +			       int nr_to_call, int *nr_calls,
> +				   struct notifier_block **last_nb)
>  {
>  	int ret = NOTIFY_DONE;
> -	struct notifier_block *nb, *next_nb;
> -
> -	nb = rcu_dereference_raw(*nl);
> +	struct notifier_block *nb;
>  
> -	while (nb && nr_to_call) {
> -		next_nb = rcu_dereference_raw(nb->next);
> +	if (!nr_to_call)
> +		return ret;
>  
> +	list_for_each_entry(nb, nl, entry) {

I would expect the RCU variant here, aka list_for_each_entry_rcu().

These are just a few random examples which I found with a quick look.

I guess that the notifier API is very old and does not use all
the RCU API features which allow tracking safety when
CONFIG_PROVE_RCU and CONFIG_PROVE_RCU_LIST are enabled.

It might actually be worth auditing the code and making it right.

>  #ifdef CONFIG_DEBUG_NOTIFIERS
>  		if (unlikely(!func_ptr_is_kernel_text(nb->notifier_call))) {
>  			WARN(1, "Invalid notifier called!");
> -			nb = next_nb;
>  			continue;
>  		}
>  #endif

That said, I am not sure if the ftrace/livepatching handlers are
the right motivation for this, especially when I see the
complexity of the 2nd patch [*].

After thinking more about it, I am not even sure that the ftrace and
livepatching callbacks are good candidates for generic notifiers.
They are too special. It is not only about ordering them against
each other, but also about ordering them against other
notifiers. The ftrace/livepatching callbacks must be the first/last
during module load/release.

[*] The 2nd patch is not archived by lore for some reason.
    I have found only a review but it gives a good picture, see
    https://lore.kernel.org/all/1191caf5-6a61-4622-a15e-854d3701f4fc@suse.com/

Best Regards,
Petr
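
For reference, a minimal userspace sketch of the idea in the quoted patch description: a priority-sorted doubly-linked chain that can be walked forward (descending priority) and in reverse (ascending priority). It deliberately has no locking and no RCU, so it does not address the review comments above. All names (`struct nb`, `struct chain`, `chain_register()`, `chain_call()`, `chain_call_reverse()`, `record()`) are hypothetical and are not the kernel notifier API.

```c
#include <assert.h>
#include <stddef.h>

struct nb;
typedef void (*notifier_fn)(struct nb *nb, void *data);

/* Hypothetical userspace model of a notifier block. */
struct nb {
	notifier_fn call;
	int priority;
	struct nb *prev, *next;
};

struct chain {
	struct nb head;		/* sentinel; head.next has the highest priority */
};

static void chain_init(struct chain *c)
{
	c->head.prev = c->head.next = &c->head;
}

/* Insert n before the first entry with a strictly lower priority, which
 * keeps the list sorted in descending priority order.
 */
static void chain_register(struct chain *c, struct nb *n)
{
	struct nb *cur = c->head.next;

	while (cur != &c->head && cur->priority >= n->priority)
		cur = cur->next;
	n->next = cur;
	n->prev = cur->prev;
	cur->prev->next = n;
	cur->prev = n;
}

/* Forward traversal: descending priority. */
static void chain_call(struct chain *c, void *data)
{
	for (struct nb *cur = c->head.next; cur != &c->head; cur = cur->next)
		cur->call(cur, data);
}

/* Reverse traversal: ascending priority. */
static void chain_call_reverse(struct chain *c, void *data)
{
	for (struct nb *cur = c->head.prev; cur != &c->head; cur = cur->prev)
		cur->call(cur, data);
}

/* Example callback: record the order in which blocks are called. */
struct order_log {
	int prio[8];
	int n;
};

static void record(struct nb *nb, void *data)
{
	struct order_log *log = data;

	assert(log->n < 8);
	log->prio[log->n++] = nb->priority;
}
```

With blocks of priority 1, 3 and 2 registered in that order, the forward walk calls them as 3, 2, 1 and the reverse walk as 1, 2, 3, using a single registration per block.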


* [PATCH net v2] net: airoha: Fix possible TX queue stall in airoha_qdma_tx_napi_poll()
From: Lorenzo Bianconi @ 2026-04-16 10:30 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Lorenzo Bianconi
  Cc: linux-arm-kernel, linux-mediatek, netdev

Since multiple net_device TX queues can share the same hw QDMA TX queue,
there is no guarantee we have inflight packets queued in hw belonging to a
net_device TX queue stopped in the xmit path because hw QDMA TX queue
can be full. In this corner case the net_device TX queue will never be
re-activated. In order to avoid any potential net_device TX queue stall,
we need to wake all the net_device TX queues feeding the same hw QDMA TX
queue in airoha_qdma_tx_napi_poll routine.

Fixes: 23020f0493270 ("net: airoha: Introduce ethernet support for EN7581 SoC")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
Changes in v2:
- Add txq_stopped parameter to avoid any possible corner cases where the
  netdev queue stalls.
- Link to v1: https://lore.kernel.org/r/20260413-airoha-txq-potential-stall-v1-1-7830363b1543@kernel.org
---
 drivers/net/ethernet/airoha/airoha_eth.c | 37 +++++++++++++++++++++++++++-----
 drivers/net/ethernet/airoha/airoha_eth.h |  1 +
 2 files changed, 33 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
index e1ab15f1ee7d..19f67c7dd8e1 100644
--- a/drivers/net/ethernet/airoha/airoha_eth.c
+++ b/drivers/net/ethernet/airoha/airoha_eth.c
@@ -843,6 +843,21 @@ static int airoha_qdma_init_rx(struct airoha_qdma *qdma)
 	return 0;
 }
 
+static void airoha_qdma_wake_netdev_txqs(struct airoha_queue *q)
+{
+	struct airoha_qdma *qdma = q->qdma;
+	struct airoha_eth *eth = qdma->eth;
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(eth->ports); i++) {
+		struct airoha_gdm_port *port = eth->ports[i];
+
+		if (port && port->qdma == qdma)
+			netif_tx_wake_all_queues(port->dev);
+	}
+	q->txq_stopped = false;
+}
+
 static int airoha_qdma_tx_napi_poll(struct napi_struct *napi, int budget)
 {
 	struct airoha_tx_irq_queue *irq_q;
@@ -919,12 +934,21 @@ static int airoha_qdma_tx_napi_poll(struct napi_struct *napi, int budget)
 
 			txq = netdev_get_tx_queue(skb->dev, queue);
 			netdev_tx_completed_queue(txq, 1, skb->len);
-			if (netif_tx_queue_stopped(txq) &&
-			    q->ndesc - q->queued >= q->free_thr)
-				netif_tx_wake_queue(txq);
-
 			dev_kfree_skb_any(skb);
 		}
+
+		if (q->txq_stopped && q->ndesc - q->queued >= q->free_thr) {
+			/* Since multiple net_device TX queues can share the
+			 * same hw QDMA TX queue, there is no guarantee we have
+			 * inflight packets queued in hw belonging to a
+			 * net_device TX queue stopped in the xmit path.
+			 * In order to avoid any potential net_device TX queue
+			 * stall, we need to wake all the net_device TX queues
+			 * feeding the same hw QDMA TX queue.
+			 */
+			airoha_qdma_wake_netdev_txqs(q);
+		}
+
 unlock:
 		spin_unlock_bh(&q->lock);
 	}
@@ -1984,6 +2008,7 @@ static netdev_tx_t airoha_dev_xmit(struct sk_buff *skb,
 	if (q->queued + nr_frags >= q->ndesc) {
 		/* not enough space in the queue */
 		netif_tx_stop_queue(txq);
+		q->txq_stopped = true;
 		spin_unlock_bh(&q->lock);
 		return NETDEV_TX_BUSY;
 	}
@@ -2039,8 +2064,10 @@ static netdev_tx_t airoha_dev_xmit(struct sk_buff *skb,
 				TX_RING_CPU_IDX_MASK,
 				FIELD_PREP(TX_RING_CPU_IDX_MASK, index));
 
-	if (q->ndesc - q->queued < q->free_thr)
+	if (q->ndesc - q->queued < q->free_thr) {
 		netif_tx_stop_queue(txq);
+		q->txq_stopped = true;
+	}
 
 	spin_unlock_bh(&q->lock);
 
diff --git a/drivers/net/ethernet/airoha/airoha_eth.h b/drivers/net/ethernet/airoha/airoha_eth.h
index 95e557638617..87b328cfefb0 100644
--- a/drivers/net/ethernet/airoha/airoha_eth.h
+++ b/drivers/net/ethernet/airoha/airoha_eth.h
@@ -193,6 +193,7 @@ struct airoha_queue {
 	int ndesc;
 	int free_thr;
 	int buf_size;
+	bool txq_stopped;
 
 	struct napi_struct napi;
 	struct page_pool *page_pool;

---
base-commit: 1f5ffc672165ff851063a5fd044b727ab2517ae3
change-id: 20260407-airoha-txq-potential-stall-ad52c53094e8

Best regards,
-- 
Lorenzo Bianconi <lorenzo@kernel.org>
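
A minimal userspace sketch of the stall scenario and the fix described in the commit message above, assuming a simplified model in which several software TX queues feed one hardware ring. `struct sw_txq`, `struct hw_queue`, `hw_xmit()` and `hw_complete()` are hypothetical names for illustration and do not mirror the driver's real structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a net_device TX queue. */
struct sw_txq {
	bool stopped;
};

/* Hypothetical model of a hw TX ring shared by several software queues. */
struct hw_queue {
	int ndesc;		/* ring size */
	int queued;		/* descriptors currently in flight */
	int free_thr;		/* free descriptors required before waking */
	bool txq_stopped;	/* some software queue was stopped on this ring */
	struct sw_txq *txqs;	/* all software queues feeding this ring */
	size_t ntxqs;
};

/* Transmit path: returns false (busy) when the ring has no room. */
static bool hw_xmit(struct hw_queue *q, struct sw_txq *txq, int nr_frags)
{
	if (q->queued + nr_frags >= q->ndesc) {
		txq->stopped = true;
		q->txq_stopped = true;
		return false;
	}
	q->queued += nr_frags;
	if (q->ndesc - q->queued < q->free_thr) {
		txq->stopped = true;
		q->txq_stopped = true;
	}
	return true;
}

/* Completion path: once enough descriptors are free, wake every software
 * queue sharing the ring, not only the ones whose packets completed.
 */
static void hw_complete(struct hw_queue *q, int ndone)
{
	assert(ndone >= 0 && ndone <= q->queued);
	q->queued -= ndone;
	if (q->txq_stopped && q->ndesc - q->queued >= q->free_thr) {
		for (size_t i = 0; i < q->ntxqs; i++)
			q->txqs[i].stopped = false;
		q->txq_stopped = false;
	}
}
```

If one software queue fills the ring and a second one is stopped without ever having a packet in flight, a wake-up keyed on completed packets would never reach the second queue. Waking all queues from the completion path covers that case in this model.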



* [PATCH net] slip: bound decode() reads against the compressed packet length
From: Weiming Shi @ 2026-04-16 10:01 UTC (permalink / raw)
  To: Andrew Lunn, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni
  Cc: Simon Horman, netdev, Weiming Shi

slhc_uncompress() parses a VJ-compressed TCP header by advancing a
pointer through the packet via decode() and pull16(). Neither helper
bounds-checks against isize, and decode() masks its return with
& 0xffff so it can never return the -1 that callers test for -- those
error paths are dead code.

A short compressed frame whose change byte requests optional fields
lets decode() read past the end of the packet. The over-read bytes
are folded into the cached cstate and reflected into subsequent
reconstructed packets.

Make decode() and pull16() take the packet end pointer and return -1
when exhausted. Add a bounds check before the TCP-checksum read.
The existing == -1 tests now do what they were always meant to.

Fixes: b5451d783ade ("slip: Move the SLIP drivers")
Reported-by: Simon Horman <horms@kernel.org>
Closes: https://lore.kernel.org/netdev/20260414134126.758795-2-horms@kernel.org/
Signed-off-by: Weiming Shi <bestswngs@gmail.com>
---
 drivers/net/slip/slhc.c | 43 ++++++++++++++++++++++++-----------------
 1 file changed, 25 insertions(+), 18 deletions(-)

diff --git a/drivers/net/slip/slhc.c b/drivers/net/slip/slhc.c
index e3c785da3eef3..d1c0523085d3a 100644
--- a/drivers/net/slip/slhc.c
+++ b/drivers/net/slip/slhc.c
@@ -80,9 +80,9 @@
 #include <linux/unaligned.h>
 
 static unsigned char *encode(unsigned char *cp, unsigned short n);
-static long decode(unsigned char **cpp);
+static long decode(unsigned char **cpp, const unsigned char *end);
 static unsigned char * put16(unsigned char *cp, unsigned short x);
-static unsigned short pull16(unsigned char **cpp);
+static long pull16(unsigned char **cpp, const unsigned char *end);
 
 /* Allocate compression data structure
  *	slots must be in range 0 to 255 (zero meaning no compression)
@@ -190,30 +190,34 @@ encode(unsigned char *cp, unsigned short n)
 	return cp;
 }
 
-/* Pull a 16-bit integer in host order from buffer in network byte order */
-static unsigned short
-pull16(unsigned char **cpp)
+/* Pull a 16-bit integer in host order from buffer in network byte order.
+ * Returns -1 if the buffer is exhausted, otherwise the 16-bit value.
+ */
+static long
+pull16(unsigned char **cpp, const unsigned char *end)
 {
-	short rval;
+	long rval;
 
+	if (*cpp + 2 > end)
+		return -1;
 	rval = *(*cpp)++;
 	rval <<= 8;
 	rval |= *(*cpp)++;
 	return rval;
 }
 
-/* Decode a number */
+/* Decode a number. Returns -1 if the buffer is exhausted. */
 static long
-decode(unsigned char **cpp)
+decode(unsigned char **cpp, const unsigned char *end)
 {
 	int x;
 
+	if (*cpp >= end)
+		return -1;
 	x = *(*cpp)++;
-	if(x == 0){
-		return pull16(cpp) & 0xffff;	/* pull16 returns -1 on error */
-	} else {
-		return x & 0xff;		/* -1 if PULLCHAR returned error */
-	}
+	if (x == 0)
+		return pull16(cpp, end);
+	return x & 0xff;
 }
 
 /*
@@ -499,6 +503,7 @@ slhc_uncompress(struct slcompress *comp, unsigned char *icp, int isize)
 	struct cstate *cs;
 	int len, hdrlen;
 	unsigned char *cp = icp;
+	const unsigned char *end = icp + isize;
 
 	/* We've got a compressed packet; read the change byte */
 	comp->sls_i_compressed++;
@@ -534,6 +539,8 @@ slhc_uncompress(struct slcompress *comp, unsigned char *icp, int isize)
 	thp = &cs->cs_tcp;
 	ip = &cs->cs_ip;
 
+	if (cp + 2 > end)
+		goto bad;
 	thp->check = *(__sum16 *)cp;
 	cp += 2;
 
@@ -564,26 +571,26 @@ slhc_uncompress(struct slcompress *comp, unsigned char *icp, int isize)
 	default:
 		if(changes & NEW_U){
 			thp->urg = 1;
-			if((x = decode(&cp)) == -1) {
+			if((x = decode(&cp, end)) == -1) {
 				goto bad;
 			}
 			thp->urg_ptr = htons(x);
 		} else
 			thp->urg = 0;
 		if(changes & NEW_W){
-			if((x = decode(&cp)) == -1) {
+			if((x = decode(&cp, end)) == -1) {
 				goto bad;
 			}
 			thp->window = htons( ntohs(thp->window) + x);
 		}
 		if(changes & NEW_A){
-			if((x = decode(&cp)) == -1) {
+			if((x = decode(&cp, end)) == -1) {
 				goto bad;
 			}
 			thp->ack_seq = htonl( ntohl(thp->ack_seq) + x);
 		}
 		if(changes & NEW_S){
-			if((x = decode(&cp)) == -1) {
+			if((x = decode(&cp, end)) == -1) {
 				goto bad;
 			}
 			thp->seq = htonl( ntohl(thp->seq) + x);
@@ -591,7 +598,7 @@ slhc_uncompress(struct slcompress *comp, unsigned char *icp, int isize)
 		break;
 	}
 	if(changes & NEW_I){
-		if((x = decode(&cp)) == -1) {
+		if((x = decode(&cp, end)) == -1) {
 			goto bad;
 		}
 		ip->id = htons (ntohs (ip->id) + x);
-- 
2.43.0
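
A minimal userspace sketch of the bounds-checked helpers described in the commit message above. It follows the same logic as the patched `decode()` and `pull16()` but is a standalone illustration, not the driver source: the `const` qualifiers, the `assert()` preconditions and the `end - *cpp < 2` form of the length check (which avoids forming a pointer past the buffer) are choices made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Pull a 16-bit value stored in network byte order.
 * Returns -1 if fewer than two bytes remain before end.
 */
static long pull16(const unsigned char **cpp, const unsigned char *end)
{
	long rval;

	assert(cpp && *cpp && end);
	if (end - *cpp < 2)
		return -1;
	rval = *(*cpp)++;
	rval <<= 8;
	rval |= *(*cpp)++;
	return rval;
}

/* Decode a number: a nonzero byte is the value itself, a zero byte is
 * followed by a 16-bit value. Returns -1 if the buffer is exhausted.
 */
static long decode(const unsigned char **cpp, const unsigned char *end)
{
	int x;

	assert(cpp && *cpp && end);
	if (*cpp >= end)
		return -1;
	x = *(*cpp)++;
	if (x == 0)
		return pull16(cpp, end);
	return x & 0xff;
}
```

Since valid results are in the range 0..65535 and the return type is `long`, -1 is unambiguous as an error value, which is what makes the callers' `== -1` tests meaningful.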



* Re: [patch 18/38] lib/tests: Replace get_cycles() with ktime_get()
From: Geert Uytterhoeven @ 2026-04-16 10:24 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Andrew Morton, Uladzislau Rezki, linux-mm, Arnd Bergmann,
	x86, Lu Baolu, iommu, Michael Grzeschik, netdev, linux-wireless,
	Herbert Xu, linux-crypto, Vlastimil Babka, David Woodhouse,
	Bernie Thompson, linux-fbdev, Theodore Tso, linux-ext4,
	Marco Elver, Dmitry Vyukov, kasan-dev, Andrey Ryabinin,
	Thomas Sailer, linux-hams, Jason A. Donenfeld, Richard Henderson,
	linux-alpha, Russell King, linux-arm-kernel, Catalin Marinas,
	Huacai Chen, loongarch, linux-m68k, Dinh Nguyen, Jonas Bonn,
	linux-openrisc, Helge Deller, linux-parisc, Michael Ellerman,
	linuxppc-dev, Paul Walmsley, linux-riscv, Heiko Carstens,
	linux-s390, David S. Miller, sparclinux
In-Reply-To: <20260410120318.794680738@kernel.org>

Hi Thomas,

On Fri, 10 Apr 2026 at 14:20, Thomas Gleixner <tglx@kernel.org> wrote:
> get_cycles() is the historical access to a fine grained time source, but it
> is a suboptimal choice for two reasons:
>
>    - get_cycles() is not guaranteed to be supported and functional on all
>      systems/platforms. If not supported or not functional it returns 0,
>      which makes benchmarking moot.
>
>    - get_cycles() returns the raw counter value of whatever the
>      architecture platform provides. The original x86 Time Stamp Counter
>      (TSC) was despite its name tied to the actual CPU core frequency.
>      That's no longer the case. So the counter value is only meaningful
>      when the CPU operates at the same frequency as the TSC or the value is
>      adjusted to the actual CPU frequency. Other architectures and
>      platforms provide similar disjunct counters via get_cycles(), so the
>      result is operations per BOGO-cycles, which is not really meaningful.
>
> Use ktime_get() instead which provides nanosecond timestamps with the
> granularity of the underlying hardware counter, which is not different to
> the variety of get_cycles() implementations.
>
> This provides at least understandable metrics, i.e. operations/nanoseconds,
> and is available on all platforms. As with get_cycles() the result might
> have to be put into relation with the CPU operating frequency, but that's
> not any different.
>
> This is part of a larger effort to remove get_cycles() usage from
> non-architecture code.
>
> Signed-off-by: Thomas Gleixner <tglx@kernel.org>

Thanks for your patch!

> --- a/lib/interval_tree_test.c
> +++ b/lib/interval_tree_test.c
> @@ -65,13 +65,13 @@ static void init(void)
>  static int basic_check(void)
>  {
>         int i, j;
> -       cycles_t time1, time2, time;
> +       ktime_t time1, time2, time;
>
>         printk(KERN_ALERT "interval tree insert/remove");
>
>         init();
>
> -       time1 = get_cycles();
> +       time1 = ktime_get();
>
>         for (i = 0; i < perf_loops; i++) {
>                 for (j = 0; j < nnodes; j++)
> @@ -80,11 +80,11 @@ static int basic_check(void)
>                         interval_tree_remove(nodes + j, &root);
>         }
>
> -       time2 = get_cycles();
> +       time2 = ktime_get();
>         time = time2 - time1;
>
>         time = div_u64(time, perf_loops);
> -       printk(" -> %llu cycles\n", (unsigned long long)time);
> +       printk(" -> %llu nsecs\n", (unsigned long long)time);

While cycles_t was unsigned long or long long, ktime_t is always s64,
so "%lld", and the cast can be dropped (everywhere).

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
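
A minimal userspace sketch of the format-specifier point made in the review above, assuming timestamps are held as signed 64-bit nanosecond values. `format_avg_ns()` is a hypothetical helper. The cast to `long long` is needed here only because `int64_t` may be plain `long` in userspace; in the kernel, s64 is `long long`, which is why the cast can be dropped there.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Format the average per-loop duration between two nanosecond timestamps.
 * The delta is signed, so it is printed with "%lld" rather than "%llu".
 * Returns the number of characters snprintf() would have written.
 */
static int format_avg_ns(char *buf, size_t len, int64_t start_ns,
			 int64_t end_ns, int loops)
{
	int64_t delta = end_ns - start_ns;

	assert(buf && loops > 0);
	return snprintf(buf, len, " -> %lld nsecs\n",
			(long long)(delta / loops));
}
```

With an unsigned format, a negative delta would be printed as a very large positive number; the signed format keeps such a result recognizable.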


* Re: [PATCH net 00/14] Netfilter/IPVS fixes for net
From: Pablo Neira Ayuso @ 2026-04-16 10:20 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw, horms
In-Reply-To: <aeCPB1_WaFOX-Xos@chamomile>

Hi,

On Thu, Apr 16, 2026 at 09:25:59AM +0200, Pablo Neira Ayuso wrote:
> Hi,
> 
> I am preparing a v2 to address some AI-generated comments; it should be
> ready in a few hours.

Just a quick follow up.

I cannot send a batch before 16h my local time, I need a bit more
time.

Sorry.

> Thanks.
> 
> On Thu, Apr 16, 2026 at 03:30:47AM +0200, Pablo Neira Ayuso wrote:
> > Hi,
> > 
> > The following patchset contains Netfilter/IPVS fixes for net: Mostly
> > addressing very old bugs in the SIP conntrack helper string parser,
> > unsafe arp_tables match support with legacy IEEE1394, restrict xt_realm
> > to IPv4 and incorrect use of RCU lists in nat core and nftables. This
> > batch also includes one IPVS MTU fix. The exception is a fix for a
> > recent issue related to broken double-tagged vlan in the flowtable.
> > 
> > 1) Fix possible stack recursion in nft_fwd_netdev from egress path,
> >    from Weiming Shi.
> > 
> > 2) Fix unsafe port parser in SIP helper, from Jenny Guanni Qu.
> > 
> > 3) Fix arp_tables match with IEEE1394 ARP payload, allowing to
> >    reach bytes off the skb boundary, from Weiming Shi.
> > 
> > 4) Reject unsafe nfnetlink_osf configurations from control plane,
> >    this is addressing a possible division by zero, from Xiang Mei.
> > 
> > 5) nft_osf actually only supports IPv4, restrict it.
> > 
> > 6) Fix double-tagged-vlan support (again) in the flowtable, from
> >    Eric Woudstra.
> > 
> > 7) Remove unsafe use of sprintf to fix possible buffer overflow
> >    in the SIP NAT helper, from Florian Westphal.
> > 
> > 8) Restrict xt_mac, xt_owner and xt_physdev to inet families only;
> >    xt_realm is only for ipv4, otherwise null-pointer-deref is possible.
> > 
> > 9) Use kfree_rcu() in nat core to release hooks, this can be an issue
> >    once nfnetlink_hook gets support to dump NAT hook information,
> >    not currently a real issue but better fix it now.
> > 
> > 10) Fix MTU checks in IPVS, from Yingnan Zhang.
> > 
> > 11) Use list_del_rcu() in chain and flowtable hook unregistration,
> >     concurrent RCU reader could be walking over the hook list,
> >     from Florian Westphal.
> > 
> > 12) Add list_splice_rcu(), this is required to fix unsafe
> >     splice to RCU protected hook list. Reviewed by Paul McKenney.
> > 
> > 13) Use list_splice_rcu() to splice new chain and flowtable hooks.
> > 
> > 14) Add shim nft_trans_hook object to track chain and flowtable
> >     hook deletions and flag them as removed, instead of unsafely
> >     moving around hooks in the RCU-protected hook list. This allows
> >     to restore the previous state from the abort path.
> > 
> > Please, pull these changes from:
> > 
> >   git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git nf-26-04-16
> > 
> > Thanks.
> > 
> > ----------------------------------------------------------------
> > 
> > The following changes since commit 2dddb34dd0d07b01fa770eca89480a4da4f13153:
> > 
> >   net: ethernet: mtk_eth_soc: initialize PPE per-tag-layer MTU registers (2026-04-12 15:22:58 -0700)
> > 
> > are available in the Git repository at:
> > 
> >   git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git tags/nf-26-04-16
> > 
> > for you to fetch changes up to e349f90da812aeddd22c3914a2cc639b51e4eb48:
> > 
> >   netfilter: nf_tables: add hook transactions for device deletions (2026-04-16 02:47:58 +0200)
> > 
> > ----------------------------------------------------------------
> > netfilter pull request 26-04-16
> > 
> > ----------------------------------------------------------------
> > Eric Woudstra (1):
> >       netfilter: nf_flow_table_ip: Introduce nf_flow_vlan_push()
> > 
> > Florian Westphal (2):
> >       netfilter: conntrack: remove sprintf usage
> >       netfilter: nf_tables: use list_del_rcu for netlink hooks
> > 
> > Jenny Guanni Qu (1):
> >       netfilter: nf_conntrack_sip: add bounds-checked port parsing helper
> > 
> > Pablo Neira Ayuso (6):
> >       netfilter: nft_osf: restrict it to ipv4
> >       netfilter: xtables: restrict several matches to inet family
> >       netfilter: nat: use kfree_rcu to release ops
> >       rculist: add list_splice_rcu() for private lists
> >       netfilter: nf_tables: join hook list via splice_list_rcu() in commit phase
> >       netfilter: nf_tables: add hook transactions for device deletions
> > 
> > Weiming Shi (2):
> >       netfilter: nft_fwd_netdev: use recursion counter in neigh egress path
> >       netfilter: arp_tables: fix IEEE1394 ARP payload parsing in arp_packet_match()
> > 
> > Xiang Mei (1):
> >       netfilter: nfnetlink_osf: fix divide-by-zero in OSF_WSS_MODULO
> > 
> > Yingnan Zhang (1):
> >       ipvs: fix MTU check for GSO packets in tunnel mode
> > 
> >  include/linux/rculist.h               |  29 ++++++
> >  include/net/netfilter/nf_dup_netdev.h |  13 +++
> >  include/net/netfilter/nf_tables.h     |  13 +++
> >  net/ipv4/netfilter/arp_tables.c       |  14 ++-
> >  net/ipv4/netfilter/iptable_nat.c      |   2 +-
> >  net/ipv6/netfilter/ip6table_nat.c     |   2 +-
> >  net/netfilter/ipvs/ip_vs_xmit.c       |  19 +++-
> >  net/netfilter/nf_conntrack_sip.c      |  80 +++++++++++-----
> >  net/netfilter/nf_dup_netdev.c         |  16 ----
> >  net/netfilter/nf_flow_table_ip.c      |  25 ++++-
> >  net/netfilter/nf_nat_amanda.c         |   2 +-
> >  net/netfilter/nf_nat_core.c           |  10 +-
> >  net/netfilter/nf_nat_sip.c            |  33 ++++---
> >  net/netfilter/nf_tables_api.c         | 168 ++++++++++++++++++++++++----------
> >  net/netfilter/nfnetlink_osf.c         |   4 +
> >  net/netfilter/nft_fwd_netdev.c        |   7 ++
> >  net/netfilter/nft_osf.c               |   6 +-
> >  net/netfilter/xt_mac.c                |  34 ++++---
> >  net/netfilter/xt_owner.c              |  37 +++++---
> >  net/netfilter/xt_physdev.c            |  29 ++++--
> >  net/netfilter/xt_realm.c              |   2 +-
> >  21 files changed, 393 insertions(+), 152 deletions(-)
> > 
> 

^ permalink raw reply

* Re: [PATCH] macvlan: fix macvlan_get_size() not reserving space for IFLA_MACVLAN_BC_CUTOFF
From: patchwork-bot+netdevbpf @ 2026-04-16 10:20 UTC (permalink / raw)
  To: Dudu Lu; +Cc: netdev, andrew+netdev, davem, edumazet, kuba, pabeni
In-Reply-To: <20260413085349.73977-1-phx0fer@gmail.com>

Hello:

This patch was applied to netdev/net.git (main)
by Paolo Abeni <pabeni@redhat.com>:

On Mon, 13 Apr 2026 16:53:49 +0800 you wrote:
> macvlan_get_size() does not account for IFLA_MACVLAN_BC_CUTOFF, but
> macvlan_fill_info() conditionally includes it when port->bc_cutoff != 1.
> This causes nla_put_s32() to fail with -EMSGSIZE when the netlink skb
> runs out of space, triggering a WARN_ON in rtnetlink and preventing the
> interface from being dumped.
> 
> The bug can be reproduced with:
> 
> [...]
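
The size/fill mismatch described above can be shown with a small userspace model. This is only a sketch: the `toy_*` names are hypothetical, the attribute set is reduced to one mandatory and one optional attribute, and the alignment macros are re-declared locally instead of being taken from the kernel headers.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Local re-declarations of the netlink attribute layout rules. */
#define TOY_NLA_ALIGNTO 4
#define TOY_NLA_ALIGN(len) (((len) + TOY_NLA_ALIGNTO - 1) & ~(size_t)(TOY_NLA_ALIGNTO - 1))
#define TOY_NLA_HDRLEN 4 /* attribute header: u16 length + u16 type */

/* Hypothetical stand-in for a message buffer with a fixed amount of room. */
struct toy_msg {
	size_t room;
	size_t used;
};

/* Space one attribute occupies: header plus payload, padded to 4 bytes. */
static size_t toy_nla_total_size(size_t payload)
{
	return TOY_NLA_ALIGN(TOY_NLA_HDRLEN + payload);
}

/* Append one attribute, or fail the way nla_put_*() does when space runs out. */
static int toy_nla_put(struct toy_msg *msg, size_t payload)
{
	size_t need = toy_nla_total_size(payload);

	assert(msg != NULL);
	if (msg->used + need > msg->room)
		return -EMSGSIZE;
	msg->used += need;
	return 0;
}

/* Size estimate; count_cutoff == 0 models the estimate that forgets the
 * optional attribute. */
static size_t toy_get_size(int count_cutoff)
{
	size_t size = toy_nla_total_size(sizeof(uint32_t)); /* mandatory attr */

	if (count_cutoff)
		size += toy_nla_total_size(sizeof(int32_t)); /* optional s32 attr */
	return size;
}

/* Fill routine: emits the optional attribute only when bc_cutoff != 1. */
static int toy_fill_info(struct toy_msg *msg, int32_t bc_cutoff)
{
	if (toy_nla_put(msg, sizeof(uint32_t)))
		return -EMSGSIZE;
	if (bc_cutoff != 1 && toy_nla_put(msg, sizeof(int32_t)))
		return -EMSGSIZE;
	return 0;
}
```

With a buffer sized by `toy_get_size(0)`, filling fails only when the optional attribute is actually emitted, which matches the conditional nature of the failure described in the commit message.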

Here is the summary with links:
  - macvlan: fix macvlan_get_size() not reserving space for IFLA_MACVLAN_BC_CUTOFF
    https://git.kernel.org/netdev/net/c/fa92a77b0ed4

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH] net: dsa: sja1105: fix division by zero in sja1105_tas_set_runtime_params()
From: Paolo Abeni @ 2026-04-16 10:09 UTC (permalink / raw)
  To: Alexander.Chesnokov, olteanv
  Cc: lvc-project, Oleg.Kazakov, Pavel.Zhigulin, stable, Andrew Lunn,
	Florian Fainelli, David S. Miller, Eric Dumazet, Jakub Kicinski,
	linux-kernel, netdev
In-Reply-To: <20260413085140.33138-1-Alexander.Chesnokov@kaspersky.com>

On 4/13/26 10:51 AM, Alexander.Chesnokov@kaspersky.com wrote:
> From: Alexander Chesnokov <Alexander.Chesnokov@kaspersky.com>
> 
> If taprio offload is configured such that none of the ports' base_time
> is less than S64_MAX (the initial value of earliest_base_time), then
> its_cycle_time remains zero and is passed to future_base_time() as
> cycle_time, causing division by zero in div_s64().
> 
> Add a check for its_cycle_time being zero before calling
> future_base_time() and return -EINVAL.
> 
> Found by Linux Verification Center (linuxtesting.org) with SVACE.
> 
> Fixes: 86db36a347b4 ("net: dsa: sja1105: Implement state machine for TAS with PTP clock source")
> Cc: stable@vger.kernel.org
> 

No empty lines in the tag area.

> Signed-off-by: Alexander Chesnokov <Alexander.Chesnokov@kaspersky.com>
> ---
>  drivers/net/dsa/sja1105/sja1105_tas.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/net/dsa/sja1105/sja1105_tas.c b/drivers/net/dsa/sja1105/sja1105_tas.c
> index e6153848a950..ce4b544a2b9c 100644
> --- a/drivers/net/dsa/sja1105/sja1105_tas.c
> +++ b/drivers/net/dsa/sja1105/sja1105_tas.c
> @@ -62,6 +62,9 @@ static int sja1105_tas_set_runtime_params(struct sja1105_private *priv)
>  	if (!tas_data->enabled)
>  		return 0;
>  
> +	if (!its_cycle_time)
> +		return -EINVAL;

Sashiko says:

Is this division by zero reachable without this check?
When all ports have base_time == S64_MAX, earliest_base_time and
latest_base_time are both S64_MAX. When future_base_time(S64_MAX, 0,
S64_MAX) is called, it returns early because base_time >= now (S64_MAX
>= S64_MAX), avoiding the division.
Could this new error path cause an actual division by zero later?
When returning -EINVAL here, tas_data->enabled is already set to true,
but tas_data->max_cycle_time is left uninitialized (0).
If sja1105_tas_state_machine() runs later, it will pass this
max_cycle_time as the cycle_time argument to future_base_time(). Since 0
>= now + 1s is false, it proceeds to call div_s64() with a zero divisor.
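
The early return described above can be sketched in userspace as follows. The body of future_base_time() here is an assumption reconstructed from this review comment, not a copy of the driver; div_s64_checked() and future_base_time_divides() are hypothetical helpers added only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for div_s64(); asserts instead of faulting so a zero
 * divisor is easy to spot. */
static int64_t div_s64_checked(int64_t dividend, int64_t divisor)
{
	assert(divisor != 0);
	return dividend / divisor;
}

/* Assumed shape of the helper: return base_time unchanged if it is not in
 * the past, otherwise advance it by whole cycles until it is >= now. */
static int64_t future_base_time(int64_t base_time, int64_t cycle_time,
				int64_t now)
{
	int64_t a, b, n;

	if (base_time >= now)
		return base_time; /* early return, no division */

	a = now - base_time;
	b = cycle_time;
	n = div_s64_checked(a + b - 1, b); /* round up to whole cycles */

	return base_time + n * cycle_time;
}

/* Hypothetical predicate: does a call with these arguments reach the
 * division at all? */
static int future_base_time_divides(int64_t base_time, int64_t now)
{
	return base_time < now;
}
```

Under these assumptions, future_base_time(S64_MAX, 0, S64_MAX) never reaches the division, whereas any call with a base time in the past and a zero cycle time does.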

/P


^ permalink raw reply

* Re: [PATCH] net/sched: act_mirred: fix wrong device for mac_header_xmit check in tcf_blockcast_redir
From: patchwork-bot+netdevbpf @ 2026-04-16  9:40 UTC (permalink / raw)
  To: Dudu Lu; +Cc: netdev, jhs, jiri
In-Reply-To: <20260413084927.71353-1-phx0fer@gmail.com>

Hello:

This patch was applied to netdev/net.git (main)
by Paolo Abeni <pabeni@redhat.com>:

On Mon, 13 Apr 2026 16:49:27 +0800 you wrote:
> In tcf_blockcast_redir(), when iterating block ports to redirect
> packets to multiple devices, the mac_header_xmit flag is queried
> from the wrong device. The loop sends to dev_prev but queries
> dev_is_mac_header_xmit(dev) — which is the NEXT device in the
> iteration, not the one being sent to.
> 
> This causes tcf_mirred_to_dev() to make incorrect decisions about
> whether to push or pull the MAC header. When the block contains
> mixed device types (e.g., an ethernet veth and a tunnel device),
> intermediate devices get the wrong mac_header_xmit flag, leading to
> skb header corruption. In the worst case, skb_push_rcsum with an
> incorrect mac_len can exhaust headroom and panic.
> 
> [...]
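
The lagging-iterator pattern described above can be modelled in a few lines. This is a sketch only: the `toy_*` types and the `buggy` switch are hypothetical and simply record which device's flag accompanies each send.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a net device with its mac_header_xmit property. */
struct toy_dev {
	const char *name;
	bool mac_header_xmit;
};

/* One recorded transmission: target device and the flag that was used. */
struct toy_send {
	const struct toy_dev *dev;
	bool mac_header_xmit;
};

/* Walk the devices one step behind: each iteration sends to dev_prev, and
 * the last device is handled after the loop. With buggy == true the flag is
 * taken from the current (next) device instead of the one being sent to. */
static size_t toy_blockcast(const struct toy_dev *devs, size_t ndevs,
			    struct toy_send *log, bool buggy)
{
	const struct toy_dev *dev_prev = NULL;
	size_t sent = 0;

	assert(log != NULL);
	for (size_t i = 0; i < ndevs; i++) {
		const struct toy_dev *dev = &devs[i];

		if (dev_prev) {
			const struct toy_dev *flag_src = buggy ? dev : dev_prev;

			log[sent].dev = dev_prev;
			log[sent].mac_header_xmit = flag_src->mac_header_xmit;
			sent++;
		}
		dev_prev = dev;
	}
	if (dev_prev) {
		log[sent].dev = dev_prev;
		log[sent].mac_header_xmit = dev_prev->mac_header_xmit;
		sent++;
	}
	return sent;
}
```

With a mixed list such as an Ethernet device followed by a tunnel device, the buggy variant pairs every intermediate device with its successor's flag; only the final device, sent after the loop, gets its own.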

Here is the summary with links:
  - net/sched: act_mirred: fix wrong device for mac_header_xmit check in tcf_blockcast_redir
    https://git.kernel.org/netdev/net/c/4510d140524c

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [net,PATCH v3 1/2] net: ks8851: Reinstate disabling of BHs around IRQ handler
From: Marek Vasut @ 2026-04-16  9:26 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: netdev, stable, David S. Miller, Andrew Lunn, Eric Dumazet,
	Jakub Kicinski, Nicolai Buchwitz, Paolo Abeni, Ronald Wahl,
	Yicong Hui, linux-kernel
In-Reply-To: <20260416062159.fPxqc52X@linutronix.de>

On 4/16/26 8:21 AM, Sebastian Andrzej Siewior wrote:
> On 2026-04-16 01:14:35 [+0200], Marek Vasut wrote:
>>> spin_unlock_bh(&ks->statelock)? After that unlock, the softirq must be
>>> processed and __netdev_alloc_skb() _could_ observe pending softirqs but
>>> not from ks8851.
>> Because __netdev_alloc_skb() also enables/disables BH , see the "else"
> 
> Yes. But there is no softirq raised in that part. That softirq is raised
> by netif_wake_queue() within a bh disabled section. Therefore upon the
> unlock the softirq must be invoked.
> After that, the allocation later on may invoke softirqs which were
> raised but I don't see how ks8851 can be part of it.
> Before commit 0913ec336a6c0 ("net: ks8851: Fix deadlock with the SPI
> chip variant") there was no _bh around it meaning the softirq was raised
> but not invoked immediately. This happened on the bh unlock during
> memory allocation. Therefore I am saying this backtrace is from an older
> kernel.

I actually did update the backtrace in V3 with the one from next
20260413 that contained b44596ffe1b4 ("ARM: Allow to enable RT") from
stable-rt/v6.12-rt-rebase branch [1].

I think I misunderstood the usage of "softirq is raised" vs. "softirq is
invoked" above. Is it possible that there was an already raised softirq
before the threaded IRQ handler was invoked, and __netdev_alloc_skb() is
what invoked that softirq?

> If there is a flaw in my theory please explain _how_ you managed
> to get that backtrace. I am sure it must have come from an older kernel and
> _now_ this lockup also happens on !RT kernels (except for the SPI
> platform).

I used [1], with PREEMPT_RT enabled, on the stm32mp157c SoC. I ran iperf3
-s on the stm32 side and iperf3 -c 192.168.1.2 -t 0 --bidir on the host PC
side. The backtrace happened shortly after.

^ permalink raw reply

* RE: [Intel-wired-lan] [PATCH net] ixgbe: only access vfinfo and mv_list under RCU lock
From: Loktionov, Aleksandr @ 2026-04-16  9:23 UTC (permalink / raw)
  To: Vinschen, Corinna, intel-wired-lan@lists.osuosl.org,
	netdev@vger.kernel.org
  Cc: Vinschen, Corinna
In-Reply-To: <20260416084227.3787828-1-vinschen@redhat.com>



> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf
> Of Corinna Vinschen
> Sent: Thursday, April 16, 2026 10:42 AM
> To: intel-wired-lan@lists.osuosl.org; netdev@vger.kernel.org
> Cc: Vinschen, Corinna <vinschen@redhat.com>
> Subject: [Intel-wired-lan] [PATCH net] ixgbe: only access vfinfo and
> mv_list under RCU lock
> 
> Commit 1e53834ce541d ("ixgbe: Add locking to prevent panic when
> setting
> sriov_numvfs to zero") added a spinlock to the adapter info.  The
> reason
> at the time was an observed crash when ixgbe_disable_sriov() freed the
> adapter->vfinfo array while the interrupt driven function
> ixgbe_msg_task()
> was handling VF messages.
> 
> Recent stability testing turned up another crash, which is very easily
> reproducible:
> 
>   while true
>   do
>     for numvfs in 5 0
>     do
>       echo $numvfs > /sys/class/net/eth0/device/sriov_numvfs
>     done
>   done
> 
> This crashed almost always within the first two hundred runs with
> a NULL pointer deref while running the ixgbe_service_task() workqueue:
> 
> [ 5052.036491] BUG: kernel NULL pointer dereference, address:
> 0000000000000258
> [ 5052.043454] #PF: supervisor read access in kernel mode
> [ 5052.048594] #PF: error_code(0x0000) - not-present page
> [ 5052.053734] PGD 0 P4D 0
> [ 5052.056272] Oops: Oops: 0000 #1 SMP NOPTI
> [ 5052.060459] CPU: 2 UID: 0 PID: 132253 Comm: kworker/u96:0 Kdump:
> loaded Not tainted 6.12.0-180.el10.x86_64 #1 PREEMPT(voluntary)
> [ 5052.072100] Hardware name: Dell Inc. PowerEdge R740/0DY2X0, BIOS
> 2.12.2 07/09/2021
> [ 5052.079664] Workqueue: ixgbe ixgbe_service_task [ixgbe]
> [ 5052.084907] RIP: 0010:ixgbe_update_stats+0x8b1/0xb40 [ixgbe]
> [ 5052.090585] Code: 21 56 50 49 8b b6 18 26 00 00 4c 01 fe 48 09 46
> 50 42 8d 34 a5 00 83 00 00 e8 cb 7a ff ff 49 8b b6 18 26 00 00 89 c0
> 4c 01 fe <48> 3b 86 88 00 00 00 73 18 48 b9 00 00 00 00 01 00 00 00 48
> 01 4e
> [ 5052.109331] RSP: 0018:ffffd5f1e8a6bd88 EFLAGS: 00010202
> [ 5052.114558] RAX: 0000000000000000 RBX: ffff8f49b22b14a0 RCX:
> 000000000000023c
> [ 5052.121689] RDX: ffffffff00000000 RSI: 00000000000001d0 RDI:
> ffff8f49b22b14a0
> [ 5052.128823] RBP: 000000000000109c R08: 0000000000000000 R09:
> 0000000000000000
> [ 5052.135955] R10: 0000000000000000 R11: 0000000000000000 R12:
> 0000000000000002
> [ 5052.143086] R13: 0000000000008410 R14: ffff8f49b22b01a0 R15:
> 00000000000001d0
> [ 5052.150221] FS:  0000000000000000(0000) GS:ffff8f58bfc80000(0000)
> knlGS:0000000000000000
> [ 5052.158307] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 5052.164054] CR2: 0000000000000258 CR3: 0000000bf2624006 CR4:
> 00000000007726f0
> [ 5052.171187] PKRU: 55555554
> [ 5052.173898] Call Trace:
> [ 5052.176351]  <TASK>
> [ 5052.178457]  ? show_trace_log_lvl+0x1b0/0x2f0
> [ 5052.182816]  ? show_trace_log_lvl+0x1b0/0x2f0
> [ 5052.187177]  ? ixgbe_watchdog_subtask+0x1a1/0x230 [ixgbe]
> [ 5052.192591]  ? __die_body.cold+0x8/0x12
> [ 5052.196433]  ? page_fault_oops+0x148/0x160
> [ 5052.200532]  ? exc_page_fault+0x7f/0x150
> [ 5052.204458]  ? asm_exc_page_fault+0x26/0x30
> [ 5052.208643]  ? ixgbe_update_stats+0x8b1/0xb40 [ixgbe]
> [ 5052.213714]  ? ixgbe_update_stats+0x8a5/0xb40 [ixgbe]
> [ 5052.218784]  ixgbe_watchdog_subtask+0x1a1/0x230 [ixgbe]
> [ 5052.224026]  ixgbe_service_task+0x15a/0x3f0 [ixgbe]
> [ 5052.228916]  process_one_work+0x177/0x330
> [ 5052.232928]  worker_thread+0x256/0x3a0
> [ 5052.236681]  ? __pfx_worker_thread+0x10/0x10
> [ 5052.240952]  kthread+0xfa/0x240
> [ 5052.244099]  ? __pfx_kthread+0x10/0x10
> [ 5052.247852]  ret_from_fork+0x34/0x50
> [ 5052.251429]  ? __pfx_kthread+0x10/0x10
> [ 5052.255185]  ret_from_fork_asm+0x1a/0x30
> [ 5052.259112]  </TASK>
> 
> The first simple patch, just adding spinlocking to
> ixgbe_update_stats()
> while reading from adapter->vfinfo, did not fix the problem, it just
> moved it elsewhere: I could now reproduce the same kind of crash in
> ixgbe_restore_vf_multicasts().
> 
> But adding more spinlocking doesn't really cut it.  One reason is that
> ixgbe_restore_vf_multicasts() is called from within ixgbe_msg_task()
> with active spinlock, as well as from outside without locking.
> 
> Additionally, given that ixgbe_disable_sriov() is the only call
> changing
> adapter->vfinfo, and given ixgbe_disable_sriov() is called very
> seldom compared to other actions in the driver, just adding more
> spinlocks would unnecessarily occupy the driver with spinning when
> multiple functions accessing adapter->vfinfo are running in parallel.
> 
> So this patch drops the spinlock in favor of RCU and uses it
> throughout
> the driver.
> 
> While changing this, it seems prudent to do the same for the
> adapter->mv_list array, which is allocated and freed at the same time
> as
> adapter->vfinfo, albeit there was no crash observed.
> 
> Fixes: 1e53834ce541d ("ixgbe: Add locking to prevent panic when
> setting sriov_numvfs to zero")
> Signed-off-by: Corinna Vinschen <vinschen@redhat.com>
> ---
>  drivers/net/ethernet/intel/ixgbe/ixgbe.h      |   7 +-
>  .../net/ethernet/intel/ixgbe/ixgbe_dcb_nl.c   |  36 +-
>  .../net/ethernet/intel/ixgbe/ixgbe_ethtool.c  |  44 +-
>  .../net/ethernet/intel/ixgbe/ixgbe_ipsec.c    |  17 +-
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 229 +++++---
>  .../net/ethernet/intel/ixgbe/ixgbe_sriov.c    | 547 ++++++++++++------
>  6 files changed, 593 insertions(+), 287 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> index 9b8217523fd2..8849b9f42bf6 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> @@ -210,6 +210,7 @@ struct vf_stats {
>  };
> 
>  struct vf_data_storage {
> +	struct rcu_head rcu_head;
>  	struct pci_dev *vfdev;
>  	unsigned char vf_mac_addresses[ETH_ALEN];
>  	u16 vf_mc_hashes[IXGBE_MAX_VF_MC_ENTRIES];
> @@ -240,6 +241,7 @@ enum ixgbevf_xcast_modes {
>  };
> 
>  struct vf_macvlans {
> +	struct rcu_head rcu_head;
>  	struct list_head l;
>  	int vf;
>  	bool free;
> @@ -808,10 +810,10 @@ struct ixgbe_adapter {
>  	/* SR-IOV */
>  	DECLARE_BITMAP(active_vfs, IXGBE_MAX_VF_FUNCTIONS);
>  	unsigned int num_vfs;

...

>  		if (!vfdev)
>  			continue;
>  		pci_read_config_word(vfdev, PCI_STATUS, &status_reg);
> @@ -9744,17 +9781,23 @@ static int ixgbe_ndo_get_vf_stats(struct
> net_device *netdev, int vf,
>  				  struct ifla_vf_stats *vf_stats)
>  {
>  	struct ixgbe_adapter *adapter = ixgbe_from_netdev(netdev);
> +	struct vf_data_storage *vfinfo;
> 
>  	if (vf < 0 || vf >= adapter->num_vfs)
>  		return -EINVAL;
> 
> -	vf_stats->rx_packets = adapter->vfinfo[vf].vfstats.gprc;
> -	vf_stats->rx_bytes   = adapter->vfinfo[vf].vfstats.gorc;
> -	vf_stats->tx_packets = adapter->vfinfo[vf].vfstats.gptc;
> -	vf_stats->tx_bytes   = adapter->vfinfo[vf].vfstats.gotc;
> -	vf_stats->multicast  = adapter->vfinfo[vf].vfstats.mprc;
> +	rcu_read_lock();
> +	vfinfo = rcu_dereference(adapter->vfinfo);
> +	if (vfinfo) {
> +		vf_stats->rx_packets = vfinfo[vf].vfstats.gprc;
> +		vf_stats->rx_bytes   = vfinfo[vf].vfstats.gorc;
> +		vf_stats->tx_packets = vfinfo[vf].vfstats.gptc;
> +		vf_stats->tx_bytes   = vfinfo[vf].vfstats.gotc;
> +		vf_stats->multicast  = vfinfo[vf].vfstats.mprc;
> +	}
> +	rcu_read_unlock();
> 
> -	return 0;
> +	return vfinfo ? 0 : -EINVAL;
Previously this always returned success, but now it will break 'ip link show dev' in the short window while SR-IOV is being torn down.
To me this looks like a UAPI regression.
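
One way to keep the old return value is sketched below as a userspace model. It is not the v2 patch: the `toy_*` types are hypothetical, a C11 acquire load stands in for rcu_dereference(), and the lifetime protection that rcu_read_lock() provides is not modelled at all. Zeroing the output when the array is gone is a choice made for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, reduced stand-ins for the driver structures. */
struct toy_vf_stats {
	uint64_t rx_packets;
	uint64_t rx_bytes;
};

struct toy_vf {
	struct toy_vf_stats stats;
};

struct toy_adapter {
	unsigned int num_vfs;
	_Atomic(struct toy_vf *) vfinfo; /* NULL while SR-IOV is torn down */
};

/* Report per-VF stats. An out-of-range index is still an error, but a
 * vanished vfinfo array yields zeroed stats and success, so a dump that
 * races with teardown does not start failing. */
static int toy_get_vf_stats(struct toy_adapter *adapter, int vf,
			    struct toy_vf_stats *out)
{
	struct toy_vf *vfinfo;

	assert(adapter != NULL && out != NULL);
	if (vf < 0 || (unsigned int)vf >= adapter->num_vfs)
		return -EINVAL;

	memset(out, 0, sizeof(*out));
	/* Stand-in for rcu_dereference(); does NOT keep the array alive. */
	vfinfo = atomic_load_explicit(&adapter->vfinfo, memory_order_acquire);
	if (vfinfo)
		*out = vfinfo[vf].stats;

	return 0;
}
```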

>  }
> 
>  #ifdef CONFIG_IXGBE_DCB
> @@ -10071,20 +10114,26 @@ static int handle_redirect_action(struct
> ixgbe_adapter *adapter, int ifindex,
>  {
>  	struct ixgbe_ring_feature *vmdq = &adapter-
> >ring_feature[RING_F_VMDQ];
>  	unsigned int num_vfs = adapter->num_vfs, vf;

...

>  	return 0;
>  }
> --
> 2.53.0


^ permalink raw reply

* Re: [PATCH v3 net] ax25: fix OOB read after address header strip in ax25_rcv()
From: David Laight @ 2026-04-16  9:21 UTC (permalink / raw)
  To: Ashutosh Desai
  Cc: netdev, linux-hams, jreuter, davem, edumazet, kuba, pabeni, horms,
	stable, linux-kernel
In-Reply-To: <69e07601.c80a0220.2f9024.1e0b@mx.google.com>

On Wed, 15 Apr 2026 22:39:13 -0700 (PDT)
Ashutosh Desai <ashutoshdesai993@gmail.com> wrote:

> On Wed, 15 Apr 2026 08:59:21 +0100, David Laight wrote:
> > Is it just worth linearising the skb on entry to all this code?  
> 
> Thanks for the feedback, David.
> 
> skb_linearize() on entry is a nice idea for simplifying sanity checks
> overall, but it wouldn't fix this particular bug on its own - the issue
> is skb->len dropping to zero after skb_pull(), not non-linear data. We'd
> still need a length check regardless. pskb_may_pull(skb, 2) handles both
> in one call.

The skb->len >= 2 check will be a lot cheaper/smaller.

> That said, linearizing on entry to ax25_rcv() as a cleanup to simplify
> future checks sounds worthwhile - happy to send that as a separate
> net-next patch.

I think you proposed just checking skb->len in an earlier version
and it was pointed out that the skb may not be linear.
So perhaps linearize as part of this fix and leave the simplification
of any other checks to later.
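
The difference between the two checks under discussion can be sketched with a toy buffer model. The `toy_*` names are hypothetical, and toy_may_pull() only mimics the documented contract of pskb_may_pull() (succeed if the bytes are, or can be made, linear; fail if the packet is too short); it moves byte counts, not data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy skb: len is the total length, data_len the part held in fragments. */
struct toy_skb {
	unsigned int len;
	unsigned int data_len;
};

/* Bytes available in the linear area. */
static unsigned int toy_headlen(const struct toy_skb *skb)
{
	return skb->len - skb->data_len;
}

/* Plain length check: cheap, but says nothing about linearity. */
static bool toy_len_at_least(const struct toy_skb *skb, unsigned int n)
{
	return skb->len >= n;
}

/* Model of the may-pull contract: afterwards the first n bytes are linear,
 * or the call fails because the packet is shorter than n. */
static bool toy_may_pull(struct toy_skb *skb, unsigned int n)
{
	assert(skb != NULL);
	if (n <= toy_headlen(skb))
		return true;
	if (n > skb->len)
		return false;
	/* Move the missing bytes from the fragments into the linear area. */
	skb->data_len -= n - toy_headlen(skb);
	return true;
}
```

On a buffer that was linearized on entry, toy_headlen() equals len, so the plain length check is sufficient; on a non-linear buffer it can pass while the linear area is still too short.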

	David

^ permalink raw reply

* RE: [Intel-wired-lan] [PATCH iwl-next v2 2/3] igc: move autoneg-enabled settings into igc_handle_autoneg_enabled()
From: Loktionov, Aleksandr @ 2026-04-16  9:05 UTC (permalink / raw)
  To: KhaiWenTan, Nguyen, Anthony L, Kitszel, Przemyslaw,
	andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com,
	kuba@kernel.org, pabeni@redhat.com
  Cc: intel-wired-lan@lists.osuosl.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, Abdul Rahim, Faizal, Looi, Hong Aun,
	Tan, Khai Wen, Faizal Rahim, Looi, Alan Chia Wei
In-Reply-To: <20260416015520.6090-3-khai.wen.tan@linux.intel.com>



> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf
> Of KhaiWenTan
> Sent: Thursday, April 16, 2026 3:55 AM
> To: Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Kitszel,
> Przemyslaw <przemyslaw.kitszel@intel.com>; andrew+netdev@lunn.ch;
> davem@davemloft.net; edumazet@google.com; kuba@kernel.org;
> pabeni@redhat.com
> Cc: intel-wired-lan@lists.osuosl.org; netdev@vger.kernel.org; linux-
> kernel@vger.kernel.org; Abdul Rahim, Faizal
> <faizal.abdul.rahim@intel.com>; Looi, Hong Aun
> <hong.aun.looi@intel.com>; Tan, Khai Wen <khai.wen.tan@intel.com>;
> Faizal Rahim <faizal.abdul.rahim@linux.intel.com>; Looi; KhaiWenTan
> <khai.wen.tan@linux.intel.com>
> Subject: [Intel-wired-lan] [PATCH iwl-next v2 2/3] igc: move autoneg-
> enabled settings into igc_handle_autoneg_enabled()
> 
> From: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
> 
> Move the advertised link modes and flow control configuration from
> igc_ethtool_set_link_ksettings() into igc_handle_autoneg_enabled().
> 
> No functional change.
> 
> Reviewed-by: Looi, Hong Aun <hong.aun.looi@intel.com>
> Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
> Signed-off-by: KhaiWenTan <khai.wen.tan@linux.intel.com>
> ---
>  drivers/net/ethernet/intel/igc/igc_ethtool.c | 72 ++++++++++++--------
>  1 file changed, 44 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/igc/igc_ethtool.c
> b/drivers/net/ethernet/intel/igc/igc_ethtool.c
> index 0122009bedd0..cfcbf2fdad6e 100644
> --- a/drivers/net/ethernet/intel/igc/igc_ethtool.c
> +++ b/drivers/net/ethernet/intel/igc/igc_ethtool.c
> @@ -2000,6 +2000,49 @@ static int
> igc_ethtool_get_link_ksettings(struct net_device *netdev,
>  	return 0;
>  }
> 
> +/**
> + * igc_handle_autoneg_enabled - Configure autonegotiation
> advertisement
> + * @adapter: private driver structure
> + * @cmd: ethtool link ksettings from user
> + *
> + * Records advertised speeds and flow control settings when autoneg
> + * is enabled.
> + */
> +static void igc_handle_autoneg_enabled(struct igc_adapter *adapter,
> +				       const struct ethtool_link_ksettings *cmd)
> +{
> +	struct igc_hw *hw = &adapter->hw;
> +	u16 advertised = 0;
> +
> +	if (ethtool_link_ksettings_test_link_mode(cmd, advertising,
> +						  2500baseT_Full))
> +		advertised |= ADVERTISE_2500_FULL;
> +
> +	if (ethtool_link_ksettings_test_link_mode(cmd, advertising,
> +						  1000baseT_Full))
> +		advertised |= ADVERTISE_1000_FULL;
> +
> +	if (ethtool_link_ksettings_test_link_mode(cmd, advertising,
> +						  100baseT_Full))
> +		advertised |= ADVERTISE_100_FULL;
> +
> +	if (ethtool_link_ksettings_test_link_mode(cmd, advertising,
> +						  100baseT_Half))
> +		advertised |= ADVERTISE_100_HALF;
> +
> +	if (ethtool_link_ksettings_test_link_mode(cmd, advertising,
> +						  10baseT_Full))
> +		advertised |= ADVERTISE_10_FULL;
> +
> +	if (ethtool_link_ksettings_test_link_mode(cmd, advertising,
> +						  10baseT_Half))
> +		advertised |= ADVERTISE_10_HALF;
> +
> +	hw->phy.autoneg_advertised = advertised;
> +	if (adapter->fc_autoneg)
> +		hw->fc.requested_mode = igc_fc_default;
> +}
> +
>  static int
>  igc_ethtool_set_link_ksettings(struct net_device *netdev,
>  			       const struct ethtool_link_ksettings *cmd)
> @@ -2007,7 +2050,6 @@ igc_ethtool_set_link_ksettings(struct net_device
> *netdev,
>  	struct igc_adapter *adapter = netdev_priv(netdev);
>  	struct net_device *dev = adapter->netdev;
>  	struct igc_hw *hw = &adapter->hw;
> -	u16 advertised = 0;
> 
>  	/* When adapter in resetting mode, autoneg/speed/duplex
>  	 * cannot be changed
> @@ -2032,34 +2074,8 @@ igc_ethtool_set_link_ksettings(struct
> net_device *netdev,
>  	while (test_and_set_bit(__IGC_RESETTING, &adapter->state))
>  		usleep_range(1000, 2000);
> 
> -	if (ethtool_link_ksettings_test_link_mode(cmd, advertising,
> -						  2500baseT_Full))
> -		advertised |= ADVERTISE_2500_FULL;
> -
> -	if (ethtool_link_ksettings_test_link_mode(cmd, advertising,
> -						  1000baseT_Full))
> -		advertised |= ADVERTISE_1000_FULL;
> -
> -	if (ethtool_link_ksettings_test_link_mode(cmd, advertising,
> -						  100baseT_Full))
> -		advertised |= ADVERTISE_100_FULL;
> -
> -	if (ethtool_link_ksettings_test_link_mode(cmd, advertising,
> -						  100baseT_Half))
> -		advertised |= ADVERTISE_100_HALF;
> -
> -	if (ethtool_link_ksettings_test_link_mode(cmd, advertising,
> -						  10baseT_Full))
> -		advertised |= ADVERTISE_10_FULL;
> -
> -	if (ethtool_link_ksettings_test_link_mode(cmd, advertising,
> -						  10baseT_Half))
> -		advertised |= ADVERTISE_10_HALF;
> -
>  	if (cmd->base.autoneg == AUTONEG_ENABLE) {
> -		hw->phy.autoneg_advertised = advertised;
> -		if (adapter->fc_autoneg)
> -			hw->fc.requested_mode = igc_fc_default;
> +		igc_handle_autoneg_enabled(adapter, cmd);
>  	} else {
>  		netdev_info(dev, "Force mode currently not
> supported\n");
>  	}
> --
> 2.43.0


Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>

^ permalink raw reply

* RE: [Intel-wired-lan] [PATCH iwl-next v2 1/3] igc: remove unused autoneg_failed field
From: Loktionov, Aleksandr @ 2026-04-16  9:04 UTC (permalink / raw)
  To: KhaiWenTan, Nguyen, Anthony L, Kitszel, Przemyslaw,
	andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com,
	kuba@kernel.org, pabeni@redhat.com
  Cc: intel-wired-lan@lists.osuosl.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, Abdul Rahim, Faizal, Looi, Hong Aun,
	Tan, Khai Wen, Faizal Rahim, Looi, Alan Chia Wei
In-Reply-To: <20260416015520.6090-2-khai.wen.tan@linux.intel.com>



> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf
> Of KhaiWenTan
> Sent: Thursday, April 16, 2026 3:55 AM
> To: Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Kitszel,
> Przemyslaw <przemyslaw.kitszel@intel.com>; andrew+netdev@lunn.ch;
> davem@davemloft.net; edumazet@google.com; kuba@kernel.org;
> pabeni@redhat.com
> Cc: intel-wired-lan@lists.osuosl.org; netdev@vger.kernel.org; linux-
> kernel@vger.kernel.org; Abdul Rahim, Faizal
> <faizal.abdul.rahim@intel.com>; Looi, Hong Aun
> <hong.aun.looi@intel.com>; Tan, Khai Wen <khai.wen.tan@intel.com>;
> Faizal Rahim <faizal.abdul.rahim@linux.intel.com>; Looi; KhaiWenTan
> <khai.wen.tan@linux.intel.com>
> Subject: [Intel-wired-lan] [PATCH iwl-next v2 1/3] igc: remove unused
> autoneg_failed field
> 
> From: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
> 
> autoneg_failed in struct igc_mac_info is never set in the igc driver.
> Remove the field and the dead code checking it in
> igc_config_fc_after_link_up().
> 
> Reviewed-by: Looi, Hong Aun <hong.aun.looi@intel.com>
> Signed-off-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
> Signed-off-by: KhaiWenTan <khai.wen.tan@linux.intel.com>
> ---
>  drivers/net/ethernet/intel/igc/igc_hw.h  |  1 -
>  drivers/net/ethernet/intel/igc/igc_mac.c | 16 +---------------
>  2 files changed, 1 insertion(+), 16 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/igc/igc_hw.h
> b/drivers/net/ethernet/intel/igc/igc_hw.h
> index be8a49a86d09..86ab8f566f44 100644
> --- a/drivers/net/ethernet/intel/igc/igc_hw.h
> +++ b/drivers/net/ethernet/intel/igc/igc_hw.h
> @@ -92,7 +92,6 @@ struct igc_mac_info {
>  	bool asf_firmware_present;
>  	bool arc_subsystem_valid;
> 
> -	bool autoneg_failed;
>  	bool get_link_status;
>  };
> 
> diff --git a/drivers/net/ethernet/intel/igc/igc_mac.c
> b/drivers/net/ethernet/intel/igc/igc_mac.c
> index 7ac6637f8db7..142beb9ae557 100644
> --- a/drivers/net/ethernet/intel/igc/igc_mac.c
> +++ b/drivers/net/ethernet/intel/igc/igc_mac.c
> @@ -438,28 +438,14 @@ void igc_config_collision_dist(struct igc_hw
> *hw)
>   * Checks the status of auto-negotiation after link up to ensure that
> the
>   * speed and duplex were not forced.  If the link needed to be
> forced, then
>   * flow control needs to be forced also.  If auto-negotiation is
> enabled
> - * and did not fail, then we configure flow control based on our link
> - * partner.
> + * then we configure flow control based on our link partner.
>   */
>  s32 igc_config_fc_after_link_up(struct igc_hw *hw)
>  {
>  	u16 mii_status_reg, mii_nway_adv_reg, mii_nway_lp_ability_reg;
> -	struct igc_mac_info *mac = &hw->mac;
>  	u16 speed, duplex;
>  	s32 ret_val = 0;
> 
> -	/* Check for the case where we have fiber media and auto-neg
> failed
> -	 * so we had to force link.  In this case, we need to force the
> -	 * configuration of the MAC to match the "fc" parameter.
> -	 */
> -	if (mac->autoneg_failed)
> -		ret_val = igc_force_mac_fc(hw);
> -
> -	if (ret_val) {
> -		hw_dbg("Error forcing flow control settings\n");
> -		goto out;
> -	}
> -
>  	/* In auto-neg, we need to check and see if Auto-Neg has
> completed,
>  	 * and if so, how the PHY and link partner has flow control
>  	 * configured.
> --
> 2.43.0

Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>

^ permalink raw reply

* [PATCH bpf-next v4 1/6] bpf: name the enum for BPF_FUNC_skb_adjust_room flags
From: Nick Hudson @ 2026-04-16  7:55 UTC (permalink / raw)
  To: bpf, netdev, Willem de Bruijn, Martin KaFai Lau
  Cc: Nick Hudson, Max Tottenham, Anna Glasgall, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Eduard Zingerman,
	Kumar Kartikeya Dwivedi, linux-kernel
In-Reply-To: <20260416075514.927101-1-nhudson@akamai.com>

The existing anonymous enum for BPF_FUNC_skb_adjust_room flags is
named to enum bpf_adj_room_flags to enable CO-RE (Compile Once -
Run Everywhere) lookups in BPF programs.

Co-developed-by: Max Tottenham <mtottenh@akamai.com>
Signed-off-by: Max Tottenham <mtottenh@akamai.com>
Co-developed-by: Anna Glasgall <aglasgal@akamai.com>
Signed-off-by: Anna Glasgall <aglasgal@akamai.com>
Signed-off-by: Nick Hudson <nhudson@akamai.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
---
 include/uapi/linux/bpf.h       | 2 +-
 tools/include/uapi/linux/bpf.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 552bc5d9afbd..c021ed8d7b44 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -6211,7 +6211,7 @@ enum {
 };
 
 /* BPF_FUNC_skb_adjust_room flags. */
-enum {
+enum bpf_adj_room_flags {
 	BPF_F_ADJ_ROOM_FIXED_GSO	= (1ULL << 0),
 	BPF_F_ADJ_ROOM_ENCAP_L3_IPV4	= (1ULL << 1),
 	BPF_F_ADJ_ROOM_ENCAP_L3_IPV6	= (1ULL << 2),
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 677be9a47347..ca35ed622ed5 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -6211,7 +6211,7 @@ enum {
 };
 
 /* BPF_FUNC_skb_adjust_room flags. */
-enum {
+enum bpf_adj_room_flags {
 	BPF_F_ADJ_ROOM_FIXED_GSO	= (1ULL << 0),
 	BPF_F_ADJ_ROOM_ENCAP_L3_IPV4	= (1ULL << 1),
 	BPF_F_ADJ_ROOM_ENCAP_L3_IPV6	= (1ULL << 2),
-- 
2.34.1


^ permalink raw reply related

* Re: [PATCH net-next v8 4/4] tun/tap & vhost-net: avoid ptr_ring tail-drop when a qdisc is present
From: Simon Schippers @ 2026-04-16  8:54 UTC (permalink / raw)
  To: Jason Wang, Michael S. Tsirkin
  Cc: willemdebruijn.kernel, andrew+netdev, davem, edumazet, kuba,
	pabeni, mst, eperezma, leiyang, stephen, jon, tim.gebauer, netdev,
	linux-kernel, kvm, virtualization
In-Reply-To: <b9d84d88-46d5-4fd3-a5b2-d914f54766f6@tu-dortmund.de>

To summarize the discussion from my POV:

Open point: __ptr_ring_zero_tail() is only called after
            consuming ring.batch elements.
1) Consumer wakes up the producer but the slot is not cleaned.
--> I disagree, the consumer only wakes after consuming ring.size/2.
    Then __ptr_ring_zero_tail() was called at least once.
2) Producer is woken up but sees the ring is full, so it needs to
   drop the packet.
--> I disagree, because then NETDEV_TX_BUSY is returned. This is
    noticeable as qdisc requeue and only happens very rarely.

Points I will address:
- Minor nit on patch 2 by MST.
- Rebase patch 3 because of commit d748047
  ("ptr_ring: disable KCSAN warnings").
- Document the pair of the smp_mb__after_atomic() in tun_net_xmit
  with tun_ring_consume().
- Use 1 ptr_ring spinlock instead of 2 (currently used for consume
  and empty check); not sure yet how to implement that cleanly.
- Run pktgen benchmarks with pg_set SHARED.



* Re: [PATCH net-next v2 13/14] net: macb: use context swapping in .set_ringparam()
From: Théo Lebrun @ 2026-04-16  8:54 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Nicolas Ferre, Claudiu Beznea, Andrew Lunn, David S. Miller,
	Eric Dumazet, Paolo Abeni, Richard Cochran, Russell King,
	Paolo Valerio, Conor Dooley, Nicolai Buchwitz,
	Vladimir Kondratiev, Gregory CLEMENT, Benoît Monin,
	Tawfik Bayouk, Thomas Petazzoni, Maxime Chevallier, netdev,
	linux-kernel
In-Reply-To: <20260413175040.352378c5@kernel.org>

Hello Jakub,

On Tue Apr 14, 2026 at 2:50 AM CEST, Jakub Kicinski wrote:
> On Fri, 10 Apr 2026 21:52:01 +0200 Théo Lebrun wrote:
>> ethtool_ops.set_ringparam() is implemented using the primitive close /
>> update ring size / reopen sequence. Under memory pressure this does not
>> fly: we free our buffers at close and cannot reallocate new ones at
>> open. Also, it triggers a slow PHY reinit.
>> 
>> Instead, exploit the new context mechanism and improve our sequence to:
>>  - allocate a new context (including buffers) first
>>  - if it fails, early return without any impact to the interface
>>  - stop interface
>>  - update global state (bp, netdev, etc)
>>  - pass buffer pointers to the hardware
>>  - start interface
>>  - free old context.
>> 
>> The HW disable sequence is inspired by macb_reset_hw() but avoids
>> (1) setting NCR bit CLRSTAT and (2) clearing register PBUFRXCUT.
>> 
>> The HW re-enable sequence is inspired by macb_mac_link_up(), skipping
>> over register writes which would be redundant (because values have not
>> changed).
>> 
>> The generic context swapping parts are isolated into helper functions
>> macb_context_swap_start|end(), reusable by other operations (change_mtu,
>> set_channels, etc).
>
>> diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
>> index 81beb67b206a..340ae7d881c6 100644
>> --- a/drivers/net/ethernet/cadence/macb_main.c
>> +++ b/drivers/net/ethernet/cadence/macb_main.c
>> @@ -3081,6 +3081,89 @@ static void macb_configure_dma(struct macb *bp)
>>  	}
>>  }
>>  
>> +static void macb_context_swap_start(struct macb *bp)
>> +{
>> +	struct macb_queue *queue;
>> +	unsigned long flags;
>> +	unsigned int q;
>> +	u32 ctrl;
>> +
>> +	/* Disable software Tx, disable HW Tx/Rx and disable NAPI. */
>> +
>> +	netif_tx_disable(bp->netdev);
>
> AFAIR netif_tx_disable() just stops all the queues, if the NAPIs and
> whatever else may wake queues is still running the queues may get
> restarted right away.

Your memory appears correct (unsurprisingly). The ordering was wrong; it
must be (1) NAPI disabling followed by (2) disabling of Tx queues.

The tx queue wakeup is possible in NAPI poll function through this call
stack: netif_wake_subqueue() <- macb_tx_complete() <- macb_tx_poll().

There is also macb_tx_error_task(), which disables Tx queues at its start
and re-enables them at its end. This means we need to disable Tx queues
after we have disabled queue->tx_error_task. (Note that tx_error_task
probably races with NAPI, but that is outside our scope.)
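
To make the ordering argument concrete, here is a toy userspace model
(not driver code). The toy_* names and the two flags are invented for
illustration; toy_napi_poll() only stands in for the
macb_tx_poll() -> macb_tx_complete() -> netif_wake_subqueue() path
mentioned above, and tx_error_task is left out entirely.

```c
#include <stdbool.h>

/* Toy model only: one software Tx queue and one NAPI context. */
struct toy_dev {
	bool napi_enabled;	/* may the poll routine still run? */
	bool txq_stopped;	/* is the software Tx queue stopped? */
};

/* Stand-in for the Tx completion path, which may wake the queue. */
static void toy_napi_poll(struct toy_dev *d)
{
	if (d->napi_enabled)
		d->txq_stopped = false;
}

/* Wrong order: stop Tx first while NAPI is still live. */
static bool toy_stop_tx_then_napi(struct toy_dev *d)
{
	d->txq_stopped = true;
	toy_napi_poll(d);	/* a poll slips in and wakes the queue */
	d->napi_enabled = false;
	return d->txq_stopped;
}

/* Right order: quiesce NAPI first, then stop Tx. */
static bool toy_stop_napi_then_tx(struct toy_dev *d)
{
	d->napi_enabled = false;
	d->txq_stopped = true;
	toy_napi_poll(d);	/* no effect, NAPI is already disabled */
	return d->txq_stopped;
}
```

In this model the first sequence can return with the queue awake again,
while the second always leaves it stopped.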

Thanks,

--
Théo Lebrun, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com



* Re: [PATCH net] net: airoha: Fix possible TX queue stall in airoha_qdma_tx_napi_poll()
From: Paolo Abeni @ 2026-04-16  8:44 UTC (permalink / raw)
  To: Lorenzo Bianconi, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski
  Cc: linux-arm-kernel, linux-mediatek, netdev
In-Reply-To: <20260413-airoha-txq-potential-stall-v1-1-7830363b1543@kernel.org>

On 4/13/26 10:29 AM, Lorenzo Bianconi wrote:
> Since multiple net_device TX queues can share the same hw QDMA TX queue,
> there is no guarantee we have inflight packets queued in hw belonging to a
> net_device TX queue stopped in the xmit path because hw QDMA TX queue
> can be full. In this corner case the net_device TX queue will never be
> re-activated. In order to avoid any potential net_device TX queue stall,
> we need to wake all the net_device TX queues feeding the same hw QDMA TX
> queue in airoha_qdma_tx_napi_poll routine.
> 
> Fixes: 23020f0493270 ("net: airoha: Introduce ethernet support for EN7581 SoC")
> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> ---
>  drivers/net/ethernet/airoha/airoha_eth.c | 30 ++++++++++++++++++++++++++----
>  1 file changed, 26 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
> index 9e995094c32a..e7610f36b8e4 100644
> --- a/drivers/net/ethernet/airoha/airoha_eth.c
> +++ b/drivers/net/ethernet/airoha/airoha_eth.c
> @@ -855,6 +855,19 @@ static int airoha_qdma_init_rx(struct airoha_qdma *qdma)
>  	return 0;
>  }
>  
> +static void airoha_qdma_wake_tx_queues(struct airoha_qdma *qdma)
> +{
> +	struct airoha_eth *eth = qdma->eth;
> +	int i;
> +
> +	for (i = 0; i < ARRAY_SIZE(eth->ports); i++) {
> +		struct airoha_gdm_port *port = eth->ports[i];
> +
> +		if (port && port->qdma == qdma)
> +			netif_tx_wake_all_queues(port->dev);
> +	}
> +}
> +
>  static int airoha_qdma_tx_napi_poll(struct napi_struct *napi, int budget)
>  {
>  	struct airoha_tx_irq_queue *irq_q;
> @@ -931,12 +944,21 @@ static int airoha_qdma_tx_napi_poll(struct napi_struct *napi, int budget)
>  
>  			txq = netdev_get_tx_queue(skb->dev, queue);
>  			netdev_tx_completed_queue(txq, 1, skb->len);
> -			if (netif_tx_queue_stopped(txq) &&
> -			    q->ndesc - q->queued >= q->free_thr)
> -				netif_tx_wake_queue(txq);
> -
>  			dev_kfree_skb_any(skb);
>  		}
> +
> +		if (q->ndesc - q->queued == q->free_thr) {

Sashiko says:

---
Can this exact equality check cause a permanent TX queue stall?
The previous logic checked if the free space was greater than or equal
to q->free_thr. If the xmit path stops the queue because the free space
drops to exactly q->free_thr, the hardware queue will have exactly
q->free_thr free slots.
When the NAPI poll routine subsequently reaps a completed descriptor,
q->queued is decremented, increasing the free space to q->free_thr + 1.
Since the free space is no longer exactly equal to the threshold, this
condition evaluates to false.
As NAPI continues to reap more descriptors, the free space strictly
increases, meaning the exact equality check will never evaluate to true
and the netdev TX queue will remain permanently stalled.
---

Please try to triage Sashiko comments proactively. Especially on NIC
drivers, validating the AI statements is extremely cumbersome for the
maintainers.
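
The arithmetic behind the quoted concern can be checked with a small
standalone model. This is only a sketch: struct toy_ring and the helper
names are made up, and the starting state (queue stopped with exactly
free_thr free slots) is the assumption taken from the quoted analysis,
not something verified against the xmit path.

```c
#include <stdbool.h>

/* Toy stand-in for the hw queue bookkeeping (ndesc, queued, free_thr). */
struct toy_ring {
	int ndesc;	/* total descriptors */
	int queued;	/* descriptors currently in flight */
	int free_thr;	/* wake threshold */
};

/* Condition as written in the patch: exact equality. */
static bool wake_if_equal(const struct toy_ring *q)
{
	return q->ndesc - q->queued == q->free_thr;
}

/* Condition as it was before the patch: at least free_thr slots free. */
static bool wake_if_at_least(const struct toy_ring *q)
{
	return q->ndesc - q->queued >= q->free_thr;
}

/* Reap n completed descriptors, then evaluate the wake condition once. */
static bool wake_after_reap(struct toy_ring q, int n,
			    bool (*cond)(const struct toy_ring *))
{
	q.queued -= n;
	return cond(&q);
}
```

Starting from free space == free_thr, any reap of one or more
descriptors makes the equality test false, whereas the >= test stays
true.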

Thanks,

Paolo



* [PATCH net] ixgbe: only access vfinfo and mv_list under RCU lock
From: Corinna Vinschen @ 2026-04-16  8:42 UTC (permalink / raw)
  To: intel-wired-lan, netdev; +Cc: Corinna Vinschen

Commit 1e53834ce541d ("ixgbe: Add locking to prevent panic when setting
sriov_numvfs to zero") added a spinlock to the adapter info.  The reason
at the time was an observed crash when ixgbe_disable_sriov() freed the
adapter->vfinfo array while the interrupt driven function ixgbe_msg_task()
was handling VF messages.

Recent stability testing turned up another crash, which is very easily
reproducible:

  while true
  do
    for numvfs in 5 0
    do
      echo $numvfs > /sys/class/net/eth0/device/sriov_numvfs
    done
  done

This crashed almost always within the first two hundred runs with
a NULL pointer deref while running the ixgbe_service_task() workqueue:

[ 5052.036491] BUG: kernel NULL pointer dereference, address: 0000000000000258
[ 5052.043454] #PF: supervisor read access in kernel mode
[ 5052.048594] #PF: error_code(0x0000) - not-present page
[ 5052.053734] PGD 0 P4D 0
[ 5052.056272] Oops: Oops: 0000 #1 SMP NOPTI
[ 5052.060459] CPU: 2 UID: 0 PID: 132253 Comm: kworker/u96:0 Kdump: loaded Not tainted 6.12.0-180.el10.x86_64 #1 PREEMPT(voluntary)
[ 5052.072100] Hardware name: Dell Inc. PowerEdge R740/0DY2X0, BIOS 2.12.2 07/09/2021
[ 5052.079664] Workqueue: ixgbe ixgbe_service_task [ixgbe]
[ 5052.084907] RIP: 0010:ixgbe_update_stats+0x8b1/0xb40 [ixgbe]
[ 5052.090585] Code: 21 56 50 49 8b b6 18 26 00 00 4c 01 fe 48 09 46 50 42 8d 34 a5 00 83 00 00 e8 cb 7a ff ff 49 8b b6 18 26 00 00 89 c0 4c 01 fe <48> 3b 86 88 00 00 00 73 18 48 b9 00 00 00 00 01 00 00 00 48 01 4e
[ 5052.109331] RSP: 0018:ffffd5f1e8a6bd88 EFLAGS: 00010202
[ 5052.114558] RAX: 0000000000000000 RBX: ffff8f49b22b14a0 RCX: 000000000000023c
[ 5052.121689] RDX: ffffffff00000000 RSI: 00000000000001d0 RDI: ffff8f49b22b14a0
[ 5052.128823] RBP: 000000000000109c R08: 0000000000000000 R09: 0000000000000000
[ 5052.135955] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002
[ 5052.143086] R13: 0000000000008410 R14: ffff8f49b22b01a0 R15: 00000000000001d0
[ 5052.150221] FS:  0000000000000000(0000) GS:ffff8f58bfc80000(0000) knlGS:0000000000000000
[ 5052.158307] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5052.164054] CR2: 0000000000000258 CR3: 0000000bf2624006 CR4: 00000000007726f0
[ 5052.171187] PKRU: 55555554
[ 5052.173898] Call Trace:
[ 5052.176351]  <TASK>
[ 5052.178457]  ? show_trace_log_lvl+0x1b0/0x2f0
[ 5052.182816]  ? show_trace_log_lvl+0x1b0/0x2f0
[ 5052.187177]  ? ixgbe_watchdog_subtask+0x1a1/0x230 [ixgbe]
[ 5052.192591]  ? __die_body.cold+0x8/0x12
[ 5052.196433]  ? page_fault_oops+0x148/0x160
[ 5052.200532]  ? exc_page_fault+0x7f/0x150
[ 5052.204458]  ? asm_exc_page_fault+0x26/0x30
[ 5052.208643]  ? ixgbe_update_stats+0x8b1/0xb40 [ixgbe]
[ 5052.213714]  ? ixgbe_update_stats+0x8a5/0xb40 [ixgbe]
[ 5052.218784]  ixgbe_watchdog_subtask+0x1a1/0x230 [ixgbe]
[ 5052.224026]  ixgbe_service_task+0x15a/0x3f0 [ixgbe]
[ 5052.228916]  process_one_work+0x177/0x330
[ 5052.232928]  worker_thread+0x256/0x3a0
[ 5052.236681]  ? __pfx_worker_thread+0x10/0x10
[ 5052.240952]  kthread+0xfa/0x240
[ 5052.244099]  ? __pfx_kthread+0x10/0x10
[ 5052.247852]  ret_from_fork+0x34/0x50
[ 5052.251429]  ? __pfx_kthread+0x10/0x10
[ 5052.255185]  ret_from_fork_asm+0x1a/0x30
[ 5052.259112]  </TASK>

The first simple patch, just adding spinlocking to ixgbe_update_stats()
while reading from adapter->vfinfo, did not fix the problem, it just
moved it elsewhere: I could now reproduce the same kind of crash in
ixgbe_restore_vf_multicasts().

But adding more spinlocking doesn't really cut it.  One reason is that
ixgbe_restore_vf_multicasts() is called from within ixgbe_msg_task()
with the spinlock held, as well as from elsewhere without any locking.

Additionally, given that ixgbe_disable_sriov() is the only call changing
adapter->vfinfo, and given that ixgbe_disable_sriov() is called very
rarely compared to other actions in the driver, just adding more
spinlocks would make the driver spin unnecessarily whenever multiple
functions accessing adapter->vfinfo run in parallel.

So this patch drops the spinlock in favor of RCU and uses it throughout
the driver.

While changing this, it seems prudent to do the same for the
adapter->mv_list array, which is allocated and freed at the same time as
adapter->vfinfo, although no crash was observed there.
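
For readers less familiar with the pattern, the reader/writer shape used
throughout the patch can be sketched in plain userspace C. This is an
analogue only, under loud assumptions: the toy_* types and functions are
invented, C11 release/acquire atomics stand in for rcu_assign_pointer()
and rcu_dereference(), and neither rcu_read_lock() nor the grace period
before freeing is modelled. The num_vfs handling is simplified as well,
and the sketch is only meant to be exercised single-threaded.

```c
#include <stdatomic.h>
#include <stddef.h>

struct toy_vf {
	unsigned long rx_packets;
};

struct toy_adapter {
	/* stands in for "struct vf_data_storage __rcu *vfinfo" */
	_Atomic(struct toy_vf *) vfinfo;
	unsigned int num_vfs;
};

/* Writer: initialise the array fully, then publish the pointer. */
static void toy_publish(struct toy_adapter *a, struct toy_vf *v,
			unsigned int n)
{
	atomic_store_explicit(&a->vfinfo, v, memory_order_release);
	a->num_vfs = n;
}

/* Writer: unpublish and hand back the old array.  Kernel code would
 * have to wait for a grace period before freeing it; not modelled here.
 */
static struct toy_vf *toy_unpublish(struct toy_adapter *a)
{
	a->num_vfs = 0;
	return atomic_exchange_explicit(&a->vfinfo, NULL,
					memory_order_acq_rel);
}

/* Reader: take one snapshot, NULL-check it, use only the snapshot. */
static int toy_read_rx(struct toy_adapter *a, unsigned int vf,
		       unsigned long *out)
{
	struct toy_vf *v = atomic_load_explicit(&a->vfinfo,
						memory_order_acquire);

	if (!v || vf >= a->num_vfs)
		return -1;
	*out = v[vf].rx_packets;
	return 0;
}
```

The point of the shape is that a reader never re-reads the shared
pointer after the NULL check, so a concurrent teardown cannot turn a
checked pointer into a NULL dereference.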

Fixes: 1e53834ce541d ("ixgbe: Add locking to prevent panic when setting sriov_numvfs to zero")
Signed-off-by: Corinna Vinschen <vinschen@redhat.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe.h      |   7 +-
 .../net/ethernet/intel/ixgbe/ixgbe_dcb_nl.c   |  36 +-
 .../net/ethernet/intel/ixgbe/ixgbe_ethtool.c  |  44 +-
 .../net/ethernet/intel/ixgbe/ixgbe_ipsec.c    |  17 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 229 +++++---
 .../net/ethernet/intel/ixgbe/ixgbe_sriov.c    | 547 ++++++++++++------
 6 files changed, 593 insertions(+), 287 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 9b8217523fd2..8849b9f42bf6 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -210,6 +210,7 @@ struct vf_stats {
 };
 
 struct vf_data_storage {
+	struct rcu_head rcu_head;
 	struct pci_dev *vfdev;
 	unsigned char vf_mac_addresses[ETH_ALEN];
 	u16 vf_mc_hashes[IXGBE_MAX_VF_MC_ENTRIES];
@@ -240,6 +241,7 @@ enum ixgbevf_xcast_modes {
 };
 
 struct vf_macvlans {
+	struct rcu_head rcu_head;
 	struct list_head l;
 	int vf;
 	bool free;
@@ -808,10 +810,10 @@ struct ixgbe_adapter {
 	/* SR-IOV */
 	DECLARE_BITMAP(active_vfs, IXGBE_MAX_VF_FUNCTIONS);
 	unsigned int num_vfs;
-	struct vf_data_storage *vfinfo;
+	struct vf_data_storage __rcu *vfinfo;
 	int vf_rate_link_speed;
 	struct vf_macvlans vf_mvs;
-	struct vf_macvlans *mv_list;
+	struct vf_macvlans __rcu *mv_list;
 
 	u32 timer_event_accumulator;
 	u32 vferr_refcount;
@@ -844,7 +846,6 @@ struct ixgbe_adapter {
 #ifdef CONFIG_IXGBE_IPSEC
 	struct ixgbe_ipsec *ipsec;
 #endif /* CONFIG_IXGBE_IPSEC */
-	spinlock_t vfs_lock;
 };
 
 struct ixgbe_netdevice_priv {
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb_nl.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb_nl.c
index 382d097e4b11..9a84cfc09120 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb_nl.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb_nl.c
@@ -640,17 +640,21 @@ static int ixgbe_dcbnl_ieee_setapp(struct net_device *dev,
 	/* VF devices should use default UP when available */
 	if (app->selector == IEEE_8021QAZ_APP_SEL_ETHERTYPE &&
 	    app->protocol == 0) {
+		struct vf_data_storage *vfinfo;
 		int vf;
 
 		adapter->default_up = app->priority;
 
-		for (vf = 0; vf < adapter->num_vfs; vf++) {
-			struct vf_data_storage *vfinfo = &adapter->vfinfo[vf];
-
-			if (!vfinfo->pf_qos)
-				ixgbe_set_vmvir(adapter, vfinfo->pf_vlan,
-						app->priority, vf);
-		}
+		rcu_read_lock();
+		vfinfo = rcu_dereference(adapter->vfinfo);
+		if (vfinfo)
+			for (vf = 0; vf < adapter->num_vfs; vf++) {
+				if (!vfinfo[vf].pf_qos)
+					ixgbe_set_vmvir(adapter,
+							vfinfo[vf].pf_vlan,
+							app->priority, vf);
+			}
+		rcu_read_unlock();
 	}
 
 	return 0;
@@ -683,19 +687,23 @@ static int ixgbe_dcbnl_ieee_delapp(struct net_device *dev,
 	/* IF default priority is being removed clear VF default UP */
 	if (app->selector == IEEE_8021QAZ_APP_SEL_ETHERTYPE &&
 	    app->protocol == 0 && adapter->default_up == app->priority) {
+		struct vf_data_storage *vfinfo;
 		int vf;
 		long unsigned int app_mask = dcb_ieee_getapp_mask(dev, app);
 		int qos = app_mask ? find_first_bit(&app_mask, 8) : 0;
 
 		adapter->default_up = qos;
 
-		for (vf = 0; vf < adapter->num_vfs; vf++) {
-			struct vf_data_storage *vfinfo = &adapter->vfinfo[vf];
-
-			if (!vfinfo->pf_qos)
-				ixgbe_set_vmvir(adapter, vfinfo->pf_vlan,
-						qos, vf);
-		}
+		rcu_read_lock();
+		vfinfo = rcu_dereference(adapter->vfinfo);
+		if (vfinfo)
+			for (vf = 0; vf < adapter->num_vfs; vf++) {
+				if (!vfinfo[vf].pf_qos)
+					ixgbe_set_vmvir(adapter,
+							vfinfo[vf].pf_vlan,
+							qos, vf);
+			}
+		rcu_read_unlock();
 	}
 
 	return err;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
index ba049b3a9609..b77317476af4 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
@@ -2265,21 +2265,28 @@ static void ixgbe_diag_test(struct net_device *netdev,
 		struct ixgbe_hw *hw = &adapter->hw;
 
 		if (adapter->flags & IXGBE_FLAG_SRIOV_ENABLED) {
+			struct vf_data_storage *vfinfo;
 			int i;
-			for (i = 0; i < adapter->num_vfs; i++) {
-				if (adapter->vfinfo[i].clear_to_send) {
-					netdev_warn(netdev, "offline diagnostic is not supported when VFs are present\n");
-					data[0] = 1;
-					data[1] = 1;
-					data[2] = 1;
-					data[3] = 1;
-					data[4] = 1;
-					eth_test->flags |= ETH_TEST_FL_FAILED;
-					clear_bit(__IXGBE_TESTING,
-						  &adapter->state);
-					return;
+
+			rcu_read_lock();
+			vfinfo = rcu_dereference(adapter->vfinfo);
+			if (vfinfo)
+				for (i = 0; i < adapter->num_vfs; i++) {
+					if (vfinfo[i].clear_to_send) {
+						netdev_warn(netdev, "offline diagnostic is not supported when VFs are present\n");
+						data[0] = 1;
+						data[1] = 1;
+						data[2] = 1;
+						data[3] = 1;
+						data[4] = 1;
+						eth_test->flags |= ETH_TEST_FL_FAILED;
+						clear_bit(__IXGBE_TESTING,
+							  &adapter->state);
+						rcu_read_unlock();
+						return;
+					}
 				}
-			}
+			rcu_read_unlock();
 		}
 
 		/* Offline tests */
@@ -3700,9 +3707,14 @@ static int ixgbe_set_priv_flags(struct net_device *netdev, u32 priv_flags)
 	if (priv_flags & IXGBE_PRIV_FLAGS_AUTO_DISABLE_VF) {
 		if (adapter->hw.mac.type == ixgbe_mac_82599EB) {
 			/* Reset primary abort counter */
-			for (i = 0; i < adapter->num_vfs; i++)
-				adapter->vfinfo[i].primary_abort_count = 0;
-
+			struct vf_data_storage *vfinfo;
+
+			rcu_read_lock();
+			vfinfo = rcu_dereference(adapter->vfinfo);
+			if (vfinfo)
+				for (i = 0; i < adapter->num_vfs; i++)
+					vfinfo[i].primary_abort_count = 0;
+			rcu_read_unlock();
 			flags2 |= IXGBE_FLAG2_AUTO_DISABLE_VF;
 		} else {
 			e_info(probe,
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
index bd397b3d7dea..b524a3a61eb6 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
@@ -874,6 +874,7 @@ void ixgbe_ipsec_vf_clear(struct ixgbe_adapter *adapter, u32 vf)
 int ixgbe_ipsec_vf_add_sa(struct ixgbe_adapter *adapter, u32 *msgbuf, u32 vf)
 {
 	struct ixgbe_ipsec *ipsec = adapter->ipsec;
+	struct vf_data_storage *vfinfo;
 	struct xfrm_algo_desc *algo;
 	struct sa_mbx_msg *sam;
 	struct xfrm_state *xs;
@@ -883,7 +884,13 @@ int ixgbe_ipsec_vf_add_sa(struct ixgbe_adapter *adapter, u32 *msgbuf, u32 vf)
 	int err;
 
 	sam = (struct sa_mbx_msg *)(&msgbuf[1]);
-	if (!adapter->vfinfo[vf].trusted ||
+
+	lockdep_assert_in_rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (!vfinfo)
+		return 0;
+
+	if (!vfinfo[vf].trusted ||
 	    !(adapter->flags2 & IXGBE_FLAG2_VF_IPSEC_ENABLED)) {
 		e_warn(drv, "VF %d attempted to add an IPsec SA\n", vf);
 		err = -EACCES;
@@ -984,11 +991,17 @@ int ixgbe_ipsec_vf_add_sa(struct ixgbe_adapter *adapter, u32 *msgbuf, u32 vf)
 int ixgbe_ipsec_vf_del_sa(struct ixgbe_adapter *adapter, u32 *msgbuf, u32 vf)
 {
 	struct ixgbe_ipsec *ipsec = adapter->ipsec;
+	struct vf_data_storage *vfinfo;
 	struct xfrm_state *xs;
 	u32 pfsa = msgbuf[1];
 	u16 sa_idx;
 
-	if (!adapter->vfinfo[vf].trusted) {
+	lockdep_assert_in_rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (!vfinfo)
+		return 0;
+
+	if (!vfinfo[vf].trusted) {
 		e_err(drv, "vf %d attempted to delete an SA\n", vf);
 		return -EPERM;
 	}
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 2646ee6f295f..6ee8c2a140c2 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1240,20 +1240,26 @@ static void ixgbe_pf_handle_tx_hang(struct ixgbe_ring *tx_ring,
 static void ixgbe_vf_handle_tx_hang(struct ixgbe_adapter *adapter, u16 vf)
 {
 	struct ixgbe_hw *hw = &adapter->hw;
+	struct vf_data_storage *vfinfo;
 
 	if (adapter->hw.mac.type != ixgbe_mac_e610)
 		return;
 
-	e_warn(drv,
-	       "Malicious Driver Detection tx hang detected on PF %d VF %d MAC: %pM",
-	       hw->bus.func, vf, adapter->vfinfo[vf].vf_mac_addresses);
-
-	adapter->tx_hang_count[vf]++;
-	if (adapter->tx_hang_count[vf] == IXGBE_MAX_TX_VF_HANGS) {
-		ixgbe_set_vf_link_state(adapter, vf,
-					IFLA_VF_LINK_STATE_DISABLE);
-		adapter->tx_hang_count[vf] = 0;
+	rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (vfinfo) {
+		e_warn(drv,
+		       "Malicious Driver Detection tx hang detected on PF %d VF %d MAC: %pM",
+		       hw->bus.func, vf, vfinfo[vf].vf_mac_addresses);
+
+		adapter->tx_hang_count[vf]++;
+		if (adapter->tx_hang_count[vf] == IXGBE_MAX_TX_VF_HANGS) {
+			ixgbe_set_vf_link_state(adapter, vf,
+						IFLA_VF_LINK_STATE_DISABLE);
+			adapter->tx_hang_count[vf] = 0;
+		}
 	}
+	rcu_read_unlock();
 }
 
 static u32 ixgbe_poll_tx_icache(struct ixgbe_hw *hw, u16 queue, u16 idx)
@@ -4625,6 +4631,7 @@ static void ixgbe_configure_virtualization(struct ixgbe_adapter *adapter)
 	struct ixgbe_hw *hw = &adapter->hw;
 	u16 pool = adapter->num_rx_pools;
 	u32 reg_offset, vf_shift, vmolr;
+	struct vf_data_storage *vfinfo;
 	u32 gcr_ext, vmdctl;
 	int i;
 
@@ -4680,15 +4687,19 @@ static void ixgbe_configure_virtualization(struct ixgbe_adapter *adapter)
 
 	IXGBE_WRITE_REG(hw, IXGBE_GCR_EXT, gcr_ext);
 
-	for (i = 0; i < adapter->num_vfs; i++) {
-		/* configure spoof checking */
-		ixgbe_ndo_set_vf_spoofchk(adapter->netdev, i,
-					  adapter->vfinfo[i].spoofchk_enabled);
+	rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (vfinfo)
+		for (i = 0; i < adapter->num_vfs; i++) {
+			/* configure spoof checking */
+			ixgbe_ndo_set_vf_spoofchk(adapter->netdev, i,
+						  vfinfo[i].spoofchk_enabled);
 
-		/* Enable/Disable RSS query feature  */
-		ixgbe_ndo_set_vf_rss_query_en(adapter->netdev, i,
-					  adapter->vfinfo[i].rss_query_enabled);
-	}
+			/* Enable/Disable RSS query feature  */
+			ixgbe_ndo_set_vf_rss_query_en(adapter->netdev, i,
+						  vfinfo[i].rss_query_enabled);
+		}
+	rcu_read_unlock();
 }
 
 static void ixgbe_set_rx_buffer_len(struct ixgbe_adapter *adapter)
@@ -6093,35 +6104,40 @@ static void ixgbe_check_media_subtask(struct ixgbe_adapter *adapter)
 static void ixgbe_clear_vf_stats_counters(struct ixgbe_adapter *adapter)
 {
 	struct ixgbe_hw *hw = &adapter->hw;
+	struct vf_data_storage *vfinfo;
 	int i;
 
-	for (i = 0; i < adapter->num_vfs; i++) {
-		adapter->vfinfo[i].last_vfstats.gprc =
-			IXGBE_READ_REG(hw, IXGBE_PVFGPRC(i));
-		adapter->vfinfo[i].saved_rst_vfstats.gprc +=
-			adapter->vfinfo[i].vfstats.gprc;
-		adapter->vfinfo[i].vfstats.gprc = 0;
-		adapter->vfinfo[i].last_vfstats.gptc =
-			IXGBE_READ_REG(hw, IXGBE_PVFGPTC(i));
-		adapter->vfinfo[i].saved_rst_vfstats.gptc +=
-			adapter->vfinfo[i].vfstats.gptc;
-		adapter->vfinfo[i].vfstats.gptc = 0;
-		adapter->vfinfo[i].last_vfstats.gorc =
-			IXGBE_READ_REG(hw, IXGBE_PVFGORC_LSB(i));
-		adapter->vfinfo[i].saved_rst_vfstats.gorc +=
-			adapter->vfinfo[i].vfstats.gorc;
-		adapter->vfinfo[i].vfstats.gorc = 0;
-		adapter->vfinfo[i].last_vfstats.gotc =
-			IXGBE_READ_REG(hw, IXGBE_PVFGOTC_LSB(i));
-		adapter->vfinfo[i].saved_rst_vfstats.gotc +=
-			adapter->vfinfo[i].vfstats.gotc;
-		adapter->vfinfo[i].vfstats.gotc = 0;
-		adapter->vfinfo[i].last_vfstats.mprc =
-			IXGBE_READ_REG(hw, IXGBE_PVFMPRC(i));
-		adapter->vfinfo[i].saved_rst_vfstats.mprc +=
-			adapter->vfinfo[i].vfstats.mprc;
-		adapter->vfinfo[i].vfstats.mprc = 0;
-	}
+	rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (vfinfo)
+		for (i = 0; i < adapter->num_vfs; i++) {
+			vfinfo[i].last_vfstats.gprc =
+				IXGBE_READ_REG(hw, IXGBE_PVFGPRC(i));
+			vfinfo[i].saved_rst_vfstats.gprc +=
+				vfinfo[i].vfstats.gprc;
+			vfinfo[i].vfstats.gprc = 0;
+			vfinfo[i].last_vfstats.gptc =
+				IXGBE_READ_REG(hw, IXGBE_PVFGPTC(i));
+			vfinfo[i].saved_rst_vfstats.gptc +=
+				vfinfo[i].vfstats.gptc;
+			vfinfo[i].vfstats.gptc = 0;
+			vfinfo[i].last_vfstats.gorc =
+				IXGBE_READ_REG(hw, IXGBE_PVFGORC_LSB(i));
+			vfinfo[i].saved_rst_vfstats.gorc +=
+				vfinfo[i].vfstats.gorc;
+			vfinfo[i].vfstats.gorc = 0;
+			vfinfo[i].last_vfstats.gotc =
+				IXGBE_READ_REG(hw, IXGBE_PVFGOTC_LSB(i));
+			vfinfo[i].saved_rst_vfstats.gotc +=
+				vfinfo[i].vfstats.gotc;
+			vfinfo[i].vfstats.gotc = 0;
+			vfinfo[i].last_vfstats.mprc =
+				IXGBE_READ_REG(hw, IXGBE_PVFMPRC(i));
+			vfinfo[i].saved_rst_vfstats.mprc +=
+				vfinfo[i].vfstats.mprc;
+			vfinfo[i].vfstats.mprc = 0;
+		}
+	rcu_read_unlock();
 }
 
 static void ixgbe_setup_gpie(struct ixgbe_adapter *adapter)
@@ -6729,15 +6745,22 @@ void ixgbe_down(struct ixgbe_adapter *adapter)
 	timer_delete_sync(&adapter->service_timer);
 
 	if (adapter->num_vfs) {
+		struct vf_data_storage *vfinfo;
+
 		/* Clear EITR Select mapping */
 		IXGBE_WRITE_REG(&adapter->hw, IXGBE_EITRSEL, 0);
 
+		rcu_read_lock();
+		vfinfo = rcu_dereference(adapter->vfinfo);
 		/* Mark all the VFs as inactive */
-		for (i = 0 ; i < adapter->num_vfs; i++)
-			adapter->vfinfo[i].clear_to_send = false;
+		if (vfinfo) {
+			for (i = 0 ; i < adapter->num_vfs; i++)
+				vfinfo[i].clear_to_send = false;
 
-		/* update setting rx tx for all active vfs */
-		ixgbe_set_all_vfs(adapter);
+			/* update setting rx tx for all active vfs */
+			ixgbe_set_all_vfs(adapter);
+		}
+		rcu_read_unlock();
 	}
 
 	/* disable transmits in the hardware now that interrupts are off */
@@ -7001,9 +7024,6 @@ static int ixgbe_sw_init(struct ixgbe_adapter *adapter,
 	/* n-tuple support exists, always init our spinlock */
 	spin_lock_init(&adapter->fdir_perfect_lock);
 
-	/* init spinlock to avoid concurrency of VF resources */
-	spin_lock_init(&adapter->vfs_lock);
-
 #ifdef CONFIG_IXGBE_DCB
 	ixgbe_init_dcb(adapter);
 #endif
@@ -7905,25 +7925,31 @@ void ixgbe_update_stats(struct ixgbe_adapter *adapter)
 	 * crazy values.
 	 */
 	if (!test_bit(__IXGBE_RESETTING, &adapter->state)) {
-		for (i = 0; i < adapter->num_vfs; i++) {
-			UPDATE_VF_COUNTER_32bit(IXGBE_PVFGPRC(i),
-						adapter->vfinfo[i].last_vfstats.gprc,
-						adapter->vfinfo[i].vfstats.gprc);
-			UPDATE_VF_COUNTER_32bit(IXGBE_PVFGPTC(i),
-						adapter->vfinfo[i].last_vfstats.gptc,
-						adapter->vfinfo[i].vfstats.gptc);
-			UPDATE_VF_COUNTER_36bit(IXGBE_PVFGORC_LSB(i),
-						IXGBE_PVFGORC_MSB(i),
-						adapter->vfinfo[i].last_vfstats.gorc,
-						adapter->vfinfo[i].vfstats.gorc);
-			UPDATE_VF_COUNTER_36bit(IXGBE_PVFGOTC_LSB(i),
-						IXGBE_PVFGOTC_MSB(i),
-						adapter->vfinfo[i].last_vfstats.gotc,
-						adapter->vfinfo[i].vfstats.gotc);
-			UPDATE_VF_COUNTER_32bit(IXGBE_PVFMPRC(i),
-						adapter->vfinfo[i].last_vfstats.mprc,
-						adapter->vfinfo[i].vfstats.mprc);
-		}
+		struct vf_data_storage *vfinfo;
+
+		rcu_read_lock();
+		vfinfo = rcu_dereference(adapter->vfinfo);
+		if (vfinfo)
+			for (i = 0; i < adapter->num_vfs; i++) {
+				UPDATE_VF_COUNTER_32bit(IXGBE_PVFGPRC(i),
+							vfinfo[i].last_vfstats.gprc,
+							vfinfo[i].vfstats.gprc);
+				UPDATE_VF_COUNTER_32bit(IXGBE_PVFGPTC(i),
+							vfinfo[i].last_vfstats.gptc,
+							vfinfo[i].vfstats.gptc);
+				UPDATE_VF_COUNTER_36bit(IXGBE_PVFGORC_LSB(i),
+							IXGBE_PVFGORC_MSB(i),
+							vfinfo[i].last_vfstats.gorc,
+							vfinfo[i].vfstats.gorc);
+				UPDATE_VF_COUNTER_36bit(IXGBE_PVFGOTC_LSB(i),
+							IXGBE_PVFGOTC_MSB(i),
+							vfinfo[i].last_vfstats.gotc,
+							vfinfo[i].vfstats.gotc);
+				UPDATE_VF_COUNTER_32bit(IXGBE_PVFMPRC(i),
+							vfinfo[i].last_vfstats.mprc,
+							vfinfo[i].vfstats.mprc);
+			}
+		rcu_read_unlock();
 	}
 }
 
@@ -8267,22 +8293,27 @@ static void ixgbe_watchdog_flush_tx(struct ixgbe_adapter *adapter)
 static void ixgbe_bad_vf_abort(struct ixgbe_adapter *adapter, u32 vf)
 {
 	struct ixgbe_hw *hw = &adapter->hw;
+	struct vf_data_storage *vfinfo;
 
-	if (adapter->hw.mac.type == ixgbe_mac_82599EB &&
+	rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (vfinfo &&
+	    adapter->hw.mac.type == ixgbe_mac_82599EB &&
 	    adapter->flags2 & IXGBE_FLAG2_AUTO_DISABLE_VF) {
-		adapter->vfinfo[vf].primary_abort_count++;
-		if (adapter->vfinfo[vf].primary_abort_count ==
+		vfinfo[vf].primary_abort_count++;
+		if (vfinfo[vf].primary_abort_count ==
 		    IXGBE_PRIMARY_ABORT_LIMIT) {
 			ixgbe_set_vf_link_state(adapter, vf,
 						IFLA_VF_LINK_STATE_DISABLE);
-			adapter->vfinfo[vf].primary_abort_count = 0;
+			vfinfo[vf].primary_abort_count = 0;
 
 			e_info(drv,
 			       "Malicious Driver Detection event detected on PF %d VF %d MAC: %pM mdd-disable-vf=on",
 			       hw->bus.func, vf,
-			       adapter->vfinfo[vf].vf_mac_addresses);
+			       vfinfo[vf].vf_mac_addresses);
 		}
 	}
+	rcu_read_unlock();
 }
 
 static void ixgbe_check_for_bad_vf(struct ixgbe_adapter *adapter)
@@ -8309,9 +8340,15 @@ static void ixgbe_check_for_bad_vf(struct ixgbe_adapter *adapter)
 
 	/* check status reg for all VFs owned by this PF */
 	for (vf = 0; vf < adapter->num_vfs; ++vf) {
-		struct pci_dev *vfdev = adapter->vfinfo[vf].vfdev;
+		struct vf_data_storage *vfinfo;
+		struct pci_dev *vfdev = NULL;
 		u16 status_reg;
 
+		rcu_read_lock();
+		vfinfo = rcu_dereference(adapter->vfinfo);
+		if (vfinfo)
+			vfdev = vfinfo[vf].vfdev;
+		rcu_read_unlock();
 		if (!vfdev)
 			continue;
 		pci_read_config_word(vfdev, PCI_STATUS, &status_reg);
@@ -9744,17 +9781,23 @@ static int ixgbe_ndo_get_vf_stats(struct net_device *netdev, int vf,
 				  struct ifla_vf_stats *vf_stats)
 {
 	struct ixgbe_adapter *adapter = ixgbe_from_netdev(netdev);
+	struct vf_data_storage *vfinfo;
 
 	if (vf < 0 || vf >= adapter->num_vfs)
 		return -EINVAL;
 
-	vf_stats->rx_packets = adapter->vfinfo[vf].vfstats.gprc;
-	vf_stats->rx_bytes   = adapter->vfinfo[vf].vfstats.gorc;
-	vf_stats->tx_packets = adapter->vfinfo[vf].vfstats.gptc;
-	vf_stats->tx_bytes   = adapter->vfinfo[vf].vfstats.gotc;
-	vf_stats->multicast  = adapter->vfinfo[vf].vfstats.mprc;
+	rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (vfinfo) {
+		vf_stats->rx_packets = vfinfo[vf].vfstats.gprc;
+		vf_stats->rx_bytes   = vfinfo[vf].vfstats.gorc;
+		vf_stats->tx_packets = vfinfo[vf].vfstats.gptc;
+		vf_stats->tx_bytes   = vfinfo[vf].vfstats.gotc;
+		vf_stats->multicast  = vfinfo[vf].vfstats.mprc;
+	}
+	rcu_read_unlock();
 
-	return 0;
+	return vfinfo ? 0 : -EINVAL;
 }
 
 #ifdef CONFIG_IXGBE_DCB
@@ -10071,20 +10114,26 @@ static int handle_redirect_action(struct ixgbe_adapter *adapter, int ifindex,
 {
 	struct ixgbe_ring_feature *vmdq = &adapter->ring_feature[RING_F_VMDQ];
 	unsigned int num_vfs = adapter->num_vfs, vf;
+	struct vf_data_storage *vfinfo;
 	struct netdev_nested_priv priv;
 	struct upper_walk_data data;
 	struct net_device *upper;
 
 	/* redirect to a SRIOV VF */
-	for (vf = 0; vf < num_vfs; ++vf) {
-		upper = pci_get_drvdata(adapter->vfinfo[vf].vfdev);
-		if (upper->ifindex == ifindex) {
-			*queue = vf * __ALIGN_MASK(1, ~vmdq->mask);
-			*action = vf + 1;
-			*action <<= ETHTOOL_RX_FLOW_SPEC_RING_VF_OFF;
-			return 0;
+	rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (vfinfo)
+		for (vf = 0; vf < num_vfs; ++vf) {
+			upper = pci_get_drvdata(vfinfo[vf].vfdev);
+			if (upper->ifindex == ifindex) {
+				*queue = vf * __ALIGN_MASK(1, ~vmdq->mask);
+				*action = vf + 1;
+				*action <<= ETHTOOL_RX_FLOW_SPEC_RING_VF_OFF;
+				rcu_read_unlock();
+				return 0;
+			}
 		}
-	}
+	rcu_read_unlock();
 
 	/* redirect to a offloaded macvlan netdev */
 	data.adapter = adapter;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
index 431d77da15a5..80f22a8e7af4 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
@@ -44,7 +44,7 @@ static inline void ixgbe_alloc_vf_macvlans(struct ixgbe_adapter *adapter,
 			mv_list[i].free = true;
 			list_add(&mv_list[i].l, &adapter->vf_mvs.l);
 		}
-		adapter->mv_list = mv_list;
+		rcu_assign_pointer(adapter->mv_list, mv_list);
 	}
 }
 
@@ -52,6 +52,7 @@ static int __ixgbe_enable_sriov(struct ixgbe_adapter *adapter,
 				unsigned int num_vfs)
 {
 	struct ixgbe_hw *hw = &adapter->hw;
+	struct vf_data_storage *vfinfo;
 	int i;
 
 	if (adapter->xdp_prog) {
@@ -64,14 +65,11 @@ static int __ixgbe_enable_sriov(struct ixgbe_adapter *adapter,
 			  IXGBE_FLAG_VMDQ_ENABLED;
 
 	/* Allocate memory for per VF control structures */
-	adapter->vfinfo = kzalloc_objs(struct vf_data_storage, num_vfs);
-	if (!adapter->vfinfo)
+	vfinfo = kzalloc_objs(struct vf_data_storage, num_vfs);
+	if (!vfinfo)
 		return -ENOMEM;
 
-	adapter->num_vfs = num_vfs;
-
 	ixgbe_alloc_vf_macvlans(adapter, num_vfs);
-	adapter->ring_feature[RING_F_VMDQ].offset = num_vfs;
 
 	/* Initialize default switching mode VEB */
 	IXGBE_WRITE_REG(hw, IXGBE_PFDTXGSWC, IXGBE_PFDTXGSWC_VT_LBEN);
@@ -95,23 +93,27 @@ static int __ixgbe_enable_sriov(struct ixgbe_adapter *adapter,
 
 	for (i = 0; i < num_vfs; i++) {
 		/* enable spoof checking for all VFs */
-		adapter->vfinfo[i].spoofchk_enabled = true;
-		adapter->vfinfo[i].link_enable = true;
+		vfinfo[i].spoofchk_enabled = true;
+		vfinfo[i].link_enable = true;
 
 		/* We support VF RSS querying only for 82599 and x540
 		 * devices at the moment. These devices share RSS
 		 * indirection table and RSS hash key with PF therefore
 		 * we want to disable the querying by default.
 		 */
-		adapter->vfinfo[i].rss_query_enabled = false;
+		vfinfo[i].rss_query_enabled = false;
 
 		/* Untrust all VFs */
-		adapter->vfinfo[i].trusted = false;
+		vfinfo[i].trusted = false;
 
 		/* set the default xcast mode */
-		adapter->vfinfo[i].xcast_mode = IXGBEVF_XCAST_MODE_NONE;
+		vfinfo[i].xcast_mode = IXGBEVF_XCAST_MODE_NONE;
 	}
 
+	rcu_assign_pointer(adapter->vfinfo, vfinfo);
+	adapter->num_vfs = num_vfs;
+	adapter->ring_feature[RING_F_VMDQ].offset = num_vfs;
+
 	e_info(probe, "SR-IOV enabled with %d VFs\n", num_vfs);
 	return 0;
 }
@@ -123,6 +125,7 @@ static int __ixgbe_enable_sriov(struct ixgbe_adapter *adapter,
 static void ixgbe_get_vfs(struct ixgbe_adapter *adapter)
 {
 	struct pci_dev *pdev = adapter->pdev;
+	struct vf_data_storage *vfinfo;
 	u16 vendor = pdev->vendor;
 	struct pci_dev *vfdev;
 	int vf = 0;
@@ -134,18 +137,23 @@ static void ixgbe_get_vfs(struct ixgbe_adapter *adapter)
 		return;
 	pci_read_config_word(pdev, pos + PCI_SRIOV_VF_DID, &vf_id);
 
-	vfdev = pci_get_device(vendor, vf_id, NULL);
-	for (; vfdev; vfdev = pci_get_device(vendor, vf_id, vfdev)) {
-		if (!vfdev->is_virtfn)
-			continue;
-		if (vfdev->physfn != pdev)
-			continue;
-		if (vf >= adapter->num_vfs)
-			continue;
-		pci_dev_get(vfdev);
-		adapter->vfinfo[vf].vfdev = vfdev;
-		++vf;
+	rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (vfinfo) {
+		vfdev = pci_get_device(vendor, vf_id, NULL);
+		for (; vfdev; vfdev = pci_get_device(vendor, vf_id, vfdev)) {
+			if (!vfdev->is_virtfn)
+				continue;
+			if (vfdev->physfn != pdev)
+				continue;
+			if (vf >= adapter->num_vfs)
+				continue;
+			pci_dev_get(vfdev);
+			vfinfo[vf].vfdev = vfdev;
+			++vf;
+		}
 	}
+	rcu_read_unlock();
 }
 
 /* Note this function is called when the user wants to enable SR-IOV
@@ -206,31 +214,28 @@ int ixgbe_disable_sriov(struct ixgbe_adapter *adapter)
 {
 	unsigned int num_vfs = adapter->num_vfs, vf;
 	struct ixgbe_hw *hw = &adapter->hw;
-	unsigned long flags;
+	struct vf_data_storage *vfinfo;
+	struct vf_macvlans *mv_list;
 	int rss;
 
-	spin_lock_irqsave(&adapter->vfs_lock, flags);
-	/* set num VFs to 0 to prevent access to vfinfo */
+	/* set num VFs to 0 so readers bail out early */
 	adapter->num_vfs = 0;
-	spin_unlock_irqrestore(&adapter->vfs_lock, flags);
+
+	vfinfo = rcu_replace_pointer(adapter->vfinfo, NULL, 1);
+	mv_list = rcu_replace_pointer(adapter->mv_list, NULL, 1);
 
 	/* put the reference to all of the vf devices */
 	for (vf = 0; vf < num_vfs; ++vf) {
-		struct pci_dev *vfdev = adapter->vfinfo[vf].vfdev;
+		struct pci_dev *vfdev = vfinfo[vf].vfdev;
 
 		if (!vfdev)
 			continue;
-		adapter->vfinfo[vf].vfdev = NULL;
+		vfinfo[vf].vfdev = NULL;
 		pci_dev_put(vfdev);
 	}
 
-	/* free VF control structures */
-	kfree(adapter->vfinfo);
-	adapter->vfinfo = NULL;
-
-	/* free macvlan list */
-	kfree(adapter->mv_list);
-	adapter->mv_list = NULL;
+	kfree_rcu(vfinfo, rcu_head);
+	kfree_rcu(mv_list, rcu_head);
 
 	/* if SR-IOV is already disabled then there is nothing to do */
 	if (!(adapter->flags & IXGBE_FLAG_SRIOV_ENABLED))
@@ -368,8 +373,8 @@ static int ixgbe_set_vf_multicasts(struct ixgbe_adapter *adapter,
 {
 	int entries = FIELD_GET(IXGBE_VT_MSGINFO_MASK, msgbuf[0]);
 	u16 *hash_list = (u16 *)&msgbuf[1];
-	struct vf_data_storage *vfinfo = &adapter->vfinfo[vf];
 	struct ixgbe_hw *hw = &adapter->hw;
+	struct vf_data_storage *vfinfo;
 	int i;
 	u32 vector_bit;
 	u32 vector_reg;
@@ -379,28 +384,34 @@ static int ixgbe_set_vf_multicasts(struct ixgbe_adapter *adapter,
 	/* only so many hash values supported */
 	entries = min(entries, IXGBE_MAX_VF_MC_ENTRIES);
 
+	lockdep_assert_in_rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (!vfinfo)
+		return 0;
+
 	/*
 	 * salt away the number of multi cast addresses assigned
 	 * to this VF for later use to restore when the PF multi cast
 	 * list changes
 	 */
-	vfinfo->num_vf_mc_hashes = entries;
+	vfinfo[vf].num_vf_mc_hashes = entries;
 
 	/*
 	 * VFs are limited to using the MTA hash table for their multicast
 	 * addresses
 	 */
 	for (i = 0; i < entries; i++) {
-		vfinfo->vf_mc_hashes[i] = hash_list[i];
+		vfinfo[vf].vf_mc_hashes[i] = hash_list[i];
 	}
 
-	for (i = 0; i < vfinfo->num_vf_mc_hashes; i++) {
-		vector_reg = (vfinfo->vf_mc_hashes[i] >> 5) & 0x7F;
-		vector_bit = vfinfo->vf_mc_hashes[i] & 0x1F;
+	for (i = 0; i < vfinfo[vf].num_vf_mc_hashes; i++) {
+		vector_reg = (vfinfo[vf].vf_mc_hashes[i] >> 5) & 0x7F;
+		vector_bit = vfinfo[vf].vf_mc_hashes[i] & 0x1F;
 		mta_reg = IXGBE_READ_REG(hw, IXGBE_MTA(vector_reg));
 		mta_reg |= BIT(vector_bit);
 		IXGBE_WRITE_REG(hw, IXGBE_MTA(vector_reg), mta_reg);
 	}
+
 	vmolr |= IXGBE_VMOLR_ROMPE;
 	IXGBE_WRITE_REG(hw, IXGBE_VMOLR(vf), vmolr);
 
@@ -410,32 +421,39 @@ static int ixgbe_set_vf_multicasts(struct ixgbe_adapter *adapter,
 #ifdef CONFIG_PCI_IOV
 void ixgbe_restore_vf_multicasts(struct ixgbe_adapter *adapter)
 {
-	struct ixgbe_hw *hw = &adapter->hw;
 	struct vf_data_storage *vfinfo;
+	struct ixgbe_hw *hw = &adapter->hw;
 	int i, j;
 	u32 vector_bit;
 	u32 vector_reg;
 	u32 mta_reg;
 
+	rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (!vfinfo)
+		goto no_vfs;
+
 	for (i = 0; i < adapter->num_vfs; i++) {
 		u32 vmolr = IXGBE_READ_REG(hw, IXGBE_VMOLR(i));
-		vfinfo = &adapter->vfinfo[i];
-		for (j = 0; j < vfinfo->num_vf_mc_hashes; j++) {
+		for (j = 0; j < vfinfo[i].num_vf_mc_hashes; j++) {
 			hw->addr_ctrl.mta_in_use++;
-			vector_reg = (vfinfo->vf_mc_hashes[j] >> 5) & 0x7F;
-			vector_bit = vfinfo->vf_mc_hashes[j] & 0x1F;
+			vector_reg = (vfinfo[i].vf_mc_hashes[j] >> 5) & 0x7F;
+			vector_bit = vfinfo[i].vf_mc_hashes[j] & 0x1F;
 			mta_reg = IXGBE_READ_REG(hw, IXGBE_MTA(vector_reg));
 			mta_reg |= BIT(vector_bit);
 			IXGBE_WRITE_REG(hw, IXGBE_MTA(vector_reg), mta_reg);
 		}
 
-		if (vfinfo->num_vf_mc_hashes)
+		if (vfinfo[i].num_vf_mc_hashes)
 			vmolr |= IXGBE_VMOLR_ROMPE;
 		else
 			vmolr &= ~IXGBE_VMOLR_ROMPE;
 		IXGBE_WRITE_REG(hw, IXGBE_VMOLR(i), vmolr);
 	}
 
+no_vfs:
+	rcu_read_unlock();
+
 	/* Restore any VF macvlans */
 	ixgbe_full_sync_mac_table(adapter);
 }
@@ -493,7 +511,9 @@ static int ixgbe_set_vf_lpe(struct ixgbe_adapter *adapter, u32 max_frame, u32 vf
 	 */
 	if (adapter->hw.mac.type == ixgbe_mac_82599EB) {
 		struct net_device *dev = adapter->netdev;
+		unsigned int vf_api = ixgbe_mbox_api_10;
 		int pf_max_frame = dev->mtu + ETH_HLEN;
+		struct vf_data_storage *vfinfo;
 		u32 reg_offset, vf_shift, vfre;
 		int err = 0;
 
@@ -503,7 +523,12 @@ static int ixgbe_set_vf_lpe(struct ixgbe_adapter *adapter, u32 max_frame, u32 vf
 					     IXGBE_FCOE_JUMBO_FRAME_SIZE);
 
 #endif /* CONFIG_FCOE */
-		switch (adapter->vfinfo[vf].vf_api) {
+		lockdep_assert_in_rcu_read_lock();
+		vfinfo = rcu_dereference(adapter->vfinfo);
+		if (vfinfo)
+			vf_api = vfinfo[vf].vf_api;
+
+		switch (vf_api) {
 		case ixgbe_mbox_api_11:
 		case ixgbe_mbox_api_12:
 		case ixgbe_mbox_api_13:
@@ -643,10 +668,16 @@ static void ixgbe_clear_vf_vlans(struct ixgbe_adapter *adapter, u32 vf)
 static int ixgbe_set_vf_macvlan(struct ixgbe_adapter *adapter,
 				int vf, int index, unsigned char *mac_addr)
 {
-	struct vf_macvlans *entry;
+	struct vf_macvlans *mv_list, *entry;
 	bool found = false;
 	int retval = 0;
 
+	lockdep_assert_in_rcu_read_lock();
+	/* vf_mvs entries point into the mv_list array */
+	mv_list = rcu_dereference(adapter->mv_list);
+	if (!mv_list)
+		return 0;
+
 	if (index <= 1) {
 		list_for_each_entry(entry, &adapter->vf_mvs.l, l) {
 			if (entry->vf == vf) {
@@ -700,7 +731,7 @@ static inline void ixgbe_vf_reset_event(struct ixgbe_adapter *adapter, u32 vf)
 {
 	struct ixgbe_hw *hw = &adapter->hw;
 	struct ixgbe_ring_feature *vmdq = &adapter->ring_feature[RING_F_VMDQ];
-	struct vf_data_storage *vfinfo = &adapter->vfinfo[vf];
+	struct vf_data_storage *vfinfo;
 	u32 q_per_pool = __ALIGN_MASK(1, ~vmdq->mask);
 	u8 num_tcs = adapter->hw_tcs;
 	u32 reg_val;
@@ -709,31 +740,36 @@ static inline void ixgbe_vf_reset_event(struct ixgbe_adapter *adapter, u32 vf)
 	/* remove VLAN filters belonging to this VF */
 	ixgbe_clear_vf_vlans(adapter, vf);
 
+	lockdep_assert_in_rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (!vfinfo)
+		return;
+
 	/* add back PF assigned VLAN or VLAN 0 */
-	ixgbe_set_vf_vlan(adapter, true, vfinfo->pf_vlan, vf);
+	ixgbe_set_vf_vlan(adapter, true, vfinfo[vf].pf_vlan, vf);
 
 	/* reset offloads to defaults */
-	ixgbe_set_vmolr(hw, vf, !vfinfo->pf_vlan);
+	ixgbe_set_vmolr(hw, vf, !vfinfo[vf].pf_vlan);
 
 	/* set outgoing tags for VFs */
-	if (!vfinfo->pf_vlan && !vfinfo->pf_qos && !num_tcs) {
+	if (!vfinfo[vf].pf_vlan && !vfinfo[vf].pf_qos && !num_tcs) {
 		ixgbe_clear_vmvir(adapter, vf);
 	} else {
-		if (vfinfo->pf_qos || !num_tcs)
-			ixgbe_set_vmvir(adapter, vfinfo->pf_vlan,
-					vfinfo->pf_qos, vf);
+		if (vfinfo[vf].pf_qos || !num_tcs)
+			ixgbe_set_vmvir(adapter, vfinfo[vf].pf_vlan,
+					vfinfo[vf].pf_qos, vf);
 		else
-			ixgbe_set_vmvir(adapter, vfinfo->pf_vlan,
+			ixgbe_set_vmvir(adapter, vfinfo[vf].pf_vlan,
 					adapter->default_up, vf);
 
-		if (vfinfo->spoofchk_enabled) {
+		if (vfinfo[vf].spoofchk_enabled) {
 			hw->mac.ops.set_vlan_anti_spoofing(hw, true, vf);
 			hw->mac.ops.set_mac_anti_spoofing(hw, true, vf);
 		}
 	}
 
 	/* reset multicast table array for vf */
-	adapter->vfinfo[vf].num_vf_mc_hashes = 0;
+	vfinfo[vf].num_vf_mc_hashes = 0;
 
 	/* clear any ipsec table info */
 	ixgbe_ipsec_vf_clear(adapter, vf);
@@ -741,11 +777,11 @@ static inline void ixgbe_vf_reset_event(struct ixgbe_adapter *adapter, u32 vf)
 	/* Flush and reset the mta with the new values */
 	ixgbe_set_rx_mode(adapter->netdev);
 
-	ixgbe_del_mac_filter(adapter, adapter->vfinfo[vf].vf_mac_addresses, vf);
+	ixgbe_del_mac_filter(adapter, vfinfo[vf].vf_mac_addresses, vf);
 	ixgbe_set_vf_macvlan(adapter, vf, 0, NULL);
 
 	/* reset VF api back to unknown */
-	adapter->vfinfo[vf].vf_api = ixgbe_mbox_api_10;
+	vfinfo[vf].vf_api = ixgbe_mbox_api_10;
 
 	/* Restart each queue for given VF */
 	for (queue = 0; queue < q_per_pool; queue++) {
@@ -780,16 +816,25 @@ static void ixgbe_vf_clear_mbx(struct ixgbe_adapter *adapter, u32 vf)
 static int ixgbe_set_vf_mac(struct ixgbe_adapter *adapter,
 			    int vf, unsigned char *mac_addr)
 {
+	struct vf_data_storage *vfinfo;
 	int retval;
 
-	ixgbe_del_mac_filter(adapter, adapter->vfinfo[vf].vf_mac_addresses, vf);
+	rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (!vfinfo) {
+		rcu_read_unlock();
+		return -EINVAL;
+	}
+
+	ixgbe_del_mac_filter(adapter, vfinfo[vf].vf_mac_addresses, vf);
 	retval = ixgbe_add_mac_filter(adapter, mac_addr, vf);
 	if (retval >= 0)
-		memcpy(adapter->vfinfo[vf].vf_mac_addresses, mac_addr,
+		memcpy(vfinfo[vf].vf_mac_addresses, mac_addr,
 		       ETH_ALEN);
 	else
-		eth_zero_addr(adapter->vfinfo[vf].vf_mac_addresses);
+		eth_zero_addr(vfinfo[vf].vf_mac_addresses);
 
+	rcu_read_unlock();
 	return retval;
 }
 
@@ -797,12 +842,17 @@ int ixgbe_vf_configuration(struct pci_dev *pdev, unsigned int event_mask)
 {
 	struct ixgbe_adapter *adapter = pci_get_drvdata(pdev);
 	unsigned int vfn = (event_mask & 0x3f);
+	struct vf_data_storage *vfinfo;
 
 	bool enable = ((event_mask & 0x10000000U) != 0);
 
-	if (enable)
-		eth_zero_addr(adapter->vfinfo[vfn].vf_mac_addresses);
-
+	if (enable) {
+		rcu_read_lock();
+		vfinfo = rcu_dereference(adapter->vfinfo);
+		if (vfinfo)
+			eth_zero_addr(vfinfo[vfn].vf_mac_addresses);
+		rcu_read_unlock();
+	}
 	return 0;
 }
 
@@ -838,6 +888,7 @@ static void ixgbe_set_vf_rx_tx(struct ixgbe_adapter *adapter, int vf)
 {
 	u32 reg_cur_tx, reg_cur_rx, reg_req_tx, reg_req_rx;
 	struct ixgbe_hw *hw = &adapter->hw;
+	struct vf_data_storage *vfinfo;
 	u32 reg_offset, vf_shift;
 
 	vf_shift = vf % 32;
@@ -846,7 +897,9 @@ static void ixgbe_set_vf_rx_tx(struct ixgbe_adapter *adapter, int vf)
 	reg_cur_tx = IXGBE_READ_REG(hw, IXGBE_VFTE(reg_offset));
 	reg_cur_rx = IXGBE_READ_REG(hw, IXGBE_VFRE(reg_offset));
 
-	if (adapter->vfinfo[vf].link_enable) {
+	lockdep_assert_in_rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (vfinfo && vfinfo[vf].link_enable) {
 		reg_req_tx = reg_cur_tx | 1 << vf_shift;
 		reg_req_rx = reg_cur_rx | 1 << vf_shift;
 	} else {
@@ -882,11 +935,12 @@ static int ixgbe_vf_reset_msg(struct ixgbe_adapter *adapter, u32 vf)
 {
 	struct ixgbe_ring_feature *vmdq = &adapter->ring_feature[RING_F_VMDQ];
 	struct ixgbe_hw *hw = &adapter->hw;
-	unsigned char *vf_mac = adapter->vfinfo[vf].vf_mac_addresses;
+	struct vf_data_storage *vfinfo;
 	u32 reg, reg_offset, vf_shift;
 	u32 msgbuf[4] = {0, 0, 0, 0};
 	u8 *addr = (u8 *)(&msgbuf[1]);
 	u32 q_per_pool = __ALIGN_MASK(1, ~vmdq->mask);
+	unsigned char *vf_mac;
 	int i;
 
 	e_info(probe, "VF Reset msg received from vf %d\n", vf);
@@ -896,6 +950,13 @@ static int ixgbe_vf_reset_msg(struct ixgbe_adapter *adapter, u32 vf)
 
 	ixgbe_vf_clear_mbx(adapter, vf);
 
+	lockdep_assert_in_rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (!vfinfo)
+		return 0;
+
+	vf_mac = vfinfo[vf].vf_mac_addresses;
+
 	/* set vf mac address */
 	if (!is_zero_ether_addr(vf_mac))
 		ixgbe_set_vf_mac(adapter, vf, vf_mac);
@@ -905,7 +966,7 @@ static int ixgbe_vf_reset_msg(struct ixgbe_adapter *adapter, u32 vf)
 
 	/* force drop enable for all VF Rx queues */
 	reg = IXGBE_QDE_ENABLE;
-	if (adapter->vfinfo[vf].pf_vlan)
+	if (vfinfo[vf].pf_vlan)
 		reg |= IXGBE_QDE_HIDE_VLAN;
 
 	ixgbe_write_qde(adapter, vf, reg);
@@ -913,7 +974,7 @@ static int ixgbe_vf_reset_msg(struct ixgbe_adapter *adapter, u32 vf)
 	ixgbe_set_vf_rx_tx(adapter, vf);
 
 	/* enable VF mailbox for further messages */
-	adapter->vfinfo[vf].clear_to_send = true;
+	vfinfo[vf].clear_to_send = true;
 
 	/* Enable counting of spoofed packets in the SSVPC register */
 	reg = IXGBE_READ_REG(hw, IXGBE_VMECM(reg_offset));
@@ -931,7 +992,7 @@ static int ixgbe_vf_reset_msg(struct ixgbe_adapter *adapter, u32 vf)
 
 	/* reply to reset with ack and vf mac address */
 	msgbuf[0] = IXGBE_VF_RESET;
-	if (!is_zero_ether_addr(vf_mac) && adapter->vfinfo[vf].pf_set_mac) {
+	if (!is_zero_ether_addr(vf_mac) && vfinfo[vf].pf_set_mac) {
 		msgbuf[0] |= IXGBE_VT_MSGTYPE_ACK;
 		memcpy(addr, vf_mac, ETH_ALEN);
 	} else {
@@ -952,14 +1013,20 @@ static int ixgbe_set_vf_mac_addr(struct ixgbe_adapter *adapter,
 				 u32 *msgbuf, u32 vf)
 {
 	u8 *new_mac = ((u8 *)(&msgbuf[1]));
+	struct vf_data_storage *vfinfo;
 
 	if (!is_valid_ether_addr(new_mac)) {
 		e_warn(drv, "VF %d attempted to set invalid mac\n", vf);
 		return -1;
 	}
 
-	if (adapter->vfinfo[vf].pf_set_mac && !adapter->vfinfo[vf].trusted &&
-	    !ether_addr_equal(adapter->vfinfo[vf].vf_mac_addresses, new_mac)) {
+	lockdep_assert_in_rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (!vfinfo)
+		return 0;
+
+	if (vfinfo[vf].pf_set_mac && !vfinfo[vf].trusted &&
+	    !ether_addr_equal(vfinfo[vf].vf_mac_addresses, new_mac)) {
 		e_warn(drv,
 		       "VF %d attempted to override administratively set MAC address\n"
 		       "Reload the VF driver to resume operations\n",
@@ -975,9 +1042,15 @@ static int ixgbe_set_vf_vlan_msg(struct ixgbe_adapter *adapter,
 {
 	u32 add = FIELD_GET(IXGBE_VT_MSGINFO_MASK, msgbuf[0]);
 	u32 vid = (msgbuf[1] & IXGBE_VLVF_VLANID_MASK);
+	struct vf_data_storage *vfinfo;
 	u8 tcs = adapter->hw_tcs;
 
-	if (adapter->vfinfo[vf].pf_vlan || tcs) {
+	lockdep_assert_in_rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (!vfinfo)
+		return 0;
+
+	if (vfinfo[vf].pf_vlan || tcs) {
 		e_warn(drv,
 		       "VF %d attempted to override administratively set VLAN configuration\n"
 		       "Reload the VF driver to resume operations\n",
@@ -997,9 +1070,15 @@ static int ixgbe_set_vf_macvlan_msg(struct ixgbe_adapter *adapter,
 {
 	u8 *new_mac = ((u8 *)(&msgbuf[1]));
 	int index = FIELD_GET(IXGBE_VT_MSGINFO_MASK, msgbuf[0]);
+	struct vf_data_storage *vfinfo;
 	int err;
 
-	if (adapter->vfinfo[vf].pf_set_mac && !adapter->vfinfo[vf].trusted &&
+	lockdep_assert_in_rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (!vfinfo)
+		return 0;
+
+	if (vfinfo[vf].pf_set_mac && !vfinfo[vf].trusted &&
 	    index > 0) {
 		e_warn(drv,
 		       "VF %d requested MACVLAN filter but is administratively denied\n",
@@ -1018,7 +1097,7 @@ static int ixgbe_set_vf_macvlan_msg(struct ixgbe_adapter *adapter,
 		 * If the VF is allowed to set MAC filters then turn off
 		 * anti-spoofing to avoid false positives.
 		 */
-		if (adapter->vfinfo[vf].spoofchk_enabled) {
+		if (vfinfo[vf].spoofchk_enabled) {
 			struct ixgbe_hw *hw = &adapter->hw;
 
 			hw->mac.ops.set_mac_anti_spoofing(hw, false, vf);
@@ -1038,6 +1117,7 @@ static int ixgbe_set_vf_macvlan_msg(struct ixgbe_adapter *adapter,
 static int ixgbe_negotiate_vf_api(struct ixgbe_adapter *adapter,
 				  u32 *msgbuf, u32 vf)
 {
+	struct vf_data_storage *vfinfo;
 	int api = msgbuf[1];
 
 	switch (api) {
@@ -1048,7 +1128,10 @@ static int ixgbe_negotiate_vf_api(struct ixgbe_adapter *adapter,
 	case ixgbe_mbox_api_14:
 	case ixgbe_mbox_api_16:
 	case ixgbe_mbox_api_17:
-		adapter->vfinfo[vf].vf_api = api;
+		lockdep_assert_in_rcu_read_lock();
+		vfinfo = rcu_dereference(adapter->vfinfo);
+		if (vfinfo)
+			vfinfo[vf].vf_api = api;
 		return 0;
 	default:
 		break;
@@ -1064,11 +1147,17 @@ static int ixgbe_get_vf_queues(struct ixgbe_adapter *adapter,
 {
 	struct net_device *dev = adapter->netdev;
 	struct ixgbe_ring_feature *vmdq = &adapter->ring_feature[RING_F_VMDQ];
+	struct vf_data_storage *vfinfo;
 	unsigned int default_tc = 0;
 	u8 num_tcs = adapter->hw_tcs;
 
+	lockdep_assert_in_rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (!vfinfo)
+		return 0;
+
 	/* verify the PF is supporting the correct APIs */
-	switch (adapter->vfinfo[vf].vf_api) {
+	switch (vfinfo[vf].vf_api) {
 	case ixgbe_mbox_api_20:
 	case ixgbe_mbox_api_11:
 	case ixgbe_mbox_api_12:
@@ -1092,7 +1181,7 @@ static int ixgbe_get_vf_queues(struct ixgbe_adapter *adapter,
 	/* notify VF of need for VLAN tag stripping, and correct queue */
 	if (num_tcs)
 		msgbuf[IXGBE_VF_TRANS_VLAN] = num_tcs;
-	else if (adapter->vfinfo[vf].pf_vlan || adapter->vfinfo[vf].pf_qos)
+	else if (vfinfo[vf].pf_vlan || vfinfo[vf].pf_qos)
 		msgbuf[IXGBE_VF_TRANS_VLAN] = 1;
 	else
 		msgbuf[IXGBE_VF_TRANS_VLAN] = 0;
@@ -1105,17 +1194,23 @@ static int ixgbe_get_vf_queues(struct ixgbe_adapter *adapter,
 
 static int ixgbe_get_vf_reta(struct ixgbe_adapter *adapter, u32 *msgbuf, u32 vf)
 {
-	u32 i, j;
-	u32 *out_buf = &msgbuf[1];
-	const u8 *reta = adapter->rss_indir_tbl;
 	u32 reta_size = ixgbe_rss_indir_tbl_entries(adapter);
+	const u8 *reta = adapter->rss_indir_tbl;
+	struct vf_data_storage *vfinfo;
+	u32 *out_buf = &msgbuf[1];
+	u32 i, j;
+
+	lockdep_assert_in_rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (!vfinfo)
+		return 0;
 
 	/* Check if operation is permitted */
-	if (!adapter->vfinfo[vf].rss_query_enabled)
+	if (!vfinfo[vf].rss_query_enabled)
 		return -EPERM;
 
 	/* verify the PF is supporting the correct API */
-	switch (adapter->vfinfo[vf].vf_api) {
+	switch (vfinfo[vf].vf_api) {
 	case ixgbe_mbox_api_17:
 	case ixgbe_mbox_api_16:
 	case ixgbe_mbox_api_14:
@@ -1143,14 +1238,20 @@ static int ixgbe_get_vf_reta(struct ixgbe_adapter *adapter, u32 *msgbuf, u32 vf)
 static int ixgbe_get_vf_rss_key(struct ixgbe_adapter *adapter,
 				u32 *msgbuf, u32 vf)
 {
+	struct vf_data_storage *vfinfo;
 	u32 *rss_key = &msgbuf[1];
 
+	lockdep_assert_in_rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (!vfinfo)
+		return 0;
+
 	/* Check if the operation is permitted */
-	if (!adapter->vfinfo[vf].rss_query_enabled)
+	if (!vfinfo[vf].rss_query_enabled)
 		return -EPERM;
 
 	/* verify the PF is supporting the correct API */
-	switch (adapter->vfinfo[vf].vf_api) {
+	switch (vfinfo[vf].vf_api) {
 	case ixgbe_mbox_api_17:
 	case ixgbe_mbox_api_16:
 	case ixgbe_mbox_api_14:
@@ -1170,11 +1271,17 @@ static int ixgbe_update_vf_xcast_mode(struct ixgbe_adapter *adapter,
 				      u32 *msgbuf, u32 vf)
 {
 	struct ixgbe_hw *hw = &adapter->hw;
+	struct vf_data_storage *vfinfo;
 	int xcast_mode = msgbuf[1];
 	u32 vmolr, fctrl, disable, enable;
 
+	lockdep_assert_in_rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (!vfinfo)
+		return 0;
+
 	/* verify the PF is supporting the correct APIs */
-	switch (adapter->vfinfo[vf].vf_api) {
+	switch (vfinfo[vf].vf_api) {
 	case ixgbe_mbox_api_12:
 		/* promisc introduced in 1.3 version */
 		if (xcast_mode == IXGBEVF_XCAST_MODE_PROMISC)
@@ -1190,11 +1297,11 @@ static int ixgbe_update_vf_xcast_mode(struct ixgbe_adapter *adapter,
 	}
 
 	if (xcast_mode > IXGBEVF_XCAST_MODE_MULTI &&
-	    !adapter->vfinfo[vf].trusted) {
+	    !vfinfo[vf].trusted) {
 		xcast_mode = IXGBEVF_XCAST_MODE_MULTI;
 	}
 
-	if (adapter->vfinfo[vf].xcast_mode == xcast_mode)
+	if (vfinfo[vf].xcast_mode == xcast_mode)
 		goto out;
 
 	switch (xcast_mode) {
@@ -1236,7 +1343,7 @@ static int ixgbe_update_vf_xcast_mode(struct ixgbe_adapter *adapter,
 	vmolr |= enable;
 	IXGBE_WRITE_REG(hw, IXGBE_VMOLR(vf), vmolr);
 
-	adapter->vfinfo[vf].xcast_mode = xcast_mode;
+	vfinfo[vf].xcast_mode = xcast_mode;
 
 out:
 	msgbuf[1] = xcast_mode;
@@ -1247,10 +1354,16 @@ static int ixgbe_update_vf_xcast_mode(struct ixgbe_adapter *adapter,
 static int ixgbe_get_vf_link_state(struct ixgbe_adapter *adapter,
 				   u32 *msgbuf, u32 vf)
 {
+	struct vf_data_storage *vfinfo;
 	u32 *link_state = &msgbuf[1];
 
+	lockdep_assert_in_rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (!vfinfo)
+		return 0;
+
 	/* verify the PF is supporting the correct API */
-	switch (adapter->vfinfo[vf].vf_api) {
+	switch (vfinfo[vf].vf_api) {
 	case ixgbe_mbox_api_12:
 	case ixgbe_mbox_api_13:
 	case ixgbe_mbox_api_14:
@@ -1261,7 +1374,7 @@ static int ixgbe_get_vf_link_state(struct ixgbe_adapter *adapter,
 		return -EOPNOTSUPP;
 	}
 
-	*link_state = adapter->vfinfo[vf].link_enable;
+	*link_state = vfinfo[vf].link_enable;
 
 	return 0;
 }
@@ -1280,8 +1393,14 @@ static int ixgbe_send_vf_link_status(struct ixgbe_adapter *adapter,
 				     u32 *msgbuf, u32 vf)
 {
 	struct ixgbe_hw *hw = &adapter->hw;
+	struct vf_data_storage *vfinfo;
+
+	lockdep_assert_in_rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (!vfinfo)
+		return 0;
 
-	switch (adapter->vfinfo[vf].vf_api) {
+	switch (vfinfo[vf].vf_api) {
 	case ixgbe_mbox_api_16:
 	case ixgbe_mbox_api_17:
 		if (hw->mac.type != ixgbe_mac_e610)
@@ -1310,9 +1429,15 @@ static int ixgbe_send_vf_link_status(struct ixgbe_adapter *adapter,
 static int ixgbe_negotiate_vf_features(struct ixgbe_adapter *adapter,
 				       u32 *msgbuf, u32 vf)
 {
+	struct vf_data_storage *vfinfo;
 	u32 features = msgbuf[1];
 
-	switch (adapter->vfinfo[vf].vf_api) {
+	lockdep_assert_in_rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (!vfinfo)
+		return 0;
+
+	switch (vfinfo[vf].vf_api) {
 	case ixgbe_mbox_api_17:
 		break;
 	default:
@@ -1330,6 +1455,7 @@ static int ixgbe_rcv_msg_from_vf(struct ixgbe_adapter *adapter, u32 vf)
 	u32 mbx_size = IXGBE_VFMAILBOX_SIZE;
 	u32 msgbuf[IXGBE_VFMAILBOX_SIZE];
 	struct ixgbe_hw *hw = &adapter->hw;
+	struct vf_data_storage *vfinfo;
 	int retval;
 
 	retval = ixgbe_read_mbx(hw, msgbuf, mbx_size, vf);
@@ -1349,11 +1475,16 @@ static int ixgbe_rcv_msg_from_vf(struct ixgbe_adapter *adapter, u32 vf)
 	if (msgbuf[0] == IXGBE_VF_RESET)
 		return ixgbe_vf_reset_msg(adapter, vf);
 
+	lockdep_assert_in_rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (!vfinfo)
+		return 0;
+
 	/*
 	 * until the vf completes a virtual function reset it should not be
 	 * allowed to start any configuration.
 	 */
-	if (!adapter->vfinfo[vf].clear_to_send) {
+	if (!vfinfo[vf].clear_to_send) {
 		msgbuf[0] |= IXGBE_VT_MSGTYPE_NACK;
 		ixgbe_write_mbx(hw, msgbuf, 1, vf);
 		return 0;
@@ -1426,11 +1557,12 @@ static int ixgbe_rcv_msg_from_vf(struct ixgbe_adapter *adapter, u32 vf)
 
 static void ixgbe_rcv_ack_from_vf(struct ixgbe_adapter *adapter, u32 vf)
 {
+	struct vf_data_storage *vfinfo = rcu_dereference(adapter->vfinfo);
 	struct ixgbe_hw *hw = &adapter->hw;
 	u32 msg = IXGBE_VT_MSGTYPE_NACK;
 
 	/* if device isn't clear to send it shouldn't be reading either */
-	if (!adapter->vfinfo[vf].clear_to_send)
+	if (vfinfo && !vfinfo[vf].clear_to_send)
 		ixgbe_write_mbx(hw, &msg, 1, vf);
 }
 
@@ -1462,15 +1594,21 @@ bool ixgbe_check_mdd_event(struct ixgbe_adapter *adapter)
 			 IXGBE_READ_REG(hw, IXGBE_LVMMC_RX));
 
 		if (hw->mac.ops.restore_mdd_vf) {
+			struct vf_data_storage *vfinfo;
 			u32 ping;
 
 			hw->mac.ops.restore_mdd_vf(hw, i);
 
 			/* get the VF to rebuild its queues */
-			adapter->vfinfo[i].clear_to_send = 0;
-			ping = IXGBE_PF_CONTROL_MSG |
-			       IXGBE_VT_MSGTYPE_CTS;
-			ixgbe_write_mbx(hw, &ping, 1, i);
+			rcu_read_lock();
+			vfinfo = rcu_dereference(adapter->vfinfo);
+			if (vfinfo) {
+				vfinfo[i].clear_to_send = false;
+				ping = IXGBE_PF_CONTROL_MSG |
+				       IXGBE_VT_MSGTYPE_CTS;
+				ixgbe_write_mbx(hw, &ping, 1, i);
+			}
+			rcu_read_unlock();
 		}
 
 		ret = true;
@@ -1482,12 +1620,11 @@ bool ixgbe_check_mdd_event(struct ixgbe_adapter *adapter)
 void ixgbe_msg_task(struct ixgbe_adapter *adapter)
 {
 	struct ixgbe_hw *hw = &adapter->hw;
-	unsigned long flags;
 	u32 vf;
 
 	ixgbe_check_mdd_event(adapter);
 
-	spin_lock_irqsave(&adapter->vfs_lock, flags);
+	rcu_read_lock();
 	for (vf = 0; vf < adapter->num_vfs; vf++) {
 		/* process any reset requests */
 		if (!ixgbe_check_for_rst(hw, vf))
@@ -1501,7 +1638,7 @@ void ixgbe_msg_task(struct ixgbe_adapter *adapter)
 		if (!ixgbe_check_for_ack(hw, vf))
 			ixgbe_rcv_ack_from_vf(adapter, vf);
 	}
-	spin_unlock_irqrestore(&adapter->vfs_lock, flags);
+	rcu_read_unlock();
 }
 
 static inline void ixgbe_ping_vf(struct ixgbe_adapter *adapter, int vf)
@@ -1510,23 +1647,26 @@ static inline void ixgbe_ping_vf(struct ixgbe_adapter *adapter, int vf)
 	u32 ping;
 
 	ping = IXGBE_PF_CONTROL_MSG;
-	if (adapter->vfinfo[vf].clear_to_send)
-		ping |= IXGBE_VT_MSGTYPE_CTS;
 	ixgbe_write_mbx(hw, &ping, 1, vf);
 }
 
 void ixgbe_ping_all_vfs(struct ixgbe_adapter *adapter)
 {
 	struct ixgbe_hw *hw = &adapter->hw;
+	struct vf_data_storage *vfinfo;
 	u32 ping;
 	int i;
 
-	for (i = 0 ; i < adapter->num_vfs; i++) {
-		ping = IXGBE_PF_CONTROL_MSG;
-		if (adapter->vfinfo[i].clear_to_send)
-			ping |= IXGBE_VT_MSGTYPE_CTS;
-		ixgbe_write_mbx(hw, &ping, 1, i);
-	}
+	rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (vfinfo)
+		for (i = 0 ; i < adapter->num_vfs; i++) {
+			ping = IXGBE_PF_CONTROL_MSG;
+			if (vfinfo[i].clear_to_send)
+				ping |= IXGBE_VT_MSGTYPE_CTS;
+			ixgbe_write_mbx(hw, &ping, 1, i);
+		}
+	rcu_read_unlock();
 }
 
 /**
@@ -1537,21 +1677,34 @@ void ixgbe_ping_all_vfs(struct ixgbe_adapter *adapter)
  **/
 void ixgbe_set_all_vfs(struct ixgbe_adapter *adapter)
 {
+	struct vf_data_storage *vfinfo;
 	int i;
 
-	for (i = 0 ; i < adapter->num_vfs; i++)
-		ixgbe_set_vf_link_state(adapter, i,
-					adapter->vfinfo[i].link_state);
+	rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (vfinfo)
+		for (i = 0 ; i < adapter->num_vfs; i++)
+			ixgbe_set_vf_link_state(adapter, i,
+						vfinfo[i].link_state);
+	rcu_read_unlock();
 }
 
 int ixgbe_ndo_set_vf_mac(struct net_device *netdev, int vf, u8 *mac)
 {
 	struct ixgbe_adapter *adapter = ixgbe_from_netdev(netdev);
+	struct vf_data_storage *vfinfo;
 	int retval;
 
 	if (vf >= adapter->num_vfs)
 		return -EINVAL;
 
+	rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (!vfinfo) {
+		rcu_read_unlock();
+		return 0;
+	}
+
 	if (is_valid_ether_addr(mac)) {
 		dev_info(&adapter->pdev->dev, "setting MAC %pM on VF %d\n",
 			 mac, vf);
@@ -1559,7 +1712,7 @@ int ixgbe_ndo_set_vf_mac(struct net_device *netdev, int vf, u8 *mac)
 
 		retval = ixgbe_set_vf_mac(adapter, vf, mac);
 		if (retval >= 0) {
-			adapter->vfinfo[vf].pf_set_mac = true;
+			vfinfo[vf].pf_set_mac = true;
 
 			if (test_bit(__IXGBE_DOWN, &adapter->state)) {
 				dev_warn(&adapter->pdev->dev, "The VF MAC address has been set, but the PF device is not up.\n");
@@ -1569,18 +1722,19 @@ int ixgbe_ndo_set_vf_mac(struct net_device *netdev, int vf, u8 *mac)
 			dev_warn(&adapter->pdev->dev, "The VF MAC address was NOT set due to invalid or duplicate MAC address.\n");
 		}
 	} else if (is_zero_ether_addr(mac)) {
-		unsigned char *vf_mac_addr =
-					   adapter->vfinfo[vf].vf_mac_addresses;
+		unsigned char *vf_mac_addr = vfinfo[vf].vf_mac_addresses;
 
 		/* nothing to do */
-		if (is_zero_ether_addr(vf_mac_addr))
+		if (is_zero_ether_addr(vf_mac_addr)) {
+			rcu_read_unlock();
 			return 0;
+		}
 
 		dev_info(&adapter->pdev->dev, "removing MAC on VF %d\n", vf);
 
 		retval = ixgbe_del_mac_filter(adapter, vf_mac_addr, vf);
 		if (retval >= 0) {
-			adapter->vfinfo[vf].pf_set_mac = false;
+			vfinfo[vf].pf_set_mac = false;
 			memcpy(vf_mac_addr, mac, ETH_ALEN);
 		} else {
 			dev_warn(&adapter->pdev->dev, "Could NOT remove the VF MAC address.\n");
@@ -1589,10 +1743,12 @@ int ixgbe_ndo_set_vf_mac(struct net_device *netdev, int vf, u8 *mac)
 		retval = -EINVAL;
 	}
 
+	rcu_read_unlock();
 	return retval;
 }
 
 static int ixgbe_enable_port_vlan(struct ixgbe_adapter *adapter, int vf,
+				  struct vf_data_storage *vfinfo,
 				  u16 vlan, u8 qos)
 {
 	struct ixgbe_hw *hw = &adapter->hw;
@@ -1613,8 +1769,8 @@ static int ixgbe_enable_port_vlan(struct ixgbe_adapter *adapter, int vf,
 		ixgbe_write_qde(adapter, vf, IXGBE_QDE_ENABLE |
 				IXGBE_QDE_HIDE_VLAN);
 
-	adapter->vfinfo[vf].pf_vlan = vlan;
-	adapter->vfinfo[vf].pf_qos = qos;
+	vfinfo[vf].pf_vlan = vlan;
+	vfinfo[vf].pf_qos = qos;
 	dev_info(&adapter->pdev->dev,
 		 "Setting VLAN %d, QOS 0x%x on VF %d\n", vlan, qos, vf);
 	if (test_bit(__IXGBE_DOWN, &adapter->state)) {
@@ -1628,13 +1784,14 @@ static int ixgbe_enable_port_vlan(struct ixgbe_adapter *adapter, int vf,
 	return err;
 }
 
-static int ixgbe_disable_port_vlan(struct ixgbe_adapter *adapter, int vf)
+static int ixgbe_disable_port_vlan(struct ixgbe_adapter *adapter, int vf,
+				   struct vf_data_storage *vfinfo)
 {
 	struct ixgbe_hw *hw = &adapter->hw;
 	int err;
 
 	err = ixgbe_set_vf_vlan(adapter, false,
-				adapter->vfinfo[vf].pf_vlan, vf);
+				vfinfo[vf].pf_vlan, vf);
 	/* Restore tagless access via VLAN 0 */
 	ixgbe_set_vf_vlan(adapter, true, 0, vf);
 	ixgbe_clear_vmvir(adapter, vf);
@@ -1644,8 +1801,8 @@ static int ixgbe_disable_port_vlan(struct ixgbe_adapter *adapter, int vf)
 	if (hw->mac.type >= ixgbe_mac_X550)
 		ixgbe_write_qde(adapter, vf, IXGBE_QDE_ENABLE);
 
-	adapter->vfinfo[vf].pf_vlan = 0;
-	adapter->vfinfo[vf].pf_qos = 0;
+	vfinfo[vf].pf_vlan = 0;
+	vfinfo[vf].pf_qos = 0;
 
 	return err;
 }
@@ -1653,13 +1810,20 @@ static int ixgbe_disable_port_vlan(struct ixgbe_adapter *adapter, int vf)
 int ixgbe_ndo_set_vf_vlan(struct net_device *netdev, int vf, u16 vlan,
 			  u8 qos, __be16 vlan_proto)
 {
-	int err = 0;
 	struct ixgbe_adapter *adapter = ixgbe_from_netdev(netdev);
+	struct vf_data_storage *vfinfo;
+	int err = 0;
 
 	if ((vf >= adapter->num_vfs) || (vlan > 4095) || (qos > 7))
 		return -EINVAL;
 	if (vlan_proto != htons(ETH_P_8021Q))
 		return -EPROTONOSUPPORT;
+
+	rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (!vfinfo)
+		goto out;
+
 	if (vlan || qos) {
 		/* Check if there is already a port VLAN set, if so
 		 * we have to delete the old one first before we
@@ -1668,16 +1832,17 @@ int ixgbe_ndo_set_vf_vlan(struct net_device *netdev, int vf, u16 vlan,
 		 * old port VLAN before setting a new one but this
 		 * is not necessarily the case.
 		 */
-		if (adapter->vfinfo[vf].pf_vlan)
-			err = ixgbe_disable_port_vlan(adapter, vf);
+		if (vfinfo[vf].pf_vlan)
+			err = ixgbe_disable_port_vlan(adapter, vf, vfinfo);
 		if (err)
 			goto out;
-		err = ixgbe_enable_port_vlan(adapter, vf, vlan, qos);
+		err = ixgbe_enable_port_vlan(adapter, vf, vfinfo, vlan, qos);
 	} else {
-		err = ixgbe_disable_port_vlan(adapter, vf);
+		err = ixgbe_disable_port_vlan(adapter, vf, vfinfo);
 	}
 
 out:
+	rcu_read_unlock();
 	return err;
 }
 
@@ -1695,13 +1860,13 @@ int ixgbe_link_mbps(struct ixgbe_adapter *adapter)
 	}
 }
 
-static void ixgbe_set_vf_rate_limit(struct ixgbe_adapter *adapter, int vf)
+static void ixgbe_set_vf_rate_limit(struct ixgbe_adapter *adapter, int vf,
+				    u16 tx_rate)
 {
 	struct ixgbe_ring_feature *vmdq = &adapter->ring_feature[RING_F_VMDQ];
 	struct ixgbe_hw *hw = &adapter->hw;
 	u32 bcnrc_val = 0;
 	u16 queue, queues_per_pool;
-	u16 tx_rate = adapter->vfinfo[vf].tx_rate;
 
 	if (tx_rate) {
 		/* start with base link speed value */
@@ -1749,6 +1914,7 @@ static void ixgbe_set_vf_rate_limit(struct ixgbe_adapter *adapter, int vf)
 
 void ixgbe_check_vf_rate_limit(struct ixgbe_adapter *adapter)
 {
+	struct vf_data_storage *vfinfo;
 	int i;
 
 	/* VF Tx rate limit was not set */
@@ -1761,18 +1927,23 @@ void ixgbe_check_vf_rate_limit(struct ixgbe_adapter *adapter)
 			 "Link speed has been changed. VF Transmit rate is disabled\n");
 	}
 
-	for (i = 0; i < adapter->num_vfs; i++) {
-		if (!adapter->vf_rate_link_speed)
-			adapter->vfinfo[i].tx_rate = 0;
+	rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (vfinfo)
+		for (i = 0; i < adapter->num_vfs; i++) {
+			if (!adapter->vf_rate_link_speed)
+				vfinfo[i].tx_rate = 0;
 
-		ixgbe_set_vf_rate_limit(adapter, i);
-	}
+			ixgbe_set_vf_rate_limit(adapter, i, vfinfo[i].tx_rate);
+		}
+	rcu_read_unlock();
 }
 
 int ixgbe_ndo_set_vf_bw(struct net_device *netdev, int vf, int min_tx_rate,
 			int max_tx_rate)
 {
 	struct ixgbe_adapter *adapter = ixgbe_from_netdev(netdev);
+	struct vf_data_storage *vfinfo;
 	int link_speed;
 
 	/* verify VF is active */
@@ -1795,12 +1966,17 @@ int ixgbe_ndo_set_vf_bw(struct net_device *netdev, int vf, int min_tx_rate,
 	if (max_tx_rate && ((max_tx_rate <= 10) || (max_tx_rate > link_speed)))
 		return -EINVAL;
 
-	/* store values */
-	adapter->vf_rate_link_speed = link_speed;
-	adapter->vfinfo[vf].tx_rate = max_tx_rate;
+	rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (vfinfo) {
+		/* store values */
+		adapter->vf_rate_link_speed = link_speed;
+		vfinfo[vf].tx_rate = max_tx_rate;
 
-	/* update hardware configuration */
-	ixgbe_set_vf_rate_limit(adapter, vf);
+		/* update hardware configuration */
+		ixgbe_set_vf_rate_limit(adapter, vf, vfinfo[vf].tx_rate);
+	}
+	rcu_read_unlock();
 
 	return 0;
 }
@@ -1809,11 +1985,18 @@ int ixgbe_ndo_set_vf_spoofchk(struct net_device *netdev, int vf, bool setting)
 {
 	struct ixgbe_adapter *adapter = ixgbe_from_netdev(netdev);
 	struct ixgbe_hw *hw = &adapter->hw;
+	struct vf_data_storage *vfinfo;
 
 	if (vf >= adapter->num_vfs)
 		return -EINVAL;
 
-	adapter->vfinfo[vf].spoofchk_enabled = setting;
+	rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (vfinfo)
+		vfinfo[vf].spoofchk_enabled = setting;
+	rcu_read_unlock();
+	if (!vfinfo)
+		return 0;
 
 	/* configure MAC spoofing */
 	hw->mac.ops.set_mac_anti_spoofing(hw, setting, vf);
@@ -1851,28 +2034,37 @@ int ixgbe_ndo_set_vf_spoofchk(struct net_device *netdev, int vf, bool setting)
  **/
 void ixgbe_set_vf_link_state(struct ixgbe_adapter *adapter, int vf, int state)
 {
-	adapter->vfinfo[vf].link_state = state;
+	struct vf_data_storage *vfinfo;
+
+	rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (!vfinfo) {
+		rcu_read_unlock();
+		return;
+	}
+	vfinfo[vf].link_state = state;
 
 	switch (state) {
 	case IFLA_VF_LINK_STATE_AUTO:
 		if (test_bit(__IXGBE_DOWN, &adapter->state))
-			adapter->vfinfo[vf].link_enable = false;
+			vfinfo[vf].link_enable = false;
 		else
-			adapter->vfinfo[vf].link_enable = true;
+			vfinfo[vf].link_enable = true;
 		break;
 	case IFLA_VF_LINK_STATE_ENABLE:
-		adapter->vfinfo[vf].link_enable = true;
+		vfinfo[vf].link_enable = true;
 		break;
 	case IFLA_VF_LINK_STATE_DISABLE:
-		adapter->vfinfo[vf].link_enable = false;
+		vfinfo[vf].link_enable = false;
 		break;
 	}
 
 	ixgbe_set_vf_rx_tx(adapter, vf);
 
 	/* restart the VF */
-	adapter->vfinfo[vf].clear_to_send = false;
+	vfinfo[vf].clear_to_send = false;
 	ixgbe_ping_vf(adapter, vf);
+	rcu_read_unlock();
 }
 
 /**
@@ -1923,6 +2115,7 @@ int ixgbe_ndo_set_vf_rss_query_en(struct net_device *netdev, int vf,
 				  bool setting)
 {
 	struct ixgbe_adapter *adapter = ixgbe_from_netdev(netdev);
+	struct vf_data_storage *vfinfo;
 
 	/* This operation is currently supported only for 82599 and x540
 	 * devices.
@@ -1934,7 +2127,11 @@ int ixgbe_ndo_set_vf_rss_query_en(struct net_device *netdev, int vf,
 	if (vf >= adapter->num_vfs)
 		return -EINVAL;
 
-	adapter->vfinfo[vf].rss_query_enabled = setting;
+	rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (vfinfo)
+		vfinfo[vf].rss_query_enabled = setting;
+	rcu_read_unlock();
 
 	return 0;
 }
@@ -1942,18 +2139,31 @@ int ixgbe_ndo_set_vf_rss_query_en(struct net_device *netdev, int vf,
 int ixgbe_ndo_set_vf_trust(struct net_device *netdev, int vf, bool setting)
 {
 	struct ixgbe_adapter *adapter = ixgbe_from_netdev(netdev);
+	struct vf_data_storage *vfinfo;
 
 	if (vf >= adapter->num_vfs)
 		return -EINVAL;
 
+	rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (!vfinfo) {
+		rcu_read_unlock();
+		return 0;
+	}
+
 	/* nothing to do */
-	if (adapter->vfinfo[vf].trusted == setting)
+	if (vfinfo[vf].trusted == setting) {
+		rcu_read_unlock();
 		return 0;
+	}
 
-	adapter->vfinfo[vf].trusted = setting;
+	vfinfo[vf].trusted = setting;
 
 	/* reset VF to reconfigure features */
-	adapter->vfinfo[vf].clear_to_send = false;
+	vfinfo[vf].clear_to_send = false;
+
+	rcu_read_unlock();
+
 	ixgbe_ping_vf(adapter, vf);
 
 	e_info(drv, "VF %u is %strusted\n", vf, setting ? "" : "not ");
@@ -1965,17 +2175,30 @@ int ixgbe_ndo_get_vf_config(struct net_device *netdev,
 			    int vf, struct ifla_vf_info *ivi)
 {
 	struct ixgbe_adapter *adapter = ixgbe_from_netdev(netdev);
+	struct vf_data_storage *vfinfo;
+
 	if (vf >= adapter->num_vfs)
 		return -EINVAL;
 	ivi->vf = vf;
-	memcpy(&ivi->mac, adapter->vfinfo[vf].vf_mac_addresses, ETH_ALEN);
-	ivi->max_tx_rate = adapter->vfinfo[vf].tx_rate;
+
+	rcu_read_lock();
+	vfinfo = rcu_dereference(adapter->vfinfo);
+	if (!vfinfo) {
+		rcu_read_unlock();
+		return -EINVAL;
+	}
+
+	memcpy(&ivi->mac, vfinfo[vf].vf_mac_addresses, ETH_ALEN);
+	ivi->max_tx_rate = vfinfo[vf].tx_rate;
 	ivi->min_tx_rate = 0;
-	ivi->vlan = adapter->vfinfo[vf].pf_vlan;
-	ivi->qos = adapter->vfinfo[vf].pf_qos;
-	ivi->spoofchk = adapter->vfinfo[vf].spoofchk_enabled;
-	ivi->rss_query_en = adapter->vfinfo[vf].rss_query_enabled;
-	ivi->trusted = adapter->vfinfo[vf].trusted;
-	ivi->linkstate = adapter->vfinfo[vf].link_state;
+	ivi->vlan = vfinfo[vf].pf_vlan;
+	ivi->qos = vfinfo[vf].pf_qos;
+	ivi->spoofchk = vfinfo[vf].spoofchk_enabled;
+	ivi->rss_query_en = vfinfo[vf].rss_query_enabled;
+	ivi->trusted = vfinfo[vf].trusted;
+	ivi->linkstate = vfinfo[vf].link_state;
+
+	rcu_read_unlock();
+
 	return 0;
 }
-- 
2.53.0


^ permalink raw reply related

* [PATCH bpf-next v4 3/6] bpf: add BPF_F_ADJ_ROOM_DECAP_* flags for tunnel decapsulation
From: Nick Hudson @ 2026-04-16  7:55 UTC (permalink / raw)
  To: bpf, netdev, Willem de Bruijn, Martin KaFai Lau
  Cc: Nick Hudson, Max Tottenham, Anna Glasgall, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Eduard Zingerman,
	Kumar Kartikeya Dwivedi, linux-kernel
In-Reply-To: <20260416075514.927101-1-nhudson@akamai.com>

Add new bpf_skb_adjust_room() decapsulation flags:

- BPF_F_ADJ_ROOM_DECAP_L4_GRE
- BPF_F_ADJ_ROOM_DECAP_L4_UDP
- BPF_F_ADJ_ROOM_DECAP_IPXIP4
- BPF_F_ADJ_ROOM_DECAP_IPXIP6

These flags let BPF programs describe which tunnel layer is being
removed, so that later patches in this series can update the
tunnel-related GSO state accordingly during decapsulation.

This patch only introduces the UAPI flag definitions and helper
documentation.
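
For illustration only (not part of this patch), here is a minimal
user-space sketch of the new bit layout. The four bit positions are
copied from the enum hunk in this patch; DECAP_TUNNEL_BITS,
PREEXISTING_LOW_BITS and tunnel_layers_named() are hypothetical names
introduced for the example, and the sketch assumes that bits 0..8 are
the ones already taken by existing low flag bits.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions copied from the enum bpf_adj_room_flags hunk. */
#define DECAP_L4_GRE	(1ULL << 9)
#define DECAP_L4_UDP	(1ULL << 10)
#define DECAP_IPXIP4	(1ULL << 11)
#define DECAP_IPXIP6	(1ULL << 12)

/* Illustration-only names, not defined by this patch. */
#define DECAP_TUNNEL_BITS	(DECAP_L4_GRE | DECAP_L4_UDP | \
				 DECAP_IPXIP4 | DECAP_IPXIP6)
#define PREEXISTING_LOW_BITS	((1ULL << 9) - 1)	/* assumed: bits 0..8 */

static_assert((DECAP_TUNNEL_BITS & PREEXISTING_LOW_BITS) == 0,
	      "new decap bits must not overlap existing low flag bits");

/* Count how many tunnel layers a flags word names. */
static int tunnel_layers_named(uint64_t flags)
{
	uint64_t bits = flags & DECAP_TUNNEL_BITS;
	int n = 0;

	for (; bits; bits &= bits - 1)	/* clear the lowest set bit */
		n++;
	return n;
}
```

Because every new flag is a single distinct bit, a caller can OR
several of them together when more than one tunnel layer is removed
in a single call.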

Co-developed-by: Max Tottenham <mtottenh@akamai.com>
Signed-off-by: Max Tottenham <mtottenh@akamai.com>
Co-developed-by: Anna Glasgall <aglasgal@akamai.com>
Signed-off-by: Anna Glasgall <aglasgal@akamai.com>
Signed-off-by: Nick Hudson <nhudson@akamai.com>
---
 include/uapi/linux/bpf.h       | 34 ++++++++++++++++++++++++++++++++--
 tools/include/uapi/linux/bpf.h | 34 ++++++++++++++++++++++++++++++++--
 2 files changed, 64 insertions(+), 4 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index c021ed8d7b44..4a53e731c554 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -3010,8 +3010,34 @@ union bpf_attr {
  *
  *		* **BPF_F_ADJ_ROOM_DECAP_L3_IPV4**,
  *		  **BPF_F_ADJ_ROOM_DECAP_L3_IPV6**:
- *		  Indicate the new IP header version after decapsulating the outer
- *		  IP header. Used when the inner and outer IP versions are different.
+ *		  Indicate the new IP header version after decapsulating the
+ *		  outer IP header. Used when the inner and outer IP versions
+ *		  are different. These flags only trigger a protocol change
+ *		  without clearing any tunnel-specific GSO flags.
+ *
+ *		* **BPF_F_ADJ_ROOM_DECAP_L4_GRE**:
+ *		  Clear GRE tunnel GSO flags (SKB_GSO_GRE and SKB_GSO_GRE_CSUM)
+ *		  when decapsulating a GRE tunnel.
+ *
+ *		* **BPF_F_ADJ_ROOM_DECAP_L4_UDP**:
+ *		  Clear UDP tunnel GSO flags (SKB_GSO_UDP_TUNNEL and
+ *		  SKB_GSO_UDP_TUNNEL_CSUM) when decapsulating a UDP tunnel.
+ *
+ *		* **BPF_F_ADJ_ROOM_DECAP_IPXIP4**:
+ *		  Clear IPIP/SIT tunnel GSO flag (SKB_GSO_IPXIP4) when decapsulating
+ *		  a tunnel with an outer IPv4 header (IPv4-in-IPv4 or IPv6-in-IPv4).
+ *
+ *		* **BPF_F_ADJ_ROOM_DECAP_IPXIP6**:
+ *		  Clear IPv6 encapsulation tunnel GSO flag (SKB_GSO_IPXIP6) when
+ *		  decapsulating a tunnel with an outer IPv6 header (IPv6-in-IPv6
+ *		  or IPv4-in-IPv6).
+ *
+ *		When using the decapsulation flags above, the skb->encapsulation
+ *		flag is automatically cleared if all tunnel-specific GSO flags
+ *		(SKB_GSO_UDP_TUNNEL, SKB_GSO_UDP_TUNNEL_CSUM, SKB_GSO_GRE,
+ *		SKB_GSO_GRE_CSUM, SKB_GSO_IPXIP4, SKB_GSO_IPXIP6) have been
+ *		removed from the packet. This handles cases where all tunnel
+ *		layers have been decapsulated.
  *
  * 		A call to this helper is susceptible to change the underlying
  * 		packet buffer. Therefore, at load time, all checks on pointers
@@ -6221,6 +6247,10 @@ enum bpf_adj_room_flags {
 	BPF_F_ADJ_ROOM_ENCAP_L2_ETH	= (1ULL << 6),
 	BPF_F_ADJ_ROOM_DECAP_L3_IPV4	= (1ULL << 7),
 	BPF_F_ADJ_ROOM_DECAP_L3_IPV6	= (1ULL << 8),
+	BPF_F_ADJ_ROOM_DECAP_L4_GRE	= (1ULL << 9),
+	BPF_F_ADJ_ROOM_DECAP_L4_UDP	= (1ULL << 10),
+	BPF_F_ADJ_ROOM_DECAP_IPXIP4	= (1ULL << 11),
+	BPF_F_ADJ_ROOM_DECAP_IPXIP6	= (1ULL << 12),
 };
 
 enum {
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index ca35ed622ed5..f4c2fbd8fe68 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -3010,8 +3010,34 @@ union bpf_attr {
  *
  *		* **BPF_F_ADJ_ROOM_DECAP_L3_IPV4**,
  *		  **BPF_F_ADJ_ROOM_DECAP_L3_IPV6**:
- *		  Indicate the new IP header version after decapsulating the outer
- *		  IP header. Used when the inner and outer IP versions are different.
+ *		  Indicate the new IP header version after decapsulating the
+ *		  outer IP header. Used when the inner and outer IP versions
+ *		  are different. These flags only trigger a protocol change
+ *		  without clearing any tunnel-specific GSO flags.
+ *
+ *		* **BPF_F_ADJ_ROOM_DECAP_L4_GRE**:
+ *		  Clear GRE tunnel GSO flags (SKB_GSO_GRE and SKB_GSO_GRE_CSUM)
+ *		  when decapsulating a GRE tunnel.
+ *
+ *		* **BPF_F_ADJ_ROOM_DECAP_L4_UDP**:
+ *		  Clear UDP tunnel GSO flags (SKB_GSO_UDP_TUNNEL and
+ *		  SKB_GSO_UDP_TUNNEL_CSUM) when decapsulating a UDP tunnel.
+ *
+ *		* **BPF_F_ADJ_ROOM_DECAP_IPXIP4**:
+ *		  Clear IPIP/SIT tunnel GSO flag (SKB_GSO_IPXIP4) when decapsulating
+ *		  a tunnel with an outer IPv4 header (IPv4-in-IPv4 or IPv6-in-IPv4).
+ *
+ *		* **BPF_F_ADJ_ROOM_DECAP_IPXIP6**:
+ *		  Clear IPv6 encapsulation tunnel GSO flag (SKB_GSO_IPXIP6) when
+ *		  decapsulating a tunnel with an outer IPv6 header (IPv6-in-IPv6
+ *		  or IPv4-in-IPv6).
+ *
+ *		When using the decapsulation flags above, the skb->encapsulation
+ *		flag is automatically cleared if all tunnel-specific GSO flags
+ *		(SKB_GSO_UDP_TUNNEL, SKB_GSO_UDP_TUNNEL_CSUM, SKB_GSO_GRE,
+ *		SKB_GSO_GRE_CSUM, SKB_GSO_IPXIP4, SKB_GSO_IPXIP6) have been
+ *		removed from the packet. This handles cases where all tunnel
+ *		layers have been decapsulated.
  *
  * 		A call to this helper is susceptible to change the underlying
  * 		packet buffer. Therefore, at load time, all checks on pointers
@@ -6221,6 +6247,10 @@ enum bpf_adj_room_flags {
 	BPF_F_ADJ_ROOM_ENCAP_L2_ETH	= (1ULL << 6),
 	BPF_F_ADJ_ROOM_DECAP_L3_IPV4	= (1ULL << 7),
 	BPF_F_ADJ_ROOM_DECAP_L3_IPV6	= (1ULL << 8),
+	BPF_F_ADJ_ROOM_DECAP_L4_GRE	= (1ULL << 9),
+	BPF_F_ADJ_ROOM_DECAP_L4_UDP	= (1ULL << 10),
+	BPF_F_ADJ_ROOM_DECAP_IPXIP4	= (1ULL << 11),
+	BPF_F_ADJ_ROOM_DECAP_IPXIP6	= (1ULL << 12),
 };
 
 enum {
-- 
2.34.1


^ permalink raw reply related

* Re: [PATCH bpf-next v4 5/6] bpf: clear decap tunnel GSO state in skb_adjust_room
From: bot+bpf-ci @ 2026-04-16  8:34 UTC (permalink / raw)
  To: nhudson, bpf, netdev, willemb, martin.lau
  Cc: nhudson, mtottenh, aglasgal, daniel, ast, andrii, eddyz87, memxor,
	davem, edumazet, kuba, pabeni, linux-kernel, ast, andrii, daniel,
	martin.lau, eddyz87, yonghong.song, clm, ihor.solodrai
In-Reply-To: <20260416075514.927101-6-nhudson@akamai.com>

> diff --git a/net/core/filter.c b/net/core/filter.c
> index 1dac64b50de4..4e77e9c46f1a 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -3657,6 +3657,44 @@ static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff,
>  		if (!(flags & BPF_F_ADJ_ROOM_FIXED_GSO))
>  			skb_increase_gso_size(shinfo, len_diff);
>
> +		/* Selective GSO flag clearing based on decap type.
> +		 * Only clear the flags for the tunnel layer being removed.
> +		 */
> +		if ((flags & BPF_F_ADJ_ROOM_DECAP_L4_UDP) &&

The symbols BPF_F_ADJ_ROOM_DECAP_L4_UDP, BPF_F_ADJ_ROOM_DECAP_L4_GRE,
BPF_F_ADJ_ROOM_DECAP_IPXIP4, BPF_F_ADJ_ROOM_DECAP_IPXIP6,
BPF_F_ADJ_ROOM_DECAP_L4_MASK, and BPF_F_ADJ_ROOM_DECAP_IPXIP_MASK
are used throughout this hunk but are not defined anywhere in the tree.

The series cover letter lists patch 3/6 "bpf: add
BPF_F_ADJ_ROOM_DECAP_* flags for tunnel decapsulation" which would
add these to include/uapi/linux/bpf.h, but that patch is not present.
Without it, this commit will not compile.

Additionally, even if these symbols were defined, BPF_F_ADJ_ROOM_MASK
and BPF_F_ADJ_ROOM_DECAP_MASK do not include the new flag bits.  Both
bpf_skb_adjust_room():

    if (unlikely(flags & ~BPF_F_ADJ_ROOM_MASK))
        return -EINVAL;

and bpf_skb_net_shrink():

    if (unlikely(flags & ~(BPF_F_ADJ_ROOM_DECAP_MASK |
                           BPF_F_ADJ_ROOM_FIXED_GSO |
                           BPF_F_ADJ_ROOM_NO_CSUM_RESET)))
        return -EINVAL;

would reject any flags value containing these new bits, making the
new code unreachable.  Patch 4/6 "bpf: allow new DECAP flags and add
guard rails" from the series would presumably update these masks, but
it is also not present.

Are patches 3/6 and 4/6 of the series missing from this tree?
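
To make the mask problem concrete, here is a small user-space sketch
(not kernel code). OLD_MASK, NEW_MASK and check_flags() are
hypothetical names; OLD_MASK is a simplified stand-in that assumes
only bits 0..8 are accepted, and NEW_MASK is one possible widened
mask, not necessarily what patch 4/6 does.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define DECAP_L4_UDP	(1ULL << 10)	/* bit position from patch 3/6 */

/* Assumption: simplified stand-in for the current mask, bits 0..8 only. */
#define OLD_MASK	((1ULL << 9) - 1)
/* Hypothetical widened mask that also admits bits 9..12. */
#define NEW_MASK	((1ULL << 13) - 1)

static_assert((DECAP_L4_UDP & OLD_MASK) == 0,
	      "the new bit lies outside the old mask");

/* Same shape as the checks quoted above: any bit outside the mask fails. */
static int check_flags(uint64_t flags, uint64_t mask)
{
	if (flags & ~mask)
		return -EINVAL;
	return 0;
}
```

With the old mask, check_flags(DECAP_L4_UDP, OLD_MASK) returns
-EINVAL before any decap-specific code could run, which is why the
new hunk would be unreachable until the masks are extended.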

> +		    (shinfo->gso_type & (SKB_GSO_UDP_TUNNEL |
> +					 SKB_GSO_UDP_TUNNEL_CSUM)))
> +			shinfo->gso_type &= ~(SKB_GSO_UDP_TUNNEL |
> +					      SKB_GSO_UDP_TUNNEL_CSUM);

[ ... ]

> +		/* Clear encapsulation flag only when no tunnel GSO flags remain */
> +		if (flags & (BPF_F_ADJ_ROOM_DECAP_L4_MASK |
> +			     BPF_F_ADJ_ROOM_DECAP_IPXIP_MASK)) {

[ ... ]

> +			if (flags & BPF_F_ADJ_ROOM_DECAP_L4_UDP) {
> +				skb->encap_hdr_csum = 0;
> +				skb->remcsum_offload = 0;
> +			}
> +		}
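
For readers following along, the intended behaviour can be sketched
in user space as below. This follows the helper documentation from
patch 3/6 rather than the elided parts of the hunk, so treat it as an
approximation. struct fake_skb, decap_clear_gso() and the GSO_* bit
values are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: illustrative bit values, not the kernel's SKB_GSO_* values. */
enum {
	GSO_GRE			= 1u << 0,
	GSO_GRE_CSUM		= 1u << 1,
	GSO_UDP_TUNNEL		= 1u << 2,
	GSO_UDP_TUNNEL_CSUM	= 1u << 3,
	GSO_IPXIP4		= 1u << 4,
	GSO_IPXIP6		= 1u << 5,
};
#define GSO_TUNNEL_ALL	(GSO_GRE | GSO_GRE_CSUM | GSO_UDP_TUNNEL | \
			 GSO_UDP_TUNNEL_CSUM | GSO_IPXIP4 | GSO_IPXIP6)

/* Flag bit positions from patch 3/6. */
#define DECAP_L4_GRE	(1ULL << 9)
#define DECAP_L4_UDP	(1ULL << 10)
#define DECAP_IPXIP4	(1ULL << 11)
#define DECAP_IPXIP6	(1ULL << 12)
#define DECAP_ANY	(DECAP_L4_GRE | DECAP_L4_UDP | \
			 DECAP_IPXIP4 | DECAP_IPXIP6)

/* Hypothetical stand-in for the skb fields involved. */
struct fake_skb {
	uint32_t gso_type;
	int encapsulation;
};

static void decap_clear_gso(struct fake_skb *skb, uint64_t flags)
{
	assert(skb);

	/* Clear only the GSO bits of the tunnel layer being removed. */
	if (flags & DECAP_L4_UDP)
		skb->gso_type &= ~(GSO_UDP_TUNNEL | GSO_UDP_TUNNEL_CSUM);
	if (flags & DECAP_L4_GRE)
		skb->gso_type &= ~(GSO_GRE | GSO_GRE_CSUM);
	if (flags & DECAP_IPXIP4)
		skb->gso_type &= ~GSO_IPXIP4;
	if (flags & DECAP_IPXIP6)
		skb->gso_type &= ~GSO_IPXIP6;

	/* Drop the encapsulation mark only once no tunnel GSO bit is left. */
	if ((flags & DECAP_ANY) && !(skb->gso_type & GSO_TUNNEL_ALL))
		skb->encapsulation = 0;
}
```

In this sketch, a packet that still carries an inner tunnel GSO bit
after the outer layer is stripped keeps encapsulation set, which
matches the documented "all tunnel layers have been decapsulated"
condition.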

This isn't a bug, but the BPF subsystem convention is that multi-line
comments should have the opening /* on its own line:

> +		/* Selective GSO flag clearing based on decap type.
> +		 * Only clear the flags for the tunnel layer being removed.
> +		 */

Should be:

    /*
     * Selective GSO flag clearing based on decap type.
     * Only clear the flags for the tunnel layer being removed.
     */

---
AI reviewed your patch. Please fix the bug or reply by email explaining why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/24499314045

^ permalink raw reply
