Re: [PATCH net 8/8] ixgbe: only access vfinfo and mv_list under RCU lock

Netdev List
 help / color / mirror / Atom feed

From: Jakub Kicinski <kuba@kernel.org>
To: anthony.l.nguyen@intel.com
Cc: Jakub Kicinski <kuba@kernel.org>,
	davem@davemloft.net, pabeni@redhat.com, edumazet@google.com,
	andrew+netdev@lunn.ch, netdev@vger.kernel.org,
	vinschen@redhat.com, alexander.nowlin@intel.com
Subject: Re: [PATCH net 8/8] ixgbe: only access vfinfo and mv_list under RCU lock
Date: Fri, 22 May 2026 17:16:42 -0700	[thread overview]
Message-ID: <20260523001642.1758535-1-kuba@kernel.org> (raw)
In-Reply-To: <20260520183501.3360810-9-anthony.l.nguyen@intel.com>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
Subject: ixgbe: only access vfinfo and mv_list under RCU lock

This patch drops the spinlock protecting the adapter info in favor of RCU to
avoid lock contention and crashes when setting sriov_numvfs to zero. It
modifies the accesses to vfinfo and mv_list across the driver to use RCU
primitives.

> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> index 2646ee6f295f0..d82c7dfc6580a 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
[ ... ]
> @@ -8309,9 +8340,15 @@ static void ixgbe_check_for_bad_vf(struct ixgbe_adapter *adapter)
>  
>  	/* check status reg for all VFs owned by this PF */
>  	for (vf = 0; vf < adapter->num_vfs; ++vf) {
> -		struct pci_dev *vfdev = adapter->vfinfo[vf].vfdev;
> +		struct vf_data_storage *vfinfo;
> +		struct pci_dev *vfdev = NULL;
>  		u16 status_reg;
>  
> +		rcu_read_lock();
> +		vfinfo = rcu_dereference(adapter->vfinfo);
> +		if (vfinfo)
> +			vfdev = vfinfo[vf].vfdev;
> +		rcu_read_unlock();
>  		if (!vfdev)
>  			continue;
>  		pci_read_config_word(vfdev, PCI_STATUS, &status_reg);

[Severity: High]
Is it safe to dereference vfdev outside the RCU read-side critical
section? By dropping the RCU read lock before calling pci_read_config_word(),
the grace period can expire and a concurrent execution of
ixgbe_disable_sriov() could free the pci_dev object, resulting in a
use-after-free.

[ ... ]
> @@ -10071,20 +10114,26 @@ static int handle_redirect_action(struct ixgbe_adapter *adapter, int ifindex,
>  {
>  	struct ixgbe_ring_feature *vmdq = &adapter->ring_feature[RING_F_VMDQ];
>  	unsigned int num_vfs = adapter->num_vfs, vf;
> +	struct vf_data_storage *vfinfo;
>  	struct netdev_nested_priv priv;
>  	struct upper_walk_data data;
>  	struct net_device *upper;
>  
>  	/* redirect to a SRIOV VF */
> -	for (vf = 0; vf < num_vfs; ++vf) {
> -		upper = pci_get_drvdata(adapter->vfinfo[vf].vfdev);
> -		if (upper->ifindex == ifindex) {
> -			*queue = vf * __ALIGN_MASK(1, ~vmdq->mask);
> -			*action = vf + 1;
> -			*action <<= ETHTOOL_RX_FLOW_SPEC_RING_VF_OFF;
> -			return 0;
> +	rcu_read_lock();
> +	vfinfo = rcu_dereference(adapter->vfinfo);
> +	if (vfinfo)
> +		for (vf = 0; vf < num_vfs; ++vf) {

[Severity: High]
Does reading adapter->num_vfs outside the RCU lock create a time-of-check to
time-of-use (TOCTOU) race? If SR-IOV is disabled and rapidly re-enabled, a
reader can cache an older, larger num_vfs value before acquiring the RCU
lock, but then fetch the newer, smaller vfinfo array. Iterating up to the old
num_vfs would then cause an out-of-bounds memory access.

> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
> index 431d77da15a5f..80f22a8e7af43 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
[ ... ]
> @@ -123,6 +125,7 @@ static int __ixgbe_enable_sriov(struct ixgbe_adapter *adapter,
>  static void ixgbe_get_vfs(struct ixgbe_adapter *adapter)
>  {
>  	struct pci_dev *pdev = adapter->pdev;
> +	struct vf_data_storage *vfinfo;
>  	u16 vendor = pdev->vendor;
>  	struct pci_dev *vfdev;
>  	int vf = 0;
> @@ -134,18 +137,23 @@ static void ixgbe_get_vfs(struct ixgbe_adapter *adapter)
>  		return;
>  	pci_read_config_word(pdev, pos + PCI_SRIOV_VF_DID, &vf_id);
>  
> -	vfdev = pci_get_device(vendor, vf_id, NULL);
> -	for (; vfdev; vfdev = pci_get_device(vendor, vf_id, vfdev)) {
> -		if (!vfdev->is_virtfn)
> -			continue;
> -		if (vfdev->physfn != pdev)
> -			continue;
> -		if (vf >= adapter->num_vfs)
> -			continue;
> -		pci_dev_get(vfdev);
> -		adapter->vfinfo[vf].vfdev = vfdev;
> -		++vf;
> +	rcu_read_lock();
> +	vfinfo = rcu_dereference(adapter->vfinfo);
> +	if (vfinfo) {
> +		vfdev = pci_get_device(vendor, vf_id, NULL);
> +		for (; vfdev; vfdev = pci_get_device(vendor, vf_id, vfdev)) {

[Severity: High]
Does this code call a sleeping function inside an RCU read-side critical
section? pci_get_device() internally acquires the pci_bus_sem read-write
semaphore, which can sleep. Sleeping inside an RCU read section will
trigger a 'scheduling while atomic' BUG.

[ ... ]
> @@ -206,31 +214,28 @@ int ixgbe_disable_sriov(struct ixgbe_adapter *adapter)
>  {
>  	unsigned int num_vfs = adapter->num_vfs, vf;
>  	struct ixgbe_hw *hw = &adapter->hw;
> -	unsigned long flags;
> +	struct vf_data_storage *vfinfo;
> +	struct vf_macvlans *mv_list;
>  	int rss;
>  
> -	spin_lock_irqsave(&adapter->vfs_lock, flags);
> -	/* set num VFs to 0 to prevent access to vfinfo */
> +	/* set num VFs to 0 so readers bail out early */
>  	adapter->num_vfs = 0;
> -	spin_unlock_irqrestore(&adapter->vfs_lock, flags);
> +
> +	vfinfo = rcu_replace_pointer(adapter->vfinfo, NULL, 1);
> +	mv_list = rcu_replace_pointer(adapter->mv_list, NULL, 1);
>  
>  	/* put the reference to all of the vf devices */
>  	for (vf = 0; vf < num_vfs; ++vf) {
> -		struct pci_dev *vfdev = adapter->vfinfo[vf].vfdev;
> +		struct pci_dev *vfdev = vfinfo[vf].vfdev;
>  
>  		if (!vfdev)
>  			continue;
> -		adapter->vfinfo[vf].vfdev = NULL;
> +		vfinfo[vf].vfdev = NULL;
>  		pci_dev_put(vfdev);
>  	}

[Severity: High]
Can RCU readers observe these synchronous mutations before the grace
period ends? The vfinfo array is modified in-place and pci_dev_put() is
called on the devices synchronously. If the final reference is dropped,
the pci_dev could be freed while an RCU reader (like
handle_redirect_action()) is concurrently accessing it, leading to a
use-after-free.

[ ... ]
> @@ -643,10 +668,16 @@ static void ixgbe_clear_vf_vlans(struct ixgbe_adapter *adapter, u32 vf)
>  static int ixgbe_set_vf_macvlan(struct ixgbe_adapter *adapter,
>  				int vf, int index, unsigned char *mac_addr)
>  {
> -	struct vf_macvlans *entry;
> +	struct vf_macvlans *mv_list, *entry;
>  	bool found = false;
>  	int retval = 0;
>  
> +	lockdep_assert_in_rcu_read_lock();
> +	/* vf_mvs entries point into the mv_list array */
> +	mv_list = rcu_dereference(adapter->mv_list);
> +	if (!mv_list)
> +		return 0;
> +
>  	if (index <= 1) {
>  		list_for_each_entry(entry, &adapter->vf_mvs.l, l) {

[Severity: High]
Are standard list operations safe here under RCU? The adapter->vf_mvs.l
list is traversed using list_for_each_entry() under rcu_read_lock().
Concurrently, ixgbe_alloc_vf_macvlans() can re-initialize the list head
and add entries using list_add(). Without the appropriate RCU list
primitives, concurrent list modifications can cause data races and
corrupted pointers.

[ ... ]
> @@ -952,14 +1013,20 @@ static int ixgbe_set_vf_mac_addr(struct ixgbe_adapter *adapter,
>  				 u32 *msgbuf, u32 vf)
>  {
>  	u8 *new_mac = ((u8 *)(&msgbuf[1]));
> +	struct vf_data_storage *vfinfo;
>  
>  	if (!is_valid_ether_addr(new_mac)) {
>  		e_warn(drv, "VF %d attempted to set invalid mac\n", vf);
>  		return -1;
>  	}
>  
> -	if (adapter->vfinfo[vf].pf_set_mac && !adapter->vfinfo[vf].trusted &&
> -	    !ether_addr_equal(adapter->vfinfo[vf].vf_mac_addresses, new_mac)) {
> +	lockdep_assert_in_rcu_read_lock();
> +	vfinfo = rcu_dereference(adapter->vfinfo);
> +	if (!vfinfo)
> +		return 0;

[Severity: Medium]
Will this send false ACKs to the VF when vfinfo is NULL? The caller,
ixgbe_rcv_msg_from_vf(), interprets a return value of 0 as success and
sends an ACK (IXGBE_VT_MSGTYPE_ACK) to the VF. Returning 0 here means
the operation is silently ignored but acknowledged as successful. Could
this return an error like -EINVAL instead?

     prev parent reply	other threads:[~2026-05-23  0:16 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-20 18:34 [PATCH net 0/8][pull request] Intel Wired LAN Driver Updates 2026-05-20 (ice, iavf, i40e, ixgbe) Tony Nguyen
2026-05-20 18:34 ` [PATCH net 1/8] ice: fix UAF/NULL deref when VSI rebuild and XDP attach race Tony Nguyen
2026-05-21 15:37   ` Jakub Kicinski
2026-05-23  0:16   ` Jakub Kicinski
2026-05-20 18:34 ` [PATCH net 2/8] ice: fix stats array overflow when VF requests more queues Tony Nguyen
2026-05-23  0:16   ` Jakub Kicinski
2026-05-20 18:34 ` [PATCH net 3/8] iavf: return EBUSY if reset in progress or not ready during MAC change Tony Nguyen
2026-05-20 18:34 ` [PATCH net 4/8] i40e: skip unnecessary VF reset when setting trust Tony Nguyen
2026-05-23  0:16   ` Jakub Kicinski
2026-05-23  0:16   ` Jakub Kicinski
2026-05-20 18:34 ` [PATCH net 5/8] iavf: send MAC change request synchronously Tony Nguyen
2026-05-23  0:16   ` Jakub Kicinski
2026-05-23  0:16   ` Jakub Kicinski
2026-05-20 18:34 ` [PATCH net 6/8] ice: skip unnecessary VF reset when setting trust Tony Nguyen
2026-05-23  0:16   ` Jakub Kicinski
2026-05-23  0:16   ` Jakub Kicinski
2026-05-20 18:34 ` [PATCH net 7/8] i40e: set supported_extts_flags for rising edge Tony Nguyen
2026-05-23  0:16   ` Jakub Kicinski
2026-05-20 18:34 ` [PATCH net 8/8] ixgbe: only access vfinfo and mv_list under RCU lock Tony Nguyen
2026-05-23  0:16   ` Jakub Kicinski
2026-05-23  0:16   ` Jakub Kicinski [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260523001642.1758535-1-kuba@kernel.org \
    --to=kuba@kernel.org \
    --cc=alexander.nowlin@intel.com \
    --cc=andrew+netdev@lunn.ch \
    --cc=anthony.l.nguyen@intel.com \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=vinschen@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox