public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
From: Przemek Kitszel <przemyslaw.kitszel@intel.com>
To: Petr Oros <poros@redhat.com>
Cc: <ivecera@redhat.com>, <aleksandr.loktionov@intel.com>,
	<shaojijie@huawei.com>, Jacob Keller <jacob.e.keller@intel.com>,
	Tony Nguyen <anthony.l.nguyen@intel.com>,
	Andrew Lunn <andrew+netdev@lunn.ch>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	"Jakub Kicinski" <kuba@kernel.org>,
	Paolo Abeni <pabeni@redhat.com>,
	"Stanislav Fomichev" <sdf@fomichev.me>,
	<intel-wired-lan@lists.osuosl.org>,
	<linux-kernel@vger.kernel.org>, <netdev@vger.kernel.org>
Subject: Re: [PATCH net v2] iavf: fix deadlock in reset handling
Date: Tue, 10 Feb 2026 15:58:25 +0100	[thread overview]
Message-ID: <adf9538c-d403-4480-8ec5-09e2943878e6@intel.com> (raw)
In-Reply-To: <20260207102245.812795-1-poros@redhat.com>

On 2/7/26 11:22, Petr Oros wrote:
> Three driver callbacks schedule a reset and wait for its completion:
> ndo_change_mtu(), ethtool set_ringparam(), and ethtool set_channels().
> 
> Waiting for reset in ndo_change_mtu() and set_ringparam() was added by
> commit c2ed2403f12c ("iavf: Wait for reset in callbacks which trigger
> it") to fix a race condition where adding an interface to bonding
> immediately after MTU or ring parameter change failed because the
> interface was still in __RESETTING state. The same commit also added
> waiting in iavf_set_priv_flags(), which was later removed by commit
> 53844673d555 ("iavf: kill "legacy-rx" for good").
> 
> Waiting in set_channels() was introduced earlier by commit 4e5e6b5d9d13
> ("iavf: Fix return of set the new channel count") to ensure the PF has
> enough time to complete the VF reset when changing channel count, and to
> return correct error codes to userspace.
> 
> Commit ef490bbb2267 ("iavf: Add net_shaper_ops support") added
> net_shaper_ops to iavf, which required reset_task to use _locked NAPI
> variants (napi_enable_locked, napi_disable_locked) that need the netdev
> instance lock.
> 
> Later, commit 7e4d784f5810 ("net: hold netdev instance lock during
> rtnetlink operations") and commit 2bcf4772e45a ("net: ethtool: try to
> protect all callback with netdev instance lock") started holding the
> netdev instance lock during ndo and ethtool callbacks for drivers with
> net_shaper_ops.
> 
> Finally, commit 120f28a6f314 ("iavf: get rid of the crit lock")
> replaced the driver's crit_lock with netdev_lock in reset_task, making
> the deadlock manifest: the callback holds netdev_lock and waits for
> reset_task, but reset_task needs the same lock:
> 
>    Thread 1 (callback)               Thread 2 (reset_task)
>    -------------------               ---------------------
>    netdev_lock()                     [blocked on workqueue]
>    ndo_change_mtu() or ethtool op
>      iavf_schedule_reset()
>      iavf_wait_for_reset()           iavf_reset_task()
>        waiting...                      netdev_lock() <- DEADLOCK
> 
> Fix this by extracting the reset logic from iavf_reset_task() into a new
> iavf_reset_step() function that expects netdev_lock to be already held.
> The three callbacks now call iavf_reset_step() directly instead of
> scheduling the work and waiting, performing the reset synchronously in
> the caller's context which already holds netdev_lock. This eliminates
> both the deadlock and the need for iavf_wait_for_reset(), which is
> removed.
> 
> The workqueue-based iavf_reset_task() becomes a thin wrapper that
> acquires netdev_lock and calls iavf_reset_step(), preserving its use
> for PF-initiated resets.
> 
> The callbacks may block for several seconds while iavf_reset_step()
> polls hardware registers, but this is acceptable since netdev_lock is a
> per-device mutex and only serializes operations on the same interface.
> 
> Fixes: 120f28a6f314 ("iavf: get rid of the crit lock")
> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
> Signed-off-by: Petr Oros <poros@redhat.com>

thanks a lot, this is not only a fix, but also a step in the right
direction for the driver,

Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>

> ---
>   drivers/net/ethernet/intel/iavf/iavf.h        |  2 +-
>   .../net/ethernet/intel/iavf/iavf_ethtool.c    | 21 +++---
>   drivers/net/ethernet/intel/iavf/iavf_main.c   | 72 +++++++------------
>   3 files changed, 33 insertions(+), 62 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/iavf/iavf.h b/drivers/net/ethernet/intel/iavf/iavf.h
> index d552f912e8a947..0c3844b3ff1c86 100644
> --- a/drivers/net/ethernet/intel/iavf/iavf.h
> +++ b/drivers/net/ethernet/intel/iavf/iavf.h
> @@ -625,5 +625,5 @@ void iavf_add_adv_rss_cfg(struct iavf_adapter *adapter);
>   void iavf_del_adv_rss_cfg(struct iavf_adapter *adapter);
>   struct iavf_mac_filter *iavf_add_filter(struct iavf_adapter *adapter,
>   					const u8 *macaddr);
> -int iavf_wait_for_reset(struct iavf_adapter *adapter);
> +void iavf_reset_step(struct iavf_adapter *adapter);
>   #endif /* _IAVF_H_ */
> diff --git a/drivers/net/ethernet/intel/iavf/iavf_ethtool.c b/drivers/net/ethernet/intel/iavf/iavf_ethtool.c
> index 2cc21289a70779..9b0f47f9340942 100644
> --- a/drivers/net/ethernet/intel/iavf/iavf_ethtool.c
> +++ b/drivers/net/ethernet/intel/iavf/iavf_ethtool.c
> @@ -492,7 +492,6 @@ static int iavf_set_ringparam(struct net_device *netdev,
>   {
>   	struct iavf_adapter *adapter = netdev_priv(netdev);
>   	u32 new_rx_count, new_tx_count;
> -	int ret = 0;
>   
>   	if ((ring->rx_mini_pending) || (ring->rx_jumbo_pending))
>   		return -EINVAL;
> @@ -537,13 +536,11 @@ static int iavf_set_ringparam(struct net_device *netdev,
>   	}
>   
>   	if (netif_running(netdev)) {
> -		iavf_schedule_reset(adapter, IAVF_FLAG_RESET_NEEDED);
> -		ret = iavf_wait_for_reset(adapter);
> -		if (ret)
> -			netdev_warn(netdev, "Changing ring parameters timeout or interrupted waiting for reset");
> +		adapter->flags |= IAVF_FLAG_RESET_NEEDED;
> +		iavf_reset_step(adapter);
>   	}
>   
> -	return ret;
> +	return 0;
>   }
>   
>   /**
> @@ -1723,7 +1720,6 @@ static int iavf_set_channels(struct net_device *netdev,
>   {
>   	struct iavf_adapter *adapter = netdev_priv(netdev);
>   	u32 num_req = ch->combined_count;
> -	int ret = 0;
>   
>   	if ((adapter->vf_res->vf_cap_flags & VIRTCHNL_VF_OFFLOAD_ADQ) &&
>   	    adapter->num_tc) {
> @@ -1745,13 +1741,12 @@ static int iavf_set_channels(struct net_device *netdev,
>   
>   	adapter->num_req_queues = num_req;
>   	adapter->flags |= IAVF_FLAG_REINIT_ITR_NEEDED;
> -	iavf_schedule_reset(adapter, IAVF_FLAG_RESET_NEEDED);
> -
> -	ret = iavf_wait_for_reset(adapter);
> -	if (ret)
> -		netdev_warn(netdev, "Changing channel count timeout or interrupted waiting for reset");
> +	if (netif_running(netdev)) {
> +		adapter->flags |= IAVF_FLAG_RESET_NEEDED;
> +		iavf_reset_step(adapter);
> +	}
>   
> -	return ret;
> +	return 0;
>   }
>   
>   /**
> diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
> index 8aa6e92c16431f..9c8d6125106f5a 100644
> --- a/drivers/net/ethernet/intel/iavf/iavf_main.c
> +++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
> @@ -185,31 +185,6 @@ static bool iavf_is_reset_in_progress(struct iavf_adapter *adapter)
>   	return false;
>   }
>   
> -/**
> - * iavf_wait_for_reset - Wait for reset to finish.
> - * @adapter: board private structure
> - *
> - * Returns 0 if reset finished successfully, negative on timeout or interrupt.
> - */
> -int iavf_wait_for_reset(struct iavf_adapter *adapter)
> -{
> -	int ret = wait_event_interruptible_timeout(adapter->reset_waitqueue,
> -					!iavf_is_reset_in_progress(adapter),
> -					msecs_to_jiffies(5000));
> -
> -	/* If ret < 0 then it means wait was interrupted.
> -	 * If ret == 0 then it means we got a timeout while waiting
> -	 * for reset to finish.
> -	 * If ret > 0 it means reset has finished.
> -	 */
> -	if (ret > 0)
> -		return 0;
> -	else if (ret < 0)
> -		return -EINTR;
> -	else
> -		return -EBUSY;
> -}
> -
>   /**
>    * iavf_allocate_dma_mem_d - OS specific memory alloc for shared code
>    * @hw:   pointer to the HW structure
> @@ -3100,18 +3075,16 @@ static void iavf_reconfig_qs_bw(struct iavf_adapter *adapter)
>   }
>   
>   /**
> - * iavf_reset_task - Call-back task to handle hardware reset
> - * @work: pointer to work_struct
> + * iavf_reset_step - Perform the VF reset sequence
> + * @adapter: board private structure
>    *
> - * During reset we need to shut down and reinitialize the admin queue
> - * before we can use it to communicate with the PF again. We also clear
> - * and reinit the rings because that context is lost as well.
> - **/
> -static void iavf_reset_task(struct work_struct *work)
> + * Requests a reset from PF, polls for completion, and reconfigures
> + * the driver. Caller must hold the netdev instance lock.
> + *
> + * This can sleep for several seconds while polling HW registers.
> + */
> +void iavf_reset_step(struct iavf_adapter *adapter)
>   {
> -	struct iavf_adapter *adapter = container_of(work,
> -						      struct iavf_adapter,
> -						      reset_task);
>   	struct virtchnl_vf_resource *vfres = adapter->vf_res;
>   	struct net_device *netdev = adapter->netdev;
>   	struct iavf_hw *hw = &adapter->hw;
> @@ -3122,7 +3095,7 @@ static void iavf_reset_task(struct work_struct *work)
>   	int i = 0, err;
>   	bool running;
>   
> -	netdev_lock(netdev);
> +	netdev_assert_locked(netdev);
>   
>   	iavf_misc_irq_disable(adapter);
>   	if (adapter->flags & IAVF_FLAG_RESET_NEEDED) {
> @@ -3167,7 +3140,6 @@ static void iavf_reset_task(struct work_struct *work)
>   		dev_err(&adapter->pdev->dev, "Reset never finished (%x)\n",
>   			reg_val);
>   		iavf_disable_vf(adapter);
> -		netdev_unlock(netdev);
>   		return; /* Do not attempt to reinit. It's dead, Jim. */
>   	}
>   
> @@ -3179,7 +3151,6 @@ static void iavf_reset_task(struct work_struct *work)
>   		iavf_startup(adapter);
>   		queue_delayed_work(adapter->wq, &adapter->watchdog_task,
>   				   msecs_to_jiffies(30));
> -		netdev_unlock(netdev);
>   		return;
>   	}
>   
> @@ -3321,7 +3292,6 @@ static void iavf_reset_task(struct work_struct *work)
>   	adapter->flags &= ~IAVF_FLAG_REINIT_ITR_NEEDED;
>   
>   	wake_up(&adapter->reset_waitqueue);
> -	netdev_unlock(netdev);
>   
>   	return;
>   reset_err:
> @@ -3331,10 +3301,21 @@ static void iavf_reset_task(struct work_struct *work)
>   	}
>   	iavf_disable_vf(adapter);
>   
> -	netdev_unlock(netdev);
>   	dev_err(&adapter->pdev->dev, "failed to allocate resources during reinit\n");
>   }
>   
> +static void iavf_reset_task(struct work_struct *work)
> +{
> +	struct iavf_adapter *adapter = container_of(work,
> +						      struct iavf_adapter,
> +						      reset_task);
> +	struct net_device *netdev = adapter->netdev;
> +
> +	netdev_lock(netdev);
> +	iavf_reset_step(adapter);
> +	netdev_unlock(netdev);
> +}
> +
>   /**
>    * iavf_adminq_task - worker thread to clean the admin queue
>    * @work: pointer to work_struct containing our data
> @@ -4600,22 +4581,17 @@ static int iavf_close(struct net_device *netdev)
>   static int iavf_change_mtu(struct net_device *netdev, int new_mtu)
>   {
>   	struct iavf_adapter *adapter = netdev_priv(netdev);
> -	int ret = 0;
>   
>   	netdev_dbg(netdev, "changing MTU from %d to %d\n",
>   		   netdev->mtu, new_mtu);
>   	WRITE_ONCE(netdev->mtu, new_mtu);
>   
>   	if (netif_running(netdev)) {
> -		iavf_schedule_reset(adapter, IAVF_FLAG_RESET_NEEDED);
> -		ret = iavf_wait_for_reset(adapter);
> -		if (ret < 0)
> -			netdev_warn(netdev, "MTU change interrupted waiting for reset");
> -		else if (ret)
> -			netdev_warn(netdev, "MTU change timed out waiting for reset");
> +		adapter->flags |= IAVF_FLAG_RESET_NEEDED;
> +		iavf_reset_step(adapter);
>   	}
>   
> -	return ret;
> +	return 0;
>   }
>   
>   /**


  parent reply	other threads:[~2026-02-10 14:58 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-07 10:22 [PATCH net v2] iavf: fix deadlock in reset handling Petr Oros
2026-02-09  7:15 ` Loktionov, Aleksandr
2026-02-09 23:59 ` Jacob Keller
2026-02-10 14:58 ` Przemek Kitszel [this message]
2026-02-11 11:50 ` Paolo Abeni
2026-02-11 18:50   ` Jacob Keller
2026-02-11 19:18 ` [PATCH net v3] iavf: fix incorrect reset handling in callbacks Petr Oros
2026-02-13 10:29   ` Przemek Kitszel
2026-02-23 16:10   ` [Intel-wired-lan] " Romanowski, Rafal

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=adf9538c-d403-4480-8ec5-09e2943878e6@intel.com \
    --to=przemyslaw.kitszel@intel.com \
    --cc=aleksandr.loktionov@intel.com \
    --cc=andrew+netdev@lunn.ch \
    --cc=anthony.l.nguyen@intel.com \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=intel-wired-lan@lists.osuosl.org \
    --cc=ivecera@redhat.com \
    --cc=jacob.e.keller@intel.com \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=poros@redhat.com \
    --cc=sdf@fomichev.me \
    --cc=shaojijie@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox