* [PATCH net v2] iavf: fix deadlock in reset handling
@ 2026-02-07 10:22 Petr Oros
2026-02-09 7:15 ` Loktionov, Aleksandr
` (4 more replies)
0 siblings, 5 replies; 9+ messages in thread
From: Petr Oros @ 2026-02-07 10:22 UTC (permalink / raw)
To: netdev
Cc: ivecera, aleksandr.loktionov, shaojijie, Petr Oros, Jacob Keller,
Tony Nguyen, Przemek Kitszel, Andrew Lunn, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Stanislav Fomichev,
intel-wired-lan, linux-kernel
Three driver callbacks schedule a reset and wait for its completion:
ndo_change_mtu(), ethtool set_ringparam(), and ethtool set_channels().
Waiting for reset in ndo_change_mtu() and set_ringparam() was added by
commit c2ed2403f12c ("iavf: Wait for reset in callbacks which trigger
it") to fix a race condition where adding an interface to bonding
immediately after MTU or ring parameter change failed because the
interface was still in __RESETTING state. The same commit also added
waiting in iavf_set_priv_flags(), which was later removed by commit
53844673d555 ("iavf: kill "legacy-rx" for good").
Waiting in set_channels() was introduced earlier by commit 4e5e6b5d9d13
("iavf: Fix return of set the new channel count") to ensure the PF has
enough time to complete the VF reset when changing channel count, and to
return correct error codes to userspace.
Commit ef490bbb2267 ("iavf: Add net_shaper_ops support") added
net_shaper_ops to iavf, which required reset_task to use _locked NAPI
variants (napi_enable_locked, napi_disable_locked) that need the netdev
instance lock.
Later, commit 7e4d784f5810 ("net: hold netdev instance lock during
rtnetlink operations") and commit 2bcf4772e45a ("net: ethtool: try to
protect all callback with netdev instance lock") started holding the
netdev instance lock during ndo and ethtool callbacks for drivers with
net_shaper_ops.
Finally, commit 120f28a6f314 ("iavf: get rid of the crit lock")
replaced the driver's crit_lock with netdev_lock in reset_task, making
the deadlock manifest: the callback holds netdev_lock and waits for
reset_task, but reset_task needs the same lock:
Thread 1 (callback) Thread 2 (reset_task)
------------------- ---------------------
netdev_lock() [blocked on workqueue]
ndo_change_mtu() or ethtool op
iavf_schedule_reset()
iavf_wait_for_reset() iavf_reset_task()
waiting... netdev_lock() <- DEADLOCK
Fix this by extracting the reset logic from iavf_reset_task() into a new
iavf_reset_step() function that expects netdev_lock to be already held.
The three callbacks now call iavf_reset_step() directly instead of
scheduling the work and waiting, performing the reset synchronously in
the caller's context which already holds netdev_lock. This eliminates
both the deadlock and the need for iavf_wait_for_reset(), which is
removed.
The workqueue-based iavf_reset_task() becomes a thin wrapper that
acquires netdev_lock and calls iavf_reset_step(), preserving its use
for PF-initiated resets.
The callbacks may block for several seconds while iavf_reset_step()
polls hardware registers, but this is acceptable since netdev_lock is a
per-device mutex and only serializes operations on the same interface.
Fixes: 120f28a6f314 ("iavf: get rid of the crit lock")
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Petr Oros <poros@redhat.com>
---
drivers/net/ethernet/intel/iavf/iavf.h | 2 +-
.../net/ethernet/intel/iavf/iavf_ethtool.c | 21 +++---
drivers/net/ethernet/intel/iavf/iavf_main.c | 72 +++++++------------
3 files changed, 33 insertions(+), 62 deletions(-)
diff --git a/drivers/net/ethernet/intel/iavf/iavf.h b/drivers/net/ethernet/intel/iavf/iavf.h
index d552f912e8a947..0c3844b3ff1c86 100644
--- a/drivers/net/ethernet/intel/iavf/iavf.h
+++ b/drivers/net/ethernet/intel/iavf/iavf.h
@@ -625,5 +625,5 @@ void iavf_add_adv_rss_cfg(struct iavf_adapter *adapter);
void iavf_del_adv_rss_cfg(struct iavf_adapter *adapter);
struct iavf_mac_filter *iavf_add_filter(struct iavf_adapter *adapter,
const u8 *macaddr);
-int iavf_wait_for_reset(struct iavf_adapter *adapter);
+void iavf_reset_step(struct iavf_adapter *adapter);
#endif /* _IAVF_H_ */
diff --git a/drivers/net/ethernet/intel/iavf/iavf_ethtool.c b/drivers/net/ethernet/intel/iavf/iavf_ethtool.c
index 2cc21289a70779..9b0f47f9340942 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_ethtool.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_ethtool.c
@@ -492,7 +492,6 @@ static int iavf_set_ringparam(struct net_device *netdev,
{
struct iavf_adapter *adapter = netdev_priv(netdev);
u32 new_rx_count, new_tx_count;
- int ret = 0;
if ((ring->rx_mini_pending) || (ring->rx_jumbo_pending))
return -EINVAL;
@@ -537,13 +536,11 @@ static int iavf_set_ringparam(struct net_device *netdev,
}
if (netif_running(netdev)) {
- iavf_schedule_reset(adapter, IAVF_FLAG_RESET_NEEDED);
- ret = iavf_wait_for_reset(adapter);
- if (ret)
- netdev_warn(netdev, "Changing ring parameters timeout or interrupted waiting for reset");
+ adapter->flags |= IAVF_FLAG_RESET_NEEDED;
+ iavf_reset_step(adapter);
}
- return ret;
+ return 0;
}
/**
@@ -1723,7 +1720,6 @@ static int iavf_set_channels(struct net_device *netdev,
{
struct iavf_adapter *adapter = netdev_priv(netdev);
u32 num_req = ch->combined_count;
- int ret = 0;
if ((adapter->vf_res->vf_cap_flags & VIRTCHNL_VF_OFFLOAD_ADQ) &&
adapter->num_tc) {
@@ -1745,13 +1741,12 @@ static int iavf_set_channels(struct net_device *netdev,
adapter->num_req_queues = num_req;
adapter->flags |= IAVF_FLAG_REINIT_ITR_NEEDED;
- iavf_schedule_reset(adapter, IAVF_FLAG_RESET_NEEDED);
-
- ret = iavf_wait_for_reset(adapter);
- if (ret)
- netdev_warn(netdev, "Changing channel count timeout or interrupted waiting for reset");
+ if (netif_running(netdev)) {
+ adapter->flags |= IAVF_FLAG_RESET_NEEDED;
+ iavf_reset_step(adapter);
+ }
- return ret;
+ return 0;
}
/**
diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index 8aa6e92c16431f..9c8d6125106f5a 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -185,31 +185,6 @@ static bool iavf_is_reset_in_progress(struct iavf_adapter *adapter)
return false;
}
-/**
- * iavf_wait_for_reset - Wait for reset to finish.
- * @adapter: board private structure
- *
- * Returns 0 if reset finished successfully, negative on timeout or interrupt.
- */
-int iavf_wait_for_reset(struct iavf_adapter *adapter)
-{
- int ret = wait_event_interruptible_timeout(adapter->reset_waitqueue,
- !iavf_is_reset_in_progress(adapter),
- msecs_to_jiffies(5000));
-
- /* If ret < 0 then it means wait was interrupted.
- * If ret == 0 then it means we got a timeout while waiting
- * for reset to finish.
- * If ret > 0 it means reset has finished.
- */
- if (ret > 0)
- return 0;
- else if (ret < 0)
- return -EINTR;
- else
- return -EBUSY;
-}
-
/**
* iavf_allocate_dma_mem_d - OS specific memory alloc for shared code
* @hw: pointer to the HW structure
@@ -3100,18 +3075,16 @@ static void iavf_reconfig_qs_bw(struct iavf_adapter *adapter)
}
/**
- * iavf_reset_task - Call-back task to handle hardware reset
- * @work: pointer to work_struct
+ * iavf_reset_step - Perform the VF reset sequence
+ * @adapter: board private structure
*
- * During reset we need to shut down and reinitialize the admin queue
- * before we can use it to communicate with the PF again. We also clear
- * and reinit the rings because that context is lost as well.
- **/
-static void iavf_reset_task(struct work_struct *work)
+ * Requests a reset from PF, polls for completion, and reconfigures
+ * the driver. Caller must hold the netdev instance lock.
+ *
+ * This can sleep for several seconds while polling HW registers.
+ */
+void iavf_reset_step(struct iavf_adapter *adapter)
{
- struct iavf_adapter *adapter = container_of(work,
- struct iavf_adapter,
- reset_task);
struct virtchnl_vf_resource *vfres = adapter->vf_res;
struct net_device *netdev = adapter->netdev;
struct iavf_hw *hw = &adapter->hw;
@@ -3122,7 +3095,7 @@ static void iavf_reset_task(struct work_struct *work)
int i = 0, err;
bool running;
- netdev_lock(netdev);
+ netdev_assert_locked(netdev);
iavf_misc_irq_disable(adapter);
if (adapter->flags & IAVF_FLAG_RESET_NEEDED) {
@@ -3167,7 +3140,6 @@ static void iavf_reset_task(struct work_struct *work)
dev_err(&adapter->pdev->dev, "Reset never finished (%x)\n",
reg_val);
iavf_disable_vf(adapter);
- netdev_unlock(netdev);
return; /* Do not attempt to reinit. It's dead, Jim. */
}
@@ -3179,7 +3151,6 @@ static void iavf_reset_task(struct work_struct *work)
iavf_startup(adapter);
queue_delayed_work(adapter->wq, &adapter->watchdog_task,
msecs_to_jiffies(30));
- netdev_unlock(netdev);
return;
}
@@ -3321,7 +3292,6 @@ static void iavf_reset_task(struct work_struct *work)
adapter->flags &= ~IAVF_FLAG_REINIT_ITR_NEEDED;
wake_up(&adapter->reset_waitqueue);
- netdev_unlock(netdev);
return;
reset_err:
@@ -3331,10 +3301,21 @@ static void iavf_reset_task(struct work_struct *work)
}
iavf_disable_vf(adapter);
- netdev_unlock(netdev);
dev_err(&adapter->pdev->dev, "failed to allocate resources during reinit\n");
}
+static void iavf_reset_task(struct work_struct *work)
+{
+ struct iavf_adapter *adapter = container_of(work,
+ struct iavf_adapter,
+ reset_task);
+ struct net_device *netdev = adapter->netdev;
+
+ netdev_lock(netdev);
+ iavf_reset_step(adapter);
+ netdev_unlock(netdev);
+}
+
/**
* iavf_adminq_task - worker thread to clean the admin queue
* @work: pointer to work_struct containing our data
@@ -4600,22 +4581,17 @@ static int iavf_close(struct net_device *netdev)
static int iavf_change_mtu(struct net_device *netdev, int new_mtu)
{
struct iavf_adapter *adapter = netdev_priv(netdev);
- int ret = 0;
netdev_dbg(netdev, "changing MTU from %d to %d\n",
netdev->mtu, new_mtu);
WRITE_ONCE(netdev->mtu, new_mtu);
if (netif_running(netdev)) {
- iavf_schedule_reset(adapter, IAVF_FLAG_RESET_NEEDED);
- ret = iavf_wait_for_reset(adapter);
- if (ret < 0)
- netdev_warn(netdev, "MTU change interrupted waiting for reset");
- else if (ret)
- netdev_warn(netdev, "MTU change timed out waiting for reset");
+ adapter->flags |= IAVF_FLAG_RESET_NEEDED;
+ iavf_reset_step(adapter);
}
- return ret;
+ return 0;
}
/**
--
2.52.0
^ permalink raw reply related [flat|nested] 9+ messages in thread
* RE: [PATCH net v2] iavf: fix deadlock in reset handling
2026-02-07 10:22 [PATCH net v2] iavf: fix deadlock in reset handling Petr Oros
@ 2026-02-09 7:15 ` Loktionov, Aleksandr
2026-02-09 23:59 ` Jacob Keller
` (3 subsequent siblings)
4 siblings, 0 replies; 9+ messages in thread
From: Loktionov, Aleksandr @ 2026-02-09 7:15 UTC (permalink / raw)
To: Oros, Petr, netdev@vger.kernel.org
Cc: Vecera, Ivan, shaojijie@huawei.com, Oros, Petr, Keller, Jacob E,
Nguyen, Anthony L, Kitszel, Przemyslaw, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Stanislav Fomichev, intel-wired-lan@lists.osuosl.org,
linux-kernel@vger.kernel.org
> -----Original Message-----
> From: Petr Oros <poros@redhat.com>
> Sent: Saturday, February 7, 2026 11:23 AM
> To: netdev@vger.kernel.org
> Cc: Vecera, Ivan <ivecera@redhat.com>; Loktionov, Aleksandr
> <aleksandr.loktionov@intel.com>; shaojijie@huawei.com; Oros, Petr
> <poros@redhat.com>; Keller, Jacob E <jacob.e.keller@intel.com>;
> Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Kitszel, Przemyslaw
> <przemyslaw.kitszel@intel.com>; Andrew Lunn <andrew+netdev@lunn.ch>;
> David S. Miller <davem@davemloft.net>; Eric Dumazet
> <edumazet@google.com>; Jakub Kicinski <kuba@kernel.org>; Paolo Abeni
> <pabeni@redhat.com>; Stanislav Fomichev <sdf@fomichev.me>; intel-
> wired-lan@lists.osuosl.org; linux-kernel@vger.kernel.org
> Subject: [PATCH net v2] iavf: fix deadlock in reset handling
>
> Three driver callbacks schedule a reset and wait for its completion:
> ndo_change_mtu(), ethtool set_ringparam(), and ethtool set_channels().
>
> Waiting for reset in ndo_change_mtu() and set_ringparam() was added by
> commit c2ed2403f12c ("iavf: Wait for reset in callbacks which trigger
> it") to fix a race condition where adding an interface to bonding
> immediately after MTU or ring parameter change failed because the
> interface was still in __RESETTING state. The same commit also added
> waiting in iavf_set_priv_flags(), which was later removed by commit
> 53844673d555 ("iavf: kill "legacy-rx" for good").
>
> Waiting in set_channels() was introduced earlier by commit
> 4e5e6b5d9d13
> ("iavf: Fix return of set the new channel count") to ensure the PF has
> enough time to complete the VF reset when changing channel count, and
> to return correct error codes to userspace.
>
> Commit ef490bbb2267 ("iavf: Add net_shaper_ops support") added
> net_shaper_ops to iavf, which required reset_task to use _locked NAPI
> variants (napi_enable_locked, napi_disable_locked) that need the
> netdev instance lock.
>
> Later, commit 7e4d784f5810 ("net: hold netdev instance lock during
> rtnetlink operations") and commit 2bcf4772e45a ("net: ethtool: try to
> protect all callback with netdev instance lock") started holding the
> netdev instance lock during ndo and ethtool callbacks for drivers with
> net_shaper_ops.
>
> Finally, commit 120f28a6f314 ("iavf: get rid of the crit lock")
> replaced the driver's crit_lock with netdev_lock in reset_task, making
> the deadlock manifest: the callback holds netdev_lock and waits for
> reset_task, but reset_task needs the same lock:
>
> Thread 1 (callback) Thread 2 (reset_task)
> ------------------- ---------------------
> netdev_lock() [blocked on workqueue]
> ndo_change_mtu() or ethtool op
> iavf_schedule_reset()
> iavf_wait_for_reset() iavf_reset_task()
> waiting... netdev_lock() <- DEADLOCK
>
> Fix this by extracting the reset logic from iavf_reset_task() into a
> new
> iavf_reset_step() function that expects netdev_lock to be already
> held.
> The three callbacks now call iavf_reset_step() directly instead of
> scheduling the work and waiting, performing the reset synchronously in
> the caller's context which already holds netdev_lock. This eliminates
> both the deadlock and the need for iavf_wait_for_reset(), which is
> removed.
>
> The workqueue-based iavf_reset_task() becomes a thin wrapper that
> acquires netdev_lock and calls iavf_reset_step(), preserving its use
> for PF-initiated resets.
>
> The callbacks may block for several seconds while iavf_reset_step()
> polls hardware registers, but this is acceptable since netdev_lock is
> a per-device mutex and only serializes operations on the same
> interface.
>
> Fixes: 120f28a6f314 ("iavf: get rid of the crit lock")
> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
> Signed-off-by: Petr Oros <poros@redhat.com>
> ---
> drivers/net/ethernet/intel/iavf/iavf.h | 2 +-
> .../net/ethernet/intel/iavf/iavf_ethtool.c | 21 +++---
> drivers/net/ethernet/intel/iavf/iavf_main.c | 72 +++++++-----------
> -
> 3 files changed, 33 insertions(+), 62 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/iavf/iavf.h
> b/drivers/net/ethernet/intel/iavf/iavf.h
> index d552f912e8a947..0c3844b3ff1c86 100644
> --- a/drivers/net/ethernet/intel/iavf/iavf.h
> +++ b/drivers/net/ethernet/intel/iavf/iavf.h
> @@ -625,5 +625,5 @@ void iavf_add_adv_rss_cfg(struct iavf_adapter
> *adapter); void iavf_del_adv_rss_cfg(struct iavf_adapter *adapter);
> struct iavf_mac_filter *iavf_add_filter(struct iavf_adapter *adapter,
> const u8 *macaddr);
...
> /**
> --
> 2.52.0
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net v2] iavf: fix deadlock in reset handling
2026-02-07 10:22 [PATCH net v2] iavf: fix deadlock in reset handling Petr Oros
2026-02-09 7:15 ` Loktionov, Aleksandr
@ 2026-02-09 23:59 ` Jacob Keller
2026-02-10 14:58 ` Przemek Kitszel
` (2 subsequent siblings)
4 siblings, 0 replies; 9+ messages in thread
From: Jacob Keller @ 2026-02-09 23:59 UTC (permalink / raw)
To: Petr Oros, netdev, Anthony Nguyen
Cc: ivecera, aleksandr.loktionov, shaojijie, Tony Nguyen,
Przemek Kitszel, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Stanislav Fomichev, intel-wired-lan,
linux-kernel
On 2/7/2026 2:22 AM, Petr Oros wrote:
> Three driver callbacks schedule a reset and wait for its completion:
> ndo_change_mtu(), ethtool set_ringparam(), and ethtool set_channels().
>
> Waiting for reset in ndo_change_mtu() and set_ringparam() was added by
> commit c2ed2403f12c ("iavf: Wait for reset in callbacks which trigger
> it") to fix a race condition where adding an interface to bonding
> immediately after MTU or ring parameter change failed because the
> interface was still in __RESETTING state. The same commit also added
> waiting in iavf_set_priv_flags(), which was later removed by commit
> 53844673d555 ("iavf: kill "legacy-rx" for good").
>
> Waiting in set_channels() was introduced earlier by commit 4e5e6b5d9d13
> ("iavf: Fix return of set the new channel count") to ensure the PF has
> enough time to complete the VF reset when changing channel count, and to
> return correct error codes to userspace.
>
> Commit ef490bbb2267 ("iavf: Add net_shaper_ops support") added
> net_shaper_ops to iavf, which required reset_task to use _locked NAPI
> variants (napi_enable_locked, napi_disable_locked) that need the netdev
> instance lock.
>
> Later, commit 7e4d784f5810 ("net: hold netdev instance lock during
> rtnetlink operations") and commit 2bcf4772e45a ("net: ethtool: try to
> protect all callback with netdev instance lock") started holding the
> netdev instance lock during ndo and ethtool callbacks for drivers with
> net_shaper_ops.
>
> Finally, commit 120f28a6f314 ("iavf: get rid of the crit lock")
> replaced the driver's crit_lock with netdev_lock in reset_task, making
> the deadlock manifest: the callback holds netdev_lock and waits for
> reset_task, but reset_task needs the same lock:
>
> Thread 1 (callback) Thread 2 (reset_task)
> ------------------- ---------------------
> netdev_lock() [blocked on workqueue]
> ndo_change_mtu() or ethtool op
> iavf_schedule_reset()
> iavf_wait_for_reset() iavf_reset_task()
> waiting... netdev_lock() <- DEADLOCK
>
Only one note: this does not strictly "deadlock" because the
wait_event() will eventually timeout, exit with an error, release the
lock, and then the reset will complete. This is obviously incorrect
behavior and likely to lead to errors as the configuration change may
actually silently apply even though the user sees an error.
Still worth fixing, but it does not lead to a complete system failure
deadlock due to the timeout.
> Fix this by extracting the reset logic from iavf_reset_task() into a new
> iavf_reset_step() function that expects netdev_lock to be already held.
> The three callbacks now call iavf_reset_step() directly instead of
> scheduling the work and waiting, performing the reset synchronously in
> the caller's context which already holds netdev_lock. This eliminates
> both the deadlock and the need for iavf_wait_for_reset(), which is
> removed.
>
> The workqueue-based iavf_reset_task() becomes a thin wrapper that
> acquires netdev_lock and calls iavf_reset_step(), preserving its use
> for PF-initiated resets.
>
> The callbacks may block for several seconds while iavf_reset_step()
> polls hardware registers, but this is acceptable since netdev_lock is a
> per-device mutex and only serializes operations on the same interface.
>
> Fixes: 120f28a6f314 ("iavf: get rid of the crit lock")
> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
> Signed-off-by: Petr Oros <poros@redhat.com>
> ---
Great work on the analysis and history of the issue, as well as coming
up with a minimal fix!
I'm happy to see this go straight to net since its an important fix and
comes from external to Intel, unless Tony has any objections.
I case it helps expedite things: I manually loaded this on my system and
confirmed it resolves the issues for MTU changes, so:
Tested-by: Jacob Keller <jacob.e.keller@intel.com>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net v2] iavf: fix deadlock in reset handling
2026-02-07 10:22 [PATCH net v2] iavf: fix deadlock in reset handling Petr Oros
2026-02-09 7:15 ` Loktionov, Aleksandr
2026-02-09 23:59 ` Jacob Keller
@ 2026-02-10 14:58 ` Przemek Kitszel
2026-02-11 11:50 ` Paolo Abeni
2026-02-11 19:18 ` [PATCH net v3] iavf: fix incorrect reset handling in callbacks Petr Oros
4 siblings, 0 replies; 9+ messages in thread
From: Przemek Kitszel @ 2026-02-10 14:58 UTC (permalink / raw)
To: Petr Oros
Cc: ivecera, aleksandr.loktionov, shaojijie, Jacob Keller,
Tony Nguyen, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Stanislav Fomichev, intel-wired-lan,
linux-kernel, netdev
On 2/7/26 11:22, Petr Oros wrote:
> Three driver callbacks schedule a reset and wait for its completion:
> ndo_change_mtu(), ethtool set_ringparam(), and ethtool set_channels().
>
> Waiting for reset in ndo_change_mtu() and set_ringparam() was added by
> commit c2ed2403f12c ("iavf: Wait for reset in callbacks which trigger
> it") to fix a race condition where adding an interface to bonding
> immediately after MTU or ring parameter change failed because the
> interface was still in __RESETTING state. The same commit also added
> waiting in iavf_set_priv_flags(), which was later removed by commit
> 53844673d555 ("iavf: kill "legacy-rx" for good").
>
> Waiting in set_channels() was introduced earlier by commit 4e5e6b5d9d13
> ("iavf: Fix return of set the new channel count") to ensure the PF has
> enough time to complete the VF reset when changing channel count, and to
> return correct error codes to userspace.
>
> Commit ef490bbb2267 ("iavf: Add net_shaper_ops support") added
> net_shaper_ops to iavf, which required reset_task to use _locked NAPI
> variants (napi_enable_locked, napi_disable_locked) that need the netdev
> instance lock.
>
> Later, commit 7e4d784f5810 ("net: hold netdev instance lock during
> rtnetlink operations") and commit 2bcf4772e45a ("net: ethtool: try to
> protect all callback with netdev instance lock") started holding the
> netdev instance lock during ndo and ethtool callbacks for drivers with
> net_shaper_ops.
>
> Finally, commit 120f28a6f314 ("iavf: get rid of the crit lock")
> replaced the driver's crit_lock with netdev_lock in reset_task, making
> the deadlock manifest: the callback holds netdev_lock and waits for
> reset_task, but reset_task needs the same lock:
>
> Thread 1 (callback) Thread 2 (reset_task)
> ------------------- ---------------------
> netdev_lock() [blocked on workqueue]
> ndo_change_mtu() or ethtool op
> iavf_schedule_reset()
> iavf_wait_for_reset() iavf_reset_task()
> waiting... netdev_lock() <- DEADLOCK
>
> Fix this by extracting the reset logic from iavf_reset_task() into a new
> iavf_reset_step() function that expects netdev_lock to be already held.
> The three callbacks now call iavf_reset_step() directly instead of
> scheduling the work and waiting, performing the reset synchronously in
> the caller's context which already holds netdev_lock. This eliminates
> both the deadlock and the need for iavf_wait_for_reset(), which is
> removed.
>
> The workqueue-based iavf_reset_task() becomes a thin wrapper that
> acquires netdev_lock and calls iavf_reset_step(), preserving its use
> for PF-initiated resets.
>
> The callbacks may block for several seconds while iavf_reset_step()
> polls hardware registers, but this is acceptable since netdev_lock is a
> per-device mutex and only serializes operations on the same interface.
>
> Fixes: 120f28a6f314 ("iavf: get rid of the crit lock")
> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
> Signed-off-by: Petr Oros <poros@redhat.com>
thanks a lot, this is not only a fix, but also a step in the right
direction for the driver,
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
> ---
> drivers/net/ethernet/intel/iavf/iavf.h | 2 +-
> .../net/ethernet/intel/iavf/iavf_ethtool.c | 21 +++---
> drivers/net/ethernet/intel/iavf/iavf_main.c | 72 +++++++------------
> 3 files changed, 33 insertions(+), 62 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/iavf/iavf.h b/drivers/net/ethernet/intel/iavf/iavf.h
> index d552f912e8a947..0c3844b3ff1c86 100644
> --- a/drivers/net/ethernet/intel/iavf/iavf.h
> +++ b/drivers/net/ethernet/intel/iavf/iavf.h
> @@ -625,5 +625,5 @@ void iavf_add_adv_rss_cfg(struct iavf_adapter *adapter);
> void iavf_del_adv_rss_cfg(struct iavf_adapter *adapter);
> struct iavf_mac_filter *iavf_add_filter(struct iavf_adapter *adapter,
> const u8 *macaddr);
> -int iavf_wait_for_reset(struct iavf_adapter *adapter);
> +void iavf_reset_step(struct iavf_adapter *adapter);
> #endif /* _IAVF_H_ */
> diff --git a/drivers/net/ethernet/intel/iavf/iavf_ethtool.c b/drivers/net/ethernet/intel/iavf/iavf_ethtool.c
> index 2cc21289a70779..9b0f47f9340942 100644
> --- a/drivers/net/ethernet/intel/iavf/iavf_ethtool.c
> +++ b/drivers/net/ethernet/intel/iavf/iavf_ethtool.c
> @@ -492,7 +492,6 @@ static int iavf_set_ringparam(struct net_device *netdev,
> {
> struct iavf_adapter *adapter = netdev_priv(netdev);
> u32 new_rx_count, new_tx_count;
> - int ret = 0;
>
> if ((ring->rx_mini_pending) || (ring->rx_jumbo_pending))
> return -EINVAL;
> @@ -537,13 +536,11 @@ static int iavf_set_ringparam(struct net_device *netdev,
> }
>
> if (netif_running(netdev)) {
> - iavf_schedule_reset(adapter, IAVF_FLAG_RESET_NEEDED);
> - ret = iavf_wait_for_reset(adapter);
> - if (ret)
> - netdev_warn(netdev, "Changing ring parameters timeout or interrupted waiting for reset");
> + adapter->flags |= IAVF_FLAG_RESET_NEEDED;
> + iavf_reset_step(adapter);
> }
>
> - return ret;
> + return 0;
> }
>
> /**
> @@ -1723,7 +1720,6 @@ static int iavf_set_channels(struct net_device *netdev,
> {
> struct iavf_adapter *adapter = netdev_priv(netdev);
> u32 num_req = ch->combined_count;
> - int ret = 0;
>
> if ((adapter->vf_res->vf_cap_flags & VIRTCHNL_VF_OFFLOAD_ADQ) &&
> adapter->num_tc) {
> @@ -1745,13 +1741,12 @@ static int iavf_set_channels(struct net_device *netdev,
>
> adapter->num_req_queues = num_req;
> adapter->flags |= IAVF_FLAG_REINIT_ITR_NEEDED;
> - iavf_schedule_reset(adapter, IAVF_FLAG_RESET_NEEDED);
> -
> - ret = iavf_wait_for_reset(adapter);
> - if (ret)
> - netdev_warn(netdev, "Changing channel count timeout or interrupted waiting for reset");
> + if (netif_running(netdev)) {
> + adapter->flags |= IAVF_FLAG_RESET_NEEDED;
> + iavf_reset_step(adapter);
> + }
>
> - return ret;
> + return 0;
> }
>
> /**
> diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
> index 8aa6e92c16431f..9c8d6125106f5a 100644
> --- a/drivers/net/ethernet/intel/iavf/iavf_main.c
> +++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
> @@ -185,31 +185,6 @@ static bool iavf_is_reset_in_progress(struct iavf_adapter *adapter)
> return false;
> }
>
> -/**
> - * iavf_wait_for_reset - Wait for reset to finish.
> - * @adapter: board private structure
> - *
> - * Returns 0 if reset finished successfully, negative on timeout or interrupt.
> - */
> -int iavf_wait_for_reset(struct iavf_adapter *adapter)
> -{
> - int ret = wait_event_interruptible_timeout(adapter->reset_waitqueue,
> - !iavf_is_reset_in_progress(adapter),
> - msecs_to_jiffies(5000));
> -
> - /* If ret < 0 then it means wait was interrupted.
> - * If ret == 0 then it means we got a timeout while waiting
> - * for reset to finish.
> - * If ret > 0 it means reset has finished.
> - */
> - if (ret > 0)
> - return 0;
> - else if (ret < 0)
> - return -EINTR;
> - else
> - return -EBUSY;
> -}
> -
> /**
> * iavf_allocate_dma_mem_d - OS specific memory alloc for shared code
> * @hw: pointer to the HW structure
> @@ -3100,18 +3075,16 @@ static void iavf_reconfig_qs_bw(struct iavf_adapter *adapter)
> }
>
> /**
> - * iavf_reset_task - Call-back task to handle hardware reset
> - * @work: pointer to work_struct
> + * iavf_reset_step - Perform the VF reset sequence
> + * @adapter: board private structure
> *
> - * During reset we need to shut down and reinitialize the admin queue
> - * before we can use it to communicate with the PF again. We also clear
> - * and reinit the rings because that context is lost as well.
> - **/
> -static void iavf_reset_task(struct work_struct *work)
> + * Requests a reset from PF, polls for completion, and reconfigures
> + * the driver. Caller must hold the netdev instance lock.
> + *
> + * This can sleep for several seconds while polling HW registers.
> + */
> +void iavf_reset_step(struct iavf_adapter *adapter)
> {
> - struct iavf_adapter *adapter = container_of(work,
> - struct iavf_adapter,
> - reset_task);
> struct virtchnl_vf_resource *vfres = adapter->vf_res;
> struct net_device *netdev = adapter->netdev;
> struct iavf_hw *hw = &adapter->hw;
> @@ -3122,7 +3095,7 @@ static void iavf_reset_task(struct work_struct *work)
> int i = 0, err;
> bool running;
>
> - netdev_lock(netdev);
> + netdev_assert_locked(netdev);
>
> iavf_misc_irq_disable(adapter);
> if (adapter->flags & IAVF_FLAG_RESET_NEEDED) {
> @@ -3167,7 +3140,6 @@ static void iavf_reset_task(struct work_struct *work)
> dev_err(&adapter->pdev->dev, "Reset never finished (%x)\n",
> reg_val);
> iavf_disable_vf(adapter);
> - netdev_unlock(netdev);
> return; /* Do not attempt to reinit. It's dead, Jim. */
> }
>
> @@ -3179,7 +3151,6 @@ static void iavf_reset_task(struct work_struct *work)
> iavf_startup(adapter);
> queue_delayed_work(adapter->wq, &adapter->watchdog_task,
> msecs_to_jiffies(30));
> - netdev_unlock(netdev);
> return;
> }
>
> @@ -3321,7 +3292,6 @@ static void iavf_reset_task(struct work_struct *work)
> adapter->flags &= ~IAVF_FLAG_REINIT_ITR_NEEDED;
>
> wake_up(&adapter->reset_waitqueue);
> - netdev_unlock(netdev);
>
> return;
> reset_err:
> @@ -3331,10 +3301,21 @@ static void iavf_reset_task(struct work_struct *work)
> }
> iavf_disable_vf(adapter);
>
> - netdev_unlock(netdev);
> dev_err(&adapter->pdev->dev, "failed to allocate resources during reinit\n");
> }
>
> +static void iavf_reset_task(struct work_struct *work)
> +{
> + struct iavf_adapter *adapter = container_of(work,
> + struct iavf_adapter,
> + reset_task);
> + struct net_device *netdev = adapter->netdev;
> +
> + netdev_lock(netdev);
> + iavf_reset_step(adapter);
> + netdev_unlock(netdev);
> +}
> +
> /**
> * iavf_adminq_task - worker thread to clean the admin queue
> * @work: pointer to work_struct containing our data
> @@ -4600,22 +4581,17 @@ static int iavf_close(struct net_device *netdev)
> static int iavf_change_mtu(struct net_device *netdev, int new_mtu)
> {
> struct iavf_adapter *adapter = netdev_priv(netdev);
> - int ret = 0;
>
> netdev_dbg(netdev, "changing MTU from %d to %d\n",
> netdev->mtu, new_mtu);
> WRITE_ONCE(netdev->mtu, new_mtu);
>
> if (netif_running(netdev)) {
> - iavf_schedule_reset(adapter, IAVF_FLAG_RESET_NEEDED);
> - ret = iavf_wait_for_reset(adapter);
> - if (ret < 0)
> - netdev_warn(netdev, "MTU change interrupted waiting for reset");
> - else if (ret)
> - netdev_warn(netdev, "MTU change timed out waiting for reset");
> + adapter->flags |= IAVF_FLAG_RESET_NEEDED;
> + iavf_reset_step(adapter);
> }
>
> - return ret;
> + return 0;
> }
>
> /**
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net v2] iavf: fix deadlock in reset handling
2026-02-07 10:22 [PATCH net v2] iavf: fix deadlock in reset handling Petr Oros
` (2 preceding siblings ...)
2026-02-10 14:58 ` Przemek Kitszel
@ 2026-02-11 11:50 ` Paolo Abeni
2026-02-11 18:50 ` Jacob Keller
2026-02-11 19:18 ` [PATCH net v3] iavf: fix incorrect reset handling in callbacks Petr Oros
4 siblings, 1 reply; 9+ messages in thread
From: Paolo Abeni @ 2026-02-11 11:50 UTC (permalink / raw)
To: Petr Oros, netdev
Cc: ivecera, aleksandr.loktionov, shaojijie, Jacob Keller,
Tony Nguyen, Przemek Kitszel, Andrew Lunn, David S. Miller,
Eric Dumazet, Jakub Kicinski, Stanislav Fomichev, intel-wired-lan,
linux-kernel
On 2/7/26 11:22 AM, Petr Oros wrote:
> -/**
> - * iavf_wait_for_reset - Wait for reset to finish.
> - * @adapter: board private structure
> - *
> - * Returns 0 if reset finished successfully, negative on timeout or interrupt.
> - */
> -int iavf_wait_for_reset(struct iavf_adapter *adapter)
> -{
> - int ret = wait_event_interruptible_timeout(adapter->reset_waitqueue,
> - !iavf_is_reset_in_progress(adapter),
> - msecs_to_jiffies(5000));
AFAICS, after this change nobody waits anymore on reset_waitqueue, do we
still need such wait queue head? I think a bunch of additional code
could be dropped.
Thanks,
Paolo
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net v2] iavf: fix deadlock in reset handling
2026-02-11 11:50 ` Paolo Abeni
@ 2026-02-11 18:50 ` Jacob Keller
0 siblings, 0 replies; 9+ messages in thread
From: Jacob Keller @ 2026-02-11 18:50 UTC (permalink / raw)
To: Paolo Abeni, Petr Oros, netdev
Cc: ivecera, aleksandr.loktionov, shaojijie, Tony Nguyen,
Przemek Kitszel, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Stanislav Fomichev, intel-wired-lan, linux-kernel
On 2/11/2026 3:50 AM, Paolo Abeni wrote:
> On 2/7/26 11:22 AM, Petr Oros wrote:
>> -/**
>> - * iavf_wait_for_reset - Wait for reset to finish.
>> - * @adapter: board private structure
>> - *
>> - * Returns 0 if reset finished successfully, negative on timeout or interrupt.
>> - */
>> -int iavf_wait_for_reset(struct iavf_adapter *adapter)
>> -{
>> - int ret = wait_event_interruptible_timeout(adapter->reset_waitqueue,
>> - !iavf_is_reset_in_progress(adapter),
>> - msecs_to_jiffies(5000));
>
> AFAICS, after this change nobody waits anymore on reset_waitqueue, do we
> still need such wait queue head? I think a bunch of additional code
> could be dropped.
>
> Thanks,
>
> Paolo
>
Yea, that seems correct. We could do that cleanup through next instead
if we want to keep this patch smaller and focused on the fix. I don't
mind either way.
Thanks,
Jake
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH net v3] iavf: fix incorrect reset handling in callbacks
2026-02-07 10:22 [PATCH net v2] iavf: fix deadlock in reset handling Petr Oros
` (3 preceding siblings ...)
2026-02-11 11:50 ` Paolo Abeni
@ 2026-02-11 19:18 ` Petr Oros
2026-02-13 10:29 ` Przemek Kitszel
2026-02-23 16:10 ` [Intel-wired-lan] " Romanowski, Rafal
4 siblings, 2 replies; 9+ messages in thread
From: Petr Oros @ 2026-02-11 19:18 UTC (permalink / raw)
To: netdev
Cc: ivecera, aleksandr.loktionov, shaojijie, Petr Oros, Jacob Keller,
Tony Nguyen, Przemek Kitszel, Andrew Lunn, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Stanislav Fomichev,
intel-wired-lan, linux-kernel
Three driver callbacks schedule a reset and wait for its completion:
ndo_change_mtu(), ethtool set_ringparam(), and ethtool set_channels().
Waiting for reset in ndo_change_mtu() and set_ringparam() was added by
commit c2ed2403f12c ("iavf: Wait for reset in callbacks which trigger
it") to fix a race condition where adding an interface to bonding
immediately after MTU or ring parameter change failed because the
interface was still in __RESETTING state. The same commit also added
waiting in iavf_set_priv_flags(), which was later removed by commit
53844673d555 ("iavf: kill "legacy-rx" for good").
Waiting in set_channels() was introduced earlier by commit 4e5e6b5d9d13
("iavf: Fix return of set the new channel count") to ensure the PF has
enough time to complete the VF reset when changing channel count, and to
return correct error codes to userspace.
Commit ef490bbb2267 ("iavf: Add net_shaper_ops support") added
net_shaper_ops to iavf, which required reset_task to use _locked NAPI
variants (napi_enable_locked, napi_disable_locked) that need the netdev
instance lock.
Later, commit 7e4d784f5810 ("net: hold netdev instance lock during
rtnetlink operations") and commit 2bcf4772e45a ("net: ethtool: try to
protect all callback with netdev instance lock") started holding the
netdev instance lock during ndo and ethtool callbacks for drivers with
net_shaper_ops.
Finally, commit 120f28a6f314 ("iavf: get rid of the crit lock")
replaced the driver's crit_lock with netdev_lock in reset_task, causing
incorrect behavior: the callback holds netdev_lock and waits for
reset_task, but reset_task needs the same lock:
Thread 1 (callback) Thread 2 (reset_task)
------------------- ---------------------
netdev_lock() [blocked on workqueue]
ndo_change_mtu() or ethtool op
iavf_schedule_reset()
iavf_wait_for_reset() iavf_reset_task()
waiting... netdev_lock() <- blocked
This does not strictly deadlock because iavf_wait_for_reset() uses
wait_event_interruptible_timeout() with a 5-second timeout. The wait
eventually times out, the callback returns an error to userspace, and
after the lock is released reset_task completes the reset. This leads to
incorrect behavior: userspace sees an error even though the configuration
change silently takes effect after the timeout.
Fix this by extracting the reset logic from iavf_reset_task() into a new
iavf_reset_step() function that expects netdev_lock to be already held.
The three callbacks now call iavf_reset_step() directly instead of
scheduling the work and waiting, performing the reset synchronously in
the caller's context which already holds netdev_lock. This eliminates
both the incorrect error reporting and the need for
iavf_wait_for_reset(), which is removed along with the now-unused
reset_waitqueue.
The workqueue-based iavf_reset_task() becomes a thin wrapper that
acquires netdev_lock and calls iavf_reset_step(), preserving its use
for PF-initiated resets.
The callbacks may block for several seconds while iavf_reset_step()
polls hardware registers, but this is acceptable since netdev_lock is a
per-device mutex and only serializes operations on the same interface.
v3:
- Remove netif_running() guard from iavf_set_channels(). Unlike
set_ringparam where descriptor counts are picked up by iavf_open()
directly, num_req_queues is only consumed during
iavf_reinit_interrupt_scheme() in the reset path. Skipping the reset
on a down device would silently discard the channel count change.
- Remove dead reset_waitqueue code (struct field, init, and all
wake_up calls) since iavf_wait_for_reset() was the only consumer.
Fixes: 120f28a6f314 ("iavf: get rid of the crit lock")
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Petr Oros <poros@redhat.com>
---
drivers/net/ethernet/intel/iavf/iavf.h | 3 +-
.../net/ethernet/intel/iavf/iavf_ethtool.c | 19 ++---
drivers/net/ethernet/intel/iavf/iavf_main.c | 77 ++++++-------------
.../net/ethernet/intel/iavf/iavf_virtchnl.c | 1 -
4 files changed, 31 insertions(+), 69 deletions(-)
diff --git a/drivers/net/ethernet/intel/iavf/iavf.h b/drivers/net/ethernet/intel/iavf/iavf.h
index a87e0c6d4017ad..e9fb0a0919e376 100644
--- a/drivers/net/ethernet/intel/iavf/iavf.h
+++ b/drivers/net/ethernet/intel/iavf/iavf.h
@@ -260,7 +260,6 @@ struct iavf_adapter {
struct work_struct adminq_task;
struct work_struct finish_config;
wait_queue_head_t down_waitqueue;
- wait_queue_head_t reset_waitqueue;
wait_queue_head_t vc_waitqueue;
struct iavf_q_vector *q_vectors;
struct list_head vlan_filter_list;
@@ -626,5 +625,5 @@ void iavf_add_adv_rss_cfg(struct iavf_adapter *adapter);
void iavf_del_adv_rss_cfg(struct iavf_adapter *adapter);
struct iavf_mac_filter *iavf_add_filter(struct iavf_adapter *adapter,
const u8 *macaddr);
-int iavf_wait_for_reset(struct iavf_adapter *adapter);
+void iavf_reset_step(struct iavf_adapter *adapter);
#endif /* _IAVF_H_ */
diff --git a/drivers/net/ethernet/intel/iavf/iavf_ethtool.c b/drivers/net/ethernet/intel/iavf/iavf_ethtool.c
index 2cc21289a70779..6ff3842a1ff1f0 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_ethtool.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_ethtool.c
@@ -492,7 +492,6 @@ static int iavf_set_ringparam(struct net_device *netdev,
{
struct iavf_adapter *adapter = netdev_priv(netdev);
u32 new_rx_count, new_tx_count;
- int ret = 0;
if ((ring->rx_mini_pending) || (ring->rx_jumbo_pending))
return -EINVAL;
@@ -537,13 +536,11 @@ static int iavf_set_ringparam(struct net_device *netdev,
}
if (netif_running(netdev)) {
- iavf_schedule_reset(adapter, IAVF_FLAG_RESET_NEEDED);
- ret = iavf_wait_for_reset(adapter);
- if (ret)
- netdev_warn(netdev, "Changing ring parameters timeout or interrupted waiting for reset");
+ adapter->flags |= IAVF_FLAG_RESET_NEEDED;
+ iavf_reset_step(adapter);
}
- return ret;
+ return 0;
}
/**
@@ -1723,7 +1720,6 @@ static int iavf_set_channels(struct net_device *netdev,
{
struct iavf_adapter *adapter = netdev_priv(netdev);
u32 num_req = ch->combined_count;
- int ret = 0;
if ((adapter->vf_res->vf_cap_flags & VIRTCHNL_VF_OFFLOAD_ADQ) &&
adapter->num_tc) {
@@ -1745,13 +1741,10 @@ static int iavf_set_channels(struct net_device *netdev,
adapter->num_req_queues = num_req;
adapter->flags |= IAVF_FLAG_REINIT_ITR_NEEDED;
- iavf_schedule_reset(adapter, IAVF_FLAG_RESET_NEEDED);
+ adapter->flags |= IAVF_FLAG_RESET_NEEDED;
+ iavf_reset_step(adapter);
- ret = iavf_wait_for_reset(adapter);
- if (ret)
- netdev_warn(netdev, "Changing channel count timeout or interrupted waiting for reset");
-
- return ret;
+ return 0;
}
/**
diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index 4b0fc8f354bc90..9fd414800bca7a 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -185,31 +185,6 @@ static bool iavf_is_reset_in_progress(struct iavf_adapter *adapter)
return false;
}
-/**
- * iavf_wait_for_reset - Wait for reset to finish.
- * @adapter: board private structure
- *
- * Returns 0 if reset finished successfully, negative on timeout or interrupt.
- */
-int iavf_wait_for_reset(struct iavf_adapter *adapter)
-{
- int ret = wait_event_interruptible_timeout(adapter->reset_waitqueue,
- !iavf_is_reset_in_progress(adapter),
- msecs_to_jiffies(5000));
-
- /* If ret < 0 then it means wait was interrupted.
- * If ret == 0 then it means we got a timeout while waiting
- * for reset to finish.
- * If ret > 0 it means reset has finished.
- */
- if (ret > 0)
- return 0;
- else if (ret < 0)
- return -EINTR;
- else
- return -EBUSY;
-}
-
/**
* iavf_allocate_dma_mem_d - OS specific memory alloc for shared code
* @hw: pointer to the HW structure
@@ -3100,18 +3075,16 @@ static void iavf_reconfig_qs_bw(struct iavf_adapter *adapter)
}
/**
- * iavf_reset_task - Call-back task to handle hardware reset
- * @work: pointer to work_struct
+ * iavf_reset_step - Perform the VF reset sequence
+ * @adapter: board private structure
*
- * During reset we need to shut down and reinitialize the admin queue
- * before we can use it to communicate with the PF again. We also clear
- * and reinit the rings because that context is lost as well.
- **/
-static void iavf_reset_task(struct work_struct *work)
+ * Requests a reset from PF, polls for completion, and reconfigures
+ * the driver. Caller must hold the netdev instance lock.
+ *
+ * This can sleep for several seconds while polling HW registers.
+ */
+void iavf_reset_step(struct iavf_adapter *adapter)
{
- struct iavf_adapter *adapter = container_of(work,
- struct iavf_adapter,
- reset_task);
struct virtchnl_vf_resource *vfres = adapter->vf_res;
struct net_device *netdev = adapter->netdev;
struct iavf_hw *hw = &adapter->hw;
@@ -3122,7 +3095,7 @@ static void iavf_reset_task(struct work_struct *work)
int i = 0, err;
bool running;
- netdev_lock(netdev);
+ netdev_assert_locked(netdev);
iavf_misc_irq_disable(adapter);
if (adapter->flags & IAVF_FLAG_RESET_NEEDED) {
@@ -3167,7 +3140,6 @@ static void iavf_reset_task(struct work_struct *work)
dev_err(&adapter->pdev->dev, "Reset never finished (%x)\n",
reg_val);
iavf_disable_vf(adapter);
- netdev_unlock(netdev);
return; /* Do not attempt to reinit. It's dead, Jim. */
}
@@ -3179,7 +3151,6 @@ static void iavf_reset_task(struct work_struct *work)
iavf_startup(adapter);
queue_delayed_work(adapter->wq, &adapter->watchdog_task,
msecs_to_jiffies(30));
- netdev_unlock(netdev);
return;
}
@@ -3320,9 +3291,6 @@ static void iavf_reset_task(struct work_struct *work)
adapter->flags &= ~IAVF_FLAG_REINIT_ITR_NEEDED;
- wake_up(&adapter->reset_waitqueue);
- netdev_unlock(netdev);
-
return;
reset_err:
if (running) {
@@ -3331,10 +3299,21 @@ static void iavf_reset_task(struct work_struct *work)
}
iavf_disable_vf(adapter);
- netdev_unlock(netdev);
dev_err(&adapter->pdev->dev, "failed to allocate resources during reinit\n");
}
+static void iavf_reset_task(struct work_struct *work)
+{
+ struct iavf_adapter *adapter = container_of(work,
+ struct iavf_adapter,
+ reset_task);
+ struct net_device *netdev = adapter->netdev;
+
+ netdev_lock(netdev);
+ iavf_reset_step(adapter);
+ netdev_unlock(netdev);
+}
+
/**
* iavf_adminq_task - worker thread to clean the admin queue
* @work: pointer to work_struct containing our data
@@ -4600,22 +4579,17 @@ static int iavf_close(struct net_device *netdev)
static int iavf_change_mtu(struct net_device *netdev, int new_mtu)
{
struct iavf_adapter *adapter = netdev_priv(netdev);
- int ret = 0;
netdev_dbg(netdev, "changing MTU from %d to %d\n",
netdev->mtu, new_mtu);
WRITE_ONCE(netdev->mtu, new_mtu);
if (netif_running(netdev)) {
- iavf_schedule_reset(adapter, IAVF_FLAG_RESET_NEEDED);
- ret = iavf_wait_for_reset(adapter);
- if (ret < 0)
- netdev_warn(netdev, "MTU change interrupted waiting for reset");
- else if (ret)
- netdev_warn(netdev, "MTU change timed out waiting for reset");
+ adapter->flags |= IAVF_FLAG_RESET_NEEDED;
+ iavf_reset_step(adapter);
}
- return ret;
+ return 0;
}
/**
@@ -5420,9 +5394,6 @@ static int iavf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
/* Setup the wait queue for indicating transition to down status */
init_waitqueue_head(&adapter->down_waitqueue);
- /* Setup the wait queue for indicating transition to running state */
- init_waitqueue_head(&adapter->reset_waitqueue);
-
/* Setup the wait queue for indicating virtchannel events */
init_waitqueue_head(&adapter->vc_waitqueue);
diff --git a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
index 88156082a41da6..a52c100dcbc56d 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
@@ -2736,7 +2736,6 @@ void iavf_virtchnl_completion(struct iavf_adapter *adapter,
case VIRTCHNL_OP_ENABLE_QUEUES:
/* enable transmits */
iavf_irq_enable(adapter, true);
- wake_up(&adapter->reset_waitqueue);
adapter->flags &= ~IAVF_FLAG_QUEUES_DISABLED;
break;
case VIRTCHNL_OP_DISABLE_QUEUES:
--
2.52.0
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH net v3] iavf: fix incorrect reset handling in callbacks
2026-02-11 19:18 ` [PATCH net v3] iavf: fix incorrect reset handling in callbacks Petr Oros
@ 2026-02-13 10:29 ` Przemek Kitszel
2026-02-23 16:10 ` [Intel-wired-lan] " Romanowski, Rafal
1 sibling, 0 replies; 9+ messages in thread
From: Przemek Kitszel @ 2026-02-13 10:29 UTC (permalink / raw)
To: Petr Oros, netdev
Cc: ivecera, aleksandr.loktionov, shaojijie, Jacob Keller,
Tony Nguyen, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Stanislav Fomichev, intel-wired-lan,
linux-kernel
On 2/11/26 20:18, Petr Oros wrote:
> Three driver callbacks schedule a reset and wait for its completion:
> ndo_change_mtu(), ethtool set_ringparam(), and ethtool set_channels().
>
> Waiting for reset in ndo_change_mtu() and set_ringparam() was added by
> commit c2ed2403f12c ("iavf: Wait for reset in callbacks which trigger
> it") to fix a race condition where adding an interface to bonding
> immediately after MTU or ring parameter change failed because the
> interface was still in __RESETTING state. The same commit also added
> waiting in iavf_set_priv_flags(), which was later removed by commit
> 53844673d555 ("iavf: kill "legacy-rx" for good").
>
> Waiting in set_channels() was introduced earlier by commit 4e5e6b5d9d13
> ("iavf: Fix return of set the new channel count") to ensure the PF has
> enough time to complete the VF reset when changing channel count, and to
> return correct error codes to userspace.
>
> Commit ef490bbb2267 ("iavf: Add net_shaper_ops support") added
> net_shaper_ops to iavf, which required reset_task to use _locked NAPI
> variants (napi_enable_locked, napi_disable_locked) that need the netdev
> instance lock.
>
> Later, commit 7e4d784f5810 ("net: hold netdev instance lock during
> rtnetlink operations") and commit 2bcf4772e45a ("net: ethtool: try to
> protect all callback with netdev instance lock") started holding the
> netdev instance lock during ndo and ethtool callbacks for drivers with
> net_shaper_ops.
>
> Finally, commit 120f28a6f314 ("iavf: get rid of the crit lock")
> replaced the driver's crit_lock with netdev_lock in reset_task, causing
> incorrect behavior: the callback holds netdev_lock and waits for
> reset_task, but reset_task needs the same lock:
>
> Thread 1 (callback) Thread 2 (reset_task)
> ------------------- ---------------------
> netdev_lock() [blocked on workqueue]
> ndo_change_mtu() or ethtool op
> iavf_schedule_reset()
> iavf_wait_for_reset() iavf_reset_task()
> waiting... netdev_lock() <- blocked
>
> This does not strictly deadlock because iavf_wait_for_reset() uses
> wait_event_interruptible_timeout() with a 5-second timeout. The wait
> eventually times out, the callback returns an error to userspace, and
> after the lock is released reset_task completes the reset. This leads to
> incorrect behavior: userspace sees an error even though the configuration
> change silently takes effect after the timeout.
>
> Fix this by extracting the reset logic from iavf_reset_task() into a new
> iavf_reset_step() function that expects netdev_lock to be already held.
> The three callbacks now call iavf_reset_step() directly instead of
> scheduling the work and waiting, performing the reset synchronously in
> the caller's context which already holds netdev_lock. This eliminates
> both the incorrect error reporting and the need for
> iavf_wait_for_reset(), which is removed along with the now-unused
> reset_waitqueue.
>
> The workqueue-based iavf_reset_task() becomes a thin wrapper that
> acquires netdev_lock and calls iavf_reset_step(), preserving its use
> for PF-initiated resets.
>
> The callbacks may block for several seconds while iavf_reset_step()
> polls hardware registers, but this is acceptable since netdev_lock is a
> per-device mutex and only serializes operations on the same interface.
>
> v3:
> - Remove netif_running() guard from iavf_set_channels(). Unlike
> set_ringparam where descriptor counts are picked up by iavf_open()
> directly, num_req_queues is only consumed during
> iavf_reinit_interrupt_scheme() in the reset path. Skipping the reset
> on a down device would silently discard the channel count change.
> - Remove dead reset_waitqueue code (struct field, init, and all
> wake_up calls) since iavf_wait_for_reset() was the only consumer.
>
> Fixes: 120f28a6f314 ("iavf: get rid of the crit lock")
> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
> Signed-off-by: Petr Oros <poros@redhat.com>
> ---
> drivers/net/ethernet/intel/iavf/iavf.h | 3 +-
> .../net/ethernet/intel/iavf/iavf_ethtool.c | 19 ++---
> drivers/net/ethernet/intel/iavf/iavf_main.c | 77 ++++++-------------
> .../net/ethernet/intel/iavf/iavf_virtchnl.c | 1 -
> 4 files changed, 31 insertions(+), 69 deletions(-)
>
thank you for improving on your v2,
next time please post as top-level (instead of In-reply-to), to get more
traction earlier
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
^ permalink raw reply [flat|nested] 9+ messages in thread
* RE: [Intel-wired-lan] [PATCH net v3] iavf: fix incorrect reset handling in callbacks
2026-02-11 19:18 ` [PATCH net v3] iavf: fix incorrect reset handling in callbacks Petr Oros
2026-02-13 10:29 ` Przemek Kitszel
@ 2026-02-23 16:10 ` Romanowski, Rafal
1 sibling, 0 replies; 9+ messages in thread
From: Romanowski, Rafal @ 2026-02-23 16:10 UTC (permalink / raw)
To: Oros, Petr, netdev@vger.kernel.org
Cc: Vecera, Ivan, shaojijie@huawei.com, Kitszel, Przemyslaw,
Eric Dumazet, linux-kernel@vger.kernel.org, Loktionov, Aleksandr,
Andrew Lunn, Stanislav Fomichev, Nguyen, Anthony L,
intel-wired-lan@lists.osuosl.org, Keller, Jacob E, Jakub Kicinski,
Paolo Abeni, David S. Miller
> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of Petr
> Oros
> Sent: Wednesday, February 11, 2026 20:19
> To: netdev@vger.kernel.org
> Cc: Vecera, Ivan <ivecera@redhat.com>; shaojijie@huawei.com; Kitszel,
> Przemyslaw <przemyslaw.kitszel@intel.com>; Eric Dumazet
> <edumazet@google.com>; linux-kernel@vger.kernel.org; Loktionov, Aleksandr
> <aleksandr.loktionov@intel.com>; Andrew Lunn <andrew+netdev@lunn.ch>;
> Stanislav Fomichev <sdf@fomichev.me>; Nguyen, Anthony L
> <anthony.l.nguyen@intel.com>; intel-wired-lan@lists.osuosl.org; Keller, Jacob E
> <jacob.e.keller@intel.com>; Jakub Kicinski <kuba@kernel.org>; Paolo Abeni
> <pabeni@redhat.com>; David S. Miller <davem@davemloft.net>
> Subject: [Intel-wired-lan] [PATCH net v3] iavf: fix incorrect reset handling in
> callbacks
>
> Three driver callbacks schedule a reset and wait for its completion:
> ndo_change_mtu(), ethtool set_ringparam(), and ethtool set_channels().
>
> Waiting for reset in ndo_change_mtu() and set_ringparam() was added by commit
> c2ed2403f12c ("iavf: Wait for reset in callbacks which trigger
> it") to fix a race condition where adding an interface to bonding immediately after
> MTU or ring parameter change failed because the interface was still in
> __RESETTING state. The same commit also added waiting in iavf_set_priv_flags(),
> which was later removed by commit
> 53844673d555 ("iavf: kill "legacy-rx" for good").
>
> Waiting in set_channels() was introduced earlier by commit 4e5e6b5d9d13
> ("iavf: Fix return of set the new channel count") to ensure the PF has enough time
> to complete the VF reset when changing channel count, and to return correct
> error codes to userspace.
>
> Commit ef490bbb2267 ("iavf: Add net_shaper_ops support") added
> net_shaper_ops to iavf, which required reset_task to use _locked NAPI variants
> (napi_enable_locked, napi_disable_locked) that need the netdev instance lock.
>
> Later, commit 7e4d784f5810 ("net: hold netdev instance lock during rtnetlink
> operations") and commit 2bcf4772e45a ("net: ethtool: try to protect all callback
> with netdev instance lock") started holding the netdev instance lock during ndo
> and ethtool callbacks for drivers with net_shaper_ops.
>
> Finally, commit 120f28a6f314 ("iavf: get rid of the crit lock") replaced the driver's
> crit_lock with netdev_lock in reset_task, causing incorrect behavior: the callback
> holds netdev_lock and waits for reset_task, but reset_task needs the same lock:
>
> Thread 1 (callback) Thread 2 (reset_task)
> ------------------- ---------------------
> netdev_lock() [blocked on workqueue]
> ndo_change_mtu() or ethtool op
> iavf_schedule_reset()
> iavf_wait_for_reset() iavf_reset_task()
> waiting... netdev_lock() <- blocked
>
> This does not strictly deadlock because iavf_wait_for_reset() uses
> wait_event_interruptible_timeout() with a 5-second timeout. The wait eventually
> times out, the callback returns an error to userspace, and after the lock is
> released reset_task completes the reset. This leads to incorrect behavior:
> userspace sees an error even though the configuration change silently takes
> effect after the timeout.
>
> Fix this by extracting the reset logic from iavf_reset_task() into a new
> iavf_reset_step() function that expects netdev_lock to be already held.
> The three callbacks now call iavf_reset_step() directly instead of scheduling the
> work and waiting, performing the reset synchronously in the caller's context
> which already holds netdev_lock. This eliminates both the incorrect error
> reporting and the need for iavf_wait_for_reset(), which is removed along with the
> now-unused reset_waitqueue.
>
> The workqueue-based iavf_reset_task() becomes a thin wrapper that acquires
> netdev_lock and calls iavf_reset_step(), preserving its use for PF-initiated resets.
>
> The callbacks may block for several seconds while iavf_reset_step() polls
> hardware registers, but this is acceptable since netdev_lock is a per-device
> mutex and only serializes operations on the same interface.
>
> v3:
> - Remove netif_running() guard from iavf_set_channels(). Unlike
> set_ringparam where descriptor counts are picked up by iavf_open()
> directly, num_req_queues is only consumed during
> iavf_reinit_interrupt_scheme() in the reset path. Skipping the reset
> on a down device would silently discard the channel count change.
> - Remove dead reset_waitqueue code (struct field, init, and all
> wake_up calls) since iavf_wait_for_reset() was the only consumer.
>
> Fixes: 120f28a6f314 ("iavf: get rid of the crit lock")
> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
> Signed-off-by: Petr Oros <poros@redhat.com>
> ---
> drivers/net/ethernet/intel/iavf/iavf.h | 3 +-
> .../net/ethernet/intel/iavf/iavf_ethtool.c | 19 ++---
> drivers/net/ethernet/intel/iavf/iavf_main.c | 77 ++++++-------------
> .../net/ethernet/intel/iavf/iavf_virtchnl.c | 1 -
> 4 files changed, 31 insertions(+), 69 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/iavf/iavf.h
> b/drivers/net/ethernet/intel/iavf/iavf.h
> index a87e0c6d4017ad..e9fb0a0919e376 100644
> --- a/drivers/net/ethernet/intel/iavf/iavf.h
> +++ b/drivers/net/ethernet/intel/iavf/iavf.h
> @@ -260,7 +260,6 @@ struct iavf_adapter {
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2026-02-23 16:11 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-02-07 10:22 [PATCH net v2] iavf: fix deadlock in reset handling Petr Oros
2026-02-09 7:15 ` Loktionov, Aleksandr
2026-02-09 23:59 ` Jacob Keller
2026-02-10 14:58 ` Przemek Kitszel
2026-02-11 11:50 ` Paolo Abeni
2026-02-11 18:50 ` Jacob Keller
2026-02-11 19:18 ` [PATCH net v3] iavf: fix incorrect reset handling in callbacks Petr Oros
2026-02-13 10:29 ` Przemek Kitszel
2026-02-23 16:10 ` [Intel-wired-lan] " Romanowski, Rafal
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox