* iAVF circular lock dependency due to netdev_lock
@ 2025-02-05 23:20 Jacob Keller
2025-02-06 1:23 ` Jakub Kicinski
0 siblings, 1 reply; 5+ messages in thread
From: Jacob Keller @ 2025-02-05 23:20 UTC (permalink / raw)
To: Jakub Kicinski, netdev@vger.kernel.org
Hi,
I just discovered that iAVF now has a circular lock dependency on its
critical lock and the netdev lock:
> [ 1504.790308] ======================================================
> [ 1504.790309] WARNING: possible circular locking dependency detected
> [ 1504.790310] 6.13.0 #net_next_rt.c2933b2befe2.el9 Not tainted
> [ 1504.790311] ------------------------------------------------------
> [ 1504.790312] kworker/u128:0/13566 is trying to acquire lock:
> [ 1504.790313] ffff97d0e4738f18 (&dev->lock){+.+.}-{4:4}, at: register_netdevice+0x52c/0x710
> [ 1504.790320]
> [ 1504.790320] but task is already holding lock:
> [ 1504.790321] ffff97d0e47392e8 (&adapter->crit_lock){+.+.}-{4:4}, at: iavf_finish_config+0x37/0x240 [iavf]
> [ 1504.790330]
> [ 1504.790330] which lock already depends on the new lock.
> [ 1504.790330]
> [ 1504.790330]
> [ 1504.790330] the existing dependency chain (in reverse order) is:
> [ 1504.790331]
> [ 1504.790331] -> #1 (&adapter->crit_lock){+.+.}-{4:4}:
> [ 1504.790333] __lock_acquire+0x52d/0xbb0
> [ 1504.790337] lock_acquire+0xd9/0x330
> [ 1504.790338] mutex_lock_nested+0x4b/0xb0
> [ 1504.790341] iavf_finish_config+0x37/0x240 [iavf]
> [ 1504.790347] process_one_work+0x248/0x6d0
> [ 1504.790350] worker_thread+0x18d/0x330
> [ 1504.790352] kthread+0x10e/0x250
> [ 1504.790354] ret_from_fork+0x30/0x50
> [ 1504.790357] ret_from_fork_asm+0x1a/0x30
> [ 1504.790361]
> [ 1504.790361] -> #0 (&dev->lock){+.+.}-{4:4}:
> [ 1504.790364] check_prev_add+0xf1/0xce0
> [ 1504.790366] validate_chain+0x46a/0x570
> [ 1504.790368] __lock_acquire+0x52d/0xbb0
> [ 1504.790370] lock_acquire+0xd9/0x330
> [ 1504.790371] mutex_lock_nested+0x4b/0xb0
> [ 1504.790372] register_netdevice+0x52c/0x710
> [ 1504.790374] iavf_finish_config+0xfa/0x240 [iavf]
> [ 1504.790379] process_one_work+0x248/0x6d0
> [ 1504.790381] worker_thread+0x18d/0x330
> [ 1504.790383] kthread+0x10e/0x250
> [ 1504.790385] ret_from_fork+0x30/0x50
> [ 1504.790387] ret_from_fork_asm+0x1a/0x30
> [ 1504.790389]
> [ 1504.790389] other info that might help us debug this:
> [ 1504.790389]
> [ 1504.790389] Possible unsafe locking scenario:
> [ 1504.790389]
> [ 1504.790390] CPU0 CPU1
> [ 1504.790391] ---- ----
> [ 1504.790391] lock(&adapter->crit_lock);
> [ 1504.790393] lock(&dev->lock);
> [ 1504.790394] lock(&adapter->crit_lock);
> [ 1504.790395] lock(&dev->lock);
> [ 1504.790397]
> [ 1504.790397] *** DEADLOCK ***
> [ 1504.790397]
> [ 1504.790397] 4 locks held by kworker/u128:0/13566:
> [ 1504.790399] #0: ffff97d1924d4d48 ((wq_completion)iavf){+.+.}-{0:0}, at: process_one_work+0x49b/0x6d0
> [ 1504.790404] #1: ffffa848c3d9be40 ((work_completion)(&adapter->finish_config)){+.+.}-{0:0}, at: process_one_work+0x1f3/0x6d0
> [ 1504.790408] #2: ffffffffb3e1bf40 (rtnl_mutex){+.+.}-{4:4}, at: iavf_finish_config+0x18/0x240 [iavf]
> [ 1504.790416] #3: ffff97d0e47392e8 (&adapter->crit_lock){+.+.}-{4:4}, at: iavf_finish_config+0x37/0x240 [iavf]
This happens because the driver takes netdev_lock prior to acquiring its
own adapter->crit_lock, but then it calls register_netdevice under the
crit_lock. Since commit 5fda3f35349b ("net: make netdev_lock() protect
netdev->reg_state"), the register_netdevice() function now acquires
netdev_lock as part of its flow.
I can fix this by refactoring iavf to only take netdev_lock after
acquiring its own crit_lock.. but that smells funny. It seems like a
future change could require to take netdev_lock before calling into the
driver routines somehow, making that ordering problematic.
I'm not sure how else to fix this... I briefly considered just removing
crit_lock and relying solely on netdev_lock for synchronization, but
that doesn't work because of the register_netdevice() taking the lock.
I guess I could do some funky stuff with unlocking but that seems ugly
as well...
I'm not sure what we should do to fix this.
^ permalink raw reply [flat|nested] 5+ messages in thread* Re: iAVF circular lock dependency due to netdev_lock 2025-02-05 23:20 iAVF circular lock dependency due to netdev_lock Jacob Keller @ 2025-02-06 1:23 ` Jakub Kicinski 2025-02-06 2:27 ` Jacob Keller 0 siblings, 1 reply; 5+ messages in thread From: Jakub Kicinski @ 2025-02-06 1:23 UTC (permalink / raw) To: Jacob Keller; +Cc: netdev@vger.kernel.org On Wed, 5 Feb 2025 15:20:07 -0800 Jacob Keller wrote: > This happens because the driver takes netdev_lock prior to acquiring its > own adapter->crit_lock, but then it calls register_netdevice under the > crit_lock. Since commit 5fda3f35349b ("net: make netdev_lock() protect > netdev->reg_state"), the register_netdevice() function now acquires > netdev_lock as part of its flow. > > I can fix this by refactoring iavf to only take netdev_lock after > acquiring its own crit_lock.. but that smells funny. It seems like a > future change could require to take netdev_lock before calling into the > driver routines somehow, making that ordering problematic. > > I'm not sure how else to fix this... I briefly considered just removing > crit_lock and relying solely on netdev_lock for synchronization, but > that doesn't work because of the register_netdevice() taking the lock. > > I guess I could do some funky stuff with unlocking but that seems ugly > as well... > > I'm not sure what we should do to fix this. Not sure either, the locking in this driver is quite odd. Do you know why it's registering the netdev from a workqueue, and what the entry points to the driver are? Normally before the netdev is registered it can't get called, so all the locking is moot. But IDK if we need to protect from some FW interactions, maybe? ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: iAVF circular lock dependency due to netdev_lock 2025-02-06 1:23 ` Jakub Kicinski @ 2025-02-06 2:27 ` Jacob Keller 2025-02-06 4:32 ` Jakub Kicinski 0 siblings, 1 reply; 5+ messages in thread From: Jacob Keller @ 2025-02-06 2:27 UTC (permalink / raw) To: Jakub Kicinski; +Cc: netdev@vger.kernel.org On 2/5/2025 5:23 PM, Jakub Kicinski wrote: > On Wed, 5 Feb 2025 15:20:07 -0800 Jacob Keller wrote: >> This happens because the driver takes netdev_lock prior to acquiring its >> own adapter->crit_lock, but then it calls register_netdevice under the >> crit_lock. Since commit 5fda3f35349b ("net: make netdev_lock() protect >> netdev->reg_state"), the register_netdevice() function now acquires >> netdev_lock as part of its flow. >> >> I can fix this by refactoring iavf to only take netdev_lock after >> acquiring its own crit_lock.. but that smells funny. It seems like a >> future change could require to take netdev_lock before calling into the >> driver routines somehow, making that ordering problematic. >> >> I'm not sure how else to fix this... I briefly considered just removing >> crit_lock and relying solely on netdev_lock for synchronization, but >> that doesn't work because of the register_netdevice() taking the lock. >> >> I guess I could do some funky stuff with unlocking but that seems ugly >> as well... >> >> I'm not sure what we should do to fix this. > > Not sure either, the locking in this driver is quite odd. Do you know > why it's registering the netdev from a workqueue, and what the entry > points to the driver are? > Yes, the locking in iAVF has been problematic for years :( We register the netdevice from a work queue because we are waiting on messages from the PF over virtchnl. I don't fully understand the motivation behind the way the initialization was moved into a work queue, but this appears to be the historical reasoning from examining commit logs. > Normally before the netdev is registered it can't get called, so all > the locking is moot. But IDK if we need to protect from some FW > interactions, maybe? We had a lot of issues with locking and pain getting to the state we are in today. I think part of the challenge is that the VF is communicating asynchronously over virtchnl queue messages to the PF for setup. Its a mess :( I could re-order the locks so we go "RTNL -> crit_lock -> netdev_lock" but that will only work as long as no flow from the kernel does something like "RTNL -> netdev_lock -> <driver callback that takes crit lock>" which seems unlikely :( Its a mess and I don't quite know how to dig out of it. ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: iAVF circular lock dependency due to netdev_lock 2025-02-06 2:27 ` Jacob Keller @ 2025-02-06 4:32 ` Jakub Kicinski 2025-02-06 17:18 ` Jacob Keller 0 siblings, 1 reply; 5+ messages in thread From: Jakub Kicinski @ 2025-02-06 4:32 UTC (permalink / raw) To: Jacob Keller; +Cc: netdev@vger.kernel.org On Wed, 5 Feb 2025 18:27:40 -0800 Jacob Keller wrote: > > Not sure either, the locking in this driver is quite odd. Do you know > > why it's registering the netdev from a workqueue, and what the entry > > points to the driver are? > > Yes, the locking in iAVF has been problematic for years :( > > We register the netdevice from a work queue because we are waiting on > messages from the PF over virtchnl. I don't fully understand the > motivation behind the way the initialization was moved into a work > queue, but this appears to be the historical reasoning from examining > commit logs. > > > Normally before the netdev is registered it can't get called, so all > > the locking is moot. But IDK if we need to protect from some FW > > interactions, maybe? > > We had a lot of issues with locking and pain getting to the state we are > in today. I think part of the challenge is that the VF is communicating > asynchronously over virtchnl queue messages to the PF for setup. > > Its a mess :( I could re-order the locks so we go "RTNL -> crit_lock -> > netdev_lock" but that will only work as long as no flow from the kernel > does something like "RTNL -> netdev_lock -> <driver callback that takes > crit lock>" which seems unlikely :( > > Its a mess and I don't quite know how to dig out of it. Stanislav suggested off list that we could add a _locked() version of register_netdevice(). I'm worried that it's just digging a deeper hole. We'd cover with the lock parts of core which weren't covered before. Maybe the saving grace is that the driver appears to never transition out of the registered state. And netdev lock only protects the core. So we could elide taking the netdev lock if device is not registered yet? We'd still need to convince lockdep that its okay to take the netdev lock under crit lock once. Code below is incomplete but hopefully it will illustrate. Key unanswered question is how to explain this to lockdep.. diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c index 4fe481433842..79904d49792a 100644 --- a/drivers/net/ethernet/intel/iavf/iavf_main.c +++ b/drivers/net/ethernet/intel/iavf/iavf_main.c @@ -1974,6 +1974,67 @@ static int iavf_reinit_interrupt_scheme(struct iavf_adapter *adapter, bool runni return err; } +enum iavf_lock_level { + IAVF_LOCK_REMOVING, + IAVF_LOCK_CONTENDED, + IAVF_LOCK_CRIT_ONLY, + IAVF_LOCK_FULL, /* CRIT + netdev_lock */ +}; + +static enum iavf_lock_level iavf_lock(struct iavf_adapter *adapter, bool try) +{ + struct net_device *netdev = adapter->netdev; + enum iavf_lock_level ret; + + if (READ_ONCE(netdev->reg_state) >= NETREG_REGISTERED) { + netdev_lock(netdev); + ret = IAVF_LOCK_FULL; + } else { + ret = IAVF_LOCK_CRIT_ONLY; + } + + if (!try) { + mutex_lock(&adapter->crit_lock); + } else if (!mutex_trylock(&adapter->crit_lock)) { + if (ret == IAVF_LOCK_FULL) + netdev_unlock(netdev); + + return adapter->state == __IAVF_REMOVE ? + IAVF_LOCK_REMOVING : IAVF_LOCK_CONTENDED; + } + + /* Incredibly unlucky, we saw unregistered device yet didn't contend + * with registration for the crit lock. Act as if we did contend. + */ + if (ret == IAVF_LOCK_CRIT_ONLY && + READ_ONCE(netdev->reg_state) >= NETREG_REGISTERED) { + mutex_unlock(&adapter->crit_lock); + return IAVF_LOCK_CONTENDED; + } + + return ret; +} + +static void iavf_unlock(struct iavf_adapter *adapter, enum iavf_lock_level lock) +{ + struct net_device *netdev = adapter->netdev; + + switch (lock) { + case IAVF_LOCK_REMOVING: + case IAVF_LOCK_CONTENDED: + WARN_ON_ONCE(1); + return; + + case IAVF_LOCK_CRIT_ONLY: + mutex_unlock(&adapter->crit_lock); + break; + case IAVF_LOCK_FULL: + mutex_unlock(&adapter->crit_lock); + netdev_unlock(netdev); + break; + } +} + /** * iavf_finish_config - do all netdev work that needs RTNL * @work: our work_struct @@ -1983,7 +2044,7 @@ static int iavf_reinit_interrupt_scheme(struct iavf_adapter *adapter, bool runni static void iavf_finish_config(struct work_struct *work) { struct iavf_adapter *adapter; - bool netdev_released = false; + enum iavf_lock_level lock; int pairs, err; adapter = container_of(work, struct iavf_adapter, finish_config); @@ -1992,8 +2053,7 @@ static void iavf_finish_config(struct work_struct *work) * The dev->lock is needed to update the queue number */ rtnl_lock(); - netdev_lock(adapter->netdev); - mutex_lock(&adapter->crit_lock); + lock = iavf_lock(adapter, false); if ((adapter->flags & IAVF_FLAG_SETUP_NETDEV_FEATURES) && adapter->netdev->reg_state == NETREG_REGISTERED && @@ -2012,8 +2072,6 @@ static void iavf_finish_config(struct work_struct *work) netif_set_real_num_tx_queues(adapter->netdev, pairs); if (adapter->netdev->reg_state != NETREG_REGISTERED) { - netdev_unlock(adapter->netdev); - netdev_released = true; err = register_netdevice(adapter->netdev); if (err) { dev_err(&adapter->pdev->dev, "Unable to register netdev (%d)\n", @@ -2040,9 +2098,7 @@ static void iavf_finish_config(struct work_struct *work) } out: - mutex_unlock(&adapter->crit_lock); - if (!netdev_released) - netdev_unlock(adapter->netdev); + iavf_unlock(adapter, lock); rtnl_unlock(); } @@ -2737,16 +2793,18 @@ static void iavf_watchdog_task(struct work_struct *work) watchdog_task.work); struct net_device *netdev = adapter->netdev; struct iavf_hw *hw = &adapter->hw; + enum iavf_lock_level lock; u32 reg_val; - netdev_lock(netdev); - if (!mutex_trylock(&adapter->crit_lock)) { - if (adapter->state == __IAVF_REMOVE) { - netdev_unlock(netdev); - return; - } - + lock = iavf_lock(adapter, true); + switch (lock) { + case IAVF_LOCK_REMOVING: + return; + case IAVF_LOCK_CONTENDED: goto restart_watchdog; + case IAVF_LOCK_CRIT_ONLY: + case IAVF_LOCK_FULL: + break; /* continue */ } if (adapter->flags & IAVF_FLAG_PF_COMMS_FAILED) @@ -2755,15 +2813,13 @@ static void iavf_watchdog_task(struct work_struct *work) switch (adapter->state) { case __IAVF_STARTUP: iavf_startup(adapter); - mutex_unlock(&adapter->crit_lock); - netdev_unlock(netdev); + iavf_unlock(adapter, lock); queue_delayed_work(adapter->wq, &adapter->watchdog_task, msecs_to_jiffies(30)); return; case __IAVF_INIT_VERSION_CHECK: iavf_init_version_check(adapter); - mutex_unlock(&adapter->crit_lock); - netdev_unlock(netdev); + iavf_unlock(adapter, lock); queue_delayed_work(adapter->wq, &adapter->watchdog_task, msecs_to_jiffies(30)); return; @@ -2902,8 +2958,7 @@ static void iavf_watchdog_task(struct work_struct *work) return; } - mutex_unlock(&adapter->crit_lock); - netdev_unlock(netdev); + iavf_unlock(adapter, lock); restart_watchdog: if (adapter->state >= __IAVF_DOWN) queue_work(adapter->wq, &adapter->adminq_task); @@ -3022,6 +3077,7 @@ static void iavf_reset_task(struct work_struct *work) struct iavf_hw *hw = &adapter->hw; struct iavf_mac_filter *f, *ftmp; struct iavf_cloud_filter *cf; + enum iavf_lock_level lock; enum iavf_status status; u32 reg_val; int i = 0, err; @@ -3030,13 +3086,16 @@ static void iavf_reset_task(struct work_struct *work) /* When device is being removed it doesn't make sense to run the reset * task, just return in such a case. */ - netdev_lock(netdev); - if (!mutex_trylock(&adapter->crit_lock)) { - if (adapter->state != __IAVF_REMOVE) - queue_work(adapter->wq, &adapter->reset_task); - - netdev_unlock(netdev); + lock = iavf_lock(adapter, true); + switch (lock) { + case IAVF_LOCK_REMOVING: return; + case IAVF_LOCK_CONTENDED: + queue_work(adapter->wq, &adapter->reset_task); + return; + case IAVF_LOCK_CRIT_ONLY: + case IAVF_LOCK_FULL: + break; /* continue */ } iavf_misc_irq_disable(adapter); @@ -3082,8 +3141,7 @@ static void iavf_reset_task(struct work_struct *work) dev_err(&adapter->pdev->dev, "Reset never finished (%x)\n", reg_val); iavf_disable_vf(adapter); - mutex_unlock(&adapter->crit_lock); - netdev_unlock(netdev); + iavf_unlock(adapter, lock); return; /* Do not attempt to reinit. It's dead, Jim. */ } @@ -3223,8 +3281,7 @@ static void iavf_reset_task(struct work_struct *work) adapter->flags &= ~IAVF_FLAG_REINIT_ITR_NEEDED; wake_up(&adapter->reset_waitqueue); - mutex_unlock(&adapter->crit_lock); - netdev_unlock(netdev); + iavf_unlock(adapter, lock); return; reset_err: @@ -3234,8 +3291,7 @@ static void iavf_reset_task(struct work_struct *work) } iavf_disable_vf(adapter); - mutex_unlock(&adapter->crit_lock); - netdev_unlock(netdev); + iavf_unlock(adapter, lock); dev_err(&adapter->pdev->dev, "failed to allocate resources during reinit\n"); } ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: iAVF circular lock dependency due to netdev_lock 2025-02-06 4:32 ` Jakub Kicinski @ 2025-02-06 17:18 ` Jacob Keller 0 siblings, 0 replies; 5+ messages in thread From: Jacob Keller @ 2025-02-06 17:18 UTC (permalink / raw) To: Jakub Kicinski; +Cc: netdev@vger.kernel.org On 2/5/2025 8:32 PM, Jakub Kicinski wrote: > On Wed, 5 Feb 2025 18:27:40 -0800 Jacob Keller wrote: >>> Not sure either, the locking in this driver is quite odd. Do you know >>> why it's registering the netdev from a workqueue, and what the entry >>> points to the driver are? >> >> Yes, the locking in iAVF has been problematic for years :( >> >> We register the netdevice from a work queue because we are waiting on >> messages from the PF over virtchnl. I don't fully understand the >> motivation behind the way the initialization was moved into a work >> queue, but this appears to be the historical reasoning from examining >> commit logs. >> >>> Normally before the netdev is registered it can't get called, so all >>> the locking is moot. But IDK if we need to protect from some FW >>> interactions, maybe? >> >> We had a lot of issues with locking and pain getting to the state we are >> in today. I think part of the challenge is that the VF is communicating >> asynchronously over virtchnl queue messages to the PF for setup. >> >> Its a mess :( I could re-order the locks so we go "RTNL -> crit_lock -> >> netdev_lock" but that will only work as long as no flow from the kernel >> does something like "RTNL -> netdev_lock -> <driver callback that takes >> crit lock>" which seems unlikely :( >> >> Its a mess and I don't quite know how to dig out of it. > > Stanislav suggested off list that we could add a _locked() version of > register_netdevice(). I'm worried that it's just digging a deeper hole. > We'd cover with the lock parts of core which weren't covered before. > Right. We could add one, but I think that gets pretty tricky as well. > Maybe the saving grace is that the driver appears to never transition > out of the registered state. And netdev lock only protects the core. > So we could elide taking the netdev lock if device is not registered > yet? We'd still need to convince lockdep that its okay to take the > netdev lock under crit lock once. > > Code below is incomplete but hopefully it will illustrate. > Key unanswered question is how to explain this to lockdep.. > This might work but it seems also quite ugly, especially if we have to make lockdep happy with it somehow. One other option would be to simply release the crit_lock the same way we are already releasing the dev lock just before calling register. That leaves a possible gap where other code could lock and then do something in the window where things are unlocked. I'm investigating whether thats even possible given the work queue design and flow during initialization of the driver. Hopefully I'll have an answer soon. ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2025-02-06 17:19 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-02-05 23:20 iAVF circular lock dependency due to netdev_lock Jacob Keller 2025-02-06 1:23 ` Jakub Kicinski 2025-02-06 2:27 ` Jacob Keller 2025-02-06 4:32 ` Jakub Kicinski 2025-02-06 17:18 ` Jacob Keller
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).