netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* iAVF circular lock dependency due to netdev_lock
@ 2025-02-05 23:20 Jacob Keller
  2025-02-06  1:23 ` Jakub Kicinski
  0 siblings, 1 reply; 5+ messages in thread
From: Jacob Keller @ 2025-02-05 23:20 UTC (permalink / raw)
  To: Jakub Kicinski, netdev@vger.kernel.org

Hi,

I just discovered that iAVF now has a circular lock dependency on its
critical lock and the netdev lock:

> [ 1504.790308] ====================================================== 
> [ 1504.790309] WARNING: possible circular locking dependency detected 
> [ 1504.790310] 6.13.0 #net_next_rt.c2933b2befe2.el9 Not tainted 
> [ 1504.790311] ------------------------------------------------------ 
> [ 1504.790312] kworker/u128:0/13566 is trying to acquire lock: 
> [ 1504.790313] ffff97d0e4738f18 (&dev->lock){+.+.}-{4:4}, at: register_netdevice+0x52c/0x710 
> [ 1504.790320]  
> [ 1504.790320] but task is already holding lock: 
> [ 1504.790321] ffff97d0e47392e8 (&adapter->crit_lock){+.+.}-{4:4}, at: iavf_finish_config+0x37/0x240 [iavf] 
> [ 1504.790330]  
> [ 1504.790330] which lock already depends on the new lock. 
> [ 1504.790330]  
> [ 1504.790330]  
> [ 1504.790330] the existing dependency chain (in reverse order) is: 
> [ 1504.790331]  
> [ 1504.790331] -> #1 (&adapter->crit_lock){+.+.}-{4:4}: 
> [ 1504.790333]        __lock_acquire+0x52d/0xbb0 
> [ 1504.790337]        lock_acquire+0xd9/0x330 
> [ 1504.790338]        mutex_lock_nested+0x4b/0xb0 
> [ 1504.790341]        iavf_finish_config+0x37/0x240 [iavf] 
> [ 1504.790347]        process_one_work+0x248/0x6d0 
> [ 1504.790350]        worker_thread+0x18d/0x330 
> [ 1504.790352]        kthread+0x10e/0x250 
> [ 1504.790354]        ret_from_fork+0x30/0x50 
> [ 1504.790357]        ret_from_fork_asm+0x1a/0x30 
> [ 1504.790361]  
> [ 1504.790361] -> #0 (&dev->lock){+.+.}-{4:4}: 
> [ 1504.790364]        check_prev_add+0xf1/0xce0 
> [ 1504.790366]        validate_chain+0x46a/0x570 
> [ 1504.790368]        __lock_acquire+0x52d/0xbb0 
> [ 1504.790370]        lock_acquire+0xd9/0x330 
> [ 1504.790371]        mutex_lock_nested+0x4b/0xb0 
> [ 1504.790372]        register_netdevice+0x52c/0x710 
> [ 1504.790374]        iavf_finish_config+0xfa/0x240 [iavf] 
> [ 1504.790379]        process_one_work+0x248/0x6d0 
> [ 1504.790381]        worker_thread+0x18d/0x330 
> [ 1504.790383]        kthread+0x10e/0x250 
> [ 1504.790385]        ret_from_fork+0x30/0x50 
> [ 1504.790387]        ret_from_fork_asm+0x1a/0x30 
> [ 1504.790389]  
> [ 1504.790389] other info that might help us debug this: 
> [ 1504.790389]  
> [ 1504.790389]  Possible unsafe locking scenario: 
> [ 1504.790389]  
> [ 1504.790390]        CPU0                    CPU1 
> [ 1504.790391]        ----                    ---- 
> [ 1504.790391]   lock(&adapter->crit_lock); 
> [ 1504.790393]                                lock(&dev->lock); 
> [ 1504.790394]                                lock(&adapter->crit_lock); 
> [ 1504.790395]   lock(&dev->lock); 
> [ 1504.790397]  
> [ 1504.790397]  *** DEADLOCK *** 
> [ 1504.790397]  
> [ 1504.790397] 4 locks held by kworker/u128:0/13566: 
> [ 1504.790399]  #0: ffff97d1924d4d48 ((wq_completion)iavf){+.+.}-{0:0}, at: process_one_work+0x49b/0x6d0 
> [ 1504.790404]  #1: ffffa848c3d9be40 ((work_completion)(&adapter->finish_config)){+.+.}-{0:0}, at: process_one_work+0x1f3/0x6d0 
> [ 1504.790408]  #2: ffffffffb3e1bf40 (rtnl_mutex){+.+.}-{4:4}, at: iavf_finish_config+0x18/0x240 [iavf] 
> [ 1504.790416]  #3: ffff97d0e47392e8 (&adapter->crit_lock){+.+.}-{4:4}, at: iavf_finish_config+0x37/0x240 [iavf] 


This happens because the driver takes netdev_lock prior to acquiring its
own adapter->crit_lock, but then it calls register_netdevice under the
crit_lock. Since commit 5fda3f35349b ("net: make netdev_lock() protect
netdev->reg_state"), the register_netdevice() function now acquires
netdev_lock as part of its flow.

I can fix this by refactoring iavf to only take netdev_lock after
acquiring its own crit_lock.. but that smells funny. It seems like a
future change could require to take netdev_lock before calling into the
driver routines somehow, making that ordering problematic.

I'm not sure how else to fix this... I briefly considered just removing
crit_lock and relying solely on netdev_lock for synchronization, but
that doesn't work because of the register_netdevice() taking the lock.

I guess I could do some funky stuff with unlocking but that seems ugly
as well...

I'm not sure what we should do to fix this.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2025-02-06 17:19 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-02-05 23:20 iAVF circular lock dependency due to netdev_lock Jacob Keller
2025-02-06  1:23 ` Jakub Kicinski
2025-02-06  2:27   ` Jacob Keller
2025-02-06  4:32     ` Jakub Kicinski
2025-02-06 17:18       ` Jacob Keller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).