netdev.vger.kernel.org archive mirror
* Report deadlock in the latest net-next
From: Taehee Yoo @ 2025-03-17  6:17 UTC
  To: Netdev, Stanislav Fomichev
  Cc: David Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman

Hi Stanislav,
I found a deadlock in the latest net-next kernel.
The call trace points to your recent
commit ad7c7b2172c3 ("net: hold netdev instance lock during sysfs operations").
dev->lock is first acquired in do_setlink.constprop.0+0x12a/0x3440,
which is net/core/rtnetlink.c:3025.
dev->lock is then acquired again in dev_disable_lro+0x81/0x1f0,
which is net/core/dev_api.c:255.
dev_disable_lro() is called from a netdev notifier, but notifiers
seem to be called both with and without dev->lock held.
In this case the notifier is called with dev->lock held,
so a deadlock occurs.
Could you please look into this?
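
The pattern described above can be sketched in user space. This is only
a toy model, not kernel code: toy_mutex, toy_dev and the toy_* functions
are hypothetical stand-ins for dev->lock, struct net_device,
dev_disable_lro(), the notifier and do_setlink(). The toy lock returns
-EDEADLK on a relock where a real mutex would hang.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy non-recursive lock: a relock reports -EDEADLK instead of hanging. */
struct toy_mutex {
    bool held;
};

int toy_lock(struct toy_mutex *m)
{
    if (m->held)
        return -EDEADLK;    /* a real mutex would block forever here */
    m->held = true;
    return 0;
}

void toy_unlock(struct toy_mutex *m)
{
    assert(m->held);        /* unlocking an unheld lock is a bug */
    m->held = false;
}

/* Hypothetical stand-in for struct net_device and its dev->lock. */
struct toy_dev {
    struct toy_mutex lock;
};

/* Models dev_disable_lro(): takes the instance lock itself. */
int toy_dev_disable_lro(struct toy_dev *dev)
{
    int err = toy_lock(&dev->lock);

    if (err)
        return err;
    /* ... the feature update would happen here ... */
    toy_unlock(&dev->lock);
    return 0;
}

/* Models a notifier callback (inetdev_event() in the trace). */
int toy_notifier(struct toy_dev *dev)
{
    return toy_dev_disable_lro(dev);
}

/* Models do_setlink(): the notifier runs with the lock already held. */
int toy_setlink_change_netns(struct toy_dev *dev)
{
    int err = toy_lock(&dev->lock);

    if (err)
        return err;
    err = toy_notifier(dev);
    toy_unlock(&dev->lock);
    return err;
}
```

Calling toy_notifier() on an unlocked toy_dev succeeds, while
toy_setlink_change_netns() fails with -EDEADLK because the same lock is
taken twice on one call path.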

Reproducer:
modprobe netdevsim
ip netns add ns_test
echo 1 > /sys/bus/netdevsim/new_device
ip link set $interface netns ns_test

============================================
WARNING: possible recursive locking detected
6.14.0-rc6+ #56 Not tainted
--------------------------------------------
ip/1672 is trying to acquire lock:
ffff888231fbad90 (&dev->lock){+.+.}-{4:4}, at: dev_disable_lro+0x81/0x1f0

but task is already holding lock:
ffff888231fbad90 (&dev->lock){+.+.}-{4:4}, at: do_setlink.constprop.0+0x12a/0x3440

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&dev->lock);
  lock(&dev->lock);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

3 locks held by ip/1672:
 #0: ffffffff943ba050 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_newlink+0x6b4/0x1c60
 #1: ffff88813abc6170 (&net->rtnl_mutex){+.+.}-{4:4}, at: rtnl_newlink+0x6f6/0x1c60
 #2: ffff888231fbad90 (&dev->lock){+.+.}-{4:4}, at: do_setlink.constprop.0+0x12a/0x3440

stack backtrace:
CPU: 2 UID: 0 PID: 1672 Comm: ip Not tainted 6.14.0-rc6+ #56 66129e0c5b1b922fef38623168aea99c0593a519
Hardware name: ASUS System Product Name/PRIME Z690-P D4, BIOS 0603 11/01/2021
Call Trace:
 <TASK>
 dump_stack_lvl+0x7e/0xc0
 print_deadlock_bug+0x4fd/0x8e0
 __lock_acquire+0x3082/0x4fd0
 ? __pfx___lock_acquire+0x10/0x10
 ? mark_lock.part.0+0xfa/0x2f60
 ? __pfx___lock_acquire+0x10/0x10
 ? check_chain_key+0x1c1/0x520
 lock_acquire+0x1b0/0x570
 ? dev_disable_lro+0x81/0x1f0
 ? __pfx_lock_acquire+0x10/0x10
 __mutex_lock+0x17c/0x17c0
 ? dev_disable_lro+0x81/0x1f0
 ? dev_disable_lro+0x81/0x1f0
 ? __pfx___mutex_lock+0x10/0x10
 ? mark_held_locks+0xa5/0xf0
 ? neigh_parms_alloc+0x36b/0x4f0
 ? __local_bh_enable_ip+0xa5/0x120
 ? lockdep_hardirqs_on+0xbe/0x140
 ? dev_disable_lro+0x81/0x1f0
 dev_disable_lro+0x81/0x1f0
 inetdev_init+0x2d1/0x4a0
 inetdev_event+0x9b3/0x1590
 ? __pfx_lock_release+0x10/0x10
 ? __pfx_inetdev_event+0x10/0x10
 ? notifier_call_chain+0x9b/0x300
 notifier_call_chain+0x9b/0x300
 netif_change_net_namespace+0xdfe/0x1390
 ? __pfx_netif_change_net_namespace+0x10/0x10
 ? __pfx_validate_linkmsg+0x10/0x10
 ? __pfx___lock_acquire+0x10/0x10
 do_setlink.constprop.0+0x241/0x3440
 ? lock_acquire+0x1b0/0x570
 ? __pfx_do_setlink.constprop.0+0x10/0x10
 ? rtnl_newlink+0x6f6/0x1c60
 ? __pfx_lock_acquired+0x10/0x10
 ? netlink_sendmsg+0x712/0xbc0
 ? rcu_is_watching+0x11/0xb0
 ? trace_contention_end+0xef/0x140
 ? __mutex_lock+0x935/0x17c0
 ? __create_object+0x36/0x90
 ? __pfx_lock_release+0x10/0x10
 ? rtnl_newlink+0x6f6/0x1c60
 ? __nla_validate_parse+0xb9/0x2830
 ? __pfx___mutex_lock+0x10/0x10
 ? lockdep_hardirqs_on+0xbe/0x140
 ? __pfx___nla_validate_parse+0x10/0x10
 ? rcu_is_watching+0x11/0xb0
 ? cap_capable+0x17d/0x360
 ? fdget+0x4e/0x1d0
 rtnl_newlink+0x108d/0x1c60
 ? __pfx_rtnl_newlink+0x10/0x10
 ? mark_lock.part.0+0xfa/0x2f60
 ? __pfx___lock_acquire+0x10/0x10
 ? __pfx_mark_lock.part.0+0x10/0x10
 ? __pfx_lock_release+0x10/0x10
 ? __pfx_rtnl_newlink+0x10/0x10
 rtnetlink_rcv_msg+0x71c/0xc10
 ? __pfx_rtnetlink_rcv_msg+0x10/0x10
 ? check_chain_key+0x1c1/0x520
 ? __pfx___lock_acquire+0x10/0x10
 netlink_rcv_skb+0x12c/0x360
 ? __pfx_rtnetlink_rcv_msg+0x10/0x10
 ? __pfx_netlink_rcv_skb+0x10/0x10
 ? netlink_deliver_tap+0xcb/0x9e0
 ? netlink_deliver_tap+0x14b/0x9e0
 netlink_unicast+0x447/0x710
 ? __pfx_netlink_unicast+0x10/0x10
 netlink_sendmsg+0x712/0xbc0
 ? __pfx_netlink_sendmsg+0x10/0x10
 ? _copy_from_user+0x3e/0xa0
 ____sys_sendmsg+0x7ab/0xa10
 ? __pfx_____sys_sendmsg+0x10/0x10
 ? __pfx_copy_msghdr_from_user+0x10/0x10
 ___sys_sendmsg+0xee/0x170
 ? __pfx___lock_acquire+0x10/0x10
 ? kasan_save_stack+0x20/0x40
 ? __pfx____sys_sendmsg+0x10/0x10
 ? entry_SYSCALL_64_after_hwframe+0x76/0x7e
 ? kasan_save_stack+0x30/0x40
 ? __pfx_lock_release+0x10/0x10
 ? __might_fault+0xbf/0x170
 __sys_sendmsg+0x105/0x190
 ? __pfx___sys_sendmsg+0x10/0x10
 ? rseq_syscall+0xc3/0x130
 do_syscall_64+0x64/0x140
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7fd20f92c004
Code: 15 19 6e 0d 00 f7 d8 64 89 02 b8 ff ff ff ff eb bf 0f 1f 44 00
00 f3 0f 1e fa 80 3d 45 f0 0d 00 00 74 13 b8 2e 00 00 00 0f 05 <48> 3d
005
RSP: 002b:00007fff40636e68 EFLAGS: 00000202 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fd20f92c004
RDX: 0000000000000000 RSI: 00007fff40636ee0 RDI: 0000000000000003
RBP: 00007fff40636f50 R08: 0000000067d7b7e9 R09: 0000000000000050
R10: 0000000000000001 R11: 0000000000000202 R12: 0000000000000003
R13: 0000000067d7b7ea R14: 000055d14b9e4040 R15: 0000000000000000

Thanks a lot!
Taehee Yoo

* Re: Report deadlock in the latest net-next
From: Stanislav Fomichev @ 2025-03-24 16:57 UTC
  To: Taehee Yoo
  Cc: Netdev, Stanislav Fomichev, David Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman

On 03/17, Taehee Yoo wrote:
> Hi Stanislav,
> I found a deadlock in the latest net-next kernel.
> The calltrace indicates your current
> commit ad7c7b2172c3 ("net: hold netdev instance lock during sysfs operations").
> The dev->lock was acquired in do_setlink.constprop.0+0x12a/0x3440,
> which is net/core/rtnetlink.c:3025
> And then dev->lock is acquired in dev_disable_lro+0x81/0x1f0,
> which is /net/core/dev_api.c:255
> dev_disable_lro() is called by netdev notification, but notification
> seems to be called both outside and inside dev->lock context.
> This case is that netdev notification is called inside dev->lock context.
> So deadlock occurs.
> Could you please look into this?

Sorry, I completely missed that. I think this is similar to:

https://lore.kernel.org/netdev/Z-GDBlDsnPyc21RM@mini-arch/T/#u

?

Can you give it a quick test with the patches from that link?

* Re: Report deadlock in the latest net-next
From: Taehee Yoo @ 2025-03-25  4:36 UTC
  To: Stanislav Fomichev
  Cc: Netdev, Stanislav Fomichev, David Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman

On Tue, Mar 25, 2025 at 1:57 AM Stanislav Fomichev <stfomichev@gmail.com> wrote:
>

Hi Stanislav,
Thanks a lot for your reply.

> Sorry, I completely missed that, I think this is similar to:
>
> https://lore.kernel.org/netdev/Z-GDBlDsnPyc21RM@mini-arch/T/#u
>
> ?
>
> Can you give it a quick test with the patches from that link?

I applied two changes, [1] and [2].
The above case seems to be fixed.
But I saw a new splat when the netdevsim interface was created,
which has already been reported in that thread.
------------[ cut here ]------------
WARNING: CPU: 1 PID: 1448 at ./include/net/netdev_lock.h:54 __netdev_update_features+0x894/0x1550
Modules linked in: netdevsim veth xt_nat xt_tcpudp xt_conntrack nft_chain_nat xt_MASQUERADE nf_cos
CPU: 1 UID: 0 PID: 1448 Comm: bash Not tainted 6.14.0-rc7+ #74 0e3a9c04b78c7bd4fd13f140e1c89a83e53
Hardware name: ASUS System Product Name/PRIME Z690-P D4, BIOS 0603 11/01/2021
RIP: 0010:__netdev_update_features+0x894/0x1550
Code: ff 0f 1f 44 00 00 48 f7 d0 49 21 c4 e9 4d fa ff ff 48 8d bd 90 0d 00 00 be ff ff ff ff e8 e0
RSP: 0018:ffff88825cc3f230 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8881e1f72000 RCX: 0000000000000001
RDX: 0000000000000006 RSI: ffffffff90ac4960 RDI: ffffffff90d73280
RBP: ffff8881e1f72000 R08: 0000000000000000 R09: fffffbfff327743c
R10: 0000000000000001 R11: 0000000000000001 R12: ffff88815ad84000
R13: ffff88815ad84168 R14: 0000000000000005 R15: 1ffff1104b987e6c
FS:  00007f64f7c8a740(0000) GS:ffff88881b200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffdaa5c07c8 CR3: 00000001e1af0000 CR4: 00000000007506f0
PKRU: 55555554
Call Trace:
 <TASK>
 ? __warn+0xcd/0x2f0
 ? __netdev_update_features+0x894/0x1550
 ? report_bug+0x326/0x3c0
 ? handle_bug+0x53/0xa0
 ? exc_invalid_op+0x14/0x50
 ? asm_exc_invalid_op+0x16/0x20
 ? __netdev_update_features+0x894/0x1550
 ? check_chain_key+0x1c1/0x520
 ? __pfx___netdev_update_features+0x10/0x10
 ? __pfx_lock_release+0x10/0x10
 netif_disable_lro+0x90/0x520
 ? __pfx_netif_disable_lro+0x10/0x10
 ? lockdep_hardirqs_on+0xbe/0x140
 ? neigh_parms_alloc+0x36b/0x4f0
 ? __local_bh_enable_ip+0xa5/0x120
 ? neigh_parms_alloc+0x36b/0x4f0
 inetdev_init+0x2d1/0x4a0
 inetdev_event+0x9b3/0x1590
 ? __pfx_nsim_dev_netdevice_event+0x10/0x10 [netdevsim 56c6fb92f9ab7ad97a5f7886b4a8c456dda09181]
 ? __pfx_nsim_dev_netdevice_event+0x10/0x10 [netdevsim 56c6fb92f9ab7ad97a5f7886b4a8c456dda09181]
 ? __pfx_nsim_dev_netdevice_event+0x10/0x10 [netdevsim 56c6fb92f9ab7ad97a5f7886b4a8c456dda09181]
 ? __pfx_nsim_dev_netdevice_event+0x10/0x10 [netdevsim 56c6fb92f9ab7ad97a5f7886b4a8c456dda09181]
 ? __module_address.part.0+0x6a/0x220
 ? __pfx_inetdev_event+0x10/0x10
 ? notifier_call_chain+0x9b/0x300

I also found a new deadlock.
Reproducer:
   modprobe netdevsim
   ip netns add ns_test
   echo 1 > /sys/bus/netdevsim/new_device
   ip link add bond0 type bond
   ip link set $interface master bond0
   ip link set $interface netns ns_test

Splat:
============================================
WARNING: possible recursive locking detected
6.14.0-rc7+ #74 Tainted: G        W
--------------------------------------------
ip/1876 is trying to acquire lock:
ffff8881e1f72d90 (&dev->lock){+.+.}-{4:4}, at: dev_close+0x81/0x1f0

but task is already holding lock:
ffff8881e1f72d90 (&dev->lock){+.+.}-{4:4}, at: do_setlink.constprop.0+0x12a/0x3410

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&dev->lock);
  lock(&dev->lock);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

3 locks held by ip/1876:
 #0: ffffffff993ba250 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_newlink+0x6b4/0x1c60
 #1: ffff88816736e230 (&net->rtnl_mutex){+.+.}-{4:4}, at: rtnl_newlink+0x6f6/0x1c60
 #2: ffff8881e1f72d90 (&dev->lock){+.+.}-{4:4}, at: do_setlink.constprop.0+0x12a/0x3410

stack backtrace:
CPU: 1 UID: 0 PID: 1876 Comm: ip Tainted: G        W 6.14.0-rc7+ #74 0e3a9c04b78c7bd4fd13
Tainted: [W]=WARN
Call Trace:
 <TASK>
 dump_stack_lvl+0x7e/0xc0
 print_deadlock_bug+0x4fd/0x8e0
 __lock_acquire+0x3082/0x4fd0
 ? __pfx___lock_acquire+0x10/0x10
 ? __pfx_lock_release+0x10/0x10
 lock_acquire+0x1b0/0x570
 ? dev_close+0x81/0x1f0
 ? __pfx_bond_netdev_event+0x10/0x10 [bonding b66920a8cbfc9c0d4b32a75d6048c0ac5533c0d4]
 ? __pfx_lock_acquire+0x10/0x10
 ? __pfx_bond_netdev_event+0x10/0x10 [bonding b66920a8cbfc9c0d4b32a75d6048c0ac5533c0d4]
 ? __pfx_bond_netdev_event+0x10/0x10 [bonding b66920a8cbfc9c0d4b32a75d6048c0ac5533c0d4]
 __mutex_lock+0x17c/0x17c0
 ? dev_close+0x81/0x1f0
 ? dev_close+0x81/0x1f0
 ? __pfx_netdev_change_features+0x10/0x10
 ? __pfx___mutex_lock+0x10/0x10
 ? __module_text_address+0x36/0x170
 ? preempt_count_add+0x7d/0x150
 ? ip6_route_dev_notify+0x37/0x670
 ? notifier_call_chain+0x9b/0x300
 ? dev_close+0x81/0x1f0
 dev_close+0x81/0x1f0
 __bond_release_one+0x888/0x1610 [bonding b66920a8cbfc9c0d4b32a75d6048c0ac5533c0d4]
 ? __mutex_lock+0x935/0x17c0
 ? nf_tables_flowtable_event+0x97/0x480 [nf_tables 1445783a301bcd3ec7ca4a0703efdcd50d4aca3a]
 ? __pfx___bond_release_one+0x10/0x10 [bonding b66920a8cbfc9c0d4b32a75d6048c0ac5533c0d4]
 ? nft_offload_netdev_event+0xce/0x3a0 [nf_tables 1445783a301bcd3ec7ca4a0703efdcd50d4aca3a]
 ? __mutex_unlock_slowpath+0x15d/0x650
 ? __pfx___mutex_unlock_slowpath+0x10/0x10
 ? __pfx_bond_netdev_event+0x10/0x10 [bonding b66920a8cbfc9c0d4b32a75d6048c0ac5533c0d4]
 ? __pfx_bond_netdev_event+0x10/0x10 [bonding b66920a8cbfc9c0d4b32a75d6048c0ac5533c0d4]
 ? __pfx_bond_netdev_event+0x10/0x10 [bonding b66920a8cbfc9c0d4b32a75d6048c0ac5533c0d4]
 ? __pfx_bond_netdev_event+0x10/0x10 [bonding b66920a8cbfc9c0d4b32a75d6048c0ac5533c0d4]
 ? __module_address.part.0+0x6a/0x220
 bond_netdev_event+0x91b/0xab0 [bonding b66920a8cbfc9c0d4b32a75d6048c0ac5533c0d4]
 notifier_call_chain+0x9b/0x300
 netif_change_net_namespace+0x43f/0x1390
 ? __pfx_netif_change_net_namespace+0x10/0x10
 ? __pfx_validate_linkmsg+0x10/0x10
 ? __pfx___lock_acquire+0x10/0x10
 do_setlink.constprop.0+0x241/0x3410

Reproducer2:
   modprobe netdevsim
   ip netns add ns_test
   echo 1 > /sys/bus/netdevsim/new_device
   ip link add team0 type team
   ip link set $interface master team0
   ip link set $interface netns ns_test

Splat:
======================================================
WARNING: possible circular locking dependency detected
6.14.0-rc7+ #74 Tainted: G        W
------------------------------------------------------
ip/2036 is trying to acquire lock:
ffff88812fccae88 (team->team_lock_key){+.+.}-{4:4}, at: team_device_event+0x101/0x690 [team]

but task is already holding lock:
ffff8881947a2d90 (&dev->lock){+.+.}-{4:4}, at: do_setlink.constprop.0+0x12a/0x3410

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (&dev->lock){+.+.}-{4:4}:
       lock_acquire+0x1b0/0x570
       __mutex_lock+0x17c/0x17c0
       dev_set_mtu+0x86/0x210
       team_add_slave+0x802/0x1e00 [team]
       do_set_master+0x363/0x6d0
       do_setlink.constprop.0+0x86f/0x3410
       rtnl_newlink+0x108d/0x1c60
       rtnetlink_rcv_msg+0x71c/0xc10
       netlink_rcv_skb+0x12c/0x360
       netlink_unicast+0x447/0x710
       netlink_sendmsg+0x712/0xbc0
       ____sys_sendmsg+0x7ab/0xa10
       ___sys_sendmsg+0xee/0x170
       __sys_sendmsg+0x105/0x190
       do_syscall_64+0x64/0x140
       entry_SYSCALL_64_after_hwframe+0x76/0x7e

-> #0 (team->team_lock_key){+.+.}-{4:4}:
       check_prev_add+0x1b7/0x2360
       __lock_acquire+0x32ab/0x4fd0
       lock_acquire+0x1b0/0x570
       __mutex_lock+0x17c/0x17c0
       team_device_event+0x101/0x690 [team]
       notifier_call_chain+0x9b/0x300
       dev_close_many+0x2c4/0x5a0
       netif_close+0x147/0x1e0
       netif_change_net_namespace+0x3a9/0x1390
       do_setlink.constprop.0+0x241/0x3410
       rtnl_newlink+0x108d/0x1c60
       rtnetlink_rcv_msg+0x71c/0xc10
       netlink_rcv_skb+0x12c/0x360
       netlink_unicast+0x447/0x710
       netlink_sendmsg+0x712/0xbc0
       ____sys_sendmsg+0x7ab/0xa10
       ___sys_sendmsg+0xee/0x170
       __sys_sendmsg+0x105/0x190
       do_syscall_64+0x64/0x140
       entry_SYSCALL_64_after_hwframe+0x76/0x7e

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&dev->lock);
                               lock(team->team_lock_key);
                               lock(&dev->lock);
  lock(team->team_lock_key);

 *** DEADLOCK ***

3 locks held by ip/2036:
 #0: ffffffffa33ba250 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_newlink+0x6b4/0x1c60
 #1: ffff888148db1fb0 (&net->rtnl_mutex){+.+.}-{4:4}, at: rtnl_newlink+0x6f6/0x1c60
 #2: ffff8881947a2d90 (&dev->lock){+.+.}-{4:4}, at: do_setlink.constprop.0+0x12a/0x3410

stack backtrace:
CPU: 3 UID: 0 PID: 2036 Comm: ip Tainted: G        W 6.14.0-rc7+ #74 0e3a9c04b78c7bd4fd13
Tainted: [W]=WARN
Hardware name: ASUS System Product Name/PRIME Z690-P D4, BIOS 0603 11/01/2021
Call Trace:
 <TASK>
 dump_stack_lvl+0x7e/0xc0
 print_circular_bug+0x5bd/0x9b0
 check_noncircular+0x31b/0x400
 ? mark_lock.part.0+0xfa/0x2f60
 ? __pfx_check_noncircular+0x10/0x10
 ? __pfx_mark_lock.part.0+0x10/0x10
 ? __lock_acquire+0x16fa/0x4fd0
 check_prev_add+0x1b7/0x2360
 __lock_acquire+0x32ab/0x4fd0
 ? __pfx___lock_acquire+0x10/0x10
 ? try_to_wake_up+0xb9/0x1600
 ? __pfx_lock_release+0x10/0x10
 lock_acquire+0x1b0/0x570
 ? team_device_event+0x101/0x690 [team 101c85bbc03feb292be26fbaaf9cee585e6924fa]
 ? __pfx_lock_acquire+0x10/0x10
 ? __pfx_mark_lock.part.0+0x10/0x10
 __mutex_lock+0x17c/0x17c0
 ? team_device_event+0x101/0x690 [team 101c85bbc03feb292be26fbaaf9cee585e6924fa]
 ? team_device_event+0x101/0x690 [team 101c85bbc03feb292be26fbaaf9cee585e6924fa]
 ? mark_held_locks+0xa5/0xf0
 ? __pfx___mutex_lock+0x10/0x10
 ? queue_work_on+0x63/0x90
 ? lockdep_hardirqs_on+0xbe/0x140
 ? __pfx_team_device_event+0x10/0x10 [team 101c85bbc03feb292be26fbaaf9cee585e6924fa]
 ? __pfx_team_device_event+0x10/0x10 [team 101c85bbc03feb292be26fbaaf9cee585e6924fa]
 ? __pfx_team_device_event+0x10/0x10 [team 101c85bbc03feb292be26fbaaf9cee585e6924fa]
 ? __pfx_team_device_event+0x10/0x10 [team 101c85bbc03feb292be26fbaaf9cee585e6924fa]
 ? team_device_event+0x101/0x690 [team 101c85bbc03feb292be26fbaaf9cee585e6924fa]
 team_device_event+0x101/0x690 [team 101c85bbc03feb292be26fbaaf9cee585e6924fa]
 notifier_call_chain+0x9b/0x300
 dev_close_many+0x2c4/0x5a0
 ? __pfx_lock_release+0x10/0x10
 ? __pfx_dev_close_many+0x10/0x10
 netif_close+0x147/0x1e0
 ? __pfx_netif_close+0x10/0x10
 ? rcu_is_watching+0x11/0xb0
 netif_change_net_namespace+0x3a9/0x1390
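
The circular dependency in the team splat can be illustrated with a toy
lock-order tracker. This is a rough user-space sketch of the idea behind
the report, not how lockdep is implemented: order_tracker, the tracker_*
functions and the TEAM_LOCK/DEV_LOCK classes are hypothetical names. It
records "B was taken while A was held" edges and refuses an acquisition
that would close a cycle.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_CLASSES 8

/* Hypothetical lock classes standing in for team->lock and dev->lock. */
enum toy_lock_class { TEAM_LOCK, DEV_LOCK };

struct order_tracker {
    bool edge[MAX_CLASSES][MAX_CLASSES]; /* edge[a][b]: b taken while a held */
    int held[MAX_CLASSES];               /* stack of currently held classes */
    int nheld;
};

void tracker_init(struct order_tracker *t)
{
    memset(t, 0, sizeof(*t));
}

/* Depth-first search: is "to" reachable from "from" via recorded edges? */
bool tracker_reachable(const struct order_tracker *t, int from, int to,
                       bool *seen)
{
    if (from == to)
        return true;
    seen[from] = true;
    for (int next = 0; next < MAX_CLASSES; next++)
        if (t->edge[from][next] && !seen[next] &&
            tracker_reachable(t, next, to, seen))
            return true;
    return false;
}

/* Returns false if taking "cls" now would close a cycle in the order graph. */
bool tracker_acquire(struct order_tracker *t, int cls)
{
    assert(cls >= 0 && cls < MAX_CLASSES && t->nheld < MAX_CLASSES);
    for (int i = 0; i < t->nheld; i++) {
        bool seen[MAX_CLASSES] = { false };

        /* A path cls -> held[i] plus the new edge held[i] -> cls is a cycle. */
        if (tracker_reachable(t, cls, t->held[i], seen))
            return false;
    }
    for (int i = 0; i < t->nheld; i++)
        t->edge[t->held[i]][cls] = true;
    t->held[t->nheld++] = cls;
    return true;
}

/* Releases the most recently acquired lock class. */
void tracker_release(struct order_tracker *t)
{
    assert(t->nheld > 0);
    t->nheld--;
}
```

Replaying the two paths from the splat, TEAM_LOCK then DEV_LOCK (the
team_add_slave path) is accepted, and a later DEV_LOCK then TEAM_LOCK
(the netns change path) is rejected. The same check also rejects taking
one class twice, which is the recursive case from the first report.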


[1] https://lore.kernel.org/netdev/Z-IrMQQ-mnQJzGyL@mini-arch/T/#t
[2]
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 754f60fb6e25..77e5705ac799 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -281,7 +281,7 @@ static struct in_device *inetdev_init(struct net_device *dev)
        if (!in_dev->arp_parms)
                goto out_kfree;
        if (IPV4_DEVCONF(in_dev->cnf, FORWARDING))
-               dev_disable_lro(dev);
+               netif_disable_lro(dev);
        /* Reference in_dev->dev */
        netdev_hold(dev, &in_dev->dev_tracker, GFP_KERNEL);
        /* Account for reference dev->ip_ptr (below) */
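
The trade-off behind change [2] can be sketched with a toy model. It
assumes that the netif_ variant expects the caller to already hold
dev->lock, which the WARNING at include/net/netdev_lock.h:54 above
suggests but this thread does not state outright. toy_dev and the toy_*
functions are hypothetical and only mimic the shape of the real helpers.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct net_device. */
struct toy_dev {
    bool lock_held;     /* models dev->lock */
    bool lro_enabled;
    int warnings;       /* counts "lock not held" warnings */
};

/* Models the netif_ variant: assumes the caller holds the lock. */
void toy_netif_disable_lro(struct toy_dev *dev)
{
    if (!dev->lock_held)
        dev->warnings++;    /* the kernel would WARN here */
    dev->lro_enabled = false;
}

/* Models the dev_ variant: takes the lock, then calls the netif_ variant. */
int toy_dev_disable_lro(struct toy_dev *dev)
{
    if (dev->lock_held)
        return -EDEADLK;    /* a real mutex would self-deadlock */
    dev->lock_held = true;
    toy_netif_disable_lro(dev);
    assert(!dev->lro_enabled);
    dev->lock_held = false;
    return 0;
}

/* Models a notifier switched to the netif_ variant, as in [2]. */
void toy_inetdev_init(struct toy_dev *dev)
{
    toy_netif_disable_lro(dev);
}
```

In this model the notifier is quiet when run with the lock held and
warns when run without it. That mirrors the thread: [2] removes the
recursive locking report but exposes the warning on the path where the
notifier runs outside dev->lock.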


* Re: Report deadlock in the latest net-next
From: Stanislav Fomichev @ 2025-03-25 12:45 UTC
  To: Taehee Yoo
  Cc: Netdev, Stanislav Fomichev, David Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman

On 03/25, Taehee Yoo wrote:
> On Tue, Mar 25, 2025 at 1:57 AM Stanislav Fomichev <stfomichev@gmail.com> wrote:
> >
> 
> Hi Stanislav,
> Thanks a lot for your reply.
> 
> > >  rtnl_newlink+0x108d/0x1c60
> > >  ? __pfx_rtnl_newlink+0x10/0x10
> > >  ? mark_lock.part.0+0xfa/0x2f60
> > >  ? __pfx___lock_acquire+0x10/0x10
> > >  ? __pfx_mark_lock.part.0+0x10/0x10
> > >  ? __pfx_lock_release+0x10/0x10
> > >  ? __pfx_rtnl_newlink+0x10/0x10
> > >  rtnetlink_rcv_msg+0x71c/0xc10
> > >  ? __pfx_rtnetlink_rcv_msg+0x10/0x10
> > >  ? check_chain_key+0x1c1/0x520
> > >  ? __pfx___lock_acquire+0x10/0x10
> > >  netlink_rcv_skb+0x12c/0x360
> > >  ? __pfx_rtnetlink_rcv_msg+0x10/0x10
> > >  ? __pfx_netlink_rcv_skb+0x10/0x10
> > >  ? netlink_deliver_tap+0xcb/0x9e0
> > >  ? netlink_deliver_tap+0x14b/0x9e0
> > >  netlink_unicast+0x447/0x710
> > >  ? __pfx_netlink_unicast+0x10/0x10
> > >  netlink_sendmsg+0x712/0xbc0
> > >  ? __pfx_netlink_sendmsg+0x10/0x10
> > >  ? _copy_from_user+0x3e/0xa0
> > >  ____sys_sendmsg+0x7ab/0xa10
> > >  ? __pfx_____sys_sendmsg+0x10/0x10
> > >  ? __pfx_copy_msghdr_from_user+0x10/0x10
> > >  ___sys_sendmsg+0xee/0x170
> > >  ? __pfx___lock_acquire+0x10/0x10
> > >  ? kasan_save_stack+0x20/0x40
> > >  ? __pfx____sys_sendmsg+0x10/0x10
> > >  ? entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > >  ? kasan_save_stack+0x30/0x40
> > >  ? __pfx_lock_release+0x10/0x10
> > >  ? __might_fault+0xbf/0x170
> > >  __sys_sendmsg+0x105/0x190
> > >  ? __pfx___sys_sendmsg+0x10/0x10
> > >  ? rseq_syscall+0xc3/0x130
> > >  do_syscall_64+0x64/0x140
> > >  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > > RIP: 0033:0x7fd20f92c004
> > > Code: 15 19 6e 0d 00 f7 d8 64 89 02 b8 ff ff ff ff eb bf 0f 1f 44 00
> > > 00 f3 0f 1e fa 80 3d 45 f0 0d 00 00 74 13 b8 2e 00 00 00 0f 05 <48> 3d
> > > 005
> > > RSP: 002b:00007fff40636e68 EFLAGS: 00000202 ORIG_RAX: 000000000000002e
> > > RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fd20f92c004
> > > RDX: 0000000000000000 RSI: 00007fff40636ee0 RDI: 0000000000000003
> > > RBP: 00007fff40636f50 R08: 0000000067d7b7e9 R09: 0000000000000050
> > > R10: 0000000000000001 R11: 0000000000000202 R12: 0000000000000003
> > > R13: 0000000067d7b7ea R14: 000055d14b9e4040 R15: 0000000000000000
> > >
> > > Thanks a lot!
> > > Taehee Yoo
> >
> > Sorry, I completely missed that, I think this is similar to:
> >
> > https://lore.kernel.org/netdev/Z-GDBlDsnPyc21RM@mini-arch/T/#u
> >
> > ?
> >
> > Can you give it a quick test with the patches from that link?
> 
> I applied two changes [1] and [2].
> The above case seems to be fixed.
> But I found a new splat when netdevsim interface was created,
> which was already reported from that link.

Thanks for testing! Yeah, I'm still looking into it. I ended up
adding ops lock around NETDEV_REGISTER and NETDEV_UP, but I
think something is still not right.
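
To make the two failure modes in this thread concrete, here is a
minimal userspace sketch in Python. It is a toy model, not kernel code.
It assumes, as the traces suggest, that dev_disable_lro() takes the
instance lock itself while netif_disable_lro() expects the caller to
already hold it. Device, dev_lock(), dev_unlock() and change_netns()
are hypothetical names made up for the illustration.

```python
import threading


class Device:
    """Toy stand-in for a net_device with a non-recursive instance lock."""

    def __init__(self, name):
        self.name = name
        self.lock = threading.Lock()  # non-recursive, like a kernel mutex
        self.owner = None             # thread ident of the current holder
        self.lro = True


def dev_lock(dev):
    """Take the instance lock; raise instead of hanging on self-recursion."""
    me = threading.get_ident()
    if dev.owner == me:
        raise RuntimeError(f"recursive locking of {dev.name} lock")
    dev.lock.acquire()
    dev.owner = me


def dev_unlock(dev):
    """Drop the instance lock."""
    dev.owner = None
    dev.lock.release()


def netif_disable_lro(dev):
    """Caller must already hold the instance lock (models the WARN)."""
    if dev.owner != threading.get_ident():
        raise AssertionError(f"{dev.name} lock not held")
    dev.lro = False


def dev_disable_lro(dev):
    """Takes the instance lock, then calls the lock-held variant."""
    dev_lock(dev)
    try:
        netif_disable_lro(dev)
    finally:
        dev_unlock(dev)


def change_netns(dev, notifier):
    """Hold the lock across a notifier call, like the setlink path above."""
    dev_lock(dev)
    try:
        notifier(dev)
    finally:
        dev_unlock(dev)
```

In this model, change_netns(dev, dev_disable_lro) raises RuntimeError
(the first splat), while calling netif_disable_lro(dev) from a path
that does not hold the lock raises AssertionError (the warning at
netdev_lock.h). A notifier can be reached from both kinds of paths,
which is why neither variant alone is safe there.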

> ------------[ cut here ]------------
> WARNING: CPU: 1 PID: 1448 at ./include/net/netdev_lock.h:54
> __netdev_update_features+0x894/0x1550
> Modules linked in: netdevsim veth xt_nat xt_tcpudp xt_conntrack
> nft_chain_nat xt_MASQUERADE nf_cos
> CPU: 1 UID: 0 PID: 1448 Comm: bash Not tainted 6.14.0-rc7+ #74
> 0e3a9c04b78c7bd4fd13f140e1c89a83e53
> Hardware name: ASUS System Product Name/PRIME Z690-P D4, BIOS 0603
> 11/01/2021
> RIP: 0010:__netdev_update_features+0x894/0x1550
> Code: ff 0f 1f 44 00 00 48 f7 d0 49 21 c4 e9 4d fa ff ff 48 8d bd 90
> 0d 00 00 be ff ff ff ff e8 e0
> RSP: 0018:ffff88825cc3f230 EFLAGS: 00010246
> RAX: 0000000000000000 RBX: ffff8881e1f72000 RCX: 0000000000000001
> RDX: 0000000000000006 RSI: ffffffff90ac4960 RDI: ffffffff90d73280
> RBP: ffff8881e1f72000 R08: 0000000000000000 R09: fffffbfff327743c
> R10: 0000000000000001 R11: 0000000000000001 R12: ffff88815ad84000
> R13: ffff88815ad84168 R14: 0000000000000005 R15: 1ffff1104b987e6c
> FS:  00007f64f7c8a740(0000) GS:ffff88881b200000(0000)
> knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007ffdaa5c07c8 CR3: 00000001e1af0000 CR4: 00000000007506f0
> PKRU: 55555554
> Call Trace:
>  <TASK>
>  ? __warn+0xcd/0x2f0
>  ? __netdev_update_features+0x894/0x1550
>  ? report_bug+0x326/0x3c0
>  ? handle_bug+0x53/0xa0
>  ? exc_invalid_op+0x14/0x50
>  ? asm_exc_invalid_op+0x16/0x20
>  ? __netdev_update_features+0x894/0x1550
>  ? check_chain_key+0x1c1/0x520
>  ? __pfx___netdev_update_features+0x10/0x10
>  ? __pfx_lock_release+0x10/0x10
>  netif_disable_lro+0x90/0x520
>  ? __pfx_netif_disable_lro+0x10/0x10
>  ? lockdep_hardirqs_on+0xbe/0x140
>  ? neigh_parms_alloc+0x36b/0x4f0
>  ? __local_bh_enable_ip+0xa5/0x120
>  ? neigh_parms_alloc+0x36b/0x4f0
>  inetdev_init+0x2d1/0x4a0
>  inetdev_event+0x9b3/0x1590
>  ? __pfx_nsim_dev_netdevice_event+0x10/0x10 [netdevsim
> 56c6fb92f9ab7ad97a5f7886b4a8c456dda09181]
>  ? __pfx_nsim_dev_netdevice_event+0x10/0x10 [netdevsim
> 56c6fb92f9ab7ad97a5f7886b4a8c456dda09181]
>  ? __pfx_nsim_dev_netdevice_event+0x10/0x10 [netdevsim
> 56c6fb92f9ab7ad97a5f7886b4a8c456dda09181]
>  ? __pfx_nsim_dev_netdevice_event+0x10/0x10 [netdevsim
> 56c6fb92f9ab7ad97a5f7886b4a8c456dda09181]
>  ? __module_address.part.0+0x6a/0x220
>  ? __pfx_inetdev_event+0x10/0x10
>  ? notifier_call_chain+0x9b/0x300
> 
> But I found a new deadlock.
> Reproducer:
>    modprobe netdevsim
>    ip netns add ns_test
>    echo 1 > /sys/bus/netdevsim/new_device
>    ip link add bond0 type bond
>    ip link set $interface master bond0
>    ip link set $interface netns ns_test
> 
> Splat:
> ============================================
> WARNING: possible recursive locking detected
> 6.14.0-rc7+ #74 Tainted: G        W
> --------------------------------------------
> ip/1876 is trying to acquire lock:
> ffff8881e1f72d90 (&dev->lock){+.+.}-{4:4}, at: dev_close+0x81/0x1f0
> 
> but task is already holding lock:
> ffff8881e1f72d90 (&dev->lock){+.+.}-{4:4}, at:
> do_setlink.constprop.0+0x12a/0x3410
> 
> other info that might help us debug this:
>  Possible unsafe locking scenario:
> 
>        CPU0
>        ----
>   lock(&dev->lock);
>   lock(&dev->lock);
> 
>  *** DEADLOCK ***
> 
>  May be due to missing lock nesting notation
> 
> 3 locks held by ip/1876:
>  #0: ffffffff993ba250 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_newlink+0x6b4/0x1c60
>  #1: ffff88816736e230 (&net->rtnl_mutex){+.+.}-{4:4}, at:
> rtnl_newlink+0x6f6/0x1c60
>  #2: ffff8881e1f72d90 (&dev->lock){+.+.}-{4:4}, at:
> do_setlink.constprop.0+0x12a/0x3410
> 
> stack backtrace:
> CPU: 1 UID: 0 PID: 1876 Comm: ip Tainted: G        W
> 6.14.0-rc7+ #74 0e3a9c04b78c7bd4fd13
> Tainted: [W]=WARN
> Call Trace:
>  <TASK>
>  dump_stack_lvl+0x7e/0xc0
>  print_deadlock_bug+0x4fd/0x8e0
>  __lock_acquire+0x3082/0x4fd0
>  ? __pfx___lock_acquire+0x10/0x10
>  ? __pfx_lock_release+0x10/0x10
>  lock_acquire+0x1b0/0x570
>  ? dev_close+0x81/0x1f0
>  ? __pfx_bond_netdev_event+0x10/0x10 [bonding
> b66920a8cbfc9c0d4b32a75d6048c0ac5533c0d4]
>  ? __pfx_lock_acquire+0x10/0x10
>  ? __pfx_bond_netdev_event+0x10/0x10 [bonding
> b66920a8cbfc9c0d4b32a75d6048c0ac5533c0d4]
>  ? __pfx_bond_netdev_event+0x10/0x10 [bonding
> b66920a8cbfc9c0d4b32a75d6048c0ac5533c0d4]
>  __mutex_lock+0x17c/0x17c0
>  ? dev_close+0x81/0x1f0
>  ? dev_close+0x81/0x1f0
>  ? __pfx_netdev_change_features+0x10/0x10
>  ? __pfx___mutex_lock+0x10/0x10
>  ? __module_text_address+0x36/0x170
>  ? preempt_count_add+0x7d/0x150
>  ? ip6_route_dev_notify+0x37/0x670
>  ? notifier_call_chain+0x9b/0x300
>  ? dev_close+0x81/0x1f0
>  dev_close+0x81/0x1f0
>  __bond_release_one+0x888/0x1610 [bonding
> b66920a8cbfc9c0d4b32a75d6048c0ac5533c0d4]
>  ? __mutex_lock+0x935/0x17c0
>  ? nf_tables_flowtable_event+0x97/0x480 [nf_tables
> 1445783a301bcd3ec7ca4a0703efdcd50d4aca3a]
>  ? __pfx___bond_release_one+0x10/0x10 [bonding
> b66920a8cbfc9c0d4b32a75d6048c0ac5533c0d4]
>  ? nft_offload_netdev_event+0xce/0x3a0 [nf_tables
> 1445783a301bcd3ec7ca4a0703efdcd50d4aca3a]
>  ? __mutex_unlock_slowpath+0x15d/0x650
>  ? __pfx___mutex_unlock_slowpath+0x10/0x10
>  ? __pfx_bond_netdev_event+0x10/0x10 [bonding
> b66920a8cbfc9c0d4b32a75d6048c0ac5533c0d4]
>  ? __pfx_bond_netdev_event+0x10/0x10 [bonding
> b66920a8cbfc9c0d4b32a75d6048c0ac5533c0d4]
>  ? __pfx_bond_netdev_event+0x10/0x10 [bonding
> b66920a8cbfc9c0d4b32a75d6048c0ac5533c0d4]
>  ? __pfx_bond_netdev_event+0x10/0x10 [bonding
> b66920a8cbfc9c0d4b32a75d6048c0ac5533c0d4]
>  ? __module_address.part.0+0x6a/0x220
>  bond_netdev_event+0x91b/0xab0 [bonding
> b66920a8cbfc9c0d4b32a75d6048c0ac5533c0d4]
>  notifier_call_chain+0x9b/0x300
>  netif_change_net_namespace+0x43f/0x1390
>  ? __pfx_netif_change_net_namespace+0x10/0x10
>  ? __pfx_validate_linkmsg+0x10/0x10
>  ? __pfx___lock_acquire+0x10/0x10
>  do_setlink.constprop.0+0x241/0x3410
> 
> Reproducer2:
>    modprobe netdevsim
>    ip netns add ns_test
>    echo 1 > /sys/bus/netdevsim/new_device
>    ip link add team0 type team
>    ip link set $interface master team0
>    ip link set $interface netns ns_test
> 
> Splat:
> ======================================================
> WARNING: possible circular locking dependency detected
> 6.14.0-rc7+ #74 Tainted: G        W
> ------------------------------------------------------
> ip/2036 is trying to acquire lock:
> ffff88812fccae88 (team->team_lock_key){+.+.}-{4:4}, at:
> team_device_event+0x101/0x690 [team]
> 
> but task is already holding lock:
> ffff8881947a2d90 (&dev->lock){+.+.}-{4:4}, at:
> do_setlink.constprop.0+0x12a/0x3410
> 
> which lock already depends on the new lock.
> 
> 
> the existing dependency chain (in reverse order) is:
> 
> -> #1 (&dev->lock){+.+.}-{4:4}:
>        lock_acquire+0x1b0/0x570
>        __mutex_lock+0x17c/0x17c0
>        dev_set_mtu+0x86/0x210
>        team_add_slave+0x802/0x1e00 [team]
>        do_set_master+0x363/0x6d0
>        do_setlink.constprop.0+0x86f/0x3410
>        rtnl_newlink+0x108d/0x1c60
>        rtnetlink_rcv_msg+0x71c/0xc10
>        netlink_rcv_skb+0x12c/0x360
>        netlink_unicast+0x447/0x710
>        netlink_sendmsg+0x712/0xbc0
>        ____sys_sendmsg+0x7ab/0xa10
>        ___sys_sendmsg+0xee/0x170
>        __sys_sendmsg+0x105/0x190
>        do_syscall_64+0x64/0x140
>        entry_SYSCALL_64_after_hwframe+0x76/0x7e
> 
> -> #0 (team->team_lock_key){+.+.}-{4:4}:
>        check_prev_add+0x1b7/0x2360
>        __lock_acquire+0x32ab/0x4fd0
>        lock_acquire+0x1b0/0x570
>        __mutex_lock+0x17c/0x17c0
>        team_device_event+0x101/0x690 [team]
>        notifier_call_chain+0x9b/0x300
>        dev_close_many+0x2c4/0x5a0
>        netif_close+0x147/0x1e0
>        netif_change_net_namespace+0x3a9/0x1390
>        do_setlink.constprop.0+0x241/0x3410
>        rtnl_newlink+0x108d/0x1c60
>        rtnetlink_rcv_msg+0x71c/0xc10
>        netlink_rcv_skb+0x12c/0x360
>        netlink_unicast+0x447/0x710
>        netlink_sendmsg+0x712/0xbc0
>        ____sys_sendmsg+0x7ab/0xa10
>        ___sys_sendmsg+0xee/0x170
>        __sys_sendmsg+0x105/0x190
>        do_syscall_64+0x64/0x140
>        entry_SYSCALL_64_after_hwframe+0x76/0x7e

This is interesting, haven't seen this one. Looks like team_device_event
handling NETDEV_DOWN, which grabs team->lock.
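
For reference, the inversion in that last splat can be reproduced with
a small lock-order tracker. This is a rough Python sketch of the idea
behind the report, not how lockdep is actually implemented. The class
name LockOrderTracker and the plain-string lock names are assumptions
for the illustration. It also assumes, as the dependency chain implies,
that the enslave path holds the team lock when dev_set_mtu() takes
dev->lock.

```python
from collections import defaultdict


class LockOrderTracker:
    """Records 'held -> acquired' edges and flags cycle-closing acquires."""

    def __init__(self):
        self.edges = defaultdict(set)  # lock -> locks taken while holding it
        self.held = []                 # locks currently held, in order

    def _reachable(self, src, dst):
        """Depth-first search: can dst be reached from src via edges?"""
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(self.edges[node])
        return False

    def acquire(self, name):
        """Return a list of warnings (empty if the order looks safe)."""
        warnings = []
        for h in self.held:
            if h == name:
                warnings.append(f"recursive: {name}")
                continue
            # An existing path name -> ... -> h means h -> name closes a cycle.
            if self._reachable(name, h):
                warnings.append(f"circular: {h} -> {name}")
            self.edges[h].add(name)
        self.held.append(name)
        return warnings

    def release(self, name):
        self.held.remove(name)


t = LockOrderTracker()

# Enslave path: team lock first, then dev->lock via dev_set_mtu().
t.acquire("team->lock")
t.acquire("dev->lock")
t.release("dev->lock")
t.release("team->lock")

# Netns move: dev->lock first, then team lock from the notifier.
t.acquire("dev->lock")
report = t.acquire("team->lock")
print(report)
```

The second sequence is flagged because the first one already recorded
the opposite order, even though no deadlock happens in this single
run.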


end of thread, other threads:[~2025-03-25 12:45 UTC | newest]

Thread overview: 4+ messages
2025-03-17  6:17 Report deadlock in the latest net-next Taehee Yoo
2025-03-24 16:57 ` Stanislav Fomichev
2025-03-25  4:36   ` Taehee Yoo
2025-03-25 12:45     ` Stanislav Fomichev
