* Report deadlock in the latest net-next
@ 2025-03-17 6:17 Taehee Yoo
2025-03-24 16:57 ` Stanislav Fomichev
0 siblings, 1 reply; 4+ messages in thread
From: Taehee Yoo @ 2025-03-17 6:17 UTC (permalink / raw)
To: Netdev, Stanislav Fomichev
Cc: David Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman
Hi Stanislav,
I found a deadlock in the latest net-next kernel.
The call trace points to your recent
commit ad7c7b2172c3 ("net: hold netdev instance lock during sysfs operations").
dev->lock is first acquired in do_setlink.constprop.0+0x12a/0x3440,
which is net/core/rtnetlink.c:3025.
It is then acquired again in dev_disable_lro+0x81/0x1f0,
which is net/core/dev_api.c:255.
dev_disable_lro() is called from a netdev notifier, but notifiers
seem to be called both outside and inside the dev->lock context.
In this case the notifier is called with dev->lock already held,
so the deadlock occurs.
Could you please look into this?
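To make the pattern concrete, here is a minimal userspace sketch, not kernel code. It assumes a toy single-threaded "lock" that only records whether it is held and returns -EDEADLK where a real non-recursive mutex would hang. All toy_* names are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy, single-threaded stand-in for a mutex: it only records whether it is held. */
struct toy_lock {
	bool held;
};

/* Hypothetical stand-in for a net_device that carries an instance lock. */
struct toy_dev {
	struct toy_lock lock;
	bool lro_enabled;
};

/* Returns -EDEADLK where a real non-recursive mutex would block forever. */
static int toy_lock_acquire(struct toy_lock *l)
{
	if (l->held)
		return -EDEADLK;
	l->held = true;
	return 0;
}

static void toy_lock_release(struct toy_lock *l)
{
	assert(l->held);	/* releasing an unheld lock is a bug in the sketch */
	l->held = false;
}

/* Locking wrapper, shaped like dev_disable_lro(): it takes the lock itself. */
static int toy_disable_lro(struct toy_dev *dev)
{
	int err = toy_lock_acquire(&dev->lock);

	if (err)
		return err;
	dev->lro_enabled = false;
	toy_lock_release(&dev->lock);
	return 0;
}

/* Notifier callback: it cannot tell whether its caller already holds the lock. */
static int toy_notifier(struct toy_dev *dev)
{
	return toy_disable_lro(dev);
}

/* Shaped like the do_setlink() path: holds the lock across the notifier call. */
static int toy_setlink(struct toy_dev *dev)
{
	int err = toy_lock_acquire(&dev->lock);

	if (err)
		return err;
	err = toy_notifier(dev);
	toy_lock_release(&dev->lock);
	return err;
}
```

Called directly on an unlocked device, toy_notifier() returns 0. Called through toy_setlink(), the same notifier returns -EDEADLK, because the wrapper tries to take a lock its caller already holds.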
Reproducer:
modprobe netdevsim
ip netns add ns_test
echo 1 > /sys/bus/netdevsim/new_device
ip link set $interface netns ns_test
============================================
WARNING: possible recursive locking detected
6.14.0-rc6+ #56 Not tainted
--------------------------------------------
ip/1672 is trying to acquire lock:
ffff888231fbad90 (&dev->lock){+.+.}-{4:4}, at: dev_disable_lro+0x81/0x1f0
but task is already holding lock:
ffff888231fbad90 (&dev->lock){+.+.}-{4:4}, at: do_setlink.constprop.0+0x12a/0x3440
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&dev->lock);
lock(&dev->lock);
*** DEADLOCK ***
May be due to missing lock nesting notation
3 locks held by ip/1672:
#0: ffffffff943ba050 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_newlink+0x6b4/0x1c60
#1: ffff88813abc6170 (&net->rtnl_mutex){+.+.}-{4:4}, at: rtnl_newlink+0x6f6/0x1c60
#2: ffff888231fbad90 (&dev->lock){+.+.}-{4:4}, at: do_setlink.constprop.0+0x12a/0x3440
stack backtrace:
CPU: 2 UID: 0 PID: 1672 Comm: ip Not tainted 6.14.0-rc6+ #56 66129e0c5b1b922fef38623168aea99c0593a519
Hardware name: ASUS System Product Name/PRIME Z690-P D4, BIOS 0603 11/01/2021
Call Trace:
<TASK>
dump_stack_lvl+0x7e/0xc0
print_deadlock_bug+0x4fd/0x8e0
__lock_acquire+0x3082/0x4fd0
? __pfx___lock_acquire+0x10/0x10
? mark_lock.part.0+0xfa/0x2f60
? __pfx___lock_acquire+0x10/0x10
? check_chain_key+0x1c1/0x520
lock_acquire+0x1b0/0x570
? dev_disable_lro+0x81/0x1f0
? __pfx_lock_acquire+0x10/0x10
__mutex_lock+0x17c/0x17c0
? dev_disable_lro+0x81/0x1f0
? dev_disable_lro+0x81/0x1f0
? __pfx___mutex_lock+0x10/0x10
? mark_held_locks+0xa5/0xf0
? neigh_parms_alloc+0x36b/0x4f0
? __local_bh_enable_ip+0xa5/0x120
? lockdep_hardirqs_on+0xbe/0x140
? dev_disable_lro+0x81/0x1f0
dev_disable_lro+0x81/0x1f0
inetdev_init+0x2d1/0x4a0
inetdev_event+0x9b3/0x1590
? __pfx_lock_release+0x10/0x10
? __pfx_inetdev_event+0x10/0x10
? notifier_call_chain+0x9b/0x300
notifier_call_chain+0x9b/0x300
netif_change_net_namespace+0xdfe/0x1390
? __pfx_netif_change_net_namespace+0x10/0x10
? __pfx_validate_linkmsg+0x10/0x10
? __pfx___lock_acquire+0x10/0x10
do_setlink.constprop.0+0x241/0x3440
? lock_acquire+0x1b0/0x570
? __pfx_do_setlink.constprop.0+0x10/0x10
? rtnl_newlink+0x6f6/0x1c60
? __pfx_lock_acquired+0x10/0x10
? netlink_sendmsg+0x712/0xbc0
? rcu_is_watching+0x11/0xb0
? trace_contention_end+0xef/0x140
? __mutex_lock+0x935/0x17c0
? __create_object+0x36/0x90
? __pfx_lock_release+0x10/0x10
? rtnl_newlink+0x6f6/0x1c60
? __nla_validate_parse+0xb9/0x2830
? __pfx___mutex_lock+0x10/0x10
? lockdep_hardirqs_on+0xbe/0x140
? __pfx___nla_validate_parse+0x10/0x10
? rcu_is_watching+0x11/0xb0
? cap_capable+0x17d/0x360
? fdget+0x4e/0x1d0
rtnl_newlink+0x108d/0x1c60
? __pfx_rtnl_newlink+0x10/0x10
? mark_lock.part.0+0xfa/0x2f60
? __pfx___lock_acquire+0x10/0x10
? __pfx_mark_lock.part.0+0x10/0x10
? __pfx_lock_release+0x10/0x10
? __pfx_rtnl_newlink+0x10/0x10
rtnetlink_rcv_msg+0x71c/0xc10
? __pfx_rtnetlink_rcv_msg+0x10/0x10
? check_chain_key+0x1c1/0x520
? __pfx___lock_acquire+0x10/0x10
netlink_rcv_skb+0x12c/0x360
? __pfx_rtnetlink_rcv_msg+0x10/0x10
? __pfx_netlink_rcv_skb+0x10/0x10
? netlink_deliver_tap+0xcb/0x9e0
? netlink_deliver_tap+0x14b/0x9e0
netlink_unicast+0x447/0x710
? __pfx_netlink_unicast+0x10/0x10
netlink_sendmsg+0x712/0xbc0
? __pfx_netlink_sendmsg+0x10/0x10
? _copy_from_user+0x3e/0xa0
____sys_sendmsg+0x7ab/0xa10
? __pfx_____sys_sendmsg+0x10/0x10
? __pfx_copy_msghdr_from_user+0x10/0x10
___sys_sendmsg+0xee/0x170
? __pfx___lock_acquire+0x10/0x10
? kasan_save_stack+0x20/0x40
? __pfx____sys_sendmsg+0x10/0x10
? entry_SYSCALL_64_after_hwframe+0x76/0x7e
? kasan_save_stack+0x30/0x40
? __pfx_lock_release+0x10/0x10
? __might_fault+0xbf/0x170
__sys_sendmsg+0x105/0x190
? __pfx___sys_sendmsg+0x10/0x10
? rseq_syscall+0xc3/0x130
do_syscall_64+0x64/0x140
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7fd20f92c004
Code: 15 19 6e 0d 00 f7 d8 64 89 02 b8 ff ff ff ff eb bf 0f 1f 44 00
00 f3 0f 1e fa 80 3d 45 f0 0d 00 00 74 13 b8 2e 00 00 00 0f 05 <48> 3d
005
RSP: 002b:00007fff40636e68 EFLAGS: 00000202 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fd20f92c004
RDX: 0000000000000000 RSI: 00007fff40636ee0 RDI: 0000000000000003
RBP: 00007fff40636f50 R08: 0000000067d7b7e9 R09: 0000000000000050
R10: 0000000000000001 R11: 0000000000000202 R12: 0000000000000003
R13: 0000000067d7b7ea R14: 000055d14b9e4040 R15: 0000000000000000
Thanks a lot!
Taehee Yoo
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: Report deadlock in the latest net-next
2025-03-17 6:17 Report deadlock in the latest net-next Taehee Yoo
@ 2025-03-24 16:57 ` Stanislav Fomichev
2025-03-25 4:36 ` Taehee Yoo
0 siblings, 1 reply; 4+ messages in thread
From: Stanislav Fomichev @ 2025-03-24 16:57 UTC (permalink / raw)
To: Taehee Yoo
Cc: Netdev, Stanislav Fomichev, David Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman
On 03/17, Taehee Yoo wrote:
> Hi Stanislav,
> I found a deadlock in the latest net-next kernel.
> The calltrace indicates your current
> commit ad7c7b2172c3 ("net: hold netdev instance lock during sysfs operations").
> The dev->lock was acquired in do_setlink.constprop.0+0x12a/0x3440,
> which is net/core/rtnetlink.c:3025
> And then dev->lock is acquired in dev_disable_lro+0x81/0x1f0,
> which is /net/core/dev_api.c:255
> dev_disable_lro() is called by netdev notification, but notification
> seems to be called both outside and inside dev->lock context.
> This case is that netdev notification is called inside dev->lock context.
> So deadlock occurs.
> Could you please look into this?
Sorry, I completely missed this. I think it is similar to the issue discussed here:
https://lore.kernel.org/netdev/Z-GDBlDsnPyc21RM@mini-arch/T/#u
Can you give it a quick test with the patches from that link?
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: Report deadlock in the latest net-next
2025-03-24 16:57 ` Stanislav Fomichev
@ 2025-03-25 4:36 ` Taehee Yoo
2025-03-25 12:45 ` Stanislav Fomichev
0 siblings, 1 reply; 4+ messages in thread
From: Taehee Yoo @ 2025-03-25 4:36 UTC (permalink / raw)
To: Stanislav Fomichev
Cc: Netdev, Stanislav Fomichev, David Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman
On Tue, Mar 25, 2025 at 1:57 AM Stanislav Fomichev <stfomichev@gmail.com> wrote:
>
Hi Stanislav,
Thanks a lot for your reply.
> Sorry, I completely missed that, I think this is similar to:
>
> https://lore.kernel.org/netdev/Z-GDBlDsnPyc21RM@mini-arch/T/#u
>
> ?
>
> Can you give it a quick test with the patches from that link?
I applied two changes, [1] and [2].
The case above seems to be fixed.
However, I now see a splat when the netdevsim interface is created,
which was already reported in that thread.
------------[ cut here ]------------
WARNING: CPU: 1 PID: 1448 at ./include/net/netdev_lock.h:54 __netdev_update_features+0x894/0x1550
Modules linked in: netdevsim veth xt_nat xt_tcpudp xt_conntrack nft_chain_nat xt_MASQUERADE nf_cos
CPU: 1 UID: 0 PID: 1448 Comm: bash Not tainted 6.14.0-rc7+ #74 0e3a9c04b78c7bd4fd13f140e1c89a83e53
Hardware name: ASUS System Product Name/PRIME Z690-P D4, BIOS 0603 11/01/2021
RIP: 0010:__netdev_update_features+0x894/0x1550
Code: ff 0f 1f 44 00 00 48 f7 d0 49 21 c4 e9 4d fa ff ff 48 8d bd 90
0d 00 00 be ff ff ff ff e8 e0
RSP: 0018:ffff88825cc3f230 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8881e1f72000 RCX: 0000000000000001
RDX: 0000000000000006 RSI: ffffffff90ac4960 RDI: ffffffff90d73280
RBP: ffff8881e1f72000 R08: 0000000000000000 R09: fffffbfff327743c
R10: 0000000000000001 R11: 0000000000000001 R12: ffff88815ad84000
R13: ffff88815ad84168 R14: 0000000000000005 R15: 1ffff1104b987e6c
FS: 00007f64f7c8a740(0000) GS:ffff88881b200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffdaa5c07c8 CR3: 00000001e1af0000 CR4: 00000000007506f0
PKRU: 55555554
Call Trace:
<TASK>
? __warn+0xcd/0x2f0
? __netdev_update_features+0x894/0x1550
? report_bug+0x326/0x3c0
? handle_bug+0x53/0xa0
? exc_invalid_op+0x14/0x50
? asm_exc_invalid_op+0x16/0x20
? __netdev_update_features+0x894/0x1550
? check_chain_key+0x1c1/0x520
? __pfx___netdev_update_features+0x10/0x10
? __pfx_lock_release+0x10/0x10
netif_disable_lro+0x90/0x520
? __pfx_netif_disable_lro+0x10/0x10
? lockdep_hardirqs_on+0xbe/0x140
? neigh_parms_alloc+0x36b/0x4f0
? __local_bh_enable_ip+0xa5/0x120
? neigh_parms_alloc+0x36b/0x4f0
inetdev_init+0x2d1/0x4a0
inetdev_event+0x9b3/0x1590
? __pfx_nsim_dev_netdevice_event+0x10/0x10 [netdevsim 56c6fb92f9ab7ad97a5f7886b4a8c456dda09181]
? __pfx_nsim_dev_netdevice_event+0x10/0x10 [netdevsim 56c6fb92f9ab7ad97a5f7886b4a8c456dda09181]
? __pfx_nsim_dev_netdevice_event+0x10/0x10 [netdevsim 56c6fb92f9ab7ad97a5f7886b4a8c456dda09181]
? __pfx_nsim_dev_netdevice_event+0x10/0x10 [netdevsim 56c6fb92f9ab7ad97a5f7886b4a8c456dda09181]
? __module_address.part.0+0x6a/0x220
? __pfx_inetdev_event+0x10/0x10
? notifier_call_chain+0x9b/0x300
In addition, I found a new deadlock.
Reproducer:
modprobe netdevsim
ip netns add ns_test
echo 1 > /sys/bus/netdevsim/new_device
ip link add bond0 type bond
ip link set $interface master bond0
ip link set $interface netns ns_test
Splat:
============================================
WARNING: possible recursive locking detected
6.14.0-rc7+ #74 Tainted: G W
--------------------------------------------
ip/1876 is trying to acquire lock:
ffff8881e1f72d90 (&dev->lock){+.+.}-{4:4}, at: dev_close+0x81/0x1f0
but task is already holding lock:
ffff8881e1f72d90 (&dev->lock){+.+.}-{4:4}, at: do_setlink.constprop.0+0x12a/0x3410
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&dev->lock);
lock(&dev->lock);
*** DEADLOCK ***
May be due to missing lock nesting notation
3 locks held by ip/1876:
#0: ffffffff993ba250 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_newlink+0x6b4/0x1c60
#1: ffff88816736e230 (&net->rtnl_mutex){+.+.}-{4:4}, at: rtnl_newlink+0x6f6/0x1c60
#2: ffff8881e1f72d90 (&dev->lock){+.+.}-{4:4}, at: do_setlink.constprop.0+0x12a/0x3410
stack backtrace:
CPU: 1 UID: 0 PID: 1876 Comm: ip Tainted: G W 6.14.0-rc7+ #74 0e3a9c04b78c7bd4fd13
Tainted: [W]=WARN
Call Trace:
<TASK>
dump_stack_lvl+0x7e/0xc0
print_deadlock_bug+0x4fd/0x8e0
__lock_acquire+0x3082/0x4fd0
? __pfx___lock_acquire+0x10/0x10
? __pfx_lock_release+0x10/0x10
lock_acquire+0x1b0/0x570
? dev_close+0x81/0x1f0
? __pfx_bond_netdev_event+0x10/0x10 [bonding b66920a8cbfc9c0d4b32a75d6048c0ac5533c0d4]
? __pfx_lock_acquire+0x10/0x10
? __pfx_bond_netdev_event+0x10/0x10 [bonding b66920a8cbfc9c0d4b32a75d6048c0ac5533c0d4]
? __pfx_bond_netdev_event+0x10/0x10 [bonding b66920a8cbfc9c0d4b32a75d6048c0ac5533c0d4]
__mutex_lock+0x17c/0x17c0
? dev_close+0x81/0x1f0
? dev_close+0x81/0x1f0
? __pfx_netdev_change_features+0x10/0x10
? __pfx___mutex_lock+0x10/0x10
? __module_text_address+0x36/0x170
? preempt_count_add+0x7d/0x150
? ip6_route_dev_notify+0x37/0x670
? notifier_call_chain+0x9b/0x300
? dev_close+0x81/0x1f0
dev_close+0x81/0x1f0
__bond_release_one+0x888/0x1610 [bonding b66920a8cbfc9c0d4b32a75d6048c0ac5533c0d4]
? __mutex_lock+0x935/0x17c0
? nf_tables_flowtable_event+0x97/0x480 [nf_tables 1445783a301bcd3ec7ca4a0703efdcd50d4aca3a]
? __pfx___bond_release_one+0x10/0x10 [bonding b66920a8cbfc9c0d4b32a75d6048c0ac5533c0d4]
? nft_offload_netdev_event+0xce/0x3a0 [nf_tables 1445783a301bcd3ec7ca4a0703efdcd50d4aca3a]
? __mutex_unlock_slowpath+0x15d/0x650
? __pfx___mutex_unlock_slowpath+0x10/0x10
? __pfx_bond_netdev_event+0x10/0x10 [bonding b66920a8cbfc9c0d4b32a75d6048c0ac5533c0d4]
? __pfx_bond_netdev_event+0x10/0x10 [bonding b66920a8cbfc9c0d4b32a75d6048c0ac5533c0d4]
? __pfx_bond_netdev_event+0x10/0x10 [bonding b66920a8cbfc9c0d4b32a75d6048c0ac5533c0d4]
? __pfx_bond_netdev_event+0x10/0x10 [bonding b66920a8cbfc9c0d4b32a75d6048c0ac5533c0d4]
? __module_address.part.0+0x6a/0x220
bond_netdev_event+0x91b/0xab0 [bonding b66920a8cbfc9c0d4b32a75d6048c0ac5533c0d4]
notifier_call_chain+0x9b/0x300
netif_change_net_namespace+0x43f/0x1390
? __pfx_netif_change_net_namespace+0x10/0x10
? __pfx_validate_linkmsg+0x10/0x10
? __pfx___lock_acquire+0x10/0x10
do_setlink.constprop.0+0x241/0x3410
Reproducer2:
modprobe netdevsim
ip netns add ns_test
echo 1 > /sys/bus/netdevsim/new_device
ip link add team0 type team
ip link set $interface master team0
ip link set $interface netns ns_test
Splat:
======================================================
WARNING: possible circular locking dependency detected
6.14.0-rc7+ #74 Tainted: G W
------------------------------------------------------
ip/2036 is trying to acquire lock:
ffff88812fccae88 (team->team_lock_key){+.+.}-{4:4}, at: team_device_event+0x101/0x690 [team]
but task is already holding lock:
ffff8881947a2d90 (&dev->lock){+.+.}-{4:4}, at: do_setlink.constprop.0+0x12a/0x3410
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&dev->lock){+.+.}-{4:4}:
lock_acquire+0x1b0/0x570
__mutex_lock+0x17c/0x17c0
dev_set_mtu+0x86/0x210
team_add_slave+0x802/0x1e00 [team]
do_set_master+0x363/0x6d0
do_setlink.constprop.0+0x86f/0x3410
rtnl_newlink+0x108d/0x1c60
rtnetlink_rcv_msg+0x71c/0xc10
netlink_rcv_skb+0x12c/0x360
netlink_unicast+0x447/0x710
netlink_sendmsg+0x712/0xbc0
____sys_sendmsg+0x7ab/0xa10
___sys_sendmsg+0xee/0x170
__sys_sendmsg+0x105/0x190
do_syscall_64+0x64/0x140
entry_SYSCALL_64_after_hwframe+0x76/0x7e
-> #0 (team->team_lock_key){+.+.}-{4:4}:
check_prev_add+0x1b7/0x2360
__lock_acquire+0x32ab/0x4fd0
lock_acquire+0x1b0/0x570
__mutex_lock+0x17c/0x17c0
team_device_event+0x101/0x690 [team]
notifier_call_chain+0x9b/0x300
dev_close_many+0x2c4/0x5a0
netif_close+0x147/0x1e0
netif_change_net_namespace+0x3a9/0x1390
do_setlink.constprop.0+0x241/0x3410
rtnl_newlink+0x108d/0x1c60
rtnetlink_rcv_msg+0x71c/0xc10
netlink_rcv_skb+0x12c/0x360
netlink_unicast+0x447/0x710
netlink_sendmsg+0x712/0xbc0
____sys_sendmsg+0x7ab/0xa10
___sys_sendmsg+0xee/0x170
__sys_sendmsg+0x105/0x190
do_syscall_64+0x64/0x140
entry_SYSCALL_64_after_hwframe+0x76/0x7e
other info that might help us debug this:
Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(&dev->lock);
                               lock(team->team_lock_key);
                               lock(&dev->lock);
  lock(team->team_lock_key);
*** DEADLOCK ***
3 locks held by ip/2036:
#0: ffffffffa33ba250 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_newlink+0x6b4/0x1c60
#1: ffff888148db1fb0 (&net->rtnl_mutex){+.+.}-{4:4}, at: rtnl_newlink+0x6f6/0x1c60
#2: ffff8881947a2d90 (&dev->lock){+.+.}-{4:4}, at: do_setlink.constprop.0+0x12a/0x3410
stack backtrace:
CPU: 3 UID: 0 PID: 2036 Comm: ip Tainted: G W 6.14.0-rc7+ #74 0e3a9c04b78c7bd4fd13
Tainted: [W]=WARN
Hardware name: ASUS System Product Name/PRIME Z690-P D4, BIOS 0603 11/01/2021
Call Trace:
<TASK>
dump_stack_lvl+0x7e/0xc0
print_circular_bug+0x5bd/0x9b0
check_noncircular+0x31b/0x400
? mark_lock.part.0+0xfa/0x2f60
? __pfx_check_noncircular+0x10/0x10
? __pfx_mark_lock.part.0+0x10/0x10
? __lock_acquire+0x16fa/0x4fd0
check_prev_add+0x1b7/0x2360
__lock_acquire+0x32ab/0x4fd0
? __pfx___lock_acquire+0x10/0x10
? try_to_wake_up+0xb9/0x1600
? __pfx_lock_release+0x10/0x10
lock_acquire+0x1b0/0x570
? team_device_event+0x101/0x690 [team 101c85bbc03feb292be26fbaaf9cee585e6924fa]
? __pfx_lock_acquire+0x10/0x10
? __pfx_mark_lock.part.0+0x10/0x10
__mutex_lock+0x17c/0x17c0
? team_device_event+0x101/0x690 [team 101c85bbc03feb292be26fbaaf9cee585e6924fa]
? team_device_event+0x101/0x690 [team 101c85bbc03feb292be26fbaaf9cee585e6924fa]
? mark_held_locks+0xa5/0xf0
? __pfx___mutex_lock+0x10/0x10
? queue_work_on+0x63/0x90
? lockdep_hardirqs_on+0xbe/0x140
? __pfx_team_device_event+0x10/0x10 [team 101c85bbc03feb292be26fbaaf9cee585e6924fa]
? __pfx_team_device_event+0x10/0x10 [team 101c85bbc03feb292be26fbaaf9cee585e6924fa]
? __pfx_team_device_event+0x10/0x10 [team 101c85bbc03feb292be26fbaaf9cee585e6924fa]
? __pfx_team_device_event+0x10/0x10 [team 101c85bbc03feb292be26fbaaf9cee585e6924fa]
? team_device_event+0x101/0x690 [team 101c85bbc03feb292be26fbaaf9cee585e6924fa]
team_device_event+0x101/0x690 [team 101c85bbc03feb292be26fbaaf9cee585e6924fa]
notifier_call_chain+0x9b/0x300
dev_close_many+0x2c4/0x5a0
? __pfx_lock_release+0x10/0x10
? __pfx_dev_close_many+0x10/0x10
netif_close+0x147/0x1e0
? __pfx_netif_close+0x10/0x10
? rcu_is_watching+0x11/0xb0
netif_change_net_namespace+0x3a9/0x1390
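The lock-order inversion in the team splat can be modelled with a very small order tracker. This is only a sketch under stated assumptions: it is not how lockdep is implemented, it checks direct inversions between two lock classes only, and every toy_* name is hypothetical. It assumes, as the dependency chain in the splat suggests, that the enslave path takes the team lock before dev->lock, while the netns move path takes dev->lock before the team lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Two hypothetical lock classes, standing in for dev->lock and the team lock. */
enum toy_class { TOY_DEV_LOCK, TOY_TEAM_LOCK, TOY_NCLASSES };

struct toy_lockdep {
	bool held[TOY_NCLASSES];
	/* order[a][b] is set once b has been acquired while a was held. */
	bool order[TOY_NCLASSES][TOY_NCLASSES];
};

static void toy_lockdep_init(struct toy_lockdep *ld)
{
	memset(ld, 0, sizeof(*ld));
}

/* Returns false if acquiring c now would be recursive or invert a recorded order. */
static bool toy_acquire(struct toy_lockdep *ld, enum toy_class c)
{
	int h;

	if (ld->held[c])
		return false;	/* recursive acquisition of the same class */
	for (h = 0; h < TOY_NCLASSES; h++)
		if (ld->held[h] && ld->order[c][h])
			return false;	/* c -> h was seen earlier, this would be h -> c */
	for (h = 0; h < TOY_NCLASSES; h++)
		if (ld->held[h])
			ld->order[h][c] = true;	/* remember the new ordering */
	ld->held[c] = true;
	return true;
}

static void toy_release(struct toy_lockdep *ld, enum toy_class c)
{
	assert(ld->held[c]);	/* releasing an unheld lock is a bug in the sketch */
	ld->held[c] = false;
}
```

Replaying the two paths in order (team lock then dev lock, release both, then dev lock then team lock) makes the last toy_acquire() call fail, which corresponds to the CPU0/CPU1 scenario in the report.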
[1] https://lore.kernel.org/netdev/Z-IrMQQ-mnQJzGyL@mini-arch/T/#t
[2]
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 754f60fb6e25..77e5705ac799 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -281,7 +281,7 @@ static struct in_device *inetdev_init(struct net_device *dev)
 	if (!in_dev->arp_parms)
 		goto out_kfree;
 	if (IPV4_DEVCONF(in_dev->cnf, FORWARDING))
-		dev_disable_lro(dev);
+		netif_disable_lro(dev);
 	/* Reference in_dev->dev */
 	netdev_hold(dev, &in_dev->dev_tracker, GFP_KERNEL);
 	/* Account for reference dev->ip_ptr (below) */
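For readers following along, a rough userspace sketch of what change [2] trades off. It assumes, as the traces suggest but the thread does not state outright, that dev_disable_lro() is a wrapper that takes dev->lock around netif_disable_lro(), and that the WARNING at netdev_lock.h:54 is a lock-held assertion. The toy_* names and the warnings counter are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a net_device; lock_held models dev->lock. */
struct toy_dev {
	bool lock_held;
	bool lro_enabled;
	int warnings;	/* counts failed "lock must be held" checks */
};

/* Lock-held variant, shaped like netif_disable_lro(): caller must hold the lock. */
static void toy_netif_disable_lro(struct toy_dev *dev)
{
	if (!dev->lock_held)
		dev->warnings++;	/* stands in for the lock-held assertion */
	dev->lro_enabled = false;
}

/* Locking wrapper, shaped like dev_disable_lro(). */
static void toy_dev_disable_lro(struct toy_dev *dev)
{
	assert(!dev->lock_held);	/* a real mutex would self-deadlock here */
	dev->lock_held = true;
	toy_netif_disable_lro(dev);
	dev->lock_held = false;
}

/* Notifier after change [2]: calls the lock-held variant directly. */
static void toy_notifier(struct toy_dev *dev)
{
	toy_netif_disable_lro(dev);
}

/* Shaped like the do_setlink() path: notifier runs with the lock held. */
static void toy_setlink(struct toy_dev *dev)
{
	dev->lock_held = true;
	toy_notifier(dev);
	dev->lock_held = false;
}

/* Shaped like device creation: notifier runs without the lock (an assumption). */
static void toy_register(struct toy_dev *dev)
{
	toy_notifier(dev);
}
```

In this model the setlink-like path no longer deadlocks and raises no warning, while the register-like path trips the held-lock check once. That matches the pair of observations above: the original case is fixed, but a warning appears when the netdevsim interface is created.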
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: Report deadlock in the latest net-next
2025-03-25 4:36 ` Taehee Yoo
@ 2025-03-25 12:45 ` Stanislav Fomichev
0 siblings, 0 replies; 4+ messages in thread
From: Stanislav Fomichev @ 2025-03-25 12:45 UTC (permalink / raw)
To: Taehee Yoo
Cc: Netdev, Stanislav Fomichev, David Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman
On 03/25, Taehee Yoo wrote:
> On Tue, Mar 25, 2025 at 1:57 AM Stanislav Fomichev <stfomichev@gmail.com> wrote:
> >
>
> Hi Stanislav,
> Thanks a lot for your reply.
>
> > On 03/17, Taehee Yoo wrote:
> > > Hi Stanislav,
> > > I found a deadlock in the latest net-next kernel.
> > > The calltrace indicates your current
> > > commit ad7c7b2172c3 ("net: hold netdev instance lock during sysfs operations").
> > > The dev->lock was acquired in do_setlink.constprop.0+0x12a/0x3440,
> > > which is net/core/rtnetlink.c:3025
> > > And then dev->lock is acquired in dev_disable_lro+0x81/0x1f0,
> > > which is /net/core/dev_api.c:255
> > > dev_disable_lro() is called by netdev notification, but notification
> > > seems to be called both outside and inside dev->lock context.
> > > This case is that netdev notification is called inside dev->lock context.
> > > So deadlock occurs.
> > > Could you please look into this?
> > >
> > > Reproducer:
> > > modprobe netdevsim
> > > ip netns add ns_test
> > > echo 1 > /sys/bus/netdevsim/new_device
> > > ip link set $interface netns ns_test
> > >
> > > ============================================
> > > WARNING: possible recursive locking detected
> > > 6.14.0-rc6+ #56 Not tainted
> > > --------------------------------------------
> > > ip/1672 is trying to acquire lock:
> > > ffff888231fbad90 (&dev->lock){+.+.}-{4:4}, at: dev_disable_lro+0x81/0x1f0
> > >
> > > but task is already holding lock:
> > > ffff888231fbad90 (&dev->lock){+.+.}-{4:4}, at:
> > > do_setlink.constprop.0+0x12a/0x3440
> > >
> > > other info that might help us debug this:
> > > Possible unsafe locking scenario:
> > >
> > > CPU0
> > > ----
> > > lock(&dev->lock);
> > > lock(&dev->lock);
> > >
> > > *** DEADLOCK ***
> > >
> > > May be due to missing lock nesting notation
> > >
> > > 3 locks held by ip/1672:
> > > #0: ffffffff943ba050 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_newlink+0x6b4/0x1c60
> > > #1: ffff88813abc6170 (&net->rtnl_mutex){+.+.}-{4:4}, at:
> > > rtnl_newlink+0x6f6/0x1c60
> > > #2: ffff888231fbad90 (&dev->lock){+.+.}-{4:4}, at:
> > > do_setlink.constprop.0+0x12a/0x3440
> > >
> > > stack backtrace:
> > > CPU: 2 UID: 0 PID: 1672 Comm: ip Not tainted 6.14.0-rc6+ #56
> > > 66129e0c5b1b922fef38623168aea99c0593a519
> > > Hardware name: ASUS System Product Name/PRIME Z690-P D4, BIOS 0603 11/01/2021
> > > Call Trace:
> > > <TASK>
> > > dump_stack_lvl+0x7e/0xc0
> > > print_deadlock_bug+0x4fd/0x8e0
> > > __lock_acquire+0x3082/0x4fd0
> > > ? __pfx___lock_acquire+0x10/0x10
> > > ? mark_lock.part.0+0xfa/0x2f60
> > > ? __pfx___lock_acquire+0x10/0x10
> > > ? check_chain_key+0x1c1/0x520
> > > lock_acquire+0x1b0/0x570
> > > ? dev_disable_lro+0x81/0x1f0
> > > ? __pfx_lock_acquire+0x10/0x10
> > > __mutex_lock+0x17c/0x17c0
> > > ? dev_disable_lro+0x81/0x1f0
> > > ? dev_disable_lro+0x81/0x1f0
> > > ? __pfx___mutex_lock+0x10/0x10
> > > ? mark_held_locks+0xa5/0xf0
> > > ? neigh_parms_alloc+0x36b/0x4f0
> > > ? __local_bh_enable_ip+0xa5/0x120
> > > ? lockdep_hardirqs_on+0xbe/0x140
> > > ? dev_disable_lro+0x81/0x1f0
> > > dev_disable_lro+0x81/0x1f0
> > > inetdev_init+0x2d1/0x4a0
> > > inetdev_event+0x9b3/0x1590
> > > ? __pfx_lock_release+0x10/0x10
> > > ? __pfx_inetdev_event+0x10/0x10
> > > ? notifier_call_chain+0x9b/0x300
> > > notifier_call_chain+0x9b/0x300
> > > netif_change_net_namespace+0xdfe/0x1390
> > > ? __pfx_netif_change_net_namespace+0x10/0x10
> > > ? __pfx_validate_linkmsg+0x10/0x10
> > > ? __pfx___lock_acquire+0x10/0x10
> > > do_setlink.constprop.0+0x241/0x3440
> > > ? lock_acquire+0x1b0/0x570
> > > ? __pfx_do_setlink.constprop.0+0x10/0x10
> > > ? rtnl_newlink+0x6f6/0x1c60
> > > ? __pfx_lock_acquired+0x10/0x10
> > > ? netlink_sendmsg+0x712/0xbc0
> > > ? rcu_is_watching+0x11/0xb0
> > > ? trace_contention_end+0xef/0x140
> > > ? __mutex_lock+0x935/0x17c0
> > > ? __create_object+0x36/0x90
> > > ? __pfx_lock_release+0x10/0x10
> > > ? rtnl_newlink+0x6f6/0x1c60
> > > ? __nla_validate_parse+0xb9/0x2830
> > > ? __pfx___mutex_lock+0x10/0x10
> > > ? lockdep_hardirqs_on+0xbe/0x140
> > > ? __pfx___nla_validate_parse+0x10/0x10
> > > ? rcu_is_watching+0x11/0xb0
> > > ? cap_capable+0x17d/0x360
> > > ? fdget+0x4e/0x1d0
> > > rtnl_newlink+0x108d/0x1c60
> > > ? __pfx_rtnl_newlink+0x10/0x10
> > > ? mark_lock.part.0+0xfa/0x2f60
> > > ? __pfx___lock_acquire+0x10/0x10
> > > ? __pfx_mark_lock.part.0+0x10/0x10
> > > ? __pfx_lock_release+0x10/0x10
> > > ? __pfx_rtnl_newlink+0x10/0x10
> > > rtnetlink_rcv_msg+0x71c/0xc10
> > > ? __pfx_rtnetlink_rcv_msg+0x10/0x10
> > > ? check_chain_key+0x1c1/0x520
> > > ? __pfx___lock_acquire+0x10/0x10
> > > netlink_rcv_skb+0x12c/0x360
> > > ? __pfx_rtnetlink_rcv_msg+0x10/0x10
> > > ? __pfx_netlink_rcv_skb+0x10/0x10
> > > ? netlink_deliver_tap+0xcb/0x9e0
> > > ? netlink_deliver_tap+0x14b/0x9e0
> > > netlink_unicast+0x447/0x710
> > > ? __pfx_netlink_unicast+0x10/0x10
> > > netlink_sendmsg+0x712/0xbc0
> > > ? __pfx_netlink_sendmsg+0x10/0x10
> > > ? _copy_from_user+0x3e/0xa0
> > > ____sys_sendmsg+0x7ab/0xa10
> > > ? __pfx_____sys_sendmsg+0x10/0x10
> > > ? __pfx_copy_msghdr_from_user+0x10/0x10
> > > ___sys_sendmsg+0xee/0x170
> > > ? __pfx___lock_acquire+0x10/0x10
> > > ? kasan_save_stack+0x20/0x40
> > > ? __pfx____sys_sendmsg+0x10/0x10
> > > ? entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > > ? kasan_save_stack+0x30/0x40
> > > ? __pfx_lock_release+0x10/0x10
> > > ? __might_fault+0xbf/0x170
> > > __sys_sendmsg+0x105/0x190
> > > ? __pfx___sys_sendmsg+0x10/0x10
> > > ? rseq_syscall+0xc3/0x130
> > > do_syscall_64+0x64/0x140
> > > entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > > RIP: 0033:0x7fd20f92c004
> > > Code: 15 19 6e 0d 00 f7 d8 64 89 02 b8 ff ff ff ff eb bf 0f 1f 44 00
> > > 00 f3 0f 1e fa 80 3d 45 f0 0d 00 00 74 13 b8 2e 00 00 00 0f 05 <48> 3d
> > > 005
> > > RSP: 002b:00007fff40636e68 EFLAGS: 00000202 ORIG_RAX: 000000000000002e
> > > RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fd20f92c004
> > > RDX: 0000000000000000 RSI: 00007fff40636ee0 RDI: 0000000000000003
> > > RBP: 00007fff40636f50 R08: 0000000067d7b7e9 R09: 0000000000000050
> > > R10: 0000000000000001 R11: 0000000000000202 R12: 0000000000000003
> > > R13: 0000000067d7b7ea R14: 000055d14b9e4040 R15: 0000000000000000
> > >
> > > Thanks a lot!
> > > Taehee Yoo
> >
> > Sorry, I completely missed that. I think this is similar to:
> >
> > https://lore.kernel.org/netdev/Z-GDBlDsnPyc21RM@mini-arch/T/#u
> >
> > ?
> >
> > Can you give it a quick test with the patches from that link?
>
> I applied two changes [1] and [2].
> The above case seems to be fixed.
> But I found a new splat when the netdevsim interface was created,
> which was already reported in that link.
Thanks for testing! Yeah, I'm still looking into it. I ended up
adding ops lock around NETDEV_REGISTER and NETDEV_UP, but I
think something is still not right.
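
The two kinds of splat in this thread (a recursive dev->lock acquisition, and a warning that fires when a helper runs without the lock) can be illustrated with a small userspace model. This is only a sketch in Python, not kernel code: the `Device` class and `change_netns` are hypothetical, and `dev_close`/`netif_close` merely borrow the names seen in the traces to stand for "takes the instance lock itself" versus "expects the caller to hold it".

```python
import threading


class Device:
    """Toy stand-in for a netdev with a non-recursive instance lock."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self._lock = threading.Lock()
        self._owner = None  # thread id of the current holder, if any

    def lock(self):
        me = threading.get_ident()
        if self._owner == me:
            # A real mutex would hang here; report it instead.
            raise RuntimeError("possible recursive locking detected")
        self._lock.acquire()
        self._owner = me

    def unlock(self):
        self._owner = None
        self._lock.release()

    def assert_locked(self):
        if self._owner != threading.get_ident():
            raise AssertionError("instance lock not held")


def netif_close(dev):
    """Variant that expects the caller to already hold the lock."""
    dev.assert_locked()
    dev.up = False


def dev_close(dev):
    """Variant that takes the lock itself, then calls the inner helper."""
    dev.lock()
    try:
        netif_close(dev)
    finally:
        dev.unlock()


def change_netns(dev, notifier):
    """Hypothetical caller that runs a notifier while holding the lock."""
    dev.lock()
    try:
        notifier(dev)
    finally:
        dev.unlock()


if __name__ == "__main__":
    dev = Device("eni0")
    try:
        change_netns(dev, dev_close)  # notifier re-takes the held lock
    except RuntimeError as err:
        print("splat:", err)
    change_netns(dev, netif_close)  # notifier relies on the caller's lock
    print("up:", dev.up)
```

In this model a notifier invoked under the lock has to use the variant that does not lock, while one invoked outside the lock has to use the variant that does. A notifier that can be reached both ways is what makes the real problem awkward.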
> ------------[ cut here ]------------
> WARNING: CPU: 1 PID: 1448 at ./include/net/netdev_lock.h:54
> __netdev_update_features+0x894/0x1550
> Modules linked in: netdevsim veth xt_nat xt_tcpudp xt_conntrack
> nft_chain_nat xt_MASQUERADE nf_cos
> CPU: 1 UID: 0 PID: 1448 Comm: bash Not tainted 6.14.0-rc7+ #74
> 0e3a9c04b78c7bd4fd13f140e1c89a83e53
> Hardware name: ASUS System Product Name/PRIME Z690-P D4, BIOS 0603
> 11/01/2021
> RIP: 0010:__netdev_update_features+0x894/0x1550
> Code: ff 0f 1f 44 00 00 48 f7 d0 49 21 c4 e9 4d fa ff ff 48 8d bd 90
> 0d 00 00 be ff ff ff ff e8 e0
> RSP: 0018:ffff88825cc3f230 EFLAGS: 00010246
> RAX: 0000000000000000 RBX: ffff8881e1f72000 RCX: 0000000000000001
> RDX: 0000000000000006 RSI: ffffffff90ac4960 RDI: ffffffff90d73280
> RBP: ffff8881e1f72000 R08: 0000000000000000 R09: fffffbfff327743c
> R10: 0000000000000001 R11: 0000000000000001 R12: ffff88815ad84000
> R13: ffff88815ad84168 R14: 0000000000000005 R15: 1ffff1104b987e6c
> FS: 00007f64f7c8a740(0000) GS:ffff88881b200000(0000)
> knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007ffdaa5c07c8 CR3: 00000001e1af0000 CR4: 00000000007506f0
> PKRU: 55555554
> Call Trace:
> <TASK>
> ? __warn+0xcd/0x2f0
> ? __netdev_update_features+0x894/0x1550
> ? report_bug+0x326/0x3c0
> ? handle_bug+0x53/0xa0
> ? exc_invalid_op+0x14/0x50
> ? asm_exc_invalid_op+0x16/0x20
> ? __netdev_update_features+0x894/0x1550
> ? check_chain_key+0x1c1/0x520
> ? __pfx___netdev_update_features+0x10/0x10
> ? __pfx_lock_release+0x10/0x10
> netif_disable_lro+0x90/0x520
> ? __pfx_netif_disable_lro+0x10/0x10
> ? lockdep_hardirqs_on+0xbe/0x140
> ? neigh_parms_alloc+0x36b/0x4f0
> ? __local_bh_enable_ip+0xa5/0x120
> ? neigh_parms_alloc+0x36b/0x4f0
> inetdev_init+0x2d1/0x4a0
> inetdev_event+0x9b3/0x1590
> ? __pfx_nsim_dev_netdevice_event+0x10/0x10 [netdevsim
> 56c6fb92f9ab7ad97a5f7886b4a8c456dda09181]
> ? __pfx_nsim_dev_netdevice_event+0x10/0x10 [netdevsim
> 56c6fb92f9ab7ad97a5f7886b4a8c456dda09181]
> ? __pfx_nsim_dev_netdevice_event+0x10/0x10 [netdevsim
> 56c6fb92f9ab7ad97a5f7886b4a8c456dda09181]
> ? __pfx_nsim_dev_netdevice_event+0x10/0x10 [netdevsim
> 56c6fb92f9ab7ad97a5f7886b4a8c456dda09181]
> ? __module_address.part.0+0x6a/0x220
> ? __pfx_inetdev_event+0x10/0x10
> ? notifier_call_chain+0x9b/0x300
>
> But I found a new deadlock.
> Reproducer:
> modprobe netdevsim
> ip netns add ns_test
> echo 1 > /sys/bus/netdevsim/new_device
> ip link add bond0 type bond
> ip link set $interface master bond0
> ip link set $interface netns ns_test
>
> Splat:
> ============================================
> WARNING: possible recursive locking detected
> 6.14.0-rc7+ #74 Tainted: G W
> --------------------------------------------
> ip/1876 is trying to acquire lock:
> ffff8881e1f72d90 (&dev->lock){+.+.}-{4:4}, at: dev_close+0x81/0x1f0
>
> but task is already holding lock:
> ffff8881e1f72d90 (&dev->lock){+.+.}-{4:4}, at:
> do_setlink.constprop.0+0x12a/0x3410
>
> other info that might help us debug this:
> Possible unsafe locking scenario:
>
> CPU0
> ----
> lock(&dev->lock);
> lock(&dev->lock);
>
> *** DEADLOCK ***
>
> May be due to missing lock nesting notation
>
> 3 locks held by ip/1876:
> #0: ffffffff993ba250 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_newlink+0x6b4/0x1c60
> #1: ffff88816736e230 (&net->rtnl_mutex){+.+.}-{4:4}, at:
> rtnl_newlink+0x6f6/0x1c60
> #2: ffff8881e1f72d90 (&dev->lock){+.+.}-{4:4}, at:
> do_setlink.constprop.0+0x12a/0x3410
>
> stack backtrace:
> CPU: 1 UID: 0 PID: 1876 Comm: ip Tainted: G W
> 6.14.0-rc7+ #74 0e3a9c04b78c7bd4fd13
> Tainted: [W]=WARN
> Call Trace:
> <TASK>
> dump_stack_lvl+0x7e/0xc0
> print_deadlock_bug+0x4fd/0x8e0
> __lock_acquire+0x3082/0x4fd0
> ? __pfx___lock_acquire+0x10/0x10
> ? __pfx_lock_release+0x10/0x10
> lock_acquire+0x1b0/0x570
> ? dev_close+0x81/0x1f0
> ? __pfx_bond_netdev_event+0x10/0x10 [bonding
> b66920a8cbfc9c0d4b32a75d6048c0ac5533c0d4]
> ? __pfx_lock_acquire+0x10/0x10
> ? __pfx_bond_netdev_event+0x10/0x10 [bonding
> b66920a8cbfc9c0d4b32a75d6048c0ac5533c0d4]
> ? __pfx_bond_netdev_event+0x10/0x10 [bonding
> b66920a8cbfc9c0d4b32a75d6048c0ac5533c0d4]
> __mutex_lock+0x17c/0x17c0
> ? dev_close+0x81/0x1f0
> ? dev_close+0x81/0x1f0
> ? __pfx_netdev_change_features+0x10/0x10
> ? __pfx___mutex_lock+0x10/0x10
> ? __module_text_address+0x36/0x170
> ? preempt_count_add+0x7d/0x150
> ? ip6_route_dev_notify+0x37/0x670
> ? notifier_call_chain+0x9b/0x300
> ? dev_close+0x81/0x1f0
> dev_close+0x81/0x1f0
> __bond_release_one+0x888/0x1610 [bonding
> b66920a8cbfc9c0d4b32a75d6048c0ac5533c0d4]
> ? __mutex_lock+0x935/0x17c0
> ? nf_tables_flowtable_event+0x97/0x480 [nf_tables
> 1445783a301bcd3ec7ca4a0703efdcd50d4aca3a]
> ? __pfx___bond_release_one+0x10/0x10 [bonding
> b66920a8cbfc9c0d4b32a75d6048c0ac5533c0d4]
> ? nft_offload_netdev_event+0xce/0x3a0 [nf_tables
> 1445783a301bcd3ec7ca4a0703efdcd50d4aca3a]
> ? __mutex_unlock_slowpath+0x15d/0x650
> ? __pfx___mutex_unlock_slowpath+0x10/0x10
> ? __pfx_bond_netdev_event+0x10/0x10 [bonding
> b66920a8cbfc9c0d4b32a75d6048c0ac5533c0d4]
> ? __pfx_bond_netdev_event+0x10/0x10 [bonding
> b66920a8cbfc9c0d4b32a75d6048c0ac5533c0d4]
> ? __pfx_bond_netdev_event+0x10/0x10 [bonding
> b66920a8cbfc9c0d4b32a75d6048c0ac5533c0d4]
> ? __pfx_bond_netdev_event+0x10/0x10 [bonding
> b66920a8cbfc9c0d4b32a75d6048c0ac5533c0d4]
> ? __module_address.part.0+0x6a/0x220
> bond_netdev_event+0x91b/0xab0 [bonding
> b66920a8cbfc9c0d4b32a75d6048c0ac5533c0d4]
> notifier_call_chain+0x9b/0x300
> netif_change_net_namespace+0x43f/0x1390
> ? __pfx_netif_change_net_namespace+0x10/0x10
> ? __pfx_validate_linkmsg+0x10/0x10
> ? __pfx___lock_acquire+0x10/0x10
> do_setlink.constprop.0+0x241/0x3410
>
> Reproducer2:
> modprobe netdevsim
> ip netns add ns_test
> echo 1 > /sys/bus/netdevsim/new_device
> ip link add team0 type team
> ip link set $interface master team0
> ip link set $interface netns ns_test
>
> Splat:
> ======================================================
> WARNING: possible circular locking dependency detected
> 6.14.0-rc7+ #74 Tainted: G W
> ------------------------------------------------------
> ip/2036 is trying to acquire lock:
> ffff88812fccae88 (team->team_lock_key){+.+.}-{4:4}, at:
> team_device_event+0x101/0x690 [team]
>
> but task is already holding lock:
> ffff8881947a2d90 (&dev->lock){+.+.}-{4:4}, at:
> do_setlink.constprop.0+0x12a/0x3410
>
> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
>
> -> #1 (&dev->lock){+.+.}-{4:4}:
> lock_acquire+0x1b0/0x570
> __mutex_lock+0x17c/0x17c0
> dev_set_mtu+0x86/0x210
> team_add_slave+0x802/0x1e00 [team]
> do_set_master+0x363/0x6d0
> do_setlink.constprop.0+0x86f/0x3410
> rtnl_newlink+0x108d/0x1c60
> rtnetlink_rcv_msg+0x71c/0xc10
> netlink_rcv_skb+0x12c/0x360
> netlink_unicast+0x447/0x710
> netlink_sendmsg+0x712/0xbc0
> ____sys_sendmsg+0x7ab/0xa10
> ___sys_sendmsg+0xee/0x170
> __sys_sendmsg+0x105/0x190
> do_syscall_64+0x64/0x140
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> -> #0 (team->team_lock_key){+.+.}-{4:4}:
> check_prev_add+0x1b7/0x2360
> __lock_acquire+0x32ab/0x4fd0
> lock_acquire+0x1b0/0x570
> __mutex_lock+0x17c/0x17c0
> team_device_event+0x101/0x690 [team]
> notifier_call_chain+0x9b/0x300
> dev_close_many+0x2c4/0x5a0
> netif_close+0x147/0x1e0
> netif_change_net_namespace+0x3a9/0x1390
> do_setlink.constprop.0+0x241/0x3410
> rtnl_newlink+0x108d/0x1c60
> rtnetlink_rcv_msg+0x71c/0xc10
> netlink_rcv_skb+0x12c/0x360
> netlink_unicast+0x447/0x710
> netlink_sendmsg+0x712/0xbc0
> ____sys_sendmsg+0x7ab/0xa10
> ___sys_sendmsg+0xee/0x170
> __sys_sendmsg+0x105/0x190
> do_syscall_64+0x64/0x140
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
This is interesting, I haven't seen this one. Looks like team_device_event
handling NETDEV_DOWN, which grabs team->lock.
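
The circular dependency in this last splat comes from two code paths taking the same pair of locks in opposite order: the enslave path holds the team lock and then takes dev->lock (via dev_set_mtu), while the netns move holds dev->lock and then takes the team lock (via team_device_event). A minimal sketch of that check, in Python rather than kernel code, is below. `LockOrderTracker` and the lock names "team" and "dev" are hypothetical, and the logic is a much-simplified stand-in for lockdep, assuming a single thread and named locks.

```python
class LockOrderTracker:
    """Records 'A held while taking B' edges and rejects order inversions."""

    def __init__(self):
        self.edges = {}  # lock name -> set of locks taken while it was held
        self.held = []   # locks currently held, in acquisition order

    def _reachable(self, src, dst):
        # Depth-first search over the recorded ordering edges.
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.edges.get(node, ()))
        return False

    def acquire(self, name):
        if name in self.held:
            raise RuntimeError(f"possible recursive locking: {name}")
        for h in self.held:
            # If 'name' was ever held before 'h', taking it now closes a cycle.
            if self._reachable(name, h):
                raise RuntimeError(
                    f"possible circular locking dependency: {h} -> {name}")
        for h in self.held:
            self.edges.setdefault(h, set()).add(name)
        self.held.append(name)

    def release(self, name):
        self.held.remove(name)


if __name__ == "__main__":
    tracker = LockOrderTracker()

    # Enslave path: team lock first, then the port's dev lock.
    tracker.acquire("team")
    tracker.acquire("dev")
    tracker.release("dev")
    tracker.release("team")

    # Netns move: dev lock first, then the notifier wants the team lock.
    tracker.acquire("dev")
    try:
        tracker.acquire("team")
    except RuntimeError as err:
        print("splat:", err)
```

As with lockdep, the report is about a possible deadlock: neither path hangs when run alone, but the recorded orderings show that two tasks running them concurrently could each end up waiting on the other.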
end of thread, other threads:[~2025-03-25 12:45 UTC | newest]
Thread overview: 4+ messages
2025-03-17 6:17 Report deadlock in the latest net-next Taehee Yoo
2025-03-24 16:57 ` Stanislav Fomichev
2025-03-25 4:36 ` Taehee Yoo
2025-03-25 12:45 ` Stanislav Fomichev