From: Stanislav Fomichev <stfomichev@gmail.com>
To: Taehee Yoo <ap420073@gmail.com>
Cc: Netdev <netdev@vger.kernel.org>,
	Stanislav Fomichev <sdf@fomichev.me>,
	David Miller <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Simon Horman <horms@kernel.org>
Subject: Re: Report deadlock in the latest net-next
Date: Mon, 24 Mar 2025 09:57:57 -0700
Message-ID: <Z-GPFQou5GomWCOo@mini-arch>
In-Reply-To: <CAMArcTX2dEs=H586fumSEv_V8_p-pcAjyyPXkcLG9WkQM+c0cA@mail.gmail.com>

On 03/17, Taehee Yoo wrote:
> Hi Stanislav,
> I found a deadlock in the latest net-next kernel.
> The call trace points to your recent
> commit ad7c7b2172c3 ("net: hold netdev instance lock during sysfs operations").
> dev->lock is first acquired in do_setlink.constprop.0+0x12a/0x3440,
> which is net/core/rtnetlink.c:3025,
> and is then acquired again in dev_disable_lro+0x81/0x1f0,
> which is net/core/dev_api.c:255.
> dev_disable_lro() is called from a netdev notifier, but notifiers
> seem to be called both outside and inside the dev->lock context.
> In this case the notifier is called inside the dev->lock context,
> so a deadlock occurs.
> Could you please look into this?
> 
> Reproducer:
> modprobe netdevsim
> ip netns add ns_test
> echo 1 > /sys/bus/netdevsim/new_device
> ip link set $interface netns ns_test
> 
> ============================================
> WARNING: possible recursive locking detected
> 6.14.0-rc6+ #56 Not tainted
> --------------------------------------------
> ip/1672 is trying to acquire lock:
> ffff888231fbad90 (&dev->lock){+.+.}-{4:4}, at: dev_disable_lro+0x81/0x1f0
> 
> but task is already holding lock:
> ffff888231fbad90 (&dev->lock){+.+.}-{4:4}, at: do_setlink.constprop.0+0x12a/0x3440
> 
> other info that might help us debug this:
>  Possible unsafe locking scenario:
> 
>        CPU0
>        ----
>   lock(&dev->lock);
>   lock(&dev->lock);
> 
>  *** DEADLOCK ***
> 
>  May be due to missing lock nesting notation
> 
> 3 locks held by ip/1672:
>  #0: ffffffff943ba050 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_newlink+0x6b4/0x1c60
>  #1: ffff88813abc6170 (&net->rtnl_mutex){+.+.}-{4:4}, at: rtnl_newlink+0x6f6/0x1c60
>  #2: ffff888231fbad90 (&dev->lock){+.+.}-{4:4}, at: do_setlink.constprop.0+0x12a/0x3440
> 
> stack backtrace:
> CPU: 2 UID: 0 PID: 1672 Comm: ip Not tainted 6.14.0-rc6+ #56 66129e0c5b1b922fef38623168aea99c0593a519
> Hardware name: ASUS System Product Name/PRIME Z690-P D4, BIOS 0603 11/01/2021
> Call Trace:
>  <TASK>
>  dump_stack_lvl+0x7e/0xc0
>  print_deadlock_bug+0x4fd/0x8e0
>  __lock_acquire+0x3082/0x4fd0
>  ? __pfx___lock_acquire+0x10/0x10
>  ? mark_lock.part.0+0xfa/0x2f60
>  ? __pfx___lock_acquire+0x10/0x10
>  ? check_chain_key+0x1c1/0x520
>  lock_acquire+0x1b0/0x570
>  ? dev_disable_lro+0x81/0x1f0
>  ? __pfx_lock_acquire+0x10/0x10
>  __mutex_lock+0x17c/0x17c0
>  ? dev_disable_lro+0x81/0x1f0
>  ? dev_disable_lro+0x81/0x1f0
>  ? __pfx___mutex_lock+0x10/0x10
>  ? mark_held_locks+0xa5/0xf0
>  ? neigh_parms_alloc+0x36b/0x4f0
>  ? __local_bh_enable_ip+0xa5/0x120
>  ? lockdep_hardirqs_on+0xbe/0x140
>  ? dev_disable_lro+0x81/0x1f0
>  dev_disable_lro+0x81/0x1f0
>  inetdev_init+0x2d1/0x4a0
>  inetdev_event+0x9b3/0x1590
>  ? __pfx_lock_release+0x10/0x10
>  ? __pfx_inetdev_event+0x10/0x10
>  ? notifier_call_chain+0x9b/0x300
>  notifier_call_chain+0x9b/0x300
>  netif_change_net_namespace+0xdfe/0x1390
>  ? __pfx_netif_change_net_namespace+0x10/0x10
>  ? __pfx_validate_linkmsg+0x10/0x10
>  ? __pfx___lock_acquire+0x10/0x10
>  do_setlink.constprop.0+0x241/0x3440
>  ? lock_acquire+0x1b0/0x570
>  ? __pfx_do_setlink.constprop.0+0x10/0x10
>  ? rtnl_newlink+0x6f6/0x1c60
>  ? __pfx_lock_acquired+0x10/0x10
>  ? netlink_sendmsg+0x712/0xbc0
>  ? rcu_is_watching+0x11/0xb0
>  ? trace_contention_end+0xef/0x140
>  ? __mutex_lock+0x935/0x17c0
>  ? __create_object+0x36/0x90
>  ? __pfx_lock_release+0x10/0x10
>  ? rtnl_newlink+0x6f6/0x1c60
>  ? __nla_validate_parse+0xb9/0x2830
>  ? __pfx___mutex_lock+0x10/0x10
>  ? lockdep_hardirqs_on+0xbe/0x140
>  ? __pfx___nla_validate_parse+0x10/0x10
>  ? rcu_is_watching+0x11/0xb0
>  ? cap_capable+0x17d/0x360
>  ? fdget+0x4e/0x1d0
>  rtnl_newlink+0x108d/0x1c60
>  ? __pfx_rtnl_newlink+0x10/0x10
>  ? mark_lock.part.0+0xfa/0x2f60
>  ? __pfx___lock_acquire+0x10/0x10
>  ? __pfx_mark_lock.part.0+0x10/0x10
>  ? __pfx_lock_release+0x10/0x10
>  ? __pfx_rtnl_newlink+0x10/0x10
>  rtnetlink_rcv_msg+0x71c/0xc10
>  ? __pfx_rtnetlink_rcv_msg+0x10/0x10
>  ? check_chain_key+0x1c1/0x520
>  ? __pfx___lock_acquire+0x10/0x10
>  netlink_rcv_skb+0x12c/0x360
>  ? __pfx_rtnetlink_rcv_msg+0x10/0x10
>  ? __pfx_netlink_rcv_skb+0x10/0x10
>  ? netlink_deliver_tap+0xcb/0x9e0
>  ? netlink_deliver_tap+0x14b/0x9e0
>  netlink_unicast+0x447/0x710
>  ? __pfx_netlink_unicast+0x10/0x10
>  netlink_sendmsg+0x712/0xbc0
>  ? __pfx_netlink_sendmsg+0x10/0x10
>  ? _copy_from_user+0x3e/0xa0
>  ____sys_sendmsg+0x7ab/0xa10
>  ? __pfx_____sys_sendmsg+0x10/0x10
>  ? __pfx_copy_msghdr_from_user+0x10/0x10
>  ___sys_sendmsg+0xee/0x170
>  ? __pfx___lock_acquire+0x10/0x10
>  ? kasan_save_stack+0x20/0x40
>  ? __pfx____sys_sendmsg+0x10/0x10
>  ? entry_SYSCALL_64_after_hwframe+0x76/0x7e
>  ? kasan_save_stack+0x30/0x40
>  ? __pfx_lock_release+0x10/0x10
>  ? __might_fault+0xbf/0x170
>  __sys_sendmsg+0x105/0x190
>  ? __pfx___sys_sendmsg+0x10/0x10
>  ? rseq_syscall+0xc3/0x130
>  do_syscall_64+0x64/0x140
>  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> RIP: 0033:0x7fd20f92c004
> Code: 15 19 6e 0d 00 f7 d8 64 89 02 b8 ff ff ff ff eb bf 0f 1f 44 00
> 00 f3 0f 1e fa 80 3d 45 f0 0d 00 00 74 13 b8 2e 00 00 00 0f 05 <48> 3d
> 005
> RSP: 002b:00007fff40636e68 EFLAGS: 00000202 ORIG_RAX: 000000000000002e
> RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fd20f92c004
> RDX: 0000000000000000 RSI: 00007fff40636ee0 RDI: 0000000000000003
> RBP: 00007fff40636f50 R08: 0000000067d7b7e9 R09: 0000000000000050
> R10: 0000000000000001 R11: 0000000000000202 R12: 0000000000000003
> R13: 0000000067d7b7ea R14: 000055d14b9e4040 R15: 0000000000000000
> 
> Thanks a lot!
> Taehee Yoo

Sorry, I completely missed this. I think it is similar to:

https://lore.kernel.org/netdev/Z-GDBlDsnPyc21RM@mini-arch/T/#u

Can you give it a quick test with the patches from that link?
