* [PATCH net] netrom: fix possible deadlock in nr_rt_device_down
@ 2025-06-05 10:54 Denis Arefev
  2025-06-09 22:57 ` Jakub Kicinski
  0 siblings, 1 reply; 6+ messages in thread
From: Denis Arefev @ 2025-06-05 10:54 UTC (permalink / raw)
  To: David S. Miller
  Cc: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Nikita Marushkin, Ilya Shchipletsov, Hongbo Li, linux-hams,
	netdev, linux-kernel, lvc-project, stable,
	syzbot+ccdfb85a561b973219c7

Syzkaller detected a possible deadlock in nr_rt_device_down() [1].

nr_rt_device_down() and nr_del_node() acquire nr_neigh_list_lock and
nr_node_list_lock in opposite order, so two threads running them
concurrently can deadlock (ABBA):

           CPU0
           ----
  nr_rt_device_down()
  |-> spin_lock_bh(&nr_neigh_list_lock);   acquired
    . . .
  |-> spin_lock_bh(&nr_node_list_lock);    blocks on CPU1 -> deadlock

           CPU1
           ----
  nr_del_node()
  |-> spin_lock_bh(&nr_node_list_lock);    acquired
   . . .
  |-> nr_remove_neigh(nr_neigh);
      |-> spin_lock_bh(&nr_neigh_list_lock); blocks on CPU0

Make sure nr_neigh_list_lock is always acquired before nr_node_list_lock.
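
For reference, a minimal sketch of the ordering rule this patch
establishes (illustration only, not taken from the patch itself; the
elided body stands in for the existing list walks):

  /* Both nr_del_node() and nr_rt_device_down() now nest the two locks
   * the same way, so neither path can hold one lock while waiting on
   * a thread that holds the other.
   */
  spin_lock_bh(&nr_neigh_list_lock);   /* always taken first  */
  spin_lock_bh(&nr_node_list_lock);    /* always taken second */

  /* ... walk/modify the node and neighbour lists ... */

  spin_unlock_bh(&nr_node_list_lock);  /* released in reverse order */
  spin_unlock_bh(&nr_neigh_list_lock);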

[1]
WARNING: possible circular locking dependency detected
6.15.0-rc2-syzkaller-00278-gfc96b232f8e7 #0 Not tainted
------------------------------------------------------
syz-executor107/6105 is trying to acquire lock:
ffffffff902543b8 (nr_node_list_lock){+...}-{3:3}, at: spin_lock_bh include/linux/spinlock.h:356 [inline]
ffffffff902543b8 (nr_node_list_lock){+...}-{3:3}, at: nr_rt_device_down+0xb5/0x7b0 net/netrom/nr_route.c:517

but task is already holding lock:
ffffffff90254358 (nr_neigh_list_lock){+...}-{3:3}, at: spin_lock_bh include/linux/spinlock.h:356 [inline]
ffffffff90254358 (nr_neigh_list_lock){+...}-{3:3}, at: nr_rt_device_down+0x28/0x7b0 net/netrom/nr_route.c:514

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (nr_neigh_list_lock){+...}-{3:3}:
       lock_acquire+0x116/0x2f0 kernel/locking/lockdep.c:5866
       __raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline]
       _raw_spin_lock_bh+0x35/0x50 kernel/locking/spinlock.c:178
       spin_lock_bh include/linux/spinlock.h:356 [inline]
       nr_remove_neigh net/netrom/nr_route.c:307 [inline]
       nr_dec_obs net/netrom/nr_route.c:472 [inline]
       nr_rt_ioctl+0x39a/0xff0 net/netrom/nr_route.c:692
       sock_do_ioctl+0x152/0x400 net/socket.c:1190
       sock_ioctl+0x644/0x900 net/socket.c:1311
       vfs_ioctl fs/ioctl.c:51 [inline]
       __do_sys_ioctl fs/ioctl.c:906 [inline]
       __se_sys_ioctl+0xf1/0x160 fs/ioctl.c:892
       do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
       do_syscall_64+0xf3/0x210 arch/x86/entry/syscall_64.c:94
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

-> #1 (&nr_node->node_lock){+...}-{3:3}:
       lock_acquire+0x116/0x2f0 kernel/locking/lockdep.c:5866
       __raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline]
       _raw_spin_lock_bh+0x35/0x50 kernel/locking/spinlock.c:178
       spin_lock_bh include/linux/spinlock.h:356 [inline]
       nr_node_lock include/net/netrom.h:152 [inline]
       nr_dec_obs net/netrom/nr_route.c:459 [inline]
       nr_rt_ioctl+0x194/0xff0 net/netrom/nr_route.c:692
       sock_do_ioctl+0x152/0x400 net/socket.c:1190
       sock_ioctl+0x644/0x900 net/socket.c:1311
       vfs_ioctl fs/ioctl.c:51 [inline]
       __do_sys_ioctl fs/ioctl.c:906 [inline]
       __se_sys_ioctl+0xf1/0x160 fs/ioctl.c:892
       do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
       do_syscall_64+0xf3/0x210 arch/x86/entry/syscall_64.c:94
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

-> #0 (nr_node_list_lock){+...}-{3:3}:
       check_prev_add kernel/locking/lockdep.c:3166 [inline]
       check_prevs_add kernel/locking/lockdep.c:3285 [inline]
       validate_chain+0xa69/0x24e0 kernel/locking/lockdep.c:3909
       __lock_acquire+0xad5/0xd80 kernel/locking/lockdep.c:5235
       lock_acquire+0x116/0x2f0 kernel/locking/lockdep.c:5866
       __raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline]
       _raw_spin_lock_bh+0x35/0x50 kernel/locking/spinlock.c:178
       spin_lock_bh include/linux/spinlock.h:356 [inline]
       nr_rt_device_down+0xb5/0x7b0 net/netrom/nr_route.c:517
       nr_device_event+0x134/0x150 net/netrom/af_netrom.c:126
       notifier_call_chain+0x1a5/0x3f0 kernel/notifier.c:85
       __dev_notify_flags+0x209/0x410 net/core/dev.c:-1
       netif_change_flags+0xf0/0x1a0 net/core/dev.c:9434
       dev_change_flags+0x146/0x270 net/core/dev_api.c:68
       dev_ioctl+0x80f/0x1260 net/core/dev_ioctl.c:821
       sock_do_ioctl+0x22f/0x400 net/socket.c:1204
       sock_ioctl+0x644/0x900 net/socket.c:1311
       vfs_ioctl fs/ioctl.c:51 [inline]
       __do_sys_ioctl fs/ioctl.c:906 [inline]
       __se_sys_ioctl+0xf1/0x160 fs/ioctl.c:892
       do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
       do_syscall_64+0xf3/0x210 arch/x86/entry/syscall_64.c:94
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

other info that might help us debug this:

Chain exists of:
  nr_node_list_lock --> &nr_node->node_lock --> nr_neigh_list_lock

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(nr_neigh_list_lock);
                               lock(&nr_node->node_lock);
                               lock(nr_neigh_list_lock);
  lock(nr_node_list_lock);

 *** DEADLOCK ***

2 locks held by syz-executor107/6105:
 #0: ffffffff900fd788 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_net_lock include/linux/rtnetlink.h:130 [inline]
 #0: ffffffff900fd788 (rtnl_mutex){+.+.}-{4:4}, at: dev_ioctl+0x7fd/0x1260 net/core/dev_ioctl.c:820
 #1: ffffffff90254358 (nr_neigh_list_lock){+...}-{3:3}, at: spin_lock_bh include/linux/spinlock.h:356 [inline]
 #1: ffffffff90254358 (nr_neigh_list_lock){+...}-{3:3}, at: nr_rt_device_down+0x28/0x7b0 net/netrom/nr_route.c:514

stack backtrace:
CPU: 0 UID: 0 PID: 6105 Comm: syz-executor107 Not tainted 6.15.0-rc2-syzkaller-00278-gfc96b232f8e7 #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
 print_circular_bug+0x2e1/0x300 kernel/locking/lockdep.c:2079
 check_noncircular+0x142/0x160 kernel/locking/lockdep.c:2211
 check_prev_add kernel/locking/lockdep.c:3166 [inline]
 check_prevs_add kernel/locking/lockdep.c:3285 [inline]
 validate_chain+0xa69/0x24e0 kernel/locking/lockdep.c:3909
 __lock_acquire+0xad5/0xd80 kernel/locking/lockdep.c:5235
 lock_acquire+0x116/0x2f0 kernel/locking/lockdep.c:5866
 __raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline]
 _raw_spin_lock_bh+0x35/0x50 kernel/locking/spinlock.c:178
 spin_lock_bh include/linux/spinlock.h:356 [inline]
 nr_rt_device_down+0xb5/0x7b0 net/netrom/nr_route.c:517
 nr_device_event+0x134/0x150 net/netrom/af_netrom.c:126
 notifier_call_chain+0x1a5/0x3f0 kernel/notifier.c:85
 __dev_notify_flags+0x209/0x410 net/core/dev.c:-1
 netif_change_flags+0xf0/0x1a0 net/core/dev.c:9434
 dev_change_flags+0x146/0x270 net/core/dev_api.c:68
 dev_ioctl+0x80f/0x1260 net/core/dev_ioctl.c:821
 sock_do_ioctl+0x22f/0x400 net/socket.c:1204
 sock_ioctl+0x644/0x900 net/socket.c:1311
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:906 [inline]
 __se_sys_ioctl+0xf1/0x160 fs/ioctl.c:892
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xf3/0x210 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Found by Linux Verification Center (linuxtesting.org) with Syzkaller.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable@vger.kernel.org
Reported-by: syzbot+ccdfb85a561b973219c7@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=ccdfb85a561b973219c7
Signed-off-by: Denis Arefev <arefev@swemel.ru>
---
 net/netrom/nr_route.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/netrom/nr_route.c b/net/netrom/nr_route.c
index b94cb2ffbaf8..aae0923dbcf0 100644
--- a/net/netrom/nr_route.c
+++ b/net/netrom/nr_route.c
@@ -331,6 +331,7 @@ static int nr_del_node(ax25_address *callsign, ax25_address *neighbour, struct n
 		return -EINVAL;
 	}
 
+	spin_lock_bh(&nr_neigh_list_lock);
 	spin_lock_bh(&nr_node_list_lock);
 	nr_node_lock(nr_node);
 	for (i = 0; i < nr_node->count; i++) {
@@ -339,7 +340,7 @@ static int nr_del_node(ax25_address *callsign, ax25_address *neighbour, struct n
 			nr_neigh_put(nr_neigh);
 
 			if (nr_neigh->count == 0 && !nr_neigh->locked)
-				nr_remove_neigh(nr_neigh);
+				nr_remove_neigh_locked(nr_neigh);
 			nr_neigh_put(nr_neigh);
 
 			nr_node->count--;
@@ -361,13 +362,14 @@ static int nr_del_node(ax25_address *callsign, ax25_address *neighbour, struct n
 			}
 			nr_node_unlock(nr_node);
 			spin_unlock_bh(&nr_node_list_lock);
-
+			spin_unlock_bh(&nr_neigh_list_lock);
 			return 0;
 		}
 	}
 	nr_neigh_put(nr_neigh);
 	nr_node_unlock(nr_node);
 	spin_unlock_bh(&nr_node_list_lock);
+	spin_unlock_bh(&nr_neigh_list_lock);
 	nr_node_put(nr_node);
 
 	return -EINVAL;
-- 
2.43.0



* Re: [PATCH net] netrom: fix possible deadlock in nr_rt_device_down
  2025-06-05 10:54 [PATCH net] netrom: fix possible deadlock in nr_rt_device_down Denis Arefev
@ 2025-06-09 22:57 ` Jakub Kicinski
       [not found]   ` <5f821879-6774-3dc2-e97d-e33b76513088@trinnet.net>
  0 siblings, 1 reply; 6+ messages in thread
From: Jakub Kicinski @ 2025-06-09 22:57 UTC (permalink / raw)
  To: Denis Arefev
  Cc: David S. Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
	Nikita Marushkin, Ilya Shchipletsov, Hongbo Li, linux-hams,
	netdev, linux-kernel, lvc-project, stable,
	syzbot+ccdfb85a561b973219c7

On Thu,  5 Jun 2025 13:54:48 +0300 Denis Arefev wrote:
> Found by Linux Verification Center (linuxtesting.org) with Syzkaller.

I suspect syzbot is the only user of this code.
Can you delete one of the locks to make the maintenance simpler?
At a glance - I don't think your fix covers all the ABBA cases.
-- 
pw-bot: cr


* Re: [PATCH net] netrom: fix possible deadlock in nr_rt_device_down
       [not found]   ` <5f821879-6774-3dc2-e97d-e33b76513088@trinnet.net>
@ 2025-06-09 23:26     ` Jakub Kicinski
  2025-06-09 23:30       ` David Ranch
  0 siblings, 1 reply; 6+ messages in thread
From: Jakub Kicinski @ 2025-06-09 23:26 UTC (permalink / raw)
  To: David Ranch
  Cc: Denis Arefev, David S. Miller, Eric Dumazet, Paolo Abeni,
	Simon Horman, Nikita Marushkin, Ilya Shchipletsov, Hongbo Li,
	linux-hams, netdev, linux-kernel, lvc-project, stable,
	syzbot+ccdfb85a561b973219c7

On Mon, 9 Jun 2025 16:16:32 -0700 David Ranch wrote:
> I'm not sure what you mean by "the only user of this code".  There are 
> many people using the Linux AX.25 + NETROM stack, but we unfortunately
> don't have an active kernel maintainer for this code today.

Alright, sorry. Either way - these locks are not performance critical
for you, right?


* Re: [PATCH net] netrom: fix possible deadlock in nr_rt_device_down
  2025-06-09 23:26     ` Jakub Kicinski
@ 2025-06-09 23:30       ` David Ranch
  2025-06-10 13:36         ` Dan Cross
  0 siblings, 1 reply; 6+ messages in thread
From: David Ranch @ 2025-06-09 23:30 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Denis Arefev, David S. Miller, Eric Dumazet, Paolo Abeni,
	Simon Horman, Nikita Marushkin, Ilya Shchipletsov, Hongbo Li,
	linux-hams, netdev, linux-kernel, lvc-project, stable,
	syzbot+ccdfb85a561b973219c7


That's unclear to me, but maybe someone who knows the code better than
I do can chime in.  I have to assume these locks are present for a
reason.

--David
KI6ZHD


On 06/09/2025 04:26 PM, Jakub Kicinski wrote:
> On Mon, 9 Jun 2025 16:16:32 -0700 David Ranch wrote:
>> I'm not sure what you mean by "the only user of this code".  There are
>> many people using the Linux AX.25 + NETROM stack, but we unfortunately
>> don't have an active kernel maintainer for this code today.
>
> Alright, sorry. Either way - these locks are not performance critical
> for you, right?
>


* Re: [PATCH net] netrom: fix possible deadlock in nr_rt_device_down
  2025-06-09 23:30       ` David Ranch
@ 2025-06-10 13:36         ` Dan Cross
  2025-06-10 17:00           ` David Ranch
  0 siblings, 1 reply; 6+ messages in thread
From: Dan Cross @ 2025-06-10 13:36 UTC (permalink / raw)
  To: David Ranch
  Cc: Jakub Kicinski, Denis Arefev, David S. Miller, Eric Dumazet,
	Paolo Abeni, Simon Horman, Nikita Marushkin, Ilya Shchipletsov,
	Hongbo Li, linux-hams, netdev, linux-kernel, lvc-project, stable,
	syzbot+ccdfb85a561b973219c7

On Mon, Jun 9, 2025 at 7:31 PM David Ranch <linux-hams@trinnet.net> wrote:
> That's unclear to me, but maybe someone who knows the code better than
> I do can chime in.  I have to assume these locks are present for a
> reason.

The suggestion was not to remove locking, but rather, to fold multiple
separate locks into one. That is, have a single lock that covers both
the neighbor list and the node list. Naturally, there would be more
contention around a single lock in contrast to multiple, more granular
locks. But given that NETROM has very low performance requirements,
and that the data that these locks protect doesn't change that often,
that's probably fine and would eliminate the possibility of deadlock
due to lock ordering issues.
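
As a hypothetical sketch (invented names, not an actual patch), the
consolidated version might look like:

  /* One lock replacing both nr_neigh_list_lock and nr_node_list_lock;
   * with a single lock there is no acquisition order to get wrong, so
   * ABBA deadlocks between the two lists are impossible by
   * construction, at the cost of coarser-grained locking.
   */
  static DEFINE_SPINLOCK(nr_list_lock);

  static void nr_remove_node_and_neigh(struct nr_node *nr_node,
                                       struct nr_neigh *nr_neigh)
  {
          spin_lock_bh(&nr_list_lock);
          hlist_del_init(&nr_node->node_node);
          hlist_del_init(&nr_neigh->neigh_node);
          spin_unlock_bh(&nr_list_lock);
          nr_node_put(nr_node);
          nr_neigh_put(nr_neigh);
  }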

        - Dan C.

> On 06/09/2025 04:26 PM, Jakub Kicinski wrote:
> > On Mon, 9 Jun 2025 16:16:32 -0700 David Ranch wrote:
> >> I'm not sure what you mean by "the only user of this code".  There are
> >> many people using the Linux AX.25 + NETROM stack, but we unfortunately
> >> don't have an active kernel maintainer for this code today.
> >
> > Alright, sorry. Either way - these locks are not performance critical
> > for you, right?
> >
>


* Re: [PATCH net] netrom: fix possible deadlock in nr_rt_device_down
  2025-06-10 13:36         ` Dan Cross
@ 2025-06-10 17:00           ` David Ranch
  0 siblings, 0 replies; 6+ messages in thread
From: David Ranch @ 2025-06-10 17:00 UTC (permalink / raw)
  To: Dan Cross
  Cc: Jakub Kicinski, Denis Arefev, David S. Miller, Eric Dumazet,
	Paolo Abeni, Simon Horman, Nikita Marushkin, Ilya Shchipletsov,
	Hongbo Li, linux-hams, netdev, linux-kernel, lvc-project, stable,
	syzbot+ccdfb85a561b973219c7

Yes, this seems like a reasonable approach, though I understand all
this code is old and overly complicated, and when changes are proposed,
little to no proper testing is done before they are committed, so
problems take a very long time to get properly fixed.

I only bring this up because the Linux AX.25 community has been badly
bitten by similar commits in the last few years.  I've tried to help
find a new maintainer and to find somewhere to create and run CI tests
to catch issues, but I've been unsuccessful so far.

I am happy to help on the testing side once I know what the test
harness is, but I'm out of my league when it comes to the code side.

--David
KI6ZHD


On 06/10/2025 06:36 AM, Dan Cross wrote:
> On Mon, Jun 9, 2025 at 7:31 PM David Ranch <linux-hams@trinnet.net> wrote:
>> That's unclear to me, but maybe someone who knows the code better than
>> I do can chime in.  I have to assume these locks are present for a
>> reason.
>
> The suggestion was not to remove locking, but rather, to fold multiple
> separate locks into one. That is, have a single lock that covers both
> the neighbor list and the node list. Naturally, there would be more
> contention around a single lock in contrast to multiple, more granular
> locks. But given that NETROM has very low performance requirements,
> and that the data that these locks protect doesn't change that often,
> that's probably fine and would eliminate the possibility of deadlock
> due to lock ordering issues.
>
>         - Dan C.
>
>> On 06/09/2025 04:26 PM, Jakub Kicinski wrote:
>>> On Mon, 9 Jun 2025 16:16:32 -0700 David Ranch wrote:
>>>> I'm not sure what you mean by "the only user of this code".  There are
>>>> many people using the Linux AX.25 + NETROM stack, but we unfortunately
>>>> don't have an active kernel maintainer for this code today.
>>>
>>> Alright, sorry. Either way - these locks are not performance critical
>>> for you, right?
>>>
>>

