* [syzbot] [can?] possible deadlock in j1939_sk_errqueue (2) @ 2023-07-04 6:19 syzbot 2023-07-04 6:47 ` [PATCH] can: j1939: prevent deadlock by changing j1939_socks_lock to rwlock Ziqi Zhao ` (2 more replies) 0 siblings, 3 replies; 13+ messages in thread From: syzbot @ 2023-07-04 6:19 UTC (permalink / raw) To: davem, edumazet, kernel, kuba, linux-can, linux-kernel, linux, mkl, netdev, pabeni, robin, socketcan, syzkaller-bugs Hello, syzbot found the following issue on: HEAD commit: ae230642190a Merge branch 'af_unix-followup-fixes-for-so_p.. git tree: net-next console output: https://syzkaller.appspot.com/x/log.txt?x=1771bf67280000 kernel config: https://syzkaller.appspot.com/x/.config?x=c9bf1936936ca698 dashboard link: https://syzkaller.appspot.com/bug?extid=1591462f226d9cbf0564 compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2 Unfortunately, I don't have any reproducer for this issue yet. Downloadable assets: disk image: https://storage.googleapis.com/syzbot-assets/8c060db03f09/disk-ae230642.raw.xz vmlinux: https://storage.googleapis.com/syzbot-assets/1b9b937ece91/vmlinux-ae230642.xz kernel image: https://storage.googleapis.com/syzbot-assets/0c7eb1c82bf0/bzImage-ae230642.xz IMPORTANT: if you fix the issue, please add the following tag to the commit: Reported-by: syzbot+1591462f226d9cbf0564@syzkaller.appspotmail.com ====================================================== WARNING: possible circular locking dependency detected 6.4.0-rc7-syzkaller-01948-gae230642190a #0 Not tainted ------------------------------------------------------ syz-executor.2/11224 is trying to acquire lock: ffff88803bee50d0 (&priv->j1939_socks_lock){+.-.}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:355 [inline] ffff88803bee50d0 (&priv->j1939_socks_lock){+.-.}-{2:2}, at: j1939_sk_errqueue+0xa3/0x1a0 net/can/j1939/socket.c:1081 but task is already holding lock: ffff88803bee5088 (&priv->active_session_list_lock){+.-.}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:355 [inline] ffff88803bee5088 (&priv->active_session_list_lock){+.-.}-{2:2}, at: j1939_session_list_lock net/can/j1939/transport.c:238 [inline] ffff88803bee5088 (&priv->active_session_list_lock){+.-.}-{2:2}, at: j1939_cancel_active_session+0x41/0x360 net/can/j1939/transport.c:2183 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (&priv->active_session_list_lock){+.-.}-{2:2}: __raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline] _raw_spin_lock_bh+0x33/0x40 kernel/locking/spinlock.c:178 spin_lock_bh include/linux/spinlock.h:355 [inline] j1939_session_list_lock net/can/j1939/transport.c:238 [inline] j1939_session_activate+0x47/0x4b0 net/can/j1939/transport.c:1564 j1939_sk_queue_activate_next_locked net/can/j1939/socket.c:181 [inline] j1939_sk_queue_activate_next+0x2bf/0x4d0 net/can/j1939/socket.c:208 j1939_session_deactivate_activate_next net/can/j1939/transport.c:1108 [inline] j1939_xtp_rx_abort_one+0x3c0/0x5b0 net/can/j1939/transport.c:1351 j1939_xtp_rx_abort net/can/j1939/transport.c:1362 [inline] j1939_tp_cmd_recv net/can/j1939/transport.c:2111 [inline] j1939_tp_recv+0xd98/0xf50 net/can/j1939/transport.c:2144 j1939_can_recv net/can/j1939/main.c:112 [inline] j1939_can_recv+0x78e/0xa80 net/can/j1939/main.c:38 deliver net/can/af_can.c:572 [inline] can_rcv_filter+0x5d4/0x8d0 net/can/af_can.c:606 can_receive+0x31d/0x5c0 net/can/af_can.c:663 can_rcv+0x1e1/0x280 net/can/af_can.c:687 __netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5452 __netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5566 process_backlog+0x101/0x670 net/core/dev.c:5894 __napi_poll+0xb7/0x6f0 net/core/dev.c:6460 napi_poll net/core/dev.c:6527 [inline] net_rx_action+0x8a9/0xcb0 net/core/dev.c:6660 __do_softirq+0x1d4/0x905 kernel/softirq.c:571 run_ksoftirqd kernel/softirq.c:939 [inline] run_ksoftirqd+0x31/0x60 kernel/softirq.c:931 smpboot_thread_fn+0x659/0x9e0 kernel/smpboot.c:164 kthread+0x344/0x440 kernel/kthread.c:379 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308 -> #1 (&jsk->sk_session_queue_lock){+.-.}-{2:2}: __raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline] _raw_spin_lock_bh+0x33/0x40 kernel/locking/spinlock.c:178 spin_lock_bh include/linux/spinlock.h:355 [inline] j1939_sk_queue_drop_all+0x3b/0x2f0 net/can/j1939/socket.c:139 j1939_sk_netdev_event_netdown+0x7f/0x160 net/can/j1939/socket.c:1280 j1939_netdev_notify+0x19f/0x1d0 net/can/j1939/main.c:381 notifier_call_chain+0xb6/0x3c0 kernel/notifier.c:93 call_netdevice_notifiers_info+0xb9/0x130 net/core/dev.c:1962 call_netdevice_notifiers_extack net/core/dev.c:2000 [inline] call_netdevice_notifiers net/core/dev.c:2014 [inline] __dev_notify_flags+0x1ea/0x2d0 net/core/dev.c:8571 dev_change_flags+0x11b/0x170 net/core/dev.c:8607 do_setlink+0x19e2/0x3ae0 net/core/rtnetlink.c:2867 __rtnl_newlink+0xd85/0x1860 net/core/rtnetlink.c:3655 rtnl_newlink+0x68/0xa0 net/core/rtnetlink.c:3702 rtnetlink_rcv_msg+0x43d/0xd50 net/core/rtnetlink.c:6424 netlink_rcv_skb+0x165/0x440 net/netlink/af_netlink.c:2549 netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline] netlink_unicast+0x547/0x7f0 net/netlink/af_netlink.c:1365 netlink_sendmsg+0x925/0xe30 net/netlink/af_netlink.c:1914 sock_sendmsg_nosec net/socket.c:724 [inline] sock_sendmsg+0xde/0x190 net/socket.c:747 ____sys_sendmsg+0x733/0x920 net/socket.c:2493 ___sys_sendmsg+0x110/0x1b0 net/socket.c:2547 __sys_sendmsg+0xf7/0x1c0 net/socket.c:2576 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd -> #0 (&priv->j1939_socks_lock){+.-.}-{2:2}: check_prev_add kernel/locking/lockdep.c:3113 [inline] check_prevs_add kernel/locking/lockdep.c:3232 [inline] validate_chain kernel/locking/lockdep.c:3847 [inline] __lock_acquire+0x2fcd/0x5f30 kernel/locking/lockdep.c:5088 lock_acquire kernel/locking/lockdep.c:5705 [inline] lock_acquire+0x1b1/0x520 kernel/locking/lockdep.c:5670 __raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline] _raw_spin_lock_bh+0x33/0x40 kernel/locking/spinlock.c:178 spin_lock_bh include/linux/spinlock.h:355 [inline] j1939_sk_errqueue+0xa3/0x1a0 net/can/j1939/socket.c:1081 j1939_session_destroy+0x26c/0x4e0 net/can/j1939/transport.c:271 __j1939_session_release net/can/j1939/transport.c:294 [inline] kref_put include/linux/kref.h:65 [inline] j1939_session_put net/can/j1939/transport.c:299 [inline] j1939_session_deactivate_locked net/can/j1939/transport.c:1086 [inline] j1939_session_deactivate_locked+0x293/0x340 net/can/j1939/transport.c:1074 j1939_cancel_active_session+0x183/0x360 net/can/j1939/transport.c:2194 j1939_netdev_notify+0x197/0x1d0 net/can/j1939/main.c:380 notifier_call_chain+0xb6/0x3c0 kernel/notifier.c:93 call_netdevice_notifiers_info+0xb9/0x130 net/core/dev.c:1962 call_netdevice_notifiers_extack net/core/dev.c:2000 [inline] call_netdevice_notifiers net/core/dev.c:2014 [inline] __dev_notify_flags+0x1ea/0x2d0 net/core/dev.c:8571 dev_change_flags+0x11b/0x170 net/core/dev.c:8607 do_setlink+0x19e2/0x3ae0 net/core/rtnetlink.c:2867 __rtnl_newlink+0xd85/0x1860 net/core/rtnetlink.c:3655 rtnl_newlink+0x68/0xa0 net/core/rtnetlink.c:3702 rtnetlink_rcv_msg+0x43d/0xd50 net/core/rtnetlink.c:6424 netlink_rcv_skb+0x165/0x440 net/netlink/af_netlink.c:2549 netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline] netlink_unicast+0x547/0x7f0 net/netlink/af_netlink.c:1365 netlink_sendmsg+0x925/0xe30 net/netlink/af_netlink.c:1914 sock_sendmsg_nosec net/socket.c:724 [inline] sock_sendmsg+0xde/0x190 net/socket.c:747 ____sys_sendmsg+0x733/0x920 net/socket.c:2493 ___sys_sendmsg+0x110/0x1b0 net/socket.c:2547 __sys_sendmsg+0xf7/0x1c0 net/socket.c:2576 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd other info that might help us debug this: Chain exists of: &priv->j1939_socks_lock --> &jsk->sk_session_queue_lock --> &priv->active_session_list_lock Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&priv->active_session_list_lock); lock(&jsk->sk_session_queue_lock); lock(&priv->active_session_list_lock); lock(&priv->j1939_socks_lock); *** DEADLOCK *** 2 locks held by syz-executor.2/11224: #0: ffffffff8e1194a8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock net/core/rtnetlink.c:78 [inline] #0: ffffffff8e1194a8 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x3e8/0xd50 net/core/rtnetlink.c:6421 #1: ffff88803bee5088 (&priv->active_session_list_lock){+.-.}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:355 [inline] #1: ffff88803bee5088 (&priv->active_session_list_lock){+.-.}-{2:2}, at: j1939_session_list_lock net/can/j1939/transport.c:238 [inline] #1: ffff88803bee5088 (&priv->active_session_list_lock){+.-.}-{2:2}, at: j1939_cancel_active_session+0x41/0x360 net/can/j1939/transport.c:2183 stack backtrace: CPU: 1 PID: 11224 Comm: syz-executor.2 Not tainted 6.4.0-rc7-syzkaller-01948-gae230642190a #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/27/2023 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106 check_noncircular+0x25f/0x2e0 kernel/locking/lockdep.c:2188 check_prev_add kernel/locking/lockdep.c:3113 [inline] check_prevs_add kernel/locking/lockdep.c:3232 [inline] validate_chain kernel/locking/lockdep.c:3847 [inline] __lock_acquire+0x2fcd/0x5f30 kernel/locking/lockdep.c:5088 lock_acquire kernel/locking/lockdep.c:5705 [inline] lock_acquire+0x1b1/0x520 kernel/locking/lockdep.c:5670 __raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline] _raw_spin_lock_bh+0x33/0x40 kernel/locking/spinlock.c:178 spin_lock_bh include/linux/spinlock.h:355 [inline] j1939_sk_errqueue+0xa3/0x1a0 net/can/j1939/socket.c:1081 j1939_session_destroy+0x26c/0x4e0 net/can/j1939/transport.c:271 __j1939_session_release net/can/j1939/transport.c:294 [inline] kref_put include/linux/kref.h:65 [inline] j1939_session_put net/can/j1939/transport.c:299 [inline] j1939_session_deactivate_locked net/can/j1939/transport.c:1086 [inline] j1939_session_deactivate_locked+0x293/0x340 net/can/j1939/transport.c:1074 j1939_cancel_active_session+0x183/0x360 net/can/j1939/transport.c:2194 j1939_netdev_notify+0x197/0x1d0 net/can/j1939/main.c:380 notifier_call_chain+0xb6/0x3c0 kernel/notifier.c:93 call_netdevice_notifiers_info+0xb9/0x130 net/core/dev.c:1962 call_netdevice_notifiers_extack net/core/dev.c:2000 [inline] call_netdevice_notifiers net/core/dev.c:2014 [inline] __dev_notify_flags+0x1ea/0x2d0 net/core/dev.c:8571 dev_change_flags+0x11b/0x170 net/core/dev.c:8607 do_setlink+0x19e2/0x3ae0 net/core/rtnetlink.c:2867 __rtnl_newlink+0xd85/0x1860 net/core/rtnetlink.c:3655 rtnl_newlink+0x68/0xa0 net/core/rtnetlink.c:3702 rtnetlink_rcv_msg+0x43d/0xd50 net/core/rtnetlink.c:6424 netlink_rcv_skb+0x165/0x440 net/netlink/af_netlink.c:2549 netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline] netlink_unicast+0x547/0x7f0 net/netlink/af_netlink.c:1365 netlink_sendmsg+0x925/0xe30 net/netlink/af_netlink.c:1914 sock_sendmsg_nosec net/socket.c:724 [inline] sock_sendmsg+0xde/0x190 net/socket.c:747 ____sys_sendmsg+0x733/0x920 net/socket.c:2493 ___sys_sendmsg+0x110/0x1b0 net/socket.c:2547 __sys_sendmsg+0xf7/0x1c0 net/socket.c:2576 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7fdb2bc8c389 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007fdb2cab9168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007fdb2bdabf80 RCX: 00007fdb2bc8c389 RDX: 0000000000000000 RSI: 00000000200000c0 RDI: 0000000000000006 RBP: 00007fdb2bcd7493 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007ffd70d89bbf R14: 00007fdb2cab9300 R15: 0000000000022000 </TASK> --- This report is generated by a bot. It may contain errors. See https://goo.gl/tpsmEJ for more information about syzbot. syzbot engineers can be reached at syzkaller@googlegroups.com. syzbot will keep track of this issue. See: https://goo.gl/tpsmEJ#status for how to communicate with syzbot. If the bug is already fixed, let syzbot know by replying with: #syz fix: exact-commit-title If you want to change bug's subsystems, reply with: #syz set subsystems: new-subsystem (See the list of subsystem names on the web dashboard) If the bug is a duplicate of another bug, reply with: #syz dup: exact-subject-of-another-report If you want to undo deduplication, reply with: #syz undup ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH] can: j1939: prevent deadlock by changing j1939_socks_lock to rwlock 2023-07-04 6:19 [syzbot] [can?] possible deadlock in j1939_sk_errqueue (2) syzbot @ 2023-07-04 6:47 ` Ziqi Zhao 2023-07-04 6:47 ` syzbot 2023-07-21 16:22 ` Ziqi Zhao 2023-07-10 17:53 ` [syzbot] [can?] possible deadlock in j1939_sk_errqueue (2) syzbot 2023-11-15 3:54 ` [syzbot] [can?] possible deadlock in j1939_sk_errqueue (2) syzbot 2 siblings, 2 replies; 13+ messages in thread From: Ziqi Zhao @ 2023-07-04 6:47 UTC (permalink / raw) To: syzbot+1591462f226d9cbf0564 Cc: davem, edumazet, kernel, kuba, linux-can, linux-kernel, linux, mkl, netdev, pabeni, robin, socketcan, syzkaller-bugs, skhan, ivan.orlov0322, Ziqi Zhao The following 3 locks would race against each other, causing the deadlock situation in the Syzbot bug report: - j1939_socks_lock - active_session_list_lock - sk_session_queue_lock A reasonable fix is to change j1939_socks_lock to an rwlock, since in the rare situations where a write lock is required for the linked list that j1939_socks_lock is protecting, the code does not attempt to acquire any more locks. This would break the circular lock dependency, where, for example, the current thread already locks j1939_socks_lock and attempts to acquire sk_session_queue_lock, and at the same time, another thread attempts to acquire j1939_socks_lock while holding sk_session_queue_lock. NOTE: This patch along does not fix the unregister_netdevice bug reported by Syzbot; instead, it solves a deadlock situation to prepare for one or more further patches to actually fix the Syzbot bug, which appears to be a reference counting problem within the j1939 codebase. #syz test: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master Signed-off-by: Ziqi Zhao <astrajoan@yahoo.com> --- net/can/j1939/j1939-priv.h | 2 +- net/can/j1939/main.c | 2 +- net/can/j1939/socket.c | 25 +++++++++++++------------ 3 files changed, 15 insertions(+), 14 deletions(-) diff --git a/net/can/j1939/j1939-priv.h b/net/can/j1939/j1939-priv.h index 16af1a7f80f6..74f15592d170 100644 --- a/net/can/j1939/j1939-priv.h +++ b/net/can/j1939/j1939-priv.h @@ -86,7 +86,7 @@ struct j1939_priv { unsigned int tp_max_packet_size; /* lock for j1939_socks list */ - spinlock_t j1939_socks_lock; + rwlock_t j1939_socks_lock; struct list_head j1939_socks; struct kref rx_kref; diff --git a/net/can/j1939/main.c b/net/can/j1939/main.c index ecff1c947d68..a6fb89fa6278 100644 --- a/net/can/j1939/main.c +++ b/net/can/j1939/main.c @@ -274,7 +274,7 @@ struct j1939_priv *j1939_netdev_start(struct net_device *ndev) return ERR_PTR(-ENOMEM); j1939_tp_init(priv); - spin_lock_init(&priv->j1939_socks_lock); + rwlock_init(&priv->j1939_socks_lock); INIT_LIST_HEAD(&priv->j1939_socks); mutex_lock(&j1939_netdev_lock); diff --git a/net/can/j1939/socket.c b/net/can/j1939/socket.c index feaec4ad6d16..a8b981dc2065 100644 --- a/net/can/j1939/socket.c +++ b/net/can/j1939/socket.c @@ -80,16 +80,16 @@ static void j1939_jsk_add(struct j1939_priv *priv, struct j1939_sock *jsk) jsk->state |= J1939_SOCK_BOUND; j1939_priv_get(priv); - spin_lock_bh(&priv->j1939_socks_lock); + write_lock_bh(&priv->j1939_socks_lock); list_add_tail(&jsk->list, &priv->j1939_socks); - spin_unlock_bh(&priv->j1939_socks_lock); + write_unlock_bh(&priv->j1939_socks_lock); } static void j1939_jsk_del(struct j1939_priv *priv, struct j1939_sock *jsk) { - spin_lock_bh(&priv->j1939_socks_lock); + write_lock_bh(&priv->j1939_socks_lock); list_del_init(&jsk->list); - spin_unlock_bh(&priv->j1939_socks_lock); + write_unlock_bh(&priv->j1939_socks_lock); j1939_priv_put(priv); jsk->state &= ~J1939_SOCK_BOUND; @@ -329,13 +329,13 @@ bool j1939_sk_recv_match(struct j1939_priv *priv, struct j1939_sk_buff_cb *skcb) struct j1939_sock *jsk; bool match = false; - spin_lock_bh(&priv->j1939_socks_lock); + read_lock_bh(&priv->j1939_socks_lock); list_for_each_entry(jsk, &priv->j1939_socks, list) { match = j1939_sk_recv_match_one(jsk, skcb); if (match) break; } - spin_unlock_bh(&priv->j1939_socks_lock); + read_unlock_bh(&priv->j1939_socks_lock); return match; } @@ -344,11 +344,11 @@ void j1939_sk_recv(struct j1939_priv *priv, struct sk_buff *skb) { struct j1939_sock *jsk; - spin_lock_bh(&priv->j1939_socks_lock); + read_lock_bh(&priv->j1939_socks_lock); list_for_each_entry(jsk, &priv->j1939_socks, list) { j1939_sk_recv_one(jsk, skb); } - spin_unlock_bh(&priv->j1939_socks_lock); + read_unlock_bh(&priv->j1939_socks_lock); } static void j1939_sk_sock_destruct(struct sock *sk) @@ -484,6 +484,7 @@ static int j1939_sk_bind(struct socket *sock, struct sockaddr *uaddr, int len) priv = j1939_netdev_start(ndev); dev_put(ndev); + if (IS_ERR(priv)) { ret = PTR_ERR(priv); goto out_release_sock; @@ -1078,12 +1079,12 @@ void j1939_sk_errqueue(struct j1939_session *session, } /* spread RX notifications to all sockets subscribed to this session */ - spin_lock_bh(&priv->j1939_socks_lock); + read_lock_bh(&priv->j1939_socks_lock); list_for_each_entry(jsk, &priv->j1939_socks, list) { if (j1939_sk_recv_match_one(jsk, &session->skcb)) __j1939_sk_errqueue(session, &jsk->sk, type); } - spin_unlock_bh(&priv->j1939_socks_lock); + read_unlock_bh(&priv->j1939_socks_lock); }; void j1939_sk_send_loop_abort(struct sock *sk, int err) @@ -1271,7 +1272,7 @@ void j1939_sk_netdev_event_netdown(struct j1939_priv *priv) struct j1939_sock *jsk; int error_code = ENETDOWN; - spin_lock_bh(&priv->j1939_socks_lock); + read_lock_bh(&priv->j1939_socks_lock); list_for_each_entry(jsk, &priv->j1939_socks, list) { jsk->sk.sk_err = error_code; if (!sock_flag(&jsk->sk, SOCK_DEAD)) @@ -1279,7 +1280,7 @@ void j1939_sk_netdev_event_netdown(struct j1939_priv *priv) j1939_sk_queue_drop_all(priv, jsk, error_code); } - spin_unlock_bh(&priv->j1939_socks_lock); + read_unlock_bh(&priv->j1939_socks_lock); } static int j1939_sk_no_ioctlcmd(struct socket *sock, unsigned int cmd, -- 2.34.1 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH] can: j1939: prevent deadlock by changing j1939_socks_lock to rwlock 2023-07-04 6:47 ` [PATCH] can: j1939: prevent deadlock by changing j1939_socks_lock to rwlock Ziqi Zhao @ 2023-07-04 6:47 ` syzbot 2023-07-04 7:37 ` Oleksij Rempel 2023-07-21 16:22 ` Ziqi Zhao 1 sibling, 1 reply; 13+ messages in thread From: syzbot @ 2023-07-04 6:47 UTC (permalink / raw) To: astrajoan Cc: astrajoan, davem, edumazet, ivan.orlov0322, kernel, kuba, linux-can, linux-kernel, linux, mkl, netdev, pabeni, robin, skhan, socketcan, syzkaller-bugs > The following 3 locks would race against each other, causing the > deadlock situation in the Syzbot bug report: > > - j1939_socks_lock > - active_session_list_lock > - sk_session_queue_lock > > A reasonable fix is to change j1939_socks_lock to an rwlock, since in > the rare situations where a write lock is required for the linked list > that j1939_socks_lock is protecting, the code does not attempt to > acquire any more locks. This would break the circular lock dependency, > where, for example, the current thread already locks j1939_socks_lock > and attempts to acquire sk_session_queue_lock, and at the same time, > another thread attempts to acquire j1939_socks_lock while holding > sk_session_queue_lock. > > NOTE: This patch along does not fix the unregister_netdevice bug > reported by Syzbot; instead, it solves a deadlock situation to prepare > for one or more further patches to actually fix the Syzbot bug, which > appears to be a reference counting problem within the j1939 codebase. > > #syz test: This crash does not have a reproducer. I cannot test it. > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master > > Signed-off-by: Ziqi Zhao <astrajoan@yahoo.com> > --- > net/can/j1939/j1939-priv.h | 2 +- > net/can/j1939/main.c | 2 +- > net/can/j1939/socket.c | 25 +++++++++++++------------ > 3 files changed, 15 insertions(+), 14 deletions(-) > > diff --git a/net/can/j1939/j1939-priv.h b/net/can/j1939/j1939-priv.h > index 16af1a7f80f6..74f15592d170 100644 > --- a/net/can/j1939/j1939-priv.h > +++ b/net/can/j1939/j1939-priv.h > @@ -86,7 +86,7 @@ struct j1939_priv { > unsigned int tp_max_packet_size; > > /* lock for j1939_socks list */ > - spinlock_t j1939_socks_lock; > + rwlock_t j1939_socks_lock; > struct list_head j1939_socks; > > struct kref rx_kref; > diff --git a/net/can/j1939/main.c b/net/can/j1939/main.c > index ecff1c947d68..a6fb89fa6278 100644 > --- a/net/can/j1939/main.c > +++ b/net/can/j1939/main.c > @@ -274,7 +274,7 @@ struct j1939_priv *j1939_netdev_start(struct net_device *ndev) > return ERR_PTR(-ENOMEM); > > j1939_tp_init(priv); > - spin_lock_init(&priv->j1939_socks_lock); > + rwlock_init(&priv->j1939_socks_lock); > INIT_LIST_HEAD(&priv->j1939_socks); > > mutex_lock(&j1939_netdev_lock); > diff --git a/net/can/j1939/socket.c b/net/can/j1939/socket.c > index feaec4ad6d16..a8b981dc2065 100644 > --- a/net/can/j1939/socket.c > +++ b/net/can/j1939/socket.c > @@ -80,16 +80,16 @@ static void j1939_jsk_add(struct j1939_priv *priv, struct j1939_sock *jsk) > jsk->state |= J1939_SOCK_BOUND; > j1939_priv_get(priv); > > - spin_lock_bh(&priv->j1939_socks_lock); > + write_lock_bh(&priv->j1939_socks_lock); > list_add_tail(&jsk->list, &priv->j1939_socks); > - spin_unlock_bh(&priv->j1939_socks_lock); > + write_unlock_bh(&priv->j1939_socks_lock); > } > > static void j1939_jsk_del(struct j1939_priv *priv, struct j1939_sock *jsk) > { > - spin_lock_bh(&priv->j1939_socks_lock); > + write_lock_bh(&priv->j1939_socks_lock); > list_del_init(&jsk->list); > - spin_unlock_bh(&priv->j1939_socks_lock); > + write_unlock_bh(&priv->j1939_socks_lock); > > j1939_priv_put(priv); > jsk->state &= ~J1939_SOCK_BOUND; > @@ -329,13 +329,13 @@ bool j1939_sk_recv_match(struct j1939_priv *priv, struct j1939_sk_buff_cb *skcb) > struct j1939_sock *jsk; > bool match = false; > > - spin_lock_bh(&priv->j1939_socks_lock); > + read_lock_bh(&priv->j1939_socks_lock); > list_for_each_entry(jsk, &priv->j1939_socks, list) { > match = j1939_sk_recv_match_one(jsk, skcb); > if (match) > break; > } > - spin_unlock_bh(&priv->j1939_socks_lock); > + read_unlock_bh(&priv->j1939_socks_lock); > > return match; > } > @@ -344,11 +344,11 @@ void j1939_sk_recv(struct j1939_priv *priv, struct sk_buff *skb) > { > struct j1939_sock *jsk; > > - spin_lock_bh(&priv->j1939_socks_lock); > + read_lock_bh(&priv->j1939_socks_lock); > list_for_each_entry(jsk, &priv->j1939_socks, list) { > j1939_sk_recv_one(jsk, skb); > } > - spin_unlock_bh(&priv->j1939_socks_lock); > + read_unlock_bh(&priv->j1939_socks_lock); > } > > static void j1939_sk_sock_destruct(struct sock *sk) > @@ -484,6 +484,7 @@ static int j1939_sk_bind(struct socket *sock, struct sockaddr *uaddr, int len) > > priv = j1939_netdev_start(ndev); > dev_put(ndev); > + > if (IS_ERR(priv)) { > ret = PTR_ERR(priv); > goto out_release_sock; > @@ -1078,12 +1079,12 @@ void j1939_sk_errqueue(struct j1939_session *session, > } > > /* spread RX notifications to all sockets subscribed to this session */ > - spin_lock_bh(&priv->j1939_socks_lock); > + read_lock_bh(&priv->j1939_socks_lock); > list_for_each_entry(jsk, &priv->j1939_socks, list) { > if (j1939_sk_recv_match_one(jsk, &session->skcb)) > __j1939_sk_errqueue(session, &jsk->sk, type); > } > - spin_unlock_bh(&priv->j1939_socks_lock); > + read_unlock_bh(&priv->j1939_socks_lock); > }; > > void j1939_sk_send_loop_abort(struct sock *sk, int err) > @@ -1271,7 +1272,7 @@ void j1939_sk_netdev_event_netdown(struct j1939_priv *priv) > struct j1939_sock *jsk; > int error_code = ENETDOWN; > > - spin_lock_bh(&priv->j1939_socks_lock); > + read_lock_bh(&priv->j1939_socks_lock); > list_for_each_entry(jsk, &priv->j1939_socks, list) { > jsk->sk.sk_err = error_code; > if (!sock_flag(&jsk->sk, SOCK_DEAD)) > @@ -1279,7 +1280,7 @@ void j1939_sk_netdev_event_netdown(struct j1939_priv *priv) > > j1939_sk_queue_drop_all(priv, jsk, error_code); > } > - spin_unlock_bh(&priv->j1939_socks_lock); > + read_unlock_bh(&priv->j1939_socks_lock); > } > > static int j1939_sk_no_ioctlcmd(struct socket *sock, unsigned int cmd, > -- > 2.34.1 > ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] can: j1939: prevent deadlock by changing j1939_socks_lock to rwlock 2023-07-04 6:47 ` syzbot @ 2023-07-04 7:37 ` Oleksij Rempel 0 siblings, 0 replies; 13+ messages in thread From: Oleksij Rempel @ 2023-07-04 7:37 UTC (permalink / raw) To: syzbot Cc: astrajoan, davem, edumazet, ivan.orlov0322, kernel, kuba, linux-can, linux-kernel, linux, mkl, netdev, pabeni, robin, skhan, socketcan, syzkaller-bugs On Mon, Jul 03, 2023 at 11:47:26PM -0700, syzbot wrote: > > The following 3 locks would race against each other, causing the > > deadlock situation in the Syzbot bug report: > > > > - j1939_socks_lock > > - active_session_list_lock > > - sk_session_queue_lock > > > > A reasonable fix is to change j1939_socks_lock to an rwlock, since in > > the rare situations where a write lock is required for the linked list > > that j1939_socks_lock is protecting, the code does not attempt to > > acquire any more locks. This would break the circular lock dependency, > > where, for example, the current thread already locks j1939_socks_lock > > and attempts to acquire sk_session_queue_lock, and at the same time, > > another thread attempts to acquire j1939_socks_lock while holding > > sk_session_queue_lock. > > > > NOTE: This patch along does not fix the unregister_netdevice bug > > reported by Syzbot; instead, it solves a deadlock situation to prepare > > for one or more further patches to actually fix the Syzbot bug, which > > appears to be a reference counting problem within the j1939 codebase. > > > > #syz test: > > This crash does not have a reproducer. I cannot test it. > To stress this code path, the socket should be configured with err queue enabled. For example like this: value = 1; setsockopt(priv->sock, SOL_CAN_J1939, SO_J1939_ERRQUEUE, &value, sizeof(value)); sock_opt = SOF_TIMESTAMPING_SOFTWARE | SOF_TIMESTAMPING_OPT_CMSG | SOF_TIMESTAMPING_TX_ACK | SOF_TIMESTAMPING_TX_SCHED | SOF_TIMESTAMPING_OPT_STATS | SOF_TIMESTAMPING_OPT_TSONLY | SOF_TIMESTAMPING_OPT_ID | SOF_TIMESTAMPING_RX_SOFTWARE; setsockopt(priv->sock, SOL_SOCKET, SO_TIMESTAMPING, (char *) &sock_opt, sizeof(sock_opt)); I hope it will help to create the reproducer Regards, Oleksij -- Pengutronix e.K. | | Steuerwalder Str. 21 | http://www.pengutronix.de/ | 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 | Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 | ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH] can: j1939: prevent deadlock by changing j1939_socks_lock to rwlock 2023-07-04 6:47 ` [PATCH] can: j1939: prevent deadlock by changing j1939_socks_lock to rwlock Ziqi Zhao 2023-07-04 6:47 ` syzbot @ 2023-07-21 16:22 ` Ziqi Zhao 2023-07-23 15:41 ` Oleksij Rempel 2023-08-07 4:46 ` Oleksij Rempel 1 sibling, 2 replies; 13+ messages in thread From: Ziqi Zhao @ 2023-07-21 16:22 UTC (permalink / raw) To: astrajoan, davem, edumazet, ivan.orlov0322, kernel, kuba, linux, linux-can, mkl, pabeni, robin, skhan, socketcan Cc: arnd, bridge, linux-kernel, mudongliangabcd, netdev, nikolay, roopa, syzbot+881d65229ca4f9ae8c84, syzkaller-bugs, syzbot+1591462f226d9cbf0564 The following 3 locks would race against each other, causing the deadlock situation in the Syzbot bug report: - j1939_socks_lock - active_session_list_lock - sk_session_queue_lock A reasonable fix is to change j1939_socks_lock to an rwlock, since in the rare situations where a write lock is required for the linked list that j1939_socks_lock is protecting, the code does not attempt to acquire any more locks. This would break the circular lock dependency, where, for example, the current thread already locks j1939_socks_lock and attempts to acquire sk_session_queue_lock, and at the same time, another thread attempts to acquire j1939_socks_lock while holding sk_session_queue_lock. NOTE: This patch along does not fix the unregister_netdevice bug reported by Syzbot; instead, it solves a deadlock situation to prepare for one or more further patches to actually fix the Syzbot bug, which appears to be a reference counting problem within the j1939 codebase. Reported-by: syzbot+1591462f226d9cbf0564@syzkaller.appspotmail.com Signed-off-by: Ziqi Zhao <astrajoan@yahoo.com> --- net/can/j1939/j1939-priv.h | 2 +- net/can/j1939/main.c | 2 +- net/can/j1939/socket.c | 25 +++++++++++++------------ 3 files changed, 15 insertions(+), 14 deletions(-) diff --git a/net/can/j1939/j1939-priv.h b/net/can/j1939/j1939-priv.h index 16af1a7f80f6..74f15592d170 100644 --- a/net/can/j1939/j1939-priv.h +++ b/net/can/j1939/j1939-priv.h @@ -86,7 +86,7 @@ struct j1939_priv { unsigned int tp_max_packet_size; /* lock for j1939_socks list */ - spinlock_t j1939_socks_lock; + rwlock_t j1939_socks_lock; struct list_head j1939_socks; struct kref rx_kref; diff --git a/net/can/j1939/main.c b/net/can/j1939/main.c index ecff1c947d68..a6fb89fa6278 100644 --- a/net/can/j1939/main.c +++ b/net/can/j1939/main.c @@ -274,7 +274,7 @@ struct j1939_priv *j1939_netdev_start(struct net_device *ndev) return ERR_PTR(-ENOMEM); j1939_tp_init(priv); - spin_lock_init(&priv->j1939_socks_lock); + rwlock_init(&priv->j1939_socks_lock); INIT_LIST_HEAD(&priv->j1939_socks); mutex_lock(&j1939_netdev_lock); diff --git a/net/can/j1939/socket.c b/net/can/j1939/socket.c index feaec4ad6d16..a8b981dc2065 100644 --- a/net/can/j1939/socket.c +++ b/net/can/j1939/socket.c @@ -80,16 +80,16 @@ static void j1939_jsk_add(struct j1939_priv *priv, struct j1939_sock *jsk) jsk->state |= J1939_SOCK_BOUND; j1939_priv_get(priv); - spin_lock_bh(&priv->j1939_socks_lock); + write_lock_bh(&priv->j1939_socks_lock); list_add_tail(&jsk->list, &priv->j1939_socks); - spin_unlock_bh(&priv->j1939_socks_lock); + write_unlock_bh(&priv->j1939_socks_lock); } static void j1939_jsk_del(struct j1939_priv *priv, struct j1939_sock *jsk) { - spin_lock_bh(&priv->j1939_socks_lock); + write_lock_bh(&priv->j1939_socks_lock); list_del_init(&jsk->list); - spin_unlock_bh(&priv->j1939_socks_lock); + write_unlock_bh(&priv->j1939_socks_lock); j1939_priv_put(priv); jsk->state &= ~J1939_SOCK_BOUND; @@ -329,13 +329,13 @@ bool j1939_sk_recv_match(struct j1939_priv *priv, struct j1939_sk_buff_cb *skcb) struct j1939_sock *jsk; bool match = false; - spin_lock_bh(&priv->j1939_socks_lock); + read_lock_bh(&priv->j1939_socks_lock); list_for_each_entry(jsk, &priv->j1939_socks, list) { match = j1939_sk_recv_match_one(jsk, skcb); if (match) break; } - spin_unlock_bh(&priv->j1939_socks_lock); + read_unlock_bh(&priv->j1939_socks_lock); return match; } @@ -344,11 +344,11 @@ void j1939_sk_recv(struct j1939_priv *priv, struct sk_buff *skb) { struct j1939_sock *jsk; - spin_lock_bh(&priv->j1939_socks_lock); + read_lock_bh(&priv->j1939_socks_lock); list_for_each_entry(jsk, &priv->j1939_socks, list) { j1939_sk_recv_one(jsk, skb); } - spin_unlock_bh(&priv->j1939_socks_lock); + read_unlock_bh(&priv->j1939_socks_lock); } static void j1939_sk_sock_destruct(struct sock *sk) @@ -484,6 +484,7 @@ static int j1939_sk_bind(struct socket *sock, struct sockaddr *uaddr, int len) priv = j1939_netdev_start(ndev); dev_put(ndev); + if (IS_ERR(priv)) { ret = PTR_ERR(priv); goto out_release_sock; @@ -1078,12 +1079,12 @@ void j1939_sk_errqueue(struct j1939_session *session, } /* spread RX notifications to all sockets subscribed to this session */ - spin_lock_bh(&priv->j1939_socks_lock); + read_lock_bh(&priv->j1939_socks_lock); list_for_each_entry(jsk, &priv->j1939_socks, list) { if (j1939_sk_recv_match_one(jsk, &session->skcb)) __j1939_sk_errqueue(session, &jsk->sk, type); } - spin_unlock_bh(&priv->j1939_socks_lock); + read_unlock_bh(&priv->j1939_socks_lock); }; void j1939_sk_send_loop_abort(struct sock *sk, int err) @@ -1271,7 +1272,7 @@ void j1939_sk_netdev_event_netdown(struct j1939_priv *priv) struct j1939_sock *jsk; int error_code = ENETDOWN; - spin_lock_bh(&priv->j1939_socks_lock); + read_lock_bh(&priv->j1939_socks_lock); list_for_each_entry(jsk, &priv->j1939_socks, list) { jsk->sk.sk_err = error_code; if (!sock_flag(&jsk->sk, SOCK_DEAD)) @@ -1279,7 +1280,7 @@ void j1939_sk_netdev_event_netdown(struct j1939_priv *priv) j1939_sk_queue_drop_all(priv, jsk, error_code); } - spin_unlock_bh(&priv->j1939_socks_lock); + read_unlock_bh(&priv->j1939_socks_lock); } static int j1939_sk_no_ioctlcmd(struct socket *sock, unsigned int cmd, -- 2.34.1 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH] can: j1939: prevent deadlock by changing j1939_socks_lock to rwlock 2023-07-21 16:22 ` Ziqi Zhao @ 2023-07-23 15:41 ` Oleksij Rempel 2023-08-07 4:46 ` Oleksij Rempel 1 sibling, 0 replies; 13+ messages in thread From: Oleksij Rempel @ 2023-07-23 15:41 UTC (permalink / raw) To: Ziqi Zhao Cc: davem, edumazet, ivan.orlov0322, kernel, kuba, linux, linux-can, mkl, pabeni, robin, skhan, socketcan, arnd, bridge, linux-kernel, mudongliangabcd, netdev, nikolay, roopa, syzbot+881d65229ca4f9ae8c84, syzkaller-bugs, syzbot+1591462f226d9cbf0564 Hi, Thank you for you patch. Right now I'm on vacation, I'll to take a look on it as soon as possible. If i do not response for more then 3 weeks, please ping me. On Fri, Jul 21, 2023 at 09:22:26AM -0700, Ziqi Zhao wrote: > The following 3 locks would race against each other, causing the > deadlock situation in the Syzbot bug report: > > - j1939_socks_lock > - active_session_list_lock > - sk_session_queue_lock > > A reasonable fix is to change j1939_socks_lock to an rwlock, since in > the rare situations where a write lock is required for the linked list > that j1939_socks_lock is protecting, the code does not attempt to > acquire any more locks. This would break the circular lock dependency, > where, for example, the current thread already locks j1939_socks_lock > and attempts to acquire sk_session_queue_lock, and at the same time, > another thread attempts to acquire j1939_socks_lock while holding > sk_session_queue_lock. > > NOTE: This patch along does not fix the unregister_netdevice bug > reported by Syzbot; instead, it solves a deadlock situation to prepare > for one or more further patches to actually fix the Syzbot bug, which > appears to be a reference counting problem within the j1939 codebase. > > Reported-by: syzbot+1591462f226d9cbf0564@syzkaller.appspotmail.com > Signed-off-by: Ziqi Zhao <astrajoan@yahoo.com> > --- > net/can/j1939/j1939-priv.h | 2 +- > net/can/j1939/main.c | 2 +- > net/can/j1939/socket.c | 25 +++++++++++++------------ > 3 files changed, 15 insertions(+), 14 deletions(-) > > diff --git a/net/can/j1939/j1939-priv.h b/net/can/j1939/j1939-priv.h > index 16af1a7f80f6..74f15592d170 100644 > --- a/net/can/j1939/j1939-priv.h > +++ b/net/can/j1939/j1939-priv.h > @@ -86,7 +86,7 @@ struct j1939_priv { > unsigned int tp_max_packet_size; > > /* lock for j1939_socks list */ > - spinlock_t j1939_socks_lock; > + rwlock_t j1939_socks_lock; > struct list_head j1939_socks; > > struct kref rx_kref; > diff --git a/net/can/j1939/main.c b/net/can/j1939/main.c > index ecff1c947d68..a6fb89fa6278 100644 > --- a/net/can/j1939/main.c > +++ b/net/can/j1939/main.c > @@ -274,7 +274,7 @@ struct j1939_priv *j1939_netdev_start(struct net_device *ndev) > return ERR_PTR(-ENOMEM); > > j1939_tp_init(priv); > - spin_lock_init(&priv->j1939_socks_lock); > + rwlock_init(&priv->j1939_socks_lock); > INIT_LIST_HEAD(&priv->j1939_socks); > > mutex_lock(&j1939_netdev_lock); > diff --git a/net/can/j1939/socket.c b/net/can/j1939/socket.c > index feaec4ad6d16..a8b981dc2065 100644 > --- a/net/can/j1939/socket.c > +++ b/net/can/j1939/socket.c > @@ -80,16 +80,16 @@ static void j1939_jsk_add(struct j1939_priv *priv, struct j1939_sock *jsk) > jsk->state |= J1939_SOCK_BOUND; > j1939_priv_get(priv); > > - spin_lock_bh(&priv->j1939_socks_lock); > + write_lock_bh(&priv->j1939_socks_lock); > list_add_tail(&jsk->list, &priv->j1939_socks); > - spin_unlock_bh(&priv->j1939_socks_lock); > + write_unlock_bh(&priv->j1939_socks_lock); > } > > static void j1939_jsk_del(struct j1939_priv *priv, struct j1939_sock *jsk) > { > - spin_lock_bh(&priv->j1939_socks_lock); > + write_lock_bh(&priv->j1939_socks_lock); > list_del_init(&jsk->list); > - spin_unlock_bh(&priv->j1939_socks_lock); > + write_unlock_bh(&priv->j1939_socks_lock); > > j1939_priv_put(priv); > jsk->state &= ~J1939_SOCK_BOUND; > @@ -329,13 +329,13 @@ bool j1939_sk_recv_match(struct j1939_priv *priv, struct j1939_sk_buff_cb *skcb) > struct j1939_sock *jsk; > bool match = false; > > - spin_lock_bh(&priv->j1939_socks_lock); > + read_lock_bh(&priv->j1939_socks_lock); > list_for_each_entry(jsk, &priv->j1939_socks, list) { > match = j1939_sk_recv_match_one(jsk, skcb); > if (match) > break; > } > - spin_unlock_bh(&priv->j1939_socks_lock); > + read_unlock_bh(&priv->j1939_socks_lock); > > return match; > } > @@ -344,11 +344,11 @@ void j1939_sk_recv(struct j1939_priv *priv, struct sk_buff *skb) > { > struct j1939_sock *jsk; > > - spin_lock_bh(&priv->j1939_socks_lock); > + read_lock_bh(&priv->j1939_socks_lock); > list_for_each_entry(jsk, &priv->j1939_socks, list) { > j1939_sk_recv_one(jsk, skb); > } > - spin_unlock_bh(&priv->j1939_socks_lock); > + read_unlock_bh(&priv->j1939_socks_lock); > } > > static void j1939_sk_sock_destruct(struct sock *sk) > @@ -484,6 +484,7 @@ static int j1939_sk_bind(struct socket *sock, struct sockaddr *uaddr, int len) > > priv = j1939_netdev_start(ndev); > dev_put(ndev); > + > if (IS_ERR(priv)) { > ret = PTR_ERR(priv); > goto out_release_sock; > @@ -1078,12 +1079,12 @@ void j1939_sk_errqueue(struct j1939_session *session, > } > > /* spread RX notifications to all sockets subscribed to this session */ > - spin_lock_bh(&priv->j1939_socks_lock); > + read_lock_bh(&priv->j1939_socks_lock); > list_for_each_entry(jsk, &priv->j1939_socks, list) { > if (j1939_sk_recv_match_one(jsk, &session->skcb)) > __j1939_sk_errqueue(session, &jsk->sk, type); > } > - spin_unlock_bh(&priv->j1939_socks_lock); > + read_unlock_bh(&priv->j1939_socks_lock); > }; > > void j1939_sk_send_loop_abort(struct sock *sk, int err) > @@ -1271,7 +1272,7 @@ void j1939_sk_netdev_event_netdown(struct j1939_priv *priv) > struct j1939_sock *jsk; > int error_code = ENETDOWN; > > - spin_lock_bh(&priv->j1939_socks_lock); > + read_lock_bh(&priv->j1939_socks_lock); > list_for_each_entry(jsk, &priv->j1939_socks, list) { > jsk->sk.sk_err = error_code; > if (!sock_flag(&jsk->sk, SOCK_DEAD)) > @@ -1279,7 +1280,7 @@ void j1939_sk_netdev_event_netdown(struct j1939_priv *priv) > > j1939_sk_queue_drop_all(priv, jsk, error_code); > } > - spin_unlock_bh(&priv->j1939_socks_lock); > + read_unlock_bh(&priv->j1939_socks_lock); > } > > static int j1939_sk_no_ioctlcmd(struct socket *sock, unsigned int cmd, > -- > 2.34.1 > > > -- Pengutronix e.K. | | Steuerwalder Str. 21 | http://www.pengutronix.de/ | 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 | Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 | ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] can: j1939: prevent deadlock by changing j1939_socks_lock to rwlock 2023-07-21 16:22 ` Ziqi Zhao 2023-07-23 15:41 ` Oleksij Rempel @ 2023-08-07 4:46 ` Oleksij Rempel 2023-11-17 8:10 ` Oleksij Rempel 1 sibling, 1 reply; 13+ messages in thread From: Oleksij Rempel @ 2023-08-07 4:46 UTC (permalink / raw) To: Ziqi Zhao Cc: davem, edumazet, ivan.orlov0322, kernel, kuba, linux, linux-can, mkl, pabeni, robin, skhan, socketcan, arnd, netdev, bridge, syzkaller-bugs, linux-kernel, mudongliangabcd, nikolay, syzbot+1591462f226d9cbf0564, roopa, syzbot+881d65229ca4f9ae8c84 On Fri, Jul 21, 2023 at 09:22:26AM -0700, Ziqi Zhao wrote: > The following 3 locks would race against each other, causing the > deadlock situation in the Syzbot bug report: > > - j1939_socks_lock > - active_session_list_lock > - sk_session_queue_lock > > A reasonable fix is to change j1939_socks_lock to an rwlock, since in > the rare situations where a write lock is required for the linked list > that j1939_socks_lock is protecting, the code does not attempt to > acquire any more locks. This would break the circular lock dependency, > where, for example, the current thread already locks j1939_socks_lock > and attempts to acquire sk_session_queue_lock, and at the same time, > another thread attempts to acquire j1939_socks_lock while holding > sk_session_queue_lock. > > NOTE: This patch along does not fix the unregister_netdevice bug > reported by Syzbot; instead, it solves a deadlock situation to prepare > for one or more further patches to actually fix the Syzbot bug, which > appears to be a reference counting problem within the j1939 codebase. > > Reported-by: syzbot+1591462f226d9cbf0564@syzkaller.appspotmail.com > Signed-off-by: Ziqi Zhao <astrajoan@yahoo.com> Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Thank you! > --- > net/can/j1939/j1939-priv.h | 2 +- > net/can/j1939/main.c | 2 +- > net/can/j1939/socket.c | 25 +++++++++++++------------ > 3 files changed, 15 insertions(+), 14 deletions(-) > > diff --git a/net/can/j1939/j1939-priv.h b/net/can/j1939/j1939-priv.h > index 16af1a7f80f6..74f15592d170 100644 > --- a/net/can/j1939/j1939-priv.h > +++ b/net/can/j1939/j1939-priv.h > @@ -86,7 +86,7 @@ struct j1939_priv { > unsigned int tp_max_packet_size; > > /* lock for j1939_socks list */ > - spinlock_t j1939_socks_lock; > + rwlock_t j1939_socks_lock; > struct list_head j1939_socks; > > struct kref rx_kref; > diff --git a/net/can/j1939/main.c b/net/can/j1939/main.c > index ecff1c947d68..a6fb89fa6278 100644 > --- a/net/can/j1939/main.c > +++ b/net/can/j1939/main.c > @@ -274,7 +274,7 @@ struct j1939_priv *j1939_netdev_start(struct net_device *ndev) > return ERR_PTR(-ENOMEM); > > j1939_tp_init(priv); > - spin_lock_init(&priv->j1939_socks_lock); > + rwlock_init(&priv->j1939_socks_lock); > INIT_LIST_HEAD(&priv->j1939_socks); > > mutex_lock(&j1939_netdev_lock); > diff --git a/net/can/j1939/socket.c b/net/can/j1939/socket.c > index feaec4ad6d16..a8b981dc2065 100644 > --- a/net/can/j1939/socket.c > +++ b/net/can/j1939/socket.c > @@ -80,16 +80,16 @@ static void j1939_jsk_add(struct j1939_priv *priv, struct j1939_sock *jsk) > jsk->state |= J1939_SOCK_BOUND; > j1939_priv_get(priv); > > - spin_lock_bh(&priv->j1939_socks_lock); > + write_lock_bh(&priv->j1939_socks_lock); > list_add_tail(&jsk->list, &priv->j1939_socks); > - spin_unlock_bh(&priv->j1939_socks_lock); > + write_unlock_bh(&priv->j1939_socks_lock); > } > > static void j1939_jsk_del(struct j1939_priv *priv, struct j1939_sock *jsk) > { > - spin_lock_bh(&priv->j1939_socks_lock); > + write_lock_bh(&priv->j1939_socks_lock); > list_del_init(&jsk->list); > - spin_unlock_bh(&priv->j1939_socks_lock); > + write_unlock_bh(&priv->j1939_socks_lock); > > j1939_priv_put(priv); > jsk->state &= ~J1939_SOCK_BOUND; > @@ -329,13 +329,13 @@ bool j1939_sk_recv_match(struct j1939_priv *priv, struct j1939_sk_buff_cb *skcb) > struct j1939_sock *jsk; > bool match = false; > > - spin_lock_bh(&priv->j1939_socks_lock); > + read_lock_bh(&priv->j1939_socks_lock); > list_for_each_entry(jsk, &priv->j1939_socks, list) { > match = j1939_sk_recv_match_one(jsk, skcb); > if (match) > break; > } > - spin_unlock_bh(&priv->j1939_socks_lock); > + read_unlock_bh(&priv->j1939_socks_lock); > > return match; > } > @@ -344,11 +344,11 @@ void j1939_sk_recv(struct j1939_priv *priv, struct sk_buff *skb) > { > struct j1939_sock *jsk; > > - spin_lock_bh(&priv->j1939_socks_lock); > + read_lock_bh(&priv->j1939_socks_lock); > list_for_each_entry(jsk, &priv->j1939_socks, list) { > j1939_sk_recv_one(jsk, skb); > } > - spin_unlock_bh(&priv->j1939_socks_lock); > + read_unlock_bh(&priv->j1939_socks_lock); > } > > static void j1939_sk_sock_destruct(struct sock *sk) > @@ -484,6 +484,7 @@ static int j1939_sk_bind(struct socket *sock, struct sockaddr *uaddr, int len) > > priv = j1939_netdev_start(ndev); > dev_put(ndev); > + > if (IS_ERR(priv)) { > ret = PTR_ERR(priv); > goto out_release_sock; > @@ -1078,12 +1079,12 @@ void j1939_sk_errqueue(struct j1939_session *session, > } > > /* spread RX notifications to all sockets subscribed to this session */ > - spin_lock_bh(&priv->j1939_socks_lock); > + read_lock_bh(&priv->j1939_socks_lock); > list_for_each_entry(jsk, &priv->j1939_socks, list) { > if (j1939_sk_recv_match_one(jsk, &session->skcb)) > __j1939_sk_errqueue(session, &jsk->sk, type); > } > - spin_unlock_bh(&priv->j1939_socks_lock); > + read_unlock_bh(&priv->j1939_socks_lock); > }; > > void j1939_sk_send_loop_abort(struct sock *sk, int err) > @@ -1271,7 +1272,7 @@ void j1939_sk_netdev_event_netdown(struct j1939_priv *priv) > struct j1939_sock *jsk; > int error_code = ENETDOWN; > > - spin_lock_bh(&priv->j1939_socks_lock); > + read_lock_bh(&priv->j1939_socks_lock); > list_for_each_entry(jsk, &priv->j1939_socks, list) { > jsk->sk.sk_err = error_code; > if (!sock_flag(&jsk->sk, SOCK_DEAD)) > @@ -1279,7 +1280,7 @@ void j1939_sk_netdev_event_netdown(struct j1939_priv *priv) > > j1939_sk_queue_drop_all(priv, jsk, error_code); > } > - spin_unlock_bh(&priv->j1939_socks_lock); > + read_unlock_bh(&priv->j1939_socks_lock); > } > > static int j1939_sk_no_ioctlcmd(struct socket *sock, unsigned int cmd, > -- > 2.34.1 > > > -- Pengutronix e.K. | | Steuerwalder Str. 21 | http://www.pengutronix.de/ | 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 | Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 | ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] can: j1939: prevent deadlock by changing j1939_socks_lock to rwlock 2023-08-07 4:46 ` Oleksij Rempel @ 2023-11-17 8:10 ` Oleksij Rempel 0 siblings, 0 replies; 13+ messages in thread From: Oleksij Rempel @ 2023-11-17 8:10 UTC (permalink / raw) To: Ziqi Zhao Cc: ivan.orlov0322, edumazet, syzbot+881d65229ca4f9ae8c84, socketcan, bridge, nikolay, syzbot+1591462f226d9cbf0564, roopa, kuba, pabeni, arnd, syzkaller-bugs, mudongliangabcd, linux-can, mkl, skhan, robin, linux-kernel, linux, kernel, netdev, davem On Mon, Aug 07, 2023 at 06:46:34AM +0200, Oleksij Rempel wrote: > On Fri, Jul 21, 2023 at 09:22:26AM -0700, Ziqi Zhao wrote: > > The following 3 locks would race against each other, causing the > > deadlock situation in the Syzbot bug report: > > > > - j1939_socks_lock > > - active_session_list_lock > > - sk_session_queue_lock > > > > A reasonable fix is to change j1939_socks_lock to an rwlock, since in > > the rare situations where a write lock is required for the linked list > > that j1939_socks_lock is protecting, the code does not attempt to > > acquire any more locks. This would break the circular lock dependency, > > where, for example, the current thread already locks j1939_socks_lock > > and attempts to acquire sk_session_queue_lock, and at the same time, > > another thread attempts to acquire j1939_socks_lock while holding > > sk_session_queue_lock. > > > > NOTE: This patch along does not fix the unregister_netdevice bug > > reported by Syzbot; instead, it solves a deadlock situation to prepare > > for one or more further patches to actually fix the Syzbot bug, which > > appears to be a reference counting problem within the j1939 codebase. > > > > Reported-by: syzbot+1591462f226d9cbf0564@syzkaller.appspotmail.com > > Signed-off-by: Ziqi Zhao <astrajoan@yahoo.com> Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de> -- Pengutronix e.K. | | Steuerwalder Str. 21 | http://www.pengutronix.de/ | 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 | Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 | ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [syzbot] [can?] possible deadlock in j1939_sk_errqueue (2) 2023-07-04 6:19 [syzbot] [can?] possible deadlock in j1939_sk_errqueue (2) syzbot 2023-07-04 6:47 ` [PATCH] can: j1939: prevent deadlock by changing j1939_socks_lock to rwlock Ziqi Zhao @ 2023-07-10 17:53 ` syzbot 2023-07-12 0:47 ` [PATCH] can: j1939: prevent deadlock by changing j1939_socks_lock to rwlock Ziqi Zhao 2023-11-15 3:54 ` [syzbot] [can?] possible deadlock in j1939_sk_errqueue (2) syzbot 2 siblings, 1 reply; 13+ messages in thread From: syzbot @ 2023-07-10 17:53 UTC (permalink / raw) To: astrajoan, davem, dvyukov, edumazet, ivan.orlov0322, kernel, kuba, linux-can, linux-kernel, linux, mkl, netdev, o.rempel, pabeni, robin, skhan, socketcan, syzkaller-bugs, syzkaller syzbot has found a reproducer for the following issue on: HEAD commit: e40939bbfc68 Merge branch 'for-next/core' into for-kernelci git tree: git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-kernelci console output: https://syzkaller.appspot.com/x/log.txt?x=17ce67d8a80000 kernel config: https://syzkaller.appspot.com/x/.config?x=c84f463eb74eab24 dashboard link: https://syzkaller.appspot.com/bug?extid=1591462f226d9cbf0564 compiler: Debian clang version 15.0.7, GNU ld (GNU Binutils for Debian) 2.35.2 userspace arch: arm64 syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1580fc5ca80000 C reproducer: https://syzkaller.appspot.com/x/repro.c?x=178f78d4a80000 Downloadable assets: disk image: https://storage.googleapis.com/syzbot-assets/257596b75aaf/disk-e40939bb.raw.xz vmlinux: https://storage.googleapis.com/syzbot-assets/9c75b8d61081/vmlinux-e40939bb.xz kernel image: https://storage.googleapis.com/syzbot-assets/8f0233129f4f/Image-e40939bb.gz.xz IMPORTANT: if you fix the issue, please add the following tag to the commit: Reported-by: syzbot+1591462f226d9cbf0564@syzkaller.appspotmail.com ====================================================== WARNING: possible circular locking dependency detected 6.4.0-rc7-syzkaller-ge40939bbfc68 #0 Not tainted ------------------------------------------------------ syz-executor375/6045 is trying to acquire lock: ffff0000d2e690d0 (&priv->j1939_socks_lock){+.-.}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:355 [inline] ffff0000d2e690d0 (&priv->j1939_socks_lock){+.-.}-{2:2}, at: j1939_sk_errqueue+0x90/0x144 net/can/j1939/socket.c:1081 but task is already holding lock: ffff0000d2e69088 (&priv->active_session_list_lock){+.-.}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:355 [inline] ffff0000d2e69088 (&priv->active_session_list_lock){+.-.}-{2:2}, at: j1939_session_list_lock net/can/j1939/transport.c:238 [inline] ffff0000d2e69088 (&priv->active_session_list_lock){+.-.}-{2:2}, at: j1939_cancel_active_session+0x54/0x414 net/can/j1939/transport.c:2183 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (&priv->active_session_list_lock){+.-.}-{2:2}: __raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline] _raw_spin_lock_bh+0x48/0x60 kernel/locking/spinlock.c:178 spin_lock_bh include/linux/spinlock.h:355 [inline] j1939_session_list_lock net/can/j1939/transport.c:238 [inline] j1939_session_activate+0x60/0x378 net/can/j1939/transport.c:1564 j1939_sk_queue_activate_next_locked net/can/j1939/socket.c:181 [inline] j1939_sk_queue_activate_next+0x230/0x3b4 net/can/j1939/socket.c:208 j1939_session_deactivate_activate_next net/can/j1939/transport.c:1108 [inline] j1939_session_completed net/can/j1939/transport.c:1222 [inline] j1939_xtp_rx_eoma_one net/can/j1939/transport.c:1395 [inline] j1939_xtp_rx_eoma+0x2c0/0x4c0 net/can/j1939/transport.c:1410 j1939_tp_cmd_recv net/can/j1939/transport.c:2099 [inline] j1939_tp_recv+0x714/0xe14 net/can/j1939/transport.c:2144 j1939_can_recv+0x5bc/0x930 net/can/j1939/main.c:112 deliver net/can/af_can.c:572 [inline] can_rcv_filter+0x308/0x714 net/can/af_can.c:606 can_receive+0x338/0x498 net/can/af_can.c:663 can_rcv+0x128/0x23c net/can/af_can.c:687 __netif_receive_skb_one_core net/core/dev.c:5493 [inline] __netif_receive_skb+0x18c/0x400 net/core/dev.c:5607 process_backlog+0x3c0/0x70c net/core/dev.c:5935 __napi_poll+0xb4/0x648 net/core/dev.c:6498 napi_poll net/core/dev.c:6565 [inline] net_rx_action+0x5e4/0xdc4 net/core/dev.c:6698 __do_softirq+0x2d0/0xd54 kernel/softirq.c:571 run_ksoftirqd+0x6c/0x158 kernel/softirq.c:939 smpboot_thread_fn+0x4b0/0x920 kernel/smpboot.c:164 kthread+0x288/0x310 kernel/kthread.c:379 ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:853 -> #1 (&jsk->sk_session_queue_lock){+.-.}-{2:2}: __raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline] _raw_spin_lock_bh+0x48/0x60 kernel/locking/spinlock.c:178 spin_lock_bh include/linux/spinlock.h:355 [inline] j1939_sk_queue_drop_all+0x4c/0x200 net/can/j1939/socket.c:139 j1939_sk_netdev_event_netdown+0xe0/0x144 net/can/j1939/socket.c:1280 j1939_netdev_notify+0xf0/0x144 net/can/j1939/main.c:381 notifier_call_chain+0x1a4/0x510 kernel/notifier.c:93 raw_notifier_call_chain+0x3c/0x50 kernel/notifier.c:461 __dev_notify_flags+0x2bc/0x544 dev_change_flags+0xd0/0x15c net/core/dev.c:8645 do_setlink+0xc68/0x35c8 net/core/rtnetlink.c:2867 __rtnl_newlink net/core/rtnetlink.c:3648 [inline] rtnl_newlink+0x1354/0x1b1c net/core/rtnetlink.c:3695 rtnetlink_rcv_msg+0x744/0xdb8 net/core/rtnetlink.c:6417 netlink_rcv_skb+0x214/0x3c4 net/netlink/af_netlink.c:2546 rtnetlink_rcv+0x28/0x38 net/core/rtnetlink.c:6435 netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline] netlink_unicast+0x660/0x8d4 net/netlink/af_netlink.c:1365 netlink_sendmsg+0x834/0xb18 net/netlink/af_netlink.c:1913 sock_sendmsg_nosec net/socket.c:724 [inline] sock_sendmsg net/socket.c:747 [inline] ____sys_sendmsg+0x568/0x81c net/socket.c:2503 ___sys_sendmsg net/socket.c:2557 [inline] __sys_sendmsg+0x26c/0x33c net/socket.c:2586 __do_sys_sendmsg net/socket.c:2595 [inline] __se_sys_sendmsg net/socket.c:2593 [inline] __arm64_sys_sendmsg+0x80/0x94 net/socket.c:2593 __invoke_syscall arch/arm64/kernel/syscall.c:38 [inline] invoke_syscall+0x98/0x2c0 arch/arm64/kernel/syscall.c:52 el0_svc_common+0x138/0x244 arch/arm64/kernel/syscall.c:142 do_el0_svc+0x64/0x198 arch/arm64/kernel/syscall.c:191 el0_svc+0x4c/0x160 arch/arm64/kernel/entry-common.c:647 el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:665 el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:591 -> #0 (&priv->j1939_socks_lock){+.-.}-{2:2}: check_prev_add kernel/locking/lockdep.c:3113 [inline] check_prevs_add kernel/locking/lockdep.c:3232 [inline] validate_chain kernel/locking/lockdep.c:3847 [inline] __lock_acquire+0x3308/0x7604 kernel/locking/lockdep.c:5088 lock_acquire+0x23c/0x71c kernel/locking/lockdep.c:5705 __raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline] _raw_spin_lock_bh+0x48/0x60 kernel/locking/spinlock.c:178 spin_lock_bh include/linux/spinlock.h:355 [inline] j1939_sk_errqueue+0x90/0x144 net/can/j1939/socket.c:1081 __j1939_session_release net/can/j1939/transport.c:294 [inline] kref_put include/linux/kref.h:65 [inline] j1939_session_put+0xf0/0x4b4 net/can/j1939/transport.c:299 j1939_session_deactivate_locked net/can/j1939/transport.c:1086 [inline] j1939_cancel_active_session+0x2ec/0x414 net/can/j1939/transport.c:2194 j1939_netdev_notify+0xe8/0x144 net/can/j1939/main.c:380 notifier_call_chain+0x1a4/0x510 kernel/notifier.c:93 raw_notifier_call_chain+0x3c/0x50 kernel/notifier.c:461 __dev_notify_flags+0x2bc/0x544 dev_change_flags+0xd0/0x15c net/core/dev.c:8645 do_setlink+0xc68/0x35c8 net/core/rtnetlink.c:2867 __rtnl_newlink net/core/rtnetlink.c:3648 [inline] rtnl_newlink+0x1354/0x1b1c net/core/rtnetlink.c:3695 rtnetlink_rcv_msg+0x744/0xdb8 net/core/rtnetlink.c:6417 netlink_rcv_skb+0x214/0x3c4 net/netlink/af_netlink.c:2546 rtnetlink_rcv+0x28/0x38 net/core/rtnetlink.c:6435 netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline] netlink_unicast+0x660/0x8d4 net/netlink/af_netlink.c:1365 netlink_sendmsg+0x834/0xb18 net/netlink/af_netlink.c:1913 sock_sendmsg_nosec net/socket.c:724 [inline] sock_sendmsg net/socket.c:747 [inline] ____sys_sendmsg+0x568/0x81c net/socket.c:2503 ___sys_sendmsg net/socket.c:2557 [inline] __sys_sendmsg+0x26c/0x33c net/socket.c:2586 __do_sys_sendmsg net/socket.c:2595 [inline] __se_sys_sendmsg net/socket.c:2593 [inline] __arm64_sys_sendmsg+0x80/0x94 net/socket.c:2593 __invoke_syscall arch/arm64/kernel/syscall.c:38 [inline] invoke_syscall+0x98/0x2c0 arch/arm64/kernel/syscall.c:52 el0_svc_common+0x138/0x244 arch/arm64/kernel/syscall.c:142 do_el0_svc+0x64/0x198 arch/arm64/kernel/syscall.c:191 el0_svc+0x4c/0x160 arch/arm64/kernel/entry-common.c:647 el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:665 el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:591 other info that might help us debug this: Chain exists of: &priv->j1939_socks_lock --> &jsk->sk_session_queue_lock --> &priv->active_session_list_lock Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&priv->active_session_list_lock); lock(&jsk->sk_session_queue_lock); lock(&priv->active_session_list_lock); lock(&priv->j1939_socks_lock); *** DEADLOCK *** 2 locks held by syz-executor375/6045: #0: ffff80009080db68 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock net/core/rtnetlink.c:78 [inline] #0: ffff80009080db68 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x700/0xdb8 net/core/rtnetlink.c:6414 #1: ffff0000d2e69088 (&priv->active_session_list_lock){+.-.}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:355 [inline] #1: ffff0000d2e69088 (&priv->active_session_list_lock){+.-.}-{2:2}, at: j1939_session_list_lock net/can/j1939/transport.c:238 [inline] #1: ffff0000d2e69088 (&priv->active_session_list_lock){+.-.}-{2:2}, at: j1939_cancel_active_session+0x54/0x414 net/can/j1939/transport.c:2183 stack backtrace: CPU: 1 PID: 6045 Comm: syz-executor375 Not tainted 6.4.0-rc7-syzkaller-ge40939bbfc68 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/27/2023 Call trace: dump_backtrace+0x1b8/0x1e4 arch/arm64/kernel/stacktrace.c:233 show_stack+0x2c/0x44 arch/arm64/kernel/stacktrace.c:240 __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xd0/0x124 lib/dump_stack.c:106 dump_stack+0x1c/0x28 lib/dump_stack.c:113 print_circular_bug+0x150/0x1b8 kernel/locking/lockdep.c:2066 check_noncircular+0x2cc/0x378 kernel/locking/lockdep.c:2188 check_prev_add kernel/locking/lockdep.c:3113 [inline] check_prevs_add kernel/locking/lockdep.c:3232 [inline] validate_chain kernel/locking/lockdep.c:3847 [inline] __lock_acquire+0x3308/0x7604 kernel/locking/lockdep.c:5088 lock_acquire+0x23c/0x71c kernel/locking/lockdep.c:5705 __raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline] _raw_spin_lock_bh+0x48/0x60 kernel/locking/spinlock.c:178 spin_lock_bh include/linux/spinlock.h:355 [inline] j1939_sk_errqueue+0x90/0x144 net/can/j1939/socket.c:1081 __j1939_session_release net/can/j1939/transport.c:294 [inline] kref_put include/linux/kref.h:65 [inline] j1939_session_put+0xf0/0x4b4 net/can/j1939/transport.c:299 j1939_session_deactivate_locked net/can/j1939/transport.c:1086 [inline] j1939_cancel_active_session+0x2ec/0x414 net/can/j1939/transport.c:2194 j1939_netdev_notify+0xe8/0x144 net/can/j1939/main.c:380 notifier_call_chain+0x1a4/0x510 kernel/notifier.c:93 raw_notifier_call_chain+0x3c/0x50 kernel/notifier.c:461 __dev_notify_flags+0x2bc/0x544 dev_change_flags+0xd0/0x15c net/core/dev.c:8645 do_setlink+0xc68/0x35c8 net/core/rtnetlink.c:2867 __rtnl_newlink net/core/rtnetlink.c:3648 [inline] rtnl_newlink+0x1354/0x1b1c net/core/rtnetlink.c:3695 rtnetlink_rcv_msg+0x744/0xdb8 net/core/rtnetlink.c:6417 netlink_rcv_skb+0x214/0x3c4 net/netlink/af_netlink.c:2546 rtnetlink_rcv+0x28/0x38 net/core/rtnetlink.c:6435 netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline] netlink_unicast+0x660/0x8d4 net/netlink/af_netlink.c:1365 netlink_sendmsg+0x834/0xb18 net/netlink/af_netlink.c:1913 sock_sendmsg_nosec net/socket.c:724 [inline] sock_sendmsg net/socket.c:747 [inline] ____sys_sendmsg+0x568/0x81c net/socket.c:2503 ___sys_sendmsg net/socket.c:2557 [inline] __sys_sendmsg+0x26c/0x33c net/socket.c:2586 __do_sys_sendmsg net/socket.c:2595 [inline] __se_sys_sendmsg net/socket.c:2593 [inline] __arm64_sys_sendmsg+0x80/0x94 net/socket.c:2593 __invoke_syscall arch/arm64/kernel/syscall.c:38 [inline] invoke_syscall+0x98/0x2c0 arch/arm64/kernel/syscall.c:52 el0_svc_common+0x138/0x244 arch/arm64/kernel/syscall.c:142 do_el0_svc+0x64/0x198 arch/arm64/kernel/syscall.c:191 el0_svc+0x4c/0x160 arch/arm64/kernel/entry-common.c:647 el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:665 el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:591 --- If you want syzbot to run the reproducer, reply with: #syz test: git://repo/address.git branch-or-commit-hash If you attach or paste a git patch, syzbot will apply it before testing. ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH] can: j1939: prevent deadlock by changing j1939_socks_lock to rwlock 2023-07-10 17:53 ` [syzbot] [can?] possible deadlock in j1939_sk_errqueue (2) syzbot @ 2023-07-12 0:47 ` Ziqi Zhao 2023-07-12 1:16 ` [syzbot] [can?] possible deadlock in j1939_sk_errqueue (2) syzbot 2023-07-13 22:23 ` [PATCH] can: j1939: prevent deadlock by changing j1939_socks_lock to rwlock Stephen Hemminger 0 siblings, 2 replies; 13+ messages in thread From: Ziqi Zhao @ 2023-07-12 0:47 UTC (permalink / raw) To: syzbot+1591462f226d9cbf0564 Cc: astrajoan, davem, dvyukov, edumazet, ivan.orlov0322, kernel, kuba, linux-can, linux-kernel, linux, mkl, netdev, o.rempel, pabeni, robin, skhan, socketcan, syzkaller-bugs, syzkaller The following 3 locks would race against each other, causing the deadlock situation in the Syzbot bug report: - j1939_socks_lock - active_session_list_lock - sk_session_queue_lock A reasonable fix is to change j1939_socks_lock to an rwlock, since in the rare situations where a write lock is required for the linked list that j1939_socks_lock is protecting, the code does not attempt to acquire any more locks. This would break the circular lock dependency, where, for example, the current thread already locks j1939_socks_lock and attempts to acquire sk_session_queue_lock, and at the same time, another thread attempts to acquire j1939_socks_lock while holding sk_session_queue_lock. NOTE: This patch along does not fix the unregister_netdevice bug reported by Syzbot; instead, it solves a deadlock situation to prepare for one or more further patches to actually fix the Syzbot bug, which appears to be a reference counting problem within the j1939 codebase. #syz test: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master Signed-off-by: Ziqi Zhao <astrajoan@yahoo.com> --- net/can/j1939/j1939-priv.h | 2 +- net/can/j1939/main.c | 2 +- net/can/j1939/socket.c | 25 +++++++++++++------------ 3 files changed, 15 insertions(+), 14 deletions(-) diff --git a/net/can/j1939/j1939-priv.h b/net/can/j1939/j1939-priv.h index 16af1a7f80f6..74f15592d170 100644 --- a/net/can/j1939/j1939-priv.h +++ b/net/can/j1939/j1939-priv.h @@ -86,7 +86,7 @@ struct j1939_priv { unsigned int tp_max_packet_size; /* lock for j1939_socks list */ - spinlock_t j1939_socks_lock; + rwlock_t j1939_socks_lock; struct list_head j1939_socks; struct kref rx_kref; diff --git a/net/can/j1939/main.c b/net/can/j1939/main.c index ecff1c947d68..a6fb89fa6278 100644 --- a/net/can/j1939/main.c +++ b/net/can/j1939/main.c @@ -274,7 +274,7 @@ struct j1939_priv *j1939_netdev_start(struct net_device *ndev) return ERR_PTR(-ENOMEM); j1939_tp_init(priv); - spin_lock_init(&priv->j1939_socks_lock); + rwlock_init(&priv->j1939_socks_lock); INIT_LIST_HEAD(&priv->j1939_socks); mutex_lock(&j1939_netdev_lock); diff --git a/net/can/j1939/socket.c b/net/can/j1939/socket.c index feaec4ad6d16..a8b981dc2065 100644 --- a/net/can/j1939/socket.c +++ b/net/can/j1939/socket.c @@ -80,16 +80,16 @@ static void j1939_jsk_add(struct j1939_priv *priv, struct j1939_sock *jsk) jsk->state |= J1939_SOCK_BOUND; j1939_priv_get(priv); - spin_lock_bh(&priv->j1939_socks_lock); + write_lock_bh(&priv->j1939_socks_lock); list_add_tail(&jsk->list, &priv->j1939_socks); - spin_unlock_bh(&priv->j1939_socks_lock); + write_unlock_bh(&priv->j1939_socks_lock); } static void j1939_jsk_del(struct j1939_priv *priv, struct j1939_sock *jsk) { - spin_lock_bh(&priv->j1939_socks_lock); + write_lock_bh(&priv->j1939_socks_lock); list_del_init(&jsk->list); - spin_unlock_bh(&priv->j1939_socks_lock); + write_unlock_bh(&priv->j1939_socks_lock); j1939_priv_put(priv); jsk->state &= ~J1939_SOCK_BOUND; @@ -329,13 +329,13 @@ bool j1939_sk_recv_match(struct j1939_priv *priv, struct j1939_sk_buff_cb *skcb) struct j1939_sock *jsk; bool match = false; - spin_lock_bh(&priv->j1939_socks_lock); + read_lock_bh(&priv->j1939_socks_lock); list_for_each_entry(jsk, &priv->j1939_socks, list) { match = j1939_sk_recv_match_one(jsk, skcb); if (match) break; } - spin_unlock_bh(&priv->j1939_socks_lock); + read_unlock_bh(&priv->j1939_socks_lock); return match; } @@ -344,11 +344,11 @@ void j1939_sk_recv(struct j1939_priv *priv, struct sk_buff *skb) { struct j1939_sock *jsk; - spin_lock_bh(&priv->j1939_socks_lock); + read_lock_bh(&priv->j1939_socks_lock); list_for_each_entry(jsk, &priv->j1939_socks, list) { j1939_sk_recv_one(jsk, skb); } - spin_unlock_bh(&priv->j1939_socks_lock); + read_unlock_bh(&priv->j1939_socks_lock); } static void j1939_sk_sock_destruct(struct sock *sk) @@ -484,6 +484,7 @@ static int j1939_sk_bind(struct socket *sock, struct sockaddr *uaddr, int len) priv = j1939_netdev_start(ndev); dev_put(ndev); + if (IS_ERR(priv)) { ret = PTR_ERR(priv); goto out_release_sock; @@ -1078,12 +1079,12 @@ void j1939_sk_errqueue(struct j1939_session *session, } /* spread RX notifications to all sockets subscribed to this session */ - spin_lock_bh(&priv->j1939_socks_lock); + read_lock_bh(&priv->j1939_socks_lock); list_for_each_entry(jsk, &priv->j1939_socks, list) { if (j1939_sk_recv_match_one(jsk, &session->skcb)) __j1939_sk_errqueue(session, &jsk->sk, type); } - spin_unlock_bh(&priv->j1939_socks_lock); + read_unlock_bh(&priv->j1939_socks_lock); }; void j1939_sk_send_loop_abort(struct sock *sk, int err) @@ -1271,7 +1272,7 @@ void j1939_sk_netdev_event_netdown(struct j1939_priv *priv) struct j1939_sock *jsk; int error_code = ENETDOWN; - spin_lock_bh(&priv->j1939_socks_lock); + read_lock_bh(&priv->j1939_socks_lock); list_for_each_entry(jsk, &priv->j1939_socks, list) { jsk->sk.sk_err = error_code; if (!sock_flag(&jsk->sk, SOCK_DEAD)) @@ -1279,7 +1280,7 @@ void j1939_sk_netdev_event_netdown(struct j1939_priv *priv) j1939_sk_queue_drop_all(priv, jsk, error_code); } - spin_unlock_bh(&priv->j1939_socks_lock); + read_unlock_bh(&priv->j1939_socks_lock); } static int j1939_sk_no_ioctlcmd(struct socket *sock, unsigned int cmd, -- 2.34.1 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [syzbot] [can?] possible deadlock in j1939_sk_errqueue (2) 2023-07-12 0:47 ` [PATCH] can: j1939: prevent deadlock by changing j1939_socks_lock to rwlock Ziqi Zhao @ 2023-07-12 1:16 ` syzbot 2023-07-13 22:23 ` [PATCH] can: j1939: prevent deadlock by changing j1939_socks_lock to rwlock Stephen Hemminger 1 sibling, 0 replies; 13+ messages in thread From: syzbot @ 2023-07-12 1:16 UTC (permalink / raw) To: astrajoan, davem, dvyukov, edumazet, ivan.orlov0322, kernel, kuba, linux-can, linux-kernel, linux, mkl, netdev, o.rempel, pabeni, robin, skhan, socketcan, syzkaller-bugs, syzkaller Hello, syzbot has tested the proposed patch and the reproducer did not trigger any issue: Reported-and-tested-by: syzbot+1591462f226d9cbf0564@syzkaller.appspotmail.com Tested on: commit: 3f01e9fe Merge tag 'linux-watchdog-6.5-rc2' of git://w.. git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master console output: https://syzkaller.appspot.com/x/log.txt?x=130a98a2a80000 kernel config: https://syzkaller.appspot.com/x/.config?x=4c2acb092ca90577 dashboard link: https://syzkaller.appspot.com/bug?extid=1591462f226d9cbf0564 compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40 userspace arch: arm64 patch: https://syzkaller.appspot.com/x/patch.diff?x=1380a782a80000 Note: testing is done by a robot and is best-effort only. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] can: j1939: prevent deadlock by changing j1939_socks_lock to rwlock 2023-07-12 0:47 ` [PATCH] can: j1939: prevent deadlock by changing j1939_socks_lock to rwlock Ziqi Zhao 2023-07-12 1:16 ` [syzbot] [can?] possible deadlock in j1939_sk_errqueue (2) syzbot @ 2023-07-13 22:23 ` Stephen Hemminger 1 sibling, 0 replies; 13+ messages in thread From: Stephen Hemminger @ 2023-07-13 22:23 UTC (permalink / raw) To: Ziqi Zhao Cc: syzbot+1591462f226d9cbf0564, davem, dvyukov, edumazet, ivan.orlov0322, kernel, kuba, linux-can, linux-kernel, linux, mkl, netdev, o.rempel, pabeni, robin, skhan, socketcan, syzkaller-bugs, syzkaller On Tue, 11 Jul 2023 17:47:50 -0700 Ziqi Zhao <astrajoan@yahoo.com> wrote: > The following 3 locks would race against each other, causing the > deadlock situation in the Syzbot bug report: > > - j1939_socks_lock > - active_session_list_lock > - sk_session_queue_lock > > A reasonable fix is to change j1939_socks_lock to an rwlock, since in > the rare situations where a write lock is required for the linked list > that j1939_socks_lock is protecting, the code does not attempt to > acquire any more locks. This would break the circular lock dependency, > where, for example, the current thread already locks j1939_socks_lock > and attempts to acquire sk_session_queue_lock, and at the same time, > another thread attempts to acquire j1939_socks_lock while holding > sk_session_queue_lock. > > NOTE: This patch along does not fix the unregister_netdevice bug > reported by Syzbot; instead, it solves a deadlock situation to prepare > for one or more further patches to actually fix the Syzbot bug, which > appears to be a reference counting problem within the j1939 codebase. > > #syz test: > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master > > Signed-off-by: Ziqi Zhao <astrajoan@yahoo.com> > --- Reader-writer locks are not the best way to fix a lock hierarchy problem. Instead either fix the lock ordering, or use RCU. Other devices don't have this problem, so perhaps the unique locking in this device is the problem. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [syzbot] [can?] possible deadlock in j1939_sk_errqueue (2) 2023-07-04 6:19 [syzbot] [can?] possible deadlock in j1939_sk_errqueue (2) syzbot 2023-07-04 6:47 ` [PATCH] can: j1939: prevent deadlock by changing j1939_socks_lock to rwlock Ziqi Zhao 2023-07-10 17:53 ` [syzbot] [can?] possible deadlock in j1939_sk_errqueue (2) syzbot @ 2023-11-15 3:54 ` syzbot 2 siblings, 0 replies; 13+ messages in thread From: syzbot @ 2023-11-15 3:54 UTC (permalink / raw) To: arnd, astrajoan, bridge, davem, dvyukov, edumazet, hdanton, ivan.orlov0322, kernel, kuba, linux-can, linux-kernel, linux, mkl, mudongliangabcd, netdev, nikolay, o.rempel, pabeni, robin, roopa, skhan, socketcan, stephen, syzkaller-bugs, syzkaller syzbot has bisected this issue to: commit 2030043e616cab40f510299f09b636285e0a3678 Author: Oleksij Rempel <o.rempel@pengutronix.de> Date: Fri May 21 11:57:20 2021 +0000 can: j1939: fix Use-after-Free, hold skb ref while in use bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=1438c947680000 start commit: 1b907d050735 Merge tag '6.7-rc-smb3-client-fixes-part2' of.. git tree: upstream final oops: https://syzkaller.appspot.com/x/report.txt?x=1638c947680000 console output: https://syzkaller.appspot.com/x/log.txt?x=1238c947680000 kernel config: https://syzkaller.appspot.com/x/.config?x=88e7ba51eecd9cd6 dashboard link: https://syzkaller.appspot.com/bug?extid=1591462f226d9cbf0564 syz repro: https://syzkaller.appspot.com/x/repro.syz?x=17fea8fb680000 C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1633dc70e80000 Reported-by: syzbot+1591462f226d9cbf0564@syzkaller.appspotmail.com Fixes: 2030043e616c ("can: j1939: fix Use-after-Free, hold skb ref while in use") For information about bisection process see: https://goo.gl/tpsmEJ#bisection ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2023-11-17 8:10 UTC | newest] Thread overview: 13+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-07-04 6:19 [syzbot] [can?] possible deadlock in j1939_sk_errqueue (2) syzbot 2023-07-04 6:47 ` [PATCH] can: j1939: prevent deadlock by changing j1939_socks_lock to rwlock Ziqi Zhao 2023-07-04 6:47 ` syzbot 2023-07-04 7:37 ` Oleksij Rempel 2023-07-21 16:22 ` Ziqi Zhao 2023-07-23 15:41 ` Oleksij Rempel 2023-08-07 4:46 ` Oleksij Rempel 2023-11-17 8:10 ` Oleksij Rempel 2023-07-10 17:53 ` [syzbot] [can?] possible deadlock in j1939_sk_errqueue (2) syzbot 2023-07-12 0:47 ` [PATCH] can: j1939: prevent deadlock by changing j1939_socks_lock to rwlock Ziqi Zhao 2023-07-12 1:16 ` [syzbot] [can?] possible deadlock in j1939_sk_errqueue (2) syzbot 2023-07-13 22:23 ` [PATCH] can: j1939: prevent deadlock by changing j1939_socks_lock to rwlock Stephen Hemminger 2023-11-15 3:54 ` [syzbot] [can?] possible deadlock in j1939_sk_errqueue (2) syzbot
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).