Linux RDMA and InfiniBand development
* [syzbot] [rdma?] possible deadlock in siw_create_listen (2)
@ 2024-09-26 13:34 syzbot
  2024-10-04 16:10 ` Bernard Metzler
  0 siblings, 1 reply; 4+ messages in thread
From: syzbot @ 2024-09-26 13:34 UTC
  To: bmt, jgg, leon, linux-kernel, linux-rdma, netdev, syzkaller-bugs

Hello,

syzbot found the following issue on:

HEAD commit:    5f5673607153 Merge branch 'for-next/core' into for-kernelci
git tree:       git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-kernelci
console output: https://syzkaller.appspot.com/x/log.txt?x=149fdca9980000
kernel config:  https://syzkaller.appspot.com/x/.config?x=dedbcb1ff4387972
dashboard link: https://syzkaller.appspot.com/bug?extid=3eb27595de9aa3cf63c3
compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
userspace arch: arm64

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/40172aed5414/disk-5f567360.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/58372f305e9d/vmlinux-5f567360.xz
kernel image: https://storage.googleapis.com/syzbot-assets/d2aae6fa798f/Image-5f567360.gz.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+3eb27595de9aa3cf63c3@syzkaller.appspotmail.com

iwpm_register_pid: Unable to send a nlmsg (client = 2)
======================================================
WARNING: possible circular locking dependency detected
6.11.0-rc7-syzkaller-g5f5673607153 #0 Not tainted
------------------------------------------------------
syz.4.157/7931 is trying to acquire lock:
ffff0000ee056458 (sk_lock-AF_INET){+.+.}-{0:0}, at: siw_create_listen+0x164/0xd70 drivers/infiniband/sw/siw/siw_cm.c:1776

but task is already holding lock:
ffff800091c21ea8 (lock#7){+.+.}-{3:3}, at: cma_add_one+0x510/0xab4 drivers/infiniband/core/cma.c:5354

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #3 (lock#7){+.+.}-{3:3}:
       __mutex_lock_common+0x190/0x21a0 kernel/locking/mutex.c:608
       __mutex_lock kernel/locking/mutex.c:752 [inline]
       mutex_lock_nested+0x2c/0x38 kernel/locking/mutex.c:804
       cma_init+0x2c/0x158 drivers/infiniband/core/cma.c:5438
       do_one_initcall+0x24c/0x9c0 init/main.c:1267
       do_initcall_level+0x154/0x214 init/main.c:1329
       do_initcalls+0x58/0xac init/main.c:1345
       do_basic_setup+0x8c/0xa0 init/main.c:1364
       kernel_init_freeable+0x324/0x478 init/main.c:1578
       kernel_init+0x24/0x2a0 init/main.c:1467
       ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:860

-> #2 (rtnl_mutex){+.+.}-{3:3}:
       __mutex_lock_common+0x190/0x21a0 kernel/locking/mutex.c:608
       __mutex_lock kernel/locking/mutex.c:752 [inline]
       mutex_lock_nested+0x2c/0x38 kernel/locking/mutex.c:804
       rtnl_lock+0x20/0x2c net/core/rtnetlink.c:79
       do_ip_setsockopt+0xe8c/0x346c net/ipv4/ip_sockglue.c:1077
       ip_setsockopt+0x80/0x128 net/ipv4/ip_sockglue.c:1417
       tcp_setsockopt+0xcc/0xe8 net/ipv4/tcp.c:3768
       sock_common_setsockopt+0xb0/0xcc net/core/sock.c:3735
       smc_setsockopt+0x204/0x10fc net/smc/af_smc.c:3072
       do_sock_setsockopt+0x2a0/0x4e0 net/socket.c:2324
       __sys_setsockopt+0x128/0x1a8 net/socket.c:2347
       __do_sys_setsockopt net/socket.c:2356 [inline]
       __se_sys_setsockopt net/socket.c:2353 [inline]
       __arm64_sys_setsockopt+0xb8/0xd4 net/socket.c:2353
       __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
       invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:49
       el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132
       do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151
       el0_svc+0x54/0x168 arch/arm64/kernel/entry-common.c:712
       el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730
       el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598

-> #1 (&smc->clcsock_release_lock){+.+.}-{3:3}:
       __mutex_lock_common+0x190/0x21a0 kernel/locking/mutex.c:608
       __mutex_lock kernel/locking/mutex.c:752 [inline]
       mutex_lock_nested+0x2c/0x38 kernel/locking/mutex.c:804
       smc_switch_to_fallback+0x48/0xa80 net/smc/af_smc.c:902
       smc_sendmsg+0xfc/0x9f8 net/smc/af_smc.c:2779
       sock_sendmsg_nosec net/socket.c:730 [inline]
       __sock_sendmsg net/socket.c:745 [inline]
       __sys_sendto+0x374/0x4f4 net/socket.c:2204
       __do_sys_sendto net/socket.c:2216 [inline]
       __se_sys_sendto net/socket.c:2212 [inline]
       __arm64_sys_sendto+0xd8/0xf8 net/socket.c:2212
       __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
       invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:49
       el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132
       do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151
       el0_svc+0x54/0x168 arch/arm64/kernel/entry-common.c:712
       el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730
       el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598

-> #0 (sk_lock-AF_INET){+.+.}-{0:0}:
       check_prev_add kernel/locking/lockdep.c:3133 [inline]
       check_prevs_add kernel/locking/lockdep.c:3252 [inline]
       validate_chain kernel/locking/lockdep.c:3868 [inline]
       __lock_acquire+0x33d8/0x779c kernel/locking/lockdep.c:5142
       lock_acquire+0x240/0x728 kernel/locking/lockdep.c:5759
       lock_sock_nested net/core/sock.c:3543 [inline]
       lock_sock include/net/sock.h:1607 [inline]
       sock_set_reuseaddr+0x58/0x154 net/core/sock.c:782
       siw_create_listen+0x164/0xd70 drivers/infiniband/sw/siw/siw_cm.c:1776
       iw_cm_listen+0x14c/0x204 drivers/infiniband/core/iwcm.c:585
       cma_iw_listen drivers/infiniband/core/cma.c:2668 [inline]
       rdma_listen+0x774/0xae4 drivers/infiniband/core/cma.c:3953
       cma_listen_on_dev+0x320/0x64c drivers/infiniband/core/cma.c:2727
       cma_add_one+0x5ec/0xab4 drivers/infiniband/core/cma.c:5357
       add_client_context+0x45c/0x7d0 drivers/infiniband/core/device.c:727
       enable_device_and_get+0x1a8/0x3e8 drivers/infiniband/core/device.c:1338
       ib_register_device+0xe40/0x108c drivers/infiniband/core/device.c:1426
       siw_device_register drivers/infiniband/sw/siw/siw_main.c:72 [inline]
       siw_newlink+0x80c/0xc2c drivers/infiniband/sw/siw/siw_main.c:489
       nldev_newlink+0x49c/0x4fc drivers/infiniband/core/nldev.c:1794
       rdma_nl_rcv_skb drivers/infiniband/core/netlink.c:239 [inline]
       rdma_nl_rcv+0x5c4/0x858 drivers/infiniband/core/netlink.c:259
       netlink_unicast_kernel net/netlink/af_netlink.c:1331 [inline]
       netlink_unicast+0x668/0x8a4 net/netlink/af_netlink.c:1357
       netlink_sendmsg+0x7a4/0xa8c net/netlink/af_netlink.c:1901
       sock_sendmsg_nosec net/socket.c:730 [inline]
       __sock_sendmsg net/socket.c:745 [inline]
       ____sys_sendmsg+0x56c/0x840 net/socket.c:2597
       ___sys_sendmsg net/socket.c:2651 [inline]
       __sys_sendmsg+0x26c/0x33c net/socket.c:2680
       __do_sys_sendmsg net/socket.c:2689 [inline]
       __se_sys_sendmsg net/socket.c:2687 [inline]
       __arm64_sys_sendmsg+0x80/0x94 net/socket.c:2687
       __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
       invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:49
       el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132
       do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151
       el0_svc+0x54/0x168 arch/arm64/kernel/entry-common.c:712
       el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730
       el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598

other info that might help us debug this:

Chain exists of:
  sk_lock-AF_INET --> rtnl_mutex --> lock#7

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(lock#7);
                               lock(rtnl_mutex);
                               lock(lock#7);
  lock(sk_lock-AF_INET);

 *** DEADLOCK ***

6 locks held by syz.4.157/7931:
 #0: ffff8000974142d8 (&rdma_nl_types[idx].sem){.+.+}-{3:3}, at: rdma_nl_rcv_msg drivers/infiniband/core/netlink.c:164 [inline]
 #0: ffff8000974142d8 (&rdma_nl_types[idx].sem){.+.+}-{3:3}, at: rdma_nl_rcv_skb drivers/infiniband/core/netlink.c:239 [inline]
 #0: ffff8000974142d8 (&rdma_nl_types[idx].sem){.+.+}-{3:3}, at: rdma_nl_rcv+0x330/0x858 drivers/infiniband/core/netlink.c:259
 #1: ffff800091c0e870 (link_ops_rwsem){++++}-{3:3}, at: nldev_newlink+0x358/0x4fc drivers/infiniband/core/nldev.c:1784
 #2: ffff800091bff210 (devices_rwsem){++++}-{3:3}, at: enable_device_and_get+0x104/0x3e8 drivers/infiniband/core/device.c:1328
 #3: ffff800091bff510 (clients_rwsem){++++}-{3:3}, at: enable_device_and_get+0x160/0x3e8 drivers/infiniband/core/device.c:1336
 #4: ffff0000d61505d0 (&device->client_data_rwsem){++++}-{3:3}, at: add_client_context+0x424/0x7d0 drivers/infiniband/core/device.c:725
 #5: ffff800091c21ea8 (lock#7){+.+.}-{3:3}, at: cma_add_one+0x510/0xab4 drivers/infiniband/core/cma.c:5354

stack backtrace:
CPU: 0 UID: 0 PID: 7931 Comm: syz.4.157 Not tainted 6.11.0-rc7-syzkaller-g5f5673607153 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024
Call trace:
 dump_backtrace+0x1b8/0x1e4 arch/arm64/kernel/stacktrace.c:319
 show_stack+0x2c/0x3c arch/arm64/kernel/stacktrace.c:326
 __dump_stack lib/dump_stack.c:93 [inline]
 dump_stack_lvl+0xe4/0x150 lib/dump_stack.c:119
 dump_stack+0x1c/0x28 lib/dump_stack.c:128
 print_circular_bug+0x150/0x1b8 kernel/locking/lockdep.c:2059
 check_noncircular+0x310/0x404 kernel/locking/lockdep.c:2186
 check_prev_add kernel/locking/lockdep.c:3133 [inline]
 check_prevs_add kernel/locking/lockdep.c:3252 [inline]
 validate_chain kernel/locking/lockdep.c:3868 [inline]
 __lock_acquire+0x33d8/0x779c kernel/locking/lockdep.c:5142
 lock_acquire+0x240/0x728 kernel/locking/lockdep.c:5759
 lock_sock_nested net/core/sock.c:3543 [inline]
 lock_sock include/net/sock.h:1607 [inline]
 sock_set_reuseaddr+0x58/0x154 net/core/sock.c:782
 siw_create_listen+0x164/0xd70 drivers/infiniband/sw/siw/siw_cm.c:1776
 iw_cm_listen+0x14c/0x204 drivers/infiniband/core/iwcm.c:585
 cma_iw_listen drivers/infiniband/core/cma.c:2668 [inline]
 rdma_listen+0x774/0xae4 drivers/infiniband/core/cma.c:3953
 cma_listen_on_dev+0x320/0x64c drivers/infiniband/core/cma.c:2727
 cma_add_one+0x5ec/0xab4 drivers/infiniband/core/cma.c:5357
 add_client_context+0x45c/0x7d0 drivers/infiniband/core/device.c:727
 enable_device_and_get+0x1a8/0x3e8 drivers/infiniband/core/device.c:1338
 ib_register_device+0xe40/0x108c drivers/infiniband/core/device.c:1426
 siw_device_register drivers/infiniband/sw/siw/siw_main.c:72 [inline]
 siw_newlink+0x80c/0xc2c drivers/infiniband/sw/siw/siw_main.c:489
 nldev_newlink+0x49c/0x4fc drivers/infiniband/core/nldev.c:1794
 rdma_nl_rcv_skb drivers/infiniband/core/netlink.c:239 [inline]
 rdma_nl_rcv+0x5c4/0x858 drivers/infiniband/core/netlink.c:259
 netlink_unicast_kernel net/netlink/af_netlink.c:1331 [inline]
 netlink_unicast+0x668/0x8a4 net/netlink/af_netlink.c:1357
 netlink_sendmsg+0x7a4/0xa8c net/netlink/af_netlink.c:1901
 sock_sendmsg_nosec net/socket.c:730 [inline]
 __sock_sendmsg net/socket.c:745 [inline]
 ____sys_sendmsg+0x56c/0x840 net/socket.c:2597
 ___sys_sendmsg net/socket.c:2651 [inline]
 __sys_sendmsg+0x26c/0x33c net/socket.c:2680
 __do_sys_sendmsg net/socket.c:2689 [inline]
 __se_sys_sendmsg net/socket.c:2687 [inline]
 __arm64_sys_sendmsg+0x80/0x94 net/socket.c:2687
 __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
 invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:49
 el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132
 do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151
 el0_svc+0x54/0x168 arch/arm64/kernel/entry-common.c:712
 el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730
 el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598
infiniband syz1: RDMA CMA: cma_listen_on_dev, error -98
overlay: ./file0 is not a directory
xt_nfacct: accounting object `sy\x05' does not exists


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup


* RE: [syzbot] [rdma?] possible deadlock in siw_create_listen (2)
  2024-09-26 13:34 [syzbot] [rdma?] possible deadlock in siw_create_listen (2) syzbot
@ 2024-10-04 16:10 ` Bernard Metzler
  2024-10-05  1:20   ` Jason Gunthorpe
  0 siblings, 1 reply; 4+ messages in thread
From: Bernard Metzler @ 2024-10-04 16:10 UTC
  To: jgg@ziepe.ca, leon@kernel.org, linux-rdma@vger.kernel.org

> -----Original Message-----
> From: syzbot <syzbot+3eb27595de9aa3cf63c3@syzkaller.appspotmail.com>
> Sent: Thursday, September 26, 2024 3:34 PM
> To: Bernard Metzler <BMT@zurich.ibm.com>; jgg@ziepe.ca; leon@kernel.org;
> linux-kernel@vger.kernel.org; linux-rdma@vger.kernel.org;
> netdev@vger.kernel.org; syzkaller-bugs@googlegroups.com
> Subject: [EXTERNAL] [syzbot] [rdma?] possible deadlock in siw_create_listen
> (2)
> 
> Hello,
> 
> syzbot found the following issue on:
> 
> HEAD commit:    5f5673607153 Merge branch 'for-next/core' into for-kernelci
> git tree:
> git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-kernelci
> console output: https://syzkaller.appspot.com/x/log.txt?x=149fdca9980000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=dedbcb1ff4387972
> dashboard link: https://syzkaller.appspot.com/bug?extid=3eb27595de9aa3cf63c3
> compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for
> Debian) 2.40
> userspace arch: arm64
> 
> Unfortunately, I don't have any reproducer for this issue yet.
> 
> Downloadable assets:
> disk image: https://storage.googleapis.com/syzbot-assets/40172aed5414/disk-5f567360.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/58372f305e9d/vmlinux-5f567360.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/d2aae6fa798f/Image-5f567360.gz.xz
> 
> IMPORTANT: if you fix the issue, please add the following tag to the
> commit:
> Reported-by: syzbot+3eb27595de9aa3cf63c3@syzkaller.appspotmail.com
> 
> iwpm_register_pid: Unable to send a nlmsg (client = 2)
> ======================================================
> WARNING: possible circular locking dependency detected
> 6.11.0-rc7-syzkaller-g5f5673607153 #0 Not tainted
> ------------------------------------------------------
> syz.4.157/7931 is trying to acquire lock:
> ffff0000ee056458 (sk_lock-AF_INET){+.+.}-{0:0}, at:
> siw_create_listen+0x164/0xd70 drivers/infiniband/sw/siw/siw_cm.c:1776
> 
> but task is already holding lock:
> ffff800091c21ea8 (lock#7){+.+.}-{3:3}, at: cma_add_one+0x510/0xab4
> drivers/infiniband/core/cma.c:5354
> 
> which lock already depends on the new lock.
> 

Could someone please help me understand this situation?
cma.c:5354

        mutex_lock(&lock);
        list_add_tail(&cma_dev->list, &dev_list);
        list_for_each_entry(id_priv, &listen_any_list, listen_any_item) {
                ret = cma_listen_on_dev(id_priv, cma_dev, &to_destroy);
                if (ret)
                        goto free_listen;
        }               
        mutex_unlock(&lock);

siw_cm.c:1776
	sock_set_reuseaddr(s->sk);

...which calls lock_sock(sk) on a freshly created socket.

I don't see the dependency between the global cma lock and the socket lock.

Any help appreciated!

Thanks,
Bernard.


> 
> the existing dependency chain (in reverse order) is:
> 
> -> #3 (lock#7){+.+.}-{3:3}:
>        __mutex_lock_common+0x190/0x21a0 kernel/locking/mutex.c:608
>        __mutex_lock kernel/locking/mutex.c:752 [inline]
>        mutex_lock_nested+0x2c/0x38 kernel/locking/mutex.c:804
>        cma_init+0x2c/0x158 drivers/infiniband/core/cma.c:5438
>        do_one_initcall+0x24c/0x9c0 init/main.c:1267
>        do_initcall_level+0x154/0x214 init/main.c:1329
>        do_initcalls+0x58/0xac init/main.c:1345
>        do_basic_setup+0x8c/0xa0 init/main.c:1364
>        kernel_init_freeable+0x324/0x478 init/main.c:1578
>        kernel_init+0x24/0x2a0 init/main.c:1467
>        ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:860
> 
> -> #2 (rtnl_mutex){+.+.}-{3:3}:
>        __mutex_lock_common+0x190/0x21a0 kernel/locking/mutex.c:608
>        __mutex_lock kernel/locking/mutex.c:752 [inline]
>        mutex_lock_nested+0x2c/0x38 kernel/locking/mutex.c:804
>        rtnl_lock+0x20/0x2c net/core/rtnetlink.c:79
>        do_ip_setsockopt+0xe8c/0x346c net/ipv4/ip_sockglue.c:1077
>        ip_setsockopt+0x80/0x128 net/ipv4/ip_sockglue.c:1417
>        tcp_setsockopt+0xcc/0xe8 net/ipv4/tcp.c:3768
>        sock_common_setsockopt+0xb0/0xcc net/core/sock.c:3735
>        smc_setsockopt+0x204/0x10fc net/smc/af_smc.c:3072
>        do_sock_setsockopt+0x2a0/0x4e0 net/socket.c:2324
>        __sys_setsockopt+0x128/0x1a8 net/socket.c:2347
>        __do_sys_setsockopt net/socket.c:2356 [inline]
>        __se_sys_setsockopt net/socket.c:2353 [inline]
>        __arm64_sys_setsockopt+0xb8/0xd4 net/socket.c:2353
>        __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
>        invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:49
>        el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132
>        do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151
>        el0_svc+0x54/0x168 arch/arm64/kernel/entry-common.c:712
>        el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730
>        el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598
> 
> -> #1 (&smc->clcsock_release_lock){+.+.}-{3:3}:
>        __mutex_lock_common+0x190/0x21a0 kernel/locking/mutex.c:608
>        __mutex_lock kernel/locking/mutex.c:752 [inline]
>        mutex_lock_nested+0x2c/0x38 kernel/locking/mutex.c:804
>        smc_switch_to_fallback+0x48/0xa80 net/smc/af_smc.c:902
>        smc_sendmsg+0xfc/0x9f8 net/smc/af_smc.c:2779
>        sock_sendmsg_nosec net/socket.c:730 [inline]
>        __sock_sendmsg net/socket.c:745 [inline]
>        __sys_sendto+0x374/0x4f4 net/socket.c:2204
>        __do_sys_sendto net/socket.c:2216 [inline]
>        __se_sys_sendto net/socket.c:2212 [inline]
>        __arm64_sys_sendto+0xd8/0xf8 net/socket.c:2212
>        __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
>        invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:49
>        el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132
>        do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151
>        el0_svc+0x54/0x168 arch/arm64/kernel/entry-common.c:712
>        el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730
>        el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598
> 
> -> #0 (sk_lock-AF_INET){+.+.}-{0:0}:
>        check_prev_add kernel/locking/lockdep.c:3133 [inline]
>        check_prevs_add kernel/locking/lockdep.c:3252 [inline]
>        validate_chain kernel/locking/lockdep.c:3868 [inline]
>        __lock_acquire+0x33d8/0x779c kernel/locking/lockdep.c:5142
>        lock_acquire+0x240/0x728 kernel/locking/lockdep.c:5759
>        lock_sock_nested net/core/sock.c:3543 [inline]
>        lock_sock include/net/sock.h:1607 [inline]
>        sock_set_reuseaddr+0x58/0x154 net/core/sock.c:782
>        siw_create_listen+0x164/0xd70
> drivers/infiniband/sw/siw/siw_cm.c:1776
>        iw_cm_listen+0x14c/0x204 drivers/infiniband/core/iwcm.c:585
>        cma_iw_listen drivers/infiniband/core/cma.c:2668 [inline]
>        rdma_listen+0x774/0xae4 drivers/infiniband/core/cma.c:3953
>        cma_listen_on_dev+0x320/0x64c drivers/infiniband/core/cma.c:2727
>        cma_add_one+0x5ec/0xab4 drivers/infiniband/core/cma.c:5357
>        add_client_context+0x45c/0x7d0 drivers/infiniband/core/device.c:727
>        enable_device_and_get+0x1a8/0x3e8
> drivers/infiniband/core/device.c:1338
>        ib_register_device+0xe40/0x108c
> drivers/infiniband/core/device.c:1426
>        siw_device_register drivers/infiniband/sw/siw/siw_main.c:72 [inline]
>        siw_newlink+0x80c/0xc2c drivers/infiniband/sw/siw/siw_main.c:489
>        nldev_newlink+0x49c/0x4fc drivers/infiniband/core/nldev.c:1794
>        rdma_nl_rcv_skb drivers/infiniband/core/netlink.c:239 [inline]
>        rdma_nl_rcv+0x5c4/0x858 drivers/infiniband/core/netlink.c:259
>        netlink_unicast_kernel net/netlink/af_netlink.c:1331 [inline]
>        netlink_unicast+0x668/0x8a4 net/netlink/af_netlink.c:1357
>        netlink_sendmsg+0x7a4/0xa8c net/netlink/af_netlink.c:1901
>        sock_sendmsg_nosec net/socket.c:730 [inline]
>        __sock_sendmsg net/socket.c:745 [inline]
>        ____sys_sendmsg+0x56c/0x840 net/socket.c:2597
>        ___sys_sendmsg net/socket.c:2651 [inline]
>        __sys_sendmsg+0x26c/0x33c net/socket.c:2680
>        __do_sys_sendmsg net/socket.c:2689 [inline]
>        __se_sys_sendmsg net/socket.c:2687 [inline]
>        __arm64_sys_sendmsg+0x80/0x94 net/socket.c:2687
>        __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
>        invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:49
>        el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132
>        do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151
>        el0_svc+0x54/0x168 arch/arm64/kernel/entry-common.c:712
>        el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730
>        el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598
> 
> other info that might help us debug this:
> 
> Chain exists of:
>   sk_lock-AF_INET --> rtnl_mutex --> lock#7
> 
>  Possible unsafe locking scenario:
> 
>        CPU0                    CPU1
>        ----                    ----
>   lock(lock#7);
>                                lock(rtnl_mutex);
>                                lock(lock#7);
>   lock(sk_lock-AF_INET);
> 
>  *** DEADLOCK ***
> 
> 6 locks held by syz.4.157/7931:
>  #0: ffff8000974142d8 (&rdma_nl_types[idx].sem){.+.+}-{3:3}, at:
> rdma_nl_rcv_msg drivers/infiniband/core/netlink.c:164 [inline]
>  #0: ffff8000974142d8 (&rdma_nl_types[idx].sem){.+.+}-{3:3}, at:
> rdma_nl_rcv_skb drivers/infiniband/core/netlink.c:239 [inline]
>  #0: ffff8000974142d8 (&rdma_nl_types[idx].sem){.+.+}-{3:3}, at:
> rdma_nl_rcv+0x330/0x858 drivers/infiniband/core/netlink.c:259
>  #1: ffff800091c0e870 (link_ops_rwsem){++++}-{3:3}, at:
> nldev_newlink+0x358/0x4fc drivers/infiniband/core/nldev.c:1784
>  #2: ffff800091bff210 (devices_rwsem){++++}-{3:3}, at:
> enable_device_and_get+0x104/0x3e8 drivers/infiniband/core/device.c:1328
>  #3: ffff800091bff510 (clients_rwsem){++++}-{3:3}, at:
> enable_device_and_get+0x160/0x3e8 drivers/infiniband/core/device.c:1336
>  #4: ffff0000d61505d0 (&device->client_data_rwsem){++++}-{3:3}, at:
> add_client_context+0x424/0x7d0 drivers/infiniband/core/device.c:725
>  #5: ffff800091c21ea8 (lock#7){+.+.}-{3:3}, at: cma_add_one+0x510/0xab4
> drivers/infiniband/core/cma.c:5354
> 
> stack backtrace:
> CPU: 0 UID: 0 PID: 7931 Comm: syz.4.157 Not tainted 6.11.0-rc7-syzkaller-
> g5f5673607153 #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 08/06/2024
> Call trace:
>  dump_backtrace+0x1b8/0x1e4 arch/arm64/kernel/stacktrace.c:319
>  show_stack+0x2c/0x3c arch/arm64/kernel/stacktrace.c:326
>  __dump_stack lib/dump_stack.c:93 [inline]
>  dump_stack_lvl+0xe4/0x150 lib/dump_stack.c:119
>  dump_stack+0x1c/0x28 lib/dump_stack.c:128
>  print_circular_bug+0x150/0x1b8 kernel/locking/lockdep.c:2059
>  check_noncircular+0x310/0x404 kernel/locking/lockdep.c:2186
>  check_prev_add kernel/locking/lockdep.c:3133 [inline]
>  check_prevs_add kernel/locking/lockdep.c:3252 [inline]
>  validate_chain kernel/locking/lockdep.c:3868 [inline]
>  __lock_acquire+0x33d8/0x779c kernel/locking/lockdep.c:5142
>  lock_acquire+0x240/0x728 kernel/locking/lockdep.c:5759
>  lock_sock_nested net/core/sock.c:3543 [inline]
>  lock_sock include/net/sock.h:1607 [inline]
>  sock_set_reuseaddr+0x58/0x154 net/core/sock.c:782
>  siw_create_listen+0x164/0xd70 drivers/infiniband/sw/siw/siw_cm.c:1776
>  iw_cm_listen+0x14c/0x204 drivers/infiniband/core/iwcm.c:585
>  cma_iw_listen drivers/infiniband/core/cma.c:2668 [inline]
>  rdma_listen+0x774/0xae4 drivers/infiniband/core/cma.c:3953
>  cma_listen_on_dev+0x320/0x64c drivers/infiniband/core/cma.c:2727
>  cma_add_one+0x5ec/0xab4 drivers/infiniband/core/cma.c:5357
>  add_client_context+0x45c/0x7d0 drivers/infiniband/core/device.c:727
>  enable_device_and_get+0x1a8/0x3e8 drivers/infiniband/core/device.c:1338
>  ib_register_device+0xe40/0x108c drivers/infiniband/core/device.c:1426
>  siw_device_register drivers/infiniband/sw/siw/siw_main.c:72 [inline]
>  siw_newlink+0x80c/0xc2c drivers/infiniband/sw/siw/siw_main.c:489
>  nldev_newlink+0x49c/0x4fc drivers/infiniband/core/nldev.c:1794
>  rdma_nl_rcv_skb drivers/infiniband/core/netlink.c:239 [inline]
>  rdma_nl_rcv+0x5c4/0x858 drivers/infiniband/core/netlink.c:259
>  netlink_unicast_kernel net/netlink/af_netlink.c:1331 [inline]
>  netlink_unicast+0x668/0x8a4 net/netlink/af_netlink.c:1357
>  netlink_sendmsg+0x7a4/0xa8c net/netlink/af_netlink.c:1901
>  sock_sendmsg_nosec net/socket.c:730 [inline]
>  __sock_sendmsg net/socket.c:745 [inline]
>  ____sys_sendmsg+0x56c/0x840 net/socket.c:2597
>  ___sys_sendmsg net/socket.c:2651 [inline]
>  __sys_sendmsg+0x26c/0x33c net/socket.c:2680
>  __do_sys_sendmsg net/socket.c:2689 [inline]
>  __se_sys_sendmsg net/socket.c:2687 [inline]
>  __arm64_sys_sendmsg+0x80/0x94 net/socket.c:2687
>  __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
>  invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:49
>  el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132
>  do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151
>  el0_svc+0x54/0x168 arch/arm64/kernel/entry-common.c:712
>  el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730
>  el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598
> infiniband syz1: RDMA CMA: cma_listen_on_dev, error -98
> overlay: ./file0 is not a directory
> xt_nfacct: accounting object `sy\x05' does not exists
> 
> 
> ---
> This report is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkaller@googlegroups.com.
> 
> syzbot will keep track of this issue. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
> 
> If the report is already addressed, let syzbot know by replying with:
> #syz fix: exact-commit-title
> 
> If you want to overwrite report's subsystems, reply with:
> #syz set subsystems: new-subsystem
> (See the list of subsystem names on the web dashboard)
> 
> If the report is a duplicate of another one, reply with:
> #syz dup: exact-subject-of-another-report
> 
> If you want to undo deduplication, reply with:
> #syz undup


* Re: [syzbot] [rdma?] possible deadlock in siw_create_listen (2)
  2024-10-04 16:10 ` Bernard Metzler
@ 2024-10-05  1:20   ` Jason Gunthorpe
  2024-10-05 17:34     ` Bernard Metzler
  0 siblings, 1 reply; 4+ messages in thread
From: Jason Gunthorpe @ 2024-10-05  1:20 UTC
  To: Bernard Metzler; +Cc: leon@kernel.org, linux-rdma@vger.kernel.org

On Fri, Oct 04, 2024 at 04:10:31PM +0000, Bernard Metzler wrote:

> Could someone please help me understand this situation?
> cma.c:5354
> 
>         mutex_lock(&lock);
>         list_add_tail(&cma_dev->list, &dev_list);
>         list_for_each_entry(id_priv, &listen_any_list, listen_any_item) {
>                 ret = cma_listen_on_dev(id_priv, cma_dev, &to_destroy);
>                 if (ret)
>                         goto free_listen;
>         }               
>         mutex_unlock(&lock);
> 
> siw_cm.c:1776
> 	sock_set_reuseaddr(s->sk);
>
> ...which calls lock_sock(sk) on a freshly created socket.

I think this is an SMC bug, and lockdep is getting confused about what
to report due to all the different locks.

smc_setsockopt(), by the time it gets into ip_setsockopt(), does:

	mutex_lock(&smc->clcsock_release_lock);

	if (needs_rtnl)
		rtnl_lock();
	sockopt_lock_sock(sk);
	mutex_unlock(&smc->clcsock_release_lock);


smc_sendmsg() does:

	lock_sock(sk);
	mutex_lock(&smc->clcsock_release_lock);

Which is a classic deadlock-prone lock ordering: the two paths take the
same pair of locks in opposite order.
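
To make the inversion concrete, here is a minimal userspace sketch that
replays the bad interleaving step by step. It is only an illustration, not
kernel code: struct toy_lock, toy_trylock() and the two path ids are
hypothetical names, and it assumes both paths contend on the same socket
lock, which is how the two sequences above read.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock: records which path holds it (0 = free). Not a real mutex. */
struct toy_lock {
	int owner;
};

/* Take the lock for `path`; returns false if another path already holds it. */
bool toy_trylock(struct toy_lock *l, int path)
{
	if (l->owner != 0 && l->owner != path)
		return false;
	l->owner = path;
	return true;
}

/*
 * Replays the interleaving in which each path has taken its first lock
 * and now wants the lock the other path holds.
 * Returns true if neither path can make progress.
 */
bool abba_interleaving_deadlocks(void)
{
	enum { SENDMSG_PATH = 1, SETSOCKOPT_PATH = 2 };
	struct toy_lock sk_lock = { 0 };
	struct toy_lock clcsock_release_lock = { 0 };
	bool got;

	/* sendmsg-like path: socket lock first */
	got = toy_trylock(&sk_lock, SENDMSG_PATH);
	assert(got);

	/* setsockopt-like path: clcsock_release_lock first */
	got = toy_trylock(&clcsock_release_lock, SETSOCKOPT_PATH);
	assert(got);
	(void)got;

	/* each path now asks for the lock the other one holds */
	bool sendmsg_blocked =
		!toy_trylock(&clcsock_release_lock, SENDMSG_PATH);
	bool setsockopt_blocked =
		!toy_trylock(&sk_lock, SETSOCKOPT_PATH);

	return sendmsg_blocked && setsockopt_blocked;
}
```

Calling abba_interleaving_deadlocks() from a small main() returns true:
with real blocking locks both paths would wait on each other forever.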

That the CMA gets involved here seems like wrong reporting because
syzkaller put those lock chains into it.
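
For readers wondering how the CMA lock ends up in the chain at all: lockdep
tracks lock classes rather than individual lock instances, so every AF_INET
socket lock, including one on a freshly created socket, counts as
sk_lock-AF_INET. The sketch below is a much simplified model of that
bookkeeping and not lockdep's actual implementation. The enum names and
helper functions are made up for illustration, CMA_LOCK stands for the
lock#7 of the report, and the three edges are my reading of entries #1..#3
in the report's dependency chain.

```c
#include <assert.h>
#include <stdbool.h>

/* Lock classes named in the report (identifiers are illustrative). */
enum lock_class {
	SK_LOCK_AF_INET,
	CLCSOCK_RELEASE_LOCK,
	RTNL_MUTEX,
	CMA_LOCK,		/* "lock#7" in the report */
	NR_CLASSES
};

struct lock_graph {
	/* edge[a][b]: class b was once acquired while class a was held */
	bool edge[NR_CLASSES][NR_CLASSES];
};

void record_order(struct lock_graph *g, enum lock_class held,
		  enum lock_class acquired)
{
	assert(held < NR_CLASSES && acquired < NR_CLASSES);
	g->edge[held][acquired] = true;
}

/* Depth-first search: is `to` reachable from `from` along recorded edges? */
static bool reachable(const struct lock_graph *g, int from, int to,
		      bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_CLASSES; next++) {
		if (g->edge[from][next] && !seen[next] &&
		    reachable(g, next, to, seen))
			return true;
	}
	return false;
}

/*
 * Taking `acquired` while holding `held` closes a cycle if `held` is
 * already reachable from `acquired` through earlier observations.
 */
bool would_close_cycle(const struct lock_graph *g, enum lock_class held,
		       enum lock_class acquired)
{
	bool seen[NR_CLASSES] = { false };

	return reachable(g, acquired, held, seen);
}

/* Loads the dependencies shown as #1..#3 in the report (as read above). */
void load_reported_chain(struct lock_graph *g)
{
	record_order(g, SK_LOCK_AF_INET, CLCSOCK_RELEASE_LOCK);
	record_order(g, CLCSOCK_RELEASE_LOCK, RTNL_MUTEX);
	record_order(g, RTNL_MUTEX, CMA_LOCK);
}
```

With those three edges loaded, would_close_cycle(&g, CMA_LOCK,
SK_LOCK_AF_INET) returns true. That is the step where cma_add_one() holds
the CMA lock and siw_create_listen() takes a socket lock, even though the
edges that complete the loop were recorded on SMC paths.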

I guess this is a dup of 

https://lore.kernel.org/netdev/00000000000093078f0622583e6e@google.com/T/

Or at least that should be fixed before looking at this

Jason


* RE: [syzbot] [rdma?] possible deadlock in siw_create_listen (2)
  2024-10-05  1:20   ` Jason Gunthorpe
@ 2024-10-05 17:34     ` Bernard Metzler
  0 siblings, 0 replies; 4+ messages in thread
From: Bernard Metzler @ 2024-10-05 17:34 UTC
  To: Jason Gunthorpe; +Cc: leon@kernel.org, linux-rdma@vger.kernel.org



> -----Original Message-----
> From: Jason Gunthorpe <jgg@ziepe.ca>
> Sent: Saturday, October 5, 2024 3:21 AM
> To: Bernard Metzler <BMT@zurich.ibm.com>
> Cc: leon@kernel.org; linux-rdma@vger.kernel.org
> Subject: [EXTERNAL] Re: [syzbot] [rdma?] possible deadlock in
> siw_create_listen (2)
> 
> On Fri, Oct 04, 2024 at 04:10:31PM +0000, Bernard Metzler wrote:
> 
> > Could someone please help me understand this situation?
> > cma.c:5354
> >
> >         mutex_lock(&lock);
> >         list_add_tail(&cma_dev->list, &dev_list);
> >         list_for_each_entry(id_priv, &listen_any_list, listen_any_item) {
> >                 ret = cma_listen_on_dev(id_priv, cma_dev, &to_destroy);
> >                 if (ret)
> >                         goto free_listen;
> >         }
> >         mutex_unlock(&lock);
> >
> > siw_cm.c:1776
> > 	sock_set_reuseaddr(s->sk);
> >
> > ...which calls lock_sock(sk) on a freshly created socket.
> 
> I think this is an SMC bug, and lockdep is getting confused about what
> to report due to all the different locks.
> 
> smc_setsockopt(), by the time it gets into ip_setsockopt(), does:
> 
> 	mutex_lock(&smc->clcsock_release_lock);
> 
> 	if (needs_rtnl)
> 		rtnl_lock();
> 	sockopt_lock_sock(sk);
> 	mutex_unlock(&smc->clcsock_release_lock);
> 
> 
> smc_sendmsg() does
> 
> 	lock_sock(sk);
> 	mutex_lock(&smc->clcsock_release_lock);
> 
> Which is classic deadlock locking.
> 

Thank you for helping to clarify this. That would make much more sense.
So blaming

> siw_create_listen+0x164/0xd70 drivers/infiniband/sw/siw/siw_cm.c:1776

... isn't quite right. It doesn't deal with the SMC lock,
but locks a just-created socket via

>> -> #0 (sk_lock-AF_INET){+.+.}-{0:0}:
>>        check_prev_add kernel/locking/lockdep.c:3133 [inline]
>>        check_prevs_add kernel/locking/lockdep.c:3252 [inline]
>>        validate_chain kernel/locking/lockdep.c:3868 [inline]
>>        __lock_acquire+0x33d8/0x779c kernel/locking/lockdep.c:5142
>>        lock_acquire+0x240/0x728 kernel/locking/lockdep.c:5759
>>        lock_sock_nested net/core/sock.c:3543 [inline]
>>        lock_sock include/net/sock.h:1607 [inline]
>>        sock_set_reuseaddr+0x58/0x154 net/core/sock.c:782
>>        siw_create_listen+0x164/0xd70

> That the CMA gets involved here seems like wrong reporting because
> syzkaller put those lock chains into it.
> 
> I guess this is a dup of
> 
> https://lore.kernel.org/netdev/00000000000093078f0622583e6e@google.com/T/
> 
> Or at least that should be fixed before looking at this
> 
Sounds reasonable...

Thanks!
Bernard.

