* [syzbot] unregister_netdevice: waiting for DEV to become free (7)
From: syzbot @ 2022-11-18 11:39 UTC
To: linux-kernel, netdev, syzkaller-bugs

Hello,

syzbot found the following issue on:

HEAD commit:    9c8774e629a1 net: eql: Use kzalloc instead of kmalloc/memset
git tree:       net-next
console output: https://syzkaller.appspot.com/x/log.txt?x=17bf6cc8f00000
kernel config:  https://syzkaller.appspot.com/x/.config?x=9eb259db6b1893cf
dashboard link: https://syzkaller.appspot.com/bug?extid=5e70d01ee8985ae62a3b
compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1136d592f00000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1193ae64f00000

Bisection is inconclusive: the issue happens on the oldest tested release.

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=167c33a2f00000
final oops:     https://syzkaller.appspot.com/x/report.txt?x=157c33a2f00000
console output: https://syzkaller.appspot.com/x/log.txt?x=117c33a2f00000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+5e70d01ee8985ae62a3b@syzkaller.appspotmail.com

iwpm_register_pid: Unable to send a nlmsg (client = 2)
infiniband syj1: RDMA CMA: cma_listen_on_dev, error -98
unregister_netdevice: waiting for vlan0 to become free. Usage count = 2

---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
For information about bisection process see: https://goo.gl/tpsmEJ#bisection
syzbot can test patches for this issue, for details see:
https://goo.gl/tpsmEJ#testing-patches

* Re: [syzbot] unregister_netdevice: waiting for DEV to become free (7)
From: Dmitry Vyukov @ 2022-11-18 13:28 UTC
To: syzbot, Jason Gunthorpe, Leon Romanovsky, chenzhongjin, RDMA mailing list
Cc: linux-kernel, netdev, syzkaller-bugs

On Fri, 18 Nov 2022 at 12:39, syzbot
<syzbot+5e70d01ee8985ae62a3b@syzkaller.appspotmail.com> wrote:
>
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit:    9c8774e629a1 net: eql: Use kzalloc instead of kmalloc/memset
> git tree:       net-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=17bf6cc8f00000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=9eb259db6b1893cf
> dashboard link: https://syzkaller.appspot.com/bug?extid=5e70d01ee8985ae62a3b
> compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1136d592f00000
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1193ae64f00000
>
> Bisection is inconclusive: the issue happens on the oldest tested release.
>
> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=167c33a2f00000
> final oops:     https://syzkaller.appspot.com/x/report.txt?x=157c33a2f00000
> console output: https://syzkaller.appspot.com/x/log.txt?x=117c33a2f00000
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+5e70d01ee8985ae62a3b@syzkaller.appspotmail.com
>
> iwpm_register_pid: Unable to send a nlmsg (client = 2)
> infiniband syj1: RDMA CMA: cma_listen_on_dev, error -98
> unregister_netdevice: waiting for vlan0 to become free. Usage count = 2

+RDMA maintainers

There are 4 reproducers and all contain:

r0 = socket$nl_rdma(0x10, 0x3, 0x14)
sendmsg$RDMA_NLDEV_CMD_NEWLINK(...)

Also the preceding print looks related (a bug in the error handling
path there?):

infiniband syj1: RDMA CMA: cma_listen_on_dev, error -98

> ---
> This report is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkaller@googlegroups.com.
>
> syzbot will keep track of this issue. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
> For information about bisection process see: https://goo.gl/tpsmEJ#bisection
> syzbot can test patches for this issue, for details see:
> https://goo.gl/tpsmEJ#testing-patches

* Re: [syzbot] unregister_netdevice: waiting for DEV to become free (7)
From: Jason Gunthorpe @ 2022-11-22 2:13 UTC
To: Dmitry Vyukov
Cc: syzbot, Leon Romanovsky, chenzhongjin, RDMA mailing list, linux-kernel, netdev, syzkaller-bugs, Zhu Yanjun, Bob Pearson

On Fri, Nov 18, 2022 at 02:28:53PM +0100, Dmitry Vyukov wrote:
> On Fri, 18 Nov 2022 at 12:39, syzbot
> <syzbot+5e70d01ee8985ae62a3b@syzkaller.appspotmail.com> wrote:
> >
> > Hello,
> >
> > syzbot found the following issue on:
> >
> > HEAD commit:    9c8774e629a1 net: eql: Use kzalloc instead of kmalloc/memset
> > git tree:       net-next
> > console output: https://syzkaller.appspot.com/x/log.txt?x=17bf6cc8f00000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=9eb259db6b1893cf
> > dashboard link: https://syzkaller.appspot.com/bug?extid=5e70d01ee8985ae62a3b
> > compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
> > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1136d592f00000
> > C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1193ae64f00000
> >
> > Bisection is inconclusive: the issue happens on the oldest tested release.
> >
> > bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=167c33a2f00000
> > final oops:     https://syzkaller.appspot.com/x/report.txt?x=157c33a2f00000
> > console output: https://syzkaller.appspot.com/x/log.txt?x=117c33a2f00000
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+5e70d01ee8985ae62a3b@syzkaller.appspotmail.com
> >
> > iwpm_register_pid: Unable to send a nlmsg (client = 2)
> > infiniband syj1: RDMA CMA: cma_listen_on_dev, error -98
> > unregister_netdevice: waiting for vlan0 to become free. Usage count = 2
>
> +RDMA maintainers
>
> There are 4 reproducers and all contain:
>
> r0 = socket$nl_rdma(0x10, 0x3, 0x14)
> sendmsg$RDMA_NLDEV_CMD_NEWLINK(...)
>
> Also the preceding print looks related (a bug in the error handling
> path there?):
>
> infiniband syj1: RDMA CMA: cma_listen_on_dev, error -98

I'm pretty sure it is an rxe bug

ib_device_set_netdev() will hold the netdev until the caller destroys
the ib_device

rxe calls it during rxe_register_device() because the user asked for a
stacked ib_device on top of the netdev

Presumably rxe needs to have a notifier to also self destroy the rxe
device if the underlying net device is to be destroyed?

Can someone from rxe check into this?

Jason

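The reference-holding behaviour described in the message above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: every toy_* name is hypothetical, none of it is kernel code, and it assumes the convention that a registered netdev carries one reference of its own, so a count of 2 means one outstanding holder. It shows a stacked device pinning the netdev, and a notifier-style callback dropping that reference when the netdev is unregistered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct net_device: only the reference count matters. */
struct toy_netdev {
	int refcnt;		/* 1 for the registration, +1 per holder */
	bool unregistering;
};

/* Toy stand-in for a software RDMA device stacked on a netdev. */
struct toy_ibdev {
	struct toy_netdev *netdev;	/* NULL once the reference is dropped */
};

static void toy_dev_hold(struct toy_netdev *nd)
{
	nd->refcnt++;
}

static void toy_dev_put(struct toy_netdev *nd)
{
	assert(nd->refcnt > 0);
	nd->refcnt--;
}

/* Models ib_device_set_netdev(): the stacked device pins the netdev. */
static void toy_ib_set_netdev(struct toy_ibdev *ib, struct toy_netdev *nd)
{
	toy_dev_hold(nd);
	ib->netdev = nd;
}

/* Models destroying the stacked device, which drops its reference. */
static void toy_ib_destroy(struct toy_ibdev *ib)
{
	if (ib->netdev) {
		toy_dev_put(ib->netdev);
		ib->netdev = NULL;
	}
}

/* Hypothetical notifier: on unregister, tear down the stacked device. */
static void toy_unregister_notifier(struct toy_ibdev *ib, struct toy_netdev *nd)
{
	if (ib->netdev == nd)
		toy_ib_destroy(ib);
}

/*
 * Models unregistering the netdev. Returns true if only the registration
 * reference is left (the device can be freed), false if a holder remains
 * (the "waiting for DEV to become free" situation).
 */
static bool toy_unregister(struct toy_netdev *nd, struct toy_ibdev *ib,
			   bool have_notifier)
{
	nd->unregistering = true;
	if (have_notifier)
		toy_unregister_notifier(ib, nd);
	return nd->refcnt == 1;
}
```

Calling toy_unregister() with have_notifier set to false leaves the count at 2 and the function reports that the device cannot be freed; with the notifier enabled the stacked device lets go and the count returns to 1.
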
* Re: [syzbot] unregister_netdevice: waiting for DEV to become free (7)
From: wangyufen @ 2022-11-22 3:28 UTC
To: Jason Gunthorpe, Dmitry Vyukov
Cc: syzbot, Leon Romanovsky, chenzhongjin, RDMA mailing list, linux-kernel, netdev, syzkaller-bugs, Zhu Yanjun, Bob Pearson

On 2022/11/22 10:13, Jason Gunthorpe wrote:
> On Fri, Nov 18, 2022 at 02:28:53PM +0100, Dmitry Vyukov wrote:
>> On Fri, 18 Nov 2022 at 12:39, syzbot
>> <syzbot+5e70d01ee8985ae62a3b@syzkaller.appspotmail.com> wrote:
>>>
>>> Hello,
>>>
>>> syzbot found the following issue on:
>>>
>>> HEAD commit:    9c8774e629a1 net: eql: Use kzalloc instead of kmalloc/memset
>>> git tree:       net-next
>>> console output: https://syzkaller.appspot.com/x/log.txt?x=17bf6cc8f00000
>>> kernel config:  https://syzkaller.appspot.com/x/.config?x=9eb259db6b1893cf
>>> dashboard link: https://syzkaller.appspot.com/bug?extid=5e70d01ee8985ae62a3b
>>> compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
>>> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1136d592f00000
>>> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1193ae64f00000
>>>
>>> Bisection is inconclusive: the issue happens on the oldest tested release.
>>>
>>> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=167c33a2f00000
>>> final oops:     https://syzkaller.appspot.com/x/report.txt?x=157c33a2f00000
>>> console output: https://syzkaller.appspot.com/x/log.txt?x=117c33a2f00000
>>>
>>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>>> Reported-by: syzbot+5e70d01ee8985ae62a3b@syzkaller.appspotmail.com
>>>
>>> iwpm_register_pid: Unable to send a nlmsg (client = 2)
>>> infiniband syj1: RDMA CMA: cma_listen_on_dev, error -98
>>> unregister_netdevice: waiting for vlan0 to become free. Usage count = 2
>>
>> +RDMA maintainers
>>
>> There are 4 reproducers and all contain:
>>
>> r0 = socket$nl_rdma(0x10, 0x3, 0x14)
>> sendmsg$RDMA_NLDEV_CMD_NEWLINK(...)
>>
>> Also the preceding print looks related (a bug in the error handling
>> path there?):
>>
>> infiniband syj1: RDMA CMA: cma_listen_on_dev, error -98
>
> I'm pretty sure it is an rxe bug
>
> ib_device_set_netdev() will hold the netdev until the caller destroys
> the ib_device
>
> rxe calls it during rxe_register_device() because the user asked for a
> stacked ib_device on top of the netdev
>
> Presumably rxe needs to have a notifier to also self destroy the rxe
> device if the underlying net device is to be destroyed?
>
> Can someone from rxe check into this?

The following patch may fix the issue:

--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -4049,6 +4049,9 @@ int rdma_listen(struct rdma_cm_id *id, int backlog)
 	return 0;
 err:
 	id_priv->backlog = 0;
+	if (id_priv->cma_dev)
+		cma_release_dev(id_priv);
+
 	/*
 	 * All the failure paths that lead here will not allow the req_handler's
 	 * to have run.

The causes are as follows:

rdma_listen()
  rdma_bind_addr()
    cma_acquire_dev_by_src_ip()
      cma_attach_to_dev()
        _cma_attach_to_dev()
          cma_dev_get()
  cma_check_port()  <-- The return value is not zero, goto err
err:
  <-- The error handling here is missing the operation of cma_release_dev.

>
> Jason

* Re: [syzbot] unregister_netdevice: waiting for DEV to become free (7)
From: Guoqing Jiang @ 2022-11-23 9:45 UTC
To: wangyufen, Jason Gunthorpe, Dmitry Vyukov
Cc: syzbot, Leon Romanovsky, chenzhongjin, RDMA mailing list, linux-kernel, netdev, syzkaller-bugs, Zhu Yanjun, Bob Pearson

On 11/22/22 11:28 AM, wangyufen wrote:
>
> On 2022/11/22 10:13, Jason Gunthorpe wrote:
>> On Fri, Nov 18, 2022 at 02:28:53PM +0100, Dmitry Vyukov wrote:
>>> On Fri, 18 Nov 2022 at 12:39, syzbot
>>> <syzbot+5e70d01ee8985ae62a3b@syzkaller.appspotmail.com> wrote:
>>>>
>>>> Hello,
>>>>
>>>> syzbot found the following issue on:
>>>>
>>>> HEAD commit:    9c8774e629a1 net: eql: Use kzalloc instead of
>>>> kmalloc/memset
>>>> git tree:       net-next
>>>> console output:
>>>> https://syzkaller.appspot.com/x/log.txt?x=17bf6cc8f00000
>>>> kernel config:
>>>> https://syzkaller.appspot.com/x/.config?x=9eb259db6b1893cf
>>>> dashboard link:
>>>> https://syzkaller.appspot.com/bug?extid=5e70d01ee8985ae62a3b
>>>> compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU
>>>> Binutils for Debian) 2.35.2
>>>> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1136d592f00000
>>>> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1193ae64f00000
>>>>
>>>> Bisection is inconclusive: the issue happens on the oldest tested
>>>> release.
>>>>
>>>> bisection log:
>>>> https://syzkaller.appspot.com/x/bisect.txt?x=167c33a2f00000
>>>> final oops:
>>>> https://syzkaller.appspot.com/x/report.txt?x=157c33a2f00000
>>>> console output:
>>>> https://syzkaller.appspot.com/x/log.txt?x=117c33a2f00000
>>>>
>>>> IMPORTANT: if you fix the issue, please add the following tag to
>>>> the commit:
>>>> Reported-by: syzbot+5e70d01ee8985ae62a3b@syzkaller.appspotmail.com
>>>>
>>>> iwpm_register_pid: Unable to send a nlmsg (client = 2)
>>>> infiniband syj1: RDMA CMA: cma_listen_on_dev, error -98
>>>> unregister_netdevice: waiting for vlan0 to become free. Usage count
>>>> = 2
>>>
>>> +RDMA maintainers
>>>
>>> There are 4 reproducers and all contain:
>>>
>>> r0 = socket$nl_rdma(0x10, 0x3, 0x14)
>>> sendmsg$RDMA_NLDEV_CMD_NEWLINK(...)
>>>
>>> Also the preceding print looks related (a bug in the error handling
>>> path there?):
>>>
>>> infiniband syj1: RDMA CMA: cma_listen_on_dev, error -98
>>
>> I'm pretty sure it is an rxe bug
>>
>> ib_device_set_netdev() will hold the netdev until the caller destroys
>> the ib_device
>>
>> rxe calls it during rxe_register_device() because the user asked for a
>> stacked ib_device on top of the netdev
>>
>> Presumably rxe needs to have a notifier to also self destroy the rxe
>> device if the underlying net device is to be destroyed?
>>
>> Can someone from rxe check into this?
>
> The following patch may fix the issue:
>
> --- a/drivers/infiniband/core/cma.c
> +++ b/drivers/infiniband/core/cma.c
> @@ -4049,6 +4049,9 @@ int rdma_listen(struct rdma_cm_id *id, int backlog)
>  	return 0;
>  err:
>  	id_priv->backlog = 0;
> +	if (id_priv->cma_dev)
> +		cma_release_dev(id_priv);
> +
>  	/*
>  	 * All the failure paths that lead here will not allow the
> req_handler's
>  	 * to have run.
>

But it is the caller's responsibility to destroy it since commit
dd37d2f59eb8.

> The causes are as follows:
>
> rdma_listen()
>   rdma_bind_addr()
>     cma_acquire_dev_by_src_ip()
>       cma_attach_to_dev()
>         _cma_attach_to_dev()
>           cma_dev_get()

Thanks for the analysis.

And for the two callers of cma_listen_on_dev, looks they have
different behaviors with regard to handling failure.

1. cma_listen_on_all calls both list_del_init(&to_destroy->device_item)
   and rdma_destroy_id(&to_destroy->id).

2. cma_add_one invokes cma_process_remove to delete to_destroy.
   cma_process_remove calls both list_del_init(&id_priv->listen_item)
   and list_del_init(&id_priv->device_item), but it doesn't call
   rdma_destroy_id(&dev_id_priv->id), which is also different from
   _cma_cancel_listens.

I am wondering if this is needed.

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index cc2222b85c88..48e283d1389b 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -5231,6 +5231,7 @@ static void cma_process_remove(struct cma_device *cma_dev)
 		cma_id_get(id_priv);
 		mutex_unlock(&lock);

+		rdma_destroy_id(&dev_id_priv->id);
 		cma_send_device_removal_put(id_priv);

 		mutex_lock(&lock);

Thanks,
Guoqing

* Re: [syzbot] unregister_netdevice: waiting for DEV to become free (7)
From: Jason Gunthorpe @ 2022-11-24 0:22 UTC
To: Guoqing Jiang, Bernard Metzler
Cc: wangyufen, Dmitry Vyukov, syzbot, Leon Romanovsky, chenzhongjin, RDMA mailing list, linux-kernel, netdev, syzkaller-bugs, Zhu Yanjun, Bob Pearson

On Wed, Nov 23, 2022 at 05:45:53PM +0800, Guoqing Jiang wrote:

> But it is the caller's responsibility to destroy it since commit
> dd37d2f59eb8.
>
> > The causes are as follows:
> >
> > rdma_listen()
> >   rdma_bind_addr()
> >     cma_acquire_dev_by_src_ip()
> >       cma_attach_to_dev()
> >         _cma_attach_to_dev()
> >           cma_dev_get()
>
> Thanks for the analysis.
>
> And for the two callers of cma_listen_on_dev, looks they have
> different behaviors with regard to handling failure.

Yes, the CM is not the problem, and that print from it is unrelated

I patched in netdevice_tracker and get this:

[ 237.475070][ T7541] unregister_netdevice: waiting for vlan0 to become free. Usage count = 2
[ 237.477311][ T7541] leaked reference.
[ 237.478378][ T7541]      ib_device_set_netdev+0x266/0x730
[ 237.479848][ T7541]      siw_newlink+0x4e0/0xfd0
[ 237.481100][ T7541]      nldev_newlink+0x35c/0x5c0
[ 237.482121][ T7541]      rdma_nl_rcv_msg+0x36d/0x690
[ 237.483312][ T7541]      rdma_nl_rcv+0x2ee/0x430
[ 237.484483][ T7541]      netlink_unicast+0x543/0x7f0
[ 237.485746][ T7541]      netlink_sendmsg+0x918/0xe20
[ 237.486866][ T7541]      sock_sendmsg+0xcf/0x120
[ 237.488006][ T7541]      ____sys_sendmsg+0x70d/0x8b0
[ 237.489294][ T7541]      ___sys_sendmsg+0x11d/0x1b0
[ 237.490404][ T7541]      __sys_sendmsg+0xfa/0x1d0
[ 237.491451][ T7541]      do_syscall_64+0x35/0xb0
[ 237.492566][ T7541]      entry_SYSCALL_64_after_hwframe+0x63/0xcd

Which seems to confirm my original prediction, except this is siw not
rxe..

Maybe rxe was the wrong guess, or maybe it is troubled too in other
reports?

Jason

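The idea behind the tracker output above, remembering who took each reference so that a leak can be attributed to a holder, can be sketched in user space. This is a simplified model under stated assumptions: the toy_* names are hypothetical, and where the kernel facility reports a call trace per reference, this sketch stores only a caller-supplied label.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_TRACKERS 8

/* One outstanding reference, tagged with who took it (hypothetical label). */
struct toy_tracker {
	const char *owner;	/* NULL means the slot is free */
};

struct toy_tracked_dev {
	struct toy_tracker slots[TOY_MAX_TRACKERS];
};

/* Take a reference and remember the owner; returns the slot index or -1. */
static int toy_hold_track(struct toy_tracked_dev *dev, const char *owner)
{
	for (int i = 0; i < TOY_MAX_TRACKERS; i++) {
		if (!dev->slots[i].owner) {
			dev->slots[i].owner = owner;
			return i;
		}
	}
	return -1;	/* table full */
}

/* Drop the reference recorded in the given slot. */
static void toy_put_track(struct toy_tracked_dev *dev, int slot)
{
	assert(slot >= 0 && slot < TOY_MAX_TRACKERS);
	dev->slots[slot].owner = NULL;
}

/* Copy the owners of still-held references into out[]; returns the count. */
static int toy_leaked(const struct toy_tracked_dev *dev,
		      const char *out[TOY_MAX_TRACKERS])
{
	int n = 0;

	for (int i = 0; i < TOY_MAX_TRACKERS; i++)
		if (dev->slots[i].owner)
			out[n++] = dev->slots[i].owner;
	return n;
}
```

If one holder labelled "ib_device_set_netdev" never calls toy_put_track(), toy_leaked() returns 1 and hands back that label, which mirrors how the report above points at the function that took the leaked reference.
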
* Re: [syzbot] unregister_netdevice: waiting for DEV to become free (7)
From: wangyufen @ 2022-11-24 1:42 UTC
To: Jason Gunthorpe, Guoqing Jiang, Bernard Metzler
Cc: Dmitry Vyukov, syzbot, Leon Romanovsky, chenzhongjin, RDMA mailing list, linux-kernel, netdev, syzkaller-bugs, Zhu Yanjun, Bob Pearson

On 2022/11/24 8:22, Jason Gunthorpe wrote:
> On Wed, Nov 23, 2022 at 05:45:53PM +0800, Guoqing Jiang wrote:
>> But it is the caller's responsibility to destroy it since commit
>> dd37d2f59eb8.
>>
>>> The causes are as follows:
>>>
>>> rdma_listen()
>>>   rdma_bind_addr()
>>>     cma_acquire_dev_by_src_ip()
>>>       cma_attach_to_dev()
>>>         _cma_attach_to_dev()
>>>           cma_dev_get()
>>
>> Thanks for the analysis.
>>
>> And for the two callers of cma_listen_on_dev, looks they have
>> different behaviors with regard to handling failure.
>
> Yes, the CM is not the problem, and that print from it is unrelated
>

Yes, I misanalyzed earlier.

> I patched in netdevice_tracker and get this:
>
> [ 237.475070][ T7541] unregister_netdevice: waiting for vlan0 to become free. Usage count = 2
> [ 237.477311][ T7541] leaked reference.
> [ 237.478378][ T7541]      ib_device_set_netdev+0x266/0x730
> [ 237.479848][ T7541]      siw_newlink+0x4e0/0xfd0
> [ 237.481100][ T7541]      nldev_newlink+0x35c/0x5c0
> [ 237.482121][ T7541]      rdma_nl_rcv_msg+0x36d/0x690
> [ 237.483312][ T7541]      rdma_nl_rcv+0x2ee/0x430
> [ 237.484483][ T7541]      netlink_unicast+0x543/0x7f0
> [ 237.485746][ T7541]      netlink_sendmsg+0x918/0xe20
> [ 237.486866][ T7541]      sock_sendmsg+0xcf/0x120
> [ 237.488006][ T7541]      ____sys_sendmsg+0x70d/0x8b0
> [ 237.489294][ T7541]      ___sys_sendmsg+0x11d/0x1b0
> [ 237.490404][ T7541]      __sys_sendmsg+0xfa/0x1d0
> [ 237.491451][ T7541]      do_syscall_64+0x35/0xb0
> [ 237.492566][ T7541]      entry_SYSCALL_64_after_hwframe+0x63/0xcd
>
> Which seems to confirm my original prediction, except this is siw not
> rxe..
>

Rxe does not have this issue, maybe because it does not support vlan dev.

> Maybe rxe was the wrong guess, or maybe it is troubled too in other
> reports?
>
> Jason