* [syzbot] unregister_netdevice: waiting for DEV to become free (7) @ 2022-11-18 11:39 syzbot 2022-11-18 13:28 ` Dmitry Vyukov 0 siblings, 1 reply; 8+ messages in thread From: syzbot @ 2022-11-18 11:39 UTC (permalink / raw) To: linux-kernel, netdev, syzkaller-bugs Hello, syzbot found the following issue on: HEAD commit: 9c8774e629a1 net: eql: Use kzalloc instead of kmalloc/memset git tree: net-next console output: https://syzkaller.appspot.com/x/log.txt?x=17bf6cc8f00000 kernel config: https://syzkaller.appspot.com/x/.config?x=9eb259db6b1893cf dashboard link: https://syzkaller.appspot.com/bug?extid=5e70d01ee8985ae62a3b compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2 syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1136d592f00000 C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1193ae64f00000 Bisection is inconclusive: the issue happens on the oldest tested release. bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=167c33a2f00000 final oops: https://syzkaller.appspot.com/x/report.txt?x=157c33a2f00000 console output: https://syzkaller.appspot.com/x/log.txt?x=117c33a2f00000 IMPORTANT: if you fix the issue, please add the following tag to the commit: Reported-by: syzbot+5e70d01ee8985ae62a3b@syzkaller.appspotmail.com iwpm_register_pid: Unable to send a nlmsg (client = 2) infiniband syj1: RDMA CMA: cma_listen_on_dev, error -98 unregister_netdevice: waiting for vlan0 to become free. Usage count = 2 --- This report is generated by a bot. It may contain errors. See https://goo.gl/tpsmEJ for more information about syzbot. syzbot engineers can be reached at syzkaller@googlegroups.com. syzbot will keep track of this issue. See: https://goo.gl/tpsmEJ#status for how to communicate with syzbot. For information about bisection process see: https://goo.gl/tpsmEJ#bisection syzbot can test patches for this issue, for details see: https://goo.gl/tpsmEJ#testing-patches ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [syzbot] unregister_netdevice: waiting for DEV to become free (7) 2022-11-18 11:39 [syzbot] unregister_netdevice: waiting for DEV to become free (7) syzbot @ 2022-11-18 13:28 ` Dmitry Vyukov 2022-11-22 2:13 ` Jason Gunthorpe 2023-03-25 16:02 ` Tetsuo Handa 0 siblings, 2 replies; 8+ messages in thread From: Dmitry Vyukov @ 2022-11-18 13:28 UTC (permalink / raw) To: syzbot, Jason Gunthorpe, Leon Romanovsky, chenzhongjin, RDMA mailing list Cc: linux-kernel, netdev, syzkaller-bugs On Fri, 18 Nov 2022 at 12:39, syzbot <syzbot+5e70d01ee8985ae62a3b@syzkaller.appspotmail.com> wrote: > > Hello, > > syzbot found the following issue on: > > HEAD commit: 9c8774e629a1 net: eql: Use kzalloc instead of kmalloc/memset > git tree: net-next > console output: https://syzkaller.appspot.com/x/log.txt?x=17bf6cc8f00000 > kernel config: https://syzkaller.appspot.com/x/.config?x=9eb259db6b1893cf > dashboard link: https://syzkaller.appspot.com/bug?extid=5e70d01ee8985ae62a3b > compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2 > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1136d592f00000 > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1193ae64f00000 > > Bisection is inconclusive: the issue happens on the oldest tested release. > > bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=167c33a2f00000 > final oops: https://syzkaller.appspot.com/x/report.txt?x=157c33a2f00000 > console output: https://syzkaller.appspot.com/x/log.txt?x=117c33a2f00000 > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > Reported-by: syzbot+5e70d01ee8985ae62a3b@syzkaller.appspotmail.com > > iwpm_register_pid: Unable to send a nlmsg (client = 2) > infiniband syj1: RDMA CMA: cma_listen_on_dev, error -98 > unregister_netdevice: waiting for vlan0 to become free. Usage count = 2 +RDMA maintainers There are 4 reproducers and all contain: r0 = socket$nl_rdma(0x10, 0x3, 0x14) sendmsg$RDMA_NLDEV_CMD_NEWLINK(...) Also the preceding print looks related (a bug in the error handling path there?): infiniband syj1: RDMA CMA: cma_listen_on_dev, error -98 > --- > This report is generated by a bot. It may contain errors. > See https://goo.gl/tpsmEJ for more information about syzbot. > syzbot engineers can be reached at syzkaller@googlegroups.com. > > syzbot will keep track of this issue. See: > https://goo.gl/tpsmEJ#status for how to communicate with syzbot. > For information about bisection process see: https://goo.gl/tpsmEJ#bisection > syzbot can test patches for this issue, for details see: > https://goo.gl/tpsmEJ#testing-patches ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [syzbot] unregister_netdevice: waiting for DEV to become free (7) 2022-11-18 13:28 ` Dmitry Vyukov @ 2022-11-22 2:13 ` Jason Gunthorpe 2022-11-22 3:28 ` wangyufen 2023-03-25 16:02 ` Tetsuo Handa 1 sibling, 1 reply; 8+ messages in thread From: Jason Gunthorpe @ 2022-11-22 2:13 UTC (permalink / raw) To: Dmitry Vyukov Cc: syzbot, Leon Romanovsky, chenzhongjin, RDMA mailing list, linux-kernel, netdev, syzkaller-bugs, Zhu Yanjun, Bob Pearson On Fri, Nov 18, 2022 at 02:28:53PM +0100, Dmitry Vyukov wrote: > On Fri, 18 Nov 2022 at 12:39, syzbot > <syzbot+5e70d01ee8985ae62a3b@syzkaller.appspotmail.com> wrote: > > > > Hello, > > > > syzbot found the following issue on: > > > > HEAD commit: 9c8774e629a1 net: eql: Use kzalloc instead of kmalloc/memset > > git tree: net-next > > console output: https://syzkaller.appspot.com/x/log.txt?x=17bf6cc8f00000 > > kernel config: https://syzkaller.appspot.com/x/.config?x=9eb259db6b1893cf > > dashboard link: https://syzkaller.appspot.com/bug?extid=5e70d01ee8985ae62a3b > > compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2 > > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1136d592f00000 > > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1193ae64f00000 > > > > Bisection is inconclusive: the issue happens on the oldest tested release. > > > > bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=167c33a2f00000 > > final oops: https://syzkaller.appspot.com/x/report.txt?x=157c33a2f00000 > > console output: https://syzkaller.appspot.com/x/log.txt?x=117c33a2f00000 > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > > Reported-by: syzbot+5e70d01ee8985ae62a3b@syzkaller.appspotmail.com > > > > iwpm_register_pid: Unable to send a nlmsg (client = 2) > > infiniband syj1: RDMA CMA: cma_listen_on_dev, error -98 > > unregister_netdevice: waiting for vlan0 to become free. Usage count = 2 > > +RDMA maintainers > > There are 4 reproducers and all contain: > > r0 = socket$nl_rdma(0x10, 0x3, 0x14) > sendmsg$RDMA_NLDEV_CMD_NEWLINK(...) > > Also the preceding print looks related (a bug in the error handling > path there?): > > infiniband syj1: RDMA CMA: cma_listen_on_dev, error -98 I'm pretty sure it is an rxe bug ib_device_set_netdev() will hold the netdev until the caller destroys the ib_device rxe calls it during rxe_register_device() because the user asked for a stacked ib_device on top of the netdev Presumably rxe needs to have a notifier to also self destroy the rxe device if the underlying net device is to be destroyed? Can someone from rxe check into this? Jason ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [syzbot] unregister_netdevice: waiting for DEV to become free (7) 2022-11-22 2:13 ` Jason Gunthorpe @ 2022-11-22 3:28 ` wangyufen 2022-11-23 9:45 ` Guoqing Jiang 0 siblings, 1 reply; 8+ messages in thread From: wangyufen @ 2022-11-22 3:28 UTC (permalink / raw) To: Jason Gunthorpe, Dmitry Vyukov Cc: syzbot, Leon Romanovsky, chenzhongjin, RDMA mailing list, linux-kernel, netdev, syzkaller-bugs, Zhu Yanjun, Bob Pearson 在 2022/11/22 10:13, Jason Gunthorpe 写道: > On Fri, Nov 18, 2022 at 02:28:53PM +0100, Dmitry Vyukov wrote: >> On Fri, 18 Nov 2022 at 12:39, syzbot >> <syzbot+5e70d01ee8985ae62a3b@syzkaller.appspotmail.com> wrote: >>> >>> Hello, >>> >>> syzbot found the following issue on: >>> >>> HEAD commit: 9c8774e629a1 net: eql: Use kzalloc instead of kmalloc/memset >>> git tree: net-next >>> console output: https://syzkaller.appspot.com/x/log.txt?x=17bf6cc8f00000 >>> kernel config: https://syzkaller.appspot.com/x/.config?x=9eb259db6b1893cf >>> dashboard link: https://syzkaller.appspot.com/bug?extid=5e70d01ee8985ae62a3b >>> compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2 >>> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1136d592f00000 >>> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1193ae64f00000 >>> >>> Bisection is inconclusive: the issue happens on the oldest tested release. >>> >>> bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=167c33a2f00000 >>> final oops: https://syzkaller.appspot.com/x/report.txt?x=157c33a2f00000 >>> console output: https://syzkaller.appspot.com/x/log.txt?x=117c33a2f00000 >>> >>> IMPORTANT: if you fix the issue, please add the following tag to the commit: >>> Reported-by: syzbot+5e70d01ee8985ae62a3b@syzkaller.appspotmail.com >>> >>> iwpm_register_pid: Unable to send a nlmsg (client = 2) >>> infiniband syj1: RDMA CMA: cma_listen_on_dev, error -98 >>> unregister_netdevice: waiting for vlan0 to become free. Usage count = 2 >> >> +RDMA maintainers >> >> There are 4 reproducers and all contain: >> >> r0 = socket$nl_rdma(0x10, 0x3, 0x14) >> sendmsg$RDMA_NLDEV_CMD_NEWLINK(...) >> >> Also the preceding print looks related (a bug in the error handling >> path there?): >> >> infiniband syj1: RDMA CMA: cma_listen_on_dev, error -98 > > I'm pretty sure it is an rxe bug > > ib_device_set_netdev() will hold the netdev until the caller destroys > the ib_device > > rxe calls it during rxe_register_device() because the user asked for a > stacked ib_device on top of the netdev > > Presumably rxe needs to have a notifier to also self destroy the rxe > device if the underlying net device is to be destroyed? > > Can someone from rxe check into this? The following patch may fix the issue: --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -4049,6 +4049,9 @@ int rdma_listen(struct rdma_cm_id *id, int backlog) return 0; err: id_priv->backlog = 0; + if (id_priv->cma_dev) + cma_release_dev(id_priv); + /* * All the failure paths that lead here will not allow the req_handler's * to have run. The causes are as follows: rdma_listen() rdma_bind_addr() cma_acquire_dev_by_src_ip() cma_attach_to_dev() _cma_attach_to_dev() cma_dev_get() cma_check_port() <--The return value is not zero, goto err err: <-- The error handling here is missing the operation of cma_release_dev. > > Jason ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [syzbot] unregister_netdevice: waiting for DEV to become free (7) 2022-11-22 3:28 ` wangyufen @ 2022-11-23 9:45 ` Guoqing Jiang 2022-11-24 0:22 ` Jason Gunthorpe 0 siblings, 1 reply; 8+ messages in thread From: Guoqing Jiang @ 2022-11-23 9:45 UTC (permalink / raw) To: wangyufen, Jason Gunthorpe, Dmitry Vyukov Cc: syzbot, Leon Romanovsky, chenzhongjin, RDMA mailing list, linux-kernel, netdev, syzkaller-bugs, Zhu Yanjun, Bob Pearson On 11/22/22 11:28 AM, wangyufen wrote: > > 在 2022/11/22 10:13, Jason Gunthorpe 写道: >> On Fri, Nov 18, 2022 at 02:28:53PM +0100, Dmitry Vyukov wrote: >>> On Fri, 18 Nov 2022 at 12:39, syzbot >>> <syzbot+5e70d01ee8985ae62a3b@syzkaller.appspotmail.com> wrote: >>>> >>>> Hello, >>>> >>>> syzbot found the following issue on: >>>> >>>> HEAD commit: 9c8774e629a1 net: eql: Use kzalloc instead of >>>> kmalloc/memset >>>> git tree: net-next >>>> console output: >>>> https://syzkaller.appspot.com/x/log.txt?x=17bf6cc8f00000 >>>> kernel config: >>>> https://syzkaller.appspot.com/x/.config?x=9eb259db6b1893cf >>>> dashboard link: >>>> https://syzkaller.appspot.com/bug?extid=5e70d01ee8985ae62a3b >>>> compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU >>>> Binutils for Debian) 2.35.2 >>>> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1136d592f00000 >>>> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1193ae64f00000 >>>> >>>> Bisection is inconclusive: the issue happens on the oldest tested >>>> release. >>>> >>>> bisection log: >>>> https://syzkaller.appspot.com/x/bisect.txt?x=167c33a2f00000 >>>> final oops: >>>> https://syzkaller.appspot.com/x/report.txt?x=157c33a2f00000 >>>> console output: >>>> https://syzkaller.appspot.com/x/log.txt?x=117c33a2f00000 >>>> >>>> IMPORTANT: if you fix the issue, please add the following tag to >>>> the commit: >>>> Reported-by: syzbot+5e70d01ee8985ae62a3b@syzkaller.appspotmail.com >>>> >>>> iwpm_register_pid: Unable to send a nlmsg (client = 2) >>>> infiniband syj1: RDMA CMA: cma_listen_on_dev, error -98 >>>> unregister_netdevice: waiting for vlan0 to become free. Usage count >>>> = 2 >>> >>> +RDMA maintainers >>> >>> There are 4 reproducers and all contain: >>> >>> r0 = socket$nl_rdma(0x10, 0x3, 0x14) >>> sendmsg$RDMA_NLDEV_CMD_NEWLINK(...) >>> >>> Also the preceding print looks related (a bug in the error handling >>> path there?): >>> >>> infiniband syj1: RDMA CMA: cma_listen_on_dev, error -98 >> >> I'm pretty sure it is an rxe bug >> >> ib_device_set_netdev() will hold the netdev until the caller destroys >> the ib_device >> >> rxe calls it during rxe_register_device() because the user asked for a >> stacked ib_device on top of the netdev >> >> Presumably rxe needs to have a notifier to also self destroy the rxe >> device if the underlying net device is to be destroyed? >> >> Can someone from rxe check into this? > > The following patch may fix the issue: > > --- a/drivers/infiniband/core/cma.c > +++ b/drivers/infiniband/core/cma.c > @@ -4049,6 +4049,9 @@ int rdma_listen(struct rdma_cm_id *id, int backlog) > return 0; > err: > id_priv->backlog = 0; > + if (id_priv->cma_dev) > + cma_release_dev(id_priv); > + > /* > * All the failure paths that lead here will not allow the > req_handler's > * to have run. > But it is the caller's responsibility to destroy it since commit dd37d2f59eb8. > The causes are as follows: > > rdma_listen() > rdma_bind_addr() > cma_acquire_dev_by_src_ip() > cma_attach_to_dev() > _cma_attach_to_dev() > cma_dev_get() Thanks for the analysis. And for the two callers of cma_listen_on_dev, looks they have different behaviors with regard to handling failure. 1. cma_listen_on_all which calls both list_del_init(&to_destroy->device_item) and rdma_destroy_id(&to_destroy->id) 2. cma_add_one invokes cma_process_remove to delete to_destroy, cma_process_remove call both list_del_init(&id_priv->listen_item) and list_del_init(&id_priv->device_item), but it doesn't call rdma_destroy_id(&dev_id_priv->id) which is also different with _cma_cancel_listens. I am wondering if this is needed. diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index cc2222b85c88..48e283d1389b 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -5231,6 +5231,7 @@ static void cma_process_remove(struct cma_device *cma_dev) cma_id_get(id_priv); mutex_unlock(&lock); + rdma_destroy_id(&dev_id_priv->id); cma_send_device_removal_put(id_priv); mutex_lock(&lock); Thanks, Guoqing ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [syzbot] unregister_netdevice: waiting for DEV to become free (7) 2022-11-23 9:45 ` Guoqing Jiang @ 2022-11-24 0:22 ` Jason Gunthorpe 2022-11-24 1:42 ` wangyufen 0 siblings, 1 reply; 8+ messages in thread From: Jason Gunthorpe @ 2022-11-24 0:22 UTC (permalink / raw) To: Guoqing Jiang, Bernard Metzler Cc: wangyufen, Dmitry Vyukov, syzbot, Leon Romanovsky, chenzhongjin, RDMA mailing list, linux-kernel, netdev, syzkaller-bugs, Zhu Yanjun, Bob Pearson On Wed, Nov 23, 2022 at 05:45:53PM +0800, Guoqing Jiang wrote: > But it is the caller's responsibility to destroy it since commit > dd37d2f59eb8. > > > The causes are as follows: > > > > rdma_listen() > > rdma_bind_addr() > > cma_acquire_dev_by_src_ip() > > cma_attach_to_dev() > > _cma_attach_to_dev() > > cma_dev_get() > > Thanks for the analysis. > > And for the two callers of cma_listen_on_dev, looks they have > different behaviors with regard to handling failure. Yes, the CM is not the problem, and that print from it is unrelated I patched in netdevice_tracker and get this: [ 237.475070][ T7541] unregister_netdevice: waiting for vlan0 to become free. Usage count = 2 [ 237.477311][ T7541] leaked reference. [ 237.478378][ T7541] ib_device_set_netdev+0x266/0x730 [ 237.479848][ T7541] siw_newlink+0x4e0/0xfd0 [ 237.481100][ T7541] nldev_newlink+0x35c/0x5c0 [ 237.482121][ T7541] rdma_nl_rcv_msg+0x36d/0x690 [ 237.483312][ T7541] rdma_nl_rcv+0x2ee/0x430 [ 237.484483][ T7541] netlink_unicast+0x543/0x7f0 [ 237.485746][ T7541] netlink_sendmsg+0x918/0xe20 [ 237.486866][ T7541] sock_sendmsg+0xcf/0x120 [ 237.488006][ T7541] ____sys_sendmsg+0x70d/0x8b0 [ 237.489294][ T7541] ___sys_sendmsg+0x11d/0x1b0 [ 237.490404][ T7541] __sys_sendmsg+0xfa/0x1d0 [ 237.491451][ T7541] do_syscall_64+0x35/0xb0 [ 237.492566][ T7541] entry_SYSCALL_64_after_hwframe+0x63/0xcd Which seems to confirm my original prediction, except this is siw not rxe.. Maybe rxe was the wrong guess, or maybe it is troubled too in other reports? Jason ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [syzbot] unregister_netdevice: waiting for DEV to become free (7) 2022-11-24 0:22 ` Jason Gunthorpe @ 2022-11-24 1:42 ` wangyufen 0 siblings, 0 replies; 8+ messages in thread From: wangyufen @ 2022-11-24 1:42 UTC (permalink / raw) To: Jason Gunthorpe, Guoqing Jiang, Bernard Metzler Cc: Dmitry Vyukov, syzbot, Leon Romanovsky, chenzhongjin, RDMA mailing list, linux-kernel, netdev, syzkaller-bugs, Zhu Yanjun, Bob Pearson 在 2022/11/24 8:22, Jason Gunthorpe 写道: > On Wed, Nov 23, 2022 at 05:45:53PM +0800, Guoqing Jiang wrote: >> But it is the caller's responsibility to destroy it since commit >> dd37d2f59eb8. >> >>> The causes are as follows: >>> >>> rdma_listen() >>> rdma_bind_addr() >>> cma_acquire_dev_by_src_ip() >>> cma_attach_to_dev() >>> _cma_attach_to_dev() >>> cma_dev_get() >> >> Thanks for the analysis. >> >> And for the two callers of cma_listen_on_dev, looks they have >> different behaviors with regard to handling failure. > > Yes, the CM is not the problem, and that print from it is unrelated > Yes, I misanalyzed earlier. > I patched in netdevice_tracker and get this: > > [ 237.475070][ T7541] unregister_netdevice: waiting for vlan0 to become free. Usage count = 2 > [ 237.477311][ T7541] leaked reference. > [ 237.478378][ T7541] ib_device_set_netdev+0x266/0x730 > [ 237.479848][ T7541] siw_newlink+0x4e0/0xfd0 > [ 237.481100][ T7541] nldev_newlink+0x35c/0x5c0 > [ 237.482121][ T7541] rdma_nl_rcv_msg+0x36d/0x690 > [ 237.483312][ T7541] rdma_nl_rcv+0x2ee/0x430 > [ 237.484483][ T7541] netlink_unicast+0x543/0x7f0 > [ 237.485746][ T7541] netlink_sendmsg+0x918/0xe20 > [ 237.486866][ T7541] sock_sendmsg+0xcf/0x120 > [ 237.488006][ T7541] ____sys_sendmsg+0x70d/0x8b0 > [ 237.489294][ T7541] ___sys_sendmsg+0x11d/0x1b0 > [ 237.490404][ T7541] __sys_sendmsg+0xfa/0x1d0 > [ 237.491451][ T7541] do_syscall_64+0x35/0xb0 > [ 237.492566][ T7541] entry_SYSCALL_64_after_hwframe+0x63/0xcd > > Which seems to confirm my original prediction, except this is siw not > rxe.. > Rxe dose not have this issue, maybe because it does not support vlan dev. > Maybe rxe was the wrong guess, or maybe it is troubled too in other > reports? > > Jason ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [syzbot] unregister_netdevice: waiting for DEV to become free (7) 2022-11-18 13:28 ` Dmitry Vyukov 2022-11-22 2:13 ` Jason Gunthorpe @ 2023-03-25 16:02 ` Tetsuo Handa 1 sibling, 0 replies; 8+ messages in thread From: Tetsuo Handa @ 2023-03-25 16:02 UTC (permalink / raw) To: Jason Gunthorpe, Leon Romanovsky, chenzhongjin, Zhu Yanjun, Bob Pearson, Guoqing Jiang, Bernard Metzler, linux-rdma, wangyufen, Dmitry Vyukov Cc: syzbot+5e70d01ee8985ae62a3b, syzkaller-bugs I made a simplified reproducer, using https://syzkaller.appspot.com/text?tag=ReproC&x=1193ae64f00000 as a base with main() rewritten like below. ---------- int main(void) { unshare(CLONE_NEWNET); initialize_netdevices(); const int fd1 = socket(AF_NETLINK, SOCK_RAW, NETLINK_RDMA); struct iovec iov = { .iov_base = "\x38\x00\x00\x00\x03\x14\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x09" "\x00\x02\x00\x73\x79\x6a\x31\x00\x00\x00\x00\x08\x00\x41\x00\x73\x69" "\x77\x00\x14\x00\x33\x00\x76\x6c\x61\x6e\x30", .iov_len = 0x38 }; struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1 }; sendmsg(fd1, &msg, 0); if (0) { /* Enable this block if you want to observe hung without exit(0). */ const int fd2 = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL)); struct /* vlan_ioctl_args */ { int cmd; char device1[24]; union { char device2[24]; int VID; unsigned int skb_priority; unsigned int name_type; unsigned int bind_type; unsigned int flag; } u; short vlan_qos; } args = { .cmd = 1 /* DEL_VLAN_CMD */, .device1 = "vlan0", }; ioctl(fd2, 0x8982 /* SIOCGIFVLAN */, &args); } return 0; } ---------- Using this reproducer, I did several tests. This problem is reproduced with below diff (which pretends as if e.g. cma_add_one(), ib_uverbs_add_one(), srp_add_one() failed with -ENOMEM error because they do GFP_KERNEL allocation). ---------- diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c index a666847bd714..f8f593bd81e5 100644 --- a/drivers/infiniband/core/device.c +++ b/drivers/infiniband/core/device.c @@ -710,23 +710,23 @@ static int add_client_context(struct ib_device *device, */ if (xa_get_mark(&device->client_data, client->client_id, CLIENT_DATA_REGISTERED)) goto out; ret = xa_err(xa_store(&device->client_data, client->client_id, NULL, GFP_KERNEL)); if (ret) goto out; downgrade_write(&device->client_data_rwsem); if (client->add) { - if (client->add(device)) { + if (1) { /* * If a client fails to add then the error code is * ignored, but we won't call any more ops on this * client. */ xa_erase(&device->client_data, client->client_id); up_read(&device->client_data_rwsem); ib_device_put(device); ib_client_put(client); return 0; } ---------- Since client->add() can fail with -ENOMEM, pretending as if client->add() always fails with -ENOMEM hides this problem. ---------- diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c index a666847bd714..4c986f62514f 100644 --- a/drivers/infiniband/core/device.c +++ b/drivers/infiniband/core/device.c @@ -695,6 +695,9 @@ static int add_client_context(struct ib_device *device, if (!device->kverbs_provider && !client->no_kverbs_req) return 0; + if (client->add) + return -ENOMEM; + down_write(&device->client_data_rwsem); /* * So long as the client is registered hold both the client and device ---------- However, changing to return 0 instead of -ENOMEM again shows this problem. ---------- diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c index a666847bd714..9dcf712e542d 100644 --- a/drivers/infiniband/core/device.c +++ b/drivers/infiniband/core/device.c @@ -695,6 +695,9 @@ static int add_client_context(struct ib_device *device, if (!device->kverbs_provider && !client->no_kverbs_req) return 0; + if (client->add) + return 0; + down_write(&device->client_data_rwsem); /* * So long as the client is registered hold both the client and device ---------- These results suggest that the "If a client fails to add then the error code is ignored" part is wrong. We need to teach the caller about the failure, even if we don't call any more ops on this client. Please re-evaluate commit 11a0ae4c4bff ("RDMA: Allow ib_client's to fail when add() is called"). I don't know why that commit ignores failures. There could be some error code which should be ignored, but I feel that there are error codes which should not be ignored. What do you think? From 3e1cd08cb206afa827d196e4730ba9c3670a7fe7 Mon Sep 17 00:00:00 2001 From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Date: Sun, 26 Mar 2023 00:34:05 +0900 Subject: [PATCH] RDMA: don't ignore client->add() failures syzbot is reporting refcount leak when client->add() from add_client_context() fails, for add_client_context() is ignoring error from client->add(). We need to return an error in order to indicate that client could not be added, or the caller will get confused. Reported-by: syzbot <syzbot+5e70d01ee8985ae62a3b@syzkaller.appspotmail.com> Link: https://syzkaller.appspot.com/bug?extid=5e70d01ee8985ae62a3b Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> --- drivers/infiniband/core/device.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c index a666847bd714..c72f810ceae1 100644 --- a/drivers/infiniband/core/device.c +++ b/drivers/infiniband/core/device.c @@ -718,17 +718,17 @@ static int add_client_context(struct ib_device *device, goto out; downgrade_write(&device->client_data_rwsem); if (client->add) { - if (client->add(device)) { + ret = client->add(device); + if (ret) { /* - * If a client fails to add then the error code is - * ignored, but we won't call any more ops on this - * client. + * If a client fails to add, we won't call any more + * ops on this client. */ xa_erase(&device->client_data, client->client_id); up_read(&device->client_data_rwsem); ib_device_put(device); ib_client_put(client); - return 0; + return ret; } } -- 2.34.1 ^ permalink raw reply related [flat|nested] 8+ messages in thread
end of thread, other threads:[~2023-03-25 16:03 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2022-11-18 11:39 [syzbot] unregister_netdevice: waiting for DEV to become free (7) syzbot 2022-11-18 13:28 ` Dmitry Vyukov 2022-11-22 2:13 ` Jason Gunthorpe 2022-11-22 3:28 ` wangyufen 2022-11-23 9:45 ` Guoqing Jiang 2022-11-24 0:22 ` Jason Gunthorpe 2022-11-24 1:42 ` wangyufen 2023-03-25 16:02 ` Tetsuo Handa
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.