* WARNING in ib_umad_kill_port
From: syzbot @ 2020-04-06 6:37 UTC
To: gregkh, linux-kernel, netdev, rafael, syzkaller-bugs

Hello,

syzbot found the following crash on:

HEAD commit:    304e0242 net_sched: add a temporary refcnt for struct tcin..
git tree:       net
console output: https://syzkaller.appspot.com/x/log.txt?x=119dd16de00000
kernel config:  https://syzkaller.appspot.com/x/.config?x=8c1e98458335a7d1
dashboard link: https://syzkaller.appspot.com/bug?extid=9627a92b1f9262d5d30c
compiler:       gcc (GCC) 9.0.0 20181231 (experimental)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+9627a92b1f9262d5d30c@syzkaller.appspotmail.com

------------[ cut here ]------------
sysfs group 'power' not found for kobject 'umad1'
WARNING: CPU: 1 PID: 31308 at fs/sysfs/group.c:279 sysfs_remove_group fs/sysfs/group.c:279 [inline]
WARNING: CPU: 1 PID: 31308 at fs/sysfs/group.c:279 sysfs_remove_group+0x155/0x1b0 fs/sysfs/group.c:270
Kernel panic - not syncing: panic_on_warn set ...
CPU: 1 PID: 31308 Comm: kworker/u4:10 Not tainted 5.6.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events_unbound ib_unregister_work
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x188/0x20d lib/dump_stack.c:118
 panic+0x2e3/0x75c kernel/panic.c:221
 __warn.cold+0x2f/0x35 kernel/panic.c:582
 report_bug+0x27b/0x2f0 lib/bug.c:195
 fixup_bug arch/x86/kernel/traps.c:175 [inline]
 fixup_bug arch/x86/kernel/traps.c:170 [inline]
 do_error_trap+0x12b/0x220 arch/x86/kernel/traps.c:267
 do_invalid_op+0x32/0x40 arch/x86/kernel/traps.c:286
 invalid_op+0x23/0x30 arch/x86/entry/entry_64.S:1027
RIP: 0010:sysfs_remove_group fs/sysfs/group.c:279 [inline]
RIP: 0010:sysfs_remove_group+0x155/0x1b0 fs/sysfs/group.c:270
Code: 48 89 d9 49 8b 14 24 48 b8 00 00 00 00 00 fc ff df 48 c1 e9 03 80 3c 01 00 75 41 48 8b 33 48 c7 c7 60 c3 39 88 e8 93 c3 5f ff <0f> 0b eb 95 e8 22 62 cb ff e9 d2 fe ff ff 48 89 df e8 15 62 cb ff
RSP: 0018:ffffc90001d97a60 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffffffff88915620 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff815ca861 RDI: fffff520003b2f3e
RBP: 0000000000000000 R08: ffff8880a78fc2c0 R09: ffffed1015ce66a1
R10: ffffed1015ce66a0 R11: ffff8880ae733507 R12: ffff88808e5ba070
R13: ffffffff88915bc0 R14: ffff88808e5ba008 R15: dffffc0000000000
 dpm_sysfs_remove+0x97/0xb0 drivers/base/power/sysfs.c:794
 device_del+0x18b/0xd30 drivers/base/core.c:2687
 cdev_device_del+0x15/0x80 fs/char_dev.c:570
 ib_umad_kill_port+0x45/0x250 drivers/infiniband/core/user_mad.c:1327
 ib_umad_remove_one+0x18a/0x220 drivers/infiniband/core/user_mad.c:1409
 remove_client_context+0xbe/0x110 drivers/infiniband/core/device.c:724
 disable_device+0x13b/0x230 drivers/infiniband/core/device.c:1270
 __ib_unregister_device+0x91/0x180 drivers/infiniband/core/device.c:1437
 ib_unregister_work+0x15/0x30 drivers/infiniband/core/device.c:1547
 process_one_work+0x965/0x16a0 kernel/workqueue.c:2266
 worker_thread+0x96/0xe20 kernel/workqueue.c:2412
 kthread+0x388/0x470 kernel/kthread.c:268
 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
Kernel Offset: disabled
Rebooting in 86400 seconds..


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
* Re: WARNING in ib_umad_kill_port
From: Leon Romanovsky @ 2020-04-06 17:21 UTC
To: syzbot, RDMA mailing list
Cc: gregkh, linux-kernel, netdev, rafael, syzkaller-bugs

+ RDMA

On Sun, Apr 05, 2020 at 11:37:15PM -0700, syzbot wrote:
> Hello,
>
> syzbot found the following crash on:
>
> [...]
>
> sysfs group 'power' not found for kobject 'umad1'
> WARNING: CPU: 1 PID: 31308 at fs/sysfs/group.c:279 sysfs_remove_group+0x155/0x1b0 fs/sysfs/group.c:270
> [...]
* Re: WARNING in ib_umad_kill_port
From: Jason Gunthorpe @ 2020-04-06 17:44 UTC
To: Leon Romanovsky
Cc: syzbot, RDMA mailing list, gregkh, linux-kernel, netdev, rafael, syzkaller-bugs

On Mon, Apr 06, 2020 at 08:21:51PM +0300, Leon Romanovsky wrote:
> + RDMA
>
> On Sun, Apr 05, 2020 at 11:37:15PM -0700, syzbot wrote:
> > Hello,
> >
> > syzbot found the following crash on:
> >
> > [...]
> >
> > sysfs group 'power' not found for kobject 'umad1'
> > WARNING: CPU: 1 PID: 31308 at fs/sysfs/group.c:279 sysfs_remove_group+0x155/0x1b0 fs/sysfs/group.c:270
> > [...]

I'm not sure what could be done wrong here to elicit this:

 sysfs group 'power' not found for kobject 'umad1'

??

I've seen another similar sysfs related trigger that we couldn't
figure out.

Hard to investigate without a reproducer.

Jason
* Re: WARNING in ib_umad_kill_port
From: Dmitry Vyukov @ 2020-04-07 9:56 UTC
To: Jason Gunthorpe
Cc: Leon Romanovsky, syzbot, RDMA mailing list, Greg Kroah-Hartman, LKML, netdev, Rafael Wysocki, syzkaller-bugs

On Mon, Apr 6, 2020 at 7:44 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Mon, Apr 06, 2020 at 08:21:51PM +0300, Leon Romanovsky wrote:
> > + RDMA
> >
> > On Sun, Apr 05, 2020 at 11:37:15PM -0700, syzbot wrote:
> > > Hello,
> > >
> > > syzbot found the following crash on:
> > >
> > > [...]
> > >
> > > sysfs group 'power' not found for kobject 'umad1'
> > > WARNING: CPU: 1 PID: 31308 at fs/sysfs/group.c:279 sysfs_remove_group+0x155/0x1b0 fs/sysfs/group.c:270
> > > [...]
>
> I'm not sure what could be done wrong here to elicit this:
>
>  sysfs group 'power' not found for kobject 'umad1'
>
> ??
>
> I've seen another similar sysfs related trigger that we couldn't
> figure out.
>
> Hard to investigate without a reproducer.
>
> Jason

Based on all of the sysfs-related bugs I've seen, my bet would be on
some races. E.g. one thread registers devices, while another
unregisters these.
* Re: WARNING in ib_umad_kill_port
From: Jason Gunthorpe @ 2020-04-07 11:55 UTC
To: Dmitry Vyukov
Cc: Leon Romanovsky, syzbot, RDMA mailing list, Greg Kroah-Hartman, LKML, netdev, Rafael Wysocki, syzkaller-bugs

On Tue, Apr 07, 2020 at 11:56:30AM +0200, Dmitry Vyukov wrote:
> > I'm not sure what could be done wrong here to elicit this:
> >
> >  sysfs group 'power' not found for kobject 'umad1'
> >
> > ??
> >
> > I've seen another similar sysfs related trigger that we couldn't
> > figure out.
> >
> > Hard to investigate without a reproducer.
>
> Based on all of the sysfs-related bugs I've seen, my bet would be on
> some races. E.g. one thread registers devices, while another
> unregisters these.

I did check that the naming is ordered right, at least we won't be
concurrently creating and destroying umadX sysfs of the same names.

I'm also fairly sure we can't be destroying the parent at the same
time as this child.

Do you see the above commonly? Could it be some driver core thing? Or
is it more likely something wrong in umad?

Jason
* Re: WARNING in ib_umad_kill_port
From: Dmitry Vyukov @ 2020-04-07 12:39 UTC
To: Jason Gunthorpe
Cc: Leon Romanovsky, syzbot, RDMA mailing list, Greg Kroah-Hartman, LKML, netdev, Rafael Wysocki, syzkaller-bugs

On Tue, Apr 7, 2020 at 1:55 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Tue, Apr 07, 2020 at 11:56:30AM +0200, Dmitry Vyukov wrote:
> > > I'm not sure what could be done wrong here to elicit this:
> > >
> > >  sysfs group 'power' not found for kobject 'umad1'
> > >
> > > ??
> > >
> > > I've seen another similar sysfs related trigger that we couldn't
> > > figure out.
> > >
> > > Hard to investigate without a reproducer.
> >
> > Based on all of the sysfs-related bugs I've seen, my bet would be on
> > some races. E.g. one thread registers devices, while another
> > unregisters these.
>
> I did check that the naming is ordered right, at least we won't be
> concurrently creating and destroying umadX sysfs of the same names.
>
> I'm also fairly sure we can't be destroying the parent at the same
> time as this child.
>
> Do you see the above commonly? Could it be some driver core thing? Or
> is it more likely something wrong in umad?

Mmmm... I can't say, I am looking at some bugs very briefly. I've
noticed that sysfs comes up periodically (or was it some other similar
fs?). General observation is that code frequently assumes only the
happy scenario and only, say, a single administrator doing one thing
at a time, slowly and carefully, and it is not really hardened against
armies of monkeys.
But I did not look at code abstractions, bug patterns, contracts, etc.

Greg KH may know better. Greg, as far as I remember you commented on
some of these reports along the lines of, for example, "the warning is
in sysfs code, but the bug is in the callers".
* Re: WARNING in ib_umad_kill_port
From: Greg Kroah-Hartman @ 2020-04-07 14:33 UTC
To: Dmitry Vyukov
Cc: Jason Gunthorpe, Leon Romanovsky, syzbot, RDMA mailing list, LKML, netdev, Rafael Wysocki, syzkaller-bugs

On Tue, Apr 07, 2020 at 02:39:42PM +0200, Dmitry Vyukov wrote:
> On Tue, Apr 7, 2020 at 1:55 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Tue, Apr 07, 2020 at 11:56:30AM +0200, Dmitry Vyukov wrote:
> > > > I'm not sure what could be done wrong here to elicit this:
> > > >
> > > >  sysfs group 'power' not found for kobject 'umad1'
> > > >
> > > > ??
> > > >
> > > > I've seen another similar sysfs related trigger that we couldn't
> > > > figure out.
> > > >
> > > > Hard to investigate without a reproducer.
> > >
> > > Based on all of the sysfs-related bugs I've seen, my bet would be on
> > > some races. E.g. one thread registers devices, while another
> > > unregisters these.
> >
> > I did check that the naming is ordered right, at least we won't be
> > concurrently creating and destroying umadX sysfs of the same names.
> >
> > I'm also fairly sure we can't be destroying the parent at the same
> > time as this child.
> >
> > Do you see the above commonly? Could it be some driver core thing? Or
> > is it more likely something wrong in umad?
>
> Mmmm... I can't say, I am looking at some bugs very briefly. I've
> noticed that sysfs comes up periodically (or was it some other similar
> fs?). General observation is that code frequently assumes only the
> happy scenario and only, say, a single administrator doing one thing
> at a time, slowly and carefully, and it is not really hardened against
> armies of monkeys.
> But I did not look at code abstractions, bug patterns, contracts, etc.
>
> Greg KH may know better. Greg, as far as I remember you commented on
> some of these reports along the lines of, for example, "the warning is
> in sysfs code, but the bug is in the callers".

Yes, that is correct.
* Re: WARNING in ib_umad_kill_port
From: Jason Gunthorpe @ 2020-04-07 14:35 UTC
To: Dmitry Vyukov
Cc: Leon Romanovsky, syzbot, RDMA mailing list, Greg Kroah-Hartman, LKML, netdev, Rafael Wysocki, syzkaller-bugs

On Tue, Apr 07, 2020 at 02:39:42PM +0200, Dmitry Vyukov wrote:
> On Tue, Apr 7, 2020 at 1:55 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Tue, Apr 07, 2020 at 11:56:30AM +0200, Dmitry Vyukov wrote:
> > > > I'm not sure what could be done wrong here to elicit this:
> > > >
> > > >  sysfs group 'power' not found for kobject 'umad1'
> > > >
> > > > ??
> > > >
> > > > I've seen another similar sysfs related trigger that we couldn't
> > > > figure out.
> > > >
> > > > Hard to investigate without a reproducer.
> > >
> > > Based on all of the sysfs-related bugs I've seen, my bet would be on
> > > some races. E.g. one thread registers devices, while another
> > > unregisters these.
> >
> > I did check that the naming is ordered right, at least we won't be
> > concurrently creating and destroying umadX sysfs of the same names.
> >
> > I'm also fairly sure we can't be destroying the parent at the same
> > time as this child.
> >
> > Do you see the above commonly? Could it be some driver core thing? Or
> > is it more likely something wrong in umad?
>
> Mmmm... I can't say, I am looking at some bugs very briefly. I've
> noticed that sysfs comes up periodically (or was it some other similar
> fs?).

Hmm..

Looking at the git history I see several cases where there are
ordering problems. I wonder if the rdma parent device is being
destroyed before the rdma devices complete destruction?

I see the syzkaller is creating a bunch of virtual net devices, and I
assume it has created a software rdma device on one of these virtual
devices.

So I'm guessing that it is also destroying a parent? But I can't guess
which.. Some simple tests with veth suggest it is OK because the
parent is virtual. But maybe bond or bridge or something?

The issue in rdma is that unregistering a netdev triggers an async
destruction of the RDMA devices. This has to be async because the
netdev notification is delivered with RTNL held, and a rdma device
cannot be destroyed while holding RTNL.

So there is a race, I suppose, where the netdev can complete
destruction while rdma continues, and if someone deletes the sysfs
holding the netdev before rdma completes, I'm going to guess, that we
hit this warning?

Could it be? I would love to know what netdev the rdma device was
created on, but it doesn't seem to show in the trace :\

This theory could be made more likely by adding a sleep to
ib_unregister_work() to increase the race window - is there some way
to get syzkaller to search for a reproducer with that patch?

Jason
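The ordering suspected above can be illustrated with a small user-space toy model. This is only a sketch under loud assumptions: `ToySysfs`, `run_scenario`, and the `netdev0/rdma0/umad1` layout are hypothetical names invented for illustration, not kernel APIs and not the real sysfs hierarchy. It shows only why deferred teardown that runs after the parent directory is gone would find the 'power' group missing; it does not confirm that this is what happened in the report.

```python
import warnings


class ToySysfs:
    """Toy stand-in for a sysfs tree: maps a directory path to its set of groups."""

    def __init__(self):
        self.dirs = {}

    def add_device(self, path, groups=("power",)):
        self.dirs[path] = set(groups)

    def remove_subtree(self, path):
        # Removing a parent directory takes every child directory with it.
        doomed = [p for p in self.dirs if p == path or p.startswith(path + "/")]
        for p in doomed:
            del self.dirs[p]

    def remove_group(self, path, group):
        # Message text copied from the report; warn when the group is already gone.
        name = path.rsplit("/", 1)[-1]
        if group not in self.dirs.get(path, set()):
            warnings.warn(f"sysfs group '{group}' not found for kobject '{name}'")
            return False
        self.dirs[path].discard(group)
        return True


def run_scenario(parent_removed_first):
    """Queue the RDMA teardown, optionally remove the parent first, then run the queue."""
    sysfs = ToySysfs()
    sysfs.add_device("netdev0")              # hypothetical parent netdev
    sysfs.add_device("netdev0/rdma0")        # hypothetical software rdma device
    sysfs.add_device("netdev0/rdma0/umad1")

    workqueue = []  # deferred work, standing in for the unbound workqueue

    # Netdev unregister notification: the RDMA teardown is only queued here,
    # because (per the thread) it cannot run while RTNL is held.
    workqueue.append(lambda: sysfs.remove_group("netdev0/rdma0/umad1", "power"))

    if parent_removed_first:
        # The netdev completes destruction before the queued work runs.
        sysfs.remove_subtree("netdev0")

    return [work() for work in workqueue]
```

In this model `run_scenario(False)` removes the group cleanly, while `run_scenario(True)` emits the same warning text as the report, because the queued work runs after the parent subtree has been removed.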
* Re: WARNING in ib_umad_kill_port
From: Dmitry Vyukov @ 2020-04-09 13:35 UTC
To: Jason Gunthorpe
Cc: Leon Romanovsky, syzbot, RDMA mailing list, Greg Kroah-Hartman, LKML, netdev, Rafael Wysocki, syzkaller-bugs

On Tue, Apr 7, 2020 at 4:35 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Tue, Apr 07, 2020 at 02:39:42PM +0200, Dmitry Vyukov wrote:
> > On Tue, Apr 7, 2020 at 1:55 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > >
> > > On Tue, Apr 07, 2020 at 11:56:30AM +0200, Dmitry Vyukov wrote:
> > > > > I'm not sure what could be done wrong here to elicit this:
> > > > >
> > > > >  sysfs group 'power' not found for kobject 'umad1'
> > > > >
> > > > > ??
> > > > >
> > > > > I've seen another similar sysfs related trigger that we couldn't
> > > > > figure out.
> > > > >
> > > > > Hard to investigate without a reproducer.
> > > >
> > > > Based on all of the sysfs-related bugs I've seen, my bet would be on
> > > > some races. E.g. one thread registers devices, while another
> > > > unregisters these.
> > >
> > > I did check that the naming is ordered right, at least we won't be
> > > concurrently creating and destroying umadX sysfs of the same names.
> > >
> > > I'm also fairly sure we can't be destroying the parent at the same
> > > time as this child.
> > >
> > > Do you see the above commonly? Could it be some driver core thing? Or
> > > is it more likely something wrong in umad?
> >
> > Mmmm... I can't say, I am looking at some bugs very briefly. I've
> > noticed that sysfs comes up periodically (or was it some other similar
> > fs?).
>
> Hmm..
>
> Looking at the git history I see several cases where there are
> ordering problems. I wonder if the rdma parent device is being
> destroyed before the rdma devices complete destruction?
>
> I see the syzkaller is creating a bunch of virtual net devices, and I
> assume it has created a software rdma device on one of these virtual
> devices.
>
> So I'm guessing that it is also destroying a parent? But I can't guess
> which.. Some simple tests with veth suggest it is OK because the
> parent is virtual. But maybe bond or bridge or something?
>
> The issue in rdma is that unregistering a netdev triggers an async
> destruction of the RDMA devices. This has to be async because the
> netdev notification is delivered with RTNL held, and a rdma device
> cannot be destroyed while holding RTNL.
>
> So there is a race, I suppose, where the netdev can complete
> destruction while rdma continues, and if someone deletes the sysfs
> holding the netdev before rdma completes, I'm going to guess, that we
> hit this warning?
>
> Could it be? I would love to know what netdev the rdma device was
> created on, but it doesn't seem to show in the trace :\
>
> This theory could be made more likely by adding a sleep to
> ib_unregister_work() to increase the race window - is there some way
> to get syzkaller to search for a reproducer with that patch?

Too bad it happened in kthread context. Otherwise it's usually possible
to pinpoint the test based on the process name.

The syz-repro utility will run the reproduction process with any kernel
you give it:
https://github.com/google/syzkaller/blob/master/docs/reproducing_crashes.md
Or it's possible to run individual programs, or the whole log, with the
syz-execprog utility:
https://github.com/google/syzkaller/blob/master/docs/executing_syzkaller_programs.md

Or maybe you could pinpoint the guilty test program by hand in the log
(it's probably somewhere close to the end):
https://syzkaller.appspot.com/x/log.txt?x=119dd16de00000
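Pinpointing candidate programs by hand, as suggested above, can be partly scripted. The following is a rough sketch, not a syzkaller tool: `last_programs` is a hypothetical helper, and it assumes the console log marks each test with a line containing `executing program N:` followed by the program text up to the next blank line. Kernel messages interleaved with program text are not filtered out, so the result still needs a manual look.

```python
import re

# Assumed log marker; adjust if the log uses a different format.
EXEC_RE = re.compile(r"executing program (\d+):")


def last_programs(log_text, crash_marker, count=3):
    """Return the last `count` programs that started before `crash_marker`."""
    # Only look at the part of the log that precedes the crash message.
    head = log_text.split(crash_marker, 1)[0]

    programs = []
    current = None
    for line in head.splitlines():
        match = EXEC_RE.search(line)
        if match:
            # A new program starts; remember which proc was executing it.
            current = {"proc": int(match.group(1)), "lines": []}
            programs.append(current)
        elif current is not None:
            if line.strip():
                current["lines"].append(line)
            else:
                # A blank line ends the current program text.
                current = None
    return programs[-count:]
```

Calling it with the downloaded log text and a marker such as `"sysfs group 'power' not found"` gives a short list of programs near the end of the log, which could then be tried individually with syz-execprog.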