* mm: GPF in bdi_put
From: Dmitry Vyukov @ 2017-02-27 17:11 UTC
To: Al Viro, linux-fsdevel@vger.kernel.org, LKML, Jens Axboe, Andrew Morton, Tejun Heo, Jan Kara, Johannes Weiner, linux-mm@kvack.org, Andrey Ryabinin
Cc: syzkaller

Hello,

The following program triggers GPF in bdi_put:
https://gist.githubusercontent.com/dvyukov/15b3e211f937ff6abc558724369066ce/raw/cc017edf57963e30175a6a6fe2b8d917f6e92899/gistfile1.txt

general protection fault: 0000 [#1] SMP KASAN
Modules linked in:
CPU: 0 PID: 2952 Comm: a.out Not tainted 4.10.0+ #229
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: ffff880063e72180 task.stack: ffff880064a78000
RIP: 0010:__read_once_size include/linux/compiler.h:247 [inline]
RIP: 0010:atomic_read arch/x86/include/asm/atomic.h:26 [inline]
RIP: 0010:refcount_sub_and_test include/linux/refcount.h:156 [inline]
RIP: 0010:refcount_dec_and_test include/linux/refcount.h:181 [inline]
RIP: 0010:kref_put include/linux/kref.h:71 [inline]
RIP: 0010:bdi_put+0x8b/0x1d0 mm/backing-dev.c:914
RSP: 0018:ffff880064a7f0b0 EFLAGS: 00010202
RAX: 0000000000000007 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffff880064a7f118 RSI: 0000000000000001 RDI: 0000000000000000
RBP: ffff880064a7f140 R08: ffff880065603280 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000001 R12: dffffc0000000000
R13: 0000000000000038 R14: 1ffff1000c94fe17 R15: ffff880064a7f218
FS: 0000000000eb5880(0000) GS:ffff88006d000000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020914ffa CR3: 000000006bc37000 CR4: 00000000001426f0
Call Trace:
 bdev_evict_inode+0x203/0x3a0 fs/block_dev.c:888
 evict+0x46e/0x980 fs/inode.c:553
 iput_final fs/inode.c:1515 [inline]
 iput+0x589/0xb20 fs/inode.c:1542
 dentry_unlink_inode+0x43b/0x600 fs/dcache.c:343
 __dentry_kill+0x34d/0x740 fs/dcache.c:538
 dentry_kill fs/dcache.c:579 [inline]
 dput.part.27+0x5ce/0x7c0 fs/dcache.c:791
 dput fs/dcache.c:753 [inline]
 do_one_tree+0x43/0x50 fs/dcache.c:1454
 shrink_dcache_for_umount+0xbb/0x2b0 fs/dcache.c:1468
 generic_shutdown_super+0xcd/0x4c0 fs/super.c:421
 kill_anon_super+0x3c/0x50 fs/super.c:988
 deactivate_locked_super+0x88/0xd0 fs/super.c:309
 deactivate_super+0x155/0x1b0 fs/super.c:340
 cleanup_mnt+0xb2/0x160 fs/namespace.c:1112
 __cleanup_mnt+0x16/0x20 fs/namespace.c:1119
 task_work_run+0x18a/0x260 kernel/task_work.c:116
 tracehook_notify_resume include/linux/tracehook.h:191 [inline]
 exit_to_usermode_loop+0x23b/0x2a0 arch/x86/entry/common.c:160
 prepare_exit_to_usermode arch/x86/entry/common.c:190 [inline]
 syscall_return_slowpath+0x4d3/0x570 arch/x86/entry/common.c:259
 entry_SYSCALL_64_fastpath+0xc0/0xc2
RIP: 0033:0x435e19
RSP: 002b:00007ffc9d7f2748 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffea RBX: 0100000000000000 RCX: 0000000000435e19
RDX: 0000000020063000 RSI: 0000000020914ffa RDI: 0000000020037000
RBP: 00007ffc9d7f2fe0 R08: 0000000020039000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000402b70 R14: 0000000000402c00 R15: 0000000000000000
Code: 04 f2 f2 f2 c7 40 08 f3 f3 f3 f3 e8 f0 ec de ff 48 8d 45 98 48
8b 95 70 ff ff ff 48 c1 e8 03 42 c6 04 20 04 4c 89 e8 48 c1 e8 03 <42>
0f b6 0c 20 4c 89 e8 83 e0 07 83 c0 03 38 c8 7c 08 84 c9 0f
RIP: __read_once_size include/linux/compiler.h:247 [inline] RSP: ffff880064a7f0b0
RIP: atomic_read arch/x86/include/asm/atomic.h:26 [inline] RSP: ffff880064a7f0b0
RIP: refcount_sub_and_test include/linux/refcount.h:156 [inline] RSP: ffff880064a7f0b0
RIP: refcount_dec_and_test include/linux/refcount.h:181 [inline] RSP: ffff880064a7f0b0
RIP: kref_put include/linux/kref.h:71 [inline] RSP: ffff880064a7f0b0
RIP: bdi_put+0x8b/0x1d0 mm/backing-dev.c:914 RSP: ffff880064a7f0b0
---[ end trace 8991b3d16ac9bf93 ]---

On commit e5d56efc97f8240d0b5d66c03949382b6d7e5570.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: mm: GPF in bdi_put
From: Dmitry Vyukov @ 2017-02-27 17:14 UTC
To: Al Viro, linux-fsdevel@vger.kernel.org, LKML, Jens Axboe, Andrew Morton, Tejun Heo, Jan Kara, Johannes Weiner, linux-mm@kvack.org, Andrey Ryabinin
Cc: syzkaller

On Mon, Feb 27, 2017 at 6:11 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
> Hello,
>
> The following program triggers GPF in bdi_put:
> https://gist.githubusercontent.com/dvyukov/15b3e211f937ff6abc558724369066ce/raw/cc017edf57963e30175a6a6fe2b8d917f6e92899/gistfile1.txt
>
> general protection fault: 0000 [#1] SMP KASAN
> [...]
> ---[ end trace 8991b3d16ac9bf93 ]---
>
> On commit e5d56efc97f8240d0b5d66c03949382b6d7e5570.

I also see the following WARNING. Do you think it's the same underlying bug?

------------[ cut here ]------------
WARNING: CPU: 1 PID: 24265 at mm/backing-dev.c:899 bdi_exit+0x13e/0x160 mm/backing-dev.c:899
Kernel panic - not syncing: panic_on_warn set ...
CPU: 1 PID: 24265 Comm: syz-executor3 Not tainted 4.10.0-next-20170227+ #1
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:15 [inline]
 dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
 panic+0x1fb/0x412 kernel/panic.c:179
 __warn+0x1c4/0x1e0 kernel/panic.c:540
 warn_slowpath_null+0x2c/0x40 kernel/panic.c:583
 bdi_exit+0x13e/0x160 mm/backing-dev.c:899
 release_bdi+0x19/0x30 mm/backing-dev.c:908
 kref_put include/linux/kref.h:72 [inline]
 bdi_put+0x2a/0x40 mm/backing-dev.c:914
 bdev_evict_inode+0x203/0x3a0 fs/block_dev.c:888
 evict+0x46e/0x980 fs/inode.c:553
 iput_final fs/inode.c:1515 [inline]
 iput+0x589/0xb20 fs/inode.c:1542
 dentry_unlink_inode+0x43b/0x600 fs/dcache.c:343
 __dentry_kill+0x34d/0x740 fs/dcache.c:538
 dentry_kill fs/dcache.c:579 [inline]
 dput.part.27+0x5ce/0x7c0 fs/dcache.c:791
 dput fs/dcache.c:753 [inline]
 do_one_tree+0x43/0x50 fs/dcache.c:1454
 shrink_dcache_for_umount+0xbb/0x2b0 fs/dcache.c:1468
 generic_shutdown_super+0xcd/0x4c0 fs/super.c:421
 kill_anon_super+0x3c/0x50 fs/super.c:988
 deactivate_locked_super+0x88/0xd0 fs/super.c:309
 deactivate_super+0x155/0x1b0 fs/super.c:340
 cleanup_mnt+0xb2/0x160 fs/namespace.c:1112
 __cleanup_mnt+0x16/0x20 fs/namespace.c:1119
 task_work_run+0x18a/0x260 kernel/task_work.c:116
 tracehook_notify_resume include/linux/tracehook.h:191 [inline]
 exit_to_usermode_loop+0x23b/0x2a0 arch/x86/entry/common.c:160
 prepare_exit_to_usermode arch/x86/entry/common.c:190 [inline]
 syscall_return_slowpath+0x4d3/0x570 arch/x86/entry/common.c:259
 entry_SYSCALL_64_fastpath+0xc0/0xc2
RIP: 0033:0x44fb79
RSP: 002b:00007fd57a8a0b58 EFLAGS: 00000212 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffea RBX: 0000000000708000 RCX: 000000000044fb79
RDX: 00000000208cf000 RSI: 0000000020058ffd RDI: 0000000020fc2000
RBP: 00000000000002f7 R08: 0000000020691000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000212 R12: 0000000020fc2000
R13: 0000000020058ffd R14: 00000000208cf000 R15: 0000000000000000
* Re: mm: GPF in bdi_put
From: Al Viro @ 2017-02-27 18:27 UTC
To: Dmitry Vyukov
Cc: linux-fsdevel@vger.kernel.org, LKML, Jens Axboe, Andrew Morton, Tejun Heo, Jan Kara, Johannes Weiner, linux-mm@kvack.org, Andrey Ryabinin, syzkaller

On Mon, Feb 27, 2017 at 06:11:11PM +0100, Dmitry Vyukov wrote:
> Hello,
>
> The following program triggers GPF in bdi_put:
> https://gist.githubusercontent.com/dvyukov/15b3e211f937ff6abc558724369066ce/raw/cc017edf57963e30175a6a6fe2b8d917f6e92899/gistfile1.txt

What happens is
  * attempt of, essentially, mount -t bdev ..., calls mount_pseudo()
    and then promptly destroys the new instance it has created.
  * the only inode created on that sucker (root directory, that is)
    gets evicted.
  * most of ->evict_inode() is harmless, until it gets to

	if (bdev->bd_bdi != &noop_backing_dev_info)
		bdi_put(bdev->bd_bdi);

added there by "block: Make blk_get_backing_dev_info() safe without open bdev".
Since ->bd_bdi hadn't been initialized for that sucker (the same patch has
placed initialization into bdget()), we step into shit of varying nastiness,
depending on phase of moon, etc.

Could somebody explain WTF do we have those two lines in bdev_evict_inode(),
anyway? We set ->bd_bdi to something other than noop_backing_dev_info only
in __blkdev_get() when ->bd_openers goes from zero to positive, so why is
the matching bdi_put() not in __blkdev_put()? Jan?
* Re: mm: GPF in bdi_put
From: Dmitry Vyukov @ 2017-02-28 17:55 UTC
To: Al Viro
Cc: linux-fsdevel@vger.kernel.org, LKML, Jens Axboe, Andrew Morton, Tejun Heo, Jan Kara, Johannes Weiner, linux-mm@kvack.org, Andrey Ryabinin, syzkaller

On Mon, Feb 27, 2017 at 7:27 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Mon, Feb 27, 2017 at 06:11:11PM +0100, Dmitry Vyukov wrote:
> [...]

I am also seeing the following crashes on
linux-next/8d01c069486aca75b8f6018a759215b0ed0c91f0. Do you think it's
the same underlying issue?

kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 0 PID: 19552 Comm: syz-executor2 Not tainted 4.10.0-next-20170228+ #2
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
task: ffff8801c16ae400 task.stack: ffff880154c98000
RIP: 0010:__read_once_size include/linux/compiler.h:254 [inline]
RIP: 0010:atomic_read arch/x86/include/asm/atomic.h:26 [inline]
RIP: 0010:refcount_sub_and_test+0x82/0x1f0 lib/refcount.c:120
RSP: 0018:ffff880154c9f078 EFLAGS: 00010202
RAX: 0000000000000007 RBX: dffffc0000000000 RCX: ffffc90001a8f000
RDX: 0000000000000740 RSI: ffffffff8246160f RDI: 0000000000000001
RBP: ffff880154c9f110 R08: ffffe8ffffc29a28 R09: 0000000000000001
R10: 1ffff1002a993dcc R11: 0000000000000001 R12: 0000000000000038
R13: 0000000000000001 R14: ffff880154c9f0e8 R15: 1ffff1002a993e11
FS: 00007f0335223700(0000) GS:ffff8801dbe00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020fd3ff8 CR3: 00000001c4580000 CR4: 00000000001406f0
DR0: 0000000020000000 DR1: 0000000020001000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
Call Trace:
 refcount_dec_and_test+0x1a/0x20 lib/refcount.c:153
 kref_put include/linux/kref.h:71 [inline]
 bdi_put+0x19/0x40 mm/backing-dev.c:914
 bdev_evict_inode+0x203/0x3a0 fs/block_dev.c:888
 evict+0x46e/0x980 fs/inode.c:553
 iput_final fs/inode.c:1515 [inline]
 iput+0x589/0xb20 fs/inode.c:1542
 dentry_unlink_inode+0x43b/0x600 fs/dcache.c:343
 __dentry_kill+0x34d/0x740 fs/dcache.c:538
 dentry_kill fs/dcache.c:579 [inline]
 dput.part.27+0x5ce/0x7c0 fs/dcache.c:791
 dput fs/dcache.c:753 [inline]
 do_one_tree+0x43/0x50 fs/dcache.c:1454
 shrink_dcache_for_umount+0xbb/0x2b0 fs/dcache.c:1468
 generic_shutdown_super+0xcd/0x4c0 fs/super.c:421
 kill_anon_super+0x3c/0x50 fs/super.c:988
 deactivate_locked_super+0x88/0xd0 fs/super.c:309
 deactivate_super+0x155/0x1b0 fs/super.c:340
 cleanup_mnt+0xb2/0x160 fs/namespace.c:1112
 __cleanup_mnt+0x16/0x20 fs/namespace.c:1119
 task_work_run+0x18a/0x260 kernel/task_work.c:116
 tracehook_notify_resume include/linux/tracehook.h:191 [inline]
 exit_to_usermode_loop+0x23b/0x2a0 arch/x86/entry/common.c:160
 prepare_exit_to_usermode arch/x86/entry/common.c:190 [inline]
 syscall_return_slowpath+0x4d3/0x570 arch/x86/entry/common.c:259
 entry_SYSCALL_64_fastpath+0xc0/0xc2
RIP: 0033:0x44fb79
RSP: 002b:00007f0335222b58 EFLAGS: 00000212 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffea RBX: 0000000000708150 RCX: 000000000044fb79
RDX: 000000002064e000 RSI: 00000000208f8ff8 RDI: 0000000020b28ff8
RBP: 00000000000002f7 R08: 0000000000000000 R09: 0000000000000000
R10: 8000000000000001 R11: 0000000000000212 R12: 0000000020b28ff8
R13: 00000000208f8ff8 R14: 000000002064e000 R15: 0000000000000000
Code: 00 f1 f1 f1 f1 c7 40 04 04 f2 f2 f2 c7 40 08 f3 f3 f3 f3 e8
71 02 2d ff 48 8d 45 98 48 c1 e8 03 c6 04 18 04 4c 89 e0 48
c1 e8 03 <0f> b6 14 18 4c 89 e0 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85
RIP: __read_once_size include/linux/compiler.h:254 [inline] RSP: ffff880154c9f078
RIP: atomic_read arch/x86/include/asm/atomic.h:26 [inline] RSP: ffff880154c9f078
RIP: refcount_sub_and_test+0x82/0x1f0 lib/refcount.c:120 RSP: ffff880154c9f078
---[ end trace 3457479bd0ed5045 ]---
* Re: mm: GPF in bdi_put
From: Al Viro @ 2017-02-28 18:23 UTC
To: Dmitry Vyukov
Cc: linux-fsdevel@vger.kernel.org, LKML, Jens Axboe, Andrew Morton, Tejun Heo, Jan Kara, Johannes Weiner, linux-mm@kvack.org, Andrey Ryabinin, syzkaller

On Tue, Feb 28, 2017 at 06:55:55PM +0100, Dmitry Vyukov wrote:
> I am also seeing the following crashes on
> linux-next/8d01c069486aca75b8f6018a759215b0ed0c91f0. Do you think it's
> the same underlying issue?

Yes.
1) Any attempt of mount -t bdev will fail, as it should
2) bdevfs instance created by that attempt will be immediately destroyed
   (again, as it should)
3) the sole inode ever created for that instance (its root directory) will
   be destroyed in process (again, as it should)
4) that inode has never had ->bd_bdi initialized - the value stored there
   would have been whatever garbage kmem_cache_alloc() has left behind
5) bdev_evict_inode() will be called for that inode and if aforementioned
   garbage happens to be not equal to &noop_backing_dev_info, the pointer
   will be passed to bdi_put().

If that inode happens to reuse the memory previously occupied by a bdev
inode of a looked up but never opened block device, it will have ->bd_bdi
still equal to &noop_backing_dev_info, so that crap does not trigger every
time. That's what the junk (recvmsg/ioctl/etc.) in your reproducer is
affecting. Specific effects of bdi_put() will, of course, depend upon the
actual garbage found there - silent decrement of refcount of an existing
bdi setting the things up for later use-after-free, outright memory
corruption, etc.

_Any_ stack trace of form sys_mount() -> ... -> bdev_evict_inode() ->
bdi_put() -> <barf> is almost certainly the same bug.

I would still like to hear from Jan regarding the reasons why we do that
bdi_put() from bdev_evict_inode() and not in __blkdev_put(). My preference
would be to do it there (and reset ->bd_bdi to &noop_backing_dev_info) when
->bd_openers hits 0. And drop that code from bdev_evict_inode()...
Objections?
* Re: mm: GPF in bdi_put
From: Jan Kara @ 2017-03-01 14:29 UTC
To: Al Viro
Cc: Dmitry Vyukov, linux-fsdevel@vger.kernel.org, LKML, Jens Axboe, Andrew Morton, Tejun Heo, Jan Kara, Johannes Weiner, linux-mm@kvack.org, Andrey Ryabinin, syzkaller

On Mon 27-02-17 18:27:55, Al Viro wrote:
> On Mon, Feb 27, 2017 at 06:11:11PM +0100, Dmitry Vyukov wrote:
> > Hello,
> >
> > The following program triggers GPF in bdi_put:
> > https://gist.githubusercontent.com/dvyukov/15b3e211f937ff6abc558724369066ce/raw/cc017edf57963e30175a6a6fe2b8d917f6e92899/gistfile1.txt
>
> What happens is
>   * attempt of, essentially, mount -t bdev ..., calls mount_pseudo()
>     and then promptly destroys the new instance it has created.
>   * the only inode created on that sucker (root directory, that is)
>     gets evicted.
>   * most of ->evict_inode() is harmless, until it gets to
>
> 	if (bdev->bd_bdi != &noop_backing_dev_info)
> 		bdi_put(bdev->bd_bdi);

Thanks for the analysis!

> added there by "block: Make blk_get_backing_dev_info() safe without open bdev".
> Since ->bd_bdi hadn't been initialized for that sucker (the same patch has
> placed initialization into bdget()), we step into shit of varying nastiness,
> depending on phase of moon, etc.

Yup, I've missed that the root inode of bdev superblock does not go through
bdget() (in fact I didn't think what happens with root inode for bdev
superblock at all) and thus bd_bdi is left uninitialized in that case. I'll
send a fix for that in a while.

> Could somebody explain WTF do we have those two lines in bdev_evict_inode(),
> anyway? We set ->bd_bdi to something other than noop_backing_dev_info only
> in __blkdev_get() when ->bd_openers goes from zero to positive, so why is
> the matching bdi_put() not in __blkdev_put()? Jan?

The problem is writeback code (from flusher work or through sync(2) -
generally inode_to_bdi() users) can be looking at bdev inode independently
from it being open. So if they start looking while the bdev is open but the
dereference happens after it is closed and device removed, we oops. We have
seen oopses due to this for quite a while. And all the stuff that is done
in __blkdev_put() is not enough to prevent writeback code from having a
look whether there is not something to write.

So what we do now is that once we establish valid bd_bdi reference, we
leave it alone until bdev inode gets evicted. And to handle the case when
underlying device actually changes, we unhash bdev inode when the device
gets removed from the system so that it cannot be found by bdget() anymore.

Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* Re: mm: GPF in bdi_put
From: Jan Kara @ 2017-03-01 15:05 UTC
To: Al Viro
Cc: Dmitry Vyukov, linux-fsdevel@vger.kernel.org, LKML, Jens Axboe, Andrew Morton, Tejun Heo, Jan Kara, Johannes Weiner, linux-mm@kvack.org, Andrey Ryabinin, syzkaller

On Wed 01-03-17 15:29:09, Jan Kara wrote:
> On Mon 27-02-17 18:27:55, Al Viro wrote:
> [...]

Attached patch fixes the problem for me. I'll post it officially tomorrow
once Al has a chance to reply...

Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR

[-- Attachment #2: 0001-block-Initialize-bd_bdi-on-inode-initialization.patch --]
[-- Type: text/x-patch, Size: 2034 bytes --]

From a533c8dd1fb4dbf840cd3adaf68afb6ad6851ddc Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Wed, 1 Mar 2017 15:31:11 +0100
Subject: [PATCH] block: Initialize bd_bdi on inode initialization

So far we initialized bd_bdi only in bdget(). That is fine for normal
bdev inodes however for the special case of the root inode of
blockdev_superblock that function is never called and thus bd_bdi is
left uninitialized. As a result bdev_evict_inode() may oops doing
bdi_put(root->bd_bdi) on that inode as can be seen when doing:

mount -t bdev none /mnt

Fix the problem by initializing bd_bdi when first allocating the inode
and then reinitializing bd_bdi in bdev_evict_inode().

Thanks to syzkaller team for finding the problem.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Fixes: b1d2dc5659b41741f5a29b2ade76ffb4e5bb13d8
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/block_dev.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 77c30f15a02c..2eca00ec4370 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -870,6 +870,7 @@ static void init_once(void *foo)
 #ifdef CONFIG_SYSFS
 	INIT_LIST_HEAD(&bdev->bd_holder_disks);
 #endif
+	bdev->bd_bdi = &noop_backing_dev_info;
 	inode_init_once(&ei->vfs_inode);
 	/* Initialize mutex for freeze. */
 	mutex_init(&bdev->bd_fsfreeze_mutex);
@@ -884,8 +885,10 @@ static void bdev_evict_inode(struct inode *inode)
 	spin_lock(&bdev_lock);
 	list_del_init(&bdev->bd_list);
 	spin_unlock(&bdev_lock);
-	if (bdev->bd_bdi != &noop_backing_dev_info)
+	if (bdev->bd_bdi != &noop_backing_dev_info) {
 		bdi_put(bdev->bd_bdi);
+		bdev->bd_bdi = &noop_backing_dev_info;
+	}
 }
 
 static const struct super_operations bdev_sops = {
@@ -988,7 +991,6 @@ struct block_device *bdget(dev_t dev)
 		bdev->bd_contains = NULL;
 		bdev->bd_super = NULL;
 		bdev->bd_inode = inode;
-		bdev->bd_bdi = &noop_backing_dev_info;
 		bdev->bd_block_size = i_blocksize(inode);
 		bdev->bd_part_count = 0;
 		bdev->bd_invalidated = 0;
-- 
2.10.2
* Re: mm: GPF in bdi_put
From: Al Viro @ 2017-03-02 11:44 UTC
To: Jan Kara
Cc: Dmitry Vyukov, linux-fsdevel@vger.kernel.org, LKML, Jens Axboe, Andrew Morton, Tejun Heo, Johannes Weiner, linux-mm@kvack.org, Andrey Ryabinin, syzkaller

On Wed, Mar 01, 2017 at 03:29:09PM +0100, Jan Kara wrote:

> The problem is writeback code (from flusher work or through sync(2) -
> generally inode_to_bdi() users) can be looking at bdev inode independently
> from it being open. So if they start looking while the bdev is open but the
> dereference happens after it is closed and device removed, we oops. We have
> seen oopses due to this for quite a while. And all the stuff that is done
> in __blkdev_put() is not enough to prevent writeback code from having a
> look whether there is not something to write.

Um. What's to prevent the queue/device/module itself from disappearing
from under you? IOW, what are you doing that is safe to do in face of
driver going rmmoded?
* Re: mm: GPF in bdi_put
From: Jan Kara @ 2017-03-02 12:20 UTC
To: Al Viro
Cc: Jan Kara, Dmitry Vyukov, linux-fsdevel@vger.kernel.org, LKML, Jens Axboe, Andrew Morton, Tejun Heo, Johannes Weiner, linux-mm@kvack.org, Andrey Ryabinin, syzkaller

On Thu 02-03-17 11:44:53, Al Viro wrote:
> On Wed, Mar 01, 2017 at 03:29:09PM +0100, Jan Kara wrote:
>
> > The problem is writeback code (from flusher work or through sync(2) -
> > generally inode_to_bdi() users) can be looking at bdev inode independently
> > from it being open. So if they start looking while the bdev is open but the
> > dereference happens after it is closed and device removed, we oops. We have
> > seen oopses due to this for quite a while. And all the stuff that is done
> > in __blkdev_put() is not enough to prevent writeback code from having a
> > look whether there is not something to write.
>
> Um. What's to prevent the queue/device/module itself from disappearing
> from under you? IOW, what are you doing that is safe to do in face of
> driver going rmmoded?

So BDI does not have direct relation to the device itself. It is an
abstraction for some of the device properties / functionality and thus it
can live even after the device itself went away and the module got removed.
The only thing users of bdi want is to tell them whether the device is
congested or various statistics and dirty inode tracking for writeback
purposes and that is all independent of the particular device or whether
it still exists.

Technically there may be pointers bdi->dev, bdi->owner to the device which
are properly refcounted (so the device structure or module cannot be
removed under us). These references get dropped & cleared in
bdi_unregister() generally called from blk_cleanup_queue() (will be moved
to del_gendisk() soon) when the device is going away. This can happen while
e.g. bdev still references the bdi so users of bdi->dev or bdi->owner have
to be careful to synchronize against device removal and bdi_unregister()
but there are only very few such users.

Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR