* Re: [RFC PATCH] mm: truncate: flush lru cache for evicted inode [not found] ` <20240614235953.809-1-hdanton@sina.com> @ 2024-06-15 20:44 ` Matthew Wilcox 2024-06-15 23:52 ` Hillf Danton 2024-06-16 2:39 ` [RFC PATCH] mm: truncate: flush lru cache for evicted inode Hillf Danton 0 siblings, 2 replies; 11+ messages in thread From: Matthew Wilcox @ 2024-06-15 20:44 UTC (permalink / raw) To: Hillf Danton Cc: linux-mm, Jan Kara, linux-kernel, syzbot+d79afb004be235636ee8, linux-fsdevel, linux-nilfs, Ryusuke Konishi On Sat, Jun 15, 2024 at 07:59:53AM +0800, Hillf Danton wrote: > On Fri, 14 Jun 2024 14:42:20 +0100 Matthew Wilcox wrote: > > On Fri, Jun 14, 2024 at 09:18:56PM +0800, Hillf Danton wrote: > > > Flush lru cache to avoid folio->mapping uaf in case of inode teardown. > > > > What? inodes are supposed to have all their folios removed before > > being freed. Part of removing a folio sets the folio->mapping to NULL. > > Where is the report? > > > Subject: Re: [syzbot] [nilfs?] [mm?] KASAN: slab-use-after-free Read in lru_add_fn > https://lore.kernel.org/lkml/000000000000cae276061aa12d5e@google.com/ Thanks. This fix is wrong. Of course syzbot says it fixes the problem, but you're just avoiding putting the folios into the situation where we have debug that would detect the problem. I suspect this would trigger: +++ b/fs/inode.c @@ -282,6 +282,7 @@ static struct inode *alloc_inode(struct super_block *sb) void __destroy_inode(struct inode *inode) { BUG_ON(inode_has_buffers(inode)); + BUG_ON(inode->i_data.nrpages); inode_detach_wb(inode); security_inode_free(inode); fsnotify_inode_delete(inode); and what a real fix would look like would be calling clear_inode() before calling iput() in nilfs_put_root(). But I'm not an expert in this layer of the VFS, so I might well be wrong. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH] mm: truncate: flush lru cache for evicted inode 2024-06-15 20:44 ` [RFC PATCH] mm: truncate: flush lru cache for evicted inode Matthew Wilcox @ 2024-06-15 23:52 ` Hillf Danton 2024-06-16 0:10 ` [syzbot] [nilfs?] [mm?] KASAN: slab-use-after-free Read in lru_add_fn syzbot 2024-06-16 2:39 ` [RFC PATCH] mm: truncate: flush lru cache for evicted inode Hillf Danton 1 sibling, 1 reply; 11+ messages in thread From: Hillf Danton @ 2024-06-15 23:52 UTC (permalink / raw) To: Matthew Wilcox Cc: linux-mm, Jan Kara, linux-kernel, syzbot+d79afb004be235636ee8, linux-fsdevel, linux-nilfs, Ryusuke Konishi On Sat, 15 Jun 2024 21:44:54 +0100 Matthew Wilcox wrote: > On Sat, Jun 15, 2024 at 07:59:53AM +0800, Hillf Danton wrote: > > On Fri, 14 Jun 2024 14:42:20 +0100 Matthew Wilcox wrote: > > > On Fri, Jun 14, 2024 at 09:18:56PM +0800, Hillf Danton wrote: > > > > Flush lru cache to avoid folio->mapping uaf in case of inode teardown. > > > > > > What? inodes are supposed to have all their folios removed before > > > being freed. Part of removing a folio sets the folio->mapping to NULL. > > > Where is the report? > > > > > Subject: Re: [syzbot] [nilfs?] [mm?] KASAN: slab-use-after-free Read in lru_add_fn > > https://lore.kernel.org/lkml/000000000000cae276061aa12d5e@google.com/ > > Thanks. This fix is wrong. Of course syzbot says it fixes the problem, > but you're just avoiding putting the folios into the situation where we > have debug that would detect the problem. > > I suspect this would trigger: > Happy to test your idea. > +++ b/fs/inode.c > @@ -282,6 +282,7 @@ static struct inode *alloc_inode(struct super_block *sb) > void __destroy_inode(struct inode *inode) > { > BUG_ON(inode_has_buffers(inode)); > + BUG_ON(inode->i_data.nrpages); > inode_detach_wb(inode); > security_inode_free(inode); > fsnotify_inode_delete(inode); > > and what a real fix would look like would be calling clear_inode() > before calling iput() in nilfs_put_root(). But I'm not an expert Hm...given I_FREEING checked in clear_inode(), fix like this one could be tried in midle 2026. > in this layer of the VFS, so I might well be wrong. #syz test https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 83a7eefedc9b --- x/mm/truncate.c +++ y/mm/truncate.c @@ -419,6 +419,9 @@ void truncate_inode_pages_range(struct a truncate_folio_batch_exceptionals(mapping, &fbatch, indices); folio_batch_release(&fbatch); } + + if (mapping_exiting(mapping)) + lru_add_drain_all(); } EXPORT_SYMBOL(truncate_inode_pages_range); --- x/fs/inode.c +++ y/fs/inode.c @@ -282,6 +282,7 @@ static struct inode *alloc_inode(struct void __destroy_inode(struct inode *inode) { BUG_ON(inode_has_buffers(inode)); + BUG_ON(inode->i_data.nrpages); inode_detach_wb(inode); security_inode_free(inode); fsnotify_inode_delete(inode); -- ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [syzbot] [nilfs?] [mm?] KASAN: slab-use-after-free Read in lru_add_fn 2024-06-15 23:52 ` Hillf Danton @ 2024-06-16 0:10 ` syzbot 0 siblings, 0 replies; 11+ messages in thread From: syzbot @ 2024-06-16 0:10 UTC (permalink / raw) To: hdanton, jack, konishi.ryusuke, linux-fsdevel, linux-kernel, linux-mm, linux-nilfs, syzkaller-bugs, willy Hello, syzbot has tested the proposed patch but the reproducer is still triggering an issue: kernel BUG in __destroy_inode NILFS (loop0): I/O error reading meta-data file (ino=3, block-offset=0) NILFS (loop0): I/O error reading meta-data file (ino=3, block-offset=0) NILFS (loop0): disposed unprocessed dirty file(s) when stopping log writer ------------[ cut here ]------------ kernel BUG at fs/inode.c:285! Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI CPU: 2 PID: 5330 Comm: syz-executor Not tainted 6.10.0-rc3-syzkaller-dirty #0 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 RIP: 0010:__destroy_inode+0x5e4/0x7a0 fs/inode.c:285 Code: 2a 03 00 00 48 c7 c7 40 78 3d 8b c6 05 aa 6d cc 0d 01 e8 bf d9 69 ff e9 0e fc ff ff e8 a5 8b 8c ff 90 0f 0b e8 9d 8b 8c ff 90 <0f> 0b e8 95 8b 8c ff 90 0f 0b 90 e9 fa fa ff ff e8 87 8b 8c ff 90 RSP: 0018:ffffc900035afaf0 EFLAGS: 00010293 RAX: 0000000000000000 RBX: ffff8880325ba7c8 RCX: ffffffff82015439 RDX: ffff8880222ec880 RSI: ffffffff820159b3 RDI: 0000000000000007 RBP: 0000000000000001 R08: 0000000000000007 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000001 R12: ffff8880325ba980 R13: 0000000000000024 R14: ffffffff8b706c60 R15: ffff8880325ba8a0 FS: 0000555571e27480(0000) GS:ffff88806b200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f01cb366731 CR3: 0000000034ef4000 CR4: 0000000000350ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> destroy_inode+0x91/0x1b0 fs/inode.c:310 iput_final fs/inode.c:1742 [inline] iput.part.0+0x5a8/0x7f0 fs/inode.c:1768 iput+0x5c/0x80 fs/inode.c:1758 nilfs_put_root+0xae/0xe0 fs/nilfs2/the_nilfs.c:925 nilfs_segctor_destroy fs/nilfs2/segment.c:2788 [inline] nilfs_detach_log_writer+0x5ef/0xaa0 fs/nilfs2/segment.c:2850 nilfs_put_super+0x43/0x1b0 fs/nilfs2/super.c:498 generic_shutdown_super+0x159/0x3d0 fs/super.c:642 kill_block_super+0x3b/0x90 fs/super.c:1676 deactivate_locked_super+0xbe/0x1a0 fs/super.c:473 deactivate_super+0xde/0x100 fs/super.c:506 cleanup_mnt+0x222/0x450 fs/namespace.c:1267 task_work_run+0x14e/0x250 kernel/task_work.c:180 resume_user_mode_work include/linux/resume_user_mode.h:50 [inline] exit_to_user_mode_loop kernel/entry/common.c:114 [inline] exit_to_user_mode_prepare include/linux/entry-common.h:328 [inline] __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline] syscall_exit_to_user_mode+0x278/0x2a0 kernel/entry/common.c:218 do_syscall_64+0xda/0x250 arch/x86/entry/common.c:89 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7fc203a7e217 Code: b0 ff ff ff f7 d8 64 89 01 48 83 c8 ff c3 0f 1f 44 00 00 31 f6 e9 09 00 00 00 66 0f 1f 84 00 00 00 00 00 b8 a6 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 c7 c2 b0 ff ff ff f7 d8 64 89 02 b8 RSP: 002b:00007fffe9265ae8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a6 RAX: 0000000000000000 RBX: 0000000000000064 RCX: 00007fc203a7e217 RDX: 0000000000000200 RSI: 0000000000000009 RDI: 00007fffe9266c90 RBP: 00007fc203ac8336 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000100 R11: 0000000000000202 R12: 00007fffe9266c90 R13: 00007fc203ac8336 R14: 0000555571e27430 R15: 0000000000000005 </TASK> Modules linked in: ---[ end trace 0000000000000000 ]--- RIP: 0010:__destroy_inode+0x5e4/0x7a0 fs/inode.c:285 Code: 2a 03 00 00 48 c7 c7 40 78 3d 8b c6 05 aa 6d cc 0d 01 e8 bf d9 69 ff e9 0e fc ff ff e8 a5 8b 8c ff 90 0f 0b e8 9d 8b 8c ff 90 <0f> 0b e8 95 8b 8c ff 90 0f 0b 90 e9 fa fa ff ff e8 87 8b 8c ff 90 RSP: 0018:ffffc900035afaf0 EFLAGS: 00010293 RAX: 0000000000000000 RBX: ffff8880325ba7c8 RCX: ffffffff82015439 RDX: ffff8880222ec880 RSI: ffffffff820159b3 RDI: 0000000000000007 RBP: 0000000000000001 R08: 0000000000000007 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000001 R12: ffff8880325ba980 R13: 0000000000000024 R14: ffffffff8b706c60 R15: ffff8880325ba8a0 FS: 0000555571e27480(0000) GS:ffff88806b300000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000c0016fb000 CR3: 0000000034ef4000 CR4: 0000000000350ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Tested on: commit: 83a7eefe Linux 6.10-rc3 git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git console output: https://syzkaller.appspot.com/x/log.txt?x=11bb8ada980000 kernel config: https://syzkaller.appspot.com/x/.config?x=b8786f381e62940f dashboard link: https://syzkaller.appspot.com/bug?extid=d79afb004be235636ee8 compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40 patch: https://syzkaller.appspot.com/x/patch.diff?x=16642012980000 ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH] mm: truncate: flush lru cache for evicted inode 2024-06-15 20:44 ` [RFC PATCH] mm: truncate: flush lru cache for evicted inode Matthew Wilcox 2024-06-15 23:52 ` Hillf Danton @ 2024-06-16 2:39 ` Hillf Danton 2024-06-16 3:06 ` [syzbot] [nilfs?] [mm?] KASAN: slab-use-after-free Read in lru_add_fn syzbot 2024-06-17 7:57 ` [RFC PATCH] mm: truncate: flush lru cache for evicted inode Jan Kara 1 sibling, 2 replies; 11+ messages in thread From: Hillf Danton @ 2024-06-16 2:39 UTC (permalink / raw) To: Matthew Wilcox Cc: linux-mm, Jan Kara, linux-kernel, syzbot+d79afb004be235636ee8, linux-fsdevel, linux-nilfs, Ryusuke Konishi On Sat, 15 Jun 2024 21:44:54 +0100 Matthew Wilcox wrote: > > I suspect this would trigger: > > +++ b/fs/inode.c > @@ -282,6 +282,7 @@ static struct inode *alloc_inode(struct super_block *sb) > void __destroy_inode(struct inode *inode) > { > BUG_ON(inode_has_buffers(inode)); > + BUG_ON(inode->i_data.nrpages); > inode_detach_wb(inode); > security_inode_free(inode); > fsnotify_inode_delete(inode); > Yes, it was triggered [1] [1] https://lore.kernel.org/lkml/00000000000084b401061af6ab80@google.com/ and given trigger after nrpages is checked in clear_inode(), iput(inode) evict(inode) truncate_inode_pages_final(&inode->i_data); clear_inode(inode); destroy_inode(inode); why is folio added to exiting mapping? #syz test https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 83a7eefedc9b --- x/mm/filemap.c +++ y/mm/filemap.c @@ -870,6 +870,7 @@ noinline int __filemap_add_folio(struct folio_ref_add(folio, nr); folio->mapping = mapping; folio->index = xas.xa_index; + BUG_ON(mapping_exiting(mapping)); for (;;) { int order = -1, split_order = 0; -- ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [syzbot] [nilfs?] [mm?] KASAN: slab-use-after-free Read in lru_add_fn 2024-06-16 2:39 ` [RFC PATCH] mm: truncate: flush lru cache for evicted inode Hillf Danton @ 2024-06-16 3:06 ` syzbot 2024-06-23 5:11 ` [PATCH 0/3] nilfs2: fix potential issues related to reserved inodes Ryusuke Konishi 2024-06-17 7:57 ` [RFC PATCH] mm: truncate: flush lru cache for evicted inode Jan Kara 1 sibling, 1 reply; 11+ messages in thread From: syzbot @ 2024-06-16 3:06 UTC (permalink / raw) To: hdanton, jack, konishi.ryusuke, linux-fsdevel, linux-kernel, linux-mm, linux-nilfs, syzkaller-bugs, willy Hello, syzbot has tested the proposed patch but the reproducer is still triggering an issue: kernel BUG in __filemap_add_folio NILFS (loop0): I/O error reading meta-data file (ino=3, block-offset=0) NILFS (loop0): I/O error reading meta-data file (ino=3, block-offset=0) NILFS (loop0): disposed unprocessed dirty file(s) when stopping log writer ------------[ cut here ]------------ kernel BUG at mm/filemap.c:873! Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI CPU: 1 PID: 5321 Comm: syz-executor Not tainted 6.10.0-rc3-syzkaller-dirty #0 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 RIP: 0010:__filemap_add_folio+0xd1d/0xe80 mm/filemap.c:873 Code: 37 8b 4c 89 f7 e8 23 68 10 00 90 0f 0b e8 9b 14 ce ff 48 c7 c6 e0 92 37 8b 4c 89 f7 e8 0c 68 10 00 90 0f 0b e8 84 14 ce ff 90 <0f> 0b e8 7c 14 ce ff 90 0f 0b 90 e9 24 fb ff ff e8 6e 14 ce ff 48 RSP: 0018:ffffc900035773f0 EFLAGS: 00010293 RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffffff81bfc8cd RDX: ffff888023052440 RSI: ffffffff81bfd0cc RDI: 0000000000000001 RBP: ffff88803233a9f0 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000003 R12: ffffc90003577468 R13: 0000000000000000 R14: ffffea0000b3f7c0 R15: 0000000000000000 FS: 000055556c846480(0000) GS:ffff88806b100000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ffe311b9ff8 CR3: 000000001ae02000 CR4: 0000000000350ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> filemap_add_folio+0x110/0x220 mm/filemap.c:971 __filemap_get_folio+0x455/0xa80 mm/filemap.c:1959 filemap_grab_folio include/linux/pagemap.h:697 [inline] nilfs_grab_buffer+0xc3/0x370 fs/nilfs2/page.c:57 nilfs_mdt_submit_block+0x9f/0x870 fs/nilfs2/mdt.c:121 nilfs_mdt_read_block+0xa4/0x3b0 fs/nilfs2/mdt.c:176 nilfs_mdt_get_block+0xdb/0xb90 fs/nilfs2/mdt.c:251 nilfs_palloc_get_block+0xb5/0x300 fs/nilfs2/alloc.c:217 nilfs_palloc_get_entry_block+0x165/0x1b0 fs/nilfs2/alloc.c:319 nilfs_ifile_delete_inode+0x1e6/0x260 fs/nilfs2/ifile.c:109 nilfs_evict_inode+0x294/0x550 fs/nilfs2/inode.c:950 evict+0x2ed/0x6c0 fs/inode.c:667 iput_final fs/inode.c:1741 [inline] iput.part.0+0x5a8/0x7f0 fs/inode.c:1767 iput+0x5c/0x80 fs/inode.c:1757 nilfs_put_root+0xae/0xe0 fs/nilfs2/the_nilfs.c:925 nilfs_segctor_destroy fs/nilfs2/segment.c:2788 [inline] nilfs_detach_log_writer+0x5ef/0xaa0 fs/nilfs2/segment.c:2850 nilfs_put_super+0x43/0x1b0 fs/nilfs2/super.c:498 generic_shutdown_super+0x159/0x3d0 fs/super.c:642 kill_block_super+0x3b/0x90 fs/super.c:1676 deactivate_locked_super+0xbe/0x1a0 fs/super.c:473 deactivate_super+0xde/0x100 fs/super.c:506 cleanup_mnt+0x222/0x450 fs/namespace.c:1267 task_work_run+0x14e/0x250 kernel/task_work.c:180 resume_user_mode_work include/linux/resume_user_mode.h:50 [inline] exit_to_user_mode_loop kernel/entry/common.c:114 [inline] exit_to_user_mode_prepare include/linux/entry-common.h:328 [inline] __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline] syscall_exit_to_user_mode+0x278/0x2a0 kernel/entry/common.c:218 do_syscall_64+0xda/0x250 arch/x86/entry/common.c:89 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f70d447e217 Code: b0 ff ff ff f7 d8 64 89 01 48 83 c8 ff c3 0f 1f 44 00 00 31 f6 e9 09 00 00 00 66 0f 1f 84 00 00 00 00 00 b8 a6 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 c7 c2 b0 ff ff ff f7 d8 64 89 02 b8 RSP: 002b:00007ffe311ba288 EFLAGS: 00000202 ORIG_RAX: 00000000000000a6 RAX: 0000000000000000 RBX: 0000000000000064 RCX: 00007f70d447e217 RDX: 0000000000000200 RSI: 0000000000000009 RDI: 00007ffe311bb430 RBP: 00007f70d44c8336 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000100 R11: 0000000000000202 R12: 00007ffe311bb430 R13: 00007f70d44c8336 R14: 000055556c846430 R15: 0000000000000005 </TASK> Modules linked in: ---[ end trace 0000000000000000 ]--- RIP: 0010:__filemap_add_folio+0xd1d/0xe80 mm/filemap.c:873 Code: 37 8b 4c 89 f7 e8 23 68 10 00 90 0f 0b e8 9b 14 ce ff 48 c7 c6 e0 92 37 8b 4c 89 f7 e8 0c 68 10 00 90 0f 0b e8 84 14 ce ff 90 <0f> 0b e8 7c 14 ce ff 90 0f 0b 90 e9 24 fb ff ff e8 6e 14 ce ff 48 RSP: 0018:ffffc900035773f0 EFLAGS: 00010293 RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffffff81bfc8cd RDX: ffff888023052440 RSI: ffffffff81bfd0cc RDI: 0000000000000001 RBP: ffff88803233a9f0 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000003 R12: ffffc90003577468 R13: 0000000000000000 R14: ffffea0000b3f7c0 R15: 0000000000000000 FS: 000055556c846480(0000) GS:ffff88806b000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f70d45a8000 CR3: 000000001ae02000 CR4: 0000000000350ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Tested on: commit: 83a7eefe Linux 6.10-rc3 git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git console output: https://syzkaller.appspot.com/x/log.txt?x=15608256980000 kernel config: https://syzkaller.appspot.com/x/.config?x=b8786f381e62940f dashboard link: https://syzkaller.appspot.com/bug?extid=d79afb004be235636ee8 compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40 patch: https://syzkaller.appspot.com/x/patch.diff?x=147bb012980000 ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 0/3] nilfs2: fix potential issues related to reserved inodes 2024-06-16 3:06 ` [syzbot] [nilfs?] [mm?] KASAN: slab-use-after-free Read in lru_add_fn syzbot @ 2024-06-23 5:11 ` Ryusuke Konishi 2024-06-23 5:11 ` [PATCH 1/3] nilfs2: fix inode number range checks Ryusuke Konishi ` (2 more replies) 0 siblings, 3 replies; 11+ messages in thread From: Ryusuke Konishi @ 2024-06-23 5:11 UTC (permalink / raw) To: Andrew Morton Cc: linux-nilfs, syzbot, syzkaller-bugs, LKML, hdanton, jack, linux-fsdevel, willy Hi Andrew, please apply this bug fix series. This series fixes one use-after-free issue reported by syzbot, caused by nilfs2's internal inode being exposed in the namespace on a corrupted filesystem, and a couple of flaws that cause problems if the starting number of non-reserved inodes written in the on-disk super block is intentionally (or corruptly) changed from its default value. Thanks, Ryusuke Konishi Ryusuke Konishi (3): nilfs2: fix inode number range checks nilfs2: add missing check for inode numbers on directory entries nilfs2: fix incorrect inode allocation from reserved inodes fs/nilfs2/alloc.c | 19 +++++++++++++++---- fs/nilfs2/alloc.h | 4 ++-- fs/nilfs2/dat.c | 2 +- fs/nilfs2/dir.c | 6 ++++++ fs/nilfs2/ifile.c | 7 ++----- fs/nilfs2/nilfs.h | 10 ++++++++-- fs/nilfs2/the_nilfs.c | 6 ++++++ fs/nilfs2/the_nilfs.h | 2 +- 8 files changed, 41 insertions(+), 15 deletions(-) -- 2.34.1 ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 1/3] nilfs2: fix inode number range checks 2024-06-23 5:11 ` [PATCH 0/3] nilfs2: fix potential issues related to reserved inodes Ryusuke Konishi @ 2024-06-23 5:11 ` Ryusuke Konishi 2024-06-23 5:11 ` [PATCH 2/3] nilfs2: add missing check for inode numbers on directory entries Ryusuke Konishi 2024-06-23 5:11 ` [PATCH 3/3] nilfs2: fix incorrect inode allocation from reserved inodes Ryusuke Konishi 2 siblings, 0 replies; 11+ messages in thread From: Ryusuke Konishi @ 2024-06-23 5:11 UTC (permalink / raw) To: Andrew Morton Cc: linux-nilfs, syzbot, syzkaller-bugs, LKML, hdanton, jack, linux-fsdevel, willy In the current implementation of nilfs2, "nilfs->ns_first_ino", which gives the first non-reserved inode number, is read from the superblock, but its lower limit is not checked. As a result, if a number that overlaps with the inode number range of reserved inodes such as the root directory or metadata files is set in the super block parameter, the inode number test macros (NILFS_MDT_INODE and NILFS_VALID_INODE) will not function properly. In addition, these test macros use left bit-shift calculations using with the inode number as the shift count via the BIT macro, but the result of a shift calculation that exceeds the bit width of an integer is undefined in the C specification, so if "ns_first_ino" is set to a large value other than the default value NILFS_USER_INO (=11), the macros may potentially malfunction depending on the environment. Fix these issues by checking the lower bound of "nilfs->ns_first_ino" and by preventing bit shifts equal to or greater than the NILFS_USER_INO constant in the inode number test macros. Also, change the type of "ns_first_ino" from signed integer to unsigned integer to avoid the need for type casting in comparisons such as the lower bound check introduced this time. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: stable@vger.kernel.org --- fs/nilfs2/nilfs.h | 5 +++-- fs/nilfs2/the_nilfs.c | 6 ++++++ fs/nilfs2/the_nilfs.h | 2 +- 3 files changed, 10 insertions(+), 3 deletions(-) diff --git a/fs/nilfs2/nilfs.h b/fs/nilfs2/nilfs.h index 728e90be3570..7e39e277c77f 100644 --- a/fs/nilfs2/nilfs.h +++ b/fs/nilfs2/nilfs.h @@ -116,9 +116,10 @@ enum { #define NILFS_FIRST_INO(sb) (((struct the_nilfs *)sb->s_fs_info)->ns_first_ino) #define NILFS_MDT_INODE(sb, ino) \ - ((ino) < NILFS_FIRST_INO(sb) && (NILFS_MDT_INO_BITS & BIT(ino))) + ((ino) < NILFS_USER_INO && (NILFS_MDT_INO_BITS & BIT(ino))) #define NILFS_VALID_INODE(sb, ino) \ - ((ino) >= NILFS_FIRST_INO(sb) || (NILFS_SYS_INO_BITS & BIT(ino))) + ((ino) >= NILFS_FIRST_INO(sb) || \ + ((ino) < NILFS_USER_INO && (NILFS_SYS_INO_BITS & BIT(ino)))) /** * struct nilfs_transaction_info: context information for synchronization diff --git a/fs/nilfs2/the_nilfs.c b/fs/nilfs2/the_nilfs.c index f41d7b6d432c..e44dde57ab65 100644 --- a/fs/nilfs2/the_nilfs.c +++ b/fs/nilfs2/the_nilfs.c @@ -452,6 +452,12 @@ static int nilfs_store_disk_layout(struct the_nilfs *nilfs, } nilfs->ns_first_ino = le32_to_cpu(sbp->s_first_ino); + if (nilfs->ns_first_ino < NILFS_USER_INO) { + nilfs_err(nilfs->ns_sb, + "too small lower limit for non-reserved inode numbers: %u", + nilfs->ns_first_ino); + return -EINVAL; + } nilfs->ns_blocks_per_segment = le32_to_cpu(sbp->s_blocks_per_segment); if (nilfs->ns_blocks_per_segment < NILFS_SEG_MIN_BLOCKS) { diff --git a/fs/nilfs2/the_nilfs.h b/fs/nilfs2/the_nilfs.h index 85da0629415d..1e829ed7b0ef 100644 --- a/fs/nilfs2/the_nilfs.h +++ b/fs/nilfs2/the_nilfs.h @@ -182,7 +182,7 @@ struct the_nilfs { unsigned long ns_nrsvsegs; unsigned long ns_first_data_block; int ns_inode_size; - int ns_first_ino; + unsigned int ns_first_ino; u32 ns_crc_seed; /* /sys/fs/<nilfs>/<device> */ -- 2.34.1 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 2/3] nilfs2: add missing check for inode numbers on directory entries 2024-06-23 5:11 ` [PATCH 0/3] nilfs2: fix potential issues related to reserved inodes Ryusuke Konishi 2024-06-23 5:11 ` [PATCH 1/3] nilfs2: fix inode number range checks Ryusuke Konishi @ 2024-06-23 5:11 ` Ryusuke Konishi 2024-06-23 5:11 ` [PATCH 3/3] nilfs2: fix incorrect inode allocation from reserved inodes Ryusuke Konishi 2 siblings, 0 replies; 11+ messages in thread From: Ryusuke Konishi @ 2024-06-23 5:11 UTC (permalink / raw) To: Andrew Morton Cc: linux-nilfs, syzbot, syzkaller-bugs, LKML, hdanton, jack, linux-fsdevel, willy Syzbot reported that mounting and unmounting a specific pattern of corrupted nilfs2 filesystem images causes a use-after-free of metadata file inodes, which triggers a kernel bug in lru_add_fn(). As Jan Kara pointed out, this is because the link count of a metadata file gets corrupted to 0, and nilfs_evict_inode(), which is called from iput(), tries to delete that inode (ifile inode in this case). The inconsistency occurs because directories containing the inode numbers of these metadata files that should not be visible in the namespace are read without checking. Fix this issue by treating the inode numbers of these internal files as errors in the sanity check helper when reading directory folios/pages. Also thanks to Hillf Danton and Matthew Wilcox for their initial mm-layer analysis. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+d79afb004be235636ee8@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=d79afb004be235636ee8 Reported-by: Jan Kara <jack@suse.cz> Closes: https://lkml.kernel.org/r/20240617075758.wewhukbrjod5fp5o@quack3 Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: stable@vger.kernel.org --- fs/nilfs2/dir.c | 6 ++++++ fs/nilfs2/nilfs.h | 5 +++++ 2 files changed, 11 insertions(+) diff --git a/fs/nilfs2/dir.c b/fs/nilfs2/dir.c index 52e50b1b7f22..dddfa604491a 100644 --- a/fs/nilfs2/dir.c +++ b/fs/nilfs2/dir.c @@ -135,6 +135,9 @@ static bool nilfs_check_folio(struct folio *folio, char *kaddr) goto Enamelen; if (((offs + rec_len - 1) ^ offs) & ~(chunk_size-1)) goto Espan; + if (unlikely(p->inode && + NILFS_PRIVATE_INODE(le64_to_cpu(p->inode)))) + goto Einumber; } if (offs != limit) goto Eend; @@ -160,6 +163,9 @@ static bool nilfs_check_folio(struct folio *folio, char *kaddr) goto bad_entry; Espan: error = "directory entry across blocks"; + goto bad_entry; +Einumber: + error = "disallowed inode number"; bad_entry: nilfs_error(sb, "bad entry in directory #%lu: %s - offset=%lu, inode=%lu, rec_len=%zd, name_len=%d", diff --git a/fs/nilfs2/nilfs.h b/fs/nilfs2/nilfs.h index 7e39e277c77f..4017f7856440 100644 --- a/fs/nilfs2/nilfs.h +++ b/fs/nilfs2/nilfs.h @@ -121,6 +121,11 @@ enum { ((ino) >= NILFS_FIRST_INO(sb) || \ ((ino) < NILFS_USER_INO && (NILFS_SYS_INO_BITS & BIT(ino)))) +#define NILFS_PRIVATE_INODE(ino) ({ \ + ino_t __ino = (ino); \ + ((__ino) < NILFS_USER_INO && (__ino) != NILFS_ROOT_INO && \ + (__ino) != NILFS_SKETCH_INO); }) + /** * struct nilfs_transaction_info: context information for synchronization * @ti_magic: Magic number -- 2.34.1 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 3/3] nilfs2: fix incorrect inode allocation from reserved inodes 2024-06-23 5:11 ` [PATCH 0/3] nilfs2: fix potential issues related to reserved inodes Ryusuke Konishi 2024-06-23 5:11 ` [PATCH 1/3] nilfs2: fix inode number range checks Ryusuke Konishi 2024-06-23 5:11 ` [PATCH 2/3] nilfs2: add missing check for inode numbers on directory entries Ryusuke Konishi @ 2024-06-23 5:11 ` Ryusuke Konishi 2 siblings, 0 replies; 11+ messages in thread From: Ryusuke Konishi @ 2024-06-23 5:11 UTC (permalink / raw) To: Andrew Morton Cc: linux-nilfs, syzbot, syzkaller-bugs, LKML, hdanton, jack, linux-fsdevel, willy If the bitmap block that manages the inode allocation status is corrupted, nilfs_ifile_create_inode() may allocate a new inode from the reserved inode area where it should not be allocated. Previous fix commit d325dc6eb763 ("nilfs2: fix use-after-free bug of struct nilfs_root"), fixed the problem that reserved inodes with inode numbers less than NILFS_USER_INO (=11) were incorrectly reallocated due to bitmap corruption, but since the start number of non-reserved inodes is read from the super block and may change, in which case inode allocation may occur from the extended reserved inode area. If that happens, access to that inode will cause an IO error, causing the file system to degrade to an error state. Fix this potential issue by adding a wraparound option to the common metadata object allocation routine and by modifying nilfs_ifile_create_inode() to disable the option so that it only allocates inodes with inode numbers greater than or equal to the inode number read in "nilfs->ns_first_ino", regardless of the bitmap status of reserved inodes. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: stable@vger.kernel.org --- fs/nilfs2/alloc.c | 19 +++++++++++++++---- fs/nilfs2/alloc.h | 4 ++-- fs/nilfs2/dat.c | 2 +- fs/nilfs2/ifile.c | 7 ++----- 4 files changed, 20 insertions(+), 12 deletions(-) diff --git a/fs/nilfs2/alloc.c b/fs/nilfs2/alloc.c index 89caef7513db..ba50388ee4bf 100644 --- a/fs/nilfs2/alloc.c +++ b/fs/nilfs2/alloc.c @@ -377,11 +377,12 @@ void *nilfs_palloc_block_get_entry(const struct inode *inode, __u64 nr, * @target: offset number of an entry in the group (start point) * @bsize: size in bits * @lock: spin lock protecting @bitmap + * @wrap: whether to wrap around */ static int nilfs_palloc_find_available_slot(unsigned char *bitmap, unsigned long target, unsigned int bsize, - spinlock_t *lock) + spinlock_t *lock, bool wrap) { int pos, end = bsize; @@ -397,6 +398,8 @@ static int nilfs_palloc_find_available_slot(unsigned char *bitmap, end = target; } + if (!wrap) + return -ENOSPC; /* wrap around */ for (pos = 0; pos < end; pos++) { @@ -495,9 +498,10 @@ int nilfs_palloc_count_max_entries(struct inode *inode, u64 nused, u64 *nmaxp) * nilfs_palloc_prepare_alloc_entry - prepare to allocate a persistent object * @inode: inode of metadata file using this allocator * @req: nilfs_palloc_req structure exchanged for the allocation + * @wrap: whether to wrap around */ int nilfs_palloc_prepare_alloc_entry(struct inode *inode, - struct nilfs_palloc_req *req) + struct nilfs_palloc_req *req, bool wrap) { struct buffer_head *desc_bh, *bitmap_bh; struct nilfs_palloc_group_desc *desc; @@ -516,7 +520,7 @@ int nilfs_palloc_prepare_alloc_entry(struct inode *inode, entries_per_group = nilfs_palloc_entries_per_group(inode); for (i = 0; i < ngroups; i += n) { - if (group >= ngroups) { + if (group >= ngroups && wrap) { /* wrap around */ group = 0; maxgroup = nilfs_palloc_group(inode, req->pr_entry_nr, @@ -550,7 +554,14 @@ int nilfs_palloc_prepare_alloc_entry(struct inode *inode, bitmap_kaddr = kmap_local_page(bitmap_bh->b_page); bitmap = bitmap_kaddr + bh_offset(bitmap_bh); pos = nilfs_palloc_find_available_slot( - bitmap, group_offset, entries_per_group, lock); + bitmap, group_offset, entries_per_group, lock, + wrap); + /* + * Since the search for a free slot in the second and + * subsequent bitmap blocks always starts from the + * beginning, the wrap flag only has an effect on the + * first search. + */ kunmap_local(bitmap_kaddr); if (pos >= 0) goto found; diff --git a/fs/nilfs2/alloc.h b/fs/nilfs2/alloc.h index b667e869ac07..d825a9faca6d 100644 --- a/fs/nilfs2/alloc.h +++ b/fs/nilfs2/alloc.h @@ -50,8 +50,8 @@ struct nilfs_palloc_req { struct buffer_head *pr_entry_bh; }; -int nilfs_palloc_prepare_alloc_entry(struct inode *, - struct nilfs_palloc_req *); +int nilfs_palloc_prepare_alloc_entry(struct inode *inode, + struct nilfs_palloc_req *req, bool wrap); void nilfs_palloc_commit_alloc_entry(struct inode *, struct nilfs_palloc_req *); void nilfs_palloc_abort_alloc_entry(struct inode *, struct nilfs_palloc_req *); diff --git a/fs/nilfs2/dat.c b/fs/nilfs2/dat.c index 180fc8d36213..fc1caf63a42a 100644 --- a/fs/nilfs2/dat.c +++ b/fs/nilfs2/dat.c @@ -75,7 +75,7 @@ int nilfs_dat_prepare_alloc(struct inode *dat, struct nilfs_palloc_req *req) { int ret; - ret = nilfs_palloc_prepare_alloc_entry(dat, req); + ret = nilfs_palloc_prepare_alloc_entry(dat, req, true); if (ret < 0) return ret; diff --git a/fs/nilfs2/ifile.c b/fs/nilfs2/ifile.c index 612e609158b5..1e86b9303b7c 100644 --- a/fs/nilfs2/ifile.c +++ b/fs/nilfs2/ifile.c @@ -56,13 +56,10 @@ int nilfs_ifile_create_inode(struct inode *ifile, ino_t *out_ino, struct nilfs_palloc_req req; int ret; - req.pr_entry_nr = 0; /* - * 0 says find free inode from beginning - * of a group. dull code!! - */ + req.pr_entry_nr = NILFS_FIRST_INO(ifile->i_sb); req.pr_entry_bh = NULL; - ret = nilfs_palloc_prepare_alloc_entry(ifile, &req); + ret = nilfs_palloc_prepare_alloc_entry(ifile, &req, false); if (!ret) { ret = nilfs_palloc_get_entry_block(ifile, req.pr_entry_nr, 1, &req.pr_entry_bh); -- 2.34.1 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [RFC PATCH] mm: truncate: flush lru cache for evicted inode 2024-06-16 2:39 ` [RFC PATCH] mm: truncate: flush lru cache for evicted inode Hillf Danton 2024-06-16 3:06 ` [syzbot] [nilfs?] [mm?] KASAN: slab-use-after-free Read in lru_add_fn syzbot @ 2024-06-17 7:57 ` Jan Kara 2024-06-17 11:24 ` Ryusuke Konishi 1 sibling, 1 reply; 11+ messages in thread From: Jan Kara @ 2024-06-17 7:57 UTC (permalink / raw) To: Hillf Danton Cc: Matthew Wilcox, linux-mm, Jan Kara, linux-kernel, syzbot+d79afb004be235636ee8, linux-fsdevel, linux-nilfs, Ryusuke Konishi On Sun 16-06-24 10:39:51, Hillf Danton wrote: > On Sat, 15 Jun 2024 21:44:54 +0100 Matthew Wilcox wrote: > > > > I suspect this would trigger: > > > > +++ b/fs/inode.c > > @@ -282,6 +282,7 @@ static struct inode *alloc_inode(struct super_block *sb) > > void __destroy_inode(struct inode *inode) > > { > > BUG_ON(inode_has_buffers(inode)); > > + BUG_ON(inode->i_data.nrpages); > > inode_detach_wb(inode); > > security_inode_free(inode); > > fsnotify_inode_delete(inode); > > > Yes, it was triggered [1] > > [1] https://lore.kernel.org/lkml/00000000000084b401061af6ab80@google.com/ > > and given trigger after nrpages is checked in clear_inode(), > > iput(inode) > evict(inode) > truncate_inode_pages_final(&inode->i_data); > clear_inode(inode); > destroy_inode(inode); > > why is folio added to exiting mapping? > > #syz test https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 83a7eefedc9b OK, so based on syzbot results this seems to be a bug in nilfs_evict_inode() (likely caused by corrupted filesystem so that root inode's link count was 0 and hence was getting deleted on iput()). I guess nilfs maintainers need to address these with more consistency checks of metadata when loading them... Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH] mm: truncate: flush lru cache for evicted inode 2024-06-17 7:57 ` [RFC PATCH] mm: truncate: flush lru cache for evicted inode Jan Kara @ 2024-06-17 11:24 ` Ryusuke Konishi 0 siblings, 0 replies; 11+ messages in thread From: Ryusuke Konishi @ 2024-06-17 11:24 UTC (permalink / raw) To: Jan Kara Cc: Hillf Danton, Matthew Wilcox, linux-mm, linux-kernel, syzbot+d79afb004be235636ee8, linux-fsdevel, linux-nilfs On Mon, Jun 17, 2024 at 4:57 PM Jan Kara wrote: > > On Sun 16-06-24 10:39:51, Hillf Danton wrote: > > On Sat, 15 Jun 2024 21:44:54 +0100 Matthew Wilcox wrote: > > > > > > I suspect this would trigger: > > > > > > +++ b/fs/inode.c > > > @@ -282,6 +282,7 @@ static struct inode *alloc_inode(struct super_block *sb) > > > void __destroy_inode(struct inode *inode) > > > { > > > BUG_ON(inode_has_buffers(inode)); > > > + BUG_ON(inode->i_data.nrpages); > > > inode_detach_wb(inode); > > > security_inode_free(inode); > > > fsnotify_inode_delete(inode); > > > > > Yes, it was triggered [1] > > > > [1] https://lore.kernel.org/lkml/00000000000084b401061af6ab80@google.com/ > > > > and given trigger after nrpages is checked in clear_inode(), > > > > iput(inode) > > evict(inode) > > truncate_inode_pages_final(&inode->i_data); > > clear_inode(inode); > > destroy_inode(inode); > > > > why is folio added to exiting mapping? > > > > #syz test https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 83a7eefedc9b > > OK, so based on syzbot results this seems to be a bug in > nilfs_evict_inode() (likely caused by corrupted filesystem so that root > inode's link count was 0 and hence was getting deleted on iput()). I guess > nilfs maintainers need to address these with more consistency checks of > metadata when loading them... > > Honza > -- > Jan Kara <jack@suse.com> > SUSE Labs, CR Sorry for my late response. Also, thank you for pointing out that the problem seems to be caused via nilfs_evict_inode() by a missing consistency check of the link count. I'll check it out and think about how to deal with it. Thanks, Ryusuke Konishi ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2024-06-23 5:11 UTC | newest] Thread overview: 11+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- [not found] <ZmxIvIJ3YSZDwbPW@casper.infradead.org> [not found] ` <20240614235953.809-1-hdanton@sina.com> 2024-06-15 20:44 ` [RFC PATCH] mm: truncate: flush lru cache for evicted inode Matthew Wilcox 2024-06-15 23:52 ` Hillf Danton 2024-06-16 0:10 ` [syzbot] [nilfs?] [mm?] KASAN: slab-use-after-free Read in lru_add_fn syzbot 2024-06-16 2:39 ` [RFC PATCH] mm: truncate: flush lru cache for evicted inode Hillf Danton 2024-06-16 3:06 ` [syzbot] [nilfs?] [mm?] KASAN: slab-use-after-free Read in lru_add_fn syzbot 2024-06-23 5:11 ` [PATCH 0/3] nilfs2: fix potential issues related to reserved inodes Ryusuke Konishi 2024-06-23 5:11 ` [PATCH 1/3] nilfs2: fix inode number range checks Ryusuke Konishi 2024-06-23 5:11 ` [PATCH 2/3] nilfs2: add missing check for inode numbers on directory entries Ryusuke Konishi 2024-06-23 5:11 ` [PATCH 3/3] nilfs2: fix incorrect inode allocation from reserved inodes Ryusuke Konishi 2024-06-17 7:57 ` [RFC PATCH] mm: truncate: flush lru cache for evicted inode Jan Kara 2024-06-17 11:24 ` Ryusuke Konishi
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).