* Re: [RFC PATCH] mm: truncate: flush lru cache for evicted inode
[not found] ` <20240614235953.809-1-hdanton@sina.com>
@ 2024-06-15 20:44 ` Matthew Wilcox
2024-06-15 23:52 ` Hillf Danton
2024-06-16 2:39 ` [RFC PATCH] mm: truncate: flush lru cache for evicted inode Hillf Danton
0 siblings, 2 replies; 11+ messages in thread
From: Matthew Wilcox @ 2024-06-15 20:44 UTC (permalink / raw)
To: Hillf Danton
Cc: linux-mm, Jan Kara, linux-kernel, syzbot+d79afb004be235636ee8,
linux-fsdevel, linux-nilfs, Ryusuke Konishi
On Sat, Jun 15, 2024 at 07:59:53AM +0800, Hillf Danton wrote:
> On Fri, 14 Jun 2024 14:42:20 +0100 Matthew Wilcox wrote:
> > On Fri, Jun 14, 2024 at 09:18:56PM +0800, Hillf Danton wrote:
> > > Flush lru cache to avoid folio->mapping uaf in case of inode teardown.
> >
> > What? inodes are supposed to have all their folios removed before
> > being freed. Part of removing a folio sets the folio->mapping to NULL.
> > Where is the report?
> >
> Subject: Re: [syzbot] [nilfs?] [mm?] KASAN: slab-use-after-free Read in lru_add_fn
> https://lore.kernel.org/lkml/000000000000cae276061aa12d5e@google.com/
Thanks. This fix is wrong. Of course syzbot says it fixes the problem,
but you're just avoiding putting the folios into the situation where we
have debug that would detect the problem.
I suspect this would trigger:
+++ b/fs/inode.c
@@ -282,6 +282,7 @@ static struct inode *alloc_inode(struct super_block *sb)
void __destroy_inode(struct inode *inode)
{
BUG_ON(inode_has_buffers(inode));
+ BUG_ON(inode->i_data.nrpages);
inode_detach_wb(inode);
security_inode_free(inode);
fsnotify_inode_delete(inode);
and what a real fix would look like would be calling clear_inode()
before calling iput() in nilfs_put_root(). But I'm not an expert
in this layer of the VFS, so I might well be wrong.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH] mm: truncate: flush lru cache for evicted inode
2024-06-15 20:44 ` [RFC PATCH] mm: truncate: flush lru cache for evicted inode Matthew Wilcox
@ 2024-06-15 23:52 ` Hillf Danton
2024-06-16 0:10 ` [syzbot] [nilfs?] [mm?] KASAN: slab-use-after-free Read in lru_add_fn syzbot
2024-06-16 2:39 ` [RFC PATCH] mm: truncate: flush lru cache for evicted inode Hillf Danton
1 sibling, 1 reply; 11+ messages in thread
From: Hillf Danton @ 2024-06-15 23:52 UTC (permalink / raw)
To: Matthew Wilcox
Cc: linux-mm, Jan Kara, linux-kernel, syzbot+d79afb004be235636ee8,
linux-fsdevel, linux-nilfs, Ryusuke Konishi
On Sat, 15 Jun 2024 21:44:54 +0100 Matthew Wilcox wrote:
> On Sat, Jun 15, 2024 at 07:59:53AM +0800, Hillf Danton wrote:
> > On Fri, 14 Jun 2024 14:42:20 +0100 Matthew Wilcox wrote:
> > > On Fri, Jun 14, 2024 at 09:18:56PM +0800, Hillf Danton wrote:
> > > > Flush lru cache to avoid folio->mapping uaf in case of inode teardown.
> > >
> > > What? inodes are supposed to have all their folios removed before
> > > being freed. Part of removing a folio sets the folio->mapping to NULL.
> > > Where is the report?
> > >
> > Subject: Re: [syzbot] [nilfs?] [mm?] KASAN: slab-use-after-free Read in lru_add_fn
> > https://lore.kernel.org/lkml/000000000000cae276061aa12d5e@google.com/
>
> Thanks. This fix is wrong. Of course syzbot says it fixes the problem,
> but you're just avoiding putting the folios into the situation where we
> have debug that would detect the problem.
>
> I suspect this would trigger:
>
Happy to test your idea.
> +++ b/fs/inode.c
> @@ -282,6 +282,7 @@ static struct inode *alloc_inode(struct super_block *sb)
> void __destroy_inode(struct inode *inode)
> {
> BUG_ON(inode_has_buffers(inode));
> + BUG_ON(inode->i_data.nrpages);
> inode_detach_wb(inode);
> security_inode_free(inode);
> fsnotify_inode_delete(inode);
>
> and what a real fix would look like would be calling clear_inode()
> before calling iput() in nilfs_put_root(). But I'm not an expert
Hm...given I_FREEING checked in clear_inode(), fix like this one could be
tried in midle 2026.
> in this layer of the VFS, so I might well be wrong.
#syz test https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 83a7eefedc9b
--- x/mm/truncate.c
+++ y/mm/truncate.c
@@ -419,6 +419,9 @@ void truncate_inode_pages_range(struct a
truncate_folio_batch_exceptionals(mapping, &fbatch, indices);
folio_batch_release(&fbatch);
}
+
+ if (mapping_exiting(mapping))
+ lru_add_drain_all();
}
EXPORT_SYMBOL(truncate_inode_pages_range);
--- x/fs/inode.c
+++ y/fs/inode.c
@@ -282,6 +282,7 @@ static struct inode *alloc_inode(struct
void __destroy_inode(struct inode *inode)
{
BUG_ON(inode_has_buffers(inode));
+ BUG_ON(inode->i_data.nrpages);
inode_detach_wb(inode);
security_inode_free(inode);
fsnotify_inode_delete(inode);
--
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [syzbot] [nilfs?] [mm?] KASAN: slab-use-after-free Read in lru_add_fn
2024-06-15 23:52 ` Hillf Danton
@ 2024-06-16 0:10 ` syzbot
0 siblings, 0 replies; 11+ messages in thread
From: syzbot @ 2024-06-16 0:10 UTC (permalink / raw)
To: hdanton, jack, konishi.ryusuke, linux-fsdevel, linux-kernel,
linux-mm, linux-nilfs, syzkaller-bugs, willy
Hello,
syzbot has tested the proposed patch but the reproducer is still triggering an issue:
kernel BUG in __destroy_inode
NILFS (loop0): I/O error reading meta-data file (ino=3, block-offset=0)
NILFS (loop0): I/O error reading meta-data file (ino=3, block-offset=0)
NILFS (loop0): disposed unprocessed dirty file(s) when stopping log writer
------------[ cut here ]------------
kernel BUG at fs/inode.c:285!
Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
CPU: 2 PID: 5330 Comm: syz-executor Not tainted 6.10.0-rc3-syzkaller-dirty #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:__destroy_inode+0x5e4/0x7a0 fs/inode.c:285
Code: 2a 03 00 00 48 c7 c7 40 78 3d 8b c6 05 aa 6d cc 0d 01 e8 bf d9 69 ff e9 0e fc ff ff e8 a5 8b 8c ff 90 0f 0b e8 9d 8b 8c ff 90 <0f> 0b e8 95 8b 8c ff 90 0f 0b 90 e9 fa fa ff ff e8 87 8b 8c ff 90
RSP: 0018:ffffc900035afaf0 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff8880325ba7c8 RCX: ffffffff82015439
RDX: ffff8880222ec880 RSI: ffffffff820159b3 RDI: 0000000000000007
RBP: 0000000000000001 R08: 0000000000000007 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000001 R12: ffff8880325ba980
R13: 0000000000000024 R14: ffffffff8b706c60 R15: ffff8880325ba8a0
FS: 0000555571e27480(0000) GS:ffff88806b200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f01cb366731 CR3: 0000000034ef4000 CR4: 0000000000350ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
destroy_inode+0x91/0x1b0 fs/inode.c:310
iput_final fs/inode.c:1742 [inline]
iput.part.0+0x5a8/0x7f0 fs/inode.c:1768
iput+0x5c/0x80 fs/inode.c:1758
nilfs_put_root+0xae/0xe0 fs/nilfs2/the_nilfs.c:925
nilfs_segctor_destroy fs/nilfs2/segment.c:2788 [inline]
nilfs_detach_log_writer+0x5ef/0xaa0 fs/nilfs2/segment.c:2850
nilfs_put_super+0x43/0x1b0 fs/nilfs2/super.c:498
generic_shutdown_super+0x159/0x3d0 fs/super.c:642
kill_block_super+0x3b/0x90 fs/super.c:1676
deactivate_locked_super+0xbe/0x1a0 fs/super.c:473
deactivate_super+0xde/0x100 fs/super.c:506
cleanup_mnt+0x222/0x450 fs/namespace.c:1267
task_work_run+0x14e/0x250 kernel/task_work.c:180
resume_user_mode_work include/linux/resume_user_mode.h:50 [inline]
exit_to_user_mode_loop kernel/entry/common.c:114 [inline]
exit_to_user_mode_prepare include/linux/entry-common.h:328 [inline]
__syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
syscall_exit_to_user_mode+0x278/0x2a0 kernel/entry/common.c:218
do_syscall_64+0xda/0x250 arch/x86/entry/common.c:89
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fc203a7e217
Code: b0 ff ff ff f7 d8 64 89 01 48 83 c8 ff c3 0f 1f 44 00 00 31 f6 e9 09 00 00 00 66 0f 1f 84 00 00 00 00 00 b8 a6 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 c7 c2 b0 ff ff ff f7 d8 64 89 02 b8
RSP: 002b:00007fffe9265ae8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a6
RAX: 0000000000000000 RBX: 0000000000000064 RCX: 00007fc203a7e217
RDX: 0000000000000200 RSI: 0000000000000009 RDI: 00007fffe9266c90
RBP: 00007fc203ac8336 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000100 R11: 0000000000000202 R12: 00007fffe9266c90
R13: 00007fc203ac8336 R14: 0000555571e27430 R15: 0000000000000005
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:__destroy_inode+0x5e4/0x7a0 fs/inode.c:285
Code: 2a 03 00 00 48 c7 c7 40 78 3d 8b c6 05 aa 6d cc 0d 01 e8 bf d9 69 ff e9 0e fc ff ff e8 a5 8b 8c ff 90 0f 0b e8 9d 8b 8c ff 90 <0f> 0b e8 95 8b 8c ff 90 0f 0b 90 e9 fa fa ff ff e8 87 8b 8c ff 90
RSP: 0018:ffffc900035afaf0 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff8880325ba7c8 RCX: ffffffff82015439
RDX: ffff8880222ec880 RSI: ffffffff820159b3 RDI: 0000000000000007
RBP: 0000000000000001 R08: 0000000000000007 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000001 R12: ffff8880325ba980
R13: 0000000000000024 R14: ffffffff8b706c60 R15: ffff8880325ba8a0
FS: 0000555571e27480(0000) GS:ffff88806b300000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000c0016fb000 CR3: 0000000034ef4000 CR4: 0000000000350ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Tested on:
commit: 83a7eefe Linux 6.10-rc3
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
console output: https://syzkaller.appspot.com/x/log.txt?x=11bb8ada980000
kernel config: https://syzkaller.appspot.com/x/.config?x=b8786f381e62940f
dashboard link: https://syzkaller.appspot.com/bug?extid=d79afb004be235636ee8
compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
patch: https://syzkaller.appspot.com/x/patch.diff?x=16642012980000
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH] mm: truncate: flush lru cache for evicted inode
2024-06-15 20:44 ` [RFC PATCH] mm: truncate: flush lru cache for evicted inode Matthew Wilcox
2024-06-15 23:52 ` Hillf Danton
@ 2024-06-16 2:39 ` Hillf Danton
2024-06-16 3:06 ` [syzbot] [nilfs?] [mm?] KASAN: slab-use-after-free Read in lru_add_fn syzbot
2024-06-17 7:57 ` [RFC PATCH] mm: truncate: flush lru cache for evicted inode Jan Kara
1 sibling, 2 replies; 11+ messages in thread
From: Hillf Danton @ 2024-06-16 2:39 UTC (permalink / raw)
To: Matthew Wilcox
Cc: linux-mm, Jan Kara, linux-kernel, syzbot+d79afb004be235636ee8,
linux-fsdevel, linux-nilfs, Ryusuke Konishi
On Sat, 15 Jun 2024 21:44:54 +0100 Matthew Wilcox wrote:
>
> I suspect this would trigger:
>
> +++ b/fs/inode.c
> @@ -282,6 +282,7 @@ static struct inode *alloc_inode(struct super_block *sb)
> void __destroy_inode(struct inode *inode)
> {
> BUG_ON(inode_has_buffers(inode));
> + BUG_ON(inode->i_data.nrpages);
> inode_detach_wb(inode);
> security_inode_free(inode);
> fsnotify_inode_delete(inode);
>
Yes, it was triggered [1]
[1] https://lore.kernel.org/lkml/00000000000084b401061af6ab80@google.com/
and given trigger after nrpages is checked in clear_inode(),
iput(inode)
evict(inode)
truncate_inode_pages_final(&inode->i_data);
clear_inode(inode);
destroy_inode(inode);
why is folio added to exiting mapping?
#syz test https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 83a7eefedc9b
--- x/mm/filemap.c
+++ y/mm/filemap.c
@@ -870,6 +870,7 @@ noinline int __filemap_add_folio(struct
folio_ref_add(folio, nr);
folio->mapping = mapping;
folio->index = xas.xa_index;
+ BUG_ON(mapping_exiting(mapping));
for (;;) {
int order = -1, split_order = 0;
--
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [syzbot] [nilfs?] [mm?] KASAN: slab-use-after-free Read in lru_add_fn
2024-06-16 2:39 ` [RFC PATCH] mm: truncate: flush lru cache for evicted inode Hillf Danton
@ 2024-06-16 3:06 ` syzbot
2024-06-23 5:11 ` [PATCH 0/3] nilfs2: fix potential issues related to reserved inodes Ryusuke Konishi
2024-06-17 7:57 ` [RFC PATCH] mm: truncate: flush lru cache for evicted inode Jan Kara
1 sibling, 1 reply; 11+ messages in thread
From: syzbot @ 2024-06-16 3:06 UTC (permalink / raw)
To: hdanton, jack, konishi.ryusuke, linux-fsdevel, linux-kernel,
linux-mm, linux-nilfs, syzkaller-bugs, willy
Hello,
syzbot has tested the proposed patch but the reproducer is still triggering an issue:
kernel BUG in __filemap_add_folio
NILFS (loop0): I/O error reading meta-data file (ino=3, block-offset=0)
NILFS (loop0): I/O error reading meta-data file (ino=3, block-offset=0)
NILFS (loop0): disposed unprocessed dirty file(s) when stopping log writer
------------[ cut here ]------------
kernel BUG at mm/filemap.c:873!
Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
CPU: 1 PID: 5321 Comm: syz-executor Not tainted 6.10.0-rc3-syzkaller-dirty #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:__filemap_add_folio+0xd1d/0xe80 mm/filemap.c:873
Code: 37 8b 4c 89 f7 e8 23 68 10 00 90 0f 0b e8 9b 14 ce ff 48 c7 c6 e0 92 37 8b 4c 89 f7 e8 0c 68 10 00 90 0f 0b e8 84 14 ce ff 90 <0f> 0b e8 7c 14 ce ff 90 0f 0b 90 e9 24 fb ff ff e8 6e 14 ce ff 48
RSP: 0018:ffffc900035773f0 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffffff81bfc8cd
RDX: ffff888023052440 RSI: ffffffff81bfd0cc RDI: 0000000000000001
RBP: ffff88803233a9f0 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000003 R12: ffffc90003577468
R13: 0000000000000000 R14: ffffea0000b3f7c0 R15: 0000000000000000
FS: 000055556c846480(0000) GS:ffff88806b100000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffe311b9ff8 CR3: 000000001ae02000 CR4: 0000000000350ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
filemap_add_folio+0x110/0x220 mm/filemap.c:971
__filemap_get_folio+0x455/0xa80 mm/filemap.c:1959
filemap_grab_folio include/linux/pagemap.h:697 [inline]
nilfs_grab_buffer+0xc3/0x370 fs/nilfs2/page.c:57
nilfs_mdt_submit_block+0x9f/0x870 fs/nilfs2/mdt.c:121
nilfs_mdt_read_block+0xa4/0x3b0 fs/nilfs2/mdt.c:176
nilfs_mdt_get_block+0xdb/0xb90 fs/nilfs2/mdt.c:251
nilfs_palloc_get_block+0xb5/0x300 fs/nilfs2/alloc.c:217
nilfs_palloc_get_entry_block+0x165/0x1b0 fs/nilfs2/alloc.c:319
nilfs_ifile_delete_inode+0x1e6/0x260 fs/nilfs2/ifile.c:109
nilfs_evict_inode+0x294/0x550 fs/nilfs2/inode.c:950
evict+0x2ed/0x6c0 fs/inode.c:667
iput_final fs/inode.c:1741 [inline]
iput.part.0+0x5a8/0x7f0 fs/inode.c:1767
iput+0x5c/0x80 fs/inode.c:1757
nilfs_put_root+0xae/0xe0 fs/nilfs2/the_nilfs.c:925
nilfs_segctor_destroy fs/nilfs2/segment.c:2788 [inline]
nilfs_detach_log_writer+0x5ef/0xaa0 fs/nilfs2/segment.c:2850
nilfs_put_super+0x43/0x1b0 fs/nilfs2/super.c:498
generic_shutdown_super+0x159/0x3d0 fs/super.c:642
kill_block_super+0x3b/0x90 fs/super.c:1676
deactivate_locked_super+0xbe/0x1a0 fs/super.c:473
deactivate_super+0xde/0x100 fs/super.c:506
cleanup_mnt+0x222/0x450 fs/namespace.c:1267
task_work_run+0x14e/0x250 kernel/task_work.c:180
resume_user_mode_work include/linux/resume_user_mode.h:50 [inline]
exit_to_user_mode_loop kernel/entry/common.c:114 [inline]
exit_to_user_mode_prepare include/linux/entry-common.h:328 [inline]
__syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
syscall_exit_to_user_mode+0x278/0x2a0 kernel/entry/common.c:218
do_syscall_64+0xda/0x250 arch/x86/entry/common.c:89
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f70d447e217
Code: b0 ff ff ff f7 d8 64 89 01 48 83 c8 ff c3 0f 1f 44 00 00 31 f6 e9 09 00 00 00 66 0f 1f 84 00 00 00 00 00 b8 a6 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 c7 c2 b0 ff ff ff f7 d8 64 89 02 b8
RSP: 002b:00007ffe311ba288 EFLAGS: 00000202 ORIG_RAX: 00000000000000a6
RAX: 0000000000000000 RBX: 0000000000000064 RCX: 00007f70d447e217
RDX: 0000000000000200 RSI: 0000000000000009 RDI: 00007ffe311bb430
RBP: 00007f70d44c8336 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000100 R11: 0000000000000202 R12: 00007ffe311bb430
R13: 00007f70d44c8336 R14: 000055556c846430 R15: 0000000000000005
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:__filemap_add_folio+0xd1d/0xe80 mm/filemap.c:873
Code: 37 8b 4c 89 f7 e8 23 68 10 00 90 0f 0b e8 9b 14 ce ff 48 c7 c6 e0 92 37 8b 4c 89 f7 e8 0c 68 10 00 90 0f 0b e8 84 14 ce ff 90 <0f> 0b e8 7c 14 ce ff 90 0f 0b 90 e9 24 fb ff ff e8 6e 14 ce ff 48
RSP: 0018:ffffc900035773f0 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffffff81bfc8cd
RDX: ffff888023052440 RSI: ffffffff81bfd0cc RDI: 0000000000000001
RBP: ffff88803233a9f0 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000003 R12: ffffc90003577468
R13: 0000000000000000 R14: ffffea0000b3f7c0 R15: 0000000000000000
FS: 000055556c846480(0000) GS:ffff88806b000000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f70d45a8000 CR3: 000000001ae02000 CR4: 0000000000350ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Tested on:
commit: 83a7eefe Linux 6.10-rc3
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
console output: https://syzkaller.appspot.com/x/log.txt?x=15608256980000
kernel config: https://syzkaller.appspot.com/x/.config?x=b8786f381e62940f
dashboard link: https://syzkaller.appspot.com/bug?extid=d79afb004be235636ee8
compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
patch: https://syzkaller.appspot.com/x/patch.diff?x=147bb012980000
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH] mm: truncate: flush lru cache for evicted inode
2024-06-16 2:39 ` [RFC PATCH] mm: truncate: flush lru cache for evicted inode Hillf Danton
2024-06-16 3:06 ` [syzbot] [nilfs?] [mm?] KASAN: slab-use-after-free Read in lru_add_fn syzbot
@ 2024-06-17 7:57 ` Jan Kara
2024-06-17 11:24 ` Ryusuke Konishi
1 sibling, 1 reply; 11+ messages in thread
From: Jan Kara @ 2024-06-17 7:57 UTC (permalink / raw)
To: Hillf Danton
Cc: Matthew Wilcox, linux-mm, Jan Kara, linux-kernel,
syzbot+d79afb004be235636ee8, linux-fsdevel, linux-nilfs,
Ryusuke Konishi
On Sun 16-06-24 10:39:51, Hillf Danton wrote:
> On Sat, 15 Jun 2024 21:44:54 +0100 Matthew Wilcox wrote:
> >
> > I suspect this would trigger:
> >
> > +++ b/fs/inode.c
> > @@ -282,6 +282,7 @@ static struct inode *alloc_inode(struct super_block *sb)
> > void __destroy_inode(struct inode *inode)
> > {
> > BUG_ON(inode_has_buffers(inode));
> > + BUG_ON(inode->i_data.nrpages);
> > inode_detach_wb(inode);
> > security_inode_free(inode);
> > fsnotify_inode_delete(inode);
> >
> Yes, it was triggered [1]
>
> [1] https://lore.kernel.org/lkml/00000000000084b401061af6ab80@google.com/
>
> and given trigger after nrpages is checked in clear_inode(),
>
> iput(inode)
> evict(inode)
> truncate_inode_pages_final(&inode->i_data);
> clear_inode(inode);
> destroy_inode(inode);
>
> why is folio added to exiting mapping?
>
> #syz test https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 83a7eefedc9b
OK, so based on syzbot results this seems to be a bug in
nilfs_evict_inode() (likely caused by corrupted filesystem so that root
inode's link count was 0 and hence was getting deleted on iput()). I guess
nilfs maintainers need to address these with more consistency checks of
metadata when loading them...
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH] mm: truncate: flush lru cache for evicted inode
2024-06-17 7:57 ` [RFC PATCH] mm: truncate: flush lru cache for evicted inode Jan Kara
@ 2024-06-17 11:24 ` Ryusuke Konishi
0 siblings, 0 replies; 11+ messages in thread
From: Ryusuke Konishi @ 2024-06-17 11:24 UTC (permalink / raw)
To: Jan Kara
Cc: Hillf Danton, Matthew Wilcox, linux-mm, linux-kernel,
syzbot+d79afb004be235636ee8, linux-fsdevel, linux-nilfs
On Mon, Jun 17, 2024 at 4:57 PM Jan Kara wrote:
>
> On Sun 16-06-24 10:39:51, Hillf Danton wrote:
> > On Sat, 15 Jun 2024 21:44:54 +0100 Matthew Wilcox wrote:
> > >
> > > I suspect this would trigger:
> > >
> > > +++ b/fs/inode.c
> > > @@ -282,6 +282,7 @@ static struct inode *alloc_inode(struct super_block *sb)
> > > void __destroy_inode(struct inode *inode)
> > > {
> > > BUG_ON(inode_has_buffers(inode));
> > > + BUG_ON(inode->i_data.nrpages);
> > > inode_detach_wb(inode);
> > > security_inode_free(inode);
> > > fsnotify_inode_delete(inode);
> > >
> > Yes, it was triggered [1]
> >
> > [1] https://lore.kernel.org/lkml/00000000000084b401061af6ab80@google.com/
> >
> > and given trigger after nrpages is checked in clear_inode(),
> >
> > iput(inode)
> > evict(inode)
> > truncate_inode_pages_final(&inode->i_data);
> > clear_inode(inode);
> > destroy_inode(inode);
> >
> > why is folio added to exiting mapping?
> >
> > #syz test https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 83a7eefedc9b
>
> OK, so based on syzbot results this seems to be a bug in
> nilfs_evict_inode() (likely caused by corrupted filesystem so that root
> inode's link count was 0 and hence was getting deleted on iput()). I guess
> nilfs maintainers need to address these with more consistency checks of
> metadata when loading them...
>
> Honza
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR
Sorry for my late response.
Also, thank you for pointing out that the problem seems to be caused
via nilfs_evict_inode() by a missing consistency check of the link
count.
I'll check it out and think about how to deal with it.
Thanks,
Ryusuke Konishi
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 0/3] nilfs2: fix potential issues related to reserved inodes
2024-06-16 3:06 ` [syzbot] [nilfs?] [mm?] KASAN: slab-use-after-free Read in lru_add_fn syzbot
@ 2024-06-23 5:11 ` Ryusuke Konishi
2024-06-23 5:11 ` [PATCH 1/3] nilfs2: fix inode number range checks Ryusuke Konishi
` (2 more replies)
0 siblings, 3 replies; 11+ messages in thread
From: Ryusuke Konishi @ 2024-06-23 5:11 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-nilfs, syzbot, syzkaller-bugs, LKML, hdanton, jack,
linux-fsdevel, willy
Hi Andrew, please apply this bug fix series.
This series fixes one use-after-free issue reported by syzbot, caused
by nilfs2's internal inode being exposed in the namespace on a
corrupted filesystem, and a couple of flaws that cause problems if the
starting number of non-reserved inodes written in the on-disk super
block is intentionally (or corruptly) changed from its default value.
Thanks,
Ryusuke Konishi
Ryusuke Konishi (3):
nilfs2: fix inode number range checks
nilfs2: add missing check for inode numbers on directory entries
nilfs2: fix incorrect inode allocation from reserved inodes
fs/nilfs2/alloc.c | 19 +++++++++++++++----
fs/nilfs2/alloc.h | 4 ++--
fs/nilfs2/dat.c | 2 +-
fs/nilfs2/dir.c | 6 ++++++
fs/nilfs2/ifile.c | 7 ++-----
fs/nilfs2/nilfs.h | 10 ++++++++--
fs/nilfs2/the_nilfs.c | 6 ++++++
fs/nilfs2/the_nilfs.h | 2 +-
8 files changed, 41 insertions(+), 15 deletions(-)
--
2.34.1
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 1/3] nilfs2: fix inode number range checks
2024-06-23 5:11 ` [PATCH 0/3] nilfs2: fix potential issues related to reserved inodes Ryusuke Konishi
@ 2024-06-23 5:11 ` Ryusuke Konishi
2024-06-23 5:11 ` [PATCH 2/3] nilfs2: add missing check for inode numbers on directory entries Ryusuke Konishi
2024-06-23 5:11 ` [PATCH 3/3] nilfs2: fix incorrect inode allocation from reserved inodes Ryusuke Konishi
2 siblings, 0 replies; 11+ messages in thread
From: Ryusuke Konishi @ 2024-06-23 5:11 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-nilfs, syzbot, syzkaller-bugs, LKML, hdanton, jack,
linux-fsdevel, willy
In the current implementation of nilfs2, "nilfs->ns_first_ino", which
gives the first non-reserved inode number, is read from the
superblock, but its lower limit is not checked.
As a result, if a number that overlaps with the inode number range of
reserved inodes such as the root directory or metadata files is set in
the super block parameter, the inode number test macros
(NILFS_MDT_INODE and NILFS_VALID_INODE) will not function properly.
In addition, these test macros use left bit-shift calculations using
with the inode number as the shift count via the BIT macro, but the
result of a shift calculation that exceeds the bit width of an integer
is undefined in the C specification, so if "ns_first_ino" is set to a
large value other than the default value NILFS_USER_INO (=11), the
macros may potentially malfunction depending on the environment.
Fix these issues by checking the lower bound of "nilfs->ns_first_ino"
and by preventing bit shifts equal to or greater than the
NILFS_USER_INO constant in the inode number test macros.
Also, change the type of "ns_first_ino" from signed integer to
unsigned integer to avoid the need for type casting in comparisons
such as the lower bound check introduced this time.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: stable@vger.kernel.org
---
fs/nilfs2/nilfs.h | 5 +++--
fs/nilfs2/the_nilfs.c | 6 ++++++
fs/nilfs2/the_nilfs.h | 2 +-
3 files changed, 10 insertions(+), 3 deletions(-)
diff --git a/fs/nilfs2/nilfs.h b/fs/nilfs2/nilfs.h
index 728e90be3570..7e39e277c77f 100644
--- a/fs/nilfs2/nilfs.h
+++ b/fs/nilfs2/nilfs.h
@@ -116,9 +116,10 @@ enum {
#define NILFS_FIRST_INO(sb) (((struct the_nilfs *)sb->s_fs_info)->ns_first_ino)
#define NILFS_MDT_INODE(sb, ino) \
- ((ino) < NILFS_FIRST_INO(sb) && (NILFS_MDT_INO_BITS & BIT(ino)))
+ ((ino) < NILFS_USER_INO && (NILFS_MDT_INO_BITS & BIT(ino)))
#define NILFS_VALID_INODE(sb, ino) \
- ((ino) >= NILFS_FIRST_INO(sb) || (NILFS_SYS_INO_BITS & BIT(ino)))
+ ((ino) >= NILFS_FIRST_INO(sb) || \
+ ((ino) < NILFS_USER_INO && (NILFS_SYS_INO_BITS & BIT(ino))))
/**
* struct nilfs_transaction_info: context information for synchronization
diff --git a/fs/nilfs2/the_nilfs.c b/fs/nilfs2/the_nilfs.c
index f41d7b6d432c..e44dde57ab65 100644
--- a/fs/nilfs2/the_nilfs.c
+++ b/fs/nilfs2/the_nilfs.c
@@ -452,6 +452,12 @@ static int nilfs_store_disk_layout(struct the_nilfs *nilfs,
}
nilfs->ns_first_ino = le32_to_cpu(sbp->s_first_ino);
+ if (nilfs->ns_first_ino < NILFS_USER_INO) {
+ nilfs_err(nilfs->ns_sb,
+ "too small lower limit for non-reserved inode numbers: %u",
+ nilfs->ns_first_ino);
+ return -EINVAL;
+ }
nilfs->ns_blocks_per_segment = le32_to_cpu(sbp->s_blocks_per_segment);
if (nilfs->ns_blocks_per_segment < NILFS_SEG_MIN_BLOCKS) {
diff --git a/fs/nilfs2/the_nilfs.h b/fs/nilfs2/the_nilfs.h
index 85da0629415d..1e829ed7b0ef 100644
--- a/fs/nilfs2/the_nilfs.h
+++ b/fs/nilfs2/the_nilfs.h
@@ -182,7 +182,7 @@ struct the_nilfs {
unsigned long ns_nrsvsegs;
unsigned long ns_first_data_block;
int ns_inode_size;
- int ns_first_ino;
+ unsigned int ns_first_ino;
u32 ns_crc_seed;
/* /sys/fs/<nilfs>/<device> */
--
2.34.1
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 2/3] nilfs2: add missing check for inode numbers on directory entries
2024-06-23 5:11 ` [PATCH 0/3] nilfs2: fix potential issues related to reserved inodes Ryusuke Konishi
2024-06-23 5:11 ` [PATCH 1/3] nilfs2: fix inode number range checks Ryusuke Konishi
@ 2024-06-23 5:11 ` Ryusuke Konishi
2024-06-23 5:11 ` [PATCH 3/3] nilfs2: fix incorrect inode allocation from reserved inodes Ryusuke Konishi
2 siblings, 0 replies; 11+ messages in thread
From: Ryusuke Konishi @ 2024-06-23 5:11 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-nilfs, syzbot, syzkaller-bugs, LKML, hdanton, jack,
linux-fsdevel, willy
Syzbot reported that mounting and unmounting a specific pattern of
corrupted nilfs2 filesystem images causes a use-after-free of metadata
file inodes, which triggers a kernel bug in lru_add_fn().
As Jan Kara pointed out, this is because the link count of a metadata
file gets corrupted to 0, and nilfs_evict_inode(), which is called
from iput(), tries to delete that inode (ifile inode in this case).
The inconsistency occurs because directories containing the inode
numbers of these metadata files that should not be visible in the
namespace are read without checking.
Fix this issue by treating the inode numbers of these internal files
as errors in the sanity check helper when reading directory
folios/pages.
Also thanks to Hillf Danton and Matthew Wilcox for their initial
mm-layer analysis.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+d79afb004be235636ee8@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d79afb004be235636ee8
Reported-by: Jan Kara <jack@suse.cz>
Closes: https://lkml.kernel.org/r/20240617075758.wewhukbrjod5fp5o@quack3
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: stable@vger.kernel.org
---
fs/nilfs2/dir.c | 6 ++++++
fs/nilfs2/nilfs.h | 5 +++++
2 files changed, 11 insertions(+)
diff --git a/fs/nilfs2/dir.c b/fs/nilfs2/dir.c
index 52e50b1b7f22..dddfa604491a 100644
--- a/fs/nilfs2/dir.c
+++ b/fs/nilfs2/dir.c
@@ -135,6 +135,9 @@ static bool nilfs_check_folio(struct folio *folio, char *kaddr)
goto Enamelen;
if (((offs + rec_len - 1) ^ offs) & ~(chunk_size-1))
goto Espan;
+ if (unlikely(p->inode &&
+ NILFS_PRIVATE_INODE(le64_to_cpu(p->inode))))
+ goto Einumber;
}
if (offs != limit)
goto Eend;
@@ -160,6 +163,9 @@ static bool nilfs_check_folio(struct folio *folio, char *kaddr)
goto bad_entry;
Espan:
error = "directory entry across blocks";
+ goto bad_entry;
+Einumber:
+ error = "disallowed inode number";
bad_entry:
nilfs_error(sb,
"bad entry in directory #%lu: %s - offset=%lu, inode=%lu, rec_len=%zd, name_len=%d",
diff --git a/fs/nilfs2/nilfs.h b/fs/nilfs2/nilfs.h
index 7e39e277c77f..4017f7856440 100644
--- a/fs/nilfs2/nilfs.h
+++ b/fs/nilfs2/nilfs.h
@@ -121,6 +121,11 @@ enum {
((ino) >= NILFS_FIRST_INO(sb) || \
((ino) < NILFS_USER_INO && (NILFS_SYS_INO_BITS & BIT(ino))))
+#define NILFS_PRIVATE_INODE(ino) ({ \
+ ino_t __ino = (ino); \
+ ((__ino) < NILFS_USER_INO && (__ino) != NILFS_ROOT_INO && \
+ (__ino) != NILFS_SKETCH_INO); })
+
/**
* struct nilfs_transaction_info: context information for synchronization
* @ti_magic: Magic number
--
2.34.1
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 3/3] nilfs2: fix incorrect inode allocation from reserved inodes
2024-06-23 5:11 ` [PATCH 0/3] nilfs2: fix potential issues related to reserved inodes Ryusuke Konishi
2024-06-23 5:11 ` [PATCH 1/3] nilfs2: fix inode number range checks Ryusuke Konishi
2024-06-23 5:11 ` [PATCH 2/3] nilfs2: add missing check for inode numbers on directory entries Ryusuke Konishi
@ 2024-06-23 5:11 ` Ryusuke Konishi
2 siblings, 0 replies; 11+ messages in thread
From: Ryusuke Konishi @ 2024-06-23 5:11 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-nilfs, syzbot, syzkaller-bugs, LKML, hdanton, jack,
linux-fsdevel, willy
If the bitmap block that manages the inode allocation status is
corrupted, nilfs_ifile_create_inode() may allocate a new inode from
the reserved inode area where it should not be allocated.
Previous fix commit d325dc6eb763 ("nilfs2: fix use-after-free bug of
struct nilfs_root"), fixed the problem that reserved inodes with inode
numbers less than NILFS_USER_INO (=11) were incorrectly reallocated
due to bitmap corruption, but since the start number of non-reserved
inodes is read from the super block and may change, in which case
inode allocation may occur from the extended reserved inode area.
If that happens, access to that inode will cause an IO error, causing
the file system to degrade to an error state.
Fix this potential issue by adding a wraparound option to the common
metadata object allocation routine and by modifying
nilfs_ifile_create_inode() to disable the option so that it only
allocates inodes with inode numbers greater than or equal to the inode
number read in "nilfs->ns_first_ino", regardless of the bitmap status
of reserved inodes.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: stable@vger.kernel.org
---
fs/nilfs2/alloc.c | 19 +++++++++++++++----
fs/nilfs2/alloc.h | 4 ++--
fs/nilfs2/dat.c | 2 +-
fs/nilfs2/ifile.c | 7 ++-----
4 files changed, 20 insertions(+), 12 deletions(-)
diff --git a/fs/nilfs2/alloc.c b/fs/nilfs2/alloc.c
index 89caef7513db..ba50388ee4bf 100644
--- a/fs/nilfs2/alloc.c
+++ b/fs/nilfs2/alloc.c
@@ -377,11 +377,12 @@ void *nilfs_palloc_block_get_entry(const struct inode *inode, __u64 nr,
* @target: offset number of an entry in the group (start point)
* @bsize: size in bits
* @lock: spin lock protecting @bitmap
+ * @wrap: whether to wrap around
*/
static int nilfs_palloc_find_available_slot(unsigned char *bitmap,
unsigned long target,
unsigned int bsize,
- spinlock_t *lock)
+ spinlock_t *lock, bool wrap)
{
int pos, end = bsize;
@@ -397,6 +398,8 @@ static int nilfs_palloc_find_available_slot(unsigned char *bitmap,
end = target;
}
+ if (!wrap)
+ return -ENOSPC;
/* wrap around */
for (pos = 0; pos < end; pos++) {
@@ -495,9 +498,10 @@ int nilfs_palloc_count_max_entries(struct inode *inode, u64 nused, u64 *nmaxp)
* nilfs_palloc_prepare_alloc_entry - prepare to allocate a persistent object
* @inode: inode of metadata file using this allocator
* @req: nilfs_palloc_req structure exchanged for the allocation
+ * @wrap: whether to wrap around
*/
int nilfs_palloc_prepare_alloc_entry(struct inode *inode,
- struct nilfs_palloc_req *req)
+ struct nilfs_palloc_req *req, bool wrap)
{
struct buffer_head *desc_bh, *bitmap_bh;
struct nilfs_palloc_group_desc *desc;
@@ -516,7 +520,7 @@ int nilfs_palloc_prepare_alloc_entry(struct inode *inode,
entries_per_group = nilfs_palloc_entries_per_group(inode);
for (i = 0; i < ngroups; i += n) {
- if (group >= ngroups) {
+ if (group >= ngroups && wrap) {
/* wrap around */
group = 0;
maxgroup = nilfs_palloc_group(inode, req->pr_entry_nr,
@@ -550,7 +554,14 @@ int nilfs_palloc_prepare_alloc_entry(struct inode *inode,
bitmap_kaddr = kmap_local_page(bitmap_bh->b_page);
bitmap = bitmap_kaddr + bh_offset(bitmap_bh);
pos = nilfs_palloc_find_available_slot(
- bitmap, group_offset, entries_per_group, lock);
+ bitmap, group_offset, entries_per_group, lock,
+ wrap);
+ /*
+ * Since the search for a free slot in the second and
+ * subsequent bitmap blocks always starts from the
+ * beginning, the wrap flag only has an effect on the
+ * first search.
+ */
kunmap_local(bitmap_kaddr);
if (pos >= 0)
goto found;
diff --git a/fs/nilfs2/alloc.h b/fs/nilfs2/alloc.h
index b667e869ac07..d825a9faca6d 100644
--- a/fs/nilfs2/alloc.h
+++ b/fs/nilfs2/alloc.h
@@ -50,8 +50,8 @@ struct nilfs_palloc_req {
struct buffer_head *pr_entry_bh;
};
-int nilfs_palloc_prepare_alloc_entry(struct inode *,
- struct nilfs_palloc_req *);
+int nilfs_palloc_prepare_alloc_entry(struct inode *inode,
+ struct nilfs_palloc_req *req, bool wrap);
void nilfs_palloc_commit_alloc_entry(struct inode *,
struct nilfs_palloc_req *);
void nilfs_palloc_abort_alloc_entry(struct inode *, struct nilfs_palloc_req *);
diff --git a/fs/nilfs2/dat.c b/fs/nilfs2/dat.c
index 180fc8d36213..fc1caf63a42a 100644
--- a/fs/nilfs2/dat.c
+++ b/fs/nilfs2/dat.c
@@ -75,7 +75,7 @@ int nilfs_dat_prepare_alloc(struct inode *dat, struct nilfs_palloc_req *req)
{
int ret;
- ret = nilfs_palloc_prepare_alloc_entry(dat, req);
+ ret = nilfs_palloc_prepare_alloc_entry(dat, req, true);
if (ret < 0)
return ret;
diff --git a/fs/nilfs2/ifile.c b/fs/nilfs2/ifile.c
index 612e609158b5..1e86b9303b7c 100644
--- a/fs/nilfs2/ifile.c
+++ b/fs/nilfs2/ifile.c
@@ -56,13 +56,10 @@ int nilfs_ifile_create_inode(struct inode *ifile, ino_t *out_ino,
struct nilfs_palloc_req req;
int ret;
- req.pr_entry_nr = 0; /*
- * 0 says find free inode from beginning
- * of a group. dull code!!
- */
+ req.pr_entry_nr = NILFS_FIRST_INO(ifile->i_sb);
req.pr_entry_bh = NULL;
- ret = nilfs_palloc_prepare_alloc_entry(ifile, &req);
+ ret = nilfs_palloc_prepare_alloc_entry(ifile, &req, false);
if (!ret) {
ret = nilfs_palloc_get_entry_block(ifile, req.pr_entry_nr, 1,
&req.pr_entry_bh);
--
2.34.1
^ permalink raw reply related [flat|nested] 11+ messages in thread
end of thread, other threads:[~2024-06-23 5:11 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <ZmxIvIJ3YSZDwbPW@casper.infradead.org>
[not found] ` <20240614235953.809-1-hdanton@sina.com>
2024-06-15 20:44 ` [RFC PATCH] mm: truncate: flush lru cache for evicted inode Matthew Wilcox
2024-06-15 23:52 ` Hillf Danton
2024-06-16 0:10 ` [syzbot] [nilfs?] [mm?] KASAN: slab-use-after-free Read in lru_add_fn syzbot
2024-06-16 2:39 ` [RFC PATCH] mm: truncate: flush lru cache for evicted inode Hillf Danton
2024-06-16 3:06 ` [syzbot] [nilfs?] [mm?] KASAN: slab-use-after-free Read in lru_add_fn syzbot
2024-06-23 5:11 ` [PATCH 0/3] nilfs2: fix potential issues related to reserved inodes Ryusuke Konishi
2024-06-23 5:11 ` [PATCH 1/3] nilfs2: fix inode number range checks Ryusuke Konishi
2024-06-23 5:11 ` [PATCH 2/3] nilfs2: add missing check for inode numbers on directory entries Ryusuke Konishi
2024-06-23 5:11 ` [PATCH 3/3] nilfs2: fix incorrect inode allocation from reserved inodes Ryusuke Konishi
2024-06-17 7:57 ` [RFC PATCH] mm: truncate: flush lru cache for evicted inode Jan Kara
2024-06-17 11:24 ` Ryusuke Konishi
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).