Re: [syzbot] [net?] kernel BUG in filemap

netdev.vger.kernel.org archive mirror

* Re: [syzbot] [net?] kernel BUG in filemap_fault (2)
       [not found] <684ffc59.a00a0220.279073.0037.GAE@google.com>
@ 2025-07-03 16:43 ` syzbot
  2025-09-14 10:51 ` [syzbot] [sound?] " syzbot
  1 sibling, 0 replies; 7+ messages in thread
From: syzbot @ 2025-07-03 16:43 UTC (permalink / raw)
  To: davem, edumazet, horms, kuba, kuniyu, linux-kernel, linux-sound,
	netdev, pabeni, perex, syzkaller-bugs, tiwai, willemb

syzbot has found a reproducer for the following issue on:

HEAD commit:    b4911fb0b060 Merge tag 'mmc-v6.16-rc1' of git://git.kernel..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=16568c8c580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=3f6ddf055b5c86f8
dashboard link: https://syzkaller.appspot.com/bug?extid=263f159eb37a1c4c67a4
compiler:       Debian clang version 20.1.7 (++20250616065708+6146a88f6049-1~exp1~20250616065826.132), Debian LLD 20.1.7
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=157cf48c580000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=146a948c580000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/5828d857f454/disk-b4911fb0.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/ecf6b7e29d2b/vmlinux-b4911fb0.xz
kernel image: https://storage.googleapis.com/syzbot-assets/cc43b227e03d/bzImage-b4911fb0.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+263f159eb37a1c4c67a4@syzkaller.appspotmail.com

 __kasan_slab_alloc+0x22/0x80 mm/kasan/common.c:329
 kasan_slab_alloc include/linux/kasan.h:250 [inline]
 slab_post_alloc_hook mm/slub.c:4148 [inline]
 slab_alloc_node mm/slub.c:4197 [inline]
 kmem_cache_alloc_node_noprof+0x1bb/0x3c0 mm/slub.c:4249
 __alloc_skb+0x112/0x2d0 net/core/skbuff.c:660
 alloc_skb include/linux/skbuff.h:1336 [inline]
 __ip6_append_data+0x2b8c/0x3de0 net/ipv6/ip6_output.c:1668
 ip6_append_data+0x1c4/0x380 net/ipv6/ip6_output.c:1858
 rawv6_sendmsg+0x124b/0x17f0 net/ipv6/raw.c:911
 sock_sendmsg_nosec net/socket.c:712 [inline]
 __sock_sendmsg+0x19c/0x270 net/socket.c:727
 ____sys_sendmsg+0x52d/0x830 net/socket.c:2566
 ___sys_sendmsg+0x21f/0x2a0 net/socket.c:2620
 __sys_sendmmsg+0x227/0x430 net/socket.c:2709
------------[ cut here ]------------
kernel BUG at mm/filemap.c:3442!
Oops: invalid opcode: 0000 [#1] SMP KASAN PTI
CPU: 1 UID: 0 PID: 9236 Comm: syz.3.1035 Not tainted 6.16.0-rc4-syzkaller-00049-gb4911fb0b060 #0 PREEMPT(full) 
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/07/2025
RIP: 0010:filemap_fault+0x117d/0x1200 mm/filemap.c:3442
Code: 38 c1 0f 8c 8e fc ff ff 4c 89 e7 e8 8d 6b 29 00 e9 81 fc ff ff e8 a3 13 c8 ff 48 89 df 48 c7 c6 a0 30 94 8b e8 d4 ae 0d 00 90 <0f> 0b e8 8c 13 c8 ff 48 8b 3c 24 48 c7 c6 20 37 94 8b e8 bc ae 0d
RSP: 0018:ffffc900030976e0 EFLAGS: 00010246
RAX: 864bab26a5780700 RBX: ffffea0001e53880 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff8d96e815 RDI: 00000000ffffffff
RBP: ffffc90003097818 R08: ffffffff8f9fdbf7 R09: 1ffffffff1f3fb7e
R10: dffffc0000000000 R11: fffffbfff1f3fb7f R12: dffffc0000000000
R13: 1ffffd40003ca711 R14: ffffea0001e53898 R15: ffffea0001e53888
FS:  00007f9550ed76c0(0000) GS:ffff888125d84000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000200000002000 CR3: 0000000059b4a000 CR4: 00000000003526f0
Call Trace:
 <TASK>
 __do_fault+0x135/0x390 mm/memory.c:5169
 do_shared_fault mm/memory.c:5654 [inline]
 do_fault mm/memory.c:5728 [inline]
 do_pte_missing mm/memory.c:4251 [inline]
 handle_pte_fault mm/memory.c:6069 [inline]
 __handle_mm_fault+0x198b/0x5620 mm/memory.c:6212
 handle_mm_fault+0x2d5/0x7f0 mm/memory.c:6381
 do_user_addr_fault+0x764/0x1390 arch/x86/mm/fault.c:1387
 handle_page_fault arch/x86/mm/fault.c:1476 [inline]
 exc_page_fault+0x76/0xf0 arch/x86/mm/fault.c:1532
 asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:623
RIP: 0010:__put_user_4+0xd/0x20 arch/x86/lib/putuser.S:94
Code: 66 89 01 31 c9 0f 01 ca e9 00 3b 03 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 48 89 cb 48 c1 fb 3f 48 09 d9 0f 01 cb <89> 01 31 c9 0f 01 ca e9 d7 3a 03 00 90 90 90 90 90 90 90 90 90 90
RSP: 0018:ffffc90003097c98 EFLAGS: 00050206
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00002000000015b8
RDX: 0000000000000000 RSI: ffffffff8db5a681 RDI: ffffffff8be1b940
RBP: ffffc90003097eb0 R08: 0000000000000000 R09: ffffffff820a3bc0
R10: dffffc0000000000 R11: ffffed100b371081 R12: 0000200000001580
R13: 0000000000040000 R14: 0000200000000480 R15: 0000000000000044
 __sys_sendmmsg+0x25f/0x430 net/socket.c:2714
 __do_sys_sendmmsg net/socket.c:2736 [inline]
 __se_sys_sendmmsg net/socket.c:2733 [inline]
 __x64_sys_sendmmsg+0xa0/0xc0 net/socket.c:2733
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f954ff8e929
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f9550ed7038 EFLAGS: 00000246 ORIG_RAX: 0000000000000133
RAX: ffffffffffffffda RBX: 00007f95501b5fa0 RCX: 00007f954ff8e929
RDX: 00000000000002e9 RSI: 0000200000000480 RDI: 0000000000000004
RBP: 00007f9550010b39 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 00007f95501b5fa0 R15: 00007ffcd704c578
 </TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:filemap_fault+0x117d/0x1200 mm/filemap.c:3442
Code: 38 c1 0f 8c 8e fc ff ff 4c 89 e7 e8 8d 6b 29 00 e9 81 fc ff ff e8 a3 13 c8 ff 48 89 df 48 c7 c6 a0 30 94 8b e8 d4 ae 0d 00 90 <0f> 0b e8 8c 13 c8 ff 48 8b 3c 24 48 c7 c6 20 37 94 8b e8 bc ae 0d
RSP: 0018:ffffc900030976e0 EFLAGS: 00010246
RAX: 864bab26a5780700 RBX: ffffea0001e53880 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff8d96e815 RDI: 00000000ffffffff
RBP: ffffc90003097818 R08: ffffffff8f9fdbf7 R09: 1ffffffff1f3fb7e
R10: dffffc0000000000 R11: fffffbfff1f3fb7f R12: dffffc0000000000
R13: 1ffffd40003ca711 R14: ffffea0001e53898 R15: ffffea0001e53888
FS:  00007f9550ed76c0(0000) GS:ffff888125d84000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000200000002000 CR3: 0000000059b4a000 CR4: 00000000003526f0
----------------
Code disassembly (best guess):
   0:	66 89 01             	mov    %ax,(%rcx)
   3:	31 c9                	xor    %ecx,%ecx
   5:	0f 01 ca             	clac
   8:	e9 00 3b 03 00       	jmp    0x33b0d
   d:	90                   	nop
   e:	90                   	nop
   f:	90                   	nop
  10:	90                   	nop
  11:	90                   	nop
  12:	90                   	nop
  13:	90                   	nop
  14:	90                   	nop
  15:	90                   	nop
  16:	90                   	nop
  17:	90                   	nop
  18:	90                   	nop
  19:	90                   	nop
  1a:	90                   	nop
  1b:	90                   	nop
  1c:	90                   	nop
  1d:	48 89 cb             	mov    %rcx,%rbx
  20:	48 c1 fb 3f          	sar    $0x3f,%rbx
  24:	48 09 d9             	or     %rbx,%rcx
  27:	0f 01 cb             	stac
* 2a:	89 01                	mov    %eax,(%rcx) <-- trapping instruction
  2c:	31 c9                	xor    %ecx,%ecx
  2e:	0f 01 ca             	clac
  31:	e9 d7 3a 03 00       	jmp    0x33b0d
  36:	90                   	nop
  37:	90                   	nop
  38:	90                   	nop
  39:	90                   	nop
  3a:	90                   	nop
  3b:	90                   	nop
  3c:	90                   	nop
  3d:	90                   	nop
  3e:	90                   	nop
  3f:	90                   	nop


---
If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.


* Re: [syzbot] [sound?] kernel BUG in filemap_fault (2)
       [not found] <684ffc59.a00a0220.279073.0037.GAE@google.com>
  2025-07-03 16:43 ` [syzbot] [net?] kernel BUG in filemap_fault (2) syzbot
@ 2025-09-14 10:51 ` syzbot
  2025-09-16 12:50   ` Ryan Roberts
  1 sibling, 1 reply; 7+ messages in thread
From: syzbot @ 2025-09-14 10:51 UTC (permalink / raw)
  To: akpm, chaitanyas.prakash, davem, david, edumazet, hdanton, horms,
	jack, kuba, kuniyu, linux-kernel, linux-sound, netdev, pabeni,
	perex, ryan.roberts, syzkaller-bugs, tiwai, willemb

syzbot suspects this issue was fixed by commit:

commit bdb86f6b87633cc020f8225ae09d336da7826724
Author: Ryan Roberts <ryan.roberts@arm.com>
Date:   Mon Jun 9 09:27:23 2025 +0000

    mm/readahead: honour new_order in page_cache_ra_order()

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=1100b934580000
start commit:   b4911fb0b060 Merge tag 'mmc-v6.16-rc1' of git://git.kernel..
git tree:       upstream
kernel config:  https://syzkaller.appspot.com/x/.config?x=3f6ddf055b5c86f8
dashboard link: https://syzkaller.appspot.com/bug?extid=263f159eb37a1c4c67a4
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=157cf48c580000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=146a948c580000

If the result looks correct, please mark the issue as fixed by replying with:

#syz fix: mm/readahead: honour new_order in page_cache_ra_order()

For information about bisection process see: https://goo.gl/tpsmEJ#bisection


* Re: [syzbot] [sound?] kernel BUG in filemap_fault (2)
  2025-09-14 10:51 ` [syzbot] [sound?] " syzbot
@ 2025-09-16 12:50   ` Ryan Roberts
  2025-09-16 13:05     ` Jan Kara
  0 siblings, 1 reply; 7+ messages in thread
From: Ryan Roberts @ 2025-09-16 12:50 UTC (permalink / raw)
  To: syzbot, akpm, chaitanyas.prakash, davem, david, edumazet, hdanton,
	horms, jack, kuba, kuniyu, linux-kernel, linux-sound, netdev,
	pabeni, perex, syzkaller-bugs, tiwai, willemb

On 14/09/2025 11:51, syzbot wrote:
> syzbot suspects this issue was fixed by commit:
> 
> commit bdb86f6b87633cc020f8225ae09d336da7826724
> Author: Ryan Roberts <ryan.roberts@arm.com>
> Date:   Mon Jun 9 09:27:23 2025 +0000
> 
>     mm/readahead: honour new_order in page_cache_ra_order()

I'm not sure what original bug you are claiming this is fixing? Perhaps this?

https://lore.kernel.org/linux-mm/6852b77e.a70a0220.79d0a.0214.GAE@google.com/

If so, the fix for that was squashed into the original patch before it was
merged upstream. That is now Commit 38b0ece6d763 ("mm/filemap: allow arch to
request folio size for exec memory").

Thanks,
Ryan


> 
> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=1100b934580000
> start commit:   b4911fb0b060 Merge tag 'mmc-v6.16-rc1' of git://git.kernel..
> git tree:       upstream
> kernel config:  https://syzkaller.appspot.com/x/.config?x=3f6ddf055b5c86f8
> dashboard link: https://syzkaller.appspot.com/bug?extid=263f159eb37a1c4c67a4
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=157cf48c580000
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=146a948c580000
> 
> If the result looks correct, please mark the issue as fixed by replying with:
> 
> #syz fix: mm/readahead: honour new_order in page_cache_ra_order()
> 
> For information about bisection process see: https://goo.gl/tpsmEJ#bisection



* Re: [syzbot] [sound?] kernel BUG in filemap_fault (2)
  2025-09-16 12:50   ` Ryan Roberts
@ 2025-09-16 13:05     ` Jan Kara
  2025-09-17  7:57       ` David Hildenbrand
  0 siblings, 1 reply; 7+ messages in thread
From: Jan Kara @ 2025-09-16 13:05 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: syzbot, akpm, chaitanyas.prakash, davem, david, edumazet, hdanton,
	horms, jack, kuba, kuniyu, linux-kernel, linux-sound, netdev,
	pabeni, perex, syzkaller-bugs, tiwai, willemb

On Tue 16-09-25 13:50:08, Ryan Roberts wrote:
> On 14/09/2025 11:51, syzbot wrote:
> > syzbot suspects this issue was fixed by commit:
> > 
> > commit bdb86f6b87633cc020f8225ae09d336da7826724
> > Author: Ryan Roberts <ryan.roberts@arm.com>
> > Date:   Mon Jun 9 09:27:23 2025 +0000
> > 
> >     mm/readahead: honour new_order in page_cache_ra_order()
> 
> I'm not sure what original bug you are claiming this is fixing? Perhaps this?
> 
> https://lore.kernel.org/linux-mm/6852b77e.a70a0220.79d0a.0214.GAE@google.com/

I think it was:

https://lore.kernel.org/all/684ffc59.a00a0220.279073.0037.GAE@google.com/

at least that's what the syzbot email replies to... And it doesn't make a
lot of sense but it isn't totally off either. So I'd just let the syzbot
bug autoclose after some timeout.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [syzbot] [sound?] kernel BUG in filemap_fault (2)
  2025-09-16 13:05     ` Jan Kara
@ 2025-09-17  7:57       ` David Hildenbrand
  2025-09-17  8:35         ` Jan Kara
  0 siblings, 1 reply; 7+ messages in thread
From: David Hildenbrand @ 2025-09-17  7:57 UTC (permalink / raw)
  To: Jan Kara, Ryan Roberts
  Cc: syzbot, akpm, chaitanyas.prakash, davem, edumazet, hdanton, horms,
	kuba, kuniyu, linux-kernel, linux-sound, netdev, pabeni, perex,
	syzkaller-bugs, tiwai, willemb

On 16.09.25 15:05, Jan Kara wrote:
> On Tue 16-09-25 13:50:08, Ryan Roberts wrote:
>> On 14/09/2025 11:51, syzbot wrote:
>>> syzbot suspects this issue was fixed by commit:
>>>
>>> commit bdb86f6b87633cc020f8225ae09d336da7826724
>>> Author: Ryan Roberts <ryan.roberts@arm.com>
>>> Date:   Mon Jun 9 09:27:23 2025 +0000
>>>
>>>      mm/readahead: honour new_order in page_cache_ra_order()
>>
>> I'm not sure what original bug you are claiming this is fixing? Perhaps this?
>>
>> https://lore.kernel.org/linux-mm/6852b77e.a70a0220.79d0a.0214.GAE@google.com/
> 
> I think it was:
> 
> https://lore.kernel.org/all/684ffc59.a00a0220.279073.0037.GAE@google.com/
> 
> at least that's what the syzbot email replies to... And it doesn't make a
> lot of sense but it isn't totally off either. So I'd just let the syzbot
> bug autoclose after some timeout.

Hm, the issue we ran into was:

	VM_BUG_ON_FOLIO(!folio_contains(folio, index), folio);

in filemap_fault().

Now, that sounds rather bad, especially given that it was reported upstream.

So likely we should figure out what happened and see if it really fixed 
it and if so, why it fixed it (stable backports etc)?

Could be that Ryan's patch is just making the problem harder to 
reproduce, of course (that's what I assume right now).


Essentially we do a

	folio = filemap_get_folio(mapping, index);

followed by

	if (!lock_folio_maybe_drop_mmap(vmf, folio, &fpin))
		goto out_retry;

	/* Did it get truncated? */
	if (unlikely(folio->mapping != mapping)) {
		folio_unlock(folio);
		folio_put(folio);
		goto retry_find;
	}
	VM_BUG_ON_FOLIO(!folio_contains(folio, index), folio);


I would assume that if !folio_contains(folio, index), either the folio 
got split in the meantime (filemap_get_folio() returned with a raised 
reference, though) or that file pagecache contained something wrong.
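
To make the failing check concrete, here is a minimal userspace sketch of the range test behind folio_contains(), and of how a split to order 0 would invalidate it for an index further into the old range. The struct model_folio type and the model_* helpers are hypothetical stand-ins written for illustration; they are not the kernel's definitions, and the split model is deliberately simplified (the head is assumed to keep only its first page).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two folio properties the check reads. */
struct model_folio {
	unsigned long index;    /* page-cache index of the folio's first page */
	unsigned long nr_pages; /* number of pages the folio covers */
};

/*
 * Range test in the style of folio_contains(): the unsigned subtraction
 * wraps around when index < folio->index, so a single comparison rejects
 * indexes on either side of the folio.
 */
bool model_folio_contains(const struct model_folio *folio, unsigned long index)
{
	return index - folio->index < folio->nr_pages;
}

/*
 * Simplified model of a split to order 0: the folio we still hold a
 * pointer to now covers only its first page.
 */
void model_split_to_order0(struct model_folio *folio)
{
	folio->nr_pages = 1;
}
```

With a 4-page folio at index 16, a fault at index 18 passes the check before the modelled split and fails it afterwards, which is the situation the VM_BUG_ON_FOLIO() would catch.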


In __filemap_get_folio() we perform the same checks after locking the 
folio (with FGP_LOCK), and weirdly enough it hasn't triggered there yet.

-- 
Cheers

David / dhildenb



* Re: [syzbot] [sound?] kernel BUG in filemap_fault (2)
  2025-09-17  7:57       ` David Hildenbrand
@ 2025-09-17  8:35         ` Jan Kara
  2025-09-17  9:04           ` David Hildenbrand
  0 siblings, 1 reply; 7+ messages in thread
From: Jan Kara @ 2025-09-17  8:35 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Jan Kara, Ryan Roberts, syzbot, akpm, chaitanyas.prakash, davem,
	edumazet, hdanton, horms, kuba, kuniyu, linux-kernel, linux-sound,
	netdev, pabeni, perex, syzkaller-bugs, tiwai, willemb

On Wed 17-09-25 09:57:19, David Hildenbrand wrote:
> On 16.09.25 15:05, Jan Kara wrote:
> > On Tue 16-09-25 13:50:08, Ryan Roberts wrote:
> > > On 14/09/2025 11:51, syzbot wrote:
> > > > syzbot suspects this issue was fixed by commit:
> > > > 
> > > > commit bdb86f6b87633cc020f8225ae09d336da7826724
> > > > Author: Ryan Roberts <ryan.roberts@arm.com>
> > > > Date:   Mon Jun 9 09:27:23 2025 +0000
> > > > 
> > > >      mm/readahead: honour new_order in page_cache_ra_order()
> > > 
> > > I'm not sure what original bug you are claiming this is fixing? Perhaps this?
> > > 
> > > https://lore.kernel.org/linux-mm/6852b77e.a70a0220.79d0a.0214.GAE@google.com/
> > 
> > I think it was:
> > 
> > https://lore.kernel.org/all/684ffc59.a00a0220.279073.0037.GAE@google.com/
> > 
> > at least that's what the syzbot email replies to... And it doesn't make a
> > lot of sense but it isn't totally off either. So I'd just let the syzbot
> > bug autoclose after some timeout.
> 
> Hm, the issue we ran into was:
> 
> 	VM_BUG_ON_FOLIO(!folio_contains(folio, index), folio);
> 
> in filemap_fault().
> 
> Now, that sounds rather bad, especially given that it was reported upstream.
> 
> So likely we should figure out what happened and see if it really fixed it
> and if so, why it fixed it (stable backports etc)?

Ok, ok, fair enough ;)

> Could be that Ryan's patch is just making the problem harder to reproduce, of
> course (what I assume right now).
> 
> Essentially we do a
> 
> 	folio = filemap_get_folio(mapping, index);
> 
> followed by
> 
> 	if (!lock_folio_maybe_drop_mmap(vmf, folio, &fpin))
> 		goto out_retry;
> 
> 	/* Did it get truncated? */
> 	if (unlikely(folio->mapping != mapping)) {
> 		folio_unlock(folio);
> 		folio_put(folio);
> 		goto retry_find;
> 	}
> 	VM_BUG_ON_FOLIO(!folio_contains(folio, index), folio);
> 
> 
> I would assume that if !folio_contains(folio, index), either the folio got
> split in the meantime (filemap_get_folio() returned with a raised reference,
> though) or that file pagecache contained something wrong.

Right.

> In __filemap_get_folio() we perform the same checks after locking the folio
> (with FGP_LOCK), and weird enough it didn't trigger yet there.

But we don't call __filemap_get_folio() with FGP_LOCK from filemap_fault().
The folio locking is handled by lock_folio_maybe_drop_mmap() as you
mentioned. So this is the first time we do the assert after getting the
folio AFAICT. So some race with a folio split looks plausible. Checking the
reproducer, it does play with mmap(2) and madvise(MADV_REMOVE) over the
mapped range, so the page fault may be racing with
truncate_inode_partial_folio()->try_folio_split(). But I don't see the race
there now...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [syzbot] [sound?] kernel BUG in filemap_fault (2)
  2025-09-17  8:35         ` Jan Kara
@ 2025-09-17  9:04           ` David Hildenbrand
  0 siblings, 0 replies; 7+ messages in thread
From: David Hildenbrand @ 2025-09-17  9:04 UTC (permalink / raw)
  To: Jan Kara
  Cc: Ryan Roberts, syzbot, akpm, chaitanyas.prakash, davem, edumazet,
	hdanton, horms, kuba, kuniyu, linux-kernel, linux-sound, netdev,
	pabeni, perex, syzkaller-bugs, tiwai, willemb

>>
>> 	if (!lock_folio_maybe_drop_mmap(vmf, folio, &fpin))
>> 		goto out_retry;
>>
>> 	/* Did it get truncated? */
>> 	if (unlikely(folio->mapping != mapping)) {
>> 		folio_unlock(folio);
>> 		folio_put(folio);
>> 		goto retry_find;
>> 	}
>> 	VM_BUG_ON_FOLIO(!folio_contains(folio, index), folio);
>>
>>
>> I would assume that if !folio_contains(folio, index), either the folio got
>> split in the meantime (filemap_get_folio() returned with a raised reference,
>> though) or that file pagecache contained something wrong.
> 
> Right.
> 
>> In __filemap_get_folio() we perform the same checks after locking the folio
>> (with FGP_LOCK), and weird enough it didn't trigger yet there.
> 
> But we don't call __filemap_get_folio() with FGP_LOCK from filemap_fault().

Yes. I should have clarified that we haven't seen the VM_BUG_ON_FOLIO() 
trigger on other callpaths that set FGP_LOCK, because I would think the 
very same problem could happen there as well.

> The folio locking is handled by lock_folio_maybe_drop_mmap() as you
> mentioned. So this is the first time we do the assert after getting the
> folio AFAICT. So some race with folio split looks plausible. Checking the
> reproducer it does play with mmap(2) and madvise(MADV_REMOVE) over the
> mapped range so the page fault may be racing with
> truncate_inode_partial_folio()->try_folio_split(). But I don't see the race
> there now...

__filemap_get_folio() will grab a reference and verify that the xarray 
didn't change. So having a concurrent split succeed would be weird, 
because freezing the refcount should fail. Of course, some refcounting 
inconsistency could trigger something weird like that.
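
The refcount argument above can be sketched in userspace C11 as follows. This is only a model under stated assumptions: struct model_folio_ref and the model_ref_* helpers are hypothetical names, the freeze is modelled as a compare-and-exchange from the expected count to zero, and taking a reference is modelled as increment-unless-zero. It is not the kernel implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a folio's reference count. */
struct model_folio_ref {
	atomic_int refcount;
};

/*
 * Speculative reference, as a lookup would take it: increment the count
 * unless it is zero (zero models a frozen or freed folio).
 */
bool model_ref_try_get(struct model_folio_ref *f)
{
	int old = atomic_load(&f->refcount);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&f->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop one reference. */
void model_ref_put(struct model_folio_ref *f)
{
	atomic_fetch_sub(&f->refcount, 1);
}

/*
 * Freeze, as a split would attempt it: succeeds only if the count is
 * exactly 'expected', in which case it is set to zero.
 */
bool model_ref_freeze(struct model_folio_ref *f, int expected)
{
	int old = expected;

	return atomic_compare_exchange_strong(&f->refcount, &old, 0);
}
```

In this model, a fault path that already holds its extra reference makes the freeze fail, and a folio that is already frozen makes the speculative get fail, so a split completing underneath a held reference would indeed point at a refcounting inconsistency.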

I can spot that we are also manually calling 
__filemap_get_folio(FGP_CREAT|FGP_FOR_MMAP) on the else path if 
filemap_get_folio() failed. Maybe that's the problematic bit (and maybe 
that's where the readahead logic makes a difference).

-- 
Cheers

David / dhildenb


