[syzbot] [mm?] WARNING in xfs_init_fs_context

linux-mm.kvack.org archive mirror

* [syzbot] [mm?] WARNING in xfs_init_fs_context
@ 2025-06-29 22:47 syzbot
  2025-07-01 15:01 ` Zi Yan
  0 siblings, 1 reply; 7+ messages in thread
From: syzbot @ 2025-06-29 22:47 UTC
  To: akpm, apopple, byungchul, david, gourry, joshua.hahnjy,
	linux-kernel, linux-mm, matthew.brost, rakie.kim, syzkaller-bugs,
	ying.huang, ziy

Hello,

syzbot found the following issue on:

HEAD commit:    dfba48a70cb6 Merge tag 'i2c-for-6.16-rc4' of git://git.ker..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=14a62982580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=79da270cec5ffd65
dashboard link: https://syzkaller.appspot.com/bug?extid=359a67b608de1ef72f65
compiler:       Debian clang version 20.1.6 (++20250514063057+1e4d39e07757-1~exp1~20250514183223.118), Debian LLD 20.1.6

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/d900f083ada3/non_bootable_disk-dfba48a7.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/783560258712/vmlinux-dfba48a7.xz
kernel image: https://storage.googleapis.com/syzbot-assets/685ad235ac7b/bzImage-dfba48a7.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+359a67b608de1ef72f65@syzkaller.appspotmail.com

loop0: detected capacity change from 0 to 32768
------------[ cut here ]------------
WARNING: CPU: 0 PID: 5325 at mm/page_alloc.c:4430 __alloc_pages_slowpath+0xcb3/0xce0 mm/page_alloc.c:4430
Modules linked in:
CPU: 0 UID: 0 PID: 5325 Comm: syz.0.0 Not tainted 6.16.0-rc3-syzkaller-00329-gdfba48a70cb6 #0 PREEMPT(full) 
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
RIP: 0010:__alloc_pages_slowpath+0xcb3/0xce0 mm/page_alloc.c:4430
Code: d8 48 c1 e8 03 0f b6 04 08 84 c0 75 2e f6 43 01 08 48 8b 14 24 0f 84 a2 f3 ff ff 90 0f 0b 90 e9 99 f3 ff ff e8 ae 09 50 09 90 <0f> 0b 90 f7 c5 00 04 00 00 75 bc 90 0f 0b 90 eb b6 89 d9 80 e1 07
RSP: 0018:ffffc9000d62f970 EFLAGS: 00010246
RAX: 1378840d66abe400 RBX: 0000000000000002 RCX: dffffc0000000000
RDX: ffffc9000d62fa80 RSI: 0000000000000002 RDI: 0000000000048cc0
RBP: 0000000000048cc0 R08: ffff88801b68003f R09: 1ffff110036d0007
R10: dffffc0000000000 R11: ffffed10036d0008 R12: ffffc9000d62fa80
R13: 1ffff92001ac5f4c R14: 0000000000048cc0 R15: dffffc0000000000
FS:  00007fcd1cbf46c0(0000) GS:ffff88808d250000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fcd1bd726e0 CR3: 0000000043166000 CR4: 0000000000352ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 __alloc_frozen_pages_noprof+0x319/0x370 mm/page_alloc.c:4972
 alloc_pages_mpol+0x232/0x4a0 mm/mempolicy.c:2419
 alloc_slab_page mm/slub.c:2451 [inline]
 allocate_slab+0xe2/0x3b0 mm/slub.c:2627
 new_slab mm/slub.c:2673 [inline]
 ___slab_alloc+0xbfc/0x1480 mm/slub.c:3859
 __slab_alloc mm/slub.c:3949 [inline]
 __slab_alloc_node mm/slub.c:4024 [inline]
 slab_alloc_node mm/slub.c:4185 [inline]
 __kmalloc_cache_noprof+0x296/0x3d0 mm/slub.c:4354
 kmalloc_noprof include/linux/slab.h:905 [inline]
 kzalloc_noprof include/linux/slab.h:1039 [inline]
 xfs_init_fs_context+0x54/0x500 fs/xfs/xfs_super.c:2279
 alloc_fs_context+0x651/0x7d0 fs/fs_context.c:318
 do_new_mount+0x10e/0xa40 fs/namespace.c:3881
 do_mount fs/namespace.c:4239 [inline]
 __do_sys_mount fs/namespace.c:4450 [inline]
 __se_sys_mount+0x317/0x410 fs/namespace.c:4427
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fcd1bd900ca
Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb a6 e8 de 1a 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fcd1cbf3e68 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 00007fcd1cbf3ef0 RCX: 00007fcd1bd900ca
RDX: 0000200000000040 RSI: 0000200000009640 RDI: 00007fcd1cbf3eb0
RBP: 0000200000000040 R08: 00007fcd1cbf3ef0 R09: 0000000000208800
R10: 0000000000208800 R11: 0000000000000246 R12: 0000200000009640
R13: 00007fcd1cbf3eb0 R14: 000000000000964b R15: 0000200000001340
 </TASK>


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup


* Re: [syzbot] [mm?] WARNING in xfs_init_fs_context
  2025-06-29 22:47 [syzbot] [mm?] WARNING in xfs_init_fs_context syzbot
@ 2025-07-01 15:01 ` Zi Yan
       [not found]   ` <1921ec99-7abb-42f1-a56b-d1f0f5bc1377@I-love.SAKURA.ne.jp>
  0 siblings, 1 reply; 7+ messages in thread
From: Zi Yan @ 2025-07-01 15:01 UTC
  To: Vlastimil Babka, Barry Song
  Cc: syzbot, akpm, apopple, byungchul, david, gourry, joshua.hahnjy,
	linux-kernel, linux-mm, matthew.brost, rakie.kim, syzkaller-bugs,
	ying.huang

On 29 Jun 2025, at 18:47, syzbot wrote:

> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit:    dfba48a70cb6 Merge tag 'i2c-for-6.16-rc4' of git://git.ker..
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=14a62982580000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=79da270cec5ffd65
> dashboard link: https://syzkaller.appspot.com/bug?extid=359a67b608de1ef72f65
> compiler:       Debian clang version 20.1.6 (++20250514063057+1e4d39e07757-1~exp1~20250514183223.118), Debian LLD 20.1.6
>
> Unfortunately, I don't have any reproducer for this issue yet.
>
> Downloadable assets:
> disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/d900f083ada3/non_bootable_disk-dfba48a7.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/783560258712/vmlinux-dfba48a7.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/685ad235ac7b/bzImage-dfba48a7.xz
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+359a67b608de1ef72f65@syzkaller.appspotmail.com
>
> loop0: detected capacity change from 0 to 32768
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 5325 at mm/page_alloc.c:4430

This warning fires when one tries to allocate a page of order greater
than 1 (more than two contiguous pages) with __GFP_NOFAIL.
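As a rough illustration, the condition can be modelled in user space as below. This is a simplified sketch, not the kernel code: it assumes 4 KiB pages (as on the x86-64 QEMU guest in this report), the sketch_* names are hypothetical, and the real check in __alloc_pages_slowpath() sits alongside other __GFP_NOFAIL sanity checks that are not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: 4 KiB pages. */
#define SKETCH_PAGE_SIZE ((size_t)4096)

/* Bytes covered by one physically contiguous allocation of this order. */
static size_t sketch_order_bytes(unsigned int order)
{
	return SKETCH_PAGE_SIZE << order;
}

/*
 * Simplified model of the warning condition: a __GFP_NOFAIL request
 * of order > 1 warns. Only the order part of the check is modelled.
 */
static bool sketch_nofail_warns(unsigned int order, bool nofail)
{
	return nofail && order > 1;
}
```

With 4 KiB pages, order 1 covers 8 KiB, so under this model 8 KiB is the largest single allocation that can carry __GFP_NOFAIL without tripping the check.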

> __alloc_pages_slowpath+0xcb3/0xce0 mm/page_alloc.c:4430
> Modules linked in:
> CPU: 0 UID: 0 PID: 5325 Comm: syz.0.0 Not tainted 6.16.0-rc3-syzkaller-00329-gdfba48a70cb6 #0 PREEMPT(full)
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
> RIP: 0010:__alloc_pages_slowpath+0xcb3/0xce0 mm/page_alloc.c:4430
> Code: d8 48 c1 e8 03 0f b6 04 08 84 c0 75 2e f6 43 01 08 48 8b 14 24 0f 84 a2 f3 ff ff 90 0f 0b 90 e9 99 f3 ff ff e8 ae 09 50 09 90 <0f> 0b 90 f7 c5 00 04 00 00 75 bc 90 0f 0b 90 eb b6 89 d9 80 e1 07
> RSP: 0018:ffffc9000d62f970 EFLAGS: 00010246
> RAX: 1378840d66abe400 RBX: 0000000000000002 RCX: dffffc0000000000
> RDX: ffffc9000d62fa80 RSI: 0000000000000002 RDI: 0000000000048cc0
> RBP: 0000000000048cc0 R08: ffff88801b68003f R09: 1ffff110036d0007
> R10: dffffc0000000000 R11: ffffed10036d0008 R12: ffffc9000d62fa80
> R13: 1ffff92001ac5f4c R14: 0000000000048cc0 R15: dffffc0000000000
> FS:  00007fcd1cbf46c0(0000) GS:ffff88808d250000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007fcd1bd726e0 CR3: 0000000043166000 CR4: 0000000000352ef0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  <TASK>
>  __alloc_frozen_pages_noprof+0x319/0x370 mm/page_alloc.c:4972
>  alloc_pages_mpol+0x232/0x4a0 mm/mempolicy.c:2419
>  alloc_slab_page mm/slub.c:2451 [inline]
>  allocate_slab+0xe2/0x3b0 mm/slub.c:2627
>  new_slab mm/slub.c:2673 [inline]

new_slab() lets __GFP_NOFAIL through, since GFP_RECLAIM_MASK includes it.
In allocate_slab(), the first allocation (without __GFP_NOFAIL)
failed; the retry used __GFP_NOFAIL, but the kmem_cache order
was greater than 1, which led to the warning above.
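The two attempts can be sketched as follows, following the allocate_slab() excerpt that Tetsuo quotes later in this thread. The flag bits below are hypothetical stand-ins, not the kernel's gfp_t values, and only the handling relevant to __GFP_NOFAIL is modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for this sketch; NOT the kernel's gfp_t values. */
enum {
	SK_DIRECT_RECLAIM = 1u << 0,
	SK_KSWAPD_RECLAIM = 1u << 1,
	SK_NOWARN         = 1u << 2,
	SK_NORETRY        = 1u << 3,
	SK_NOFAIL         = 1u << 4,
	SK_NOMEMALLOC     = 1u << 5,
};
#define SK_RECLAIM (SK_DIRECT_RECLAIM | SK_KSWAPD_RECLAIM)

/*
 * First attempt at the preferred order: __GFP_NOFAIL is always dropped,
 * and above the minimum order reclaim is dropped as well, so the attempt
 * may fail quickly under memory pressure.
 */
static unsigned int sketch_first_attempt_flags(unsigned int flags,
					       unsigned int order,
					       unsigned int min_order)
{
	unsigned int gfp = (flags | SK_NOWARN | SK_NORETRY) & ~SK_NOFAIL;

	if ((gfp & SK_DIRECT_RECLAIM) && order > min_order)
		gfp = (gfp | SK_NOMEMALLOC) & ~SK_RECLAIM;
	return gfp;
}

/* Fallback attempt: the caller's flags unchanged, at the minimum order. */
static unsigned int sketch_fallback_flags(unsigned int flags)
{
	return flags;
}

/* Would the fallback attempt hit the order > 1 __GFP_NOFAIL warning? */
static bool sketch_fallback_warns(unsigned int flags, unsigned int min_order)
{
	return (sketch_fallback_flags(flags) & SK_NOFAIL) && min_order > 1;
}
```

In this model the warning depends only on the minimum order, because the fallback is the only attempt that still carries __GFP_NOFAIL.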

Maybe allocate_slab() should just fail when the kmem_cache
order is too big and the first attempt fails? I am no expert,
so I am adding Vlastimil for help. Barry, who added the nofail
warning, is cc’d.


>  ___slab_alloc+0xbfc/0x1480 mm/slub.c:3859
>  __slab_alloc mm/slub.c:3949 [inline]
>  __slab_alloc_node mm/slub.c:4024 [inline]
>  slab_alloc_node mm/slub.c:4185 [inline]
>  __kmalloc_cache_noprof+0x296/0x3d0 mm/slub.c:4354
>  kmalloc_noprof include/linux/slab.h:905 [inline]
>  kzalloc_noprof include/linux/slab.h:1039 [inline]
>  xfs_init_fs_context+0x54/0x500 fs/xfs/xfs_super.c:2279
>  alloc_fs_context+0x651/0x7d0 fs/fs_context.c:318
>  do_new_mount+0x10e/0xa40 fs/namespace.c:3881
>  do_mount fs/namespace.c:4239 [inline]
>  __do_sys_mount fs/namespace.c:4450 [inline]
>  __se_sys_mount+0x317/0x410 fs/namespace.c:4427
>  do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
>  do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94
>  entry_SYSCALL_64_after_hwframe+0x77/0x7f
> RIP: 0033:0x7fcd1bd900ca
> Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb a6 e8 de 1a 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
> RSP: 002b:00007fcd1cbf3e68 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
> RAX: ffffffffffffffda RBX: 00007fcd1cbf3ef0 RCX: 00007fcd1bd900ca
> RDX: 0000200000000040 RSI: 0000200000009640 RDI: 00007fcd1cbf3eb0
> RBP: 0000200000000040 R08: 00007fcd1cbf3ef0 R09: 0000000000208800
> R10: 0000000000208800 R11: 0000000000000246 R12: 0000200000009640
> R13: 00007fcd1cbf3eb0 R14: 000000000000964b R15: 0000200000001340
>  </TASK>
>
>
> ---
> This report is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkaller@googlegroups.com.
>
> syzbot will keep track of this issue. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
>
> If the report is already addressed, let syzbot know by replying with:
> #syz fix: exact-commit-title
>
> If you want to overwrite report's subsystems, reply with:
> #syz set subsystems: new-subsystem
> (See the list of subsystem names on the web dashboard)
>
> If the report is a duplicate of another one, reply with:
> #syz dup: exact-subject-of-another-report
>
> If you want to undo deduplication, reply with:
> #syz undup


Best Regards,
Yan, Zi


[parent not found: <1921ec99-7abb-42f1-a56b-d1f0f5bc1377@I-love.SAKURA.ne.jp>]

* Re: [syzbot] [mm?] WARNING in xfs_init_fs_context
       [not found]   ` <1921ec99-7abb-42f1-a56b-d1f0f5bc1377@I-love.SAKURA.ne.jp>
@ 2025-07-02  7:30     ` Vlastimil Babka
  2025-07-04  8:26       ` Harry Yoo
  2025-07-07 22:10       ` Dave Chinner
  0 siblings, 2 replies; 7+ messages in thread
From: Vlastimil Babka @ 2025-07-02  7:30 UTC
  To: Tetsuo Handa, Zi Yan, Barry Song, Carlos Maiolino, linux-xfs,
	Dave Chinner
  Cc: syzbot, akpm, apopple, byungchul, david, gourry, joshua.hahnjy,
	linux-kernel, linux-mm, matthew.brost, rakie.kim, syzkaller-bugs,
	ying.huang, Harry Yoo, Michal Hocko, Matthew Wilcox

+CC xfs and a few more

On 7/2/25 3:41 AM, Tetsuo Handa wrote:
> On 2025/07/02 0:01, Zi Yan wrote:
>>>  __alloc_frozen_pages_noprof+0x319/0x370 mm/page_alloc.c:4972
>>>  alloc_pages_mpol+0x232/0x4a0 mm/mempolicy.c:2419
>>>  alloc_slab_page mm/slub.c:2451 [inline]
>>>  allocate_slab+0xe2/0x3b0 mm/slub.c:2627
>>>  new_slab mm/slub.c:2673 [inline]
>>
>> new_slab() allows __GFP_NOFAIL, since GFP_RECLAIM_MASK has it.
>> In allocate_slab(), the first allocation without __GFP_NOFAIL
>> failed, the retry used __GFP_NOFAIL but kmem_cache order
>> was greater than 1, which led to the warning above.
>>
>> Maybe allocate_slab() should just fail when kmem_cache
>> order is too big and first trial fails? I am no expert,
>> so add Vlastimil for help.

Thanks Zi. Slab shouldn't fail with __GFP_NOFAIL, that would only lead
to subsystems like xfs to reintroduce their own forever retrying
wrappers again. I think it's doing the best it can for the fallback
attempt by using the minimum order, so the warning will never happen due
to the calculated optimal order being too large, but only if the
kmalloc()/kmem_cache_alloc() requested/object size is too large itself.
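A minimal sketch of this argument, assuming 4 KiB pages: the fallback order is derived only from the per-object size, so a higher preferred order never reaches the __GFP_NOFAIL check. struct sketch_cache and its fields are hypothetical; the preferred order of 3 for an 8 KiB cache in the usage below follows Dave Chinner's remark later in the thread that the kmalloc-8k slab is typically order-3.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: 4 KiB pages. */
#define SKETCH_PAGE_SIZE ((size_t)4096)

/* Smallest order whose block (PAGE_SIZE << order) holds `size` bytes. */
static unsigned int sketch_get_order(size_t size)
{
	unsigned int order = 0;

	while ((SKETCH_PAGE_SIZE << order) < size)
		order++;
	return order;
}

/* Hypothetical summary of a cache: per-object size and preferred order. */
struct sketch_cache {
	size_t object_size;           /* includes any debug/KASAN metadata */
	unsigned int preferred_order; /* used only by the first attempt */
};

/* The fallback uses the minimum order that fits a single object. */
static unsigned int sketch_min_order(const struct sketch_cache *s)
{
	return sketch_get_order(s->object_size);
}

/* Only the fallback carries __GFP_NOFAIL, so only its order matters. */
static bool sketch_nofail_fallback_warns(const struct sketch_cache *s)
{
	return sketch_min_order(s) > 1;
}
```

Under these assumptions, the warning needs a per-object size above 8 KiB, whatever the preferred order is.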

Hm but perhaps enabling slab_debug can inflate it over the threshold, is
it the case here? I think in that rare case we could convert such
fallback allocations to large kmalloc to avoid adding the debugging
overhead - we can't easily create an individual slab page without the
debugging layout for a kmalloc cache with debugging enabled.

>> Barry, who added the nofail
>> warning is cc’d.

Barry's commit 903edea6c53f0 reorganized the warnings, but the warning
already existed long before.

> Indeed. In allocate_slab(struct kmem_cache *s, gfp_t flags, int node),
> 
> 	/*
> 	 * Let the initial higher-order allocation fail under memory pressure
> 	 * so we fall-back to the minimum order allocation.
> 	 */
> 	alloc_gfp = (flags | __GFP_NOWARN | __GFP_NORETRY) & ~__GFP_NOFAIL;
> 	if ((alloc_gfp & __GFP_DIRECT_RECLAIM) && oo_order(oo) > oo_order(s->min))
> 		alloc_gfp = (alloc_gfp | __GFP_NOMEMALLOC) & ~__GFP_RECLAIM;
> 
> 	slab = alloc_slab_page(alloc_gfp, node, oo);
> 	if (unlikely(!slab)) {
> 		oo = s->min;
> 		alloc_gfp = flags;
> 		/*
> 		 * Allocation may have failed due to fragmentation.
> 		 * Try a lower order alloc if possible
> 		 */
> 		slab = alloc_slab_page(alloc_gfp, node, oo);
> 
> __GFP_NOFAIL needs to be dropped unless s->min is either 0 or 1.

No, that would violate __GFP_NOFAIL semantics.

> 
> 		if (unlikely(!slab))
> 			return NULL;
> 		stat(s, ORDER_FALLBACK);
> 	}
> 
> 
> 
> By the way, why is xfs_init_fs_context() using __GFP_NOFAIL ?
> 
> 	mp = kzalloc(sizeof(struct xfs_mount), GFP_KERNEL | __GFP_NOFAIL);
> 	if (!mp)
> 		return -ENOMEM;
> 
> This looks like an allocation attempt which can fail safely.

Indeed. Dave Chinner's commit f078d4ea82760 ("xfs: convert kmem_alloc()
to kmalloc()") dropped the xfs wrapper. This allocation didn't use
KM_MAYFAIL so it got __GFP_NOFAIL. The commit mentions this high-order
nofail issue for another allocation site that had to use xlog_kvmalloc().

I think either this allocation really can fail as the code (return
-ENOMEM) suggests and thus can drop __GFP_NOFAIL, or it can use
kvmalloc() - I think the wrapper for that can be removed now too after
the discussion in [1] resulted in commit 46459154f997 ("mm: kvmalloc:
make kmalloc fast path real fast path").

[1] https://lore.kernel.org/all/Z_XI6vBE8v_cIhjZ@dread.disaster.area/


* Re: [syzbot] [mm?] WARNING in xfs_init_fs_context
  2025-07-02  7:30     ` Vlastimil Babka
@ 2025-07-04  8:26       ` Harry Yoo
  2025-07-07 16:57         ` Vlastimil Babka
  2025-07-07 22:10       ` Dave Chinner
  1 sibling, 1 reply; 7+ messages in thread
From: Harry Yoo @ 2025-07-04  8:26 UTC
  To: Vlastimil Babka
  Cc: Tetsuo Handa, Zi Yan, Barry Song, Carlos Maiolino, linux-xfs,
	Dave Chinner, syzbot, akpm, apopple, byungchul, david, gourry,
	joshua.hahnjy, linux-kernel, linux-mm, matthew.brost, rakie.kim,
	syzkaller-bugs, ying.huang, Michal Hocko, Matthew Wilcox

On Wed, Jul 02, 2025 at 09:30:30AM +0200, Vlastimil Babka wrote:
> +CC xfs and a few more
> 
> On 7/2/25 3:41 AM, Tetsuo Handa wrote:
> > On 2025/07/02 0:01, Zi Yan wrote:
> >>>  __alloc_frozen_pages_noprof+0x319/0x370 mm/page_alloc.c:4972
> >>>  alloc_pages_mpol+0x232/0x4a0 mm/mempolicy.c:2419
> >>>  alloc_slab_page mm/slub.c:2451 [inline]
> >>>  allocate_slab+0xe2/0x3b0 mm/slub.c:2627
> >>>  new_slab mm/slub.c:2673 [inline]
> >>
> >> new_slab() allows __GFP_NOFAIL, since GFP_RECLAIM_MASK has it.
> >> In allocate_slab(), the first allocation without __GFP_NOFAIL
> >> failed, the retry used __GFP_NOFAIL but kmem_cache order
> >> was greater than 1, which led to the warning above.
> >>
> >> Maybe allocate_slab() should just fail when kmem_cache
> >> order is too big and first trial fails? I am no expert,
> >> so add Vlastimil for help.
> 
> Thanks Zi. Slab shouldn't fail with __GFP_NOFAIL, that would only lead
> to subsystems like xfs to reintroduce their own forever retrying
> wrappers again. I think it's doing the best it can for the fallback
> attempt by using the minimum order, so the warning will never happen due
> to the calculated optimal order being too large, but only if the
> kmalloc()/kmem_cache_alloc() requested/object size is too large itself.

Right. The warning would trigger only if the object size is bigger
than 8k (PAGE_SIZE * 2).

> Hm but perhaps enabling slab_debug can inflate it over the threshold, is
> it the case here?

CONFIG_CMDLINE="earlyprintk=serial net.ifnames=0 sysctl.kernel.hung_task_all_cpu_backtrace=1 ima_policy=tcb nf-conntrack-ftp.ports=20000 nf-conntrack-tftp.ports=20000 nf-conntrack-sip.ports=20000 nf-conntrack-irc.ports=20000 nf-conntrack-sane.ports=20000 binder.debug_mask=0 rcupdate.rcu_expedited=1 rcupdate.rcu_cpu_stall_cputime=1 no_hash_pointers page_owner=on sysctl.vm.nr_hugepages=4 sysctl.vm.nr_overcommit_hugepages=4 secretmem.enable=1 sysctl.max_rcu_stall_to_panic=1 msr.allow_writes=off coredump_filter=0xffff root=/dev/sda console=ttyS0 vsyscall=native numa=fake=2 kvm-intel.nested=1 spec_store_bypass_disable=prctl nopcid vivid.n_devs=64 vivid.multiplanar=1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2 netrom.nr_ndevs=32 rose.rose_ndevs=32 smp.csd_lock_timeout=100000 watchdog_thresh=55 workqueue.watchdog_thresh=140 sysctl.net.core.netdev_unregister_timeout_secs=140 dummy_hcd.num=32 max_loop=32 nbds_max=32 panic_on_warn=1"

CONFIG_SLUB_DEBUG=y
# CONFIG_SLUB_DEBUG_ON is not set

It seems no slab_debug is involved here.

I downloaded the config and built the kernel, and
sizeof(struct xfs_mount) is 4480 bytes. It should have allocated using
order 1?

Not sure why the min order was greater than 1?
Not sure what I'm missing...

> I think in that rare case we could convert such
> fallback allocations to large kmalloc to avoid adding the debugging
> overhead - we can't easily create an individual slab page without the
> debugging layout for a kmalloc cache with debugging enabled.

Yeah, that could be doable when the size is exactly 8k or very close to 8k.

-- 
Cheers,
Harry / Hyeonggon


* Re: [syzbot] [mm?] WARNING in xfs_init_fs_context
  2025-07-04  8:26       ` Harry Yoo
@ 2025-07-07 16:57         ` Vlastimil Babka
  0 siblings, 0 replies; 7+ messages in thread
From: Vlastimil Babka @ 2025-07-07 16:57 UTC
  To: Harry Yoo
  Cc: Tetsuo Handa, Zi Yan, Barry Song, Carlos Maiolino, linux-xfs,
	Dave Chinner, syzbot, akpm, apopple, byungchul, david, gourry,
	joshua.hahnjy, linux-kernel, linux-mm, matthew.brost, rakie.kim,
	syzkaller-bugs, ying.huang, Michal Hocko, Matthew Wilcox

On 7/4/25 10:26, Harry Yoo wrote:
> On Wed, Jul 02, 2025 at 09:30:30AM +0200, Vlastimil Babka wrote:
>> +CC xfs and a few more
>> 
>> On 7/2/25 3:41 AM, Tetsuo Handa wrote:
>> > On 2025/07/02 0:01, Zi Yan wrote:
>> >>>  __alloc_frozen_pages_noprof+0x319/0x370 mm/page_alloc.c:4972
>> >>>  alloc_pages_mpol+0x232/0x4a0 mm/mempolicy.c:2419
>> >>>  alloc_slab_page mm/slub.c:2451 [inline]
>> >>>  allocate_slab+0xe2/0x3b0 mm/slub.c:2627
>> >>>  new_slab mm/slub.c:2673 [inline]
>> >>
>> >> new_slab() allows __GFP_NOFAIL, since GFP_RECLAIM_MASK has it.
>> >> In allocate_slab(), the first allocation without __GFP_NOFAIL
>> >> failed, the retry used __GFP_NOFAIL but kmem_cache order
>> >> was greater than 1, which led to the warning above.
>> >>
>> >> Maybe allocate_slab() should just fail when kmem_cache
>> >> order is too big and first trial fails? I am no expert,
>> >> so add Vlastimil for help.
>> 
>> Thanks Zi. Slab shouldn't fail with __GFP_NOFAIL, that would only lead
>> to subsystems like xfs to reintroduce their own forever retrying
>> wrappers again. I think it's doing the best it can for the fallback
>> attempt by using the minimum order, so the warning will never happen due
>> to the calculated optimal order being too large, but only if the
>> kmalloc()/kmem_cache_alloc() requested/object size is too large itself.
> 
> Right. The warning would trigger only if the object size is bigger
> than 8k (PAGE_SIZE * 2).
> 
>> Hm but perhaps enabling slab_debug can inflate it over the threshold, is
>> it the case here?
> 
> CONFIG_CMDLINE="earlyprintk=serial net.ifnames=0 sysctl.kernel.hung_task_all_cpu_backtrace=1 ima_policy=tcb nf-conntrack-ftp.ports=20000 nf-conntrack-tftp.ports=20000 nf-conntrack-sip.ports=20000 nf-conntrack-irc.ports=20000 nf-conntrack-sane.ports=20000 binder.debug_mask=0 rcupdate.rcu_expedited=1 rcupdate.rcu_cpu_stall_cputime=1 no_hash_pointers page_owner=on sysctl.vm.nr_hugepages=4 sysctl.vm.nr_overcommit_hugepages=4 secretmem.enable=1 sysctl.max_rcu_stall_to_panic=1 msr.allow_writes=off coredump_filter=0xffff root=/dev/sda console=ttyS0 vsyscall=native numa=fake=2 kvm-intel.nested=1 spec_store_bypass_disable=prctl nopcid vivid.n_devs=64 vivid.multiplanar=1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2 netrom.nr_ndevs=32 rose.rose_ndevs=32 smp.csd_lock_timeout=100000 watchdog_thresh=55 workqueue.watchdog_thresh=140 sysctl.net.core.netdev_unregister_timeout_secs=140 dummy_hcd.num=32 max_loop=32 nbds_max=32 panic_on_warn=1"
> 
> CONFIG_SLUB_DEBUG=y
> # CONFIG_SLUB_DEBUG_ON is not set
> 
> It seems no slab_debug is involved here.
> 
> I downloaded the config and built the kernel, and
> sizeof(struct xfs_mount) is 4480 bytes. It should have allocated using
> order 1?

So it should be the kmalloc-8k cache, whose min order should be get_order(8k),
i.e. 1. If the object were larger than 8k, it would be a large kmalloc anyway
and would also trigger the __GFP_NOFAIL warning, but with a different stack trace.
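To illustrate the size-class step, here is a simplified sketch assuming 4 KiB pages and an 8 KiB (two-page) upper limit for the kmalloc caches, as described above. It rounds the request up to the next power of two and ignores the kmalloc-96/-192 caches, alignment and the separate kmalloc cache types; sketch_kmalloc_cache_size() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Assumptions: 4 KiB pages; largest kmalloc cache is two pages (8 KiB). */
#define SKETCH_PAGE_SIZE         ((size_t)4096)
#define SKETCH_KMALLOC_MAX_CACHE (2 * SKETCH_PAGE_SIZE)

/*
 * Returns the N of the kmalloc-N cache serving a request, or 0 when the
 * request is a "large kmalloc" that goes to the page allocator directly.
 */
static size_t sketch_kmalloc_cache_size(size_t size)
{
	size_t cache = 8; /* smallest class in this simplified model */

	if (size > SKETCH_KMALLOC_MAX_CACHE)
		return 0;
	while (cache < size)
		cache <<= 1; /* round up to the next power of two */
	return cache;
}
```

Both sizes reported in this thread (4480 bytes from Harry's build, 4224 bytes from pahole) land in kmalloc-8k under this model, while anything above 8 KiB bypasses the slab caches.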

> Not sure why the min order was greater than 1?
> Not sure what I'm missing...

The only sane explanation is that slab debugging is enabled, not via
CONFIG_CMDLINE but via options passed to the qemu invocation? But I don't see
those, nor the full dmesg (which would report them), in the syzbot dashboard.

Hm or actually it might be kasan_cache_create() bumping our size when called
from calculate_sizes(). KASAN seems enabled...

>> I think in that rare case we could convert such
>> fallback allocations to large kmalloc to avoid adding the debugging
>> overhead - we can't easily create an individual slab page without the
>> debugging layout for a kmalloc cache with debugging enabled.
> 
> Yeah, that could be doable when the size is exactly 8k or very close to 8k.
> 



* Re: [syzbot] [mm?] WARNING in xfs_init_fs_context
  2025-07-02  7:30     ` Vlastimil Babka
  2025-07-04  8:26       ` Harry Yoo
@ 2025-07-07 22:10       ` Dave Chinner
  2025-07-08  8:50         ` Vlastimil Babka
  1 sibling, 1 reply; 7+ messages in thread
From: Dave Chinner @ 2025-07-07 22:10 UTC
  To: Vlastimil Babka
  Cc: Tetsuo Handa, Zi Yan, Barry Song, Carlos Maiolino, linux-xfs,
	syzbot, akpm, apopple, byungchul, david, gourry, joshua.hahnjy,
	linux-kernel, linux-mm, matthew.brost, rakie.kim, syzkaller-bugs,
	ying.huang, Harry Yoo, Michal Hocko, Matthew Wilcox

On Wed, Jul 02, 2025 at 09:30:30AM +0200, Vlastimil Babka wrote:
> On 7/2/25 3:41 AM, Tetsuo Handa wrote:
> > By the way, why is xfs_init_fs_context() using __GFP_NOFAIL ?
> > 
> > 	mp = kzalloc(sizeof(struct xfs_mount), GFP_KERNEL | __GFP_NOFAIL);
> > 	if (!mp)
> > 		return -ENOMEM;
> > 
> > This looks like an allocation attempt which can fail safely.

It's irrelevant - it shouldn't fail regardless of __GFP_NOFAIL being
specified.

> Indeed. Dave Chinner's commit f078d4ea82760 ("xfs: convert kmem_alloc()
> to kmalloc()") dropped the xfs wrapper. This allocation didn't use
> KM_MAYFAIL so it got __GFP_NOFAIL. The commit mentions this high-order
> nofail issue for another allocation site that had to use xlog_kvmalloc().

I don't see how high-order allocation behaviour is relevant here.

Pahole says the struct xfs_mount is 4224 bytes in length. It is an
order-1 allocation and if we've fragmented memory so badly that slab
can't allocate an order-1 page then *lots* of other stuff is going
to be stalling. (e.g. slab pages for inodes are typically order-3,
same as the kmalloc-8k slab).

Note that the size of the structure is largely because of the
embedded cpumask for inodegc:

	struct cpumask             m_inodegc_cpumask;    /*  3104  1024 */

This should probably be pulled out into a dynamically allocated
inodegc specific structure. Then the struct xfs_mount is only an
order-0 allocation and should never fail, regardless of
__GFP_NOFAIL being specified or not.
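A sketch of the suggested layout change in plain user-space C. The structure names are hypothetical, the 3200-byte placeholder is simply 4224 minus the 1024-byte cpumask from the pahole numbers above, and NR_CPUS = 8192 is assumed so that the bitmap is 1024 bytes.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumption: NR_CPUS = 8192, so the bitmap is 8192 / 8 = 1024 bytes. */
#define SKETCH_NR_CPUS 8192
struct sketch_cpumask {
	unsigned long bits[SKETCH_NR_CPUS / (8 * sizeof(unsigned long))];
};

/* Before: the cpumask is embedded in the mount structure. */
struct sketch_mount_embedded {
	char other_fields[3200]; /* placeholder for the remaining fields */
	struct sketch_cpumask inodegc_cpumask;
};

/* After: inodegc state lives in its own, separately allocated struct. */
struct sketch_inodegc {
	struct sketch_cpumask cpumask;
};

struct sketch_mount_split {
	char other_fields[3200];
	struct sketch_inodegc *inodegc;
};

/* Allocate both parts; on partial failure free what was allocated. */
static struct sketch_mount_split *sketch_mount_alloc(void)
{
	struct sketch_mount_split *mp = calloc(1, sizeof(*mp));

	if (!mp)
		return NULL;
	mp->inodegc = calloc(1, sizeof(*mp->inodegc));
	if (!mp->inodegc) {
		free(mp);
		return NULL;
	}
	return mp;
}

static void sketch_mount_free(struct sketch_mount_split *mp)
{
	if (mp) {
		free(mp->inodegc);
		free(mp);
	}
}
```

With the mask moved out, the main structure (about 3.2 KB on a 64-bit build) fits within a single 4 KiB page, and the 1 KiB mask becomes a separate small allocation.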

> I think either this allocation really can fail as the code (return
> -ENOMEM) suggests and thus can drop __GFP_NOFAIL, or it can use
> kvmalloc() - I think the wrapper for that can be removed now too after
> the discussion in [1] resulted in commit 46459154f997 ("mm: kvmalloc:
> make kmalloc fast path real fast path").

I know about that - I have patches that I'm testing that replace
xlog_kvmalloc() with kvmalloc calls.

-Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [syzbot] [mm?] WARNING in xfs_init_fs_context
  2025-07-07 22:10       ` Dave Chinner
@ 2025-07-08  8:50         ` Vlastimil Babka
  0 siblings, 0 replies; 7+ messages in thread
From: Vlastimil Babka @ 2025-07-08  8:50 UTC
  To: Dave Chinner
  Cc: Tetsuo Handa, Zi Yan, Barry Song, Carlos Maiolino, linux-xfs,
	syzbot, akpm, apopple, byungchul, david, gourry, joshua.hahnjy,
	linux-kernel, linux-mm, matthew.brost, rakie.kim, syzkaller-bugs,
	ying.huang, Harry Yoo, Michal Hocko, Matthew Wilcox

On 7/8/25 00:10, Dave Chinner wrote:
> On Wed, Jul 02, 2025 at 09:30:30AM +0200, Vlastimil Babka wrote:
>> On 7/2/25 3:41 AM, Tetsuo Handa wrote:
>> > By the way, why is xfs_init_fs_context() using __GFP_NOFAIL ?
>> > 
>> > 	mp = kzalloc(sizeof(struct xfs_mount), GFP_KERNEL | __GFP_NOFAIL);
>> > 	if (!mp)
>> > 		return -ENOMEM;
>> > 
>> > This looks like an allocation attempt which can fail safely.
> 
> It's irrelevant - it shouldn't fail regardless of __GFP_NOFAIL being
> specified.

If you mean the "too small to fail" behavior then it's generally true,
except in some corner cases like being an oom victim, in which case the
allocation can fail - the userspace process is doomed anyway. But a (small)
kernel allocation not handling NULL would still need __GFP_NOFAIL to prevent
that corner case.
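A deliberately simplified model of the behaviour described above, for a GFP_KERNEL-style request. PAGE_ALLOC_COSTLY_ORDER (3) is the kernel's threshold for "small" orders; everything else here (the names, and ignoring __GFP_NORETRY, __GFP_RETRY_MAYFAIL, requests without direct reclaim and other paths) is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors PAGE_ALLOC_COSTLY_ORDER: orders up to 3 count as "small". */
#define SKETCH_COSTLY_ORDER 3

enum sketch_result {
	SKETCH_MAY_RETURN_NULL,
	SKETCH_KEEPS_RETRYING,
};

/* Simplified outcome of a GFP_KERNEL-style request under memory pressure. */
static enum sketch_result sketch_gfp_kernel_alloc(unsigned int order,
						  bool nofail,
						  bool oom_victim)
{
	if (nofail)
		return SKETCH_KEEPS_RETRYING;  /* modelled as never NULL */
	if (order > SKETCH_COSTLY_ORDER)
		return SKETCH_MAY_RETURN_NULL; /* costly orders may give up */
	if (oom_victim)
		return SKETCH_MAY_RETURN_NULL; /* corner case: doomed task */
	return SKETCH_KEEPS_RETRYING;      /* "too small to fail" */
}
```

The oom_victim case is why a small allocation whose caller cannot handle NULL would still need __GFP_NOFAIL in this model.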

>> Indeed. Dave Chinner's commit f078d4ea82760 ("xfs: convert kmem_alloc()
>> to kmalloc()") dropped the xfs wrapper. This allocation didn't use
>> KM_MAYFAIL so it got __GFP_NOFAIL. The commit mentions this high-order
>> nofail issue for another allocation site that had to use xlog_kvmalloc().
> 
> I don't see how high-order allocation behaviour is relevant here.
> 
> Pahole says the struct xfs_mount is 4224 bytes in length. It is an
> order-1 allocation and if we've fragmented memory so badly that slab
> can't allocate an order-1 page then *lots* of other stuff is going
> to be stalling. (e.g. slab pages for inodes are typically order-3,
> same as the kmalloc-8k slab).

Since I wrote the quoted reply, we figured it out elsewhere in this thread.
4224 bytes means kmalloc-8k, where the fallback allocation (the one that
passes on __GFP_NOFAIL) is normally order 1. But with KASAN enabled, its
metadata pushes the per-object size above 8k, so the fallback order
becomes 2. It's a corner case that wasn't anticipated and has existed for
years without known reports. We'll need to deal with it somehow.
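As a worked example, let m be the per-object overhead that KASAN adds to the 8 KiB object; its exact value is not given in the thread, so m is only a placeholder. Assuming 4 KiB pages, the fallback (minimum) order is:

$$
\mathrm{order}_{\min} = \left\lceil \log_2 \left\lceil \frac{8192 + m}{4096} \right\rceil \right\rceil = 2, \qquad 0 < m \le 8192,
$$

since the inner ceiling is 3 or 4 and $\lceil \log_2 3 \rceil = \lceil \log_2 4 \rceil = 2$. With $m = 0$ the same formula gives $\lceil \log_2 2 \rceil = 1$, the normal kmalloc-8k case.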

> Note that the size of the structure is largely because of the
> embedded cpumask for inodegc:
> 
> 	struct cpumask             m_inodegc_cpumask;    /*  3104  1024 */
> 
> This should probably be pulled out into a dynamically allocated
> inodegc specific structure. Then the struct xfs_mount is only an
> order-0 allocation and should never fail, regardless of
> __GFP_NOFAIL being specified or not.
> 
>> I think either this allocation really can fail as the code (return
>> -ENOMEM) suggests and thus can drop __GFP_NOFAIL, or it can use
>> kvmalloc() - I think the wrapper for that can be removed now too after
>> the discussion in [1] resulted in commit 46459154f997 ("mm: kvmalloc:
>> make kmalloc fast path real fast path").
> 
> I know about that - I have patches that I'm testing that replace
> xlog_kvmalloc() with kvmalloc calls.

Great, thanks!

> -Dave.



Thread overview: 7+ messages
2025-06-29 22:47 [syzbot] [mm?] WARNING in xfs_init_fs_context syzbot
2025-07-01 15:01 ` Zi Yan
     [not found]   ` <1921ec99-7abb-42f1-a56b-d1f0f5bc1377@I-love.SAKURA.ne.jp>
2025-07-02  7:30     ` Vlastimil Babka
2025-07-04  8:26       ` Harry Yoo
2025-07-07 16:57         ` Vlastimil Babka
2025-07-07 22:10       ` Dave Chinner
2025-07-08  8:50         ` Vlastimil Babka
