* [syzbot] [mm?] WARNING in xfs_init_fs_context @ 2025-06-29 22:47 syzbot 2025-07-01 15:01 ` Zi Yan 0 siblings, 1 reply; 7+ messages in thread From: syzbot @ 2025-06-29 22:47 UTC (permalink / raw) To: akpm, apopple, byungchul, david, gourry, joshua.hahnjy, linux-kernel, linux-mm, matthew.brost, rakie.kim, syzkaller-bugs, ying.huang, ziy Hello, syzbot found the following issue on: HEAD commit: dfba48a70cb6 Merge tag 'i2c-for-6.16-rc4' of git://git.ker.. git tree: upstream console output: https://syzkaller.appspot.com/x/log.txt?x=14a62982580000 kernel config: https://syzkaller.appspot.com/x/.config?x=79da270cec5ffd65 dashboard link: https://syzkaller.appspot.com/bug?extid=359a67b608de1ef72f65 compiler: Debian clang version 20.1.6 (++20250514063057+1e4d39e07757-1~exp1~20250514183223.118), Debian LLD 20.1.6 Unfortunately, I don't have any reproducer for this issue yet. Downloadable assets: disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/d900f083ada3/non_bootable_disk-dfba48a7.raw.xz vmlinux: https://storage.googleapis.com/syzbot-assets/783560258712/vmlinux-dfba48a7.xz kernel image: https://storage.googleapis.com/syzbot-assets/685ad235ac7b/bzImage-dfba48a7.xz IMPORTANT: if you fix the issue, please add the following tag to the commit: Reported-by: syzbot+359a67b608de1ef72f65@syzkaller.appspotmail.com loop0: detected capacity change from 0 to 32768 ------------[ cut here ]------------ WARNING: CPU: 0 PID: 5325 at mm/page_alloc.c:4430 __alloc_pages_slowpath+0xcb3/0xce0 mm/page_alloc.c:4430 Modules linked in: CPU: 0 UID: 0 PID: 5325 Comm: syz.0.0 Not tainted 6.16.0-rc3-syzkaller-00329-gdfba48a70cb6 #0 PREEMPT(full) Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 RIP: 0010:__alloc_pages_slowpath+0xcb3/0xce0 mm/page_alloc.c:4430 Code: d8 48 c1 e8 03 0f b6 04 08 84 c0 75 2e f6 43 01 08 48 8b 14 24 0f 84 a2 f3 ff ff 90 0f 0b 90 e9 99 f3 ff ff e8 ae 09 50 09 90 <0f> 0b 90 f7 c5 00 04 00 00 75 bc 90 0f 0b 
90 eb b6 89 d9 80 e1 07 RSP: 0018:ffffc9000d62f970 EFLAGS: 00010246 RAX: 1378840d66abe400 RBX: 0000000000000002 RCX: dffffc0000000000 RDX: ffffc9000d62fa80 RSI: 0000000000000002 RDI: 0000000000048cc0 RBP: 0000000000048cc0 R08: ffff88801b68003f R09: 1ffff110036d0007 R10: dffffc0000000000 R11: ffffed10036d0008 R12: ffffc9000d62fa80 R13: 1ffff92001ac5f4c R14: 0000000000048cc0 R15: dffffc0000000000 FS: 00007fcd1cbf46c0(0000) GS:ffff88808d250000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fcd1bd726e0 CR3: 0000000043166000 CR4: 0000000000352ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> __alloc_frozen_pages_noprof+0x319/0x370 mm/page_alloc.c:4972 alloc_pages_mpol+0x232/0x4a0 mm/mempolicy.c:2419 alloc_slab_page mm/slub.c:2451 [inline] allocate_slab+0xe2/0x3b0 mm/slub.c:2627 new_slab mm/slub.c:2673 [inline] ___slab_alloc+0xbfc/0x1480 mm/slub.c:3859 __slab_alloc mm/slub.c:3949 [inline] __slab_alloc_node mm/slub.c:4024 [inline] slab_alloc_node mm/slub.c:4185 [inline] __kmalloc_cache_noprof+0x296/0x3d0 mm/slub.c:4354 kmalloc_noprof include/linux/slab.h:905 [inline] kzalloc_noprof include/linux/slab.h:1039 [inline] xfs_init_fs_context+0x54/0x500 fs/xfs/xfs_super.c:2279 alloc_fs_context+0x651/0x7d0 fs/fs_context.c:318 do_new_mount+0x10e/0xa40 fs/namespace.c:3881 do_mount fs/namespace.c:4239 [inline] __do_sys_mount fs/namespace.c:4450 [inline] __se_sys_mount+0x317/0x410 fs/namespace.c:4427 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7fcd1bd900ca Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb a6 e8 de 1a 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007fcd1cbf3e68 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5 
RAX: ffffffffffffffda RBX: 00007fcd1cbf3ef0 RCX: 00007fcd1bd900ca RDX: 0000200000000040 RSI: 0000200000009640 RDI: 00007fcd1cbf3eb0 RBP: 0000200000000040 R08: 00007fcd1cbf3ef0 R09: 0000000000208800 R10: 0000000000208800 R11: 0000000000000246 R12: 0000200000009640 R13: 00007fcd1cbf3eb0 R14: 000000000000964b R15: 0000200000001340 </TASK> --- This report is generated by a bot. It may contain errors. See https://goo.gl/tpsmEJ for more information about syzbot. syzbot engineers can be reached at syzkaller@googlegroups.com. syzbot will keep track of this issue. See: https://goo.gl/tpsmEJ#status for how to communicate with syzbot. If the report is already addressed, let syzbot know by replying with: #syz fix: exact-commit-title If you want to overwrite report's subsystems, reply with: #syz set subsystems: new-subsystem (See the list of subsystem names on the web dashboard) If the report is a duplicate of another one, reply with: #syz dup: exact-subject-of-another-report If you want to undo deduplication, reply with: #syz undup ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [syzbot] [mm?] WARNING in xfs_init_fs_context 2025-06-29 22:47 [syzbot] [mm?] WARNING in xfs_init_fs_context syzbot @ 2025-07-01 15:01 ` Zi Yan [not found] ` <1921ec99-7abb-42f1-a56b-d1f0f5bc1377@I-love.SAKURA.ne.jp> 0 siblings, 1 reply; 7+ messages in thread From: Zi Yan @ 2025-07-01 15:01 UTC (permalink / raw) To: Vlastimil Babka, Barry Song Cc: syzbot, akpm, apopple, byungchul, david, gourry, joshua.hahnjy, linux-kernel, linux-mm, matthew.brost, rakie.kim, syzkaller-bugs, ying.huang On 29 Jun 2025, at 18:47, syzbot wrote: > Hello, > > syzbot found the following issue on: > > HEAD commit: dfba48a70cb6 Merge tag 'i2c-for-6.16-rc4' of git://git.ker.. > git tree: upstream > console output: https://syzkaller.appspot.com/x/log.txt?x=14a62982580000 > kernel config: https://syzkaller.appspot.com/x/.config?x=79da270cec5ffd65 > dashboard link: https://syzkaller.appspot.com/bug?extid=359a67b608de1ef72f65 > compiler: Debian clang version 20.1.6 (++20250514063057+1e4d39e07757-1~exp1~20250514183223.118), Debian LLD 20.1.6 > > Unfortunately, I don't have any reproducer for this issue yet. > > Downloadable assets: > disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/d900f083ada3/non_bootable_disk-dfba48a7.raw.xz > vmlinux: https://storage.googleapis.com/syzbot-assets/783560258712/vmlinux-dfba48a7.xz > kernel image: https://storage.googleapis.com/syzbot-assets/685ad235ac7b/bzImage-dfba48a7.xz > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > Reported-by: syzbot+359a67b608de1ef72f65@syzkaller.appspotmail.com > > loop0: detected capacity change from 0 to 32768 > ------------[ cut here ]------------ > WARNING: CPU: 0 PID: 5325 at mm/page_alloc.c:4430 This warning fires when one tries to allocate a >1 order page with __GFP_NOFAIL. 
> __alloc_pages_slowpath+0xcb3/0xce0 mm/page_alloc.c:4430 > Modules linked in: > CPU: 0 UID: 0 PID: 5325 Comm: syz.0.0 Not tainted 6.16.0-rc3-syzkaller-00329-gdfba48a70cb6 #0 PREEMPT(full) > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 > RIP: 0010:__alloc_pages_slowpath+0xcb3/0xce0 mm/page_alloc.c:4430 > Code: d8 48 c1 e8 03 0f b6 04 08 84 c0 75 2e f6 43 01 08 48 8b 14 24 0f 84 a2 f3 ff ff 90 0f 0b 90 e9 99 f3 ff ff e8 ae 09 50 09 90 <0f> 0b 90 f7 c5 00 04 00 00 75 bc 90 0f 0b 90 eb b6 89 d9 80 e1 07 > RSP: 0018:ffffc9000d62f970 EFLAGS: 00010246 > RAX: 1378840d66abe400 RBX: 0000000000000002 RCX: dffffc0000000000 > RDX: ffffc9000d62fa80 RSI: 0000000000000002 RDI: 0000000000048cc0 > RBP: 0000000000048cc0 R08: ffff88801b68003f R09: 1ffff110036d0007 > R10: dffffc0000000000 R11: ffffed10036d0008 R12: ffffc9000d62fa80 > R13: 1ffff92001ac5f4c R14: 0000000000048cc0 R15: dffffc0000000000 > FS: 00007fcd1cbf46c0(0000) GS:ffff88808d250000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 00007fcd1bd726e0 CR3: 0000000043166000 CR4: 0000000000352ef0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > Call Trace: > <TASK> > __alloc_frozen_pages_noprof+0x319/0x370 mm/page_alloc.c:4972 > alloc_pages_mpol+0x232/0x4a0 mm/mempolicy.c:2419 > alloc_slab_page mm/slub.c:2451 [inline] > allocate_slab+0xe2/0x3b0 mm/slub.c:2627 > new_slab mm/slub.c:2673 [inline] new_slab() allows __GFP_NOFAIL, since GFP_RECLAIM_MASK has it. In allocate_slab(), the first allocation without __GFP_NOFAIL failed, the retry used __GFP_NOFAIL but kmem_cache order was greater than 1, which led to the warning above. Maybe allocate_slab() should just fail when kmem_cache order is too big and first trial fails? I am no expert, so add Vlastimil for help. Barry, who added the nofail warning is cc’d. 
> ___slab_alloc+0xbfc/0x1480 mm/slub.c:3859 > __slab_alloc mm/slub.c:3949 [inline] > __slab_alloc_node mm/slub.c:4024 [inline] > slab_alloc_node mm/slub.c:4185 [inline] > __kmalloc_cache_noprof+0x296/0x3d0 mm/slub.c:4354 > kmalloc_noprof include/linux/slab.h:905 [inline] > kzalloc_noprof include/linux/slab.h:1039 [inline] > xfs_init_fs_context+0x54/0x500 fs/xfs/xfs_super.c:2279 > alloc_fs_context+0x651/0x7d0 fs/fs_context.c:318 > do_new_mount+0x10e/0xa40 fs/namespace.c:3881 > do_mount fs/namespace.c:4239 [inline] > __do_sys_mount fs/namespace.c:4450 [inline] > __se_sys_mount+0x317/0x410 fs/namespace.c:4427 > do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] > do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94 > entry_SYSCALL_64_after_hwframe+0x77/0x7f > RIP: 0033:0x7fcd1bd900ca > Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb a6 e8 de 1a 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48 > RSP: 002b:00007fcd1cbf3e68 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5 > RAX: ffffffffffffffda RBX: 00007fcd1cbf3ef0 RCX: 00007fcd1bd900ca > RDX: 0000200000000040 RSI: 0000200000009640 RDI: 00007fcd1cbf3eb0 > RBP: 0000200000000040 R08: 00007fcd1cbf3ef0 R09: 0000000000208800 > R10: 0000000000208800 R11: 0000000000000246 R12: 0000200000009640 > R13: 00007fcd1cbf3eb0 R14: 000000000000964b R15: 0000200000001340 > </TASK> > > > --- > This report is generated by a bot. It may contain errors. > See https://goo.gl/tpsmEJ for more information about syzbot. > syzbot engineers can be reached at syzkaller@googlegroups.com. > > syzbot will keep track of this issue. See: > https://goo.gl/tpsmEJ#status for how to communicate with syzbot. 
> > If the report is already addressed, let syzbot know by replying with: > #syz fix: exact-commit-title > > If you want to overwrite report's subsystems, reply with: > #syz set subsystems: new-subsystem > (See the list of subsystem names on the web dashboard) > > If the report is a duplicate of another one, reply with: > #syz dup: exact-subject-of-another-report > > If you want to undo deduplication, reply with: > #syz undup Best Regards, Yan, Zi ^ permalink raw reply [flat|nested] 7+ messages in thread
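A minimal userspace sketch of the condition Zi describes above, assuming the check amounts to "__GFP_NOFAIL together with an order greater than 1". The flag value, the order bound and the function name are hypothetical stand-ins for illustration, not the kernel's definitions; the real check is the one reported at __alloc_pages_slowpath() in mm/page_alloc.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bit for this sketch; not the kernel's real value. */
#define SKETCH_GFP_NOFAIL 0x1u

/* Assumed upper bound on the order, used only as a sanity check here. */
#define SKETCH_MAX_ORDER 10u

/* Returns true when the sketch would warn: NOFAIL combined with order > 1. */
static bool sketch_nofail_order_warns(unsigned int gfp, unsigned int order)
{
	bool nofail = (gfp & SKETCH_GFP_NOFAIL) != 0;

	assert(order <= SKETCH_MAX_ORDER);
	return nofail && order > 1;
}
```

Under these assumptions, only a request that carries the NOFAIL bit at order 2 or higher reports a warning; order 0 or 1 with NOFAIL, or any order without it, does not.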
[parent not found: <1921ec99-7abb-42f1-a56b-d1f0f5bc1377@I-love.SAKURA.ne.jp>]
* Re: [syzbot] [mm?] WARNING in xfs_init_fs_context [not found] ` <1921ec99-7abb-42f1-a56b-d1f0f5bc1377@I-love.SAKURA.ne.jp> @ 2025-07-02 7:30 ` Vlastimil Babka 2025-07-04 8:26 ` Harry Yoo 2025-07-07 22:10 ` Dave Chinner 0 siblings, 2 replies; 7+ messages in thread From: Vlastimil Babka @ 2025-07-02 7:30 UTC (permalink / raw) To: Tetsuo Handa, Zi Yan, Barry Song, Carlos Maiolino, linux-xfs, Dave Chinner Cc: syzbot, akpm, apopple, byungchul, david, gourry, joshua.hahnjy, linux-kernel, linux-mm, matthew.brost, rakie.kim, syzkaller-bugs, ying.huang, Harry Yoo, Michal Hocko, Matthew Wilcox +CC xfs and a few more On 7/2/25 3:41 AM, Tetsuo Handa wrote: > On 2025/07/02 0:01, Zi Yan wrote: >>> __alloc_frozen_pages_noprof+0x319/0x370 mm/page_alloc.c:4972 >>> alloc_pages_mpol+0x232/0x4a0 mm/mempolicy.c:2419 >>> alloc_slab_page mm/slub.c:2451 [inline] >>> allocate_slab+0xe2/0x3b0 mm/slub.c:2627 >>> new_slab mm/slub.c:2673 [inline] >> >> new_slab() allows __GFP_NOFAIL, since GFP_RECLAIM_MASK has it. >> In allocate_slab(), the first allocation without __GFP_NOFAIL >> failed, the retry used __GFP_NOFAIL but kmem_cache order >> was greater than 1, which led to the warning above. >> >> Maybe allocate_slab() should just fail when kmem_cache >> order is too big and first trial fails? I am no expert, >> so add Vlastimil for help. Thanks Zi. Slab shouldn't fail with __GFP_NOFAIL; that would only lead subsystems like xfs to reintroduce their own forever-retrying wrappers. I think it's doing the best it can for the fallback attempt by using the minimum order, so the warning will never happen due to the calculated optimal order being too large, but only if the kmalloc()/kmem_cache_alloc() requested/object size is too large itself. Hm, but perhaps enabling slab_debug can inflate it over the threshold; is that the case here? 
I think in that rare case we could convert such fallback allocations to large kmalloc to avoid adding the debugging overhead - we can't easily create an individual slab page without the debugging layout for a kmalloc cache with debugging enabled. >> Barry, who added the nofail >> warning is cc’d. Barry's commit 903edea6c53f0 reorganized the warnings, but it existed already long before. > Indeed. In allocate_slab(struct kmem_cache *s, gfp_t flags, int node), > > /* > * Let the initial higher-order allocation fail under memory pressure > * so we fall-back to the minimum order allocation. > */ > alloc_gfp = (flags | __GFP_NOWARN | __GFP_NORETRY) & ~__GFP_NOFAIL; > if ((alloc_gfp & __GFP_DIRECT_RECLAIM) && oo_order(oo) > oo_order(s->min)) > alloc_gfp = (alloc_gfp | __GFP_NOMEMALLOC) & ~__GFP_RECLAIM; > > slab = alloc_slab_page(alloc_gfp, node, oo); > if (unlikely(!slab)) { > oo = s->min; > alloc_gfp = flags; > /* > * Allocation may have failed due to fragmentation. > * Try a lower order alloc if possible > */ > slab = alloc_slab_page(alloc_gfp, node, oo); > > __GFP_NOFAIL needs to be dropped unless s->min is either 0 or 1. No, that would violate __GFP_NOFAIL semantics. > > if (unlikely(!slab)) > return NULL; > stat(s, ORDER_FALLBACK); > } > > > > By the way, why is xfs_init_fs_context() using __GFP_NOFAIL ? > > mp = kzalloc(sizeof(struct xfs_mount), GFP_KERNEL | __GFP_NOFAIL); > if (!mp) > return -ENOMEM; > > This looks an allocation attempt which can fail safely. Indeed. Dave Chinner's commit f078d4ea82760 ("xfs: convert kmem_alloc() to kmalloc()") dropped the xfs wrapper. This allocation didn't use KM_MAYFAIL so it got __GFP_NOFAIL. The commit mentions this high-order nofail issue for another allocation site that had to use xlog_kvmalloc(). 
I think either this allocation really can fail as the code (return -ENOMEM) suggests and thus can drop __GFP_NOFAIL, or it can use kvmalloc() - I think the wrapper for that can be removed now too after the discussion in [1] resulted in commit 46459154f997 ("mm: kvmalloc: make kmalloc fast path real fast path"). [1] https://lore.kernel.org/all/Z_XI6vBE8v_cIhjZ@dread.disaster.area/ ^ permalink raw reply [flat|nested] 7+ messages in thread
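The two-step behaviour in the allocate_slab() excerpt quoted above can be sketched in userspace as follows. This is a simplified illustration, not the kernel code: the flag bits, struct and function names are hypothetical, and the extra masking of the reclaim flags when the preferred order exceeds the minimum order is left out.

```c
#include <assert.h>

/* Hypothetical flag bits for this sketch; not the kernel's real values. */
#define SKETCH_GFP_NOFAIL  0x1u
#define SKETCH_GFP_NORETRY 0x2u
#define SKETCH_GFP_NOWARN  0x4u

/* One request as it would be handed to the page allocator. */
struct sketch_request {
	unsigned int gfp;
	unsigned int order;
};

/*
 * plan[0]: opportunistic attempt at the preferred order, allowed to fail,
 *          so NOFAIL is masked out.
 * plan[1]: fallback at the minimum order with the caller's flags unchanged,
 *          so NOFAIL (if set) reaches the page allocator here.
 */
static void sketch_plan_attempts(unsigned int flags,
				 unsigned int preferred_order,
				 unsigned int min_order,
				 struct sketch_request plan[2])
{
	assert(min_order <= preferred_order);

	plan[0].gfp = (flags | SKETCH_GFP_NOWARN | SKETCH_GFP_NORETRY) &
		      ~SKETCH_GFP_NOFAIL;
	plan[0].order = preferred_order;

	plan[1].gfp = flags;
	plan[1].order = min_order;
}
```

In this model the only request that can combine NOFAIL with a high order is the fallback, and only when the cache's minimum order is itself above 1, which is the situation being debugged in this thread.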
* Re: [syzbot] [mm?] WARNING in xfs_init_fs_context 2025-07-02 7:30 ` Vlastimil Babka @ 2025-07-04 8:26 ` Harry Yoo 2025-07-07 16:57 ` Vlastimil Babka 2025-07-07 22:10 ` Dave Chinner 1 sibling, 1 reply; 7+ messages in thread From: Harry Yoo @ 2025-07-04 8:26 UTC (permalink / raw) To: Vlastimil Babka Cc: Tetsuo Handa, Zi Yan, Barry Song, Carlos Maiolino, linux-xfs, Dave Chinner, syzbot, akpm, apopple, byungchul, david, gourry, joshua.hahnjy, linux-kernel, linux-mm, matthew.brost, rakie.kim, syzkaller-bugs, ying.huang, Michal Hocko, Matthew Wilcox On Wed, Jul 02, 2025 at 09:30:30AM +0200, Vlastimil Babka wrote: > +CC xfs and few more > > On 7/2/25 3:41 AM, Tetsuo Handa wrote: > > On 2025/07/02 0:01, Zi Yan wrote: > >>> __alloc_frozen_pages_noprof+0x319/0x370 mm/page_alloc.c:4972 > >>> alloc_pages_mpol+0x232/0x4a0 mm/mempolicy.c:2419 > >>> alloc_slab_page mm/slub.c:2451 [inline] > >>> allocate_slab+0xe2/0x3b0 mm/slub.c:2627 > >>> new_slab mm/slub.c:2673 [inline] > >> > >> new_slab() allows __GFP_NOFAIL, since GFP_RECLAIM_MASK has it. > >> In allocate_slab(), the first allocation without __GFP_NOFAIL > >> failed, the retry used __GFP_NOFAIL but kmem_cache order > >> was greater than 1, which led to the warning above. > >> > >> Maybe allocate_slab() should just fail when kmem_cache > >> order is too big and first trial fails? I am no expert, > >> so add Vlastimil for help. > > Thanks Zi. Slab shouldn't fail with __GFP_NOFAIL, that would only lead > to subsystems like xfs to reintroduce their own forever retrying > wrappers again. I think it's going the best it can for the fallback > attempt by using the minimum order, so the warning will never happen due > to the calculated optimal order being too large, but only if the > kmalloc()/kmem_cache_alloc() requested/object size is too large itself. Right. The warning would trigger only if the object size is bigger than 8k (PAGE_SIZE * 2). 
> Hm but perhaps enabling slab_debug can inflate it over the threshold, is > it the case here? CONFIG_CMDLINE="earlyprintk=serial net.ifnames=0 sysctl.kernel.hung_task_all_cpu_backtrace=1 ima_policy=tcb nf-conntrack-ftp.ports=20000 nf-conntrack-tftp.ports=20000 nf-conntrack-sip.ports=20000 nf-conntrack-irc.ports=20000 nf-conntrack-sane.ports=20000 binder.debug_mask=0 rcupdate.rcu_expedited=1 rcupdate.rcu_cpu_stall_cputime=1 no_hash_pointers page_owner=on sysctl.vm.nr_hugepages=4 sysctl.vm.nr_overcommit_hugepages=4 secretmem.enable=1 sysctl.max_rcu_stall_to_panic=1 msr.allow_writes=off coredump_filter=0xffff root=/dev/sda console=ttyS0 vsyscall=native numa=fake=2 kvm-intel.nested=1 spec_store_bypass_disable=prctl nopcid vivid.n_devs=64 vivid.multiplanar=1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2 netrom.nr_ndevs=32 rose.rose_ndevs=32 smp.csd_lock_timeout=100000 watchdog_thresh=55 workqueue.watchdog_thresh=140 sysctl.net.core.netdev_unregister_timeout_secs=140 dummy_hcd.num=32 max_loop=32 nbds_max=32 panic_on_warn=1" CONFIG_SLUB_DEBUG=y # CONFIG_SLUB_DEBUG_ON is not set It seems no slab_debug is involved here. I downloaded the config and built the kernel, and sizeof(struct xfs_mount) is 4480 bytes. It should have allocated using order 1? Not sure why the min order was greater than 1? Not sure what I'm missing... > I think in that rare case we could convert such > fallback allocations to large kmalloc to avoid adding the debugging > overhead - we can't easily create an individual slab page without the > debugging layout for a kmalloc cache with debugging enabled. Yeah that can be doable when the size is exactly 8k or very close to 8k. -- Cheers, Harry / Hyeonggon ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [syzbot] [mm?] WARNING in xfs_init_fs_context 2025-07-04 8:26 ` Harry Yoo @ 2025-07-07 16:57 ` Vlastimil Babka 0 siblings, 0 replies; 7+ messages in thread From: Vlastimil Babka @ 2025-07-07 16:57 UTC (permalink / raw) To: Harry Yoo Cc: Tetsuo Handa, Zi Yan, Barry Song, Carlos Maiolino, linux-xfs, Dave Chinner, syzbot, akpm, apopple, byungchul, david, gourry, joshua.hahnjy, linux-kernel, linux-mm, matthew.brost, rakie.kim, syzkaller-bugs, ying.huang, Michal Hocko, Matthew Wilcox On 7/4/25 10:26, Harry Yoo wrote: > On Wed, Jul 02, 2025 at 09:30:30AM +0200, Vlastimil Babka wrote: >> +CC xfs and few more >> >> On 7/2/25 3:41 AM, Tetsuo Handa wrote: >> > On 2025/07/02 0:01, Zi Yan wrote: >> >>> __alloc_frozen_pages_noprof+0x319/0x370 mm/page_alloc.c:4972 >> >>> alloc_pages_mpol+0x232/0x4a0 mm/mempolicy.c:2419 >> >>> alloc_slab_page mm/slub.c:2451 [inline] >> >>> allocate_slab+0xe2/0x3b0 mm/slub.c:2627 >> >>> new_slab mm/slub.c:2673 [inline] >> >> >> >> new_slab() allows __GFP_NOFAIL, since GFP_RECLAIM_MASK has it. >> >> In allocate_slab(), the first allocation without __GFP_NOFAIL >> >> failed, the retry used __GFP_NOFAIL but kmem_cache order >> >> was greater than 1, which led to the warning above. >> >> >> >> Maybe allocate_slab() should just fail when kmem_cache >> >> order is too big and first trial fails? I am no expert, >> >> so add Vlastimil for help. >> >> Thanks Zi. Slab shouldn't fail with __GFP_NOFAIL, that would only lead >> to subsystems like xfs to reintroduce their own forever retrying >> wrappers again. I think it's going the best it can for the fallback >> attempt by using the minimum order, so the warning will never happen due >> to the calculated optimal order being too large, but only if the >> kmalloc()/kmem_cache_alloc() requested/object size is too large itself. > > Right. The warning would trigger only if the object size is bigger > than 8k (PAGE_SIZE * 2). 
> >> Hm but perhaps enabling slab_debug can inflate it over the threshold, is >> it the case here? > > CONFIG_CMDLINE="earlyprintk=serial net.ifnames=0 sysctl.kernel.hung_task_all_cpu_backtrace=1 ima_policy=tcb nf-conntrack-ftp.ports=20000 nf-conntrack-tftp.ports=20000 nf-conntrack-sip.ports=20000 nf-conntrack-irc.ports=20000 nf-conntrack-sane.ports=20000 binder.debug_mask=0 rcupdate.rcu_expedited=1 rcupdate.rcu_cpu_stall_cputime=1 no_hash_pointers page_owner=on sysctl.vm.nr_hugepages=4 sysctl.vm.nr_overcommit_hugepages=4 secretmem.enable=1 sysctl.max_rcu_stall_to_panic=1 msr.allow_writes=off coredump_filter=0xffff root=/dev/sda console=ttyS0 vsyscall=native numa=fake=2 kvm-intel.nested=1 spec_store_bypass_disable=prctl nopcid vivid.n_devs=64 vivid.multiplanar=1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2 netrom.nr_ndevs=32 rose.rose_ndevs=32 smp.csd_lock_timeout=100000 watchdog_thresh=55 workqueue.watchdog_thresh=140 sysctl.net.core.netdev_unregister_timeout_secs=140 dummy_hcd.num=32 max_loop=32 nbds_max=32 panic_on_warn=1" > > CONFIG_SLUB_DEBUG=y > # CONFIG_SLUB_DEBUG_ON is not set > > It seems no slab_debug is involved here. > > I downloaded the config and built the kernel, and > sizeof(struct xfs_mount) is 4480 bytes. It should have allocated using > order 1? So it should be the kmalloc-8k cache, its min order should be get_order(8k) thus 1. If the object was larger than 8k it would be a large kmalloc anyway and also trigger the __GFP_NOFAIL warning but with a different stacktrace. > Not sure why the min order was greater than 1? > Not sure what I'm missing... The only sane explanation is that slab debugging is enabled but not via CONFIG_CMDLINE but via options passed to the qemu execution? But I don't see those, nor the full dmesg (that would report them) in the syzbot dashboard. Hm or actually it might be kasan_cache_create() bumping our size when called from calculate_sizes(). 
KASAN seems enabled... >> I think in that rare case we could convert such >> fallback allocations to large kmalloc to avoid adding the debugging >> overhead - we can't easily create an individual slab page without the >> debugging layout for a kmalloc cache with debugging enabled. > > Yeah that can be doable when the size is exactly 8k or very close to 8k. > ^ permalink raw reply [flat|nested] 7+ messages in thread
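The size arithmetic discussed above can be checked with a small userspace sketch. It assumes 4 KiB pages and treats kmalloc size classes as plain powers of two, which is a simplification; the helper names are hypothetical and only mimic what get_order() is described as doing in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed page size for this sketch (4 KiB, so 8k is two pages). */
#define SKETCH_PAGE_SIZE 4096UL

/* Smallest order such that (page size << order) covers the given size. */
static unsigned int sketch_get_order(size_t size)
{
	unsigned int order = 0;

	assert(size > 0);
	while ((SKETCH_PAGE_SIZE << order) < size)
		order++;
	return order;
}

/* Round up to the next power of two, standing in for kmalloc size classes. */
static size_t sketch_kmalloc_class(size_t size)
{
	size_t class_size = 1;

	assert(size > 0);
	while (class_size < size)
		class_size <<= 1;
	return class_size;
}
```

With these helpers, a 4480-byte object lands in the 8192-byte class, and 8192 bytes is order 1. If per-object metadata pushed the size to, say, 8192 + 64 bytes (the 64 is a made-up figure; the thread does not state how much KASAN adds), the order would become 2.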
* Re: [syzbot] [mm?] WARNING in xfs_init_fs_context 2025-07-02 7:30 ` Vlastimil Babka 2025-07-04 8:26 ` Harry Yoo @ 2025-07-07 22:10 ` Dave Chinner 2025-07-08 8:50 ` Vlastimil Babka 1 sibling, 1 reply; 7+ messages in thread From: Dave Chinner @ 2025-07-07 22:10 UTC (permalink / raw) To: Vlastimil Babka Cc: Tetsuo Handa, Zi Yan, Barry Song, Carlos Maiolino, linux-xfs, syzbot, akpm, apopple, byungchul, david, gourry, joshua.hahnjy, linux-kernel, linux-mm, matthew.brost, rakie.kim, syzkaller-bugs, ying.huang, Harry Yoo, Michal Hocko, Matthew Wilcox On Wed, Jul 02, 2025 at 09:30:30AM +0200, Vlastimil Babka wrote: > On 7/2/25 3:41 AM, Tetsuo Handa wrote: > > By the way, why is xfs_init_fs_context() using __GFP_NOFAIL ? > > > > mp = kzalloc(sizeof(struct xfs_mount), GFP_KERNEL | __GFP_NOFAIL); > > if (!mp) > > return -ENOMEM; > > > > This looks an allocation attempt which can fail safely. It's irrelevant - it shouldn't fail regardless of __GFP_NOFAIL being specified. > Indeed. Dave Chinner's commit f078d4ea82760 ("xfs: convert kmem_alloc() > to kmalloc()") dropped the xfs wrapper. This allocation didn't use > KM_MAYFAIL so it got __GFP_NOFAIL. The commit mentions this high-order > nofail issue for another allocation site that had to use xlog_kvmalloc(). I don't see how high-order allocation behaviour is relevant here. Pahole says the struct xfs_mount is 4224 bytes in length. It is an order-1 allocation and if we've fragmented memory so badly that slab can't allocate an order-1 page then *lots* of other stuff is going to be stalling. (e.g. slab pages for inodes are typically order-3, same as the kmalloc-8k slab). Note that the size of the structure is largely because of the embedded cpumask for inodegc: struct cpumask m_inodegc_cpumask; /* 3104 1024 */ This should probably be pulled out into a dynamically allocated inodegc-specific structure. Then the struct xfs_mount is only an order-0 allocation and should never fail, regardless of __GFP_NOFAIL being specified or not. 
> I think either this allocation really can fail as the code (return > -ENOMEM) suggests and thus can drop __GFP_NOFAIL, or it can use > kvmalloc() - I think the wrapper for that can be removed now too after > the discussion in [1] resulted in commit 46459154f997 ("mm: kvmalloc: > make kmalloc fast path real fast path"). I know about that - I have patches that I'm testing that replace xlog_kvmalloc() with kvmalloc calls. -Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 7+ messages in thread
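Dave's suggestion can be illustrated with a rough layout sketch. The struct and field names here are hypothetical placeholders, not the real struct xfs_mount; the byte counts are taken from the pahole figures quoted above (mask at offset 3104, 1024 bytes long, 4224 bytes in total, which leaves 96 bytes after the mask). Other builds report different totals, for example the 4480 bytes Harry measured, and a 4 KiB page size is assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed page size for this sketch. */
#define SKETCH_PAGE_SIZE 4096u

/* Placeholder for a 1024-byte CPU mask, the size quoted from pahole. */
struct sketch_cpumask {
	unsigned char bits[1024];
};

/* Layout with the mask embedded: 3104 + 1024 + 96 = 4224 bytes. */
struct sketch_mount_embedded {
	unsigned char before[3104];          /* everything ahead of the mask */
	struct sketch_cpumask inodegc_mask;  /* embedded, 1024 bytes */
	unsigned char after[96];             /* remainder up to 4224 bytes */
};

/* Layout with the mask moved behind a pointer to a separate allocation. */
struct sketch_mount_split {
	unsigned char before[3104];
	struct sketch_cpumask *inodegc_mask; /* allocated separately */
	unsigned char after[96];
};

/* True when an object of this size fits in a single page (order 0). */
static int sketch_fits_order0(size_t size)
{
	assert(size > 0);
	return size <= SKETCH_PAGE_SIZE;
}
```

In this sketch the embedded layout is larger than one page, while the split layout drops well below 4096 bytes, which is the effect the proposal is after.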
* Re: [syzbot] [mm?] WARNING in xfs_init_fs_context 2025-07-07 22:10 ` Dave Chinner @ 2025-07-08 8:50 ` Vlastimil Babka 0 siblings, 0 replies; 7+ messages in thread From: Vlastimil Babka @ 2025-07-08 8:50 UTC (permalink / raw) To: Dave Chinner Cc: Tetsuo Handa, Zi Yan, Barry Song, Carlos Maiolino, linux-xfs, syzbot, akpm, apopple, byungchul, david, gourry, joshua.hahnjy, linux-kernel, linux-mm, matthew.brost, rakie.kim, syzkaller-bugs, ying.huang, Harry Yoo, Michal Hocko, Matthew Wilcox On 7/8/25 00:10, Dave Chinner wrote: > On Wed, Jul 02, 2025 at 09:30:30AM +0200, Vlastimil Babka wrote: >> On 7/2/25 3:41 AM, Tetsuo Handa wrote: >> > By the way, why is xfs_init_fs_context() using __GFP_NOFAIL ? >> > >> > mp = kzalloc(sizeof(struct xfs_mount), GFP_KERNEL | __GFP_NOFAIL); >> > if (!mp) >> > return -ENOMEM; >> > >> > This looks an allocation attempt which can fail safely. > > It's irrelevant - it shouldn't fail regardless of __GFP_NOFAIL being > specified. If you mean the "too small to fail" behavior then it's generally true, except in some corner cases like being an OOM victim, in which case the allocation can fail - the userspace process is doomed anyway. But a (small) kernel allocation not handling NULL would still need __GFP_NOFAIL to prevent that corner case. >> Indeed. Dave Chinner's commit f078d4ea82760 ("xfs: convert kmem_alloc() >> to kmalloc()") dropped the xfs wrapper. This allocation didn't use >> KM_MAYFAIL so it got __GFP_NOFAIL. The commit mentions this high-order >> nofail issue for another allocation site that had to use xlog_kvmalloc(). > > I don't see how high-order allocation behaviour is relevant here. > > Pahole says the struct xfs_mount is 4224 bytes in length. It is an > order-1 allocation and if we've fragmented memory so badly that slab > can't allocate an order-1 page then *lots* of other stuff is going > to be stalling. (e.g. slab pages for inodes are typically order-3, > same as the kmalloc-8k slab). 
We figured it out elsewhere in this thread after I wrote the quoted reply. 4224 bytes means the kmalloc-8k cache, where the order of the fallback allocation (the one that passes on __GFP_NOFAIL) is normally 1. But with KASAN enabled, its metadata pushes the per-object size above 8k, so the fallback order becomes 2. It's a corner case that wasn't anticipated and has existed for years without known reports. We'll need to deal with it somehow. > Note that the size of the structure is largely because of the > embedded cpumask for inodegc: > > struct cpumask m_inodegc_cpumask; /* 3104 1024 */ > > This should probably be pulled out into a dynamically allocated > inodegc-specific structure. Then the struct xfs_mount is only an > order-0 allocation and should never fail, regardless of > __GFP_NOFAIL being specified or not. > >> I think either this allocation really can fail as the code (return >> -ENOMEM) suggests and thus can drop __GFP_NOFAIL, or it can use >> kvmalloc() - I think the wrapper for that can be removed now too after >> the discussion in [1] resulted in commit 46459154f997 ("mm: kvmalloc: >> make kmalloc fast path real fast path"). > > I know about that - I have patches that I'm testing that replace > xlog_kvmalloc() with kvmalloc calls. Great, thanks! > -Dave. ^ permalink raw reply [flat|nested] 7+ messages in thread