Linux EXT4 FS development

Linux EXT4 FS development
 help / color / mirror / Atom feed

* [syzbot ci] Re: Data in direntry (dirdata) feature
From: syzbot ci @ 2026-06-11 10:29 UTC (permalink / raw)
  To: adilger.kernel, adilger, adilger, artem.blagodarenko, linux-ext4,
	pravin.shelar
  Cc: syzbot, syzkaller-bugs
In-Reply-To: <20260610152417.13576-1-ablagodarenko@thelustrecollective.com>

syzbot ci has tested the following series

[v2] Data in direntry (dirdata) feature
https://lore.kernel.org/all/20260610152417.13576-1-ablagodarenko@thelustrecollective.com
* [PATCH v2 01/10] ext4: replace ext4_dir_entry with ext4_dir_entry_2
* [PATCH v2 02/10] ext4: add ext4_dir_entry_is_tail()
* [PATCH v2 03/10] ext4: refactor dx_root to support variable dirent sizes
* [PATCH v2 04/10] ext4: add dirdata format definitions and access helpers
* [PATCH v2 05/10] ext4: preserve dirdata bits in get_dtype()
* [PATCH v2 06/10] ext4: add ext4_dir_entry_len() and harden dirdata parsing
* [PATCH v2 07/10] ext4: rename ext4_dir_rec_len() and clarify dirdata usage
* [PATCH v2 08/10] ext4: dirdata feature
* [PATCH v2 09/10] ext4: add dirdata set/get helpers
* [PATCH v2 10/10] ext4: Add EXT4_IOC_SET_LUFID ioctl for setting LUFID on directory entries

and found the following issues:
* KASAN: slab-out-of-bounds Read in __ext4_check_dir_entry
* KASAN: slab-out-of-bounds Read in ext4_inlinedir_to_tree
* KASAN: slab-use-after-free Read in __ext4_check_dir_entry
* KASAN: slab-use-after-free Read in ext4_inlinedir_to_tree
* KASAN: use-after-free Read in __ext4_check_dir_entry

Full report is available here:
https://ci.syzbot.org/series/5bf0e2fa-2e68-4532-8396-4568879b2788

***

KASAN: slab-out-of-bounds Read in __ext4_check_dir_entry

tree:      torvalds
URL:       https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux
base:      9716c086c8e8b141d35aa61f2e96a2e83de212a7
arch:      amd64
compiler:  Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
config:    https://ci.syzbot.org/builds/ddf6ee7c-dfa8-4383-b004-10140edc081c/config
syz repro: https://ci.syzbot.org/findings/b0854918-13f9-49dd-ab30-12154f0debe2/syz_repro

loop0: lost filesystem error report for type 5 error -117
EXT4-fs (loop0): mounted filesystem 00000000-0000-0000-0000-000000000000 r/w without journal. Quota mode: none.
==================================================================
BUG: KASAN: slab-out-of-bounds in ext4_dirent_get_data_len fs/ext4/ext4.h:4069 [inline]
BUG: KASAN: slab-out-of-bounds in ext4_dir_entry_len fs/ext4/ext4.h:4096 [inline]
BUG: KASAN: slab-out-of-bounds in __ext4_check_dir_entry+0x65a/0xc40 fs/ext4/dir.c:96
Read of size 1 at addr ffff8881022db7f5 by task syz.0.23/5815

CPU: 1 UID: 0 PID: 5815 Comm: syz.0.23 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
 print_address_description+0x55/0x1e0 mm/kasan/report.c:378
 print_report+0x58/0x70 mm/kasan/report.c:482
 kasan_report+0x117/0x150 mm/kasan/report.c:595
 ext4_dirent_get_data_len fs/ext4/ext4.h:4069 [inline]
 ext4_dir_entry_len fs/ext4/ext4.h:4096 [inline]
 __ext4_check_dir_entry+0x65a/0xc40 fs/ext4/dir.c:96
 ext4_check_all_de+0x66/0x150 fs/ext4/dir.c:657
 ext4_convert_inline_data_nolock+0x1b7/0x990 fs/ext4/inline.c:1121
 ext4_try_add_inline_entry+0x604/0x8e0 fs/ext4/inline.c:1247
 __ext4_add_entry+0x390/0x1f40 fs/ext4/namei.c:2529
 ext4_add_entry fs/ext4/namei.c:2613 [inline]
 ext4_mkdir+0x5e5/0xce0 fs/ext4/namei.c:3175
 vfs_mkdir+0x413/0x630 fs/namei.c:5271
 filename_mkdirat+0x285/0x510 fs/namei.c:5304
 __do_sys_mkdirat fs/namei.c:5325 [inline]
 __se_sys_mkdirat+0x35/0x150 fs/namei.c:5322
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f669359bcc7
Code: 00 66 90 48 89 f2 b9 00 01 00 00 48 89 fe bf 9c ff ff ff e9 db f7 ff ff 66 2e 0f 1f 84 00 00 00 00 00 90 b8 02 01 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffd42381d38 EFLAGS: 00000246 ORIG_RAX: 0000000000000102
RAX: ffffffffffffffda RBX: 00007ffd42381dc0 RCX: 00007f669359bcc7
RDX: 00000000000001ff RSI: 0000200000001200 RDI: 00000000ffffff9c
RBP: 00002000000024c0 R08: 0000200000000240 R09: 0000000000000000
R10: 00002000000024c0 R11: 0000000000000246 R12: 0000200000001200
R13: 00007ffd42381d80 R14: 0000000000000000 R15: 0000000000000000
 </TASK>

Allocated by task 5066:
 kasan_save_stack mm/kasan/common.c:57 [inline]
 kasan_save_track+0x3e/0x80 mm/kasan/common.c:78
 poison_kmalloc_redzone mm/kasan/common.c:398 [inline]
 __kasan_kmalloc+0x93/0xb0 mm/kasan/common.c:415
 kasan_kmalloc include/linux/kasan.h:263 [inline]
 __kmalloc_cache_noprof+0x31c/0x660 mm/slub.c:5420
 kmalloc_noprof include/linux/slab.h:950 [inline]
 kzalloc_noprof include/linux/slab.h:1188 [inline]
 kernfs_get_open_node fs/kernfs/file.c:543 [inline]
 kernfs_fop_open+0x862/0xda0 fs/kernfs/file.c:718
 do_dentry_open+0x822/0x13a0 fs/open.c:947
 vfs_open+0x3b/0x340 fs/open.c:1079
 do_open fs/namei.c:4699 [inline]
 path_openat+0x2e08/0x3860 fs/namei.c:4858
 do_file_open+0x23e/0x4a0 fs/namei.c:4887
 do_sys_openat2+0x113/0x200 fs/open.c:1364
 do_sys_open fs/open.c:1370 [inline]
 __do_sys_openat fs/open.c:1386 [inline]
 __se_sys_openat fs/open.c:1381 [inline]
 __x64_sys_openat+0x138/0x170 fs/open.c:1381
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Last potentially related work creation:
 kasan_save_stack+0x3e/0x60 mm/kasan/common.c:57
 kasan_record_aux_stack+0xbd/0xd0 mm/kasan/generic.c:556
 kvfree_call_rcu+0x100/0x430 mm/slab_common.c:1970
 kernfs_unlink_open_file+0x3fe/0x4b0 fs/kernfs/file.c:604
 kernfs_fop_release+0x2eb/0x440 fs/kernfs/file.c:783
 __fput+0x44f/0xa60 fs/file_table.c:510
 fput_close_sync+0x11f/0x240 fs/file_table.c:615
 __do_sys_close fs/open.c:1507 [inline]
 __se_sys_close fs/open.c:1492 [inline]
 __x64_sys_close+0x7e/0x110 fs/open.c:1492
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

The buggy address belongs to the object at ffff8881022db700
 which belongs to the cache kmalloc-128 of size 128
The buggy address is located 117 bytes to the right of
 allocated 128-byte region [ffff8881022db700, ffff8881022db780)

The buggy address belongs to the physical page:
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1022db
flags: 0x17ff00000000000(node=0|zone=2|lastcpupid=0x7ff)
page_type: f5(slab)
raw: 017ff00000000000 ffff888100041a00 dead000000000100 dead000000000122
raw: 0000000000000000 0000000800100010 00000000f5000000 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 0, migratetype Unmovable, gfp_mask 0xd2000(__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 0, tgid 0 (swapper/0), ts 2408938923, free_ts 0
 set_page_owner include/linux/page_owner.h:32 [inline]
 post_alloc_hook+0x22d/0x280 mm/page_alloc.c:1853
 prep_new_page mm/page_alloc.c:1861 [inline]
 get_page_from_freelist+0x2593/0x2610 mm/page_alloc.c:3941
 __alloc_frozen_pages_noprof+0x18d/0x380 mm/page_alloc.c:5221
 alloc_slab_page mm/slub.c:3278 [inline]
 allocate_slab+0x77/0x660 mm/slub.c:3467
 new_slab mm/slub.c:3525 [inline]
 refill_objects+0x339/0x3d0 mm/slub.c:7272
 refill_sheaf mm/slub.c:2816 [inline]
 __pcs_replace_empty_main+0x321/0x720 mm/slub.c:4652
 alloc_from_pcs mm/slub.c:4750 [inline]
 slab_alloc_node mm/slub.c:4884 [inline]
 __do_kmalloc_node mm/slub.c:5295 [inline]
 __kmalloc_noprof+0x474/0x760 mm/slub.c:5308
 kmalloc_noprof include/linux/slab.h:954 [inline]
 kzalloc_noprof include/linux/slab.h:1188 [inline]
 __alloc_empty_sheaf mm/slub.c:2768 [inline]
 alloc_empty_sheaf mm/slub.c:2783 [inline]
 __pcs_replace_empty_main+0x2df/0x720 mm/slub.c:4647
 alloc_from_pcs mm/slub.c:4750 [inline]
 slab_alloc_node mm/slub.c:4884 [inline]
 kmem_cache_alloc_noprof+0x37d/0x650 mm/slub.c:4906
 dup_fd+0x55/0xb40 fs/file.c:390
 copy_files+0xc8/0x120 kernel/fork.c:1639
 copy_process+0x1d94/0x4440 kernel/fork.c:2252
 kernel_clone+0x2d7/0x940 kernel/fork.c:2722
 user_mode_thread+0x110/0x180 kernel/fork.c:2798
 rest_init+0x23/0x300 init/main.c:727
 start_kernel+0x38a/0x3e0 init/main.c:1220
page_owner free stack trace missing

Memory state around the buggy address:
 ffff8881022db680: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff8881022db700: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff8881022db780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
                                                             ^
 ffff8881022db800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff8881022db880: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================


***

KASAN: slab-out-of-bounds Read in ext4_inlinedir_to_tree

tree:      torvalds
URL:       https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux
base:      9716c086c8e8b141d35aa61f2e96a2e83de212a7
arch:      amd64
compiler:  Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
config:    https://ci.syzbot.org/builds/ddf6ee7c-dfa8-4383-b004-10140edc081c/config
syz repro: https://ci.syzbot.org/findings/2dff870b-f382-4c93-8d8d-b2291d921224/syz_repro

loop1: lost filesystem error report for type 5 error -117
EXT4-fs (loop1): mounted filesystem 00000000-0000-0000-0000-000000000000 r/w without journal. Quota mode: none.
==================================================================
BUG: KASAN: slab-out-of-bounds in ext4_dir_entry_len fs/ext4/ext4.h:4095 [inline]
BUG: KASAN: slab-out-of-bounds in ext4_inlinedir_to_tree+0xda5/0x10d0 fs/ext4/inline.c:1335
Read of size 2 at addr ffff888115a3183c by task syz.1.18/5839

CPU: 1 UID: 0 PID: 5839 Comm: syz.1.18 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
 print_address_description+0x55/0x1e0 mm/kasan/report.c:378
 print_report+0x58/0x70 mm/kasan/report.c:482
 kasan_report+0x117/0x150 mm/kasan/report.c:595
 ext4_dir_entry_len fs/ext4/ext4.h:4095 [inline]
 ext4_inlinedir_to_tree+0xda5/0x10d0 fs/ext4/inline.c:1335
 ext4_htree_fill_tree+0x517/0x1230 fs/ext4/namei.c:1182
 ext4_dx_readdir fs/ext4/dir.c:600 [inline]
 ext4_readdir+0x2db4/0x3640 fs/ext4/dir.c:146
 iterate_dir+0x399/0x570 fs/readdir.c:110
 __do_sys_getdents64 fs/readdir.c:399 [inline]
 __se_sys_getdents64+0xf1/0x280 fs/readdir.c:384
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f3e02b9ce59
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f3e03ad5028 EFLAGS: 00000246 ORIG_RAX: 00000000000000d9
RAX: ffffffffffffffda RBX: 00007f3e02e15fa0 RCX: 00007f3e02b9ce59
RDX: 0000000000001000 RSI: 0000200000000f80 RDI: 0000000000000004
RBP: 00007f3e02c32d6f R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f3e02e16038 R14: 00007f3e02e15fa0 R15: 00007ffcaa902298
 </TASK>

Allocated by task 5839:
 kasan_save_stack mm/kasan/common.c:57 [inline]
 kasan_save_track+0x3e/0x80 mm/kasan/common.c:78
 poison_kmalloc_redzone mm/kasan/common.c:398 [inline]
 __kasan_kmalloc+0x93/0xb0 mm/kasan/common.c:415
 kasan_kmalloc include/linux/kasan.h:263 [inline]
 __do_kmalloc_node mm/slub.c:5296 [inline]
 __kmalloc_noprof+0x35c/0x760 mm/slub.c:5308
 kmalloc_noprof include/linux/slab.h:954 [inline]
 ext4_inlinedir_to_tree+0x312/0x10d0 fs/ext4/inline.c:1292
 ext4_htree_fill_tree+0x517/0x1230 fs/ext4/namei.c:1182
 ext4_dx_readdir fs/ext4/dir.c:600 [inline]
 ext4_readdir+0x2db4/0x3640 fs/ext4/dir.c:146
 iterate_dir+0x399/0x570 fs/readdir.c:110
 __do_sys_getdents64 fs/readdir.c:399 [inline]
 __se_sys_getdents64+0xf1/0x280 fs/readdir.c:384
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

The buggy address belongs to the object at ffff888115a31800
 which belongs to the cache kmalloc-64 of size 64
The buggy address is located 0 bytes to the right of
 allocated 60-byte region [ffff888115a31800, ffff888115a3183c)

The buggy address belongs to the physical page:
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x115a31
flags: 0x17ff00000000000(node=0|zone=2|lastcpupid=0x7ff)
page_type: f5(slab)
raw: 017ff00000000000 ffff8881000418c0 dead000000000100 dead000000000122
raw: 0000000000000000 0000000800200020 00000000f5000000 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 0, migratetype Unmovable, gfp_mask 0xd2c40(GFP_NOFS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 5051, tgid 5051 (acpid), ts 27203740677, free_ts 27201732767
 set_page_owner include/linux/page_owner.h:32 [inline]
 post_alloc_hook+0x22d/0x280 mm/page_alloc.c:1853
 prep_new_page mm/page_alloc.c:1861 [inline]
 get_page_from_freelist+0x2593/0x2610 mm/page_alloc.c:3941
 __alloc_frozen_pages_noprof+0x18d/0x380 mm/page_alloc.c:5221
 alloc_slab_page mm/slub.c:3278 [inline]
 allocate_slab+0x77/0x660 mm/slub.c:3467
 new_slab mm/slub.c:3525 [inline]
 refill_objects+0x339/0x3d0 mm/slub.c:7272
 refill_sheaf mm/slub.c:2816 [inline]
 __pcs_replace_empty_main+0x321/0x720 mm/slub.c:4652
 alloc_from_pcs mm/slub.c:4750 [inline]
 slab_alloc_node mm/slub.c:4884 [inline]
 __do_kmalloc_node mm/slub.c:5295 [inline]
 __kmalloc_noprof+0x474/0x760 mm/slub.c:5308
 kmalloc_noprof include/linux/slab.h:954 [inline]
 kzalloc_noprof include/linux/slab.h:1188 [inline]
 tomoyo_get_name+0x20c/0x590 security/tomoyo/memory.c:173
 tomoyo_parse_name_union+0xd9/0x130 security/tomoyo/util.c:260
 tomoyo_update_path_acl security/tomoyo/file.c:399 [inline]
 tomoyo_write_file+0x3a6/0xc50 security/tomoyo/file.c:1027
 tomoyo_write_domain2 security/tomoyo/common.c:1160 [inline]
 tomoyo_add_entry security/tomoyo/common.c:2177 [inline]
 tomoyo_supervisor+0x1208/0x1570 security/tomoyo/common.c:2238
 tomoyo_audit_path_log security/tomoyo/file.c:169 [inline]
 tomoyo_path_permission+0x25a/0x380 security/tomoyo/file.c:592
 tomoyo_check_open_permission+0x2b2/0x470 security/tomoyo/file.c:782
 security_file_open+0xa9/0x240 security/security.c:2739
 do_dentry_open+0x4a8/0x13a0 fs/open.c:924
 vfs_open+0x3b/0x340 fs/open.c:1079
page last free pid 15 tgid 15 stack trace:
 reset_page_owner include/linux/page_owner.h:25 [inline]
 __free_pages_prepare mm/page_alloc.c:1397 [inline]
 __free_frozen_pages+0xc1c/0xd30 mm/page_alloc.c:2938
 __tlb_remove_table_free mm/mmu_gather.c:228 [inline]
 tlb_remove_table_rcu+0x85/0x100 mm/mmu_gather.c:291
 rcu_do_batch kernel/rcu/tree.c:2617 [inline]
 rcu_core+0x7cd/0x1070 kernel/rcu/tree.c:2869
 handle_softirqs+0x22a/0x840 kernel/softirq.c:622
 run_ksoftirqd+0x36/0x60 kernel/softirq.c:1076
 smpboot_thread_fn+0x541/0xa50 kernel/smpboot.c:160
 kthread+0x389/0x470 kernel/kthread.c:436
 ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245

Memory state around the buggy address:
 ffff888115a31700: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
 ffff888115a31780: 00 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc
>ffff888115a31800: 00 00 00 00 00 00 00 04 fc fc fc fc fc fc fc fc
                                        ^
 ffff888115a31880: 00 00 00 00 00 00 02 fc fc fc fc fc fc fc fc fc
 ffff888115a31900: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
==================================================================


***

KASAN: slab-use-after-free Read in __ext4_check_dir_entry

tree:      torvalds
URL:       https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux
base:      9716c086c8e8b141d35aa61f2e96a2e83de212a7
arch:      amd64
compiler:  Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
config:    https://ci.syzbot.org/builds/ddf6ee7c-dfa8-4383-b004-10140edc081c/config
syz repro: https://ci.syzbot.org/findings/f1d48ea1-6e87-4d64-9c13-8bf8aed109fc/syz_repro

loop0: lost filesystem error report for type 5 error -117
EXT4-fs (loop0): mounted filesystem 00000000-0000-0000-0000-000000000000 r/w without journal. Quota mode: none.
==================================================================
BUG: KASAN: slab-use-after-free in ext4_dirent_get_data_len fs/ext4/ext4.h:4069 [inline]
BUG: KASAN: slab-use-after-free in ext4_dir_entry_len fs/ext4/ext4.h:4096 [inline]
BUG: KASAN: slab-use-after-free in __ext4_check_dir_entry+0x65a/0xc40 fs/ext4/dir.c:96
Read of size 1 at addr ffff888114d8c045 by task syz.0.20/5821

CPU: 1 UID: 0 PID: 5821 Comm: syz.0.20 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
 print_address_description+0x55/0x1e0 mm/kasan/report.c:378
 print_report+0x58/0x70 mm/kasan/report.c:482
 kasan_report+0x117/0x150 mm/kasan/report.c:595
 ext4_dirent_get_data_len fs/ext4/ext4.h:4069 [inline]
 ext4_dir_entry_len fs/ext4/ext4.h:4096 [inline]
 __ext4_check_dir_entry+0x65a/0xc40 fs/ext4/dir.c:96
 ext4_find_dest_de+0x136/0x770 fs/ext4/namei.c:2203
 ext4_add_dirent_to_inline+0xcf/0x430 fs/ext4/inline.c:984
 ext4_try_add_inline_entry+0x235/0x8e0 fs/ext4/inline.c:1213
 __ext4_add_entry+0x390/0x1f40 fs/ext4/namei.c:2529
 ext4_add_entry fs/ext4/namei.c:2613 [inline]
 ext4_add_nondir+0x111/0x310 fs/ext4/namei.c:2936
 ext4_create+0x2e9/0x470 fs/ext4/namei.c:2982
 lookup_open fs/namei.c:4511 [inline]
 open_last_lookups fs/namei.c:4611 [inline]
 path_openat+0x1395/0x3860 fs/namei.c:4855
 do_file_open+0x23e/0x4a0 fs/namei.c:4887
 do_sys_openat2+0x113/0x200 fs/open.c:1364
 do_sys_open fs/open.c:1370 [inline]
 __do_sys_openat fs/open.c:1386 [inline]
 __se_sys_openat fs/open.c:1381 [inline]
 __x64_sys_openat+0x138/0x170 fs/open.c:1381
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f922219ce59
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f9223137028 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
RAX: ffffffffffffffda RBX: 00007f9222415fa0 RCX: 00007f922219ce59
RDX: 0000000000042042 RSI: 0000200000000080 RDI: 0000000000000004
RBP: 00007f9222232d6f R08: 0000000000000000 R09: 0000000000000000
R10: 000000000000014a R11: 0000000000000246 R12: 0000000000000000
R13: 00007f9222416038 R14: 00007f9222415fa0 R15: 00007ffd01a2d448
 </TASK>

Allocated by task 5484:
 kasan_save_stack mm/kasan/common.c:57 [inline]
 kasan_save_track+0x3e/0x80 mm/kasan/common.c:78
 unpoison_slab_object mm/kasan/common.c:340 [inline]
 __kasan_slab_alloc+0x6c/0x80 mm/kasan/common.c:366
 kasan_slab_alloc include/linux/kasan.h:253 [inline]
 slab_post_alloc_hook mm/slub.c:4570 [inline]
 slab_alloc_node mm/slub.c:4899 [inline]
 kmem_cache_alloc_node_noprof+0x384/0x690 mm/slub.c:4951
 kmalloc_reserve net/core/skbuff.c:613 [inline]
 __alloc_skb+0x27d/0x7d0 net/core/skbuff.c:713
 alloc_skb include/linux/skbuff.h:1385 [inline]
 nlmsg_new include/net/netlink.h:1055 [inline]
 mpls_netconf_notify_devconf+0x46/0x100 net/mpls/af_mpls.c:1217
 mpls_dev_notify+0xb2d/0xd10 net/mpls/af_mpls.c:1691
 notifier_call_chain+0x1ad/0x3d0 kernel/notifier.c:85
 call_netdevice_notifiers_extack net/core/dev.c:2287 [inline]
 call_netdevice_notifiers net/core/dev.c:2301 [inline]
 unregister_netdevice_many_notify+0x17a5/0x22c0 net/core/dev.c:12421
 ops_exit_rtnl_list net/core/net_namespace.c:187 [inline]
 ops_undo_list+0x3d3/0x940 net/core/net_namespace.c:248
 cleanup_net+0x56b/0x800 net/core/net_namespace.c:702
 process_one_work kernel/workqueue.c:3314 [inline]
 process_scheduled_works+0xb5d/0x1860 kernel/workqueue.c:3397
 worker_thread+0xa53/0xfc0 kernel/workqueue.c:3478
 kthread+0x389/0x470 kernel/kthread.c:436
 ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245

Freed by task 5484:
 kasan_save_stack mm/kasan/common.c:57 [inline]
 kasan_save_track+0x3e/0x80 mm/kasan/common.c:78
 kasan_save_free_info+0x46/0x50 mm/kasan/generic.c:584
 poison_slab_object mm/kasan/common.c:253 [inline]
 __kasan_slab_free+0x5c/0x80 mm/kasan/common.c:285
 kasan_slab_free include/linux/kasan.h:235 [inline]
 slab_free_hook mm/slub.c:2689 [inline]
 slab_free mm/slub.c:6251 [inline]
 kfree+0x1c5/0x640 mm/slub.c:6566
 skb_kfree_head net/core/skbuff.c:1075 [inline]
 skb_free_head net/core/skbuff.c:1087 [inline]
 skb_release_data+0x828/0xa60 net/core/skbuff.c:1114
 skb_release_all net/core/skbuff.c:1189 [inline]
 __kfree_skb+0x5d/0x210 net/core/skbuff.c:1203
 netlink_broadcast_filtered+0xe18/0xf20 net/netlink/af_netlink.c:1540
 nlmsg_multicast_filtered include/net/netlink.h:1165 [inline]
 nlmsg_multicast include/net/netlink.h:1184 [inline]
 nlmsg_notify+0xf0/0x1a0 net/netlink/af_netlink.c:2598
 mpls_dev_notify+0xb2d/0xd10 net/mpls/af_mpls.c:1691
 notifier_call_chain+0x1ad/0x3d0 kernel/notifier.c:85
 call_netdevice_notifiers_extack net/core/dev.c:2287 [inline]
 call_netdevice_notifiers net/core/dev.c:2301 [inline]
 unregister_netdevice_many_notify+0x17a5/0x22c0 net/core/dev.c:12421
 ops_exit_rtnl_list net/core/net_namespace.c:187 [inline]
 ops_undo_list+0x3d3/0x940 net/core/net_namespace.c:248
 cleanup_net+0x56b/0x800 net/core/net_namespace.c:702
 process_one_work kernel/workqueue.c:3314 [inline]
 process_scheduled_works+0xb5d/0x1860 kernel/workqueue.c:3397
 worker_thread+0xa53/0xfc0 kernel/workqueue.c:3478
 kthread+0x389/0x470 kernel/kthread.c:436
 ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245

The buggy address belongs to the object at ffff888114d8c000
 which belongs to the cache skbuff_small_head of size 704
The buggy address is located 69 bytes inside of
 freed 704-byte region [ffff888114d8c000, ffff888114d8c2c0)

The buggy address belongs to the physical page:
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x114d8c
head: order:2 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x17ff00000000040(head|node=0|zone=2|lastcpupid=0x7ff)
page_type: f5(slab)
raw: 017ff00000000040 ffff888160416b40 dead000000000100 dead000000000122
raw: 0000000000000000 0000000800120012 00000000f5000000 0000000000000000
head: 017ff00000000040 ffff888160416b40 dead000000000100 dead000000000122
head: 0000000000000000 0000000800120012 00000000f5000000 0000000000000000
head: 017ff00000000002 ffffffffffffff01 00000000ffffffff 00000000ffffffff
head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000004
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 2, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 5484, tgid 5484 (kworker/u8:2), ts 72573003529, free_ts 72546506446
 set_page_owner include/linux/page_owner.h:32 [inline]
 post_alloc_hook+0x22d/0x280 mm/page_alloc.c:1853
 prep_new_page mm/page_alloc.c:1861 [inline]
 get_page_from_freelist+0x2593/0x2610 mm/page_alloc.c:3941
 __alloc_frozen_pages_noprof+0x18d/0x380 mm/page_alloc.c:5221
 alloc_slab_page mm/slub.c:3278 [inline]
 allocate_slab+0x77/0x660 mm/slub.c:3467
 new_slab mm/slub.c:3525 [inline]
 refill_objects+0x339/0x3d0 mm/slub.c:7272
 refill_sheaf mm/slub.c:2816 [inline]
 __pcs_replace_empty_main+0x321/0x720 mm/slub.c:4652
 alloc_from_pcs mm/slub.c:4750 [inline]
 slab_alloc_node mm/slub.c:4884 [inline]
 kmem_cache_alloc_node_noprof+0x441/0x690 mm/slub.c:4951
 kmalloc_reserve net/core/skbuff.c:613 [inline]
 __alloc_skb+0x27d/0x7d0 net/core/skbuff.c:713
 alloc_skb include/linux/skbuff.h:1385 [inline]
 nlmsg_new include/net/netlink.h:1055 [inline]
 mpls_netconf_notify_devconf+0x46/0x100 net/mpls/af_mpls.c:1217
 mpls_dev_notify+0xb2d/0xd10 net/mpls/af_mpls.c:1691
 notifier_call_chain+0x1ad/0x3d0 kernel/notifier.c:85
 call_netdevice_notifiers_extack net/core/dev.c:2287 [inline]
 call_netdevice_notifiers net/core/dev.c:2301 [inline]
 unregister_netdevice_many_notify+0x17a5/0x22c0 net/core/dev.c:12421
 ops_exit_rtnl_list net/core/net_namespace.c:187 [inline]
 ops_undo_list+0x3d3/0x940 net/core/net_namespace.c:248
 cleanup_net+0x56b/0x800 net/core/net_namespace.c:702
 process_one_work kernel/workqueue.c:3314 [inline]
 process_scheduled_works+0xb5d/0x1860 kernel/workqueue.c:3397
 worker_thread+0xa53/0xfc0 kernel/workqueue.c:3478
page last free pid 5484 tgid 5484 stack trace:
 reset_page_owner include/linux/page_owner.h:25 [inline]
 __free_pages_prepare mm/page_alloc.c:1397 [inline]
 __free_frozen_pages+0xc1c/0xd30 mm/page_alloc.c:2938
 stack_depot_save_flags+0x40e/0x810 lib/stackdepot.c:735
 kasan_save_stack mm/kasan/common.c:58 [inline]
 kasan_save_track+0x4f/0x80 mm/kasan/common.c:78
 unpoison_slab_object mm/kasan/common.c:340 [inline]
 __kasan_slab_alloc+0x6c/0x80 mm/kasan/common.c:366
 kasan_slab_alloc include/linux/kasan.h:253 [inline]
 slab_post_alloc_hook mm/slub.c:4570 [inline]
 slab_alloc_node mm/slub.c:4899 [inline]
 kmem_cache_alloc_noprof+0x2bc/0x650 mm/slub.c:4906
 kmem_alloc_batch lib/debugobjects.c:371 [inline]
 fill_pool+0x156/0x580 lib/debugobjects.c:420
 debug_objects_fill_pool lib/debugobjects.c:752 [inline]
 debug_object_activate+0x4a3/0x580 lib/debugobjects.c:841
 debug_rcu_head_queue kernel/rcu/rcu.h:236 [inline]
 __call_rcu_common kernel/rcu/tree.c:3116 [inline]
 call_rcu+0x43/0x890 kernel/rcu/tree.c:3251
 kernfs_put+0x259/0x520 fs/kernfs/dir.c:618
 kernfs_remove_by_name_ns+0xc8/0x140 fs/kernfs/dir.c:1799
 device_remove_class_symlinks+0x178/0x190 drivers/base/core.c:3479
 device_del+0x400/0x8f0 drivers/base/core.c:3881
 unregister_netdevice_many_notify+0x1d5f/0x22c0 net/core/dev.c:12456
 ops_exit_rtnl_list net/core/net_namespace.c:187 [inline]
 ops_undo_list+0x3d3/0x940 net/core/net_namespace.c:248
 cleanup_net+0x56b/0x800 net/core/net_namespace.c:702
 process_one_work kernel/workqueue.c:3314 [inline]
 process_scheduled_works+0xb5d/0x1860 kernel/workqueue.c:3397

Memory state around the buggy address:
 ffff888114d8bf00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff888114d8bf80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff888114d8c000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                           ^
 ffff888114d8c080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff888114d8c100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================


***

KASAN: slab-use-after-free Read in ext4_inlinedir_to_tree

tree:      torvalds
URL:       https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux
base:      9716c086c8e8b141d35aa61f2e96a2e83de212a7
arch:      amd64
compiler:  Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
config:    https://ci.syzbot.org/builds/ddf6ee7c-dfa8-4383-b004-10140edc081c/config
syz repro: https://ci.syzbot.org/findings/f42da242-e16e-4f10-bf25-0bd7e192d989/syz_repro

loop0: lost filesystem error report for type 5 error -117
EXT4-fs (loop0): mounted filesystem 00000000-0000-0000-0000-000000000000 r/w without journal. Quota mode: none.
==================================================================
BUG: KASAN: slab-use-after-free in ext4_dirent_get_data_len fs/ext4/ext4.h:4069 [inline]
BUG: KASAN: slab-use-after-free in ext4_dir_entry_len fs/ext4/ext4.h:4096 [inline]
BUG: KASAN: slab-use-after-free in ext4_inlinedir_to_tree+0x94c/0x10d0 fs/ext4/inline.c:1335
Read of size 1 at addr ffff88816fee8825 by task syz.0.20/5867

CPU: 1 UID: 0 PID: 5867 Comm: syz.0.20 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
 print_address_description+0x55/0x1e0 mm/kasan/report.c:378
 print_report+0x58/0x70 mm/kasan/report.c:482
 kasan_report+0x117/0x150 mm/kasan/report.c:595
 ext4_dirent_get_data_len fs/ext4/ext4.h:4069 [inline]
 ext4_dir_entry_len fs/ext4/ext4.h:4096 [inline]
 ext4_inlinedir_to_tree+0x94c/0x10d0 fs/ext4/inline.c:1335
 ext4_htree_fill_tree+0x517/0x1230 fs/ext4/namei.c:1182
 ext4_dx_readdir fs/ext4/dir.c:600 [inline]
 ext4_readdir+0x2db4/0x3640 fs/ext4/dir.c:146
 iterate_dir+0x399/0x570 fs/readdir.c:110
 __do_sys_getdents fs/readdir.c:319 [inline]
 __se_sys_getdents+0xf1/0x270 fs/readdir.c:304
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f010ad9ce59
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f010bc0f028 EFLAGS: 00000246 ORIG_RAX: 000000000000004e
RAX: ffffffffffffffda RBX: 00007f010b015fa0 RCX: 00007f010ad9ce59
RDX: 0000000000000054 RSI: 0000000000000000 RDI: 0000000000000004
RBP: 00007f010ae32d6f R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f010b016038 R14: 00007f010b015fa0 R15: 00007ffd93577348
 </TASK>

Allocated by task 5064:
 kasan_save_stack mm/kasan/common.c:57 [inline]
 kasan_save_track+0x3e/0x80 mm/kasan/common.c:78
 poison_kmalloc_redzone mm/kasan/common.c:398 [inline]
 __kasan_kmalloc+0x93/0xb0 mm/kasan/common.c:415
 kasan_kmalloc include/linux/kasan.h:263 [inline]
 __do_kmalloc_node mm/slub.c:5296 [inline]
 __kmalloc_noprof+0x35c/0x760 mm/slub.c:5308
 kmalloc_noprof include/linux/slab.h:954 [inline]
 kzalloc_noprof include/linux/slab.h:1188 [inline]
 tomoyo_encode2 security/tomoyo/realpath.c:45 [inline]
 tomoyo_encode+0x28b/0x550 security/tomoyo/realpath.c:80
 tomoyo_realpath_from_path+0x58d/0x5d0 security/tomoyo/realpath.c:283
 tomoyo_get_realpath security/tomoyo/file.c:151 [inline]
 tomoyo_path_perm+0x283/0x560 security/tomoyo/file.c:827
 security_inode_getattr+0x12b/0x310 security/security.c:1895
 vfs_getattr fs/stat.c:259 [inline]
 vfs_fstat fs/stat.c:281 [inline]
 vfs_fstatat+0xb4/0x170 fs/stat.c:371
 __do_sys_newfstatat fs/stat.c:538 [inline]
 __se_sys_newfstatat fs/stat.c:532 [inline]
 __x64_sys_newfstatat+0x151/0x200 fs/stat.c:532
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Freed by task 5064:
 kasan_save_stack mm/kasan/common.c:57 [inline]
 kasan_save_track+0x3e/0x80 mm/kasan/common.c:78
 kasan_save_free_info+0x46/0x50 mm/kasan/generic.c:584
 poison_slab_object mm/kasan/common.c:253 [inline]
 __kasan_slab_free+0x5c/0x80 mm/kasan/common.c:285
 kasan_slab_free include/linux/kasan.h:235 [inline]
 slab_free_hook mm/slub.c:2689 [inline]
 slab_free mm/slub.c:6251 [inline]
 kfree+0x1c5/0x640 mm/slub.c:6566
 tomoyo_path_perm+0x403/0x560 security/tomoyo/file.c:847
 security_inode_getattr+0x12b/0x310 security/security.c:1895
 vfs_getattr fs/stat.c:259 [inline]
 vfs_fstat fs/stat.c:281 [inline]
 vfs_fstatat+0xb4/0x170 fs/stat.c:371
 __do_sys_newfstatat fs/stat.c:538 [inline]
 __se_sys_newfstatat fs/stat.c:532 [inline]
 __x64_sys_newfstatat+0x151/0x200 fs/stat.c:532
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

The buggy address belongs to the object at ffff88816fee8800
 which belongs to the cache kmalloc-64 of size 64
The buggy address is located 37 bytes inside of
 freed 64-byte region [ffff88816fee8800, ffff88816fee8840)

The buggy address belongs to the physical page:
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x16fee8
flags: 0x57ff00000000000(node=1|zone=2|lastcpupid=0x7ff)
page_type: f5(slab)
raw: 057ff00000000000 ffff8881000418c0 dead000000000100 dead000000000122
raw: 0000000000000000 0000000800200020 00000000f5000000 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 0, migratetype Unmovable, gfp_mask 0xd2cc0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 1, tgid 1 (swapper/0), ts 21294026082, free_ts 0
 set_page_owner include/linux/page_owner.h:32 [inline]
 post_alloc_hook+0x22d/0x280 mm/page_alloc.c:1853
 prep_new_page mm/page_alloc.c:1861 [inline]
 get_page_from_freelist+0x2593/0x2610 mm/page_alloc.c:3941
 __alloc_frozen_pages_noprof+0x18d/0x380 mm/page_alloc.c:5221
 alloc_slab_page mm/slub.c:3278 [inline]
 allocate_slab+0x77/0x660 mm/slub.c:3467
 new_slab mm/slub.c:3525 [inline]
 refill_objects+0x339/0x3d0 mm/slub.c:7272
 refill_sheaf mm/slub.c:2816 [inline]
 __pcs_replace_empty_main+0x321/0x720 mm/slub.c:4652
 alloc_from_pcs mm/slub.c:4750 [inline]
 slab_alloc_node mm/slub.c:4884 [inline]
 __do_kmalloc_node mm/slub.c:5295 [inline]
 __kmalloc_noprof+0x474/0x760 mm/slub.c:5308
 kmalloc_noprof include/linux/slab.h:954 [inline]
 kzalloc_noprof include/linux/slab.h:1188 [inline]
 handler_new_ref+0x261/0x9c0 drivers/media/v4l2-core/v4l2-ctrls-core.c:1882
 v4l2_ctrl_add_handler+0x19f/0x290 drivers/media/v4l2-core/v4l2-ctrls-core.c:2443
 vivid_create_controls+0x332d/0x3bd0 drivers/media/test-drivers/vivid/vivid-ctrls.c:2072
 vivid_create_instance drivers/media/test-drivers/vivid/vivid-core.c:1933 [inline]
 vivid_probe+0x4261/0x72b0 drivers/media/test-drivers/vivid/vivid-core.c:2095
 platform_probe+0xf9/0x190 drivers/base/platform.c:1432
 call_driver_probe drivers/base/dd.c:-1 [inline]
 really_probe+0x267/0xaf0 drivers/base/dd.c:709
 __driver_probe_device+0x1ef/0x380 drivers/base/dd.c:871
 driver_probe_device+0x4f/0x240 drivers/base/dd.c:901
 __driver_attach+0x34c/0x640 drivers/base/dd.c:1295
page_owner free stack trace missing

Memory state around the buggy address:
 ffff88816fee8700: 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc fc
 ffff88816fee8780: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc
>ffff88816fee8800: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
                               ^
 ffff88816fee8880: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
 ffff88816fee8900: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
==================================================================


***

KASAN: use-after-free Read in __ext4_check_dir_entry

tree:      torvalds
URL:       https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux
base:      9716c086c8e8b141d35aa61f2e96a2e83de212a7
arch:      amd64
compiler:  Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
config:    https://ci.syzbot.org/builds/ddf6ee7c-dfa8-4383-b004-10140edc081c/config
syz repro: https://ci.syzbot.org/findings/57c0b75a-8922-4dc1-9a20-ca947564792b/syz_repro

==================================================================
BUG: KASAN: use-after-free in ext4_dirent_get_data_len fs/ext4/ext4.h:4069 [inline]
BUG: KASAN: use-after-free in ext4_dir_entry_len fs/ext4/ext4.h:4096 [inline]
BUG: KASAN: use-after-free in __ext4_check_dir_entry+0x65a/0xc40 fs/ext4/dir.c:96
Read of size 1 at addr ffff88816be85045 by task syz.2.21/5880

CPU: 1 UID: 0 PID: 5880 Comm: syz.2.21 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
 print_address_description+0x55/0x1e0 mm/kasan/report.c:378
 print_report+0x58/0x70 mm/kasan/report.c:482
 kasan_report+0x117/0x150 mm/kasan/report.c:595
 ext4_dirent_get_data_len fs/ext4/ext4.h:4069 [inline]
 ext4_dir_entry_len fs/ext4/ext4.h:4096 [inline]
 __ext4_check_dir_entry+0x65a/0xc40 fs/ext4/dir.c:96
 ext4_find_dest_de+0x136/0x770 fs/ext4/namei.c:2203
 ext4_add_dirent_to_inline+0xcf/0x430 fs/ext4/inline.c:984
 ext4_try_add_inline_entry+0x235/0x8e0 fs/ext4/inline.c:1213
 __ext4_add_entry+0x390/0x1f40 fs/ext4/namei.c:2529
 ext4_add_entry fs/ext4/namei.c:2613 [inline]
 ext4_add_nondir+0x111/0x310 fs/ext4/namei.c:2936
 ext4_create+0x2e9/0x470 fs/ext4/namei.c:2982
 lookup_open fs/namei.c:4511 [inline]
 open_last_lookups fs/namei.c:4611 [inline]
 path_openat+0x1395/0x3860 fs/namei.c:4855
 do_file_open+0x23e/0x4a0 fs/namei.c:4887
 do_sys_openat2+0x113/0x200 fs/open.c:1364
 do_sys_open fs/open.c:1370 [inline]
 __do_sys_openat fs/open.c:1386 [inline]
 __se_sys_openat fs/open.c:1381 [inline]
 __x64_sys_openat+0x138/0x170 fs/open.c:1381
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f5713b9ce59
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fff672b25f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
RAX: ffffffffffffffda RBX: 00007f5713e15fa0 RCX: 00007f5713b9ce59
RDX: 0000000000042042 RSI: 0000200000000080 RDI: 0000000000000004
RBP: 00007f5713c32d6f R08: 0000000000000000 R09: 0000000000000000
R10: 000000000000014a R11: 0000000000000246 R12: 0000000000000000
R13: 00007f5713e15fac R14: 00007f5713e15fa0 R15: 00007f5713e15fa0
 </TASK>

The buggy address belongs to the physical page:
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x16be85
flags: 0x57ff00000000000(node=1|zone=2|lastcpupid=0x7ff)
page_type: f0(buddy)
raw: 057ff00000000000 ffffea0005afa0c8 ffffea0005afa1c8 0000000000000000
raw: 0000000000000000 0000000000000000 00000000f0000000 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as freed
page last allocated via order 0, migratetype Unmovable, gfp_mask 0xcc0(GFP_KERNEL), pid 5630, tgid 5630 (syz-executor), ts 67290853657, free_ts 69321168948
 set_page_owner include/linux/page_owner.h:32 [inline]
 post_alloc_hook+0x22d/0x280 mm/page_alloc.c:1853
 prep_new_page mm/page_alloc.c:1861 [inline]
 get_page_from_freelist+0x2593/0x2610 mm/page_alloc.c:3941
 __alloc_frozen_pages_noprof+0x18d/0x380 mm/page_alloc.c:5221
 __alloc_pages_noprof+0x10/0x100 mm/page_alloc.c:5255
 alloc_pages_bulk_noprof+0x5ff/0x7c0 mm/page_alloc.c:5175
 ___alloc_pages_bulk mm/kasan/shadow.c:345 [inline]
 __kasan_populate_vmalloc_do mm/kasan/shadow.c:370 [inline]
 __kasan_populate_vmalloc+0xc1/0x1d0 mm/kasan/shadow.c:424
 kasan_populate_vmalloc include/linux/kasan.h:580 [inline]
 alloc_vmap_area+0xd47/0x1480 mm/vmalloc.c:2123
 __get_vm_area_node+0x1f8/0x300 mm/vmalloc.c:3226
 __vmalloc_node_range_noprof+0x36a/0x1750 mm/vmalloc.c:4024
 vmalloc_user_noprof+0xad/0xe0 mm/vmalloc.c:4218
 kcov_ioctl+0x55/0x620 kernel/kcov.c:726
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:597 [inline]
 __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:583
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
page last free pid 5693 tgid 5693 stack trace:
 reset_page_owner include/linux/page_owner.h:25 [inline]
 __free_pages_prepare mm/page_alloc.c:1397 [inline]
 __free_frozen_pages+0xc1c/0xd30 mm/page_alloc.c:2938
 kasan_depopulate_vmalloc_pte+0x6d/0x90 mm/kasan/shadow.c:484
 apply_to_pte_range mm/memory.c:3338 [inline]
 apply_to_pmd_range mm/memory.c:3382 [inline]
 apply_to_pud_range mm/memory.c:3418 [inline]
 apply_to_p4d_range mm/memory.c:3454 [inline]
 __apply_to_page_range+0xbdc/0x1420 mm/memory.c:3490
 __kasan_release_vmalloc+0xa2/0xd0 mm/kasan/shadow.c:602
 kasan_release_vmalloc include/linux/kasan.h:593 [inline]
 kasan_release_vmalloc_node mm/vmalloc.c:2284 [inline]
 purge_vmap_node+0x220/0x960 mm/vmalloc.c:2306
 __purge_vmap_area_lazy+0x779/0xb40 mm/vmalloc.c:2396
 drain_vmap_area_work+0x27/0x40 mm/vmalloc.c:2430
 process_one_work kernel/workqueue.c:3314 [inline]
 process_scheduled_works+0xb5d/0x1860 kernel/workqueue.c:3397
 worker_thread+0xa53/0xfc0 kernel/workqueue.c:3478
 kthread+0x389/0x470 kernel/kthread.c:436
 ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245

Memory state around the buggy address:
 ffff88816be84f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff88816be84f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff88816be85000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                                           ^
 ffff88816be85080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 ffff88816be85100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
==================================================================


***

If these findings have caused you to resend the series or submit a
separate fix, please add the following tag to your commit message:
  Tested-by: syzbot@syzkaller.appspotmail.com

---
This report is generated by a bot. It may contain errors.
syzbot ci engineers can be reached at syzkaller@googlegroups.com.

To test a patch for this bug, please reply with `#syz test`
(should be on a separate line).

The patch should be attached to the email.
Note: arguments like custom git repos and branches are not supported.

^ permalink raw reply

* Re: [PATCH] iomap: enforce DIO alignment check in iomap
From: Carlos Maiolino @ 2026-06-11 10:05 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: brauner, linux-block, linux-fsdevel, linux-ext4, linux-xfs,
	Keith Busch, Hannes Reinecke, Martin K. Petersen, Jens Axboe
In-Reply-To: <20260611055744.GA18538@lst.de>

On Thu, Jun 11, 2026 at 07:57:44AM +0200, Christoph Hellwig wrote:
> On Wed, Jun 10, 2026 at 04:52:11PM +0200, cem@kernel.org wrote:
> > From: Carlos Maiolino <cem@kernel.org>
> > 
> > The DIO alignment check has been lifted from iomap layer to rely on the
> > block layer to enforce proper alignment when issuing direct IO
> > operations. This though, depending on the IO size and buffer address
> > passed to the IO operation may lead to user-visible behavior change.
> > 
> > This has been caught initially by LTP test diotest4 running on
> > PPC architecture, where the test fails because a read() operation
> > with a supposedly misaligned buffer succeeds instead of an expected
> > -EINVAL.
> > This has no direct relationship with PPC, but seems to do with the
> > IO size crossing page borders or not.
> 
> I don't understand the problem here.  Why do we want to insist on a
> failure when we can support it?  I think the test is just broken.

The problem I see here from my POV is this changed the behavior expected
from the syscalls when the passed in buffer is misaligned as the read()
(in the test) succeeds when the passed in buffer does not match the
alignment requirements (see below).

I am pretty happy in declaring this a test bug, but I thought it would be
worth starting a discussion about the sudden/unexpected behavior change.
Not to mention now different filesystems will have different alignment
requirements which seems at least "weird" to me. I mean, now suddenly
iomap-based filesystems have a more relaxed alignment constraint than
for example btrfs.

> 
> > The problematic behavior is reproducible on x86 by reducing the IO size
> > to something < PAGE_SIZE, so the misaligned read()s will also be accepted
> > by the block layer.
> 
> What do you mean with misaligned here?  For a long time the kernel
> supports basically arbitrary low memory alignment for diret I/O,
> just bounded by the device capabilities (typical 4 byte alignment).

The test sends to read() a buffer misplaced by 1 byte (see below) which
doesn't match the system's alignment constraints at least from the user
passed buffer perspective.
I've been assuming it should match device's dma_alignment constraints.
The typical 4 byte alignment indeed is the requirement from my PPC
machine, but not for my x86:

> 
> The supported memory alignment is reported in the statx
> dio_mem_align.  What does that say compared to the alignment
> expectations in this test?

From my x86:
dio_mem_align: 512
dio_offset_align: 512

From PPC:
dio_mem_align: 4
dio_offset_align: 512

But this does not explain how the following call would succeed in either
case (below one taken from PPC):

openat(dirfd=AT_FDCWD, pathname="testdata-4.135256", flags=O_RDWR|O_DIRECT) = 3
_llseek(fd=3, offset=4096, result=[4096], whence=SEEK_SET) = 0
read(arg1=0x3, arg2=0x1003af80001, arg3=0x1000) = 0x1000

The passed in address 0x1003af80001 is one byte misaligned and shouldn't
(at least in theory) ever be accepted no? Or am I missing something
else?

^ permalink raw reply

* Re: [PATCH v4] iomap: add simple read path for small direct I/O
From: Pankaj Raghav (Samsung) @ 2026-06-11  9:36 UTC (permalink / raw)
  To: Fengnan Chang
  Cc: brauner, djwong, hch, ojaswin, dgc, linux-xfs, linux-fsdevel,
	linux-ext4, linux-kernel, lidiangang, p.raghav
In-Reply-To: <20260608073134.95964-1-changfengnan@bytedance.com>

> +static ssize_t iomap_dio_simple_read_complete(struct kiocb *iocb,
> +		struct bio *bio)
> +{
> +	struct inode *inode = file_inode(iocb->ki_filp);
> +	ssize_t ret;
> +
> +	WRITE_ONCE(iocb->private, NULL);
> +
> +	ret = iomap_dio_simple_read_finish(iocb, bio,
> +			blk_status_to_errno(bio->bi_status));
> +
> +	inode_dio_end(inode);
> +	trace_iomap_dio_complete(iocb, ret < 0 ? ret : 0, ret > 0 ? ret : 0);

Shouldn't the second parameter here be
blk_status_to_errno(bio->bi_status)?

I think that will be more meaningful for tracing here.
trace_iomap_dio_complete(iocb, blk_status_to_errno(bio->bi_status), ret);

<snip>
> +	return ret;
> +}
> +
> +	sr->iocb = iocb;
> +	sr->dio_flags = dio_flags;
> +
> +	bio->bi_iter.bi_sector = iomap_sector(&iomi.iomap, iomi.pos);
> +	bio->bi_ioprio = iocb->ki_ioprio;
> +	bio->bi_private = sr;
> +	bio->bi_end_io = iomap_dio_simple_read_end_io;
> +
> +	if (dio_flags & IOMAP_DIO_BOUNCE)
> +		ret = bio_iov_iter_bounce(bio, iter, count);
> +	else
> +		ret = bio_iov_iter_get_pages(bio, iter, alignment - 1);
> +	if (unlikely(ret))
> +		goto out_bio_put;
> +
> +	if (bio->bi_iter.bi_size != count) {
> +		iov_iter_revert(iter, bio->bi_iter.bi_size);
> +		ret = -ENOTBLK;
> +		goto out_bio_release_pages;
> +	}
> +
> +	sr->size = bio->bi_iter.bi_size;
> +
> +	if ((dio_flags & IOMAP_DIO_USER_BACKED) &&
> +	    !(dio_flags & IOMAP_DIO_BOUNCE))
> +		bio_set_pages_dirty(bio);
> +
> +	if (iocb->ki_flags & IOCB_NOWAIT)
> +		bio->bi_opf |= REQ_NOWAIT;
> +	if ((iocb->ki_flags & IOCB_HIPRI) && !wait_for_completion) {
> +		bio->bi_opf |= REQ_POLLED;
> +		bio_set_polled(bio, iocb);

This results in build failure as the following patch removed this call:
https://lore.kernel.org/linux-block/20260518062917.506483-1-hch@lst.de/

I think this call can just be removed as you are setting REQ_POLLED
anyway.

> +		WRITE_ONCE(iocb->private, bio);
> +	}
> +
> +	if (wait_for_completion) {
> +		sr->waiter = current;
> +		blk_crypto_submit_bio(bio);
> +	} else {
> +		atomic_set(&sr->state, IOMAP_DIO_SIMPLE_SUBMITTING);
> +		sr->waiter = NULL;
> +		blk_crypto_submit_bio(bio);
> +		ret = -EIOCBQUEUED;
> +	}
> +
--
Pankaj

^ permalink raw reply

* Re: [PATCH 00/17] replace __get_free_pages() call with kmalloc()
From: Mike Rapoport @ 2026-06-11  9:09 UTC (permalink / raw)
  To: Zi Yan
  Cc: Jan Kara, Mark Fasheh, Joel Becker, Joseph Qi, Ryusuke Konishi,
	Viacheslav Dubeyko, Trond Myklebust, Anna Schumaker, Chuck Lever,
	Jeff Layton, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	Alexander Viro, Christian Brauner, Jan Kara, Dave Kleikamp,
	Theodore Ts'o, Miklos Szeredi, Andreas Hindborg, Breno Leitao,
	Kees Cook, Tigran A. Aivazian, linux-kernel, linux-fsdevel,
	ocfs2-devel, linux-nilfs, linux-nfs, jfs-discussion, linux-ext4,
	linux-mm
In-Reply-To: <3FD8E1FD-6E18-46D9-AE93-00FA1A66C775@nvidia.com>

On Fri, Jun 05, 2026 at 04:00:33PM -0400, Zi Yan wrote:
> On 23 May 2026, at 13:54, Mike Rapoport (Microsoft) wrote:
> 
> > This is a (small) part of larger work of replacing page allocator calls
> > with kmalloc.
> 
> Is the goal to get rid of __get_free_page(s)()?

Yes, eventually.

My initial intention a few month ago was to remove the ugly casts [1], but
then willy pointed out that Linus objected to something like this [2] and
it looks like more than a decade old technical debt.

Since there are more than 600 or those it will take a while to convert
suitable gfp calls to kmalloc.
Afterwards we can re-evaluate what APIs we want to provide for allocations
that must have actual pages.

[1] https://lore.kernel.org/all/20251018093002.3660549-1-rppt@kernel.org/
[2] https://lore.kernel.org/all/CA+55aFwp4iy4rtX2gE2WjBGFL=NxMVnoFeHqYa2j1dYOMMGqxg@mail.gmail.com/ 

 
> Thanks.

-- 
Sincerely yours,
Mike.

^ permalink raw reply

* Re: [PATCH][e2fsprogs] build: use correct subst variable
From: Andreas Dilger @ 2026-06-11  8:18 UTC (permalink / raw)
  To: Li Dongyang; +Cc: linux-ext4
In-Reply-To: <20260611035236.307622-1-dongyangli@ddn.com>

On Jun 10, 2026, at 21:52, Li Dongyang <dongyangli@ddn.com> wrote:
> 
> ifNotGNUmake was changed to ifnGNUmake but test/Makefile.in still uses
> the old variable name.
> make fullcheck fails on some platforms:
> make[2]: Entering directory `/var/lib/jenkins/workspace/e2fsprogs-reviews/arch/x86_64/distro/el7/_topdir/BUILD/e2fsprogs-1.47.4/tests'
> Makefile:387: *** missing separator.  Stop.
> make[2]: Leaving directory `/var/lib/jenkins/workspace/e2fsprogs-reviews/arch/x86_64/distro/el7/_topdir/BUILD/e2fsprogs-1.47.4/tests'
> make[1]: *** [fullcheck-recursive] Error 1
> 
> Fixes: b7d1ab3376 "Update configure/configure.ac/aclocal.m4 to use autoconf 2.72"
> Change-Id: Iec3cacfca7206bf785381664b7d7bded8c70113c
> Signed-off-by: Li Dongyang <dongyangli@ddn.com>

Reviewed-by: Andreas Dilger <adilger@dilger.ca <mailto:adilger@dilger.ca>>

> ---
> tests/Makefile.in | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tests/Makefile.in b/tests/Makefile.in
> index 678cc3268c..8f7a072f45 100644
> --- a/tests/Makefile.in
> +++ b/tests/Makefile.in
> @@ -48,7 +48,7 @@ test_data.tmp: $(srcdir)/scripts/gen-test-data
> always_run:
> 
> @ifGNUmake@TESTS=$(wildcard $(srcdir)/[a-z]_*)
> -@ifNotGNUmake@TESTS != echo $(srcdir)/[a-z]_*
> +@ifnGNUmake@TESTS != echo $(srcdir)/[a-z]_*
> 
> SKIP_SLOW_TESTS=--skip-slow-tests
> 
> -- 
> 2.52.0
> 
> 


Cheers, Andreas






^ permalink raw reply

* Re: [PATCH] iomap: enforce DIO alignment check in iomap
From: Christoph Hellwig @ 2026-06-11  5:57 UTC (permalink / raw)
  To: cem
  Cc: brauner, linux-block, linux-fsdevel, linux-ext4, linux-xfs,
	Keith Busch, Hannes Reinecke, Martin K. Petersen,
	Christoph Hellwig, Jens Axboe
In-Reply-To: <20260610145218.141369-1-cem@kernel.org>

On Wed, Jun 10, 2026 at 04:52:11PM +0200, cem@kernel.org wrote:
> From: Carlos Maiolino <cem@kernel.org>
> 
> The DIO alignment check has been lifted from iomap layer to rely on the
> block layer to enforce proper alignment when issuing direct IO
> operations. This though, depending on the IO size and buffer address
> passed to the IO operation may lead to user-visible behavior change.
> 
> This has been caught initially by LTP test diotest4 running on
> PPC architecture, where the test fails because a read() operation
> with a supposedly misaligned buffer succeeds instead of an expected
> -EINVAL.
> This has no direct relationship with PPC, but seems to do with the
> IO size crossing page borders or not.

I don't understand the problem here.  Why do we want to insist on a
failure when we can support it?  I think the test is just broken.

> The problematic behavior is reproducible on x86 by reducing the IO size
> to something < PAGE_SIZE, so the misaligned read()s will also be accepted
> by the block layer.

What do you mean with misaligned here?  For a long time the kernel
supports basically arbitrary low memory alignment for diret I/O,
just bounded by the device capabilities (typical 4 byte alignment).

The supported memory alignment is reported in the statx
dio_mem_align.  What does that say compared to the alignment
expectations in this test?

^ permalink raw reply

* [PATCH 2/2] ext4: allocate the fast-commit range array lazily
From: Daejun Park @ 2026-06-11  4:49 UTC (permalink / raw)
  To: tytso@mit.edu, adilger.kernel@dilger.ca
  Cc: linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org,
	Daejun Park
In-Reply-To: <20260611044817epcms2p3b5a66f4cdb41d0cbaaa7c257cccfc8a1@epcms2p3>

The multi-interval tracker added a fixed array of EXT4_FC_MAX_RANGES + 1
entries to every ext4_inode_info -- ~136 bytes that is wasted on inodes
that never use fast commit (read-only files, directories, ...).

Shrink it to the common case:

 - Keep the first range inline in i_fc_range, so a single contiguous
   dirty region (the common case) needs no allocation at all.

 - Allocate the i_fc_ranges array only when a second disjoint range
   appears, and free it when the inode is evicted.

 - The tracking path runs under i_fc_lock and so cannot sleep, so the
   array is allocated with GFP_ATOMIC.  On failure, fall back to
   coalescing the new range into the inline i_fc_range -- exactly the
   original single coalesced-range behaviour -- so no full-commit
   fallback or fast-commit ineligibility is needed.

The per-inode fast-commit footprint drops from ~140 bytes (the embedded
array) to 20 bytes (inline range + array pointer + count); the array is
allocated only while two or more disjoint ranges are tracked.

No on-disk format change.  Crash recovery (online replay + offline
e2fsck) and the fast-commit xfstests are unaffected.

While rewriting __track_range, also skip degenerate ranges (a sub-block
punch hole rounds the start up past the end, passing end == start - 1, so
no whole block changed) instead of storing an empty range, and drop the
redundant per-transaction reset here -- ext4_fc_track_template() already
resets the range set under i_fc_lock before calling the tracker.

Signed-off-by: Daejun Park <pdaejun@gmail.com>
---
 fs/ext4/ext4.h        | 19 ++++++++----
 fs/ext4/fast_commit.c | 70 +++++++++++++++++++++++++++++++++++++++----
 fs/ext4/super.c       |  1 +
 3 files changed, 80 insertions(+), 10 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 314a1c90075b..6c6ac19e86b6 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1081,14 +1081,23 @@ struct ext4_inode_info {
 					 */
 
 	/*
-	 * Disjoint lblk ranges modified in this fast commit.  Tracking the
+	 * Logical block ranges modified in this fast commit.  Tracking the
 	 * actual modified ranges (instead of one coalesced [min,max]) avoids
 	 * re-logging the whole spanned extent map for scattered allocations.
-	 * Sorted by start, mutually disjoint.  Bounded by EXT4_FC_MAX_RANGES;
-	 * the extra slot is transient room used while inserting before an
-	 * overflow merge.  Protected by i_fc_lock.
+	 *
+	 * The first range is kept inline in i_fc_range, so the common case of a
+	 * single contiguous dirty region needs no allocation.  When a second
+	 * disjoint range appears the inode is upgraded to the i_fc_ranges array
+	 * (EXT4_FC_MAX_RANGES + 1 entries, sorted and mutually disjoint; the
+	 * extra slot is transient room used while inserting before an overflow
+	 * merge), allocated then and freed when the inode is evicted.  If that
+	 * allocation fails we fall back to coalescing into i_fc_range, i.e. the
+	 * original single coalesced-range behaviour.  i_fc_nr_ranges counts the
+	 * valid ranges; while i_fc_ranges is NULL it is 0 or 1.  Protected by
+	 * i_fc_lock.
 	 */
-	struct ext4_fc_lblk_range i_fc_ranges[EXT4_FC_MAX_RANGES + 1];
+	struct ext4_fc_lblk_range i_fc_range;
+	struct ext4_fc_lblk_range *i_fc_ranges;
 	unsigned int i_fc_nr_ranges;
 
 	spinlock_t i_raw_lock;	/* protects updates to the raw inode */
diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
index ab9ab50ad0b5..786b79a9c573 100644
--- a/fs/ext4/fast_commit.c
+++ b/fs/ext4/fast_commit.c
@@ -211,6 +211,7 @@ void ext4_fc_init_inode(struct inode *inode)
 	struct ext4_inode_info *ei = EXT4_I(inode);
 
 	ext4_fc_reset_inode(inode);
+	ei->i_fc_ranges = NULL;
 	ext4_clear_inode_state(inode, EXT4_STATE_FC_COMMITTING);
 	INIT_LIST_HEAD(&ei->i_fc_list);
 	INIT_LIST_HEAD(&ei->i_fc_dilist);
@@ -671,17 +672,73 @@ static int __track_range(handle_t *handle, struct inode *inode, void *arg,
 	struct ext4_inode_info *ei = EXT4_I(inode);
 	struct __track_range_args *__arg =
 		(struct __track_range_args *)arg;
+	ext4_lblk_t start = __arg->start, end = __arg->end;
+	ext4_lblk_t s0, e0;
 
 	if (inode->i_ino < EXT4_FIRST_INO(inode->i_sb)) {
 		ext4_debug("Special inode %ld being modified\n", inode->i_ino);
 		return -ECANCELED;
 	}
 
-	/* A new transaction (update == false) starts a fresh range set. */
-	if (!update)
-		ei->i_fc_nr_ranges = 0;
+	/*
+	 * A sub-block punch hole rounds up the start and down the end, passing
+	 * end == start - 1: no whole block changed, so there is nothing to
+	 * track.  (ext4_fc_track_template has already reset the range set for a
+	 * new transaction, so we need not do it here.)
+	 */
+	if (end < start)
+		return 0;
+
+	/* Already upgraded to the heap array: full multi-interval tracking. */
+	if (ei->i_fc_ranges) {
+		ext4_fc_range_add(ei, start, end);
+		return 0;
+	}
+
+	/* First range of this commit stays inline, no allocation needed. */
+	if (ei->i_fc_nr_ranges == 0) {
+		ei->i_fc_range.start = start;
+		ei->i_fc_range.len = end - start + 1;
+		ei->i_fc_nr_ranges = 1;
+		return 0;
+	}
+
+	/* One inline range so far. */
+	s0 = ei->i_fc_range.start;
+	e0 = s0 + ei->i_fc_range.len - 1;
 
-	ext4_fc_range_add(ei, __arg->start, __arg->end);
+	/* Disjoint from it: try to upgrade to the array for exact tracking. */
+	if (start > e0 + 1 || end + 1 < s0) {
+		struct ext4_fc_lblk_range *heap;
+
+		/*
+		 * GFP_ATOMIC: we hold i_fc_lock.  __GFP_NOWARN: failure is not
+		 * fatal -- we fall back to the single coalesced range below --
+		 * so it must not splat under memory pressure.
+		 */
+		heap = kmalloc_array(EXT4_FC_MAX_RANGES + 1, sizeof(*heap),
+				     GFP_ATOMIC | __GFP_NOWARN);
+		if (heap) {
+			heap[0] = ei->i_fc_range;
+			ei->i_fc_ranges = heap;
+			ext4_fc_range_add(ei, start, end);
+			return 0;
+		}
+		/*
+		 * Out of memory: fall back to the original single coalesced
+		 * range by absorbing the gap below.  This over-logs the spanned
+		 * extents but stays a valid fast commit (no full-commit
+		 * fallback), so there is nothing to mark ineligible.
+		 */
+	}
+
+	/* Overlapping/adjacent, or array allocation failed: coalesce inline. */
+	if (start < s0)
+		s0 = start;
+	if (end > e0)
+		e0 = end;
+	ei->i_fc_range.start = s0;
+	ei->i_fc_range.len = e0 - s0 + 1;
 
 	return 0;
 }
@@ -1016,7 +1073,10 @@ static int ext4_fc_write_inode_data(struct inode *inode, u32 *crc)
 		spin_unlock(&ei->i_fc_lock);
 		return 0;
 	}
-	memcpy(ranges, ei->i_fc_ranges, nr * sizeof(ranges[0]));
+	if (ei->i_fc_ranges)
+		memcpy(ranges, ei->i_fc_ranges, nr * sizeof(ranges[0]));
+	else
+		ranges[0] = ei->i_fc_range;	/* inline single-range mode */
 	ei->i_fc_nr_ranges = 0;
 	spin_unlock(&ei->i_fc_lock);
 
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 699c15db28a8..93d495cad0ba 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1433,6 +1433,7 @@ static void ext4_free_in_core_inode(struct inode *inode)
 		pr_warn("%s: inode %ld still in fc list",
 			__func__, inode->i_ino);
 	}
+	kfree(EXT4_I(inode)->i_fc_ranges);
 	kmem_cache_free(ext4_inode_cachep, EXT4_I(inode));
 }
 
-- 
2.43.0


^ permalink raw reply related

* [PATCH 1/2] ext4: track multiple disjoint fast-commit ranges per inode
From: Daejun Park @ 2026-06-11  4:48 UTC (permalink / raw)
  To: tytso@mit.edu, adilger.kernel@dilger.ca
  Cc: linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org,
	Daejun Park
In-Reply-To: <20260611044733epcms2p38013ae683a283555526f70e4eab6d2a9@epcms2p3>

Fast commit tracks a single coalesced logical range per inode
(i_fc_lblk_start .. i_fc_lblk_len).  When an inode is modified at several
disjoint offsets between two commits (e.g. sparse random writes), the
range is widened to span [min, max] of all touched offsets, and at commit
time ext4_fc_write_inode_data() re-logs every extent inside that span,
including the unmodified ones.  On sparse allocation this inflates
fast-commit traffic and often overflows the fast-commit area, forcing a
fallback to a full jbd2 commit.

Replace the single range with a bounded array of up to EXT4_FC_MAX_RANGES
(16) disjoint ranges.  __track_range inserts and merges into it; on
overflow the two ranges separated by the smallest gap are coalesced, so
it degrades to the old single-span behaviour in the worst case.
ext4_fc_write_inode_data() now walks only the tracked ranges.  The
on-disk fast-commit (TLV) format is unchanged.

The number of disjoint dirty regions an inode accumulates per fsync --
how scattered the writes are -- controls how badly the single-span
tracking over-logs.  On a sparse random-write workload (1 GiB span, 300
fsyncs, NVMe):

                           16 regions    64 regions
  fast-commit blocks/cmt   19.1 -> 1.0   76.3 -> 31.6
  mean fsync latency (us)  2537 -> 2280  3398 -> 2937
  p99  fsync latency (us)  3698 -> 2545  4492 -> 4291

With 16 dirty regions per fsync everything fits within the 16-range cap
and each region is tracked exactly; 64 regions exceeds the cap and
exercises the overflow-merge path, which still roughly halves the logged
blocks.  On a small filesystem whose fast-commit area is easily exhausted,
the reduced traffic also cuts the full-commit fallback rate (e.g. 22% ->
2% at 16 regions on an 8 GiB fs).

Crash recovery (online replay + offline e2fsck) and the ext4/generic
fast-commit xfstests show no regression; the unchanged on-disk format
means e2fsprogs needs no update.

Signed-off-by: Daejun Park <pdaejun@gmail.com>
---
 fs/ext4/ext4.h        |  31 ++++++++--
 fs/ext4/fast_commit.c | 138 ++++++++++++++++++++++++++++++++----------
 2 files changed, 130 insertions(+), 39 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 01a6e2de7fc3..314a1c90075b 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1017,6 +1017,20 @@ enum {
 };
 
 
+/*
+ * Maximum number of disjoint logical-block ranges tracked per inode for a
+ * single fast commit.  Scattered allocations that exceed this get their two
+ * closest ranges merged (see ext4_fc_range_add()), degrading gracefully to
+ * the old single coalesced-range behaviour.
+ */
+#define EXT4_FC_MAX_RANGES 16
+
+/* In-memory record of an lblk range modified in the current fast commit. */
+struct ext4_fc_lblk_range {
+	ext4_lblk_t start;
+	ext4_lblk_t len;
+};
+
 /*
  * fourth extended file system inode data in memory
  */
@@ -1066,11 +1080,16 @@ struct ext4_inode_info {
 					 * protected by sbi->s_fc_lock.
 					 */
 
-	/* Start of lblk range that needs to be committed in this fast commit */
-	ext4_lblk_t i_fc_lblk_start;
-
-	/* End of lblk range that needs to be committed in this fast commit */
-	ext4_lblk_t i_fc_lblk_len;
+	/*
+	 * Disjoint lblk ranges modified in this fast commit.  Tracking the
+	 * actual modified ranges (instead of one coalesced [min,max]) avoids
+	 * re-logging the whole spanned extent map for scattered allocations.
+	 * Sorted by start, mutually disjoint.  Bounded by EXT4_FC_MAX_RANGES;
+	 * the extra slot is transient room used while inserting before an
+	 * overflow merge.  Protected by i_fc_lock.
+	 */
+	struct ext4_fc_lblk_range i_fc_ranges[EXT4_FC_MAX_RANGES + 1];
+	unsigned int i_fc_nr_ranges;
 
 	spinlock_t i_raw_lock;	/* protects updates to the raw inode */
 
@@ -1078,7 +1097,7 @@ struct ext4_inode_info {
 	wait_queue_head_t i_fc_wait;
 
 	/*
-	 * Protect concurrent accesses on i_fc_lblk_start, i_fc_lblk_len
+	 * Protect concurrent accesses on i_fc_ranges, i_fc_nr_ranges
 	 * and inode's EXT4_FC_STATE_COMMITTING state bit.
 	 */
 	spinlock_t i_fc_lock;
diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
index 42bee1d4f9f9..ab9ab50ad0b5 100644
--- a/fs/ext4/fast_commit.c
+++ b/fs/ext4/fast_commit.c
@@ -203,8 +203,7 @@ static inline void ext4_fc_reset_inode(struct inode *inode)
 {
 	struct ext4_inode_info *ei = EXT4_I(inode);
 
-	ei->i_fc_lblk_start = 0;
-	ei->i_fc_lblk_len = 0;
+	ei->i_fc_nr_ranges = 0;
 }
 
 void ext4_fc_init_inode(struct inode *inode)
@@ -540,7 +539,7 @@ static int __track_inode(handle_t *handle, struct inode *inode, void *arg,
 	if (update)
 		return -EEXIST;
 
-	EXT4_I(inode)->i_fc_lblk_len = 0;
+	EXT4_I(inode)->i_fc_nr_ranges = 0;
 
 	return 0;
 }
@@ -603,12 +602,73 @@ struct __track_range_args {
 	ext4_lblk_t start, end;
 };
 
+/*
+ * Record that logical block range [start, end] was modified in the current
+ * fast commit.  Maintains a small, bounded set of sorted, mutually disjoint
+ * ranges, merging the new range with any it overlaps or is adjacent to.  When
+ * the set would exceed EXT4_FC_MAX_RANGES, the consecutive pair separated by
+ * the smallest gap is merged (absorbing that gap), so the worst case degrades
+ * gracefully to the old single coalesced-range behaviour.  Tracking the actual
+ * modified ranges (rather than one [min,max] span) keeps ext4_fc_write_inode_data
+ * from re-logging the whole spanned extent map on scattered allocations.
+ * Caller holds ei->i_fc_lock.
+ */
+static void ext4_fc_range_add(struct ext4_inode_info *ei,
+			      ext4_lblk_t start, ext4_lblk_t end)
+{
+	struct ext4_fc_lblk_range *r = ei->i_fc_ranges;
+	unsigned int n = ei->i_fc_nr_ranges;
+	unsigned int i, j;
+
+	/* Skip ranges lying entirely before [start - 1] (no overlap/adjacency). */
+	i = 0;
+	while (i < n && r[i].start + r[i].len < start)
+		i++;
+
+	/* Absorb every range overlapping or adjacent to the growing [start,end]. */
+	j = i;
+	while (j < n && r[j].start <= end + 1) {
+		if (r[j].start < start)
+			start = r[j].start;
+		if (r[j].start + r[j].len - 1 > end)
+			end = r[j].start + r[j].len - 1;
+		j++;
+	}
+
+	/* Replace r[i..j-1] with the merged range (j == i is a plain insert). */
+	if (j != i + 1)
+		memmove(&r[i + 1], &r[j], (n - j) * sizeof(*r));
+	r[i].start = start;
+	r[i].len = end - start + 1;
+	ei->i_fc_nr_ranges = n - (j - i) + 1;
+
+	/* Overflow: merge the consecutive pair separated by the smallest gap. */
+	while (ei->i_fc_nr_ranges > EXT4_FC_MAX_RANGES) {
+		ext4_lblk_t best_gap = ~0U;
+		unsigned int best = 0;
+
+		n = ei->i_fc_nr_ranges;
+		for (i = 0; i + 1 < n; i++) {
+			ext4_lblk_t gap = r[i + 1].start -
+					  (r[i].start + r[i].len);
+
+			if (gap < best_gap) {
+				best_gap = gap;
+				best = i;
+			}
+		}
+		r[best].len = r[best + 1].start + r[best + 1].len - r[best].start;
+		memmove(&r[best + 1], &r[best + 2],
+			(n - best - 2) * sizeof(*r));
+		ei->i_fc_nr_ranges = n - 1;
+	}
+}
+
 /* __track_fn for tracking data updates */
 static int __track_range(handle_t *handle, struct inode *inode, void *arg,
 			 bool update)
 {
 	struct ext4_inode_info *ei = EXT4_I(inode);
-	ext4_lblk_t oldstart;
 	struct __track_range_args *__arg =
 		(struct __track_range_args *)arg;
 
@@ -617,17 +677,11 @@ static int __track_range(handle_t *handle, struct inode *inode, void *arg,
 		return -ECANCELED;
 	}
 
-	oldstart = ei->i_fc_lblk_start;
+	/* A new transaction (update == false) starts a fresh range set. */
+	if (!update)
+		ei->i_fc_nr_ranges = 0;
 
-	if (update && ei->i_fc_lblk_len > 0) {
-		ei->i_fc_lblk_start = min(ei->i_fc_lblk_start, __arg->start);
-		ei->i_fc_lblk_len =
-			max(oldstart + ei->i_fc_lblk_len - 1, __arg->end) -
-				ei->i_fc_lblk_start + 1;
-	} else {
-		ei->i_fc_lblk_start = __arg->start;
-		ei->i_fc_lblk_len = __arg->end - __arg->start + 1;
-	}
+	ext4_fc_range_add(ei, __arg->start, __arg->end);
 
 	return 0;
 }
@@ -890,33 +944,20 @@ static int ext4_fc_write_inode(struct inode *inode, u32 *crc)
  * Writes updated data ranges for the inode in question. Updates CRC.
  * Returns 0 on success, error otherwise.
  */
-static int ext4_fc_write_inode_data(struct inode *inode, u32 *crc)
+/* Write the fast commit TLVs for one modified lblk range [start, end]. */
+static int ext4_fc_write_lblk_range(struct inode *inode, ext4_lblk_t start,
+				    ext4_lblk_t end, u32 *crc)
 {
-	ext4_lblk_t old_blk_size, cur_lblk_off, new_blk_size;
-	struct ext4_inode_info *ei = EXT4_I(inode);
+	ext4_lblk_t cur_lblk_off = start;
 	struct ext4_map_blocks map;
 	struct ext4_fc_add_range fc_ext;
 	struct ext4_fc_del_range lrange;
 	struct ext4_extent *ex;
 	int ret;
 
-	spin_lock(&ei->i_fc_lock);
-	if (ei->i_fc_lblk_len == 0) {
-		spin_unlock(&ei->i_fc_lock);
-		return 0;
-	}
-	old_blk_size = ei->i_fc_lblk_start;
-	new_blk_size = ei->i_fc_lblk_start + ei->i_fc_lblk_len - 1;
-	ei->i_fc_lblk_len = 0;
-	spin_unlock(&ei->i_fc_lock);
-
-	cur_lblk_off = old_blk_size;
-	ext4_debug("will try writing %d to %d for inode %ld\n",
-		   cur_lblk_off, new_blk_size, inode->i_ino);
-
-	while (cur_lblk_off <= new_blk_size) {
+	while (cur_lblk_off <= end) {
 		map.m_lblk = cur_lblk_off;
-		map.m_len = new_blk_size - cur_lblk_off + 1;
+		map.m_len = end - cur_lblk_off + 1;
 		ret = ext4_map_blocks(NULL, inode, &map,
 				      EXT4_GET_BLOCKS_IO_SUBMIT |
 				      EXT4_EX_NOCACHE);
@@ -962,6 +1003,37 @@ static int ext4_fc_write_inode_data(struct inode *inode, u32 *crc)
 	return 0;
 }
 
+static int ext4_fc_write_inode_data(struct inode *inode, u32 *crc)
+{
+	struct ext4_inode_info *ei = EXT4_I(inode);
+	struct ext4_fc_lblk_range ranges[EXT4_FC_MAX_RANGES + 1];
+	unsigned int nr, i;
+	int ret;
+
+	spin_lock(&ei->i_fc_lock);
+	nr = ei->i_fc_nr_ranges;
+	if (nr == 0) {
+		spin_unlock(&ei->i_fc_lock);
+		return 0;
+	}
+	memcpy(ranges, ei->i_fc_ranges, nr * sizeof(ranges[0]));
+	ei->i_fc_nr_ranges = 0;
+	spin_unlock(&ei->i_fc_lock);
+
+	for (i = 0; i < nr; i++) {
+		ext4_lblk_t start = ranges[i].start;
+		ext4_lblk_t end = ranges[i].start + ranges[i].len - 1;
+
+		ext4_debug("will try writing %u to %u for inode %ld\n",
+			   start, end, inode->i_ino);
+		ret = ext4_fc_write_lblk_range(inode, start, end, crc);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
 
 /* Flushes data of all the inodes in the commit queue. */
 static int ext4_fc_flush_data(journal_t *journal)
-- 
2.43.0


^ permalink raw reply related

* [PATCH 0/2] ext4: reduce fast-commit write amplification for scattered writes
From: Daejun Park @ 2026-06-11  4:47 UTC (permalink / raw)
  To: tytso@mit.edu, adilger.kernel@dilger.ca
  Cc: linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org,
	Daejun Park
In-Reply-To: <CGME20260611044733epcms2p38013ae683a283555526f70e4eab6d2a9@epcms2p3>

ext4 fast commit tracks a single coalesced logical range per inode.  When
an inode is dirtied at several disjoint offsets between two commits (e.g.
sparse/scattered random writes), that range is widened to span [min, max]
of all the touched offsets, and ext4_fc_write_inode_data() then re-logs
every extent inside that span -- including the unmodified ones.  On sparse
allocation this inflates fast-commit traffic and frequently overflows the
fast-commit area, forcing a fallback to a full jbd2 commit.

This series replaces the single range with a small, bounded set of disjoint
ranges so that only the actually-modified regions are logged, while keeping
the per-inode memory cost negligible:

  1/2 tracks up to EXT4_FC_MAX_RANGES (16) disjoint ranges, merging the two
      closest ranges when the set would overflow -- so the worst case
      degrades gracefully to the old single-span behaviour.  The on-disk
      fast-commit (TLV) format is unchanged.

  2/2 allocates that array lazily: the first range is kept inline, the array
      is allocated only when a second disjoint range appears, and on an
      allocation failure we fall back to the inline single range.  The
      per-inode fast-commit footprint drops from ~140 to 20 bytes.

Measured on a sparse random-write workload (1 GiB span, R disjoint dirty
regions per fsync, 300 fsyncs, bare-metal NVMe):

  - fast-commit blocks per commit (R=16):  18.6 -> 1.0
  - full-commit fallback rate     (R=16):  22%  -> 2%   (on a small fs)
  - mean fsync latency:  R=16  -10%,  R=64  -14%
  - p99  fsync latency:  R=16  -31%

The p99 improvement comes from eliminating the full-commit fallback spikes.

Testing: crash recovery (power loss -> fast-commit replay -> verify every
fsync'd block, then e2fsck) is clean; the ext4/generic fast-commit xfstests
show no regression; the unchanged on-disk format means e2fsprogs needs no
update.  Both patches are checkpatch --strict clean.

Based on v6.17-rc3.

Daejun Park (2):
  ext4: track multiple disjoint fast-commit ranges per inode
  ext4: allocate the fast-commit range array lazily

 fs/ext4/ext4.h        |  40 +++++++--
 fs/ext4/fast_commit.c | 196 +++++++++++++++++++++++++++++++++++-------
 fs/ext4/super.c       |   1 +
 3 files changed, 199 insertions(+), 38 deletions(-)

-- 
2.43.0

^ permalink raw reply

* [PATCH][e2fsprogs] build: use correct subst variable
From: Li Dongyang @ 2026-06-11  3:52 UTC (permalink / raw)
  To: linux-ext4; +Cc: adilger

ifNotGNUmake was changed to ifnGNUmake but test/Makefile.in still uses
the old variable name.
make fullcheck fails on some platforms:
make[2]: Entering directory `/var/lib/jenkins/workspace/e2fsprogs-reviews/arch/x86_64/distro/el7/_topdir/BUILD/e2fsprogs-1.47.4/tests'
Makefile:387: *** missing separator.  Stop.
make[2]: Leaving directory `/var/lib/jenkins/workspace/e2fsprogs-reviews/arch/x86_64/distro/el7/_topdir/BUILD/e2fsprogs-1.47.4/tests'
make[1]: *** [fullcheck-recursive] Error 1

Fixes: b7d1ab3376 "Update configure/configure.ac/aclocal.m4 to use autoconf 2.72"
Change-Id: Iec3cacfca7206bf785381664b7d7bded8c70113c
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
---
 tests/Makefile.in | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/Makefile.in b/tests/Makefile.in
index 678cc3268c..8f7a072f45 100644
--- a/tests/Makefile.in
+++ b/tests/Makefile.in
@@ -48,7 +48,7 @@ test_data.tmp: $(srcdir)/scripts/gen-test-data
 always_run:

 @ifGNUmake@TESTS=$(wildcard $(srcdir)/[a-z]_*)
-@ifNotGNUmake@TESTS != echo $(srcdir)/[a-z]_*
+@ifnGNUmake@TESTS != echo $(srcdir)/[a-z]_*

 SKIP_SLOW_TESTS=--skip-slow-tests

-- 
2.52.0

^ permalink raw reply related

* Re: [PATCH 5.10/5.15] ext4: validate p_idx bounds in ext4_ext_correct_indexes
From: Sasha Levin @ 2026-06-11  0:45 UTC (permalink / raw)
  To: stable, Greg Kroah-Hartman
  Cc: Sasha Levin, Alexey Panov, Theodore Ts'o, Andreas Dilger,
	linux-ext4, linux-kernel, Baokun Li, Jan Kara, Ojaswin Mujoo,
	Ritesh Harjani (IBM), Zhang Yi, lvc-project,
	syzbot+04c4e65cab786a2e5b7e, Tejas Bharambe, stable
In-Reply-To: <20260609164430.29988-1-apanov@astralinux.ru>

On Mon, Jun 09, 2026 at 07:44:30PM +0300, Alexey Panov wrote:
> [PATCH 5.10/5.15] ext4: validate p_idx bounds in ext4_ext_correct_indexes

Queued for 5.15 and 5.10, thanks.

--
Thanks,
Sasha

^ permalink raw reply

* [PATCH v2 10/10] ext4: Add EXT4_IOC_SET_LUFID ioctl for setting LUFID on directory entries
From: Artem Blagodarenko @ 2026-06-10 15:24 UTC (permalink / raw)
  To: linux-ext4; +Cc: adilger.kernel, Artem Blagodarenko, Andreas Dilger
In-Reply-To: <20260610152417.13576-1-ablagodarenko@thelustrecollective.com>

From: Artem Blagodarenko <artem.blagodarenko@gmail.com>

Add a new ioctl command that allows setting LUFID (Locally Unique File ID)
data on existing directory entries. This includes:

- ext4_ioctl_set_lufid(): ioctl handler that validates parameters and
  calls the underlying implementation
- ext4_set_direntry_lufid(): Core function that performs the operation by:
  * Looking up the target directory entry
  * Retrieving the associated inode
  * Deleting the old entry and re-creating it with LUFID data attached

This implementation requires the dirdata feature to be enabled on the
filesystem and properly handles transactions and inode locking to ensure
consistency.

Signed-off-by: Artem Blagodarenko artem.blagodarenko@gmail.com
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
---
 fs/ext4/ext4.h            |  15 +++++
 fs/ext4/ioctl.c           |  62 ++++++++++++++++++++
 fs/ext4/namei.c           | 120 ++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/ext4.h |  13 +++++
 4 files changed, 210 insertions(+)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index defa18d98c74..975b0975e032 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1193,6 +1193,7 @@ struct ext4_inode_info {
 #ifdef CONFIG_FS_ENCRYPTION
 	struct fscrypt_inode_info *i_crypt_info;
 #endif
+	void *i_dirdata;
 };
 
 /*
@@ -2515,6 +2516,18 @@ struct ext4_dirent_hash {
 	struct ext4_dir_entry_hash	dh_hash;
 } __packed;
 
+static inline
+struct ext4_dirent_fid *ext4_dentry_get_fid(struct super_block *sb,
+					    struct ext4_dentry_param *p)
+{
+	if (!ext4_has_feature_dirdata(sb))
+		return NULL;
+	if (p && p->edp_magic == EXT4_LUFID_MAGIC)
+		return &p->edp_dfid;
+
+	return NULL;
+}
+
 #define EXT4_FT_DIR_CSUM	0xDE
 
 /*
@@ -3215,6 +3228,8 @@ static inline int ext4_init_new_dir(handle_t *handle, struct inode *dir,
 }
 extern int ext4_dirblock_csum_verify(struct inode *inode,
 				     struct buffer_head *bh);
+extern int ext4_dirdata_set_lufid(struct inode *dir, const char *filename,
+			   int namelen, struct ext4_dentry_param *edp);
 extern int ext4_htree_fill_tree(struct file *dir_file, __u32 start_hash,
 				__u32 start_minor_hash, __u32 *next_hash);
 extern int ext4_search_dir(struct buffer_head *bh,
diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index 1d0c3d4bdf47..9f32f21d9b3a 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -1529,6 +1529,65 @@ static int ext4_ioctl_set_tune_sb(struct file *filp,
 	return ret;
 }
 
+/*
+ * ext4_ioctl_set_lufid() - Set LUFID on a directory entry
+ * @filp:	file pointer (parent directory)
+ * @arg:	pointer to ext4_set_lufid structure with filename and LUFID data
+ *
+ * This ioctl allows setting LUFID data on an existing
+ * directory entry. It is called on the parent directory with a filename and
+ * LUFID data.
+ */
+static long ext4_ioctl_set_lufid(struct file *filp, unsigned long arg)
+{
+	struct inode *dir = file_inode(filp);
+	struct ext4_set_lufid lufid_args;
+	struct {
+		__u32 edp_magic;
+		struct ext4_dirent_data_header df_header;
+		char df_fid[255];
+	} edp;
+	int err;
+
+	/* Check if parent is a directory */
+	if (!S_ISDIR(dir->i_mode))
+		return -ENOTDIR;
+
+	/* Copy arguments from user space */
+	if (copy_from_user(&lufid_args, (struct ext4_set_lufid __user *)arg,
+			   sizeof(lufid_args)))
+		return -EFAULT;
+
+	/* Validate parameters */
+	if (lufid_args.esl_name_len == 0 || lufid_args.esl_name_len > EXT4_NAME_LEN)
+		return -EINVAL;
+
+	if (lufid_args.esl_data_len == 0 || lufid_args.esl_data_len > 255)
+		return -EINVAL;
+
+	/* Ensure filename is NUL-terminated and unmodified */
+	if (lufid_args.esl_name[lufid_args.esl_name_len - 1] != '\0')
+		return -EINVAL;
+
+	/* Prepare the dentry param struct with LUFID data */
+	edp.edp_magic = EXT4_LUFID_MAGIC;
+	edp.df_header.ddh_length = lufid_args.esl_data_len;
+	memcpy(edp.df_fid, lufid_args.esl_data, lufid_args.esl_data_len);
+
+	/* Want write access */
+	err = mnt_want_write_file(filp);
+	if (err)
+		return err;
+
+	/* Call the helper function to do the actual work */
+	err = ext4_dirdata_set_lufid(dir, lufid_args.esl_name,
+				    lufid_args.esl_name_len - 1,
+				    (struct ext4_dentry_param *)&edp);
+
+	mnt_drop_write_file(filp);
+	return err;
+}
+
 static long __ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 {
 	struct inode *inode = file_inode(filp);
@@ -1912,6 +1971,8 @@ static long __ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 					      (void __user *)arg);
 	case EXT4_IOC_SET_TUNE_SB_PARAM:
 		return ext4_ioctl_set_tune_sb(filp, (void __user *)arg);
+	case EXT4_IOC_SET_LUFID:
+		return ext4_ioctl_set_lufid(filp, arg);
 	default:
 		return -ENOTTY;
 	}
@@ -1991,6 +2052,7 @@ long ext4_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 	case FS_IOC_SETFSLABEL:
 	case EXT4_IOC_GETFSUUID:
 	case EXT4_IOC_SETFSUUID:
+	case EXT4_IOC_SET_LUFID:
 		break;
 	default:
 		return -ENOIOCTLCMD;
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index c799f87d7459..65c53c08213a 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -2264,6 +2264,8 @@ static int add_dirent_to_buf(handle_t *handle, struct ext4_filename *fname,
 	if (ext4_has_feature_metadata_csum(inode->i_sb))
 		csum_size = sizeof(struct ext4_dir_entry_tail);
 
+	dfid = ext4_dentry_get_fid(inode->i_sb,
+		(struct ext4_dentry_param *)EXT4_I(inode)->i_dirdata);
 	if (!de) {
 		if (dfid)
 			dlen = dfid->df_header.ddh_length;
@@ -2605,6 +2607,7 @@ static int ext4_add_entry(handle_t *handle, struct dentry *dentry,
 {
 	struct inode *dir = d_inode(dentry->d_parent);
 
+	EXT4_I(inode)->i_dirdata = dentry->d_fsdata;
 	if (fscrypt_is_nokey_name(dentry))
 		return -ENOKEY;
 	return __ext4_add_entry(handle, dir, &dentry->d_name, inode);
@@ -4361,6 +4364,123 @@ static int ext4_rename2(struct mnt_idmap *idmap,
 	return ext4_rename(idmap, old_dir, old_dentry, new_dir, new_dentry, flags);
 }
 
+/*
+ * ext4_dirdata_set_lufid() - Set LUFID data on an existing directory entry
+ * @dir:        parent directory inode
+ * @filename:   name of the file in the directory
+ * @namelen:    length of filename
+ * @edp:        pointer to initialized dentry param with LUFID data
+ *
+ * This function finds an existing directory entry, deletes it, and re-creates it
+ * with LUFID data attached. Used by the EXT4_IOC_SET_LUFID ioctl.
+ *
+ * Returns 0 on success, negative error code on failure.
+ */
+int ext4_dirdata_set_lufid(struct inode *dir, const char *filename,
+			    int namelen, struct ext4_dentry_param *edp)
+{
+	struct super_block *sb = dir->i_sb;
+	struct ext4_filename fname;
+	struct ext4_dir_entry_2 *de = NULL;
+	struct buffer_head *bh = NULL;
+	struct inode *inode = NULL;
+	handle_t *handle = NULL;
+	struct qstr d_name;
+	void *old_dirdata = NULL;
+	int err = 0;
+
+	/* Check if dirdata feature is enabled */
+	if (!ext4_has_feature_dirdata(sb))
+		return -ENOTSUPP;
+
+	if (namelen > EXT4_NAME_LEN)
+               return -ENAMETOOLONG;
+        if (namelen != strnlen(filename, namelen + 1))
+               return -EINVAL;
+
+	/* Setup the filename for lookup */
+	d_name.name = filename;
+	d_name.len = namelen;
+
+	/* Lookup the filename in the directory */
+	err = ext4_fname_setup_filename(dir, &d_name, 0, &fname);
+	if (err)
+		goto out_free;
+
+	bh = ext4_find_entry(dir, &d_name, &de, NULL);
+	if (!bh) {
+		err = -ENOENT;
+		goto out_free;
+	}
+
+	/* Get the inode number from the directory entry */
+	inode = ext4_iget(sb, le32_to_cpu(de->inode), EXT4_IGET_NORMAL);
+	if (IS_ERR(inode)) {
+		err = PTR_ERR(inode);
+		inode = NULL;
+		goto out_brelse;
+	}
+
+	/* Start a transaction */
+	handle = ext4_journal_start(dir, EXT4_HT_DIR, 
+				     2 * EXT4_DATA_TRANS_BLOCKS(sb) + 
+				     EXT4_INDEX_EXTRA_TRANS_BLOCKS);
+	if (IS_ERR(handle)) {
+		err = PTR_ERR(handle);
+		handle = NULL;
+		goto out_iput;
+	}
+
+	inode_lock(dir);
+
+	/* Delete the old entry */
+	err = ext4_delete_entry(handle, dir, de, bh);
+	if (err)
+		goto out_unlock;
+
+	brelse(bh);
+	bh = NULL;
+
+	/* Re-add the entry with LUFID data
+	 * We set i_dirdata before adding so the entry can include it
+	 */
+	old_dirdata = EXT4_I(inode)->i_dirdata;
+	EXT4_I(inode)->i_dirdata = edp;
+
+	/* Use ext4_add_entry() to properly handle hash table management
+	 * and block splitting, just like rename does. This ensures the entry
+	 * is placed in the correct hash block and avoids breaking dirhash.
+	 */
+	{
+		struct dentry parent_dentry = { .d_inode = dir };
+		struct dentry new_dentry = {
+			.d_name = d_name,
+			.d_parent = &parent_dentry,
+			.d_inode = inode,  /* Same inode (in-place update) */
+			.d_fsdata = edp,   /* required */
+		};
+		err = ext4_add_entry(handle, &new_dentry, inode);
+	}
+	EXT4_I(inode)->i_dirdata = old_dirdata;
+
+	/* Update inode times */
+	inode_set_ctime_current(dir);
+	inode_inc_iversion(dir);
+	ext4_mark_inode_dirty(handle, dir);
+
+out_unlock:
+	inode_unlock(dir);
+	ext4_journal_stop(handle);
+out_iput:
+	iput(inode);
+out_brelse:
+	brelse(bh);
+out_free:
+	ext4_fname_free_filename(&fname);
+
+	return err;
+}
+
 /*
  * directories can handle most operations...
  */
diff --git a/include/uapi/linux/ext4.h b/include/uapi/linux/ext4.h
index 9c683991c32f..b04bbb2818a3 100644
--- a/include/uapi/linux/ext4.h
+++ b/include/uapi/linux/ext4.h
@@ -35,6 +35,7 @@
 #define EXT4_IOC_SETFSUUID		_IOW('f', 44, struct fsuuid)
 #define EXT4_IOC_GET_TUNE_SB_PARAM	_IOR('f', 45, struct ext4_tune_sb_params)
 #define EXT4_IOC_SET_TUNE_SB_PARAM	_IOW('f', 46, struct ext4_tune_sb_params)
+#define EXT4_IOC_SET_LUFID		_IOW('f', 47, struct ext4_set_lufid)
 
 #define EXT4_IOC_SHUTDOWN _IOR('X', 125, __u32)
 
@@ -92,6 +93,18 @@ struct move_extent {
 	__u64 moved_len;	/* moved block length */
 };
 
+/*
+ * Structure for EXT4_IOC_SET_LUFID
+ * Sets LUFID on a directory entry
+ * Called on parent directory with filename and LUFID data as arguments
+ */
+struct ext4_set_lufid {
+	__u8 esl_name_len;	/* length of filename */
+	__u8 esl_data_len;	/* length of LUFID data */
+	char  esl_name[255 + 1]; /* filename (NUL-terminated) */
+	char  esl_data[255 + 1]; /* LUFID data (raw bytes) */
+};
+
 /*
  * Flags used by EXT4_IOC_SHUTDOWN
  */
-- 
2.43.7


^ permalink raw reply related

* [PATCH v2 09/10] ext4: add dirdata set/get helpers
From: Artem Blagodarenko @ 2026-06-10 15:24 UTC (permalink / raw)
  To: linux-ext4; +Cc: adilger.kernel, Artem Blagodarenko, Andreas Dilger
In-Reply-To: <20260610152417.13576-1-ablagodarenko@thelustrecollective.com>

From: Artem Blagodarenko <artem.blagodarenko@gmail.com>

Add helpers to set and retrieve dirdata payload and hook them up at
the appropriate call sites.

Enable dirdata for casefold+encryption hashes and storing unique
128-bit file identifier in the directory entry for testing.

Signed-off-by: Artem Blagodarenko <artem.blagodarenko@gmail.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
---
 foofile.txt      |   0
 fs/ext4/ext4.h   |   4 +
 fs/ext4/inline.c |   6 +-
 fs/ext4/namei.c  | 196 ++++++++++++++++++++++++++++++++++++++++-------
 4 files changed, 176 insertions(+), 30 deletions(-)

diff --git a/foofile.txt b/foofile.txt
new file mode 100644
index 000000000000..e69de29bb2d1
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index ef99e4fa99d7..defa18d98c74 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -3787,6 +3787,10 @@ extern int __ext4_unlink(struct inode *dir, const struct qstr *d_name,
 			 struct inode *inode, struct dentry *dentry);
 extern int __ext4_link(struct inode *dir, struct inode *inode,
 		       const struct qstr *d_name, struct dentry *dentry);
+extern unsigned char ext4_dirdata_get(struct ext4_dir_entry_2 *de,
+				      struct inode *dir,
+				      struct ext4_dirent_fid  *lufid,
+				      struct dx_hash_info *hinfo);
 
 #define S_SHIFT 12
 static const unsigned char ext4_type_by_mode[(S_IFMT >> S_SHIFT) + 1] = {
diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
index c57a8ebe4f94..71c395c9a162 100644
--- a/fs/ext4/inline.c
+++ b/fs/ext4/inline.c
@@ -1346,10 +1346,8 @@ int ext4_inlinedir_to_tree(struct file *dir_file,
 			}
 		}
 
-		if (ext4_hash_in_dirent(dir)) {
-			hinfo->hash = EXT4_DIRENT_HASH(de);
-			hinfo->minor_hash = EXT4_DIRENT_MINOR_HASH(de);
-		} else {
+		if (!(ext4_dirdata_get(de, dir, NULL, hinfo) &
+							EXT4_DIRENT_CFHASH)) {
 			err = ext4fs_dirhash(dir, de->name, de->name_len, hinfo);
 			if (err) {
 				ret = err;
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 40a3394f7eac..c799f87d7459 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -1084,22 +1084,22 @@ static int htree_dirblock_to_tree(struct file *dir_file,
 			/* silently ignore the rest of the block */
 			break;
 		}
-		if (ext4_hash_in_dirent(dir)) {
-			if (de->name_len && de->inode) {
-				hinfo->hash = EXT4_DIRENT_HASH(de);
-				hinfo->minor_hash = EXT4_DIRENT_MINOR_HASH(de);
-			} else {
-				hinfo->hash = 0;
-				hinfo->minor_hash = 0;
-			}
+		if (de->name_len && de->inode) {
+			/* check for saved hash first, or generate it from name */
+			if (!(ext4_dirdata_get(de, dir, NULL, hinfo) &
+			      EXT4_DIRENT_CFHASH)) {
+				err = ext4fs_dirhash(dir, de->name,
+						     de->name_len, hinfo);
+				if (err < 0) {
+					count = err;
+					goto errout;
+				}
+			 }
 		} else {
-			err = ext4fs_dirhash(dir, de->name,
-					     de->name_len, hinfo);
-			if (err < 0) {
-				count = err;
-				goto errout;
-			}
+			hinfo->hash = 0;
+			hinfo->minor_hash = 0;
 		}
+
 		if ((hinfo->hash < start_hash) ||
 		    ((hinfo->hash == start_hash) &&
 		     (hinfo->minor_hash < start_minor_hash)))
@@ -1277,9 +1277,160 @@ static inline int search_dirblock(struct buffer_head *bh,
  */
 
 /*
- * Create map of hash values, offsets, and sizes, stored at end of block.
- * Returns number of entries mapped.
+ * ext4_dirdata_get() - Read dirdata fields from a directory entry.
+ * @de:         directory entry
+ * @dir:        directory inode (used for fscrypt+casefold hash fallback)
+ * @dfid:      if non-NULL and EXT4_DIRENT_LUFID is set, LUFID data is copied
+ * 		here
+ * @hinfo:	if non-NULL, receives the casefold hash and minor hash
+ *
+ * Reads any dirdata stored in @de.  If the dirdata feature is not enabled,
+ * falls back to reading the hash stored inline after the filename (for
+ * compatibility with the older casefold+fscrypt format).
+ *
+ * Returns a bitmask of EXT4_DIRENT_* flags indicating which fields were read.
  */
+unsigned char ext4_dirdata_get(struct ext4_dir_entry_2 *de, struct inode *dir,
+			       struct ext4_dirent_fid *dfid,
+			       struct dx_hash_info *hinfo)
+{
+	unsigned char ret = 0;
+	unsigned int data_offset = de->name_len + 1;
+
+	if (data_offset > de->rec_len)
+		return ret;
+
+	/* compatibility: hash stored inline after filename (no dirdata) */
+	if (hinfo && !ext4_has_feature_dirdata(dir->i_sb) &&
+	    ext4_hash_in_dirent(dir)) {
+		hinfo->hash = EXT4_DIRENT_HASH(de);
+		hinfo->minor_hash = EXT4_DIRENT_MINOR_HASH(de);
+		ret |= EXT4_DIRENT_CFHASH;
+
+		return ret;
+	}
+
+	/*  EXT4_DIRENT_* are not expected without flag in i_sb */
+	if (de->file_type & EXT4_DIRENT_LUFID) {
+		struct ext4_dirent_fid *dfid =
+			(struct ext4_dirent_fid *)(de->name + data_offset);
+		unsigned int dlen;
+
+		if (data_offset + sizeof(dfid->df_header) > de->rec_len)
+			return ret;
+
+		dlen = dfid->df_header.ddh_length;
+		if (dlen < sizeof(*dfid) || data_offset + dlen > de->rec_len)
+			return ret;
+
+		if (dfid) {
+			memcpy(dfid, dfid->df_fid, dfid->df_header.ddh_length);
+			ret |= EXT4_DIRENT_LUFID;
+		}
+		data_offset += dlen;
+	}
+
+	/* Skip INO64 for now*/
+	if (de->file_type & EXT4_DIRENT_INO64) {
+		struct ext4_dirent_data_header *ddh =
+		       (struct ext4_dirent_data_header *)(de->name + data_offset);
+		unsigned int dlen;
+
+		if (data_offset + sizeof(*ddh) > de->rec_len)
+			return ret;
+
+		dlen = ddh->ddh_length;
+		if (dlen < sizeof(*ddh) || data_offset + dlen > de->rec_len)
+			return ret;
+
+		data_offset += dlen;
+	}
+
+	if (!hinfo)
+		return ret;
+
+	if (de->file_type & EXT4_DIRENT_CFHASH) {
+		struct ext4_dirent_hash *dh =
+			(struct ext4_dirent_hash *)(de->name + data_offset);
+		unsigned int dlen;
+
+		dlen = dh->dh_header.ddh_length;
+		if (dlen < sizeof(*dh) || data_offset + dlen > de->rec_len)
+			return ret;
+
+		hinfo->hash = le32_to_cpu(dh->dh_hash.hash);
+		hinfo->minor_hash = le32_to_cpu(dh->dh_hash.minor_hash);
+		ret |= EXT4_DIRENT_CFHASH;
+	}
+
+	return ret;
+}
+
+/*
+ * ext4_dirdata_set() - Write dirdata fields into a directory entry.
+ * @de:    directory entry (name must already be set)
+ * @dir:   directory inode
+ * @data:  LUFID data to store (or NULL)
+ * @fname: filename info carrying the casefold hash
+ *
+ * Writes any required dirdata into @de after the filename.  If the dirdata
+ * feature is not enabled, falls back to writing the hash inline after the
+ * filename (for compatibility with the older casefold+fscrypt format).
+ */
+static void ext4_dirdata_set(struct ext4_dir_entry_2 *de, struct inode *dir,
+			     struct ext4_dirent_fid *dfid,
+			     struct ext4_filename *fname)
+{
+	struct dx_hash_info *hinfo = &fname->hinfo;
+	unsigned int data_offset = de->name_len + 1;
+
+
+	if (dfid) {
+		unsigned int dlen = dfid->df_header.ddh_length;
+
+		if (data_offset + dlen > de->rec_len) {
+			EXT4_ERROR_INODE(dir, "Can not insert FID");
+			return;
+		}
+
+
+		de->name[de->name_len] = 0;
+		memcpy(&de->name[de->name_len + 1], dfid,
+		       dlen);
+		de->file_type |= EXT4_DIRENT_LUFID;
+		data_offset += dlen;
+	}
+
+	if (ext4_hash_in_dirent(dir)) {
+		if (ext4_has_feature_dirdata(dir->i_sb)) {
+			struct ext4_dirent_hash *dh =
+			    (struct ext4_dirent_hash *)(de->name + data_offset);
+
+			if (data_offset + sizeof(*dh) > de->rec_len) {
+				EXT4_ERROR_INODE(dir, "Can not insert dhash dirdata");
+				return;
+			}
+
+			dh->dh_header.ddh_length = sizeof(*dh);
+			dh->dh_hash.hash = cpu_to_le32(hinfo->hash);
+			dh->dh_hash.minor_hash = cpu_to_le32(hinfo->minor_hash);
+			de->file_type |= EXT4_DIRENT_CFHASH;
+		} else {
+			/* Compatibility: store hash inline after filename */
+			if (data_offset + sizeof(struct ext4_dir_entry_hash) >
+								de-> rec_len) {
+				EXT4_ERROR_INODE(dir, "Can not insert dhash");
+				return;
+			}
+
+			EXT4_DIRENT_HASHES(de)->hash = cpu_to_le32(hinfo->hash);
+			EXT4_DIRENT_HASHES(de)->minor_hash =
+						cpu_to_le32(hinfo->minor_hash);
+		}
+	}
+}
+
+
 static int dx_make_map(struct inode *dir, struct buffer_head *bh,
 		       struct dx_hash_info *hinfo,
 		       struct dx_map_entry *map_tail)
@@ -1299,9 +1450,8 @@ static int dx_make_map(struct inode *dir, struct buffer_head *bh,
 					 ((char *)de) - base))
 			return -EFSCORRUPTED;
 		if (de->name_len && de->inode) {
-			if (ext4_hash_in_dirent(dir))
-				h.hash = EXT4_DIRENT_HASH(de);
-			else {
+			if (!(ext4_dirdata_get(de, dir, NULL, &h) &
+						EXT4_DIRENT_CFHASH)) {
 				int err = ext4fs_dirhash(dir, de->name,
 						     de->name_len, &h);
 				if (err < 0)
@@ -2089,13 +2239,7 @@ void ext4_insert_dentry_data(struct inode *dir, struct inode *inode,
 	ext4_set_de_type(inode->i_sb, de, inode->i_mode);
 	de->name_len = fname_len(fname);
 	memcpy(de->name, fname_name(fname), fname_len(fname));
-	if (ext4_hash_in_dirent(dir)) {
-		struct dx_hash_info *hinfo = &fname->hinfo;
-
-		EXT4_DIRENT_HASHES(de)->hash = cpu_to_le32(hinfo->hash);
-		EXT4_DIRENT_HASHES(de)->minor_hash =
-						cpu_to_le32(hinfo->minor_hash);
-	}
+	ext4_dirdata_set(de, dir, data, fname);
 }
 
 /*
-- 
2.43.7


^ permalink raw reply related

* [PATCH v2 08/10] ext4: dirdata feature
From: Artem Blagodarenko @ 2026-06-10 15:24 UTC (permalink / raw)
  To: linux-ext4
  Cc: adilger.kernel, Artem Blagodarenko, Pravin Shelar, Andreas Dilger
In-Reply-To: <20260610152417.13576-1-ablagodarenko@thelustrecollective.com>

From: Artem Blagodarenko <artem.blagodarenko@gmail.com>

When fscrypt and casefold are enabled together for a directory,
all ext4_dir_entry[_2] in that directory store a n 8-byte hash
of the filename after 'name' between 'name_len' and 'rec_len'.

However, there is no clear indication there is important data
stored in these bytes, which are only for padding and alignment
in other directory entries.  This adds complexity to code handling
the on-disk directory entries, and there is no provision for other
metadata to be stored in each dir entry after 'name'.

The dirdata feature adds a mechanism to store multiple metadata
entries in each dir entry after 'name' (including the fchash).
The unused high 4 bits of 'file_type' are used to indicate whether
additional data fields are stored after 'name'.  If a bit is set,
the corresponding dirdata record is present, starting after a NUL
filename terminator.  If present, a record starts with a 1-byte
length (including the length byte itself) and the data immediately
follows the length byte without any alignment.

This allows up to four different dirdata records to be stored in
each entry, and allows unhandled record bytes to be skipped without
having to process the contents, providing forward compatibility.

If and when the fourth and last dirdata record is needed, it is
recommended to further subdivide it into sub-records, with
the first byte being the total length, and then there being a
second byte that gives the sub-record length, etc. as long as
the total record length is less than 255 bytes.  However, this
would not affect compatibility with the current code since the
record length would allow it to be skipped without processing.

Signed-off-by: Pravin Shelar <pravin.shelar@sun.com>
Signed-off-by: Artem Blagodarenko <artem.blagodarenko@gmail.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
---
 fs/ext4/ext4.h   | 27 +++++++++++++++++++++------
 fs/ext4/inline.c | 19 +++++++++++++++----
 fs/ext4/namei.c  | 43 +++++++++++++++++++++----------------------
 fs/ext4/sysfs.c  |  2 ++
 4 files changed, 59 insertions(+), 32 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 066c49fe3266..ef99e4fa99d7 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2248,6 +2248,7 @@ EXT4_FEATURE_INCOMPAT_FUNCS(casefold,		CASEFOLD)
 					 EXT4_FEATURE_INCOMPAT_FLEX_BG| \
 					 EXT4_FEATURE_INCOMPAT_EA_INODE| \
 					 EXT4_FEATURE_INCOMPAT_MMP | \
+					 EXT4_FEATURE_INCOMPAT_DIRDATA | \
 					 EXT4_FEATURE_INCOMPAT_INLINE_DATA | \
 					 EXT4_FEATURE_INCOMPAT_ENCRYPT | \
 					 EXT4_FEATURE_INCOMPAT_CASEFOLD | \
@@ -2949,10 +2950,18 @@ extern int ext4_find_dest_de(struct inode *dir, struct buffer_head *bh,
 			     struct ext4_filename *fname,
 			     struct ext4_dir_entry_2 **dest_de,
 			     int dlen);
-void ext4_insert_dentry(struct inode *dir, struct inode *inode,
-			struct ext4_dir_entry_2 *de,
-			int buf_size,
-			struct ext4_filename *fname);
+void ext4_insert_dentry_data(struct inode *dir, struct inode *inode,
+			     struct ext4_dir_entry_2 *de,
+			     int buf_size,
+			     struct ext4_filename *fname,
+			     void *data);
+static inline void ext4_insert_dentry(struct inode *dir, struct inode *inode,
+				      struct ext4_dir_entry_2 *de,
+				      int buf_size,
+				      struct ext4_filename *fname)
+{
+	ext4_insert_dentry_data(dir, inode, de, buf_size, fname, NULL);
+}
 static inline void ext4_update_dx_flag(struct inode *inode)
 {
 	if (!ext4_has_feature_dir_index(inode->i_sb) &&
@@ -3196,8 +3205,14 @@ extern int ext4_ext_migrate(struct inode *);
 extern int ext4_ind_migrate(struct inode *inode);
 
 /* namei.c */
-extern int ext4_init_new_dir(handle_t *handle, struct inode *dir,
-			     struct inode *inode);
+extern int ext4_init_new_dir_data(handle_t *handle, struct inode *dir,
+				  struct inode *inode,
+				  const void *data1, const void *data2);
+static inline int ext4_init_new_dir(handle_t *handle, struct inode *dir,
+				    struct inode *inode)
+{
+	return ext4_init_new_dir_data(handle, dir, inode, NULL, NULL);
+}
 extern int ext4_dirblock_csum_verify(struct inode *inode,
 				     struct buffer_head *bh);
 extern int ext4_htree_fill_tree(struct file *dir_file, __u32 start_hash,
diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
index 5b3faacdf143..c57a8ebe4f94 100644
--- a/fs/ext4/inline.c
+++ b/fs/ext4/inline.c
@@ -973,11 +973,16 @@ static int ext4_add_dirent_to_inline(handle_t *handle,
 				     struct ext4_iloc *iloc,
 				     void *inline_start, int inline_size)
 {
-	int		err;
+	int		err, dlen = 0;
 	struct ext4_dir_entry_2 *de;
+	unsigned char *data = NULL;
+
+	/* Deliver data in any appropriate way here. Now it is NULL */
+	if (data)
+		dlen = (*data) + 1;
 
 	err = ext4_find_dest_de(dir, iloc->bh, inline_start,
-				inline_size, fname, &de, 0);
+				inline_size, fname, &de, dlen);
 	if (err)
 		return err;
 
@@ -986,7 +991,7 @@ static int ext4_add_dirent_to_inline(handle_t *handle,
 					    EXT4_JTR_NONE);
 	if (err)
 		return err;
-	ext4_insert_dentry(dir, inode, de, inline_size, fname);
+	ext4_insert_dentry_data(dir, inode, de, inline_size, fname, NULL);
 
 	ext4_show_inline_dir(dir, iloc->bh, inline_start, inline_size);
 
@@ -1326,7 +1331,13 @@ int ext4_inlinedir_to_tree(struct file *dir_file,
 			pos = EXT4_INLINE_DOTDOT_SIZE;
 		} else {
 			de = (struct ext4_dir_entry_2 *)(dir_buf + pos);
-			pos += ext4_rec_len_from_disk(de->rec_len, inline_size);
+			/* Use ext4_dir_entry_len to account for dirdata extensions */
+			pos += ext4_dir_entry_len(de, dir);
+			/* Validate pos doesn't exceed buffer to prevent use-after-free */
+			if (pos > inline_size) {
+				ret = count;
+				goto out;
+			}
 			if (ext4_check_dir_entry(inode, dir_file, de,
 					 iloc.bh, dir_buf,
 					 inline_size, pos)) {
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index cd20b1094134..40a3394f7eac 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -401,23 +401,24 @@ static struct dx_countlimit *get_dx_countlimit(struct inode *inode,
 {
 	struct ext4_dir_entry_2 *de;
 	struct dx_root_info *root;
-	int count_offset;
+	int count_offset, dotdot_rec_len;
 	int blocksize = EXT4_BLOCK_SIZE(inode->i_sb);
 	unsigned int rlen = ext4_rec_len_from_disk(dirent->rec_len, blocksize);
 
-	if (rlen == blocksize)
+	if (rlen == blocksize) {
 		count_offset = sizeof(struct dx_node);
-	else if (rlen == 12) {
-		de = (struct ext4_dir_entry_2 *)(((void *)dirent) + 12);
-		if (ext4_rec_len_from_disk(de->rec_len, blocksize) != blocksize - 12)
+	} else {
+		de = (struct ext4_dir_entry_2 *)(((char *)dirent) + rlen);
+		if (le16_to_cpu(de->rec_len) != (blocksize - rlen))
 			return NULL;
-		root = (struct dx_root_info *)(((void *)de + 12));
+		/* de->rec_len covers whole dx_root block, calculate actual length */
+		dotdot_rec_len = ext4_dir_entry_len(de, NULL);
+		root = (struct dx_root_info *)(((char *)de + dotdot_rec_len));
 		if (root->reserved_zero ||
 		    root->info_length != sizeof(struct dx_root_info))
 			return NULL;
-		count_offset = 32;
-	} else
-		return NULL;
+		count_offset = root->info_length + rlen + dotdot_rec_len;
+	}
 
 	if (offset)
 		*offset = count_offset;
@@ -698,7 +699,7 @@ static struct stats dx_show_leaf(struct inode *dir,
 				       (unsigned) ((char *) de - base));
 #endif
 			}
-			space += ext4_dir_rec_len(de->name_len, dir);
+			space += ext4_dir_entry_len(de, dir);
 			names++;
 		}
 		de = ext4_next_entry(de, size);
@@ -2068,13 +2069,10 @@ int ext4_find_dest_de(struct inode *dir, struct buffer_head *bh,
 	return 0;
 }
 
-void ext4_insert_dentry(struct inode *dir,
-			struct inode *inode,
-			struct ext4_dir_entry_2 *de,
-			int buf_size,
-			struct ext4_filename *fname)
+void ext4_insert_dentry_data(struct inode *dir, struct inode *inode,
+			     struct ext4_dir_entry_2 *de, int buf_size,
+			     struct ext4_filename *fname, void *data)
 {
-
 	int nlen, rlen;
 
 	nlen = ext4_dir_entry_len(de, dir);
@@ -2116,15 +2114,15 @@ static int add_dirent_to_buf(handle_t *handle, struct ext4_filename *fname,
 	unsigned int	blocksize = dir->i_sb->s_blocksize;
 	int		csum_size = 0;
 	int		err, err2, dlen = 0;
-	unsigned char	*data = NULL;
+	struct ext4_dirent_fid *dfid = NULL;
 
 	/* Deliver data in any appropriate way here. Now it is NULL */
 	if (ext4_has_feature_metadata_csum(inode->i_sb))
 		csum_size = sizeof(struct ext4_dir_entry_tail);
 
 	if (!de) {
-		if (data)
-			dlen = (*data) + 1;
+		if (dfid)
+			dlen = dfid->df_header.ddh_length;
 		err = ext4_find_dest_de(dir, bh, bh->b_data,
 					blocksize - csum_size, fname, &de, dlen);
 		if (err)
@@ -2139,7 +2137,7 @@ static int add_dirent_to_buf(handle_t *handle, struct ext4_filename *fname,
 	}
 
 	/* By now the buffer is marked for journaling */
-	ext4_insert_dentry(dir, inode, de, blocksize, fname);
+	ext4_insert_dentry_data(dir, inode, de, blocksize, fname, dfid);
 
 	/*
 	 * XXX shouldn't update any times until successful
@@ -2968,8 +2966,9 @@ int ext4_init_dirblock(handle_t *handle, struct inode *inode,
 	return ext4_handle_dirty_dirblock(handle, inode, bh);
 }
 
-int ext4_init_new_dir(handle_t *handle, struct inode *dir,
-			     struct inode *inode)
+int ext4_init_new_dir_data(handle_t *handle, struct inode *dir,
+			   struct inode *inode,
+			   const void *data1, const void *data2)
 {
 	struct buffer_head *dir_block = NULL;
 	ext4_lblk_t block = 0;
diff --git a/fs/ext4/sysfs.c b/fs/ext4/sysfs.c
index 923b375e017f..80074fb15ee9 100644
--- a/fs/ext4/sysfs.c
+++ b/fs/ext4/sysfs.c
@@ -362,6 +362,7 @@ EXT4_ATTR_FEATURE(verity);
 #endif
 EXT4_ATTR_FEATURE(metadata_csum_seed);
 EXT4_ATTR_FEATURE(fast_commit);
+EXT4_ATTR_FEATURE(dirdata);
 #if IS_ENABLED(CONFIG_UNICODE) && defined(CONFIG_FS_ENCRYPTION)
 EXT4_ATTR_FEATURE(encrypted_casefold);
 #endif
@@ -385,6 +386,7 @@ static struct attribute *ext4_feat_attrs[] = {
 #endif
 	ATTR_LIST(metadata_csum_seed),
 	ATTR_LIST(fast_commit),
+	ATTR_LIST(dirdata),
 #if IS_ENABLED(CONFIG_UNICODE) && defined(CONFIG_FS_ENCRYPTION)
 	ATTR_LIST(encrypted_casefold),
 #endif
-- 
2.43.7


^ permalink raw reply related

* [PATCH v2 07/10] ext4: rename ext4_dir_rec_len() and clarify dirdata usage
From: Artem Blagodarenko @ 2026-06-10 15:24 UTC (permalink / raw)
  To: linux-ext4; +Cc: adilger.kernel, Artem Blagodarenko, Andreas Dilger
In-Reply-To: <20260610152417.13576-1-ablagodarenko@thelustrecollective.com>

From: Artem Blagodarenko <artem.blagodarenko@gmail.com>

Rename ext4_dir_rec_len() to ext4_dirent_rec_len() to better
reflect that it computes the record length for a directory
entry based on the provided name length.

Update the comment to clarify handling of dirdata-enabled
directories and document the use of ext4_dir_entry_len()
when dirdata is present.

No functional changes.

Signed-off-by: Artem Blagodarenko <artem.blagodarenko@gmail.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
---
 fs/ext4/dir.c    |  9 ++++-----
 fs/ext4/ext4.h   | 14 ++++++++++----
 fs/ext4/inline.c | 14 +++++++-------
 fs/ext4/namei.c  | 39 ++++++++++++++++++++++-----------------
 4 files changed, 43 insertions(+), 33 deletions(-)

diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c
index 17edd678fa87..012687822b82 100644
--- a/fs/ext4/dir.c
+++ b/fs/ext4/dir.c
@@ -89,16 +89,15 @@ int __ext4_check_dir_entry(const char *function, unsigned int line,
 	bool fake = is_fake_dir_entry(de);
 	bool has_csum = ext4_has_feature_metadata_csum(dir->i_sb);
 
-	if (unlikely(rlen < ext4_dir_rec_len(1, fake ? NULL : dir)))
+	if (unlikely(rlen < ext4_dirent_rec_len(1, fake ? NULL : dir)))
 		error_msg = "rec_len is smaller than minimal";
 	else if (unlikely(rlen % 4 != 0))
 		error_msg = "rec_len % 4 != 0";
-	else if (unlikely(rlen < ext4_dir_rec_len(de->name_len,
-							fake ? NULL : dir)))
+	else if (unlikely(rlen < ext4_dir_entry_len(de, fake ? NULL : dir)))
 		error_msg = "rec_len is too small for name_len";
 	else if (unlikely(next_offset > size))
 		error_msg = "directory entry overrun";
-	else if (unlikely(next_offset > size - ext4_dir_rec_len(1,
+	else if (unlikely(next_offset > size - ext4_dirent_rec_len(1,
 						  has_csum ? NULL : dir) &&
 			  next_offset != size))
 		error_msg = "directory entry too close to block end";
@@ -245,7 +244,7 @@ static int ext4_readdir(struct file *file, struct dir_context *ctx)
 				 * failure will be detected in the
 				 * dirent test below. */
 				if (ext4_rec_len_from_disk(de->rec_len,
-					sb->s_blocksize) < ext4_dir_rec_len(1,
+					sb->s_blocksize) < ext4_dirent_rec_len(1,
 									inode))
 					break;
 				i += ext4_rec_len_from_disk(de->rec_len,
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 45e90b8be9e8..066c49fe3266 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2530,11 +2530,16 @@ struct ext4_dirent_hash {
  * casefolded and encrypted need to store the hash as well, so we add room for
  * ext4_extended_dir_entry_2. For all entries related to '.' or '..' you should
  * pass NULL for dir, as those entries do not use the extra fields.
+ *
+ * For directories with the dirdata feature, extra data may follow the filename.
+ * Use ext4_dir_entry_len() to compute the length of a directory entry
+ * including any dirdata, or ext4_dirent_rec_len() directly when the total
+ * name_len (including dirdata length) is already known.
  */
-static inline unsigned int ext4_dir_rec_len(__u8 name_len,
+static inline unsigned int ext4_dirent_rec_len(unsigned int name_len,
 						const struct inode *dir)
 {
-	int rec_len = (name_len + 8 + EXT4_DIR_ROUND);
+	unsigned int rec_len = (name_len + 8 + EXT4_DIR_ROUND);
 
 	if (dir && ext4_hash_in_dirent(dir))
 		rec_len += sizeof(struct ext4_dir_entry_hash);
@@ -2942,7 +2947,8 @@ extern void ext4_htree_free_dir_info(struct dir_private_info *p);
 extern int ext4_find_dest_de(struct inode *dir, struct buffer_head *bh,
 			     void *buf, int buf_size,
 			     struct ext4_filename *fname,
-			     struct ext4_dir_entry_2 **dest_de);
+			     struct ext4_dir_entry_2 **dest_de,
+			     int dlen);
 void ext4_insert_dentry(struct inode *dir, struct inode *inode,
 			struct ext4_dir_entry_2 *de,
 			int buf_size,
@@ -4055,7 +4061,7 @@ static inline unsigned int ext4_dir_entry_len(struct ext4_dir_entry_2 *de,
 	unsigned int rec_len = ext4_rec_len_from_disk(de->rec_len, blocksize);
 	unsigned int dirdata = ext4_dirent_get_data_len(de, rec_len);
 
-	return ext4_dir_rec_len(de->name_len + dirdata, dir);
+	return ext4_dirent_rec_len(de->name_len + dirdata, dir);
 }
 
 extern const struct iomap_ops ext4_iomap_ops;
diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
index 8045e4ff270c..5b3faacdf143 100644
--- a/fs/ext4/inline.c
+++ b/fs/ext4/inline.c
@@ -977,7 +977,7 @@ static int ext4_add_dirent_to_inline(handle_t *handle,
 	struct ext4_dir_entry_2 *de;
 
 	err = ext4_find_dest_de(dir, iloc->bh, inline_start,
-				inline_size, fname, &de);
+				inline_size, fname, &de, 0);
 	if (err)
 		return err;
 
@@ -1055,7 +1055,7 @@ static int ext4_update_inline_dir(handle_t *handle, struct inode *dir,
 	int old_size = EXT4_I(dir)->i_inline_size - EXT4_MIN_INLINE_DATA_SIZE;
 	int new_size = get_max_inline_xattr_value_size(dir, iloc);
 
-	if (new_size - old_size <= ext4_dir_rec_len(1, NULL))
+	if (new_size - old_size <= ext4_dirent_rec_len(1, NULL))
 		return -ENOSPC;
 
 	ret = ext4_update_inline_data(handle, dir,
@@ -1309,7 +1309,7 @@ int ext4_inlinedir_to_tree(struct file *dir_file,
 			fake.name_len = 1;
 			memcpy(fake.name, ".", 2);
 			fake.rec_len = ext4_rec_len_to_disk(
-					  ext4_dir_rec_len(fake.name_len, NULL),
+					  ext4_dirent_rec_len(fake.name_len, NULL),
 					  inline_size);
 			ext4_set_de_type(inode->i_sb, &fake, S_IFDIR);
 			de = &fake;
@@ -1319,7 +1319,7 @@ int ext4_inlinedir_to_tree(struct file *dir_file,
 			fake.name_len = 2;
 			memcpy(fake.name, "..", 3);
 			fake.rec_len = ext4_rec_len_to_disk(
-					  ext4_dir_rec_len(fake.name_len, NULL),
+					  ext4_dirent_rec_len(fake.name_len, NULL),
 					  inline_size);
 			ext4_set_de_type(inode->i_sb, &fake, S_IFDIR);
 			de = &fake;
@@ -1427,8 +1427,8 @@ int ext4_read_inline_dir(struct file *file,
 	 * So we will use extra_offset and extra_size to indicate them
 	 * during the inline dir iteration.
 	 */
-	dotdot_offset = ext4_dir_rec_len(1, NULL);
-	dotdot_size = dotdot_offset + ext4_dir_rec_len(2, NULL);
+	dotdot_offset = ext4_dirent_rec_len(1, NULL);
+	dotdot_size = dotdot_offset + ext4_dirent_rec_len(2, NULL);
 	extra_offset = dotdot_size - EXT4_INLINE_DOTDOT_SIZE;
 	extra_size = extra_offset + inline_size;
 
@@ -1463,7 +1463,7 @@ int ext4_read_inline_dir(struct file *file,
 			 * failure will be detected in the
 			 * dirent test below. */
 			if (ext4_rec_len_from_disk(de->rec_len, extra_size)
-				< ext4_dir_rec_len(1, NULL))
+				< ext4_dirent_rec_len(1, NULL))
 				break;
 			i += ext4_rec_len_from_disk(de->rec_len,
 						    extra_size);
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 0635eac2de8d..cd20b1094134 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -522,10 +522,10 @@ ext4_next_entry(struct ext4_dir_entry_2 *p, unsigned long blocksize)
 static struct dx_root_info *dx_get_dx_info(void *de_buf)
 {
 	/* get dotdot first */
-	de_buf = de_buf + ext4_dir_rec_len(1, NULL);
+	de_buf += ext4_dir_entry_len(de_buf, NULL);
 
 	/* dx root info is after dotdot entry */
-	de_buf = de_buf + ext4_dir_rec_len(2, NULL);
+	de_buf += ext4_dir_entry_len(de_buf, NULL);
 
 	return (struct dx_root_info *)de_buf;
 }
@@ -588,7 +588,7 @@ static inline unsigned dx_root_limit(struct inode *dir,
 static inline unsigned dx_node_limit(struct inode *dir)
 {
 	unsigned int entry_space = dir->i_sb->s_blocksize -
-			ext4_dir_rec_len(0, dir);
+			ext4_dirent_rec_len(0, dir);
 
 	if (ext4_has_feature_metadata_csum(dir->i_sb))
 		entry_space -= sizeof(struct dx_tail);
@@ -1058,7 +1058,7 @@ static int htree_dirblock_to_tree(struct file *dir_file,
 	/* csum entries are not larger in the casefolded encrypted case */
 	top = (struct ext4_dir_entry_2 *) ((char *) de +
 					   dir->i_sb->s_blocksize -
-					   ext4_dir_rec_len(0,
+					   ext4_dirent_rec_len(0,
 							   csum ? NULL : dir));
 	/* Check if the directory is encrypted */
 	if (IS_ENCRYPTED(dir)) {
@@ -1852,7 +1852,7 @@ dx_move_dirents(struct inode *dir, char *from, char *to,
 	while (count--) {
 		struct ext4_dir_entry_2 *de = (struct ext4_dir_entry_2 *)
 						(from + (map->offs<<2));
-		rec_len = ext4_dir_rec_len(de->name_len, dir);
+		rec_len = ext4_dir_entry_len(de, dir);
 
 		memcpy (to, de, rec_len);
 		((struct ext4_dir_entry_2 *) to)->rec_len =
@@ -1885,7 +1885,7 @@ static struct ext4_dir_entry_2 *dx_pack_dirents(struct inode *dir, char *base,
 	while ((char*)de < base + blocksize) {
 		next = ext4_next_entry(de, blocksize);
 		if (de->inode && de->name_len) {
-			rec_len = ext4_dir_rec_len(de->name_len, dir);
+			rec_len = ext4_dir_entry_len(de, dir);
 			if (de > to)
 				memmove(to, de, rec_len);
 			to->rec_len = ext4_rec_len_to_disk(rec_len, blocksize);
@@ -2037,10 +2037,11 @@ static struct ext4_dir_entry_2 *do_split(handle_t *handle, struct inode *dir,
 int ext4_find_dest_de(struct inode *dir, struct buffer_head *bh,
 		      void *buf, int buf_size,
 		      struct ext4_filename *fname,
-		      struct ext4_dir_entry_2 **dest_de)
+		      struct ext4_dir_entry_2 **dest_de,
+		      int dlen)
 {
 	struct ext4_dir_entry_2 *de;
-	unsigned short reclen = ext4_dir_rec_len(fname_len(fname), dir);
+	unsigned short reclen = ext4_dirent_rec_len(fname_len(fname) + dlen, dir);
 	int nlen, rlen;
 	unsigned int offset = 0;
 	char *top;
@@ -2053,7 +2054,7 @@ int ext4_find_dest_de(struct inode *dir, struct buffer_head *bh,
 			return -EFSCORRUPTED;
 		if (ext4_match(dir, fname, de))
 			return -EEXIST;
-		nlen = ext4_dir_rec_len(de->name_len, dir);
+		nlen = ext4_dir_entry_len(de, dir);
 		rlen = ext4_rec_len_from_disk(de->rec_len, buf_size);
 		if ((de->inode ? rlen - nlen : rlen) >= reclen)
 			break;
@@ -2076,7 +2077,7 @@ void ext4_insert_dentry(struct inode *dir,
 
 	int nlen, rlen;
 
-	nlen = ext4_dir_rec_len(de->name_len, dir);
+	nlen = ext4_dir_entry_len(de, dir);
 	rlen = ext4_rec_len_from_disk(de->rec_len, buf_size);
 	if (de->inode) {
 		struct ext4_dir_entry_2 *de1 =
@@ -2114,14 +2115,18 @@ static int add_dirent_to_buf(handle_t *handle, struct ext4_filename *fname,
 {
 	unsigned int	blocksize = dir->i_sb->s_blocksize;
 	int		csum_size = 0;
-	int		err, err2;
+	int		err, err2, dlen = 0;
+	unsigned char	*data = NULL;
 
+	/* Deliver data in any appropriate way here. Now it is NULL */
 	if (ext4_has_feature_metadata_csum(inode->i_sb))
 		csum_size = sizeof(struct ext4_dir_entry_tail);
 
 	if (!de) {
+		if (data)
+			dlen = (*data) + 1;
 		err = ext4_find_dest_de(dir, bh, bh->b_data,
-					blocksize - csum_size, fname, &de);
+					blocksize - csum_size, fname, &de, dlen);
 		if (err)
 			return err;
 	}
@@ -2930,7 +2935,7 @@ int ext4_init_dirblock(handle_t *handle, struct inode *inode,
 
 	de->inode = cpu_to_le32(inode->i_ino);
 	de->name_len = 1;
-	de->rec_len = ext4_rec_len_to_disk(ext4_dir_rec_len(de->name_len, NULL),
+	de->rec_len = ext4_rec_len_to_disk(ext4_dirent_rec_len(de->name_len, NULL),
 					   blocksize);
 	memcpy(de->name, ".", 2);
 	ext4_set_de_type(inode->i_sb, de, S_IFDIR);
@@ -2942,7 +2947,7 @@ int ext4_init_dirblock(handle_t *handle, struct inode *inode,
 	ext4_set_de_type(inode->i_sb, de, S_IFDIR);
 	if (inline_buf) {
 		de->rec_len = ext4_rec_len_to_disk(
-					ext4_dir_rec_len(de->name_len, NULL),
+					ext4_dirent_rec_len(de->name_len, NULL),
 					blocksize);
 		de = ext4_next_entry(de, blocksize);
 		header_size = (char *)de - bh->b_data;
@@ -2951,7 +2956,7 @@ int ext4_init_dirblock(handle_t *handle, struct inode *inode,
 			blocksize - csum_size);
 	} else {
 		de->rec_len = ext4_rec_len_to_disk(blocksize -
-					(csum_size + ext4_dir_rec_len(1, NULL)),
+					(csum_size + ext4_dirent_rec_len(1, NULL)),
 					blocksize);
 	}
 
@@ -3074,8 +3079,8 @@ bool ext4_empty_dir(struct inode *inode)
 	}
 
 	sb = inode->i_sb;
-	if (inode->i_size < ext4_dir_rec_len(1, NULL) +
-					ext4_dir_rec_len(2, NULL)) {
+	if (inode->i_size < ext4_dirent_rec_len(1, NULL) +
+					ext4_dirent_rec_len(2, NULL)) {
 		EXT4_ERROR_INODE(inode, "invalid size");
 		return false;
 	}
-- 
2.43.7


^ permalink raw reply related

* [PATCH v2 06/10] ext4: add ext4_dir_entry_len() and harden dirdata parsing
From: Artem Blagodarenko @ 2026-06-10 15:24 UTC (permalink / raw)
  To: linux-ext4; +Cc: adilger.kernel, Artem Blagodarenko, Andreas Dilger
In-Reply-To: <20260610152417.13576-1-ablagodarenko@thelustrecollective.com>

From: Artem Blagodarenko <artem.blagodarenko@gmail.com>

Introduce ext4_dir_entry_len() helper to compute the required
rec_len for a directory entry, taking into account dirdata and
casefold+fscrypt hash space.

Convert ext4_dirent_get_data_len() to take the decoded rec_len
as an argument and add bounds checking when walking dirdata
extensions to avoid overruns on malformed entries.

Update dx_root_limit() to use ext4_dir_entry_len() instead of
open-coded ext4_dir_rec_len() for '.' and '..' entries.

Signed-off-by: Artem Blagodarenko <artem.blagodarenko@gmail.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
---
 fs/ext4/ext4.h  | 45 ++++++++++++++++++++++++++++++++++++++++++---
 fs/ext4/namei.c | 23 +++++++++++++++--------
 2 files changed, 57 insertions(+), 11 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index f833f6ef0040..45e90b8be9e8 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -3988,6 +3988,7 @@ static inline bool ext4_dir_entry_is_tail(struct ext4_dir_entry_2 *de)
 /*
  * ext4_dirent_get_data_len() - Compute the total dirdata length for an entry.
  * @de: directory entry
+ * @rec_len: the record length of the directory entry (decoded)
  *
  * Computes the length of optional data stored after the filename (and its
  * implicit NUL terminator).  Each extension is indicated by a bit in the
@@ -3996,22 +3997,41 @@ static inline bool ext4_dir_entry_is_tail(struct ext4_dir_entry_2 *de)
  *
  * Returns 0 for tail entries and for entries with no dirdata.
  */
-static inline int ext4_dirent_get_data_len(struct ext4_dir_entry_2 *de)
+static inline int ext4_dirent_get_data_len(struct ext4_dir_entry_2 *de,
+					   unsigned int rec_len)
 {
 	__u8 extra_data_flags;
 	struct ext4_dirent_data_header *ddh;
 	int dlen = 0;
+	unsigned int offset;
 
 	if (ext4_dir_entry_is_tail(de))
 		return 0;
 
 	extra_data_flags = (de->file_type & ~EXT4_FT_MASK) >> 4;
-	ddh = (struct ext4_dirent_data_header *)(de->name + de->name_len +
-						 1 /* NUL terminator */);
+	/* offset from start of entry to after filename + NUL */
+	offset = EXT4_BASE_DIR_LEN + de->name_len + 1;
 
+	/* bounds check: ensure we start reading within the entry */
+	if (offset >= rec_len)
+		return 0;
+
+	ddh = (struct ext4_dirent_data_header *)((char *)de + offset);
+ 
 	while (extra_data_flags) {
 		if (extra_data_flags & 1) {
+			/* bounds check before reading ddh_length */
+			if (offset + sizeof(*ddh) >
+			    rec_len)
+				return dlen;
+
+			/* validate ddh_length is reasonable */
+			if (ddh->ddh_length == 0 || ddh->ddh_length >
+			    rec_len - offset)
+				return dlen;
+
 			dlen += ddh->ddh_length + (dlen == 0);
+			offset += ddh->ddh_length;
 			ddh = ext4_dirdata_next(ddh);
 		}
 		extra_data_flags >>= 1;
@@ -4019,6 +4039,25 @@ static inline int ext4_dirent_get_data_len(struct ext4_dir_entry_2 *de)
 	return dlen;
 }
 
+/*
+ * ext4_dir_entry_len() - Compute the required rec_len for a directory entry.
+ * @de:  directory entry (used to read name_len and any dirdata length)
+ * @dir: directory inode (may be NULL for '.' and '..' entries)
+ *
+ * Returns the minimum record length needed to hold @de, rounded up to the
+ * directory alignment and including room for the casefold+fscrypt hash if
+ * the directory requires it.
+ */
+static inline unsigned int ext4_dir_entry_len(struct ext4_dir_entry_2 *de,
+					      const struct inode *dir)
+{
+	unsigned int blocksize = (dir && dir->i_sb) ? dir->i_sb->s_blocksize : 4096;
+	unsigned int rec_len = ext4_rec_len_from_disk(de->rec_len, blocksize);
+	unsigned int dirdata = ext4_dirent_get_data_len(de, rec_len);
+
+	return ext4_dir_rec_len(de->name_len + dirdata, dir);
+}
+
 extern const struct iomap_ops ext4_iomap_ops;
 extern const struct iomap_ops ext4_iomap_report_ops;
 
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 87d8cd2c6377..0635eac2de8d 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -570,11 +570,15 @@ static inline void dx_set_limit(struct dx_entry *entries, unsigned value)
 	((struct dx_countlimit *) entries)->limit = cpu_to_le16(value);
 }
 
-static inline unsigned dx_root_limit(struct inode *dir, unsigned infosize)
+static inline unsigned dx_root_limit(struct inode *dir,
+	struct ext4_dir_entry_2 *dot_de)
 {
-	unsigned int entry_space = dir->i_sb->s_blocksize -
-			ext4_dir_rec_len(1, NULL) -
-			ext4_dir_rec_len(2, NULL) - infosize;
+	struct dx_root_info *info;
+	unsigned int entry_space;
+
+	info = dx_get_dx_info(dot_de);
+	entry_space = dir->i_sb->s_blocksize - ((char *)info - (char *)dot_de) -
+		info->info_length;
 
 	if (ext4_has_feature_metadata_csum(dir->i_sb))
 		entry_space -= sizeof(struct dx_tail);
@@ -850,10 +854,13 @@ dx_probe(struct ext4_filename *fname, struct inode *dir,
 
 	entries = (struct dx_entry *)(((char *)info) + info->info_length);
 
-	if (dx_get_limit(entries) != dx_root_limit(dir, info->info_length)) {
+	if (dx_get_limit(entries) !=
+	    dx_root_limit(dir, (struct ext4_dir_entry_2 *)frame->bh->b_data)) {
 		ext4_warning_inode(dir, "dx entry: limit %u != root limit %u",
 				   dx_get_limit(entries),
-				   dx_root_limit(dir, info->info_length));
+				   dx_root_limit(dir,
+				   (struct ext4_dir_entry_2 *)frame->bh->b_data
+				   ));
 		goto fail;
 	}
 
@@ -2278,10 +2285,10 @@ static int make_indexed_dir(handle_t *handle, struct ext4_filename *fname,
 		dx_info->hash_version =
 				EXT4_SB(dir->i_sb)->s_def_hash_version;
 
-	entries = (void *)dx_info + sizeof(*dx_info);
+	entries = (void *)dx_info + dx_info->info_length;
 	dx_set_block(entries, 1);
 	dx_set_count(entries, 1);
-	dx_set_limit(entries, dx_root_limit(dir, sizeof(*dx_info)));
+	dx_set_limit(entries, dx_root_limit(dir, dot_de));
 
 	/* Initialize as for dx_probe */
 	fname->hinfo.hash_version = dx_info->hash_version;
-- 
2.43.7


^ permalink raw reply related

* [PATCH v2 04/10] ext4: add dirdata format definitions and access helpers
From: Artem Blagodarenko @ 2026-06-10 15:24 UTC (permalink / raw)
  To: linux-ext4
  Cc: adilger.kernel, Artem Blagodarenko, Pravin Shelar, Andreas Dilger
In-Reply-To: <20260610152417.13576-1-ablagodarenko@thelustrecollective.com>

From: Artem Blagodarenko <artem.blagodarenko@gmail.com>

Define the on-disk format for ext4 directory entry extension data.

The upper four bits of de->file_type indicate the presence of
optional data stored after the filename NUL terminator.  This patch
defines flags for LUFID, 64-bit inode numbers, and casefold hash
data stored in that area.

Add struct ext4_dirent_data_header to describe variable-length
extension records and struct ext4_dirent_hash for hash storage used
by casefold and fscrypt.

Provide ext4_dirdata_next() to advance to the next extension record
and ext4_dirent_get_data_len() to compute the total extension data
length associated with a directory entry.

No functional changes.

Signed-off-by: Pravin Shelar <pravin.shelar@sun.com>
Signed-off-by: Artem Blagodarenko <artem.blagodarenko@gmail.com>
Reviewed-by: Andreas Dilger <adilger@diliger.ca>
---
 fs/ext4/ext4.h | 83 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 83 insertions(+)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 01b1222b1454..c36c3bf54590 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2470,6 +2470,49 @@ struct ext4_dir_entry_tail {
 #define EXT4_FT_SYMLINK		7
 
 #define EXT4_FT_MAX		8
+#define EXT4_FT_MASK		0xf
+
+#if EXT4_FT_MAX > EXT4_FT_MASK
+#error "conflicting EXT4_FT_MAX and EXT4_FT_MASK"
+#endif
+
+/*
+ * d_type has 4 unused bits, so it can hold four types of data. These different
+ * types of data (e.g. fscypt hash, high 32 bits of 64-bit inode number) can be
+ * stored, in flag order, after file-name in ext4 dirent.
+ *
+ * These flags are added to d_type if ext4 dirent has extra data after
+ * filename. This data length is variable and length is stored in first byte
+ * of data. Data starts after filename NUL byte.
+ */
+#define EXT4_DIRENT_LUFID		0x10
+#define EXT4_DIRENT_INO64		0x20
+#define EXT4_DIRENT_CFHASH		0x40
+
+struct ext4_fid {
+	char    fid[16];     /* 128-bit unique file identifier */
+};
+
+struct ext4_dirent_data_header {
+	/* length of this header + the whole data blob */
+	__u8	ddh_length;
+} __packed;
+
+struct ext4_dirent_fid {
+	struct ext4_dirent_data_header df_header;
+	struct ext4_fid                df_fid[];
+};
+
+#define EXT4_LUFID_MAGIC    0xAD200907UL
+struct ext4_dentry_param {
+	__u32			edp_magic;	/* EXT4_LUFID_MAGIC */
+	struct ext4_dirent_fid	edp_dfid;
+};
+
+struct ext4_dirent_hash {
+	struct ext4_dirent_data_header	dh_header;
+	struct ext4_dir_entry_hash	dh_hash;
+} __packed;
 
 #define EXT4_FT_DIR_CSUM	0xDE
 
@@ -3917,6 +3960,12 @@ static inline void ext4_clear_io_unwritten_flag(ext4_io_end_t *io_end)
 		io_end->flag &= ~EXT4_IO_END_UNWRITTEN;
 }
 
+/*
+ * Advance to the next dirdata record header starting from @ddh.
+ */
+#define ext4_dirdata_next(ddh) \
+	((struct ext4_dirent_data_header *)((char *)(ddh) + (ddh)->ddh_length))
+
 /*
  * ext4_dir_entry_is_tail() - Check if a directory entry is a tail entry.
  * @de: directory entry to check
@@ -3933,6 +3982,40 @@ static inline bool ext4_dir_entry_is_tail(struct ext4_dir_entry_2 *de)
 	       t->det_reserved_ft == EXT4_FT_DIR_CSUM;
 }
 
+/*
+ * ext4_dirent_get_data_len() - Compute the total dirdata length for an entry.
+ * @de: directory entry
+ *
+ * Computes the length of optional data stored after the filename (and its
+ * implicit NUL terminator).  Each extension is indicated by a bit in the
+ * high 4 bits of de->file_type; the first byte of each extension is its
+ * length (including that length byte itself).
+ *
+ * Returns 0 for tail entries and for entries with no dirdata.
+ */
+static inline int ext4_dirent_get_data_len(struct ext4_dir_entry_2 *de)
+{
+	__u8 extra_data_flags;
+	struct ext4_dirent_data_header *ddh;
+	int dlen = 0;
+
+	if (ext4_dir_entry_is_tail(de))
+		return 0;
+
+	extra_data_flags = (de->file_type & ~EXT4_FT_MASK) >> 4;
+	ddh = (struct ext4_dirent_data_header *)(de->name + de->name_len +
+						 1 /* NUL terminator */);
+
+	while (extra_data_flags) {
+		if (extra_data_flags & 1) {
+			dlen += ddh->ddh_length + (dlen == 0);
+			ddh = ext4_dirdata_next(ddh);
+		}
+		extra_data_flags >>= 1;
+	}
+	return dlen;
+}
+
 extern const struct iomap_ops ext4_iomap_ops;
 extern const struct iomap_ops ext4_iomap_report_ops;
 
-- 
2.43.7


^ permalink raw reply related

* [PATCH v2 05/10] ext4: preserve dirdata bits in get_dtype()
From: Artem Blagodarenko @ 2026-06-10 15:24 UTC (permalink / raw)
  To: linux-ext4; +Cc: adilger.kernel, Artem Blagodarenko, Andreas Dilger
In-Reply-To: <20260610152417.13576-1-ablagodarenko@thelustrecollective.com>

From: Artem Blagodarenko <artem.blagodarenko@gmail.com>

Mask the filetype with EXT4_FT_MASK when indexing
ext4_filetype_table[] to avoid using dirdata bits as an index.

Preserve the extra bits
stored in the upper part of filetype and propagate them to the
returned dtype value.

Signed-off-by: Artem Blagodarenko <artem.blagodarenko@gmail.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
---
 fs/ext4/ext4.h | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index c36c3bf54590..f833f6ef0040 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2960,12 +2960,15 @@ static const unsigned char ext4_filetype_table[] = {
 	DT_UNKNOWN, DT_REG, DT_DIR, DT_CHR, DT_BLK, DT_FIFO, DT_SOCK, DT_LNK
 };
 
-static inline  unsigned char get_dtype(struct super_block *sb, int filetype)
+static inline unsigned char get_dtype(struct super_block *sb, int filetype)
 {
-	if (!ext4_has_feature_filetype(sb) || filetype >= EXT4_FT_MAX)
+	unsigned char fl_index = filetype & EXT4_FT_MASK;
+
+	if (!ext4_has_feature_filetype(sb) || fl_index >= EXT4_FT_MAX)
 		return DT_UNKNOWN;
 
-	return ext4_filetype_table[filetype];
+	return (ext4_filetype_table[fl_index]) |
+		(filetype & ~EXT4_FT_MASK);
 }
 extern int ext4_check_all_de(struct inode *dir, struct buffer_head *bh,
 			     void *buf, int buf_size);
-- 
2.43.7


^ permalink raw reply related

* [PATCH v2 03/10] ext4: refactor dx_root to support variable dirent sizes
From: Artem Blagodarenko @ 2026-06-10 15:24 UTC (permalink / raw)
  To: linux-ext4
  Cc: adilger.kernel, Artem Blagodarenko, Pravin Shelar, Andreas Dilger
In-Reply-To: <20260610152417.13576-1-ablagodarenko@thelustrecollective.com>

From: Artem Blagodarenko <artem.blagodarenko@gmail.com>

Split monolithic definition of dx_root struct to separate dx_root_info
from fake struct ext4_dir_entry2 for improved code readability.
This allows "." and ".." dirents to have different sizes if necessary,
since we can't assume the rec_len 12 if dx_root dirents have dirdata.
Adds dx_get_dx_info() accessor instead of complex typecast at callers.
Does not change any functionality.

Signed-off-by: Pravin Shelar <pravin.shelar@sun.com>
Signed-off-by: Artem Blagodarenko <artem.blagodarenko@gmail.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
---
 fs/ext4/namei.c | 145 +++++++++++++++++++++++-------------------------
 1 file changed, 70 insertions(+), 75 deletions(-)

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 0196d954cba1..87d8cd2c6377 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -244,22 +244,13 @@ struct dx_entry
  * hash version mod 4 should never be 0.  Sincerely, the paranoia department.
  */
 
-struct dx_root
+struct dx_root_info
 {
-	struct fake_dirent dot;
-	char dot_name[4];
-	struct fake_dirent dotdot;
-	char dotdot_name[4];
-	struct dx_root_info
-	{
-		__le32 reserved_zero;
-		u8 hash_version;
-		u8 info_length; /* 8 */
-		u8 indirect_levels;
-		u8 unused_flags;
-	}
-	info;
-	struct dx_entry	entries[];
+	__le32 reserved_zero;
+	u8 hash_version;
+	u8 info_length; /* 8 */
+	u8 indirect_levels;
+	u8 unused_flags;
 };
 
 struct dx_node
@@ -528,6 +519,16 @@ ext4_next_entry(struct ext4_dir_entry_2 *p, unsigned long blocksize)
  * Future: use high four bits of block for coalesce-on-delete flags
  * Mask them off for now.
  */
+static struct dx_root_info *dx_get_dx_info(void *de_buf)
+{
+	/* get dotdot first */
+	de_buf = de_buf + ext4_dir_rec_len(1, NULL);
+
+	/* dx root info is after dotdot entry */
+	de_buf = de_buf + ext4_dir_rec_len(2, NULL);
+
+	return (struct dx_root_info *)de_buf;
+}
 
 static inline ext4_lblk_t dx_get_block(struct dx_entry *entry)
 {
@@ -775,7 +776,7 @@ dx_probe(struct ext4_filename *fname, struct inode *dir,
 {
 	unsigned count, indirect, level, i;
 	struct dx_entry *at, *entries, *p, *q, *m;
-	struct dx_root *root;
+	struct dx_root_info *info;
 	struct dx_frame *frame = frame_in;
 	struct dx_frame *ret_err = ERR_PTR(ERR_BAD_DX_DIR);
 	u32 hash;
@@ -787,23 +788,24 @@ dx_probe(struct ext4_filename *fname, struct inode *dir,
 	if (IS_ERR(frame->bh))
 		return (struct dx_frame *) frame->bh;
 
-	root = (struct dx_root *) frame->bh->b_data;
-	if (root->info.hash_version != DX_HASH_TEA &&
-	    root->info.hash_version != DX_HASH_HALF_MD4 &&
-	    root->info.hash_version != DX_HASH_LEGACY &&
-	    root->info.hash_version != DX_HASH_SIPHASH) {
-		ext4_warning_inode(dir, "Unrecognised inode hash code %u",
-				   root->info.hash_version);
+	info = dx_get_dx_info((struct ext4_dir_entry_2 *)frame->bh->b_data);
+	if (info->hash_version != DX_HASH_TEA &&
+	    info->hash_version != DX_HASH_HALF_MD4 &&
+	    info->hash_version != DX_HASH_LEGACY &&
+	    info->hash_version != DX_HASH_SIPHASH) {
+		ext4_warning(dir->i_sb,
+			"Unrecognised inode hash code %d for directory #%llu",
+			info->hash_version, dir->i_ino);
 		goto fail;
 	}
 	if (ext4_hash_in_dirent(dir)) {
-		if (root->info.hash_version != DX_HASH_SIPHASH) {
+		if (info->hash_version != DX_HASH_SIPHASH) {
 			ext4_warning_inode(dir,
 				"Hash in dirent, but hash is not SIPHASH");
 			goto fail;
 		}
 	} else {
-		if (root->info.hash_version == DX_HASH_SIPHASH) {
+		if (info->hash_version == DX_HASH_SIPHASH) {
 			ext4_warning_inode(dir,
 				"Hash code is SIPHASH, but hash not in dirent");
 			goto fail;
@@ -811,7 +813,7 @@ dx_probe(struct ext4_filename *fname, struct inode *dir,
 	}
 	if (fname)
 		hinfo = &fname->hinfo;
-	hinfo->hash_version = root->info.hash_version;
+	hinfo->hash_version = info->hash_version;
 	if (hinfo->hash_version <= DX_HASH_TEA)
 		hinfo->hash_version += EXT4_SB(dir->i_sb)->s_hash_unsigned;
 	hinfo->seed = EXT4_SB(dir->i_sb)->s_hash_seed;
@@ -827,13 +829,13 @@ dx_probe(struct ext4_filename *fname, struct inode *dir,
 	}
 	hash = hinfo->hash;
 
-	if (root->info.unused_flags & 1) {
+	if (info->unused_flags & 1) {
 		ext4_warning_inode(dir, "Unimplemented hash flags: %#06x",
-				   root->info.unused_flags);
+				   info->unused_flags);
 		goto fail;
 	}
 
-	indirect = root->info.indirect_levels;
+	indirect = info->indirect_levels;
 	if (indirect >= ext4_dir_htree_level(dir->i_sb)) {
 		ext4_warning(dir->i_sb,
 			     "Directory (ino: %llu) htree depth %#06x exceed"
@@ -846,14 +848,12 @@ dx_probe(struct ext4_filename *fname, struct inode *dir,
 		goto fail;
 	}
 
-	entries = (struct dx_entry *)(((char *)&root->info) +
-				      root->info.info_length);
+	entries = (struct dx_entry *)(((char *)info) + info->info_length);
 
-	if (dx_get_limit(entries) != dx_root_limit(dir,
-						   root->info.info_length)) {
+	if (dx_get_limit(entries) != dx_root_limit(dir, info->info_length)) {
 		ext4_warning_inode(dir, "dx entry: limit %u != root limit %u",
 				   dx_get_limit(entries),
-				   dx_root_limit(dir, root->info.info_length));
+				   dx_root_limit(dir, info->info_length));
 		goto fail;
 	}
 
@@ -939,7 +939,7 @@ static void dx_release(struct dx_frame *frames)
 	if (frames[0].bh == NULL)
 		return;
 
-	info = &((struct dx_root *)frames[0].bh->b_data)->info;
+	info = dx_get_dx_info((struct ext4_dir_entry_2 *)frames[0].bh->b_data);
 	/* save local copy, "info" may be freed after brelse() */
 	indirect_levels = info->indirect_levels;
 	for (i = 0; i <= indirect_levels; i++) {
@@ -2151,44 +2151,38 @@ static int add_dirent_to_buf(handle_t *handle, struct ext4_filename *fname,
 	return err ? err : err2;
 }
 
-static bool ext4_check_dx_root(struct inode *dir, struct dx_root *root)
+static bool ext4_check_dx_root(struct inode *dir,
+			       struct ext4_dir_entry_2 *dot_de,
+			       struct ext4_dir_entry_2 *dotdot_de,
+			       struct ext4_dir_entry_2 **entry)
 {
-	struct fake_dirent *fde;
 	const char *error_msg;
-	unsigned int rlen;
 	unsigned int blocksize = dir->i_sb->s_blocksize;
-	char *blockend = (char *)root + dir->i_sb->s_blocksize;
+	struct ext4_dir_entry_2 *de = NULL;
 
-	fde = &root->dot;
-	if (unlikely(fde->name_len != 1)) {
+	if (unlikely(dot_de->name_len != 1)) {
 		error_msg = "invalid name_len for '.'";
 		goto corrupted;
 	}
-	if (unlikely(strncmp(root->dot_name, ".", fde->name_len))) {
+	if (unlikely(strncmp(dot_de->name, ".", dot_de->name_len))) {
 		error_msg = "invalid name for '.'";
 		goto corrupted;
 	}
-	rlen = ext4_rec_len_from_disk(fde->rec_len, blocksize);
-	if (unlikely((char *)fde + rlen >= blockend)) {
-		error_msg = "invalid rec_len for '.'";
-		goto corrupted;
-	}
 
-	fde = &root->dotdot;
-	if (unlikely(fde->name_len != 2)) {
+	if (unlikely(dotdot_de->name_len != 2)) {
 		error_msg = "invalid name_len for '..'";
 		goto corrupted;
 	}
-	if (unlikely(strncmp(root->dotdot_name, "..", fde->name_len))) {
+	if (unlikely(strncmp(dotdot_de->name, "..", dotdot_de->name_len))) {
 		error_msg = "invalid name for '..'";
 		goto corrupted;
 	}
-	rlen = ext4_rec_len_from_disk(fde->rec_len, blocksize);
-	if (unlikely((char *)fde + rlen >= blockend)) {
+	de = ext4_next_entry(dotdot_de, blocksize);
+	if ((char *)de >= (((char *)dot_de) + blocksize)) {
 		error_msg = "invalid rec_len for '..'";
 		goto corrupted;
 	}
-
+	*entry = de;
 	return true;
 
 corrupted:
@@ -2206,16 +2200,15 @@ static int make_indexed_dir(handle_t *handle, struct ext4_filename *fname,
 			    struct inode *inode, struct buffer_head *bh)
 {
 	struct buffer_head *bh2;
-	struct dx_root	*root;
 	struct dx_frame	frames[EXT4_HTREE_LEVEL], *frame;
 	struct dx_entry *entries;
-	struct ext4_dir_entry_2	*de, *de2;
+	struct ext4_dir_entry_2	*de, *de2, *dot_de, *dotdot_de;
 	char		*data2, *top;
 	unsigned	len;
 	int		retval;
 	unsigned	blocksize;
 	ext4_lblk_t  block;
-	struct fake_dirent *fde;
+	struct dx_root_info *dx_info;
 	int csum_size = 0;
 
 	if (ext4_has_feature_metadata_csum(inode->i_sb))
@@ -2232,17 +2225,15 @@ static int make_indexed_dir(handle_t *handle, struct ext4_filename *fname,
 		return retval;
 	}
 
-	root = (struct dx_root *) bh->b_data;
-	if (!ext4_check_dx_root(dir, root)) {
+	dot_de = (struct ext4_dir_entry_2 *)bh->b_data;
+	dotdot_de = ext4_next_entry(dot_de, blocksize);
+	if (!ext4_check_dx_root(dir, dot_de, dotdot_de, &de)) {
 		brelse(bh);
 		return -EFSCORRUPTED;
 	}
 
 	/* The 0th block becomes the root, move the dirents out */
-	fde = &root->dotdot;
-	de = (struct ext4_dir_entry_2 *)((char *)fde +
-		ext4_rec_len_from_disk(fde->rec_len, blocksize));
-	len = ((char *) root) + (blocksize - csum_size) - (char *) de;
+	len = ((char *)dot_de) + (blocksize - csum_size) - (char *)de;
 
 	/* Allocate new block for the 0th block's dirents */
 	bh2 = ext4_append(handle, dir, &block);
@@ -2273,24 +2264,27 @@ static int make_indexed_dir(handle_t *handle, struct ext4_filename *fname,
 		ext4_initialize_dirent_tail(bh2, blocksize);
 
 	/* Initialize the root; the dot dirents already exist */
-	de = (struct ext4_dir_entry_2 *) (&root->dotdot);
-	de->rec_len = ext4_rec_len_to_disk(
-			blocksize - ext4_dir_rec_len(2, NULL), blocksize);
-	memset (&root->info, 0, sizeof(root->info));
-	root->info.info_length = sizeof(root->info);
+	dotdot_de->rec_len =
+		ext4_rec_len_to_disk(blocksize - le16_to_cpu(dot_de->rec_len),
+				     blocksize);
+
+	/* initialize hashing info */
+	dx_info = dx_get_dx_info(dot_de);
+	memset(dx_info, 0, sizeof(*dx_info));
+	dx_info->info_length = sizeof(*dx_info);
 	if (ext4_hash_in_dirent(dir))
-		root->info.hash_version = DX_HASH_SIPHASH;
+		dx_info->hash_version = DX_HASH_SIPHASH;
 	else
-		root->info.hash_version =
+		dx_info->hash_version =
 				EXT4_SB(dir->i_sb)->s_def_hash_version;
 
-	entries = root->entries;
+	entries = (void *)dx_info + sizeof(*dx_info);
 	dx_set_block(entries, 1);
 	dx_set_count(entries, 1);
-	dx_set_limit(entries, dx_root_limit(dir, sizeof(root->info)));
+	dx_set_limit(entries, dx_root_limit(dir, sizeof(*dx_info)));
 
 	/* Initialize as for dx_probe */
-	fname->hinfo.hash_version = root->info.hash_version;
+	fname->hinfo.hash_version = dx_info->hash_version;
 	if (fname->hinfo.hash_version <= DX_HASH_TEA)
 		fname->hinfo.hash_version += EXT4_SB(dir->i_sb)->s_hash_unsigned;
 	fname->hinfo.seed = EXT4_SB(dir->i_sb)->s_hash_seed;
@@ -2600,7 +2594,7 @@ static int ext4_dx_add_entry(handle_t *handle, struct ext4_filename *fname,
 			if (restart || err)
 				goto journal_error;
 		} else {
-			struct dx_root *dxroot;
+			struct dx_root_info *info;
 			memcpy((char *) entries2, (char *) entries,
 			       icount * sizeof(struct dx_entry));
 			dx_set_limit(entries2, dx_node_limit(dir));
@@ -2608,8 +2602,9 @@ static int ext4_dx_add_entry(handle_t *handle, struct ext4_filename *fname,
 			/* Set up root */
 			dx_set_count(entries, 1);
 			dx_set_block(entries + 0, newblock);
-			dxroot = (struct dx_root *)frames[0].bh->b_data;
-			dxroot->info.indirect_levels += 1;
+			info = dx_get_dx_info((struct ext4_dir_entry_2 *)
+					      frames[0].bh->b_data);
+			info->indirect_levels += 1;
 			dxtrace(printk(KERN_DEBUG
 				       "Creating %d level index...\n",
 				       dxroot->info.indirect_levels));
-- 
2.43.7


^ permalink raw reply related

* [PATCH v2 02/10] ext4: add ext4_dir_entry_is_tail()
From: Artem Blagodarenko @ 2026-06-10 15:24 UTC (permalink / raw)
  To: linux-ext4; +Cc: adilger.kernel, Artem Blagodarenko, Andreas Dilger
In-Reply-To: <20260610152417.13576-1-ablagodarenko@thelustrecollective.com>

From: Artem Blagodarenko <artem.blagodarenko@gmail.com>

Replace open-coded checks for directory tail entries with a call
to ext4_dir_entry_is_tail(). This helper will also be used by
upcoming changes.

Signed-off-by: Artem Blagodarenko <artem.blagodarenko@gmail.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
---
 fs/ext4/ext4.h  | 16 ++++++++++++++++
 fs/ext4/namei.c |  7 +------
 2 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 94283a991e5c..01b1222b1454 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -3917,6 +3917,22 @@ static inline void ext4_clear_io_unwritten_flag(ext4_io_end_t *io_end)
 		io_end->flag &= ~EXT4_IO_END_UNWRITTEN;
 }
 
+/*
+ * ext4_dir_entry_is_tail() - Check if a directory entry is a tail entry.
+ * @de: directory entry to check
+ *
+ * Returns true if @de is a directory block tail entry (checksum record).
+ */
+static inline bool ext4_dir_entry_is_tail(struct ext4_dir_entry_2 *de)
+{
+	struct ext4_dir_entry_tail *t = (struct ext4_dir_entry_tail *)de;
+
+	return !t->det_reserved_zero1 &&
+	       le16_to_cpu(t->det_rec_len) == sizeof(*t) &&
+	       !t->det_reserved_zero2 &&
+	       t->det_reserved_ft == EXT4_FT_DIR_CSUM;
+}
+
 extern const struct iomap_ops ext4_iomap_ops;
 extern const struct iomap_ops ext4_iomap_report_ops;
 
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 5805001ff1d9..0196d954cba1 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -314,7 +314,6 @@ static struct ext4_dir_entry_tail *get_dirent_tail(struct inode *inode,
 						   struct buffer_head *bh)
 {
 	struct ext4_dir_entry_tail *t;
-	int blocksize = EXT4_BLOCK_SIZE(inode->i_sb);
 
 #ifdef PARANOID
 	struct ext4_dir_entry_2 *d, *top;
@@ -334,11 +333,7 @@ static struct ext4_dir_entry_tail *get_dirent_tail(struct inode *inode,
 	t = EXT4_DIRENT_TAIL(bh->b_data, EXT4_BLOCK_SIZE(inode->i_sb));
 #endif
 
-	if (t->det_reserved_zero1 ||
-	    (ext4_rec_len_from_disk(t->det_rec_len, blocksize) !=
-	     sizeof(struct ext4_dir_entry_tail)) ||
-	    t->det_reserved_zero2 ||
-	    t->det_reserved_ft != EXT4_FT_DIR_CSUM)
+	if (!ext4_dir_entry_is_tail((struct ext4_dir_entry_2 *)t))
 		return NULL;
 
 	return t;
-- 
2.43.7


^ permalink raw reply related

* [PATCH v2 01/10] ext4: replace ext4_dir_entry with ext4_dir_entry_2
From: Artem Blagodarenko @ 2026-06-10 15:24 UTC (permalink / raw)
  To: linux-ext4; +Cc: adilger.kernel, Artem Blagodarenko, Andreas Dilger
In-Reply-To: <20260610152417.13576-1-ablagodarenko@thelustrecollective.com>

From: Artem Blagodarenko <artem.blagodarenko@gmail.com>

Replace remaining uses of struct ext4_dir_entry in namei.c
with struct ext4_dir_entry_2.

The code paths affected by this change already depend on the
filetype feature, so using struct ext4_dir_entry_2 is
appropriate and avoids mixing the two directory entry types
unnecessarily.

This change does not affect support for 16-bit rec_len.

Signed-off-by: Artem Blagodarenko <artem.blagodarenko@gmail.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
---
 fs/ext4/namei.c | 38 +++++++++++++++++++-------------------
 1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 4a47fbd8dd30..5805001ff1d9 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -102,7 +102,7 @@ static struct buffer_head *ext4_append(handle_t *handle,
 }
 
 static int ext4_dx_csum_verify(struct inode *inode,
-			       struct ext4_dir_entry *dirent);
+			       struct ext4_dir_entry_2 *dirent);
 
 /*
  * Hints to ext4_read_dirblock regarding whether we expect a directory
@@ -128,7 +128,7 @@ static struct buffer_head *__ext4_read_dirblock(struct inode *inode,
 						unsigned int line)
 {
 	struct buffer_head *bh;
-	struct ext4_dir_entry *dirent;
+	struct ext4_dir_entry_2 *dirent;
 	int is_dx_block = 0;
 
 	if (block >= inode->i_size >> inode->i_blkbits) {
@@ -160,7 +160,7 @@ static struct buffer_head *__ext4_read_dirblock(struct inode *inode,
 	}
 	if (!bh)
 		return NULL;
-	dirent = (struct ext4_dir_entry *) bh->b_data;
+	dirent = (struct ext4_dir_entry_2 *) bh->b_data;
 	/* Determine whether or not we have an index block */
 	if (is_dx(inode)) {
 		if (block == 0)
@@ -317,13 +317,13 @@ static struct ext4_dir_entry_tail *get_dirent_tail(struct inode *inode,
 	int blocksize = EXT4_BLOCK_SIZE(inode->i_sb);
 
 #ifdef PARANOID
-	struct ext4_dir_entry *d, *top;
+	struct ext4_dir_entry_2 *d, *top;
 
-	d = (struct ext4_dir_entry *)bh->b_data;
-	top = (struct ext4_dir_entry *)(bh->b_data +
+	d = (struct ext4_dir_entry_2 *)bh->b_data;
+	top = (struct ext4_dir_entry_2 *)(bh->b_data +
 		(blocksize - sizeof(struct ext4_dir_entry_tail)));
 	while (d < top && ext4_rec_len_from_disk(d->rec_len, blocksize))
-		d = (struct ext4_dir_entry *)(((void *)d) +
+		d = (struct ext4_dir_entry_2 *)(((void *)d) +
 		    ext4_rec_len_from_disk(d->rec_len, blocksize));
 
 	if (d != top)
@@ -410,22 +410,22 @@ int ext4_handle_dirty_dirblock(handle_t *handle,
 }
 
 static struct dx_countlimit *get_dx_countlimit(struct inode *inode,
-					       struct ext4_dir_entry *dirent,
+					       struct ext4_dir_entry_2 *dirent,
 					       int *offset)
 {
-	struct ext4_dir_entry *dp;
+	struct ext4_dir_entry_2 *de;
 	struct dx_root_info *root;
 	int count_offset;
 	int blocksize = EXT4_BLOCK_SIZE(inode->i_sb);
 	unsigned int rlen = ext4_rec_len_from_disk(dirent->rec_len, blocksize);
 
 	if (rlen == blocksize)
-		count_offset = 8;
+		count_offset = sizeof(struct dx_node);
 	else if (rlen == 12) {
-		dp = (struct ext4_dir_entry *)(((void *)dirent) + 12);
-		if (ext4_rec_len_from_disk(dp->rec_len, blocksize) != blocksize - 12)
+		de = (struct ext4_dir_entry_2 *)(((void *)dirent) + 12);
+		if (ext4_rec_len_from_disk(de->rec_len, blocksize) != blocksize - 12)
 			return NULL;
-		root = (struct dx_root_info *)(((void *)dp + 12));
+		root = (struct dx_root_info *)(((void *)de + 12));
 		if (root->reserved_zero ||
 		    root->info_length != sizeof(struct dx_root_info))
 			return NULL;
@@ -438,7 +438,7 @@ static struct dx_countlimit *get_dx_countlimit(struct inode *inode,
 	return (struct dx_countlimit *)(((void *)dirent) + count_offset);
 }
 
-static __le32 ext4_dx_csum(struct inode *inode, struct ext4_dir_entry *dirent,
+static __le32 ext4_dx_csum(struct inode *inode, struct ext4_dir_entry_2 *dirent,
 			   int count_offset, int count, struct dx_tail *t)
 {
 	struct ext4_inode_info *ei = EXT4_I(inode);
@@ -456,7 +456,7 @@ static __le32 ext4_dx_csum(struct inode *inode, struct ext4_dir_entry *dirent,
 }
 
 static int ext4_dx_csum_verify(struct inode *inode,
-			       struct ext4_dir_entry *dirent)
+			       struct ext4_dir_entry_2 *dirent)
 {
 	struct dx_countlimit *c;
 	struct dx_tail *t;
@@ -485,7 +485,7 @@ static int ext4_dx_csum_verify(struct inode *inode,
 	return 1;
 }
 
-static void ext4_dx_csum_set(struct inode *inode, struct ext4_dir_entry *dirent)
+static void ext4_dx_csum_set(struct inode *inode, struct ext4_dir_entry_2 *dirent)
 {
 	struct dx_countlimit *c;
 	struct dx_tail *t;
@@ -515,7 +515,7 @@ static inline int ext4_handle_dirty_dx_node(handle_t *handle,
 					    struct inode *inode,
 					    struct buffer_head *bh)
 {
-	ext4_dx_csum_set(inode, (struct ext4_dir_entry *)bh->b_data);
+	ext4_dx_csum_set(inode, (struct ext4_dir_entry_2 *)bh->b_data);
 	return ext4_handle_dirty_metadata(handle, inode, bh);
 }
 
@@ -1488,7 +1488,7 @@ int ext4_search_dir(struct buffer_head *bh, char *search_buf, int buf_size,
 }
 
 static int is_dx_internal_node(struct inode *dir, ext4_lblk_t block,
-			       struct ext4_dir_entry *de)
+			       struct ext4_dir_entry_2 *de)
 {
 	struct super_block *sb = dir->i_sb;
 
@@ -1619,7 +1619,7 @@ static struct buffer_head *__ext4_find_entry(struct inode *dir,
 		}
 		if (!buffer_verified(bh) &&
 		    !is_dx_internal_node(dir, block,
-					 (struct ext4_dir_entry *)bh->b_data) &&
+					 (struct ext4_dir_entry_2 *)bh->b_data) &&
 		    !ext4_dirblock_csum_verify(dir, bh)) {
 			EXT4_ERROR_INODE_ERR(dir, EFSBADCRC,
 					     "checksumming directory "
-- 
2.43.7


^ permalink raw reply related

* [PATCH v2 00/10] Data in direntry (dirdata) feature
From: Artem Blagodarenko @ 2026-06-10 15:24 UTC (permalink / raw)
  To: linux-ext4; +Cc: adilger.kernel, Artem Blagodarenko, Andreas Dilger, syzbot

EXT4 currently stores a hash in the directory entry
(dirent) immediately after the file name to support
simultaneous fscrypt and casefold functionality.

It has been discussed within the EXT4 community that
this hash could instead be stored in dirdata. This
would make it the second (or third, in the case of
64-bit inode counts) user of dirdata.

At the same time, the existing format—where the hash
is placed after the file name—must continue to be
supported. With these patches, EXT4 can handle the
hash in both formats.

The first user of this feature is  LUFID -
Locally Unique File ID.

Support for fscrypt and case-insensitive directories
with dirdata enabled has been verified using a
dedicated xfstest submitted to the xfstests list as
a separate patch.

e2fsprogs support is provided in a separate patches
series.

Changes in v2:
- Split the patch set into 10 smaller patchesfor
  easier reading and review.
- Added an IOCTL to set the LUFID for testing purposes.
  LUFIDs can be listed via debugfs. Corresponding support
  has been added in the related e2fsprogs series.
- Removed the dirdata mount option.
- Fixed the following issue:
  KASAN: slab-out-of-bounds read in __ext4_check_dir_entry
- Rebased onto the latest codebase.

Artem Blagodarenko (10):
  ext4: replace ext4_dir_entry with ext4_dir_entry_2
  ext4: add ext4_dir_entry_is_tail()
  ext4: refactor dx_root to support variable dirent sizes
  ext4: add dirdata format definitions and access helpers
  ext4: preserve dirdata bits in get_dtype()
  ext4: add ext4_dir_entry_len() and harden dirdata parsing
  ext4: rename ext4_dir_rec_len() and clarify dirdata usage
  ext4: dirdata feature
  ext4: add dirdata set/get helpers
  ext4: Add EXT4_IOC_SET_LUFID ioctl for setting LUFID on directory
    entries

 foofile.txt               |   0
 fs/ext4/dir.c             |   9 +-
 fs/ext4/ext4.h            | 205 ++++++++++++-
 fs/ext4/inline.c          |  37 ++-
 fs/ext4/ioctl.c           |  62 ++++
 fs/ext4/namei.c           | 587 +++++++++++++++++++++++++++-----------
 fs/ext4/sysfs.c           |   2 +
 include/uapi/linux/ext4.h |  13 +
 8 files changed, 723 insertions(+), 192 deletions(-)
 create mode 100644 foofile.txt

Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Tested-by: syzbot@syzkaller.appspotmail.com
-- 
2.43.7

^ permalink raw reply

* [PATCH] iomap: enforce DIO alignment check in iomap
From: cem @ 2026-06-10 14:52 UTC (permalink / raw)
  To: brauner
  Cc: linux-block, linux-fsdevel, linux-ext4, linux-xfs,
	Carlos Maiolino, Keith Busch, Hannes Reinecke, Martin K. Petersen,
	Christoph Hellwig, Jens Axboe

From: Carlos Maiolino <cem@kernel.org>

The DIO alignment check has been lifted from iomap layer to rely on the
block layer to enforce proper alignment when issuing direct IO
operations. This though, depending on the IO size and buffer address
passed to the IO operation may lead to user-visible behavior change.

This has been caught initially by LTP test diotest4 running on
PPC architecture, where the test fails because a read() operation
with a supposedly misaligned buffer succeeds instead of an expected
-EINVAL.
This has no direct relationship with PPC, but seems to do with the
IO size crossing page borders or not.

The test allocates a 4k buffer, and then increments the buffer pointer
by a single byte to enforce a misaligned address. It then issues a 4k
read() using such buffer. The read is supposed to return an -EINVAL but
it ends up succeeding.

The allocated buffer is at least a single page, so the read() size being
smaller will end up most of the time within the very same page initially
allocated which seems to suffice the block layer to accept the IO.

On x86 though, the same 4k read will end up crossing page boundaries
causing a bio_split which ends up properly checking the address and
rejecting it due to misalignment.
The test itself is buggy (which seems by design) because it ends up
attempting to read 4096 bytes into a 4095, but I believe the test
expected the address to be rejected prior to any write attempt.

The problematic behavior is reproducible on x86 by reducing the IO size
to something < PAGE_SIZE, so the misaligned read()s will also be accepted
by the block layer.

Fixing this is just a matter of enforcing daddr and memory
alignment back into iomap.

This behavior is reproducible in ext4 and xfs due to both relying on
iomap layer, btrfs does not present this behavior change as it does its
own DIO alignment checking.

Fixes: 7eac33186957 ("iomap: simplify direct io validity check")
Cc: Keith Busch <kbusch@kernel.org>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
---

While I didn't spot any memory/disk corruption looking into this, it
changes the user behavior that dictates buffer addresses must be
properly aligned when issuing direct IO operations so I thought making
iomap check again for the buffer address alignment is reasonable.

 fs/iomap/direct-io.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 95254aa1b654..0064984e64e5 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -400,6 +400,9 @@ static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
 	if ((pos | length) & (alignment - 1))
 		return -EINVAL;

+	if (iov_iter_alignment(dio->submit.iter) & (alignment - 1))
+		return -EINVAL;
+
 	if (dio->flags & IOMAP_DIO_WRITE) {
 		bool need_completion_work = true;

-- 
2.54.0

^ permalink raw reply related

* [PATCH v2] ext4: fix circular lock dependency in ext4_ext_migrate
From: Yun Zhou @ 2026-06-10 10:30 UTC (permalink / raw)
  To: tytso, adilger.kernel, libaokun, jack, ojaswin, ritesh.list,
	yi.zhang, ebiggers, yun.zhou
  Cc: linux-ext4, linux-kernel
In-Reply-To: <20260609084007.3432061-1-yun.zhou@windriver.com>

Move iput(tmp_inode) after ext4_writepages_up_write() to avoid a
circular lock dependency between s_writepages_rwsem and sb_internal
(freeze protection).

The deadlock scenario:

  CPU0 (EXT4_IOC_MIGRATE)        CPU1 (orphan cleanup during mount)
  ----                           ----
  ext4_ext_migrate()
    ext4_writepages_down_write()
      s_writepages_rwsem (write)
                                 ext4_evict_inode()
                                   sb_start_intwrite()   [sb_internal]
                                   ...
                                     ext4_writepages()
                                       s_writepages_rwsem (read) [BLOCKED]
    iput(tmp_inode)
      ext4_evict_inode()
        sb_start_intwrite()         [BLOCKED]

The tmp_inode is a temporary inode with nlink=0 created solely for
building the extent tree.  Its eviction does not require
s_writepages_rwsem protection, so deferring iput() until after
releasing the rwsem is safe.

Reported-by: syzbot+f0b58a1f5075a90dd9a5@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=f0b58a1f5075a90dd9a5
Fixes: cb85f4d23f79 ("ext4: fix race between writepages and enabling EXT4_EXTENTS_FL")
Signed-off-by: Yun Zhou <yun.zhou@windriver.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
v2: remove redundant null pointer check for iput(tmp_inode)

 fs/ext4/migrate.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/migrate.c b/fs/ext4/migrate.c
index 477d43d7e294..5d60ef10fe11 100644
--- a/fs/ext4/migrate.c
+++ b/fs/ext4/migrate.c
@@ -464,6 +464,7 @@ int ext4_ext_migrate(struct inode *inode)
 	if (IS_ERR(tmp_inode)) {
 		retval = PTR_ERR(tmp_inode);
 		ext4_journal_stop(handle);
+		tmp_inode = NULL;
 		goto out_unlock;
 	}
 	/*
@@ -591,9 +592,9 @@ int ext4_ext_migrate(struct inode *inode)
 	ext4_journal_stop(handle);
 out_tmp_inode:
 	unlock_new_inode(tmp_inode);
-	iput(tmp_inode);
 out_unlock:
 	ext4_writepages_up_write(inode->i_sb, alloc_ctx);
+	iput(tmp_inode);
 	return retval;
 }
 
-- 
2.43.0


^ permalink raw reply related

* Re: [PATCH] ext4: fix circular lock dependency in ext4_ext_migrate
From: Jan Kara @ 2026-06-10 10:21 UTC (permalink / raw)
  To: Zhou, Yun
  Cc: Jan Kara, tytso, adilger.kernel, libaokun, ojaswin, ritesh.list,
	yi.zhang, ebiggers, linux-ext4, linux-kernel
In-Reply-To: <7fe6eec7-acd1-4511-beb7-bac9bbdb9cb2@windriver.com>

On Wed 10-06-26 15:04:33, Zhou, Yun wrote:
> 
> 
> On 6/9/26 20:05, Jan Kara wrote:
> > Looks good. Feel free to add:
> > 
> > Reviewed-by: Jan Kara <jack@suse.cz>
> > 
> > Just one nit below:
> > 
> > > @@ -591,9 +592,10 @@ int ext4_ext_migrate(struct inode *inode)
> > >        ext4_journal_stop(handle);
> > >   out_tmp_inode:
> > >        unlock_new_inode(tmp_inode);
> > > -     iput(tmp_inode);
> > >   out_unlock:
> > >        ext4_writepages_up_write(inode->i_sb, alloc_ctx);
> > > +     if (tmp_inode)
> > > +             iput(tmp_inode);
> > iput(NULL) is properly handled so you don't need the if (tmp_inode) check
> > here.
> Hi Jan,
> 
> Thank you for your careful review. Should I remove this redundant check in
> v2?

Yes, please. Thank you!

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox