Linux EXT4 FS development


* [PATCH] ext4: skip extra isize expansion on inode eviction to avoid deadlock
From: Yun Zhou @ 2026-06-11 12:45 UTC
  To: tytso, adilger.kernel, libaokun, jack, ojaswin, ritesh.list,
	yi.zhang, ebiggers, yun.zhou
  Cc: linux-ext4, linux-kernel

Expanding extra isize on an inode that is being evicted is pointless
since the inode is about to be deleted.  Skip it by setting
EXT4_STATE_NO_EXPAND before calling ext4_mark_inode_dirty() in the
eviction path.

This also breaks a circular lock dependency reported by lockdep during
orphan cleanup at mount time:

  CPU0 (writeback worker)            CPU1 (open)
  ----                               ----
  ext4_writepages()
    s_writepages_rwsem (read)        ext4_create()
    ext4_do_writepages()               __ext4_new_inode()
      ext4_journal_start()               [holds jbd2 handle]
        wait_transaction_locked()        ext4_xattr_set_handle()
        [WAIT for jbd2_handle]             xattr_sem (write)

  CPU2 (mount / orphan cleanup)
  ----
  ext4_evict_inode()
    __ext4_mark_inode_dirty()
      ext4_try_to_expand_extra_isize()
        xattr_sem (write)
        ext4_expand_extra_isize_ea()
          ext4_xattr_block_set()
            iput(ea_inode)
              write_inode_now()
                ext4_writepages()
                  s_writepages_rwsem (read)
                  [WAIT for s_writepages_rwsem -- if blocked by write lock holder]

This forms a circular dependency on lock classes:

  s_writepages_rwsem --> jbd2_handle --> xattr_sem --> s_writepages_rwsem

The iput() inside ext4_xattr_block_set() triggers write_inode_now()
because SB_ACTIVE is not yet set during mount, so iput_final() cannot
cache the inode in the LRU and must flush it synchronously.

Setting EXT4_STATE_NO_EXPAND prevents ext4_try_to_expand_extra_isize()
from executing, which eliminates the xattr_sem --> s_writepages_rwsem
edge and breaks the cycle.

Reported-by: syzbot+5d19358d7eb30ffb0cc5@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=5d19358d7eb30ffb0cc5
Fixes: c8585c6fcaf2 ("ext4: fix races between changing inode journal mode and ext4_writepages")
Signed-off-by: Yun Zhou <yun.zhou@windriver.com>
---
 fs/ext4/inode.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index cd7588a3fa45..cbfd1d1282e6 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -264,6 +264,12 @@ void ext4_evict_inode(struct inode *inode)
 	if (ext4_inode_is_fast_symlink(inode))
 		memset(EXT4_I(inode)->i_data, 0, sizeof(EXT4_I(inode)->i_data));
 	inode->i_size = 0;
+	/*
+	 * Skip extra isize expansion on inodes being deleted -- it is
+	 * pointless and can trigger a circular lock dependency:
+	 *   xattr_sem -> ext4_xattr_block_set -> iput -> s_writepages_rwsem
+	 */
+	ext4_set_inode_state(inode, EXT4_STATE_NO_EXPAND);
 	err = ext4_mark_inode_dirty(handle, inode);
 	if (err) {
 		ext4_warning(inode->i_sb,
-- 
2.43.0
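
The cycle described in the commit message above can be modelled in
userspace. What follows is a minimal sketch in C, not kernel code and not
how lockdep itself is implemented. The three lock classes are taken from
the commit message; the enum, struct dep_graph, build_graph() and
dep_graph_has_cycle() are hypothetical names introduced only for this
illustration. The sketch records "B was acquired while A was held" edges
and looks for a back edge with a depth-first search.

```c
#include <stdbool.h>
#include <string.h>

/* Lock classes named in the commit message; the enum is hypothetical. */
enum lock_class { S_WRITEPAGES_RWSEM, JBD2_HANDLE, XATTR_SEM, NR_CLASSES };

/* edge[a][b] == true means "b was acquired while a was held". */
struct dep_graph {
	bool edge[NR_CLASSES][NR_CLASSES];
};

/* state: 0 = unvisited, 1 = on the current DFS path, 2 = fully explored */
static bool dfs(const struct dep_graph *g, int node, int *state)
{
	state[node] = 1;
	for (int next = 0; next < NR_CLASSES; next++) {
		if (!g->edge[node][next])
			continue;
		if (state[next] == 1)
			return true;	/* back edge: circular dependency */
		if (state[next] == 0 && dfs(g, next, state))
			return true;
	}
	state[node] = 2;
	return false;
}

bool dep_graph_has_cycle(const struct dep_graph *g)
{
	int state[NR_CLASSES] = { 0 };

	for (int n = 0; n < NR_CLASSES; n++)
		if (state[n] == 0 && dfs(g, n, state))
			return true;
	return false;
}

/*
 * Build the three edges listed in the commit message.  with_expand
 * models whether the isize expansion (and so the
 * xattr_sem --> s_writepages_rwsem edge) can still happen.
 */
struct dep_graph build_graph(bool with_expand)
{
	struct dep_graph g;

	memset(&g, 0, sizeof(g));
	g.edge[S_WRITEPAGES_RWSEM][JBD2_HANDLE] = true;
	g.edge[JBD2_HANDLE][XATTR_SEM] = true;
	g.edge[XATTR_SEM][S_WRITEPAGES_RWSEM] = with_expand;
	return g;
}
```

In this model dep_graph_has_cycle() reports a cycle for build_graph(true)
and none for build_graph(false), which mirrors the commit message's claim
that removing the xattr_sem --> s_writepages_rwsem edge breaks the cycle.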


* Re: [PATCH v2] jbd2: Remove special jbd2 slabs
From: Theodore Ts'o @ 2026-06-11 12:35 UTC
  To: Ext4 Developers List, Matthew Wilcox (Oracle)
  Cc: Theodore Ts'o, Jan Kara, linux-fsdevel,
	Mike Rapoport (Microsoft), Vlastimil Babka, Tal Zussman, Jan Kara
In-Reply-To: <20260528171413.1088143-1-willy@infradead.org>


On Thu, 28 May 2026 18:14:11 +0100, Matthew Wilcox (Oracle) wrote:
> When jbd2 was originally written, kmalloc() would not guarantee memory
> alignment for the requested objects.  Since commit 59bb47985c1d in 2019,
> kmalloc has guaranteed natural alignment for power-of-two allocations.
> We can now remove the jbd2 special slabs and just use kmalloc() directly.

Applied, thanks!

[1/1] jbd2: Remove special jbd2 slabs
      commit: bbe9015f23432bd4f5b8590eb178b3b5b7c29f02

Best regards,
-- 
Theodore Ts'o <tytso@mit.edu>
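
To make "natural alignment for power-of-two allocations" concrete, here
is a small userspace sketch in C. It is only an analogue: it uses C11
aligned_alloc() rather than kmalloc(), and the helper names
is_power_of_two(), is_naturally_aligned() and alloc_naturally_aligned()
are hypothetical. "Naturally aligned" here means the address is a
multiple of the (power-of-two) allocation size.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

bool is_power_of_two(size_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* True if p is aligned to size, where size must be a power of two. */
bool is_naturally_aligned(const void *p, size_t size)
{
	return is_power_of_two(size) && ((uintptr_t)p & (size - 1)) == 0;
}

/*
 * Userspace stand-in for a power-of-two allocation with natural
 * alignment.  Returns NULL for sizes that are not a power of two,
 * since the guarantee quoted above only covers power-of-two sizes.
 * The caller releases the memory with free().
 */
void *alloc_naturally_aligned(size_t size)
{
	if (!is_power_of_two(size))
		return NULL;
	return aligned_alloc(size, size);
}
```

For example, a buffer obtained from alloc_naturally_aligned(1024) should
satisfy is_naturally_aligned(p, 1024), while a request for 1000 bytes is
rejected by this sketch.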

* Re: [PATCH] ext4: fix kernel BUG in ext4_write_inline_data_end
From: Theodore Ts'o @ 2026-06-11 12:35 UTC
  To: Ext4 Developers List, Andreas Dilger, Aditya Prakash Srivastava
  Cc: Theodore Ts'o, Jan Kara, Baokun Li, Ojaswin Mujoo,
	Ritesh Harjani, Zhang Yi, linux-kernel,
	syzbot+0c89d865531d053abb2d
In-Reply-To: <20260608065227.3018-1-aditya.ansh182@gmail.com>


On Mon, 08 Jun 2026 06:52:27 +0000, Aditya Prakash Srivastava wrote:
> When the data=journal mount option is used, the ext4_journalled_write_end()
> function incorrectly calls ext4_write_inline_data_end() without checking
> if the EXT4_STATE_MAY_INLINE_DATA flag is still set on the inode.
> 
> If a previous attempt to convert the inline data to an extent failed (e.g.
> due to ENOSPC), the EXT4_STATE_MAY_INLINE_DATA flag is cleared, but
> the EXT4_INODE_INLINE_DATA flag remains set. In this scenario, the next
> call to ext4_write_begin() will not prepare the inline data xattr for
> writing, but ext4_journalled_write_end() will incorrectly attempt to write
> to it, triggering a BUG_ON(pos + len > EXT4_I(inode)->i_inline_size) in
> ext4_write_inline_data() since i_inline_size was not expanded.
> 
> [...]

Applied, thanks!

[1/1] ext4: fix kernel BUG in ext4_write_inline_data_end
      commit: ad09aa45965d3fafaf9963bc78109b73c0f9ac8d

Best regards,
-- 
Theodore Ts'o <tytso@mit.edu>

* Re: [PATCH] ext4: validate donor file superblock early in EXT4_IOC_MOVE_EXT
From: Theodore Ts'o @ 2026-06-11 12:35 UTC
  To: Ext4 Developers List, adilger.kernel, libaokun, jack, ojaswin,
	ritesh.list, yi.zhang, dmonakhov, Yun Zhou
  Cc: Theodore Ts'o, linux-kernel
In-Reply-To: <20260608152521.1292656-1-yun.zhou@windriver.com>


On Mon, 08 Jun 2026 23:25:21 +0800, Yun Zhou wrote:
> Reject the EXT4_IOC_MOVE_EXT ioctl early if the donor file does not
> belong to the same superblock as the original file.  Currently, this
> validation is performed inside ext4_move_extents() by
> mext_check_validity(), but only after lock_two_nondirectories() has
> already acquired the inode locks.  When the donor fd refers to a file
> on a different filesystem (e.g., overlayfs), this late validation
> creates a circular lock dependency:
> 
> [...]

Applied, thanks!

[1/1] ext4: validate donor file superblock early in EXT4_IOC_MOVE_EXT
      commit: c143957520c6c9b5cd72e0de8b52b814f0c576fe

Best regards,
-- 
Theodore Ts'o <tytso@mit.edu>

* Re: [PATCH] ext4: Remove mention of PageWriteback
From: Theodore Ts'o @ 2026-06-11 12:35 UTC
  To: Ext4 Developers List, Matthew Wilcox (Oracle)
  Cc: Theodore Ts'o, Andreas Dilger, Baokun Li, Jan Kara,
	Ojaswin Mujoo, Ritesh Harjani (IBM), Zhang Yi, linux-kernel
In-Reply-To: <20260526190805.341676-1-willy@infradead.org>


On Tue, 26 May 2026 20:08:02 +0100, Matthew Wilcox (Oracle) wrote:
> Update a comment to refer to the concept of writeback instead of the
> (now obsolete) detail of how it's implemented.

Applied, thanks!

[1/1] ext4: Remove mention of PageWriteback
      commit: 4e3a55f44b42c2aabd4c1cc3bdb6a01a7107121d

Best regards,
-- 
Theodore Ts'o <tytso@mit.edu>

* Re: [PATCH v2] ext4: Fix ERR_PTR(0) in ext4_mkdir()
From: Theodore Ts'o @ 2026-06-11 12:35 UTC
  To: Ext4 Developers List, adilger.kernel, libaokun, jack, ojaswin,
	ritesh.list, yi.zhang, neil, brauner, jlayton, Hongling Zeng
  Cc: Theodore Ts'o, linux-kernel, zhongling0719
In-Reply-To: <20260604073647.211279-1-zenghongling@kylinos.cn>


On Thu, 04 Jun 2026 15:36:47 +0800, Hongling Zeng wrote:
> When mkdir succeeds, ext4_mkdir() returns ERR_PTR(0) which is incorrect.
> It should return NULL instead for success and ERR_PTR() only with
> negative error codes for failure.

Applied, thanks!

[1/1] ext4: Fix ERR_PTR(0) in ext4_mkdir()
      commit: 8e1c43af7cf5091d99db38b7c8129e394d7f45b5

Best regards,
-- 
Theodore Ts'o <tytso@mit.edu>
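
The return convention discussed in the quoted patch description can be
sketched in userspace C. This is a simplified re-creation of the kernel's
ERR_PTR()/PTR_ERR()/IS_ERR() helpers written for illustration, not the
kernel headers themselves, and mkdir_result() is a hypothetical function
that is not taken from the patch. It shows the pattern the description
asks for: NULL on success, ERR_PTR() only with a negative error code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

/* Recover the errno value from an error pointer. */
static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Error pointers occupy the top MAX_ERRNO values of the address space. */
static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/*
 * Hypothetical helper: turn an int status into a pointer result.
 * Success is spelled as a plain NULL; ERR_PTR() is used only for
 * negative error codes.
 */
void *mkdir_result(int err)
{
	if (err)
		return ERR_PTR(err);
	return NULL;
}
```

A caller would test the result with IS_ERR() and treat NULL as success,
for example IS_ERR(mkdir_result(-28)) is true while mkdir_result(0) is
NULL and not an error pointer.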

* Re: [PATCH v4] ext4: fix kernel BUG in ext4_write_inline_data_end
From: Theodore Ts'o @ 2026-06-11 12:35 UTC
  To: Ext4 Developers List, Andreas Dilger, Aditya Prakash Srivastava
  Cc: Theodore Ts'o, Jan Kara, Baokun Li, Ojaswin Mujoo,
	Ritesh Harjani, Zhang Yi, sashiko-reviews, linux-kernel,
	syzbot+0c89d865531d053abb2d
In-Reply-To: <20260609062005.1702-1-aditya.ansh182@gmail.com>


On Tue, 09 Jun 2026 06:20:05 +0000, Aditya Prakash Srivastava wrote:
> When the data=journal mount option is used, the ext4_journalled_write_end()
> function incorrectly calls ext4_write_inline_data_end() without checking
> if the EXT4_STATE_MAY_INLINE_DATA flag is still set on the inode.
> 
> If a previous attempt to convert the inline data to an extent failed (e.g.
> due to ENOSPC), the EXT4_STATE_MAY_INLINE_DATA flag is cleared, but
> the EXT4_INODE_INLINE_DATA flag remains set. In this scenario, the next
> call to ext4_write_begin() will not prepare the inline data xattr for
> writing, but ext4_journalled_write_end() will incorrectly attempt to write
> to it, triggering a BUG_ON(pos + len > EXT4_I(inode)->i_inline_size) in
> ext4_write_inline_data() since i_inline_size was not expanded.
> 
> [...]

Applied, thanks!

[1/1] ext4: fix kernel BUG in ext4_write_inline_data_end
      commit: ad09aa45965d3fafaf9963bc78109b73c0f9ac8d

Best regards,
-- 
Theodore Ts'o <tytso@mit.edu>

* Re: [PATCH v4] iomap: add simple read path for small direct I/O
From: Fengnan @ 2026-06-11 12:04 UTC
  To: Pankaj Raghav (Samsung)
  Cc: brauner, djwong, hch, ojaswin, dgc, linux-xfs, linux-fsdevel,
	linux-ext4, linux-kernel, lidiangang, p.raghav
In-Reply-To: <mmbe4kdeqg6zlblhysi27qno22dtkaahv7bzslaqopsg4k3qs7@nofv525nnl6c>

On 2026/6/11 17:36, Pankaj Raghav (Samsung) wrote:
>> +static ssize_t iomap_dio_simple_read_complete(struct kiocb *iocb,
>> +		struct bio *bio)
>> +{
>> +	struct inode *inode = file_inode(iocb->ki_filp);
>> +	ssize_t ret;
>> +
>> +	WRITE_ONCE(iocb->private, NULL);
>> +
>> +	ret = iomap_dio_simple_read_finish(iocb, bio,
>> +			blk_status_to_errno(bio->bi_status));
>> +
>> +	inode_dio_end(inode);
>> +	trace_iomap_dio_complete(iocb, ret < 0 ? ret : 0, ret > 0 ? ret : 0);
> Shouldn't the second parameter here be
> blk_status_to_errno(bio->bi_status)?
>
> I think that will be more meaningful for tracing here.
> trace_iomap_dio_complete(iocb, blk_status_to_errno(bio->bi_status), ret);
Makes sense. I’ll update it in the next version.

>
> <snip>
>> +	return ret;
>> +}
>> +
>> +	sr->iocb = iocb;
>> +	sr->dio_flags = dio_flags;
>> +
>> +	bio->bi_iter.bi_sector = iomap_sector(&iomi.iomap, iomi.pos);
>> +	bio->bi_ioprio = iocb->ki_ioprio;
>> +	bio->bi_private = sr;
>> +	bio->bi_end_io = iomap_dio_simple_read_end_io;
>> +
>> +	if (dio_flags & IOMAP_DIO_BOUNCE)
>> +		ret = bio_iov_iter_bounce(bio, iter, count);
>> +	else
>> +		ret = bio_iov_iter_get_pages(bio, iter, alignment - 1);
>> +	if (unlikely(ret))
>> +		goto out_bio_put;
>> +
>> +	if (bio->bi_iter.bi_size != count) {
>> +		iov_iter_revert(iter, bio->bi_iter.bi_size);
>> +		ret = -ENOTBLK;
>> +		goto out_bio_release_pages;
>> +	}
>> +
>> +	sr->size = bio->bi_iter.bi_size;
>> +
>> +	if ((dio_flags & IOMAP_DIO_USER_BACKED) &&
>> +	    !(dio_flags & IOMAP_DIO_BOUNCE))
>> +		bio_set_pages_dirty(bio);
>> +
>> +	if (iocb->ki_flags & IOCB_NOWAIT)
>> +		bio->bi_opf |= REQ_NOWAIT;
>> +	if ((iocb->ki_flags & IOCB_HIPRI) && !wait_for_completion) {
>> +		bio->bi_opf |= REQ_POLLED;
>> +		bio_set_polled(bio, iocb);
> This results in build failure as the following patch removed this call:
> https://lore.kernel.org/linux-block/20260518062917.506483-1-hch@lst.de/
>
> I think this call can just be removed as you are setting REQ_POLLED
> anyway.
You’re right. I’ll update that in the next version too.

Thanks.

>
>> +		WRITE_ONCE(iocb->private, bio);
>> +	}
>> +
>> +	if (wait_for_completion) {
>> +		sr->waiter = current;
>> +		blk_crypto_submit_bio(bio);
>> +	} else {
>> +		atomic_set(&sr->state, IOMAP_DIO_SIMPLE_SUBMITTING);
>> +		sr->waiter = NULL;
>> +		blk_crypto_submit_bio(bio);
>> +		ret = -EIOCBQUEUED;
>> +	}
>> +
> --
> Pankaj

* Re: [PATCH v2 0/4] show orphan file inode detail info
From: yebin @ 2026-06-11 11:42 UTC
  To: Jan Kara; +Cc: tytso, adilger.kernel, linux-ext4
In-Reply-To: <a5v57ie6feotxznmhrf3i22gzplw2ucotlnw3y7hmjhkalbb26@bx2lzoil75ks>



On 2026/6/9 19:13, Jan Kara wrote:
> On Mon 08-06-26 19:44:20, yebin wrote:
>> On 2026/4/16 1:59, Jan Kara wrote:
>>> On Wed 15-04-26 18:55:01, Ye Bin wrote:
>>>> From: Ye Bin <yebin10@huawei.com>
>>>>
>>>> Diffs v2 vs v1:
>>>> (1) Fix sashiko review issues:
>>>> https://sashiko.dev/#/patchset/20260403082507.1882703-1-yebin%40huaweicloud.com
>>>> (2) Change "orphan_list" file mode from 0444 to 0400;
>>>> (3) The display format of the "orphan_list" file is modified according
>>>>       to Andreas' suggestions.
>>>> Fault injection tests have been conducted to address the issues raised
>>>> in the sashiko review. There is no UAF issue in the ext4_seq_orphan_release()
>>>> function. The reason for this has already been explained in the code comments.
>>>> In addition to the fault injection tests, we also performed a stress test by
>>>> observing the /proc/fs/ext4/XX/orphan_list and the concurrent processes of
>>>> adding and removing orphan nodes, and no issues were found so far.
>>>>
>>>>
>>>> In actual production environments, the issue of inconsistency between
>>>> df and du is frequently encountered. In many cases, the cause of the
>>>> problem can be identified through the use of lsof. However, when
>>>> overlayfs is combined with project quota configuration, the issue becomes
>>>> more complex and troublesome to diagnose. First, to determine the project
>>>> ID, one needs to obtain orphaned nodes using `fsck.ext4 -fn /dev/xx`, and
>>>> then retrieve file information through `debugfs`. However, the file names
>>>> cannot always be obtained, and it is often unclear which files they are.
>>>> To identify which files these are, one would need to use crash for online
>>>> debugging or use kprobe to gather information incrementally. However, some
>>>> customers in production environments do not agree to upload any tools, and
>>>> online debugging might impact the business. There are also scenarios where
>>>> files are opened in kernel mode, which do not generate file descriptors (fds),
>>>> making it impossible to identify which files were deleted but still have
>>>> references through lsof. This patchset adds a procfs interface to query
>>>> information about orphaned nodes, which can assist in the analysis and
>>>> localization of such issues.
>>>
>>> Ye, did you read my comments to the v1 of the patchset [1]? I didn't see
>>> any reply from you. I don't think this is a good way to expose orphan
>>> information for a filesystem, for reasons I've outlined in that email.
>>>
>>
>> Hi Jan
>>
>> I thought about how to prevent resource exhaustion caused by making too many
>> FDs in a single application. My idea is that IOCTL should only obtain one FD
>> at a time, and the next time it should start obtaining orphan nodes from the
>> inode after the previous one. Each time an fd is obtained, the previous fd
>> should be closed. I expect that after traversing all the fds from the beginning,
>> they will all be closed and there will be no need for user space to close them
>> manually. I wonder if this approach is feasible? Or do you have any good
>> suggestions?
>
> Hum, I think you've misunderstood my suggestion in [1]. What I suggested
> is:
>
> 1) Provide ioctl GET_ORPHAN_FILES that will return one "virtual" fd that
> tracks state of iteration over orphan entries of a superblock
>
> 2) Reading from this fd will return file *handles* (as struct
> file_handle) describing the orphan inodes. A struct file_handle occupies
> no kernel resources. It is essentially just a filesystem-agnostic
> container for the inode number and inode generation number. Userspace
> can then use the open_by_handle_at() syscall to convert a struct
> file_handle into a normal file descriptor, but that is up to userspace
> and what it wants the orphan information for.
>
> Is the design clearer now?
>
Thank you for your patient explanation. I have implemented it according to
your suggestion and am currently testing it locally. After the testing is
complete, I will release it. I hope I have not misunderstood your meaning
this time.
> 								Honza
>
> [1] https://lore.kernel.org/all/n4sccudy5avcgnkdhc27rzofzoprxqtwhfrlmsh3yyrj6vbc6d@mmu73gmtawkq/
>
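
Point 2 of the quoted design describes a file handle as a container for
an inode number and a generation number. The idea can be sketched as a
toy model in userspace C. Nothing below is the proposed ioctl or the real
struct file_handle layout: struct orphan_handle, struct toy_inode and
toy_open_by_handle() are hypothetical names, and the lookup runs over an
in-memory array rather than calling name_to_handle_at() or
open_by_handle_at(). The point it tries to show is that the handle holds
no reference to the inode, so resolving it later can fail if the inode is
gone or its number has been reused with a different generation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the payload carried by a file handle. */
struct orphan_handle {
	uint64_t ino;
	uint32_t generation;
};

/* Hypothetical in-memory "inode table" entry. */
struct toy_inode {
	uint64_t ino;
	uint32_t generation;
	bool in_use;
};

/*
 * Resolve a handle against the table.  Returns the matching inode, or
 * NULL if the inode is no longer in use or the generation differs
 * (a stale handle).
 */
const struct toy_inode *toy_open_by_handle(const struct toy_inode *table,
					   size_t n, struct orphan_handle h)
{
	for (size_t i = 0; i < n; i++) {
		if (!table[i].in_use || table[i].ino != h.ino)
			continue;
		return table[i].generation == h.generation ? &table[i] : NULL;
	}
	return NULL;
}
```

Because the handle is plain data, a reader could collect any number of
handles without pinning kernel objects, and only the ones it chooses to
resolve would turn into open files.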


* [syzbot ci] Re: Data in direntry (dirdata) feature
From: syzbot ci @ 2026-06-11 10:29 UTC
  To: adilger.kernel, adilger, adilger, artem.blagodarenko, linux-ext4,
	pravin.shelar
  Cc: syzbot, syzkaller-bugs
In-Reply-To: <20260610152417.13576-1-ablagodarenko@thelustrecollective.com>

syzbot ci has tested the following series

[v2] Data in direntry (dirdata) feature
https://lore.kernel.org/all/20260610152417.13576-1-ablagodarenko@thelustrecollective.com
* [PATCH v2 01/10] ext4: replace ext4_dir_entry with ext4_dir_entry_2
* [PATCH v2 02/10] ext4: add ext4_dir_entry_is_tail()
* [PATCH v2 03/10] ext4: refactor dx_root to support variable dirent sizes
* [PATCH v2 04/10] ext4: add dirdata format definitions and access helpers
* [PATCH v2 05/10] ext4: preserve dirdata bits in get_dtype()
* [PATCH v2 06/10] ext4: add ext4_dir_entry_len() and harden dirdata parsing
* [PATCH v2 07/10] ext4: rename ext4_dir_rec_len() and clarify dirdata usage
* [PATCH v2 08/10] ext4: dirdata feature
* [PATCH v2 09/10] ext4: add dirdata set/get helpers
* [PATCH v2 10/10] ext4: Add EXT4_IOC_SET_LUFID ioctl for setting LUFID on directory entries

and found the following issues:
* KASAN: slab-out-of-bounds Read in __ext4_check_dir_entry
* KASAN: slab-out-of-bounds Read in ext4_inlinedir_to_tree
* KASAN: slab-use-after-free Read in __ext4_check_dir_entry
* KASAN: slab-use-after-free Read in ext4_inlinedir_to_tree
* KASAN: use-after-free Read in __ext4_check_dir_entry

Full report is available here:
https://ci.syzbot.org/series/5bf0e2fa-2e68-4532-8396-4568879b2788

***

KASAN: slab-out-of-bounds Read in __ext4_check_dir_entry

tree:      torvalds
URL:       https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux
base:      9716c086c8e8b141d35aa61f2e96a2e83de212a7
arch:      amd64
compiler:  Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
config:    https://ci.syzbot.org/builds/ddf6ee7c-dfa8-4383-b004-10140edc081c/config
syz repro: https://ci.syzbot.org/findings/b0854918-13f9-49dd-ab30-12154f0debe2/syz_repro

loop0: lost filesystem error report for type 5 error -117
EXT4-fs (loop0): mounted filesystem 00000000-0000-0000-0000-000000000000 r/w without journal. Quota mode: none.
==================================================================
BUG: KASAN: slab-out-of-bounds in ext4_dirent_get_data_len fs/ext4/ext4.h:4069 [inline]
BUG: KASAN: slab-out-of-bounds in ext4_dir_entry_len fs/ext4/ext4.h:4096 [inline]
BUG: KASAN: slab-out-of-bounds in __ext4_check_dir_entry+0x65a/0xc40 fs/ext4/dir.c:96
Read of size 1 at addr ffff8881022db7f5 by task syz.0.23/5815

CPU: 1 UID: 0 PID: 5815 Comm: syz.0.23 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
 print_address_description+0x55/0x1e0 mm/kasan/report.c:378
 print_report+0x58/0x70 mm/kasan/report.c:482
 kasan_report+0x117/0x150 mm/kasan/report.c:595
 ext4_dirent_get_data_len fs/ext4/ext4.h:4069 [inline]
 ext4_dir_entry_len fs/ext4/ext4.h:4096 [inline]
 __ext4_check_dir_entry+0x65a/0xc40 fs/ext4/dir.c:96
 ext4_check_all_de+0x66/0x150 fs/ext4/dir.c:657
 ext4_convert_inline_data_nolock+0x1b7/0x990 fs/ext4/inline.c:1121
 ext4_try_add_inline_entry+0x604/0x8e0 fs/ext4/inline.c:1247
 __ext4_add_entry+0x390/0x1f40 fs/ext4/namei.c:2529
 ext4_add_entry fs/ext4/namei.c:2613 [inline]
 ext4_mkdir+0x5e5/0xce0 fs/ext4/namei.c:3175
 vfs_mkdir+0x413/0x630 fs/namei.c:5271
 filename_mkdirat+0x285/0x510 fs/namei.c:5304
 __do_sys_mkdirat fs/namei.c:5325 [inline]
 __se_sys_mkdirat+0x35/0x150 fs/namei.c:5322
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f669359bcc7
Code: 00 66 90 48 89 f2 b9 00 01 00 00 48 89 fe bf 9c ff ff ff e9 db f7 ff ff 66 2e 0f 1f 84 00 00 00 00 00 90 b8 02 01 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffd42381d38 EFLAGS: 00000246 ORIG_RAX: 0000000000000102
RAX: ffffffffffffffda RBX: 00007ffd42381dc0 RCX: 00007f669359bcc7
RDX: 00000000000001ff RSI: 0000200000001200 RDI: 00000000ffffff9c
RBP: 00002000000024c0 R08: 0000200000000240 R09: 0000000000000000
R10: 00002000000024c0 R11: 0000000000000246 R12: 0000200000001200
R13: 00007ffd42381d80 R14: 0000000000000000 R15: 0000000000000000
 </TASK>

Allocated by task 5066:
 kasan_save_stack mm/kasan/common.c:57 [inline]
 kasan_save_track+0x3e/0x80 mm/kasan/common.c:78
 poison_kmalloc_redzone mm/kasan/common.c:398 [inline]
 __kasan_kmalloc+0x93/0xb0 mm/kasan/common.c:415
 kasan_kmalloc include/linux/kasan.h:263 [inline]
 __kmalloc_cache_noprof+0x31c/0x660 mm/slub.c:5420
 kmalloc_noprof include/linux/slab.h:950 [inline]
 kzalloc_noprof include/linux/slab.h:1188 [inline]
 kernfs_get_open_node fs/kernfs/file.c:543 [inline]
 kernfs_fop_open+0x862/0xda0 fs/kernfs/file.c:718
 do_dentry_open+0x822/0x13a0 fs/open.c:947
 vfs_open+0x3b/0x340 fs/open.c:1079
 do_open fs/namei.c:4699 [inline]
 path_openat+0x2e08/0x3860 fs/namei.c:4858
 do_file_open+0x23e/0x4a0 fs/namei.c:4887
 do_sys_openat2+0x113/0x200 fs/open.c:1364
 do_sys_open fs/open.c:1370 [inline]
 __do_sys_openat fs/open.c:1386 [inline]
 __se_sys_openat fs/open.c:1381 [inline]
 __x64_sys_openat+0x138/0x170 fs/open.c:1381
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Last potentially related work creation:
 kasan_save_stack+0x3e/0x60 mm/kasan/common.c:57
 kasan_record_aux_stack+0xbd/0xd0 mm/kasan/generic.c:556
 kvfree_call_rcu+0x100/0x430 mm/slab_common.c:1970
 kernfs_unlink_open_file+0x3fe/0x4b0 fs/kernfs/file.c:604
 kernfs_fop_release+0x2eb/0x440 fs/kernfs/file.c:783
 __fput+0x44f/0xa60 fs/file_table.c:510
 fput_close_sync+0x11f/0x240 fs/file_table.c:615
 __do_sys_close fs/open.c:1507 [inline]
 __se_sys_close fs/open.c:1492 [inline]
 __x64_sys_close+0x7e/0x110 fs/open.c:1492
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

The buggy address belongs to the object at ffff8881022db700
 which belongs to the cache kmalloc-128 of size 128
The buggy address is located 117 bytes to the right of
 allocated 128-byte region [ffff8881022db700, ffff8881022db780)

The buggy address belongs to the physical page:
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1022db
flags: 0x17ff00000000000(node=0|zone=2|lastcpupid=0x7ff)
page_type: f5(slab)
raw: 017ff00000000000 ffff888100041a00 dead000000000100 dead000000000122
raw: 0000000000000000 0000000800100010 00000000f5000000 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 0, migratetype Unmovable, gfp_mask 0xd2000(__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 0, tgid 0 (swapper/0), ts 2408938923, free_ts 0
 set_page_owner include/linux/page_owner.h:32 [inline]
 post_alloc_hook+0x22d/0x280 mm/page_alloc.c:1853
 prep_new_page mm/page_alloc.c:1861 [inline]
 get_page_from_freelist+0x2593/0x2610 mm/page_alloc.c:3941
 __alloc_frozen_pages_noprof+0x18d/0x380 mm/page_alloc.c:5221
 alloc_slab_page mm/slub.c:3278 [inline]
 allocate_slab+0x77/0x660 mm/slub.c:3467
 new_slab mm/slub.c:3525 [inline]
 refill_objects+0x339/0x3d0 mm/slub.c:7272
 refill_sheaf mm/slub.c:2816 [inline]
 __pcs_replace_empty_main+0x321/0x720 mm/slub.c:4652
 alloc_from_pcs mm/slub.c:4750 [inline]
 slab_alloc_node mm/slub.c:4884 [inline]
 __do_kmalloc_node mm/slub.c:5295 [inline]
 __kmalloc_noprof+0x474/0x760 mm/slub.c:5308
 kmalloc_noprof include/linux/slab.h:954 [inline]
 kzalloc_noprof include/linux/slab.h:1188 [inline]
 __alloc_empty_sheaf mm/slub.c:2768 [inline]
 alloc_empty_sheaf mm/slub.c:2783 [inline]
 __pcs_replace_empty_main+0x2df/0x720 mm/slub.c:4647
 alloc_from_pcs mm/slub.c:4750 [inline]
 slab_alloc_node mm/slub.c:4884 [inline]
 kmem_cache_alloc_noprof+0x37d/0x650 mm/slub.c:4906
 dup_fd+0x55/0xb40 fs/file.c:390
 copy_files+0xc8/0x120 kernel/fork.c:1639
 copy_process+0x1d94/0x4440 kernel/fork.c:2252
 kernel_clone+0x2d7/0x940 kernel/fork.c:2722
 user_mode_thread+0x110/0x180 kernel/fork.c:2798
 rest_init+0x23/0x300 init/main.c:727
 start_kernel+0x38a/0x3e0 init/main.c:1220
page_owner free stack trace missing

Memory state around the buggy address:
 ffff8881022db680: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff8881022db700: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff8881022db780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
                                                             ^
 ffff8881022db800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff8881022db880: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================


***

KASAN: slab-out-of-bounds Read in ext4_inlinedir_to_tree

tree:      torvalds
URL:       https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux
base:      9716c086c8e8b141d35aa61f2e96a2e83de212a7
arch:      amd64
compiler:  Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
config:    https://ci.syzbot.org/builds/ddf6ee7c-dfa8-4383-b004-10140edc081c/config
syz repro: https://ci.syzbot.org/findings/2dff870b-f382-4c93-8d8d-b2291d921224/syz_repro

loop1: lost filesystem error report for type 5 error -117
EXT4-fs (loop1): mounted filesystem 00000000-0000-0000-0000-000000000000 r/w without journal. Quota mode: none.
==================================================================
BUG: KASAN: slab-out-of-bounds in ext4_dir_entry_len fs/ext4/ext4.h:4095 [inline]
BUG: KASAN: slab-out-of-bounds in ext4_inlinedir_to_tree+0xda5/0x10d0 fs/ext4/inline.c:1335
Read of size 2 at addr ffff888115a3183c by task syz.1.18/5839

CPU: 1 UID: 0 PID: 5839 Comm: syz.1.18 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
 print_address_description+0x55/0x1e0 mm/kasan/report.c:378
 print_report+0x58/0x70 mm/kasan/report.c:482
 kasan_report+0x117/0x150 mm/kasan/report.c:595
 ext4_dir_entry_len fs/ext4/ext4.h:4095 [inline]
 ext4_inlinedir_to_tree+0xda5/0x10d0 fs/ext4/inline.c:1335
 ext4_htree_fill_tree+0x517/0x1230 fs/ext4/namei.c:1182
 ext4_dx_readdir fs/ext4/dir.c:600 [inline]
 ext4_readdir+0x2db4/0x3640 fs/ext4/dir.c:146
 iterate_dir+0x399/0x570 fs/readdir.c:110
 __do_sys_getdents64 fs/readdir.c:399 [inline]
 __se_sys_getdents64+0xf1/0x280 fs/readdir.c:384
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f3e02b9ce59
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f3e03ad5028 EFLAGS: 00000246 ORIG_RAX: 00000000000000d9
RAX: ffffffffffffffda RBX: 00007f3e02e15fa0 RCX: 00007f3e02b9ce59
RDX: 0000000000001000 RSI: 0000200000000f80 RDI: 0000000000000004
RBP: 00007f3e02c32d6f R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f3e02e16038 R14: 00007f3e02e15fa0 R15: 00007ffcaa902298
 </TASK>

Allocated by task 5839:
 kasan_save_stack mm/kasan/common.c:57 [inline]
 kasan_save_track+0x3e/0x80 mm/kasan/common.c:78
 poison_kmalloc_redzone mm/kasan/common.c:398 [inline]
 __kasan_kmalloc+0x93/0xb0 mm/kasan/common.c:415
 kasan_kmalloc include/linux/kasan.h:263 [inline]
 __do_kmalloc_node mm/slub.c:5296 [inline]
 __kmalloc_noprof+0x35c/0x760 mm/slub.c:5308
 kmalloc_noprof include/linux/slab.h:954 [inline]
 ext4_inlinedir_to_tree+0x312/0x10d0 fs/ext4/inline.c:1292
 ext4_htree_fill_tree+0x517/0x1230 fs/ext4/namei.c:1182
 ext4_dx_readdir fs/ext4/dir.c:600 [inline]
 ext4_readdir+0x2db4/0x3640 fs/ext4/dir.c:146
 iterate_dir+0x399/0x570 fs/readdir.c:110
 __do_sys_getdents64 fs/readdir.c:399 [inline]
 __se_sys_getdents64+0xf1/0x280 fs/readdir.c:384
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

The buggy address belongs to the object at ffff888115a31800
 which belongs to the cache kmalloc-64 of size 64
The buggy address is located 0 bytes to the right of
 allocated 60-byte region [ffff888115a31800, ffff888115a3183c)

The buggy address belongs to the physical page:
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x115a31
flags: 0x17ff00000000000(node=0|zone=2|lastcpupid=0x7ff)
page_type: f5(slab)
raw: 017ff00000000000 ffff8881000418c0 dead000000000100 dead000000000122
raw: 0000000000000000 0000000800200020 00000000f5000000 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 0, migratetype Unmovable, gfp_mask 0xd2c40(GFP_NOFS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 5051, tgid 5051 (acpid), ts 27203740677, free_ts 27201732767
 set_page_owner include/linux/page_owner.h:32 [inline]
 post_alloc_hook+0x22d/0x280 mm/page_alloc.c:1853
 prep_new_page mm/page_alloc.c:1861 [inline]
 get_page_from_freelist+0x2593/0x2610 mm/page_alloc.c:3941
 __alloc_frozen_pages_noprof+0x18d/0x380 mm/page_alloc.c:5221
 alloc_slab_page mm/slub.c:3278 [inline]
 allocate_slab+0x77/0x660 mm/slub.c:3467
 new_slab mm/slub.c:3525 [inline]
 refill_objects+0x339/0x3d0 mm/slub.c:7272
 refill_sheaf mm/slub.c:2816 [inline]
 __pcs_replace_empty_main+0x321/0x720 mm/slub.c:4652
 alloc_from_pcs mm/slub.c:4750 [inline]
 slab_alloc_node mm/slub.c:4884 [inline]
 __do_kmalloc_node mm/slub.c:5295 [inline]
 __kmalloc_noprof+0x474/0x760 mm/slub.c:5308
 kmalloc_noprof include/linux/slab.h:954 [inline]
 kzalloc_noprof include/linux/slab.h:1188 [inline]
 tomoyo_get_name+0x20c/0x590 security/tomoyo/memory.c:173
 tomoyo_parse_name_union+0xd9/0x130 security/tomoyo/util.c:260
 tomoyo_update_path_acl security/tomoyo/file.c:399 [inline]
 tomoyo_write_file+0x3a6/0xc50 security/tomoyo/file.c:1027
 tomoyo_write_domain2 security/tomoyo/common.c:1160 [inline]
 tomoyo_add_entry security/tomoyo/common.c:2177 [inline]
 tomoyo_supervisor+0x1208/0x1570 security/tomoyo/common.c:2238
 tomoyo_audit_path_log security/tomoyo/file.c:169 [inline]
 tomoyo_path_permission+0x25a/0x380 security/tomoyo/file.c:592
 tomoyo_check_open_permission+0x2b2/0x470 security/tomoyo/file.c:782
 security_file_open+0xa9/0x240 security/security.c:2739
 do_dentry_open+0x4a8/0x13a0 fs/open.c:924
 vfs_open+0x3b/0x340 fs/open.c:1079
page last free pid 15 tgid 15 stack trace:
 reset_page_owner include/linux/page_owner.h:25 [inline]
 __free_pages_prepare mm/page_alloc.c:1397 [inline]
 __free_frozen_pages+0xc1c/0xd30 mm/page_alloc.c:2938
 __tlb_remove_table_free mm/mmu_gather.c:228 [inline]
 tlb_remove_table_rcu+0x85/0x100 mm/mmu_gather.c:291
 rcu_do_batch kernel/rcu/tree.c:2617 [inline]
 rcu_core+0x7cd/0x1070 kernel/rcu/tree.c:2869
 handle_softirqs+0x22a/0x840 kernel/softirq.c:622
 run_ksoftirqd+0x36/0x60 kernel/softirq.c:1076
 smpboot_thread_fn+0x541/0xa50 kernel/smpboot.c:160
 kthread+0x389/0x470 kernel/kthread.c:436
 ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245

Memory state around the buggy address:
 ffff888115a31700: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
 ffff888115a31780: 00 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc
>ffff888115a31800: 00 00 00 00 00 00 00 04 fc fc fc fc fc fc fc fc
                                        ^
 ffff888115a31880: 00 00 00 00 00 00 02 fc fc fc fc fc fc fc fc fc
 ffff888115a31900: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
==================================================================


***

KASAN: slab-use-after-free Read in __ext4_check_dir_entry

tree:      torvalds
URL:       https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux
base:      9716c086c8e8b141d35aa61f2e96a2e83de212a7
arch:      amd64
compiler:  Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
config:    https://ci.syzbot.org/builds/ddf6ee7c-dfa8-4383-b004-10140edc081c/config
syz repro: https://ci.syzbot.org/findings/f1d48ea1-6e87-4d64-9c13-8bf8aed109fc/syz_repro

loop0: lost filesystem error report for type 5 error -117
EXT4-fs (loop0): mounted filesystem 00000000-0000-0000-0000-000000000000 r/w without journal. Quota mode: none.
==================================================================
BUG: KASAN: slab-use-after-free in ext4_dirent_get_data_len fs/ext4/ext4.h:4069 [inline]
BUG: KASAN: slab-use-after-free in ext4_dir_entry_len fs/ext4/ext4.h:4096 [inline]
BUG: KASAN: slab-use-after-free in __ext4_check_dir_entry+0x65a/0xc40 fs/ext4/dir.c:96
Read of size 1 at addr ffff888114d8c045 by task syz.0.20/5821

CPU: 1 UID: 0 PID: 5821 Comm: syz.0.20 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
 print_address_description+0x55/0x1e0 mm/kasan/report.c:378
 print_report+0x58/0x70 mm/kasan/report.c:482
 kasan_report+0x117/0x150 mm/kasan/report.c:595
 ext4_dirent_get_data_len fs/ext4/ext4.h:4069 [inline]
 ext4_dir_entry_len fs/ext4/ext4.h:4096 [inline]
 __ext4_check_dir_entry+0x65a/0xc40 fs/ext4/dir.c:96
 ext4_find_dest_de+0x136/0x770 fs/ext4/namei.c:2203
 ext4_add_dirent_to_inline+0xcf/0x430 fs/ext4/inline.c:984
 ext4_try_add_inline_entry+0x235/0x8e0 fs/ext4/inline.c:1213
 __ext4_add_entry+0x390/0x1f40 fs/ext4/namei.c:2529
 ext4_add_entry fs/ext4/namei.c:2613 [inline]
 ext4_add_nondir+0x111/0x310 fs/ext4/namei.c:2936
 ext4_create+0x2e9/0x470 fs/ext4/namei.c:2982
 lookup_open fs/namei.c:4511 [inline]
 open_last_lookups fs/namei.c:4611 [inline]
 path_openat+0x1395/0x3860 fs/namei.c:4855
 do_file_open+0x23e/0x4a0 fs/namei.c:4887
 do_sys_openat2+0x113/0x200 fs/open.c:1364
 do_sys_open fs/open.c:1370 [inline]
 __do_sys_openat fs/open.c:1386 [inline]
 __se_sys_openat fs/open.c:1381 [inline]
 __x64_sys_openat+0x138/0x170 fs/open.c:1381
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f922219ce59
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f9223137028 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
RAX: ffffffffffffffda RBX: 00007f9222415fa0 RCX: 00007f922219ce59
RDX: 0000000000042042 RSI: 0000200000000080 RDI: 0000000000000004
RBP: 00007f9222232d6f R08: 0000000000000000 R09: 0000000000000000
R10: 000000000000014a R11: 0000000000000246 R12: 0000000000000000
R13: 00007f9222416038 R14: 00007f9222415fa0 R15: 00007ffd01a2d448
 </TASK>

Allocated by task 5484:
 kasan_save_stack mm/kasan/common.c:57 [inline]
 kasan_save_track+0x3e/0x80 mm/kasan/common.c:78
 unpoison_slab_object mm/kasan/common.c:340 [inline]
 __kasan_slab_alloc+0x6c/0x80 mm/kasan/common.c:366
 kasan_slab_alloc include/linux/kasan.h:253 [inline]
 slab_post_alloc_hook mm/slub.c:4570 [inline]
 slab_alloc_node mm/slub.c:4899 [inline]
 kmem_cache_alloc_node_noprof+0x384/0x690 mm/slub.c:4951
 kmalloc_reserve net/core/skbuff.c:613 [inline]
 __alloc_skb+0x27d/0x7d0 net/core/skbuff.c:713
 alloc_skb include/linux/skbuff.h:1385 [inline]
 nlmsg_new include/net/netlink.h:1055 [inline]
 mpls_netconf_notify_devconf+0x46/0x100 net/mpls/af_mpls.c:1217
 mpls_dev_notify+0xb2d/0xd10 net/mpls/af_mpls.c:1691
 notifier_call_chain+0x1ad/0x3d0 kernel/notifier.c:85
 call_netdevice_notifiers_extack net/core/dev.c:2287 [inline]
 call_netdevice_notifiers net/core/dev.c:2301 [inline]
 unregister_netdevice_many_notify+0x17a5/0x22c0 net/core/dev.c:12421
 ops_exit_rtnl_list net/core/net_namespace.c:187 [inline]
 ops_undo_list+0x3d3/0x940 net/core/net_namespace.c:248
 cleanup_net+0x56b/0x800 net/core/net_namespace.c:702
 process_one_work kernel/workqueue.c:3314 [inline]
 process_scheduled_works+0xb5d/0x1860 kernel/workqueue.c:3397
 worker_thread+0xa53/0xfc0 kernel/workqueue.c:3478
 kthread+0x389/0x470 kernel/kthread.c:436
 ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245

Freed by task 5484:
 kasan_save_stack mm/kasan/common.c:57 [inline]
 kasan_save_track+0x3e/0x80 mm/kasan/common.c:78
 kasan_save_free_info+0x46/0x50 mm/kasan/generic.c:584
 poison_slab_object mm/kasan/common.c:253 [inline]
 __kasan_slab_free+0x5c/0x80 mm/kasan/common.c:285
 kasan_slab_free include/linux/kasan.h:235 [inline]
 slab_free_hook mm/slub.c:2689 [inline]
 slab_free mm/slub.c:6251 [inline]
 kfree+0x1c5/0x640 mm/slub.c:6566
 skb_kfree_head net/core/skbuff.c:1075 [inline]
 skb_free_head net/core/skbuff.c:1087 [inline]
 skb_release_data+0x828/0xa60 net/core/skbuff.c:1114
 skb_release_all net/core/skbuff.c:1189 [inline]
 __kfree_skb+0x5d/0x210 net/core/skbuff.c:1203
 netlink_broadcast_filtered+0xe18/0xf20 net/netlink/af_netlink.c:1540
 nlmsg_multicast_filtered include/net/netlink.h:1165 [inline]
 nlmsg_multicast include/net/netlink.h:1184 [inline]
 nlmsg_notify+0xf0/0x1a0 net/netlink/af_netlink.c:2598
 mpls_dev_notify+0xb2d/0xd10 net/mpls/af_mpls.c:1691
 notifier_call_chain+0x1ad/0x3d0 kernel/notifier.c:85
 call_netdevice_notifiers_extack net/core/dev.c:2287 [inline]
 call_netdevice_notifiers net/core/dev.c:2301 [inline]
 unregister_netdevice_many_notify+0x17a5/0x22c0 net/core/dev.c:12421
 ops_exit_rtnl_list net/core/net_namespace.c:187 [inline]
 ops_undo_list+0x3d3/0x940 net/core/net_namespace.c:248
 cleanup_net+0x56b/0x800 net/core/net_namespace.c:702
 process_one_work kernel/workqueue.c:3314 [inline]
 process_scheduled_works+0xb5d/0x1860 kernel/workqueue.c:3397
 worker_thread+0xa53/0xfc0 kernel/workqueue.c:3478
 kthread+0x389/0x470 kernel/kthread.c:436
 ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245

The buggy address belongs to the object at ffff888114d8c000
 which belongs to the cache skbuff_small_head of size 704
The buggy address is located 69 bytes inside of
 freed 704-byte region [ffff888114d8c000, ffff888114d8c2c0)

The buggy address belongs to the physical page:
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x114d8c
head: order:2 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x17ff00000000040(head|node=0|zone=2|lastcpupid=0x7ff)
page_type: f5(slab)
raw: 017ff00000000040 ffff888160416b40 dead000000000100 dead000000000122
raw: 0000000000000000 0000000800120012 00000000f5000000 0000000000000000
head: 017ff00000000040 ffff888160416b40 dead000000000100 dead000000000122
head: 0000000000000000 0000000800120012 00000000f5000000 0000000000000000
head: 017ff00000000002 ffffffffffffff01 00000000ffffffff 00000000ffffffff
head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000004
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 2, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 5484, tgid 5484 (kworker/u8:2), ts 72573003529, free_ts 72546506446
 set_page_owner include/linux/page_owner.h:32 [inline]
 post_alloc_hook+0x22d/0x280 mm/page_alloc.c:1853
 prep_new_page mm/page_alloc.c:1861 [inline]
 get_page_from_freelist+0x2593/0x2610 mm/page_alloc.c:3941
 __alloc_frozen_pages_noprof+0x18d/0x380 mm/page_alloc.c:5221
 alloc_slab_page mm/slub.c:3278 [inline]
 allocate_slab+0x77/0x660 mm/slub.c:3467
 new_slab mm/slub.c:3525 [inline]
 refill_objects+0x339/0x3d0 mm/slub.c:7272
 refill_sheaf mm/slub.c:2816 [inline]
 __pcs_replace_empty_main+0x321/0x720 mm/slub.c:4652
 alloc_from_pcs mm/slub.c:4750 [inline]
 slab_alloc_node mm/slub.c:4884 [inline]
 kmem_cache_alloc_node_noprof+0x441/0x690 mm/slub.c:4951
 kmalloc_reserve net/core/skbuff.c:613 [inline]
 __alloc_skb+0x27d/0x7d0 net/core/skbuff.c:713
 alloc_skb include/linux/skbuff.h:1385 [inline]
 nlmsg_new include/net/netlink.h:1055 [inline]
 mpls_netconf_notify_devconf+0x46/0x100 net/mpls/af_mpls.c:1217
 mpls_dev_notify+0xb2d/0xd10 net/mpls/af_mpls.c:1691
 notifier_call_chain+0x1ad/0x3d0 kernel/notifier.c:85
 call_netdevice_notifiers_extack net/core/dev.c:2287 [inline]
 call_netdevice_notifiers net/core/dev.c:2301 [inline]
 unregister_netdevice_many_notify+0x17a5/0x22c0 net/core/dev.c:12421
 ops_exit_rtnl_list net/core/net_namespace.c:187 [inline]
 ops_undo_list+0x3d3/0x940 net/core/net_namespace.c:248
 cleanup_net+0x56b/0x800 net/core/net_namespace.c:702
 process_one_work kernel/workqueue.c:3314 [inline]
 process_scheduled_works+0xb5d/0x1860 kernel/workqueue.c:3397
 worker_thread+0xa53/0xfc0 kernel/workqueue.c:3478
page last free pid 5484 tgid 5484 stack trace:
 reset_page_owner include/linux/page_owner.h:25 [inline]
 __free_pages_prepare mm/page_alloc.c:1397 [inline]
 __free_frozen_pages+0xc1c/0xd30 mm/page_alloc.c:2938
 stack_depot_save_flags+0x40e/0x810 lib/stackdepot.c:735
 kasan_save_stack mm/kasan/common.c:58 [inline]
 kasan_save_track+0x4f/0x80 mm/kasan/common.c:78
 unpoison_slab_object mm/kasan/common.c:340 [inline]
 __kasan_slab_alloc+0x6c/0x80 mm/kasan/common.c:366
 kasan_slab_alloc include/linux/kasan.h:253 [inline]
 slab_post_alloc_hook mm/slub.c:4570 [inline]
 slab_alloc_node mm/slub.c:4899 [inline]
 kmem_cache_alloc_noprof+0x2bc/0x650 mm/slub.c:4906
 kmem_alloc_batch lib/debugobjects.c:371 [inline]
 fill_pool+0x156/0x580 lib/debugobjects.c:420
 debug_objects_fill_pool lib/debugobjects.c:752 [inline]
 debug_object_activate+0x4a3/0x580 lib/debugobjects.c:841
 debug_rcu_head_queue kernel/rcu/rcu.h:236 [inline]
 __call_rcu_common kernel/rcu/tree.c:3116 [inline]
 call_rcu+0x43/0x890 kernel/rcu/tree.c:3251
 kernfs_put+0x259/0x520 fs/kernfs/dir.c:618
 kernfs_remove_by_name_ns+0xc8/0x140 fs/kernfs/dir.c:1799
 device_remove_class_symlinks+0x178/0x190 drivers/base/core.c:3479
 device_del+0x400/0x8f0 drivers/base/core.c:3881
 unregister_netdevice_many_notify+0x1d5f/0x22c0 net/core/dev.c:12456
 ops_exit_rtnl_list net/core/net_namespace.c:187 [inline]
 ops_undo_list+0x3d3/0x940 net/core/net_namespace.c:248
 cleanup_net+0x56b/0x800 net/core/net_namespace.c:702
 process_one_work kernel/workqueue.c:3314 [inline]
 process_scheduled_works+0xb5d/0x1860 kernel/workqueue.c:3397

Memory state around the buggy address:
 ffff888114d8bf00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff888114d8bf80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff888114d8c000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                           ^
 ffff888114d8c080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff888114d8c100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
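
For readers decoding the "Memory state around the buggy address" dump above, here is a minimal sketch of how the caret position and the shadow bytes relate to the faulting address. It assumes generic KASAN with one shadow byte per 8 bytes of memory and 16 shadow bytes per dump row. The shadow values are as I understand them from mm/kasan/kasan.h and should be treated as assumptions, and the helper names are hypothetical, not kernel APIs.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: generic KASAN maps 8 bytes of memory to one shadow byte. */
#define KASAN_GRANULE 8u

/* Coarse classification of a shadow byte (names are illustrative only). */
enum shadow_kind {
	SHADOW_ACCESSIBLE,   /* 0x00: all 8 bytes addressable */
	SHADOW_PARTIAL,      /* 0x01..0x07: only the first N bytes addressable */
	SHADOW_SLAB_FREE,    /* 0xfa, 0xfb: freed slab object */
	SHADOW_SLAB_REDZONE, /* 0xfc: slab redzone */
	SHADOW_PAGE_FREE,    /* 0xff: freed page */
	SHADOW_OTHER,        /* anything this sketch does not decode */
};

/* Index of the shadow byte within a dump row that covers addr. */
static size_t shadow_index_in_row(uint64_t row_base, uint64_t addr)
{
	return (size_t)((addr - row_base) / KASAN_GRANULE);
}

/* Map a raw shadow byte to one of the coarse kinds above. */
static enum shadow_kind classify_shadow(uint8_t b)
{
	if (b == 0x00)
		return SHADOW_ACCESSIBLE;
	if (b < KASAN_GRANULE)
		return SHADOW_PARTIAL;
	switch (b) {
	case 0xfa:
	case 0xfb:
		return SHADOW_SLAB_FREE;
	case 0xfc:
		return SHADOW_SLAB_REDZONE;
	case 0xff:
		return SHADOW_PAGE_FREE;
	default:
		return SHADOW_OTHER;
	}
}
```

With the row base ffff888114d8c000 and the faulting address ffff888114d8c045 from the report above, the offset is 0x45 (69 bytes, matching "located 69 bytes inside"), so the caret points at shadow byte index 8 of that row, which reads fb.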


***

KASAN: slab-use-after-free Read in ext4_inlinedir_to_tree

tree:      torvalds
URL:       https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux
base:      9716c086c8e8b141d35aa61f2e96a2e83de212a7
arch:      amd64
compiler:  Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
config:    https://ci.syzbot.org/builds/ddf6ee7c-dfa8-4383-b004-10140edc081c/config
syz repro: https://ci.syzbot.org/findings/f42da242-e16e-4f10-bf25-0bd7e192d989/syz_repro

loop0: lost filesystem error report for type 5 error -117
EXT4-fs (loop0): mounted filesystem 00000000-0000-0000-0000-000000000000 r/w without journal. Quota mode: none.
==================================================================
BUG: KASAN: slab-use-after-free in ext4_dirent_get_data_len fs/ext4/ext4.h:4069 [inline]
BUG: KASAN: slab-use-after-free in ext4_dir_entry_len fs/ext4/ext4.h:4096 [inline]
BUG: KASAN: slab-use-after-free in ext4_inlinedir_to_tree+0x94c/0x10d0 fs/ext4/inline.c:1335
Read of size 1 at addr ffff88816fee8825 by task syz.0.20/5867

CPU: 1 UID: 0 PID: 5867 Comm: syz.0.20 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
 print_address_description+0x55/0x1e0 mm/kasan/report.c:378
 print_report+0x58/0x70 mm/kasan/report.c:482
 kasan_report+0x117/0x150 mm/kasan/report.c:595
 ext4_dirent_get_data_len fs/ext4/ext4.h:4069 [inline]
 ext4_dir_entry_len fs/ext4/ext4.h:4096 [inline]
 ext4_inlinedir_to_tree+0x94c/0x10d0 fs/ext4/inline.c:1335
 ext4_htree_fill_tree+0x517/0x1230 fs/ext4/namei.c:1182
 ext4_dx_readdir fs/ext4/dir.c:600 [inline]
 ext4_readdir+0x2db4/0x3640 fs/ext4/dir.c:146
 iterate_dir+0x399/0x570 fs/readdir.c:110
 __do_sys_getdents fs/readdir.c:319 [inline]
 __se_sys_getdents+0xf1/0x270 fs/readdir.c:304
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f010ad9ce59
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f010bc0f028 EFLAGS: 00000246 ORIG_RAX: 000000000000004e
RAX: ffffffffffffffda RBX: 00007f010b015fa0 RCX: 00007f010ad9ce59
RDX: 0000000000000054 RSI: 0000000000000000 RDI: 0000000000000004
RBP: 00007f010ae32d6f R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f010b016038 R14: 00007f010b015fa0 R15: 00007ffd93577348
 </TASK>

Allocated by task 5064:
 kasan_save_stack mm/kasan/common.c:57 [inline]
 kasan_save_track+0x3e/0x80 mm/kasan/common.c:78
 poison_kmalloc_redzone mm/kasan/common.c:398 [inline]
 __kasan_kmalloc+0x93/0xb0 mm/kasan/common.c:415
 kasan_kmalloc include/linux/kasan.h:263 [inline]
 __do_kmalloc_node mm/slub.c:5296 [inline]
 __kmalloc_noprof+0x35c/0x760 mm/slub.c:5308
 kmalloc_noprof include/linux/slab.h:954 [inline]
 kzalloc_noprof include/linux/slab.h:1188 [inline]
 tomoyo_encode2 security/tomoyo/realpath.c:45 [inline]
 tomoyo_encode+0x28b/0x550 security/tomoyo/realpath.c:80
 tomoyo_realpath_from_path+0x58d/0x5d0 security/tomoyo/realpath.c:283
 tomoyo_get_realpath security/tomoyo/file.c:151 [inline]
 tomoyo_path_perm+0x283/0x560 security/tomoyo/file.c:827
 security_inode_getattr+0x12b/0x310 security/security.c:1895
 vfs_getattr fs/stat.c:259 [inline]
 vfs_fstat fs/stat.c:281 [inline]
 vfs_fstatat+0xb4/0x170 fs/stat.c:371
 __do_sys_newfstatat fs/stat.c:538 [inline]
 __se_sys_newfstatat fs/stat.c:532 [inline]
 __x64_sys_newfstatat+0x151/0x200 fs/stat.c:532
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Freed by task 5064:
 kasan_save_stack mm/kasan/common.c:57 [inline]
 kasan_save_track+0x3e/0x80 mm/kasan/common.c:78
 kasan_save_free_info+0x46/0x50 mm/kasan/generic.c:584
 poison_slab_object mm/kasan/common.c:253 [inline]
 __kasan_slab_free+0x5c/0x80 mm/kasan/common.c:285
 kasan_slab_free include/linux/kasan.h:235 [inline]
 slab_free_hook mm/slub.c:2689 [inline]
 slab_free mm/slub.c:6251 [inline]
 kfree+0x1c5/0x640 mm/slub.c:6566
 tomoyo_path_perm+0x403/0x560 security/tomoyo/file.c:847
 security_inode_getattr+0x12b/0x310 security/security.c:1895
 vfs_getattr fs/stat.c:259 [inline]
 vfs_fstat fs/stat.c:281 [inline]
 vfs_fstatat+0xb4/0x170 fs/stat.c:371
 __do_sys_newfstatat fs/stat.c:538 [inline]
 __se_sys_newfstatat fs/stat.c:532 [inline]
 __x64_sys_newfstatat+0x151/0x200 fs/stat.c:532
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

The buggy address belongs to the object at ffff88816fee8800
 which belongs to the cache kmalloc-64 of size 64
The buggy address is located 37 bytes inside of
 freed 64-byte region [ffff88816fee8800, ffff88816fee8840)

The buggy address belongs to the physical page:
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x16fee8
flags: 0x57ff00000000000(node=1|zone=2|lastcpupid=0x7ff)
page_type: f5(slab)
raw: 057ff00000000000 ffff8881000418c0 dead000000000100 dead000000000122
raw: 0000000000000000 0000000800200020 00000000f5000000 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 0, migratetype Unmovable, gfp_mask 0xd2cc0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 1, tgid 1 (swapper/0), ts 21294026082, free_ts 0
 set_page_owner include/linux/page_owner.h:32 [inline]
 post_alloc_hook+0x22d/0x280 mm/page_alloc.c:1853
 prep_new_page mm/page_alloc.c:1861 [inline]
 get_page_from_freelist+0x2593/0x2610 mm/page_alloc.c:3941
 __alloc_frozen_pages_noprof+0x18d/0x380 mm/page_alloc.c:5221
 alloc_slab_page mm/slub.c:3278 [inline]
 allocate_slab+0x77/0x660 mm/slub.c:3467
 new_slab mm/slub.c:3525 [inline]
 refill_objects+0x339/0x3d0 mm/slub.c:7272
 refill_sheaf mm/slub.c:2816 [inline]
 __pcs_replace_empty_main+0x321/0x720 mm/slub.c:4652
 alloc_from_pcs mm/slub.c:4750 [inline]
 slab_alloc_node mm/slub.c:4884 [inline]
 __do_kmalloc_node mm/slub.c:5295 [inline]
 __kmalloc_noprof+0x474/0x760 mm/slub.c:5308
 kmalloc_noprof include/linux/slab.h:954 [inline]
 kzalloc_noprof include/linux/slab.h:1188 [inline]
 handler_new_ref+0x261/0x9c0 drivers/media/v4l2-core/v4l2-ctrls-core.c:1882
 v4l2_ctrl_add_handler+0x19f/0x290 drivers/media/v4l2-core/v4l2-ctrls-core.c:2443
 vivid_create_controls+0x332d/0x3bd0 drivers/media/test-drivers/vivid/vivid-ctrls.c:2072
 vivid_create_instance drivers/media/test-drivers/vivid/vivid-core.c:1933 [inline]
 vivid_probe+0x4261/0x72b0 drivers/media/test-drivers/vivid/vivid-core.c:2095
 platform_probe+0xf9/0x190 drivers/base/platform.c:1432
 call_driver_probe drivers/base/dd.c:-1 [inline]
 really_probe+0x267/0xaf0 drivers/base/dd.c:709
 __driver_probe_device+0x1ef/0x380 drivers/base/dd.c:871
 driver_probe_device+0x4f/0x240 drivers/base/dd.c:901
 __driver_attach+0x34c/0x640 drivers/base/dd.c:1295
page_owner free stack trace missing

Memory state around the buggy address:
 ffff88816fee8700: 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc fc
 ffff88816fee8780: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc
>ffff88816fee8800: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
                               ^
 ffff88816fee8880: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
 ffff88816fee8900: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
==================================================================


***

KASAN: use-after-free Read in __ext4_check_dir_entry

tree:      torvalds
URL:       https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux
base:      9716c086c8e8b141d35aa61f2e96a2e83de212a7
arch:      amd64
compiler:  Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
config:    https://ci.syzbot.org/builds/ddf6ee7c-dfa8-4383-b004-10140edc081c/config
syz repro: https://ci.syzbot.org/findings/57c0b75a-8922-4dc1-9a20-ca947564792b/syz_repro

==================================================================
BUG: KASAN: use-after-free in ext4_dirent_get_data_len fs/ext4/ext4.h:4069 [inline]
BUG: KASAN: use-after-free in ext4_dir_entry_len fs/ext4/ext4.h:4096 [inline]
BUG: KASAN: use-after-free in __ext4_check_dir_entry+0x65a/0xc40 fs/ext4/dir.c:96
Read of size 1 at addr ffff88816be85045 by task syz.2.21/5880

CPU: 1 UID: 0 PID: 5880 Comm: syz.2.21 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
 print_address_description+0x55/0x1e0 mm/kasan/report.c:378
 print_report+0x58/0x70 mm/kasan/report.c:482
 kasan_report+0x117/0x150 mm/kasan/report.c:595
 ext4_dirent_get_data_len fs/ext4/ext4.h:4069 [inline]
 ext4_dir_entry_len fs/ext4/ext4.h:4096 [inline]
 __ext4_check_dir_entry+0x65a/0xc40 fs/ext4/dir.c:96
 ext4_find_dest_de+0x136/0x770 fs/ext4/namei.c:2203
 ext4_add_dirent_to_inline+0xcf/0x430 fs/ext4/inline.c:984
 ext4_try_add_inline_entry+0x235/0x8e0 fs/ext4/inline.c:1213
 __ext4_add_entry+0x390/0x1f40 fs/ext4/namei.c:2529
 ext4_add_entry fs/ext4/namei.c:2613 [inline]
 ext4_add_nondir+0x111/0x310 fs/ext4/namei.c:2936
 ext4_create+0x2e9/0x470 fs/ext4/namei.c:2982
 lookup_open fs/namei.c:4511 [inline]
 open_last_lookups fs/namei.c:4611 [inline]
 path_openat+0x1395/0x3860 fs/namei.c:4855
 do_file_open+0x23e/0x4a0 fs/namei.c:4887
 do_sys_openat2+0x113/0x200 fs/open.c:1364
 do_sys_open fs/open.c:1370 [inline]
 __do_sys_openat fs/open.c:1386 [inline]
 __se_sys_openat fs/open.c:1381 [inline]
 __x64_sys_openat+0x138/0x170 fs/open.c:1381
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f5713b9ce59
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fff672b25f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
RAX: ffffffffffffffda RBX: 00007f5713e15fa0 RCX: 00007f5713b9ce59
RDX: 0000000000042042 RSI: 0000200000000080 RDI: 0000000000000004
RBP: 00007f5713c32d6f R08: 0000000000000000 R09: 0000000000000000
R10: 000000000000014a R11: 0000000000000246 R12: 0000000000000000
R13: 00007f5713e15fac R14: 00007f5713e15fa0 R15: 00007f5713e15fa0
 </TASK>

The buggy address belongs to the physical page:
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x16be85
flags: 0x57ff00000000000(node=1|zone=2|lastcpupid=0x7ff)
page_type: f0(buddy)
raw: 057ff00000000000 ffffea0005afa0c8 ffffea0005afa1c8 0000000000000000
raw: 0000000000000000 0000000000000000 00000000f0000000 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as freed
page last allocated via order 0, migratetype Unmovable, gfp_mask 0xcc0(GFP_KERNEL), pid 5630, tgid 5630 (syz-executor), ts 67290853657, free_ts 69321168948
 set_page_owner include/linux/page_owner.h:32 [inline]
 post_alloc_hook+0x22d/0x280 mm/page_alloc.c:1853
 prep_new_page mm/page_alloc.c:1861 [inline]
 get_page_from_freelist+0x2593/0x2610 mm/page_alloc.c:3941
 __alloc_frozen_pages_noprof+0x18d/0x380 mm/page_alloc.c:5221
 __alloc_pages_noprof+0x10/0x100 mm/page_alloc.c:5255
 alloc_pages_bulk_noprof+0x5ff/0x7c0 mm/page_alloc.c:5175
 ___alloc_pages_bulk mm/kasan/shadow.c:345 [inline]
 __kasan_populate_vmalloc_do mm/kasan/shadow.c:370 [inline]
 __kasan_populate_vmalloc+0xc1/0x1d0 mm/kasan/shadow.c:424
 kasan_populate_vmalloc include/linux/kasan.h:580 [inline]
 alloc_vmap_area+0xd47/0x1480 mm/vmalloc.c:2123
 __get_vm_area_node+0x1f8/0x300 mm/vmalloc.c:3226
 __vmalloc_node_range_noprof+0x36a/0x1750 mm/vmalloc.c:4024
 vmalloc_user_noprof+0xad/0xe0 mm/vmalloc.c:4218
 kcov_ioctl+0x55/0x620 kernel/kcov.c:726
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:597 [inline]
 __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:583
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
page last free pid 5693 tgid 5693 stack trace:
 reset_page_owner include/linux/page_owner.h:25 [inline]
 __free_pages_prepare mm/page_alloc.c:1397 [inline]
 __free_frozen_pages+0xc1c/0xd30 mm/page_alloc.c:2938
 kasan_depopulate_vmalloc_pte+0x6d/0x90 mm/kasan/shadow.c:484
 apply_to_pte_range mm/memory.c:3338 [inline]
 apply_to_pmd_range mm/memory.c:3382 [inline]
 apply_to_pud_range mm/memory.c:3418 [inline]
 apply_to_p4d_range mm/memory.c:3454 [inline]
 __apply_to_page_range+0xbdc/0x1420 mm/memory.c:3490
 __kasan_release_vmalloc+0xa2/0xd0 mm/kasan/shadow.c:602
 kasan_release_vmalloc include/linux/kasan.h:593 [inline]
 kasan_release_vmalloc_node mm/vmalloc.c:2284 [inline]
 purge_vmap_node+0x220/0x960 mm/vmalloc.c:2306
 __purge_vmap_area_lazy+0x779/0xb40 mm/vmalloc.c:2396
 drain_vmap_area_work+0x27/0x40 mm/vmalloc.c:2430
 process_one_work kernel/workqueue.c:3314 [inline]
 process_scheduled_works+0xb5d/0x1860 kernel/workqueue.c:3397
 worker_thread+0xa53/0xfc0 kernel/workqueue.c:3478
 kthread+0x389/0x470 kernel/kthread.c:436
 ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245

Memory state around the buggy address:
 ffff88816be84f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff88816be84f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff88816be85000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                                           ^
 ffff88816be85080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 ffff88816be85100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
==================================================================


***

If these findings have caused you to resend the series or submit a
separate fix, please add the following tag to your commit message:
  Tested-by: syzbot@syzkaller.appspotmail.com

---
This report is generated by a bot. It may contain errors.
syzbot ci engineers can be reached at syzkaller@googlegroups.com.

To test a patch for this bug, please reply with `#syz test`
(should be on a separate line).

The patch should be attached to the email.
Note: arguments like custom git repos and branches are not supported.


* Re: [PATCH] iomap: enforce DIO alignment check in iomap
From: Carlos Maiolino @ 2026-06-11 10:05 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: brauner, linux-block, linux-fsdevel, linux-ext4, linux-xfs,
	Keith Busch, Hannes Reinecke, Martin K. Petersen, Jens Axboe
In-Reply-To: <20260611055744.GA18538@lst.de>

On Thu, Jun 11, 2026 at 07:57:44AM +0200, Christoph Hellwig wrote:
> On Wed, Jun 10, 2026 at 04:52:11PM +0200, cem@kernel.org wrote:
> > From: Carlos Maiolino <cem@kernel.org>
> > 
> > The DIO alignment check has been lifted from iomap layer to rely on the
> > block layer to enforce proper alignment when issuing direct IO
> > operations. Depending on the IO size and buffer address passed to the
> > IO operation, though, this may lead to a user-visible behavior change.
> > 
> > This has been caught initially by LTP test diotest4 running on
> > PPC architecture, where the test fails because a read() operation
> > with a supposedly misaligned buffer succeeds instead of an expected
> > -EINVAL.
> > This has no direct relationship with PPC, but seems to have to do with
> > whether the IO size crosses a page boundary.
> 
> I don't understand the problem here.  Why do we want to insist on a
> failure when we can support it?  I think the test is just broken.

The problem I see from my POV is that this changes the behavior expected
from the syscalls when the passed-in buffer is misaligned: the read() in
the test succeeds even though the buffer does not match the alignment
requirements (see below).

I am pretty happy to declare this a test bug, but I thought it would be
worth starting a discussion about the sudden/unexpected behavior change.
Not to mention that different filesystems will now have different alignment
requirements, which seems at least "weird" to me. I mean, iomap-based
filesystems now suddenly have a more relaxed alignment constraint than,
for example, btrfs.

> 
> > The problematic behavior is reproducible on x86 by reducing the IO size
> > to something < PAGE_SIZE, so the misaligned read()s will also be accepted
> > by the block layer.
> 
> What do you mean by misaligned here?  For a long time the kernel
> has supported basically arbitrarily low memory alignment for direct I/O,
> bounded only by the device capabilities (typically 4-byte alignment).

The test passes read() a buffer offset by 1 byte (see below), which
doesn't match the system's alignment constraints, at least from the
perspective of the user-supplied buffer.
I've been assuming it should match the device's dma_alignment constraints.
The typical 4-byte alignment is indeed the requirement on my PPC
machine, but not on my x86:

> 
> The supported memory alignment is reported in the statx
> dio_mem_align.  What does that say compared to the alignment
> expectations in this test?

From my x86:
dio_mem_align: 512
dio_offset_align: 512

From PPC:
dio_mem_align: 4
dio_offset_align: 512

But this does not explain how the following call would succeed in either
case (the one below is taken from PPC):

openat(dirfd=AT_FDCWD, pathname="testdata-4.135256", flags=O_RDWR|O_DIRECT) = 3
_llseek(fd=3, offset=4096, result=[4096], whence=SEEK_SET) = 0
read(arg1=0x3, arg2=0x1003af80001, arg3=0x1000) = 0x1000

The passed-in address 0x1003af80001 is misaligned by one byte and shouldn't
(at least in theory) ever be accepted, no? Or am I missing something
else?
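
To make the arithmetic behind that question concrete, here is a minimal userspace sketch of a memory-alignment test against the two dio_mem_align values reported above (4 on PPC, 512 on x86). It assumes the alignment is a power of two. The helper name is hypothetical, and this is not the check the block layer or iomap actually performs.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true when addr is a multiple of align.
 * Assumption: align is a non-zero power of two, so the low bits of
 * addr selected by (align - 1) must all be zero for an aligned address.
 */
static bool is_mem_aligned(uint64_t addr, uint32_t align)
{
	return (addr & ((uint64_t)align - 1)) == 0;
}
```

Under this test the buffer address 0x1003af80001 from the trace fails for both 4 and 512, because its lowest bit is set, while 0x1003af80000 passes for both.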


* Re: [PATCH v4] iomap: add simple read path for small direct I/O
From: Pankaj Raghav (Samsung) @ 2026-06-11  9:36 UTC (permalink / raw)
  To: Fengnan Chang
  Cc: brauner, djwong, hch, ojaswin, dgc, linux-xfs, linux-fsdevel,
	linux-ext4, linux-kernel, lidiangang, p.raghav
In-Reply-To: <20260608073134.95964-1-changfengnan@bytedance.com>

> +static ssize_t iomap_dio_simple_read_complete(struct kiocb *iocb,
> +		struct bio *bio)
> +{
> +	struct inode *inode = file_inode(iocb->ki_filp);
> +	ssize_t ret;
> +
> +	WRITE_ONCE(iocb->private, NULL);
> +
> +	ret = iomap_dio_simple_read_finish(iocb, bio,
> +			blk_status_to_errno(bio->bi_status));
> +
> +	inode_dio_end(inode);
> +	trace_iomap_dio_complete(iocb, ret < 0 ? ret : 0, ret > 0 ? ret : 0);

Shouldn't the second parameter here be
blk_status_to_errno(bio->bi_status)?

I think that will be more meaningful for tracing here.
trace_iomap_dio_complete(iocb, blk_status_to_errno(bio->bi_status), ret);

<snip>
> +	return ret;
> +}
> +
> +	sr->iocb = iocb;
> +	sr->dio_flags = dio_flags;
> +
> +	bio->bi_iter.bi_sector = iomap_sector(&iomi.iomap, iomi.pos);
> +	bio->bi_ioprio = iocb->ki_ioprio;
> +	bio->bi_private = sr;
> +	bio->bi_end_io = iomap_dio_simple_read_end_io;
> +
> +	if (dio_flags & IOMAP_DIO_BOUNCE)
> +		ret = bio_iov_iter_bounce(bio, iter, count);
> +	else
> +		ret = bio_iov_iter_get_pages(bio, iter, alignment - 1);
> +	if (unlikely(ret))
> +		goto out_bio_put;
> +
> +	if (bio->bi_iter.bi_size != count) {
> +		iov_iter_revert(iter, bio->bi_iter.bi_size);
> +		ret = -ENOTBLK;
> +		goto out_bio_release_pages;
> +	}
> +
> +	sr->size = bio->bi_iter.bi_size;
> +
> +	if ((dio_flags & IOMAP_DIO_USER_BACKED) &&
> +	    !(dio_flags & IOMAP_DIO_BOUNCE))
> +		bio_set_pages_dirty(bio);
> +
> +	if (iocb->ki_flags & IOCB_NOWAIT)
> +		bio->bi_opf |= REQ_NOWAIT;
> +	if ((iocb->ki_flags & IOCB_HIPRI) && !wait_for_completion) {
> +		bio->bi_opf |= REQ_POLLED;
> +		bio_set_polled(bio, iocb);

This results in a build failure, as the following patch removed this call:
https://lore.kernel.org/linux-block/20260518062917.506483-1-hch@lst.de/

I think this call can just be removed as you are setting REQ_POLLED
anyway.

> +		WRITE_ONCE(iocb->private, bio);
> +	}
> +
> +	if (wait_for_completion) {
> +		sr->waiter = current;
> +		blk_crypto_submit_bio(bio);
> +	} else {
> +		atomic_set(&sr->state, IOMAP_DIO_SIMPLE_SUBMITTING);
> +		sr->waiter = NULL;
> +		blk_crypto_submit_bio(bio);
> +		ret = -EIOCBQUEUED;
> +	}
> +
--
Pankaj

^ permalink raw reply

* Re: [PATCH 00/17] replace __get_free_pages() call with kmalloc()
From: Mike Rapoport @ 2026-06-11  9:09 UTC (permalink / raw)
  To: Zi Yan
  Cc: Jan Kara, Mark Fasheh, Joel Becker, Joseph Qi, Ryusuke Konishi,
	Viacheslav Dubeyko, Trond Myklebust, Anna Schumaker, Chuck Lever,
	Jeff Layton, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	Alexander Viro, Christian Brauner, Jan Kara, Dave Kleikamp,
	Theodore Ts'o, Miklos Szeredi, Andreas Hindborg, Breno Leitao,
	Kees Cook, Tigran A. Aivazian, linux-kernel, linux-fsdevel,
	ocfs2-devel, linux-nilfs, linux-nfs, jfs-discussion, linux-ext4,
	linux-mm
In-Reply-To: <3FD8E1FD-6E18-46D9-AE93-00FA1A66C775@nvidia.com>

On Fri, Jun 05, 2026 at 04:00:33PM -0400, Zi Yan wrote:
> On 23 May 2026, at 13:54, Mike Rapoport (Microsoft) wrote:
> 
> > This is a (small) part of larger work of replacing page allocator calls
> > with kmalloc.
> 
> Is the goal to get rid of __get_free_page(s)()?

Yes, eventually.

My initial intention a few months ago was to remove the ugly casts [1], but
then willy pointed out that Linus objected to something like this [2], and
it looks like technical debt that is more than a decade old.

Since there are more than 600 of those, it will take a while to convert
suitable gfp calls to kmalloc.
Afterwards we can re-evaluate what APIs we want to provide for allocations
that must have actual pages.

[1] https://lore.kernel.org/all/20251018093002.3660549-1-rppt@kernel.org/
[2] https://lore.kernel.org/all/CA+55aFwp4iy4rtX2gE2WjBGFL=NxMVnoFeHqYa2j1dYOMMGqxg@mail.gmail.com/ 

 
> Thanks.

-- 
Sincerely yours,
Mike.

^ permalink raw reply

* Re: [PATCH][e2fsprogs] build: use correct subst variable
From: Andreas Dilger @ 2026-06-11  8:18 UTC (permalink / raw)
  To: Li Dongyang; +Cc: linux-ext4
In-Reply-To: <20260611035236.307622-1-dongyangli@ddn.com>

On Jun 10, 2026, at 21:52, Li Dongyang <dongyangli@ddn.com> wrote:
> 
> ifNotGNUmake was changed to ifnGNUmake but tests/Makefile.in still uses
> the old variable name.
> make fullcheck fails on some platforms:
> make[2]: Entering directory `/var/lib/jenkins/workspace/e2fsprogs-reviews/arch/x86_64/distro/el7/_topdir/BUILD/e2fsprogs-1.47.4/tests'
> Makefile:387: *** missing separator.  Stop.
> make[2]: Leaving directory `/var/lib/jenkins/workspace/e2fsprogs-reviews/arch/x86_64/distro/el7/_topdir/BUILD/e2fsprogs-1.47.4/tests'
> make[1]: *** [fullcheck-recursive] Error 1
> 
> Fixes: b7d1ab3376 "Update configure/configure.ac/aclocal.m4 to use autoconf 2.72"
> Change-Id: Iec3cacfca7206bf785381664b7d7bded8c70113c
> Signed-off-by: Li Dongyang <dongyangli@ddn.com>

Reviewed-by: Andreas Dilger <adilger@dilger.ca>

> ---
> tests/Makefile.in | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tests/Makefile.in b/tests/Makefile.in
> index 678cc3268c..8f7a072f45 100644
> --- a/tests/Makefile.in
> +++ b/tests/Makefile.in
> @@ -48,7 +48,7 @@ test_data.tmp: $(srcdir)/scripts/gen-test-data
> always_run:
> 
> @ifGNUmake@TESTS=$(wildcard $(srcdir)/[a-z]_*)
> -@ifNotGNUmake@TESTS != echo $(srcdir)/[a-z]_*
> +@ifnGNUmake@TESTS != echo $(srcdir)/[a-z]_*
> 
> SKIP_SLOW_TESTS=--skip-slow-tests
> 
> -- 
> 2.52.0
> 
> 


Cheers, Andreas

^ permalink raw reply

* Re: [PATCH] iomap: enforce DIO alignment check in iomap
From: Christoph Hellwig @ 2026-06-11  5:57 UTC (permalink / raw)
  To: cem
  Cc: brauner, linux-block, linux-fsdevel, linux-ext4, linux-xfs,
	Keith Busch, Hannes Reinecke, Martin K. Petersen,
	Christoph Hellwig, Jens Axboe
In-Reply-To: <20260610145218.141369-1-cem@kernel.org>

On Wed, Jun 10, 2026 at 04:52:11PM +0200, cem@kernel.org wrote:
> From: Carlos Maiolino <cem@kernel.org>
> 
> The DIO alignment check has been lifted from iomap layer to rely on the
> block layer to enforce proper alignment when issuing direct IO
> operations. Depending on the IO size and the buffer address passed to
> the IO operation, though, this may lead to a user-visible behavior change.
> 
> This has been caught initially by LTP test diotest4 running on
> PPC architecture, where the test fails because a read() operation
> with a supposedly misaligned buffer succeeds instead of an expected
> -EINVAL.
> This has no direct relationship with PPC, but seems to depend on whether
> the IO size crosses a page boundary.

I don't understand the problem here.  Why do we want to insist on a
failure when we can support it?  I think the test is just broken.

> The problematic behavior is reproducible on x86 by reducing the IO size
> to something < PAGE_SIZE, so the misaligned read()s will also be accepted
> by the block layer.

What do you mean by misaligned here?  For a long time the kernel has
supported basically arbitrary low memory alignment for direct I/O,
bounded only by the device capabilities (typically 4-byte alignment).

The supported memory alignment is reported in the statx
stx_dio_mem_align field.  What does that say compared to the alignment
expectations in this test?
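
For reference, the alignment mentioned above can be read from userspace
roughly as follows.  This is only a sketch: query_dio_mem_align() is a
made-up helper name, and it assumes a glibc and kernel headers new enough
to define STATX_DIOALIGN (Linux 6.1 or later); without them the helper
simply reports failure.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: report the direct I/O memory alignment of @path.
 * Returns 0 and fills *mem_align on success.  Returns -1 if statx() fails,
 * if the headers predate STATX_DIOALIGN, or if the kernel/filesystem did
 * not fill in the DIO alignment fields.
 */
static int query_dio_mem_align(const char *path, unsigned int *mem_align)
{
	assert(path && mem_align);
#ifdef STATX_DIOALIGN
	struct statx stx;

	if (statx(AT_FDCWD, path, 0, STATX_DIOALIGN, &stx) != 0)
		return -1;
	/* The kernel sets the mask bit only when it reported the fields. */
	if (!(stx.stx_mask & STATX_DIOALIGN))
		return -1;
	/* 0 here means direct I/O is not supported on this file. */
	*mem_align = stx.stx_dio_mem_align;
	return 0;
#else
	(void)path;
	(void)mem_align;
	return -1;
#endif
}
```

Comparing the reported value with the buffer offset the test uses would
show whether the read() it expects to fail is actually misaligned from the
kernel's point of view.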

^ permalink raw reply

* [PATCH 2/2] ext4: allocate the fast-commit range array lazily
From: Daejun Park @ 2026-06-11  4:49 UTC (permalink / raw)
  To: tytso@mit.edu, adilger.kernel@dilger.ca
  Cc: linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org,
	Daejun Park
In-Reply-To: <20260611044817epcms2p3b5a66f4cdb41d0cbaaa7c257cccfc8a1@epcms2p3>

The multi-interval tracker added a fixed array of EXT4_FC_MAX_RANGES + 1
entries to every ext4_inode_info -- ~136 bytes that are wasted on inodes
that never use fast commit (read-only files, directories, ...).

Shrink it to the common case:

 - Keep the first range inline in i_fc_range, so a single contiguous
   dirty region (the common case) needs no allocation at all.

 - Allocate the i_fc_ranges array only when a second disjoint range
   appears, and free it when the inode is evicted.

 - The tracking path runs under i_fc_lock and therefore cannot sleep, so
   the array is allocated with GFP_ATOMIC.  On failure, fall back to
   coalescing the new range into the inline i_fc_range -- exactly the
   original single coalesced-range behaviour -- so no full-commit
   fallback or fast-commit ineligibility is needed.

The per-inode fast-commit footprint drops from ~140 bytes (the embedded
array) to 20 bytes (inline range + array pointer + count); the array is
allocated only while two or more disjoint ranges are tracked.
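
The inline-first, upgrade-on-demand scheme described above can be modelled
in userspace roughly as follows.  This is a sketch under stated
assumptions, not the kernel code: struct tracker, track_range(),
array_append() and fail_alloc() are illustrative names, the allocator is
passed in so the failure path can be exercised, and array_append() is a
simplified stand-in for ext4_fc_range_add() that only handles ranges
arriving in ascending, non-touching order.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_RANGES 16	/* mirrors EXT4_FC_MAX_RANGES */

struct lblk_range { uint32_t start, len; };

struct tracker {
	struct lblk_range inline_range;	/* plays the role of i_fc_range */
	struct lblk_range *ranges;	/* i_fc_ranges: NULL until upgraded */
	unsigned int nr;		/* i_fc_nr_ranges */
};

typedef void *(*alloc_fn)(size_t);

/* Allocator that always fails, to demonstrate the fallback path. */
static void *fail_alloc(size_t size) { (void)size; return NULL; }

/* Simplified stand-in for ext4_fc_range_add(): ascending appends only. */
static void array_append(struct tracker *t, uint32_t start, uint32_t end)
{
	if (t->nr == MAX_RANGES) {	/* full: widen the last range */
		struct lblk_range *last = &t->ranges[t->nr - 1];

		last->len = end - last->start + 1;
		return;
	}
	t->ranges[t->nr].start = start;
	t->ranges[t->nr].len = end - start + 1;
	t->nr++;
}

static void track_range(struct tracker *t, uint32_t start, uint32_t end,
			alloc_fn alloc)
{
	uint32_t s0, e0;

	assert(alloc);
	if (end < start)		/* degenerate range: nothing to track */
		return;
	if (t->ranges) {		/* already upgraded to the array */
		array_append(t, start, end);
		return;
	}
	if (t->nr == 0) {		/* first range stays inline */
		t->inline_range.start = start;
		t->inline_range.len = end - start + 1;
		t->nr = 1;
		return;
	}
	s0 = t->inline_range.start;
	e0 = s0 + t->inline_range.len - 1;
	if (start > e0 + 1 || end + 1 < s0) {	/* disjoint: try to upgrade */
		struct lblk_range *heap = alloc(MAX_RANGES * sizeof(*heap));

		if (heap) {
			struct lblk_range fresh = { start, end - start + 1 };

			/* Keep the two ranges sorted by start. */
			heap[0] = start < s0 ? fresh : t->inline_range;
			heap[1] = start < s0 ? t->inline_range : fresh;
			t->ranges = heap;
			t->nr = 2;
			return;
		}
		/* Allocation failed: fall through and coalesce inline. */
	}
	if (start < s0)
		s0 = start;
	if (end > e0)
		e0 = end;
	t->inline_range.start = s0;
	t->inline_range.len = e0 - s0 + 1;
}
```

Calling track_range() with malloc exercises the upgrade path, while
fail_alloc leaves the tracker in inline mode with one widened range, which
is the over-logging but still valid fallback the commit message describes.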

No on-disk format change.  Crash recovery (online replay + offline
e2fsck) and the fast-commit xfstests are unaffected.

While rewriting __track_range, also skip degenerate ranges (a sub-block
punch hole rounds the start up past the end, passing end == start - 1, so
no whole block changed) instead of storing an empty range, and drop the
redundant per-transaction reset here -- ext4_fc_track_template() already
resets the range set under i_fc_lock before calling the tracker.

Signed-off-by: Daejun Park <pdaejun@gmail.com>
---
 fs/ext4/ext4.h        | 19 ++++++++----
 fs/ext4/fast_commit.c | 70 +++++++++++++++++++++++++++++++++++++++----
 fs/ext4/super.c       |  1 +
 3 files changed, 80 insertions(+), 10 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 314a1c90075b..6c6ac19e86b6 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1081,14 +1081,23 @@ struct ext4_inode_info {
 					 */
 
 	/*
-	 * Disjoint lblk ranges modified in this fast commit.  Tracking the
+	 * Logical block ranges modified in this fast commit.  Tracking the
 	 * actual modified ranges (instead of one coalesced [min,max]) avoids
 	 * re-logging the whole spanned extent map for scattered allocations.
-	 * Sorted by start, mutually disjoint.  Bounded by EXT4_FC_MAX_RANGES;
-	 * the extra slot is transient room used while inserting before an
-	 * overflow merge.  Protected by i_fc_lock.
+	 *
+	 * The first range is kept inline in i_fc_range, so the common case of a
+	 * single contiguous dirty region needs no allocation.  When a second
+	 * disjoint range appears the inode is upgraded to the i_fc_ranges array
+	 * (EXT4_FC_MAX_RANGES + 1 entries, sorted and mutually disjoint; the
+	 * extra slot is transient room used while inserting before an overflow
+	 * merge), allocated then and freed when the inode is evicted.  If that
+	 * allocation fails we fall back to coalescing into i_fc_range, i.e. the
+	 * original single coalesced-range behaviour.  i_fc_nr_ranges counts the
+	 * valid ranges; while i_fc_ranges is NULL it is 0 or 1.  Protected by
+	 * i_fc_lock.
 	 */
-	struct ext4_fc_lblk_range i_fc_ranges[EXT4_FC_MAX_RANGES + 1];
+	struct ext4_fc_lblk_range i_fc_range;
+	struct ext4_fc_lblk_range *i_fc_ranges;
 	unsigned int i_fc_nr_ranges;
 
 	spinlock_t i_raw_lock;	/* protects updates to the raw inode */
diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
index ab9ab50ad0b5..786b79a9c573 100644
--- a/fs/ext4/fast_commit.c
+++ b/fs/ext4/fast_commit.c
@@ -211,6 +211,7 @@ void ext4_fc_init_inode(struct inode *inode)
 	struct ext4_inode_info *ei = EXT4_I(inode);
 
 	ext4_fc_reset_inode(inode);
+	ei->i_fc_ranges = NULL;
 	ext4_clear_inode_state(inode, EXT4_STATE_FC_COMMITTING);
 	INIT_LIST_HEAD(&ei->i_fc_list);
 	INIT_LIST_HEAD(&ei->i_fc_dilist);
@@ -671,17 +672,73 @@ static int __track_range(handle_t *handle, struct inode *inode, void *arg,
 	struct ext4_inode_info *ei = EXT4_I(inode);
 	struct __track_range_args *__arg =
 		(struct __track_range_args *)arg;
+	ext4_lblk_t start = __arg->start, end = __arg->end;
+	ext4_lblk_t s0, e0;
 
 	if (inode->i_ino < EXT4_FIRST_INO(inode->i_sb)) {
 		ext4_debug("Special inode %ld being modified\n", inode->i_ino);
 		return -ECANCELED;
 	}
 
-	/* A new transaction (update == false) starts a fresh range set. */
-	if (!update)
-		ei->i_fc_nr_ranges = 0;
+	/*
+	 * A sub-block punch hole rounds up the start and down the end, passing
+	 * end == start - 1: no whole block changed, so there is nothing to
+	 * track.  (ext4_fc_track_template has already reset the range set for a
+	 * new transaction, so we need not do it here.)
+	 */
+	if (end < start)
+		return 0;
+
+	/* Already upgraded to the heap array: full multi-interval tracking. */
+	if (ei->i_fc_ranges) {
+		ext4_fc_range_add(ei, start, end);
+		return 0;
+	}
+
+	/* First range of this commit stays inline, no allocation needed. */
+	if (ei->i_fc_nr_ranges == 0) {
+		ei->i_fc_range.start = start;
+		ei->i_fc_range.len = end - start + 1;
+		ei->i_fc_nr_ranges = 1;
+		return 0;
+	}
+
+	/* One inline range so far. */
+	s0 = ei->i_fc_range.start;
+	e0 = s0 + ei->i_fc_range.len - 1;
 
-	ext4_fc_range_add(ei, __arg->start, __arg->end);
+	/* Disjoint from it: try to upgrade to the array for exact tracking. */
+	if (start > e0 + 1 || end + 1 < s0) {
+		struct ext4_fc_lblk_range *heap;
+
+		/*
+		 * GFP_ATOMIC: we hold i_fc_lock.  __GFP_NOWARN: failure is not
+		 * fatal -- we fall back to the single coalesced range below --
+		 * so it must not splat under memory pressure.
+		 */
+		heap = kmalloc_array(EXT4_FC_MAX_RANGES + 1, sizeof(*heap),
+				     GFP_ATOMIC | __GFP_NOWARN);
+		if (heap) {
+			heap[0] = ei->i_fc_range;
+			ei->i_fc_ranges = heap;
+			ext4_fc_range_add(ei, start, end);
+			return 0;
+		}
+		/*
+		 * Out of memory: fall back to the original single coalesced
+		 * range by absorbing the gap below.  This over-logs the spanned
+		 * extents but stays a valid fast commit (no full-commit
+		 * fallback), so there is nothing to mark ineligible.
+		 */
+	}
+
+	/* Overlapping/adjacent, or array allocation failed: coalesce inline. */
+	if (start < s0)
+		s0 = start;
+	if (end > e0)
+		e0 = end;
+	ei->i_fc_range.start = s0;
+	ei->i_fc_range.len = e0 - s0 + 1;
 
 	return 0;
 }
@@ -1016,7 +1073,10 @@ static int ext4_fc_write_inode_data(struct inode *inode, u32 *crc)
 		spin_unlock(&ei->i_fc_lock);
 		return 0;
 	}
-	memcpy(ranges, ei->i_fc_ranges, nr * sizeof(ranges[0]));
+	if (ei->i_fc_ranges)
+		memcpy(ranges, ei->i_fc_ranges, nr * sizeof(ranges[0]));
+	else
+		ranges[0] = ei->i_fc_range;	/* inline single-range mode */
 	ei->i_fc_nr_ranges = 0;
 	spin_unlock(&ei->i_fc_lock);
 
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 699c15db28a8..93d495cad0ba 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1433,6 +1433,7 @@ static void ext4_free_in_core_inode(struct inode *inode)
 		pr_warn("%s: inode %ld still in fc list",
 			__func__, inode->i_ino);
 	}
+	kfree(EXT4_I(inode)->i_fc_ranges);
 	kmem_cache_free(ext4_inode_cachep, EXT4_I(inode));
 }
 
-- 
2.43.0


^ permalink raw reply related

* [PATCH 1/2] ext4: track multiple disjoint fast-commit ranges per inode
From: Daejun Park @ 2026-06-11  4:48 UTC (permalink / raw)
  To: tytso@mit.edu, adilger.kernel@dilger.ca
  Cc: linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org,
	Daejun Park
In-Reply-To: <20260611044733epcms2p38013ae683a283555526f70e4eab6d2a9@epcms2p3>

Fast commit tracks a single coalesced logical range per inode
(i_fc_lblk_start .. i_fc_lblk_len).  When an inode is modified at several
disjoint offsets between two commits (e.g. sparse random writes), the
range is widened to span [min, max] of all touched offsets, and at commit
time ext4_fc_write_inode_data() re-logs every extent inside that span,
including the unmodified ones.  On sparse allocation this inflates
fast-commit traffic and often overflows the fast-commit area, forcing a
fallback to a full jbd2 commit.

Replace the single range with a bounded array of up to EXT4_FC_MAX_RANGES
(16) disjoint ranges.  __track_range inserts and merges into it; on
overflow the two ranges separated by the smallest gap are coalesced, so
it degrades to the old single-span behaviour in the worst case.
ext4_fc_write_inode_data() now walks only the tracked ranges.  The
on-disk fast-commit (TLV) format is unchanged.
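
The insert-and-merge behaviour described above can be tried out in
userspace with a sketch like the one below.  It follows the same steps as
ext4_fc_range_add() in this patch, but struct range_set, range_add() and
MAX_RANGES are illustrative names, uint32_t stands in for ext4_lblk_t, and
it assumes end < UINT32_MAX so that end + 1 does not wrap.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_RANGES 16	/* mirrors EXT4_FC_MAX_RANGES */

struct lblk_range {
	uint32_t start;
	uint32_t len;
};

struct range_set {
	/* One extra slot: transient room before the overflow merge. */
	struct lblk_range r[MAX_RANGES + 1];
	unsigned int nr;
};

/* Record [start, end], keeping the set sorted and mutually disjoint. */
static void range_add(struct range_set *s, uint32_t start, uint32_t end)
{
	struct lblk_range *r = s->r;
	unsigned int n = s->nr, i = 0, j;

	assert(start <= end);

	/* Skip ranges that end before start - 1 (no overlap, not adjacent). */
	while (i < n && r[i].start + r[i].len < start)
		i++;

	/* Absorb every range overlapping or touching the growing range. */
	for (j = i; j < n && r[j].start <= end + 1; j++) {
		if (r[j].start < start)
			start = r[j].start;
		if (r[j].start + r[j].len - 1 > end)
			end = r[j].start + r[j].len - 1;
	}

	/* Replace r[i..j-1] with the merged range; j == i is a plain insert. */
	if (j != i + 1)
		memmove(&r[i + 1], &r[j], (n - j) * sizeof(*r));
	r[i].start = start;
	r[i].len = end - start + 1;
	s->nr = n - (j - i) + 1;

	/* Overflow: merge the neighbouring pair separated by the smallest gap. */
	while (s->nr > MAX_RANGES) {
		uint32_t best_gap = UINT32_MAX;
		unsigned int best = 0;

		n = s->nr;
		for (i = 0; i + 1 < n; i++) {
			uint32_t gap = r[i + 1].start - (r[i].start + r[i].len);

			if (gap < best_gap) {
				best_gap = gap;
				best = i;
			}
		}
		r[best].len = r[best + 1].start + r[best + 1].len - r[best].start;
		memmove(&r[best + 1], &r[best + 2], (n - best - 2) * sizeof(*r));
		s->nr = n - 1;
	}
}
```

For example, adding [10,19] and [30,39] leaves two ranges, and a later
[20,29] bridges them into a single [10,39].  Adding a 17th disjoint range
triggers the overflow merge, which brings the count back to 16 at the cost
of logging the absorbed gap.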

The number of disjoint dirty regions an inode accumulates per fsync --
how scattered the writes are -- controls how badly the single-span
tracking over-logs.  On a sparse random-write workload (1 GiB span, 300
fsyncs, NVMe):

                           16 regions    64 regions
  fast-commit blocks/cmt   19.1 -> 1.0   76.3 -> 31.6
  mean fsync latency (us)  2537 -> 2280  3398 -> 2937
  p99  fsync latency (us)  3698 -> 2545  4492 -> 4291

With 16 dirty regions per fsync everything fits within the 16-range cap
and each region is tracked exactly; 64 regions exceeds the cap and
exercises the overflow-merge path, which still roughly halves the logged
blocks.  On a small filesystem whose fast-commit area is easily exhausted,
the reduced traffic also cuts the full-commit fallback rate (e.g. 22% ->
2% at 16 regions on an 8 GiB fs).

Crash recovery (online replay + offline e2fsck) and the ext4/generic
fast-commit xfstests show no regression; the unchanged on-disk format
means e2fsprogs needs no update.

Signed-off-by: Daejun Park <pdaejun@gmail.com>
---
 fs/ext4/ext4.h        |  31 ++++++++--
 fs/ext4/fast_commit.c | 138 ++++++++++++++++++++++++++++++++----------
 2 files changed, 130 insertions(+), 39 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 01a6e2de7fc3..314a1c90075b 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1017,6 +1017,20 @@ enum {
 };
 
 
+/*
+ * Maximum number of disjoint logical-block ranges tracked per inode for a
+ * single fast commit.  Scattered allocations that exceed this get their two
+ * closest ranges merged (see ext4_fc_range_add()), degrading gracefully to
+ * the old single coalesced-range behaviour.
+ */
+#define EXT4_FC_MAX_RANGES 16
+
+/* In-memory record of an lblk range modified in the current fast commit. */
+struct ext4_fc_lblk_range {
+	ext4_lblk_t start;
+	ext4_lblk_t len;
+};
+
 /*
  * fourth extended file system inode data in memory
  */
@@ -1066,11 +1080,16 @@ struct ext4_inode_info {
 					 * protected by sbi->s_fc_lock.
 					 */
 
-	/* Start of lblk range that needs to be committed in this fast commit */
-	ext4_lblk_t i_fc_lblk_start;
-
-	/* End of lblk range that needs to be committed in this fast commit */
-	ext4_lblk_t i_fc_lblk_len;
+	/*
+	 * Disjoint lblk ranges modified in this fast commit.  Tracking the
+	 * actual modified ranges (instead of one coalesced [min,max]) avoids
+	 * re-logging the whole spanned extent map for scattered allocations.
+	 * Sorted by start, mutually disjoint.  Bounded by EXT4_FC_MAX_RANGES;
+	 * the extra slot is transient room used while inserting before an
+	 * overflow merge.  Protected by i_fc_lock.
+	 */
+	struct ext4_fc_lblk_range i_fc_ranges[EXT4_FC_MAX_RANGES + 1];
+	unsigned int i_fc_nr_ranges;
 
 	spinlock_t i_raw_lock;	/* protects updates to the raw inode */
 
@@ -1078,7 +1097,7 @@ struct ext4_inode_info {
 	wait_queue_head_t i_fc_wait;
 
 	/*
-	 * Protect concurrent accesses on i_fc_lblk_start, i_fc_lblk_len
+	 * Protect concurrent accesses on i_fc_ranges, i_fc_nr_ranges
 	 * and inode's EXT4_FC_STATE_COMMITTING state bit.
 	 */
 	spinlock_t i_fc_lock;
diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
index 42bee1d4f9f9..ab9ab50ad0b5 100644
--- a/fs/ext4/fast_commit.c
+++ b/fs/ext4/fast_commit.c
@@ -203,8 +203,7 @@ static inline void ext4_fc_reset_inode(struct inode *inode)
 {
 	struct ext4_inode_info *ei = EXT4_I(inode);
 
-	ei->i_fc_lblk_start = 0;
-	ei->i_fc_lblk_len = 0;
+	ei->i_fc_nr_ranges = 0;
 }
 
 void ext4_fc_init_inode(struct inode *inode)
@@ -540,7 +539,7 @@ static int __track_inode(handle_t *handle, struct inode *inode, void *arg,
 	if (update)
 		return -EEXIST;
 
-	EXT4_I(inode)->i_fc_lblk_len = 0;
+	EXT4_I(inode)->i_fc_nr_ranges = 0;
 
 	return 0;
 }
@@ -603,12 +602,73 @@ struct __track_range_args {
 	ext4_lblk_t start, end;
 };
 
+/*
+ * Record that logical block range [start, end] was modified in the current
+ * fast commit.  Maintains a small, bounded set of sorted, mutually disjoint
+ * ranges, merging the new range with any it overlaps or is adjacent to.  When
+ * the set would exceed EXT4_FC_MAX_RANGES, the consecutive pair separated by
+ * the smallest gap is merged (absorbing that gap), so the worst case degrades
+ * gracefully to the old single coalesced-range behaviour.  Tracking the actual
+ * modified ranges (rather than one [min,max] span) keeps ext4_fc_write_inode_data
+ * from re-logging the whole spanned extent map on scattered allocations.
+ * Caller holds ei->i_fc_lock.
+ */
+static void ext4_fc_range_add(struct ext4_inode_info *ei,
+			      ext4_lblk_t start, ext4_lblk_t end)
+{
+	struct ext4_fc_lblk_range *r = ei->i_fc_ranges;
+	unsigned int n = ei->i_fc_nr_ranges;
+	unsigned int i, j;
+
+	/* Skip ranges lying entirely before [start - 1] (no overlap/adjacency). */
+	i = 0;
+	while (i < n && r[i].start + r[i].len < start)
+		i++;
+
+	/* Absorb every range overlapping or adjacent to the growing [start,end]. */
+	j = i;
+	while (j < n && r[j].start <= end + 1) {
+		if (r[j].start < start)
+			start = r[j].start;
+		if (r[j].start + r[j].len - 1 > end)
+			end = r[j].start + r[j].len - 1;
+		j++;
+	}
+
+	/* Replace r[i..j-1] with the merged range (j == i is a plain insert). */
+	if (j != i + 1)
+		memmove(&r[i + 1], &r[j], (n - j) * sizeof(*r));
+	r[i].start = start;
+	r[i].len = end - start + 1;
+	ei->i_fc_nr_ranges = n - (j - i) + 1;
+
+	/* Overflow: merge the consecutive pair separated by the smallest gap. */
+	while (ei->i_fc_nr_ranges > EXT4_FC_MAX_RANGES) {
+		ext4_lblk_t best_gap = ~0U;
+		unsigned int best = 0;
+
+		n = ei->i_fc_nr_ranges;
+		for (i = 0; i + 1 < n; i++) {
+			ext4_lblk_t gap = r[i + 1].start -
+					  (r[i].start + r[i].len);
+
+			if (gap < best_gap) {
+				best_gap = gap;
+				best = i;
+			}
+		}
+		r[best].len = r[best + 1].start + r[best + 1].len - r[best].start;
+		memmove(&r[best + 1], &r[best + 2],
+			(n - best - 2) * sizeof(*r));
+		ei->i_fc_nr_ranges = n - 1;
+	}
+}
+
 /* __track_fn for tracking data updates */
 static int __track_range(handle_t *handle, struct inode *inode, void *arg,
 			 bool update)
 {
 	struct ext4_inode_info *ei = EXT4_I(inode);
-	ext4_lblk_t oldstart;
 	struct __track_range_args *__arg =
 		(struct __track_range_args *)arg;
 
@@ -617,17 +677,11 @@ static int __track_range(handle_t *handle, struct inode *inode, void *arg,
 		return -ECANCELED;
 	}
 
-	oldstart = ei->i_fc_lblk_start;
+	/* A new transaction (update == false) starts a fresh range set. */
+	if (!update)
+		ei->i_fc_nr_ranges = 0;
 
-	if (update && ei->i_fc_lblk_len > 0) {
-		ei->i_fc_lblk_start = min(ei->i_fc_lblk_start, __arg->start);
-		ei->i_fc_lblk_len =
-			max(oldstart + ei->i_fc_lblk_len - 1, __arg->end) -
-				ei->i_fc_lblk_start + 1;
-	} else {
-		ei->i_fc_lblk_start = __arg->start;
-		ei->i_fc_lblk_len = __arg->end - __arg->start + 1;
-	}
+	ext4_fc_range_add(ei, __arg->start, __arg->end);
 
 	return 0;
 }
@@ -890,33 +944,20 @@ static int ext4_fc_write_inode(struct inode *inode, u32 *crc)
  * Writes updated data ranges for the inode in question. Updates CRC.
  * Returns 0 on success, error otherwise.
  */
-static int ext4_fc_write_inode_data(struct inode *inode, u32 *crc)
+/* Write the fast commit TLVs for one modified lblk range [start, end]. */
+static int ext4_fc_write_lblk_range(struct inode *inode, ext4_lblk_t start,
+				    ext4_lblk_t end, u32 *crc)
 {
-	ext4_lblk_t old_blk_size, cur_lblk_off, new_blk_size;
-	struct ext4_inode_info *ei = EXT4_I(inode);
+	ext4_lblk_t cur_lblk_off = start;
 	struct ext4_map_blocks map;
 	struct ext4_fc_add_range fc_ext;
 	struct ext4_fc_del_range lrange;
 	struct ext4_extent *ex;
 	int ret;
 
-	spin_lock(&ei->i_fc_lock);
-	if (ei->i_fc_lblk_len == 0) {
-		spin_unlock(&ei->i_fc_lock);
-		return 0;
-	}
-	old_blk_size = ei->i_fc_lblk_start;
-	new_blk_size = ei->i_fc_lblk_start + ei->i_fc_lblk_len - 1;
-	ei->i_fc_lblk_len = 0;
-	spin_unlock(&ei->i_fc_lock);
-
-	cur_lblk_off = old_blk_size;
-	ext4_debug("will try writing %d to %d for inode %ld\n",
-		   cur_lblk_off, new_blk_size, inode->i_ino);
-
-	while (cur_lblk_off <= new_blk_size) {
+	while (cur_lblk_off <= end) {
 		map.m_lblk = cur_lblk_off;
-		map.m_len = new_blk_size - cur_lblk_off + 1;
+		map.m_len = end - cur_lblk_off + 1;
 		ret = ext4_map_blocks(NULL, inode, &map,
 				      EXT4_GET_BLOCKS_IO_SUBMIT |
 				      EXT4_EX_NOCACHE);
@@ -962,6 +1003,37 @@ static int ext4_fc_write_inode_data(struct inode *inode, u32 *crc)
 	return 0;
 }
 
+static int ext4_fc_write_inode_data(struct inode *inode, u32 *crc)
+{
+	struct ext4_inode_info *ei = EXT4_I(inode);
+	struct ext4_fc_lblk_range ranges[EXT4_FC_MAX_RANGES + 1];
+	unsigned int nr, i;
+	int ret;
+
+	spin_lock(&ei->i_fc_lock);
+	nr = ei->i_fc_nr_ranges;
+	if (nr == 0) {
+		spin_unlock(&ei->i_fc_lock);
+		return 0;
+	}
+	memcpy(ranges, ei->i_fc_ranges, nr * sizeof(ranges[0]));
+	ei->i_fc_nr_ranges = 0;
+	spin_unlock(&ei->i_fc_lock);
+
+	for (i = 0; i < nr; i++) {
+		ext4_lblk_t start = ranges[i].start;
+		ext4_lblk_t end = ranges[i].start + ranges[i].len - 1;
+
+		ext4_debug("will try writing %u to %u for inode %ld\n",
+			   start, end, inode->i_ino);
+		ret = ext4_fc_write_lblk_range(inode, start, end, crc);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
 
 /* Flushes data of all the inodes in the commit queue. */
 static int ext4_fc_flush_data(journal_t *journal)
-- 
2.43.0


^ permalink raw reply related

* [PATCH 0/2] ext4: reduce fast-commit write amplification for scattered writes
From: Daejun Park @ 2026-06-11  4:47 UTC (permalink / raw)
  To: tytso@mit.edu, adilger.kernel@dilger.ca
  Cc: linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org,
	Daejun Park
In-Reply-To: <CGME20260611044733epcms2p38013ae683a283555526f70e4eab6d2a9@epcms2p3>

ext4 fast commit tracks a single coalesced logical range per inode.  When
an inode is dirtied at several disjoint offsets between two commits (e.g.
sparse/scattered random writes), that range is widened to span [min, max]
of all the touched offsets, and ext4_fc_write_inode_data() then re-logs
every extent inside that span -- including the unmodified ones.  On sparse
allocation this inflates fast-commit traffic and frequently overflows the
fast-commit area, forcing a fallback to a full jbd2 commit.

This series replaces the single range with a small, bounded set of disjoint
ranges so that only the actually-modified regions are logged, while keeping
the per-inode memory cost negligible:

  1/2 tracks up to EXT4_FC_MAX_RANGES (16) disjoint ranges, merging the two
      closest ranges when the set would overflow -- so the worst case
      degrades gracefully to the old single-span behaviour.  The on-disk
      fast-commit (TLV) format is unchanged.

  2/2 allocates that array lazily: the first range is kept inline, the array
      is allocated only when a second disjoint range appears, and on an
      allocation failure we fall back to the inline single range.  The
      per-inode fast-commit footprint drops from ~140 to 20 bytes.

Measured on a sparse random-write workload (1 GiB span, R disjoint dirty
regions per fsync, 300 fsyncs, bare-metal NVMe):

  - fast-commit blocks per commit (R=16):  18.6 -> 1.0
  - full-commit fallback rate     (R=16):  22%  -> 2%   (on a small fs)
  - mean fsync latency:  R=16  -10%,  R=64  -14%
  - p99  fsync latency:  R=16  -31%

The p99 improvement comes from eliminating the full-commit fallback spikes.

Testing: crash recovery (power loss -> fast-commit replay -> verify every
fsync'd block, then e2fsck) is clean; the ext4/generic fast-commit xfstests
show no regression; the unchanged on-disk format means e2fsprogs needs no
update.  Both patches are checkpatch --strict clean.

Based on v6.17-rc3.

Daejun Park (2):
  ext4: track multiple disjoint fast-commit ranges per inode
  ext4: allocate the fast-commit range array lazily

 fs/ext4/ext4.h        |  40 +++++++--
 fs/ext4/fast_commit.c | 196 +++++++++++++++++++++++++++++++++++-------
 fs/ext4/super.c       |   1 +
 3 files changed, 199 insertions(+), 38 deletions(-)

-- 
2.43.0

^ permalink raw reply

* [PATCH][e2fsprogs] build: use correct subst variable
From: Li Dongyang @ 2026-06-11  3:52 UTC (permalink / raw)
  To: linux-ext4; +Cc: adilger

ifNotGNUmake was changed to ifnGNUmake but tests/Makefile.in still uses
the old variable name.
make fullcheck fails on some platforms:
make[2]: Entering directory `/var/lib/jenkins/workspace/e2fsprogs-reviews/arch/x86_64/distro/el7/_topdir/BUILD/e2fsprogs-1.47.4/tests'
Makefile:387: *** missing separator.  Stop.
make[2]: Leaving directory `/var/lib/jenkins/workspace/e2fsprogs-reviews/arch/x86_64/distro/el7/_topdir/BUILD/e2fsprogs-1.47.4/tests'
make[1]: *** [fullcheck-recursive] Error 1

Fixes: b7d1ab3376 "Update configure/configure.ac/aclocal.m4 to use autoconf 2.72"
Change-Id: Iec3cacfca7206bf785381664b7d7bded8c70113c
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
---
 tests/Makefile.in | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/Makefile.in b/tests/Makefile.in
index 678cc3268c..8f7a072f45 100644
--- a/tests/Makefile.in
+++ b/tests/Makefile.in
@@ -48,7 +48,7 @@ test_data.tmp: $(srcdir)/scripts/gen-test-data
 always_run:

 @ifGNUmake@TESTS=$(wildcard $(srcdir)/[a-z]_*)
-@ifNotGNUmake@TESTS != echo $(srcdir)/[a-z]_*
+@ifnGNUmake@TESTS != echo $(srcdir)/[a-z]_*

 SKIP_SLOW_TESTS=--skip-slow-tests

-- 
2.52.0

^ permalink raw reply related

* Re: [PATCH 5.10/5.15] ext4: validate p_idx bounds in ext4_ext_correct_indexes
From: Sasha Levin @ 2026-06-11  0:45 UTC (permalink / raw)
  To: stable, Greg Kroah-Hartman
  Cc: Sasha Levin, Alexey Panov, Theodore Ts'o, Andreas Dilger,
	linux-ext4, linux-kernel, Baokun Li, Jan Kara, Ojaswin Mujoo,
	Ritesh Harjani (IBM), Zhang Yi, lvc-project,
	syzbot+04c4e65cab786a2e5b7e, Tejas Bharambe, stable
In-Reply-To: <20260609164430.29988-1-apanov@astralinux.ru>

On Mon, Jun 09, 2026 at 07:44:30PM +0300, Alexey Panov wrote:
> [PATCH 5.10/5.15] ext4: validate p_idx bounds in ext4_ext_correct_indexes

Queued for 5.15 and 5.10, thanks.

--
Thanks,
Sasha

^ permalink raw reply

* [PATCH v2 10/10] ext4: Add EXT4_IOC_SET_LUFID ioctl for setting LUFID on directory entries
From: Artem Blagodarenko @ 2026-06-10 15:24 UTC (permalink / raw)
  To: linux-ext4; +Cc: adilger.kernel, Artem Blagodarenko, Andreas Dilger
In-Reply-To: <20260610152417.13576-1-ablagodarenko@thelustrecollective.com>

From: Artem Blagodarenko <artem.blagodarenko@gmail.com>

Add a new ioctl command that allows setting LUFID (Locally Unique File ID)
data on existing directory entries. This includes:

- ext4_ioctl_set_lufid(): ioctl handler that validates parameters and
  calls the underlying implementation
- ext4_dirdata_set_lufid(): Core function that performs the operation by:
  * Looking up the target directory entry
  * Retrieving the associated inode
  * Deleting the old entry and re-creating it with LUFID data attached

This implementation requires the dirdata feature to be enabled on the
filesystem and properly handles transactions and inode locking to ensure
consistency.

Signed-off-by: Artem Blagodarenko <artem.blagodarenko@gmail.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
---
 fs/ext4/ext4.h            |  15 +++++
 fs/ext4/ioctl.c           |  62 ++++++++++++++++++++
 fs/ext4/namei.c           | 120 ++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/ext4.h |  13 +++++
 4 files changed, 210 insertions(+)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index defa18d98c74..975b0975e032 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1193,6 +1193,7 @@ struct ext4_inode_info {
 #ifdef CONFIG_FS_ENCRYPTION
 	struct fscrypt_inode_info *i_crypt_info;
 #endif
+	void *i_dirdata;
 };
 
 /*
@@ -2515,6 +2516,18 @@ struct ext4_dirent_hash {
 	struct ext4_dir_entry_hash	dh_hash;
 } __packed;
 
+static inline
+struct ext4_dirent_fid *ext4_dentry_get_fid(struct super_block *sb,
+					    struct ext4_dentry_param *p)
+{
+	if (!ext4_has_feature_dirdata(sb))
+		return NULL;
+	if (p && p->edp_magic == EXT4_LUFID_MAGIC)
+		return &p->edp_dfid;
+
+	return NULL;
+}
+
 #define EXT4_FT_DIR_CSUM	0xDE
 
 /*
@@ -3215,6 +3228,8 @@ static inline int ext4_init_new_dir(handle_t *handle, struct inode *dir,
 }
 extern int ext4_dirblock_csum_verify(struct inode *inode,
 				     struct buffer_head *bh);
+extern int ext4_dirdata_set_lufid(struct inode *dir, const char *filename,
+			   int namelen, struct ext4_dentry_param *edp);
 extern int ext4_htree_fill_tree(struct file *dir_file, __u32 start_hash,
 				__u32 start_minor_hash, __u32 *next_hash);
 extern int ext4_search_dir(struct buffer_head *bh,
diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index 1d0c3d4bdf47..9f32f21d9b3a 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -1529,6 +1529,65 @@ static int ext4_ioctl_set_tune_sb(struct file *filp,
 	return ret;
 }
 
+/*
+ * ext4_ioctl_set_lufid() - Set LUFID on a directory entry
+ * @filp:	file pointer (parent directory)
+ * @arg:	pointer to ext4_set_lufid structure with filename and LUFID data
+ *
+ * This ioctl allows setting LUFID data on an existing
+ * directory entry. It is called on the parent directory with a filename and
+ * LUFID data.
+ */
+static long ext4_ioctl_set_lufid(struct file *filp, unsigned long arg)
+{
+	struct inode *dir = file_inode(filp);
+	struct ext4_set_lufid lufid_args;
+	struct {
+		__u32 edp_magic;
+		struct ext4_dirent_data_header df_header;
+		char df_fid[255];
+	} edp;
+	int err;
+
+	/* Check if parent is a directory */
+	if (!S_ISDIR(dir->i_mode))
+		return -ENOTDIR;
+
+	/* Copy arguments from user space */
+	if (copy_from_user(&lufid_args, (struct ext4_set_lufid __user *)arg,
+			   sizeof(lufid_args)))
+		return -EFAULT;
+
+	/* Validate parameters */
+	if (lufid_args.esl_name_len == 0 || lufid_args.esl_name_len > EXT4_NAME_LEN)
+		return -EINVAL;
+
+	if (lufid_args.esl_data_len == 0 || lufid_args.esl_data_len > 255)
+		return -EINVAL;
+
+	/* Ensure filename is NUL-terminated and unmodified */
+	if (lufid_args.esl_name[lufid_args.esl_name_len - 1] != '\0')
+		return -EINVAL;
+
+	/* Prepare the dentry param struct with LUFID data */
+	edp.edp_magic = EXT4_LUFID_MAGIC;
+	edp.df_header.ddh_length = lufid_args.esl_data_len;
+	memcpy(edp.df_fid, lufid_args.esl_data, lufid_args.esl_data_len);
+
+	/* Want write access */
+	err = mnt_want_write_file(filp);
+	if (err)
+		return err;
+
+	/* Call the helper function to do the actual work */
+	err = ext4_dirdata_set_lufid(dir, lufid_args.esl_name,
+				    lufid_args.esl_name_len - 1,
+				    (struct ext4_dentry_param *)&edp);
+
+	mnt_drop_write_file(filp);
+	return err;
+}
+
 static long __ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 {
 	struct inode *inode = file_inode(filp);
@@ -1912,6 +1971,8 @@ static long __ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 					      (void __user *)arg);
 	case EXT4_IOC_SET_TUNE_SB_PARAM:
 		return ext4_ioctl_set_tune_sb(filp, (void __user *)arg);
+	case EXT4_IOC_SET_LUFID:
+		return ext4_ioctl_set_lufid(filp, arg);
 	default:
 		return -ENOTTY;
 	}
@@ -1991,6 +2052,7 @@ long ext4_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 	case FS_IOC_SETFSLABEL:
 	case EXT4_IOC_GETFSUUID:
 	case EXT4_IOC_SETFSUUID:
+	case EXT4_IOC_SET_LUFID:
 		break;
 	default:
 		return -ENOIOCTLCMD;
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index c799f87d7459..65c53c08213a 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -2264,6 +2264,8 @@ static int add_dirent_to_buf(handle_t *handle, struct ext4_filename *fname,
 	if (ext4_has_feature_metadata_csum(inode->i_sb))
 		csum_size = sizeof(struct ext4_dir_entry_tail);
 
+	dfid = ext4_dentry_get_fid(inode->i_sb,
+		(struct ext4_dentry_param *)EXT4_I(inode)->i_dirdata);
 	if (!de) {
 		if (dfid)
 			dlen = dfid->df_header.ddh_length;
@@ -2605,6 +2607,7 @@ static int ext4_add_entry(handle_t *handle, struct dentry *dentry,
 {
 	struct inode *dir = d_inode(dentry->d_parent);
 
+	EXT4_I(inode)->i_dirdata = dentry->d_fsdata;
 	if (fscrypt_is_nokey_name(dentry))
 		return -ENOKEY;
 	return __ext4_add_entry(handle, dir, &dentry->d_name, inode);
@@ -4361,6 +4364,123 @@ static int ext4_rename2(struct mnt_idmap *idmap,
 	return ext4_rename(idmap, old_dir, old_dentry, new_dir, new_dentry, flags);
 }
 
+/*
+ * ext4_dirdata_set_lufid() - Set LUFID data on an existing directory entry
+ * @dir:        parent directory inode
+ * @filename:   name of the file in the directory
+ * @namelen:    length of filename
+ * @edp:        pointer to initialized dentry param with LUFID data
+ *
+ * This function finds an existing directory entry, deletes it, and re-creates it
+ * with LUFID data attached. Used by the EXT4_IOC_SET_LUFID ioctl.
+ *
+ * Returns 0 on success, negative error code on failure.
+ */
+int ext4_dirdata_set_lufid(struct inode *dir, const char *filename,
+			    int namelen, struct ext4_dentry_param *edp)
+{
+	struct super_block *sb = dir->i_sb;
+	struct ext4_filename fname;
+	struct ext4_dir_entry_2 *de = NULL;
+	struct buffer_head *bh = NULL;
+	struct inode *inode = NULL;
+	handle_t *handle = NULL;
+	struct qstr d_name;
+	void *old_dirdata = NULL;
+	int err = 0;
+
+	/* Check if dirdata feature is enabled */
+	if (!ext4_has_feature_dirdata(sb))
+		return -ENOTSUPP;
+
+	if (namelen > EXT4_NAME_LEN)
+		return -ENAMETOOLONG;
+	if (namelen != strnlen(filename, namelen + 1))
+		return -EINVAL;
+
+	/* Setup the filename for lookup */
+	d_name.name = filename;
+	d_name.len = namelen;
+
+	/* Lookup the filename in the directory */
+	err = ext4_fname_setup_filename(dir, &d_name, 0, &fname);
+	if (err)
+		goto out_free;
+
+	bh = ext4_find_entry(dir, &d_name, &de, NULL);
+	if (!bh) {
+		err = -ENOENT;
+		goto out_free;
+	}
+
+	/* Get the inode number from the directory entry */
+	inode = ext4_iget(sb, le32_to_cpu(de->inode), EXT4_IGET_NORMAL);
+	if (IS_ERR(inode)) {
+		err = PTR_ERR(inode);
+		inode = NULL;
+		goto out_brelse;
+	}
+
+	/* Start a transaction */
+	handle = ext4_journal_start(dir, EXT4_HT_DIR,
+				     2 * EXT4_DATA_TRANS_BLOCKS(sb) +
+				     EXT4_INDEX_EXTRA_TRANS_BLOCKS);
+	if (IS_ERR(handle)) {
+		err = PTR_ERR(handle);
+		handle = NULL;
+		goto out_iput;
+	}
+
+	inode_lock(dir);
+
+	/* Delete the old entry */
+	err = ext4_delete_entry(handle, dir, de, bh);
+	if (err)
+		goto out_unlock;
+
+	brelse(bh);
+	bh = NULL;
+
+	/* Re-add the entry with LUFID data
+	 * We set i_dirdata before adding so the entry can include it
+	 */
+	old_dirdata = EXT4_I(inode)->i_dirdata;
+	EXT4_I(inode)->i_dirdata = edp;
+
+	/* Use ext4_add_entry() to properly handle hash table management
+	 * and block splitting, just like rename does. This ensures the entry
+	 * is placed in the correct hash block and avoids breaking dirhash.
+	 */
+	{
+		struct dentry parent_dentry = { .d_inode = dir };
+		struct dentry new_dentry = {
+			.d_name = d_name,
+			.d_parent = &parent_dentry,
+			.d_inode = inode,  /* Same inode (in-place update) */
+			.d_fsdata = edp,   /* required */
+		};
+		err = ext4_add_entry(handle, &new_dentry, inode);
+	}
+	EXT4_I(inode)->i_dirdata = old_dirdata;
+
+	/* Update inode times */
+	inode_set_ctime_current(dir);
+	inode_inc_iversion(dir);
+	ext4_mark_inode_dirty(handle, dir);
+
+out_unlock:
+	inode_unlock(dir);
+	ext4_journal_stop(handle);
+out_iput:
+	iput(inode);
+out_brelse:
+	brelse(bh);
+out_free:
+	ext4_fname_free_filename(&fname);
+
+	return err;
+}
+
 /*
  * directories can handle most operations...
  */
diff --git a/include/uapi/linux/ext4.h b/include/uapi/linux/ext4.h
index 9c683991c32f..b04bbb2818a3 100644
--- a/include/uapi/linux/ext4.h
+++ b/include/uapi/linux/ext4.h
@@ -35,6 +35,7 @@
 #define EXT4_IOC_SETFSUUID		_IOW('f', 44, struct fsuuid)
 #define EXT4_IOC_GET_TUNE_SB_PARAM	_IOR('f', 45, struct ext4_tune_sb_params)
 #define EXT4_IOC_SET_TUNE_SB_PARAM	_IOW('f', 46, struct ext4_tune_sb_params)
+#define EXT4_IOC_SET_LUFID		_IOW('f', 47, struct ext4_set_lufid)
 
 #define EXT4_IOC_SHUTDOWN _IOR('X', 125, __u32)
 
@@ -92,6 +93,18 @@ struct move_extent {
 	__u64 moved_len;	/* moved block length */
 };
 
+/*
+ * Structure for EXT4_IOC_SET_LUFID
+ * Sets LUFID on a directory entry
+ * Called on parent directory with filename and LUFID data as arguments
+ */
+struct ext4_set_lufid {
+	__u8 esl_name_len;	/* length of filename */
+	__u8 esl_data_len;	/* length of LUFID data */
+	char  esl_name[255 + 1]; /* filename (NUL-terminated) */
+	char  esl_data[255 + 1]; /* LUFID data (raw bytes) */
+};
+
 /*
  * Flags used by EXT4_IOC_SHUTDOWN
  */
-- 
2.43.7
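
For readers who want to exercise the ioctl from user space, here is a
minimal sketch, assuming the struct layout and ioctl number proposed in
this patch (neither is in mainline).  lufid_args_fill() and lufid_set()
are hypothetical helper names.  The fill helper mirrors the kernel-side
checks: esl_name_len counts the trailing NUL, and esl_data_len must be
in the range 1..255.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Local copy of the layout proposed in the patch; not a mainline header. */
struct ext4_set_lufid {
	uint8_t esl_name_len;	/* filename length, counting the NUL */
	uint8_t esl_data_len;	/* LUFID payload length, 1..255 */
	char esl_name[255 + 1];	/* filename (NUL-terminated) */
	char esl_data[255 + 1];	/* LUFID data (raw bytes) */
};
static_assert(sizeof(struct ext4_set_lufid) == 514, "unexpected padding");

#define EXT4_IOC_SET_LUFID _IOW('f', 47, struct ext4_set_lufid)

/* Hypothetical helper: fill the argument block, mirroring the kernel checks. */
static int lufid_args_fill(struct ext4_set_lufid *a, const char *name,
			   const void *data, size_t data_len)
{
	size_t n = strlen(name);

	/* esl_name_len is a __u8 that includes the NUL, so n is at most 254 */
	if (n == 0 || n + 1 > 255)
		return -EINVAL;
	if (data_len == 0 || data_len > 255)
		return -EINVAL;

	memset(a, 0, sizeof(*a));
	a->esl_name_len = (uint8_t)(n + 1);
	a->esl_data_len = (uint8_t)data_len;
	memcpy(a->esl_name, name, n + 1);
	memcpy(a->esl_data, data, data_len);
	return 0;
}

/* Hypothetical wrapper: dirfd must be an open fd on the parent directory. */
static int lufid_set(int dirfd, const char *name,
		     const void *data, size_t data_len)
{
	struct ext4_set_lufid a;
	int err = lufid_args_fill(&a, name, data, data_len);

	if (err)
		return err;
	return ioctl(dirfd, EXT4_IOC_SET_LUFID, &a) ? -errno : 0;
}
```

Because esl_name_len is a single byte that includes the terminator, the
longest name this interface can express is 254 bytes, one short of
EXT4_NAME_LEN.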



* [PATCH v2 09/10] ext4: add dirdata set/get helpers
From: Artem Blagodarenko @ 2026-06-10 15:24 UTC (permalink / raw)
  To: linux-ext4; +Cc: adilger.kernel, Artem Blagodarenko, Andreas Dilger
In-Reply-To: <20260610152417.13576-1-ablagodarenko@thelustrecollective.com>

From: Artem Blagodarenko <artem.blagodarenko@gmail.com>

Add helpers to set and retrieve dirdata payload and hook them up at
the appropriate call sites.

Enable dirdata for casefold+encryption hashes and for storing a unique
128-bit file identifier in the directory entry for testing.

Signed-off-by: Artem Blagodarenko <artem.blagodarenko@gmail.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
---
 foofile.txt      |   0
 fs/ext4/ext4.h   |   4 +
 fs/ext4/inline.c |   6 +-
 fs/ext4/namei.c  | 196 ++++++++++++++++++++++++++++++++++++++++-------
 4 files changed, 176 insertions(+), 30 deletions(-)

diff --git a/foofile.txt b/foofile.txt
new file mode 100644
index 000000000000..e69de29bb2d1
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index ef99e4fa99d7..defa18d98c74 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -3787,6 +3787,10 @@ extern int __ext4_unlink(struct inode *dir, const struct qstr *d_name,
 			 struct inode *inode, struct dentry *dentry);
 extern int __ext4_link(struct inode *dir, struct inode *inode,
 		       const struct qstr *d_name, struct dentry *dentry);
+extern unsigned char ext4_dirdata_get(struct ext4_dir_entry_2 *de,
+				      struct inode *dir,
+				      struct ext4_dirent_fid  *lufid,
+				      struct dx_hash_info *hinfo);
 
 #define S_SHIFT 12
 static const unsigned char ext4_type_by_mode[(S_IFMT >> S_SHIFT) + 1] = {
diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
index c57a8ebe4f94..71c395c9a162 100644
--- a/fs/ext4/inline.c
+++ b/fs/ext4/inline.c
@@ -1346,10 +1346,8 @@ int ext4_inlinedir_to_tree(struct file *dir_file,
 			}
 		}
 
-		if (ext4_hash_in_dirent(dir)) {
-			hinfo->hash = EXT4_DIRENT_HASH(de);
-			hinfo->minor_hash = EXT4_DIRENT_MINOR_HASH(de);
-		} else {
+		if (!(ext4_dirdata_get(de, dir, NULL, hinfo) &
+							EXT4_DIRENT_CFHASH)) {
 			err = ext4fs_dirhash(dir, de->name, de->name_len, hinfo);
 			if (err) {
 				ret = err;
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 40a3394f7eac..c799f87d7459 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -1084,22 +1084,22 @@ static int htree_dirblock_to_tree(struct file *dir_file,
 			/* silently ignore the rest of the block */
 			break;
 		}
-		if (ext4_hash_in_dirent(dir)) {
-			if (de->name_len && de->inode) {
-				hinfo->hash = EXT4_DIRENT_HASH(de);
-				hinfo->minor_hash = EXT4_DIRENT_MINOR_HASH(de);
-			} else {
-				hinfo->hash = 0;
-				hinfo->minor_hash = 0;
-			}
+		if (de->name_len && de->inode) {
+			/* check for saved hash first, or generate it from name */
+			if (!(ext4_dirdata_get(de, dir, NULL, hinfo) &
+			      EXT4_DIRENT_CFHASH)) {
+				err = ext4fs_dirhash(dir, de->name,
+						     de->name_len, hinfo);
+				if (err < 0) {
+					count = err;
+					goto errout;
+				}
+			}
 		} else {
-			err = ext4fs_dirhash(dir, de->name,
-					     de->name_len, hinfo);
-			if (err < 0) {
-				count = err;
-				goto errout;
-			}
+			hinfo->hash = 0;
+			hinfo->minor_hash = 0;
 		}
+
 		if ((hinfo->hash < start_hash) ||
 		    ((hinfo->hash == start_hash) &&
 		     (hinfo->minor_hash < start_minor_hash)))
@@ -1277,9 +1277,160 @@ static inline int search_dirblock(struct buffer_head *bh,
  */
 
 /*
- * Create map of hash values, offsets, and sizes, stored at end of block.
- * Returns number of entries mapped.
+ * ext4_dirdata_get() - Read dirdata fields from a directory entry.
+ * @de:         directory entry
+ * @dir:        directory inode (used for fscrypt+casefold hash fallback)
+ * @dfid:      if non-NULL and EXT4_DIRENT_LUFID is set, LUFID data is copied
+ *             here
+ * @hinfo:	if non-NULL, receives the casefold hash and minor hash
+ *
+ * Reads any dirdata stored in @de.  If the dirdata feature is not enabled,
+ * falls back to reading the hash stored inline after the filename (for
+ * compatibility with the older casefold+fscrypt format).
+ *
+ * Returns a bitmask of EXT4_DIRENT_* flags indicating which fields were read.
  */
+unsigned char ext4_dirdata_get(struct ext4_dir_entry_2 *de, struct inode *dir,
+			       struct ext4_dirent_fid *dfid,
+			       struct dx_hash_info *hinfo)
+{
+	unsigned char ret = 0;
+	unsigned int data_offset = de->name_len + 1;
+
+	if (data_offset > de->rec_len)
+		return ret;
+
+	/* compatibility: hash stored inline after filename (no dirdata) */
+	if (hinfo && !ext4_has_feature_dirdata(dir->i_sb) &&
+	    ext4_hash_in_dirent(dir)) {
+		hinfo->hash = EXT4_DIRENT_HASH(de);
+		hinfo->minor_hash = EXT4_DIRENT_MINOR_HASH(de);
+		ret |= EXT4_DIRENT_CFHASH;
+
+		return ret;
+	}
+
+	/* EXT4_DIRENT_* are not expected without flag in i_sb */
+	if (de->file_type & EXT4_DIRENT_LUFID) {
+		struct ext4_dirent_fid *de_fid =
+			(struct ext4_dirent_fid *)(de->name + data_offset);
+		unsigned int dlen;
+
+		if (data_offset + sizeof(de_fid->df_header) > de->rec_len)
+			return ret;
+
+		dlen = de_fid->df_header.ddh_length;
+		if (dlen < sizeof(*de_fid) || data_offset + dlen > de->rec_len)
+			return ret;
+
+		if (dfid) {
+			memcpy(dfid, de_fid->df_fid, dlen);
+			ret |= EXT4_DIRENT_LUFID;
+		}
+		data_offset += dlen;
+	}
+
+	/* Skip INO64 for now */
+	if (de->file_type & EXT4_DIRENT_INO64) {
+		struct ext4_dirent_data_header *ddh =
+		       (struct ext4_dirent_data_header *)(de->name + data_offset);
+		unsigned int dlen;
+
+		if (data_offset + sizeof(*ddh) > de->rec_len)
+			return ret;
+
+		dlen = ddh->ddh_length;
+		if (dlen < sizeof(*ddh) || data_offset + dlen > de->rec_len)
+			return ret;
+
+		data_offset += dlen;
+	}
+
+	if (!hinfo)
+		return ret;
+
+	if (de->file_type & EXT4_DIRENT_CFHASH) {
+		struct ext4_dirent_hash *dh =
+			(struct ext4_dirent_hash *)(de->name + data_offset);
+		unsigned int dlen;
+
+		dlen = dh->dh_header.ddh_length;
+		if (dlen < sizeof(*dh) || data_offset + dlen > de->rec_len)
+			return ret;
+
+		hinfo->hash = le32_to_cpu(dh->dh_hash.hash);
+		hinfo->minor_hash = le32_to_cpu(dh->dh_hash.minor_hash);
+		ret |= EXT4_DIRENT_CFHASH;
+	}
+
+	return ret;
+}
+
+/*
+ * ext4_dirdata_set() - Write dirdata fields into a directory entry.
+ * @de:    directory entry (name must already be set)
+ * @dir:   directory inode
+ * @dfid:  LUFID data to store (or NULL)
+ * @fname: filename info carrying the casefold hash
+ *
+ * Writes any required dirdata into @de after the filename.  If the dirdata
+ * feature is not enabled, falls back to writing the hash inline after the
+ * filename (for compatibility with the older casefold+fscrypt format).
+ */
+static void ext4_dirdata_set(struct ext4_dir_entry_2 *de, struct inode *dir,
+			     struct ext4_dirent_fid *dfid,
+			     struct ext4_filename *fname)
+{
+	struct dx_hash_info *hinfo = &fname->hinfo;
+	unsigned int data_offset = de->name_len + 1;
+
+
+	if (dfid) {
+		unsigned int dlen = dfid->df_header.ddh_length;
+
+		if (data_offset + dlen > de->rec_len) {
+			EXT4_ERROR_INODE(dir, "Can not insert FID");
+			return;
+		}
+
+
+		de->name[de->name_len] = 0;
+		memcpy(&de->name[de->name_len + 1], dfid,
+		       dlen);
+		de->file_type |= EXT4_DIRENT_LUFID;
+		data_offset += dlen;
+	}
+
+	if (ext4_hash_in_dirent(dir)) {
+		if (ext4_has_feature_dirdata(dir->i_sb)) {
+			struct ext4_dirent_hash *dh =
+			    (struct ext4_dirent_hash *)(de->name + data_offset);
+
+			if (data_offset + sizeof(*dh) > de->rec_len) {
+				EXT4_ERROR_INODE(dir, "Can not insert dhash dirdata");
+				return;
+			}
+
+			dh->dh_header.ddh_length = sizeof(*dh);
+			dh->dh_hash.hash = cpu_to_le32(hinfo->hash);
+			dh->dh_hash.minor_hash = cpu_to_le32(hinfo->minor_hash);
+			de->file_type |= EXT4_DIRENT_CFHASH;
+		} else {
+			/* Compatibility: store hash inline after filename */
+			if (data_offset + sizeof(struct ext4_dir_entry_hash) >
+								de->rec_len) {
+				EXT4_ERROR_INODE(dir, "Can not insert dhash");
+				return;
+			}
+
+			EXT4_DIRENT_HASHES(de)->hash = cpu_to_le32(hinfo->hash);
+			EXT4_DIRENT_HASHES(de)->minor_hash =
+						cpu_to_le32(hinfo->minor_hash);
+		}
+	}
+}
+
+
 static int dx_make_map(struct inode *dir, struct buffer_head *bh,
 		       struct dx_hash_info *hinfo,
 		       struct dx_map_entry *map_tail)
@@ -1299,9 +1450,8 @@ static int dx_make_map(struct inode *dir, struct buffer_head *bh,
 					 ((char *)de) - base))
 			return -EFSCORRUPTED;
 		if (de->name_len && de->inode) {
-			if (ext4_hash_in_dirent(dir))
-				h.hash = EXT4_DIRENT_HASH(de);
-			else {
+			if (!(ext4_dirdata_get(de, dir, NULL, &h) &
+						EXT4_DIRENT_CFHASH)) {
 				int err = ext4fs_dirhash(dir, de->name,
 						     de->name_len, &h);
 				if (err < 0)
@@ -2089,13 +2239,7 @@ void ext4_insert_dentry_data(struct inode *dir, struct inode *inode,
 	ext4_set_de_type(inode->i_sb, de, inode->i_mode);
 	de->name_len = fname_len(fname);
 	memcpy(de->name, fname_name(fname), fname_len(fname));
-	if (ext4_hash_in_dirent(dir)) {
-		struct dx_hash_info *hinfo = &fname->hinfo;
-
-		EXT4_DIRENT_HASHES(de)->hash = cpu_to_le32(hinfo->hash);
-		EXT4_DIRENT_HASHES(de)->minor_hash =
-						cpu_to_le32(hinfo->minor_hash);
-	}
+	ext4_dirdata_set(de, dir, data, fname);
 }
 
 /*
-- 
2.43.7



* [PATCH v2 08/10] ext4: dirdata feature
From: Artem Blagodarenko @ 2026-06-10 15:24 UTC (permalink / raw)
  To: linux-ext4
  Cc: adilger.kernel, Artem Blagodarenko, Pravin Shelar, Andreas Dilger
In-Reply-To: <20260610152417.13576-1-ablagodarenko@thelustrecollective.com>

From: Artem Blagodarenko <artem.blagodarenko@gmail.com>

When fscrypt and casefold are enabled together for a directory,
all ext4_dir_entry[_2] in that directory store an 8-byte hash
of the filename after 'name' between 'name_len' and 'rec_len'.

However, there is no clear indication there is important data
stored in these bytes, which are only for padding and alignment
in other directory entries.  This adds complexity to code handling
the on-disk directory entries, and there is no provision for other
metadata to be stored in each dir entry after 'name'.

The dirdata feature adds a mechanism to store multiple metadata
entries in each dir entry after 'name' (including the casefold hash).
The unused high 4 bits of 'file_type' are used to indicate whether
additional data fields are stored after 'name'.  If a bit is set,
the corresponding dirdata record is present, starting after a NUL
filename terminator.  If present, a record starts with a 1-byte
length (including the length byte itself) and the data immediately
follows the length byte without any alignment.

This allows up to four different dirdata records to be stored in
each entry, and allows unhandled record bytes to be skipped without
having to process the contents, providing forward compatibility.

If and when the fourth and last dirdata record is needed, it is
recommended to subdivide it into sub-records: the first byte gives
the total record length, a second byte gives the length of the
first sub-record, and so on, as long as the total record length
is less than 255 bytes.  This would not affect compatibility with
the current code, since the total record length allows the whole
record to be skipped without processing.
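
To illustrate the skipping rule described above, here is a minimal
user-space sketch, not the kernel implementation.  It assumes one
presence bit per record in the high nibble of 'file_type' and that
records are laid out in increasing bit order; dirdata_total_len() and
the two bit macros are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: one presence bit per record in the high nibble of file_type,
 * with records laid out in increasing bit order. */
#define DIRDATA_FIRST_BIT 0x10u
#define DIRDATA_LAST_BIT  0x80u

/*
 * after_name points at the byte following the filename; avail is the number
 * of bytes from there to the end of the entry.  Returns the bytes used by
 * the NUL terminator plus all records, 0 if no record is present, or -1 if
 * a record is malformed or runs past the entry.
 */
static int dirdata_total_len(uint8_t file_type, const uint8_t *after_name,
			     size_t avail)
{
	size_t off = 1;		/* skip the NUL filename terminator */
	unsigned int bit;

	assert(after_name != NULL);
	if (!(file_type & 0xF0u))
		return 0;

	for (bit = DIRDATA_FIRST_BIT; bit <= DIRDATA_LAST_BIT; bit <<= 1) {
		uint8_t len;

		if (!(file_type & bit))
			continue;
		if (off >= avail)
			return -1;
		len = after_name[off];	/* length includes this byte */
		if (len == 0 || off + len > avail)
			return -1;
		off += len;	/* skip the payload without parsing it */
	}
	return (int)off;
}
```

The walker never interprets a payload, which is the forward
compatibility property: a record type unknown to the reader is stepped
over using only its length byte.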

Signed-off-by: Pravin Shelar <pravin.shelar@sun.com>
Signed-off-by: Artem Blagodarenko <artem.blagodarenko@gmail.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
---
 fs/ext4/ext4.h   | 27 +++++++++++++++++++++------
 fs/ext4/inline.c | 19 +++++++++++++++----
 fs/ext4/namei.c  | 43 +++++++++++++++++++++----------------------
 fs/ext4/sysfs.c  |  2 ++
 4 files changed, 59 insertions(+), 32 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 066c49fe3266..ef99e4fa99d7 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2248,6 +2248,7 @@ EXT4_FEATURE_INCOMPAT_FUNCS(casefold,		CASEFOLD)
 					 EXT4_FEATURE_INCOMPAT_FLEX_BG| \
 					 EXT4_FEATURE_INCOMPAT_EA_INODE| \
 					 EXT4_FEATURE_INCOMPAT_MMP | \
+					 EXT4_FEATURE_INCOMPAT_DIRDATA | \
 					 EXT4_FEATURE_INCOMPAT_INLINE_DATA | \
 					 EXT4_FEATURE_INCOMPAT_ENCRYPT | \
 					 EXT4_FEATURE_INCOMPAT_CASEFOLD | \
@@ -2949,10 +2950,18 @@ extern int ext4_find_dest_de(struct inode *dir, struct buffer_head *bh,
 			     struct ext4_filename *fname,
 			     struct ext4_dir_entry_2 **dest_de,
 			     int dlen);
-void ext4_insert_dentry(struct inode *dir, struct inode *inode,
-			struct ext4_dir_entry_2 *de,
-			int buf_size,
-			struct ext4_filename *fname);
+void ext4_insert_dentry_data(struct inode *dir, struct inode *inode,
+			     struct ext4_dir_entry_2 *de,
+			     int buf_size,
+			     struct ext4_filename *fname,
+			     void *data);
+static inline void ext4_insert_dentry(struct inode *dir, struct inode *inode,
+				      struct ext4_dir_entry_2 *de,
+				      int buf_size,
+				      struct ext4_filename *fname)
+{
+	ext4_insert_dentry_data(dir, inode, de, buf_size, fname, NULL);
+}
 static inline void ext4_update_dx_flag(struct inode *inode)
 {
 	if (!ext4_has_feature_dir_index(inode->i_sb) &&
@@ -3196,8 +3205,14 @@ extern int ext4_ext_migrate(struct inode *);
 extern int ext4_ind_migrate(struct inode *inode);
 
 /* namei.c */
-extern int ext4_init_new_dir(handle_t *handle, struct inode *dir,
-			     struct inode *inode);
+extern int ext4_init_new_dir_data(handle_t *handle, struct inode *dir,
+				  struct inode *inode,
+				  const void *data1, const void *data2);
+static inline int ext4_init_new_dir(handle_t *handle, struct inode *dir,
+				    struct inode *inode)
+{
+	return ext4_init_new_dir_data(handle, dir, inode, NULL, NULL);
+}
 extern int ext4_dirblock_csum_verify(struct inode *inode,
 				     struct buffer_head *bh);
 extern int ext4_htree_fill_tree(struct file *dir_file, __u32 start_hash,
diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
index 5b3faacdf143..c57a8ebe4f94 100644
--- a/fs/ext4/inline.c
+++ b/fs/ext4/inline.c
@@ -973,11 +973,16 @@ static int ext4_add_dirent_to_inline(handle_t *handle,
 				     struct ext4_iloc *iloc,
 				     void *inline_start, int inline_size)
 {
-	int		err;
+	int		err, dlen = 0;
 	struct ext4_dir_entry_2 *de;
+	unsigned char *data = NULL;
+
+	/* Deliver data in any appropriate way here. Now it is NULL */
+	if (data)
+		dlen = (*data) + 1;
 
 	err = ext4_find_dest_de(dir, iloc->bh, inline_start,
-				inline_size, fname, &de, 0);
+				inline_size, fname, &de, dlen);
 	if (err)
 		return err;
 
@@ -986,7 +991,7 @@ static int ext4_add_dirent_to_inline(handle_t *handle,
 					    EXT4_JTR_NONE);
 	if (err)
 		return err;
-	ext4_insert_dentry(dir, inode, de, inline_size, fname);
+	ext4_insert_dentry_data(dir, inode, de, inline_size, fname, NULL);
 
 	ext4_show_inline_dir(dir, iloc->bh, inline_start, inline_size);
 
@@ -1326,7 +1331,13 @@ int ext4_inlinedir_to_tree(struct file *dir_file,
 			pos = EXT4_INLINE_DOTDOT_SIZE;
 		} else {
 			de = (struct ext4_dir_entry_2 *)(dir_buf + pos);
-			pos += ext4_rec_len_from_disk(de->rec_len, inline_size);
+			/* Use ext4_dir_entry_len to account for dirdata extensions */
+			pos += ext4_dir_entry_len(de, dir);
+			/* Validate pos doesn't exceed buffer to prevent out-of-bounds read */
+			if (pos > inline_size) {
+				ret = count;
+				goto out;
+			}
 			if (ext4_check_dir_entry(inode, dir_file, de,
 					 iloc.bh, dir_buf,
 					 inline_size, pos)) {
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index cd20b1094134..40a3394f7eac 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -401,23 +401,24 @@ static struct dx_countlimit *get_dx_countlimit(struct inode *inode,
 {
 	struct ext4_dir_entry_2 *de;
 	struct dx_root_info *root;
-	int count_offset;
+	int count_offset, dotdot_rec_len;
 	int blocksize = EXT4_BLOCK_SIZE(inode->i_sb);
 	unsigned int rlen = ext4_rec_len_from_disk(dirent->rec_len, blocksize);
 
-	if (rlen == blocksize)
+	if (rlen == blocksize) {
 		count_offset = sizeof(struct dx_node);
-	else if (rlen == 12) {
-		de = (struct ext4_dir_entry_2 *)(((void *)dirent) + 12);
-		if (ext4_rec_len_from_disk(de->rec_len, blocksize) != blocksize - 12)
+	} else {
+		de = (struct ext4_dir_entry_2 *)(((char *)dirent) + rlen);
+		if (le16_to_cpu(de->rec_len) != (blocksize - rlen))
 			return NULL;
-		root = (struct dx_root_info *)(((void *)de + 12));
+		/* de->rec_len covers whole dx_root block, calculate actual length */
+		dotdot_rec_len = ext4_dir_entry_len(de, NULL);
+		root = (struct dx_root_info *)(((char *)de + dotdot_rec_len));
 		if (root->reserved_zero ||
 		    root->info_length != sizeof(struct dx_root_info))
 			return NULL;
-		count_offset = 32;
-	} else
-		return NULL;
+		count_offset = root->info_length + rlen + dotdot_rec_len;
+	}
 
 	if (offset)
 		*offset = count_offset;
@@ -698,7 +699,7 @@ static struct stats dx_show_leaf(struct inode *dir,
 				       (unsigned) ((char *) de - base));
 #endif
 			}
-			space += ext4_dir_rec_len(de->name_len, dir);
+			space += ext4_dir_entry_len(de, dir);
 			names++;
 		}
 		de = ext4_next_entry(de, size);
@@ -2068,13 +2069,10 @@ int ext4_find_dest_de(struct inode *dir, struct buffer_head *bh,
 	return 0;
 }
 
-void ext4_insert_dentry(struct inode *dir,
-			struct inode *inode,
-			struct ext4_dir_entry_2 *de,
-			int buf_size,
-			struct ext4_filename *fname)
+void ext4_insert_dentry_data(struct inode *dir, struct inode *inode,
+			     struct ext4_dir_entry_2 *de, int buf_size,
+			     struct ext4_filename *fname, void *data)
 {
-
 	int nlen, rlen;
 
 	nlen = ext4_dir_entry_len(de, dir);
@@ -2116,15 +2114,15 @@ static int add_dirent_to_buf(handle_t *handle, struct ext4_filename *fname,
 	unsigned int	blocksize = dir->i_sb->s_blocksize;
 	int		csum_size = 0;
 	int		err, err2, dlen = 0;
-	unsigned char	*data = NULL;
+	struct ext4_dirent_fid *dfid = NULL;
 
 	/* Deliver data in any appropriate way here. Now it is NULL */
 	if (ext4_has_feature_metadata_csum(inode->i_sb))
 		csum_size = sizeof(struct ext4_dir_entry_tail);
 
 	if (!de) {
-		if (data)
-			dlen = (*data) + 1;
+		if (dfid)
+			dlen = dfid->df_header.ddh_length;
 		err = ext4_find_dest_de(dir, bh, bh->b_data,
 					blocksize - csum_size, fname, &de, dlen);
 		if (err)
@@ -2139,7 +2137,7 @@ static int add_dirent_to_buf(handle_t *handle, struct ext4_filename *fname,
 	}
 
 	/* By now the buffer is marked for journaling */
-	ext4_insert_dentry(dir, inode, de, blocksize, fname);
+	ext4_insert_dentry_data(dir, inode, de, blocksize, fname, dfid);
 
 	/*
 	 * XXX shouldn't update any times until successful
@@ -2968,8 +2966,9 @@ int ext4_init_dirblock(handle_t *handle, struct inode *inode,
 	return ext4_handle_dirty_dirblock(handle, inode, bh);
 }
 
-int ext4_init_new_dir(handle_t *handle, struct inode *dir,
-			     struct inode *inode)
+int ext4_init_new_dir_data(handle_t *handle, struct inode *dir,
+			   struct inode *inode,
+			   const void *data1, const void *data2)
 {
 	struct buffer_head *dir_block = NULL;
 	ext4_lblk_t block = 0;
diff --git a/fs/ext4/sysfs.c b/fs/ext4/sysfs.c
index 923b375e017f..80074fb15ee9 100644
--- a/fs/ext4/sysfs.c
+++ b/fs/ext4/sysfs.c
@@ -362,6 +362,7 @@ EXT4_ATTR_FEATURE(verity);
 #endif
 EXT4_ATTR_FEATURE(metadata_csum_seed);
 EXT4_ATTR_FEATURE(fast_commit);
+EXT4_ATTR_FEATURE(dirdata);
 #if IS_ENABLED(CONFIG_UNICODE) && defined(CONFIG_FS_ENCRYPTION)
 EXT4_ATTR_FEATURE(encrypted_casefold);
 #endif
@@ -385,6 +386,7 @@ static struct attribute *ext4_feat_attrs[] = {
 #endif
 	ATTR_LIST(metadata_csum_seed),
 	ATTR_LIST(fast_commit),
+	ATTR_LIST(dirdata),
 #if IS_ENABLED(CONFIG_UNICODE) && defined(CONFIG_FS_ENCRYPTION)
 	ATTR_LIST(encrypted_casefold),
 #endif
-- 
2.43.7



* [PATCH v2 07/10] ext4: rename ext4_dir_rec_len() and clarify dirdata usage
From: Artem Blagodarenko @ 2026-06-10 15:24 UTC (permalink / raw)
  To: linux-ext4; +Cc: adilger.kernel, Artem Blagodarenko, Andreas Dilger
In-Reply-To: <20260610152417.13576-1-ablagodarenko@thelustrecollective.com>

From: Artem Blagodarenko <artem.blagodarenko@gmail.com>

Rename ext4_dir_rec_len() to ext4_dirent_rec_len() to better
reflect that it computes the record length for a directory
entry based on the provided name length.

Update the comment to clarify handling of dirdata-enabled
directories and document the use of ext4_dir_entry_len()
when dirdata is present.

No functional changes.
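
For reference, the record-length arithmetic can be sketched in user
space as follows.  This is an illustration, not the kernel code: the
4-byte rounding mask and the final masking step are not visible in the
hunks of this patch and are assumed here, as is the 8-byte size of the
stored hash.  dirent_rec_len() and the macros are hypothetical names.

```c
#include <assert.h>

#define DIR_PAD   4u			/* entries are 4-byte aligned */
#define DIR_ROUND (DIR_PAD - 1)
#define DIRENT_HASH_SIZE 8u		/* two 32-bit hash words */

static unsigned int dirent_rec_len(unsigned int name_len, int hash_in_dirent)
{
	/* 8 bytes of fixed header: inode, rec_len, name_len, file_type */
	unsigned int rec_len = name_len + 8 + DIR_ROUND;

	if (hash_in_dirent)
		rec_len += DIRENT_HASH_SIZE;
	return rec_len & ~DIR_ROUND;	/* round up to a multiple of 4 */
}

static void dirent_rec_len_examples(void)
{
	assert(dirent_rec_len(1, 0) == 12);	/* "." */
	assert(dirent_rec_len(2, 0) == 12);	/* ".." */
	assert(dirent_rec_len(5, 0) == 16);
	assert(dirent_rec_len(5, 1) == 24);
	/* name_len + dirdata can exceed 255, hence the wider parameter */
	assert(dirent_rec_len(255 + 20, 0) == 284);
}
```

The last case shows why the parameter is widened from __u8 to unsigned
int: once dirdata bytes are added to name_len, the sum no longer fits
in one byte.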

Signed-off-by: Artem Blagodarenko <artem.blagodarenko@gmail.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
---
 fs/ext4/dir.c    |  9 ++++-----
 fs/ext4/ext4.h   | 14 ++++++++++----
 fs/ext4/inline.c | 14 +++++++-------
 fs/ext4/namei.c  | 39 ++++++++++++++++++++++-----------------
 4 files changed, 43 insertions(+), 33 deletions(-)

diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c
index 17edd678fa87..012687822b82 100644
--- a/fs/ext4/dir.c
+++ b/fs/ext4/dir.c
@@ -89,16 +89,15 @@ int __ext4_check_dir_entry(const char *function, unsigned int line,
 	bool fake = is_fake_dir_entry(de);
 	bool has_csum = ext4_has_feature_metadata_csum(dir->i_sb);
 
-	if (unlikely(rlen < ext4_dir_rec_len(1, fake ? NULL : dir)))
+	if (unlikely(rlen < ext4_dirent_rec_len(1, fake ? NULL : dir)))
 		error_msg = "rec_len is smaller than minimal";
 	else if (unlikely(rlen % 4 != 0))
 		error_msg = "rec_len % 4 != 0";
-	else if (unlikely(rlen < ext4_dir_rec_len(de->name_len,
-							fake ? NULL : dir)))
+	else if (unlikely(rlen < ext4_dir_entry_len(de, fake ? NULL : dir)))
 		error_msg = "rec_len is too small for name_len";
 	else if (unlikely(next_offset > size))
 		error_msg = "directory entry overrun";
-	else if (unlikely(next_offset > size - ext4_dir_rec_len(1,
+	else if (unlikely(next_offset > size - ext4_dirent_rec_len(1,
 						  has_csum ? NULL : dir) &&
 			  next_offset != size))
 		error_msg = "directory entry too close to block end";
@@ -245,7 +244,7 @@ static int ext4_readdir(struct file *file, struct dir_context *ctx)
 				 * failure will be detected in the
 				 * dirent test below. */
 				if (ext4_rec_len_from_disk(de->rec_len,
-					sb->s_blocksize) < ext4_dir_rec_len(1,
+					sb->s_blocksize) < ext4_dirent_rec_len(1,
 									inode))
 					break;
 				i += ext4_rec_len_from_disk(de->rec_len,
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 45e90b8be9e8..066c49fe3266 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2530,11 +2530,16 @@ struct ext4_dirent_hash {
  * casefolded and encrypted need to store the hash as well, so we add room for
  * ext4_extended_dir_entry_2. For all entries related to '.' or '..' you should
  * pass NULL for dir, as those entries do not use the extra fields.
+ *
+ * For directories with the dirdata feature, extra data may follow the filename.
+ * Use ext4_dir_entry_len() to compute the length of a directory entry
+ * including any dirdata, or ext4_dirent_rec_len() directly when the total
+ * name_len (including dirdata length) is already known.
  */
-static inline unsigned int ext4_dir_rec_len(__u8 name_len,
+static inline unsigned int ext4_dirent_rec_len(unsigned int name_len,
 						const struct inode *dir)
 {
-	int rec_len = (name_len + 8 + EXT4_DIR_ROUND);
+	unsigned int rec_len = (name_len + 8 + EXT4_DIR_ROUND);
 
 	if (dir && ext4_hash_in_dirent(dir))
 		rec_len += sizeof(struct ext4_dir_entry_hash);
@@ -2942,7 +2947,8 @@ extern void ext4_htree_free_dir_info(struct dir_private_info *p);
 extern int ext4_find_dest_de(struct inode *dir, struct buffer_head *bh,
 			     void *buf, int buf_size,
 			     struct ext4_filename *fname,
-			     struct ext4_dir_entry_2 **dest_de);
+			     struct ext4_dir_entry_2 **dest_de,
+			     int dlen);
 void ext4_insert_dentry(struct inode *dir, struct inode *inode,
 			struct ext4_dir_entry_2 *de,
 			int buf_size,
@@ -4055,7 +4061,7 @@ static inline unsigned int ext4_dir_entry_len(struct ext4_dir_entry_2 *de,
 	unsigned int rec_len = ext4_rec_len_from_disk(de->rec_len, blocksize);
 	unsigned int dirdata = ext4_dirent_get_data_len(de, rec_len);
 
-	return ext4_dir_rec_len(de->name_len + dirdata, dir);
+	return ext4_dirent_rec_len(de->name_len + dirdata, dir);
 }
 
 extern const struct iomap_ops ext4_iomap_ops;
diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
index 8045e4ff270c..5b3faacdf143 100644
--- a/fs/ext4/inline.c
+++ b/fs/ext4/inline.c
@@ -977,7 +977,7 @@ static int ext4_add_dirent_to_inline(handle_t *handle,
 	struct ext4_dir_entry_2 *de;
 
 	err = ext4_find_dest_de(dir, iloc->bh, inline_start,
-				inline_size, fname, &de);
+				inline_size, fname, &de, 0);
 	if (err)
 		return err;
 
@@ -1055,7 +1055,7 @@ static int ext4_update_inline_dir(handle_t *handle, struct inode *dir,
 	int old_size = EXT4_I(dir)->i_inline_size - EXT4_MIN_INLINE_DATA_SIZE;
 	int new_size = get_max_inline_xattr_value_size(dir, iloc);
 
-	if (new_size - old_size <= ext4_dir_rec_len(1, NULL))
+	if (new_size - old_size <= ext4_dirent_rec_len(1, NULL))
 		return -ENOSPC;
 
 	ret = ext4_update_inline_data(handle, dir,
@@ -1309,7 +1309,7 @@ int ext4_inlinedir_to_tree(struct file *dir_file,
 			fake.name_len = 1;
 			memcpy(fake.name, ".", 2);
 			fake.rec_len = ext4_rec_len_to_disk(
-					  ext4_dir_rec_len(fake.name_len, NULL),
+					  ext4_dirent_rec_len(fake.name_len, NULL),
 					  inline_size);
 			ext4_set_de_type(inode->i_sb, &fake, S_IFDIR);
 			de = &fake;
@@ -1319,7 +1319,7 @@ int ext4_inlinedir_to_tree(struct file *dir_file,
 			fake.name_len = 2;
 			memcpy(fake.name, "..", 3);
 			fake.rec_len = ext4_rec_len_to_disk(
-					  ext4_dir_rec_len(fake.name_len, NULL),
+					  ext4_dirent_rec_len(fake.name_len, NULL),
 					  inline_size);
 			ext4_set_de_type(inode->i_sb, &fake, S_IFDIR);
 			de = &fake;
@@ -1427,8 +1427,8 @@ int ext4_read_inline_dir(struct file *file,
 	 * So we will use extra_offset and extra_size to indicate them
 	 * during the inline dir iteration.
 	 */
-	dotdot_offset = ext4_dir_rec_len(1, NULL);
-	dotdot_size = dotdot_offset + ext4_dir_rec_len(2, NULL);
+	dotdot_offset = ext4_dirent_rec_len(1, NULL);
+	dotdot_size = dotdot_offset + ext4_dirent_rec_len(2, NULL);
 	extra_offset = dotdot_size - EXT4_INLINE_DOTDOT_SIZE;
 	extra_size = extra_offset + inline_size;
 
@@ -1463,7 +1463,7 @@ int ext4_read_inline_dir(struct file *file,
 			 * failure will be detected in the
 			 * dirent test below. */
 			if (ext4_rec_len_from_disk(de->rec_len, extra_size)
-				< ext4_dir_rec_len(1, NULL))
+				< ext4_dirent_rec_len(1, NULL))
 				break;
 			i += ext4_rec_len_from_disk(de->rec_len,
 						    extra_size);
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 0635eac2de8d..cd20b1094134 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -522,10 +522,10 @@ ext4_next_entry(struct ext4_dir_entry_2 *p, unsigned long blocksize)
 static struct dx_root_info *dx_get_dx_info(void *de_buf)
 {
 	/* get dotdot first */
-	de_buf = de_buf + ext4_dir_rec_len(1, NULL);
+	de_buf += ext4_dir_entry_len(de_buf, NULL);
 
 	/* dx root info is after dotdot entry */
-	de_buf = de_buf + ext4_dir_rec_len(2, NULL);
+	de_buf += ext4_dir_entry_len(de_buf, NULL);
 
 	return (struct dx_root_info *)de_buf;
 }
@@ -588,7 +588,7 @@ static inline unsigned dx_root_limit(struct inode *dir,
 static inline unsigned dx_node_limit(struct inode *dir)
 {
 	unsigned int entry_space = dir->i_sb->s_blocksize -
-			ext4_dir_rec_len(0, dir);
+			ext4_dirent_rec_len(0, dir);
 
 	if (ext4_has_feature_metadata_csum(dir->i_sb))
 		entry_space -= sizeof(struct dx_tail);
@@ -1058,7 +1058,7 @@ static int htree_dirblock_to_tree(struct file *dir_file,
 	/* csum entries are not larger in the casefolded encrypted case */
 	top = (struct ext4_dir_entry_2 *) ((char *) de +
 					   dir->i_sb->s_blocksize -
-					   ext4_dir_rec_len(0,
+					   ext4_dirent_rec_len(0,
 							   csum ? NULL : dir));
 	/* Check if the directory is encrypted */
 	if (IS_ENCRYPTED(dir)) {
@@ -1852,7 +1852,7 @@ dx_move_dirents(struct inode *dir, char *from, char *to,
 	while (count--) {
 		struct ext4_dir_entry_2 *de = (struct ext4_dir_entry_2 *)
 						(from + (map->offs<<2));
-		rec_len = ext4_dir_rec_len(de->name_len, dir);
+		rec_len = ext4_dir_entry_len(de, dir);
 
 		memcpy (to, de, rec_len);
 		((struct ext4_dir_entry_2 *) to)->rec_len =
@@ -1885,7 +1885,7 @@ static struct ext4_dir_entry_2 *dx_pack_dirents(struct inode *dir, char *base,
 	while ((char*)de < base + blocksize) {
 		next = ext4_next_entry(de, blocksize);
 		if (de->inode && de->name_len) {
-			rec_len = ext4_dir_rec_len(de->name_len, dir);
+			rec_len = ext4_dir_entry_len(de, dir);
 			if (de > to)
 				memmove(to, de, rec_len);
 			to->rec_len = ext4_rec_len_to_disk(rec_len, blocksize);
@@ -2037,10 +2037,11 @@ static struct ext4_dir_entry_2 *do_split(handle_t *handle, struct inode *dir,
 int ext4_find_dest_de(struct inode *dir, struct buffer_head *bh,
 		      void *buf, int buf_size,
 		      struct ext4_filename *fname,
-		      struct ext4_dir_entry_2 **dest_de)
+		      struct ext4_dir_entry_2 **dest_de,
+		      int dlen)
 {
 	struct ext4_dir_entry_2 *de;
-	unsigned short reclen = ext4_dir_rec_len(fname_len(fname), dir);
+	unsigned short reclen = ext4_dirent_rec_len(fname_len(fname) + dlen, dir);
 	int nlen, rlen;
 	unsigned int offset = 0;
 	char *top;
@@ -2053,7 +2054,7 @@ int ext4_find_dest_de(struct inode *dir, struct buffer_head *bh,
 			return -EFSCORRUPTED;
 		if (ext4_match(dir, fname, de))
 			return -EEXIST;
-		nlen = ext4_dir_rec_len(de->name_len, dir);
+		nlen = ext4_dir_entry_len(de, dir);
 		rlen = ext4_rec_len_from_disk(de->rec_len, buf_size);
 		if ((de->inode ? rlen - nlen : rlen) >= reclen)
 			break;
@@ -2076,7 +2077,7 @@ void ext4_insert_dentry(struct inode *dir,
 
 	int nlen, rlen;
 
-	nlen = ext4_dir_rec_len(de->name_len, dir);
+	nlen = ext4_dir_entry_len(de, dir);
 	rlen = ext4_rec_len_from_disk(de->rec_len, buf_size);
 	if (de->inode) {
 		struct ext4_dir_entry_2 *de1 =
@@ -2114,14 +2115,18 @@ static int add_dirent_to_buf(handle_t *handle, struct ext4_filename *fname,
 {
 	unsigned int	blocksize = dir->i_sb->s_blocksize;
 	int		csum_size = 0;
-	int		err, err2;
+	int		err, err2, dlen = 0;
+	unsigned char	*data = NULL;
 
+	/* Placeholder: no dirdata is supplied yet, so data stays NULL */
 	if (ext4_has_feature_metadata_csum(inode->i_sb))
 		csum_size = sizeof(struct ext4_dir_entry_tail);
 
 	if (!de) {
+		if (data)
+			dlen = (*data) + 1;
 		err = ext4_find_dest_de(dir, bh, bh->b_data,
-					blocksize - csum_size, fname, &de);
+					blocksize - csum_size, fname, &de, dlen);
 		if (err)
 			return err;
 	}
@@ -2930,7 +2935,7 @@ int ext4_init_dirblock(handle_t *handle, struct inode *inode,
 
 	de->inode = cpu_to_le32(inode->i_ino);
 	de->name_len = 1;
-	de->rec_len = ext4_rec_len_to_disk(ext4_dir_rec_len(de->name_len, NULL),
+	de->rec_len = ext4_rec_len_to_disk(ext4_dirent_rec_len(de->name_len, NULL),
 					   blocksize);
 	memcpy(de->name, ".", 2);
 	ext4_set_de_type(inode->i_sb, de, S_IFDIR);
@@ -2942,7 +2947,7 @@ int ext4_init_dirblock(handle_t *handle, struct inode *inode,
 	ext4_set_de_type(inode->i_sb, de, S_IFDIR);
 	if (inline_buf) {
 		de->rec_len = ext4_rec_len_to_disk(
-					ext4_dir_rec_len(de->name_len, NULL),
+					ext4_dirent_rec_len(de->name_len, NULL),
 					blocksize);
 		de = ext4_next_entry(de, blocksize);
 		header_size = (char *)de - bh->b_data;
@@ -2951,7 +2956,7 @@ int ext4_init_dirblock(handle_t *handle, struct inode *inode,
 			blocksize - csum_size);
 	} else {
 		de->rec_len = ext4_rec_len_to_disk(blocksize -
-					(csum_size + ext4_dir_rec_len(1, NULL)),
+					(csum_size + ext4_dirent_rec_len(1, NULL)),
 					blocksize);
 	}
 
@@ -3074,8 +3079,8 @@ bool ext4_empty_dir(struct inode *inode)
 	}
 
 	sb = inode->i_sb;
-	if (inode->i_size < ext4_dir_rec_len(1, NULL) +
-					ext4_dir_rec_len(2, NULL)) {
+	if (inode->i_size < ext4_dirent_rec_len(1, NULL) +
+					ext4_dirent_rec_len(2, NULL)) {
 		EXT4_ERROR_INODE(inode, "invalid size");
 		return false;
 	}
-- 
2.43.7
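
The dlen argument added to ext4_find_dest_de() in the diff above enlarges the record length a new entry asks for, so an existing slot is only reused when its slack covers both the name and the dirdata. A minimal userspace sketch of that fit test follows. The names sketch_rec_len() and sketch_slot_fits() are hypothetical, and the 8-byte header with 4-byte rounding is an assumption about the plain (non-casefold, non-encrypted) case, not the kernel's exact helper.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for the record-length helper: assumes an 8-byte
 * fixed header and rounding up to a 4-byte boundary, with no hash space.
 */
static int sketch_rec_len(int payload_len)
{
	return (payload_len + 8 + 3) & ~3;
}

/*
 * Mirror of the fit test in the search loop:
 *   in_use   - the existing entry has a non-zero inode number
 *   used_len - bytes the existing entry really needs (nlen in the diff)
 *   rec_len  - decoded rec_len of the existing entry (rlen in the diff)
 *   name_len - length of the new name
 *   dlen     - dirdata bytes to store after the new name (0 if none)
 */
static bool sketch_slot_fits(bool in_use, int used_len, int rec_len,
			     int name_len, int dlen)
{
	int need = sketch_rec_len(name_len + dlen);
	/* A live entry keeps its own bytes; a free entry gives up everything. */
	int avail = in_use ? rec_len - used_len : rec_len;

	return avail >= need;
}
```

With these assumptions, a live 12-byte entry occupying a 32-byte record leaves 20 bytes of slack. That is enough for a 5-character name on its own (16 bytes after rounding) but not for the same name with 9 bytes of dirdata (24 bytes after rounding).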


* [PATCH v2 06/10] ext4: add ext4_dir_entry_len() and harden dirdata parsing
From: Artem Blagodarenko @ 2026-06-10 15:24 UTC (permalink / raw)
  To: linux-ext4; +Cc: adilger.kernel, Artem Blagodarenko, Andreas Dilger
In-Reply-To: <20260610152417.13576-1-ablagodarenko@thelustrecollective.com>

From: Artem Blagodarenko <artem.blagodarenko@gmail.com>

Introduce ext4_dir_entry_len() helper to compute the required
rec_len for a directory entry, taking into account dirdata and
casefold+fscrypt hash space.

Convert ext4_dirent_get_data_len() to take the decoded rec_len
as an argument and add bounds checking when walking dirdata
extensions to avoid overruns on malformed entries.

Update dx_root_limit() to take the '.' entry and locate the
dx_root_info with dx_get_dx_info() instead of open-coding
ext4_dir_rec_len() sums for the '.' and '..' entries.

Signed-off-by: Artem Blagodarenko <artem.blagodarenko@gmail.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
---
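
The bounds checking described above can be sketched in userspace roughly as follows. This is an illustration under stated assumptions, not the kernel code: sketch_dirdata_len() and the SKETCH_* constants are hypothetical names, the entry is treated as a raw byte array, and each dirdata extension is assumed to start with a one-byte length that counts itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the on-disk layout; not kernel definitions. */
#define SKETCH_BASE_DIR_LEN 8u    /* inode(4) + rec_len(2) + name_len(1) + file_type(1) */
#define SKETCH_FT_MASK      0x0fu /* low nibble: file type, high nibble: dirdata flags */

/*
 * Sum the dirdata extensions of one raw directory entry without reading
 * past rec_len.  The NUL after the name is counted once, with the first
 * extension.  Returns the length gathered so far when an extension is
 * malformed.
 */
static unsigned int sketch_dirdata_len(const uint8_t *entry, unsigned int rec_len)
{
	assert(entry != NULL);

	uint8_t name_len = entry[6];
	uint8_t flags = (uint8_t)((entry[7] & ~SKETCH_FT_MASK) >> 4);
	/* offset from the start of the entry to just past name + NUL */
	unsigned int offset = SKETCH_BASE_DIR_LEN + name_len + 1;
	unsigned int dlen = 0;

	/* nothing can follow the name if the record ends here */
	if (offset >= rec_len)
		return 0;

	while (flags) {
		if (flags & 1) {
			uint8_t len;

			/* the length byte itself must lie inside the record */
			if (offset + 1 > rec_len)
				return dlen;
			len = entry[offset];
			/* reject empty extensions and ones that overrun the record */
			if (len == 0 || len > rec_len - offset)
				return dlen;
			dlen += len + (dlen == 0);
			offset += len;
		}
		flags >>= 1;
	}
	return dlen;
}
```

In this sketch, a 16-byte entry named "ab" with one 5-byte extension yields 6 (the extension plus the NUL), while the same entry with a corrupted length byte of 200 yields 0 instead of walking off the end of the record.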
 fs/ext4/ext4.h  | 45 ++++++++++++++++++++++++++++++++++++++++++---
 fs/ext4/namei.c | 23 +++++++++++++++--------
 2 files changed, 57 insertions(+), 11 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index f833f6ef0040..45e90b8be9e8 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -3988,6 +3988,7 @@ static inline bool ext4_dir_entry_is_tail(struct ext4_dir_entry_2 *de)
 /*
  * ext4_dirent_get_data_len() - Compute the total dirdata length for an entry.
  * @de: directory entry
+ * @rec_len: the record length of the directory entry (decoded)
  *
  * Computes the length of optional data stored after the filename (and its
  * implicit NUL terminator).  Each extension is indicated by a bit in the
@@ -3996,22 +3997,41 @@ static inline bool ext4_dir_entry_is_tail(struct ext4_dir_entry_2 *de)
  *
  * Returns 0 for tail entries and for entries with no dirdata.
  */
-static inline int ext4_dirent_get_data_len(struct ext4_dir_entry_2 *de)
+static inline int ext4_dirent_get_data_len(struct ext4_dir_entry_2 *de,
+					   unsigned int rec_len)
 {
 	__u8 extra_data_flags;
 	struct ext4_dirent_data_header *ddh;
 	int dlen = 0;
+	unsigned int offset;
 
 	if (ext4_dir_entry_is_tail(de))
 		return 0;
 
 	extra_data_flags = (de->file_type & ~EXT4_FT_MASK) >> 4;
-	ddh = (struct ext4_dirent_data_header *)(de->name + de->name_len +
-						 1 /* NUL terminator */);
+	/* offset from start of entry to after filename + NUL */
+	offset = EXT4_BASE_DIR_LEN + de->name_len + 1;
 
+	/* bounds check: ensure we start reading within the entry */
+	if (offset >= rec_len)
+		return 0;
+
+	ddh = (struct ext4_dirent_data_header *)((char *)de + offset);
+
 	while (extra_data_flags) {
 		if (extra_data_flags & 1) {
+			/* bounds check before reading ddh_length */
+			if (offset + sizeof(*ddh) >
+			    rec_len)
+				return dlen;
+
+			/* validate ddh_length is reasonable */
+			if (ddh->ddh_length == 0 || ddh->ddh_length >
+			    rec_len - offset)
+				return dlen;
+
 			dlen += ddh->ddh_length + (dlen == 0);
+			offset += ddh->ddh_length;
 			ddh = ext4_dirdata_next(ddh);
 		}
 		extra_data_flags >>= 1;
@@ -4019,6 +4039,25 @@ static inline int ext4_dirent_get_data_len(struct ext4_dir_entry_2 *de)
 	return dlen;
 }
 
+/*
+ * ext4_dir_entry_len() - Compute the required rec_len for a directory entry.
+ * @de:  directory entry (used to read name_len and any dirdata length)
+ * @dir: directory inode (may be NULL for '.' and '..' entries)
+ *
+ * Returns the minimum record length needed to hold @de, rounded up to the
+ * directory alignment and including room for the casefold+fscrypt hash if
+ * the directory requires it.
+ */
+static inline unsigned int ext4_dir_entry_len(struct ext4_dir_entry_2 *de,
+					      const struct inode *dir)
+{
+	unsigned int blocksize = (dir && dir->i_sb) ? dir->i_sb->s_blocksize : 4096;
+	unsigned int rec_len = ext4_rec_len_from_disk(de->rec_len, blocksize);
+	unsigned int dirdata = ext4_dirent_get_data_len(de, rec_len);
+
+	return ext4_dir_rec_len(de->name_len + dirdata, dir);
+}
+
 extern const struct iomap_ops ext4_iomap_ops;
 extern const struct iomap_ops ext4_iomap_report_ops;
 
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 87d8cd2c6377..0635eac2de8d 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -570,11 +570,15 @@ static inline void dx_set_limit(struct dx_entry *entries, unsigned value)
 	((struct dx_countlimit *) entries)->limit = cpu_to_le16(value);
 }
 
-static inline unsigned dx_root_limit(struct inode *dir, unsigned infosize)
+static inline unsigned dx_root_limit(struct inode *dir,
+	struct ext4_dir_entry_2 *dot_de)
 {
-	unsigned int entry_space = dir->i_sb->s_blocksize -
-			ext4_dir_rec_len(1, NULL) -
-			ext4_dir_rec_len(2, NULL) - infosize;
+	struct dx_root_info *info;
+	unsigned int entry_space;
+
+	info = dx_get_dx_info(dot_de);
+	entry_space = dir->i_sb->s_blocksize - ((char *)info - (char *)dot_de) -
+		info->info_length;
 
 	if (ext4_has_feature_metadata_csum(dir->i_sb))
 		entry_space -= sizeof(struct dx_tail);
@@ -850,10 +854,13 @@ dx_probe(struct ext4_filename *fname, struct inode *dir,
 
 	entries = (struct dx_entry *)(((char *)info) + info->info_length);
 
-	if (dx_get_limit(entries) != dx_root_limit(dir, info->info_length)) {
+	if (dx_get_limit(entries) !=
+	    dx_root_limit(dir, (struct ext4_dir_entry_2 *)frame->bh->b_data)) {
 		ext4_warning_inode(dir, "dx entry: limit %u != root limit %u",
 				   dx_get_limit(entries),
-				   dx_root_limit(dir, info->info_length));
+				   dx_root_limit(dir,
+				   (struct ext4_dir_entry_2 *)frame->bh->b_data
+				   ));
 		goto fail;
 	}
 
@@ -2278,10 +2285,10 @@ static int make_indexed_dir(handle_t *handle, struct ext4_filename *fname,
 		dx_info->hash_version =
 				EXT4_SB(dir->i_sb)->s_def_hash_version;
 
-	entries = (void *)dx_info + sizeof(*dx_info);
+	entries = (void *)dx_info + dx_info->info_length;
 	dx_set_block(entries, 1);
 	dx_set_count(entries, 1);
-	dx_set_limit(entries, dx_root_limit(dir, sizeof(*dx_info)));
+	dx_set_limit(entries, dx_root_limit(dir, dot_de));
 
 	/* Initialize as for dx_probe */
 	fname->hinfo.hash_version = dx_info->hash_version;
-- 
2.43.7

