Linux EXT4 FS development
* Re: [PATCH v7 3/4] ext4: introduce ext4_put_ea_inode() for safe deferred iput
From: Jan Kara @ 2026-06-22 10:44 UTC (permalink / raw)
  To: Zhou, Yun
  Cc: Jan Kara, tytso, adilger.kernel, libaokun, ojaswin, ritesh.list,
	yi.zhang, linux-ext4, linux-kernel
In-Reply-To: <f9c2c9a3-c68d-4c1c-b399-656068ef472e@windriver.com>

On Mon 22-06-26 18:06:23, Zhou, Yun wrote:
> Hi Honza,
> 
> On 6/18/26 02:42, Jan Kara wrote:
> > 
> > Allocating ext4_ea_iput_entry for dropping each inode is somewhat wasteful.
> > I want to suggest another scheme (somewhat more involved but more efficient
> > scheme):
> > 
> > 1) Create a VFS helper bool iput_if_not_last(struct inode *inode) which
> > drops inode reference if it is not the last one (and returns true in that
> > case). Basically:
> > 
> > bool iput_if_not_last(struct inode *inode)
> > {
> >          return atomic_add_unless(&inode->i_count, -1, 1);
> > }
> > 
> > This needs to be a separate patch as it should get vetting from VFS
> > maintainers.
> After taking a closer look, it seems that this function doesn't need to be
> added to the VFS layer — at least not for now. For example, we could
> directly inline atomic_add_unless() into ext4_put_ea_inode():
> 
> void ext4_put_ea_inode(struct super_block *sb, struct inode *inode)
> {
>     if (!inode)
>         return;
>     if (atomic_add_unless(&inode->i_count, -1, 1))
>         return;
>     llist_add(&EXT4_I(inode)->i_ea_iput_node,
>         &EXT4_SB(sb)->s_ea_inode_to_free);
>     schedule_delayed_work(&EXT4_SB(sb)->s_ea_inode_work, 1);
> }
> 
> This way, we avoid submitting an isolated patch to the VFS layer that
> currently has only one user (ext4). If there is indeed a real need for it in
> the future, we can always submit a follow-up patch to refactor it.
> 
> What do you think?

No, please provide a proper helper in VFS for this. We don't really want
filesystems to play games with inode refcount without proper abstraction in
VFS. That causes longterm maintenance issues.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

* Re: [PATCH v7 3/4] ext4: introduce ext4_put_ea_inode() for safe deferred iput
From: Zhou, Yun @ 2026-06-22 10:06 UTC (permalink / raw)
  To: Jan Kara
  Cc: tytso, adilger.kernel, libaokun, ojaswin, ritesh.list, yi.zhang,
	linux-ext4, linux-kernel
In-Reply-To: <jxcbsd2ot63wy3dcoximemkuitwoqn2a7jgxcsfdwaf5q3ecdu@sahahqqopo6y>

Hi Honza,

On 6/18/26 02:42, Jan Kara wrote:
> 
> Allocating ext4_ea_iput_entry for dropping each inode is somewhat wasteful.
> I want to suggest another scheme (somewhat more involved but more efficient
> scheme):
> 
> 1) Create a VFS helper bool iput_if_not_last(struct inode *inode) which
> drops inode reference if it is not the last one (and returns true in that
> case). Basically:
> 
> bool iput_if_not_last(struct inode *inode)
> {
>          return atomic_add_unless(&inode->i_count, -1, 1);
> }
> 
> This needs to be a separate patch as it should get vetting from VFS
> maintainers.
After taking a closer look, it seems that this function doesn't need to 
be added to the VFS layer — at least not for now. For example, we could 
directly inline atomic_add_unless() into ext4_put_ea_inode():

void ext4_put_ea_inode(struct super_block *sb, struct inode *inode)
{
     if (!inode)
         return;
     if (atomic_add_unless(&inode->i_count, -1, 1))
         return;
     llist_add(&EXT4_I(inode)->i_ea_iput_node,
         &EXT4_SB(sb)->s_ea_inode_to_free);
     schedule_delayed_work(&EXT4_SB(sb)->s_ea_inode_work, 1);
}

This way, we avoid submitting an isolated patch to the VFS layer that 
currently has only one user (ext4). If there is indeed a real need for 
it in the future, we can always submit a follow-up patch to refactor it.

What do you think?

BR,
Yun

* Re: [PATCH 0/2] tracing: Move trace_printk.h out of kernel.h
From: Peter Zijlstra @ 2026-06-22  8:34 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, linux-trace-kernel, Masami Hiramatsu, Mark Rutland,
	Mathieu Desnoyers, Andrew Morton, Linus Torvalds,
	Sebastian Andrzej Siewior, John Ogness, Thomas Gleixner,
	Julia Lawall, Yury Norov, linux-doc, linux-kbuild, linuxppc-dev,
	dri-devel, linux-stm32, linux-arm-kernel, linux-rdma, linux-usb,
	linux-ext4, linux-nfs, kvm, intel-gfx
In-Reply-To: <20260621093430.264983361@kernel.org>

On Sun, Jun 21, 2026 at 05:34:30AM -0400, Steven Rostedt wrote:
> There's been complaints about trace_printk() being defined in kernel.h as it
> can increase the compilation time. As it is only used by some developers for
> debugging purposes, it should not be in kernel.h causing lots of wasted CPU
> cycles for those that do not ever care about it.
> 
> Instead, add a CONFIG_TRACE_PRINTK_DEBUGGING option that developers that do
> use it can set and not have to always remember to add #include <linux/trace_printk.h>
> to the files they add trace_printk() while debugging. It also means that
> those that do not have that config set will not have to worry about wasted
> CPU cycles as it is only included in the CFLAGS when the option is set, and
> it's completely ignored otherwise.

Did you forget your C 101 class? If you use a function, you gotta
include the relevant header.

You don't see userspace saying: 'Hey, you know what, perhaps we should
add stdio.h to every other header, just in case someone wants to
printf()' either.

I really don't understand your argument. Yes, maybe someone will forget
and then either their editor (if they have a halfway modern setup with
LSP enabled) or their build will complain, but so what? This is all
trivial stuff, surely we have more pressing matters to concern ourselves
with?

* Re: [PATCH 0/2] tracing: Move trace_printk.h out of kernel.h
From: Steven Rostedt @ 2026-06-22  8:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-trace-kernel, Masami Hiramatsu, Mark Rutland,
	Mathieu Desnoyers, Andrew Morton, Linus Torvalds,
	Sebastian Andrzej Siewior, John Ogness, Thomas Gleixner,
	Julia Lawall, Yury Norov, linux-doc, linux-kbuild, linuxppc-dev,
	dri-devel, linux-stm32, linux-arm-kernel, linux-rdma, linux-usb,
	linux-ext4, linux-nfs, kvm, intel-gfx
In-Reply-To: <20260622083440.GX49951@noisy.programming.kicks-ass.net>

On Mon, 22 Jun 2026 10:34:40 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> Did you forget your C 101 class? If you use a function, you gotta
> include the relevant header.

If this was the way it was back in 2009, yeah sure. But the header
wasn't needed for 17 years. Now it suddenly will be.

-- Steve

* Re: [PATCH v7 3/4] ext4: introduce ext4_put_ea_inode() for safe deferred iput
From: Zhou, Yun @ 2026-06-22  8:47 UTC (permalink / raw)
  To: Jan Kara
  Cc: tytso, adilger.kernel, libaokun, ojaswin, ritesh.list, yi.zhang,
	linux-ext4, linux-kernel
In-Reply-To: <tss5x73b6cigpsmi4yvckahikktacmtsphsupa6kysr2i5bsst@laxukmhkmv22>

On 6/22/26 16:32, Jan Kara wrote:
> On Fri 19-06-26 14:24:51, Zhou, Yun wrote:
>> Your idea makes a lot of sense. It greatly simplifies the current deferred
>> iput logic and eliminates the risk of failing to allocate an entry during
>> an OOM. However, as you mentioned, getting the VFS maintainers to agree
>> might be quite challenging.
> VFS maintainers don't bite, in the worst case they'd disagree :) For a fact
> I'm one of VFS reviewers and I think iput_if_not_last() idea is an
> acceptable one. So I think it's worth trying.
Thanks a lot for the strong support, I'd be happy to give it a try. For now,
please just ignore the v8 patch series, as the AI reviewer actually found some
issues with it as well.

BR,
Yun

* Re: [PATCH v7 3/4] ext4: introduce ext4_put_ea_inode() for safe deferred iput
From: Jan Kara @ 2026-06-22  8:32 UTC (permalink / raw)
  To: Zhou, Yun
  Cc: Jan Kara, tytso, adilger.kernel, libaokun, ojaswin, ritesh.list,
	yi.zhang, linux-ext4, linux-kernel
In-Reply-To: <dd9e35e6-306b-4e49-9802-487ce7abd63c@windriver.com>

On Fri 19-06-26 14:24:51, Zhou, Yun wrote:
> On 6/18/2026 2:42 AM, Jan Kara wrote:
> > > +static void ext4_xattr_inode_array_free_deferred(struct super_block *sb,
> > > +                             struct ext4_xattr_inode_array *array)
> > 
> > The array of EA inodes used in xattr handling is just another mechanism
> > used for delaying iput() of EA inodes. It doesn't make sense to stack these
> > one on top of another. Just completely replace the array mechanism with
> > always deferring iput of EA inode into the workqueue.
> > 
> I'm thinking that a complete replacement might be too large a change. Should
> we consider postponing this work, or perhaps appending a new patch to this
> series to handle it?

Do one patch to implement the new framework for delayed EA iputs. Then
maybe one patch to convert existing iputs() of EA inodes to the new
delaying framework and then one patch to convert users of the current
'array to release' mechanism to the new delaying framework.

> > Allocating ext4_ea_iput_entry for dropping each inode is somewhat wasteful.
> > I want to suggest another scheme (somewhat more involved but more efficient
> > scheme):
> > 
> > 1) Create a VFS helper bool iput_if_not_last(struct inode *inode) which
> > drops inode reference if it is not the last one (and returns true in that
> > case). Basically:
> > 
> > bool iput_if_not_last(struct inode *inode)
> > {
> >          return atomic_add_unless(&inode->i_count, -1, 1);
> > }
> > 
> > This needs to be a separate patch as it should get vetting from VFS
> > maintainers.
> > 
> > 2) Use iput_if_not_last() in ext4_put_ea_inode(). If it returns true, we
> > are done. Otherwise we know we were at least for a moment holders of the
> > last inode reference, so we link the inode to the list of inodes to drop
> > through llist_node embedded in ext4_inode_info. We cannot race with anybody
> > else trying to link the same inode into the list because we hold one inode
> > ref and so nobody else can hit this "I was holding the last ref" path.
> > I'd union this llist_node say with xattr_sem which is unused for EA inodes
> > to avoid growing ext4_inode_info.
> > 
> > This way we avoid offloading unless really necessary and we don't have to
> > do allocations just to drop EA inode ref.
> 
> Your idea makes a lot of sense. It greatly simplifies the current deferred
> iput logic and eliminates the risk of failing to allocate an entry during
> an OOM. However, as you mentioned, getting the VFS maintainers to agree
> might be quite challenging.

VFS maintainers don't bite, in the worst case they'd disagree :) For a fact
I'm one of VFS reviewers and I think iput_if_not_last() idea is an
acceptable one. So I think it's worth trying.
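
The scheme from steps 1) and 2) quoted above can be modeled in userspace
with C11 atomics. This is only a rough sketch under stated assumptions, not
the proposed kernel code: struct model_inode, struct deferred_list and all
model_* / deferred_* names are hypothetical stand-ins for struct inode, the
kernel llist and the helpers discussed in this thread. It shows that the
reference is dropped inline when other holders exist, and that the inode is
queued (still holding its reference) when the caller might hold the last one.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct inode: a refcount plus a list link. */
struct model_inode {
	atomic_int i_count;
	struct model_inode *next_deferred;
};

/* Hypothetical push-only lock-free list, modeling a kernel llist_head. */
struct deferred_list {
	_Atomic(struct model_inode *) first;
};

/*
 * Add @delta to @v unless it currently equals @unless. Returns true if the
 * add happened. Models the semantics of the kernel's atomic_add_unless().
 */
static bool model_add_unless(atomic_int *v, int delta, int unless)
{
	int old = atomic_load(v);

	while (old != unless) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old + delta))
			return true;
	}
	return false;
}

/* Drop a reference only if it is not the last one. */
static bool model_iput_if_not_last(struct model_inode *inode)
{
	return model_add_unless(&inode->i_count, -1, 1);
}

/* Push @inode onto @list. Returns true if the list was empty before. */
static bool deferred_push(struct deferred_list *list,
			  struct model_inode *inode)
{
	struct model_inode *first = atomic_load(&list->first);

	do {
		inode->next_deferred = first;
	} while (!atomic_compare_exchange_weak(&list->first, &first, inode));
	return first == NULL;
}

/*
 * Returns true if the reference was dropped inline, false if the inode was
 * queued. A queued inode keeps its reference; in the real scheme a worker
 * would call iput() on it later, outside the problematic context.
 */
static bool model_put_ea_inode(struct deferred_list *list,
			       struct model_inode *inode)
{
	assert(inode != NULL);
	if (model_iput_if_not_last(inode))
		return true;
	deferred_push(list, inode);
	return false;
}
```

Because the caller still owns one reference while linking the inode, no
second caller can reach the queueing path for the same inode, which is the
property step 2) relies on.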

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

* Re: [PATCH 0/2] tracing: Move trace_printk.h out of kernel.h
From: Christophe Leroy (CS GROUP) @ 2026-06-22  8:05 UTC (permalink / raw)
  To: Steven Rostedt, linux-kernel, linux-trace-kernel
  Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Linus Torvalds, Sebastian Andrzej Siewior, John Ogness,
	Thomas Gleixner, Peter Zijlstra, Julia Lawall, Yury Norov,
	linux-doc, linux-kbuild, linuxppc-dev, dri-devel, linux-stm32,
	linux-arm-kernel, linux-rdma, linux-usb, linux-ext4, linux-nfs,
	kvm, intel-gfx
In-Reply-To: <20260621093430.264983361@kernel.org>



Le 21/06/2026 à 11:34, Steven Rostedt a écrit :
> There's been complaints about trace_printk() being defined in kernel.h as it
> can increase the compilation time. As it is only used by some developers for
> debugging purposes, it should not be in kernel.h causing lots of wasted CPU
> cycles for those that do not ever care about it.

Do we have a measurement of the increased compilation time ?

Christophe

> 
> Instead, add a CONFIG_TRACE_PRINTK_DEBUGGING option that developers that do
> use it can set and not have to always remember to add #include <linux/trace_printk.h>
> to the files they add trace_printk() while debugging. It also means that
> those that do not have that config set will not have to worry about wasted
> CPU cycles as it is only included in the CFLAGS when the option is set, and
> it's completely ignored otherwise.
> 
> Steven Rostedt (2):
>        tracing: Move non-trace_printk prototypes back to kernel.h
>        tracing: Add CONFIG_TRACE_PRINTK_DEBUGGING to clean up kernel.h
> 
> ----
>   .../driver_development_debugging_guide.rst         |  2 +-
>   Makefile                                           |  5 +++++
>   arch/powerpc/kvm/book3s_xics.c                     |  1 +
>   drivers/gpu/drm/i915/gt/intel_gtt.h                |  1 +
>   drivers/gpu/drm/i915/i915_gem.h                    |  1 +
>   drivers/hwtracing/stm/dummy_stm.c                  |  4 ++++
>   drivers/infiniband/hw/hfi1/trace_dbg.h             |  1 +
>   drivers/usb/early/xhci-dbc.c                       |  1 +
>   fs/ext4/inline.c                                   |  1 +
>   include/linux/kernel.h                             | 19 ++++++++++++++++++-
>   include/linux/sunrpc/debug.h                       |  1 +
>   include/linux/trace_printk.h                       | 22 +++-------------------
>   kernel/trace/Kconfig                               | 10 ++++++++++
>   kernel/trace/ring_buffer_benchmark.c               |  1 +
>   kernel/trace/trace.h                               |  1 +
>   samples/fprobe/fprobe_example.c                    |  1 +
>   samples/ftrace/ftrace-direct-modify.c              |  1 +
>   samples/ftrace/ftrace-direct-multi-modify.c        |  1 +
>   samples/ftrace/ftrace-direct-multi.c               |  2 +-
>   samples/ftrace/ftrace-direct-too.c                 |  2 +-
>   samples/ftrace/ftrace-direct.c                     |  2 +-
>   21 files changed, 56 insertions(+), 24 deletions(-)
> 


* Re: [PATCH v4 17/23] ext4: submit zeroed post-EOF data immediately in the iomap buffered I/O path
From: Zhang Yi @ 2026-06-22  3:37 UTC (permalink / raw)
  To: Jan Kara, Zhang Yi
  Cc: linux-ext4, linux-fsdevel, linux-kernel, tytso, adilger.kernel,
	libaokun, ojaswin, ritesh.list, djwong, hch, yi.zhang, yangerkun,
	yukuai
In-Reply-To: <g6jho4eosvuwuaw6sxzvyrahl43vbhuznqtr2fbd2nhukd6a3v@fx5udwaepwrx>

On 6/18/2026 8:59 PM, Jan Kara wrote:
> On Mon 11-05-26 15:23:37, Zhang Yi wrote:
>> From: Zhang Yi <yi.zhang@huawei.com>
>>
>> In the generic buffer_head I/O path, we rely on the data=ordered mode to
>> ensure that the zeroed EOF block data is written before updating
>> i_disksize, thus preventing stale data from being exposed.
>>
>> However, the iomap buffered I/O path cannot use this mechanism. Instead,
>> we issue the I/O immediately after performing the zero operation
>> (without synchronous waiting for performance). This can reduce the risk
>> of exposing stale data, but it does not guarantee that the zero data
>> will be flushed to disk before the metadata of i_disksize is updated.
>> The subsequent patches will wait for this I/O to complete before
>> updating i_disksize.
>>
>> Suggested-by: Jan Kara <jack@suse.cz>
>> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
> 
> Two nits below:
> 
>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
>> index 239d387ffaf2..e013aeb03d7b 100644
>> --- a/fs/ext4/inode.c
>> +++ b/fs/ext4/inode.c
>> @@ -4742,6 +4742,32 @@ static int ext4_block_zero_range(struct inode *inode,
>>   					zero_written);
>>   }
>>   
>> +static int ext4_iomap_submit_zero_block(struct inode *inode,
>> +					loff_t from, loff_t end)
>> +{
>> +	struct address_space *mapping = inode->i_mapping;
>> +	struct folio *folio;
>> +	bool do_submit = false;
>> +
>> +	folio = filemap_lock_folio(mapping, from >> PAGE_SHIFT);
>> +	if (IS_ERR(folio))
>> +		/* Already writeback and clear? */
> 		   ^^^ Already written back and reclaimed
> 
>> +		return PTR_ERR(folio) == -ENOENT ? 0 : PTR_ERR(folio);
>> +
>> +	folio_wait_writeback(folio);
>> +	WARN_ON_ONCE(folio_test_writeback(folio));
>> +
>> +	if (likely(folio_test_dirty(folio)))
>> +		do_submit = true;
>> +	folio_unlock(folio);
>> +	folio_put(folio);
> 
> So how is what you do here more efficient than just:
> 
> 	filemap_fdatawrite_range(mapping, from, end - 1)
> 
> ? That will also do nothing if the folio isn't dirty, won't it?
> 
> 								Honza

Yeah, we can just call filemap_fdatawrite_range() in this patch. The
logic for locking/unlocking and checking writeback is only needed in
patch 18, and should be moved to a later patch. It is used to avoid
setting an ordered block on folios that have already been concurrently
written back. Sorry for the confusion.
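
For reference, the byte range that such a direct call would cover can be
worked out in a small userspace model. This is a sketch only: struct
wb_range and zero_block_wb_range() are hypothetical names, and it assumes
length starts out as end - from, which the quoted hunk does not show. It
mirrors the clamp added to ext4_block_zero_eof() in the patch below and the
"end - 1" argument, since filemap_fdatawrite_range() takes an inclusive last
byte.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: an inclusive byte range [first, last]. */
struct wb_range {
	int64_t first;
	int64_t last;
};

/*
 * Clamp [from, end) to the block containing 'from', then convert the
 * exclusive end into the inclusive last byte passed to writeback.
 */
static struct wb_range zero_block_wb_range(int64_t from, int64_t end,
					   int64_t blocksize)
{
	int64_t offset = from & (blocksize - 1);
	int64_t length = end - from;	/* assumption, see lead-in */
	struct wb_range r;

	/* The mask arithmetic requires a power-of-two block size. */
	assert(blocksize > 0 && (blocksize & (blocksize - 1)) == 0);
	assert(from >= 0 && end > from);

	if (length > blocksize - offset) {
		length = blocksize - offset;
		end = from + length;
	}
	r.first = from;
	r.last = end - 1;
	return r;
}
```

With a 4096-byte block size, from = 5000 and end = 20000, the range is
clamped to the end of the second block, so the last byte written back is
8191.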

Thanks,
Yi.

> 
>> +
>> +	/* Submit zeroed block. */
>> +	if (do_submit)
>> +		return filemap_fdatawrite_range(mapping, from, end - 1);
>> +	return 0;
>> +}
>> +
>>   /*
>>    * Zero out a mapping from file offset 'from' up to the end of the block
>>    * which corresponds to 'from' or to the given 'end' inside this block.
>> @@ -4765,8 +4791,10 @@ int ext4_block_zero_eof(struct inode *inode, loff_t from, loff_t end)
>>   	if (IS_ENCRYPTED(inode) && !fscrypt_has_encryption_key(inode))
>>   		return 0;
>>   
>> -	if (length > blocksize - offset)
>> +	if (length > blocksize - offset) {
>>   		length = blocksize - offset;
>> +		end = from + length;
>> +	}
>>   
>>   	err = ext4_block_zero_range(inode, from, length,
>>   				    &did_zero, &zero_written);
>> @@ -4781,18 +4809,34 @@ int ext4_block_zero_eof(struct inode *inode, loff_t from, loff_t end)
>>   	 * TODO: In the iomap path, handle this by updating i_disksize to
>>   	 * i_size after the zeroed data has been written back.
>>   	 */
>> -	if (ext4_should_order_data(inode) &&
>> -	    did_zero && zero_written && !IS_DAX(inode)) {
>> -		handle_t *handle;
>> +	if (did_zero && zero_written && !IS_DAX(inode)) {
>> +		if (ext4_should_order_data(inode)) {
>> +			handle_t *handle;
>>   
>> -		handle = ext4_journal_start(inode, EXT4_HT_MISC, 1);
>> -		if (IS_ERR(handle))
>> -			return PTR_ERR(handle);
>> +			handle = ext4_journal_start(inode, EXT4_HT_MISC, 1);
>> +			if (IS_ERR(handle))
>> +				return PTR_ERR(handle);
>>   
>> -		err = ext4_jbd2_inode_add_write(handle, inode, from, length);
>> -		ext4_journal_stop(handle);
>> -		if (err)
>> -			return err;
>> +			err = ext4_jbd2_inode_add_write(handle, inode, from,
>> +							length);
>> +			ext4_journal_stop(handle);
>> +			if (err)
>> +				return err;
>> +		/*
>> +		 * inodes using the iomap buffered I/O path do not use the
>> +		 * data=ordered mode. We submit zeroed range directly here.
>> +		 * Do not wait for I/O completion for performance.
>> +		 *
>> +		 * TODO: Any operation that extends i_disksize (including
>> +		 * append write end io past the zeroed boundary, truncate up,
>> +		 * and append fallocate) must wait for the relevant I/O to
>> +		 * complete before updating i_disksize.
>> +		 */
>> +		} else if (ext4_inode_buffered_iomap(inode)) {
>> +			err = ext4_iomap_submit_zero_block(inode, from, end);
>> +			if (err)
>> +				return err;
>> +		}
>>   	}
>>   
>>   	return 0;
>> -- 
>> 2.52.0
>>


* Re: [PATCH v4 18/23] ext4: wait for ordered I/O in the iomap buffered I/O path
From: Zhang Yi @ 2026-06-22  3:32 UTC (permalink / raw)
  To: Jan Kara, Zhang Yi
  Cc: linux-ext4, linux-fsdevel, linux-kernel, tytso, adilger.kernel,
	libaokun, ojaswin, ritesh.list, djwong, hch, yi.zhang, yangerkun,
	yukuai
In-Reply-To: <k3w5c3cyg2py4kums7nhdwjg6b6pm43qtweepuolk57xtmnotz@yxkm2k6sqfc6>

On 6/18/2026 9:48 PM, Jan Kara wrote:
> On Mon 11-05-26 15:23:38, Zhang Yi wrote:
>> From: Zhang Yi <yi.zhang@huawei.com>
>>
>> For append writes, wait for ordered I/O to complete before updating
>> i_disksize. This ensures that zeroed data is flushed to disk before the
>> metadata update, preventing stale data from being exposed during
>> unaligned post-EOF append writes.
>>
>> Suggested-by: Jan Kara <jack@suse.cz>
>> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
> 
> Frankly, this all looks too complex to me. Plus you are adding 32 bytes to
> struct ext4_inode_info which isn't great either. Why don't you just do
> filemap_fdatawait() for the byte at old i_disksize and be done with it?
> 
> I believe we have to simplify this. All this complexity (and thus
> maintenance burden) across several patches for the corner case of zeroing
> tail block on extension is in my opinion difficult to justify.
> 
> 								Honza

Hi, Jan!

Thanks for the review. I understand the concern about complexity and the
32-byte increase to ext4_inode_info. I tried using
filemap_fdatawait_range() as you suggested, but found two issues where
this solution doesn't work.

1. ioend worker deadlock

Since worker concurrency resources are limited, we cannot wait for
another ioend worker to complete within one ioend worker with the same
work_struct. If the worker calls
filemap_fdatawait_range(byte_at_old_disksize) to wait for the zeroed
block's folio writeback to complete, it sleeps holding the only worker
slot. If the folio contains blocks requiring extent conversion, its
writeback bit is cleared by iomap_finish_ioends() running inside
another worker -- which can only run after the current worker finishes
its batch.

Concretely:
   - Worker W1 processes ioend A, calls filemap_fdatawait_range() on
     the old EOF byte, sleeps.
   - The zeroed data is in ioend B. bio_endio defers it to
     i_iomap_ioend_list and calls queue_work().
   - queue_work() on i_iomap_ioend_work is idempotent: it returns false
     because the work is currently executing (even though sleeping).
   - ioend B sits in the list, never gets processed.
   - The folio writeback bit is only cleared by processing ioend B.
   - W1 sleeps forever -> deadlock.

Therefore, I think we have to put the wakeup logic in
ext4_iomap_end_bio() that runs in interrupt context without consuming
a worker thread. The ordered range tracking and wait queue are what
make that possible.

2. Truncate-up needs an accurate state query

In the follow-up patch 19, ext4_set_inode_size() must make a precise
decision when updating i_disksize during truncate up.

This needs a state query: "is there ordered zero I/O in flight right
now?" If yes, the i_disksize update is deferred to
ext4_iomap_wb_update_disksize(is_ordered=true), which advances
i_disksize to i_size when the ordered I/O completes. If no, we must
advance i_disksize immediately, otherwise the update is lost
forever.

Therefore, we need to track the state of the ordered range. Simply
using filemap_fdatawait_range() doesn't work. i_ordered_len serves as
a maintained state flag that both the ioend completion path and the
setattr path can read atomically without sleeping.
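
The non-sleeping state query described above can be sketched in userspace
with C11 atomics. This is a rough model under assumptions, not the patch
itself: struct ordered_state and the ordered_* / ioend_covers_ordered()
names are hypothetical, and an acquire load is swapped in for the
READ_ONCE() plus smp_rmb() pairing used in the patch quoted below. The
writer fills in the start block first and then publishes a non-zero length
with release semantics, so a reader that sees a non-zero length also sees
the matching start block.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the i_ordered_lblk / i_ordered_len pair. */
struct ordered_state {
	_Atomic uint32_t lblk;
	_Atomic uint32_t len;	/* non-zero means ordered I/O in flight */
};

/* Writer: store the start block, then publish the length with release. */
static void ordered_publish(struct ordered_state *s, uint32_t lblk,
			    uint32_t len)
{
	assert(len != 0);
	atomic_store_explicit(&s->lblk, lblk, memory_order_relaxed);
	atomic_store_explicit(&s->len, len, memory_order_release);
}

/* Completion side: mark the ordered range as no longer in flight. */
static void ordered_clear(struct ordered_state *s)
{
	atomic_store_explicit(&s->len, 0, memory_order_release);
}

/*
 * Reader: returns false if nothing is in flight. Otherwise fills in the
 * tracked range. Never sleeps, so it is usable from a completion path.
 */
static bool ordered_query(struct ordered_state *s, uint32_t *lblk,
			  uint32_t *len)
{
	uint32_t l = atomic_load_explicit(&s->len, memory_order_acquire);

	if (!l)
		return false;
	*lblk = atomic_load_explicit(&s->lblk, memory_order_relaxed);
	*len = l;
	return true;
}

/* Does the block range [start, end) cover the whole tracked range? */
static bool ioend_covers_ordered(struct ordered_state *s, uint32_t start,
				 uint32_t end)
{
	uint32_t lblk, len;

	if (!ordered_query(s, &lblk, &len))
		return false;
	return start <= lblk && end >= lblk + len;
}
```

The coverage test is the same comparison the quoted patch uses when deciding
whether to tag an ioend as ordered.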

Suggestions?

Regarding the bloat of ext4_inode_info, perhaps we can drop the
wait_queue_head_t (24 bytes) and use wait_var_event()/wake_up_var()
instead. Would this be acceptable?

Thanks,
Yi

> 
>> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
>> index 078feda47e36..9ce2128eea3e 100644
>> --- a/fs/ext4/ext4.h
>> +++ b/fs/ext4/ext4.h
>> @@ -1195,6 +1195,15 @@ struct ext4_inode_info {
>>   #ifdef CONFIG_FS_ENCRYPTION
>>   	struct fscrypt_inode_info *i_crypt_info;
>>   #endif
>> +
>> +	/*
>> +	 * Track ordered zeroed data during post-EOF append writes, fallocate,
>> +	 * and truncate-up operations. These parameters are used only in the
>> +	 * iomap buffered I/O path.
>> +	 */
>> +	ext4_lblk_t i_ordered_lblk;
>> +	ext4_lblk_t i_ordered_len;
>> +	wait_queue_head_t i_ordered_wq;
>>   };
>>   
>>   /*
>> @@ -3858,6 +3867,8 @@ extern int ext4_move_extents(struct file *o_filp, struct file *d_filp,
>>   			     __u64 len, __u64 *moved_len);
>>   
>>   /* page-io.c */
>> +#define EXT4_IOMAP_IOEND_ORDER_IO	1UL	/* This I/O is an ordered one */
>> +
>>   extern int __init ext4_init_pageio(void);
>>   extern void ext4_exit_pageio(void);
>>   extern ext4_io_end_t *ext4_init_io_end(struct inode *inode, gfp_t flags);
>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
>> index e013aeb03d7b..11fb369efeb1 100644
>> --- a/fs/ext4/inode.c
>> +++ b/fs/ext4/inode.c
>> @@ -4345,6 +4345,7 @@ static int ext4_iomap_writeback_submit(struct iomap_writepage_ctx *wpc,
>>   {
>>   	struct iomap_ioend *ioend = wpc->wb_ctx;
>>   	struct ext4_inode_info *ei = EXT4_I(ioend->io_inode);
>> +	ext4_lblk_t start, end, order_lblk, order_len;
>>   
>>   	/*
>>   	 * After I/O completion, a worker needs to be scheduled when:
>> @@ -4357,6 +4358,30 @@ static int ext4_iomap_writeback_submit(struct iomap_writepage_ctx *wpc,
>>   	    test_opt(ioend->io_inode->i_sb, DATA_ERR_ABORT))
>>   		ioend->io_bio.bi_end_io = ext4_iomap_end_bio;
>>   
>> +	/*
>> +	 * Mark the I/O as ordered. Ordered I/O requires separate endio
>> +	 * handling and must not be merged with regular I/O operations.
>> +	 */
>> +	order_len = READ_ONCE(ei->i_ordered_len);
>> +	if (order_len) {
>> +		/*
>> +		 * Pair with smp_store_release() in ext4_block_zero_eof().
>> +		 * Ensure we see the updated i_ordered_lblk that was written
>> +		 * before the release store to i_ordered_len.
>> +		 */
>> +		smp_rmb();
>> +		order_lblk = READ_ONCE(ei->i_ordered_lblk);
>> +		start = ioend->io_offset >> ioend->io_inode->i_blkbits;
>> +		end = EXT4_B_TO_LBLK(ioend->io_inode,
>> +				     ioend->io_offset + ioend->io_size);
>> +
>> +		if (start <= order_lblk && end >= order_lblk + order_len) {
>> +			ioend->io_bio.bi_end_io = ext4_iomap_end_bio;
>> +			ioend->io_private = (void *)EXT4_IOMAP_IOEND_ORDER_IO;
>> +			ioend->io_flags |= IOMAP_IOEND_BOUNDARY;
>> +		}
>> +	}
>> +
>>   	return iomap_ioend_writeback_submit(wpc, error);
>>   }
>>   
>> @@ -4746,8 +4771,10 @@ static int ext4_iomap_submit_zero_block(struct inode *inode,
>>   					loff_t from, loff_t end)
>>   {
>>   	struct address_space *mapping = inode->i_mapping;
>> +	struct ext4_inode_info *ei = EXT4_I(inode);
>>   	struct folio *folio;
>>   	bool do_submit = false;
>> +	int ret;
>>   
>>   	folio = filemap_lock_folio(mapping, from >> PAGE_SHIFT);
>>   	if (IS_ERR(folio))
>> @@ -4757,14 +4784,50 @@ static int ext4_iomap_submit_zero_block(struct inode *inode,
>>   	folio_wait_writeback(folio);
>>   	WARN_ON_ONCE(folio_test_writeback(folio));
>>   
>> -	if (likely(folio_test_dirty(folio)))
>> +	/*
>> +	 * Mark the ordered range. It will be cleared upon I/O completion
>> +	 * in ext4_iomap_end_bio(). Any operation that extends i_disksize
>> +	 * (including append write end io past the zeroed boundary,
>> +	 * truncate up and append fallocate) must wait for this I/O to
>> +	 * complete before updating i_disksize.
>> +	 *
>> +	 * When multiple overlapping unaligned EOF writes are in flight, we
>> +	 * only need to track and wait for the first one. Subsequent writes
>> +	 * will zero the gap in memory and ensure that the zeroed data is
>> +	 * written out along with the valid data in the same block before
>> +	 * i_disksize is updated.
>> +	 */
>> +	if (likely(folio_test_dirty(folio) &&
>> +		   READ_ONCE(ei->i_ordered_len) == 0)) {
>> +		WRITE_ONCE(ei->i_ordered_lblk,
>> +			   from >> inode->i_blkbits);
>> +		/*
>> +		 * Pairs with smp_rmb() in ext4_iomap_writeback_submit()
>> +		 * and ext4_iomap_wb_ordered_wait(). Ensure the updated
>> +		 * i_ordered_lblk is visible when i_ordered_len becomes
>> +		 * non-zero.
>> +		 */
>> +		smp_store_release(&ei->i_ordered_len, 1);
>>   		do_submit = true;
>> +	}
>>   	folio_unlock(folio);
>>   	folio_put(folio);
>>   
>>   	/* Submit zeroed block. */
>> -	if (do_submit)
>> -		return filemap_fdatawrite_range(mapping, from, end - 1);
>> +	if (do_submit) {
>> +		ret = filemap_fdatawrite_range(mapping, from, end - 1);
>> +		if (ret) {
>> +			/*
>> +			 * Pairs with wait_event() in
>> +			 * ext4_iomap_wb_ordered_wait(). Ensure
>> +			 * i_ordered_len = 0 is visible before waking up
>> +			 * waiters.
>> +			 */
>> +			smp_store_release(&ei->i_ordered_len, 0);
>> +			wake_up_all(&ei->i_ordered_wq);
>> +			return ret;
>> +		}
>> +	}
>>   	return 0;
>>   }
>>   
>> @@ -4827,10 +4890,13 @@ int ext4_block_zero_eof(struct inode *inode, loff_t from, loff_t end)
>>   		 * data=ordered mode. We submit zeroed range directly here.
>>   		 * Do not wait for I/O completion for performance.
>>   		 *
>> -		 * TODO: Any operation that extends i_disksize (including
>> -		 * append write end io past the zeroed boundary, truncate up,
>> -		 * and append fallocate) must wait for the relevant I/O to
>> -		 * complete before updating i_disksize.
>> +		 * The end_io handler ext4_iomap_wb_ordered_wait() will wait
>> +		 * for I/O completion before updating i_disksize if the write
>> +		 * extends beyond the zeroed boundary.
>> +		 *
>> +		 * TODO: Any other operation that extends i_disksize
>> +		 * (including truncate up and append fallocate) must wait for
>> +		 * the relevant I/O to complete before updating i_disksize.
>>   		 */
>>   		} else if (ext4_inode_buffered_iomap(inode)) {
>>   			err = ext4_iomap_submit_zero_block(inode, from, end);
>> diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
>> index 3050c887329f..ad05ebb49bf6 100644
>> --- a/fs/ext4/page-io.c
>> +++ b/fs/ext4/page-io.c
>> @@ -613,6 +613,46 @@ int ext4_bio_write_folio(struct ext4_io_submit *io, struct folio *folio,
>>   	return 0;
>>   }
>>   
>> +/*
>> + * If the old disk size is not block size aligned and the current
>> + * writeback range is entirely beyond the old EOF block, we should
>> + * wait for the zeroed data written in ext4_block_zero_eof() to be
>> + * written out, otherwise, it may expose stale data in that block.
>> + */
>> +static void ext4_iomap_wb_ordered_wait(struct inode *inode,
>> +				       loff_t pos, loff_t end)
>> +{
>> +	struct ext4_inode_info *ei = EXT4_I(inode);
>> +	unsigned int blocksize = i_blocksize(inode);
>> +	loff_t disksize = READ_ONCE(ei->i_disksize);
>> +	ext4_lblk_t order_lblk, order_len;
>> +
>> +	/*
>> +	 * Waiting for ordered I/O is unnecessary when:
>> +	 * - The on-disk size is block-aligned (no stale data exists).
>> +	 * - The write start is within the block of the old EOF
>> +	 *   (overwriting, or appending to a block that already contains
>> +	 *   valid data).
>> +	 */
>> +	if (!(disksize & (blocksize - 1)) ||
>> +	    pos < round_up(disksize, blocksize))
>> +		return;
>> +
>> +	order_len = READ_ONCE(ei->i_ordered_len);
>> +	if (!order_len)
>> +		return;
>> +
>> +	/*
>> +	 * Pair with smp_store_release() in ext4_iomap_end_bio() and
>> +	 * ext4_block_zero_eof(). Ensure we see the updated i_ordered_lblk
>> +	 * that was written before the release store to i_ordered_len.
>> +	 */
>> +	smp_rmb();
>> +	order_lblk = READ_ONCE(ei->i_ordered_lblk);
>> +	if ((pos >> inode->i_blkbits) >= order_lblk + order_len)
>> +		wait_event(ei->i_ordered_wq, READ_ONCE(ei->i_ordered_len) == 0);
>> +}
>> +
>>   static int ext4_iomap_wb_update_disksize(handle_t *handle, struct inode *inode,
>>   					 loff_t end)
>>   {
>> @@ -656,6 +696,9 @@ static void ext4_iomap_finish_ioend(struct iomap_ioend *ioend)
>>   		goto out;
>>   	}
>>   
>> +	/* Wait ordered zero data to be written out. */
>> +	ext4_iomap_wb_ordered_wait(inode, pos, pos + size);
>> +
>>   	/* We may need to convert one extent and dirty the inode. */
>>   	credits = ext4_chunk_trans_blocks(inode,
>>   			EXT4_MAX_BLOCKS(size, pos, inode->i_blkbits));
>> @@ -717,8 +760,25 @@ void ext4_iomap_end_bio(struct bio *bio)
>>   	struct inode *inode = ioend->io_inode;
>>   	struct ext4_inode_info *ei = EXT4_I(inode);
>>   	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
>> +	unsigned long io_mode = (unsigned long)ioend->io_private;
>>   	unsigned long flags;
>>   
>> +	/*
>> +	 * This is an ordered I/O, clear the ordered range set in
>> +	 * ext4_block_zero_eof() and wake up all waiters that will update
>> +	 * the inode i_disksize.
>> +	 */
>> +	if (io_mode == EXT4_IOMAP_IOEND_ORDER_IO) {
>> +		/*
>> +		 * Pairs with wait_event() in ext4_iomap_wb_ordered_wait().
>> +		 * Ensure i_ordered_len = 0 is visible before waking up
>> +		 * waiters.
>> +		 */
>> +		smp_store_release(&ei->i_ordered_len, 0);
>> +		wake_up_all(&ei->i_ordered_wq);
>> +		goto defer;
>> +	}
>> +
>>   	/* Needs to convert unwritten extents or update the i_disksize. */
>>   	if ((ioend->io_flags & IOMAP_IOEND_UNWRITTEN) ||
>>   	    ioend->io_offset + ioend->io_size > READ_ONCE(ei->i_disksize))
>> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
>> index 62bfe05a64bc..9c0a00e716f3 100644
>> --- a/fs/ext4/super.c
>> +++ b/fs/ext4/super.c
>> @@ -1444,6 +1444,9 @@ static struct inode *ext4_alloc_inode(struct super_block *sb)
>>   	ext4_fc_init_inode(&ei->vfs_inode);
>>   	spin_lock_init(&ei->i_fc_lock);
>>   	mmb_init(&ei->i_metadata_bhs, &ei->vfs_inode.i_data);
>> +	ei->i_ordered_lblk = 0;
>> +	ei->i_ordered_len = 0;
>> +	init_waitqueue_head(&ei->i_ordered_wq);
>>   	return &ei->vfs_inode;
>>   }
>>   
>> @@ -1480,12 +1483,20 @@ static void ext4_destroy_inode(struct inode *inode)
>>   		dump_stack();
>>   	}
>>   
>> -	if (!(EXT4_SB(inode->i_sb)->s_mount_state & EXT4_ERROR_FS) &&
>> -	    WARN_ON_ONCE(EXT4_I(inode)->i_reserved_data_blocks))
>> -		ext4_msg(inode->i_sb, KERN_ERR,
>> -			 "Inode %llu (%p): i_reserved_data_blocks (%u) not cleared!",
>> -			 inode->i_ino, EXT4_I(inode),
>> -			 EXT4_I(inode)->i_reserved_data_blocks);
>> +	if (!(EXT4_SB(inode->i_sb)->s_mount_state & EXT4_ERROR_FS)) {
>> +		if (WARN_ON_ONCE(EXT4_I(inode)->i_reserved_data_blocks))
>> +			ext4_msg(inode->i_sb, KERN_ERR,
>> +				 "Inode %llu (%p): i_reserved_data_blocks (%u) not cleared!",
>> +				 inode->i_ino, EXT4_I(inode),
>> +				 EXT4_I(inode)->i_reserved_data_blocks);
>> +
>> +		if (WARN_ON_ONCE(EXT4_I(inode)->i_ordered_len))
>> +			ext4_msg(inode->i_sb, KERN_ERR,
>> +				 "Inode %llu (%p): i_ordered_lblk (%u) and i_ordered_len (%u) not cleared!",
>> +				 inode->i_ino, EXT4_I(inode),
>> +				 EXT4_I(inode)->i_ordered_lblk,
>> +				 EXT4_I(inode)->i_ordered_len);
>> +	}
>>   }
>>   
>>   static void ext4_shutdown(struct super_block *sb)
>> -- 
>> 2.52.0
>>
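
The barrier pairing described in the comments of the quoted patch can be
sketched outside the kernel with C11 atomics. This is only an illustrative
userspace analogue, not the patch's code: struct ordered_range,
publish_range(), clear_range() and snapshot_range() are hypothetical names,
and memory_order_release / memory_order_acquire are assumed stand-ins for
smp_store_release() and the READ_ONCE() + smp_rmb() sequence.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the i_ordered_lblk / i_ordered_len pair. */
struct ordered_range {
	_Atomic uint32_t lblk;	/* first block of the ordered range */
	_Atomic uint32_t len;	/* length in blocks; 0 means nothing pending */
};

/* Writer: fill in lblk first, then publish it with a release store to len. */
static void publish_range(struct ordered_range *r, uint32_t lblk, uint32_t len)
{
	atomic_store_explicit(&r->lblk, lblk, memory_order_relaxed);
	atomic_store_explicit(&r->len, len, memory_order_release);
}

/* Completion side: clearing len is what allows waiters to proceed. */
static void clear_range(struct ordered_range *r)
{
	atomic_store_explicit(&r->len, 0, memory_order_release);
}

/*
 * Reader: the acquire load of len orders the later load of lblk, so when a
 * non-zero len is seen, the lblk stored before that len is visible too.
 * Returns false when no ordered range is pending.
 */
static bool snapshot_range(struct ordered_range *r, uint32_t *lblk,
			   uint32_t *len)
{
	uint32_t l = atomic_load_explicit(&r->len, memory_order_acquire);

	if (!l)
		return false;
	*lblk = atomic_load_explicit(&r->lblk, memory_order_relaxed);
	*len = l;
	return true;
}
```

The sketch leaves out the wait queue entirely; in the patch the reader sleeps
in wait_event() until the length drops back to zero.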



* Re: [PATCH 1/2] tracing: Move non-trace_printk prototypes back to kernel.h
From: Steven Rostedt @ 2026-06-21 13:24 UTC (permalink / raw)
  To: Yury Norov, Steven Rostedt
  Cc: linux-kernel, linux-trace-kernel, Masami Hiramatsu, Mark Rutland,
	Mathieu Desnoyers, Andrew Morton, Linus Torvalds,
	Sebastian Andrzej Siewior, John Ogness, Thomas Gleixner,
	Peter Zijlstra, Julia Lawall, linux-doc, linux-kbuild,
	linuxppc-dev, dri-devel, linux-stm32, linux-arm-kernel,
	linux-rdma, linux-usb, linux-ext4, linux-nfs, kvm, intel-gfx
In-Reply-To: <ajfiVTlCIVlqW3sh@yury>



On June 21, 2026 2:08:37 PM GMT+01:00, Yury Norov <yury.norov@gmail.com> wrote:
>On Sun, Jun 21, 2026 at 05:34:31AM -0400, Steven Rostedt wrote:
>> From: Steven Rostedt <rostedt@goodmis.org>
>> 
>> In order to remove the include to trace_printk.h from kernel.h the tracing
>> control prototypes need to be moved back into kernel.h. That's because
>
>Please don't. Instead, you can split them out to trace_control.h, and
>include where needed. I actually have a prototype for it, FYI:
>
>https://github.com/norov/linux/tree/trace_pritk3
>

Sure, I have no problem adding another header for this.

>> they are used in other common header files like rcu.h. There's no point in
>> removing trace_printk.h from kernel.h if it just gets added back to other
>> common headers.
>> 
>> Prototypes are very cheap for the compiler and should not be an issue.
>
>It's not about cost, it's about mess. kernel.h is included everywhere.
>Is that API needed everywhere? No, it's needed in literally 10 files.
>So, no place in kernel.h.
> 

Well one of those files is rcu.h which is also pretty much included everywhere. But OK.

-- Steve 


>> 
>> 2.53.0
>> 


* Re: [PATCH 2/2] tracing: Add CONFIG_TRACE_PRINTK_DEBUGGING to clean up kernel.h
From: Yury Norov @ 2026-06-21 13:57 UTC (permalink / raw)
  To: Yury Norov
  Cc: Steven Rostedt, linux-kernel, linux-trace-kernel,
	Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Linus Torvalds, Sebastian Andrzej Siewior, John Ogness,
	Thomas Gleixner, Peter Zijlstra, Julia Lawall, linux-doc,
	linux-kbuild, linuxppc-dev, dri-devel, linux-stm32,
	linux-arm-kernel, linux-rdma, linux-usb, linux-ext4, linux-nfs,
	kvm, intel-gfx
In-Reply-To: <ajfphe4Z8BrfYoUX@yury>

On Sun, Jun 21, 2026 at 09:39:17AM -0400, Yury Norov wrote:
> On Sun, Jun 21, 2026 at 05:47:21AM -0400, Steven Rostedt wrote:
> > On Sun, 21 Jun 2026 05:34:32 -0400
> > Steven Rostedt <rostedt@kernel.org> wrote:
> > 
> > > Instead of having trace_printk.h included in kernel.h, create a config
> > > TRACE_PRINTK_DEBUGGING that when set will update the CFLAGS in the
> > > Makefile to allow developers to add trace_printk() without the need to add
> > > the include for it. Having it included in the Makefile keeps it from being
> > > in the dependency chain and it will not waste extra CPU cycles for those
> > > building the kernel without using trace_printk.
> > 
> > Bah, I only tested with the config option enabled, and missed some
> > dependencies with it disabled.
> 
> Yes you did.
>  
> > For instance, rcu.h also uses ftrace_dump() so that too needs to go
> > into kernel.h.
> 
> No, it shouldn't.
> 
> > I also need to add a few more includes to trace_printk.h.
> 
> > OK, I need to run this through all my tests to find where else I missed
> > adding the includes. But the idea should hopefully satisfy everyone.
> 
> If you include it under config in kernel.h, to make the kernel buildable,

I mean: in kernel.h or in Makefile.

> you need to include trace_printk.h explicitly where it's actually used.
> IOW, apply my patch v4-7.
> 
> Then, developers who use trace_printk() on their development machine,
> will be really frustrated when their debugging code will break client
> build just because CONFIG_TRACE_PRINTK_DEBUGGING is disabled there.
> They will spend a day, at best, communicating with remote managers,
> and end up with adding #include <linux/trace_printk.h> in the files
> they touch. Is that your plan?
> 
> If I was one of those developers, the solution would be simple for me:
> don't use trace_printk() at all.
> 
> Thanks,
> Yury


* Re: [PATCH 2/2] tracing: Add CONFIG_TRACE_PRINTK_DEBUGGING to clean up kernel.h
From: Yury Norov @ 2026-06-21 13:39 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, linux-trace-kernel, Masami Hiramatsu, Mark Rutland,
	Mathieu Desnoyers, Andrew Morton, Linus Torvalds,
	Sebastian Andrzej Siewior, John Ogness, Thomas Gleixner,
	Peter Zijlstra, Julia Lawall, Yury Norov, linux-doc, linux-kbuild,
	linuxppc-dev, dri-devel, linux-stm32, linux-arm-kernel,
	linux-rdma, linux-usb, linux-ext4, linux-nfs, kvm, intel-gfx
In-Reply-To: <20260621054721.7cde38f0@fedora>

On Sun, Jun 21, 2026 at 05:47:21AM -0400, Steven Rostedt wrote:
> On Sun, 21 Jun 2026 05:34:32 -0400
> Steven Rostedt <rostedt@kernel.org> wrote:
> 
> > Instead of having trace_printk.h included in kernel.h, create a config
> > TRACE_PRINTK_DEBUGGING that when set will update the CFLAGS in the
> > Makefile to allow developers to add trace_printk() without the need to add
> > the include for it. Having it included in the Makefile keeps it from being
> > in the dependency chain and it will not waste extra CPU cycles for those
> > building the kernel without using trace_printk.
> 
> Bah, I only tested with the config option enabled, and missed some
> dependencies with it disabled.

Yes you did.
 
> For instance, rcu.h also uses ftrace_dump() so that too needs to go
> into kernel.h.

No, it shouldn't.

> I also need to add a few more includes to trace_printk.h.

> OK, I need to run this through all my tests to find where else I missed
> adding the includes. But the idea should hopefully satisfy everyone.

If you include it under config in kernel.h, to make the kernel buildable,
you need to include trace_printk.h explicitly where it's actually used.
IOW, apply my patch v4-7.

Then, developers who use trace_printk() on their development machine,
will be really frustrated when their debugging code will break client
build just because CONFIG_TRACE_PRINTK_DEBUGGING is disabled there.
They will spend a day, at best, communicating with remote managers,
and end up with adding #include <linux/trace_printk.h> in the files
they touch. Is that your plan?

If I was one of those developers, the solution would be simple for me:
don't use trace_printk() at all.

Thanks,
Yury


* Re: [PATCH 1/2] tracing: Move non-trace_printk prototypes back to kernel.h
From: Yury Norov @ 2026-06-21 13:08 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, linux-trace-kernel, Masami Hiramatsu, Mark Rutland,
	Mathieu Desnoyers, Andrew Morton, Linus Torvalds,
	Sebastian Andrzej Siewior, John Ogness, Thomas Gleixner,
	Peter Zijlstra, Julia Lawall, Yury Norov, linux-doc, linux-kbuild,
	linuxppc-dev, dri-devel, linux-stm32, linux-arm-kernel,
	linux-rdma, linux-usb, linux-ext4, linux-nfs, kvm, intel-gfx
In-Reply-To: <20260621093811.007634476@kernel.org>

On Sun, Jun 21, 2026 at 05:34:31AM -0400, Steven Rostedt wrote:
> From: Steven Rostedt <rostedt@goodmis.org>
> 
> In order to remove the include to trace_printk.h from kernel.h the tracing
> control prototypes need to be moved back into kernel.h. That's because

Please don't. Instead, you can split them out to trace_control.h, and
include where needed. I actually have a prototype for it, FYI:

https://github.com/norov/linux/tree/trace_pritk3

> they are used in other common header files like rcu.h. There's no point in
> removing trace_printk.h from kernel.h if it just gets added back to other
> common headers.
> 
> Prototypes are very cheap for the compiler and should not be an issue.

It's not about cost, it's about mess. kernel.h is included everywhere.
Is that API needed everywhere? No, it's needed in literally 10 files.
So, no place in kernel.h.
 
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> ---
>  include/linux/kernel.h       | 18 ++++++++++++++++++
>  include/linux/trace_printk.h | 17 -----------------
>  2 files changed, 18 insertions(+), 17 deletions(-)
> 
> diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> index e5570a16cbb1..c3c68128827c 100644
> --- a/include/linux/kernel.h
> +++ b/include/linux/kernel.h
> @@ -194,4 +194,22 @@ extern enum system_states system_state;
>  # define REBUILD_DUE_TO_DYNAMIC_FTRACE
>  #endif
>  
> +#ifdef CONFIG_TRACING
> +void tracing_on(void);
> +void tracing_off(void);
> +int tracing_is_on(void);
> +void tracing_snapshot(void);
> +void tracing_snapshot_alloc(void);
> +void tracing_start(void);
> +void tracing_stop(void);
> +#else
> +static inline void tracing_start(void) { }
> +static inline void tracing_stop(void) { }
> +static inline void tracing_on(void) { }
> +static inline void tracing_off(void) { }
> +static inline int tracing_is_on(void) { return 0; }
> +static inline void tracing_snapshot(void) { }
> +static inline void tracing_snapshot_alloc(void) { }
> +#endif
> +
>  #endif
> diff --git a/include/linux/trace_printk.h b/include/linux/trace_printk.h
> index 3d54f440dccf..879fed0805fd 100644
> --- a/include/linux/trace_printk.h
> +++ b/include/linux/trace_printk.h
> @@ -35,15 +35,6 @@ enum ftrace_dump_mode {
>  };
>  
>  #ifdef CONFIG_TRACING
> -void tracing_on(void);
> -void tracing_off(void);
> -int tracing_is_on(void);
> -void tracing_snapshot(void);
> -void tracing_snapshot_alloc(void);
> -
> -extern void tracing_start(void);
> -extern void tracing_stop(void);
> -
>  static inline __printf(1, 2)
>  void ____trace_printk_check_format(const char *fmt, ...)
>  {
> @@ -176,16 +167,8 @@ __ftrace_vprintk(unsigned long ip, const char *fmt, va_list ap);
>  
>  extern void ftrace_dump(enum ftrace_dump_mode oops_dump_mode);
>  #else
> -static inline void tracing_start(void) { }
> -static inline void tracing_stop(void) { }
>  static inline void trace_dump_stack(int skip) { }
>  
> -static inline void tracing_on(void) { }
> -static inline void tracing_off(void) { }
> -static inline int tracing_is_on(void) { return 0; }
> -static inline void tracing_snapshot(void) { }
> -static inline void tracing_snapshot_alloc(void) { }
> -
>  static inline __printf(1, 2)
>  int trace_printk(const char *fmt, ...)
>  {
> -- 
> 2.53.0
> 
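
A split-out header along the lines suggested in this reply might look roughly
like the sketch below. The file name trace_control.h comes from the message;
the contents are copied from the hunk quoted above, and the layout is an
assumption, since the linked branch is not reproduced here.
stop_tracing_if_running() is a hypothetical caller added only to show usage.
Built in userspace without CONFIG_TRACING defined, the stub branch is taken.

```c
/* Hypothetical include/linux/trace_control.h */
#ifndef _LINUX_TRACE_CONTROL_H
#define _LINUX_TRACE_CONTROL_H

#ifdef CONFIG_TRACING
/* Real implementations live in the tracing core. */
void tracing_on(void);
void tracing_off(void);
int tracing_is_on(void);
void tracing_snapshot(void);
void tracing_snapshot_alloc(void);
void tracing_start(void);
void tracing_stop(void);
#else
/* Tracing compiled out: empty stubs so callers need no #ifdef of their own. */
static inline void tracing_start(void) { }
static inline void tracing_stop(void) { }
static inline void tracing_on(void) { }
static inline void tracing_off(void) { }
static inline int tracing_is_on(void) { return 0; }
static inline void tracing_snapshot(void) { }
static inline void tracing_snapshot_alloc(void) { }
#endif

#endif /* _LINUX_TRACE_CONTROL_H */

/*
 * Hypothetical user: a file that needs the control API would include only
 * the small header above instead of pulling it in through kernel.h.
 * Returns 1 if tracing was running and has been switched off, 0 otherwise.
 */
static inline int stop_tracing_if_running(void)
{
	if (!tracing_is_on())
		return 0;
	tracing_off();
	return 1;
}
```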


* Re: [PATCH 2/2] tracing: Add CONFIG_TRACE_PRINTK_DEBUGGING to clean up kernel.h
From: Steven Rostedt @ 2026-06-21 13:03 UTC (permalink / raw)
  To: David Laight
  Cc: Thomas Gleixner, linux-kernel, linux-trace-kernel,
	Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Linus Torvalds, Sebastian Andrzej Siewior, John Ogness,
	Peter Zijlstra, Julia Lawall, Yury Norov, linux-doc, linux-kbuild,
	linuxppc-dev, dri-devel, linux-stm32, linux-arm-kernel,
	linux-rdma, linux-usb, linux-ext4, linux-nfs, kvm, intel-gfx
In-Reply-To: <20260621135531.243375d9@pumpkin>

On Sun, 21 Jun 2026 13:55:31 +0100
David Laight <david.laight.linux@gmail.com> wrote:

> Indeed...
> Isn't trace_printk() just an extern?
> Having it defined somewhere isn't going to make any difference to build times.

No it is not. It is a macro to cut as many nanoseconds as possible as
trace_printk() was created to debug tight race conditions and any added
latency can make the race go away.

-- Steve
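
As an illustration of what a macro can do that a plain extern function
cannot, the userspace sketch below picks a cheap path at compile time when
the format is a string literal. It is a simplified assumption about the kind
of trick involved, not a copy of the in-tree macro: my_trace_printk(),
record_const_fmt() and record_dynamic_fmt() are hypothetical names, and the
real definition in include/linux/trace_printk.h may differ.

```c
#include <stdarg.h>
#include <stdio.h>

/* Counters so the path chosen by the macro can be observed. */
static int fast_calls, slow_calls;

/* Hypothetical cheap path: the format is a compile-time constant. */
static int record_const_fmt(const char *fmt, ...)
{
	(void)fmt;
	fast_calls++;
	return 0;
}

/* Hypothetical slow path: the format is only known at run time. */
static int record_dynamic_fmt(const char *fmt, ...)
{
	char buf[128];
	va_list ap;
	int n;

	slow_calls++;
	va_start(ap, fmt);
	n = vsnprintf(buf, sizeof(buf), fmt, ap);
	va_end(ap);
	return n;
}

/*
 * The selection happens in the caller's translation unit, which is only
 * possible because this is a macro. __builtin_constant_p() and
 * ##__VA_ARGS__ are GNU C extensions supported by GCC and Clang.
 */
#define my_trace_printk(fmt, ...)				\
	(__builtin_constant_p(fmt) ?				\
		record_const_fmt(fmt, ##__VA_ARGS__) :		\
		record_dynamic_fmt(fmt, ##__VA_ARGS__))
```

A call such as my_trace_printk("x=%d\n", x) takes the constant-format path,
while a format held in a run-time variable falls back to the slow one.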


* Re: [PATCH 2/2] tracing: Add CONFIG_TRACE_PRINTK_DEBUGGING to clean up kernel.h
From: David Laight @ 2026-06-21 12:55 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Steven Rostedt, linux-kernel, linux-trace-kernel,
	Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Linus Torvalds, Sebastian Andrzej Siewior, John Ogness,
	Peter Zijlstra, Julia Lawall, Yury Norov, linux-doc, linux-kbuild,
	linuxppc-dev, dri-devel, linux-stm32, linux-arm-kernel,
	linux-rdma, linux-usb, linux-ext4, linux-nfs, kvm, intel-gfx
In-Reply-To: <87ik7cmcb7.ffs@fw13>

On Sun, 21 Jun 2026 12:13:00 +0200
Thomas Gleixner <tglx@kernel.org> wrote:

> On Sun, Jun 21 2026 at 05:34, Steven Rostedt wrote:
> > Instead of having trace_printk.h included in kernel.h, create a config
> > TRACE_PRINTK_DEBUGGING that when set will update the CFLAGS in the
> > Makefile to allow developers to add trace_printk() without the need to add
> > the include for it. Having it included in the Makefile keeps it from being
> > in the dependency chain and it will not waste extra CPU cycles for those
> > building the kernel without using trace_printk.  
> 
> IOW, you make it worse just because.
> 
> With the header being separate I add the three trace_printk()s and the
> include to the source file I'm investigating. The recompile will build
> exactly this source file.
> 
> Having to enable the config knob will result in a full kernel rebuild
> for no value.

Indeed...
Isn't trace_printk() just an extern?
Having it defined somewhere isn't going to make any difference to build times.

	David
 

> 
> Seriously?
> 
> Thanks,
> 
>         tglx
> 
> 
> 



* Re: [PATCH 2/2] tracing: Add CONFIG_TRACE_PRINTK_DEBUGGING to clean up kernel.h
From: Steven Rostedt @ 2026-06-21 10:38 UTC (permalink / raw)
  To: Thomas Gleixner, Steven Rostedt, linux-kernel, linux-trace-kernel
  Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Linus Torvalds, Sebastian Andrzej Siewior, John Ogness,
	Peter Zijlstra, Julia Lawall, Yury Norov, linux-doc, linux-kbuild,
	linuxppc-dev, dri-devel, linux-stm32, linux-arm-kernel,
	linux-rdma, linux-usb, linux-ext4, linux-nfs, kvm, intel-gfx
In-Reply-To: <87ik7cmcb7.ffs@fw13>



On June 21, 2026 11:13:00 AM GMT+01:00, Thomas Gleixner <tglx@kernel.org> wrote:
>On Sun, Jun 21 2026 at 05:34, Steven Rostedt wrote:
>> Instead of having trace_printk.h included in kernel.h, create a config
>> TRACE_PRINTK_DEBUGGING that when set will update the CFLAGS in the
>> Makefile to allow developers to add trace_printk() without the need to add
>> the include for it. Having it included in the Makefile keeps it from being
>> in the dependency chain and it will not waste extra CPU cycles for those
>> building the kernel without using trace_printk.
>
>IOW, you make it worse just because.
>
>With the header being separate I add the three trace_printk()s and the
>include to the source file I'm investigating. The recompile will build
>exactly this source file.
>
>Having to enable the config knob will result in a full kernel rebuild
>for no value.
>
>Seriously?

Like having lockdep enabled, this would always be set in the development environment. It's not something to only enable when you need to add a trace_printk. If you don't want to rebuild everything, by all means add the include file by file. There's nothing preventing you from doing that with this solution.

-- Steve 

P.S. I'm replying on my phone as I'm in the London Tube. Thus why I'm not trimming my email.


>
>Thanks,
>
>        tglx
>


* Re: [PATCH 2/2] tracing: Add CONFIG_TRACE_PRINTK_DEBUGGING to clean up kernel.h
From: Thomas Gleixner @ 2026-06-21 10:13 UTC (permalink / raw)
  To: Steven Rostedt, linux-kernel, linux-trace-kernel
  Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Linus Torvalds, Sebastian Andrzej Siewior, John Ogness,
	Peter Zijlstra, Julia Lawall, Yury Norov, linux-doc, linux-kbuild,
	linuxppc-dev, dri-devel, linux-stm32, linux-arm-kernel,
	linux-rdma, linux-usb, linux-ext4, linux-nfs, kvm, intel-gfx
In-Reply-To: <20260621093811.168514984@kernel.org>

On Sun, Jun 21 2026 at 05:34, Steven Rostedt wrote:
> Instead of having trace_printk.h included in kernel.h, create a config
> TRACE_PRINTK_DEBUGGING that when set will update the CFLAGS in the
> Makefile to allow developers to add trace_printk() without the need to add
> the include for it. Having it included in the Makefile keeps it from being
> in the dependency chain and it will not waste extra CPU cycles for those
> building the kernel without using trace_printk.

IOW, you make it worse just because.

With the header being separate I add the three trace_printk()s and the
include to the source file I'm investigating. The recompile will build
exactly this source file.

Having to enable the config knob will result in a full kernel rebuild
for no value.

Seriously?

Thanks,

        tglx




* Re: [PATCH 2/2] tracing: Add CONFIG_TRACE_PRINTK_DEBUGGING to clean up kernel.h
From: Steven Rostedt @ 2026-06-21  9:47 UTC (permalink / raw)
  To: linux-kernel, linux-trace-kernel
  Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Linus Torvalds, Sebastian Andrzej Siewior, John Ogness,
	Thomas Gleixner, Peter Zijlstra, Julia Lawall, Yury Norov,
	linux-doc, linux-kbuild, linuxppc-dev, dri-devel, linux-stm32,
	linux-arm-kernel, linux-rdma, linux-usb, linux-ext4, linux-nfs,
	kvm, intel-gfx
In-Reply-To: <20260621093811.168514984@kernel.org>

On Sun, 21 Jun 2026 05:34:32 -0400
Steven Rostedt <rostedt@kernel.org> wrote:

> Instead of having trace_printk.h included in kernel.h, create a config
> TRACE_PRINTK_DEBUGGING that when set will update the CFLAGS in the
> Makefile to allow developers to add trace_printk() without the need to add
> the include for it. Having it included in the Makefile keeps it from being
> in the dependency chain and it will not waste extra CPU cycles for those
> building the kernel without using trace_printk.

Bah, I only tested with the config option enabled, and missed some
dependencies with it disabled.

For instance, rcu.h also uses ftrace_dump() so that too needs to go
into kernel.h. I also need to add a few more includes to trace_printk.h.

OK, I need to run this through all my tests to find where else I missed
adding the includes. But the idea should hopefully satisfy everyone.

-- Steve


* [PATCH 2/2] tracing: Add CONFIG_TRACE_PRINTK_DEBUGGING to clean up kernel.h
From: Steven Rostedt @ 2026-06-21  9:34 UTC (permalink / raw)
  To: linux-kernel, linux-trace-kernel
  Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Linus Torvalds, Sebastian Andrzej Siewior, John Ogness,
	Thomas Gleixner, Peter Zijlstra, Julia Lawall, Yury Norov,
	linux-doc, linux-kbuild, linuxppc-dev, dri-devel, linux-stm32,
	linux-arm-kernel, linux-rdma, linux-usb, linux-ext4, linux-nfs,
	kvm, intel-gfx
In-Reply-To: <20260621093430.264983361@kernel.org>

From: Steven Rostedt <rostedt@goodmis.org>

Instead of having trace_printk.h included in kernel.h, create a config
TRACE_PRINTK_DEBUGGING that when set will update the CFLAGS in the
Makefile to allow developers to add trace_printk() without the need to add
the include for it. Having it included in the Makefile keeps it from being
in the dependency chain and it will not waste extra CPU cycles for those
building the kernel without using trace_printk.

Link: https://lore.kernel.org/all/CAHk-=wikCBeVFjVXiY4o-oepdbjAoir5+TcAgtL12c4u1TpZLQ@mail.gmail.com/

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 .../debugging/driver_development_debugging_guide.rst   |  2 +-
 Makefile                                               |  5 +++++
 arch/powerpc/kvm/book3s_xics.c                         |  1 +
 drivers/gpu/drm/i915/gt/intel_gtt.h                    |  1 +
 drivers/gpu/drm/i915/i915_gem.h                        |  1 +
 drivers/hwtracing/stm/dummy_stm.c                      |  4 ++++
 drivers/infiniband/hw/hfi1/trace_dbg.h                 |  1 +
 drivers/usb/early/xhci-dbc.c                           |  1 +
 fs/ext4/inline.c                                       |  1 +
 include/linux/kernel.h                                 |  1 -
 include/linux/sunrpc/debug.h                           |  1 +
 include/linux/trace_printk.h                           |  5 +++--
 kernel/trace/Kconfig                                   | 10 ++++++++++
 kernel/trace/ring_buffer_benchmark.c                   |  1 +
 kernel/trace/trace.h                                   |  1 +
 samples/fprobe/fprobe_example.c                        |  1 +
 samples/ftrace/ftrace-direct-modify.c                  |  1 +
 samples/ftrace/ftrace-direct-multi-modify.c            |  1 +
 samples/ftrace/ftrace-direct-multi.c                   |  2 +-
 samples/ftrace/ftrace-direct-too.c                     |  2 +-
 samples/ftrace/ftrace-direct.c                         |  2 +-
 21 files changed, 38 insertions(+), 7 deletions(-)

diff --git a/Documentation/process/debugging/driver_development_debugging_guide.rst b/Documentation/process/debugging/driver_development_debugging_guide.rst
index aca08f457793..3c87aa03622f 100644
--- a/Documentation/process/debugging/driver_development_debugging_guide.rst
+++ b/Documentation/process/debugging/driver_development_debugging_guide.rst
@@ -52,7 +52,7 @@ For the full documentation see :doc:`/core-api/printk-basics`
 Trace_printk
 ~~~~~~~~~~~~
 
-Prerequisite: ``CONFIG_DYNAMIC_FTRACE`` & ``#include <linux/ftrace.h>``
+Prerequisite: ``CONFIG_TRACE_PRINTK_DEBUGGING``
 
 It is a tiny bit less comfortable to use than printk(), because you will have
 to read the messages from the trace file (See: :ref:`read_ftrace_log`
diff --git a/Makefile b/Makefile
index d1c595db55c9..2f5923d5393b 100644
--- a/Makefile
+++ b/Makefile
@@ -840,6 +840,11 @@ ifdef CONFIG_FUNCTION_TRACER
   CC_FLAGS_FTRACE := -pg
 endif
 
+ifdef CONFIG_TRACE_PRINTK_DEBUGGING
+  # Allow trace_printk() to be used anywhere without including the header.
+  LINUXINCLUDE += -include $(srctree)/include/linux/trace_printk.h
+endif
+
 ifdef CONFIG_TRACEPOINTS
 # To check for unused tracepoints (tracepoints that are defined but never
 # called), run with:
diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c
index 74a44fa702b0..ef5eb596a56e 100644
--- a/arch/powerpc/kvm/book3s_xics.c
+++ b/arch/powerpc/kvm/book3s_xics.c
@@ -26,6 +26,7 @@
 #if 1
 #define XICS_DBG(fmt...) do { } while (0)
 #else
+#include <linux/trace_printk.h>
 #define XICS_DBG(fmt...) trace_printk(fmt)
 #endif
 
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
index b54ee4f25af1..f6f223090760 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -35,6 +35,7 @@
 #define I915_GFP_ALLOW_FAIL (GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN)
 
 #if IS_ENABLED(CONFIG_DRM_I915_TRACE_GTT)
+#include <linux/trace_printk.h>
 #define GTT_TRACE(...) trace_printk(__VA_ARGS__)
 #else
 #define GTT_TRACE(...)
diff --git a/drivers/gpu/drm/i915/i915_gem.h b/drivers/gpu/drm/i915/i915_gem.h
index 20b3cb29cfff..5cab1836dc1d 100644
--- a/drivers/gpu/drm/i915/i915_gem.h
+++ b/drivers/gpu/drm/i915/i915_gem.h
@@ -116,6 +116,7 @@ int i915_gem_open(struct drm_i915_private *i915, struct drm_file *file);
 #endif
 
 #if IS_ENABLED(CONFIG_DRM_I915_TRACE_GEM)
+#include <linux/trace_printk.h>
 #define GEM_TRACE(...) trace_printk(__VA_ARGS__)
 #define GEM_TRACE_ERR(...) do {						\
 	pr_err(__VA_ARGS__);						\
diff --git a/drivers/hwtracing/stm/dummy_stm.c b/drivers/hwtracing/stm/dummy_stm.c
index 38528ffdc0b3..784f9af7ccba 100644
--- a/drivers/hwtracing/stm/dummy_stm.c
+++ b/drivers/hwtracing/stm/dummy_stm.c
@@ -14,6 +14,10 @@
 #include <linux/stm.h>
 #include <uapi/linux/stm.h>
 
+#ifdef DEBUG
+#include <linux/trace_printk.h>
+#endif
+
 static ssize_t notrace
 dummy_stm_packet(struct stm_data *stm_data, unsigned int master,
 		 unsigned int channel, unsigned int packet, unsigned int flags,
diff --git a/drivers/infiniband/hw/hfi1/trace_dbg.h b/drivers/infiniband/hw/hfi1/trace_dbg.h
index 58304b91380f..30df5e246586 100644
--- a/drivers/infiniband/hw/hfi1/trace_dbg.h
+++ b/drivers/infiniband/hw/hfi1/trace_dbg.h
@@ -103,6 +103,7 @@ __hfi1_trace_def(IOCTL);
  */
 
 #ifdef HFI1_EARLY_DBG
+#include <linux/trace_printk.h>
 #define hfi1_dbg_early(fmt, ...) \
 	trace_printk(fmt, ##__VA_ARGS__)
 #else
diff --git a/drivers/usb/early/xhci-dbc.c b/drivers/usb/early/xhci-dbc.c
index 41118bba9197..955c73bd601f 100644
--- a/drivers/usb/early/xhci-dbc.c
+++ b/drivers/usb/early/xhci-dbc.c
@@ -30,6 +30,7 @@ static struct xdbc_state xdbc;
 static bool early_console_keep;
 
 #ifdef XDBC_TRACE
+#include <linux/trace_printk.h>
 #define	xdbc_trace	trace_printk
 #else
 static inline void xdbc_trace(const char *fmt, ...) { }
diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
index 8045e4ff270c..0eff4a0c6a6c 100644
--- a/fs/ext4/inline.c
+++ b/fs/ext4/inline.c
@@ -934,6 +934,7 @@ static int ext4_da_convert_inline_data_to_extent(struct address_space *mapping,
 }
 
 #ifdef INLINE_DIR_DEBUG
+#include <linux/trace_printk.h>
 void ext4_show_inline_dir(struct inode *dir, struct buffer_head *bh,
 			  void *inline_start, int inline_size)
 {
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index c3c68128827c..538655385089 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -31,7 +31,6 @@
 #include <linux/build_bug.h>
 #include <linux/sprintf.h>
 #include <linux/static_call_types.h>
-#include <linux/trace_printk.h>
 #include <linux/util_macros.h>
 #include <linux/wordpart.h>
 
diff --git a/include/linux/sunrpc/debug.h b/include/linux/sunrpc/debug.h
index ab61bed2f7af..7524f5d82fba 100644
--- a/include/linux/sunrpc/debug.h
+++ b/include/linux/sunrpc/debug.h
@@ -29,6 +29,7 @@ extern unsigned int		nlm_debug;
 # define ifdebug(fac)		if (unlikely(rpc_debug & RPCDBG_##fac))
 
 # if IS_ENABLED(CONFIG_SUNRPC_DEBUG_TRACE)
+#  include <linux/trace_printk.h>
 #  define __sunrpc_printk(fmt, ...)	trace_printk(fmt, ##__VA_ARGS__)
 # else
 #  define __sunrpc_printk(fmt, ...)	printk(KERN_DEFAULT fmt, ##__VA_ARGS__)
diff --git a/include/linux/trace_printk.h b/include/linux/trace_printk.h
index 879fed0805fd..66edec6d5dbf 100644
--- a/include/linux/trace_printk.h
+++ b/include/linux/trace_printk.h
@@ -1,11 +1,12 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 #ifndef _LINUX_TRACE_PRINTK_H
 #define _LINUX_TRACE_PRINTK_H
+#if !defined(__ASSEMBLY__) && !defined(__GENKSYMS__) && !defined(BUILD_VDSO)
 
-#include <linux/compiler_attributes.h>
 #include <linux/instruction_pointer.h>
 #include <linux/stddef.h>
 #include <linux/stringify.h>
+#include <linux/stdarg.h>
 
 /*
  * General tracing related utility functions - trace_printk(),
@@ -181,5 +182,5 @@ ftrace_vprintk(const char *fmt, va_list ap)
 }
 static inline void ftrace_dump(enum ftrace_dump_mode oops_dump_mode) { }
 #endif /* CONFIG_TRACING */
-
+#endif /* !defined(__ASSEMBLY__) && !defined(__GENKSYMS__) && !defined(BUILD_VDSO) */
 #endif
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 084f34dc6c9f..ffbd1b0ce66e 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -210,6 +210,16 @@ menuconfig FTRACE
 
 if FTRACE
 
+config TRACE_PRINTK_DEBUGGING
+	bool "Debug with trace_printk()"
+	help
+	  If you need to debug with trace_printk(), instead of adding
+	  include <linux/trace_printk.h> to every file you add a trace_printk
+	  to, select this option and it will add trace_printk.h to all code
+	  to allow tracing with trace_printk().
+
+	  If in doubt, select N
+
 config TRACEFS_AUTOMOUNT_DEPRECATED
 	bool "Automount tracefs on debugfs [DEPRECATED]"
 	depends on TRACING
diff --git a/kernel/trace/ring_buffer_benchmark.c b/kernel/trace/ring_buffer_benchmark.c
index 593e3b59e42e..2bb25caebb75 100644
--- a/kernel/trace/ring_buffer_benchmark.c
+++ b/kernel/trace/ring_buffer_benchmark.c
@@ -5,6 +5,7 @@
  * Copyright (C) 2009 Steven Rostedt <srostedt@redhat.com>
  */
 #include <linux/ring_buffer.h>
+#include <linux/trace_printk.h>
 #include <linux/completion.h>
 #include <linux/kthread.h>
 #include <uapi/linux/sched/types.h>
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 80fe152af1dd..580a3deab1e9 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -13,6 +13,7 @@
 #include <linux/ftrace.h>
 #include <linux/trace.h>
 #include <linux/hw_breakpoint.h>
+#include <linux/trace_printk.h>
 #include <linux/trace_seq.h>
 #include <linux/trace_events.h>
 #include <linux/compiler.h>
diff --git a/samples/fprobe/fprobe_example.c b/samples/fprobe/fprobe_example.c
index bfe98ce826f3..de81b9b4ca7d 100644
--- a/samples/fprobe/fprobe_example.c
+++ b/samples/fprobe/fprobe_example.c
@@ -12,6 +12,7 @@
 
 #define pr_fmt(fmt) "%s: " fmt, __func__
 
+#include <linux/trace_printk.h>
 #include <linux/kernel.h>
 #include <linux/module.h>
 #include <linux/fprobe.h>
diff --git a/samples/ftrace/ftrace-direct-modify.c b/samples/ftrace/ftrace-direct-modify.c
index 1ba1927b548e..30d0f8e644c8 100644
--- a/samples/ftrace/ftrace-direct-modify.c
+++ b/samples/ftrace/ftrace-direct-modify.c
@@ -1,4 +1,5 @@
 // SPDX-License-Identifier: GPL-2.0-only
+#include <linux/trace_printk.h>
 #include <linux/module.h>
 #include <linux/kthread.h>
 #include <linux/ftrace.h>
diff --git a/samples/ftrace/ftrace-direct-multi-modify.c b/samples/ftrace/ftrace-direct-multi-modify.c
index 7a7822dfeb50..f64b929e19ec 100644
--- a/samples/ftrace/ftrace-direct-multi-modify.c
+++ b/samples/ftrace/ftrace-direct-multi-modify.c
@@ -1,4 +1,5 @@
 // SPDX-License-Identifier: GPL-2.0-only
+#include <linux/trace_printk.h>
 #include <linux/module.h>
 #include <linux/kthread.h>
 #include <linux/ftrace.h>
diff --git a/samples/ftrace/ftrace-direct-multi.c b/samples/ftrace/ftrace-direct-multi.c
index 3fe6ddaf0b69..d32644a49554 100644
--- a/samples/ftrace/ftrace-direct-multi.c
+++ b/samples/ftrace/ftrace-direct-multi.c
@@ -1,6 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0-only
+#include <linux/trace_printk.h>
 #include <linux/module.h>
-
 #include <linux/mm.h> /* for handle_mm_fault() */
 #include <linux/ftrace.h>
 #include <linux/sched/stat.h>
diff --git a/samples/ftrace/ftrace-direct-too.c b/samples/ftrace/ftrace-direct-too.c
index bf2411aa6fd7..266fcb233301 100644
--- a/samples/ftrace/ftrace-direct-too.c
+++ b/samples/ftrace/ftrace-direct-too.c
@@ -1,6 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0-only
+#include <linux/trace_printk.h>
 #include <linux/module.h>
-
 #include <linux/mm.h> /* for handle_mm_fault() */
 #include <linux/ftrace.h>
 #if !defined(CONFIG_ARM64) && !defined(CONFIG_PPC32)
diff --git a/samples/ftrace/ftrace-direct.c b/samples/ftrace/ftrace-direct.c
index 5368c8c39cbb..85e0dff9b691 100644
--- a/samples/ftrace/ftrace-direct.c
+++ b/samples/ftrace/ftrace-direct.c
@@ -1,6 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0-only
+#include <linux/trace_printk.h>
 #include <linux/module.h>
-
 #include <linux/sched.h> /* for wake_up_process() */
 #include <linux/ftrace.h>
 #if !defined(CONFIG_ARM64) && !defined(CONFIG_PPC32)
-- 
2.53.0




* [PATCH 1/2] tracing: Move non-trace_printk prototypes back to kernel.h
From: Steven Rostedt @ 2026-06-21  9:34 UTC (permalink / raw)
  To: linux-kernel, linux-trace-kernel
  Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Linus Torvalds, Sebastian Andrzej Siewior, John Ogness,
	Thomas Gleixner, Peter Zijlstra, Julia Lawall, Yury Norov,
	linux-doc, linux-kbuild, linuxppc-dev, dri-devel, linux-stm32,
	linux-arm-kernel, linux-rdma, linux-usb, linux-ext4, linux-nfs,
	kvm, intel-gfx
In-Reply-To: <20260621093430.264983361@kernel.org>

From: Steven Rostedt <rostedt@goodmis.org>

In order to remove the include of trace_printk.h from kernel.h, the tracing
control prototypes need to be moved back into kernel.h. That's because
they are used in other common header files like rcu.h. There's no point in
removing trace_printk.h from kernel.h if it just gets added back to other
common headers.

Prototypes are very cheap for the compiler and should not be an issue.
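
As a rough illustration of why the prototypes and their empty stubs travel
together, here is a userspace sketch that mirrors a subset of the kernel.h
hunk in this patch. It assumes CONFIG_TRACING is not defined, so the stub
branch is the one compiled. debug_checkpoint() is a hypothetical caller made
up for this example and is not part of the patch.

```c
/*
 * Userspace mock of the kernel.h hunk. CONFIG_TRACING is left undefined
 * here, so the static inline stubs are used.
 */
#ifdef CONFIG_TRACING
void tracing_on(void);
void tracing_off(void);
int tracing_is_on(void);
#else
static inline void tracing_on(void) { }
static inline void tracing_off(void) { }
static inline int tracing_is_on(void) { return 0; }
#endif

/*
 * Hypothetical caller (assumption, not in the patch): stop tracing at a
 * point of interest. Returns 1 if tracing was on and has been stopped.
 * No #ifdef is needed here, because the header supplies either the real
 * prototypes or the no-op stubs.
 */
static int debug_checkpoint(void)
{
	if (!tracing_is_on())
		return 0;
	tracing_off();
	return 1;
}
```

With the stubs in place, a header such as rcu.h can call these functions
without pulling in trace_printk.h and without its own config guards.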

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 include/linux/kernel.h       | 18 ++++++++++++++++++
 include/linux/trace_printk.h | 17 -----------------
 2 files changed, 18 insertions(+), 17 deletions(-)

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index e5570a16cbb1..c3c68128827c 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -194,4 +194,22 @@ extern enum system_states system_state;
 # define REBUILD_DUE_TO_DYNAMIC_FTRACE
 #endif
 
+#ifdef CONFIG_TRACING
+void tracing_on(void);
+void tracing_off(void);
+int tracing_is_on(void);
+void tracing_snapshot(void);
+void tracing_snapshot_alloc(void);
+void tracing_start(void);
+void tracing_stop(void);
+#else
+static inline void tracing_start(void) { }
+static inline void tracing_stop(void) { }
+static inline void tracing_on(void) { }
+static inline void tracing_off(void) { }
+static inline int tracing_is_on(void) { return 0; }
+static inline void tracing_snapshot(void) { }
+static inline void tracing_snapshot_alloc(void) { }
+#endif
+
 #endif
diff --git a/include/linux/trace_printk.h b/include/linux/trace_printk.h
index 3d54f440dccf..879fed0805fd 100644
--- a/include/linux/trace_printk.h
+++ b/include/linux/trace_printk.h
@@ -35,15 +35,6 @@ enum ftrace_dump_mode {
 };
 
 #ifdef CONFIG_TRACING
-void tracing_on(void);
-void tracing_off(void);
-int tracing_is_on(void);
-void tracing_snapshot(void);
-void tracing_snapshot_alloc(void);
-
-extern void tracing_start(void);
-extern void tracing_stop(void);
-
 static inline __printf(1, 2)
 void ____trace_printk_check_format(const char *fmt, ...)
 {
@@ -176,16 +167,8 @@ __ftrace_vprintk(unsigned long ip, const char *fmt, va_list ap);
 
 extern void ftrace_dump(enum ftrace_dump_mode oops_dump_mode);
 #else
-static inline void tracing_start(void) { }
-static inline void tracing_stop(void) { }
 static inline void trace_dump_stack(int skip) { }
 
-static inline void tracing_on(void) { }
-static inline void tracing_off(void) { }
-static inline int tracing_is_on(void) { return 0; }
-static inline void tracing_snapshot(void) { }
-static inline void tracing_snapshot_alloc(void) { }
-
 static inline __printf(1, 2)
 int trace_printk(const char *fmt, ...)
 {
-- 
2.53.0




* [PATCH 0/2] tracing: Move trace_printk.h out of kernel.h
From: Steven Rostedt @ 2026-06-21  9:34 UTC (permalink / raw)
  To: linux-kernel, linux-trace-kernel
  Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
	Linus Torvalds, Sebastian Andrzej Siewior, John Ogness,
	Thomas Gleixner, Peter Zijlstra, Julia Lawall, Yury Norov,
	linux-doc, linux-kbuild, linuxppc-dev, dri-devel, linux-stm32,
	linux-arm-kernel, linux-rdma, linux-usb, linux-ext4, linux-nfs,
	kvm, intel-gfx

There have been complaints about trace_printk() being defined in kernel.h, as
it can increase compilation time. Since it is only used by some developers for
debugging purposes, it should not be in kernel.h, wasting CPU cycles for
those who never care about it.

Instead, add a CONFIG_TRACE_PRINTK_DEBUGGING option. Developers who do use
trace_printk() can set it and not have to remember to add
#include <linux/trace_printk.h> to each file where they add trace_printk()
while debugging. It also means that those who do not have that config set will
not have to worry about wasted CPU cycles, as it is only included in the
CFLAGS when the option is set, and it's completely ignored otherwise.

Steven Rostedt (2):
      tracing: Move non-trace_printk prototypes back to kernel.h
      tracing: Add CONFIG_TRACE_PRINTK_DEBUGGING to clean up kernel.h

----
 .../driver_development_debugging_guide.rst         |  2 +-
 Makefile                                           |  5 +++++
 arch/powerpc/kvm/book3s_xics.c                     |  1 +
 drivers/gpu/drm/i915/gt/intel_gtt.h                |  1 +
 drivers/gpu/drm/i915/i915_gem.h                    |  1 +
 drivers/hwtracing/stm/dummy_stm.c                  |  4 ++++
 drivers/infiniband/hw/hfi1/trace_dbg.h             |  1 +
 drivers/usb/early/xhci-dbc.c                       |  1 +
 fs/ext4/inline.c                                   |  1 +
 include/linux/kernel.h                             | 19 ++++++++++++++++++-
 include/linux/sunrpc/debug.h                       |  1 +
 include/linux/trace_printk.h                       | 22 +++-------------------
 kernel/trace/Kconfig                               | 10 ++++++++++
 kernel/trace/ring_buffer_benchmark.c               |  1 +
 kernel/trace/trace.h                               |  1 +
 samples/fprobe/fprobe_example.c                    |  1 +
 samples/ftrace/ftrace-direct-modify.c              |  1 +
 samples/ftrace/ftrace-direct-multi-modify.c        |  1 +
 samples/ftrace/ftrace-direct-multi.c               |  2 +-
 samples/ftrace/ftrace-direct-too.c                 |  2 +-
 samples/ftrace/ftrace-direct.c                     |  2 +-
 21 files changed, 56 insertions(+), 24 deletions(-)


* Re: [PATCH v10 03/22] ovl: use core fsverity ensure info interface
From: Andrey Albershteyn @ 2026-06-20  8:19 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Amir Goldstein, linux-xfs, fsverity, linux-fsdevel, hch,
	linux-ext4, linux-f2fs-devel, linux-btrfs, linux-unionfs, djwong
In-Reply-To: <20260619165448.GB3223@sol>

On 2026-06-19 09:54:48, Eric Biggers wrote:
> On Fri, Jun 19, 2026 at 09:28:31AM +0200, Amir Goldstein wrote:
> > On Wed, May 20, 2026 at 9:07 PM Eric Biggers <ebiggers@kernel.org> wrote:
> > >
> > > On Wed, May 20, 2026 at 02:37:01PM +0200, Andrey Albershteyn wrote:
> > > > fsverity now exposes fsverity_ensure_verity_info() which could be used
> > > > instead of opening file to ensure that fsverity info is loaded and
> > > > attached to inode.
> > > >
> > > > Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
> > > > Acked-by: Amir Goldstein <amir73il@gmail.com>
> > > > ---
> > > >  fs/overlayfs/util.c | 14 +++-----------
> > > >  1 file changed, 3 insertions(+), 11 deletions(-)
> > >
> > > Reviewed-by: Eric Biggers <ebiggers@kernel.org>
> > >
> > > I'm still confused by the new implementation of fsverity_active() that
> > > got introduced by "fsverity: use a hashtable to find the fsverity_info",
> > > though.  I should have caught this during review of that commit.  For
> > > one its comment is outdated, but also the memory barrier seems to be
> > > specific to the fsverity_get_info() caller and probably should be moved
> > > to there.  Anyway, that's not directly related to this patch.
> > 
> > Eric, Andrey,
> > 
> > Did you see the Sashiko review for this patch and others in this series?
> > 
> > https://sashiko.dev/#/patchset/20260520123722.405752-1-aalbersh%40kernel.org
> > 
> > It annotated some review comments as high and critical.
> > For this patch it is about interaction with fscrypt.
> > 
> > Please take a look and say if this is concerning or false positive.
> 
> Yes, this patch is broken and should be dropped.  I need to remember to
> look at the Sashiko reviews for other people's patches and not just
> trust that the submitter will.  Fortunately this one wasn't applied yet.
> 
> I pointed out the HIGHMEM performance bug in
> "fsverity: generate and store zero-block hash" earlier
> (https://lore.kernel.org/linux-fsdevel/20260401222717.GH2466@quark/).  I
> assume it was decided that no one will care about the combination of XFS
> && fsverity && HIGHMEM.  But the XFS folks should double-check that.
> 
> Andrey, could you check the Sashiko reviews for the other patches too?

Sure, I will go through them

-- 
- Andrey


* [syzbot ci] Re: Data in direntry (dirdata) feature
From: syzbot ci @ 2026-06-20  6:55 UTC (permalink / raw)
  To: adilger.kernel, adilger, adilger, artem.blagodarenko, linux-ext4,
	pravin.shelar
  Cc: syzbot, syzkaller-bugs
In-Reply-To: <20260619191022.27008-1-ablagodarenko@thelustrecollective.com>

syzbot ci has tested the following series

[v3] Data in direntry (dirdata) feature
https://lore.kernel.org/all/20260619191022.27008-1-ablagodarenko@thelustrecollective.com
* [PATCH v3 01/10] ext4: replace ext4_dir_entry with ext4_dir_entry_2
* [PATCH v3 02/10] ext4: add ext4_dir_entry_is_tail()
* [PATCH v3 03/10] ext4: refactor dx_root to support variable dirent sizes
* [PATCH v3 04/10] ext4: add dirdata format definitions and access helpers
* [PATCH v3 05/10] ext4: preserve dirdata bits in get_dtype()
* [PATCH v3 06/10] ext4: add ext4_dir_entry_len() and harden dirdata parsing
* [PATCH v3 07/10] ext4: rename ext4_dir_rec_len() and clarify dirdata usage
* [PATCH v3 08/10] ext4: dirdata feature
* [PATCH v3 09/10] ext4: add dirdata set/get helpers
* [PATCH v3 10/10] ext4: Add EXT4_IOC_SET_LUFID ioctl for setting LUFID on directory entries

and found the following issues:
* KASAN: slab-out-of-bounds Read in __ext4_check_dir_entry
* KASAN: slab-out-of-bounds Read in ext4_inlinedir_to_tree
* KASAN: slab-use-after-free Read in __ext4_check_dir_entry
* KASAN: slab-use-after-free Read in ext4_inlinedir_to_tree
* KASAN: use-after-free Read in __ext4_check_dir_entry

Full report is available here:
https://ci.syzbot.org/series/a3c6e513-a6eb-4583-86f6-89176398b397

***

KASAN: slab-out-of-bounds Read in __ext4_check_dir_entry

tree:      torvalds
URL:       https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux
base:      08c7183f5b9ffe4408e74fff848a4cc2105361d4
arch:      amd64
compiler:  Debian clang version 22.1.6 (++20260514074242+fc4aad7b5db3-1~exp1~20260514074407.73), Debian LLD 22.1.6
config:    https://ci.syzbot.org/builds/0efdb868-daeb-4649-9bcb-5af41d993e73/config
syz repro: https://ci.syzbot.org/findings/ec557d64-7b60-46c9-a0eb-feaa7a3eb2cd/syz_repro

loop0: lost filesystem error report for type 5 error -117
EXT4-fs (loop0): mounted filesystem 00000000-0000-0000-0000-000000000000 r/w without journal. Quota mode: none.
==================================================================
BUG: KASAN: slab-out-of-bounds in ext4_dirent_get_data_len fs/ext4/ext4.h:4156 [inline]
BUG: KASAN: slab-out-of-bounds in ext4_dir_entry_len fs/ext4/ext4.h:4183 [inline]
BUG: KASAN: slab-out-of-bounds in __ext4_check_dir_entry+0x659/0xbe0 fs/ext4/dir.c:96
Read of size 1 at addr ffff88816e86bcd9 by task syz.0.21/5783

CPU: 1 UID: 0 PID: 5783 Comm: syz.0.21 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
 print_address_description+0x55/0x1e0 mm/kasan/report.c:378
 print_report+0x58/0x70 mm/kasan/report.c:482
 kasan_report+0x117/0x150 mm/kasan/report.c:595
 ext4_dirent_get_data_len fs/ext4/ext4.h:4156 [inline]
 ext4_dir_entry_len fs/ext4/ext4.h:4183 [inline]
 __ext4_check_dir_entry+0x659/0xbe0 fs/ext4/dir.c:96
 ext4_check_all_de+0x6a/0x140 fs/ext4/dir.c:657
 ext4_convert_inline_data_nolock+0x1b7/0x980 fs/ext4/inline.c:1121
 ext4_try_add_inline_entry+0x5cc/0x8a0 fs/ext4/inline.c:1247
 __ext4_add_entry+0x385/0x3470 fs/ext4/namei.c:2552
 ext4_add_entry fs/ext4/namei.c:2636 [inline]
 ext4_mkdir+0x5f3/0xd30 fs/ext4/namei.c:3203
 vfs_mkdir+0x406/0x620 fs/namei.c:5272
 filename_mkdirat+0x285/0x510 fs/namei.c:5305
 __do_sys_mkdirat fs/namei.c:5326 [inline]
 __se_sys_mkdirat+0x35/0x150 fs/namei.c:5323
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fb4c839ce59
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffd4b254a88 EFLAGS: 00000246 ORIG_RAX: 0000000000000102
RAX: ffffffffffffffda RBX: 00007fb4c8615fa0 RCX: 00007fb4c839ce59
RDX: 0000000000000037 RSI: 0000200000000380 RDI: 0000000000000004
RBP: 00007fb4c8432e6f R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007fb4c8615fac R14: 00007fb4c8615fa0 R15: 00007fb4c8615fa0
 </TASK>

Allocated by task 5056:
 kasan_save_stack mm/kasan/common.c:57 [inline]
 kasan_save_track+0x3e/0x80 mm/kasan/common.c:78
 poison_kmalloc_redzone mm/kasan/common.c:398 [inline]
 __kasan_kmalloc+0x93/0xb0 mm/kasan/common.c:415
 kasan_kmalloc include/linux/kasan.h:263 [inline]
 __kmalloc_cache_noprof+0x318/0x660 mm/slub.c:5451
 _kmalloc_noprof include/linux/slab.h:969 [inline]
 _kzalloc_noprof include/linux/slab.h:1286 [inline]
 kernfs_get_open_node fs/kernfs/file.c:536 [inline]
 kernfs_fop_open+0x7e6/0xce0 fs/kernfs/file.c:711
 do_dentry_open+0x816/0x1380 fs/open.c:947
 vfs_open+0x3b/0x340 fs/open.c:1079
 do_open fs/namei.c:4700 [inline]
 path_openat+0x2e44/0x3830 fs/namei.c:4859
 do_file_open+0x23e/0x4a0 fs/namei.c:4888
 do_sys_openat2+0x115/0x200 fs/open.c:1395
 do_sys_open fs/open.c:1401 [inline]
 __do_sys_openat fs/open.c:1417 [inline]
 __se_sys_openat fs/open.c:1412 [inline]
 __x64_sys_openat+0x138/0x170 fs/open.c:1412
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

The buggy address belongs to the object at ffff88816e86bc00
 which belongs to the cache kmalloc-128 of size 128
The buggy address is located 89 bytes to the right of
 allocated 128-byte region [ffff88816e86bc00, ffff88816e86bc80)

The buggy address belongs to the physical page:
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0xffff88816e86bf00 pfn:0x16e86b
flags: 0x57ff00000000200(workingset|node=1|zone=2|lastcpupid=0x7ff)
page_type: f5(slab)
raw: 057ff00000000200 ffff888100041a00 ffff888160400648 ffff888160400648
raw: ffff88816e86bf00 000000080010000f 00000000f5000000 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 0, migratetype Unmovable, gfp_mask 0xd2cc0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 5056, tgid 5056 (udevd), ts 53238350188, free_ts 53091667410
 set_page_owner include/linux/page_owner.h:32 [inline]
 post_alloc_hook+0x22d/0x280 mm/page_alloc.c:1853
 prep_new_page mm/page_alloc.c:1861 [inline]
 get_page_from_freelist+0x24ae/0x2530 mm/page_alloc.c:3941
 __alloc_frozen_pages_noprof+0x18d/0x380 mm/page_alloc.c:5221
 alloc_slab_page mm/slub.c:3289 [inline]
 allocate_slab+0x74/0x5d0 mm/slub.c:3404
 new_slab mm/slub.c:3447 [inline]
 refill_objects+0x328/0x3c0 mm/slub.c:7241
 refill_sheaf mm/slub.c:2827 [inline]
 __pcs_replace_empty_main+0x2e0/0x6b0 mm/slub.c:4692
 alloc_from_pcs mm/slub.c:4790 [inline]
 slab_alloc_node mm/slub.c:4924 [inline]
 __kmalloc_cache_noprof+0x388/0x660 mm/slub.c:5446
 _kmalloc_noprof include/linux/slab.h:969 [inline]
 _kzalloc_noprof include/linux/slab.h:1286 [inline]
 kernfs_get_open_node fs/kernfs/file.c:536 [inline]
 kernfs_fop_open+0x7e6/0xce0 fs/kernfs/file.c:711
 do_dentry_open+0x816/0x1380 fs/open.c:947
 vfs_open+0x3b/0x340 fs/open.c:1079
 do_open fs/namei.c:4700 [inline]
 path_openat+0x2e44/0x3830 fs/namei.c:4859
 do_file_open+0x23e/0x4a0 fs/namei.c:4888
 do_sys_openat2+0x115/0x200 fs/open.c:1395
 do_sys_open fs/open.c:1401 [inline]
 __do_sys_openat fs/open.c:1417 [inline]
 __se_sys_openat fs/open.c:1412 [inline]
 __x64_sys_openat+0x138/0x170 fs/open.c:1412
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
page last free pid 23 tgid 23 stack trace:
 reset_page_owner include/linux/page_owner.h:25 [inline]
 __free_pages_prepare mm/page_alloc.c:1397 [inline]
 __free_frozen_pages+0xc0d/0xd20 mm/page_alloc.c:2938
 __tlb_remove_table_free mm/mmu_gather.c:228 [inline]
 tlb_remove_table_rcu+0x85/0x100 mm/mmu_gather.c:291
 rcu_do_batch kernel/rcu/tree.c:2645 [inline]
 rcu_core+0x78b/0x10a0 kernel/rcu/tree.c:2897
 handle_softirqs+0x225/0x840 kernel/softirq.c:622
 run_ksoftirqd+0x36/0x60 kernel/softirq.c:1076
 smpboot_thread_fn+0x57c/0xa80 kernel/smpboot.c:160
 kthread+0x388/0x470 kernel/kthread.c:436
 ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245

Memory state around the buggy address:
 ffff88816e86bb80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff88816e86bc00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff88816e86bc80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
                                                    ^
 ffff88816e86bd00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff88816e86bd80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================
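
The "89 bytes to the right" figure and the shadow byte under the caret in the
report above can be rechecked by hand. The sketch below does that arithmetic,
assuming generic KASAN with one shadow byte per 8 bytes of memory. The helper
names are made up for this example.

```c
#include <stdint.h>

/* Generic KASAN (assumption): one shadow byte describes 8 bytes of memory. */
#define KASAN_GRANULE_SIZE 8u

/* Distance from the end of the allocated region to the faulting address. */
static uint64_t bytes_past_end(uint64_t addr, uint64_t region_end)
{
	return addr - region_end;
}

/* Index of the shadow byte, within one dumped row, that covers addr. */
static unsigned int shadow_index_in_row(uint64_t addr, uint64_t row_base)
{
	return (unsigned int)((addr - row_base) / KASAN_GRANULE_SIZE);
}
```

For the address ffff88816e86bcd9 and the region end ffff88816e86bc80 this
gives 0x59 = 89 bytes, and shadow index 11 in the row marked with '>', which
is one of the fc bytes in that row.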


***

KASAN: slab-out-of-bounds Read in ext4_inlinedir_to_tree

tree:      torvalds
URL:       https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux
base:      08c7183f5b9ffe4408e74fff848a4cc2105361d4
arch:      amd64
compiler:  Debian clang version 22.1.6 (++20260514074242+fc4aad7b5db3-1~exp1~20260514074407.73), Debian LLD 22.1.6
config:    https://ci.syzbot.org/builds/0efdb868-daeb-4649-9bcb-5af41d993e73/config
syz repro: https://ci.syzbot.org/findings/bb78d414-4cff-400b-aaf6-76d450b12cda/syz_repro

==================================================================
BUG: KASAN: slab-out-of-bounds in ext4_dir_entry_len fs/ext4/ext4.h:4182 [inline]
BUG: KASAN: slab-out-of-bounds in ext4_inlinedir_to_tree+0xd95/0x10a0 fs/ext4/inline.c:1335
Read of size 2 at addr ffff88816f219a3c by task syz.1.18/5830

CPU: 1 UID: 0 PID: 5830 Comm: syz.1.18 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
 print_address_description+0x55/0x1e0 mm/kasan/report.c:378
 print_report+0x58/0x70 mm/kasan/report.c:482
 kasan_report+0x117/0x150 mm/kasan/report.c:595
 ext4_dir_entry_len fs/ext4/ext4.h:4182 [inline]
 ext4_inlinedir_to_tree+0xd95/0x10a0 fs/ext4/inline.c:1335
 ext4_htree_fill_tree+0x4c9/0x2480 fs/ext4/namei.c:1195
 ext4_dx_readdir fs/ext4/dir.c:600 [inline]
 ext4_readdir+0x2e2a/0x3720 fs/ext4/dir.c:146
 iterate_dir+0x2e2/0x4d0 fs/readdir.c:110
 __do_sys_getdents fs/readdir.c:319 [inline]
 __se_sys_getdents+0xf1/0x270 fs/readdir.c:304
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fe51459ce59
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fe515527028 EFLAGS: 00000246 ORIG_RAX: 000000000000004e
RAX: ffffffffffffffda RBX: 00007fe514815fa0 RCX: 00007fe51459ce59
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000006
RBP: 00007fe514632e6f R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007fe514816038 R14: 00007fe514815fa0 R15: 00007fffd9b381d8
 </TASK>

Allocated by task 5830:
 kasan_save_stack mm/kasan/common.c:57 [inline]
 kasan_save_track+0x3e/0x80 mm/kasan/common.c:78
 poison_kmalloc_redzone mm/kasan/common.c:398 [inline]
 __kasan_kmalloc+0x93/0xb0 mm/kasan/common.c:415
 kasan_kmalloc include/linux/kasan.h:263 [inline]
 __do_kmalloc_node mm/slub.c:5334 [inline]
 __kmalloc_noprof+0x358/0x750 mm/slub.c:5347
 _kmalloc_noprof include/linux/slab.h:973 [inline]
 ext4_inlinedir_to_tree+0x2ec/0x10a0 fs/ext4/inline.c:1292
 ext4_htree_fill_tree+0x4c9/0x2480 fs/ext4/namei.c:1195
 ext4_dx_readdir fs/ext4/dir.c:600 [inline]
 ext4_readdir+0x2e2a/0x3720 fs/ext4/dir.c:146
 iterate_dir+0x2e2/0x4d0 fs/readdir.c:110
 __do_sys_getdents fs/readdir.c:319 [inline]
 __se_sys_getdents+0xf1/0x270 fs/readdir.c:304
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

The buggy address belongs to the object at ffff88816f219a00
 which belongs to the cache kmalloc-64 of size 64
The buggy address is located 0 bytes to the right of
 allocated 60-byte region [ffff88816f219a00, ffff88816f219a3c)

The buggy address belongs to the physical page:
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x16f219
flags: 0x57ff00000000000(node=1|zone=2|lastcpupid=0x7ff)
page_type: f5(slab)
raw: 057ff00000000000 ffff8881000418c0 dead000000000100 dead000000000122
raw: 0000000000000000 0000000800200020 00000000f5000000 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 0, migratetype Unmovable, gfp_mask 0xd2cc0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 1, tgid 1 (swapper/0), ts 10138712683, free_ts 10137977139
 set_page_owner include/linux/page_owner.h:32 [inline]
 post_alloc_hook+0x22d/0x280 mm/page_alloc.c:1853
 prep_new_page mm/page_alloc.c:1861 [inline]
 get_page_from_freelist+0x24ae/0x2530 mm/page_alloc.c:3941
 __alloc_frozen_pages_noprof+0x18d/0x380 mm/page_alloc.c:5221
 alloc_slab_page mm/slub.c:3289 [inline]
 allocate_slab+0x74/0x5d0 mm/slub.c:3404
 new_slab mm/slub.c:3447 [inline]
 refill_objects+0x328/0x3c0 mm/slub.c:7241
 refill_sheaf mm/slub.c:2827 [inline]
 __pcs_replace_empty_main+0x2e0/0x6b0 mm/slub.c:4692
 alloc_from_pcs mm/slub.c:4790 [inline]
 slab_alloc_node mm/slub.c:4924 [inline]
 __do_kmalloc_node mm/slub.c:5333 [inline]
 __kmalloc_noprof+0x464/0x750 mm/slub.c:5347
 _kmalloc_noprof include/linux/slab.h:973 [inline]
 _kzalloc_noprof include/linux/slab.h:1286 [inline]
 kobject_get_path+0xc5/0x2f0 lib/kobject.c:161
 kobject_uevent_env+0x29e/0x9e0 lib/kobject_uevent.c:548
 device_add+0x544/0xb80 drivers/base/core.c:3738
 scsi_add_host_with_dma+0x5db/0xd00 drivers/scsi/hosts.c:287
 ata_scsi_add_hosts+0x29b/0x4b0 drivers/ata/libata-scsi.c:4659
 ata_host_register+0x1c5/0x7d0 drivers/ata/libata-core.c:6131
 ata_host_activate+0x33c/0x3c0 drivers/ata/libata-core.c:6234
 ahci_init_one+0x1afa/0x22b0 drivers/ata/ahci.c:3090
 local_pci_probe drivers/pci/pci-driver.c:332 [inline]
 pci_call_probe drivers/pci/pci-driver.c:394 [inline]
 __pci_device_probe drivers/pci/pci-driver.c:455 [inline]
 pci_device_probe+0x431/0xc90 drivers/pci/pci-driver.c:489
page last free pid 36 tgid 36 stack trace:
 reset_page_owner include/linux/page_owner.h:25 [inline]
 __free_pages_prepare mm/page_alloc.c:1397 [inline]
 __free_frozen_pages+0xc0d/0xd20 mm/page_alloc.c:2938
 __kasan_populate_vmalloc_do mm/kasan/shadow.c:393 [inline]
 __kasan_populate_vmalloc+0x1a8/0x1c0 mm/kasan/shadow.c:424
 kasan_populate_vmalloc include/linux/kasan.h:580 [inline]
 alloc_vmap_area+0xd1a/0x1420 mm/vmalloc.c:2123
 __get_vm_area_node+0x1f2/0x300 mm/vmalloc.c:3226
 __vmalloc_node_range_noprof+0x358/0x1730 mm/vmalloc.c:4024
 __vmalloc_node_noprof+0xc2/0x100 mm/vmalloc.c:4124
 alloc_thread_stack_node kernel/fork.c:358 [inline]
 dup_task_struct+0x28e/0x850 kernel/fork.c:928
 copy_process+0x81b/0x42e0 kernel/fork.c:2109
 kernel_clone+0x2d7/0x940 kernel/fork.c:2745
 user_mode_thread+0x110/0x180 kernel/fork.c:2821
 call_usermodehelper_exec_work+0x5c/0x230 kernel/umh.c:171
 process_one_work kernel/workqueue.c:3322 [inline]
 process_scheduled_works+0xa8e/0x14e0 kernel/workqueue.c:3405
 worker_thread+0xa47/0xfb0 kernel/workqueue.c:3486
 kthread+0x388/0x470 kernel/kthread.c:436
 ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245

Memory state around the buggy address:
 ffff88816f219900: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc
 ffff88816f219980: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
>ffff88816f219a00: 00 00 00 00 00 00 00 04 fc fc fc fc fc fc fc fc
                                        ^
 ffff88816f219a80: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
 ffff88816f219b00: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
==================================================================
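
The "04" in the row marked with '>' above is how the shadow memory describes
the 60-byte allocation: seven fully accessible 8-byte granules (56 bytes)
followed by a granule in which only the first 4 bytes are valid. The sketch
below is a simplified model of that encoding, assuming generic KASAN. It only
handles accesses that stay within one granule, and granule_access_ok() is a
name made up for this example rather than a kernel function.

```c
#include <stdbool.h>
#include <stdint.h>

#define KASAN_GRANULE_SIZE 8u

/*
 * Simplified model (assumption) of a generic KASAN shadow byte:
 *   0      all 8 bytes of the granule are accessible
 *   1..7   only the first N bytes are accessible
 *   < 0    (0x80 and above, e.g. 0xfc) nothing is accessible
 * off is the offset of the access inside the granule, size its length.
 */
static bool granule_access_ok(uint8_t shadow, unsigned int off,
			      unsigned int size)
{
	int8_t s = (int8_t)shadow;

	/* This sketch does not model accesses that cross a granule. */
	if (size == 0 || off + size > KASAN_GRANULE_SIZE)
		return false;
	if (s == 0)
		return true;
	if (s < 0)
		return false;
	return off + size <= (unsigned int)s;
}
```

The reported 2-byte read at ffff88816f219a3c starts at object offset 60, which
is offset 4 inside the eighth granule. With a shadow value of 04 that access
is rejected, which agrees with the "0 bytes to the right of allocated 60-byte
region" line in the report.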


***

KASAN: slab-use-after-free Read in __ext4_check_dir_entry

tree:      torvalds
URL:       https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux
base:      08c7183f5b9ffe4408e74fff848a4cc2105361d4
arch:      amd64
compiler:  Debian clang version 22.1.6 (++20260514074242+fc4aad7b5db3-1~exp1~20260514074407.73), Debian LLD 22.1.6
config:    https://ci.syzbot.org/builds/0efdb868-daeb-4649-9bcb-5af41d993e73/config
syz repro: https://ci.syzbot.org/findings/f322e293-7a3f-469a-ae1f-677c84eb4c0f/syz_repro

==================================================================
BUG: KASAN: slab-use-after-free in ext4_dirent_get_data_len fs/ext4/ext4.h:4156 [inline]
BUG: KASAN: slab-use-after-free in ext4_dir_entry_len fs/ext4/ext4.h:4183 [inline]
BUG: KASAN: slab-use-after-free in __ext4_check_dir_entry+0x659/0xbe0 fs/ext4/dir.c:96
Read of size 1 at addr ffff888103e89c1d by task syz.2.19/5867

CPU: 0 UID: 0 PID: 5867 Comm: syz.2.19 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
 print_address_description+0x55/0x1e0 mm/kasan/report.c:378
 print_report+0x58/0x70 mm/kasan/report.c:482
 kasan_report+0x117/0x150 mm/kasan/report.c:595
 ext4_dirent_get_data_len fs/ext4/ext4.h:4156 [inline]
 ext4_dir_entry_len fs/ext4/ext4.h:4183 [inline]
 __ext4_check_dir_entry+0x659/0xbe0 fs/ext4/dir.c:96
 ext4_check_all_de+0x6a/0x140 fs/ext4/dir.c:657
 ext4_convert_inline_data_nolock+0x1b7/0x980 fs/ext4/inline.c:1121
 ext4_try_add_inline_entry+0x5cc/0x8a0 fs/ext4/inline.c:1247
 __ext4_add_entry+0x385/0x3470 fs/ext4/namei.c:2552
 ext4_add_entry fs/ext4/namei.c:2636 [inline]
 ext4_mkdir+0x5f3/0xd30 fs/ext4/namei.c:3203
 vfs_mkdir+0x406/0x620 fs/namei.c:5272
 filename_mkdirat+0x285/0x510 fs/namei.c:5305
 __do_sys_mkdirat fs/namei.c:5326 [inline]
 __se_sys_mkdirat+0x35/0x150 fs/namei.c:5323
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fb01d39ce59
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fb01e269028 EFLAGS: 00000246 ORIG_RAX: 0000000000000102
RAX: ffffffffffffffda RBX: 00007fb01d615fa0 RCX: 00007fb01d39ce59
RDX: 0000000000000037 RSI: 0000200000000380 RDI: 0000000000000004
RBP: 00007fb01d432e6f R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007fb01d616038 R14: 00007fb01d615fa0 R15: 00007ffdec822cc8
 </TASK>

Allocated by task 5630:
 kasan_save_stack mm/kasan/common.c:57 [inline]
 kasan_save_track+0x3e/0x80 mm/kasan/common.c:78
 poison_kmalloc_redzone mm/kasan/common.c:398 [inline]
 __kasan_kmalloc+0x93/0xb0 mm/kasan/common.c:415
 kasan_kmalloc include/linux/kasan.h:263 [inline]
 __kmalloc_cache_noprof+0x318/0x660 mm/slub.c:5451
 _kmalloc_noprof include/linux/slab.h:969 [inline]
 __hw_addr_create+0x62/0x240 net/core/dev_addr_lists.c:69
 __hw_addr_add_ex+0x1ce/0x520 net/core/dev_addr_lists.c:127
 __hw_addr_add net/core/dev_addr_lists.c:144 [inline]
 dev_addr_init+0x15a/0x240 net/core/dev_addr_lists.c:696
 alloc_netdev_mqs+0x2b4/0x1270 net/core/dev.c:12064
 __ip_tunnel_create+0x348/0x560 net/ipv4/ip_tunnel.c:255
 ip_tunnel_init_net+0x2ea/0x810 net/ipv4/ip_tunnel.c:1150
 ops_init+0x35d/0x5d0 net/core/net_namespace.c:137
 setup_net+0x118/0x350 net/core/net_namespace.c:446
 copy_net_ns+0x4f9/0x720 net/core/net_namespace.c:579
 create_new_namespaces+0x3f0/0x6b0 kernel/nsproxy.c:132
 unshare_nsproxy_namespaces+0x149/0x190 kernel/nsproxy.c:234
 ksys_unshare+0x57d/0xa00 kernel/fork.c:3266
 __do_sys_unshare kernel/fork.c:3340 [inline]
 __se_sys_unshare kernel/fork.c:3338 [inline]
 __x64_sys_unshare+0x38/0x50 kernel/fork.c:3338
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Freed by task 68:
 kasan_save_stack mm/kasan/common.c:57 [inline]
 kasan_save_track+0x3e/0x80 mm/kasan/common.c:78
 kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:584
 poison_slab_object mm/kasan/common.c:253 [inline]
 __kasan_slab_free+0x5c/0x80 mm/kasan/common.c:285
 kasan_slab_free include/linux/kasan.h:235 [inline]
 slab_free_hook mm/slub.c:2700 [inline]
 slab_free_freelist_hook mm/slub.c:2729 [inline]
 slab_free_bulk mm/slub.c:6344 [inline]
 kmem_cache_free_bulk+0x30f/0x1180 mm/slub.c:7076
 kfree_bulk include/linux/slab.h:891 [inline]
 kvfree_rcu_bulk+0xc6/0x190 mm/slab_common.c:1502
 kvfree_rcu_drain_ready mm/slab_common.c:1704 [inline]
 kfree_rcu_monitor+0x21a/0x2b0 mm/slab_common.c:1777
 process_one_work kernel/workqueue.c:3322 [inline]
 process_scheduled_works+0xa8e/0x14e0 kernel/workqueue.c:3405
 worker_thread+0xa47/0xfb0 kernel/workqueue.c:3486
 kthread+0x388/0x470 kernel/kthread.c:436
 ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245

Last potentially related work creation:
 kasan_save_stack+0x3e/0x60 mm/kasan/common.c:57
 kasan_record_aux_stack+0xbd/0xd0 mm/kasan/generic.c:556
 kvfree_call_rcu+0x100/0x430 mm/slab_common.c:1970
 __hw_addr_flush net/core/dev_addr_lists.c:500 [inline]
 dev_addr_flush+0x16c/0x210 net/core/dev_addr_lists.c:673
 free_netdev+0x26c/0x6e0 net/core/dev.c:12209
 netdev_run_todo+0xf3d/0x10d0 net/core/dev.c:11743
 ops_exit_rtnl_list net/core/net_namespace.c:189 [inline]
 ops_undo_list+0x396/0x8d0 net/core/net_namespace.c:248
 cleanup_net+0x572/0x810 net/core/net_namespace.c:702
 process_one_work kernel/workqueue.c:3322 [inline]
 process_scheduled_works+0xa8e/0x14e0 kernel/workqueue.c:3405
 worker_thread+0xa47/0xfb0 kernel/workqueue.c:3486
 kthread+0x388/0x470 kernel/kthread.c:436
 ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245

The buggy address belongs to the object at ffff888103e89c00
 which belongs to the cache kmalloc-128 of size 128
The buggy address is located 29 bytes inside of
 freed 128-byte region [ffff888103e89c00, ffff888103e89c80)

The buggy address belongs to the physical page:
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x103e89
flags: 0x17ff00000000000(node=0|zone=2|lastcpupid=0x7ff)
page_type: f5(slab)
raw: 017ff00000000000 ffff888100041a00 dead000000000100 dead000000000122
raw: 0000000000000000 0000000800100010 00000000f5000000 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 0, migratetype Unmovable, gfp_mask 0xd2cc0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 1, tgid 1 (swapper/0), ts 2773920656, free_ts 0
 set_page_owner include/linux/page_owner.h:32 [inline]
 post_alloc_hook+0x22d/0x280 mm/page_alloc.c:1853
 prep_new_page mm/page_alloc.c:1861 [inline]
 get_page_from_freelist+0x24ae/0x2530 mm/page_alloc.c:3941
 __alloc_frozen_pages_noprof+0x18d/0x380 mm/page_alloc.c:5221
 alloc_slab_page mm/slub.c:3289 [inline]
 allocate_slab+0x74/0x5d0 mm/slub.c:3404
 new_slab mm/slub.c:3447 [inline]
 refill_objects+0x328/0x3c0 mm/slub.c:7241
 refill_sheaf mm/slub.c:2827 [inline]
 __pcs_replace_empty_main+0x2e0/0x6b0 mm/slub.c:4692
 alloc_from_pcs mm/slub.c:4790 [inline]
 slab_alloc_node mm/slub.c:4924 [inline]
 __do_kmalloc_node mm/slub.c:5333 [inline]
 __kmalloc_noprof+0x464/0x750 mm/slub.c:5347
 _kmalloc_noprof include/linux/slab.h:973 [inline]
 _kzalloc_noprof include/linux/slab.h:1286 [inline]
 __alloc_empty_sheaf+0x24/0x40 mm/slub.c:2774
 alloc_empty_sheaf mm/slub.c:2794 [inline]
 __pcs_replace_empty_main+0x447/0x6b0 mm/slub.c:4687
 alloc_from_pcs mm/slub.c:4790 [inline]
 slab_alloc_node mm/slub.c:4924 [inline]
 kmem_cache_alloc_lru_noprof+0x372/0x640 mm/slub.c:4958
 alloc_inode+0x6a/0x1b0 fs/inode.c:341
 new_inode+0x1f/0x170 fs/inode.c:1177
 debugfs_get_inode fs/debugfs/inode.c:72 [inline]
 debugfs_create_dir+0x68/0x350 fs/debugfs/inode.c:578
 blk_dev_init+0xdf/0x150 block/blk-core.c:1333
 genhd_device_init+0x1c/0x50 block/genhd.c:1002
 do_one_initcall+0x250/0x870 init/main.c:1347
page_owner free stack trace missing

Memory state around the buggy address:
 ffff888103e89b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff888103e89b80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888103e89c00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                            ^
 ffff888103e89c80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff888103e89d00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================


***

KASAN: slab-use-after-free Read in ext4_inlinedir_to_tree

tree:      torvalds
URL:       https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux
base:      08c7183f5b9ffe4408e74fff848a4cc2105361d4
arch:      amd64
compiler:  Debian clang version 22.1.6 (++20260514074242+fc4aad7b5db3-1~exp1~20260514074407.73), Debian LLD 22.1.6
config:    https://ci.syzbot.org/builds/0efdb868-daeb-4649-9bcb-5af41d993e73/config
syz repro: https://ci.syzbot.org/findings/b1e2a550-a6c3-410a-ae53-ca1e5366cc94/syz_repro

loop0: lost filesystem error report for type 5 error -117
EXT4-fs (loop0): mounted filesystem 00000000-0000-0000-0000-000000000000 r/w without journal. Quota mode: none.
==================================================================
BUG: KASAN: slab-use-after-free in ext4_dirent_get_data_len fs/ext4/ext4.h:4156 [inline]
BUG: KASAN: slab-use-after-free in ext4_dir_entry_len fs/ext4/ext4.h:4183 [inline]
BUG: KASAN: slab-use-after-free in ext4_inlinedir_to_tree+0x8f0/0x10a0 fs/ext4/inline.c:1335
Read of size 1 at addr ffff888111d0149d by task syz.0.19/5801

CPU: 1 UID: 0 PID: 5801 Comm: syz.0.19 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
 print_address_description+0x55/0x1e0 mm/kasan/report.c:378
 print_report+0x58/0x70 mm/kasan/report.c:482
 kasan_report+0x117/0x150 mm/kasan/report.c:595
 ext4_dirent_get_data_len fs/ext4/ext4.h:4156 [inline]
 ext4_dir_entry_len fs/ext4/ext4.h:4183 [inline]
 ext4_inlinedir_to_tree+0x8f0/0x10a0 fs/ext4/inline.c:1335
 ext4_htree_fill_tree+0x4c9/0x2480 fs/ext4/namei.c:1195
 ext4_dx_readdir fs/ext4/dir.c:600 [inline]
 ext4_readdir+0x2e2a/0x3720 fs/ext4/dir.c:146
 iterate_dir+0x2e2/0x4d0 fs/readdir.c:110
 __do_sys_getdents fs/readdir.c:319 [inline]
 __se_sys_getdents+0xf1/0x270 fs/readdir.c:304
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f6e8459ce59
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffef0bce788 EFLAGS: 00000246 ORIG_RAX: 000000000000004e
RAX: ffffffffffffffda RBX: 00007f6e84815fa0 RCX: 00007f6e8459ce59
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004
RBP: 00007f6e84632e6f R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f6e84815fac R14: 00007f6e84815fa0 R15: 00007f6e84815fa0
 </TASK>

Allocated by task 5738:
 kasan_save_stack mm/kasan/common.c:57 [inline]
 kasan_save_track+0x3e/0x80 mm/kasan/common.c:78
 poison_kmalloc_redzone mm/kasan/common.c:398 [inline]
 __kasan_kmalloc+0x93/0xb0 mm/kasan/common.c:415
 kasan_kmalloc include/linux/kasan.h:263 [inline]
 __kmalloc_cache_noprof+0x318/0x660 mm/slub.c:5451
 _kmalloc_noprof include/linux/slab.h:969 [inline]
 __kthread_create_on_node+0x115/0x3d0 kernel/kthread.c:483
 kthread_create_on_node+0xeb/0x140 kernel/kthread.c:559
 napi_kthread_create net/core/dev.c:1656 [inline]
 netif_napi_add_weight_locked+0x699/0x940 net/core/dev.c:7594
 netif_napi_add_weight include/linux/netdevice.h:2870 [inline]
 netif_napi_add include/linux/netdevice.h:2887 [inline]
 wg_peer_create+0x52d/0x860 drivers/net/wireguard/peer.c:57
 set_peer drivers/net/wireguard/netlink.c:392 [inline]
 wg_set_device_doit+0xf3a/0x2120 drivers/net/wireguard/netlink.c:569
 genl_family_rcv_msg_doit+0x233/0x340 net/netlink/genetlink.c:1114
 genl_family_rcv_msg net/netlink/genetlink.c:1194 [inline]
 genl_rcv_msg+0x614/0x7a0 net/netlink/genetlink.c:1209
 netlink_rcv_skb+0x226/0x4a0 net/netlink/af_netlink.c:2556
 genl_rcv+0x28/0x40 net/netlink/genetlink.c:1218
 netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
 netlink_unicast+0x7bb/0x940 net/netlink/af_netlink.c:1345
 netlink_sendmsg+0x813/0xb40 net/netlink/af_netlink.c:1900
 sock_sendmsg_nosec+0x13a/0x180 net/socket.c:775
 __sock_sendmsg net/socket.c:790 [inline]
 __sys_sendto+0x408/0x5a0 net/socket.c:2252
 __do_sys_sendto net/socket.c:2259 [inline]
 __se_sys_sendto net/socket.c:2255 [inline]
 __x64_sys_sendto+0xde/0x100 net/socket.c:2255
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Freed by task 5738:
 kasan_save_stack mm/kasan/common.c:57 [inline]
 kasan_save_track+0x3e/0x80 mm/kasan/common.c:78
 kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:584
 poison_slab_object mm/kasan/common.c:253 [inline]
 __kasan_slab_free+0x5c/0x80 mm/kasan/common.c:285
 kasan_slab_free include/linux/kasan.h:235 [inline]
 slab_free_hook mm/slub.c:2700 [inline]
 slab_free mm/slub.c:6310 [inline]
 kfree+0x1c5/0x640 mm/slub.c:6625
 __kthread_create_on_node+0x2fe/0x3d0 kernel/kthread.c:523
 kthread_create_on_node+0xeb/0x140 kernel/kthread.c:559
 napi_kthread_create net/core/dev.c:1656 [inline]
 netif_napi_add_weight_locked+0x699/0x940 net/core/dev.c:7594
 netif_napi_add_weight include/linux/netdevice.h:2870 [inline]
 netif_napi_add include/linux/netdevice.h:2887 [inline]
 wg_peer_create+0x52d/0x860 drivers/net/wireguard/peer.c:57
 set_peer drivers/net/wireguard/netlink.c:392 [inline]
 wg_set_device_doit+0xf3a/0x2120 drivers/net/wireguard/netlink.c:569
 genl_family_rcv_msg_doit+0x233/0x340 net/netlink/genetlink.c:1114
 genl_family_rcv_msg net/netlink/genetlink.c:1194 [inline]
 genl_rcv_msg+0x614/0x7a0 net/netlink/genetlink.c:1209
 netlink_rcv_skb+0x226/0x4a0 net/netlink/af_netlink.c:2556
 genl_rcv+0x28/0x40 net/netlink/genetlink.c:1218
 netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
 netlink_unicast+0x7bb/0x940 net/netlink/af_netlink.c:1345
 netlink_sendmsg+0x813/0xb40 net/netlink/af_netlink.c:1900
 sock_sendmsg_nosec+0x13a/0x180 net/socket.c:775
 __sock_sendmsg net/socket.c:790 [inline]
 __sys_sendto+0x408/0x5a0 net/socket.c:2252
 __do_sys_sendto net/socket.c:2259 [inline]
 __se_sys_sendto net/socket.c:2255 [inline]
 __x64_sys_sendto+0xde/0x100 net/socket.c:2255
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

The buggy address belongs to the object at ffff888111d01480
 which belongs to the cache kmalloc-64 of size 64
The buggy address is located 29 bytes inside of
 freed 64-byte region [ffff888111d01480, ffff888111d014c0)

The buggy address belongs to the physical page:
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x111d01
flags: 0x17ff00000000000(node=0|zone=2|lastcpupid=0x7ff)
page_type: f5(slab)
raw: 017ff00000000000 ffff8881000418c0 dead000000000100 dead000000000122
raw: 0000000000000000 0000000800200020 00000000f5000000 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 0, migratetype Unmovable, gfp_mask 0xd2cc0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 1113, tgid 1113 (kworker/u9:4), ts 18841797934, free_ts 18841792693
 set_page_owner include/linux/page_owner.h:32 [inline]
 post_alloc_hook+0x22d/0x280 mm/page_alloc.c:1853
 prep_new_page mm/page_alloc.c:1861 [inline]
 get_page_from_freelist+0x24ae/0x2530 mm/page_alloc.c:3941
 __alloc_frozen_pages_noprof+0x18d/0x380 mm/page_alloc.c:5221
 alloc_slab_page mm/slub.c:3289 [inline]
 allocate_slab+0x74/0x5d0 mm/slub.c:3404
 new_slab mm/slub.c:3447 [inline]
 refill_objects+0x328/0x3c0 mm/slub.c:7241
 refill_sheaf mm/slub.c:2827 [inline]
 __pcs_replace_empty_main+0x2e0/0x6b0 mm/slub.c:4692
 alloc_from_pcs mm/slub.c:4790 [inline]
 slab_alloc_node mm/slub.c:4924 [inline]
 __do_kmalloc_node mm/slub.c:5333 [inline]
 __kmalloc_node_noprof+0x56a/0x7b0 mm/slub.c:5340
 _kmalloc_node_noprof include/linux/slab.h:1174 [inline]
 __vmalloc_area_node mm/vmalloc.c:3857 [inline]
 __vmalloc_node_range_noprof+0x5d9/0x1730 mm/vmalloc.c:4064
 __vmalloc_node_noprof+0xc2/0x100 mm/vmalloc.c:4124
 alloc_thread_stack_node kernel/fork.c:358 [inline]
 dup_task_struct+0x28e/0x850 kernel/fork.c:928
 copy_process+0x81b/0x42e0 kernel/fork.c:2109
 kernel_clone+0x2d7/0x940 kernel/fork.c:2745
 user_mode_thread+0x110/0x180 kernel/fork.c:2821
 call_usermodehelper_exec_work+0x5c/0x230 kernel/umh.c:171
 process_one_work kernel/workqueue.c:3322 [inline]
 process_scheduled_works+0xa8e/0x14e0 kernel/workqueue.c:3405
 worker_thread+0xa47/0xfb0 kernel/workqueue.c:3486
page last free pid 1113 tgid 1113 stack trace:
 reset_page_owner include/linux/page_owner.h:25 [inline]
 __free_pages_prepare mm/page_alloc.c:1397 [inline]
 __free_frozen_pages+0xc0d/0xd20 mm/page_alloc.c:2938
 __kasan_populate_vmalloc_do mm/kasan/shadow.c:393 [inline]
 __kasan_populate_vmalloc+0x1a8/0x1c0 mm/kasan/shadow.c:424
 kasan_populate_vmalloc include/linux/kasan.h:580 [inline]
 alloc_vmap_area+0xd1a/0x1420 mm/vmalloc.c:2123
 __get_vm_area_node+0x1f2/0x300 mm/vmalloc.c:3226
 __vmalloc_node_range_noprof+0x358/0x1730 mm/vmalloc.c:4024
 __vmalloc_node_noprof+0xc2/0x100 mm/vmalloc.c:4124
 alloc_thread_stack_node kernel/fork.c:358 [inline]
 dup_task_struct+0x28e/0x850 kernel/fork.c:928
 copy_process+0x81b/0x42e0 kernel/fork.c:2109
 kernel_clone+0x2d7/0x940 kernel/fork.c:2745
 user_mode_thread+0x110/0x180 kernel/fork.c:2821
 call_usermodehelper_exec_work+0x5c/0x230 kernel/umh.c:171
 process_one_work kernel/workqueue.c:3322 [inline]
 process_scheduled_works+0xa8e/0x14e0 kernel/workqueue.c:3405
 worker_thread+0xa47/0xfb0 kernel/workqueue.c:3486
 kthread+0x388/0x470 kernel/kthread.c:436
 ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245

Memory state around the buggy address:
 ffff888111d01380: 00 00 00 00 00 00 00 04 fc fc fc fc fc fc fc fc
 ffff888111d01400: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
>ffff888111d01480: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
                            ^
 ffff888111d01500: 00 00 00 00 00 00 02 fc fc fc fc fc fc fc fc fc
 ffff888111d01580: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
==================================================================


***

KASAN: use-after-free Read in __ext4_check_dir_entry

tree:      torvalds
URL:       https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux
base:      08c7183f5b9ffe4408e74fff848a4cc2105361d4
arch:      amd64
compiler:  Debian clang version 22.1.6 (++20260514074242+fc4aad7b5db3-1~exp1~20260514074407.73), Debian LLD 22.1.6
config:    https://ci.syzbot.org/builds/0efdb868-daeb-4649-9bcb-5af41d993e73/config
syz repro: https://ci.syzbot.org/findings/07c4f835-36f6-4535-a165-aa25c5af571c/syz_repro

EXT4-fs error (device loop2): ext4_inlinedir_to_tree:1343: inode #21: block 10: comm syz.2.19: path /: bad entry in directory: directory entry overrun - offset=20, inode=0, rec_len=1024, size=60 fake=0
==================================================================
BUG: KASAN: use-after-free in ext4_dirent_get_data_len fs/ext4/ext4.h:4156 [inline]
BUG: KASAN: use-after-free in ext4_dir_entry_len fs/ext4/ext4.h:4183 [inline]
BUG: KASAN: use-after-free in __ext4_check_dir_entry+0x659/0xbe0 fs/ext4/dir.c:96
Read of size 1 at addr ffff888112785045 by task syz.2.19/5869

CPU: 0 UID: 0 PID: 5869 Comm: syz.2.19 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
 print_address_description+0x55/0x1e0 mm/kasan/report.c:378
 print_report+0x58/0x70 mm/kasan/report.c:482
 kasan_report+0x117/0x150 mm/kasan/report.c:595
 ext4_dirent_get_data_len fs/ext4/ext4.h:4156 [inline]
 ext4_dir_entry_len fs/ext4/ext4.h:4183 [inline]
 __ext4_check_dir_entry+0x659/0xbe0 fs/ext4/dir.c:96
 ext4_find_dest_de+0x14e/0x6e0 fs/ext4/namei.c:2221
 ext4_add_dirent_to_inline+0xcc/0x410 fs/ext4/inline.c:984
 ext4_try_add_inline_entry+0x21e/0x8a0 fs/ext4/inline.c:1213
 __ext4_add_entry+0x385/0x3470 fs/ext4/namei.c:2552
 __ext4_link+0x498/0x720 fs/ext4/namei.c:3649
 ext4_link+0x1dc/0x2b0 fs/ext4/namei.c:3689
 vfs_link+0x491/0x650 fs/namei.c:5787
 ovl_do_link fs/overlayfs/overlayfs.h:233 [inline]
 ovl_copy_up_tmpfile fs/overlayfs/copy_up.c:891 [inline]
 ovl_do_copy_up fs/overlayfs/copy_up.c:986 [inline]
 ovl_copy_up_one fs/overlayfs/copy_up.c:1189 [inline]
 ovl_copy_up_flags+0x1c52/0x3930 fs/overlayfs/copy_up.c:1243
 ovl_open+0x13f/0x300 fs/overlayfs/file.c:211
 do_dentry_open+0x816/0x1380 fs/open.c:947
 vfs_open+0x3b/0x340 fs/open.c:1079
 dentry_open+0x61/0xa0 fs/open.c:1102
 ima_calc_file_hash+0x15f/0x890 security/integrity/ima/ima_crypto.c:269
 ima_collect_measurement+0x51b/0xa00 security/integrity/ima/ima_api.c:300
 process_measurement+0x1272/0x1c10 security/integrity/ima/ima_main.c:425
 ima_file_check+0xe1/0x130 security/integrity/ima/ima_main.c:685
 security_file_post_open+0xb3/0x260 security/security.c:2755
 do_open fs/namei.c:4702 [inline]
 path_openat+0x2e90/0x3830 fs/namei.c:4859
 do_file_open+0x23e/0x4a0 fs/namei.c:4888
 do_sys_openat2+0x115/0x200 fs/open.c:1395
 do_sys_open fs/open.c:1401 [inline]
 __do_sys_openat fs/open.c:1417 [inline]
 __se_sys_openat fs/open.c:1412 [inline]
 __x64_sys_openat+0x138/0x170 fs/open.c:1412
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fae54f9ce59
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fae55ddb028 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
RAX: ffffffffffffffda RBX: 00007fae55215fa0 RCX: 00007fae54f9ce59
RDX: 000000000000003f RSI: 0000200000000380 RDI: ffffffffffffff9c
RBP: 00007fae55032e6f R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000186 R11: 0000000000000246 R12: 0000000000000000
R13: 00007fae55216038 R14: 00007fae55215fa0 R15: 00007ffedd6a9a88
 </TASK>

The buggy address belongs to the physical page:
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x112785
flags: 0x17ff00000000000(node=0|zone=2|lastcpupid=0x7ff)
page_type: f0(buddy)
raw: 017ff00000000000 ffffea000449e048 ffffea000449e2c8 0000000000000000
raw: 0000000000000000 0000000000000000 00000000f0000000 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as freed
page last allocated via order 0, migratetype Unmovable, gfp_mask 0xcc0(GFP_KERNEL), pid 26, tgid 26 (kworker/u9:0), ts 18747225753, free_ts 49527089310
 set_page_owner include/linux/page_owner.h:32 [inline]
 post_alloc_hook+0x22d/0x280 mm/page_alloc.c:1853
 prep_new_page mm/page_alloc.c:1861 [inline]
 get_page_from_freelist+0x24ae/0x2530 mm/page_alloc.c:3941
 __alloc_frozen_pages_noprof+0x18d/0x380 mm/page_alloc.c:5221
 __alloc_pages_noprof+0x10/0x100 mm/page_alloc.c:5255
 alloc_pages_bulk_noprof+0x5ff/0x7c0 mm/page_alloc.c:5175
 ___alloc_pages_bulk mm/kasan/shadow.c:345 [inline]
 __kasan_populate_vmalloc_do mm/kasan/shadow.c:370 [inline]
 __kasan_populate_vmalloc+0xb7/0x1c0 mm/kasan/shadow.c:424
 kasan_populate_vmalloc include/linux/kasan.h:580 [inline]
 alloc_vmap_area+0xd1a/0x1420 mm/vmalloc.c:2123
 __get_vm_area_node+0x1f2/0x300 mm/vmalloc.c:3226
 __vmalloc_node_range_noprof+0x358/0x1730 mm/vmalloc.c:4024
 __vmalloc_node_noprof+0xc2/0x100 mm/vmalloc.c:4124
 alloc_thread_stack_node kernel/fork.c:358 [inline]
 dup_task_struct+0x28e/0x850 kernel/fork.c:928
 copy_process+0x81b/0x42e0 kernel/fork.c:2109
 kernel_clone+0x2d7/0x940 kernel/fork.c:2745
 user_mode_thread+0x110/0x180 kernel/fork.c:2821
 call_usermodehelper_exec_work+0x5c/0x230 kernel/umh.c:171
 process_one_work kernel/workqueue.c:3322 [inline]
 process_scheduled_works+0xa8e/0x14e0 kernel/workqueue.c:3405
page last free pid 5625 tgid 5625 stack trace:
 reset_page_owner include/linux/page_owner.h:25 [inline]
 __free_pages_prepare mm/page_alloc.c:1397 [inline]
 __free_frozen_pages+0xc0d/0xd20 mm/page_alloc.c:2938
 kasan_depopulate_vmalloc_pte+0x6d/0x90 mm/kasan/shadow.c:484
 apply_to_pte_range mm/memory.c:3338 [inline]
 apply_to_pmd_range mm/memory.c:3382 [inline]
 apply_to_pud_range mm/memory.c:3418 [inline]
 apply_to_p4d_range mm/memory.c:3454 [inline]
 __apply_to_page_range+0xbd8/0x1420 mm/memory.c:3490
 __kasan_release_vmalloc+0xa2/0xd0 mm/kasan/shadow.c:602
 kasan_release_vmalloc include/linux/kasan.h:593 [inline]
 kasan_release_vmalloc_node mm/vmalloc.c:2284 [inline]
 purge_vmap_node+0x220/0x960 mm/vmalloc.c:2306
 __purge_vmap_area_lazy+0x783/0xb40 mm/vmalloc.c:2396
 drain_vmap_area_work+0x27/0x40 mm/vmalloc.c:2430
 process_one_work kernel/workqueue.c:3322 [inline]
 process_scheduled_works+0xa8e/0x14e0 kernel/workqueue.c:3405
 worker_thread+0xa47/0xfb0 kernel/workqueue.c:3486
 kthread+0x388/0x470 kernel/kthread.c:436
 ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245

Memory state around the buggy address:
 ffff888112784f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff888112784f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff888112785000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                                           ^
 ffff888112785080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 ffff888112785100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
==================================================================


***

If these findings have caused you to resend the series or submit a
separate fix, please add the following tag to your commit message:
  Tested-by: syzbot@syzkaller.appspotmail.com

---
This report is generated by a bot. It may contain errors.
syzbot ci engineers can be reached at syzkaller@googlegroups.com.

To test a patch for this bug, please reply with `#syz test`
(should be on a separate line).

The patch should be attached to the email.
Note: arguments like custom git repos and branches are not supported.

* [PATCH v8 2/4] ext4: set EXT4_STATE_NO_EXPAND in ext4_evict_inode
From: Yun Zhou @ 2026-06-20  1:39 UTC (permalink / raw)
  To: tytso, adilger.kernel, libaokun, jack, ojaswin, ritesh.list,
	yi.zhang
  Cc: linux-ext4, linux-kernel, yun.zhou
In-Reply-To: <20260620013937.2564269-1-yun.zhou@windriver.com>

An inode being evicted will never need its extra isize expanded.  Set
EXT4_STATE_NO_EXPAND before ext4_mark_inode_dirty() in ext4_evict_inode()
to make this explicit and prevent any unnecessary work in
ext4_try_to_expand_extra_isize().

This also provides defense-in-depth for the s_writepages_rwsem deadlock
during mount-time orphan cleanup, ensuring the expand path is blocked
for inodes under eviction regardless of how they are reached.

Signed-off-by: Yun Zhou <yun.zhou@windriver.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
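Editorial note, not part of the commit: a minimal userspace sketch of the
idea described above, assuming that the mark-dirty path consults a per-inode
"no expand" state bit before attempting to grow the extra isize. All model_*
names are hypothetical stand-ins for illustration; this is not the ext4 code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag standing in for EXT4_STATE_NO_EXPAND. */
enum { MODEL_STATE_NO_EXPAND = 1u << 0 };

struct model_inode {
	unsigned int state;        /* state bit flags */
	unsigned int extra_isize;  /* current extra inode size */
	unsigned int expand_calls; /* how often expansion work actually ran */
};

void model_set_state(struct model_inode *inode, unsigned int flag)
{
	inode->state |= flag;
}

/* Returns true if expansion ran, false if it was skipped. */
bool model_try_expand(struct model_inode *inode, unsigned int want)
{
	assert(inode != NULL);
	if (inode->state & MODEL_STATE_NO_EXPAND)
		return false;	/* blocked: inode must not be expanded */
	if (inode->extra_isize >= want)
		return false;	/* already large enough */
	inode->extra_isize = want;
	inode->expand_calls++;
	return true;
}

/* Model of the mark-dirty path: it opportunistically tries to expand. */
void model_mark_dirty(struct model_inode *inode, unsigned int want)
{
	model_try_expand(inode, want);
}

/* Model of eviction: set the flag first, then mark the inode dirty. */
void model_evict(struct model_inode *inode, unsigned int want)
{
	model_set_state(inode, MODEL_STATE_NO_EXPAND);
	model_mark_dirty(inode, want);
}
```

In this model, calling model_mark_dirty() on a fresh inode grows it, while
model_evict() leaves extra_isize untouched because the flag is set before the
dirty path runs. The ordering is the point: the flag has to be set ahead of
the mark-dirty call to have any effect.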
 fs/ext4/inode.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 09dcfb6bf48c..1de0aaa28e63 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -264,6 +264,7 @@ void ext4_evict_inode(struct inode *inode)
 	if (ext4_inode_is_fast_symlink(inode))
 		memset(EXT4_I(inode)->i_data, 0, sizeof(EXT4_I(inode)->i_data));
 	inode->i_size = 0;
+	ext4_set_inode_state(inode, EXT4_STATE_NO_EXPAND);
 	err = ext4_mark_inode_dirty(handle, inode);
 	if (err) {
 		ext4_warning(inode->i_sb,
-- 
2.43.0


* [PATCH v8 4/4] ext4: convert remaining EA inode iput() calls to ext4_put_ea_inode()
From: Yun Zhou @ 2026-06-20  1:39 UTC (permalink / raw)
  To: tytso, adilger.kernel, libaokun, jack, ojaswin, ritesh.list,
	yi.zhang
  Cc: linux-ext4, linux-kernel, yun.zhou
In-Reply-To: <20260620013937.2564269-1-yun.zhou@windriver.com>

Convert all remaining iput() calls on EA inodes that execute under
xattr_sem or a jbd2 handle to use ext4_put_ea_inode().  When i_nlink >= 1
and SB_ACTIVE is clear, a final iput() calls write_inode_now(), which
takes s_writepages_rwsem and so violates the lock ordering against the
caller's active jbd2 handle.

Converted sites and why deferral is necessary:

- ext4_xattr_inode_inc_ref_all() cleanup: dec_ref undoes the failed
  inc_ref, but the EA inode may be shared so i_nlink remains 1.

- ext4_xattr_inode_dec_ref_all() ENOMEM fallback: ext4_expand_inode_array()
  failed before dec_ref is called, i_nlink=1, jbd2 handle active.

- ext4_xattr_inode_lookup_create() out_err: may be a cache-found inode
  where inc_ref failed; i_nlink remains 1.

- ext4_xattr_set_entry() old_ea_inode: dec_ref was called but the EA
  inode may be shared by other xattr blocks, so i_nlink remains 1.

- ext4_xattr_block_set() new block path: dec_ref drops the "extra" ref
  but inc_ref_all added another, so i_nlink stays 1.

- ext4_xattr_block_set() cleanup: on success no dec_ref was called
  (i_nlink=1); on error dec_ref may leave i_nlink=1 if shared.

- ext4_xattr_ibody_set() error path: dec_ref on a cache-found EA inode
  may leave i_nlink=1 if shared.

- ext4_xattr_ibody_set() success path: newly stored EA inode with
  i_nlink=1, just releasing the lookup reference.

- ext4_xattr_delete_inode() quota loop: iget for quota accounting only,
  no dec_ref called, i_nlink=1, jbd2 handle is active.

Direct iput() calls in pure lookup paths (ext4_xattr_inode_get,
ext4_xattr_inode_cache_find, tmp_inode in ext4_xattr_block_set) are
left unchanged -- these do not hold a jbd2 handle or xattr_sem.

Signed-off-by: Yun Zhou <yun.zhou@windriver.com>
---
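Editorial note, not part of the commit: a userspace sketch of the deferred
put pattern this conversion relies on, modelled on the shape discussed
earlier in the thread (drop the reference unless it is the last one,
otherwise queue the inode for a worker). The real ext4_put_ea_inode() is
introduced in patch 3/4 and is not shown here, so every model_* name below
is a hypothetical stand-in. The list is single-threaded for brevity, whereas
the thread discussed a lock-less list plus delayed work.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct model_inode {
	atomic_int count;          /* reference count */
	struct model_inode *next;  /* link on the deferred list */
	bool evicted;              /* set once the last reference is gone */
};

struct model_sb {
	struct model_inode *deferred; /* inodes awaiting their final put */
};

void model_inode_init(struct model_inode *inode, int refs)
{
	atomic_init(&inode->count, refs);
	inode->next = NULL;
	inode->evicted = false;
}

/* Drop one reference unless it is the last one; true if dropped. */
bool model_put_if_not_last(struct model_inode *inode)
{
	int old = atomic_load(&inode->count);

	while (old != 1) {
		/* On failure, old is reloaded and the loop re-checks it. */
		if (atomic_compare_exchange_weak(&inode->count, &old, old - 1))
			return true;
	}
	return false;
}

/* Safe to call in a "locked" context: never performs the final put. */
void model_put_deferred(struct model_sb *sb, struct model_inode *inode)
{
	if (!inode)
		return;
	if (model_put_if_not_last(inode))
		return;
	inode->next = sb->deferred;	/* last reference: hand to the worker */
	sb->deferred = inode;
}

/* Worker body, run with no locks held. Returns how many inodes it put. */
int model_drain(struct model_sb *sb)
{
	struct model_inode *inode = sb->deferred;
	int n = 0;

	sb->deferred = NULL;
	while (inode) {
		struct model_inode *next = inode->next;

		inode->next = NULL;
		assert(atomic_load(&inode->count) >= 1);
		if (atomic_fetch_sub(&inode->count, 1) == 1)
			inode->evicted = true;	/* final put happens here */
		n++;
		inode = next;
	}
	return n;
}
```

The design choice this illustrates: a caller holding xattr_sem or a jbd2
handle only ever performs non-final puts, so the expensive final put (and
whatever writeback or eviction it triggers) runs later in the worker, outside
those locks.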
 fs/ext4/xattr.c | 40 ++++++++++++++++++++++++++++------------
 1 file changed, 28 insertions(+), 12 deletions(-)

diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index 79de182e22e6..08c1bdd5133d 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -1079,6 +1079,13 @@ static int ext4_xattr_inode_inc_ref(handle_t *handle, struct inode *ea_inode)
 	return ext4_xattr_inode_update_ref(handle, ea_inode, 1);
 }
 
+/*
+ * Decrement on-disk reference count of an EA inode.  If refcount reaches 0,
+ * i_nlink is cleared and the inode is added to the orphan list.  Callers
+ * must use ext4_put_ea_inode() (not iput) to release the VFS reference
+ * afterwards, since iput on a nlink=0 inode triggers eviction which may
+ * deadlock if called under xattr_sem or an active jbd2 handle.
+ */
 static int ext4_xattr_inode_dec_ref(handle_t *handle, struct inode *ea_inode)
 {
 	return ext4_xattr_inode_update_ref(handle, ea_inode, -1);
@@ -1106,10 +1113,10 @@ static int ext4_xattr_inode_inc_ref_all(handle_t *handle, struct inode *parent,
 		err = ext4_xattr_inode_inc_ref(handle, ea_inode);
 		if (err) {
 			ext4_warning_inode(ea_inode, "inc ref error %d", err);
-			iput(ea_inode);
+			ext4_put_ea_inode(parent->i_sb, ea_inode);
 			goto cleanup;
 		}
-		iput(ea_inode);
+		ext4_put_ea_inode(parent->i_sb, ea_inode);
 	}
 	return 0;
 
@@ -1135,7 +1142,8 @@ static int ext4_xattr_inode_inc_ref_all(handle_t *handle, struct inode *parent,
 		if (err)
 			ext4_warning_inode(ea_inode, "cleanup dec ref error %d",
 					   err);
-		iput(ea_inode);
+		/* i_nlink may remain 1 if shared; defer for !SB_ACTIVE safety */
+		ext4_put_ea_inode(parent->i_sb, ea_inode);
 	}
 	return saved_err;
 }
@@ -1203,7 +1211,8 @@ ext4_xattr_inode_dec_ref_all(handle_t *handle, struct inode *parent,
 		if (err) {
 			ext4_warning_inode(ea_inode,
 					   "Expand inode array err=%d", err);
-			iput(ea_inode);
+			/* i_nlink=1 (dec_ref not yet called); handle active */
+			ext4_put_ea_inode(parent->i_sb, ea_inode);
 			continue;
 		}
 
@@ -1507,7 +1516,7 @@ static struct inode *ext4_xattr_inode_create(handle_t *handle,
 			if (ext4_xattr_inode_dec_ref(handle, ea_inode))
 				ext4_warning_inode(ea_inode,
 					"cleanup dec ref error %d", err);
-			iput(ea_inode);
+			ext4_put_ea_inode(inode->i_sb, ea_inode);
 			return ERR_PTR(err);
 		}
 
@@ -1617,7 +1626,8 @@ static struct inode *ext4_xattr_inode_lookup_create(handle_t *handle,
 				      ea_inode->i_ino, true /* reusable */);
 	return ea_inode;
 out_err:
-	iput(ea_inode);
+	/* May be cache-found inode with i_nlink=1 (inc_ref failed) */
+	ext4_put_ea_inode(inode->i_sb, ea_inode);
 	ext4_xattr_inode_free_quota(inode, NULL, value_len);
 	return ERR_PTR(err);
 }
@@ -1850,7 +1860,8 @@ static int ext4_xattr_set_entry(struct ext4_xattr_info *i,
 
 	ret = 0;
 out:
-	iput(old_ea_inode);
+	/* old_ea_inode had dec_ref; may still have i_nlink=1 if shared */
+	ext4_put_ea_inode(inode->i_sb, old_ea_inode);
 	return ret;
 }
 
@@ -2152,7 +2163,8 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode,
 					ext4_warning_inode(ea_inode,
 							   "dec ref error=%d",
 							   error);
-				iput(ea_inode);
+				/* i_nlink stays 1 (inc_ref_all added a ref) */
+				ext4_put_ea_inode(inode->i_sb, ea_inode);
 				ea_inode = NULL;
 			}
 
@@ -2206,7 +2218,8 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode,
 			ext4_xattr_inode_free_quota(inode, ea_inode,
 						    i_size_read(ea_inode));
 		}
-		iput(ea_inode);
+		/* success: i_nlink=1; error+dec_ref: may still be 1 if shared */
+		ext4_put_ea_inode(inode->i_sb, ea_inode);
 	}
 	if (ce)
 		mb_cache_entry_put(ea_block_cache, ce);
@@ -2288,7 +2301,8 @@ int ext4_xattr_ibody_set(handle_t *handle, struct inode *inode,
 
 			ext4_xattr_inode_free_quota(inode, ea_inode,
 						    i_size_read(ea_inode));
-			iput(ea_inode);
+			/* cache-found ea_inode may retain i_nlink=1 */
+			ext4_put_ea_inode(inode->i_sb, ea_inode);
 		}
 		return error;
 	}
@@ -2300,7 +2314,8 @@ int ext4_xattr_ibody_set(handle_t *handle, struct inode *inode,
 		header->h_magic = cpu_to_le32(0);
 		ext4_clear_inode_state(inode, EXT4_STATE_XATTR);
 	}
-	iput(ea_inode);
+	/* ea_inode has i_nlink=1 (new ref just stored in xattr entry) */
+	ext4_put_ea_inode(inode->i_sb, ea_inode);
 	return 0;
 }
 
@@ -2989,7 +3004,8 @@ int ext4_xattr_delete_inode(handle_t *handle, struct inode *inode,
 					continue;
 				ext4_xattr_inode_free_quota(inode, ea_inode,
 					      le32_to_cpu(entry->e_value_size));
-				iput(ea_inode);
+				/* no dec_ref yet but i_nlink=1; handle is active */
+				ext4_put_ea_inode(inode->i_sb, ea_inode);
 			}
 
 		}
-- 
2.43.0

