public inbox for linux-ext4@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] ext4: only dirty folios when data journaling regular files
@ 2025-05-16 17:38 Brian Foster
  2025-05-19 10:24 ` Jan Kara
  2025-05-20 14:40 ` Theodore Ts'o
  0 siblings, 2 replies; 4+ messages in thread
From: Brian Foster @ 2025-05-16 17:38 UTC (permalink / raw)
  To: linux-ext4; +Cc: Jan Kara

fstest generic/388 occasionally reproduces a crash that looks as
follows:

BUG: kernel NULL pointer dereference, address: 0000000000000000
...
Call Trace:
 <TASK>
 ext4_block_zero_page_range+0x30c/0x380 [ext4]
 ext4_truncate+0x436/0x440 [ext4]
 ext4_process_orphan+0x5d/0x110 [ext4]
 ext4_orphan_cleanup+0x124/0x4f0 [ext4]
 ext4_fill_super+0x262d/0x3110 [ext4]
 get_tree_bdev_flags+0x132/0x1d0
 vfs_get_tree+0x26/0xd0
 vfs_cmd_create+0x59/0xe0
 __do_sys_fsconfig+0x4ed/0x6b0
 do_syscall_64+0x82/0x170
 ...

This occurs when processing a symlink inode from the orphan list. The
partial block zeroing code in the truncate path calls
ext4_dirty_journalled_data() -> folio_mark_dirty(). The latter calls
mapping->a_ops->dirty_folio(), but symlink inodes are not assigned an
a_ops vector in ext4, hence the crash.

To avoid this problem, update the ext4_dirty_journalled_data() helper to
only mark the folio dirty on regular files (for which a_ops is
assigned). This also matches the journaling logic in the ext4_symlink()
creation path, where ext4_handle_dirty_metadata() is called directly.

Fixes: d84c9ebdac1e ("ext4: Mark pages with journalled data dirty")
Signed-off-by: Brian Foster <bfoster@redhat.com>
---

Hi Jan,

I'm not intimately familiar with the jbd machinery here so this may well
be wrong, but it survives my testing so far. I initially hacked this to
mark the buffer dirty instead of the folio, but discovered jbd2 doesn't
seem to like that. I suspect that is because jbd2 wants to dirty/submit
the buffer itself after it's logged..?

Anyways, after that, this struck me as most consistent with behavior
prior to d84c9ebdac1e and/or with the creation path, so I'm floating
this as a first pass. Is my understanding of d84c9ebdac1e correct in
that it is mainly an optimization to allow writeback to force the
journaling mechanism vs. otherwise waiting for the other way around
(i.e. a journal commit to mark folios dirty)? Thoughts appreciated..

Brian

 fs/ext4/inode.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 94c7d2d828a6..d3c138003ad3 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1009,7 +1009,12 @@ int ext4_walk_page_buffers(handle_t *handle, struct inode *inode,
  */
 static int ext4_dirty_journalled_data(handle_t *handle, struct buffer_head *bh)
 {
-	folio_mark_dirty(bh->b_folio);
+	struct folio *folio = bh->b_folio;
+	struct inode *inode = folio->mapping->host;
+
+	/* only regular files have a_ops */
+	if (S_ISREG(inode->i_mode))
+		folio_mark_dirty(folio);
 	return ext4_handle_dirty_metadata(handle, NULL, bh);
 }
 
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] ext4: only dirty folios when data journaling regular files
  2025-05-16 17:38 [PATCH] ext4: only dirty folios when data journaling regular files Brian Foster
@ 2025-05-19 10:24 ` Jan Kara
  2025-05-19 12:34   ` Brian Foster
  2025-05-20 14:40 ` Theodore Ts'o
  1 sibling, 1 reply; 4+ messages in thread
From: Jan Kara @ 2025-05-19 10:24 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-ext4, Jan Kara

On Fri 16-05-25 13:38:00, Brian Foster wrote:
> fstest generic/388 occasionally reproduces a crash that looks as
> follows:
> 
> BUG: kernel NULL pointer dereference, address: 0000000000000000
> ...
> Call Trace:
>  <TASK>
>  ext4_block_zero_page_range+0x30c/0x380 [ext4]
>  ext4_truncate+0x436/0x440 [ext4]
>  ext4_process_orphan+0x5d/0x110 [ext4]
>  ext4_orphan_cleanup+0x124/0x4f0 [ext4]
>  ext4_fill_super+0x262d/0x3110 [ext4]
>  get_tree_bdev_flags+0x132/0x1d0
>  vfs_get_tree+0x26/0xd0
>  vfs_cmd_create+0x59/0xe0
>  __do_sys_fsconfig+0x4ed/0x6b0
>  do_syscall_64+0x82/0x170
>  ...
> 
> This occurs when processing a symlink inode from the orphan list. The
> partial block zeroing code in the truncate path calls
> ext4_dirty_journalled_data() -> folio_mark_dirty(). The latter calls
> mapping->a_ops->dirty_folio(), but symlink inodes are not assigned an
> a_ops vector in ext4, hence the crash.
> 
> To avoid this problem, update the ext4_dirty_journalled_data() helper to
> only mark the folio dirty on regular files (for which a_ops is
> assigned). This also matches the journaling logic in the ext4_symlink()
> creation path, where ext4_handle_dirty_metadata() is called directly.
> 
> Fixes: d84c9ebdac1e ("ext4: Mark pages with journalled data dirty")
> Signed-off-by: Brian Foster <bfoster@redhat.com>

Yeah, I forgot about this subtlety when writing d84c9ebdac1e. Good catch
and thanks for fixing this up! The fix looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

> ---
> 
> Hi Jan,
> 
> I'm not intimately familiar with the jbd machinery here so this may well
> be wrong, but it survives my testing so far. I initially hacked this to
> mark the buffer dirty instead of the folio, but discovered jbd2 doesn't
> seem to like that. I suspect that is because jbd2 wants to dirty/submit
> the buffer itself after it's logged..?
> 
> Anyways, after that, this struck me as most consistent with behavior
> prior to d84c9ebdac1e and/or with the creation path, so I'm floating
> this as a first pass. Is my understanding of d84c9ebdac1e correct in
> that it is mainly an optimization to allow writeback to force the
> journaling mechanism vs. otherwise waiting for the other way around
> (i.e. a journal commit to mark folios dirty)? Thoughts appreciated..

Well, the motivation for d84c9ebdac1e was not so much an optimization but
rather to provide better visibility to the generic code what needs writing
out. Otherwise we had to special-case data journalling in a lot of places
that tried to do "clean the inode & purge the page cache" because simple
filemap_write_and_wait() was not enough to get the dirty pages in the inode
to disk.

								Honza

> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 94c7d2d828a6..d3c138003ad3 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -1009,7 +1009,12 @@ int ext4_walk_page_buffers(handle_t *handle, struct inode *inode,
>   */
>  static int ext4_dirty_journalled_data(handle_t *handle, struct buffer_head *bh)
>  {
> -	folio_mark_dirty(bh->b_folio);
> +	struct folio *folio = bh->b_folio;
> +	struct inode *inode = folio->mapping->host;
> +
> +	/* only regular files have a_ops */
> +	if (S_ISREG(inode->i_mode))
> +		folio_mark_dirty(folio);
>  	return ext4_handle_dirty_metadata(handle, NULL, bh);
>  }
>  
> -- 
> 2.49.0
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] ext4: only dirty folios when data journaling regular files
  2025-05-19 10:24 ` Jan Kara
@ 2025-05-19 12:34   ` Brian Foster
  0 siblings, 0 replies; 4+ messages in thread
From: Brian Foster @ 2025-05-19 12:34 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-ext4

On Mon, May 19, 2025 at 12:24:32PM +0200, Jan Kara wrote:
> On Fri 16-05-25 13:38:00, Brian Foster wrote:
> > fstest generic/388 occasionally reproduces a crash that looks as
> > follows:
> > 
> > BUG: kernel NULL pointer dereference, address: 0000000000000000
> > ...
> > Call Trace:
> >  <TASK>
> >  ext4_block_zero_page_range+0x30c/0x380 [ext4]
> >  ext4_truncate+0x436/0x440 [ext4]
> >  ext4_process_orphan+0x5d/0x110 [ext4]
> >  ext4_orphan_cleanup+0x124/0x4f0 [ext4]
> >  ext4_fill_super+0x262d/0x3110 [ext4]
> >  get_tree_bdev_flags+0x132/0x1d0
> >  vfs_get_tree+0x26/0xd0
> >  vfs_cmd_create+0x59/0xe0
> >  __do_sys_fsconfig+0x4ed/0x6b0
> >  do_syscall_64+0x82/0x170
> >  ...
> > 
> > This occurs when processing a symlink inode from the orphan list. The
> > partial block zeroing code in the truncate path calls
> > ext4_dirty_journalled_data() -> folio_mark_dirty(). The latter calls
> > mapping->a_ops->dirty_folio(), but symlink inodes are not assigned an
> > a_ops vector in ext4, hence the crash.
> > 
> > To avoid this problem, update the ext4_dirty_journalled_data() helper to
> > only mark the folio dirty on regular files (for which a_ops is
> > assigned). This also matches the journaling logic in the ext4_symlink()
> > creation path, where ext4_handle_dirty_metadata() is called directly.
> > 
> > Fixes: d84c9ebdac1e ("ext4: Mark pages with journalled data dirty")
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> 
> Yeah, I forgot about this subtlety when writing d84c9ebdac1e. Good catch
> and thanks for fixing this up! The fix looks good. Feel free to add:
> 
> Reviewed-by: Jan Kara <jack@suse.cz>
> 
> > ---
> > 
> > Hi Jan,
> > 
> > I'm not intimately familiar with the jbd machinery here so this may well
> > be wrong, but it survives my testing so far. I initially hacked this to
> > mark the buffer dirty instead of the folio, but discovered jbd2 doesn't
> > seem to like that. I suspect that is because jbd2 wants to dirty/submit
> > the buffer itself after it's logged..?
> > 
> > Anyways, after that, this struck me as most consistent with behavior
> > prior to d84c9ebdac1e and/or with the creation path, so I'm floating
> > this as a first pass. Is my understanding of d84c9ebdac1e correct in
> > that it is mainly an optimization to allow writeback to force the
> > journaling mechanism vs. otherwise waiting for the other way around
> > (i.e. a journal commit to mark folios dirty)? Thoughts appreciated..
> 
> Well, the motivation for d84c9ebdac1e was not so much an optimization but
> rather to provide better visibility to the generic code what needs writing
> out. Otherwise we had to special-case data journalling in a lot of places
> that tried to do "clean the inode & purge the page cache" because simple
> filemap_write_and_wait() was not enough to get the dirty pages in the inode
> to disk.
> 

Ah, I see. Thanks for the insight (and review).

Brian

> 								Honza
> 
> > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > index 94c7d2d828a6..d3c138003ad3 100644
> > --- a/fs/ext4/inode.c
> > +++ b/fs/ext4/inode.c
> > @@ -1009,7 +1009,12 @@ int ext4_walk_page_buffers(handle_t *handle, struct inode *inode,
> >   */
> >  static int ext4_dirty_journalled_data(handle_t *handle, struct buffer_head *bh)
> >  {
> > -	folio_mark_dirty(bh->b_folio);
> > +	struct folio *folio = bh->b_folio;
> > +	struct inode *inode = folio->mapping->host;
> > +
> > +	/* only regular files have a_ops */
> > +	if (S_ISREG(inode->i_mode))
> > +		folio_mark_dirty(folio);
> >  	return ext4_handle_dirty_metadata(handle, NULL, bh);
> >  }
> >  
> > -- 
> > 2.49.0
> > 
> -- 
> Jan Kara <jack@suse.com>
> SUSE Labs, CR
> 


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] ext4: only dirty folios when data journaling regular files
  2025-05-16 17:38 [PATCH] ext4: only dirty folios when data journaling regular files Brian Foster
  2025-05-19 10:24 ` Jan Kara
@ 2025-05-20 14:40 ` Theodore Ts'o
  1 sibling, 0 replies; 4+ messages in thread
From: Theodore Ts'o @ 2025-05-20 14:40 UTC (permalink / raw)
  To: linux-ext4, Brian Foster; +Cc: Theodore Ts'o, Jan Kara


On Fri, 16 May 2025 13:38:00 -0400, Brian Foster wrote:
> fstest generic/388 occasionally reproduces a crash that looks as
> follows:
> 
> BUG: kernel NULL pointer dereference, address: 0000000000000000
> ...
> Call Trace:
>  <TASK>
>  ext4_block_zero_page_range+0x30c/0x380 [ext4]
>  ext4_truncate+0x436/0x440 [ext4]
>  ext4_process_orphan+0x5d/0x110 [ext4]
>  ext4_orphan_cleanup+0x124/0x4f0 [ext4]
>  ext4_fill_super+0x262d/0x3110 [ext4]
>  get_tree_bdev_flags+0x132/0x1d0
>  vfs_get_tree+0x26/0xd0
>  vfs_cmd_create+0x59/0xe0
>  __do_sys_fsconfig+0x4ed/0x6b0
>  do_syscall_64+0x82/0x170
>  ...
> 
> [...]

Applied, thanks!

[1/1] ext4: only dirty folios when data journaling regular files
      commit: e26268ff1dcae5662c1b96c35f18cfa6ab73d9de

Best regards,
-- 
Theodore Ts'o <tytso@mit.edu>

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2025-05-20 14:40 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-16 17:38 [PATCH] ext4: only dirty folios when data journaling regular files Brian Foster
2025-05-19 10:24 ` Jan Kara
2025-05-19 12:34   ` Brian Foster
2025-05-20 14:40 ` Theodore Ts'o

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox