public inbox for linux-ext4@vger.kernel.org
* [PATCH 0/2] ext4: nojournal mode fixes
@ 2026-02-16 16:48 Jan Kara
  2026-02-16 16:48 ` [PATCH 1/2] ext4: Make recently_deleted() properly work with lazy itable initialization Jan Kara
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Jan Kara @ 2026-02-16 16:48 UTC (permalink / raw)
  To: Ted Tso; +Cc: linux-ext4, Free Ekanayaka, Jan Kara

Hello,

here are two ext4 fixes for nojournal mode. The first patch fixes the handling
of uninitialized inodes in recently_deleted(), used in nojournal mode, which was
leading to occasional fstests failures for me (these became much more likely
after the second patch due to timing changes). The second patch fixes a bug in
ext4_fsync(), which was not properly writing out inode metadata in nojournal
mode. It is something of a band-aid, but a proper solution is going to be rather
intrusive and practically unbackportable, so I think having it is worth it.

								Honza


* [PATCH 1/2] ext4: Make recently_deleted() properly work with lazy itable initialization
  2026-02-16 16:48 [PATCH 0/2] ext4: nojournal mode fixes Jan Kara
@ 2026-02-16 16:48 ` Jan Kara
  2026-02-26 11:49   ` Zhang Yi
  2026-02-16 16:48 ` [PATCH 2/2] ext4: Fix fsync(2) for nojournal mode Jan Kara
  2026-03-26 11:57 ` [PATCH 0/2] ext4: nojournal mode fixes Theodore Ts'o
  2 siblings, 1 reply; 7+ messages in thread
From: Jan Kara @ 2026-02-16 16:48 UTC (permalink / raw)
  To: Ted Tso; +Cc: linux-ext4, Free Ekanayaka, Jan Kara

recently_deleted() checks whether an inode has been used in the near past.
However, this can give a false positive when the inode table is not
initialized yet and we are in fact comparing against random garbage (or a
stale itable block of a filesystem that existed before mkfs). Ultimately
this results in uninitialized inodes being skipped during inode
allocation; possibly they are never initialized, and thus e2fsck
complains. Verify that the inode has been initialized before checking
dtime.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/ialloc.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index b20a1bf866ab..d858ae10a329 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -686,6 +686,12 @@ static int recently_deleted(struct super_block *sb, ext4_group_t group, int ino)
 	if (unlikely(!gdp))
 		return 0;
 
+	/* Inode was never used in this filesystem? */
+	if (ext4_has_group_desc_csum(sb) && 
+	    (gdp->bg_flags & cpu_to_le16(EXT4_BG_INODE_UNINIT) ||
+	     ino >= EXT4_INODES_PER_GROUP(sb) - ext4_itable_unused_count(sb, gdp)))
+		return 0;
+
 	bh = sb_find_get_block(sb, ext4_inode_table(sb, gdp) +
 		       (ino / inodes_per_block));
 	if (!bh || !buffer_uptodate(bh))
-- 
2.51.0
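
For readers following along, the condition added in the hunk above can be
modelled in userspace roughly as follows. This is only a sketch, not kernel
code: struct group_model, its fields and inode_never_used() are hypothetical
stand-ins for the group descriptor, ext4_has_group_desc_csum(),
EXT4_INODES_PER_GROUP() and ext4_itable_unused_count(). Only the flag value
0x0001 is taken from the kernel's EXT4_BG_INODE_UNINIT definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same value as the kernel's EXT4_BG_INODE_UNINIT flag. */
#define BG_INODE_UNINIT 0x0001

/* Hypothetical, simplified view of one block group (illustration only). */
struct group_model {
	bool has_desc_csum;        /* models ext4_has_group_desc_csum(sb) */
	uint16_t flags;            /* models gdp->bg_flags (CPU byte order) */
	uint32_t inodes_per_group; /* models EXT4_INODES_PER_GROUP(sb) */
	uint32_t itable_unused;    /* models ext4_itable_unused_count(sb, gdp) */
};

/*
 * Return true when the inode at index 'ino' within the group was never
 * used, i.e. its on-disk itable slot may still hold garbage and must not
 * be inspected for a deletion time.
 */
static bool inode_never_used(const struct group_model *g, uint32_t ino)
{
	assert(g->itable_unused <= g->inodes_per_group);
	assert(ino < g->inodes_per_group);

	/* Without descriptor checksums the unused count is not maintained. */
	if (!g->has_desc_csum)
		return false;
	/* The whole inode table of this group is uninitialized. */
	if (g->flags & BG_INODE_UNINIT)
		return true;
	/* Only the trailing 'itable_unused' slots were never touched. */
	return ino >= g->inodes_per_group - g->itable_unused;
}
```

With 8192 inodes per group and 8000 of them unused, indices 0..191 would be
treated as possibly used and indices 192 and above as never used, so only the
former would go on to the dtime comparison.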



* [PATCH 2/2] ext4: Fix fsync(2) for nojournal mode
  2026-02-16 16:48 [PATCH 0/2] ext4: nojournal mode fixes Jan Kara
  2026-02-16 16:48 ` [PATCH 1/2] ext4: Make recently_deleted() properly work with lazy itable initialization Jan Kara
@ 2026-02-16 16:48 ` Jan Kara
  2026-02-26 11:56   ` Zhang Yi
  2026-03-26 11:57 ` [PATCH 0/2] ext4: nojournal mode fixes Theodore Ts'o
  2 siblings, 1 reply; 7+ messages in thread
From: Jan Kara @ 2026-02-16 16:48 UTC (permalink / raw)
  To: Ted Tso; +Cc: linux-ext4, Free Ekanayaka, Jan Kara, stable

When inode metadata is changed, we sometimes just call
ext4_mark_inode_dirty() to track the modified metadata. This copies the
inode metadata into the block buffer, which is enough when we are
journalling metadata. However, when running in nojournal mode, we
currently fail to write the dirtied inode buffer during fsync(2) because
the inode is not marked as dirty. Use an explicit ext4_write_inode() call
to make sure the inode table buffer is written to disk. This is a
band-aid solution, but a proper solution requires a much larger rewrite,
including changes to the metadata bh tracking infrastructure.

Reported-by: Free Ekanayaka <free.ekanayaka@gmail.com>
Link: https://lore.kernel.org/all/87il8nhxdm.fsf@x1.mail-host-address-is-not-set/
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/fsync.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c
index e476c6de3074..bd8f230fa507 100644
--- a/fs/ext4/fsync.c
+++ b/fs/ext4/fsync.c
@@ -83,11 +83,23 @@ static int ext4_fsync_nojournal(struct file *file, loff_t start, loff_t end,
 				int datasync, bool *needs_barrier)
 {
 	struct inode *inode = file->f_inode;
+	struct writeback_control wbc = {
+		.sync_mode = WB_SYNC_ALL,
+		.nr_to_write = 0,
+	};
 	int ret;
 
 	ret = generic_buffers_fsync_noflush(file, start, end, datasync);
-	if (!ret)
-		ret = ext4_sync_parent(inode);
+	if (ret)
+		return ret;
+
+	/* Force writeout of inode table buffer to disk */
+	ret = ext4_write_inode(inode, &wbc);
+	if (ret)
+		return ret;
+
+	ret = ext4_sync_parent(inode);
+
 	if (test_opt(inode->i_sb, BARRIER))
 		*needs_barrier = true;
 
-- 
2.51.0
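
The failure mode described in the commit message can be illustrated with a
toy userspace model. This is a deliberately simplified sketch, not the kernel
logic: struct inode_model and all model_*() helpers are hypothetical, and the
two booleans merely stand in for the VFS inode dirty state and for an inode
table buffer that holds changes not yet on disk.

```c
#include <stdbool.h>

/* Hypothetical two-flag model of an inode (illustration only). */
struct inode_model {
	bool vfs_dirty;    /* stands in for the VFS inode dirty flag */
	bool buffer_dirty; /* itable buffer holds changes not yet on disk */
};

/*
 * Models a path that only copies the inode into its block buffer
 * without setting the VFS dirty flag.
 */
static void model_mark_dirty_ext4(struct inode_model *i)
{
	i->buffer_dirty = true;
}

/* Models writing the inode table buffer out to disk. */
static void model_write_inode(struct inode_model *i)
{
	i->buffer_dirty = false;
	i->vfs_dirty = false;
}

/* Models the generic fsync helper: it writes only VFS-dirty inodes. */
static void model_generic_fsync(struct inode_model *i)
{
	if (i->vfs_dirty)
		model_write_inode(i);
}

/*
 * Models the nojournal fsync path. With force_write == false it behaves
 * like the code before this patch, with force_write == true like the
 * code after it. Returns true if the metadata reached the disk.
 */
static bool model_fsync(struct inode_model *i, bool force_write)
{
	model_generic_fsync(i);
	if (force_write)
		model_write_inode(i);
	return !i->buffer_dirty;
}
```

In this model, calling model_mark_dirty_ext4() followed by
model_fsync(&inode, false) leaves the buffer dirty, while
model_fsync(&inode, true) writes it out, which is the effect the explicit
ext4_write_inode() call is meant to have.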



* Re: [PATCH 1/2] ext4: Make recently_deleted() properly work with lazy itable initialization
  2026-02-16 16:48 ` [PATCH 1/2] ext4: Make recently_deleted() properly work with lazy itable initialization Jan Kara
@ 2026-02-26 11:49   ` Zhang Yi
  0 siblings, 0 replies; 7+ messages in thread
From: Zhang Yi @ 2026-02-26 11:49 UTC (permalink / raw)
  To: Jan Kara, Ted Tso; +Cc: linux-ext4, Free Ekanayaka

On 2/17/2026 12:48 AM, Jan Kara wrote:
> recently_deleted() checks whether an inode has been used in the near past.
> However, this can give a false positive when the inode table is not
> initialized yet and we are in fact comparing against random garbage (or a
> stale itable block of a filesystem that existed before mkfs). Ultimately
> this results in uninitialized inodes being skipped during inode
> allocation; possibly they are never initialized, and thus e2fsck
> complains. Verify that the inode has been initialized before checking
> dtime.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>

Ha, this is a nice catch! Looks good to me.

Reviewed-by: Zhang Yi <yi.zhang@huawei.com>




* Re: [PATCH 2/2] ext4: Fix fsync(2) for nojournal mode
  2026-02-16 16:48 ` [PATCH 2/2] ext4: Fix fsync(2) for nojournal mode Jan Kara
@ 2026-02-26 11:56   ` Zhang Yi
  2026-02-26 13:39     ` Jan Kara
  0 siblings, 1 reply; 7+ messages in thread
From: Zhang Yi @ 2026-02-26 11:56 UTC (permalink / raw)
  To: Jan Kara, Ted Tso; +Cc: linux-ext4, Free Ekanayaka, stable

On 2/17/2026 12:48 AM, Jan Kara wrote:
> When inode metadata is changed, we sometimes just call
> ext4_mark_inode_dirty() to track the modified metadata. This copies the
> inode metadata into the block buffer, which is enough when we are
> journalling metadata. However, when running in nojournal mode, we
> currently fail to write the dirtied inode buffer during fsync(2) because
> the inode is not marked as dirty.

Let me make sure I understand this. You mean that because in some places we
call ext4_mark_inode_dirty() directly to mark the inode as dirty, instead
of using the generic mark_inode_dirty(), the inode ends up missing
the I_DIRTY_INODE flag. Consequently, generic_buffers_fsync_noflush()->
sync_inode_metadata() does not write the inode, so the metadata is
not updated on disk after fsync(2), right?

Reviewed-by: Zhang Yi <yi.zhang@huawei.com>




* Re: [PATCH 2/2] ext4: Fix fsync(2) for nojournal mode
  2026-02-26 11:56   ` Zhang Yi
@ 2026-02-26 13:39     ` Jan Kara
  0 siblings, 0 replies; 7+ messages in thread
From: Jan Kara @ 2026-02-26 13:39 UTC (permalink / raw)
  To: Zhang Yi; +Cc: Jan Kara, Ted Tso, linux-ext4, Free Ekanayaka, stable

On Thu 26-02-26 19:56:34, Zhang Yi wrote:
> On 2/17/2026 12:48 AM, Jan Kara wrote:
> > When inode metadata is changed, we sometimes just call
> > ext4_mark_inode_dirty() to track the modified metadata. This copies the
> > inode metadata into the block buffer, which is enough when we are
> > journalling metadata. However, when running in nojournal mode, we
> > currently fail to write the dirtied inode buffer during fsync(2) because
> > the inode is not marked as dirty.
> 
> Let me make sure I understand this. You mean that because in some places we
> call ext4_mark_inode_dirty() directly to mark the inode as dirty, instead
> of using the generic mark_inode_dirty(), the inode ends up missing
> the I_DIRTY_INODE flag. Consequently, generic_buffers_fsync_noflush()->
> sync_inode_metadata() does not write the inode, so the metadata is
> not updated on disk after fsync(2), right?

Correct.

> Reviewed-by: Zhang Yi <yi.zhang@huawei.com>

Thanks for review!

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH 0/2] ext4: nojournal mode fixes
  2026-02-16 16:48 [PATCH 0/2] ext4: nojournal mode fixes Jan Kara
  2026-02-16 16:48 ` [PATCH 1/2] ext4: Make recently_deleted() properly work with lazy itable initialization Jan Kara
  2026-02-16 16:48 ` [PATCH 2/2] ext4: Fix fsync(2) for nojournal mode Jan Kara
@ 2026-03-26 11:57 ` Theodore Ts'o
  2 siblings, 0 replies; 7+ messages in thread
From: Theodore Ts'o @ 2026-03-26 11:57 UTC (permalink / raw)
  To: Jan Kara; +Cc: Theodore Ts'o, linux-ext4, Free Ekanayaka


On Mon, 16 Feb 2026 17:48:42 +0100, Jan Kara wrote:
> here are two ext4 fixes for nojournal mode. The first patch fixes the handling
> of uninitialized inodes in recently_deleted(), used in nojournal mode, which was
> leading to occasional fstests failures for me (these became much more likely
> after the second patch due to timing changes). The second patch fixes a bug in
> ext4_fsync(), which was not properly writing out inode metadata in nojournal
> mode. It is something of a band-aid, but a proper solution is going to be rather
> intrusive and practically unbackportable, so I think having it is worth it.
> 
> [...]

Applied, thanks!

[1/2] ext4: Make recently_deleted() properly work with lazy itable initialization
      commit: cec4673c65d709a0c0354c0a0bbbba5cb7508a9c
[2/2] ext4: Fix fsync(2) for nojournal mode
      commit: 3a0fb9e501707760341b5f4562e0f8409bed126a

Best regards,
-- 
Theodore Ts'o <tytso@mit.edu>

