* [PATCH 0/2] ext4: nojournal mode fixes
@ 2026-02-16 16:48 Jan Kara
2026-02-16 16:48 ` [PATCH 1/2] ext4: Make recently_deleted() properly work with lazy itable initialization Jan Kara
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Jan Kara @ 2026-02-16 16:48 UTC (permalink / raw)
To: Ted Tso; +Cc: linux-ext4, Free Ekanayaka, Jan Kara
Hello,
here are two ext4 fixes for nojournal mode. The first patch fixes the handling
of uninitialized inodes in recently_deleted(), which is used in nojournal mode.
The bug was causing occasional fstests failures for me (and they became much
more likely after the second patch due to timing changes). The second patch
fixes a bug in ext4_fsync(), which was not properly writing out inode metadata
in nojournal mode. It is something of a band-aid, but a proper solution is
going to be rather intrusive and practically unbackportable, so I think having
it is worth it.
Honza
* [PATCH 1/2] ext4: Make recently_deleted() properly work with lazy itable initialization
2026-02-16 16:48 [PATCH 0/2] ext4: nojournal mode fixes Jan Kara
@ 2026-02-16 16:48 ` Jan Kara
2026-02-26 11:49 ` Zhang Yi
2026-02-16 16:48 ` [PATCH 2/2] ext4: Fix fsync(2) for nojournal mode Jan Kara
2026-03-26 11:57 ` [PATCH 0/2] ext4: nojournal mode fixes Theodore Ts'o
2 siblings, 1 reply; 7+ messages in thread
From: Jan Kara @ 2026-02-16 16:48 UTC (permalink / raw)
To: Ted Tso; +Cc: linux-ext4, Free Ekanayaka, Jan Kara
recently_deleted() checks whether an inode has been used in the near past.
However, this can give a false positive when the inode table is not
initialized yet and we are in fact comparing against random garbage (or a
stale itable block from a filesystem that existed before mkfs). Ultimately
this results in uninitialized inodes being skipped during inode allocation;
possibly they are never initialized, and thus e2fsck complains. Verify that
the inode has been initialized before checking its dtime.
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/ext4/ialloc.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index b20a1bf866ab..d858ae10a329 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -686,6 +686,12 @@ static int recently_deleted(struct super_block *sb, ext4_group_t group, int ino)
 	if (unlikely(!gdp))
 		return 0;
 
+	/* Inode was never used in this filesystem? */
+	if (ext4_has_group_desc_csum(sb) &&
+	    (gdp->bg_flags & cpu_to_le16(EXT4_BG_INODE_UNINIT) ||
+	     ino >= EXT4_INODES_PER_GROUP(sb) - ext4_itable_unused_count(sb, gdp)))
+		return 0;
+
 	bh = sb_find_get_block(sb, ext4_inode_table(sb, gdp) +
 				       (ino / inodes_per_block));
 	if (!bh || !buffer_uptodate(bh))
--
2.51.0
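
For readers who want to see the new condition in isolation, here is a minimal userspace sketch of the "never used" test the patch adds. It is not kernel code: the `model_*` types and names are hypothetical stand-ins for the superblock and group descriptor, `has_group_desc_csum` stands in for ext4_has_group_desc_csum(sb), and the cpu_to_le16() conversion is left out because the model keeps everything in host byte order. As in the patch, `ino` is the index of the inode within its group.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same bit value as EXT4_BG_INODE_UNINIT; the name is local to this model. */
#define MODEL_BG_INODE_UNINIT 0x0001u

/* Hypothetical, simplified stand-in for a group descriptor. */
struct model_group_desc {
	uint16_t flags;          /* models gdp->bg_flags (host byte order) */
	uint32_t itable_unused;  /* models ext4_itable_unused_count(sb, gdp) */
};

/* Hypothetical, simplified stand-in for the superblock. */
struct model_sb {
	bool has_group_desc_csum;   /* models ext4_has_group_desc_csum(sb) */
	uint32_t inodes_per_group;  /* models EXT4_INODES_PER_GROUP(sb) */
};

/*
 * Return true when the inode slot `ino` (index within the group) was never
 * initialized, so its on-disk contents, including dtime, cannot be trusted.
 */
static bool model_inode_never_used(const struct model_sb *sb,
				   const struct model_group_desc *gdp,
				   uint32_t ino)
{
	/* Without group descriptor checksums the unused count is not usable. */
	if (!sb->has_group_desc_csum)
		return false;

	/* The whole inode table of this group is still uninitialized. */
	if (gdp->flags & MODEL_BG_INODE_UNINIT)
		return true;

	/* Only the first (inodes_per_group - itable_unused) slots were used. */
	return ino >= sb->inodes_per_group - gdp->itable_unused;
}
```

With 8192 inodes per group and an unused count of 8000, slots 0 to 191 count as initialized and slot 192 onward as never used. In that case the patched recently_deleted() returns 0 before it looks at the inode table block.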
* [PATCH 2/2] ext4: Fix fsync(2) for nojournal mode
2026-02-16 16:48 [PATCH 0/2] ext4: nojournal mode fixes Jan Kara
2026-02-16 16:48 ` [PATCH 1/2] ext4: Make recently_deleted() properly work with lazy itable initialization Jan Kara
@ 2026-02-16 16:48 ` Jan Kara
2026-02-26 11:56 ` Zhang Yi
2026-03-26 11:57 ` [PATCH 0/2] ext4: nojournal mode fixes Theodore Ts'o
2 siblings, 1 reply; 7+ messages in thread
From: Jan Kara @ 2026-02-16 16:48 UTC (permalink / raw)
To: Ted Tso; +Cc: linux-ext4, Free Ekanayaka, Jan Kara, stable
When inode metadata is changed, we sometimes just call
ext4_mark_inode_dirty() to track the modified metadata. This copies the inode
metadata into the block buffer, which is enough when we are journalling
metadata. However, when we are running in nojournal mode, we currently
fail to write the dirtied inode buffer during fsync(2) because the inode
is not marked as dirty. Use an explicit ext4_write_inode() call to make
sure the inode table buffer is written to disk. This is a band-aid
solution, but a proper solution requires a much larger rewrite, including
changes in the metadata bh tracking infrastructure.
Reported-by: Free Ekanayaka <free.ekanayaka@gmail.com>
Link: https://lore.kernel.org/all/87il8nhxdm.fsf@x1.mail-host-address-is-not-set/
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/ext4/fsync.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c
index e476c6de3074..bd8f230fa507 100644
--- a/fs/ext4/fsync.c
+++ b/fs/ext4/fsync.c
@@ -83,11 +83,23 @@ static int ext4_fsync_nojournal(struct file *file, loff_t start, loff_t end,
 				int datasync, bool *needs_barrier)
 {
 	struct inode *inode = file->f_inode;
+	struct writeback_control wbc = {
+		.sync_mode = WB_SYNC_ALL,
+		.nr_to_write = 0,
+	};
 	int ret;
 
 	ret = generic_buffers_fsync_noflush(file, start, end, datasync);
-	if (!ret)
-		ret = ext4_sync_parent(inode);
+	if (ret)
+		return ret;
+
+	/* Force writeout of inode table buffer to disk */
+	ret = ext4_write_inode(inode, &wbc);
+	if (ret)
+		return ret;
+
+	ret = ext4_sync_parent(inode);
+
 	if (test_opt(inode->i_sb, BARRIER))
 		*needs_barrier = true;
 
--
2.51.0
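
The failure mode described in the commit message can be pictured with a small userspace toy model. This is only a sketch under loud assumptions, not how the kernel is structured: every `toy_*` name is hypothetical, one integer field stands in for all inode metadata, and a single boolean stands in for the VFS-level dirty state of the inode.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy inode: one metadata field tracked in three places. */
struct toy_inode {
	uint16_t mode;        /* in-memory inode value */
	uint16_t itable_buf;  /* copy held in the inode table buffer */
	uint16_t on_disk;     /* what would survive a crash */
	bool vfs_dirty;       /* stands in for the VFS inode dirty state */
};

/*
 * Models the ext4_mark_inode_dirty() behaviour from the commit message:
 * the metadata is copied into the buffer, but the inode itself is not
 * marked dirty at the VFS level.
 */
static void toy_ext4_mark_inode_dirty(struct toy_inode *ti)
{
	ti->itable_buf = ti->mode;
}

/* Models forcing the inode table buffer out to disk. */
static void toy_write_inode(struct toy_inode *ti)
{
	ti->on_disk = ti->itable_buf;
}

/* Before the patch: the buffer is written only if the inode looks dirty. */
static void toy_fsync_before(struct toy_inode *ti)
{
	if (ti->vfs_dirty) {
		toy_write_inode(ti);
		ti->vfs_dirty = false;
	}
}

/* After the patch: same as before, plus an unconditional explicit write. */
static void toy_fsync_after(struct toy_inode *ti)
{
	toy_fsync_before(ti);
	toy_write_inode(ti);
}
```

In this model, changing `mode`, calling toy_ext4_mark_inode_dirty() and then toy_fsync_before() leaves `on_disk` at the old value. toy_fsync_after() brings it up to date, which is the effect the explicit ext4_write_inode() call is meant to have.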
* Re: [PATCH 1/2] ext4: Make recently_deleted() properly work with lazy itable initialization
2026-02-16 16:48 ` [PATCH 1/2] ext4: Make recently_deleted() properly work with lazy itable initialization Jan Kara
@ 2026-02-26 11:49 ` Zhang Yi
0 siblings, 0 replies; 7+ messages in thread
From: Zhang Yi @ 2026-02-26 11:49 UTC (permalink / raw)
To: Jan Kara, Ted Tso; +Cc: linux-ext4, Free Ekanayaka
On 2/17/2026 12:48 AM, Jan Kara wrote:
> recently_deleted() checks whether inode has been used in the near past.
> However this can give false positive result when inode table is not
> initialized yet and we are in fact comparing to random garbage (or stale
> itable block of a filesystem before mkfs). Ultimately this results in
> uninitialized inodes being skipped during inode allocation and possibly
> they are never initialized and thus e2fsck complains. Verify if the
> inode has been initialized before checking for dtime.
>
> Signed-off-by: Jan Kara <jack@suse.cz>
Ha, this is a nice catch! Looks good to me.
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
> ---
> fs/ext4/ialloc.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
> index b20a1bf866ab..d858ae10a329 100644
> --- a/fs/ext4/ialloc.c
> +++ b/fs/ext4/ialloc.c
> @@ -686,6 +686,12 @@ static int recently_deleted(struct super_block *sb, ext4_group_t group, int ino)
> if (unlikely(!gdp))
> return 0;
>
> + /* Inode was never used in this filesystem? */
> + if (ext4_has_group_desc_csum(sb) &&
> + (gdp->bg_flags & cpu_to_le16(EXT4_BG_INODE_UNINIT) ||
> + ino >= EXT4_INODES_PER_GROUP(sb) - ext4_itable_unused_count(sb, gdp)))
> + return 0;
> +
> bh = sb_find_get_block(sb, ext4_inode_table(sb, gdp) +
> (ino / inodes_per_block));
> if (!bh || !buffer_uptodate(bh))
* Re: [PATCH 2/2] ext4: Fix fsync(2) for nojournal mode
2026-02-16 16:48 ` [PATCH 2/2] ext4: Fix fsync(2) for nojournal mode Jan Kara
@ 2026-02-26 11:56 ` Zhang Yi
2026-02-26 13:39 ` Jan Kara
0 siblings, 1 reply; 7+ messages in thread
From: Zhang Yi @ 2026-02-26 11:56 UTC (permalink / raw)
To: Jan Kara, Ted Tso; +Cc: linux-ext4, Free Ekanayaka, stable
On 2/17/2026 12:48 AM, Jan Kara wrote:
> When inode metadata is changed, we sometimes just call
> ext4_mark_inode_dirty() to track modified metadata. This copies inode
> metadata into block buffer which is enough when we are journalling
> metadata. However when we are running in nojournal mode we currently
> fail to write the dirtied inode buffer during fsync(2) because the inode
> is not marked as dirty.
Let me make sure I understand this. You mean that because in some places we
call ext4_mark_inode_dirty() directly to mark the inode as dirty, instead
of using the generic mark_inode_dirty(), the inode ends up missing
the I_DIRTY_INODE flag. Consequently, generic_buffers_fsync_noflush()->
sync_inode_metadata() does not write the inode, so the metadata
is not updated on disk after fsync(2), right?
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
> Use explicit ext4_write_inode() call to make
> sure the inode table buffer is written to the disk. This is a band aid
> solution but proper solution requires a much larger rewrite including
> changes in metadata bh tracking infrastructure.
>
> Reported-by: Free Ekanayaka <free.ekanayaka@gmail.com>
> Link: https://lore.kernel.org/all/87il8nhxdm.fsf@x1.mail-host-address-is-not-set/
> CC: stable@vger.kernel.org
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
> fs/ext4/fsync.c | 16 ++++++++++++++--
> 1 file changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c
> index e476c6de3074..bd8f230fa507 100644
> --- a/fs/ext4/fsync.c
> +++ b/fs/ext4/fsync.c
> @@ -83,11 +83,23 @@ static int ext4_fsync_nojournal(struct file *file, loff_t start, loff_t end,
> int datasync, bool *needs_barrier)
> {
> struct inode *inode = file->f_inode;
> + struct writeback_control wbc = {
> + .sync_mode = WB_SYNC_ALL,
> + .nr_to_write = 0,
> + };
> int ret;
>
> ret = generic_buffers_fsync_noflush(file, start, end, datasync);
> - if (!ret)
> - ret = ext4_sync_parent(inode);
> + if (ret)
> + return ret;
> +
> + /* Force writeout of inode table buffer to disk */
> + ret = ext4_write_inode(inode, &wbc);
> + if (ret)
> + return ret;
> +
> + ret = ext4_sync_parent(inode);
> +
> if (test_opt(inode->i_sb, BARRIER))
> *needs_barrier = true;
>
* Re: [PATCH 2/2] ext4: Fix fsync(2) for nojournal mode
2026-02-26 11:56 ` Zhang Yi
@ 2026-02-26 13:39 ` Jan Kara
0 siblings, 0 replies; 7+ messages in thread
From: Jan Kara @ 2026-02-26 13:39 UTC (permalink / raw)
To: Zhang Yi; +Cc: Jan Kara, Ted Tso, linux-ext4, Free Ekanayaka, stable
On Thu 26-02-26 19:56:34, Zhang Yi wrote:
> On 2/17/2026 12:48 AM, Jan Kara wrote:
> > When inode metadata is changed, we sometimes just call
> > ext4_mark_inode_dirty() to track modified metadata. This copies inode
> > metadata into block buffer which is enough when we are journalling
> > metadata. However when we are running in nojournal mode we currently
> > fail to write the dirtied inode buffer during fsync(2) because the inode
> > is not marked as dirty.
>
> Please let me understand this. You mean that because some places we
> directly call ext4_mark_inode_dirty() to mark the inode as dirty, instead
> of using the generic mark_inode_dirty(), this results in the inode missing
> the I_DIRTY_INODE flag. Consequently, generic_buffers_fsync_noflush()->
> sync_inode_metadata() does not write the inode, leading to the metadata
> not being updated on disk after fsync(2), right?
Correct.
> Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Thanks for review!
Honza
> > Use explicit ext4_write_inode() call to make
> > sure the inode table buffer is written to the disk. This is a band aid
> > solution but proper solution requires a much larger rewrite including
> > changes in metadata bh tracking infrastructure.
> >
> > Reported-by: Free Ekanayaka <free.ekanayaka@gmail.com>
> > Link: https://lore.kernel.org/all/87il8nhxdm.fsf@x1.mail-host-address-is-not-set/
> > CC: stable@vger.kernel.org
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> > fs/ext4/fsync.c | 16 ++++++++++++++--
> > 1 file changed, 14 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c
> > index e476c6de3074..bd8f230fa507 100644
> > --- a/fs/ext4/fsync.c
> > +++ b/fs/ext4/fsync.c
> > @@ -83,11 +83,23 @@ static int ext4_fsync_nojournal(struct file *file, loff_t start, loff_t end,
> > int datasync, bool *needs_barrier)
> > {
> > struct inode *inode = file->f_inode;
> > + struct writeback_control wbc = {
> > + .sync_mode = WB_SYNC_ALL,
> > + .nr_to_write = 0,
> > + };
> > int ret;
> >
> > ret = generic_buffers_fsync_noflush(file, start, end, datasync);
> > - if (!ret)
> > - ret = ext4_sync_parent(inode);
> > + if (ret)
> > + return ret;
> > +
> > + /* Force writeout of inode table buffer to disk */
> > + ret = ext4_write_inode(inode, &wbc);
> > + if (ret)
> > + return ret;
> > +
> > + ret = ext4_sync_parent(inode);
> > +
> > if (test_opt(inode->i_sb, BARRIER))
> > *needs_barrier = true;
> >
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* Re: [PATCH 0/2] ext4: nojournal mode fixes
2026-02-16 16:48 [PATCH 0/2] ext4: nojournal mode fixes Jan Kara
2026-02-16 16:48 ` [PATCH 1/2] ext4: Make recently_deleted() properly work with lazy itable initialization Jan Kara
2026-02-16 16:48 ` [PATCH 2/2] ext4: Fix fsync(2) for nojournal mode Jan Kara
@ 2026-03-26 11:57 ` Theodore Ts'o
2 siblings, 0 replies; 7+ messages in thread
From: Theodore Ts'o @ 2026-03-26 11:57 UTC (permalink / raw)
To: Jan Kara; +Cc: Theodore Ts'o, linux-ext4, Free Ekanayaka
On Mon, 16 Feb 2026 17:48:42 +0100, Jan Kara wrote:
> here are two ext4 fixes for nojournal mode. The first fix is fixing handling of
> uninitialized inodes in recently_deleted() used in nojournal mode which was
> leading to occasional fstests failures for me (which became much more likely
> after the second patch due to timing changes). The second patch fixes a bug in
> ext4_fsync() which was not properly writing out inode metadata in nojournal
> mode. It is kind of a band aid but proper solution is going to be rather
> intrusive and practically unbackportable so I think having it is worth it.
>
> [...]
Applied, thanks!
[1/2] ext4: Make recently_deleted() properly work with lazy itable initialization
commit: cec4673c65d709a0c0354c0a0bbbba5cb7508a9c
[2/2] ext4: Fix fsync(2) for nojournal mode
commit: 3a0fb9e501707760341b5f4562e0f8409bed126a
Best regards,
--
Theodore Ts'o <tytso@mit.edu>