* [RFC] ext4: possible inconsistency in ext4_append() error path
@ 2026-05-01 17:25 Vineet Agarwal
2026-05-03 17:23 ` Jan Kara
0 siblings, 1 reply; 2+ messages in thread
From: Vineet Agarwal @ 2026-05-01 17:25 UTC (permalink / raw)
To: tytso
Cc: linux-ext4, adilger.kernel, libaokun, jack, ojaswin, ritesh.list,
yi.zhang, linux-kernel, Vineet Agarwal
Hi,
While looking into ext4 directory operations, I noticed a possible
inconsistency in the error handling of ext4_append().
In ext4_append(), the inode size is updated before all failure points
have been ruled out:
bh = ext4_bread(handle, inode, *block, EXT4_GET_BLOCKS_CREATE);
if (IS_ERR(bh))
return bh;
inode->i_size += inode->i_sb->s_blocksize;
EXT4_I(inode)->i_disksize = inode->i_size;
err = ext4_mark_inode_dirty(handle, inode);
if (err)
goto out;
err = ext4_journal_get_write_access(handle, inode->i_sb, bh,
EXT4_JTR_NONE);
if (err)
goto out;
If either ext4_mark_inode_dirty() or
ext4_journal_get_write_access() fails, the function returns an
error but does not restore the original inode size.
Callers of ext4_append() appear to treat it as an all-or-nothing
operation:
bh = ext4_append(handle, dir, &block);
if (IS_ERR(bh))
goto out;
However, in the failure case, inode->i_size may already have been
increased.
One possible consequence is that subsequent checks relying on i_size,
such as:
if (block >= inode->i_size >> inode->i_blkbits)
may allow a block index to pass bounds checks even though the append
operation did not complete successfully.
I understand that journaling may ensure on-disk consistency, but the
in-memory inode state may still temporarily reflect a change that did
not logically succeed.
Is this behavior intentional, or should ext4_append() avoid updating
i_size until after all failure points, or roll it back on error?
Thanks,
Vineet Agarwal
^ permalink raw reply [flat|nested] 2+ messages in thread* Re: [RFC] ext4: possible inconsistency in ext4_append() error path
2026-05-01 17:25 [RFC] ext4: possible inconsistency in ext4_append() error path Vineet Agarwal
@ 2026-05-03 17:23 ` Jan Kara
0 siblings, 0 replies; 2+ messages in thread
From: Jan Kara @ 2026-05-03 17:23 UTC (permalink / raw)
To: Vineet Agarwal
Cc: tytso, linux-ext4, adilger.kernel, libaokun, jack, ojaswin,
ritesh.list, yi.zhang, linux-kernel
Hi!
On Fri 01-05-26 22:55:06, Vineet Agarwal wrote:
> While looking into ext4 directory operations, I noticed a possible
> inconsistency in the error handling of ext4_append().
>
> In ext4_append(), the inode size is updated before all failure points
> have been ruled out:
>
> bh = ext4_bread(handle, inode, *block, EXT4_GET_BLOCKS_CREATE);
> if (IS_ERR(bh))
> return bh;
>
> inode->i_size += inode->i_sb->s_blocksize;
> EXT4_I(inode)->i_disksize = inode->i_size;
>
> err = ext4_mark_inode_dirty(handle, inode);
> if (err)
> goto out;
>
> err = ext4_journal_get_write_access(handle, inode->i_sb, bh,
> EXT4_JTR_NONE);
> if (err)
> goto out;
>
> If either ext4_mark_inode_dirty() or
> ext4_journal_get_write_access() fails, the function returns an
> error but does not restore the original inode size.
>
> Callers of ext4_append() appear to treat it as an all-or-nothing
> operation:
>
> bh = ext4_append(handle, dir, &block);
> if (IS_ERR(bh))
> goto out;
>
> However, in the failure case, inode->i_size may already have been
> increased.
Right.
> One possible consequence is that subsequent checks relying on i_size,
> such as:
>
> if (block >= inode->i_size >> inode->i_blkbits)
>
> may allow a block index to pass bounds checks even though the append
> operation did not complete successfully.
>
> I understand that journaling may ensure on-disk consistency, but the
> in-memory inode state may still temporarily reflect a change that did
> not logically succeed.
>
> Is this behavior intentional, or should ext4_append() avoid updating
> i_size until after all failure points, or roll it back on error?
The thing is: If any of the above calls you mention fail, the filesystem
journal is aborted and the filesystem doesn't accept any more
modifications. So we are kind of sloppy with error recovery since it cannot
reach on-disk state. I'm not aware where the increased inode size on
failing ext4_append() would cause issues (besides further error complaints)
but if you find some such, yes, we can improve the error recovery path.
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2026-05-03 17:23 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-01 17:25 [RFC] ext4: possible inconsistency in ext4_append() error path Vineet Agarwal
2026-05-03 17:23 ` Jan Kara
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox