From: "Darrick J. Wong" <djwong@kernel.org>
To: Ritesh Harjani <ritesh.list@gmail.com>
Cc: linux-ext4@vger.kernel.org, Theodore Ts'o <tytso@mit.edu>,
Jan Kara <jack@suse.cz>, John Garry <john.g.garry@oracle.com>,
Ojaswin Mujoo <ojaswin@linux.ibm.com>,
linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v3 0/7] ext4: Add multi-fsblock atomic write support with bigalloc
Date: Wed, 14 May 2025 09:40:50 -0700
Message-ID: <20250514164050.GN25655@frogsfrogsfrogs>
In-Reply-To: <87h61t65pl.fsf@gmail.com>

On Fri, May 09, 2025 at 11:12:46PM +0530, Ritesh Harjani wrote:
> "Ritesh Harjani (IBM)" <ritesh.list@gmail.com> writes:
>
> > This is v3 of multi-fsblock atomic write support using bigalloc. The
> > series is now in much better shape. Most of the design changes are in
> > Patch-4 & 5.
> >
> > This series can now be carefully reviewed, as all the error handling related
> > code paths should be properly taken care of.
> >
>
> We spotted that the multi-fsblock changes might need to force a journal
> commit if the underlying region has mixed mappings, e.g. alternating
> written (W) and unwritten (U) extents such as WUWUWUW...
>
> The issue arises when, during block allocation, the unwritten ranges are
> first zeroed out, followed by the unwritten-to-written extent
> conversion. This conversion is part of a journaled metadata transaction
> that has not yet been committed, as the transaction is still running.
> If an iomap write then modifies the data on those multi-fsblocks and a
> sudden power loss occurs before the transaction commits, the
> unwritten-to-written conversion will not be replayed during journal
> recovery. As a result, we end up with new data written over mapped
> blocks, while the alternate unwritten blocks will read zeroes. This
> could cause torn-write behavior for atomic writes.
>
> So we were thinking we might need something like this. Hopefully this
> should still be ok, as the mixed mapping case is mostly not a
> performance-critical path. Thoughts?

I agree the journal has to be written out before the atomic write is
sent to the device.

--D

>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 2642e1ef128f..59b59d609976 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -3517,7 +3517,8 @@ static int ext4_map_blocks_atomic_write_slow(handle_t *handle,
> * underlying short holes/unwritten extents within the requested range.
> */
> static int ext4_map_blocks_atomic_write(handle_t *handle, struct inode *inode,
> - struct ext4_map_blocks *map, int m_flags)
> + struct ext4_map_blocks *map, int m_flags,
> + bool *force_commit)
> {
> ext4_lblk_t m_lblk = map->m_lblk;
> unsigned int m_len = map->m_len;
> @@ -3537,6 +3538,11 @@ static int ext4_map_blocks_atomic_write(handle_t *handle, struct inode *inode,
> map->m_len = m_len;
> map->m_flags = 0;
>
> + /*
> + * slow path means we have mixed mapping, that means we will need
> + * to force txn commit.
> + */
> + *force_commit = true;
> return ext4_map_blocks_atomic_write_slow(handle, inode, map);
> out:
> return ret;
> @@ -3548,6 +3554,7 @@ static int ext4_iomap_alloc(struct inode *inode, struct ext4_map_blocks *map,
> handle_t *handle;
> u8 blkbits = inode->i_blkbits;
> int ret, dio_credits, m_flags = 0, retries = 0;
> + bool force_commit = false;
>
> /*
> * Trim the mapping request to the maximum value that we can map at
> @@ -3610,7 +3617,8 @@ static int ext4_iomap_alloc(struct inode *inode, struct ext4_map_blocks *map,
> m_flags = EXT4_GET_BLOCKS_IO_CREATE_EXT;
>
> if (flags & IOMAP_ATOMIC)
> - ret = ext4_map_blocks_atomic_write(handle, inode, map, m_flags);
> + ret = ext4_map_blocks_atomic_write(handle, inode, map, m_flags,
> + &force_commit);
> else
> ret = ext4_map_blocks(handle, inode, map, m_flags);
>
> @@ -3626,6 +3634,9 @@ static int ext4_iomap_alloc(struct inode *inode, struct ext4_map_blocks *map,
> if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries))
> goto retry;
>
> + if (ret > 0 && force_commit)
> + ext4_force_commit(inode->i_sb);
> +
> return ret;
> }
>
>
> -ritesh
>