From: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: linux-ext4@vger.kernel.org, Theodore Ts'o <tytso@mit.edu>,
Jan Kara <jack@suse.cz>, John Garry <john.g.garry@oracle.com>,
Ojaswin Mujoo <ojaswin@linux.ibm.com>,
linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v3 4/7] ext4: Add support for EXT4_GET_BLOCKS_QUERY_LEAF_BLOCKS
Date: Thu, 15 May 2025 00:17:41 +0530 [thread overview]
Message-ID: <87o6vvyqpe.fsf@gmail.com> (raw)
In-Reply-To: <20250514161603.GG25655@frogsfrogsfrogs>
"Darrick J. Wong" <djwong@kernel.org> writes:
> On Fri, May 09, 2025 at 02:20:34AM +0530, Ritesh Harjani (IBM) wrote:
>> There can be a case where there are contiguous extents on the adjacent
>> leaf nodes of on-disk extent trees. So when someone tries to write to
>> this contiguous range, ext4_map_blocks() call will split by returning
>> 1 extent at a time if this is not already cached in extent_status tree
>> cache (where if these extents when cached can get merged since they are
>> contiguous).
>>
>> This is fine for a normal write however in case of atomic writes, it
>> can't afford to break the write into two. Now this is also something
>> that will only happen in the slow write case where we call
>> ext4_map_blocks() for each of these extents spread across different leaf
>> nodes. However, there is no guarantee that these extent status cache
>> cannot be reclaimed before the last call to ext4_map_blocks() in
>> ext4_map_blocks_atomic_write_slow().
>
> Can you have two physically and logically contiguous mappings within a
> single leaf node?
On disk extent tree can merge two such blocks if it is within the same
leaf node. But there can be a case where there are two logically and
physically contiguous mappings lying on two different leaf nodes.
(since on disk extent tree does not merge extents across branches.)
In that case ext4_map_blocks() can only return only 1 mapping at a time
(unless it is cached in extent status cache).
> Or is the key idea here that the extent status tree
> will merge adjacent mappings from the same leaf block, just not between
> leaf blocks?
>
Yes, in memory extent status cache can still merge this. But there can
be a case (we can argue in this case it may practically never happen)
that, the extent status cache got pruned due to memory pressure and we
have to look over on-disk extent tree. In that case we will need to look
ahead in the adjacent leaf block to see if we have a contiguous mapping.
Otherwise the atomic write will always fail over such contiguous region
split across two leaf nodes.
>> Hence this patch adds support of EXT4_GET_BLOCKS_QUERY_LEAF_BLOCKS.
>> This flag checks if the requested range can be fully found in extent
>> status cache and return. If not, it looks up in on-disk extent
>> tree via ext4_map_query_blocks(). If the found extent is the last entry
>> in the leaf node, then it goes and queries the next lblk to see if there
>> is an adjacent contiguous extent in the adjacent leaf node of the
>> on-disk extent tree.
>>
>> Even though there can be a case where there are multiple adjacent extent
>> entries spread across multiple leaf nodes. But we only read an adjacent
>> leaf block i.e. in total of 2 extent entries spread across 2 leaf nodes.
>> The reason for this is that we are mostly only going to support atomic
>> writes with upto 64KB or maybe max upto 1MB of atomic write support.
>>
>> Co-developed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
>> Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
>> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
>> ---
>> fs/ext4/ext4.h | 18 ++++++++-
>> fs/ext4/extents.c | 12 ++++++
>> fs/ext4/inode.c | 97 +++++++++++++++++++++++++++++++++++++++++------
>> 3 files changed, 115 insertions(+), 12 deletions(-)
>>
>> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
>> index e2b36a3c1b0f..b4bbe2837423 100644
>> --- a/fs/ext4/ext4.h
>> +++ b/fs/ext4/ext4.h
>> @@ -256,9 +256,19 @@ struct ext4_allocation_request {
>> #define EXT4_MAP_UNWRITTEN BIT(BH_Unwritten)
>> #define EXT4_MAP_BOUNDARY BIT(BH_Boundary)
>> #define EXT4_MAP_DELAYED BIT(BH_Delay)
>> +/*
>> + * This is for use in ext4_map_query_blocks() for a special case where we can
>> + * have a physically and logically contiguous blocks explit across two leaf
>
> s/explit/split/ ?
Thanks! Will fix it.
-ritesh
>
> --D
>
next prev parent reply other threads:[~2025-05-14 18:55 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-08 20:50 [PATCH v3 0/7] ext4: Add multi-fsblock atomic write support with bigalloc Ritesh Harjani (IBM)
2025-05-08 20:50 ` [PATCH v3 1/7] ext4: Document an edge case for overwrites Ritesh Harjani (IBM)
2025-05-09 5:19 ` Ojaswin Mujoo
2025-05-14 16:23 ` Darrick J. Wong
2025-05-08 20:50 ` [PATCH v3 2/7] ext4: Check if inode uses extents in ext4_inode_can_atomic_write() Ritesh Harjani (IBM)
2025-05-09 5:20 ` Ojaswin Mujoo
2025-05-14 16:24 ` Darrick J. Wong
2025-05-08 20:50 ` [PATCH v3 3/7] ext4: Make ext4_meta_trans_blocks() non-static for later use Ritesh Harjani (IBM)
2025-05-09 5:21 ` Ojaswin Mujoo
2025-05-14 16:24 ` Darrick J. Wong
2025-05-08 20:50 ` [PATCH v3 4/7] ext4: Add support for EXT4_GET_BLOCKS_QUERY_LEAF_BLOCKS Ritesh Harjani (IBM)
2025-05-14 16:16 ` Darrick J. Wong
2025-05-14 18:47 ` Ritesh Harjani [this message]
2025-05-08 20:50 ` [PATCH v3 5/7] ext4: Add multi-fsblock atomic write support with bigalloc Ritesh Harjani (IBM)
2025-05-14 16:19 ` Darrick J. Wong
2025-05-14 19:04 ` Ritesh Harjani
2025-05-08 20:50 ` [PATCH v3 6/7] ext4: Enable support for ext4 multi-fsblock atomic write using bigalloc Ritesh Harjani (IBM)
2025-05-14 16:21 ` Darrick J. Wong
2025-05-08 20:50 ` [PATCH v3 7/7] ext4: Add atomic block write documentation Ritesh Harjani (IBM)
2025-05-09 7:34 ` Ojaswin Mujoo
2025-05-14 16:38 ` Darrick J. Wong
2025-05-15 2:15 ` Ritesh Harjani
2025-05-15 2:18 ` Ritesh Harjani
2025-05-09 17:42 ` [PATCH v3 0/7] ext4: Add multi-fsblock atomic write support with bigalloc Ritesh Harjani
2025-05-14 16:40 ` Darrick J. Wong
2025-05-14 18:55 ` Ritesh Harjani
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87o6vvyqpe.fsf@gmail.com \
--to=ritesh.list@gmail.com \
--cc=djwong@kernel.org \
--cc=jack@suse.cz \
--cc=john.g.garry@oracle.com \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=ojaswin@linux.ibm.com \
--cc=tytso@mit.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox