From: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
To: Ojaswin Mujoo <ojaswin@linux.ibm.com>,
linux-ext4@vger.kernel.org, Theodore Ts'o <tytso@mit.edu>
Cc: Ritesh Harjani <riteshh@linux.ibm.com>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
Jan Kara <jack@suse.cz>, rookxu <brookxu.cn@gmail.com>
Subject: Re: [PATCH v4 6/9] ext4: Fix best extent lstart adjustment logic in ext4_mb_new_inode_pa()
Date: Fri, 17 Feb 2023 22:48:31 +0530 [thread overview]
Message-ID: <87y1ow6wig.fsf@doe.com> (raw)
In-Reply-To: <79b5240a6168171577b1bb9ef7a27c0c52676d37.1676634592.git.ojaswin@linux.ibm.com>
Ojaswin Mujoo <ojaswin@linux.ibm.com> writes:
> When the length of best extent found is less than the length of goal extent
> we need to make sure that the best extent atleast covers the start of the
> original request. This is done by adjusting the ac_b_ex.fe_logical (logical
> start) of the extent.
>
> While doing so, the current logic sometimes results in the best extent's logical
> range overflowing the goal extent. Since this best extent is later added to the
> inode preallocation list, we have a possibility of introducing overlapping
> preallocations. This is discussed in detail here [1].
So if the best extent range overflows the goal extent range, then it
causes the overlapping ranges to be added to the list is because at the
time of normalization of the request during allocation, we decide goal
start and end based on the existing PAs in the list. And if we end up
adding the best found extent range which overflows the goal range, then
that will add a PA whose logical start/end might overlap with an
existing PA range.
>
> To fix this, replace the existing logic with the below logic for adjusting best
> extent as it keeps fragmentation in check while ensuring logical range of best
> extent doesn't overflow out of goal extent:
>
> 1. Check if best extent can be kept at end of goal range and still cover
> original start.
> 2. Else, check if best extent can be kept at start of goal range and still cover
> original start.
> 3. Else, keep the best extent at start of original request.
>
> Also, add a few extra BUG_ONs that might help catch errors faster.
Thanks for the detailed explaination. I think it makes sense.
>
> [1] https://lore.kernel.org/r/Y+OGkVvzPN0RMv0O@li-bb2b2a4c-3307-11b2-a85c-8fa5c3a69313.ibm.com
Yes, the example in the above link was helpful.
>
> Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
> ---
> fs/ext4/mballoc.c | 49 ++++++++++++++++++++++++++++++-----------------
> 1 file changed, 31 insertions(+), 18 deletions(-)
Also this looks like, it was not a problem till now because while finding a
right PA with existing list implementation, we adjust our allocation
window based on all the elements in the inode PA list (full scan of
inode PAs)
But in case of rbtree, we try to do a log(n) search and if we have an
overlapping range, then we might end up missing a subtree and calculate
a wrong allocation request range.
So the only problem in the current code would be that we might
have some overlapping ranges in the inode PAs which means we have some
extra reservations done for the same logical file ranges. But other than
that I guess we don't have any other issue.
Is that also the reason why we are not adding any fixes tag and cc'ing
it to stable?
I have reviewed the patch along with my understanding above.
Looks good to me. So please feel free to add -
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
>
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index fdb9d0a8f35d..ba9d26e2f2aa 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -4330,6 +4330,7 @@ static void ext4_mb_use_inode_pa(struct ext4_allocation_context *ac,
> BUG_ON(start < pa->pa_pstart);
> BUG_ON(end > pa->pa_pstart + EXT4_C2B(sbi, pa->pa_len));
> BUG_ON(pa->pa_free < len);
> + BUG_ON(ac->ac_b_ex.fe_len <= 0);
> pa->pa_free -= len;
>
> mb_debug(ac->ac_sb, "use %llu/%d from inode pa %p\n", start, len, pa);
> @@ -4668,10 +4669,8 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac)
> pa = ac->ac_pa;
>
> if (ac->ac_b_ex.fe_len < ac->ac_g_ex.fe_len) {
> - int winl;
> - int wins;
> - int win;
> - int offs;
> + int new_bex_start;
> + int new_bex_end;
>
> /* we can't allocate as much as normalizer wants.
> * so, found space must get proper lstart
> @@ -4679,26 +4678,40 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac)
> BUG_ON(ac->ac_g_ex.fe_logical > ac->ac_o_ex.fe_logical);
> BUG_ON(ac->ac_g_ex.fe_len < ac->ac_o_ex.fe_len);
>
> - /* we're limited by original request in that
> - * logical block must be covered any way
> - * winl is window we can move our chunk within */
> - winl = ac->ac_o_ex.fe_logical - ac->ac_g_ex.fe_logical;
> + /*
> + * Use the below logic for adjusting best extent as it keeps
> + * fragmentation in check while ensuring logical range of best
> + * extent doesn't overflow out of goal extent:
> + *
> + * 1. Check if best ex can be kept at end of goal and still
> + * cover original start
> + * 2. Else, check if best ex can be kept at start of goal and
> + * still cover original start
> + * 3. Else, keep the best ex at start of original request.
> + */
> + new_bex_end = ac->ac_g_ex.fe_logical +
> + EXT4_C2B(sbi, ac->ac_g_ex.fe_len);
> + new_bex_start = new_bex_end - EXT4_C2B(sbi, ac->ac_b_ex.fe_len);
> + if (ac->ac_o_ex.fe_logical >= new_bex_start)
> + goto adjust_bex;
>
> - /* also, we should cover whole original request */
> - wins = EXT4_C2B(sbi, ac->ac_b_ex.fe_len - ac->ac_o_ex.fe_len);
> + new_bex_start = ac->ac_g_ex.fe_logical;
> + new_bex_end =
> + new_bex_start + EXT4_C2B(sbi, ac->ac_b_ex.fe_len);
> + if (ac->ac_o_ex.fe_logical < new_bex_end)
> + goto adjust_bex;
>
> - /* the smallest one defines real window */
> - win = min(winl, wins);
> + new_bex_start = ac->ac_o_ex.fe_logical;
> + new_bex_end =
> + new_bex_start + EXT4_C2B(sbi, ac->ac_b_ex.fe_len);
>
> - offs = ac->ac_o_ex.fe_logical %
> - EXT4_C2B(sbi, ac->ac_b_ex.fe_len);
> - if (offs && offs < win)
> - win = offs;
> +adjust_bex:
> + ac->ac_b_ex.fe_logical = new_bex_start;
>
> - ac->ac_b_ex.fe_logical = ac->ac_o_ex.fe_logical -
> - EXT4_NUM_B2C(sbi, win);
> BUG_ON(ac->ac_o_ex.fe_logical < ac->ac_b_ex.fe_logical);
> BUG_ON(ac->ac_o_ex.fe_len > ac->ac_b_ex.fe_len);
> + BUG_ON(new_bex_end > (ac->ac_g_ex.fe_logical +
> + EXT4_C2B(sbi, ac->ac_g_ex.fe_len)));
> }
>
> /* preallocation can change ac_b_ex, thus we store actually
> --
> 2.31.1
next prev parent reply other threads:[~2023-02-17 17:18 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-02-17 12:14 [PATCH v4 0/9] ext4: Convert inode preallocation list to an rbtree Ojaswin Mujoo
2023-02-17 12:14 ` [PATCH v4 1/9] ext4: Stop searching if PA doesn't satisfy non-extent file Ojaswin Mujoo
2023-02-17 12:14 ` [PATCH v4 2/9] ext4: Refactor code related to freeing PAs Ojaswin Mujoo
2023-02-17 12:14 ` [PATCH v4 3/9] ext4: Refactor code in ext4_mb_normalize_request() and ext4_mb_use_preallocated() Ojaswin Mujoo
2023-02-17 12:14 ` [PATCH v4 4/9] ext4: Move overlap assert logic into a separate function Ojaswin Mujoo
2023-02-17 12:14 ` [PATCH v4 5/9] ext4: Abstract out overlap fix/check logic in ext4_mb_normalize_request() Ojaswin Mujoo
2023-02-17 12:14 ` [PATCH v4 6/9] ext4: Fix best extent lstart adjustment logic in ext4_mb_new_inode_pa() Ojaswin Mujoo
2023-02-17 17:18 ` Ritesh Harjani [this message]
2023-02-27 12:01 ` Jan Kara
2023-02-17 12:14 ` [PATCH v4 7/9] ext4: Convert pa->pa_inode_list and pa->pa_obj_lock into a union Ojaswin Mujoo
2023-02-17 12:14 ` [PATCH v4 8/9] ext4: Use rbtrees to manage PAs instead of inode i_prealloc_list Ojaswin Mujoo
2023-02-18 1:33 ` Ritesh Harjani
2023-02-27 12:19 ` Jan Kara
2023-03-01 6:44 ` Ojaswin Mujoo
2023-02-17 12:14 ` [PATCH v4 9/9] ext4: Remove the logic to trim inode PAs Ojaswin Mujoo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87y1ow6wig.fsf@doe.com \
--to=ritesh.list@gmail.com \
--cc=brookxu.cn@gmail.com \
--cc=jack@suse.cz \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=ojaswin@linux.ibm.com \
--cc=riteshh@linux.ibm.com \
--cc=tytso@mit.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).