Linux EXT4 FS development
From: Jan Kara <jack@suse.cz>
To: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: tytso@mit.edu, adilger.kernel@dilger.ca,
	linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org,
	jack@suse.cz, ojaswin@linux.ibm.com, ritesh.list@gmail.com
Subject: Re: [PATCH 3/5] ext4: call ext4_mb_mark_free_simple in mb_mark_used to clear bits
Date: Thu, 4 Apr 2024 16:16:19 +0200
Message-ID: <20240404141619.xrgtjhtpcae3kqk6@quack3>
In-Reply-To: <20240326213823.528302-4-shikemeng@huaweicloud.com>

On Wed 27-03-24 05:38:21, Kemeng Shi wrote:
> ext4_mb_mark_free_simple() finds the order of each bit to clear in O(1),
> while mb_mark_used() searches for the order in O(distance from chunk
> order to target order) and introduces unnecessary bit flips.

Let me see if I understand you right. I agree that mb_mark_used() is
actually O(log^2(bitmap_size)): the loop can run O(log(bitmap_size)) times
and each call to mb_find_order_for_block() is itself O(log(bitmap_size)).
Do I understand your concern right?
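
To make the per-call cost concrete, here is a minimal userspace sketch of an
upward order search over a toy buddy structure. It is only loosely modelled on
mb_find_order_for_block(); struct toy_buddy, toy_init_all_free(),
toy_find_order() and the probe counter are hypothetical names introduced for
illustration, and the "0 means free chunk" convention is taken from the
diagrams in the patch description, not from the kernel sources.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_ORDER 4				/* toy group of 16 blocks */
#define TOY_NBLOCKS   (1 << TOY_MAX_ORDER)

/*
 * Assumption: bits[o][i] == 0 means "a free chunk of 2^o blocks sits at
 * index i of order o", as in the diagrams of the patch description.
 */
struct toy_buddy {
	unsigned char bits[TOY_MAX_ORDER + 1][TOY_NBLOCKS];
};

/* Start with the whole group being one free chunk of the top order. */
static void toy_init_all_free(struct toy_buddy *b)
{
	memset(b->bits, 1, sizeof(b->bits));
	b->bits[TOY_MAX_ORDER][0] = 0;
}

/*
 * Scan upward from order 1 until a free chunk covering 'block' is found.
 * Returns that order, or 0 if no order >= 1 covers the block.
 * '*probes' counts how many orders were inspected.
 */
static int toy_find_order(const struct toy_buddy *b, int block, int *probes)
{
	int order;

	assert(block >= 0 && block < TOY_NBLOCKS);
	for (order = 1; order <= TOY_MAX_ORDER; order++) {
		(*probes)++;
		if (!b->bits[order][block >> order])
			return order;
	}
	return 0;
}
```

In this model a search that ends at the top order inspects every order once,
so a caller that repeats the search for each chunk it carves off pays up to
(number of orders) probes per iteration.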

> Consider 4 contiguous free bits, of which we are going to mark bits 0-2 in use.
> Initial state of the buddy bitmap:
> order 2 |           0           |
> order 1 |     1     |     1     |
> order 0 |  1  |  1  |  1  |  1  |
>
> mark whole chunk inuse
> order 2 |           1           |
> order 1 |     1     |     1     |
> order 0 |  1  |  1  |  1  |  1  |
> 
> split chunk to order 1
> order 2 |           1           |
> order 1 |     0     |     0     |
> order 0 |  1  |  1  |  1  |  1  |
> 
> set the first bit in order 1 to mark bit 0-1 inuse
> set the second bit in order 1 for split
> order 2 |           1           |
> order 1 |     1     |     1     |
> order 0 |  1  |  1  |  1  |  1  |
> 
> step 3: split the second bit in order 1 to order 0
> order 2 |           1           |
> order 1 |     1     |     1     |
> order 0 |  1  |  1  |  0  |  0  |
> 
> step 4: set the third bit in order 0 to mark bit 2 inuse.
> order 2 |           1           |
> order 1 |     1     |     1     |
> order 0 |  1  |  1  |  1  |  0  |
> There are two unnecessary splits and three unnecessary bit flips.
> 
> With ext4_mb_mark_free_simple, we will clear the 4th bit in order 0
> with O(1) search and no extra bit flip.
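
As a rough illustration of the "O(1) search" idea, the sketch below splits a
block range into aligned power-of-two chunks, picking each chunk's order from
the alignment of its start and the remaining length. It is a userspace
approximation of the approach, not the kernel function itself:
toy_split_range() and its parameters are hypothetical, and it relies on the
GCC/Clang builtins __builtin_ctz() and __builtin_clz().

```c
#include <assert.h>

/*
 * Split [first, first + len) into aligned power-of-two chunks.
 * 'group_size' (a power of two) caps the chunk size. The order of each
 * chunk is stored in orders[]; the number of chunks is returned.
 * Assumption: GCC/Clang builtins are available.
 */
static int toy_split_range(unsigned int first, unsigned int len,
			   unsigned int group_size, int *orders,
			   int max_chunks)
{
	const int top_bit = (int)(sizeof(unsigned int) * 8) - 1;
	int n = 0;

	assert(group_size != 0 && (group_size & (group_size - 1)) == 0);
	assert(first + len <= group_size);

	while (len > 0) {
		/* largest order at which 'first' is still aligned */
		int align = __builtin_ctz(first | group_size);
		/* largest order whose chunk still fits into 'len' */
		int fit = top_bit - __builtin_clz(len);
		int order = align < fit ? align : fit;

		assert(n < max_chunks);
		orders[n++] = order;
		first += 1u << order;
		len -= 1u << order;
	}
	return n;
}
```

For the example above, the range left free after marking bits 0-2 in use is
first = 3, len = 1 in a group of 4, which yields a single order-0 chunk.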

However, this looks like a somewhat ugly way to speed it up. I'm not even sure
it would result in practical speedups, and asymptotically I think the
complexity is still O(log^2). Also, I'd say the extra bit flips are not really
a concern as they are in the same cacheline anyway. The unnecessary
overhead (if at all measurable) comes from the O(log^2) behavior. And there
I agree we could do better by not starting the block order search from 1 in
all the cases: we know the found order will first be increasing for some
time and then decreasing again, so with some effort we could amortize all
block order searches to O(log) time. But that makes the code more complex and
I'm not convinced it is all worth it. So if you want to go in this direction,
please provide (micro-)benchmarks from real hardware (not just
theoretical cost estimations) showing the benefit. Thanks.
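
The "first increasing, then decreasing" observation can be checked on a
simplified model. The sketch below walks a range chunk by chunk, always taking
the largest aligned power-of-two chunk that fits, and reports whether the
chunk orders ever rise again after they have started to fall. It models an
idealized aligned decomposition rather than mb_mark_used() itself, and it does
not implement the amortized search; toy_align_order(), toy_fit_order() and
toy_walk_orders() are hypothetical helpers.

```c
#include <assert.h>

/* Largest order o <= max_order such that pos is a multiple of 2^o. */
static int toy_align_order(unsigned int pos, int max_order)
{
	int o = 0;

	while (o < max_order && (pos & (1u << o)) == 0)
		o++;
	return o;
}

/* Largest order o such that 2^o <= len (len must be > 0). */
static int toy_fit_order(unsigned int len)
{
	int o = 0;

	while ((len >> (o + 1)) != 0)
		o++;
	return o;
}

/*
 * Walk [start, start + len) inside a group of 2^max_order blocks.
 * Returns the number of chunks visited; '*unimodal' is set to 1 if the
 * chunk orders never rise again once they have fallen, 0 otherwise.
 */
static int toy_walk_orders(unsigned int start, unsigned int len,
			   int max_order, int *unimodal)
{
	int chunks = 0, prev = -1, falling = 0;

	assert(start + len <= (1u << max_order));
	*unimodal = 1;
	while (len > 0) {
		int a = toy_align_order(start, max_order);
		int f = toy_fit_order(len);
		int order = a < f ? a : f;

		if (prev >= 0 && order < prev)
			falling = 1;
		else if (prev >= 0 && order > prev && falling)
			*unimodal = 0;
		prev = order;
		start += 1u << order;
		len -= 1u << order;
		chunks++;
	}
	return chunks;
}
```

In this model a search that resumes from the previously found order, instead
of restarting at order 1, would move up and then down through the orders at
most once over the whole walk.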

								Honza

> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index a61fc52956b2..62d468379722 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -2040,13 +2040,12 @@ static int mb_mark_used(struct ext4_buddy *e4b, struct ext4_free_extent *ex)
>  	int ord;
>  	int mlen = 0;
>  	int max = 0;
> -	int cur;
>  	int start = ex->fe_start;
>  	int len = ex->fe_len;
>  	unsigned ret = 0;
>  	int len0 = len;
>  	void *buddy;
> -	bool split = false;
> +	int ord_start, ord_end;
>  
>  	BUG_ON(start + len > (e4b->bd_sb->s_blocksize << 3));
>  	BUG_ON(e4b->bd_group != ex->fe_group);
> @@ -2071,16 +2070,12 @@ static int mb_mark_used(struct ext4_buddy *e4b, struct ext4_free_extent *ex)
>  
>  	/* let's maintain buddy itself */
>  	while (len) {
> -		if (!split)
> -			ord = mb_find_order_for_block(e4b, start);
> +		ord = mb_find_order_for_block(e4b, start);
>  
>  		if (((start >> ord) << ord) == start && len >= (1 << ord)) {
>  			/* the whole chunk may be allocated at once! */
>  			mlen = 1 << ord;
> -			if (!split)
> -				buddy = mb_find_buddy(e4b, ord, &max);
> -			else
> -				split = false;
> +			buddy = mb_find_buddy(e4b, ord, &max);
>  			BUG_ON((start >> ord) >= max);
>  			mb_set_bit(start >> ord, buddy);
>  			e4b->bd_info->bb_counters[ord]--;
> @@ -2094,20 +2089,28 @@ static int mb_mark_used(struct ext4_buddy *e4b, struct ext4_free_extent *ex)
>  		if (ret == 0)
>  			ret = len | (ord << 16);
>  
> -		/* we have to split large buddy */
>  		BUG_ON(ord <= 0);
>  		buddy = mb_find_buddy(e4b, ord, &max);
>  		mb_set_bit(start >> ord, buddy);
>  		e4b->bd_info->bb_counters[ord]--;
>  
> -		ord--;
> -		cur = (start >> ord) & ~1U;
> -		buddy = mb_find_buddy(e4b, ord, &max);
> -		mb_clear_bit(cur, buddy);
> -		mb_clear_bit(cur + 1, buddy);
> -		e4b->bd_info->bb_counters[ord]++;
> -		e4b->bd_info->bb_counters[ord]++;
> -		split = true;
> +		ord_start = (start >> ord) << ord;
> +		ord_end = ord_start + (1 << ord);
> +		if (start > ord_start)
> +			ext4_mb_mark_free_simple(e4b->bd_sb, e4b->bd_buddy,
> +						 ord_start, start - ord_start,
> +						 e4b->bd_info);
> +
> +		if (start + len < ord_end) {
> +			ext4_mb_mark_free_simple(e4b->bd_sb, e4b->bd_buddy,
> +						 start + len,
> +						 ord_end - (start + len),
> +						 e4b->bd_info);
> +			break;
> +		}
> +
> +		len = start + len - ord_end;
> +		start = ord_end;
>  	}
>  	mb_set_largest_free_order(e4b->bd_sb, e4b->bd_info);
>  
> -- 
> 2.30.0
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


Thread overview: 16+ messages
2024-03-26 21:38 [PATCH 0/5] Minor improvements and cleanups to ext4 mballoc Kemeng Shi
2024-03-26 21:38 ` [PATCH 1/5] ext4: keep "prefetch_grp" and "nr" consistent Kemeng Shi
2024-03-29  7:52   ` Ojaswin Mujoo
2024-04-04 13:22   ` Jan Kara
2024-03-26 21:38 ` [PATCH 2/5] ext4: add test_mb_mark_used_cost to estimate cost of mb_mark_used Kemeng Shi
2024-03-29  7:26   ` kernel test robot
2024-03-26 21:38 ` [PATCH 3/5] ext4: call ext4_mb_mark_free_simple in mb_mark_used to clear bits Kemeng Shi
2024-04-04 14:16   ` Jan Kara [this message]
2024-04-07  6:31     ` Kemeng Shi
2024-03-26 21:38 ` [PATCH 4/5] ext4: use correct criteria name instead stale integer number in comment Kemeng Shi
2024-03-29  7:15   ` Ojaswin Mujoo
2024-04-04 14:19   ` Jan Kara
2024-04-07  3:21     ` Kemeng Shi
2024-03-26 21:38 ` [PATCH 5/5] ext4: expand next_linear_group to remove repeat check for linear scan Kemeng Shi
2024-03-29  7:14   ` Ojaswin Mujoo
2024-04-03  6:57     ` Kemeng Shi
