All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] btrfs: fix node balancing condition in balance_level()
@ 2025-08-05  3:57 Sun YangKai
  2025-08-05 18:46 ` Boris Burkov
  0 siblings, 1 reply; 4+ messages in thread
From: Sun YangKai @ 2025-08-05  3:57 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Sun YangKai

Commit cfbb9308463f ("Btrfs: balance btree more often") intended to
trigger node balancing when node utilization drops below 50% (capacity/2)
by modifying the condition in setup_nodes_for_search(). However, an
undetected early return condition in balance_level() prevented this
behavior from taking effect.

The early return condition:
    if (btrfs_header_nritems(mid) > BTRFS_NODEPTRS_PER_BLOCK(fs_info) / 4)
        return 0;

caused balance_level() to abort when nodes were still more than 25% full,
effectively maintaining the original 25% threshold. This unintended
behavior persisted for over a decade. Since setup_nodes_for_search() is
the sole caller of balance_level(), remove the obsolete early return
condition to:

1. Align with the original intent of commit cfbb9308463f ("Btrfs: balance btree more often")
2. Allow proper node balancing at the 50% utilization threshold
3. Improve btree performance by more frequent node compaction

Fixes: cfbb9308463f ("Btrfs: balance btree more often")
Signed-off-by: Sun YangKai <sunk67188@gmail.com>
---
 fs/btrfs/ctree.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index acd85e317564..8cc52c8b38f3 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -939,9 +939,6 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
 		}
 		return 0;
 	}
-	if (btrfs_header_nritems(mid) >
-	    BTRFS_NODEPTRS_PER_BLOCK(fs_info) / 4)
-		return 0;
 
 	if (pslot) {
 		left = btrfs_read_node_slot(parent, pslot - 1);
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] btrfs: fix node balancing condition in balance_level()
  2025-08-05  3:57 [PATCH] btrfs: fix node balancing condition in balance_level() Sun YangKai
@ 2025-08-05 18:46 ` Boris Burkov
  2025-08-06  6:07   ` Sun YangKai
  2025-08-07 15:06   ` Sun YangKai
  0 siblings, 2 replies; 4+ messages in thread
From: Boris Burkov @ 2025-08-05 18:46 UTC (permalink / raw)
  To: Sun YangKai; +Cc: linux-btrfs

On Tue, Aug 05, 2025 at 11:57:04AM +0800, Sun YangKai wrote:
> Commit cfbb9308463f ("Btrfs: balance btree more often") intended to
> trigger node balancing when node utilization drops below 50% (capacity/2)
> by modifying the condition in setup_nodes_for_search(). However, an
> undetected early return condition in balance_level() prevented this
> behavior from taking effect.
> 
> The early return condition:
>     if (btrfs_header_nritems(mid) > BTRFS_NODEPTRS_PER_BLOCK(fs_info) / 4)
>         return 0;
> 
> caused balance_level() to abort when nodes were still more than 25% full,
> effectively maintaining the original 25% threshold. This unintended
> behavior persisted for over a decade. Since setup_nodes_for_search() is
> the sole caller of balance_level(), remove the obsolete early return
> condition to:
> 
> 1. Align with the original intent of commit cfbb9308463f ("Btrfs: balance btree more often")
> 2. Allow proper node balancing at the 50% utilization threshold
> 3. Improve btree performance by more frequent node compaction

This is interesting, good catch.

However, we don't really have any evidence one way or the other which is
better. For better or worse, this logic has been around since ~2007, so
to change it I think requires more justification than the reasoning on a
faulty patch from 2009. Do you have any evidence for your #3:
  > Improve btree performance by more frequent node compaction

I would advise that you run:
- fsperf for generic benchmarking.
- a targeted workload that would get compacted more now that you removed
  the surprising check.

Thanks,
Boris

> 
> Fixes: cfbb9308463f ("Btrfs: balance btree more often")
> Signed-off-by: Sun YangKai <sunk67188@gmail.com>
> ---
>  fs/btrfs/ctree.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index acd85e317564..8cc52c8b38f3 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -939,9 +939,6 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
>  		}
>  		return 0;
>  	}
> -	if (btrfs_header_nritems(mid) >
> -	    BTRFS_NODEPTRS_PER_BLOCK(fs_info) / 4)
> -		return 0;
>  
>  	if (pslot) {
>  		left = btrfs_read_node_slot(parent, pslot - 1);
> -- 
> 2.50.1
> 

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] btrfs: fix node balancing condition in balance_level()
  2025-08-05 18:46 ` Boris Burkov
@ 2025-08-06  6:07   ` Sun YangKai
  2025-08-07 15:06   ` Sun YangKai
  1 sibling, 0 replies; 4+ messages in thread
From: Sun YangKai @ 2025-08-06  6:07 UTC (permalink / raw)
  To: Boris Burkov; +Cc: linux-btrfs

> This is interesting, good catch.
> 
> However, we don't really have any evidence one way or the other which is
> better. For better or worse, this logic has been around since ~2007, so
> to change it I think requires more justification than the reasoning on a
> 
> faulty patch from 2009. Do you have any evidence for your #3:
>   > Improve btree performance by more frequent node compaction
> 
> I would advise that you run:
> - fsperf for generic benchmarking.
> - a targeted workload that would get compacted more now that you removed
>   the surprising check.
> 
> Thanks,
> Boris

Hi Boris.

The performance might get better because we will have less nodes and even 
lower trees, if we compact nodes when they are 50% full instead of 25% full. 
But I've not benched yet and there's some problem in it.

I come up with a drawback. This patch will make it more frequently to 
split a nearly full node into two and merge them back later again and again if 
item count changes slightly and its neighbors cannot help balance some items, 
which happens when:

1. left node is full or null and we are always operating the left-half part of 
the mid node
2. right node is null

The leaf merge related code choose (cap/3) instead of (cap/2) to prevent 
similar cases. I have no idea how often this could be triggered in real world 
workload. But if it really hurts, we should try some other method to merge/
split the nodes to get better performance.

Anyway, thank you for your advice. And I'm working on fsperf.

Thanks,
Sun YangKai






^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] btrfs: fix node balancing condition in balance_level()
  2025-08-05 18:46 ` Boris Burkov
  2025-08-06  6:07   ` Sun YangKai
@ 2025-08-07 15:06   ` Sun YangKai
  1 sibling, 0 replies; 4+ messages in thread
From: Sun YangKai @ 2025-08-07 15:06 UTC (permalink / raw)
  To: Boris Burkov; +Cc: linux-btrfs

Hi, Boris.

After working on node balance related code and fs_perf for days, I found that 
current node balancing is not good enough with or without this patch. And I 
come up with a different way to balance the nodes without adding much 
complexity, which will takes me some days to implement and test it. I believe 
it will be much better than the current one if I didn't mess things up.

BTW, I'm not familiar with fs benchmark, so I need some advice about how can I 
get a reasonable benchmark result that can reflect the metadata operation 
performance since I only have a 128G SSD can be used to benchmark, and it's 
speed seems not stable enough. Currently I use kprobe to trace node balance 
related functions to measure if the balance code works well.

Thanks,
Sun YangKai



^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2025-08-07 15:06 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-05  3:57 [PATCH] btrfs: fix node balancing condition in balance_level() Sun YangKai
2025-08-05 18:46 ` Boris Burkov
2025-08-06  6:07   ` Sun YangKai
2025-08-07 15:06   ` Sun YangKai

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.