* [PATCH 0/2] btrfs-progs: free space tree fixes
@ 2022-05-20 1:31 Qu Wenruo
2022-05-20 1:31 ` [PATCH 1/2] btrfs-progs: properly initialize btrfs_block_group::bitmap_high_thresh Qu Wenruo
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: Qu Wenruo @ 2022-05-20 1:31 UTC (permalink / raw)
To: linux-btrfs
I was debugging a weird behavior where the btrfs kernel module chooses not to
allocate a new data extent in an empty data block group.
While checking the free space tree, it turned out that we always
use bitmaps in btrfs-progs no matter what.
This results in a very concerning free space tree after mkfs:
$ mkfs.btrfs -f -m raid1 -d raid0 /dev/test/scratch[1234]
btrfs-progs v5.17
[...]
Block group profiles:
Data: RAID0 4.00GiB
Metadata: RAID1 256.00MiB
System: RAID1 8.00MiB
[..]
$ btrfs ins dump-tree -t free-space /dev/test/scratch1
btrfs-progs v5.17
free space tree key (FREE_SPACE_TREE ROOT_ITEM 0)
node 30441472 level 1 items 10 free space 483 generation 6 owner FREE_SPACE_TREE
node 30441472 flags 0x1(WRITTEN) backref revision 1
fs uuid deddccae-afd0-4160-9a12-48fe7b526fb1
chunk uuid 68f6cf98-afe3-4f47-9797-37fd9c610219
key (1048576 FREE_SPACE_INFO 4194304) block 30457856 gen 6
key (475004928 FREE_SPACE_BITMAP 8388608) block 30703616 gen 5
key (953155584 FREE_SPACE_BITMAP 8388608) block 30720000 gen 5
key (1431306240 FREE_SPACE_BITMAP 8388608) block 30736384 gen 5
key (1909456896 FREE_SPACE_BITMAP 8388608) block 30752768 gen 5
key (2387607552 FREE_SPACE_BITMAP 8388608) block 30769152 gen 5
key (2865758208 FREE_SPACE_BITMAP 8388608) block 30785536 gen 5
key (3343908864 FREE_SPACE_BITMAP 8388608) block 30801920 gen 5
key (3822059520 FREE_SPACE_BITMAP 8388608) block 30818304 gen 5
key (4300210176 FREE_SPACE_BITMAP 8388608) block 30834688 gen 5
[...]
^^^ So many bitmaps that an empty fs will have two levels for free
space tree already
Thankfully, the kernel can properly merge those bitmaps into a large extent
at mount, so it won't stay that scary forever.
It turns out that we never set btrfs_block_group::bitmap_high_thresh,
thus we always convert free space extents to bitmaps and waste space
unnecessarily.
Fix it by cross-porting the needed function
set_free_space_tree_thresholds() from the kernel and calling it at the
correct time.
And finally add a test case for it.
Unfortunately, even with this fixed, the kernel still shows its weird
behavior, as it's the cached un-clustered allocation code causing the
problem...
Qu Wenruo (2):
btrfs-progs: properly initialize btrfs_block_group::bitmap_high_thresh
btrfs-progs: mkfs-tests: add test case to make sure we don't create
bitmaps for empty fs
kernel-shared/extent-tree.c | 2 ++
kernel-shared/free-space-tree.c | 29 ++++++++++++++++++++
kernel-shared/free-space-tree.h | 2 ++
tests/mkfs-tests/024-fst-bitmaps/test.sh | 35 ++++++++++++++++++++++++
4 files changed, 68 insertions(+)
create mode 100755 tests/mkfs-tests/024-fst-bitmaps/test.sh
--
2.36.1
* [PATCH 1/2] btrfs-progs: properly initialize btrfs_block_group::bitmap_high_thresh
From: Qu Wenruo @ 2022-05-20 1:31 UTC (permalink / raw)
To: linux-btrfs
[BUG]
When creating a btrfs filesystem with the new v2 cache (the default behavior),
mkfs.btrfs always creates the free space tree using bitmaps.
That's fine for a small fs, but it will be a disaster if the device is large
and the data profile is something like RAID0:
$ mkfs.btrfs -f -m raid1 -d raid0 /dev/test/scratch[1234]
btrfs-progs v5.17
[...]
Block group profiles:
Data: RAID0 4.00GiB
Metadata: RAID1 256.00MiB
System: RAID1 8.00MiB
[..]
$ btrfs ins dump-tree -t free-space /dev/test/scratch1
btrfs-progs v5.17
free space tree key (FREE_SPACE_TREE ROOT_ITEM 0)
node 30441472 level 1 items 10 free space 483 generation 6 owner FREE_SPACE_TREE
node 30441472 flags 0x1(WRITTEN) backref revision 1
fs uuid deddccae-afd0-4160-9a12-48fe7b526fb1
chunk uuid 68f6cf98-afe3-4f47-9797-37fd9c610219
key (1048576 FREE_SPACE_INFO 4194304) block 30457856 gen 6
key (475004928 FREE_SPACE_BITMAP 8388608) block 30703616 gen 5
key (953155584 FREE_SPACE_BITMAP 8388608) block 30720000 gen 5
key (1431306240 FREE_SPACE_BITMAP 8388608) block 30736384 gen 5
key (1909456896 FREE_SPACE_BITMAP 8388608) block 30752768 gen 5
key (2387607552 FREE_SPACE_BITMAP 8388608) block 30769152 gen 5
key (2865758208 FREE_SPACE_BITMAP 8388608) block 30785536 gen 5
key (3343908864 FREE_SPACE_BITMAP 8388608) block 30801920 gen 5
key (3822059520 FREE_SPACE_BITMAP 8388608) block 30818304 gen 5
key (4300210176 FREE_SPACE_BITMAP 8388608) block 30834688 gen 5
[...]
^^^ So many bitmaps that an empty fs will have two levels for free
space tree already
[CAUSE]
Member btrfs_block_group::bitmap_high_thresh is never set to
any value other than 0, thus in function
update_free_space_extent_count(), the following check is true as soon as
a block group in extent format has at least one free space extent:
if (!(flags & BTRFS_FREE_SPACE_USING_BITMAPS) &&
extent_count > block_group->bitmap_high_thresh) {
ret = convert_free_space_to_bitmaps(trans, block_group, path);
Thus we always get converted to bitmaps.
[FIX]
Cross-port the function set_free_space_tree_thresholds() from the kernel, and
call that function in btrfs_make_block_group() and
read_one_block_group() so that every block group has bitmap_high_thresh
properly set.
Now even for that 4GiB data chunk, we only have one free space extent:
btrfs-progs v5.17
free space tree key (FREE_SPACE_TREE ROOT_ITEM 0)
leaf 30572544 items 15 free space 15860 generation 6 owner FREE_SPACE_TREE
leaf 30572544 flags 0x1(WRITTEN) backref revision 1
fs uuid b24e52ea-6580-4a88-aa70-cb173090bfe3
chunk uuid d85f3905-fc61-4084-b335-2b6b97814b8e
[...]
item 13 key (298844160 FREE_SPACE_INFO 4294967296) itemoff 16235 itemsize 8
free space info extent count 1 flags 0
item 14 key (298844160 FREE_SPACE_EXTENT 4294967296) itemoff 16235 itemsize 0
free space extent
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
kernel-shared/extent-tree.c | 2 ++
kernel-shared/free-space-tree.c | 29 +++++++++++++++++++++++++++++
kernel-shared/free-space-tree.h | 2 ++
3 files changed, 33 insertions(+)
diff --git a/kernel-shared/extent-tree.c b/kernel-shared/extent-tree.c
index 697a8a1e4dec..5807b11a7b1a 100644
--- a/kernel-shared/extent-tree.c
+++ b/kernel-shared/extent-tree.c
@@ -2697,6 +2697,7 @@ static int read_one_block_group(struct btrfs_fs_info *fs_info,
free(cache);
return ret;
}
+ set_free_space_tree_thresholds(fs_info, cache);
INIT_LIST_HEAD(&cache->dirty_list);
set_avail_alloc_bits(fs_info, cache->flags);
@@ -2845,6 +2846,7 @@ int btrfs_make_block_group(struct btrfs_trans_handle *trans,
cache = btrfs_add_block_group(fs_info, bytes_used, type, chunk_offset,
size);
+ set_free_space_tree_thresholds(fs_info, cache);
ret = insert_block_group_item(trans, cache);
if (ret)
return ret;
diff --git a/kernel-shared/free-space-tree.c b/kernel-shared/free-space-tree.c
index 03eb0ed290fc..e8034057fc56 100644
--- a/kernel-shared/free-space-tree.c
+++ b/kernel-shared/free-space-tree.c
@@ -40,6 +40,35 @@ static struct btrfs_root *btrfs_free_space_root(struct btrfs_fs_info *fs_info,
return btrfs_global_root(fs_info, &key);
}
+void set_free_space_tree_thresholds(struct btrfs_fs_info *fs_info,
+ struct btrfs_block_group *cache)
+{
+ u32 bitmap_range;
+ size_t bitmap_size;
+ u64 num_bitmaps, total_bitmap_size;
+
+ /*
+ * We convert to bitmaps when the disk space required for using extents
+ * exceeds that required for using bitmaps.
+ */
+ bitmap_range = fs_info->sectorsize * BTRFS_FREE_SPACE_BITMAP_BITS;
+ num_bitmaps = div_u64(cache->length + bitmap_range - 1,
+ bitmap_range);
+ bitmap_size = sizeof(struct btrfs_item) + BTRFS_FREE_SPACE_BITMAP_SIZE;
+ total_bitmap_size = num_bitmaps * bitmap_size;
+ cache->bitmap_high_thresh = div_u64(total_bitmap_size,
+ sizeof(struct btrfs_item));
+
+ /*
+ * We allow for a small buffer between the high threshold and low
+ * threshold to avoid thrashing back and forth between the two formats.
+ */
+ if (cache->bitmap_high_thresh > 100)
+ cache->bitmap_low_thresh = cache->bitmap_high_thresh - 100;
+ else
+ cache->bitmap_low_thresh = 0;
+}
+
static struct btrfs_free_space_info *
search_free_space_info(struct btrfs_trans_handle *trans,
struct btrfs_fs_info *fs_info,
diff --git a/kernel-shared/free-space-tree.h b/kernel-shared/free-space-tree.h
index 4f6aa5fc9eaf..e3ae38e10d9f 100644
--- a/kernel-shared/free-space-tree.h
+++ b/kernel-shared/free-space-tree.h
@@ -22,6 +22,8 @@
#define BTRFS_FREE_SPACE_BITMAP_SIZE 256
#define BTRFS_FREE_SPACE_BITMAP_BITS (BTRFS_FREE_SPACE_BITMAP_SIZE * BITS_PER_BYTE)
+void set_free_space_tree_thresholds(struct btrfs_fs_info *fs_info,
+ struct btrfs_block_group *cache);
int btrfs_clear_free_space_tree(struct btrfs_fs_info *fs_info);
int load_free_space_tree(struct btrfs_fs_info *fs_info,
struct btrfs_block_group *block_group);
--
2.36.1
* [PATCH 2/2] btrfs-progs: mkfs-tests: add test case to make sure we don't create bitmaps for empty fs
From: Qu Wenruo @ 2022-05-20 1:31 UTC (permalink / raw)
To: linux-btrfs
The new test case makes sure that on a relatively large empty fs, we won't
create bitmaps that unnecessarily bump up the size of the free space tree.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
tests/mkfs-tests/024-fst-bitmaps/test.sh | 35 ++++++++++++++++++++++++
1 file changed, 35 insertions(+)
create mode 100755 tests/mkfs-tests/024-fst-bitmaps/test.sh
diff --git a/tests/mkfs-tests/024-fst-bitmaps/test.sh b/tests/mkfs-tests/024-fst-bitmaps/test.sh
new file mode 100755
index 000000000000..8d88c08cb25a
--- /dev/null
+++ b/tests/mkfs-tests/024-fst-bitmaps/test.sh
@@ -0,0 +1,35 @@
+#!/bin/bash
+# Make sure mkfs does not create free space bitmaps for an empty fs
+
+source "$TEST_TOP/common"
+
+check_prereq mkfs.btrfs
+check_prereq btrfs
+
+setup_root_helper
+
+setup_loopdevs 4
+prepare_loopdevs
+dev1=${loopdevs[1]}
+tmp=$(_mktemp fst-bitmap)
+
+test_do_mkfs()
+{
+ run_check $SUDO_HELPER "$TOP/mkfs.btrfs" -f "$@"
+ if run_check_stdout "$TOP/btrfs" check "$dev1" | grep -iq warning; then
+ _fail "warnings found in check output"
+ fi
+}
+
+test_do_mkfs -m raid1 -d raid0 ${loopdevs[@]}
+
+run_check_stdout $SUDO_HELPER "$TOP/btrfs" inspect-internal dump-tree \
+ -t free_space "$dev1" > "$tmp.dump-tree"
+cleanup_loopdevs
+
+if grep -q FREE_SPACE_BITMAP "$tmp.dump-tree"; then
+ rm -f -- "$tmp"*
+ _fail "free space bitmap should not be created for empty fs"
+fi
+rm -f -- "$tmp"*
+
--
2.36.1
* Re: [PATCH 0/2] btrfs-progs: free space tree fixes
From: David Sterba @ 2022-05-20 14:34 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs
On Fri, May 20, 2022 at 09:31:49AM +0800, Qu Wenruo wrote:
> I was debugging a weird behavior that btrfs kernel chooses not to
> allocate a new data extent at an empty data block group.
>
> And when checking the free space tree, it turned out that, we always
> use bitmaps in btrfs-progs no matter what.
>
> This results in a very concerning free space tree after mkfs:
>
> $ mkfs.btrfs -f -m raid1 -d raid0 /dev/test/scratch[1234]
> btrfs-progs v5.17
> [...]
> Block group profiles:
> Data: RAID0 4.00GiB
> Metadata: RAID1 256.00MiB
> System: RAID1 8.00MiB
> [..]
>
> $ btrfs ins dump-tree -t free-space /dev/test/scratch1
> btrfs-progs v5.17
> free space tree key (FREE_SPACE_TREE ROOT_ITEM 0)
> node 30441472 level 1 items 10 free space 483 generation 6 owner FREE_SPACE_TREE
> node 30441472 flags 0x1(WRITTEN) backref revision 1
> fs uuid deddccae-afd0-4160-9a12-48fe7b526fb1
> chunk uuid 68f6cf98-afe3-4f47-9797-37fd9c610219
> key (1048576 FREE_SPACE_INFO 4194304) block 30457856 gen 6
> key (475004928 FREE_SPACE_BITMAP 8388608) block 30703616 gen 5
> key (953155584 FREE_SPACE_BITMAP 8388608) block 30720000 gen 5
> key (1431306240 FREE_SPACE_BITMAP 8388608) block 30736384 gen 5
> key (1909456896 FREE_SPACE_BITMAP 8388608) block 30752768 gen 5
> key (2387607552 FREE_SPACE_BITMAP 8388608) block 30769152 gen 5
> key (2865758208 FREE_SPACE_BITMAP 8388608) block 30785536 gen 5
> key (3343908864 FREE_SPACE_BITMAP 8388608) block 30801920 gen 5
> key (3822059520 FREE_SPACE_BITMAP 8388608) block 30818304 gen 5
> key (4300210176 FREE_SPACE_BITMAP 8388608) block 30834688 gen 5
> [...]
> ^^^ So many bitmaps that an empty fs will have two levels for free
> space tree already
>
> Thankfully, kernel can properly merge those bitmaps into a large extent
> at mount, so it won't be that scary forever.
>
> It turns out that, we never set btrfs_block_group::bitmap_high_thresh,
> thus we always convert free space extents to bitmaps, and waste space
> unnecessarily.
>
> Fix it by cross-porting the needed function
> set_free_space_tree_thresholds() from the kernel and calling it at the
> correct time.
>
> And finally add a test case for it.
>
> Unfortunately, even with this fixed, kernel is still doing its weird
> behavior, as it's the cached un-clustered allocation code causing the
> problem...
>
> Qu Wenruo (2):
> btrfs-progs: properly initialize btrfs_block_group::bitmap_high_thresh
> btrfs-progs: mkfs-tests: add test case to make sure we don't create
> bitmaps for empty fs
Good catch, thanks. free-space-tree.c is highly similar to the
kernel sources, so there are possibly more changes missing in the progs
implementation. Getting this file in sync would be desirable; function
by function or small updates are fine, if anybody is interested.