* [PATCH v3 0/2] btrfs: fix size class loading logic
@ 2023-02-15 20:59 Boris Burkov
2023-02-15 20:59 ` [PATCH v3 1/2] btrfs: add size class stats to sysfs Boris Burkov
2023-02-15 20:59 ` [PATCH v3 2/2] btrfs: fix size class loading logic Boris Burkov
0 siblings, 2 replies; 5+ messages in thread
From: Boris Burkov @ 2023-02-15 20:59 UTC (permalink / raw)
To: linux-btrfs, kernel-team
Unfortunately, this code needs another fixup: Filipe discovered that the
previous fixup's use of btrfs_search_forward() could deadlock against a
thread holding the tree root lock while blocked on caching.
---
Changelog:
V3: move to btrfs_for_each_slot, drop the contention checking logic. The
sysfs patch holds groups_sem, but releases it between RAID loops if it
is contended, matching the behavior of the raid_bytes file.
V2: just organizational changes to how the original fixup was sent
Boris Burkov (2):
btrfs: add size class stats to sysfs
btrfs: fix size class loading logic
fs/btrfs/block-group.c | 42 +++++++++++++++-------------------------
fs/btrfs/sysfs.c | 44 ++++++++++++++++++++++++++++++++++++++++++
2 files changed, 60 insertions(+), 26 deletions(-)
--
2.38.1
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH v3 1/2] btrfs: add size class stats to sysfs
2023-02-15 20:59 [PATCH v3 0/2] btrfs: fix size class loading logic Boris Burkov
@ 2023-02-15 20:59 ` Boris Burkov
2023-02-20 19:40 ` David Sterba
2023-02-15 20:59 ` [PATCH v3 2/2] btrfs: fix size class loading logic Boris Burkov
1 sibling, 1 reply; 5+ messages in thread
From: Boris Burkov @ 2023-02-15 20:59 UTC (permalink / raw)
To: linux-btrfs, kernel-team
Make it possible to see the distribution of size classes for block
groups. Helpful for testing and debugging the allocator w.r.t. size
classes.
The new stats can be found at the path:
/sys/fs/btrfs/<fsid>/allocation/<bg-type>/size_classes
but they will only be non-zero for bg-type = data.
Signed-off-by: Boris Burkov <boris@bur.io>
---
fs/btrfs/sysfs.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 44 insertions(+)
diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index 8c5efa5813b3..4926cab2f507 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -9,6 +9,7 @@
#include <linux/spinlock.h>
#include <linux/completion.h>
#include <linux/bug.h>
+#include <linux/list.h>
#include <crypto/hash.h>
#include "messages.h"
#include "ctree.h"
@@ -778,6 +779,47 @@ static ssize_t btrfs_chunk_size_store(struct kobject *kobj,
return len;
}
+static ssize_t btrfs_size_classes_show(struct kobject *kobj,
+ struct kobj_attribute *a, char *buf)
+{
+ struct btrfs_space_info *sinfo = to_space_info(kobj);
+ struct btrfs_block_group *bg;
+ u32 none = 0;
+ u32 small = 0;
+ u32 medium = 0;
+ u32 large = 0;
+
+ down_read(&sinfo->groups_sem);
+ for (int i = 0; i < BTRFS_NR_RAID_TYPES; ++i) {
+ list_for_each_entry(bg, &sinfo->block_groups[i], list) {
+ if (!btrfs_block_group_should_use_size_class(bg))
+ continue;
+ switch (bg->size_class) {
+ case BTRFS_BG_SZ_NONE:
+ none++;
+ break;
+ case BTRFS_BG_SZ_SMALL:
+ small++;
+ break;
+ case BTRFS_BG_SZ_MEDIUM:
+ medium++;
+ break;
+ case BTRFS_BG_SZ_LARGE:
+ large++;
+ break;
+ }
+ }
+ if (rwsem_is_contended(&sinfo->groups_sem)) {
+ up_read(&sinfo->groups_sem);
+ cond_resched();
+ down_read(&sinfo->groups_sem);
+ }
+ }
+ up_read(&sinfo->groups_sem);
+ return sysfs_emit(buf, "none %u\nsmall %u\nmedium %u\nlarge %u\n",
+ none, small, medium, large);
+}
+
#ifdef CONFIG_BTRFS_DEBUG
/*
* Request chunk allocation with current chunk size.
@@ -835,6 +877,7 @@ SPACE_INFO_ATTR(bytes_zone_unusable);
SPACE_INFO_ATTR(disk_used);
SPACE_INFO_ATTR(disk_total);
BTRFS_ATTR_RW(space_info, chunk_size, btrfs_chunk_size_show, btrfs_chunk_size_store);
+BTRFS_ATTR(space_info, size_classes, btrfs_size_classes_show);
static ssize_t btrfs_sinfo_bg_reclaim_threshold_show(struct kobject *kobj,
struct kobj_attribute *a,
@@ -887,6 +930,7 @@ static struct attribute *space_info_attrs[] = {
BTRFS_ATTR_PTR(space_info, disk_total),
BTRFS_ATTR_PTR(space_info, bg_reclaim_threshold),
BTRFS_ATTR_PTR(space_info, chunk_size),
+ BTRFS_ATTR_PTR(space_info, size_classes),
#ifdef CONFIG_BTRFS_DEBUG
BTRFS_ATTR_PTR(space_info, force_chunk_alloc),
#endif
--
2.38.1
* [PATCH v3 2/2] btrfs: fix size class loading logic
2023-02-15 20:59 [PATCH v3 0/2] btrfs: fix size class loading logic Boris Burkov
2023-02-15 20:59 ` [PATCH v3 1/2] btrfs: add size class stats to sysfs Boris Burkov
@ 2023-02-15 20:59 ` Boris Burkov
2023-02-20 19:42 ` David Sterba
1 sibling, 1 reply; 5+ messages in thread
From: Boris Burkov @ 2023-02-15 20:59 UTC (permalink / raw)
To: linux-btrfs, kernel-team
This is another incremental patch fixing bugs in:
btrfs: load block group size class when caching
The use of btrfs_search_forward() was incorrect, and I have replaced it
with btrfs_for_each_slot(). Since we only consider five samples (five
search slots), don't bother with the complexity of checking for
commit_root_sem contention. If necessary, it can be added to the load
function in between samples.
The mistake was reported by Filipe, hence the tag:
Reported-by: Filipe Manana <fdmanana@kernel.org>
The commit message should be:
btrfs: load block group size class when caching
Since the size class is an artifact of an arbitrary anti-fragmentation
strategy, it doesn't really make sense to persist it. Furthermore, most
of the size class logic assumes fresh block groups. That is of course
not a reasonable assumption -- we will be upgrading kernels with
existing filesystems whose block groups are not classified.
To work around those issues, implement logic to compute the size class
of the block groups as we cache them in. To perfectly assess the state
of a block group, we would have to read the entire extent tree (since
the free space cache mashes together contiguous extent items) which
would be prohibitively expensive for larger file systems with more
extents.
We can do it relatively cheaply by implementing a simple heuristic of
sampling a handful of extents and picking the smallest one we see. In
the happy case where the block group was classified, we will only see
extents of the correct size. In the unhappy case, we will hopefully find
one of the smaller extents, but there is no perfect answer anyway.
Autorelocation will eventually churn up the block group if there is
significant freeing anyway.
There was no regression in mount performance at the end state of the
fsperf test suite, and the delay until the block group is marked cached
is minimized by the constant number of extent samples.
Signed-off-by: Boris Burkov <boris@bur.io>
---
fs/btrfs/block-group.c | 42 ++++++++++++++++--------------------------
1 file changed, 16 insertions(+), 26 deletions(-)
diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 5b10401d803b..05102a55710c 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -558,14 +558,15 @@ u64 add_new_free_space(struct btrfs_block_group *block_group, u64 start, u64 end
static int sample_block_group_extent_item(struct btrfs_caching_control *caching_ctl,
struct btrfs_block_group *block_group,
int index, int max_index,
- struct btrfs_key *key)
+ struct btrfs_key *found_key)
{
struct btrfs_fs_info *fs_info = block_group->fs_info;
struct btrfs_root *extent_root;
- int ret = 0;
u64 search_offset;
u64 search_end = block_group->start + block_group->length;
struct btrfs_path *path;
+ struct btrfs_key search_key;
+ int ret = 0;
ASSERT(index >= 0);
ASSERT(index <= max_index);
@@ -585,37 +586,24 @@ static int sample_block_group_extent_item(struct btrfs_caching_control *caching_
path->reada = READA_FORWARD;
search_offset = index * div_u64(block_group->length, max_index);
- key->objectid = block_group->start + search_offset;
- key->type = BTRFS_EXTENT_ITEM_KEY;
- key->offset = 0;
+ search_key.objectid = block_group->start + search_offset;
+ search_key.type = BTRFS_EXTENT_ITEM_KEY;
+ search_key.offset = 0;
- while (1) {
- ret = btrfs_search_forward(extent_root, key, path, 0);
- if (ret != 0)
- goto out;
+ btrfs_for_each_slot(extent_root, &search_key, found_key, path, ret) {
/* Success; sampled an extent item in the block group */
- if (key->type == BTRFS_EXTENT_ITEM_KEY &&
- key->objectid >= block_group->start &&
- key->objectid + key->offset <= search_end)
- goto out;
+ if (found_key->type == BTRFS_EXTENT_ITEM_KEY &&
+ found_key->objectid >= block_group->start &&
+ found_key->objectid + found_key->offset <= search_end)
+ break;
/* We can't possibly find a valid extent item anymore */
- if (key->objectid >= search_end) {
+ if (found_key->objectid >= search_end) {
ret = 1;
break;
}
- if (key->type < BTRFS_EXTENT_ITEM_KEY)
- key->type = BTRFS_EXTENT_ITEM_KEY;
- else
- key->objectid++;
- btrfs_release_path(path);
- up_read(&fs_info->commit_root_sem);
- mutex_unlock(&caching_ctl->mutex);
- cond_resched();
- mutex_lock(&caching_ctl->mutex);
- down_read(&fs_info->commit_root_sem);
}
-out:
+
lockdep_assert_held(&caching_ctl->mutex);
lockdep_assert_held_read(&fs_info->commit_root_sem);
btrfs_free_path(path);
@@ -659,6 +647,7 @@ static int sample_block_group_extent_item(struct btrfs_caching_control *caching_
static int load_block_group_size_class(struct btrfs_caching_control *caching_ctl,
struct btrfs_block_group *block_group)
{
+ struct btrfs_fs_info *fs_info = block_group->fs_info;
struct btrfs_key key;
int i;
u64 min_size = block_group->length;
@@ -668,6 +657,8 @@ static int load_block_group_size_class(struct btrfs_caching_control *caching_ctl
if (!btrfs_block_group_should_use_size_class(block_group))
return 0;
+ lockdep_assert_held(&caching_ctl->mutex);
+ lockdep_assert_held_read(&fs_info->commit_root_sem);
for (i = 0; i < 5; ++i) {
ret = sample_block_group_extent_item(caching_ctl, block_group, i, 5, &key);
if (ret < 0)
@@ -682,7 +673,6 @@ static int load_block_group_size_class(struct btrfs_caching_control *caching_ctl
block_group->size_class = size_class;
spin_unlock(&block_group->lock);
}
-
out:
return ret;
}
--
2.38.1
* Re: [PATCH v3 1/2] btrfs: add size class stats to sysfs
2023-02-15 20:59 ` [PATCH v3 1/2] btrfs: add size class stats to sysfs Boris Burkov
@ 2023-02-20 19:40 ` David Sterba
0 siblings, 0 replies; 5+ messages in thread
From: David Sterba @ 2023-02-20 19:40 UTC (permalink / raw)
To: Boris Burkov; +Cc: linux-btrfs, kernel-team
On Wed, Feb 15, 2023 at 12:59:49PM -0800, Boris Burkov wrote:
> Make it possible to see the distribution of size classes for block
> groups. Helpful for testing and debugging the allocator w.r.t. to size
> classes.
>
> The new stats can be found at the path:
> /sys/fs/btrfs/<uid>/allocation/<bg-type>/size_class
> but they will only be non-zero for bg-type = data.
>
> Signed-off-by: Boris Burkov <boris@bur.io>
> ---
> fs/btrfs/sysfs.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 44 insertions(+)
>
> diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
> index 8c5efa5813b3..4926cab2f507 100644
> --- a/fs/btrfs/sysfs.c
> +++ b/fs/btrfs/sysfs.c
> @@ -9,6 +9,7 @@
> #include <linux/spinlock.h>
> #include <linux/completion.h>
> #include <linux/bug.h>
> +#include <linux/list.h>
> #include <crypto/hash.h>
> #include "messages.h"
> #include "ctree.h"
> @@ -778,6 +779,47 @@ static ssize_t btrfs_chunk_size_store(struct kobject *kobj,
> return len;
> }
>
> +static ssize_t btrfs_size_classes_show(struct kobject *kobj,
> + struct kobj_attribute *a, char *buf)
> +{
> + struct btrfs_space_info *sinfo = to_space_info(kobj);
> + struct btrfs_block_group *bg;
> + u32 none = 0;
> + u32 small = 0;
> + u32 medium = 0;
> + u32 large = 0;
> +
> + down_read(&sinfo->groups_sem);
> + for (int i = 0; i < BTRFS_NR_RAID_TYPES; ++i) {
The lock in raid_bytes_show would be here, so with
down_read(&sinfo->groups_sem);
> + list_for_each_entry(bg, &sinfo->block_groups[i], list) {
> + if (!btrfs_block_group_should_use_size_class(bg))
> + continue;
> + switch (bg->size_class) {
> + case BTRFS_BG_SZ_NONE:
> + none++;
> + break;
> + case BTRFS_BG_SZ_SMALL:
> + small++;
> + break;
> + case BTRFS_BG_SZ_MEDIUM:
> + medium++;
> + break;
> + case BTRFS_BG_SZ_LARGE:
> + large++;
> + break;
> + }
> + }
and
up_read(&sinfo->groups_sem);
the conditional check could be avoided completely
> + if (rwsem_is_contended(&sinfo->groups_sem)) {
> + up_read(&sinfo->groups_sem);
> + cond_resched();
> + down_read(&sinfo->groups_sem);
> + }
> + }
> + up_read(&sinfo->groups_sem);
> + return sysfs_emit(buf, "none %u\nsmall %u\nmedium %u\nlarge %u\n",
> + none, small, medium, large);
> +}
* Re: [PATCH v3 2/2] btrfs: fix size class loading logic
2023-02-15 20:59 ` [PATCH v3 2/2] btrfs: fix size class loading logic Boris Burkov
@ 2023-02-20 19:42 ` David Sterba
0 siblings, 0 replies; 5+ messages in thread
From: David Sterba @ 2023-02-20 19:42 UTC (permalink / raw)
To: Boris Burkov; +Cc: linux-btrfs, kernel-team
On Wed, Feb 15, 2023 at 12:59:50PM -0800, Boris Burkov wrote:
> This is another incremental patch fixing bugs in:
We can do incremental changes only until a week before the merge window
opens, which is today. The patch applies cleanly so I can take it
separately, with some changelog editing. Please check if everything is
right once it appears in misc-next, thanks.