* [PATCH v2 01/10] btrfs: use a kmem_cache for block groups
2026-04-15 18:44 ` [PATCH v2 00/10] btrfs: reduce memory consumption for block groups fdmanana
@ 2026-04-15 18:44 ` fdmanana
2026-04-17 1:51 ` David Sterba
2026-04-15 18:44 ` [PATCH v2 02/10] btrfs: reduce size of struct btrfs_block_group fdmanana
` (9 subsequent siblings)
10 siblings, 1 reply; 28+ messages in thread
From: fdmanana @ 2026-04-15 18:44 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
We are currently allocating block groups using the generic slabs, and
given that the size of the btrfs_block_group structure is 672 bytes (on a
release kernel), we end up using the kmalloc-1024 slab and therefore
waste quite some memory, since on a 4K page system we can fit only 4
block groups per page. Block groups are also allocated and deallocated
with some frequency, especially if we have auto reclaim enabled.
So use a kmem_cache for block groups. This way, on a 4K page system, we
can fit 6 block groups per page instead of 4.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/block-group.c | 23 ++++++++++++++++++++---
fs/btrfs/block-group.h | 3 +++
fs/btrfs/super.c | 3 +++
3 files changed, 26 insertions(+), 3 deletions(-)
diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index e6f5a17a13e3..a87f147aefa5 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -22,6 +22,23 @@
#include "accessors.h"
#include "extent-tree.h"
+static struct kmem_cache *block_group_cache;
+
+int __init btrfs_init_block_group(void)
+{
+ block_group_cache = kmem_cache_create("btrfs_block_group",
+ sizeof(struct btrfs_block_group),
+ 0, 0, NULL);
+ if (!block_group_cache)
+ return -ENOMEM;
+ return 0;
+}
+
+void __cold btrfs_exit_block_group(void)
+{
+ kmem_cache_destroy(block_group_cache);
+}
+
#ifdef CONFIG_BTRFS_DEBUG
int btrfs_should_fragment_free_space(const struct btrfs_block_group *block_group)
{
@@ -182,7 +199,7 @@ void btrfs_put_block_group(struct btrfs_block_group *cache)
kfree(cache->free_space_ctl);
btrfs_free_chunk_map(cache->physical_map);
- kfree(cache);
+ kmem_cache_free(block_group_cache, cache);
}
}
@@ -2371,13 +2388,13 @@ static struct btrfs_block_group *btrfs_create_block_group(
{
struct btrfs_block_group *cache;
- cache = kzalloc_obj(*cache, GFP_NOFS);
+ cache = kmem_cache_zalloc(block_group_cache, GFP_NOFS);
if (!cache)
return NULL;
cache->free_space_ctl = kzalloc_obj(*cache->free_space_ctl, GFP_NOFS);
if (!cache->free_space_ctl) {
- kfree(cache);
+ kmem_cache_free(block_group_cache, cache);
return NULL;
}
diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
index 0504cb357992..b414f4268d2d 100644
--- a/fs/btrfs/block-group.h
+++ b/fs/btrfs/block-group.h
@@ -320,6 +320,9 @@ static inline u64 btrfs_block_group_available_space(const struct btrfs_block_gro
int btrfs_should_fragment_free_space(const struct btrfs_block_group *block_group);
#endif
+int __init btrfs_init_block_group(void);
+void __cold btrfs_exit_block_group(void);
+
struct btrfs_block_group *btrfs_lookup_first_block_group(
struct btrfs_fs_info *info, u64 bytenr);
struct btrfs_block_group *btrfs_lookup_block_group(
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index b26aa9169e83..b5346f26ed87 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -2604,6 +2604,9 @@ static const struct init_sequence mod_init_seq[] = {
}, {
.init_func = btrfs_init_compress,
.exit_func = btrfs_exit_compress,
+ }, {
+ .init_func = btrfs_init_block_group,
+ .exit_func = btrfs_exit_block_group,
}, {
.init_func = btrfs_init_cachep,
.exit_func = btrfs_destroy_cachep,
--
2.47.2
* Re: [PATCH v2 01/10] btrfs: use a kmem_cache for block groups
2026-04-15 18:44 ` [PATCH v2 01/10] btrfs: use a kmem_cache " fdmanana
@ 2026-04-17 1:51 ` David Sterba
0 siblings, 0 replies; 28+ messages in thread
From: David Sterba @ 2026-04-17 1:51 UTC (permalink / raw)
To: fdmanana; +Cc: linux-btrfs
On Wed, Apr 15, 2026 at 07:44:39PM +0100, fdmanana@kernel.org wrote:
> --- a/fs/btrfs/block-group.c
> +++ b/fs/btrfs/block-group.c
> @@ -22,6 +22,23 @@
> #include "accessors.h"
> #include "extent-tree.h"
>
> +static struct kmem_cache *block_group_cache;
> +
> +int __init btrfs_init_block_group(void)
> +{
> + block_group_cache = kmem_cache_create("btrfs_block_group",
> + sizeof(struct btrfs_block_group),
If the structure and cache name match, you can use KMEM_CACHE().
> + 0, 0, NULL);
> + if (!block_group_cache)
> + return -ENOMEM;
> + return 0;
> +}
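For reference, a sketch of what the init function would look like with the
suggested macro (kernel-only code, shown here just to illustrate the
suggestion): KMEM_CACHE() derives both the cache name and the object size
from the structure type, so the open-coded kmem_cache_create() call
collapses to a single line when the cache name matches the structure name.

```c
int __init btrfs_init_block_group(void)
{
	/* Creates a cache named "btrfs_block_group" sized for the struct. */
	block_group_cache = KMEM_CACHE(btrfs_block_group, 0);
	if (!block_group_cache)
		return -ENOMEM;
	return 0;
}
```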
* [PATCH v2 02/10] btrfs: reduce size of struct btrfs_block_group
2026-04-15 18:44 ` [PATCH v2 00/10] btrfs: reduce memory consumption for block groups fdmanana
2026-04-15 18:44 ` [PATCH v2 01/10] btrfs: use a kmem_cache " fdmanana
@ 2026-04-15 18:44 ` fdmanana
2026-04-15 20:07 ` Martin Raiber
2026-04-15 18:44 ` [PATCH v2 03/10] btrfs: use a kmem_cache for free space control structures fdmanana
` (8 subsequent siblings)
10 siblings, 1 reply; 28+ messages in thread
From: fdmanana @ 2026-04-15 18:44 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
We currently have several holes in the structure:
struct btrfs_block_group {
struct btrfs_fs_info * fs_info; /* 0 8 */
struct btrfs_inode * inode; /* 8 8 */
spinlock_t lock __attribute__((__aligned__(4))); /* 16 4 */
/* XXX 4 bytes hole, try to pack */
u64 start; /* 24 8 */
u64 length; /* 32 8 */
u64 pinned; /* 40 8 */
u64 reserved; /* 48 8 */
u64 used; /* 56 8 */
/* --- cacheline 1 boundary (64 bytes) --- */
u64 delalloc_bytes; /* 64 8 */
u64 bytes_super; /* 72 8 */
u64 flags; /* 80 8 */
u64 cache_generation; /* 88 8 */
u64 global_root_id; /* 96 8 */
u64 remap_bytes; /* 104 8 */
u32 identity_remap_count; /* 112 4 */
/* XXX 4 bytes hole, try to pack */
u64 last_used; /* 120 8 */
/* --- cacheline 2 boundary (128 bytes) --- */
u64 last_remap_bytes; /* 128 8 */
u32 last_identity_remap_count; /* 136 4 */
/* XXX 4 bytes hole, try to pack */
u64 last_flags; /* 144 8 */
u32 bitmap_high_thresh; /* 152 4 */
u32 bitmap_low_thresh; /* 156 4 */
struct rw_semaphore data_rwsem __attribute__((__aligned__(8))); /* 160 40 */
/* --- cacheline 3 boundary (192 bytes) was 8 bytes ago --- */
long unsigned int full_stripe_len; /* 200 8 */
long unsigned int runtime_flags; /* 208 8 */
unsigned int ro; /* 216 4 */
int disk_cache_state; /* 220 4 */
int cached; /* 224 4 */
/* XXX 4 bytes hole, try to pack */
struct btrfs_caching_control * caching_ctl; /* 232 8 */
struct btrfs_space_info * space_info; /* 240 8 */
struct btrfs_free_space_ctl * free_space_ctl; /* 248 8 */
/* --- cacheline 4 boundary (256 bytes) --- */
struct rb_node cache_node __attribute__((__aligned__(8))); /* 256 24 */
struct list_head list; /* 280 16 */
refcount_t refs __attribute__((__aligned__(4))); /* 296 4 */
/* XXX 4 bytes hole, try to pack */
struct list_head cluster_list; /* 304 16 */
/* --- cacheline 5 boundary (320 bytes) --- */
struct list_head bg_list; /* 320 16 */
struct list_head ro_list; /* 336 16 */
atomic_t frozen __attribute__((__aligned__(4))); /* 352 4 */
/* XXX 4 bytes hole, try to pack */
struct list_head discard_list; /* 360 16 */
int discard_index; /* 376 4 */
/* XXX 4 bytes hole, try to pack */
/* --- cacheline 6 boundary (384 bytes) --- */
u64 discard_eligible_time; /* 384 8 */
u64 discard_cursor; /* 392 8 */
enum btrfs_discard_state discard_state; /* 400 4 */
/* XXX 4 bytes hole, try to pack */
struct list_head dirty_list; /* 408 16 */
struct list_head io_list; /* 424 16 */
struct btrfs_io_ctl io_ctl; /* 440 72 */
/* --- cacheline 8 boundary (512 bytes) --- */
atomic_t reservations __attribute__((__aligned__(4))); /* 512 4 */
atomic_t nocow_writers __attribute__((__aligned__(4))); /* 516 4 */
struct mutex free_space_lock __attribute__((__aligned__(8))); /* 520 32 */
bool using_free_space_bitmaps; /* 552 1 */
bool using_free_space_bitmaps_cached; /* 553 1 */
/* XXX 2 bytes hole, try to pack */
int swap_extents; /* 556 4 */
u64 alloc_offset; /* 560 8 */
u64 zone_unusable; /* 568 8 */
/* --- cacheline 9 boundary (576 bytes) --- */
u64 zone_capacity; /* 576 8 */
u64 meta_write_pointer; /* 584 8 */
struct btrfs_chunk_map * physical_map; /* 592 8 */
struct list_head active_bg_list; /* 600 16 */
struct work_struct zone_finish_work; /* 616 32 */
/* --- cacheline 10 boundary (640 bytes) was 8 bytes ago --- */
struct extent_buffer * last_eb; /* 648 8 */
enum btrfs_block_group_size_class size_class; /* 656 4 */
/* XXX 4 bytes hole, try to pack */
u64 reclaim_mark; /* 664 8 */
/* size: 672, cachelines: 11, members: 61 */
/* sum members: 634, holes: 10, sum holes: 38 */
/* forced alignments: 8 */
/* last cacheline: 32 bytes */
} __attribute__((__aligned__(8)));
Reorder some fields to eliminate the holes while keeping closely related
or frequently accessed fields together. After the reordering, the size of
the structure is reduced to 632 bytes and the number of cache lines
decreases from 11 to 10. On a 4K page system we can still pack only 6
block groups per page, but on a 64K page system we can now pack 103
block groups instead of 97. The new structure layout, on a release
kernel, is the following:
struct btrfs_block_group {
struct btrfs_fs_info * fs_info; /* 0 8 */
struct btrfs_inode * inode; /* 8 8 */
spinlock_t lock __attribute__((__aligned__(4))); /* 16 4 */
unsigned int ro; /* 20 4 */
u64 start; /* 24 8 */
u64 length; /* 32 8 */
u64 pinned; /* 40 8 */
u64 reserved; /* 48 8 */
u64 used; /* 56 8 */
/* --- cacheline 1 boundary (64 bytes) --- */
u64 delalloc_bytes; /* 64 8 */
u64 bytes_super; /* 72 8 */
u64 flags; /* 80 8 */
u64 cache_generation; /* 88 8 */
u64 global_root_id; /* 96 8 */
u64 remap_bytes; /* 104 8 */
u32 identity_remap_count; /* 112 4 */
u32 last_identity_remap_count; /* 116 4 */
u64 last_used; /* 120 8 */
/* --- cacheline 2 boundary (128 bytes) --- */
u64 last_remap_bytes; /* 128 8 */
u64 last_flags; /* 136 8 */
u32 bitmap_high_thresh; /* 144 4 */
u32 bitmap_low_thresh; /* 148 4 */
struct rw_semaphore data_rwsem __attribute__((__aligned__(8))); /* 152 40 */
/* --- cacheline 3 boundary (192 bytes) --- */
long unsigned int full_stripe_len; /* 192 8 */
long unsigned int runtime_flags; /* 200 8 */
int disk_cache_state; /* 208 4 */
int cached; /* 212 4 */
struct btrfs_caching_control * caching_ctl; /* 216 8 */
struct btrfs_space_info * space_info; /* 224 8 */
struct btrfs_free_space_ctl * free_space_ctl; /* 232 8 */
struct rb_node cache_node __attribute__((__aligned__(8))); /* 240 24 */
/* --- cacheline 4 boundary (256 bytes) was 8 bytes ago --- */
struct list_head list; /* 264 16 */
refcount_t refs __attribute__((__aligned__(4))); /* 280 4 */
atomic_t frozen __attribute__((__aligned__(4))); /* 284 4 */
struct list_head cluster_list; /* 288 16 */
struct list_head bg_list; /* 304 16 */
/* --- cacheline 5 boundary (320 bytes) --- */
struct list_head ro_list; /* 320 16 */
struct list_head discard_list; /* 336 16 */
int discard_index; /* 352 4 */
enum btrfs_discard_state discard_state; /* 356 4 */
u64 discard_eligible_time; /* 360 8 */
u64 discard_cursor; /* 368 8 */
struct list_head dirty_list; /* 376 16 */
/* --- cacheline 6 boundary (384 bytes) was 8 bytes ago --- */
struct list_head io_list; /* 392 16 */
struct btrfs_io_ctl io_ctl; /* 408 72 */
/* --- cacheline 7 boundary (448 bytes) was 32 bytes ago --- */
atomic_t reservations __attribute__((__aligned__(4))); /* 480 4 */
atomic_t nocow_writers __attribute__((__aligned__(4))); /* 484 4 */
struct mutex free_space_lock __attribute__((__aligned__(8))); /* 488 32 */
/* --- cacheline 8 boundary (512 bytes) was 8 bytes ago --- */
bool using_free_space_bitmaps; /* 520 1 */
bool using_free_space_bitmaps_cached; /* 521 1 */
/* XXX 2 bytes hole, try to pack */
/* Bitfield combined with previous fields */
static enum btrfs_block_group_size_class size_class; /* 0: 0 0 */
int swap_extents; /* 524 4 */
u64 alloc_offset; /* 528 8 */
u64 zone_unusable; /* 536 8 */
u64 zone_capacity; /* 544 8 */
u64 meta_write_pointer; /* 552 8 */
struct btrfs_chunk_map * physical_map; /* 560 8 */
struct list_head active_bg_list; /* 568 16 */
/* --- cacheline 9 boundary (576 bytes) was 8 bytes ago --- */
struct work_struct zone_finish_work; /* 584 32 */
struct extent_buffer * last_eb; /* 616 8 */
u64 reclaim_mark; /* 624 8 */
/* size: 632, cachelines: 10, members: 60, static members: 1 */
/* sum members: 630, holes: 1, sum holes: 2 */
/* sum bitfield members: 8 bits (1 bytes) */
/* forced alignments: 8 */
/* last cacheline: 56 bytes */
/* BRAIN FART ALERT! 632 bytes != 630 (member bytes) + 8 (member bits) + 2 (byte holes) + 0 (bit holes), diff = -8 bits */
} __attribute__((__aligned__(8)));
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/block-group.h | 33 ++++++++++++++++-----------------
1 file changed, 16 insertions(+), 17 deletions(-)
diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
index b414f4268d2d..60a3b1c0a8ab 100644
--- a/fs/btrfs/block-group.h
+++ b/fs/btrfs/block-group.h
@@ -122,6 +122,7 @@ struct btrfs_block_group {
struct btrfs_fs_info *fs_info;
struct btrfs_inode *inode;
spinlock_t lock;
+ unsigned int ro;
u64 start;
u64 length;
u64 pinned;
@@ -134,7 +135,8 @@ struct btrfs_block_group {
u64 global_root_id;
u64 remap_bytes;
u32 identity_remap_count;
-
+ /* The last commited identity_remap_count value of this block group. */
+ u32 last_identity_remap_count;
/*
* The last committed used bytes of this block group, if the above @used
* is still the same as @last_used, we don't need to update block
@@ -143,8 +145,6 @@ struct btrfs_block_group {
u64 last_used;
/* The last committed remap_bytes value of this block group. */
u64 last_remap_bytes;
- /* The last commited identity_remap_count value of this block group. */
- u32 last_identity_remap_count;
/* The last committed flags value for this block group. */
u64 last_flags;
@@ -171,8 +171,6 @@ struct btrfs_block_group {
unsigned long full_stripe_len;
unsigned long runtime_flags;
- unsigned int ro;
-
int disk_cache_state;
/* Cache tracking stuff */
@@ -192,6 +190,16 @@ struct btrfs_block_group {
refcount_t refs;
+ /*
+ * When non-zero it means the block group's logical address and its
+ * device extents can not be reused for future block group allocations
+ * until the counter goes down to 0. This is to prevent them from being
+ * reused while some task is still using the block group after it was
+ * deleted - we want to make sure they can only be reused for new block
+ * groups after that task is done with the deleted block group.
+ */
+ atomic_t frozen;
+
/*
* List of struct btrfs_free_clusters for this block group.
* Today it will only have one thing on it, but that may change
@@ -211,22 +219,12 @@ struct btrfs_block_group {
/* For read-only block groups */
struct list_head ro_list;
- /*
- * When non-zero it means the block group's logical address and its
- * device extents can not be reused for future block group allocations
- * until the counter goes down to 0. This is to prevent them from being
- * reused while some task is still using the block group after it was
- * deleted - we want to make sure they can only be reused for new block
- * groups after that task is done with the deleted block group.
- */
- atomic_t frozen;
-
/* For discard operations */
struct list_head discard_list;
int discard_index;
+ enum btrfs_discard_state discard_state;
u64 discard_eligible_time;
u64 discard_cursor;
- enum btrfs_discard_state discard_state;
/* For dirty block groups */
struct list_head dirty_list;
@@ -263,6 +261,8 @@ struct btrfs_block_group {
/* Protected by @free_space_lock. */
bool using_free_space_bitmaps_cached;
+ enum btrfs_block_group_size_class size_class:8;
+
/*
* Number of extents in this block group used for swap files.
* All accesses protected by the spinlock 'lock'.
@@ -281,7 +281,6 @@ struct btrfs_block_group {
struct list_head active_bg_list;
struct work_struct zone_finish_work;
struct extent_buffer *last_eb;
- enum btrfs_block_group_size_class size_class;
u64 reclaim_mark;
};
--
2.47.2
* Re: [PATCH v2 02/10] btrfs: reduce size of struct btrfs_block_group
2026-04-15 18:44 ` [PATCH v2 02/10] btrfs: reduce size of struct btrfs_block_group fdmanana
@ 2026-04-15 20:07 ` Martin Raiber
2026-04-16 10:30 ` Filipe Manana
2026-04-17 2:09 ` David Sterba
0 siblings, 2 replies; 28+ messages in thread
From: Martin Raiber @ 2026-04-15 20:07 UTC (permalink / raw)
To: fdmanana, linux-btrfs
On 15/04/2026 20:44 fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> [...]
>
> Reorder some fields to eliminate the holes while keeping closely related
> or frequently accessed fields together. After the reordering the size of
> the structure is reduced down to 632 bytes and the number of cache lines
> decreases from 11 to 10.
>
> [...]
>
I found that putting ro, cached and size_class in the same cache line
brought performance benefits for find_free_extent. I cannot say how
important that is compared to other kinds of users.
* Re: [PATCH v2 02/10] btrfs: reduce size of struct btrfs_block_group
2026-04-15 20:07 ` Martin Raiber
@ 2026-04-16 10:30 ` Filipe Manana
2026-04-16 10:46 ` Martin Raiber
2026-04-17 2:09 ` David Sterba
1 sibling, 1 reply; 28+ messages in thread
From: Filipe Manana @ 2026-04-16 10:30 UTC (permalink / raw)
To: Martin Raiber; +Cc: linux-btrfs
On Wed, Apr 15, 2026 at 9:08 PM Martin Raiber <martin@urbackup.org> wrote:
>
> On 15/04/2026 20:44 fdmanana@kernel.org wrote:
> > From: Filipe Manana <fdmanana@suse.com>
> >
> > [...]
> > /* --- cacheline 2 boundary (128 bytes) --- */
> > u64 last_remap_bytes; /* 128 8 */
> > u64 last_flags; /* 136 8 */
> > u32 bitmap_high_thresh; /* 144 4 */
> > u32 bitmap_low_thresh; /* 148 4 */
> > struct rw_semaphore data_rwsem __attribute__((__aligned__(8))); /* 152 40 */
> > /* --- cacheline 3 boundary (192 bytes) --- */
> > long unsigned int full_stripe_len; /* 192 8 */
> > long unsigned int runtime_flags; /* 200 8 */
> > int disk_cache_state; /* 208 4 */
> > int cached; /* 212 4 */
> > struct btrfs_caching_control * caching_ctl; /* 216 8 */
> > struct btrfs_space_info * space_info; /* 224 8 */
> > struct btrfs_free_space_ctl * free_space_ctl; /* 232 8 */
> > struct rb_node cache_node __attribute__((__aligned__(8))); /* 240 24 */
> > /* --- cacheline 4 boundary (256 bytes) was 8 bytes ago --- */
> > struct list_head list; /* 264 16 */
> > refcount_t refs __attribute__((__aligned__(4))); /* 280 4 */
> > atomic_t frozen __attribute__((__aligned__(4))); /* 284 4 */
> > struct list_head cluster_list; /* 288 16 */
> > struct list_head bg_list; /* 304 16 */
> > /* --- cacheline 5 boundary (320 bytes) --- */
> > struct list_head ro_list; /* 320 16 */
> > struct list_head discard_list; /* 336 16 */
> > int discard_index; /* 352 4 */
> > enum btrfs_discard_state discard_state; /* 356 4 */
> > u64 discard_eligible_time; /* 360 8 */
> > u64 discard_cursor; /* 368 8 */
> > struct list_head dirty_list; /* 376 16 */
> > /* --- cacheline 6 boundary (384 bytes) was 8 bytes ago --- */
> > struct list_head io_list; /* 392 16 */
> > struct btrfs_io_ctl io_ctl; /* 408 72 */
> > /* --- cacheline 7 boundary (448 bytes) was 32 bytes ago --- */
> > atomic_t reservations __attribute__((__aligned__(4))); /* 480 4 */
> > atomic_t nocow_writers __attribute__((__aligned__(4))); /* 484 4 */
> > struct mutex free_space_lock __attribute__((__aligned__(8))); /* 488 32 */
> > /* --- cacheline 8 boundary (512 bytes) was 8 bytes ago --- */
> > bool using_free_space_bitmaps; /* 520 1 */
> > bool using_free_space_bitmaps_cached; /* 521 1 */
> >
> > /* XXX 2 bytes hole, try to pack */
> > /* Bitfield combined with previous fields */
> >
> > static enum btrfs_block_group_size_class size_class; /* 0: 0 0 */
> > int swap_extents; /* 524 4 */
> > u64 alloc_offset; /* 528 8 */
> > u64 zone_unusable; /* 536 8 */
> > u64 zone_capacity; /* 544 8 */
> > u64 meta_write_pointer; /* 552 8 */
> > struct btrfs_chunk_map * physical_map; /* 560 8 */
> > struct list_head active_bg_list; /* 568 16 */
> > /* --- cacheline 9 boundary (576 bytes) was 8 bytes ago --- */
> > struct work_struct zone_finish_work; /* 584 32 */
> > struct extent_buffer * last_eb; /* 616 8 */
> > u64 reclaim_mark; /* 624 8 */
> >
> > /* size: 632, cachelines: 10, members: 60, static members: 1 */
> > /* sum members: 630, holes: 1, sum holes: 2 */
> > /* sum bitfield members: 8 bits (1 bytes) */
> > /* forced alignments: 8 */
> > /* last cacheline: 56 bytes */
> >
> > /* BRAIN FART ALERT! 632 bytes != 630 (member bytes) + 8 (member bits) + 2 (byte holes) + 0 (bit holes), diff = -8 bits */
> > } __attribute__((__aligned__(8)));
> >
> > Signed-off-by: Filipe Manana <fdmanana@suse.com>
> > ---
> > fs/btrfs/block-group.h | 33 ++++++++++++++++-----------------
> > 1 file changed, 16 insertions(+), 17 deletions(-)
> >
> > diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
> > index b414f4268d2d..60a3b1c0a8ab 100644
> > --- a/fs/btrfs/block-group.h
> > +++ b/fs/btrfs/block-group.h
> > @@ -122,6 +122,7 @@ struct btrfs_block_group {
> > struct btrfs_fs_info *fs_info;
> > struct btrfs_inode *inode;
> > spinlock_t lock;
> > + unsigned int ro;
> > u64 start;
> > u64 length;
> > u64 pinned;
> > @@ -134,7 +135,8 @@ struct btrfs_block_group {
> > u64 global_root_id;
> > u64 remap_bytes;
> > u32 identity_remap_count;
> > -
> > + /* The last commited identity_remap_count value of this block group. */
> > + u32 last_identity_remap_count;
> > /*
> > * The last committed used bytes of this block group, if the above @used
> > * is still the same as @last_used, we don't need to update block
> > @@ -143,8 +145,6 @@ struct btrfs_block_group {
> > u64 last_used;
> > /* The last committed remap_bytes value of this block group. */
> > u64 last_remap_bytes;
> > - /* The last commited identity_remap_count value of this block group. */
> > - u32 last_identity_remap_count;
> > /* The last committed flags value for this block group. */
> > u64 last_flags;
> >
> > @@ -171,8 +171,6 @@ struct btrfs_block_group {
> > unsigned long full_stripe_len;
> > unsigned long runtime_flags;
> >
> > - unsigned int ro;
> > -
> > int disk_cache_state;
> >
> > /* Cache tracking stuff */
> > @@ -192,6 +190,16 @@ struct btrfs_block_group {
> >
> > refcount_t refs;
> >
> > + /*
> > + * When non-zero it means the block group's logical address and its
> > + * device extents can not be reused for future block group allocations
> > + * until the counter goes down to 0. This is to prevent them from being
> > + * reused while some task is still using the block group after it was
> > + * deleted - we want to make sure they can only be reused for new block
> > + * groups after that task is done with the deleted block group.
> > + */
> > + atomic_t frozen;
> > +
> > /*
> > * List of struct btrfs_free_clusters for this block group.
> > * Today it will only have one thing on it, but that may change
> > @@ -211,22 +219,12 @@ struct btrfs_block_group {
> > /* For read-only block groups */
> > struct list_head ro_list;
> >
> > - /*
> > - * When non-zero it means the block group's logical address and its
> > - * device extents can not be reused for future block group allocations
> > - * until the counter goes down to 0. This is to prevent them from being
> > - * reused while some task is still using the block group after it was
> > - * deleted - we want to make sure they can only be reused for new block
> > - * groups after that task is done with the deleted block group.
> > - */
> > - atomic_t frozen;
> > -
> > /* For discard operations */
> > struct list_head discard_list;
> > int discard_index;
> > + enum btrfs_discard_state discard_state;
> > u64 discard_eligible_time;
> > u64 discard_cursor;
> > - enum btrfs_discard_state discard_state;
> >
> > /* For dirty block groups */
> > struct list_head dirty_list;
> > @@ -263,6 +261,8 @@ struct btrfs_block_group {
> > /* Protected by @free_space_lock. */
> > bool using_free_space_bitmaps_cached;
> >
> > + enum btrfs_block_group_size_class size_class:8;
> > +
> > /*
> > * Number of extents in this block group used for swap files.
> > * All accesses protected by the spinlock 'lock'.
> > @@ -281,7 +281,6 @@ struct btrfs_block_group {
> > struct list_head active_bg_list;
> > struct work_struct zone_finish_work;
> > struct extent_buffer *last_eb;
> > - enum btrfs_block_group_size_class size_class;
> > u64 reclaim_mark;
> > };
> >
>
> I found that putting ro, cached and size_class in the same cache line
> brought performance benefits for find_free_extent(). I cannot say how
> important that is compared to other kinds of users.
Can you be more specific?
What are the performance benefits and what is the magnitude of improvement?
Better throughput in some specific workload, lower latency, something else?
Or was it micro benchmarking by checking cpu cache hit/miss ratio, the
duration of find_free_extent() calls, or something else?
What was the workload: a fio script/config, or something else?
Such reordering requires separate action, a good justification, and a
workload description with supporting numbers in the change log.
Thanks.
>
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2 02/10] btrfs: reduce size of struct btrfs_block_group
2026-04-16 10:30 ` Filipe Manana
@ 2026-04-16 10:46 ` Martin Raiber
0 siblings, 0 replies; 28+ messages in thread
From: Martin Raiber @ 2026-04-16 10:46 UTC (permalink / raw)
To: Filipe Manana; +Cc: linux-btrfs
On 16/04/2026 12:30 Filipe Manana wrote:
> On Wed, Apr 15, 2026 at 9:08 PM Martin Raiber <martin@urbackup.org> wrote:
>> On 15/04/2026 20:44 fdmanana@kernel.org wrote:
>>> From: Filipe Manana <fdmanana@suse.com>
>>>
>>> We currently have several holes in the structure:
>>>
>>> struct btrfs_block_group {
>>> struct btrfs_fs_info * fs_info; /* 0 8 */
>>> struct btrfs_inode * inode; /* 8 8 */
>>> spinlock_t lock __attribute__((__aligned__(4))); /* 16 4 */
>>>
>>> /* XXX 4 bytes hole, try to pack */
>>>
>>> u64 start; /* 24 8 */
>>> u64 length; /* 32 8 */
>>> u64 pinned; /* 40 8 */
>>> u64 reserved; /* 48 8 */
>>> u64 used; /* 56 8 */
>>> /* --- cacheline 1 boundary (64 bytes) --- */
>>> u64 delalloc_bytes; /* 64 8 */
>>> u64 bytes_super; /* 72 8 */
>>> u64 flags; /* 80 8 */
>>> u64 cache_generation; /* 88 8 */
>>> u64 global_root_id; /* 96 8 */
>>> u64 remap_bytes; /* 104 8 */
>>> u32 identity_remap_count; /* 112 4 */
>>>
>>> /* XXX 4 bytes hole, try to pack */
>>>
>>> u64 last_used; /* 120 8 */
>>> /* --- cacheline 2 boundary (128 bytes) --- */
>>> u64 last_remap_bytes; /* 128 8 */
>>> u32 last_identity_remap_count; /* 136 4 */
>>>
>>> /* XXX 4 bytes hole, try to pack */
>>>
>>> u64 last_flags; /* 144 8 */
>>> u32 bitmap_high_thresh; /* 152 4 */
>>> u32 bitmap_low_thresh; /* 156 4 */
>>> struct rw_semaphore data_rwsem __attribute__((__aligned__(8))); /* 160 40 */
>>> /* --- cacheline 3 boundary (192 bytes) was 8 bytes ago --- */
>>> long unsigned int full_stripe_len; /* 200 8 */
>>> long unsigned int runtime_flags; /* 208 8 */
>>> unsigned int ro; /* 216 4 */
>>> int disk_cache_state; /* 220 4 */
>>> int cached; /* 224 4 */
>>>
>>> /* XXX 4 bytes hole, try to pack */
>>>
>>> struct btrfs_caching_control * caching_ctl; /* 232 8 */
>>> struct btrfs_space_info * space_info; /* 240 8 */
>>> struct btrfs_free_space_ctl * free_space_ctl; /* 248 8 */
>>> /* --- cacheline 4 boundary (256 bytes) --- */
>>> struct rb_node cache_node __attribute__((__aligned__(8))); /* 256 24 */
>>> struct list_head list; /* 280 16 */
>>> refcount_t refs __attribute__((__aligned__(4))); /* 296 4 */
>>>
>>> /* XXX 4 bytes hole, try to pack */
>>>
>>> struct list_head cluster_list; /* 304 16 */
>>> /* --- cacheline 5 boundary (320 bytes) --- */
>>> struct list_head bg_list; /* 320 16 */
>>> struct list_head ro_list; /* 336 16 */
>>> atomic_t frozen __attribute__((__aligned__(4))); /* 352 4 */
>>>
>>> /* XXX 4 bytes hole, try to pack */
>>>
>>> struct list_head discard_list; /* 360 16 */
>>> int discard_index; /* 376 4 */
>>>
>>> /* XXX 4 bytes hole, try to pack */
>>>
>>> /* --- cacheline 6 boundary (384 bytes) --- */
>>> u64 discard_eligible_time; /* 384 8 */
>>> u64 discard_cursor; /* 392 8 */
>>> enum btrfs_discard_state discard_state; /* 400 4 */
>>>
>>> /* XXX 4 bytes hole, try to pack */
>>>
>>> struct list_head dirty_list; /* 408 16 */
>>> struct list_head io_list; /* 424 16 */
>>> struct btrfs_io_ctl io_ctl; /* 440 72 */
>>> /* --- cacheline 8 boundary (512 bytes) --- */
>>> atomic_t reservations __attribute__((__aligned__(4))); /* 512 4 */
>>> atomic_t nocow_writers __attribute__((__aligned__(4))); /* 516 4 */
>>> struct mutex free_space_lock __attribute__((__aligned__(8))); /* 520 32 */
>>> bool using_free_space_bitmaps; /* 552 1 */
>>> bool using_free_space_bitmaps_cached; /* 553 1 */
>>>
>>> /* XXX 2 bytes hole, try to pack */
>>>
>>> int swap_extents; /* 556 4 */
>>> u64 alloc_offset; /* 560 8 */
>>> u64 zone_unusable; /* 568 8 */
>>> /* --- cacheline 9 boundary (576 bytes) --- */
>>> u64 zone_capacity; /* 576 8 */
>>> u64 meta_write_pointer; /* 584 8 */
>>> struct btrfs_chunk_map * physical_map; /* 592 8 */
>>> struct list_head active_bg_list; /* 600 16 */
>>> struct work_struct zone_finish_work; /* 616 32 */
>>> /* --- cacheline 10 boundary (640 bytes) was 8 bytes ago --- */
>>> struct extent_buffer * last_eb; /* 648 8 */
>>> enum btrfs_block_group_size_class size_class; /* 656 4 */
>>>
>>> /* XXX 4 bytes hole, try to pack */
>>>
>>> u64 reclaim_mark; /* 664 8 */
>>>
>>> /* size: 672, cachelines: 11, members: 61 */
>>> /* sum members: 634, holes: 10, sum holes: 38 */
>>> /* forced alignments: 8 */
>>> /* last cacheline: 32 bytes */
>>> } __attribute__((__aligned__(8)));
>>>
>>> Reorder some fields to eliminate the holes while keeping closely related
>>> or frequently accessed fields together. After the reordering the size of
>>> the structure is reduced down to 632 bytes and the number of cache lines
>>> decreases from 11 to 10. We can still only pack 6 block groups per 4K page
>>> but on a 64K page system we will now be able to pack 103 block groups
>>> instead of 97. The new structure layout, on a release kernel, is the
>>> following:
>>>
>>> struct btrfs_block_group {
>>> struct btrfs_fs_info * fs_info; /* 0 8 */
>>> struct btrfs_inode * inode; /* 8 8 */
>>> spinlock_t lock __attribute__((__aligned__(4))); /* 16 4 */
>>> unsigned int ro; /* 20 4 */
>>> u64 start; /* 24 8 */
>>> u64 length; /* 32 8 */
>>> u64 pinned; /* 40 8 */
>>> u64 reserved; /* 48 8 */
>>> u64 used; /* 56 8 */
>>> /* --- cacheline 1 boundary (64 bytes) --- */
>>> u64 delalloc_bytes; /* 64 8 */
>>> u64 bytes_super; /* 72 8 */
>>> u64 flags; /* 80 8 */
>>> u64 cache_generation; /* 88 8 */
>>> u64 global_root_id; /* 96 8 */
>>> u64 remap_bytes; /* 104 8 */
>>> u32 identity_remap_count; /* 112 4 */
>>> u32 last_identity_remap_count; /* 116 4 */
>>> u64 last_used; /* 120 8 */
>>> /* --- cacheline 2 boundary (128 bytes) --- */
>>> u64 last_remap_bytes; /* 128 8 */
>>> u64 last_flags; /* 136 8 */
>>> u32 bitmap_high_thresh; /* 144 4 */
>>> u32 bitmap_low_thresh; /* 148 4 */
>>> struct rw_semaphore data_rwsem __attribute__((__aligned__(8))); /* 152 40 */
>>> /* --- cacheline 3 boundary (192 bytes) --- */
>>> long unsigned int full_stripe_len; /* 192 8 */
>>> long unsigned int runtime_flags; /* 200 8 */
>>> int disk_cache_state; /* 208 4 */
>>> int cached; /* 212 4 */
>>> struct btrfs_caching_control * caching_ctl; /* 216 8 */
>>> struct btrfs_space_info * space_info; /* 224 8 */
>>> struct btrfs_free_space_ctl * free_space_ctl; /* 232 8 */
>>> struct rb_node cache_node __attribute__((__aligned__(8))); /* 240 24 */
>>> /* --- cacheline 4 boundary (256 bytes) was 8 bytes ago --- */
>>> struct list_head list; /* 264 16 */
>>> refcount_t refs __attribute__((__aligned__(4))); /* 280 4 */
>>> atomic_t frozen __attribute__((__aligned__(4))); /* 284 4 */
>>> struct list_head cluster_list; /* 288 16 */
>>> struct list_head bg_list; /* 304 16 */
>>> /* --- cacheline 5 boundary (320 bytes) --- */
>>> struct list_head ro_list; /* 320 16 */
>>> struct list_head discard_list; /* 336 16 */
>>> int discard_index; /* 352 4 */
>>> enum btrfs_discard_state discard_state; /* 356 4 */
>>> u64 discard_eligible_time; /* 360 8 */
>>> u64 discard_cursor; /* 368 8 */
>>> struct list_head dirty_list; /* 376 16 */
>>> /* --- cacheline 6 boundary (384 bytes) was 8 bytes ago --- */
>>> struct list_head io_list; /* 392 16 */
>>> struct btrfs_io_ctl io_ctl; /* 408 72 */
>>> /* --- cacheline 7 boundary (448 bytes) was 32 bytes ago --- */
>>> atomic_t reservations __attribute__((__aligned__(4))); /* 480 4 */
>>> atomic_t nocow_writers __attribute__((__aligned__(4))); /* 484 4 */
>>> struct mutex free_space_lock __attribute__((__aligned__(8))); /* 488 32 */
>>> /* --- cacheline 8 boundary (512 bytes) was 8 bytes ago --- */
>>> bool using_free_space_bitmaps; /* 520 1 */
>>> bool using_free_space_bitmaps_cached; /* 521 1 */
>>>
>>> /* XXX 2 bytes hole, try to pack */
>>> /* Bitfield combined with previous fields */
>>>
>>> static enum btrfs_block_group_size_class size_class; /* 0: 0 0 */
>>> int swap_extents; /* 524 4 */
>>> u64 alloc_offset; /* 528 8 */
>>> u64 zone_unusable; /* 536 8 */
>>> u64 zone_capacity; /* 544 8 */
>>> u64 meta_write_pointer; /* 552 8 */
>>> struct btrfs_chunk_map * physical_map; /* 560 8 */
>>> struct list_head active_bg_list; /* 568 16 */
>>> /* --- cacheline 9 boundary (576 bytes) was 8 bytes ago --- */
>>> struct work_struct zone_finish_work; /* 584 32 */
>>> struct extent_buffer * last_eb; /* 616 8 */
>>> u64 reclaim_mark; /* 624 8 */
>>>
>>> /* size: 632, cachelines: 10, members: 60, static members: 1 */
>>> /* sum members: 630, holes: 1, sum holes: 2 */
>>> /* sum bitfield members: 8 bits (1 bytes) */
>>> /* forced alignments: 8 */
>>> /* last cacheline: 56 bytes */
>>>
>>> /* BRAIN FART ALERT! 632 bytes != 630 (member bytes) + 8 (member bits) + 2 (byte holes) + 0 (bit holes), diff = -8 bits */
>>> } __attribute__((__aligned__(8)));
>>>
>>> Signed-off-by: Filipe Manana <fdmanana@suse.com>
>>> ---
>>> fs/btrfs/block-group.h | 33 ++++++++++++++++-----------------
>>> 1 file changed, 16 insertions(+), 17 deletions(-)
>>>
>>> diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
>>> index b414f4268d2d..60a3b1c0a8ab 100644
>>> --- a/fs/btrfs/block-group.h
>>> +++ b/fs/btrfs/block-group.h
>>> @@ -122,6 +122,7 @@ struct btrfs_block_group {
>>> struct btrfs_fs_info *fs_info;
>>> struct btrfs_inode *inode;
>>> spinlock_t lock;
>>> + unsigned int ro;
>>> u64 start;
>>> u64 length;
>>> u64 pinned;
>>> @@ -134,7 +135,8 @@ struct btrfs_block_group {
>>> u64 global_root_id;
>>> u64 remap_bytes;
>>> u32 identity_remap_count;
>>> -
>>> + /* The last commited identity_remap_count value of this block group. */
>>> + u32 last_identity_remap_count;
>>> /*
>>> * The last committed used bytes of this block group, if the above @used
>>> * is still the same as @last_used, we don't need to update block
>>> @@ -143,8 +145,6 @@ struct btrfs_block_group {
>>> u64 last_used;
>>> /* The last committed remap_bytes value of this block group. */
>>> u64 last_remap_bytes;
>>> - /* The last commited identity_remap_count value of this block group. */
>>> - u32 last_identity_remap_count;
>>> /* The last committed flags value for this block group. */
>>> u64 last_flags;
>>>
>>> @@ -171,8 +171,6 @@ struct btrfs_block_group {
>>> unsigned long full_stripe_len;
>>> unsigned long runtime_flags;
>>>
>>> - unsigned int ro;
>>> -
>>> int disk_cache_state;
>>>
>>> /* Cache tracking stuff */
>>> @@ -192,6 +190,16 @@ struct btrfs_block_group {
>>>
>>> refcount_t refs;
>>>
>>> + /*
>>> + * When non-zero it means the block group's logical address and its
>>> + * device extents can not be reused for future block group allocations
>>> + * until the counter goes down to 0. This is to prevent them from being
>>> + * reused while some task is still using the block group after it was
>>> + * deleted - we want to make sure they can only be reused for new block
>>> + * groups after that task is done with the deleted block group.
>>> + */
>>> + atomic_t frozen;
>>> +
>>> /*
>>> * List of struct btrfs_free_clusters for this block group.
>>> * Today it will only have one thing on it, but that may change
>>> @@ -211,22 +219,12 @@ struct btrfs_block_group {
>>> /* For read-only block groups */
>>> struct list_head ro_list;
>>>
>>> - /*
>>> - * When non-zero it means the block group's logical address and its
>>> - * device extents can not be reused for future block group allocations
>>> - * until the counter goes down to 0. This is to prevent them from being
>>> - * reused while some task is still using the block group after it was
>>> - * deleted - we want to make sure they can only be reused for new block
>>> - * groups after that task is done with the deleted block group.
>>> - */
>>> - atomic_t frozen;
>>> -
>>> /* For discard operations */
>>> struct list_head discard_list;
>>> int discard_index;
>>> + enum btrfs_discard_state discard_state;
>>> u64 discard_eligible_time;
>>> u64 discard_cursor;
>>> - enum btrfs_discard_state discard_state;
>>>
>>> /* For dirty block groups */
>>> struct list_head dirty_list;
>>> @@ -263,6 +261,8 @@ struct btrfs_block_group {
>>> /* Protected by @free_space_lock. */
>>> bool using_free_space_bitmaps_cached;
>>>
>>> + enum btrfs_block_group_size_class size_class:8;
>>> +
>>> /*
>>> * Number of extents in this block group used for swap files.
>>> * All accesses protected by the spinlock 'lock'.
>>> @@ -281,7 +281,6 @@ struct btrfs_block_group {
>>> struct list_head active_bg_list;
>>> struct work_struct zone_finish_work;
>>> struct extent_buffer *last_eb;
>>> - enum btrfs_block_group_size_class size_class;
>>> u64 reclaim_mark;
>>> };
>>>
>> I found that putting ro, cached and size_class in the same cache line
>> brought performance benefits for find_free_extent(). I cannot say how
>> important that is compared to other kinds of users.
> Can you be more specific?
>
> What are the performance benefits and what is the magnitude of improvement?
> Better throughput in some specific workload, lower latency, something else?
> Or was it micro benchmarking by checking cpu cache hit/miss ratio, the
> duration of find_free_extent() calls, or something else?
>
> What was the workload: a fio script/config, or something else?
>
> Such reordering requires separate action, a good justification, and a
> workload description with supporting numbers in the change log.
>
> Thanks.
Unfortunately I did not create specific benchmarks or keep track of
throughput improvements in a systematic way. I observed the improvements
by looking at perf traces of specific systems.
>
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v2 02/10] btrfs: reduce size of struct btrfs_block_group
2026-04-15 20:07 ` Martin Raiber
2026-04-16 10:30 ` Filipe Manana
@ 2026-04-17 2:09 ` David Sterba
1 sibling, 0 replies; 28+ messages in thread
From: David Sterba @ 2026-04-17 2:09 UTC (permalink / raw)
To: Martin Raiber; +Cc: fdmanana, linux-btrfs
On Wed, Apr 15, 2026 at 08:07:56PM +0000, Martin Raiber wrote:
> I found that putting ro, cached and size_class in the same cache line
> brought performance benefits for find_free_extent(). I cannot say how
> important that is compared to other kinds of users.
I don't see any obvious use pattern where grouping the class, ro and
cached would be beneficial. What usually works is when a lock and the
members used in the locked section are on the same cacheline. As the
block group is a big structure and used in different ways, there are
possibilities where the cacheline placement will have good or bad effects,
but without a reproducible or described workload it could be just a
random occurrence.
Specifically for find_free_extent, there could be some potential for
optimization but this again would need at least a perf profile to find a
starting point.
^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH v2 03/10] btrfs: use a kmem_cache for free space control structures
2026-04-15 18:44 ` [PATCH v2 00/10] btrfs: reduce memory consumption for block groups fdmanana
2026-04-15 18:44 ` [PATCH v2 01/10] btrfs: use a kmem_cache " fdmanana
2026-04-15 18:44 ` [PATCH v2 02/10] btrfs: reduce size of struct btrfs_block_group fdmanana
@ 2026-04-15 18:44 ` fdmanana
2026-04-15 18:44 ` [PATCH v2 04/10] btrfs: remove start field from struct btrfs_free_space_ctl fdmanana
` (7 subsequent siblings)
10 siblings, 0 replies; 28+ messages in thread
From: fdmanana @ 2026-04-15 18:44 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
We are currently allocating the free space control structures for block
groups using the generic slabs, and given that the size of the
btrfs_free_space_ctl structure is 152 bytes (on a release kernel), we end
up using the kmalloc-192 slab and therefore waste quite some memory since
on a 4K page system we can only fit 21 free space control structures per
page. These structures are allocated and deallocated every time we create
and remove block groups.
So use a kmem_cache for free space control structures; this way, on a 4K
page system, we can fit 26 structures per page instead of 21.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/block-group.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index a87f147aefa5..06c4d8777385 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -23,6 +23,7 @@
#include "extent-tree.h"
static struct kmem_cache *block_group_cache;
+static struct kmem_cache *free_space_ctl_cache;
int __init btrfs_init_block_group(void)
{
@@ -31,12 +32,22 @@ int __init btrfs_init_block_group(void)
0, 0, NULL);
if (!block_group_cache)
return -ENOMEM;
+
+ free_space_ctl_cache = kmem_cache_create("btrfs_free_space_ctl",
+ sizeof(struct btrfs_free_space_ctl),
+ 0, 0, NULL);
+ if (!free_space_ctl_cache) {
+ kmem_cache_destroy(block_group_cache);
+ return -ENOMEM;
+ }
+
return 0;
}
void __cold btrfs_exit_block_group(void)
{
kmem_cache_destroy(block_group_cache);
+ kmem_cache_destroy(free_space_ctl_cache);
}
#ifdef CONFIG_BTRFS_DEBUG
@@ -197,7 +208,7 @@ void btrfs_put_block_group(struct btrfs_block_group *cache)
btrfs_discard_cancel_work(&cache->fs_info->discard_ctl,
cache);
- kfree(cache->free_space_ctl);
+ kmem_cache_free(free_space_ctl_cache, cache->free_space_ctl);
btrfs_free_chunk_map(cache->physical_map);
kmem_cache_free(block_group_cache, cache);
}
@@ -2392,7 +2403,7 @@ static struct btrfs_block_group *btrfs_create_block_group(
if (!cache)
return NULL;
- cache->free_space_ctl = kzalloc_obj(*cache->free_space_ctl, GFP_NOFS);
+ cache->free_space_ctl = kmem_cache_zalloc(free_space_ctl_cache, GFP_NOFS);
if (!cache->free_space_ctl) {
kmem_cache_free(block_group_cache, cache);
return NULL;
--
2.47.2
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH v2 04/10] btrfs: remove start field from struct btrfs_free_space_ctl
2026-04-15 18:44 ` [PATCH v2 00/10] btrfs: reduce memory consumption for block groups fdmanana
` (2 preceding siblings ...)
2026-04-15 18:44 ` [PATCH v2 03/10] btrfs: use a kmem_cache for free space control structures fdmanana
@ 2026-04-15 18:44 ` fdmanana
2026-04-15 18:44 ` [PATCH v2 05/10] btrfs: remove unit " fdmanana
` (6 subsequent siblings)
10 siblings, 0 replies; 28+ messages in thread
From: fdmanana @ 2026-04-15 18:44 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
There's no need for the start field; we can take its value from the block
group. This reduces the structure size from 152 bytes down to 144 bytes,
so on a 4K page system we can now fit 28 structures per page instead of 26.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/free-space-cache.c | 9 ++++-----
fs/btrfs/free-space-cache.h | 1 -
2 files changed, 4 insertions(+), 6 deletions(-)
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index ab22e4f9ffdd..4f53e0908f18 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -1572,10 +1572,10 @@ static inline u64 offset_to_bitmap(struct btrfs_free_space_ctl *ctl,
u64 bytes_per_bitmap;
bytes_per_bitmap = BITS_PER_BITMAP * ctl->unit;
- bitmap_start = offset - ctl->start;
+ bitmap_start = offset - ctl->block_group->start;
bitmap_start = div64_u64(bitmap_start, bytes_per_bitmap);
bitmap_start *= bytes_per_bitmap;
- bitmap_start += ctl->start;
+ bitmap_start += ctl->block_group->start;
return bitmap_start;
}
@@ -2054,9 +2054,9 @@ find_free_space(struct btrfs_free_space_ctl *ctl, u64 *offset, u64 *bytes,
* to match our requested alignment
*/
if (*bytes >= align) {
- tmp = entry->offset - ctl->start + align - 1;
+ tmp = entry->offset - ctl->block_group->start + align - 1;
tmp = div64_u64(tmp, align);
- tmp = tmp * align + ctl->start;
+ tmp = tmp * align + ctl->block_group->start;
align_off = tmp - entry->offset;
} else {
align_off = 0;
@@ -2951,7 +2951,6 @@ void btrfs_init_free_space_ctl(struct btrfs_block_group *block_group,
spin_lock_init(&ctl->tree_lock);
ctl->unit = fs_info->sectorsize;
- ctl->start = block_group->start;
ctl->block_group = block_group;
ctl->op = &free_space_op;
ctl->free_space_bytes = RB_ROOT_CACHED;
diff --git a/fs/btrfs/free-space-cache.h b/fs/btrfs/free-space-cache.h
index 33fc3b245648..e75482fb2d69 100644
--- a/fs/btrfs/free-space-cache.h
+++ b/fs/btrfs/free-space-cache.h
@@ -82,7 +82,6 @@ struct btrfs_free_space_ctl {
int free_extents;
int total_bitmaps;
int unit;
- u64 start;
s32 discardable_extents[BTRFS_STAT_NR_ENTRIES];
s64 discardable_bytes[BTRFS_STAT_NR_ENTRIES];
const struct btrfs_free_space_op *op;
--
2.47.2
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH v2 05/10] btrfs: remove unit field from struct btrfs_free_space_ctl
2026-04-15 18:44 ` [PATCH v2 00/10] btrfs: reduce memory consumption for block groups fdmanana
` (3 preceding siblings ...)
2026-04-15 18:44 ` [PATCH v2 04/10] btrfs: remove start field from struct btrfs_free_space_ctl fdmanana
@ 2026-04-15 18:44 ` fdmanana
2026-04-15 18:44 ` [PATCH v2 06/10] btrfs: reduce size of " fdmanana
` (5 subsequent siblings)
10 siblings, 0 replies; 28+ messages in thread
From: fdmanana @ 2026-04-15 18:44 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
The unit field always has a value matching the sector size, and since we
have a block group pointer in the structure, we can access the block group
and then its fs_info field to get to the sector size. So remove the field,
which will later allow us to shrink the structure size.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/free-space-cache.c | 95 ++++++++++++++++++++-----------------
fs/btrfs/free-space-cache.h | 1 -
2 files changed, 52 insertions(+), 44 deletions(-)
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 4f53e0908f18..9b33f68f43ec 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -690,11 +690,12 @@ static int io_ctl_read_bitmap(struct btrfs_io_ctl *io_ctl,
static void recalculate_thresholds(struct btrfs_free_space_ctl *ctl)
{
struct btrfs_block_group *block_group = ctl->block_group;
+ const int unit = block_group->fs_info->sectorsize;
u64 max_bytes;
u64 bitmap_bytes;
u64 extent_bytes;
u64 size = block_group->length;
- u64 bytes_per_bg = BITS_PER_BITMAP * ctl->unit;
+ u64 bytes_per_bg = BITS_PER_BITMAP * unit;
u64 max_bitmaps = div64_u64(size + bytes_per_bg - 1, bytes_per_bg);
max_bitmaps = max_t(u64, max_bitmaps, 1);
@@ -703,7 +704,7 @@ static void recalculate_thresholds(struct btrfs_free_space_ctl *ctl)
btrfs_err(block_group->fs_info,
"invalid free space control: bg start=%llu len=%llu total_bitmaps=%u unit=%u max_bitmaps=%llu bytes_per_bg=%llu",
block_group->start, block_group->length,
- ctl->total_bitmaps, ctl->unit, max_bitmaps,
+ ctl->total_bitmaps, unit, max_bitmaps,
bytes_per_bg);
ASSERT(ctl->total_bitmaps <= max_bitmaps);
@@ -718,7 +719,7 @@ static void recalculate_thresholds(struct btrfs_free_space_ctl *ctl)
else
max_bytes = MAX_CACHE_BYTES_PER_GIG * div_u64(size, SZ_1G);
- bitmap_bytes = ctl->total_bitmaps * ctl->unit;
+ bitmap_bytes = ctl->total_bitmaps * unit;
/*
* we want the extent entry threshold to always be at most 1/2 the max
@@ -916,7 +917,7 @@ static int copy_free_space_cache(struct btrfs_block_group *block_group,
spin_lock(&ctl->tree_lock);
} else {
u64 offset = info->offset;
- u64 bytes = ctl->unit;
+ u64 bytes = ctl->block_group->fs_info->sectorsize;
ret = search_bitmap(ctl, info, &offset, &bytes, false);
if (ret == 0) {
@@ -1571,7 +1572,7 @@ static inline u64 offset_to_bitmap(struct btrfs_free_space_ctl *ctl,
u64 bitmap_start;
u64 bytes_per_bitmap;
- bytes_per_bitmap = BITS_PER_BITMAP * ctl->unit;
+ bytes_per_bitmap = BITS_PER_BITMAP * ctl->block_group->fs_info->sectorsize;
bitmap_start = offset - ctl->block_group->start;
bitmap_start = div64_u64(bitmap_start, bytes_per_bitmap);
bitmap_start *= bytes_per_bitmap;
@@ -1702,6 +1703,7 @@ tree_search_offset(struct btrfs_free_space_ctl *ctl,
{
struct rb_node *n = ctl->free_space_offset.rb_node;
struct btrfs_free_space *entry = NULL, *prev = NULL;
+ const int unit = ctl->block_group->fs_info->sectorsize;
lockdep_assert_held(&ctl->tree_lock);
@@ -1785,7 +1787,7 @@ tree_search_offset(struct btrfs_free_space_ctl *ctl,
prev->offset + prev->bytes > offset)
return prev;
}
- if (entry->offset + BITS_PER_BITMAP * ctl->unit > offset)
+ if (entry->offset + BITS_PER_BITMAP * unit > offset)
return entry;
} else if (entry->offset + entry->bytes > offset)
return entry;
@@ -1799,8 +1801,7 @@ tree_search_offset(struct btrfs_free_space_ctl *ctl,
return NULL;
entry = rb_entry(n, struct btrfs_free_space, offset_index);
if (entry->bitmap) {
- if (entry->offset + BITS_PER_BITMAP *
- ctl->unit > offset)
+ if (entry->offset + BITS_PER_BITMAP * unit > offset)
break;
} else {
if (entry->offset + entry->bytes > offset)
@@ -1875,18 +1876,19 @@ static inline void bitmap_clear_bits(struct btrfs_free_space_ctl *ctl,
struct btrfs_free_space *info,
u64 offset, u64 bytes, bool update_stat)
{
+ const int unit = ctl->block_group->fs_info->sectorsize;
unsigned long start, count, end;
int extent_delta = -1;
- start = offset_to_bit(info->offset, ctl->unit, offset);
- count = bytes_to_bits(bytes, ctl->unit);
+ start = offset_to_bit(info->offset, unit, offset);
+ count = bytes_to_bits(bytes, unit);
end = start + count;
ASSERT(end <= BITS_PER_BITMAP);
bitmap_clear(info->bitmap, start, count);
info->bytes -= bytes;
- if (info->max_extent_size > ctl->unit)
+ if (info->max_extent_size > unit)
info->max_extent_size = 0;
relink_bitmap_entry(ctl, info);
@@ -1911,11 +1913,12 @@ static void btrfs_bitmap_set_bits(struct btrfs_free_space_ctl *ctl,
struct btrfs_free_space *info, u64 offset,
u64 bytes)
{
+ const int unit = ctl->block_group->fs_info->sectorsize;
unsigned long start, count, end;
int extent_delta = 1;
- start = offset_to_bit(info->offset, ctl->unit, offset);
- count = bytes_to_bits(bytes, ctl->unit);
+ start = offset_to_bit(info->offset, unit, offset);
+ count = bytes_to_bits(bytes, unit);
end = start + count;
ASSERT(end <= BITS_PER_BITMAP);
@@ -1952,6 +1955,7 @@ static int search_bitmap(struct btrfs_free_space_ctl *ctl,
struct btrfs_free_space *bitmap_info, u64 *offset,
u64 *bytes, bool for_alloc)
{
+ const int unit = ctl->block_group->fs_info->sectorsize;
unsigned long found_bits = 0;
unsigned long max_bits = 0;
unsigned long bits, i;
@@ -1969,9 +1973,9 @@ static int search_bitmap(struct btrfs_free_space_ctl *ctl,
return -1;
}
- i = offset_to_bit(bitmap_info->offset, ctl->unit,
+ i = offset_to_bit(bitmap_info->offset, unit,
max_t(u64, *offset, bitmap_info->offset));
- bits = bytes_to_bits(*bytes, ctl->unit);
+ bits = bytes_to_bits(*bytes, unit);
for_each_set_bit_from(i, bitmap_info->bitmap, BITS_PER_BITMAP) {
if (for_alloc && bits == 1) {
@@ -1991,12 +1995,12 @@ static int search_bitmap(struct btrfs_free_space_ctl *ctl,
}
if (found_bits) {
- *offset = (u64)(i * ctl->unit) + bitmap_info->offset;
- *bytes = (u64)(found_bits) * ctl->unit;
+ *offset = (u64)(i * unit) + bitmap_info->offset;
+ *bytes = (u64)(found_bits) * unit;
return 0;
}
- *bytes = (u64)(max_bits) * ctl->unit;
+ *bytes = (u64)(max_bits) * unit;
bitmap_info->max_extent_size = *bytes;
relink_bitmap_entry(ctl, bitmap_info);
return -1;
@@ -2148,12 +2152,13 @@ static noinline int remove_from_bitmap(struct btrfs_free_space_ctl *ctl,
struct btrfs_free_space *bitmap_info,
u64 *offset, u64 *bytes)
{
+ const int unit = ctl->block_group->fs_info->sectorsize;
u64 end;
u64 search_start, search_bytes;
int ret;
again:
- end = bitmap_info->offset + (u64)(BITS_PER_BITMAP * ctl->unit) - 1;
+ end = bitmap_info->offset + (u64)(BITS_PER_BITMAP * unit) - 1;
/*
* We need to search for bits in this bitmap. We could only cover some
@@ -2162,7 +2167,7 @@ static noinline int remove_from_bitmap(struct btrfs_free_space_ctl *ctl,
* go searching for the next bit.
*/
search_start = *offset;
- search_bytes = ctl->unit;
+ search_bytes = unit;
search_bytes = min(search_bytes, end - search_start + 1);
ret = search_bitmap(ctl, bitmap_info, &search_start, &search_bytes,
false);
@@ -2208,7 +2213,7 @@ static noinline int remove_from_bitmap(struct btrfs_free_space_ctl *ctl,
* everything over again.
*/
search_start = *offset;
- search_bytes = ctl->unit;
+ search_bytes = unit;
ret = search_bitmap(ctl, bitmap_info, &search_start,
&search_bytes, false);
if (ret < 0 || search_start != *offset)
@@ -2225,6 +2230,7 @@ static u64 add_bytes_to_bitmap(struct btrfs_free_space_ctl *ctl,
struct btrfs_free_space *info, u64 offset,
u64 bytes, enum btrfs_trim_state trim_state)
{
+ const int unit = ctl->block_group->fs_info->sectorsize;
u64 bytes_to_set = 0;
u64 end;
@@ -2241,7 +2247,7 @@ static u64 add_bytes_to_bitmap(struct btrfs_free_space_ctl *ctl,
info->trim_state = BTRFS_TRIM_STATE_UNTRIMMED;
}
- end = info->offset + (u64)(BITS_PER_BITMAP * ctl->unit);
+ end = info->offset + (u64)(BITS_PER_BITMAP * unit);
bytes_to_set = min(end - offset, bytes);
@@ -2295,7 +2301,7 @@ static bool use_bitmap(struct btrfs_free_space_ctl *ctl,
* so allow those block groups to still be allowed to have a bitmap
* entry.
*/
- if (((BITS_PER_BITMAP * ctl->unit) >> 1) > block_group->length)
+ if (((BITS_PER_BITMAP * fs_info->sectorsize) >> 1) > block_group->length)
return false;
return true;
@@ -2494,6 +2500,7 @@ static bool steal_from_bitmap_to_end(struct btrfs_free_space_ctl *ctl,
struct btrfs_free_space *info,
bool update_stat)
{
+ const int unit = ctl->block_group->fs_info->sectorsize;
struct btrfs_free_space *bitmap;
unsigned long i;
unsigned long j;
@@ -2505,11 +2512,11 @@ static bool steal_from_bitmap_to_end(struct btrfs_free_space_ctl *ctl,
if (!bitmap)
return false;
- i = offset_to_bit(bitmap->offset, ctl->unit, end);
+ i = offset_to_bit(bitmap->offset, unit, end);
j = find_next_zero_bit(bitmap->bitmap, BITS_PER_BITMAP, i);
if (j == i)
return false;
- bytes = (j - i) * ctl->unit;
+ bytes = (j - i) * unit;
info->bytes += bytes;
/* See try_merge_free_space() comment. */
@@ -2528,6 +2535,7 @@ static bool steal_from_bitmap_to_front(struct btrfs_free_space_ctl *ctl,
struct btrfs_free_space *info,
bool update_stat)
{
+ const int unit = ctl->block_group->fs_info->sectorsize;
struct btrfs_free_space *bitmap;
u64 bitmap_offset;
unsigned long i;
@@ -2547,7 +2555,7 @@ static bool steal_from_bitmap_to_front(struct btrfs_free_space_ctl *ctl,
if (!bitmap)
return false;
- i = offset_to_bit(bitmap->offset, ctl->unit, info->offset) - 1;
+ i = offset_to_bit(bitmap->offset, unit, info->offset) - 1;
j = 0;
prev_j = (unsigned long)-1;
for_each_clear_bit_from(j, bitmap->bitmap, BITS_PER_BITMAP) {
@@ -2559,9 +2567,9 @@ static bool steal_from_bitmap_to_front(struct btrfs_free_space_ctl *ctl,
return false;
if (prev_j == (unsigned long)-1)
- bytes = (i + 1) * ctl->unit;
+ bytes = (i + 1) * unit;
else
- bytes = (i - prev_j) * ctl->unit;
+ bytes = (i - prev_j) * unit;
info->offset -= bytes;
info->bytes += bytes;
@@ -2947,10 +2955,7 @@ void btrfs_dump_free_space(struct btrfs_block_group *block_group,
void btrfs_init_free_space_ctl(struct btrfs_block_group *block_group,
struct btrfs_free_space_ctl *ctl)
{
- struct btrfs_fs_info *fs_info = block_group->fs_info;
-
spin_lock_init(&ctl->tree_lock);
- ctl->unit = fs_info->sectorsize;
ctl->block_group = block_group;
ctl->op = &free_space_op;
ctl->free_space_bytes = RB_ROOT_CACHED;
@@ -3326,6 +3331,7 @@ static int btrfs_bitmap_cluster(struct btrfs_block_group *block_group,
u64 cont1_bytes, u64 min_bytes)
{
struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl;
+ const int unit = block_group->fs_info->sectorsize;
unsigned long next_zero;
unsigned long i;
unsigned long want_bits;
@@ -3338,10 +3344,10 @@ static int btrfs_bitmap_cluster(struct btrfs_block_group *block_group,
lockdep_assert_held(&ctl->tree_lock);
- i = offset_to_bit(entry->offset, ctl->unit,
+ i = offset_to_bit(entry->offset, unit,
max_t(u64, offset, entry->offset));
- want_bits = bytes_to_bits(bytes, ctl->unit);
- min_bits = bytes_to_bits(min_bytes, ctl->unit);
+ want_bits = bytes_to_bits(bytes, unit);
+ min_bits = bytes_to_bits(min_bytes, unit);
/*
* Don't bother looking for a cluster in this bitmap if it's heavily
@@ -3367,7 +3373,7 @@ static int btrfs_bitmap_cluster(struct btrfs_block_group *block_group,
}
if (!found_bits) {
- entry->max_extent_size = (u64)max_bits * ctl->unit;
+ entry->max_extent_size = (u64)max_bits * unit;
return -ENOSPC;
}
@@ -3378,15 +3384,15 @@ static int btrfs_bitmap_cluster(struct btrfs_block_group *block_group,
total_found += found_bits;
- if (cluster->max_size < found_bits * ctl->unit)
- cluster->max_size = found_bits * ctl->unit;
+ if (cluster->max_size < found_bits * unit)
+ cluster->max_size = found_bits * unit;
if (total_found < want_bits || cluster->max_size < cont1_bytes) {
i = next_zero + 1;
goto again;
}
- cluster->window_start = start * ctl->unit + entry->offset;
+ cluster->window_start = start * unit + entry->offset;
rb_erase(&entry->offset_index, &ctl->free_space_offset);
rb_erase_cached(&entry->bytes_index, &ctl->free_space_bytes);
@@ -3402,8 +3408,7 @@ static int btrfs_bitmap_cluster(struct btrfs_block_group *block_group,
ret = tree_insert_offset(ctl, cluster, entry);
ASSERT(!ret); /* -EEXIST; Logic error */
- trace_btrfs_setup_cluster(block_group, cluster,
- total_found * ctl->unit, 1);
+ trace_btrfs_setup_cluster(block_group, cluster, total_found * unit, 1);
return 0;
}
@@ -4043,7 +4048,9 @@ static int trim_bitmaps(struct btrfs_block_group *block_group,
}
next:
if (next_bitmap) {
- offset += BITS_PER_BITMAP * ctl->unit;
+ const int unit = block_group->fs_info->sectorsize;
+
+ offset += BITS_PER_BITMAP * unit;
start = offset;
} else {
start += bytes;
@@ -4070,6 +4077,7 @@ int btrfs_trim_block_group(struct btrfs_block_group *block_group,
u64 *trimmed, u64 start, u64 end, u64 minlen)
{
struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl;
+ const int unit = block_group->fs_info->sectorsize;
int ret;
u64 rem = 0;
@@ -4090,7 +4098,7 @@ int btrfs_trim_block_group(struct btrfs_block_group *block_group,
goto out;
ret = trim_bitmaps(block_group, trimmed, start, end, minlen, 0, false);
- div64_u64_rem(end, BITS_PER_BITMAP * ctl->unit, &rem);
+ div64_u64_rem(end, BITS_PER_BITMAP * unit, &rem);
/* If we ended in the middle of a bitmap, reset the trimming flag */
if (rem)
reset_trimming_bitmap(ctl, offset_to_bitmap(ctl, end));
@@ -4309,6 +4317,7 @@ int test_check_exists(struct btrfs_block_group *cache,
u64 offset, u64 bytes)
{
struct btrfs_free_space_ctl *ctl = cache->free_space_ctl;
+ const int unit = cache->fs_info->sectorsize;
struct btrfs_free_space *info;
int ret = 0;
@@ -4328,7 +4337,7 @@ int test_check_exists(struct btrfs_block_group *cache,
struct btrfs_free_space *tmp;
bit_off = offset;
- bit_bytes = ctl->unit;
+ bit_bytes = unit;
ret = search_bitmap(ctl, info, &bit_off, &bit_bytes, false);
if (!ret) {
if (bit_off == offset) {
diff --git a/fs/btrfs/free-space-cache.h b/fs/btrfs/free-space-cache.h
index e75482fb2d69..2ee0b876723a 100644
--- a/fs/btrfs/free-space-cache.h
+++ b/fs/btrfs/free-space-cache.h
@@ -81,7 +81,6 @@ struct btrfs_free_space_ctl {
int extents_thresh;
int free_extents;
int total_bitmaps;
- int unit;
s32 discardable_extents[BTRFS_STAT_NR_ENTRIES];
s64 discardable_bytes[BTRFS_STAT_NR_ENTRIES];
const struct btrfs_free_space_op *op;
--
2.47.2
* [PATCH v2 06/10] btrfs: reduce size of struct btrfs_free_space_ctl
2026-04-15 18:44 ` [PATCH v2 00/10] btrfs: reduce memory consumption for block groups fdmanana
` (4 preceding siblings ...)
2026-04-15 18:44 ` [PATCH v2 05/10] btrfs: remove unit " fdmanana
@ 2026-04-15 18:44 ` fdmanana
2026-04-15 18:44 ` [PATCH v2 07/10] btrfs: remove op field from " fdmanana
` (4 subsequent siblings)
10 siblings, 0 replies; 28+ messages in thread
From: fdmanana @ 2026-04-15 18:44 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
We have a 4 byte hole in the structure, so reorder some fields to
eliminate the hole and reduce the structure size from 144 bytes down to
136 bytes. This way, on a 4K page system, we can fit 30 structures per
page instead of 28.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/free-space-cache.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/btrfs/free-space-cache.h b/fs/btrfs/free-space-cache.h
index 2ee0b876723a..c5d0f3e056cc 100644
--- a/fs/btrfs/free-space-cache.h
+++ b/fs/btrfs/free-space-cache.h
@@ -74,13 +74,13 @@ enum {
};
struct btrfs_free_space_ctl {
- spinlock_t tree_lock;
struct rb_root free_space_offset;
struct rb_root_cached free_space_bytes;
- u64 free_space;
+ spinlock_t tree_lock;
int extents_thresh;
int free_extents;
int total_bitmaps;
+ u64 free_space;
s32 discardable_extents[BTRFS_STAT_NR_ENTRIES];
s64 discardable_bytes[BTRFS_STAT_NR_ENTRIES];
const struct btrfs_free_space_op *op;
--
2.47.2
* [PATCH v2 07/10] btrfs: remove op field from struct btrfs_free_space_ctl
2026-04-15 18:44 ` [PATCH v2 00/10] btrfs: reduce memory consumption for block groups fdmanana
` (5 preceding siblings ...)
2026-04-15 18:44 ` [PATCH v2 06/10] btrfs: reduce size of " fdmanana
@ 2026-04-15 18:44 ` fdmanana
2026-04-15 18:44 ` [PATCH v2 08/10] btrfs: remove block group argument from copy_free_space_cache() fdmanana
` (3 subsequent siblings)
10 siblings, 0 replies; 28+ messages in thread
From: fdmanana @ 2026-04-15 18:44 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
The op field always points to the same use_bitmap function; the only
exception is during self tests, where we temporarily make it point to a
different function. So just because of this op pointer field we increase
the structure size by 8 bytes.
Instead of storing a pointer to a use_bitmap function in struct
btrfs_free_space_ctl, move the pointer to struct btrfs_fs_info, make
insert_into_bitmap() use that pointer if we are running the self tests,
and initialize that pointer to the current, default use_bitmap function
(now exported for the tests as btrfs_use_bitmap). This way we reduce
the size of struct btrfs_free_space_ctl from 136 to 128 bytes and can
now fit 32 structures in a 4K page instead of 30. This also avoids the
cost of the indirect function pointer call when we are not running the
self tests.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/free-space-cache.c | 24 +++++++++++-------------
fs/btrfs/free-space-cache.h | 8 ++------
fs/btrfs/fs.h | 7 +++++++
fs/btrfs/tests/btrfs-tests.c | 1 +
fs/btrfs/tests/free-space-tests.c | 24 ++++++++++--------------
5 files changed, 31 insertions(+), 33 deletions(-)
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 9b33f68f43ec..0d8bc55449b0 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -2257,7 +2257,8 @@ static u64 add_bytes_to_bitmap(struct btrfs_free_space_ctl *ctl,
}
-static bool use_bitmap(struct btrfs_free_space_ctl *ctl,
+EXPORT_FOR_TESTS
+bool btrfs_use_bitmap(struct btrfs_free_space_ctl *ctl,
struct btrfs_free_space *info)
{
struct btrfs_block_group *block_group = ctl->block_group;
@@ -2307,15 +2308,11 @@ static bool use_bitmap(struct btrfs_free_space_ctl *ctl,
return true;
}
-static const struct btrfs_free_space_op free_space_op = {
- .use_bitmap = use_bitmap,
-};
-
static int insert_into_bitmap(struct btrfs_free_space_ctl *ctl,
struct btrfs_free_space *info)
{
struct btrfs_free_space *bitmap_info;
- struct btrfs_block_group *block_group = NULL;
+ struct btrfs_block_group *block_group = ctl->block_group;
int added = 0;
u64 bytes, offset, bytes_added;
enum btrfs_trim_state trim_state;
@@ -2325,18 +2322,20 @@ static int insert_into_bitmap(struct btrfs_free_space_ctl *ctl,
offset = info->offset;
trim_state = info->trim_state;
- if (!ctl->op->use_bitmap(ctl, info))
- return 0;
-
- if (ctl->op == &free_space_op)
- block_group = ctl->block_group;
+ if (btrfs_is_testing(block_group->fs_info)) {
+ if (!block_group->fs_info->use_bitmap(ctl, info))
+ return 0;
+ } else {
+ if (!btrfs_use_bitmap(ctl, info))
+ return 0;
+ }
again:
/*
* Since we link bitmaps right into the cluster we need to see if we
* have a cluster here, and if so and it has our bitmap we need to add
* the free space to that bitmap.
*/
- if (block_group && !list_empty(&block_group->cluster_list)) {
+ if (!list_empty(&block_group->cluster_list)) {
struct btrfs_free_cluster *cluster;
struct rb_node *node;
struct btrfs_free_space *entry;
@@ -2957,7 +2956,6 @@ void btrfs_init_free_space_ctl(struct btrfs_block_group *block_group,
{
spin_lock_init(&ctl->tree_lock);
ctl->block_group = block_group;
- ctl->op = &free_space_op;
ctl->free_space_bytes = RB_ROOT_CACHED;
INIT_LIST_HEAD(&ctl->trimming_ranges);
mutex_init(&ctl->cache_writeout_mutex);
diff --git a/fs/btrfs/free-space-cache.h b/fs/btrfs/free-space-cache.h
index c5d0f3e056cc..53fe8e293af1 100644
--- a/fs/btrfs/free-space-cache.h
+++ b/fs/btrfs/free-space-cache.h
@@ -83,17 +83,11 @@ struct btrfs_free_space_ctl {
u64 free_space;
s32 discardable_extents[BTRFS_STAT_NR_ENTRIES];
s64 discardable_bytes[BTRFS_STAT_NR_ENTRIES];
- const struct btrfs_free_space_op *op;
struct btrfs_block_group *block_group;
struct mutex cache_writeout_mutex;
struct list_head trimming_ranges;
};
-struct btrfs_free_space_op {
- bool (*use_bitmap)(struct btrfs_free_space_ctl *ctl,
- struct btrfs_free_space *info);
-};
-
struct btrfs_io_ctl {
void *cur, *orig;
struct page *page;
@@ -170,6 +164,8 @@ bool btrfs_free_space_cache_v1_active(struct btrfs_fs_info *fs_info);
int btrfs_set_free_space_cache_v1_active(struct btrfs_fs_info *fs_info, bool active);
/* Support functions for running our sanity tests */
#ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
+bool btrfs_use_bitmap(struct btrfs_free_space_ctl *ctl,
+ struct btrfs_free_space *info);
int test_add_free_space_entry(struct btrfs_block_group *cache,
u64 offset, u64 bytes, bool bitmap);
int test_check_exists(struct btrfs_block_group *cache, u64 offset, u64 bytes);
diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
index 2c1626155645..64a4da4f7f42 100644
--- a/fs/btrfs/fs.h
+++ b/fs/btrfs/fs.h
@@ -485,6 +485,9 @@ struct btrfs_delayed_root {
wait_queue_head_t wait;
};
+struct btrfs_free_space_ctl;
+struct btrfs_free_space;
+
struct btrfs_fs_info {
u8 chunk_tree_uuid[BTRFS_UUID_SIZE];
unsigned long flags;
@@ -957,6 +960,10 @@ struct btrfs_fs_info {
spinlock_t eb_leak_lock;
struct list_head allocated_ebs;
#endif
+
+ /* Used by self tests only. */
+ bool (*use_bitmap)(struct btrfs_free_space_ctl *ctl,
+ struct btrfs_free_space *info);
};
#define folio_to_inode(_folio) (BTRFS_I(_Generic((_folio), \
diff --git a/fs/btrfs/tests/btrfs-tests.c b/fs/btrfs/tests/btrfs-tests.c
index 19c127ac6d10..6287d940323d 100644
--- a/fs/btrfs/tests/btrfs-tests.c
+++ b/fs/btrfs/tests/btrfs-tests.c
@@ -145,6 +145,7 @@ struct btrfs_fs_info *btrfs_alloc_dummy_fs_info(u32 nodesize, u32 sectorsize)
fs_info->csum_size = 4;
fs_info->csums_per_leaf = BTRFS_MAX_ITEM_SIZE(fs_info) /
fs_info->csum_size;
+ fs_info->use_bitmap = btrfs_use_bitmap;
set_bit(BTRFS_FS_STATE_DUMMY_FS_INFO, &fs_info->fs_state);
test_mnt->mnt_sb->s_fs_info = fs_info;
diff --git a/fs/btrfs/tests/free-space-tests.c b/fs/btrfs/tests/free-space-tests.c
index ebf68fcd2149..0425b3b68716 100644
--- a/fs/btrfs/tests/free-space-tests.c
+++ b/fs/btrfs/tests/free-space-tests.c
@@ -398,10 +398,8 @@ test_steal_space_from_bitmap_to_extent(struct btrfs_block_group *cache,
int ret;
u64 offset;
u64 max_extent_size;
- const struct btrfs_free_space_op test_free_space_ops = {
- .use_bitmap = test_use_bitmap,
- };
- const struct btrfs_free_space_op *orig_free_space_ops;
+ bool (*orig_use_bitmap)(struct btrfs_free_space_ctl *ctl,
+ struct btrfs_free_space *info);
test_msg("running space stealing from bitmap to extent tests");
@@ -423,8 +421,8 @@ test_steal_space_from_bitmap_to_extent(struct btrfs_block_group *cache,
* that forces use of bitmaps as soon as we have at least 1
* extent entry.
*/
- orig_free_space_ops = cache->free_space_ctl->op;
- cache->free_space_ctl->op = &test_free_space_ops;
+ orig_use_bitmap = cache->fs_info->use_bitmap;
+ cache->fs_info->use_bitmap = test_use_bitmap;
/*
* Extent entry covering free space range [128Mb - 256Kb, 128Mb - 128Kb[
@@ -818,7 +816,7 @@ test_steal_space_from_bitmap_to_extent(struct btrfs_block_group *cache,
if (ret)
return ret;
- cache->free_space_ctl->op = orig_free_space_ops;
+ cache->fs_info->use_bitmap = orig_use_bitmap;
btrfs_remove_free_space_cache(cache);
return 0;
@@ -832,10 +830,8 @@ static bool bytes_index_use_bitmap(struct btrfs_free_space_ctl *ctl,
static int test_bytes_index(struct btrfs_block_group *cache, u32 sectorsize)
{
- const struct btrfs_free_space_op test_free_space_ops = {
- .use_bitmap = bytes_index_use_bitmap,
- };
- const struct btrfs_free_space_op *orig_free_space_ops;
+ bool (*orig_use_bitmap)(struct btrfs_free_space_ctl *ctl,
+ struct btrfs_free_space *info);
struct btrfs_free_space_ctl *ctl = cache->free_space_ctl;
struct btrfs_free_space *entry;
struct rb_node *node;
@@ -892,8 +888,8 @@ static int test_bytes_index(struct btrfs_block_group *cache, u32 sectorsize)
/* Now validate bitmaps with different ->max_extent_size. */
btrfs_remove_free_space_cache(cache);
- orig_free_space_ops = cache->free_space_ctl->op;
- cache->free_space_ctl->op = &test_free_space_ops;
+ orig_use_bitmap = cache->fs_info->use_bitmap;
+ cache->fs_info->use_bitmap = bytes_index_use_bitmap;
ret = test_add_free_space_entry(cache, 0, sectorsize, 1);
if (ret) {
@@ -997,7 +993,7 @@ static int test_bytes_index(struct btrfs_block_group *cache, u32 sectorsize)
return -EINVAL;
}
- cache->free_space_ctl->op = orig_free_space_ops;
+ cache->fs_info->use_bitmap = orig_use_bitmap;
btrfs_remove_free_space_cache(cache);
return 0;
}
--
2.47.2
* [PATCH v2 08/10] btrfs: remove block group argument from copy_free_space_cache()
2026-04-15 18:44 ` [PATCH v2 00/10] btrfs: reduce memory consumption for block groups fdmanana
` (6 preceding siblings ...)
2026-04-15 18:44 ` [PATCH v2 07/10] btrfs: remove op field from " fdmanana
@ 2026-04-15 18:44 ` fdmanana
2026-04-15 18:44 ` [PATCH v2 09/10] btrfs: remove unnecessary ctl argument from __btrfs_write_out_cache() fdmanana
` (2 subsequent siblings)
10 siblings, 0 replies; 28+ messages in thread
From: fdmanana @ 2026-04-15 18:44 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
It's not necessary since we can get the block group from the given
free space control structure.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/free-space-cache.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 0d8bc55449b0..5ede85cf6c69 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -897,8 +897,7 @@ static int __load_free_space_cache(struct btrfs_root *root, struct inode *inode,
goto out;
}
-static int copy_free_space_cache(struct btrfs_block_group *block_group,
- struct btrfs_free_space_ctl *ctl)
+static int copy_free_space_cache(struct btrfs_free_space_ctl *ctl)
{
struct btrfs_free_space *info;
struct rb_node *n;
@@ -913,7 +912,7 @@ static int copy_free_space_cache(struct btrfs_block_group *block_group,
unlink_free_space(ctl, info, true);
spin_unlock(&ctl->tree_lock);
kmem_cache_free(btrfs_free_space_cachep, info);
- ret = btrfs_add_free_space(block_group, offset, bytes);
+ ret = btrfs_add_free_space(ctl->block_group, offset, bytes);
spin_lock(&ctl->tree_lock);
} else {
u64 offset = info->offset;
@@ -923,7 +922,7 @@ static int copy_free_space_cache(struct btrfs_block_group *block_group,
if (ret == 0) {
bitmap_clear_bits(ctl, info, offset, bytes, true);
spin_unlock(&ctl->tree_lock);
- ret = btrfs_add_free_space(block_group, offset,
+ ret = btrfs_add_free_space(ctl->block_group, offset,
bytes);
spin_lock(&ctl->tree_lock);
} else {
@@ -1026,7 +1025,7 @@ int load_free_space_cache(struct btrfs_block_group *block_group)
if (matched) {
spin_lock(&tmp_ctl.tree_lock);
- ret = copy_free_space_cache(block_group, &tmp_ctl);
+ ret = copy_free_space_cache(&tmp_ctl);
spin_unlock(&tmp_ctl.tree_lock);
/*
* ret == 1 means we successfully loaded the free space cache,
--
2.47.2
* [PATCH v2 09/10] btrfs: remove unnecessary ctl argument from __btrfs_write_out_cache()
2026-04-15 18:44 ` [PATCH v2 00/10] btrfs: reduce memory consumption for block groups fdmanana
` (7 preceding siblings ...)
2026-04-15 18:44 ` [PATCH v2 08/10] btrfs: remove block group argument from copy_free_space_cache() fdmanana
@ 2026-04-15 18:44 ` fdmanana
2026-04-15 18:44 ` [PATCH v2 10/10] btrfs: remove unnecessary ctl argument from write_cache_extent_entries() fdmanana
2026-04-17 2:16 ` [PATCH v2 00/10] btrfs: reduce memory consumption for block groups David Sterba
10 siblings, 0 replies; 28+ messages in thread
From: fdmanana @ 2026-04-15 18:44 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
We can get the free space control structure from the given block group,
so there is no need to pass it as an argument.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/free-space-cache.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 5ede85cf6c69..76d4b7eea746 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -1367,10 +1367,10 @@ int btrfs_wait_cache_io(struct btrfs_trans_handle *trans,
* or an errno if it was not.
*/
static int __btrfs_write_out_cache(struct inode *inode,
- struct btrfs_free_space_ctl *ctl,
struct btrfs_block_group *block_group,
struct btrfs_trans_handle *trans)
{
+ struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl;
struct btrfs_io_ctl *io_ctl = &block_group->io_ctl;
struct extent_state *cached_state = NULL;
LIST_HEAD(bitmap_list);
@@ -1516,7 +1516,6 @@ int btrfs_write_out_cache(struct btrfs_trans_handle *trans,
struct btrfs_path *path)
{
struct btrfs_fs_info *fs_info = trans->fs_info;
- struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl;
struct inode *inode;
int ret = 0;
@@ -1531,7 +1530,7 @@ int btrfs_write_out_cache(struct btrfs_trans_handle *trans,
if (IS_ERR(inode))
return 0;
- ret = __btrfs_write_out_cache(inode, ctl, block_group, trans);
+ ret = __btrfs_write_out_cache(inode, block_group, trans);
if (ret) {
btrfs_debug(fs_info,
"failed to write free space cache for block group %llu error %d",
--
2.47.2
* [PATCH v2 10/10] btrfs: remove unnecessary ctl argument from write_cache_extent_entries()
2026-04-15 18:44 ` [PATCH v2 00/10] btrfs: reduce memory consumption for block groups fdmanana
` (8 preceding siblings ...)
2026-04-15 18:44 ` [PATCH v2 09/10] btrfs: remove unnecessary ctl argument from __btrfs_write_out_cache() fdmanana
@ 2026-04-15 18:44 ` fdmanana
2026-04-17 2:16 ` [PATCH v2 00/10] btrfs: reduce memory consumption for block groups David Sterba
10 siblings, 0 replies; 28+ messages in thread
From: fdmanana @ 2026-04-15 18:44 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
There is no need to pass the free space control structure as an argument
because we can grab it from the given block group.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/free-space-cache.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 76d4b7eea746..47d00fa6001f 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -1068,12 +1068,12 @@ int load_free_space_cache(struct btrfs_block_group *block_group)
static noinline_for_stack
int write_cache_extent_entries(struct btrfs_io_ctl *io_ctl,
- struct btrfs_free_space_ctl *ctl,
struct btrfs_block_group *block_group,
int *entries, int *bitmaps,
struct list_head *bitmap_list)
{
int ret;
+ struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl;
struct btrfs_free_cluster *cluster = NULL;
struct btrfs_free_cluster *cluster_locked = NULL;
struct rb_node *node = rb_first(&ctl->free_space_offset);
@@ -1416,8 +1416,7 @@ static int __btrfs_write_out_cache(struct inode *inode,
mutex_lock(&ctl->cache_writeout_mutex);
/* Write out the extent entries in the free space cache */
spin_lock(&ctl->tree_lock);
- ret = write_cache_extent_entries(io_ctl, ctl,
- block_group, &entries, &bitmaps,
+ ret = write_cache_extent_entries(io_ctl, block_group, &entries, &bitmaps,
&bitmap_list);
if (ret)
goto out_nospc_locked;
--
2.47.2
* Re: [PATCH v2 00/10] btrfs: reduce memory consumption for block groups
2026-04-15 18:44 ` [PATCH v2 00/10] btrfs: reduce memory consumption for block groups fdmanana
` (9 preceding siblings ...)
2026-04-15 18:44 ` [PATCH v2 10/10] btrfs: remove unnecessary ctl argument from write_cache_extent_entries() fdmanana
@ 2026-04-17 2:16 ` David Sterba
2026-04-17 7:54 ` Filipe Manana
10 siblings, 1 reply; 28+ messages in thread
From: David Sterba @ 2026-04-17 2:16 UTC (permalink / raw)
To: fdmanana; +Cc: linux-btrfs
On Wed, Apr 15, 2026 at 07:44:38PM +0100, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> Reduce the amount of memory used by block groups by using kmem caches
> and reducing structure sizes, plus a few cleanups. Details in the
> change logs.
>
> V2: Add one extra patch that was missing, to further reduce the size
> of struct btrfs_free_space_ctl (patch 7/10).
>
> Filipe Manana (10):
> btrfs: use a kmem_cache for block groups
> btrfs: reduce size of struct btrfs_block_group
> btrfs: use a kmem_cache for free space control structures
> btrfs: remove start field from struct btrfs_free_space_ctl
> btrfs: remove unit field from struct btrfs_free_space_ctl
> btrfs: reduce size of struct btrfs_free_space_ctl
> btrfs: remove op field from struct btrfs_free_space_ctl
> btrfs: remove block group argument from copy_free_space_cache()
> btrfs: remove unnecessary ctl argument from __btrfs_write_out_cache()
> btrfs: remove unnecessary ctl argument from write_cache_extent_entries()
Reviewed-by: David Sterba <dsterba@suse.com>
We're planning to remove the free-space-cache in the future, and
hopefully the majority of users have been using the FST, so I'm not sure
the optimizations will have much impact.
* Re: [PATCH v2 00/10] btrfs: reduce memory consumption for block groups
2026-04-17 2:16 ` [PATCH v2 00/10] btrfs: reduce memory consumption for block groups David Sterba
@ 2026-04-17 7:54 ` Filipe Manana
0 siblings, 0 replies; 28+ messages in thread
From: Filipe Manana @ 2026-04-17 7:54 UTC (permalink / raw)
To: dsterba; +Cc: linux-btrfs
On Fri, Apr 17, 2026 at 3:16 AM David Sterba <dsterba@suse.cz> wrote:
>
> On Wed, Apr 15, 2026 at 07:44:38PM +0100, fdmanana@kernel.org wrote:
> > From: Filipe Manana <fdmanana@suse.com>
> >
> > Reduce the amount of memory used by block groups by using kmem caches
> > and reducing structure sizes, plus a few cleanups. Details in the
> > change logs.
> >
> > V2: Add one extra patch that was missing, to further reduce the size
> > of struct btrfs_free_space_ctl (patch 7/10).
> >
> > Filipe Manana (10):
> > btrfs: use a kmem_cache for block groups
> > btrfs: reduce size of struct btrfs_block_group
> > btrfs: use a kmem_cache for free space control structures
> > btrfs: remove start field from struct btrfs_free_space_ctl
> > btrfs: remove unit field from struct btrfs_free_space_ctl
> > btrfs: reduce size of struct btrfs_free_space_ctl
> > btrfs: remove op field from struct btrfs_free_space_ctl
> > btrfs: remove block group argument from copy_free_space_cache()
> > btrfs: remove unnecessary ctl argument from __btrfs_write_out_cache()
> > btrfs: remove unnecessary ctl argument from write_cache_extent_entries()
>
> Reviewed-by: David Sterba <dsterba@suse.com>
>
> We're planning to remove the free-space-cache in the future and
> hopefully majority of users have been using the FST so I'm not sure the
> optimizations will have much impact.
I know we plan to remove the free-space-cache, but none of this is
specific to the free space cache.
The btrfs_free_space_ctl structure is used for both the free space
cache and the free space tree; it is the structure for the in-memory
cache per block group.