* [PATCH] btrfs: shrink the size of btrfs_bio
@ 2025-12-08 21:25 Qu Wenruo
2025-12-12 3:18 ` David Sterba
0 siblings, 1 reply; 8+ messages in thread
From: Qu Wenruo @ 2025-12-08 21:25 UTC (permalink / raw)
To: linux-btrfs; +Cc: Johannes Thumshirn
This is done by:
- Shrink the size of btrfs_bio::mirror_num
From a 32-bit unsigned int to a u16.
Normally the btrfs mirror number is either 0 (all profiles), 1 (all
profiles), 2 (DUP/RAID1/RAID10/RAID5), 3 (RAID1C3) or 4 (RAID1C4).
But for RAID6 the mirror number can be as large as the number of
devices of that chunk.
Currently the limit on the number of devices for a data chunk is
BTRFS_MAX_DEVS(), which is around 500 for the default 16K nodesize.
And with the max 64K nodesize, we can have a little over 2000
devices in a chunk.
Although I'd argue it's way overkill, we don't reject such cases yet,
thus u8 is not going to cut it and we have to use u16 (maxing out at 64K).
- Use bit fields for boolean members
Although bit fields are not always safe for racy call sites, these
members are safe.
* csum_search_commit_root
* is_scrub
Those two are set immediately after bbio allocation, with no more
writes after that, thus they are very safe.
* async_csum
* can_use_append
Those two are set for each split range, and after that there are no
writes to those two members from different threads, thus they are
also safe.
And there is space for 4 more bits before the size of btrfs_bio
increases again, which should be future-proof enough.
- Reorder the structure members
Now we always put the largest member first (after the huge 120-byte
union), making it easier to fill any holes (a layout sketch follows below).
This reduces the size of btrfs_bio by 8 bytes, from 312 bytes to 304 bytes.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
Changelog:
v2:
- Change mirror number from u8 to u16
For RAID6 we can have cases (beyond 255 devices in a RAID6
chunk) where u8 is not large enough.
Thankfully u16 is large enough for the max number of devices possible
for a RAID6 chunk.
And we have exactly a one-byte hole in the structure, so expanding
the width of @mirror_num will not increase the size of btrfs_bio.
---
fs/btrfs/bio.h | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)
diff --git a/fs/btrfs/bio.h b/fs/btrfs/bio.h
index 246c7519dff3..157cdfa2f78a 100644
--- a/fs/btrfs/bio.h
+++ b/fs/btrfs/bio.h
@@ -68,32 +68,33 @@ struct btrfs_bio {
struct btrfs_tree_parent_check parent_check;
};
+ /* For internal use in read end I/O handling */
+ struct work_struct end_io_work;
+
/* End I/O information supplied to btrfs_bio_alloc */
btrfs_bio_end_io_t end_io;
void *private;
- /* For internal use in read end I/O handling */
- unsigned int mirror_num;
atomic_t pending_ios;
- struct work_struct end_io_work;
+ u16 mirror_num;
/* Save the first error status of split bio. */
blk_status_t status;
/* Use the commit root to look up csums (data read bio only). */
- bool csum_search_commit_root;
+ bool csum_search_commit_root:1;
/*
* Since scrub will reuse btree inode, we need this flag to distinguish
* scrub bios.
*/
- bool is_scrub;
+ bool is_scrub:1;
/* Whether the csum generation for data write is async. */
- bool async_csum;
+ bool async_csum:1;
/* Whether the bio is written using zone append. */
- bool can_use_append;
+ bool can_use_append:1;
/*
* This member must come last, bio_alloc_bioset will allocate enough
--
2.52.0
^ permalink raw reply related [flat|nested] 8+ messages in thread

* Re: [PATCH] btrfs: shrink the size of btrfs_bio
2025-12-08 21:25 [PATCH] btrfs: shrink the size of btrfs_bio Qu Wenruo
@ 2025-12-12 3:18 ` David Sterba
0 siblings, 0 replies; 8+ messages in thread
From: David Sterba @ 2025-12-12 3:18 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs, Johannes Thumshirn
On Tue, Dec 09, 2025 at 07:55:03AM +1030, Qu Wenruo wrote:
> This is done by:
>
> - Shrink the size of btrfs_bio::mirror_num
> From a 32-bit unsigned int to a u16.
>
> Normally the btrfs mirror number is either 0 (all profiles), 1 (all
> profiles), 2 (DUP/RAID1/RAID10/RAID5), 3 (RAID1C3) or 4 (RAID1C4).
>
> But for RAID6 the mirror number can be as large as the number of
> devices of that chunk.
>
> Currently the limit on the number of devices for a data chunk is
> BTRFS_MAX_DEVS(), which is around 500 for the default 16K nodesize.
> And with the max 64K nodesize, we can have a little over 2000
> devices in a chunk.
>
> Although I'd argue it's way overkill, we don't reject such cases yet,
> thus u8 is not going to cut it and we have to use u16 (maxing out at 64K).
>
> - Use bit fields for boolean members
> Although bit fields are not always safe for racy call sites, these
> members are safe.
>
> * csum_search_commit_root
> * is_scrub
> Those two are set immediately after bbio allocation, with no more
> writes after that, thus they are very safe.
>
> * async_csum
> * can_use_append
> Those two are set for each split range, and after that there are no
> writes to those two members from different threads, thus they are
> also safe.
>
> And there is space for 4 more bits before the size of btrfs_bio
> increases again, which should be future-proof enough.
>
> - Reorder the structure members
> Now we always put the largest member first (after the huge 120-byte
> union), making it easier to fill any holes.
>
> This reduces the size of btrfs_bio by 8 bytes, from 312 bytes to 304 bytes.
>
> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
> Changelog:
> v2:
> - Change mirror number from u8 to u16
> For RAID6 we can have cases (beyond 255 devices in a RAID6
> chunk) where u8 is not large enough.
> Thankfully u16 is large enough for the max number of devices possible
> for a RAID6 chunk.
>
> And we have exactly a one-byte hole in the structure, so expanding
> the width of @mirror_num will not increase the size of btrfs_bio.
Reviewed-by: David Sterba <dsterba@suse.com>
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH] btrfs: shrink the size of btrfs_bio
@ 2025-12-05 8:04 Qu Wenruo
2025-12-05 10:09 ` Johannes Thumshirn
2025-12-08 19:19 ` David Sterba
0 siblings, 2 replies; 8+ messages in thread
From: Qu Wenruo @ 2025-12-05 8:04 UTC (permalink / raw)
To: linux-btrfs
This is done by:
- Shrink the size of btrfs_bio::mirror_num
From a 32-bit unsigned int to an 8-bit u8.
- Use bit fields for boolean members
Although bit fields are not always safe for racy call sites, these
members are safe (a standalone sketch of the race follows the summary below).
* csum_search_commit_root
* is_scrub
Those two are set immediately after bbio allocation, with no more
writes after that, thus they are very safe.
* async_csum
* can_use_append
Those two are set for each split range, and after that there are no
writes to those two members from different threads, thus they are
also safe.
And there is space for 12 more bits before the size of btrfs_bio
increases again, which should be very future-proof.
- Reorder the structure members
Now we always put the largest member first (after the huge 120-byte
union), making it easier to fill any holes.
This reduces the size of btrfs_bio by 8 bytes, from 312 bytes to 304 bytes.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/bio.h | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)
diff --git a/fs/btrfs/bio.h b/fs/btrfs/bio.h
index d6da9ed08bfa..33179b45045c 100644
--- a/fs/btrfs/bio.h
+++ b/fs/btrfs/bio.h
@@ -68,32 +68,33 @@ struct btrfs_bio {
struct btrfs_tree_parent_check parent_check;
};
+ /* For internal use in read end I/O handling */
+ struct work_struct end_io_work;
+
/* End I/O information supplied to btrfs_bio_alloc */
btrfs_bio_end_io_t end_io;
void *private;
- /* For internal use in read end I/O handling */
- unsigned int mirror_num;
atomic_t pending_ios;
- struct work_struct end_io_work;
/* Save the first error status of split bio. */
blk_status_t status;
+ u8 mirror_num;
/* Use the commit root to look up csums (data read bio only). */
- bool csum_search_commit_root;
+ bool csum_search_commit_root:1;
/*
* Since scrub will reuse btree inode, we need this flag to distinguish
* scrub bios.
*/
- bool is_scrub;
+ bool is_scrub:1;
/* Whether the csum generation for data write is async. */
- bool async_csum;
+ bool async_csum:1;
/* Whether the bio is written using zone append. */
- bool can_use_append;
+ bool can_use_append:1;
/*
* This member must come last, bio_alloc_bioset will allocate enough
--
2.52.0
^ permalink raw reply related [flat|nested] 8+ messages in thread

* Re: [PATCH] btrfs: shrink the size of btrfs_bio
2025-12-05 8:04 Qu Wenruo
@ 2025-12-05 10:09 ` Johannes Thumshirn
2025-12-08 19:19 ` David Sterba
1 sibling, 0 replies; 8+ messages in thread
From: Johannes Thumshirn @ 2025-12-05 10:09 UTC (permalink / raw)
To: WenRuo Qu, linux-btrfs@vger.kernel.org
Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] btrfs: shrink the size of btrfs_bio
2025-12-05 8:04 Qu Wenruo
2025-12-05 10:09 ` Johannes Thumshirn
@ 2025-12-08 19:19 ` David Sterba
2025-12-08 20:26 ` Qu Wenruo
1 sibling, 1 reply; 8+ messages in thread
From: David Sterba @ 2025-12-08 19:19 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs
On Fri, Dec 05, 2025 at 06:34:30PM +1030, Qu Wenruo wrote:
> This is done by:
>
> - Shrink the size of btrfs_bio::mirror_num
> From 32 bits unsigned int to 8 bits u8.
What is the explanation for this? IIRC the mirror num on raid56 refers
to the device index; the type width should not be too narrow. But if
the semantics are different we can reduce the mirror num in other
structures as well.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] btrfs: shrink the size of btrfs_bio
2025-12-08 19:19 ` David Sterba
@ 2025-12-08 20:26 ` Qu Wenruo
2025-12-08 20:44 ` David Sterba
0 siblings, 1 reply; 8+ messages in thread
From: Qu Wenruo @ 2025-12-08 20:26 UTC (permalink / raw)
To: dsterba, Qu Wenruo; +Cc: linux-btrfs
On 2025/12/9 05:49, David Sterba wrote:
> On Fri, Dec 05, 2025 at 06:34:30PM +1030, Qu Wenruo wrote:
>> This is done by:
>>
>> - Shrink the size of btrfs_bio::mirror_num
>> From 32 bits unsigned int to 8 bits u8.
>
> What is the explanation for this? IIRC the mirror num on raid56 refers
> to the device index,
You're right, u8 cannot cover the max number of devices for RAID6.
(RAID5 only has two mirrors: mirror 0 means reading from the data stripes,
mirror 1 means rebuilding using the other data and the P stripe.)
BTRFS_MAX_DEVS() is around 500 for the default 16K nodesize, which
is already beyond 255.
Although in the real world it can hardly go that extreme, without
proper rejection/sanity checks we cannot do the shrink now.
I'd like to limit the device number to something more realistic.
Would a device limit of 32 suffice for both RAID5 and RAID6?
(And maybe apply this limit to RAID10/RAID0 too?)
Or someone would prefer more devices?
Thanks,
Qu
> the type width should not be too narrow. But if
> the semantics are different we can reduce the mirror num in other
> structures as well.
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] btrfs: shrink the size of btrfs_bio
2025-12-08 20:26 ` Qu Wenruo
@ 2025-12-08 20:44 ` David Sterba
2025-12-08 20:53 ` Qu Wenruo
0 siblings, 1 reply; 8+ messages in thread
From: David Sterba @ 2025-12-08 20:44 UTC (permalink / raw)
To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, linux-btrfs
On Tue, Dec 09, 2025 at 06:56:47AM +1030, Qu Wenruo wrote:
>
>
> On 2025/12/9 05:49, David Sterba wrote:
> > On Fri, Dec 05, 2025 at 06:34:30PM +1030, Qu Wenruo wrote:
> >> This is done by:
> >>
> >> - Shrink the size of btrfs_bio::mirror_num
> >> From 32 bits unsigned int to 8 bits u8.
> >
> > What is the explanation for this? IIRC the mirror num on raid56 refers
> > to the device index,
>
>> You're right, u8 cannot cover the max number of devices for RAID6.
>> (RAID5 only has two mirrors: mirror 0 means reading from the data stripes,
>> mirror 1 means rebuilding using the other data and the P stripe.)
>
>> BTRFS_MAX_DEVS() is around 500 for the default 16K nodesize, which
>> is already beyond 255.
>
>> Although in the real world it can hardly go that extreme, without
>> proper rejection/sanity checks we cannot do the shrink now.
>
> I'd like to limit the device number to something more realistic.
>> Would a device limit of 32 suffice for both RAID5 and RAID6?
> (And maybe apply this limit to RAID10/RAID0 too?)
>
> Or someone would prefer more devices?
I'd rather not add such an artificial limit; I find 32 too small anyway.
Using, say, 200+ devices will likely hit other boundaries, like fitting
items into some structures, or performance limits, but this does not
justify shrinking a structure member to u8/1 byte.
With u16 and 16K devices this sounds future-proof enough, and we may use
u16 in the structures to save bytes (although it generates slightly worse
code).
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] btrfs: shrink the size of btrfs_bio
2025-12-08 20:44 ` David Sterba
@ 2025-12-08 20:53 ` Qu Wenruo
0 siblings, 0 replies; 8+ messages in thread
From: Qu Wenruo @ 2025-12-08 20:53 UTC (permalink / raw)
To: dsterba; +Cc: Qu Wenruo, linux-btrfs
On 2025/12/9 07:14, David Sterba wrote:
> On Tue, Dec 09, 2025 at 06:56:47AM +1030, Qu Wenruo wrote:
>>
>>
>> On 2025/12/9 05:49, David Sterba wrote:
>>> On Fri, Dec 05, 2025 at 06:34:30PM +1030, Qu Wenruo wrote:
>>>> This is done by:
>>>>
>>>> - Shrink the size of btrfs_bio::mirror_num
>>>> From 32 bits unsigned int to 8 bits u8.
>>>
>>> What is the explanation for this? IIRC the mirror num on raid56 refers
>>> to the device index,
>>
>> You're right, u8 cannot cover the max number of devices for RAID6.
>> (RAID5 only has two mirrors: mirror 0 means reading from the data stripes,
>> mirror 1 means rebuilding using the other data and the P stripe.)
>>
>> BTRFS_MAX_DEVS() is around 500 for the default 16K nodesize, which
>> is already beyond 255.
>>
>> Although in the real world it can hardly go that extreme, without
>> proper rejection/sanity checks we cannot do the shrink now.
>>
>> I'd like to limit the device number to something more realistic.
>> Would a device limit of 32 suffice for both RAID5 and RAID6?
>> (And maybe apply this limit to RAID10/RAID0 too?)
>>
>> Or someone would prefer more devices?
>
> I'd rather not add such an artificial limit; I find 32 too small anyway.
> Using, say, 200+ devices will likely hit other boundaries, like fitting
> items into some structures, or performance limits, but this does not
> justify shrinking a structure member to u8/1 byte.
By limiting I mean limiting the number of devices for a chunk, not the
number of total devices.
We can still have whatever number of devices (no real limit), but a
RAID0/RAID10/RAID5/RAID6 chunk shouldn't have that many devices anyway.
With that limit, things will work like this:
The fs has 64/128 or whatever number of devices, but when allocating a
RAID0/5/6 chunk, only 32 devices can be added to that chunk.
This should not make any difference, as striping across more than 32
devices hardly brings any real gain even for RAID0.
>
> With u16 and 16K devices this sounds future-proof enough, and we may use
> u16 in the structures to save bytes (although it generates slightly worse
> code).
16K devices in a chunk is already impossible. I'm fine
with u16, but I'd really prefer saner default limits.
Thanks,
Qu
^ permalink raw reply [flat|nested] 8+ messages in thread