From: Anand Jain <anand.jain@oracle.com>
To: waxhead@dirtcellar.net, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH RFC 00/10] btrfs: new performance-based chunk allocation using device roles
Date: Mon, 2 Jun 2025 12:25:16 +0800
Message-ID: <0643837b-5a64-483a-9cab-8c127bcf4b30@oracle.com>
In-Reply-To: <d513c850-c3cf-f570-247a-7b29c6376234@dirtcellar.net>

On 23/5/25 02:19, waxhead wrote:
> Anand Jain wrote:
>> In host hardware, devices can have different speeds. Generally, faster
>> devices come with lesser capacity while slower devices come with larger
>> capacity. A typical configuration would expect that:
>>
>> - A filesystem's read/write performance is evenly distributed on
>>   average across the entire filesystem. This is not achievable with
>>   the current allocation method because chunks are allocated based
>>   only on device free space.
>>
>> - Typically, faster devices are assigned to metadata chunk allocations
>> while slower devices are assigned to data chunk allocations.
>>
>> Introducing Device Roles:
>>
>> Here I define 5 device roles in a specific order for metadata and in
>> the reverse order for data: metadata_only, metadata, none, data,
>> data_only. One or more devices may have the same role.
>>
>> The metadata and data roles indicate preference but not exclusivity for
>> that role, whereas data_only and metadata_only are exclusive roles.
>
> As a BTRFS user I would like to comment a bit on this. I have earlier
> mentioned that I think that BTRFS should allow for device groups. E.g.
> assigning a storage device to one or more groups (or vice versa).
>
> I really like what is being introduced here, but I would like to suggest
> to take this a step further. Instead of assigning a role to the storage
> device itself then maybe it would have been wiser to follow a scheme
> like this:
>
> DeviceID -> Group(s) -> Group properties
>
> In this case what is being introduced here could easily be dealt with as
> a simple group property like (meta)data_weight=0...128 for example.
>
> Personally I think that would have been a much cleaner interface.
>
> Setting metadata/data roles as originally suggested here would be fine
> on a low number of devices, but on larger storage arrays with many
> devices it sounds (to me) like it would quickly become difficult to keep
> track of.
>
> With the scheme I suggest you would simply list the properties of a
> group and see which DeviceIDs belong to that group... perhaps even
> in a nice table if you were lucky.
>
> (And just for the record: other properties that I can imagine, off the
> top of my head, being useful would be a read/write weight that could
> (automatically) be set higher and higher if a device starts to throw
> errors, or group_exclusive=1|0 (to prevent other groups owning that
> DeviceID), etc.)
>
> And this would of course require another step after mkfs, but personally
> I do not understand why setting these roles (or the scheme I suggest)
> would be very useful at mkfs time. It might as well be done at first
> mount before the filesystem gets put to use.
>
> Great to see progress for BTRFS for things like this, but please do
> consider another scheme for setting the roles.

Thanks for the feedback.

The question is: which approach handles large numbers of devices
better, Mode Groups or Direct Modes?

Let’s try to break it down.

Both approaches need to manage the following:

  Five role types (preferences):
    metadata_only, metadata, none (any), data, data_only

  Fault tolerance (FT) groups:
    2 to n device groups

  Four allocation strategies:
    linear-devid, linear-priority, round-robin, free-space
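
To make the role ordering concrete, here is a rough Python sketch of how
the five roles could rank devices for a metadata or a data chunk. It is
only an illustration, not the patch code: the `Device` class, the
`candidates()` helper, and the tie-break by larger free space are
assumptions made for the example.

```python
from dataclasses import dataclass

# Role order as preferred for metadata chunks; data uses the reverse order.
ROLES = ["metadata_only", "metadata", "none", "data", "data_only"]


@dataclass
class Device:
    devid: int
    role: str
    free_bytes: int


def candidates(devices, chunk_type):
    """Return devices eligible for chunk_type, most preferred first.

    chunk_type is "metadata" or "data". Exclusive roles of the opposite
    type are filtered out; the preference roles only affect ordering.
    """
    if chunk_type == "metadata":
        order, excluded = ROLES, "data_only"
    else:
        order, excluded = ROLES[::-1], "metadata_only"
    eligible = [d for d in devices if d.role != excluded]
    # Assumption: within one role, prefer more free space, then lower devid.
    return sorted(
        eligible,
        key=lambda d: (order.index(d.role), -d.free_bytes, d.devid),
    )


devices = [
    Device(1, "metadata_only", 100),
    Device(2, "metadata", 200),
    Device(3, "none", 500),
    Device(4, "data", 4000),
    Device(5, "data_only", 8000),
    Device(6, "data", 6000),
]
meta_order = [d.devid for d in candidates(devices, "metadata")]
data_order = [d.devid for d in candidates(devices, "data")]
```

With the sample devices above, the metadata ordering never includes the
data_only device and the data ordering never includes the metadata_only
device, while the metadata and data roles merely move a device up or
down the list.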

Pros and Cons:

Direct Modes are simpler and work well for small setups. As things
scale, complexity grows, but scripts or tooling can manage that.

Mode Groups are better organized for large setups, but may be overkill
for small ones. They also require managing an extra btrfs key, which
adds some overhead.

Did I miss anything?
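
As a rough way to compare the two bookkeeping models, the sketch below
stores the same layout both ways. Everything in it is hypothetical: the
group names "fast" and "bulk", the dictionary layout, and the helper
names are illustration only and do not reflect any on-disk format.

```python
from collections import defaultdict

# Direct Modes: the role is stored on each device.
direct = {1: "metadata", 2: "metadata", 3: "data", 4: "data", 5: "data"}

# Mode Groups: DeviceID -> Group(s) -> Group properties.
groups = {
    "fast": {"role": "metadata"},
    "bulk": {"role": "data"},
}
membership = {1: ["fast"], 2: ["fast"], 3: ["bulk"], 4: ["bulk"], 5: ["bulk"]}


def devices_by_group(membership):
    """List the devids that belong to each group."""
    out = defaultdict(list)
    for devid, names in sorted(membership.items()):
        for name in names:
            out[name].append(devid)
    return dict(out)


def effective_roles(groups, membership):
    """Resolve each device's role through its group.

    Assumption: each device belongs to exactly one group here; how to
    merge properties from several groups is left open.
    """
    return {
        devid: groups[names[0]]["role"]
        for devid, names in membership.items()
    }


same_layout = effective_roles(groups, membership) == direct

# Changing the role of every "bulk" device is one group update here,
# versus one update per device in the direct model.
groups["bulk"]["role"] = "data_only"
after_change = effective_roles(groups, membership)
```

The group model needs the extra indirection (and the extra key to
persist it), but a role change for many devices becomes a single
property update, which is the trade-off described above.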

So far, I'm leaning toward Direct Modes. But if there's enough interest
in Mode Groups, we can explore that too. Alternatively, we could start
with Direct Modes and add Mode Groups later if needed.

Does that sound reasonable?

I’ve put up a draft, work-in-progress version of the proposal here:

  https://asj.github.io/chunk-alloc-enhancement.html

Thanks, Anand