From: Boris Burkov <boris@bur.io>
To: Goffredo Baroncelli <kreijack@libero.it>
Cc: linux-btrfs@vger.kernel.org,
Zygo Blaxell <ce3g8jdj@umail.furryterror.org>,
Josef Bacik <josef@toxicpanda.com>,
David Sterba <dsterba@suse.cz>,
Sinnamohideen Shafeeq <shafeeqs@panasas.com>,
Paul Jones <paul@pauljones.id.au>,
Goffredo Baroncelli <kreijack@inwind.it>
Subject: Re: [RFC][V9][PATCH 0/6] btrfs: allocation_hint mode
Date: Tue, 4 Jan 2022 18:44:32 -0800
Message-ID: <YdUGAg1TB8FCfqnr@zen>
In-Reply-To: <cover.1639766364.git.kreijack@inwind.it>

On Fri, Dec 17, 2021 at 07:47:16PM +0100, Goffredo Baroncelli wrote:
> From: Goffredo Baroncelli <kreijack@inwind.it>
>
> Hi all,
>
> This patch set was born after some discussion between me, Zygo and Josef.
> Some details can be found in https://github.com/btrfs/btrfs-todo/issues/19.
>
> Some further information about a real use case can be found in
> https://lore.kernel.org/linux-btrfs/20210116002533.GE31381@hungrycats.org/
>
> Recently Shafeeq told me that he is interested too, due to the performance gain.
>
> In the V8 revision I switched away from an ioctl API in favor of a sysfs API
> (see patches #2 and #3).
>
> In V9 I renamed the sysfs interface from devinfo/type to devinfo/allocation_hint.
> Moreover, I renamed dev_item->type to dev_item->flags.
>
> The idea behind this patch set is to dedicate some disks (the fastest ones)
> to the metadata chunks. My initial idea was a "soft" hint. However, Zygo
> asked for an option for a "strong" hint (== mandatory). The result is that
> each disk can be "tagged" with one of the following flags:
> - BTRFS_DEV_ALLOCATION_METADATA_ONLY
> - BTRFS_DEV_ALLOCATION_PREFERRED_METADATA
> - BTRFS_DEV_ALLOCATION_PREFERRED_DATA
> - BTRFS_DEV_ALLOCATION_DATA_ONLY
>
> When the chunk allocator searches for disks on which to allocate a chunk, it
> scans the disks in an order decided by these tags. For metadata, the order is:
> *_METADATA_ONLY
> *_PREFERRED_METADATA
> *_PREFERRED_DATA
>
> The *_DATA_ONLY disks are not eligible for metadata chunk allocation.
>
> For data chunks, the order is reversed, and the *_METADATA_ONLY disks are
> excluded.
>
> The exact sort logic is to sort first by the "tag", and then by the space
> available. If there is no space available, the disks with the next "tag" are
> selected.
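
The ordering described above can be modelled in a few lines. This is only a sketch of the rule as stated in the cover letter, not the kernel code: the function name `pick_devices`, the `needed_mib` threshold, and the `Device` record are hypothetical, and the real allocator also has to deal with stripe sizes and RAID profiles. The sample devices use the tags and unallocated sizes from the post-balance listing in this message.

```python
from dataclasses import dataclass

# Scan order per chunk type, taken from the cover letter. A hint that is
# missing from a list is excluded for that chunk type.
ORDER = {
    "metadata": ["METADATA_ONLY", "PREFERRED_METADATA", "PREFERRED_DATA"],
    "data": ["DATA_ONLY", "PREFERRED_DATA", "PREFERRED_METADATA"],
}


@dataclass
class Device:
    path: str
    hint: str
    free_mib: int  # unallocated space in MiB


def pick_devices(devices, chunk_type, needed_mib, ndevs=1):
    """Hypothetical model: sort by tag first, then by free space (largest first)."""
    order = ORDER[chunk_type]
    # Assumption: a device only qualifies if it can hold the requested size.
    eligible = [d for d in devices
                if d.hint in order and d.free_mib >= needed_mib]
    eligible.sort(key=lambda d: (order.index(d.hint), -d.free_mib))
    return [d.path for d in eligible[:ndevs]]


# Tags and unallocated space as shown after the balance in the cover letter.
devs = [
    Device("/dev/loop0", "DATA_ONLY", 736),
    Device("/dev/loop1", "DATA_ONLY", 736),
    Device("/dev/loop2", "PREFERRED_DATA", 128),
    Device("/dev/loop3", "PREFERRED_DATA", 128),
    Device("/dev/loop4", "PREFERRED_DATA", 128),
    Device("/dev/loop5", "METADATA_ONLY", 1),
    Device("/dev/loop6", "METADATA_ONLY", 128),
    Device("/dev/loop7", "METADATA_ONLY", 96),
]

print(pick_devices(devs, "metadata", 64))       # METADATA_ONLY disks win
print(pick_devices(devs, "data", 64, ndevs=2))  # two disks for RAID1 data
```

In this model, loop0 and loop1 are never candidates for metadata even though they have the most free space, which matches the behaviour shown in the listing.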
>
> To set these tags, a new property called "allocation_hint" was created.
> There is a dedicated btrfs-progs patch set [[PATCH V9] btrfs-progs:
> allocation_hint disk property].
>
> $ sudo mount /dev/loop0 /mnt/test-btrfs/
> $ for i in /dev/loop[0-9]; do sudo ./btrfs prop get $i allocation_hint; done
> devid=1, path=/dev/loop0: allocation_hint=PREFERRED_METADATA
> devid=2, path=/dev/loop1: allocation_hint=PREFERRED_METADATA
> devid=3, path=/dev/loop2: allocation_hint=PREFERRED_DATA
> devid=4, path=/dev/loop3: allocation_hint=PREFERRED_DATA
> devid=5, path=/dev/loop4: allocation_hint=PREFERRED_DATA
> devid=6, path=/dev/loop5: allocation_hint=DATA_ONLY
> devid=7, path=/dev/loop6: allocation_hint=METADATA_ONLY
> devid=8, path=/dev/loop7: allocation_hint=METADATA_ONLY
>
> $ sudo ./btrfs fi us /mnt/test-btrfs/
> Overall:
> Device size: 2.75GiB
> Device allocated: 1.34GiB
> Device unallocated: 1.41GiB
> Device missing: 0.00B
> Used: 400.89MiB
> Free (estimated): 1.04GiB (min: 1.04GiB)
> Data ratio: 2.00
> Metadata ratio: 1.00
> Global reserve: 3.25MiB (used: 0.00B)
> Multiple profiles: no
>
> Data,RAID1: Size:542.00MiB, Used:200.25MiB (36.95%)
> /dev/loop0 288.00MiB
> /dev/loop1 288.00MiB
> /dev/loop2 127.00MiB
> /dev/loop3 127.00MiB
> /dev/loop4 127.00MiB
> /dev/loop5 127.00MiB
>
> Metadata,single: Size:256.00MiB, Used:384.00KiB (0.15%)
> /dev/loop1 256.00MiB
>
> System,single: Size:32.00MiB, Used:16.00KiB (0.05%)
> /dev/loop0 32.00MiB
>
> Unallocated:
> /dev/loop0 704.00MiB
> /dev/loop1 480.00MiB
> /dev/loop2 1.00MiB
> /dev/loop3 1.00MiB
> /dev/loop4 1.00MiB
> /dev/loop5 1.00MiB
> /dev/loop6 128.00MiB
> /dev/loop7 128.00MiB
>
> # change the tag of some disks
>
> $ sudo ./btrfs prop set /dev/loop0 allocation_hint DATA_ONLY
> $ sudo ./btrfs prop set /dev/loop1 allocation_hint DATA_ONLY
> $ sudo ./btrfs prop set /dev/loop5 allocation_hint METADATA_ONLY
>
> $ for i in /dev/loop[0-9]; do sudo ./btrfs prop get $i allocation_hint; done
> devid=1, path=/dev/loop0: allocation_hint=DATA_ONLY
> devid=2, path=/dev/loop1: allocation_hint=DATA_ONLY
> devid=3, path=/dev/loop2: allocation_hint=PREFERRED_DATA
> devid=4, path=/dev/loop3: allocation_hint=PREFERRED_DATA
> devid=5, path=/dev/loop4: allocation_hint=PREFERRED_DATA
> devid=6, path=/dev/loop5: allocation_hint=METADATA_ONLY
> devid=7, path=/dev/loop6: allocation_hint=METADATA_ONLY
> devid=8, path=/dev/loop7: allocation_hint=METADATA_ONLY
>
> $ sudo btrfs bal start --full-balance /mnt/test-btrfs/
> $ sudo ./btrfs fi us /mnt/test-btrfs/
> Overall:
> Device size: 2.75GiB
> Device allocated: 735.00MiB
> Device unallocated: 2.03GiB
> Device missing: 0.00B
> Used: 400.72MiB
> Free (estimated): 1.10GiB (min: 1.10GiB)
> Data ratio: 2.00
> Metadata ratio: 1.00
> Global reserve: 3.25MiB (used: 0.00B)
> Multiple profiles: no
>
> Data,RAID1: Size:288.00MiB, Used:200.19MiB (69.51%)
> /dev/loop0 288.00MiB
> /dev/loop1 288.00MiB
>
> Metadata,single: Size:127.00MiB, Used:336.00KiB (0.26%)
> /dev/loop5 127.00MiB
>
> System,single: Size:32.00MiB, Used:16.00KiB (0.05%)
> /dev/loop7 32.00MiB
>
> Unallocated:
> /dev/loop0 736.00MiB
> /dev/loop1 736.00MiB
> /dev/loop2 128.00MiB
> /dev/loop3 128.00MiB
> /dev/loop4 128.00MiB
> /dev/loop5 1.00MiB
> /dev/loop6 128.00MiB
> /dev/loop7 96.00MiB
>
>
> #As you can see, all the metadata was placed on the disks loop5/loop7 even
> #though the emptiest ones are loop0 and loop1.
>
>
>
> TODO:
> - more tests
> - the tool which shows the space available should consider the tagging (e.g.
> the disks tagged with _METADATA_ONLY should be excluded from the data
> availability)
> - allow btrfs-progs to change the allocation_hint even when the filesystem
> is not mounted.
>
>
> Comments are welcome

This is cool, thanks for building it!

I'm playing with setting this up for a test I'm working on where I want
to send data to a dm-zero device. To that end, I applied this patchset
on top of misc-next and ran:

$ mkfs.btrfs -f /dev/vg0/lv0 -dsingle -msingle
$ mount /dev/vg0/lv0 /mnt/lol
$ btrfs device add /dev/mapper/zero-data /mnt/lol
$ btrfs fi usage /mnt/lol
Overall:
    Device size:           50.01TiB
    Device allocated:      20.00MiB
    Device unallocated:    50.01TiB
    Device missing:        0.00B
    Used:                  128.00KiB
    Free (estimated):      50.01TiB (min: 50.01TiB)
    Free (statfs, df):     50.01TiB
    Data ratio:            1.00
    Metadata ratio:        1.00
    Global reserve:        3.25MiB (used: 0.00B)
    Multiple profiles:     no

Data,single: Size:8.00MiB, Used:0.00B (0.00%)
   /dev/mapper/vg0-lv0      8.00MiB

Metadata,single: Size:8.00MiB, Used:112.00KiB (1.37%)
   /dev/mapper/vg0-lv0      8.00MiB

System,single: Size:4.00MiB, Used:16.00KiB (0.39%)
   /dev/mapper/vg0-lv0      4.00MiB

Unallocated:
   /dev/mapper/vg0-lv0      9.98GiB
   /dev/mapper/zero-data    50.00TiB

$ ./btrfs property set -t device /dev/mapper/zero-data allocation_hint DATA_ONLY
$ ./btrfs property set -t device /dev/vg0/lv0 allocation_hint METADATA_ONLY
$ btrfs balance start --full-balance /mnt/lol
Done, had to relocate 3 out of 3 chunks
$ btrfs fi usage /mnt/lol
Overall:
    Device size:           50.01TiB
    Device allocated:      2.03GiB
    Device unallocated:    50.01TiB
    Device missing:        0.00B
    Used:                  640.00KiB
    Free (estimated):      50.01TiB (min: 50.01TiB)
    Free (statfs, df):     50.01TiB
    Data ratio:            1.00
    Metadata ratio:        1.00
    Global reserve:        3.25MiB (used: 0.00B)
    Multiple profiles:     no

Data,single: Size:1.00GiB, Used:512.00KiB (0.05%)
   /dev/mapper/zero-data    1.00GiB

Metadata,single: Size:1.00GiB, Used:112.00KiB (0.01%)
   /dev/mapper/zero-data    1.00GiB

System,single: Size:32.00MiB, Used:16.00KiB (0.05%)
   /dev/mapper/zero-data    32.00MiB

Unallocated:
   /dev/mapper/vg0-lv0      10.00GiB
   /dev/mapper/zero-data    50.00TiB

I expected that I would have data on /dev/mapper/zero-data and metadata
on /dev/mapper/vg0-lv0, but it seems both of them were written to the zero
device. Attempting to actually use the file system eventually fails, since
the metadata is black-holed :)
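
To make the mismatch concrete, here is a small sketch that checks the placement reported by `btrfs fi usage` against the hints I set. It is not part of the patch set; the `FORBIDDEN` table and the `violations` helper are hypothetical names that only encode the exclusion rules from the cover letter (DATA_ONLY disks take no metadata, METADATA_ONLY disks take no data). System chunks are left out because the cover letter does not say how they are treated. Note that /dev/vg0/lv0 and /dev/mapper/vg0-lv0 are the same device.

```python
# Chunk types that each "strong" hint rules out, per the cover letter.
FORBIDDEN = {
    "DATA_ONLY": {"Metadata"},
    "METADATA_ONLY": {"Data"},
}

# Hints as set with "btrfs property set" in this message.
hints = {
    "/dev/mapper/zero-data": "DATA_ONLY",
    "/dev/mapper/vg0-lv0": "METADATA_ONLY",
}

# Placement reported by "btrfs fi usage" after the full balance.
placement = [
    ("Data", "/dev/mapper/zero-data"),
    ("Metadata", "/dev/mapper/zero-data"),
]


def violations(hints, placement):
    """Return the (chunk type, device) pairs that contradict a strong hint."""
    return [
        (chunk_type, dev)
        for chunk_type, dev in placement
        if chunk_type in FORBIDDEN.get(hints.get(dev, ""), set())
    ]


for chunk_type, dev in violations(hints, placement):
    print(f"{chunk_type} chunk on {dev}, which is tagged {hints[dev]}")
```

The only pair it flags is the metadata chunk on /dev/mapper/zero-data, which is the placement I did not expect.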

Did I make some mistake in how I used it, or is this a bug?

Thanks,
Boris

> BR
> G.Baroncelli
>
> Revision:
> V9:
> - rename dev_item->type to dev_item->flags
> - rename /sys/fs/btrfs/$UUID/devinfo/type -> allocation_hint
>
> V8:
> - drop the ioctl API, instead use a sysfs one
>
> V7:
> - make more room in the struct btrfs_ioctl_dev_properties up to 1K
> - leave in btrfs_tree.h only the constants
> - removed the mount option (sic)
> - correct a 'use before check' in the while loop (reported
> by Zygo)
> - add a 2nd sort to be sure that the device_info array is in the
> expected order
>
> V6:
> - add further values to the hints: add the possibility to
> exclude a disk for a chunk type
>
>
> Goffredo Baroncelli (6):
> btrfs: add flags to give an hint to the chunk allocator
> btrfs: export the device allocation_hint property in sysfs
> btrfs: change the device allocation_hint property via sysfs
> btrfs: add allocation_hint mode
> btrfs: rename dev_item->type to dev_item->flags
> btrfs: add allocation_hint option.
>
> fs/btrfs/ctree.h | 18 +++++-
> fs/btrfs/disk-io.c | 4 +-
> fs/btrfs/super.c | 17 ++++++
> fs/btrfs/sysfs.c | 73 ++++++++++++++++++++++
> fs/btrfs/volumes.c | 105 ++++++++++++++++++++++++++++++--
> fs/btrfs/volumes.h | 7 ++-
> include/uapi/linux/btrfs_tree.h | 20 +++++-
> 7 files changed, 232 insertions(+), 12 deletions(-)
>
> --
> 2.34.1
>