From: Boris Burkov <boris@bur.io>
To: Goffredo Baroncelli <kreijack@libero.it>
Cc: linux-btrfs@vger.kernel.org,
    Zygo Blaxell <ce3g8jdj@umail.furryterror.org>,
    Josef Bacik <josef@toxicpanda.com>,
    David Sterba <dsterba@suse.cz>,
    Sinnamohideen Shafeeq <shafeeqs@panasas.com>,
    Paul Jones <paul@pauljones.id.au>,
    Goffredo Baroncelli <kreijack@inwind.it>
Subject: Re: [RFC][V9][PATCH 0/6] btrfs: allocation_hint mode
Date: Tue, 4 Jan 2022 18:44:32 -0800
Message-ID: <YdUGAg1TB8FCfqnr@zen>
In-Reply-To: <cover.1639766364.git.kreijack@inwind.it>

On Fri, Dec 17, 2021 at 07:47:16PM +0100, Goffredo Baroncelli wrote:
> From: Goffredo Baroncelli <kreijack@inwind.it>
>
> Hi all,
>
> This patch set was born after some discussion between me, Zygo and Josef.
> Some details can be found in https://github.com/btrfs/btrfs-todo/issues/19.
>
> Some further information about a real use case can be found in
> https://lore.kernel.org/linux-btrfs/20210116002533.GE31381@hungrycats.org/
>
> Recently Shafeeq told me that he is interested too, due to the performance gain.
>
> In the V8 revision I switched away from an ioctl API in favor of a sysfs API
> (see patches #2 and #3).
>
> In V9 I renamed the sysfs interface from devinfo/type to devinfo/allocation_hint.
> Moreover, I renamed dev_item->type to dev_item->flags.
>
> The idea behind this patch set is to dedicate some disks (the fastest ones)
> to the metadata chunks. My initial idea was a "soft" hint. However, Zygo
> asked for an option for a "strong" hint (== mandatory). The result is that
> each disk can be "tagged" with one of the following flags:
> - BTRFS_DEV_ALLOCATION_METADATA_ONLY
> - BTRFS_DEV_ALLOCATION_PREFERRED_METADATA
> - BTRFS_DEV_ALLOCATION_PREFERRED_DATA
> - BTRFS_DEV_ALLOCATION_DATA_ONLY
>
> When the chunk allocator searches for disks on which to allocate a chunk, it
> scans the disks in an order decided by these tags. For metadata, the order is:
> *_METADATA_ONLY
> *_PREFERRED_METADATA
> *_PREFERRED_DATA
>
> The *_DATA_ONLY disks are not eligible for metadata chunk allocation.
>
> For data chunks, the order is reversed, and the *_METADATA_ONLY disks are
> excluded.
>
> The exact sort logic is to sort first by the "tag", and then by the space
> available. If there is no space available, the disks with the next "tag" are
> selected.
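
The ordering described above can be modelled in a few lines of Python. This is
only a sketch of the cover letter's description, not the kernel code: the
Device class, the candidates() helper and the free-space numbers are
hypothetical, and profile constraints (such as RAID1 needing two devices) are
ignored.

```python
from dataclasses import dataclass

# Scan order per chunk type, taken from the cover letter. A tag that is
# missing from a list is not eligible for that chunk type.
SCAN_ORDER = {
    "metadata": ["METADATA_ONLY", "PREFERRED_METADATA", "PREFERRED_DATA"],
    "data": ["DATA_ONLY", "PREFERRED_DATA", "PREFERRED_METADATA"],
}


@dataclass
class Device:  # hypothetical stand-in for the allocator's per-device info
    path: str
    hint: str
    free_mib: int


def candidates(devices, chunk_type):
    """Return eligible devices, best first: by tag rank, then by free space."""
    rank = {tag: i for i, tag in enumerate(SCAN_ORDER[chunk_type])}
    # Devices with no free space are skipped, so the next tag is tried.
    eligible = [d for d in devices if d.hint in rank and d.free_mib > 0]
    return sorted(eligible, key=lambda d: (rank[d.hint], -d.free_mib))


# Illustrative values only; they are not taken from the transcripts.
devices = [
    Device("/dev/loop0", "DATA_ONLY", 736),
    Device("/dev/loop2", "PREFERRED_DATA", 128),
    Device("/dev/loop5", "METADATA_ONLY", 1),
    Device("/dev/loop6", "METADATA_ONLY", 128),
]
metadata_order = [d.path for d in candidates(devices, "metadata")]
data_order = [d.path for d in candidates(devices, "data")]
print(metadata_order)
print(data_order)
```

In this model a DATA_ONLY device never appears in the metadata list, however
much free space it has, and a METADATA_ONLY device never appears in the data
list.
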
>
> To set these tags, a new property called "allocation_hint" was created.
> There is a dedicated btrfs-progs patch set [[PATCH V9] btrfs-progs:
> allocation_hint disk property].
>
> $ sudo mount /dev/loop0 /mnt/test-btrfs/
> $ for i in /dev/loop[0-9]; do sudo ./btrfs prop get $i allocation_hint; done
> devid=1, path=/dev/loop0: allocation_hint=PREFERRED_METADATA
> devid=2, path=/dev/loop1: allocation_hint=PREFERRED_METADATA
> devid=3, path=/dev/loop2: allocation_hint=PREFERRED_DATA
> devid=4, path=/dev/loop3: allocation_hint=PREFERRED_DATA
> devid=5, path=/dev/loop4: allocation_hint=PREFERRED_DATA
> devid=6, path=/dev/loop5: allocation_hint=DATA_ONLY
> devid=7, path=/dev/loop6: allocation_hint=METADATA_ONLY
> devid=8, path=/dev/loop7: allocation_hint=METADATA_ONLY
>
> $ sudo ./btrfs fi us /mnt/test-btrfs/
> Overall:
> Device size: 2.75GiB
> Device allocated: 1.34GiB
> Device unallocated: 1.41GiB
> Device missing: 0.00B
> Used: 400.89MiB
> Free (estimated): 1.04GiB (min: 1.04GiB)
> Data ratio: 2.00
> Metadata ratio: 1.00
> Global reserve: 3.25MiB (used: 0.00B)
> Multiple profiles: no
>
> Data,RAID1: Size:542.00MiB, Used:200.25MiB (36.95%)
> /dev/loop0 288.00MiB
> /dev/loop1 288.00MiB
> /dev/loop2 127.00MiB
> /dev/loop3 127.00MiB
> /dev/loop4 127.00MiB
> /dev/loop5 127.00MiB
>
> Metadata,single: Size:256.00MiB, Used:384.00KiB (0.15%)
> /dev/loop1 256.00MiB
>
> System,single: Size:32.00MiB, Used:16.00KiB (0.05%)
> /dev/loop0 32.00MiB
>
> Unallocated:
> /dev/loop0 704.00MiB
> /dev/loop1 480.00MiB
> /dev/loop2 1.00MiB
> /dev/loop3 1.00MiB
> /dev/loop4 1.00MiB
> /dev/loop5 1.00MiB
> /dev/loop6 128.00MiB
> /dev/loop7 128.00MiB
>
> # change the tag of some disks
>
> $ sudo ./btrfs prop set /dev/loop0 allocation_hint DATA_ONLY
> $ sudo ./btrfs prop set /dev/loop1 allocation_hint DATA_ONLY
> $ sudo ./btrfs prop set /dev/loop5 allocation_hint METADATA_ONLY
>
> $ for i in /dev/loop[0-9]; do sudo ./btrfs prop get $i allocation_hint; done
> devid=1, path=/dev/loop0: allocation_hint=DATA_ONLY
> devid=2, path=/dev/loop1: allocation_hint=DATA_ONLY
> devid=3, path=/dev/loop2: allocation_hint=PREFERRED_DATA
> devid=4, path=/dev/loop3: allocation_hint=PREFERRED_DATA
> devid=5, path=/dev/loop4: allocation_hint=PREFERRED_DATA
> devid=6, path=/dev/loop5: allocation_hint=METADATA_ONLY
> devid=7, path=/dev/loop6: allocation_hint=METADATA_ONLY
> devid=8, path=/dev/loop7: allocation_hint=METADATA_ONLY
>
> $ sudo btrfs bal start --full-balance /mnt/test-btrfs/
> $ sudo ./btrfs fi us /mnt/test-btrfs/
> Overall:
> Device size: 2.75GiB
> Device allocated: 735.00MiB
> Device unallocated: 2.03GiB
> Device missing: 0.00B
> Used: 400.72MiB
> Free (estimated): 1.10GiB (min: 1.10GiB)
> Data ratio: 2.00
> Metadata ratio: 1.00
> Global reserve: 3.25MiB (used: 0.00B)
> Multiple profiles: no
>
> Data,RAID1: Size:288.00MiB, Used:200.19MiB (69.51%)
> /dev/loop0 288.00MiB
> /dev/loop1 288.00MiB
>
> Metadata,single: Size:127.00MiB, Used:336.00KiB (0.26%)
> /dev/loop5 127.00MiB
>
> System,single: Size:32.00MiB, Used:16.00KiB (0.05%)
> /dev/loop7 32.00MiB
>
> Unallocated:
> /dev/loop0 736.00MiB
> /dev/loop1 736.00MiB
> /dev/loop2 128.00MiB
> /dev/loop3 128.00MiB
> /dev/loop4 128.00MiB
> /dev/loop5 1.00MiB
> /dev/loop6 128.00MiB
> /dev/loop7 96.00MiB
>
>
> #As you can see, all the metadata was placed on disks loop5/loop7, even
> #though the emptiest ones are loop0 and loop1.
>
>
>
> TODO:
> - more tests
> - the tool which shows the space available should consider the tagging (e.g.
> the disks tagged as _METADATA_ONLY should be excluded from the data
> availability)
> - allow btrfs-progs to change the allocation_hint even when the filesystem
> is not mounted.
>
>
> Comments are welcome

This is cool, thanks for building it!

I'm playing with setting this up for a test I'm working on where I want
to send data to a dm-zero device. To that end, I applied this patchset
on top of misc-next and ran:

$ mkfs.btrfs -f /dev/vg0/lv0 -dsingle -msingle
$ mount /dev/vg0/lv0 /mnt/lol
$ btrfs device add /dev/mapper/zero-data /mnt/lol
$ btrfs fi usage /mnt/lol
Overall:
Device size: 50.01TiB
Device allocated: 20.00MiB
Device unallocated: 50.01TiB
Device missing: 0.00B
Used: 128.00KiB
Free (estimated): 50.01TiB (min: 50.01TiB)
Free (statfs, df): 50.01TiB
Data ratio: 1.00
Metadata ratio: 1.00
Global reserve: 3.25MiB (used: 0.00B)
Multiple profiles: no

Data,single: Size:8.00MiB, Used:0.00B (0.00%)
/dev/mapper/vg0-lv0 8.00MiB

Metadata,single: Size:8.00MiB, Used:112.00KiB (1.37%)
/dev/mapper/vg0-lv0 8.00MiB

System,single: Size:4.00MiB, Used:16.00KiB (0.39%)
/dev/mapper/vg0-lv0 4.00MiB

Unallocated:
/dev/mapper/vg0-lv0 9.98GiB
/dev/mapper/zero-data 50.00TiB

$ ./btrfs property set -t device /dev/mapper/zero-data allocation_hint DATA_ONLY
$ ./btrfs property set -t device /dev/vg0/lv0 allocation_hint METADATA_ONLY
$ btrfs balance start --full-balance /mnt/lol
Done, had to relocate 3 out of 3 chunks
$ btrfs fi usage /mnt/lol
Overall:
Device size: 50.01TiB
Device allocated: 2.03GiB
Device unallocated: 50.01TiB
Device missing: 0.00B
Used: 640.00KiB
Free (estimated): 50.01TiB (min: 50.01TiB)
Free (statfs, df): 50.01TiB
Data ratio: 1.00
Metadata ratio: 1.00
Global reserve: 3.25MiB (used: 0.00B)
Multiple profiles: no

Data,single: Size:1.00GiB, Used:512.00KiB (0.05%)
/dev/mapper/zero-data 1.00GiB

Metadata,single: Size:1.00GiB, Used:112.00KiB (0.01%)
/dev/mapper/zero-data 1.00GiB

System,single: Size:32.00MiB, Used:16.00KiB (0.05%)
/dev/mapper/zero-data 32.00MiB

Unallocated:
/dev/mapper/vg0-lv0 10.00GiB
/dev/mapper/zero-data 50.00TiB

I expected that I would have data on /dev/mapper/zero-data and metadata
on /dev/mapper/vg0-lv0, but it seems both of them were written to the zero
device. Attempting to actually use the file system eventually fails, since
the metadata is black-holed :)

Did I make some mistake in how I used it, or is this a bug?
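
The mismatch between the hints and the resulting layout can also be checked
mechanically. The following Python sketch parses the per-profile sections of
the "btrfs fi usage" text shown above and reports block groups sitting on a
device whose hint excludes them. The helper names (parse_placement,
violations) are hypothetical, the parser only assumes the text layout seen in
this mail, and System chunks are deliberately not checked because the cover
letter does not say how they are treated.

```python
import re

# Section headers look like "Data,single: Size:..." in the usage output.
HEADER = re.compile(r"^(Data|Metadata|System),\w+:")

# Chunk type that a "strong" hint excludes, per the cover letter.
FORBIDDEN = {"DATA_ONLY": "Metadata", "METADATA_ONLY": "Data"}


def parse_placement(usage_text):
    """Map each chunk type to the device paths listed under its section."""
    placement = {}
    current = None
    for raw in usage_text.splitlines():
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            current = match.group(1)
            placement.setdefault(current, [])
        elif line.startswith("Unallocated:"):
            current = None  # device lines below are free space, not chunks
        elif current and line.startswith("/"):
            placement[current].append(line.split()[0])
    return placement


def violations(placement, hints):
    """Return (path, hint, chunk_type) for each excluded placement found."""
    found = []
    for path, hint in hints.items():
        bad_type = FORBIDDEN.get(hint)
        if bad_type and path in placement.get(bad_type, []):
            found.append((path, hint, bad_type))
    return found


SAMPLE = """\
Data,single: Size:1.00GiB, Used:512.00KiB (0.05%)
/dev/mapper/zero-data 1.00GiB
Metadata,single: Size:1.00GiB, Used:112.00KiB (0.01%)
/dev/mapper/zero-data 1.00GiB
System,single: Size:32.00MiB, Used:16.00KiB (0.05%)
/dev/mapper/zero-data 32.00MiB
Unallocated:
/dev/mapper/vg0-lv0 10.00GiB
/dev/mapper/zero-data 50.00TiB
"""

# Hints keyed by the paths that the usage output prints; /dev/vg0/lv0 is
# assumed to be the same device as /dev/mapper/vg0-lv0.
hints = {
    "/dev/mapper/zero-data": "DATA_ONLY",
    "/dev/mapper/vg0-lv0": "METADATA_ONLY",
}
placement = parse_placement(SAMPLE)
problems = violations(placement, hints)
print(problems)
```

The check only flags placements that a hint excludes; it does not say why the
allocator chose them.
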
Thanks,
Boris

> BR
> G.Baroncelli
>
> Revision:
> V9:
> - rename dev_item->type to dev_item->flags
> - rename /sys/fs/btrfs/$UUID/devinfo/type -> allocation_hint
>
> V8:
> - drop the ioctl API, instead use a sysfs one
>
> V7:
> - make more room in the struct btrfs_ioctl_dev_properties up to 1K
> - leave in btrfs_tree.h only the constants
> - removed the mount option (sic)
> - correct a 'use before check' in the while loop (signaled
> by Zygo)
> - add a 2nd sort to be sure that the device_info array is in the
> expected order
>
> V6:
> - add further values to the hints: add the possibility to
> exclude a disk for a chunk type
>
>
> Goffredo Baroncelli (6):
> btrfs: add flags to give an hint to the chunk allocator
> btrfs: export the device allocation_hint property in sysfs
> btrfs: change the device allocation_hint property via sysfs
> btrfs: add allocation_hint mode
> btrfs: rename dev_item->type to dev_item->flags
> btrfs: add allocation_hint option.
>
> fs/btrfs/ctree.h | 18 +++++-
> fs/btrfs/disk-io.c | 4 +-
> fs/btrfs/super.c | 17 ++++++
> fs/btrfs/sysfs.c | 73 ++++++++++++++++++++++
> fs/btrfs/volumes.c | 105 ++++++++++++++++++++++++++++++--
> fs/btrfs/volumes.h | 7 ++-
> include/uapi/linux/btrfs_tree.h | 20 +++++-
> 7 files changed, 232 insertions(+), 12 deletions(-)
>
> --
> 2.34.1
>