linux-btrfs.vger.kernel.org archive mirror
From: Boris Burkov <boris@bur.io>
To: Goffredo Baroncelli <kreijack@libero.it>
Cc: linux-btrfs@vger.kernel.org,
	Zygo Blaxell <ce3g8jdj@umail.furryterror.org>,
	Josef Bacik <josef@toxicpanda.com>,
	David Sterba <dsterba@suse.cz>,
	Sinnamohideen Shafeeq <shafeeqs@panasas.com>,
	Paul Jones <paul@pauljones.id.au>,
	Goffredo Baroncelli <kreijack@inwind.it>
Subject: Re: [RFC][V9][PATCH 0/6] btrfs: allocation_hint mode
Date: Tue, 4 Jan 2022 18:44:32 -0800
Message-ID: <YdUGAg1TB8FCfqnr@zen>
In-Reply-To: <cover.1639766364.git.kreijack@inwind.it>

On Fri, Dec 17, 2021 at 07:47:16PM +0100, Goffredo Baroncelli wrote:
> From: Goffredo Baroncelli <kreijack@inwind.it>
> 
> Hi all,
> 
> This patch set was born after some discussion between me, Zygo and Josef.
> Some details can be found in https://github.com/btrfs/btrfs-todo/issues/19.
> 
> Some further information about a real use case can be found in
> https://lore.kernel.org/linux-btrfs/20210116002533.GE31381@hungrycats.org/
> 
> Recently Shafeeq told me that he is interested too, due to the performance gain.
> 
> In the V8 revision I switched away from an ioctl API in favor of a sysfs API
> (see patches #2 and #3).
> 
> In V9 I renamed the sysfs interface from devinfo/type to devinfo/allocation_hint.
> Moreover, I renamed dev_item->type to dev_item->flags.
> 
> The idea behind this patch set is to dedicate some disks (the fastest ones)
> to metadata chunks. My initial idea was a "soft" hint. However, Zygo
> asked for an option for a "strong" hint (== mandatory). The result is that
> each disk can be "tagged" with one of the following flags:
> - BTRFS_DEV_ALLOCATION_METADATA_ONLY
> - BTRFS_DEV_ALLOCATION_PREFERRED_METADATA
> - BTRFS_DEV_ALLOCATION_PREFERRED_DATA
> - BTRFS_DEV_ALLOCATION_DATA_ONLY
> 
> When the chunk allocator searches for disks on which to allocate a chunk, it
> scans the disks in an order decided by these tags. For metadata, the order is:
> *_METADATA_ONLY
> *_PREFERRED_METADATA
> *_PREFERRED_DATA
> 
> The *_DATA_ONLY disks are not eligible for metadata chunk allocation.
> 
> For data chunks, the order is reversed, and the *_METADATA_ONLY disks are
> excluded.
> 
> The exact sort logic is to sort first by the "tag", and then by the space
> available. If there is no space available, the next "tag" disk set is
> selected.
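
The ordering described above can be sketched roughly as follows. This is
only an illustration in Python, not the code from the patch set: the Disk
class, candidate_order() and the free-space numbers are hypothetical, and
sorting by descending free space within a tag is an assumption about what
"then by the space available" means.

```python
from dataclasses import dataclass

# Tag names mirror the BTRFS_DEV_ALLOCATION_* flags from the cover letter.
METADATA_ONLY = "METADATA_ONLY"
PREFERRED_METADATA = "PREFERRED_METADATA"
PREFERRED_DATA = "PREFERRED_DATA"
DATA_ONLY = "DATA_ONLY"

# Lower rank is scanned first; a tag missing from a table is not eligible.
METADATA_RANK = {METADATA_ONLY: 0, PREFERRED_METADATA: 1, PREFERRED_DATA: 2}
DATA_RANK = {DATA_ONLY: 0, PREFERRED_DATA: 1, PREFERRED_METADATA: 2}


@dataclass
class Disk:
    name: str
    hint: str
    free: int  # unallocated space in MiB (illustrative values only)


def candidate_order(disks, chunk_type, needed=1):
    """Return eligible disks, sorted by tag rank and then by free space."""
    rank = METADATA_RANK if chunk_type == "metadata" else DATA_RANK
    # Drop excluded tags and disks that cannot hold the requested amount.
    eligible = [d for d in disks if d.hint in rank and d.free >= needed]
    # Assumption: within one tag, the disk with more free space comes first.
    return sorted(eligible, key=lambda d: (rank[d.hint], -d.free))


disks = [
    Disk("loop0", DATA_ONLY, 736),
    Disk("loop1", DATA_ONLY, 736),
    Disk("loop2", PREFERRED_DATA, 128),
    Disk("loop5", METADATA_ONLY, 1),
    Disk("loop6", METADATA_ONLY, 128),
    Disk("loop7", METADATA_ONLY, 96),
]

print([d.name for d in candidate_order(disks, "metadata", needed=32)])
# ['loop6', 'loop7', 'loop2']
print([d.name for d in candidate_order(disks, "data", needed=32)])
# ['loop0', 'loop1', 'loop2']
```

In this model loop5 drops out of the metadata list because it has too
little free space, and loop2 (PREFERRED_DATA) is only reached once the
METADATA_ONLY disks have been considered.
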
> 
> To set these tags, a new property called "allocation_hint" was created.
> There is a dedicated btrfs-progs patch set [[PATCH V9] btrfs-progs:
> allocation_hint disk property].
> 
> $ sudo mount /dev/loop0 /mnt/test-btrfs/
> $ for i in /dev/loop[0-9]; do sudo ./btrfs prop get $i allocation_hint; done
> devid=1, path=/dev/loop0: allocation_hint=PREFERRED_METADATA
> devid=2, path=/dev/loop1: allocation_hint=PREFERRED_METADATA
> devid=3, path=/dev/loop2: allocation_hint=PREFERRED_DATA
> devid=4, path=/dev/loop3: allocation_hint=PREFERRED_DATA
> devid=5, path=/dev/loop4: allocation_hint=PREFERRED_DATA
> devid=6, path=/dev/loop5: allocation_hint=DATA_ONLY
> devid=7, path=/dev/loop6: allocation_hint=METADATA_ONLY
> devid=8, path=/dev/loop7: allocation_hint=METADATA_ONLY
> 
> $ sudo ./btrfs fi us /mnt/test-btrfs/
> Overall:
>     Device size:                   2.75GiB
>     Device allocated:              1.34GiB
>     Device unallocated:            1.41GiB
>     Device missing:                  0.00B
>     Used:                        400.89MiB
>     Free (estimated):              1.04GiB    (min: 1.04GiB)
>     Data ratio:                       2.00
>     Metadata ratio:                   1.00
>     Global reserve:                3.25MiB    (used: 0.00B)
>     Multiple profiles:                  no
> 
> Data,RAID1: Size:542.00MiB, Used:200.25MiB (36.95%)
>    /dev/loop0     288.00MiB
>    /dev/loop1     288.00MiB
>    /dev/loop2     127.00MiB
>    /dev/loop3     127.00MiB
>    /dev/loop4     127.00MiB
>    /dev/loop5     127.00MiB
> 
> Metadata,single: Size:256.00MiB, Used:384.00KiB (0.15%)
>    /dev/loop1     256.00MiB
> 
> System,single: Size:32.00MiB, Used:16.00KiB (0.05%)
>    /dev/loop0      32.00MiB
> 
> Unallocated:
>    /dev/loop0     704.00MiB
>    /dev/loop1     480.00MiB
>    /dev/loop2       1.00MiB
>    /dev/loop3       1.00MiB
>    /dev/loop4       1.00MiB
>    /dev/loop5       1.00MiB
>    /dev/loop6     128.00MiB
>    /dev/loop7     128.00MiB
> 
> # change the tag of some disks
> 
> $ sudo ./btrfs prop set /dev/loop0 allocation_hint DATA_ONLY
> $ sudo ./btrfs prop set /dev/loop1 allocation_hint DATA_ONLY
> $ sudo ./btrfs prop set /dev/loop5 allocation_hint METADATA_ONLY
> 
> $ for i in /dev/loop[0-9]; do sudo ./btrfs prop get $i allocation_hint; done
> devid=1, path=/dev/loop0: allocation_hint=DATA_ONLY
> devid=2, path=/dev/loop1: allocation_hint=DATA_ONLY
> devid=3, path=/dev/loop2: allocation_hint=PREFERRED_DATA
> devid=4, path=/dev/loop3: allocation_hint=PREFERRED_DATA
> devid=5, path=/dev/loop4: allocation_hint=PREFERRED_DATA
> devid=6, path=/dev/loop5: allocation_hint=METADATA_ONLY
> devid=7, path=/dev/loop6: allocation_hint=METADATA_ONLY
> devid=8, path=/dev/loop7: allocation_hint=METADATA_ONLY
> 
> $ sudo btrfs bal start --full-balance /mnt/test-btrfs/
> $ sudo ./btrfs fi us /mnt/test-btrfs/
> Overall:
>     Device size:                   2.75GiB
>     Device allocated:            735.00MiB
>     Device unallocated:            2.03GiB
>     Device missing:                  0.00B
>     Used:                        400.72MiB
>     Free (estimated):              1.10GiB    (min: 1.10GiB)
>     Data ratio:                       2.00
>     Metadata ratio:                   1.00
>     Global reserve:                3.25MiB    (used: 0.00B)
>     Multiple profiles:                  no
> 
> Data,RAID1: Size:288.00MiB, Used:200.19MiB (69.51%)
>    /dev/loop0     288.00MiB
>    /dev/loop1     288.00MiB
> 
> Metadata,single: Size:127.00MiB, Used:336.00KiB (0.26%)
>    /dev/loop5     127.00MiB
> 
> System,single: Size:32.00MiB, Used:16.00KiB (0.05%)
>    /dev/loop7      32.00MiB
> 
> Unallocated:
>    /dev/loop0     736.00MiB
>    /dev/loop1     736.00MiB
>    /dev/loop2     128.00MiB
>    /dev/loop3     128.00MiB
>    /dev/loop4     128.00MiB
>    /dev/loop5       1.00MiB
>    /dev/loop6     128.00MiB
>    /dev/loop7      96.00MiB
> 
> 
> # As you can see, all the metadata was placed on disks loop5/loop7 even though
> # the emptiest ones are loop0 and loop1.
> 
> 
> 
> TODO:
> - more tests
> - the tools which show the available space should consider the tagging (e.g.
>   the disks tagged _METADATA_ONLY should be excluded from the data
>   availability)
> - allow btrfs-progs to change the allocation_hint even when the filesystem
>   is not mounted.
> 
> 
> Comments are welcome

This is cool, thanks for building it!

I'm playing with setting this up for a test I'm working on where I want
to send data to a dm-zero device. To that end, I applied this patchset
on top of misc-next and ran:

$ mkfs.btrfs -f /dev/vg0/lv0 -dsingle -msingle
$ mount /dev/vg0/lv0 /mnt/lol
$ btrfs device add /dev/mapper/zero-data /mnt/lol
$ btrfs fi usage /mnt/lol
Overall:
    Device size:                  50.01TiB
    Device allocated:             20.00MiB
    Device unallocated:           50.01TiB
    Device missing:                  0.00B
    Used:                        128.00KiB
    Free (estimated):             50.01TiB      (min: 50.01TiB)
    Free (statfs, df):            50.01TiB
    Data ratio:                       1.00
    Metadata ratio:                   1.00
    Global reserve:                3.25MiB      (used: 0.00B)
    Multiple profiles:                  no

Data,single: Size:8.00MiB, Used:0.00B (0.00%)
   /dev/mapper/vg0-lv0     8.00MiB

Metadata,single: Size:8.00MiB, Used:112.00KiB (1.37%)
   /dev/mapper/vg0-lv0     8.00MiB

System,single: Size:4.00MiB, Used:16.00KiB (0.39%)
   /dev/mapper/vg0-lv0     4.00MiB

Unallocated:
   /dev/mapper/vg0-lv0     9.98GiB
   /dev/mapper/zero-data          50.00TiB

$ ./btrfs property set -t device /dev/mapper/zero-data allocation_hint DATA_ONLY
$ ./btrfs property set -t device /dev/vg0/lv0 allocation_hint METADATA_ONLY

$ btrfs balance start --full-balance /mnt/lol
Done, had to relocate 3 out of 3 chunks

$ btrfs fi usage /mnt/lol
Overall:
    Device size:                  50.01TiB
    Device allocated:              2.03GiB
    Device unallocated:           50.01TiB
    Device missing:                  0.00B
    Used:                        640.00KiB
    Free (estimated):             50.01TiB      (min: 50.01TiB)
    Free (statfs, df):            50.01TiB
    Data ratio:                       1.00
    Metadata ratio:                   1.00
    Global reserve:                3.25MiB      (used: 0.00B)
    Multiple profiles:                  no

Data,single: Size:1.00GiB, Used:512.00KiB (0.05%)
   /dev/mapper/zero-data           1.00GiB

Metadata,single: Size:1.00GiB, Used:112.00KiB (0.01%)
   /dev/mapper/zero-data           1.00GiB

System,single: Size:32.00MiB, Used:16.00KiB (0.05%)
   /dev/mapper/zero-data          32.00MiB

Unallocated:
   /dev/mapper/vg0-lv0    10.00GiB
   /dev/mapper/zero-data          50.00TiB


I expected that I would have data on /dev/mapper/zero-data and metadata
on /dev/mapper/vg0-lv0, but it seems both of them were written to the zero
device. Attempting to actually use the file system eventually fails, since
the metadata is black-holed :)
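
To check the placement without reading the output by eye, something like
the sketch below could be used. It assumes the `btrfs fi usage` text layout
shown above (a "Type,profile:" header followed by indented device lines);
devices_by_chunk_type() is a hypothetical helper, not part of btrfs-progs.

```python
import re

# Matches headers such as "Data,single: Size:1.00GiB, Used:512.00KiB (0.05%)".
SECTION = re.compile(r"^(Data|Metadata|System),[^:]+:")


def devices_by_chunk_type(usage_text):
    """Map each chunk type to the device paths listed under its header."""
    placement = {}
    current = None
    for line in usage_text.splitlines():
        match = SECTION.match(line)
        if match:
            current = match.group(1)
            placement.setdefault(current, [])
        elif current and line.strip().startswith("/"):
            # Device lines look like "   /dev/mapper/zero-data   1.00GiB".
            placement[current].append(line.split()[0])
        else:
            # A blank line or any other header ("Unallocated:") ends a section.
            current = None
    return placement


sample = """\
Data,single: Size:1.00GiB, Used:512.00KiB (0.05%)
   /dev/mapper/zero-data           1.00GiB

Metadata,single: Size:1.00GiB, Used:112.00KiB (0.01%)
   /dev/mapper/zero-data           1.00GiB

System,single: Size:32.00MiB, Used:16.00KiB (0.05%)
   /dev/mapper/zero-data          32.00MiB

Unallocated:
   /dev/mapper/vg0-lv0    10.00GiB
   /dev/mapper/zero-data          50.00TiB
"""

placement = devices_by_chunk_type(sample)
print(placement["Metadata"])
# ['/dev/mapper/zero-data'], whereas I expected /dev/mapper/vg0-lv0
```
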

Did I make some mistake in how I used it, or is this a bug?

Thanks,
Boris

> BR
> G.Baroncelli
> 
> Revision:
> V9:
> - rename dev_item->type to dev_item->flags
> - rename /sys/fs/btrfs/$UUID/devinfo/type -> allocation_hint
> 
> V8:
> - drop the ioctl API, instead use a sysfs one
> 
> V7:
> - make more room in the struct btrfs_ioctl_dev_properties up to 1K
> - leave in btrfs_tree.h only the constants
> - removed the mount option (sic)
> - correct a 'use before check' in the while loop (signaled
>   by Zygo)
> - add a 2nd sort to be sure that the device_info array is in the
>   expected order
> 
> V6:
> - add further values to the hints: add the possibility to
>   exclude a disk for a chunk type 
> 
> 
> Goffredo Baroncelli (6):
>   btrfs: add flags to give an hint to the chunk allocator
>   btrfs: export the device allocation_hint property in sysfs
>   btrfs: change the device allocation_hint property via sysfs
>   btrfs: add allocation_hint mode
>   btrfs: rename dev_item->type to dev_item->flags
>   btrfs: add allocation_hint option.
> 
>  fs/btrfs/ctree.h                |  18 +++++-
>  fs/btrfs/disk-io.c              |   4 +-
>  fs/btrfs/super.c                |  17 ++++++
>  fs/btrfs/sysfs.c                |  73 ++++++++++++++++++++++
>  fs/btrfs/volumes.c              | 105 ++++++++++++++++++++++++++++++--
>  fs/btrfs/volumes.h              |   7 ++-
>  include/uapi/linux/btrfs_tree.h |  20 +++++-
>  7 files changed, 232 insertions(+), 12 deletions(-)
> 
> -- 
> 2.34.1
> 
