linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
To: Qu Wenruo <quwenruo.btrfs@gmx.com>
Cc: kreijack@inwind.it, Josef Bacik <josef@toxicpanda.com>,
	David Sterba <dsterba@suse.cz>,
	Sinnamohideen Shafeeq <shafeeqs@panasas.com>,
	Paul Jones <paul@pauljones.id.au>, Boris Burkov <boris@bur.io>,
	linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 0/7][V11] btrfs: allocation_hint
Date: Tue, 15 Feb 2022 22:28:16 -0500	[thread overview]
Message-ID: <YgxvUC86zumH3OF1@hungrycats.org> (raw)
In-Reply-To: <c43c5945-3c3a-0dee-a998-9e76c3eb0289@gmx.com>

On Wed, Feb 16, 2022 at 08:22:55AM +0800, Qu Wenruo wrote:
> 
> 
> On 2022/2/16 02:49, Goffredo Baroncelli wrote:
> > Hi Josef,
> > 
> > gentle ping...
> > 
> > few months ago you showed some interest in this patches set. Few of the
> > cc-ed person use this patch set.
> > 
> > I know that David showed some interest in the Anand approach (i.e. no
> > knobs, but an automatic behavior looking at the speed of the devices).
> > 
> > At the time when I tried this approach in the first attempts, I got the
> > complain that the kernel may not know the performance differences of the
> > disk (HDD vs SSD vs NVME vs ZONED disk...).
> 
> Sorry I didn't check the patches in details.
> 
> But I'm a little concerned about how to accurately determine the
> performance of a device.
> 
> If doing it automatically, there must be some (commonly very short) time
> spent to do the test.
> 
> In the very short time, I doubt we can even accurately got a full
> picture of a device (from sequential read/write speed to IOPS values)
> 
> For spinning disks, the sequential read/write speed even change based on
> their LBA address (as their physical location inside the plate can
> change their linear velocity, since the angular velocity is fixed).
> 
> And even for SSD, IOPS can var dramatically due to cache/controller
> difference.
> 
> 
> For a proper performance aware setup, I guess the only correct way to
> fetch performance characteristics is from the (advanced) user.
> 
> Or we may need to spent at least tens of minutes to do proper tests to
> get the result.
> 
> For regular end users, the difference between SSD and HDD is huge enough
> and simply preferring SSD for metadata is good enough.
> 
> But for more complex setup, like btrfs over LUKS over LVM (even crosses
> several physical devices), I doubt if it's even possible to fetch the
> correct performance characteristics automatically.

I agree with all of the above.

An automatic performance detection/configuration daemon can easily
set the metadata/data preference bits during mkfs, or monitor the
system's iostats and change preferences over time if this is really
useful and desired.  It doesn't have to run in the kernel.

> Thanks,
> Qu
> > 
> > Comments ?
> > 
> > BR
> > Goffredo
> > 
> > 
> > On 26/01/2022 21.32, Goffredo Baroncelli wrote:
> > > From: Goffredo Baroncelli <kreijack@inwind.it>
> > > 
> > > Hi all,
> > > 
> > > This patches set was born after some discussion between me, Zygo and
> > > Josef.
> > > Some details can be found in
> > > https://github.com/btrfs/btrfs-todo/issues/19.
> > > 
> > > Some further information about a real use case can be found in
> > > https://lore.kernel.org/linux-btrfs/20210116002533.GE31381@hungrycats.org/
> > > 
> > > 
> > > Recently Shafeeq told me that he is interested too, due to the
> > > performance gain.
> > > 
> > > In V8, revision I switched away from an ioctl API in favor of a sysfs
> > > API (
> > > see patch #2 and #3).
> > > 
> > > In V9, I renamed the sysfs interface from devinfo/type to
> > > devinfo/allocation_hint.
> > > Moreover I renamed dev_info->type to dev_info->flags.
> > > 
> > > In V10, I renamed the tag 'PREFERRED' from PREFERRED_X to X_PREFERRED;
> > > I added
> > > another devinfo property, called major_minor. For now it is unused;
> > > the plan is to use this in btrfs-progs to go from the block device to
> > > the filesystem.
> > > First client will be "btrfs prop get/set allocation_hint", but I see
> > > others possible
> > > clients.
> > > Finally I made some cleanup, and remove the mount option
> > > 'allocation_hint'
> > > 
> > > In V11 I added a new 'feature' file
> > > /sys/fs/btrfs/features/allocation_hint
> > > to help the detection of the allocation_hint feature.
> > > 
> > > The idea behind this patches set, is to dedicate some disks (the
> > > fastest one)
> > > to the metadata chunk. My initial idea was a "soft" hint. However Zygo
> > > asked an option for a "strong" hint (== mandatory). The result is that
> > > each disk can be "tagged" by one of the following flags:
> > > - BTRFS_DEV_ALLOCATION_METADATA_ONLY
> > > - BTRFS_DEV_ALLOCATION_METADATA_PREFERRED
> > > - BTRFS_DEV_ALLOCATION_DATA_PREFERRED
> > > - BTRFS_DEV_ALLOCATION_DATA_ONLY
> > > 
> > > When the chunk allocator search a disks to allocate a chunk, scans the
> > > disks
> > > in an order decided by these tags. For metadata, the order is:
> > > *_METADATA_ONLY
> > > *_METADATA_PREFERRED
> > > *_DATA_PREFERRED
> > > 
> > > The *_DATA_ONLY are not eligible from metadata chunk allocation.
> > > 
> > > For the data chunk, the order is reversed, and the *_METADATA_ONLY are
> > > excluded.
> > > 
> > > The exact sort logic is to sort first for the "tag", and then for the
> > > space
> > > available. If there is no space available, the next "tag" disks set are
> > > selected.
> > > 
> > > To set these tags, a new property called "allocation_hint" was created.
> > > There is a dedicated btrfs-prog patches set [[PATCH V11] btrfs-progs:
> > > allocation_hint disk property].
> > > 
> > > $ sudo mount /dev/loop0 /mnt/test-btrfs/
> > > $ for i in /dev/loop[0-9]; do sudo ./btrfs prop get $i
> > > allocation_hint; done
> > > devid=1, path=/dev/loop0: allocation_hint=METADATA_PREFERRED
> > > devid=2, path=/dev/loop1: allocation_hint=METADATA_PREFERRED
> > > devid=3, path=/dev/loop2: allocation_hint=DATA_PREFERRED
> > > devid=4, path=/dev/loop3: allocation_hint=DATA_PREFERRED
> > > devid=5, path=/dev/loop4: allocation_hint=DATA_PREFERRED
> > > devid=6, path=/dev/loop5: allocation_hint=DATA_ONLY
> > > devid=7, path=/dev/loop6: allocation_hint=METADATA_ONLY
> > > devid=8, path=/dev/loop7: allocation_hint=METADATA_ONLY
> > > 
> > > $ sudo ./btrfs fi us /mnt/test-btrfs/
> > > Overall:
> > >      Device size:           2.75GiB
> > >      Device allocated:           1.34GiB
> > >      Device unallocated:           1.41GiB
> > >      Device missing:             0.00B
> > >      Used:             400.89MiB
> > >      Free (estimated):           1.04GiB    (min: 1.04GiB)
> > >      Data ratio:                  2.00
> > >      Metadata ratio:              1.00
> > >      Global reserve:           3.25MiB    (used: 0.00B)
> > >      Multiple profiles:                no
> > > 
> > > Data,RAID1: Size:542.00MiB, Used:200.25MiB (36.95%)
> > >     /dev/loop0     288.00MiB
> > >     /dev/loop1     288.00MiB
> > >     /dev/loop2     127.00MiB
> > >     /dev/loop3     127.00MiB
> > >     /dev/loop4     127.00MiB
> > >     /dev/loop5     127.00MiB
> > > 
> > > Metadata,single: Size:256.00MiB, Used:384.00KiB (0.15%)
> > >     /dev/loop1     256.00MiB
> > > 
> > > System,single: Size:32.00MiB, Used:16.00KiB (0.05%)
> > >     /dev/loop0      32.00MiB
> > > 
> > > Unallocated:
> > >     /dev/loop0     704.00MiB
> > >     /dev/loop1     480.00MiB
> > >     /dev/loop2       1.00MiB
> > >     /dev/loop3       1.00MiB
> > >     /dev/loop4       1.00MiB
> > >     /dev/loop5       1.00MiB
> > >     /dev/loop6     128.00MiB
> > >     /dev/loop7     128.00MiB
> > > 
> > > # change the tag of some disks
> > > 
> > > $ sudo ./btrfs prop set /dev/loop0 allocation_hint DATA_ONLY
> > > $ sudo ./btrfs prop set /dev/loop1 allocation_hint DATA_ONLY
> > > $ sudo ./btrfs prop set /dev/loop5 allocation_hint METADATA_ONLY
> > > 
> > > $ for i in /dev/loop[0-9]; do sudo ./btrfs prop get $i
> > > allocation_hint; done
> > > devid=1, path=/dev/loop0: allocation_hint=DATA_ONLY
> > > devid=2, path=/dev/loop1: allocation_hint=DATA_ONLY
> > > devid=3, path=/dev/loop2: allocation_hint=DATA_PREFERRED
> > > devid=4, path=/dev/loop3: allocation_hint=DATA_PREFERRED
> > > devid=5, path=/dev/loop4: allocation_hint=DATA_PREFERRED
> > > devid=6, path=/dev/loop5: allocation_hint=METADATA_ONLY
> > > devid=7, path=/dev/loop6: allocation_hint=METADATA_ONLY
> > > devid=8, path=/dev/loop7: allocation_hint=METADATA_ONLY
> > > 
> > > $ sudo btrfs bal start --full-balance /mnt/test-btrfs/
> > > $ sudo ./btrfs fi us /mnt/test-btrfs/
> > > Overall:
> > >      Device size:           2.75GiB
> > >      Device allocated:         735.00MiB
> > >      Device unallocated:           2.03GiB
> > >      Device missing:             0.00B
> > >      Used:             400.72MiB
> > >      Free (estimated):           1.10GiB    (min: 1.10GiB)
> > >      Data ratio:                  2.00
> > >      Metadata ratio:              1.00
> > >      Global reserve:           3.25MiB    (used: 0.00B)
> > >      Multiple profiles:                no
> > > 
> > > Data,RAID1: Size:288.00MiB, Used:200.19MiB (69.51%)
> > >     /dev/loop0     288.00MiB
> > >     /dev/loop1     288.00MiB
> > > 
> > > Metadata,single: Size:127.00MiB, Used:336.00KiB (0.26%)
> > >     /dev/loop5     127.00MiB
> > > 
> > > System,single: Size:32.00MiB, Used:16.00KiB (0.05%)
> > >     /dev/loop7      32.00MiB
> > > 
> > > Unallocated:
> > >     /dev/loop0     736.00MiB
> > >     /dev/loop1     736.00MiB
> > >     /dev/loop2     128.00MiB
> > >     /dev/loop3     128.00MiB
> > >     /dev/loop4     128.00MiB
> > >     /dev/loop5       1.00MiB
> > >     /dev/loop6     128.00MiB
> > >     /dev/loop7      96.00MiB
> > > 
> > > 
> > > #As you can see all the metadata were placed on the disk loop5/loop7
> > > even if
> > > #the most empty one are loop0 and loop1.
> > > 
> > > 
> > > 
> > > Furher works:
> > > - the tool which show the space available should consider the tagging (eg
> > >    the disks tagged by _METADATA_ONLY should be excluded from the data
> > >    availability)
> > > - allow btrfs-prog to change the allocation_hint even when the filesystem
> > >    is not mounted.
> > > 
> > > In following emails, there will be the btrfs-progs patches set and the
> > > xfstest
> > > suite patches set.
> > > 
> > > 
> > > Comments are welcome
> > > BR
> > > G.Baroncelli
> > > 
> > > Revision:
> > > V11:
> > > - added the property /sys/fs/btrfs/features/allocation_hint
> > > 
> > > V10:
> > > - renamed two disk tags/constants:
> > >          - *_METADATA_PREFERRED -> *_METADATA_PREFERRED
> > >          - *_DATA_PREFERRED -> *_DATA_EFERRED
> > > - add /sys/fs/btrfs/$UUID/devinfo/$DEVID/major_minor
> > > - revise some commit description
> > > - refactored the code of gather_device_info(): one portion of this code
> > >    is moved in a separate function called sort_and_reduce_device_info()
> > >    for clarity.
> > > - removed the 'allocation_hint' mount option
> > > 
> > > V9:
> > > - rename dev_item->type to dev_item->flags
> > > - rename /sys/fs/btrfs/$UUID/devinfo/type -> allocation_hint
> > > 
> > > V8:
> > > - drop the ioctl API, instead use a sysfs one
> > > 
> > > V7:
> > > - make more room in the struct btrfs_ioctl_dev_properties up to 1K
> > > - leave in btrfs_tree.h only the costants
> > > - removed the mount option (sic)
> > > - correct an 'use before check' in the while loop (signaled
> > >    by Zygo)
> > > - add a 2nd sort to be sure that the device_info array is in the
> > >    expected order
> > > 
> > > V6:
> > > - add further values to the hints: add the possibility to
> > >    exclude a disk for a chunk type
> > > 
> > > 
> > > 
> > > Goffredo Baroncelli (7):
> > >    btrfs: add flags to give an hint to the chunk allocator
> > >    btrfs: export the device allocation_hint property in sysfs
> > >    btrfs: change the device allocation_hint property via sysfs
> > >    btrfs: add allocation_hint mode
> > >    btrfs: rename dev_item->type to dev_item->flags
> > >    btrfs: add major and minor to sysfs
> > >    Add /sys/fs/btrfs/features/allocation_hint
> > > 
> > >   fs/btrfs/ctree.h                |   4 +-
> > >   fs/btrfs/disk-io.c              |   2 +-
> > >   fs/btrfs/sysfs.c                | 105 +++++++++++++++++++++++++++
> > >   fs/btrfs/volumes.c              | 121 ++++++++++++++++++++++++++++++--
> > >   fs/btrfs/volumes.h              |   7 +-
> > >   include/uapi/linux/btrfs_tree.h |  20 +++++-
> > >   6 files changed, 245 insertions(+), 14 deletions(-)
> > > 
> > 
> > 
> 

  reply	other threads:[~2022-02-16  3:37 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-26 20:32 [PATCH 0/7][V11] btrfs: allocation_hint Goffredo Baroncelli
2022-01-26 20:32 ` [PATCH 1/7] btrfs: add flags to give an hint to the chunk allocator Goffredo Baroncelli
2022-01-26 20:32 ` [PATCH 2/7] btrfs: export the device allocation_hint property in sysfs Goffredo Baroncelli
2022-01-26 20:32 ` [PATCH 3/7] btrfs: change the device allocation_hint property via sysfs Goffredo Baroncelli
2022-01-26 20:32 ` [PATCH 4/7] btrfs: add allocation_hint mode Goffredo Baroncelli
2022-01-26 20:32 ` [PATCH 5/7] btrfs: rename dev_item->type to dev_item->flags Goffredo Baroncelli
2022-01-26 20:32 ` [PATCH 6/7] btrfs: add major and minor to sysfs Goffredo Baroncelli
2022-01-26 20:32 ` [PATCH 7/7] Add /sys/fs/btrfs/features/allocation_hint Goffredo Baroncelli
2022-02-15 18:49 ` [PATCH 0/7][V11] btrfs: allocation_hint Goffredo Baroncelli
2022-02-16  0:22   ` Qu Wenruo
2022-02-16  3:28     ` Zygo Blaxell [this message]
2022-02-16  4:43       ` Paul Jones
2022-02-25 20:18         ` Boris Burkov
2022-02-28 17:04 ` Josef Bacik
2022-02-28 21:01   ` Goffredo Baroncelli
2022-03-01 15:07     ` Josef Bacik
2022-03-01 18:55       ` Goffredo Baroncelli
2022-03-01 21:43         ` Josef Bacik
2022-03-02 19:30           ` Goffredo Baroncelli
2022-03-02 21:23             ` Josef Bacik
2022-03-03 19:01               ` Goffredo Baroncelli
2022-03-04 14:56                 ` Josef Bacik

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YgxvUC86zumH3OF1@hungrycats.org \
    --to=ce3g8jdj@umail.furryterror.org \
    --cc=boris@bur.io \
    --cc=dsterba@suse.cz \
    --cc=josef@toxicpanda.com \
    --cc=kreijack@inwind.it \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=paul@pauljones.id.au \
    --cc=quwenruo.btrfs@gmx.com \
    --cc=shafeeqs@panasas.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).