Date: Tue, 4 Jan 2022 18:44:32 -0800
From: Boris Burkov
To: Goffredo Baroncelli
Cc: linux-btrfs@vger.kernel.org, Zygo Blaxell, Josef Bacik, David Sterba,
 Sinnamohideen Shafeeq, Paul Jones, Goffredo Baroncelli
Subject: Re: [RFC][V9][PATCH 0/6] btrfs: allocation_hint mode
X-Mailing-List: linux-btrfs@vger.kernel.org

On Fri, Dec 17, 2021 at 07:47:16PM +0100, Goffredo Baroncelli wrote:
> From: Goffredo Baroncelli
>
> Hi all,
>
> This patch set was born after some discussions between me, Zygo and
> Josef.
> Some details can be found in https://github.com/btrfs/btrfs-todo/issues/19.
>
> Some further information about a real use case can be found in
> https://lore.kernel.org/linux-btrfs/20210116002533.GE31381@hungrycats.org/
>
> Recently Shafeeq told me that he is interested too, due to the
> performance gain.
>
> In the V8 revision I switched away from an ioctl API in favor of a
> sysfs API (see patches #2 and #3).
>
> In V9 I renamed the sysfs interface from devinfo/type to
> devinfo/allocation_hint. Moreover, I renamed dev_info->type to
> dev_info->flags.
>
> The idea behind this patch set is to dedicate some disks (the fastest
> ones) to the metadata chunks. My initial idea was a "soft" hint.
> However, Zygo asked for an option for a "strong" (i.e. mandatory) hint.
> The result is that each disk can be tagged with one of the following
> flags:
> - BTRFS_DEV_ALLOCATION_METADATA_ONLY
> - BTRFS_DEV_ALLOCATION_PREFERRED_METADATA
> - BTRFS_DEV_ALLOCATION_PREFERRED_DATA
> - BTRFS_DEV_ALLOCATION_DATA_ONLY
>
> When the chunk allocator searches for disks on which to allocate a
> chunk, it scans them in an order decided by these tags. For metadata,
> the order is:
> *_METADATA_ONLY
> *_PREFERRED_METADATA
> *_PREFERRED_DATA
>
> The *_DATA_ONLY disks are not eligible for metadata chunk allocation.
>
> For data chunks, the order is reversed, and the *_METADATA_ONLY disks
> are excluded.
>
> The exact sort logic is to sort first by "tag", and then by available
> space. If there is no space available, the next "tag" set of disks is
> selected.
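> As a rough illustration of that ordering (made-up priorities, sizes and
> device names, not the kernel code): for a metadata chunk, with the
> *_DATA_ONLY disks already filtered out, the candidate list behaves like
> a two-key sort, ascending by tag priority (0 = *_METADATA_ONLY,
> 1 = *_PREFERRED_METADATA, 2 = *_PREFERRED_DATA) and descending by free
> space:
>
> $ printf '%s\n' '1 704M loop0' '1 480M loop1' '2 128M loop2' \
>                 '0 128M loop6' '0 128M loop7' | sort -k1,1n -k2,2hr
> 0 128M loop6
> 0 128M loop7
> 1 704M loop0
> 1 480M loop1
> 2 128M loop2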
> To set these tags, a new property called "allocation_hint" was created.
> There is a dedicated btrfs-progs patch set [[PATCH V9] btrfs-progs:
> allocation_hint disk property].
>
> $ sudo mount /dev/loop0 /mnt/test-btrfs/
> $ for i in /dev/loop[0-9]; do sudo ./btrfs prop get $i allocation_hint; done
> devid=1, path=/dev/loop0: allocation_hint=PREFERRED_METADATA
> devid=2, path=/dev/loop1: allocation_hint=PREFERRED_METADATA
> devid=3, path=/dev/loop2: allocation_hint=PREFERRED_DATA
> devid=4, path=/dev/loop3: allocation_hint=PREFERRED_DATA
> devid=5, path=/dev/loop4: allocation_hint=PREFERRED_DATA
> devid=6, path=/dev/loop5: allocation_hint=DATA_ONLY
> devid=7, path=/dev/loop6: allocation_hint=METADATA_ONLY
> devid=8, path=/dev/loop7: allocation_hint=METADATA_ONLY
>
> $ sudo ./btrfs fi us /mnt/test-btrfs/
> Overall:
>     Device size:                   2.75GiB
>     Device allocated:              1.34GiB
>     Device unallocated:            1.41GiB
>     Device missing:                  0.00B
>     Used:                        400.89MiB
>     Free (estimated):              1.04GiB      (min: 1.04GiB)
>     Data ratio:                       2.00
>     Metadata ratio:                   1.00
>     Global reserve:                3.25MiB      (used: 0.00B)
>     Multiple profiles:                  no
>
> Data,RAID1: Size:542.00MiB, Used:200.25MiB (36.95%)
>    /dev/loop0    288.00MiB
>    /dev/loop1    288.00MiB
>    /dev/loop2    127.00MiB
>    /dev/loop3    127.00MiB
>    /dev/loop4    127.00MiB
>    /dev/loop5    127.00MiB
>
> Metadata,single: Size:256.00MiB, Used:384.00KiB (0.15%)
>    /dev/loop1    256.00MiB
>
> System,single: Size:32.00MiB, Used:16.00KiB (0.05%)
>    /dev/loop0     32.00MiB
>
> Unallocated:
>    /dev/loop0    704.00MiB
>    /dev/loop1    480.00MiB
>    /dev/loop2      1.00MiB
>    /dev/loop3      1.00MiB
>    /dev/loop4      1.00MiB
>    /dev/loop5      1.00MiB
>    /dev/loop6    128.00MiB
>    /dev/loop7    128.00MiB
>
> # change the tag of some disks
>
> $ sudo ./btrfs prop set /dev/loop0 allocation_hint DATA_ONLY
> $ sudo ./btrfs prop set /dev/loop1 allocation_hint DATA_ONLY
> $ sudo ./btrfs prop set /dev/loop5 allocation_hint METADATA_ONLY
>
> $ for i in /dev/loop[0-9]; do sudo ./btrfs prop get $i allocation_hint; done
> devid=1, path=/dev/loop0: allocation_hint=DATA_ONLY
> devid=2, path=/dev/loop1: allocation_hint=DATA_ONLY
> devid=3, path=/dev/loop2: allocation_hint=PREFERRED_DATA
> devid=4, path=/dev/loop3: allocation_hint=PREFERRED_DATA
> devid=5, path=/dev/loop4: allocation_hint=PREFERRED_DATA
> devid=6, path=/dev/loop5: allocation_hint=METADATA_ONLY
> devid=7, path=/dev/loop6: allocation_hint=METADATA_ONLY
> devid=8, path=/dev/loop7: allocation_hint=METADATA_ONLY
>
> $ sudo btrfs bal start --full-balance /mnt/test-btrfs/
> $ sudo ./btrfs fi us /mnt/test-btrfs/
> Overall:
>     Device size:                   2.75GiB
>     Device allocated:            735.00MiB
>     Device unallocated:            2.03GiB
>     Device missing:                  0.00B
>     Used:                        400.72MiB
>     Free (estimated):              1.10GiB      (min: 1.10GiB)
>     Data ratio:                       2.00
>     Metadata ratio:                   1.00
>     Global reserve:                3.25MiB      (used: 0.00B)
>     Multiple profiles:                  no
>
> Data,RAID1: Size:288.00MiB, Used:200.19MiB (69.51%)
>    /dev/loop0    288.00MiB
>    /dev/loop1    288.00MiB
>
> Metadata,single: Size:127.00MiB, Used:336.00KiB (0.26%)
>    /dev/loop5    127.00MiB
>
> System,single: Size:32.00MiB, Used:16.00KiB (0.05%)
>    /dev/loop7     32.00MiB
>
> Unallocated:
>    /dev/loop0    736.00MiB
>    /dev/loop1    736.00MiB
>    /dev/loop2    128.00MiB
>    /dev/loop3    128.00MiB
>    /dev/loop4    128.00MiB
>    /dev/loop5      1.00MiB
>    /dev/loop6    128.00MiB
>    /dev/loop7     96.00MiB
>
> # As you can see, all the metadata was placed on disks loop5/loop7 even
> # though the emptiest ones are loop0 and loop1.
>
> TODO:
> - more tests
> - the tools which show the available space should take the tagging into
>   account (e.g. disks tagged *_METADATA_ONLY should be excluded from
>   the data availability)
> - allow btrfs-progs to change the allocation_hint even when the
>   filesystem is not mounted
>
> Comments are welcome

This is cool, thanks for building it!

I'm playing with setting this up for a test I'm working on where I want
to send data to a dm-zero device.
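For reference, the zero-data device below is just a plain device-mapper
zero target; it can be created roughly like this (107374182400 512-byte
sectors = 50TiB, matching the sizes in the output below):

$ dmsetup create zero-data --table "0 107374182400 zero"
$ blockdev --getsize64 /dev/mapper/zero-data
54975581388800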
To that end, I applied this patchset on top of misc-next and ran:

$ mkfs.btrfs -f /dev/vg0/lv0 -dsingle -msingle
$ mount /dev/vg0/lv0 /mnt/lol
$ btrfs device add /dev/mapper/zero-data /mnt/lol
$ btrfs fi usage /mnt/lol
Overall:
    Device size:                  50.01TiB
    Device allocated:             20.00MiB
    Device unallocated:           50.01TiB
    Device missing:                  0.00B
    Used:                        128.00KiB
    Free (estimated):             50.01TiB      (min: 50.01TiB)
    Free (statfs, df):            50.01TiB
    Data ratio:                       1.00
    Metadata ratio:                   1.00
    Global reserve:                3.25MiB      (used: 0.00B)
    Multiple profiles:                  no

Data,single: Size:8.00MiB, Used:0.00B (0.00%)
   /dev/mapper/vg0-lv0      8.00MiB

Metadata,single: Size:8.00MiB, Used:112.00KiB (1.37%)
   /dev/mapper/vg0-lv0      8.00MiB

System,single: Size:4.00MiB, Used:16.00KiB (0.39%)
   /dev/mapper/vg0-lv0      4.00MiB

Unallocated:
   /dev/mapper/vg0-lv0      9.98GiB
   /dev/mapper/zero-data   50.00TiB

$ ./btrfs property set -t device /dev/mapper/zero-data allocation_hint DATA_ONLY
$ ./btrfs property set -t device /dev/vg0/lv0 allocation_hint METADATA_ONLY
$ btrfs balance start --full-balance /mnt/lol
Done, had to relocate 3 out of 3 chunks
$ btrfs fi usage /mnt/lol
Overall:
    Device size:                  50.01TiB
    Device allocated:              2.03GiB
    Device unallocated:           50.01TiB
    Device missing:                  0.00B
    Used:                        640.00KiB
    Free (estimated):             50.01TiB      (min: 50.01TiB)
    Free (statfs, df):            50.01TiB
    Data ratio:                       1.00
    Metadata ratio:                   1.00
    Global reserve:                3.25MiB      (used: 0.00B)
    Multiple profiles:                  no

Data,single: Size:1.00GiB, Used:512.00KiB (0.05%)
   /dev/mapper/zero-data    1.00GiB

Metadata,single: Size:1.00GiB, Used:112.00KiB (0.01%)
   /dev/mapper/zero-data    1.00GiB

System,single: Size:32.00MiB, Used:16.00KiB (0.05%)
   /dev/mapper/zero-data   32.00MiB

Unallocated:
   /dev/mapper/vg0-lv0     10.00GiB
   /dev/mapper/zero-data   50.00TiB

I expected to end up with data on /dev/mapper/zero-data and metadata on
/dev/mapper/vg0-lv0, but it seems both were written to the zero device.
Attempting to actually use the file system eventually fails, since the
metadata is black-holed :)

Did I make some mistake in how I used it, or is this a bug?

Thanks,
Boris

> BR
> G.Baroncelli
>
> Revision:
> V9:
> - rename dev_item->type to dev_item->flags
> - rename /sys/fs/btrfs/$UUID/devinfo/type -> allocation_hint
>
> V8:
> - drop the ioctl API, instead use a sysfs one
>
> V7:
> - make more room in struct btrfs_ioctl_dev_properties, up to 1K
> - leave only the constants in btrfs_tree.h
> - removed the mount option (sic)
> - correct a 'use before check' in the while loop (reported by Zygo)
> - add a 2nd sort to be sure that the device_info array is in the
>   expected order
>
> V6:
> - add further values to the hints: add the possibility to exclude a
>   disk for a chunk type
>
> Goffredo Baroncelli (6):
>   btrfs: add flags to give an hint to the chunk allocator
>   btrfs: export the device allocation_hint property in sysfs
>   btrfs: change the device allocation_hint property via sysfs
>   btrfs: add allocation_hint mode
>   btrfs: rename dev_item->type to dev_item->flags
>   btrfs: add allocation_hint option.
>
>  fs/btrfs/ctree.h                |  18 +++++-
>  fs/btrfs/disk-io.c              |   4 +-
>  fs/btrfs/super.c                |  17 ++++++
>  fs/btrfs/sysfs.c                |  73 ++++++++++++++++++++++
>  fs/btrfs/volumes.c              | 105 ++++++++++++++++++++++++++++++--
>  fs/btrfs/volumes.h              |   7 ++-
>  include/uapi/linux/btrfs_tree.h |  20 +++++-
>  7 files changed, 232 insertions(+), 12 deletions(-)
>
> --
> 2.34.1
>