Linux Btrfs filesystem development
From: Goffredo Baroncelli <kreijack@inwind.it>
To: Goffredo Baroncelli <kreijack@libero.it>, linux-btrfs@vger.kernel.org
Cc: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>,
	Josef Bacik <josef@toxicpanda.com>
Subject: Re: [PATCH 5/5] btrfs: add allocator_hint mode
Date: Fri, 19 Feb 2021 19:51:26 +0100
Message-ID: <396b3709-a0b2-a0ee-00b8-75e4ca91b0e7@inwind.it>
In-Reply-To: <20210201212820.64381-6-kreijack@libero.it>

On 2/1/21 10:28 PM, Goffredo Baroncelli wrote:
> From: Goffredo Baroncelli <kreijack@inwind.it>
> 
> When this mode is enabled, the chunk allocation policy is modified as follows.
> 
> Each disk may have a different tag:
> - BTRFS_DEV_ALLOCATION_PREFERRED_METADATA
> - BTRFS_DEV_ALLOCATION_METADATA_ONLY
> - BTRFS_DEV_ALLOCATION_DATA_ONLY
> - BTRFS_DEV_ALLOCATION_PREFERRED_DATA (default)
> 
> Where:
> - ALLOCATION_PREFERRED_X means that it is preferred to use this disk for the
> X chunk type (the other type may be allowed when the space is low)
> - ALLOCATION_X_ONLY means that it is used *only* for the X chunk type. This
> means also that it is a preferred choice.
> 
> Each time the allocator allocates a chunk of type X, it first takes the disks
> tagged as ALLOCATION_X_ONLY or ALLOCATION_PREFERRED_X; if the space is not
> enough, it also uses the disks tagged as ALLOCATION_METADATA_ONLY; if the space
> is still not enough, it also uses the other disks, with the exception of the one
> marked as ALLOCATION_PREFERRED_Y, where Y is the other type of chunk (i.e. not X).
> 
> Signed-off-by: Goffredo Baroncelli <kreijack@inwind.it>
> ---
>   fs/btrfs/volumes.c | 81 +++++++++++++++++++++++++++++++++++++++++++++-
>   fs/btrfs/volumes.h |  1 +
>   2 files changed, 81 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 68b346c5465d..57ee3e2fdac0 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -4806,13 +4806,18 @@ static int btrfs_add_system_chunk(struct btrfs_fs_info *fs_info,
>   }
>   
>   /*
> - * sort the devices in descending order by max_avail, total_avail
> + * sort the devices in descending order by alloc_hint,
> + * max_avail, total_avail
>    */
>   static int btrfs_cmp_device_info(const void *a, const void *b)
>   {
>   	const struct btrfs_device_info *di_a = a;
>   	const struct btrfs_device_info *di_b = b;
>   
> +	if (di_a->alloc_hint > di_b->alloc_hint)
> +		return -1;
> +	if (di_a->alloc_hint < di_b->alloc_hint)
> +		return 1;
>   	if (di_a->max_avail > di_b->max_avail)
>   		return -1;
>   	if (di_a->max_avail < di_b->max_avail)
> @@ -4939,6 +4944,15 @@ static int gather_device_info(struct btrfs_fs_devices *fs_devices,
>   	int ndevs = 0;
>   	u64 max_avail;
>   	u64 dev_offset;
> +	int hint;
> +
> +	static const char alloc_hint_map[BTRFS_DEV_ALLOCATION_MASK_COUNT] = {
> +		[BTRFS_DEV_ALLOCATION_DATA_ONLY] = -1,
> +		[BTRFS_DEV_ALLOCATION_PREFERRED_DATA] = 0,
> +		[BTRFS_DEV_ALLOCATION_METADATA_ONLY] = 1,
> +		[BTRFS_DEV_ALLOCATION_PREFERRED_METADATA] = 2

I finally found the cause of the wrong allocation: the last two values
are swapped. The priority must start at BTRFS_DEV_ALLOCATION_DATA_ONLY
(lowest) and end at BTRFS_DEV_ALLOCATION_METADATA_ONLY (highest).
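
To make the intended ordering concrete, here is a minimal user-space sketch
of the map with the last two entries swapped back. The enum names and their
numeric values are stand-ins for this example only, not the real
BTRFS_DEV_ALLOCATION_* constants from patch 2/5; hint_priority() and
cmp_key_desc() are hypothetical helpers that only mirror the sign flip and
the descending sort used in the patch.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in tags: names and numeric values are assumptions for this sketch. */
enum hint_tag {
	HINT_PREFERRED_DATA,
	HINT_PREFERRED_METADATA,
	HINT_METADATA_ONLY,
	HINT_DATA_ONLY,
	HINT_COUNT
};

/*
 * Corrected map: priority grows from DATA_ONLY to METADATA_ONLY.
 * "signed char" keeps -1 negative even where plain char is unsigned.
 */
static const signed char hint_map[HINT_COUNT] = {
	[HINT_DATA_ONLY] = -1,
	[HINT_PREFERRED_DATA] = 0,
	[HINT_PREFERRED_METADATA] = 1,
	[HINT_METADATA_ONLY] = 2,
};

/* Sort key for one device: negated for data chunks, as in the patch. */
static int hint_priority(enum hint_tag tag, int for_data)
{
	assert(tag < HINT_COUNT);
	return for_data ? -hint_map[tag] : hint_map[tag];
}

/* Descending order by key, like the alloc_hint step of btrfs_cmp_device_info(). */
static int cmp_key_desc(const void *a, const void *b)
{
	int ka = *(const int *)a;
	int kb = *(const int *)b;

	return (ka < kb) - (ka > kb);
}
```

With this map, a metadata chunk sorts the devices as METADATA_ONLY,
PREFERRED_METADATA, PREFERRED_DATA, and a data chunk sorts them as
DATA_ONLY, PREFERRED_DATA, PREFERRED_METADATA. The disks reserved for the
other type are skipped before sorting, as in the patch.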

Ok, now I have to restart the tests :-)

> +		/* the other values are set to 0 */
> +	};
>   
>   	/*
>   	 * in the first pass through the devices list, we gather information
> @@ -4991,16 +5005,81 @@ static int gather_device_info(struct btrfs_fs_devices *fs_devices,
>   		devices_info[ndevs].max_avail = max_avail;
>   		devices_info[ndevs].total_avail = total_avail;
>   		devices_info[ndevs].dev = device;
> +
> +		if (((ctl->type & BTRFS_BLOCK_GROUP_DATA) &&
> +		     (ctl->type & BTRFS_BLOCK_GROUP_METADATA)) ||
> +		    info->allocation_hint_mode ==
> +		     BTRFS_ALLOCATION_HINT_DISABLED) {
> +			/*
> +			 * if mixed bg or the allocator hint is
> +			 * disabled, set all the alloc_hint
> +			 * fields to the same value, so the sorting
> +			 * is not affected
> +			 */
> +			devices_info[ndevs].alloc_hint = 0;
> +		} else if(ctl->type & BTRFS_BLOCK_GROUP_DATA) {
> +			hint = device->type & BTRFS_DEV_ALLOCATION_MASK;
> +
> +			/*
> +			 * skip BTRFS_DEV_METADATA_ONLY disks
> +			 */
> +			if (hint == BTRFS_DEV_ALLOCATION_METADATA_ONLY)
> +				continue;
> +			/*
> +			 * if a data chunk must be allocated,
> +			 * sort also by hint (data disk
> +			 * higher priority)
> +			 */
> +			devices_info[ndevs].alloc_hint = -alloc_hint_map[hint];
> +		} else { /* BTRFS_BLOCK_GROUP_METADATA */
> +			hint = device->type & BTRFS_DEV_ALLOCATION_MASK;
> +
> +			/*
> +			 * skip BTRFS_DEV_DATA_ONLY disks
> +			 */
> +			if (hint == BTRFS_DEV_ALLOCATION_DATA_ONLY)
> +				continue;
> +			/*
> +			 * if a metadata chunk must be allocated,
> +			 * sort also by hint (metadata hint
> +			 * higher priority)
> +			 */
> +			devices_info[ndevs].alloc_hint = alloc_hint_map[hint];
> +		}
> +
>   		++ndevs;
>   	}
>   	ctl->ndevs = ndevs;
>   
> +	/*
> +	 * no devices available
> +	 */
> +	if (!ndevs)
> +		return 0;
> +
>   	/*
>   	 * now sort the devices by hole size / available space
>   	 */
>   	sort(devices_info, ndevs, sizeof(struct btrfs_device_info),
>   	     btrfs_cmp_device_info, NULL);
>   
> +	/*
> +	 * select the minimum set of disks grouped by hint that
> +	 * can host the chunk
> +	 */
> +	ndevs = 0;
> +	while (ndevs < ctl->ndevs) {
> +		hint = devices_info[ndevs++].alloc_hint;
> +		while (ndevs < ctl->ndevs &&
> +		       devices_info[ndevs].alloc_hint == hint)
> +			ndevs++;
> +		if (ndevs >= ctl->devs_min)
> +			break;
> +	}
> +
> +	BUG_ON(ndevs > ctl->ndevs);
> +	ctl->ndevs = ndevs;
> +
>   	return 0;
>   }
>   
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index d776b7f55d56..31a3e4cf93b5 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -364,6 +364,7 @@ struct btrfs_device_info {
>   	u64 dev_offset;
>   	u64 max_avail;
>   	u64 total_avail;
> +	int alloc_hint;
>   };
>   
>   struct btrfs_raid_attr {
> 
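
For reference, the group-selection step at the end of gather_device_info()
can be sketched in user space as below. This is only an illustration under
stated assumptions: select_by_hint() is a hypothetical name, the input is a
plain array of alloc_hint values already sorted in descending order, and the
index is checked before the array is read so that the element past the end
is never touched.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Given alloc_hint values already sorted in descending order, return how
 * many leading devices to keep: whole groups of equal hint are added until
 * at least devs_min devices are selected or the list is exhausted.
 */
static size_t select_by_hint(const int *hints, size_t total, size_t devs_min)
{
	size_t n = 0;

	while (n < total) {
		int hint = hints[n++];

		/* Bounds check first, so hints[total] is never read. */
		while (n < total && hints[n] == hint)
			n++;
		if (n >= devs_min)
			break;
	}
	assert(n <= total);
	return n;
}
```

For example, with hints {2, 2, 1, 0, 0} and devs_min = 3 the function keeps
the first three devices: the two with hint 2 are not enough, so the single
device with hint 1 is added as well.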


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
