public inbox for linux-btrfs@vger.kernel.org
From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Qu Wenruo <wqu@suse.com>, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH RFC 2/2] btrfs: Introduce free dev extent hint to speed up chunk allocation
Date: Thu, 31 Jan 2019 10:38:03 +0800
Message-ID: <8a785ed2-80ef-e30b-5a63-6556f744eaf6@gmx.com>
In-Reply-To: <20190130074000.16638-3-wqu@suse.com>



> [ENHANCEMENT]
> This patch introduces the btrfs_device::hint_free_dev_extent member to
> give the chunk allocator a hint about where to find free dev extents.
> 
> The hint itself is simple: it only records where the first free slot
> could possibly be.
> 
> Unlike the free space cache, it is not 100% accurate, but since
> find_free_dev_extent_start() is already robust enough to handle
> search_hint, there is no need to introduce a complex and fancy free
> dev extent cache.
> 
> With this patch, allocating 4G on a filled 4T fs is much faster:
> 
>       v5.0-rc1   |   patched      |    function
> ---------------------------------------------------------------------
>  7)              | 7)             |  __btrfs_alloc_chunk [btrfs]() {
>  7) ! 152.496 us | 7)   7.885 us  |    find_free_dev_extent_start [btrfs]();
>  7) ! 185.488 us | 7) + 36.649 us |  }
>  7)              | 7)             |  __btrfs_alloc_chunk [btrfs]() {
>  7) ! 132.889 us | 7)   2.454 us  |    find_free_dev_extent_start [btrfs]();
>  7) ! 152.115 us | 7) + 24.145 us |  }
>  7)              | 7)             |  __btrfs_alloc_chunk [btrfs]() {
>  7) ! 127.689 us | 7)   2.245 us  |    find_free_dev_extent_start [btrfs]();
>  7) ! 146.595 us | 7) + 19.376 us |  }
>  7)              | 7)             |  __btrfs_alloc_chunk [btrfs]() {
>  7) ! 126.657 us | 7)   2.174 us  |    find_free_dev_extent_start [btrfs]();
>  7) ! 144.521 us | 7) + 16.321 us |  }
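
To put the trace above into ratios, here is a minimal sketch in plain userspace C. The array names and total_speedup() are made up for illustration. The numbers are copied from the table, nothing is newly measured.

```c
#include <assert.h>
#include <stddef.h>

/* Durations in microseconds, copied from the ftrace table above. */
static const double find_before_us[]  = { 152.496, 132.889, 127.689, 126.657 };
static const double find_after_us[]   = {   7.885,   2.454,   2.245,   2.174 };
static const double alloc_before_us[] = { 185.488, 152.115, 146.595, 144.521 };
static const double alloc_after_us[]  = {  36.649,  24.145,  19.376,  16.321 };

/* Hypothetical helper: total time before the patch divided by total time after. */
static double total_speedup(const double *before, const double *after, size_t n)
{
	double sum_before = 0.0, sum_after = 0.0;
	size_t i;

	for (i = 0; i < n; i++) {
		sum_before += before[i];
		sum_after += after[i];
	}
	assert(sum_after > 0.0);
	return sum_before / sum_after;
}
```

Summed over the four samples, this gives about 36x for find_free_dev_extent_start() and about 6.5x for __btrfs_alloc_chunk() as a whole.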

For anyone who is interested in an unrealistic workload: without this
patch, fallocating a 1PiB file TiB by TiB takes 5+ hours!

With this patch, it takes only around 15~20min.

Anyway, we're still far from customer-oriented 1PiB HDDs, so that's not
something we need to worry about yet.
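
The fallocate workload mentioned above could be approximated as follows. This is only a sketch under assumptions: fallocate_in_steps() is a made-up helper, and posix_fallocate(3) is a portable stand-in, since this message does not say which tool drove the test. For the run described above, total would be 1PiB and step 1TiB.

```c
#define _POSIX_C_SOURCE 200809L
#define _FILE_OFFSET_BITS 64

#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>

/*
 * Hypothetical helper: preallocate `total` bytes of `fd` in pieces of at
 * most `step` bytes. Returns 0 on success, or the error number reported
 * by posix_fallocate() for the first piece that fails.
 */
static int fallocate_in_steps(int fd, off_t total, off_t step)
{
	off_t off;

	assert(step > 0);
	for (off = 0; off < total; off += step) {
		/* The last piece may be shorter than `step`. */
		off_t len = (total - off < step) ? total - off : step;
		int err = posix_fallocate(fd, off, len);

		if (err)
			return err;
	}
	return 0;
}
```

Every piece needs new chunks, so each call exercises find_free_dev_extent() again, which is where the search cost adds up on a large device.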

Thanks,
Qu

> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/volumes.c | 23 +++++++++++++++---
>  fs/btrfs/volumes.h | 58 ++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 78 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 8e932d7d2fe6..cc15bf70dc72 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -411,6 +411,7 @@ static struct btrfs_device *__alloc_device(void)
>  	btrfs_device_data_ordered_init(dev);
>  	INIT_RADIX_TREE(&dev->reada_zones, GFP_NOFS & ~__GFP_DIRECT_RECLAIM);
>  	INIT_RADIX_TREE(&dev->reada_extents, GFP_NOFS & ~__GFP_DIRECT_RECLAIM);
> +	dev->hint_free_dev_extent = (u64)-1;
>  
>  	return dev;
>  }
> @@ -1741,9 +1742,9 @@ int find_free_dev_extent(struct btrfs_trans_handle *trans,
>  			 struct btrfs_device *device, u64 num_bytes,
>  			 u64 *start, u64 *len)
>  {
> -	/* FIXME use last free of some kind */
> -	return find_free_dev_extent_start(trans->transaction, device,
> -					  num_bytes, 0, start, len);
> +	return find_free_dev_extent_start(trans->transaction, device, num_bytes,
> +					  device->hint_free_dev_extent, start,
> +					  len);
>  }
>  
>  static int btrfs_free_dev_extent(struct btrfs_trans_handle *trans,
> @@ -1799,6 +1800,7 @@ static int btrfs_free_dev_extent(struct btrfs_trans_handle *trans,
>  				      "Failed to remove dev extent item");
>  	} else {
>  		set_bit(BTRFS_TRANS_HAVE_FREE_BGS, &trans->transaction->flags);
> +		btrfs_device_hint_add_free(device, key.offset, *dev_extent_len);
>  	}
>  out:
>  	btrfs_free_path(path);
> @@ -1841,6 +1843,7 @@ static int btrfs_alloc_dev_extent(struct btrfs_trans_handle *trans,
>  	btrfs_set_dev_extent_chunk_offset(leaf, extent, chunk_offset);
>  
>  	btrfs_set_dev_extent_length(leaf, extent, num_bytes);
> +	btrfs_device_hint_del_free(device, key.offset, num_bytes);
>  	btrfs_mark_buffer_dirty(leaf);
>  out:
>  	btrfs_free_path(path);
> @@ -7913,6 +7916,14 @@ int btrfs_verify_dev_extents(struct btrfs_fs_info *fs_info)
>  		devid = key.objectid;
>  		physical_offset = key.offset;
>  
> +		/*
> +		 * previous device verification is done, update its free dev
> +		 * extent hint
> +		 */
> +		if (device && devid != device->devid)
> +			btrfs_device_hint_add_free(device, prev_dev_ext_end,
> +				device->disk_total_bytes - prev_dev_ext_end);
> +
>  		if (!device || devid != device->devid) {
>  			device = btrfs_find_device(fs_info, devid, NULL, NULL);
>  			if (!device) {
> @@ -7940,6 +7951,10 @@ int btrfs_verify_dev_extents(struct btrfs_fs_info *fs_info)
>  					    physical_offset, physical_len);
>  		if (ret < 0)
>  			goto out;
> +
> +		btrfs_device_hint_add_free(device, prev_dev_ext_end,
> +				physical_offset - prev_dev_ext_end);
> +
>  		prev_devid = devid;
>  		prev_dev_ext_end = physical_offset + physical_len;
>  
> @@ -7951,6 +7966,8 @@ int btrfs_verify_dev_extents(struct btrfs_fs_info *fs_info)
>  			break;
>  		}
>  	}
> +	btrfs_device_hint_add_free(device, prev_dev_ext_end,
> +			device->disk_total_bytes - prev_dev_ext_end);
>  
>  	/* Ensure all chunks have corresponding dev extents */
>  	ret = verify_chunk_dev_extent_mapping(fs_info);
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index ed806649a473..00f7ef72466f 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -108,6 +108,14 @@ struct btrfs_device {
>  
>  	/* bytes used on the current transaction */
>  	u64 commit_bytes_used;
> +
> +	/*
> +	 * hint about where the first possible free dev extent is.
> +	 *
> +	 * u64(-1) means no hint.
> +	 */
> +	u64 hint_free_dev_extent;
> +
>  	/*
>  	 * used to manage the device which is resized
>  	 *
> @@ -569,4 +577,54 @@ bool btrfs_check_rw_degradable(struct btrfs_fs_info *fs_info,
>  int btrfs_bg_type_to_factor(u64 flags);
>  int btrfs_verify_dev_extents(struct btrfs_fs_info *fs_info);
>  
> +static inline void btrfs_device_hint_add_free(struct btrfs_device *dev,
> +					      u64 start, u64 len)
> +{
> +	if (dev->disk_total_bytes == 0 || start + len > dev->disk_total_bytes)
> +		return;
> +	if (len < SZ_16M)
> +		return;
> +	if (start > dev->hint_free_dev_extent)
> +		return;
> +	dev->hint_free_dev_extent = start;
> +}
> +
> +static inline void btrfs_device_hint_del_free(struct btrfs_device *dev,
> +					      u64 start, u64 len)
> +{
> +	u64 free_hint = dev->hint_free_dev_extent;
> +
> +	if (dev->disk_total_bytes == 0 || start + len > dev->disk_total_bytes)
> +		return;
> +	/*
> +	 * |<- to be removed ->|
> +	 * 			| free hint
> +	 * Not affecting free hint
> +	 */
> +	if (start + len <= free_hint)
> +		return;
> +	/*
> +	 * |<- to be removed ->|
> +	 * 		| free hint
> +	 * Or
> +	 * 	|<- to be removed ->|
> +	 * | free hint
> +	 * |<-->| Less than 16M
> +	 *
> +	 * Move the hint to the range end
> +	 */
> +	if ((start <= free_hint && start + len > free_hint) ||
> +	    (start > free_hint && start - free_hint < SZ_16M)) {
> +		dev->hint_free_dev_extent = start + len;
> +		return;
> +	}
> +
> +	/*
> +	 * 			|<- to be removed ->|
> +	 * | free hint
> +	 *
> +	 * We still have larger than 16M free space, no need to update
> +	 * free hint
> +	 */
> +}
>  #endif
> 
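
To make the hint behaviour easier to follow outside the kernel, here is a small userspace model of the two helpers. It is a sketch, not the patch itself: struct model_device, NO_HINT and the model_* names are invented for illustration, and the gap check is written as start - hint because start is larger than the hint in that branch.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_16M  (16ULL * 1024 * 1024)
#define NO_HINT UINT64_MAX	/* plays the role of (u64)-1 */

/* Hypothetical stand-in for the two btrfs_device fields the helpers use. */
struct model_device {
	uint64_t disk_total_bytes;
	uint64_t hint_free_dev_extent;
};

/* A range became free: pull the hint down if the range is usable. */
static void model_hint_add_free(struct model_device *dev,
				uint64_t start, uint64_t len)
{
	assert(dev);
	if (dev->disk_total_bytes == 0 || start + len > dev->disk_total_bytes)
		return;
	if (len < SZ_16M)		/* too small to be worth pointing at */
		return;
	if (start > dev->hint_free_dev_extent)
		return;			/* the current hint is already lower */
	dev->hint_free_dev_extent = start;
}

/* A range got allocated: push the hint past it where needed. */
static void model_hint_del_free(struct model_device *dev,
				uint64_t start, uint64_t len)
{
	uint64_t hint;

	assert(dev);
	hint = dev->hint_free_dev_extent;
	if (dev->disk_total_bytes == 0 || start + len > dev->disk_total_bytes)
		return;
	if (start + len <= hint)	/* entirely before the hint */
		return;
	/* Covers the hint, or leaves less than 16M between hint and range. */
	if (start <= hint || start - hint < SZ_16M)
		dev->hint_free_dev_extent = start + len;
	/* Otherwise at least 16M stays free at the hint, so keep it. */
}
```

The hint only ever needs to be a lower bound for the search, so a stale or too-low value costs some extra search time but cannot produce a wrong allocation.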



Thread overview: 7+ messages
2019-01-30  7:39 [PATCH 0/2] btrfs: Speedup chunk allocation for large fs Qu Wenruo
2019-01-30  7:39 ` [PATCH 1/2] btrfs: Don't search devid for every verify_one_dev_extent() call Qu Wenruo
2019-01-30  9:13   ` Nikolay Borisov
2019-01-31  9:38   ` Anand Jain
2019-01-30  7:40 ` [PATCH RFC 2/2] btrfs: Introduce free dev extent hint to speed up chunk allocation Qu Wenruo
2019-01-31  2:38   ` Qu Wenruo [this message]
2019-02-08 22:27 ` [PATCH 0/2] btrfs: Speedup chunk allocation for large fs David Sterba
