public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Nikolay Borisov <nborisov@suse.com>
To: Josef Bacik <josef@toxicpanda.com>,
	linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH] btrfs: index free space entries on size
Date: Wed, 29 Sep 2021 14:43:47 +0300	[thread overview]
Message-ID: <ca935b82-0d4b-8bef-e1c2-4b541dcbf3d4@suse.com> (raw)
In-Reply-To: <449df997c354fb9d074bf5f7d32bffc082386c4f.1632750544.git.josef@toxicpanda.com>



On 27.09.21 г. 16:56, Josef Bacik wrote:

<snip>

> ---
>  fs/btrfs/free-space-cache.c | 79 +++++++++++++++++++++++++++++++------
>  fs/btrfs/free-space-cache.h |  2 +
>  2 files changed, 69 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
> index 0d26819b1cf6..d6eaf51ee597 100644
> --- a/fs/btrfs/free-space-cache.c
> +++ b/fs/btrfs/free-space-cache.c
> @@ -1576,6 +1576,38 @@ static int tree_insert_offset(struct rb_root *root, u64 offset,
>  	return 0;
>  }
>  
> +static u64 free_space_info_bytes(struct btrfs_free_space *info)
> +{
> +	if (info->bitmap && info->max_extent_size)
> +		return info->max_extent_size;
> +	return info->bytes;
> +}
> +
> +static void tree_insert_bytes(struct btrfs_free_space_ctl *ctl,
> +			      struct btrfs_free_space *info)
> +{
> +	struct rb_root_cached *root = &ctl->free_space_bytes;
> +	struct rb_node **p = &root->rb_root.rb_node;
> +	struct rb_node *parent_node = NULL;
> +	struct btrfs_free_space *tmp;
> +	bool leftmost = true;
> +
> +	while (*p) {
> +		parent_node = *p;
> +		tmp = rb_entry(parent_node, struct btrfs_free_space,
> +			       bytes_index);
> +		if (free_space_info_bytes(info) < free_space_info_bytes(tmp)) {
> +			p = &(*p)->rb_right;
> +			leftmost = false;

According to this the leftmost node is the largest available extent?
This is slightly counter intuitive, as general it's regarded that the
nodes left of the current one are smaller than it, whilst this
completely turns the concept upside down. I assume that's due to the
"cached" version of the rb tree and wanting to utilize the cached node
to hold the largest one. Perhaps a slight comment to sharpen attention
to future readers might be beneficial.

> +		} else {
> +			p = &(*p)->rb_left;
> +		}
> +	}
> +
> +	rb_link_node(&info->bytes_index, parent_node, p);
> +	rb_insert_color_cached(&info->bytes_index, root, leftmost);
> +}
> +
>  /*
>   * searches the tree for the given offset.
>   *
> @@ -1704,6 +1736,7 @@ __unlink_free_space(struct btrfs_free_space_ctl *ctl,
>  		    struct btrfs_free_space *info)
>  {
>  	rb_erase(&info->offset_index, &ctl->free_space_offset);
> +	rb_erase_cached(&info->bytes_index, &ctl->free_space_bytes);
>  	ctl->free_extents--;
>  
>  	if (!info->bitmap && !btrfs_free_space_trimmed(info)) {
> @@ -1730,6 +1763,8 @@ static int link_free_space(struct btrfs_free_space_ctl *ctl,
>  	if (ret)
>  		return ret;
>  
> +	tree_insert_bytes(ctl, info);
> +
>  	if (!info->bitmap && !btrfs_free_space_trimmed(info)) {
>  		ctl->discardable_extents[BTRFS_STAT_CURR]++;
>  		ctl->discardable_bytes[BTRFS_STAT_CURR] += info->bytes;
> @@ -1876,7 +1911,7 @@ static inline u64 get_max_extent_size(struct btrfs_free_space *entry)
>  /* Cache the size of the max extent in bytes */
>  static struct btrfs_free_space *
>  find_free_space(struct btrfs_free_space_ctl *ctl, u64 *offset, u64 *bytes,
> -		unsigned long align, u64 *max_extent_size)
> +		unsigned long align, u64 *max_extent_size, bool use_bytes_index)
>  {
>  	struct btrfs_free_space *entry;
>  	struct rb_node *node;
> @@ -1887,15 +1922,28 @@ find_free_space(struct btrfs_free_space_ctl *ctl, u64 *offset, u64 *bytes,
>  	if (!ctl->free_space_offset.rb_node)
>  		goto out;
>  
> -	entry = tree_search_offset(ctl, offset_to_bitmap(ctl, *offset), 0, 1);
> -	if (!entry)
> -		goto out;
> +	if (use_bytes_index) {
> +		node = rb_first_cached(&ctl->free_space_bytes);
> +	} else {
> +		entry = tree_search_offset(ctl, offset_to_bitmap(ctl, *offset),
> +					   0, 1);
> +		if (!entry)
> +			goto out;
> +		node = &entry->offset_index;
> +	}
>  
> -	for (node = &entry->offset_index; node; node = rb_next(node)) {
> -		entry = rb_entry(node, struct btrfs_free_space, offset_index);
> +	for (; node; node = rb_next(node)) {
> +		if (use_bytes_index)
> +			entry = rb_entry(node, struct btrfs_free_space,
> +					 bytes_index);
> +		else
> +			entry = rb_entry(node, struct btrfs_free_space,
> +					 offset_index);
>  		if (entry->bytes < *bytes) {
>  			*max_extent_size = max(get_max_extent_size(entry),
>  					       *max_extent_size);
> +			if (use_bytes_index)
> +				break;
>  			continue;
>  		}

This hunk is somewhat subtle, because:

1. First you get the largest free space (in case we are using
use_bytes_index). So say we want 5k and the largest index is 5m we are
going to return 5m by doing rb_first_cached.

2. The for() loop then likely terminates on the first iteration because
the largest extent is already too large. However I think this function
should be refactored, because the "indexed by size" case really works in
O(1) complexity. You take the largest available space via
rb_first_cached, then perform all the alignments checks in the body of
the loop and return the chunk if it satisfies them or execute the newly
added break.

This insight is really lost within the surrounding code, majority of
which exists for the "indexed by offset" case.

<snip>

  parent reply	other threads:[~2021-09-29 11:43 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-27 13:56 [PATCH] btrfs: index free space entries on size Josef Bacik
2021-09-27 16:24 ` Filipe Manana
2021-09-29 11:43 ` Nikolay Borisov [this message]
2021-09-29 15:12   ` Josef Bacik
2021-09-29 15:16     ` Nikolay Borisov
2021-09-29 15:22       ` Josef Bacik
2021-09-29 15:33         ` Nikolay Borisov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ca935b82-0d4b-8bef-e1c2-4b541dcbf3d4@suse.com \
    --to=nborisov@suse.com \
    --cc=josef@toxicpanda.com \
    --cc=kernel-team@fb.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox