Linux Btrfs filesystem development
From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Junchao Sun <sunjunchao2870@gmail.com>, linux-btrfs@vger.kernel.org
Cc: clm@fb.com, josef@toxicpanda.com, dsterba@suse.com, wqu@suse.com
Subject: Re: [PATCH v2] btrfs: qgroup: use xarray to track dirty extents in transaction.
Date: Thu, 6 Jun 2024 19:00:25 +0930
Message-ID: <0610a1b0-78a6-4c1f-9188-69b587c8146f@gmx.com>
In-Reply-To: <20240603113650.279782-1-sunjunchao2870@gmail.com>



On 2024/6/3 21:06, Junchao Sun wrote:
> Changes since v1:
>   - Use xa_load() to update existing entry instead of double
>     xa_store().
>   - Rename goto labels.
>   - Remove unnecessary calls to xa_init().
>
> Using xarray to track dirty extents can reduce the size of
> struct btrfs_qgroup_extent_record from 64 bytes to 40 bytes.
> xarray is also more cache line friendly, and it reduces the
> complexity of the insertion and search code compared to the rb tree.
>
> Another change introduced is about error handling.
> Before this patch, btrfs_qgroup_trace_extent_nolock() always
> succeeded. With this patch, the function calls xa_store(), which
> can fail, so mark the qgroup as inconsistent if an error happens and
> then free the preallocated memory. Also, we preallocate memory before
> spin_lock(); if memory preallocation fails, the error handling is the
> same as in the existing code.
>
> This patch passed the check -g qgroup tests using xfstests and
> checkpatch tests.
>
> Suggested-by: Qu Wenruo <wqu@suse.com>
> Signed-off-by: Junchao Sun <sunjunchao2870@gmail.com>

Sorry for the late reply. This version looks much better now, just
some small nitpicks.

> ---
>   fs/btrfs/delayed-ref.c | 57 ++++++++++++++++++++--------------
>   fs/btrfs/delayed-ref.h |  2 +-
>   fs/btrfs/qgroup.c      | 69 +++++++++++++++++++++---------------------
>   fs/btrfs/qgroup.h      |  1 -
>   fs/btrfs/transaction.c |  6 ++--
>   5 files changed, 73 insertions(+), 62 deletions(-)
>
> diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
> index 891ea2fa263c..e5cbc91e9864 100644
> --- a/fs/btrfs/delayed-ref.c
> +++ b/fs/btrfs/delayed-ref.c
> @@ -915,8 +915,11 @@ add_delayed_ref_head(struct btrfs_trans_handle *trans,
>   	/* Record qgroup extent info if provided */
>   	if (qrecord) {
>   		if (btrfs_qgroup_trace_extent_nolock(trans->fs_info,
> -					delayed_refs, qrecord))
> +					delayed_refs, qrecord)) {

Since btrfs_qgroup_trace_extent_nolock() can return <0 for errors, I'd
prefer the more common handling like:

	ret = btrfs_qgroup_trace_extent_nolock();
	/* Either error or no need to use the qrecord */
	if (ret) {
		/* Do the cleanup */
	}
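
As a rough illustration of that caller-side pattern, here is a small
userspace sketch. It is not the btrfs code: struct record,
trace_extent_stub() and add_head_model() are hypothetical stand-ins, and
the stub only simulates the three possible outcomes (0, >0, <0) so the
cleanup path can be exercised.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct btrfs_qgroup_extent_record. */
struct record {
	unsigned long long bytenr;
};

/*
 * Hypothetical stub for btrfs_qgroup_trace_extent_nolock(). It returns
 * whatever outcome the caller asks it to simulate:
 *    0 -> record inserted, the tracking structure now owns it
 *   >0 -> a record for this bytenr already exists, ours is not needed
 *   <0 -> insertion failed
 */
static int trace_extent_stub(struct record *rec, int simulated)
{
	assert(rec != NULL);
	return simulated;
}

/*
 * Caller-side handling: any non-zero return means the preallocated
 * record was not consumed, so it is freed here. Returns true only when
 * the record was inserted.
 */
static bool add_head_model(unsigned long long bytenr, int simulated)
{
	struct record *rec = calloc(1, sizeof(*rec));
	int ret;

	if (!rec)
		return false;
	rec->bytenr = bytenr;

	ret = trace_extent_stub(rec, simulated);
	/* Either error or no need to use the record. */
	if (ret) {
		free(rec);
		return false;
	}

	/*
	 * Inserted: in real code ownership has moved to the tracking
	 * structure. This model has no such structure, so free it here
	 * to avoid leaking.
	 */
	free(rec);
	return true;
}
```

With a single `if (ret)` branch, the "already exists" and "error" cases
share the same cleanup, and only a return of 0 marks the record as
inserted.
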
> +			/* If insertion failed, free preallocated memory */
> +			xa_release(&delayed_refs->dirty_extents, qrecord->bytenr);
>   			kfree(qrecord);
> +		}
>   		else
>   			qrecord_inserted = true;
>   	}
> @@ -1029,6 +1032,7 @@ int btrfs_add_delayed_tree_ref(struct btrfs_trans_handle *trans,
>   	u8 ref_type;
>
>   	is_system = (generic_ref->tree_ref.ref_root == BTRFS_CHUNK_TREE_OBJECTID);
> +	delayed_refs = &trans->transaction->delayed_refs;
>
>   	ASSERT(generic_ref->type == BTRFS_REF_METADATA && generic_ref->action);
>   	ref = kmem_cache_alloc(btrfs_delayed_tree_ref_cachep, GFP_NOFS);
> @@ -1036,18 +1040,15 @@ int btrfs_add_delayed_tree_ref(struct btrfs_trans_handle *trans,
>   		return -ENOMEM;
>
>   	head_ref = kmem_cache_alloc(btrfs_delayed_ref_head_cachep, GFP_NOFS);
> -	if (!head_ref) {
> -		kmem_cache_free(btrfs_delayed_tree_ref_cachep, ref);
> -		return -ENOMEM;
> -	}
> +	if (!head_ref)
> +		goto free_ref;
>
>   	if (btrfs_qgroup_full_accounting(fs_info) && !generic_ref->skip_qgroup) {
>   		record = kzalloc(sizeof(*record), GFP_NOFS);
> -		if (!record) {
> -			kmem_cache_free(btrfs_delayed_tree_ref_cachep, ref);
> -			kmem_cache_free(btrfs_delayed_ref_head_cachep, head_ref);
> -			return -ENOMEM;
> -		}
> +		if (!record)
> +			goto free_head_ref;
> +		if (xa_reserve(&delayed_refs->dirty_extents, bytenr, GFP_NOFS))
> +			goto free_record;

Considering we are doing a big functional change, I'd really prefer to
move the error handling cleanup into a separate patch, for better
bisection.

>   	}
>
>   	if (parent)
> @@ -1067,7 +1068,6 @@ int btrfs_add_delayed_tree_ref(struct btrfs_trans_handle *trans,
>   			      false, is_system, generic_ref->owning_root);
>   	head_ref->extent_op = extent_op;
>
> -	delayed_refs = &trans->transaction->delayed_refs;

Again, there is no real need to touch this in a patch that changes
functionality.

>   	spin_lock(&delayed_refs->lock);
>
>   	/*
> @@ -1096,6 +1096,14 @@ int btrfs_add_delayed_tree_ref(struct btrfs_trans_handle *trans,
>   		btrfs_qgroup_trace_extent_post(trans, record);
>
>   	return 0;
> +
> +free_record:
> +	kfree(record);
> +free_head_ref:
> +	kmem_cache_free(btrfs_delayed_ref_head_cachep, head_ref);
> +free_ref:
> +	kmem_cache_free(btrfs_delayed_tree_ref_cachep, ref);
> +	return -ENOMEM;
>   }
>
>   /*
> @@ -1137,28 +1145,23 @@ int btrfs_add_delayed_data_ref(struct btrfs_trans_handle *trans,
>   	ref->objectid = owner;
>   	ref->offset = offset;
>
> -
> +	delayed_refs = &trans->transaction->delayed_refs;
>   	head_ref = kmem_cache_alloc(btrfs_delayed_ref_head_cachep, GFP_NOFS);
> -	if (!head_ref) {
> -		kmem_cache_free(btrfs_delayed_data_ref_cachep, ref);
> -		return -ENOMEM;
> -	}
> +	if (!head_ref)
> +		goto free_ref;
>
>   	if (btrfs_qgroup_full_accounting(fs_info) && !generic_ref->skip_qgroup) {
>   		record = kzalloc(sizeof(*record), GFP_NOFS);
> -		if (!record) {
> -			kmem_cache_free(btrfs_delayed_data_ref_cachep, ref);
> -			kmem_cache_free(btrfs_delayed_ref_head_cachep,
> -					head_ref);
> -			return -ENOMEM;
> -		}
> +		if (!record)
> +			goto free_head_ref;
> +		if (xa_reserve(&delayed_refs->dirty_extents, bytenr, GFP_NOFS))
> +			goto free_record;

Same here.

[...]
> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> index 5470e1cdf10c..717e16da9679 100644
> --- a/fs/btrfs/qgroup.c
> +++ b/fs/btrfs/qgroup.c
> @@ -1890,16 +1890,13 @@ int btrfs_limit_qgroup(struct btrfs_trans_handle *trans, u64 qgroupid,
>    *
>    * Return 0 for success insert
>    * Return >0 for existing record, caller can free @record safely.
> - * Error is not possible

Then why not add a negative return value case?

The most common pattern would be: >0 for one common case (qrecord
exists), 0 for the other common case (qrecord inserted), and <0 for
errors.

Just like btrfs_search_slot().
That was my first impression of how this function behaves, but that is
not the case.

>    */
>   int btrfs_qgroup_trace_extent_nolock(struct btrfs_fs_info *fs_info,
>   				struct btrfs_delayed_ref_root *delayed_refs,
>   				struct btrfs_qgroup_extent_record *record)
>   {
> -	struct rb_node **p = &delayed_refs->dirty_extent_root.rb_node;
> -	struct rb_node *parent_node = NULL;
> -	struct btrfs_qgroup_extent_record *entry;
> -	u64 bytenr = record->bytenr;
> +	struct btrfs_qgroup_extent_record *existing, *ret;
> +	unsigned long bytenr = record->bytenr;
>
>   	if (!btrfs_qgroup_full_accounting(fs_info))
>   		return 1;
> @@ -1907,26 +1904,27 @@ int btrfs_qgroup_trace_extent_nolock(struct btrfs_fs_info *fs_info,
>   	lockdep_assert_held(&delayed_refs->lock);
>   	trace_btrfs_qgroup_trace_extent(fs_info, record);
>
> -	while (*p) {
> -		parent_node = *p;
> -		entry = rb_entry(parent_node, struct btrfs_qgroup_extent_record,
> -				 node);
> -		if (bytenr < entry->bytenr) {
> -			p = &(*p)->rb_left;
> -		} else if (bytenr > entry->bytenr) {
> -			p = &(*p)->rb_right;
> -		} else {
> -			if (record->data_rsv && !entry->data_rsv) {
> -				entry->data_rsv = record->data_rsv;
> -				entry->data_rsv_refroot =
> -					record->data_rsv_refroot;
> -			}
> -			return 1;
> +	xa_lock(&delayed_refs->dirty_extents);
> +	existing = xa_load(&delayed_refs->dirty_extents, bytenr);
> +	if (existing) {
> +		if (record->data_rsv && !existing->data_rsv) {
> +			existing->data_rsv = record->data_rsv;
> +			existing->data_rsv_refroot = record->data_rsv_refroot;
>   		}
> +		xa_unlock(&delayed_refs->dirty_extents);
> +		return 1;
> +	}
> +
> +	ret = __xa_store(&delayed_refs->dirty_extents, record->bytenr, record, GFP_ATOMIC);
> +	xa_unlock(&delayed_refs->dirty_extents);
> +	if (xa_is_err(ret)) {
> +		spin_lock(&fs_info->qgroup_lock);
> +		fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;

We have qgroup_mark_inconsistent() for this, which would also skip
future accounting.

> +		spin_unlock(&fs_info->qgroup_lock);
> +
> +		return 1;

It's much better to just return xa_err() here instead.
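
To make the suggested return convention concrete, here is a userspace
sketch of the callee side. It does not use the kernel xarray: struct
model_tree is a hypothetical fixed-size table, and "table full" stands
in for a failed store. The point is only the shape of the return values,
with the inconsistent flag set before the negative errno is passed back
to the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_SLOTS 4

/* Hypothetical stand-in for struct btrfs_qgroup_extent_record. */
struct record {
	unsigned long bytenr;
};

/* Hypothetical stand-in for the dirty extent tracking structure. */
struct model_tree {
	struct record *slots[MODEL_SLOTS];
	bool inconsistent;	/* models the qgroup "inconsistent" flag */
};

/*
 * Return 0 if @rec was inserted.
 * Return >0 if a record with the same bytenr already exists; the
 * caller can free @rec safely.
 * Return <0 (negative errno) if the insertion failed; the tree is
 * marked inconsistent and the caller still owns @rec.
 */
static int model_trace_extent(struct model_tree *tree, struct record *rec)
{
	int free_slot = -1;
	int i;

	assert(tree != NULL && rec != NULL);

	for (i = 0; i < MODEL_SLOTS; i++) {
		if (!tree->slots[i]) {
			/* Remember the first empty slot for insertion. */
			if (free_slot < 0)
				free_slot = i;
			continue;
		}
		if (tree->slots[i]->bytenr == rec->bytenr)
			return 1;
	}

	if (free_slot < 0) {
		/* Models a failed store: flag it, then propagate the error. */
		tree->inconsistent = true;
		return -ENOMEM;
	}

	tree->slots[free_slot] = rec;
	return 0;
}
```

A caller can then tell "nothing to do" (>0) apart from a real failure
(<0) without looking at the inconsistent flag.
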

>   	}
>
> -	rb_link_node(&record->node, parent_node, p);
> -	rb_insert_color(&record->node, &delayed_refs->dirty_extent_root);
>   	return 0;
>   }
>
> @@ -2027,13 +2025,18 @@ int btrfs_qgroup_trace_extent(struct btrfs_trans_handle *trans, u64 bytenr,
>   	struct btrfs_delayed_ref_root *delayed_refs;
>   	int ret;
>
> +	delayed_refs = &trans->transaction->delayed_refs;
>   	if (!btrfs_qgroup_full_accounting(fs_info) || bytenr == 0 || num_bytes == 0)
>   		return 0;
>   	record = kzalloc(sizeof(*record), GFP_NOFS);
>   	if (!record)
>   		return -ENOMEM;
>
> -	delayed_refs = &trans->transaction->delayed_refs;

Again, you may want to avoid touching unrelated code in a patch that
changes functionality.

Otherwise looks good to me.

Thanks,
Qu

