Linux Btrfs filesystem development
From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Junchao Sun <sunjunchao2870@gmail.com>, linux-btrfs@vger.kernel.org
Cc: clm@fb.com, josef@toxicpanda.com, dsterba@suse.com, wqu@suse.com
Subject: Re: [PATCH v3 2/2] btrfs: qgroup: use xarray to track dirty extents in transaction.
Date: Mon, 17 Jun 2024 12:41:40 +0930	[thread overview]
Message-ID: <b02d6f61-65d6-42f5-a89d-c43f7eccbbe6@gmx.com> (raw)
In-Reply-To: <20240607143021.122220-2-sunjunchao2870@gmail.com>



On 2024/6/8 00:00, Junchao Sun wrote:
> Changes since v2:
>    - Separate the cleanup code into a single patch.
>    - Call qgroup_mark_inconsistent() when record insertion failed.
>    - Return negative value when record insertion failed.
>
> Changes since v1:
>    - Use xa_load() to update existing entry instead of double
>      xa_store().
>    - Rename goto labels.
>    - Remove unnecessary calls to xa_init().

Forgot to mention, you should not put changelog into the commit message.

You do not need to resend; I'll remove it when pushing to the for-next
branch (along with extra testing, etc.).


>
> Using an xarray to track dirty extents reduces the size of
> struct btrfs_qgroup_extent_record from 64 bytes to 40 bytes.
> The xarray is also more cache-line friendly, and it simplifies
> the insertion and search code compared to the rb-tree.
>
> Another change is in error handling. Before this patch,
> btrfs_qgroup_trace_extent_nolock() could not fail. It now calls
> xa_store(), which can fail, so on error we mark the qgroup as
> inconsistent and free the preallocated memory. We also preallocate
> memory before taking spin_lock(); if the preallocation fails, the
> error handling is the same as in the existing code.
>
> This patch passes the xfstests qgroup group (check -g qgroup) and
> the checkpatch tests.
>
> Suggested-by: Qu Wenruo <wqu@suse.com>
> Signed-off-by: Junchao Sun <sunjunchao2870@gmail.com>

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu

> ---
>   fs/btrfs/delayed-ref.c | 14 +++++++--
>   fs/btrfs/delayed-ref.h |  2 +-
>   fs/btrfs/qgroup.c      | 66 ++++++++++++++++++++----------------------
>   fs/btrfs/transaction.c |  5 ++--
>   4 files changed, 46 insertions(+), 41 deletions(-)
>
> diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
> index 1a41ab991738..ec78d4c7581c 100644
> --- a/fs/btrfs/delayed-ref.c
> +++ b/fs/btrfs/delayed-ref.c
> @@ -891,10 +891,13 @@ add_delayed_ref_head(struct btrfs_trans_handle *trans,
>
>   	/* Record qgroup extent info if provided */
>   	if (qrecord) {
> -		if (btrfs_qgroup_trace_extent_nolock(trans->fs_info,
> -					delayed_refs, qrecord))
> +		int ret = btrfs_qgroup_trace_extent_nolock(trans->fs_info,
> +					delayed_refs, qrecord);
> +		if (ret) {
> +			/* Clean up if insertion fails or item exists. */
> +			xa_release(&delayed_refs->dirty_extents, qrecord->bytenr);
>   			kfree(qrecord);
> -		else
> +		} else
>   			qrecord_inserted = true;
>   	}
>
> @@ -1048,6 +1051,9 @@ static int add_delayed_ref(struct btrfs_trans_handle *trans,
>   		record = kzalloc(sizeof(*record), GFP_NOFS);
>   		if (!record)
>   			goto free_head_ref;
> +		if (xa_reserve(&trans->transaction->delayed_refs.dirty_extents,
> +			       generic_ref->bytenr, GFP_NOFS))
> +			goto free_record;
>   	}
>
>   	init_delayed_ref_common(fs_info, node, generic_ref);
> @@ -1084,6 +1090,8 @@ static int add_delayed_ref(struct btrfs_trans_handle *trans,
>   		return btrfs_qgroup_trace_extent_post(trans, record);
>   	return 0;
>
> +free_record:
> +	kfree(record);
>   free_head_ref:
>   	kmem_cache_free(btrfs_delayed_ref_head_cachep, head_ref);
>   free_node:
> diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
> index 04b180ebe1fe..a81d6f2aa799 100644
> --- a/fs/btrfs/delayed-ref.h
> +++ b/fs/btrfs/delayed-ref.h
> @@ -201,7 +201,7 @@ struct btrfs_delayed_ref_root {
>   	struct rb_root_cached href_root;
>
>   	/* dirty extent records */
> -	struct rb_root dirty_extent_root;
> +	struct xarray dirty_extents;
>
>   	/* this spin lock protects the rbtree and the entries inside */
>   	spinlock_t lock;
> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> index fc2a7ea26354..f75ed67a8edf 100644
> --- a/fs/btrfs/qgroup.c
> +++ b/fs/btrfs/qgroup.c
> @@ -1902,16 +1902,14 @@ int btrfs_limit_qgroup(struct btrfs_trans_handle *trans, u64 qgroupid,
>    *
>    * Return 0 for success insert
>    * Return >0 for existing record, caller can free @record safely.
> - * Error is not possible
> + * Return <0 for insertion failed, caller can free @record safely.
>    */
>   int btrfs_qgroup_trace_extent_nolock(struct btrfs_fs_info *fs_info,
>   				struct btrfs_delayed_ref_root *delayed_refs,
>   				struct btrfs_qgroup_extent_record *record)
>   {
> -	struct rb_node **p = &delayed_refs->dirty_extent_root.rb_node;
> -	struct rb_node *parent_node = NULL;
> -	struct btrfs_qgroup_extent_record *entry;
> -	u64 bytenr = record->bytenr;
> +	struct btrfs_qgroup_extent_record *existing, *ret;
> +	unsigned long bytenr = record->bytenr;
>
>   	if (!btrfs_qgroup_full_accounting(fs_info))
>   		return 1;
> @@ -1919,26 +1917,24 @@ int btrfs_qgroup_trace_extent_nolock(struct btrfs_fs_info *fs_info,
>   	lockdep_assert_held(&delayed_refs->lock);
>   	trace_btrfs_qgroup_trace_extent(fs_info, record);
>
> -	while (*p) {
> -		parent_node = *p;
> -		entry = rb_entry(parent_node, struct btrfs_qgroup_extent_record,
> -				 node);
> -		if (bytenr < entry->bytenr) {
> -			p = &(*p)->rb_left;
> -		} else if (bytenr > entry->bytenr) {
> -			p = &(*p)->rb_right;
> -		} else {
> -			if (record->data_rsv && !entry->data_rsv) {
> -				entry->data_rsv = record->data_rsv;
> -				entry->data_rsv_refroot =
> -					record->data_rsv_refroot;
> -			}
> -			return 1;
> +	xa_lock(&delayed_refs->dirty_extents);
> +	existing = xa_load(&delayed_refs->dirty_extents, bytenr);
> +	if (existing) {
> +		if (record->data_rsv && !existing->data_rsv) {
> +			existing->data_rsv = record->data_rsv;
> +			existing->data_rsv_refroot = record->data_rsv_refroot;
>   		}
> +		xa_unlock(&delayed_refs->dirty_extents);
> +		return 1;
> +	}
> +
> +	ret = __xa_store(&delayed_refs->dirty_extents, record->bytenr, record, GFP_ATOMIC);
> +	xa_unlock(&delayed_refs->dirty_extents);
> +	if (xa_is_err(ret)) {
> +		qgroup_mark_inconsistent(fs_info);
> +		return xa_err(ret);
>   	}
>
> -	rb_link_node(&record->node, parent_node, p);
> -	rb_insert_color(&record->node, &delayed_refs->dirty_extent_root);
>   	return 0;
>   }
>
> @@ -2045,6 +2041,11 @@ int btrfs_qgroup_trace_extent(struct btrfs_trans_handle *trans, u64 bytenr,
>   	if (!record)
>   		return -ENOMEM;
>
> +	if (xa_reserve(&trans->transaction->delayed_refs.dirty_extents, bytenr, GFP_NOFS)) {
> +		kfree(record);
> +		return -ENOMEM;
> +	}
> +
>   	delayed_refs = &trans->transaction->delayed_refs;
>   	record->bytenr = bytenr;
>   	record->num_bytes = num_bytes;
> @@ -2053,7 +2054,9 @@ int btrfs_qgroup_trace_extent(struct btrfs_trans_handle *trans, u64 bytenr,
>   	spin_lock(&delayed_refs->lock);
>   	ret = btrfs_qgroup_trace_extent_nolock(fs_info, delayed_refs, record);
>   	spin_unlock(&delayed_refs->lock);
> -	if (ret > 0) {
> +	if (ret) {
> +		/* Clean up if insertion fails or item exists. */
> +		xa_release(&delayed_refs->dirty_extents, record->bytenr);
>   		kfree(record);
>   		return 0;
>   	}
> @@ -2922,7 +2925,7 @@ int btrfs_qgroup_account_extents(struct btrfs_trans_handle *trans)
>   	struct btrfs_qgroup_extent_record *record;
>   	struct btrfs_delayed_ref_root *delayed_refs;
>   	struct ulist *new_roots = NULL;
> -	struct rb_node *node;
> +	unsigned long index;
>   	u64 num_dirty_extents = 0;
>   	u64 qgroup_to_skip;
>   	int ret = 0;
> @@ -2932,10 +2935,7 @@ int btrfs_qgroup_account_extents(struct btrfs_trans_handle *trans)
>
>   	delayed_refs = &trans->transaction->delayed_refs;
>   	qgroup_to_skip = delayed_refs->qgroup_to_skip;
> -	while ((node = rb_first(&delayed_refs->dirty_extent_root))) {
> -		record = rb_entry(node, struct btrfs_qgroup_extent_record,
> -				  node);
> -
> +	xa_for_each(&delayed_refs->dirty_extents, index, record) {
>   		num_dirty_extents++;
>   		trace_btrfs_qgroup_account_extents(fs_info, record);
>
> @@ -3001,7 +3001,7 @@ int btrfs_qgroup_account_extents(struct btrfs_trans_handle *trans)
>   		ulist_free(record->old_roots);
>   		ulist_free(new_roots);
>   		new_roots = NULL;
> -		rb_erase(node, &delayed_refs->dirty_extent_root);
> +		xa_erase(&delayed_refs->dirty_extents, index);
>   		kfree(record);
>
>   	}
> @@ -4788,15 +4788,13 @@ int btrfs_qgroup_trace_subtree_after_cow(struct btrfs_trans_handle *trans,
>   void btrfs_qgroup_destroy_extent_records(struct btrfs_transaction *trans)
>   {
>   	struct btrfs_qgroup_extent_record *entry;
> -	struct btrfs_qgroup_extent_record *next;
> -	struct rb_root *root;
> +	unsigned long index;
>
> -	root = &trans->delayed_refs.dirty_extent_root;
> -	rbtree_postorder_for_each_entry_safe(entry, next, root, node) {
> +	xa_for_each(&trans->delayed_refs.dirty_extents, index, entry) {
>   		ulist_free(entry->old_roots);
>   		kfree(entry);
>   	}
> -	*root = RB_ROOT;
> +	xa_destroy(&trans->delayed_refs.dirty_extents);
>   }
>
>   void btrfs_free_squota_rsv(struct btrfs_fs_info *fs_info, u64 root, u64 rsv_bytes)
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index 3388c836b9a5..c473c049d37f 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -143,8 +143,7 @@ void btrfs_put_transaction(struct btrfs_transaction *transaction)
>   		BUG_ON(!list_empty(&transaction->list));
>   		WARN_ON(!RB_EMPTY_ROOT(
>   				&transaction->delayed_refs.href_root.rb_root));
> -		WARN_ON(!RB_EMPTY_ROOT(
> -				&transaction->delayed_refs.dirty_extent_root));
> +		WARN_ON(!xa_empty(&transaction->delayed_refs.dirty_extents));
>   		if (transaction->delayed_refs.pending_csums)
>   			btrfs_err(transaction->fs_info,
>   				  "pending csums is %llu",
> @@ -351,7 +350,7 @@ static noinline int join_transaction(struct btrfs_fs_info *fs_info,
>   	memset(&cur_trans->delayed_refs, 0, sizeof(cur_trans->delayed_refs));
>
>   	cur_trans->delayed_refs.href_root = RB_ROOT_CACHED;
> -	cur_trans->delayed_refs.dirty_extent_root = RB_ROOT;
> +	xa_init(&cur_trans->delayed_refs.dirty_extents);
>   	atomic_set(&cur_trans->delayed_refs.num_entries, 0);
>
>   	/*


Thread overview: 8+ messages
2024-06-07 14:30 [PATCH v3 1/2] btrfs: qgroup: use goto style to handle error in add_delayed_ref() Junchao Sun
2024-06-07 14:30 ` [PATCH v3 2/2] btrfs: qgroup: use xarray to track dirty extents in transaction Junchao Sun
2024-06-17  3:11   ` Qu Wenruo [this message]
2024-06-17  2:08 ` [PATCH v3 1/2] btrfs: qgroup: use goto style to handle error in add_delayed_ref() JunChao Sun
2024-06-17  3:08 ` Qu Wenruo
2024-08-13 22:44 ` David Sterba
2024-08-16  3:36   ` Julian Sun
2024-08-16 11:46     ` David Sterba
