From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Junchao Sun <sunjunchao2870@gmail.com>, linux-btrfs@vger.kernel.org
Cc: clm@fb.com, josef@toxicpanda.com, dsterba@suse.com, wqu@suse.com
Subject: Re: [PATCH v2] btrfs: qgroup: use xarray to track dirty extents in transaction.
Date: Thu, 6 Jun 2024 19:00:25 +0930 [thread overview]
Message-ID: <0610a1b0-78a6-4c1f-9188-69b587c8146f@gmx.com> (raw)
In-Reply-To: <20240603113650.279782-1-sunjunchao2870@gmail.com>
On 2024/6/3 21:06, Junchao Sun wrote:
> Changes since v1:
> - Use xa_load() to update existing entry instead of double
> xa_store().
> - Rename goto labels.
> - Remove unnecessary calls to xa_init().
>
> Using xarray to track dirty extents can reduce the size of the
> struct btrfs_qgroup_extent_record from 64 bytes to 40 bytes.
> And xarray is more cache line friendly, it also reduces the
> complexity of insertion and search code compared to rb tree.
>
> Another change introduced is about error handling.
> Before this patch, btrfs_qgroup_trace_extent_nolock() could never
> fail. With this patch it calls xa_store(), which can fail, so on
> error we mark the qgroup as inconsistent and then free the
> preallocated memory. We also preallocate memory before taking
> spin_lock(); if the preallocation fails, error handling is the
> same as in the existing code.
>
> This patch passes the qgroup test group of xfstests (./check -g
> qgroup) and the checkpatch checks.
>
> Suggested-by: Qu Wenruo <wqu@suse.com>
> Signed-off-by: Junchao Sun <sunjunchao2870@gmail.com>
Sorry for the late reply, this version looks much better now, just
some small nitpicks.
> ---
> fs/btrfs/delayed-ref.c | 57 ++++++++++++++++++++--------------
> fs/btrfs/delayed-ref.h | 2 +-
> fs/btrfs/qgroup.c | 69 +++++++++++++++++++++---------------------
> fs/btrfs/qgroup.h | 1 -
> fs/btrfs/transaction.c | 6 ++--
> 5 files changed, 73 insertions(+), 62 deletions(-)
>
> diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
> index 891ea2fa263c..e5cbc91e9864 100644
> --- a/fs/btrfs/delayed-ref.c
> +++ b/fs/btrfs/delayed-ref.c
> @@ -915,8 +915,11 @@ add_delayed_ref_head(struct btrfs_trans_handle *trans,
> /* Record qgroup extent info if provided */
> if (qrecord) {
> if (btrfs_qgroup_trace_extent_nolock(trans->fs_info,
> - delayed_refs, qrecord))
> + delayed_refs, qrecord)) {
Since btrfs_qgroup_trace_extent_nolock() can return <0 for errors, I'd
prefer the more common handling like:
ret = btrfs_qgroup_trace_extent_nolock();
/* Either error or no need to use the qrecord */
if (ret) {
/* Do the cleanup */
}
> + /* If insertion failed, free preallocated memory */
> + xa_release(&delayed_refs->dirty_extents, qrecord->bytenr);
> kfree(qrecord);
> + }
> else
> qrecord_inserted = true;
> }
> @@ -1029,6 +1032,7 @@ int btrfs_add_delayed_tree_ref(struct btrfs_trans_handle *trans,
> u8 ref_type;
>
> is_system = (generic_ref->tree_ref.ref_root == BTRFS_CHUNK_TREE_OBJECTID);
> + delayed_refs = &trans->transaction->delayed_refs;
>
> ASSERT(generic_ref->type == BTRFS_REF_METADATA && generic_ref->action);
> ref = kmem_cache_alloc(btrfs_delayed_tree_ref_cachep, GFP_NOFS);
> @@ -1036,18 +1040,15 @@ int btrfs_add_delayed_tree_ref(struct btrfs_trans_handle *trans,
> return -ENOMEM;
>
> head_ref = kmem_cache_alloc(btrfs_delayed_ref_head_cachep, GFP_NOFS);
> - if (!head_ref) {
> - kmem_cache_free(btrfs_delayed_tree_ref_cachep, ref);
> - return -ENOMEM;
> - }
> + if (!head_ref)
> + goto free_ref;
>
> if (btrfs_qgroup_full_accounting(fs_info) && !generic_ref->skip_qgroup) {
> record = kzalloc(sizeof(*record), GFP_NOFS);
> - if (!record) {
> - kmem_cache_free(btrfs_delayed_tree_ref_cachep, ref);
> - kmem_cache_free(btrfs_delayed_ref_head_cachep, head_ref);
> - return -ENOMEM;
> - }
> + if (!record)
> + goto free_head_ref;
> + if (xa_reserve(&delayed_refs->dirty_extents, bytenr, GFP_NOFS))
> + goto free_record;
Considering this is a big functional change, I'd really prefer to
move the error handling cleanup into a separate patch, for better
bisection.
> }
>
> if (parent)
> @@ -1067,7 +1068,6 @@ int btrfs_add_delayed_tree_ref(struct btrfs_trans_handle *trans,
> false, is_system, generic_ref->owning_root);
> head_ref->extent_op = extent_op;
>
> - delayed_refs = &trans->transaction->delayed_refs;
Again, there is no real need to touch this in a patch changing functionality.
> spin_lock(&delayed_refs->lock);
>
> /*
> @@ -1096,6 +1096,14 @@ int btrfs_add_delayed_tree_ref(struct btrfs_trans_handle *trans,
> btrfs_qgroup_trace_extent_post(trans, record);
>
> return 0;
> +
> +free_record:
> + kfree(record);
> +free_head_ref:
> + kmem_cache_free(btrfs_delayed_ref_head_cachep, head_ref);
> +free_ref:
> + kmem_cache_free(btrfs_delayed_tree_ref_cachep, ref);
> + return -ENOMEM;
> }
>
> /*
> @@ -1137,28 +1145,23 @@ int btrfs_add_delayed_data_ref(struct btrfs_trans_handle *trans,
> ref->objectid = owner;
> ref->offset = offset;
>
> -
> + delayed_refs = &trans->transaction->delayed_refs;
> head_ref = kmem_cache_alloc(btrfs_delayed_ref_head_cachep, GFP_NOFS);
> - if (!head_ref) {
> - kmem_cache_free(btrfs_delayed_data_ref_cachep, ref);
> - return -ENOMEM;
> - }
> + if (!head_ref)
> + goto free_ref;
>
> if (btrfs_qgroup_full_accounting(fs_info) && !generic_ref->skip_qgroup) {
> record = kzalloc(sizeof(*record), GFP_NOFS);
> - if (!record) {
> - kmem_cache_free(btrfs_delayed_data_ref_cachep, ref);
> - kmem_cache_free(btrfs_delayed_ref_head_cachep,
> - head_ref);
> - return -ENOMEM;
> - }
> + if (!record)
> + goto free_head_ref;
> + if (xa_reserve(&delayed_refs->dirty_extents, bytenr, GFP_NOFS))
> + goto free_record;
Same here.
[...]
> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> index 5470e1cdf10c..717e16da9679 100644
> --- a/fs/btrfs/qgroup.c
> +++ b/fs/btrfs/qgroup.c
> @@ -1890,16 +1890,13 @@ int btrfs_limit_qgroup(struct btrfs_trans_handle *trans, u64 qgroupid,
> *
> * Return 0 for success insert
> * Return >0 for existing record, caller can free @record safely.
> - * Error is not possible
Then why not add a negative return value case?
The most common pattern would be: >0 for one common case (the qrecord
already exists), 0 for another common case (qrecord inserted), and <0
for error, just like btrfs_search_slot().
That was also my first impression of the function, but it turns out
not to be the case here.
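For illustration only, the three-way return convention described above can be sketched in plain userspace C (hypothetical names and a toy fixed-size table, not btrfs code):

```c
#include <errno.h>
#include <stddef.h>

/*
 * Sketch of the btrfs_search_slot()-style convention:
 * return 0 when a new record is inserted, >0 when the record already
 * exists (caller may free its own copy), and <0 (negative errno) on
 * error. The fixed-size table stands in for the real tracking structure.
 */
#define MAX_RECORDS 4

struct record {
	unsigned long bytenr;
	int used;
};

static struct record table[MAX_RECORDS];

static int trace_extent_sketch(unsigned long bytenr)
{
	int free_slot = -1;

	for (int i = 0; i < MAX_RECORDS; i++) {
		if (table[i].used && table[i].bytenr == bytenr)
			return 1;		/* already tracked */
		if (!table[i].used && free_slot < 0)
			free_slot = i;
	}
	if (free_slot < 0)
		return -ENOMEM;			/* insertion failed */
	table[free_slot].bytenr = bytenr;
	table[free_slot].used = 1;
	return 0;				/* newly inserted */
}
```

The caller can then distinguish "duplicate, free my copy" (>0) from a real failure (<0) without a separate out-parameter.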
> */
> int btrfs_qgroup_trace_extent_nolock(struct btrfs_fs_info *fs_info,
> struct btrfs_delayed_ref_root *delayed_refs,
> struct btrfs_qgroup_extent_record *record)
> {
> - struct rb_node **p = &delayed_refs->dirty_extent_root.rb_node;
> - struct rb_node *parent_node = NULL;
> - struct btrfs_qgroup_extent_record *entry;
> - u64 bytenr = record->bytenr;
> + struct btrfs_qgroup_extent_record *existing, *ret;
> + unsigned long bytenr = record->bytenr;
>
> if (!btrfs_qgroup_full_accounting(fs_info))
> return 1;
> @@ -1907,26 +1904,27 @@ int btrfs_qgroup_trace_extent_nolock(struct btrfs_fs_info *fs_info,
> lockdep_assert_held(&delayed_refs->lock);
> trace_btrfs_qgroup_trace_extent(fs_info, record);
>
> - while (*p) {
> - parent_node = *p;
> - entry = rb_entry(parent_node, struct btrfs_qgroup_extent_record,
> - node);
> - if (bytenr < entry->bytenr) {
> - p = &(*p)->rb_left;
> - } else if (bytenr > entry->bytenr) {
> - p = &(*p)->rb_right;
> - } else {
> - if (record->data_rsv && !entry->data_rsv) {
> - entry->data_rsv = record->data_rsv;
> - entry->data_rsv_refroot =
> - record->data_rsv_refroot;
> - }
> - return 1;
> + xa_lock(&delayed_refs->dirty_extents);
> + existing = xa_load(&delayed_refs->dirty_extents, bytenr);
> + if (existing) {
> + if (record->data_rsv && !existing->data_rsv) {
> + existing->data_rsv = record->data_rsv;
> + existing->data_rsv_refroot = record->data_rsv_refroot;
> }
> + xa_unlock(&delayed_refs->dirty_extents);
> + return 1;
> + }
> +
> + ret = __xa_store(&delayed_refs->dirty_extents, record->bytenr, record, GFP_ATOMIC);
> + xa_unlock(&delayed_refs->dirty_extents);
> + if (xa_is_err(ret)) {
> + spin_lock(&fs_info->qgroup_lock);
> + fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;
We have qgroup_mark_inconsistent(), which would skip future accounting.
> + spin_unlock(&fs_info->qgroup_lock);
> +
> + return 1;
It's much better just to return the xa_err() instead.
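As an aside, xa_is_err()/xa_err() work by encoding a negative errno
into a value that no valid entry can take. A simplified userspace
mimic of that idea (hypothetical helpers; the real xarray tags its
error entries differently, this only illustrates the concept):

```c
#include <errno.h>
#include <stddef.h>

/*
 * Simplified mimic of the kernel's error-pointer idea: a negative
 * errno is cast into the topmost page of the address space, a range
 * no valid allocation can occupy, so a single return value carries
 * either a pointer or an error code.
 */
#define MAX_ERRNO 4095

static void *err_ptr(long error)
{
	return (void *)error;		/* error is negative, e.g. -ENOMEM */
}

static int is_err(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

static long ptr_err(const void *ptr)
{
	return (long)ptr;
}
```

With such an encoding, the failing __xa_store() branch above could
simply return xa_err(ret), propagating e.g. -ENOMEM to the caller
instead of pretending success with return 1.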
> }
>
> - rb_link_node(&record->node, parent_node, p);
> - rb_insert_color(&record->node, &delayed_refs->dirty_extent_root);
> return 0;
> }
>
> @@ -2027,13 +2025,18 @@ int btrfs_qgroup_trace_extent(struct btrfs_trans_handle *trans, u64 bytenr,
> struct btrfs_delayed_ref_root *delayed_refs;
> int ret;
>
> + delayed_refs = &trans->transaction->delayed_refs;
> if (!btrfs_qgroup_full_accounting(fs_info) || bytenr == 0 || num_bytes == 0)
> return 0;
> record = kzalloc(sizeof(*record), GFP_NOFS);
> if (!record)
> return -ENOMEM;
>
> - delayed_refs = &trans->transaction->delayed_refs;
Again, you may want to avoid touching unrelated code in a patch
changing functionality.
Otherwise looks good to me.
Thanks,
Qu
2024-06-03 11:36 [PATCH v2] btrfs: qgroup: use xarray to track dirty extents in transaction Junchao Sun
2024-06-06 9:30 ` Qu Wenruo [this message]
2024-06-06 12:02 ` JunChao Sun
2024-06-06 22:04 ` Qu Wenruo