From: Sun YangKai <sunk67188@gmail.com>
To: Leo Martins <loemra.dev@gmail.com>,
linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH 1/2] btrfs: don't finalize fs roots during qgroup snapshot accounting
Date: Fri, 24 Apr 2026 11:15:48 +0800 [thread overview]
Message-ID: <2c18476d-e75c-4451-8912-b8c783852923@gmail.com> (raw)
In-Reply-To: <3d60d1354b7f564568532085656a1989d4faa171.1776899536.git.loemra.dev@gmail.com>
Thank you for your work on this — much appreciated.
I am not very familiar with this part of the codebase, so please take my
comments with that in mind.
That said, I have a few minor inline suggestions regarding some of the
implementation details. Nothing major — just small things I noticed
while reading through. Hope they help. Please see below.
On 2026/4/24 06:43, Leo Martins wrote:
> qgroup_account_snapshot() calls commit_fs_roots() to persist root
> items for consistent qgroup accounting. However, commit_fs_roots()
> is a transaction finalization function that also clears
> BTRFS_ROOT_FORCE_COW, frees log trees, clears BTRFS_ROOT_TRANS_TAG,
> and frees qgroup metadata reservations. These are all premature
> since the transaction is still in progress and create_pending_snapshot()
> will modify the source root's tree afterward.
>
> Currently this is harmless because should_cow_block() always COWs
> WRITTEN extent buffers regardless of FORCE_COW state. But clearing
> FORCE_COW mid-transaction leaves roots in an incorrect state:
> blocks may be shared with a snapshot (via btrfs_copy_root) while
> FORCE_COW no longer reflects that. Any future optimization that
> relies on FORCE_COW to decide whether COW can be skipped would
> silently corrupt snapshots.
>
> Extract prepare_root_item() as a helper that updates a root's
> root_item and writes it to the tree root. This is the only part of
> commit_fs_roots() that qgroup accounting needs. Add
> qgroup_commit_fs_roots() which iterates dirty roots calling only
> prepare_root_item(), without finalization side effects. It iterates
> using an advancing index instead of clearing tags, so roots remain
> tagged for the real commit_fs_roots() call later, eliminating the
> record_root_in_trans() re-tagging that was previously needed.
>
> Signed-off-by: Leo Martins <loemra.dev@gmail.com>
> ---
> fs/btrfs/transaction.c | 105 +++++++++++++++++++++++++++++------------
> 1 file changed, 75 insertions(+), 30 deletions(-)
>
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index 55791bb100a22..5b362f2680200 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -1477,8 +1477,66 @@ void btrfs_add_dead_root(struct btrfs_root *root)
> }
>
> /*
> - * Update each subvolume root and its relocation root, if it exists, in the tree
> - * of tree roots. Also free log roots if they exist.
> + * Update the root_item for @root to point at the current root->node
> + * and write it to the tree root.
> + */
> +static int prepare_root_item(struct btrfs_trans_handle *trans,
> + struct btrfs_root *root)
> +{
> + if (root->commit_root != root->node) {
> + list_add_tail(&root->dirty_list,
> + &trans->transaction->switch_commits);
> + btrfs_set_root_node(&root->root_item, root->node);
> + }
> +
> + return btrfs_update_root(trans, trans->fs_info->tree_root,
> + &root->root_key, &root->root_item);
> +}
> +
> +/*
> + * Update root items for all dirty fs roots without the finalization
> + * side effects of commit_fs_roots() (clearing FORCE_COW, freeing
> + * logs, clearing trans tags, etc.). Tags are preserved so
> + * commit_fs_roots() can still find these roots later.
> + */
> +static int qgroup_commit_fs_roots(struct btrfs_trans_handle *trans)
> +{
> + struct btrfs_fs_info *fs_info = trans->fs_info;
> + struct btrfs_root *gang[8];
> + unsigned long index = 0;
> + int i;
> + int ret;
I wonder if it would be better to use unsigned int for ret since it only
set by radix_tree_gang_lookup_tag which returns unsigned int.
And I wonder if ret is a good name since we never return ret.
> +
> + spin_lock(&fs_info->fs_roots_radix_lock);
> + while (1) {
> + ret = radix_tree_gang_lookup_tag(&fs_info->fs_roots_radix,
> + (void **)gang, index,
> + ARRAY_SIZE(gang),
> + BTRFS_ROOT_TRANS_TAG);
> + if (ret == 0)
> + break;
I assume `ret > ARRAAY_SIZE(gang)` should never happen here so maybe we
could have `index=btrfs_root_id(gang[ret-1]) + 1;` here.
> + for (i = 0; i < ret; i++) {
> + struct btrfs_root *root = gang[i];
> + int ret2;
> +
> + index = btrfs_root_id(root) + 1;
index seems not used inside this for loop, so maybe we can update it
outside the loop to make it looks better.
Thanks again for the patch.
Regards,
Sun YangKai
> + spin_unlock(&fs_info->fs_roots_radix_lock);
> +
> + ret2 = prepare_root_item(trans, root);
> + if (unlikely(ret2))
> + return ret2;
> +
> + spin_lock(&fs_info->fs_roots_radix_lock);
> + }
> + }
> + spin_unlock(&fs_info->fs_roots_radix_lock);
> + return 0;
> +}
> +
> +/*
> + * Finalize each dirty subvolume root: free log trees, update relocation
> + * roots, clear BTRFS_ROOT_FORCE_COW, persist root items, and clear
> + * BTRFS_ROOT_TRANS_TAG.
> */
> static noinline int commit_fs_roots(struct btrfs_trans_handle *trans)
> {
> @@ -1535,16 +1593,7 @@ static noinline int commit_fs_roots(struct btrfs_trans_handle *trans)
> clear_bit(BTRFS_ROOT_FORCE_COW, &root->state);
> smp_mb__after_atomic();
>
> - if (root->commit_root != root->node) {
> - list_add_tail(&root->dirty_list,
> - &trans->transaction->switch_commits);
> - btrfs_set_root_node(&root->root_item,
> - root->node);
> - }
> -
> - ret2 = btrfs_update_root(trans, fs_info->tree_root,
> - &root->root_key,
> - &root->root_item);
> + ret2 = prepare_root_item(trans, root);
> if (unlikely(ret2))
> return ret2;
> spin_lock(&fs_info->fs_roots_radix_lock);
> @@ -1604,7 +1653,13 @@ static int qgroup_account_snapshot(struct btrfs_trans_handle *trans,
> return ret;
> }
>
> - ret = commit_fs_roots(trans);
> + /*
> + * Use qgroup_commit_fs_roots() instead of commit_fs_roots()
> + * because the transaction is still in progress and we must not
> + * clear FORCE_COW on roots whose blocks are shared with a
> + * snapshot.
> + */
> + ret = qgroup_commit_fs_roots(trans);
> if (ret)
> return ret;
> ret = btrfs_qgroup_account_extents(trans);
> @@ -1618,16 +1673,12 @@ static int qgroup_account_snapshot(struct btrfs_trans_handle *trans,
> return ret;
>
> /*
> - * Now we do a simplified commit transaction, which will:
> - * 1) commit all subvolume and extent tree
> - * To ensure all subvolume and extent tree have a valid
> - * commit_root to accounting later insert_dir_item()
> - * 2) write all btree blocks onto disk
> - * This is to make sure later btree modification will be cowed
> - * Or commit_root can be populated and cause wrong qgroup numbers
> - * In this simplified commit, we don't really care about other trees
> - * like chunk and root tree, as they won't affect qgroup.
> - * And we don't write super to avoid half committed status.
> + * Simplified commit for qgroup accounting:
> + * 1) commit cow-only roots (extent tree, etc.) for consistent
> + * commit_root pointers
> + * 2) write all btree blocks to disk so subsequent modifications
> + * are properly COW'd
> + * We don't write the super to avoid a half-committed state.
> */
> ret = commit_cowonly_roots(trans);
> if (ret)
> @@ -1640,13 +1691,7 @@ static int qgroup_account_snapshot(struct btrfs_trans_handle *trans,
> return ret;
> }
>
> - /*
> - * Force parent root to be updated, as we recorded it before so its
> - * last_trans == cur_transid.
> - * Or it won't be committed again onto disk after later
> - * insert_dir_item()
> - */
> - return record_root_in_trans(trans, parent, true);
> + return 0;
> }
>
> /*
next prev parent reply other threads:[~2026-04-24 3:15 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-23 22:43 [PATCH 0/2] btrfs: skip COW for written extent buffers Leo Martins
2026-04-23 22:43 ` [PATCH 1/2] btrfs: don't finalize fs roots during qgroup snapshot accounting Leo Martins
2026-04-24 3:15 ` Sun YangKai [this message]
2026-04-23 22:43 ` [PATCH 2/2] btrfs: skip COW for written extent buffers allocated in current transaction Leo Martins
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=2c18476d-e75c-4451-8912-b8c783852923@gmail.com \
--to=sunk67188@gmail.com \
--cc=kernel-team@fb.com \
--cc=linux-btrfs@vger.kernel.org \
--cc=loemra.dev@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox