Linux Btrfs filesystem development
 help / color / mirror / Atom feed
From: Sun YangKai <sunk67188@gmail.com>
To: Leo Martins <loemra.dev@gmail.com>,
	linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH 1/2] btrfs: don't finalize fs roots during qgroup snapshot accounting
Date: Fri, 24 Apr 2026 11:15:48 +0800	[thread overview]
Message-ID: <2c18476d-e75c-4451-8912-b8c783852923@gmail.com> (raw)
In-Reply-To: <3d60d1354b7f564568532085656a1989d4faa171.1776899536.git.loemra.dev@gmail.com>

Thank you for your work on this — much appreciated.

I am not very familiar with this part of the codebase, so please take my 
comments with that in mind.

That said, I have a few minor inline suggestions regarding some of the 
implementation details. Nothing major — just small things I noticed 
while reading through. Hope they help. Please see below.

On 2026/4/24 06:43, Leo Martins wrote:
> qgroup_account_snapshot() calls commit_fs_roots() to persist root
> items for consistent qgroup accounting. However, commit_fs_roots()
> is a transaction finalization function that also clears
> BTRFS_ROOT_FORCE_COW, frees log trees, clears BTRFS_ROOT_TRANS_TAG,
> and frees qgroup metadata reservations. These are all premature
> since the transaction is still in progress and create_pending_snapshot()
> will modify the source root's tree afterward.
> 
> Currently this is harmless because should_cow_block() always COWs
> WRITTEN extent buffers regardless of FORCE_COW state. But clearing
> FORCE_COW mid-transaction leaves roots in an incorrect state:
> blocks may be shared with a snapshot (via btrfs_copy_root) while
> FORCE_COW no longer reflects that. Any future optimization that
> relies on FORCE_COW to decide whether COW can be skipped would
> silently corrupt snapshots.
> 
> Extract prepare_root_item() as a helper that updates a root's
> root_item and writes it to the tree root. This is the only part of
> commit_fs_roots() that qgroup accounting needs. Add
> qgroup_commit_fs_roots() which iterates dirty roots calling only
> prepare_root_item(), without finalization side effects. It iterates
> using an advancing index instead of clearing tags, so roots remain
> tagged for the real commit_fs_roots() call later, eliminating the
> record_root_in_trans() re-tagging that was previously needed.
> 
> Signed-off-by: Leo Martins <loemra.dev@gmail.com>
> ---
>   fs/btrfs/transaction.c | 105 +++++++++++++++++++++++++++++------------
>   1 file changed, 75 insertions(+), 30 deletions(-)
> 
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index 55791bb100a22..5b362f2680200 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -1477,8 +1477,66 @@ void btrfs_add_dead_root(struct btrfs_root *root)
>   }
>   
>   /*
> - * Update each subvolume root and its relocation root, if it exists, in the tree
> - * of tree roots. Also free log roots if they exist.
> + * Update the root_item for @root to point at the current root->node
> + * and write it to the tree root.
> + */
> +static int prepare_root_item(struct btrfs_trans_handle *trans,
> +			   struct btrfs_root *root)
> +{
> +	if (root->commit_root != root->node) {
> +		list_add_tail(&root->dirty_list,
> +			      &trans->transaction->switch_commits);
> +		btrfs_set_root_node(&root->root_item, root->node);
> +	}
> +
> +	return btrfs_update_root(trans, trans->fs_info->tree_root,
> +				 &root->root_key, &root->root_item);
> +}
> +
> +/*
> + * Update root items for all dirty fs roots without the finalization
> + * side effects of commit_fs_roots() (clearing FORCE_COW, freeing
> + * logs, clearing trans tags, etc.). Tags are preserved so
> + * commit_fs_roots() can still find these roots later.
> + */
> +static int qgroup_commit_fs_roots(struct btrfs_trans_handle *trans)
> +{
> +	struct btrfs_fs_info *fs_info = trans->fs_info;
> +	struct btrfs_root *gang[8];
> +	unsigned long index = 0;
> +	int i;
> +	int ret;

I wonder if it would be better to use unsigned int for ret since it only 
set by radix_tree_gang_lookup_tag which returns unsigned int.

And I wonder if ret is a good name since we never return ret.

> +
> +	spin_lock(&fs_info->fs_roots_radix_lock);
> +	while (1) {
> +		ret = radix_tree_gang_lookup_tag(&fs_info->fs_roots_radix,
> +						 (void **)gang, index,
> +						 ARRAY_SIZE(gang),
> +						 BTRFS_ROOT_TRANS_TAG);
> +		if (ret == 0)
> +			break;

I assume `ret > ARRAAY_SIZE(gang)` should never happen here so maybe we 
could have `index=btrfs_root_id(gang[ret-1]) + 1;` here.

> +		for (i = 0; i < ret; i++) {
> +			struct btrfs_root *root = gang[i];
> +			int ret2;
> +
> +			index = btrfs_root_id(root) + 1;

index seems not used inside this for loop, so maybe we can update it 
outside the loop to make it looks better.

Thanks again for the patch.
Regards,
Sun YangKai

> +			spin_unlock(&fs_info->fs_roots_radix_lock);
> +
> +			ret2 = prepare_root_item(trans, root);
> +			if (unlikely(ret2))
> +				return ret2;
> +
> +			spin_lock(&fs_info->fs_roots_radix_lock);
> +		}
> +	}
> +	spin_unlock(&fs_info->fs_roots_radix_lock);
> +	return 0;
> +}
> +
> +/*
> + * Finalize each dirty subvolume root: free log trees, update relocation
> + * roots, clear BTRFS_ROOT_FORCE_COW, persist root items, and clear
> + * BTRFS_ROOT_TRANS_TAG.
>    */
>   static noinline int commit_fs_roots(struct btrfs_trans_handle *trans)
>   {
> @@ -1535,16 +1593,7 @@ static noinline int commit_fs_roots(struct btrfs_trans_handle *trans)
>   			clear_bit(BTRFS_ROOT_FORCE_COW, &root->state);
>   			smp_mb__after_atomic();
>   
> -			if (root->commit_root != root->node) {
> -				list_add_tail(&root->dirty_list,
> -					&trans->transaction->switch_commits);
> -				btrfs_set_root_node(&root->root_item,
> -						    root->node);
> -			}
> -
> -			ret2 = btrfs_update_root(trans, fs_info->tree_root,
> -						&root->root_key,
> -						&root->root_item);
> +			ret2 = prepare_root_item(trans, root);
>   			if (unlikely(ret2))
>   				return ret2;
>   			spin_lock(&fs_info->fs_roots_radix_lock);
> @@ -1604,7 +1653,13 @@ static int qgroup_account_snapshot(struct btrfs_trans_handle *trans,
>   		return ret;
>   	}
>   
> -	ret = commit_fs_roots(trans);
> +	/*
> +	 * Use qgroup_commit_fs_roots() instead of commit_fs_roots()
> +	 * because the transaction is still in progress and we must not
> +	 * clear FORCE_COW on roots whose blocks are shared with a
> +	 * snapshot.
> +	 */
> +	ret = qgroup_commit_fs_roots(trans);
>   	if (ret)
>   		return ret;
>   	ret = btrfs_qgroup_account_extents(trans);
> @@ -1618,16 +1673,12 @@ static int qgroup_account_snapshot(struct btrfs_trans_handle *trans,
>   		return ret;
>   
>   	/*
> -	 * Now we do a simplified commit transaction, which will:
> -	 * 1) commit all subvolume and extent tree
> -	 *    To ensure all subvolume and extent tree have a valid
> -	 *    commit_root to accounting later insert_dir_item()
> -	 * 2) write all btree blocks onto disk
> -	 *    This is to make sure later btree modification will be cowed
> -	 *    Or commit_root can be populated and cause wrong qgroup numbers
> -	 * In this simplified commit, we don't really care about other trees
> -	 * like chunk and root tree, as they won't affect qgroup.
> -	 * And we don't write super to avoid half committed status.
> +	 * Simplified commit for qgroup accounting:
> +	 * 1) commit cow-only roots (extent tree, etc.) for consistent
> +	 *    commit_root pointers
> +	 * 2) write all btree blocks to disk so subsequent modifications
> +	 *    are properly COW'd
> +	 * We don't write the super to avoid a half-committed state.
>   	 */
>   	ret = commit_cowonly_roots(trans);
>   	if (ret)
> @@ -1640,13 +1691,7 @@ static int qgroup_account_snapshot(struct btrfs_trans_handle *trans,
>   		return ret;
>   	}
>   
> -	/*
> -	 * Force parent root to be updated, as we recorded it before so its
> -	 * last_trans == cur_transid.
> -	 * Or it won't be committed again onto disk after later
> -	 * insert_dir_item()
> -	 */
> -	return record_root_in_trans(trans, parent, true);
> +	return 0;
>   }
>   
>   /*


  reply	other threads:[~2026-04-24  3:15 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-23 22:43 [PATCH 0/2] btrfs: skip COW for written extent buffers Leo Martins
2026-04-23 22:43 ` [PATCH 1/2] btrfs: don't finalize fs roots during qgroup snapshot accounting Leo Martins
2026-04-24  3:15   ` Sun YangKai [this message]
2026-04-23 22:43 ` [PATCH 2/2] btrfs: skip COW for written extent buffers allocated in current transaction Leo Martins

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2c18476d-e75c-4451-8912-b8c783852923@gmail.com \
    --to=sunk67188@gmail.com \
    --cc=kernel-team@fb.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=loemra.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox