From: Qu Wenruo <wqu@suse.com>
To: Boris Burkov <boris@bur.io>,
linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH 2/7] btrfs: fix qgroup prealloc rsv leak in subvolume operations
Date: Wed, 27 Mar 2024 08:37:28 +1030 [thread overview]
Message-ID: <783373ae-12d1-4925-ae6f-413815a3cbd5@suse.com> (raw)
In-Reply-To: <c39e9aea5fa5e22c42fcea3d810d78cb499312ff.1711488980.git.boris@bur.io>
在 2024/3/27 08:09, Boris Burkov 写道:
> Create subvol, create snapshot and delete subvol all use
> btrfs_subvolume_reserve_metadata to reserve metadata for the changes
> done to the parent subvolume's fs tree, which cannot be mediated in the
> normal way via start_transaction. When quota groups (squota or qgroups)
> are enabled, this reserves qgroup metadata of type PREALLOC. Once the
> operation is associated to a transaction, we convert PREALLOC to
> PERTRANS, which gets cleared in bulk at the end of the transaction.
>
> However, the error paths of these three operations were not implementing
> this lifecycle correctly. They unconditionally converted the PREALLOC to
> PERTRANS in a generic cleanup step regardless of errors or whether the
> operation was fully associated to a transaction or not. This resulted in
> error paths occasionally converting this rsv to PERTRANS without calling
> record_root_in_trans successfully, which meant that unless that root got
> recorded in the transaction by some other thread, the end of the
> transaction would not free that root's PERTRANS, leaking it. Ultimately,
> this resulted in hitting a WARN in CONFIG_BTRFS_DEBUG builds at unmount
> for the leaked reservation.
>
> The fix is to ensure that every qgroup PREALLOC reservation observes the
> following properties:
> 1. any failure before record_root_in_trans is called successfully
> results in freeing the PREALLOC reservation.
> 2. after record_root_in_trans, we convert to PERTRANS, and now the
> transaction owns freeing the reservation.
>
> This patch enforces those properties on the three operations. Without
> it, generic/269 with squotas enabled at MKFS time would fail in ~5-10
> runs on my system. With this patch, it ran successfully 1000 times in a
> row.
>
> Fixes: e85fde5162bf ("btrfs: qgroup: fix qgroup meta rsv leak for subvolume operations")
Thanks for pinning down the bug, and it looks fine to me.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Thanks,
Qu
> Signed-off-by: Boris Burkov <boris@bur.io>
> ---
> fs/btrfs/inode.c | 13 ++++++++++++-
> fs/btrfs/ioctl.c | 37 ++++++++++++++++++++++++++++---------
> fs/btrfs/root-tree.c | 10 ----------
> fs/btrfs/root-tree.h | 2 --
> 4 files changed, 40 insertions(+), 22 deletions(-)
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 09f0a20b5ce8..2587a2e25e44 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -4503,6 +4503,7 @@ int btrfs_delete_subvolume(struct btrfs_inode *dir, struct dentry *dentry)
> struct btrfs_trans_handle *trans;
> struct btrfs_block_rsv block_rsv;
> u64 root_flags;
> + u64 qgroup_reserved = 0;
> int ret;
>
> down_write(&fs_info->subvol_sem);
> @@ -4547,12 +4548,20 @@ int btrfs_delete_subvolume(struct btrfs_inode *dir, struct dentry *dentry)
> ret = btrfs_subvolume_reserve_metadata(root, &block_rsv, 5, true);
> if (ret)
> goto out_undead;
> + qgroup_reserved = block_rsv.qgroup_rsv_reserved;
>
> trans = btrfs_start_transaction(root, 0);
> if (IS_ERR(trans)) {
> ret = PTR_ERR(trans);
> goto out_release;
> }
> + ret = btrfs_record_root_in_trans(trans, root);
> + if (ret) {
> + btrfs_abort_transaction(trans, ret);
> + goto out_end_trans;
> + }
> + btrfs_qgroup_convert_reserved_meta(root, qgroup_reserved);
> + qgroup_reserved = 0;
> trans->block_rsv = &block_rsv;
> trans->bytes_reserved = block_rsv.size;
>
> @@ -4611,7 +4620,9 @@ int btrfs_delete_subvolume(struct btrfs_inode *dir, struct dentry *dentry)
> ret = btrfs_end_transaction(trans);
> inode->i_flags |= S_DEAD;
> out_release:
> - btrfs_subvolume_release_metadata(root, &block_rsv);
> + btrfs_block_rsv_release(fs_info, &block_rsv, (u64)-1, NULL);
> + if (qgroup_reserved)
> + btrfs_qgroup_free_meta_prealloc(root, qgroup_reserved);
> out_undead:
> if (ret) {
> spin_lock(&dest->root_item_lock);
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index e3a72292eda9..888dc92c6c75 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -613,6 +613,7 @@ static noinline int create_subvol(struct mnt_idmap *idmap,
> int ret;
> dev_t anon_dev;
> u64 objectid;
> + u64 qgroup_reserved = 0;
>
> root_item = kzalloc(sizeof(*root_item), GFP_KERNEL);
> if (!root_item)
> @@ -650,13 +651,18 @@ static noinline int create_subvol(struct mnt_idmap *idmap,
> trans_num_items, false);
> if (ret)
> goto out_new_inode_args;
> + qgroup_reserved = block_rsv.qgroup_rsv_reserved;
>
> trans = btrfs_start_transaction(root, 0);
> if (IS_ERR(trans)) {
> ret = PTR_ERR(trans);
> - btrfs_subvolume_release_metadata(root, &block_rsv);
> - goto out_new_inode_args;
> + goto out_release_rsv;
> }
> + ret = btrfs_record_root_in_trans(trans, BTRFS_I(dir)->root);
> + if (ret)
> + goto out;
> + btrfs_qgroup_convert_reserved_meta(root, qgroup_reserved);
> + qgroup_reserved = 0;
> trans->block_rsv = &block_rsv;
> trans->bytes_reserved = block_rsv.size;
> /* Tree log can't currently deal with an inode which is a new root. */
> @@ -767,9 +773,11 @@ static noinline int create_subvol(struct mnt_idmap *idmap,
> out:
> trans->block_rsv = NULL;
> trans->bytes_reserved = 0;
> - btrfs_subvolume_release_metadata(root, &block_rsv);
> -
> btrfs_end_transaction(trans);
> +out_release_rsv:
> + btrfs_block_rsv_release(fs_info, &block_rsv, (u64)-1, NULL);
> + if (qgroup_reserved)
> + btrfs_qgroup_free_meta_prealloc(root, qgroup_reserved);
> out_new_inode_args:
> btrfs_new_inode_args_destroy(&new_inode_args);
> out_inode:
> @@ -791,6 +799,8 @@ static int create_snapshot(struct btrfs_root *root, struct inode *dir,
> struct btrfs_pending_snapshot *pending_snapshot;
> unsigned int trans_num_items;
> struct btrfs_trans_handle *trans;
> + struct btrfs_block_rsv *block_rsv;
> + u64 qgroup_reserved = 0;
> int ret;
>
> /* We do not support snapshotting right now. */
> @@ -827,19 +837,19 @@ static int create_snapshot(struct btrfs_root *root, struct inode *dir,
> goto free_pending;
> }
>
> - btrfs_init_block_rsv(&pending_snapshot->block_rsv,
> - BTRFS_BLOCK_RSV_TEMP);
> + block_rsv = &pending_snapshot->block_rsv;
> + btrfs_init_block_rsv(block_rsv, BTRFS_BLOCK_RSV_TEMP);
> /*
> * 1 to add dir item
> * 1 to add dir index
> * 1 to update parent inode item
> */
> trans_num_items = create_subvol_num_items(inherit) + 3;
> - ret = btrfs_subvolume_reserve_metadata(BTRFS_I(dir)->root,
> - &pending_snapshot->block_rsv,
> + ret = btrfs_subvolume_reserve_metadata(BTRFS_I(dir)->root, block_rsv,
> trans_num_items, false);
> if (ret)
> goto free_pending;
> + qgroup_reserved = block_rsv->qgroup_rsv_reserved;
>
> pending_snapshot->dentry = dentry;
> pending_snapshot->root = root;
> @@ -852,6 +862,13 @@ static int create_snapshot(struct btrfs_root *root, struct inode *dir,
> ret = PTR_ERR(trans);
> goto fail;
> }
> + ret = btrfs_record_root_in_trans(trans, BTRFS_I(dir)->root);
> + if (ret) {
> + btrfs_end_transaction(trans);
> + goto fail;
> + }
> + btrfs_qgroup_convert_reserved_meta(root, qgroup_reserved);
> + qgroup_reserved = 0;
>
> trans->pending_snapshot = pending_snapshot;
>
> @@ -881,7 +898,9 @@ static int create_snapshot(struct btrfs_root *root, struct inode *dir,
> if (ret && pending_snapshot->snap)
> pending_snapshot->snap->anon_dev = 0;
> btrfs_put_root(pending_snapshot->snap);
> - btrfs_subvolume_release_metadata(root, &pending_snapshot->block_rsv);
> + btrfs_block_rsv_release(fs_info, block_rsv, (u64)-1, NULL);
> + if (qgroup_reserved)
> + btrfs_qgroup_free_meta_prealloc(root, qgroup_reserved);
> free_pending:
> if (pending_snapshot->anon_dev)
> free_anon_bdev(pending_snapshot->anon_dev);
> diff --git a/fs/btrfs/root-tree.c b/fs/btrfs/root-tree.c
> index 4bb538a372ce..7007f9e0c972 100644
> --- a/fs/btrfs/root-tree.c
> +++ b/fs/btrfs/root-tree.c
> @@ -548,13 +548,3 @@ int btrfs_subvolume_reserve_metadata(struct btrfs_root *root,
> }
> return ret;
> }
> -
> -void btrfs_subvolume_release_metadata(struct btrfs_root *root,
> - struct btrfs_block_rsv *rsv)
> -{
> - struct btrfs_fs_info *fs_info = root->fs_info;
> - u64 qgroup_to_release;
> -
> - btrfs_block_rsv_release(fs_info, rsv, (u64)-1, &qgroup_to_release);
> - btrfs_qgroup_convert_reserved_meta(root, qgroup_to_release);
> -}
> diff --git a/fs/btrfs/root-tree.h b/fs/btrfs/root-tree.h
> index 6f929cf3bd49..8f5739e732b9 100644
> --- a/fs/btrfs/root-tree.h
> +++ b/fs/btrfs/root-tree.h
> @@ -18,8 +18,6 @@ struct btrfs_trans_handle;
> int btrfs_subvolume_reserve_metadata(struct btrfs_root *root,
> struct btrfs_block_rsv *rsv,
> int nitems, bool use_global_rsv);
> -void btrfs_subvolume_release_metadata(struct btrfs_root *root,
> - struct btrfs_block_rsv *rsv);
> int btrfs_add_root_ref(struct btrfs_trans_handle *trans, u64 root_id,
> u64 ref_id, u64 dirid, u64 sequence,
> const struct fscrypt_str *name);
next prev parent reply other threads:[~2024-03-26 22:07 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-03-26 21:39 [PATCH 0/7] btrfs: various qg meta rsv leak fixes Boris Burkov
2024-03-26 21:39 ` [PATCH 1/7] btrfs: correctly model root qgroup rsv in convert Boris Burkov
2024-03-26 22:00 ` Qu Wenruo
2024-03-27 17:20 ` Boris Burkov
2024-03-27 19:35 ` Qu Wenruo
2024-03-26 21:39 ` [PATCH 2/7] btrfs: fix qgroup prealloc rsv leak in subvolume operations Boris Burkov
2024-03-26 22:07 ` Qu Wenruo [this message]
2024-03-26 21:39 ` [PATCH 3/7] btrfs: record delayed inode root in transaction Boris Burkov
2024-03-26 22:08 ` Qu Wenruo
2024-03-27 17:21 ` Boris Burkov
2024-03-26 21:39 ` [PATCH 4/7] btrfs: convert PREALLOC to PERTRANS after record_root_in_trans Boris Burkov
2024-03-26 22:12 ` Qu Wenruo
2024-03-26 21:39 ` [PATCH 5/7] btrfs: free pertrans at end of cleanup_transaction Boris Burkov
2024-03-26 22:16 ` Qu Wenruo
2024-03-27 17:22 ` Boris Burkov
2024-03-27 19:51 ` Qu Wenruo
2024-03-26 21:39 ` [PATCH 6/7] btrfs: btrfs_clear_delalloc_extent frees rsv Boris Burkov
2024-03-26 22:26 ` Qu Wenruo
2024-03-27 17:26 ` Boris Burkov
2024-03-27 19:39 ` Qu Wenruo
2024-03-26 21:39 ` [PATCH 7/7] btrfs: always clear meta pertrans during commit Boris Burkov
2024-03-26 22:20 ` Qu Wenruo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=783373ae-12d1-4925-ae6f-413815a3cbd5@suse.com \
--to=wqu@suse.com \
--cc=boris@bur.io \
--cc=kernel-team@fb.com \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox