From: Boris Burkov <boris@bur.io>
To: Qu Wenruo <quwenruo.btrfs@gmx.com>
Cc: Qu Wenruo <wqu@suse.com>, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH v2 1/2] btrfs: slightly loose the requirement for qgroup removal
Date: Mon, 29 Apr 2024 15:19:56 -0700 [thread overview]
Message-ID: <20240429221956.GA36465@zen.localdomain> (raw)
In-Reply-To: <d817e2ba-799c-4c88-b5b6-98b8e7233687@gmx.com>
On Tue, Apr 30, 2024 at 07:30:58AM +0930, Qu Wenruo wrote:
>
>
> 在 2024/4/29 22:17, Boris Burkov 写道:
> > On Fri, Apr 19, 2024 at 07:16:52PM +0930, Qu Wenruo wrote:
> > > [BUG]
> > > Currently if one is utilizing "qgroups/drop_subtree_threshold" sysfs,
> > > and a snapshot with level higher than that value is dropped, btrfs will
> > > not be able to delete the qgroup until next qgroup rescan:
> > >
> > > uuid=ffffffff-eeee-dddd-cccc-000000000000
> > >
> > > wipefs -fa $dev
> > > mkfs.btrfs -f $dev -O quota -s 4k -n 4k -U $uuid
> > > mount $dev $mnt
> > >
> > > btrfs subvolume create $mnt/subv1/
> > > for (( i = 0; i < 1024; i++ )); do
> > > xfs_io -f -c "pwrite 0 2k" $mnt/subv1/file_$i > /dev/null
> > > done
> > > sync
> > > btrfs subv snapshot $mnt/subv1 $mnt/snapshot
> > > btrfs quota enable $mnt
> > > btrfs quota rescan -w $mnt
> > > sync
> > > echo 1 > /sys/fs/btrfs/$uuid/qgroups/drop_subtree_threshold
> > > btrfs subvolume delete $mnt/snapshot
> > > btrfs subv sync $mnt
> > > btrfs qgroup show -prce --sync $mnt
> > > btrfs qgroup destroy 0/257 $mnt
> > > umount $mnt
> > >
> > > The final qgroup removal would fail with the following error:
> > >
> > > ERROR: unable to destroy quota group: Device or resource busy
> > >
> > > [CAUSE]
> > > The above script would generate a subvolume of level 2, then snapshot
> > > it, enable qgroup, set the drop_subtree_threshold, then drop the
> > > snapshot.
> > >
> > > Since the subvolume drop would meet the threshold, qgroup would be
> > > marked inconsistent and skip accounting to avoid hanging the system at
> > > transaction commit.
> > >
> > > But currently we do not allow a qgroup with any rfer/excl numbers to be
> > > dropped, and this is not really compatible with the new
> > > drop_subtree_threshold behavior.
> > >
> > > [FIX]
> > > Instead of a strong requirement for zero rfer/excl numbers, just check
> > > if there is any child for higher level qgroup, and for subvolume qgroups
> > > check if there is a corresponding subvolume for it.
> > >
> > > For rsv values, do extra warnings, and for rfer/excl numbers, only do the
> > > warning if we're in full accounting mode and the qgroup is consistent.
> > >
> > > Signed-off-by: Qu Wenruo <wqu@suse.com>
> > > ---
> > > fs/btrfs/qgroup.c | 69 ++++++++++++++++++++++++++++++++++++++++++-----
> > > 1 file changed, 62 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> > > index 9a9f84c4d3b8..2ea16a07a7d4 100644
> > > --- a/fs/btrfs/qgroup.c
> > > +++ b/fs/btrfs/qgroup.c
> > > @@ -1736,13 +1736,38 @@ int btrfs_create_qgroup(struct btrfs_trans_handle *trans, u64 qgroupid)
> > > return ret;
> > > }
> > >
> > > -static bool qgroup_has_usage(struct btrfs_qgroup *qgroup)
> > > +static bool can_delete_qgroup(struct btrfs_fs_info *fs_info,
> > > + struct btrfs_qgroup *qgroup)
> > > {
> > > - return (qgroup->rfer > 0 || qgroup->rfer_cmpr > 0 ||
> > > - qgroup->excl > 0 || qgroup->excl_cmpr > 0 ||
> > > - qgroup->rsv.values[BTRFS_QGROUP_RSV_DATA] > 0 ||
> > > - qgroup->rsv.values[BTRFS_QGROUP_RSV_META_PREALLOC] > 0 ||
> > > - qgroup->rsv.values[BTRFS_QGROUP_RSV_META_PERTRANS] > 0);
> > > + struct btrfs_key key;
> > > + struct btrfs_path *path;
> > > + int ret;
> > > +
> > > + /* For higher level qgroup, we can only delete it if it has no child. */
> > > + if (btrfs_qgroup_level(qgroup->qgroupid)) {
> > > + if (!list_empty(&qgroup->members))
> > > + return false;
> > > + return true;
> > > + }
> > > +
> > > + /*
> > > + * For level-0 qgroups, we can only delete it if it has no subvolume
> > > + * for it.
> > > + * This means even a subvolume is unlinked but not yet fully dropped,
> > > + * we can not delete the qgroup.
> > > + */
> >
> > This was what I originally considered for normal qgroups before observing
> > that usage is 0 anyway. I didn't know about the drop tree threshold,
> > my mistake.
> >
> > With that said, I support this change for non-squota qgroups.
> >
> > For squota, I think this behavior would be OK, but undesirable, IMO. Any
> > parent qgroup would still have its usage incremented against the limit,
> > but it would be impossible to find any information on where it was from.
>
> That's indeed another problem.
>
> For regular qgroup that could only happen when qgroup is inconsistent,
> and we're fine to drop the qgroup without updating the parent.
>
> But for squota, there is no inconsistent state, meaning we should also
> drop all the numbers from parent too.
>
> >
> > How do you feel about making this patch add the new logic and make it
> > conditional on qgroup mode?
>
> I'm totally fine to do it conditional, although I'd also like to get
> feedback on dropping the numbers from parent qgroup (so we can do it for
> both qgroup and squota).
>
> Would the auto drop for parent numbers be a solution?
It's a good question. It never occurred to me until this discussion
today.
Thinking out loud, squotas as a feature is already "comfortable" with
unaccounted space. If you enable it on a live FS, all the extents that
already exist are unaccounted, so there is no objection on that front.
I think from a container isolation perspective, the current behavior
makes more sense than auto dropping from the parent on subvol delete to
allow cleaning up the qgroup.
Suppose we create a container wrapping parent qgroup with ID 1/100. To
enforce a limit on the container customer, we set some limit on 1/100.
Then we create a container subvol and put 0/svid0 into 1/100. The
customer gets to do stuff in the subvol. They are a fancy customer and
create a subvol svid1, snapshot it svid2, and delete svid1.
svid1 and svid2 are created in the subvol of id svid0, so auto-inheritance
ensured that 1/100 was the parent of 0/svid1 and 0/svid2, but now
deleting svid1 frees all that customer usage from 1/100 and allows the
customer to escape the limit. This is obviously quite undesirable, and
wouldn't require a particularly malicious customer to hit.
Boris
>
> Thanks,
> Qu
> >
> > Thanks,
> > Boris
> >
> > > + key.objectid = qgroup->qgroupid;
> > > + key.type = BTRFS_ROOT_ITEM_KEY;
> > > + key.offset = -1ULL;
> > > + path = btrfs_alloc_path();
> > > + if (!path)
> > > + return false;
> > > +
> > > + ret = btrfs_find_root(fs_info->tree_root, &key, path, NULL, NULL);
> > > + btrfs_free_path(path);
> > > + if (ret > 0)
> > > + return true;
> > > + return false;
> > > }
> > >
> > > int btrfs_remove_qgroup(struct btrfs_trans_handle *trans, u64 qgroupid)
> > > @@ -1764,7 +1789,7 @@ int btrfs_remove_qgroup(struct btrfs_trans_handle *trans, u64 qgroupid)
> > > goto out;
> > > }
> > >
> > > - if (is_fstree(qgroupid) && qgroup_has_usage(qgroup)) {
> > > + if (!can_delete_qgroup(fs_info, qgroup)) {
> > > ret = -EBUSY;
> > > goto out;
> > > }
> > > @@ -1789,6 +1814,36 @@ int btrfs_remove_qgroup(struct btrfs_trans_handle *trans, u64 qgroupid)
> > > }
> > >
> > > spin_lock(&fs_info->qgroup_lock);
> > > + /*
> > > + * Warn on reserved space. The subvolume should has no child nor
> > > + * corresponding subvolume.
> > > + * Thus its reserved space should all be zero, no matter if qgroup
> > > + * is consistent or the mode.
> > > + */
> > > + WARN_ON(qgroup->rsv.values[BTRFS_QGROUP_RSV_DATA] ||
> > > + qgroup->rsv.values[BTRFS_QGROUP_RSV_META_PREALLOC] ||
> > > + qgroup->rsv.values[BTRFS_QGROUP_RSV_META_PERTRANS]);
> > > + /*
> > > + * The same for rfer/excl numbers, but that's only if our qgroup is
> > > + * consistent and if it's in regular qgroup mode.
> > > + * For simple mode it's not as accurate thus we can hit non-zero values
> > > + * very frequently.
> > > + */
> > > + if (btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_FULL &&
> > > + !(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT)) {
> > > + if (WARN_ON(qgroup->rfer || qgroup->excl ||
> > > + qgroup->rfer_cmpr || qgroup->excl_cmpr)) {
> > > + btrfs_warn_rl(fs_info,
> > > +"to be deleted qgroup %u/%llu has non-zero numbers, rfer %llu rfer_cmpr %llu excl %llu excl_cmpr %llu",
> > > + btrfs_qgroup_level(qgroup->qgroupid),
> > > + btrfs_qgroup_subvolid(qgroup->qgroupid),
> > > + qgroup->rfer,
> > > + qgroup->rfer_cmpr,
> > > + qgroup->excl,
> > > + qgroup->excl_cmpr);
> > > + qgroup_mark_inconsistent(fs_info);
> > > + }
> > > + }
> >
> > If you go ahead with making it conditional, I would fold this warning
> > into the check logic. Either way is fine, of course.
> >
> > > del_qgroup_rb(fs_info, qgroupid);
> > > spin_unlock(&fs_info->qgroup_lock);
> > >
> > > --
> > > 2.44.0
> > >
> >
next prev parent reply other threads:[~2024-04-29 22:17 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-04-19 9:46 [PATCH v2 0/2] btrfs: qgroup: stale qgroups related impromvents Qu Wenruo
2024-04-19 9:46 ` [PATCH v2 1/2] btrfs: slightly loose the requirement for qgroup removal Qu Wenruo
2024-04-29 12:47 ` Boris Burkov
2024-04-29 22:00 ` Qu Wenruo
2024-04-29 22:19 ` Boris Burkov [this message]
2024-04-29 22:29 ` Qu Wenruo
2024-04-29 22:38 ` Boris Burkov
2024-04-19 9:46 ` [PATCH v2 2/2] btrfs: automatically remove the subvolume qgroup Qu Wenruo
2024-04-24 12:41 ` David Sterba
2024-04-24 22:19 ` Qu Wenruo
2024-04-25 12:34 ` David Sterba
2024-04-25 21:51 ` Qu Wenruo
2024-04-29 13:13 ` Boris Burkov
2024-04-29 16:31 ` David Sterba
2024-04-29 22:05 ` Qu Wenruo
2024-04-30 10:59 ` David Sterba
2024-04-30 22:05 ` Qu Wenruo
2024-04-30 22:18 ` Boris Burkov
2024-04-30 22:27 ` Qu Wenruo
2024-05-02 15:03 ` David Sterba
2024-05-02 21:29 ` Qu Wenruo
2024-05-03 12:46 ` David Sterba
2024-05-03 22:14 ` Qu Wenruo
2024-05-02 15:00 ` David Sterba
2024-05-02 21:27 ` Qu Wenruo
2024-04-29 22:49 ` Qu Wenruo
2024-04-29 16:36 ` David Sterba
2024-04-29 12:57 ` Boris Burkov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240429221956.GA36465@zen.localdomain \
--to=boris@bur.io \
--cc=linux-btrfs@vger.kernel.org \
--cc=quwenruo.btrfs@gmx.com \
--cc=wqu@suse.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox