From: Boris Burkov <boris@bur.io>
To: Qu Wenruo <quwenruo.btrfs@gmx.com>
Cc: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH 6/9] btrfs: auto hierarchy for simple qgroups of nested subvols
Date: Thu, 4 May 2023 09:34:13 -0700 [thread overview]
Message-ID: <ZFPehaADwnSseFiG@devvm9258.prn0.facebook.com> (raw)
In-Reply-To: <f488fc4e-3a5b-c3f6-1958-25f91c9d378e@gmx.com>
On Wed, May 03, 2023 at 12:47:40PM +0800, Qu Wenruo wrote:
>
>
> On 2023/5/3 08:59, Boris Burkov wrote:
> > Consider the following sequence:
> > - enable quotas
> > - create subvol S id 256 at dir outer/
> > - create a qgroup 1/100
> > - add 0/256 (S's auto qgroup) to 1/100
> > - create subvol T id 257 at dir outer/inner/
> >
> > With full qgroups, there is no relationship between 0/257 and either of
> > 0/256 or 1/100. There is an inherit feature that the creator of inner/
> > can use to specify it ought to be in 1/100.
> >
> > Simple quotas are targeted at container isolation, where such automatic
> > inheritance for not necessarily trusted/controlled nested subvol
> > creation would be quite helpful. Therefore, add a new default behavior
> > for simple quotas: when you create a nested subvol, automatically
> > inherit as parents any parents of the qgroup of the subvol the new inode
> > is going in.
> >
> > In our example, 257/0 would also be under 1/100, allowing easy control
> > of a total quota over an arbitrary hierarchy of subvolumes.
> >
> > I think this _might_ be a generally useful behavior, so it could be
> > interesting to put it behind a new inheritance flag that simple quotas
> > always use while traditional quotas let the user specify, but this is a
> > minimally intrusive change to start.
>
> I'm not sure if the automatically qgroup inherit is a good idea.
>
> Consider the following case, a subvolume is created under another subvolume,
> then it got inherited into qgroup 1/0.
I am not sure I follow this concern. We automatically inherit
relationships of the inode containing subvolume, not its qgroup.
So if X/0 is in a different qgroup Q/1 and subvol Y is created with its
inode in X, Y/0 will be in Q/1, not X/0. I don't believe we would ever
introduce a dependency that would violate the level invariants.
>
> But don't forget that a subvolume can be easily moved to other directory,
> should we also change which qgroup it belongs to?
I think this is a great point and I haven't spent enough time thinking
about it. Currently, the answer is no, but that might be surprising to a
user.
I imagine it gets even worse if you mount the same subvolume multiple times..
>
>
> Another point is, if a container manager tool wants to do the same inherit
> behavior, it's not that hard to query the owner group of the parent
> directory, then pass "-i" option for "btrfs subvolume create" or "btrfs
> subvolume snapshot" commands.
>
> As the old saying goes, kernel should only provide a mechanism, not a
> policy. To me, automatically inherit qgroup owner looks more like a policy.
This is targeted at the case where the container management tool does
not own creating a subvol.
If we did it your way, then the second something running in a container
creates a subvol (a reasonable operation, for faster bulk deletes, e.g.),
they would trivially "escape" their disk limit. At that point you would
have to do convoluted things like LD_PRELOAD or convincing all users to
use a subvol creation util/lib (might be running some code we can't or
don't want to modify too..)
If we can do it automatically in the kernel, there really is value
there.
IMO, a useful analogy here is cgroups. When you put a process in a
cgroup, all its child processes automatically get rolled up into the
same cgroup, which is important for the same reasons as what I described
above. You can then do weird stuff and move procs to a different cgroup,
but by default everything is neatly nested, which I think is a good
behavior for simple quotas too.
If the functionality is not totally unpalatable, but it's too confusing
to make it default for only simple quotas, I can add an ioctl or subvol
creation flag or something that opts a subvol in to having its nested
subvols inherit its parent qgroups. That would be something the
container runtime could use to opt in.
Thanks for the review,
Boris
>
> Thanks,
> Qu
>
> >
> > Signed-off-by: Boris Burkov <boris@bur.io>
> > ---
> > fs/btrfs/ioctl.c | 2 +-
> > fs/btrfs/qgroup.c | 46 +++++++++++++++++++++++++++++++++++++++---
> > fs/btrfs/qgroup.h | 6 +++---
> > fs/btrfs/transaction.c | 2 +-
> > 4 files changed, 48 insertions(+), 8 deletions(-)
> >
> > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> > index ca7d2ef739c8..4d6d28feb5c6 100644
> > --- a/fs/btrfs/ioctl.c
> > +++ b/fs/btrfs/ioctl.c
> > @@ -652,7 +652,7 @@ static noinline int create_subvol(struct mnt_idmap *idmap,
> > /* Tree log can't currently deal with an inode which is a new root. */
> > btrfs_set_log_full_commit(trans);
> > - ret = btrfs_qgroup_inherit(trans, 0, objectid, inherit);
> > + ret = btrfs_qgroup_inherit(trans, 0, objectid, root->root_key.objectid, inherit);
> > if (ret)
> > goto out;
> > diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> > index e3d0630fef0c..6816e01f00b5 100644
> > --- a/fs/btrfs/qgroup.c
> > +++ b/fs/btrfs/qgroup.c
> > @@ -1504,8 +1504,7 @@ static int quick_update_accounting(struct btrfs_fs_info *fs_info,
> > return ret;
> > }
> > -int btrfs_add_qgroup_relation(struct btrfs_trans_handle *trans, u64 src,
> > - u64 dst)
> > +int btrfs_add_qgroup_relation(struct btrfs_trans_handle *trans, u64 src, u64 dst)
> > {
> > struct btrfs_fs_info *fs_info = trans->fs_info;
> > struct btrfs_qgroup *parent;
> > @@ -2945,6 +2944,42 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans)
> > return ret;
> > }
> > +static int qgroup_auto_inherit(struct btrfs_fs_info *fs_info,
> > + u64 inode_rootid,
> > + struct btrfs_qgroup_inherit **inherit)
> > +{
> > + int i = 0;
> > + u64 num_qgroups = 0;
> > + struct btrfs_qgroup *inode_qg;
> > + struct btrfs_qgroup_list *qg_list;
> > +
> > + if (*inherit)
> > + return -EEXIST;
> > +
> > + inode_qg = find_qgroup_rb(fs_info, inode_rootid);
> > + if (!inode_qg)
> > + return -ENOENT;
> > +
> > + list_for_each_entry(qg_list, &inode_qg->groups, next_group) {
> > + ++num_qgroups;
> > + }
> > +
> > + if (!num_qgroups)
> > + return 0;
> > +
> > + *inherit = kzalloc(sizeof(**inherit) + num_qgroups * sizeof(u64), GFP_NOFS);
> > + if (!*inherit)
> > + return -ENOMEM;
> > + (*inherit)->num_qgroups = num_qgroups;
> > +
> > + list_for_each_entry(qg_list, &inode_qg->groups, next_group) {
> > + u64 qg_id = qg_list->group->qgroupid;
> > + *((u64 *)((*inherit)+1) + i) = qg_id;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > /*
> > * Copy the accounting information between qgroups. This is necessary
> > * when a snapshot or a subvolume is created. Throwing an error will
> > @@ -2952,7 +2987,8 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans)
> > * when a readonly fs is a reasonable outcome.
> > */
> > int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid,
> > - u64 objectid, struct btrfs_qgroup_inherit *inherit)
> > + u64 objectid, u64 inode_rootid,
> > + struct btrfs_qgroup_inherit *inherit)
> > {
> > int ret = 0;
> > int i;
> > @@ -2994,6 +3030,9 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid,
> > goto out;
> > }
> > + if (!inherit && btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_SIMPLE)
> > + qgroup_auto_inherit(fs_info, inode_rootid, &inherit);
> > +
> > if (inherit) {
> > i_qgroups = (u64 *)(inherit + 1);
> > nums = inherit->num_qgroups + 2 * inherit->num_ref_copies +
> > @@ -3020,6 +3059,7 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid,
> > if (ret)
> > goto out;
> > +
> > /*
> > * add qgroup to all inherited groups
> > */
> > diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
> > index b300998dcbc7..aecebe9d0d62 100644
> > --- a/fs/btrfs/qgroup.h
> > +++ b/fs/btrfs/qgroup.h
> > @@ -272,8 +272,7 @@ int btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info);
> > void btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info);
> > int btrfs_qgroup_wait_for_completion(struct btrfs_fs_info *fs_info,
> > bool interruptible);
> > -int btrfs_add_qgroup_relation(struct btrfs_trans_handle *trans, u64 src,
> > - u64 dst);
> > +int btrfs_add_qgroup_relation(struct btrfs_trans_handle *trans, u64 src, u64 dst);
> > int btrfs_del_qgroup_relation(struct btrfs_trans_handle *trans, u64 src,
> > u64 dst);
> > int btrfs_create_qgroup(struct btrfs_trans_handle *trans, u64 qgroupid);
> > @@ -367,7 +366,8 @@ int btrfs_qgroup_account_extent(struct btrfs_trans_handle *trans, u64 bytenr,
> > int btrfs_qgroup_account_extents(struct btrfs_trans_handle *trans);
> > int btrfs_run_qgroups(struct btrfs_trans_handle *trans);
> > int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid,
> > - u64 objectid, struct btrfs_qgroup_inherit *inherit);
> > + u64 objectid, u64 inode_rootid,
> > + struct btrfs_qgroup_inherit *inherit);
> > void btrfs_qgroup_free_refroot(struct btrfs_fs_info *fs_info,
> > u64 ref_root, u64 num_bytes,
> > enum btrfs_qgroup_rsv_type type);
> > diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> > index 0edfb58afd80..6befcf1b4b1f 100644
> > --- a/fs/btrfs/transaction.c
> > +++ b/fs/btrfs/transaction.c
> > @@ -1557,7 +1557,7 @@ static int qgroup_account_snapshot(struct btrfs_trans_handle *trans,
> > /* Now qgroup are all updated, we can inherit it to new qgroups */
> > ret = btrfs_qgroup_inherit(trans, src->root_key.objectid, dst_objectid,
> > - inherit);
> > + parent->root_key.objectid, inherit);
> > if (ret < 0)
> > goto out;
>
next prev parent reply other threads:[~2023-05-04 16:34 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-05-03 0:59 [PATCH RFC 0/9] btrfs: simple quotas Boris Burkov
2023-05-03 0:59 ` [PATCH 1/9] btrfs: simple quotas mode Boris Burkov
2023-05-03 2:38 ` Qu Wenruo
2023-05-03 0:59 ` [PATCH 2/9] btrfs: new function for recording simple quota usage Boris Burkov
2023-05-03 0:59 ` [PATCH 3/9] btrfs: track original extent subvol in a new inline ref Boris Burkov
2023-05-03 3:17 ` Qu Wenruo
2023-05-04 16:17 ` Boris Burkov
2023-05-04 21:49 ` Qu Wenruo
2023-05-09 23:58 ` David Sterba
2023-05-10 16:57 ` David Sterba
2023-05-11 0:07 ` Qu Wenruo
2023-05-13 6:31 ` Qu Wenruo
2023-05-03 0:59 ` [PATCH 4/9] btrfs: track metadata owning root in delayed refs Boris Burkov
2023-05-03 0:59 ` [PATCH 5/9] btrfs: record simple quota deltas Boris Burkov
2023-05-03 0:59 ` [PATCH 6/9] btrfs: auto hierarchy for simple qgroups of nested subvols Boris Burkov
2023-05-03 4:47 ` Qu Wenruo
2023-05-04 16:34 ` Boris Burkov [this message]
2023-05-04 21:55 ` Qu Wenruo
2023-05-03 0:59 ` [PATCH 7/9] btrfs: check generation when recording simple quota delta Boris Burkov
2023-05-03 0:59 ` [PATCH 8/9] btrfs: expose the qgroup mode via sysfs Boris Burkov
2023-05-03 0:59 ` [PATCH 9/9] btrfs: free qgroup rsv on io failure Boris Burkov
2023-05-05 4:13 ` [PATCH RFC 0/9] btrfs: simple quotas Anand Jain
2023-05-10 1:09 ` Boris Burkov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZFPehaADwnSseFiG@devvm9258.prn0.facebook.com \
--to=boris@bur.io \
--cc=kernel-team@fb.com \
--cc=linux-btrfs@vger.kernel.org \
--cc=quwenruo.btrfs@gmx.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox