From: Boris Burkov <boris@bur.io>
To: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v4 14/18] btrfs: simple quota auto hierarchy for nested subvols
Date: Wed, 26 Jul 2023 13:38:41 -0700 [thread overview]
Message-ID: <23508aed837267ae8fc633f2f60e6d3402910d24.1690403768.git.boris@bur.io> (raw)
In-Reply-To: <cover.1690403768.git.boris@bur.io>
Consider the following sequence:
- enable quotas
- create subvol S id 256 at dir outer/
- create a qgroup 1/100
- add 0/256 (S's auto qgroup) to 1/100
- create subvol T id 257 at dir outer/inner/
With full qgroups, there is no relationship between 0/257 and either of
0/256 or 1/100. There is an inherit feature that the creator of inner/
can use to specify it ought to be in 1/100.
Simple quotas are targeted at container isolation, where such automatic
inheritance for not necessarily trusted/controlled nested subvol
creation would be quite helpful. Therefore, add a new default behavior
for simple quotas: when you create a nested subvol, automatically
inherit as parents any parents of the qgroup of the subvol the new inode
is going in.
In our example, 257/0 would also be under 1/100, allowing easy control
of a total quota over an arbitrary hierarchy of subvolumes.
I think this _might_ be a generally useful behavior, so it could be
interesting to put it behind a new inheritance flag that simple quotas
always use while traditional quotas let the user specify, but this is a
minimally intrusive change to start.
Signed-off-by: Boris Burkov <boris@bur.io>
---
fs/btrfs/ioctl.c | 2 +-
fs/btrfs/qgroup.c | 44 +++++++++++++++++++++++++++++++++++++++---
fs/btrfs/qgroup.h | 6 +++---
fs/btrfs/transaction.c | 13 +++++++++----
4 files changed, 54 insertions(+), 11 deletions(-)
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 9b61bc62e439..c9b069077fd0 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -652,7 +652,7 @@ static noinline int create_subvol(struct mnt_idmap *idmap,
/* Tree log can't currently deal with an inode which is a new root. */
btrfs_set_log_full_commit(trans);
- ret = btrfs_qgroup_inherit(trans, 0, objectid, inherit);
+ ret = btrfs_qgroup_inherit(trans, 0, objectid, root->root_key.objectid, inherit);
if (ret)
goto out;
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index dedc532669f4..58e9ed0deedd 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1550,8 +1550,7 @@ static int quick_update_accounting(struct btrfs_fs_info *fs_info,
return ret;
}
-int btrfs_add_qgroup_relation(struct btrfs_trans_handle *trans, u64 src,
- u64 dst)
+int btrfs_add_qgroup_relation(struct btrfs_trans_handle *trans, u64 src, u64 dst)
{
struct btrfs_fs_info *fs_info = trans->fs_info;
struct btrfs_qgroup *parent;
@@ -2991,6 +2990,40 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans)
return ret;
}
+static int qgroup_auto_inherit(struct btrfs_fs_info *fs_info,
+ u64 inode_rootid,
+ struct btrfs_qgroup_inherit **inherit)
+{
+ int i = 0;
+ u64 num_qgroups = 0;
+ struct btrfs_qgroup *inode_qg;
+ struct btrfs_qgroup_list *qg_list;
+
+ if (*inherit)
+ return -EEXIST;
+
+ inode_qg = find_qgroup_rb(fs_info, inode_rootid);
+ if (!inode_qg)
+ return -ENOENT;
+
+ num_qgroups = list_count_nodes(&inode_qg->groups);
+
+ if (!num_qgroups)
+ return 0;
+
+ *inherit = kzalloc(sizeof(**inherit) + num_qgroups * sizeof(u64), GFP_NOFS);
+ if (!*inherit)
+ return -ENOMEM;
+ (*inherit)->num_qgroups = num_qgroups;
+
+ list_for_each_entry(qg_list, &inode_qg->groups, next_group) {
+ u64 qg_id = qg_list->group->qgroupid;
+ *((u64 *)((*inherit)+1) + i) = qg_id;
+ }
+
+ return 0;
+}
+
/*
* Copy the accounting information between qgroups. This is necessary
* when a snapshot or a subvolume is created. Throwing an error will
@@ -2998,7 +3031,8 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans)
* when a readonly fs is a reasonable outcome.
*/
int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid,
- u64 objectid, struct btrfs_qgroup_inherit *inherit)
+ u64 objectid, u64 inode_rootid,
+ struct btrfs_qgroup_inherit *inherit)
{
int ret = 0;
int i;
@@ -3040,6 +3074,9 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid,
goto out;
}
+ if (!inherit && btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_SIMPLE)
+ qgroup_auto_inherit(fs_info, inode_rootid, &inherit);
+
if (inherit) {
i_qgroups = (u64 *)(inherit + 1);
nums = inherit->num_qgroups + 2 * inherit->num_ref_copies +
@@ -3066,6 +3103,7 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid,
if (ret)
goto out;
+
/*
* add qgroup to all inherited groups
*/
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index 94d85b4fbebd..ce6fa8694ca7 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -271,8 +271,7 @@ int btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info);
void btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info);
int btrfs_qgroup_wait_for_completion(struct btrfs_fs_info *fs_info,
bool interruptible);
-int btrfs_add_qgroup_relation(struct btrfs_trans_handle *trans, u64 src,
- u64 dst);
+int btrfs_add_qgroup_relation(struct btrfs_trans_handle *trans, u64 src, u64 dst);
int btrfs_del_qgroup_relation(struct btrfs_trans_handle *trans, u64 src,
u64 dst);
int btrfs_create_qgroup(struct btrfs_trans_handle *trans, u64 qgroupid);
@@ -366,7 +365,8 @@ int btrfs_qgroup_account_extent(struct btrfs_trans_handle *trans, u64 bytenr,
int btrfs_qgroup_account_extents(struct btrfs_trans_handle *trans);
int btrfs_run_qgroups(struct btrfs_trans_handle *trans);
int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid,
- u64 objectid, struct btrfs_qgroup_inherit *inherit);
+ u64 objectid, u64 inode_rootid,
+ struct btrfs_qgroup_inherit *inherit);
void btrfs_qgroup_free_refroot(struct btrfs_fs_info *fs_info,
u64 ref_root, u64 num_bytes,
enum btrfs_qgroup_rsv_type type);
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 25217888e897..fb857147df57 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1529,13 +1529,14 @@ static int qgroup_account_snapshot(struct btrfs_trans_handle *trans,
int ret;
/*
- * Save some performance in the case that full qgroups are not
+ * Save some performance in the case that qgroups are not
* enabled. If this check races with the ioctl, rescan will
* kick in anyway.
*/
if (btrfs_qgroup_mode(fs_info) != BTRFS_QGROUP_MODE_FULL)
return 0;
+
/*
* Ensure dirty @src will be committed. Or, after coming
* commit_fs_roots() and switch_commit_roots(), any dirty but not
@@ -1572,7 +1573,7 @@ static int qgroup_account_snapshot(struct btrfs_trans_handle *trans,
/* Now qgroup are all updated, we can inherit it to new qgroups */
ret = btrfs_qgroup_inherit(trans, src->root_key.objectid, dst_objectid,
- inherit);
+ parent->root_key.objectid, inherit);
if (ret < 0)
goto out;
@@ -1839,8 +1840,12 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
* To co-operate with that hack, we do hack again.
* Or snapshot will be greatly slowed down by a subtree qgroup rescan
*/
- ret = qgroup_account_snapshot(trans, root, parent_root,
- pending->inherit, objectid);
+ if (btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_FULL)
+ ret = qgroup_account_snapshot(trans, root, parent_root,
+ pending->inherit, objectid);
+ else if (btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_SIMPLE)
+ ret = btrfs_qgroup_inherit(trans, root->root_key.objectid, objectid,
+ parent_root->root_key.objectid, pending->inherit);
if (ret < 0)
goto fail;
--
2.41.0
next prev parent reply other threads:[~2023-07-26 20:41 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-07-26 20:38 [PATCH v4 00/18] btrfs: simple quotas Boris Burkov
2023-07-26 20:38 ` [PATCH v4 01/18] btrfs: introduce quota mode Boris Burkov
2023-07-26 20:38 ` [PATCH v4 02/18] btrfs: add new quota mode for simple quotas Boris Burkov
2023-07-26 20:38 ` [PATCH v4 03/18] btrfs: expose quota mode via sysfs Boris Burkov
2023-07-26 20:38 ` [PATCH v4 04/18] btrfs: add simple_quota incompat feature to sysfs Boris Burkov
2023-07-26 20:38 ` [PATCH v4 05/18] btrfs: flush reservations during quota disable Boris Burkov
2023-07-26 20:38 ` [PATCH v4 06/18] btrfs: create qgroup earlier in snapshot creation Boris Burkov
2023-07-26 20:38 ` [PATCH v4 07/18] btrfs: function for recording simple quota deltas Boris Burkov
2023-07-26 20:38 ` [PATCH v4 08/18] btrfs: rename tree_ref and data_ref owning_root Boris Burkov
2023-07-26 20:38 ` [PATCH v4 09/18] btrfs: track owning root in btrfs_ref Boris Burkov
2023-07-26 20:38 ` [PATCH v4 10/18] btrfs: track original extent owner in head_ref Boris Burkov
2023-07-26 20:38 ` [PATCH v4 11/18] btrfs: new inline ref storing owning subvol of data extents Boris Burkov
2023-07-26 20:38 ` [PATCH v4 12/18] btrfs: inline owner ref lookup helper Boris Burkov
2023-07-26 20:38 ` [PATCH v4 13/18] btrfs: record simple quota deltas Boris Burkov
2023-07-26 20:38 ` Boris Burkov [this message]
2023-07-26 20:38 ` [PATCH v4 15/18] btrfs: check generation when recording simple quota delta Boris Burkov
2023-07-26 20:38 ` [PATCH v4 16/18] btrfs: track metadata relocation cow with simple quota Boris Burkov
2023-07-26 20:38 ` [PATCH v4 17/18] btrfs: track data relocation " Boris Burkov
2023-07-26 20:38 ` [PATCH v4 18/18] btrfs: only set QUOTA_ENABLED when done reading qgroups Boris Burkov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=23508aed837267ae8fc633f2f60e6d3402910d24.1690403768.git.boris@bur.io \
--to=boris@bur.io \
--cc=kernel-team@fb.com \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).