From: Boris Burkov <boris@bur.io>
To: David Sterba <dsterba@suse.cz>
Cc: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH v5 15/18] btrfs: check generation when recording simple quota delta
Date: Fri, 8 Sep 2023 14:41:46 -0700 [thread overview]
Message-ID: <20230908214146.GA172348@zen> (raw)
In-Reply-To: <20230907122449.GL3159@twin.jikos.cz>
On Thu, Sep 07, 2023 at 02:24:49PM +0200, David Sterba wrote:
> On Thu, Jul 27, 2023 at 03:13:02PM -0700, Boris Burkov wrote:
> > Simple quotas count extents only from the moment the feature is enabled.
> > Therefore, if we do something like:
> > 1. create subvol S
> > 2. write F in S
> > 3. enable quotas
> > 4. remove F
> > 5. write G in S
> >
> > then after 3. and 4. we would expect the simple quota usage of S to be 0
> > (putting aside some metadata extents that might be written) and after
> > 5., it should be the size of G plus metadata. Therefore, we need to be
> > able to determine whether a particular quota delta we are processing
> > predates simple quota enablement.
> >
> > To do this, store the transaction id when quotas were enabled. In
> > fs_info for immediate use and in the quota status item to make it
> > recoverable on mount. When we see a delta, check if the generation of
> > the extent item is less than that of quota enablement. If so, we should
> > ignore the delta from this extent.
> >
> > Signed-off-by: Boris Burkov <boris@bur.io>
> > ---
> > fs/btrfs/accessors.h | 2 ++
> > fs/btrfs/extent-tree.c | 4 ++++
> > fs/btrfs/fs.h | 2 ++
> > fs/btrfs/qgroup.c | 14 ++++++++++++--
> > fs/btrfs/qgroup.h | 1 +
> > include/uapi/linux/btrfs_tree.h | 7 +++++++
> > 6 files changed, 28 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/btrfs/accessors.h b/fs/btrfs/accessors.h
> > index a23045c05937..513f8edbd98e 100644
> > --- a/fs/btrfs/accessors.h
> > +++ b/fs/btrfs/accessors.h
> > @@ -970,6 +970,8 @@ BTRFS_SETGET_FUNCS(qgroup_status_flags, struct btrfs_qgroup_status_item,
> > flags, 64);
> > BTRFS_SETGET_FUNCS(qgroup_status_rescan, struct btrfs_qgroup_status_item,
> > rescan, 64);
> > +BTRFS_SETGET_FUNCS(qgroup_status_enable_gen, struct btrfs_qgroup_status_item,
> > + enable_gen, 64);
> >
> > /* btrfs_qgroup_info_item */
> > BTRFS_SETGET_FUNCS(qgroup_info_generation, struct btrfs_qgroup_info_item,
> > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> > index 1b5efd03ef83..395ab46e520b 100644
> > --- a/fs/btrfs/extent-tree.c
> > +++ b/fs/btrfs/extent-tree.c
> > @@ -1513,6 +1513,7 @@ static int run_delayed_data_ref(struct btrfs_trans_handle *trans,
> > .rsv_bytes = href->reserved_bytes,
> > .is_data = true,
> > .is_inc = true,
> > + .generation = trans->transid,
> > };
> >
> > if (extent_op)
> > @@ -1676,6 +1677,7 @@ static int run_delayed_tree_ref(struct btrfs_trans_handle *trans,
> > .rsv_bytes = 0,
> > .is_data = false,
> > .is_inc = true,
> > + .generation = trans->transid,
> > };
> >
> > BUG_ON(!extent_op || !extent_op->update_flags);
> > @@ -3217,6 +3219,7 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
> > .rsv_bytes = 0,
> > .is_data = is_data,
> > .is_inc = false,
> > + .generation = btrfs_extent_generation(leaf, ei),
> > };
> >
> > /* In this branch refs == 1 */
> > @@ -4850,6 +4853,7 @@ int btrfs_alloc_logged_file_extent(struct btrfs_trans_handle *trans,
> > struct btrfs_simple_quota_delta delta = {
> > .root = root_objectid,
> > .num_bytes = ins->offset,
> > + .generation = trans->transid,
> > .rsv_bytes = 0,
> > .is_data = true,
> > .is_inc = true,
> > diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
> > index f76f450c2abf..da7b623ff15f 100644
> > --- a/fs/btrfs/fs.h
> > +++ b/fs/btrfs/fs.h
> > @@ -802,6 +802,8 @@ struct btrfs_fs_info {
> > spinlock_t eb_leak_lock;
> > struct list_head allocated_ebs;
> > #endif
> > +
> > + u64 quota_enable_gen;
>
> Please move it to the other quota/qgroup related members, at the end of
> fs_info there's only debugging stuff.
>
> > };
> >
> > static inline void btrfs_set_last_root_drop_gen(struct btrfs_fs_info *fs_info,
> > diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> > index 58e9ed0deedd..a8a603242431 100644
> > --- a/fs/btrfs/qgroup.c
> > +++ b/fs/btrfs/qgroup.c
> > @@ -454,6 +454,8 @@ int btrfs_read_qgroup_config(struct btrfs_fs_info *fs_info)
> > }
> > fs_info->qgroup_flags = btrfs_qgroup_status_flags(l, ptr);
> > simple = fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_SIMPLE;
> > + if (simple)
> > + fs_info->quota_enable_gen = btrfs_qgroup_status_enable_gen(l, ptr);
> > if (btrfs_qgroup_status_generation(l, ptr) !=
> > fs_info->generation && !simple) {
> > qgroup_mark_inconsistent(fs_info);
> > @@ -1107,10 +1109,12 @@ int btrfs_quota_enable(struct btrfs_fs_info *fs_info,
> > btrfs_set_qgroup_status_generation(leaf, ptr, trans->transid);
> > btrfs_set_qgroup_status_version(leaf, ptr, BTRFS_QGROUP_STATUS_VERSION);
> > fs_info->qgroup_flags = BTRFS_QGROUP_STATUS_FLAG_ON;
> > - if (simple)
> > + if (simple) {
> > fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_SIMPLE;
> > - else
> > + btrfs_set_qgroup_status_enable_gen(leaf, ptr, trans->transid);
> > + } else {
> > fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;
> > + }
> > btrfs_set_qgroup_status_flags(leaf, ptr, fs_info->qgroup_flags &
> > BTRFS_QGROUP_STATUS_FLAGS_MASK);
> > btrfs_set_qgroup_status_rescan(leaf, ptr, 0);
> > @@ -1202,6 +1206,8 @@ int btrfs_quota_enable(struct btrfs_fs_info *fs_info,
> > goto out_free_path;
> > }
> >
> > + fs_info->quota_enable_gen = trans->transid;
> > +
> > mutex_unlock(&fs_info->qgroup_ioctl_lock);
> > /*
> > * Commit the transaction while not holding qgroup_ioctl_lock, to avoid
> > @@ -4622,6 +4628,10 @@ int btrfs_record_simple_quota_delta(struct btrfs_fs_info *fs_info,
> > if (!is_fstree(root))
> > return 0;
> >
> > + /* If the extent predates enabling quotas, don't count it. */
> > + if (delta->generation < fs_info->quota_enable_gen)
> > + return 0;
> > +
> > spin_lock(&fs_info->qgroup_lock);
> > qgroup = find_qgroup_rb(fs_info, root);
> > if (!qgroup) {
> > diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
> > index ce6fa8694ca7..ae1ce14b365c 100644
> > --- a/fs/btrfs/qgroup.h
> > +++ b/fs/btrfs/qgroup.h
> > @@ -241,6 +241,7 @@ struct btrfs_simple_quota_delta {
> > u64 rsv_bytes; /* The number of bytes reserved for this extent */
> > bool is_inc; /* Whether we are using or freeing the extent */
> > bool is_data; /* Whether the extent is data or metadata */
> > + u64 generation; /* The generation the extent was created in */
>
> Please reorder it so it does not leave gaps between struct members.
>
> > };
> >
> > static inline u64 btrfs_qgroup_subvolid(u64 qgroupid)
> > diff --git a/include/uapi/linux/btrfs_tree.h b/include/uapi/linux/btrfs_tree.h
> > index eacb26caf3c6..1120ce3dae42 100644
> > --- a/include/uapi/linux/btrfs_tree.h
> > +++ b/include/uapi/linux/btrfs_tree.h
> > @@ -1242,6 +1242,13 @@ struct btrfs_qgroup_status_item {
> > * of the scan. It contains a logical address
> > */
> > __le64 rescan;
> > +
> > + /*
> > + * the generation when quotas are enabled. Used by simple quotas to
> > + * avoid decrementing when freeing an extent that was written before
> > + * enable.
> > + */
> > + __le64 enable_gen;
>
> This is public interface and btrfs_qgroup_status_item is used in many
> places in user space at least in btrfs-progs. This needs a lot of
> sanity checks.
Totally agreed in principle, but not exactly sure how to proceed in
practice. I would definitely appreciate some tips/help!
How we interact with the new field:
- When enabling squota, set it, the incompat bit, and the status flag
- When reading in the qgroup status_item, if the status flag is set,
then read the enable_gen.
I believe this prevents us from ever reading garbage while trying to
read an old fs (status flag won't be set) and it prevents any
btrfs-progs from getting confused by a wrong-sized status item, since
it would choke on the incompat bit first.
Am I missing some other case? I can try to make it more explicitly
zeroed when we enable qgroups but not squotas? I can add an ASSERT that
the incompat bit is set as expected when we read the status item with
the flag on (that seems good no matter what)?
I can also write a wrapper for getting it which does the incompat/status
flag checking to make it more clear that it isn't safe to read in
general. Or a comment on the struct saying it depends on the incompat
bit?
Thanks for all the review, by the way.
>
> > } __attribute__ ((__packed__));
> >
> > struct btrfs_qgroup_info_item {
> > --
> > 2.41.0
next prev parent reply other threads:[~2023-09-08 21:40 UTC|newest]
Thread overview: 53+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-07-27 22:12 [PATCH v5 00/18] btrfs: simple quotas Boris Burkov
2023-07-27 22:12 ` [PATCH v5 01/18] btrfs: introduce quota mode Boris Burkov
2023-07-27 22:12 ` [PATCH v5 02/18] btrfs: add new quota mode for simple quotas Boris Burkov
2023-08-21 18:00 ` Josef Bacik
2023-09-07 11:19 ` David Sterba
2023-07-27 22:12 ` [PATCH v5 03/18] btrfs: expose quota mode via sysfs Boris Burkov
2023-08-21 18:00 ` Josef Bacik
2023-09-07 11:25 ` David Sterba
2023-07-27 22:12 ` [PATCH v5 04/18] btrfs: add simple_quota incompat feature to sysfs Boris Burkov
2023-08-21 18:01 ` Josef Bacik
2023-09-07 11:28 ` David Sterba
2023-09-07 20:56 ` Boris Burkov
2023-07-27 22:12 ` [PATCH v5 05/18] btrfs: flush reservations during quota disable Boris Burkov
2023-07-27 22:12 ` [PATCH v5 06/18] btrfs: create qgroup earlier in snapshot creation Boris Burkov
2023-08-21 18:02 ` Josef Bacik
2023-09-07 11:41 ` David Sterba
2023-09-08 22:50 ` Boris Burkov
2023-07-27 22:12 ` [PATCH v5 07/18] btrfs: function for recording simple quota deltas Boris Burkov
2023-08-21 18:04 ` Josef Bacik
2023-09-07 11:46 ` David Sterba
2023-07-27 22:12 ` [PATCH v5 08/18] btrfs: rename tree_ref and data_ref owning_root Boris Burkov
2023-07-27 22:12 ` [PATCH v5 09/18] btrfs: track owning root in btrfs_ref Boris Burkov
2023-08-21 18:05 ` Josef Bacik
2023-07-27 22:12 ` [PATCH v5 10/18] btrfs: track original extent owner in head_ref Boris Burkov
2023-08-21 18:06 ` Josef Bacik
2023-09-07 11:54 ` David Sterba
2023-07-27 22:12 ` [PATCH v5 11/18] btrfs: new inline ref storing owning subvol of data extents Boris Burkov
2023-08-21 18:07 ` Josef Bacik
2023-09-07 12:06 ` David Sterba
2023-07-27 22:12 ` [PATCH v5 12/18] btrfs: inline owner ref lookup helper Boris Burkov
2023-09-07 12:10 ` David Sterba
2023-07-27 22:13 ` [PATCH v5 13/18] btrfs: record simple quota deltas Boris Burkov
2023-08-21 18:08 ` Josef Bacik
2023-09-07 12:12 ` David Sterba
2023-07-27 22:13 ` [PATCH v5 14/18] btrfs: simple quota auto hierarchy for nested subvols Boris Burkov
2023-08-21 18:10 ` Josef Bacik
2023-09-07 12:16 ` David Sterba
2023-07-27 22:13 ` [PATCH v5 15/18] btrfs: check generation when recording simple quota delta Boris Burkov
2023-08-21 18:11 ` Josef Bacik
2023-09-07 12:24 ` David Sterba
2023-09-08 21:41 ` Boris Burkov [this message]
2023-09-11 18:00 ` David Sterba
2023-09-13 0:17 ` Boris Burkov
2023-07-27 22:13 ` [PATCH v5 16/18] btrfs: track metadata relocation cow with simple quota Boris Burkov
2023-09-07 12:27 ` David Sterba
2023-07-27 22:13 ` [PATCH v5 17/18] btrfs: track data relocation " Boris Burkov
2023-08-21 18:16 ` Josef Bacik
2023-07-27 22:13 ` [PATCH v5 18/18] btrfs: only set QUOTA_ENABLED when done reading qgroups Boris Burkov
2023-08-21 18:16 ` Josef Bacik
2023-09-07 10:51 ` [PATCH v5 00/18] btrfs: simple quotas David Sterba
2023-09-07 20:51 ` Boris Burkov
2023-09-11 18:06 ` David Sterba
2023-09-11 18:12 ` David Sterba
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230908214146.GA172348@zen \
--to=boris@bur.io \
--cc=dsterba@suse.cz \
--cc=kernel-team@fb.com \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.