All of lore.kernel.org
 help / color / mirror / Atom feed
From: Boris Burkov <boris@bur.io>
To: David Sterba <dsterba@suse.cz>
Cc: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH v5 15/18] btrfs: check generation when recording simple quota delta
Date: Tue, 12 Sep 2023 17:17:02 -0700	[thread overview]
Message-ID: <20230913001702.GA224634@zen> (raw)
In-Reply-To: <20230911180020.GF3159@twin.jikos.cz>

On Mon, Sep 11, 2023 at 08:00:20PM +0200, David Sterba wrote:
> On Fri, Sep 08, 2023 at 02:41:46PM -0700, Boris Burkov wrote:
> > On Thu, Sep 07, 2023 at 02:24:49PM +0200, David Sterba wrote:
> > > On Thu, Jul 27, 2023 at 03:13:02PM -0700, Boris Burkov wrote:
> > > > Simple quotas count extents only from the moment the feature is enabled.
> > > > Therefore, if we do something like:
> > > > 1. create subvol S
> > > > 2. write F in S
> > > > 3. enable quotas
> > > > 4. remove F
> > > > 5. write G in S
> > > > 
> > > > then after 3. and 4. we would expect the simple quota usage of S to be 0
> > > > (putting aside some metadata extents that might be written) and after
> > > > 5., it should be the size of G plus metadata. Therefore, we need to be
> > > > able to determine whether a particular quota delta we are processing
> > > > predates simple quota enablement.
> > > > 
> > > > To do this, store the transaction id when quotas were enabled. In
> > > > fs_info for immediate use and in the quota status item to make it
> > > > recoverable on mount. When we see a delta, check if the generation of
> > > > the extent item is less than that of quota enablement. If so, we should
> > > > ignore the delta from this extent.
> > > > 
> > > > Signed-off-by: Boris Burkov <boris@bur.io>
> > > > ---
> > > >  fs/btrfs/accessors.h            |  2 ++
> > > >  fs/btrfs/extent-tree.c          |  4 ++++
> > > >  fs/btrfs/fs.h                   |  2 ++
> > > >  fs/btrfs/qgroup.c               | 14 ++++++++++++--
> > > >  fs/btrfs/qgroup.h               |  1 +
> > > >  include/uapi/linux/btrfs_tree.h |  7 +++++++
> > > >  6 files changed, 28 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/fs/btrfs/accessors.h b/fs/btrfs/accessors.h
> > > > index a23045c05937..513f8edbd98e 100644
> > > > --- a/fs/btrfs/accessors.h
> > > > +++ b/fs/btrfs/accessors.h
> > > > @@ -970,6 +970,8 @@ BTRFS_SETGET_FUNCS(qgroup_status_flags, struct btrfs_qgroup_status_item,
> > > >  		   flags, 64);
> > > >  BTRFS_SETGET_FUNCS(qgroup_status_rescan, struct btrfs_qgroup_status_item,
> > > >  		   rescan, 64);
> > > > +BTRFS_SETGET_FUNCS(qgroup_status_enable_gen, struct btrfs_qgroup_status_item,
> > > > +		   enable_gen, 64);
> > > >  
> > > >  /* btrfs_qgroup_info_item */
> > > >  BTRFS_SETGET_FUNCS(qgroup_info_generation, struct btrfs_qgroup_info_item,
> > > > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> > > > index 1b5efd03ef83..395ab46e520b 100644
> > > > --- a/fs/btrfs/extent-tree.c
> > > > +++ b/fs/btrfs/extent-tree.c
> > > > @@ -1513,6 +1513,7 @@ static int run_delayed_data_ref(struct btrfs_trans_handle *trans,
> > > >  			.rsv_bytes = href->reserved_bytes,
> > > >  			.is_data = true,
> > > >  			.is_inc	= true,
> > > > +			.generation = trans->transid,
> > > >  		};
> > > >  
> > > >  		if (extent_op)
> > > > @@ -1676,6 +1677,7 @@ static int run_delayed_tree_ref(struct btrfs_trans_handle *trans,
> > > >  			.rsv_bytes = 0,
> > > >  			.is_data = false,
> > > >  			.is_inc = true,
> > > > +			.generation = trans->transid,
> > > >  		};
> > > >  
> > > >  		BUG_ON(!extent_op || !extent_op->update_flags);
> > > > @@ -3217,6 +3219,7 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
> > > >  			.rsv_bytes = 0,
> > > >  			.is_data = is_data,
> > > >  			.is_inc = false,
> > > > +			.generation = btrfs_extent_generation(leaf, ei),
> > > >  		};
> > > >  
> > > >  		/* In this branch refs == 1 */
> > > > @@ -4850,6 +4853,7 @@ int btrfs_alloc_logged_file_extent(struct btrfs_trans_handle *trans,
> > > >  	struct btrfs_simple_quota_delta delta = {
> > > >  		.root = root_objectid,
> > > >  		.num_bytes = ins->offset,
> > > > +		.generation = trans->transid,
> > > >  		.rsv_bytes = 0,
> > > >  		.is_data = true,
> > > >  		.is_inc = true,
> > > > diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
> > > > index f76f450c2abf..da7b623ff15f 100644
> > > > --- a/fs/btrfs/fs.h
> > > > +++ b/fs/btrfs/fs.h
> > > > @@ -802,6 +802,8 @@ struct btrfs_fs_info {
> > > >  	spinlock_t eb_leak_lock;
> > > >  	struct list_head allocated_ebs;
> > > >  #endif
> > > > +
> > > > +	u64 quota_enable_gen;
> > > 
> > > Please move it to the other quota/qgroup related members, at the end of
> > > fs_info there's only debugging stuff.
> > > 
> > > >  };
> > > >  
> > > >  static inline void btrfs_set_last_root_drop_gen(struct btrfs_fs_info *fs_info,
> > > > diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> > > > index 58e9ed0deedd..a8a603242431 100644
> > > > --- a/fs/btrfs/qgroup.c
> > > > +++ b/fs/btrfs/qgroup.c
> > > > @@ -454,6 +454,8 @@ int btrfs_read_qgroup_config(struct btrfs_fs_info *fs_info)
> > > >  			}
> > > >  			fs_info->qgroup_flags = btrfs_qgroup_status_flags(l, ptr);
> > > >  			simple = fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_SIMPLE;
> > > > +			if (simple)
> > > > +				fs_info->quota_enable_gen = btrfs_qgroup_status_enable_gen(l, ptr);
> > > >  			if (btrfs_qgroup_status_generation(l, ptr) !=
> > > >  			    fs_info->generation && !simple) {
> > > >  				qgroup_mark_inconsistent(fs_info);
> > > > @@ -1107,10 +1109,12 @@ int btrfs_quota_enable(struct btrfs_fs_info *fs_info,
> > > >  	btrfs_set_qgroup_status_generation(leaf, ptr, trans->transid);
> > > >  	btrfs_set_qgroup_status_version(leaf, ptr, BTRFS_QGROUP_STATUS_VERSION);
> > > >  	fs_info->qgroup_flags = BTRFS_QGROUP_STATUS_FLAG_ON;
> > > > -	if (simple)
> > > > +	if (simple) {
> > > >  		fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_SIMPLE;
> > > > -	else
> > > > +		btrfs_set_qgroup_status_enable_gen(leaf, ptr, trans->transid);
> > > > +	} else {
> > > >  		fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;
> > > > +	}
> > > >  	btrfs_set_qgroup_status_flags(leaf, ptr, fs_info->qgroup_flags &
> > > >  				      BTRFS_QGROUP_STATUS_FLAGS_MASK);
> > > >  	btrfs_set_qgroup_status_rescan(leaf, ptr, 0);
> > > > @@ -1202,6 +1206,8 @@ int btrfs_quota_enable(struct btrfs_fs_info *fs_info,
> > > >  		goto out_free_path;
> > > >  	}
> > > >  
> > > > +	fs_info->quota_enable_gen = trans->transid;
> > > > +
> > > >  	mutex_unlock(&fs_info->qgroup_ioctl_lock);
> > > >  	/*
> > > >  	 * Commit the transaction while not holding qgroup_ioctl_lock, to avoid
> > > > @@ -4622,6 +4628,10 @@ int btrfs_record_simple_quota_delta(struct btrfs_fs_info *fs_info,
> > > >  	if (!is_fstree(root))
> > > >  		return 0;
> > > >  
> > > > +	/* If the extent predates enabling quotas, don't count it. */
> > > > +	if (delta->generation < fs_info->quota_enable_gen)
> > > > +		return 0;
> > > > +
> > > >  	spin_lock(&fs_info->qgroup_lock);
> > > >  	qgroup = find_qgroup_rb(fs_info, root);
> > > >  	if (!qgroup) {
> > > > diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
> > > > index ce6fa8694ca7..ae1ce14b365c 100644
> > > > --- a/fs/btrfs/qgroup.h
> > > > +++ b/fs/btrfs/qgroup.h
> > > > @@ -241,6 +241,7 @@ struct btrfs_simple_quota_delta {
> > > >  	u64 rsv_bytes; /* The number of bytes reserved for this extent */
> > > >  	bool is_inc; /* Whether we are using or freeing the extent */
> > > >  	bool is_data; /* Whether the extent is data or metadata */
> > > > +	u64 generation; /* The generation the extent was created in */
> > > 
> > > Please reorder it so it does not leave gaps between struct members.
> > > 
> > > >  };
> > > >  
> > > >  static inline u64 btrfs_qgroup_subvolid(u64 qgroupid)
> > > > diff --git a/include/uapi/linux/btrfs_tree.h b/include/uapi/linux/btrfs_tree.h
> > > > index eacb26caf3c6..1120ce3dae42 100644
> > > > --- a/include/uapi/linux/btrfs_tree.h
> > > > +++ b/include/uapi/linux/btrfs_tree.h
> > > > @@ -1242,6 +1242,13 @@ struct btrfs_qgroup_status_item {
> > > >  	 * of the scan. It contains a logical address
> > > >  	 */
> > > >  	__le64 rescan;
> > > > +
> > > > +	/*
> > > > +	 * the generation when quotas are enabled. Used by simple quotas to
> > > > +	 * avoid decrementing when freeing an extent that was written before
> > > > +	 * enable.
> > > > +	 */
> > > > +	__le64 enable_gen;
> > > 
> > > This is public interface and btrfs_qgroup_status_item is used in many
> > > places in user space at least in btrfs-progs. This needs a lot of
> > > sanity checks.
> > 
> > Totally agreed in principle, but not exactly sure how to proceed in
> > practice. I would definitely appreciate some tips/help!
> > 
> > How we interact with the new field:
> > - When enabling squota, set it, the incompat bit, and the status flag
> > - When reading in the qgroup status_item, if the status flag is set,
> >   then read the enable_gen.
> > 
> > I believe this prevents us from ever reading garbage while trying to
> > read an old fs (status flag won't be set) and it prevents any
> > btrfs-progs from getting confused by a wrong-sized status item, since
> > it would choke on the incompat bit first.
> > 
> > Am I missing some other case? I can try to make it more explicitly
> > zeroed when we enable qgroups but not squotas? I can add an ASSERT that
> > the incompat bit is set as expected when we read the status item with
> > the flag on (that seems good no matter what)?
> > 
> > I can also write a wrapper for getting it which does the incompat/status
> > flag checking to make it more clear that it isn't safe to read in
> > general. Or a comment on the struct saying it depends on the incompat
> > bit?
> 
> All of the above makes sense and I had something like that in mind when
> writing the comment. The wrappers can make sure the bit is set when
> reading the item. I think there's an example in existing code that
> versions an item based on size, I can't find it now (probably something
> from the send/receive time where several new struct members were added).
> 
> I just noticed we have versioning for the qgoup status item,
> BTRFS_QGROUP_STATUS_VERSION is now 1 and has backward compatibility
> handling. We can probably use version 2 for squotas, in addition to the
> helpers with sanity checks.

I made the helper/validation changes in V6, but forgot to address the
BTRFS_QGROUP_STATUS_VERSION idea. Right now, that field is for
preventing forward compatibility as well as backward (for lack of a
better term?) The check on it is a !=, so if you bump the version, you
can no longer honor old fs-es qgroups, which is not the case with this
change. Since the version has never been bumped, I believe we can safely
change it to a backwards compatibility check and use it for squotas.

  reply	other threads:[~2023-09-13  0:16 UTC|newest]

Thread overview: 53+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-27 22:12 [PATCH v5 00/18] btrfs: simple quotas Boris Burkov
2023-07-27 22:12 ` [PATCH v5 01/18] btrfs: introduce quota mode Boris Burkov
2023-07-27 22:12 ` [PATCH v5 02/18] btrfs: add new quota mode for simple quotas Boris Burkov
2023-08-21 18:00   ` Josef Bacik
2023-09-07 11:19   ` David Sterba
2023-07-27 22:12 ` [PATCH v5 03/18] btrfs: expose quota mode via sysfs Boris Burkov
2023-08-21 18:00   ` Josef Bacik
2023-09-07 11:25   ` David Sterba
2023-07-27 22:12 ` [PATCH v5 04/18] btrfs: add simple_quota incompat feature to sysfs Boris Burkov
2023-08-21 18:01   ` Josef Bacik
2023-09-07 11:28   ` David Sterba
2023-09-07 20:56     ` Boris Burkov
2023-07-27 22:12 ` [PATCH v5 05/18] btrfs: flush reservations during quota disable Boris Burkov
2023-07-27 22:12 ` [PATCH v5 06/18] btrfs: create qgroup earlier in snapshot creation Boris Burkov
2023-08-21 18:02   ` Josef Bacik
2023-09-07 11:41   ` David Sterba
2023-09-08 22:50     ` Boris Burkov
2023-07-27 22:12 ` [PATCH v5 07/18] btrfs: function for recording simple quota deltas Boris Burkov
2023-08-21 18:04   ` Josef Bacik
2023-09-07 11:46   ` David Sterba
2023-07-27 22:12 ` [PATCH v5 08/18] btrfs: rename tree_ref and data_ref owning_root Boris Burkov
2023-07-27 22:12 ` [PATCH v5 09/18] btrfs: track owning root in btrfs_ref Boris Burkov
2023-08-21 18:05   ` Josef Bacik
2023-07-27 22:12 ` [PATCH v5 10/18] btrfs: track original extent owner in head_ref Boris Burkov
2023-08-21 18:06   ` Josef Bacik
2023-09-07 11:54   ` David Sterba
2023-07-27 22:12 ` [PATCH v5 11/18] btrfs: new inline ref storing owning subvol of data extents Boris Burkov
2023-08-21 18:07   ` Josef Bacik
2023-09-07 12:06   ` David Sterba
2023-07-27 22:12 ` [PATCH v5 12/18] btrfs: inline owner ref lookup helper Boris Burkov
2023-09-07 12:10   ` David Sterba
2023-07-27 22:13 ` [PATCH v5 13/18] btrfs: record simple quota deltas Boris Burkov
2023-08-21 18:08   ` Josef Bacik
2023-09-07 12:12   ` David Sterba
2023-07-27 22:13 ` [PATCH v5 14/18] btrfs: simple quota auto hierarchy for nested subvols Boris Burkov
2023-08-21 18:10   ` Josef Bacik
2023-09-07 12:16   ` David Sterba
2023-07-27 22:13 ` [PATCH v5 15/18] btrfs: check generation when recording simple quota delta Boris Burkov
2023-08-21 18:11   ` Josef Bacik
2023-09-07 12:24   ` David Sterba
2023-09-08 21:41     ` Boris Burkov
2023-09-11 18:00       ` David Sterba
2023-09-13  0:17         ` Boris Burkov [this message]
2023-07-27 22:13 ` [PATCH v5 16/18] btrfs: track metadata relocation cow with simple quota Boris Burkov
2023-09-07 12:27   ` David Sterba
2023-07-27 22:13 ` [PATCH v5 17/18] btrfs: track data relocation " Boris Burkov
2023-08-21 18:16   ` Josef Bacik
2023-07-27 22:13 ` [PATCH v5 18/18] btrfs: only set QUOTA_ENABLED when done reading qgroups Boris Burkov
2023-08-21 18:16   ` Josef Bacik
2023-09-07 10:51 ` [PATCH v5 00/18] btrfs: simple quotas David Sterba
2023-09-07 20:51   ` Boris Burkov
2023-09-11 18:06     ` David Sterba
2023-09-11 18:12   ` David Sterba

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230913001702.GA224634@zen \
    --to=boris@bur.io \
    --cc=dsterba@suse.cz \
    --cc=kernel-team@fb.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.