public inbox for linux-btrfs@vger.kernel.org
From: Boris Burkov <boris@bur.io>
To: David Sterba <dsterba@suse.cz>
Cc: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH v5 00/18] btrfs: simple quotas
Date: Thu, 7 Sep 2023 13:51:31 -0700
Message-ID: <20230907205131.GA5581@zen>
In-Reply-To: <20230907105115.GA3159@twin.jikos.cz>

On Thu, Sep 07, 2023 at 12:51:15PM +0200, David Sterba wrote:
> On Thu, Jul 27, 2023 at 03:12:47PM -0700, Boris Burkov wrote:
> > btrfs quota groups (qgroups) are a compelling feature of btrfs that
> > allow flexible control for limiting subvolume data and metadata usage.
> > However, due to btrfs's high-level decision to trade off snapshot
> > performance against ref-counting performance, qgroups suffer from
> > non-trivial performance issues that make them unattractive in certain
> > workloads. Particularly, frequent backref walking during writes and
> > during commits can make operations increasingly expensive as the number
> > of snapshots scales up. For that reason, we have never been able to
> > commit to using qgroups in production at Meta, despite significant
> > interest from people running container workloads, where we would benefit
> > from protecting the rest of the host from a buggy application in a
> > container running away with disk usage. This patch series introduces a
> > simplified version of qgroups called simple quotas (squotas), which
> > never computes global reference counts
> > for extents, and thus has similar performance characteristics to normal,
> > quotas disabled, btrfs. The "trick" is that in simple quotas mode, we
> > account all extents permanently to the subvolume in which they were
> > originally created. That allows us to make all accounting 1:1 with
> > extent item lifetime, removing the need to walk backrefs. However,
> > this sacrifices the ability to compute shared vs. exclusive usage. It
> > also results in counter-intuitive, though still predictable and simple
> > accounting in the cases where an original extent is removed while a
> > shared copy still exists. Qgroups is able to detect that case and count
> > the remaining copy as an exclusive owner, while squotas is not. As a
> > result, squotas works best when the original extent is immutable and
> > outlives any clones.
> > 
> > ==Format Change==
> > In order to track the original creating subvolume of a data extent in
> > the face of reflinks, it is necessary to add additional accounting to
> > the extent item. To save space, this is done with a new inline ref item.
> > However, the downside of this approach is that it makes enabling squota
> > an incompat change, denoted by the new incompat bit SIMPLE_QUOTA. When
> > this bit is set and quotas are enabled, new extent items get the extra
> > accounting, and freed extent items check for the accounting to find
> > their creating subvolume. In addition, 1:1 with this incompat bit,
> > the quota status item now tracks a "quota enablement generation" needed
> > for properly handling deleting extents that predate enablement.
> > 
> > ==API==
> > Squotas reuses the API of qgroups.
> 
> So apart from the accounting, the hierarchy of qgroups can be still
> built as before, right? In the example you create a group 1/100 so I
> assume that it's still qgroups from the outside, and that the limits can
> be set.

Yes, you can create quota group hierarchies with the same nesting
behavior. I am only changing the accounting methodology (and adding auto
hierarchy).
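
To make the accounting model concrete, here is a minimal sketch in
Python. It is a toy model and not the kernel implementation: the class
and method names (SquotaModel, assign, alloc_extent, free_extent) are
made up for illustration. It only shows that each extent is charged once
to the qgroup of the subvolume that created it, and that the charge
rolls up through whatever parent qgroups have been assigned.

```python
from collections import defaultdict


class SquotaModel:
    """Toy model: each extent is charged to the subvolume that created it."""

    def __init__(self):
        self.usage = defaultdict(int)    # qgroup id -> bytes accounted
        self.parents = defaultdict(set)  # qgroup id -> parent qgroup ids
        self.extents = {}                # extent id -> (owner qgroup, size)

    def assign(self, child, parent):
        """Nest `child` under `parent`, like a qgroup assignment."""
        self.parents[child].add(parent)

    def _ancestors(self, qgroup):
        """Return the qgroup itself plus every qgroup above it, each once."""
        seen, stack = set(), [qgroup]
        while stack:
            q = stack.pop()
            if q not in seen:
                seen.add(q)
                stack.extend(self.parents[q])
        return seen

    def _apply_delta(self, owner, delta):
        # Accounting is 1:1 with extent lifetime: one delta on allocation,
        # one on free, and no backref walk in between.
        for q in self._ancestors(owner):
            self.usage[q] += delta

    def alloc_extent(self, extent_id, owner, size):
        self.extents[extent_id] = (owner, size)
        self._apply_delta(owner, size)

    def free_extent(self, extent_id):
        owner, size = self.extents.pop(extent_id)
        self._apply_delta(owner, -size)


m = SquotaModel()
m.assign("0/256", "1/100")
m.assign("0/257", "1/100")
m.alloc_extent("e1", "0/256", 4096)
m.alloc_extent("e2", "0/257", 8192)
usage_before_free = dict(m.usage)
m.free_extent("e1")
```

After the two allocations, 1/100 carries the sum of both subvolumes;
after e1 is freed, only the 8192 bytes owned by 0/257 remain charged to
it.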

> 
> Because if not, then squotas would make more sense as a separate
> infrastructure, under quotas. Like that quotas are the abstraction while
> qgroups or squota would be the implementation.
> 
> > The only difference is that when you
> > enable quotas via `btrfs quota enable`, you pass the `--simple` flag.
> > Squotas will always report exclusive == shared for each qgroup. Squotas
> > deal with extent_item/metadata_item sizes and thus do not do anything
> > special with compression. Squotas also introduce auto inheritance for
> > nested subvols. The API is documented more fully in the documentation
> > patches in btrfs-progs.
> 
> The lack of exclusive size sharing will be confusing I guess, so we need
> to make it clear in the documentation and in the UI that it's either
> full or simple mode.

I am happy to iterate on that. I think always reporting shared=0 would
also make sense, since the *ownership* is exclusive. I opted for making
them equal since it is sort of both shared usage (we don't know if it's
shared nor when it will be freed) and exclusive usage (belongs to this
subvol by owner ref).
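
The case where this matters most can be sketched as follows. This is an
illustrative Python model, not btrfs code: OwnerAccounting and its
methods are hypothetical names, and the set of referencing subvolumes
only stands in for the extent item's lifetime (squotas itself never
walks backrefs). The point is that the charge stays with the creating
subvolume until the extent is actually freed, and that the two reported
numbers are always equal.

```python
class OwnerAccounting:
    """Toy model of owner-based accounting across reflinks."""

    def __init__(self):
        # extent id -> {"owner": subvol, "size": bytes, "refs": set of subvols}
        self.extents = {}

    def write(self, extent_id, subvol, size):
        """A new extent is owned by the subvolume that created it."""
        self.extents[extent_id] = {"owner": subvol, "size": size, "refs": {subvol}}

    def reflink(self, extent_id, subvol):
        """A clone adds a reference but never changes the owner."""
        self.extents[extent_id]["refs"].add(subvol)

    def unlink(self, extent_id, subvol):
        """Drop one reference; the extent is freed only when none remain."""
        ext = self.extents[extent_id]
        ext["refs"].discard(subvol)
        if not ext["refs"]:
            del self.extents[extent_id]

    def report(self, subvol):
        """Report usage for a subvolume; both numbers are always equal."""
        owned = sum(e["size"] for e in self.extents.values() if e["owner"] == subvol)
        return {"shared": owned, "exclusive": owned}


MIB = 1024 * 1024
acct = OwnerAccounting()
acct.write("e1", "subvol_a", MIB)   # subvol_a creates the extent
acct.reflink("e1", "subvol_b")      # subvol_b clones it
acct.unlink("e1", "subvol_a")       # subvol_a deletes its copy

a_while_clone_exists = acct.report("subvol_a")
b_while_clone_exists = acct.report("subvol_b")

acct.unlink("e1", "subvol_b")       # last reference gone, extent is freed
a_after_free = acct.report("subvol_a")
```

Here subvol_a is still charged the full 1 MiB after deleting its own
copy, and subvol_b is charged nothing, until the clone goes away too.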

> 
> I've added the patchset to for-next, we may need an iteration or two to
> fix some issues I've seen so far but on the fundamental level I think
> it's ok.

Thread overview: 53+ messages
2023-07-27 22:12 [PATCH v5 00/18] btrfs: simple quotas Boris Burkov
2023-07-27 22:12 ` [PATCH v5 01/18] btrfs: introduce quota mode Boris Burkov
2023-07-27 22:12 ` [PATCH v5 02/18] btrfs: add new quota mode for simple quotas Boris Burkov
2023-08-21 18:00   ` Josef Bacik
2023-09-07 11:19   ` David Sterba
2023-07-27 22:12 ` [PATCH v5 03/18] btrfs: expose quota mode via sysfs Boris Burkov
2023-08-21 18:00   ` Josef Bacik
2023-09-07 11:25   ` David Sterba
2023-07-27 22:12 ` [PATCH v5 04/18] btrfs: add simple_quota incompat feature to sysfs Boris Burkov
2023-08-21 18:01   ` Josef Bacik
2023-09-07 11:28   ` David Sterba
2023-09-07 20:56     ` Boris Burkov
2023-07-27 22:12 ` [PATCH v5 05/18] btrfs: flush reservations during quota disable Boris Burkov
2023-07-27 22:12 ` [PATCH v5 06/18] btrfs: create qgroup earlier in snapshot creation Boris Burkov
2023-08-21 18:02   ` Josef Bacik
2023-09-07 11:41   ` David Sterba
2023-09-08 22:50     ` Boris Burkov
2023-07-27 22:12 ` [PATCH v5 07/18] btrfs: function for recording simple quota deltas Boris Burkov
2023-08-21 18:04   ` Josef Bacik
2023-09-07 11:46   ` David Sterba
2023-07-27 22:12 ` [PATCH v5 08/18] btrfs: rename tree_ref and data_ref owning_root Boris Burkov
2023-07-27 22:12 ` [PATCH v5 09/18] btrfs: track owning root in btrfs_ref Boris Burkov
2023-08-21 18:05   ` Josef Bacik
2023-07-27 22:12 ` [PATCH v5 10/18] btrfs: track original extent owner in head_ref Boris Burkov
2023-08-21 18:06   ` Josef Bacik
2023-09-07 11:54   ` David Sterba
2023-07-27 22:12 ` [PATCH v5 11/18] btrfs: new inline ref storing owning subvol of data extents Boris Burkov
2023-08-21 18:07   ` Josef Bacik
2023-09-07 12:06   ` David Sterba
2023-07-27 22:12 ` [PATCH v5 12/18] btrfs: inline owner ref lookup helper Boris Burkov
2023-09-07 12:10   ` David Sterba
2023-07-27 22:13 ` [PATCH v5 13/18] btrfs: record simple quota deltas Boris Burkov
2023-08-21 18:08   ` Josef Bacik
2023-09-07 12:12   ` David Sterba
2023-07-27 22:13 ` [PATCH v5 14/18] btrfs: simple quota auto hierarchy for nested subvols Boris Burkov
2023-08-21 18:10   ` Josef Bacik
2023-09-07 12:16   ` David Sterba
2023-07-27 22:13 ` [PATCH v5 15/18] btrfs: check generation when recording simple quota delta Boris Burkov
2023-08-21 18:11   ` Josef Bacik
2023-09-07 12:24   ` David Sterba
2023-09-08 21:41     ` Boris Burkov
2023-09-11 18:00       ` David Sterba
2023-09-13  0:17         ` Boris Burkov
2023-07-27 22:13 ` [PATCH v5 16/18] btrfs: track metadata relocation cow with simple quota Boris Burkov
2023-09-07 12:27   ` David Sterba
2023-07-27 22:13 ` [PATCH v5 17/18] btrfs: track data relocation " Boris Burkov
2023-08-21 18:16   ` Josef Bacik
2023-07-27 22:13 ` [PATCH v5 18/18] btrfs: only set QUOTA_ENABLED when done reading qgroups Boris Burkov
2023-08-21 18:16   ` Josef Bacik
2023-09-07 10:51 ` [PATCH v5 00/18] btrfs: simple quotas David Sterba
2023-09-07 20:51   ` Boris Burkov [this message]
2023-09-11 18:06     ` David Sterba
2023-09-11 18:12   ` David Sterba
