From: Brian Foster <bfoster@redhat.com>
To: Kent Overstreet <kent.overstreet@linux.dev>
Cc: linux-bcachefs@vger.kernel.org, linux-kernel@vger.kernel.org,
djwong@kernel.org
Subject: Re: [PATCH 04/21] bcachefs: Disk space accounting rewrite
Date: Fri, 1 Mar 2024 10:03:35 -0500 [thread overview]
Message-ID: <ZeHuRyZLlVD5vc7v@bfoster> (raw)
In-Reply-To: <nywd7rix6qapogaqg5kxny3k272ptpzi4qv5q2dqts42bqbxrw@te5enoxvdhti>
On Thu, Feb 29, 2024 at 04:16:00PM -0500, Kent Overstreet wrote:
> On Thu, Feb 29, 2024 at 01:44:27PM -0500, Brian Foster wrote:
> > On Wed, Feb 28, 2024 at 11:10:12PM -0500, Kent Overstreet wrote:
> > > I think it ended up not needing to be moved, and I just forgot to drop
> > > it - originally I disallowed accounting entries that referenced
> > > nonexistent devices, but that wasn't workable so now it's only nonzero
> > > accounting keys that aren't allowed to reference nonexistent devices.
> > >
> > > I'll see if I can delete it.
> > >
> >
> > Do you mean to delete the change that moves the call, or the flush call
> > entirely?
>
> Delte the change, I think there's further cleanup (& probably bugs to
> fix) possible with that flush call but I'm not going to get into it
> right now.
>
Ok, just trying to determine whether I need to look back and make sure
this doesn't regress the problem this originally fixed.
> > > +/*
> > > + * Notes on disk accounting:
> > > + *
> > > + * We have two parallel sets of counters to be concerned with, and both must be
> > > + * kept in sync.
> > > + *
> > > + * - Persistent/on disk accounting, stored in the accounting btree and updated
> > > + * via btree write buffer updates that treat new accounting keys as deltas to
> > > + * apply to existing values. But reading from a write buffer btree is
> > > + * expensive, so we also have
> > > + *
> >
> > I find the wording a little odd here, and I also think it would be
> > helpful to explain how/from where the deltas originate. For example,
> > something along the lines of:
> >
> > "Persistent/on disk accounting, stored in the accounting btree and
> > updated via btree write buffer updates. Accounting updates are
> > represented as deltas that originate from <somewhere? trans triggers?>.
> > Accounting keys represent these deltas through commit into the write
> > buffer. The accounting/delta keys in the write buffer are then
> > accumulated into the appropriate accounting btree key at write buffer
> > flush time."
>
> yeah, that's worth including.
>
> There's an interesting point that you're touching on; btree write buffer
> are always dependent state changes from some other (non write buffer)
> btree; we never look at a write buffer btree and generate an update
> there - we can't, reading from a write buffer btree doesn't get you
> anything consistent or up to date.
>
> So in normal operation it really only makes sense to do write buffer
> updates from a transactional trigger - that's the only way to use them
> and have them be consistent with the resst of the filesystem.
>
> And since triggers work by comparing old and new, they naturally
> generate updates that are deltas.
>
Hm that is interesting, I hadn't made that connection. Thanks.
Brian
> > > + * - In memory accounting, where accounting is stored as an array of percpu
> > > + * counters, indexed by an eytzinger array of disk acounting keys/bpos (which
> > > + * are the same thing, excepting byte swabbing on big endian).
> > > + *
> >
> > Not really sure about the keys vs. bpos thing, kind of related to my
> > comments on the earlier patch. It might be more clear to just elide the
> > implementation details here, i.e.:
> >
> > "In memory accounting, where accounting is stored as an array of percpu
> > counters that are cheap to read, but not persistent. Updates to in
> > memory accounting are propagated from the transaction commit path."
> >
> > ... but NBD, and feel free to reword, drop and/or correct any of that
> > text.
>
> It's there because bch2_accounting_mem_read() takes a bpos when it
> should be a disk_accounting_key. I'll fix that if I can...
>
> > > + * Cheap to read, but non persistent.
> > > + *
> > > + * To do a disk accounting update:
> > > + * - initialize a disk_accounting_key, to specify which counter is being update
> > > + * - initialize counter deltas, as an array of 1-3 s64s
> > > + * - call bch2_disk_accounting_mod()
> > > + *
> > > + * This queues up the accounting update to be done at transaction commit time.
> > > + * Underneath, it's a normal btree write buffer update.
> > > + *
> > > + * The transaction commit path is responsible for propagating updates to the in
> > > + * memory counters, with bch2_accounting_mem_mod().
> > > + *
> > > + * The commit path also assigns every disk accounting update a unique version
> > > + * number, based on the journal sequence number and offset within that journal
> > > + * buffer; this is used by journal replay to determine which updates have been
> > > + * done.
> > > + *
> > > + * The transaction commit path also ensures that replicas entry accounting
> > > + * updates are properly marked in the superblock (so that we know whether we can
> > > + * mount without data being unavailable); it will update the superblock if
> > > + * bch2_accounting_mem_mod() tells it to.
> >
> > I'm not really sure what this last paragraph is telling me, but granted
> > I've not got that far into the code yet either.
>
> yeah that's for a whole different subsystem that happens to be slaved to
> the accounting - replicas.c, which also used to help out quite a bit
> with the accounting but now it's pretty much just for managing the
> superblock replicas section.
>
> The superblock replicas section is just a list of entries, where each
> entry is a list of devices - "there is replicated data present on this
> set of devices". We also have full counters of how much data is present
> replicated across each set of devices, so the superblock section is just
> a truncated version of the accounting - "data exists on these devices",
> instead of saying how much.
>
next prev parent reply other threads:[~2024-03-01 15:02 UTC|newest]
Thread overview: 39+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-02-25 2:38 [PATCH 00/21] bcachefs disk accounting rewrite Kent Overstreet
2024-02-25 2:38 ` [PATCH 01/21] bcachefs: KEY_TYPE_accounting Kent Overstreet
2024-02-27 15:49 ` Brian Foster
2024-02-28 19:39 ` Kent Overstreet
2024-02-29 18:43 ` Brian Foster
2024-02-29 21:24 ` Kent Overstreet
2024-03-01 15:03 ` Brian Foster
2024-03-01 19:30 ` Kent Overstreet
2024-02-25 2:38 ` [PATCH 02/21] bcachefs: Accumulate accounting keys in journal replay Kent Overstreet
2024-02-27 15:49 ` Brian Foster
2024-02-28 20:06 ` Kent Overstreet
2024-02-25 2:38 ` [PATCH 03/21] bcachefs: btree write buffer knows how to accumulate bch_accounting keys Kent Overstreet
2024-02-27 15:50 ` Brian Foster
2024-02-28 22:42 ` Kent Overstreet
2024-02-29 18:44 ` Brian Foster
2024-02-29 20:25 ` Kent Overstreet
2024-02-25 2:38 ` [PATCH 04/21] bcachefs: Disk space accounting rewrite Kent Overstreet
2024-02-27 15:55 ` Brian Foster
2024-02-29 4:10 ` Kent Overstreet
2024-02-29 18:44 ` Brian Foster
2024-02-29 21:16 ` Kent Overstreet
2024-03-01 15:03 ` Brian Foster [this message]
2024-02-25 2:38 ` [PATCH 05/21] bcachefs: dev_usage updated by new accounting Kent Overstreet
2024-02-25 2:38 ` [PATCH 06/21] bcachefs: Kill bch2_fs_usage_initialize() Kent Overstreet
2024-02-25 2:38 ` [PATCH 07/21] bcachefs: Convert bch2_ioctl_fs_usage() to new accounting Kent Overstreet
2024-02-25 2:38 ` [PATCH 08/21] bcachefs: kill bch2_fs_usage_read() Kent Overstreet
2024-02-25 2:38 ` [PATCH 09/21] bcachefs: Kill writing old accounting to journal Kent Overstreet
2024-02-25 2:38 ` [PATCH 10/21] bcachefs: Delete journal-buf-sharded old style accounting Kent Overstreet
2024-02-25 2:38 ` [PATCH 11/21] bcachefs: Kill bch2_fs_usage_to_text() Kent Overstreet
2024-02-25 2:38 ` [PATCH 12/21] bcachefs: Kill fs_usage_online Kent Overstreet
2024-02-25 2:38 ` [PATCH 13/21] bcachefs: Kill replicas_journal_res Kent Overstreet
2024-02-25 2:38 ` [PATCH 14/21] bcachefs: Convert gc to new accounting Kent Overstreet
2024-02-25 2:38 ` [PATCH 15/21] bcachefs: Convert bch2_replicas_gc2() " Kent Overstreet
2024-02-25 2:38 ` [PATCH 16/21] bcachefs: bch2_verify_accounting_clean() Kent Overstreet
2024-02-25 2:38 ` [PATCH 17/21] bcachefs: Eytzinger accumulation for accounting keys Kent Overstreet
2024-02-25 2:38 ` [PATCH 18/21] bcachefs: bch_acct_compression Kent Overstreet
2024-02-25 2:38 ` [PATCH 19/21] bcachefs: Convert bch2_compression_stats_to_text() to new accounting Kent Overstreet
2024-02-25 2:38 ` [PATCH 20/21] bcachefs: bch2_fs_accounting_to_text() Kent Overstreet
2024-02-25 2:38 ` [PATCH 21/21] bcachefs: bch2_fs_usage_base_to_text() Kent Overstreet
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZeHuRyZLlVD5vc7v@bfoster \
--to=bfoster@redhat.com \
--cc=djwong@kernel.org \
--cc=kent.overstreet@linux.dev \
--cc=linux-bcachefs@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox