All of lore.kernel.org
 help / color / mirror / Atom feed
From: Brian Foster <bfoster@redhat.com>
To: Kent Overstreet <kent.overstreet@linux.dev>
Cc: linux-bcachefs@vger.kernel.org, linux-kernel@vger.kernel.org,
	djwong@kernel.org
Subject: Re: [PATCH 04/21] bcachefs: Disk space accounting rewrite
Date: Fri, 1 Mar 2024 10:03:35 -0500	[thread overview]
Message-ID: <ZeHuRyZLlVD5vc7v@bfoster> (raw)
In-Reply-To: <nywd7rix6qapogaqg5kxny3k272ptpzi4qv5q2dqts42bqbxrw@te5enoxvdhti>

On Thu, Feb 29, 2024 at 04:16:00PM -0500, Kent Overstreet wrote:
> On Thu, Feb 29, 2024 at 01:44:27PM -0500, Brian Foster wrote:
> > On Wed, Feb 28, 2024 at 11:10:12PM -0500, Kent Overstreet wrote:
> > > I think it ended up not needing to be moved, and I just forgot to drop
> > > it - originally I disallowed accounting entries that referenced
> > > nonexistent devices, but that wasn't workable so now it's only nonzero
> > > accounting keys that aren't allowed to reference nonexistent devices.
> > > 
> > > I'll see if I can delete it.
> > > 
> > 
> > Do you mean to delete the change that moves the call, or the flush call
> > entirely?
> 
> Delte the change, I think there's further cleanup (& probably bugs to
> fix) possible with that flush call but I'm not going to get into it
> right now.
> 

Ok, just trying to determine whether I need to look back and make sure
this doesn't regress the problem this originally fixed.

> > > +/*
> > > + * Notes on disk accounting:
> > > + *
> > > + * We have two parallel sets of counters to be concerned with, and both must be
> > > + * kept in sync.
> > > + *
> > > + *  - Persistent/on disk accounting, stored in the accounting btree and updated
> > > + *    via btree write buffer updates that treat new accounting keys as deltas to
> > > + *    apply to existing values. But reading from a write buffer btree is
> > > + *    expensive, so we also have
> > > + *
> > 
> > I find the wording a little odd here, and I also think it would be
> > helpful to explain how/from where the deltas originate. For example,
> > something along the lines of:
> > 
> > "Persistent/on disk accounting, stored in the accounting btree and
> > updated via btree write buffer updates. Accounting updates are
> > represented as deltas that originate from <somewhere? trans triggers?>.
> > Accounting keys represent these deltas through commit into the write
> > buffer. The accounting/delta keys in the write buffer are then
> > accumulated into the appropriate accounting btree key at write buffer
> > flush time."
> 
> yeah, that's worth including.
> 
> There's an interesting point that you're touching on; btree write buffer
> are always dependent state changes from some other (non write buffer)
> btree; we never look at a write buffer btree and generate an update
> there - we can't, reading from a write buffer btree doesn't get you
> anything consistent or up to date.
> 
> So in normal operation it really only makes sense to do write buffer
> updates from a transactional trigger - that's the only way to use them
> and have them be consistent with the resst of the filesystem.
> 
> And since triggers work by comparing old and new, they naturally
> generate updates that are deltas.
> 

Hm that is interesting, I hadn't made that connection. Thanks.

Brian

> > > + *  - In memory accounting, where accounting is stored as an array of percpu
> > > + *    counters, indexed by an eytzinger array of disk acounting keys/bpos (which
> > > + *    are the same thing, excepting byte swabbing on big endian).
> > > + *
> > 
> > Not really sure about the keys vs. bpos thing, kind of related to my
> > comments on the earlier patch. It might be more clear to just elide the
> > implementation details here, i.e.:
> > 
> > "In memory accounting, where accounting is stored as an array of percpu
> > counters that are cheap to read, but not persistent. Updates to in
> > memory accounting are propagated from the transaction commit path."
> > 
> > ... but NBD, and feel free to reword, drop and/or correct any of that
> > text.
> 
> It's there because bch2_accounting_mem_read() takes a bpos when it
> should be a disk_accounting_key. I'll fix that if I can...
> 
> > > + *    Cheap to read, but non persistent.
> > > + *
> > > + * To do a disk accounting update:
> > > + * - initialize a disk_accounting_key, to specify which counter is being update
> > > + * - initialize counter deltas, as an array of 1-3 s64s
> > > + * - call bch2_disk_accounting_mod()
> > > + *
> > > + * This queues up the accounting update to be done at transaction commit time.
> > > + * Underneath, it's a normal btree write buffer update.
> > > + *
> > > + * The transaction commit path is responsible for propagating updates to the in
> > > + * memory counters, with bch2_accounting_mem_mod().
> > > + *
> > > + * The commit path also assigns every disk accounting update a unique version
> > > + * number, based on the journal sequence number and offset within that journal
> > > + * buffer; this is used by journal replay to determine which updates have been
> > > + * done.
> > > + *
> > > + * The transaction commit path also ensures that replicas entry accounting
> > > + * updates are properly marked in the superblock (so that we know whether we can
> > > + * mount without data being unavailable); it will update the superblock if
> > > + * bch2_accounting_mem_mod() tells it to.
> > 
> > I'm not really sure what this last paragraph is telling me, but granted
> > I've not got that far into the code yet either.
> 
> yeah that's for a whole different subsystem that happens to be slaved to
> the accounting - replicas.c, which also used to help out quite a bit
> with the accounting but now it's pretty much just for managing the
> superblock replicas section.
> 
> The superblock replicas section is just a list of entries, where each
> entry is a list of devices - "there is replicated data present on this
> set of devices". We also have full counters of how much data is present
> replicated across each set of devices, so the superblock section is just
> a truncated version of the accounting - "data exists on these devices",
> instead of saying how much.
> 


  reply	other threads:[~2024-03-01 15:02 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-25  2:38 [PATCH 00/21] bcachefs disk accounting rewrite Kent Overstreet
2024-02-25  2:38 ` [PATCH 01/21] bcachefs: KEY_TYPE_accounting Kent Overstreet
2024-02-27 15:49   ` Brian Foster
2024-02-28 19:39     ` Kent Overstreet
2024-02-29 18:43       ` Brian Foster
2024-02-29 21:24         ` Kent Overstreet
2024-03-01 15:03           ` Brian Foster
2024-03-01 19:30             ` Kent Overstreet
2024-02-25  2:38 ` [PATCH 02/21] bcachefs: Accumulate accounting keys in journal replay Kent Overstreet
2024-02-27 15:49   ` Brian Foster
2024-02-28 20:06     ` Kent Overstreet
2024-02-25  2:38 ` [PATCH 03/21] bcachefs: btree write buffer knows how to accumulate bch_accounting keys Kent Overstreet
2024-02-27 15:50   ` Brian Foster
2024-02-28 22:42     ` Kent Overstreet
2024-02-29 18:44       ` Brian Foster
2024-02-29 20:25         ` Kent Overstreet
2024-02-25  2:38 ` [PATCH 04/21] bcachefs: Disk space accounting rewrite Kent Overstreet
2024-02-27 15:55   ` Brian Foster
2024-02-29  4:10     ` Kent Overstreet
2024-02-29 18:44       ` Brian Foster
2024-02-29 21:16         ` Kent Overstreet
2024-03-01 15:03           ` Brian Foster [this message]
2024-02-25  2:38 ` [PATCH 05/21] bcachefs: dev_usage updated by new accounting Kent Overstreet
2024-02-25  2:38 ` [PATCH 06/21] bcachefs: Kill bch2_fs_usage_initialize() Kent Overstreet
2024-02-25  2:38 ` [PATCH 07/21] bcachefs: Convert bch2_ioctl_fs_usage() to new accounting Kent Overstreet
2024-02-25  2:38 ` [PATCH 08/21] bcachefs: kill bch2_fs_usage_read() Kent Overstreet
2024-02-25  2:38 ` [PATCH 09/21] bcachefs: Kill writing old accounting to journal Kent Overstreet
2024-02-25  2:38 ` [PATCH 10/21] bcachefs: Delete journal-buf-sharded old style accounting Kent Overstreet
2024-02-25  2:38 ` [PATCH 11/21] bcachefs: Kill bch2_fs_usage_to_text() Kent Overstreet
2024-02-25  2:38 ` [PATCH 12/21] bcachefs: Kill fs_usage_online Kent Overstreet
2024-02-25  2:38 ` [PATCH 13/21] bcachefs: Kill replicas_journal_res Kent Overstreet
2024-02-25  2:38 ` [PATCH 14/21] bcachefs: Convert gc to new accounting Kent Overstreet
2024-02-25  2:38 ` [PATCH 15/21] bcachefs: Convert bch2_replicas_gc2() " Kent Overstreet
2024-02-25  2:38 ` [PATCH 16/21] bcachefs: bch2_verify_accounting_clean() Kent Overstreet
2024-02-25  2:38 ` [PATCH 17/21] bcachefs: Eytzinger accumulation for accounting keys Kent Overstreet
2024-02-25  2:38 ` [PATCH 18/21] bcachefs: bch_acct_compression Kent Overstreet
2024-02-25  2:38 ` [PATCH 19/21] bcachefs: Convert bch2_compression_stats_to_text() to new accounting Kent Overstreet
2024-02-25  2:38 ` [PATCH 20/21] bcachefs: bch2_fs_accounting_to_text() Kent Overstreet
2024-02-25  2:38 ` [PATCH 21/21] bcachefs: bch2_fs_usage_base_to_text() Kent Overstreet

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZeHuRyZLlVD5vc7v@bfoster \
    --to=bfoster@redhat.com \
    --cc=djwong@kernel.org \
    --cc=kent.overstreet@linux.dev \
    --cc=linux-bcachefs@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.