public inbox for linux-bcachefs@vger.kernel.org
* [PATCH 00/21] bcachefs disk accounting rewrite
@ 2024-02-25  2:38 Kent Overstreet
  2024-02-25  2:38 ` [PATCH 01/21] bcachefs: KEY_TYPE_accounting Kent Overstreet
                   ` (20 more replies)
  0 siblings, 21 replies; 39+ messages in thread
From: Kent Overstreet @ 2024-02-25  2:38 UTC (permalink / raw)
  To: linux-bcachefs, linux-fsdevel, linux-kernel
  Cc: Kent Overstreet, djwong, bfoster

here it is; the disk accounting rewrite I've been talking about since
forever.

git link:
https://evilpiepirate.org/git/bcachefs.git/log/?h=bcachefs-disk-accounting-rewrite

test dashboard (just rebased, results are regenerating as of this
writing but shouldn't be any regressions left):
https://evilpiepirate.org/~testdashboard/ci?branch=bcachefs-disk-accounting-rewrite

The old disk accounting scheme was fast, but had some limitations:

 - lack of scalability: it was based on percpu counters additionally
   sharded by outstanding journal buffer, and then just prior to journal
   write we'd roll up the counters and add them to the journal entry.

   But this meant that all counters were added to every journal write,
   which meant it'd never be able to support per-snapshot counters.

 - it was a pain to extend:
   this is why, until now, we didn't have proper compressed accounting,
   and getting the compression ratio required a full btree scan

In the new scheme:
 - every set of counters is a bkey, a key in a btree
   (BTREE_ID_accounting).

   this means they aren't pinned in the journal

 - the key has structure, and is extensible
   disk_accounting_key is a tagged union, and it's just union'd over
   bpos

 - counters are deltas, until flushed to the underlying btree

   this means counter updates are normal btree updates; the btree write
   buffer makes counter updates efficient.
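
To make the "tagged union over bpos" and "counters are deltas" points
concrete, here is a rough userspace sketch. It is not the code from the
series: every name ending in _sketch, the field layout, and the two
accounting types are invented for illustration, and the real
definitions in disk_accounting_format.h will differ. The only ideas
taken from the description above are that the accounting key is
overlaid on a btree position, and that updates carry deltas that get
summed into the stored counter.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for bpos: a fixed-size btree position. */
struct pos_sketch {
	uint64_t inode;
	uint64_t offset;
	uint32_t snapshot;
	uint32_t pad;
};

/* Hypothetical accounting types; the tag selects the union member. */
enum acct_type_sketch {
	ACCT_SKETCH_REPLICAS	= 1,
	ACCT_SKETCH_COMPRESSION	= 2,
};

struct acct_key_sketch {
	uint8_t type;
	union {
		struct { uint8_t nr_devs; uint8_t devs[8]; } replicas;
		struct { uint8_t compression_type; } compression;
	};
};

/* The structured key shares storage with the position it is indexed by. */
union acct_pos_sketch {
	struct acct_key_sketch	key;
	struct pos_sketch	pos;
};

static_assert(sizeof(struct acct_key_sketch) <= sizeof(struct pos_sketch),
	      "accounting key must fit inside the position");

/* Build the position for a per-compression-type counter. */
static struct pos_sketch acct_pos_compression(uint8_t compression_type)
{
	union acct_pos_sketch u;

	memset(&u, 0, sizeof(u));	/* unused bytes must compare equal */
	u.key.type = ACCT_SKETCH_COMPRESSION;
	u.key.compression.compression_type = compression_type;
	return u.pos;
}

static int pos_equal(struct pos_sketch a, struct pos_sketch b)
{
	return memcmp(&a, &b, sizeof(a)) == 0;
}

/* An update does not carry the new total, only a signed change. */
struct acct_update_sketch {
	struct pos_sketch	pos;
	int64_t			delta;
};

/* Fold every pending delta for @pos into the stored counter value. */
static int64_t acct_apply_deltas(int64_t stored, struct pos_sketch pos,
				 const struct acct_update_sketch *updates,
				 size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		if (pos_equal(updates[i].pos, pos))
			stored += updates[i].delta;
	return stored;
}
```

With a stored value of 10 and pending deltas of +100 and -40 for the
same key, acct_apply_deltas() returns 70; deltas for other keys are
ignored. Because deltas commute, this kind of summing can happen
wherever updates are batched, which is what lets the write buffer
absorb them.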

Since reading counters from the btree would be expensive - it'd require
a write buffer flush to get up-to-date counters - we also maintain a
parallel set of accounting in memory, a bit like the old scheme but
without the per-journal-buffer sharding. The in-memory counters are
indexed in an eytzinger tree by disk_accounting_key/bpos, with the
counters themselves being percpu u64s.
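
For anyone unfamiliar with the eytzinger layout, a minimal userspace
sketch of the in-memory side follows. Assumptions, all mine: keys are
plain u64s rather than bpos, "percpu" is simulated with a small fixed
array of shards, and the *_sketch names and helpers are hypothetical
(the kernel has its own eytzinger helpers, which this does not try to
mirror). It only illustrates the idea: sorted keys stored in
breadth-first order of an implicit binary tree, with lookups walking
child indices 2i and 2i+1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NR_SHARDS_SKETCH	4	/* stand-in for the number of CPUs */
#define MAX_KEYS_SKETCH		15

struct acct_mem_sketch {
	size_t   nr;
	/* 1-based eytzinger (breadth-first) order; slot 0 is unused */
	uint64_t keys[MAX_KEYS_SKETCH + 1];
	/* one counter per key per shard, summed on read */
	uint64_t counters[NR_SHARDS_SKETCH][MAX_KEYS_SKETCH + 1];
};

/* An in-order walk of the implicit tree hands out keys in sorted order. */
static size_t eytz_fill(struct acct_mem_sketch *a, const uint64_t *sorted,
			size_t next, size_t node)
{
	if (node > a->nr)
		return next;
	next = eytz_fill(a, sorted, next, 2 * node);
	a->keys[node] = sorted[next++];
	return eytz_fill(a, sorted, next, 2 * node + 1);
}

static void acct_mem_init(struct acct_mem_sketch *a,
			  const uint64_t *sorted, size_t nr)
{
	assert(nr <= MAX_KEYS_SKETCH);
	memset(a, 0, sizeof(*a));
	a->nr = nr;
	eytz_fill(a, sorted, 0, 1);
}

/* Returns the 1-based slot of @key, or 0 if it is not present. */
static size_t eytz_find(const struct acct_mem_sketch *a, uint64_t key)
{
	size_t i = 1;

	while (i <= a->nr) {
		if (a->keys[i] == key)
			return i;
		i = 2 * i + (a->keys[i] < key);	/* left: 2i, right: 2i+1 */
	}
	return 0;
}

/* Apply a signed delta to one shard only, so writers do not contend. */
static int acct_mem_add(struct acct_mem_sketch *a, unsigned shard,
			uint64_t key, int64_t delta)
{
	size_t i = eytz_find(a, key);

	if (!i)
		return -1;
	/* unsigned wraparound keeps the cross-shard sum correct */
	a->counters[shard % NR_SHARDS_SKETCH][i] += (uint64_t) delta;
	return 0;
}

/* Reading sums all shards; no write buffer flush is involved. */
static uint64_t acct_mem_read(const struct acct_mem_sketch *a, uint64_t key)
{
	size_t i = eytz_find(a, key);
	uint64_t sum = 0;

	if (!i)
		return 0;
	for (unsigned s = 0; s < NR_SHARDS_SKETCH; s++)
		sum += a->counters[s][i];
	return sum;
}
```

For the sorted keys {10, 20, 30, 40, 50} the array ends up as
40, 20, 50, 10, 30 in slots 1 through 5. The point of the layout is
that the top levels of the tree sit next to each other in memory, so
lookups tend to be friendlier to the cache than a binary search over
the sorted array.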

Reviewers: do a "is this adequately documented, can I find my way
around, do things make sense", not line-by-line "does this have bugs".

Compatibility: this is in no way compatible with the old disk accounting
on disk format, and it's not feasible to write out accounting in the old
format - that means we have to regenerate accounting when upgrading or
downgrading past this version.

That should work more or less seamlessly with the most recent compat
bits (bch_sb_field downgrade, so we can tell older versions what
recovery passes to run and what to fix); additionally, userspace fsck
now checks if the kernel bcachefs version better matches the on disk
version than itself and if so uses the kernel fsck implementation with
the OFFLINE_FSCK ioctl - so we shouldn't be bouncing back and forth
between versions if your tools and kernel don't match.

upgrade/downgrade still need a bit more testing, but transparently using
kernel fsck is well tested as of latest versions.

but: 6.7 users (& possibly 6.8) beware, the sb_downgrade section is in
6.7 but BCH_IOCTL_OFFLINE_FSCK is not, and backporting that doesn't look
likely given current -stable process fiasco.

merge ETA - this stuff may make the next merge window; I'd like to get
per-snapshot-id accounting done with it, that should be the biggest item
left.

Cheers,
Kent

Kent Overstreet (21):
  bcachefs: KEY_TYPE_accounting
  bcachefs: Accumulate accounting keys in journal replay
  bcachefs: btree write buffer knows how to accumulate bch_accounting
    keys
  bcachefs: Disk space accounting rewrite
  bcachefs: dev_usage updated by new accounting
  bcachefs: Kill bch2_fs_usage_initialize()
  bcachefs: Convert bch2_ioctl_fs_usage() to new accounting
  bcachefs: kill bch2_fs_usage_read()
  bcachefs: Kill writing old accounting to journal
  bcachefs: Delete journal-buf-sharded old style accounting
  bcachefs: Kill bch2_fs_usage_to_text()
  bcachefs: Kill fs_usage_online
  bcachefs: Kill replicas_journal_res
  bcachefs: Convert gc to new accounting
  bcachefs: Convert bch2_replicas_gc2() to new accounting
  bcachefs: bch2_verify_accounting_clean()
  bcachefs: Eytzinger accumulation for accounting keys
  bcachefs: bch_acct_compression
  bcachefs: Convert bch2_compression_stats_to_text() to new accounting
  bcachefs: bch2_fs_accounting_to_text()
  bcachefs: bch2_fs_usage_base_to_text()

 fs/bcachefs/Makefile                   |   3 +-
 fs/bcachefs/alloc_background.c         | 137 +++--
 fs/bcachefs/alloc_background.h         |   2 +
 fs/bcachefs/bcachefs.h                 |  22 +-
 fs/bcachefs/bcachefs_format.h          |  81 +--
 fs/bcachefs/bcachefs_ioctl.h           |   7 +-
 fs/bcachefs/bkey_methods.c             |   1 +
 fs/bcachefs/btree_gc.c                 | 259 ++++------
 fs/bcachefs/btree_iter.c               |   9 -
 fs/bcachefs/btree_journal_iter.c       |  23 +-
 fs/bcachefs/btree_journal_iter.h       |  15 +
 fs/bcachefs/btree_trans_commit.c       |  71 ++-
 fs/bcachefs/btree_types.h              |   1 -
 fs/bcachefs/btree_update.h             |  22 +-
 fs/bcachefs/btree_write_buffer.c       | 120 ++++-
 fs/bcachefs/btree_write_buffer.h       |  50 +-
 fs/bcachefs/btree_write_buffer_types.h |   2 +
 fs/bcachefs/buckets.c                  | 663 ++++---------------------
 fs/bcachefs/buckets.h                  |  70 +--
 fs/bcachefs/buckets_types.h            |  14 +-
 fs/bcachefs/chardev.c                  |  75 +--
 fs/bcachefs/disk_accounting.c          | 584 ++++++++++++++++++++++
 fs/bcachefs/disk_accounting.h          | 203 ++++++++
 fs/bcachefs/disk_accounting_format.h   | 145 ++++++
 fs/bcachefs/disk_accounting_types.h    |  20 +
 fs/bcachefs/ec.c                       | 166 ++++---
 fs/bcachefs/inode.c                    |  42 +-
 fs/bcachefs/journal_io.c               |  13 +-
 fs/bcachefs/recovery.c                 | 126 +++--
 fs/bcachefs/recovery_types.h           |   1 +
 fs/bcachefs/replicas.c                 | 242 ++-------
 fs/bcachefs/replicas.h                 |  16 +-
 fs/bcachefs/replicas_format.h          |  21 +
 fs/bcachefs/replicas_types.h           |  16 -
 fs/bcachefs/sb-clean.c                 |  62 ---
 fs/bcachefs/sb-downgrade.c             |  12 +-
 fs/bcachefs/sb-errors_types.h          |   4 +-
 fs/bcachefs/super.c                    |  74 ++-
 fs/bcachefs/sysfs.c                    | 109 ++--
 39 files changed, 1873 insertions(+), 1630 deletions(-)
 create mode 100644 fs/bcachefs/disk_accounting.c
 create mode 100644 fs/bcachefs/disk_accounting.h
 create mode 100644 fs/bcachefs/disk_accounting_format.h
 create mode 100644 fs/bcachefs/disk_accounting_types.h
 create mode 100644 fs/bcachefs/replicas_format.h

-- 
2.43.0



end of thread, other threads:[~2024-03-01 19:30 UTC | newest]

Thread overview: 39+ messages
2024-02-25  2:38 [PATCH 00/21] bcachefs disk accounting rewrite Kent Overstreet
2024-02-25  2:38 ` [PATCH 01/21] bcachefs: KEY_TYPE_accounting Kent Overstreet
2024-02-27 15:49   ` Brian Foster
2024-02-28 19:39     ` Kent Overstreet
2024-02-29 18:43       ` Brian Foster
2024-02-29 21:24         ` Kent Overstreet
2024-03-01 15:03           ` Brian Foster
2024-03-01 19:30             ` Kent Overstreet
2024-02-25  2:38 ` [PATCH 02/21] bcachefs: Accumulate accounting keys in journal replay Kent Overstreet
2024-02-27 15:49   ` Brian Foster
2024-02-28 20:06     ` Kent Overstreet
2024-02-25  2:38 ` [PATCH 03/21] bcachefs: btree write buffer knows how to accumulate bch_accounting keys Kent Overstreet
2024-02-27 15:50   ` Brian Foster
2024-02-28 22:42     ` Kent Overstreet
2024-02-29 18:44       ` Brian Foster
2024-02-29 20:25         ` Kent Overstreet
2024-02-25  2:38 ` [PATCH 04/21] bcachefs: Disk space accounting rewrite Kent Overstreet
2024-02-27 15:55   ` Brian Foster
2024-02-29  4:10     ` Kent Overstreet
2024-02-29 18:44       ` Brian Foster
2024-02-29 21:16         ` Kent Overstreet
2024-03-01 15:03           ` Brian Foster
2024-02-25  2:38 ` [PATCH 05/21] bcachefs: dev_usage updated by new accounting Kent Overstreet
2024-02-25  2:38 ` [PATCH 06/21] bcachefs: Kill bch2_fs_usage_initialize() Kent Overstreet
2024-02-25  2:38 ` [PATCH 07/21] bcachefs: Convert bch2_ioctl_fs_usage() to new accounting Kent Overstreet
2024-02-25  2:38 ` [PATCH 08/21] bcachefs: kill bch2_fs_usage_read() Kent Overstreet
2024-02-25  2:38 ` [PATCH 09/21] bcachefs: Kill writing old accounting to journal Kent Overstreet
2024-02-25  2:38 ` [PATCH 10/21] bcachefs: Delete journal-buf-sharded old style accounting Kent Overstreet
2024-02-25  2:38 ` [PATCH 11/21] bcachefs: Kill bch2_fs_usage_to_text() Kent Overstreet
2024-02-25  2:38 ` [PATCH 12/21] bcachefs: Kill fs_usage_online Kent Overstreet
2024-02-25  2:38 ` [PATCH 13/21] bcachefs: Kill replicas_journal_res Kent Overstreet
2024-02-25  2:38 ` [PATCH 14/21] bcachefs: Convert gc to new accounting Kent Overstreet
2024-02-25  2:38 ` [PATCH 15/21] bcachefs: Convert bch2_replicas_gc2() " Kent Overstreet
2024-02-25  2:38 ` [PATCH 16/21] bcachefs: bch2_verify_accounting_clean() Kent Overstreet
2024-02-25  2:38 ` [PATCH 17/21] bcachefs: Eytzinger accumulation for accounting keys Kent Overstreet
2024-02-25  2:38 ` [PATCH 18/21] bcachefs: bch_acct_compression Kent Overstreet
2024-02-25  2:38 ` [PATCH 19/21] bcachefs: Convert bch2_compression_stats_to_text() to new accounting Kent Overstreet
2024-02-25  2:38 ` [PATCH 20/21] bcachefs: bch2_fs_accounting_to_text() Kent Overstreet
2024-02-25  2:38 ` [PATCH 21/21] bcachefs: bch2_fs_usage_base_to_text() Kent Overstreet
