From: Brian Foster <bfoster@redhat.com>
To: Kent Overstreet <kent.overstreet@linux.dev>
Cc: linux-bcachefs@vger.kernel.org, linux-kernel@vger.kernel.org,
djwong@kernel.org
Subject: Re: [PATCH 01/21] bcachefs: KEY_TYPE_accounting
Date: Tue, 27 Feb 2024 10:49:19 -0500 [thread overview]
Message-ID: <Zd4Ef49kHX3g69VT@bfoster> (raw)
In-Reply-To: <20240225023826.2413565-2-kent.overstreet@linux.dev>
On Sat, Feb 24, 2024 at 09:38:03PM -0500, Kent Overstreet wrote:
> New key type for the disk space accounting rewrite.
>
> - Holds a variable sized array of u64s (may be more than one for
> accounting e.g. compressed and uncompressed size, or buckets and
> sectors for a given data type)
>
> - Updates are deltas, not new versions of the key: this means updates
> to accounting can happen via the btree write buffer, which we'll be
> teaching to accumulate deltas.
>
> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
> ---
> fs/bcachefs/Makefile | 3 +-
> fs/bcachefs/bcachefs.h | 1 +
> fs/bcachefs/bcachefs_format.h | 80 +++------------
> fs/bcachefs/bkey_methods.c | 1 +
> fs/bcachefs/disk_accounting.c | 70 ++++++++++++++
> fs/bcachefs/disk_accounting.h | 52 ++++++++++
> fs/bcachefs/disk_accounting_format.h | 139 +++++++++++++++++++++++++++
> fs/bcachefs/replicas_format.h | 21 ++++
> fs/bcachefs/sb-downgrade.c | 12 ++-
> fs/bcachefs/sb-errors_types.h | 3 +-
> 10 files changed, 311 insertions(+), 71 deletions(-)
> create mode 100644 fs/bcachefs/disk_accounting.c
> create mode 100644 fs/bcachefs/disk_accounting.h
> create mode 100644 fs/bcachefs/disk_accounting_format.h
> create mode 100644 fs/bcachefs/replicas_format.h
>
...
> diff --git a/fs/bcachefs/disk_accounting_format.h b/fs/bcachefs/disk_accounting_format.h
> new file mode 100644
> index 000000000000..e06a42f0d578
> --- /dev/null
> +++ b/fs/bcachefs/disk_accounting_format.h
> @@ -0,0 +1,139 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _BCACHEFS_DISK_ACCOUNTING_FORMAT_H
> +#define _BCACHEFS_DISK_ACCOUNTING_FORMAT_H
> +
> +#include "replicas_format.h"
> +
> +/*
> + * Disk accounting - KEY_TYPE_accounting - on disk format:
> + *
> + * Here, the key has considerably more structure than a typical key (bpos); an
> + * accounting key is 'struct disk_accounting_key', which is a union of bpos.
> + *
First impression.. I'm a little confused why the key type is a union of
bpos. I'm possibly missing something fundamental/obvious, but could you
elaborate more on why that is here?
Brian
> + * This is a type-tagged union of all our various subtypes; a disk accounting
> + * key can be device counters, replicas counters, et cetera - it's extensible.
> + *
> + * The value is a list of u64s or s64s; the number of counters is specific to a
> + * given accounting type.
> + *
> + * Unlike with other key types, updates are _deltas_, and the deltas are not
> + * resolved until the update to the underlying btree, done by btree write buffer
> + * flush or journal replay.
> + *
> + * Journal replay in particular requires special handling. The journal tracks a
> + * range of entries which may possibly have not yet been applied to the btree
> + * yet - it does not know definitively whether individual entries are dirty and
> + * still need to be applied.
> + *
> + * To handle this, we use the version field of struct bkey, and give every
> + * accounting update a unique version number - a total ordering in time; the
> + * version number is derived from the key's position in the journal. Then
> + * journal replay can compare the version number of the key from the journal
> + * with the version number of the key in the btree to determine if a key needs
> + * to be replayed.
> + *
> + * For this to work, we must maintain this strict time ordering of updates as
> + * they are flushed to the btree, both via write buffer flush and via journal
> + * replay. This has complications for the write buffer code while journal replay
> + * is still in progress; the write buffer cannot flush any accounting keys to
> + * the btree until journal replay has finished replaying its accounting keys, or
> + * the (newer) version number of the keys from the write buffer will cause
> + * updates from journal replay to be lost.
> + */
> +
> +struct bch_accounting {
> + struct bch_val v;
> + __u64 d[];
> +};
> +
> +#define BCH_ACCOUNTING_MAX_COUNTERS 3
> +
> +#define BCH_DATA_TYPES() \
> + x(free, 0) \
> + x(sb, 1) \
> + x(journal, 2) \
> + x(btree, 3) \
> + x(user, 4) \
> + x(cached, 5) \
> + x(parity, 6) \
> + x(stripe, 7) \
> + x(need_gc_gens, 8) \
> + x(need_discard, 9)
> +
> +enum bch_data_type {
> +#define x(t, n) BCH_DATA_##t,
> + BCH_DATA_TYPES()
> +#undef x
> + BCH_DATA_NR
> +};
> +
> +static inline bool data_type_is_empty(enum bch_data_type type)
> +{
> + switch (type) {
> + case BCH_DATA_free:
> + case BCH_DATA_need_gc_gens:
> + case BCH_DATA_need_discard:
> + return true;
> + default:
> + return false;
> + }
> +}
> +
> +static inline bool data_type_is_hidden(enum bch_data_type type)
> +{
> + switch (type) {
> + case BCH_DATA_sb:
> + case BCH_DATA_journal:
> + return true;
> + default:
> + return false;
> + }
> +}
> +
> +#define BCH_DISK_ACCOUNTING_TYPES() \
> + x(nr_inodes, 0) \
> + x(persistent_reserved, 1) \
> + x(replicas, 2) \
> + x(dev_data_type, 3) \
> + x(dev_stripe_buckets, 4)
> +
> +enum disk_accounting_type {
> +#define x(f, nr) BCH_DISK_ACCOUNTING_##f = nr,
> + BCH_DISK_ACCOUNTING_TYPES()
> +#undef x
> + BCH_DISK_ACCOUNTING_TYPE_NR,
> +};
> +
> +struct bch_nr_inodes {
> +};
> +
> +struct bch_persistent_reserved {
> + __u8 nr_replicas;
> +};
> +
> +struct bch_dev_data_type {
> + __u8 dev;
> + __u8 data_type;
> +};
> +
> +struct bch_dev_stripe_buckets {
> + __u8 dev;
> +};
> +
> +struct disk_accounting_key {
> + union {
> + struct {
> + __u8 type;
> + union {
> + struct bch_nr_inodes nr_inodes;
> + struct bch_persistent_reserved persistent_reserved;
> + struct bch_replicas_entry_v1 replicas;
> + struct bch_dev_data_type dev_data_type;
> + struct bch_dev_stripe_buckets dev_stripe_buckets;
> + };
> + };
> + struct bpos _pad;
> + };
> +};
> +
> +#endif /* _BCACHEFS_DISK_ACCOUNTING_FORMAT_H */
> diff --git a/fs/bcachefs/replicas_format.h b/fs/bcachefs/replicas_format.h
> new file mode 100644
> index 000000000000..ed94f8c636b3
> --- /dev/null
> +++ b/fs/bcachefs/replicas_format.h
> @@ -0,0 +1,21 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _BCACHEFS_REPLICAS_FORMAT_H
> +#define _BCACHEFS_REPLICAS_FORMAT_H
> +
> +struct bch_replicas_entry_v0 {
> + __u8 data_type;
> + __u8 nr_devs;
> + __u8 devs[];
> +} __packed;
> +
> +struct bch_replicas_entry_v1 {
> + __u8 data_type;
> + __u8 nr_devs;
> + __u8 nr_required;
> + __u8 devs[];
> +} __packed;
> +
> +#define replicas_entry_bytes(_i) \
> + (offsetof(typeof(*(_i)), devs) + (_i)->nr_devs)
> +
> +#endif /* _BCACHEFS_REPLICAS_FORMAT_H */
> diff --git a/fs/bcachefs/sb-downgrade.c b/fs/bcachefs/sb-downgrade.c
> index 3337419faeff..33db8d7ca8c4 100644
> --- a/fs/bcachefs/sb-downgrade.c
> +++ b/fs/bcachefs/sb-downgrade.c
> @@ -52,9 +52,15 @@
> BCH_FSCK_ERR_subvol_fs_path_parent_wrong) \
> x(btree_subvolume_children, \
> BIT_ULL(BCH_RECOVERY_PASS_check_subvols), \
> - BCH_FSCK_ERR_subvol_children_not_set)
> + BCH_FSCK_ERR_subvol_children_not_set) \
> + x(disk_accounting_v2, \
> + BIT_ULL(BCH_RECOVERY_PASS_check_allocations), \
> + BCH_FSCK_ERR_accounting_mismatch)
>
> -#define DOWNGRADE_TABLE()
> +#define DOWNGRADE_TABLE() \
> + x(disk_accounting_v2, \
> + BIT_ULL(BCH_RECOVERY_PASS_check_alloc_info), \
> + BCH_FSCK_ERR_dev_usage_buckets_wrong)
>
> struct upgrade_downgrade_entry {
> u64 recovery_passes;
> @@ -108,7 +114,7 @@ void bch2_sb_set_upgrade(struct bch_fs *c,
> }
> }
>
> -#define x(ver, passes, ...) static const u16 downgrade_ver_##errors[] = { __VA_ARGS__ };
> +#define x(ver, passes, ...) static const u16 downgrade_##ver##_errors[] = { __VA_ARGS__ };
> DOWNGRADE_TABLE()
> #undef x
>
> diff --git a/fs/bcachefs/sb-errors_types.h b/fs/bcachefs/sb-errors_types.h
> index 0df4b0e7071a..383e13711001 100644
> --- a/fs/bcachefs/sb-errors_types.h
> +++ b/fs/bcachefs/sb-errors_types.h
> @@ -264,7 +264,8 @@
> x(subvol_children_not_set, 256) \
> x(subvol_children_bad, 257) \
> x(subvol_loop, 258) \
> - x(subvol_unreachable, 259)
> + x(subvol_unreachable, 259) \
> + x(accounting_mismatch, 260)
>
> enum bch_sb_error_id {
> #define x(t, n) BCH_FSCK_ERR_##t = n,
> --
> 2.43.0
>
next prev parent reply other threads:[~2024-02-27 15:47 UTC|newest]
Thread overview: 39+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-02-25 2:38 [PATCH 00/21] bcachefs disk accounting rewrite Kent Overstreet
2024-02-25 2:38 ` [PATCH 01/21] bcachefs: KEY_TYPE_accounting Kent Overstreet
2024-02-27 15:49 ` Brian Foster [this message]
2024-02-28 19:39 ` Kent Overstreet
2024-02-29 18:43 ` Brian Foster
2024-02-29 21:24 ` Kent Overstreet
2024-03-01 15:03 ` Brian Foster
2024-03-01 19:30 ` Kent Overstreet
2024-02-25 2:38 ` [PATCH 02/21] bcachefs: Accumulate accounting keys in journal replay Kent Overstreet
2024-02-27 15:49 ` Brian Foster
2024-02-28 20:06 ` Kent Overstreet
2024-02-25 2:38 ` [PATCH 03/21] bcachefs: btree write buffer knows how to accumulate bch_accounting keys Kent Overstreet
2024-02-27 15:50 ` Brian Foster
2024-02-28 22:42 ` Kent Overstreet
2024-02-29 18:44 ` Brian Foster
2024-02-29 20:25 ` Kent Overstreet
2024-02-25 2:38 ` [PATCH 04/21] bcachefs: Disk space accounting rewrite Kent Overstreet
2024-02-27 15:55 ` Brian Foster
2024-02-29 4:10 ` Kent Overstreet
2024-02-29 18:44 ` Brian Foster
2024-02-29 21:16 ` Kent Overstreet
2024-03-01 15:03 ` Brian Foster
2024-02-25 2:38 ` [PATCH 05/21] bcachefs: dev_usage updated by new accounting Kent Overstreet
2024-02-25 2:38 ` [PATCH 06/21] bcachefs: Kill bch2_fs_usage_initialize() Kent Overstreet
2024-02-25 2:38 ` [PATCH 07/21] bcachefs: Convert bch2_ioctl_fs_usage() to new accounting Kent Overstreet
2024-02-25 2:38 ` [PATCH 08/21] bcachefs: kill bch2_fs_usage_read() Kent Overstreet
2024-02-25 2:38 ` [PATCH 09/21] bcachefs: Kill writing old accounting to journal Kent Overstreet
2024-02-25 2:38 ` [PATCH 10/21] bcachefs: Delete journal-buf-sharded old style accounting Kent Overstreet
2024-02-25 2:38 ` [PATCH 11/21] bcachefs: Kill bch2_fs_usage_to_text() Kent Overstreet
2024-02-25 2:38 ` [PATCH 12/21] bcachefs: Kill fs_usage_online Kent Overstreet
2024-02-25 2:38 ` [PATCH 13/21] bcachefs: Kill replicas_journal_res Kent Overstreet
2024-02-25 2:38 ` [PATCH 14/21] bcachefs: Convert gc to new accounting Kent Overstreet
2024-02-25 2:38 ` [PATCH 15/21] bcachefs: Convert bch2_replicas_gc2() " Kent Overstreet
2024-02-25 2:38 ` [PATCH 16/21] bcachefs: bch2_verify_accounting_clean() Kent Overstreet
2024-02-25 2:38 ` [PATCH 17/21] bcachefs: Eytzinger accumulation for accounting keys Kent Overstreet
2024-02-25 2:38 ` [PATCH 18/21] bcachefs: bch_acct_compression Kent Overstreet
2024-02-25 2:38 ` [PATCH 19/21] bcachefs: Convert bch2_compression_stats_to_text() to new accounting Kent Overstreet
2024-02-25 2:38 ` [PATCH 20/21] bcachefs: bch2_fs_accounting_to_text() Kent Overstreet
2024-02-25 2:38 ` [PATCH 21/21] bcachefs: bch2_fs_usage_base_to_text() Kent Overstreet
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Zd4Ef49kHX3g69VT@bfoster \
--to=bfoster@redhat.com \
--cc=djwong@kernel.org \
--cc=kent.overstreet@linux.dev \
--cc=linux-bcachefs@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox