* [PATCH 01/12] bcachefs: Relax restrictions on the number of accounting counters
2025-08-26 22:49 [PATCH 00/12] Accounting for accurate progress reporting Nikita Ofitserov via B4 Relay
@ 2025-08-26 22:49 ` Nikita Ofitserov via B4 Relay
2025-08-26 22:49 ` [PATCH 02/12] bcachefs: Introduce btree node number accounting Nikita Ofitserov via B4 Relay
` (11 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: Nikita Ofitserov via B4 Relay @ 2025-08-26 22:49 UTC
To: Kent Overstreet; +Cc: linux-bcachefs, Nikita Ofitserov
From: Nikita Ofitserov <himikof@gmail.com>
Make adding/removing accounting counters easier when upgrading by
removing their exact number from bch_accounting key validity invariants.
Signed-off-by: Nikita Ofitserov <himikof@gmail.com>
---
fs/bcachefs/disk_accounting.c | 11 ++++++++---
fs/bcachefs/disk_accounting.h | 4 ++--
2 files changed, 10 insertions(+), 5 deletions(-)
diff --git a/fs/bcachefs/disk_accounting.c b/fs/bcachefs/disk_accounting.c
index d6c91abcdc4140ef98b407fa580b7cd0cf3b8f56..53080f4449737ff384d56a3a150b376dadbfdc73 100644
--- a/fs/bcachefs/disk_accounting.c
+++ b/fs/bcachefs/disk_accounting.c
@@ -239,10 +239,12 @@ int bch2_accounting_validate(struct bch_fs *c, struct bkey_s_c k,
c, accounting_key_junk_at_end,
"junk at end of accounting key");
- bkey_fsck_err_on(bch2_accounting_counters(k.k) != bch2_accounting_type_nr_counters[acc_k.type],
+ const unsigned nr_counters = bch2_accounting_counters(k.k);
+
+ bkey_fsck_err_on(!nr_counters || nr_counters > BCH_ACCOUNTING_MAX_COUNTERS,
c, accounting_key_nr_counters_wrong,
"accounting key with %u counters, should be %u",
- bch2_accounting_counters(k.k), bch2_accounting_type_nr_counters[acc_k.type]);
+ nr_counters, bch2_accounting_type_nr_counters[acc_k.type]);
fsck_err:
return ret;
}
@@ -359,10 +361,13 @@ static int __bch2_accounting_mem_insert(struct bch_fs *c, struct bkey_s_c_accoun
accounting_pos_cmp, &a.k->p) < acc->k.nr)
return 0;
+ struct disk_accounting_pos acc_k;
+ bpos_to_disk_accounting_pos(&acc_k, a.k->p);
+
struct accounting_mem_entry n = {
.pos = a.k->p,
.bversion = a.k->bversion,
- .nr_counters = bch2_accounting_counters(a.k),
+ .nr_counters = bch2_accounting_type_nr_counters[acc_k.type],
.v[0] = __alloc_percpu_gfp(n.nr_counters * sizeof(u64),
sizeof(u64), GFP_KERNEL),
};
diff --git a/fs/bcachefs/disk_accounting.h b/fs/bcachefs/disk_accounting.h
index cc73cce98a447d96ad40683b88bfbc12689d8bf2..9e46aaf74704eaf51a643c3e746a440abd175b9e 100644
--- a/fs/bcachefs/disk_accounting.h
+++ b/fs/bcachefs/disk_accounting.h
@@ -212,9 +212,9 @@ static inline int bch2_accounting_mem_mod_locked(struct btree_trans *trans,
struct accounting_mem_entry *e = &acc->k.data[idx];
- EBUG_ON(bch2_accounting_counters(a.k) != e->nr_counters);
+ const unsigned nr = min_t(unsigned, bch2_accounting_counters(a.k), e->nr_counters);
- for (unsigned i = 0; i < bch2_accounting_counters(a.k); i++)
+ for (unsigned i = 0; i < nr; i++)
this_cpu_add(e->v[gc][i], a.v->d[i]);
return 0;
}
--
2.50.1
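
The relaxed handling in this patch can be illustrated outside the kernel. The following is a minimal sketch, not the bcachefs code: MAX_COUNTERS, key_nr_counters_valid() and apply_deltas() are hypothetical names standing in for BCH_ACCOUNTING_MAX_COUNTERS, the new validity check, and the min_t()-bounded update loop.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical upper bound, standing in for BCH_ACCOUNTING_MAX_COUNTERS. */
#define MAX_COUNTERS 3

/* Relaxed validity check: any non-zero count up to the maximum is accepted. */
static int key_nr_counters_valid(unsigned nr_counters)
{
	return nr_counters != 0 && nr_counters <= MAX_COUNTERS;
}

/*
 * Add a key's deltas into an in-memory entry. Only the counters present on
 * both sides are touched; extra counters on either side are left alone.
 * Returns the number of counters actually updated.
 */
static unsigned apply_deltas(int64_t *mem, unsigned mem_nr,
			     const int64_t *key, unsigned key_nr)
{
	assert(key_nr_counters_valid(key_nr));

	unsigned nr = key_nr < mem_nr ? key_nr : mem_nr;

	for (unsigned i = 0; i < nr; i++)
		mem[i] += key[i];
	return nr;
}
```

With this shape, a key written with one counter can be applied to an entry sized for three, and the reverse, without tripping an assertion on the exact count.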
* [PATCH 02/12] bcachefs: Introduce btree node number accounting
From: Nikita Ofitserov via B4 Relay @ 2025-08-26 22:49 UTC
To: Kent Overstreet; +Cc: linux-bcachefs, Nikita Ofitserov
From: Nikita Ofitserov <himikof@gmail.com>
Add 2 new counters for BCH_DISK_ACCOUNTING_btree: total number of btree
nodes (ignoring replication) and the number of non-leaf btree nodes
(likewise). Those are to be used by recovery progress reporting instead
of estimating them.
This commit is missing the required upgrade/downgrade entries and
should be included only together with other accounting updates!
Signed-off-by: Nikita Ofitserov <himikof@gmail.com>
---
fs/bcachefs/buckets.c | 16 +++++++++++-----
fs/bcachefs/disk_accounting_format.h | 10 +++++++++-
2 files changed, 20 insertions(+), 6 deletions(-)
diff --git a/fs/bcachefs/buckets.c b/fs/bcachefs/buckets.c
index 021f5cb7998de704be9d135064af2817cb2c52fe..99e928f7799971d052f24ba6043eb54765cf42b9 100644
--- a/fs/bcachefs/buckets.c
+++ b/fs/bcachefs/buckets.c
@@ -749,6 +749,7 @@ static int __trigger_extent(struct btree_trans *trans,
enum btree_iter_update_trigger_flags flags)
{
bool gc = flags & BTREE_TRIGGER_gc;
+ bool insert = !(flags & BTREE_TRIGGER_overwrite);
struct bkey_ptrs_c ptrs = bch2_bkey_ptrs_c(k);
const union bch_extent_entry *entry;
struct extent_ptr_decoded p;
@@ -802,7 +803,7 @@ static int __trigger_extent(struct btree_trans *trans,
if (cur_compression_type &&
cur_compression_type != p.crc.compression_type) {
- if (flags & BTREE_TRIGGER_overwrite)
+ if (!insert)
bch2_u64s_neg(compression_acct, ARRAY_SIZE(compression_acct));
ret = bch2_disk_accounting_mod2(trans, gc, compression_acct,
@@ -835,7 +836,7 @@ static int __trigger_extent(struct btree_trans *trans,
}
if (cur_compression_type) {
- if (flags & BTREE_TRIGGER_overwrite)
+ if (!insert)
bch2_u64s_neg(compression_acct, ARRAY_SIZE(compression_acct));
ret = bch2_disk_accounting_mod2(trans, gc, compression_acct,
@@ -845,12 +846,17 @@ static int __trigger_extent(struct btree_trans *trans,
}
if (level) {
- ret = bch2_disk_accounting_mod2_nr(trans, gc, &replicas_sectors, 1, btree, btree_id);
+ const bool leaf_node = level == 1;
+ s64 v[3] = {
+ replicas_sectors,
+ insert ? 1 : -1,
+ !leaf_node ? (insert ? 1 : -1) : 0,
+ };
+
+ ret = bch2_disk_accounting_mod2(trans, gc, v, btree, btree_id);
if (ret)
return ret;
} else {
- bool insert = !(flags & BTREE_TRIGGER_overwrite);
-
s64 v[3] = {
insert ? 1 : -1,
insert ? k.k->size : -((s64) k.k->size),
diff --git a/fs/bcachefs/disk_accounting_format.h b/fs/bcachefs/disk_accounting_format.h
index 8269af1dbe2a094454f780194f4ece33c4a4e461..730a17ea42431012282cec9d7803b0ac0b1d339d 100644
--- a/fs/bcachefs/disk_accounting_format.h
+++ b/fs/bcachefs/disk_accounting_format.h
@@ -108,7 +108,7 @@ static inline bool data_type_is_hidden(enum bch_data_type type)
x(dev_data_type, 3, 3) \
x(compression, 4, 3) \
x(snapshot, 5, 1) \
- x(btree, 6, 1) \
+ x(btree, 6, 3) \
x(rebalance_work, 7, 1) \
x(inum, 8, 3)
@@ -174,6 +174,14 @@ struct bch_acct_snapshot {
__u32 id;
} __packed;
+/*
+ * Metadata accounting per btree id:
+ * [
+ * total btree disk usage in sectors
+ * total number of btree nodes
+ * number of non-leaf btree nodes
+ * ]
+ */
struct bch_acct_btree {
__u32 id;
} __packed;
--
2.50.1
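
The three-counter layout described in the new comment can be sketched in isolation. This is an illustration under assumptions, not the trigger code itself: btree_node_deltas() is a hypothetical helper, and it follows the patch in treating a key at level 1 as pointing to a leaf node.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Fill v[] with the deltas for one btree node pointer key:
 *   v[0] sectors (passed through unchanged from the caller),
 *   v[1] total node count, v[2] non-leaf node count.
 * `level` is the level of the key; as in the patch, level 1 means the
 * pointed-to node is a leaf.
 */
static void btree_node_deltas(int64_t v[3], int64_t replicas_sectors,
			      unsigned level, bool insert)
{
	assert(level >= 1);	/* this sketch only covers the `if (level)` branch */

	const int64_t one = insert ? 1 : -1;
	const bool leaf_node = level == 1;

	v[0] = replicas_sectors;
	v[1] = one;
	v[2] = leaf_node ? 0 : one;
}
```

An insert of a leaf pointer bumps only the total, while an overwrite of an interior pointer decrements both node counters.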
* [PATCH 03/12] bcachefs: Use explicit node counts in progress reporting
From: Nikita Ofitserov via B4 Relay @ 2025-08-26 22:49 UTC
To: Kent Overstreet; +Cc: linux-bcachefs, Nikita Ofitserov
From: Nikita Ofitserov <himikof@gmail.com>
Also consider the metadata_replicas option when better
accounting is not available.
Signed-off-by: Nikita Ofitserov <himikof@gmail.com>
---
fs/bcachefs/progress.c | 24 +++++++++++++++++++++---
1 file changed, 21 insertions(+), 3 deletions(-)
diff --git a/fs/bcachefs/progress.c b/fs/bcachefs/progress.c
index 792fc6fef27018c73168c59857e7f3497c1969f4..3ad4e1f6f653c8c75205efdd6d72560b8dda4c51 100644
--- a/fs/bcachefs/progress.c
+++ b/fs/bcachefs/progress.c
@@ -12,6 +12,10 @@ void bch2_progress_init(struct progress_indicator_state *s,
s->next_print = jiffies + HZ * 10;
+ /* This is only an estimation: nodes can have different replica counts */
+ const u32 expected_node_disk_sectors =
+ READ_ONCE(c->opts.metadata_replicas) * btree_sectors(c);
+
for (unsigned i = 0; i < BTREE_ID_NR; i++) {
if (!(btree_id_mask & BIT_ULL(i)))
continue;
@@ -19,9 +23,23 @@ void bch2_progress_init(struct progress_indicator_state *s,
struct disk_accounting_pos acc;
disk_accounting_key_init(acc, btree, .id = i);
- u64 v;
- bch2_accounting_mem_read(c, disk_accounting_pos_to_bpos(&acc), &v, 1);
- s->nodes_total += div64_ul(v, btree_sectors(c));
+ struct {
+ u64 disk_sectors;
+ u64 total_nodes;
+ u64 inner_nodes;
+ } v = {0};
+ bch2_accounting_mem_read(c, disk_accounting_pos_to_bpos(&acc),
+ (u64 *)&v, sizeof(v) / sizeof(u64));
+
+ /*
+ * We check for zeros to degrade gracefully when run
+ * with un-upgraded accounting info (missing some counters).
+ */
+
+ if (v.total_nodes != 0)
+ s->nodes_total += v.total_nodes - v.inner_nodes;
+ else
+ s->nodes_total += div_u64(v.disk_sectors, expected_node_disk_sectors);
}
}
--
2.50.1
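
The fallback arithmetic in this patch can be checked with a small standalone sketch. The struct and function names below are hypothetical; metadata_replicas and node_sectors stand in for c->opts.metadata_replicas and btree_sectors(c).

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the three counters read for one btree id. */
struct btree_counters {
	uint64_t disk_sectors;
	uint64_t total_nodes;
	uint64_t inner_nodes;
};

/*
 * Leaf node count for one btree: exact when the node counters are present,
 * otherwise estimated from disk usage divided by the expected on-disk size
 * of one node (replicas * sectors per node).
 */
static uint64_t leaf_nodes_estimate(struct btree_counters v,
				    uint32_t metadata_replicas,
				    uint32_t node_sectors)
{
	if (v.total_nodes != 0)
		return v.total_nodes - v.inner_nodes;

	uint64_t per_node = (uint64_t)metadata_replicas * node_sectors;

	assert(per_node != 0);	/* guard the division in this sketch */
	return v.disk_sectors / per_node;
}
```

For example, 4096 sectors of btree data with 2 replicas and 512-sector nodes is estimated as 4 nodes when the explicit counters are still zero.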
* [PATCH 04/12] bcachefs: Introduce btree_leaf_has_triggers_mask
From: Nikita Ofitserov via B4 Relay @ 2025-08-26 22:49 UTC
To: Kent Overstreet; +Cc: linux-bcachefs, Nikita Ofitserov
From: Nikita Ofitserov <himikof@gmail.com>
Signed-off-by: Nikita Ofitserov <himikof@gmail.com>
---
fs/bcachefs/btree_gc.c | 2 +-
fs/bcachefs/btree_types.h | 3 +++
2 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/fs/bcachefs/btree_gc.c b/fs/bcachefs/btree_gc.c
index 43f294284d57473fd8868c974c4bdd847fcf2636..2b1b6b7ffdf0af7ec0bf4cbaf68cc9a53a20cfce 100644
--- a/fs/bcachefs/btree_gc.c
+++ b/fs/bcachefs/btree_gc.c
@@ -720,7 +720,7 @@ static int bch2_gc_btree(struct btree_trans *trans,
enum btree_id btree, bool initial)
{
struct bch_fs *c = trans->c;
- unsigned target_depth = btree_node_type_has_triggers(__btree_node_type(0, btree)) ? 0 : 1;
+ unsigned target_depth = BIT_ULL(btree) & btree_leaf_has_triggers_mask ? 0 : 1;
int ret = 0;
/* We need to make sure every leaf node is readable before going RW */
diff --git a/fs/bcachefs/btree_types.h b/fs/bcachefs/btree_types.h
index e893eb938bb3da441f0ab0096ced3cc3c2a701e2..ad9bd18fe9b6e51987da74d373830dca98df56ec 100644
--- a/fs/bcachefs/btree_types.h
+++ b/fs/bcachefs/btree_types.h
@@ -840,6 +840,9 @@ static inline bool btree_node_type_has_triggers(enum btree_node_type type)
return BIT_ULL(type) & BTREE_NODE_TYPE_HAS_TRIGGERS;
}
+/* A mask of btree id bits that have triggers for their leaves */
+static const u64 btree_leaf_has_triggers_mask = BTREE_NODE_TYPE_HAS_TRIGGERS >> 1;
+
static const u64 btree_is_extents_mask = 0
#define x(name, nr, flags, ...) |((!!((flags) & BTREE_IS_extents)) << nr)
BCH_BTREE_IDS()
--
2.50.1
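
The `>> 1` in btree_leaf_has_triggers_mask depends on how node types are numbered. The sketch below assumes (this is not shown in the patch) that node type 0 stands for interior nodes and that the leaf node type of btree id N is N + 1; all function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed numbering: node type 0 is "interior", leaf of btree id N is N + 1. */
static unsigned leaf_node_type(unsigned btree_id)
{
	return btree_id + 1;
}

/* Dropping bit 0 re-indexes a node-type mask by btree id. */
static uint64_t leaf_mask_from_node_type_mask(uint64_t node_type_mask)
{
	return node_type_mask >> 1;
}

static bool btree_leaf_has_triggers(uint64_t node_type_mask, unsigned btree_id)
{
	assert(btree_id < 63);
	return (leaf_mask_from_node_type_mask(node_type_mask) >> btree_id) & 1;
}
```

Under that assumption, testing `BIT_ULL(btree)` against the shifted mask is equivalent to testing the leaf node type against the original mask.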
* [PATCH 05/12] bcachefs: Better progress reporting for btree iteration without leaves
From: Nikita Ofitserov via B4 Relay @ 2025-08-26 22:49 UTC
To: Kent Overstreet; +Cc: linux-bcachefs, Nikita Ofitserov
From: Nikita Ofitserov <himikof@gmail.com>
Signed-off-by: Nikita Ofitserov <himikof@gmail.com>
---
fs/bcachefs/progress.c | 17 +++++++++++++----
fs/bcachefs/progress.h | 12 +++++++++++-
2 files changed, 24 insertions(+), 5 deletions(-)
diff --git a/fs/bcachefs/progress.c b/fs/bcachefs/progress.c
index 3ad4e1f6f653c8c75205efdd6d72560b8dda4c51..013eca7d6017a81390001519f20bfc4d0c6bc173 100644
--- a/fs/bcachefs/progress.c
+++ b/fs/bcachefs/progress.c
@@ -4,9 +4,10 @@
#include "disk_accounting.h"
#include "progress.h"
-void bch2_progress_init(struct progress_indicator_state *s,
- struct bch_fs *c,
- u64 btree_id_mask)
+void bch2_progress_init_inner(struct progress_indicator_state *s,
+ struct bch_fs *c,
+ u64 leaf_btree_id_mask,
+ u64 inner_btree_id_mask)
{
memset(s, 0, sizeof(*s));
@@ -16,6 +17,8 @@ void bch2_progress_init(struct progress_indicator_state *s,
const u32 expected_node_disk_sectors =
READ_ONCE(c->opts.metadata_replicas) * btree_sectors(c);
+ const u64 btree_id_mask = leaf_btree_id_mask | inner_btree_id_mask;
+
for (unsigned i = 0; i < BTREE_ID_NR; i++) {
if (!(btree_id_mask & BIT_ULL(i)))
continue;
@@ -31,11 +34,17 @@ void bch2_progress_init(struct progress_indicator_state *s,
bch2_accounting_mem_read(c, disk_accounting_pos_to_bpos(&acc),
(u64 *)&v, sizeof(v) / sizeof(u64));
+ /* Better to estimate as 0 than the total node count */
+ if (inner_btree_id_mask & BIT_ULL(i))
+ s->nodes_total += v.inner_nodes;
+
+ if (!(leaf_btree_id_mask & BIT_ULL(i)))
+ continue;
+
/*
* We check for zeros to degrade gracefully when run
* with un-upgraded accounting info (missing some counters).
*/
-
if (v.total_nodes != 0)
s->nodes_total += v.total_nodes - v.inner_nodes;
else
diff --git a/fs/bcachefs/progress.h b/fs/bcachefs/progress.h
index 972a73087ffe06632abef015e4556d8ee196eb24..91f3453377093e2713aa4198b68e13601223688e 100644
--- a/fs/bcachefs/progress.h
+++ b/fs/bcachefs/progress.h
@@ -20,7 +20,17 @@ struct progress_indicator_state {
struct btree *last_node;
};
-void bch2_progress_init(struct progress_indicator_state *, struct bch_fs *, u64);
+void bch2_progress_init_inner(struct progress_indicator_state *s,
+ struct bch_fs *c,
+ u64 leaf_btree_id_mask,
+ u64 inner_btree_id_mask);
+
+static inline void bch2_progress_init(struct progress_indicator_state *s,
+ struct bch_fs *c, u64 btree_id_mask)
+{
+ bch2_progress_init_inner(s, c, btree_id_mask, 0);
+}
+
void bch2_progress_update_iter(struct btree_trans *,
struct progress_indicator_state *,
struct btree_iter *,
--
2.50.1
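
How the two masks combine into nodes_total can be sketched as follows. This is a simplified illustration with hypothetical names (struct node_counts, nodes_total()); it assumes the explicit counters are available and leaves out the disk-sector fallback.

```c
#include <assert.h>
#include <stdint.h>

/* Per-btree node counters, indexed by btree id. */
struct node_counts {
	uint64_t total_nodes;
	uint64_t inner_nodes;
};

/*
 * Sum the nodes a pass is expected to visit: interior nodes for btrees in
 * inner_mask, leaf nodes (total minus interior) for btrees in leaf_mask.
 * A btree present in both masks contributes all of its nodes.
 */
static uint64_t nodes_total(const struct node_counts *counts, unsigned nr_btrees,
			    uint64_t leaf_mask, uint64_t inner_mask)
{
	uint64_t total = 0;

	assert(nr_btrees <= 64);

	for (unsigned i = 0; i < nr_btrees; i++) {
		if (inner_mask & (1ULL << i))
			total += counts[i].inner_nodes;
		if (leaf_mask & (1ULL << i))
			total += counts[i].total_nodes - counts[i].inner_nodes;
	}
	return total;
}
```

Passing the same mask for both arguments counts every node, while a zero leaf mask counts interior nodes only, which matches the wrapper bch2_progress_init() passing 0 as the inner mask for leaf-only walks.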
* [PATCH 06/12] bcachefs: Refactor/rename btree_type_has_ptrs
From: Nikita Ofitserov via B4 Relay @ 2025-08-26 22:49 UTC
To: Kent Overstreet; +Cc: linux-bcachefs, Nikita Ofitserov
From: Nikita Ofitserov <himikof@gmail.com>
The new name (btree_type_has_data_ptrs) better reflects the meaning.
Also introduce the explicit mask constant.
Signed-off-by: Nikita Ofitserov <himikof@gmail.com>
---
fs/bcachefs/backpointers.c | 2 +-
fs/bcachefs/btree_gc.c | 2 +-
fs/bcachefs/btree_types.h | 8 ++++----
fs/bcachefs/migrate.c | 2 +-
fs/bcachefs/move.c | 2 +-
5 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/fs/bcachefs/backpointers.c b/fs/bcachefs/backpointers.c
index cb25cddb759b84bed48c653605ba195218f2140c..0d585e5662be3f02580558e9a590075ea73193d5 100644
--- a/fs/bcachefs/backpointers.c
+++ b/fs/bcachefs/backpointers.c
@@ -809,7 +809,7 @@ static int bch2_check_extents_to_backpointers_pass(struct btree_trans *trans,
for (enum btree_id btree_id = 0;
btree_id < btree_id_nr_alive(c);
btree_id++) {
- int level, depth = btree_type_has_ptrs(btree_id) ? 0 : 1;
+ int level, depth = btree_type_has_data_ptrs(btree_id) ? 0 : 1;
ret = commit_do(trans, NULL, NULL,
BCH_TRANS_COMMIT_no_enospc,
diff --git a/fs/bcachefs/btree_gc.c b/fs/bcachefs/btree_gc.c
index 2b1b6b7ffdf0af7ec0bf4cbaf68cc9a53a20cfce..006bdd5c90bc805219d4e38923135986425ccf17 100644
--- a/fs/bcachefs/btree_gc.c
+++ b/fs/bcachefs/btree_gc.c
@@ -1228,7 +1228,7 @@ int bch2_gc_gens(struct bch_fs *c)
}
for (unsigned i = 0; i < BTREE_ID_NR; i++)
- if (btree_type_has_ptrs(i)) {
+ if (btree_type_has_data_ptrs(i)) {
c->gc_gens_btree = i;
c->gc_gens_pos = POS_MIN;
diff --git a/fs/bcachefs/btree_types.h b/fs/bcachefs/btree_types.h
index ad9bd18fe9b6e51987da74d373830dca98df56ec..d06991c16d5c24c586e9449efa9553e2b8593bd4 100644
--- a/fs/bcachefs/btree_types.h
+++ b/fs/bcachefs/btree_types.h
@@ -886,15 +886,15 @@ static inline bool btree_type_has_snapshot_field(enum btree_id btree)
return BIT_ULL(btree) & mask;
}
-static inline bool btree_type_has_ptrs(enum btree_id btree)
-{
- const u64 mask = 0
+static const u64 btree_has_data_ptrs_mask = 0
#define x(name, nr, flags, ...) |((!!((flags) & BTREE_IS_data)) << nr)
BCH_BTREE_IDS()
#undef x
;
- return BIT_ULL(btree) & mask;
+static inline bool btree_type_has_data_ptrs(enum btree_id btree)
+{
+ return BIT_ULL(btree) & btree_has_data_ptrs_mask;
}
static inline bool btree_type_uses_write_buffer(enum btree_id btree)
diff --git a/fs/bcachefs/migrate.c b/fs/bcachefs/migrate.c
index 892990b4a6a6b16785e5030e5dc7cf15b05c0fc5..41657deb0e3807f4968272491faea09d05ec2299 100644
--- a/fs/bcachefs/migrate.c
+++ b/fs/bcachefs/migrate.c
@@ -122,7 +122,7 @@ static int bch2_dev_usrdata_drop(struct bch_fs *c,
CLASS(btree_trans, trans)(c);
for (unsigned id = 0; id < BTREE_ID_NR; id++) {
- if (!btree_type_has_ptrs(id))
+ if (!btree_type_has_data_ptrs(id))
continue;
/* Stripe keys have pointers, but are handled separately */
diff --git a/fs/bcachefs/move.c b/fs/bcachefs/move.c
index 62aeb54ef11b33814f2d06e5ec2787b656a67c30..c30e74a3f89450b08b3a8b89a863752611482c69 100644
--- a/fs/bcachefs/move.c
+++ b/fs/bcachefs/move.c
@@ -671,7 +671,7 @@ static int bch2_move_data(struct bch_fs *c,
unsigned min_depth_this_btree = min_depth;
/* Stripe keys have pointers, but are handled separately */
- if (!btree_type_has_ptrs(id) ||
+ if (!btree_type_has_data_ptrs(id) ||
id == BTREE_ID_stripes)
min_depth_this_btree = max(min_depth_this_btree, 1);
--
2.50.1
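
The mask constant introduced here is built with the X-macro pattern. The following self-contained sketch shows the same construction on a made-up list; EX_BTREE_IDS(), EX_IS_data, EX_IS_other and the ex_* names are hypothetical and do not reflect the real BCH_BTREE_IDS() table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flags and btree list, not the real bcachefs definitions. */
#define EX_IS_data	(1U << 0)
#define EX_IS_other	(1U << 1)

#define EX_BTREE_IDS()					\
	x(extents,	0, EX_IS_data)			\
	x(inodes,	1, 0)				\
	x(reflink,	2, EX_IS_data | EX_IS_other)

/* One bit per btree id, set when the entry carries EX_IS_data. */
static const uint64_t ex_has_data_ptrs_mask = 0
#define x(name, nr, flags) | ((uint64_t)(!!((flags) & EX_IS_data)) << (nr))
	EX_BTREE_IDS()
#undef x
	;

static inline bool ex_type_has_data_ptrs(unsigned btree)
{
	assert(btree < 64);
	return (ex_has_data_ptrs_mask >> btree) & 1;
}
```

Hoisting the mask out of the helper, as the patch does, lets callers use the whole bitmask directly while the per-id predicate stays a one-line test.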
* [PATCH 07/12] bcachefs: More accurate progress reporting for inner node iteration
From: Nikita Ofitserov via B4 Relay @ 2025-08-26 22:49 UTC
To: Kent Overstreet; +Cc: linux-bcachefs, Nikita Ofitserov
From: Nikita Ofitserov <himikof@gmail.com>
Signed-off-by: Nikita Ofitserov <himikof@gmail.com>
---
fs/bcachefs/backpointers.c | 4 +++-
fs/bcachefs/btree_gc.c | 13 +++++--------
fs/bcachefs/migrate.c | 13 +++++++++----
3 files changed, 17 insertions(+), 13 deletions(-)
diff --git a/fs/bcachefs/backpointers.c b/fs/bcachefs/backpointers.c
index 0d585e5662be3f02580558e9a590075ea73193d5..42370aebb7a442ee368df323eb1f4970f2e5f949 100644
--- a/fs/bcachefs/backpointers.c
+++ b/fs/bcachefs/backpointers.c
@@ -804,7 +804,9 @@ static int bch2_check_extents_to_backpointers_pass(struct btree_trans *trans,
struct progress_indicator_state progress;
int ret = 0;
- bch2_progress_init(&progress, trans->c, BIT_ULL(BTREE_ID_extents)|BIT_ULL(BTREE_ID_reflink));
+ bch2_progress_init_inner(&progress, trans->c,
+ btree_has_data_ptrs_mask,
+ ~0ULL);
for (enum btree_id btree_id = 0;
btree_id < btree_id_nr_alive(c);
diff --git a/fs/bcachefs/btree_gc.c b/fs/bcachefs/btree_gc.c
index 006bdd5c90bc805219d4e38923135986425ccf17..b89e12a3112868f175d028ce9745c8b4944ca747 100644
--- a/fs/bcachefs/btree_gc.c
+++ b/fs/bcachefs/btree_gc.c
@@ -717,16 +717,12 @@ static int bch2_gc_mark_key(struct btree_trans *trans, enum btree_id btree_id,
static int bch2_gc_btree(struct btree_trans *trans,
struct progress_indicator_state *progress,
- enum btree_id btree, bool initial)
+ enum btree_id btree, unsigned target_depth,
+ bool initial)
{
struct bch_fs *c = trans->c;
- unsigned target_depth = BIT_ULL(btree) & btree_leaf_has_triggers_mask ? 0 : 1;
int ret = 0;
- /* We need to make sure every leaf node is readable before going RW */
- if (initial)
- target_depth = 0;
-
for (unsigned level = target_depth; level < BTREE_MAX_DEPTH; level++) {
struct btree *prev = NULL;
struct btree_iter iter;
@@ -784,7 +780,7 @@ static int bch2_gc_btrees(struct bch_fs *c)
int ret = 0;
struct progress_indicator_state progress;
- bch2_progress_init(&progress, c, ~0ULL);
+ bch2_progress_init_inner(&progress, c, ~0ULL, ~0ULL);
enum btree_id ids[BTREE_ID_NR];
for (unsigned i = 0; i < BTREE_ID_NR; i++)
@@ -797,7 +793,8 @@ static int bch2_gc_btrees(struct bch_fs *c)
if (IS_ERR_OR_NULL(bch2_btree_id_root(c, btree)->b))
continue;
- ret = bch2_gc_btree(trans, &progress, btree, true);
+ /* We need to make sure every leaf node is readable before going RW */
+ ret = bch2_gc_btree(trans, &progress, btree, 0, true);
}
bch_err_fn(c, ret);
diff --git a/fs/bcachefs/migrate.c b/fs/bcachefs/migrate.c
index 41657deb0e3807f4968272491faea09d05ec2299..0ef576ec45c7310423cf7cdf902cad30dba257db 100644
--- a/fs/bcachefs/migrate.c
+++ b/fs/bcachefs/migrate.c
@@ -265,10 +265,15 @@ int bch2_dev_data_drop_by_backpointers(struct bch_fs *c, unsigned dev_idx, unsig
int bch2_dev_data_drop(struct bch_fs *c, unsigned dev_idx, unsigned flags)
{
struct progress_indicator_state progress;
+ int ret;
+
bch2_progress_init(&progress, c,
- BIT_ULL(BTREE_ID_extents)|
- BIT_ULL(BTREE_ID_reflink));
+ btree_has_data_ptrs_mask & ~BIT_ULL(BTREE_ID_stripes));
+
+ if ((ret = bch2_dev_usrdata_drop(c, &progress, dev_idx, flags)))
+ return ret;
+
+ bch2_progress_init_inner(&progress, c, 0, ~0ULL);
- return bch2_dev_usrdata_drop(c, &progress, dev_idx, flags) ?:
- bch2_dev_metadata_drop(c, &progress, dev_idx, flags);
+ return bch2_dev_metadata_drop(c, &progress, dev_idx, flags);
}
--
2.50.1
* [PATCH 08/12] bcachefs: Fix progress reporting for unknown btrees
From: Nikita Ofitserov via B4 Relay @ 2025-08-26 22:49 UTC
To: Kent Overstreet; +Cc: linux-bcachefs, Nikita Ofitserov
From: Nikita Ofitserov <himikof@gmail.com>
Signed-off-by: Nikita Ofitserov <himikof@gmail.com>
---
fs/bcachefs/progress.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/bcachefs/progress.c b/fs/bcachefs/progress.c
index 013eca7d6017a81390001519f20bfc4d0c6bc173..7cc16490ffa98fec34ff228cb56bd1af0f42da3d 100644
--- a/fs/bcachefs/progress.c
+++ b/fs/bcachefs/progress.c
@@ -19,7 +19,7 @@ void bch2_progress_init_inner(struct progress_indicator_state *s,
const u64 btree_id_mask = leaf_btree_id_mask | inner_btree_id_mask;
- for (unsigned i = 0; i < BTREE_ID_NR; i++) {
+ for (unsigned i = 0; i < btree_id_nr_alive(c); i++) {
if (!(btree_id_mask & BIT_ULL(i)))
continue;
--
2.50.1
* [PATCH 09/12] bcachefs: Partially fix old device removal with unknown btrees
From: Nikita Ofitserov via B4 Relay @ 2025-08-26 22:49 UTC
To: Kent Overstreet; +Cc: linux-bcachefs, Nikita Ofitserov
From: Nikita Ofitserov <himikof@gmail.com>
Handle removal of unknown metadata only; data pointed to by unknown
btrees cannot currently be supported.
Signed-off-by: Nikita Ofitserov <himikof@gmail.com>
---
fs/bcachefs/migrate.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/bcachefs/migrate.c b/fs/bcachefs/migrate.c
index 0ef576ec45c7310423cf7cdf902cad30dba257db..d67ca56d3f431980b104ea5528cce79be88f93fb 100644
--- a/fs/bcachefs/migrate.c
+++ b/fs/bcachefs/migrate.c
@@ -121,6 +121,7 @@ static int bch2_dev_usrdata_drop(struct bch_fs *c,
{
CLASS(btree_trans, trans)(c);
+ /* FIXME: this does not handle unknown btrees with data pointers */
for (unsigned id = 0; id < BTREE_ID_NR; id++) {
if (!btree_type_has_data_ptrs(id))
continue;
@@ -161,7 +162,7 @@ static int bch2_dev_metadata_drop(struct bch_fs *c,
bch2_bkey_buf_init(&k);
closure_init_stack(&cl);
- for (id = 0; id < BTREE_ID_NR; id++) {
+ for (id = 0; id < btree_id_nr_alive(c); id++) {
bch2_trans_node_iter_init(trans, &iter, id, POS_MIN, 0, 0,
BTREE_ITER_prefetch);
retry:
--
2.50.1
* [PATCH 10/12] bcachefs: Improve check_allocations pass speed not in fsck
From: Nikita Ofitserov via B4 Relay @ 2025-08-26 22:49 UTC
To: Kent Overstreet; +Cc: linux-bcachefs, Nikita Ofitserov
From: Nikita Ofitserov <himikof@gmail.com>
Skip reading unnecessary leaf btree nodes unless running fsck.
For example, this should speed up version upgrade/downgrade when
rebuilding accounting information.
Signed-off-by: Nikita Ofitserov <himikof@gmail.com>
---
fs/bcachefs/btree_gc.c | 17 +++++++++++++++--
1 file changed, 15 insertions(+), 2 deletions(-)
diff --git a/fs/bcachefs/btree_gc.c b/fs/bcachefs/btree_gc.c
index b89e12a3112868f175d028ce9745c8b4944ca747..c04e88ec5c0ac73c21f01db597b19d1b0c798299 100644
--- a/fs/bcachefs/btree_gc.c
+++ b/fs/bcachefs/btree_gc.c
@@ -793,8 +793,21 @@ static int bch2_gc_btrees(struct bch_fs *c)
if (IS_ERR_OR_NULL(bch2_btree_id_root(c, btree)->b))
continue;
- /* We need to make sure every leaf node is readable before going RW */
- ret = bch2_gc_btree(trans, &progress, btree, 0, true);
+
+ unsigned target_depth = BIT_ULL(btree) & btree_leaf_has_triggers_mask ? 0 : 1;
+
+ /*
+ * In fsck, we need to make sure every leaf node is readable
+ * before going RW, otherwise we can no longer rewind inside
+ * btree_lost_data to repair during the current fsck run.
+ *
+ * Otherwise, we can delay the repair to the next
+ * mount or offline fsck.
+ */
+ if (test_bit(BCH_FS_in_fsck, &c->flags))
+ target_depth = 0;
+
+ ret = bch2_gc_btree(trans, &progress, btree, target_depth, true);
}
bch_err_fn(c, ret);
--
2.50.1
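
The depth selection in this patch reduces to a small decision function. A minimal sketch, with gc_target_depth() as a hypothetical name and a plain bool standing in for test_bit(BCH_FS_in_fsck, &c->flags):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Lowest btree level the walk has to visit:
 *   0 (leaves included) when running fsck, or when the btree's leaves
 *     have triggers;
 *   1 (interior nodes only) otherwise.
 */
static unsigned gc_target_depth(uint64_t leaf_has_triggers_mask,
				unsigned btree, bool in_fsck)
{
	assert(btree < 64);

	if (in_fsck)
		return 0;
	return ((leaf_has_triggers_mask >> btree) & 1) ? 0 : 1;
}
```

Outside fsck, btrees whose leaves have no triggers are walked from level 1 upward, which is where the saved leaf reads come from.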
* [PATCH 11/12] bcachefs: Fix missing c->usage updates from early recovery
2025-08-26 22:49 [PATCH 00/12] Accounting for accurate progress reporting Nikita Ofitserov via B4 Relay
` (9 preceding siblings ...)
2025-08-26 22:49 ` [PATCH 10/12] bcachefs: Improve check_allocations pass speed not in fsck Nikita Ofitserov via B4 Relay
@ 2025-08-26 22:49 ` Nikita Ofitserov via B4 Relay
2025-08-26 22:49 ` [PATCH 12/12] bcachefs: Fix online hidden (sb+journal) data accounting Nikita Ofitserov via B4 Relay
2025-08-27 17:17 ` [PATCH 00/12] Accounting for accurate progress reporting Kent Overstreet
12 siblings, 0 replies; 14+ messages in thread
From: Nikita Ofitserov via B4 Relay @ 2025-08-26 22:49 UTC
To: Kent Overstreet; +Cc: linux-bcachefs, Nikita Ofitserov
From: Nikita Ofitserov <himikof@gmail.com>
Make do_bch2_trans_commit_to_journal_replay apply global FS usage
updates stored in trans->fs_usage_delta, too.
Signed-off-by: Nikita Ofitserov <himikof@gmail.com>
---
fs/bcachefs/btree_trans_commit.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/fs/bcachefs/btree_trans_commit.c b/fs/bcachefs/btree_trans_commit.c
index 5fa7f2f9f1e9d0ddd1d4675774321b519313f477..e3f069ad4002e2b007600fb41cd220d13386d37f 100644
--- a/fs/bcachefs/btree_trans_commit.c
+++ b/fs/bcachefs/btree_trans_commit.c
@@ -970,6 +970,7 @@ do_bch2_trans_commit_to_journal_replay(struct btree_trans *trans,
struct bkey_i *accounting;
retry:
+ memset(&trans->fs_usage_delta, 0, sizeof(trans->fs_usage_delta));
percpu_down_read(&c->mark_lock);
for (accounting = btree_trans_subbuf_base(trans, &trans->accounting);
accounting != btree_trans_subbuf_top(trans, &trans->accounting);
@@ -981,6 +982,8 @@ do_bch2_trans_commit_to_journal_replay(struct btree_trans *trans,
if (ret)
goto revert_fs_usage;
}
+ /* Only fatal errors are possible later, so no need to revert this */
+ bch2_trans_account_disk_usage_change(trans);
percpu_up_read(&c->mark_lock);
trans_for_each_update(trans, i) {
--
2.50.1
* [PATCH 12/12] bcachefs: Fix online hidden (sb+journal) data accounting
From: Nikita Ofitserov via B4 Relay @ 2025-08-26 22:49 UTC
To: Kent Overstreet; +Cc: linux-bcachefs, Nikita Ofitserov
From: Nikita Ofitserov <himikof@gmail.com>
Now the c->usage->hidden counters are kept up to date by the same
trigger-based mechanism as the other ones.
Signed-off-by: Nikita Ofitserov <himikof@gmail.com>
---
fs/bcachefs/disk_accounting.c | 12 ++++++++----
fs/bcachefs/disk_accounting.h | 10 +++++++---
2 files changed, 15 insertions(+), 7 deletions(-)
diff --git a/fs/bcachefs/disk_accounting.c b/fs/bcachefs/disk_accounting.c
index 53080f4449737ff384d56a3a150b376dadbfdc73..ea375b8bcebc33917eeaf754934815581232a97e 100644
--- a/fs/bcachefs/disk_accounting.c
+++ b/fs/bcachefs/disk_accounting.c
@@ -1064,13 +1064,17 @@ void bch2_verify_accounting_clean(struct bch_fs *c)
case BCH_DISK_ACCOUNTING_dev_data_type: {
{
guard(rcu)(); /* scoped guard is a loop, and doesn't play nicely with continue */
+ const enum bch_data_type data_type = acc_k.dev_data_type.data_type;
struct bch_dev *ca = bch2_dev_rcu_noerror(c, acc_k.dev_data_type.dev);
if (!ca)
continue;
- v[0] = percpu_u64_get(&ca->usage->d[acc_k.dev_data_type.data_type].buckets);
- v[1] = percpu_u64_get(&ca->usage->d[acc_k.dev_data_type.data_type].sectors);
- v[2] = percpu_u64_get(&ca->usage->d[acc_k.dev_data_type.data_type].fragmented);
+ v[0] = percpu_u64_get(&ca->usage->d[data_type].buckets);
+ v[1] = percpu_u64_get(&ca->usage->d[data_type].sectors);
+ v[2] = percpu_u64_get(&ca->usage->d[data_type].fragmented);
+
+ if (data_type == BCH_DATA_sb || data_type == BCH_DATA_journal)
+ base.hidden += a.v->d[0] * ca->mi.bucket_size;
}
if (memcmp(a.v->d, v, 3 * sizeof(u64))) {
@@ -1098,7 +1102,7 @@ void bch2_verify_accounting_clean(struct bch_fs *c)
mismatch = true; \
}
- //check(hidden);
+ check(hidden);
check(btree);
check(data);
check(cached);
diff --git a/fs/bcachefs/disk_accounting.h b/fs/bcachefs/disk_accounting.h
index 9e46aaf74704eaf51a643c3e746a440abd175b9e..9fcfc6a6e36958a27b312dad9de29455a6ee8f87 100644
--- a/fs/bcachefs/disk_accounting.h
+++ b/fs/bcachefs/disk_accounting.h
@@ -186,11 +186,15 @@ static inline int bch2_accounting_mem_mod_locked(struct btree_trans *trans,
break;
case BCH_DISK_ACCOUNTING_dev_data_type: {
guard(rcu)();
+ const enum bch_data_type data_type = acc_k.dev_data_type.data_type;
struct bch_dev *ca = bch2_dev_rcu_noerror(c, acc_k.dev_data_type.dev);
if (ca) {
- this_cpu_add(ca->usage->d[acc_k.dev_data_type.data_type].buckets, a.v->d[0]);
- this_cpu_add(ca->usage->d[acc_k.dev_data_type.data_type].sectors, a.v->d[1]);
- this_cpu_add(ca->usage->d[acc_k.dev_data_type.data_type].fragmented, a.v->d[2]);
+ this_cpu_add(ca->usage->d[data_type].buckets, a.v->d[0]);
+ this_cpu_add(ca->usage->d[data_type].sectors, a.v->d[1]);
+ this_cpu_add(ca->usage->d[data_type].fragmented, a.v->d[2]);
+
+ if (data_type == BCH_DATA_sb || data_type == BCH_DATA_journal)
+ trans->fs_usage_delta.hidden += a.v->d[0] * ca->mi.bucket_size;
}
break;
}
--
2.50.1
* Re: [PATCH 00/12] Accounting for accurate progress reporting
2025-08-26 22:49 [PATCH 00/12] Accounting for accurate progress reporting Nikita Ofitserov via B4 Relay
` (11 preceding siblings ...)
2025-08-26 22:49 ` [PATCH 12/12] bcachefs: Fix online hidden (sb+journal) data accounting Nikita Ofitserov via B4 Relay
@ 2025-08-27 17:17 ` Kent Overstreet
12 siblings, 0 replies; 14+ messages in thread
From: Kent Overstreet @ 2025-08-27 17:17 UTC (permalink / raw)
To: himikof; +Cc: linux-bcachefs
On Wed, Aug 27, 2025 at 01:49:06AM +0300, Nikita Ofitserov via B4 Relay wrote:
> This patch series introduces new per-btree accounting counters and uses
> them for (hopefully) accurate progress reporting in recovery passes.
> Also includes various assorted bugfixes.
>
> The first commit ("Relax restrictions on the number of accounting
> counters") is optional, but will likely greatly improve the
> upgrade/tools version mismatch experience. Without it, all btree usage
> accounting will be thrown out and rebuilt on any version mismatch.
>
> The second commit has the format change but does not contain the
> upgrade/downgrade table entries. It is intended to be integrated
> together with other accounting changes in a single version upgrade.
>
> The last four commits are drive-by fixes/improvements, especially
> "Improve check_allocations pass speed not in fsck", which should make
> the future accounting upgrades much faster.
Series looks good, except for a couple notes:
- I prefer bugfixes/refactorings to come at the start of a series, so we
can get them in right away; I try to push on-disk format changes as
far back in the series as I can
- The nr_counters check you deleted in bkey_validate has
upgrade/downgrade considerations: old versions will blow away the
btree counters on downgrade. That means we have to add a downgrade
table entry and run check_allocations on both upgrade and downgrade
- As noted on IRC, by default all on-disk format changes, with only very
simple exceptions, should get a new on-disk format version; it helps
document what changed and it's good hygiene - version numbers are
cheap
- We should probably think more about upgrade/downgrade testing: I have
upgrade/downgrade tests, but they're not automated since they require
disk images that the CI doesn't know about
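
To make the nr_counters note concrete, here is a minimal sketch of the two validation rules side by side. It is not the bcachefs code: strict_valid(), relaxed_valid() and MAX_COUNTERS are hypothetical stand-ins, and the value 3 is an assumption chosen for illustration. The strict rule models the check removed in patch 01 (count must equal the per-type table entry); the relaxed rule models the replacement (any nonzero count up to the maximum).

```c
#include <stdbool.h>

/* Hypothetical stand-in for BCH_ACCOUNTING_MAX_COUNTERS; value assumed. */
#define MAX_COUNTERS 3

/* Strict rule (old check): the count must match the per-type table exactly. */
static bool strict_valid(unsigned nr_counters, unsigned expected)
{
	return nr_counters == expected;
}

/* Relaxed rule (patch 01): any nonzero count up to the maximum is accepted. */
static bool relaxed_valid(unsigned nr_counters)
{
	return nr_counters && nr_counters <= MAX_COUNTERS;
}
```

Under these assumptions, a key written with 2 counters by a version that added one passes relaxed_valid(2), but fails strict_valid(2, 1) on an older version that still expects 1. That older version would treat the key as invalid, which is why the downgrade path needs a table entry and a check_allocations run.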