* [PATCH 0/8] Runtime self healing for missing backpointers
@ 2025-05-17 19:25 Kent Overstreet
  2025-05-17 19:25 ` [PATCH 1/8] bcachefs: struct bch_fs_recovery Kent Overstreet
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: Kent Overstreet @ 2025-05-17 19:25 UTC
  To: linux-bcachefs; +Cc: Kent Overstreet

Since the 6.14 upgrade, there have been reports of copygc spinning
without making any progress; it turns out it has been attempting to
evacuate buckets with missing backpointers.

6.14 made backpointers fsck much faster, partly by deferring the "does
this backpointer point to a matching extent" checks until backpointers
are used. This means that bad backpointers that haven't yet been cleaned
up can prevent the "sum up backpointers in a bucket to check if we need
to scan for missing backpointers" code from detecting missing
backpointers: the sectors claimed by the bad backpointers can make up
for the sectors of the missing ones, so the totals still match.

Doh.

The solution is to re-run the "do backpointers in a bucket sum up to the
bucket sector counts?" checks _after_ we've walked the backpointers in a
bucket at runtime, in codepaths that use backpointer_get_key() (e.g.
move_data_phys). By then backpointer_get_key() will have cleaned up bad
backpointers, so any missing backpointers can be reliably detected.
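The failure mode and the fix can be shown with a small userspace C sketch. This is not bcachefs code. struct toy_bp and the toy_* helpers are hypothetical, and the sketch assumes a simplified model in which each backpointer carries only a sector count and a flag saying whether it still matches an extent. It sums a bucket's backpointers before and after the stale entries are dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified backpointer: not the real struct bch_backpointer. */
struct toy_bp {
	uint32_t sectors;	/* sectors this backpointer accounts for */
	bool	 stale;		/* no longer points to a matching extent */
};

/* The cheap check: sum every backpointer in the bucket, stale or not. */
static uint64_t toy_bp_sum(const struct toy_bp *bps, size_t nr)
{
	uint64_t sum = 0;

	for (size_t i = 0; i < nr; i++)
		sum += bps[i].sectors;
	return sum;
}

/* Missing backpointers are suspected only when the sum falls short. */
static bool toy_bucket_missing_bps(uint64_t bucket_sectors,
				   const struct toy_bp *bps, size_t nr)
{
	return toy_bp_sum(bps, nr) < bucket_sectors;
}

/*
 * Drop stale entries in place, standing in for the cleanup that
 * backpointer_get_key() does as a side effect. Returns the new count.
 */
static size_t toy_drop_stale(struct toy_bp *bps, size_t nr)
{
	size_t out = 0;

	for (size_t i = 0; i < nr; i++)
		if (!bps[i].stale)
			bps[out++] = bps[i];
	return out;
}
```

Take a 16-sector bucket with one good 8-sector backpointer and one stale 8-sector backpointer. The sum matches, so nothing is flagged. Once the stale entry is dropped, the 8-sector shortfall becomes visible, which is why the check is re-run after the backpointers have been walked.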

Note that the "do backpointers sum up to bucket sector counts" check is
quite cheap to do at runtime, particularly here where we've just touched
these keys.

Then, if we do detect missing backpointers, we can kick off
check_extents_to_backpointers. This is also much cheaper than it used to
be, since the scan only walks extent -> backpointer for buckets that we
know have missing backpointers.
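Here is a rough sketch of how the scan could be restricted to known-bad buckets. It assumes a simple fixed-size per-device bitmap. The toy_bucket_bitmap type and its helpers are hypothetical and are not the bucket_bitmap code in this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_NR_BUCKETS	256

/* Hypothetical bitmap: bit b set means bucket b has missing backpointers. */
struct toy_bucket_bitmap {
	uint64_t bits[TOY_NR_BUCKETS / 64];
	unsigned nr_set;
};

static void toy_bitmap_init(struct toy_bucket_bitmap *m)
{
	memset(m, 0, sizeof(*m));
}

/* Flag a bucket; nr_set counts distinct flagged buckets. */
static void toy_bitmap_set(struct toy_bucket_bitmap *m, unsigned b)
{
	assert(b < TOY_NR_BUCKETS);

	uint64_t mask = UINT64_C(1) << (b % 64);

	if (!(m->bits[b / 64] & mask)) {
		m->bits[b / 64] |= mask;
		m->nr_set++;
	}
}

static bool toy_bitmap_test(const struct toy_bucket_bitmap *m, unsigned b)
{
	return m->bits[b / 64] & (UINT64_C(1) << (b % 64));
}

/*
 * During an extent -> backpointer walk, only extents whose bucket is
 * flagged need their backpointer looked up; count how many that is.
 */
static unsigned toy_count_extents_to_check(const struct toy_bucket_bitmap *m,
					   const unsigned *extent_buckets,
					   unsigned nr)
{
	unsigned n = 0;

	for (unsigned i = 0; i < nr; i++)
		if (toy_bitmap_test(m, extent_buckets[i]))
			n++;
	return n;
}
```

The per-bucket test is a single bit lookup, so extents in unflagged buckets cost almost nothing during the walk.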

Most of the patch series is new infrastructure for running recovery
passes automatically and asynchronously. We'll be using this for more
things in the future as other recovery passes become online passes;
this is one of the last missing pieces for "full runtime self healing
from anything".

To avoid unfortunate situations where something triggers a recovery
pass continuously, there's ratelimiting, using the new superblock
section that records the runtime and time of last run for each recovery
pass. Copygc doesn't need to be able to evacuate any given bucket right
away; it can wait until a moderate percentage of buckets have missing
backpointers before kicking off check_extents_to_backpointers.
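The ratelimiting and the copygc threshold could look something like the sketch below. The policy is an assumption for illustration only, since the cover letter doesn't give constants: rerun only after 10x the last runtime, with a 60 second floor, and kick off repair once a given percentage of buckets is flagged. All toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-pass record, loosely modeled on "runtime, time of last run". */
struct toy_pass_stats {
	uint64_t last_run;	/* seconds since epoch when the pass last ran */
	uint64_t last_runtime;	/* seconds the last run took */
};

/* Assumed policy constants, not taken from the series. */
#define TOY_RATELIMIT_MULT	10
#define TOY_RATELIMIT_MIN_S	60

/* True if an automatic rerun should be held off at time @now. */
static bool toy_pass_ratelimited(const struct toy_pass_stats *s, uint64_t now)
{
	uint64_t wait = s->last_runtime * TOY_RATELIMIT_MULT;

	if (wait < TOY_RATELIMIT_MIN_S)
		wait = TOY_RATELIMIT_MIN_S;
	return now < s->last_run + wait;
}

/* Assumed copygc trigger: at least @pct percent of buckets flagged. */
static bool toy_copygc_should_kick(uint64_t nr_flagged, uint64_t nr_buckets,
				   unsigned pct)
{
	return nr_buckets && nr_flagged * 100 >= (uint64_t) pct * nr_buckets;
}
```

Scaling the wait by the last runtime means an expensive pass that keeps getting triggered can only consume a bounded fraction of the machine's time.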

Kent Overstreet (8):
  bcachefs: struct bch_fs_recovery
  bcachefs: __bch2_run_recovery_passes()
  bcachefs: Reduce usage of recovery.curr_pass
  bcachefs: bch2_recovery_pass_status_to_text()
  bcachefs: bch2_run_explicit_recovery_pass() cleanup
  bcachefs: Run recovery passes asynchronously
  bcachefs: Improve bucket_bitmap code
  bcachefs: bch2_check_bucket_backpointer_mismatch()

 fs/bcachefs/alloc_background.c      |  10 +-
 fs/bcachefs/alloc_foreground.c      |   6 +-
 fs/bcachefs/backpointers.c          | 197 +++++++++----
 fs/bcachefs/backpointers.h          |  10 +-
 fs/bcachefs/bcachefs.h              |  23 +-
 fs/bcachefs/btree_cache.c           |   2 +-
 fs/bcachefs/btree_io.c              |   7 +-
 fs/bcachefs/btree_node_scan.c       |   2 +-
 fs/bcachefs/btree_update_interior.c |   2 +-
 fs/bcachefs/buckets.c               |  33 +--
 fs/bcachefs/errcode.h               |   1 -
 fs/bcachefs/error.c                 |   2 +-
 fs/bcachefs/fsck.c                  |   9 +-
 fs/bcachefs/move.c                  |  21 +-
 fs/bcachefs/movinggc.c              |  11 +-
 fs/bcachefs/rebalance.c             |   2 +-
 fs/bcachefs/recovery.c              |  46 ++-
 fs/bcachefs/recovery_passes.c       | 431 ++++++++++++++++++----------
 fs/bcachefs/recovery_passes.h       |  23 +-
 fs/bcachefs/recovery_passes_types.h |  27 ++
 fs/bcachefs/sb-members.c            |   4 +-
 fs/bcachefs/snapshot.c              |   4 +-
 fs/bcachefs/subvolume.c             |   6 +-
 fs/bcachefs/super.c                 |  10 +-
 fs/bcachefs/sysfs.c                 |   6 +
 25 files changed, 579 insertions(+), 316 deletions(-)
 create mode 100644 fs/bcachefs/recovery_passes_types.h

-- 
2.49.0



* [PATCH 1/8] bcachefs: struct bch_fs_recovery
From: Kent Overstreet @ 2025-05-17 19:25 UTC
  To: linux-bcachefs; +Cc: Kent Overstreet

bch_fs has gotten obnoxiously big; let's start organizing things a bit
better.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
---
 fs/bcachefs/alloc_background.c      |  2 +-
 fs/bcachefs/alloc_foreground.c      |  6 +--
 fs/bcachefs/backpointers.c          |  4 +-
 fs/bcachefs/bcachefs.h              | 17 +-------
 fs/bcachefs/btree_cache.c           |  2 +-
 fs/bcachefs/btree_io.c              |  6 +--
 fs/bcachefs/btree_update_interior.c |  2 +-
 fs/bcachefs/fsck.c                  |  8 ++--
 fs/bcachefs/movinggc.c              |  2 +-
 fs/bcachefs/rebalance.c             |  2 +-
 fs/bcachefs/recovery.c              | 10 ++---
 fs/bcachefs/recovery_passes.c       | 60 +++++++++++++++--------------
 fs/bcachefs/recovery_passes_types.h | 23 +++++++++++
 fs/bcachefs/snapshot.c              |  4 +-
 fs/bcachefs/super.c                 |  2 +-
 15 files changed, 81 insertions(+), 69 deletions(-)
 create mode 100644 fs/bcachefs/recovery_passes_types.h

diff --git a/fs/bcachefs/alloc_background.c b/fs/bcachefs/alloc_background.c
index f8d21c12c3d1..4ae2aa6ea758 100644
--- a/fs/bcachefs/alloc_background.c
+++ b/fs/bcachefs/alloc_background.c
@@ -309,7 +309,7 @@ int bch2_alloc_v4_validate(struct bch_fs *c, struct bkey_s_c k,
 				 "data type inconsistency");
 
 		bkey_fsck_err_on(!a.io_time[READ] &&
-				 c->curr_recovery_pass > BCH_RECOVERY_PASS_check_alloc_to_lru_refs,
+				 c->recovery.curr_pass > BCH_RECOVERY_PASS_check_alloc_to_lru_refs,
 				 c, alloc_key_cached_but_read_time_zero,
 				 "cached bucket with read_time == 0");
 		break;
diff --git a/fs/bcachefs/alloc_foreground.c b/fs/bcachefs/alloc_foreground.c
index 6aefa490ec24..76641cc4c27d 100644
--- a/fs/bcachefs/alloc_foreground.c
+++ b/fs/bcachefs/alloc_foreground.c
@@ -154,7 +154,7 @@ static struct open_bucket *bch2_open_bucket_alloc(struct bch_fs *c)
 
 static inline bool is_superblock_bucket(struct bch_fs *c, struct bch_dev *ca, u64 b)
 {
-	if (c->curr_recovery_pass > BCH_RECOVERY_PASS_trans_mark_dev_sbs)
+	if (c->recovery.curr_pass > BCH_RECOVERY_PASS_trans_mark_dev_sbs)
 		return false;
 
 	return bch2_is_superblock_bucket(ca, b);
@@ -524,7 +524,7 @@ static struct open_bucket *bch2_bucket_alloc_trans(struct btree_trans *trans,
 
 	if (!avail) {
 		if (req->watermark > BCH_WATERMARK_normal &&
-		    c->curr_recovery_pass <= BCH_RECOVERY_PASS_check_allocations)
+		    c->recovery.curr_pass <= BCH_RECOVERY_PASS_check_allocations)
 			goto alloc;
 
 		if (cl && !waiting) {
@@ -554,7 +554,7 @@ static struct open_bucket *bch2_bucket_alloc_trans(struct btree_trans *trans,
 		goto alloc;
 	}
 
-	if (!ob && freespace && c->curr_recovery_pass <= BCH_RECOVERY_PASS_check_alloc_info) {
+	if (!ob && freespace && c->recovery.curr_pass <= BCH_RECOVERY_PASS_check_alloc_info) {
 		freespace = false;
 		goto alloc;
 	}
diff --git a/fs/bcachefs/backpointers.c b/fs/bcachefs/backpointers.c
index bdf524b465fa..44da8e2657af 100644
--- a/fs/bcachefs/backpointers.c
+++ b/fs/bcachefs/backpointers.c
@@ -120,7 +120,7 @@ static noinline int backpointer_mod_err(struct btree_trans *trans,
 		bch2_bkey_val_to_text(&buf, c, orig_k);
 
 		bch_err(c, "%s", buf.buf);
-	} else if (c->curr_recovery_pass > BCH_RECOVERY_PASS_check_extents_to_backpointers) {
+	} else if (c->recovery.curr_pass > BCH_RECOVERY_PASS_check_extents_to_backpointers) {
 		prt_printf(&buf, "backpointer not found when deleting\n");
 		printbuf_indent_add(&buf, 2);
 
@@ -136,7 +136,7 @@ static noinline int backpointer_mod_err(struct btree_trans *trans,
 		bch2_bkey_val_to_text(&buf, c, orig_k);
 	}
 
-	if (c->curr_recovery_pass > BCH_RECOVERY_PASS_check_extents_to_backpointers &&
+	if (c->recovery.curr_pass > BCH_RECOVERY_PASS_check_extents_to_backpointers &&
 	    __bch2_inconsistent_error(c, &buf))
 		ret = -BCH_ERR_erofs_unfixed_errors;
 
diff --git a/fs/bcachefs/bcachefs.h b/fs/bcachefs/bcachefs.h
index 252fc1eaa0dc..1458f131af16 100644
--- a/fs/bcachefs/bcachefs.h
+++ b/fs/bcachefs/bcachefs.h
@@ -502,6 +502,7 @@ enum bch_time_stats {
 #include "keylist_types.h"
 #include "quota_types.h"
 #include "rebalance_types.h"
+#include "recovery_passes_types.h"
 #include "replicas_types.h"
 #include "sb-members_types.h"
 #include "subvolume_types.h"
@@ -1116,21 +1117,7 @@ struct bch_fs {
 	/* RECOVERY */
 	u64			journal_replay_seq_start;
 	u64			journal_replay_seq_end;
-	/*
-	 * Two different uses:
-	 * "Has this fsck pass?" - i.e. should this type of error be an
-	 * emergency read-only
-	 * And, in certain situations fsck will rewind to an earlier pass: used
-	 * for signaling to the toplevel code which pass we want to run now.
-	 */
-	enum bch_recovery_pass	curr_recovery_pass;
-	enum bch_recovery_pass	next_recovery_pass;
-	/* bitmask of recovery passes that we actually ran */
-	u64			recovery_passes_complete;
-	/* never rewinds version of curr_recovery_pass */
-	enum bch_recovery_pass	recovery_pass_done;
-	spinlock_t		recovery_pass_lock;
-	struct semaphore	run_recovery_passes_lock;
+	struct bch_fs_recovery	recovery;
 
 	/* DEBUG JUNK */
 	struct dentry		*fs_debug_dir;
diff --git a/fs/bcachefs/btree_cache.c b/fs/bcachefs/btree_cache.c
index 96c3846e0079..2600a97582b1 100644
--- a/fs/bcachefs/btree_cache.c
+++ b/fs/bcachefs/btree_cache.c
@@ -1019,7 +1019,7 @@ static noinline void btree_bad_header(struct bch_fs *c, struct btree *b)
 {
 	struct printbuf buf = PRINTBUF;
 
-	if (c->curr_recovery_pass <= BCH_RECOVERY_PASS_check_allocations)
+	if (c->recovery.curr_pass <= BCH_RECOVERY_PASS_check_allocations)
 		return;
 
 	prt_printf(&buf,
diff --git a/fs/bcachefs/btree_io.c b/fs/bcachefs/btree_io.c
index 97cd25cd492b..e5db374f001b 100644
--- a/fs/bcachefs/btree_io.c
+++ b/fs/bcachefs/btree_io.c
@@ -556,7 +556,7 @@ static int __btree_err(int ret,
 		       struct printbuf *err_msg,
 		       const char *fmt, ...)
 {
-	if (c->curr_recovery_pass == BCH_RECOVERY_PASS_scan_for_btree_nodes)
+	if (c->recovery.curr_pass == BCH_RECOVERY_PASS_scan_for_btree_nodes)
 		return -BCH_ERR_fsck_fix;
 
 	bool have_retry = false;
@@ -1428,7 +1428,7 @@ static void btree_node_read_work(struct work_struct *work)
 	if ((failed.nr ||
 	     btree_node_need_rewrite(b)) &&
 	    !btree_node_read_error(b) &&
-	    c->curr_recovery_pass != BCH_RECOVERY_PASS_scan_for_btree_nodes) {
+	    c->recovery.curr_pass != BCH_RECOVERY_PASS_scan_for_btree_nodes) {
 		prt_printf(&buf, " (rewriting node)");
 		bch2_btree_node_rewrite_async(c, b);
 	}
@@ -1776,7 +1776,7 @@ void bch2_btree_node_read(struct btree_trans *trans, struct btree *b,
 		bch2_btree_lost_data(c, &buf, b->c.btree_id);
 
 		if (c->opts.recovery_passes & BIT_ULL(BCH_RECOVERY_PASS_check_topology) &&
-		    c->curr_recovery_pass > BCH_RECOVERY_PASS_check_topology &&
+		    c->recovery.curr_pass > BCH_RECOVERY_PASS_check_topology &&
 		    bch2_fs_emergency_read_only2(c, &buf))
 			ratelimit = false;
 
diff --git a/fs/bcachefs/btree_update_interior.c b/fs/bcachefs/btree_update_interior.c
index 2d43d51b597d..a658c97439ed 100644
--- a/fs/bcachefs/btree_update_interior.c
+++ b/fs/bcachefs/btree_update_interior.c
@@ -2363,7 +2363,7 @@ void bch2_btree_node_rewrite_async(struct bch_fs *c, struct btree *b)
 	bool now = false, pending = false;
 
 	spin_lock(&c->btree_node_rewrites_lock);
-	if (c->curr_recovery_pass > BCH_RECOVERY_PASS_journal_replay &&
+	if (c->recovery.curr_pass > BCH_RECOVERY_PASS_journal_replay &&
 	    enumerated_ref_tryget(&c->writes, BCH_WRITE_REF_node_rewrite)) {
 		list_add(&a->list, &c->btree_node_rewrites);
 		now = true;
diff --git a/fs/bcachefs/fsck.c b/fs/bcachefs/fsck.c
index 4dcd51b8259e..dc541a66b8eb 100644
--- a/fs/bcachefs/fsck.c
+++ b/fs/bcachefs/fsck.c
@@ -3140,7 +3140,7 @@ static int bch2_fsck_online_thread_fn(struct thread_with_stdio *stdio)
 	c->opts.fsck = true;
 	set_bit(BCH_FS_in_fsck, &c->flags);
 
-	c->curr_recovery_pass = BCH_RECOVERY_PASS_check_alloc_info;
+	c->recovery.curr_pass = BCH_RECOVERY_PASS_check_alloc_info;
 	int ret = bch2_run_online_recovery_passes(c);
 
 	clear_bit(BCH_FS_in_fsck, &c->flags);
@@ -3150,7 +3150,7 @@ static int bch2_fsck_online_thread_fn(struct thread_with_stdio *stdio)
 	c->stdio_filter = NULL;
 	c->opts.fix_errors = old_fix_errors;
 
-	up(&c->run_recovery_passes_lock);
+	up(&c->recovery.run_lock);
 	bch2_ro_ref_put(c);
 	return ret;
 }
@@ -3174,7 +3174,7 @@ long bch2_ioctl_fsck_online(struct bch_fs *c, struct bch_ioctl_fsck_online arg)
 	if (!bch2_ro_ref_tryget(c))
 		return -EROFS;
 
-	if (down_trylock(&c->run_recovery_passes_lock)) {
+	if (down_trylock(&c->recovery.run_lock)) {
 		bch2_ro_ref_put(c);
 		return -EAGAIN;
 	}
@@ -3206,7 +3206,7 @@ long bch2_ioctl_fsck_online(struct bch_fs *c, struct bch_ioctl_fsck_online arg)
 		bch_err_fn(c, ret);
 		if (thr)
 			bch2_fsck_thread_exit(&thr->thr);
-		up(&c->run_recovery_passes_lock);
+		up(&c->recovery.run_lock);
 		bch2_ro_ref_put(c);
 	}
 	return ret;
diff --git a/fs/bcachefs/movinggc.c b/fs/bcachefs/movinggc.c
index cc843815f7eb..4bfdb1befb9a 100644
--- a/fs/bcachefs/movinggc.c
+++ b/fs/bcachefs/movinggc.c
@@ -362,7 +362,7 @@ static int bch2_copygc_thread(void *arg)
 	 * Data move operations can't run until after check_snapshots has
 	 * completed, and bch2_snapshot_is_ancestor() is available.
 	 */
-	kthread_wait_freezable(c->recovery_pass_done > BCH_RECOVERY_PASS_check_snapshots ||
+	kthread_wait_freezable(c->recovery.pass_done > BCH_RECOVERY_PASS_check_snapshots ||
 			       kthread_should_stop());
 
 	bch2_move_stats_init(&move_stats, "copygc");
diff --git a/fs/bcachefs/rebalance.c b/fs/bcachefs/rebalance.c
index c223bb092d33..de1ec9e0caa0 100644
--- a/fs/bcachefs/rebalance.c
+++ b/fs/bcachefs/rebalance.c
@@ -616,7 +616,7 @@ static int bch2_rebalance_thread(void *arg)
 	 * Data move operations can't run until after check_snapshots has
 	 * completed, and bch2_snapshot_is_ancestor() is available.
 	 */
-	kthread_wait_freezable(c->recovery_pass_done > BCH_RECOVERY_PASS_check_snapshots ||
+	kthread_wait_freezable(c->recovery.pass_done > BCH_RECOVERY_PASS_check_snapshots ||
 			       kthread_should_stop());
 
 	bch2_moving_ctxt_init(&ctxt, c, NULL, &r->work_stats,
diff --git a/fs/bcachefs/recovery.c b/fs/bcachefs/recovery.c
index 1895a6b13001..cd2372221a54 100644
--- a/fs/bcachefs/recovery.c
+++ b/fs/bcachefs/recovery.c
@@ -434,7 +434,7 @@ int bch2_journal_replay(struct bch_fs *c)
 	trans = NULL;
 
 	if (!c->opts.retain_recovery_info &&
-	    c->recovery_pass_done >= BCH_RECOVERY_PASS_journal_replay)
+	    c->recovery.pass_done >= BCH_RECOVERY_PASS_journal_replay)
 		bch2_journal_keys_put_initial(c);
 
 	replay_now_at(j, j->replay_journal_seq_end);
@@ -1001,7 +1001,7 @@ int bch2_fs_recovery(struct bch_fs *c)
 		bch_info(c, "Fixed errors, running fsck a second time to verify fs is clean");
 		clear_bit(BCH_FS_errors_fixed, &c->flags);
 
-		c->curr_recovery_pass = BCH_RECOVERY_PASS_check_alloc_info;
+		c->recovery.curr_pass = BCH_RECOVERY_PASS_check_alloc_info;
 
 		ret = bch2_run_recovery_passes(c);
 		if (ret)
@@ -1047,7 +1047,7 @@ int bch2_fs_recovery(struct bch_fs *c)
 
 	if (c->opts.fsck &&
 	    !test_bit(BCH_FS_error, &c->flags) &&
-	    c->recovery_pass_done == BCH_RECOVERY_PASS_NR - 1 &&
+	    c->recovery.pass_done == BCH_RECOVERY_PASS_NR - 1 &&
 	    ext->btrees_lost_data) {
 		ext->btrees_lost_data = 0;
 		write_sb = true;
@@ -1234,7 +1234,7 @@ int bch2_fs_initialize(struct bch_fs *c)
 	if (ret)
 		goto err;
 
-	c->recovery_pass_done = BCH_RECOVERY_PASS_NR - 1;
+	c->recovery.pass_done = BCH_RECOVERY_PASS_NR - 1;
 
 	bch2_copygc_wakeup(c);
 	bch2_rebalance_wakeup(c);
@@ -1257,7 +1257,7 @@ int bch2_fs_initialize(struct bch_fs *c)
 	bch2_write_super(c);
 	mutex_unlock(&c->sb_lock);
 
-	c->curr_recovery_pass = BCH_RECOVERY_PASS_NR;
+	c->recovery.curr_pass = BCH_RECOVERY_PASS_NR;
 	return 0;
 err:
 	bch_err_fn(c, ret);
diff --git a/fs/bcachefs/recovery_passes.c b/fs/bcachefs/recovery_passes.c
index 22cefffcf1fa..c1eca55a1dde 100644
--- a/fs/bcachefs/recovery_passes.c
+++ b/fs/bcachefs/recovery_passes.c
@@ -210,16 +210,18 @@ static int __bch2_run_explicit_recovery_pass(struct printbuf *out,
 					     struct bch_fs *c,
 					     enum bch_recovery_pass pass)
 {
-	if (c->curr_recovery_pass == ARRAY_SIZE(recovery_pass_fns))
+	struct bch_fs_recovery *r = &c->recovery;
+
+	if (r->curr_pass == ARRAY_SIZE(recovery_pass_fns))
 		return -BCH_ERR_not_in_recovery;
 
-	if (c->recovery_passes_complete & BIT_ULL(pass))
+	if (r->passes_complete & BIT_ULL(pass))
 		return 0;
 
 	bool print = !(c->opts.recovery_passes & BIT_ULL(pass));
 
 	if (pass < BCH_RECOVERY_PASS_set_may_go_rw &&
-	    c->curr_recovery_pass >= BCH_RECOVERY_PASS_set_may_go_rw) {
+	    r->curr_pass >= BCH_RECOVERY_PASS_set_may_go_rw) {
 		if (print)
 			prt_printf(out, "need recovery pass %s (%u), but already rw\n",
 				   bch2_recovery_passes[pass], pass);
@@ -229,14 +231,14 @@ static int __bch2_run_explicit_recovery_pass(struct printbuf *out,
 	if (print)
 		prt_printf(out, "running explicit recovery pass %s (%u), currently at %s (%u)\n",
 			   bch2_recovery_passes[pass], pass,
-			   bch2_recovery_passes[c->curr_recovery_pass], c->curr_recovery_pass);
+			   bch2_recovery_passes[r->curr_pass], r->curr_pass);
 
 	c->opts.recovery_passes |= BIT_ULL(pass);
 
 	if (test_bit(BCH_FS_in_recovery, &c->flags) &&
-	    c->curr_recovery_pass > pass) {
-		c->next_recovery_pass = pass;
-		c->recovery_passes_complete &= (1ULL << pass) >> 1;
+	    r->curr_pass > pass) {
+		r->next_pass = pass;
+		r->passes_complete &= (1ULL << pass) >> 1;
 		return -BCH_ERR_restart_recovery;
 	} else {
 		return 0;
@@ -251,9 +253,9 @@ static int bch2_run_explicit_recovery_pass_printbuf(struct bch_fs *c,
 	out->atomic++;
 
 	unsigned long flags;
-	spin_lock_irqsave(&c->recovery_pass_lock, flags);
+	spin_lock_irqsave(&c->recovery.lock, flags);
 	int ret = __bch2_run_explicit_recovery_pass(out, c, pass);
-	spin_unlock_irqrestore(&c->recovery_pass_lock, flags);
+	spin_unlock_irqrestore(&c->recovery.lock, flags);
 
 	--out->atomic;
 	return ret;
@@ -361,7 +363,7 @@ int bch2_run_online_recovery_passes(struct bch_fs *c)
 
 		int ret = bch2_run_recovery_pass(c, i);
 		if (bch2_err_matches(ret, BCH_ERR_restart_recovery)) {
-			i = c->curr_recovery_pass;
+			i = c->recovery.curr_pass;
 			continue;
 		}
 		if (ret)
@@ -381,26 +383,26 @@ int bch2_run_recovery_passes(struct bch_fs *c)
 	 */
 	c->opts.recovery_passes_exclude &= ~BCH_RECOVERY_PASS_set_may_go_rw;
 
-	down(&c->run_recovery_passes_lock);
-	spin_lock_irq(&c->recovery_pass_lock);
+	down(&c->recovery.run_lock);
+	spin_lock_irq(&c->recovery.lock);
 
-	while (c->curr_recovery_pass < ARRAY_SIZE(recovery_pass_fns) && !ret) {
-		unsigned prev_done = c->recovery_pass_done;
-		unsigned pass = c->curr_recovery_pass;
+	while (c->recovery.curr_pass < ARRAY_SIZE(recovery_pass_fns) && !ret) {
+		unsigned prev_done = c->recovery.pass_done;
+		unsigned pass = c->recovery.curr_pass;
 
-		c->next_recovery_pass = pass + 1;
+		c->recovery.next_pass = pass + 1;
 
 		if (c->opts.recovery_pass_last &&
-		    c->curr_recovery_pass > c->opts.recovery_pass_last)
+		    c->recovery.curr_pass > c->opts.recovery_pass_last)
 			break;
 
 		if (should_run_recovery_pass(c, pass)) {
-			spin_unlock_irq(&c->recovery_pass_lock);
+			spin_unlock_irq(&c->recovery.lock);
 			ret =   bch2_run_recovery_pass(c, pass) ?:
 				bch2_journal_flush(&c->journal);
-			spin_lock_irq(&c->recovery_pass_lock);
+			spin_lock_irq(&c->recovery.lock);
 
-			if (c->next_recovery_pass < c->curr_recovery_pass) {
+			if (c->recovery.next_pass < c->recovery.curr_pass) {
 				/*
 				 * bch2_run_explicit_recovery_pass() was called: we
 				 * can't always catch -BCH_ERR_restart_recovery because
@@ -408,30 +410,30 @@ int bch2_run_recovery_passes(struct bch_fs *c)
 				 * node read completion)
 				 */
 				ret = 0;
-				c->recovery_passes_complete &= ~(~0ULL << c->curr_recovery_pass);
+				c->recovery.passes_complete &= ~(~0ULL << c->recovery.curr_pass);
 			} else {
-				c->recovery_passes_complete |= BIT_ULL(pass);
-				c->recovery_pass_done = max(c->recovery_pass_done, pass);
+				c->recovery.passes_complete |= BIT_ULL(pass);
+				c->recovery.pass_done = max(c->recovery.pass_done, pass);
 			}
 		}
 
-		c->curr_recovery_pass = c->next_recovery_pass;
+		c->recovery.curr_pass = c->recovery.next_pass;
 
 		if (prev_done <= BCH_RECOVERY_PASS_check_snapshots &&
-		    c->recovery_pass_done > BCH_RECOVERY_PASS_check_snapshots) {
+		    c->recovery.pass_done > BCH_RECOVERY_PASS_check_snapshots) {
 			bch2_copygc_wakeup(c);
 			bch2_rebalance_wakeup(c);
 		}
 	}
 
-	spin_unlock_irq(&c->recovery_pass_lock);
-	up(&c->run_recovery_passes_lock);
+	spin_unlock_irq(&c->recovery.lock);
+	up(&c->recovery.run_lock);
 
 	return ret;
 }
 
 void bch2_fs_recovery_passes_init(struct bch_fs *c)
 {
-	spin_lock_init(&c->recovery_pass_lock);
-	sema_init(&c->run_recovery_passes_lock, 1);
+	spin_lock_init(&c->recovery.lock);
+	sema_init(&c->recovery.run_lock, 1);
 }
diff --git a/fs/bcachefs/recovery_passes_types.h b/fs/bcachefs/recovery_passes_types.h
new file mode 100644
index 000000000000..69e8e29d58d0
--- /dev/null
+++ b/fs/bcachefs/recovery_passes_types.h
@@ -0,0 +1,23 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _BCACHEFS_RECOVERY_PASSES_TYPES_H
+#define _BCACHEFS_RECOVERY_PASSES_TYPES_H
+
+struct bch_fs_recovery {
+	/*
+	 * Two different uses:
+	 * "Has this fsck pass?" - i.e. should this type of error be an
+	 * emergency read-only
+	 * And, in certain situations fsck will rewind to an earlier pass: used
+	 * for signaling to the toplevel code which pass we want to run now.
+	 */
+	enum bch_recovery_pass	curr_pass;
+	enum bch_recovery_pass	next_pass;
+	/* never rewinds version of curr_pass */
+	enum bch_recovery_pass	pass_done;
+	/* bitmask of recovery passes that we actually ran */
+	u64			passes_complete;
+	spinlock_t		lock;
+	struct semaphore	run_lock;
+};
+
+#endif /* _BCACHEFS_RECOVERY_PASSES_TYPES_H */
diff --git a/fs/bcachefs/snapshot.c b/fs/bcachefs/snapshot.c
index c3dc450cbcec..c401d5285701 100644
--- a/fs/bcachefs/snapshot.c
+++ b/fs/bcachefs/snapshot.c
@@ -143,7 +143,7 @@ bool __bch2_snapshot_is_ancestor(struct bch_fs *c, u32 id, u32 ancestor)
 	rcu_read_lock();
 	struct snapshot_table *t = rcu_dereference(c->snapshots);
 
-	if (unlikely(c->recovery_pass_done < BCH_RECOVERY_PASS_check_snapshots)) {
+	if (unlikely(c->recovery.pass_done < BCH_RECOVERY_PASS_check_snapshots)) {
 		ret = __bch2_snapshot_is_ancestor_early(t, id, ancestor);
 		goto out;
 	}
@@ -348,7 +348,7 @@ static int __bch2_mark_snapshot(struct btree_trans *trans,
 
 		if (BCH_SNAPSHOT_WILL_DELETE(s.v)) {
 			set_bit(BCH_FS_need_delete_dead_snapshots, &c->flags);
-			if (c->curr_recovery_pass > BCH_RECOVERY_PASS_delete_dead_snapshots)
+			if (c->recovery.curr_pass > BCH_RECOVERY_PASS_delete_dead_snapshots)
 				bch2_delete_dead_snapshots_async(c);
 		}
 	} else {
diff --git a/fs/bcachefs/super.c b/fs/bcachefs/super.c
index c46b2b2ebab1..170b0f26c018 100644
--- a/fs/bcachefs/super.c
+++ b/fs/bcachefs/super.c
@@ -392,7 +392,7 @@ void bch2_fs_read_only(struct bch_fs *c)
 	    !test_bit(BCH_FS_emergency_ro, &c->flags) &&
 	    test_bit(BCH_FS_started, &c->flags) &&
 	    test_bit(BCH_FS_clean_shutdown, &c->flags) &&
-	    c->recovery_pass_done >= BCH_RECOVERY_PASS_journal_replay) {
+	    c->recovery.pass_done >= BCH_RECOVERY_PASS_journal_replay) {
 		BUG_ON(c->journal.last_empty_seq != journal_cur_seq(&c->journal));
 		BUG_ON(atomic_long_read(&c->btree_cache.nr_dirty));
 		BUG_ON(atomic_long_read(&c->btree_key_cache.nr_dirty));
-- 
2.49.0



* [PATCH 2/8] bcachefs: __bch2_run_recovery_passes()
From: Kent Overstreet @ 2025-05-17 19:25 UTC
  To: linux-bcachefs; +Cc: Kent Overstreet

Consolidate bch2_run_recovery_passes() and
bch2_run_online_recovery_passes(), as prep work for automatically
scheduling and running recovery passes in the background.

- Now takes a mask of which passes to run; automatic background repair
  will pass in sb.recovery_passes_required.

- Skips passes that are failing: a pass that failed may be reattempted
  after another pass succeeds (some passes depend on repair done by
  other passes for successful completion).

- New bch2_recovery_passes_match() helper, used to skip alloc passes on
  a filesystem without alloc info.
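The mask-driven loop with failing-pass skipping can be modeled in userspace C as follows. This is a simplified model, not the kernel function: it has no locking and no rewind handling, and it always continues after a failure, as the online case does. __builtin_ctzll() is a GCC/Clang builtin standing in for the kernel's __ffs64(). struct toy_recovery, toy_run_passes() and toy_fail_mask_fn() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_BIT(n)	(UINT64_C(1) << (n))

struct toy_recovery {
	uint64_t passes_to_run;
	uint64_t passes_complete;
	uint64_t passes_failing;
};

/* Hypothetical pass callback: returns 0 on success. */
typedef int (*toy_pass_fn)(unsigned pass, void *ctx);

/* Example callback: fails every pass whose bit is set in *(uint64_t *) ctx. */
static int toy_fail_mask_fn(unsigned pass, void *ctx)
{
	uint64_t fail = *(const uint64_t *) ctx;

	return (fail & TOY_BIT(pass)) ? -1 : 0;
}

/*
 * Run the passes in @mask in ascending order, lowest bit first. Passes
 * already marked failing are skipped for this invocation; any successful
 * pass clears the failing set, so they become eligible again next time.
 */
static int toy_run_passes(struct toy_recovery *r, uint64_t mask,
			  toy_pass_fn fn, void *ctx)
{
	int ret = 0;

	r->passes_to_run = mask & ~r->passes_failing;

	while (r->passes_to_run) {
		unsigned pass = (unsigned) __builtin_ctzll(r->passes_to_run);

		r->passes_to_run &= ~TOY_BIT(pass);

		int ret2 = fn(pass, ctx);

		if (ret2) {
			r->passes_failing |= TOY_BIT(pass);
			ret = ret2;
		} else {
			r->passes_failing = 0;
			r->passes_complete |= TOY_BIT(pass);
		}
	}
	return ret;
}
```

Clearing passes_failing on any success, and never retrying inside the same invocation, gives a failed pass another chance once some other repair has landed, without looping on a pass that keeps failing.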

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
---
 fs/bcachefs/fsck.c                  |   2 +-
 fs/bcachefs/recovery.c              |   7 +-
 fs/bcachefs/recovery_passes.c       | 180 +++++++++++++++-------------
 fs/bcachefs/recovery_passes.h       |   4 +-
 fs/bcachefs/recovery_passes_types.h |   2 +
 5 files changed, 104 insertions(+), 91 deletions(-)

diff --git a/fs/bcachefs/fsck.c b/fs/bcachefs/fsck.c
index dc541a66b8eb..0909716900f9 100644
--- a/fs/bcachefs/fsck.c
+++ b/fs/bcachefs/fsck.c
@@ -3141,7 +3141,7 @@ static int bch2_fsck_online_thread_fn(struct thread_with_stdio *stdio)
 	set_bit(BCH_FS_in_fsck, &c->flags);
 
 	c->recovery.curr_pass = BCH_RECOVERY_PASS_check_alloc_info;
-	int ret = bch2_run_online_recovery_passes(c);
+	int ret = bch2_run_online_recovery_passes(c, ~0ULL);
 
 	clear_bit(BCH_FS_in_fsck, &c->flags);
 	bch_err_fn(c, ret);
diff --git a/fs/bcachefs/recovery.c b/fs/bcachefs/recovery.c
index cd2372221a54..a7e6b5a6505a 100644
--- a/fs/bcachefs/recovery.c
+++ b/fs/bcachefs/recovery.c
@@ -966,7 +966,7 @@ int bch2_fs_recovery(struct bch_fs *c)
 	if (ret)
 		goto err;
 
-	ret = bch2_run_recovery_passes(c);
+	ret = bch2_run_recovery_passes(c, 0);
 	if (ret)
 		goto err;
 
@@ -1001,9 +1001,8 @@ int bch2_fs_recovery(struct bch_fs *c)
 		bch_info(c, "Fixed errors, running fsck a second time to verify fs is clean");
 		clear_bit(BCH_FS_errors_fixed, &c->flags);
 
-		c->recovery.curr_pass = BCH_RECOVERY_PASS_check_alloc_info;
-
-		ret = bch2_run_recovery_passes(c);
+		ret = bch2_run_recovery_passes(c,
+			BCH_RECOVERY_PASS_check_alloc_info);
 		if (ret)
 			goto err;
 
diff --git a/fs/bcachefs/recovery_passes.c b/fs/bcachefs/recovery_passes.c
index c1eca55a1dde..e0e261aa752e 100644
--- a/fs/bcachefs/recovery_passes.c
+++ b/fs/bcachefs/recovery_passes.c
@@ -203,6 +203,21 @@ static struct recovery_pass_fn recovery_pass_fns[] = {
 #undef x
 };
 
+static u64 bch2_recovery_passes_match(unsigned flags)
+{
+	u64 ret = 0;
+
+	for (unsigned i = 0; i < ARRAY_SIZE(recovery_pass_fns); i++)
+		if (recovery_pass_fns[i].when & flags)
+			ret |= BIT_ULL(i);
+	return ret;
+}
+
+u64 bch2_fsck_recovery_passes(void)
+{
+	return bch2_recovery_passes_match(PASS_FSCK);
+}
+
 /*
  * For when we need to rewind recovery passes and run a pass we skipped:
  */
@@ -235,10 +250,12 @@ static int __bch2_run_explicit_recovery_pass(struct printbuf *out,
 
 	c->opts.recovery_passes |= BIT_ULL(pass);
 
+	if (test_bit(BCH_FS_in_recovery, &c->flags))
+		r->passes_to_run |= BIT_ULL(pass);
+
 	if (test_bit(BCH_FS_in_recovery, &c->flags) &&
 	    r->curr_pass > pass) {
 		r->next_pass = pass;
-		r->passes_complete &= (1ULL << pass) >> 1;
 		return -BCH_ERR_restart_recovery;
 	} else {
 		return 0;
@@ -302,37 +319,9 @@ int bch2_run_explicit_recovery_pass_persistent(struct bch_fs *c,
 	return ret;
 }
 
-u64 bch2_fsck_recovery_passes(void)
-{
-	u64 ret = 0;
-
-	for (unsigned i = 0; i < ARRAY_SIZE(recovery_pass_fns); i++)
-		if (recovery_pass_fns[i].when & PASS_FSCK)
-			ret |= BIT_ULL(i);
-	return ret;
-}
-
-static bool should_run_recovery_pass(struct bch_fs *c, enum bch_recovery_pass pass)
-{
-	struct recovery_pass_fn *p = recovery_pass_fns + pass;
-
-	if ((p->when & PASS_ALLOC) && (c->sb.features & BIT_ULL(BCH_FEATURE_no_alloc_info)))
-		return false;
-	if (c->opts.recovery_passes_exclude & BIT_ULL(pass))
-		return false;
-	if (c->opts.recovery_passes & BIT_ULL(pass))
-		return true;
-	if ((p->when & PASS_FSCK) && c->opts.fsck)
-		return true;
-	if ((p->when & PASS_UNCLEAN) && !c->sb.clean)
-		return true;
-	if (p->when & PASS_ALWAYS)
-		return true;
-	return false;
-}
-
 static int bch2_run_recovery_pass(struct bch_fs *c, enum bch_recovery_pass pass)
 {
+	struct bch_fs_recovery *r = &c->recovery;
 	struct recovery_pass_fn *p = recovery_pass_fns + pass;
 
 	if (!(p->when & PASS_SILENT))
@@ -341,8 +330,15 @@ static int bch2_run_recovery_pass(struct bch_fs *c, enum bch_recovery_pass pass)
 
 	s64 start_time = ktime_get_real_seconds();
 	int ret = p->fn(c);
-	if (ret)
+
+	r->passes_to_run &= ~BIT_ULL(pass);
+
+	if (ret) {
+		r->passes_failing |= BIT_ULL(pass);
 		return ret;
+	}
+
+	r->passes_failing = 0;
 
 	if (!test_bit(BCH_FS_error, &c->flags))
 		bch2_sb_recovery_pass_complete(c, pass, start_time);
@@ -353,80 +349,96 @@ static int bch2_run_recovery_pass(struct bch_fs *c, enum bch_recovery_pass pass)
 	return 0;
 }
 
-int bch2_run_online_recovery_passes(struct bch_fs *c)
+static int __bch2_run_recovery_passes(struct bch_fs *c, u64 orig_passes_to_run,
+				      bool online)
 {
-	for (unsigned i = 0; i < ARRAY_SIZE(recovery_pass_fns); i++) {
-		struct recovery_pass_fn *p = recovery_pass_fns + i;
-
-		if (!(p->when & PASS_ONLINE))
-			continue;
+	struct bch_fs_recovery *r = &c->recovery;
+	int ret = 0;
 
-		int ret = bch2_run_recovery_pass(c, i);
-		if (bch2_err_matches(ret, BCH_ERR_restart_recovery)) {
-			i = c->recovery.curr_pass;
-			continue;
-		}
-		if (ret)
-			return ret;
-	}
+	spin_lock_irq(&r->lock);
 
-	return 0;
-}
+	if (online)
+		orig_passes_to_run &= bch2_recovery_passes_match(PASS_ONLINE);
 
-int bch2_run_recovery_passes(struct bch_fs *c)
-{
-	int ret = 0;
+	if (c->sb.features & BIT_ULL(BCH_FEATURE_no_alloc_info))
+		orig_passes_to_run &= ~bch2_recovery_passes_match(PASS_ALLOC);
 
 	/*
-	 * We can't allow set_may_go_rw to be excluded; that would cause us to
-	 * use the journal replay keys for updates where it's not expected.
+	 * A failed recovery pass will be retried after another pass succeeds -
+	 * but not this iteration.
+	 *
+	 * This is because some passes depend on repair done by other passes: we
+	 * may want to retry, but we don't want to loop on failing passes.
 	 */
-	c->opts.recovery_passes_exclude &= ~BCH_RECOVERY_PASS_set_may_go_rw;
 
-	down(&c->recovery.run_lock);
-	spin_lock_irq(&c->recovery.lock);
+	orig_passes_to_run &= ~r->passes_failing;
 
-	while (c->recovery.curr_pass < ARRAY_SIZE(recovery_pass_fns) && !ret) {
-		unsigned prev_done = c->recovery.pass_done;
-		unsigned pass = c->recovery.curr_pass;
+	r->passes_to_run = orig_passes_to_run;
 
-		c->recovery.next_pass = pass + 1;
+	while (r->passes_to_run) {
+		unsigned prev_done = r->pass_done;
+		unsigned pass = __ffs64(r->passes_to_run);
+		r->curr_pass = pass;
+		r->next_pass = r->curr_pass + 1;
+		r->passes_to_run &= ~BIT_ULL(pass);
 
-		if (c->opts.recovery_pass_last &&
-		    c->recovery.curr_pass > c->opts.recovery_pass_last)
-			break;
+		spin_unlock_irq(&r->lock);
+
+		int ret2 = bch2_run_recovery_pass(c, pass) ?:
+			bch2_journal_flush(&c->journal);
 
-		if (should_run_recovery_pass(c, pass)) {
-			spin_unlock_irq(&c->recovery.lock);
-			ret =   bch2_run_recovery_pass(c, pass) ?:
-				bch2_journal_flush(&c->journal);
-			spin_lock_irq(&c->recovery.lock);
-
-			if (c->recovery.next_pass < c->recovery.curr_pass) {
-				/*
-				 * bch2_run_explicit_recovery_pass() was called: we
-				 * can't always catch -BCH_ERR_restart_recovery because
-				 * it may have been called from another thread (btree
-				 * node read completion)
-				 */
-				ret = 0;
-				c->recovery.passes_complete &= ~(~0ULL << c->recovery.curr_pass);
-			} else {
-				c->recovery.passes_complete |= BIT_ULL(pass);
-				c->recovery.pass_done = max(c->recovery.pass_done, pass);
-			}
+		spin_lock_irq(&r->lock);
+
+		if (r->next_pass < r->curr_pass) {
+			/* Rewind: */
+			r->passes_to_run |= orig_passes_to_run & (~0ULL << r->next_pass);
+		} else if (!ret2) {
+			r->pass_done = max(r->pass_done, pass);
+			r->passes_complete |= BIT_ULL(pass);
+		} else {
+			ret = ret2;
 		}
 
-		c->recovery.curr_pass = c->recovery.next_pass;
+		if (ret && !online)
+			break;
 
 		if (prev_done <= BCH_RECOVERY_PASS_check_snapshots &&
-		    c->recovery.pass_done > BCH_RECOVERY_PASS_check_snapshots) {
+		    r->pass_done > BCH_RECOVERY_PASS_check_snapshots) {
 			bch2_copygc_wakeup(c);
 			bch2_rebalance_wakeup(c);
 		}
 	}
 
-	spin_unlock_irq(&c->recovery.lock);
+	spin_unlock_irq(&r->lock);
+
+	return ret;
+}
+
+int bch2_run_online_recovery_passes(struct bch_fs *c, u64 passes)
+{
+	return __bch2_run_recovery_passes(c, c->sb.recovery_passes_required|passes, true);
+}
+
+int bch2_run_recovery_passes(struct bch_fs *c, enum bch_recovery_pass from)
+{
+	u64 passes =
+		bch2_recovery_passes_match(PASS_ALWAYS) |
+		(!c->sb.clean ? bch2_recovery_passes_match(PASS_UNCLEAN) : 0) |
+		(c->opts.fsck ? bch2_recovery_passes_match(PASS_FSCK) : 0) |
+		c->opts.recovery_passes |
+		c->sb.recovery_passes_required;
+
+	/*
+	 * We can't allow set_may_go_rw to be excluded; that would cause us to
+	 * use the journal replay keys for updates where it's not expected.
+	 */
+	c->opts.recovery_passes_exclude &= ~BIT_ULL(BCH_RECOVERY_PASS_set_may_go_rw);
+	passes &= ~c->opts.recovery_passes_exclude;
+
+	passes &= ~(BIT_ULL(from) - 1);
+
+	down(&c->recovery.run_lock);
+	int ret = __bch2_run_recovery_passes(c, passes, false);
 	up(&c->recovery.run_lock);
 
 	return ret;
diff --git a/fs/bcachefs/recovery_passes.h b/fs/bcachefs/recovery_passes.h
index 4c03472be5b9..0e79cc33fd8f 100644
--- a/fs/bcachefs/recovery_passes.h
+++ b/fs/bcachefs/recovery_passes.h
@@ -17,8 +17,8 @@ int __bch2_run_explicit_recovery_pass_persistent(struct bch_fs *, struct printbu
 int bch2_run_explicit_recovery_pass_persistent(struct bch_fs *, struct printbuf *,
 					       enum bch_recovery_pass);
 
-int bch2_run_online_recovery_passes(struct bch_fs *);
-int bch2_run_recovery_passes(struct bch_fs *);
+int bch2_run_online_recovery_passes(struct bch_fs *, u64);
+int bch2_run_recovery_passes(struct bch_fs *, enum bch_recovery_pass);
 
 void bch2_fs_recovery_passes_init(struct bch_fs *);
 
diff --git a/fs/bcachefs/recovery_passes_types.h b/fs/bcachefs/recovery_passes_types.h
index 69e8e29d58d0..deb6e0565cb9 100644
--- a/fs/bcachefs/recovery_passes_types.h
+++ b/fs/bcachefs/recovery_passes_types.h
@@ -14,8 +14,10 @@ struct bch_fs_recovery {
 	enum bch_recovery_pass	next_pass;
 	/* never rewinds version of curr_pass */
 	enum bch_recovery_pass	pass_done;
+	u64			passes_to_run;
 	/* bitmask of recovery passes that we actually ran */
 	u64			passes_complete;
+	u64			passes_failing;
 	spinlock_t		lock;
 	struct semaphore	run_lock;
 };
-- 
2.49.0


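A minimal userspace sketch of the bitmask-driven loop in __bch2_run_recovery_passes() from patch 2, under simplifying assumptions: no locking, no journal flush, a hypothetical run_pass() stub in place of bch2_run_recovery_pass(), and the GCC/Clang builtin __builtin_ctzll() standing in for the kernel's __ffs64(). Passes run lowest-first. A rewind re-queues every originally requested pass at or above next_pass. A failure is recorded, and only stops the loop when not running online.

```c
#include <assert.h>
#include <stdint.h>

#define BIT_ULL(n) (1ULL << (n))

/* Hypothetical, simplified mirror of struct bch_fs_recovery. */
struct recovery {
	unsigned curr_pass, next_pass, pass_done;
	uint64_t passes_to_run, passes_complete;
};

static int rewound;

/*
 * Hypothetical stand-in for bch2_run_recovery_pass(): pass 3 always
 * fails, and the first time pass 4 runs it requests a rewind to pass 1,
 * the way an explicit recovery pass request lowers next_pass.
 */
static int run_pass(struct recovery *r, unsigned pass)
{
	if (pass == 3)
		return -1;
	if (pass == 4 && !rewound) {
		rewound = 1;
		r->next_pass = 1;
	}
	return 0;
}

static int run_passes(struct recovery *r, uint64_t orig, int online)
{
	int ret = 0;

	r->passes_to_run = orig;
	while (r->passes_to_run) {
		/* Lowest set bit first, like __ffs64() */
		unsigned pass = (unsigned) __builtin_ctzll(r->passes_to_run);

		r->curr_pass = pass;
		r->next_pass = pass + 1;
		r->passes_to_run &= ~BIT_ULL(pass);

		int ret2 = run_pass(r, pass);

		if (r->next_pass < r->curr_pass) {
			/* Rewind: re-queue requested passes >= next_pass */
			r->passes_to_run |= orig & (~0ULL << r->next_pass);
		} else if (!ret2) {
			if (pass > r->pass_done)
				r->pass_done = pass;
			r->passes_complete |= BIT_ULL(pass);
		} else {
			ret = ret2;
		}

		/* Offline (mount-time) recovery stops at the first failure */
		if (ret && !online)
			break;
	}
	return ret;
}
```

Requesting passes 1-5 online (mask 0x3E), this sketch runs 1, 2, 3 (fails), 4 (rewinds), then 1, 2, 3 (fails again), 4 and 5, ending with passes 1, 2, 4 and 5 complete. Run offline, it stops at pass 3 with passes 4 and 5 still queued.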

* [PATCH 3/8] bcachefs: Reduce usage of recovery.curr_pass
From: Kent Overstreet @ 2025-05-17 19:25 UTC
  To: linux-bcachefs; +Cc: Kent Overstreet

We want recovery.curr_pass to be private to the recovery passes code,
so that we can report recovery pass status accurately; it can also
rewind, so it's generally not the right member for other code to check.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
---
 fs/bcachefs/alloc_background.c      | 3 ++-
 fs/bcachefs/alloc_foreground.c      | 6 +++---
 fs/bcachefs/backpointers.c          | 7 ++++---
 fs/bcachefs/btree_cache.c           | 2 +-
 fs/bcachefs/btree_io.c              | 3 +--
 fs/bcachefs/btree_update_interior.c | 2 +-
 fs/bcachefs/fsck.c                  | 1 -
 fs/bcachefs/snapshot.c              | 2 +-
 8 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/fs/bcachefs/alloc_background.c b/fs/bcachefs/alloc_background.c
index 4ae2aa6ea758..88e710ba2685 100644
--- a/fs/bcachefs/alloc_background.c
+++ b/fs/bcachefs/alloc_background.c
@@ -309,7 +309,8 @@ int bch2_alloc_v4_validate(struct bch_fs *c, struct bkey_s_c k,
 				 "data type inconsistency");
 
 		bkey_fsck_err_on(!a.io_time[READ] &&
-				 c->recovery.curr_pass > BCH_RECOVERY_PASS_check_alloc_to_lru_refs,
+				 !(c->recovery.passes_to_run &
+				   BIT_ULL(BCH_RECOVERY_PASS_check_alloc_to_lru_refs)),
 				 c, alloc_key_cached_but_read_time_zero,
 				 "cached bucket with read_time == 0");
 		break;
diff --git a/fs/bcachefs/alloc_foreground.c b/fs/bcachefs/alloc_foreground.c
index 76641cc4c27d..1a52c12c51ae 100644
--- a/fs/bcachefs/alloc_foreground.c
+++ b/fs/bcachefs/alloc_foreground.c
@@ -154,7 +154,7 @@ static struct open_bucket *bch2_open_bucket_alloc(struct bch_fs *c)
 
 static inline bool is_superblock_bucket(struct bch_fs *c, struct bch_dev *ca, u64 b)
 {
-	if (c->recovery.curr_pass > BCH_RECOVERY_PASS_trans_mark_dev_sbs)
+	if (c->recovery.passes_complete & BIT_ULL(BCH_RECOVERY_PASS_trans_mark_dev_sbs))
 		return false;
 
 	return bch2_is_superblock_bucket(ca, b);
@@ -524,7 +524,7 @@ static struct open_bucket *bch2_bucket_alloc_trans(struct btree_trans *trans,
 
 	if (!avail) {
 		if (req->watermark > BCH_WATERMARK_normal &&
-		    c->recovery.curr_pass <= BCH_RECOVERY_PASS_check_allocations)
+		    c->recovery.pass_done < BCH_RECOVERY_PASS_check_allocations)
 			goto alloc;
 
 		if (cl && !waiting) {
@@ -554,7 +554,7 @@ static struct open_bucket *bch2_bucket_alloc_trans(struct btree_trans *trans,
 		goto alloc;
 	}
 
-	if (!ob && freespace && c->recovery.curr_pass <= BCH_RECOVERY_PASS_check_alloc_info) {
+	if (!ob && freespace && c->recovery.pass_done < BCH_RECOVERY_PASS_check_alloc_info) {
 		freespace = false;
 		goto alloc;
 	}
diff --git a/fs/bcachefs/backpointers.c b/fs/bcachefs/backpointers.c
index 44da8e2657af..d9ddfc4b5dcc 100644
--- a/fs/bcachefs/backpointers.c
+++ b/fs/bcachefs/backpointers.c
@@ -104,6 +104,8 @@ static noinline int backpointer_mod_err(struct btree_trans *trans,
 {
 	struct bch_fs *c = trans->c;
 	struct printbuf buf = PRINTBUF;
+	bool will_check = c->recovery.passes_to_run &
+		BIT_ULL(BCH_RECOVERY_PASS_check_extents_to_backpointers);
 	int ret = 0;
 
 	if (insert) {
@@ -120,7 +122,7 @@ static noinline int backpointer_mod_err(struct btree_trans *trans,
 		bch2_bkey_val_to_text(&buf, c, orig_k);
 
 		bch_err(c, "%s", buf.buf);
-	} else if (c->recovery.curr_pass > BCH_RECOVERY_PASS_check_extents_to_backpointers) {
+	} else if (!will_check) {
 		prt_printf(&buf, "backpointer not found when deleting\n");
 		printbuf_indent_add(&buf, 2);
 
@@ -136,8 +138,7 @@ static noinline int backpointer_mod_err(struct btree_trans *trans,
 		bch2_bkey_val_to_text(&buf, c, orig_k);
 	}
 
-	if (c->recovery.curr_pass > BCH_RECOVERY_PASS_check_extents_to_backpointers &&
-	    __bch2_inconsistent_error(c, &buf))
+	if (!will_check && __bch2_inconsistent_error(c, &buf))
 		ret = -BCH_ERR_erofs_unfixed_errors;
 
 	bch_err(c, "%s", buf.buf);
diff --git a/fs/bcachefs/btree_cache.c b/fs/bcachefs/btree_cache.c
index 2600a97582b1..a5d983309311 100644
--- a/fs/bcachefs/btree_cache.c
+++ b/fs/bcachefs/btree_cache.c
@@ -1019,7 +1019,7 @@ static noinline void btree_bad_header(struct bch_fs *c, struct btree *b)
 {
 	struct printbuf buf = PRINTBUF;
 
-	if (c->recovery.curr_pass <= BCH_RECOVERY_PASS_check_allocations)
+	if (c->recovery.pass_done < BCH_RECOVERY_PASS_check_allocations)
 		return;
 
 	prt_printf(&buf,
diff --git a/fs/bcachefs/btree_io.c b/fs/bcachefs/btree_io.c
index e5db374f001b..34018296053a 100644
--- a/fs/bcachefs/btree_io.c
+++ b/fs/bcachefs/btree_io.c
@@ -1775,8 +1775,7 @@ void bch2_btree_node_read(struct btree_trans *trans, struct btree *b,
 		prt_newline(&buf);
 		bch2_btree_lost_data(c, &buf, b->c.btree_id);
 
-		if (c->opts.recovery_passes & BIT_ULL(BCH_RECOVERY_PASS_check_topology) &&
-		    c->recovery.curr_pass > BCH_RECOVERY_PASS_check_topology &&
+		if (c->recovery.passes_complete & BIT_ULL(BCH_RECOVERY_PASS_check_topology) &&
 		    bch2_fs_emergency_read_only2(c, &buf))
 			ratelimit = false;
 
diff --git a/fs/bcachefs/btree_update_interior.c b/fs/bcachefs/btree_update_interior.c
index a658c97439ed..74e65714fecd 100644
--- a/fs/bcachefs/btree_update_interior.c
+++ b/fs/bcachefs/btree_update_interior.c
@@ -2363,7 +2363,7 @@ void bch2_btree_node_rewrite_async(struct bch_fs *c, struct btree *b)
 	bool now = false, pending = false;
 
 	spin_lock(&c->btree_node_rewrites_lock);
-	if (c->recovery.curr_pass > BCH_RECOVERY_PASS_journal_replay &&
+	if (c->recovery.passes_complete & BIT_ULL(BCH_RECOVERY_PASS_journal_replay) &&
 	    enumerated_ref_tryget(&c->writes, BCH_WRITE_REF_node_rewrite)) {
 		list_add(&a->list, &c->btree_node_rewrites);
 		now = true;
diff --git a/fs/bcachefs/fsck.c b/fs/bcachefs/fsck.c
index 0909716900f9..44997bdef95f 100644
--- a/fs/bcachefs/fsck.c
+++ b/fs/bcachefs/fsck.c
@@ -3140,7 +3140,6 @@ static int bch2_fsck_online_thread_fn(struct thread_with_stdio *stdio)
 	c->opts.fsck = true;
 	set_bit(BCH_FS_in_fsck, &c->flags);
 
-	c->recovery.curr_pass = BCH_RECOVERY_PASS_check_alloc_info;
 	int ret = bch2_run_online_recovery_passes(c, ~0ULL);
 
 	clear_bit(BCH_FS_in_fsck, &c->flags);
diff --git a/fs/bcachefs/snapshot.c b/fs/bcachefs/snapshot.c
index c401d5285701..24903e7de296 100644
--- a/fs/bcachefs/snapshot.c
+++ b/fs/bcachefs/snapshot.c
@@ -348,7 +348,7 @@ static int __bch2_mark_snapshot(struct btree_trans *trans,
 
 		if (BCH_SNAPSHOT_WILL_DELETE(s.v)) {
 			set_bit(BCH_FS_need_delete_dead_snapshots, &c->flags);
-			if (c->recovery.curr_pass > BCH_RECOVERY_PASS_delete_dead_snapshots)
+			if (c->recovery.pass_done > BCH_RECOVERY_PASS_delete_dead_snapshots)
 				bch2_delete_dead_snapshots_async(c);
 		}
 	} else {
-- 
2.49.0


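A small sketch of why patch 3 replaces position checks such as "curr_pass > X" with bitmask checks. When only a subset of passes is run (as online fsck does), the pass loop can be far past X even though X never ran. The pass numbers below are hypothetical and chosen only for illustration; the real values come from the bch_recovery_pass enum.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BIT_ULL(n) (1ULL << (n))

/* Hypothetical pass numbers, for illustration only. */
enum pass {
	PASS_check_topology	= 3,
	PASS_check_alloc_info	= 6,
	PASS_check_lrus		= 8,
};

/* Old style: infer "pass X has run" from where the pass loop is. */
static bool ran_by_position(unsigned curr_pass, enum pass x)
{
	return curr_pass > x;
}

/* New style: consult the bitmask of passes that actually completed. */
static bool ran_by_bitmask(uint64_t passes_complete, enum pass x)
{
	return passes_complete & BIT_ULL(x);
}
```

For example, after an online run of only check_alloc_info and check_lrus, the position check claims check_topology has run while the bitmask check correctly says it has not; this is the distinction the btree_io.c hunk relies on.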

* [PATCH 4/8] bcachefs: bch2_recovery_pass_status_to_text()
From: Kent Overstreet @ 2025-05-17 19:25 UTC
  To: linux-bcachefs; +Cc: Kent Overstreet

Show recovery pass status in sysfs; this matters now that we're running
recovery passes automatically in the background.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
---
 fs/bcachefs/recovery_passes.c | 24 ++++++++++++++++++++++++
 fs/bcachefs/recovery_passes.h |  2 ++
 fs/bcachefs/sysfs.c           |  6 ++++++
 3 files changed, 32 insertions(+)

diff --git a/fs/bcachefs/recovery_passes.c b/fs/bcachefs/recovery_passes.c
index e0e261aa752e..02639b3d86b0 100644
--- a/fs/bcachefs/recovery_passes.c
+++ b/fs/bcachefs/recovery_passes.c
@@ -444,6 +444,30 @@ int bch2_run_recovery_passes(struct bch_fs *c, enum bch_recovery_pass from)
 	return ret;
 }
 
+static void prt_passes(struct printbuf *out, const char *msg, u64 passes)
+{
+	prt_printf(out, "%s:\t", msg);
+	prt_bitflags(out, bch2_recovery_passes, passes);
+	prt_newline(out);
+}
+
+void bch2_recovery_pass_status_to_text(struct printbuf *out, struct bch_fs *c)
+{
+	struct bch_fs_recovery *r = &c->recovery;
+
+	printbuf_tabstop_push(out, 32);
+	prt_passes(out, "Scheduled passes", c->sb.recovery_passes_required);
+	prt_passes(out, "Scheduled online passes", c->sb.recovery_passes_required &
+		   bch2_recovery_passes_match(PASS_ONLINE));
+	prt_passes(out, "Complete passes", r->passes_complete);
+	prt_passes(out, "Failing passes", r->passes_failing);
+
+	if (r->curr_pass) {
+		prt_printf(out, "Current pass:\t%s\n", bch2_recovery_passes[r->curr_pass]);
+		prt_passes(out, "Current passes", r->passes_to_run);
+	}
+}
+
 void bch2_fs_recovery_passes_init(struct bch_fs *c)
 {
 	spin_lock_init(&c->recovery.lock);
diff --git a/fs/bcachefs/recovery_passes.h b/fs/bcachefs/recovery_passes.h
index 0e79cc33fd8f..8c90e29cd6cb 100644
--- a/fs/bcachefs/recovery_passes.h
+++ b/fs/bcachefs/recovery_passes.h
@@ -20,6 +20,8 @@ int bch2_run_explicit_recovery_pass_persistent(struct bch_fs *, struct printbuf
 int bch2_run_online_recovery_passes(struct bch_fs *, u64);
 int bch2_run_recovery_passes(struct bch_fs *, enum bch_recovery_pass);
 
+void bch2_recovery_pass_status_to_text(struct printbuf *, struct bch_fs *);
+
 void bch2_fs_recovery_passes_init(struct bch_fs *);
 
 #endif /* _BCACHEFS_RECOVERY_PASSES_H */
diff --git a/fs/bcachefs/sysfs.c b/fs/bcachefs/sysfs.c
index 907d4246b8ef..0101eb025117 100644
--- a/fs/bcachefs/sysfs.c
+++ b/fs/bcachefs/sysfs.c
@@ -35,6 +35,7 @@
 #include "nocow_locking.h"
 #include "opts.h"
 #include "rebalance.h"
+#include "recovery_passes.h"
 #include "replicas.h"
 #include "super-io.h"
 #include "tests.h"
@@ -202,6 +203,7 @@ read_attribute(copy_gc_wait);
 sysfs_pd_controller_attribute(rebalance);
 read_attribute(rebalance_status);
 read_attribute(snapshot_delete_status);
+read_attribute(recovery_status);
 
 read_attribute(new_stripes);
 
@@ -437,6 +439,9 @@ SHOW(bch2_fs)
 	if (attr == &sysfs_snapshot_delete_status)
 		bch2_snapshot_delete_status_to_text(out, c);
 
+	if (attr == &sysfs_recovery_status)
+		bch2_recovery_pass_status_to_text(out, c);
+
 	/* Debugging: */
 
 	if (attr == &sysfs_journal_debug)
@@ -587,6 +592,7 @@ struct attribute *bch2_fs_files[] = {
 
 	&sysfs_rebalance_status,
 	&sysfs_snapshot_delete_status,
+	&sysfs_recovery_status,
 
 	&sysfs_compression_stats,
 
-- 
2.49.0


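The "Scheduled passes", "Complete passes" and similar lines in recovery_status print the names of the set bits in a pass bitmask. A userspace sketch of that idea, assuming comma-separated output and a hypothetical, truncated name table; it is not the kernel's prt_bitflags() implementation, whose exact formatting may differ.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical subset of pass names, indexed by pass number. */
static const char * const pass_names[] = {
	"scan_for_btree_nodes", "check_topology", "accounting_read",
	"alloc_read", "journal_replay", NULL,
};

/* Write the names of the set bits in @flags to @buf, comma separated. */
static void bitflags_to_text(char *buf, size_t size,
			     const char * const names[], uint64_t flags)
{
	size_t pos = 0;
	int first = 1;

	buf[0] = '\0';
	for (unsigned i = 0; i < 64 && names[i]; i++) {
		if (!(flags & (1ULL << i)))
			continue;

		int n = snprintf(buf + pos, size - pos, "%s%s",
				 first ? "" : ",", names[i]);
		if (n < 0 || (size_t) n >= size - pos)
			break;	/* out of space: output is truncated */
		pos += n;
		first = 0;
	}
}
```

Bits with no entry in the name table are silently skipped in this sketch.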

* [PATCH 5/8] bcachefs: bch2_run_explicit_recovery_pass() cleanup
From: Kent Overstreet @ 2025-05-17 19:25 UTC
  To: linux-bcachefs; +Cc: Kent Overstreet

Consolidate the run_explicit_recovery_pass() interfaces by adding a
flags parameter; this will also let us add a RUN_RECOVERY_PASS_ratelimit
flag.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
---
 fs/bcachefs/btree_node_scan.c |   2 +-
 fs/bcachefs/buckets.c         |   8 +-
 fs/bcachefs/errcode.h         |   1 -
 fs/bcachefs/error.c           |   2 +-
 fs/bcachefs/recovery.c        |  31 ++++----
 fs/bcachefs/recovery_passes.c | 142 +++++++++++++++++++---------------
 fs/bcachefs/recovery_passes.h |  18 +++--
 fs/bcachefs/sb-members.c      |   4 +-
 fs/bcachefs/subvolume.c       |   6 +-
 9 files changed, 117 insertions(+), 97 deletions(-)

diff --git a/fs/bcachefs/btree_node_scan.c b/fs/bcachefs/btree_node_scan.c
index 7bd13438d5ef..5a97a6b8a757 100644
--- a/fs/bcachefs/btree_node_scan.c
+++ b/fs/bcachefs/btree_node_scan.c
@@ -541,7 +541,7 @@ int bch2_get_scanned_nodes(struct bch_fs *c, enum btree_id btree,
 
 	struct find_btree_nodes *f = &c->found_btree_nodes;
 
-	int ret = bch2_run_explicit_recovery_pass(c, BCH_RECOVERY_PASS_scan_for_btree_nodes);
+	int ret = bch2_run_print_explicit_recovery_pass(c, BCH_RECOVERY_PASS_scan_for_btree_nodes);
 	if (ret)
 		return ret;
 
diff --git a/fs/bcachefs/buckets.c b/fs/bcachefs/buckets.c
index 8d6955ef631b..ca6e58d6fbc8 100644
--- a/fs/bcachefs/buckets.c
+++ b/fs/bcachefs/buckets.c
@@ -399,8 +399,8 @@ static int bucket_ref_update_err(struct btree_trans *trans, struct printbuf *buf
 
 	bool print = __bch2_count_fsck_err(c, id, buf);
 
-	int ret = bch2_run_explicit_recovery_pass_persistent(c, buf,
-					BCH_RECOVERY_PASS_check_allocations);
+	int ret = bch2_run_explicit_recovery_pass(c, buf,
+					BCH_RECOVERY_PASS_check_allocations, 0);
 
 	if (insert) {
 		bch2_trans_updates_to_text(buf, trans);
@@ -972,8 +972,8 @@ static int __bch2_trans_mark_metadata_bucket(struct btree_trans *trans,
 
 		bool print = bch2_count_fsck_err(c, bucket_metadata_type_mismatch, &buf);
 
-		bch2_run_explicit_recovery_pass_persistent(c, &buf,
-					BCH_RECOVERY_PASS_check_allocations);
+		bch2_run_explicit_recovery_pass(c, &buf,
+					BCH_RECOVERY_PASS_check_allocations, 0);
 
 		if (print)
 			bch2_print_str(c, KERN_ERR, buf.buf);
diff --git a/fs/bcachefs/errcode.h b/fs/bcachefs/errcode.h
index 4aac0182cbed..62843e772b2c 100644
--- a/fs/bcachefs/errcode.h
+++ b/fs/bcachefs/errcode.h
@@ -183,7 +183,6 @@
 	x(BCH_ERR_fsck,			fsck_repair_unimplemented)		\
 	x(BCH_ERR_fsck,			fsck_repair_impossible)			\
 	x(EINVAL,			restart_recovery)			\
-	x(EINVAL,			not_in_recovery)			\
 	x(EINVAL,			cannot_rewind_recovery)			\
 	x(0,				data_update_done)			\
 	x(BCH_ERR_data_update_done,	data_update_done_would_block)		\
diff --git a/fs/bcachefs/error.c b/fs/bcachefs/error.c
index 52f1108d5829..a476dd2c196e 100644
--- a/fs/bcachefs/error.c
+++ b/fs/bcachefs/error.c
@@ -102,7 +102,7 @@ int __bch2_topology_error(struct bch_fs *c, struct printbuf *out)
 		__bch2_inconsistent_error(c, out);
 		return -BCH_ERR_btree_need_topology_repair;
 	} else {
-		return bch2_run_explicit_recovery_pass_persistent(c, out, BCH_RECOVERY_PASS_check_topology) ?:
+		return bch2_run_explicit_recovery_pass(c, out, BCH_RECOVERY_PASS_check_topology, 0) ?:
 			-BCH_ERR_btree_node_read_validate_error;
 	}
 }
diff --git a/fs/bcachefs/recovery.c b/fs/bcachefs/recovery.c
index a7e6b5a6505a..0f954567ea45 100644
--- a/fs/bcachefs/recovery.c
+++ b/fs/bcachefs/recovery.c
@@ -52,24 +52,24 @@ int bch2_btree_lost_data(struct bch_fs *c,
 	}
 
 	/* Once we have runtime self healing for topology errors we won't need this: */
-	ret = __bch2_run_explicit_recovery_pass_persistent(c, msg, BCH_RECOVERY_PASS_check_topology) ?: ret;
+	ret = __bch2_run_explicit_recovery_pass(c, msg, BCH_RECOVERY_PASS_check_topology, 0) ?: ret;
 
 	/* Btree node accounting will be off: */
 	__set_bit_le64(BCH_FSCK_ERR_accounting_mismatch, ext->errors_silent);
-	ret = __bch2_run_explicit_recovery_pass_persistent(c, msg, BCH_RECOVERY_PASS_check_allocations) ?: ret;
+	ret = __bch2_run_explicit_recovery_pass(c, msg, BCH_RECOVERY_PASS_check_allocations, 0) ?: ret;
 
 #ifdef CONFIG_BCACHEFS_DEBUG
 	/*
 	 * These are much more minor, and don't need to be corrected right away,
 	 * but in debug mode we want the next fsck run to be clean:
 	 */
-	ret = __bch2_run_explicit_recovery_pass_persistent(c, msg, BCH_RECOVERY_PASS_check_lrus) ?: ret;
-	ret = __bch2_run_explicit_recovery_pass_persistent(c, msg, BCH_RECOVERY_PASS_check_backpointers_to_extents) ?: ret;
+	ret = __bch2_run_explicit_recovery_pass(c, msg, BCH_RECOVERY_PASS_check_lrus, 0) ?: ret;
+	ret = __bch2_run_explicit_recovery_pass(c, msg, BCH_RECOVERY_PASS_check_backpointers_to_extents, 0) ?: ret;
 #endif
 
 	switch (btree) {
 	case BTREE_ID_alloc:
-		ret = __bch2_run_explicit_recovery_pass_persistent(c, msg, BCH_RECOVERY_PASS_check_alloc_info) ?: ret;
+		ret = __bch2_run_explicit_recovery_pass(c, msg, BCH_RECOVERY_PASS_check_alloc_info, 0) ?: ret;
 
 		__set_bit_le64(BCH_FSCK_ERR_alloc_key_data_type_wrong, ext->errors_silent);
 		__set_bit_le64(BCH_FSCK_ERR_alloc_key_gen_wrong, ext->errors_silent);
@@ -79,30 +79,30 @@ int bch2_btree_lost_data(struct bch_fs *c,
 		__set_bit_le64(BCH_FSCK_ERR_alloc_key_stripe_redundancy_wrong, ext->errors_silent);
 		goto out;
 	case BTREE_ID_backpointers:
-		ret = __bch2_run_explicit_recovery_pass_persistent(c, msg, BCH_RECOVERY_PASS_check_btree_backpointers) ?: ret;
-		ret = __bch2_run_explicit_recovery_pass_persistent(c, msg, BCH_RECOVERY_PASS_check_extents_to_backpointers) ?: ret;
+		ret = __bch2_run_explicit_recovery_pass(c, msg, BCH_RECOVERY_PASS_check_btree_backpointers, 0) ?: ret;
+		ret = __bch2_run_explicit_recovery_pass(c, msg, BCH_RECOVERY_PASS_check_extents_to_backpointers, 0) ?: ret;
 		goto out;
 	case BTREE_ID_need_discard:
-		ret = __bch2_run_explicit_recovery_pass_persistent(c, msg, BCH_RECOVERY_PASS_check_alloc_info) ?: ret;
+		ret = __bch2_run_explicit_recovery_pass(c, msg, BCH_RECOVERY_PASS_check_alloc_info, 0) ?: ret;
 		goto out;
 	case BTREE_ID_freespace:
-		ret = __bch2_run_explicit_recovery_pass_persistent(c, msg, BCH_RECOVERY_PASS_check_alloc_info) ?: ret;
+		ret = __bch2_run_explicit_recovery_pass(c, msg, BCH_RECOVERY_PASS_check_alloc_info, 0) ?: ret;
 		goto out;
 	case BTREE_ID_bucket_gens:
-		ret = __bch2_run_explicit_recovery_pass_persistent(c, msg, BCH_RECOVERY_PASS_check_alloc_info) ?: ret;
+		ret = __bch2_run_explicit_recovery_pass(c, msg, BCH_RECOVERY_PASS_check_alloc_info, 0) ?: ret;
 		goto out;
 	case BTREE_ID_lru:
-		ret = __bch2_run_explicit_recovery_pass_persistent(c, msg, BCH_RECOVERY_PASS_check_alloc_info) ?: ret;
+		ret = __bch2_run_explicit_recovery_pass(c, msg, BCH_RECOVERY_PASS_check_alloc_info, 0) ?: ret;
 		goto out;
 	case BTREE_ID_accounting:
-		ret = __bch2_run_explicit_recovery_pass_persistent(c, msg, BCH_RECOVERY_PASS_check_allocations) ?: ret;
+		ret = __bch2_run_explicit_recovery_pass(c, msg, BCH_RECOVERY_PASS_check_allocations, 0) ?: ret;
 		goto out;
 	case BTREE_ID_snapshots:
-		ret = __bch2_run_explicit_recovery_pass_persistent(c, msg, BCH_RECOVERY_PASS_reconstruct_snapshots) ?: ret;
-		ret = __bch2_run_explicit_recovery_pass_persistent(c, msg, BCH_RECOVERY_PASS_scan_for_btree_nodes) ?: ret;
+		ret = __bch2_run_explicit_recovery_pass(c, msg, BCH_RECOVERY_PASS_reconstruct_snapshots, 0) ?: ret;
+		ret = __bch2_run_explicit_recovery_pass(c, msg, BCH_RECOVERY_PASS_scan_for_btree_nodes, 0) ?: ret;
 		goto out;
 	default:
-		ret = __bch2_run_explicit_recovery_pass_persistent(c, msg, BCH_RECOVERY_PASS_scan_for_btree_nodes) ?: ret;
+		ret = __bch2_run_explicit_recovery_pass(c, msg, BCH_RECOVERY_PASS_scan_for_btree_nodes, 0) ?: ret;
 		goto out;
 	}
 out:
@@ -978,7 +978,6 @@ int bch2_fs_recovery(struct bch_fs *c)
 	 */
 	set_bit(BCH_FS_may_go_rw, &c->flags);
 	clear_bit(BCH_FS_in_fsck, &c->flags);
-	clear_bit(BCH_FS_in_recovery, &c->flags);
 
 	/* in case we don't run journal replay, i.e. norecovery mode */
 	set_bit(BCH_FS_accounting_replay_done, &c->flags);
diff --git a/fs/bcachefs/recovery_passes.c b/fs/bcachefs/recovery_passes.c
index 02639b3d86b0..b931a9b465d4 100644
--- a/fs/bcachefs/recovery_passes.c
+++ b/fs/bcachefs/recovery_passes.c
@@ -218,104 +218,119 @@ u64 bch2_fsck_recovery_passes(void)
 	return bch2_recovery_passes_match(PASS_FSCK);
 }
 
+static bool recovery_pass_needs_set(struct bch_fs *c,
+				    enum bch_recovery_pass pass,
+				    enum bch_run_recovery_pass_flags flags)
+{
+	struct bch_fs_recovery *r = &c->recovery;
+	bool in_recovery = test_bit(BCH_FS_in_recovery, &c->flags);
+	bool persistent = !in_recovery || !(flags & RUN_RECOVERY_PASS_nopersistent);
+
+	/*
+	 * If RUN_RECOVERY_PASS_nopersistent is set, we don't want to do
+	 * anything if the pass has already run: these requests mean we need a
+	 * prior pass to run before we can continue repairing, and we don't
+	 * expect that pass to fix the damage we encountered.
+	 *
+	 * Otherwise, we call run_explicit_recovery_pass() when we find damage,
+	 * so the pass should run again even if it has already run:
+	 */
+
+	return persistent
+		? !(c->sb.recovery_passes_required & BIT_ULL(pass))
+		: !((r->passes_to_run|r->passes_complete) & BIT_ULL(pass));
+}
+
 /*
  * For when we need to rewind recovery passes and run a pass we skipped:
  */
-static int __bch2_run_explicit_recovery_pass(struct printbuf *out,
-					     struct bch_fs *c,
-					     enum bch_recovery_pass pass)
+int __bch2_run_explicit_recovery_pass(struct bch_fs *c,
+				      struct printbuf *out,
+				      enum bch_recovery_pass pass,
+				      enum bch_run_recovery_pass_flags flags)
 {
 	struct bch_fs_recovery *r = &c->recovery;
+	int ret = 0;
 
-	if (r->curr_pass == ARRAY_SIZE(recovery_pass_fns))
-		return -BCH_ERR_not_in_recovery;
+	lockdep_assert_held(&c->sb_lock);
 
-	if (r->passes_complete & BIT_ULL(pass))
-		return 0;
+	bch2_printbuf_make_room(out, 1024);
+	out->atomic++;
 
-	bool print = !(c->opts.recovery_passes & BIT_ULL(pass));
+	unsigned long lockflags;
+	spin_lock_irqsave(&r->lock, lockflags);
 
-	if (pass < BCH_RECOVERY_PASS_set_may_go_rw &&
-	    r->curr_pass >= BCH_RECOVERY_PASS_set_may_go_rw) {
-		if (print)
-			prt_printf(out, "need recovery pass %s (%u), but already rw\n",
-				   bch2_recovery_passes[pass], pass);
-		return -BCH_ERR_cannot_rewind_recovery;
+	if (!recovery_pass_needs_set(c, pass, flags))
+		goto out;
+
+	bool in_recovery = test_bit(BCH_FS_in_recovery, &c->flags);
+	bool rewind = in_recovery && r->curr_pass > pass;
+
+	if ((flags & RUN_RECOVERY_PASS_nopersistent) && in_recovery) {
+		r->passes_to_run |= BIT_ULL(pass);
+	} else {
+		struct bch_sb_field_ext *ext = bch2_sb_field_get(c->disk_sb.sb, ext);
+		__set_bit_le64(bch2_recovery_pass_to_stable(pass), ext->recovery_passes_required);
 	}
 
-	if (print)
-		prt_printf(out, "running explicit recovery pass %s (%u), currently at %s (%u)\n",
-			   bch2_recovery_passes[pass], pass,
-			   bch2_recovery_passes[r->curr_pass], r->curr_pass);
+	if (pass < BCH_RECOVERY_PASS_set_may_go_rw &&
+	    (!in_recovery || r->curr_pass >= BCH_RECOVERY_PASS_set_may_go_rw)) {
+		prt_printf(out, "need recovery pass %s (%u), but already rw\n",
+			   bch2_recovery_passes[pass], pass);
+		ret = -BCH_ERR_cannot_rewind_recovery;
+		goto out;
+	}
 
-	c->opts.recovery_passes |= BIT_ULL(pass);
+	prt_printf(out, "running recovery pass %s (%u), currently at %s (%u)%s\n",
+		   bch2_recovery_passes[pass], pass,
+		   bch2_recovery_passes[r->curr_pass], r->curr_pass,
+		   rewind ? " - rewinding" : "");
 
 	if (test_bit(BCH_FS_in_recovery, &c->flags))
 		r->passes_to_run |= BIT_ULL(pass);
 
-	if (test_bit(BCH_FS_in_recovery, &c->flags) &&
-	    r->curr_pass > pass) {
+	if (rewind) {
 		r->next_pass = pass;
-		return -BCH_ERR_restart_recovery;
-	} else {
-		return 0;
+		r->passes_complete &= ~(~0ULL << pass);
+		ret = -BCH_ERR_restart_recovery;
 	}
-}
-
-static int bch2_run_explicit_recovery_pass_printbuf(struct bch_fs *c,
-				    struct printbuf *out,
-				    enum bch_recovery_pass pass)
-{
-	bch2_printbuf_make_room(out, 1024);
-	out->atomic++;
-
-	unsigned long flags;
-	spin_lock_irqsave(&c->recovery.lock, flags);
-	int ret = __bch2_run_explicit_recovery_pass(out, c, pass);
-	spin_unlock_irqrestore(&c->recovery.lock, flags);
-
+out:
+	spin_unlock_irqrestore(&r->lock, lockflags);
 	--out->atomic;
 	return ret;
 }
 
 int bch2_run_explicit_recovery_pass(struct bch_fs *c,
-				    enum bch_recovery_pass pass)
+				    struct printbuf *out,
+				    enum bch_recovery_pass pass,
+				    enum bch_run_recovery_pass_flags flags)
 {
-	struct printbuf buf = PRINTBUF;
-	bch2_log_msg_start(c, &buf);
-	unsigned len = buf.pos;
+	if (!recovery_pass_needs_set(c, pass, flags))
+		return 0;
 
-	int ret = bch2_run_explicit_recovery_pass_printbuf(c, &buf, pass);
+	mutex_lock(&c->sb_lock);
+	int ret = __bch2_run_explicit_recovery_pass(c, out, pass, flags);
+	bch2_write_super(c);
+	mutex_unlock(&c->sb_lock);
 
-	if (len != buf.pos)
-		bch2_print_str(c, KERN_NOTICE, buf.buf);
-	printbuf_exit(&buf);
 	return ret;
 }
 
-int __bch2_run_explicit_recovery_pass_persistent(struct bch_fs *c,
-						 struct printbuf *out,
-						 enum bch_recovery_pass pass)
-{
-	lockdep_assert_held(&c->sb_lock);
-
-	struct bch_sb_field_ext *ext = bch2_sb_field_get(c->disk_sb.sb, ext);
-	__set_bit_le64(bch2_recovery_pass_to_stable(pass), ext->recovery_passes_required);
-
-	return bch2_run_explicit_recovery_pass_printbuf(c, out, pass);
-}
-
-int bch2_run_explicit_recovery_pass_persistent(struct bch_fs *c,
-					       struct printbuf *out,
-					       enum bch_recovery_pass pass)
+int bch2_run_print_explicit_recovery_pass(struct bch_fs *c, enum bch_recovery_pass pass)
 {
-	if (c->sb.recovery_passes_required & BIT_ULL(pass))
+	if (!recovery_pass_needs_set(c, pass, RUN_RECOVERY_PASS_nopersistent))
 		return 0;
 
+	struct printbuf buf = PRINTBUF;
+	bch2_log_msg_start(c, &buf);
+
 	mutex_lock(&c->sb_lock);
-	int ret = __bch2_run_explicit_recovery_pass_persistent(c, out, pass);
+	int ret = __bch2_run_explicit_recovery_pass(c, &buf, pass,
+						RUN_RECOVERY_PASS_nopersistent);
 	mutex_unlock(&c->sb_lock);
 
+	bch2_print_str(c, KERN_NOTICE, buf.buf);
+	printbuf_exit(&buf);
 	return ret;
 }
 
@@ -409,6 +424,7 @@ static int __bch2_run_recovery_passes(struct bch_fs *c, u64 orig_passes_to_run,
 		}
 	}
 
+	clear_bit(BCH_FS_in_recovery, &c->flags);
 	spin_unlock_irq(&r->lock);
 
 	return ret;
diff --git a/fs/bcachefs/recovery_passes.h b/fs/bcachefs/recovery_passes.h
index 8c90e29cd6cb..30f896479a52 100644
--- a/fs/bcachefs/recovery_passes.h
+++ b/fs/bcachefs/recovery_passes.h
@@ -10,12 +10,18 @@ u64 bch2_recovery_passes_from_stable(u64 v);
 
 u64 bch2_fsck_recovery_passes(void);
 
-int bch2_run_explicit_recovery_pass(struct bch_fs *, enum bch_recovery_pass);
-
-int __bch2_run_explicit_recovery_pass_persistent(struct bch_fs *, struct printbuf *,
-					       enum bch_recovery_pass);
-int bch2_run_explicit_recovery_pass_persistent(struct bch_fs *, struct printbuf *,
-					       enum bch_recovery_pass);
+enum bch_run_recovery_pass_flags {
+	RUN_RECOVERY_PASS_nopersistent	= BIT(0),
+};
+
+int bch2_run_print_explicit_recovery_pass(struct bch_fs *, enum bch_recovery_pass);
+
+int __bch2_run_explicit_recovery_pass(struct bch_fs *, struct printbuf *,
+				      enum bch_recovery_pass,
+				      enum bch_run_recovery_pass_flags);
+int bch2_run_explicit_recovery_pass(struct bch_fs *, struct printbuf *,
+				    enum bch_recovery_pass,
+				    enum bch_run_recovery_pass_flags);
 
 int bch2_run_online_recovery_passes(struct bch_fs *, u64);
 int bch2_run_recovery_passes(struct bch_fs *, enum bch_recovery_pass);
diff --git a/fs/bcachefs/sb-members.c b/fs/bcachefs/sb-members.c
index 75184d8e685a..3398906660a5 100644
--- a/fs/bcachefs/sb-members.c
+++ b/fs/bcachefs/sb-members.c
@@ -20,8 +20,8 @@ int bch2_dev_missing_bkey(struct bch_fs *c, struct bkey_s_c k, unsigned dev)
 
 	bool print = bch2_count_fsck_err(c, ptr_to_invalid_device, &buf);
 
-	int ret = bch2_run_explicit_recovery_pass_persistent(c, &buf,
-						 BCH_RECOVERY_PASS_check_allocations);
+	int ret = bch2_run_explicit_recovery_pass(c, &buf,
+					BCH_RECOVERY_PASS_check_allocations, 0);
 
 	if (print)
 		bch2_print_str(c, KERN_ERR, buf.buf);
diff --git a/fs/bcachefs/subvolume.c b/fs/bcachefs/subvolume.c
index 3c6ba1469de2..35c9f86a73c1 100644
--- a/fs/bcachefs/subvolume.c
+++ b/fs/bcachefs/subvolume.c
@@ -23,8 +23,8 @@ static int bch2_subvolume_missing(struct bch_fs *c, u32 subvolid)
 	prt_printf(&buf, "missing subvolume %u", subvolid);
 	bool print = bch2_count_fsck_err(c, subvol_missing, &buf);
 
-	int ret = bch2_run_explicit_recovery_pass_persistent(c, &buf,
-					BCH_RECOVERY_PASS_check_inodes);
+	int ret = bch2_run_explicit_recovery_pass(c, &buf,
+					BCH_RECOVERY_PASS_check_inodes, 0);
 	if (print)
 		bch2_print_str(c, KERN_ERR, buf.buf);
 	printbuf_exit(&buf);
@@ -62,7 +62,7 @@ static int check_subvol(struct btree_trans *trans,
 	ret = bch2_snapshot_lookup(trans, snapid, &snapshot);
 
 	if (bch2_err_matches(ret, ENOENT))
-		return bch2_run_explicit_recovery_pass(c,
+		return bch2_run_print_explicit_recovery_pass(c,
 					BCH_RECOVERY_PASS_reconstruct_snapshots) ?: ret;
 	if (ret)
 		return ret;
-- 
2.49.0



* [PATCH 6/8] bcachefs: Run recovery passes asynchronously
  2025-05-17 19:25 [PATCH 0/8] Runtime self healing for missing backpointers Kent Overstreet
                   ` (4 preceding siblings ...)
  2025-05-17 19:25 ` [PATCH 5/8] bcachefs: bch2_run_explicit_recovery_pass() cleanup Kent Overstreet
@ 2025-05-17 19:25 ` Kent Overstreet
  2025-05-17 19:25 ` [PATCH 7/8] bcachefs: Improve bucket_bitmap code Kent Overstreet
  2025-05-17 19:25 ` [PATCH 8/8] bcachefs: bch2_check_bucket_backpointer_mismatch() Kent Overstreet
  7 siblings, 0 replies; 9+ messages in thread
From: Kent Overstreet @ 2025-05-17 19:25 UTC (permalink / raw)
  To: linux-bcachefs; +Cc: Kent Overstreet

When a recovery pass is requested while the filesystem is running (i.e.
not during recovery) and that pass is an online pass, it is now run in
the background instead of waiting for the next mount.
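
A rough userspace sketch of the scheduling handshake between
bch2_run_async_recovery_passes() and its work item. The caller tries to
take the run lock without blocking, then takes a write ref so the
filesystem cannot go read-only under the pass, then queues the work. On
any failure it undoes those steps in reverse order. The worker releases
both when it finishes, which is why the kernel uses a semaphore here
rather than a mutex. Everything below is hypothetical: struct fake_fs and
its helpers are invented for illustration, a C11 atomic_flag stands in
for the semaphore, and a plain counter stands in for the enumerated_ref.
The sketch returns bool for testability; the kernel function returns
void.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for c->recovery.run_lock, c->writes and the work item. */
struct fake_fs {
	atomic_flag	run_lock;	/* set while a recovery-pass runner owns it */
	int		write_refs;	/* stand-in for BCH_WRITE_REF_async_recovery_passes */
	bool		going_ro;	/* once set, no new write refs may be taken */
	bool		work_pending;	/* stand-in for the work item's pending state */
};

static void fake_fs_init(struct fake_fs *c)
{
	atomic_flag_clear(&c->run_lock);
	c->write_refs = 0;
	c->going_ro = false;
	c->work_pending = false;
}

/*
 * Returns true if the lock was acquired. Note the opposite sense from the
 * kernel's down_trylock(), which returns 0 when it acquires the semaphore.
 */
static bool run_lock_trylock(struct fake_fs *c)
{
	return !atomic_flag_test_and_set(&c->run_lock);
}

static void run_lock_unlock(struct fake_fs *c)
{
	atomic_flag_clear(&c->run_lock);
}

static bool write_ref_tryget(struct fake_fs *c)
{
	if (c->going_ro)
		return false;
	c->write_refs++;
	return true;
}

static void write_ref_put(struct fake_fs *c)
{
	assert(c->write_refs > 0);
	c->write_refs--;
}

/* Like queue_work(): returns false if the work item was already pending. */
static bool fake_queue_work(struct fake_fs *c)
{
	if (c->work_pending)
		return false;
	c->work_pending = true;
	return true;
}

/* Returns true if a background run was queued. */
static bool run_async_recovery_passes(struct fake_fs *c)
{
	if (!run_lock_trylock(c))
		return false;		/* a runner is already active */

	if (!write_ref_tryget(c))
		goto unlock;		/* going read-only: don't start */

	if (fake_queue_work(c))
		return true;		/* the worker now owns the lock and the ref */

	write_ref_put(c);
unlock:
	run_lock_unlock(c);
	return false;
}

/* Work function: run the passes, then release what the caller took. */
static void async_recovery_passes_work(struct fake_fs *c)
{
	c->work_pending = false;
	/* ... run required passes, minus any being ratelimited ... */
	run_lock_unlock(c);
	write_ref_put(c);
}
```

A second request while a run is queued or in progress fails the trylock
and returns immediately, so concurrent requests collapse into a single
background run.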

To avoid recovery passes running continuously, this also adds
ratelimiting: if the RUN_RECOVERY_PASS_ratelimit flag is passed, the pass
may be deferred, based on the last_runtime and last_run stats in the
recovery_passes superblock section. A pass is deferred if its last run
took more than 1% of the time elapsed since it last ran.
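
A minimal userspace sketch of the test in
bch2_recovery_pass_want_ratelimit(): defer the pass when its last
runtime exceeds 1% of the wall-clock time since it last ran. struct
pass_stats is a hypothetical plain-C mirror of the two
recovery_pass_entry fields the check reads, not the on-disk layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the per-pass stats kept in the superblock. */
struct pass_stats {
	uint32_t last_runtime;	/* seconds the last run took */
	uint64_t last_run;	/* wall-clock seconds when the pass last ran */
};

/*
 * Ratelimit if last_runtime is more than 1% of the time since last_run,
 * i.e. keep the pass's duty cycle at or below roughly 1%. Unsigned
 * arithmetic, as in the patch: if the clock went backwards the
 * difference wraps to a huge value and the pass is not ratelimited.
 */
static bool want_ratelimit(const struct pass_stats *s, uint64_t now)
{
	return (uint64_t) s->last_runtime * 100 > now - s->last_run;
}
```

For example, a pass whose last run took 60 seconds stays ratelimited
until 6000 seconds (100 minutes) have passed since that run. A
ratelimited request is not dropped: the pass is still marked required in
the superblock, and the async worker only masks out passes_ratelimiting
when choosing what to run.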

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
---
 fs/bcachefs/bcachefs.h              |   3 +-
 fs/bcachefs/recovery_passes.c       | 133 ++++++++++++++++++++++------
 fs/bcachefs/recovery_passes.h       |   1 +
 fs/bcachefs/recovery_passes_types.h |   2 +
 4 files changed, 113 insertions(+), 26 deletions(-)

diff --git a/fs/bcachefs/bcachefs.h b/fs/bcachefs/bcachefs.h
index 1458f131af16..e1680b635fe1 100644
--- a/fs/bcachefs/bcachefs.h
+++ b/fs/bcachefs/bcachefs.h
@@ -760,7 +760,8 @@ struct btree_trans_buf {
 	x(snapshot_delete_pagecache)					\
 	x(sysfs)							\
 	x(btree_write_buffer)						\
-	x(btree_node_scrub)
+	x(btree_node_scrub)						\
+	x(async_recovery_passes)
 
 enum bch_write_ref {
 #define x(n) BCH_WRITE_REF_##n,
diff --git a/fs/bcachefs/recovery_passes.c b/fs/bcachefs/recovery_passes.c
index b931a9b465d4..f74f14227137 100644
--- a/fs/bcachefs/recovery_passes.c
+++ b/fs/bcachefs/recovery_passes.c
@@ -138,6 +138,30 @@ static void bch2_sb_recovery_pass_complete(struct bch_fs *c,
 	mutex_unlock(&c->sb_lock);
 }
 
+static bool bch2_recovery_pass_want_ratelimit(struct bch_fs *c, enum bch_recovery_pass pass)
+{
+	enum bch_recovery_pass_stable stable = bch2_recovery_pass_to_stable(pass);
+	bool ret = false;
+
+	lockdep_assert_held(&c->sb_lock);
+
+	struct bch_sb_field_recovery_passes *r =
+		bch2_sb_field_get(c->disk_sb.sb, recovery_passes);
+
+	if (stable < recovery_passes_nr_entries(r)) {
+		struct recovery_pass_entry *i = r->start + stable;
+
+		/*
+		 * Ratelimit if the last runtime was more than 1% of the time
+		 * since we last ran
+		 */
+		ret = (u64) le32_to_cpu(i->last_runtime) * 100 >
+			ktime_get_real_seconds() - le64_to_cpu(i->last_run);
+	}
+
+	return ret;
+}
+
 const struct bch_sb_field_ops bch_sb_field_ops_recovery_passes = {
 	.validate	= bch2_sb_recovery_passes_validate,
 	.to_text	= bch2_sb_recovery_passes_to_text
@@ -218,13 +242,33 @@ u64 bch2_fsck_recovery_passes(void)
 	return bch2_recovery_passes_match(PASS_FSCK);
 }
 
+static void bch2_run_async_recovery_passes(struct bch_fs *c)
+{
+	if (down_trylock(&c->recovery.run_lock))
+		return;
+
+	if (!enumerated_ref_tryget(&c->writes, BCH_WRITE_REF_async_recovery_passes))
+		goto unlock;
+
+	if (queue_work(system_long_wq, &c->recovery.work))
+		return;
+
+	enumerated_ref_put(&c->writes, BCH_WRITE_REF_async_recovery_passes);
+unlock:
+	up(&c->recovery.run_lock);
+}
+
 static bool recovery_pass_needs_set(struct bch_fs *c,
 				    enum bch_recovery_pass pass,
-				    enum bch_run_recovery_pass_flags flags)
+				    enum bch_run_recovery_pass_flags *flags)
 {
 	struct bch_fs_recovery *r = &c->recovery;
 	bool in_recovery = test_bit(BCH_FS_in_recovery, &c->flags);
-	bool persistent = !in_recovery || !(flags & RUN_RECOVERY_PASS_nopersistent);
+	bool persistent = !in_recovery || !(*flags & RUN_RECOVERY_PASS_nopersistent);
+
+	if ((*flags & RUN_RECOVERY_PASS_ratelimit) &&
+	    !bch2_recovery_pass_want_ratelimit(c, pass))
+		*flags &= ~RUN_RECOVERY_PASS_ratelimit;
 
 	/*
 	 * If RUN_RECOVERY_PASS_nopersistent is set, we don't want to do
@@ -236,9 +280,16 @@ static bool recovery_pass_needs_set(struct bch_fs *c,
 	 * it should run again even if it's already run:
 	 */
 
-	return persistent
-		? !(c->sb.recovery_passes_required & BIT_ULL(pass))
-		: !((r->passes_to_run|r->passes_complete) & BIT_ULL(pass));
+	if (persistent
+	    ? !(c->sb.recovery_passes_required & BIT_ULL(pass))
+	    : !((r->passes_to_run|r->passes_complete) & BIT_ULL(pass)))
+		return true;
+
+	if (!(*flags & RUN_RECOVERY_PASS_ratelimit) &&
+	    (r->passes_ratelimiting & BIT_ULL(pass)))
+		return true;
+
+	return false;
 }
 
 /*
@@ -260,15 +311,14 @@ int __bch2_run_explicit_recovery_pass(struct bch_fs *c,
 	unsigned long lockflags;
 	spin_lock_irqsave(&r->lock, lockflags);
 
-	if (!recovery_pass_needs_set(c, pass, flags))
+	if (!recovery_pass_needs_set(c, pass, &flags))
 		goto out;
 
 	bool in_recovery = test_bit(BCH_FS_in_recovery, &c->flags);
 	bool rewind = in_recovery && r->curr_pass > pass;
+	bool ratelimit = flags & RUN_RECOVERY_PASS_ratelimit;
 
-	if ((flags & RUN_RECOVERY_PASS_nopersistent) && in_recovery) {
-		r->passes_to_run |= BIT_ULL(pass);
-	} else {
+	if (!(in_recovery && (flags & RUN_RECOVERY_PASS_nopersistent))) {
 		struct bch_sb_field_ext *ext = bch2_sb_field_get(c->disk_sb.sb, ext);
 		__set_bit_le64(bch2_recovery_pass_to_stable(pass), ext->recovery_passes_required);
 	}
@@ -281,18 +331,32 @@ int __bch2_run_explicit_recovery_pass(struct bch_fs *c,
 		goto out;
 	}
 
-	prt_printf(out, "running recovery pass %s (%u), currently at %s (%u)%s\n",
-		   bch2_recovery_passes[pass], pass,
-		   bch2_recovery_passes[r->curr_pass], r->curr_pass,
-		   rewind ? " - rewinding" : "");
+	if (ratelimit)
+		r->passes_ratelimiting |= BIT_ULL(pass);
+	else
+		r->passes_ratelimiting &= ~BIT_ULL(pass);
+
+	if (in_recovery && !ratelimit) {
+		prt_printf(out, "running recovery pass %s (%u), currently at %s (%u)%s\n",
+			   bch2_recovery_passes[pass], pass,
+			   bch2_recovery_passes[r->curr_pass], r->curr_pass,
+			   rewind ? " - rewinding" : "");
 
-	if (test_bit(BCH_FS_in_recovery, &c->flags))
 		r->passes_to_run |= BIT_ULL(pass);
 
-	if (rewind) {
-		r->next_pass = pass;
-		r->passes_complete &= (1ULL << pass) >> 1;
-		ret = -BCH_ERR_restart_recovery;
+		if (rewind) {
+			r->next_pass = pass;
+			r->passes_complete &= (1ULL << pass) >> 1;
+			ret = -BCH_ERR_restart_recovery;
+		}
+	} else {
+		prt_printf(out, "scheduling recovery pass %s (%u)%s\n",
+			   bch2_recovery_passes[pass], pass,
+			   ratelimit ? " - ratelimiting" : "");
+
+		struct recovery_pass_fn *p = recovery_pass_fns + pass;
+		if (p->when & PASS_ONLINE)
+			bch2_run_async_recovery_passes(c);
 	}
 out:
 	spin_unlock_irqrestore(&r->lock, lockflags);
@@ -305,20 +369,24 @@ int bch2_run_explicit_recovery_pass(struct bch_fs *c,
 				    enum bch_recovery_pass pass,
 				    enum bch_run_recovery_pass_flags flags)
 {
-	if (!recovery_pass_needs_set(c, pass, flags))
-		return 0;
+	int ret = 0;
 
-	mutex_lock(&c->sb_lock);
-	int ret = __bch2_run_explicit_recovery_pass(c, out, pass, flags);
-	bch2_write_super(c);
-	mutex_unlock(&c->sb_lock);
+	scoped_guard(mutex, &c->sb_lock) {
+		if (!recovery_pass_needs_set(c, pass, &flags))
+			return 0;
+
+		ret = __bch2_run_explicit_recovery_pass(c, out, pass, flags);
+		bch2_write_super(c);
+	}
 
 	return ret;
 }
 
 int bch2_run_print_explicit_recovery_pass(struct bch_fs *c, enum bch_recovery_pass pass)
 {
-	if (!recovery_pass_needs_set(c, pass, RUN_RECOVERY_PASS_nopersistent))
+	enum bch_run_recovery_pass_flags flags = RUN_RECOVERY_PASS_nopersistent;
+
+	if (!recovery_pass_needs_set(c, pass, &flags))
 		return 0;
 
 	struct printbuf buf = PRINTBUF;
@@ -430,6 +498,19 @@ static int __bch2_run_recovery_passes(struct bch_fs *c, u64 orig_passes_to_run,
 	return ret;
 }
 
+static void bch2_async_recovery_passes_work(struct work_struct *work)
+{
+	struct bch_fs *c = container_of(work, struct bch_fs, recovery.work);
+	struct bch_fs_recovery *r = &c->recovery;
+
+	__bch2_run_recovery_passes(c,
+		c->sb.recovery_passes_required & ~r->passes_ratelimiting,
+		true);
+
+	up(&r->run_lock);
+	enumerated_ref_put(&c->writes, BCH_WRITE_REF_async_recovery_passes);
+}
+
 int bch2_run_online_recovery_passes(struct bch_fs *c, u64 passes)
 {
 	return __bch2_run_recovery_passes(c, c->sb.recovery_passes_required|passes, true);
@@ -488,4 +569,6 @@ void bch2_fs_recovery_passes_init(struct bch_fs *c)
 {
 	spin_lock_init(&c->recovery.lock);
 	sema_init(&c->recovery.run_lock, 1);
+
+	INIT_WORK(&c->recovery.work, bch2_async_recovery_passes_work);
 }
diff --git a/fs/bcachefs/recovery_passes.h b/fs/bcachefs/recovery_passes.h
index 30f896479a52..dc0d2014ff9b 100644
--- a/fs/bcachefs/recovery_passes.h
+++ b/fs/bcachefs/recovery_passes.h
@@ -12,6 +12,7 @@ u64 bch2_fsck_recovery_passes(void);
 
 enum bch_run_recovery_pass_flags {
 	RUN_RECOVERY_PASS_nopersistent	= BIT(0),
+	RUN_RECOVERY_PASS_ratelimit	= BIT(1),
 };
 
 int bch2_run_print_explicit_recovery_pass(struct bch_fs *, enum bch_recovery_pass);
diff --git a/fs/bcachefs/recovery_passes_types.h b/fs/bcachefs/recovery_passes_types.h
index deb6e0565cb9..aa9526938cc3 100644
--- a/fs/bcachefs/recovery_passes_types.h
+++ b/fs/bcachefs/recovery_passes_types.h
@@ -18,8 +18,10 @@ struct bch_fs_recovery {
 	/* bitmask of recovery passes that we actually ran */
 	u64			passes_complete;
 	u64			passes_failing;
+	u64			passes_ratelimiting;
 	spinlock_t		lock;
 	struct semaphore	run_lock;
+	struct work_struct	work;
 };
 
 #endif /* _BCACHEFS_RECOVERY_PASSES_TYPES_H */
-- 
2.49.0



* [PATCH 7/8] bcachefs: Improve bucket_bitmap code
  2025-05-17 19:25 [PATCH 0/8] Runtime self healing for missing backpointers Kent Overstreet
                   ` (5 preceding siblings ...)
  2025-05-17 19:25 ` [PATCH 6/8] bcachefs: Run recovery passes asynchronously Kent Overstreet
@ 2025-05-17 19:25 ` Kent Overstreet
  2025-05-17 19:25 ` [PATCH 8/8] bcachefs: bch2_check_bucket_backpointer_mismatch() Kent Overstreet
  7 siblings, 0 replies; 9+ messages in thread
From: Kent Overstreet @ 2025-05-17 19:25 UTC (permalink / raw)
  To: linux-bcachefs; +Cc: Kent Overstreet

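Add bch2_bucket_bitmap_set(), bch2_bucket_bitmap_test() and
bch2_bucket_bitmap_resize() helpers, and replace the
bucket_backpointer_mismatches[2] array with two named bitmaps,
bucket_backpointer_mismatch and bucket_backpointer_empty. The mismatch
bitmap is now a superset of the empty bitmap, which simplifies most
checks: only buckets set in the mismatch bitmap need to be looked at.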
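
A small userspace sketch of the superset invariant and what it buys
check_extent_to_backpointers(): one bitmap test decides whether to skip
a pointer, and a second picks the repair. The bitmap is allocated lazily
on first set and counts newly set bits, like bch2_bucket_bitmap_set().
All names are hypothetical; the mutex is omitted since the sketch is
single-threaded, and SKIP / CHECK_BP_EXISTS / ADD_BP stand for
continuing, check_bp_exists() and bch2_bucket_backpointer_mod().

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)
#define WORDS_FOR(n)	(((n) + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Userspace stand-in for struct bucket_bitmap. */
struct bucket_bitmap {
	unsigned long	*buckets;	/* NULL until the first bit is set */
	uint64_t	nr;		/* number of bits currently set */
};

static int bitmap_set(struct bucket_bitmap *b, uint64_t nbuckets, uint64_t bit)
{
	assert(bit < nbuckets);

	if (!b->buckets) {
		b->buckets = calloc(WORDS_FOR(nbuckets), sizeof(unsigned long));
		if (!b->buckets)
			return -1;
	}

	unsigned long *word = &b->buckets[bit / BITS_PER_WORD];
	unsigned long mask = 1UL << (bit % BITS_PER_WORD);

	b->nr += !(*word & mask);	/* only count bits that were clear */
	*word |= mask;
	return 0;
}

static bool bitmap_test(const struct bucket_bitmap *b, uint64_t bit)
{
	return b->buckets &&
	       ((b->buckets[bit / BITS_PER_WORD] >> (bit % BITS_PER_WORD)) & 1);
}

/* Empty buckets go in both bitmaps, so mismatch is always a superset of empty. */
static int record_mismatch(struct bucket_bitmap *mismatch, struct bucket_bitmap *empty,
			   uint64_t nbuckets, uint64_t bucket, bool is_empty)
{
	int ret = bitmap_set(mismatch, nbuckets, bucket);

	if (!ret && is_empty)
		ret = bitmap_set(empty, nbuckets, bucket);
	return ret;
}

enum bp_action { SKIP, CHECK_BP_EXISTS, ADD_BP };

static enum bp_action classify(const struct bucket_bitmap *mismatch,
			       const struct bucket_bitmap *empty, uint64_t bucket)
{
	if (!bitmap_test(mismatch, bucket))
		return SKIP;		/* backpointers in this bucket sum up: nothing to do */
	return bitmap_test(empty, bucket) ? ADD_BP : CHECK_BP_EXISTS;
}
```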

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
---
 fs/bcachefs/backpointers.c | 123 ++++++++++++++++++++++---------------
 fs/bcachefs/backpointers.h |   7 +++
 fs/bcachefs/bcachefs.h     |   3 +-
 fs/bcachefs/buckets.c      |  25 ++------
 fs/bcachefs/movinggc.c     |   6 +-
 fs/bcachefs/super.c        |   8 +--
 6 files changed, 92 insertions(+), 80 deletions(-)

diff --git a/fs/bcachefs/backpointers.c b/fs/bcachefs/backpointers.c
index d9ddfc4b5dcc..6b98ce1ed6c9 100644
--- a/fs/bcachefs/backpointers.c
+++ b/fs/bcachefs/backpointers.c
@@ -15,6 +15,8 @@
 
 #include <linux/mm.h>
 
+static int bch2_bucket_bitmap_set(struct bch_dev *, struct bucket_bitmap *, u64);
+
 static inline struct bbpos bp_to_bbpos(struct bch_backpointer bp)
 {
 	return (struct bbpos) {
@@ -685,31 +687,28 @@ static int check_extent_to_backpointers(struct btree_trans *trans,
 			continue;
 		}
 
-		u64 b = PTR_BUCKET_NR(ca, &p.ptr);
-		bool set[2];
-
-		for (unsigned i = 0; i < ARRAY_SIZE(ca->bucket_backpointer_mismatches); i++) {
-			unsigned long *bitmap =
-				READ_ONCE(ca->bucket_backpointer_mismatches[i].buckets);
-			set[i] = bitmap && test_bit(b, bitmap);
+		if (p.ptr.cached && dev_ptr_stale_rcu(ca, &p.ptr)) {
+			rcu_read_unlock();
+			continue;
 		}
 
-		bool check = set[0];
-		bool empty = set[1];
+		u64 b = PTR_BUCKET_NR(ca, &p.ptr);
+		if (!bch2_bucket_bitmap_test(&ca->bucket_backpointer_mismatch, b)) {
+			rcu_read_unlock();
+			continue;
+		}
 
-		bool stale = p.ptr.cached && (!ca || dev_ptr_stale_rcu(ca, &p.ptr));
+		bool empty = bch2_bucket_bitmap_test(&ca->bucket_backpointer_empty, b);
 		rcu_read_unlock();
 
-		if ((check || empty) && !stale) {
-			struct bkey_i_backpointer bp;
-			bch2_extent_ptr_to_bp(c, btree, level, k, p, entry, &bp);
+		struct bkey_i_backpointer bp;
+		bch2_extent_ptr_to_bp(c, btree, level, k, p, entry, &bp);
 
-			int ret = check
-				? check_bp_exists(trans, s, &bp, k)
-				: bch2_bucket_backpointer_mod(trans, k, &bp, true);
-			if (ret)
-				return ret;
-		}
+		int ret = !empty
+			? check_bp_exists(trans, s, &bp, k)
+			: bch2_bucket_backpointer_mod(trans, k, &bp, true);
+		if (ret)
+			return ret;
 	}
 
 	return 0;
@@ -952,21 +951,12 @@ static int check_bucket_backpointer_mismatch(struct btree_trans *trans, struct b
 			      sectors[ALLOC_stripe] +
 			      sectors[ALLOC_cached]) == 0;
 
-		struct bucket_bitmap *bitmap = &ca->bucket_backpointer_mismatches[empty];
-
-		mutex_lock(&bitmap->lock);
-		if (!bitmap->buckets) {
-			bitmap->buckets = kvcalloc(BITS_TO_LONGS(ca->mi.nbuckets),
-						   sizeof(unsigned long), GFP_KERNEL);
-			if (!bitmap->buckets) {
-				mutex_unlock(&bitmap->lock);
-				ret = -BCH_ERR_ENOMEM_backpointer_mismatches_bitmap;
-				goto err;
-			}
-		}
-
-		bitmap->nr += !__test_and_set_bit(alloc_k.k->p.offset, bitmap->buckets);
-		mutex_unlock(&bitmap->lock);
+		ret = bch2_bucket_bitmap_set(ca, &ca->bucket_backpointer_mismatch,
+					     alloc_k.k->p.offset) ?:
+			(empty
+			 ? bch2_bucket_bitmap_set(ca, &ca->bucket_backpointer_empty,
+						  alloc_k.k->p.offset)
+			 : 0);
 	}
 err:
 	bch2_dev_put(ca);
@@ -992,15 +982,10 @@ static bool backpointer_node_has_missing(struct bch_fs *c, struct bkey_s_c k)
 			struct bpos bucket = bp_pos_to_bucket(ca, pos);
 			u64 next = ca->mi.nbuckets;
 
-			for (unsigned i = 0; i < ARRAY_SIZE(ca->bucket_backpointer_mismatches); i++) {
-				unsigned long *bitmap =
-					READ_ONCE(ca->bucket_backpointer_mismatches[i].buckets);
-				if (bitmap)
-					next = min_t(u64, next,
-						     find_next_bit(bitmap,
-								   ca->mi.nbuckets,
-								   bucket.offset));
-			}
+			unsigned long *bitmap = READ_ONCE(ca->bucket_backpointer_mismatch.buckets);
+			if (bitmap)
+				next = min_t(u64, next,
+					     find_next_bit(bitmap, ca->mi.nbuckets, bucket.offset));
 
 			bucket.offset = next;
 			if (bucket.offset == ca->mi.nbuckets)
@@ -1124,18 +1109,17 @@ int bch2_check_extents_to_backpointers(struct bch_fs *c)
 	if (ret)
 		goto err;
 
-	u64 nr_buckets = 0, nr_mismatches = 0, nr_empty = 0;
+	u64 nr_buckets = 0, nr_mismatches = 0;
 	for_each_member_device(c, ca) {
 		nr_buckets	+= ca->mi.nbuckets;
-		nr_mismatches	+= ca->bucket_backpointer_mismatches[0].nr;
-		nr_empty	+= ca->bucket_backpointer_mismatches[1].nr;
+		nr_mismatches	+= ca->bucket_backpointer_mismatch.nr;
 	}
 
-	if (!nr_mismatches && !nr_empty)
+	if (!nr_mismatches)
 		goto err;
 
 	bch_info(c, "scanning for missing backpointers in %llu/%llu buckets",
-		 nr_mismatches + nr_empty, nr_buckets);
+		 nr_mismatches, nr_buckets);
 
 	while (1) {
 		ret = bch2_pin_backpointer_nodes_with_missing(trans, s.bp_start, &s.bp_end);
@@ -1171,9 +1155,10 @@ int bch2_check_extents_to_backpointers(struct bch_fs *c)
 	bch2_bkey_buf_exit(&s.last_flushed, c);
 	bch2_btree_cache_unpin(c);
 
-	for_each_member_device(c, ca)
-		for (unsigned i = 0; i < ARRAY_SIZE(ca->bucket_backpointer_mismatches); i++)
-			bch2_bucket_bitmap_free(&ca->bucket_backpointer_mismatches[i]);
+	for_each_member_device(c, ca) {
+		bch2_bucket_bitmap_free(&ca->bucket_backpointer_mismatch);
+		bch2_bucket_bitmap_free(&ca->bucket_backpointer_empty);
+	}
 
 	bch_err_fn(c, ret);
 	return ret;
@@ -1297,6 +1282,42 @@ int bch2_check_backpointers_to_extents(struct bch_fs *c)
 	return ret;
 }
 
+static int bch2_bucket_bitmap_set(struct bch_dev *ca, struct bucket_bitmap *b, u64 bit)
+{
+	scoped_guard(mutex, &b->lock) {
+		if (!b->buckets) {
+			b->buckets = kvcalloc(BITS_TO_LONGS(ca->mi.nbuckets),
+					      sizeof(unsigned long), GFP_KERNEL);
+			if (!b->buckets)
+				return -BCH_ERR_ENOMEM_backpointer_mismatches_bitmap;
+		}
+
+		b->nr += !__test_and_set_bit(bit, b->buckets);
+	}
+
+	return 0;
+}
+
+int bch2_bucket_bitmap_resize(struct bucket_bitmap *b, u64 old_size, u64 new_size)
+{
+	scoped_guard(mutex, &b->lock) {
+		if (!b->buckets)
+			return 0;
+
+		unsigned long *n = kvcalloc(BITS_TO_LONGS(new_size),
+					    sizeof(unsigned long), GFP_KERNEL);
+		if (!n)
+			return -BCH_ERR_ENOMEM_backpointer_mismatches_bitmap;
+
+		memcpy(n, b->buckets,
+		       BITS_TO_LONGS(min(old_size, new_size)) * sizeof(unsigned long));
+		kvfree(b->buckets);
+		b->buckets = n;
+	}
+
+	return 0;
+}
+
 void bch2_bucket_bitmap_free(struct bucket_bitmap *b)
 {
 	mutex_lock(&b->lock);
diff --git a/fs/bcachefs/backpointers.h b/fs/bcachefs/backpointers.h
index f57098c32143..fe7149a2fbf5 100644
--- a/fs/bcachefs/backpointers.h
+++ b/fs/bcachefs/backpointers.h
@@ -188,6 +188,13 @@ int bch2_check_btree_backpointers(struct bch_fs *);
 int bch2_check_extents_to_backpointers(struct bch_fs *);
 int bch2_check_backpointers_to_extents(struct bch_fs *);
 
+static inline bool bch2_bucket_bitmap_test(struct bucket_bitmap *b, u64 i)
+{
+	unsigned long *bitmap = READ_ONCE(b->buckets);
+	return bitmap && test_bit(i, bitmap);
+}
+
+int bch2_bucket_bitmap_resize(struct bucket_bitmap *, u64, u64);
 void bch2_bucket_bitmap_free(struct bucket_bitmap *);
 
 #endif /* _BCACHEFS_BACKPOINTERS_BACKGROUND_H */
diff --git a/fs/bcachefs/bcachefs.h b/fs/bcachefs/bcachefs.h
index e1680b635fe1..b58fad743fc4 100644
--- a/fs/bcachefs/bcachefs.h
+++ b/fs/bcachefs/bcachefs.h
@@ -626,7 +626,8 @@ struct bch_dev {
 	u8			*oldest_gen;
 	unsigned long		*buckets_nouse;
 
-	struct bucket_bitmap	bucket_backpointer_mismatches[2];
+	struct bucket_bitmap	bucket_backpointer_mismatch;
+	struct bucket_bitmap	bucket_backpointer_empty;
 
 	struct bch_dev_usage_full __percpu
 				*usage;
diff --git a/fs/bcachefs/buckets.c b/fs/bcachefs/buckets.c
index ca6e58d6fbc8..8bb6384190c5 100644
--- a/fs/bcachefs/buckets.c
+++ b/fs/bcachefs/buckets.c
@@ -1324,27 +1324,10 @@ int bch2_dev_buckets_resize(struct bch_fs *c, struct bch_dev *ca, u64 nbuckets)
 		       sizeof(bucket_gens->b[0]) * copy);
 	}
 
-	for (unsigned i = 0; i < ARRAY_SIZE(ca->bucket_backpointer_mismatches); i++) {
-		struct bucket_bitmap *bitmap = &ca->bucket_backpointer_mismatches[i];
-
-		mutex_lock(&bitmap->lock);
-		if (bitmap->buckets) {
-			unsigned long *n = kvcalloc(BITS_TO_LONGS(nbuckets),
-						    sizeof(unsigned long), GFP_KERNEL);
-			if (!n) {
-				mutex_unlock(&bitmap->lock);
-				ret = -BCH_ERR_ENOMEM_backpointer_mismatches_bitmap;
-				goto err;
-			}
-
-			memcpy(n, bitmap->buckets,
-			       BITS_TO_LONGS(ca->mi.nbuckets) * sizeof(unsigned long));
-			kvfree(bitmap->buckets);
-			bitmap->buckets = n;
-
-		}
-		mutex_unlock(&bitmap->lock);
-	}
+	ret =   bch2_bucket_bitmap_resize(&ca->bucket_backpointer_mismatch,
+					  ca->mi.nbuckets, nbuckets) ?:
+		bch2_bucket_bitmap_resize(&ca->bucket_backpointer_empty,
+					  ca->mi.nbuckets, nbuckets);
 
 	rcu_assign_pointer(ca->bucket_gens, bucket_gens);
 	bucket_gens	= old_bucket_gens;
diff --git a/fs/bcachefs/movinggc.c b/fs/bcachefs/movinggc.c
index 4bfdb1befb9a..0a751a65386f 100644
--- a/fs/bcachefs/movinggc.c
+++ b/fs/bcachefs/movinggc.c
@@ -8,6 +8,7 @@
 #include "bcachefs.h"
 #include "alloc_background.h"
 #include "alloc_foreground.h"
+#include "backpointers.h"
 #include "btree_iter.h"
 #include "btree_update.h"
 #include "btree_write_buffer.h"
@@ -76,7 +77,7 @@ static int bch2_bucket_is_movable(struct btree_trans *trans,
 
 	if (ca->mi.state != BCH_MEMBER_STATE_rw ||
 	    !bch2_dev_is_online(ca))
-		goto out_put;
+		goto out;
 
 	struct bch_alloc_v4 _a;
 	const struct bch_alloc_v4 *a = bch2_alloc_to_v4(k, &_a);
@@ -85,9 +86,8 @@ static int bch2_bucket_is_movable(struct btree_trans *trans,
 	u64 lru_idx	= alloc_lru_idx_fragmentation(*a, ca);
 
 	ret = lru_idx && lru_idx <= time;
-out_put:
-	bch2_dev_put(ca);
 out:
+	bch2_dev_put(ca);
 	bch2_trans_iter_exit(trans, &iter);
 	return ret;
 }
diff --git a/fs/bcachefs/super.c b/fs/bcachefs/super.c
index 170b0f26c018..24658bf450ab 100644
--- a/fs/bcachefs/super.c
+++ b/fs/bcachefs/super.c
@@ -1366,8 +1366,8 @@ static void bch2_dev_free(struct bch_dev *ca)
 	if (ca->kobj.state_in_sysfs)
 		kobject_del(&ca->kobj);
 
-	for (unsigned i = 0; i < ARRAY_SIZE(ca->bucket_backpointer_mismatches); i++)
-		bch2_bucket_bitmap_free(&ca->bucket_backpointer_mismatches[i]);
+	bch2_bucket_bitmap_free(&ca->bucket_backpointer_mismatch);
+	bch2_bucket_bitmap_free(&ca->bucket_backpointer_empty);
 
 	bch2_free_super(&ca->disk_sb);
 	bch2_dev_allocator_background_exit(ca);
@@ -1499,8 +1499,8 @@ static struct bch_dev *__bch2_dev_alloc(struct bch_fs *c,
 	atomic_long_set(&ca->ref, 1);
 #endif
 
-	for (unsigned i = 0; i < ARRAY_SIZE(ca->bucket_backpointer_mismatches); i++)
-		mutex_init(&ca->bucket_backpointer_mismatches[i].lock);
+	mutex_init(&ca->bucket_backpointer_mismatch.lock);
+	mutex_init(&ca->bucket_backpointer_empty.lock);
 
 	bch2_dev_allocator_background_init(ca);
 
-- 
2.49.0



* [PATCH 8/8] bcachefs: bch2_check_bucket_backpointer_mismatch()
  2025-05-17 19:25 [PATCH 0/8] Runtime self healing for missing backpointers Kent Overstreet
                   ` (6 preceding siblings ...)
  2025-05-17 19:25 ` [PATCH 7/8] bcachefs: Improve bucket_bitmap code Kent Overstreet
@ 2025-05-17 19:25 ` Kent Overstreet
  7 siblings, 0 replies; 9+ messages in thread
From: Kent Overstreet @ 2025-05-17 19:25 UTC (permalink / raw)
  To: linux-bcachefs; +Cc: Kent Overstreet

Detect buckets with missing backpointers, and run repair on demand.

__bch2_move_data_phys() now calls
bch2_check_bucket_backpointer_mismatch() on each bucket as it walks
them. This detects missing backpointers by checking whether the
backpointers in a bucket sum up to the bucket's sector counts.
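
A simplified sketch of that per-bucket comparison, assuming the bucket's
backpointers are available as a flat array: sum the sectors each
backpointer accounts for, per alloc counter, and compare against the
bucket's own counts. This sketch flags any difference as a mismatch,
and also flags the bucket as empty if no backpointer sectors were found
at all. struct bucket_counts, struct bp and COUNTER_* are hypothetical
stand-ins loosely mirroring bch_alloc_v4 and the ALLOC_* counters; the
real function also skips superblock, journal and parity buckets and
handles other special cases omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified counters (the real code uses struct bch_alloc_v4). */
enum alloc_counter { COUNTER_dirty, COUNTER_stripe, COUNTER_cached, COUNTER_NR };

struct bucket_counts {
	uint32_t sectors[COUNTER_NR];	/* what the alloc key says the bucket holds */
};

struct bp {
	enum alloc_counter	counter;	/* which counter this data is accounted to */
	uint32_t		bucket_len;	/* sectors of the bucket it covers */
};

struct mismatch_result {
	bool mismatch;
	bool empty;
};

static struct mismatch_result check_bucket(const struct bucket_counts *a,
					   const struct bp *bps, size_t nr)
{
	uint64_t sum[COUNTER_NR] = { 0 };
	struct mismatch_result r = { false, false };

	/* Sum what the backpointers claim, per counter */
	for (size_t i = 0; i < nr; i++)
		sum[bps[i].counter] += bps[i].bucket_len;

	/* Any counter that doesn't add up means backpointers are missing (or bad) */
	for (int i = 0; i < COUNTER_NR; i++)
		if (sum[i] != a->sectors[i])
			r.mismatch = true;

	/* No backpointer sectors at all: every pointer into the bucket needs one */
	r.empty = r.mismatch &&
		sum[COUNTER_dirty] + sum[COUNTER_stripe] + sum[COUNTER_cached] == 0;
	return r;
}
```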

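When missing backpointers are detected, we schedule
bch2_check_extents_to_backpointers() to run asynchronously: right away
if we're evacuating (bch2_move_data_phys()), or, for copygc, with
RUN_RECOVERY_PASS_ratelimit until at least 1/128 of the device's buckets
are known to have missing backpointers. Copygc also now skips buckets
already flagged as having missing backpointers, and invalidating an
empty cached bucket now runs the same check instead of just logging an
error.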
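
A minimal sketch of the flag choice at the end of
bch2_check_bucket_backpointer_mismatch(). Copygc callers tolerate up to
nbuckets >> 7 (1/128 of the device) buckets with missing backpointers
before requesting the pass without RUN_RECOVERY_PASS_ratelimit; other
callers never ratelimit. The function name is hypothetical, and
"ratelimit" here only means the request is subject to the 1% duty-cycle
check, not that it is dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true if the check_extents_to_backpointers request should carry
 * RUN_RECOVERY_PASS_ratelimit. nr_mismatch is the device's count of
 * buckets known to have missing backpointers, including this one.
 */
static bool want_ratelimit_flag(bool copygc, uint64_t nr_mismatch, uint64_t nbuckets)
{
	assert(nr_mismatch <= nbuckets);

	uint64_t allowed = copygc ? nbuckets >> 7 : 0;
	return nr_mismatch < allowed;
}
```

On a device with 2^20 buckets, copygc requests stay ratelimited while
fewer than 8192 buckets are flagged. On a device with fewer than 128
buckets, the threshold rounds down to zero and every request runs
unratelimited.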

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
---
 fs/bcachefs/alloc_background.c |  7 +++-
 fs/bcachefs/backpointers.c     | 75 +++++++++++++++++++++++++++++++---
 fs/bcachefs/backpointers.h     |  3 +-
 fs/bcachefs/move.c             | 21 ++++++++--
 fs/bcachefs/movinggc.c         |  3 ++
 5 files changed, 98 insertions(+), 11 deletions(-)

diff --git a/fs/bcachefs/alloc_background.c b/fs/bcachefs/alloc_background.c
index 88e710ba2685..a38b9c6c891e 100644
--- a/fs/bcachefs/alloc_background.c
+++ b/fs/bcachefs/alloc_background.c
@@ -2175,8 +2175,11 @@ static int invalidate_one_bucket(struct btree_trans *trans,
 	BUG_ON(a->data_type != BCH_DATA_cached);
 	BUG_ON(a->dirty_sectors);
 
-	if (!a->cached_sectors)
-		bch_err(c, "invalidating empty bucket, confused");
+	if (!a->cached_sectors) {
+		bch2_check_bucket_backpointer_mismatch(trans, ca, bucket.offset,
+						       true, last_flushed);
+		goto out;
+	}
 
 	unsigned cached_sectors = a->cached_sectors;
 	u8 gen = a->gen;
diff --git a/fs/bcachefs/backpointers.c b/fs/bcachefs/backpointers.c
index 6b98ce1ed6c9..c08bc6685078 100644
--- a/fs/bcachefs/backpointers.c
+++ b/fs/bcachefs/backpointers.c
@@ -12,6 +12,7 @@
 #include "disk_accounting.h"
 #include "error.h"
 #include "progress.h"
+#include "recovery_passes.h"
 
 #include <linux/mm.h>
 
@@ -804,6 +805,13 @@ static int bch2_get_btree_in_memory_pos(struct btree_trans *trans,
 	return ret;
 }
 
+static inline int bch2_fs_going_ro(struct bch_fs *c)
+{
+	return test_bit(BCH_FS_going_ro, &c->flags)
+		? -EROFS
+		: 0;
+}
+
 static int bch2_check_extents_to_backpointers_pass(struct btree_trans *trans,
 						   struct extents_to_bp_state *s)
 {
@@ -831,6 +839,7 @@ static int bch2_check_extents_to_backpointers_pass(struct btree_trans *trans,
 
 			ret = for_each_btree_key_continue(trans, iter, 0, k, ({
 				bch2_progress_update_iter(trans, &progress, &iter, "extents_to_backpointers");
+				bch2_fs_going_ro(c) ?:
 				check_extent_to_backpointers(trans, s, btree_id, level, k) ?:
 				bch2_trans_commit(trans, NULL, NULL, BCH_TRANS_COMMIT_no_enospc);
 			}));
@@ -870,6 +879,7 @@ static int data_type_to_alloc_counter(enum bch_data_type t)
 static int check_bucket_backpointers_to_extents(struct btree_trans *, struct bch_dev *, struct bpos);
 
 static int check_bucket_backpointer_mismatch(struct btree_trans *trans, struct bkey_s_c alloc_k,
+					     bool *had_mismatch,
 					     struct bkey_buf *last_flushed)
 {
 	struct bch_fs *c = trans->c;
@@ -877,6 +887,8 @@ static int check_bucket_backpointer_mismatch(struct btree_trans *trans, struct b
 	const struct bch_alloc_v4 *a = bch2_alloc_to_v4(alloc_k, &a_convert);
 	bool need_commit = false;
 
+	*had_mismatch = false;
+
 	if (a->data_type == BCH_DATA_sb ||
 	    a->data_type == BCH_DATA_journal ||
 	    a->data_type == BCH_DATA_parity)
@@ -957,6 +969,8 @@ static int check_bucket_backpointer_mismatch(struct btree_trans *trans, struct b
 			 ? bch2_bucket_bitmap_set(ca, &ca->bucket_backpointer_empty,
 						  alloc_k.k->p.offset)
 			 : 0);
+
+		*had_mismatch = true;
 	}
 err:
 	bch2_dev_put(ca);
@@ -1104,7 +1118,9 @@ int bch2_check_extents_to_backpointers(struct bch_fs *c)
 
 	ret = for_each_btree_key(trans, iter, BTREE_ID_alloc,
 				 POS_MIN, BTREE_ITER_prefetch, k, ({
-		check_bucket_backpointer_mismatch(trans, k, &s.last_flushed);
+		bool had_mismatch;
+		bch2_fs_going_ro(c) ?:
+		check_bucket_backpointer_mismatch(trans, k, &had_mismatch, &s.last_flushed);
 	}));
 	if (ret)
 		goto err;
@@ -1150,20 +1166,69 @@ int bch2_check_extents_to_backpointers(struct bch_fs *c)
 
 		s.bp_start = bpos_successor(s.bp_end);
 	}
-err:
-	bch2_trans_put(trans);
-	bch2_bkey_buf_exit(&s.last_flushed, c);
-	bch2_btree_cache_unpin(c);
 
 	for_each_member_device(c, ca) {
 		bch2_bucket_bitmap_free(&ca->bucket_backpointer_mismatch);
 		bch2_bucket_bitmap_free(&ca->bucket_backpointer_empty);
 	}
+err:
+	bch2_trans_put(trans);
+	bch2_bkey_buf_exit(&s.last_flushed, c);
+	bch2_btree_cache_unpin(c);
 
 	bch_err_fn(c, ret);
 	return ret;
 }
 
+static int check_bucket_backpointer_pos_mismatch(struct btree_trans *trans,
+						 struct bpos bucket,
+						 bool *had_mismatch,
+						 struct bkey_buf *last_flushed)
+{
+	struct btree_iter alloc_iter;
+	struct bkey_s_c k = bch2_bkey_get_iter(trans, &alloc_iter,
+					       BTREE_ID_alloc, bucket,
+					       BTREE_ITER_cached);
+	int ret = bkey_err(k);
+	if (ret)
+		return ret;
+
+	ret = check_bucket_backpointer_mismatch(trans, k, had_mismatch, last_flushed);
+	bch2_trans_iter_exit(trans, &alloc_iter);
+	return ret;
+}
+
+int bch2_check_bucket_backpointer_mismatch(struct btree_trans *trans,
+					   struct bch_dev *ca, u64 bucket,
+					   bool copygc,
+					   struct bkey_buf *last_flushed)
+{
+	struct bch_fs *c = trans->c;
+	bool had_mismatch;
+	int ret = lockrestart_do(trans,
+		check_bucket_backpointer_pos_mismatch(trans, POS(ca->dev_idx, bucket),
+						      &had_mismatch, last_flushed));
+	if (ret || !had_mismatch)
+		return ret;
+
+	u64 nr = ca->bucket_backpointer_mismatch.nr;
+	u64 allowed = copygc ? ca->mi.nbuckets >> 7 : 0;
+
+	struct printbuf buf = PRINTBUF;
+	__bch2_log_msg_start(ca->name, &buf);
+
+	prt_printf(&buf, "Detected missing backpointers in bucket %llu, now have %llu/%llu with missing\n",
+		   bucket, nr, ca->mi.nbuckets);
+
+	bch2_run_explicit_recovery_pass(c, &buf,
+			BCH_RECOVERY_PASS_check_extents_to_backpointers,
+			nr < allowed ? RUN_RECOVERY_PASS_ratelimit : 0);
+
+	bch2_print_str(c, KERN_ERR, buf.buf);
+	printbuf_exit(&buf);
+	return 0;
+}
+
 /* backpointers -> extents */
 
 static int check_one_backpointer(struct btree_trans *trans,
diff --git a/fs/bcachefs/backpointers.h b/fs/bcachefs/backpointers.h
index fe7149a2fbf5..6840561084ce 100644
--- a/fs/bcachefs/backpointers.h
+++ b/fs/bcachefs/backpointers.h
@@ -182,7 +182,8 @@ struct bkey_s_c bch2_backpointer_get_key(struct btree_trans *, struct bkey_s_c_b
 struct btree *bch2_backpointer_get_node(struct btree_trans *, struct bkey_s_c_backpointer,
 					struct btree_iter *, struct bkey_buf *);
 
-int bch2_check_bucket_backpointer_mismatch(struct btree_trans *, struct bpos, struct bkey_buf *);
+int bch2_check_bucket_backpointer_mismatch(struct btree_trans *, struct bch_dev *, u64,
+					   bool, struct bkey_buf *);
 
 int bch2_check_btree_backpointers(struct bch_fs *);
 int bch2_check_extents_to_backpointers(struct bch_fs *);
diff --git a/fs/bcachefs/move.c b/fs/bcachefs/move.c
index 49898d5743d4..0dd3bec3acff 100644
--- a/fs/bcachefs/move.c
+++ b/fs/bcachefs/move.c
@@ -815,6 +815,7 @@ static int __bch2_move_data_phys(struct moving_context *ctxt,
 			u64 bucket_start,
 			u64 bucket_end,
 			unsigned data_types,
+			bool copygc,
 			move_pred_fn pred, void *arg)
 {
 	struct btree_trans *trans = ctxt->trans;
@@ -825,6 +826,7 @@ static int __bch2_move_data_phys(struct moving_context *ctxt,
 	struct bkey_buf sk;
 	struct bkey_s_c k;
 	struct bkey_buf last_flushed;
+	u64 check_mismatch_done = bucket_start;
 	int ret = 0;
 
 	struct bch_dev *ca = bch2_dev_tryget(c, dev);
@@ -835,8 +837,6 @@ static int __bch2_move_data_phys(struct moving_context *ctxt,
 
 	struct bpos bp_start	= bucket_pos_to_bp_start(ca, POS(dev, bucket_start));
 	struct bpos bp_end	= bucket_pos_to_bp_end(ca, POS(dev, bucket_end));
-	bch2_dev_put(ca);
-	ca = NULL;
 
 	bch2_bkey_buf_init(&last_flushed);
 	bkey_init(&last_flushed.k->k);
@@ -871,6 +871,14 @@ static int __bch2_move_data_phys(struct moving_context *ctxt,
 		if (!k.k || bkey_gt(k.k->p, bp_end))
 			break;
 
+		if (check_mismatch_done < bp_pos_to_bucket(ca, k.k->p).offset) {
+			while (check_mismatch_done < bp_pos_to_bucket(ca, k.k->p).offset) {
+				bch2_check_bucket_backpointer_mismatch(trans, ca, check_mismatch_done++,
+								       copygc, &last_flushed);
+			}
+			continue;
+		}
+
 		if (k.k->type != KEY_TYPE_backpointer)
 			goto next;
 
@@ -946,10 +954,15 @@ static int __bch2_move_data_phys(struct moving_context *ctxt,
 next:
 		bch2_btree_iter_advance(trans, &bp_iter);
 	}
+
+	while (check_mismatch_done < bucket_end)
+		bch2_check_bucket_backpointer_mismatch(trans, ca, check_mismatch_done++,
+						       copygc, &last_flushed);
 err:
 	bch2_trans_iter_exit(trans, &bp_iter);
 	bch2_bkey_buf_exit(&sk, c);
 	bch2_bkey_buf_exit(&last_flushed, c);
+	bch2_dev_put(ca);
 	return ret;
 }
 
@@ -974,7 +987,8 @@ int bch2_move_data_phys(struct bch_fs *c,
 		ctxt.stats->data_type = (int) DATA_PROGRESS_DATA_TYPE_phys;
 	}
 
-	int ret = __bch2_move_data_phys(&ctxt, NULL, dev, start, end, data_types, pred, arg);
+	int ret = __bch2_move_data_phys(&ctxt, NULL, dev, start, end,
+					data_types, false, pred, arg);
 	bch2_moving_ctxt_exit(&ctxt);
 
 	return ret;
@@ -1019,6 +1033,7 @@ int bch2_evacuate_bucket(struct moving_context *ctxt,
 				   bucket.offset,
 				   bucket.offset + 1,
 				   ~0,
+				   true,
 				   evacuate_bucket_pred, &arg);
 }
 
diff --git a/fs/bcachefs/movinggc.c b/fs/bcachefs/movinggc.c
index 0a751a65386f..7cb0b3d347b4 100644
--- a/fs/bcachefs/movinggc.c
+++ b/fs/bcachefs/movinggc.c
@@ -75,6 +75,9 @@ static int bch2_bucket_is_movable(struct btree_trans *trans,
 	if (!ca)
 		goto out;
 
+	if (bch2_bucket_bitmap_test(&ca->bucket_backpointer_mismatch, b->k.bucket.offset))
+		goto out;
+
 	if (ca->mi.state != BCH_MEMBER_STATE_rw ||
 	    !bch2_dev_is_online(ca))
 		goto out;
-- 
2.49.0



end of thread, other threads:[~2025-05-17 19:26 UTC | newest]

Thread overview: 9+ messages
2025-05-17 19:25 [PATCH 0/8] Runtime self healing for missing backpointers Kent Overstreet
2025-05-17 19:25 ` [PATCH 1/8] bcachefs: struct bch_fs_recovery Kent Overstreet
2025-05-17 19:25 ` [PATCH 2/8] bcachefs: __bch2_run_recovery_passes() Kent Overstreet
2025-05-17 19:25 ` [PATCH 3/8] bcachefs: Reduce usage of recovery.curr_pass Kent Overstreet
2025-05-17 19:25 ` [PATCH 4/8] bcachefs: bch2_recovery_pass_status_to_text() Kent Overstreet
2025-05-17 19:25 ` [PATCH 5/8] bcachefs: bch2_run_explicit_recovery_pass() cleanup Kent Overstreet
2025-05-17 19:25 ` [PATCH 6/8] bcachefs: Run recovery passes asynchronously Kent Overstreet
2025-05-17 19:25 ` [PATCH 7/8] bcachefs: Improve bucket_bitmap code Kent Overstreet
2025-05-17 19:25 ` [PATCH 8/8] bcachefs: bch2_check_bucket_backpointer_mismatch() Kent Overstreet
