* [PATCH 00/21] Big rebalance changes
@ 2025-08-24 12:37 Kent Overstreet
2025-08-24 12:37 ` [PATCH 01/21] bcachefs: s/bch_io_opts/bch_inode_opts Kent Overstreet
` (20 more replies)
0 siblings, 21 replies; 22+ messages in thread
From: Kent Overstreet @ 2025-08-24 12:37 UTC (permalink / raw)
To: linux-bcachefs; +Cc: Kent Overstreet
This is a WIP patch series: there are still lots of unfinished bits, but
there are big user-visible changes coming, so it's time to give people a
heads up and start circulating it for comments and ideas.
This series originated with a debugging session where we noticed that
there were extents that had been written with the wrong checksum type.
That's pretty bad - the extent we noticed was written with _no_
checksum, meaning IO path options got screwed up somewhere and we had no
way of detecting it.
So we need to be able to check and enforce that data is written
correctly according to the IO path options specified by the filesystem
and inode - something we've been needing for a while.
IO path options -> extent handling is rebalance's responsibility (and
one of these days I'm going to rename that; the name has become quite an
inaccurate description of what it does). We already have the ability to
store IO path options in the extent itself (struct bch_extent_rebalance);
this is necessary for e.g. indirect extents, where we don't know which
inode they came from when doing background processing, as well as for the
triggers, so that we can do accounting for pending rebalance work.
So a large part of the series is reworking how we apply IO path options
to extents (bch2_bkey_set_needs_rebalance()), so that we can strictly
enforce that an extent matches the IO path options and record an error
if it does not.
Additionally, in the current code rebalance only really handles the
compression/background_compression/background_target options; it now
needs to handle (enforce, apply, correct existing data) all the IO path
options - which means adding support for the data_checksum,
data_replicas and erasure_code options.
Supporting data_replicas requires reworking bch_extent_rebalance
(compatibility notes in that patch), since the triggers that update the
rebalance_work accounting and btree (now btrees) need to be pure
functions of the extents - they can't be functions of device state or
durability, since those can change.
So bch_extent_rebalance now has flags for "extent needs to be
rebalanced", and the trigger only looks at those (which will also
benefit extent btree update performance). Looking at the entire extent
and deciding what needs to be done is now centralized in
bch2_bkey_set_needs_rebalance(), which gains the new consistency checks.
We now clearly define the conditions and codepaths under which an extent
is allowed to deviate from the options: option changes, or a foreground
write that only needs background_compression or background_target
applied.
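To sketch the shape of this - purely illustrative, the struct and helper
names below are made up and not the real bch_extent_rebalance layout:

  #include <linux/types.h>

  /*
   * Illustrative sketch only: set_needs_rebalance() looks at the whole
   * extent and the current opts and records *what* work is needed as
   * flags; the trigger then reads only those flags, so it stays a pure
   * function of the key and never consults device state.
   */
  struct rebalance_work_flags_sketch {
  	u32	need_compress:1,
  		need_move:1,
  		need_csum:1,
  		need_replicas:1,
  		need_ec:1;
  };

  static bool trigger_sees_pending_work(struct rebalance_work_flags_sketch f)
  {
  	/* pure function of the extent - cheap, and stable across device
  	 * state or durability changes */
  	return f.need_compress || f.need_move || f.need_csum ||
  	       f.need_replicas || f.need_ec;
  }

The actual encoding is in fs/bcachefs/rebalance_format.h (reworked later
in the series); the sketch is only meant to show why the trigger stays
cheap and deterministic.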
The fun stuff:
Now that rebalance can react to any IO path option change:
- It's no longer required to run 'bcachefs data rereplicate' / 'bcachefs
data op drop_extra_replicas' after changing the replicas setting;
BCH_DATA_OP_rereplicate and BCH_DATA_OP_drop_extra_replicas are now
obsolete.
- We can now react to device state changes: durability setting changes,
and more importantly, a device switching to BCH_MEMBER_STATE_failed -
this will automatically evacuate the device (and the evacuate will
resume if we crash or shut down and restart).
Currently we only mark devices read-only on excessive IO errors; we
don't automatically mark devices as failed - that only happens in
response to a 'bcachefs device set-state' command. But in the future
we'll want to add configuration and policy for doing this automatically
when a device appears to be unhealthy.
If you have a huge disk array, this means you won't need to wear a pager
to respond to hardware failures: we'll do everything required to keep
data at the appropriate replication level and on healthy devices; just
swap out the bad devices at your leisure.
Other good stuff:
- rebalance_work accounting is now broken out into subcounters for each
IO path option so you can better see what background processing is
happening
- new btrees for rebalance_work_hipri, to ensure that evacuating failed
devices runs first, and rebalance_work_pending for data we'd like to
move but can't because the target is full - this will solve the
"rebalance is spinning because I tried to stuff more data into
background_target than fits" bug reports.
Kent Overstreet (21):
bcachefs: s/bch_io_opts/bch_inode_opts
bcachefs: Inode opt helper refactoring
bcachefs: opt_change_cookie
bcachefs: Transactional consistency for set_needs_rebalance
bcachefs: Plumb bch_inode_opts.change_cookie
bcachefs: enum set_needs_rebalance_ctx
bcachefs: do_rebalance_scan() now responsible for indirect extents
bcachefs: Rename, split out bch2_extent_get_io_opts()
bcachefs: do_rebalance_extent() uses bch2_extent_get_apply_io_opts()
bcachefs: Correct propagation of io options to indirect extents
bcachefs: bkey_should_have_rb_opts()
bcachefs: bch2_bkey_needs_rebalance()
bcachefs: rebalance now supports changing checksum type
bcachefs: Consistency checking for bch_extent_rebalance opts
bcachefs: check_rebalance_work checks option inconsistency
bcachefs: bch2_bkey_set_needs_rebalance() now takes
per_snapshot_io_opts
bcachefs: bch_extent_rebalance changes
bcachefs: bch2_set_rebalance_needs_scan_device()
bcachefs: next_rebalance_extent() now handles replicas changes
bcachefs: rebalance: erasure_code opt change now handled
bcachefs: rebalance_work_(hipri|pending) btrees
fs/bcachefs/bcachefs.h | 2 +
fs/bcachefs/bcachefs_format.h | 8 +
fs/bcachefs/buckets.c | 28 +-
fs/bcachefs/checksum.h | 2 +-
fs/bcachefs/data_update.c | 44 +-
fs/bcachefs/data_update.h | 8 +-
fs/bcachefs/disk_accounting_format.h | 1 +
fs/bcachefs/extents.c | 16 +-
fs/bcachefs/extents.h | 15 +-
fs/bcachefs/fs-io-buffered.c | 12 +-
fs/bcachefs/fs-io-direct.c | 8 +-
fs/bcachefs/fs-io.c | 4 +-
fs/bcachefs/inode.c | 45 +-
fs/bcachefs/inode.h | 9 +-
fs/bcachefs/io_misc.c | 14 +-
fs/bcachefs/io_misc.h | 2 +-
fs/bcachefs/io_read.c | 2 +-
fs/bcachefs/io_read.h | 4 +-
fs/bcachefs/io_write.c | 50 +-
fs/bcachefs/io_write.h | 4 +-
fs/bcachefs/io_write_types.h | 2 +-
fs/bcachefs/move.c | 194 +------
fs/bcachefs/move.h | 34 +-
fs/bcachefs/opts.c | 15 +-
fs/bcachefs/opts.h | 9 +-
fs/bcachefs/rebalance.c | 823 +++++++++++++++++++++------
fs/bcachefs/rebalance.h | 80 ++-
fs/bcachefs/rebalance_format.h | 62 +-
fs/bcachefs/reflink.c | 16 +-
fs/bcachefs/sb-errors_format.h | 3 +-
fs/bcachefs/super.c | 4 +
fs/bcachefs/sysfs.c | 4 +
fs/bcachefs/trace.h | 5 -
fs/bcachefs/xattr.c | 3 +
34 files changed, 945 insertions(+), 587 deletions(-)
--
2.50.1
* [PATCH 01/21] bcachefs: s/bch_io_opts/bch_inode_opts
2025-08-24 12:37 [PATCH 00/21] Big rebalance changes Kent Overstreet
@ 2025-08-24 12:37 ` Kent Overstreet
2025-08-24 12:37 ` [PATCH 02/21] bcachefs: Inode opt helper refactoring Kent Overstreet
` (19 subsequent siblings)
20 siblings, 0 replies; 22+ messages in thread
From: Kent Overstreet @ 2025-08-24 12:37 UTC (permalink / raw)
To: linux-bcachefs; +Cc: Kent Overstreet
Prep for introducing a new bch_io_opts with just the io path options.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
---
fs/bcachefs/checksum.h | 2 +-
fs/bcachefs/data_update.c | 10 +++++-----
fs/bcachefs/data_update.h | 8 ++++----
fs/bcachefs/extents.c | 6 +++---
fs/bcachefs/extents.h | 4 ++--
fs/bcachefs/fs-io-buffered.c | 6 +++---
fs/bcachefs/fs-io-direct.c | 4 ++--
fs/bcachefs/fs-io.c | 2 +-
fs/bcachefs/inode.c | 4 ++--
fs/bcachefs/inode.h | 6 +++---
fs/bcachefs/io_misc.c | 4 ++--
fs/bcachefs/io_misc.h | 2 +-
fs/bcachefs/io_read.c | 2 +-
fs/bcachefs/io_read.h | 4 ++--
fs/bcachefs/io_write.h | 2 +-
fs/bcachefs/io_write_types.h | 2 +-
fs/bcachefs/move.c | 34 +++++++++++++++++-----------------
fs/bcachefs/move.h | 12 ++++++------
fs/bcachefs/opts.c | 4 ++--
fs/bcachefs/opts.h | 6 +++---
fs/bcachefs/rebalance.c | 18 +++++++++---------
fs/bcachefs/rebalance.h | 6 +++---
fs/bcachefs/reflink.c | 2 +-
23 files changed, 75 insertions(+), 75 deletions(-)
diff --git a/fs/bcachefs/checksum.h b/fs/bcachefs/checksum.h
index 7bd9cf6104ca..10bfadcde80a 100644
--- a/fs/bcachefs/checksum.h
+++ b/fs/bcachefs/checksum.h
@@ -130,7 +130,7 @@ static inline enum bch_csum_type bch2_csum_opt_to_type(enum bch_csum_opt type,
}
static inline enum bch_csum_type bch2_data_checksum_type(struct bch_fs *c,
- struct bch_io_opts opts)
+ struct bch_inode_opts opts)
{
if (opts.nocow)
return 0;
diff --git a/fs/bcachefs/data_update.c b/fs/bcachefs/data_update.c
index 2c997fddefb3..968850da0d23 100644
--- a/fs/bcachefs/data_update.c
+++ b/fs/bcachefs/data_update.c
@@ -613,7 +613,7 @@ int bch2_update_unwritten_extent(struct btree_trans *trans,
}
void bch2_data_update_opts_to_text(struct printbuf *out, struct bch_fs *c,
- struct bch_io_opts *io_opts,
+ struct bch_inode_opts *io_opts,
struct data_update_opts *data_opts)
{
if (!out->nr_tabstops)
@@ -681,7 +681,7 @@ void bch2_data_update_inflight_to_text(struct printbuf *out, struct data_update
int bch2_extent_drop_ptrs(struct btree_trans *trans,
struct btree_iter *iter,
struct bkey_s_c k,
- struct bch_io_opts *io_opts,
+ struct bch_inode_opts *io_opts,
struct data_update_opts *data_opts)
{
struct bch_fs *c = trans->c;
@@ -731,7 +731,7 @@ int bch2_extent_drop_ptrs(struct btree_trans *trans,
}
static int __bch2_data_update_bios_init(struct data_update *m, struct bch_fs *c,
- struct bch_io_opts *io_opts,
+ struct bch_inode_opts *io_opts,
unsigned buf_bytes)
{
unsigned nr_vecs = DIV_ROUND_UP(buf_bytes, PAGE_SIZE);
@@ -758,7 +758,7 @@ static int __bch2_data_update_bios_init(struct data_update *m, struct bch_fs *c,
}
int bch2_data_update_bios_init(struct data_update *m, struct bch_fs *c,
- struct bch_io_opts *io_opts)
+ struct bch_inode_opts *io_opts)
{
struct bkey_ptrs_c ptrs = bch2_bkey_ptrs_c(bkey_i_to_s_c(m->k.k));
const union bch_extent_entry *entry;
@@ -830,7 +830,7 @@ int bch2_data_update_init(struct btree_trans *trans,
struct moving_context *ctxt,
struct data_update *m,
struct write_point_specifier wp,
- struct bch_io_opts *io_opts,
+ struct bch_inode_opts *io_opts,
struct data_update_opts data_opts,
enum btree_id btree_id,
struct bkey_s_c k)
diff --git a/fs/bcachefs/data_update.h b/fs/bcachefs/data_update.h
index fc12aa65366f..3b0ba6f6497f 100644
--- a/fs/bcachefs/data_update.h
+++ b/fs/bcachefs/data_update.h
@@ -23,7 +23,7 @@ struct data_update_opts {
};
void bch2_data_update_opts_to_text(struct printbuf *, struct bch_fs *,
- struct bch_io_opts *, struct data_update_opts *);
+ struct bch_inode_opts *, struct data_update_opts *);
#define BCH_DATA_UPDATE_TYPES() \
x(copygc, 0) \
@@ -76,18 +76,18 @@ void bch2_data_update_read_done(struct data_update *);
int bch2_extent_drop_ptrs(struct btree_trans *,
struct btree_iter *,
struct bkey_s_c,
- struct bch_io_opts *,
+ struct bch_inode_opts *,
struct data_update_opts *);
int bch2_data_update_bios_init(struct data_update *, struct bch_fs *,
- struct bch_io_opts *);
+ struct bch_inode_opts *);
void bch2_data_update_exit(struct data_update *);
int bch2_data_update_init(struct btree_trans *, struct btree_iter *,
struct moving_context *,
struct data_update *,
struct write_point_specifier,
- struct bch_io_opts *, struct data_update_opts,
+ struct bch_inode_opts *, struct data_update_opts,
enum btree_id, struct bkey_s_c);
void bch2_data_update_opts_normalize(struct bkey_s_c, struct data_update_opts *);
diff --git a/fs/bcachefs/extents.c b/fs/bcachefs/extents.c
index 68a61f7bc737..016242ffc98d 100644
--- a/fs/bcachefs/extents.c
+++ b/fs/bcachefs/extents.c
@@ -1151,7 +1151,7 @@ bch2_extent_has_ptr(struct bkey_s_c k1, struct extent_ptr_decoded p1, struct bke
return NULL;
}
-static bool want_cached_ptr(struct bch_fs *c, struct bch_io_opts *opts,
+static bool want_cached_ptr(struct bch_fs *c, struct bch_inode_opts *opts,
struct bch_extent_ptr *ptr)
{
unsigned target = opts->promote_target ?: opts->foreground_target;
@@ -1165,7 +1165,7 @@ static bool want_cached_ptr(struct bch_fs *c, struct bch_io_opts *opts,
}
void bch2_extent_ptr_set_cached(struct bch_fs *c,
- struct bch_io_opts *opts,
+ struct bch_inode_opts *opts,
struct bkey_s k,
struct bch_extent_ptr *ptr)
{
@@ -1241,7 +1241,7 @@ bool bch2_extent_normalize(struct bch_fs *c, struct bkey_s k)
* the promote target.
*/
bool bch2_extent_normalize_by_opts(struct bch_fs *c,
- struct bch_io_opts *opts,
+ struct bch_inode_opts *opts,
struct bkey_s k)
{
struct bkey_ptrs ptrs;
diff --git a/fs/bcachefs/extents.h b/fs/bcachefs/extents.h
index f6dcb17108cd..03ea7c689d9a 100644
--- a/fs/bcachefs/extents.h
+++ b/fs/bcachefs/extents.h
@@ -686,10 +686,10 @@ bool bch2_extents_match(struct bkey_s_c, struct bkey_s_c);
struct bch_extent_ptr *
bch2_extent_has_ptr(struct bkey_s_c, struct extent_ptr_decoded, struct bkey_s);
-void bch2_extent_ptr_set_cached(struct bch_fs *, struct bch_io_opts *,
+void bch2_extent_ptr_set_cached(struct bch_fs *, struct bch_inode_opts *,
struct bkey_s, struct bch_extent_ptr *);
-bool bch2_extent_normalize_by_opts(struct bch_fs *, struct bch_io_opts *, struct bkey_s);
+bool bch2_extent_normalize_by_opts(struct bch_fs *, struct bch_inode_opts *, struct bkey_s);
bool bch2_extent_normalize(struct bch_fs *, struct bkey_s);
void bch2_extent_ptr_to_text(struct printbuf *out, struct bch_fs *, const struct bch_extent_ptr *);
diff --git a/fs/bcachefs/fs-io-buffered.c b/fs/bcachefs/fs-io-buffered.c
index 9532f1a73053..7ba4ef3173c7 100644
--- a/fs/bcachefs/fs-io-buffered.c
+++ b/fs/bcachefs/fs-io-buffered.c
@@ -284,7 +284,7 @@ void bch2_readahead(struct readahead_control *ractl)
{
struct bch_inode_info *inode = to_bch_ei(ractl->mapping->host);
struct bch_fs *c = inode->v.i_sb->s_fs_info;
- struct bch_io_opts opts;
+ struct bch_inode_opts opts;
struct folio *folio;
struct readpages_iter readpages_iter;
struct blk_plug plug;
@@ -350,7 +350,7 @@ int bch2_read_single_folio(struct folio *folio, struct address_space *mapping)
struct bch_inode_info *inode = to_bch_ei(mapping->host);
struct bch_fs *c = inode->v.i_sb->s_fs_info;
struct bch_read_bio *rbio;
- struct bch_io_opts opts;
+ struct bch_inode_opts opts;
struct blk_plug plug;
int ret;
DECLARE_COMPLETION_ONSTACK(done);
@@ -407,7 +407,7 @@ struct bch_writepage_io {
struct bch_writepage_state {
struct bch_writepage_io *io;
- struct bch_io_opts opts;
+ struct bch_inode_opts opts;
struct bch_folio_sector *tmp;
unsigned tmp_sectors;
struct blk_plug plug;
diff --git a/fs/bcachefs/fs-io-direct.c b/fs/bcachefs/fs-io-direct.c
index 79823234160f..2ee6e1720515 100644
--- a/fs/bcachefs/fs-io-direct.c
+++ b/fs/bcachefs/fs-io-direct.c
@@ -68,7 +68,7 @@ static int bch2_direct_IO_read(struct kiocb *req, struct iov_iter *iter)
struct file *file = req->ki_filp;
struct bch_inode_info *inode = file_bch_inode(file);
struct bch_fs *c = inode->v.i_sb->s_fs_info;
- struct bch_io_opts opts;
+ struct bch_inode_opts opts;
struct dio_read *dio;
struct bio *bio;
struct blk_plug plug;
@@ -445,7 +445,7 @@ static __always_inline long bch2_dio_write_loop(struct dio_write *dio)
struct kiocb *req = dio->req;
struct address_space *mapping = dio->mapping;
struct bch_inode_info *inode = dio->inode;
- struct bch_io_opts opts;
+ struct bch_inode_opts opts;
struct bio *bio = &dio->op.wbio.bio;
unsigned unaligned, iter_count;
bool sync = dio->sync, dropped_locks;
diff --git a/fs/bcachefs/fs-io.c b/fs/bcachefs/fs-io.c
index de0d965f3fde..f71909e4ef1d 100644
--- a/fs/bcachefs/fs-io.c
+++ b/fs/bcachefs/fs-io.c
@@ -627,7 +627,7 @@ static noinline int __bchfs_fallocate(struct bch_inode_info *inode, int mode,
{
struct bch_fs *c = inode->v.i_sb->s_fs_info;
struct bpos end_pos = POS(inode->v.i_ino, end_sector);
- struct bch_io_opts opts;
+ struct bch_inode_opts opts;
int ret = 0;
bch2_inode_opts_get(&opts, c, &inode->ei_inode);
diff --git a/fs/bcachefs/inode.c b/fs/bcachefs/inode.c
index 4aa130ff7cf6..30dc6631c333 100644
--- a/fs/bcachefs/inode.c
+++ b/fs/bcachefs/inode.c
@@ -1224,7 +1224,7 @@ struct bch_opts bch2_inode_opts_to_opts(struct bch_inode_unpacked *inode)
return ret;
}
-void bch2_inode_opts_get(struct bch_io_opts *opts, struct bch_fs *c,
+void bch2_inode_opts_get(struct bch_inode_opts *opts, struct bch_fs *c,
struct bch_inode_unpacked *inode)
{
#define x(_name, _bits) \
@@ -1241,7 +1241,7 @@ void bch2_inode_opts_get(struct bch_io_opts *opts, struct bch_fs *c,
bch2_io_opts_fixups(opts);
}
-int bch2_inum_opts_get(struct btree_trans *trans, subvol_inum inum, struct bch_io_opts *opts)
+int bch2_inum_opts_get(struct btree_trans *trans, subvol_inum inum, struct bch_inode_opts *opts)
{
struct bch_inode_unpacked inode;
int ret = lockrestart_do(trans, bch2_inode_find_by_inum_trans(trans, inum, &inode));
diff --git a/fs/bcachefs/inode.h b/fs/bcachefs/inode.h
index 79092ea74844..c26f48fdaa81 100644
--- a/fs/bcachefs/inode.h
+++ b/fs/bcachefs/inode.h
@@ -289,9 +289,9 @@ int bch2_inode_nlink_inc(struct bch_inode_unpacked *);
void bch2_inode_nlink_dec(struct btree_trans *, struct bch_inode_unpacked *);
struct bch_opts bch2_inode_opts_to_opts(struct bch_inode_unpacked *);
-void bch2_inode_opts_get(struct bch_io_opts *, struct bch_fs *,
+void bch2_inode_opts_get(struct bch_inode_opts *, struct bch_fs *,
struct bch_inode_unpacked *);
-int bch2_inum_opts_get(struct btree_trans *, subvol_inum, struct bch_io_opts *);
+int bch2_inum_opts_get(struct btree_trans *, subvol_inum, struct bch_inode_opts *);
int bch2_inode_set_casefold(struct btree_trans *, subvol_inum,
struct bch_inode_unpacked *, unsigned);
@@ -300,7 +300,7 @@ int bch2_inode_set_casefold(struct btree_trans *, subvol_inum,
static inline struct bch_extent_rebalance
bch2_inode_rebalance_opts_get(struct bch_fs *c, struct bch_inode_unpacked *inode)
{
- struct bch_io_opts io_opts;
+ struct bch_inode_opts io_opts;
bch2_inode_opts_get(&io_opts, c, inode);
return io_opts_to_rebalance_opts(c, &io_opts);
}
diff --git a/fs/bcachefs/io_misc.c b/fs/bcachefs/io_misc.c
index fa0b06e17d17..5e03574059e0 100644
--- a/fs/bcachefs/io_misc.c
+++ b/fs/bcachefs/io_misc.c
@@ -24,7 +24,7 @@ int bch2_extent_fallocate(struct btree_trans *trans,
subvol_inum inum,
struct btree_iter *iter,
u64 sectors,
- struct bch_io_opts opts,
+ struct bch_inode_opts opts,
s64 *i_sectors_delta,
struct write_point_specifier write_point)
{
@@ -373,7 +373,7 @@ static int __bch2_resume_logged_op_finsert(struct btree_trans *trans,
struct btree_iter iter;
struct bkey_i_logged_op_finsert *op = bkey_i_to_logged_op_finsert(op_k);
subvol_inum inum = { le32_to_cpu(op->v.subvol), le64_to_cpu(op->v.inum) };
- struct bch_io_opts opts;
+ struct bch_inode_opts opts;
u64 dst_offset = le64_to_cpu(op->v.dst_offset);
u64 src_offset = le64_to_cpu(op->v.src_offset);
s64 shift = dst_offset - src_offset;
diff --git a/fs/bcachefs/io_misc.h b/fs/bcachefs/io_misc.h
index b93e4d4b3c0c..6a294f2a6dd6 100644
--- a/fs/bcachefs/io_misc.h
+++ b/fs/bcachefs/io_misc.h
@@ -3,7 +3,7 @@
#define _BCACHEFS_IO_MISC_H
int bch2_extent_fallocate(struct btree_trans *, subvol_inum, struct btree_iter *,
- u64, struct bch_io_opts, s64 *,
+ u64, struct bch_inode_opts, s64 *,
struct write_point_specifier);
int bch2_fpunch_snapshot(struct btree_trans *, struct bpos, struct bpos);
diff --git a/fs/bcachefs/io_read.c b/fs/bcachefs/io_read.c
index e7d53ab1cf55..74c3238230fe 100644
--- a/fs/bcachefs/io_read.c
+++ b/fs/bcachefs/io_read.c
@@ -158,7 +158,7 @@ static bool ptr_being_rewritten(struct bch_read_bio *orig, unsigned dev)
static inline int should_promote(struct bch_fs *c, struct bkey_s_c k,
struct bpos pos,
- struct bch_io_opts opts,
+ struct bch_inode_opts opts,
unsigned flags,
struct bch_io_failures *failed)
{
diff --git a/fs/bcachefs/io_read.h b/fs/bcachefs/io_read.h
index 1e1c0476bd03..df4632f6fe9e 100644
--- a/fs/bcachefs/io_read.h
+++ b/fs/bcachefs/io_read.h
@@ -74,7 +74,7 @@ struct bch_read_bio {
struct bpos data_pos;
struct bversion version;
- struct bch_io_opts opts;
+ struct bch_inode_opts opts;
struct work_struct work;
@@ -192,7 +192,7 @@ static inline struct bch_read_bio *rbio_init_fragment(struct bio *bio,
static inline struct bch_read_bio *rbio_init(struct bio *bio,
struct bch_fs *c,
- struct bch_io_opts opts,
+ struct bch_inode_opts opts,
bio_end_io_t end_io)
{
struct bch_read_bio *rbio = to_rbio(bio);
diff --git a/fs/bcachefs/io_write.h b/fs/bcachefs/io_write.h
index 2c0a8f35ee1f..6c05ba6e15d6 100644
--- a/fs/bcachefs/io_write.h
+++ b/fs/bcachefs/io_write.h
@@ -31,7 +31,7 @@ int bch2_extent_update(struct btree_trans *, subvol_inum,
struct disk_reservation *, u64, s64 *, bool);
static inline void bch2_write_op_init(struct bch_write_op *op, struct bch_fs *c,
- struct bch_io_opts opts)
+ struct bch_inode_opts opts)
{
op->c = c;
op->end_io = NULL;
diff --git a/fs/bcachefs/io_write_types.h b/fs/bcachefs/io_write_types.h
index 5da4eb8bb6f6..ab36b03e0a46 100644
--- a/fs/bcachefs/io_write_types.h
+++ b/fs/bcachefs/io_write_types.h
@@ -90,7 +90,7 @@ struct bch_write_op {
struct bch_devs_list devs_have;
u16 target;
u16 nonce;
- struct bch_io_opts opts;
+ struct bch_inode_opts opts;
u32 subvol;
struct bpos pos;
diff --git a/fs/bcachefs/move.c b/fs/bcachefs/move.c
index 4f41f1f6ec6c..39bec75890f4 100644
--- a/fs/bcachefs/move.c
+++ b/fs/bcachefs/move.c
@@ -46,12 +46,12 @@ struct evacuate_bucket_arg {
static bool evacuate_bucket_pred(struct bch_fs *, void *,
enum btree_id, struct bkey_s_c,
- struct bch_io_opts *,
+ struct bch_inode_opts *,
struct data_update_opts *);
static noinline void
trace_io_move2(struct bch_fs *c, struct bkey_s_c k,
- struct bch_io_opts *io_opts,
+ struct bch_inode_opts *io_opts,
struct data_update_opts *data_opts)
{
CLASS(printbuf, buf)();
@@ -72,7 +72,7 @@ static noinline void trace_io_move_read2(struct bch_fs *c, struct bkey_s_c k)
static noinline void
trace_io_move_pred2(struct bch_fs *c, struct bkey_s_c k,
- struct bch_io_opts *io_opts,
+ struct bch_inode_opts *io_opts,
struct data_update_opts *data_opts,
move_pred_fn pred, void *_arg, bool p)
{
@@ -325,7 +325,7 @@ int bch2_move_extent(struct moving_context *ctxt,
struct move_bucket *bucket_in_flight,
struct btree_iter *iter,
struct bkey_s_c k,
- struct bch_io_opts io_opts,
+ struct bch_inode_opts io_opts,
struct data_update_opts data_opts)
{
struct btree_trans *trans = ctxt->trans;
@@ -447,7 +447,7 @@ int bch2_move_extent(struct moving_context *ctxt,
return ret;
}
-struct bch_io_opts *bch2_move_get_io_opts(struct btree_trans *trans,
+struct bch_inode_opts *bch2_move_get_io_opts(struct btree_trans *trans,
struct per_snapshot_io_opts *io_opts,
struct bpos extent_pos, /* extent_iter, extent_k may be in reflink btree */
struct btree_iter *extent_iter,
@@ -455,7 +455,7 @@ struct bch_io_opts *bch2_move_get_io_opts(struct btree_trans *trans,
{
struct bch_fs *c = trans->c;
u32 restart_count = trans->restart_count;
- struct bch_io_opts *opts_ret = &io_opts->fs_io_opts;
+ struct bch_inode_opts *opts_ret = &io_opts->fs_io_opts;
int ret = 0;
if (btree_iter_path(trans, extent_iter)->level)
@@ -506,7 +506,7 @@ struct bch_io_opts *bch2_move_get_io_opts(struct btree_trans *trans,
}
int bch2_move_get_io_opts_one(struct btree_trans *trans,
- struct bch_io_opts *io_opts,
+ struct bch_inode_opts *io_opts,
struct btree_iter *extent_iter,
struct bkey_s_c extent_k)
{
@@ -618,7 +618,7 @@ int bch2_move_data_btree(struct moving_context *ctxt,
struct btree_trans *trans = ctxt->trans;
struct bch_fs *c = trans->c;
struct per_snapshot_io_opts snapshot_io_opts;
- struct bch_io_opts *io_opts;
+ struct bch_inode_opts *io_opts;
struct bkey_buf sk;
struct btree_iter iter, reflink_iter = {};
struct bkey_s_c k;
@@ -855,7 +855,7 @@ static int __bch2_move_data_phys(struct moving_context *ctxt,
struct btree_trans *trans = ctxt->trans;
struct bch_fs *c = trans->c;
bool is_kthread = current->flags & PF_KTHREAD;
- struct bch_io_opts io_opts = bch2_opts_to_inode_opts(c->opts);
+ struct bch_inode_opts io_opts = bch2_opts_to_inode_opts(c->opts);
struct btree_iter iter = {};
struct bkey_buf sk;
struct bkey_s_c k;
@@ -1034,7 +1034,7 @@ int bch2_move_data_phys(struct bch_fs *c,
static bool evacuate_bucket_pred(struct bch_fs *c, void *_arg,
enum btree_id btree, struct bkey_s_c k,
- struct bch_io_opts *io_opts,
+ struct bch_inode_opts *io_opts,
struct data_update_opts *data_opts)
{
struct evacuate_bucket_arg *arg = _arg;
@@ -1075,7 +1075,7 @@ int bch2_evacuate_bucket(struct moving_context *ctxt,
}
typedef bool (*move_btree_pred)(struct bch_fs *, void *,
- struct btree *, struct bch_io_opts *,
+ struct btree *, struct bch_inode_opts *,
struct data_update_opts *);
static int bch2_move_btree(struct bch_fs *c,
@@ -1085,7 +1085,7 @@ static int bch2_move_btree(struct bch_fs *c,
struct bch_move_stats *stats)
{
bool kthread = (current->flags & PF_KTHREAD) != 0;
- struct bch_io_opts io_opts = bch2_opts_to_inode_opts(c->opts);
+ struct bch_inode_opts io_opts = bch2_opts_to_inode_opts(c->opts);
struct moving_context ctxt;
struct btree_trans *trans;
struct btree_iter iter;
@@ -1154,7 +1154,7 @@ static int bch2_move_btree(struct bch_fs *c,
static bool rereplicate_pred(struct bch_fs *c, void *arg,
enum btree_id btree, struct bkey_s_c k,
- struct bch_io_opts *io_opts,
+ struct bch_inode_opts *io_opts,
struct data_update_opts *data_opts)
{
unsigned nr_good = bch2_bkey_durability(c, k);
@@ -1185,7 +1185,7 @@ static bool rereplicate_pred(struct bch_fs *c, void *arg,
static bool migrate_pred(struct bch_fs *c, void *arg,
enum btree_id btree, struct bkey_s_c k,
- struct bch_io_opts *io_opts,
+ struct bch_inode_opts *io_opts,
struct data_update_opts *data_opts)
{
struct bkey_ptrs_c ptrs = bch2_bkey_ptrs_c(k);
@@ -1222,7 +1222,7 @@ static bool bformat_needs_redo(struct bkey_format *f)
static bool rewrite_old_nodes_pred(struct bch_fs *c, void *arg,
struct btree *b,
- struct bch_io_opts *io_opts,
+ struct bch_inode_opts *io_opts,
struct data_update_opts *data_opts)
{
if (b->version_ondisk != c->sb.version ||
@@ -1259,7 +1259,7 @@ int bch2_scan_old_btree_nodes(struct bch_fs *c, struct bch_move_stats *stats)
static bool drop_extra_replicas_pred(struct bch_fs *c, void *arg,
enum btree_id btree, struct bkey_s_c k,
- struct bch_io_opts *io_opts,
+ struct bch_inode_opts *io_opts,
struct data_update_opts *data_opts)
{
unsigned durability = bch2_bkey_durability(c, k);
@@ -1297,7 +1297,7 @@ static bool drop_extra_replicas_pred(struct bch_fs *c, void *arg,
static bool scrub_pred(struct bch_fs *c, void *_arg,
enum btree_id btree, struct bkey_s_c k,
- struct bch_io_opts *io_opts,
+ struct bch_inode_opts *io_opts,
struct data_update_opts *data_opts)
{
struct bch_ioctl_data *arg = _arg;
diff --git a/fs/bcachefs/move.h b/fs/bcachefs/move.h
index 481026ff99ab..a5561e02c38c 100644
--- a/fs/bcachefs/move.h
+++ b/fs/bcachefs/move.h
@@ -73,7 +73,7 @@ do { \
} while (1)
typedef bool (*move_pred_fn)(struct bch_fs *, void *, enum btree_id, struct bkey_s_c,
- struct bch_io_opts *, struct data_update_opts *);
+ struct bch_inode_opts *, struct data_update_opts *);
extern const char * const bch2_data_ops_strs[];
@@ -90,12 +90,12 @@ int bch2_move_ratelimit(struct moving_context *);
/* Inodes in different snapshots may have different IO options: */
struct snapshot_io_opts_entry {
u32 snapshot;
- struct bch_io_opts io_opts;
+ struct bch_inode_opts io_opts;
};
struct per_snapshot_io_opts {
u64 cur_inum;
- struct bch_io_opts fs_io_opts;
+ struct bch_inode_opts fs_io_opts;
DARRAY(struct snapshot_io_opts_entry) d;
};
@@ -110,7 +110,7 @@ static inline void per_snapshot_io_opts_exit(struct per_snapshot_io_opts *io_opt
darray_exit(&io_opts->d);
}
-int bch2_move_get_io_opts_one(struct btree_trans *, struct bch_io_opts *,
+int bch2_move_get_io_opts_one(struct btree_trans *, struct bch_inode_opts *,
struct btree_iter *, struct bkey_s_c);
int bch2_scan_old_btree_nodes(struct bch_fs *, struct bch_move_stats *);
@@ -119,10 +119,10 @@ int bch2_move_extent(struct moving_context *,
struct move_bucket *,
struct btree_iter *,
struct bkey_s_c,
- struct bch_io_opts,
+ struct bch_inode_opts,
struct data_update_opts);
-struct bch_io_opts *bch2_move_get_io_opts(struct btree_trans *,
+struct bch_inode_opts *bch2_move_get_io_opts(struct btree_trans *,
struct per_snapshot_io_opts *, struct bpos,
struct btree_iter *, struct bkey_s_c);
diff --git a/fs/bcachefs/opts.c b/fs/bcachefs/opts.c
index c3ef35dc01e2..6091e6d55a17 100644
--- a/fs/bcachefs/opts.c
+++ b/fs/bcachefs/opts.c
@@ -802,9 +802,9 @@ bool bch2_opt_set_sb(struct bch_fs *c, struct bch_dev *ca,
/* io opts: */
-struct bch_io_opts bch2_opts_to_inode_opts(struct bch_opts src)
+struct bch_inode_opts bch2_opts_to_inode_opts(struct bch_opts src)
{
- struct bch_io_opts opts = {
+ struct bch_inode_opts opts = {
#define x(_name, _bits) ._name = src._name,
BCH_INODE_OPTS()
#undef x
diff --git a/fs/bcachefs/opts.h b/fs/bcachefs/opts.h
index 31a3abcbd83e..40a55335998d 100644
--- a/fs/bcachefs/opts.h
+++ b/fs/bcachefs/opts.h
@@ -670,7 +670,7 @@ int bch2_parse_mount_opts(struct bch_fs *, struct bch_opts *, struct printbuf *,
/* inode opts: */
-struct bch_io_opts {
+struct bch_inode_opts {
#define x(_name, _bits) u##_bits _name;
BCH_INODE_OPTS()
#undef x
@@ -679,7 +679,7 @@ struct bch_io_opts {
#undef x
};
-static inline void bch2_io_opts_fixups(struct bch_io_opts *opts)
+static inline void bch2_io_opts_fixups(struct bch_inode_opts *opts)
{
if (!opts->background_target)
opts->background_target = opts->foreground_target;
@@ -692,7 +692,7 @@ static inline void bch2_io_opts_fixups(struct bch_io_opts *opts)
}
}
-struct bch_io_opts bch2_opts_to_inode_opts(struct bch_opts);
+struct bch_inode_opts bch2_opts_to_inode_opts(struct bch_opts);
bool bch2_opt_is_inode_opt(enum bch_opt_id);
#endif /* _BCACHEFS_OPTS_H */
diff --git a/fs/bcachefs/rebalance.c b/fs/bcachefs/rebalance.c
index 25bf72dc6488..9590c57798c6 100644
--- a/fs/bcachefs/rebalance.c
+++ b/fs/bcachefs/rebalance.c
@@ -44,7 +44,7 @@ static const struct bch_extent_rebalance *bch2_bkey_rebalance_opts(struct bkey_s
}
static inline unsigned bch2_bkey_ptrs_need_compress(struct bch_fs *c,
- struct bch_io_opts *opts,
+ struct bch_inode_opts *opts,
struct bkey_s_c k,
struct bkey_ptrs_c ptrs)
{
@@ -71,7 +71,7 @@ static inline unsigned bch2_bkey_ptrs_need_compress(struct bch_fs *c,
}
static inline unsigned bch2_bkey_ptrs_need_move(struct bch_fs *c,
- struct bch_io_opts *opts,
+ struct bch_inode_opts *opts,
struct bkey_ptrs_c ptrs)
{
if (!opts->background_target ||
@@ -92,7 +92,7 @@ static inline unsigned bch2_bkey_ptrs_need_move(struct bch_fs *c,
}
static unsigned bch2_bkey_ptrs_need_rebalance(struct bch_fs *c,
- struct bch_io_opts *opts,
+ struct bch_inode_opts *opts,
struct bkey_s_c k)
{
struct bkey_ptrs_c ptrs = bch2_bkey_ptrs_c(k);
@@ -145,7 +145,7 @@ u64 bch2_bkey_sectors_need_rebalance(struct bch_fs *c, struct bkey_s_c k)
return sectors;
}
-static bool bch2_bkey_rebalance_needs_update(struct bch_fs *c, struct bch_io_opts *opts,
+static bool bch2_bkey_rebalance_needs_update(struct bch_fs *c, struct bch_inode_opts *opts,
struct bkey_s_c k)
{
if (!bkey_extent_is_direct_data(k.k))
@@ -161,7 +161,7 @@ static bool bch2_bkey_rebalance_needs_update(struct bch_fs *c, struct bch_io_opt
}
}
-int bch2_bkey_set_needs_rebalance(struct bch_fs *c, struct bch_io_opts *opts,
+int bch2_bkey_set_needs_rebalance(struct bch_fs *c, struct bch_inode_opts *opts,
struct bkey_i *_k)
{
if (!bkey_extent_is_direct_data(&_k->k))
@@ -187,7 +187,7 @@ int bch2_bkey_set_needs_rebalance(struct bch_fs *c, struct bch_io_opts *opts,
}
int bch2_get_update_rebalance_opts(struct btree_trans *trans,
- struct bch_io_opts *io_opts,
+ struct bch_inode_opts *io_opts,
struct btree_iter *iter,
struct bkey_s_c k)
{
@@ -356,7 +356,7 @@ static int bch2_bkey_clear_needs_rebalance(struct btree_trans *trans,
static struct bkey_s_c next_rebalance_extent(struct btree_trans *trans,
struct bpos work_pos,
struct btree_iter *extent_iter,
- struct bch_io_opts *io_opts,
+ struct bch_inode_opts *io_opts,
struct data_update_opts *data_opts)
{
struct bch_fs *c = trans->c;
@@ -435,7 +435,7 @@ static int do_rebalance_extent(struct moving_context *ctxt,
struct bch_fs *c = trans->c;
struct bch_fs_rebalance *r = &trans->c->rebalance;
struct data_update_opts data_opts;
- struct bch_io_opts io_opts;
+ struct bch_inode_opts io_opts;
struct bkey_s_c k;
struct bkey_buf sk;
int ret;
@@ -508,7 +508,7 @@ static int do_rebalance_scan(struct moving_context *ctxt,
BTREE_ITER_prefetch, k, ({
ctxt->stats->pos = BBPOS(iter.btree_id, iter.pos);
- struct bch_io_opts *io_opts = bch2_move_get_io_opts(trans,
+ struct bch_inode_opts *io_opts = bch2_move_get_io_opts(trans,
&snapshot_io_opts, iter.pos, &iter, k);
PTR_ERR_OR_ZERO(io_opts);
})) ?:
diff --git a/fs/bcachefs/rebalance.h b/fs/bcachefs/rebalance.h
index 7a565ea7dbfc..9f2839ddb60e 100644
--- a/fs/bcachefs/rebalance.h
+++ b/fs/bcachefs/rebalance.h
@@ -8,7 +8,7 @@
#include "rebalance_types.h"
static inline struct bch_extent_rebalance io_opts_to_rebalance_opts(struct bch_fs *c,
- struct bch_io_opts *opts)
+ struct bch_inode_opts *opts)
{
struct bch_extent_rebalance r = {
.type = BIT(BCH_EXTENT_ENTRY_rebalance),
@@ -27,9 +27,9 @@ static inline struct bch_extent_rebalance io_opts_to_rebalance_opts(struct bch_f
};
u64 bch2_bkey_sectors_need_rebalance(struct bch_fs *, struct bkey_s_c);
-int bch2_bkey_set_needs_rebalance(struct bch_fs *, struct bch_io_opts *, struct bkey_i *);
+int bch2_bkey_set_needs_rebalance(struct bch_fs *, struct bch_inode_opts *, struct bkey_i *);
int bch2_get_update_rebalance_opts(struct btree_trans *,
- struct bch_io_opts *,
+ struct bch_inode_opts *,
struct btree_iter *,
struct bkey_s_c);
diff --git a/fs/bcachefs/reflink.c b/fs/bcachefs/reflink.c
index 238a362de19e..55ad8ab7a148 100644
--- a/fs/bcachefs/reflink.c
+++ b/fs/bcachefs/reflink.c
@@ -589,7 +589,7 @@ s64 bch2_remap_range(struct bch_fs *c,
struct bpos dst_start = POS(dst_inum.inum, dst_offset);
struct bpos src_start = POS(src_inum.inum, src_offset);
struct bpos dst_end = dst_start, src_end = src_start;
- struct bch_io_opts opts;
+ struct bch_inode_opts opts;
struct bpos src_want;
u64 dst_done = 0;
u32 dst_snapshot, src_snapshot;
--
2.50.1
* [PATCH 02/21] bcachefs: Inode opt helper refactoring
2025-08-24 12:37 [PATCH 00/21] Big rebalance changes Kent Overstreet
2025-08-24 12:37 ` [PATCH 01/21] bcachefs: s/bch_io_opts/bch_inode_opts Kent Overstreet
@ 2025-08-24 12:37 ` Kent Overstreet
2025-08-24 12:37 ` [PATCH 03/21] bcachefs: opt_change_cookie Kent Overstreet
` (18 subsequent siblings)
20 siblings, 0 replies; 22+ messages in thread
From: Kent Overstreet @ 2025-08-24 12:37 UTC (permalink / raw)
To: linux-bcachefs; +Cc: Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
---
fs/bcachefs/fs-io-buffered.c | 8 ++++----
fs/bcachefs/fs-io-direct.c | 6 +++---
fs/bcachefs/fs-io.c | 2 +-
fs/bcachefs/inode.c | 17 +++++++++--------
fs/bcachefs/inode.h | 5 ++---
fs/bcachefs/move.c | 14 +++++++++-----
fs/bcachefs/move.h | 2 +-
fs/bcachefs/opts.c | 11 +++++------
fs/bcachefs/opts.h | 2 +-
9 files changed, 35 insertions(+), 32 deletions(-)
diff --git a/fs/bcachefs/fs-io-buffered.c b/fs/bcachefs/fs-io-buffered.c
index 7ba4ef3173c7..a027992d769c 100644
--- a/fs/bcachefs/fs-io-buffered.c
+++ b/fs/bcachefs/fs-io-buffered.c
@@ -284,12 +284,12 @@ void bch2_readahead(struct readahead_control *ractl)
{
struct bch_inode_info *inode = to_bch_ei(ractl->mapping->host);
struct bch_fs *c = inode->v.i_sb->s_fs_info;
- struct bch_inode_opts opts;
struct folio *folio;
struct readpages_iter readpages_iter;
struct blk_plug plug;
- bch2_inode_opts_get(&opts, c, &inode->ei_inode);
+ struct bch_inode_opts opts;
+ bch2_inode_opts_get_inode(c, &inode->ei_inode, &opts);
int ret = readpages_iter_init(&readpages_iter, ractl);
if (ret)
@@ -361,7 +361,7 @@ int bch2_read_single_folio(struct folio *folio, struct address_space *mapping)
if (!bch2_folio_create(folio, GFP_KERNEL))
return -ENOMEM;
- bch2_inode_opts_get(&opts, c, &inode->ei_inode);
+ bch2_inode_opts_get_inode(c, &inode->ei_inode, &opts);
rbio = rbio_init(bio_alloc_bioset(NULL, 1, REQ_OP_READ, GFP_KERNEL, &c->bio_read),
c,
@@ -683,7 +683,7 @@ int bch2_writepages(struct address_space *mapping, struct writeback_control *wbc
struct bch_fs *c = mapping->host->i_sb->s_fs_info;
struct bch_writepage_state *w = kzalloc(sizeof(*w), GFP_NOFS|__GFP_NOFAIL);
- bch2_inode_opts_get(&w->opts, c, &to_bch_ei(mapping->host)->ei_inode);
+ bch2_inode_opts_get_inode(c, &to_bch_ei(mapping->host)->ei_inode, &w->opts);
blk_start_plug(&w->plug);
int ret = bch2_write_cache_pages(mapping, wbc, w);
diff --git a/fs/bcachefs/fs-io-direct.c b/fs/bcachefs/fs-io-direct.c
index 2ee6e1720515..a104b9d70bea 100644
--- a/fs/bcachefs/fs-io-direct.c
+++ b/fs/bcachefs/fs-io-direct.c
@@ -68,7 +68,6 @@ static int bch2_direct_IO_read(struct kiocb *req, struct iov_iter *iter)
struct file *file = req->ki_filp;
struct bch_inode_info *inode = file_bch_inode(file);
struct bch_fs *c = inode->v.i_sb->s_fs_info;
- struct bch_inode_opts opts;
struct dio_read *dio;
struct bio *bio;
struct blk_plug plug;
@@ -78,7 +77,8 @@ static int bch2_direct_IO_read(struct kiocb *req, struct iov_iter *iter)
size_t shorten;
ssize_t ret;
- bch2_inode_opts_get(&opts, c, &inode->ei_inode);
+ struct bch_inode_opts opts;
+ bch2_inode_opts_get_inode(c, &inode->ei_inode, &opts);
/* bios must be 512 byte aligned: */
if ((offset|iter->count) & (SECTOR_SIZE - 1))
@@ -451,7 +451,7 @@ static __always_inline long bch2_dio_write_loop(struct dio_write *dio)
bool sync = dio->sync, dropped_locks;
long ret;
- bch2_inode_opts_get(&opts, c, &inode->ei_inode);
+ bch2_inode_opts_get_inode(c, &inode->ei_inode, &opts);
while (1) {
iter_count = dio->iter.count;
diff --git a/fs/bcachefs/fs-io.c b/fs/bcachefs/fs-io.c
index f71909e4ef1d..57e9459afa07 100644
--- a/fs/bcachefs/fs-io.c
+++ b/fs/bcachefs/fs-io.c
@@ -630,7 +630,7 @@ static noinline int __bchfs_fallocate(struct bch_inode_info *inode, int mode,
struct bch_inode_opts opts;
int ret = 0;
- bch2_inode_opts_get(&opts, c, &inode->ei_inode);
+ bch2_inode_opts_get_inode(c, &inode->ei_inode, &opts);
CLASS(btree_trans, trans)(c);
CLASS(btree_iter, iter)(trans, BTREE_ID_extents,
diff --git a/fs/bcachefs/inode.c b/fs/bcachefs/inode.c
index 30dc6631c333..b7fcdb76483c 100644
--- a/fs/bcachefs/inode.c
+++ b/fs/bcachefs/inode.c
@@ -1224,21 +1224,22 @@ struct bch_opts bch2_inode_opts_to_opts(struct bch_inode_unpacked *inode)
return ret;
}
-void bch2_inode_opts_get(struct bch_inode_opts *opts, struct bch_fs *c,
- struct bch_inode_unpacked *inode)
+void bch2_inode_opts_get_inode(struct bch_fs *c,
+ struct bch_inode_unpacked *inode,
+ struct bch_inode_opts *ret)
{
#define x(_name, _bits) \
if ((inode)->bi_##_name) { \
- opts->_name = inode->bi_##_name - 1; \
- opts->_name##_from_inode = true; \
+ ret->_name = inode->bi_##_name - 1; \
+ ret->_name##_from_inode = true; \
} else { \
- opts->_name = c->opts._name; \
- opts->_name##_from_inode = false; \
+ ret->_name = c->opts._name; \
+ ret->_name##_from_inode = false; \
}
BCH_INODE_OPTS()
#undef x
- bch2_io_opts_fixups(opts);
+ bch2_io_opts_fixups(ret);
}
int bch2_inum_opts_get(struct btree_trans *trans, subvol_inum inum, struct bch_inode_opts *opts)
@@ -1249,7 +1250,7 @@ int bch2_inum_opts_get(struct btree_trans *trans, subvol_inum inum, struct bch_i
if (ret)
return ret;
- bch2_inode_opts_get(opts, trans->c, &inode);
+ bch2_inode_opts_get_inode(trans->c, &inode, opts);
return 0;
}
diff --git a/fs/bcachefs/inode.h b/fs/bcachefs/inode.h
index c26f48fdaa81..12e0a104c196 100644
--- a/fs/bcachefs/inode.h
+++ b/fs/bcachefs/inode.h
@@ -289,8 +289,7 @@ int bch2_inode_nlink_inc(struct bch_inode_unpacked *);
void bch2_inode_nlink_dec(struct btree_trans *, struct bch_inode_unpacked *);
struct bch_opts bch2_inode_opts_to_opts(struct bch_inode_unpacked *);
-void bch2_inode_opts_get(struct bch_inode_opts *, struct bch_fs *,
- struct bch_inode_unpacked *);
+void bch2_inode_opts_get_inode(struct bch_fs *, struct bch_inode_unpacked *, struct bch_inode_opts *);
int bch2_inum_opts_get(struct btree_trans *, subvol_inum, struct bch_inode_opts *);
int bch2_inode_set_casefold(struct btree_trans *, subvol_inum,
struct bch_inode_unpacked *, unsigned);
@@ -301,7 +300,7 @@ static inline struct bch_extent_rebalance
bch2_inode_rebalance_opts_get(struct bch_fs *c, struct bch_inode_unpacked *inode)
{
struct bch_inode_opts io_opts;
- bch2_inode_opts_get(&io_opts, c, inode);
+ bch2_inode_opts_get_inode(c, inode, &io_opts);
return io_opts_to_rebalance_opts(c, &io_opts);
}
diff --git a/fs/bcachefs/move.c b/fs/bcachefs/move.c
index 39bec75890f4..03b3060f1964 100644
--- a/fs/bcachefs/move.c
+++ b/fs/bcachefs/move.c
@@ -481,7 +481,7 @@ struct bch_inode_opts *bch2_move_get_io_opts(struct btree_trans *trans,
break;
struct snapshot_io_opts_entry e = { .snapshot = k.k->p.snapshot };
- bch2_inode_opts_get(&e.io_opts, trans->c, &inode);
+ bch2_inode_opts_get_inode(trans->c, &inode, &e.io_opts);
darray_push(&io_opts->d, e);
}));
@@ -512,7 +512,7 @@ int bch2_move_get_io_opts_one(struct btree_trans *trans,
{
struct bch_fs *c = trans->c;
- *io_opts = bch2_opts_to_inode_opts(c->opts);
+ bch2_inode_opts_get(c, io_opts);
/* reflink btree? */
if (extent_k.k->p.inode) {
@@ -527,7 +527,7 @@ int bch2_move_get_io_opts_one(struct btree_trans *trans,
if (!ret && bkey_is_inode(inode_k.k)) {
struct bch_inode_unpacked inode;
bch2_inode_unpack(inode_k, &inode);
- bch2_inode_opts_get(io_opts, c, &inode);
+ bch2_inode_opts_get_inode(c, &inode, io_opts);
}
}
@@ -855,7 +855,6 @@ static int __bch2_move_data_phys(struct moving_context *ctxt,
struct btree_trans *trans = ctxt->trans;
struct bch_fs *c = trans->c;
bool is_kthread = current->flags & PF_KTHREAD;
- struct bch_inode_opts io_opts = bch2_opts_to_inode_opts(c->opts);
struct btree_iter iter = {};
struct bkey_buf sk;
struct bkey_s_c k;
@@ -863,6 +862,9 @@ static int __bch2_move_data_phys(struct moving_context *ctxt,
u64 check_mismatch_done = bucket_start;
int ret = 0;
+ struct bch_inode_opts io_opts;
+ bch2_inode_opts_get(c, &io_opts);
+
CLASS(bch2_dev_tryget, ca)(c, dev);
if (!ca)
return 0;
@@ -1085,7 +1087,6 @@ static int bch2_move_btree(struct bch_fs *c,
struct bch_move_stats *stats)
{
bool kthread = (current->flags & PF_KTHREAD) != 0;
- struct bch_inode_opts io_opts = bch2_opts_to_inode_opts(c->opts);
struct moving_context ctxt;
struct btree_trans *trans;
struct btree_iter iter;
@@ -1094,6 +1095,9 @@ static int bch2_move_btree(struct bch_fs *c,
struct data_update_opts data_opts;
int ret = 0;
+ struct bch_inode_opts io_opts;
+ bch2_inode_opts_get(c, &io_opts);
+
bch2_moving_ctxt_init(&ctxt, c, NULL, stats,
writepoint_ptr(&c->btree_write_point),
true);
diff --git a/fs/bcachefs/move.h b/fs/bcachefs/move.h
index a5561e02c38c..18021d2c51d0 100644
--- a/fs/bcachefs/move.h
+++ b/fs/bcachefs/move.h
@@ -102,7 +102,7 @@ struct per_snapshot_io_opts {
static inline void per_snapshot_io_opts_init(struct per_snapshot_io_opts *io_opts, struct bch_fs *c)
{
memset(io_opts, 0, sizeof(*io_opts));
- io_opts->fs_io_opts = bch2_opts_to_inode_opts(c->opts);
+ bch2_inode_opts_get(c, &io_opts->fs_io_opts);
}
static inline void per_snapshot_io_opts_exit(struct per_snapshot_io_opts *io_opts)
diff --git a/fs/bcachefs/opts.c b/fs/bcachefs/opts.c
index 6091e6d55a17..10d472d3e7d1 100644
--- a/fs/bcachefs/opts.c
+++ b/fs/bcachefs/opts.c
@@ -802,16 +802,15 @@ bool bch2_opt_set_sb(struct bch_fs *c, struct bch_dev *ca,
/* io opts: */
-struct bch_inode_opts bch2_opts_to_inode_opts(struct bch_opts src)
+void bch2_inode_opts_get(struct bch_fs *c, struct bch_inode_opts *ret)
{
- struct bch_inode_opts opts = {
-#define x(_name, _bits) ._name = src._name,
+ memset(ret, 0, sizeof(*ret));
+
+#define x(_name, _bits) ret->_name = c->opts._name,
BCH_INODE_OPTS()
#undef x
- };
- bch2_io_opts_fixups(&opts);
- return opts;
+ bch2_io_opts_fixups(ret);
}
bool bch2_opt_is_inode_opt(enum bch_opt_id id)
diff --git a/fs/bcachefs/opts.h b/fs/bcachefs/opts.h
index 40a55335998d..9e0c73b7bd3a 100644
--- a/fs/bcachefs/opts.h
+++ b/fs/bcachefs/opts.h
@@ -692,7 +692,7 @@ static inline void bch2_io_opts_fixups(struct bch_inode_opts *opts)
}
}
-struct bch_inode_opts bch2_opts_to_inode_opts(struct bch_opts);
+void bch2_inode_opts_get(struct bch_fs *, struct bch_inode_opts *);
bool bch2_opt_is_inode_opt(enum bch_opt_id);
#endif /* _BCACHEFS_OPTS_H */
--
2.50.1
* [PATCH 03/21] bcachefs: opt_change_cookie
2025-08-24 12:37 [PATCH 00/21] Big rebalance changes Kent Overstreet
2025-08-24 12:37 ` [PATCH 01/21] bcachefs: s/bch_io_opts/bch_inode_opts Kent Overstreet
2025-08-24 12:37 ` [PATCH 02/21] bcachefs: Inode opt helper refactoring Kent Overstreet
@ 2025-08-24 12:37 ` Kent Overstreet
2025-08-24 12:37 ` [PATCH 04/21] bcachefs: Transactional consistency for set_needs_rebalance Kent Overstreet
` (17 subsequent siblings)
20 siblings, 0 replies; 22+ messages in thread
From: Kent Overstreet @ 2025-08-24 12:37 UTC (permalink / raw)
To: linux-bcachefs; +Cc: Kent Overstreet
Add a sequence number for detecting races between when we read the io
path options and when we do the btree update with the new extent.
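Roughly, the pattern this enables - a sketch with made-up names, not the
real helpers; the actual plumbing through the write paths comes in the
following patches:

  #include <linux/atomic.h>
  #include <linux/types.h>

  /*
   * Sketch of the race-detection pattern: snapshot the cookie when the
   * io options are read, bump it whenever an option changes, and compare
   * at btree-update time to detect that the options may have changed
   * underneath us.
   */
  struct opts_snapshot {
  	u32	change_cookie;
  	/* ... io path options captured at the same time ... */
  };

  static void opts_capture(atomic_t *opt_change_cookie, struct opts_snapshot *s)
  {
  	s->change_cookie = atomic_read(opt_change_cookie);
  	/* read filesystem/inode options here */
  }

  static bool opts_still_current(atomic_t *opt_change_cookie,
  			       const struct opts_snapshot *s)
  {
  	/* any option change since capture bumped the cookie */
  	return s->change_cookie == (u32) atomic_read(opt_change_cookie);
  }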
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
---
fs/bcachefs/bcachefs.h | 2 ++
fs/bcachefs/inode.c | 2 ++
fs/bcachefs/opts.c | 4 ++++
fs/bcachefs/opts.h | 3 +++
fs/bcachefs/xattr.c | 3 +++
5 files changed, 14 insertions(+)
diff --git a/fs/bcachefs/bcachefs.h b/fs/bcachefs/bcachefs.h
index 16d08dfb5f19..7eb8b8f37f95 100644
--- a/fs/bcachefs/bcachefs.h
+++ b/fs/bcachefs/bcachefs.h
@@ -807,6 +807,8 @@ struct bch_fs {
struct bch_disk_groups_cpu __rcu *disk_groups;
struct bch_opts opts;
+ atomic_t opt_change_cookie;
+
unsigned loglevel;
unsigned prev_loglevel;
diff --git a/fs/bcachefs/inode.c b/fs/bcachefs/inode.c
index b7fcdb76483c..c1d673374e02 100644
--- a/fs/bcachefs/inode.c
+++ b/fs/bcachefs/inode.c
@@ -1239,6 +1239,8 @@ void bch2_inode_opts_get_inode(struct bch_fs *c,
BCH_INODE_OPTS()
#undef x
+ ret->opt_change_cookie = atomic_read(&c->opt_change_cookie);
+
bch2_io_opts_fixups(ret);
}
diff --git a/fs/bcachefs/opts.c b/fs/bcachefs/opts.c
index 10d472d3e7d1..16d210cbc849 100644
--- a/fs/bcachefs/opts.c
+++ b/fs/bcachefs/opts.c
@@ -606,6 +606,8 @@ void bch2_opt_hook_post_set(struct bch_fs *c, struct bch_dev *ca, u64 inum,
default:
break;
}
+
+ atomic_inc(&c->opt_change_cookie);
}
int bch2_parse_one_mount_opt(struct bch_fs *c, struct bch_opts *opts,
@@ -810,6 +812,8 @@ void bch2_inode_opts_get(struct bch_fs *c, struct bch_inode_opts *ret)
BCH_INODE_OPTS()
#undef x
+ ret->opt_change_cookie = atomic_read(&c->opt_change_cookie);
+
bch2_io_opts_fixups(ret);
}
diff --git a/fs/bcachefs/opts.h b/fs/bcachefs/opts.h
index 9e0c73b7bd3a..2425ba247201 100644
--- a/fs/bcachefs/opts.h
+++ b/fs/bcachefs/opts.h
@@ -674,9 +674,12 @@ struct bch_inode_opts {
#define x(_name, _bits) u##_bits _name;
BCH_INODE_OPTS()
#undef x
+
#define x(_name, _bits) u64 _name##_from_inode:1;
BCH_INODE_OPTS()
#undef x
+
+ u32 opt_change_cookie;
};
static inline void bch2_io_opts_fixups(struct bch_inode_opts *opts)
diff --git a/fs/bcachefs/xattr.c b/fs/bcachefs/xattr.c
index 6d7303008b19..de72a735a49d 100644
--- a/fs/bcachefs/xattr.c
+++ b/fs/bcachefs/xattr.c
@@ -590,6 +590,9 @@ static int bch2_xattr_bcachefs_set(const struct xattr_handler *handler,
}
ret = bch2_write_inode(c, inode, inode_opt_set_fn, &s, 0);
+
+ if (!ret)
+ atomic_inc(&c->opt_change_cookie);
}
err:
return bch2_err_class(ret);
--
2.50.1
* [PATCH 04/21] bcachefs: Transactional consistency for set_needs_rebalance
2025-08-24 12:37 [PATCH 00/21] Big rebalance changes Kent Overstreet
` (2 preceding siblings ...)
2025-08-24 12:37 ` [PATCH 03/21] bcachefs: opt_change_cookie Kent Overstreet
@ 2025-08-24 12:37 ` Kent Overstreet
2025-08-24 12:37 ` [PATCH 05/21] bcachefs: Plumb bch_inode_opts.change_cookie Kent Overstreet
` (16 subsequent siblings)
20 siblings, 0 replies; 22+ messages in thread
From: Kent Overstreet @ 2025-08-24 12:37 UTC (permalink / raw)
To: linux-bcachefs; +Cc: Kent Overstreet
We're going to be strictly enforcing that extents match the IO path
options, as defined by the filesystem/inode options: that means when we
call set_needs_rebalance(), we need to pass it the opts we got from the
inode in that same transaction.
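Condensed from the data_update.c hunk below (not a standalone excerpt,
just the relevant calls), the shape of the pattern is:

  /*
   * Same-transaction pattern: look up the inode options and apply them
   * to the new extent in the transaction that does the btree update, so
   * the opts can't go stale in between.
   */
  struct bch_inode_opts opts;

  ret = bch2_inum_snapshot_opts_get(trans, k.k->p.inode, k.k->p.snapshot, &opts) ?:
        bch2_bkey_set_needs_rebalance(c, &opts, insert) ?:
        bch2_trans_update(trans, &iter, insert,
  			BTREE_UPDATE_internal_snapshot_node);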
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
---
fs/bcachefs/data_update.c | 6 +++++-
fs/bcachefs/inode.c | 28 ++++++++++++++++++---------
fs/bcachefs/inode.h | 2 +-
fs/bcachefs/io_misc.c | 8 +-------
fs/bcachefs/io_write.c | 40 ++++++++++++++++++++++++++++-----------
fs/bcachefs/reflink.c | 16 +++++-----------
6 files changed, 60 insertions(+), 40 deletions(-)
diff --git a/fs/bcachefs/data_update.c b/fs/bcachefs/data_update.c
index 968850da0d23..43d318ff488e 100644
--- a/fs/bcachefs/data_update.c
+++ b/fs/bcachefs/data_update.c
@@ -11,6 +11,7 @@
#include "ec.h"
#include "error.h"
#include "extents.h"
+#include "inode.h"
#include "io_write.h"
#include "keylist.h"
#include "move.h"
@@ -428,13 +429,16 @@ static int __bch2_data_update_index_update(struct btree_trans *trans,
goto out;
}
+ struct bch_inode_opts opts;
+
ret = bch2_trans_log_str(trans, bch2_data_update_type_strs[m->type]) ?:
bch2_trans_log_bkey(trans, m->btree_id, 0, m->k.k) ?:
bch2_insert_snapshot_whiteouts(trans, m->btree_id,
k.k->p, bkey_start_pos(&insert->k)) ?:
bch2_insert_snapshot_whiteouts(trans, m->btree_id,
k.k->p, insert->k.p) ?:
- bch2_bkey_set_needs_rebalance(c, &op->opts, insert) ?:
+ bch2_inum_snapshot_opts_get(trans, k.k->p.inode, k.k->p.snapshot, &opts) ?:
+ bch2_bkey_set_needs_rebalance(c, &opts, insert) ?:
bch2_trans_update(trans, &iter, insert,
BTREE_UPDATE_internal_snapshot_node);
if (ret)
diff --git a/fs/bcachefs/inode.c b/fs/bcachefs/inode.c
index c1d673374e02..d1ec33edcc0b 100644
--- a/fs/bcachefs/inode.c
+++ b/fs/bcachefs/inode.c
@@ -369,9 +369,9 @@ int __bch2_inode_peek(struct btree_trans *trans,
}
int bch2_inode_find_by_inum_snapshot(struct btree_trans *trans,
- u64 inode_nr, u32 snapshot,
- struct bch_inode_unpacked *inode,
- unsigned flags)
+ u64 inode_nr, u32 snapshot,
+ struct bch_inode_unpacked *inode,
+ unsigned flags)
{
CLASS(btree_iter, iter)(trans, BTREE_ID_inodes, SPOS(0, inode_nr, snapshot), flags);
struct bkey_s_c k = bch2_btree_iter_peek_slot(&iter);
@@ -1244,15 +1244,25 @@ void bch2_inode_opts_get_inode(struct bch_fs *c,
bch2_io_opts_fixups(ret);
}
-int bch2_inum_opts_get(struct btree_trans *trans, subvol_inum inum, struct bch_inode_opts *opts)
+int bch2_inum_snapshot_opts_get(struct btree_trans *trans,
+ u64 inum, u32 snapshot,
+ struct bch_inode_opts *opts)
{
- struct bch_inode_unpacked inode;
- int ret = lockrestart_do(trans, bch2_inode_find_by_inum_trans(trans, inum, &inode));
+ if (inum) {
+ struct bch_inode_unpacked inode;
+ int ret = bch2_inode_find_by_inum_snapshot(trans, inum, snapshot, &inode, 0);
+ if (ret)
+ return ret;
- if (ret)
- return ret;
+ bch2_inode_opts_get_inode(trans->c, &inode, opts);
+ } else {
+ /*
+ * data_update_index_update may call us for reflink btree extent
+ * updates, inum will be 0
+ */
- bch2_inode_opts_get_inode(trans->c, &inode, opts);
+ bch2_inode_opts_get(trans->c, opts);
+ }
return 0;
}
diff --git a/fs/bcachefs/inode.h b/fs/bcachefs/inode.h
index 12e0a104c196..63b7088811fb 100644
--- a/fs/bcachefs/inode.h
+++ b/fs/bcachefs/inode.h
@@ -290,7 +290,7 @@ void bch2_inode_nlink_dec(struct btree_trans *, struct bch_inode_unpacked *);
struct bch_opts bch2_inode_opts_to_opts(struct bch_inode_unpacked *);
void bch2_inode_opts_get_inode(struct bch_fs *, struct bch_inode_unpacked *, struct bch_inode_opts *);
-int bch2_inum_opts_get(struct btree_trans *, subvol_inum, struct bch_inode_opts *);
+int bch2_inum_snapshot_opts_get(struct btree_trans *, u64, u32, struct bch_inode_opts *);
int bch2_inode_set_casefold(struct btree_trans *, subvol_inum,
struct bch_inode_unpacked *, unsigned);
diff --git a/fs/bcachefs/io_misc.c b/fs/bcachefs/io_misc.c
index 5e03574059e0..6d204b980f76 100644
--- a/fs/bcachefs/io_misc.c
+++ b/fs/bcachefs/io_misc.c
@@ -373,7 +373,6 @@ static int __bch2_resume_logged_op_finsert(struct btree_trans *trans,
struct btree_iter iter;
struct bkey_i_logged_op_finsert *op = bkey_i_to_logged_op_finsert(op_k);
subvol_inum inum = { le32_to_cpu(op->v.subvol), le64_to_cpu(op->v.inum) };
- struct bch_inode_opts opts;
u64 dst_offset = le64_to_cpu(op->v.dst_offset);
u64 src_offset = le64_to_cpu(op->v.src_offset);
s64 shift = dst_offset - src_offset;
@@ -384,10 +383,6 @@ static int __bch2_resume_logged_op_finsert(struct btree_trans *trans,
bool warn_errors = i_sectors_delta != NULL;
int ret = 0;
- ret = bch2_inum_opts_get(trans, inum, &opts);
- if (ret)
- return ret;
-
/*
* check for missing subvolume before fpunch, as in resume we don't want
* it to be a fatal error
@@ -476,8 +471,7 @@ case LOGGED_OP_FINSERT_shift_extents:
op->v.pos = cpu_to_le64(insert ? bkey_start_offset(&delete.k) : delete.k.p.offset);
- ret = bch2_bkey_set_needs_rebalance(c, &opts, copy) ?:
- bch2_btree_insert_trans(trans, BTREE_ID_extents, &delete, 0) ?:
+ ret = bch2_btree_insert_trans(trans, BTREE_ID_extents, &delete, 0) ?:
bch2_btree_insert_trans(trans, BTREE_ID_extents, copy, 0) ?:
bch2_logged_op_update(trans, &op->k_i) ?:
bch2_trans_commit(trans, &disk_res, NULL, BCH_TRANS_COMMIT_no_enospc);
diff --git a/fs/bcachefs/io_write.c b/fs/bcachefs/io_write.c
index 1d83dcc9731e..a0cb5d2dd0f8 100644
--- a/fs/bcachefs/io_write.c
+++ b/fs/bcachefs/io_write.c
@@ -205,7 +205,8 @@ int bch2_sum_sector_overwrites(struct btree_trans *trans,
static inline int bch2_extent_update_i_size_sectors(struct btree_trans *trans,
struct btree_iter *extent_iter,
u64 new_i_size,
- s64 i_sectors_delta)
+ s64 i_sectors_delta,
+ struct bch_inode_unpacked *inode_u)
{
/*
* Crazy performance optimization:
@@ -227,7 +228,13 @@ static inline int bch2_extent_update_i_size_sectors(struct btree_trans *trans,
BTREE_ITER_intent|
BTREE_ITER_cached);
struct bkey_s_c k = bch2_btree_iter_peek_slot(&iter);
- int ret = bkey_err(k);
+
+ /*
+ * XXX: we currently need to unpack the inode on every write because we
+ * need the current io_opts, for transactional consistency - inode_v4?
+ */
+ int ret = bkey_err(k) ?:
+ bch2_inode_unpack(k, inode_u);
if (unlikely(ret))
return ret;
@@ -305,6 +312,7 @@ int bch2_extent_update(struct btree_trans *trans,
s64 *i_sectors_delta_total,
bool check_enospc)
{
+ struct bch_fs *c = trans->c;
struct bpos next_pos;
bool usage_increasing;
s64 i_sectors_delta = 0, disk_sectors_delta = 0;
@@ -335,7 +343,7 @@ int bch2_extent_update(struct btree_trans *trans,
if (disk_res &&
disk_sectors_delta > (s64) disk_res->sectors) {
- ret = bch2_disk_reservation_add(trans->c, disk_res,
+ ret = bch2_disk_reservation_add(c, disk_res,
disk_sectors_delta - disk_res->sectors,
!check_enospc || !usage_increasing
? BCH_DISK_RESERVATION_NOFAIL : 0);
@@ -349,9 +357,14 @@ int bch2_extent_update(struct btree_trans *trans,
* aren't changing - for fsync to work properly; fsync relies on
* inode->bi_journal_seq which is updated by the trigger code:
*/
+ struct bch_inode_unpacked inode;
+ struct bch_inode_opts opts;
+
ret = bch2_extent_update_i_size_sectors(trans, iter,
min(k->k.p.offset << 9, new_i_size),
- i_sectors_delta) ?:
+ i_sectors_delta, &inode) ?:
+ (bch2_inode_opts_get_inode(c, &inode, &opts),
+ bch2_bkey_set_needs_rebalance(c, &opts, k)) ?:
bch2_trans_update(trans, iter, k, 0) ?:
bch2_trans_commit(trans, disk_res, NULL,
BCH_TRANS_COMMIT_no_check_rw|
@@ -792,10 +805,6 @@ static void init_append_extent(struct bch_write_op *op,
bch2_alloc_sectors_append_ptrs_inlined(op->c, wp, &e->k_i, crc.compressed_size,
op->flags & BCH_WRITE_cached);
-
- if (!(op->flags & BCH_WRITE_move))
- bch2_bkey_set_needs_rebalance(op->c, &op->opts, &e->k_i);
-
bch2_keylist_push(&op->insert_keys);
}
@@ -1225,6 +1234,7 @@ static int bch2_nocow_write_convert_one_unwritten(struct btree_trans *trans,
return 0;
}
+ struct bch_fs *c = trans->c;
struct bkey_i *new = bch2_trans_kmalloc_nomemzero(trans,
bkey_bytes(k.k) + sizeof(struct bch_extent_rebalance));
int ret = PTR_ERR_OR_ZERO(new);
@@ -1239,8 +1249,6 @@ static int bch2_nocow_write_convert_one_unwritten(struct btree_trans *trans,
bkey_for_each_ptr(ptrs, ptr)
ptr->unwritten = 0;
- bch2_bkey_set_needs_rebalance(op->c, &op->opts, new);
-
/*
* Note that we're not calling bch2_subvol_get_snapshot() in this path -
* that was done when we kicked off the write, and here it's important
@@ -1248,8 +1256,18 @@ static int bch2_nocow_write_convert_one_unwritten(struct btree_trans *trans,
* since been created. The write is still outstanding, so we're ok
* w.r.t. snapshot atomicity:
*/
+
+ /*
+ * For transactional consistency, set_needs_rebalance() has to be called
+ * with the io_opts from the btree in the same transaction:
+ */
+ struct bch_inode_unpacked inode;
+ struct bch_inode_opts opts;
+
return bch2_extent_update_i_size_sectors(trans, iter,
- min(new->k.p.offset << 9, new_i_size), 0) ?:
+ min(new->k.p.offset << 9, new_i_size), 0, &inode) ?:
+ (bch2_inode_opts_get_inode(c, &inode, &opts),
+ bch2_bkey_set_needs_rebalance(c, &opts, new)) ?:
bch2_trans_update(trans, iter, new,
BTREE_UPDATE_internal_snapshot_node);
}
diff --git a/fs/bcachefs/reflink.c b/fs/bcachefs/reflink.c
index 55ad8ab7a148..5e62eddf30ba 100644
--- a/fs/bcachefs/reflink.c
+++ b/fs/bcachefs/reflink.c
@@ -589,7 +589,6 @@ s64 bch2_remap_range(struct bch_fs *c,
struct bpos dst_start = POS(dst_inum.inum, dst_offset);
struct bpos src_start = POS(src_inum.inum, src_offset);
struct bpos dst_end = dst_start, src_end = src_start;
- struct bch_inode_opts opts;
struct bpos src_want;
u64 dst_done = 0;
u32 dst_snapshot, src_snapshot;
@@ -609,10 +608,6 @@ s64 bch2_remap_range(struct bch_fs *c,
bch2_bkey_buf_init(&new_src);
CLASS(btree_trans, trans)(c);
- ret = bch2_inum_opts_get(trans, src_inum, &opts);
- if (ret)
- goto err;
-
bch2_trans_iter_init(trans, &src_iter, BTREE_ID_extents, src_start,
BTREE_ITER_intent);
bch2_trans_iter_init(trans, &dst_iter, BTREE_ID_extents, dst_start,
@@ -709,11 +704,10 @@ s64 bch2_remap_range(struct bch_fs *c,
min(src_k.k->p.offset - src_want.offset,
dst_end.offset - dst_iter.pos.offset));
- ret = bch2_bkey_set_needs_rebalance(c, &opts, new_dst.k) ?:
- bch2_extent_update(trans, dst_inum, &dst_iter,
- new_dst.k, &disk_res,
- new_i_size, i_sectors_delta,
- true);
+ ret = bch2_extent_update(trans, dst_inum, &dst_iter,
+ new_dst.k, &disk_res,
+ new_i_size, i_sectors_delta,
+ true);
bch2_disk_reservation_put(c, &disk_res);
}
bch2_trans_iter_exit(&dst_iter);
@@ -744,7 +738,7 @@ s64 bch2_remap_range(struct bch_fs *c,
bch2_trans_iter_exit(&inode_iter);
} while (bch2_err_matches(ret2, BCH_ERR_transaction_restart));
-err:
+
bch2_bkey_buf_exit(&new_src, c);
bch2_bkey_buf_exit(&new_dst, c);
--
2.50.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 05/21] bcachefs: Plumb bch_inode_opts.change_cookie
2025-08-24 12:37 [PATCH 00/21] Big rebalance changes Kent Overstreet
` (3 preceding siblings ...)
2025-08-24 12:37 ` [PATCH 04/21] bcachefs: Transactional consistency for set_needs_rebalance Kent Overstreet
@ 2025-08-24 12:37 ` Kent Overstreet
2025-08-24 12:37 ` [PATCH 06/21] bcachefs: enum set_needs_rebalance_ctx Kent Overstreet
` (15 subsequent siblings)
20 siblings, 0 replies; 22+ messages in thread
From: Kent Overstreet @ 2025-08-24 12:37 UTC (permalink / raw)
To: linux-bcachefs; +Cc: Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
---
fs/bcachefs/data_update.c | 3 ++-
fs/bcachefs/inode.c | 2 +-
fs/bcachefs/io_misc.c | 4 ++--
fs/bcachefs/io_write.c | 11 +++++++----
fs/bcachefs/io_write.h | 2 +-
fs/bcachefs/opts.c | 2 +-
fs/bcachefs/opts.h | 2 +-
fs/bcachefs/rebalance.c | 4 ++--
fs/bcachefs/rebalance.h | 2 +-
fs/bcachefs/reflink.c | 2 +-
10 files changed, 19 insertions(+), 15 deletions(-)
diff --git a/fs/bcachefs/data_update.c b/fs/bcachefs/data_update.c
index 43d318ff488e..b62d890003ec 100644
--- a/fs/bcachefs/data_update.c
+++ b/fs/bcachefs/data_update.c
@@ -438,7 +438,8 @@ static int __bch2_data_update_index_update(struct btree_trans *trans,
bch2_insert_snapshot_whiteouts(trans, m->btree_id,
k.k->p, insert->k.p) ?:
bch2_inum_snapshot_opts_get(trans, k.k->p.inode, k.k->p.snapshot, &opts) ?:
- bch2_bkey_set_needs_rebalance(c, &opts, insert) ?:
+ bch2_bkey_set_needs_rebalance(c, &opts, insert,
+ m->op.opts.change_cookie) ?:
bch2_trans_update(trans, &iter, insert,
BTREE_UPDATE_internal_snapshot_node);
if (ret)
diff --git a/fs/bcachefs/inode.c b/fs/bcachefs/inode.c
index d1ec33edcc0b..c222fb8a7d07 100644
--- a/fs/bcachefs/inode.c
+++ b/fs/bcachefs/inode.c
@@ -1239,7 +1239,7 @@ void bch2_inode_opts_get_inode(struct bch_fs *c,
BCH_INODE_OPTS()
#undef x
- ret->opt_change_cookie = atomic_read(&c->opt_change_cookie);
+ ret->change_cookie = atomic_read(&c->opt_change_cookie);
bch2_io_opts_fixups(ret);
}
diff --git a/fs/bcachefs/io_misc.c b/fs/bcachefs/io_misc.c
index 6d204b980f76..04eb5ecd102b 100644
--- a/fs/bcachefs/io_misc.c
+++ b/fs/bcachefs/io_misc.c
@@ -109,7 +109,7 @@ int bch2_extent_fallocate(struct btree_trans *trans,
}
ret = bch2_extent_update(trans, inum, iter, new.k, &disk_res,
- 0, i_sectors_delta, true);
+ 0, i_sectors_delta, true, 0);
err:
if (!ret && sectors_allocated)
bch2_increment_clock(c, sectors_allocated, WRITE);
@@ -211,7 +211,7 @@ int bch2_fpunch_at(struct btree_trans *trans, struct btree_iter *iter,
bch2_cut_back(end_pos, &delete);
ret = bch2_extent_update(trans, inum, iter, &delete,
- &disk_res, 0, i_sectors_delta, false);
+ &disk_res, 0, i_sectors_delta, false, 0);
bch2_disk_reservation_put(c, &disk_res);
}
diff --git a/fs/bcachefs/io_write.c b/fs/bcachefs/io_write.c
index a0cb5d2dd0f8..0122d8b3292a 100644
--- a/fs/bcachefs/io_write.c
+++ b/fs/bcachefs/io_write.c
@@ -310,7 +310,8 @@ int bch2_extent_update(struct btree_trans *trans,
struct disk_reservation *disk_res,
u64 new_i_size,
s64 *i_sectors_delta_total,
- bool check_enospc)
+ bool check_enospc,
+ u32 change_cookie)
{
struct bch_fs *c = trans->c;
struct bpos next_pos;
@@ -364,7 +365,8 @@ int bch2_extent_update(struct btree_trans *trans,
min(k->k.p.offset << 9, new_i_size),
i_sectors_delta, &inode) ?:
(bch2_inode_opts_get_inode(c, &inode, &opts),
- bch2_bkey_set_needs_rebalance(c, &opts, k)) ?:
+ bch2_bkey_set_needs_rebalance(c, &opts, k,
+ change_cookie)) ?:
bch2_trans_update(trans, iter, k, 0) ?:
bch2_trans_commit(trans, disk_res, NULL,
BCH_TRANS_COMMIT_no_check_rw|
@@ -415,7 +417,8 @@ static int bch2_write_index_default(struct bch_write_op *op)
ret = bch2_extent_update(trans, inum, &iter, sk.k,
&op->res,
op->new_i_size, &op->i_sectors_delta,
- op->flags & BCH_WRITE_check_enospc);
+ op->flags & BCH_WRITE_check_enospc,
+ op->opts.change_cookie);
if (bch2_err_matches(ret, BCH_ERR_transaction_restart))
continue;
@@ -1267,7 +1270,7 @@ static int bch2_nocow_write_convert_one_unwritten(struct btree_trans *trans,
return bch2_extent_update_i_size_sectors(trans, iter,
min(new->k.p.offset << 9, new_i_size), 0, &inode) ?:
(bch2_inode_opts_get_inode(c, &inode, &opts),
- bch2_bkey_set_needs_rebalance(c, &opts, new)) ?:
+ bch2_bkey_set_needs_rebalance(c, &opts, new, op->opts.change_cookie)) ?:
bch2_trans_update(trans, iter, new,
BTREE_UPDATE_internal_snapshot_node);
}
diff --git a/fs/bcachefs/io_write.h b/fs/bcachefs/io_write.h
index 6c05ba6e15d6..692529bf401d 100644
--- a/fs/bcachefs/io_write.h
+++ b/fs/bcachefs/io_write.h
@@ -28,7 +28,7 @@ int bch2_sum_sector_overwrites(struct btree_trans *, struct btree_iter *,
struct bkey_i *, bool *, s64 *, s64 *);
int bch2_extent_update(struct btree_trans *, subvol_inum,
struct btree_iter *, struct bkey_i *,
- struct disk_reservation *, u64, s64 *, bool);
+ struct disk_reservation *, u64, s64 *, bool, u32);
static inline void bch2_write_op_init(struct bch_write_op *op, struct bch_fs *c,
struct bch_inode_opts opts)
diff --git a/fs/bcachefs/opts.c b/fs/bcachefs/opts.c
index 16d210cbc849..9f5684ec056a 100644
--- a/fs/bcachefs/opts.c
+++ b/fs/bcachefs/opts.c
@@ -812,7 +812,7 @@ void bch2_inode_opts_get(struct bch_fs *c, struct bch_inode_opts *ret)
BCH_INODE_OPTS()
#undef x
- ret->opt_change_cookie = atomic_read(&c->opt_change_cookie);
+ ret->change_cookie = atomic_read(&c->opt_change_cookie);
bch2_io_opts_fixups(ret);
}
diff --git a/fs/bcachefs/opts.h b/fs/bcachefs/opts.h
index 2425ba247201..f0f3483c1aab 100644
--- a/fs/bcachefs/opts.h
+++ b/fs/bcachefs/opts.h
@@ -679,7 +679,7 @@ struct bch_inode_opts {
BCH_INODE_OPTS()
#undef x
- u32 opt_change_cookie;
+ u32 change_cookie;
};
static inline void bch2_io_opts_fixups(struct bch_inode_opts *opts)
diff --git a/fs/bcachefs/rebalance.c b/fs/bcachefs/rebalance.c
index 9590c57798c6..d8e214e6f671 100644
--- a/fs/bcachefs/rebalance.c
+++ b/fs/bcachefs/rebalance.c
@@ -162,7 +162,7 @@ static bool bch2_bkey_rebalance_needs_update(struct bch_fs *c, struct bch_inode_
}
int bch2_bkey_set_needs_rebalance(struct bch_fs *c, struct bch_inode_opts *opts,
- struct bkey_i *_k)
+ struct bkey_i *_k, u32 change_cookie)
{
if (!bkey_extent_is_direct_data(&_k->k))
return 0;
@@ -218,7 +218,7 @@ int bch2_get_update_rebalance_opts(struct btree_trans *trans,
/* On successfull transaction commit, @k was invalidated: */
- return bch2_bkey_set_needs_rebalance(trans->c, io_opts, n) ?:
+ return bch2_bkey_set_needs_rebalance(trans->c, io_opts, n, 0) ?:
bch2_trans_update(trans, iter, n, BTREE_UPDATE_internal_snapshot_node) ?:
bch2_trans_commit(trans, NULL, NULL, 0) ?:
bch_err_throw(trans->c, transaction_restart_nested);
diff --git a/fs/bcachefs/rebalance.h b/fs/bcachefs/rebalance.h
index 9f2839ddb60e..62b7f0b3aec7 100644
--- a/fs/bcachefs/rebalance.h
+++ b/fs/bcachefs/rebalance.h
@@ -27,7 +27,7 @@ static inline struct bch_extent_rebalance io_opts_to_rebalance_opts(struct bch_f
};
u64 bch2_bkey_sectors_need_rebalance(struct bch_fs *, struct bkey_s_c);
-int bch2_bkey_set_needs_rebalance(struct bch_fs *, struct bch_inode_opts *, struct bkey_i *);
+int bch2_bkey_set_needs_rebalance(struct bch_fs *, struct bch_inode_opts *, struct bkey_i *, u32);
int bch2_get_update_rebalance_opts(struct btree_trans *,
struct bch_inode_opts *,
struct btree_iter *,
diff --git a/fs/bcachefs/reflink.c b/fs/bcachefs/reflink.c
index 5e62eddf30ba..d54468fdcb18 100644
--- a/fs/bcachefs/reflink.c
+++ b/fs/bcachefs/reflink.c
@@ -707,7 +707,7 @@ s64 bch2_remap_range(struct bch_fs *c,
ret = bch2_extent_update(trans, dst_inum, &dst_iter,
new_dst.k, &disk_res,
new_i_size, i_sectors_delta,
- true);
+ true, 0);
bch2_disk_reservation_put(c, &disk_res);
}
bch2_trans_iter_exit(&dst_iter);
--
2.50.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 06/21] bcachefs: enum set_needs_rebalance_ctx
2025-08-24 12:37 [PATCH 00/21] Big rebalance changes Kent Overstreet
` (4 preceding siblings ...)
2025-08-24 12:37 ` [PATCH 05/21] bcachefs: Plumb bch_inode_opts.change_cookie Kent Overstreet
@ 2025-08-24 12:37 ` Kent Overstreet
2025-08-24 12:37 ` [PATCH 07/21] bcachefs: do_rebalance_scan() now responsible for indirect extents Kent Overstreet
` (14 subsequent siblings)
20 siblings, 0 replies; 22+ messages in thread
From: Kent Overstreet @ 2025-08-24 12:37 UTC (permalink / raw)
To: linux-bcachefs; +Cc: Kent Overstreet
Define why we're updating rebalance options, so we know what changes are
allowed.
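For illustration - a rough sketch, not part of the patch, using the
signatures added here (variable names are placeholders; the opt_change
context only gets wired up by the later scan patches in the series) -
the two ends of the spectrum look roughly like:

	/* foreground write path: applying io opts to a newly written extent */
	ret = bch2_bkey_set_needs_rebalance(c, &opts, insert,
					    SET_NEEDS_REBALANCE_foreground,
					    change_cookie);

	/* propagating an option change to an existing extent */
	ret = bch2_get_update_rebalance_opts(trans, &opts, &iter, k,
					     SET_NEEDS_REBALANCE_opt_change);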
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
---
fs/bcachefs/data_update.c | 1 +
fs/bcachefs/io_write.c | 5 ++++-
fs/bcachefs/move.c | 6 ++++--
fs/bcachefs/rebalance.c | 9 ++++++---
fs/bcachefs/rebalance.h | 19 ++++++++++++++-----
5 files changed, 29 insertions(+), 11 deletions(-)
diff --git a/fs/bcachefs/data_update.c b/fs/bcachefs/data_update.c
index b62d890003ec..e932ee5488da 100644
--- a/fs/bcachefs/data_update.c
+++ b/fs/bcachefs/data_update.c
@@ -439,6 +439,7 @@ static int __bch2_data_update_index_update(struct btree_trans *trans,
k.k->p, insert->k.p) ?:
bch2_inum_snapshot_opts_get(trans, k.k->p.inode, k.k->p.snapshot, &opts) ?:
bch2_bkey_set_needs_rebalance(c, &opts, insert,
+ SET_NEEDS_REBALANCE_foreground,
m->op.opts.change_cookie) ?:
bch2_trans_update(trans, &iter, insert,
BTREE_UPDATE_internal_snapshot_node);
diff --git a/fs/bcachefs/io_write.c b/fs/bcachefs/io_write.c
index 0122d8b3292a..16bcdada8cf1 100644
--- a/fs/bcachefs/io_write.c
+++ b/fs/bcachefs/io_write.c
@@ -366,6 +366,7 @@ int bch2_extent_update(struct btree_trans *trans,
i_sectors_delta, &inode) ?:
(bch2_inode_opts_get_inode(c, &inode, &opts),
bch2_bkey_set_needs_rebalance(c, &opts, k,
+ SET_NEEDS_REBALANCE_foreground,
change_cookie)) ?:
bch2_trans_update(trans, iter, k, 0) ?:
bch2_trans_commit(trans, disk_res, NULL,
@@ -1270,7 +1271,9 @@ static int bch2_nocow_write_convert_one_unwritten(struct btree_trans *trans,
return bch2_extent_update_i_size_sectors(trans, iter,
min(new->k.p.offset << 9, new_i_size), 0, &inode) ?:
(bch2_inode_opts_get_inode(c, &inode, &opts),
- bch2_bkey_set_needs_rebalance(c, &opts, new, op->opts.change_cookie)) ?:
+ bch2_bkey_set_needs_rebalance(c, &opts, new,
+ SET_NEEDS_REBALANCE_foreground,
+ op->opts.change_cookie)) ?:
bch2_trans_update(trans, iter, new,
BTREE_UPDATE_internal_snapshot_node);
}
diff --git a/fs/bcachefs/move.c b/fs/bcachefs/move.c
index 03b3060f1964..e96443e67b29 100644
--- a/fs/bcachefs/move.c
+++ b/fs/bcachefs/move.c
@@ -499,7 +499,8 @@ struct bch_inode_opts *bch2_move_get_io_opts(struct btree_trans *trans,
break;
}
out:
- ret = bch2_get_update_rebalance_opts(trans, opts_ret, extent_iter, extent_k);
+ ret = bch2_get_update_rebalance_opts(trans, opts_ret, extent_iter, extent_k,
+ SET_NEEDS_REBALANCE_other);
if (ret)
return ERR_PTR(ret);
return opts_ret;
@@ -531,7 +532,8 @@ int bch2_move_get_io_opts_one(struct btree_trans *trans,
}
}
- return bch2_get_update_rebalance_opts(trans, io_opts, extent_iter, extent_k);
+ return bch2_get_update_rebalance_opts(trans, io_opts, extent_iter, extent_k,
+ SET_NEEDS_REBALANCE_other);
}
int bch2_move_ratelimit(struct moving_context *ctxt)
diff --git a/fs/bcachefs/rebalance.c b/fs/bcachefs/rebalance.c
index d8e214e6f671..a9d772642b3f 100644
--- a/fs/bcachefs/rebalance.c
+++ b/fs/bcachefs/rebalance.c
@@ -162,7 +162,9 @@ static bool bch2_bkey_rebalance_needs_update(struct bch_fs *c, struct bch_inode_
}
int bch2_bkey_set_needs_rebalance(struct bch_fs *c, struct bch_inode_opts *opts,
- struct bkey_i *_k, u32 change_cookie)
+ struct bkey_i *_k,
+ enum set_needs_rebalance_ctx ctx,
+ u32 change_cookie)
{
if (!bkey_extent_is_direct_data(&_k->k))
return 0;
@@ -189,7 +191,8 @@ int bch2_bkey_set_needs_rebalance(struct bch_fs *c, struct bch_inode_opts *opts,
int bch2_get_update_rebalance_opts(struct btree_trans *trans,
struct bch_inode_opts *io_opts,
struct btree_iter *iter,
- struct bkey_s_c k)
+ struct bkey_s_c k,
+ enum set_needs_rebalance_ctx ctx)
{
BUG_ON(iter->flags & BTREE_ITER_is_extents);
BUG_ON(iter->flags & BTREE_ITER_filter_snapshots);
@@ -218,7 +221,7 @@ int bch2_get_update_rebalance_opts(struct btree_trans *trans,
/* On successfull transaction commit, @k was invalidated: */
- return bch2_bkey_set_needs_rebalance(trans->c, io_opts, n, 0) ?:
+ return bch2_bkey_set_needs_rebalance(trans->c, io_opts, n, ctx, 0) ?:
bch2_trans_update(trans, iter, n, BTREE_UPDATE_internal_snapshot_node) ?:
bch2_trans_commit(trans, NULL, NULL, 0) ?:
bch_err_throw(trans->c, transaction_restart_nested);
diff --git a/fs/bcachefs/rebalance.h b/fs/bcachefs/rebalance.h
index 62b7f0b3aec7..1e1d6818e7d4 100644
--- a/fs/bcachefs/rebalance.h
+++ b/fs/bcachefs/rebalance.h
@@ -27,11 +27,20 @@ static inline struct bch_extent_rebalance io_opts_to_rebalance_opts(struct bch_f
};
u64 bch2_bkey_sectors_need_rebalance(struct bch_fs *, struct bkey_s_c);
-int bch2_bkey_set_needs_rebalance(struct bch_fs *, struct bch_inode_opts *, struct bkey_i *, u32);
-int bch2_get_update_rebalance_opts(struct btree_trans *,
- struct bch_inode_opts *,
- struct btree_iter *,
- struct bkey_s_c);
+
+enum set_needs_rebalance_ctx {
+ SET_NEEDS_REBALANCE_opt_change,
+ SET_NEEDS_REBALANCE_opt_change_indirect,
+ SET_NEEDS_REBALANCE_foreground,
+ SET_NEEDS_REBALANCE_other,
+};
+
+int bch2_bkey_set_needs_rebalance(struct bch_fs *, struct bch_inode_opts *,
+ struct bkey_i *, enum set_needs_rebalance_ctx, u32);
+
+int bch2_get_update_rebalance_opts(struct btree_trans *, struct bch_inode_opts *,
+ struct btree_iter *, struct bkey_s_c,
+ enum set_needs_rebalance_ctx);
int bch2_set_rebalance_needs_scan_trans(struct btree_trans *, u64);
int bch2_set_rebalance_needs_scan(struct bch_fs *, u64 inum);
--
2.50.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 07/21] bcachefs: do_rebalance_scan() now responsible for indirect extents
2025-08-24 12:37 [PATCH 00/21] Big rebalance changes Kent Overstreet
` (5 preceding siblings ...)
2025-08-24 12:37 ` [PATCH 06/21] bcachefs: enum set_needs_rebalance_ctx Kent Overstreet
@ 2025-08-24 12:37 ` Kent Overstreet
2025-08-24 12:37 ` [PATCH 08/21] bcachefs: Rename, split out bch2_extent_get_io_opts() Kent Overstreet
` (13 subsequent siblings)
20 siblings, 0 replies; 22+ messages in thread
From: Kent Overstreet @ 2025-08-24 12:37 UTC (permalink / raw)
To: linux-bcachefs; +Cc: Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
---
fs/bcachefs/move.c | 69 ++---------------------------------------
fs/bcachefs/rebalance.c | 32 +++++++++++++++++--
2 files changed, 33 insertions(+), 68 deletions(-)
diff --git a/fs/bcachefs/move.c b/fs/bcachefs/move.c
index e96443e67b29..5adeca883ecd 100644
--- a/fs/bcachefs/move.c
+++ b/fs/bcachefs/move.c
@@ -580,37 +580,6 @@ int bch2_move_ratelimit(struct moving_context *ctxt)
return 0;
}
-/*
- * Move requires non extents iterators, and there's also no need for it to
- * signal indirect_extent_missing_error:
- */
-static struct bkey_s_c bch2_lookup_indirect_extent_for_move(struct btree_trans *trans,
- struct btree_iter *iter,
- struct bkey_s_c_reflink_p p)
-{
- if (unlikely(REFLINK_P_ERROR(p.v)))
- return bkey_s_c_null;
-
- struct bpos reflink_pos = POS(0, REFLINK_P_IDX(p.v));
-
- bch2_trans_iter_init(trans, iter,
- BTREE_ID_reflink, reflink_pos,
- BTREE_ITER_not_extents);
-
- struct bkey_s_c k = bch2_btree_iter_peek(iter);
- if (!k.k || bkey_err(k)) {
- bch2_trans_iter_exit(iter);
- return k;
- }
-
- if (bkey_lt(reflink_pos, bkey_start_pos(k.k))) {
- bch2_trans_iter_exit(iter);
- return bkey_s_c_null;
- }
-
- return k;
-}
-
int bch2_move_data_btree(struct moving_context *ctxt,
struct bpos start,
struct bpos end,
@@ -625,12 +594,6 @@ int bch2_move_data_btree(struct moving_context *ctxt,
struct btree_iter iter, reflink_iter = {};
struct bkey_s_c k;
struct data_update_opts data_opts;
- /*
- * If we're moving a single file, also process reflinked data it points
- * to (this includes propagating changed io_opts from the inode to the
- * extent):
- */
- bool walk_indirect = start.inode == end.inode;
int ret = 0, ret2;
per_snapshot_io_opts_init(&snapshot_io_opts, c);
@@ -695,8 +658,6 @@ int bch2_move_data_btree(struct moving_context *ctxt,
bch2_ratelimit_reset(ctxt->rate);
while (!bch2_move_ratelimit(ctxt)) {
- struct btree_iter *extent_iter = &iter;
-
bch2_trans_begin(trans);
k = bch2_btree_iter_peek(&iter);
@@ -715,41 +676,17 @@ int bch2_move_data_btree(struct moving_context *ctxt,
if (ctxt->stats)
ctxt->stats->pos = BBPOS(iter.btree_id, iter.pos);
- if (walk_indirect &&
- k.k->type == KEY_TYPE_reflink_p &&
- REFLINK_P_MAY_UPDATE_OPTIONS(bkey_s_c_to_reflink_p(k).v)) {
- struct bkey_s_c_reflink_p p = bkey_s_c_to_reflink_p(k);
-
- bch2_trans_iter_exit(&reflink_iter);
- k = bch2_lookup_indirect_extent_for_move(trans, &reflink_iter, p);
- ret = bkey_err(k);
- if (bch2_err_matches(ret, BCH_ERR_transaction_restart))
- continue;
- if (ret)
- break;
-
- if (!k.k)
- goto next_nondata;
-
- /*
- * XXX: reflink pointers may point to multiple indirect
- * extents, so don't advance past the entire reflink
- * pointer - need to fixup iter->k
- */
- extent_iter = &reflink_iter;
- }
-
if (!bkey_extent_is_direct_data(k.k))
goto next_nondata;
io_opts = bch2_move_get_io_opts(trans, &snapshot_io_opts,
- iter.pos, extent_iter, k);
+ iter.pos, &iter, k);
ret = PTR_ERR_OR_ZERO(io_opts);
if (ret)
continue;
memset(&data_opts, 0, sizeof(data_opts));
- if (!pred(c, arg, extent_iter->btree_id, k, io_opts, &data_opts))
+ if (!pred(c, arg, iter.btree_id, k, io_opts, &data_opts))
goto next;
/*
@@ -760,7 +697,7 @@ int bch2_move_data_btree(struct moving_context *ctxt,
k = bkey_i_to_s_c(sk.k);
if (!level)
- ret2 = bch2_move_extent(ctxt, NULL, extent_iter, k, *io_opts, data_opts);
+ ret2 = bch2_move_extent(ctxt, NULL, &iter, k, *io_opts, data_opts);
else if (!data_opts.scrub)
ret2 = bch2_btree_node_rewrite_pos(trans, btree_id, level,
k.k->p, data_opts.target, 0);
diff --git a/fs/bcachefs/rebalance.c b/fs/bcachefs/rebalance.c
index a9d772642b3f..2bb634ee0f48 100644
--- a/fs/bcachefs/rebalance.c
+++ b/fs/bcachefs/rebalance.c
@@ -482,6 +482,29 @@ static int do_rebalance_extent(struct moving_context *ctxt,
return ret;
}
+static int do_rebalance_scan_indirect(struct btree_trans *trans,
+ struct bkey_s_c_reflink_p p,
+ struct bch_inode_opts *opts)
+{
+ u64 idx = REFLINK_P_IDX(p.v) - le32_to_cpu(p.v->front_pad);
+ u64 end = REFLINK_P_IDX(p.v) + p.k->size + le32_to_cpu(p.v->back_pad);
+ u32 restart_count = trans->restart_count;
+
+ int ret = for_each_btree_key(trans, iter, BTREE_ID_reflink,
+ POS(0, idx), BTREE_ITER_not_extents, k, ({
+ if (bpos_ge(bkey_start_pos(k.k), POS(0, end)))
+ break;
+ bch2_get_update_rebalance_opts(trans, opts, &iter, k,
+ SET_NEEDS_REBALANCE_opt_change_indirect);
+ }));
+ if (ret)
+ return ret;
+
+ /* suppress trans_was_restarted() check */
+ trans->restart_count = restart_count;
+ return 0;
+}
+
static int do_rebalance_scan(struct moving_context *ctxt,
u64 inum, u64 cookie, u64 *sectors_scanned)
{
@@ -511,9 +534,14 @@ static int do_rebalance_scan(struct moving_context *ctxt,
BTREE_ITER_prefetch, k, ({
ctxt->stats->pos = BBPOS(iter.btree_id, iter.pos);
- struct bch_inode_opts *io_opts = bch2_move_get_io_opts(trans,
+ struct bch_inode_opts *opts = bch2_move_get_io_opts(trans,
&snapshot_io_opts, iter.pos, &iter, k);
- PTR_ERR_OR_ZERO(io_opts);
+ PTR_ERR_OR_ZERO(opts) ?:
+ (inum &&
+ k.k->type == KEY_TYPE_reflink_p &&
+ REFLINK_P_MAY_UPDATE_OPTIONS(bkey_s_c_to_reflink_p(k).v)
+ ? do_rebalance_scan_indirect(trans, bkey_s_c_to_reflink_p(k), opts)
+ : 0);
})) ?:
commit_do(trans, NULL, NULL, BCH_TRANS_COMMIT_no_enospc,
bch2_clear_rebalance_needs_scan(trans, inum, cookie));
--
2.50.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 08/21] bcachefs: Rename, split out bch2_extent_get_io_opts()
2025-08-24 12:37 [PATCH 00/21] Big rebalance changes Kent Overstreet
` (6 preceding siblings ...)
2025-08-24 12:37 ` [PATCH 07/21] bcachefs: do_rebalance_scan() now responsible for indirect extents Kent Overstreet
@ 2025-08-24 12:37 ` Kent Overstreet
2025-08-24 12:37 ` [PATCH 09/21] bcachefs: do_rebalance_extent() uses bch2_extent_get_apply_io_opts() Kent Overstreet
` (12 subsequent siblings)
20 siblings, 0 replies; 22+ messages in thread
From: Kent Overstreet @ 2025-08-24 12:37 UTC (permalink / raw)
To: linux-bcachefs; +Cc: Kent Overstreet
Move this code to rebalance.c, where it more properly belongs, and split
out getting an extent's io options from re-propagating them back to the
extent - prep work for new fsck code.
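Both helpers end up static in rebalance.c; externally,
bch2_extent_get_apply_io_opts()/bch2_extent_get_io_opts_one() wrap the
two steps. A sketch of what the combined helper does internally - a
caller that only needs to inspect options (e.g. the upcoming fsck code)
can stop after step 1:

	/* 1) look up the io options governing this extent: */
	struct bch_inode_opts *opts =
		bch2_extent_get_io_opts(trans, snapshot_io_opts,
					extent_pos, extent_iter, extent_k);
	int ret = PTR_ERR_OR_ZERO(opts);

	/* 2) separately, re-propagate them back onto the extent if stale: */
	if (!ret)
		ret = bch2_get_update_rebalance_opts(trans, opts, extent_iter,
						     extent_k, SET_NEEDS_REBALANCE_other);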
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
---
fs/bcachefs/move.c | 97 ++------------------------------
fs/bcachefs/move.h | 30 ----------
fs/bcachefs/rebalance.c | 120 +++++++++++++++++++++++++++++++++++++---
fs/bcachefs/rebalance.h | 34 +++++++++++-
4 files changed, 147 insertions(+), 134 deletions(-)
diff --git a/fs/bcachefs/move.c b/fs/bcachefs/move.c
index 5adeca883ecd..62aeb54ef11b 100644
--- a/fs/bcachefs/move.c
+++ b/fs/bcachefs/move.c
@@ -447,95 +447,6 @@ int bch2_move_extent(struct moving_context *ctxt,
return ret;
}
-struct bch_inode_opts *bch2_move_get_io_opts(struct btree_trans *trans,
- struct per_snapshot_io_opts *io_opts,
- struct bpos extent_pos, /* extent_iter, extent_k may be in reflink btree */
- struct btree_iter *extent_iter,
- struct bkey_s_c extent_k)
-{
- struct bch_fs *c = trans->c;
- u32 restart_count = trans->restart_count;
- struct bch_inode_opts *opts_ret = &io_opts->fs_io_opts;
- int ret = 0;
-
- if (btree_iter_path(trans, extent_iter)->level)
- return opts_ret;
-
- if (extent_k.k->type == KEY_TYPE_reflink_v)
- goto out;
-
- if (io_opts->cur_inum != extent_pos.inode) {
- io_opts->d.nr = 0;
-
- ret = for_each_btree_key(trans, iter, BTREE_ID_inodes, POS(0, extent_pos.inode),
- BTREE_ITER_all_snapshots, k, ({
- if (k.k->p.offset != extent_pos.inode)
- break;
-
- if (!bkey_is_inode(k.k))
- continue;
-
- struct bch_inode_unpacked inode;
- _ret3 = bch2_inode_unpack(k, &inode);
- if (_ret3)
- break;
-
- struct snapshot_io_opts_entry e = { .snapshot = k.k->p.snapshot };
- bch2_inode_opts_get_inode(trans->c, &inode, &e.io_opts);
-
- darray_push(&io_opts->d, e);
- }));
- io_opts->cur_inum = extent_pos.inode;
- }
-
- ret = ret ?: trans_was_restarted(trans, restart_count);
- if (ret)
- return ERR_PTR(ret);
-
- if (extent_k.k->p.snapshot)
- darray_for_each(io_opts->d, i)
- if (bch2_snapshot_is_ancestor(c, extent_k.k->p.snapshot, i->snapshot)) {
- opts_ret = &i->io_opts;
- break;
- }
-out:
- ret = bch2_get_update_rebalance_opts(trans, opts_ret, extent_iter, extent_k,
- SET_NEEDS_REBALANCE_other);
- if (ret)
- return ERR_PTR(ret);
- return opts_ret;
-}
-
-int bch2_move_get_io_opts_one(struct btree_trans *trans,
- struct bch_inode_opts *io_opts,
- struct btree_iter *extent_iter,
- struct bkey_s_c extent_k)
-{
- struct bch_fs *c = trans->c;
-
- bch2_inode_opts_get(c, io_opts);
-
- /* reflink btree? */
- if (extent_k.k->p.inode) {
- CLASS(btree_iter, inode_iter)(trans, BTREE_ID_inodes,
- SPOS(0, extent_k.k->p.inode, extent_k.k->p.snapshot),
- BTREE_ITER_cached);
- struct bkey_s_c inode_k = bch2_btree_iter_peek_slot(&inode_iter);
- int ret = bkey_err(inode_k);
- if (bch2_err_matches(ret, BCH_ERR_transaction_restart))
- return ret;
-
- if (!ret && bkey_is_inode(inode_k.k)) {
- struct bch_inode_unpacked inode;
- bch2_inode_unpack(inode_k, &inode);
- bch2_inode_opts_get_inode(c, &inode, io_opts);
- }
- }
-
- return bch2_get_update_rebalance_opts(trans, io_opts, extent_iter, extent_k,
- SET_NEEDS_REBALANCE_other);
-}
-
int bch2_move_ratelimit(struct moving_context *ctxt)
{
struct bch_fs *c = ctxt->trans->c;
@@ -679,8 +590,9 @@ int bch2_move_data_btree(struct moving_context *ctxt,
if (!bkey_extent_is_direct_data(k.k))
goto next_nondata;
- io_opts = bch2_move_get_io_opts(trans, &snapshot_io_opts,
- iter.pos, &iter, k);
+ io_opts = bch2_extent_get_apply_io_opts(trans, &snapshot_io_opts,
+ iter.pos, &iter, k,
+ SET_NEEDS_REBALANCE_other);
ret = PTR_ERR_OR_ZERO(io_opts);
if (ret)
continue;
@@ -878,7 +790,8 @@ static int __bch2_move_data_phys(struct moving_context *ctxt,
goto next;
if (!bp.v->level) {
- ret = bch2_move_get_io_opts_one(trans, &io_opts, &iter, k);
+ ret = bch2_extent_get_io_opts_one(trans, &io_opts, &iter, k,
+ SET_NEEDS_REBALANCE_other);
if (ret) {
bch2_trans_iter_exit(&iter);
continue;
diff --git a/fs/bcachefs/move.h b/fs/bcachefs/move.h
index 18021d2c51d0..754b0ad45950 100644
--- a/fs/bcachefs/move.h
+++ b/fs/bcachefs/move.h
@@ -87,32 +87,6 @@ void bch2_moving_ctxt_flush_all(struct moving_context *);
void bch2_move_ctxt_wait_for_io(struct moving_context *);
int bch2_move_ratelimit(struct moving_context *);
-/* Inodes in different snapshots may have different IO options: */
-struct snapshot_io_opts_entry {
- u32 snapshot;
- struct bch_inode_opts io_opts;
-};
-
-struct per_snapshot_io_opts {
- u64 cur_inum;
- struct bch_inode_opts fs_io_opts;
- DARRAY(struct snapshot_io_opts_entry) d;
-};
-
-static inline void per_snapshot_io_opts_init(struct per_snapshot_io_opts *io_opts, struct bch_fs *c)
-{
- memset(io_opts, 0, sizeof(*io_opts));
- bch2_inode_opts_get(c, &io_opts->fs_io_opts);
-}
-
-static inline void per_snapshot_io_opts_exit(struct per_snapshot_io_opts *io_opts)
-{
- darray_exit(&io_opts->d);
-}
-
-int bch2_move_get_io_opts_one(struct btree_trans *, struct bch_inode_opts *,
- struct btree_iter *, struct bkey_s_c);
-
int bch2_scan_old_btree_nodes(struct bch_fs *, struct bch_move_stats *);
int bch2_move_extent(struct moving_context *,
@@ -122,10 +96,6 @@ int bch2_move_extent(struct moving_context *,
struct bch_inode_opts,
struct data_update_opts);
-struct bch_inode_opts *bch2_move_get_io_opts(struct btree_trans *,
- struct per_snapshot_io_opts *, struct bpos,
- struct btree_iter *, struct bkey_s_c);
-
int bch2_move_data_btree(struct moving_context *, struct bpos, struct bpos,
move_pred_fn, void *, enum btree_id, unsigned);
diff --git a/fs/bcachefs/rebalance.c b/fs/bcachefs/rebalance.c
index 2bb634ee0f48..d34a287cace4 100644
--- a/fs/bcachefs/rebalance.c
+++ b/fs/bcachefs/rebalance.c
@@ -188,11 +188,11 @@ int bch2_bkey_set_needs_rebalance(struct bch_fs *c, struct bch_inode_opts *opts,
return 0;
}
-int bch2_get_update_rebalance_opts(struct btree_trans *trans,
- struct bch_inode_opts *io_opts,
- struct btree_iter *iter,
- struct bkey_s_c k,
- enum set_needs_rebalance_ctx ctx)
+static int bch2_get_update_rebalance_opts(struct btree_trans *trans,
+ struct bch_inode_opts *io_opts,
+ struct btree_iter *iter,
+ struct bkey_s_c k,
+ enum set_needs_rebalance_ctx ctx)
{
BUG_ON(iter->flags & BTREE_ITER_is_extents);
BUG_ON(iter->flags & BTREE_ITER_filter_snapshots);
@@ -224,7 +224,107 @@ int bch2_get_update_rebalance_opts(struct btree_trans *trans,
return bch2_bkey_set_needs_rebalance(trans->c, io_opts, n, ctx, 0) ?:
bch2_trans_update(trans, iter, n, BTREE_UPDATE_internal_snapshot_node) ?:
bch2_trans_commit(trans, NULL, NULL, 0) ?:
- bch_err_throw(trans->c, transaction_restart_nested);
+ bch_err_throw(trans->c, transaction_restart_commit);
+}
+
+static struct bch_inode_opts *bch2_extent_get_io_opts(struct btree_trans *trans,
+ struct per_snapshot_io_opts *io_opts,
+ struct bpos extent_pos, /* extent_iter, extent_k may be in reflink btree */
+ struct btree_iter *extent_iter,
+ struct bkey_s_c extent_k)
+{
+ struct bch_fs *c = trans->c;
+ u32 restart_count = trans->restart_count;
+ int ret = 0;
+
+ if (btree_iter_path(trans, extent_iter)->level)
+ return &io_opts->fs_io_opts;
+
+ if (extent_k.k->type == KEY_TYPE_reflink_v)
+ return &io_opts->fs_io_opts;
+
+ if (io_opts->cur_inum != extent_pos.inode) {
+ io_opts->d.nr = 0;
+
+ ret = for_each_btree_key(trans, iter, BTREE_ID_inodes, POS(0, extent_pos.inode),
+ BTREE_ITER_all_snapshots, k, ({
+ if (k.k->p.offset != extent_pos.inode)
+ break;
+
+ if (!bkey_is_inode(k.k))
+ continue;
+
+ struct bch_inode_unpacked inode;
+ _ret3 = bch2_inode_unpack(k, &inode);
+ if (_ret3)
+ break;
+
+ struct snapshot_io_opts_entry e = { .snapshot = k.k->p.snapshot };
+ bch2_inode_opts_get_inode(c, &inode, &e.io_opts);
+
+ darray_push(&io_opts->d, e);
+ }));
+ io_opts->cur_inum = extent_pos.inode;
+ }
+
+ ret = ret ?: trans_was_restarted(trans, restart_count);
+ if (ret)
+ return ERR_PTR(ret);
+
+ if (extent_k.k->p.snapshot)
+ darray_for_each(io_opts->d, i)
+ if (bch2_snapshot_is_ancestor(c, extent_k.k->p.snapshot, i->snapshot))
+ return &i->io_opts;
+
+ return &io_opts->fs_io_opts;
+}
+
+struct bch_inode_opts *bch2_extent_get_apply_io_opts(struct btree_trans *trans,
+ struct per_snapshot_io_opts *snapshot_io_opts,
+ struct bpos extent_pos, /* extent_iter, extent_k may be in reflink btree */
+ struct btree_iter *extent_iter,
+ struct bkey_s_c extent_k,
+ enum set_needs_rebalance_ctx ctx)
+{
+ struct bch_inode_opts *opts =
+ bch2_extent_get_io_opts(trans, snapshot_io_opts, extent_pos, extent_iter, extent_k);
+ if (IS_ERR(opts) || btree_iter_path(trans, extent_iter)->level)
+ return opts;
+
+ int ret = bch2_get_update_rebalance_opts(trans, opts, extent_iter, extent_k,
+ SET_NEEDS_REBALANCE_other);
+ return ret ? ERR_PTR(ret) : opts;
+}
+
+int bch2_extent_get_io_opts_one(struct btree_trans *trans,
+ struct bch_inode_opts *io_opts,
+ struct btree_iter *extent_iter,
+ struct bkey_s_c extent_k,
+ enum set_needs_rebalance_ctx ctx)
+{
+ struct bch_fs *c = trans->c;
+
+ bch2_inode_opts_get(c, io_opts);
+
+ /* reflink btree? */
+ if (extent_k.k->p.inode) {
+ CLASS(btree_iter, inode_iter)(trans, BTREE_ID_inodes,
+ SPOS(0, extent_k.k->p.inode, extent_k.k->p.snapshot),
+ BTREE_ITER_cached);
+ struct bkey_s_c inode_k = bch2_btree_iter_peek_slot(&inode_iter);
+ int ret = bkey_err(inode_k);
+ if (bch2_err_matches(ret, BCH_ERR_transaction_restart))
+ return ret;
+
+ if (!ret && bkey_is_inode(inode_k.k)) {
+ struct bch_inode_unpacked inode;
+ bch2_inode_unpack(inode_k, &inode);
+ bch2_inode_opts_get_inode(c, &inode, io_opts);
+ }
+ }
+
+ return bch2_get_update_rebalance_opts(trans, io_opts, extent_iter, extent_k,
+ ctx);
}
#define REBALANCE_WORK_SCAN_OFFSET (U64_MAX - 1)
@@ -373,7 +473,8 @@ static struct bkey_s_c next_rebalance_extent(struct btree_trans *trans,
if (bkey_err(k))
return k;
- int ret = bch2_move_get_io_opts_one(trans, io_opts, extent_iter, k);
+ int ret = bch2_extent_get_io_opts_one(trans, io_opts, extent_iter, k,
+ SET_NEEDS_REBALANCE_other);
if (ret)
return bkey_s_c_err(ret);
@@ -534,8 +635,9 @@ static int do_rebalance_scan(struct moving_context *ctxt,
BTREE_ITER_prefetch, k, ({
ctxt->stats->pos = BBPOS(iter.btree_id, iter.pos);
- struct bch_inode_opts *opts = bch2_move_get_io_opts(trans,
- &snapshot_io_opts, iter.pos, &iter, k);
+ struct bch_inode_opts *opts = bch2_extent_get_apply_io_opts(trans,
+ &snapshot_io_opts, iter.pos, &iter, k,
+ SET_NEEDS_REBALANCE_opt_change);
PTR_ERR_OR_ZERO(opts) ?:
(inum &&
k.k->type == KEY_TYPE_reflink_p &&
diff --git a/fs/bcachefs/rebalance.h b/fs/bcachefs/rebalance.h
index 1e1d6818e7d4..fd33e7aa2ecb 100644
--- a/fs/bcachefs/rebalance.h
+++ b/fs/bcachefs/rebalance.h
@@ -38,9 +38,37 @@ enum set_needs_rebalance_ctx {
int bch2_bkey_set_needs_rebalance(struct bch_fs *, struct bch_inode_opts *,
struct bkey_i *, enum set_needs_rebalance_ctx, u32);
-int bch2_get_update_rebalance_opts(struct btree_trans *, struct bch_inode_opts *,
- struct btree_iter *, struct bkey_s_c,
- enum set_needs_rebalance_ctx);
+/* Inodes in different snapshots may have different IO options: */
+struct snapshot_io_opts_entry {
+ u32 snapshot;
+ struct bch_inode_opts io_opts;
+};
+
+struct per_snapshot_io_opts {
+ u64 cur_inum;
+ struct bch_inode_opts fs_io_opts;
+ DARRAY(struct snapshot_io_opts_entry) d;
+};
+
+static inline void per_snapshot_io_opts_init(struct per_snapshot_io_opts *io_opts, struct bch_fs *c)
+{
+ memset(io_opts, 0, sizeof(*io_opts));
+ bch2_inode_opts_get(c, &io_opts->fs_io_opts);
+}
+
+static inline void per_snapshot_io_opts_exit(struct per_snapshot_io_opts *io_opts)
+{
+ darray_exit(&io_opts->d);
+}
+
+struct bch_inode_opts *bch2_extent_get_apply_io_opts(struct btree_trans *,
+ struct per_snapshot_io_opts *, struct bpos,
+ struct btree_iter *, struct bkey_s_c,
+ enum set_needs_rebalance_ctx);
+
+int bch2_extent_get_io_opts_one(struct btree_trans *, struct bch_inode_opts *,
+ struct btree_iter *, struct bkey_s_c,
+ enum set_needs_rebalance_ctx);
int bch2_set_rebalance_needs_scan_trans(struct btree_trans *, u64);
int bch2_set_rebalance_needs_scan(struct bch_fs *, u64 inum);
--
2.50.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 09/21] bcachefs: do_rebalance_extent() uses bch2_extent_get_apply_io_opts()
2025-08-24 12:37 [PATCH 00/21] Big rebalance changes Kent Overstreet
` (7 preceding siblings ...)
2025-08-24 12:37 ` [PATCH 08/21] bcachefs: Rename, split out bch2_extent_get_io_opts() Kent Overstreet
@ 2025-08-24 12:37 ` Kent Overstreet
2025-08-24 12:37 ` [PATCH 10/21] bcachefs: Correct propagation of io options to indirect extents Kent Overstreet
` (11 subsequent siblings)
20 siblings, 0 replies; 22+ messages in thread
From: Kent Overstreet @ 2025-08-24 12:37 UTC (permalink / raw)
To: linux-bcachefs; +Cc: Kent Overstreet
No reason for it to be using bch2_extent_get_io_opts_one(): iteration
over the rebalance_work btree proceeds in natural key order, so the
per-snapshot io_opts cache can be used instead.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
---
fs/bcachefs/rebalance.c | 51 ++++++++++++++++++++++++-----------------
1 file changed, 30 insertions(+), 21 deletions(-)
diff --git a/fs/bcachefs/rebalance.c b/fs/bcachefs/rebalance.c
index d34a287cace4..f292c93ddd4d 100644
--- a/fs/bcachefs/rebalance.c
+++ b/fs/bcachefs/rebalance.c
@@ -457,9 +457,10 @@ static int bch2_bkey_clear_needs_rebalance(struct btree_trans *trans,
}
static struct bkey_s_c next_rebalance_extent(struct btree_trans *trans,
+ struct per_snapshot_io_opts *snapshot_io_opts,
struct bpos work_pos,
struct btree_iter *extent_iter,
- struct bch_inode_opts *io_opts,
+ struct bch_inode_opts **opts_ret,
struct data_update_opts *data_opts)
{
struct bch_fs *c = trans->c;
@@ -473,14 +474,19 @@ static struct bkey_s_c next_rebalance_extent(struct btree_trans *trans,
if (bkey_err(k))
return k;
- int ret = bch2_extent_get_io_opts_one(trans, io_opts, extent_iter, k,
+ struct bch_inode_opts *opts =
+ bch2_extent_get_apply_io_opts(trans, snapshot_io_opts,
+ extent_iter->pos, extent_iter, k,
SET_NEEDS_REBALANCE_other);
+ int ret = PTR_ERR_OR_ZERO(opts);
if (ret)
return bkey_s_c_err(ret);
+ *opts_ret = opts;
+
memset(data_opts, 0, sizeof(*data_opts));
- data_opts->rewrite_ptrs = bch2_bkey_ptrs_need_rebalance(c, io_opts, k);
- data_opts->target = io_opts->background_target;
+ data_opts->rewrite_ptrs = bch2_bkey_ptrs_need_rebalance(c, opts, k);
+ data_opts->target = opts->background_target;
data_opts->write_flags |= BCH_WRITE_only_specified_devs;
if (!data_opts->rewrite_ptrs) {
@@ -505,19 +511,19 @@ static struct bkey_s_c next_rebalance_extent(struct btree_trans *trans,
struct bkey_ptrs_c ptrs = bch2_bkey_ptrs_c(k);
- unsigned p = bch2_bkey_ptrs_need_compress(c, io_opts, k, ptrs);
+ unsigned p = bch2_bkey_ptrs_need_compress(c, opts, k, ptrs);
if (p) {
prt_str(&buf, "compression=");
- bch2_compression_opt_to_text(&buf, io_opts->background_compression);
+ bch2_compression_opt_to_text(&buf, opts->background_compression);
prt_str(&buf, " ");
bch2_prt_u64_base2(&buf, p);
prt_newline(&buf);
}
- p = bch2_bkey_ptrs_need_move(c, io_opts, ptrs);
+ p = bch2_bkey_ptrs_need_move(c, opts, ptrs);
if (p) {
prt_str(&buf, "move=");
- bch2_target_to_text(&buf, c, io_opts->background_target);
+ bch2_target_to_text(&buf, c, opts->background_target);
prt_str(&buf, " ");
bch2_prt_u64_base2(&buf, p);
prt_newline(&buf);
@@ -532,6 +538,7 @@ static struct bkey_s_c next_rebalance_extent(struct btree_trans *trans,
noinline_for_stack
static int do_rebalance_extent(struct moving_context *ctxt,
+ struct per_snapshot_io_opts *snapshot_io_opts,
struct bpos work_pos,
struct btree_iter *extent_iter)
{
@@ -539,7 +546,7 @@ static int do_rebalance_extent(struct moving_context *ctxt,
struct bch_fs *c = trans->c;
struct bch_fs_rebalance *r = &trans->c->rebalance;
struct data_update_opts data_opts;
- struct bch_inode_opts io_opts;
+ struct bch_inode_opts *io_opts;
struct bkey_s_c k;
struct bkey_buf sk;
int ret;
@@ -550,8 +557,8 @@ static int do_rebalance_extent(struct moving_context *ctxt,
bch2_bkey_buf_init(&sk);
ret = lockrestart_do(trans,
- bkey_err(k = next_rebalance_extent(trans, work_pos,
- extent_iter, &io_opts, &data_opts)));
+ bkey_err(k = next_rebalance_extent(trans, snapshot_io_opts,
+ work_pos, extent_iter, &io_opts, &data_opts)));
if (ret || !k.k)
goto out;
@@ -564,7 +571,7 @@ static int do_rebalance_extent(struct moving_context *ctxt,
bch2_bkey_buf_reassemble(&sk, c, k);
k = bkey_i_to_s_c(sk.k);
- ret = bch2_move_extent(ctxt, NULL, extent_iter, k, io_opts, data_opts);
+ ret = bch2_move_extent(ctxt, NULL, extent_iter, k, *io_opts, data_opts);
if (ret) {
if (bch2_err_matches(ret, ENOMEM)) {
/* memory allocation failure, wait for some IO to finish */
@@ -607,6 +614,7 @@ static int do_rebalance_scan_indirect(struct btree_trans *trans,
}
static int do_rebalance_scan(struct moving_context *ctxt,
+ struct per_snapshot_io_opts *snapshot_io_opts,
u64 inum, u64 cookie, u64 *sectors_scanned)
{
struct btree_trans *trans = ctxt->trans;
@@ -626,9 +634,6 @@ static int do_rebalance_scan(struct moving_context *ctxt,
r->state = BCH_REBALANCE_scanning;
- struct per_snapshot_io_opts snapshot_io_opts;
- per_snapshot_io_opts_init(&snapshot_io_opts, c);
-
int ret = for_each_btree_key_max(trans, iter, BTREE_ID_extents,
r->scan_start.pos, r->scan_end.pos,
BTREE_ITER_all_snapshots|
@@ -636,7 +641,7 @@ static int do_rebalance_scan(struct moving_context *ctxt,
ctxt->stats->pos = BBPOS(iter.btree_id, iter.pos);
struct bch_inode_opts *opts = bch2_extent_get_apply_io_opts(trans,
- &snapshot_io_opts, iter.pos, &iter, k,
+ snapshot_io_opts, iter.pos, &iter, k,
SET_NEEDS_REBALANCE_opt_change);
PTR_ERR_OR_ZERO(opts) ?:
(inum &&
@@ -648,16 +653,14 @@ static int do_rebalance_scan(struct moving_context *ctxt,
commit_do(trans, NULL, NULL, BCH_TRANS_COMMIT_no_enospc,
bch2_clear_rebalance_needs_scan(trans, inum, cookie));
- per_snapshot_io_opts_exit(&snapshot_io_opts);
*sectors_scanned += atomic64_read(&r->scan_stats.sectors_seen);
- bch2_move_stats_exit(&r->scan_stats, c);
-
/*
* Ensure that the rebalance_work entries we created are seen by the
* next iteration of do_rebalance(), so we don't end up stuck in
* rebalance_wait():
*/
*sectors_scanned += 1;
+ bch2_move_stats_exit(&r->scan_stats, c);
bch2_btree_write_buffer_flush_sync(trans);
@@ -709,6 +712,9 @@ static int do_rebalance(struct moving_context *ctxt)
bch2_move_stats_init(&r->work_stats, "rebalance_work");
+ struct per_snapshot_io_opts snapshot_io_opts;
+ per_snapshot_io_opts_init(&snapshot_io_opts, c);
+
while (!bch2_move_ratelimit(ctxt)) {
if (!bch2_rebalance_enabled(c)) {
bch2_moving_ctxt_flush_all(ctxt);
@@ -723,15 +729,18 @@ static int do_rebalance(struct moving_context *ctxt)
break;
ret = k->k.type == KEY_TYPE_cookie
- ? do_rebalance_scan(ctxt, k->k.p.inode,
+ ? do_rebalance_scan(ctxt, &snapshot_io_opts,
+ k->k.p.inode,
le64_to_cpu(bkey_i_to_cookie(k)->v.cookie),
&sectors_scanned)
- : do_rebalance_extent(ctxt, k->k.p, &extent_iter);
+ : do_rebalance_extent(ctxt, &snapshot_io_opts,
+ k->k.p, &extent_iter);
if (ret)
break;
}
bch2_trans_iter_exit(&extent_iter);
+ per_snapshot_io_opts_exit(&snapshot_io_opts);
bch2_move_stats_exit(&r->work_stats, c);
if (!ret &&
--
2.50.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 10/21] bcachefs: Correct propagation of io options to indirect extents
2025-08-24 12:37 [PATCH 00/21] Big rebalance changes Kent Overstreet
` (8 preceding siblings ...)
2025-08-24 12:37 ` [PATCH 09/21] bcachefs: do_rebalance_extent() uses bch2_extent_get_apply_io_opts() Kent Overstreet
@ 2025-08-24 12:37 ` Kent Overstreet
2025-08-24 12:37 ` [PATCH 11/21] bcachefs: bkey_should_have_rb_opts() Kent Overstreet
` (10 subsequent siblings)
20 siblings, 0 replies; 22+ messages in thread
From: Kent Overstreet @ 2025-08-24 12:37 UTC (permalink / raw)
To: linux-bcachefs; +Cc: Kent Overstreet
io path options set from the inode should override existing indirect
extent options, if REFLINK_P_MAY_UPDATE_OPTIONS is set on the reflink
pointer.
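Spelled out, per option _name (a commented restatement of the macro
change in the hunk below, not new logic):

	if (r->_name##_from_inode &&
	    !(may_update_indirect && io_opts->_name##_from_inode)) {
		/* the option stored on the indirect extent still wins */
		io_opts->_name = r->_name;
		io_opts->_name##_from_inode = true;
	}
	/* otherwise io_opts keeps the value set from the inode (or the
	 * filesystem default), and that's what gets propagated */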
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
---
fs/bcachefs/rebalance.c | 18 ++++++++++++++----
1 file changed, 14 insertions(+), 4 deletions(-)
diff --git a/fs/bcachefs/rebalance.c b/fs/bcachefs/rebalance.c
index f292c93ddd4d..932998864ad2 100644
--- a/fs/bcachefs/rebalance.c
+++ b/fs/bcachefs/rebalance.c
@@ -197,13 +197,23 @@ static int bch2_get_update_rebalance_opts(struct btree_trans *trans,
BUG_ON(iter->flags & BTREE_ITER_is_extents);
BUG_ON(iter->flags & BTREE_ITER_filter_snapshots);
+ bool may_update_indirect = ctx == SET_NEEDS_REBALANCE_opt_change_indirect;
+
+ /*
+ * If it's an indirect extent, and we walked to it directly, we won't
+ * have the options from the inode that were directly applied: options
+ * from the extent take precedence - unless the io_opts option came from
+ * the inode and may_update_indirect is true (walked from a
+ * REFLINK_P_MAY_UPDATE_OPTIONS pointer).
+ */
const struct bch_extent_rebalance *r = k.k->type == KEY_TYPE_reflink_v
? bch2_bkey_rebalance_opts(k) : NULL;
if (r) {
-#define x(_name) \
- if (r->_name##_from_inode) { \
- io_opts->_name = r->_name; \
- io_opts->_name##_from_inode = true; \
+#define x(_name) \
+ if (r->_name##_from_inode && \
+ !(may_update_indirect && io_opts->_name##_from_inode)) { \
+ io_opts->_name = r->_name; \
+ io_opts->_name##_from_inode = true; \
}
BCH_REBALANCE_OPTS()
#undef x
--
2.50.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 11/21] bcachefs: bkey_should_have_rb_opts()
2025-08-24 12:37 [PATCH 00/21] Big rebalance changes Kent Overstreet
` (9 preceding siblings ...)
2025-08-24 12:37 ` [PATCH 10/21] bcachefs: Correct propagation of io options to indirect extents Kent Overstreet
@ 2025-08-24 12:37 ` Kent Overstreet
2025-08-24 12:37 ` [PATCH 12/21] bcachefs: bch2_bkey_needs_rebalance() Kent Overstreet
` (9 subsequent siblings)
20 siblings, 0 replies; 22+ messages in thread
From: Kent Overstreet @ 2025-08-24 12:37 UTC (permalink / raw)
To: linux-bcachefs; +Cc: Kent Overstreet
Factor out a small helper.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
---
fs/bcachefs/rebalance.c | 43 ++++++++++++++++++++---------------------
1 file changed, 21 insertions(+), 22 deletions(-)
diff --git a/fs/bcachefs/rebalance.c b/fs/bcachefs/rebalance.c
index 932998864ad2..9e22ff0e2d28 100644
--- a/fs/bcachefs/rebalance.c
+++ b/fs/bcachefs/rebalance.c
@@ -145,20 +145,11 @@ u64 bch2_bkey_sectors_need_rebalance(struct bch_fs *c, struct bkey_s_c k)
return sectors;
}
-static bool bch2_bkey_rebalance_needs_update(struct bch_fs *c, struct bch_inode_opts *opts,
- struct bkey_s_c k)
+static inline bool bkey_should_have_rb_opts(struct bch_fs *c,
+ struct bch_inode_opts *opts,
+ struct bkey_s_c k)
{
- if (!bkey_extent_is_direct_data(k.k))
- return 0;
-
- const struct bch_extent_rebalance *old = bch2_bkey_rebalance_opts(k);
-
- if (k.k->type == KEY_TYPE_reflink_v || bch2_bkey_ptrs_need_rebalance(c, opts, k)) {
- struct bch_extent_rebalance new = io_opts_to_rebalance_opts(c, opts);
- return old == NULL || memcmp(old, &new, sizeof(new));
- } else {
- return old != NULL;
- }
+ return k.k->type == KEY_TYPE_reflink_v || bch2_bkey_ptrs_need_rebalance(c, opts, k);
}
int bch2_bkey_set_needs_rebalance(struct bch_fs *c, struct bch_inode_opts *opts,
@@ -173,7 +164,7 @@ int bch2_bkey_set_needs_rebalance(struct bch_fs *c, struct bch_inode_opts *opts,
struct bch_extent_rebalance *old =
(struct bch_extent_rebalance *) bch2_bkey_rebalance_opts(k.s_c);
- if (k.k->type == KEY_TYPE_reflink_v || bch2_bkey_ptrs_need_rebalance(c, opts, k.s_c)) {
+ if (bkey_should_have_rb_opts(c, opts, k.s_c)) {
if (!old) {
old = bkey_val_end(k);
k.k->u64s += sizeof(*old) / sizeof(u64);
@@ -194,9 +185,14 @@ static int bch2_get_update_rebalance_opts(struct btree_trans *trans,
struct bkey_s_c k,
enum set_needs_rebalance_ctx ctx)
{
+ struct bch_fs *c = trans->c;
+
BUG_ON(iter->flags & BTREE_ITER_is_extents);
BUG_ON(iter->flags & BTREE_ITER_filter_snapshots);
+ if (!bkey_extent_is_direct_data(k.k))
+ return 0;
+
bool may_update_indirect = ctx == SET_NEEDS_REBALANCE_opt_change_indirect;
/*
@@ -206,20 +202,23 @@ static int bch2_get_update_rebalance_opts(struct btree_trans *trans,
* the inode and may_update_indirect is true (walked from a
* REFLINK_P_MAY_UPDATE_OPTIONS pointer).
*/
- const struct bch_extent_rebalance *r = k.k->type == KEY_TYPE_reflink_v
- ? bch2_bkey_rebalance_opts(k) : NULL;
- if (r) {
+ const struct bch_extent_rebalance *old = bch2_bkey_rebalance_opts(k);
+ if (old && k.k->type == KEY_TYPE_reflink_v) {
#define x(_name) \
- if (r->_name##_from_inode && \
+ if (old->_name##_from_inode && \
!(may_update_indirect && io_opts->_name##_from_inode)) { \
- io_opts->_name = r->_name; \
+ io_opts->_name = old->_name; \
io_opts->_name##_from_inode = true; \
}
BCH_REBALANCE_OPTS()
#undef x
}
- if (!bch2_bkey_rebalance_needs_update(trans->c, io_opts, k))
+ struct bch_extent_rebalance new = io_opts_to_rebalance_opts(c, io_opts);
+
+ if (bkey_should_have_rb_opts(c, io_opts, k)
+ ? old && !memcmp(old, &new, sizeof(new))
+ : !old)
return 0;
struct bkey_i *n = bch2_trans_kmalloc(trans, bkey_bytes(k.k) + 8);
@@ -231,10 +230,10 @@ static int bch2_get_update_rebalance_opts(struct btree_trans *trans,
/* On successfull transaction commit, @k was invalidated: */
- return bch2_bkey_set_needs_rebalance(trans->c, io_opts, n, ctx, 0) ?:
+ return bch2_bkey_set_needs_rebalance(c, io_opts, n, ctx, 0) ?:
bch2_trans_update(trans, iter, n, BTREE_UPDATE_internal_snapshot_node) ?:
bch2_trans_commit(trans, NULL, NULL, 0) ?:
- bch_err_throw(trans->c, transaction_restart_commit);
+ bch_err_throw(c, transaction_restart_commit);
}
static struct bch_inode_opts *bch2_extent_get_io_opts(struct btree_trans *trans,
--
2.50.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 12/21] bcachefs: bch2_bkey_needs_rebalance()
2025-08-24 12:37 [PATCH 00/21] Big rebalance changes Kent Overstreet
` (10 preceding siblings ...)
2025-08-24 12:37 ` [PATCH 11/21] bcachefs: bkey_should_have_rb_opts() Kent Overstreet
@ 2025-08-24 12:37 ` Kent Overstreet
2025-08-24 12:37 ` [PATCH 13/21] bcachefs: rebalance now supports changing checksum type Kent Overstreet
` (8 subsequent siblings)
20 siblings, 0 replies; 22+ messages in thread
From: Kent Overstreet @ 2025-08-24 12:37 UTC (permalink / raw)
To: linux-bcachefs; +Cc: Kent Overstreet
Collapse bch2_bkey_sectors_need_rebalance() and
bch2_bkey_ptrs_need_rebalance() down to a single function, which outputs
both the bitmap of pointers that need to be rebalanced and the number of
sectors that need to be processed.
This will enable adding other reasons an extent might need to be
processed by rebalance: changing the checksum type,
increasing/decreasing replication level, enabling/disabling erasure
coding, etc.
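A sketch of the resulting call pattern (this is what the updated
wrappers below reduce to):

	unsigned move_ptrs = 0, compress_ptrs = 0;
	u64 sectors = 0;

	/* opts may be NULL: fall back to the options stashed in bch_extent_rebalance */
	bch2_bkey_needs_rebalance(c, k, opts, &move_ptrs, &compress_ptrs, &sectors);

	/* bitmap of pointers that need rewriting: */
	unsigned rewrite_ptrs = move_ptrs|compress_ptrs;

	/* sectors is the amount of pending work, for rebalance_work accounting */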
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
---
fs/bcachefs/rebalance.c | 171 +++++++++++++++++++---------------------
1 file changed, 81 insertions(+), 90 deletions(-)
diff --git a/fs/bcachefs/rebalance.c b/fs/bcachefs/rebalance.c
index 9e22ff0e2d28..0d1abb0eeb88 100644
--- a/fs/bcachefs/rebalance.c
+++ b/fs/bcachefs/rebalance.c
@@ -43,106 +43,95 @@ static const struct bch_extent_rebalance *bch2_bkey_rebalance_opts(struct bkey_s
return bch2_bkey_ptrs_rebalance_opts(bch2_bkey_ptrs_c(k));
}
-static inline unsigned bch2_bkey_ptrs_need_compress(struct bch_fs *c,
- struct bch_inode_opts *opts,
- struct bkey_s_c k,
- struct bkey_ptrs_c ptrs)
+static void bch2_bkey_needs_rebalance(struct bch_fs *c, struct bkey_s_c k,
+ struct bch_inode_opts *io_opts,
+ unsigned *move_ptrs,
+ unsigned *compress_ptrs,
+ u64 *sectors)
{
- if (!opts->background_compression)
- return 0;
+ *move_ptrs = 0;
+ *compress_ptrs = 0;
+ *sectors = 0;
+
+ struct bkey_ptrs_c ptrs = bch2_bkey_ptrs_c(k);
+
+ const struct bch_extent_rebalance *rb_opts = bch2_bkey_ptrs_rebalance_opts(ptrs);
+ if (!io_opts && !rb_opts)
+ return;
+
+ if (bch2_bkey_extent_ptrs_flags(ptrs) & BIT_ULL(BCH_EXTENT_FLAG_poisoned))
+ return;
+
+ unsigned compression_type =
+ bch2_compression_opt_to_type(io_opts
+ ? io_opts->background_compression
+ : rb_opts->background_compression);
+ unsigned target = io_opts
+ ? io_opts->background_target
+ : rb_opts->background_target;
+ if (target && !bch2_target_accepts_data(c, BCH_DATA_user, target))
+ target = 0;
- unsigned compression_type = bch2_compression_opt_to_type(opts->background_compression);
const union bch_extent_entry *entry;
struct extent_ptr_decoded p;
- unsigned ptr_bit = 1;
- unsigned rewrite_ptrs = 0;
-
- bkey_for_each_ptr_decode(k.k, ptrs, p, entry) {
- if (p.crc.compression_type == BCH_COMPRESSION_TYPE_incompressible ||
- p.ptr.unwritten)
- return 0;
+ bool incompressible = false, unwritten = false;
- if (!p.ptr.cached && p.crc.compression_type != compression_type)
- rewrite_ptrs |= ptr_bit;
- ptr_bit <<= 1;
- }
+ unsigned ptr_idx = 1;
- return rewrite_ptrs;
-}
+ guard(rcu)();
+ bkey_for_each_ptr_decode(k.k, ptrs, p, entry) {
+ incompressible |= p.crc.compression_type == BCH_COMPRESSION_TYPE_incompressible;
+ unwritten |= p.ptr.unwritten;
-static inline unsigned bch2_bkey_ptrs_need_move(struct bch_fs *c,
- struct bch_inode_opts *opts,
- struct bkey_ptrs_c ptrs)
-{
- if (!opts->background_target ||
- !bch2_target_accepts_data(c, BCH_DATA_user, opts->background_target))
- return 0;
+ if (!p.ptr.cached) {
+ if (p.crc.compression_type != compression_type)
+ *compress_ptrs |= ptr_idx;
- unsigned ptr_bit = 1;
- unsigned rewrite_ptrs = 0;
+ if (target && !bch2_dev_in_target(c, p.ptr.dev, target))
+ *move_ptrs |= ptr_idx;
+ }
- guard(rcu)();
- bkey_for_each_ptr(ptrs, ptr) {
- if (!ptr->cached && !bch2_dev_in_target(c, ptr->dev, opts->background_target))
- rewrite_ptrs |= ptr_bit;
- ptr_bit <<= 1;
+ ptr_idx <<= 1;
}
- return rewrite_ptrs;
-}
+ if (unwritten)
+ *compress_ptrs = 0;
+ if (incompressible)
+ *compress_ptrs = 0;
-static unsigned bch2_bkey_ptrs_need_rebalance(struct bch_fs *c,
- struct bch_inode_opts *opts,
- struct bkey_s_c k)
-{
- struct bkey_ptrs_c ptrs = bch2_bkey_ptrs_c(k);
+ unsigned rb_ptrs = *move_ptrs | *compress_ptrs;
- if (bch2_bkey_extent_ptrs_flags(ptrs) & BIT_ULL(BCH_EXTENT_FLAG_poisoned))
- return 0;
+ if (!rb_ptrs)
+ return;
- return bch2_bkey_ptrs_need_compress(c, opts, k, ptrs) |
- bch2_bkey_ptrs_need_move(c, opts, ptrs);
+ ptr_idx = 1;
+ bkey_for_each_ptr_decode(k.k, ptrs, p, entry) {
+ if (rb_ptrs & ptr_idx)
+ *sectors += p.crc.compressed_size;
+ ptr_idx <<= 1;
+ }
}
u64 bch2_bkey_sectors_need_rebalance(struct bch_fs *c, struct bkey_s_c k)
{
- struct bkey_ptrs_c ptrs = bch2_bkey_ptrs_c(k);
+ unsigned move_ptrs = 0;
+ unsigned compress_ptrs = 0;
+ u64 sectors = 0;
- const struct bch_extent_rebalance *opts = bch2_bkey_ptrs_rebalance_opts(ptrs);
- if (!opts)
- return 0;
-
- if (bch2_bkey_extent_ptrs_flags(ptrs) & BIT_ULL(BCH_EXTENT_FLAG_poisoned))
- return 0;
-
- const union bch_extent_entry *entry;
- struct extent_ptr_decoded p;
- u64 sectors = 0;
-
- if (opts->background_compression) {
- unsigned compression_type = bch2_compression_opt_to_type(opts->background_compression);
-
- bkey_for_each_ptr_decode(k.k, ptrs, p, entry) {
- if (p.crc.compression_type == BCH_COMPRESSION_TYPE_incompressible ||
- p.ptr.unwritten) {
- sectors = 0;
- goto incompressible;
- }
+	bch2_bkey_needs_rebalance(c, k, NULL, &move_ptrs, &compress_ptrs, &sectors);
+ return sectors;
+}
- if (!p.ptr.cached && p.crc.compression_type != compression_type)
- sectors += p.crc.compressed_size;
- }
- }
-incompressible:
- if (opts->background_target) {
- guard(rcu)();
- bkey_for_each_ptr_decode(k.k, ptrs, p, entry)
- if (!p.ptr.cached &&
- !bch2_dev_in_target(c, p.ptr.dev, opts->background_target))
- sectors += p.crc.compressed_size;
- }
+static unsigned bch2_bkey_ptrs_need_rebalance(struct bch_fs *c,
+ struct bch_inode_opts *opts,
+ struct bkey_s_c k)
+{
+ unsigned move_ptrs = 0;
+ unsigned compress_ptrs = 0;
+ u64 sectors = 0;
- return sectors;
+	bch2_bkey_needs_rebalance(c, k, opts, &move_ptrs, &compress_ptrs, &sectors);
+ return move_ptrs|compress_ptrs;
}
static inline bool bkey_should_have_rb_opts(struct bch_fs *c,
@@ -518,23 +507,25 @@ static struct bkey_s_c next_rebalance_extent(struct btree_trans *trans,
bch2_bkey_val_to_text(&buf, c, k);
prt_newline(&buf);
- struct bkey_ptrs_c ptrs = bch2_bkey_ptrs_c(k);
+ unsigned move_ptrs = 0;
+ unsigned compress_ptrs = 0;
+ u64 sectors = 0;
- unsigned p = bch2_bkey_ptrs_need_compress(c, opts, k, ptrs);
- if (p) {
- prt_str(&buf, "compression=");
- bch2_compression_opt_to_text(&buf, opts->background_compression);
+	bch2_bkey_needs_rebalance(c, k, opts, &move_ptrs, &compress_ptrs, &sectors);
+
+ if (move_ptrs) {
+ prt_str(&buf, "move=");
+ bch2_target_to_text(&buf, c, opts->background_target);
prt_str(&buf, " ");
- bch2_prt_u64_base2(&buf, p);
+ bch2_prt_u64_base2(&buf, move_ptrs);
prt_newline(&buf);
}
- p = bch2_bkey_ptrs_need_move(c, opts, ptrs);
- if (p) {
- prt_str(&buf, "move=");
- bch2_target_to_text(&buf, c, opts->background_target);
+ if (compress_ptrs) {
+ prt_str(&buf, "compression=");
+ bch2_compression_opt_to_text(&buf, opts->background_compression);
prt_str(&buf, " ");
- bch2_prt_u64_base2(&buf, p);
+ bch2_prt_u64_base2(&buf, compress_ptrs);
prt_newline(&buf);
}
--
2.50.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 13/21] bcachefs: rebalance now supports changing checksum type
2025-08-24 12:37 [PATCH 00/21] Big rebalance changes Kent Overstreet
` (11 preceding siblings ...)
2025-08-24 12:37 ` [PATCH 12/21] bcachefs: bch2_bkey_needs_rebalance() Kent Overstreet
@ 2025-08-24 12:37 ` Kent Overstreet
2025-08-24 12:37 ` [PATCH 14/21] bcachefs: Consistency checking for bch_extent_rebalance opts Kent Overstreet
` (7 subsequent siblings)
20 siblings, 0 replies; 22+ messages in thread
From: Kent Overstreet @ 2025-08-24 12:37 UTC (permalink / raw)
To: linux-bcachefs; +Cc: Kent Overstreet
We had a user report of a filesystem where data had been written out
with the incorrect checksum type - including no checksum at all.
That means we need to be able to check and repair such cases; since
rebalance is responsible for propagating io path options to extents,
rebalance needs to support the data_checksum option.
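A sketch of the per-extent result after this change (mirroring the
callers updated below): pointers written with the wrong checksum type
now count as pending rebalance work alongside the move/compress cases:

	unsigned move_ptrs = 0, compress_ptrs = 0, csum_ptrs = 0;
	u64 sectors = 0;

	bch2_bkey_needs_rebalance(c, k, opts, &move_ptrs, &compress_ptrs,
				  &csum_ptrs, &sectors);

	/* extents needing a checksum type change get rewritten too: */
	unsigned rewrite_ptrs = move_ptrs|compress_ptrs|csum_ptrs;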
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
---
fs/bcachefs/rebalance.c | 31 +++++++++++++++++++++++++------
1 file changed, 25 insertions(+), 6 deletions(-)
diff --git a/fs/bcachefs/rebalance.c b/fs/bcachefs/rebalance.c
index 0d1abb0eeb88..a2fb2f1cec70 100644
--- a/fs/bcachefs/rebalance.c
+++ b/fs/bcachefs/rebalance.c
@@ -47,10 +47,12 @@ static void bch2_bkey_needs_rebalance(struct bch_fs *c, struct bkey_s_c k,
struct bch_inode_opts *io_opts,
unsigned *move_ptrs,
unsigned *compress_ptrs,
+ unsigned *csum_ptrs,
u64 *sectors)
{
*move_ptrs = 0;
*compress_ptrs = 0;
+ *csum_ptrs = 0;
*sectors = 0;
struct bkey_ptrs_c ptrs = bch2_bkey_ptrs_c(k);
@@ -66,6 +68,9 @@ static void bch2_bkey_needs_rebalance(struct bch_fs *c, struct bkey_s_c k,
bch2_compression_opt_to_type(io_opts
? io_opts->background_compression
: rb_opts->background_compression);
+ unsigned csum_type = bch2_csum_opt_to_type(io_opts
+ ? io_opts->data_checksum
+ : rb_opts->data_checksum, true);
unsigned target = io_opts
? io_opts->background_target
: rb_opts->background_target;
@@ -87,6 +92,9 @@ static void bch2_bkey_needs_rebalance(struct bch_fs *c, struct bkey_s_c k,
if (p.crc.compression_type != compression_type)
*compress_ptrs |= ptr_idx;
+ if (p.crc.csum_type != csum_type)
+ *csum_ptrs |= ptr_idx;
+
if (target && !bch2_dev_in_target(c, p.ptr.dev, target))
*move_ptrs |= ptr_idx;
}
@@ -95,11 +103,11 @@ static void bch2_bkey_needs_rebalance(struct bch_fs *c, struct bkey_s_c k,
}
if (unwritten)
- *compress_ptrs = 0;
+ *compress_ptrs = *csum_ptrs = 0;
if (incompressible)
*compress_ptrs = 0;
- unsigned rb_ptrs = *move_ptrs | *compress_ptrs;
+ unsigned rb_ptrs = *move_ptrs | *compress_ptrs | *csum_ptrs;
if (!rb_ptrs)
return;
@@ -116,9 +124,10 @@ u64 bch2_bkey_sectors_need_rebalance(struct bch_fs *c, struct bkey_s_c k)
{
unsigned move_ptrs = 0;
unsigned compress_ptrs = 0;
+ unsigned csum_ptrs = 0;
u64 sectors = 0;
- bch2_bkey_needs_rebalance(c, k, NULL, &move_ptrs, &compress_ptrs, &sectors);
+ bch2_bkey_needs_rebalance(c, k, NULL, &move_ptrs, &compress_ptrs, &csum_ptrs, &sectors);
return sectors;
}
@@ -128,10 +137,11 @@ static unsigned bch2_bkey_ptrs_need_rebalance(struct bch_fs *c,
{
unsigned move_ptrs = 0;
unsigned compress_ptrs = 0;
+ unsigned csum_ptrs = 0;
u64 sectors = 0;
- bch2_bkey_needs_rebalance(c, k, opts, &move_ptrs, &compress_ptrs, &sectors);
- return move_ptrs|compress_ptrs;
+ bch2_bkey_needs_rebalance(c, k, opts, &move_ptrs, &compress_ptrs, &csum_ptrs, &sectors);
+ return move_ptrs|compress_ptrs|csum_ptrs;
}
static inline bool bkey_should_have_rb_opts(struct bch_fs *c,
@@ -509,9 +519,10 @@ static struct bkey_s_c next_rebalance_extent(struct btree_trans *trans,
unsigned move_ptrs = 0;
unsigned compress_ptrs = 0;
+ unsigned csum_ptrs = 0;
u64 sectors = 0;
- bch2_bkey_needs_rebalance(c, k, opts, &move_ptrs, &compress_ptrs, &sectors);
+ bch2_bkey_needs_rebalance(c, k, opts, &move_ptrs, &compress_ptrs, &csum_ptrs, &sectors);
if (move_ptrs) {
prt_str(&buf, "move=");
@@ -529,6 +540,14 @@ static struct bkey_s_c next_rebalance_extent(struct btree_trans *trans,
prt_newline(&buf);
}
+ if (csum_ptrs) {
+ prt_str(&buf, "csum=");
+ bch2_prt_csum_opt(&buf, opts->data_checksum);
+ prt_str(&buf, " ");
+ bch2_prt_u64_base2(&buf, csum_ptrs);
+ prt_newline(&buf);
+ }
+
trace_rebalance_extent(c, buf.buf);
}
count_event(c, rebalance_extent);
--
2.50.1
* [PATCH 14/21] bcachefs: Consistency checking for bch_extent_rebalance opts
2025-08-24 12:37 [PATCH 00/21] Big rebalance changes Kent Overstreet
` (12 preceding siblings ...)
2025-08-24 12:37 ` [PATCH 13/21] bcachefs: rebalance now supports changing checksum type Kent Overstreet
@ 2025-08-24 12:37 ` Kent Overstreet
2025-08-24 12:37 ` [PATCH 15/21] bcachefs: check_rebalance_work checks option inconsistency Kent Overstreet
` (6 subsequent siblings)
20 siblings, 0 replies; 22+ messages in thread
From: Kent Overstreet @ 2025-08-24 12:37 UTC (permalink / raw)
To: linux-bcachefs; +Cc: Kent Overstreet
bch2_get_update_rebalance_opts() now checks for consistency between
bch_extent_rebalance and the io path options from the inode and
filesystem; unless an option has been recently changed (and a scan to
propagate new options queued), they should match.
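To illustrate the shape of the check - this is a standalone userspace sketch, not the kernel code; the struct and option list are simplified stand-ins for bch_extent_rebalance and BCH_REBALANCE_OPTS() - a mismatch between the extent's stored opts and the inode/filesystem opts is only reported when no scan cookie is pending:

    #include <stdbool.h>
    #include <stdio.h>

    #define RB_OPTS()                           \
            x(data_checksum)                    \
            x(background_compression)           \
            x(background_target)

    struct rb_opts {
            unsigned data_checksum, background_compression, background_target;
    };

    /* Report per-field mismatches, unless a scan is queued to propagate new opts: */
    static bool rb_opts_mismatch(const struct rb_opts *extent,
                                 const struct rb_opts *inode,
                                 bool scan_cookie_present)
    {
            bool mismatch = false;

            if (scan_cookie_present)
                    return false;

    #define x(_name)                                                            \
            if (extent->_name != inode->_name) {                                \
                    printf(#_name " %u != %u\n", extent->_name, inode->_name);  \
                    mismatch = true;                                            \
            }
            RB_OPTS()
    #undef x

            return mismatch;
    }

    int main(void)
    {
            struct rb_opts extent = { .data_checksum = 0 };
            struct rb_opts inode  = { .data_checksum = 3 };

            return rb_opts_mismatch(&extent, &inode, false) ? 1 : 0;
    }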
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
---
fs/bcachefs/rebalance.c | 65 +++++++++++++++++++++++++++++++---
fs/bcachefs/sb-errors_format.h | 3 +-
2 files changed, 63 insertions(+), 5 deletions(-)
diff --git a/fs/bcachefs/rebalance.c b/fs/bcachefs/rebalance.c
index a2fb2f1cec70..872fff940c5a 100644
--- a/fs/bcachefs/rebalance.c
+++ b/fs/bcachefs/rebalance.c
@@ -25,6 +25,8 @@
#include <linux/kthread.h>
#include <linux/sched/cputime.h>
+#define REBALANCE_WORK_SCAN_OFFSET (U64_MAX - 1)
+
/* bch_extent_rebalance: */
static const struct bch_extent_rebalance *bch2_bkey_ptrs_rebalance_opts(struct bkey_ptrs_c ptrs)
@@ -178,6 +180,35 @@ int bch2_bkey_set_needs_rebalance(struct bch_fs *c, struct bch_inode_opts *opts,
return 0;
}
+static int have_rebalance_scan_cookie(struct btree_trans *trans, u64 inum)
+{
+ /*
+ * If opts need to be propagated to the extent, a scan cookie should be
+ * present:
+ */
+ CLASS(btree_iter, iter)(trans, BTREE_ID_rebalance_work,
+ SPOS(inum, REBALANCE_WORK_SCAN_OFFSET, U32_MAX),
+ BTREE_ITER_intent);
+ struct bkey_s_c k = bch2_btree_iter_peek_slot(&iter);
+ int ret = bkey_err(k);
+ if (ret)
+ return ret;
+
+ if (k.k->type == KEY_TYPE_cookie)
+ return 1;
+
+ if (!inum)
+ return 0;
+
+ bch2_btree_iter_set_pos(&iter, SPOS(0, REBALANCE_WORK_SCAN_OFFSET, U32_MAX));
+ k = bch2_btree_iter_peek_slot(&iter);
+ ret = bkey_err(k);
+ if (ret)
+ return ret;
+
+ return k.k->type == KEY_TYPE_cookie;
+}
+
static int bch2_get_update_rebalance_opts(struct btree_trans *trans,
struct bch_inode_opts *io_opts,
struct btree_iter *iter,
@@ -185,6 +216,7 @@ static int bch2_get_update_rebalance_opts(struct btree_trans *trans,
enum set_needs_rebalance_ctx ctx)
{
struct bch_fs *c = trans->c;
+ int ret = 0;
BUG_ON(iter->flags & BTREE_ITER_is_extents);
BUG_ON(iter->flags & BTREE_ITER_filter_snapshots);
@@ -220,8 +252,33 @@ static int bch2_get_update_rebalance_opts(struct btree_trans *trans,
: !old)
return 0;
+ if (k.k->type != KEY_TYPE_reflink_v) {
+ ret = have_rebalance_scan_cookie(trans, k.k->p.inode);
+ if (ret < 0)
+ return ret;
+
+ if (!ret) {
+ CLASS(printbuf, buf)();
+
+ prt_printf(&buf, "extent with incorrect/missing rebalance opts:\n");
+ bch2_bkey_val_to_text(&buf, c, k);
+
+ const struct bch_extent_rebalance _old = {};
+ if (!old)
+ old = &_old;
+#define x(_name) \
+ if (old->_name != new._name) \
+ prt_printf(&buf, "\n" #_name " %u != %u", \
+ old->_name, new._name); \
+ BCH_REBALANCE_OPTS()
+#undef x
+
+ fsck_err(trans, extent_io_opts_not_set, "%s", buf.buf);
+ }
+ }
+
struct bkey_i *n = bch2_trans_kmalloc(trans, bkey_bytes(k.k) + 8);
- int ret = PTR_ERR_OR_ZERO(n);
+ ret = PTR_ERR_OR_ZERO(n);
if (ret)
return ret;
@@ -229,10 +286,12 @@ static int bch2_get_update_rebalance_opts(struct btree_trans *trans,
/* On successfull transaction commit, @k was invalidated: */
- return bch2_bkey_set_needs_rebalance(c, io_opts, n, ctx, 0) ?:
+ ret = bch2_bkey_set_needs_rebalance(c, io_opts, n, ctx, 0) ?:
bch2_trans_update(trans, iter, n, BTREE_UPDATE_internal_snapshot_node) ?:
bch2_trans_commit(trans, NULL, NULL, 0) ?:
bch_err_throw(c, transaction_restart_commit);
+fsck_err:
+ return ret;
}
static struct bch_inode_opts *bch2_extent_get_io_opts(struct btree_trans *trans,
@@ -335,8 +394,6 @@ int bch2_extent_get_io_opts_one(struct btree_trans *trans,
ctx);
}
-#define REBALANCE_WORK_SCAN_OFFSET (U64_MAX - 1)
-
static const char * const bch2_rebalance_state_strs[] = {
#define x(t) #t,
BCH_REBALANCE_STATES()
diff --git a/fs/bcachefs/sb-errors_format.h b/fs/bcachefs/sb-errors_format.h
index aa0ea1ec9f10..4816c4150261 100644
--- a/fs/bcachefs/sb-errors_format.h
+++ b/fs/bcachefs/sb-errors_format.h
@@ -337,7 +337,8 @@ enum bch_fsck_flags {
x(dirent_stray_data_after_cf_name, 305, 0) \
x(rebalance_work_incorrectly_set, 309, FSCK_AUTOFIX) \
x(rebalance_work_incorrectly_unset, 310, FSCK_AUTOFIX) \
- x(MAX, 326, 0)
+ x(extent_io_opts_not_set, 326, FSCK_AUTOFIX) \
+ x(MAX, 327, 0)
enum bch_sb_error_id {
#define x(t, n, ...) BCH_FSCK_ERR_##t = n,
--
2.50.1
* [PATCH 15/21] bcachefs: check_rebalance_work checks option inconsistency
2025-08-24 12:37 [PATCH 00/21] Big rebalance changes Kent Overstreet
` (13 preceding siblings ...)
2025-08-24 12:37 ` [PATCH 14/21] bcachefs: Consistency checking for bch_extent_rebalance opts Kent Overstreet
@ 2025-08-24 12:37 ` Kent Overstreet
2025-08-24 12:37 ` [PATCH 16/21] bcachefs: bch2_bkey_set_needs_rebalance() now takes per_snapshot_io_opts Kent Overstreet
` (5 subsequent siblings)
20 siblings, 0 replies; 22+ messages in thread
From: Kent Overstreet @ 2025-08-24 12:37 UTC (permalink / raw)
To: linux-bcachefs; +Cc: Kent Overstreet
The previous patch added consistency checking for bch_extent_rebalance
whenever the move or rebalance paths update an extent; now we
additionally check in the check_rebalance_work recovery pass.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
---
fs/bcachefs/rebalance.c | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)
diff --git a/fs/bcachefs/rebalance.c b/fs/bcachefs/rebalance.c
index 872fff940c5a..70000d9a1ec4 100644
--- a/fs/bcachefs/rebalance.c
+++ b/fs/bcachefs/rebalance.c
@@ -1003,6 +1003,7 @@ int bch2_fs_rebalance_init(struct bch_fs *c)
static int check_rebalance_work_one(struct btree_trans *trans,
struct btree_iter *extent_iter,
struct btree_iter *rebalance_iter,
+ struct per_snapshot_io_opts *snapshot_io_opts,
struct bkey_buf *last_flushed)
{
struct bch_fs *c = trans->c;
@@ -1073,6 +1074,13 @@ static int check_rebalance_work_one(struct btree_trans *trans,
return ret;
}
+ struct bch_inode_opts *opts = bch2_extent_get_apply_io_opts(trans,
+ snapshot_io_opts, extent_iter->pos, extent_iter, extent_k,
+ SET_NEEDS_REBALANCE_other);
+ ret = PTR_ERR_OR_ZERO(opts);
+ if (ret)
+ return ret;
+
if (cmp <= 0)
bch2_btree_iter_advance(extent_iter);
if (cmp >= 0)
@@ -1085,10 +1093,14 @@ int bch2_check_rebalance_work(struct bch_fs *c)
{
CLASS(btree_trans, trans)(c);
CLASS(btree_iter, extent_iter)(trans, BTREE_ID_reflink, POS_MIN,
+ BTREE_ITER_not_extents|
BTREE_ITER_prefetch);
CLASS(btree_iter, rebalance_iter)(trans, BTREE_ID_rebalance_work, POS_MIN,
BTREE_ITER_prefetch);
+ struct per_snapshot_io_opts snapshot_io_opts;
+ per_snapshot_io_opts_init(&snapshot_io_opts, c);
+
struct bkey_buf last_flushed;
bch2_bkey_buf_init(&last_flushed);
bkey_init(&last_flushed.k->k);
@@ -1102,12 +1114,14 @@ int bch2_check_rebalance_work(struct bch_fs *c)
bch2_trans_begin(trans);
- ret = check_rebalance_work_one(trans, &extent_iter, &rebalance_iter, &last_flushed);
+ ret = check_rebalance_work_one(trans, &extent_iter, &rebalance_iter,
+ &snapshot_io_opts, &last_flushed);
if (bch2_err_matches(ret, BCH_ERR_transaction_restart))
ret = 0;
}
+ per_snapshot_io_opts_exit(&snapshot_io_opts);
bch2_bkey_buf_exit(&last_flushed, c);
return ret < 0 ? ret : 0;
}
--
2.50.1
* [PATCH 16/21] bcachefs: bch2_bkey_set_needs_rebalance() now takes per_snapshot_io_opts
2025-08-24 12:37 [PATCH 00/21] Big rebalance changes Kent Overstreet
` (14 preceding siblings ...)
2025-08-24 12:37 ` [PATCH 15/21] bcachefs: check_rebalance_work checks option inconsistency Kent Overstreet
@ 2025-08-24 12:37 ` Kent Overstreet
2025-08-24 12:37 ` [PATCH 17/21] bcachefs: bch_extent_rebalance changes Kent Overstreet
` (4 subsequent siblings)
20 siblings, 0 replies; 22+ messages in thread
From: Kent Overstreet @ 2025-08-24 12:37 UTC (permalink / raw)
To: linux-bcachefs; +Cc: Kent Overstreet
To be used for caching the existence of scan cookies.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
---
fs/bcachefs/data_update.c | 2 +-
fs/bcachefs/io_write.c | 4 ++--
fs/bcachefs/rebalance.c | 7 +++++--
fs/bcachefs/rebalance.h | 21 +++++++++++----------
4 files changed, 19 insertions(+), 15 deletions(-)
diff --git a/fs/bcachefs/data_update.c b/fs/bcachefs/data_update.c
index e932ee5488da..0f968bab7d93 100644
--- a/fs/bcachefs/data_update.c
+++ b/fs/bcachefs/data_update.c
@@ -438,7 +438,7 @@ static int __bch2_data_update_index_update(struct btree_trans *trans,
bch2_insert_snapshot_whiteouts(trans, m->btree_id,
k.k->p, insert->k.p) ?:
bch2_inum_snapshot_opts_get(trans, k.k->p.inode, k.k->p.snapshot, &opts) ?:
- bch2_bkey_set_needs_rebalance(c, &opts, insert,
+ bch2_bkey_set_needs_rebalance(trans, NULL, &opts, insert,
SET_NEEDS_REBALANCE_foreground,
m->op.opts.change_cookie) ?:
bch2_trans_update(trans, &iter, insert,
diff --git a/fs/bcachefs/io_write.c b/fs/bcachefs/io_write.c
index 16bcdada8cf1..cf1dd8bc7fe9 100644
--- a/fs/bcachefs/io_write.c
+++ b/fs/bcachefs/io_write.c
@@ -365,7 +365,7 @@ int bch2_extent_update(struct btree_trans *trans,
min(k->k.p.offset << 9, new_i_size),
i_sectors_delta, &inode) ?:
(bch2_inode_opts_get_inode(c, &inode, &opts),
- bch2_bkey_set_needs_rebalance(c, &opts, k,
+ bch2_bkey_set_needs_rebalance(trans, NULL, &opts, k,
SET_NEEDS_REBALANCE_foreground,
change_cookie)) ?:
bch2_trans_update(trans, iter, k, 0) ?:
@@ -1271,7 +1271,7 @@ static int bch2_nocow_write_convert_one_unwritten(struct btree_trans *trans,
return bch2_extent_update_i_size_sectors(trans, iter,
min(new->k.p.offset << 9, new_i_size), 0, &inode) ?:
(bch2_inode_opts_get_inode(c, &inode, &opts),
- bch2_bkey_set_needs_rebalance(c, &opts, new,
+ bch2_bkey_set_needs_rebalance(trans, NULL, &opts, new,
SET_NEEDS_REBALANCE_foreground,
op->opts.change_cookie)) ?:
bch2_trans_update(trans, iter, new,
diff --git a/fs/bcachefs/rebalance.c b/fs/bcachefs/rebalance.c
index 70000d9a1ec4..33bddbd33088 100644
--- a/fs/bcachefs/rebalance.c
+++ b/fs/bcachefs/rebalance.c
@@ -153,7 +153,9 @@ static inline bool bkey_should_have_rb_opts(struct bch_fs *c,
return k.k->type == KEY_TYPE_reflink_v || bch2_bkey_ptrs_need_rebalance(c, opts, k);
}
-int bch2_bkey_set_needs_rebalance(struct bch_fs *c, struct bch_inode_opts *opts,
+int bch2_bkey_set_needs_rebalance(struct btree_trans *trans,
+ struct per_snapshot_io_opts *snapshot_io_opts,
+ struct bch_inode_opts *opts,
struct bkey_i *_k,
enum set_needs_rebalance_ctx ctx,
u32 change_cookie)
@@ -161,6 +163,7 @@ int bch2_bkey_set_needs_rebalance(struct bch_fs *c, struct bch_inode_opts *opts,
if (!bkey_extent_is_direct_data(&_k->k))
return 0;
+ struct bch_fs *c = trans->c;
struct bkey_s k = bkey_i_to_s(_k);
struct bch_extent_rebalance *old =
(struct bch_extent_rebalance *) bch2_bkey_rebalance_opts(k.s_c);
@@ -286,7 +289,7 @@ static int bch2_get_update_rebalance_opts(struct btree_trans *trans,
/* On successfull transaction commit, @k was invalidated: */
- ret = bch2_bkey_set_needs_rebalance(c, io_opts, n, ctx, 0) ?:
+ ret = bch2_bkey_set_needs_rebalance(trans, NULL, io_opts, n, ctx, 0) ?:
bch2_trans_update(trans, iter, n, BTREE_UPDATE_internal_snapshot_node) ?:
bch2_trans_commit(trans, NULL, NULL, 0) ?:
bch_err_throw(c, transaction_restart_commit);
diff --git a/fs/bcachefs/rebalance.h b/fs/bcachefs/rebalance.h
index fd33e7aa2ecb..fd873894c8b6 100644
--- a/fs/bcachefs/rebalance.h
+++ b/fs/bcachefs/rebalance.h
@@ -28,16 +28,6 @@ static inline struct bch_extent_rebalance io_opts_to_rebalance_opts(struct bch_f
u64 bch2_bkey_sectors_need_rebalance(struct bch_fs *, struct bkey_s_c);
-enum set_needs_rebalance_ctx {
- SET_NEEDS_REBALANCE_opt_change,
- SET_NEEDS_REBALANCE_opt_change_indirect,
- SET_NEEDS_REBALANCE_foreground,
- SET_NEEDS_REBALANCE_other,
-};
-
-int bch2_bkey_set_needs_rebalance(struct bch_fs *, struct bch_inode_opts *,
- struct bkey_i *, enum set_needs_rebalance_ctx, u32);
-
/* Inodes in different snapshots may have different IO options: */
struct snapshot_io_opts_entry {
u32 snapshot;
@@ -61,6 +51,17 @@ static inline void per_snapshot_io_opts_exit(struct per_snapshot_io_opts *io_opt
darray_exit(&io_opts->d);
}
+enum set_needs_rebalance_ctx {
+ SET_NEEDS_REBALANCE_opt_change,
+ SET_NEEDS_REBALANCE_opt_change_indirect,
+ SET_NEEDS_REBALANCE_foreground,
+ SET_NEEDS_REBALANCE_other,
+};
+
+int bch2_bkey_set_needs_rebalance(struct btree_trans *,
+ struct per_snapshot_io_opts *, struct bch_inode_opts *,
+ struct bkey_i *, enum set_needs_rebalance_ctx, u32);
+
struct bch_inode_opts *bch2_extent_get_apply_io_opts(struct btree_trans *,
struct per_snapshot_io_opts *, struct bpos,
struct btree_iter *, struct bkey_s_c,
--
2.50.1
* [PATCH 17/21] bcachefs: bch_extent_rebalance changes
2025-08-24 12:37 [PATCH 00/21] Big rebalance changes Kent Overstreet
` (15 preceding siblings ...)
2025-08-24 12:37 ` [PATCH 16/21] bcachefs: bch2_bkey_set_needs_rebalance() now takes per_snapshot_io_opts Kent Overstreet
@ 2025-08-24 12:37 ` Kent Overstreet
2025-08-24 12:37 ` [PATCH 18/21] bcachefs: bch2_set_rebalance_needs_scan_device() Kent Overstreet
` (3 subsequent siblings)
20 siblings, 0 replies; 22+ messages in thread
From: Kent Overstreet @ 2025-08-24 12:37 UTC (permalink / raw)
To: linux-bcachefs; +Cc: Kent Overstreet
"Extent needs rebalance for data_replicas" cannot be a pure function of
the extent with the current bch_extent_rebalance, which is required for
the triggers - they would create inconsistencies in the rebalance_work
accounting and btrees when e.g. device durability changes.
So: this adds bch_extent_rebalance.need_rb, which has flags for all the
reasons rebalance may need to process an extent: the trigger now uses
just these flags instead of running the full "does this extent need
rebalance" calculations in bch2_bkey_sectors_need_rebalance().
Additionally:
- Instead of a single accounting counter for pending rebalance work,
this is now split out into different counters for the different io
path options rebalance handles (compression, data_checksum, replicas,
erasure_code, etc.) - see the sketch after this list for how the
trigger derives the per-option deltas
- "Does this extent need to be rebalanced?" is now centralized in
bch2_bkey_set_needs_rebalance()
- "Is new rebalance_work allowed in this context" is
new_needs_rb_allowed() - this enforces that extents match the
specified io path options, with clearly defined exceptions (e.g.
accounting for races with option changes, and foreground writes are
allowed to add background_compression and background_target work)
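How the trigger turns the need_rb bits into accounting: it only sees the
old/new keys, so the per-option delta is just plus or minus the key size
depending on which side has the bit set. A standalone model of that arithmetic
(abbreviated option list, made-up sizes; the real version is the buckets.c hunk
below):

    #include <stdint.h>
    #include <stdio.h>

    enum { RB_data_replicas, RB_data_checksum, RB_background_compression, RB_NR };

    /* delta[opt]: change to that option's pending-work counter, in sectors */
    static void rb_accounting_deltas(unsigned old_need_rb, unsigned old_size,
                                     unsigned new_need_rb, unsigned new_size,
                                     int64_t delta[RB_NR])
    {
            for (unsigned opt = 0; opt < RB_NR; opt++) {
                    delta[opt] = 0;
                    if ((old_need_rb ^ new_need_rb) & (1U << opt))
                            delta[opt] = old_need_rb & (1U << opt)
                                    ? -(int64_t) old_size
                                    : new_size;
            }
    }

    int main(void)
    {
            int64_t d[RB_NR];

            /* old extent needed a checksum rewrite, the new one doesn't: */
            rb_accounting_deltas(1U << RB_data_checksum, 128, 0, 128, d);

            for (unsigned i = 0; i < RB_NR; i++)
                    printf("%lld ", (long long) d[i]);
            printf("\n");
            return 0;
    }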
XXX: split this patch up more
XXX: define a new on disk format version, and upgrade/downgrade table
entries
Compatibility notes: still undecided if we'll stick with redefining the
existing bch_extent_rebalance, or add a new extent entry type for
bch_extent_rebalance_v2 - there are pros and cons to both
If we redefine the existing bch_extent_rebalance, on upgrade
check_rebalance_work will correct all the existing bch_extent_rebalance
entries (along with accounting, rebalance_work btrees) - except indirect
extents will need special handling, which we likely need anyways
On downgrade, old versions don't have a recovery pass that
checks/fixes bch_extent_rebalance from the io path options - but they do
that on data move, so we're probably more or less ok; some wonkiness in
rebalance_work accounting would be expected
Adding a bch_extent_rebalance_v2 would be an incompatible upgrade
(adding new extent entry types is always an incompatible upgrade,
unfortunately) - and it'd require keeping around compatibility code for
e.g. the triggers to handle the old bch_extent_rebalance...
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
---
fs/bcachefs/buckets.c | 50 ++--
fs/bcachefs/data_update.c | 26 --
fs/bcachefs/disk_accounting_format.h | 1 +
fs/bcachefs/rebalance.c | 403 ++++++++++++++-------------
fs/bcachefs/rebalance.h | 21 +-
fs/bcachefs/rebalance_format.h | 62 +++--
fs/bcachefs/trace.h | 5 -
7 files changed, 288 insertions(+), 280 deletions(-)
diff --git a/fs/bcachefs/buckets.c b/fs/bcachefs/buckets.c
index 6be1cc9ba0da..436634a5f77c 100644
--- a/fs/bcachefs/buckets.c
+++ b/fs/bcachefs/buckets.c
@@ -871,7 +871,6 @@ int bch2_trigger_extent(struct btree_trans *trans,
struct bkey_s_c old, struct bkey_s new,
enum btree_iter_update_trigger_flags flags)
{
- struct bch_fs *c = trans->c;
struct bkey_ptrs_c new_ptrs = bch2_bkey_ptrs_c(new.s_c);
struct bkey_ptrs_c old_ptrs = bch2_bkey_ptrs_c(old);
unsigned new_ptrs_bytes = (void *) new_ptrs.end - (void *) new_ptrs.start;
@@ -902,29 +901,34 @@ int bch2_trigger_extent(struct btree_trans *trans,
return ret;
}
- int need_rebalance_delta = 0;
- s64 need_rebalance_sectors_delta[1] = { 0 };
-
- s64 s = bch2_bkey_sectors_need_rebalance(c, old);
- need_rebalance_delta -= s != 0;
- need_rebalance_sectors_delta[0] -= s;
-
- s = bch2_bkey_sectors_need_rebalance(c, new.s_c);
- need_rebalance_delta += s != 0;
- need_rebalance_sectors_delta[0] += s;
-
- if ((flags & BTREE_TRIGGER_transactional) && need_rebalance_delta) {
- int ret = bch2_btree_bit_mod_buffered(trans, BTREE_ID_rebalance_work,
- new.k->p, need_rebalance_delta > 0);
- if (ret)
- return ret;
- }
+ unsigned old_r = bch2_bkey_needs_rb(old);
+ unsigned new_r = bch2_bkey_needs_rb(new.s_c);
+ if (old_r != new_r) {
+ /* XXX: slowpath, put in a separate function */
+ int delta = (int) !!new_r - (int) !!old_r;
+ if ((flags & BTREE_TRIGGER_transactional) && delta) {
+ int ret = bch2_btree_bit_mod_buffered(trans, BTREE_ID_rebalance_work,
+ new.k->p, delta > 0);
+ if (ret)
+ return ret;
+ }
- if (need_rebalance_sectors_delta[0]) {
- int ret = bch2_disk_accounting_mod2(trans, flags & BTREE_TRIGGER_gc,
- need_rebalance_sectors_delta, rebalance_work);
- if (ret)
- return ret;
+ s64 v[1] = { 0 };
+#define x(n) \
+ if ((old_r ^ new_r) & BIT(BCH_REBALANCE_##n)) { \
+ v[0] = old_r & BIT(BCH_REBALANCE_##n) \
+ ? -(s64) old.k->size \
+ : new.k->size; \
+ \
+ int ret = bch2_disk_accounting_mod2(trans, \
+ flags & BTREE_TRIGGER_gc, \
+ v, rebalance_work, \
+ BCH_REBALANCE_##n); \
+ if (ret) \
+ return ret; \
+ }
+ BCH_REBALANCE_OPTS()
+#undef x
}
}
diff --git a/fs/bcachefs/data_update.c b/fs/bcachefs/data_update.c
index 0f968bab7d93..2466f7a1c9e6 100644
--- a/fs/bcachefs/data_update.c
+++ b/fs/bcachefs/data_update.c
@@ -207,28 +207,6 @@ static void trace_data_update2(struct data_update *m,
trace_data_update(c, buf.buf);
}
-noinline_for_stack
-static void trace_io_move_created_rebalance2(struct data_update *m,
- struct bkey_s_c old, struct bkey_s_c k,
- struct bkey_i *insert)
-{
- struct bch_fs *c = m->op.c;
- CLASS(printbuf, buf)();
-
- bch2_data_update_opts_to_text(&buf, c, &m->op.opts, &m->data_opts);
-
- prt_str(&buf, "\nold: ");
- bch2_bkey_val_to_text(&buf, c, old);
- prt_str(&buf, "\nk: ");
- bch2_bkey_val_to_text(&buf, c, k);
- prt_str(&buf, "\nnew: ");
- bch2_bkey_val_to_text(&buf, c, bkey_i_to_s_c(insert));
-
- trace_io_move_created_rebalance(c, buf.buf);
-
- count_event(c, io_move_created_rebalance);
-}
-
noinline_for_stack
static int data_update_invalid_bkey(struct data_update *m,
struct bkey_s_c old, struct bkey_s_c k,
@@ -449,10 +427,6 @@ static int __bch2_data_update_index_update(struct btree_trans *trans,
if (trace_data_update_enabled())
trace_data_update2(m, old, k, insert);
- if (bch2_bkey_sectors_need_rebalance(c, bkey_i_to_s_c(insert)) * k.k->size >
- bch2_bkey_sectors_need_rebalance(c, k) * insert->k.size)
- trace_io_move_created_rebalance2(m, old, k, insert);
-
ret = bch2_trans_commit(trans, &op->res,
NULL,
BCH_TRANS_COMMIT_no_check_rw|
diff --git a/fs/bcachefs/disk_accounting_format.h b/fs/bcachefs/disk_accounting_format.h
index 8269af1dbe2a..4aa7f83f5d75 100644
--- a/fs/bcachefs/disk_accounting_format.h
+++ b/fs/bcachefs/disk_accounting_format.h
@@ -200,6 +200,7 @@ struct bch_acct_inum {
* move, extents counted here are also in the rebalance_work btree.
*/
struct bch_acct_rebalance_work {
+ __u8 opt;
};
struct disk_accounting_pos {
diff --git a/fs/bcachefs/rebalance.c b/fs/bcachefs/rebalance.c
index 33bddbd33088..3a6cd54613a1 100644
--- a/fs/bcachefs/rebalance.c
+++ b/fs/bcachefs/rebalance.c
@@ -40,49 +40,54 @@ static const struct bch_extent_rebalance *bch2_bkey_ptrs_rebalance_opts(struct b
return NULL;
}
-static const struct bch_extent_rebalance *bch2_bkey_rebalance_opts(struct bkey_s_c k)
+const struct bch_extent_rebalance *bch2_bkey_rebalance_opts(struct bkey_s_c k)
{
return bch2_bkey_ptrs_rebalance_opts(bch2_bkey_ptrs_c(k));
}
-static void bch2_bkey_needs_rebalance(struct bch_fs *c, struct bkey_s_c k,
- struct bch_inode_opts *io_opts,
- unsigned *move_ptrs,
- unsigned *compress_ptrs,
- unsigned *csum_ptrs,
- u64 *sectors)
+static struct bch_extent_rebalance
+bch2_bkey_needs_rebalance(struct bch_fs *c, struct bkey_s_c k,
+ struct bch_inode_opts *opts,
+ unsigned *move_ptrs,
+ unsigned *compress_ptrs,
+ unsigned *csum_ptrs,
+ bool may_update_indirect)
{
*move_ptrs = 0;
*compress_ptrs = 0;
*csum_ptrs = 0;
- *sectors = 0;
struct bkey_ptrs_c ptrs = bch2_bkey_ptrs_c(k);
-
- const struct bch_extent_rebalance *rb_opts = bch2_bkey_ptrs_rebalance_opts(ptrs);
- if (!io_opts && !rb_opts)
- return;
+ struct bch_extent_rebalance r = { .type = BIT(BCH_EXTENT_ENTRY_rebalance) };
if (bch2_bkey_extent_ptrs_flags(ptrs) & BIT_ULL(BCH_EXTENT_FLAG_poisoned))
- return;
-
- unsigned compression_type =
- bch2_compression_opt_to_type(io_opts
- ? io_opts->background_compression
- : rb_opts->background_compression);
- unsigned csum_type = bch2_csum_opt_to_type(io_opts
- ? io_opts->data_checksum
- : rb_opts->data_checksum, true);
- unsigned target = io_opts
- ? io_opts->background_target
- : rb_opts->background_target;
+ return r;
+
+ const struct bch_extent_rebalance *old_r = bch2_bkey_ptrs_rebalance_opts(ptrs);
+ if (old_r)
+ r = *old_r;
+
+#define x(_name) \
+ if (k.k->type != KEY_TYPE_reflink_v || \
+ may_update_indirect || \
+ (!opts->_name##_from_inode && !r._name##_from_inode)) { \
+ r._name = opts->_name; \
+ r._name##_from_inode = opts->_name##_from_inode; \
+ }
+ BCH_REBALANCE_OPTS()
+#undef x
+
+ unsigned compression_type = bch2_compression_opt_to_type(r.background_compression);
+ unsigned csum_type = bch2_csum_opt_to_type(r.data_checksum, true);
+ unsigned target = r.background_target;
if (target && !bch2_target_accepts_data(c, BCH_DATA_user, target))
target = 0;
+ bool incompressible = false, unwritten = false, ec = false;
+ unsigned durability = 0, min_durability = INT_MAX;
+
const union bch_extent_entry *entry;
struct extent_ptr_decoded p;
- bool incompressible = false, unwritten = false;
-
unsigned ptr_idx = 1;
guard(rcu)();
@@ -99,6 +104,12 @@ static void bch2_bkey_needs_rebalance(struct bch_fs *c, struct bkey_s_c k,
if (target && !bch2_dev_in_target(c, p.ptr.dev, target))
*move_ptrs |= ptr_idx;
+
+ unsigned d = bch2_extent_ptr_durability(c, &p);
+ durability += d;
+ min_durability = min(min_durability, d);
+
+ ec |= p.has_ec;
}
ptr_idx <<= 1;
@@ -109,48 +120,123 @@ static void bch2_bkey_needs_rebalance(struct bch_fs *c, struct bkey_s_c k,
if (incompressible)
*compress_ptrs = 0;
- unsigned rb_ptrs = *move_ptrs | *compress_ptrs | *csum_ptrs;
-
- if (!rb_ptrs)
- return;
-
- ptr_idx = 1;
- bkey_for_each_ptr_decode(k.k, ptrs, p, entry) {
- if (rb_ptrs & ptr_idx)
- *sectors += p.crc.compressed_size;
- ptr_idx <<= 1;
- }
+ if (*csum_ptrs)
+ r.need_rb |= BIT(BCH_REBALANCE_data_checksum);
+ if (*compress_ptrs)
+ r.need_rb |= BIT(BCH_REBALANCE_background_compression);
+ if (r.erasure_code != ec)
+ r.need_rb |= BIT(BCH_REBALANCE_erasure_code);
+ if (durability < r.data_replicas || durability >= r.data_replicas + min_durability)
+ r.need_rb |= BIT(BCH_REBALANCE_data_replicas);
+ if (*move_ptrs)
+ r.need_rb |= BIT(BCH_REBALANCE_background_target);
+ return r;
}
-u64 bch2_bkey_sectors_need_rebalance(struct bch_fs *c, struct bkey_s_c k)
+static int check_rebalance_scan_cookie(struct btree_trans *trans, u64 inum, bool *v)
{
- unsigned move_ptrs = 0;
- unsigned compress_ptrs = 0;
- unsigned csum_ptrs = 0;
- u64 sectors = 0;
+ if (*v)
+ return 0;
- bch2_bkey_needs_rebalance(c, k, NULL, &move_ptrs, &compress_ptrs, &csum_ptrs, &sectors);
- return sectors;
+ /*
+ * If opts need to be propagated to the extent, a scan cookie should be
+ * present:
+ */
+ CLASS(btree_iter, iter)(trans, BTREE_ID_rebalance_work,
+ SPOS(inum, REBALANCE_WORK_SCAN_OFFSET, U32_MAX),
+ BTREE_ITER_intent);
+ struct bkey_s_c k = bch2_btree_iter_peek_slot(&iter);
+ int ret = bkey_err(k);
+ if (ret)
+ return ret;
+
+ *v = k.k->type == KEY_TYPE_cookie;
+ return 0;
}
-static unsigned bch2_bkey_ptrs_need_rebalance(struct bch_fs *c,
- struct bch_inode_opts *opts,
- struct bkey_s_c k)
+static int new_needs_rb_allowed(struct btree_trans *trans,
+ struct per_snapshot_io_opts *s,
+ struct bkey_s_c k,
+ enum set_needs_rebalance_ctx ctx,
+ unsigned opt_change_cookie,
+ const struct bch_extent_rebalance *old,
+ const struct bch_extent_rebalance *new,
+ unsigned new_need_rb)
{
- unsigned move_ptrs = 0;
- unsigned compress_ptrs = 0;
- unsigned csum_ptrs = 0;
- u64 sectors = 0;
+ struct bch_fs *c = trans->c;
+ /*
+ * New need_rb - pointers that don't match the current io path options -
+ * are only allowed in certain situations:
+ *
+ * Propagating new options: from bch2_set_rebalance_needs_scan
+ *
+ * Foreground writes: background_compression and background_target are
+ * allowed
+ *
+ * Foreground writes: we may have raced with an option change:
+ * opt_change_cookie checks for this
+ *
+ * XXX: foreground writes should still match compression,
+ * foreground_target - figure out how to check for this
+ */
+ if (ctx == SET_NEEDS_REBALANCE_opt_change ||
+ ctx == SET_NEEDS_REBALANCE_opt_change_indirect)
+ return 0;
- bch2_bkey_needs_rebalance(c, k, opts, &move_ptrs, &compress_ptrs, &csum_ptrs, &sectors);
- return move_ptrs|compress_ptrs|csum_ptrs;
-}
+ if (ctx == SET_NEEDS_REBALANCE_foreground) {
+ new_need_rb &= ~(BIT(BCH_REBALANCE_background_compression)|
+ BIT(BCH_REBALANCE_background_target));
+ if (!new_need_rb)
+ return 0;
-static inline bool bkey_should_have_rb_opts(struct bch_fs *c,
- struct bch_inode_opts *opts,
- struct bkey_s_c k)
-{
- return k.k->type == KEY_TYPE_reflink_v || bch2_bkey_ptrs_need_rebalance(c, opts, k);
+ if (opt_change_cookie != atomic_read(&c->opt_change_cookie))
+ return 0;
+ }
+
+ /*
+ * Either the extent data or the extent io options (from
+ * bch_extent_rebalance) should match the io_opts from the
+ * inode/filesystem, unless
+ *
+ * - There's a scan pending to propagate new options
+ * - It's an indirect extent: it may be referenced by inodes
+ * with inconsistent options
+ *
+ * For efficiency (so that we can cache checking for scan
+ * cookies), only check option consistency when we're called
+ * with snapshot_io_opts - don't bother when we're called from
+ * move_data_phys() -> get_io_opts_one()
+ *
+ * Note that we can cache the existence of a cookie, but not the
+ * non-existence, to avoid spurious false positives.
+ */
+ bool scan_cookie = false;
+ int ret = check_rebalance_scan_cookie(trans, 0, s ? &s->fs_scan_cookie : &scan_cookie) ?:
+ check_rebalance_scan_cookie(trans, k.k->p.inode, s ? &s->inum_scan_cookie : &scan_cookie);
+ if (ret)
+ return ret;
+
+ if (scan_cookie)
+ return 0;
+
+ CLASS(printbuf, buf)();
+
+ prt_printf(&buf, "extent with incorrect/missing rebalance opts:\n");
+ bch2_bkey_val_to_text(&buf, c, k);
+
+ const struct bch_extent_rebalance _old = {};
+ if (!old)
+ old = &_old;
+
+#define x(_name) \
+ if (new_need_rb & BIT(BCH_REBALANCE_##_name)) \
+ prt_printf(&buf, "\n" #_name " %u != %u", old->_name, new->_name);
+ BCH_REBALANCE_OPTS()
+#undef x
+
+ fsck_err(trans, extent_io_opts_not_set, "%s", buf.buf);
+fsck_err:
+ return ret;
}
int bch2_bkey_set_needs_rebalance(struct btree_trans *trans,
@@ -158,7 +244,7 @@ int bch2_bkey_set_needs_rebalance(struct btree_trans *trans,
struct bch_inode_opts *opts,
struct bkey_i *_k,
enum set_needs_rebalance_ctx ctx,
- u32 change_cookie)
+ unsigned opt_change_cookie)
{
if (!bkey_extent_is_direct_data(&_k->k))
return 0;
@@ -168,51 +254,44 @@ int bch2_bkey_set_needs_rebalance(struct btree_trans *trans,
struct bch_extent_rebalance *old =
(struct bch_extent_rebalance *) bch2_bkey_rebalance_opts(k.s_c);
- if (bkey_should_have_rb_opts(c, opts, k.s_c)) {
- if (!old) {
- old = bkey_val_end(k);
- k.k->u64s += sizeof(*old) / sizeof(u64);
- }
+ unsigned move_ptrs = 0;
+ unsigned compress_ptrs = 0;
+ unsigned csum_ptrs = 0;
+ struct bch_extent_rebalance new =
+ bch2_bkey_needs_rebalance(c, k.s_c, opts, &move_ptrs, &compress_ptrs, &csum_ptrs,
+ ctx == SET_NEEDS_REBALANCE_opt_change_indirect);
- *old = io_opts_to_rebalance_opts(c, opts);
- } else {
- if (old)
- extent_entry_drop(k, (union bch_extent_entry *) old);
- }
+ bool should_have_rb = k.k->type == KEY_TYPE_reflink_v || new.need_rb;
- return 0;
-}
+ if (should_have_rb == !!old &&
+ (should_have_rb ? !memcmp(old, &new, sizeof(new)) : !old))
+ return 0;
-static int have_rebalance_scan_cookie(struct btree_trans *trans, u64 inum)
-{
- /*
- * If opts need to be propagated to the extent, a scan cookie should be
- * present:
- */
- CLASS(btree_iter, iter)(trans, BTREE_ID_rebalance_work,
- SPOS(inum, REBALANCE_WORK_SCAN_OFFSET, U32_MAX),
- BTREE_ITER_intent);
- struct bkey_s_c k = bch2_btree_iter_peek_slot(&iter);
- int ret = bkey_err(k);
- if (ret)
- return ret;
+ unsigned new_need_rb = new.need_rb & ~(old ? old->need_rb : 0);
- if (k.k->type == KEY_TYPE_cookie)
- return 1;
+ if (unlikely(new_need_rb)) {
+ int ret = new_needs_rb_allowed(trans, snapshot_io_opts,
+ k.s_c, ctx, opt_change_cookie,
+ old, &new, new_need_rb);
+ if (ret)
+ return ret;
+ }
- if (!inum)
- return 0;
+ if (should_have_rb) {
+ if (!old) {
+ old = bkey_val_end(k);
+ k.k->u64s += sizeof(*old) / sizeof(u64);
+ }
- bch2_btree_iter_set_pos(&iter, SPOS(0, REBALANCE_WORK_SCAN_OFFSET, U32_MAX));
- k = bch2_btree_iter_peek_slot(&iter);
- ret = bkey_err(k);
- if (ret)
- return ret;
+ *old = new;
+ } else if (old)
+ extent_entry_drop(k, (union bch_extent_entry *) old);
- return k.k->type == KEY_TYPE_cookie;
+ return 0;
}
static int bch2_get_update_rebalance_opts(struct btree_trans *trans,
+ struct per_snapshot_io_opts *snapshot_io_opts,
struct bch_inode_opts *io_opts,
struct btree_iter *iter,
struct bkey_s_c k,
@@ -227,59 +306,22 @@ static int bch2_get_update_rebalance_opts(struct btree_trans *trans,
if (!bkey_extent_is_direct_data(k.k))
return 0;
- bool may_update_indirect = ctx == SET_NEEDS_REBALANCE_opt_change_indirect;
+ struct bch_extent_rebalance *old =
+ (struct bch_extent_rebalance *) bch2_bkey_rebalance_opts(k);
- /*
- * If it's an indirect extent, and we walked to it directly, we won't
- * have the options from the inode that were directly applied: options
- * from the extent take precedence - unless the io_opts option came from
- * the inode and may_update_indirect is true (walked from a
- * REFLINK_P_MAY_UPDATE_OPTIONS pointer).
- */
- const struct bch_extent_rebalance *old = bch2_bkey_rebalance_opts(k);
- if (old && k.k->type == KEY_TYPE_reflink_v) {
-#define x(_name) \
- if (old->_name##_from_inode && \
- !(may_update_indirect && io_opts->_name##_from_inode)) { \
- io_opts->_name = old->_name; \
- io_opts->_name##_from_inode = true; \
- }
- BCH_REBALANCE_OPTS()
-#undef x
- }
+ unsigned move_ptrs = 0;
+ unsigned compress_ptrs = 0;
+ unsigned csum_ptrs = 0;
+ struct bch_extent_rebalance new =
+ bch2_bkey_needs_rebalance(c, k, io_opts, &move_ptrs, &compress_ptrs, &csum_ptrs,
+ ctx == SET_NEEDS_REBALANCE_opt_change_indirect);
- struct bch_extent_rebalance new = io_opts_to_rebalance_opts(c, io_opts);
+ bool should_have_rb = k.k->type == KEY_TYPE_reflink_v || new.need_rb;
- if (bkey_should_have_rb_opts(c, io_opts, k)
- ? old && !memcmp(old, &new, sizeof(new))
- : !old)
+ if (should_have_rb == !!old &&
+ (should_have_rb ? !memcmp(old, &new, sizeof(new)) : !old))
return 0;
- if (k.k->type != KEY_TYPE_reflink_v) {
- ret = have_rebalance_scan_cookie(trans, k.k->p.inode);
- if (ret < 0)
- return ret;
-
- if (!ret) {
- CLASS(printbuf, buf)();
-
- prt_printf(&buf, "extent with incorrect/missing rebalance opts:\n");
- bch2_bkey_val_to_text(&buf, c, k);
-
- const struct bch_extent_rebalance _old = {};
- if (!old)
- old = &_old;
-#define x(_name) \
- if (old->_name != new._name) \
- prt_printf(&buf, "\n" #_name " %u != %u", \
- old->_name, new._name); \
- BCH_REBALANCE_OPTS()
-#undef x
-
- fsck_err(trans, extent_io_opts_not_set, "%s", buf.buf);
- }
- }
-
struct bkey_i *n = bch2_trans_kmalloc(trans, bkey_bytes(k.k) + 8);
ret = PTR_ERR_OR_ZERO(n);
if (ret)
@@ -289,12 +331,10 @@ static int bch2_get_update_rebalance_opts(struct btree_trans *trans,
/* On successfull transaction commit, @k was invalidated: */
- ret = bch2_bkey_set_needs_rebalance(trans, NULL, io_opts, n, ctx, 0) ?:
+ return bch2_bkey_set_needs_rebalance(trans, snapshot_io_opts, io_opts, n, ctx, 0) ?:
bch2_trans_update(trans, iter, n, BTREE_UPDATE_internal_snapshot_node) ?:
bch2_trans_commit(trans, NULL, NULL, 0) ?:
bch_err_throw(c, transaction_restart_commit);
-fsck_err:
- return ret;
}
static struct bch_inode_opts *bch2_extent_get_io_opts(struct btree_trans *trans,
@@ -334,7 +374,8 @@ static struct bch_inode_opts *bch2_extent_get_io_opts(struct btree_trans *trans,
darray_push(&io_opts->d, e);
}));
- io_opts->cur_inum = extent_pos.inode;
+ io_opts->cur_inum = extent_pos.inode;
+ io_opts->inum_scan_cookie = false;
}
ret = ret ?: trans_was_restarted(trans, restart_count);
@@ -357,12 +398,13 @@ struct bch_inode_opts *bch2_extent_get_apply_io_opts(struct btree_trans *trans,
enum set_needs_rebalance_ctx ctx)
{
struct bch_inode_opts *opts =
- bch2_extent_get_io_opts(trans, snapshot_io_opts, extent_pos, extent_iter, extent_k);
+ bch2_extent_get_io_opts(trans, snapshot_io_opts,
+ extent_pos, extent_iter, extent_k);
if (IS_ERR(opts) || btree_iter_path(trans, extent_iter)->level)
return opts;
- int ret = bch2_get_update_rebalance_opts(trans, opts, extent_iter, extent_k,
- SET_NEEDS_REBALANCE_other);
+ int ret = bch2_get_update_rebalance_opts(trans, snapshot_io_opts, opts,
+ extent_iter, extent_k, ctx);
return ret ? ERR_PTR(ret) : opts;
}
@@ -393,8 +435,7 @@ int bch2_extent_get_io_opts_one(struct btree_trans *trans,
}
}
- return bch2_get_update_rebalance_opts(trans, io_opts, extent_iter, extent_k,
- ctx);
+ return bch2_get_update_rebalance_opts(trans, NULL, io_opts, extent_iter, extent_k, ctx);
}
static const char * const bch2_rebalance_state_strs[] = {
@@ -507,23 +548,6 @@ static struct bkey_i *next_rebalance_entry(struct btree_trans *trans,
return &(&darray_pop(buf))->k_i;
}
-static int bch2_bkey_clear_needs_rebalance(struct btree_trans *trans,
- struct btree_iter *iter,
- struct bkey_s_c k)
-{
- if (k.k->type == KEY_TYPE_reflink_v || !bch2_bkey_rebalance_opts(k))
- return 0;
-
- struct bkey_i *n = bch2_bkey_make_mut(trans, iter, &k, 0);
- int ret = PTR_ERR_OR_ZERO(n);
- if (ret)
- return ret;
-
- extent_entry_drop(bkey_i_to_s(n),
- (void *) bch2_bkey_rebalance_opts(bkey_i_to_s_c(n)));
- return bch2_trans_commit(trans, NULL, NULL, BCH_TRANS_COMMIT_no_enospc);
-}
-
static struct bkey_s_c next_rebalance_extent(struct btree_trans *trans,
struct per_snapshot_io_opts *snapshot_io_opts,
struct bpos work_pos,
@@ -552,22 +576,23 @@ static struct bkey_s_c next_rebalance_extent(struct btree_trans *trans,
*opts_ret = opts;
+ unsigned move_ptrs = 0;
+ unsigned compress_ptrs = 0;
+ unsigned csum_ptrs = 0;
+ struct bch_extent_rebalance r =
+ bch2_bkey_needs_rebalance(c, k, opts, &move_ptrs, &compress_ptrs, &csum_ptrs, false);
+
memset(data_opts, 0, sizeof(*data_opts));
- data_opts->rewrite_ptrs = bch2_bkey_ptrs_need_rebalance(c, opts, k);
+ data_opts->rewrite_ptrs = move_ptrs|compress_ptrs|csum_ptrs;
data_opts->target = opts->background_target;
data_opts->write_flags |= BCH_WRITE_only_specified_devs;
- if (!data_opts->rewrite_ptrs) {
- /*
- * device we would want to write to offline? devices in target
- * changed?
- *
- * We'll now need a full scan before this extent is picked up
- * again:
- */
- int ret = bch2_bkey_clear_needs_rebalance(trans, extent_iter, k);
- if (ret)
- return bkey_s_c_err(ret);
+ if (!data_opts->rewrite_ptrs &&
+ !data_opts->kill_ptrs &&
+ !data_opts->kill_ec_ptrs &&
+ !data_opts->extra_replicas) {
+ /* XXX: better error message */
+ bch_err(c, "goto extent to rebalance but nothing to do, confused");
return bkey_s_c_null;
}
@@ -577,13 +602,6 @@ static struct bkey_s_c next_rebalance_extent(struct btree_trans *trans,
bch2_bkey_val_to_text(&buf, c, k);
prt_newline(&buf);
- unsigned move_ptrs = 0;
- unsigned compress_ptrs = 0;
- unsigned csum_ptrs = 0;
- u64 sectors = 0;
-
- bch2_bkey_needs_rebalance(c, k, opts, &move_ptrs, &compress_ptrs, &csum_ptrs, &sectors);
-
if (move_ptrs) {
prt_str(&buf, "move=");
bch2_target_to_text(&buf, c, opts->background_target);
@@ -671,6 +689,7 @@ static int do_rebalance_extent(struct moving_context *ctxt,
static int do_rebalance_scan_indirect(struct btree_trans *trans,
struct bkey_s_c_reflink_p p,
+ struct per_snapshot_io_opts *snapshot_io_opts,
struct bch_inode_opts *opts)
{
u64 idx = REFLINK_P_IDX(p.v) - le32_to_cpu(p.v->front_pad);
@@ -681,7 +700,7 @@ static int do_rebalance_scan_indirect(struct btree_trans *trans,
POS(0, idx), BTREE_ITER_not_extents, k, ({
if (bpos_ge(bkey_start_pos(k.k), POS(0, end)))
break;
- bch2_get_update_rebalance_opts(trans, opts, &iter, k,
+ bch2_get_update_rebalance_opts(trans, snapshot_io_opts, opts, &iter, k,
SET_NEEDS_REBALANCE_opt_change_indirect);
}));
if (ret)
@@ -726,7 +745,8 @@ static int do_rebalance_scan(struct moving_context *ctxt,
(inum &&
k.k->type == KEY_TYPE_reflink_p &&
REFLINK_P_MAY_UPDATE_OPTIONS(bkey_s_c_to_reflink_p(k).v)
- ? do_rebalance_scan_indirect(trans, bkey_s_c_to_reflink_p(k), opts)
+ ? do_rebalance_scan_indirect(trans, bkey_s_c_to_reflink_p(k),
+ snapshot_io_opts, opts)
: 0);
})) ?:
commit_do(trans, NULL, NULL, BCH_TRANS_COMMIT_no_enospc,
@@ -1047,8 +1067,7 @@ static int check_rebalance_work_one(struct btree_trans *trans,
extent_k.k = &deleted;
}
- bool should_have_rebalance =
- bch2_bkey_sectors_need_rebalance(c, extent_k) != 0;
+ bool should_have_rebalance = bch2_bkey_needs_rb(extent_k);
bool have_rebalance = rebalance_k.k->type == KEY_TYPE_set;
if (should_have_rebalance != have_rebalance) {
diff --git a/fs/bcachefs/rebalance.h b/fs/bcachefs/rebalance.h
index fd873894c8b6..dde7e4cb9533 100644
--- a/fs/bcachefs/rebalance.h
+++ b/fs/bcachefs/rebalance.h
@@ -10,7 +10,7 @@
static inline struct bch_extent_rebalance io_opts_to_rebalance_opts(struct bch_fs *c,
struct bch_inode_opts *opts)
{
- struct bch_extent_rebalance r = {
+ return (struct bch_extent_rebalance) {
.type = BIT(BCH_EXTENT_ENTRY_rebalance),
#define x(_name) \
._name = opts->_name, \
@@ -18,15 +18,15 @@ static inline struct bch_extent_rebalance io_opts_to_rebalance_opts(struct bch_f
BCH_REBALANCE_OPTS()
#undef x
};
-
- if (r.background_target &&
- !bch2_target_accepts_data(c, BCH_DATA_user, r.background_target))
- r.background_target = 0;
-
- return r;
};
-u64 bch2_bkey_sectors_need_rebalance(struct bch_fs *, struct bkey_s_c);
+const struct bch_extent_rebalance *bch2_bkey_rebalance_opts(struct bkey_s_c);
+
+static inline int bch2_bkey_needs_rb(struct bkey_s_c k)
+{
+ const struct bch_extent_rebalance *r = bch2_bkey_rebalance_opts(k);
+ return r ? r->need_rb : 0;
+}
/* Inodes in different snapshots may have different IO options: */
struct snapshot_io_opts_entry {
@@ -36,6 +36,9 @@ struct snapshot_io_opts_entry {
struct per_snapshot_io_opts {
u64 cur_inum;
+ bool fs_scan_cookie;
+ bool inum_scan_cookie;
+
struct bch_inode_opts fs_io_opts;
DARRAY(struct snapshot_io_opts_entry) d;
};
@@ -60,7 +63,7 @@ enum set_needs_rebalance_ctx {
int bch2_bkey_set_needs_rebalance(struct btree_trans *,
struct per_snapshot_io_opts *, struct bch_inode_opts *,
- struct bkey_i *, enum set_needs_rebalance_ctx, u32);
+ struct bkey_i *, enum set_needs_rebalance_ctx, unsigned);
struct bch_inode_opts *bch2_extent_get_apply_io_opts(struct btree_trans *,
struct per_snapshot_io_opts *, struct bpos,
diff --git a/fs/bcachefs/rebalance_format.h b/fs/bcachefs/rebalance_format.h
index ff9a1342a22b..c744a29c8fa5 100644
--- a/fs/bcachefs/rebalance_format.h
+++ b/fs/bcachefs/rebalance_format.h
@@ -2,52 +2,64 @@
#ifndef _BCACHEFS_REBALANCE_FORMAT_H
#define _BCACHEFS_REBALANCE_FORMAT_H
+/* subset of BCH_INODE_OPTS */
+#define BCH_REBALANCE_OPTS() \
+ x(data_replicas) \
+ x(data_checksum) \
+ x(erasure_code) \
+ x(background_compression) \
+ x(background_target) \
+ x(promote_target)
+
+enum bch_rebalance_opts {
+#define x(n) BCH_REBALANCE_##n,
+ BCH_REBALANCE_OPTS()
+#undef x
+};
+
struct bch_extent_rebalance {
#if defined(__LITTLE_ENDIAN_BITFIELD)
__u64 type:6,
- unused:3,
+ unused:5,
+ hipri:1,
+ pending:1,
+ need_rb:5,
- promote_target_from_inode:1,
- erasure_code_from_inode:1,
+ data_replicas_from_inode:1,
data_checksum_from_inode:1,
+ erasure_code_from_inode:1,
background_compression_from_inode:1,
- data_replicas_from_inode:1,
background_target_from_inode:1,
+ promote_target_from_inode:1,
- promote_target:16,
- erasure_code:1,
+ data_replicas:3,
data_checksum:4,
- data_replicas:4,
+ erasure_code:1,
background_compression:8, /* enum bch_compression_opt */
- background_target:16;
+ background_target:12,
+ promote_target:12;
#elif defined (__BIG_ENDIAN_BITFIELD)
- __u64 background_target:16,
+ __u64 promote_target:12,
+ background_target:12,
background_compression:8,
- data_replicas:4,
- data_checksum:4,
erasure_code:1,
- promote_target:16,
+ data_checksum:4,
+ data_replicas:3,
+ promote_target_from_inode:1,
background_target_from_inode:1,
- data_replicas_from_inode:1,
background_compression_from_inode:1,
- data_checksum_from_inode:1,
erasure_code_from_inode:1,
- promote_target_from_inode:1,
+ data_checksum_from_inode:1,
+ data_replicas_from_inode:1,
- unused:3,
+ need_rb:5,
+ pending:1,
+ hipri:1,
+ unused:5,
type:6;
#endif
};
-/* subset of BCH_INODE_OPTS */
-#define BCH_REBALANCE_OPTS() \
- x(data_checksum) \
- x(background_compression) \
- x(data_replicas) \
- x(promote_target) \
- x(background_target) \
- x(erasure_code)
-
#endif /* _BCACHEFS_REBALANCE_FORMAT_H */
diff --git a/fs/bcachefs/trace.h b/fs/bcachefs/trace.h
index 269cdf1a87a4..915c3201fe16 100644
--- a/fs/bcachefs/trace.h
+++ b/fs/bcachefs/trace.h
@@ -1331,11 +1331,6 @@ DEFINE_EVENT(fs_str, io_move_pred,
TP_ARGS(c, str)
);
-DEFINE_EVENT(fs_str, io_move_created_rebalance,
- TP_PROTO(struct bch_fs *c, const char *str),
- TP_ARGS(c, str)
-);
-
DEFINE_EVENT(fs_str, io_move_evacuate_bucket,
TP_PROTO(struct bch_fs *c, const char *str),
TP_ARGS(c, str)
--
2.50.1
* [PATCH 18/21] bcachefs: bch2_set_rebalance_needs_scan_device()
2025-08-24 12:37 [PATCH 00/21] Big rebalance changes Kent Overstreet
` (16 preceding siblings ...)
2025-08-24 12:37 ` [PATCH 17/21] bcachefs: bch_extent_rebalance changes Kent Overstreet
@ 2025-08-24 12:37 ` Kent Overstreet
2025-08-24 12:37 ` [PATCH 19/21] bcachefs: next_rebalance_extent() now handles replicas changes Kent Overstreet
` (2 subsequent siblings)
20 siblings, 0 replies; 22+ messages in thread
From: Kent Overstreet @ 2025-08-24 12:37 UTC (permalink / raw)
To: linux-bcachefs; +Cc: Kent Overstreet
Rebalance can now evacuate devices in response to state changes.
This obsoletes BCH_DATA_OP_migrate; setting a device to
BCH_MEMBER_STATE_failed (perhaps we should rename this) will cause it to
be evacuated (and the evacuate will resume if e.g. we crash or shutdown
and restart).
Additionally, we'll now be able to automatically evacuate failing
devices. Currently we only set devices read-only in response to IO
errors; we'll need to add configuration/policy/good heuristics (and
clearly document them) for deciding when a device is failing and should
be evacuated.
This works with rebalance scan cookies; these are currently used to
respond to filesystem/inode option changes. Cookies in the range of
1-4095 now refer to devices; when rebalance sees one of those it will
walk backpointers on that device and update bch_extent_rebalance, which
will react to the new device state (or durability setting change).
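A standalone sketch of the resulting cookie position encoding (this assumes
BCACHEFS_ROOT_INO is 4096, which is what the 1-4095 device range above implies;
the helper name here is made up):

    #include <stdint.h>
    #include <stdio.h>

    #define ROOT_INO 4096   /* assumption: BCACHEFS_ROOT_INO */

    enum scan_kind { SCAN_FS, SCAN_DEVICE, SCAN_INODE };

    /* Classify a rebalance scan cookie by the inum field of its position: */
    static enum scan_kind scan_cookie_kind(uint64_t inum, unsigned *dev)
    {
            if (!inum)
                    return SCAN_FS;             /* whole-filesystem scan */
            if (inum < ROOT_INO) {
                    *dev = inum - 1;            /* device scan: cookie inum is dev + 1 */
                    return SCAN_DEVICE;
            }
            return SCAN_INODE;                  /* per-inode scan */
    }

    int main(void)
    {
            unsigned dev = 0;

            printf("%d\n",    scan_cookie_kind(0, &dev));
            printf("%d %u\n", scan_cookie_kind(3, &dev), dev);     /* device 2 */
            printf("%d\n",    scan_cookie_kind(4242, &dev));       /* inode 4242 */
            return 0;
    }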
Performance implications: with BCH_DATA_OP_migrate, we walk backpointers
and do the data moves directly, meaning they happen in device LBA order.
However, by walking backpointers to queue up rebalance work entries and
then doing the work from the rebalance_work btree, we'll do the data
moves in logical key order.
Pro: doing data moves in logical key order will help with
fragmentation/data locality: extents from the same inode will be moved
at the same time, so we'll get a bit of defragmentation and do better
at keeping related data together.
Con: reads from the device being evacuated will no longer be
sequential, which will hurt performance on spinning rust.
Perhaps add a mode where we kick off data moves from
do_rebalance_scan_bp()? Would be pretty easy
XXX: slurp backpointers into a darray and sort before processing extents
in do_rebalance_scan_device: we recently saw a very slow evacuate that
was mostly just dropping cached data, on a huge filesystem entirely on
spinning rust with only 8GB of ram in the server - the backpointers ->
extents lookups are fairly random; batching + sorting will greatly
improve performance
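One possible shape of that batching, sketched standalone (the struct and field
names here are made up; the real version would fill a darray from the
backpointers btree, sort it, then do the extents lookups in sorted order):

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* A slurped backpointer target: where the data lives on the device, and
     * the extent position it points at.  Sorting a batch by extent position
     * turns the random backpointer -> extent lookups into an ordered pass. */
    struct bp_target {
            uint64_t dev_offset;
            uint64_t extent_inum;
            uint64_t extent_offset;
    };

    static int bp_target_cmp(const void *_l, const void *_r)
    {
            const struct bp_target *l = _l, *r = _r;

            if (l->extent_inum != r->extent_inum)
                    return l->extent_inum < r->extent_inum ? -1 : 1;
            if (l->extent_offset != r->extent_offset)
                    return l->extent_offset < r->extent_offset ? -1 : 1;
            return 0;
    }

    int main(void)
    {
            struct bp_target batch[] = {
                    { .dev_offset = 10, .extent_inum = 9000, .extent_offset = 8 },
                    { .dev_offset = 20, .extent_inum = 4097, .extent_offset = 0 },
                    { .dev_offset = 30, .extent_inum = 4097, .extent_offset = 128 },
            };

            qsort(batch, 3, sizeof(batch[0]), bp_target_cmp);

            for (unsigned i = 0; i < 3; i++)
                    printf("%llu:%llu\n",
                           (unsigned long long) batch[i].extent_inum,
                           (unsigned long long) batch[i].extent_offset);
            return 0;
    }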
XXX: add a superblock bit to make this transactional, if we crash
between the write_super for the member state/durability change and
creating the device scan cookie
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
---
fs/bcachefs/rebalance.c | 77 +++++++++++++++++++++++++++++++++++++++--
fs/bcachefs/rebalance.h | 1 +
fs/bcachefs/super.c | 4 +++
fs/bcachefs/sysfs.c | 4 +++
4 files changed, 83 insertions(+), 3 deletions(-)
diff --git a/fs/bcachefs/rebalance.c b/fs/bcachefs/rebalance.c
index 3a6cd54613a1..7ebd1c982810 100644
--- a/fs/bcachefs/rebalance.c
+++ b/fs/bcachefs/rebalance.c
@@ -3,6 +3,7 @@
#include "bcachefs.h"
#include "alloc_background.h"
#include "alloc_foreground.h"
+#include "backpointers.h"
#include "btree_iter.h"
#include "btree_update.h"
#include "btree_write_buffer.h"
@@ -480,6 +481,11 @@ int bch2_set_rebalance_needs_scan(struct bch_fs *c, u64 inum)
return ret;
}
+int bch2_set_rebalance_needs_scan_device(struct bch_fs *c, unsigned dev)
+{
+ return bch2_set_rebalance_needs_scan(c, dev + 1);
+}
+
int bch2_set_fs_needs_rebalance(struct bch_fs *c)
{
return bch2_set_rebalance_needs_scan(c, 0);
@@ -687,6 +693,65 @@ static int do_rebalance_extent(struct moving_context *ctxt,
return ret;
}
+static int do_rebalance_scan_bp(struct btree_trans *trans,
+ struct bkey_s_c_backpointer bp,
+ struct bkey_buf *last_flushed)
+{
+ struct btree_iter iter;
+ struct bkey_s_c k = bch2_backpointer_get_key(trans, bp, &iter, 0, last_flushed);
+ int ret = bkey_err(k);
+ if (ret)
+ return ret;
+
+ struct bch_inode_opts io_opts;
+ ret = bch2_extent_get_io_opts_one(trans, &io_opts, &iter, k,
+ SET_NEEDS_REBALANCE_opt_change);
+ bch2_trans_iter_exit(&iter);
+ return ret;
+}
+
+static int do_rebalance_scan_device(struct moving_context *ctxt,
+ unsigned dev, u64 cookie,
+ u64 *sectors_scanned)
+{
+ struct btree_trans *trans = ctxt->trans;
+ struct bch_fs *c = trans->c;
+ struct bch_fs_rebalance *r = &c->rebalance;
+
+ struct bkey_buf last_flushed;
+ bch2_bkey_buf_init(&last_flushed);
+ bkey_init(&last_flushed.k->k);
+
+ bch2_btree_write_buffer_flush_sync(trans);
+
+ int ret = for_each_btree_key_max(trans, iter, BTREE_ID_backpointers,
+ POS(dev, 0), POS(dev, U64_MAX),
+ BTREE_ITER_prefetch, k, ({
+ ctxt->stats->pos = BBPOS(iter.btree_id, iter.pos);
+
+ if (k.k->type != KEY_TYPE_backpointer)
+ continue;
+
+ do_rebalance_scan_bp(trans, bkey_s_c_to_backpointer(k), &last_flushed);
+ })) ?:
+ commit_do(trans, NULL, NULL, BCH_TRANS_COMMIT_no_enospc,
+ bch2_clear_rebalance_needs_scan(trans, dev + 1, cookie));
+
+ *sectors_scanned += atomic64_read(&r->scan_stats.sectors_seen);
+ /*
+ * Ensure that the rebalance_work entries we created are seen by the
+ * next iteration of do_rebalance(), so we don't end up stuck in
+ * rebalance_wait():
+ */
+ *sectors_scanned += 1;
+ bch2_move_stats_exit(&r->scan_stats, c);
+
+ bch2_btree_write_buffer_flush_sync(trans);
+
+ bch2_bkey_buf_exit(&last_flushed, c);
+ return ret;
+}
+
static int do_rebalance_scan_indirect(struct btree_trans *trans,
struct bkey_s_c_reflink_p p,
struct per_snapshot_io_opts *snapshot_io_opts,
@@ -722,15 +787,21 @@ static int do_rebalance_scan(struct moving_context *ctxt,
bch2_move_stats_init(&r->scan_stats, "rebalance_scan");
ctxt->stats = &r->scan_stats;
+ r->state = BCH_REBALANCE_scanning;
+
if (!inum) {
r->scan_start = BBPOS_MIN;
r->scan_end = BBPOS_MAX;
- } else {
+ } else if (inum >= BCACHEFS_ROOT_INO) {
r->scan_start = BBPOS(BTREE_ID_extents, POS(inum, 0));
r->scan_end = BBPOS(BTREE_ID_extents, POS(inum, U64_MAX));
- }
+ } else {
+ unsigned dev = inum - 1;
+ r->scan_start = BBPOS(BTREE_ID_backpointers, POS(dev, 0));
+ r->scan_end = BBPOS(BTREE_ID_backpointers, POS(dev, U64_MAX));
- r->state = BCH_REBALANCE_scanning;
+ return do_rebalance_scan_device(ctxt, inum - 1, cookie, sectors_scanned);
+ }
int ret = for_each_btree_key_max(trans, iter, BTREE_ID_extents,
r->scan_start.pos, r->scan_end.pos,
diff --git a/fs/bcachefs/rebalance.h b/fs/bcachefs/rebalance.h
index dde7e4cb9533..f6b74d5e1210 100644
--- a/fs/bcachefs/rebalance.h
+++ b/fs/bcachefs/rebalance.h
@@ -76,6 +76,7 @@ int bch2_extent_get_io_opts_one(struct btree_trans *, struct bch_inode_opts *,
int bch2_set_rebalance_needs_scan_trans(struct btree_trans *, u64);
int bch2_set_rebalance_needs_scan(struct bch_fs *, u64 inum);
+int bch2_set_rebalance_needs_scan_device(struct bch_fs *, unsigned);
int bch2_set_fs_needs_rebalance(struct bch_fs *);
static inline void bch2_rebalance_wakeup(struct bch_fs *c)
diff --git a/fs/bcachefs/super.c b/fs/bcachefs/super.c
index 793c16fa8b09..b8746a3dd782 100644
--- a/fs/bcachefs/super.c
+++ b/fs/bcachefs/super.c
@@ -1952,6 +1952,10 @@ int __bch2_dev_set_state(struct bch_fs *c, struct bch_dev *ca,
if (new_state == BCH_MEMBER_STATE_rw)
__bch2_dev_read_write(c, ca);
+ /* XXX: add a superblock bit to make this transactional */
+ if (new_state == BCH_MEMBER_STATE_failed)
+ bch2_set_rebalance_needs_scan_device(c, ca->dev_idx);
+
bch2_rebalance_wakeup(c);
return ret;
diff --git a/fs/bcachefs/sysfs.c b/fs/bcachefs/sysfs.c
index bd3fa9c3372d..62ad13b34364 100644
--- a/fs/bcachefs/sysfs.c
+++ b/fs/bcachefs/sysfs.c
@@ -807,6 +807,10 @@ static ssize_t sysfs_opt_store(struct bch_fs *c,
if (!ca)
bch2_opt_set_by_id(&c->opts, id, v);
+ /* XXX: add a superblock bit to make this transactional */
+ if (id == Opt_durability)
+ bch2_set_rebalance_needs_scan_device(c, ca->dev_idx);
+
if (changed)
bch2_opt_hook_post_set(c, ca, 0, &c->opts, id);
--
2.50.1
* [PATCH 19/21] bcachefs: next_rebalance_extent() now handles replicas changes
2025-08-24 12:37 [PATCH 00/21] Big rebalance changes Kent Overstreet
` (17 preceding siblings ...)
2025-08-24 12:37 ` [PATCH 18/21] bcachefs: bch2_set_rebalance_needs_scan_device() Kent Overstreet
@ 2025-08-24 12:37 ` Kent Overstreet
2025-08-24 12:37 ` [PATCH 20/21] bcachefs: rebalance: erasure_code opt change now handled Kent Overstreet
2025-08-24 12:37 ` [PATCH 21/21] bcachefs: rebalance_work_(hipri|pending) btrees Kent Overstreet
20 siblings, 0 replies; 22+ messages in thread
From: Kent Overstreet @ 2025-08-24 12:37 UTC (permalink / raw)
To: linux-bcachefs; +Cc: Kent Overstreet
This obsoletes BCH_DATA_OP_drop_extra_replicas, BCH_DATA_OP_rereplicate.
(And the code is taken from drop_extra_replicas_pred() and
rereplicate_pred() in move.c).
Changes to the data_replicas setting will now automatically be applied
to existing data without any additional user action.
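A standalone model of the per-extent decision for data_replicas (durability
values are made up; the real code additionally drops zero-durability pointers
and handles erasure coded pointers separately, as in the hunk below):

    #include <stdio.h>

    /* Decide how to bring an extent to data_replicas worth of durability:
     * either request extra replicas, or pick pointers to drop while staying
     * at or above the target. */
    static void replicas_adjust(const unsigned *dur, unsigned nr, unsigned data_replicas,
                                unsigned *extra_replicas, unsigned *kill_ptrs)
    {
            unsigned durability = 0;

            *extra_replicas = 0;
            *kill_ptrs = 0;

            for (unsigned i = 0; i < nr; i++)
                    durability += dur[i];

            if (durability <= data_replicas) {
                    *extra_replicas = data_replicas - durability;
            } else {
                    for (unsigned i = 0; i < nr; i++)
                            if (dur[i] && durability - dur[i] >= data_replicas) {
                                    *kill_ptrs |= 1U << i;
                                    durability -= dur[i];
                            }
            }
    }

    int main(void)
    {
            unsigned dur[] = { 1, 1, 1 }, extra, kill;

            replicas_adjust(dur, 3, 2, &extra, &kill);      /* drops one pointer */
            printf("extra=%u kill=0x%x\n", extra, kill);

            replicas_adjust(dur, 1, 2, &extra, &kill);      /* needs one more replica */
            printf("extra=%u kill=0x%x\n", extra, kill);
            return 0;
    }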
NOTE: we don't yet have a mechanism for guessing how much space an
option change will require (and reserving said space), until we have
that users will have to be careful not to run a filesystem out of free
space with an option change.
XXX: support metadata
XXX: BCH_DATA_OP_rereplicate additionally calls bch2_replicas_gc2():
this clears out stale replicas entries from the superblock (based on the
accounting btree), ensuring the "can we mount degraded without missing
data?" table is accurate.
We'll need to figure out a new place to plumb this.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
---
fs/bcachefs/rebalance.c | 42 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 42 insertions(+)
diff --git a/fs/bcachefs/rebalance.c b/fs/bcachefs/rebalance.c
index 7ebd1c982810..29e12dbf3710 100644
--- a/fs/bcachefs/rebalance.c
+++ b/fs/bcachefs/rebalance.c
@@ -593,6 +593,48 @@ static struct bkey_s_c next_rebalance_extent(struct btree_trans *trans,
data_opts->target = opts->background_target;
data_opts->write_flags |= BCH_WRITE_only_specified_devs;
+ if (r.need_rb & BIT(BCH_REBALANCE_data_replicas)) {
+ unsigned durability = bch2_bkey_durability(c, k);
+ struct bkey_ptrs_c ptrs = bch2_bkey_ptrs_c(k);
+ unsigned ptr_bit = 1;
+
+ guard(rcu)();
+ if (durability <= opts->data_replicas) {
+ bkey_for_each_ptr(ptrs, ptr) {
+ struct bch_dev *ca = bch2_dev_rcu_noerror(c, ptr->dev);
+ if (ca && !ptr->cached && !ca->mi.durability)
+ data_opts->kill_ptrs |= ptr_bit;
+ ptr_bit <<= 1;
+ }
+
+ data_opts->extra_replicas = opts->data_replicas - durability;
+ } else {
+ const union bch_extent_entry *entry;
+ struct extent_ptr_decoded p;
+
+ bkey_for_each_ptr_decode(k.k, ptrs, p, entry) {
+ unsigned d = bch2_extent_ptr_durability(c, &p);
+
+ if (d && durability - d >= opts->data_replicas) {
+ data_opts->kill_ptrs |= ptr_bit;
+ durability -= d;
+ }
+
+ ptr_bit <<= 1;
+ }
+
+ ptr_bit = 1;
+ bkey_for_each_ptr_decode(k.k, ptrs, p, entry) {
+ if (p.has_ec && durability - p.ec.redundancy >= opts->data_replicas) {
+ data_opts->kill_ec_ptrs |= ptr_bit;
+ durability -= p.ec.redundancy;
+ }
+
+ ptr_bit <<= 1;
+ }
+ }
+ }
+
if (!data_opts->rewrite_ptrs &&
!data_opts->kill_ptrs &&
!data_opts->kill_ec_ptrs &&
--
2.50.1
* [PATCH 20/21] bcachefs: rebalance: erasure_code opt change now handled
2025-08-24 12:37 [PATCH 00/21] Big rebalance changes Kent Overstreet
` (18 preceding siblings ...)
2025-08-24 12:37 ` [PATCH 19/21] bcachefs: next_rebalance_extent() now handles replicas changes Kent Overstreet
@ 2025-08-24 12:37 ` Kent Overstreet
2025-08-24 12:37 ` [PATCH 21/21] bcachefs: rebalance_work_(hipri|pending) btrees Kent Overstreet
20 siblings, 0 replies; 22+ messages in thread
From: Kent Overstreet @ 2025-08-24 12:37 UTC (permalink / raw)
To: linux-bcachefs; +Cc: Kent Overstreet
We had no existing mechanism for applying or removing erasure coding on
existing data, so nothing is being obsoleted here.
Like other IO path options, rebalance now ensures that data is stored
correctly according to the current erasure_code option.
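The rule itself is small: with erasure_code newly enabled, extra_replicas is
capped at one (the rewrite will land in an EC stripe, which carries its own
redundancy), and with it disabled, stripe pointers are dropped and their
redundancy is made up with plain replicated copies. A per-pointer toy sketch
with made-up names, not bcachefs code (the real change is the hunk below):

static unsigned toy_apply_erasure_code(unsigned want_ec,
				       unsigned ptr_has_ec, unsigned ec_redundancy,
				       unsigned extra_replicas, unsigned *kill_ec_ptr)
{
	if (want_ec) {
		/* Turning EC on: cap extra plain copies at one. */
		return extra_replicas > 1 ? 1 : extra_replicas;
	}

	if (ptr_has_ec) {
		/* Turning EC off: drop the stripe pointer... */
		*kill_ec_ptr = 1;
		/* ...and replace its redundancy with plain replicas. */
		extra_replicas += ec_redundancy;
	}

	return extra_replicas;
}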
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
---
fs/bcachefs/rebalance.c | 24 ++++++++++++++++++++----
1 file changed, 20 insertions(+), 4 deletions(-)
diff --git a/fs/bcachefs/rebalance.c b/fs/bcachefs/rebalance.c
index 29e12dbf3710..525e4d1716c5 100644
--- a/fs/bcachefs/rebalance.c
+++ b/fs/bcachefs/rebalance.c
@@ -593,9 +593,12 @@ static struct bkey_s_c next_rebalance_extent(struct btree_trans *trans,
data_opts->target = opts->background_target;
data_opts->write_flags |= BCH_WRITE_only_specified_devs;
+ struct bkey_ptrs_c ptrs = bch2_bkey_ptrs_c(k);
+ const union bch_extent_entry *entry;
+ struct extent_ptr_decoded p;
+
if (r.need_rb & BIT(BCH_REBALANCE_data_replicas)) {
unsigned durability = bch2_bkey_durability(c, k);
- struct bkey_ptrs_c ptrs = bch2_bkey_ptrs_c(k);
unsigned ptr_bit = 1;
guard(rcu)();
@@ -609,9 +612,6 @@ static struct bkey_s_c next_rebalance_extent(struct btree_trans *trans,
data_opts->extra_replicas = opts->data_replicas - durability;
} else {
- const union bch_extent_entry *entry;
- struct extent_ptr_decoded p;
-
bkey_for_each_ptr_decode(k.k, ptrs, p, entry) {
unsigned d = bch2_extent_ptr_durability(c, &p);
@@ -635,6 +635,22 @@ static struct bkey_s_c next_rebalance_extent(struct btree_trans *trans,
}
}
+ if (r.need_rb & BIT(BCH_REBALANCE_erasure_code)) {
+ if (opts->erasure_code) {
+ data_opts->extra_replicas = min(data_opts->extra_replicas, 1);
+ } else {
+ unsigned ptr_bit = 1;
+ bkey_for_each_ptr_decode(k.k, ptrs, p, entry) {
+ if (p.has_ec) {
+ data_opts->kill_ec_ptrs |= ptr_bit;
+ data_opts->extra_replicas += p.ec.redundancy;
+ }
+
+ ptr_bit <<= 1;
+ }
+ }
+ }
+
if (!data_opts->rewrite_ptrs &&
!data_opts->kill_ptrs &&
!data_opts->kill_ec_ptrs &&
--
2.50.1
* [PATCH 21/21] bcachefs: rebalance_work_(hipri|pending) btrees
2025-08-24 12:37 [PATCH 00/21] Big rebalance changes Kent Overstreet
` (19 preceding siblings ...)
2025-08-24 12:37 ` [PATCH 20/21] bcachefs: rebalance: erasure_code opt change now handled Kent Overstreet
@ 2025-08-24 12:37 ` Kent Overstreet
20 siblings, 0 replies; 22+ messages in thread
From: Kent Overstreet @ 2025-08-24 12:37 UTC (permalink / raw)
To: linux-bcachefs; +Cc: Kent Overstreet
Add two more btrees analogous to the rebalance_work btree: bitset btrees
that refer to extents in the extents and reflink btrees.
rebalance_work_hipri: this is for extents that are being moved off of a
BCH_MEMBER_STATE_failed device (device evacuations; we'll likely rename
that member state).
This ensures that evacuations aren't blocked behind other work, e.g.
background_compression.
rebalance_work_pending: this is for extents that we'd like to move but
can't - currently this only happens when the target is full. When we
detect this we can set the pending bit in bch_extent_rebalance; extents
in this btree won't be processed until some external event happens (e.g.
a new device is added to the array).
This fixes a bug where rebalance will spin when more data is stored in
the filesystem than fits in the specified target (e.g. a tiered SSD/HDD
setup with more data than fits on background_target).
NOTE - we'd really like extents in rebalance_work_pending to additionally
be indexed by the target we want to move them to; a large, complicated
setup can have many targets, with different directories assigned to
different targets. We don't want to have to scan all of
rebalance_work_pending, just the extents for the target that now has free
space.
We can't do that yet, though: that will require support for btrees with
larger integer keys (currently btree keys are fixed-width 160 bit
integers).
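For reference (my paraphrase, not part of the patch): a btree key position
today is three fixed-width fields totalling 160 bits, so there's no room for
an additional target field:

#include <linux/types.h>

/* Roughly the current key position; the actual definition lives in
 * bcachefs_format.h and its field order depends on endianness: */
struct bpos {
	__u64	inode;		/* 64 bits */
	__u64	offset;		/* 64 bits */
	__u32	snapshot;	/* 32 bits -> 160 bits total */
};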
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
---
fs/bcachefs/bcachefs_format.h | 8 +++
fs/bcachefs/buckets.c | 32 +---------
fs/bcachefs/extents.c | 10 ----
fs/bcachefs/extents.h | 11 ++++
fs/bcachefs/rebalance.c | 106 +++++++++++++++++++++++++++++-----
fs/bcachefs/rebalance.h | 20 ++++++-
6 files changed, 129 insertions(+), 58 deletions(-)
diff --git a/fs/bcachefs/bcachefs_format.h b/fs/bcachefs/bcachefs_format.h
index b2de993d802b..28c0c876e14b 100644
--- a/fs/bcachefs/bcachefs_format.h
+++ b/fs/bcachefs/bcachefs_format.h
@@ -1425,6 +1425,14 @@ enum btree_id_flags {
BTREE_IS_snapshot_field| \
BTREE_IS_write_buffer, \
BIT_ULL(KEY_TYPE_accounting)) \
+ x(rebalance_work_hipri, 21, \
+ BTREE_IS_snapshot_field| \
+ BTREE_IS_write_buffer, \
+ BIT_ULL(KEY_TYPE_set)) \
+ x(rebalance_work_pending, 22, \
+ BTREE_IS_snapshot_field| \
+ BTREE_IS_write_buffer, \
+ BIT_ULL(KEY_TYPE_set)) \
enum btree_id {
#define x(name, nr, ...) BTREE_ID_##name = nr,
diff --git a/fs/bcachefs/buckets.c b/fs/bcachefs/buckets.c
index 436634a5f77c..afa97af1fb8b 100644
--- a/fs/bcachefs/buckets.c
+++ b/fs/bcachefs/buckets.c
@@ -901,35 +901,9 @@ int bch2_trigger_extent(struct btree_trans *trans,
return ret;
}
- unsigned old_r = bch2_bkey_needs_rb(old);
- unsigned new_r = bch2_bkey_needs_rb(new.s_c);
- if (old_r != new_r) {
- /* XXX: slowpath, put in a a separate function */
- int delta = (int) !!new_r - (int) !!old_r;
- if ((flags & BTREE_TRIGGER_transactional) && delta) {
- int ret = bch2_btree_bit_mod_buffered(trans, BTREE_ID_rebalance_work,
- new.k->p, delta > 0);
- if (ret)
- return ret;
- }
-
- s64 v[1] = { 0 };
-#define x(n) \
- if ((old_r ^ new_r) & BIT(BCH_REBALANCE_##n)) { \
- v[0] = old_r & BIT(BCH_REBALANCE_##n) \
- ? -(s64) old.k->size \
- : new.k->size; \
- \
- int ret = bch2_disk_accounting_mod2(trans, \
- flags & BTREE_TRIGGER_gc, \
- v, rebalance_work, \
- BCH_REBALANCE_##n); \
- if (ret) \
- return ret; \
- }
- BCH_REBALANCE_OPTS()
-#undef x
- }
+ int ret = bch2_trigger_extent_rebalance(trans, old, new, flags);
+ if (ret)
+ return ret;
}
return 0;
diff --git a/fs/bcachefs/extents.c b/fs/bcachefs/extents.c
index 016242ffc98d..6c3964d3efca 100644
--- a/fs/bcachefs/extents.c
+++ b/fs/bcachefs/extents.c
@@ -803,16 +803,6 @@ unsigned bch2_bkey_replicas(struct bch_fs *c, struct bkey_s_c k)
return replicas;
}
-static inline unsigned __extent_ptr_durability(struct bch_dev *ca, struct extent_ptr_decoded *p)
-{
- if (p->ptr.cached)
- return 0;
-
- return p->has_ec
- ? p->ec.redundancy + 1
- : ca->mi.durability;
-}
-
unsigned bch2_extent_ptr_desired_durability(struct bch_fs *c, struct extent_ptr_decoded *p)
{
struct bch_dev *ca = bch2_dev_rcu_noerror(c, p->ptr.dev);
diff --git a/fs/bcachefs/extents.h b/fs/bcachefs/extents.h
index 03ea7c689d9a..934754e36854 100644
--- a/fs/bcachefs/extents.h
+++ b/fs/bcachefs/extents.h
@@ -603,6 +603,17 @@ bool bch2_bkey_is_incompressible(struct bkey_s_c);
unsigned bch2_bkey_sectors_compressed(struct bkey_s_c);
unsigned bch2_bkey_replicas(struct bch_fs *, struct bkey_s_c);
+
+static inline unsigned __extent_ptr_durability(struct bch_dev *ca, struct extent_ptr_decoded *p)
+{
+ if (p->ptr.cached)
+ return 0;
+
+ return p->has_ec
+ ? p->ec.redundancy + 1
+ : ca->mi.durability;
+}
+
unsigned bch2_extent_ptr_desired_durability(struct bch_fs *, struct extent_ptr_decoded *);
unsigned bch2_extent_ptr_durability(struct bch_fs *, struct extent_ptr_decoded *);
unsigned bch2_bkey_durability(struct bch_fs *, struct bkey_s_c);
diff --git a/fs/bcachefs/rebalance.c b/fs/bcachefs/rebalance.c
index 525e4d1716c5..b5907e72bcfc 100644
--- a/fs/bcachefs/rebalance.c
+++ b/fs/bcachefs/rebalance.c
@@ -46,6 +46,60 @@ const struct bch_extent_rebalance *bch2_bkey_rebalance_opts(struct bkey_s_c k)
return bch2_bkey_ptrs_rebalance_opts(bch2_bkey_ptrs_c(k));
}
+static enum btree_id rb_work_btree(const struct bch_extent_rebalance *r)
+{
+ if (!r || !r->need_rb)
+ return 0;
+ if (r->hipri)
+ return BTREE_ID_rebalance_work_hipri;
+ if (r->pending)
+ return BTREE_ID_rebalance_work_pending;
+ return BTREE_ID_rebalance_work;
+}
+
+int __bch2_trigger_extent_rebalance(struct btree_trans *trans,
+ struct bkey_s_c old, struct bkey_s_c new,
+ enum btree_iter_update_trigger_flags flags)
+{
+ const struct bch_extent_rebalance *old_r = bch2_bkey_rebalance_opts(old);
+ const struct bch_extent_rebalance *new_r = bch2_bkey_rebalance_opts(new);
+
+ enum btree_id old_btree = rb_work_btree(old_r);
+ enum btree_id new_btree = rb_work_btree(new_r);
+
+ if (old_btree && old_btree != new_btree) {
+ int ret = bch2_btree_bit_mod_buffered(trans, old_btree, old.k->p, false);
+ if (ret)
+ return ret;
+ }
+
+ if (new_btree && old_btree != new_btree) {
+ int ret = bch2_btree_bit_mod_buffered(trans, new_btree, new.k->p, true);
+ if (ret)
+ return ret;
+ }
+
+ unsigned old_n = old_r ? old_r->need_rb : 0;
+ unsigned new_n = new_r ? new_r->need_rb : 0;
+
+ s64 v[1] = { 0 };
+#define x(n) \
+ if ((old_n ^ new_n) & BIT(BCH_REBALANCE_##n)) { \
+ v[0] = old_n & BIT(BCH_REBALANCE_##n) \
+ ? -(s64) old.k->size \
+ : new.k->size; \
+ \
+ int ret = bch2_disk_accounting_mod2(trans, \
+ flags & BTREE_TRIGGER_gc, \
+ v, rebalance_work, \
+ BCH_REBALANCE_##n); \
+ if (ret) \
+ return ret; \
+ }
+ BCH_REBALANCE_OPTS()
+#undef x
+ return 0;
+}
+
static struct bch_extent_rebalance
bch2_bkey_needs_rebalance(struct bch_fs *c, struct bkey_s_c k,
struct bch_inode_opts *opts,
@@ -64,10 +118,6 @@ bch2_bkey_needs_rebalance(struct bch_fs *c, struct bkey_s_c k,
if (bch2_bkey_extent_ptrs_flags(ptrs) & BIT_ULL(BCH_EXTENT_FLAG_poisoned))
return r;
- const struct bch_extent_rebalance *old_r = bch2_bkey_ptrs_rebalance_opts(ptrs);
- if (old_r)
- r = *old_r;
-
#define x(_name) \
if (k.k->type != KEY_TYPE_reflink_v || \
may_update_indirect || \
@@ -96,6 +146,10 @@ bch2_bkey_needs_rebalance(struct bch_fs *c, struct bkey_s_c k,
incompressible |= p.crc.compression_type == BCH_COMPRESSION_TYPE_incompressible;
unwritten |= p.ptr.unwritten;
+ struct bch_dev *ca = bch2_dev_rcu_noerror(c, p.ptr.dev);
+ if (!ca)
+ goto next;
+
if (!p.ptr.cached) {
if (p.crc.compression_type != compression_type)
*compress_ptrs |= ptr_idx;
@@ -106,13 +160,18 @@ bch2_bkey_needs_rebalance(struct bch_fs *c, struct bkey_s_c k,
if (target && !bch2_dev_in_target(c, p.ptr.dev, target))
*move_ptrs |= ptr_idx;
- unsigned d = bch2_extent_ptr_durability(c, &p);
+ if (ca->mi.state == BCH_MEMBER_STATE_failed)
+ r.hipri = 1;
+
+ unsigned d = ca->mi.state != BCH_MEMBER_STATE_failed
+ ? __extent_ptr_durability(ca, &p)
+ : 0;
durability += d;
min_durability = min(min_durability, d);
ec |= p.has_ec;
}
-
+next:
ptr_idx <<= 1;
}
@@ -131,6 +190,10 @@ bch2_bkey_needs_rebalance(struct bch_fs *c, struct bkey_s_c k,
r.need_rb |= BIT(BCH_REBALANCE_data_replicas);
if (*move_ptrs)
r.need_rb |= BIT(BCH_REBALANCE_background_target);
+
+ const struct bch_extent_rebalance *old = bch2_bkey_ptrs_rebalance_opts(ptrs);
+ if (old && !(old->need_rb & ~r.need_rb))
+ r.pending = old->pending;
return r;
}
@@ -1152,25 +1215,29 @@ int bch2_fs_rebalance_init(struct bch_fs *c)
return 0;
}
+/* need better helpers for iterating in parallel */
+
static int check_rebalance_work_one(struct btree_trans *trans,
struct btree_iter *extent_iter,
struct btree_iter *rebalance_iter,
+ struct btree_iter *rebalance_hipri_iter,
+ struct btree_iter *rebalance_pending_iter,
struct per_snapshot_io_opts *snapshot_io_opts,
struct bkey_buf *last_flushed)
{
struct bch_fs *c = trans->c;
- struct bkey_s_c extent_k, rebalance_k;
+ struct bkey_s_c extent_k, rb_k;
CLASS(printbuf, buf)();
int ret = bkey_err(extent_k = bch2_btree_iter_peek(extent_iter)) ?:
- bkey_err(rebalance_k = bch2_btree_iter_peek(rebalance_iter));
+ bkey_err(rb_k = bch2_btree_iter_peek(rebalance_iter));
if (ret)
return ret;
if (!extent_k.k &&
extent_iter->btree_id == BTREE_ID_reflink &&
- (!rebalance_k.k ||
- rebalance_k.k->p.inode >= BCACHEFS_ROOT_INO)) {
+ (!rb_k.k ||
+ rb_k.k->p.inode >= BCACHEFS_ROOT_INO)) {
bch2_trans_iter_exit(extent_iter);
bch2_trans_iter_init(trans, extent_iter,
BTREE_ID_extents, POS_MIN,
@@ -1179,25 +1246,25 @@ static int check_rebalance_work_one(struct btree_trans *trans,
return bch_err_throw(c, transaction_restart_nested);
}
- if (!extent_k.k && !rebalance_k.k)
+ if (!extent_k.k && !rb_k.k)
return 1;
int cmp = bpos_cmp(extent_k.k ? extent_k.k->p : SPOS_MAX,
- rebalance_k.k ? rebalance_k.k->p : SPOS_MAX);
+ rb_k.k ? rb_k.k->p : SPOS_MAX);
struct bkey deleted;
bkey_init(&deleted);
if (cmp < 0) {
deleted.p = extent_k.k->p;
- rebalance_k.k = &deleted;
+ rb_k.k = &deleted;
} else if (cmp > 0) {
- deleted.p = rebalance_k.k->p;
+ deleted.p = rb_k.k->p;
extent_k.k = &deleted;
}
bool should_have_rebalance = bch2_bkey_needs_rb(extent_k);
- bool have_rebalance = rebalance_k.k->type == KEY_TYPE_set;
+ bool have_rebalance = rb_k.k->type == KEY_TYPE_set;
if (should_have_rebalance != have_rebalance) {
ret = bch2_btree_write_buffer_maybe_flush(trans, extent_k, last_flushed);
@@ -1248,6 +1315,10 @@ int bch2_check_rebalance_work(struct bch_fs *c)
BTREE_ITER_prefetch);
CLASS(btree_iter, rebalance_iter)(trans, BTREE_ID_rebalance_work, POS_MIN,
BTREE_ITER_prefetch);
+ CLASS(btree_iter, rebalance_hipri_iter)(trans, BTREE_ID_rebalance_work_hipri, POS_MIN,
+ BTREE_ITER_prefetch);
+ CLASS(btree_iter, rebalance_pending_iter)(trans, BTREE_ID_rebalance_work_pending, POS_MIN,
+ BTREE_ITER_prefetch);
struct per_snapshot_io_opts snapshot_io_opts;
per_snapshot_io_opts_init(&snapshot_io_opts, c);
@@ -1265,7 +1336,10 @@ int bch2_check_rebalance_work(struct bch_fs *c)
bch2_trans_begin(trans);
- ret = check_rebalance_work_one(trans, &extent_iter, &rebalance_iter,
+ ret = check_rebalance_work_one(trans, &extent_iter,
+ &rebalance_iter,
+ &rebalance_hipri_iter,
+ &rebalance_pending_iter,
&snapshot_io_opts, &last_flushed);
if (bch2_err_matches(ret, BCH_ERR_transaction_restart))
diff --git a/fs/bcachefs/rebalance.h b/fs/bcachefs/rebalance.h
index f6b74d5e1210..27c36568a626 100644
--- a/fs/bcachefs/rebalance.h
+++ b/fs/bcachefs/rebalance.h
@@ -22,10 +22,24 @@ static inline struct bch_extent_rebalance io_opts_to_rebalance_opts(struct bch_f
const struct bch_extent_rebalance *bch2_bkey_rebalance_opts(struct bkey_s_c);
-static inline int bch2_bkey_needs_rb(struct bkey_s_c k)
+int __bch2_trigger_extent_rebalance(struct btree_trans *,
+ struct bkey_s_c, struct bkey_s_c,
+ enum btree_iter_update_trigger_flags);
+
+static inline int bch2_trigger_extent_rebalance(struct btree_trans *trans,
+ struct bkey_s_c old, struct bkey_s_c new,
+ enum btree_iter_update_trigger_flags flags)
{
- const struct bch_extent_rebalance *r = bch2_bkey_rebalance_opts(k);
- return r ? r->need_rb : 0;
+ const struct bch_extent_rebalance *old_r = bch2_bkey_rebalance_opts(old);
+ const struct bch_extent_rebalance *new_r = bch2_bkey_rebalance_opts(new);
+
+ if ((!old_r && !new_r) ||
+ (old_r && new_r &&
+ old_r->need_rb == new_r->need_rb &&
+ old_r->hipri == new_r->hipri &&
+ old_r->pending == new_r->pending))
+ return 0;
+
+ return __bch2_trigger_extent_rebalance(trans, old, new, flags);
}
/* Inodes in different snapshots may have different IO options: */
--
2.50.1