* [PATCH v6 00/16] Remap tree
@ 2025-11-14 18:47 Mark Harmstone
2025-11-14 18:47 ` [PATCH v6 01/16] btrfs: add definitions and constants for remap-tree Mark Harmstone
` (15 more replies)
0 siblings, 16 replies; 23+ messages in thread
From: Mark Harmstone @ 2025-11-14 18:47 UTC (permalink / raw)
To: linux-btrfs; +Cc: Mark Harmstone
This is version 6 of the patch series for the new logical remapping tree
feature - see the previous cover letters for more information including
the rationale:
* RFC: https://lore.kernel.org/all/20250515163641.3449017-1-maharmstone@fb.com/
* Version 1: https://lore.kernel.org/all/20250605162345.2561026-1-maharmstone@fb.com/
* Version 2: https://lore.kernel.org/all/20250813143509.31073-1-mark@harmstone.com/
* Version 3: https://lore.kernel.org/all/20251009112814.13942-1-mark@harmstone.com/
* Version 4: https://lore.kernel.org/all/20251024181227.32228-1-mark@harmstone.com/
* Version 5: https://lore.kernel.org/all/20251110171511.20900-1-mark@harmstone.com/
Changes since version 5:
* Fixed locking in btrfs_handle_fully_remapped_bgs()
* btrfs_mark_bg_fully_remapped() now puts BGs straight onto discard list if
using async discard, rather than fully_remapped_bgs
* Now using btrfs_mark_bg_unused() rather than a reimplementation of it
* Fixed potential race between btrfs_handle_fully_remapped_bgs() and
btrfs_delete_unused_bgs()
* Non-async discard call to btrfs_handle_fully_remapped_bgs() moved from
transaction commit to cleaner thread
* Fixed reservation of SYSTEM chunk metadata
* Reservations now done before starting a transaction to mark a block group
fully remapped
* Some other niggles that Boris pointed out
Mark Harmstone (16):
btrfs: add definitions and constants for remap-tree
btrfs: add REMAP chunk type
btrfs: allow remapped chunks to have zero stripes
btrfs: remove remapped block groups from the free-space tree
btrfs: don't add metadata items for the remap tree to the extent tree
btrfs: add extended version of struct block_group_item
btrfs: allow mounting filesystems with remap-tree incompat flag
btrfs: redirect I/O for remapped block groups
btrfs: handle deletions from remapped block group
btrfs: handle setting up relocation of block group with remap-tree
btrfs: move existing remaps before relocating block group
btrfs: replace identity remaps with actual remaps when doing
relocations
btrfs: add do_remap param to btrfs_discard_extent()
btrfs: allow balancing remap tree
btrfs: handle discarding fully-remapped block groups
btrfs: populate fully_remapped_bgs_list on mount
fs/btrfs/Kconfig | 2 +
fs/btrfs/accessors.h | 29 +
fs/btrfs/bio.c | 3 +-
fs/btrfs/bio.h | 3 +
fs/btrfs/block-group.c | 306 ++++-
fs/btrfs/block-group.h | 25 +-
fs/btrfs/block-rsv.c | 8 +
fs/btrfs/block-rsv.h | 1 +
fs/btrfs/discard.c | 57 +-
fs/btrfs/disk-io.c | 125 +-
fs/btrfs/extent-tree.c | 129 ++-
fs/btrfs/extent-tree.h | 3 +-
fs/btrfs/free-space-cache.c | 78 +-
fs/btrfs/free-space-cache.h | 1 +
fs/btrfs/free-space-tree.c | 4 +-
fs/btrfs/free-space-tree.h | 5 +-
fs/btrfs/fs.h | 10 +-
fs/btrfs/inode.c | 2 +-
fs/btrfs/locking.c | 1 +
fs/btrfs/relocation.c | 1885 +++++++++++++++++++++++++++++--
fs/btrfs/relocation.h | 18 +
fs/btrfs/space-info.c | 22 +-
fs/btrfs/sysfs.c | 4 +
fs/btrfs/transaction.c | 7 +
fs/btrfs/tree-checker.c | 94 +-
fs/btrfs/tree-checker.h | 5 +
fs/btrfs/volumes.c | 356 +++++-
fs/btrfs/volumes.h | 18 +-
include/uapi/linux/btrfs.h | 1 +
include/uapi/linux/btrfs_tree.h | 29 +-
30 files changed, 2968 insertions(+), 263 deletions(-)
--
2.51.0
* [PATCH v6 01/16] btrfs: add definitions and constants for remap-tree
From: Mark Harmstone @ 2025-11-14 18:47 UTC (permalink / raw)
To: linux-btrfs; +Cc: Mark Harmstone, Boris Burkov
Add an incompat flag for the new remap-tree feature, and the constants
and definitions needed to support it.
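For readers who want to see how the new bits interact, here is a minimal
user-space sketch of the chunk type check that the tree-checker hunk
switches to. The flag values are taken from the uapi headers touched by
this patch; the shortened macro names and the two helper functions are
hypothetical and exist only for illustration, they are not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values as in include/uapi/linux/btrfs_tree.h (names shortened). */
#define BG_DATA      (1ULL << 0)
#define BG_SYSTEM    (1ULL << 1)
#define BG_METADATA  (1ULL << 2)
#define BG_RAID0     (1ULL << 3)
#define BG_RAID1     (1ULL << 4)
#define BG_DUP       (1ULL << 5)
#define BG_RAID10    (1ULL << 6)
#define BG_RAID5     (1ULL << 7)
#define BG_RAID6     (1ULL << 8)
#define BG_RAID1C3   (1ULL << 9)
#define BG_RAID1C4   (1ULL << 10)
#define BG_REMAPPED  (1ULL << 11)	/* new in this patch */

/* As in include/uapi/linux/btrfs.h; new in this patch. */
#define INCOMPAT_REMAP_TREE (1ULL << 17)

#define BG_TYPE_MASK    (BG_DATA | BG_SYSTEM | BG_METADATA)
#define BG_PROFILE_MASK (BG_RAID0 | BG_RAID1 | BG_DUP | BG_RAID10 | \
			 BG_RAID5 | BG_RAID6 | BG_RAID1C3 | BG_RAID1C4)
/* Mirrors BTRFS_BLOCK_GROUP_VALID from the tree-checker.h hunk. */
#define BG_VALID        (BG_TYPE_MASK | BG_PROFILE_MASK | BG_REMAPPED)

/* Hypothetical helper: returns the unrecognized bits, or 0 if none. */
static inline uint64_t unknown_chunk_type_bits(uint64_t type)
{
	return type & ~BG_VALID;
}

/* Hypothetical helper: does the superblock advertise the remap tree? */
static inline bool has_remap_tree(uint64_t incompat_flags)
{
	return (incompat_flags & INCOMPAT_REMAP_TREE) != 0;
}
```

A chunk type of DATA | RAID1 | REMAPPED passes this check, while any bit
outside the three masks is reported back, which is the value the
"unrecognized chunk type" message prints.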
Signed-off-by: Mark Harmstone <mark@harmstone.com>
Reviewed-by: Boris Burkov <boris@bur.io>
---
fs/btrfs/accessors.h | 3 +++
fs/btrfs/locking.c | 1 +
fs/btrfs/sysfs.c | 2 ++
fs/btrfs/tree-checker.c | 6 ++----
fs/btrfs/tree-checker.h | 5 +++++
fs/btrfs/volumes.c | 1 +
include/uapi/linux/btrfs.h | 1 +
include/uapi/linux/btrfs_tree.h | 12 ++++++++++++
8 files changed, 27 insertions(+), 4 deletions(-)
diff --git a/fs/btrfs/accessors.h b/fs/btrfs/accessors.h
index 78721412951c..3eec1a1ecdf4 100644
--- a/fs/btrfs/accessors.h
+++ b/fs/btrfs/accessors.h
@@ -1010,6 +1010,9 @@ BTRFS_SETGET_STACK_FUNCS(stack_verity_descriptor_encryption,
BTRFS_SETGET_STACK_FUNCS(stack_verity_descriptor_size,
struct btrfs_verity_descriptor_item, size, 64);
+BTRFS_SETGET_FUNCS(remap_address, struct btrfs_remap, address, 64);
+BTRFS_SETGET_STACK_FUNCS(stack_remap_address, struct btrfs_remap, address, 64);
+
/* Cast into the data area of the leaf. */
#define btrfs_item_ptr(leaf, slot, type) \
((type *)(btrfs_item_nr_offset(leaf, 0) + btrfs_item_offset(leaf, slot)))
diff --git a/fs/btrfs/locking.c b/fs/btrfs/locking.c
index 0035851d72b0..726e4d70f37c 100644
--- a/fs/btrfs/locking.c
+++ b/fs/btrfs/locking.c
@@ -73,6 +73,7 @@ static struct btrfs_lockdep_keyset {
{ .id = BTRFS_FREE_SPACE_TREE_OBJECTID, DEFINE_NAME("free-space") },
{ .id = BTRFS_BLOCK_GROUP_TREE_OBJECTID, DEFINE_NAME("block-group") },
{ .id = BTRFS_RAID_STRIPE_TREE_OBJECTID, DEFINE_NAME("raid-stripe") },
+ { .id = BTRFS_REMAP_TREE_OBJECTID, DEFINE_NAME("remap-tree") },
{ .id = 0, DEFINE_NAME("tree") },
};
diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index 1f64c132b387..e095936c2389 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -293,6 +293,7 @@ BTRFS_FEAT_ATTR_COMPAT_RO(free_space_tree, FREE_SPACE_TREE);
BTRFS_FEAT_ATTR_COMPAT_RO(block_group_tree, BLOCK_GROUP_TREE);
BTRFS_FEAT_ATTR_INCOMPAT(raid1c34, RAID1C34);
BTRFS_FEAT_ATTR_INCOMPAT(simple_quota, SIMPLE_QUOTA);
+BTRFS_FEAT_ATTR_INCOMPAT(remap_tree, REMAP_TREE);
#ifdef CONFIG_BLK_DEV_ZONED
BTRFS_FEAT_ATTR_INCOMPAT(zoned, ZONED);
#endif
@@ -327,6 +328,7 @@ static struct attribute *btrfs_supported_feature_attrs[] = {
BTRFS_FEAT_ATTR_PTR(raid1c34),
BTRFS_FEAT_ATTR_PTR(block_group_tree),
BTRFS_FEAT_ATTR_PTR(simple_quota),
+ BTRFS_FEAT_ATTR_PTR(remap_tree),
#ifdef CONFIG_BLK_DEV_ZONED
BTRFS_FEAT_ATTR_PTR(zoned),
#endif
diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index c21c21adf61e..aedc208a95b8 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -913,12 +913,10 @@ int btrfs_check_chunk_valid(const struct btrfs_fs_info *fs_info,
length, btrfs_stripe_nr_to_offset(U32_MAX));
return -EUCLEAN;
}
- if (unlikely(type & ~(BTRFS_BLOCK_GROUP_TYPE_MASK |
- BTRFS_BLOCK_GROUP_PROFILE_MASK))) {
+ if (unlikely(type & ~BTRFS_BLOCK_GROUP_VALID)) {
chunk_err(fs_info, leaf, chunk, logical,
"unrecognized chunk type: 0x%llx",
- ~(BTRFS_BLOCK_GROUP_TYPE_MASK |
- BTRFS_BLOCK_GROUP_PROFILE_MASK) & type);
+ type & ~BTRFS_BLOCK_GROUP_VALID);
return -EUCLEAN;
}
diff --git a/fs/btrfs/tree-checker.h b/fs/btrfs/tree-checker.h
index eb201f4ec3c7..833e2fd989eb 100644
--- a/fs/btrfs/tree-checker.h
+++ b/fs/btrfs/tree-checker.h
@@ -57,6 +57,11 @@ enum btrfs_tree_block_status {
BTRFS_TREE_BLOCK_WRITTEN_NOT_SET,
};
+
+#define BTRFS_BLOCK_GROUP_VALID (BTRFS_BLOCK_GROUP_TYPE_MASK | \
+ BTRFS_BLOCK_GROUP_PROFILE_MASK | \
+ BTRFS_BLOCK_GROUP_REMAPPED)
+
/*
* Exported simply for btrfs-progs which wants to have the
* btrfs_tree_block_status return codes.
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 75a34ed95c74..cf7b8bb86412 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -231,6 +231,7 @@ void btrfs_describe_block_groups(u64 bg_flags, char *buf, u32 size_buf)
DESCRIBE_FLAG(BTRFS_BLOCK_GROUP_DATA, "data");
DESCRIBE_FLAG(BTRFS_BLOCK_GROUP_SYSTEM, "system");
DESCRIBE_FLAG(BTRFS_BLOCK_GROUP_METADATA, "metadata");
+ DESCRIBE_FLAG(BTRFS_BLOCK_GROUP_REMAPPED, "remapped");
DESCRIBE_FLAG(BTRFS_AVAIL_ALLOC_BIT_SINGLE, "single");
for (i = 0; i < BTRFS_NR_RAID_TYPES; i++)
diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index f40b300bd664..0763a23aeebc 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -336,6 +336,7 @@ struct btrfs_ioctl_fs_info_args {
#define BTRFS_FEATURE_INCOMPAT_EXTENT_TREE_V2 (1ULL << 13)
#define BTRFS_FEATURE_INCOMPAT_RAID_STRIPE_TREE (1ULL << 14)
#define BTRFS_FEATURE_INCOMPAT_SIMPLE_QUOTA (1ULL << 16)
+#define BTRFS_FEATURE_INCOMPAT_REMAP_TREE (1ULL << 17)
struct btrfs_ioctl_feature_flags {
__u64 compat_flags;
diff --git a/include/uapi/linux/btrfs_tree.h b/include/uapi/linux/btrfs_tree.h
index fc29d273845d..4439d77a7252 100644
--- a/include/uapi/linux/btrfs_tree.h
+++ b/include/uapi/linux/btrfs_tree.h
@@ -76,6 +76,9 @@
/* Tracks RAID stripes in block groups. */
#define BTRFS_RAID_STRIPE_TREE_OBJECTID 12ULL
+/* Holds details of remapped addresses after relocation. */
+#define BTRFS_REMAP_TREE_OBJECTID 13ULL
+
/* device stats in the device tree */
#define BTRFS_DEV_STATS_OBJECTID 0ULL
@@ -282,6 +285,10 @@
#define BTRFS_RAID_STRIPE_KEY 230
+#define BTRFS_IDENTITY_REMAP_KEY 234
+#define BTRFS_REMAP_KEY 235
+#define BTRFS_REMAP_BACKREF_KEY 236
+
/*
* Records the overall state of the qgroups.
* There's only one instance of this key present,
@@ -1161,6 +1168,7 @@ struct btrfs_dev_replace_item {
#define BTRFS_BLOCK_GROUP_RAID6 (1ULL << 8)
#define BTRFS_BLOCK_GROUP_RAID1C3 (1ULL << 9)
#define BTRFS_BLOCK_GROUP_RAID1C4 (1ULL << 10)
+#define BTRFS_BLOCK_GROUP_REMAPPED (1ULL << 11)
#define BTRFS_BLOCK_GROUP_RESERVED (BTRFS_AVAIL_ALLOC_BIT_SINGLE | \
BTRFS_SPACE_INFO_GLOBAL_RSV)
@@ -1323,4 +1331,8 @@ struct btrfs_verity_descriptor_item {
__u8 encryption;
} __attribute__ ((__packed__));
+struct btrfs_remap {
+ __le64 address;
+} __attribute__ ((__packed__));
+
#endif /* _BTRFS_CTREE_H_ */
--
2.51.0
* [PATCH v6 02/16] btrfs: add REMAP chunk type
From: Mark Harmstone @ 2025-11-14 18:47 UTC (permalink / raw)
To: linux-btrfs; +Cc: Mark Harmstone, Boris Burkov
Add a new REMAP chunk type, which is a metadata chunk that holds the
remap tree.
This is needed for bootstrapping purposes: the remap tree can't itself
be remapped, and must be relocated the existing way, by COWing every
leaf. The remap tree can't go in the SYSTEM chunk as space there is
limited, because a copy of the chunk item gets placed in the superblock.
The changes in fs/btrfs/volumes.h are because we're adding a new block
group type bit after the profile bits, and so can no longer rely on the
const_ilog2 trick.
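To illustrate the point about the ilog2 trick, here is a small
user-space sketch. It assumes the index macro has the shape
ilog2(profile >> (ilog2(RAID0) - 1)), which is my reading of
BTRFS_BG_FLAG_TO_INDEX() in fs/btrfs/volumes.h; ilog2_u64() and the
shortened names are stand-ins, not kernel code.

```c
#include <assert.h>
#include <stdint.h>

#define BG_DATA      (1ULL << 0)
#define BG_SYSTEM    (1ULL << 1)
#define BG_METADATA  (1ULL << 2)
#define BG_RAID0     (1ULL << 3)
#define BG_RAID1C4   (1ULL << 10)
#define BG_REMAP     (1ULL << 12)	/* new type bit, above the profile bits */

#define OLD_TYPE_MASK (BG_DATA | BG_SYSTEM | BG_METADATA)
#define NEW_TYPE_MASK (OLD_TYPE_MASK | BG_REMAP)

/* floor(log2(v)) for v != 0; stands in for the kernel's ilog2(). */
static inline int ilog2_u64(uint64_t v)
{
	int n = -1;

	while (v) {
		v >>= 1;
		n++;
	}
	return n;
}

/*
 * Assumed shape of BTRFS_BG_FLAG_TO_INDEX(); profile must be a single
 * nonzero profile bit (SINGLE, i.e. 0, is handled separately).
 */
static inline int bg_profile_to_index(uint64_t profile)
{
	return ilog2_u64(profile >> (ilog2_u64(BG_RAID0) - 1));
}
```

With the old mask the highest type bit (2) sits below RAID0 (3), so the
removed static_assert held. Once REMAP is part of the type mask its
highest bit is 12, the assertion can no longer be true, and the patch
instead pins each raid index to an explicit value.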
The sizing of 32MB per chunk, matching the SYSTEM chunk, is an estimate;
we can adjust it later if it proves to be too big or too small.
This works out to ~500,000 remap items, which for a 4KB block size
covers ~2GB of remapped data in the worst case and ~500TB in the best case.
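The arithmetic behind those figures can be checked with a short sketch.
The item count is the approximate figure quoted above; the 1GiB
best-case extent per item is an assumption on my part (it is what the
~500TB figure implies), and the helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define REMAP_CHUNK_SIZE  (32ULL << 20)	/* 32MiB, same as SYSTEM chunks */
#define REMAP_ITEMS       500000ULL	/* approximate, from the text above */
#define BLOCK_SIZE_4K     4096ULL	/* worst case: one block per item */
#define ASSUMED_BEST_CASE (1ULL << 30)	/* assumption: 1GiB per item */

/* Hypothetical helper: bytes of remapped data covered by the items. */
static inline uint64_t covered_bytes(uint64_t items, uint64_t bytes_per_item)
{
	return items * bytes_per_item;
}

/* Average space available per item in one chunk (integer division). */
static inline uint64_t chunk_bytes_per_item(void)
{
	return REMAP_CHUNK_SIZE / REMAP_ITEMS;
}
```

Under these assumptions the worst case is 500,000 * 4096 bytes, about
2GB, and the best case is 500,000 * 1GiB, roughly 500TB, with around 67
bytes of chunk space available per item.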
Signed-off-by: Mark Harmstone <mark@harmstone.com>
Reviewed-by: Boris Burkov <boris@bur.io>
---
fs/btrfs/block-rsv.c | 8 ++++++++
fs/btrfs/block-rsv.h | 1 +
fs/btrfs/disk-io.c | 1 +
fs/btrfs/fs.h | 2 ++
fs/btrfs/space-info.c | 13 ++++++++++++-
fs/btrfs/sysfs.c | 2 ++
fs/btrfs/tree-checker.c | 13 +++++++++++--
fs/btrfs/volumes.c | 3 +++
fs/btrfs/volumes.h | 10 +++++++++-
include/uapi/linux/btrfs_tree.h | 4 +++-
10 files changed, 52 insertions(+), 5 deletions(-)
diff --git a/fs/btrfs/block-rsv.c b/fs/btrfs/block-rsv.c
index 96cf7a162987..71bcaa6fa7ee 100644
--- a/fs/btrfs/block-rsv.c
+++ b/fs/btrfs/block-rsv.c
@@ -419,6 +419,9 @@ void btrfs_init_root_block_rsv(struct btrfs_root *root)
case BTRFS_TREE_LOG_OBJECTID:
root->block_rsv = &fs_info->treelog_rsv;
break;
+ case BTRFS_REMAP_TREE_OBJECTID:
+ root->block_rsv = &fs_info->remap_block_rsv;
+ break;
default:
root->block_rsv = NULL;
break;
@@ -432,6 +435,9 @@ void btrfs_init_global_block_rsv(struct btrfs_fs_info *fs_info)
space_info = btrfs_find_space_info(fs_info, BTRFS_BLOCK_GROUP_SYSTEM);
fs_info->chunk_block_rsv.space_info = space_info;
+ space_info = btrfs_find_space_info(fs_info, BTRFS_BLOCK_GROUP_REMAP);
+ fs_info->remap_block_rsv.space_info = space_info;
+
space_info = btrfs_find_space_info(fs_info, BTRFS_BLOCK_GROUP_METADATA);
fs_info->global_block_rsv.space_info = space_info;
fs_info->trans_block_rsv.space_info = space_info;
@@ -458,6 +464,8 @@ void btrfs_release_global_block_rsv(struct btrfs_fs_info *fs_info)
WARN_ON(fs_info->trans_block_rsv.reserved > 0);
WARN_ON(fs_info->chunk_block_rsv.size > 0);
WARN_ON(fs_info->chunk_block_rsv.reserved > 0);
+ WARN_ON(fs_info->remap_block_rsv.size > 0);
+ WARN_ON(fs_info->remap_block_rsv.reserved > 0);
WARN_ON(fs_info->delayed_block_rsv.size > 0);
WARN_ON(fs_info->delayed_block_rsv.reserved > 0);
WARN_ON(fs_info->delayed_refs_rsv.reserved > 0);
diff --git a/fs/btrfs/block-rsv.h b/fs/btrfs/block-rsv.h
index 79ae9d05cd91..8359fb96bc3c 100644
--- a/fs/btrfs/block-rsv.h
+++ b/fs/btrfs/block-rsv.h
@@ -22,6 +22,7 @@ enum btrfs_rsv_type {
BTRFS_BLOCK_RSV_DELALLOC,
BTRFS_BLOCK_RSV_TRANS,
BTRFS_BLOCK_RSV_CHUNK,
+ BTRFS_BLOCK_RSV_REMAP,
BTRFS_BLOCK_RSV_DELOPS,
BTRFS_BLOCK_RSV_DELREFS,
BTRFS_BLOCK_RSV_TREELOG,
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 0df81a09a3d1..5c106711ad9a 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2819,6 +2819,7 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
BTRFS_BLOCK_RSV_GLOBAL);
btrfs_init_block_rsv(&fs_info->trans_block_rsv, BTRFS_BLOCK_RSV_TRANS);
btrfs_init_block_rsv(&fs_info->chunk_block_rsv, BTRFS_BLOCK_RSV_CHUNK);
+ btrfs_init_block_rsv(&fs_info->remap_block_rsv, BTRFS_BLOCK_RSV_REMAP);
btrfs_init_block_rsv(&fs_info->treelog_rsv, BTRFS_BLOCK_RSV_TREELOG);
btrfs_init_block_rsv(&fs_info->empty_block_rsv, BTRFS_BLOCK_RSV_EMPTY);
btrfs_init_block_rsv(&fs_info->delayed_block_rsv,
diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
index 3c7eeaefa7d5..2d9dc32c7af9 100644
--- a/fs/btrfs/fs.h
+++ b/fs/btrfs/fs.h
@@ -499,6 +499,8 @@ struct btrfs_fs_info {
struct btrfs_block_rsv trans_block_rsv;
/* Block reservation for chunk tree */
struct btrfs_block_rsv chunk_block_rsv;
+ /* Block reservation for remap tree */
+ struct btrfs_block_rsv remap_block_rsv;
/* Block reservation for delayed operations */
struct btrfs_block_rsv delayed_block_rsv;
/* Block reservation for delayed refs */
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 6babbe333741..8e040dcea64a 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -215,7 +215,7 @@ static u64 calc_chunk_size(const struct btrfs_fs_info *fs_info, u64 flags)
if (flags & BTRFS_BLOCK_GROUP_DATA)
return BTRFS_MAX_DATA_CHUNK_SIZE;
- else if (flags & BTRFS_BLOCK_GROUP_SYSTEM)
+ else if (flags & (BTRFS_BLOCK_GROUP_SYSTEM | BTRFS_BLOCK_GROUP_REMAP))
return SZ_32M;
/* Handle BTRFS_BLOCK_GROUP_METADATA */
@@ -344,6 +344,8 @@ int btrfs_init_space_info(struct btrfs_fs_info *fs_info)
if (mixed) {
flags = BTRFS_BLOCK_GROUP_METADATA | BTRFS_BLOCK_GROUP_DATA;
ret = create_space_info(fs_info, flags);
+ if (ret)
+ goto out;
} else {
flags = BTRFS_BLOCK_GROUP_METADATA;
ret = create_space_info(fs_info, flags);
@@ -352,7 +354,15 @@ int btrfs_init_space_info(struct btrfs_fs_info *fs_info)
flags = BTRFS_BLOCK_GROUP_DATA;
ret = create_space_info(fs_info, flags);
+ if (ret)
+ goto out;
+ }
+
+ if (features & BTRFS_FEATURE_INCOMPAT_REMAP_TREE) {
+ flags = BTRFS_BLOCK_GROUP_REMAP;
+ ret = create_space_info(fs_info, flags);
}
+
out:
return ret;
}
@@ -623,6 +633,7 @@ static void dump_global_block_rsv(struct btrfs_fs_info *fs_info)
DUMP_BLOCK_RSV(fs_info, global_block_rsv);
DUMP_BLOCK_RSV(fs_info, trans_block_rsv);
DUMP_BLOCK_RSV(fs_info, chunk_block_rsv);
+ DUMP_BLOCK_RSV(fs_info, remap_block_rsv);
DUMP_BLOCK_RSV(fs_info, delayed_block_rsv);
DUMP_BLOCK_RSV(fs_info, delayed_refs_rsv);
}
diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index e095936c2389..ff7f79a3c3e7 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -2026,6 +2026,8 @@ static const char *alloc_name(struct btrfs_space_info *space_info)
case BTRFS_BLOCK_GROUP_SYSTEM:
ASSERT(space_info->subgroup_id == BTRFS_SUB_GROUP_PRIMARY);
return "system";
+ case BTRFS_BLOCK_GROUP_REMAP:
+ return "remap";
default:
WARN_ON(1);
return "invalid-combination";
diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index aedc208a95b8..21bf57e81e1a 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -748,17 +748,26 @@ static int check_block_group_item(struct extent_buffer *leaf,
return -EUCLEAN;
}
+ if (flags & BTRFS_BLOCK_GROUP_REMAP &&
+ !btrfs_fs_incompat(fs_info, REMAP_TREE)) {
+ block_group_err(leaf, slot,
+"invalid flags, have 0x%llx (REMAP flag set) but no remap-tree incompat flag",
+ flags);
+ return -EUCLEAN;
+ }
+
type = flags & BTRFS_BLOCK_GROUP_TYPE_MASK;
if (unlikely(type != BTRFS_BLOCK_GROUP_DATA &&
type != BTRFS_BLOCK_GROUP_METADATA &&
type != BTRFS_BLOCK_GROUP_SYSTEM &&
+ type != BTRFS_BLOCK_GROUP_REMAP &&
type != (BTRFS_BLOCK_GROUP_METADATA |
BTRFS_BLOCK_GROUP_DATA))) {
block_group_err(leaf, slot,
-"invalid type, have 0x%llx (%lu bits set) expect either 0x%llx, 0x%llx, 0x%llx or 0x%llx",
+"invalid type, have 0x%llx (%lu bits set) expect either 0x%llx, 0x%llx, 0x%llx, 0x%llx or 0x%llx",
type, hweight64(type),
BTRFS_BLOCK_GROUP_DATA, BTRFS_BLOCK_GROUP_METADATA,
- BTRFS_BLOCK_GROUP_SYSTEM,
+ BTRFS_BLOCK_GROUP_SYSTEM, BTRFS_BLOCK_GROUP_REMAP,
BTRFS_BLOCK_GROUP_METADATA | BTRFS_BLOCK_GROUP_DATA);
return -EUCLEAN;
}
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index cf7b8bb86412..97aea6e8d6bb 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -231,6 +231,9 @@ void btrfs_describe_block_groups(u64 bg_flags, char *buf, u32 size_buf)
DESCRIBE_FLAG(BTRFS_BLOCK_GROUP_DATA, "data");
DESCRIBE_FLAG(BTRFS_BLOCK_GROUP_SYSTEM, "system");
DESCRIBE_FLAG(BTRFS_BLOCK_GROUP_METADATA, "metadata");
+ /* block groups containing the remap tree */
+ DESCRIBE_FLAG(BTRFS_BLOCK_GROUP_REMAP, "remap");
+ /* block group that has been remapped */
DESCRIBE_FLAG(BTRFS_BLOCK_GROUP_REMAPPED, "remapped");
DESCRIBE_FLAG(BTRFS_AVAIL_ALLOC_BIT_SINGLE, "single");
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 34b854c1a303..4117fabb248b 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -58,7 +58,6 @@ static_assert(ilog2(BTRFS_STRIPE_LEN) == BTRFS_STRIPE_LEN_SHIFT);
*/
static_assert(const_ffs(BTRFS_BLOCK_GROUP_RAID0) <
const_ffs(BTRFS_BLOCK_GROUP_PROFILE_MASK & ~BTRFS_BLOCK_GROUP_RAID0));
-static_assert(ilog2(BTRFS_BLOCK_GROUP_RAID0) > ilog2(BTRFS_BLOCK_GROUP_TYPE_MASK));
/* ilog2() can handle both constants and variables */
#define BTRFS_BG_FLAG_TO_INDEX(profile) \
@@ -80,6 +79,15 @@ enum btrfs_raid_types {
BTRFS_NR_RAID_TYPES
};
+static_assert(BTRFS_RAID_RAID0 == 1);
+static_assert(BTRFS_RAID_RAID1 == 2);
+static_assert(BTRFS_RAID_DUP == 3);
+static_assert(BTRFS_RAID_RAID10 == 4);
+static_assert(BTRFS_RAID_RAID5 == 5);
+static_assert(BTRFS_RAID_RAID6 == 6);
+static_assert(BTRFS_RAID_RAID1C3 == 7);
+static_assert(BTRFS_RAID_RAID1C4 == 8);
+
/*
* Use sequence counter to get consistent device stat data on
* 32-bit processors.
diff --git a/include/uapi/linux/btrfs_tree.h b/include/uapi/linux/btrfs_tree.h
index 4439d77a7252..9a36f0206d90 100644
--- a/include/uapi/linux/btrfs_tree.h
+++ b/include/uapi/linux/btrfs_tree.h
@@ -1169,12 +1169,14 @@ struct btrfs_dev_replace_item {
#define BTRFS_BLOCK_GROUP_RAID1C3 (1ULL << 9)
#define BTRFS_BLOCK_GROUP_RAID1C4 (1ULL << 10)
#define BTRFS_BLOCK_GROUP_REMAPPED (1ULL << 11)
+#define BTRFS_BLOCK_GROUP_REMAP (1ULL << 12)
#define BTRFS_BLOCK_GROUP_RESERVED (BTRFS_AVAIL_ALLOC_BIT_SINGLE | \
BTRFS_SPACE_INFO_GLOBAL_RSV)
#define BTRFS_BLOCK_GROUP_TYPE_MASK (BTRFS_BLOCK_GROUP_DATA | \
BTRFS_BLOCK_GROUP_SYSTEM | \
- BTRFS_BLOCK_GROUP_METADATA)
+ BTRFS_BLOCK_GROUP_METADATA | \
+ BTRFS_BLOCK_GROUP_REMAP)
#define BTRFS_BLOCK_GROUP_PROFILE_MASK (BTRFS_BLOCK_GROUP_RAID0 | \
BTRFS_BLOCK_GROUP_RAID1 | \
--
2.51.0
* [PATCH v6 03/16] btrfs: allow remapped chunks to have zero stripes
From: Mark Harmstone @ 2025-11-14 18:47 UTC (permalink / raw)
To: linux-btrfs; +Cc: Mark Harmstone, Boris Burkov
When a chunk has been fully remapped, we are going to set its
num_stripes to 0, as it will no longer represent a physical location on
disk.
Change tree-checker to allow for this, and fix read_one_chunk() to avoid
a divide by zero.
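Two details of this change can be sketched in user space. The structs
below are simplified models of struct btrfs_chunk and struct
btrfs_stripe as I understand their on-disk layout, and
stripe_size_or_zero() merely stands in for the guarded call to
btrfs_calc_stripe_length(); all names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified model of struct btrfs_stripe (assumed layout). */
struct stripe_model {
	uint64_t devid;
	uint64_t offset;
	uint8_t dev_uuid[16];
} __attribute__((packed));

/* Simplified model of struct btrfs_chunk (assumed layout). */
struct chunk_model {
	uint64_t length;
	uint64_t owner;
	uint64_t stripe_len;
	uint64_t type;
	uint32_t io_align;
	uint32_t io_width;
	uint32_t sector_size;
	uint16_t num_stripes;
	uint16_t sub_stripes;
	struct stripe_model stripe;	/* first stripe; others follow on disk */
} __attribute__((packed));

/* Smallest valid item: the header alone, with no stripes after it. */
static inline size_t min_chunk_item_size(void)
{
	return offsetof(struct chunk_model, stripe);
}

/* Avoid dividing by zero when a remapped chunk has no stripes. */
static inline uint64_t stripe_size_or_zero(uint64_t chunk_len,
					   uint16_t data_stripes)
{
	return data_stripes ? chunk_len / data_stripes : 0;
}
```

Because the struct embeds its first stripe, sizeof() is one stripe
larger than a zero-stripe item, which is why the item size check has to
use offsetof() rather than sizeof() once num_stripes may be 0.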
Signed-off-by: Mark Harmstone <mark@harmstone.com>
Reviewed-by: Boris Burkov <boris@bur.io>
---
fs/btrfs/tree-checker.c | 65 ++++++++++++++++++++++++++++-------------
fs/btrfs/volumes.c | 7 ++++-
2 files changed, 51 insertions(+), 21 deletions(-)
diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index 21bf57e81e1a..bce0d86b256f 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -816,6 +816,41 @@ static void chunk_err(const struct btrfs_fs_info *fs_info,
va_end(args);
}
+static bool valid_stripe_count(u64 profile, u16 num_stripes,
+ u16 sub_stripes)
+{
+ switch (profile) {
+ case BTRFS_BLOCK_GROUP_RAID0:
+ return true;
+ case BTRFS_BLOCK_GROUP_RAID10:
+ return sub_stripes ==
+ btrfs_raid_array[BTRFS_RAID_RAID10].sub_stripes;
+ case BTRFS_BLOCK_GROUP_RAID1:
+ return num_stripes ==
+ btrfs_raid_array[BTRFS_RAID_RAID1].devs_min;
+ case BTRFS_BLOCK_GROUP_RAID1C3:
+ return num_stripes ==
+ btrfs_raid_array[BTRFS_RAID_RAID1C3].devs_min;
+ case BTRFS_BLOCK_GROUP_RAID1C4:
+ return num_stripes ==
+ btrfs_raid_array[BTRFS_RAID_RAID1C4].devs_min;
+ case BTRFS_BLOCK_GROUP_RAID5:
+ return num_stripes >=
+ btrfs_raid_array[BTRFS_RAID_RAID5].devs_min;
+ case BTRFS_BLOCK_GROUP_RAID6:
+ return num_stripes >=
+ btrfs_raid_array[BTRFS_RAID_RAID6].devs_min;
+ case BTRFS_BLOCK_GROUP_DUP:
+ return num_stripes ==
+ btrfs_raid_array[BTRFS_RAID_DUP].dev_stripes;
+ case 0: /* SINGLE */
+ return num_stripes ==
+ btrfs_raid_array[BTRFS_RAID_SINGLE].dev_stripes;
+ default:
+ BUG();
+ }
+}
+
/*
* The common chunk check which could also work on super block sys chunk array.
*
@@ -839,6 +874,7 @@ int btrfs_check_chunk_valid(const struct btrfs_fs_info *fs_info,
u64 features;
u32 chunk_sector_size;
bool mixed = false;
+ bool remapped;
int raid_index;
int nparity;
int ncopies;
@@ -862,12 +898,14 @@ int btrfs_check_chunk_valid(const struct btrfs_fs_info *fs_info,
ncopies = btrfs_raid_array[raid_index].ncopies;
nparity = btrfs_raid_array[raid_index].nparity;
- if (unlikely(!num_stripes)) {
+ remapped = type & BTRFS_BLOCK_GROUP_REMAPPED;
+
+ if (unlikely(!remapped && !num_stripes)) {
chunk_err(fs_info, leaf, chunk, logical,
"invalid chunk num_stripes, have %u", num_stripes);
return -EUCLEAN;
}
- if (unlikely(num_stripes < ncopies)) {
+ if (unlikely(num_stripes != 0 && num_stripes < ncopies)) {
chunk_err(fs_info, leaf, chunk, logical,
"invalid chunk num_stripes < ncopies, have %u < %d",
num_stripes, ncopies);
@@ -965,22 +1003,9 @@ int btrfs_check_chunk_valid(const struct btrfs_fs_info *fs_info,
}
}
- if (unlikely((type & BTRFS_BLOCK_GROUP_RAID10 &&
- sub_stripes != btrfs_raid_array[BTRFS_RAID_RAID10].sub_stripes) ||
- (type & BTRFS_BLOCK_GROUP_RAID1 &&
- num_stripes != btrfs_raid_array[BTRFS_RAID_RAID1].devs_min) ||
- (type & BTRFS_BLOCK_GROUP_RAID1C3 &&
- num_stripes != btrfs_raid_array[BTRFS_RAID_RAID1C3].devs_min) ||
- (type & BTRFS_BLOCK_GROUP_RAID1C4 &&
- num_stripes != btrfs_raid_array[BTRFS_RAID_RAID1C4].devs_min) ||
- (type & BTRFS_BLOCK_GROUP_RAID5 &&
- num_stripes < btrfs_raid_array[BTRFS_RAID_RAID5].devs_min) ||
- (type & BTRFS_BLOCK_GROUP_RAID6 &&
- num_stripes < btrfs_raid_array[BTRFS_RAID_RAID6].devs_min) ||
- (type & BTRFS_BLOCK_GROUP_DUP &&
- num_stripes != btrfs_raid_array[BTRFS_RAID_DUP].dev_stripes) ||
- ((type & BTRFS_BLOCK_GROUP_PROFILE_MASK) == 0 &&
- num_stripes != btrfs_raid_array[BTRFS_RAID_SINGLE].dev_stripes))) {
+ if (!remapped &&
+ !valid_stripe_count(type & BTRFS_BLOCK_GROUP_PROFILE_MASK,
+ num_stripes, sub_stripes)) {
chunk_err(fs_info, leaf, chunk, logical,
"invalid num_stripes:sub_stripes %u:%u for profile %llu",
num_stripes, sub_stripes,
@@ -1004,11 +1029,11 @@ static int check_leaf_chunk_item(struct extent_buffer *leaf,
struct btrfs_fs_info *fs_info = leaf->fs_info;
int num_stripes;
- if (unlikely(btrfs_item_size(leaf, slot) < sizeof(struct btrfs_chunk))) {
+ if (unlikely(btrfs_item_size(leaf, slot) < offsetof(struct btrfs_chunk, stripe))) {
chunk_err(fs_info, leaf, chunk, key->offset,
"invalid chunk item size: have %u expect [%zu, %u)",
btrfs_item_size(leaf, slot),
- sizeof(struct btrfs_chunk),
+ offsetof(struct btrfs_chunk, stripe),
BTRFS_LEAF_DATA_SIZE(fs_info));
return -EUCLEAN;
}
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 97aea6e8d6bb..ca3bbbd3b213 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -7047,7 +7047,12 @@ static int read_one_chunk(struct btrfs_key *key, struct extent_buffer *leaf,
*/
map->sub_stripes = btrfs_raid_array[index].sub_stripes;
map->verified_stripes = 0;
- map->stripe_size = btrfs_calc_stripe_length(map);
+
+ if (num_stripes > 0)
+ map->stripe_size = btrfs_calc_stripe_length(map);
+ else
+ map->stripe_size = 0;
+
for (i = 0; i < num_stripes; i++) {
map->stripes[i].physical =
btrfs_stripe_offset_nr(leaf, chunk, i);
--
2.51.0
* [PATCH v6 04/16] btrfs: remove remapped block groups from the free-space tree
From: Mark Harmstone @ 2025-11-14 18:47 UTC (permalink / raw)
To: linux-btrfs; +Cc: Mark Harmstone, Boris Burkov
No new allocations can be done from block groups that have the REMAPPED flag
set, so there's no value in their having entries in the free-space tree.
Prevent a search through the free-space tree being scheduled for such a
block group, and prevent any additions to the in-memory free-space tree.
Signed-off-by: Mark Harmstone <mark@harmstone.com>
Reviewed-by: Boris Burkov <boris@bur.io>
---
fs/btrfs/block-group.c | 19 ++++++++++++++++---
fs/btrfs/free-space-cache.c | 3 +++
2 files changed, 19 insertions(+), 3 deletions(-)
diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index b964eacc1610..0e88e52aa909 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -933,6 +933,13 @@ int btrfs_cache_block_group(struct btrfs_block_group *cache, bool wait)
if (btrfs_is_zoned(fs_info))
return 0;
+ /*
+ * No allocations can be done from remapped block groups, so they have
+ * no entries in the free-space tree.
+ */
+ if (cache->flags & BTRFS_BLOCK_GROUP_REMAPPED)
+ return 0;
+
caching_ctl = kzalloc(sizeof(*caching_ctl), GFP_NOFS);
if (!caching_ctl)
return -ENOMEM;
@@ -1247,10 +1254,16 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
* deletes the block group item from the extent tree, allowing for
* another task to attempt to create another block group with the same
* item key (and failing with -EEXIST and a transaction abort).
+ *
+ * If the REMAPPED flag has been set the block group's free space
+ * has already been removed, so we can skip the call to
+ * btrfs_remove_block_group_free_space().
*/
- ret = btrfs_remove_block_group_free_space(trans, block_group);
- if (ret)
- goto out;
+ if (!(block_group->flags & BTRFS_BLOCK_GROUP_REMAPPED)) {
+ ret = btrfs_remove_block_group_free_space(trans, block_group);
+ if (ret)
+ goto out;
+ }
ret = remove_block_group_item(trans, path, block_group);
if (ret < 0)
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 6ccb492eae8e..05ce6b5a898f 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -2756,6 +2756,9 @@ int btrfs_add_free_space(struct btrfs_block_group *block_group,
{
enum btrfs_trim_state trim_state = BTRFS_TRIM_STATE_UNTRIMMED;
+ if (block_group->flags & BTRFS_BLOCK_GROUP_REMAPPED)
+ return 0;
+
if (btrfs_is_zoned(block_group->fs_info))
return __btrfs_add_free_space_zoned(block_group, bytenr, size,
true);
--
2.51.0
* [PATCH v6 05/16] btrfs: don't add metadata items for the remap tree to the extent tree
From: Mark Harmstone @ 2025-11-14 18:47 UTC (permalink / raw)
To: linux-btrfs; +Cc: Mark Harmstone, Boris Burkov
There is the following potential problem with the remap tree and delayed refs:
* Remapped extent freed in a delayed ref, which removes an entry from the
remap tree
* Remap tree now small enough to fit in a single leaf
* Corruption as we now have a level-0 block with a level-1 metadata item
in the extent tree
One solution to this would be to rework the remap tree code so that it operates
via delayed refs. But as we're hoping to remove cow-only metadata items in the
future anyway, change things so that the remap tree doesn't have any entries in
the extent tree. This also has the benefit of reducing write amplification.
We also make the clear_cache mount option a no-op, as with extent tree
v2, because the free-space tree can no longer be recreated from the
extent tree.
Finally, disable relocating the remap tree itself; this is added back in
a later patch. As it stands, we would get corruption, because the
traditional relocation method walks the extent tree and we're removing
the remap tree's metadata items from it.
Signed-off-by: Mark Harmstone <mark@harmstone.com>
Reviewed-by: Boris Burkov <boris@bur.io>
---
fs/btrfs/disk-io.c | 3 +++
fs/btrfs/extent-tree.c | 31 ++++++++++++++++++++++++++++++-
fs/btrfs/volumes.c | 3 +++
3 files changed, 36 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 5c106711ad9a..af1f6d5f6765 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3053,6 +3053,9 @@ int btrfs_start_pre_rw_mount(struct btrfs_fs_info *fs_info)
if (btrfs_fs_incompat(fs_info, EXTENT_TREE_V2))
btrfs_warn(fs_info,
"'clear_cache' option is ignored with extent tree v2");
+ else if (btrfs_fs_incompat(fs_info, REMAP_TREE))
+ btrfs_warn(fs_info,
+ "'clear_cache' option is ignored with remap tree");
else
rebuild_free_space_tree = true;
} else if (btrfs_fs_compat_ro(fs_info, FREE_SPACE_TREE) &&
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 86004b8daa96..383f6fce0079 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1553,6 +1553,28 @@ static void free_head_ref_squota_rsv(struct btrfs_fs_info *fs_info,
BTRFS_QGROUP_RSV_DATA);
}
+static int drop_remap_tree_ref(struct btrfs_trans_handle *trans,
+ const struct btrfs_delayed_ref_node *node)
+{
+ u64 bytenr = node->bytenr;
+ u64 num_bytes = node->num_bytes;
+ int ret;
+
+ ret = btrfs_add_to_free_space_tree(trans, bytenr, num_bytes);
+ if (ret) {
+ btrfs_abort_transaction(trans, ret);
+ return ret;
+ }
+
+ ret = btrfs_update_block_group(trans, bytenr, num_bytes, false);
+ if (ret) {
+ btrfs_abort_transaction(trans, ret);
+ return ret;
+ }
+
+ return 0;
+}
+
static int run_delayed_data_ref(struct btrfs_trans_handle *trans,
struct btrfs_delayed_ref_head *href,
const struct btrfs_delayed_ref_node *node,
@@ -1747,7 +1769,10 @@ static int run_delayed_tree_ref(struct btrfs_trans_handle *trans,
} else if (node->action == BTRFS_ADD_DELAYED_REF) {
ret = __btrfs_inc_extent_ref(trans, node, extent_op);
} else if (node->action == BTRFS_DROP_DELAYED_REF) {
- ret = __btrfs_free_extent(trans, href, node, extent_op);
+ if (node->ref_root == BTRFS_REMAP_TREE_OBJECTID)
+ ret = drop_remap_tree_ref(trans, node);
+ else
+ ret = __btrfs_free_extent(trans, href, node, extent_op);
} else {
BUG();
}
@@ -4894,6 +4919,9 @@ static int alloc_reserved_tree_block(struct btrfs_trans_handle *trans,
int level = btrfs_delayed_ref_owner(node);
bool skinny_metadata = btrfs_fs_incompat(fs_info, SKINNY_METADATA);
+ if (unlikely(node->ref_root == BTRFS_REMAP_TREE_OBJECTID))
+ goto skip;
+
extent_key.objectid = node->bytenr;
if (skinny_metadata) {
/* The owner of a tree block is the level. */
@@ -4946,6 +4974,7 @@ static int alloc_reserved_tree_block(struct btrfs_trans_handle *trans,
btrfs_free_path(path);
+skip:
return alloc_reserved_extent(trans, node->bytenr, fs_info->nodesize);
}
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index ca3bbbd3b213..453e8581650e 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3971,6 +3971,9 @@ static bool should_balance_chunk(struct extent_buffer *leaf, struct btrfs_chunk
struct btrfs_balance_args *bargs = NULL;
u64 chunk_type = btrfs_chunk_type(leaf, chunk);
+ if (chunk_type & BTRFS_BLOCK_GROUP_REMAP)
+ return false;
+
/* type filter */
if (!((chunk_type & BTRFS_BLOCK_GROUP_TYPE_MASK) &
(bctl->flags & BTRFS_BALANCE_TYPE_MASK))) {
--
2.51.0
* [PATCH v6 06/16] btrfs: add extended version of struct block_group_item
2025-11-14 18:47 [PATCH v6 00/16] Remap tree Mark Harmstone
` (4 preceding siblings ...)
2025-11-14 18:47 ` [PATCH v6 05/16] btrfs: don't add metadata items for the remap tree to the extent tree Mark Harmstone
@ 2025-11-14 18:47 ` Mark Harmstone
2025-11-14 18:47 ` [PATCH v6 07/16] btrfs: allow mounting filesystems with remap-tree incompat flag Mark Harmstone
` (9 subsequent siblings)
15 siblings, 0 replies; 23+ messages in thread
From: Mark Harmstone @ 2025-11-14 18:47 UTC (permalink / raw)
To: linux-btrfs; +Cc: Mark Harmstone, Boris Burkov
Add a struct btrfs_block_group_item_v2, which is used in the block group
tree if the remap-tree incompat flag is set.
This adds two new fields to the block group item: `remap_bytes` and
`identity_remap_count`.
`remap_bytes` records the amount of data that's physically within this
block group, but nominally in another, remapped block group. This is
necessary because this data will need to be moved first if this block
group is itself relocated. If `remap_bytes` > 0, this is an indicator to
the relocation thread that it will need to search the remap-tree for
backrefs. A block group must also have `remap_bytes` == 0 before it can
be dropped.
`identity_remap_count` records how many identity remap items are located
in the remap tree for this block group. When relocation is begun for
this block group, this is set to the number of holes in the free-space
tree for this range. As identity remaps are converted into actual remaps
by the relocation process, this number is decreased. Once it reaches 0,
either because of relocation or because extents have been deleted, the
block group has been fully remapped and its chunk's device extents are
removed.
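
As a rough illustration of the size difference between the two item formats,
here is a userspace sketch that mirrors the packed layouts. The names
bg_item_v1, bg_item_v2 and bg_item_size() are made up for this example, and
plain fixed-width integers stand in for the kernel's __le64/__le32 types; it
assumes a compiler that understands __attribute__((__packed__)).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of the existing block group item (illustrative name). */
struct bg_item_v1 {
    uint64_t used;
    uint64_t chunk_objectid;
    uint64_t flags;
} __attribute__((__packed__));

/* Userspace mirror of the extended item: v1 plus the two new fields. */
struct bg_item_v2 {
    uint64_t used;
    uint64_t chunk_objectid;
    uint64_t flags;
    uint64_t remap_bytes;
    uint32_t identity_remap_count;
} __attribute__((__packed__));

/* The v2 layout is a strict extension: the first 24 bytes match v1. */
static_assert(sizeof(struct bg_item_v1) == 24, "v1 item is 24 bytes");
static_assert(sizeof(struct bg_item_v2) == 36, "v2 item is 36 bytes");
static_assert(offsetof(struct bg_item_v2, remap_bytes) ==
              sizeof(struct bg_item_v1), "new fields follow the v1 fields");

/* Hypothetical helper: item size to read or write for a given feature set. */
static size_t bg_item_size(int has_remap_tree)
{
    return has_remap_tree ? sizeof(struct bg_item_v2)
                          : sizeof(struct bg_item_v1);
}
```

Because the first three fields sit at the same offsets in both layouts, a
single v2-sized stack buffer can be used for either format, with only the
copy length varying.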
Signed-off-by: Mark Harmstone <mark@harmstone.com>
Reviewed-by: Boris Burkov <boris@bur.io>
---
fs/btrfs/accessors.h | 20 +++++++
fs/btrfs/block-group.c | 100 ++++++++++++++++++++++++--------
fs/btrfs/block-group.h | 14 ++++-
fs/btrfs/discard.c | 2 +-
fs/btrfs/tree-checker.c | 10 +++-
include/uapi/linux/btrfs_tree.h | 8 +++
6 files changed, 126 insertions(+), 28 deletions(-)
diff --git a/fs/btrfs/accessors.h b/fs/btrfs/accessors.h
index 3eec1a1ecdf4..772d7d61a2fc 100644
--- a/fs/btrfs/accessors.h
+++ b/fs/btrfs/accessors.h
@@ -240,6 +240,26 @@ BTRFS_SETGET_FUNCS(block_group_flags, struct btrfs_block_group_item, flags, 64);
BTRFS_SETGET_STACK_FUNCS(stack_block_group_flags,
struct btrfs_block_group_item, flags, 64);
+/* struct btrfs_block_group_item_v2 */
+BTRFS_SETGET_STACK_FUNCS(stack_block_group_v2_used, struct btrfs_block_group_item_v2,
+ used, 64);
+BTRFS_SETGET_FUNCS(block_group_v2_used, struct btrfs_block_group_item_v2, used, 64);
+BTRFS_SETGET_STACK_FUNCS(stack_block_group_v2_chunk_objectid,
+ struct btrfs_block_group_item_v2, chunk_objectid, 64);
+BTRFS_SETGET_FUNCS(block_group_v2_chunk_objectid,
+ struct btrfs_block_group_item_v2, chunk_objectid, 64);
+BTRFS_SETGET_STACK_FUNCS(stack_block_group_v2_flags,
+ struct btrfs_block_group_item_v2, flags, 64);
+BTRFS_SETGET_FUNCS(block_group_v2_flags, struct btrfs_block_group_item_v2, flags, 64);
+BTRFS_SETGET_STACK_FUNCS(stack_block_group_v2_remap_bytes,
+ struct btrfs_block_group_item_v2, remap_bytes, 64);
+BTRFS_SETGET_FUNCS(block_group_v2_remap_bytes, struct btrfs_block_group_item_v2,
+ remap_bytes, 64);
+BTRFS_SETGET_STACK_FUNCS(stack_block_group_v2_identity_remap_count,
+ struct btrfs_block_group_item_v2, identity_remap_count, 32);
+BTRFS_SETGET_FUNCS(block_group_v2_identity_remap_count, struct btrfs_block_group_item_v2,
+ identity_remap_count, 32);
+
/* struct btrfs_free_space_info */
BTRFS_SETGET_FUNCS(free_space_extent_count, struct btrfs_free_space_info,
extent_count, 32);
diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 0e88e52aa909..3ebce7d6aae0 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -2378,7 +2378,7 @@ static int check_chunk_block_group_mappings(struct btrfs_fs_info *fs_info)
}
static int read_one_block_group(struct btrfs_fs_info *info,
- struct btrfs_block_group_item *bgi,
+ struct btrfs_block_group_item_v2 *bgi,
const struct btrfs_key *key,
int need_clear)
{
@@ -2393,11 +2393,16 @@ static int read_one_block_group(struct btrfs_fs_info *info,
return -ENOMEM;
cache->length = key->offset;
- cache->used = btrfs_stack_block_group_used(bgi);
+ cache->used = btrfs_stack_block_group_v2_used(bgi);
cache->commit_used = cache->used;
- cache->flags = btrfs_stack_block_group_flags(bgi);
- cache->global_root_id = btrfs_stack_block_group_chunk_objectid(bgi);
+ cache->flags = btrfs_stack_block_group_v2_flags(bgi);
+ cache->global_root_id = btrfs_stack_block_group_v2_chunk_objectid(bgi);
cache->space_info = btrfs_find_space_info(info, cache->flags);
+ cache->remap_bytes = btrfs_stack_block_group_v2_remap_bytes(bgi);
+ cache->commit_remap_bytes = cache->remap_bytes;
+ cache->identity_remap_count =
+ btrfs_stack_block_group_v2_identity_remap_count(bgi);
+ cache->commit_identity_remap_count = cache->identity_remap_count;
btrfs_set_free_space_tree_thresholds(cache);
@@ -2462,7 +2467,7 @@ static int read_one_block_group(struct btrfs_fs_info *info,
} else if (cache->length == cache->used) {
cache->cached = BTRFS_CACHE_FINISHED;
btrfs_free_excluded_extents(cache);
- } else if (cache->used == 0) {
+ } else if (cache->used == 0 && cache->remap_bytes == 0) {
cache->cached = BTRFS_CACHE_FINISHED;
ret = btrfs_add_new_free_space(cache, cache->start,
cache->start + cache->length, NULL);
@@ -2482,7 +2487,7 @@ static int read_one_block_group(struct btrfs_fs_info *info,
set_avail_alloc_bits(info, cache->flags);
if (btrfs_chunk_writeable(info, cache->start)) {
- if (cache->used == 0) {
+ if (cache->used == 0 && cache->remap_bytes == 0) {
ASSERT(list_empty(&cache->bg_list));
if (btrfs_test_opt(info, DISCARD_ASYNC))
btrfs_discard_queue_work(&info->discard_ctl, cache);
@@ -2586,9 +2591,10 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info)
need_clear = 1;
while (1) {
- struct btrfs_block_group_item bgi;
+ struct btrfs_block_group_item_v2 bgi;
struct extent_buffer *leaf;
int slot;
+ size_t size;
ret = find_first_block_group(info, path, &key);
if (ret > 0)
@@ -2599,8 +2605,16 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info)
leaf = path->nodes[0];
slot = path->slots[0];
+ if (btrfs_fs_incompat(info, REMAP_TREE)) {
+ size = sizeof(struct btrfs_block_group_item_v2);
+ } else {
+ size = sizeof(struct btrfs_block_group_item);
+ btrfs_set_stack_block_group_v2_remap_bytes(&bgi, 0);
+ btrfs_set_stack_block_group_v2_identity_remap_count(&bgi, 0);
+ }
+
read_extent_buffer(leaf, &bgi, btrfs_item_ptr_offset(leaf, slot),
- sizeof(bgi));
+ size);
btrfs_item_key_to_cpu(leaf, &key, slot);
btrfs_release_path(path);
@@ -2670,25 +2684,38 @@ static int insert_block_group_item(struct btrfs_trans_handle *trans,
struct btrfs_block_group *block_group)
{
struct btrfs_fs_info *fs_info = trans->fs_info;
- struct btrfs_block_group_item bgi;
+ struct btrfs_block_group_item_v2 bgi;
struct btrfs_root *root = btrfs_block_group_root(fs_info);
struct btrfs_key key;
u64 old_commit_used;
+ size_t size;
int ret;
spin_lock(&block_group->lock);
- btrfs_set_stack_block_group_used(&bgi, block_group->used);
- btrfs_set_stack_block_group_chunk_objectid(&bgi,
- block_group->global_root_id);
- btrfs_set_stack_block_group_flags(&bgi, block_group->flags);
+ btrfs_set_stack_block_group_v2_used(&bgi, block_group->used);
+ btrfs_set_stack_block_group_v2_chunk_objectid(&bgi,
+ block_group->global_root_id);
+ btrfs_set_stack_block_group_v2_flags(&bgi, block_group->flags);
+ btrfs_set_stack_block_group_v2_remap_bytes(&bgi,
+ block_group->remap_bytes);
+ btrfs_set_stack_block_group_v2_identity_remap_count(&bgi,
+ block_group->identity_remap_count);
old_commit_used = block_group->commit_used;
block_group->commit_used = block_group->used;
+ block_group->commit_remap_bytes = block_group->remap_bytes;
+ block_group->commit_identity_remap_count =
+ block_group->identity_remap_count;
key.objectid = block_group->start;
key.type = BTRFS_BLOCK_GROUP_ITEM_KEY;
key.offset = block_group->length;
spin_unlock(&block_group->lock);
- ret = btrfs_insert_item(trans, root, &key, &bgi, sizeof(bgi));
+ if (btrfs_fs_incompat(fs_info, REMAP_TREE))
+ size = sizeof(struct btrfs_block_group_item_v2);
+ else
+ size = sizeof(struct btrfs_block_group_item);
+
+ ret = btrfs_insert_item(trans, root, &key, &bgi, size);
if (ret < 0) {
spin_lock(&block_group->lock);
block_group->commit_used = old_commit_used;
@@ -3143,10 +3170,12 @@ static int update_block_group_item(struct btrfs_trans_handle *trans,
struct btrfs_root *root = btrfs_block_group_root(fs_info);
unsigned long bi;
struct extent_buffer *leaf;
- struct btrfs_block_group_item bgi;
+ struct btrfs_block_group_item_v2 bgi;
struct btrfs_key key;
- u64 old_commit_used;
- u64 used;
+ u64 old_commit_used, old_commit_remap_bytes;
+ u32 old_commit_identity_remap_count;
+ u64 used, remap_bytes;
+ u32 identity_remap_count;
/*
* Block group items update can be triggered out of commit transaction
@@ -3156,13 +3185,21 @@ static int update_block_group_item(struct btrfs_trans_handle *trans,
*/
spin_lock(&cache->lock);
old_commit_used = cache->commit_used;
+ old_commit_remap_bytes = cache->commit_remap_bytes;
+ old_commit_identity_remap_count = cache->commit_identity_remap_count;
used = cache->used;
- /* No change in used bytes, can safely skip it. */
- if (cache->commit_used == used) {
+ remap_bytes = cache->remap_bytes;
+ identity_remap_count = cache->identity_remap_count;
+ /* No change in values, can safely skip it. */
+ if (cache->commit_used == used &&
+ cache->commit_remap_bytes == remap_bytes &&
+ cache->commit_identity_remap_count == identity_remap_count) {
spin_unlock(&cache->lock);
return 0;
}
cache->commit_used = used;
+ cache->commit_remap_bytes = remap_bytes;
+ cache->commit_identity_remap_count = identity_remap_count;
spin_unlock(&cache->lock);
key.objectid = cache->start;
@@ -3178,11 +3215,23 @@ static int update_block_group_item(struct btrfs_trans_handle *trans,
leaf = path->nodes[0];
bi = btrfs_item_ptr_offset(leaf, path->slots[0]);
- btrfs_set_stack_block_group_used(&bgi, used);
- btrfs_set_stack_block_group_chunk_objectid(&bgi,
- cache->global_root_id);
- btrfs_set_stack_block_group_flags(&bgi, cache->flags);
- write_extent_buffer(leaf, &bgi, bi, sizeof(bgi));
+ btrfs_set_stack_block_group_v2_used(&bgi, used);
+ btrfs_set_stack_block_group_v2_chunk_objectid(&bgi,
+ cache->global_root_id);
+ btrfs_set_stack_block_group_v2_flags(&bgi, cache->flags);
+
+ if (btrfs_fs_incompat(fs_info, REMAP_TREE)) {
+ btrfs_set_stack_block_group_v2_remap_bytes(&bgi,
+ cache->remap_bytes);
+ btrfs_set_stack_block_group_v2_identity_remap_count(&bgi,
+ cache->identity_remap_count);
+ write_extent_buffer(leaf, &bgi, bi,
+ sizeof(struct btrfs_block_group_item_v2));
+ } else {
+ write_extent_buffer(leaf, &bgi, bi,
+ sizeof(struct btrfs_block_group_item));
+ }
+
fail:
btrfs_release_path(path);
/*
@@ -3197,6 +3246,9 @@ static int update_block_group_item(struct btrfs_trans_handle *trans,
if (ret < 0 && ret != -ENOENT) {
spin_lock(&cache->lock);
cache->commit_used = old_commit_used;
+ cache->commit_remap_bytes = old_commit_remap_bytes;
+ cache->commit_identity_remap_count =
+ old_commit_identity_remap_count;
spin_unlock(&cache->lock);
}
return ret;
diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
index 9172104a5889..af23fdb3cf4d 100644
--- a/fs/btrfs/block-group.h
+++ b/fs/btrfs/block-group.h
@@ -129,6 +129,8 @@ struct btrfs_block_group {
u64 flags;
u64 cache_generation;
u64 global_root_id;
+ u64 remap_bytes;
+ u32 identity_remap_count;
/*
* The last committed used bytes of this block group, if the above @used
@@ -136,6 +138,15 @@ struct btrfs_block_group {
* group item of this block group.
*/
u64 commit_used;
+ /*
+ * The last committed remap_bytes value of this block group.
+ */
+ u64 commit_remap_bytes;
+ /*
+ * The last committed identity_remap_count value of this block group.
+ */
+ u32 commit_identity_remap_count;
+
/*
* If the free space extent count exceeds this number, convert the block
* group to bitmaps.
@@ -282,7 +293,8 @@ static inline bool btrfs_is_block_group_used(const struct btrfs_block_group *bg)
{
lockdep_assert_held(&bg->lock);
- return (bg->used > 0 || bg->reserved > 0 || bg->pinned > 0);
+ return (bg->used > 0 || bg->reserved > 0 || bg->pinned > 0 ||
+ bg->remap_bytes > 0);
}
static inline bool btrfs_is_block_group_data_only(const struct btrfs_block_group *block_group)
diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c
index 89fe85778115..ee5f5b2788e1 100644
--- a/fs/btrfs/discard.c
+++ b/fs/btrfs/discard.c
@@ -373,7 +373,7 @@ void btrfs_discard_queue_work(struct btrfs_discard_ctl *discard_ctl,
if (!block_group || !btrfs_test_opt(block_group->fs_info, DISCARD_ASYNC))
return;
- if (block_group->used == 0)
+ if (block_group->used == 0 && block_group->remap_bytes == 0)
add_to_discard_unused_list(discard_ctl, block_group);
else
add_to_discard_list(discard_ctl, block_group);
diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index bce0d86b256f..d035b86d8942 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -688,6 +688,7 @@ static int check_block_group_item(struct extent_buffer *leaf,
u64 chunk_objectid;
u64 flags;
u64 type;
+ size_t exp_size;
/*
* Here we don't really care about alignment since extent allocator can
@@ -699,10 +700,15 @@ static int check_block_group_item(struct extent_buffer *leaf,
return -EUCLEAN;
}
- if (unlikely(item_size != sizeof(bgi))) {
+ if (btrfs_fs_incompat(fs_info, REMAP_TREE))
+ exp_size = sizeof(struct btrfs_block_group_item_v2);
+ else
+ exp_size = sizeof(struct btrfs_block_group_item);
+
+ if (unlikely(item_size != exp_size)) {
block_group_err(leaf, slot,
"invalid item size, have %u expect %zu",
- item_size, sizeof(bgi));
+ item_size, exp_size);
return -EUCLEAN;
}
diff --git a/include/uapi/linux/btrfs_tree.h b/include/uapi/linux/btrfs_tree.h
index 9a36f0206d90..500e3a7df90b 100644
--- a/include/uapi/linux/btrfs_tree.h
+++ b/include/uapi/linux/btrfs_tree.h
@@ -1229,6 +1229,14 @@ struct btrfs_block_group_item {
__le64 flags;
} __attribute__ ((__packed__));
+struct btrfs_block_group_item_v2 {
+ __le64 used;
+ __le64 chunk_objectid;
+ __le64 flags;
+ __le64 remap_bytes;
+ __le32 identity_remap_count;
+} __attribute__ ((__packed__));
+
struct btrfs_free_space_info {
__le32 extent_count;
__le32 flags;
--
2.51.0
* [PATCH v6 07/16] btrfs: allow mounting filesystems with remap-tree incompat flag
2025-11-14 18:47 [PATCH v6 00/16] Remap tree Mark Harmstone
` (5 preceding siblings ...)
2025-11-14 18:47 ` [PATCH v6 06/16] btrfs: add extended version of struct block_group_item Mark Harmstone
@ 2025-11-14 18:47 ` Mark Harmstone
2025-11-14 18:47 ` [PATCH v6 08/16] btrfs: redirect I/O for remapped block groups Mark Harmstone
` (8 subsequent siblings)
15 siblings, 0 replies; 23+ messages in thread
From: Mark Harmstone @ 2025-11-14 18:47 UTC (permalink / raw)
To: linux-btrfs; +Cc: Mark Harmstone, Boris Burkov
If we encounter a filesystem with the remap-tree incompat flag set,
validate its compatibility with the other flags, and load the remap tree
using the values that have been added to the superblock.
The remap-tree feature depends on the free-space tree, but no-holes and
block-group-tree have been made dependencies to reduce the testing
matrix. Similarly, I'm not aware of any reason why mixed-bg and zoned would be
incompatible with remap-tree, but these are blocked for the time being
until they can be fully tested.
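
The dependency rules above can be sketched as a pair of bitmask checks. This
is only an illustration: the FEAT_* bits and remap_tree_features_ok() are
invented for the example and do not correspond to the kernel's real flag
values, which are split between the incompat and compat_ro fields. The
block-size check is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical feature bits, for illustration only. */
enum {
    FEAT_REMAP_TREE       = 1u << 0,
    FEAT_FREE_SPACE_TREE  = 1u << 1,
    FEAT_NO_HOLES         = 1u << 2,
    FEAT_BLOCK_GROUP_TREE = 1u << 3,
    FEAT_MIXED_GROUPS     = 1u << 4,
    FEAT_ZONED            = 1u << 5,

    /* Features that must accompany remap-tree. */
    REMAP_REQUIRED = FEAT_FREE_SPACE_TREE | FEAT_NO_HOLES |
                     FEAT_BLOCK_GROUP_TREE,
    /* Features that are rejected alongside remap-tree for now. */
    REMAP_BLOCKED  = FEAT_MIXED_GROUPS | FEAT_ZONED,
};

static_assert((REMAP_REQUIRED & REMAP_BLOCKED) == 0,
              "a feature cannot be both required and blocked");

/* Returns true if the feature set is acceptable with respect to remap-tree. */
static bool remap_tree_features_ok(unsigned int features)
{
    /* Nothing to validate if remap-tree is not enabled. */
    if (!(features & FEAT_REMAP_TREE))
        return true;

    /* Every required feature must be present. */
    if ((features & REMAP_REQUIRED) != REMAP_REQUIRED)
        return false;

    /* None of the blocked features may be present. */
    return (features & REMAP_BLOCKED) == 0;
}
```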
Signed-off-by: Mark Harmstone <mark@harmstone.com>
Reviewed-by: Boris Burkov <boris@bur.io>
---
fs/btrfs/Kconfig | 2 +
fs/btrfs/accessors.h | 6 ++
fs/btrfs/disk-io.c | 106 ++++++++++++++++++++++++++++----
fs/btrfs/extent-tree.c | 2 +
fs/btrfs/fs.h | 4 +-
fs/btrfs/transaction.c | 7 +++
include/uapi/linux/btrfs_tree.h | 5 +-
7 files changed, 118 insertions(+), 14 deletions(-)
diff --git a/fs/btrfs/Kconfig b/fs/btrfs/Kconfig
index 4438637c8900..77b5a9f27840 100644
--- a/fs/btrfs/Kconfig
+++ b/fs/btrfs/Kconfig
@@ -117,4 +117,6 @@ config BTRFS_EXPERIMENTAL
- large folio support
+ - remap-tree - logical address remapping tree
+
If unsure, say N.
diff --git a/fs/btrfs/accessors.h b/fs/btrfs/accessors.h
index 772d7d61a2fc..e45afdd0e774 100644
--- a/fs/btrfs/accessors.h
+++ b/fs/btrfs/accessors.h
@@ -883,6 +883,12 @@ BTRFS_SETGET_STACK_FUNCS(super_uuid_tree_generation, struct btrfs_super_block,
uuid_tree_generation, 64);
BTRFS_SETGET_STACK_FUNCS(super_nr_global_roots, struct btrfs_super_block,
nr_global_roots, 64);
+BTRFS_SETGET_STACK_FUNCS(super_remap_root, struct btrfs_super_block,
+ remap_root, 64);
+BTRFS_SETGET_STACK_FUNCS(super_remap_root_generation, struct btrfs_super_block,
+ remap_root_generation, 64);
+BTRFS_SETGET_STACK_FUNCS(super_remap_root_level, struct btrfs_super_block,
+ remap_root_level, 8);
/* struct btrfs_file_extent_item */
BTRFS_SETGET_STACK_FUNCS(stack_file_extent_type, struct btrfs_file_extent_item,
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index af1f6d5f6765..9809e30fe103 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1188,6 +1188,8 @@ static struct btrfs_root *btrfs_get_global_root(struct btrfs_fs_info *fs_info,
return btrfs_grab_root(btrfs_global_root(fs_info, &key));
case BTRFS_RAID_STRIPE_TREE_OBJECTID:
return btrfs_grab_root(fs_info->stripe_root);
+ case BTRFS_REMAP_TREE_OBJECTID:
+ return btrfs_grab_root(fs_info->remap_root);
default:
return NULL;
}
@@ -1279,6 +1281,7 @@ void btrfs_free_fs_info(struct btrfs_fs_info *fs_info)
btrfs_put_root(fs_info->data_reloc_root);
btrfs_put_root(fs_info->block_group_root);
btrfs_put_root(fs_info->stripe_root);
+ btrfs_put_root(fs_info->remap_root);
btrfs_check_leaked_roots(fs_info);
btrfs_extent_buffer_leak_debug_check(fs_info);
kfree(fs_info->super_copy);
@@ -1831,6 +1834,7 @@ static void free_root_pointers(struct btrfs_fs_info *info, bool free_chunk_root)
free_root_extent_buffers(info->data_reloc_root);
free_root_extent_buffers(info->block_group_root);
free_root_extent_buffers(info->stripe_root);
+ free_root_extent_buffers(info->remap_root);
if (free_chunk_root)
free_root_extent_buffers(info->chunk_root);
}
@@ -2260,20 +2264,45 @@ static int btrfs_read_roots(struct btrfs_fs_info *fs_info)
if (ret)
goto out;
- /*
- * This tree can share blocks with some other fs tree during relocation
- * and we need a proper setup by btrfs_get_fs_root
- */
- root = btrfs_get_fs_root(tree_root->fs_info,
- BTRFS_DATA_RELOC_TREE_OBJECTID, true);
- if (IS_ERR(root)) {
- if (!btrfs_test_opt(fs_info, IGNOREBADROOTS)) {
- ret = PTR_ERR(root);
- goto out;
+ if (btrfs_fs_incompat(fs_info, REMAP_TREE)) {
+ /* remap_root already loaded in load_important_roots() */
+ root = fs_info->remap_root;
+
+ set_bit(BTRFS_ROOT_TRACK_DIRTY, &root->state);
+
+ root->root_key.objectid = BTRFS_REMAP_TREE_OBJECTID;
+ root->root_key.type = BTRFS_ROOT_ITEM_KEY;
+ root->root_key.offset = 0;
+
+ /* Check that data reloc tree doesn't also exist */
+ location.objectid = BTRFS_DATA_RELOC_TREE_OBJECTID;
+ root = btrfs_read_tree_root(fs_info->tree_root, &location);
+ if (!IS_ERR(root)) {
+ btrfs_err(fs_info,
+ "data reloc tree exists when remap-tree enabled");
+ btrfs_put_root(root);
+ return -EIO;
+ } else if (PTR_ERR(root) != -ENOENT) {
+ btrfs_warn(fs_info,
+ "error %ld when checking for data reloc tree",
+ PTR_ERR(root));
}
} else {
- set_bit(BTRFS_ROOT_TRACK_DIRTY, &root->state);
- fs_info->data_reloc_root = root;
+ /*
+ * This tree can share blocks with some other fs tree during
+ * relocation and we need a proper setup by btrfs_get_fs_root
+ */
+ root = btrfs_get_fs_root(tree_root->fs_info,
+ BTRFS_DATA_RELOC_TREE_OBJECTID, true);
+ if (IS_ERR(root)) {
+ if (!btrfs_test_opt(fs_info, IGNOREBADROOTS)) {
+ ret = PTR_ERR(root);
+ goto out;
+ }
+ } else {
+ set_bit(BTRFS_ROOT_TRACK_DIRTY, &root->state);
+ fs_info->data_reloc_root = root;
+ }
}
location.objectid = BTRFS_QUOTA_TREE_OBJECTID;
@@ -2513,6 +2542,36 @@ int btrfs_validate_super(const struct btrfs_fs_info *fs_info,
ret = -EINVAL;
}
+ if (btrfs_fs_incompat(fs_info, REMAP_TREE)) {
+ /*
+ * Reduce test matrix for remap tree by requiring block-group-tree
+ * and no-holes. Free-space-tree is a hard requirement.
+ */
+ if (!btrfs_fs_compat_ro(fs_info, FREE_SPACE_TREE_VALID) ||
+ !btrfs_fs_incompat(fs_info, NO_HOLES) ||
+ !btrfs_fs_compat_ro(fs_info, BLOCK_GROUP_TREE)) {
+ btrfs_err(fs_info,
+"remap-tree feature requires free-space-tree, no-holes, and block-group-tree");
+ ret = -EINVAL;
+ }
+
+ if (btrfs_fs_incompat(fs_info, MIXED_GROUPS)) {
+ btrfs_err(fs_info, "remap-tree not supported with mixed-bg");
+ ret = -EINVAL;
+ }
+
+ if (btrfs_fs_incompat(fs_info, ZONED)) {
+ btrfs_err(fs_info, "remap-tree not supported with zoned devices");
+ ret = -EINVAL;
+ }
+
+ if (sectorsize > PAGE_SIZE) {
+ btrfs_err(fs_info,
+ "remap-tree not supported when block size > page size");
+ ret = -EINVAL;
+ }
+ }
+
/*
* Hint to catch really bogus numbers, bitflips or so, more exact checks are
* done later
@@ -2671,6 +2730,18 @@ static int load_important_roots(struct btrfs_fs_info *fs_info)
btrfs_warn(fs_info, "couldn't read tree root");
return ret;
}
+
+ if (btrfs_fs_incompat(fs_info, REMAP_TREE)) {
+ bytenr = btrfs_super_remap_root(sb);
+ gen = btrfs_super_remap_root_generation(sb);
+ level = btrfs_super_remap_root_level(sb);
+ ret = load_super_root(fs_info->remap_root, bytenr, gen, level);
+ if (ret) {
+ btrfs_warn(fs_info, "couldn't read remap root");
+ return ret;
+ }
+ }
+
return 0;
}
@@ -3288,6 +3359,7 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
struct btrfs_fs_info *fs_info = btrfs_sb(sb);
struct btrfs_root *tree_root;
struct btrfs_root *chunk_root;
+ struct btrfs_root *remap_root;
int ret;
int level;
@@ -3421,6 +3493,16 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
if (ret < 0)
goto fail_alloc;
+ if (btrfs_super_incompat_flags(disk_super) & BTRFS_FEATURE_INCOMPAT_REMAP_TREE) {
+ remap_root = btrfs_alloc_root(fs_info, BTRFS_REMAP_TREE_OBJECTID,
+ GFP_KERNEL);
+ fs_info->remap_root = remap_root;
+ if (!remap_root) {
+ ret = -ENOMEM;
+ goto fail_alloc;
+ }
+ }
+
/*
* At this point our mount options are validated, if we set ->max_inline
* to something non-standard make sure we truncate it to sectorsize.
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 383f6fce0079..a7e522f67cca 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2590,6 +2590,8 @@ static u64 get_alloc_profile_by_root(struct btrfs_root *root, int data)
flags = BTRFS_BLOCK_GROUP_DATA;
else if (root == fs_info->chunk_root)
flags = BTRFS_BLOCK_GROUP_SYSTEM;
+ else if (root == fs_info->remap_root)
+ flags = BTRFS_BLOCK_GROUP_REMAP;
else
flags = BTRFS_BLOCK_GROUP_METADATA;
diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
index 2d9dc32c7af9..72fde0a3aaaf 100644
--- a/fs/btrfs/fs.h
+++ b/fs/btrfs/fs.h
@@ -305,7 +305,8 @@ enum {
#define BTRFS_FEATURE_INCOMPAT_SUPP \
(BTRFS_FEATURE_INCOMPAT_SUPP_STABLE | \
BTRFS_FEATURE_INCOMPAT_RAID_STRIPE_TREE | \
- BTRFS_FEATURE_INCOMPAT_EXTENT_TREE_V2)
+ BTRFS_FEATURE_INCOMPAT_EXTENT_TREE_V2 | \
+ BTRFS_FEATURE_INCOMPAT_REMAP_TREE)
#else
@@ -465,6 +466,7 @@ struct btrfs_fs_info {
struct btrfs_root *data_reloc_root;
struct btrfs_root *block_group_root;
struct btrfs_root *stripe_root;
+ struct btrfs_root *remap_root;
/* The log root tree is a directory of all the other log roots */
struct btrfs_root *log_root_tree;
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 05ee4391c83a..ecad83c0783a 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1966,6 +1966,13 @@ static void update_super_roots(struct btrfs_fs_info *fs_info)
super->cache_generation = 0;
if (test_bit(BTRFS_FS_UPDATE_UUID_TREE_GEN, &fs_info->flags))
super->uuid_tree_generation = root_item->generation;
+
+ if (btrfs_fs_incompat(fs_info, REMAP_TREE)) {
+ root_item = &fs_info->remap_root->root_item;
+ super->remap_root = root_item->bytenr;
+ super->remap_root_generation = root_item->generation;
+ super->remap_root_level = root_item->level;
+ }
}
int btrfs_transaction_blocked(struct btrfs_fs_info *info)
diff --git a/include/uapi/linux/btrfs_tree.h b/include/uapi/linux/btrfs_tree.h
index 500e3a7df90b..89bcb80081a6 100644
--- a/include/uapi/linux/btrfs_tree.h
+++ b/include/uapi/linux/btrfs_tree.h
@@ -721,9 +721,12 @@ struct btrfs_super_block {
__u8 metadata_uuid[BTRFS_FSID_SIZE];
__u64 nr_global_roots;
+ __le64 remap_root;
+ __le64 remap_root_generation;
+ __u8 remap_root_level;
/* Future expansion */
- __le64 reserved[27];
+ __u8 reserved[199];
__u8 sys_chunk_array[BTRFS_SYSTEM_CHUNK_ARRAY_SIZE];
struct btrfs_root_backup super_roots[BTRFS_NUM_BACKUP_ROOTS];
--
2.51.0
* [PATCH v6 08/16] btrfs: redirect I/O for remapped block groups
2025-11-14 18:47 [PATCH v6 00/16] Remap tree Mark Harmstone
` (6 preceding siblings ...)
2025-11-14 18:47 ` [PATCH v6 07/16] btrfs: allow mounting filesystems with remap-tree incompat flag Mark Harmstone
@ 2025-11-14 18:47 ` Mark Harmstone
2025-11-14 18:47 ` [PATCH v6 09/16] btrfs: handle deletions from remapped block group Mark Harmstone
` (7 subsequent siblings)
15 siblings, 0 replies; 23+ messages in thread
From: Mark Harmstone @ 2025-11-14 18:47 UTC (permalink / raw)
To: linux-btrfs; +Cc: Mark Harmstone, Boris Burkov
Change btrfs_map_block() so that if the block group has the REMAPPED
flag set, we call btrfs_translate_remap() to obtain a new address.
btrfs_translate_remap() searches the remap tree for a range
corresponding to the logical address passed to btrfs_map_block(). If it
is within an identity remap, this part of the block group hasn't yet
been relocated, and so we use the existing address.
If it is within an actual remap, we subtract the start of the remap
range and add the address of its destination, contained in the item's
payload.
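
The address arithmetic described above can be sketched in userspace as
follows. The struct remap_entry and translate() names are hypothetical; the
entry stands in for a single remap-tree item that has already been looked up
(start and len from the key, dest from the payload), so the tree search itself
is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical in-memory view of one remap-tree item. */
struct remap_entry {
    uint64_t start;    /* first logical address covered (key objectid) */
    uint64_t len;      /* length of the covered range (key offset) */
    bool     identity; /* true for an identity remap */
    uint64_t dest;     /* destination address; unused for identity remaps */
};

/*
 * Translate *logical through the entry, clamping *length so the I/O does
 * not run past the end of the entry. Returns 0 on success, or -1 if the
 * address is not covered by this entry.
 */
static int translate(const struct remap_entry *e, uint64_t *logical,
                     uint64_t *length)
{
    assert(e && logical && length);

    uint64_t end = e->start + e->len;

    if (*logical < e->start || *logical >= end)
        return -1;

    /* Clamp the length to the part that lies inside this entry. */
    if (*logical + *length > end)
        *length = end - *logical;

    /* Identity remap: this range has not been relocated yet. */
    if (e->identity)
        return 0;

    /* Actual remap: keep the offset within the range, swap the base. */
    *logical = *logical - e->start + e->dest;
    return 0;
}
```

For example, with an entry covering 0x1000 bytes at 0x1000 and a destination
of 0x9000, a request for 0x1000 bytes at 0x1800 is translated to 0x9800 and
shortened to 0x800 bytes; the caller would then issue a second lookup for the
remainder.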
Signed-off-by: Mark Harmstone <mark@harmstone.com>
Reviewed-by: Boris Burkov <boris@bur.io>
---
fs/btrfs/relocation.c | 54 +++++++++++++++++++++++++++++++++++++++++++
fs/btrfs/relocation.h | 2 ++
fs/btrfs/volumes.c | 19 +++++++++++++++
3 files changed, 75 insertions(+)
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 739fca944296..00e1898edbbe 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -3860,6 +3860,60 @@ static const char *stage_to_string(enum reloc_stage stage)
return "unknown";
}
+int btrfs_translate_remap(struct btrfs_fs_info *fs_info, u64 *logical,
+ u64 *length)
+{
+ int ret;
+ struct btrfs_key key, found_key;
+ struct extent_buffer *leaf;
+ struct btrfs_remap *remap;
+ BTRFS_PATH_AUTO_FREE(path);
+
+ path = btrfs_alloc_path();
+ if (!path)
+ return -ENOMEM;
+
+ key.objectid = *logical;
+ key.type = (u8)-1;
+ key.offset = (u64)-1;
+
+ ret = btrfs_search_slot(NULL, fs_info->remap_root, &key, path,
+ 0, 0);
+ if (ret < 0)
+ return ret;
+
+ leaf = path->nodes[0];
+
+ if (path->slots[0] == 0)
+ return -ENOENT;
+
+ path->slots[0]--;
+
+ btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
+
+ if (found_key.type != BTRFS_REMAP_KEY &&
+ found_key.type != BTRFS_IDENTITY_REMAP_KEY) {
+ return -ENOENT;
+ }
+
+ if (found_key.objectid > *logical ||
+ found_key.objectid + found_key.offset <= *logical) {
+ return -ENOENT;
+ }
+
+ if (*logical + *length > found_key.objectid + found_key.offset)
+ *length = found_key.objectid + found_key.offset - *logical;
+
+ if (found_key.type == BTRFS_IDENTITY_REMAP_KEY)
+ return 0;
+
+ remap = btrfs_item_ptr(leaf, path->slots[0], struct btrfs_remap);
+
+ *logical += btrfs_remap_address(leaf, remap) - found_key.objectid;
+
+ return 0;
+}
+
/*
* function to relocate all extents in a block group.
*/
diff --git a/fs/btrfs/relocation.h b/fs/btrfs/relocation.h
index 5c36b3f84b57..b2ba83966650 100644
--- a/fs/btrfs/relocation.h
+++ b/fs/btrfs/relocation.h
@@ -31,5 +31,7 @@ int btrfs_should_cancel_balance(const struct btrfs_fs_info *fs_info);
struct btrfs_root *find_reloc_root(struct btrfs_fs_info *fs_info, u64 bytenr);
bool btrfs_should_ignore_reloc_root(const struct btrfs_root *root);
u64 btrfs_get_reloc_bg_bytenr(const struct btrfs_fs_info *fs_info);
+int btrfs_translate_remap(struct btrfs_fs_info *fs_info, u64 *logical,
+ u64 *length);
#endif
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 453e8581650e..6a72c2a599a6 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6586,6 +6586,25 @@ int btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
if (IS_ERR(map))
return PTR_ERR(map);
+ if (map->type & BTRFS_BLOCK_GROUP_REMAPPED) {
+ u64 new_logical = logical;
+
+ ret = btrfs_translate_remap(fs_info, &new_logical, length);
+ if (ret)
+ return ret;
+
+ if (new_logical != logical) {
+ btrfs_free_chunk_map(map);
+
+ map = btrfs_get_chunk_map(fs_info, new_logical,
+ *length);
+ if (IS_ERR(map))
+ return PTR_ERR(map);
+
+ logical = new_logical;
+ }
+ }
+
num_copies = btrfs_chunk_map_num_copies(map);
if (io_geom.mirror_num > num_copies)
return -EINVAL;
--
2.51.0
* [PATCH v6 09/16] btrfs: handle deletions from remapped block group
2025-11-14 18:47 [PATCH v6 00/16] Remap tree Mark Harmstone
` (7 preceding siblings ...)
2025-11-14 18:47 ` [PATCH v6 08/16] btrfs: redirect I/O for remapped block groups Mark Harmstone
@ 2025-11-14 18:47 ` Mark Harmstone
2025-11-20 0:17 ` Boris Burkov
2025-11-14 18:47 ` [PATCH v6 10/16] btrfs: handle setting up relocation of block group with remap-tree Mark Harmstone
` (6 subsequent siblings)
15 siblings, 1 reply; 23+ messages in thread
From: Mark Harmstone @ 2025-11-14 18:47 UTC (permalink / raw)
To: linux-btrfs; +Cc: Mark Harmstone
Handle the case where we free an extent from a block group that has the
REMAPPED flag set. Because the remap tree is orthogonal to the extent
tree, a data extent may span any number of identity remaps or actual
remaps. If we're freeing a metadata node, it will lie wholly inside one
or the other.
btrfs_remove_extent_from_remap_tree() searches the remap tree for the
remaps that cover the range in question, then calls
remove_range_from_remap_tree() for each one, to punch a hole in the
remap and adjust the free-space tree.
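
The walk described above can be sketched in user-space C. This is only
an illustration of the loop shape, not the kernel code: struct
remap_item and count_covering_remaps() are hypothetical names, a sorted
array stands in for the remap tree, and the items are not modified.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory stand-in for one remap-tree item. */
struct remap_item {
	uint64_t start;   /* plays the role of key.objectid */
	uint64_t length;  /* plays the role of key.offset */
};

/*
 * Walk [bytenr, bytenr + num_bytes) across a sorted, non-overlapping array
 * of items. Returns the number of items touched, or -1 if some part of the
 * range is not covered by any item (the patch returns -ENOENT there).
 */
static int count_covering_remaps(const struct remap_item *items, size_t n,
				 uint64_t bytenr, uint64_t num_bytes)
{
	int touched = 0;

	assert(items != NULL || n == 0);

	while (num_bytes > 0) {
		const struct remap_item *found = NULL;
		uint64_t end, overlap;

		/* Find the last item that starts at or before bytenr. */
		for (size_t i = 0; i < n && items[i].start <= bytenr; i++)
			found = &items[i];

		if (!found || bytenr >= found->start + found->length)
			return -1;

		/* Only the part of the range inside this item is consumed. */
		end = found->start + found->length;
		overlap = end - bytenr;
		if (overlap > num_bytes)
			overlap = num_bytes;

		/* The real code punches a hole in the item at this point. */
		touched++;
		bytenr += overlap;
		num_bytes -= overlap;
	}

	return touched;
}
```

With items covering [1000, 1100), [1100, 1150) and [1200, 1300), freeing
80 bytes at 1050 touches two items, while freeing 20 bytes at 1140 runs
into the uncovered gap at 1150 and fails.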
For an identity remap, remove_range_from_remap_tree() will adjust the
block group's `identity_remap_count` if this changes. If it reaches
zero we mark the block group as fully remapped.
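
As a rough illustration of the counting rule, the sketch below punches a
hole in a single identity remap and reports how the number of identity
remaps changes: minus one for the deleted item, plus one for each piece
that gets re-added. The types and the function name are hypothetical and
exist only for this example; the arithmetic follows
remove_range_from_remap_tree().

```c
#include <assert.h>
#include <stdint.h>

struct range {
	uint64_t start;
	uint64_t end;     /* exclusive */
};

struct punch_result {
	struct range left;    /* surviving piece before the hole, if has_left */
	struct range right;   /* surviving piece after the hole, if has_right */
	int has_left;
	int has_right;
	int count_delta;      /* change in the number of identity remaps */
	uint64_t overlap;     /* bytes removed from this remap */
};

static struct punch_result punch_identity_remap(struct range remap,
						struct range hole)
{
	struct punch_result r = { 0 };
	uint64_t lo, hi;

	/* The hole must intersect the remap. */
	assert(hole.start < remap.end && hole.end > remap.start);

	/* The old item is always deleted. */
	r.count_delta = -1;

	/* Re-add the part before the hole, if any. */
	if (hole.start > remap.start) {
		r.left = (struct range){ remap.start, hole.start };
		r.has_left = 1;
		r.count_delta++;
	}

	/* Re-add the part after the hole, if any. */
	if (hole.end < remap.end) {
		r.right = (struct range){ hole.end, remap.end };
		r.has_right = 1;
		r.count_delta++;
	}

	lo = hole.start > remap.start ? hole.start : remap.start;
	hi = hole.end < remap.end ? hole.end : remap.end;
	r.overlap = hi - lo;

	return r;
}
```

A hole in the middle of a remap gives a delta of +1, a hole that trims
one end gives 0, and a hole covering the whole remap gives -1. Only the
last case moves the count towards zero.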
Fully remapped block groups then have their chunk stripes removed and
their device extents freed, which makes the disk space available again
to the chunk allocator.
Without async discard this is done from the cleaner thread, via
btrfs_handle_fully_remapped_bgs(). It's a quick, rare operation, and it
prevents the chunk allocator from ENOSPCing - but see later patches
which do this asynchronously for the case of async discard.
Signed-off-by: Mark Harmstone <mark@harmstone.com>
---
fs/btrfs/block-group.c | 101 ++++++---
fs/btrfs/block-group.h | 4 +
fs/btrfs/disk-io.c | 6 +
fs/btrfs/extent-tree.c | 76 ++++++-
fs/btrfs/extent-tree.h | 1 +
fs/btrfs/fs.h | 4 +-
fs/btrfs/relocation.c | 452 +++++++++++++++++++++++++++++++++++++++++
fs/btrfs/relocation.h | 5 +
fs/btrfs/volumes.c | 56 +++--
fs/btrfs/volumes.h | 6 +
10 files changed, 656 insertions(+), 55 deletions(-)
diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 3ebce7d6aae0..e269518e1bfe 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1068,6 +1068,32 @@ static int remove_block_group_item(struct btrfs_trans_handle *trans,
return ret;
}
+void btrfs_remove_bg_from_sinfo(struct btrfs_block_group *block_group)
+{
+ int factor = btrfs_bg_type_to_factor(block_group->flags);
+
+ spin_lock(&block_group->space_info->lock);
+
+ if (btrfs_test_opt(block_group->fs_info, ENOSPC_DEBUG)) {
+ WARN_ON(block_group->space_info->total_bytes
+ < block_group->length);
+ WARN_ON(block_group->space_info->bytes_readonly
+ < block_group->length - block_group->zone_unusable);
+ WARN_ON(block_group->space_info->bytes_zone_unusable
+ < block_group->zone_unusable);
+ WARN_ON(block_group->space_info->disk_total
+ < block_group->length * factor);
+ }
+ block_group->space_info->total_bytes -= block_group->length;
+ block_group->space_info->bytes_readonly -=
+ (block_group->length - block_group->zone_unusable);
+ btrfs_space_info_update_bytes_zone_unusable(block_group->space_info,
+ -block_group->zone_unusable);
+ block_group->space_info->disk_total -= block_group->length * factor;
+
+ spin_unlock(&block_group->space_info->lock);
+}
+
int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
struct btrfs_chunk_map *map)
{
@@ -1079,7 +1105,6 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
struct kobject *kobj = NULL;
int ret;
int index;
- int factor;
struct btrfs_caching_control *caching_ctl = NULL;
bool remove_map;
bool remove_rsv = false;
@@ -1088,7 +1113,7 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
if (!block_group)
return -ENOENT;
- BUG_ON(!block_group->ro);
+ BUG_ON(!block_group->ro && !(block_group->flags & BTRFS_BLOCK_GROUP_REMAPPED));
trace_btrfs_remove_block_group(block_group);
/*
@@ -1100,7 +1125,6 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
block_group->length);
index = btrfs_bg_flags_to_raid_index(block_group->flags);
- factor = btrfs_bg_type_to_factor(block_group->flags);
/* make sure this block group isn't part of an allocation cluster */
cluster = &fs_info->data_alloc_cluster;
@@ -1224,26 +1248,11 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
spin_lock(&block_group->space_info->lock);
list_del_init(&block_group->ro_list);
-
- if (btrfs_test_opt(fs_info, ENOSPC_DEBUG)) {
- WARN_ON(block_group->space_info->total_bytes
- < block_group->length);
- WARN_ON(block_group->space_info->bytes_readonly
- < block_group->length - block_group->zone_unusable);
- WARN_ON(block_group->space_info->bytes_zone_unusable
- < block_group->zone_unusable);
- WARN_ON(block_group->space_info->disk_total
- < block_group->length * factor);
- }
- block_group->space_info->total_bytes -= block_group->length;
- block_group->space_info->bytes_readonly -=
- (block_group->length - block_group->zone_unusable);
- btrfs_space_info_update_bytes_zone_unusable(block_group->space_info,
- -block_group->zone_unusable);
- block_group->space_info->disk_total -= block_group->length * factor;
-
spin_unlock(&block_group->space_info->lock);
+ if (!(block_group->flags & BTRFS_BLOCK_GROUP_REMAPPED))
+ btrfs_remove_bg_from_sinfo(block_group);
+
/*
* Remove the free space for the block group from the free space tree
* and the block group's item from the extent tree before marking the
@@ -1578,8 +1587,10 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
spin_lock(&space_info->lock);
spin_lock(&block_group->lock);
- if (btrfs_is_block_group_used(block_group) || block_group->ro ||
- list_is_singular(&block_group->list)) {
+ if (btrfs_is_block_group_used(block_group) ||
+ (block_group->ro && !(block_group->flags & BTRFS_BLOCK_GROUP_REMAPPED)) ||
+ list_is_singular(&block_group->list) ||
+ block_group->fully_remapped) {
/*
* We want to bail if we made new allocations or have
* outstanding allocations in this block group. We do
@@ -1620,9 +1631,10 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
* needing to allocate extents from the block group.
*/
used = btrfs_space_info_used(space_info, true);
- if ((space_info->total_bytes - block_group->length < used &&
- block_group->zone_unusable < block_group->length) ||
- has_unwritten_metadata(block_group)) {
+ if (((space_info->total_bytes - block_group->length < used &&
+ block_group->zone_unusable < block_group->length) ||
+ has_unwritten_metadata(block_group)) &&
+ !(block_group->flags & BTRFS_BLOCK_GROUP_REMAPPED)) {
/*
* Add a reference for the list, compensate for the ref
* drop under the "next" label for the
@@ -1787,6 +1799,12 @@ void btrfs_mark_bg_unused(struct btrfs_block_group *bg)
btrfs_get_block_group(bg);
trace_btrfs_add_unused_block_group(bg);
list_add_tail(&bg->bg_list, &fs_info->unused_bgs);
+ } else if (bg->flags & BTRFS_BLOCK_GROUP_REMAPPED &&
+ bg->identity_remap_count == 0) {
+ /*
+ * Leave fully remapped block groups on the
+ * fully_remapped_bgs list.
+ */
} else if (!test_bit(BLOCK_GROUP_FLAG_NEW, &bg->runtime_flags)) {
/* Pull out the block group from the reclaim_bgs list. */
trace_btrfs_add_unused_block_group(bg);
@@ -4600,6 +4618,14 @@ int btrfs_free_block_groups(struct btrfs_fs_info *info)
list_del_init(&block_group->bg_list);
btrfs_put_block_group(block_group);
}
+
+ while (!list_empty(&info->fully_remapped_bgs)) {
+ block_group = list_first_entry(&info->fully_remapped_bgs,
+ struct btrfs_block_group,
+ bg_list);
+ list_del_init(&block_group->bg_list);
+ btrfs_put_block_group(block_group);
+ }
spin_unlock(&info->unused_bgs_lock);
spin_lock(&info->zone_active_bgs_lock);
@@ -4787,3 +4813,26 @@ bool btrfs_block_group_should_use_size_class(const struct btrfs_block_group *bg)
return false;
return true;
}
+
+void btrfs_mark_bg_fully_remapped(struct btrfs_block_group *bg,
+ struct btrfs_trans_handle *trans)
+{
+ struct btrfs_fs_info *fs_info = trans->fs_info;
+
+ spin_lock(&fs_info->unused_bgs_lock);
+
+ /*
+ * The block group might already be on the unused_bgs list, remove it
+ * if it is. It'll get readded after the async discard worker finishes,
+ * or in btrfs_handle_fully_remapped_bgs() if we're not using async
+ * discard.
+ */
+ if (!list_empty(&bg->bg_list))
+ list_del(&bg->bg_list);
+ else
+ btrfs_get_block_group(bg);
+
+ list_add_tail(&bg->bg_list, &fs_info->fully_remapped_bgs);
+
+ spin_unlock(&fs_info->unused_bgs_lock);
+}
diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
index af23fdb3cf4d..d85f3c2546d0 100644
--- a/fs/btrfs/block-group.h
+++ b/fs/btrfs/block-group.h
@@ -282,6 +282,7 @@ struct btrfs_block_group {
struct extent_buffer *last_eb;
enum btrfs_block_group_size_class size_class;
u64 reclaim_mark;
+ bool fully_remapped;
};
static inline u64 btrfs_block_group_end(const struct btrfs_block_group *block_group)
@@ -336,6 +337,7 @@ int btrfs_add_new_free_space(struct btrfs_block_group *block_group,
struct btrfs_trans_handle *btrfs_start_trans_remove_block_group(
struct btrfs_fs_info *fs_info,
const u64 chunk_offset);
+void btrfs_remove_bg_from_sinfo(struct btrfs_block_group *block_group);
int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
struct btrfs_chunk_map *map);
void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info);
@@ -407,5 +409,7 @@ int btrfs_use_block_group_size_class(struct btrfs_block_group *bg,
enum btrfs_block_group_size_class size_class,
bool force_wrong_size_class);
bool btrfs_block_group_should_use_size_class(const struct btrfs_block_group *bg);
+void btrfs_mark_bg_fully_remapped(struct btrfs_block_group *bg,
+ struct btrfs_trans_handle *trans);
#endif /* BTRFS_BLOCK_GROUP_H */
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 9809e30fe103..53221a0131fb 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1526,6 +1526,10 @@ static int cleaner_kthread(void *arg)
*/
btrfs_run_defrag_inodes(fs_info);
+ if (btrfs_fs_incompat(fs_info, REMAP_TREE) &&
+ !btrfs_test_opt(fs_info, DISCARD_ASYNC))
+ btrfs_handle_fully_remapped_bgs(fs_info);
+
/*
* Acquires fs_info->reclaim_bgs_lock to avoid racing
* with relocation (btrfs_relocate_chunk) and relocation
@@ -2878,6 +2882,7 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
INIT_LIST_HEAD(&fs_info->tree_mod_seq_list);
INIT_LIST_HEAD(&fs_info->unused_bgs);
INIT_LIST_HEAD(&fs_info->reclaim_bgs);
+ INIT_LIST_HEAD(&fs_info->fully_remapped_bgs);
INIT_LIST_HEAD(&fs_info->zone_active_bgs);
#ifdef CONFIG_BTRFS_DEBUG
INIT_LIST_HEAD(&fs_info->allocated_roots);
@@ -2933,6 +2938,7 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
mutex_init(&fs_info->chunk_mutex);
mutex_init(&fs_info->transaction_kthread_mutex);
mutex_init(&fs_info->cleaner_mutex);
+ mutex_init(&fs_info->remap_mutex);
mutex_init(&fs_info->ro_block_group_mutex);
init_rwsem(&fs_info->commit_root_sem);
init_rwsem(&fs_info->cleanup_work_sem);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index a7e522f67cca..b8fed3246e1f 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -41,6 +41,7 @@
#include "tree-checker.h"
#include "raid-stripe-tree.h"
#include "delayed-inode.h"
+#include "relocation.h"
#undef SCRAMBLE_DELAYED_REFS
@@ -2846,6 +2847,51 @@ static int unpin_extent_range(struct btrfs_fs_info *fs_info,
return 0;
}
+void btrfs_handle_fully_remapped_bgs(struct btrfs_fs_info *fs_info)
+{
+ struct btrfs_block_group *block_group;
+ int ret;
+
+ spin_lock(&fs_info->unused_bgs_lock);
+ while (!list_empty(&fs_info->fully_remapped_bgs)) {
+ struct btrfs_chunk_map *map;
+
+ block_group = list_first_entry(&fs_info->fully_remapped_bgs,
+ struct btrfs_block_group,
+ bg_list);
+ list_del_init(&block_group->bg_list);
+ spin_unlock(&fs_info->unused_bgs_lock);
+
+ map = btrfs_get_chunk_map(fs_info, block_group->start, 1);
+ if (IS_ERR(map)) {
+ btrfs_put_block_group(block_group);
+ return;
+ }
+
+ ret = btrfs_last_identity_remap_gone(map, block_group);
+ if (ret) {
+ btrfs_free_chunk_map(map);
+ btrfs_put_block_group(block_group);
+ return;
+ }
+
+ /*
+ * Set num_stripes to 0, so that btrfs_remove_dev_extents()
+ * won't run a second time.
+ */
+ map->num_stripes = 0;
+
+ btrfs_free_chunk_map(map);
+
+ if (block_group->used == 0)
+ btrfs_mark_bg_unused(block_group);
+
+ btrfs_put_block_group(block_group);
+ spin_lock(&fs_info->unused_bgs_lock);
+ }
+ spin_unlock(&fs_info->unused_bgs_lock);
+}
+
int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans)
{
struct btrfs_fs_info *fs_info = trans->fs_info;
@@ -2998,11 +3044,23 @@ u64 btrfs_get_extent_owner_root(struct btrfs_fs_info *fs_info,
}
static int do_free_extent_accounting(struct btrfs_trans_handle *trans,
- u64 bytenr, struct btrfs_squota_delta *delta)
+ u64 bytenr, struct btrfs_squota_delta *delta,
+ struct btrfs_path *path)
{
int ret;
+ bool remapped = false;
u64 num_bytes = delta->num_bytes;
+ /* returns 1 on success and 0 on no-op */
+ ret = btrfs_remove_extent_from_remap_tree(trans, path, bytenr,
+ num_bytes);
+ if (ret < 0) {
+ btrfs_abort_transaction(trans, ret);
+ return ret;
+ } else if (ret == 1) {
+ remapped = true;
+ }
+
if (delta->is_data) {
struct btrfs_root *csum_root;
@@ -3026,10 +3084,16 @@ static int do_free_extent_accounting(struct btrfs_trans_handle *trans,
return ret;
}
- ret = btrfs_add_to_free_space_tree(trans, bytenr, num_bytes);
- if (unlikely(ret)) {
- btrfs_abort_transaction(trans, ret);
- return ret;
+ /*
+ * If remapped, FST has already been taken care of in
+ * remove_range_from_remap_tree().
+ */
+ if (!remapped) {
+ ret = btrfs_add_to_free_space_tree(trans, bytenr, num_bytes);
+ if (unlikely(ret)) {
+ btrfs_abort_transaction(trans, ret);
+ return ret;
+ }
}
ret = btrfs_update_block_group(trans, bytenr, num_bytes, false);
@@ -3395,7 +3459,7 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
}
btrfs_release_path(path);
- ret = do_free_extent_accounting(trans, bytenr, &delta);
+ ret = do_free_extent_accounting(trans, bytenr, &delta, path);
}
btrfs_release_path(path);
diff --git a/fs/btrfs/extent-tree.h b/fs/btrfs/extent-tree.h
index e573509c5a71..a15a9497c9f3 100644
--- a/fs/btrfs/extent-tree.h
+++ b/fs/btrfs/extent-tree.h
@@ -163,5 +163,6 @@ void btrfs_error_unpin_extent_range(struct btrfs_fs_info *fs_info, u64 start, u6
int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr,
u64 num_bytes, u64 *actual_bytes);
int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range);
+void btrfs_handle_fully_remapped_bgs(struct btrfs_fs_info *fs_info);
#endif
diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
index 72fde0a3aaaf..9dbb482d8928 100644
--- a/fs/btrfs/fs.h
+++ b/fs/btrfs/fs.h
@@ -577,6 +577,7 @@ struct btrfs_fs_info {
struct mutex transaction_kthread_mutex;
struct mutex cleaner_mutex;
struct mutex chunk_mutex;
+ struct mutex remap_mutex;
/*
* This is taken to make sure we don't set block groups ro after the
@@ -830,10 +831,11 @@ struct btrfs_fs_info {
struct list_head reclaim_bgs;
int bg_reclaim_threshold;
- /* Protects the lists unused_bgs and reclaim_bgs. */
+ /* Protects the lists unused_bgs, reclaim_bgs, and fully_remapped_bgs. */
spinlock_t unused_bgs_lock;
/* Protected by unused_bgs_lock. */
struct list_head unused_bgs;
+ struct list_head fully_remapped_bgs;
struct mutex unused_bg_unpin_mutex;
/* Protect block groups that are going to be deleted */
struct mutex reclaim_bgs_lock;
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 00e1898edbbe..315f212718ad 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -37,6 +37,7 @@
#include "super.h"
#include "tree-checker.h"
#include "raid-stripe-tree.h"
+#include "free-space-tree.h"
/*
* Relocation overview
@@ -3860,6 +3861,183 @@ static const char *stage_to_string(enum reloc_stage stage)
return "unknown";
}
+static void adjust_block_group_remap_bytes(struct btrfs_trans_handle *trans,
+ struct btrfs_block_group *bg,
+ s64 diff)
+{
+ struct btrfs_fs_info *fs_info = trans->fs_info;
+ bool bg_already_dirty = true, mark_unused = false;
+
+ spin_lock(&bg->lock);
+
+ bg->remap_bytes += diff;
+
+ if (bg->used == 0 && bg->remap_bytes == 0)
+ mark_unused = true;
+
+ spin_unlock(&bg->lock);
+
+ if (mark_unused)
+ btrfs_mark_bg_unused(bg);
+
+ spin_lock(&trans->transaction->dirty_bgs_lock);
+ if (list_empty(&bg->dirty_list)) {
+ list_add_tail(&bg->dirty_list, &trans->transaction->dirty_bgs);
+ bg_already_dirty = false;
+ btrfs_get_block_group(bg);
+ }
+ spin_unlock(&trans->transaction->dirty_bgs_lock);
+
+ /* Modified block groups are accounted for in the delayed_refs_rsv. */
+ if (!bg_already_dirty)
+ btrfs_inc_delayed_refs_rsv_bg_updates(fs_info);
+}
+
+static int remove_chunk_stripes(struct btrfs_trans_handle *trans,
+ struct btrfs_chunk_map *chunk,
+ struct btrfs_path *path)
+{
+ struct btrfs_fs_info *fs_info = trans->fs_info;
+ struct btrfs_key key;
+ struct extent_buffer *leaf;
+ struct btrfs_chunk *c;
+ int ret;
+
+ key.objectid = BTRFS_FIRST_CHUNK_TREE_OBJECTID;
+ key.type = BTRFS_CHUNK_ITEM_KEY;
+ key.offset = chunk->start;
+
+ btrfs_reserve_chunk_metadata(trans, false);
+
+ ret = btrfs_search_slot(trans, fs_info->chunk_root, &key, path,
+ 0, 1);
+ if (ret) {
+ if (ret == 1) {
+ btrfs_release_path(path);
+ ret = -ENOENT;
+ }
+ btrfs_trans_release_chunk_metadata(trans);
+ return ret;
+ }
+
+ leaf = path->nodes[0];
+
+ c = btrfs_item_ptr(leaf, path->slots[0], struct btrfs_chunk);
+ btrfs_set_chunk_num_stripes(leaf, c, 0);
+ btrfs_set_chunk_sub_stripes(leaf, c, 0);
+
+ btrfs_truncate_item(trans, path, offsetof(struct btrfs_chunk, stripe),
+ 1);
+
+ btrfs_mark_buffer_dirty(trans, leaf);
+
+ btrfs_release_path(path);
+ btrfs_trans_release_chunk_metadata(trans);
+
+ return 0;
+}
+
+int btrfs_last_identity_remap_gone(struct btrfs_chunk_map *chunk,
+ struct btrfs_block_group *bg)
+{
+ struct btrfs_fs_info *fs_info = bg->fs_info;
+ struct btrfs_trans_handle *trans;
+ int ret;
+ unsigned int num_items;
+ BTRFS_PATH_AUTO_FREE(path);
+
+ path = btrfs_alloc_path();
+ if (!path)
+ return -ENOMEM;
+
+ /*
+ * One item for each entry we're removing in the dev extents tree, and
+ * another for each device. DUP chunks are all on one device,
+ * everything else has one device per stripe.
+ */
+ if (bg->flags & BTRFS_BLOCK_GROUP_DUP)
+ num_items = chunk->num_stripes + 1;
+ else
+ num_items = 2 * chunk->num_stripes;
+
+ trans = btrfs_start_transaction_fallback_global_rsv(fs_info->tree_root,
+ num_items);
+ if (IS_ERR(trans))
+ return PTR_ERR(trans);
+
+ ret = btrfs_remove_dev_extents(trans, chunk);
+ if (ret) {
+ btrfs_abort_transaction(trans, ret);
+ return ret;
+ }
+
+ mutex_lock(&trans->fs_info->chunk_mutex);
+
+ for (unsigned int i = 0; i < chunk->num_stripes; i++) {
+ ret = btrfs_update_device(trans, chunk->stripes[i].dev);
+ if (ret) {
+ mutex_unlock(&trans->fs_info->chunk_mutex);
+ btrfs_abort_transaction(trans, ret);
+ return ret;
+ }
+ }
+
+ mutex_unlock(&trans->fs_info->chunk_mutex);
+
+ write_lock(&trans->fs_info->mapping_tree_lock);
+ btrfs_chunk_map_device_clear_bits(chunk, CHUNK_ALLOCATED);
+ write_unlock(&trans->fs_info->mapping_tree_lock);
+
+ btrfs_remove_bg_from_sinfo(bg);
+
+ ret = remove_chunk_stripes(trans, chunk, path);
+ if (ret) {
+ btrfs_abort_transaction(trans, ret);
+ return ret;
+ }
+
+ ret = btrfs_commit_transaction(trans);
+ if (ret)
+ return ret;
+
+ return 0;
+}
+
+static void adjust_identity_remap_count(struct btrfs_trans_handle *trans,
+ struct btrfs_block_group *bg, int delta)
+{
+ struct btrfs_fs_info *fs_info = trans->fs_info;
+ bool bg_already_dirty = true, mark_fully_remapped = false;
+
+ WARN_ON(delta < 0 && -delta > bg->identity_remap_count);
+
+ spin_lock(&bg->lock);
+
+ bg->identity_remap_count += delta;
+
+ if (bg->identity_remap_count == 0 && !bg->fully_remapped) {
+ bg->fully_remapped = true;
+ mark_fully_remapped = true;
+ }
+
+ spin_unlock(&bg->lock);
+
+ spin_lock(&trans->transaction->dirty_bgs_lock);
+ if (list_empty(&bg->dirty_list)) {
+ list_add_tail(&bg->dirty_list, &trans->transaction->dirty_bgs);
+ bg_already_dirty = false;
+ btrfs_get_block_group(bg);
+ }
+ spin_unlock(&trans->transaction->dirty_bgs_lock);
+
+ /* Modified block groups are accounted for in the delayed_refs_rsv. */
+ if (!bg_already_dirty)
+ btrfs_inc_delayed_refs_rsv_bg_updates(fs_info);
+
+ if (mark_fully_remapped)
+ btrfs_mark_bg_fully_remapped(bg, trans);
+}
+
int btrfs_translate_remap(struct btrfs_fs_info *fs_info, u64 *logical,
u64 *length)
{
@@ -4468,3 +4646,277 @@ u64 btrfs_get_reloc_bg_bytenr(const struct btrfs_fs_info *fs_info)
logical = fs_info->reloc_ctl->block_group->start;
return logical;
}
+
+static int insert_remap_item(struct btrfs_trans_handle *trans,
+ struct btrfs_path *path, u64 old_addr, u64 length,
+ u64 new_addr)
+{
+ int ret;
+ struct btrfs_fs_info *fs_info = trans->fs_info;
+ struct btrfs_key key;
+ struct btrfs_remap remap;
+
+ if (old_addr == new_addr) {
+ /* Add new identity remap item. */
+
+ key.objectid = old_addr;
+ key.type = BTRFS_IDENTITY_REMAP_KEY;
+ key.offset = length;
+
+ ret = btrfs_insert_empty_item(trans, fs_info->remap_root,
+ path, &key, 0);
+ if (ret)
+ return ret;
+ } else {
+ /* Add new remap item. */
+
+ key.objectid = old_addr;
+ key.type = BTRFS_REMAP_KEY;
+ key.offset = length;
+
+ ret = btrfs_insert_empty_item(trans, fs_info->remap_root,
+ path, &key,
+ sizeof(struct btrfs_remap));
+ if (ret)
+ return ret;
+
+ btrfs_set_stack_remap_address(&remap, new_addr);
+
+ write_extent_buffer(path->nodes[0], &remap,
+ btrfs_item_ptr_offset(path->nodes[0], path->slots[0]),
+ sizeof(struct btrfs_remap));
+
+ btrfs_release_path(path);
+
+ /* Add new backref item. */
+
+ key.objectid = new_addr;
+ key.type = BTRFS_REMAP_BACKREF_KEY;
+ key.offset = length;
+
+ ret = btrfs_insert_empty_item(trans, fs_info->remap_root,
+ path, &key,
+ sizeof(struct btrfs_remap));
+ if (ret)
+ return ret;
+
+ btrfs_set_stack_remap_address(&remap, old_addr);
+
+ write_extent_buffer(path->nodes[0], &remap,
+ btrfs_item_ptr_offset(path->nodes[0], path->slots[0]),
+ sizeof(struct btrfs_remap));
+ }
+
+ btrfs_release_path(path);
+
+ return 0;
+}
+
+/*
+ * Punch a hole in the remap item or identity remap item pointed to by path,
+ * for the range [hole_start, hole_start + hole_length).
+ */
+static int remove_range_from_remap_tree(struct btrfs_trans_handle *trans,
+ struct btrfs_path *path,
+ struct btrfs_block_group *bg,
+ u64 hole_start, u64 hole_length)
+{
+ int ret;
+ struct btrfs_fs_info *fs_info = trans->fs_info;
+ struct extent_buffer *leaf = path->nodes[0];
+ struct btrfs_key key;
+ u64 hole_end, new_addr, remap_start, remap_length, remap_end,
+ overlap_length;
+ bool is_identity_remap;
+ int identity_count_delta = 0;
+
+ hole_end = hole_start + hole_length;
+
+ btrfs_item_key_to_cpu(leaf, &key, path->slots[0]);
+
+ is_identity_remap = key.type == BTRFS_IDENTITY_REMAP_KEY;
+
+ remap_start = key.objectid;
+ remap_length = key.offset;
+
+ remap_end = remap_start + remap_length;
+
+ if (is_identity_remap) {
+ new_addr = remap_start;
+ } else {
+ struct btrfs_remap *remap_ptr;
+
+ remap_ptr = btrfs_item_ptr(leaf, path->slots[0],
+ struct btrfs_remap);
+ new_addr = btrfs_remap_address(leaf, remap_ptr);
+ }
+
+ /* Delete old item. */
+
+ ret = btrfs_del_item(trans, fs_info->remap_root, path);
+
+ btrfs_release_path(path);
+
+ if (ret)
+ return ret;
+
+ if (is_identity_remap) {
+ identity_count_delta = -1;
+ } else {
+ /* Remove backref. */
+
+ key.objectid = new_addr;
+ key.type = BTRFS_REMAP_BACKREF_KEY;
+ key.offset = remap_length;
+
+ ret = btrfs_search_slot(trans, fs_info->remap_root,
+ &key, path, -1, 1);
+ if (ret) {
+ if (ret == 1) {
+ btrfs_release_path(path);
+ ret = -ENOENT;
+ }
+ return ret;
+ }
+
+ ret = btrfs_del_item(trans, fs_info->remap_root, path);
+
+ btrfs_release_path(path);
+
+ if (ret)
+ return ret;
+ }
+
+ /* If hole_start > remap_start, re-add the start of the remap item. */
+ if (hole_start > remap_start) {
+ ret = insert_remap_item(trans, path, remap_start,
+ hole_start - remap_start, new_addr);
+ if (ret)
+ return ret;
+
+ if (is_identity_remap)
+ identity_count_delta++;
+ }
+
+ /* If hole_end < remap_end, re-add the end of the remap item. */
+ if (hole_end < remap_end) {
+ ret = insert_remap_item(trans, path, hole_end,
+ remap_end - hole_end,
+ hole_end - remap_start + new_addr);
+ if (ret)
+ return ret;
+
+ if (is_identity_remap)
+ identity_count_delta++;
+ }
+
+ if (identity_count_delta != 0)
+ adjust_identity_remap_count(trans, bg, identity_count_delta);
+
+ overlap_length = min_t(u64, hole_end, remap_end) -
+ max_t(u64, hole_start, remap_start);
+
+ if (!is_identity_remap) {
+ struct btrfs_block_group *dest_bg;
+
+ dest_bg = btrfs_lookup_block_group(fs_info, new_addr);
+
+ adjust_block_group_remap_bytes(trans, dest_bg, -overlap_length);
+
+ btrfs_put_block_group(dest_bg);
+
+ ret = btrfs_add_to_free_space_tree(trans,
+ hole_start - remap_start + new_addr,
+ overlap_length);
+ if (ret)
+ return ret;
+ }
+
+ ret = overlap_length;
+
+ return ret;
+}
+
+/*
+ * Returns 1 if remove_range_from_remap_tree() has been called successfully,
+ * 0 if block group wasn't remapped, and a negative number on error.
+ */
+int btrfs_remove_extent_from_remap_tree(struct btrfs_trans_handle *trans,
+ struct btrfs_path *path,
+ u64 bytenr, u64 num_bytes)
+{
+ struct btrfs_fs_info *fs_info = trans->fs_info;
+ struct btrfs_key key, found_key;
+ struct extent_buffer *leaf;
+ struct btrfs_block_group *bg;
+ int ret, length;
+
+ if (!(btrfs_super_incompat_flags(fs_info->super_copy) &
+ BTRFS_FEATURE_INCOMPAT_REMAP_TREE))
+ return 0;
+
+ bg = btrfs_lookup_block_group(fs_info, bytenr);
+ if (!bg)
+ return 0;
+
+ mutex_lock(&fs_info->remap_mutex);
+
+ if (!(bg->flags & BTRFS_BLOCK_GROUP_REMAPPED)) {
+ mutex_unlock(&fs_info->remap_mutex);
+ btrfs_put_block_group(bg);
+ return 0;
+ }
+
+ do {
+ key.objectid = bytenr;
+ key.type = (u8)-1;
+ key.offset = (u64)-1;
+
+ ret = btrfs_search_slot(trans, fs_info->remap_root, &key, path,
+ -1, 1);
+ if (ret < 0)
+ goto end;
+
+ leaf = path->nodes[0];
+
+ if (path->slots[0] == 0) {
+ ret = -ENOENT;
+ goto end;
+ }
+
+ path->slots[0]--;
+
+ btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
+
+ if (found_key.type != BTRFS_IDENTITY_REMAP_KEY &&
+ found_key.type != BTRFS_REMAP_KEY) {
+ ret = -ENOENT;
+ goto end;
+ }
+
+ if (bytenr < found_key.objectid ||
+ bytenr >= found_key.objectid + found_key.offset) {
+ ret = -ENOENT;
+ goto end;
+ }
+
+ length = remove_range_from_remap_tree(trans, path, bg, bytenr,
+ num_bytes);
+ if (length < 0) {
+ ret = length;
+ goto end;
+ }
+
+ bytenr += length;
+ num_bytes -= length;
+ } while (num_bytes > 0);
+
+ ret = 1;
+
+end:
+ mutex_unlock(&fs_info->remap_mutex);
+
+ btrfs_put_block_group(bg);
+ btrfs_release_path(path);
+ return ret;
+}
diff --git a/fs/btrfs/relocation.h b/fs/btrfs/relocation.h
index b2ba83966650..ffb497f27889 100644
--- a/fs/btrfs/relocation.h
+++ b/fs/btrfs/relocation.h
@@ -33,5 +33,10 @@ bool btrfs_should_ignore_reloc_root(const struct btrfs_root *root);
u64 btrfs_get_reloc_bg_bytenr(const struct btrfs_fs_info *fs_info);
int btrfs_translate_remap(struct btrfs_fs_info *fs_info, u64 *logical,
u64 *length);
+int btrfs_remove_extent_from_remap_tree(struct btrfs_trans_handle *trans,
+ struct btrfs_path *path,
+ u64 bytenr, u64 num_bytes);
+int btrfs_last_identity_remap_gone(struct btrfs_chunk_map *chunk,
+ struct btrfs_block_group *bg);
#endif
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 6a72c2a599a6..2347b37113b0 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2928,8 +2928,8 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
return ret;
}
-static noinline int btrfs_update_device(struct btrfs_trans_handle *trans,
- struct btrfs_device *device)
+int btrfs_update_device(struct btrfs_trans_handle *trans,
+ struct btrfs_device *device)
{
int ret;
BTRFS_PATH_AUTO_FREE(path);
@@ -3227,25 +3227,13 @@ static int remove_chunk_item(struct btrfs_trans_handle *trans,
return btrfs_free_chunk(trans, chunk_offset);
}
-int btrfs_remove_chunk(struct btrfs_trans_handle *trans, u64 chunk_offset)
+int btrfs_remove_dev_extents(struct btrfs_trans_handle *trans,
+ struct btrfs_chunk_map *map)
{
struct btrfs_fs_info *fs_info = trans->fs_info;
- struct btrfs_chunk_map *map;
+ struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
u64 dev_extent_len = 0;
int i, ret = 0;
- struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
-
- map = btrfs_get_chunk_map(fs_info, chunk_offset, 1);
- if (IS_ERR(map)) {
- /*
- * This is a logic error, but we don't want to just rely on the
- * user having built with ASSERT enabled, so if ASSERT doesn't
- * do anything we still error out.
- */
- DEBUG_WARN("errr %ld reading chunk map at offset %llu",
- PTR_ERR(map), chunk_offset);
- return PTR_ERR(map);
- }
/*
* First delete the device extent items from the devices btree.
@@ -3266,7 +3254,7 @@ int btrfs_remove_chunk(struct btrfs_trans_handle *trans, u64 chunk_offset)
if (unlikely(ret)) {
mutex_unlock(&fs_devices->device_list_mutex);
btrfs_abort_transaction(trans, ret);
- goto out;
+ return ret;
}
if (device->bytes_used > 0) {
@@ -3286,6 +3274,30 @@ int btrfs_remove_chunk(struct btrfs_trans_handle *trans, u64 chunk_offset)
}
mutex_unlock(&fs_devices->device_list_mutex);
+ return 0;
+}
+
+int btrfs_remove_chunk(struct btrfs_trans_handle *trans, u64 chunk_offset)
+{
+ struct btrfs_fs_info *fs_info = trans->fs_info;
+ struct btrfs_chunk_map *map;
+ int ret;
+
+ map = btrfs_get_chunk_map(fs_info, chunk_offset, 1);
+ if (IS_ERR(map)) {
+ /*
+ * This is a logic error, but we don't want to just rely on the
+ * user having built with ASSERT enabled, so if ASSERT doesn't
+ * do anything we still error out.
+ */
+ ASSERT(0);
+ return PTR_ERR(map);
+ }
+
+ ret = btrfs_remove_dev_extents(trans, map);
+ if (ret)
+ goto out;
+
/*
* We acquire fs_info->chunk_mutex for 2 reasons:
*
@@ -5419,7 +5431,7 @@ static void chunk_map_device_set_bits(struct btrfs_chunk_map *map, unsigned int
}
}
-static void chunk_map_device_clear_bits(struct btrfs_chunk_map *map, unsigned int bits)
+void btrfs_chunk_map_device_clear_bits(struct btrfs_chunk_map *map, unsigned int bits)
{
for (int i = 0; i < map->num_stripes; i++) {
struct btrfs_io_stripe *stripe = &map->stripes[i];
@@ -5436,7 +5448,7 @@ void btrfs_remove_chunk_map(struct btrfs_fs_info *fs_info, struct btrfs_chunk_ma
write_lock(&fs_info->mapping_tree_lock);
rb_erase_cached(&map->rb_node, &fs_info->mapping_tree);
RB_CLEAR_NODE(&map->rb_node);
- chunk_map_device_clear_bits(map, CHUNK_ALLOCATED);
+ btrfs_chunk_map_device_clear_bits(map, CHUNK_ALLOCATED);
write_unlock(&fs_info->mapping_tree_lock);
/* Once for the tree reference. */
@@ -5472,7 +5484,7 @@ int btrfs_add_chunk_map(struct btrfs_fs_info *fs_info, struct btrfs_chunk_map *m
return -EEXIST;
}
chunk_map_device_set_bits(map, CHUNK_ALLOCATED);
- chunk_map_device_clear_bits(map, CHUNK_TRIMMED);
+ btrfs_chunk_map_device_clear_bits(map, CHUNK_TRIMMED);
write_unlock(&fs_info->mapping_tree_lock);
return 0;
@@ -5828,7 +5840,7 @@ void btrfs_mapping_tree_free(struct btrfs_fs_info *fs_info)
map = rb_entry(node, struct btrfs_chunk_map, rb_node);
rb_erase_cached(&map->rb_node, &fs_info->mapping_tree);
RB_CLEAR_NODE(&map->rb_node);
- chunk_map_device_clear_bits(map, CHUNK_ALLOCATED);
+ btrfs_chunk_map_device_clear_bits(map, CHUNK_ALLOCATED);
/* Once for the tree ref. */
btrfs_free_chunk_map(map);
cond_resched_rwlock_write(&fs_info->mapping_tree_lock);
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 4117fabb248b..ccf0a459180d 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -794,6 +794,8 @@ u64 btrfs_calc_stripe_length(const struct btrfs_chunk_map *map);
int btrfs_nr_parity_stripes(u64 type);
int btrfs_chunk_alloc_add_chunk_item(struct btrfs_trans_handle *trans,
struct btrfs_block_group *bg);
+int btrfs_remove_dev_extents(struct btrfs_trans_handle *trans,
+ struct btrfs_chunk_map *map);
int btrfs_remove_chunk(struct btrfs_trans_handle *trans, u64 chunk_offset);
#ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
@@ -905,6 +907,10 @@ bool btrfs_repair_one_zone(struct btrfs_fs_info *fs_info, u64 logical);
bool btrfs_pinned_by_swapfile(struct btrfs_fs_info *fs_info, void *ptr);
const u8 *btrfs_sb_fsid_ptr(const struct btrfs_super_block *sb);
+int btrfs_update_device(struct btrfs_trans_handle *trans,
+ struct btrfs_device *device);
+void btrfs_chunk_map_device_clear_bits(struct btrfs_chunk_map *map,
+ unsigned int bits);
#ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
struct btrfs_io_context *alloc_btrfs_io_context(struct btrfs_fs_info *fs_info,
--
2.51.0
* [PATCH v6 10/16] btrfs: handle setting up relocation of block group with remap-tree
2025-11-14 18:47 [PATCH v6 00/16] Remap tree Mark Harmstone
` (8 preceding siblings ...)
2025-11-14 18:47 ` [PATCH v6 09/16] btrfs: handle deletions from remapped block group Mark Harmstone
@ 2025-11-14 18:47 ` Mark Harmstone
2025-11-15 14:52 ` Sun Yangkai
2025-11-14 18:47 ` [PATCH v6 11/16] btrfs: move existing remaps before relocating block group Mark Harmstone
` (5 subsequent siblings)
15 siblings, 1 reply; 23+ messages in thread
From: Mark Harmstone @ 2025-11-14 18:47 UTC (permalink / raw)
To: linux-btrfs; +Cc: Mark Harmstone, Boris Burkov
Handle the preliminary work for relocating a block group in a filesystem
with the remap-tree flag set.
If the block group is SYSTEM, btrfs_relocate_block_group() proceeds as it
does already, because bootstrapping issues mean that these block groups
have to be processed the existing way. The same applies to REMAP block
groups, which are dealt with in a later patch.
Otherwise we walk the free-space tree for the block group in question,
recording the gaps between free-space entries, i.e. the ranges not marked
as free. These get converted into identity remaps and placed in the remap
tree, and the block group's REMAPPED flag is set. From now on no new
allocations are possible within this block group, and any I/O to it will
be funnelled through btrfs_translate_remap(). We store the number of
identity remaps in `identity_remap_count`, so that we know when we've
removed the last one and the block group is fully remapped.
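As a rough userspace illustration of the conversion described above (not
the kernel code itself), the sketch below takes a sorted list of free-space
runs inside a block group and emits the ranges between them, which is what
the diff appears to turn into identity remaps. The names `struct range` and
`free_runs_to_identity_ranges()` are hypothetical, and the input is assumed
to be sorted, non-overlapping and contained in the block group.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open range [start, end). Hypothetical type, for illustration only. */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Emit every range of the block group that is NOT covered by a free-space
 * run. Assumes free_runs is sorted by start, non-overlapping and inside
 * [bg_start, bg_start + bg_len). out needs room for nfree + 1 entries.
 * Returns the number of ranges written, i.e. the would-be
 * identity_remap_count in this simplified model.
 */
static size_t free_runs_to_identity_ranges(uint64_t bg_start, uint64_t bg_len,
					   const struct range *free_runs,
					   size_t nfree, struct range *out)
{
	uint64_t bg_end = bg_start + bg_len;
	uint64_t cursor = bg_start;	/* first byte not yet classified */
	size_t n = 0;

	for (size_t i = 0; i < nfree; i++) {
		assert(free_runs[i].start >= cursor);
		assert(free_runs[i].end <= bg_end);

		/* Anything between the cursor and this free run is in use. */
		if (free_runs[i].start > cursor) {
			out[n].start = cursor;
			out[n].end = free_runs[i].start;
			n++;
		}
		cursor = free_runs[i].end;
	}

	/* Tail of the block group after the last free run. */
	if (cursor < bg_end) {
		out[n].start = cursor;
		out[n].end = bg_end;
		n++;
	}

	return n;
}
```

With no free-space runs at all, the sketch yields a single range covering
the whole block group; with the block group entirely free, it yields none.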
The change in btrfs_read_roots() is because data relocations no longer
rely on the data reloc tree as a hidden subvolume in which to do
snapshots.
Signed-off-by: Mark Harmstone <mark@harmstone.com>
Reviewed-by: Boris Burkov <boris@bur.io>
---
fs/btrfs/block-group.c | 6 +-
fs/btrfs/block-group.h | 4 +
fs/btrfs/free-space-tree.c | 4 +-
fs/btrfs/free-space-tree.h | 5 +-
fs/btrfs/relocation.c | 512 +++++++++++++++++++++++++++++++++----
fs/btrfs/relocation.h | 11 +
fs/btrfs/space-info.c | 9 +-
fs/btrfs/volumes.c | 91 ++++---
8 files changed, 549 insertions(+), 93 deletions(-)
diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index e269518e1bfe..4c4edaf3c753 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -2414,6 +2414,7 @@ static int read_one_block_group(struct btrfs_fs_info *info,
cache->used = btrfs_stack_block_group_v2_used(bgi);
cache->commit_used = cache->used;
cache->flags = btrfs_stack_block_group_v2_flags(bgi);
+ cache->commit_flags = cache->flags;
cache->global_root_id = btrfs_stack_block_group_v2_chunk_objectid(bgi);
cache->space_info = btrfs_find_space_info(info, cache->flags);
cache->remap_bytes = btrfs_stack_block_group_v2_remap_bytes(bgi);
@@ -2723,6 +2724,7 @@ static int insert_block_group_item(struct btrfs_trans_handle *trans,
block_group->commit_remap_bytes = block_group->remap_bytes;
block_group->commit_identity_remap_count =
block_group->identity_remap_count;
+ block_group->commit_flags = block_group->flags;
key.objectid = block_group->start;
key.type = BTRFS_BLOCK_GROUP_ITEM_KEY;
key.offset = block_group->length;
@@ -3211,13 +3213,15 @@ static int update_block_group_item(struct btrfs_trans_handle *trans,
/* No change in values, can safely skip it. */
if (cache->commit_used == used &&
cache->commit_remap_bytes == remap_bytes &&
- cache->commit_identity_remap_count == identity_remap_count) {
+ cache->commit_identity_remap_count == identity_remap_count &&
+ cache->commit_flags == cache->flags) {
spin_unlock(&cache->lock);
return 0;
}
cache->commit_used = used;
cache->commit_remap_bytes = remap_bytes;
cache->commit_identity_remap_count = identity_remap_count;
+ cache->commit_flags = cache->flags;
spin_unlock(&cache->lock);
key.objectid = cache->start;
diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
index d85f3c2546d0..4522074a45c2 100644
--- a/fs/btrfs/block-group.h
+++ b/fs/btrfs/block-group.h
@@ -146,6 +146,10 @@ struct btrfs_block_group {
* The last commited identity_remap_count value of this block group.
*/
u32 commit_identity_remap_count;
+ /*
+ * The last committed flags value for this block group.
+ */
+ u64 commit_flags;
/*
* If the free space extent count exceeds this number, convert the block
diff --git a/fs/btrfs/free-space-tree.c b/fs/btrfs/free-space-tree.c
index 26eae347739f..e46b1fa86f80 100644
--- a/fs/btrfs/free-space-tree.c
+++ b/fs/btrfs/free-space-tree.c
@@ -21,8 +21,7 @@ static int __add_block_group_free_space(struct btrfs_trans_handle *trans,
struct btrfs_block_group *block_group,
struct btrfs_path *path);
-static struct btrfs_root *btrfs_free_space_root(
- struct btrfs_block_group *block_group)
+struct btrfs_root *btrfs_free_space_root(struct btrfs_block_group *block_group)
{
struct btrfs_key key = {
.objectid = BTRFS_FREE_SPACE_TREE_OBJECTID,
@@ -93,7 +92,6 @@ static int add_new_free_space_info(struct btrfs_trans_handle *trans,
return 0;
}
-EXPORT_FOR_TESTS
struct btrfs_free_space_info *btrfs_search_free_space_info(
struct btrfs_trans_handle *trans,
struct btrfs_block_group *block_group,
diff --git a/fs/btrfs/free-space-tree.h b/fs/btrfs/free-space-tree.h
index 3d9a5d4477fc..89d2ff7e5c18 100644
--- a/fs/btrfs/free-space-tree.h
+++ b/fs/btrfs/free-space-tree.h
@@ -35,12 +35,13 @@ int btrfs_add_to_free_space_tree(struct btrfs_trans_handle *trans,
u64 start, u64 size);
int btrfs_remove_from_free_space_tree(struct btrfs_trans_handle *trans,
u64 start, u64 size);
-
-#ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
struct btrfs_free_space_info *
btrfs_search_free_space_info(struct btrfs_trans_handle *trans,
struct btrfs_block_group *block_group,
struct btrfs_path *path, int cow);
+struct btrfs_root *btrfs_free_space_root(struct btrfs_block_group *block_group);
+
+#ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
int __btrfs_add_to_free_space_tree(struct btrfs_trans_handle *trans,
struct btrfs_block_group *block_group,
struct btrfs_path *path, u64 start, u64 size);
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 315f212718ad..1f86c81678bb 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -3617,7 +3617,7 @@ static noinline_for_stack int relocate_block_group(struct reloc_control *rc)
btrfs_btree_balance_dirty(fs_info);
}
- if (!err) {
+ if (!err && !btrfs_fs_incompat(fs_info, REMAP_TREE)) {
ret = relocate_file_extent_cluster(rc);
if (ret < 0)
err = ret;
@@ -3861,6 +3861,90 @@ static const char *stage_to_string(enum reloc_stage stage)
return "unknown";
}
+static int add_remap_tree_entries(struct btrfs_trans_handle *trans,
+ struct btrfs_path *path,
+ struct btrfs_key *entries,
+ unsigned int num_entries)
+{
+ int ret;
+ struct btrfs_fs_info *fs_info = trans->fs_info;
+ struct btrfs_item_batch batch;
+ u32 *data_sizes;
+ u32 max_items;
+
+ max_items = BTRFS_LEAF_DATA_SIZE(trans->fs_info) / sizeof(struct btrfs_item);
+
+ data_sizes = kzalloc(sizeof(u32) * min_t(u32, num_entries, max_items),
+ GFP_NOFS);
+ if (!data_sizes)
+ return -ENOMEM;
+
+ while (true) {
+ batch.keys = entries;
+ batch.data_sizes = data_sizes;
+ batch.total_data_size = 0;
+ batch.nr = min_t(u32, num_entries, max_items);
+
+ ret = btrfs_insert_empty_items(trans, fs_info->remap_root, path,
+ &batch);
+ btrfs_release_path(path);
+
+ if (num_entries <= max_items)
+ break;
+
+ num_entries -= max_items;
+ entries += max_items;
+ }
+
+ kfree(data_sizes);
+
+ return ret;
+}
+
+struct space_run {
+ u64 start;
+ u64 end;
+};
+
+static void parse_bitmap(u64 block_size, const unsigned long *bitmap,
+ unsigned long size, u64 address,
+ struct space_run *space_runs,
+ unsigned int *num_space_runs)
+{
+ unsigned long pos, end;
+ u64 run_start, run_length;
+
+ pos = find_first_bit(bitmap, size);
+
+ if (pos == size)
+ return;
+
+ while (true) {
+ end = find_next_zero_bit(bitmap, size, pos);
+
+ run_start = address + (pos * block_size);
+ run_length = (end - pos) * block_size;
+
+ if (*num_space_runs != 0 &&
+ space_runs[*num_space_runs - 1].end == run_start) {
+ space_runs[*num_space_runs - 1].end += run_length;
+ } else {
+ space_runs[*num_space_runs].start = run_start;
+ space_runs[*num_space_runs].end = run_start + run_length;
+
+ (*num_space_runs)++;
+ }
+
+ if (end == size)
+ break;
+
+ pos = find_next_bit(bitmap, size, end + 1);
+
+ if (pos == size)
+ break;
+ }
+}
+
static void adjust_block_group_remap_bytes(struct btrfs_trans_handle *trans,
struct btrfs_block_group *bg,
s64 diff)
@@ -3893,6 +3977,184 @@ static void adjust_block_group_remap_bytes(struct btrfs_trans_handle *trans,
btrfs_inc_delayed_refs_rsv_bg_updates(fs_info);
}
+static int create_remap_tree_entries(struct btrfs_trans_handle *trans,
+ struct btrfs_path *path,
+ struct btrfs_block_group *bg)
+{
+ struct btrfs_fs_info *fs_info = trans->fs_info;
+ struct btrfs_free_space_info *fsi;
+ struct btrfs_key key, found_key;
+ struct extent_buffer *leaf;
+ struct btrfs_root *space_root;
+ u32 extent_count;
+ struct space_run *space_runs = NULL;
+ unsigned int num_space_runs = 0;
+ struct btrfs_key *entries = NULL;
+ unsigned int max_entries, num_entries;
+ int ret;
+
+ mutex_lock(&bg->free_space_lock);
+
+ if (test_bit(BLOCK_GROUP_FLAG_NEEDS_FREE_SPACE, &bg->runtime_flags)) {
+ mutex_unlock(&bg->free_space_lock);
+
+ ret = btrfs_add_block_group_free_space(trans, bg);
+ if (ret)
+ return ret;
+
+ mutex_lock(&bg->free_space_lock);
+ }
+
+ fsi = btrfs_search_free_space_info(trans, bg, path, 0);
+ if (IS_ERR(fsi)) {
+ mutex_unlock(&bg->free_space_lock);
+ return PTR_ERR(fsi);
+ }
+
+ extent_count = btrfs_free_space_extent_count(path->nodes[0], fsi);
+
+ btrfs_release_path(path);
+
+ space_runs = kmalloc(sizeof(*space_runs) * extent_count, GFP_NOFS);
+ if (!space_runs) {
+ mutex_unlock(&bg->free_space_lock);
+ return -ENOMEM;
+ }
+
+ key.objectid = bg->start;
+ key.type = 0;
+ key.offset = 0;
+
+ space_root = btrfs_free_space_root(bg);
+
+ ret = btrfs_search_slot(trans, space_root, &key, path, 0, 0);
+ if (ret < 0) {
+ mutex_unlock(&bg->free_space_lock);
+ goto out;
+ }
+
+ ret = 0;
+
+ while (true) {
+ leaf = path->nodes[0];
+
+ btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
+
+ if (found_key.objectid >= bg->start + bg->length)
+ break;
+
+ if (found_key.type == BTRFS_FREE_SPACE_EXTENT_KEY) {
+ if (num_space_runs != 0 &&
+ space_runs[num_space_runs - 1].end == found_key.objectid) {
+ space_runs[num_space_runs - 1].end =
+ found_key.objectid + found_key.offset;
+ } else {
+ BUG_ON(num_space_runs >= extent_count);
+
+ space_runs[num_space_runs].start = found_key.objectid;
+ space_runs[num_space_runs].end =
+ found_key.objectid + found_key.offset;
+
+ num_space_runs++;
+ }
+ } else if (found_key.type == BTRFS_FREE_SPACE_BITMAP_KEY) {
+ void *bitmap;
+ unsigned long offset;
+ u32 data_size;
+
+ offset = btrfs_item_ptr_offset(leaf, path->slots[0]);
+ data_size = btrfs_item_size(leaf, path->slots[0]);
+
+ if (data_size != 0) {
+ bitmap = kmalloc(data_size, GFP_NOFS);
+ if (!bitmap) {
+ mutex_unlock(&bg->free_space_lock);
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ read_extent_buffer(leaf, bitmap, offset,
+ data_size);
+
+ parse_bitmap(fs_info->sectorsize, bitmap,
+ data_size * BITS_PER_BYTE,
+ found_key.objectid, space_runs,
+ &num_space_runs);
+
+ BUG_ON(num_space_runs > extent_count);
+
+ kfree(bitmap);
+ }
+ }
+
+ path->slots[0]++;
+
+ if (path->slots[0] >= btrfs_header_nritems(leaf)) {
+ ret = btrfs_next_leaf(space_root, path);
+ if (ret != 0) {
+ if (ret == 1)
+ ret = 0;
+ break;
+ }
+ leaf = path->nodes[0];
+ }
+ }
+
+ btrfs_release_path(path);
+
+ mutex_unlock(&bg->free_space_lock);
+
+ max_entries = extent_count + 2;
+ entries = kmalloc(sizeof(*entries) * max_entries, GFP_NOFS);
+ if (!entries) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ num_entries = 0;
+
+ if (num_space_runs > 0 && space_runs[0].start > bg->start) {
+ entries[num_entries].objectid = bg->start;
+ entries[num_entries].type = BTRFS_IDENTITY_REMAP_KEY;
+ entries[num_entries].offset = space_runs[0].start - bg->start;
+ num_entries++;
+ }
+
+ for (unsigned int i = 1; i < num_space_runs; i++) {
+ entries[num_entries].objectid = space_runs[i - 1].end;
+ entries[num_entries].type = BTRFS_IDENTITY_REMAP_KEY;
+ entries[num_entries].offset =
+ space_runs[i].start - space_runs[i - 1].end;
+ num_entries++;
+ }
+
+ if (num_space_runs == 0) {
+ entries[num_entries].objectid = bg->start;
+ entries[num_entries].type = BTRFS_IDENTITY_REMAP_KEY;
+ entries[num_entries].offset = bg->length;
+ num_entries++;
+ } else if (space_runs[num_space_runs - 1].end < bg->start + bg->length) {
+ entries[num_entries].objectid = space_runs[num_space_runs - 1].end;
+ entries[num_entries].type = BTRFS_IDENTITY_REMAP_KEY;
+ entries[num_entries].offset =
+ bg->start + bg->length - space_runs[num_space_runs - 1].end;
+ num_entries++;
+ }
+
+ if (num_entries == 0)
+ goto out;
+
+ bg->identity_remap_count = num_entries;
+
+ ret = add_remap_tree_entries(trans, path, entries, num_entries);
+
+out:
+ kfree(entries);
+ kfree(space_runs);
+
+ return ret;
+}
+
static int remove_chunk_stripes(struct btrfs_trans_handle *trans,
struct btrfs_chunk_map *chunk,
struct btrfs_path *path)
@@ -4038,6 +4300,55 @@ static void adjust_identity_remap_count(struct btrfs_trans_handle *trans,
btrfs_mark_bg_fully_remapped(bg, trans);
}
+static int mark_chunk_remapped(struct btrfs_trans_handle *trans,
+ struct btrfs_path *path, uint64_t start)
+{
+ struct btrfs_fs_info *fs_info = trans->fs_info;
+ struct btrfs_chunk_map *chunk;
+ struct btrfs_key key;
+ u64 type;
+ int ret;
+ struct extent_buffer *leaf;
+ struct btrfs_chunk *c;
+
+ read_lock(&fs_info->mapping_tree_lock);
+
+ chunk = btrfs_find_chunk_map_nolock(fs_info, start, 1);
+ if (!chunk) {
+ read_unlock(&fs_info->mapping_tree_lock);
+ return -ENOENT;
+ }
+
+ chunk->type |= BTRFS_BLOCK_GROUP_REMAPPED;
+ type = chunk->type;
+
+ read_unlock(&fs_info->mapping_tree_lock);
+
+ key.objectid = BTRFS_FIRST_CHUNK_TREE_OBJECTID;
+ key.type = BTRFS_CHUNK_ITEM_KEY;
+ key.offset = start;
+
+ ret = btrfs_search_slot(trans, fs_info->chunk_root, &key, path,
+ 0, 1);
+ if (ret == 1) {
+ ret = -ENOENT;
+ goto end;
+ } else if (ret < 0)
+ goto end;
+
+ leaf = path->nodes[0];
+
+ c = btrfs_item_ptr(leaf, path->slots[0], struct btrfs_chunk);
+ btrfs_set_chunk_type(leaf, c, type);
+ btrfs_mark_buffer_dirty(trans, leaf);
+
+ ret = 0;
+end:
+ btrfs_free_chunk_map(chunk);
+ btrfs_release_path(path);
+ return ret;
+}
+
int btrfs_translate_remap(struct btrfs_fs_info *fs_info, u64 *logical,
u64 *length)
{
@@ -4092,6 +4403,136 @@ int btrfs_translate_remap(struct btrfs_fs_info *fs_info, u64 *logical,
return 0;
}
+static int start_block_group_remapping(struct btrfs_fs_info *fs_info,
+ struct btrfs_path *path,
+ struct btrfs_block_group *bg)
+{
+ struct btrfs_trans_handle *trans;
+ bool bg_already_dirty = true;
+ int ret, ret2;
+
+ ret = btrfs_cache_block_group(bg, true);
+ if (ret)
+ return ret;
+
+ trans = btrfs_start_transaction(fs_info->remap_root, 0);
+ if (IS_ERR(trans))
+ return PTR_ERR(trans);
+
+ /* We need to run delayed refs, to make sure FST is up to date. */
+ ret = btrfs_run_delayed_refs(trans, U64_MAX);
+ if (ret) {
+ btrfs_end_transaction(trans);
+ return ret;
+ }
+
+ mutex_lock(&fs_info->remap_mutex);
+
+ if (bg->flags & BTRFS_BLOCK_GROUP_REMAPPED) {
+ ret = 0;
+ goto end;
+ }
+
+ ret = create_remap_tree_entries(trans, path, bg);
+ if (ret) {
+ btrfs_abort_transaction(trans, ret);
+ goto end;
+ }
+
+ spin_lock(&bg->lock);
+ bg->flags |= BTRFS_BLOCK_GROUP_REMAPPED;
+ spin_unlock(&bg->lock);
+
+ spin_lock(&trans->transaction->dirty_bgs_lock);
+ if (list_empty(&bg->dirty_list)) {
+ list_add_tail(&bg->dirty_list,
+ &trans->transaction->dirty_bgs);
+ bg_already_dirty = false;
+ btrfs_get_block_group(bg);
+ }
+ spin_unlock(&trans->transaction->dirty_bgs_lock);
+
+ /* Modified block groups are accounted for in the delayed_refs_rsv. */
+ if (!bg_already_dirty)
+ btrfs_inc_delayed_refs_rsv_bg_updates(fs_info);
+
+ ret = mark_chunk_remapped(trans, path, bg->start);
+ if (ret) {
+ btrfs_abort_transaction(trans, ret);
+ goto end;
+ }
+
+ ret = btrfs_remove_block_group_free_space(trans, bg);
+ if (ret) {
+ btrfs_abort_transaction(trans, ret);
+ goto end;
+ }
+
+ btrfs_remove_free_space_cache(bg);
+
+end:
+ mutex_unlock(&fs_info->remap_mutex);
+
+ ret2 = btrfs_end_transaction(trans);
+ if (!ret)
+ ret = ret2;
+
+ return ret;
+}
+
+static int do_nonremap_reloc(struct btrfs_fs_info *fs_info, bool verbose,
+ struct reloc_control *rc)
+{
+ int ret;
+
+ while (1) {
+ enum reloc_stage finishes_stage;
+
+ mutex_lock(&fs_info->cleaner_mutex);
+ ret = relocate_block_group(rc);
+ mutex_unlock(&fs_info->cleaner_mutex);
+
+ finishes_stage = rc->stage;
+ /*
+ * We may have gotten ENOSPC after we already dirtied some
+ * extents. If writeout happens while we're relocating a
+ * different block group we could end up hitting the
+ * BUG_ON(rc->stage == UPDATE_DATA_PTRS) in
+ * btrfs_reloc_cow_block. Make sure we write everything out
+ * properly so we don't trip over this problem, and then break
+ * out of the loop if we hit an error.
+ */
+ if (rc->stage == MOVE_DATA_EXTENTS && rc->found_file_extent) {
+ int wb_ret;
+
+ wb_ret = btrfs_wait_ordered_range(BTRFS_I(rc->data_inode),
+ 0, (u64)-1);
+ if (wb_ret && ret == 0)
+ ret = wb_ret;
+ invalidate_mapping_pages(rc->data_inode->i_mapping,
+ 0, -1);
+ rc->stage = UPDATE_DATA_PTRS;
+ }
+
+ if (ret < 0)
+ return ret;
+
+ if (rc->extents_found == 0)
+ break;
+
+ if (verbose)
+ btrfs_info(fs_info, "found %llu extents, stage: %s",
+ rc->extents_found,
+ stage_to_string(finishes_stage));
+ }
+
+ WARN_ON(rc->block_group->pinned > 0);
+ WARN_ON(rc->block_group->reserved > 0);
+ WARN_ON(rc->block_group->used > 0);
+
+ return 0;
+}
+
/*
* function to relocate all extents in a block group.
*/
@@ -4102,7 +4543,7 @@ int btrfs_relocate_block_group(struct btrfs_fs_info *fs_info, u64 group_start,
struct btrfs_root *extent_root = btrfs_extent_root(fs_info, group_start);
struct reloc_control *rc;
struct inode *inode;
- struct btrfs_path *path;
+ struct btrfs_path *path = NULL;
int ret;
bool bg_is_ro = false;
@@ -4164,7 +4605,7 @@ int btrfs_relocate_block_group(struct btrfs_fs_info *fs_info, u64 group_start,
}
inode = lookup_free_space_inode(rc->block_group, path);
- btrfs_free_path(path);
+ btrfs_release_path(path);
if (!IS_ERR(inode))
ret = delete_block_group_cache(rc->block_group, inode, 0);
@@ -4174,11 +4615,13 @@ int btrfs_relocate_block_group(struct btrfs_fs_info *fs_info, u64 group_start,
if (ret && ret != -ENOENT)
goto out;
- rc->data_inode = create_reloc_inode(rc->block_group);
- if (IS_ERR(rc->data_inode)) {
- ret = PTR_ERR(rc->data_inode);
- rc->data_inode = NULL;
- goto out;
+ if (!btrfs_fs_incompat(fs_info, REMAP_TREE)) {
+ rc->data_inode = create_reloc_inode(rc->block_group);
+ if (IS_ERR(rc->data_inode)) {
+ ret = PTR_ERR(rc->data_inode);
+ rc->data_inode = NULL;
+ goto out;
+ }
}
if (verbose)
@@ -4191,54 +4634,17 @@ int btrfs_relocate_block_group(struct btrfs_fs_info *fs_info, u64 group_start,
ret = btrfs_zone_finish(rc->block_group);
WARN_ON(ret && ret != -EAGAIN);
- while (1) {
- enum reloc_stage finishes_stage;
-
- mutex_lock(&fs_info->cleaner_mutex);
- ret = relocate_block_group(rc);
- mutex_unlock(&fs_info->cleaner_mutex);
-
- finishes_stage = rc->stage;
- /*
- * We may have gotten ENOSPC after we already dirtied some
- * extents. If writeout happens while we're relocating a
- * different block group we could end up hitting the
- * BUG_ON(rc->stage == UPDATE_DATA_PTRS) in
- * btrfs_reloc_cow_block. Make sure we write everything out
- * properly so we don't trip over this problem, and then break
- * out of the loop if we hit an error.
- */
- if (rc->stage == MOVE_DATA_EXTENTS && rc->found_file_extent) {
- int wb_ret;
-
- wb_ret = btrfs_wait_ordered_range(BTRFS_I(rc->data_inode), 0,
- (u64)-1);
- if (wb_ret && ret == 0)
- ret = wb_ret;
- invalidate_mapping_pages(rc->data_inode->i_mapping,
- 0, -1);
- rc->stage = UPDATE_DATA_PTRS;
- }
-
- if (ret < 0)
- goto out;
-
- if (rc->extents_found == 0)
- break;
-
- if (verbose)
- btrfs_info(fs_info, "found %llu extents, stage: %s",
- rc->extents_found,
- stage_to_string(finishes_stage));
+ if (should_relocate_using_remap_tree(bg)) {
+ ret = start_block_group_remapping(fs_info, path, bg);
+ } else {
+ ret = do_nonremap_reloc(fs_info, verbose, rc);
}
-
- WARN_ON(rc->block_group->pinned > 0);
- WARN_ON(rc->block_group->reserved > 0);
- WARN_ON(rc->block_group->used > 0);
out:
if (ret && bg_is_ro)
btrfs_dec_block_group_ro(rc->block_group);
- iput(rc->data_inode);
+ if (!btrfs_fs_incompat(fs_info, REMAP_TREE))
+ iput(rc->data_inode);
+ btrfs_free_path(path);
reloc_chunk_end(fs_info);
out_put_bg:
btrfs_put_block_group(bg);
@@ -4432,7 +4838,7 @@ int btrfs_recover_relocation(struct btrfs_fs_info *fs_info)
btrfs_free_path(path);
- if (ret == 0) {
+ if (ret == 0 && !btrfs_fs_incompat(fs_info, REMAP_TREE)) {
/* cleanup orphan inode in data relocation tree */
fs_root = btrfs_grab_root(fs_info->data_reloc_root);
ASSERT(fs_root);
diff --git a/fs/btrfs/relocation.h b/fs/btrfs/relocation.h
index ffb497f27889..9f166b900d46 100644
--- a/fs/btrfs/relocation.h
+++ b/fs/btrfs/relocation.h
@@ -12,6 +12,17 @@ struct btrfs_trans_handle;
struct btrfs_ordered_extent;
struct btrfs_pending_snapshot;
+static inline bool should_relocate_using_remap_tree(struct btrfs_block_group *bg)
+{
+ if (!btrfs_fs_incompat(bg->fs_info, REMAP_TREE))
+ return false;
+
+ if (bg->flags & (BTRFS_BLOCK_GROUP_SYSTEM | BTRFS_BLOCK_GROUP_REMAP))
+ return false;
+
+ return true;
+}
+
int btrfs_relocate_block_group(struct btrfs_fs_info *fs_info, u64 group_start,
bool verbose);
int btrfs_init_reloc_root(struct btrfs_trans_handle *trans, struct btrfs_root *root);
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 8e040dcea64a..9b9f7e38dbc9 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -376,8 +376,13 @@ void btrfs_add_bg_to_space_info(struct btrfs_fs_info *info,
factor = btrfs_bg_type_to_factor(block_group->flags);
spin_lock(&space_info->lock);
- space_info->total_bytes += block_group->length;
- space_info->disk_total += block_group->length * factor;
+
+ if (!(block_group->flags & BTRFS_BLOCK_GROUP_REMAPPED) ||
+ block_group->identity_remap_count != 0) {
+ space_info->total_bytes += block_group->length;
+ space_info->disk_total += block_group->length * factor;
+ }
+
space_info->bytes_used += block_group->used;
space_info->disk_used += block_group->used * factor;
space_info->bytes_readonly += block_group->bytes_super;
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 2347b37113b0..58ce94e99f7d 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3409,15 +3409,57 @@ int btrfs_remove_chunk(struct btrfs_trans_handle *trans, u64 chunk_offset)
return ret;
}
-int btrfs_relocate_chunk(struct btrfs_fs_info *fs_info, u64 chunk_offset,
- bool verbose)
+static int btrfs_relocate_chunk_finish(struct btrfs_fs_info *fs_info,
+ struct btrfs_block_group *block_group)
{
struct btrfs_root *root = fs_info->chunk_root;
struct btrfs_trans_handle *trans;
- struct btrfs_block_group *block_group;
u64 length;
int ret;
+ btrfs_discard_cancel_work(&fs_info->discard_ctl, block_group);
+ length = block_group->length;
+ btrfs_put_block_group(block_group);
+
+ /*
+ * On a zoned file system, discard the whole block group, this will
+ * trigger a REQ_OP_ZONE_RESET operation on the device zone. If
+ * resetting the zone fails, don't treat it as a fatal problem from the
+ * filesystem's point of view.
+ */
+ if (btrfs_is_zoned(fs_info)) {
+ ret = btrfs_discard_extent(fs_info, block_group->start, length,
+ NULL);
+ if (ret)
+ btrfs_info(fs_info,
+ "failed to reset zone %llu after relocation",
+ block_group->start);
+ }
+
+ trans = btrfs_start_trans_remove_block_group(root->fs_info,
+ block_group->start);
+ if (IS_ERR(trans)) {
+ ret = PTR_ERR(trans);
+ btrfs_handle_fs_error(root->fs_info, ret, NULL);
+ return ret;
+ }
+
+ /*
+ * step two, delete the device extents and the
+ * chunk tree entries
+ */
+ ret = btrfs_remove_chunk(trans, block_group->start);
+ btrfs_end_transaction(trans);
+
+ return ret;
+}
+
+int btrfs_relocate_chunk(struct btrfs_fs_info *fs_info, u64 chunk_offset,
+ bool verbose)
+{
+ struct btrfs_block_group *block_group;
+ int ret;
+
if (btrfs_fs_incompat(fs_info, EXTENT_TREE_V2)) {
btrfs_err(fs_info,
"relocate: not supported on extent tree v2 yet");
@@ -3455,38 +3497,15 @@ int btrfs_relocate_chunk(struct btrfs_fs_info *fs_info, u64 chunk_offset,
block_group = btrfs_lookup_block_group(fs_info, chunk_offset);
if (!block_group)
return -ENOENT;
- btrfs_discard_cancel_work(&fs_info->discard_ctl, block_group);
- length = block_group->length;
- btrfs_put_block_group(block_group);
-
- /*
- * On a zoned file system, discard the whole block group, this will
- * trigger a REQ_OP_ZONE_RESET operation on the device zone. If
- * resetting the zone fails, don't treat it as a fatal problem from the
- * filesystem's point of view.
- */
- if (btrfs_is_zoned(fs_info)) {
- ret = btrfs_discard_extent(fs_info, chunk_offset, length, NULL);
- if (ret)
- btrfs_info(fs_info,
- "failed to reset zone %llu after relocation",
- chunk_offset);
- }
- trans = btrfs_start_trans_remove_block_group(root->fs_info,
- chunk_offset);
- if (IS_ERR(trans)) {
- ret = PTR_ERR(trans);
- btrfs_handle_fs_error(root->fs_info, ret, NULL);
- return ret;
+ if (should_relocate_using_remap_tree(block_group)) {
+ /* If we're relocating using the remap tree we're now done. */
+ btrfs_put_block_group(block_group);
+ ret = 0;
+ } else {
+ ret = btrfs_relocate_chunk_finish(fs_info, block_group);
}
- /*
- * step two, delete the device extents and the
- * chunk tree entries
- */
- ret = btrfs_remove_chunk(trans, chunk_offset);
- btrfs_end_transaction(trans);
return ret;
}
@@ -4155,6 +4174,14 @@ static int __btrfs_balance(struct btrfs_fs_info *fs_info)
chunk = btrfs_item_ptr(leaf, slot, struct btrfs_chunk);
chunk_type = btrfs_chunk_type(leaf, chunk);
+ /* Check if chunk has already been fully relocated. */
+ if (chunk_type & BTRFS_BLOCK_GROUP_REMAPPED &&
+ btrfs_chunk_num_stripes(leaf, chunk) == 0) {
+ btrfs_release_path(path);
+ mutex_unlock(&fs_info->reclaim_bgs_lock);
+ goto loop;
+ }
+
if (!counting) {
spin_lock(&fs_info->balance_lock);
bctl->stat.considered++;
--
2.51.0
* [PATCH v6 11/16] btrfs: move existing remaps before relocating block group
2025-11-14 18:47 [PATCH v6 00/16] Remap tree Mark Harmstone
` (9 preceding siblings ...)
2025-11-14 18:47 ` [PATCH v6 10/16] btrfs: handle setting up relocation of block group with remap-tree Mark Harmstone
@ 2025-11-14 18:47 ` Mark Harmstone
2025-11-14 18:47 ` [PATCH v6 12/16] btrfs: replace identity remaps with actual remaps when doing relocations Mark Harmstone
` (4 subsequent siblings)
15 siblings, 0 replies; 23+ messages in thread
From: Mark Harmstone @ 2025-11-14 18:47 UTC (permalink / raw)
To: linux-btrfs; +Cc: Mark Harmstone, Boris Burkov
If, when relocating a block group, we find that `remap_bytes` > 0 in its
block group item, it has been the destination block group for another
block group that has been remapped.
We need to search the remap tree for any remap backrefs within this
range, and move the data to a third block group. Otherwise
btrfs_translate_remap() could end up following an unbounded chain of
remaps, which would only get worse over time.
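To make the chaining concern concrete, here is a toy userspace model, not
the kernel implementation. It assumes a flat table of remaps with a
single-hop lookup, and shows that retargeting existing remaps at a third
block group keeps translation to one step. All names (`struct toy_remap`,
`toy_translate()`, `toy_move_remaps()`) are hypothetical, and the sketch
simplifies placement by keeping each range at the same offset in the new
block group and by assuming each remap destination lies wholly inside one
block group.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One remap: [old_addr, old_addr + length) now lives at new_addr. */
struct toy_remap {
	uint64_t old_addr;
	uint64_t length;
	uint64_t new_addr;
};

/* Single-hop translation: addresses with no remap map to themselves. */
static uint64_t toy_translate(const struct toy_remap *table, size_t n,
			      uint64_t logical)
{
	for (size_t i = 0; i < n; i++) {
		if (logical >= table[i].old_addr &&
		    logical - table[i].old_addr < table[i].length)
			return table[i].new_addr +
			       (logical - table[i].old_addr);
	}
	return logical;
}

/*
 * Before relocating the block group at [bg_start, bg_start + bg_len),
 * retarget every remap whose destination lies inside it so that it points
 * into a third block group instead. Returns the number of remaps moved.
 */
static size_t toy_move_remaps(struct toy_remap *table, size_t n,
			      uint64_t bg_start, uint64_t bg_len,
			      uint64_t third_bg_start)
{
	size_t moved = 0;

	for (size_t i = 0; i < n; i++) {
		if (table[i].new_addr >= bg_start &&
		    table[i].new_addr - bg_start < bg_len) {
			table[i].new_addr = third_bg_start +
					    (table[i].new_addr - bg_start);
			moved++;
		}
	}
	return moved;
}
```

In this model, once no remap points into the block group being relocated,
a lookup never has to follow a second remap, which is the property the
commit message is after.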
We only relocate one block group at a time, so `remap_bytes` will only
ever go down while we are doing this. Once we're finished we set the
REMAPPED flag on the block group, which permanently prevents any other
data from being moved into it.
Signed-off-by: Mark Harmstone <mark@harmstone.com>
Reviewed-by: Boris Burkov <boris@bur.io>
---
fs/btrfs/bio.c | 3 +-
fs/btrfs/bio.h | 3 +
fs/btrfs/extent-tree.c | 6 +-
fs/btrfs/relocation.c | 487 +++++++++++++++++++++++++++++++++++++++++
4 files changed, 496 insertions(+), 3 deletions(-)
diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
index 1b38e3ee0a33..1301c8c48ae2 100644
--- a/fs/btrfs/bio.c
+++ b/fs/btrfs/bio.c
@@ -817,7 +817,8 @@ static bool btrfs_submit_chunk(struct btrfs_bio *bbio, int mirror_num)
*/
if (!(inode->flags & BTRFS_INODE_NODATASUM) &&
!test_bit(BTRFS_FS_STATE_NO_DATA_CSUMS, &fs_info->fs_state) &&
- !btrfs_is_data_reloc_root(inode->root)) {
+ !btrfs_is_data_reloc_root(inode->root) &&
+ !bbio->is_remap) {
if (should_async_write(bbio) &&
btrfs_wq_submit_bio(bbio, bioc, &smap, mirror_num))
goto done;
diff --git a/fs/btrfs/bio.h b/fs/btrfs/bio.h
index 035145909b00..0bfd7981fe1f 100644
--- a/fs/btrfs/bio.h
+++ b/fs/btrfs/bio.h
@@ -87,6 +87,9 @@ struct btrfs_bio {
*/
bool is_scrub;
+ /* Whether the bio is coming from copy_remapped_data_io(). */
+ bool is_remap;
+
/* Whether the csum generation for data write is async. */
bool async_csum;
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index b8fed3246e1f..ab20f7fed4cf 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4543,7 +4543,8 @@ static noinline int find_free_extent(struct btrfs_root *root,
block_group->cached != BTRFS_CACHE_NO) {
down_read(&space_info->groups_sem);
if (list_empty(&block_group->list) ||
- block_group->ro) {
+ block_group->ro ||
+ block_group->flags & BTRFS_BLOCK_GROUP_REMAPPED) {
/*
* someone is removing this block group,
* we can't jump into the have_block_group
@@ -4577,7 +4578,8 @@ static noinline int find_free_extent(struct btrfs_root *root,
ffe_ctl->hinted = false;
/* If the block group is read-only, we can skip it entirely. */
- if (unlikely(block_group->ro)) {
+ if (unlikely(block_group->ro) ||
+ block_group->flags & BTRFS_BLOCK_GROUP_REMAPPED) {
if (ffe_ctl->for_treelog)
btrfs_clear_treelog_bg(block_group);
if (ffe_ctl->for_data_reloc)
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 1f86c81678bb..a95899af811d 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -3977,6 +3977,487 @@ static void adjust_block_group_remap_bytes(struct btrfs_trans_handle *trans,
btrfs_inc_delayed_refs_rsv_bg_updates(fs_info);
}
+struct reloc_io_private {
+ struct completion done;
+ refcount_t pending_refs;
+ blk_status_t status;
+};
+
+static void reloc_endio(struct btrfs_bio *bbio)
+{
+ struct reloc_io_private *priv = bbio->private;
+
+ if (bbio->bio.bi_status)
+ WRITE_ONCE(priv->status, bbio->bio.bi_status);
+
+ if (refcount_dec_and_test(&priv->pending_refs))
+ complete(&priv->done);
+
+ bio_put(&bbio->bio);
+}
+
+static int copy_remapped_data_io(struct btrfs_fs_info *fs_info,
+ struct reloc_io_private *priv,
+ struct page **pages, u64 addr, u64 length,
+ bool do_write)
+{
+ struct btrfs_bio *bbio;
+ unsigned long i = 0;
+ blk_opf_t op = do_write ? REQ_OP_WRITE : REQ_OP_READ;
+
+ init_completion(&priv->done);
+ refcount_set(&priv->pending_refs, 1);
+ priv->status = 0;
+
+ bbio = btrfs_bio_alloc(BIO_MAX_VECS, op, BTRFS_I(fs_info->btree_inode),
+ addr, reloc_endio, priv);
+ bbio->bio.bi_iter.bi_sector = addr >> SECTOR_SHIFT;
+ bbio->is_remap = true;
+
+ do {
+ size_t bytes = min_t(u64, length, PAGE_SIZE);
+
+ if (bio_add_page(&bbio->bio, pages[i], bytes, 0) < bytes) {
+ refcount_inc(&priv->pending_refs);
+ btrfs_submit_bbio(bbio, 0);
+
+ bbio = btrfs_bio_alloc(BIO_MAX_VECS, op,
+ BTRFS_I(fs_info->btree_inode),
+ addr, reloc_endio, priv);
+ bbio->bio.bi_iter.bi_sector = addr >> SECTOR_SHIFT;
+ bbio->is_remap = true;
+ continue;
+ }
+
+ i++;
+ addr += bytes;
+ length -= bytes;
+ } while (length);
+
+ refcount_inc(&priv->pending_refs);
+ btrfs_submit_bbio(bbio, 0);
+
+ if (!refcount_dec_and_test(&priv->pending_refs))
+ wait_for_completion_io(&priv->done);
+
+ return blk_status_to_errno(READ_ONCE(priv->status));
+}
+
+static int copy_remapped_data(struct btrfs_fs_info *fs_info, u64 old_addr,
+ u64 new_addr, u64 length)
+{
+ int ret;
+ struct page **pages;
+ unsigned int nr_pages;
+ struct reloc_io_private priv;
+
+ nr_pages = (length + PAGE_SIZE - 1) >> PAGE_SHIFT;
+ pages = kcalloc(nr_pages, sizeof(struct page *), GFP_NOFS);
+ if (!pages)
+ return -ENOMEM;
+ ret = btrfs_alloc_page_array(nr_pages, pages, 0);
+ if (ret) {
+ ret = -ENOMEM;
+ goto end;
+ }
+
+ ret = copy_remapped_data_io(fs_info, &priv, pages, old_addr, length,
+ false);
+ if (ret)
+ goto end;
+
+ ret = copy_remapped_data_io(fs_info, &priv, pages, new_addr, length,
+ true);
+
+end:
+ for (unsigned int i = 0; i < nr_pages; i++) {
+ if (pages[i])
+ __free_page(pages[i]);
+ }
+ kfree(pages);
+
+ return ret;
+}
+
+static int do_copy(struct btrfs_fs_info *fs_info, u64 old_addr, u64 new_addr,
+ u64 length)
+{
+ int ret;
+
+ /* Copy 1MB at a time, to avoid using too much memory. */
+
+ do {
+ u64 to_copy = min_t(u64, length, SZ_1M);
+
+ /* Limit to one bio. */
+ to_copy = min_t(u64, to_copy, BIO_MAX_VECS << PAGE_SHIFT);
+
+ ret = copy_remapped_data(fs_info, old_addr, new_addr,
+ to_copy);
+ if (ret)
+ return ret;
+
+ if (to_copy == length)
+ break;
+
+ old_addr += to_copy;
+ new_addr += to_copy;
+ length -= to_copy;
+ } while (true);
+
+ return 0;
+}
+
+static int add_remap_item(struct btrfs_trans_handle *trans,
+ struct btrfs_path *path, u64 new_addr, u64 length,
+ u64 old_addr)
+{
+ struct btrfs_fs_info *fs_info = trans->fs_info;
+ struct btrfs_remap remap;
+ struct btrfs_key key;
+ struct extent_buffer *leaf;
+ int ret;
+
+ key.objectid = old_addr;
+ key.type = BTRFS_REMAP_KEY;
+ key.offset = length;
+
+ ret = btrfs_insert_empty_item(trans, fs_info->remap_root, path,
+ &key, sizeof(struct btrfs_remap));
+ if (ret)
+ return ret;
+
+ leaf = path->nodes[0];
+
+ btrfs_set_stack_remap_address(&remap, new_addr);
+
+ write_extent_buffer(leaf, &remap,
+ btrfs_item_ptr_offset(leaf, path->slots[0]),
+ sizeof(struct btrfs_remap));
+
+ btrfs_release_path(path);
+
+ return 0;
+}
+
+static int add_remap_backref_item(struct btrfs_trans_handle *trans,
+ struct btrfs_path *path, u64 new_addr,
+ u64 length, u64 old_addr)
+{
+ struct btrfs_fs_info *fs_info = trans->fs_info;
+ struct btrfs_remap remap;
+ struct btrfs_key key;
+ struct extent_buffer *leaf;
+ int ret;
+
+ key.objectid = new_addr;
+ key.type = BTRFS_REMAP_BACKREF_KEY;
+ key.offset = length;
+
+ ret = btrfs_insert_empty_item(trans, fs_info->remap_root,
+ path, &key, sizeof(struct btrfs_remap));
+ if (ret)
+ return ret;
+
+ leaf = path->nodes[0];
+
+ btrfs_set_stack_remap_address(&remap, old_addr);
+
+ write_extent_buffer(leaf, &remap,
+ btrfs_item_ptr_offset(leaf, path->slots[0]),
+ sizeof(struct btrfs_remap));
+
+ btrfs_release_path(path);
+
+ return 0;
+}
+
+static int move_existing_remap(struct btrfs_fs_info *fs_info,
+ struct btrfs_path *path,
+ struct btrfs_block_group *bg, u64 new_addr,
+ u64 length, u64 old_addr)
+{
+ struct btrfs_trans_handle *trans;
+ struct extent_buffer *leaf;
+ struct btrfs_remap *remap_ptr, remap;
+ struct btrfs_key key, ins;
+ u64 dest_addr, dest_length, min_size;
+ struct btrfs_block_group *dest_bg;
+ int ret;
+ bool is_data = bg->flags & BTRFS_BLOCK_GROUP_DATA;
+ struct btrfs_space_info *sinfo = bg->space_info;
+ bool mutex_taken = false, bg_needs_free_space;
+
+ spin_lock(&sinfo->lock);
+ btrfs_space_info_update_bytes_may_use(sinfo, length);
+ spin_unlock(&sinfo->lock);
+
+ if (is_data)
+ min_size = fs_info->sectorsize;
+ else
+ min_size = fs_info->nodesize;
+
+ ret = btrfs_reserve_extent(fs_info->fs_root, length, length, min_size,
+ 0, 0, &ins, is_data, false);
+ if (ret) {
+ spin_lock(&sinfo->lock);
+ btrfs_space_info_update_bytes_may_use(sinfo, -length);
+ spin_unlock(&sinfo->lock);
+ return ret;
+ }
+
+ dest_addr = ins.objectid;
+ dest_length = ins.offset;
+
+ if (!is_data && !IS_ALIGNED(dest_length, fs_info->nodesize)) {
+ u64 new_length = ALIGN_DOWN(dest_length, fs_info->nodesize);
+
+ btrfs_free_reserved_extent(fs_info, dest_addr + new_length,
+ dest_length - new_length, 0);
+
+ dest_length = new_length;
+ }
+
+ trans = btrfs_join_transaction(fs_info->remap_root);
+ if (IS_ERR(trans)) {
+ ret = PTR_ERR(trans);
+ trans = NULL;
+ goto end;
+ }
+
+ mutex_lock(&fs_info->remap_mutex);
+ mutex_taken = true;
+
+ /* Find old remap entry. */
+
+ key.objectid = old_addr;
+ key.type = BTRFS_REMAP_KEY;
+ key.offset = length;
+
+ ret = btrfs_search_slot(trans, fs_info->remap_root, &key,
+ path, 0, 1);
+ if (ret == 1) {
+ /*
+ * Not a problem if the remap entry wasn't found: that means
+ * that another transaction has deallocated the data.
+ * move_existing_remaps() loops until the BG contains no
+ * remaps, so we can just return 0 in this case.
+ */
+ btrfs_release_path(path);
+ ret = 0;
+ goto end;
+ } else if (ret) {
+ goto end;
+ }
+
+ ret = do_copy(fs_info, new_addr, dest_addr, dest_length);
+ if (ret)
+ goto end;
+
+ /* Change data of old remap entry. */
+
+ leaf = path->nodes[0];
+
+ remap_ptr = btrfs_item_ptr(leaf, path->slots[0], struct btrfs_remap);
+ btrfs_set_remap_address(leaf, remap_ptr, dest_addr);
+
+ btrfs_mark_buffer_dirty(trans, leaf);
+
+ if (dest_length != length) {
+ key.offset = dest_length;
+ btrfs_set_item_key_safe(trans, path, &key);
+ }
+
+ btrfs_release_path(path);
+
+ if (dest_length != length) {
+ /* Add remap item for remainder. */
+
+ ret = add_remap_item(trans, path, new_addr + dest_length,
+ length - dest_length,
+ old_addr + dest_length);
+ if (ret)
+ goto end;
+ }
+
+ /* Change or remove old backref. */
+
+ key.objectid = new_addr;
+ key.type = BTRFS_REMAP_BACKREF_KEY;
+ key.offset = length;
+
+ ret = btrfs_search_slot(trans, fs_info->remap_root, &key,
+ path, -1, 1);
+ if (ret) {
+ if (ret == 1) {
+ btrfs_release_path(path);
+ ret = -ENOENT;
+ }
+ goto end;
+ }
+
+ leaf = path->nodes[0];
+
+ if (dest_length == length) {
+ ret = btrfs_del_item(trans, fs_info->remap_root, path);
+ if (ret) {
+ btrfs_release_path(path);
+ goto end;
+ }
+ } else {
+ key.objectid += dest_length;
+ key.offset -= dest_length;
+ btrfs_set_item_key_safe(trans, path, &key);
+
+ btrfs_set_stack_remap_address(&remap, old_addr + dest_length);
+
+ write_extent_buffer(leaf, &remap,
+ btrfs_item_ptr_offset(leaf, path->slots[0]),
+ sizeof(struct btrfs_remap));
+ }
+
+ btrfs_release_path(path);
+
+ /* Add new backref. */
+
+ ret = add_remap_backref_item(trans, path, dest_addr, dest_length,
+ old_addr);
+ if (ret)
+ goto end;
+
+ adjust_block_group_remap_bytes(trans, bg, -dest_length);
+
+ ret = btrfs_add_to_free_space_tree(trans, new_addr, dest_length);
+ if (ret)
+ goto end;
+
+ dest_bg = btrfs_lookup_block_group(fs_info, dest_addr);
+
+ adjust_block_group_remap_bytes(trans, dest_bg, dest_length);
+
+ mutex_lock(&dest_bg->free_space_lock);
+ bg_needs_free_space = test_bit(BLOCK_GROUP_FLAG_NEEDS_FREE_SPACE,
+ &dest_bg->runtime_flags);
+ mutex_unlock(&dest_bg->free_space_lock);
+ btrfs_put_block_group(dest_bg);
+
+ if (bg_needs_free_space) {
+ ret = btrfs_add_block_group_free_space(trans, dest_bg);
+ if (ret)
+ goto end;
+ }
+
+ ret = btrfs_remove_from_free_space_tree(trans, dest_addr, dest_length);
+ if (ret) {
+ btrfs_remove_from_free_space_tree(trans, new_addr,
+ dest_length);
+ goto end;
+ }
+
+ ret = 0;
+
+end:
+ if (mutex_taken)
+ mutex_unlock(&fs_info->remap_mutex);
+
+ btrfs_dec_block_group_reservations(fs_info, dest_addr);
+
+ if (ret) {
+ btrfs_free_reserved_extent(fs_info, dest_addr, dest_length, 0);
+
+ if (trans) {
+ btrfs_abort_transaction(trans, ret);
+ btrfs_end_transaction(trans);
+ }
+ } else {
+ dest_bg = btrfs_lookup_block_group(fs_info, dest_addr);
+ btrfs_free_reserved_bytes(dest_bg, dest_length, 0);
+ btrfs_put_block_group(dest_bg);
+
+ ret = btrfs_commit_transaction(trans);
+ }
+
+ return ret;
+}
+
+static int move_existing_remaps(struct btrfs_fs_info *fs_info,
+ struct btrfs_block_group *bg,
+ struct btrfs_path *path)
+{
+ int ret;
+ struct btrfs_key key;
+ struct extent_buffer *leaf;
+ struct btrfs_remap *remap;
+ u64 old_addr;
+
+ /* Look for backrefs in remap tree. */
+
+ while (bg->remap_bytes > 0) {
+ key.objectid = bg->start;
+ key.type = BTRFS_REMAP_BACKREF_KEY;
+ key.offset = 0;
+
+ ret = btrfs_search_slot(NULL, fs_info->remap_root, &key, path,
+ 0, 0);
+ if (ret < 0)
+ return ret;
+
+ leaf = path->nodes[0];
+
+ if (path->slots[0] >= btrfs_header_nritems(leaf)) {
+ ret = btrfs_next_leaf(fs_info->remap_root, path);
+ if (ret < 0) {
+ btrfs_release_path(path);
+ return ret;
+ }
+
+ if (ret) {
+ btrfs_release_path(path);
+ break;
+ }
+
+ leaf = path->nodes[0];
+ }
+
+ btrfs_item_key_to_cpu(leaf, &key, path->slots[0]);
+
+ if (key.type != BTRFS_REMAP_BACKREF_KEY) {
+ path->slots[0]++;
+
+ if (path->slots[0] >= btrfs_header_nritems(leaf)) {
+ ret = btrfs_next_leaf(fs_info->remap_root, path);
+ if (ret < 0) {
+ btrfs_release_path(path);
+ return ret;
+ }
+
+ if (ret) {
+ btrfs_release_path(path);
+ break;
+ }
+
+ leaf = path->nodes[0];
+ }
+ }
+
+ remap = btrfs_item_ptr(leaf, path->slots[0],
+ struct btrfs_remap);
+
+ old_addr = btrfs_remap_address(leaf, remap);
+
+ btrfs_release_path(path);
+
+ ret = move_existing_remap(fs_info, path, bg, key.objectid,
+ key.offset, old_addr);
+ if (ret)
+ return ret;
+ }
+
+ BUG_ON(bg->remap_bytes > 0);
+
+ return 0;
+}
+
static int create_remap_tree_entries(struct btrfs_trans_handle *trans,
struct btrfs_path *path,
struct btrfs_block_group *bg)
@@ -4635,6 +5116,12 @@ int btrfs_relocate_block_group(struct btrfs_fs_info *fs_info, u64 group_start,
WARN_ON(ret && ret != -EAGAIN);
if (should_relocate_using_remap_tree(bg)) {
+ if (bg->remap_bytes != 0) {
+ ret = move_existing_remaps(fs_info, bg, path);
+ if (ret)
+ goto out;
+ }
+
ret = start_block_group_remapping(fs_info, path, bg);
} else {
ret = do_nonremap_reloc(fs_info, verbose, rc);
--
2.51.0
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH v6 12/16] btrfs: replace identity remaps with actual remaps when doing relocations
2025-11-14 18:47 [PATCH v6 00/16] Remap tree Mark Harmstone
` (10 preceding siblings ...)
2025-11-14 18:47 ` [PATCH v6 11/16] btrfs: move existing remaps before relocating block group Mark Harmstone
@ 2025-11-14 18:47 ` Mark Harmstone
2025-11-20 0:21 ` Boris Burkov
2025-11-14 18:47 ` [PATCH v6 13/16] btrfs: add do_remap param to btrfs_discard_extent() Mark Harmstone
` (3 subsequent siblings)
15 siblings, 1 reply; 23+ messages in thread
From: Mark Harmstone @ 2025-11-14 18:47 UTC (permalink / raw)
To: linux-btrfs; +Cc: Mark Harmstone
Add a function do_remap_reloc(), which does the actual work of
doing a relocation using the remap tree.
In a loop we call do_remap_reloc_trans(), which searches for the first
identity remap for the block group. We call btrfs_reserve_extent() to
find space elsewhere for it, and read the data into memory and write it
to the new location. We then carve out the identity remap and replace it
with an actual remap, which points to the new location in which to look.
Once the last identity remap has been removed we call
last_identity_remap_gone(), which, as with deletions, removes the
chunk's stripes and device extents.
Signed-off-by: Mark Harmstone <mark@harmstone.com>
---
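A note for readers, not part of the commit message: the "carve out"
step described above can be modelled in user space roughly as follows.
This is only a sketch of the range arithmetic that add_remap_entry()
performs; struct range, struct carve_result and carve_identity_remap()
are hypothetical names invented for illustration, and no tree
operations or locking are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space model; not the kernel's on-disk structures. */
struct range {
	uint64_t start;
	uint64_t length;
};

struct carve_result {
	bool has_left;			/* identity remap kept before the carved range */
	struct range left;
	struct range remap;		/* old addresses now covered by a real remap */
	uint64_t new_addr;		/* where the remapped data now lives */
	bool has_right;			/* identity remap re-added after the carved range */
	struct range right;
	int identity_count_delta;	/* net change in number of identity remaps */
};

/*
 * Carve [old_addr, old_addr + length) out of the identity remap `ident`.
 * Returns 0 on success, or -1 if the range is empty or not fully
 * contained in `ident`.
 */
static int carve_identity_remap(struct range ident, uint64_t old_addr,
				uint64_t new_addr, uint64_t length,
				struct carve_result *res)
{
	uint64_t ident_end = ident.start + ident.length;
	uint64_t end = old_addr + length;

	assert(res != NULL);

	if (length == 0 || old_addr < ident.start || end > ident_end)
		return -1;

	res->identity_count_delta = 0;

	/* Shorten the identity remap, or drop it if we start at its beginning. */
	res->has_left = old_addr > ident.start;
	if (res->has_left) {
		res->left.start = ident.start;
		res->left.length = old_addr - ident.start;
	} else {
		res->identity_count_delta--;
	}

	/* The carved range becomes an actual remap pointing at new_addr. */
	res->remap.start = old_addr;
	res->remap.length = length;
	res->new_addr = new_addr;

	/* Anything left over at the end becomes a new identity remap. */
	res->has_right = end < ident_end;
	if (res->has_right) {
		res->right.start = end;
		res->right.length = ident_end - end;
		res->identity_count_delta++;
	}

	return 0;
}
```

When the whole identity remap is consumed the delta is -1, which is
the case that can take a block group's identity remap count to zero.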
fs/btrfs/relocation.c | 336 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 336 insertions(+)
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index a95899af811d..15c4a7c6b1ef 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -4636,6 +4636,61 @@ static int create_remap_tree_entries(struct btrfs_trans_handle *trans,
return ret;
}
+static int find_next_identity_remap(struct btrfs_trans_handle *trans,
+ struct btrfs_path *path, u64 bg_end,
+ u64 last_start, u64 *start,
+ u64 *length)
+{
+ int ret;
+ struct btrfs_key key, found_key;
+ struct btrfs_root *remap_root = trans->fs_info->remap_root;
+ struct extent_buffer *leaf;
+
+ key.objectid = last_start;
+ key.type = BTRFS_IDENTITY_REMAP_KEY;
+ key.offset = 0;
+
+ ret = btrfs_search_slot(trans, remap_root, &key, path, 0, 0);
+ if (ret < 0)
+ goto out;
+
+ leaf = path->nodes[0];
+ while (true) {
+ if (path->slots[0] >= btrfs_header_nritems(leaf)) {
+ ret = btrfs_next_leaf(remap_root, path);
+
+ if (ret != 0) {
+ if (ret == 1)
+ ret = -ENOENT;
+ goto out;
+ }
+
+ leaf = path->nodes[0];
+ }
+
+ btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
+
+ if (found_key.objectid >= bg_end) {
+ ret = -ENOENT;
+ goto out;
+ }
+
+ if (found_key.type == BTRFS_IDENTITY_REMAP_KEY) {
+ *start = found_key.objectid;
+ *length = found_key.offset;
+ ret = 0;
+ goto out;
+ }
+
+ path->slots[0]++;
+ }
+
+out:
+ btrfs_release_path(path);
+
+ return ret;
+}
+
static int remove_chunk_stripes(struct btrfs_trans_handle *trans,
struct btrfs_chunk_map *chunk,
struct btrfs_path *path)
@@ -4781,6 +4836,96 @@ static void adjust_identity_remap_count(struct btrfs_trans_handle *trans,
btrfs_mark_bg_fully_remapped(bg, trans);
}
+static int add_remap_entry(struct btrfs_trans_handle *trans,
+ struct btrfs_path *path,
+ struct btrfs_block_group *src_bg, u64 old_addr,
+ u64 new_addr, u64 length)
+{
+ struct btrfs_fs_info *fs_info = trans->fs_info;
+ struct btrfs_key key, new_key;
+ int ret;
+ int identity_count_delta = 0;
+
+ key.objectid = old_addr;
+ key.type = (u8)-1;
+ key.offset = (u64)-1;
+
+ ret = btrfs_search_slot(trans, fs_info->remap_root, &key, path, -1, 1);
+ if (ret < 0)
+ goto end;
+
+ if (path->slots[0] == 0) {
+ ret = -ENOENT;
+ goto end;
+ }
+
+ path->slots[0]--;
+
+ btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
+
+ if (key.type != BTRFS_IDENTITY_REMAP_KEY ||
+ key.objectid > old_addr ||
+ key.objectid + key.offset <= old_addr) {
+ ret = -ENOENT;
+ goto end;
+ }
+
+ /* Shorten or delete identity mapping entry. */
+
+ if (key.objectid == old_addr) {
+ ret = btrfs_del_item(trans, fs_info->remap_root, path);
+ if (ret)
+ goto end;
+
+ identity_count_delta--;
+ } else {
+ new_key.objectid = key.objectid;
+ new_key.type = BTRFS_IDENTITY_REMAP_KEY;
+ new_key.offset = old_addr - key.objectid;
+
+ btrfs_set_item_key_safe(trans, path, &new_key);
+ }
+
+ btrfs_release_path(path);
+
+ /* Create new remap entry. */
+
+ ret = add_remap_item(trans, path, new_addr, length, old_addr);
+ if (ret)
+ goto end;
+
+ /* Add entry for remainder of identity mapping, if necessary. */
+
+ if (key.objectid + key.offset != old_addr + length) {
+ new_key.objectid = old_addr + length;
+ new_key.type = BTRFS_IDENTITY_REMAP_KEY;
+ new_key.offset = key.objectid + key.offset - old_addr - length;
+
+ ret = btrfs_insert_empty_item(trans, fs_info->remap_root,
+ path, &new_key, 0);
+ if (ret)
+ goto end;
+
+ btrfs_release_path(path);
+
+ identity_count_delta++;
+ }
+
+ /* Add backref. */
+
+ ret = add_remap_backref_item(trans, path, new_addr, length, old_addr);
+ if (ret)
+ goto end;
+
+ if (identity_count_delta != 0)
+ adjust_identity_remap_count(trans, src_bg, identity_count_delta);
+
+end:
+ btrfs_release_path(path);
+
+ return ret;
+}
+
static int mark_chunk_remapped(struct btrfs_trans_handle *trans,
struct btrfs_path *path, uint64_t start)
{
@@ -4830,6 +4975,189 @@ static int mark_chunk_remapped(struct btrfs_trans_handle *trans,
return ret;
}
+static int do_remap_reloc_trans(struct btrfs_fs_info *fs_info,
+ struct btrfs_block_group *src_bg,
+ struct btrfs_path *path, u64 *last_start)
+{
+ struct btrfs_trans_handle *trans;
+ struct btrfs_root *extent_root;
+ struct btrfs_key ins;
+ struct btrfs_block_group *dest_bg = NULL;
+ u64 start, remap_length, length, new_addr, min_size;
+ int ret;
+ bool no_more = false;
+ bool is_data = src_bg->flags & BTRFS_BLOCK_GROUP_DATA;
+ bool made_reservation = false, bg_needs_free_space;
+ struct btrfs_space_info *sinfo = src_bg->space_info;
+
+ extent_root = btrfs_extent_root(fs_info, src_bg->start);
+
+ trans = btrfs_start_transaction(extent_root, 0);
+ if (IS_ERR(trans))
+ return PTR_ERR(trans);
+
+ mutex_lock(&fs_info->remap_mutex);
+
+ ret = find_next_identity_remap(trans, path, src_bg->start + src_bg->length,
+ *last_start, &start, &remap_length);
+ if (ret == -ENOENT) {
+ no_more = true;
+ goto next;
+ } else if (ret) {
+ mutex_unlock(&fs_info->remap_mutex);
+ btrfs_end_transaction(trans);
+ return ret;
+ }
+
+ /* Try to reserve enough space for block. */
+
+ spin_lock(&sinfo->lock);
+ btrfs_space_info_update_bytes_may_use(sinfo, remap_length);
+ spin_unlock(&sinfo->lock);
+
+ if (is_data)
+ min_size = fs_info->sectorsize;
+ else
+ min_size = fs_info->nodesize;
+
+ /*
+ * We're using btrfs_reserve_extent() to allocate a contiguous
+ * logical address range, but this will become a remap item rather than
+ * an extent in the extent tree.
+ *
+ * Short allocations are fine: it means that we chop off the beginning
+ * of the identity remap that we're processing, and will tackle the
+ * rest of it the next time round.
+ */
+ ret = btrfs_reserve_extent(fs_info->fs_root, remap_length,
+ remap_length, min_size,
+ 0, 0, &ins, is_data, false);
+ if (ret) {
+ spin_lock(&sinfo->lock);
+ btrfs_space_info_update_bytes_may_use(sinfo, -remap_length);
+ spin_unlock(&sinfo->lock);
+
+ mutex_unlock(&fs_info->remap_mutex);
+ btrfs_end_transaction(trans);
+ return ret;
+ }
+
+ made_reservation = true;
+
+ new_addr = ins.objectid;
+ length = ins.offset;
+
+ if (!is_data && !IS_ALIGNED(length, fs_info->nodesize)) {
+ u64 new_length = ALIGN_DOWN(length, fs_info->nodesize);
+
+ btrfs_free_reserved_extent(fs_info, new_addr + new_length,
+ length - new_length, 0);
+
+ length = new_length;
+ }
+
+ dest_bg = btrfs_lookup_block_group(fs_info, new_addr);
+
+ mutex_lock(&dest_bg->free_space_lock);
+ bg_needs_free_space = test_bit(BLOCK_GROUP_FLAG_NEEDS_FREE_SPACE,
+ &dest_bg->runtime_flags);
+ mutex_unlock(&dest_bg->free_space_lock);
+
+ if (bg_needs_free_space) {
+ ret = btrfs_add_block_group_free_space(trans, dest_bg);
+ if (ret)
+ goto fail;
+ }
+
+ ret = do_copy(fs_info, start, new_addr, length);
+ if (ret)
+ goto fail;
+
+ ret = btrfs_remove_from_free_space_tree(trans, new_addr, length);
+ if (ret)
+ goto fail;
+
+ ret = add_remap_entry(trans, path, src_bg, start, new_addr, length);
+ if (ret) {
+ btrfs_add_to_free_space_tree(trans, new_addr, length);
+ goto fail;
+ }
+
+ adjust_block_group_remap_bytes(trans, dest_bg, length);
+ btrfs_free_reserved_bytes(dest_bg, length, 0);
+
+ spin_lock(&sinfo->lock);
+ sinfo->bytes_readonly += length;
+ spin_unlock(&sinfo->lock);
+
+next:
+ if (dest_bg)
+ btrfs_put_block_group(dest_bg);
+
+ if (made_reservation)
+ btrfs_dec_block_group_reservations(fs_info, new_addr);
+
+ mutex_unlock(&fs_info->remap_mutex);
+
+ if (src_bg->identity_remap_count == 0) {
+ bool mark_fully_remapped = false;
+
+ spin_lock(&src_bg->lock);
+
+ if (!src_bg->fully_remapped) {
+ mark_fully_remapped = true;
+ src_bg->fully_remapped = true;
+ }
+
+ spin_unlock(&src_bg->lock);
+
+ if (mark_fully_remapped)
+ btrfs_mark_bg_fully_remapped(src_bg, trans);
+ }
+
+ ret = btrfs_end_transaction(trans);
+ if (ret)
+ return ret;
+
+ if (no_more)
+ return 1;
+
+ *last_start = start;
+
+ return 0;
+
+fail:
+ if (dest_bg)
+ btrfs_put_block_group(dest_bg);
+
+ btrfs_free_reserved_extent(fs_info, new_addr, length, 0);
+
+ mutex_unlock(&fs_info->remap_mutex);
+ btrfs_end_transaction(trans);
+
+ return ret;
+}
+
+static int do_remap_reloc(struct btrfs_fs_info *fs_info,
+ struct btrfs_path *path, struct btrfs_block_group *bg)
+{
+ u64 last_start;
+ int ret;
+
+ last_start = bg->start;
+
+ while (true) {
+ ret = do_remap_reloc_trans(fs_info, bg, path, &last_start);
+ if (ret) {
+ if (ret == 1)
+ ret = 0;
+ break;
+ }
+ }
+
+ return ret;
+}
+
int btrfs_translate_remap(struct btrfs_fs_info *fs_info, u64 *logical,
u64 *length)
{
@@ -5123,6 +5451,14 @@ int btrfs_relocate_block_group(struct btrfs_fs_info *fs_info, u64 group_start,
}
ret = start_block_group_remapping(fs_info, path, bg);
+ if (ret)
+ goto out;
+
+ ret = do_remap_reloc(fs_info, path, rc->block_group);
+ if (ret)
+ goto out;
+
+ btrfs_delete_unused_bgs(fs_info);
} else {
ret = do_nonremap_reloc(fs_info, verbose, rc);
}
--
2.51.0
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH v6 13/16] btrfs: add do_remap param to btrfs_discard_extent()
2025-11-14 18:47 [PATCH v6 00/16] Remap tree Mark Harmstone
` (11 preceding siblings ...)
2025-11-14 18:47 ` [PATCH v6 12/16] btrfs: replace identity remaps with actual remaps when doing relocations Mark Harmstone
@ 2025-11-14 18:47 ` Mark Harmstone
2025-11-14 18:47 ` [PATCH v6 14/16] btrfs: allow balancing remap tree Mark Harmstone
` (2 subsequent siblings)
15 siblings, 0 replies; 23+ messages in thread
From: Mark Harmstone @ 2025-11-14 18:47 UTC (permalink / raw)
To: linux-btrfs; +Cc: Mark Harmstone, Boris Burkov
btrfs_discard_extent() can be called either when an extent is removed
or from walking the free-space tree. With a remapped block group these
two things are no longer equivalent: the extent's addresses are
remapped, while the free-space tree exclusively uses underlying
addresses.
Add a do_remap parameter to btrfs_discard_extent() and
btrfs_map_discard(), saying whether or not the address needs to be run
through the remap tree first.
Signed-off-by: Mark Harmstone <mark@harmstone.com>
Reviewed-by: Boris Burkov <boris@bur.io>
---
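A note for readers, not part of the commit message: the effect of
do_remap can be illustrated with a small user-space model. This is a
sketch under stated assumptions, not the kernel code: struct toy_remap
and toy_discard_address() are hypothetical, and the remap tree is
stood in for by a flat array.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one remap item: old range -> new address. */
struct toy_remap {
	uint64_t old_addr;
	uint64_t length;
	uint64_t new_addr;
};

/*
 * Return the address a discard should be issued against.
 *
 * With do_remap == false the caller already holds an underlying address
 * (e.g. it came from the free-space tree), so it is returned unchanged.
 * With do_remap == true and a remapped chunk, the address is translated
 * through the toy remap table first.
 */
static uint64_t toy_discard_address(const struct toy_remap *remaps, size_t nr,
				    bool chunk_remapped, uint64_t logical,
				    bool do_remap)
{
	if (!do_remap || !chunk_remapped)
		return logical;

	for (size_t i = 0; i < nr; i++) {
		const struct toy_remap *r = &remaps[i];

		if (logical >= r->old_addr && logical - r->old_addr < r->length)
			return r->new_addr + (logical - r->old_addr);
	}

	/* No remap covers this address: treat it as an identity mapping. */
	return logical;
}
```

In this model an extent being freed would pass do_remap == true, while
a walk over free-space entries would pass false, matching the two
kinds of caller described above.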
fs/btrfs/extent-tree.c | 11 +++++++----
fs/btrfs/extent-tree.h | 2 +-
fs/btrfs/free-space-cache.c | 2 +-
fs/btrfs/inode.c | 2 +-
fs/btrfs/volumes.c | 23 +++++++++++++++++++++--
fs/btrfs/volumes.h | 2 +-
6 files changed, 32 insertions(+), 10 deletions(-)
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index ab20f7fed4cf..91b4e1b0842c 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1381,7 +1381,7 @@ static int do_discard_extent(struct btrfs_discard_stripe *stripe, u64 *bytes)
}
int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr,
- u64 num_bytes, u64 *actual_bytes)
+ u64 num_bytes, u64 *actual_bytes, bool do_remap)
{
int ret = 0;
u64 discarded_bytes = 0;
@@ -1399,7 +1399,8 @@ int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr,
int i;
num_bytes = end - cur;
- stripes = btrfs_map_discard(fs_info, cur, &num_bytes, &num_stripes);
+ stripes = btrfs_map_discard(fs_info, cur, &num_bytes,
+ &num_stripes, do_remap);
if (IS_ERR(stripes)) {
ret = PTR_ERR(stripes);
if (ret == -EOPNOTSUPP)
@@ -2912,7 +2913,8 @@ int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans)
if (btrfs_test_opt(fs_info, DISCARD_SYNC))
ret = btrfs_discard_extent(fs_info, start,
- end + 1 - start, NULL);
+ end + 1 - start, NULL,
+ true);
next_state = btrfs_next_extent_state(unpin, cached_state);
btrfs_clear_extent_dirty(unpin, start, end, &cached_state);
@@ -2970,7 +2972,8 @@ int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans)
ret = -EROFS;
if (!TRANS_ABORTED(trans))
ret = btrfs_discard_extent(fs_info, block_group->start,
- block_group->length, NULL);
+ block_group->length, NULL,
+ true);
/*
* Not strictly necessary to lock, as the block_group should be
diff --git a/fs/btrfs/extent-tree.h b/fs/btrfs/extent-tree.h
index a15a9497c9f3..0700cb8de3f3 100644
--- a/fs/btrfs/extent-tree.h
+++ b/fs/btrfs/extent-tree.h
@@ -161,7 +161,7 @@ int btrfs_drop_subtree(struct btrfs_trans_handle *trans,
struct extent_buffer *parent);
void btrfs_error_unpin_extent_range(struct btrfs_fs_info *fs_info, u64 start, u64 end);
int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr,
- u64 num_bytes, u64 *actual_bytes);
+ u64 num_bytes, u64 *actual_bytes, bool do_remap);
int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range);
void btrfs_handle_fully_remapped_bgs(struct btrfs_fs_info *fs_info);
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 05ce6b5a898f..30507fa8ad80 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -3677,7 +3677,7 @@ static int do_trimming(struct btrfs_block_group *block_group,
}
spin_unlock(&space_info->lock);
- ret = btrfs_discard_extent(fs_info, start, bytes, &trimmed);
+ ret = btrfs_discard_extent(fs_info, start, bytes, &trimmed, false);
if (!ret) {
*total_trimmed += trimmed;
trim_state = BTRFS_TRIM_STATE_TRIMMED;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 1a0c380ef464..0d0b36891bc7 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3306,7 +3306,7 @@ int btrfs_finish_one_ordered(struct btrfs_ordered_extent *ordered_extent)
btrfs_discard_extent(fs_info,
ordered_extent->disk_bytenr,
ordered_extent->disk_num_bytes,
- NULL);
+ NULL, true);
btrfs_free_reserved_extent(fs_info,
ordered_extent->disk_bytenr,
ordered_extent->disk_num_bytes, true);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 58ce94e99f7d..80749c8ef8b2 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3429,7 +3429,7 @@ static int btrfs_relocate_chunk_finish(struct btrfs_fs_info *fs_info,
*/
if (btrfs_is_zoned(fs_info)) {
ret = btrfs_discard_extent(fs_info, block_group->start, length,
- NULL);
+ NULL, true);
if (ret)
btrfs_info(fs_info,
"failed to reset zone %llu after relocation",
@@ -6114,7 +6114,7 @@ void btrfs_put_bioc(struct btrfs_io_context *bioc)
*/
struct btrfs_discard_stripe *btrfs_map_discard(struct btrfs_fs_info *fs_info,
u64 logical, u64 *length_ret,
- u32 *num_stripes)
+ u32 *num_stripes, bool do_remap)
{
struct btrfs_chunk_map *map;
struct btrfs_discard_stripe *stripes;
@@ -6138,6 +6138,25 @@ struct btrfs_discard_stripe *btrfs_map_discard(struct btrfs_fs_info *fs_info,
if (IS_ERR(map))
return ERR_CAST(map);
+ if (do_remap && map->type & BTRFS_BLOCK_GROUP_REMAPPED) {
+ u64 new_logical = logical;
+
+ ret = btrfs_translate_remap(fs_info, &new_logical, &length);
+ if (ret)
+ goto out_free_map;
+
+ if (new_logical != logical) {
+ btrfs_free_chunk_map(map);
+
+ map = btrfs_get_chunk_map(fs_info, new_logical,
+ length);
+ if (IS_ERR(map))
+ return ERR_CAST(map);
+
+ logical = new_logical;
+ }
+ }
+
/* we don't discard raid56 yet */
if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK) {
ret = -EOPNOTSUPP;
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index ccf0a459180d..505a50689fb0 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -732,7 +732,7 @@ int btrfs_map_repair_block(struct btrfs_fs_info *fs_info,
u32 length, int mirror_num);
struct btrfs_discard_stripe *btrfs_map_discard(struct btrfs_fs_info *fs_info,
u64 logical, u64 *length_ret,
- u32 *num_stripes);
+ u32 *num_stripes, bool do_remap);
int btrfs_read_sys_array(struct btrfs_fs_info *fs_info);
int btrfs_read_chunk_tree(struct btrfs_fs_info *fs_info);
struct btrfs_block_group *btrfs_create_chunk(struct btrfs_trans_handle *trans,
--
2.51.0
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH v6 14/16] btrfs: allow balancing remap tree
2025-11-14 18:47 [PATCH v6 00/16] Remap tree Mark Harmstone
` (12 preceding siblings ...)
2025-11-14 18:47 ` [PATCH v6 13/16] btrfs: add do_remap param to btrfs_discard_extent() Mark Harmstone
@ 2025-11-14 18:47 ` Mark Harmstone
2025-11-14 18:47 ` [PATCH v6 15/16] btrfs: handle discarding fully-remapped block groups Mark Harmstone
2025-11-14 18:47 ` [PATCH v6 16/16] btrfs: populate fully_remapped_bgs_list on mount Mark Harmstone
15 siblings, 0 replies; 23+ messages in thread
From: Mark Harmstone @ 2025-11-14 18:47 UTC (permalink / raw)
To: linux-btrfs; +Cc: Mark Harmstone, Boris Burkov
Balancing the REMAP chunk, i.e. the chunk in which the remap tree lives,
is a special case.
We can't use the remap tree itself for this, as then we'd have no way to
bootstrap it on mount. And we can't use the pre-remap tree code for this
as it relies on walking the extent tree, and we're not creating backrefs
for REMAP chunks.
So instead, if a balance would relocate any REMAP block groups, mark
those block groups as readonly and COW every leaf of the remap tree.
There are more sophisticated ways of doing this, such as only COWing nodes
within a block group that's to be relocated, but they're fiddly and have
lots of edge cases. Plus it's not anticipated that a) the number of
REMAP chunks is going to be particularly large, or b) users will want to
relocate only some of these chunks - the main use case here is to
unbreak RAID conversion and device removal.
Signed-off-by: Mark Harmstone <mark@harmstone.com>
Reviewed-by: Boris Burkov <boris@bur.io>
---
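A note for readers, not part of the commit message: for the purposes
of the balance type filter, this patch makes should_balance_chunk()
treat a REMAP chunk as if it were METADATA. A minimal user-space
sketch of that idea follows. The TOY_BG_* bit values are illustrative
assumptions only (in particular TOY_BG_REMAP is a placeholder, not the
real flag value), and both helper functions are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag values; TOY_BG_REMAP is a made-up placeholder. */
#define TOY_BG_DATA		(1ULL << 0)
#define TOY_BG_SYSTEM		(1ULL << 1)
#define TOY_BG_METADATA		(1ULL << 2)
#define TOY_BG_REMAP		(1ULL << 12)
#define TOY_BG_TYPE_MASK	(TOY_BG_DATA | TOY_BG_SYSTEM | TOY_BG_METADATA)

/* Rewrite the chunk type so that REMAP chunks are filtered as METADATA. */
static uint64_t toy_balance_type(uint64_t chunk_type)
{
	if (chunk_type & TOY_BG_REMAP) {
		chunk_type &= ~TOY_BG_REMAP;
		chunk_type |= TOY_BG_METADATA;
	}

	return chunk_type & TOY_BG_TYPE_MASK;
}

/* True if a chunk of this type is selected by the given type filter. */
static bool toy_matches_type_filter(uint64_t chunk_type, uint64_t filter)
{
	return (toy_balance_type(chunk_type) & filter) != 0;
}
```

Under these assumptions a metadata balance picks up REMAP chunks,
while a data-only balance leaves them alone.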
fs/btrfs/volumes.c | 159 +++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 155 insertions(+), 4 deletions(-)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 80749c8ef8b2..ecf1c8ed259b 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4002,8 +4002,11 @@ static bool should_balance_chunk(struct extent_buffer *leaf, struct btrfs_chunk
struct btrfs_balance_args *bargs = NULL;
u64 chunk_type = btrfs_chunk_type(leaf, chunk);
- if (chunk_type & BTRFS_BLOCK_GROUP_REMAP)
- return false;
+ /* treat REMAP chunks as METADATA */
+ if (chunk_type & BTRFS_BLOCK_GROUP_REMAP) {
+ chunk_type &= ~BTRFS_BLOCK_GROUP_REMAP;
+ chunk_type |= BTRFS_BLOCK_GROUP_METADATA;
+ }
/* type filter */
if (!((chunk_type & BTRFS_BLOCK_GROUP_TYPE_MASK) &
@@ -4086,6 +4089,113 @@ static bool should_balance_chunk(struct extent_buffer *leaf, struct btrfs_chunk
return true;
}
+struct remap_chunk_info {
+ struct list_head list;
+ u64 offset;
+ struct btrfs_block_group *bg;
+ bool made_ro;
+};
+
+static int cow_remap_tree(struct btrfs_trans_handle *trans,
+ struct btrfs_path *path)
+{
+ struct btrfs_fs_info *fs_info = trans->fs_info;
+ struct btrfs_key key = { 0 };
+ int ret;
+
+ ret = btrfs_search_slot(trans, fs_info->remap_root, &key, path, 0, 1);
+ if (ret < 0)
+ return ret;
+
+ while (true) {
+ ret = btrfs_next_leaf(fs_info->remap_root, path);
+ if (ret < 0) {
+ return ret;
+ } else if (ret > 0) {
+ ret = 0;
+ break;
+ }
+
+ btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
+
+ btrfs_release_path(path);
+
+ ret = btrfs_search_slot(trans, fs_info->remap_root, &key, path,
+ 0, 1);
+ if (ret < 0)
+ break;
+ }
+
+ return ret;
+}
+
+static int balance_remap_chunks(struct btrfs_fs_info *fs_info,
+ struct btrfs_path *path,
+ struct list_head *chunks)
+{
+ struct remap_chunk_info *rci, *tmp;
+ struct btrfs_trans_handle *trans;
+ int ret;
+
+ list_for_each_entry_safe(rci, tmp, chunks, list) {
+ rci->bg = btrfs_lookup_block_group(fs_info, rci->offset);
+ if (!rci->bg) {
+ list_del(&rci->list);
+ kfree(rci);
+ continue;
+ }
+
+ ret = btrfs_inc_block_group_ro(rci->bg, false);
+ if (ret)
+ goto end;
+
+ rci->made_ro = true;
+ }
+
+ if (list_empty(chunks))
+ return 0;
+
+ trans = btrfs_start_transaction(fs_info->remap_root, 0);
+ if (IS_ERR(trans)) {
+ ret = PTR_ERR(trans);
+ goto end;
+ }
+
+ mutex_lock(&fs_info->remap_mutex);
+
+ ret = cow_remap_tree(trans, path);
+
+ btrfs_release_path(path);
+
+ mutex_unlock(&fs_info->remap_mutex);
+
+ btrfs_commit_transaction(trans);
+
+end:
+ while (!list_empty(chunks)) {
+ bool unused;
+
+ rci = list_first_entry(chunks, struct remap_chunk_info, list);
+
+ spin_lock(&rci->bg->lock);
+ unused = !btrfs_is_block_group_used(rci->bg);
+ spin_unlock(&rci->bg->lock);
+
+ if (unused)
+ btrfs_mark_bg_unused(rci->bg);
+
+ if (rci->made_ro)
+ btrfs_dec_block_group_ro(rci->bg);
+
+ btrfs_put_block_group(rci->bg);
+
+ list_del(&rci->list);
+ kfree(rci);
+ }
+
+ return ret;
+}
+
static int __btrfs_balance(struct btrfs_fs_info *fs_info)
{
struct btrfs_balance_control *bctl = fs_info->balance_ctl;
@@ -4108,6 +4218,9 @@ static int __btrfs_balance(struct btrfs_fs_info *fs_info)
u32 count_meta = 0;
u32 count_sys = 0;
int chunk_reserved = 0;
+ struct remap_chunk_info *rci;
+ unsigned int num_remap_chunks = 0;
+ LIST_HEAD(remap_chunks);
path = btrfs_alloc_path();
if (!path) {
@@ -4206,7 +4319,8 @@ static int __btrfs_balance(struct btrfs_fs_info *fs_info)
count_data++;
else if (chunk_type & BTRFS_BLOCK_GROUP_SYSTEM)
count_sys++;
- else if (chunk_type & BTRFS_BLOCK_GROUP_METADATA)
+ else if (chunk_type & (BTRFS_BLOCK_GROUP_METADATA |
+ BTRFS_BLOCK_GROUP_REMAP))
count_meta++;
goto loop;
@@ -4226,6 +4340,30 @@ static int __btrfs_balance(struct btrfs_fs_info *fs_info)
goto loop;
}
+ /*
+ * Balancing REMAP chunks takes place separately - add the
+ * details to a list so it can be processed later.
+ */
+ if (chunk_type & BTRFS_BLOCK_GROUP_REMAP) {
+ mutex_unlock(&fs_info->reclaim_bgs_lock);
+
+ rci = kmalloc(sizeof(struct remap_chunk_info),
+ GFP_NOFS);
+ if (!rci) {
+ ret = -ENOMEM;
+ goto error;
+ }
+
+ rci->offset = found_key.offset;
+ rci->bg = NULL;
+ rci->made_ro = false;
+ list_add_tail(&rci->list, &remap_chunks);
+
+ num_remap_chunks++;
+
+ goto loop;
+ }
+
if (!chunk_reserved) {
/*
* We may be relocating the only data chunk we have,
@@ -4265,11 +4403,24 @@ static int __btrfs_balance(struct btrfs_fs_info *fs_info)
key.offset = found_key.offset - 1;
}
+ btrfs_release_path(path);
+
if (counting) {
- btrfs_release_path(path);
counting = false;
goto again;
}
+
+ if (!list_empty(&remap_chunks)) {
+ ret = balance_remap_chunks(fs_info, path, &remap_chunks);
+ if (ret == -ENOSPC)
+ enospc_errors++;
+
+ if (!ret) {
+ spin_lock(&fs_info->balance_lock);
+ bctl->stat.completed += num_remap_chunks;
+ spin_unlock(&fs_info->balance_lock);
+ }
+ }
error:
btrfs_free_path(path);
if (enospc_errors) {
--
2.51.0
* [PATCH v6 15/16] btrfs: handle discarding fully-remapped block groups
2025-11-14 18:47 [PATCH v6 00/16] Remap tree Mark Harmstone
` (13 preceding siblings ...)
2025-11-14 18:47 ` [PATCH v6 14/16] btrfs: allow balancing remap tree Mark Harmstone
@ 2025-11-14 18:47 ` Mark Harmstone
2025-11-20 0:19 ` Boris Burkov
2025-11-14 18:47 ` [PATCH v6 16/16] btrfs: populate fully_remapped_bgs_list on mount Mark Harmstone
15 siblings, 1 reply; 23+ messages in thread
From: Mark Harmstone @ 2025-11-14 18:47 UTC (permalink / raw)
To: linux-btrfs; +Cc: Mark Harmstone
Discard normally works by iterating over the free-space entries of a
block group. This doesn't work for fully-remapped block groups, as we
removed their free-space entries when we started relocation.
For sync discard, call btrfs_discard_extent() from
btrfs_handle_fully_remapped_bgs(), which the cleaner thread runs after
the last identity remap has been removed.
For async discard, add a new function btrfs_trim_fully_remapped_block_group()
to be called by the discard worker, which iterates over the block
group's range using the normal async discard rules. Once we reach the
end, remove the chunk's stripes and device extents to get back its free
space.
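The stepping rule for the async case can be modelled in user space. The following is a minimal sketch, not the kernel code: MODEL_MIN_FILTER stands in for BTRFS_ASYNC_DISCARD_MIN_FILTER with an assumed value, next_discard_len() and discard_passes() are hypothetical helper names, and every discard is assumed to trim the full length it was asked for.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed stand-in for BTRFS_ASYNC_DISCARD_MIN_FILTER; the value is illustrative. */
#define MODEL_MIN_FILTER (32 * 1024ULL)

/*
 * Length of the next discard step for a cursor inside [start, end).
 * The step is clamped to max_discard_size unless the remainder is
 * within MODEL_MIN_FILTER of it, in which case it is taken in one go.
 * A max_discard_size of 0 means "no limit".
 */
static uint64_t next_discard_len(uint64_t cursor, uint64_t end,
				 uint64_t max_discard_size)
{
	uint64_t bytes;

	assert(cursor <= end);
	bytes = end - cursor;

	if (max_discard_size && bytes >= max_discard_size + MODEL_MIN_FILTER)
		bytes = max_discard_size;

	return bytes;
}

/*
 * Number of worker passes needed to move the cursor from the start of
 * the block group to its end, assuming each discard succeeds in full.
 */
static unsigned int discard_passes(uint64_t start, uint64_t length,
				   uint64_t max_discard_size)
{
	uint64_t cursor = start;
	uint64_t end = start + length;
	unsigned int passes = 0;

	while (cursor < end) {
		cursor += next_discard_len(cursor, end, max_discard_size);
		passes++;
	}

	return passes;
}
```

In this model the stripes and device extents would only be removed on the pass in which the cursor reaches the end of the block group.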
Signed-off-by: Mark Harmstone <mark@harmstone.com>
---
fs/btrfs/block-group.c | 29 ++++++++--------
fs/btrfs/block-group.h | 1 +
fs/btrfs/discard.c | 57 ++++++++++++++++++++++++++++----
fs/btrfs/extent-tree.c | 3 ++
fs/btrfs/free-space-cache.c | 66 +++++++++++++++++++++++++++++++++++++
fs/btrfs/free-space-cache.h | 1 +
6 files changed, 137 insertions(+), 20 deletions(-)
diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 4c4edaf3c753..965ae904ec2e 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -4823,20 +4823,23 @@ void btrfs_mark_bg_fully_remapped(struct btrfs_block_group *bg,
{
struct btrfs_fs_info *fs_info = trans->fs_info;
- spin_lock(&fs_info->unused_bgs_lock);
+ if (btrfs_test_opt(fs_info, DISCARD_ASYNC)) {
+ btrfs_discard_queue_work(&fs_info->discard_ctl, bg);
+ } else {
+ spin_lock(&fs_info->unused_bgs_lock);
- /*
- * The block group might already be on the unused_bgs list, remove it
- * if it is. It'll get readded after the async discard worker finishes,
- * or in btrfs_handle_fully_remapped_bgs() if we're not using async
- * discard.
- */
- if (!list_empty(&bg->bg_list))
- list_del(&bg->bg_list);
- else
- btrfs_get_block_group(bg);
+ /*
+ * The block group might already be on the unused_bgs list,
+ * remove it if it is. It'll get readded after
+ * btrfs_handle_fully_remapped_bgs() finishes.
+ */
+ if (!list_empty(&bg->bg_list))
+ list_del(&bg->bg_list);
+ else
+ btrfs_get_block_group(bg);
- list_add_tail(&bg->bg_list, &fs_info->fully_remapped_bgs);
+ list_add_tail(&bg->bg_list, &fs_info->fully_remapped_bgs);
- spin_unlock(&fs_info->unused_bgs_lock);
+ spin_unlock(&fs_info->unused_bgs_lock);
+ }
}
diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
index 4522074a45c2..b0b16efea19a 100644
--- a/fs/btrfs/block-group.h
+++ b/fs/btrfs/block-group.h
@@ -49,6 +49,7 @@ enum btrfs_discard_state {
BTRFS_DISCARD_EXTENTS,
BTRFS_DISCARD_BITMAPS,
BTRFS_DISCARD_RESET_CURSOR,
+ BTRFS_DISCARD_FULLY_REMAPPED,
};
/*
diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c
index ee5f5b2788e1..f9890037395a 100644
--- a/fs/btrfs/discard.c
+++ b/fs/btrfs/discard.c
@@ -215,6 +215,27 @@ static struct btrfs_block_group *find_next_block_group(
return ret_block_group;
}
+/*
+ * Returns whether a block group is empty.
+ *
+ * @block_group: block_group of interest
+ *
+ * "Empty" here means that there are no extents physically located within the
+ * device extents corresponding to this block group.
+ *
+ * For a remapped block group, this means that all of its identity remaps have
+ * been removed. For a non-remapped block group, this means that no extents
+ * have an address within its range, and that nothing has been remapped to be
+ * within it.
+ */
+static bool block_group_is_empty(struct btrfs_block_group *block_group)
+{
+ if (block_group->flags & BTRFS_BLOCK_GROUP_REMAPPED)
+ return block_group->identity_remap_count == 0;
+ else
+ return block_group->used == 0 && block_group->remap_bytes == 0;
+}
+
/*
* Look up next block group and set it for use.
*
@@ -241,8 +262,10 @@ static struct btrfs_block_group *peek_discard_list(
block_group = find_next_block_group(discard_ctl, now);
if (block_group && now >= block_group->discard_eligible_time) {
+ bool empty = block_group_is_empty(block_group);
+
if (block_group->discard_index == BTRFS_DISCARD_INDEX_UNUSED &&
- block_group->used != 0) {
+ !empty) {
if (btrfs_is_block_group_data_only(block_group)) {
__add_to_discard_list(discard_ctl, block_group);
/*
@@ -267,7 +290,15 @@ static struct btrfs_block_group *peek_discard_list(
}
if (block_group->discard_state == BTRFS_DISCARD_RESET_CURSOR) {
block_group->discard_cursor = block_group->start;
- block_group->discard_state = BTRFS_DISCARD_EXTENTS;
+
+ if (block_group->flags & BTRFS_BLOCK_GROUP_REMAPPED &&
+ empty) {
+ block_group->discard_state =
+ BTRFS_DISCARD_FULLY_REMAPPED;
+ } else {
+ block_group->discard_state =
+ BTRFS_DISCARD_EXTENTS;
+ }
}
}
if (block_group) {
@@ -373,7 +404,7 @@ void btrfs_discard_queue_work(struct btrfs_discard_ctl *discard_ctl,
if (!block_group || !btrfs_test_opt(block_group->fs_info, DISCARD_ASYNC))
return;
- if (block_group->used == 0 && block_group->remap_bytes == 0)
+ if (block_group_is_empty(block_group))
add_to_discard_unused_list(discard_ctl, block_group);
else
add_to_discard_list(discard_ctl, block_group);
@@ -470,7 +501,7 @@ static void btrfs_finish_discard_pass(struct btrfs_discard_ctl *discard_ctl,
{
remove_from_discard_list(discard_ctl, block_group);
- if (block_group->used == 0) {
+ if (block_group_is_empty(block_group)) {
if (btrfs_is_free_space_trimmed(block_group))
btrfs_mark_bg_unused(block_group);
else
@@ -524,7 +555,8 @@ static void btrfs_discard_workfn(struct work_struct *work)
/* Perform discarding */
minlen = discard_minlen[discard_index];
- if (discard_state == BTRFS_DISCARD_BITMAPS) {
+ switch (discard_state) {
+ case BTRFS_DISCARD_BITMAPS: {
u64 maxlen = 0;
/*
@@ -541,17 +573,28 @@ static void btrfs_discard_workfn(struct work_struct *work)
btrfs_block_group_end(block_group),
minlen, maxlen, true);
discard_ctl->discard_bitmap_bytes += trimmed;
- } else {
+
+ break;
+ }
+
+ case BTRFS_DISCARD_FULLY_REMAPPED:
+ btrfs_trim_fully_remapped_block_group(block_group);
+ break;
+
+ default:
btrfs_trim_block_group_extents(block_group, &trimmed,
block_group->discard_cursor,
btrfs_block_group_end(block_group),
minlen, true);
discard_ctl->discard_extent_bytes += trimmed;
+
+ break;
}
/* Determine next steps for a block_group */
if (block_group->discard_cursor >= btrfs_block_group_end(block_group)) {
- if (discard_state == BTRFS_DISCARD_BITMAPS) {
+ if (discard_state == BTRFS_DISCARD_BITMAPS ||
+ discard_state == BTRFS_DISCARD_FULLY_REMAPPED) {
btrfs_finish_discard_pass(discard_ctl, block_group);
} else {
block_group->discard_cursor = block_group->start;
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 91b4e1b0842c..f64ca57108af 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2876,6 +2876,9 @@ void btrfs_handle_fully_remapped_bgs(struct btrfs_fs_info *fs_info)
return;
}
+ btrfs_discard_extent(fs_info, block_group->start,
+ block_group->length, NULL, false);
+
/*
* Set num_stripes to 0, so that btrfs_remove_dev_extents()
* won't run a second time.
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 30507fa8ad80..1b8716b17031 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -29,6 +29,7 @@
#include "file-item.h"
#include "file.h"
#include "super.h"
+#include "relocation.h"
#define BITS_PER_BITMAP (PAGE_SIZE * 8UL)
#define MAX_CACHE_BYTES_PER_GIG SZ_64K
@@ -3066,6 +3067,11 @@ bool btrfs_is_free_space_trimmed(struct btrfs_block_group *block_group)
struct rb_node *node;
bool ret = true;
+ if (block_group->flags & BTRFS_BLOCK_GROUP_REMAPPED &&
+ block_group->identity_remap_count == 0) {
+ return true;
+ }
+
spin_lock(&ctl->tree_lock);
node = rb_first(&ctl->free_space_offset);
@@ -3834,6 +3840,66 @@ static int trim_no_bitmap(struct btrfs_block_group *block_group,
return ret;
}
+void btrfs_trim_fully_remapped_block_group(struct btrfs_block_group *bg)
+{
+ struct btrfs_fs_info *fs_info = bg->fs_info;
+ struct btrfs_discard_ctl *discard_ctl = &fs_info->discard_ctl;
+ int ret = 0;
+ u64 bytes, trimmed;
+ const u64 max_discard_size = READ_ONCE(discard_ctl->max_discard_size);
+ u64 end = btrfs_block_group_end(bg);
+ struct btrfs_chunk_map *map;
+
+ bytes = end - bg->discard_cursor;
+
+ if (max_discard_size &&
+ bytes >= (max_discard_size +
+ BTRFS_ASYNC_DISCARD_MIN_FILTER)) {
+ bytes = max_discard_size;
+ }
+
+ ret = btrfs_discard_extent(fs_info, bg->discard_cursor, bytes, &trimmed,
+ false);
+ if (ret)
+ return;
+
+ bg->discard_cursor += trimmed;
+
+ if (bg->discard_cursor < end)
+ return;
+
+ map = btrfs_get_chunk_map(fs_info, bg->start, 1);
+ if (IS_ERR(map)) {
+ ret = PTR_ERR(map);
+ return;
+ }
+
+ ret = btrfs_last_identity_remap_gone(map, bg);
+ if (ret) {
+ btrfs_free_chunk_map(map);
+ return;
+ }
+
+ /*
+ * Set num_stripes to 0, so that btrfs_remove_dev_extents()
+ * won't run a second time.
+ */
+ map->num_stripes = 0;
+
+ btrfs_free_chunk_map(map);
+
+ if (bg->used == 0) {
+ spin_lock(&fs_info->unused_bgs_lock);
+ if (!list_empty(&bg->bg_list)) {
+ list_del_init(&bg->bg_list);
+ btrfs_put_block_group(bg);
+ }
+ spin_unlock(&fs_info->unused_bgs_lock);
+
+ btrfs_mark_bg_unused(bg);
+ }
+}
+
/*
* If we break out of trimming a bitmap prematurely, we should reset the
* trimming bit. In a rather contrived case, it's possible to race here so
diff --git a/fs/btrfs/free-space-cache.h b/fs/btrfs/free-space-cache.h
index 9f1dbfdee8ca..33fc3b245648 100644
--- a/fs/btrfs/free-space-cache.h
+++ b/fs/btrfs/free-space-cache.h
@@ -166,6 +166,7 @@ int btrfs_trim_block_group_extents(struct btrfs_block_group *block_group,
int btrfs_trim_block_group_bitmaps(struct btrfs_block_group *block_group,
u64 *trimmed, u64 start, u64 end, u64 minlen,
u64 maxlen, bool async);
+void btrfs_trim_fully_remapped_block_group(struct btrfs_block_group *bg);
bool btrfs_free_space_cache_v1_active(struct btrfs_fs_info *fs_info);
int btrfs_set_free_space_cache_v1_active(struct btrfs_fs_info *fs_info, bool active);
--
2.51.0
* [PATCH v6 16/16] btrfs: populate fully_remapped_bgs_list on mount
2025-11-14 18:47 [PATCH v6 00/16] Remap tree Mark Harmstone
` (14 preceding siblings ...)
2025-11-14 18:47 ` [PATCH v6 15/16] btrfs: handle discarding fully-remapped block groups Mark Harmstone
@ 2025-11-14 18:47 ` Mark Harmstone
15 siblings, 0 replies; 23+ messages in thread
From: Mark Harmstone @ 2025-11-14 18:47 UTC (permalink / raw)
To: linux-btrfs; +Cc: Mark Harmstone, Boris Burkov
Add a function btrfs_populate_fully_remapped_bgs_list(), called on
mount, which looks for fully remapped block groups
(i.e. identity_remap_count == 0) that haven't yet had their chunk
stripes and device extents removed.
This situation arises when a filesystem is unmounted before async
discard has finished. Without this step, the data range occupied by the
chunk stripes would be permanently unusable.
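The selection rule can be sketched as a small user-space model. This is an illustration under stated assumptions rather than the kernel implementation: struct bg_model, struct chunk_model, MODEL_BG_REMAPPED, needs_stripe_removal() and count_pending() are hypothetical names, and the two arrays are assumed to be sorted by start address with one chunk per block group.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for BTRFS_BLOCK_GROUP_REMAPPED. */
#define MODEL_BG_REMAPPED (1u << 0)

/* Simplified view of a block group (assumed fields only). */
struct bg_model {
	uint64_t start;
	unsigned int flags;
	uint64_t identity_remap_count;
};

/* Simplified view of a chunk map (assumed fields only). */
struct chunk_model {
	uint64_t start;
	unsigned int num_stripes;
};

/*
 * A block group still needs its stripes removed if it is remapped,
 * has no identity remaps left, and its chunk still has stripes.
 */
static bool needs_stripe_removal(const struct bg_model *bg,
				 const struct chunk_model *map)
{
	if (!(bg->flags & MODEL_BG_REMAPPED))
		return false;

	if (bg->identity_remap_count != 0)
		return false;

	return map->num_stripes != 0;
}

/* Walk both sorted arrays in lockstep and count the pending block groups. */
static size_t count_pending(const struct bg_model *bgs,
			    const struct chunk_model *maps, size_t n)
{
	size_t pending = 0;

	for (size_t i = 0; i < n; i++) {
		/* Each block group is expected to line up with its chunk. */
		assert(bgs[i].start == maps[i].start);

		if (needs_stripe_removal(&bgs[i], &maps[i]))
			pending++;
	}

	return pending;
}
```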
Signed-off-by: Mark Harmstone <mark@harmstone.com>
Reviewed-by: Boris Burkov <boris@bur.io>
---
fs/btrfs/block-group.c | 79 +++++++++++++++++++++++++++++++++++++
fs/btrfs/block-group.h | 2 +
fs/btrfs/disk-io.c | 9 +++++
fs/btrfs/free-space-cache.c | 7 ++++
fs/btrfs/relocation.c | 4 ++
5 files changed, 101 insertions(+)
diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 965ae904ec2e..de10c02d1852 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -4824,6 +4824,11 @@ void btrfs_mark_bg_fully_remapped(struct btrfs_block_group *bg,
struct btrfs_fs_info *fs_info = trans->fs_info;
if (btrfs_test_opt(fs_info, DISCARD_ASYNC)) {
+ spin_lock(&bg->lock);
+ set_bit(BLOCK_GROUP_FLAG_STRIPE_REMOVAL_PENDING,
+ &bg->runtime_flags);
+ spin_unlock(&bg->lock);
+
btrfs_discard_queue_work(&fs_info->discard_ctl, bg);
} else {
spin_lock(&fs_info->unused_bgs_lock);
@@ -4843,3 +4848,77 @@ void btrfs_mark_bg_fully_remapped(struct btrfs_block_group *bg,
spin_unlock(&fs_info->unused_bgs_lock);
}
}
+
+/*
+ * Compare the block group and chunk trees, and find any fully-remapped block
+ * groups which haven't yet had their chunk stripes and device extents removed,
+ * and put them on the fully_remapped_bgs list so this gets done.
+ *
+ * This happens when a block group becomes fully remapped, i.e. its last
+ * identity mapping is removed, and the volume is unmounted before async
+ * discard has finished. It's important that this gets done: until it is,
+ * the chunk's stripes are dead space.
+ */
+int btrfs_populate_fully_remapped_bgs_list(struct btrfs_fs_info *fs_info)
+{
+ struct rb_node *node_bg, *node_chunk;
+
+ node_bg = rb_first_cached(&fs_info->block_group_cache_tree);
+ node_chunk = rb_first_cached(&fs_info->mapping_tree);
+
+ while (node_bg && node_chunk) {
+ struct btrfs_block_group *bg;
+ struct btrfs_chunk_map *map;
+
+ bg = rb_entry(node_bg, struct btrfs_block_group, cache_node);
+ map = rb_entry(node_chunk, struct btrfs_chunk_map, rb_node);
+
+ ASSERT(bg->start == map->start);
+
+ if (!(bg->flags & BTRFS_BLOCK_GROUP_REMAPPED))
+ goto next;
+
+ if (bg->identity_remap_count != 0)
+ goto next;
+
+ if (map->num_stripes == 0)
+ goto next;
+
+ spin_lock(&fs_info->unused_bgs_lock);
+
+ if (list_empty(&bg->bg_list)) {
+ btrfs_get_block_group(bg);
+ list_add_tail(&bg->bg_list,
+ &fs_info->fully_remapped_bgs);
+ } else {
+ list_move_tail(&bg->bg_list,
+ &fs_info->fully_remapped_bgs);
+ }
+
+ spin_unlock(&fs_info->unused_bgs_lock);
+
+ /*
+ * Ideally we'd want to call btrfs_discard_queue_work() here,
+ * but it'd do nothing as the discard worker hasn't been
+ * started yet.
+ *
+ * The block group will get added to the discard list when
+ * btrfs_handle_fully_remapped_bgs() gets called, when we
+ * commit the first transaction.
+ */
+ if (btrfs_test_opt(fs_info, DISCARD_ASYNC)) {
+ spin_lock(&bg->lock);
+ set_bit(BLOCK_GROUP_FLAG_STRIPE_REMOVAL_PENDING,
+ &bg->runtime_flags);
+ spin_unlock(&bg->lock);
+ }
+
+next:
+ node_bg = rb_next(node_bg);
+ node_chunk = rb_next(node_chunk);
+ }
+
+ ASSERT(!node_bg && !node_chunk);
+
+ return 0;
+}
diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
index b0b16efea19a..03e8ad8a2ec7 100644
--- a/fs/btrfs/block-group.h
+++ b/fs/btrfs/block-group.h
@@ -93,6 +93,7 @@ enum btrfs_block_group_flags {
* transaction.
*/
BLOCK_GROUP_FLAG_NEW,
+ BLOCK_GROUP_FLAG_STRIPE_REMOVAL_PENDING,
};
enum btrfs_caching_type {
@@ -416,5 +417,6 @@ int btrfs_use_block_group_size_class(struct btrfs_block_group *bg,
bool btrfs_block_group_should_use_size_class(const struct btrfs_block_group *bg);
void btrfs_mark_bg_fully_remapped(struct btrfs_block_group *bg,
struct btrfs_trans_handle *trans);
+int btrfs_populate_fully_remapped_bgs_list(struct btrfs_fs_info *fs_info);
#endif /* BTRFS_BLOCK_GROUP_H */
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 53221a0131fb..177a33cd9815 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3656,6 +3656,15 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
goto fail_sysfs;
}
+ if (btrfs_fs_incompat(fs_info, REMAP_TREE)) {
+ ret = btrfs_populate_fully_remapped_bgs_list(fs_info);
+ if (ret) {
+ btrfs_err(fs_info,
+ "failed to populate fully_remapped_bgs list: %d", ret);
+ goto fail_sysfs;
+ }
+ }
+
btrfs_zoned_reserve_data_reloc_bg(fs_info);
btrfs_free_zone_cache(fs_info);
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 1b8716b17031..ce853a9c0a4c 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -3068,6 +3068,7 @@ bool btrfs_is_free_space_trimmed(struct btrfs_block_group *block_group)
bool ret = true;
if (block_group->flags & BTRFS_BLOCK_GROUP_REMAPPED &&
+ !test_bit(BLOCK_GROUP_FLAG_STRIPE_REMOVAL_PENDING, &block_group->runtime_flags) &&
block_group->identity_remap_count == 0) {
return true;
}
@@ -3850,6 +3851,11 @@ void btrfs_trim_fully_remapped_block_group(struct btrfs_block_group *bg)
u64 end = btrfs_block_group_end(bg);
struct btrfs_chunk_map *map;
+ if (!test_bit(BLOCK_GROUP_FLAG_STRIPE_REMOVAL_PENDING, &bg->runtime_flags)) {
+ bg->discard_cursor = end;
+ goto skip_discard;
+ }
+
bytes = end - bg->discard_cursor;
if (max_discard_size &&
@@ -3888,6 +3894,7 @@ void btrfs_trim_fully_remapped_block_group(struct btrfs_block_group *bg)
btrfs_free_chunk_map(map);
+skip_discard:
if (bg->used == 0) {
spin_lock(&fs_info->unused_bgs_lock);
if (!list_empty(&bg->bg_list)) {
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 15c4a7c6b1ef..276201fe8f2d 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -4788,6 +4788,10 @@ int btrfs_last_identity_remap_gone(struct btrfs_chunk_map *chunk,
btrfs_remove_bg_from_sinfo(bg);
+ spin_lock(&bg->lock);
+ clear_bit(BLOCK_GROUP_FLAG_STRIPE_REMOVAL_PENDING, &bg->runtime_flags);
+ spin_unlock(&bg->lock);
+
ret = remove_chunk_stripes(trans, chunk, path);
if (ret) {
btrfs_abort_transaction(trans, ret);
--
2.51.0
* Re: [PATCH v6 10/16] btrfs: handle setting up relocation of block group with remap-tree
2025-11-14 18:47 ` [PATCH v6 10/16] btrfs: handle setting up relocation of block group with remap-tree Mark Harmstone
@ 2025-11-15 14:52 ` Sun Yangkai
2025-11-24 18:01 ` Mark Harmstone
0 siblings, 1 reply; 23+ messages in thread
From: Sun Yangkai @ 2025-11-15 14:52 UTC (permalink / raw)
To: mark; +Cc: boris, linux-btrfs
While reading the thread, I noticed the logic that builds the identity_remap
entries was a bit hard to follow.
I took the liberty of rewriting the function so that the two high-level cases
are immediately visible inside a single if/else. The result has no behavioral
change, and (at least to me) makes it obvious where the head/tail gaps are handled.
The modified code is shown below; feel free to pick it up if you find it useful.
Please let me know if I missed anything.
>
> +static int create_remap_tree_entries(struct btrfs_trans_handle *trans,
> + struct btrfs_path *path,
> + struct btrfs_block_group *bg)
> +{
> + struct btrfs_fs_info *fs_info = trans->fs_info;
> + struct btrfs_free_space_info *fsi;
> + struct btrfs_key key, found_key;
> + struct extent_buffer *leaf;
> + struct btrfs_root *space_root;
> + u32 extent_count;
> + struct space_run *space_runs = NULL;
> + unsigned int num_space_runs = 0;
> + struct btrfs_key *entries = NULL;
> + unsigned int max_entries, num_entries;
> + int ret;
> +
> + mutex_lock(&bg->free_space_lock);
> +
> + if (test_bit(BLOCK_GROUP_FLAG_NEEDS_FREE_SPACE, &bg->runtime_flags)) {
> + mutex_unlock(&bg->free_space_lock);
> +
> + ret = btrfs_add_block_group_free_space(trans, bg);
> + if (ret)
> + return ret;
> +
> + mutex_lock(&bg->free_space_lock);
> + }
> +
> + fsi = btrfs_search_free_space_info(trans, bg, path, 0);
> + if (IS_ERR(fsi)) {
> + mutex_unlock(&bg->free_space_lock);
> + return PTR_ERR(fsi);
> + }
> +
> + extent_count = btrfs_free_space_extent_count(path->nodes[0], fsi);
> +
> + btrfs_release_path(path);
> +
> + space_runs = kmalloc(sizeof(*space_runs) * extent_count, GFP_NOFS);
> + if (!space_runs) {
> + mutex_unlock(&bg->free_space_lock);
> + return -ENOMEM;
> + }
> +
> + key.objectid = bg->start;
> + key.type = 0;
> + key.offset = 0;
> +
> + space_root = btrfs_free_space_root(bg);
> +
> + ret = btrfs_search_slot(trans, space_root, &key, path, 0, 0);
> + if (ret < 0) {
> + mutex_unlock(&bg->free_space_lock);
> + goto out;
> + }
> +
> + ret = 0;
> +
> + while (true) {
> + leaf = path->nodes[0];
> +
> + btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
> +
> + if (found_key.objectid >= bg->start + bg->length)
> + break;
> +
> + if (found_key.type == BTRFS_FREE_SPACE_EXTENT_KEY) {
> + if (num_space_runs != 0 &&
> + space_runs[num_space_runs - 1].end == found_key.objectid) {
> + space_runs[num_space_runs - 1].end =
> + found_key.objectid + found_key.offset;
> + } else {
> + BUG_ON(num_space_runs >= extent_count);
> +
> + space_runs[num_space_runs].start = found_key.objectid;
> + space_runs[num_space_runs].end =
> + found_key.objectid + found_key.offset;
> +
> + num_space_runs++;
> + }
> + } else if (found_key.type == BTRFS_FREE_SPACE_BITMAP_KEY) {
> + void *bitmap;
> + unsigned long offset;
> + u32 data_size;
> +
> + offset = btrfs_item_ptr_offset(leaf, path->slots[0]);
> + data_size = btrfs_item_size(leaf, path->slots[0]);
> +
> + if (data_size != 0) {
> + bitmap = kmalloc(data_size, GFP_NOFS);
> + if (!bitmap) {
> + mutex_unlock(&bg->free_space_lock);
> + ret = -ENOMEM;
> + goto out;
> + }
> +
> + read_extent_buffer(leaf, bitmap, offset,
> + data_size);
> +
> + parse_bitmap(fs_info->sectorsize, bitmap,
> + data_size * BITS_PER_BYTE,
> + found_key.objectid, space_runs,
> + &num_space_runs);
> +
> + BUG_ON(num_space_runs > extent_count);
> +
> + kfree(bitmap);
> + }
> + }
> +
> + path->slots[0]++;
> +
> + if (path->slots[0] >= btrfs_header_nritems(leaf)) {
> + ret = btrfs_next_leaf(space_root, path);
> + if (ret != 0) {
> + if (ret == 1)
> + ret = 0;
> + break;
> + }
> + leaf = path->nodes[0];
> + }
> + }
> +
> + btrfs_release_path(path);
> +
> + mutex_unlock(&bg->free_space_lock);
> +
> + max_entries = extent_count + 2;
> + entries = kmalloc(sizeof(*entries) * max_entries, GFP_NOFS);
> + if (!entries) {
> + ret = -ENOMEM;
> + goto out;
> + }
> +
> + num_entries = 0;
> +
> + if (num_space_runs > 0 && space_runs[0].start > bg->start) {
> + entries[num_entries].objectid = bg->start;
> + entries[num_entries].type = BTRFS_IDENTITY_REMAP_KEY;
> + entries[num_entries].offset = space_runs[0].start - bg->start;
> + num_entries++;
> + }
> +
> + for (unsigned int i = 1; i < num_space_runs; i++) {
> + entries[num_entries].objectid = space_runs[i - 1].end;
> + entries[num_entries].type = BTRFS_IDENTITY_REMAP_KEY;
> + entries[num_entries].offset =
> + space_runs[i].start - space_runs[i - 1].end;
> + num_entries++;
> + }
> +
> + if (num_space_runs == 0) {
> + entries[num_entries].objectid = bg->start;
> + entries[num_entries].type = BTRFS_IDENTITY_REMAP_KEY;
> + entries[num_entries].offset = bg->length;
> + num_entries++;
> + } else if (space_runs[num_space_runs - 1].end < bg->start + bg->length) {
> + entries[num_entries].objectid = space_runs[num_space_runs - 1].end;
> + entries[num_entries].type = BTRFS_IDENTITY_REMAP_KEY;
> + entries[num_entries].offset =
> + bg->start + bg->length - space_runs[num_space_runs - 1].end;
> + num_entries++;
> + }
> +
> + if (num_entries == 0)
> + goto out;
> +
> + bg->identity_remap_count = num_entries;
> +
> + ret = add_remap_tree_entries(trans, path, entries, num_entries);
We can group the empty and non-empty space_runs cases into an if/else to make
the two main flows obvious and reduce scattered conditions:
	num_entries = 0;
	if (num_space_runs == 0) {
		entries[num_entries].objectid = bg->start;
		entries[num_entries].type = BTRFS_IDENTITY_REMAP_KEY;
		entries[num_entries].offset = bg->length;
		num_entries++;
	} else {
		if (space_runs[0].start > bg->start) {
			entries[num_entries].objectid = bg->start;
			entries[num_entries].type = BTRFS_IDENTITY_REMAP_KEY;
			entries[num_entries].offset = space_runs[0].start - bg->start;
			num_entries++;
		}
		for (unsigned int i = 1; i < num_space_runs; i++) {
			entries[num_entries].objectid = space_runs[i - 1].end;
			entries[num_entries].type = BTRFS_IDENTITY_REMAP_KEY;
			entries[num_entries].offset =
				space_runs[i].start - space_runs[i - 1].end;
			num_entries++;
		}
		if (space_runs[num_space_runs - 1].end < bg->start + bg->length) {
			entries[num_entries].objectid = space_runs[num_space_runs - 1].end;
			entries[num_entries].type = BTRFS_IDENTITY_REMAP_KEY;
			entries[num_entries].offset =
				bg->start + bg->length - space_runs[num_space_runs - 1].end;
			num_entries++;
		}
		if (num_entries == 0)
			goto out;
	}
	// I'm not sure if it's necessary but we can free space_runs earlier
	// since we're also doing allocation in add_remap_tree_entries().
	// kfree(space_runs);
	// space_runs = NULL;
	bg->identity_remap_count = num_entries;
	ret = add_remap_tree_entries(trans, path, entries, num_entries);
> +
> +out:
> + kfree(entries);
> + kfree(space_runs);
> +
> + return ret;
> +}
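As a cross-check, the gap computation can be modelled outside the kernel. The sketch below is only an illustration and not part of the patch: struct run, struct remap and identity_remaps() are hypothetical names, and it assumes the free-space runs are sorted, non-overlapping and already merged, so that n runs produce at most n + 1 identity remaps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Free-space run covering [start, end); hypothetical model type. */
struct run {
	uint64_t start;
	uint64_t end;
};

/* Identity remap covering [start, start + length); hypothetical model type. */
struct remap {
	uint64_t start;
	uint64_t length;
};

/*
 * Fill 'out' with the used ranges of a block group, i.e. the gaps
 * between its free-space runs. 'out' must have room for n + 1 entries
 * (at least one). Returns the number of entries written.
 */
static size_t identity_remaps(uint64_t bg_start, uint64_t bg_length,
			      const struct run *runs, size_t n,
			      struct remap *out)
{
	uint64_t bg_end = bg_start + bg_length;
	size_t num = 0;

	if (n == 0) {
		/* No free space at all: one remap covers the whole group. */
		out[num].start = bg_start;
		out[num].length = bg_length;
		return num + 1;
	}

	/* Head gap before the first free-space run. */
	if (runs[0].start > bg_start) {
		out[num].start = bg_start;
		out[num].length = runs[0].start - bg_start;
		num++;
	}

	/* Gaps between consecutive free-space runs. */
	for (size_t i = 1; i < n; i++) {
		assert(runs[i].start > runs[i - 1].end);
		out[num].start = runs[i - 1].end;
		out[num].length = runs[i].start - runs[i - 1].end;
		num++;
	}

	/* Tail gap after the last free-space run. */
	if (runs[n - 1].end < bg_end) {
		out[num].start = runs[n - 1].end;
		out[num].length = bg_end - runs[n - 1].end;
		num++;
	}

	return num;
}
```

A block group whose only free-space run covers its whole range yields zero entries here, which corresponds to the num_entries == 0 early exit above.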
Regards,
Sun Yangkai
* Re: [PATCH v6 09/16] btrfs: handle deletions from remapped block group
2025-11-14 18:47 ` [PATCH v6 09/16] btrfs: handle deletions from remapped block group Mark Harmstone
@ 2025-11-20 0:17 ` Boris Burkov
2025-11-24 12:40 ` Mark Harmstone
0 siblings, 1 reply; 23+ messages in thread
From: Boris Burkov @ 2025-11-20 0:17 UTC (permalink / raw)
To: Mark Harmstone; +Cc: linux-btrfs
On Fri, Nov 14, 2025 at 06:47:14PM +0000, Mark Harmstone wrote:
> Handle the case where we free an extent from a block group that has the
> REMAPPED flag set. Because the remap tree is orthogonal to the extent
> tree, for data this may be within any number of identity remaps or
> actual remaps. If we're freeing a metadata node, this will be wholly
> inside one or the other.
>
> btrfs_remove_extent_from_remap_tree() searches the remap tree for the
> remaps that cover the range in question, then calls
> remove_range_from_remap_tree() for each one, to punch a hole in the
> remap and adjust the free-space tree.
>
> For an identity remap, remove_range_from_remap_tree() will adjust the
> block group's `identity_remap_count` if this changes. If it reaches
> zero we mark the block group as fully remapped.
>
> When we commit the transaction, fully remapped block groups have their
> chunk stripes removed and their device extents freed, which makes the
> disk space available again to the chunk allocator.
>
> This is done when committing the transaction because it's a quick, rare
> operation which prevents the chunk allocator from ENOSPCing - but see
> later patches which do this asynchronously for the case of async
> discard.
>
This part of the message is out of date.
(thanks for changing it to the cleaner thread, btw)
This looks good to me now (aside from the commit message update)
Reviewed-by: Boris Burkov <boris@bur.io>
> Signed-off-by: Mark Harmstone <mark@harmstone.com>
> ---
> fs/btrfs/block-group.c | 101 ++++++---
> fs/btrfs/block-group.h | 4 +
> fs/btrfs/disk-io.c | 6 +
> fs/btrfs/extent-tree.c | 76 ++++++-
> fs/btrfs/extent-tree.h | 1 +
> fs/btrfs/fs.h | 4 +-
> fs/btrfs/relocation.c | 452 +++++++++++++++++++++++++++++++++++++++++
> fs/btrfs/relocation.h | 5 +
> fs/btrfs/volumes.c | 56 +++--
> fs/btrfs/volumes.h | 6 +
> 10 files changed, 656 insertions(+), 55 deletions(-)
>
> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> index 3ebce7d6aae0..e269518e1bfe 100644
> --- a/fs/btrfs/block-group.c
> +++ b/fs/btrfs/block-group.c
> @@ -1068,6 +1068,32 @@ static int remove_block_group_item(struct btrfs_trans_handle *trans,
> return ret;
> }
>
> +void btrfs_remove_bg_from_sinfo(struct btrfs_block_group *block_group)
> +{
> + int factor = btrfs_bg_type_to_factor(block_group->flags);
> +
> + spin_lock(&block_group->space_info->lock);
> +
> + if (btrfs_test_opt(block_group->fs_info, ENOSPC_DEBUG)) {
> + WARN_ON(block_group->space_info->total_bytes
> + < block_group->length);
> + WARN_ON(block_group->space_info->bytes_readonly
> + < block_group->length - block_group->zone_unusable);
> + WARN_ON(block_group->space_info->bytes_zone_unusable
> + < block_group->zone_unusable);
> + WARN_ON(block_group->space_info->disk_total
> + < block_group->length * factor);
> + }
> + block_group->space_info->total_bytes -= block_group->length;
> + block_group->space_info->bytes_readonly -=
> + (block_group->length - block_group->zone_unusable);
> + btrfs_space_info_update_bytes_zone_unusable(block_group->space_info,
> + -block_group->zone_unusable);
> + block_group->space_info->disk_total -= block_group->length * factor;
> +
> + spin_unlock(&block_group->space_info->lock);
> +}
> +
> int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
> struct btrfs_chunk_map *map)
> {
> @@ -1079,7 +1105,6 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
> struct kobject *kobj = NULL;
> int ret;
> int index;
> - int factor;
> struct btrfs_caching_control *caching_ctl = NULL;
> bool remove_map;
> bool remove_rsv = false;
> @@ -1088,7 +1113,7 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
> if (!block_group)
> return -ENOENT;
>
> - BUG_ON(!block_group->ro);
> + BUG_ON(!block_group->ro && !(block_group->flags & BTRFS_BLOCK_GROUP_REMAPPED));
>
> trace_btrfs_remove_block_group(block_group);
> /*
> @@ -1100,7 +1125,6 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
> block_group->length);
>
> index = btrfs_bg_flags_to_raid_index(block_group->flags);
> - factor = btrfs_bg_type_to_factor(block_group->flags);
>
> /* make sure this block group isn't part of an allocation cluster */
> cluster = &fs_info->data_alloc_cluster;
> @@ -1224,26 +1248,11 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
>
> spin_lock(&block_group->space_info->lock);
> list_del_init(&block_group->ro_list);
> -
> - if (btrfs_test_opt(fs_info, ENOSPC_DEBUG)) {
> - WARN_ON(block_group->space_info->total_bytes
> - < block_group->length);
> - WARN_ON(block_group->space_info->bytes_readonly
> - < block_group->length - block_group->zone_unusable);
> - WARN_ON(block_group->space_info->bytes_zone_unusable
> - < block_group->zone_unusable);
> - WARN_ON(block_group->space_info->disk_total
> - < block_group->length * factor);
> - }
> - block_group->space_info->total_bytes -= block_group->length;
> - block_group->space_info->bytes_readonly -=
> - (block_group->length - block_group->zone_unusable);
> - btrfs_space_info_update_bytes_zone_unusable(block_group->space_info,
> - -block_group->zone_unusable);
> - block_group->space_info->disk_total -= block_group->length * factor;
> -
> spin_unlock(&block_group->space_info->lock);
>
> + if (!(block_group->flags & BTRFS_BLOCK_GROUP_REMAPPED))
> + btrfs_remove_bg_from_sinfo(block_group);
> +
> /*
> * Remove the free space for the block group from the free space tree
> * and the block group's item from the extent tree before marking the
> @@ -1578,8 +1587,10 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
>
> spin_lock(&space_info->lock);
> spin_lock(&block_group->lock);
> - if (btrfs_is_block_group_used(block_group) || block_group->ro ||
> - list_is_singular(&block_group->list)) {
> + if (btrfs_is_block_group_used(block_group) ||
> + (block_group->ro && !(block_group->flags & BTRFS_BLOCK_GROUP_REMAPPED)) ||
> + list_is_singular(&block_group->list) ||
> + block_group->fully_remapped) {
> /*
> * We want to bail if we made new allocations or have
> * outstanding allocations in this block group. We do
> @@ -1620,9 +1631,10 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
> * needing to allocate extents from the block group.
> */
> used = btrfs_space_info_used(space_info, true);
> - if ((space_info->total_bytes - block_group->length < used &&
> - block_group->zone_unusable < block_group->length) ||
> - has_unwritten_metadata(block_group)) {
> + if (((space_info->total_bytes - block_group->length < used &&
> + block_group->zone_unusable < block_group->length) ||
> + has_unwritten_metadata(block_group)) &&
> + !(block_group->flags & BTRFS_BLOCK_GROUP_REMAPPED)) {
> /*
> * Add a reference for the list, compensate for the ref
> * drop under the "next" label for the
> @@ -1787,6 +1799,12 @@ void btrfs_mark_bg_unused(struct btrfs_block_group *bg)
> btrfs_get_block_group(bg);
> trace_btrfs_add_unused_block_group(bg);
> list_add_tail(&bg->bg_list, &fs_info->unused_bgs);
> + } else if (bg->flags & BTRFS_BLOCK_GROUP_REMAPPED &&
> + bg->identity_remap_count == 0) {
> + /*
> + * Leave fully remapped block groups on the
> + * fully_remapped_bgs list.
> + */
> } else if (!test_bit(BLOCK_GROUP_FLAG_NEW, &bg->runtime_flags)) {
> /* Pull out the block group from the reclaim_bgs list. */
> trace_btrfs_add_unused_block_group(bg);
> @@ -4600,6 +4618,14 @@ int btrfs_free_block_groups(struct btrfs_fs_info *info)
> list_del_init(&block_group->bg_list);
> btrfs_put_block_group(block_group);
> }
> +
> + while (!list_empty(&info->fully_remapped_bgs)) {
> + block_group = list_first_entry(&info->fully_remapped_bgs,
> + struct btrfs_block_group,
> + bg_list);
> + list_del_init(&block_group->bg_list);
> + btrfs_put_block_group(block_group);
> + }
> spin_unlock(&info->unused_bgs_lock);
>
> spin_lock(&info->zone_active_bgs_lock);
> @@ -4787,3 +4813,26 @@ bool btrfs_block_group_should_use_size_class(const struct btrfs_block_group *bg)
> return false;
> return true;
> }
> +
> +void btrfs_mark_bg_fully_remapped(struct btrfs_block_group *bg,
> + struct btrfs_trans_handle *trans)
> +{
> + struct btrfs_fs_info *fs_info = trans->fs_info;
> +
> + spin_lock(&fs_info->unused_bgs_lock);
> +
> + /*
> + * The block group might already be on the unused_bgs list, remove it
> + * if it is. It'll get readded after the async discard worker finishes,
> + * or in btrfs_handle_fully_remapped_bgs() if we're not using async
> + * discard.
> + */
> + if (!list_empty(&bg->bg_list))
> + list_del(&bg->bg_list);
> + else
> + btrfs_get_block_group(bg);
> +
> + list_add_tail(&bg->bg_list, &fs_info->fully_remapped_bgs);
> +
> + spin_unlock(&fs_info->unused_bgs_lock);
> +}
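
To make the reference handling in btrfs_mark_bg_fully_remapped() easier to follow, here is a small userspace sketch. It is not kernel code: struct toy_bg, the list helpers and toy_mark_fully_remapped() are hypothetical stand-ins, and the unused_bgs_lock that the real function holds is left out. The idea it tries to show is that a block group already on a list carries that list's reference with it, while one on no list needs a new reference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modelled loosely on list_head. */
struct list_node {
	struct list_node *prev, *next;
};

void list_init(struct list_node *n)
{
	n->prev = n;
	n->next = n;
}

bool list_is_empty(const struct list_node *n)
{
	return n->next == n;
}

/* Unlink n from whatever list it is on and leave it self-linked. */
void list_unlink(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

/* Append n at the tail of the list headed by head. */
void list_append(struct list_node *n, struct list_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Hypothetical stand-in for struct btrfs_block_group. */
struct toy_bg {
	int refs;
	struct list_node bg_list;
};

/* Locking is omitted; the patch does this under unused_bgs_lock. */
void toy_mark_fully_remapped(struct toy_bg *bg, struct list_node *fully_remapped)
{
	if (!list_is_empty(&bg->bg_list))
		list_unlink(&bg->bg_list);	/* reuse the old list's reference */
	else
		bg->refs++;			/* the new list needs its own reference */

	list_append(&bg->bg_list, fully_remapped);
}
```

Under these assumptions the reference count only changes when the block group was not previously queued anywhere, which is why the drain loops can do one put per entry they remove.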
> diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
> index af23fdb3cf4d..d85f3c2546d0 100644
> --- a/fs/btrfs/block-group.h
> +++ b/fs/btrfs/block-group.h
> @@ -282,6 +282,7 @@ struct btrfs_block_group {
> struct extent_buffer *last_eb;
> enum btrfs_block_group_size_class size_class;
> u64 reclaim_mark;
> + bool fully_remapped;
> };
>
> static inline u64 btrfs_block_group_end(const struct btrfs_block_group *block_group)
> @@ -336,6 +337,7 @@ int btrfs_add_new_free_space(struct btrfs_block_group *block_group,
> struct btrfs_trans_handle *btrfs_start_trans_remove_block_group(
> struct btrfs_fs_info *fs_info,
> const u64 chunk_offset);
> +void btrfs_remove_bg_from_sinfo(struct btrfs_block_group *block_group);
> int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
> struct btrfs_chunk_map *map);
> void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info);
> @@ -407,5 +409,7 @@ int btrfs_use_block_group_size_class(struct btrfs_block_group *bg,
> enum btrfs_block_group_size_class size_class,
> bool force_wrong_size_class);
> bool btrfs_block_group_should_use_size_class(const struct btrfs_block_group *bg);
> +void btrfs_mark_bg_fully_remapped(struct btrfs_block_group *bg,
> + struct btrfs_trans_handle *trans);
>
> #endif /* BTRFS_BLOCK_GROUP_H */
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 9809e30fe103..53221a0131fb 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -1526,6 +1526,10 @@ static int cleaner_kthread(void *arg)
> */
> btrfs_run_defrag_inodes(fs_info);
>
> + if (btrfs_fs_incompat(fs_info, REMAP_TREE) &&
> + !btrfs_test_opt(fs_info, DISCARD_ASYNC))
> + btrfs_handle_fully_remapped_bgs(fs_info);
> +
> /*
> * Acquires fs_info->reclaim_bgs_lock to avoid racing
> * with relocation (btrfs_relocate_chunk) and relocation
> @@ -2878,6 +2882,7 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
> INIT_LIST_HEAD(&fs_info->tree_mod_seq_list);
> INIT_LIST_HEAD(&fs_info->unused_bgs);
> INIT_LIST_HEAD(&fs_info->reclaim_bgs);
> + INIT_LIST_HEAD(&fs_info->fully_remapped_bgs);
> INIT_LIST_HEAD(&fs_info->zone_active_bgs);
> #ifdef CONFIG_BTRFS_DEBUG
> INIT_LIST_HEAD(&fs_info->allocated_roots);
> @@ -2933,6 +2938,7 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
> mutex_init(&fs_info->chunk_mutex);
> mutex_init(&fs_info->transaction_kthread_mutex);
> mutex_init(&fs_info->cleaner_mutex);
> + mutex_init(&fs_info->remap_mutex);
> mutex_init(&fs_info->ro_block_group_mutex);
> init_rwsem(&fs_info->commit_root_sem);
> init_rwsem(&fs_info->cleanup_work_sem);
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index a7e522f67cca..b8fed3246e1f 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -41,6 +41,7 @@
> #include "tree-checker.h"
> #include "raid-stripe-tree.h"
> #include "delayed-inode.h"
> +#include "relocation.h"
>
> #undef SCRAMBLE_DELAYED_REFS
>
> @@ -2846,6 +2847,51 @@ static int unpin_extent_range(struct btrfs_fs_info *fs_info,
> return 0;
> }
>
> +void btrfs_handle_fully_remapped_bgs(struct btrfs_fs_info *fs_info)
> +{
> + struct btrfs_block_group *block_group;
> + int ret;
> +
> + spin_lock(&fs_info->unused_bgs_lock);
> + while (!list_empty(&fs_info->fully_remapped_bgs)) {
> + struct btrfs_chunk_map *map;
> +
> + block_group = list_first_entry(&fs_info->fully_remapped_bgs,
> + struct btrfs_block_group,
> + bg_list);
> + list_del_init(&block_group->bg_list);
> + spin_unlock(&fs_info->unused_bgs_lock);
> +
> + map = btrfs_get_chunk_map(fs_info, block_group->start, 1);
> + if (IS_ERR(map)) {
> + btrfs_put_block_group(block_group);
> + return;
> + }
> +
> + ret = btrfs_last_identity_remap_gone(map, block_group);
> + if (ret) {
> + btrfs_free_chunk_map(map);
> + btrfs_put_block_group(block_group);
> + return;
> + }
> +
> + /*
> + * Set num_stripes to 0, so that btrfs_remove_dev_extents()
> + * won't run a second time.
> + */
> + map->num_stripes = 0;
> +
> + btrfs_free_chunk_map(map);
> +
> + if (block_group->used == 0)
> + btrfs_mark_bg_unused(block_group);
> +
> + btrfs_put_block_group(block_group);
> + spin_lock(&fs_info->unused_bgs_lock);
> + }
> + spin_unlock(&fs_info->unused_bgs_lock);
> +}
> +
> int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans)
> {
> struct btrfs_fs_info *fs_info = trans->fs_info;
> @@ -2998,11 +3044,23 @@ u64 btrfs_get_extent_owner_root(struct btrfs_fs_info *fs_info,
> }
>
> static int do_free_extent_accounting(struct btrfs_trans_handle *trans,
> - u64 bytenr, struct btrfs_squota_delta *delta)
> + u64 bytenr, struct btrfs_squota_delta *delta,
> + struct btrfs_path *path)
> {
> int ret;
> + bool remapped = false;
> u64 num_bytes = delta->num_bytes;
>
> + /* Returns 1 if removed from the remap tree, 0 on no-op, < 0 on error. */
> + ret = btrfs_remove_extent_from_remap_tree(trans, path, bytenr,
> + num_bytes);
> + if (ret < 0) {
> + btrfs_abort_transaction(trans, ret);
> + return ret;
> + } else if (ret == 1) {
> + remapped = true;
> + }
> +
> if (delta->is_data) {
> struct btrfs_root *csum_root;
>
> @@ -3026,10 +3084,16 @@ static int do_free_extent_accounting(struct btrfs_trans_handle *trans,
> return ret;
> }
>
> - ret = btrfs_add_to_free_space_tree(trans, bytenr, num_bytes);
> - if (unlikely(ret)) {
> - btrfs_abort_transaction(trans, ret);
> - return ret;
> + /*
> + * If remapped, FST has already been taken care of in
> + * remove_range_from_remap_tree().
> + */
> + if (!remapped) {
> + ret = btrfs_add_to_free_space_tree(trans, bytenr, num_bytes);
> + if (unlikely(ret)) {
> + btrfs_abort_transaction(trans, ret);
> + return ret;
> + }
> }
>
> ret = btrfs_update_block_group(trans, bytenr, num_bytes, false);
> @@ -3395,7 +3459,7 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
> }
> btrfs_release_path(path);
>
> - ret = do_free_extent_accounting(trans, bytenr, &delta);
> + ret = do_free_extent_accounting(trans, bytenr, &delta, path);
> }
> btrfs_release_path(path);
>
> diff --git a/fs/btrfs/extent-tree.h b/fs/btrfs/extent-tree.h
> index e573509c5a71..a15a9497c9f3 100644
> --- a/fs/btrfs/extent-tree.h
> +++ b/fs/btrfs/extent-tree.h
> @@ -163,5 +163,6 @@ void btrfs_error_unpin_extent_range(struct btrfs_fs_info *fs_info, u64 start, u6
> int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr,
> u64 num_bytes, u64 *actual_bytes);
> int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range);
> +void btrfs_handle_fully_remapped_bgs(struct btrfs_fs_info *fs_info);
>
> #endif
> diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
> index 72fde0a3aaaf..9dbb482d8928 100644
> --- a/fs/btrfs/fs.h
> +++ b/fs/btrfs/fs.h
> @@ -577,6 +577,7 @@ struct btrfs_fs_info {
> struct mutex transaction_kthread_mutex;
> struct mutex cleaner_mutex;
> struct mutex chunk_mutex;
> + struct mutex remap_mutex;
>
> /*
> * This is taken to make sure we don't set block groups ro after the
> @@ -830,10 +831,11 @@ struct btrfs_fs_info {
> struct list_head reclaim_bgs;
> int bg_reclaim_threshold;
>
> - /* Protects the lists unused_bgs and reclaim_bgs. */
> + /* Protects the lists unused_bgs, reclaim_bgs, and fully_remapped_bgs. */
> spinlock_t unused_bgs_lock;
> /* Protected by unused_bgs_lock. */
> struct list_head unused_bgs;
> + struct list_head fully_remapped_bgs;
> struct mutex unused_bg_unpin_mutex;
> /* Protect block groups that are going to be deleted */
> struct mutex reclaim_bgs_lock;
> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> index 00e1898edbbe..315f212718ad 100644
> --- a/fs/btrfs/relocation.c
> +++ b/fs/btrfs/relocation.c
> @@ -37,6 +37,7 @@
> #include "super.h"
> #include "tree-checker.h"
> #include "raid-stripe-tree.h"
> +#include "free-space-tree.h"
>
> /*
> * Relocation overview
> @@ -3860,6 +3861,183 @@ static const char *stage_to_string(enum reloc_stage stage)
> return "unknown";
> }
>
> +static void adjust_block_group_remap_bytes(struct btrfs_trans_handle *trans,
> + struct btrfs_block_group *bg,
> + s64 diff)
> +{
> + struct btrfs_fs_info *fs_info = trans->fs_info;
> + bool bg_already_dirty = true, mark_unused = false;
> +
> + spin_lock(&bg->lock);
> +
> + bg->remap_bytes += diff;
> +
> + if (bg->used == 0 && bg->remap_bytes == 0)
> + mark_unused = true;
> +
> + spin_unlock(&bg->lock);
> +
> + if (mark_unused)
> + btrfs_mark_bg_unused(bg);
> +
> + spin_lock(&trans->transaction->dirty_bgs_lock);
> + if (list_empty(&bg->dirty_list)) {
> + list_add_tail(&bg->dirty_list, &trans->transaction->dirty_bgs);
> + bg_already_dirty = false;
> + btrfs_get_block_group(bg);
> + }
> + spin_unlock(&trans->transaction->dirty_bgs_lock);
> +
> + /* Modified block groups are accounted for in the delayed_refs_rsv. */
> + if (!bg_already_dirty)
> + btrfs_inc_delayed_refs_rsv_bg_updates(fs_info);
> +}
> +
> +static int remove_chunk_stripes(struct btrfs_trans_handle *trans,
> + struct btrfs_chunk_map *chunk,
> + struct btrfs_path *path)
> +{
> + struct btrfs_fs_info *fs_info = trans->fs_info;
> + struct btrfs_key key;
> + struct extent_buffer *leaf;
> + struct btrfs_chunk *c;
> + int ret;
> +
> + key.objectid = BTRFS_FIRST_CHUNK_TREE_OBJECTID;
> + key.type = BTRFS_CHUNK_ITEM_KEY;
> + key.offset = chunk->start;
> +
> + btrfs_reserve_chunk_metadata(trans, false);
> +
> + ret = btrfs_search_slot(trans, fs_info->chunk_root, &key, path,
> + 0, 1);
> + if (ret) {
> + if (ret == 1) {
> + btrfs_release_path(path);
> + ret = -ENOENT;
> + }
> + btrfs_trans_release_chunk_metadata(trans);
> + return ret;
> + }
> +
> + leaf = path->nodes[0];
> +
> + c = btrfs_item_ptr(leaf, path->slots[0], struct btrfs_chunk);
> + btrfs_set_chunk_num_stripes(leaf, c, 0);
> + btrfs_set_chunk_sub_stripes(leaf, c, 0);
> +
> + btrfs_truncate_item(trans, path, offsetof(struct btrfs_chunk, stripe),
> + 1);
> +
> + btrfs_mark_buffer_dirty(trans, leaf);
> +
> + btrfs_release_path(path);
> + btrfs_trans_release_chunk_metadata(trans);
> +
> + return 0;
> +}
> +
> +int btrfs_last_identity_remap_gone(struct btrfs_chunk_map *chunk,
> + struct btrfs_block_group *bg)
> +{
> + struct btrfs_fs_info *fs_info = bg->fs_info;
> + struct btrfs_trans_handle *trans;
> + int ret;
> + unsigned int num_items;
> + BTRFS_PATH_AUTO_FREE(path);
> +
> + path = btrfs_alloc_path();
> + if (!path)
> + return -ENOMEM;
> +
> + /*
> + * One item for each entry we're removing in the dev extents tree, and
> + * another for each device. DUP chunks are all on one device,
> + * everything else has one device per stripe.
> + */
> + if (bg->flags & BTRFS_BLOCK_GROUP_DUP)
> + num_items = chunk->num_stripes + 1;
> + else
> + num_items = 2 * chunk->num_stripes;
> +
> + trans = btrfs_start_transaction_fallback_global_rsv(fs_info->tree_root,
> + num_items);
> + if (IS_ERR(trans))
> + return PTR_ERR(trans);
> +
> + ret = btrfs_remove_dev_extents(trans, chunk);
> + if (ret) {
> + btrfs_abort_transaction(trans, ret);
> + return ret;
> + }
> +
> + mutex_lock(&trans->fs_info->chunk_mutex);
> +
> + for (unsigned int i = 0; i < chunk->num_stripes; i++) {
> + ret = btrfs_update_device(trans, chunk->stripes[i].dev);
> + if (ret) {
> + mutex_unlock(&trans->fs_info->chunk_mutex);
> + btrfs_abort_transaction(trans, ret);
> + return ret;
> + }
> + }
> +
> + mutex_unlock(&trans->fs_info->chunk_mutex);
> +
> + write_lock(&trans->fs_info->mapping_tree_lock);
> + btrfs_chunk_map_device_clear_bits(chunk, CHUNK_ALLOCATED);
> + write_unlock(&trans->fs_info->mapping_tree_lock);
> +
> + btrfs_remove_bg_from_sinfo(bg);
> +
> + ret = remove_chunk_stripes(trans, chunk, path);
> + if (ret) {
> + btrfs_abort_transaction(trans, ret);
> + return ret;
> + }
> +
> + ret = btrfs_commit_transaction(trans);
> + if (ret)
> + return ret;
> +
> + return 0;
> +}
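
The reservation arithmetic in btrfs_last_identity_remap_gone() can be restated as a standalone function. This is only a sketch of the comment's reasoning (one item per dev extent removed, plus one per device updated); toy_remap_rsv_items() and its parameters are made-up names, not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Number of tree items to reserve: one per dev extent being removed, plus
 * one per device item being updated. DUP keeps every stripe on a single
 * device; other profiles are assumed to use one device per stripe.
 */
unsigned int toy_remap_rsv_items(bool is_dup, unsigned int num_stripes)
{
	assert(num_stripes > 0);

	if (is_dup)
		return num_stripes + 1;

	return 2 * num_stripes;
}
```

For example, a two-stripe DUP chunk would reserve three items, while a two-stripe RAID1 chunk would reserve four.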
> +
> +static void adjust_identity_remap_count(struct btrfs_trans_handle *trans,
> + struct btrfs_block_group *bg, int delta)
> +{
> + struct btrfs_fs_info *fs_info = trans->fs_info;
> + bool bg_already_dirty = true, mark_fully_remapped = false;
> +
> + WARN_ON(delta < 0 && -delta > bg->identity_remap_count);
> +
> + spin_lock(&bg->lock);
> +
> + bg->identity_remap_count += delta;
> +
> + if (bg->identity_remap_count == 0 && !bg->fully_remapped) {
> + bg->fully_remapped = true;
> + mark_fully_remapped = true;
> + }
> +
> + spin_unlock(&bg->lock);
> +
> + spin_lock(&trans->transaction->dirty_bgs_lock);
> + if (list_empty(&bg->dirty_list)) {
> + list_add_tail(&bg->dirty_list, &trans->transaction->dirty_bgs);
> + bg_already_dirty = false;
> + btrfs_get_block_group(bg);
> + }
> + spin_unlock(&trans->transaction->dirty_bgs_lock);
> +
> + /* Modified block groups are accounted for in the delayed_refs_rsv. */
> + if (!bg_already_dirty)
> + btrfs_inc_delayed_refs_rsv_bg_updates(fs_info);
> +
> + if (mark_fully_remapped)
> + btrfs_mark_bg_fully_remapped(bg, trans);
> +}
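
For reference, the delta passed to adjust_identity_remap_count() comes from identity_count_delta in remove_range_from_remap_tree(): minus one for the identity remap that is deleted, plus one for each leftover piece that is re-inserted. A userspace sketch of that bookkeeping follows; toy_identity_count_delta() is a hypothetical name and it assumes, as the caller in the patch checks, that the hole starts inside the item.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Change in the number of identity remaps when the range
 * [hole_start, hole_start + hole_length) is punched out of the identity
 * remap covering [remap_start, remap_start + remap_length).
 */
int toy_identity_count_delta(uint64_t remap_start, uint64_t remap_length,
			     uint64_t hole_start, uint64_t hole_length)
{
	uint64_t remap_end = remap_start + remap_length;
	uint64_t hole_end = hole_start + hole_length;
	int delta = -1;			/* the original item is deleted */

	assert(hole_start >= remap_start && hole_start < remap_end);

	if (hole_start > remap_start)
		delta++;		/* the front piece is re-added */
	if (hole_end < remap_end)
		delta++;		/* the back piece is re-added */

	return delta;
}
```

So a hole in the middle of an item raises the count by one, trimming either edge leaves it unchanged, and only removing a whole item moves the block group closer to being fully remapped.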
> +
> int btrfs_translate_remap(struct btrfs_fs_info *fs_info, u64 *logical,
> u64 *length)
> {
> @@ -4468,3 +4646,277 @@ u64 btrfs_get_reloc_bg_bytenr(const struct btrfs_fs_info *fs_info)
> logical = fs_info->reloc_ctl->block_group->start;
> return logical;
> }
> +
> +static int insert_remap_item(struct btrfs_trans_handle *trans,
> + struct btrfs_path *path, u64 old_addr, u64 length,
> + u64 new_addr)
> +{
> + int ret;
> + struct btrfs_fs_info *fs_info = trans->fs_info;
> + struct btrfs_key key;
> + struct btrfs_remap remap;
> +
> + if (old_addr == new_addr) {
> + /* Add new identity remap item. */
> +
> + key.objectid = old_addr;
> + key.type = BTRFS_IDENTITY_REMAP_KEY;
> + key.offset = length;
> +
> + ret = btrfs_insert_empty_item(trans, fs_info->remap_root,
> + path, &key, 0);
> + if (ret)
> + return ret;
> + } else {
> + /* Add new remap item. */
> +
> + key.objectid = old_addr;
> + key.type = BTRFS_REMAP_KEY;
> + key.offset = length;
> +
> + ret = btrfs_insert_empty_item(trans, fs_info->remap_root,
> + path, &key,
> + sizeof(struct btrfs_remap));
> + if (ret)
> + return ret;
> +
> + btrfs_set_stack_remap_address(&remap, new_addr);
> +
> + write_extent_buffer(path->nodes[0], &remap,
> + btrfs_item_ptr_offset(path->nodes[0], path->slots[0]),
> + sizeof(struct btrfs_remap));
> +
> + btrfs_release_path(path);
> +
> + /* Add new backref item. */
> +
> + key.objectid = new_addr;
> + key.type = BTRFS_REMAP_BACKREF_KEY;
> + key.offset = length;
> +
> + ret = btrfs_insert_empty_item(trans, fs_info->remap_root,
> + path, &key,
> + sizeof(struct btrfs_remap));
> + if (ret)
> + return ret;
> +
> + btrfs_set_stack_remap_address(&remap, old_addr);
> +
> + write_extent_buffer(path->nodes[0], &remap,
> + btrfs_item_ptr_offset(path->nodes[0], path->slots[0]),
> + sizeof(struct btrfs_remap));
> + }
> +
> + btrfs_release_path(path);
> +
> + return 0;
> +}
> +
> +/*
> + * Punch a hole in the remap item or identity remap item pointed to by path,
> + * for the range [hole_start, hole_start + hole_length).
> + */
> +static int remove_range_from_remap_tree(struct btrfs_trans_handle *trans,
> + struct btrfs_path *path,
> + struct btrfs_block_group *bg,
> + u64 hole_start, u64 hole_length)
> +{
> + int ret;
> + struct btrfs_fs_info *fs_info = trans->fs_info;
> + struct extent_buffer *leaf = path->nodes[0];
> + struct btrfs_key key;
> + u64 hole_end, new_addr, remap_start, remap_length, remap_end,
> + overlap_length;
> + bool is_identity_remap;
> + int identity_count_delta = 0;
> +
> + hole_end = hole_start + hole_length;
> +
> + btrfs_item_key_to_cpu(leaf, &key, path->slots[0]);
> +
> + is_identity_remap = key.type == BTRFS_IDENTITY_REMAP_KEY;
> +
> + remap_start = key.objectid;
> + remap_length = key.offset;
> +
> + remap_end = remap_start + remap_length;
> +
> + if (is_identity_remap) {
> + new_addr = remap_start;
> + } else {
> + struct btrfs_remap *remap_ptr;
> +
> + remap_ptr = btrfs_item_ptr(leaf, path->slots[0],
> + struct btrfs_remap);
> + new_addr = btrfs_remap_address(leaf, remap_ptr);
> + }
> +
> + /* Delete old item. */
> +
> + ret = btrfs_del_item(trans, fs_info->remap_root, path);
> +
> + btrfs_release_path(path);
> +
> + if (ret)
> + return ret;
> +
> + if (is_identity_remap) {
> + identity_count_delta = -1;
> + } else {
> + /* Remove backref. */
> +
> + key.objectid = new_addr;
> + key.type = BTRFS_REMAP_BACKREF_KEY;
> + key.offset = remap_length;
> +
> + ret = btrfs_search_slot(trans, fs_info->remap_root,
> + &key, path, -1, 1);
> + if (ret) {
> + if (ret == 1) {
> + btrfs_release_path(path);
> + ret = -ENOENT;
> + }
> + return ret;
> + }
> +
> + ret = btrfs_del_item(trans, fs_info->remap_root, path);
> +
> + btrfs_release_path(path);
> +
> + if (ret)
> + return ret;
> + }
> +
> + /* If hole_start > remap_start, re-add the start of the remap item. */
> + if (hole_start > remap_start) {
> + ret = insert_remap_item(trans, path, remap_start,
> + hole_start - remap_start, new_addr);
> + if (ret)
> + return ret;
> +
> + if (is_identity_remap)
> + identity_count_delta++;
> + }
> +
> + /* If hole_end < remap_end, re-add the end of the remap item. */
> + if (hole_end < remap_end) {
> + ret = insert_remap_item(trans, path, hole_end,
> + remap_end - hole_end,
> + hole_end - remap_start + new_addr);
> + if (ret)
> + return ret;
> +
> + if (is_identity_remap)
> + identity_count_delta++;
> + }
> +
> + if (identity_count_delta != 0)
> + adjust_identity_remap_count(trans, bg, identity_count_delta);
> +
> + overlap_length = min_t(u64, hole_end, remap_end) -
> + max_t(u64, hole_start, remap_start);
> +
> + if (!is_identity_remap) {
> + struct btrfs_block_group *dest_bg;
> +
> + dest_bg = btrfs_lookup_block_group(fs_info, new_addr);
> +
> + adjust_block_group_remap_bytes(trans, dest_bg, -overlap_length);
> +
> + btrfs_put_block_group(dest_bg);
> +
> + ret = btrfs_add_to_free_space_tree(trans,
> + hole_start - remap_start + new_addr,
> + overlap_length);
> + if (ret)
> + return ret;
> + }
> +
> + ret = overlap_length;
> +
> + return ret;
> +}
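
The address arithmetic in remove_range_from_remap_tree() is easier to check with concrete numbers, so here is a userspace sketch of just the splitting step. The struct and function names (remap_piece, punch_result, toy_punch_hole) are invented for illustration, and all tree and accounting updates are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for one remap item: old range -> new address. */
struct remap_piece {
	bool present;
	uint64_t old_addr;
	uint64_t length;
	uint64_t new_addr;
};

struct punch_result {
	struct remap_piece head;	/* re-added piece before the hole */
	struct remap_piece tail;	/* re-added piece after the hole */
	uint64_t overlap_length;	/* bytes of the hole inside this item */
	uint64_t freed_new_addr;	/* where the freed bytes start at the destination */
};

struct punch_result toy_punch_hole(uint64_t remap_start, uint64_t remap_length,
				   uint64_t new_addr, uint64_t hole_start,
				   uint64_t hole_length)
{
	struct punch_result res = { 0 };
	uint64_t remap_end = remap_start + remap_length;
	uint64_t hole_end = hole_start + hole_length;
	uint64_t overlap_end = hole_end < remap_end ? hole_end : remap_end;

	/* The caller is assumed to have found the item containing hole_start. */
	assert(hole_start >= remap_start && hole_start < remap_end);

	if (hole_start > remap_start) {
		res.head.present = true;
		res.head.old_addr = remap_start;
		res.head.length = hole_start - remap_start;
		res.head.new_addr = new_addr;
	}

	if (hole_end < remap_end) {
		res.tail.present = true;
		res.tail.old_addr = hole_end;
		res.tail.length = remap_end - hole_end;
		res.tail.new_addr = hole_end - remap_start + new_addr;
	}

	res.overlap_length = overlap_end - hole_start;
	res.freed_new_addr = hole_start - remap_start + new_addr;

	return res;
}
```

With an item mapping [1000, 1100) to 5000 and a hole of 30 bytes at 1020, this gives a head of 20 bytes still at 5000, a tail of 50 bytes at 5050, and 30 freed bytes starting at 5020. A hole that runs past the end of the item only reports the part that overlaps, which is what lets the caller loop onto the next item.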
> +
> +/*
> + * Returns 1 if remove_range_from_remap_tree() has been called successfully,
> + * 0 if block group wasn't remapped, and a negative number on error.
> + */
> +int btrfs_remove_extent_from_remap_tree(struct btrfs_trans_handle *trans,
> + struct btrfs_path *path,
> + u64 bytenr, u64 num_bytes)
> +{
> + struct btrfs_fs_info *fs_info = trans->fs_info;
> + struct btrfs_key key, found_key;
> + struct extent_buffer *leaf;
> + struct btrfs_block_group *bg;
> + int ret, length;
> +
> + if (!(btrfs_super_incompat_flags(fs_info->super_copy) &
> + BTRFS_FEATURE_INCOMPAT_REMAP_TREE))
> + return 0;
> +
> + bg = btrfs_lookup_block_group(fs_info, bytenr);
> + if (!bg)
> + return 0;
> +
> + mutex_lock(&fs_info->remap_mutex);
> +
> + if (!(bg->flags & BTRFS_BLOCK_GROUP_REMAPPED)) {
> + mutex_unlock(&fs_info->remap_mutex);
> + btrfs_put_block_group(bg);
> + return 0;
> + }
> +
> + do {
> + key.objectid = bytenr;
> + key.type = (u8)-1;
> + key.offset = (u64)-1;
> +
> + ret = btrfs_search_slot(trans, fs_info->remap_root, &key, path,
> + -1, 1);
> + if (ret < 0)
> + goto end;
> +
> + leaf = path->nodes[0];
> +
> + if (path->slots[0] == 0) {
> + ret = -ENOENT;
> + goto end;
> + }
> +
> + path->slots[0]--;
> +
> + btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
> +
> + if (found_key.type != BTRFS_IDENTITY_REMAP_KEY &&
> + found_key.type != BTRFS_REMAP_KEY) {
> + ret = -ENOENT;
> + goto end;
> + }
> +
> + if (bytenr < found_key.objectid ||
> + bytenr >= found_key.objectid + found_key.offset) {
> + ret = -ENOENT;
> + goto end;
> + }
> +
> + length = remove_range_from_remap_tree(trans, path, bg, bytenr,
> + num_bytes);
> + if (length < 0) {
> + ret = length;
> + goto end;
> + }
> +
> + bytenr += length;
> + num_bytes -= length;
> + } while (num_bytes > 0);
> +
> + ret = 1;
> +
> +end:
> + mutex_unlock(&fs_info->remap_mutex);
> +
> + btrfs_put_block_group(bg);
> + btrfs_release_path(path);
> + return ret;
> +}
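
The do/while loop in btrfs_remove_extent_from_remap_tree() walks an extent that may span several remap items: find the last item starting at or before bytenr, check that it covers bytenr, consume the overlapping part, and repeat. Below is a userspace sketch of that walk over a sorted array. The array stands in for the remap tree, struct toy_remap and both functions are hypothetical, and unlike the patch the sketch does not modify the items it visits.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a remap or identity remap key. */
struct toy_remap {
	uint64_t start;
	uint64_t length;
};

/* Index of the last item whose start is <= addr, or -1 if there is none. */
static int toy_find_prev_item(const struct toy_remap *items, size_t n,
			      uint64_t addr)
{
	int found = -1;

	for (size_t i = 0; i < n && items[i].start <= addr; i++)
		found = (int)i;

	return found;
}

/*
 * Walk [bytenr, bytenr + num_bytes) across the sorted items. Returns the
 * number of items touched, or -ENOENT if part of the range is not covered.
 */
int toy_walk_range(const struct toy_remap *items, size_t n,
		   uint64_t bytenr, uint64_t num_bytes)
{
	int touched = 0;

	while (num_bytes > 0) {
		int i = toy_find_prev_item(items, n, bytenr);
		uint64_t end, step;

		if (i < 0)
			return -ENOENT;

		end = items[i].start + items[i].length;
		if (bytenr >= end)
			return -ENOENT;

		/* Consume only the part of the range inside this item. */
		step = end - bytenr;
		if (step > num_bytes)
			step = num_bytes;

		bytenr += step;
		num_bytes -= step;
		touched++;
	}

	return touched;
}
```

The sketch uses 64-bit lengths throughout so that a large overlap cannot be truncated on the way back to the loop.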
> diff --git a/fs/btrfs/relocation.h b/fs/btrfs/relocation.h
> index b2ba83966650..ffb497f27889 100644
> --- a/fs/btrfs/relocation.h
> +++ b/fs/btrfs/relocation.h
> @@ -33,5 +33,10 @@ bool btrfs_should_ignore_reloc_root(const struct btrfs_root *root);
> u64 btrfs_get_reloc_bg_bytenr(const struct btrfs_fs_info *fs_info);
> int btrfs_translate_remap(struct btrfs_fs_info *fs_info, u64 *logical,
> u64 *length);
> +int btrfs_remove_extent_from_remap_tree(struct btrfs_trans_handle *trans,
> + struct btrfs_path *path,
> + u64 bytenr, u64 num_bytes);
> +int btrfs_last_identity_remap_gone(struct btrfs_chunk_map *chunk,
> + struct btrfs_block_group *bg);
>
> #endif
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 6a72c2a599a6..2347b37113b0 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -2928,8 +2928,8 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
> return ret;
> }
>
> -static noinline int btrfs_update_device(struct btrfs_trans_handle *trans,
> - struct btrfs_device *device)
> +int btrfs_update_device(struct btrfs_trans_handle *trans,
> + struct btrfs_device *device)
> {
> int ret;
> BTRFS_PATH_AUTO_FREE(path);
> @@ -3227,25 +3227,13 @@ static int remove_chunk_item(struct btrfs_trans_handle *trans,
> return btrfs_free_chunk(trans, chunk_offset);
> }
>
> -int btrfs_remove_chunk(struct btrfs_trans_handle *trans, u64 chunk_offset)
> +int btrfs_remove_dev_extents(struct btrfs_trans_handle *trans,
> + struct btrfs_chunk_map *map)
> {
> struct btrfs_fs_info *fs_info = trans->fs_info;
> - struct btrfs_chunk_map *map;
> + struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
> u64 dev_extent_len = 0;
> int i, ret = 0;
> - struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
> -
> - map = btrfs_get_chunk_map(fs_info, chunk_offset, 1);
> - if (IS_ERR(map)) {
> - /*
> - * This is a logic error, but we don't want to just rely on the
> - * user having built with ASSERT enabled, so if ASSERT doesn't
> - * do anything we still error out.
> - */
> - DEBUG_WARN("errr %ld reading chunk map at offset %llu",
> - PTR_ERR(map), chunk_offset);
> - return PTR_ERR(map);
> - }
>
> /*
> * First delete the device extent items from the devices btree.
> @@ -3266,7 +3254,7 @@ int btrfs_remove_chunk(struct btrfs_trans_handle *trans, u64 chunk_offset)
> if (unlikely(ret)) {
> mutex_unlock(&fs_devices->device_list_mutex);
> btrfs_abort_transaction(trans, ret);
> - goto out;
> + return ret;
> }
>
> if (device->bytes_used > 0) {
> @@ -3286,6 +3274,30 @@ int btrfs_remove_chunk(struct btrfs_trans_handle *trans, u64 chunk_offset)
> }
> mutex_unlock(&fs_devices->device_list_mutex);
>
> + return 0;
> +}
> +
> +int btrfs_remove_chunk(struct btrfs_trans_handle *trans, u64 chunk_offset)
> +{
> + struct btrfs_fs_info *fs_info = trans->fs_info;
> + struct btrfs_chunk_map *map;
> + int ret;
> +
> + map = btrfs_get_chunk_map(fs_info, chunk_offset, 1);
> + if (IS_ERR(map)) {
> + /*
> + * This is a logic error, but we don't want to just rely on the
> + * user having built with ASSERT enabled, so if ASSERT doesn't
> + * do anything we still error out.
> + */
> + ASSERT(0);
> + return PTR_ERR(map);
> + }
> +
> + ret = btrfs_remove_dev_extents(trans, map);
> + if (ret)
> + goto out;
> +
> /*
> * We acquire fs_info->chunk_mutex for 2 reasons:
> *
> @@ -5419,7 +5431,7 @@ static void chunk_map_device_set_bits(struct btrfs_chunk_map *map, unsigned int
> }
> }
>
> -static void chunk_map_device_clear_bits(struct btrfs_chunk_map *map, unsigned int bits)
> +void btrfs_chunk_map_device_clear_bits(struct btrfs_chunk_map *map, unsigned int bits)
> {
> for (int i = 0; i < map->num_stripes; i++) {
> struct btrfs_io_stripe *stripe = &map->stripes[i];
> @@ -5436,7 +5448,7 @@ void btrfs_remove_chunk_map(struct btrfs_fs_info *fs_info, struct btrfs_chunk_ma
> write_lock(&fs_info->mapping_tree_lock);
> rb_erase_cached(&map->rb_node, &fs_info->mapping_tree);
> RB_CLEAR_NODE(&map->rb_node);
> - chunk_map_device_clear_bits(map, CHUNK_ALLOCATED);
> + btrfs_chunk_map_device_clear_bits(map, CHUNK_ALLOCATED);
> write_unlock(&fs_info->mapping_tree_lock);
>
> /* Once for the tree reference. */
> @@ -5472,7 +5484,7 @@ int btrfs_add_chunk_map(struct btrfs_fs_info *fs_info, struct btrfs_chunk_map *m
> return -EEXIST;
> }
> chunk_map_device_set_bits(map, CHUNK_ALLOCATED);
> - chunk_map_device_clear_bits(map, CHUNK_TRIMMED);
> + btrfs_chunk_map_device_clear_bits(map, CHUNK_TRIMMED);
> write_unlock(&fs_info->mapping_tree_lock);
>
> return 0;
> @@ -5828,7 +5840,7 @@ void btrfs_mapping_tree_free(struct btrfs_fs_info *fs_info)
> map = rb_entry(node, struct btrfs_chunk_map, rb_node);
> rb_erase_cached(&map->rb_node, &fs_info->mapping_tree);
> RB_CLEAR_NODE(&map->rb_node);
> - chunk_map_device_clear_bits(map, CHUNK_ALLOCATED);
> + btrfs_chunk_map_device_clear_bits(map, CHUNK_ALLOCATED);
> /* Once for the tree ref. */
> btrfs_free_chunk_map(map);
> cond_resched_rwlock_write(&fs_info->mapping_tree_lock);
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index 4117fabb248b..ccf0a459180d 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -794,6 +794,8 @@ u64 btrfs_calc_stripe_length(const struct btrfs_chunk_map *map);
> int btrfs_nr_parity_stripes(u64 type);
> int btrfs_chunk_alloc_add_chunk_item(struct btrfs_trans_handle *trans,
> struct btrfs_block_group *bg);
> +int btrfs_remove_dev_extents(struct btrfs_trans_handle *trans,
> + struct btrfs_chunk_map *map);
> int btrfs_remove_chunk(struct btrfs_trans_handle *trans, u64 chunk_offset);
>
> #ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
> @@ -905,6 +907,10 @@ bool btrfs_repair_one_zone(struct btrfs_fs_info *fs_info, u64 logical);
>
> bool btrfs_pinned_by_swapfile(struct btrfs_fs_info *fs_info, void *ptr);
> const u8 *btrfs_sb_fsid_ptr(const struct btrfs_super_block *sb);
> +int btrfs_update_device(struct btrfs_trans_handle *trans,
> + struct btrfs_device *device);
> +void btrfs_chunk_map_device_clear_bits(struct btrfs_chunk_map *map,
> + unsigned int bits);
>
> #ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
> struct btrfs_io_context *alloc_btrfs_io_context(struct btrfs_fs_info *fs_info,
> --
> 2.51.0
>
* Re: [PATCH v6 15/16] btrfs: handle discarding fully-remapped block groups
2025-11-14 18:47 ` [PATCH v6 15/16] btrfs: handle discarding fully-remapped block groups Mark Harmstone
@ 2025-11-20 0:19 ` Boris Burkov
0 siblings, 0 replies; 23+ messages in thread
From: Boris Burkov @ 2025-11-20 0:19 UTC (permalink / raw)
To: Mark Harmstone; +Cc: linux-btrfs
On Fri, Nov 14, 2025 at 06:47:20PM +0000, Mark Harmstone wrote:
> Discard normally works by iterating over the free-space entries of a
> block group. This doesn't work for fully-remapped block groups, as we
> removed their free-space entries when we started relocation.
>
> For sync discard, call btrfs_discard_extent() when we commit the
> transaction in which the last identity remap was removed.
>
> For async discard, add a new function btrfs_trim_fully_remapped_block_group()
> to be called by the discard worker, which iterates over the block
> group's range using the normal async discard rules. Once we reach the
> end, remove the chunk's stripes and device extents to get back its free
> space.
>
> Signed-off-by: Mark Harmstone <mark@harmstone.com>
> ---
> fs/btrfs/block-group.c | 29 ++++++++--------
> fs/btrfs/block-group.h | 1 +
> fs/btrfs/discard.c | 57 ++++++++++++++++++++++++++++----
> fs/btrfs/extent-tree.c | 3 ++
> fs/btrfs/free-space-cache.c | 66 +++++++++++++++++++++++++++++++++++++
> fs/btrfs/free-space-cache.h | 1 +
> 6 files changed, 137 insertions(+), 20 deletions(-)
>
> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> index 4c4edaf3c753..965ae904ec2e 100644
> --- a/fs/btrfs/block-group.c
> +++ b/fs/btrfs/block-group.c
> @@ -4823,20 +4823,23 @@ void btrfs_mark_bg_fully_remapped(struct btrfs_block_group *bg,
> {
> struct btrfs_fs_info *fs_info = trans->fs_info;
>
> - spin_lock(&fs_info->unused_bgs_lock);
> + if (btrfs_test_opt(fs_info, DISCARD_ASYNC)) {
> + btrfs_discard_queue_work(&fs_info->discard_ctl, bg);
> + } else {
> + spin_lock(&fs_info->unused_bgs_lock);
>
> - /*
> - * The block group might already be on the unused_bgs list, remove it
> - * if it is. It'll get readded after the async discard worker finishes,
> - * or in btrfs_handle_fully_remapped_bgs() if we're not using async
> - * discard.
> - */
> - if (!list_empty(&bg->bg_list))
> - list_del(&bg->bg_list);
> - else
> - btrfs_get_block_group(bg);
> + /*
> + * The block group might already be on the unused_bgs list,
> + * remove it if it is. It'll get readded after
> + * btrfs_handle_fully_remapped_bgs() finishes.
> + */
> + if (!list_empty(&bg->bg_list))
> + list_del(&bg->bg_list);
> + else
> + btrfs_get_block_group(bg);
>
> - list_add_tail(&bg->bg_list, &fs_info->fully_remapped_bgs);
> + list_add_tail(&bg->bg_list, &fs_info->fully_remapped_bgs);
>
> - spin_unlock(&fs_info->unused_bgs_lock);
> + spin_unlock(&fs_info->unused_bgs_lock);
> + }
> }
> diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
> index 4522074a45c2..b0b16efea19a 100644
> --- a/fs/btrfs/block-group.h
> +++ b/fs/btrfs/block-group.h
> @@ -49,6 +49,7 @@ enum btrfs_discard_state {
> BTRFS_DISCARD_EXTENTS,
> BTRFS_DISCARD_BITMAPS,
> BTRFS_DISCARD_RESET_CURSOR,
> + BTRFS_DISCARD_FULLY_REMAPPED,
> };
>
> /*
> diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c
> index ee5f5b2788e1..f9890037395a 100644
> --- a/fs/btrfs/discard.c
> +++ b/fs/btrfs/discard.c
> @@ -215,6 +215,27 @@ static struct btrfs_block_group *find_next_block_group(
> return ret_block_group;
> }
>
> +/*
> + * Returns whether a block group is empty.
> + *
> + * @block_group: block_group of interest
> + *
> + * "Empty" here means that there are no extents physically located within the
> + * device extents corresponding to this block group.
> + *
> + * For a remapped block group, this means that all of its identity remaps have
> + * been removed. For a non-remapped block group, this means that no extents
> + * have an address within its range, and that nothing has been remapped to be
> + * within it.
> + */
> +static bool block_group_is_empty(struct btrfs_block_group *block_group)
> +{
> + if (block_group->flags & BTRFS_BLOCK_GROUP_REMAPPED)
> + return block_group->identity_remap_count == 0;
> + else
> + return block_group->used == 0 && block_group->remap_bytes == 0;
> +}
> +
> /*
> * Look up next block group and set it for use.
> *
> @@ -241,8 +262,10 @@ static struct btrfs_block_group *peek_discard_list(
> block_group = find_next_block_group(discard_ctl, now);
>
> if (block_group && now >= block_group->discard_eligible_time) {
> + bool empty = block_group_is_empty(block_group);
> +
> if (block_group->discard_index == BTRFS_DISCARD_INDEX_UNUSED &&
> - block_group->used != 0) {
> + !empty) {
> if (btrfs_is_block_group_data_only(block_group)) {
> __add_to_discard_list(discard_ctl, block_group);
> /*
> @@ -267,7 +290,15 @@ static struct btrfs_block_group *peek_discard_list(
> }
> if (block_group->discard_state == BTRFS_DISCARD_RESET_CURSOR) {
> block_group->discard_cursor = block_group->start;
> - block_group->discard_state = BTRFS_DISCARD_EXTENTS;
> +
> + if (block_group->flags & BTRFS_BLOCK_GROUP_REMAPPED &&
> + empty) {
> + block_group->discard_state =
> + BTRFS_DISCARD_FULLY_REMAPPED;
> + } else {
> + block_group->discard_state =
> + BTRFS_DISCARD_EXTENTS;
> + }
> }
> }
> if (block_group) {
> @@ -373,7 +404,7 @@ void btrfs_discard_queue_work(struct btrfs_discard_ctl *discard_ctl,
> if (!block_group || !btrfs_test_opt(block_group->fs_info, DISCARD_ASYNC))
> return;
>
> - if (block_group->used == 0 && block_group->remap_bytes == 0)
> + if (block_group_is_empty(block_group))
> add_to_discard_unused_list(discard_ctl, block_group);
> else
> add_to_discard_list(discard_ctl, block_group);
> @@ -470,7 +501,7 @@ static void btrfs_finish_discard_pass(struct btrfs_discard_ctl *discard_ctl,
> {
> remove_from_discard_list(discard_ctl, block_group);
>
> - if (block_group->used == 0) {
> + if (block_group_is_empty(block_group)) {
> if (btrfs_is_free_space_trimmed(block_group))
> btrfs_mark_bg_unused(block_group);
> else
> @@ -524,7 +555,8 @@ static void btrfs_discard_workfn(struct work_struct *work)
> /* Perform discarding */
> minlen = discard_minlen[discard_index];
>
> - if (discard_state == BTRFS_DISCARD_BITMAPS) {
> + switch (discard_state) {
> + case BTRFS_DISCARD_BITMAPS: {
> u64 maxlen = 0;
>
> /*
> @@ -541,17 +573,28 @@ static void btrfs_discard_workfn(struct work_struct *work)
> btrfs_block_group_end(block_group),
> minlen, maxlen, true);
> discard_ctl->discard_bitmap_bytes += trimmed;
> - } else {
> +
> + break;
> + }
> +
> + case BTRFS_DISCARD_FULLY_REMAPPED:
> + btrfs_trim_fully_remapped_block_group(block_group);
> + break;
> +
> + default:
> btrfs_trim_block_group_extents(block_group, &trimmed,
> block_group->discard_cursor,
> btrfs_block_group_end(block_group),
> minlen, true);
> discard_ctl->discard_extent_bytes += trimmed;
> +
> + break;
> }
>
> /* Determine next steps for a block_group */
> if (block_group->discard_cursor >= btrfs_block_group_end(block_group)) {
> - if (discard_state == BTRFS_DISCARD_BITMAPS) {
> + if (discard_state == BTRFS_DISCARD_BITMAPS ||
> + discard_state == BTRFS_DISCARD_FULLY_REMAPPED) {
> btrfs_finish_discard_pass(discard_ctl, block_group);
> } else {
> block_group->discard_cursor = block_group->start;
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 91b4e1b0842c..f64ca57108af 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -2876,6 +2876,9 @@ void btrfs_handle_fully_remapped_bgs(struct btrfs_fs_info *fs_info)
> return;
> }
>
> + btrfs_discard_extent(fs_info, block_group->start,
> + block_group->length, NULL, false);
> +
> /*
> * Set num_stripes to 0, so that btrfs_remove_dev_extents()
> * won't run a second time.
> diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
> index 30507fa8ad80..1b8716b17031 100644
> --- a/fs/btrfs/free-space-cache.c
> +++ b/fs/btrfs/free-space-cache.c
> @@ -29,6 +29,7 @@
> #include "file-item.h"
> #include "file.h"
> #include "super.h"
> +#include "relocation.h"
>
> #define BITS_PER_BITMAP (PAGE_SIZE * 8UL)
> #define MAX_CACHE_BYTES_PER_GIG SZ_64K
> @@ -3066,6 +3067,11 @@ bool btrfs_is_free_space_trimmed(struct btrfs_block_group *block_group)
> struct rb_node *node;
> bool ret = true;
>
> + if (block_group->flags & BTRFS_BLOCK_GROUP_REMAPPED &&
> + block_group->identity_remap_count == 0) {
> + return true;
> + }
> +
> spin_lock(&ctl->tree_lock);
> node = rb_first(&ctl->free_space_offset);
>
> @@ -3834,6 +3840,66 @@ static int trim_no_bitmap(struct btrfs_block_group *block_group,
> return ret;
> }
>
> +void btrfs_trim_fully_remapped_block_group(struct btrfs_block_group *bg)
> +{
> + struct btrfs_fs_info *fs_info = bg->fs_info;
> + struct btrfs_discard_ctl *discard_ctl = &fs_info->discard_ctl;
> + int ret = 0;
> + u64 bytes, trimmed;
> + const u64 max_discard_size = READ_ONCE(discard_ctl->max_discard_size);
> + u64 end = btrfs_block_group_end(bg);
> + struct btrfs_chunk_map *map;
> +
> + bytes = end - bg->discard_cursor;
> +
> + if (max_discard_size &&
> + bytes >= (max_discard_size +
> + BTRFS_ASYNC_DISCARD_MIN_FILTER)) {
> + bytes = max_discard_size;
> + }
> +
> + ret = btrfs_discard_extent(fs_info, bg->discard_cursor, bytes, &trimmed,
> + false);
> + if (ret)
> + return;
> +
> + bg->discard_cursor += trimmed;
> +
> + if (bg->discard_cursor < end)
> + return;
> +
> + map = btrfs_get_chunk_map(fs_info, bg->start, 1);
> + if (IS_ERR(map)) {
> + ret = PTR_ERR(map);
> + return;
> + }
> +
> + ret = btrfs_last_identity_remap_gone(map, bg);
> + if (ret) {
> + btrfs_free_chunk_map(map);
> + return;
> + }
> +
> + /*
> + * Set num_stripes to 0, so that btrfs_remove_dev_extents()
> + * won't run a second time.
> + */
> + map->num_stripes = 0;
> +
> + btrfs_free_chunk_map(map);
> +
> + if (bg->used == 0) {
> + spin_lock(&fs_info->unused_bgs_lock);
> + if (!list_empty(&bg->bg_list)) {
> + list_del_init(&bg->bg_list);
> + btrfs_put_block_group(bg);
> + }
> + spin_unlock(&fs_info->unused_bgs_lock);
> +
> + btrfs_mark_bg_unused(bg);
> + }
This sequence:
  btrfs_get_chunk_map -> btrfs_last_identity_remap_gone -> num_stripes = 0 -> btrfs_free_chunk_map -> btrfs_mark_bg_unused
is the same as in btrfs_handle_fully_remapped_bgs(), and should be shared.
In fact, sharing it would have prevented you from having the bespoke
mark_bg_unused handling here but not there (which I think I did complain
about in v5, but it's hard to keep track over email...)
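
A minimal user-space sketch of the kind of shared helper this suggests,
under stated assumptions: chunk_map_model, bg_model, remove_stripes_model()
and finish_fully_remapped_bg() are hypothetical stand-ins, not names from
the series, and the refcount and list handling are reduced to plain fields.
It only illustrates that both callers could funnel through one tail.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct btrfs_chunk_map. */
struct chunk_map_model {
	int num_stripes;
	int refs;		/* models the chunk map refcount */
};

/* Hypothetical stand-in for struct btrfs_block_group. */
struct bg_model {
	uint64_t used;
	bool marked_unused;	/* models the effect of btrfs_mark_bg_unused() */
	struct chunk_map_model *map;
};

/* Models btrfs_last_identity_remap_gone(); 'fail' forces an error. */
static int remove_stripes_model(struct chunk_map_model *map, bool fail)
{
	(void)map;
	return fail ? -EIO : 0;
}

/*
 * Shared tail for a fully remapped block group: take the chunk map,
 * remove its stripes, zero num_stripes so the removal is not repeated,
 * drop the map, then mark the block group unused if nothing lives in it.
 */
static int finish_fully_remapped_bg(struct bg_model *bg, bool fail)
{
	struct chunk_map_model *map = bg->map;	/* btrfs_get_chunk_map() */
	int ret;

	if (!map)
		return -ENOENT;
	map->refs++;

	ret = remove_stripes_model(map, fail);
	if (ret) {
		map->refs--;			/* btrfs_free_chunk_map() */
		return ret;
	}

	map->num_stripes = 0;
	map->refs--;				/* btrfs_free_chunk_map() */

	if (bg->used == 0)
		bg->marked_unused = true;

	return 0;
}
```

In this model the cleaner path would call finish_fully_remapped_bg() for
each entry taken off the list, and the async discard path would call it
once the discard cursor reaches the end of the block group, so any list
handling needed before marking the group unused lives in one place.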
> +}
> +
> /*
> * If we break out of trimming a bitmap prematurely, we should reset the
> * trimming bit. In a rather contrived case, it's possible to race here so
> diff --git a/fs/btrfs/free-space-cache.h b/fs/btrfs/free-space-cache.h
> index 9f1dbfdee8ca..33fc3b245648 100644
> --- a/fs/btrfs/free-space-cache.h
> +++ b/fs/btrfs/free-space-cache.h
> @@ -166,6 +166,7 @@ int btrfs_trim_block_group_extents(struct btrfs_block_group *block_group,
> int btrfs_trim_block_group_bitmaps(struct btrfs_block_group *block_group,
> u64 *trimmed, u64 start, u64 end, u64 minlen,
> u64 maxlen, bool async);
> +void btrfs_trim_fully_remapped_block_group(struct btrfs_block_group *bg);
>
> bool btrfs_free_space_cache_v1_active(struct btrfs_fs_info *fs_info);
> int btrfs_set_free_space_cache_v1_active(struct btrfs_fs_info *fs_info, bool active);
> --
> 2.51.0
>
* Re: [PATCH v6 12/16] btrfs: replace identity remaps with actual remaps when doing relocations
2025-11-14 18:47 ` [PATCH v6 12/16] btrfs: replace identity remaps with actual remaps when doing relocations Mark Harmstone
@ 2025-11-20 0:21 ` Boris Burkov
0 siblings, 0 replies; 23+ messages in thread
From: Boris Burkov @ 2025-11-20 0:21 UTC (permalink / raw)
To: Mark Harmstone; +Cc: linux-btrfs
On Fri, Nov 14, 2025 at 06:47:17PM +0000, Mark Harmstone wrote:
> Add a function do_remap_reloc(), which does the actual work of
> doing a relocation using the remap tree.
>
> In a loop we call do_remap_reloc_trans(), which searches for the first
> identity remap for the block group. We call btrfs_reserve_extent() to
> find space elsewhere for it, and read the data into memory and write it
> to the new location. We then carve out the identity remap and replace it
> with an actual remap, which points to the new location in which to look.
>
> Once the last identity remap has been removed we call
> last_identity_remap_gone(), which, as with deletions, removes the
> chunk's stripes and device extents.
>
Reviewed-by: Boris Burkov <boris@bur.io>
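
For readers following the carve-out step described above, here is a small
user-space model of the range arithmetic that add_remap_entry() performs
in the patch below. It is a sketch under assumptions: struct range, struct
carve_result and carve_identity_remap() are hypothetical names, no tree
items are touched, and the -EINVAL length check is an extra guard of the
sketch rather than something the patch does.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

struct range {
	uint64_t start;
	uint64_t len;
};

struct carve_result {
	bool has_head;			/* shortened identity remap before the hole */
	struct range head;
	struct range remap;		/* old range that now points at new_addr */
	uint64_t new_addr;
	bool has_tail;			/* new identity remap after the hole */
	struct range tail;
	int identity_count_delta;	/* net change in identity remap items */
};

/*
 * Carve [old_addr, old_addr + length) out of the identity remap 'ident'
 * and describe the items that would exist afterwards.
 */
static int carve_identity_remap(struct range ident, uint64_t old_addr,
				uint64_t length, uint64_t new_addr,
				struct carve_result *out)
{
	uint64_t ident_end = ident.start + ident.len;

	/* Same containment check as the patch: old_addr must be inside. */
	if (old_addr < ident.start || old_addr >= ident_end)
		return -ENOENT;
	/* Extra guard in this sketch only. */
	if (length == 0 || length > ident_end - old_addr)
		return -EINVAL;

	memset(out, 0, sizeof(*out));

	if (old_addr > ident.start) {
		/* Identity item is shortened in place; count is unchanged. */
		out->has_head = true;
		out->head.start = ident.start;
		out->head.len = old_addr - ident.start;
	} else {
		/* Identity item starts at old_addr, so it is deleted. */
		out->identity_count_delta--;
	}

	out->remap.start = old_addr;
	out->remap.len = length;
	out->new_addr = new_addr;

	if (old_addr + length != ident_end) {
		/* Remainder becomes a fresh identity item. */
		out->has_tail = true;
		out->tail.start = old_addr + length;
		out->tail.len = ident_end - old_addr - length;
		out->identity_count_delta++;
	}

	return 0;
}
```

In the relocation loop the carve always begins at the start of the
identity remap that was found, so a short allocation leaves only a tail
and the count is unchanged; the count drops by one only when the whole
identity remap is consumed.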
> Signed-off-by: Mark Harmstone <mark@harmstone.com>
> ---
> fs/btrfs/relocation.c | 336 ++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 336 insertions(+)
>
> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> index a95899af811d..15c4a7c6b1ef 100644
> --- a/fs/btrfs/relocation.c
> +++ b/fs/btrfs/relocation.c
> @@ -4636,6 +4636,61 @@ static int create_remap_tree_entries(struct btrfs_trans_handle *trans,
> return ret;
> }
>
> +static int find_next_identity_remap(struct btrfs_trans_handle *trans,
> + struct btrfs_path *path, u64 bg_end,
> + u64 last_start, u64 *start,
> + u64 *length)
> +{
> + int ret;
> + struct btrfs_key key, found_key;
> + struct btrfs_root *remap_root = trans->fs_info->remap_root;
> + struct extent_buffer *leaf;
> +
> + key.objectid = last_start;
> + key.type = BTRFS_IDENTITY_REMAP_KEY;
> + key.offset = 0;
> +
> + ret = btrfs_search_slot(trans, remap_root, &key, path, 0, 0);
> + if (ret < 0)
> + goto out;
> +
> + leaf = path->nodes[0];
> + while (true) {
> + if (path->slots[0] >= btrfs_header_nritems(leaf)) {
> + ret = btrfs_next_leaf(remap_root, path);
> +
> + if (ret != 0) {
> + if (ret == 1)
> + ret = -ENOENT;
> + goto out;
> + }
> +
> + leaf = path->nodes[0];
> + }
> +
> + btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
> +
> + if (found_key.objectid >= bg_end) {
> + ret = -ENOENT;
> + goto out;
> + }
> +
> + if (found_key.type == BTRFS_IDENTITY_REMAP_KEY) {
> + *start = found_key.objectid;
> + *length = found_key.offset;
> + ret = 0;
> + goto out;
> + }
> +
> + path->slots[0]++;
> + }
> +
> +out:
> + btrfs_release_path(path);
> +
> + return ret;
> +}
> +
> static int remove_chunk_stripes(struct btrfs_trans_handle *trans,
> struct btrfs_chunk_map *chunk,
> struct btrfs_path *path)
> @@ -4781,6 +4836,96 @@ static void adjust_identity_remap_count(struct btrfs_trans_handle *trans,
> btrfs_mark_bg_fully_remapped(bg, trans);
> }
>
> +static int add_remap_entry(struct btrfs_trans_handle *trans,
> + struct btrfs_path *path,
> + struct btrfs_block_group *src_bg, u64 old_addr,
> + u64 new_addr, u64 length)
> +{
> + struct btrfs_fs_info *fs_info = trans->fs_info;
> + struct btrfs_key key, new_key;
> + int ret;
> + int identity_count_delta = 0;
> +
> + key.objectid = old_addr;
> + key.type = (u8)-1;
> + key.offset = (u64)-1;
> +
> + ret = btrfs_search_slot(trans, fs_info->remap_root, &key, path, -1, 1);
> + if (ret < 0)
> + goto end;
> +
> + if (path->slots[0] == 0) {
> + ret = -ENOENT;
> + goto end;
> + }
> +
> + path->slots[0]--;
> +
> + btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
> +
> + if (key.type != BTRFS_IDENTITY_REMAP_KEY ||
> + key.objectid > old_addr ||
> + key.objectid + key.offset <= old_addr) {
> + ret = -ENOENT;
> + goto end;
> + }
> +
> + /* Shorten or delete identity mapping entry. */
> +
> + if (key.objectid == old_addr) {
> + ret = btrfs_del_item(trans, fs_info->remap_root, path);
> + if (ret)
> + goto end;
> +
> + identity_count_delta--;
> + } else {
> + new_key.objectid = key.objectid;
> + new_key.type = BTRFS_IDENTITY_REMAP_KEY;
> + new_key.offset = old_addr - key.objectid;
> +
> + btrfs_set_item_key_safe(trans, path, &new_key);
> + }
> +
> + btrfs_release_path(path);
> +
> + /* Create new remap entry. */
> +
> + ret = add_remap_item(trans, path, new_addr, length, old_addr);
> + if (ret)
> + goto end;
> +
> + /* Add entry for remainder of identity mapping, if necessary. */
> +
> + if (key.objectid + key.offset != old_addr + length) {
> + new_key.objectid = old_addr + length;
> + new_key.type = BTRFS_IDENTITY_REMAP_KEY;
> + new_key.offset = key.objectid + key.offset - old_addr - length;
> +
> + ret = btrfs_insert_empty_item(trans, fs_info->remap_root,
> + path, &new_key, 0);
> + if (ret)
> + goto end;
> +
> + btrfs_release_path(path);
> +
> + identity_count_delta++;
> + }
> +
> + /* Add backref. */
> +
> + ret = add_remap_backref_item(trans, path, new_addr, length, old_addr);
> + if (ret)
> + goto end;
> +
> + if (identity_count_delta != 0)
> + adjust_identity_remap_count(trans, src_bg, identity_count_delta);
> +
> +end:
> + btrfs_release_path(path);
> +
> + return ret;
> +}
> +
> static int mark_chunk_remapped(struct btrfs_trans_handle *trans,
> struct btrfs_path *path, uint64_t start)
> {
> @@ -4830,6 +4975,189 @@ static int mark_chunk_remapped(struct btrfs_trans_handle *trans,
> return ret;
> }
>
> +static int do_remap_reloc_trans(struct btrfs_fs_info *fs_info,
> + struct btrfs_block_group *src_bg,
> + struct btrfs_path *path, u64 *last_start)
> +{
> + struct btrfs_trans_handle *trans;
> + struct btrfs_root *extent_root;
> + struct btrfs_key ins;
> + struct btrfs_block_group *dest_bg = NULL;
> + u64 start, remap_length, length, new_addr, min_size;
> + int ret;
> + bool no_more = false;
> + bool is_data = src_bg->flags & BTRFS_BLOCK_GROUP_DATA;
> + bool made_reservation = false, bg_needs_free_space;
> + struct btrfs_space_info *sinfo = src_bg->space_info;
> +
> + extent_root = btrfs_extent_root(fs_info, src_bg->start);
> +
> + trans = btrfs_start_transaction(extent_root, 0);
> + if (IS_ERR(trans))
> + return PTR_ERR(trans);
> +
> + mutex_lock(&fs_info->remap_mutex);
> +
> + ret = find_next_identity_remap(trans, path, src_bg->start + src_bg->length,
> + *last_start, &start, &remap_length);
> + if (ret == -ENOENT) {
> + no_more = true;
> + goto next;
> + } else if (ret) {
> + mutex_unlock(&fs_info->remap_mutex);
> + btrfs_end_transaction(trans);
> + return ret;
> + }
> +
> + /* Try to reserve enough space for block. */
> +
> + spin_lock(&sinfo->lock);
> + btrfs_space_info_update_bytes_may_use(sinfo, remap_length);
> + spin_unlock(&sinfo->lock);
> +
> + if (is_data)
> + min_size = fs_info->sectorsize;
> + else
> + min_size = fs_info->nodesize;
> +
> + /*
> + * We're using btrfs_reserve_extent() to allocate a contiguous
> + * logical address range, but this will become a remap item rather than
> + * an extent in the extent tree.
> + *
> + * Short allocations are fine: it means that we chop off the beginning
> + * of the identity remap that we're processing, and will tackle the
> + * rest of it the next time round.
> + */
> + ret = btrfs_reserve_extent(fs_info->fs_root, remap_length,
> + remap_length, min_size,
> + 0, 0, &ins, is_data, false);
> + if (ret) {
> + spin_lock(&sinfo->lock);
> + btrfs_space_info_update_bytes_may_use(sinfo, -remap_length);
> + spin_unlock(&sinfo->lock);
> +
> + mutex_unlock(&fs_info->remap_mutex);
> + btrfs_end_transaction(trans);
> + return ret;
> + }
> +
> + made_reservation = true;
> +
> + new_addr = ins.objectid;
> + length = ins.offset;
> +
> + if (!is_data && !IS_ALIGNED(length, fs_info->nodesize)) {
> + u64 new_length = ALIGN_DOWN(length, fs_info->nodesize);
> +
> + btrfs_free_reserved_extent(fs_info, new_addr + new_length,
> + length - new_length, 0);
> +
> + length = new_length;
> + }
> +
> + dest_bg = btrfs_lookup_block_group(fs_info, new_addr);
> +
> + mutex_lock(&dest_bg->free_space_lock);
> + bg_needs_free_space = test_bit(BLOCK_GROUP_FLAG_NEEDS_FREE_SPACE,
> + &dest_bg->runtime_flags);
> + mutex_unlock(&dest_bg->free_space_lock);
> +
> + if (bg_needs_free_space) {
> + ret = btrfs_add_block_group_free_space(trans, dest_bg);
> + if (ret)
> + goto fail;
> + }
> +
> + ret = do_copy(fs_info, start, new_addr, length);
> + if (ret)
> + goto fail;
> +
> + ret = btrfs_remove_from_free_space_tree(trans, new_addr, length);
> + if (ret)
> + goto fail;
> +
> + ret = add_remap_entry(trans, path, src_bg, start, new_addr, length);
> + if (ret) {
> + btrfs_add_to_free_space_tree(trans, new_addr, length);
> + goto fail;
> + }
> +
> + adjust_block_group_remap_bytes(trans, dest_bg, length);
> + btrfs_free_reserved_bytes(dest_bg, length, 0);
> +
> + spin_lock(&sinfo->lock);
> + sinfo->bytes_readonly += length;
> + spin_unlock(&sinfo->lock);
> +
> +next:
> + if (dest_bg)
> + btrfs_put_block_group(dest_bg);
> +
> + if (made_reservation)
> + btrfs_dec_block_group_reservations(fs_info, new_addr);
> +
> + mutex_unlock(&fs_info->remap_mutex);
> +
> + if (src_bg->identity_remap_count == 0) {
> + bool mark_fully_remapped = false;
> +
> + spin_lock(&src_bg->lock);
> +
> + if (!src_bg->fully_remapped) {
> + mark_fully_remapped = true;
> + src_bg->fully_remapped = true;
> + }
> +
> + spin_unlock(&src_bg->lock);
> +
> + if (mark_fully_remapped)
> + btrfs_mark_bg_fully_remapped(src_bg, trans);
> + }
> +
> + ret = btrfs_end_transaction(trans);
> + if (ret)
> + return ret;
> +
> + if (no_more)
> + return 1;
> +
> + *last_start = start;
> +
> + return 0;
> +
> +fail:
> + if (dest_bg)
> + btrfs_put_block_group(dest_bg);
> +
> + btrfs_free_reserved_extent(fs_info, new_addr, length, 0);
> +
> + mutex_unlock(&fs_info->remap_mutex);
> + btrfs_end_transaction(trans);
> +
> + return ret;
> +}
> +
> +static int do_remap_reloc(struct btrfs_fs_info *fs_info,
> + struct btrfs_path *path, struct btrfs_block_group *bg)
> +{
> + u64 last_start;
> + int ret;
> +
> + last_start = bg->start;
> +
> + while (true) {
> + ret = do_remap_reloc_trans(fs_info, bg, path, &last_start);
> + if (ret) {
> + if (ret == 1)
> + ret = 0;
> + break;
> + }
> + }
> +
> + return ret;
> +}
> +
> int btrfs_translate_remap(struct btrfs_fs_info *fs_info, u64 *logical,
> u64 *length)
> {
> @@ -5123,6 +5451,14 @@ int btrfs_relocate_block_group(struct btrfs_fs_info *fs_info, u64 group_start,
> }
>
> ret = start_block_group_remapping(fs_info, path, bg);
> + if (ret)
> + goto out;
> +
> + ret = do_remap_reloc(fs_info, path, rc->block_group);
> + if (ret)
> + goto out;
> +
> + btrfs_delete_unused_bgs(fs_info);
> } else {
> ret = do_nonremap_reloc(fs_info, verbose, rc);
> }
> --
> 2.51.0
>
* Re: [PATCH v6 09/16] btrfs: handle deletions from remapped block group
2025-11-20 0:17 ` Boris Burkov
@ 2025-11-24 12:40 ` Mark Harmstone
0 siblings, 0 replies; 23+ messages in thread
From: Mark Harmstone @ 2025-11-24 12:40 UTC (permalink / raw)
To: Boris Burkov; +Cc: linux-btrfs
On 20/11/2025 12.17 am, Boris Burkov wrote:
> On Fri, Nov 14, 2025 at 06:47:14PM +0000, Mark Harmstone wrote:
>> Handle the case where we free an extent from a block group that has the
>> REMAPPED flag set. Because the remap tree is orthogonal to the extent
>> tree, for data this may be within any number of identity remaps or
>> actual remaps. If we're freeing a metadata node, this will be wholly
>> inside one or the other.
>>
>> btrfs_remove_extent_from_remap_tree() searches the remap tree for the
>> remaps that cover the range in question, then calls
>> remove_range_from_remap_tree() for each one, to punch a hole in the
>> remap and adjust the free-space tree.
>>
>> For an identity remap, remove_range_from_remap_tree() will adjust the
>> block group's `identity_remap_count` if this changes. If it reaches
>> zero we mark the block group as fully remapped.
>>
>> When we commit the transaction, fully remapped block groups have their
>> chunk stripes removed and their device extents freed, which makes the
>> disk space available again to the chunk allocator.
>>
>> This is done when committing the transaction because it's a quick, rare
>> operation which prevents the chunk allocator from ENOSPCing - but see
>> later patches which do this asynchronously for the case of async
>> discard.
>>
>
> This part of the message is out of date.
> (thanks for changing it to the cleaner thread, btw)
>
> This looks good to me now (aside from the commit message update)
> Reviewed-by: Boris Burkov <boris@bur.io>
Oops! Thanks Boris
>
>> Signed-off-by: Mark Harmstone <mark@harmstone.com>
>> ---
>> fs/btrfs/block-group.c | 101 ++++++---
>> fs/btrfs/block-group.h | 4 +
>> fs/btrfs/disk-io.c | 6 +
>> fs/btrfs/extent-tree.c | 76 ++++++-
>> fs/btrfs/extent-tree.h | 1 +
>> fs/btrfs/fs.h | 4 +-
>> fs/btrfs/relocation.c | 452 +++++++++++++++++++++++++++++++++++++++++
>> fs/btrfs/relocation.h | 5 +
>> fs/btrfs/volumes.c | 56 +++--
>> fs/btrfs/volumes.h | 6 +
>> 10 files changed, 656 insertions(+), 55 deletions(-)
>>
>> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
>> index 3ebce7d6aae0..e269518e1bfe 100644
>> --- a/fs/btrfs/block-group.c
>> +++ b/fs/btrfs/block-group.c
>> @@ -1068,6 +1068,32 @@ static int remove_block_group_item(struct btrfs_trans_handle *trans,
>> return ret;
>> }
>>
>> +void btrfs_remove_bg_from_sinfo(struct btrfs_block_group *block_group)
>> +{
>> + int factor = btrfs_bg_type_to_factor(block_group->flags);
>> +
>> + spin_lock(&block_group->space_info->lock);
>> +
>> + if (btrfs_test_opt(block_group->fs_info, ENOSPC_DEBUG)) {
>> + WARN_ON(block_group->space_info->total_bytes
>> + < block_group->length);
>> + WARN_ON(block_group->space_info->bytes_readonly
>> + < block_group->length - block_group->zone_unusable);
>> + WARN_ON(block_group->space_info->bytes_zone_unusable
>> + < block_group->zone_unusable);
>> + WARN_ON(block_group->space_info->disk_total
>> + < block_group->length * factor);
>> + }
>> + block_group->space_info->total_bytes -= block_group->length;
>> + block_group->space_info->bytes_readonly -=
>> + (block_group->length - block_group->zone_unusable);
>> + btrfs_space_info_update_bytes_zone_unusable(block_group->space_info,
>> + -block_group->zone_unusable);
>> + block_group->space_info->disk_total -= block_group->length * factor;
>> +
>> + spin_unlock(&block_group->space_info->lock);
>> +}
>> +
>> int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
>> struct btrfs_chunk_map *map)
>> {
>> @@ -1079,7 +1105,6 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
>> struct kobject *kobj = NULL;
>> int ret;
>> int index;
>> - int factor;
>> struct btrfs_caching_control *caching_ctl = NULL;
>> bool remove_map;
>> bool remove_rsv = false;
>> @@ -1088,7 +1113,7 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
>> if (!block_group)
>> return -ENOENT;
>>
>> - BUG_ON(!block_group->ro);
>> + BUG_ON(!block_group->ro && !(block_group->flags & BTRFS_BLOCK_GROUP_REMAPPED));
>>
>> trace_btrfs_remove_block_group(block_group);
>> /*
>> @@ -1100,7 +1125,6 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
>> block_group->length);
>>
>> index = btrfs_bg_flags_to_raid_index(block_group->flags);
>> - factor = btrfs_bg_type_to_factor(block_group->flags);
>>
>> /* make sure this block group isn't part of an allocation cluster */
>> cluster = &fs_info->data_alloc_cluster;
>> @@ -1224,26 +1248,11 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
>>
>> spin_lock(&block_group->space_info->lock);
>> list_del_init(&block_group->ro_list);
>> -
>> - if (btrfs_test_opt(fs_info, ENOSPC_DEBUG)) {
>> - WARN_ON(block_group->space_info->total_bytes
>> - < block_group->length);
>> - WARN_ON(block_group->space_info->bytes_readonly
>> - < block_group->length - block_group->zone_unusable);
>> - WARN_ON(block_group->space_info->bytes_zone_unusable
>> - < block_group->zone_unusable);
>> - WARN_ON(block_group->space_info->disk_total
>> - < block_group->length * factor);
>> - }
>> - block_group->space_info->total_bytes -= block_group->length;
>> - block_group->space_info->bytes_readonly -=
>> - (block_group->length - block_group->zone_unusable);
>> - btrfs_space_info_update_bytes_zone_unusable(block_group->space_info,
>> - -block_group->zone_unusable);
>> - block_group->space_info->disk_total -= block_group->length * factor;
>> -
>> spin_unlock(&block_group->space_info->lock);
>>
>> + if (!(block_group->flags & BTRFS_BLOCK_GROUP_REMAPPED))
>> + btrfs_remove_bg_from_sinfo(block_group);
>> +
>> /*
>> * Remove the free space for the block group from the free space tree
>> * and the block group's item from the extent tree before marking the
>> @@ -1578,8 +1587,10 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
>>
>> spin_lock(&space_info->lock);
>> spin_lock(&block_group->lock);
>> - if (btrfs_is_block_group_used(block_group) || block_group->ro ||
>> - list_is_singular(&block_group->list)) {
>> + if (btrfs_is_block_group_used(block_group) ||
>> + (block_group->ro && !(block_group->flags & BTRFS_BLOCK_GROUP_REMAPPED)) ||
>> + list_is_singular(&block_group->list) ||
>> + block_group->fully_remapped) {
>> /*
>> * We want to bail if we made new allocations or have
>> * outstanding allocations in this block group. We do
>> @@ -1620,9 +1631,10 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
>> * needing to allocate extents from the block group.
>> */
>> used = btrfs_space_info_used(space_info, true);
>> - if ((space_info->total_bytes - block_group->length < used &&
>> - block_group->zone_unusable < block_group->length) ||
>> - has_unwritten_metadata(block_group)) {
>> + if (((space_info->total_bytes - block_group->length < used &&
>> + block_group->zone_unusable < block_group->length) ||
>> + has_unwritten_metadata(block_group)) &&
>> + !(block_group->flags & BTRFS_BLOCK_GROUP_REMAPPED)) {
>> /*
>> * Add a reference for the list, compensate for the ref
>> * drop under the "next" label for the
>> @@ -1787,6 +1799,12 @@ void btrfs_mark_bg_unused(struct btrfs_block_group *bg)
>> btrfs_get_block_group(bg);
>> trace_btrfs_add_unused_block_group(bg);
>> list_add_tail(&bg->bg_list, &fs_info->unused_bgs);
>> + } else if (bg->flags & BTRFS_BLOCK_GROUP_REMAPPED &&
>> + bg->identity_remap_count == 0) {
>> + /*
>> + * Leave fully remapped block groups on the
>> + * fully_remapped_bgs list.
>> + */
>> } else if (!test_bit(BLOCK_GROUP_FLAG_NEW, &bg->runtime_flags)) {
>> /* Pull out the block group from the reclaim_bgs list. */
>> trace_btrfs_add_unused_block_group(bg);
>> @@ -4600,6 +4618,14 @@ int btrfs_free_block_groups(struct btrfs_fs_info *info)
>> list_del_init(&block_group->bg_list);
>> btrfs_put_block_group(block_group);
>> }
>> +
>> + while (!list_empty(&info->fully_remapped_bgs)) {
>> + block_group = list_first_entry(&info->fully_remapped_bgs,
>> + struct btrfs_block_group,
>> + bg_list);
>> + list_del_init(&block_group->bg_list);
>> + btrfs_put_block_group(block_group);
>> + }
>> spin_unlock(&info->unused_bgs_lock);
>>
>> spin_lock(&info->zone_active_bgs_lock);
>> @@ -4787,3 +4813,26 @@ bool btrfs_block_group_should_use_size_class(const struct btrfs_block_group *bg)
>> return false;
>> return true;
>> }
>> +
>> +void btrfs_mark_bg_fully_remapped(struct btrfs_block_group *bg,
>> + struct btrfs_trans_handle *trans)
>> +{
>> + struct btrfs_fs_info *fs_info = trans->fs_info;
>> +
>> + spin_lock(&fs_info->unused_bgs_lock);
>> +
>> + /*
>> + * The block group might already be on the unused_bgs list, remove it
>> + * if it is. It'll get readded after the async discard worker finishes,
>> + * or in btrfs_handle_fully_remapped_bgs() if we're not using async
>> + * discard.
>> + */
>> + if (!list_empty(&bg->bg_list))
>> + list_del(&bg->bg_list);
>> + else
>> + btrfs_get_block_group(bg);
>> +
>> + list_add_tail(&bg->bg_list, &fs_info->fully_remapped_bgs);
>> +
>> + spin_unlock(&fs_info->unused_bgs_lock);
>> +}
>> diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
>> index af23fdb3cf4d..d85f3c2546d0 100644
>> --- a/fs/btrfs/block-group.h
>> +++ b/fs/btrfs/block-group.h
>> @@ -282,6 +282,7 @@ struct btrfs_block_group {
>> struct extent_buffer *last_eb;
>> enum btrfs_block_group_size_class size_class;
>> u64 reclaim_mark;
>> + bool fully_remapped;
>> };
>>
>> static inline u64 btrfs_block_group_end(const struct btrfs_block_group *block_group)
>> @@ -336,6 +337,7 @@ int btrfs_add_new_free_space(struct btrfs_block_group *block_group,
>> struct btrfs_trans_handle *btrfs_start_trans_remove_block_group(
>> struct btrfs_fs_info *fs_info,
>> const u64 chunk_offset);
>> +void btrfs_remove_bg_from_sinfo(struct btrfs_block_group *block_group);
>> int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
>> struct btrfs_chunk_map *map);
>> void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info);
>> @@ -407,5 +409,7 @@ int btrfs_use_block_group_size_class(struct btrfs_block_group *bg,
>> enum btrfs_block_group_size_class size_class,
>> bool force_wrong_size_class);
>> bool btrfs_block_group_should_use_size_class(const struct btrfs_block_group *bg);
>> +void btrfs_mark_bg_fully_remapped(struct btrfs_block_group *bg,
>> + struct btrfs_trans_handle *trans);
>>
>> #endif /* BTRFS_BLOCK_GROUP_H */
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index 9809e30fe103..53221a0131fb 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -1526,6 +1526,10 @@ static int cleaner_kthread(void *arg)
>> */
>> btrfs_run_defrag_inodes(fs_info);
>>
>> + if (btrfs_fs_incompat(fs_info, REMAP_TREE) &&
>> + !btrfs_test_opt(fs_info, DISCARD_ASYNC))
>> + btrfs_handle_fully_remapped_bgs(fs_info);
>> +
>> /*
>> * Acquires fs_info->reclaim_bgs_lock to avoid racing
>> * with relocation (btrfs_relocate_chunk) and relocation
>> @@ -2878,6 +2882,7 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
>> INIT_LIST_HEAD(&fs_info->tree_mod_seq_list);
>> INIT_LIST_HEAD(&fs_info->unused_bgs);
>> INIT_LIST_HEAD(&fs_info->reclaim_bgs);
>> + INIT_LIST_HEAD(&fs_info->fully_remapped_bgs);
>> INIT_LIST_HEAD(&fs_info->zone_active_bgs);
>> #ifdef CONFIG_BTRFS_DEBUG
>> INIT_LIST_HEAD(&fs_info->allocated_roots);
>> @@ -2933,6 +2938,7 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
>> mutex_init(&fs_info->chunk_mutex);
>> mutex_init(&fs_info->transaction_kthread_mutex);
>> mutex_init(&fs_info->cleaner_mutex);
>> + mutex_init(&fs_info->remap_mutex);
>> mutex_init(&fs_info->ro_block_group_mutex);
>> init_rwsem(&fs_info->commit_root_sem);
>> init_rwsem(&fs_info->cleanup_work_sem);
>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>> index a7e522f67cca..b8fed3246e1f 100644
>> --- a/fs/btrfs/extent-tree.c
>> +++ b/fs/btrfs/extent-tree.c
>> @@ -41,6 +41,7 @@
>> #include "tree-checker.h"
>> #include "raid-stripe-tree.h"
>> #include "delayed-inode.h"
>> +#include "relocation.h"
>>
>> #undef SCRAMBLE_DELAYED_REFS
>>
>> @@ -2846,6 +2847,51 @@ static int unpin_extent_range(struct btrfs_fs_info *fs_info,
>> return 0;
>> }
>>
>> +void btrfs_handle_fully_remapped_bgs(struct btrfs_fs_info *fs_info)
>> +{
>> + struct btrfs_block_group *block_group;
>> + int ret;
>> +
>> + spin_lock(&fs_info->unused_bgs_lock);
>> + while (!list_empty(&fs_info->fully_remapped_bgs)) {
>> + struct btrfs_chunk_map *map;
>> +
>> + block_group = list_first_entry(&fs_info->fully_remapped_bgs,
>> + struct btrfs_block_group,
>> + bg_list);
>> + list_del_init(&block_group->bg_list);
>> + spin_unlock(&fs_info->unused_bgs_lock);
>> +
>> + map = btrfs_get_chunk_map(fs_info, block_group->start, 1);
>> + if (IS_ERR(map)) {
>> + btrfs_put_block_group(block_group);
>> + return;
>> + }
>> +
>> + ret = btrfs_last_identity_remap_gone(map, block_group);
>> + if (ret) {
>> + btrfs_free_chunk_map(map);
>> + btrfs_put_block_group(block_group);
>> + return;
>> + }
>> +
>> + /*
>> + * Set num_stripes to 0, so that btrfs_remove_dev_extents()
>> + * won't run a second time.
>> + */
>> + map->num_stripes = 0;
>> +
>> + btrfs_free_chunk_map(map);
>> +
>> + if (block_group->used == 0)
>> + btrfs_mark_bg_unused(block_group);
>> +
>> + btrfs_put_block_group(block_group);
>> + spin_lock(&fs_info->unused_bgs_lock);
>> + }
>> + spin_unlock(&fs_info->unused_bgs_lock);
>> +}
>> +
>> int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans)
>> {
>> struct btrfs_fs_info *fs_info = trans->fs_info;
>> @@ -2998,11 +3044,23 @@ u64 btrfs_get_extent_owner_root(struct btrfs_fs_info *fs_info,
>> }
>>
>> static int do_free_extent_accounting(struct btrfs_trans_handle *trans,
>> - u64 bytenr, struct btrfs_squota_delta *delta)
>> + u64 bytenr, struct btrfs_squota_delta *delta,
>> + struct btrfs_path *path)
>> {
>> int ret;
>> + bool remapped = false;
>> u64 num_bytes = delta->num_bytes;
>>
>> + /* returns 1 on success and 0 on no-op */
>> + ret = btrfs_remove_extent_from_remap_tree(trans, path, bytenr,
>> + num_bytes);
>> + if (ret < 0) {
>> + btrfs_abort_transaction(trans, ret);
>> + return ret;
>> + } else if (ret == 1) {
>> + remapped = true;
>> + }
>> +
>> if (delta->is_data) {
>> struct btrfs_root *csum_root;
>>
>> @@ -3026,10 +3084,16 @@ static int do_free_extent_accounting(struct btrfs_trans_handle *trans,
>> return ret;
>> }
>>
>> - ret = btrfs_add_to_free_space_tree(trans, bytenr, num_bytes);
>> - if (unlikely(ret)) {
>> - btrfs_abort_transaction(trans, ret);
>> - return ret;
>> + /*
>> + * If remapped, FST has already been taken care of in
>> + * remove_range_from_remap_tree().
>> + */
>> + if (!remapped) {
>> + ret = btrfs_add_to_free_space_tree(trans, bytenr, num_bytes);
>> + if (unlikely(ret)) {
>> + btrfs_abort_transaction(trans, ret);
>> + return ret;
>> + }
>> }
>>
>> ret = btrfs_update_block_group(trans, bytenr, num_bytes, false);
>> @@ -3395,7 +3459,7 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
>> }
>> btrfs_release_path(path);
>>
>> - ret = do_free_extent_accounting(trans, bytenr, &delta);
>> + ret = do_free_extent_accounting(trans, bytenr, &delta, path);
>> }
>> btrfs_release_path(path);
>>
>> diff --git a/fs/btrfs/extent-tree.h b/fs/btrfs/extent-tree.h
>> index e573509c5a71..a15a9497c9f3 100644
>> --- a/fs/btrfs/extent-tree.h
>> +++ b/fs/btrfs/extent-tree.h
>> @@ -163,5 +163,6 @@ void btrfs_error_unpin_extent_range(struct btrfs_fs_info *fs_info, u64 start, u6
>> int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr,
>> u64 num_bytes, u64 *actual_bytes);
>> int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range);
>> +void btrfs_handle_fully_remapped_bgs(struct btrfs_fs_info *fs_info);
>>
>> #endif
>> diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
>> index 72fde0a3aaaf..9dbb482d8928 100644
>> --- a/fs/btrfs/fs.h
>> +++ b/fs/btrfs/fs.h
>> @@ -577,6 +577,7 @@ struct btrfs_fs_info {
>> struct mutex transaction_kthread_mutex;
>> struct mutex cleaner_mutex;
>> struct mutex chunk_mutex;
>> + struct mutex remap_mutex;
>>
>> /*
>> * This is taken to make sure we don't set block groups ro after the
>> @@ -830,10 +831,11 @@ struct btrfs_fs_info {
>> struct list_head reclaim_bgs;
>> int bg_reclaim_threshold;
>>
>> - /* Protects the lists unused_bgs and reclaim_bgs. */
>> + /* Protects the lists unused_bgs, reclaim_bgs, and fully_remapped_bgs. */
>> spinlock_t unused_bgs_lock;
>> /* Protected by unused_bgs_lock. */
>> struct list_head unused_bgs;
>> + struct list_head fully_remapped_bgs;
>> struct mutex unused_bg_unpin_mutex;
>> /* Protect block groups that are going to be deleted */
>> struct mutex reclaim_bgs_lock;
>> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
>> index 00e1898edbbe..315f212718ad 100644
>> --- a/fs/btrfs/relocation.c
>> +++ b/fs/btrfs/relocation.c
>> @@ -37,6 +37,7 @@
>> #include "super.h"
>> #include "tree-checker.h"
>> #include "raid-stripe-tree.h"
>> +#include "free-space-tree.h"
>>
>> /*
>> * Relocation overview
>> @@ -3860,6 +3861,183 @@ static const char *stage_to_string(enum reloc_stage stage)
>> return "unknown";
>> }
>>
>> +static void adjust_block_group_remap_bytes(struct btrfs_trans_handle *trans,
>> + struct btrfs_block_group *bg,
>> + s64 diff)
>> +{
>> + struct btrfs_fs_info *fs_info = trans->fs_info;
>> + bool bg_already_dirty = true, mark_unused = false;
>> +
>> + spin_lock(&bg->lock);
>> +
>> + bg->remap_bytes += diff;
>> +
>> + if (bg->used == 0 && bg->remap_bytes == 0)
>> + mark_unused = true;
>> +
>> + spin_unlock(&bg->lock);
>> +
>> + if (mark_unused)
>> + btrfs_mark_bg_unused(bg);
>> +
>> + spin_lock(&trans->transaction->dirty_bgs_lock);
>> + if (list_empty(&bg->dirty_list)) {
>> + list_add_tail(&bg->dirty_list, &trans->transaction->dirty_bgs);
>> + bg_already_dirty = false;
>> + btrfs_get_block_group(bg);
>> + }
>> + spin_unlock(&trans->transaction->dirty_bgs_lock);
>> +
>> + /* Modified block groups are accounted for in the delayed_refs_rsv. */
>> + if (!bg_already_dirty)
>> + btrfs_inc_delayed_refs_rsv_bg_updates(fs_info);
>> +}
>> +
>> +static int remove_chunk_stripes(struct btrfs_trans_handle *trans,
>> + struct btrfs_chunk_map *chunk,
>> + struct btrfs_path *path)
>> +{
>> + struct btrfs_fs_info *fs_info = trans->fs_info;
>> + struct btrfs_key key;
>> + struct extent_buffer *leaf;
>> + struct btrfs_chunk *c;
>> + int ret;
>> +
>> + key.objectid = BTRFS_FIRST_CHUNK_TREE_OBJECTID;
>> + key.type = BTRFS_CHUNK_ITEM_KEY;
>> + key.offset = chunk->start;
>> +
>> + btrfs_reserve_chunk_metadata(trans, false);
>> +
>> + ret = btrfs_search_slot(trans, fs_info->chunk_root, &key, path,
>> + 0, 1);
>> + if (ret) {
>> + if (ret == 1) {
>> + btrfs_release_path(path);
>> + ret = -ENOENT;
>> + }
>> + btrfs_trans_release_chunk_metadata(trans);
>> + return ret;
>> + }
>> +
>> + leaf = path->nodes[0];
>> +
>> + c = btrfs_item_ptr(leaf, path->slots[0], struct btrfs_chunk);
>> + btrfs_set_chunk_num_stripes(leaf, c, 0);
>> + btrfs_set_chunk_sub_stripes(leaf, c, 0);
>> +
>> + btrfs_truncate_item(trans, path, offsetof(struct btrfs_chunk, stripe),
>> + 1);
>> +
>> + btrfs_mark_buffer_dirty(trans, leaf);
>> +
>> + btrfs_release_path(path);
>> + btrfs_trans_release_chunk_metadata(trans);
>> +
>> + return 0;
>> +}
>> +
>> +int btrfs_last_identity_remap_gone(struct btrfs_chunk_map *chunk,
>> + struct btrfs_block_group *bg)
>> +{
>> + struct btrfs_fs_info *fs_info = bg->fs_info;
>> + struct btrfs_trans_handle *trans;
>> + int ret;
>> + unsigned int num_items;
>> + BTRFS_PATH_AUTO_FREE(path);
>> +
>> + path = btrfs_alloc_path();
>> + if (!path)
>> + return -ENOMEM;
>> +
>> + /*
>> + * One item for each entry we're removing in the dev extents tree, and
>> + * another for each device. DUP chunks are all on one device,
>> + * everything else has one device per stripe.
>> + */
>> + if (bg->flags & BTRFS_BLOCK_GROUP_DUP)
>> + num_items = chunk->num_stripes + 1;
>> + else
>> + num_items = 2 * chunk->num_stripes;
>> +
>> + trans = btrfs_start_transaction_fallback_global_rsv(fs_info->tree_root,
>> + num_items);
>> + if (IS_ERR(trans))
>> + return PTR_ERR(trans);
>> +
>> + ret = btrfs_remove_dev_extents(trans, chunk);
>> + if (ret) {
>> + btrfs_abort_transaction(trans, ret);
>> + return ret;
>> + }
>> +
>> + mutex_lock(&trans->fs_info->chunk_mutex);
>> +
>> + for (unsigned int i = 0; i < chunk->num_stripes; i++) {
>> + ret = btrfs_update_device(trans, chunk->stripes[i].dev);
>> + if (ret) {
>> + mutex_unlock(&trans->fs_info->chunk_mutex);
>> + btrfs_abort_transaction(trans, ret);
>> + return ret;
>> + }
>> + }
>> +
>> + mutex_unlock(&trans->fs_info->chunk_mutex);
>> +
>> + write_lock(&trans->fs_info->mapping_tree_lock);
>> + btrfs_chunk_map_device_clear_bits(chunk, CHUNK_ALLOCATED);
>> + write_unlock(&trans->fs_info->mapping_tree_lock);
>> +
>> + btrfs_remove_bg_from_sinfo(bg);
>> +
>> + ret = remove_chunk_stripes(trans, chunk, path);
>> + if (ret) {
>> + btrfs_abort_transaction(trans, ret);
>> + return ret;
>> + }
>> +
>> + ret = btrfs_commit_transaction(trans);
>> + if (ret)
>> + return ret;
>> +
>> + return 0;
>> +}
>> +
>> +static void adjust_identity_remap_count(struct btrfs_trans_handle *trans,
>> + struct btrfs_block_group *bg, int delta)
>> +{
>> + struct btrfs_fs_info *fs_info = trans->fs_info;
>> + bool bg_already_dirty = true, mark_fully_remapped = false;
>> +
>> + WARN_ON(delta < 0 && -delta > bg->identity_remap_count);
>> +
>> + spin_lock(&bg->lock);
>> +
>> + bg->identity_remap_count += delta;
>> +
>> + if (bg->identity_remap_count == 0 && !bg->fully_remapped) {
>> + bg->fully_remapped = true;
>> + mark_fully_remapped = true;
>> + }
>> +
>> + spin_unlock(&bg->lock);
>> +
>> + spin_lock(&trans->transaction->dirty_bgs_lock);
>> + if (list_empty(&bg->dirty_list)) {
>> + list_add_tail(&bg->dirty_list, &trans->transaction->dirty_bgs);
>> + bg_already_dirty = false;
>> + btrfs_get_block_group(bg);
>> + }
>> + spin_unlock(&trans->transaction->dirty_bgs_lock);
>> +
>> + /* Modified block groups are accounted for in the delayed_refs_rsv. */
>> + if (!bg_already_dirty)
>> + btrfs_inc_delayed_refs_rsv_bg_updates(fs_info);
>> +
>> + if (mark_fully_remapped)
>> + btrfs_mark_bg_fully_remapped(bg, trans);
>> +}
>> +
>> int btrfs_translate_remap(struct btrfs_fs_info *fs_info, u64 *logical,
>> u64 *length)
>> {
>> @@ -4468,3 +4646,277 @@ u64 btrfs_get_reloc_bg_bytenr(const struct btrfs_fs_info *fs_info)
>> logical = fs_info->reloc_ctl->block_group->start;
>> return logical;
>> }
>> +
>> +static int insert_remap_item(struct btrfs_trans_handle *trans,
>> + struct btrfs_path *path, u64 old_addr, u64 length,
>> + u64 new_addr)
>> +{
>> + int ret;
>> + struct btrfs_fs_info *fs_info = trans->fs_info;
>> + struct btrfs_key key;
>> + struct btrfs_remap remap;
>> +
>> + if (old_addr == new_addr) {
>> + /* Add new identity remap item. */
>> +
>> + key.objectid = old_addr;
>> + key.type = BTRFS_IDENTITY_REMAP_KEY;
>> + key.offset = length;
>> +
>> + ret = btrfs_insert_empty_item(trans, fs_info->remap_root,
>> + path, &key, 0);
>> + if (ret)
>> + return ret;
>> + } else {
>> + /* Add new remap item. */
>> +
>> + key.objectid = old_addr;
>> + key.type = BTRFS_REMAP_KEY;
>> + key.offset = length;
>> +
>> + ret = btrfs_insert_empty_item(trans, fs_info->remap_root,
>> + path, &key,
>> + sizeof(struct btrfs_remap));
>> + if (ret)
>> + return ret;
>> +
>> + btrfs_set_stack_remap_address(&remap, new_addr);
>> +
>> + write_extent_buffer(path->nodes[0], &remap,
>> + btrfs_item_ptr_offset(path->nodes[0], path->slots[0]),
>> + sizeof(struct btrfs_remap));
>> +
>> + btrfs_release_path(path);
>> +
>> + /* Add new backref item. */
>> +
>> + key.objectid = new_addr;
>> + key.type = BTRFS_REMAP_BACKREF_KEY;
>> + key.offset = length;
>> +
>> + ret = btrfs_insert_empty_item(trans, fs_info->remap_root,
>> + path, &key,
>> + sizeof(struct btrfs_remap));
>> + if (ret)
>> + return ret;
>> +
>> + btrfs_set_stack_remap_address(&remap, old_addr);
>> +
>> + write_extent_buffer(path->nodes[0], &remap,
>> + btrfs_item_ptr_offset(path->nodes[0], path->slots[0]),
>> + sizeof(struct btrfs_remap));
>> + }
>> +
>> + btrfs_release_path(path);
>> +
>> + return 0;
>> +}
>> +
>> +/*
>> + * Punch a hole in the remap item or identity remap item pointed to by path,
>> + * for the range [hole_start, hole_start + hole_length).
>> + */
>> +static int remove_range_from_remap_tree(struct btrfs_trans_handle *trans,
>> + struct btrfs_path *path,
>> + struct btrfs_block_group *bg,
>> + u64 hole_start, u64 hole_length)
>> +{
>> + int ret;
>> + struct btrfs_fs_info *fs_info = trans->fs_info;
>> + struct extent_buffer *leaf = path->nodes[0];
>> + struct btrfs_key key;
>> + u64 hole_end, new_addr, remap_start, remap_length, remap_end,
>> + overlap_length;
>> + bool is_identity_remap;
>> + int identity_count_delta = 0;
>> +
>> + hole_end = hole_start + hole_length;
>> +
>> + btrfs_item_key_to_cpu(leaf, &key, path->slots[0]);
>> +
>> + is_identity_remap = key.type == BTRFS_IDENTITY_REMAP_KEY;
>> +
>> + remap_start = key.objectid;
>> + remap_length = key.offset;
>> +
>> + remap_end = remap_start + remap_length;
>> +
>> + if (is_identity_remap) {
>> + new_addr = remap_start;
>> + } else {
>> + struct btrfs_remap *remap_ptr;
>> +
>> + remap_ptr = btrfs_item_ptr(leaf, path->slots[0],
>> + struct btrfs_remap);
>> + new_addr = btrfs_remap_address(leaf, remap_ptr);
>> + }
>> +
>> + /* Delete old item. */
>> +
>> + ret = btrfs_del_item(trans, fs_info->remap_root, path);
>> +
>> + btrfs_release_path(path);
>> +
>> + if (ret)
>> + return ret;
>> +
>> + if (is_identity_remap) {
>> + identity_count_delta = -1;
>> + } else {
>> + /* Remove backref. */
>> +
>> + key.objectid = new_addr;
>> + key.type = BTRFS_REMAP_BACKREF_KEY;
>> + key.offset = remap_length;
>> +
>> + ret = btrfs_search_slot(trans, fs_info->remap_root,
>> + &key, path, -1, 1);
>> + if (ret) {
>> + if (ret == 1) {
>> + btrfs_release_path(path);
>> + ret = -ENOENT;
>> + }
>> + return ret;
>> + }
>> +
>> + ret = btrfs_del_item(trans, fs_info->remap_root, path);
>> +
>> + btrfs_release_path(path);
>> +
>> + if (ret)
>> + return ret;
>> + }
>> +
>> + /* If hole_start > remap_start, re-add the start of the remap item. */
>> + if (hole_start > remap_start) {
>> + ret = insert_remap_item(trans, path, remap_start,
>> + hole_start - remap_start, new_addr);
>> + if (ret)
>> + return ret;
>> +
>> + if (is_identity_remap)
>> + identity_count_delta++;
>> + }
>> +
>> + /* If hole_end < remap_end, re-add the end of the remap item. */
>> + if (hole_end < remap_end) {
>> + ret = insert_remap_item(trans, path, hole_end,
>> + remap_end - hole_end,
>> + hole_end - remap_start + new_addr);
>> + if (ret)
>> + return ret;
>> +
>> + if (is_identity_remap)
>> + identity_count_delta++;
>> + }
>> +
>> + if (identity_count_delta != 0)
>> + adjust_identity_remap_count(trans, bg, identity_count_delta);
>> +
>> + overlap_length = min_t(u64, hole_end, remap_end) -
>> + max_t(u64, hole_start, remap_start);
>> +
>> + if (!is_identity_remap) {
>> + struct btrfs_block_group *dest_bg;
>> +
>> + dest_bg = btrfs_lookup_block_group(fs_info, new_addr);
>> +
>> + adjust_block_group_remap_bytes(trans, dest_bg, -overlap_length);
>> +
>> + btrfs_put_block_group(dest_bg);
>> +
>> + ret = btrfs_add_to_free_space_tree(trans,
>> + hole_start - remap_start + new_addr,
>> + overlap_length);
>> + if (ret)
>> + return ret;
>> + }
>> +
>> + ret = overlap_length;
>> +
>> + return ret;
>> +}
>> +
>> +/*
>> + * Returns 1 if remove_range_from_remap_tree() has been called successfully,
>> + * 0 if block group wasn't remapped, and a negative number on error.
>> + */
>> +int btrfs_remove_extent_from_remap_tree(struct btrfs_trans_handle *trans,
>> + struct btrfs_path *path,
>> + u64 bytenr, u64 num_bytes)
>> +{
>> + struct btrfs_fs_info *fs_info = trans->fs_info;
>> + struct btrfs_key key, found_key;
>> + struct extent_buffer *leaf;
>> + struct btrfs_block_group *bg;
>> + int ret, length;
>> +
>> + if (!(btrfs_super_incompat_flags(fs_info->super_copy) &
>> + BTRFS_FEATURE_INCOMPAT_REMAP_TREE))
>> + return 0;
>> +
>> + bg = btrfs_lookup_block_group(fs_info, bytenr);
>> + if (!bg)
>> + return 0;
>> +
>> + mutex_lock(&fs_info->remap_mutex);
>> +
>> + if (!(bg->flags & BTRFS_BLOCK_GROUP_REMAPPED)) {
>> + mutex_unlock(&fs_info->remap_mutex);
>> + btrfs_put_block_group(bg);
>> + return 0;
>> + }
>> +
>> + do {
>> + key.objectid = bytenr;
>> + key.type = (u8)-1;
>> + key.offset = (u64)-1;
>> +
>> + ret = btrfs_search_slot(trans, fs_info->remap_root, &key, path,
>> + -1, 1);
>> + if (ret < 0)
>> + goto end;
>> +
>> + leaf = path->nodes[0];
>> +
>> + if (path->slots[0] == 0) {
>> + ret = -ENOENT;
>> + goto end;
>> + }
>> +
>> + path->slots[0]--;
>> +
>> + btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
>> +
>> + if (found_key.type != BTRFS_IDENTITY_REMAP_KEY &&
>> + found_key.type != BTRFS_REMAP_KEY) {
>> + ret = -ENOENT;
>> + goto end;
>> + }
>> +
>> + if (bytenr < found_key.objectid ||
>> + bytenr >= found_key.objectid + found_key.offset) {
>> + ret = -ENOENT;
>> + goto end;
>> + }
>> +
>> + length = remove_range_from_remap_tree(trans, path, bg, bytenr,
>> + num_bytes);
>> + if (length < 0) {
>> + ret = length;
>> + goto end;
>> + }
>> +
>> + bytenr += length;
>> + num_bytes -= length;
>> + } while (num_bytes > 0);
>> +
>> + ret = 1;
>> +
>> +end:
>> + mutex_unlock(&fs_info->remap_mutex);
>> +
>> + btrfs_put_block_group(bg);
>> + btrfs_release_path(path);
>> + return ret;
>> +}
>> diff --git a/fs/btrfs/relocation.h b/fs/btrfs/relocation.h
>> index b2ba83966650..ffb497f27889 100644
>> --- a/fs/btrfs/relocation.h
>> +++ b/fs/btrfs/relocation.h
>> @@ -33,5 +33,10 @@ bool btrfs_should_ignore_reloc_root(const struct btrfs_root *root);
>> u64 btrfs_get_reloc_bg_bytenr(const struct btrfs_fs_info *fs_info);
>> int btrfs_translate_remap(struct btrfs_fs_info *fs_info, u64 *logical,
>> u64 *length);
>> +int btrfs_remove_extent_from_remap_tree(struct btrfs_trans_handle *trans,
>> + struct btrfs_path *path,
>> + u64 bytenr, u64 num_bytes);
>> +int btrfs_last_identity_remap_gone(struct btrfs_chunk_map *chunk,
>> + struct btrfs_block_group *bg);
>>
>> #endif
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index 6a72c2a599a6..2347b37113b0 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -2928,8 +2928,8 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
>> return ret;
>> }
>>
>> -static noinline int btrfs_update_device(struct btrfs_trans_handle *trans,
>> - struct btrfs_device *device)
>> +int btrfs_update_device(struct btrfs_trans_handle *trans,
>> + struct btrfs_device *device)
>> {
>> int ret;
>> BTRFS_PATH_AUTO_FREE(path);
>> @@ -3227,25 +3227,13 @@ static int remove_chunk_item(struct btrfs_trans_handle *trans,
>> return btrfs_free_chunk(trans, chunk_offset);
>> }
>>
>> -int btrfs_remove_chunk(struct btrfs_trans_handle *trans, u64 chunk_offset)
>> +int btrfs_remove_dev_extents(struct btrfs_trans_handle *trans,
>> + struct btrfs_chunk_map *map)
>> {
>> struct btrfs_fs_info *fs_info = trans->fs_info;
>> - struct btrfs_chunk_map *map;
>> + struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
>> u64 dev_extent_len = 0;
>> int i, ret = 0;
>> - struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
>> -
>> - map = btrfs_get_chunk_map(fs_info, chunk_offset, 1);
>> - if (IS_ERR(map)) {
>> - /*
>> - * This is a logic error, but we don't want to just rely on the
>> - * user having built with ASSERT enabled, so if ASSERT doesn't
>> - * do anything we still error out.
>> - */
>> - DEBUG_WARN("errr %ld reading chunk map at offset %llu",
>> - PTR_ERR(map), chunk_offset);
>> - return PTR_ERR(map);
>> - }
>>
>> /*
>> * First delete the device extent items from the devices btree.
>> @@ -3266,7 +3254,7 @@ int btrfs_remove_chunk(struct btrfs_trans_handle *trans, u64 chunk_offset)
>> if (unlikely(ret)) {
>> mutex_unlock(&fs_devices->device_list_mutex);
>> btrfs_abort_transaction(trans, ret);
>> - goto out;
>> + return ret;
>> }
>>
>> if (device->bytes_used > 0) {
>> @@ -3286,6 +3274,30 @@ int btrfs_remove_chunk(struct btrfs_trans_handle *trans, u64 chunk_offset)
>> }
>> mutex_unlock(&fs_devices->device_list_mutex);
>>
>> + return 0;
>> +}
>> +
>> +int btrfs_remove_chunk(struct btrfs_trans_handle *trans, u64 chunk_offset)
>> +{
>> + struct btrfs_fs_info *fs_info = trans->fs_info;
>> + struct btrfs_chunk_map *map;
>> + int ret;
>> +
>> + map = btrfs_get_chunk_map(fs_info, chunk_offset, 1);
>> + if (IS_ERR(map)) {
>> + /*
>> + * This is a logic error, but we don't want to just rely on the
>> + * user having built with ASSERT enabled, so if ASSERT doesn't
>> + * do anything we still error out.
>> + */
>> + ASSERT(0);
>> + return PTR_ERR(map);
>> + }
>> +
>> + ret = btrfs_remove_dev_extents(trans, map);
>> + if (ret)
>> + goto out;
>> +
>> /*
>> * We acquire fs_info->chunk_mutex for 2 reasons:
>> *
>> @@ -5419,7 +5431,7 @@ static void chunk_map_device_set_bits(struct btrfs_chunk_map *map, unsigned int
>> }
>> }
>>
>> -static void chunk_map_device_clear_bits(struct btrfs_chunk_map *map, unsigned int bits)
>> +void btrfs_chunk_map_device_clear_bits(struct btrfs_chunk_map *map, unsigned int bits)
>> {
>> for (int i = 0; i < map->num_stripes; i++) {
>> struct btrfs_io_stripe *stripe = &map->stripes[i];
>> @@ -5436,7 +5448,7 @@ void btrfs_remove_chunk_map(struct btrfs_fs_info *fs_info, struct btrfs_chunk_ma
>> write_lock(&fs_info->mapping_tree_lock);
>> rb_erase_cached(&map->rb_node, &fs_info->mapping_tree);
>> RB_CLEAR_NODE(&map->rb_node);
>> - chunk_map_device_clear_bits(map, CHUNK_ALLOCATED);
>> + btrfs_chunk_map_device_clear_bits(map, CHUNK_ALLOCATED);
>> write_unlock(&fs_info->mapping_tree_lock);
>>
>> /* Once for the tree reference. */
>> @@ -5472,7 +5484,7 @@ int btrfs_add_chunk_map(struct btrfs_fs_info *fs_info, struct btrfs_chunk_map *m
>> return -EEXIST;
>> }
>> chunk_map_device_set_bits(map, CHUNK_ALLOCATED);
>> - chunk_map_device_clear_bits(map, CHUNK_TRIMMED);
>> + btrfs_chunk_map_device_clear_bits(map, CHUNK_TRIMMED);
>> write_unlock(&fs_info->mapping_tree_lock);
>>
>> return 0;
>> @@ -5828,7 +5840,7 @@ void btrfs_mapping_tree_free(struct btrfs_fs_info *fs_info)
>> map = rb_entry(node, struct btrfs_chunk_map, rb_node);
>> rb_erase_cached(&map->rb_node, &fs_info->mapping_tree);
>> RB_CLEAR_NODE(&map->rb_node);
>> - chunk_map_device_clear_bits(map, CHUNK_ALLOCATED);
>> + btrfs_chunk_map_device_clear_bits(map, CHUNK_ALLOCATED);
>> /* Once for the tree ref. */
>> btrfs_free_chunk_map(map);
>> cond_resched_rwlock_write(&fs_info->mapping_tree_lock);
>> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
>> index 4117fabb248b..ccf0a459180d 100644
>> --- a/fs/btrfs/volumes.h
>> +++ b/fs/btrfs/volumes.h
>> @@ -794,6 +794,8 @@ u64 btrfs_calc_stripe_length(const struct btrfs_chunk_map *map);
>> int btrfs_nr_parity_stripes(u64 type);
>> int btrfs_chunk_alloc_add_chunk_item(struct btrfs_trans_handle *trans,
>> struct btrfs_block_group *bg);
>> +int btrfs_remove_dev_extents(struct btrfs_trans_handle *trans,
>> + struct btrfs_chunk_map *map);
>> int btrfs_remove_chunk(struct btrfs_trans_handle *trans, u64 chunk_offset);
>>
>> #ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
>> @@ -905,6 +907,10 @@ bool btrfs_repair_one_zone(struct btrfs_fs_info *fs_info, u64 logical);
>>
>> bool btrfs_pinned_by_swapfile(struct btrfs_fs_info *fs_info, void *ptr);
>> const u8 *btrfs_sb_fsid_ptr(const struct btrfs_super_block *sb);
>> +int btrfs_update_device(struct btrfs_trans_handle *trans,
>> + struct btrfs_device *device);
>> +void btrfs_chunk_map_device_clear_bits(struct btrfs_chunk_map *map,
>> + unsigned int bits);
>>
>> #ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
>> struct btrfs_io_context *alloc_btrfs_io_context(struct btrfs_fs_info *fs_info,
>> --
>> 2.51.0
>>
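For readers following the arithmetic in remove_range_from_remap_tree() above, here is a minimal userspace sketch of the hole-punching step. It is not kernel code: struct remap_piece and punch_hole() are hypothetical names invented for illustration, and tree insertion, backrefs and block group accounting are left out. It only mirrors how the quoted patch derives the surviving head and tail pieces and the overlap length. For an identity remap (new_addr equal to start) the tail's new address works out to hole_end, so the tail is again an identity remap.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for one remap item: [start, start + length) -> new_addr. */
struct remap_piece {
	uint64_t start;
	uint64_t length;
	uint64_t new_addr;
};

/*
 * Punch [hole_start, hole_start + hole_length) out of *item.
 * Surviving pieces are written to out[] (room for two); the return value
 * is how many were written. *overlap receives the number of bytes of the
 * item that the hole actually covered.
 */
static int punch_hole(const struct remap_piece *item, uint64_t hole_start,
		      uint64_t hole_length, struct remap_piece out[2],
		      uint64_t *overlap)
{
	uint64_t remap_end = item->start + item->length;
	uint64_t hole_end = hole_start + hole_length;
	uint64_t lo, hi;
	int n = 0;

	/* Assumption: the caller found an item that contains hole_start. */
	assert(hole_start >= item->start && hole_start < remap_end);

	/* Re-add the start of the item if the hole begins inside it. */
	if (hole_start > item->start) {
		out[n].start = item->start;
		out[n].length = hole_start - item->start;
		out[n].new_addr = item->new_addr;
		n++;
	}

	/* Re-add the end of the item, shifting new_addr by the same offset. */
	if (hole_end < remap_end) {
		out[n].start = hole_end;
		out[n].length = remap_end - hole_end;
		out[n].new_addr = hole_end - item->start + item->new_addr;
		n++;
	}

	/* The hole may run past the item; only the overlap is consumed. */
	lo = hole_start > item->start ? hole_start : item->start;
	hi = hole_end < remap_end ? hole_end : remap_end;
	*overlap = hi - lo;

	return n;
}
```

When the hole extends beyond the item, the returned overlap is shorter than hole_length, which is why the caller in the patch loops, advancing bytenr by the returned length until num_bytes reaches zero.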
* Re: [PATCH v6 10/16] btrfs: handle setting up relocation of block group with remap-tree
2025-11-15 14:52 ` Sun Yangkai
@ 2025-11-24 18:01 ` Mark Harmstone
0 siblings, 0 replies; 23+ messages in thread
From: Mark Harmstone @ 2025-11-24 18:01 UTC (permalink / raw)
To: Sun Yangkai; +Cc: boris, linux-btrfs
Thank you Sun, I agree that that's better. I'll roll it into the patch
for version 7.
On 15/11/2025 2.52 pm, Sun Yangkai wrote:
> While reading the thread, I noticed the logic that builds the identity_remap
> entries was a bit hard to follow.
> I took the liberty of rewriting the function so that the two high-level cases
> are immediately visible inside a single if/else. The result has no behavioral
> change, and (at least to me) makes it obvious where the head/tail gaps are handled.
> The modified code is shown below; feel free to pick it up if you find it useful.
> Please let me know if I missed anything.
>
>>
>> +static int create_remap_tree_entries(struct btrfs_trans_handle *trans,
>> + struct btrfs_path *path,
>> + struct btrfs_block_group *bg)
>> +{
>> + struct btrfs_fs_info *fs_info = trans->fs_info;
>> + struct btrfs_free_space_info *fsi;
>> + struct btrfs_key key, found_key;
>> + struct extent_buffer *leaf;
>> + struct btrfs_root *space_root;
>> + u32 extent_count;
>> + struct space_run *space_runs = NULL;
>> + unsigned int num_space_runs = 0;
>> + struct btrfs_key *entries = NULL;
>> + unsigned int max_entries, num_entries;
>> + int ret;
>> +
>> + mutex_lock(&bg->free_space_lock);
>> +
>> + if (test_bit(BLOCK_GROUP_FLAG_NEEDS_FREE_SPACE, &bg->runtime_flags)) {
>> + mutex_unlock(&bg->free_space_lock);
>> +
>> + ret = btrfs_add_block_group_free_space(trans, bg);
>> + if (ret)
>> + return ret;
>> +
>> + mutex_lock(&bg->free_space_lock);
>> + }
>> +
>> + fsi = btrfs_search_free_space_info(trans, bg, path, 0);
>> + if (IS_ERR(fsi)) {
>> + mutex_unlock(&bg->free_space_lock);
>> + return PTR_ERR(fsi);
>> + }
>> +
>> + extent_count = btrfs_free_space_extent_count(path->nodes[0], fsi);
>> +
>> + btrfs_release_path(path);
>> +
>> + space_runs = kmalloc(sizeof(*space_runs) * extent_count, GFP_NOFS);
>> + if (!space_runs) {
>> + mutex_unlock(&bg->free_space_lock);
>> + return -ENOMEM;
>> + }
>> +
>> + key.objectid = bg->start;
>> + key.type = 0;
>> + key.offset = 0;
>> +
>> + space_root = btrfs_free_space_root(bg);
>> +
>> + ret = btrfs_search_slot(trans, space_root, &key, path, 0, 0);
>> + if (ret < 0) {
>> + mutex_unlock(&bg->free_space_lock);
>> + goto out;
>> + }
>> +
>> + ret = 0;
>> +
>> + while (true) {
>> + leaf = path->nodes[0];
>> +
>> + btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
>> +
>> + if (found_key.objectid >= bg->start + bg->length)
>> + break;
>> +
>> + if (found_key.type == BTRFS_FREE_SPACE_EXTENT_KEY) {
>> + if (num_space_runs != 0 &&
>> + space_runs[num_space_runs - 1].end == found_key.objectid) {
>> + space_runs[num_space_runs - 1].end =
>> + found_key.objectid + found_key.offset;
>> + } else {
>> + BUG_ON(num_space_runs >= extent_count);
>> +
>> + space_runs[num_space_runs].start = found_key.objectid;
>> + space_runs[num_space_runs].end =
>> + found_key.objectid + found_key.offset;
>> +
>> + num_space_runs++;
>> + }
>> + } else if (found_key.type == BTRFS_FREE_SPACE_BITMAP_KEY) {
>> + void *bitmap;
>> + unsigned long offset;
>> + u32 data_size;
>> +
>> + offset = btrfs_item_ptr_offset(leaf, path->slots[0]);
>> + data_size = btrfs_item_size(leaf, path->slots[0]);
>> +
>> + if (data_size != 0) {
>> + bitmap = kmalloc(data_size, GFP_NOFS);
>> + if (!bitmap) {
>> + mutex_unlock(&bg->free_space_lock);
>> + ret = -ENOMEM;
>> + goto out;
>> + }
>> +
>> + read_extent_buffer(leaf, bitmap, offset,
>> + data_size);
>> +
>> + parse_bitmap(fs_info->sectorsize, bitmap,
>> + data_size * BITS_PER_BYTE,
>> + found_key.objectid, space_runs,
>> + &num_space_runs);
>> +
>> + BUG_ON(num_space_runs > extent_count);
>> +
>> + kfree(bitmap);
>> + }
>> + }
>> +
>> + path->slots[0]++;
>> +
>> + if (path->slots[0] >= btrfs_header_nritems(leaf)) {
>> + ret = btrfs_next_leaf(space_root, path);
>> + if (ret != 0) {
>> + if (ret == 1)
>> + ret = 0;
>> + break;
>> + }
>> + leaf = path->nodes[0];
>> + }
>> + }
>> +
>> + btrfs_release_path(path);
>> +
>> + mutex_unlock(&bg->free_space_lock);
>> +
>> + max_entries = extent_count + 2;
>> + entries = kmalloc(sizeof(*entries) * max_entries, GFP_NOFS);
>> + if (!entries) {
>> + ret = -ENOMEM;
>> + goto out;
>> + }
>> +
>> + num_entries = 0;
>> +
>> + if (num_space_runs > 0 && space_runs[0].start > bg->start) {
>> + entries[num_entries].objectid = bg->start;
>> + entries[num_entries].type = BTRFS_IDENTITY_REMAP_KEY;
>> + entries[num_entries].offset = space_runs[0].start - bg->start;
>> + num_entries++;
>> + }
>> +
>> + for (unsigned int i = 1; i < num_space_runs; i++) {
>> + entries[num_entries].objectid = space_runs[i - 1].end;
>> + entries[num_entries].type = BTRFS_IDENTITY_REMAP_KEY;
>> + entries[num_entries].offset =
>> + space_runs[i].start - space_runs[i - 1].end;
>> + num_entries++;
>> + }
>> +
>> + if (num_space_runs == 0) {
>> + entries[num_entries].objectid = bg->start;
>> + entries[num_entries].type = BTRFS_IDENTITY_REMAP_KEY;
>> + entries[num_entries].offset = bg->length;
>> + num_entries++;
>> + } else if (space_runs[num_space_runs - 1].end < bg->start + bg->length) {
>> + entries[num_entries].objectid = space_runs[num_space_runs - 1].end;
>> + entries[num_entries].type = BTRFS_IDENTITY_REMAP_KEY;
>> + entries[num_entries].offset =
>> + bg->start + bg->length - space_runs[num_space_runs - 1].end;
>> + num_entries++;
>> + }
>> +
>> + if (num_entries == 0)
>> + goto out;
>> +
>> + bg->identity_remap_count = num_entries;
>> +
>> + ret = add_remap_tree_entries(trans, path, entries, num_entries);
>
> We can group the empty and non-empty space_runs cases into an if/else to make
> the two main flows obvious and reduce scattered conditions:
>
> num_entries = 0;
>
> if (num_space_runs == 0) {
> entries[num_entries].objectid = bg->start;
> entries[num_entries].type = BTRFS_IDENTITY_REMAP_KEY;
> entries[num_entries].offset = bg->length;
> num_entries++;
> } else {
> if (space_runs[0].start > bg->start) {
> entries[num_entries].objectid = bg->start;
> entries[num_entries].type = BTRFS_IDENTITY_REMAP_KEY;
> entries[num_entries].offset = space_runs[0].start - bg->start;
> num_entries++;
> }
> for (unsigned int i = 1; i < num_space_runs; i++) {
> entries[num_entries].objectid = space_runs[i - 1].end;
> entries[num_entries].type = BTRFS_IDENTITY_REMAP_KEY;
> entries[num_entries].offset =
> space_runs[i].start - space_runs[i - 1].end;
> num_entries++;
> }
> if (space_runs[num_space_runs - 1].end < bg->start + bg->length) {
> entries[num_entries].objectid = space_runs[num_space_runs - 1].end;
> entries[num_entries].type = BTRFS_IDENTITY_REMAP_KEY;
> entries[num_entries].offset =
> bg->start + bg->length - space_runs[num_space_runs - 1].end;
> num_entries++;
> }
> if (num_entries == 0)
> goto out;
> }
>
> // I'm not sure if it's necessary but we can free space_runs earlier
> // since we're also doing allocation in add_remap_tree_entries().
> // kfree(space_runs);
> // space_runs = NULL;
>
> bg->identity_remap_count = num_entries;
>
> ret = add_remap_tree_entries(trans, path, entries, num_entries);
>
>
>> +
>> +out:
>> + kfree(entries);
>> + kfree(space_runs);
>> +
>> + return ret;
>> +}
>
> Regards,
> Sun Yangkai
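The restructuring discussed above can be tried outside the kernel. The following is a minimal userspace sketch, not the patch itself: struct run, struct gap and gaps_between_runs() are hypothetical names, and it assumes the free-space runs are sorted, non-adjacent and inside the block group (which is what merging adjacent runs would give). It computes the identity remap ranges as the gaps between free-space runs, using the single if/else shape suggested in the review.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical free-space run; end is exclusive. */
struct run {
	uint64_t start;
	uint64_t end;
};

/* Hypothetical identity remap range (key objectid and key offset). */
struct gap {
	uint64_t start;
	uint64_t length;
};

/*
 * Fill out[] with the allocated ranges of the block group, i.e. everything
 * not covered by a free-space run. out[] needs room for num_runs + 1
 * entries. Returns the number of entries written.
 */
static size_t gaps_between_runs(uint64_t bg_start, uint64_t bg_length,
				const struct run *runs, size_t num_runs,
				struct gap *out)
{
	uint64_t bg_end = bg_start + bg_length;
	size_t n = 0;

	if (num_runs == 0) {
		/* No free space at all: one range covers the whole group. */
		out[n].start = bg_start;
		out[n].length = bg_length;
		n++;
	} else {
		/* Head gap before the first run. */
		if (runs[0].start > bg_start) {
			out[n].start = bg_start;
			out[n].length = runs[0].start - bg_start;
			n++;
		}

		/* Gaps between consecutive runs. */
		for (size_t i = 1; i < num_runs; i++) {
			/* Assumption: adjacent runs were already merged. */
			assert(runs[i].start > runs[i - 1].end);
			out[n].start = runs[i - 1].end;
			out[n].length = runs[i].start - runs[i - 1].end;
			n++;
		}

		/* Tail gap after the last run. */
		if (runs[num_runs - 1].end < bg_end) {
			out[n].start = runs[num_runs - 1].end;
			out[n].length = bg_end - runs[num_runs - 1].end;
			n++;
		}
	}

	return n;
}
```

A return value of zero corresponds to the "goto out" case in the quoted code: a single free-space run spans the whole block group, so there is nothing to remap.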
Thread overview: 23+ messages
2025-11-14 18:47 [PATCH v6 00/16] Remap tree Mark Harmstone
2025-11-14 18:47 ` [PATCH v6 01/16] btrfs: add definitions and constants for remap-tree Mark Harmstone
2025-11-14 18:47 ` [PATCH v6 02/16] btrfs: add REMAP chunk type Mark Harmstone
2025-11-14 18:47 ` [PATCH v6 03/16] btrfs: allow remapped chunks to have zero stripes Mark Harmstone
2025-11-14 18:47 ` [PATCH v6 04/16] btrfs: remove remapped block groups from the free-space tree Mark Harmstone
2025-11-14 18:47 ` [PATCH v6 05/16] btrfs: don't add metadata items for the remap tree to the extent tree Mark Harmstone
2025-11-14 18:47 ` [PATCH v6 06/16] btrfs: add extended version of struct block_group_item Mark Harmstone
2025-11-14 18:47 ` [PATCH v6 07/16] btrfs: allow mounting filesystems with remap-tree incompat flag Mark Harmstone
2025-11-14 18:47 ` [PATCH v6 08/16] btrfs: redirect I/O for remapped block groups Mark Harmstone
2025-11-14 18:47 ` [PATCH v6 09/16] btrfs: handle deletions from remapped block group Mark Harmstone
2025-11-20 0:17 ` Boris Burkov
2025-11-24 12:40 ` Mark Harmstone
2025-11-14 18:47 ` [PATCH v6 10/16] btrfs: handle setting up relocation of block group with remap-tree Mark Harmstone
2025-11-15 14:52 ` Sun Yangkai
2025-11-24 18:01 ` Mark Harmstone
2025-11-14 18:47 ` [PATCH v6 11/16] btrfs: move existing remaps before relocating block group Mark Harmstone
2025-11-14 18:47 ` [PATCH v6 12/16] btrfs: replace identity remaps with actual remaps when doing relocations Mark Harmstone
2025-11-20 0:21 ` Boris Burkov
2025-11-14 18:47 ` [PATCH v6 13/16] btrfs: add do_remap param to btrfs_discard_extent() Mark Harmstone
2025-11-14 18:47 ` [PATCH v6 14/16] btrfs: allow balancing remap tree Mark Harmstone
2025-11-14 18:47 ` [PATCH v6 15/16] btrfs: handle discarding fully-remapped block groups Mark Harmstone
2025-11-20 0:19 ` Boris Burkov
2025-11-14 18:47 ` [PATCH v6 16/16] btrfs: populate fully_remapped_bgs_list on mount Mark Harmstone