* [PATCH v2 0/6] btrfs: extent-tree-v2, gc and no meta ref counting
@ 2022-03-07 22:04 Josef Bacik
From: Josef Bacik @ 2022-03-07 22:04 UTC (permalink / raw)
To: linux-btrfs, kernel-team
v1->v2:
- I decided to add the nr of expected global roots to the super block to make
checking more straightforward, so added a patch for reading that value.
- I sent the prep patches/fixes in a different series and those were merged, so
dropped them from this series
--- Original email ---
Hello,
This is the kernel side of the support for the GC trees and no longer tracking
metadata reference counts.
For the GC tree we're only implementing offloading the truncate to the GC tree
for now. As new support is added we'll add code for the garbage collection for
each of the new operations. Truncate was picked because it's simple enough to
do, gets us a nice latency win on normal workloads, and is a quick way to
validate that the GC tree is doing what it's supposed to.
This also disables the reference counting of metadata blocks. Snapshotting and
everything reference counting related to metadata has been disabled, and will be
turned back on as the code needed to support those operations is added back.
This survives xfstests without blowing up. Thanks,
Josef
Josef Bacik (6):
btrfs: read the nr_global_roots from the super block
btrfs: don't do backref modification for metadata for extent tree v2
btrfs: add definitions and read support for the garbage collection
tree
btrfs: add a btrfs_first_item helper
btrfs: turn evict_refill_and_join into a real helper
btrfs: add garbage collection tree support
fs/btrfs/Makefile | 2 +-
fs/btrfs/ctree.c | 23 ++++
fs/btrfs/ctree.h | 19 ++-
fs/btrfs/disk-io.c | 37 ++++--
fs/btrfs/extent-tree.c | 13 +-
fs/btrfs/gc-tree.c | 223 ++++++++++++++++++++++++++++++++
fs/btrfs/gc-tree.h | 15 +++
fs/btrfs/inode.c | 65 +++-------
fs/btrfs/print-tree.c | 4 +
fs/btrfs/space-info.c | 4 +-
fs/btrfs/transaction.c | 52 ++++++++
fs/btrfs/transaction.h | 2 +
include/uapi/linux/btrfs_tree.h | 6 +
13 files changed, 394 insertions(+), 71 deletions(-)
create mode 100644 fs/btrfs/gc-tree.c
create mode 100644 fs/btrfs/gc-tree.h
--
2.26.3
* [PATCH v2 1/6] btrfs: read the nr_global_roots from the super block
From: Josef Bacik @ 2022-03-07 22:04 UTC (permalink / raw)
To: linux-btrfs, kernel-team
Originally I was deriving the number of global roots from the largest
offset of the extent root we found in the tree_root. However this could
result in shenanigans with fuzzing, so instead store this in our
super block as the source of truth, and then check the offset of the
root items to make sure they're sane wrt our global root count.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
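For anyone who wants to poke at the check in isolation, here is a minimal
user-space sketch of the validation described in the commit message above. It
is not the kernel code: `struct root_key` and `check_global_root_key()` are
hypothetical stand-ins, and the `EUCLEAN` fallback value is only an assumption
for systems whose errno.h lacks it.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Linux defines EUCLEAN; this fallback value is an assumption for elsewhere. */
#ifndef EUCLEAN
#define EUCLEAN 117
#endif

/* Hypothetical stand-in for the (objectid, offset) part of a root item key. */
struct root_key {
	uint64_t objectid;
	uint64_t offset;
};

/*
 * Return 0 if the root's offset is a valid index into the global roots and
 * -EUCLEAN otherwise. nr_global_roots is the value read from the super block
 * (or 1 when extent tree v2 is not enabled).
 */
static int check_global_root_key(const struct root_key *key,
				 uint64_t nr_global_roots)
{
	assert(nr_global_roots >= 1);
	if (key->offset >= nr_global_roots)
		return -EUCLEAN;
	return 0;
}
```

With four global roots, offsets 0 through 3 pass and offset 4 is rejected,
regardless of what other root items a fuzzed image contains.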
fs/btrfs/ctree.h | 6 +++++-
fs/btrfs/disk-io.c | 23 ++++++++++++-----------
2 files changed, 17 insertions(+), 12 deletions(-)
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 4db17bd05a21..aaa8451ef8be 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -277,6 +277,8 @@ struct btrfs_super_block {
/* the UUID written into btree blocks */
u8 metadata_uuid[BTRFS_FSID_SIZE];
+ __le64 nr_global_roots;
+
/* Extent tree v2 */
__le64 block_group_root;
__le64 block_group_root_generation;
@@ -284,7 +286,7 @@ struct btrfs_super_block {
/* future expansion */
u8 reserved8[7];
- __le64 reserved[25];
+ __le64 reserved[24];
u8 sys_chunk_array[BTRFS_SYSTEM_CHUNK_ARRAY_SIZE];
struct btrfs_root_backup super_roots[BTRFS_NUM_BACKUP_ROOTS];
@@ -2513,6 +2515,8 @@ BTRFS_SETGET_STACK_FUNCS(super_block_group_root_generation,
block_group_root_generation, 64);
BTRFS_SETGET_STACK_FUNCS(super_block_group_root_level, struct btrfs_super_block,
block_group_root_level, 8);
+BTRFS_SETGET_STACK_FUNCS(super_nr_global_roots, struct btrfs_super_block,
+ nr_global_roots, 64);
int btrfs_super_csum_size(const struct btrfs_super_block *s);
const char *btrfs_super_csum_name(u16 csum_type);
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 09693ab4fde0..aeefc4e2e71a 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2583,7 +2583,6 @@ static int load_global_roots_objectid(struct btrfs_root *tree_root,
{
struct btrfs_fs_info *fs_info = tree_root->fs_info;
struct btrfs_root *root;
- u64 max_global_id = 0;
int ret;
struct btrfs_key key = {
.objectid = objectid,
@@ -2617,14 +2616,15 @@ static int load_global_roots_objectid(struct btrfs_root *tree_root,
btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
if (key.objectid != objectid)
break;
- btrfs_release_path(path);
- /*
- * Just worry about this for extent tree, it'll be the same for
- * everybody.
- */
- if (objectid == BTRFS_EXTENT_TREE_OBJECTID)
- max_global_id = max(max_global_id, key.offset);
+ if (key.offset >= fs_info->nr_global_roots) {
+ btrfs_err(fs_info, "invalid global root [%llu, %llu]\n",
+ key.objectid, key.offset);
+ ret = -EUCLEAN;
+ break;
+ }
+
+ btrfs_release_path(path);
found = true;
root = read_tree_root_path(tree_root, path, &key);
@@ -2643,9 +2643,6 @@ static int load_global_roots_objectid(struct btrfs_root *tree_root,
}
btrfs_release_path(path);
- if (objectid == BTRFS_EXTENT_TREE_OBJECTID)
- fs_info->nr_global_roots = max_global_id + 1;
-
if (!found || ret) {
if (objectid == BTRFS_CSUM_TREE_OBJECTID)
set_bit(BTRFS_FS_STATE_NO_CSUMS, &fs_info->fs_state);
@@ -3242,6 +3239,7 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
fs_info->sectorsize = 4096;
fs_info->sectorsize_bits = ilog2(4096);
fs_info->stripesize = 4096;
+ fs_info->nr_global_roots = 1;
spin_lock_init(&fs_info->swapfile_pins_lock);
fs_info->swapfile_pins = RB_ROOT;
@@ -3598,6 +3596,9 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
fs_info->csums_per_leaf = BTRFS_MAX_ITEM_SIZE(fs_info) / fs_info->csum_size;
fs_info->stripesize = stripesize;
+ if (btrfs_fs_incompat(fs_info, EXTENT_TREE_V2))
+ fs_info->nr_global_roots = btrfs_super_nr_global_roots(disk_super);
+
ret = btrfs_parse_options(fs_info, options, sb->s_flags);
if (ret) {
err = ret;
--
2.26.3
* [PATCH v2 2/6] btrfs: don't do backref modification for metadata for extent tree v2
From: Josef Bacik @ 2022-03-07 22:04 UTC (permalink / raw)
To: linux-btrfs, kernel-team
For extent tree v2 we will no longer track references for metadata in
the extent tree. Make changes at the alloc and free sides so the proper
accounting is done but skip the extent tree modification parts.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
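A rough user-space sketch of the free-side decision described in the commit
message above, in case it helps review. The enum and function names are made
up for illustration; only the 256 threshold (BTRFS_FIRST_FREE_OBJECTID) and
the shape of the condition come from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Owners below this objectid are trees, so the extent is metadata. */
#define FIRST_FREE_OBJECTID 256ULL

/* Hypothetical labels for the two paths through the free routine. */
enum free_action {
	FREE_ACCOUNTING_ONLY,		/* space accounting only, no extent tree update */
	FREE_UPDATE_EXTENT_TREE,	/* drop the backref, then do the accounting */
};

static enum free_action free_extent_action(bool extent_tree_v2,
					   uint64_t owner_objectid)
{
	bool is_data = owner_objectid >= FIRST_FREE_OBJECTID;

	if (extent_tree_v2 && !is_data)
		return FREE_ACCOUNTING_ONLY;
	return FREE_UPDATE_EXTENT_TREE;
}

/* Usage: a tree block owned by the extent tree (objectid 2) under v2. */
static void free_extent_action_example(void)
{
	assert(free_extent_action(true, 2) == FREE_ACCOUNTING_ONLY);
}
```

Data extents keep going through the extent tree either way; only metadata on
an extent tree v2 file system takes the short path.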
fs/btrfs/extent-tree.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index f477035a2ac2..309d8753bf41 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2955,7 +2955,6 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
struct btrfs_extent_item *ei;
struct btrfs_extent_inline_ref *iref;
int ret;
- int is_data;
int extent_slot = 0;
int found_extent = 0;
int num_to_del = 1;
@@ -2964,6 +2963,11 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
u64 bytenr = node->bytenr;
u64 num_bytes = node->num_bytes;
bool skinny_metadata = btrfs_fs_incompat(info, SKINNY_METADATA);
+ bool is_data = owner_objectid >= BTRFS_FIRST_FREE_OBJECTID;
+
+ if (btrfs_fs_incompat(info, EXTENT_TREE_V2) && !is_data)
+ return do_free_extent_accounting(trans, bytenr, num_bytes,
+ is_data);
extent_root = btrfs_extent_root(info, bytenr);
ASSERT(extent_root);
@@ -2972,8 +2976,6 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
if (!path)
return -ENOMEM;
- is_data = owner_objectid >= BTRFS_FIRST_FREE_OBJECTID;
-
if (!is_data && refs_to_drop != 1) {
btrfs_crit(info,
"invalid refs_to_drop, dropping more than 1 refs for tree block %llu refs_to_drop %u",
@@ -4703,6 +4705,9 @@ static int alloc_reserved_tree_block(struct btrfs_trans_handle *trans,
u64 flags = extent_op->flags_to_set;
bool skinny_metadata = btrfs_fs_incompat(fs_info, SKINNY_METADATA);
+ if (btrfs_fs_incompat(fs_info, EXTENT_TREE_V2))
+ goto out;
+
ref = btrfs_delayed_node_to_tree_ref(node);
extent_key.objectid = node->bytenr;
@@ -4756,7 +4761,7 @@ static int alloc_reserved_tree_block(struct btrfs_trans_handle *trans,
btrfs_mark_buffer_dirty(leaf);
btrfs_free_path(path);
-
+out:
return alloc_reserved_extent(trans, node->bytenr, fs_info->nodesize);
}
--
2.26.3
* [PATCH v2 3/6] btrfs: add definitions and read support for the garbage collection tree
From: Josef Bacik @ 2022-03-07 22:04 UTC (permalink / raw)
To: linux-btrfs, kernel-team
This adds the on disk definitions for the garbage collection tree and
the code to load it on mount.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
fs/btrfs/disk-io.c | 6 ++++++
fs/btrfs/print-tree.c | 4 ++++
include/uapi/linux/btrfs_tree.h | 6 ++++++
3 files changed, 16 insertions(+)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index aeefc4e2e71a..3546a3af9ad7 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2678,6 +2678,12 @@ static int load_global_roots(struct btrfs_root *tree_root)
ret = load_global_roots_objectid(tree_root, path,
BTRFS_FREE_SPACE_TREE_OBJECTID,
"free space");
+ if (ret)
+ goto out;
+ if (!btrfs_fs_incompat(tree_root->fs_info, EXTENT_TREE_V2))
+ goto out;
+ ret = load_global_roots_objectid(tree_root, path,
+ BTRFS_GC_TREE_OBJECTID, "gc");
out:
btrfs_free_path(path);
return ret;
diff --git a/fs/btrfs/print-tree.c b/fs/btrfs/print-tree.c
index dd8777872143..a0b0d5d68826 100644
--- a/fs/btrfs/print-tree.c
+++ b/fs/btrfs/print-tree.c
@@ -24,6 +24,7 @@ static const struct root_name_map root_map[] = {
{ BTRFS_UUID_TREE_OBJECTID, "UUID_TREE" },
{ BTRFS_FREE_SPACE_TREE_OBJECTID, "FREE_SPACE_TREE" },
{ BTRFS_BLOCK_GROUP_TREE_OBJECTID, "BLOCK_GROUP_TREE" },
+ { BTRFS_GC_TREE_OBJECTID, "GC_TREE" },
{ BTRFS_DATA_RELOC_TREE_OBJECTID, "DATA_RELOC_TREE" },
};
@@ -348,6 +349,9 @@ void btrfs_print_leaf(struct extent_buffer *l)
print_uuid_item(l, btrfs_item_ptr_offset(l, i),
btrfs_item_size(l, i));
break;
+ case BTRFS_GC_INODE_ITEM_KEY:
+ pr_info("\t\tgc inode item\n");
+ break;
}
}
}
diff --git a/include/uapi/linux/btrfs_tree.h b/include/uapi/linux/btrfs_tree.h
index b069752a8ecf..4a363289c90e 100644
--- a/include/uapi/linux/btrfs_tree.h
+++ b/include/uapi/linux/btrfs_tree.h
@@ -56,6 +56,9 @@
/* Holds the block group items for extent tree v2. */
#define BTRFS_BLOCK_GROUP_TREE_OBJECTID 11ULL
+/* Holds the garbage collection items for extent tree v2. */
+#define BTRFS_GC_TREE_OBJECTID 12ULL
+
/* device stats in the device tree */
#define BTRFS_DEV_STATS_OBJECTID 0ULL
@@ -147,6 +150,9 @@
#define BTRFS_ORPHAN_ITEM_KEY 48
/* reserve 2-15 close to the inode for later flexibility */
+/* The garbage collection items. */
+#define BTRFS_GC_INODE_ITEM_KEY 49
+
/*
* dir items are the name -> inode pointers in a directory. There is one
* for every name in a directory. BTRFS_DIR_LOG_ITEM_KEY is no longer used
--
2.26.3
* [PATCH v2 4/6] btrfs: add a btrfs_first_item helper
From: Josef Bacik @ 2022-03-07 22:04 UTC (permalink / raw)
To: linux-btrfs, kernel-team
The GC tree stuff is going to use this helper and it'll make the code a
bit cleaner to abstract this into a helper.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
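To illustrate the return convention the helper relies on, here is a toy
user-space model. Everything in it is hypothetical: the "tree" is a flat
sorted array and `toy_search()` only imitates the 0 / 1 result of a slot
search, with no error path, so the < 0 case is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, flattened stand-in for a tree: a sorted array of keys. */
struct toy_tree {
	const uint64_t *keys;
	size_t nritems;
};

/*
 * Imitates the search convention: 0 on an exact match, 1 when the key is
 * absent, with *slot set to where the key would be inserted.
 */
static int toy_search(const struct toy_tree *tree, uint64_t key, size_t *slot)
{
	size_t i;

	for (i = 0; i < tree->nritems; i++) {
		if (tree->keys[i] == key) {
			*slot = i;
			return 0;
		}
		if (tree->keys[i] > key)
			break;
	}
	*slot = i;
	return 1;
}

/* 0 if *slot points at the first item, 1 if the tree is empty. */
static int toy_first_item(const struct toy_tree *tree, size_t *slot)
{
	int ret = toy_search(tree, 0, slot);

	if (ret > 0) {
		if (tree->nritems == 0)
			return 1;
		/* No zero key exists, but slot 0 holds the smallest key. */
		assert(*slot == 0);
		ret = 0;
	}
	return ret;
}
```

Searching for the all-zero key can never land past the first item, which is
why a "not found" result on a non-empty tree is safely turned into success.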
fs/btrfs/ctree.c | 23 +++++++++++++++++++++++
fs/btrfs/ctree.h | 1 +
2 files changed, 24 insertions(+)
diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 0eecf98d0abb..eee0b7e3c68a 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -4791,3 +4791,26 @@ int btrfs_previous_extent_item(struct btrfs_root *root,
}
return 1;
}
+
+/**
+ * btrfs_first_item - search the given root for the first item.
+ * @root: the root to search.
+ * @path: the path to use for the search.
+ * @return: 0 if it found something, 1 if nothing was found and < 0 on error.
+ *
+ * Search down and find the first item in a tree. If the root is empty return
+ * 1, otherwise we'll return 0 or < 0 if there was an error.
+ */
+int btrfs_first_item(struct btrfs_root *root, struct btrfs_path *path)
+{
+ struct btrfs_key key = {};
+ int ret;
+
+ ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
+ if (ret > 0) {
+ if (btrfs_header_nritems(path->nodes[0]) == 0)
+ return 1;
+ ret = 0;
+ }
+ return ret;
+}
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index aaa8451ef8be..0260c191c014 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2914,6 +2914,7 @@ void btrfs_wait_for_snapshot_creation(struct btrfs_root *root);
int btrfs_bin_search(struct extent_buffer *eb, const struct btrfs_key *key,
int *slot);
int __pure btrfs_comp_cpu_keys(const struct btrfs_key *k1, const struct btrfs_key *k2);
+int btrfs_first_item(struct btrfs_root *root, struct btrfs_path *path);
int btrfs_previous_item(struct btrfs_root *root,
struct btrfs_path *path, u64 min_objectid,
int type);
--
2.26.3
* [PATCH v2 5/6] btrfs: turn evict_refill_and_join into a real helper
From: Josef Bacik @ 2022-03-07 22:04 UTC (permalink / raw)
To: linux-btrfs, kernel-team
We are going to use the same mechanism for garbage collection that evict
uses. Rename the flush state to reflect the role it will play in GC from
now on, move the helper to transaction.c, rename it, and make it public.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
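For reviewers unfamiliar with the refill-then-fallback pattern the helper
uses, a toy user-space model follows. The `toy_*` types and functions are
hypothetical and assume a refill simply tops the reservation up to a minimum
from a single pool; flushing and stealing from the global reserve are not
modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical pool of free metadata bytes. */
struct toy_space {
	uint64_t free_bytes;
};

struct toy_rsv {
	uint64_t size;		/* bytes the reservation wants to hold */
	uint64_t reserved;	/* bytes it currently holds */
};

/* Top the rsv up to min_reserved, or fail with -ENOSPC and change nothing. */
static int toy_refill(struct toy_space *space, struct toy_rsv *rsv,
		      uint64_t min_reserved)
{
	uint64_t need;

	if (rsv->reserved >= min_reserved)
		return 0;
	need = min_reserved - rsv->reserved;
	if (need > space->free_bytes)
		return -ENOSPC;
	space->free_bytes -= need;
	rsv->reserved += need;
	return 0;
}

/*
 * Try for size + extra first and fall back to size alone. On success
 * *extra_out is the number of bytes set aside for delayed refs, which is 0
 * after a fallback.
 */
static int toy_refill_for_gc(struct toy_space *space, struct toy_rsv *rsv,
			     uint64_t extra, uint64_t *extra_out)
{
	int ret = toy_refill(space, rsv, rsv->size + extra);

	if (ret) {
		ret = toy_refill(space, rsv, rsv->size);
		if (ret)
			return ret;
		extra = 0;
	}
	/* Imitate the migrate step: the extra leaves the rsv for the handle. */
	assert(rsv->reserved >= extra);
	rsv->reserved -= extra;
	*extra_out = extra;
	return 0;
}
```

The point of the fallback is forward progress: when space is tight the caller
still gets a usable reservation, just without the delayed refs cushion.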
fs/btrfs/ctree.h | 2 +-
fs/btrfs/inode.c | 52 ++----------------------------------------
fs/btrfs/space-info.c | 4 ++--
fs/btrfs/transaction.c | 49 +++++++++++++++++++++++++++++++++++++++
fs/btrfs/transaction.h | 2 ++
5 files changed, 56 insertions(+), 53 deletions(-)
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 0260c191c014..407ebaa7b48d 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2851,7 +2851,7 @@ enum btrfs_reserve_flush_enum {
* - Running delalloc and waiting for ordered extents
* - Allocating a new chunk
*/
- BTRFS_RESERVE_FLUSH_EVICT,
+ BTRFS_RESERVE_FLUSH_GC,
/*
* Flush space by above mentioned methods and by:
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 2e7143ff5523..80318da765c0 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -5174,54 +5174,6 @@ static void evict_inode_truncate_pages(struct inode *inode)
spin_unlock(&io_tree->lock);
}
-static struct btrfs_trans_handle *evict_refill_and_join(struct btrfs_root *root,
- struct btrfs_block_rsv *rsv)
-{
- struct btrfs_fs_info *fs_info = root->fs_info;
- struct btrfs_trans_handle *trans;
- u64 delayed_refs_extra = btrfs_calc_insert_metadata_size(fs_info, 1);
- int ret;
-
- /*
- * Eviction should be taking place at some place safe because of our
- * delayed iputs. However the normal flushing code will run delayed
- * iputs, so we cannot use FLUSH_ALL otherwise we'll deadlock.
- *
- * We reserve the delayed_refs_extra here again because we can't use
- * btrfs_start_transaction(root, 0) for the same deadlocky reason as
- * above. We reserve our extra bit here because we generate a ton of
- * delayed refs activity by truncating.
- *
- * BTRFS_RESERVE_FLUSH_EVICT will steal from the global_rsv if it can,
- * if we fail to make this reservation we can re-try without the
- * delayed_refs_extra so we can make some forward progress.
- */
- ret = btrfs_block_rsv_refill(fs_info, rsv, rsv->size + delayed_refs_extra,
- BTRFS_RESERVE_FLUSH_EVICT);
- if (ret) {
- ret = btrfs_block_rsv_refill(fs_info, rsv, rsv->size,
- BTRFS_RESERVE_FLUSH_EVICT);
- if (ret) {
- btrfs_warn(fs_info,
- "could not allocate space for delete; will truncate on mount");
- return ERR_PTR(-ENOSPC);
- }
- delayed_refs_extra = 0;
- }
-
- trans = btrfs_join_transaction(root);
- if (IS_ERR(trans))
- return trans;
-
- if (delayed_refs_extra) {
- trans->block_rsv = &fs_info->trans_block_rsv;
- trans->bytes_reserved = delayed_refs_extra;
- btrfs_block_rsv_migrate(rsv, trans->block_rsv,
- delayed_refs_extra, 1);
- }
- return trans;
-}
-
void btrfs_evict_inode(struct inode *inode)
{
struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
@@ -5292,7 +5244,7 @@ void btrfs_evict_inode(struct inode *inode)
.min_type = 0,
};
- trans = evict_refill_and_join(root, rsv);
+ trans = btrfs_gc_rsv_refill_and_join(root, rsv);
if (IS_ERR(trans))
goto free_rsv;
@@ -5317,7 +5269,7 @@ void btrfs_evict_inode(struct inode *inode)
* If it turns out that we are dropping too many of these, we might want
* to add a mechanism for retrying these after a commit.
*/
- trans = evict_refill_and_join(root, rsv);
+ trans = btrfs_gc_rsv_refill_and_join(root, rsv);
if (!IS_ERR(trans)) {
trans->block_rsv = rsv;
btrfs_orphan_del(trans, BTRFS_I(inode));
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index b87931a458eb..c1ff0f4e97df 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -1401,7 +1401,7 @@ static int handle_reserve_ticket(struct btrfs_fs_info *fs_info,
priority_flush_states,
ARRAY_SIZE(priority_flush_states));
break;
- case BTRFS_RESERVE_FLUSH_EVICT:
+ case BTRFS_RESERVE_FLUSH_GC:
priority_reclaim_metadata_space(fs_info, space_info, ticket,
evict_flush_states,
ARRAY_SIZE(evict_flush_states));
@@ -1459,7 +1459,7 @@ static inline void maybe_clamp_preempt(struct btrfs_fs_info *fs_info,
static inline bool can_steal(enum btrfs_reserve_flush_enum flush)
{
return (flush == BTRFS_RESERVE_FLUSH_ALL_STEAL ||
- flush == BTRFS_RESERVE_FLUSH_EVICT);
+ flush == BTRFS_RESERVE_FLUSH_GC);
}
/**
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index b008c5110958..d9671a9be696 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -887,6 +887,55 @@ static noinline void wait_for_commit(struct btrfs_transaction *commit,
}
}
+/**
+ * btrfs_gc_rsv_refill_and_join - refill a block rsv and join transaction for gc
+ * @root: the root we're modifying
+ * @rsv: the rsv we're refilling
+ * @return: trans handle with a refilled block_rsv
+ *
+ * Inode eviction or GC will be taking place somewhere safe because of either
+ * delayed iputs or the GC threads. However the normal flushing behavior
+ * may want to wait on eviction or GC in order to reclaim some space.
+ *
+ * This refills the rsv, and also adds some extra for the delayed refs that may
+ * be generated by the operation. If it cannot get the delayed refs reservation
+ * it'll reduce the reservation so we can possibly make progress.
+ *
+ * This will also steal from the global reserve if it needs to.
+ */
+struct btrfs_trans_handle *btrfs_gc_rsv_refill_and_join(struct btrfs_root *root,
+ struct btrfs_block_rsv *rsv)
+{
+ struct btrfs_fs_info *fs_info = root->fs_info;
+ struct btrfs_trans_handle *trans;
+ u64 delayed_refs_extra = btrfs_calc_insert_metadata_size(fs_info, 1);
+ int ret;
+
+ ret = btrfs_block_rsv_refill(fs_info, rsv, rsv->size + delayed_refs_extra,
+ BTRFS_RESERVE_FLUSH_GC);
+ if (ret) {
+ ret = btrfs_block_rsv_refill(fs_info, rsv, rsv->size,
+ BTRFS_RESERVE_FLUSH_GC);
+ if (ret) {
+ btrfs_warn(fs_info,
+ "could not allocate space for delete; will truncate on mount");
+ return ERR_PTR(-ENOSPC);
+ }
+ delayed_refs_extra = 0;
+ }
+
+ trans = btrfs_join_transaction(root);
+ if (IS_ERR(trans))
+ return trans;
+
+ if (delayed_refs_extra) {
+ trans->block_rsv = &fs_info->trans_block_rsv;
+ trans->bytes_reserved = delayed_refs_extra;
+ btrfs_block_rsv_migrate(rsv, trans->block_rsv,
+ delayed_refs_extra, 1);
+ }
+ return trans;
+}
int btrfs_wait_for_commit(struct btrfs_fs_info *fs_info, u64 transid)
{
struct btrfs_transaction *cur_trans = NULL, *t;
diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
index 970ff316069d..c28574ea731b 100644
--- a/fs/btrfs/transaction.h
+++ b/fs/btrfs/transaction.h
@@ -235,5 +235,7 @@ void btrfs_apply_pending_changes(struct btrfs_fs_info *fs_info);
void btrfs_add_dropped_root(struct btrfs_trans_handle *trans,
struct btrfs_root *root);
void btrfs_trans_release_chunk_metadata(struct btrfs_trans_handle *trans);
+struct btrfs_trans_handle *btrfs_gc_rsv_refill_and_join(struct btrfs_root *root,
+ struct btrfs_block_rsv *rsv);
#endif
--
2.26.3
* [PATCH v2 6/6] btrfs: add garbage collection tree support
From: Josef Bacik @ 2022-03-07 22:04 UTC (permalink / raw)
To: linux-btrfs, kernel-team
This patch adds the support for loading the gc tree, and running the
inode garbage collection work. Every time the transaction is committed
we'll kick off helpers to run any items in the GC tree. Currently we
just have the inode item collection, which will handle the work of
deleting the inode items once an inode is unlinked.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
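As a quick illustration of how an unlinked inode maps onto the GC trees, here
is a small user-space sketch of the two keys involved. `struct toy_key` and
both functions are hypothetical; the constants mirror the definitions from
earlier in this series plus BTRFS_ROOT_ITEM_KEY (132).

```c
#include <assert.h>
#include <stdint.h>

#define GC_TREE_OBJECTID	12ULL
#define GC_INODE_ITEM_KEY	49
#define ROOT_ITEM_KEY		132

/* Hypothetical CPU-order key, laid out like (objectid, type, offset). */
struct toy_key {
	uint64_t objectid;
	uint8_t type;
	uint64_t offset;
};

/* Key used to look up the GC root that holds this inode's GC item. */
static struct toy_key gc_root_key(uint64_t ino, uint64_t nr_global_roots)
{
	struct toy_key key;

	assert(nr_global_roots > 0);
	key.objectid = GC_TREE_OBJECTID;
	key.type = ROOT_ITEM_KEY;
	key.offset = ino % nr_global_roots;
	return key;
}

/* Key of the GC item itself: (subvolume id, GC_INODE_ITEM, inode number). */
static struct toy_key gc_inode_item_key(uint64_t subvol_id, uint64_t ino)
{
	struct toy_key key = {
		.objectid = subvol_id,
		.type = GC_INODE_ITEM_KEY,
		.offset = ino,
	};

	return key;
}
```

Spreading items by inode number modulo the global root count lets each GC
root be worked on by its own work item without contending on a single tree.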
fs/btrfs/Makefile | 2 +-
fs/btrfs/ctree.h | 10 ++
fs/btrfs/disk-io.c | 8 +-
fs/btrfs/gc-tree.c | 223 +++++++++++++++++++++++++++++++++++++++++
fs/btrfs/gc-tree.h | 15 +++
fs/btrfs/inode.c | 13 +++
fs/btrfs/transaction.c | 3 +
7 files changed, 272 insertions(+), 2 deletions(-)
create mode 100644 fs/btrfs/gc-tree.c
create mode 100644 fs/btrfs/gc-tree.h
diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile
index 4188ba3fd8c3..50c08dc95f3b 100644
--- a/fs/btrfs/Makefile
+++ b/fs/btrfs/Makefile
@@ -30,7 +30,7 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \
backref.o ulist.o qgroup.o send.o dev-replace.o raid56.o \
uuid-tree.o props.o free-space-tree.o tree-checker.o space-info.o \
block-rsv.o delalloc-space.o block-group.o discard.o reflink.o \
- subpage.o tree-mod-log.o
+ subpage.o tree-mod-log.o gc-tree.o
btrfs-$(CONFIG_BTRFS_FS_POSIX_ACL) += acl.o
btrfs-$(CONFIG_BTRFS_FS_CHECK_INTEGRITY) += check-integrity.o
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 407ebaa7b48d..29e0e009267b 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -869,6 +869,9 @@ struct btrfs_fs_info {
struct btrfs_workqueue *fixup_workers;
struct btrfs_workqueue *delayed_workers;
+ /* Used to run the GC work. */
+ struct btrfs_workqueue *gc_workers;
+
struct task_struct *transaction_kthread;
struct task_struct *cleaner_kthread;
u32 thread_pool_size;
@@ -1009,6 +1012,9 @@ struct btrfs_fs_info {
struct semaphore uuid_tree_rescan_sem;
+ /* Used to run GC in the background. */
+ struct work_struct gc_work;
+
/* Used to reclaim the metadata space in the background. */
struct work_struct async_reclaim_work;
struct work_struct async_data_reclaim_work;
@@ -1144,8 +1150,12 @@ enum {
BTRFS_ROOT_QGROUP_FLUSHING,
/* We started the orphan cleanup for this root. */
BTRFS_ROOT_ORPHAN_CLEANUP,
+
/* This root has a drop operation that was started previously. */
BTRFS_ROOT_UNFINISHED_DROP,
+
+ /* GC is happening on this root. */
+ BTRFS_ROOT_GC_RUNNING,
};
static inline void btrfs_wake_unfinished_drop(struct btrfs_fs_info *fs_info)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 3546a3af9ad7..c4e9d69ed672 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2283,6 +2283,7 @@ static void btrfs_stop_all_workers(struct btrfs_fs_info *fs_info)
*/
btrfs_destroy_workqueue(fs_info->endio_meta_workers);
btrfs_destroy_workqueue(fs_info->endio_meta_write_workers);
+ btrfs_destroy_workqueue(fs_info->gc_workers);
}
static void free_root_extent_buffers(struct btrfs_root *root)
@@ -2489,6 +2490,8 @@ static int btrfs_init_workqueues(struct btrfs_fs_info *fs_info)
btrfs_alloc_workqueue(fs_info, "qgroup-rescan", flags, 1, 0);
fs_info->discard_ctl.discard_workers =
alloc_workqueue("btrfs_discard", WQ_UNBOUND | WQ_FREEZABLE, 1);
+ fs_info->gc_workers =
+ btrfs_alloc_workqueue(fs_info, "garbage-collect", flags, max_active, 1);
if (!(fs_info->workers && fs_info->delalloc_workers &&
fs_info->flush_workers &&
@@ -2498,7 +2501,7 @@ static int btrfs_init_workqueues(struct btrfs_fs_info *fs_info)
fs_info->endio_freespace_worker && fs_info->rmw_workers &&
fs_info->caching_workers && fs_info->fixup_workers &&
fs_info->delayed_workers && fs_info->qgroup_rescan_workers &&
- fs_info->discard_ctl.discard_workers)) {
+ fs_info->discard_ctl.discard_workers && fs_info->gc_workers)) {
return -ENOMEM;
}
@@ -4648,6 +4651,9 @@ void __cold close_ctree(struct btrfs_fs_info *fs_info)
*/
btrfs_wake_unfinished_drop(fs_info);
+ /* Stop the gc workers. */
+ btrfs_flush_workqueue(fs_info->gc_workers);
+
/* wait for the qgroup rescan worker to stop */
btrfs_qgroup_wait_for_completion(fs_info, false);
diff --git a/fs/btrfs/gc-tree.c b/fs/btrfs/gc-tree.c
new file mode 100644
index 000000000000..7df7236f805c
--- /dev/null
+++ b/fs/btrfs/gc-tree.c
@@ -0,0 +1,223 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include "ctree.h"
+#include "gc-tree.h"
+#include "btrfs_inode.h"
+#include "disk-io.h"
+#include "transaction.h"
+#include "inode-item.h"
+
+struct gc_work {
+ struct btrfs_work work;
+ struct btrfs_root *root;
+};
+
+static struct btrfs_root *inode_gc_root(struct btrfs_inode *inode)
+{
+ struct btrfs_fs_info *fs_info = inode->root->fs_info;
+ struct btrfs_key key = {
+ .objectid = BTRFS_GC_TREE_OBJECTID,
+ .type = BTRFS_ROOT_ITEM_KEY,
+ .offset = btrfs_ino(inode) % fs_info->nr_global_roots,
+ };
+
+ return btrfs_global_root(fs_info, &key);
+}
+
+static int add_gc_item(struct btrfs_root *root, struct btrfs_key *key,
+ struct btrfs_block_rsv *rsv)
+{
+ struct btrfs_path *path;
+ struct btrfs_trans_handle *trans;
+ int ret = 0;
+
+ path = btrfs_alloc_path();
+ if (!path)
+ return -ENOMEM;
+
+ trans = btrfs_gc_rsv_refill_and_join(root, rsv);
+ if (IS_ERR(trans)) {
+ ret = PTR_ERR(trans);
+ goto out;
+ }
+
+ trans->block_rsv = rsv;
+ ret = btrfs_insert_empty_item(trans, root, path, key, 0);
+ trans->block_rsv = &root->fs_info->trans_block_rsv;
+ btrfs_end_transaction(trans);
+out:
+ btrfs_free_path(path);
+ return ret;
+}
+
+static void delete_gc_item(struct btrfs_root *root, struct btrfs_path *path,
+ struct btrfs_block_rsv *rsv, struct btrfs_key *key)
+{
+ struct btrfs_trans_handle *trans;
+ int ret;
+
+ trans = btrfs_gc_rsv_refill_and_join(root, rsv);
+ if (IS_ERR(trans))
+ return;
+
+ ret = btrfs_search_slot(trans, root, key, path, -1, 1);
+
+ /* Delete only on an exact match; never leak the path or the handle. */
+ if (!ret)
+ btrfs_del_item(trans, root, path);
+
+ btrfs_release_path(path);
+ btrfs_end_transaction(trans);
+}
+
+static int gc_inode(struct btrfs_fs_info *fs_info, struct btrfs_block_rsv *rsv,
+ struct btrfs_key *key)
+{
+ struct btrfs_root *root = btrfs_get_fs_root(fs_info, key->objectid, true);
+ struct btrfs_trans_handle *trans;
+ int ret = 0;
+
+ if (IS_ERR(root)) {
+ ret = PTR_ERR(root);
+
+ /* We are deleting this subvolume, just delete the GC item for it. */
+ if (ret == -ENOENT)
+ return 0;
+
+ btrfs_err(fs_info, "failed to look up root during gc %llu: %d",
+ key->objectid, ret);
+ return ret;
+ }
+
+ do {
+ struct btrfs_truncate_control control = {
+ .ino = key->offset,
+ .new_size = 0,
+ .min_type = 0,
+ };
+
+ trans = btrfs_gc_rsv_refill_and_join(root, rsv);
+ if (IS_ERR(trans)) {
+ ret = PTR_ERR(trans);
+ break;
+ }
+
+ trans->block_rsv = rsv;
+
+ ret = btrfs_truncate_inode_items(trans, root, &control);
+
+ trans->block_rsv = &fs_info->trans_block_rsv;
+ btrfs_end_transaction(trans);
+ btrfs_btree_balance_dirty(fs_info);
+ } while (ret == -ENOSPC || ret == -EAGAIN);
+
+ btrfs_put_root(root);
+ return ret;
+}
+
+static void gc_work_fn(struct btrfs_work *work)
+{
+ struct gc_work *gc_work = container_of(work, struct gc_work, work);
+ struct btrfs_root *root = gc_work->root;
+ struct btrfs_fs_info *fs_info = root->fs_info;
+ struct btrfs_path *path;
+ struct btrfs_block_rsv *rsv;
+ int ret;
+
+ path = btrfs_alloc_path();
+ if (!path)
+ goto out;
+
+ rsv = btrfs_alloc_block_rsv(fs_info, BTRFS_BLOCK_RSV_TEMP);
+ if (!rsv)
+ goto out_path;
+ rsv->size = btrfs_calc_metadata_size(fs_info, 1);
+ rsv->failfast = 1;
+
+ while (!btrfs_fs_closing(fs_info) &&
+ !btrfs_first_item(root, path)) {
+ struct btrfs_key key;
+
+ btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
+ btrfs_release_path(path);
+
+ switch (key.type) {
+ case BTRFS_GC_INODE_ITEM_KEY:
+ ret = gc_inode(root->fs_info, rsv, &key);
+ break;
+ default:
+ ASSERT(0);
+ ret = -EINVAL;
+ break;
+ }
+
+ if (!ret)
+ delete_gc_item(root, path, rsv, &key);
+ }
+ btrfs_free_block_rsv(fs_info, rsv);
+out_path:
+ btrfs_free_path(path);
+out:
+ clear_bit(BTRFS_ROOT_GC_RUNNING, &root->state);
+ kfree(gc_work);
+}
+
+/**
+ * btrfs_queue_gc_work - queue work for non-empty GC roots.
+ * @fs_info: The fs_info for the file system.
+ *
+ * This walks through all of the garbage collection roots and schedules the
+ * work structs to chew through their work.
+ */
+void btrfs_queue_gc_work(struct btrfs_fs_info *fs_info)
+{
+ struct btrfs_root *root;
+ struct gc_work *gc_work;
+ struct btrfs_key key = {
+ .objectid = BTRFS_GC_TREE_OBJECTID,
+ .type = BTRFS_ROOT_ITEM_KEY,
+ };
+ int nr_global_roots = fs_info->nr_global_roots;
+ int i;
+
+ if (!btrfs_fs_incompat(fs_info, EXTENT_TREE_V2))
+ return;
+
+ if (btrfs_fs_closing(fs_info))
+ return;
+
+ for (i = 0; i < nr_global_roots; i++) {
+ key.offset = i;
+ root = btrfs_global_root(fs_info, &key);
+ if (test_and_set_bit(BTRFS_ROOT_GC_RUNNING, &root->state))
+ continue;
+ gc_work = kmalloc(sizeof(struct gc_work), GFP_KERNEL);
+ if (!gc_work) {
+ clear_bit(BTRFS_ROOT_GC_RUNNING, &root->state);
+ continue;
+ }
+ gc_work->root = root;
+ btrfs_init_work(&gc_work->work, gc_work_fn, NULL, NULL);
+ btrfs_queue_work(fs_info->gc_workers, &gc_work->work);
+ }
+}
+
+/**
+ * btrfs_add_inode_gc_item - add a gc item for an inode that needs to be removed.
+ * @inode: The inode that needs to have a gc item added.
+ * @rsv: The block rsv to use for the reservation.
+ *
+ * This adds the gc item for the given inode. This must be called during evict
+ * to make sure nobody else is going to access this inode.
+ */
+int btrfs_add_inode_gc_item(struct btrfs_inode *inode,
+ struct btrfs_block_rsv *rsv)
+{
+ struct btrfs_key key = {
+ .objectid = inode->root->root_key.objectid,
+ .type = BTRFS_GC_INODE_ITEM_KEY,
+ .offset = btrfs_ino(inode),
+ };
+
+ return add_gc_item(inode_gc_root(inode), &key, rsv);
+}
diff --git a/fs/btrfs/gc-tree.h b/fs/btrfs/gc-tree.h
new file mode 100644
index 000000000000..d744f45f8c8e
--- /dev/null
+++ b/fs/btrfs/gc-tree.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef BTRFS_GC_TREE_H
+#define BTRFS_GC_TREE_H
+
+struct btrfs_fs_info;
+struct btrfs_inode;
+struct btrfs_block_rsv;
+
+void btrfs_queue_gc_work(struct btrfs_fs_info *fs_info);
+int btrfs_add_inode_gc_item(struct btrfs_inode *inode,
+ struct btrfs_block_rsv *rsv);
+
+#endif
+
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 80318da765c0..29e520bc636c 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -55,6 +55,7 @@
#include "zoned.h"
#include "subpage.h"
#include "inode-item.h"
+#include "gc-tree.h"
struct btrfs_iget_args {
u64 ino;
@@ -5234,6 +5235,17 @@ void btrfs_evict_inode(struct inode *inode)
rsv->size = btrfs_calc_metadata_size(fs_info, 1);
rsv->failfast = 1;
+ /*
+ * If we have extent tree v2 enabled, insert our gc item and we're done,
+ * remove the orphan item if we succeeded.
+ */
+ if (btrfs_fs_incompat(fs_info, EXTENT_TREE_V2)) {
+ ret = btrfs_add_inode_gc_item(BTRFS_I(inode), rsv);
+ if (ret)
+ goto free_rsv;
+ goto delete_orphan;
+ }
+
btrfs_i_size_write(BTRFS_I(inode), 0);
while (1) {
@@ -5269,6 +5281,7 @@ void btrfs_evict_inode(struct inode *inode)
* If it turns out that we are dropping too many of these, we might want
* to add a mechanism for retrying these after a commit.
*/
+delete_orphan:
trans = btrfs_gc_rsv_refill_and_join(root, rsv);
if (!IS_ERR(trans)) {
trans->block_rsv = rsv;
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index d9671a9be696..9d4bbdb62199 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -22,6 +22,7 @@
#include "block-group.h"
#include "space-info.h"
#include "zoned.h"
+#include "gc-tree.h"
#define BTRFS_ROOT_TRANS_TAG 0
@@ -2513,6 +2514,8 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
btrfs_put_transaction(cur_trans);
btrfs_put_transaction(cur_trans);
+ btrfs_queue_gc_work(fs_info);
+
if (trans->type & __TRANS_FREEZABLE)
sb_end_intwrite(fs_info->sb);
--
2.26.3