* [PATCH v2 0/7] btrfs-progs: csum-change: add the initial support for offline csum type change
@ 2023-05-18 2:10 Qu Wenruo
From: Qu Wenruo @ 2023-05-18 2:10 UTC (permalink / raw)
To: linux-btrfs
[CHANGELOG]
v2:
- Skip csum item checks if the fs is under csum change
Tree-checker can be too sensitive if the csum size does not match the
old csum size, which can lead to false alerts on overlapping csum
items.
But we still want the tree checker functionality overall, so just
disable csum item related checks for csum change.
[DESIGN]
The csum change workflow looks like this:
- Generate new data csums
New data csums would have a different objectid (CSUM_CHANGE -13),
other than the existing one (EXTENT_CSUM -10).
The generation part also verifies the data contents; on any
mismatch we error out.
- Delete the old data csums
- Change data csums objectid
- Rewrite metadata csums in-place
At this stage, we would check the checksum for both old and new algo.
If the metadata matches the old csum, we rewrite using new csum.
If the metadata matches the new csum, we skip it.
If the metadata doesn't match either csum, we error out.
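The three-way decision for the metadata rewrite stage can be sketched
as below. This is only an illustration under stated assumptions: the
enum, classify_tree_block() and the two toy checksum functions are
hypothetical stand-ins, not btrfs-progs code, and real checksums are
wider than 32 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical outcome of inspecting one tree block; not a btrfs-progs type. */
enum meta_csum_action {
	META_CSUM_REWRITE,	/* matches the old algo: rewrite using the new one */
	META_CSUM_SKIP,		/* already matches the new algo: nothing to do */
	META_CSUM_ERROR,	/* matches neither: error out */
};

/* Toy stand-in for the old checksum algorithm (illustration only). */
static uint32_t toy_csum_old(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < len; i++)
		sum += buf[i];
	return sum;
}

/* Toy stand-in for the new checksum algorithm (illustration only). */
static uint32_t toy_csum_new(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < len; i++)
		sum = sum * 31 + buf[i];
	return sum;
}

/* Check the stored csum against the old algo first, then the new one. */
static enum meta_csum_action classify_tree_block(const uint8_t *payload,
						 size_t len,
						 uint32_t stored_csum)
{
	assert(payload && len > 0);

	if (stored_csum == toy_csum_old(payload, len))
		return META_CSUM_REWRITE;
	if (stored_csum == toy_csum_new(payload, len))
		return META_CSUM_SKIP;
	return META_CSUM_ERROR;
}
```

Because a block that already carries the new csum is skipped, running
the same pass again over a partially rewritten tree would be harmless
in this sketch.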
[TESTS]
For now it has only been tested manually, at a basic level.
So far the result looks good: the resulting fs passes "btrfs check
--check-data-csum".
[TODO]
- Support for resume
Currently we won't resume an interrupted csum conversion.
Although the design should be able to handle any interruption at data
csum conversion part, and as long as metadata csum writes are atomic,
the metadata rewrites should also be fine.
- Support for revert if errors are found
If we hit data csum mismatch and can not repair from any copy, then
we should revert back to the old csum.
- Support for precautious metadata check
We'd better read and verify metadata before rewriting it.
- Extra test cases
Qu Wenruo (7):
btrfs-progs: tune: rework the main idea of csum change
btrfs-progs: tune: implement the prerequisite checks for csum change
btrfs-progs: tune: add the ability to read and verify the data before
generating new checksum
btrfs-progs: tune: add the ability to generate new data checksums
btrfs-progs: tune: add the ability to delete old data csums
btrfs-progs: tune: add the ability to migrate the temporary csum items
to regular csum items
btrfs-progs: tune: add the ability to change metadata csums
check/mode-common.c | 11 +-
convert/main.c | 12 +-
kernel-shared/ctree.c | 3 -
kernel-shared/ctree.h | 19 +-
kernel-shared/disk-io.c | 8 -
kernel-shared/file-item.c | 46 +-
kernel-shared/file-item.h | 4 +-
kernel-shared/print-tree.c | 11 +-
kernel-shared/tree-checker.c | 5 +
kernel-shared/uapi/btrfs_tree.h | 1 +
mkfs/rootdir.c | 11 +-
tune/change-csum.c | 1053 +++++++++++++++++--------------
tune/main.c | 2 +-
tune/tune.h | 3 +-
14 files changed, 646 insertions(+), 543 deletions(-)
--
2.40.1
* [PATCH v2 1/7] btrfs-progs: tune: rework the main idea of csum change
From: Qu Wenruo @ 2023-05-18 2:10 UTC (permalink / raw)
To: linux-btrfs
The existing attempt at changing csum types works as follows:
- Create a new temporary csum root
- Generate new data csums into the temporary csum root
- Drop the old csum tree and make the temporary one as csum root
- Change the checksums for metadata in-place
Unfortunately, after some experiments, the csum root switch method turns
out to have a big pitfall: the backref items in the extent tree.
Those backref items still point back to the old tree, meaning that without
a lot of extra tricks, the extent tree would be corrupted.
Thus we have to go with a new single-tree variant:
- Generate new data csums into the csum root
The new data csums would have a different objectid to distinguish
them.
- Drop the old data csum items
- Change the key objectids of the new csums
- Change the checksums for metadata in-place
Unfortunately, this means we have to revert most of the old code and
update the temporary item format.
The new temporary item only records the target csum type.
At every stage we have a method to determine the progress, so there is
no need for an item that records it, but this is open to change in the future.
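To illustrate how the new csum items stay distinguishable inside the
same csum root, here is a minimal sketch of the key ordering. The
objectid values (-13 and -10) are the ones quoted in this series; the
key type value 128 for EXTENT_CSUM and the objectid/type/offset
comparison order are taken from my reading of the on-disk format, and
the struct and helper names are hypothetical, not btrfs-progs API.

```c
#include <assert.h>
#include <stdint.h>

/* Objectids as quoted in this series; stored as unsigned 64-bit values. */
#define SKETCH_EXTENT_CSUM_OBJECTID	((uint64_t)-10)
#define SKETCH_CSUM_CHANGE_OBJECTID	((uint64_t)-13)
/* Assumed key type value for csum items. */
#define SKETCH_EXTENT_CSUM_KEY		128

/* Hypothetical CPU-order key, mirroring (objectid, type, offset). */
struct sketch_key {
	uint64_t objectid;
	uint8_t type;
	uint64_t offset;
};

/* Compare keys by objectid, then type, then offset. */
static int sketch_key_cmp(const struct sketch_key *a,
			  const struct sketch_key *b)
{
	assert(a && b);

	if (a->objectid != b->objectid)
		return a->objectid < b->objectid ? -1 : 1;
	if (a->type != b->type)
		return a->type < b->type ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

/* Build a csum item key for the given objectid and logical bytenr. */
static struct sketch_key make_csum_key(uint64_t objectid, uint64_t bytenr)
{
	struct sketch_key key = {
		.objectid = objectid,
		.type = SKETCH_EXTENT_CSUM_KEY,
		.offset = bytenr,
	};

	return key;
}
```

Since -13 is smaller than -10 when both are interpreted as u64, every
temporary csum item sorts before every old csum item in this sketch,
regardless of the logical bytenr stored in the offset.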
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
kernel-shared/ctree.c | 3 -
kernel-shared/ctree.h | 19 +-
kernel-shared/disk-io.c | 8 -
kernel-shared/file-item.c | 12 -
kernel-shared/print-tree.c | 11 +-
kernel-shared/uapi/btrfs_tree.h | 1 +
tune/change-csum.c | 518 ++------------------------------
tune/main.c | 2 +-
tune/tune.h | 3 +-
9 files changed, 34 insertions(+), 543 deletions(-)
diff --git a/kernel-shared/ctree.c b/kernel-shared/ctree.c
index 782bc6cc80c1..bcf16271d864 100644
--- a/kernel-shared/ctree.c
+++ b/kernel-shared/ctree.c
@@ -403,9 +403,6 @@ int btrfs_create_root(struct btrfs_trans_handle *trans,
fs_info->block_group_root = new_root;
break;
- case BTRFS_CSUM_TREE_TMP_OBJECTID:
- fs_info->csum_tree_tmp = new_root;
- break;
/*
* Essential trees can't be created by this function, yet.
* As we expect such skeleton exists, or a lot of functions like
diff --git a/kernel-shared/ctree.h b/kernel-shared/ctree.h
index 3b7d98bff469..5d3392ae82a6 100644
--- a/kernel-shared/ctree.h
+++ b/kernel-shared/ctree.h
@@ -56,9 +56,8 @@ static inline unsigned long btrfs_chunk_item_size(int num_stripes)
sizeof(struct btrfs_stripe) * (num_stripes - 1);
}
-/* Temporary flag not on-disk for blocks that have changed csum already */
-#define BTRFS_HEADER_FLAG_CSUM_NEW (1ULL << 16)
-#define BTRFS_SUPER_FLAG_CHANGING_CSUM (1ULL << 37)
+#define BTRFS_SUPER_FLAG_CHANGING_DATA_CSUM (1ULL << 36)
+#define BTRFS_SUPER_FLAG_CHANGING_META_CSUM (1ULL << 37)
/*
* The fs is undergoing block group tree feature change.
@@ -306,9 +305,6 @@ struct btrfs_fs_info {
/* the log root tree is a directory of all the other log roots */
struct btrfs_root *log_root_tree;
- /* When switching csums */
- struct btrfs_root *csum_tree_tmp;
-
struct cache_tree extent_cache;
u64 max_cache_size;
u64 cache_size;
@@ -365,7 +361,6 @@ struct btrfs_fs_info {
unsigned int skip_leaf_item_checks:1;
int transaction_aborted;
- int force_csum_type;
int (*free_extent_hook)(u64 bytenr, u64 num_bytes, u64 parent,
u64 root_objectid, u64 owner, u64 offset,
@@ -670,17 +665,11 @@ static inline u32 BTRFS_MAX_XATTR_SIZE(const struct btrfs_fs_info *info)
* - balance status item (objectid -4)
* (BTRFS_BALANCE_OBJECTID, BTRFS_TEMPORARY_ITEM_KEY, 0)
*
- * - second csum tree for conversion (objecitd
+ * - second csum tree for conversion (objectid -13)
+ * (BTRFS_CSUM_CHANGE_OBJECTID, BTRFS_TEMPORARY_ITEM_KEY, <target csum type>)
*/
#define BTRFS_TEMPORARY_ITEM_KEY 248
-/*
- * Temporary value
- *
- * root tree pointer of checksum tree with new checksum type
- */
-#define BTRFS_CSUM_TREE_TMP_OBJECTID 13ULL
-
/*
* Obsolete name, see BTRFS_PERSISTENT_ITEM_KEY
*/
diff --git a/kernel-shared/disk-io.c b/kernel-shared/disk-io.c
index 5cbfcdd8452c..442d3af8bc01 100644
--- a/kernel-shared/disk-io.c
+++ b/kernel-shared/disk-io.c
@@ -215,12 +215,6 @@ static int csum_tree_block(struct btrfs_fs_info *fs_info,
u16 csum_size = fs_info->csum_size;
u16 csum_type = fs_info->csum_type;
- if (fs_info->force_csum_type != -1) {
- /* printf("CSUM TREE: offset %llu\n", buf->start); */
- csum_type = fs_info->force_csum_type;
- csum_size = btrfs_csum_type_size(csum_type);
- }
-
if (verify && fs_info->suppress_check_block_errors)
return verify_tree_block_csum_silent(buf, csum_size, csum_type);
return csum_tree_block_size(buf, csum_size, verify, csum_type);
@@ -475,7 +469,6 @@ int write_tree_block(struct btrfs_trans_handle *trans,
if (trans && !btrfs_buffer_uptodate(eb, trans->transid, 0))
BUG();
- btrfs_clear_header_flag(eb, BTRFS_HEADER_FLAG_CSUM_NEW);
btrfs_set_header_flag(eb, BTRFS_HEADER_FLAG_WRITTEN);
csum_tree_block(fs_info, eb, 0);
@@ -885,7 +878,6 @@ struct btrfs_fs_info *btrfs_new_fs_info(int writable, u64 sb_bytenr)
fs_info->metadata_alloc_profile = (u64)-1;
fs_info->system_alloc_profile = fs_info->metadata_alloc_profile;
fs_info->nr_global_roots = 1;
- fs_info->force_csum_type = -1;
return fs_info;
diff --git a/kernel-shared/file-item.c b/kernel-shared/file-item.c
index b372cc5eab54..9b59a4b7a9ae 100644
--- a/kernel-shared/file-item.c
+++ b/kernel-shared/file-item.c
@@ -142,7 +142,6 @@ btrfs_lookup_csum(struct btrfs_trans_handle *trans,
struct btrfs_csum_item *item;
struct extent_buffer *leaf;
u64 csum_offset = 0;
- u16 csum_type = root->fs_info->csum_type;
u16 csum_size = root->fs_info->csum_size;
int csums_in_item;
@@ -154,11 +153,6 @@ btrfs_lookup_csum(struct btrfs_trans_handle *trans,
goto fail;
leaf = path->nodes[0];
- if (leaf->fs_info->force_csum_type != -1) {
- csum_type = root->fs_info->force_csum_type;
- csum_size = btrfs_csum_type_size(csum_type);
- }
-
if (ret > 0) {
ret = 1;
if (path->slots[0] == 0)
@@ -208,12 +202,6 @@ int btrfs_csum_file_block(struct btrfs_trans_handle *trans,
u16 csum_size = root->fs_info->csum_size;
u16 csum_type = root->fs_info->csum_type;
- if (root->fs_info->force_csum_type != -1) {
- /* printf("CSUM DATA: offset %llu (%d -> %d)\n", bytenr, csum_type, root->fs_info->force_csum_type); */
- csum_type = root->fs_info->force_csum_type;
- csum_size = btrfs_csum_type_size(csum_type);
- }
-
path = btrfs_alloc_path();
if (!path)
return -ENOMEM;
diff --git a/kernel-shared/print-tree.c b/kernel-shared/print-tree.c
index 2cfd6b950ec5..aaaf58ae2e0f 100644
--- a/kernel-shared/print-tree.c
+++ b/kernel-shared/print-tree.c
@@ -790,6 +790,9 @@ void print_objectid(FILE *stream, u64 objectid, u8 type)
case BTRFS_BLOCK_GROUP_TREE_OBJECTID:
fprintf(stream, "BLOCK_GROUP_TREE");
break;
+ case BTRFS_CSUM_CHANGE_OBJECTID:
+ fprintf(stream, "CSUM_CHANGE");
+ break;
case (u64)-1:
fprintf(stream, "-1");
break;
@@ -1142,8 +1145,12 @@ static void print_temporary_item(struct extent_buffer *eb, void *ptr,
case BTRFS_BALANCE_OBJECTID:
print_balance_item(eb, ptr);
break;
- case BTRFS_CSUM_TREE_TMP_OBJECTID:
- printf("\t\tcsum tree tmp root %llu\n", offset);
+ case BTRFS_CSUM_CHANGE_OBJECTID:
+ if (offset < btrfs_get_num_csums())
+ printf("\t\ttarget csum type %s (%llu)\n",
+ btrfs_super_csum_name(offset) ,offset);
+ else
+ printf("\t\tunknown csum type %llu\n", offset);
break;
default:
printf("\t\tunknown temporary item objectid %llu\n", objectid);
diff --git a/kernel-shared/uapi/btrfs_tree.h b/kernel-shared/uapi/btrfs_tree.h
index 5b9f71ab15de..ad555e7055ab 100644
--- a/kernel-shared/uapi/btrfs_tree.h
+++ b/kernel-shared/uapi/btrfs_tree.h
@@ -106,6 +106,7 @@
*/
#define BTRFS_FREE_INO_OBJECTID -12ULL
+#define BTRFS_CSUM_CHANGE_OBJECTID -13ULL
/* dummy objectid represents multiple objectids */
#define BTRFS_MULTIPLE_OBJECTIDS -255ULL
diff --git a/tune/change-csum.c b/tune/change-csum.c
index 4531f2190f06..7a9f6351e7fe 100644
--- a/tune/change-csum.c
+++ b/tune/change-csum.c
@@ -26,510 +26,28 @@
#include "common/internal.h"
#include "tune/tune.h"
-static int change_tree_csum(struct btrfs_trans_handle *trans, struct btrfs_root *root,
- int csum_type)
+int btrfs_change_csum_type(struct btrfs_fs_info *fs_info, u16 new_csum_type)
{
- struct btrfs_fs_info *fs_info = root->fs_info;
- struct btrfs_path path;
- struct btrfs_key key = {0, 0, 0};
- int ret = 0;
- int level;
-
- btrfs_init_path(&path);
- /* No transaction, all in-place */
- ret = btrfs_search_slot(NULL, root, &key, &path, 0, 0);
- if (ret < 0)
- goto out;
-
- while (1) {
- level = 1;
- while (path.nodes[level]) {
- /* Caching can make double writes */
- if (!btrfs_header_flag(path.nodes[level], BTRFS_HEADER_FLAG_CSUM_NEW)) {
- ret = write_tree_block(NULL, fs_info, path.nodes[level]);
- if (ret < 0)
- goto out;
- btrfs_set_header_flag(path.nodes[level],
- BTRFS_HEADER_FLAG_CSUM_NEW);
- }
- level++;
- }
- ret = write_tree_block(NULL, fs_info, path.nodes[0]);
- if (ret < 0)
- goto out;
- ret = btrfs_next_leaf(root, &path);
- if (ret < 0)
- goto out;
- if (ret > 0) {
- ret = 0;
- goto out;
- }
- }
-out:
- btrfs_release_path(&path);
- return ret;
-}
-
-static struct btrfs_csum_item *lookup_tmp_csum(struct btrfs_trans_handle *trans,
- struct btrfs_path *path, u64 bytenr, int cow)
-{
- int ret;
- struct btrfs_fs_info *fs_info = trans->fs_info;
- struct btrfs_root *csum_root = fs_info->csum_tree_tmp;
- struct btrfs_key file_key;
- struct btrfs_key found_key;
- struct btrfs_csum_item *item;
- struct extent_buffer *leaf;
- u64 csum_offset = 0;
- u16 csum_type = fs_info->csum_type;
- u16 csum_size = fs_info->csum_size;
- int csums_in_item;
-
- file_key.objectid = BTRFS_EXTENT_CSUM_OBJECTID;
- file_key.offset = bytenr;
- file_key.type = BTRFS_EXTENT_CSUM_KEY;
- ret = btrfs_search_slot(trans, csum_root, &file_key, path, 0, cow);
- if (ret < 0)
- goto fail;
- leaf = path->nodes[0];
-
- if (leaf->fs_info->force_csum_type != -1) {
- csum_type = fs_info->force_csum_type;
- csum_size = btrfs_csum_type_size(csum_type);
- }
-
- if (ret > 0) {
- ret = 1;
- if (path->slots[0] == 0)
- goto fail;
- path->slots[0]--;
- btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
- if (found_key.type != BTRFS_EXTENT_CSUM_KEY)
- goto fail;
-
- csum_offset = (bytenr - found_key.offset) / fs_info->sectorsize;
- csums_in_item = btrfs_item_size(leaf, path->slots[0]);
- csums_in_item /= csum_size;
-
- if (csum_offset >= csums_in_item) {
- ret = -EFBIG;
- goto fail;
- }
- }
- item = btrfs_item_ptr(leaf, path->slots[0], struct btrfs_csum_item);
- item = (struct btrfs_csum_item *)((unsigned char *)item +
- csum_offset * csum_size);
- return item;
-fail:
- if (ret > 0)
- ret = -ENOENT;
- return ERR_PTR(ret);
-}
-
-#define MAX_CSUM_ITEMS(r, size) ((((BTRFS_LEAF_DATA_SIZE(r->fs_info) - \
- sizeof(struct btrfs_item) * 2) / \
- size) - 1))
-
-static int csum_file_block(struct btrfs_trans_handle *trans,
- struct btrfs_fs_info *fs_info,
- u64 alloc_end, u64 bytenr, char *data, size_t len)
-{
- struct btrfs_root *csum_root = fs_info->csum_tree_tmp;
- int ret = 0;
- struct btrfs_key file_key;
- struct btrfs_key found_key;
- u64 next_offset = (u64)-1;
- int found_next = 0;
- struct btrfs_path *path;
- struct btrfs_csum_item *item;
- struct extent_buffer *leaf = NULL;
- u64 csum_offset;
- u8 csum_result[BTRFS_CSUM_SIZE];
- u32 sectorsize = fs_info->sectorsize;
- u32 nritems;
- u32 ins_size;
- u16 csum_size;
- u16 csum_type;
-
- if (fs_info->force_csum_type != -1)
- return -EINVAL;
-
- csum_type = fs_info->force_csum_type;
- csum_size = btrfs_csum_type_size(csum_type);
-
- path = btrfs_alloc_path();
- if (!path)
- return -ENOMEM;
-
- file_key.objectid = BTRFS_EXTENT_CSUM_OBJECTID;
- file_key.type = BTRFS_EXTENT_CSUM_KEY;
- file_key.offset = bytenr;
-
- item = lookup_tmp_csum(trans, path, bytenr, 1);
- if (!IS_ERR(item)) {
- leaf = path->nodes[0];
- ret = 0;
- goto found;
- }
- ret = PTR_ERR(item);
- if (ret == -EFBIG) {
- u32 item_size;
-
- /* We found one, but it isn't big enough yet */
- leaf = path->nodes[0];
- item_size = btrfs_item_size(leaf, path->slots[0]);
- if ((item_size / csum_size) >= MAX_CSUM_ITEMS(csum_root, csum_size)) {
- /* Already at max size, make a new one */
- goto insert;
- }
- } else {
- int slot = path->slots[0] + 1;
-
- /* We didn't find a csum item, insert one */
- nritems = btrfs_header_nritems(path->nodes[0]);
- if (path->slots[0] >= nritems - 1) {
- ret = btrfs_next_leaf(csum_root, path);
- if (ret == 1)
- found_next = 1;
- if (ret != 0)
- goto insert;
- slot = 0;
- }
- btrfs_item_key_to_cpu(path->nodes[0], &found_key, slot);
- if (found_key.objectid != BTRFS_EXTENT_CSUM_OBJECTID ||
- found_key.type != BTRFS_EXTENT_CSUM_KEY) {
- found_next = 1;
- goto insert;
- }
- next_offset = found_key.offset;
- found_next = 1;
- goto insert;
- }
+ /* Phase 0, check conflicting features. */
/*
- * At this point, we know the tree has an item, but it isn't big
- * enough yet to put our csum in. Grow it.
+ * Phase 1, generate new data csums.
+ *
+ * The new data csums would have a different key objectid, and there
+ * will be a temporary item in root tree to indicate the new checksum
+ * algo.
*/
- btrfs_release_path(path);
- ret = btrfs_search_slot(trans, csum_root, &file_key, path, csum_size, 1);
- if (ret < 0)
- goto fail;
- if (ret == 0)
- BUG();
- if (path->slots[0] == 0)
- goto insert;
- path->slots[0]--;
- leaf = path->nodes[0];
- btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
- csum_offset = (file_key.offset - found_key.offset) / sectorsize;
- if (found_key.objectid != BTRFS_EXTENT_CSUM_OBJECTID ||
- found_key.type != BTRFS_EXTENT_CSUM_KEY ||
- csum_offset >= MAX_CSUM_ITEMS(csum_root, csum_size)) {
- goto insert;
- }
- if (csum_offset >= btrfs_item_size(leaf, path->slots[0]) / csum_size) {
- u32 diff = (csum_offset + 1) * csum_size;
- diff = diff - btrfs_item_size(leaf, path->slots[0]);
- if (diff != csum_size)
- goto insert;
- ret = btrfs_extend_item(csum_root, path, diff);
- BUG_ON(ret);
- goto csum;
- }
+ /* Phase 2, delete the old data csums. */
-insert:
- btrfs_release_path(path);
- csum_offset = 0;
- if (found_next) {
- u64 tmp = min(alloc_end, next_offset);
- tmp -= file_key.offset;
- tmp /= sectorsize;
- tmp = max((u64)1, tmp);
- tmp = min(tmp, (u64)MAX_CSUM_ITEMS(csum_root, csum_size));
- ins_size = csum_size * tmp;
- } else {
- ins_size = csum_size;
- }
- ret = btrfs_insert_empty_item(trans, csum_root, path, &file_key, ins_size);
- if (ret < 0)
- goto fail;
- if (ret != 0) {
- WARN_ON(1);
- goto fail;
- }
-csum:
- leaf = path->nodes[0];
- item = btrfs_item_ptr(leaf, path->slots[0], struct btrfs_csum_item);
- ret = 0;
- item = (struct btrfs_csum_item *)((unsigned char *)item +
- csum_offset * csum_size);
-found:
- btrfs_csum_data(fs_info, csum_type, (u8 *)data, csum_result, len);
- write_extent_buffer(leaf, csum_result, (unsigned long)item, csum_size);
- btrfs_mark_buffer_dirty(path->nodes[0]);
-fail:
- btrfs_free_path(path);
- return ret;
-}
-
-static int populate_csum(struct btrfs_trans_handle *trans,
- struct btrfs_fs_info *fs_info, char *buf, u64 start,
- u64 len)
-{
- u64 offset = 0;
- u64 sectorsize;
- int ret = 0;
-
- while (offset < len) {
- sectorsize = fs_info->sectorsize;
- ret = read_data_from_disk(fs_info, buf, start + offset,
- &sectorsize, 0);
- if (ret)
- break;
- ret = csum_file_block(trans, fs_info, start + len, start + offset,
- buf, sectorsize);
- if (ret)
- break;
- offset += sectorsize;
- }
- return ret;
-}
-
-static int fill_csum_tree_from_extent(struct btrfs_fs_info *fs_info)
-{
- struct btrfs_root *extent_root = btrfs_extent_root(fs_info, 0);
- struct btrfs_trans_handle *trans;
- struct btrfs_path path;
- struct btrfs_extent_item *ei;
- struct extent_buffer *leaf;
- char *buf;
- struct btrfs_key key;
- int ret;
-
- trans = btrfs_start_transaction(extent_root, 1);
- if (trans == NULL) {
- ret = PTR_ERR(trans);
- errno = -ret;
- error_msg(ERROR_MSG_START_TRANS, "%m");
- return -EINVAL;
- }
-
- btrfs_init_path(&path);
- key.objectid = 0;
- key.type = BTRFS_EXTENT_ITEM_KEY;
- key.offset = 0;
- ret = btrfs_search_slot(NULL, extent_root, &key, &path, 0, 0);
- if (ret < 0) {
- btrfs_release_path(&path);
- return ret;
- }
-
- buf = malloc(fs_info->sectorsize);
- if (!buf) {
- btrfs_release_path(&path);
- return -ENOMEM;
- }
-
- while (1) {
- if (path.slots[0] >= btrfs_header_nritems(path.nodes[0])) {
- ret = btrfs_next_leaf(extent_root, &path);
- if (ret < 0)
- break;
- if (ret) {
- ret = 0;
- break;
- }
- }
- leaf = path.nodes[0];
-
- btrfs_item_key_to_cpu(leaf, &key, path.slots[0]);
- if (key.type != BTRFS_EXTENT_ITEM_KEY) {
- path.slots[0]++;
- continue;
- }
-
- ei = btrfs_item_ptr(leaf, path.slots[0], struct btrfs_extent_item);
- if (!(btrfs_extent_flags(leaf, ei) & BTRFS_EXTENT_FLAG_DATA)) {
- path.slots[0]++;
- continue;
- }
-
- ret = populate_csum(trans, fs_info, buf, key.objectid, key.offset);
- if (ret)
- break;
- path.slots[0]++;
- }
-
- btrfs_release_path(&path);
- free(buf);
-
- /* dont' commit if thre's error */
- ret = btrfs_commit_transaction(trans, extent_root);
-
- return ret;
-}
-
-int rewrite_checksums(struct btrfs_fs_info *fs_info, int csum_type)
-{
- struct btrfs_root *root;
- struct btrfs_super_block *disk_super;
- struct btrfs_trans_handle *trans;
- struct btrfs_path path;
- struct btrfs_key key;
- u64 super_flags;
- int ret;
-
- disk_super = fs_info->super_copy;
- super_flags = btrfs_super_flags(disk_super);
-
- /* FIXME: Sanity checks */
- if (0) {
- error("UUID rewrite in progress, cannot change csum");
- return 1;
- }
-
- pr_verbose(LOG_DEFAULT, "Change csum from %s to %s\n",
- btrfs_super_csum_name(fs_info->csum_type),
- btrfs_super_csum_name(csum_type));
-
- fs_info->force_csum_type = csum_type;
- root = fs_info->tree_root;
-
- /* Step 1 sets the in progress flag, no other change to the sb */
- pr_verbose(LOG_DEFAULT, "Set superblock flag CHANGING_CSUM\n");
- trans = btrfs_start_transaction(root, 1);
- if (IS_ERR(trans)) {
- ret = PTR_ERR(trans);
- errno = -ret;
- error_msg(ERROR_MSG_START_TRANS, "%m");
- return ret;
- }
-
- btrfs_init_path(&path);
- key.objectid = BTRFS_CSUM_TREE_TMP_OBJECTID;
- key.type = BTRFS_TEMPORARY_ITEM_KEY;
- key.offset = 0;
- ret = btrfs_search_slot(trans, root, &key, &path, 0, 0);
- if (ret < 0)
- return ret;
-
- if (ret == 1) {
- struct item {
- u64 offset;
- u64 generation;
- u16 csum_type;
- /*
- * - generation when last synced
- * - must recheck the whole tree anyway in case the fs
- * was mounted between and there are some extents missing
- */
- } item[1];
-
- ret = btrfs_create_root(trans, fs_info, BTRFS_CSUM_TREE_TMP_OBJECTID);
- if (ret < 0) {
- return ret;
- } else {
- item->offset = btrfs_header_bytenr(fs_info->csum_tree_tmp->node);
- item->generation = btrfs_super_generation(fs_info->super_copy);
- item->csum_type = csum_type;
- ret = btrfs_insert_item(trans, fs_info->tree_root, &key, item,
- sizeof(*item));
- if (ret < 0)
- return ret;
- }
- } else {
- error("updating existing tmp csum root not implemented");
- exit(1);
- }
-
- super_flags |= BTRFS_SUPER_FLAG_CHANGING_CSUM;
- btrfs_set_super_flags(disk_super, super_flags);
- /* Change csum type here */
- btrfs_set_super_csum_type(disk_super, csum_type);
- ret = btrfs_commit_transaction(trans, root);
- if (ret < 0)
- return ret;
- btrfs_release_path(&path);
-
- struct {
- struct btrfs_root *root;
- const char *name;
- u64 objectid;
- bool p;
- bool g;
- } trees[] = {
- { .p = true, .root = fs_info->tree_root, .name = "root tree" },
- { .p = true, .root = fs_info->chunk_root, .name = "chunk tree" },
- { .p = true, .root = fs_info->dev_root, .name = "dev tree" },
- { .p = true, .root = fs_info->uuid_root, .name = "uuid tree" },
- { .p = true, .root = fs_info->quota_root, .name = "quota tree" },
- { .p = true, .root = fs_info->block_group_root, .name = "block group tree" },
- { .g = true, .objectid = BTRFS_EXTENT_TREE_OBJECTID, .name = "extent tree" },
- { .g = true, .objectid = BTRFS_CSUM_TREE_OBJECTID, .name = "csum tree" },
- { .g = true, .objectid = BTRFS_FREE_SPACE_TREE_OBJECTID, .name = "free space tree" },
- { .p = true, .root = fs_info->csum_tree_tmp, .name = "csum tmp tree" },
- { .objectid = BTRFS_DATA_RELOC_TREE_OBJECTID, .name = "data reloc tree" },
- { .objectid = BTRFS_FS_TREE_OBJECTID, .name = "fs tree" },
- /* TODO: iterate all fs trees */
- /* TODO: crashes if trees not present */
- /* { .objectid = BTRFS_TREE_LOG_OBJECTID, .name = "tree log tree" }, */
- /* { .objectid = BTRFS_TREE_RELOC_OBJECTID, .name = "tree reloc tree" }, */
- /* { .objectid = BTRFS_BLOCK_GROUP_TREE_OBJECTID, .name = "block group tree" }, */
- };
-
- for (int i = 0; i < ARRAY_SIZE(trees); i++) {
- pr_verbose(LOG_DEFAULT, "Change csum in %s\n", trees[i].name);
- if (trees[i].p) {
- root = trees[i].root;
- if (!root)
- continue;
- } else if (trees[i].g) {
- key.objectid = trees[i].objectid;
- key.type = BTRFS_ROOT_ITEM_KEY;
- key.offset = 0;
- root = btrfs_global_root(fs_info, &key);
- if (!root)
- continue;
- } else {
- key.objectid = trees[i].objectid;
- key.type = BTRFS_ROOT_ITEM_KEY;
- key.offset = (u64)-1;
- root = btrfs_read_fs_root_no_cache(fs_info, &key);
- if (!root)
- continue;
- }
- ret = change_tree_csum(trans, root, csum_type);
- if (ret < 0) {
- error("failed to change csum of %s: %d", trees[i].name, ret);
- goto out;
- }
- }
-
- /* DATA */
- pr_verbose(LOG_DEFAULT, "Change csum of data blocks\n");
- ret = fill_csum_tree_from_extent(fs_info);
- if (ret < 0)
- goto out;
-
- /* TODO: sync last status of old csum tree */
- /* TODO: delete old csum tree */
-
- /* Last, change csum in super */
- ret = write_all_supers(fs_info);
- if (ret < 0)
- goto out;
-
- /* All checksums done, drop the flag, super block csum will get updated */
- pr_verbose(LOG_DEFAULT, "Clear superblock flag CHANGING_CSUM\n");
- super_flags = btrfs_super_flags(fs_info->super_copy);
- super_flags &= ~BTRFS_SUPER_FLAG_CHANGING_CSUM;
- btrfs_set_super_flags(fs_info->super_copy, super_flags);
- btrfs_set_super_csum_type(disk_super, csum_type);
- ret = write_all_supers(fs_info);
- pr_verbose(LOG_DEFAULT, "Checksum change finished\n");
-out:
- /* check errors */
-
- return ret;
+ /* Phase 3, change the new csum key objectid */
+
+ /*
+ * Phase 4, change the csums for metadata.
+ *
+ * This has to be done in-place, as we don't have a good method
+ * like relocation in progs.
+ * Thus we have to support reading a tree block with either csum.
+ */
+ return -EOPNOTSUPP;
}
diff --git a/tune/main.c b/tune/main.c
index c3e18df5ed5c..e38c1f6d3729 100644
--- a/tune/main.c
+++ b/tune/main.c
@@ -373,7 +373,7 @@ int BOX_MAIN(btrfstune)(int argc, char *argv[])
if (csum_type != -1) {
/* TODO: check conflicting flags */
pr_verbose(LOG_DEFAULT, "Proceed to switch checksums\n");
- ret = rewrite_checksums(root->fs_info, csum_type);
+ ret = btrfs_change_csum_type(root->fs_info, csum_type);
}
if (change_metadata_uuid) {
diff --git a/tune/tune.h b/tune/tune.h
index 753dc95eb138..0ef249d89eee 100644
--- a/tune/tune.h
+++ b/tune/tune.h
@@ -32,6 +32,5 @@ int set_metadata_uuid(struct btrfs_root *root, const char *uuid_string);
int convert_to_bg_tree(struct btrfs_fs_info *fs_info);
int convert_to_extent_tree(struct btrfs_fs_info *fs_info);
-int rewrite_checksums(struct btrfs_fs_info *fs_info, int csum_type);
-
+int btrfs_change_csum_type(struct btrfs_fs_info *fs_info, u16 new_csum_type);
#endif
--
2.40.1
* [PATCH v2 2/7] btrfs-progs: tune: implement the prerequisite checks for csum change
From: Qu Wenruo @ 2023-05-18 2:10 UTC (permalink / raw)
To: linux-btrfs
The overall idea is to make sure there are no running operations (balance,
dev-replace, dirty log) on the fs before the csum change.
Also reject half-converted csums for now.
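As a rough, self-contained sketch of the checks described above: the
flag bits are the ones defined in patch 1, while struct
sketch_fs_state and sketch_check_prereqs() are hypothetical
simplifications in which the tree searches are reduced to plain
booleans.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Super flag bits as defined in patch 1 of this series. */
#define SKETCH_FLAG_CHANGING_DATA_CSUM	(1ULL << 36)
#define SKETCH_FLAG_CHANGING_META_CSUM	(1ULL << 37)

/* Hypothetical flattened view of the state the checks look at. */
struct sketch_fs_state {
	uint64_t super_flags;
	uint64_t log_root;		/* non-zero means a dirty log tree */
	int extent_tree_v2;		/* incompat feature enabled */
	int balance_item_found;		/* balance item exists in root tree */
	int dev_replace_item_found;	/* dev-replace item exists in dev tree */
};

/* Return 0 if the csum change may proceed, negative errno otherwise. */
static int sketch_check_prereqs(const struct sketch_fs_state *s)
{
	assert(s);

	if (s->log_root)
		return -EINVAL;
	if (s->extent_tree_v2)
		return -EOPNOTSUPP;
	/* Either flag means a previous run was interrupted. */
	if (s->super_flags & (SKETCH_FLAG_CHANGING_DATA_CSUM |
			      SKETCH_FLAG_CHANGING_META_CSUM))
		return -EOPNOTSUPP;
	if (s->balance_item_found)
		return -EINVAL;
	if (s->dev_replace_item_found)
		return -EINVAL;
	return 0;
}
```

The split between -EINVAL and -EOPNOTSUPP follows the patch: states
the user can fix (replay the log, finish the balance) get -EINVAL,
while cases not implemented yet get -EOPNOTSUPP.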
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
tune/change-csum.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 59 insertions(+)
diff --git a/tune/change-csum.c b/tune/change-csum.c
index 7a9f6351e7fe..daab70b6eb4a 100644
--- a/tune/change-csum.c
+++ b/tune/change-csum.c
@@ -26,9 +26,68 @@
#include "common/internal.h"
#include "tune/tune.h"
+static int check_csum_change_requreiment(struct btrfs_fs_info *fs_info)
+{
+ struct btrfs_root *tree_root = fs_info->tree_root;
+ struct btrfs_root *dev_root = fs_info->dev_root;
+ struct btrfs_path path = { 0 };
+ struct btrfs_key key;
+ int ret;
+
+ if (btrfs_super_log_root(fs_info->super_copy)) {
+ error("dirty log tree detected, please replay the log or zero it.");
+ return -EINVAL;
+ }
+ if (btrfs_fs_incompat(fs_info, EXTENT_TREE_V2)) {
+ error("no csum change support for extent-tree-v2 feature yet.");
+ return -EOPNOTSUPP;
+ }
+ if (btrfs_super_flags(fs_info->super_copy) &
+ (BTRFS_SUPER_FLAG_CHANGING_DATA_CSUM |
+ BTRFS_SUPER_FLAG_CHANGING_META_CSUM)) {
+ error("resume from half converted status is not yet supported");
+ return -EOPNOTSUPP;
+ }
+ key.objectid = BTRFS_BALANCE_OBJECTID;
+ key.type = BTRFS_TEMPORARY_ITEM_KEY;
+ key.offset = 0;
+ ret = btrfs_search_slot(NULL, tree_root, &key, &path, 0, 0);
+ btrfs_release_path(&path);
+ if (ret < 0) {
+ errno = -ret;
+ error("failed to check the balance status: %m");
+ return ret;
+ }
+ if (ret == 0) {
+ error("running balance detected, please finish or cancel it.");
+ return -EINVAL;
+ }
+
+ key.objectid = 0;
+ key.type = BTRFS_DEV_REPLACE_KEY;
+ key.offset = 0;
+ ret = btrfs_search_slot(NULL, dev_root, &key, &path, 0, 0);
+ btrfs_release_path(&path);
+ if (ret < 0) {
+ errno = -ret;
+ error("failed to check the dev-replace status: %m");
+ return ret;
+ }
+ if (ret == 0) {
+ error("running dev-replace detected, please finish or cancel it.");
+ return -EINVAL;
+ }
+ return 0;
+}
+
int btrfs_change_csum_type(struct btrfs_fs_info *fs_info, u16 new_csum_type)
{
+ int ret;
+
/* Phase 0, check conflicting features. */
+ ret = check_csum_change_requreiment(fs_info);
+ if (ret < 0)
+ return ret;
/*
* Phase 1, generate new data csums.
--
2.40.1
* [PATCH v2 3/7] btrfs-progs: tune: add the ability to read and verify the data before generating new checksum
From: Qu Wenruo @ 2023-05-18 2:10 UTC (permalink / raw)
To: linux-btrfs
This patch introduces a new helper function,
read_verify_one_data_sector(), to do the data read and checksum
verification (against the old csum).
This data will later be reused to generate a new csum.
And since we're introducing the helper function, we also build the
skeleton to iterate the data extents using the old csum tree.
This method is much better than iterating using the extent tree,
which has no direct indicator of whether the data extent has a csum or
not (nodatasum or preallocated).
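The read-and-verify idea (try each mirror until one copy matches the
old csum) can be sketched as follows. This is a simplified
illustration only: the mirrors are an in-memory array instead of a
disk read, the sector size is shrunk to 8 bytes, and sketch_csum() is
a toy additive checksum, so none of the names below are btrfs-progs
API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Tiny sector size so the example stays readable (assumption). */
#define SKETCH_SECTORSIZE 8

/* Toy additive checksum standing in for the old csum algorithm. */
static uint32_t sketch_csum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < len; i++)
		sum += buf[i];
	return sum;
}

/*
 * Try every mirror (numbered from 1, as in the patch) and stop at the
 * first copy whose checksum matches @old_csum.  On success @data_buf
 * holds the verified sector; if no copy matches, return -EIO.
 */
static int sketch_read_verify_sector(const uint8_t mirrors[][SKETCH_SECTORSIZE],
				     int num_copies, uint32_t old_csum,
				     uint8_t *data_buf)
{
	assert(mirrors && data_buf && num_copies > 0);

	for (int mirror = 1; mirror <= num_copies; mirror++) {
		/* Stand-in for reading one sector from this mirror. */
		memcpy(data_buf, mirrors[mirror - 1], SKETCH_SECTORSIZE);
		if (sketch_csum(data_buf, SKETCH_SECTORSIZE) == old_csum)
			return 0;
	}
	return -EIO;
}
```

Leaving the verified contents in the caller's buffer is what allows
the same data to be reused for the new csum without a second read.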
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
tune/change-csum.c | 244 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 244 insertions(+)
diff --git a/tune/change-csum.c b/tune/change-csum.c
index daab70b6eb4a..9d1b529e9c34 100644
--- a/tune/change-csum.c
+++ b/tune/change-csum.c
@@ -20,10 +20,12 @@
#include <stdlib.h>
#include "kernel-shared/ctree.h"
#include "kernel-shared/disk-io.h"
+#include "kernel-shared/volumes.h"
#include "kernel-shared/extent_io.h"
#include "kernel-shared/transaction.h"
#include "common/messages.h"
#include "common/internal.h"
+#include "common/utils.h"
#include "tune/tune.h"
static int check_csum_change_requreiment(struct btrfs_fs_info *fs_info)
@@ -80,6 +82,242 @@ static int check_csum_change_requreiment(struct btrfs_fs_info *fs_info)
return 0;
}
+static int get_last_csum_bytenr(struct btrfs_fs_info *fs_info, u64 *result)
+{
+ struct btrfs_root *csum_root = btrfs_csum_root(fs_info, 0);
+ struct btrfs_path path = { 0 };
+ struct btrfs_key key;
+ int ret;
+
+ key.objectid = BTRFS_EXTENT_CSUM_OBJECTID;
+ key.type = BTRFS_EXTENT_CSUM_KEY;
+ key.offset = (u64)-1;
+
+ ret = btrfs_search_slot(NULL, csum_root, &key, &path, 0, 0);
+ if (ret < 0)
+ return ret;
+ assert(ret > 0);
+ ret = btrfs_previous_item(csum_root, &path, BTRFS_EXTENT_CSUM_OBJECTID,
+ BTRFS_EXTENT_CSUM_KEY);
+ if (ret < 0)
+ return ret;
+ /*
+ * Empty csum tree, set last csum byte to 0 so we can skip new data
+ * csum generation.
+ */
+ if (ret > 0) {
+ *result = 0;
+ btrfs_release_path(&path);
+ return 0;
+ }
+ btrfs_item_key_to_cpu(path.nodes[0], &key, path.slots[0]);
+ *result = key.offset + btrfs_item_size(path.nodes[0], path.slots[0]) /
+ fs_info->csum_size * fs_info->sectorsize;
+ btrfs_release_path(&path);
+ return 0;
+}
+
+static int read_verify_one_data_sector(struct btrfs_fs_info *fs_info,
+ u64 logical, void *data_buf,
+ const void *old_csums)
+{
+ const u32 sectorsize = fs_info->sectorsize;
+ int num_copies = btrfs_num_copies(fs_info, logical, sectorsize);
+ bool found_good = false;
+
+ for (int mirror = 1; mirror <= num_copies; mirror++) {
+ u8 csum_has[BTRFS_CSUM_SIZE];
+ u64 readlen = sectorsize;
+ int ret;
+
+ ret = read_data_from_disk(fs_info, data_buf, logical, &readlen,
+ mirror);
+ if (ret < 0) {
+ errno = -ret;
+ error("failed to read logical %llu: %m", logical);
+ continue;
+ }
+ btrfs_csum_data(fs_info, fs_info->csum_type, data_buf, csum_has,
+ sectorsize);
+ if (memcmp(csum_has, old_csums, fs_info->csum_size) == 0) {
+ found_good = true;
+ break;
+ } else {
+ char found[BTRFS_CSUM_STRING_LEN];
+ char want[BTRFS_CSUM_STRING_LEN];
+
+ btrfs_format_csum(fs_info->csum_type, old_csums, want);
+ btrfs_format_csum(fs_info->csum_type, csum_has, found);
+ error("csum mismatch for logical %llu mirror %u, has %s expected %s",
+ logical, mirror, found, want);
+ }
+ }
+ if (!found_good)
+ return -EIO;
+ return 0;
+}
+
+static int generate_new_csum_range(struct btrfs_trans_handle *trans,
+ u64 logical, u64 length, u16 new_csum_type,
+ const void *old_csums)
+{
+ struct btrfs_fs_info *fs_info = trans->fs_info;
+ const u32 sectorsize = fs_info->sectorsize;
+ int ret = 0;
+ void *buf;
+
+ buf = malloc(fs_info->sectorsize);
+ if (!buf)
+ return -ENOMEM;
+
+ for (u64 cur = logical; cur < logical + length; cur += sectorsize) {
+ ret = read_verify_one_data_sector(fs_info, cur, buf, old_csums +
+ (cur - logical) / sectorsize * fs_info->csum_size);
+
+ if (ret < 0) {
+ error("failed to recover a good copy for data at logical %llu",
+ cur);
+ goto out;
+ }
+ /* Calculate new csum and insert it into the csum tree. */
+ ret = -EOPNOTSUPP;
+ }
+out:
+ free(buf);
+ return ret;
+}
+
+/*
+ * After reading this many bytes of data, commit the current transaction.
+ *
+ * Only a soft cap, we can exceed the threshold if hitting a large enough csum
+ * item.
+ */
+#define CSUM_CHANGE_BYTES_THRESHOLD (SZ_2M)
+static int generate_new_data_csums(struct btrfs_fs_info *fs_info, u16 new_csum_type)
+{
+ struct btrfs_root *tree_root = fs_info->tree_root;
+ struct btrfs_root *csum_root = btrfs_csum_root(fs_info, 0);
+ struct btrfs_trans_handle *trans;
+ struct btrfs_path path = { 0 };
+ struct btrfs_key key;
+ const u32 new_csum_size = btrfs_csum_type_size(new_csum_type);
+ void *csum_buffer;
+ u64 converted_bytes = 0;
+ u64 last_csum;
+ u64 cur = 0;
+ int ret;
+
+ ret = get_last_csum_bytenr(fs_info, &last_csum);
+ if (ret < 0) {
+ errno = -ret;
+ error("failed to get the last csum item: %m");
+ return ret;
+ }
+ csum_buffer = malloc(fs_info->nodesize);
+ if (!csum_buffer)
+ return -ENOMEM;
+
+ trans = btrfs_start_transaction(tree_root, 1);
+ if (IS_ERR(trans)) {
+ ret = PTR_ERR(trans);
+ errno = -ret;
+ error("failed to start transaction: %m");
+ goto out;
+ }
+ key.objectid = BTRFS_CSUM_CHANGE_OBJECTID;
+ key.type = BTRFS_TEMPORARY_ITEM_KEY;
+ key.offset = new_csum_type;
+ ret = btrfs_insert_empty_item(trans, tree_root, &path, &key, 0);
+ btrfs_release_path(&path);
+ if (ret < 0) {
+ errno = -ret;
+ error("failed to insert csum change item: %m");
+ btrfs_abort_transaction(trans, ret);
+ goto out;
+ }
+ btrfs_set_super_flags(fs_info->super_copy,
+ btrfs_super_flags(fs_info->super_copy) |
+ BTRFS_SUPER_FLAG_CHANGING_DATA_CSUM);
+ ret = btrfs_commit_transaction(trans, tree_root);
+ if (ret < 0) {
+ errno = -ret;
+ error("failed to commit the initial transaction: %m");
+ goto out;
+ }
+
+ trans = btrfs_start_transaction(csum_root,
+ CSUM_CHANGE_BYTES_THRESHOLD / fs_info->sectorsize *
+ new_csum_size);
+ if (IS_ERR(trans)) {
+ ret = PTR_ERR(trans);
+ errno = -ret;
+ error("failed to start transaction: %m");
+ goto out;
+ }
+
+ while (cur < last_csum) {
+ u64 start;
+ u64 len;
+ u32 item_size;
+
+ key.objectid = BTRFS_EXTENT_CSUM_OBJECTID;
+ key.type = BTRFS_EXTENT_CSUM_KEY;
+ key.offset = cur;
+
+ ret = btrfs_search_slot(NULL, csum_root, &key, &path, 0, 0);
+ if (ret < 0)
+ goto out;
+ if (ret > 0 && path.slots[0] >=
+ btrfs_header_nritems(path.nodes[0])) {
+ ret = btrfs_next_leaf(csum_root, &path);
+ if (ret > 0) {
+ ret = 0;
+ btrfs_release_path(&path);
+ break;
+ }
+ if (ret < 0) {
+ btrfs_release_path(&path);
+ goto out;
+ }
+ }
+ btrfs_item_key_to_cpu(path.nodes[0], &key, path.slots[0]);
+ assert(key.offset >= cur);
+ item_size = btrfs_item_size(path.nodes[0], path.slots[0]);
+
+ start = key.offset;
+ len = item_size / fs_info->csum_size * fs_info->sectorsize;
+ read_extent_buffer(path.nodes[0], csum_buffer,
+ btrfs_item_ptr_offset(path.nodes[0], path.slots[0]),
+ item_size);
+ btrfs_release_path(&path);
+
+ ret = generate_new_csum_range(trans, start, len, new_csum_type,
+ csum_buffer);
+ if (ret < 0)
+ goto out;
+ converted_bytes += len;
+ if (converted_bytes >= CSUM_CHANGE_BYTES_THRESHOLD) {
+ converted_bytes = 0;
+ ret = btrfs_commit_transaction(trans, csum_root);
+ if (ret < 0)
+ goto out;
+ trans = btrfs_start_transaction(csum_root,
+ CSUM_CHANGE_BYTES_THRESHOLD /
+ fs_info->sectorsize * new_csum_size);
+ if (IS_ERR(trans)) {
+ ret = PTR_ERR(trans);
+ goto out;
+ }
+ }
+ cur = start + len;
+ }
+ ret = btrfs_commit_transaction(trans, csum_root);
+out:
+ free(csum_buffer);
+ return ret;
+}
+
int btrfs_change_csum_type(struct btrfs_fs_info *fs_info, u16 new_csum_type)
{
int ret;
@@ -96,6 +334,12 @@ int btrfs_change_csum_type(struct btrfs_fs_info *fs_info, u16 new_csum_type)
* will be a temporary item in root tree to indicate the new checksum
* algo.
*/
+ ret = generate_new_data_csums(fs_info, new_csum_type);
+ if (ret < 0) {
+ errno = -ret;
+ error("failed to generate new data csums: %m");
+ return ret;
+ }
/* Phase 2, delete the old data csums. */
--
2.40.1
* [PATCH v2 4/7] btrfs-progs: tune: add the ability to generate new data checksums
2023-05-18 2:10 [PATCH v2 0/7] btrfs-progs: csum-change: add the initial support for offline csum type change Qu Wenruo
` (2 preceding siblings ...)
2023-05-18 2:10 ` [PATCH v2 3/7] btrfs-progs: tune: add the ability to read and verify the data before generating new checksum Qu Wenruo
@ 2023-05-18 2:10 ` Qu Wenruo
2023-05-18 2:10 ` [PATCH v2 5/7] btrfs-progs: tune: add the ability to delete old data csums Qu Wenruo
` (4 subsequent siblings)
8 siblings, 0 replies; 11+ messages in thread
From: Qu Wenruo @ 2023-05-18 2:10 UTC (permalink / raw)
To: linux-btrfs
This patch modifies btrfs_csum_file_block() to handle a csum type
other than the one used by the current fs.
The new data checksums use a different objectid (-13) to
distinguish them from the existing ones (-10).
This requires changing tree-checker to skip the item size checks,
since the new csum can be larger than the original csum.
After this stage, the resulting csum tree would look like this:
item 0 key (CSUM_CHANGE EXTENT_CSUM 13631488) itemoff 8091 itemsize 8192
range start 13631488 end 22020096 length 8388608
item 1 key (EXTENT_CSUM EXTENT_CSUM 13631488) itemoff 7067 itemsize 1024
range start 13631488 end 14680064 length 1048576
Note the itemsize is 8 times the original one, as the original csum is
CRC32, while target csum is SHA256, which is 8 times the size.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
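Note on the item sizes quoted above: the same set of sectors needs
new_csum_size / old_csum_size times as many bytes once re-checksummed. A
minimal sketch of that scaling, not part of the patch; the DEMO_* constants
and demo_new_item_size() are illustrative names, with 4 and 32 bytes being
the CRC32 and SHA256 digest sizes:

```c
#include <assert.h>
#include <stdint.h>

/* On-disk checksum sizes in bytes for the two types in the example. */
#define DEMO_CRC32_SIZE   4U
#define DEMO_SHA256_SIZE 32U

/* Size of an item holding checksums of the same sectors in the new type. */
static uint32_t demo_new_item_size(uint32_t old_item_size,
				   uint32_t old_csum_size,
				   uint32_t new_csum_size)
{
	assert(old_csum_size != 0 && old_item_size % old_csum_size == 0);
	return old_item_size / old_csum_size * new_csum_size;
}
```

For the dump above, a 1024-byte CRC32 item becomes an 8192-byte SHA256 item.
The "length 8388608" shown for item 0 is presumably computed with the old
4-byte csum size (8192 / 4 * 4096); with 32-byte csums the same item would
cover 1048576 bytes, matching item 1.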
check/mode-common.c | 11 ++++++-----
convert/main.c | 12 ++++++------
kernel-shared/file-item.c | 34 ++++++++++++++++++----------------
kernel-shared/file-item.h | 4 ++--
kernel-shared/tree-checker.c | 5 +++++
mkfs/rootdir.c | 11 ++++++-----
tune/change-csum.c | 10 +++++++++-
7 files changed, 52 insertions(+), 35 deletions(-)
diff --git a/check/mode-common.c b/check/mode-common.c
index a38d2afc6b6f..175e90f78bdc 100644
--- a/check/mode-common.c
+++ b/check/mode-common.c
@@ -1209,18 +1209,19 @@ static int populate_csum(struct btrfs_trans_handle *trans,
struct btrfs_root *csum_root, char *buf, u64 start,
u64 len)
{
+ struct btrfs_fs_info *fs_info = trans->fs_info;
u64 offset = 0;
- u64 sectorsize;
+ u64 sectorsize = fs_info->sectorsize;
int ret = 0;
while (offset < len) {
- sectorsize = gfs_info->sectorsize;
- ret = read_data_from_disk(gfs_info, buf, start + offset,
+ ret = read_data_from_disk(fs_info, buf, start + offset,
&sectorsize, 0);
if (ret)
break;
- ret = btrfs_csum_file_block(trans, start + len, start + offset,
- buf, sectorsize);
+ ret = btrfs_csum_file_block(trans, start + offset,
+ BTRFS_EXTENT_CSUM_OBJECTID,
+ fs_info->csum_type, buf);
if (ret)
break;
offset += sectorsize;
diff --git a/convert/main.c b/convert/main.c
index 9781200d7e42..0a62101d7e48 100644
--- a/convert/main.c
+++ b/convert/main.c
@@ -182,7 +182,8 @@ static int csum_disk_extent(struct btrfs_trans_handle *trans,
struct btrfs_root *root,
u64 disk_bytenr, u64 num_bytes)
{
- u32 blocksize = root->fs_info->sectorsize;
+ struct btrfs_fs_info *fs_info = trans->fs_info;
+ u32 blocksize = fs_info->sectorsize;
u64 offset;
char *buffer;
int ret = 0;
@@ -193,7 +194,7 @@ static int csum_disk_extent(struct btrfs_trans_handle *trans,
for (offset = 0; offset < num_bytes; offset += blocksize) {
u64 read_len = blocksize;
- ret = read_data_from_disk(root->fs_info, buffer,
+ ret = read_data_from_disk(fs_info, buffer,
disk_bytenr + offset, &read_len, 0);
if (ret)
break;
@@ -203,10 +204,9 @@ static int csum_disk_extent(struct btrfs_trans_handle *trans,
ret = -EIO;
break;
}
- ret = btrfs_csum_file_block(trans,
- disk_bytenr + num_bytes,
- disk_bytenr + offset,
- buffer, blocksize);
+ ret = btrfs_csum_file_block(trans, disk_bytenr + offset,
+ BTRFS_EXTENT_CSUM_OBJECTID,
+ fs_info->csum_type, buffer);
if (ret)
break;
}
diff --git a/kernel-shared/file-item.c b/kernel-shared/file-item.c
index 9b59a4b7a9ae..1a2f5f147328 100644
--- a/kernel-shared/file-item.c
+++ b/kernel-shared/file-item.c
@@ -134,7 +134,7 @@ static struct btrfs_csum_item *
btrfs_lookup_csum(struct btrfs_trans_handle *trans,
struct btrfs_root *root,
struct btrfs_path *path,
- u64 bytenr, int cow)
+ u64 bytenr, u64 csum_objectid, u16 csum_type, int cow)
{
int ret;
struct btrfs_key file_key;
@@ -142,10 +142,10 @@ btrfs_lookup_csum(struct btrfs_trans_handle *trans,
struct btrfs_csum_item *item;
struct extent_buffer *leaf;
u64 csum_offset = 0;
- u16 csum_size = root->fs_info->csum_size;
+ u16 csum_size = btrfs_csum_type_size(csum_type);
int csums_in_item;
- file_key.objectid = BTRFS_EXTENT_CSUM_OBJECTID;
+ file_key.objectid = csum_objectid;
file_key.offset = bytenr;
file_key.type = BTRFS_EXTENT_CSUM_KEY;
ret = btrfs_search_slot(trans, root, &file_key, path, 0, cow);
@@ -159,7 +159,8 @@ btrfs_lookup_csum(struct btrfs_trans_handle *trans,
goto fail;
path->slots[0]--;
btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
- if (found_key.type != BTRFS_EXTENT_CSUM_KEY)
+ if (found_key.type != BTRFS_EXTENT_CSUM_KEY ||
+ found_key.objectid != csum_objectid)
goto fail;
csum_offset = (bytenr - found_key.offset) /
@@ -182,10 +183,10 @@ fail:
return ERR_PTR(ret);
}
-int btrfs_csum_file_block(struct btrfs_trans_handle *trans,
- u64 alloc_end, u64 bytenr, char *data, size_t len)
+int btrfs_csum_file_block(struct btrfs_trans_handle *trans, u64 logical,
+ u64 csum_objectid, u32 csum_type, const char *data)
{
- struct btrfs_root *root = btrfs_csum_root(trans->fs_info, bytenr);
+ struct btrfs_root *root = btrfs_csum_root(trans->fs_info, logical);
int ret = 0;
struct btrfs_key file_key;
struct btrfs_key found_key;
@@ -199,18 +200,18 @@ int btrfs_csum_file_block(struct btrfs_trans_handle *trans,
u32 sectorsize = root->fs_info->sectorsize;
u32 nritems;
u32 ins_size;
- u16 csum_size = root->fs_info->csum_size;
- u16 csum_type = root->fs_info->csum_type;
+ u16 csum_size = btrfs_csum_type_size(csum_type);
path = btrfs_alloc_path();
if (!path)
return -ENOMEM;
- file_key.objectid = BTRFS_EXTENT_CSUM_OBJECTID;
- file_key.offset = bytenr;
+ file_key.objectid = csum_objectid;
+ file_key.offset = logical;
file_key.type = BTRFS_EXTENT_CSUM_KEY;
- item = btrfs_lookup_csum(trans, root, path, bytenr, 1);
+ item = btrfs_lookup_csum(trans, root, path, logical, csum_objectid,
+ csum_type, 1);
if (!IS_ERR(item)) {
leaf = path->nodes[0];
ret = 0;
@@ -241,7 +242,7 @@ int btrfs_csum_file_block(struct btrfs_trans_handle *trans,
slot = 0;
}
btrfs_item_key_to_cpu(path->nodes[0], &found_key, slot);
- if (found_key.objectid != BTRFS_EXTENT_CSUM_OBJECTID ||
+ if (found_key.objectid != csum_objectid ||
found_key.type != BTRFS_EXTENT_CSUM_KEY) {
found_next = 1;
goto insert;
@@ -270,7 +271,7 @@ int btrfs_csum_file_block(struct btrfs_trans_handle *trans,
leaf = path->nodes[0];
btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
csum_offset = (file_key.offset - found_key.offset) / sectorsize;
- if (found_key.objectid != BTRFS_EXTENT_CSUM_OBJECTID ||
+ if (found_key.objectid != csum_objectid ||
found_key.type != BTRFS_EXTENT_CSUM_KEY ||
csum_offset >= MAX_CSUM_ITEMS(root, csum_size)) {
goto insert;
@@ -290,7 +291,7 @@ insert:
btrfs_release_path(path);
csum_offset = 0;
if (found_next) {
- u64 tmp = min(alloc_end, next_offset);
+ u64 tmp = min(logical + sectorsize, next_offset);
tmp -= file_key.offset;
tmp /= sectorsize;
tmp = max((u64)1, tmp);
@@ -314,7 +315,8 @@ csum:
item = (struct btrfs_csum_item *)((unsigned char *)item +
csum_offset * csum_size);
found:
- btrfs_csum_data(root->fs_info, csum_type, (u8 *)data, csum_result, len);
+ btrfs_csum_data(root->fs_info, csum_type, (u8 *)data, csum_result,
+ sectorsize);
write_extent_buffer(leaf, csum_result, (unsigned long)item,
csum_size);
btrfs_mark_buffer_dirty(path->nodes[0]);
diff --git a/kernel-shared/file-item.h b/kernel-shared/file-item.h
index 25dfecca3429..efbe5f2093aa 100644
--- a/kernel-shared/file-item.h
+++ b/kernel-shared/file-item.h
@@ -80,8 +80,8 @@ int btrfs_insert_file_extent(struct btrfs_trans_handle *trans,
struct btrfs_root *root,
u64 objectid, u64 pos, u64 offset,
u64 disk_num_bytes, u64 num_bytes);
-int btrfs_csum_file_block(struct btrfs_trans_handle *trans,
- u64 alloc_end, u64 bytenr, char *data, size_t len);
+int btrfs_csum_file_block(struct btrfs_trans_handle *trans, u64 logical,
+ u64 csum_objectid, u32 csum_type, const char *data);
int btrfs_insert_inline_extent(struct btrfs_trans_handle *trans,
struct btrfs_root *root, u64 objectid,
u64 offset, const char *buffer, size_t size);
diff --git a/kernel-shared/tree-checker.c b/kernel-shared/tree-checker.c
index b28e42821533..107975891fe7 100644
--- a/kernel-shared/tree-checker.c
+++ b/kernel-shared/tree-checker.c
@@ -367,6 +367,11 @@ static int check_csum_item(struct extent_buffer *leaf, struct btrfs_key *key,
u32 sectorsize = fs_info->sectorsize;
const u32 csumsize = fs_info->csum_size;
+ /* For fs under csum change, we should not check the regular csum items. */
+ if (unlikely(btrfs_super_flags(fs_info->super_copy) &
+ (BTRFS_SUPER_FLAG_CHANGING_DATA_CSUM |
+ BTRFS_SUPER_FLAG_CHANGING_META_CSUM)))
+ return 0;
if (unlikely(key->objectid != BTRFS_EXTENT_CSUM_OBJECTID)) {
generic_err(leaf, slot,
"invalid key objectid for csum item, have %llu expect %llu",
diff --git a/mkfs/rootdir.c b/mkfs/rootdir.c
index 5fd3c6feea5c..4f7feb529998 100644
--- a/mkfs/rootdir.c
+++ b/mkfs/rootdir.c
@@ -306,12 +306,13 @@ static int add_file_items(struct btrfs_trans_handle *trans,
struct btrfs_inode_item *btrfs_inode, u64 objectid,
struct stat *st, const char *path_name)
{
+ struct btrfs_fs_info *fs_info = trans->fs_info;
int ret = -1;
ssize_t ret_read;
u64 bytes_read = 0;
struct btrfs_key key;
int blocks;
- u32 sectorsize = root->fs_info->sectorsize;
+ u32 sectorsize = fs_info->sectorsize;
u64 first_block = 0;
u64 file_pos = 0;
u64 cur_bytes;
@@ -332,7 +333,7 @@ static int add_file_items(struct btrfs_trans_handle *trans,
if (st->st_size % sectorsize)
blocks += 1;
- if (st->st_size <= BTRFS_MAX_INLINE_DATA_SIZE(root->fs_info) &&
+ if (st->st_size <= BTRFS_MAX_INLINE_DATA_SIZE(fs_info) &&
st->st_size < sectorsize) {
char *buffer = malloc(st->st_size);
@@ -397,9 +398,9 @@ again:
goto end;
}
- ret = btrfs_csum_file_block(trans,
- first_block + bytes_read + sectorsize,
- first_block + bytes_read, buf, sectorsize);
+ ret = btrfs_csum_file_block(trans, first_block + bytes_read,
+ BTRFS_EXTENT_CSUM_OBJECTID,
+ fs_info->csum_type, buf);
if (ret)
goto end;
diff --git a/tune/change-csum.c b/tune/change-csum.c
index 9d1b529e9c34..a30d142c1600 100644
--- a/tune/change-csum.c
+++ b/tune/change-csum.c
@@ -21,6 +21,7 @@
#include "kernel-shared/ctree.h"
#include "kernel-shared/disk-io.h"
#include "kernel-shared/volumes.h"
+#include "kernel-shared/file-item.h"
#include "kernel-shared/extent_io.h"
#include "kernel-shared/transaction.h"
#include "common/messages.h"
@@ -180,7 +181,14 @@ static int generate_new_csum_range(struct btrfs_trans_handle *trans,
goto out;
}
/* Calculate new csum and insert it into the csum tree. */
- ret = -EOPNOTSUPP;
+ ret = btrfs_csum_file_block(trans, cur,
+ BTRFS_CSUM_CHANGE_OBJECTID, new_csum_type, buf);
+ if (ret < 0) {
+ errno = -ret;
+ error("failed to insert new csum for data at logical %llu: %m",
+ cur);
+ goto out;
+ }
}
out:
free(buf);
--
2.40.1
* [PATCH v2 5/7] btrfs-progs: tune: add the ability to delete old data csums
2023-05-18 2:10 [PATCH v2 0/7] btrfs-progs: csum-change: add the initial support for offline csum type change Qu Wenruo
` (3 preceding siblings ...)
2023-05-18 2:10 ` [PATCH v2 4/7] btrfs-progs: tune: add the ability to generate new data checksums Qu Wenruo
@ 2023-05-18 2:10 ` Qu Wenruo
2023-05-18 2:10 ` [PATCH v2 6/7] btrfs-progs: tune: add the ability to migrate the temporary csum items to regular csum items Qu Wenruo
` (3 subsequent siblings)
8 siblings, 0 replies; 11+ messages in thread
From: Qu Wenruo @ 2023-05-18 2:10 UTC (permalink / raw)
To: linux-btrfs
The new helper function, delete_old_data_csums(), deletes the old
data csums while keeping the new ones untouched.
Since the new data csums have a key objectid (-13) smaller than the
old data csums (-10), we can safely delete from the tail of the btree.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
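Note on why deleting from the tail is safe: keys are compared as unsigned
(objectid, type, offset) tuples, so every temporary csum item sorts before
every old csum item. A minimal sketch of that ordering, not part of the
patch; struct demo_key and demo_comp_keys() are simplified stand-ins, and
the DEMO_* constants mirror the -10/-13 objectids named above:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_EXTENT_CSUM_OBJECTID ((uint64_t)-10) /* old/regular data csums */
#define DEMO_CSUM_CHANGE_OBJECTID ((uint64_t)-13) /* temporary new data csums */
#define DEMO_EXTENT_CSUM_KEY      128

struct demo_key {
	uint64_t objectid;
	uint8_t type;
	uint64_t offset;
};

/* Compare (objectid, type, offset) in that order; all fields are unsigned. */
static int demo_comp_keys(const struct demo_key *a, const struct demo_key *b)
{
	assert(a && b);
	if (a->objectid != b->objectid)
		return a->objectid < b->objectid ? -1 : 1;
	if (a->type != b->type)
		return a->type < b->type ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}
```

Even the largest possible temporary key compares lower than the smallest old
csum key, so the old items always form the tail of the csum tree.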
tune/change-csum.c | 65 ++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 65 insertions(+)
diff --git a/tune/change-csum.c b/tune/change-csum.c
index a30d142c1600..61368ddf34b9 100644
--- a/tune/change-csum.c
+++ b/tune/change-csum.c
@@ -326,6 +326,68 @@ out:
return ret;
}
+static int delete_old_data_csums(struct btrfs_fs_info *fs_info)
+{
+ struct btrfs_root *csum_root = btrfs_csum_root(fs_info, 0);
+ struct btrfs_trans_handle *trans;
+ struct btrfs_path path = { 0 };
+ struct btrfs_key last_key;
+ int ret;
+
+ last_key.objectid = BTRFS_EXTENT_CSUM_OBJECTID;
+ last_key.type = BTRFS_EXTENT_CSUM_KEY;
+ last_key.offset = (u64)-1;
+
+ trans = btrfs_start_transaction(csum_root, 1);
+ if (IS_ERR(trans)) {
+ ret = PTR_ERR(trans);
+ errno = -ret;
+ error("failed to start transaction to delete old data csums: %m");
+ return ret;
+ }
+ while (true) {
+ int start_slot;
+ int nr;
+
+ ret = btrfs_search_slot(trans, csum_root, &last_key, &path, -1, 1);
+
+ nr = btrfs_header_nritems(path.nodes[0]);
+ /* No item left (empty csum tree), exit. */
+ if (!nr)
+ break;
+ for (start_slot = 0; start_slot < nr; start_slot++) {
+ struct btrfs_key found_key;
+
+ btrfs_item_key_to_cpu(path.nodes[0], &found_key, start_slot);
+ /* Break from the for loop, we found the first old csum. */
+ if (found_key.objectid == BTRFS_EXTENT_CSUM_OBJECTID)
+ break;
+ }
+ /* No more old csum item detected, exit. */
+ if (start_slot == nr)
+ break;
+
+ /* Delete items starting from @start_slot to the end. */
+ ret = btrfs_del_items(trans, csum_root, &path, start_slot,
+ nr - start_slot);
+ if (ret < 0) {
+ errno = -ret;
+ error("failed to delete items: %m");
+ break;
+ }
+ btrfs_release_path(&path);
+ }
+ btrfs_release_path(&path);
+ if (ret < 0)
+ btrfs_abort_transaction(trans, ret);
+ ret = btrfs_commit_transaction(trans, csum_root);
+ if (ret < 0) {
+ errno = -ret;
+ error("failed to commit transaction after deleting the old data csums: %m");
+ }
+ return ret;
+}
+
int btrfs_change_csum_type(struct btrfs_fs_info *fs_info, u16 new_csum_type)
{
int ret;
@@ -350,6 +412,9 @@ int btrfs_change_csum_type(struct btrfs_fs_info *fs_info, u16 new_csum_type)
}
/* Phase 2, delete the old data csums. */
+ ret = delete_old_data_csums(fs_info);
+ if (ret < 0)
+ return ret;
/* Phase 3, change the new csum key objectid */
--
2.40.1
* [PATCH v2 6/7] btrfs-progs: tune: add the ability to migrate the temporary csum items to regular csum items
2023-05-18 2:10 [PATCH v2 0/7] btrfs-progs: csum-change: add the initial support for offline csum type change Qu Wenruo
` (4 preceding siblings ...)
2023-05-18 2:10 ` [PATCH v2 5/7] btrfs-progs: tune: add the ability to delete old data csums Qu Wenruo
@ 2023-05-18 2:10 ` Qu Wenruo
2023-05-18 2:10 ` [PATCH v2 7/7] btrfs-progs: tune: add the ability to change metadata csums Qu Wenruo
` (2 subsequent siblings)
8 siblings, 0 replies; 11+ messages in thread
From: Qu Wenruo @ 2023-05-18 2:10 UTC (permalink / raw)
To: linux-btrfs
At this stage, the csum tree should only contain the temporary csum
items (CSUM_CHANGE, EXTENT_CSUM, logical), and no more old csum items.
Now we can convert those temporary csum items back to regular csum items
by changing their key objectids back to EXTENT_CSUM.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
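Note on the rewrite direction: since EXTENT_CSUM (-10) is larger than
CSUM_CHANGE (-13), changing objectids from the last slot backwards keeps the
keys sorted after every single step, while going forwards would not. A
minimal sketch on a flat array, not part of the patch; it models one leaf
only, and all demo_* names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_EXTENT_CSUM_OBJECTID ((uint64_t)-10)
#define DEMO_CSUM_CHANGE_OBJECTID ((uint64_t)-13)

struct demo_key {
	uint64_t objectid;
	uint64_t offset;
};

/* True if keys are strictly ascending by (objectid, offset). */
static bool demo_sorted(const struct demo_key *keys, size_t nr)
{
	for (size_t i = 1; i < nr; i++) {
		const struct demo_key *p = &keys[i - 1];
		const struct demo_key *c = &keys[i];

		if (p->objectid > c->objectid ||
		    (p->objectid == c->objectid && p->offset >= c->offset))
			return false;
	}
	return true;
}

/*
 * Rewrite every objectid to EXTENT_CSUM, visiting slots in the given
 * direction. Returns false as soon as an intermediate state is unsorted.
 */
static bool demo_rewrite(struct demo_key *keys, size_t nr, bool from_tail)
{
	assert(demo_sorted(keys, nr));
	for (size_t n = 0; n < nr; n++) {
		size_t i = from_tail ? nr - 1 - n : n;

		keys[i].objectid = DEMO_EXTENT_CSUM_OBJECTID;
		if (!demo_sorted(keys, nr))
			return false;
	}
	return true;
}
```

In this model the tail-first pass succeeds and the head-first pass fails on
its first step; the real code additionally relies on
btrfs_set_item_key_safe() to validate each key change.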
tune/change-csum.c | 86 ++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 86 insertions(+)
diff --git a/tune/change-csum.c b/tune/change-csum.c
index 61368ddf34b9..167760536336 100644
--- a/tune/change-csum.c
+++ b/tune/change-csum.c
@@ -388,6 +388,89 @@ static int delete_old_data_csums(struct btrfs_fs_info *fs_info)
return ret;
}
+static int change_csum_objectids(struct btrfs_fs_info *fs_info)
+{
+ struct btrfs_root *csum_root = btrfs_csum_root(fs_info, 0);
+ struct btrfs_trans_handle *trans;
+ struct btrfs_path path = { 0 };
+ struct btrfs_key last_key;
+ u64 super_flags;
+ int ret = 0;
+
+ last_key.objectid = BTRFS_CSUM_CHANGE_OBJECTID;
+ last_key.type = BTRFS_EXTENT_CSUM_KEY;
+ last_key.offset = (u64)-1;
+
+ trans = btrfs_start_transaction(csum_root, 1);
+ if (IS_ERR(trans)) {
+ ret = PTR_ERR(trans);
+ errno = -ret;
+ error("failed to start transaction to change csum objectids: %m");
+ return ret;
+ }
+ while (true) {
+ struct btrfs_key found_key;
+ int nr;
+
+ ret = btrfs_search_slot(trans, csum_root, &last_key, &path, 0, 1);
+ if (ret < 0)
+ goto out;
+ assert(ret > 0);
+
+ nr = btrfs_header_nritems(path.nodes[0]);
+ /* No item left (empty csum tree), exit. */
+ if (!nr)
+ goto out;
+ /* No more temporary csum items, all converted, exit. */
+ if (path.slots[0] == 0)
+ goto out;
+
+ /* All csum items should be new csums. */
+ btrfs_item_key_to_cpu(path.nodes[0], &found_key, 0);
+ assert(found_key.objectid == BTRFS_CSUM_CHANGE_OBJECTID);
+
+ /*
+ * Start changing the objectids, since EXTENT_CSUM (-10) is
+ * larger than CSUM_CHANGE (-13), we always change from the tail.
+ */
+ for (int i = nr - 1; i >= 0; i--) {
+ btrfs_item_key_to_cpu(path.nodes[0], &found_key, i);
+ found_key.objectid = BTRFS_EXTENT_CSUM_OBJECTID;
+ path.slots[0] = i;
+ ret = btrfs_set_item_key_safe(csum_root, &path, &found_key);
+ if (ret < 0) {
+ errno = -ret;
+ error("failed to set item key for data csum at logical %llu: %m",
+ found_key.offset);
+ goto out;
+ }
+ }
+ btrfs_release_path(&path);
+ }
+out:
+ btrfs_release_path(&path);
+ if (ret < 0) {
+ btrfs_abort_transaction(trans, ret);
+ return ret;
+ }
+
+ /*
+ * All data csum items have been changed to the new type, we can clear
+ * the superblock flag for data csum change, and go to the metadata csum
+ * change phase.
+ */
+ super_flags = btrfs_super_flags(fs_info->super_copy);
+ super_flags &= ~BTRFS_SUPER_FLAG_CHANGING_DATA_CSUM;
+ super_flags |= BTRFS_SUPER_FLAG_CHANGING_META_CSUM;
+ btrfs_set_super_flags(fs_info->super_copy, super_flags);
+ ret = btrfs_commit_transaction(trans, csum_root);
+ if (ret < 0) {
+ errno = -ret;
+ error("failed to commit transaction after changing data csum objectids: %m");
+ }
+ return ret;
+}
+
int btrfs_change_csum_type(struct btrfs_fs_info *fs_info, u16 new_csum_type)
{
int ret;
@@ -417,6 +500,9 @@ int btrfs_change_csum_type(struct btrfs_fs_info *fs_info, u16 new_csum_type)
return ret;
/* Phase 3, change the new csum key objectid */
+ ret = change_csum_objectids(fs_info);
+ if (ret < 0)
+ return ret;
/*
* Phase 4, change the csums for metadata.
--
2.40.1
* [PATCH v2 7/7] btrfs-progs: tune: add the ability to change metadata csums
2023-05-18 2:10 [PATCH v2 0/7] btrfs-progs: csum-change: add the initial support for offline csum type change Qu Wenruo
` (5 preceding siblings ...)
2023-05-18 2:10 ` [PATCH v2 6/7] btrfs-progs: tune: add the ability to migrate the temporary csum items to regular csum items Qu Wenruo
@ 2023-05-18 2:10 ` Qu Wenruo
2023-05-22 12:14 ` [PATCH v2 0/7] btrfs-progs: csum-change: add the initial support for offline csum type change David Sterba
2023-05-22 20:19 ` David Sterba
8 siblings, 0 replies; 11+ messages in thread
From: Qu Wenruo @ 2023-05-18 2:10 UTC (permalink / raw)
To: linux-btrfs
The csum change for metadata is like uuid-change: we do an in-place
csum update without any COW.
During the rewrite, we manually check the csum (both old and new)
for each tree block,
and only rewrite the csum if the tree block matches its old csum.
(For a tree block that matches its new csum, nothing needs to be done.)
When everything is done, we update the superblock to reflect the
csum type change.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
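Note on the three-way decision described above: a block matching the old
csum is rewritten, a block matching the new csum is skipped, and anything
else is an error. A minimal sketch of that decision on an in-memory buffer,
not part of the patch; the two checksum functions are toy stand-ins (not
CRC32 or SHA256), and every demo_* name and DEMO_* size is an assumption
made for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_CSUM_AREA 32U /* bytes reserved for the csum at the block start */
#define DEMO_OLD_SIZE   4U /* toy "old" csum size */
#define DEMO_NEW_SIZE   8U /* toy "new" csum size */

enum demo_action { DEMO_REWRITE, DEMO_SKIP, DEMO_BAD_BLOCK };

static uint32_t demo_byte_sum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < len; i++)
		sum += data[i];
	return sum;
}

/* Toy "old" csum: the 32-bit byte sum. */
static void demo_csum_old(const uint8_t *data, size_t len, uint8_t *out)
{
	uint32_t sum = demo_byte_sum(data, len);

	memcpy(out, &sum, sizeof(sum));
}

/* Toy "new" csum: inverted byte sum followed by the payload length. */
static void demo_csum_new(const uint8_t *data, size_t len, uint8_t *out)
{
	uint32_t inv = ~demo_byte_sum(data, len);
	uint32_t len32 = (uint32_t)len;

	memcpy(out, &inv, sizeof(inv));
	memcpy(out + sizeof(inv), &len32, sizeof(len32));
}

/* Decide what to do with a block whose csum is stored in its first bytes. */
static enum demo_action demo_classify(const uint8_t *block, size_t size)
{
	uint8_t old_csum[DEMO_OLD_SIZE];
	uint8_t new_csum[DEMO_NEW_SIZE];

	assert(size > DEMO_CSUM_AREA);
	demo_csum_old(block + DEMO_CSUM_AREA, size - DEMO_CSUM_AREA, old_csum);
	demo_csum_new(block + DEMO_CSUM_AREA, size - DEMO_CSUM_AREA, new_csum);

	if (memcmp(block, old_csum, DEMO_OLD_SIZE) == 0)
		return DEMO_REWRITE;
	if (memcmp(block, new_csum, DEMO_NEW_SIZE) == 0)
		return DEMO_SKIP;
	return DEMO_BAD_BLOCK;
}

/* Apply the decision in place; returns 0 on success, -1 for a bad block. */
static int demo_convert_block(uint8_t *block, size_t size)
{
	switch (demo_classify(block, size)) {
	case DEMO_REWRITE:
		demo_csum_new(block + DEMO_CSUM_AREA, size - DEMO_CSUM_AREA, block);
		return 0;
	case DEMO_SKIP:
		return 0;
	default:
		return -1;
	}
}
```

Because an already converted block is skipped, running the conversion twice
over the same block is harmless in this model, which is the property the
cover letter relies on for interrupted metadata rewrites.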
tune/change-csum.c | 143 ++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 142 insertions(+), 1 deletion(-)
diff --git a/tune/change-csum.c b/tune/change-csum.c
index 167760536336..c8809300a143 100644
--- a/tune/change-csum.c
+++ b/tune/change-csum.c
@@ -471,8 +471,144 @@ out:
return ret;
}
+static int rewrite_tree_block_csum(struct btrfs_fs_info *fs_info, u64 logical,
+ u16 new_csum_type)
+{
+ struct extent_buffer *eb;
+ u8 result_old[BTRFS_CSUM_SIZE];
+ u8 result_new[BTRFS_CSUM_SIZE];
+ int ret;
+
+ eb = alloc_dummy_extent_buffer(fs_info, logical, fs_info->nodesize);
+ if (!eb)
+ return -ENOMEM;
+
+ ret = btrfs_read_extent_buffer(eb, 0, 0, NULL);
+ if (ret < 0) {
+ errno = -ret;
+ error("failed to read tree block at logical %llu: %m", logical);
+ goto out;
+ }
+
+ /* Verify the csum first. */
+ btrfs_csum_data(fs_info, fs_info->csum_type, (u8 *)eb->data + BTRFS_CSUM_SIZE,
+ result_old, fs_info->nodesize - BTRFS_CSUM_SIZE);
+ btrfs_csum_data(fs_info, new_csum_type, (u8 *)eb->data + BTRFS_CSUM_SIZE,
+ result_new, fs_info->nodesize - BTRFS_CSUM_SIZE);
+
+ /* Matches old csum, rewrite. */
+ if (memcmp_extent_buffer(eb, result_old, 0, fs_info->csum_size) == 0) {
+ write_extent_buffer(eb, result_new, 0,
+ btrfs_csum_type_size(new_csum_type));
+ ret = write_data_to_disk(fs_info, eb->data, eb->start,
+ fs_info->nodesize);
+ if (ret < 0) {
+ errno = -ret;
+ error("failed to write tree block at logical %llu: %m",
+ logical);
+ }
+ goto out;
+ }
+
+ /* Already new csum. */
+ if (memcmp_extent_buffer(eb, result_new, 0, fs_info->csum_size) == 0)
+ goto out;
+
+ /* Csum doesn't match either old or new csum type, bad tree block. */
+ ret = -EIO;
+ error("tree block csum mismatch at logical %llu", logical);
+out:
+ free_extent_buffer(eb);
+ return ret;
+}
+
+static int change_meta_csums(struct btrfs_fs_info *fs_info, u32 new_csum_type)
+{
+ struct btrfs_root *extent_root = btrfs_extent_root(fs_info, 0);
+ struct btrfs_path path = { 0 };
+ struct btrfs_key key;
+ int ret;
+
+ /*
+ * Disable metadata csum checks first, as we may hit tree blocks with
+ * either old or new csums.
+ * We will manually check the meta csums here.
+ */
+ fs_info->skip_csum_check = true;
+
+ key.objectid = 0;
+ key.type = 0;
+ key.offset = 0;
+
+ ret = btrfs_search_slot(NULL, extent_root, &key, &path, 0, 0);
+ if (ret < 0) {
+ errno = -ret;
+ error("failed to get the first tree block of extent tree: %m");
+ return ret;
+ }
+ assert(ret > 0);
+ while (true) {
+ btrfs_item_key_to_cpu(path.nodes[0], &key, path.slots[0]);
+ if (key.type != BTRFS_EXTENT_ITEM_KEY &&
+ key.type != BTRFS_METADATA_ITEM_KEY)
+ goto next;
+
+ if (key.type == BTRFS_EXTENT_ITEM_KEY) {
+ struct btrfs_extent_item *ei;
+ ei = btrfs_item_ptr(path.nodes[0], path.slots[0],
+ struct btrfs_extent_item);
+ if (btrfs_extent_flags(path.nodes[0], ei) &
+ BTRFS_EXTENT_FLAG_DATA)
+ goto next;
+ }
+ ret = rewrite_tree_block_csum(fs_info, key.objectid, new_csum_type);
+ if (ret < 0) {
+ errno = -ret;
+ error("failed to rewrite csum for tree block %llu: %m",
+ key.objectid);
+ goto out;
+ }
+next:
+ ret = btrfs_next_extent_item(extent_root, &path, U64_MAX);
+ if (ret < 0) {
+ errno = -ret;
+ error("failed to get next extent item: %m");
+ }
+ if (ret > 0) {
+ ret = 0;
+ goto out;
+ }
+ }
+out:
+ btrfs_release_path(&path);
+
+ /*
+ * Finish the change by clearing the csum change flag and updating the superblock
+ * csum type.
+ */
+ if (ret == 0) {
+ u64 super_flags = btrfs_super_flags(fs_info->super_copy);
+
+ btrfs_set_super_csum_type(fs_info->super_copy, new_csum_type);
+ super_flags &= ~(BTRFS_SUPER_FLAG_CHANGING_DATA_CSUM |
+ BTRFS_SUPER_FLAG_CHANGING_META_CSUM);
+ btrfs_set_super_flags(fs_info->super_copy, super_flags);
+
+ fs_info->csum_type = new_csum_type;
+ fs_info->csum_size = btrfs_csum_type_size(new_csum_type);
+
+ ret = write_all_supers(fs_info);
+ if (ret < 0) {
+ errno = -ret;
+ error("failed to write super blocks: %m");
+ }
+ }
+ return ret;
+}
+
int btrfs_change_csum_type(struct btrfs_fs_info *fs_info, u16 new_csum_type)
{
+ u16 old_csum_type = fs_info->csum_type;
int ret;
/* Phase 0, check conflicting features. */
@@ -511,5 +647,10 @@ int btrfs_change_csum_type(struct btrfs_fs_info *fs_info, u16 new_csum_type)
* like relocation in progs.
* Thus we have to support reading a tree block with either csum.
*/
- return -EOPNOTSUPP;
+ ret = change_meta_csums(fs_info, new_csum_type);
+ if (ret == 0)
+ printf("converted csum type from %s (%u) to %s (%u)\n",
+ btrfs_super_csum_name(old_csum_type), old_csum_type,
+ btrfs_super_csum_name(new_csum_type), new_csum_type);
+ return ret;
}
--
2.40.1
* Re: [PATCH v2 0/7] btrfs-progs: csum-change: add the initial support for offline csum type change
2023-05-18 2:10 [PATCH v2 0/7] btrfs-progs: csum-change: add the initial support for offline csum type change Qu Wenruo
` (6 preceding siblings ...)
2023-05-18 2:10 ` [PATCH v2 7/7] btrfs-progs: tune: add the ability to change metadata csums Qu Wenruo
@ 2023-05-22 12:14 ` David Sterba
2023-05-22 23:45 ` Qu Wenruo
2023-05-22 20:19 ` David Sterba
8 siblings, 1 reply; 11+ messages in thread
From: David Sterba @ 2023-05-22 12:14 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs
On Thu, May 18, 2023 at 10:10:38AM +0800, Qu Wenruo wrote:
> [CHANGELOG]
> v2:
> - Skip csum item checks if the fs is under csum change
> Tree-checker can be too sensitive if the csum size does not match the
> old csum size, which can lead to false alerts on overlapping csum
> items.
>
> But we still want the tree checker functionality overall, so just
> disable csum item related checks for csum change.
I still see some errors with v2, the same test that rotates the checksum
types on an increasingly filled filesystem (the one I sent you before):
ERROR: failed to insert csum change item: File exists
ERROR: failed to generate new data csums: File exists
WARNING: reserved space leaked, flag=0x4 bytes_reserved=16384
extent buffer leak: start 610811904 len 16384
extent buffer leak: start 5242880 len 16384
WARNING: dirty eb leak (aborted trans): start 5242880 len 16384
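For readers of the log above: "File exists" is the standard glibc message for EEXIST. The metadata patch earlier in this thread reports failures by storing the negated return value in errno and printing it with %m, and the sketch below assumes the csum change item path reports its error the same way. neg_errno_to_str() is a hypothetical helper for illustration, not a btrfs-progs function.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper (not part of btrfs-progs): map a negative
 * errno-style return value to its human-readable message, which is
 * what "errno = -ret;" followed by a "%m" format ends up printing.
 */
static const char *neg_errno_to_str(int ret)
{
	/* Callers are expected to pass a failure code such as -EEXIST. */
	assert(ret < 0);
	return strerror(-ret);
}
```

Calling neg_errno_to_str(-EEXIST) yields the "File exists" text seen in both ERROR lines, which points at an insert hitting an already existing key rather than at an I/O failure.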
* Re: [PATCH v2 0/7] btrfs-progs: csum-change: add the initial support for offline csum type change
2023-05-18 2:10 [PATCH v2 0/7] btrfs-progs: csum-change: add the initial support for offline csum type change Qu Wenruo
` (7 preceding siblings ...)
2023-05-22 12:14 ` [PATCH v2 0/7] btrfs-progs: csum-change: add the initial support for offline csum type change David Sterba
@ 2023-05-22 20:19 ` David Sterba
8 siblings, 0 replies; 11+ messages in thread
From: David Sterba @ 2023-05-22 20:19 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs
On Thu, May 18, 2023 at 10:10:38AM +0800, Qu Wenruo wrote:
> [TODO]
> - Support for resume
> Currently we won't resume an interrupted csum conversion.
> Although the design should be able to handle any interruption at data
> csum conversion part, and as long as metadata csum writes are atomic,
> the metadata rewrites should also be fine.
>
> - Support for revert if errors are found
> If we hit data csum mismatch and can not repair from any copy, then
> we should revert back to the old csum.
>
> - Support for precautious metadata check
> We'd better read and verify metadata before rewriting them.
>
> - Extra test cases
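The resume argument quoted above rests on the per-block decision from the cover letter: rewrite a tree block that still matches the old csum, skip one that already matches the new csum, and error out if it matches neither. A minimal sketch of that decision follows. The toy checksum functions and every name in it are assumptions made for illustration, not the btrfs-progs implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy checksum callback; stands in for a real csum algorithm. */
typedef uint32_t (*toy_csum_fn)(const uint8_t *data, size_t len);

enum meta_action {
	META_REWRITE,	/* block still carries the old csum */
	META_SKIP,	/* block already carries the new csum */
	META_ERROR,	/* block matches neither csum */
};

/* Hypothetical toy algorithms, only here to make the sketch testable. */
static uint32_t toy_sum(const uint8_t *data, size_t len)
{
	uint32_t acc = 0;

	for (size_t i = 0; i < len; i++)
		acc += data[i];
	return acc;
}

static uint32_t toy_xor(const uint8_t *data, size_t len)
{
	uint32_t acc = 0;

	for (size_t i = 0; i < len; i++)
		acc ^= data[i];
	return acc;
}

/*
 * Decide what to do with one metadata block, checking the old csum
 * first and the new csum second, in the order the cover letter lists.
 */
static enum meta_action classify_meta_block(const uint8_t *data, size_t len,
					    uint32_t stored,
					    toy_csum_fn old_fn,
					    toy_csum_fn new_fn)
{
	assert(data && old_fn && new_fn);

	if (stored == old_fn(data, len))
		return META_REWRITE;
	if (stored == new_fn(data, len))
		return META_SKIP;
	return META_ERROR;
}
```

Because each block is classified on its own, a rerun after an interruption would skip the blocks that were already rewritten, provided each csum write is atomic as the TODO item assumes.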
As the todo list for that feature is still long and it's behind the
experimental build, I'll keep the patches in devel. Please send
incremental fixes or further updates. Thanks.
* Re: [PATCH v2 0/7] btrfs-progs: csum-change: add the initial support for offline csum type change
2023-05-22 12:14 ` [PATCH v2 0/7] btrfs-progs: csum-change: add the initial support for offline csum type change David Sterba
@ 2023-05-22 23:45 ` Qu Wenruo
0 siblings, 0 replies; 11+ messages in thread
From: Qu Wenruo @ 2023-05-22 23:45 UTC (permalink / raw)
To: dsterba, Qu Wenruo; +Cc: linux-btrfs
On 2023/5/22 20:14, David Sterba wrote:
> On Thu, May 18, 2023 at 10:10:38AM +0800, Qu Wenruo wrote:
>> [CHANGELOG]
>> v2:
>> - Skip csum item checks if the fs is under csum change
>> Tree-checker can be too sensitive if the csum size does not match the
>> old csum size, which can lead to false alerts on overlapping csum
>> items.
>>
>> But we still want the tree checker functionality overall, so just
>> disable csum item related checks for csum change.
>
> I still see some errors with v2, the same test that rotates the checksum
> types on an increasingly filled filesystem (the one I sent you before):
>
> ERROR: failed to insert csum change item: File exists
Oh sh*t, my tests only do one csum type cycle, which is
CRC32->BLAKE2->SHA256->XXHASH, and then move on to the next mkfs.
But your incremental tests do multiple cycles (the incremental part is
not that big a deal, as after a full conversion it's no different from a
new fs filled to that state).
In that case, even my v2 patches forget to delete the csum change item
in the root tree.
One cyclic run won't fail, because the items all have different offsets,
but multiple cyclic runs fail as soon as we hit the same target csum
type a second time.
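The failure mode can be reproduced with a toy model. The sketch below assumes the csum change item is keyed by the target csum type (one reading of "different offsets" above) and uses hypothetical toy_* names throughout. It is not the btrfs-progs code, only an illustration of why a leftover item returns -EEXIST on the second cycle.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define TOY_NR_CSUM_TYPES 4	/* CRC32, XXHASH, SHA256, BLAKE2 */

/* Toy stand-in for the root tree: one slot per possible key offset. */
struct toy_root_tree {
	bool has_change_item[TOY_NR_CSUM_TYPES];
};

/* Insert the change item for @target, failing if the key already exists. */
static int toy_insert_change_item(struct toy_root_tree *t, int target)
{
	assert(target >= 0 && target < TOY_NR_CSUM_TYPES);

	if (t->has_change_item[target])
		return -EEXIST;
	t->has_change_item[target] = true;
	return 0;
}

/* Remove the change item for @target; this is the step v2 is missing. */
static void toy_delete_change_item(struct toy_root_tree *t, int target)
{
	assert(target >= 0 && target < TOY_NR_CSUM_TYPES);
	t->has_change_item[target] = false;
}

/*
 * One conversion to @target. With @cleanup false the item is left
 * behind, modelling the v2 behaviour described above.
 */
static int toy_convert(struct toy_root_tree *t, int target, bool cleanup)
{
	int ret = toy_insert_change_item(t, target);

	if (ret < 0)
		return ret;
	/* ... data and metadata csum conversion would happen here ... */
	if (cleanup)
		toy_delete_change_item(t, target);
	return 0;
}
```

In this model a single pass over all four types succeeds even without cleanup, and only the first conversion of the second pass fails, which matches a test that does one cycle per mkfs never seeing the error.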
The only thing that saved my backend is the detailed error messages...
Thanks,
Qu
> ERROR: failed to generate new data csums: File exists
> WARNING: reserved space leaked, flag=0x4 bytes_reserved=16384
> extent buffer leak: start 610811904 len 16384
> extent buffer leak: start 5242880 len 16384
> WARNING: dirty eb leak (aborted trans): start 5242880 len 16384
Thread overview: 11+ messages
2023-05-18 2:10 [PATCH v2 0/7] btrfs-progs: csum-change: add the initial support for offline csum type change Qu Wenruo
2023-05-18 2:10 ` [PATCH v2 1/7] btrfs-progs: tune: rework the main idea of csum change Qu Wenruo
2023-05-18 2:10 ` [PATCH v2 2/7] btrfs-progs: tune: implement the prerequisite checks for " Qu Wenruo
2023-05-18 2:10 ` [PATCH v2 3/7] btrfs-progs: tune: add the ability to read and verify the data before generating new checksum Qu Wenruo
2023-05-18 2:10 ` [PATCH v2 4/7] btrfs-progs: tune: add the ability to generate new data checksums Qu Wenruo
2023-05-18 2:10 ` [PATCH v2 5/7] btrfs-progs: tune: add the ability to delete old data csums Qu Wenruo
2023-05-18 2:10 ` [PATCH v2 6/7] btrfs-progs: tune: add the ability to migrate the temporary csum items to regular csum items Qu Wenruo
2023-05-18 2:10 ` [PATCH v2 7/7] btrfs-progs: tune: add the ability to change metadata csums Qu Wenruo
2023-05-22 12:14 ` [PATCH v2 0/7] btrfs-progs: csum-change: add the initial support for offline csum type change David Sterba
2023-05-22 23:45 ` Qu Wenruo
2023-05-22 20:19 ` David Sterba