* [PATCH 1/6] btrfs: Introduce support for FSID change without metadata rewrite
2018-10-11 15:03 [PATCH " Nikolay Borisov
@ 2018-10-11 15:03 ` Nikolay Borisov
0 siblings, 0 replies; 9+ messages in thread
From: Nikolay Borisov @ 2018-10-11 15:03 UTC (permalink / raw)
To: linux-btrfs; +Cc: Nikolay Borisov
This field is going to be used when the user wants to change the UUID
of the filesystem without having to rewrite all metadata blocks. This
field adds another level of indirection such that when the FSID is
changed what really happens is the current uuid (the one with which the
fs was created) is copied to the 'metadata_uuid' field in the superblock
as well as a new incompat flag is set METADATA_UUID. When the kernel
detects this flag is set it knows that the superblock in fact has 2
uuids:
1. Is the UUID which is user-visible, currently known as FSID.
2. Metadata UUID - this is the UUID which is stamped into all on-disk
datastructures belonging to this file system.
When the new incompat flag is present device scaning checks whether
both fsid/metadata_uuid of the scanned device match to any of the
registed filesystems. When the flag is not set then both UUIDs are
equal and only the FSID is retained on disk, metadata_uuid is set only
in-memory during mount.
Additionally a new metadata_uuid field is also added to the fs_info
struct. It's initialised either with the FSID in case METADATA_UUID
incompat flag is not set or with the metdata_uuid of the superblock
otherwise.
This commit introduces the new fields as well as the new incompat flag
and switches all users of the fsid to the new logic.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
fs/btrfs/ctree.c | 4 +--
fs/btrfs/ctree.h | 12 ++++---
fs/btrfs/disk-io.c | 32 ++++++++++++++----
fs/btrfs/extent-tree.c | 2 +-
fs/btrfs/volumes.c | 72 ++++++++++++++++++++++++++++++++---------
fs/btrfs/volumes.h | 1 +
include/uapi/linux/btrfs.h | 1 +
include/uapi/linux/btrfs_tree.h | 1 +
8 files changed, 97 insertions(+), 28 deletions(-)
diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 2ee43b6a4f09..11b5c2abeddc 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -224,7 +224,7 @@ int btrfs_copy_root(struct btrfs_trans_handle *trans,
else
btrfs_set_header_owner(cow, new_root_objectid);
- write_extent_buffer_fsid(cow, fs_info->fsid);
+ write_extent_buffer_fsid(cow, fs_info->metadata_fsid);
WARN_ON(btrfs_header_generation(buf) > trans->transid);
if (new_root_objectid == BTRFS_TREE_RELOC_OBJECTID)
@@ -1033,7 +1033,7 @@ static noinline int __btrfs_cow_block(struct btrfs_trans_handle *trans,
else
btrfs_set_header_owner(cow, root->root_key.objectid);
- write_extent_buffer_fsid(cow, fs_info->fsid);
+ write_extent_buffer_fsid(cow, fs_info->metadata_fsid);
ret = update_ref_for_cow(trans, root, buf, cow, &last_ref);
if (ret) {
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 15c659f23411..afa55f524a49 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -197,7 +197,7 @@ struct btrfs_root_backup {
struct btrfs_super_block {
u8 csum[BTRFS_CSUM_SIZE];
/* the first 4 fields must match struct btrfs_header */
- u8 fsid[BTRFS_FSID_SIZE]; /* FS specific uuid */
+ u8 fsid[BTRFS_FSID_SIZE]; /* userfacing FS specific uuid */
__le64 bytenr; /* this block number */
__le64 flags;
@@ -234,8 +234,10 @@ struct btrfs_super_block {
__le64 cache_generation;
__le64 uuid_tree_generation;
+ u8 metadata_uuid[BTRFS_FSID_SIZE]; /* The uuid written into btree blocks */
+
/* future expansion */
- __le64 reserved[30];
+ __le64 reserved[28];
u8 sys_chunk_array[BTRFS_SYSTEM_CHUNK_ARRAY_SIZE];
struct btrfs_root_backup super_roots[BTRFS_NUM_BACKUP_ROOTS];
} __attribute__ ((__packed__));
@@ -265,7 +267,8 @@ struct btrfs_super_block {
BTRFS_FEATURE_INCOMPAT_RAID56 | \
BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF | \
BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA | \
- BTRFS_FEATURE_INCOMPAT_NO_HOLES)
+ BTRFS_FEATURE_INCOMPAT_NO_HOLES | \
+ BTRFS_FEATURE_INCOMPAT_METADATA_UUID)
#define BTRFS_FEATURE_INCOMPAT_SAFE_SET \
(BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF)
@@ -746,7 +749,8 @@ struct btrfs_delayed_root;
#define BTRFS_FS_BALANCE_RUNNING 18
struct btrfs_fs_info {
- u8 fsid[BTRFS_FSID_SIZE];
+ u8 fsid[BTRFS_FSID_SIZE]; /* User-visible fs UUID */
+ u8 metadata_fsid[BTRFS_FSID_SIZE]; /* UUID written to btree blocks */
u8 chunk_tree_uuid[BTRFS_UUID_SIZE];
unsigned long flags;
struct btrfs_root *extent_root;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 27f6a3348f94..b61e4d47e316 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -551,7 +551,7 @@ static int csum_dirty_buffer(struct btrfs_fs_info *fs_info, struct page *page)
if (WARN_ON(!PageUptodate(page)))
return -EUCLEAN;
- ASSERT(memcmp_extent_buffer(eb, fs_info->fsid,
+ ASSERT(memcmp_extent_buffer(eb, fs_info->metadata_fsid,
btrfs_header_fsid(), BTRFS_FSID_SIZE) == 0);
return csum_tree_block(fs_info, eb, 0);
@@ -566,7 +566,19 @@ static int check_tree_block_fsid(struct btrfs_fs_info *fs_info,
read_extent_buffer(eb, fsid, btrfs_header_fsid(), BTRFS_FSID_SIZE);
while (fs_devices) {
- if (!memcmp(fsid, fs_devices->fsid, BTRFS_FSID_SIZE)) {
+ u8 *metadata_uuid;
+ /*
+ * Checking the incompat flag is only valid for the current
+ * fs. For seed devices it's forbidden to have their uuid
+ * changed so reading ->fsid in this case is fine
+ */
+ if (fs_devices == fs_info->fs_devices &&
+ btrfs_fs_incompat(fs_info, METADATA_UUID))
+ metadata_uuid = fs_devices->metadata_uuid;
+ else
+ metadata_uuid = fs_devices->fsid;
+
+ if (!memcmp(fsid, metadata_uuid, BTRFS_FSID_SIZE)) {
ret = 0;
break;
}
@@ -2478,10 +2490,11 @@ static int validate_super(struct btrfs_fs_info *fs_info,
ret = -EINVAL;
}
- if (memcmp(fs_info->fsid, sb->dev_item.fsid, BTRFS_FSID_SIZE) != 0) {
+ if (memcmp(fs_info->metadata_fsid, sb->dev_item.fsid,
+ BTRFS_FSID_SIZE) != 0) {
btrfs_err(fs_info,
- "dev_item UUID does not match fsid: %pU != %pU",
- fs_info->fsid, sb->dev_item.fsid);
+ "dev_item UUID does not match metadata fsid: %pU != %pU",
+ fs_info->metadata_fsid, sb->dev_item.fsid);
ret = -EINVAL;
}
@@ -2822,6 +2835,12 @@ int open_ctree(struct super_block *sb,
brelse(bh);
memcpy(fs_info->fsid, fs_info->super_copy->fsid, BTRFS_FSID_SIZE);
+ if (btrfs_fs_incompat(fs_info, METADATA_UUID)) {
+ memcpy(fs_info->metadata_fsid,
+ fs_info->super_copy->metadata_uuid, BTRFS_FSID_SIZE);
+ } else {
+ memcpy(fs_info->metadata_fsid, fs_info->fsid, BTRFS_FSID_SIZE);
+ }
ret = btrfs_validate_mount_super(fs_info);
if (ret) {
@@ -3760,7 +3779,8 @@ int write_all_supers(struct btrfs_fs_info *fs_info, int max_mirrors)
btrfs_set_stack_device_io_width(dev_item, dev->io_width);
btrfs_set_stack_device_sector_size(dev_item, dev->sector_size);
memcpy(dev_item->uuid, dev->uuid, BTRFS_UUID_SIZE);
- memcpy(dev_item->fsid, dev->fs_devices->fsid, BTRFS_FSID_SIZE);
+ memcpy(dev_item->fsid, dev->fs_devices->metadata_uuid,
+ BTRFS_FSID_SIZE);
flags = btrfs_super_flags(sb);
btrfs_set_super_flags(sb, flags | BTRFS_HEADER_FLAG_WRITTEN);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 97cb2c17802e..4926d0975242 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -8174,7 +8174,7 @@ btrfs_init_new_buffer(struct btrfs_trans_handle *trans, struct btrfs_root *root,
btrfs_set_header_generation(buf, trans->transid);
btrfs_set_header_backref_rev(buf, BTRFS_MIXED_BACKREF_REV);
btrfs_set_header_owner(buf, owner);
- write_extent_buffer_fsid(buf, fs_info->fsid);
+ write_extent_buffer_fsid(buf, fs_info->metadata_fsid);
write_extent_buffer_chunk_tree_uuid(buf, fs_info->chunk_tree_uuid);
if (root->root_key.objectid == BTRFS_TREE_LOG_OBJECTID) {
buf->log_index = root->log_transid % 2;
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index f435d397019e..e3e12c94834f 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -238,13 +238,15 @@ struct list_head *btrfs_get_fs_uuids(void)
/*
* alloc_fs_devices - allocate struct btrfs_fs_devices
- * @fsid: if not NULL, copy the uuid to fs_devices::fsid
+ * @fsid: if not NULL, copy the uuid to fs_devices::fsid
+ * @metadata_fsid: if not NULL, copy the uuid to fs_devices::metadata_fsid
*
* Return a pointer to a new struct btrfs_fs_devices on success, or ERR_PTR().
* The returned struct is not linked onto any lists and can be destroyed with
* kfree() right away.
*/
-static struct btrfs_fs_devices *alloc_fs_devices(const u8 *fsid)
+static struct btrfs_fs_devices *alloc_fs_devices(const u8 *fsid,
+ const u8 *metadata_fsid)
{
struct btrfs_fs_devices *fs_devs;
@@ -261,6 +263,11 @@ static struct btrfs_fs_devices *alloc_fs_devices(const u8 *fsid)
if (fsid)
memcpy(fs_devs->fsid, fsid, BTRFS_FSID_SIZE);
+ if (metadata_fsid)
+ memcpy(fs_devs->metadata_uuid, metadata_fsid, BTRFS_FSID_SIZE);
+ else if (fsid)
+ memcpy(fs_devs->metadata_uuid, fsid, BTRFS_FSID_SIZE);
+
return fs_devs;
}
@@ -368,13 +375,24 @@ static struct btrfs_device *find_device(struct btrfs_fs_devices *fs_devices,
return NULL;
}
-static noinline struct btrfs_fs_devices *find_fsid(u8 *fsid)
+static noinline struct btrfs_fs_devices *
+find_fsid(const u8 *fsid, const u8 *metadata_fsid)
{
struct btrfs_fs_devices *fs_devices;
+ ASSERT(fsid);
+
list_for_each_entry(fs_devices, &fs_uuids, fs_list) {
- if (memcmp(fsid, fs_devices->fsid, BTRFS_FSID_SIZE) == 0)
- return fs_devices;
+ if (metadata_fsid) {
+ if (memcmp(fsid, fs_devices->fsid, BTRFS_FSID_SIZE) == 0
+ && memcmp(metadata_fsid, fs_devices->metadata_uuid,
+ BTRFS_FSID_SIZE) == 0)
+ return fs_devices;
+ } else {
+ if (memcmp(fsid, fs_devices->fsid,
+ BTRFS_FSID_SIZE) == 0)
+ return fs_devices;
+ }
}
return NULL;
}
@@ -709,6 +727,12 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
device->generation = btrfs_super_generation(disk_super);
if (btrfs_super_flags(disk_super) & BTRFS_SUPER_FLAG_SEEDING) {
+ if (btrfs_super_incompat_flags(disk_super) &
+ BTRFS_FEATURE_INCOMPAT_METADATA_UUID) {
+ pr_err("BTRFS: Invalid seeding and uuid-changed device detected\n");
+ goto error_brelse;
+ }
+
clear_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state);
fs_devices->seeding = 1;
} else {
@@ -759,10 +783,21 @@ static noinline struct btrfs_device *device_list_add(const char *path,
struct rcu_string *name;
u64 found_transid = btrfs_super_generation(disk_super);
u64 devid = btrfs_stack_device_id(&disk_super->dev_item);
+ bool has_metadata_uuid = (btrfs_super_incompat_flags(disk_super) &
+ BTRFS_FEATURE_INCOMPAT_METADATA_UUID);
+
+ if (has_metadata_uuid)
+ fs_devices = find_fsid(disk_super->fsid, disk_super->metadata_uuid);
+ else
+ fs_devices = find_fsid(disk_super->fsid, NULL);
- fs_devices = find_fsid(disk_super->fsid);
if (!fs_devices) {
- fs_devices = alloc_fs_devices(disk_super->fsid);
+ if (has_metadata_uuid)
+ fs_devices = alloc_fs_devices(disk_super->fsid,
+ disk_super->metadata_uuid);
+ else
+ fs_devices = alloc_fs_devices(disk_super->fsid, NULL);
+
if (IS_ERR(fs_devices))
return ERR_CAST(fs_devices);
@@ -884,7 +919,7 @@ static struct btrfs_fs_devices *clone_fs_devices(struct btrfs_fs_devices *orig)
struct btrfs_device *device;
struct btrfs_device *orig_dev;
- fs_devices = alloc_fs_devices(orig->fsid);
+ fs_devices = alloc_fs_devices(orig->fsid, NULL);
if (IS_ERR(fs_devices))
return fs_devices;
@@ -1709,7 +1744,8 @@ static int btrfs_add_dev_item(struct btrfs_trans_handle *trans,
ptr = btrfs_device_uuid(dev_item);
write_extent_buffer(leaf, device->uuid, ptr, BTRFS_UUID_SIZE);
ptr = btrfs_device_fsid(dev_item);
- write_extent_buffer(leaf, trans->fs_info->fsid, ptr, BTRFS_FSID_SIZE);
+ write_extent_buffer(leaf, trans->fs_info->metadata_fsid, ptr,
+ BTRFS_FSID_SIZE);
btrfs_mark_buffer_dirty(leaf);
ret = 0;
@@ -2132,7 +2168,11 @@ static struct btrfs_device *btrfs_find_device_by_path(
disk_super = (struct btrfs_super_block *)bh->b_data;
devid = btrfs_stack_device_id(&disk_super->dev_item);
dev_uuid = disk_super->dev_item.uuid;
- device = btrfs_find_device(fs_info, devid, dev_uuid, disk_super->fsid);
+ if (btrfs_fs_incompat(fs_info, METADATA_UUID))
+ device = btrfs_find_device(fs_info, devid, dev_uuid, disk_super->metadata_uuid);
+ else
+ device = btrfs_find_device(fs_info, devid, dev_uuid, disk_super->fsid);
+
brelse(bh);
if (!device)
device = ERR_PTR(-ENOENT);
@@ -2202,7 +2242,7 @@ static int btrfs_prepare_sprout(struct btrfs_fs_info *fs_info)
if (!fs_devices->seeding)
return -EINVAL;
- seed_devices = alloc_fs_devices(NULL);
+ seed_devices = alloc_fs_devices(NULL, NULL);
if (IS_ERR(seed_devices))
return PTR_ERR(seed_devices);
@@ -2239,6 +2279,8 @@ static int btrfs_prepare_sprout(struct btrfs_fs_info *fs_info)
generate_random_uuid(fs_devices->fsid);
memcpy(fs_info->fsid, fs_devices->fsid, BTRFS_FSID_SIZE);
+ memcpy(fs_devices->metadata_uuid, fs_devices->fsid, BTRFS_FSID_SIZE);
+ memcpy(fs_info->metadata_fsid, fs_devices->fsid, BTRFS_FSID_SIZE);
memcpy(disk_super->fsid, fs_devices->fsid, BTRFS_FSID_SIZE);
mutex_unlock(&fs_devices->device_list_mutex);
@@ -6245,7 +6287,7 @@ struct btrfs_device *btrfs_find_device(struct btrfs_fs_info *fs_info, u64 devid,
cur_devices = fs_info->fs_devices;
while (cur_devices) {
if (!fsid ||
- !memcmp(cur_devices->fsid, fsid, BTRFS_FSID_SIZE)) {
+ !memcmp(cur_devices->metadata_uuid, fsid, BTRFS_FSID_SIZE)) {
device = find_device(cur_devices, devid, uuid);
if (device)
return device;
@@ -6574,12 +6616,12 @@ static struct btrfs_fs_devices *open_seed_devices(struct btrfs_fs_info *fs_info,
fs_devices = fs_devices->seed;
}
- fs_devices = find_fsid(fsid);
+ fs_devices = find_fsid(fsid, NULL);
if (!fs_devices) {
if (!btrfs_test_opt(fs_info, DEGRADED))
return ERR_PTR(-ENOENT);
- fs_devices = alloc_fs_devices(fsid);
+ fs_devices = alloc_fs_devices(fsid, NULL);
if (IS_ERR(fs_devices))
return fs_devices;
@@ -6629,7 +6671,7 @@ static int read_one_dev(struct btrfs_fs_info *fs_info,
read_extent_buffer(leaf, fs_uuid, btrfs_device_fsid(dev_item),
BTRFS_FSID_SIZE);
- if (memcmp(fs_uuid, fs_info->fsid, BTRFS_FSID_SIZE)) {
+ if (memcmp(fs_uuid, fs_info->metadata_fsid, BTRFS_FSID_SIZE)) {
fs_devices = open_seed_devices(fs_info, fs_uuid);
if (IS_ERR(fs_devices))
return PTR_ERR(fs_devices);
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index aefce895e994..04860497b33c 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -210,6 +210,7 @@ BTRFS_DEVICE_GETSET_FUNCS(bytes_used);
struct btrfs_fs_devices {
u8 fsid[BTRFS_FSID_SIZE]; /* FS specific uuid */
+ u8 metadata_uuid[BTRFS_FSID_SIZE];
struct list_head fs_list;
u64 num_devices;
diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index 5ca1d21fc4a7..e0763bc4158e 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -269,6 +269,7 @@ struct btrfs_ioctl_fs_info_args {
#define BTRFS_FEATURE_INCOMPAT_RAID56 (1ULL << 7)
#define BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA (1ULL << 8)
#define BTRFS_FEATURE_INCOMPAT_NO_HOLES (1ULL << 9)
+#define BTRFS_FEATURE_INCOMPAT_METADATA_UUID (1ULL << 10)
struct btrfs_ioctl_feature_flags {
__u64 compat_flags;
diff --git a/include/uapi/linux/btrfs_tree.h b/include/uapi/linux/btrfs_tree.h
index aff1356c2bb8..22f9299ba7d9 100644
--- a/include/uapi/linux/btrfs_tree.h
+++ b/include/uapi/linux/btrfs_tree.h
@@ -458,6 +458,7 @@ struct btrfs_free_space_header {
#define BTRFS_SUPER_FLAG_METADUMP (1ULL << 33)
#define BTRFS_SUPER_FLAG_METADUMP_V2 (1ULL << 34)
#define BTRFS_SUPER_FLAG_CHANGING_FSID (1ULL << 35)
+#define BTRFS_SUPER_FLAG_CHANGING_FSID_v2 (1ULL << 36)
/*
--
2.7.4
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH v3 0/6] FSID change kernel support
@ 2018-10-30 14:43 Nikolay Borisov
2018-10-30 14:43 ` [PATCH 1/6] btrfs: Introduce support for FSID change without metadata rewrite Nikolay Borisov
` (6 more replies)
0 siblings, 7 replies; 9+ messages in thread
From: Nikolay Borisov @ 2018-10-30 14:43 UTC (permalink / raw)
To: linux-btrfs; +Cc: Nikolay Borisov
Here is the 3rd submission for the kernel counterpart of the uuid change
patchset. The only difference is that I (hope) have adressed all cosmetic
feedback from David as well as have reworded some change logs to ease
understanding. I've also re-run the regression tests and no failure were
obsered.
For background information refer to first posting [0] and the second one [1]
[0] https://lore.kernel.org/linux-btrfs/1535531754-29774-1-git-send-email-nborisov@suse.com/
[1] https://lore.kernel.org/linux-btrfs/1539270244-27076-1-git-send-email-nborisov@suse.com/
Nikolay Borisov (6):
btrfs: Introduce support for FSID change without metadata rewrite
btrfs: Remove fsid/metadata_fsid fields from btrfs_info
btrfs: Add handling for disk split-brain scenario during fsid change
btrfs: Introduce 2 more members to struct btrfs_fs_devices
btrfs: Handle one more split-brain scenario during fsid change
btrfs: Handle final split-brain possibility during fsid change
fs/btrfs/check-integrity.c | 2 +-
fs/btrfs/ctree.c | 5 +-
fs/btrfs/ctree.h | 10 +-
fs/btrfs/disk-io.c | 53 ++++++++---
fs/btrfs/extent-tree.c | 2 +-
fs/btrfs/ioctl.c | 2 +-
fs/btrfs/super.c | 2 +-
fs/btrfs/volumes.c | 196 ++++++++++++++++++++++++++++++++++++----
fs/btrfs/volumes.h | 6 ++
include/trace/events/btrfs.h | 2 +-
include/uapi/linux/btrfs.h | 1 +
include/uapi/linux/btrfs_tree.h | 1 +
12 files changed, 241 insertions(+), 41 deletions(-)
--
2.7.4
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 1/6] btrfs: Introduce support for FSID change without metadata rewrite
2018-10-30 14:43 [PATCH v3 0/6] FSID change kernel support Nikolay Borisov
@ 2018-10-30 14:43 ` Nikolay Borisov
2018-10-30 14:43 ` [PATCH 2/6] btrfs: Remove fsid/metadata_fsid fields from btrfs_info Nikolay Borisov
` (5 subsequent siblings)
6 siblings, 0 replies; 9+ messages in thread
From: Nikolay Borisov @ 2018-10-30 14:43 UTC (permalink / raw)
To: linux-btrfs; +Cc: Nikolay Borisov
This field is going to be used when the user wants to change the UUID
of the filesystem without having to rewrite all metadata blocks. This
field adds another level of indirection such that when the FSID is
changed what really happens is the current UUID (the one with which the
fs was created) is copied to the 'metadata_uuid' field in the superblock
as well as a new incompat flag is set METADATA_UUID. When the kernel
detects this flag is set it knows that the superblock in fact has 2
UUIDs:
1. Is the UUID which is user-visible, currently known as FSID.
2. Metadata UUID - this is the UUID which is stamped into all on-disk
datastructures belonging to this file system.
When the new incompat flag is present device scaning checks whether
both fsid/metadata_uuid of the scanned device match to any of the
registed filesystems. When the flag is not set then both UUIDs are
equal and only the FSID is retained on disk, metadata_uuid is set only
in-memory during mount.
Additionally a new metadata_uuid field is also added to the fs_info
struct. It's initialised either with the FSID in case METADATA_UUID
incompat flag is not set or with the metdata_uuid of the superblock
otherwise.
This commit introduces the new fields as well as the new incompat flag
and switches all users of the fsid to the new logic.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
fs/btrfs/ctree.c | 4 +--
fs/btrfs/ctree.h | 12 ++++---
fs/btrfs/disk-io.c | 32 ++++++++++++++----
fs/btrfs/extent-tree.c | 2 +-
fs/btrfs/volumes.c | 72 ++++++++++++++++++++++++++++++++---------
fs/btrfs/volumes.h | 1 +
include/uapi/linux/btrfs.h | 1 +
include/uapi/linux/btrfs_tree.h | 1 +
8 files changed, 97 insertions(+), 28 deletions(-)
diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 539901fb5165..75cd41bf12f7 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -224,7 +224,7 @@ int btrfs_copy_root(struct btrfs_trans_handle *trans,
else
btrfs_set_header_owner(cow, new_root_objectid);
- write_extent_buffer_fsid(cow, fs_info->fsid);
+ write_extent_buffer_fsid(cow, fs_info->metadata_fsid);
WARN_ON(btrfs_header_generation(buf) > trans->transid);
if (new_root_objectid == BTRFS_TREE_RELOC_OBJECTID)
@@ -1050,7 +1050,7 @@ static noinline int __btrfs_cow_block(struct btrfs_trans_handle *trans,
else
btrfs_set_header_owner(cow, root->root_key.objectid);
- write_extent_buffer_fsid(cow, fs_info->fsid);
+ write_extent_buffer_fsid(cow, fs_info->metadata_fsid);
ret = update_ref_for_cow(trans, root, buf, cow, &last_ref);
if (ret) {
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 68ca41dbbef3..501ada9ec7bd 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -197,7 +197,7 @@ struct btrfs_root_backup {
struct btrfs_super_block {
u8 csum[BTRFS_CSUM_SIZE];
/* the first 4 fields must match struct btrfs_header */
- u8 fsid[BTRFS_FSID_SIZE]; /* FS specific uuid */
+ u8 fsid[BTRFS_FSID_SIZE]; /* userfacing FS specific uuid */
__le64 bytenr; /* this block number */
__le64 flags;
@@ -234,8 +234,10 @@ struct btrfs_super_block {
__le64 cache_generation;
__le64 uuid_tree_generation;
+ u8 metadata_uuid[BTRFS_FSID_SIZE]; /* The uuid written into btree blocks */
+
/* future expansion */
- __le64 reserved[30];
+ __le64 reserved[28];
u8 sys_chunk_array[BTRFS_SYSTEM_CHUNK_ARRAY_SIZE];
struct btrfs_root_backup super_roots[BTRFS_NUM_BACKUP_ROOTS];
} __attribute__ ((__packed__));
@@ -265,7 +267,8 @@ struct btrfs_super_block {
BTRFS_FEATURE_INCOMPAT_RAID56 | \
BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF | \
BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA | \
- BTRFS_FEATURE_INCOMPAT_NO_HOLES)
+ BTRFS_FEATURE_INCOMPAT_NO_HOLES | \
+ BTRFS_FEATURE_INCOMPAT_METADATA_UUID)
#define BTRFS_FEATURE_INCOMPAT_SAFE_SET \
(BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF)
@@ -746,7 +749,8 @@ struct btrfs_delayed_root;
#define BTRFS_FS_BALANCE_RUNNING 18
struct btrfs_fs_info {
- u8 fsid[BTRFS_FSID_SIZE];
+ u8 fsid[BTRFS_FSID_SIZE]; /* User-visible fs UUID */
+ u8 metadata_fsid[BTRFS_FSID_SIZE]; /* UUID written to btree blocks */
u8 chunk_tree_uuid[BTRFS_UUID_SIZE];
unsigned long flags;
struct btrfs_root *extent_root;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index b0ab41da91d1..b76b18388b93 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -551,7 +551,7 @@ static int csum_dirty_buffer(struct btrfs_fs_info *fs_info, struct page *page)
if (WARN_ON(!PageUptodate(page)))
return -EUCLEAN;
- ASSERT(memcmp_extent_buffer(eb, fs_info->fsid,
+ ASSERT(memcmp_extent_buffer(eb, fs_info->metadata_fsid,
btrfs_header_fsid(), BTRFS_FSID_SIZE) == 0);
return csum_tree_block(fs_info, eb, 0);
@@ -566,7 +566,19 @@ static int check_tree_block_fsid(struct btrfs_fs_info *fs_info,
read_extent_buffer(eb, fsid, btrfs_header_fsid(), BTRFS_FSID_SIZE);
while (fs_devices) {
- if (!memcmp(fsid, fs_devices->fsid, BTRFS_FSID_SIZE)) {
+ u8 *metadata_uuid;
+ /*
+ * Checking the incompat flag is only valid for the current
+ * fs. For seed devices it's forbidden to have their uuid
+ * changed so reading ->fsid in this case is fine
+ */
+ if (fs_devices == fs_info->fs_devices &&
+ btrfs_fs_incompat(fs_info, METADATA_UUID))
+ metadata_uuid = fs_devices->metadata_uuid;
+ else
+ metadata_uuid = fs_devices->fsid;
+
+ if (!memcmp(fsid, metadata_uuid, BTRFS_FSID_SIZE)) {
ret = 0;
break;
}
@@ -2478,10 +2490,11 @@ static int validate_super(struct btrfs_fs_info *fs_info,
ret = -EINVAL;
}
- if (memcmp(fs_info->fsid, sb->dev_item.fsid, BTRFS_FSID_SIZE) != 0) {
+ if (memcmp(fs_info->metadata_fsid, sb->dev_item.fsid,
+ BTRFS_FSID_SIZE) != 0) {
btrfs_err(fs_info,
- "dev_item UUID does not match fsid: %pU != %pU",
- fs_info->fsid, sb->dev_item.fsid);
+ "dev_item UUID does not match metadata fsid: %pU != %pU",
+ fs_info->metadata_fsid, sb->dev_item.fsid);
ret = -EINVAL;
}
@@ -2822,6 +2835,12 @@ int open_ctree(struct super_block *sb,
brelse(bh);
memcpy(fs_info->fsid, fs_info->super_copy->fsid, BTRFS_FSID_SIZE);
+ if (btrfs_fs_incompat(fs_info, METADATA_UUID)) {
+ memcpy(fs_info->metadata_fsid,
+ fs_info->super_copy->metadata_uuid, BTRFS_FSID_SIZE);
+ } else {
+ memcpy(fs_info->metadata_fsid, fs_info->fsid, BTRFS_FSID_SIZE);
+ }
ret = btrfs_validate_mount_super(fs_info);
if (ret) {
@@ -3760,7 +3779,8 @@ int write_all_supers(struct btrfs_fs_info *fs_info, int max_mirrors)
btrfs_set_stack_device_io_width(dev_item, dev->io_width);
btrfs_set_stack_device_sector_size(dev_item, dev->sector_size);
memcpy(dev_item->uuid, dev->uuid, BTRFS_UUID_SIZE);
- memcpy(dev_item->fsid, dev->fs_devices->fsid, BTRFS_FSID_SIZE);
+ memcpy(dev_item->fsid, dev->fs_devices->metadata_uuid,
+ BTRFS_FSID_SIZE);
flags = btrfs_super_flags(sb);
btrfs_set_super_flags(sb, flags | BTRFS_HEADER_FLAG_WRITTEN);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index a1febf155747..047fba06a8d8 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -8169,7 +8169,7 @@ btrfs_init_new_buffer(struct btrfs_trans_handle *trans, struct btrfs_root *root,
btrfs_set_header_generation(buf, trans->transid);
btrfs_set_header_backref_rev(buf, BTRFS_MIXED_BACKREF_REV);
btrfs_set_header_owner(buf, owner);
- write_extent_buffer_fsid(buf, fs_info->fsid);
+ write_extent_buffer_fsid(buf, fs_info->metadata_fsid);
write_extent_buffer_chunk_tree_uuid(buf, fs_info->chunk_tree_uuid);
if (root->root_key.objectid == BTRFS_TREE_LOG_OBJECTID) {
buf->log_index = root->log_transid % 2;
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index f435d397019e..e3e12c94834f 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -238,13 +238,15 @@ struct list_head *btrfs_get_fs_uuids(void)
/*
* alloc_fs_devices - allocate struct btrfs_fs_devices
- * @fsid: if not NULL, copy the uuid to fs_devices::fsid
+ * @fsid: if not NULL, copy the uuid to fs_devices::fsid
+ * @metadata_fsid: if not NULL, copy the uuid to fs_devices::metadata_fsid
*
* Return a pointer to a new struct btrfs_fs_devices on success, or ERR_PTR().
* The returned struct is not linked onto any lists and can be destroyed with
* kfree() right away.
*/
-static struct btrfs_fs_devices *alloc_fs_devices(const u8 *fsid)
+static struct btrfs_fs_devices *alloc_fs_devices(const u8 *fsid,
+ const u8 *metadata_fsid)
{
struct btrfs_fs_devices *fs_devs;
@@ -261,6 +263,11 @@ static struct btrfs_fs_devices *alloc_fs_devices(const u8 *fsid)
if (fsid)
memcpy(fs_devs->fsid, fsid, BTRFS_FSID_SIZE);
+ if (metadata_fsid)
+ memcpy(fs_devs->metadata_uuid, metadata_fsid, BTRFS_FSID_SIZE);
+ else if (fsid)
+ memcpy(fs_devs->metadata_uuid, fsid, BTRFS_FSID_SIZE);
+
return fs_devs;
}
@@ -368,13 +375,24 @@ static struct btrfs_device *find_device(struct btrfs_fs_devices *fs_devices,
return NULL;
}
-static noinline struct btrfs_fs_devices *find_fsid(u8 *fsid)
+static noinline struct btrfs_fs_devices *
+find_fsid(const u8 *fsid, const u8 *metadata_fsid)
{
struct btrfs_fs_devices *fs_devices;
+ ASSERT(fsid);
+
list_for_each_entry(fs_devices, &fs_uuids, fs_list) {
- if (memcmp(fsid, fs_devices->fsid, BTRFS_FSID_SIZE) == 0)
- return fs_devices;
+ if (metadata_fsid) {
+ if (memcmp(fsid, fs_devices->fsid, BTRFS_FSID_SIZE) == 0
+ && memcmp(metadata_fsid, fs_devices->metadata_uuid,
+ BTRFS_FSID_SIZE) == 0)
+ return fs_devices;
+ } else {
+ if (memcmp(fsid, fs_devices->fsid,
+ BTRFS_FSID_SIZE) == 0)
+ return fs_devices;
+ }
}
return NULL;
}
@@ -709,6 +727,12 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
device->generation = btrfs_super_generation(disk_super);
if (btrfs_super_flags(disk_super) & BTRFS_SUPER_FLAG_SEEDING) {
+ if (btrfs_super_incompat_flags(disk_super) &
+ BTRFS_FEATURE_INCOMPAT_METADATA_UUID) {
+ pr_err("BTRFS: Invalid seeding and uuid-changed device detected\n");
+ goto error_brelse;
+ }
+
clear_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state);
fs_devices->seeding = 1;
} else {
@@ -759,10 +783,21 @@ static noinline struct btrfs_device *device_list_add(const char *path,
struct rcu_string *name;
u64 found_transid = btrfs_super_generation(disk_super);
u64 devid = btrfs_stack_device_id(&disk_super->dev_item);
+ bool has_metadata_uuid = (btrfs_super_incompat_flags(disk_super) &
+ BTRFS_FEATURE_INCOMPAT_METADATA_UUID);
+
+ if (has_metadata_uuid)
+ fs_devices = find_fsid(disk_super->fsid, disk_super->metadata_uuid);
+ else
+ fs_devices = find_fsid(disk_super->fsid, NULL);
- fs_devices = find_fsid(disk_super->fsid);
if (!fs_devices) {
- fs_devices = alloc_fs_devices(disk_super->fsid);
+ if (has_metadata_uuid)
+ fs_devices = alloc_fs_devices(disk_super->fsid,
+ disk_super->metadata_uuid);
+ else
+ fs_devices = alloc_fs_devices(disk_super->fsid, NULL);
+
if (IS_ERR(fs_devices))
return ERR_CAST(fs_devices);
@@ -884,7 +919,7 @@ static struct btrfs_fs_devices *clone_fs_devices(struct btrfs_fs_devices *orig)
struct btrfs_device *device;
struct btrfs_device *orig_dev;
- fs_devices = alloc_fs_devices(orig->fsid);
+ fs_devices = alloc_fs_devices(orig->fsid, NULL);
if (IS_ERR(fs_devices))
return fs_devices;
@@ -1709,7 +1744,8 @@ static int btrfs_add_dev_item(struct btrfs_trans_handle *trans,
ptr = btrfs_device_uuid(dev_item);
write_extent_buffer(leaf, device->uuid, ptr, BTRFS_UUID_SIZE);
ptr = btrfs_device_fsid(dev_item);
- write_extent_buffer(leaf, trans->fs_info->fsid, ptr, BTRFS_FSID_SIZE);
+ write_extent_buffer(leaf, trans->fs_info->metadata_fsid, ptr,
+ BTRFS_FSID_SIZE);
btrfs_mark_buffer_dirty(leaf);
ret = 0;
@@ -2132,7 +2168,11 @@ static struct btrfs_device *btrfs_find_device_by_path(
disk_super = (struct btrfs_super_block *)bh->b_data;
devid = btrfs_stack_device_id(&disk_super->dev_item);
dev_uuid = disk_super->dev_item.uuid;
- device = btrfs_find_device(fs_info, devid, dev_uuid, disk_super->fsid);
+ if (btrfs_fs_incompat(fs_info, METADATA_UUID))
+ device = btrfs_find_device(fs_info, devid, dev_uuid, disk_super->metadata_uuid);
+ else
+ device = btrfs_find_device(fs_info, devid, dev_uuid, disk_super->fsid);
+
brelse(bh);
if (!device)
device = ERR_PTR(-ENOENT);
@@ -2202,7 +2242,7 @@ static int btrfs_prepare_sprout(struct btrfs_fs_info *fs_info)
if (!fs_devices->seeding)
return -EINVAL;
- seed_devices = alloc_fs_devices(NULL);
+ seed_devices = alloc_fs_devices(NULL, NULL);
if (IS_ERR(seed_devices))
return PTR_ERR(seed_devices);
@@ -2239,6 +2279,8 @@ static int btrfs_prepare_sprout(struct btrfs_fs_info *fs_info)
generate_random_uuid(fs_devices->fsid);
memcpy(fs_info->fsid, fs_devices->fsid, BTRFS_FSID_SIZE);
+ memcpy(fs_devices->metadata_uuid, fs_devices->fsid, BTRFS_FSID_SIZE);
+ memcpy(fs_info->metadata_fsid, fs_devices->fsid, BTRFS_FSID_SIZE);
memcpy(disk_super->fsid, fs_devices->fsid, BTRFS_FSID_SIZE);
mutex_unlock(&fs_devices->device_list_mutex);
@@ -6245,7 +6287,7 @@ struct btrfs_device *btrfs_find_device(struct btrfs_fs_info *fs_info, u64 devid,
cur_devices = fs_info->fs_devices;
while (cur_devices) {
if (!fsid ||
- !memcmp(cur_devices->fsid, fsid, BTRFS_FSID_SIZE)) {
+ !memcmp(cur_devices->metadata_uuid, fsid, BTRFS_FSID_SIZE)) {
device = find_device(cur_devices, devid, uuid);
if (device)
return device;
@@ -6574,12 +6616,12 @@ static struct btrfs_fs_devices *open_seed_devices(struct btrfs_fs_info *fs_info,
fs_devices = fs_devices->seed;
}
- fs_devices = find_fsid(fsid);
+ fs_devices = find_fsid(fsid, NULL);
if (!fs_devices) {
if (!btrfs_test_opt(fs_info, DEGRADED))
return ERR_PTR(-ENOENT);
- fs_devices = alloc_fs_devices(fsid);
+ fs_devices = alloc_fs_devices(fsid, NULL);
if (IS_ERR(fs_devices))
return fs_devices;
@@ -6629,7 +6671,7 @@ static int read_one_dev(struct btrfs_fs_info *fs_info,
read_extent_buffer(leaf, fs_uuid, btrfs_device_fsid(dev_item),
BTRFS_FSID_SIZE);
- if (memcmp(fs_uuid, fs_info->fsid, BTRFS_FSID_SIZE)) {
+ if (memcmp(fs_uuid, fs_info->metadata_fsid, BTRFS_FSID_SIZE)) {
fs_devices = open_seed_devices(fs_info, fs_uuid);
if (IS_ERR(fs_devices))
return PTR_ERR(fs_devices);
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index aefce895e994..04860497b33c 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -210,6 +210,7 @@ BTRFS_DEVICE_GETSET_FUNCS(bytes_used);
struct btrfs_fs_devices {
u8 fsid[BTRFS_FSID_SIZE]; /* FS specific uuid */
+ u8 metadata_uuid[BTRFS_FSID_SIZE];
struct list_head fs_list;
u64 num_devices;
diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index 5ca1d21fc4a7..e0763bc4158e 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -269,6 +269,7 @@ struct btrfs_ioctl_fs_info_args {
#define BTRFS_FEATURE_INCOMPAT_RAID56 (1ULL << 7)
#define BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA (1ULL << 8)
#define BTRFS_FEATURE_INCOMPAT_NO_HOLES (1ULL << 9)
+#define BTRFS_FEATURE_INCOMPAT_METADATA_UUID (1ULL << 10)
struct btrfs_ioctl_feature_flags {
__u64 compat_flags;
diff --git a/include/uapi/linux/btrfs_tree.h b/include/uapi/linux/btrfs_tree.h
index aff1356c2bb8..e974f4bb5378 100644
--- a/include/uapi/linux/btrfs_tree.h
+++ b/include/uapi/linux/btrfs_tree.h
@@ -458,6 +458,7 @@ struct btrfs_free_space_header {
#define BTRFS_SUPER_FLAG_METADUMP (1ULL << 33)
#define BTRFS_SUPER_FLAG_METADUMP_V2 (1ULL << 34)
#define BTRFS_SUPER_FLAG_CHANGING_FSID (1ULL << 35)
+#define BTRFS_SUPER_FLAG_CHANGING_FSID_V2 (1ULL << 36)
/*
--
2.7.4
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 2/6] btrfs: Remove fsid/metadata_fsid fields from btrfs_info
2018-10-30 14:43 [PATCH v3 0/6] FSID change kernel support Nikolay Borisov
2018-10-30 14:43 ` [PATCH 1/6] btrfs: Introduce support for FSID change without metadata rewrite Nikolay Borisov
@ 2018-10-30 14:43 ` Nikolay Borisov
2018-10-30 14:43 ` [PATCH 3/6] btrfs: Add handling for disk split-brain scenario during fsid change Nikolay Borisov
` (4 subsequent siblings)
6 siblings, 0 replies; 9+ messages in thread
From: Nikolay Borisov @ 2018-10-30 14:43 UTC (permalink / raw)
To: linux-btrfs; +Cc: Nikolay Borisov
Currently btrfs_fs_info structure contains a copy of the
fsid/metadata_uuid fields. Same values are also contained in the
btrfs_fs_devices structure which fs_info has a reference to. Let's
reduce duplication by removing the fields from fs_info and always refer
to the ones in fs_devices. No functional changes.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
fs/btrfs/check-integrity.c | 2 +-
fs/btrfs/ctree.c | 5 +++--
fs/btrfs/ctree.h | 2 --
fs/btrfs/disk-io.c | 21 ++++++++++++---------
fs/btrfs/extent-tree.c | 2 +-
fs/btrfs/ioctl.c | 2 +-
fs/btrfs/super.c | 2 +-
fs/btrfs/volumes.c | 10 ++++------
include/trace/events/btrfs.h | 2 +-
9 files changed, 24 insertions(+), 24 deletions(-)
diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c
index 2e43fba44035..781cae168d2a 100644
--- a/fs/btrfs/check-integrity.c
+++ b/fs/btrfs/check-integrity.c
@@ -1720,7 +1720,7 @@ static int btrfsic_test_for_metadata(struct btrfsic_state *state,
num_pages = state->metablock_size >> PAGE_SHIFT;
h = (struct btrfs_header *)datav[0];
- if (memcmp(h->fsid, fs_info->fsid, BTRFS_FSID_SIZE))
+ if (memcmp(h->fsid, fs_info->fs_devices->fsid, BTRFS_FSID_SIZE))
return 1;
for (i = 0; i < num_pages; i++) {
diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 75cd41bf12f7..61f14a9836a1 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -12,6 +12,7 @@
#include "transaction.h"
#include "print-tree.h"
#include "locking.h"
+#include "volumes.h"
static int split_node(struct btrfs_trans_handle *trans, struct btrfs_root
*root, struct btrfs_path *path, int level);
@@ -224,7 +225,7 @@ int btrfs_copy_root(struct btrfs_trans_handle *trans,
else
btrfs_set_header_owner(cow, new_root_objectid);
- write_extent_buffer_fsid(cow, fs_info->metadata_fsid);
+ write_extent_buffer_fsid(cow, fs_info->fs_devices->metadata_uuid);
WARN_ON(btrfs_header_generation(buf) > trans->transid);
if (new_root_objectid == BTRFS_TREE_RELOC_OBJECTID)
@@ -1050,7 +1051,7 @@ static noinline int __btrfs_cow_block(struct btrfs_trans_handle *trans,
else
btrfs_set_header_owner(cow, root->root_key.objectid);
- write_extent_buffer_fsid(cow, fs_info->metadata_fsid);
+ write_extent_buffer_fsid(cow, fs_info->fs_devices->metadata_uuid);
ret = update_ref_for_cow(trans, root, buf, cow, &last_ref);
if (ret) {
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 501ada9ec7bd..8531f0f5d672 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -749,8 +749,6 @@ struct btrfs_delayed_root;
#define BTRFS_FS_BALANCE_RUNNING 18
struct btrfs_fs_info {
- u8 fsid[BTRFS_FSID_SIZE]; /* User-visible fs UUID */
- u8 metadata_fsid[BTRFS_FSID_SIZE]; /* UUID written to btree blocks */
u8 chunk_tree_uuid[BTRFS_UUID_SIZE];
unsigned long flags;
struct btrfs_root *extent_root;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index b76b18388b93..a458ef5b605e 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -551,7 +551,7 @@ static int csum_dirty_buffer(struct btrfs_fs_info *fs_info, struct page *page)
if (WARN_ON(!PageUptodate(page)))
return -EUCLEAN;
- ASSERT(memcmp_extent_buffer(eb, fs_info->metadata_fsid,
+ ASSERT(memcmp_extent_buffer(eb, fs_info->fs_devices->metadata_uuid,
btrfs_header_fsid(), BTRFS_FSID_SIZE) == 0);
return csum_tree_block(fs_info, eb, 0);
@@ -2490,11 +2490,12 @@ static int validate_super(struct btrfs_fs_info *fs_info,
ret = -EINVAL;
}
- if (memcmp(fs_info->metadata_fsid, sb->dev_item.fsid,
+ if (memcmp(fs_info->fs_devices->metadata_uuid, sb->dev_item.fsid,
BTRFS_FSID_SIZE) != 0) {
btrfs_err(fs_info,
"dev_item UUID does not match metadata fsid: %pU != %pU",
- fs_info->metadata_fsid, sb->dev_item.fsid);
+ fs_info->fs_devices->metadata_uuid,
+ sb->dev_item.fsid);
ret = -EINVAL;
}
@@ -2834,14 +2835,16 @@ int open_ctree(struct super_block *sb,
sizeof(*fs_info->super_for_commit));
brelse(bh);
- memcpy(fs_info->fsid, fs_info->super_copy->fsid, BTRFS_FSID_SIZE);
+ ASSERT(!memcmp(fs_info->fs_devices->fsid, fs_info->super_copy->fsid,
+ BTRFS_FSID_SIZE));
+
if (btrfs_fs_incompat(fs_info, METADATA_UUID)) {
- memcpy(fs_info->metadata_fsid,
- fs_info->super_copy->metadata_uuid, BTRFS_FSID_SIZE);
- } else {
- memcpy(fs_info->metadata_fsid, fs_info->fsid, BTRFS_FSID_SIZE);
+ ASSERT(!memcmp(fs_info->fs_devices->metadata_uuid,
+ fs_info->super_copy->metadata_uuid,
+ BTRFS_FSID_SIZE));
}
+
ret = btrfs_validate_mount_super(fs_info);
if (ret) {
btrfs_err(fs_info, "superblock contains fatal errors");
@@ -2961,7 +2964,7 @@ int open_ctree(struct super_block *sb,
sb->s_blocksize = sectorsize;
sb->s_blocksize_bits = blksize_bits(sectorsize);
- memcpy(&sb->s_uuid, fs_info->fsid, BTRFS_FSID_SIZE);
+ memcpy(&sb->s_uuid, fs_info->fs_devices->fsid, BTRFS_FSID_SIZE);
mutex_lock(&fs_info->chunk_mutex);
ret = btrfs_read_sys_array(fs_info);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 047fba06a8d8..c98638155652 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -8169,7 +8169,7 @@ btrfs_init_new_buffer(struct btrfs_trans_handle *trans, struct btrfs_root *root,
btrfs_set_header_generation(buf, trans->transid);
btrfs_set_header_backref_rev(buf, BTRFS_MIXED_BACKREF_REV);
btrfs_set_header_owner(buf, owner);
- write_extent_buffer_fsid(buf, fs_info->metadata_fsid);
+ write_extent_buffer_fsid(buf, fs_info->fs_devices->metadata_uuid);
write_extent_buffer_chunk_tree_uuid(buf, fs_info->chunk_tree_uuid);
if (root->root_key.objectid == BTRFS_TREE_LOG_OBJECTID) {
buf->log_index = root->log_transid % 2;
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index a990a9045139..510635e49319 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3135,7 +3135,7 @@ static long btrfs_ioctl_fs_info(struct btrfs_fs_info *fs_info,
}
rcu_read_unlock();
- memcpy(&fi_args->fsid, fs_info->fsid, sizeof(fi_args->fsid));
+ memcpy(&fi_args->fsid, fs_devices->fsid, sizeof(fi_args->fsid));
fi_args->nodesize = fs_info->nodesize;
fi_args->sectorsize = fs_info->sectorsize;
fi_args->clone_alignment = fs_info->sectorsize;
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index b362b45dd757..1163183fe6dd 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -2090,7 +2090,7 @@ static int btrfs_statfs(struct dentry *dentry, struct kstatfs *buf)
u64 total_free_data = 0;
u64 total_free_meta = 0;
int bits = dentry->d_sb->s_blocksize_bits;
- __be32 *fsid = (__be32 *)fs_info->fsid;
+ __be32 *fsid = (__be32 *)fs_info->fs_devices->fsid;
unsigned factor = 1;
struct btrfs_block_rsv *block_rsv = &fs_info->global_block_rsv;
int ret;
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index e3e12c94834f..bf0aa900f22c 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1744,8 +1744,8 @@ static int btrfs_add_dev_item(struct btrfs_trans_handle *trans,
ptr = btrfs_device_uuid(dev_item);
write_extent_buffer(leaf, device->uuid, ptr, BTRFS_UUID_SIZE);
ptr = btrfs_device_fsid(dev_item);
- write_extent_buffer(leaf, trans->fs_info->metadata_fsid, ptr,
- BTRFS_FSID_SIZE);
+ write_extent_buffer(leaf, trans->fs_info->fs_devices->metadata_uuid,
+ ptr, BTRFS_FSID_SIZE);
btrfs_mark_buffer_dirty(leaf);
ret = 0;
@@ -2278,9 +2278,7 @@ static int btrfs_prepare_sprout(struct btrfs_fs_info *fs_info)
fs_devices->seed = seed_devices;
generate_random_uuid(fs_devices->fsid);
- memcpy(fs_info->fsid, fs_devices->fsid, BTRFS_FSID_SIZE);
memcpy(fs_devices->metadata_uuid, fs_devices->fsid, BTRFS_FSID_SIZE);
- memcpy(fs_info->metadata_fsid, fs_devices->fsid, BTRFS_FSID_SIZE);
memcpy(disk_super->fsid, fs_devices->fsid, BTRFS_FSID_SIZE);
mutex_unlock(&fs_devices->device_list_mutex);
@@ -2522,7 +2520,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
* so rename the fsid on the sysfs
*/
snprintf(fsid_buf, BTRFS_UUID_UNPARSED_SIZE, "%pU",
- fs_info->fsid);
+ fs_info->fs_devices->fsid);
if (kobject_rename(&fs_devices->fsid_kobj, fsid_buf))
btrfs_warn(fs_info,
"sysfs: failed to create fsid for sprout");
@@ -6671,7 +6669,7 @@ static int read_one_dev(struct btrfs_fs_info *fs_info,
read_extent_buffer(leaf, fs_uuid, btrfs_device_fsid(dev_item),
BTRFS_FSID_SIZE);
- if (memcmp(fs_uuid, fs_info->metadata_fsid, BTRFS_FSID_SIZE)) {
+ if (memcmp(fs_uuid, fs_devices->metadata_uuid, BTRFS_FSID_SIZE)) {
fs_devices = open_seed_devices(fs_info, fs_uuid);
if (IS_ERR(fs_devices))
return PTR_ERR(fs_devices);
diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index 8568946f491d..4b8400f7d4fa 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -92,7 +92,7 @@ TRACE_DEFINE_ENUM(COMMIT_TRANS);
#define TP_STRUCT__entry_fsid __array(u8, fsid, BTRFS_FSID_SIZE)
#define TP_fast_assign_fsid(fs_info) \
- memcpy(__entry->fsid, fs_info->fsid, BTRFS_FSID_SIZE)
+ memcpy(__entry->fsid, fs_info->fs_devices->fsid, BTRFS_FSID_SIZE)
#define TP_STRUCT__entry_btrfs(args...) \
TP_STRUCT__entry( \
--
2.7.4
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 3/6] btrfs: Add handling for disk split-brain scenario during fsid change
2018-10-30 14:43 [PATCH v3 0/6] FSID change kernel support Nikolay Borisov
2018-10-30 14:43 ` [PATCH 1/6] btrfs: Introduce support for FSID change without metadata rewrite Nikolay Borisov
2018-10-30 14:43 ` [PATCH 2/6] btrfs: Remove fsid/metadata_fsid fields from btrfs_info Nikolay Borisov
@ 2018-10-30 14:43 ` Nikolay Borisov
2018-10-30 14:43 ` [PATCH 4/6] btrfs: Introduce 2 more members to struct btrfs_fs_devices Nikolay Borisov
` (3 subsequent siblings)
6 siblings, 0 replies; 9+ messages in thread
From: Nikolay Borisov @ 2018-10-30 14:43 UTC (permalink / raw)
To: linux-btrfs; +Cc: Nikolay Borisov
Even though fsid change without rewrite is a very quick operations it's
still possible to experience a split brain scenario if power loss
occurs at the right time. This patch handle the case where power
failure occurs while the first transaction (the one setting
CHANGING_FSID_V2) flag is being persisted on disk. This can cause
the btrfs_fs_devices of this filesystem to be created by a device which:
a) has the CHANGING_FSID_V2 flag set but its fsid value is intact
b) or a device which doesn't have CHANGING_FSID_V2 flag set and its
fsid value is intact
This situation is trivially handled by the current find_fsid code since
in both cases the devices are going to be treated like ordinary devices.
Since btrfs is always mounted using the superblock of the latest
device (the one with highest generation number), meaning it will have
the CHANGING_FSID_V2 flag set, ensure it's being cleared on mount. On
the first transaction commit following mount all disks will have it
cleared.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
fs/btrfs/disk-io.c | 14 +++++++++++---
1 file changed, 11 insertions(+), 3 deletions(-)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index a458ef5b605e..6498434c2e06 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2831,10 +2831,10 @@ int open_ctree(struct super_block *sb,
* the whole block of INFO_SIZE
*/
memcpy(fs_info->super_copy, bh->b_data, sizeof(*fs_info->super_copy));
- memcpy(fs_info->super_for_commit, fs_info->super_copy,
- sizeof(*fs_info->super_for_commit));
brelse(bh);
+ disk_super = fs_info->super_copy;
+
ASSERT(!memcmp(fs_info->fs_devices->fsid, fs_info->super_copy->fsid,
BTRFS_FSID_SIZE));
@@ -2844,6 +2844,15 @@ int open_ctree(struct super_block *sb,
BTRFS_FSID_SIZE));
}
+ features = btrfs_super_flags(disk_super);
+ if (features & BTRFS_SUPER_FLAG_CHANGING_FSID_V2) {
+ features &= ~BTRFS_SUPER_FLAG_CHANGING_FSID_V2;
+ btrfs_set_super_flags(disk_super, features);
+ btrfs_info(fs_info, "found metadata uuid in progress flag. Clearing");
+ }
+
+ memcpy(fs_info->super_for_commit, fs_info->super_copy,
+ sizeof(*fs_info->super_for_commit));
ret = btrfs_validate_mount_super(fs_info);
if (ret) {
@@ -2852,7 +2861,6 @@ int open_ctree(struct super_block *sb,
goto fail_alloc;
}
- disk_super = fs_info->super_copy;
if (!btrfs_super_root(disk_super))
goto fail_alloc;
--
2.7.4
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 4/6] btrfs: Introduce 2 more members to struct btrfs_fs_devices
2018-10-30 14:43 [PATCH v3 0/6] FSID change kernel support Nikolay Borisov
` (2 preceding siblings ...)
2018-10-30 14:43 ` [PATCH 3/6] btrfs: Add handling for disk split-brain scenario during fsid change Nikolay Borisov
@ 2018-10-30 14:43 ` Nikolay Borisov
2018-10-30 14:43 ` [PATCH 5/6] btrfs: Handle one more split-brain scenario during fsid change Nikolay Borisov
` (2 subsequent siblings)
6 siblings, 0 replies; 9+ messages in thread
From: Nikolay Borisov @ 2018-10-30 14:43 UTC (permalink / raw)
To: linux-btrfs; +Cc: Nikolay Borisov
In order to gracefully handle split-brain scenario which are very
unlikely, yet possible, while performing the FSID change I'm
gonna need two more pieces of information:
1. The highest generation number among all devices registered to a
particular btrfs_fs_devices
2. A boolean flag whether a given btrfs_fs_devices was created by a
device which had the FSID_CHANGING_V2 flag set.
This is a preparatory patch and just introduces the variables as well
as code which sets them, their actual use is going to happen in a later
patch.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
fs/btrfs/volumes.c | 9 ++++++++-
fs/btrfs/volumes.h | 5 +++++
2 files changed, 13 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index bf0aa900f22c..f9dcbe74093c 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -785,6 +785,8 @@ static noinline struct btrfs_device *device_list_add(const char *path,
u64 devid = btrfs_stack_device_id(&disk_super->dev_item);
bool has_metadata_uuid = (btrfs_super_incompat_flags(disk_super) &
BTRFS_FEATURE_INCOMPAT_METADATA_UUID);
+ bool fsid_change_in_progress = (btrfs_super_flags(disk_super) &
+ BTRFS_SUPER_FLAG_CHANGING_FSID_V2);
if (has_metadata_uuid)
fs_devices = find_fsid(disk_super->fsid, disk_super->metadata_uuid);
@@ -798,6 +800,8 @@ static noinline struct btrfs_device *device_list_add(const char *path,
else
fs_devices = alloc_fs_devices(disk_super->fsid, NULL);
+ fs_devices->fsid_change = fsid_change_in_progress;
+
if (IS_ERR(fs_devices))
return ERR_CAST(fs_devices);
@@ -904,8 +908,11 @@ static noinline struct btrfs_device *device_list_add(const char *path,
* it back. We need it to pick the disk with largest generation
* (as above).
*/
- if (!fs_devices->opened)
+ if (!fs_devices->opened) {
device->generation = found_transid;
+ fs_devices->latest_generation = max(found_transid,
+ fs_devices->latest_generation);
+ }
fs_devices->total_devices = btrfs_super_num_devices(disk_super);
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 04860497b33c..6b2a01c55426 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -211,6 +211,7 @@ BTRFS_DEVICE_GETSET_FUNCS(bytes_used);
struct btrfs_fs_devices {
u8 fsid[BTRFS_FSID_SIZE]; /* FS specific uuid */
u8 metadata_uuid[BTRFS_FSID_SIZE];
+ bool fsid_change;
struct list_head fs_list;
u64 num_devices;
@@ -219,6 +220,10 @@ struct btrfs_fs_devices {
u64 missing_devices;
u64 total_rw_bytes;
u64 total_devices;
+
+ /* Highest generation number of seen devices */
+ u64 latest_generation;
+
struct block_device *latest_bdev;
/* all of the devices in the FS, protected by a mutex
--
2.7.4
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 5/6] btrfs: Handle one more split-brain scenario during fsid change
2018-10-30 14:43 [PATCH v3 0/6] FSID change kernel support Nikolay Borisov
` (3 preceding siblings ...)
2018-10-30 14:43 ` [PATCH 4/6] btrfs: Introduce 2 more members to struct btrfs_fs_devices Nikolay Borisov
@ 2018-10-30 14:43 ` Nikolay Borisov
2018-10-30 14:43 ` [PATCH 6/6] btrfs: Handle final split-brain possibility " Nikolay Borisov
2018-11-19 13:44 ` [PATCH v3 0/6] FSID change kernel support David Sterba
6 siblings, 0 replies; 9+ messages in thread
From: Nikolay Borisov @ 2018-10-30 14:43 UTC (permalink / raw)
To: linux-btrfs; +Cc: Nikolay Borisov
This commit continues hardening the scanning code to handle cases where
power loss could have caused disks in a multi-disk filesystem to be
in inconsistent state. Namely handle the situation that can occur when
some of the disks in multi-disk fs have completed their fsid change i.e
they have METADATA_UUID incompat flag set, have cleared the
CHANGING_FSID_V2 flag and their fsid/metadata_uuid are different. At
the same time the other half of the disks will have their
fsid/metadata_uuid unchanged and will only have CHANGING_FSID_V2 flag.
This is handled by introducing code in the scan path which:
a) Handles the case when a device with CHANGING_FSID_V2 flag is
scanned and as a result btrfs_fs_devices is created with matching
fsid/metdata_uuid. Subsequently, when a device with completed fsid
change is scanned it will detect this via the new code in find_fsid
i.e that such an fs_devices exist that fsid_change flag is set to true,
it's metadata_uuid/fsid match and the metadata_uuid of the scanned
device matches that of the fs_devices. In this case, it's important to
note that the devices which has its fsid change completed will have a
higher generation number than the device with FSID_CHANGING_V2 flag
set, so its superblock block will be used during mount. To prevent an
assertion triggering because the sb used for mounting will have
differing fsid/metadata_uuid than the ones in the fs_devices struct
also add code in device_list_add which overwrites the values in
fs_devices.
b) Alternatively we can end up with a device that completed its
fsid change be scanned first which will create the respective
btrfs_fs_devices struct with differing fsid/metadata_uuid. In this
case when a device with FSID_CHANGING_V2 flag set is scanned it will
call the newly added find_fsid_inprogress function which will return
the correct fs_devices.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
fs/btrfs/volumes.c | 78 +++++++++++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 74 insertions(+), 4 deletions(-)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index f9dcbe74093c..f967e995feff 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -382,6 +382,26 @@ find_fsid(const u8 *fsid, const u8 *metadata_fsid)
ASSERT(fsid);
+ if (metadata_fsid) {
+
+ /*
+ * Handle scanned device having completed its fsid change but
+ * belonging to a fs_devices that was created by first scanning
+ * a device which didn't have it's fsid/metadata_uuid changed
+ * at all and the CHANGING_FSID_V2 flag set.
+ */
+ list_for_each_entry(fs_devices, &fs_uuids, fs_list) {
+ if (fs_devices->fsid_change &&
+ memcmp(metadata_fsid, fs_devices->fsid,
+ BTRFS_FSID_SIZE) == 0 &&
+ memcmp(fs_devices->fsid, fs_devices->metadata_uuid,
+ BTRFS_FSID_SIZE) == 0) {
+ return fs_devices;
+ }
+ }
+ }
+
+ /* Handle non-split brain cases */
list_for_each_entry(fs_devices, &fs_uuids, fs_list) {
if (metadata_fsid) {
if (memcmp(fsid, fs_devices->fsid, BTRFS_FSID_SIZE) == 0
@@ -768,6 +788,27 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
}
/*
+ * Handle scanned device having its CHANGING_FSID_V2 flag set and the fs_devices
+ * being created with a disk that has already completed its fsid change.
+ */
+static struct btrfs_fs_devices *find_fsid_inprogress(
+ struct btrfs_super_block *disk_super)
+{
+ struct btrfs_fs_devices *fs_devices;
+
+ list_for_each_entry(fs_devices, &fs_uuids, fs_list) {
+ if (memcmp(fs_devices->metadata_uuid, fs_devices->fsid,
+ BTRFS_FSID_SIZE) != 0 &&
+ memcmp(fs_devices->metadata_uuid, disk_super->fsid,
+ BTRFS_FSID_SIZE) == 0 && !fs_devices->fsid_change) {
+ return fs_devices;
+ }
+ }
+
+ return NULL;
+}
+
+/*
* Add new device to list of registered devices
*
* Returns:
@@ -779,7 +820,7 @@ static noinline struct btrfs_device *device_list_add(const char *path,
bool *new_device_added)
{
struct btrfs_device *device;
- struct btrfs_fs_devices *fs_devices;
+ struct btrfs_fs_devices *fs_devices = NULL;
struct rcu_string *name;
u64 found_transid = btrfs_super_generation(disk_super);
u64 devid = btrfs_stack_device_id(&disk_super->dev_item);
@@ -788,10 +829,24 @@ static noinline struct btrfs_device *device_list_add(const char *path,
bool fsid_change_in_progress = (btrfs_super_flags(disk_super) &
BTRFS_SUPER_FLAG_CHANGING_FSID_V2);
- if (has_metadata_uuid)
- fs_devices = find_fsid(disk_super->fsid, disk_super->metadata_uuid);
- else
+ if (fsid_change_in_progress && !has_metadata_uuid) {
+ /*
+ * When we have an image which has CHANGING_FSID_V2 set it might
+ * belong to either a filesystem which has disks with completed
+ * fsid change or it might belong to fs with no uuid changes in
+ * effect, handle both.
+ */
+ fs_devices = find_fsid_inprogress(disk_super);
+ if (!fs_devices)
+ fs_devices = find_fsid(disk_super->fsid, NULL);
+
+ } else if (has_metadata_uuid) {
+ fs_devices = find_fsid(disk_super->fsid,
+ disk_super->metadata_uuid);
+ } else {
fs_devices = find_fsid(disk_super->fsid, NULL);
+ }
+
if (!fs_devices) {
if (has_metadata_uuid)
@@ -813,6 +868,21 @@ static noinline struct btrfs_device *device_list_add(const char *path,
mutex_lock(&fs_devices->device_list_mutex);
device = find_device(fs_devices, devid,
disk_super->dev_item.uuid);
+
+ /*
+ * If this disk has been pulled into an fs devices created by
+ * a device which had the FSID_CHANGING flag then replace the
+ * metadata_uuid/fsid values of the fs_devices.
+ */
+ if (has_metadata_uuid && fs_devices->fsid_change &&
+ found_transid > fs_devices->latest_generation) {
+ memcpy(fs_devices->fsid, disk_super->fsid,
+ BTRFS_FSID_SIZE);
+ memcpy(fs_devices->metadata_uuid,
+ disk_super->metadata_uuid, BTRFS_FSID_SIZE);
+
+ fs_devices->fsid_change = false;
+ }
}
if (!device) {
--
2.7.4
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 6/6] btrfs: Handle final split-brain possibility during fsid change
2018-10-30 14:43 [PATCH v3 0/6] FSID change kernel support Nikolay Borisov
` (4 preceding siblings ...)
2018-10-30 14:43 ` [PATCH 5/6] btrfs: Handle one more split-brain scenario during fsid change Nikolay Borisov
@ 2018-10-30 14:43 ` Nikolay Borisov
2018-11-19 13:44 ` [PATCH v3 0/6] FSID change kernel support David Sterba
6 siblings, 0 replies; 9+ messages in thread
From: Nikolay Borisov @ 2018-10-30 14:43 UTC (permalink / raw)
To: linux-btrfs; +Cc: Nikolay Borisov
This patch lands the last case which needs to be handled by the fsid
change code. Namely, this is the case where a multidisk filesystem has
already undergone at least one successful fsid change i.e all disks
have the METADATA_UUID incompat bit and power failure occurs as another
fsid change is in progress. When such an event occurs, disks could be
split in 2 groups. One of the groups will have both METADATA_UUID and
CHANGING_FSID_V2 flags set coupled with old fsid/metadata_uuid pairs.
The other group of disks will have only METADATA_UUID bit set and their
fsid will be different than the one in disks in the first group. Here
we look at the following cases:
a) A disk from the first group is scanned first, so fs_devices is
created with stale fsid/metdata_uuid. Then when a disk from the
second group is scanned it needs to first check whether there exists
such an fs_devices that has fsid_change set to true (because it was
created with a disk having the CHANGING_FSID_V2 flag), the
metadata_uuid and fsid of the fs_devices will be different (since it was
created by a disk which already has had at least 1 successful fsid change)
and finally the metadata_uuid of the fs_devices will equal that of the
currently scanned disk (because metadata_uuid never really changes).
When the correct fs_devices is found the information from the scanned
disk will replace the current one in fs_devices since the scanned disk
will have higher generation number.
b) A disk from the second group is scanned so fs_devices is created
as usual with differing fsid/metdata_uid. Then when a disk from the
first group is scanned the code detects that it has both
CHANGING_FSID_V2 and METADATA_UUID flags set and will search for
fs_devices that has differing metadata_uuid/fsid and whose
metadata_uuid is the same as that of the scanned device.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
fs/btrfs/volumes.c | 65 ++++++++++++++++++++++++++++++++++++++++++++----------
1 file changed, 53 insertions(+), 12 deletions(-)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index f967e995feff..0ce2600c63b8 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -383,7 +383,6 @@ find_fsid(const u8 *fsid, const u8 *metadata_fsid)
ASSERT(fsid);
if (metadata_fsid) {
-
/*
* Handle scanned device having completed its fsid change but
* belonging to a fs_devices that was created by first scanning
@@ -399,6 +398,21 @@ find_fsid(const u8 *fsid, const u8 *metadata_fsid)
return fs_devices;
}
}
+ /*
+ * Handle scanned device having completed its fsid change but
+ * belonging to a fs_devices that was created by a device that
+ * has an outdated pair of fsid/metadata_uuid and
+ * CHANGING_FSID_V2 flag set.
+ */
+ list_for_each_entry(fs_devices, &fs_uuids, fs_list) {
+ if (fs_devices->fsid_change &&
+ memcmp(fs_devices->metadata_uuid,
+ fs_devices->fsid, BTRFS_FSID_SIZE) != 0 &&
+ memcmp(metadata_fsid, fs_devices->metadata_uuid,
+ BTRFS_FSID_SIZE) == 0) {
+ return fs_devices;
+ }
+ }
}
/* Handle non-split brain cases */
@@ -808,6 +822,30 @@ static struct btrfs_fs_devices *find_fsid_inprogress(
return NULL;
}
+
+static struct btrfs_fs_devices *find_fsid_changed(
+ struct btrfs_super_block *disk_super)
+{
+ struct btrfs_fs_devices *fs_devices;
+
+ /*
+ * Handles the case where scanned device is part of an fs that had
+ * multiple successful changes of FSID but curently device didn't
+ * observe it. Meaning our fsid will be different than theirs.
+ */
+ list_for_each_entry(fs_devices, &fs_uuids, fs_list) {
+ if (memcmp(fs_devices->metadata_uuid, fs_devices->fsid,
+ BTRFS_FSID_SIZE) != 0 &&
+ memcmp(fs_devices->metadata_uuid, disk_super->metadata_uuid,
+ BTRFS_FSID_SIZE) == 0 &&
+ memcmp(fs_devices->fsid, disk_super->fsid,
+ BTRFS_FSID_SIZE) != 0) {
+ return fs_devices;
+ }
+ }
+
+ return NULL;
+}
/*
* Add new device to list of registered devices
*
@@ -829,17 +867,20 @@ static noinline struct btrfs_device *device_list_add(const char *path,
bool fsid_change_in_progress = (btrfs_super_flags(disk_super) &
BTRFS_SUPER_FLAG_CHANGING_FSID_V2);
- if (fsid_change_in_progress && !has_metadata_uuid) {
- /*
- * When we have an image which has CHANGING_FSID_V2 set it might
- * belong to either a filesystem which has disks with completed
- * fsid change or it might belong to fs with no uuid changes in
- * effect, handle both.
- */
- fs_devices = find_fsid_inprogress(disk_super);
- if (!fs_devices)
- fs_devices = find_fsid(disk_super->fsid, NULL);
-
+ if (fsid_change_in_progress) {
+ if (!has_metadata_uuid) {
+ /*
+ * When we have an image which has CHANGING_FSID_V2 set
+ * it might belong to either a filesystem which has
+ * disks with completed fsid change or it might belong
+ * to fs with no uuid changes in effect, handle both.
+ */
+ fs_devices = find_fsid_inprogress(disk_super);
+ if (!fs_devices)
+ fs_devices = find_fsid(disk_super->fsid, NULL);
+ } else {
+ fs_devices = find_fsid_changed(disk_super);
+ }
} else if (has_metadata_uuid) {
fs_devices = find_fsid(disk_super->fsid,
disk_super->metadata_uuid);
--
2.7.4
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH v3 0/6] FSID change kernel support
2018-10-30 14:43 [PATCH v3 0/6] FSID change kernel support Nikolay Borisov
` (5 preceding siblings ...)
2018-10-30 14:43 ` [PATCH 6/6] btrfs: Handle final split-brain possibility " Nikolay Borisov
@ 2018-11-19 13:44 ` David Sterba
6 siblings, 0 replies; 9+ messages in thread
From: David Sterba @ 2018-11-19 13:44 UTC (permalink / raw)
To: Nikolay Borisov; +Cc: linux-btrfs
On Tue, Oct 30, 2018 at 04:43:22PM +0200, Nikolay Borisov wrote:
> Here is the 3rd submission for the kernel counterpart of the uuid change
> patchset. The only difference is that I (hope) have adressed all cosmetic
> feedback from David as well as have reworded some change logs to ease
> understanding. I've also re-run the regression tests and no failure were
> obsered.
>
> For background information refer to first posting [0] and the second one [1]
>
> [0] https://lore.kernel.org/linux-btrfs/1535531754-29774-1-git-send-email-nborisov@suse.com/
> [1] https://lore.kernel.org/linux-btrfs/1539270244-27076-1-git-send-email-nborisov@suse.com/
>
> Nikolay Borisov (6):
> btrfs: Introduce support for FSID change without metadata rewrite
> btrfs: Remove fsid/metadata_fsid fields from btrfs_info
> btrfs: Add handling for disk split-brain scenario during fsid change
> btrfs: Introduce 2 more members to struct btrfs_fs_devices
> btrfs: Handle one more split-brain scenario during fsid change
> btrfs: Handle final split-brain possibility during fsid change
I'm going to move this patchset from for-next to misc-next, the
userspace part is on the way to devel so we should be good for more
testing.
There are still some usability issues that I found during testing, that
can be fixed after the core is merged:
- metadata_uuid is missing sysfs export (+ btrfs-progs documentation)
- btrfstune -m/-M needs more verbose output, similar to -u/-U
d.
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2018-11-19 13:44 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-10-30 14:43 [PATCH v3 0/6] FSID change kernel support Nikolay Borisov
2018-10-30 14:43 ` [PATCH 1/6] btrfs: Introduce support for FSID change without metadata rewrite Nikolay Borisov
2018-10-30 14:43 ` [PATCH 2/6] btrfs: Remove fsid/metadata_fsid fields from btrfs_info Nikolay Borisov
2018-10-30 14:43 ` [PATCH 3/6] btrfs: Add handling for disk split-brain scenario during fsid change Nikolay Borisov
2018-10-30 14:43 ` [PATCH 4/6] btrfs: Introduce 2 more members to struct btrfs_fs_devices Nikolay Borisov
2018-10-30 14:43 ` [PATCH 5/6] btrfs: Handle one more split-brain scenario during fsid change Nikolay Borisov
2018-10-30 14:43 ` [PATCH 6/6] btrfs: Handle final split-brain possibility " Nikolay Borisov
2018-11-19 13:44 ` [PATCH v3 0/6] FSID change kernel support David Sterba
-- strict thread matches above, loose matches on Subject: below --
2018-10-11 15:03 [PATCH " Nikolay Borisov
2018-10-11 15:03 ` [PATCH 1/6] btrfs: Introduce support for FSID change without metadata rewrite Nikolay Borisov
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).