* [PATCH 1/6] btrfs: Introduce support for FSID change without metadata rewrite
2018-10-11 15:03 [PATCH 0/6] FSID change kernel support Nikolay Borisov
@ 2018-10-11 15:03 ` Nikolay Borisov
2018-10-11 15:03 ` [PATCH 2/6] btrfs: Remove fsid/metadata_fsid fields from btrfs_info Nikolay Borisov
` (5 subsequent siblings)
6 siblings, 0 replies; 10+ messages in thread
From: Nikolay Borisov @ 2018-10-11 15:03 UTC (permalink / raw)
To: linux-btrfs; +Cc: Nikolay Borisov
The new metadata_uuid field is going to be used when the user wants to
change the UUID of the filesystem without having to rewrite all metadata
blocks. It adds another level of indirection: when the FSID is changed,
the current uuid (the one with which the fs was created) is copied to the
'metadata_uuid' field in the superblock and a new incompat flag,
METADATA_UUID, is set. When the kernel detects that this flag is set it
knows that the superblock in fact has 2 uuids:
1. FSID - this is the user-visible UUID.
2. Metadata UUID - this is the UUID which is stamped into all on-disk
data structures belonging to this file system.
When the new incompat flag is present, device scanning checks whether
both the fsid and the metadata_uuid of the scanned device match any of
the registered filesystems. When the flag is not set, both UUIDs are
equal and only the FSID is retained on disk; metadata_uuid is set only
in memory during mount.
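The scan-time matching rule described above can be sketched as a small
userspace model. This is only an illustration under stated assumptions:
struct fs_devices_sketch and find_fsid_sketch() are hypothetical stand-ins
for struct btrfs_fs_devices and find_fsid(), and a plain array replaces the
kernel's fs_uuids list.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FSID_SIZE 16 /* mirrors BTRFS_FSID_SIZE */

/* Hypothetical, trimmed-down stand-in for struct btrfs_fs_devices. */
struct fs_devices_sketch {
	unsigned char fsid[FSID_SIZE];          /* user-visible UUID */
	unsigned char metadata_uuid[FSID_SIZE]; /* UUID stamped into metadata */
};

/*
 * Return the first registered entry matching the scanned device, or NULL.
 * Pass metadata_uuid == NULL when the scanned superblock does not have the
 * METADATA_UUID incompat flag set; in that case only the fsid is compared.
 */
static struct fs_devices_sketch *
find_fsid_sketch(struct fs_devices_sketch *registered, size_t count,
		 const unsigned char *fsid, const unsigned char *metadata_uuid)
{
	assert(fsid);

	for (size_t i = 0; i < count; i++) {
		struct fs_devices_sketch *cur = &registered[i];

		if (memcmp(fsid, cur->fsid, FSID_SIZE) != 0)
			continue;
		/* With the flag set, both UUIDs have to match. */
		if (metadata_uuid &&
		    memcmp(metadata_uuid, cur->metadata_uuid, FSID_SIZE) != 0)
			continue;
		return cur;
	}
	return NULL;
}
```

A device without the flag matches on fsid alone, while a device with the
flag is rejected as soon as either UUID differs.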
Additionally, a new metadata_fsid field is added to the fs_info
struct. It's initialised either with the FSID, in case the METADATA_UUID
incompat flag is not set, or with the metadata_uuid of the superblock
otherwise.
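As a rough userspace illustration of that initialisation rule (not the
kernel code itself): struct super_sketch and init_metadata_uuid() are
hypothetical names, and the flag value is the one this patch adds to
include/uapi/linux/btrfs.h.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FSID_SIZE 16                        /* mirrors BTRFS_FSID_SIZE */
#define INCOMPAT_METADATA_UUID (1ULL << 10) /* value from the uapi hunk */

/* Hypothetical subset of the superblock fields relevant here. */
struct super_sketch {
	unsigned char fsid[FSID_SIZE];
	unsigned char metadata_uuid[FSID_SIZE];
	uint64_t incompat_flags;
};

/* Pick the UUID that gets stamped into metadata blocks. */
static void init_metadata_uuid(unsigned char *out,
			       const struct super_sketch *sb)
{
	assert(out && sb);

	if (sb->incompat_flags & INCOMPAT_METADATA_UUID)
		memcpy(out, sb->metadata_uuid, FSID_SIZE);
	else
		memcpy(out, sb->fsid, FSID_SIZE); /* both UUIDs are equal */
}
```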
This commit introduces the new fields as well as the new incompat flag
and switches all users of the fsid to the new logic.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
fs/btrfs/ctree.c | 4 +--
fs/btrfs/ctree.h | 12 ++++---
fs/btrfs/disk-io.c | 32 ++++++++++++++----
fs/btrfs/extent-tree.c | 2 +-
fs/btrfs/volumes.c | 72 ++++++++++++++++++++++++++++++++---------
fs/btrfs/volumes.h | 1 +
include/uapi/linux/btrfs.h | 1 +
include/uapi/linux/btrfs_tree.h | 1 +
8 files changed, 97 insertions(+), 28 deletions(-)
diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 2ee43b6a4f09..11b5c2abeddc 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -224,7 +224,7 @@ int btrfs_copy_root(struct btrfs_trans_handle *trans,
else
btrfs_set_header_owner(cow, new_root_objectid);
- write_extent_buffer_fsid(cow, fs_info->fsid);
+ write_extent_buffer_fsid(cow, fs_info->metadata_fsid);
WARN_ON(btrfs_header_generation(buf) > trans->transid);
if (new_root_objectid == BTRFS_TREE_RELOC_OBJECTID)
@@ -1033,7 +1033,7 @@ static noinline int __btrfs_cow_block(struct btrfs_trans_handle *trans,
else
btrfs_set_header_owner(cow, root->root_key.objectid);
- write_extent_buffer_fsid(cow, fs_info->fsid);
+ write_extent_buffer_fsid(cow, fs_info->metadata_fsid);
ret = update_ref_for_cow(trans, root, buf, cow, &last_ref);
if (ret) {
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 15c659f23411..afa55f524a49 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -197,7 +197,7 @@ struct btrfs_root_backup {
struct btrfs_super_block {
u8 csum[BTRFS_CSUM_SIZE];
/* the first 4 fields must match struct btrfs_header */
- u8 fsid[BTRFS_FSID_SIZE]; /* FS specific uuid */
+ u8 fsid[BTRFS_FSID_SIZE]; /* user-facing FS specific uuid */
__le64 bytenr; /* this block number */
__le64 flags;
@@ -234,8 +234,10 @@ struct btrfs_super_block {
__le64 cache_generation;
__le64 uuid_tree_generation;
+ u8 metadata_uuid[BTRFS_FSID_SIZE]; /* The uuid written into btree blocks */
+
/* future expansion */
- __le64 reserved[30];
+ __le64 reserved[28];
u8 sys_chunk_array[BTRFS_SYSTEM_CHUNK_ARRAY_SIZE];
struct btrfs_root_backup super_roots[BTRFS_NUM_BACKUP_ROOTS];
} __attribute__ ((__packed__));
@@ -265,7 +267,8 @@ struct btrfs_super_block {
BTRFS_FEATURE_INCOMPAT_RAID56 | \
BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF | \
BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA | \
- BTRFS_FEATURE_INCOMPAT_NO_HOLES)
+ BTRFS_FEATURE_INCOMPAT_NO_HOLES | \
+ BTRFS_FEATURE_INCOMPAT_METADATA_UUID)
#define BTRFS_FEATURE_INCOMPAT_SAFE_SET \
(BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF)
@@ -746,7 +749,8 @@ struct btrfs_delayed_root;
#define BTRFS_FS_BALANCE_RUNNING 18
struct btrfs_fs_info {
- u8 fsid[BTRFS_FSID_SIZE];
+ u8 fsid[BTRFS_FSID_SIZE]; /* User-visible fs UUID */
+ u8 metadata_fsid[BTRFS_FSID_SIZE]; /* UUID written to btree blocks */
u8 chunk_tree_uuid[BTRFS_UUID_SIZE];
unsigned long flags;
struct btrfs_root *extent_root;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 27f6a3348f94..b61e4d47e316 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -551,7 +551,7 @@ static int csum_dirty_buffer(struct btrfs_fs_info *fs_info, struct page *page)
if (WARN_ON(!PageUptodate(page)))
return -EUCLEAN;
- ASSERT(memcmp_extent_buffer(eb, fs_info->fsid,
+ ASSERT(memcmp_extent_buffer(eb, fs_info->metadata_fsid,
btrfs_header_fsid(), BTRFS_FSID_SIZE) == 0);
return csum_tree_block(fs_info, eb, 0);
@@ -566,7 +566,19 @@ static int check_tree_block_fsid(struct btrfs_fs_info *fs_info,
read_extent_buffer(eb, fsid, btrfs_header_fsid(), BTRFS_FSID_SIZE);
while (fs_devices) {
- if (!memcmp(fsid, fs_devices->fsid, BTRFS_FSID_SIZE)) {
+ u8 *metadata_uuid;
+ /*
+ * Checking the incompat flag is only valid for the current
+ * fs. For seed devices it's forbidden to have their uuid
+ * changed so reading ->fsid in this case is fine
+ */
+ if (fs_devices == fs_info->fs_devices &&
+ btrfs_fs_incompat(fs_info, METADATA_UUID))
+ metadata_uuid = fs_devices->metadata_uuid;
+ else
+ metadata_uuid = fs_devices->fsid;
+
+ if (!memcmp(fsid, metadata_uuid, BTRFS_FSID_SIZE)) {
ret = 0;
break;
}
@@ -2478,10 +2490,11 @@ static int validate_super(struct btrfs_fs_info *fs_info,
ret = -EINVAL;
}
- if (memcmp(fs_info->fsid, sb->dev_item.fsid, BTRFS_FSID_SIZE) != 0) {
+ if (memcmp(fs_info->metadata_fsid, sb->dev_item.fsid,
+ BTRFS_FSID_SIZE) != 0) {
btrfs_err(fs_info,
- "dev_item UUID does not match fsid: %pU != %pU",
- fs_info->fsid, sb->dev_item.fsid);
+ "dev_item UUID does not match metadata fsid: %pU != %pU",
+ fs_info->metadata_fsid, sb->dev_item.fsid);
ret = -EINVAL;
}
@@ -2822,6 +2835,12 @@ int open_ctree(struct super_block *sb,
brelse(bh);
memcpy(fs_info->fsid, fs_info->super_copy->fsid, BTRFS_FSID_SIZE);
+ if (btrfs_fs_incompat(fs_info, METADATA_UUID)) {
+ memcpy(fs_info->metadata_fsid,
+ fs_info->super_copy->metadata_uuid, BTRFS_FSID_SIZE);
+ } else {
+ memcpy(fs_info->metadata_fsid, fs_info->fsid, BTRFS_FSID_SIZE);
+ }
ret = btrfs_validate_mount_super(fs_info);
if (ret) {
@@ -3760,7 +3779,8 @@ int write_all_supers(struct btrfs_fs_info *fs_info, int max_mirrors)
btrfs_set_stack_device_io_width(dev_item, dev->io_width);
btrfs_set_stack_device_sector_size(dev_item, dev->sector_size);
memcpy(dev_item->uuid, dev->uuid, BTRFS_UUID_SIZE);
- memcpy(dev_item->fsid, dev->fs_devices->fsid, BTRFS_FSID_SIZE);
+ memcpy(dev_item->fsid, dev->fs_devices->metadata_uuid,
+ BTRFS_FSID_SIZE);
flags = btrfs_super_flags(sb);
btrfs_set_super_flags(sb, flags | BTRFS_HEADER_FLAG_WRITTEN);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 97cb2c17802e..4926d0975242 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -8174,7 +8174,7 @@ btrfs_init_new_buffer(struct btrfs_trans_handle *trans, struct btrfs_root *root,
btrfs_set_header_generation(buf, trans->transid);
btrfs_set_header_backref_rev(buf, BTRFS_MIXED_BACKREF_REV);
btrfs_set_header_owner(buf, owner);
- write_extent_buffer_fsid(buf, fs_info->fsid);
+ write_extent_buffer_fsid(buf, fs_info->metadata_fsid);
write_extent_buffer_chunk_tree_uuid(buf, fs_info->chunk_tree_uuid);
if (root->root_key.objectid == BTRFS_TREE_LOG_OBJECTID) {
buf->log_index = root->log_transid % 2;
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index f435d397019e..e3e12c94834f 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -238,13 +238,15 @@ struct list_head *btrfs_get_fs_uuids(void)
/*
* alloc_fs_devices - allocate struct btrfs_fs_devices
- * @fsid: if not NULL, copy the uuid to fs_devices::fsid
+ * @fsid: if not NULL, copy the uuid to fs_devices::fsid
+ * @metadata_fsid: if not NULL, copy the uuid to fs_devices::metadata_uuid
*
* Return a pointer to a new struct btrfs_fs_devices on success, or ERR_PTR().
* The returned struct is not linked onto any lists and can be destroyed with
* kfree() right away.
*/
-static struct btrfs_fs_devices *alloc_fs_devices(const u8 *fsid)
+static struct btrfs_fs_devices *alloc_fs_devices(const u8 *fsid,
+ const u8 *metadata_fsid)
{
struct btrfs_fs_devices *fs_devs;
@@ -261,6 +263,11 @@ static struct btrfs_fs_devices *alloc_fs_devices(const u8 *fsid)
if (fsid)
memcpy(fs_devs->fsid, fsid, BTRFS_FSID_SIZE);
+ if (metadata_fsid)
+ memcpy(fs_devs->metadata_uuid, metadata_fsid, BTRFS_FSID_SIZE);
+ else if (fsid)
+ memcpy(fs_devs->metadata_uuid, fsid, BTRFS_FSID_SIZE);
+
return fs_devs;
}
@@ -368,13 +375,24 @@ static struct btrfs_device *find_device(struct btrfs_fs_devices *fs_devices,
return NULL;
}
-static noinline struct btrfs_fs_devices *find_fsid(u8 *fsid)
+static noinline struct btrfs_fs_devices *
+find_fsid(const u8 *fsid, const u8 *metadata_fsid)
{
struct btrfs_fs_devices *fs_devices;
+ ASSERT(fsid);
+
list_for_each_entry(fs_devices, &fs_uuids, fs_list) {
- if (memcmp(fsid, fs_devices->fsid, BTRFS_FSID_SIZE) == 0)
- return fs_devices;
+ if (metadata_fsid) {
+ if (memcmp(fsid, fs_devices->fsid, BTRFS_FSID_SIZE) == 0
+ && memcmp(metadata_fsid, fs_devices->metadata_uuid,
+ BTRFS_FSID_SIZE) == 0)
+ return fs_devices;
+ } else {
+ if (memcmp(fsid, fs_devices->fsid,
+ BTRFS_FSID_SIZE) == 0)
+ return fs_devices;
+ }
}
return NULL;
}
@@ -709,6 +727,12 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
device->generation = btrfs_super_generation(disk_super);
if (btrfs_super_flags(disk_super) & BTRFS_SUPER_FLAG_SEEDING) {
+ if (btrfs_super_incompat_flags(disk_super) &
+ BTRFS_FEATURE_INCOMPAT_METADATA_UUID) {
+ pr_err("BTRFS: Invalid seeding and uuid-changed device detected\n");
+ goto error_brelse;
+ }
+
clear_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state);
fs_devices->seeding = 1;
} else {
@@ -759,10 +783,21 @@ static noinline struct btrfs_device *device_list_add(const char *path,
struct rcu_string *name;
u64 found_transid = btrfs_super_generation(disk_super);
u64 devid = btrfs_stack_device_id(&disk_super->dev_item);
+ bool has_metadata_uuid = (btrfs_super_incompat_flags(disk_super) &
+ BTRFS_FEATURE_INCOMPAT_METADATA_UUID);
+
+ if (has_metadata_uuid)
+ fs_devices = find_fsid(disk_super->fsid, disk_super->metadata_uuid);
+ else
+ fs_devices = find_fsid(disk_super->fsid, NULL);
- fs_devices = find_fsid(disk_super->fsid);
if (!fs_devices) {
- fs_devices = alloc_fs_devices(disk_super->fsid);
+ if (has_metadata_uuid)
+ fs_devices = alloc_fs_devices(disk_super->fsid,
+ disk_super->metadata_uuid);
+ else
+ fs_devices = alloc_fs_devices(disk_super->fsid, NULL);
+
if (IS_ERR(fs_devices))
return ERR_CAST(fs_devices);
@@ -884,7 +919,7 @@ static struct btrfs_fs_devices *clone_fs_devices(struct btrfs_fs_devices *orig)
struct btrfs_device *device;
struct btrfs_device *orig_dev;
- fs_devices = alloc_fs_devices(orig->fsid);
+ fs_devices = alloc_fs_devices(orig->fsid, NULL);
if (IS_ERR(fs_devices))
return fs_devices;
@@ -1709,7 +1744,8 @@ static int btrfs_add_dev_item(struct btrfs_trans_handle *trans,
ptr = btrfs_device_uuid(dev_item);
write_extent_buffer(leaf, device->uuid, ptr, BTRFS_UUID_SIZE);
ptr = btrfs_device_fsid(dev_item);
- write_extent_buffer(leaf, trans->fs_info->fsid, ptr, BTRFS_FSID_SIZE);
+ write_extent_buffer(leaf, trans->fs_info->metadata_fsid, ptr,
+ BTRFS_FSID_SIZE);
btrfs_mark_buffer_dirty(leaf);
ret = 0;
@@ -2132,7 +2168,11 @@ static struct btrfs_device *btrfs_find_device_by_path(
disk_super = (struct btrfs_super_block *)bh->b_data;
devid = btrfs_stack_device_id(&disk_super->dev_item);
dev_uuid = disk_super->dev_item.uuid;
- device = btrfs_find_device(fs_info, devid, dev_uuid, disk_super->fsid);
+ if (btrfs_fs_incompat(fs_info, METADATA_UUID))
+ device = btrfs_find_device(fs_info, devid, dev_uuid, disk_super->metadata_uuid);
+ else
+ device = btrfs_find_device(fs_info, devid, dev_uuid, disk_super->fsid);
+
brelse(bh);
if (!device)
device = ERR_PTR(-ENOENT);
@@ -2202,7 +2242,7 @@ static int btrfs_prepare_sprout(struct btrfs_fs_info *fs_info)
if (!fs_devices->seeding)
return -EINVAL;
- seed_devices = alloc_fs_devices(NULL);
+ seed_devices = alloc_fs_devices(NULL, NULL);
if (IS_ERR(seed_devices))
return PTR_ERR(seed_devices);
@@ -2239,6 +2279,8 @@ static int btrfs_prepare_sprout(struct btrfs_fs_info *fs_info)
generate_random_uuid(fs_devices->fsid);
memcpy(fs_info->fsid, fs_devices->fsid, BTRFS_FSID_SIZE);
+ memcpy(fs_devices->metadata_uuid, fs_devices->fsid, BTRFS_FSID_SIZE);
+ memcpy(fs_info->metadata_fsid, fs_devices->fsid, BTRFS_FSID_SIZE);
memcpy(disk_super->fsid, fs_devices->fsid, BTRFS_FSID_SIZE);
mutex_unlock(&fs_devices->device_list_mutex);
@@ -6245,7 +6287,7 @@ struct btrfs_device *btrfs_find_device(struct btrfs_fs_info *fs_info, u64 devid,
cur_devices = fs_info->fs_devices;
while (cur_devices) {
if (!fsid ||
- !memcmp(cur_devices->fsid, fsid, BTRFS_FSID_SIZE)) {
+ !memcmp(cur_devices->metadata_uuid, fsid, BTRFS_FSID_SIZE)) {
device = find_device(cur_devices, devid, uuid);
if (device)
return device;
@@ -6574,12 +6616,12 @@ static struct btrfs_fs_devices *open_seed_devices(struct btrfs_fs_info *fs_info,
fs_devices = fs_devices->seed;
}
- fs_devices = find_fsid(fsid);
+ fs_devices = find_fsid(fsid, NULL);
if (!fs_devices) {
if (!btrfs_test_opt(fs_info, DEGRADED))
return ERR_PTR(-ENOENT);
- fs_devices = alloc_fs_devices(fsid);
+ fs_devices = alloc_fs_devices(fsid, NULL);
if (IS_ERR(fs_devices))
return fs_devices;
@@ -6629,7 +6671,7 @@ static int read_one_dev(struct btrfs_fs_info *fs_info,
read_extent_buffer(leaf, fs_uuid, btrfs_device_fsid(dev_item),
BTRFS_FSID_SIZE);
- if (memcmp(fs_uuid, fs_info->fsid, BTRFS_FSID_SIZE)) {
+ if (memcmp(fs_uuid, fs_info->metadata_fsid, BTRFS_FSID_SIZE)) {
fs_devices = open_seed_devices(fs_info, fs_uuid);
if (IS_ERR(fs_devices))
return PTR_ERR(fs_devices);
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index aefce895e994..04860497b33c 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -210,6 +210,7 @@ BTRFS_DEVICE_GETSET_FUNCS(bytes_used);
struct btrfs_fs_devices {
u8 fsid[BTRFS_FSID_SIZE]; /* FS specific uuid */
+ u8 metadata_uuid[BTRFS_FSID_SIZE];
struct list_head fs_list;
u64 num_devices;
diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index 5ca1d21fc4a7..e0763bc4158e 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -269,6 +269,7 @@ struct btrfs_ioctl_fs_info_args {
#define BTRFS_FEATURE_INCOMPAT_RAID56 (1ULL << 7)
#define BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA (1ULL << 8)
#define BTRFS_FEATURE_INCOMPAT_NO_HOLES (1ULL << 9)
+#define BTRFS_FEATURE_INCOMPAT_METADATA_UUID (1ULL << 10)
struct btrfs_ioctl_feature_flags {
__u64 compat_flags;
diff --git a/include/uapi/linux/btrfs_tree.h b/include/uapi/linux/btrfs_tree.h
index aff1356c2bb8..22f9299ba7d9 100644
--- a/include/uapi/linux/btrfs_tree.h
+++ b/include/uapi/linux/btrfs_tree.h
@@ -458,6 +458,7 @@ struct btrfs_free_space_header {
#define BTRFS_SUPER_FLAG_METADUMP (1ULL << 33)
#define BTRFS_SUPER_FLAG_METADUMP_V2 (1ULL << 34)
#define BTRFS_SUPER_FLAG_CHANGING_FSID (1ULL << 35)
+#define BTRFS_SUPER_FLAG_CHANGING_FSID_v2 (1ULL << 36)
/*
--
2.7.4
* [PATCH 2/6] btrfs: Remove fsid/metadata_fsid fields from btrfs_info
2018-10-11 15:03 [PATCH 0/6] FSID change kernel support Nikolay Borisov
2018-10-11 15:03 ` [PATCH 1/6] btrfs: Introduce support for FSID change without metadata rewrite Nikolay Borisov
@ 2018-10-11 15:03 ` Nikolay Borisov
2018-10-11 15:03 ` [PATCH 3/6] btrfs: Add handling for disk split-brain scenario during fsid change Nikolay Borisov
` (4 subsequent siblings)
6 siblings, 0 replies; 10+ messages in thread
From: Nikolay Borisov @ 2018-10-11 15:03 UTC (permalink / raw)
To: linux-btrfs; +Cc: Nikolay Borisov
Currently btrfs_fs_info structure contains a copy of the
fsid/metadata_uuid fields. Same values are also contained in the
btrfs_fs_devices structure which fs_info has a reference to. Let's
reduce duplication by removing the fields from fs_info and always refer
to the ones in fs_devices. No functional changes.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
fs/btrfs/check-integrity.c | 2 +-
fs/btrfs/ctree.c | 5 +++--
fs/btrfs/ctree.h | 2 --
fs/btrfs/disk-io.c | 21 ++++++++++++---------
fs/btrfs/extent-tree.c | 2 +-
fs/btrfs/ioctl.c | 2 +-
fs/btrfs/super.c | 2 +-
fs/btrfs/volumes.c | 10 ++++------
include/trace/events/btrfs.h | 2 +-
9 files changed, 24 insertions(+), 24 deletions(-)
diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c
index 2e43fba44035..781cae168d2a 100644
--- a/fs/btrfs/check-integrity.c
+++ b/fs/btrfs/check-integrity.c
@@ -1720,7 +1720,7 @@ static int btrfsic_test_for_metadata(struct btrfsic_state *state,
num_pages = state->metablock_size >> PAGE_SHIFT;
h = (struct btrfs_header *)datav[0];
- if (memcmp(h->fsid, fs_info->fsid, BTRFS_FSID_SIZE))
+ if (memcmp(h->fsid, fs_info->fs_devices->fsid, BTRFS_FSID_SIZE))
return 1;
for (i = 0; i < num_pages; i++) {
diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 11b5c2abeddc..9de05de8887d 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -12,6 +12,7 @@
#include "transaction.h"
#include "print-tree.h"
#include "locking.h"
+#include "volumes.h"
static int split_node(struct btrfs_trans_handle *trans, struct btrfs_root
*root, struct btrfs_path *path, int level);
@@ -224,7 +225,7 @@ int btrfs_copy_root(struct btrfs_trans_handle *trans,
else
btrfs_set_header_owner(cow, new_root_objectid);
- write_extent_buffer_fsid(cow, fs_info->metadata_fsid);
+ write_extent_buffer_fsid(cow, fs_info->fs_devices->metadata_uuid);
WARN_ON(btrfs_header_generation(buf) > trans->transid);
if (new_root_objectid == BTRFS_TREE_RELOC_OBJECTID)
@@ -1033,7 +1034,7 @@ static noinline int __btrfs_cow_block(struct btrfs_trans_handle *trans,
else
btrfs_set_header_owner(cow, root->root_key.objectid);
- write_extent_buffer_fsid(cow, fs_info->metadata_fsid);
+ write_extent_buffer_fsid(cow, fs_info->fs_devices->metadata_uuid);
ret = update_ref_for_cow(trans, root, buf, cow, &last_ref);
if (ret) {
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index afa55f524a49..ad797ee42894 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -749,8 +749,6 @@ struct btrfs_delayed_root;
#define BTRFS_FS_BALANCE_RUNNING 18
struct btrfs_fs_info {
- u8 fsid[BTRFS_FSID_SIZE]; /* User-visible fs UUID */
- u8 metadata_fsid[BTRFS_FSID_SIZE]; /* UUID written to btree blocks */
u8 chunk_tree_uuid[BTRFS_UUID_SIZE];
unsigned long flags;
struct btrfs_root *extent_root;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index b61e4d47e316..be2caf513e2f 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -551,7 +551,7 @@ static int csum_dirty_buffer(struct btrfs_fs_info *fs_info, struct page *page)
if (WARN_ON(!PageUptodate(page)))
return -EUCLEAN;
- ASSERT(memcmp_extent_buffer(eb, fs_info->metadata_fsid,
+ ASSERT(memcmp_extent_buffer(eb, fs_info->fs_devices->metadata_uuid,
btrfs_header_fsid(), BTRFS_FSID_SIZE) == 0);
return csum_tree_block(fs_info, eb, 0);
@@ -2490,11 +2490,12 @@ static int validate_super(struct btrfs_fs_info *fs_info,
ret = -EINVAL;
}
- if (memcmp(fs_info->metadata_fsid, sb->dev_item.fsid,
+ if (memcmp(fs_info->fs_devices->metadata_uuid, sb->dev_item.fsid,
BTRFS_FSID_SIZE) != 0) {
btrfs_err(fs_info,
"dev_item UUID does not match metadata fsid: %pU != %pU",
- fs_info->metadata_fsid, sb->dev_item.fsid);
+ fs_info->fs_devices->metadata_uuid,
+ sb->dev_item.fsid);
ret = -EINVAL;
}
@@ -2834,14 +2835,16 @@ int open_ctree(struct super_block *sb,
sizeof(*fs_info->super_for_commit));
brelse(bh);
- memcpy(fs_info->fsid, fs_info->super_copy->fsid, BTRFS_FSID_SIZE);
+ ASSERT(!memcmp(fs_info->fs_devices->fsid, fs_info->super_copy->fsid,
+ BTRFS_FSID_SIZE));
+
if (btrfs_fs_incompat(fs_info, METADATA_UUID)) {
- memcpy(fs_info->metadata_fsid,
- fs_info->super_copy->metadata_uuid, BTRFS_FSID_SIZE);
- } else {
- memcpy(fs_info->metadata_fsid, fs_info->fsid, BTRFS_FSID_SIZE);
+ ASSERT(!memcmp(fs_info->fs_devices->metadata_uuid,
+ fs_info->super_copy->metadata_uuid,
+ BTRFS_FSID_SIZE));
}
+
ret = btrfs_validate_mount_super(fs_info);
if (ret) {
btrfs_err(fs_info, "superblock contains fatal errors");
@@ -2961,7 +2964,7 @@ int open_ctree(struct super_block *sb,
sb->s_blocksize = sectorsize;
sb->s_blocksize_bits = blksize_bits(sectorsize);
- memcpy(&sb->s_uuid, fs_info->fsid, BTRFS_FSID_SIZE);
+ memcpy(&sb->s_uuid, fs_info->fs_devices->fsid, BTRFS_FSID_SIZE);
mutex_lock(&fs_info->chunk_mutex);
ret = btrfs_read_sys_array(fs_info);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 4926d0975242..615e071ce17c 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -8174,7 +8174,7 @@ btrfs_init_new_buffer(struct btrfs_trans_handle *trans, struct btrfs_root *root,
btrfs_set_header_generation(buf, trans->transid);
btrfs_set_header_backref_rev(buf, BTRFS_MIXED_BACKREF_REV);
btrfs_set_header_owner(buf, owner);
- write_extent_buffer_fsid(buf, fs_info->metadata_fsid);
+ write_extent_buffer_fsid(buf, fs_info->fs_devices->metadata_uuid);
write_extent_buffer_chunk_tree_uuid(buf, fs_info->chunk_tree_uuid);
if (root->root_key.objectid == BTRFS_TREE_LOG_OBJECTID) {
buf->log_index = root->log_transid % 2;
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index a990a9045139..510635e49319 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3135,7 +3135,7 @@ static long btrfs_ioctl_fs_info(struct btrfs_fs_info *fs_info,
}
rcu_read_unlock();
- memcpy(&fi_args->fsid, fs_info->fsid, sizeof(fi_args->fsid));
+ memcpy(&fi_args->fsid, fs_devices->fsid, sizeof(fi_args->fsid));
fi_args->nodesize = fs_info->nodesize;
fi_args->sectorsize = fs_info->sectorsize;
fi_args->clone_alignment = fs_info->sectorsize;
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index b362b45dd757..1163183fe6dd 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -2090,7 +2090,7 @@ static int btrfs_statfs(struct dentry *dentry, struct kstatfs *buf)
u64 total_free_data = 0;
u64 total_free_meta = 0;
int bits = dentry->d_sb->s_blocksize_bits;
- __be32 *fsid = (__be32 *)fs_info->fsid;
+ __be32 *fsid = (__be32 *)fs_info->fs_devices->fsid;
unsigned factor = 1;
struct btrfs_block_rsv *block_rsv = &fs_info->global_block_rsv;
int ret;
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index e3e12c94834f..bf0aa900f22c 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1744,8 +1744,8 @@ static int btrfs_add_dev_item(struct btrfs_trans_handle *trans,
ptr = btrfs_device_uuid(dev_item);
write_extent_buffer(leaf, device->uuid, ptr, BTRFS_UUID_SIZE);
ptr = btrfs_device_fsid(dev_item);
- write_extent_buffer(leaf, trans->fs_info->metadata_fsid, ptr,
- BTRFS_FSID_SIZE);
+ write_extent_buffer(leaf, trans->fs_info->fs_devices->metadata_uuid,
+ ptr, BTRFS_FSID_SIZE);
btrfs_mark_buffer_dirty(leaf);
ret = 0;
@@ -2278,9 +2278,7 @@ static int btrfs_prepare_sprout(struct btrfs_fs_info *fs_info)
fs_devices->seed = seed_devices;
generate_random_uuid(fs_devices->fsid);
- memcpy(fs_info->fsid, fs_devices->fsid, BTRFS_FSID_SIZE);
memcpy(fs_devices->metadata_uuid, fs_devices->fsid, BTRFS_FSID_SIZE);
- memcpy(fs_info->metadata_fsid, fs_devices->fsid, BTRFS_FSID_SIZE);
memcpy(disk_super->fsid, fs_devices->fsid, BTRFS_FSID_SIZE);
mutex_unlock(&fs_devices->device_list_mutex);
@@ -2522,7 +2520,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
* so rename the fsid on the sysfs
*/
snprintf(fsid_buf, BTRFS_UUID_UNPARSED_SIZE, "%pU",
- fs_info->fsid);
+ fs_info->fs_devices->fsid);
if (kobject_rename(&fs_devices->fsid_kobj, fsid_buf))
btrfs_warn(fs_info,
"sysfs: failed to create fsid for sprout");
@@ -6671,7 +6669,7 @@ static int read_one_dev(struct btrfs_fs_info *fs_info,
read_extent_buffer(leaf, fs_uuid, btrfs_device_fsid(dev_item),
BTRFS_FSID_SIZE);
- if (memcmp(fs_uuid, fs_info->metadata_fsid, BTRFS_FSID_SIZE)) {
+ if (memcmp(fs_uuid, fs_devices->metadata_uuid, BTRFS_FSID_SIZE)) {
fs_devices = open_seed_devices(fs_info, fs_uuid);
if (IS_ERR(fs_devices))
return PTR_ERR(fs_devices);
diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index 8568946f491d..4b8400f7d4fa 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -92,7 +92,7 @@ TRACE_DEFINE_ENUM(COMMIT_TRANS);
#define TP_STRUCT__entry_fsid __array(u8, fsid, BTRFS_FSID_SIZE)
#define TP_fast_assign_fsid(fs_info) \
- memcpy(__entry->fsid, fs_info->fsid, BTRFS_FSID_SIZE)
+ memcpy(__entry->fsid, fs_info->fs_devices->fsid, BTRFS_FSID_SIZE)
#define TP_STRUCT__entry_btrfs(args...) \
TP_STRUCT__entry( \
--
2.7.4
* [PATCH 3/6] btrfs: Add handling for disk split-brain scenario during fsid change
2018-10-11 15:03 [PATCH 0/6] FSID change kernel support Nikolay Borisov
2018-10-11 15:03 ` [PATCH 1/6] btrfs: Introduce support for FSID change without metadata rewrite Nikolay Borisov
2018-10-11 15:03 ` [PATCH 2/6] btrfs: Remove fsid/metadata_fsid fields from btrfs_info Nikolay Borisov
@ 2018-10-11 15:03 ` Nikolay Borisov
2018-10-11 15:03 ` [PATCH 4/6] btrfs: Introduce 2 more members to struct btrfs_fs_devices Nikolay Borisov
` (3 subsequent siblings)
6 siblings, 0 replies; 10+ messages in thread
From: Nikolay Borisov @ 2018-10-11 15:03 UTC (permalink / raw)
To: linux-btrfs; +Cc: Nikolay Borisov
Even though FSID change without rewrite is a very quick operation, it's
still possible to experience a split-brain scenario if power loss
occurs at the right time. This patch handles the case where power
failure occurs while the first transaction (the one setting the
FSID_CHANGING_V2 flag) is being persisted on disk. This can cause
the btrfs_fs_devices of this filesystem to be created by a device which:
a) has the FSID_CHANGING_V2 flag set but its fsid value is intact
b) or a device which doesn't have FSID_CHANGING_V2 flag set and its
fsid value is intact
This situation is trivially handled by the current find_fsid code since
in both cases the devices are going to be treated like ordinary devices.
Since btrfs is always mounted using the superblock of the latest
device (the one with the highest generation number), meaning it will have
the FSID_CHANGING_V2 flag set, ensure the flag is cleared. On the first
transaction commit following the mount all disks will have it cleared.
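The flag handling at mount time boils down to a test-and-clear on the
superblock flags word. A minimal sketch, assuming a plain uint64_t in place
of the on-disk field and a hypothetical helper name; the bit value is the
one defined in patch 1 of this series:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SUPER_FLAG_CHANGING_FSID_V2 (1ULL << 36) /* bit value from patch 1 */

/*
 * Clear the "fsid change in progress" bit in place.
 * Returns true if the bit was set, so the caller can log the event.
 */
static bool clear_fsid_change_flag(uint64_t *super_flags)
{
	assert(super_flags);

	if (!(*super_flags & SUPER_FLAG_CHANGING_FSID_V2))
		return false;

	*super_flags &= ~SUPER_FLAG_CHANGING_FSID_V2;
	return true;
}
```

The patch clears the flag before super_copy is duplicated into
super_for_commit, so the copy written by the next commit no longer carries
it.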
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
fs/btrfs/disk-io.c | 14 +++++++++++---
1 file changed, 11 insertions(+), 3 deletions(-)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index be2caf513e2f..9c2f46f8421a 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2831,10 +2831,10 @@ int open_ctree(struct super_block *sb,
* the whole block of INFO_SIZE
*/
memcpy(fs_info->super_copy, bh->b_data, sizeof(*fs_info->super_copy));
- memcpy(fs_info->super_for_commit, fs_info->super_copy,
- sizeof(*fs_info->super_for_commit));
brelse(bh);
+ disk_super = fs_info->super_copy;
+
ASSERT(!memcmp(fs_info->fs_devices->fsid, fs_info->super_copy->fsid,
BTRFS_FSID_SIZE));
@@ -2844,6 +2844,15 @@ int open_ctree(struct super_block *sb,
BTRFS_FSID_SIZE));
}
+ features = btrfs_super_flags(disk_super);
+ if (features & BTRFS_SUPER_FLAG_CHANGING_FSID_v2) {
+ features &= ~BTRFS_SUPER_FLAG_CHANGING_FSID_v2;
+ btrfs_set_super_flags(disk_super, features);
+ btrfs_info(fs_info, "Found metadata uuid in progress flag. Clearing\n");
+ }
+
+ memcpy(fs_info->super_for_commit, fs_info->super_copy,
+ sizeof(*fs_info->super_for_commit));
ret = btrfs_validate_mount_super(fs_info);
if (ret) {
@@ -2852,7 +2861,6 @@ int open_ctree(struct super_block *sb,
goto fail_alloc;
}
- disk_super = fs_info->super_copy;
if (!btrfs_super_root(disk_super))
goto fail_alloc;
--
2.7.4
* [PATCH 4/6] btrfs: Introduce 2 more members to struct btrfs_fs_devices
2018-10-11 15:03 [PATCH 0/6] FSID change kernel support Nikolay Borisov
` (2 preceding siblings ...)
2018-10-11 15:03 ` [PATCH 3/6] btrfs: Add handling for disk split-brain scenario during fsid change Nikolay Borisov
@ 2018-10-11 15:03 ` Nikolay Borisov
2018-10-11 15:03 ` [PATCH 5/6] btrfs: Handle one more split-brain scenario during fsid change Nikolay Borisov
` (2 subsequent siblings)
6 siblings, 0 replies; 10+ messages in thread
From: Nikolay Borisov @ 2018-10-11 15:03 UTC (permalink / raw)
To: linux-btrfs; +Cc: Nikolay Borisov
In order to gracefully handle split-brain scenarios, which are very
unlikely yet still possible while performing the FSID change, I'm
going to need two more pieces of information:
1. The highest generation number among all devices registered to a
particular btrfs_fs_devices
2. A boolean flag indicating whether a given btrfs_fs_devices was created
by a device which had the FSID_CHANGING_V2 flag set.
This is a preparatory patch and just introduces the variables as well
as the code which sets them; their actual use is going to happen in a
later patch.
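The bookkeeping for these two members can be modelled in a few lines. This
is a sketch only: struct fs_devices_state and both helpers are hypothetical
names, and the real code updates the fields inside device_list_add().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of struct btrfs_fs_devices. */
struct fs_devices_state {
	bool fsid_change;           /* creating device had FSID_CHANGING_V2 */
	uint64_t latest_generation; /* highest generation seen so far */
	bool opened;                /* filesystem already opened */
};

/* Called once, when the first device of a filesystem is scanned. */
static void fs_devices_init(struct fs_devices_state *fs,
			    bool change_in_progress)
{
	assert(fs);
	fs->fsid_change = change_in_progress;
	fs->latest_generation = 0;
	fs->opened = false;
}

/* Called for every scanned device; ignored once the fs is opened. */
static void note_scanned_device(struct fs_devices_state *fs,
				uint64_t generation)
{
	assert(fs);
	if (!fs->opened && generation > fs->latest_generation)
		fs->latest_generation = generation;
}
```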
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
fs/btrfs/volumes.c | 9 ++++++++-
fs/btrfs/volumes.h | 5 +++++
2 files changed, 13 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index bf0aa900f22c..c2b66d15e08d 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -785,6 +785,8 @@ static noinline struct btrfs_device *device_list_add(const char *path,
u64 devid = btrfs_stack_device_id(&disk_super->dev_item);
bool has_metadata_uuid = (btrfs_super_incompat_flags(disk_super) &
BTRFS_FEATURE_INCOMPAT_METADATA_UUID);
+ bool fsid_change_in_progress = (btrfs_super_flags(disk_super) &
+ BTRFS_SUPER_FLAG_CHANGING_FSID_v2);
if (has_metadata_uuid)
fs_devices = find_fsid(disk_super->fsid, disk_super->metadata_uuid);
@@ -798,6 +800,8 @@ static noinline struct btrfs_device *device_list_add(const char *path,
else
fs_devices = alloc_fs_devices(disk_super->fsid, NULL);
if (IS_ERR(fs_devices))
return ERR_CAST(fs_devices);
+
+ fs_devices->fsid_change = fsid_change_in_progress;
@@ -904,8 +908,11 @@ static noinline struct btrfs_device *device_list_add(const char *path,
* it back. We need it to pick the disk with largest generation
* (as above).
*/
- if (!fs_devices->opened)
+ if (!fs_devices->opened) {
device->generation = found_transid;
+ fs_devices->latest_generation = max(found_transid,
+ fs_devices->latest_generation);
+ }
fs_devices->total_devices = btrfs_super_num_devices(disk_super);
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 04860497b33c..6b2a01c55426 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -211,6 +211,7 @@ BTRFS_DEVICE_GETSET_FUNCS(bytes_used);
struct btrfs_fs_devices {
u8 fsid[BTRFS_FSID_SIZE]; /* FS specific uuid */
u8 metadata_uuid[BTRFS_FSID_SIZE];
+ bool fsid_change;
struct list_head fs_list;
u64 num_devices;
@@ -219,6 +220,10 @@ struct btrfs_fs_devices {
u64 missing_devices;
u64 total_rw_bytes;
u64 total_devices;
+
+ /* Highest generation number of seen devices */
+ u64 latest_generation;
+
struct block_device *latest_bdev;
/* all of the devices in the FS, protected by a mutex
--
2.7.4
* [PATCH 5/6] btrfs: Handle one more split-brain scenario during fsid change
2018-10-11 15:03 [PATCH 0/6] FSID change kernel support Nikolay Borisov
` (3 preceding siblings ...)
2018-10-11 15:03 ` [PATCH 4/6] btrfs: Introduce 2 more members to struct btrfs_fs_devices Nikolay Borisov
@ 2018-10-11 15:03 ` Nikolay Borisov
2018-10-11 15:03 ` [PATCH 6/6] btrfs: Handle final split-brain possibility " Nikolay Borisov
2018-10-19 14:18 ` [PATCH 0/6] FSID change kernel support David Sterba
6 siblings, 0 replies; 10+ messages in thread
From: Nikolay Borisov @ 2018-10-11 15:03 UTC (permalink / raw)
To: linux-btrfs; +Cc: Nikolay Borisov
This commit continues hardening the scanning code to handle cases where
power loss could have left the disks of a multi-disk filesystem in an
inconsistent state. Namely, handle the situation that can occur when
some of the disks in a multi-disk fs have completed their fsid change,
i.e. they have the METADATA_UUID incompat flag set, have cleared the
FSID_CHANGING_V2 flag and their fsid/metadata_uuid are different. At
the same time the remaining disks will have their fsid/metadata_uuid
unchanged and will only have the FSID_CHANGING_V2 flag set.
This is handled by adding code in the scan path which covers both
scan orders:
a) A device with the FSID_CHANGING_V2 flag is scanned first and a
btrfs_fs_devices is created with matching fsid/metadata_uuid. When
a device with a completed fsid change is then scanned, the new code
in find_fsid detects this by looking for an fs_devices whose
fsid_change flag is set to true, whose metadata_uuid and fsid match
each other and whose metadata_uuid matches that of the scanned device.
In this case, it's important to note that the device which has had its
fsid change completed will have a higher generation number than the
device with the FSID_CHANGING_V2 flag set, so its superblock will
be used during mount. To prevent an assertion from triggering because
the sb used for mounting has a different fsid/metadata_uuid than
the ones in the fs_devices struct, also add code in device_list_add
which overwrites the values in fs_devices.
b) Alternatively, a device that completed its fsid change is scanned
first, which creates the respective btrfs_fs_devices struct with
differing fsid/metadata_uuid. In this case, when a device with the
FSID_CHANGING_V2 flag set is scanned, it will call the newly added
find_fsid_inprogress function, which will return the correct
fs_devices.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
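Note (not part of the changelog): the two lookups from a) and b) can be
sketched in user space roughly as below. This is only an illustrative
model under stated assumptions: fs_devices_model, uuid_eq,
lookup_completed_disk and lookup_inprogress_disk are hypothetical names,
and an array stands in for the fs_uuids list. The conditions themselves
are taken from the hunks in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define UUID_SIZE 16 /* stands in for BTRFS_FSID_SIZE */

/* Hypothetical, simplified stand-in for struct btrfs_fs_devices. */
struct fs_devices_model {
	unsigned char fsid[UUID_SIZE];
	unsigned char metadata_uuid[UUID_SIZE];
	bool fsid_change;
};

static bool uuid_eq(const unsigned char *a, const unsigned char *b)
{
	return memcmp(a, b, UUID_SIZE) == 0;
}

/*
 * Case a): the entry was created by a disk that only had the in-progress
 * flag, so its fsid and metadata_uuid are still equal. A disk with a
 * completed change finds it through its own metadata_uuid.
 */
static struct fs_devices_model *lookup_completed_disk(
		struct fs_devices_model *list, size_t n,
		const unsigned char *metadata_uuid)
{
	assert(metadata_uuid);
	for (size_t i = 0; i < n; i++) {
		if (list[i].fsid_change &&
		    uuid_eq(metadata_uuid, list[i].fsid) &&
		    uuid_eq(list[i].fsid, list[i].metadata_uuid))
			return &list[i];
	}
	return NULL;
}

/*
 * Case b): the entry was created by a disk with a completed change, so its
 * fsid and metadata_uuid differ. A disk that still carries the in-progress
 * flag finds it by comparing its (old) fsid with the entry's metadata_uuid.
 */
static struct fs_devices_model *lookup_inprogress_disk(
		struct fs_devices_model *list, size_t n,
		const unsigned char *disk_fsid)
{
	assert(disk_fsid);
	for (size_t i = 0; i < n; i++) {
		if (!list[i].fsid_change &&
		    !uuid_eq(list[i].metadata_uuid, list[i].fsid) &&
		    uuid_eq(list[i].metadata_uuid, disk_fsid))
			return &list[i];
	}
	return NULL;
}
```

In both orders the two disks end up in the same entry; only the field
used as the lookup key differs.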
fs/btrfs/volumes.c | 78 +++++++++++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 74 insertions(+), 4 deletions(-)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index c2b66d15e08d..2c9879a81884 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -382,6 +382,26 @@ find_fsid(const u8 *fsid, const u8 *metadata_fsid)
ASSERT(fsid);
+ if (metadata_fsid) {
+
+ /*
+ * Handle scanned device having completed its fsid change but
+ * belonging to a fs_devices that was created by first scanning
+ * a device which didn't have its fsid/metadata_uuid changed
+ * at all and the CHANGING_FSID flag set. 4/a
+ */
+ list_for_each_entry(fs_devices, &fs_uuids, fs_list) {
+ if (fs_devices->fsid_change &&
+ memcmp(metadata_fsid, fs_devices->fsid,
+ BTRFS_FSID_SIZE) == 0 &&
+ memcmp(fs_devices->fsid, fs_devices->metadata_uuid,
+ BTRFS_FSID_SIZE) == 0) {
+ return fs_devices;
+ }
+ }
+ }
+
+ /* Handle non-split brain cases */
list_for_each_entry(fs_devices, &fs_uuids, fs_list) {
if (metadata_fsid) {
if (memcmp(fsid, fs_devices->fsid, BTRFS_FSID_SIZE) == 0
@@ -768,6 +788,27 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
}
/*
+ * Handle scanned device having its FSID_CHANGING flag set and the fs_devices
+ * being created with a disk that has already completed its fsid change.
+ */
+static struct btrfs_fs_devices *find_fsid_inprogress(
+ struct btrfs_super_block *disk_super)
+{
+ struct btrfs_fs_devices *fs_devices;
+
+ list_for_each_entry(fs_devices, &fs_uuids, fs_list) {
+ if (memcmp(fs_devices->metadata_uuid, fs_devices->fsid,
+ BTRFS_FSID_SIZE) != 0 &&
+ memcmp(fs_devices->metadata_uuid, disk_super->fsid,
+ BTRFS_FSID_SIZE) == 0 && !fs_devices->fsid_change) {
+ return fs_devices;
+ }
+ }
+
+ return NULL;
+}
+
+/*
* Add new device to list of registered devices
*
* Returns:
@@ -779,7 +820,7 @@ static noinline struct btrfs_device *device_list_add(const char *path,
bool *new_device_added)
{
struct btrfs_device *device;
- struct btrfs_fs_devices *fs_devices;
+ struct btrfs_fs_devices *fs_devices = NULL;
struct rcu_string *name;
u64 found_transid = btrfs_super_generation(disk_super);
u64 devid = btrfs_stack_device_id(&disk_super->dev_item);
@@ -788,10 +829,24 @@ static noinline struct btrfs_device *device_list_add(const char *path,
bool fsid_change_in_progress = (btrfs_super_flags(disk_super) &
BTRFS_SUPER_FLAG_CHANGING_FSID_v2);
- if (has_metadata_uuid)
- fs_devices = find_fsid(disk_super->fsid, disk_super->metadata_uuid);
- else
+ if (fsid_change_in_progress && !has_metadata_uuid) {
+ /*
+ * When we have an image which has FSID_CHANGE set
+ * it might belong to either a filesystem which has
+ * disks with completed fsid change or it might belong
+ * to fs with no uuid changes in effect, handle both.
+ */
+ fs_devices = find_fsid_inprogress(disk_super);
+ if (!fs_devices)
+ fs_devices = find_fsid(disk_super->fsid, NULL);
+
+ } else if (has_metadata_uuid) {
+ fs_devices = find_fsid(disk_super->fsid,
+ disk_super->metadata_uuid);
+ } else {
fs_devices = find_fsid(disk_super->fsid, NULL);
+ }
+
if (!fs_devices) {
if (has_metadata_uuid)
@@ -813,6 +868,21 @@ static noinline struct btrfs_device *device_list_add(const char *path,
mutex_lock(&fs_devices->device_list_mutex);
device = find_device(fs_devices, devid,
disk_super->dev_item.uuid);
+
+ /*
+ * If this disk has been pulled into an fs devices created by
+ * a device which had the FSID_CHANGING flag then replace the
+ * metadata_uuid/fsid values of the fs_devices.
+ */
+ if (has_metadata_uuid && fs_devices->fsid_change &&
+ found_transid > fs_devices->latest_generation) {
+ memcpy(fs_devices->fsid, disk_super->fsid,
+ BTRFS_FSID_SIZE);
+ memcpy(fs_devices->metadata_uuid,
+ disk_super->metadata_uuid, BTRFS_FSID_SIZE);
+
+ fs_devices->fsid_change = false;
+ }
}
if (!device) {
--
2.7.4
* [PATCH 6/6] btrfs: Handle final split-brain possibility during fsid change
2018-10-11 15:03 [PATCH 0/6] FSID change kernel support Nikolay Borisov
` (4 preceding siblings ...)
2018-10-11 15:03 ` [PATCH 5/6] btrfs: Handle one more split-brain scenario during fsid change Nikolay Borisov
@ 2018-10-11 15:03 ` Nikolay Borisov
2018-10-19 14:18 ` [PATCH 0/6] FSID change kernel support David Sterba
6 siblings, 0 replies; 10+ messages in thread
From: Nikolay Borisov @ 2018-10-11 15:03 UTC (permalink / raw)
To: linux-btrfs; +Cc: Nikolay Borisov
This patch lands the last case which needs to be handled by the fsid
change code. Namely, this is the case where a multidisk filesystem has
already undergone at least one successful fsid change, i.e. all disks
have the METADATA_UUID incompat bit, and a power failure occurs while
another fsid change is in progress. When such an event occurs the disks
will be split in 2 groups. One of the groups will have both METADATA_UUID
and FSID_CHANGING_V2 flags set coupled with old fsid/metadata_uuid pairs.
The other group of disks will have only the METADATA_UUID bit set and
their fsid will be different from the one on the disks in the first
group. Here we look at two cases:
a) A disk from the first group is scanned first, so fs_devices is
created with stale fsid/metadata_uuid. Then, when a disk from the
second group is scanned, it needs to first check whether there exists
an fs_devices that has fsid_change set to true (because it was
created with a disk having the FSID_CHANGING_V2 flag), whose
metadata_uuid and fsid are different (since it was created by a disk
which already had at least 1 successful fsid change) and finally whose
metadata_uuid equals that of the currently scanned disk (because
metadata_uuid never really changes).
When the correct fs_devices is found the information from the scanned
disk will replace the current one in fs_devices since the scanned disk
will have a higher generation number.
b) A disk from the second group is scanned first, so fs_devices is
created as usual with differing fsid/metadata_uuid. Then, when a disk
from the first group is scanned, the code detects that it has both
FSID_CHANGING and METADATA_UUID flags set and will scan for an
fs_devices that has differing metadata_uuid/fsid and whose
metadata_uuid is the same as that of the scanned device.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
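Note (not part of the changelog): a rough user-space model of the two
match conditions this patch adds, one per case above. It is a sketch
under stated assumptions, not the kernel code: super_model,
fs_devices_model, uuid_eq, matches_stale_entry and matches_changed_entry
are hypothetical names. The predicates mirror the second loop added to
find_fsid and the loop in find_fsid_changed.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define UUID_SIZE 16 /* stands in for BTRFS_FSID_SIZE */

/* Hypothetical model of the superblock fields the scan code compares. */
struct super_model {
	unsigned char fsid[UUID_SIZE];
	unsigned char metadata_uuid[UUID_SIZE];
};

/* Hypothetical, simplified stand-in for struct btrfs_fs_devices. */
struct fs_devices_model {
	unsigned char fsid[UUID_SIZE];
	unsigned char metadata_uuid[UUID_SIZE];
	bool fsid_change;
};

static bool uuid_eq(const unsigned char *a, const unsigned char *b)
{
	return memcmp(a, b, UUID_SIZE) == 0;
}

/*
 * Case a): the scanned disk is up to date and the entry was created by a
 * stale disk that still had the in-progress flag set.
 */
static bool matches_stale_entry(const struct fs_devices_model *fsd,
				const struct super_model *sb)
{
	assert(fsd && sb);
	return fsd->fsid_change &&
	       !uuid_eq(fsd->metadata_uuid, fsd->fsid) &&
	       uuid_eq(sb->metadata_uuid, fsd->metadata_uuid);
}

/*
 * Case b): the scanned disk is stale (both flags set) and the entry was
 * created by an up-to-date disk, so the two fsids differ while the
 * metadata_uuid is shared.
 */
static bool matches_changed_entry(const struct fs_devices_model *fsd,
				  const struct super_model *sb)
{
	assert(fsd && sb);
	return !uuid_eq(fsd->metadata_uuid, fsd->fsid) &&
	       uuid_eq(fsd->metadata_uuid, sb->metadata_uuid) &&
	       !uuid_eq(fsd->fsid, sb->fsid);
}
```

Both predicates key on metadata_uuid, which is the one value that stays
the same across successive fsid changes.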
fs/btrfs/volumes.c | 65 ++++++++++++++++++++++++++++++++++++++++++++----------
1 file changed, 53 insertions(+), 12 deletions(-)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 2c9879a81884..d08667cea189 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -383,7 +383,6 @@ find_fsid(const u8 *fsid, const u8 *metadata_fsid)
ASSERT(fsid);
if (metadata_fsid) {
-
/*
* Handle scanned device having completed its fsid change but
* belonging to a fs_devices that was created by first scanning
@@ -399,6 +398,21 @@ find_fsid(const u8 *fsid, const u8 *metadata_fsid)
return fs_devices;
}
}
+ /*
+ * Handle scanned device having completed its fsid change but
+ * belonging to a fs_devices that was created by a device that
+ * has an outdated pair of fsid/metadata_uuid and CHANGING_FSID
+ * flag set. 6/b
+ */
+ list_for_each_entry(fs_devices, &fs_uuids, fs_list) {
+ if (fs_devices->fsid_change &&
+ memcmp(fs_devices->metadata_uuid,
+ fs_devices->fsid, BTRFS_FSID_SIZE) != 0 &&
+ memcmp(metadata_fsid, fs_devices->metadata_uuid,
+ BTRFS_FSID_SIZE) == 0) {
+ return fs_devices;
+ }
+ }
}
/* Handle non-split brain cases */
@@ -808,6 +822,30 @@ static struct btrfs_fs_devices *find_fsid_inprogress(
return NULL;
}
+
+static struct btrfs_fs_devices *find_fsid_changed(
+ struct btrfs_super_block *disk_super)
+{
+ struct btrfs_fs_devices *fs_devices;
+
+ /*
+ * Handles the case where scanned device is part of an fs that had
+ * multiple successful changes of FSID but currently device didn't
+ * observe it. Meaning our fsid will be different than theirs. 6/b
+ */
+ list_for_each_entry(fs_devices, &fs_uuids, fs_list) {
+ if (memcmp(fs_devices->metadata_uuid, fs_devices->fsid,
+ BTRFS_FSID_SIZE) != 0 &&
+ memcmp(fs_devices->metadata_uuid, disk_super->metadata_uuid,
+ BTRFS_FSID_SIZE) == 0 &&
+ memcmp(fs_devices->fsid, disk_super->fsid,
+ BTRFS_FSID_SIZE) != 0) {
+ return fs_devices;
+ }
+ }
+
+ return NULL;
+}
/*
* Add new device to list of registered devices
*
@@ -829,17 +867,20 @@ static noinline struct btrfs_device *device_list_add(const char *path,
bool fsid_change_in_progress = (btrfs_super_flags(disk_super) &
BTRFS_SUPER_FLAG_CHANGING_FSID_v2);
- if (fsid_change_in_progress && !has_metadata_uuid) {
- /*
- * When we have an image which has FSID_CHANGE set
- * it might belong to either a filesystem which has
- * disks with completed fsid change or it might belong
- * to fs with no uuid changes in effect, handle both.
- */
- fs_devices = find_fsid_inprogress(disk_super);
- if (!fs_devices)
- fs_devices = find_fsid(disk_super->fsid, NULL);
-
+ if (fsid_change_in_progress) {
+ if (!has_metadata_uuid) {
+ /*
+ * When we have an image which has FSID_CHANGE set
+ * it might belong to either a filesystem which has
+ * disks with completed fsid change or it might belong
+ * to fs with no uuid changes in effect, handle both.
+ */
+ fs_devices = find_fsid_inprogress(disk_super);
+ if (!fs_devices)
+ fs_devices = find_fsid(disk_super->fsid, NULL);
+ } else {
+ fs_devices = find_fsid_changed(disk_super);
+ }
} else if (has_metadata_uuid) {
fs_devices = find_fsid(disk_super->fsid,
disk_super->metadata_uuid);
--
2.7.4
* Re: [PATCH 0/6] FSID change kernel support
2018-10-11 15:03 [PATCH 0/6] FSID change kernel support Nikolay Borisov
` (5 preceding siblings ...)
2018-10-11 15:03 ` [PATCH 6/6] btrfs: Handle final split-brain possibility " Nikolay Borisov
@ 2018-10-19 14:18 ` David Sterba
2018-10-19 14:31 ` Nikolay Borisov
6 siblings, 1 reply; 10+ messages in thread
From: David Sterba @ 2018-10-19 14:18 UTC (permalink / raw)
To: Nikolay Borisov; +Cc: linux-btrfs
On Thu, Oct 11, 2018 at 06:03:20PM +0300, Nikolay Borisov wrote:
> Here is the second posting of the fsid change support for the kernel. For
> background information you can refer to v1 [0]. The main changes in this version
> are around the handling of possible split-brain scenarios. I've changed a bit
> how the userspace code works and now the process is split among 2 transactions.
> The first one flagging "we are about to change fsid" and once it's persisted on
> all disks a second one does the actual change. This of course is not enough
> to guarantee full consistency so I had to extend the device scanning to
> gracefully handle such cases. I believe I have covered everything but more
> review will be appreciated.
All the cases seem to be covered. Do you intend to add the design
document somewhere? The references in the code seem stale and puzzling.
> So patch 1 implements the basic functionality with no split-brain handling
> whatsoever. Patch 2 is a minor cleanup. Patch 3 deals with a split-brain that
> can occur if power loss happens during the initial transaction (the one setting
> the beginning flag). Patch 4 adds some information that is needed in the last 2
> patches. Patch 5 handles failure between transaction 1 and transaction 2 and
> finally patch 6 handles the case of power loss during transaction 1 but for an
> fs which has already undergone at least one successful fsid change. More
> details about the exact failure modes are in each respective patch.
>
> One thing which could be improved but I ran out of ideas is the naming of the
> ancillary functions - find_fsid_inprogress and find_fsid_changed.
Hm, no better ideas so it can stay and be changed later.
> I've actually tested the split-brain handling code with specially crafted images.
> They will be part of the user-space submissions and I believe I have full
> coverage for that.
Perfect, thanks.
Now the bad news from me :) There are several coding style issues
all over the patches so I'll list them here.
* BTRFS_SUPER_FLAG_CHANGING_FSID_v2 -- V2 with uppercase V, as other
versioned symbols
* all references in changelogs or comments should refer to the super flag
as CHANGING_FSID_v2, not FSID_CHANGING, not just CHANGING_FSID, nor
FSID_CHANGE. This is because we want to be able to search for it and
find all occurrences
* error messages should stick to the preferred format,
https://btrfs.wiki.kernel.org/index.php/Development_notes#Error_messages
* comments referring to UUID should use "UUID", there's a mix of
both ways added by the patches, error messages should use the
uppercase form too
* find_fsid - type and name should be on one line, parameters on the
next if they don't fit
* device_list_add - missing space before =
As these are not functional problems, I'll add the patchset to for-next
for testing.
* Re: [PATCH 0/6] FSID change kernel support
2018-10-19 14:18 ` [PATCH 0/6] FSID change kernel support David Sterba
@ 2018-10-19 14:31 ` Nikolay Borisov
2018-10-19 15:50 ` David Sterba
0 siblings, 1 reply; 10+ messages in thread
From: Nikolay Borisov @ 2018-10-19 14:31 UTC (permalink / raw)
To: dsterba, linux-btrfs
On 19.10.2018 17:18, David Sterba wrote:
> On Thu, Oct 11, 2018 at 06:03:20PM +0300, Nikolay Borisov wrote:
>> Here is the second posting of the fsid change support for the kernel. For
>> background information you can refer to v1 [0]. The main changes in this version
>> are around the handling of possible split-brain scenarios. I've changed a bit
>> how the userspace code works and now the process is split among 2 transactions.
>> The first one flagging "we are about to change fsid" and once it's persisted on
>> all disks a second one does the actual change. This of course is not enough
>> to guarantee full consistency so I had to extend the device scanning to
>> gracefully handle such cases. I believe I have covered everything but more
>> review will be appreciated.
>
> All the cases seem to be covered. Do you intend to add the design
> document somewhere? The references in the code seem stale and puzzling.
I believe I have the critical portions of the design document (i.e the
handling of various cases) described in each of the 3 patches that add
split-brain handling. The doc could be added to the btrfs-dev-doc repo I
guess.
* Re: [PATCH 0/6] FSID change kernel support
2018-10-19 14:31 ` Nikolay Borisov
@ 2018-10-19 15:50 ` David Sterba
0 siblings, 0 replies; 10+ messages in thread
From: David Sterba @ 2018-10-19 15:50 UTC (permalink / raw)
To: Nikolay Borisov; +Cc: dsterba, linux-btrfs
On Fri, Oct 19, 2018 at 05:31:09PM +0300, Nikolay Borisov wrote:
>
>
> On 19.10.2018 17:18, David Sterba wrote:
> > On Thu, Oct 11, 2018 at 06:03:20PM +0300, Nikolay Borisov wrote:
> >> Here is the second posting of the fsid change support for the kernel. For
> >> background information you can refer to v1 [0]. The main changes in this version
> >> are around the handling of possible split-brain scenarios. I've changed a bit
> >> how the userspace code works and now the process is split among 2 transactions.
> >> The first one flagging "we are about to change fsid" and once it's persisted on
> >> all disks a second one does the actual change. This of course is not enough
> >> to guarantee full consistency so I had to extend the device scanning to
> >> gracefully handle such cases. I believe I have covered everything but more
> >> review will be appreciated.
> >
> > All the cases seem to be covered. Do you intend to add the design
> > document somewhere? The references in the code seem stale and puzzling.
>
> I believe I have the critical portions of the design document (i.e the
> handling of various cases) described in each of the 3 patches that add
> split-brain handling. The doc could be added to the btrfs-dev-doc repo I
> guess.
Yeah, but what does 4/a mean in the comments? It's not referring to the
changelog where I'd look first. External documentation is fine but we
need some sort of reference in the code. The comments in 5/6 and 6/6 are
sufficient, just that the references could be more explicit. Possibly
the cases can be simplified and put into the .c as a bullet list.
The document is good for understanding, the nice-to-have part would be
some quick reference to check the logic. I'll give it another read to
see how much is really missing or not.
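As one possible shape for such a quick reference, the lookup choice that
device_list_add makes after patch 6/6 can be written down as a small
decision function. This is a sketch only: scan_lookup, super_flags_model
and pick_lookup are hypothetical names, and the two booleans stand for
the CHANGING_FSID_v2 super flag and the METADATA_UUID incompat flag of
the scanned device.

```c
#include <assert.h>
#include <stdbool.h>

/* Which lookup the scan path uses; names are illustrative only. */
enum scan_lookup {
	LOOKUP_FSID_ONLY,              /* find_fsid(fsid, NULL) */
	LOOKUP_FSID_AND_METADATA_UUID, /* find_fsid(fsid, metadata_uuid) */
	LOOKUP_INPROGRESS_THEN_FSID,   /* find_fsid_inprogress, then find_fsid(fsid, NULL) */
	LOOKUP_CHANGED,                /* find_fsid_changed */
};

/* Hypothetical view of the two flags read from the scanned superblock. */
struct super_flags_model {
	bool changing_fsid;     /* CHANGING_FSID_v2 super flag */
	bool has_metadata_uuid; /* METADATA_UUID incompat flag */
};

static enum scan_lookup pick_lookup(const struct super_flags_model *sb)
{
	assert(sb);
	if (sb->changing_fsid)
		return sb->has_metadata_uuid ? LOOKUP_CHANGED :
					       LOOKUP_INPROGRESS_THEN_FSID;
	return sb->has_metadata_uuid ? LOOKUP_FSID_AND_METADATA_UUID :
				       LOOKUP_FSID_ONLY;
}
```

The split-brain branches inside find_fsid itself (5/6 a and 6/6 a) are
reached through the FSID_AND_METADATA_UUID row.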