linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/6] FSID change kernel support
@ 2018-10-11 15:03 Nikolay Borisov
  2018-10-11 15:03 ` [PATCH 1/6] btrfs: Introduce support for FSID change without metadata rewrite Nikolay Borisov
                   ` (6 more replies)
  0 siblings, 7 replies; 11+ messages in thread
From: Nikolay Borisov @ 2018-10-11 15:03 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

Hello, 

Here is the second posting of the fsid change support for the kernel. For 
background information you can refer to v1 [0]. The main changes in this version
are around the handling of possible split-brain scenarios. I've changed a bit 
how the userspace code works and now the process is split among 2 transactions. 
The first one flagging "we are about to change fsid" and once it's persisted on
all disks a second one does the actual change. This of course is not enough 
to guarantee full consistency so I had to extend the device scanning to 
gracefully handle such cases. I believe I have covered everything but more 
review will be appreciated. 

So patch 1 implements the basic functionality with no split-brain handling 
whatsoever. Patch 2 is a minor cleanup. Patch 3 deals with a split-brain that 
can occur if power loss happens during the initial transaction (the one setting
the beginning flag). Patch 4 adds some information that is needed in the last 2 
patches. Patch 5 handles failure between transaction 1 and transaction 2 and 
finally patch 6 handles the case of power loss during transaction 1 but for an 
fs which has already undergone at least one successful fsid change. More 
details about the exact failure modes are in each respective patch.

One thing which could be improved but I ran out of ideas is the naming of the 
ancillary functions - find_fsid_inprogress and find_fsid_changed. 


I've actually tested the split-brain handing code with specially crafted images. 
They will be part of the user-space submissions and I believe I have full 
coverage for that. 

[0] https://lore.kernel.org/linux-btrfs/1535531774-29830-1-git-send-email-nborisov@suse.com/

Nikolay Borisov (6):
  btrfs: Introduce support for FSID change without metadata rewrite
  btrfs: Remove fsid/metadata_fsid fields from btrfs_info
  btrfs: Add handling for disk split-brain scenario during fsid change
  btrfs: Introduce 2 more members to struct btrfs_fs_devices
  btrfs: Handle one more split-brain scenario during fsid change
  btrfs: Handle final split-brain possibility during fsid change

 fs/btrfs/check-integrity.c      |   2 +-
 fs/btrfs/ctree.c                |   5 +-
 fs/btrfs/ctree.h                |  10 +-
 fs/btrfs/disk-io.c              |  53 ++++++++---
 fs/btrfs/extent-tree.c          |   2 +-
 fs/btrfs/ioctl.c                |   2 +-
 fs/btrfs/super.c                |   2 +-
 fs/btrfs/volumes.c              | 196 ++++++++++++++++++++++++++++++++++++----
 fs/btrfs/volumes.h              |   6 ++
 include/trace/events/btrfs.h    |   2 +-
 include/uapi/linux/btrfs.h      |   1 +
 include/uapi/linux/btrfs_tree.h |   1 +
 12 files changed, 241 insertions(+), 41 deletions(-)

-- 
2.7.4


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/6] btrfs: Introduce support for FSID change without metadata rewrite
  2018-10-11 15:03 [PATCH 0/6] FSID change kernel support Nikolay Borisov
@ 2018-10-11 15:03 ` Nikolay Borisov
  2018-10-11 15:03 ` [PATCH 2/6] btrfs: Remove fsid/metadata_fsid fields from btrfs_info Nikolay Borisov
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Nikolay Borisov @ 2018-10-11 15:03 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

This field is going to be used when the user wants to change the UUID
of the filesystem without having to rewrite all metadata blocks. This
field adds another level of indirection such that when the FSID is
changed what really happens is the current uuid (the one with which the
fs was created) is copied to the 'metadata_uuid' field in the superblock
as well as a new incompat flag is set METADATA_UUID. When the kernel
detects this flag is set it knows that the superblock in fact has 2
uuids:

1. Is the UUID which is user-visible, currently known as FSID.
2. Metadata UUID - this is the UUID which is stamped into all on-disk
datastructures belonging to this file system.

When the new incompat flag is present device scaning checks whether
both fsid/metadata_uuid of the scanned device match to any of the
registed filesystems. When the flag is not set then both UUIDs are
equal and only the FSID is retained on disk, metadata_uuid is set only
in-memory during mount.

Additionally a new metadata_uuid field is also added to the fs_info
struct. It's initialised either with the FSID in case METADATA_UUID
incompat flag is not set or with the metdata_uuid of the superblock
otherwise.

This commit introduces the new fields as well as the new incompat flag
and switches all users of the fsid to the new logic.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/ctree.c                |  4 +--
 fs/btrfs/ctree.h                | 12 ++++---
 fs/btrfs/disk-io.c              | 32 ++++++++++++++----
 fs/btrfs/extent-tree.c          |  2 +-
 fs/btrfs/volumes.c              | 72 ++++++++++++++++++++++++++++++++---------
 fs/btrfs/volumes.h              |  1 +
 include/uapi/linux/btrfs.h      |  1 +
 include/uapi/linux/btrfs_tree.h |  1 +
 8 files changed, 97 insertions(+), 28 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 2ee43b6a4f09..11b5c2abeddc 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -224,7 +224,7 @@ int btrfs_copy_root(struct btrfs_trans_handle *trans,
 	else
 		btrfs_set_header_owner(cow, new_root_objectid);
 
-	write_extent_buffer_fsid(cow, fs_info->fsid);
+	write_extent_buffer_fsid(cow, fs_info->metadata_fsid);
 
 	WARN_ON(btrfs_header_generation(buf) > trans->transid);
 	if (new_root_objectid == BTRFS_TREE_RELOC_OBJECTID)
@@ -1033,7 +1033,7 @@ static noinline int __btrfs_cow_block(struct btrfs_trans_handle *trans,
 	else
 		btrfs_set_header_owner(cow, root->root_key.objectid);
 
-	write_extent_buffer_fsid(cow, fs_info->fsid);
+	write_extent_buffer_fsid(cow, fs_info->metadata_fsid);
 
 	ret = update_ref_for_cow(trans, root, buf, cow, &last_ref);
 	if (ret) {
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 15c659f23411..afa55f524a49 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -197,7 +197,7 @@ struct btrfs_root_backup {
 struct btrfs_super_block {
 	u8 csum[BTRFS_CSUM_SIZE];
 	/* the first 4 fields must match struct btrfs_header */
-	u8 fsid[BTRFS_FSID_SIZE];    /* FS specific uuid */
+	u8 fsid[BTRFS_FSID_SIZE];    /* userfacing FS specific uuid */
 	__le64 bytenr; /* this block number */
 	__le64 flags;
 
@@ -234,8 +234,10 @@ struct btrfs_super_block {
 	__le64 cache_generation;
 	__le64 uuid_tree_generation;
 
+	u8 metadata_uuid[BTRFS_FSID_SIZE]; /* The uuid written into btree blocks */
+
 	/* future expansion */
-	__le64 reserved[30];
+	__le64 reserved[28];
 	u8 sys_chunk_array[BTRFS_SYSTEM_CHUNK_ARRAY_SIZE];
 	struct btrfs_root_backup super_roots[BTRFS_NUM_BACKUP_ROOTS];
 } __attribute__ ((__packed__));
@@ -265,7 +267,8 @@ struct btrfs_super_block {
 	 BTRFS_FEATURE_INCOMPAT_RAID56 |		\
 	 BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF |		\
 	 BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA |	\
-	 BTRFS_FEATURE_INCOMPAT_NO_HOLES)
+	 BTRFS_FEATURE_INCOMPAT_NO_HOLES	|	\
+	 BTRFS_FEATURE_INCOMPAT_METADATA_UUID)
 
 #define BTRFS_FEATURE_INCOMPAT_SAFE_SET			\
 	(BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF)
@@ -746,7 +749,8 @@ struct btrfs_delayed_root;
 #define BTRFS_FS_BALANCE_RUNNING		18
 
 struct btrfs_fs_info {
-	u8 fsid[BTRFS_FSID_SIZE];
+	u8 fsid[BTRFS_FSID_SIZE]; /* User-visible fs UUID */
+	u8 metadata_fsid[BTRFS_FSID_SIZE]; /* UUID written to btree blocks */
 	u8 chunk_tree_uuid[BTRFS_UUID_SIZE];
 	unsigned long flags;
 	struct btrfs_root *extent_root;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 27f6a3348f94..b61e4d47e316 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -551,7 +551,7 @@ static int csum_dirty_buffer(struct btrfs_fs_info *fs_info, struct page *page)
 	if (WARN_ON(!PageUptodate(page)))
 		return -EUCLEAN;
 
-	ASSERT(memcmp_extent_buffer(eb, fs_info->fsid,
+	ASSERT(memcmp_extent_buffer(eb, fs_info->metadata_fsid,
 			btrfs_header_fsid(), BTRFS_FSID_SIZE) == 0);
 
 	return csum_tree_block(fs_info, eb, 0);
@@ -566,7 +566,19 @@ static int check_tree_block_fsid(struct btrfs_fs_info *fs_info,
 
 	read_extent_buffer(eb, fsid, btrfs_header_fsid(), BTRFS_FSID_SIZE);
 	while (fs_devices) {
-		if (!memcmp(fsid, fs_devices->fsid, BTRFS_FSID_SIZE)) {
+		u8 *metadata_uuid;
+		/*
+		 * Checking the incompat flag is only valid for the current
+		 * fs. For seed devices it's forbidden to have their uuid
+		 * changed so reading ->fsid in this case is fine
+		 */
+		if (fs_devices == fs_info->fs_devices &&
+		    btrfs_fs_incompat(fs_info, METADATA_UUID))
+			metadata_uuid = fs_devices->metadata_uuid;
+		else
+			metadata_uuid = fs_devices->fsid;
+
+		if (!memcmp(fsid, metadata_uuid, BTRFS_FSID_SIZE)) {
 			ret = 0;
 			break;
 		}
@@ -2478,10 +2490,11 @@ static int validate_super(struct btrfs_fs_info *fs_info,
 		ret = -EINVAL;
 	}
 
-	if (memcmp(fs_info->fsid, sb->dev_item.fsid, BTRFS_FSID_SIZE) != 0) {
+	if (memcmp(fs_info->metadata_fsid, sb->dev_item.fsid,
+		   BTRFS_FSID_SIZE) != 0) {
 		btrfs_err(fs_info,
-			   "dev_item UUID does not match fsid: %pU != %pU",
-			   fs_info->fsid, sb->dev_item.fsid);
+			   "dev_item UUID does not match metadata fsid: %pU != %pU",
+			   fs_info->metadata_fsid, sb->dev_item.fsid);
 		ret = -EINVAL;
 	}
 
@@ -2822,6 +2835,12 @@ int open_ctree(struct super_block *sb,
 	brelse(bh);
 
 	memcpy(fs_info->fsid, fs_info->super_copy->fsid, BTRFS_FSID_SIZE);
+	if (btrfs_fs_incompat(fs_info, METADATA_UUID)) {
+		memcpy(fs_info->metadata_fsid,
+		       fs_info->super_copy->metadata_uuid, BTRFS_FSID_SIZE);
+	} else {
+		memcpy(fs_info->metadata_fsid, fs_info->fsid, BTRFS_FSID_SIZE);
+	}
 
 	ret = btrfs_validate_mount_super(fs_info);
 	if (ret) {
@@ -3760,7 +3779,8 @@ int write_all_supers(struct btrfs_fs_info *fs_info, int max_mirrors)
 		btrfs_set_stack_device_io_width(dev_item, dev->io_width);
 		btrfs_set_stack_device_sector_size(dev_item, dev->sector_size);
 		memcpy(dev_item->uuid, dev->uuid, BTRFS_UUID_SIZE);
-		memcpy(dev_item->fsid, dev->fs_devices->fsid, BTRFS_FSID_SIZE);
+		memcpy(dev_item->fsid, dev->fs_devices->metadata_uuid,
+		       BTRFS_FSID_SIZE);
 
 		flags = btrfs_super_flags(sb);
 		btrfs_set_super_flags(sb, flags | BTRFS_HEADER_FLAG_WRITTEN);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 97cb2c17802e..4926d0975242 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -8174,7 +8174,7 @@ btrfs_init_new_buffer(struct btrfs_trans_handle *trans, struct btrfs_root *root,
 	btrfs_set_header_generation(buf, trans->transid);
 	btrfs_set_header_backref_rev(buf, BTRFS_MIXED_BACKREF_REV);
 	btrfs_set_header_owner(buf, owner);
-	write_extent_buffer_fsid(buf, fs_info->fsid);
+	write_extent_buffer_fsid(buf, fs_info->metadata_fsid);
 	write_extent_buffer_chunk_tree_uuid(buf, fs_info->chunk_tree_uuid);
 	if (root->root_key.objectid == BTRFS_TREE_LOG_OBJECTID) {
 		buf->log_index = root->log_transid % 2;
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index f435d397019e..e3e12c94834f 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -238,13 +238,15 @@ struct list_head *btrfs_get_fs_uuids(void)
 
 /*
  * alloc_fs_devices - allocate struct btrfs_fs_devices
- * @fsid:	if not NULL, copy the uuid to fs_devices::fsid
+ * @fsid:		if not NULL, copy the uuid to fs_devices::fsid
+ * @metadata_fsid:	if not NULL, copy the uuid to fs_devices::metadata_fsid
  *
  * Return a pointer to a new struct btrfs_fs_devices on success, or ERR_PTR().
  * The returned struct is not linked onto any lists and can be destroyed with
  * kfree() right away.
  */
-static struct btrfs_fs_devices *alloc_fs_devices(const u8 *fsid)
+static struct btrfs_fs_devices *alloc_fs_devices(const u8 *fsid,
+						 const u8 *metadata_fsid)
 {
 	struct btrfs_fs_devices *fs_devs;
 
@@ -261,6 +263,11 @@ static struct btrfs_fs_devices *alloc_fs_devices(const u8 *fsid)
 	if (fsid)
 		memcpy(fs_devs->fsid, fsid, BTRFS_FSID_SIZE);
 
+	if (metadata_fsid)
+		memcpy(fs_devs->metadata_uuid, metadata_fsid, BTRFS_FSID_SIZE);
+	else if (fsid)
+		memcpy(fs_devs->metadata_uuid, fsid, BTRFS_FSID_SIZE);
+
 	return fs_devs;
 }
 
@@ -368,13 +375,24 @@ static struct btrfs_device *find_device(struct btrfs_fs_devices *fs_devices,
 	return NULL;
 }
 
-static noinline struct btrfs_fs_devices *find_fsid(u8 *fsid)
+static noinline struct btrfs_fs_devices *
+find_fsid(const u8 *fsid, const u8 *metadata_fsid)
 {
 	struct btrfs_fs_devices *fs_devices;
 
+	ASSERT(fsid);
+
 	list_for_each_entry(fs_devices, &fs_uuids, fs_list) {
-		if (memcmp(fsid, fs_devices->fsid, BTRFS_FSID_SIZE) == 0)
-			return fs_devices;
+		if (metadata_fsid) {
+			if (memcmp(fsid, fs_devices->fsid, BTRFS_FSID_SIZE) == 0
+			    && memcmp(metadata_fsid, fs_devices->metadata_uuid,
+				      BTRFS_FSID_SIZE) == 0)
+				return fs_devices;
+		} else {
+			if (memcmp(fsid, fs_devices->fsid,
+				   BTRFS_FSID_SIZE) == 0)
+				return fs_devices;
+		}
 	}
 	return NULL;
 }
@@ -709,6 +727,12 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
 	device->generation = btrfs_super_generation(disk_super);
 
 	if (btrfs_super_flags(disk_super) & BTRFS_SUPER_FLAG_SEEDING) {
+		if (btrfs_super_incompat_flags(disk_super) &
+		    BTRFS_FEATURE_INCOMPAT_METADATA_UUID) {
+			pr_err("BTRFS: Invalid seeding and uuid-changed device detected\n");
+			goto error_brelse;
+		}
+
 		clear_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state);
 		fs_devices->seeding = 1;
 	} else {
@@ -759,10 +783,21 @@ static noinline struct btrfs_device *device_list_add(const char *path,
 	struct rcu_string *name;
 	u64 found_transid = btrfs_super_generation(disk_super);
 	u64 devid = btrfs_stack_device_id(&disk_super->dev_item);
+	bool has_metadata_uuid = (btrfs_super_incompat_flags(disk_super) &
+		BTRFS_FEATURE_INCOMPAT_METADATA_UUID);
+
+	if (has_metadata_uuid)
+		fs_devices = find_fsid(disk_super->fsid, disk_super->metadata_uuid);
+	else
+		fs_devices = find_fsid(disk_super->fsid, NULL);
 
-	fs_devices = find_fsid(disk_super->fsid);
 	if (!fs_devices) {
-		fs_devices = alloc_fs_devices(disk_super->fsid);
+		if (has_metadata_uuid)
+			fs_devices = alloc_fs_devices(disk_super->fsid,
+						      disk_super->metadata_uuid);
+		else
+			fs_devices = alloc_fs_devices(disk_super->fsid, NULL);
+
 		if (IS_ERR(fs_devices))
 			return ERR_CAST(fs_devices);
 
@@ -884,7 +919,7 @@ static struct btrfs_fs_devices *clone_fs_devices(struct btrfs_fs_devices *orig)
 	struct btrfs_device *device;
 	struct btrfs_device *orig_dev;
 
-	fs_devices = alloc_fs_devices(orig->fsid);
+	fs_devices = alloc_fs_devices(orig->fsid, NULL);
 	if (IS_ERR(fs_devices))
 		return fs_devices;
 
@@ -1709,7 +1744,8 @@ static int btrfs_add_dev_item(struct btrfs_trans_handle *trans,
 	ptr = btrfs_device_uuid(dev_item);
 	write_extent_buffer(leaf, device->uuid, ptr, BTRFS_UUID_SIZE);
 	ptr = btrfs_device_fsid(dev_item);
-	write_extent_buffer(leaf, trans->fs_info->fsid, ptr, BTRFS_FSID_SIZE);
+	write_extent_buffer(leaf, trans->fs_info->metadata_fsid, ptr,
+			    BTRFS_FSID_SIZE);
 	btrfs_mark_buffer_dirty(leaf);
 
 	ret = 0;
@@ -2132,7 +2168,11 @@ static struct btrfs_device *btrfs_find_device_by_path(
 	disk_super = (struct btrfs_super_block *)bh->b_data;
 	devid = btrfs_stack_device_id(&disk_super->dev_item);
 	dev_uuid = disk_super->dev_item.uuid;
-	device = btrfs_find_device(fs_info, devid, dev_uuid, disk_super->fsid);
+	if (btrfs_fs_incompat(fs_info, METADATA_UUID))
+		device = btrfs_find_device(fs_info, devid, dev_uuid, disk_super->metadata_uuid);
+	else
+		device = btrfs_find_device(fs_info, devid, dev_uuid, disk_super->fsid);
+
 	brelse(bh);
 	if (!device)
 		device = ERR_PTR(-ENOENT);
@@ -2202,7 +2242,7 @@ static int btrfs_prepare_sprout(struct btrfs_fs_info *fs_info)
 	if (!fs_devices->seeding)
 		return -EINVAL;
 
-	seed_devices = alloc_fs_devices(NULL);
+	seed_devices = alloc_fs_devices(NULL, NULL);
 	if (IS_ERR(seed_devices))
 		return PTR_ERR(seed_devices);
 
@@ -2239,6 +2279,8 @@ static int btrfs_prepare_sprout(struct btrfs_fs_info *fs_info)
 
 	generate_random_uuid(fs_devices->fsid);
 	memcpy(fs_info->fsid, fs_devices->fsid, BTRFS_FSID_SIZE);
+	memcpy(fs_devices->metadata_uuid, fs_devices->fsid, BTRFS_FSID_SIZE);
+	memcpy(fs_info->metadata_fsid, fs_devices->fsid, BTRFS_FSID_SIZE);
 	memcpy(disk_super->fsid, fs_devices->fsid, BTRFS_FSID_SIZE);
 	mutex_unlock(&fs_devices->device_list_mutex);
 
@@ -6245,7 +6287,7 @@ struct btrfs_device *btrfs_find_device(struct btrfs_fs_info *fs_info, u64 devid,
 	cur_devices = fs_info->fs_devices;
 	while (cur_devices) {
 		if (!fsid ||
-		    !memcmp(cur_devices->fsid, fsid, BTRFS_FSID_SIZE)) {
+		    !memcmp(cur_devices->metadata_uuid, fsid, BTRFS_FSID_SIZE)) {
 			device = find_device(cur_devices, devid, uuid);
 			if (device)
 				return device;
@@ -6574,12 +6616,12 @@ static struct btrfs_fs_devices *open_seed_devices(struct btrfs_fs_info *fs_info,
 		fs_devices = fs_devices->seed;
 	}
 
-	fs_devices = find_fsid(fsid);
+	fs_devices = find_fsid(fsid, NULL);
 	if (!fs_devices) {
 		if (!btrfs_test_opt(fs_info, DEGRADED))
 			return ERR_PTR(-ENOENT);
 
-		fs_devices = alloc_fs_devices(fsid);
+		fs_devices = alloc_fs_devices(fsid, NULL);
 		if (IS_ERR(fs_devices))
 			return fs_devices;
 
@@ -6629,7 +6671,7 @@ static int read_one_dev(struct btrfs_fs_info *fs_info,
 	read_extent_buffer(leaf, fs_uuid, btrfs_device_fsid(dev_item),
 			   BTRFS_FSID_SIZE);
 
-	if (memcmp(fs_uuid, fs_info->fsid, BTRFS_FSID_SIZE)) {
+	if (memcmp(fs_uuid, fs_info->metadata_fsid, BTRFS_FSID_SIZE)) {
 		fs_devices = open_seed_devices(fs_info, fs_uuid);
 		if (IS_ERR(fs_devices))
 			return PTR_ERR(fs_devices);
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index aefce895e994..04860497b33c 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -210,6 +210,7 @@ BTRFS_DEVICE_GETSET_FUNCS(bytes_used);
 
 struct btrfs_fs_devices {
 	u8 fsid[BTRFS_FSID_SIZE]; /* FS specific uuid */
+	u8 metadata_uuid[BTRFS_FSID_SIZE];
 	struct list_head fs_list;
 
 	u64 num_devices;
diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index 5ca1d21fc4a7..e0763bc4158e 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -269,6 +269,7 @@ struct btrfs_ioctl_fs_info_args {
 #define BTRFS_FEATURE_INCOMPAT_RAID56		(1ULL << 7)
 #define BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA	(1ULL << 8)
 #define BTRFS_FEATURE_INCOMPAT_NO_HOLES		(1ULL << 9)
+#define BTRFS_FEATURE_INCOMPAT_METADATA_UUID	(1ULL << 10)
 
 struct btrfs_ioctl_feature_flags {
 	__u64 compat_flags;
diff --git a/include/uapi/linux/btrfs_tree.h b/include/uapi/linux/btrfs_tree.h
index aff1356c2bb8..22f9299ba7d9 100644
--- a/include/uapi/linux/btrfs_tree.h
+++ b/include/uapi/linux/btrfs_tree.h
@@ -458,6 +458,7 @@ struct btrfs_free_space_header {
 #define BTRFS_SUPER_FLAG_METADUMP	(1ULL << 33)
 #define BTRFS_SUPER_FLAG_METADUMP_V2	(1ULL << 34)
 #define BTRFS_SUPER_FLAG_CHANGING_FSID	(1ULL << 35)
+#define BTRFS_SUPER_FLAG_CHANGING_FSID_v2 (1ULL << 36)
 
 
 /*
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/6] btrfs: Remove fsid/metadata_fsid fields from btrfs_info
  2018-10-11 15:03 [PATCH 0/6] FSID change kernel support Nikolay Borisov
  2018-10-11 15:03 ` [PATCH 1/6] btrfs: Introduce support for FSID change without metadata rewrite Nikolay Borisov
@ 2018-10-11 15:03 ` Nikolay Borisov
  2018-10-11 15:03 ` [PATCH 3/6] btrfs: Add handling for disk split-brain scenario during fsid change Nikolay Borisov
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Nikolay Borisov @ 2018-10-11 15:03 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

Currently btrfs_fs_info structure contains a copy of the
fsid/metadata_uuid fields. Same values are also contained in the
btrfs_fs_devices structure which fs_info has a reference to. Let's
reduce duplication by removing the fields from fs_info and always refer
to the ones in fs_devices. No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/check-integrity.c   |  2 +-
 fs/btrfs/ctree.c             |  5 +++--
 fs/btrfs/ctree.h             |  2 --
 fs/btrfs/disk-io.c           | 21 ++++++++++++---------
 fs/btrfs/extent-tree.c       |  2 +-
 fs/btrfs/ioctl.c             |  2 +-
 fs/btrfs/super.c             |  2 +-
 fs/btrfs/volumes.c           | 10 ++++------
 include/trace/events/btrfs.h |  2 +-
 9 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c
index 2e43fba44035..781cae168d2a 100644
--- a/fs/btrfs/check-integrity.c
+++ b/fs/btrfs/check-integrity.c
@@ -1720,7 +1720,7 @@ static int btrfsic_test_for_metadata(struct btrfsic_state *state,
 	num_pages = state->metablock_size >> PAGE_SHIFT;
 	h = (struct btrfs_header *)datav[0];
 
-	if (memcmp(h->fsid, fs_info->fsid, BTRFS_FSID_SIZE))
+	if (memcmp(h->fsid, fs_info->fs_devices->fsid, BTRFS_FSID_SIZE))
 		return 1;
 
 	for (i = 0; i < num_pages; i++) {
diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 11b5c2abeddc..9de05de8887d 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -12,6 +12,7 @@
 #include "transaction.h"
 #include "print-tree.h"
 #include "locking.h"
+#include "volumes.h"
 
 static int split_node(struct btrfs_trans_handle *trans, struct btrfs_root
 		      *root, struct btrfs_path *path, int level);
@@ -224,7 +225,7 @@ int btrfs_copy_root(struct btrfs_trans_handle *trans,
 	else
 		btrfs_set_header_owner(cow, new_root_objectid);
 
-	write_extent_buffer_fsid(cow, fs_info->metadata_fsid);
+	write_extent_buffer_fsid(cow, fs_info->fs_devices->metadata_uuid);
 
 	WARN_ON(btrfs_header_generation(buf) > trans->transid);
 	if (new_root_objectid == BTRFS_TREE_RELOC_OBJECTID)
@@ -1033,7 +1034,7 @@ static noinline int __btrfs_cow_block(struct btrfs_trans_handle *trans,
 	else
 		btrfs_set_header_owner(cow, root->root_key.objectid);
 
-	write_extent_buffer_fsid(cow, fs_info->metadata_fsid);
+	write_extent_buffer_fsid(cow, fs_info->fs_devices->metadata_uuid);
 
 	ret = update_ref_for_cow(trans, root, buf, cow, &last_ref);
 	if (ret) {
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index afa55f524a49..ad797ee42894 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -749,8 +749,6 @@ struct btrfs_delayed_root;
 #define BTRFS_FS_BALANCE_RUNNING		18
 
 struct btrfs_fs_info {
-	u8 fsid[BTRFS_FSID_SIZE]; /* User-visible fs UUID */
-	u8 metadata_fsid[BTRFS_FSID_SIZE]; /* UUID written to btree blocks */
 	u8 chunk_tree_uuid[BTRFS_UUID_SIZE];
 	unsigned long flags;
 	struct btrfs_root *extent_root;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index b61e4d47e316..be2caf513e2f 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -551,7 +551,7 @@ static int csum_dirty_buffer(struct btrfs_fs_info *fs_info, struct page *page)
 	if (WARN_ON(!PageUptodate(page)))
 		return -EUCLEAN;
 
-	ASSERT(memcmp_extent_buffer(eb, fs_info->metadata_fsid,
+	ASSERT(memcmp_extent_buffer(eb, fs_info->fs_devices->metadata_uuid,
 			btrfs_header_fsid(), BTRFS_FSID_SIZE) == 0);
 
 	return csum_tree_block(fs_info, eb, 0);
@@ -2490,11 +2490,12 @@ static int validate_super(struct btrfs_fs_info *fs_info,
 		ret = -EINVAL;
 	}
 
-	if (memcmp(fs_info->metadata_fsid, sb->dev_item.fsid,
+	if (memcmp(fs_info->fs_devices->metadata_uuid, sb->dev_item.fsid,
 		   BTRFS_FSID_SIZE) != 0) {
 		btrfs_err(fs_info,
 			   "dev_item UUID does not match metadata fsid: %pU != %pU",
-			   fs_info->metadata_fsid, sb->dev_item.fsid);
+			   fs_info->fs_devices->metadata_uuid,
+			   sb->dev_item.fsid);
 		ret = -EINVAL;
 	}
 
@@ -2834,14 +2835,16 @@ int open_ctree(struct super_block *sb,
 	       sizeof(*fs_info->super_for_commit));
 	brelse(bh);
 
-	memcpy(fs_info->fsid, fs_info->super_copy->fsid, BTRFS_FSID_SIZE);
+	ASSERT(!memcmp(fs_info->fs_devices->fsid, fs_info->super_copy->fsid,
+		       BTRFS_FSID_SIZE));
+
 	if (btrfs_fs_incompat(fs_info, METADATA_UUID)) {
-		memcpy(fs_info->metadata_fsid,
-		       fs_info->super_copy->metadata_uuid, BTRFS_FSID_SIZE);
-	} else {
-		memcpy(fs_info->metadata_fsid, fs_info->fsid, BTRFS_FSID_SIZE);
+		ASSERT(!memcmp(fs_info->fs_devices->metadata_uuid,
+				fs_info->super_copy->metadata_uuid,
+				BTRFS_FSID_SIZE));
 	}
 
+
 	ret = btrfs_validate_mount_super(fs_info);
 	if (ret) {
 		btrfs_err(fs_info, "superblock contains fatal errors");
@@ -2961,7 +2964,7 @@ int open_ctree(struct super_block *sb,
 
 	sb->s_blocksize = sectorsize;
 	sb->s_blocksize_bits = blksize_bits(sectorsize);
-	memcpy(&sb->s_uuid, fs_info->fsid, BTRFS_FSID_SIZE);
+	memcpy(&sb->s_uuid, fs_info->fs_devices->fsid, BTRFS_FSID_SIZE);
 
 	mutex_lock(&fs_info->chunk_mutex);
 	ret = btrfs_read_sys_array(fs_info);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 4926d0975242..615e071ce17c 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -8174,7 +8174,7 @@ btrfs_init_new_buffer(struct btrfs_trans_handle *trans, struct btrfs_root *root,
 	btrfs_set_header_generation(buf, trans->transid);
 	btrfs_set_header_backref_rev(buf, BTRFS_MIXED_BACKREF_REV);
 	btrfs_set_header_owner(buf, owner);
-	write_extent_buffer_fsid(buf, fs_info->metadata_fsid);
+	write_extent_buffer_fsid(buf, fs_info->fs_devices->metadata_uuid);
 	write_extent_buffer_chunk_tree_uuid(buf, fs_info->chunk_tree_uuid);
 	if (root->root_key.objectid == BTRFS_TREE_LOG_OBJECTID) {
 		buf->log_index = root->log_transid % 2;
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index a990a9045139..510635e49319 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3135,7 +3135,7 @@ static long btrfs_ioctl_fs_info(struct btrfs_fs_info *fs_info,
 	}
 	rcu_read_unlock();
 
-	memcpy(&fi_args->fsid, fs_info->fsid, sizeof(fi_args->fsid));
+	memcpy(&fi_args->fsid, fs_devices->fsid, sizeof(fi_args->fsid));
 	fi_args->nodesize = fs_info->nodesize;
 	fi_args->sectorsize = fs_info->sectorsize;
 	fi_args->clone_alignment = fs_info->sectorsize;
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index b362b45dd757..1163183fe6dd 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -2090,7 +2090,7 @@ static int btrfs_statfs(struct dentry *dentry, struct kstatfs *buf)
 	u64 total_free_data = 0;
 	u64 total_free_meta = 0;
 	int bits = dentry->d_sb->s_blocksize_bits;
-	__be32 *fsid = (__be32 *)fs_info->fsid;
+	__be32 *fsid = (__be32 *)fs_info->fs_devices->fsid;
 	unsigned factor = 1;
 	struct btrfs_block_rsv *block_rsv = &fs_info->global_block_rsv;
 	int ret;
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index e3e12c94834f..bf0aa900f22c 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1744,8 +1744,8 @@ static int btrfs_add_dev_item(struct btrfs_trans_handle *trans,
 	ptr = btrfs_device_uuid(dev_item);
 	write_extent_buffer(leaf, device->uuid, ptr, BTRFS_UUID_SIZE);
 	ptr = btrfs_device_fsid(dev_item);
-	write_extent_buffer(leaf, trans->fs_info->metadata_fsid, ptr,
-			    BTRFS_FSID_SIZE);
+	write_extent_buffer(leaf, trans->fs_info->fs_devices->metadata_uuid,
+			    ptr, BTRFS_FSID_SIZE);
 	btrfs_mark_buffer_dirty(leaf);
 
 	ret = 0;
@@ -2278,9 +2278,7 @@ static int btrfs_prepare_sprout(struct btrfs_fs_info *fs_info)
 	fs_devices->seed = seed_devices;
 
 	generate_random_uuid(fs_devices->fsid);
-	memcpy(fs_info->fsid, fs_devices->fsid, BTRFS_FSID_SIZE);
 	memcpy(fs_devices->metadata_uuid, fs_devices->fsid, BTRFS_FSID_SIZE);
-	memcpy(fs_info->metadata_fsid, fs_devices->fsid, BTRFS_FSID_SIZE);
 	memcpy(disk_super->fsid, fs_devices->fsid, BTRFS_FSID_SIZE);
 	mutex_unlock(&fs_devices->device_list_mutex);
 
@@ -2522,7 +2520,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
 		 * so rename the fsid on the sysfs
 		 */
 		snprintf(fsid_buf, BTRFS_UUID_UNPARSED_SIZE, "%pU",
-						fs_info->fsid);
+						fs_info->fs_devices->fsid);
 		if (kobject_rename(&fs_devices->fsid_kobj, fsid_buf))
 			btrfs_warn(fs_info,
 				   "sysfs: failed to create fsid for sprout");
@@ -6671,7 +6669,7 @@ static int read_one_dev(struct btrfs_fs_info *fs_info,
 	read_extent_buffer(leaf, fs_uuid, btrfs_device_fsid(dev_item),
 			   BTRFS_FSID_SIZE);
 
-	if (memcmp(fs_uuid, fs_info->metadata_fsid, BTRFS_FSID_SIZE)) {
+	if (memcmp(fs_uuid, fs_devices->metadata_uuid, BTRFS_FSID_SIZE)) {
 		fs_devices = open_seed_devices(fs_info, fs_uuid);
 		if (IS_ERR(fs_devices))
 			return PTR_ERR(fs_devices);
diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index 8568946f491d..4b8400f7d4fa 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -92,7 +92,7 @@ TRACE_DEFINE_ENUM(COMMIT_TRANS);
 #define TP_STRUCT__entry_fsid __array(u8, fsid, BTRFS_FSID_SIZE)
 
 #define TP_fast_assign_fsid(fs_info)					\
-	memcpy(__entry->fsid, fs_info->fsid, BTRFS_FSID_SIZE)
+	memcpy(__entry->fsid, fs_info->fs_devices->fsid, BTRFS_FSID_SIZE)
 
 #define TP_STRUCT__entry_btrfs(args...)					\
 	TP_STRUCT__entry(						\
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 3/6] btrfs: Add handling for disk split-brain scenario during fsid change
  2018-10-11 15:03 [PATCH 0/6] FSID change kernel support Nikolay Borisov
  2018-10-11 15:03 ` [PATCH 1/6] btrfs: Introduce support for FSID change without metadata rewrite Nikolay Borisov
  2018-10-11 15:03 ` [PATCH 2/6] btrfs: Remove fsid/metadata_fsid fields from btrfs_info Nikolay Borisov
@ 2018-10-11 15:03 ` Nikolay Borisov
  2018-10-11 15:03 ` [PATCH 4/6] btrfs: Introduce 2 more members to struct btrfs_fs_devices Nikolay Borisov
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Nikolay Borisov @ 2018-10-11 15:03 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

Even though FSID change without rewrite is a very quick operations it's
still possible to experience a split brain scenario if power loss
occurs at the right time. This patch handle the case where power
failure occurs while the first transaction (the one setting
FSID_CHANGING_V2) flag is being persisted on disk. This can cause
the btrfs_fs_device of this filesystem to be created by a device which:

 a) has the FSID_CHANGING_V2 flag set but its fsid value is intact

 b) or a device which doesn't have FSID_CHANGING_V2 flag set and its
 fsid value is intact

This situatian is trivially handled by the current find_fsid code since
in both cases the devices are going to be tread like ordinary devices.
Since btrfs is mounted always using the superblock of the latest
device (the one with higher generation number), meaning it will have
the FSID_CHANGING_V2 flag set, ensure it's being cleared. On the first
transaction commit following the mount all disks will have it cleared.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/disk-io.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index be2caf513e2f..9c2f46f8421a 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2831,10 +2831,10 @@ int open_ctree(struct super_block *sb,
 	 * the whole block of INFO_SIZE
 	 */
 	memcpy(fs_info->super_copy, bh->b_data, sizeof(*fs_info->super_copy));
-	memcpy(fs_info->super_for_commit, fs_info->super_copy,
-	       sizeof(*fs_info->super_for_commit));
 	brelse(bh);
 
+	disk_super = fs_info->super_copy;
+
 	ASSERT(!memcmp(fs_info->fs_devices->fsid, fs_info->super_copy->fsid,
 		       BTRFS_FSID_SIZE));
 
@@ -2844,6 +2844,15 @@ int open_ctree(struct super_block *sb,
 				BTRFS_FSID_SIZE));
 	}
 
+	features = btrfs_super_flags(disk_super);
+	if (features & BTRFS_SUPER_FLAG_CHANGING_FSID_v2) {
+		features &= ~BTRFS_SUPER_FLAG_CHANGING_FSID_v2;
+		btrfs_set_super_flags(disk_super, features);
+		btrfs_info(fs_info, "Found metadata uuid in progress flag. Clearing\n");
+	}
+
+	memcpy(fs_info->super_for_commit, fs_info->super_copy,
+	       sizeof(*fs_info->super_for_commit));
 
 	ret = btrfs_validate_mount_super(fs_info);
 	if (ret) {
@@ -2852,7 +2861,6 @@ int open_ctree(struct super_block *sb,
 		goto fail_alloc;
 	}
 
-	disk_super = fs_info->super_copy;
 	if (!btrfs_super_root(disk_super))
 		goto fail_alloc;
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 4/6] btrfs: Introduce 2 more members to struct btrfs_fs_devices
  2018-10-11 15:03 [PATCH 0/6] FSID change kernel support Nikolay Borisov
                   ` (2 preceding siblings ...)
  2018-10-11 15:03 ` [PATCH 3/6] btrfs: Add handling for disk split-brain scenario during fsid change Nikolay Borisov
@ 2018-10-11 15:03 ` Nikolay Borisov
  2018-10-11 15:03 ` [PATCH 5/6] btrfs: Handle one more split-brain scenario during fsid change Nikolay Borisov
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Nikolay Borisov @ 2018-10-11 15:03 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

In order to gracefully handle split-brain scenario which are very
unlikely, yet still possible while performing the FSID change I'm
gonna need two more pieces of information:
 1. The highes generation number among all devices registered to a
 particular btrfs_fs_devices

 2. A boolean flag whether a given btrfs_fs_devices was created by a
 device which had the FSID_CHANGING_V2 flag set.

This is a preparatory patch and just introduces the variables as well
as code which sets, their actual use is going to happen in a later
patch.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/volumes.c | 9 ++++++++-
 fs/btrfs/volumes.h | 5 +++++
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index bf0aa900f22c..c2b66d15e08d 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -785,6 +785,8 @@ static noinline struct btrfs_device *device_list_add(const char *path,
 	u64 devid = btrfs_stack_device_id(&disk_super->dev_item);
 	bool has_metadata_uuid = (btrfs_super_incompat_flags(disk_super) &
 		BTRFS_FEATURE_INCOMPAT_METADATA_UUID);
+	bool fsid_change_in_progress = (btrfs_super_flags(disk_super) &
+					BTRFS_SUPER_FLAG_CHANGING_FSID_v2);
 
 	if (has_metadata_uuid)
 		fs_devices = find_fsid(disk_super->fsid, disk_super->metadata_uuid);
@@ -798,6 +800,8 @@ static noinline struct btrfs_device *device_list_add(const char *path,
 		else
 			fs_devices = alloc_fs_devices(disk_super->fsid, NULL);
 
+		fs_devices->fsid_change = fsid_change_in_progress;
+
 		if (IS_ERR(fs_devices))
 			return ERR_CAST(fs_devices);
 
@@ -904,8 +908,11 @@ static noinline struct btrfs_device *device_list_add(const char *path,
 	 * it back. We need it to pick the disk with largest generation
 	 * (as above).
 	 */
-	if (!fs_devices->opened)
+	if (!fs_devices->opened) {
 		device->generation = found_transid;
+		fs_devices->latest_generation= max(found_transid,
+						fs_devices->latest_generation);
+	}
 
 	fs_devices->total_devices = btrfs_super_num_devices(disk_super);
 
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 04860497b33c..6b2a01c55426 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -211,6 +211,7 @@ BTRFS_DEVICE_GETSET_FUNCS(bytes_used);
 struct btrfs_fs_devices {
 	u8 fsid[BTRFS_FSID_SIZE]; /* FS specific uuid */
 	u8 metadata_uuid[BTRFS_FSID_SIZE];
+	bool fsid_change;
 	struct list_head fs_list;
 
 	u64 num_devices;
@@ -219,6 +220,10 @@ struct btrfs_fs_devices {
 	u64 missing_devices;
 	u64 total_rw_bytes;
 	u64 total_devices;
+
+	/* Highest generation number of seen devices */
+	u64 latest_generation;
+
 	struct block_device *latest_bdev;
 
 	/* all of the devices in the FS, protected by a mutex
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 5/6] btrfs: Handle one more split-brain scenario during fsid change
  2018-10-11 15:03 [PATCH 0/6] FSID change kernel support Nikolay Borisov
                   ` (3 preceding siblings ...)
  2018-10-11 15:03 ` [PATCH 4/6] btrfs: Introduce 2 more members to struct btrfs_fs_devices Nikolay Borisov
@ 2018-10-11 15:03 ` Nikolay Borisov
  2018-10-11 15:03 ` [PATCH 6/6] btrfs: Handle final split-brain possibility " Nikolay Borisov
  2018-10-19 14:18 ` [PATCH 0/6] FSID change kernel support David Sterba
  6 siblings, 0 replies; 11+ messages in thread
From: Nikolay Borisov @ 2018-10-11 15:03 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

This commit continues hardening the scanning code to handle cases where
power loss could have caused disks in a multi-disk filesystem to be
in inconsistent state. Namely handle the situation that can occur when
some of the disks in multi-disk fs have completed their fsid change i.e
they have METADATA_UUID incompat flag set, have cleared the
FSID_CHANGING_V2 flag and their fsid/metadata_uuid are different. At
the same time the other half of the disks will have their
fsid/metadata_uuid unchanged and will only have FSID_CHANGING_V2 flag.

This is handled by adding additional code in the scan path which:

 a) In case first a device with FSID_CHANGING_V2 flag is scanned and
 btrfs_fs_devices is created with matching fsid/metdata_uuid then when
 a device with completed fsid change is scanned it will detect this
 via the new code in find_fsid i.e that such an fs_devices exist that
 fsid_change flag is set to true, it's metadata_uuid/fsid match and
 the metadata_uuid of the scanned device matches that of the fs_devices.
 In this case, it's important to note that the devices which has its
 fsid change completed will have a higher generation number than the
 device with FSID_CHANGING_V2 flag set, so its superblock block will
 be used during mount. To prevent an assertion triggering because
 the sb used for mounting will have differing fsid/metadata_uuid than
 the ones in the fs_devices struct also add code in device_list_add
 which overwrites the values in fs_devices.

 b) Alternatively we can end up with a device that completed its
 fsid change to be scanned first which will create the respective
 btrfs_fs_devices struct with differing fsid/metadata_uuid. In this
 case when a device with FSID_CHANGING_V2 flag set is scanned it will
 call the newly added find_fsid_inprogress function which will return
 the correct fs_devices.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/volumes.c | 78 +++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 74 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index c2b66d15e08d..2c9879a81884 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -382,6 +382,26 @@ find_fsid(const u8 *fsid, const u8 *metadata_fsid)
 
 	ASSERT(fsid);
 
+	if (metadata_fsid) {
+
+               /*
+                * Handle scanned device having completed its fsid change but
+                * belonging to a fs_devices that was created by first scanning
+                * a device which didn't have it's fsid/metadata_uuid changed
+                * at all and the CHANGING_FSID flag set. 4/a
+                */
+               list_for_each_entry(fs_devices, &fs_uuids, fs_list) {
+                       if (fs_devices->fsid_change &&
+                           memcmp(metadata_fsid, fs_devices->fsid,
+                                  BTRFS_FSID_SIZE) == 0 &&
+                           memcmp(fs_devices->fsid, fs_devices->metadata_uuid,
+                                  BTRFS_FSID_SIZE) == 0) {
+                               return fs_devices;
+                       }
+               }
+	}
+
+	/* Handle non-split brain cases */
 	list_for_each_entry(fs_devices, &fs_uuids, fs_list) {
 		if (metadata_fsid) {
 			if (memcmp(fsid, fs_devices->fsid, BTRFS_FSID_SIZE) == 0
@@ -768,6 +788,27 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
 }
 
 /*
+ * Handle scanned device having its FSID_CHANGING flag set and the fs_devices
+ * being created with a disk that has already completed its fsid change.
+ */
+static struct btrfs_fs_devices *find_fsid_inprogress(
+					struct btrfs_super_block *disk_super)
+{
+	struct btrfs_fs_devices *fs_devices;
+
+	list_for_each_entry(fs_devices, &fs_uuids, fs_list) {
+		if (memcmp(fs_devices->metadata_uuid, fs_devices->fsid,
+			   BTRFS_FSID_SIZE) != 0 &&
+		    memcmp(fs_devices->metadata_uuid, disk_super->fsid,
+			   BTRFS_FSID_SIZE) == 0 && !fs_devices->fsid_change) {
+			return fs_devices;
+		}
+	}
+
+	return NULL;
+}
+
+/*
  * Add new device to list of registered devices
  *
  * Returns:
@@ -779,7 +820,7 @@ static noinline struct btrfs_device *device_list_add(const char *path,
 			   bool *new_device_added)
 {
 	struct btrfs_device *device;
-	struct btrfs_fs_devices *fs_devices;
+	struct btrfs_fs_devices *fs_devices = NULL;
 	struct rcu_string *name;
 	u64 found_transid = btrfs_super_generation(disk_super);
 	u64 devid = btrfs_stack_device_id(&disk_super->dev_item);
@@ -788,10 +829,24 @@ static noinline struct btrfs_device *device_list_add(const char *path,
 	bool fsid_change_in_progress = (btrfs_super_flags(disk_super) &
 					BTRFS_SUPER_FLAG_CHANGING_FSID_v2);
 
-	if (has_metadata_uuid)
-		fs_devices = find_fsid(disk_super->fsid, disk_super->metadata_uuid);
-	else
+	if (fsid_change_in_progress && !has_metadata_uuid) {
+               /*
+                * When we have an image which has FSID_CHANGE set
+                * it might belong to either a filesystem which has
+                * disks with completed fsid change or it might belong
+                * to fs with no uuid changes in effect, handle both.
+                */
+		fs_devices = find_fsid_inprogress(disk_super);
+		if (!fs_devices)
+			fs_devices = find_fsid(disk_super->fsid, NULL);
+
+	} else if (has_metadata_uuid) {
+		fs_devices = find_fsid(disk_super->fsid,
+				       disk_super->metadata_uuid);
+	} else {
 		fs_devices = find_fsid(disk_super->fsid, NULL);
+	}
+
 
 	if (!fs_devices) {
 		if (has_metadata_uuid)
@@ -813,6 +868,21 @@ static noinline struct btrfs_device *device_list_add(const char *path,
 		mutex_lock(&fs_devices->device_list_mutex);
 		device = find_device(fs_devices, devid,
 				disk_super->dev_item.uuid);
+
+                /*
+                 * If this disk has been pulled into an fs devices created by
+                 * a device which had the FSID_CHANGING flag then replace the
+                 * metadata_uuid/fsid values of the fs_devices.
+                 */
+                if (has_metadata_uuid && fs_devices->fsid_change &&
+                    found_transid > fs_devices->latest_generation) {
+                        memcpy(fs_devices->fsid, disk_super->fsid,
+                               BTRFS_FSID_SIZE);
+                        memcpy(fs_devices->metadata_uuid,
+                               disk_super->metadata_uuid, BTRFS_FSID_SIZE);
+
+                        fs_devices->fsid_change = false;
+                }
 	}
 
 	if (!device) {
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 6/6] btrfs: Handle final split-brain possibility during fsid change
  2018-10-11 15:03 [PATCH 0/6] FSID change kernel support Nikolay Borisov
                   ` (4 preceding siblings ...)
  2018-10-11 15:03 ` [PATCH 5/6] btrfs: Handle one more split-brain scenario during fsid change Nikolay Borisov
@ 2018-10-11 15:03 ` Nikolay Borisov
  2018-10-19 14:18 ` [PATCH 0/6] FSID change kernel support David Sterba
  6 siblings, 0 replies; 11+ messages in thread
From: Nikolay Borisov @ 2018-10-11 15:03 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

This patch lands the last case which needs to be handled by the fsid
change code. Namely, this is the case where a multidisk filesystem has
already undergone at least one successful fsid change i.e all disks
have the METADATA_UUID incompat bit and power failure occurs as another
fsid chind is in progress. When such an event occurs disks should be
split in 2 groups. One of the groups will have both METADATA_UUID and
FSID_CHANGING_V2 flags set coupled with old fsid/metadata_uuid pairs.
The other group of disks will have only METADATA_UUID bit set and their
fsid will be different than the one in disks in the first group. Here
we look at cases:
  a) A disk from the first group is scanned first, so fs_devices is
  created with stale fsid/metdata_uuid. Then when a disk from the
  second group is scanned it needs to first check whether there exists
  such an fs_devices that has fsid_change set to true (because it was
  created with a disk having the FSID_CHANGING_V2 flag), the
  metadata_uuid and fsid of the fsdevices will be different (since it was
  created by a disk which already had at least 1 successful fsid change)
  and finally the metadata_uuid of the fs_devices will equal that of the
  currently scanned disk (because metadata_uuid never really changes).
  When the correct fs_devices is found the information from the scanned
  disk will replace the current one in fs_devices since the scanned disk
  will have higher generation number.

  b) A disk from the second group is scanned so fs_devices is created
  as usual with differing fsid/metdata_uid. Then when a disk from the
  first group is scanned the code detects that it has both
  FSID_CHANGING and METADATA_UUID flags set and will scan for fs_devices
  that has differing metadata_uuid/fsid and whose metadata_uuid  is the
  same as that of the scanned device.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/volumes.c | 65 ++++++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 53 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 2c9879a81884..d08667cea189 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -383,7 +383,6 @@ find_fsid(const u8 *fsid, const u8 *metadata_fsid)
 	ASSERT(fsid);
 
 	if (metadata_fsid) {
-
                /*
                 * Handle scanned device having completed its fsid change but
                 * belonging to a fs_devices that was created by first scanning
@@ -399,6 +398,21 @@ find_fsid(const u8 *fsid, const u8 *metadata_fsid)
                                return fs_devices;
                        }
                }
+		/*
+		 * Handle scanned device having completed its fsid change but
+		 * belonging to a fs_devices that was created by a device that
+		 * has an outdated pair of fsid/metadata_uuid and CHANGING_FSID
+		 * flag set. 6/b
+		 */
+		list_for_each_entry(fs_devices, &fs_uuids, fs_list) {
+			if (fs_devices->fsid_change &&
+			    memcmp(fs_devices->metadata_uuid,
+				   fs_devices->fsid, BTRFS_FSID_SIZE) != 0 &&
+			    memcmp(metadata_fsid, fs_devices->metadata_uuid,
+				   BTRFS_FSID_SIZE) == 0) {
+				return fs_devices;
+			}
+		}
 	}
 
 	/* Handle non-split brain cases */
@@ -808,6 +822,30 @@ static struct btrfs_fs_devices *find_fsid_inprogress(
 	return NULL;
 }
 
+
+static struct btrfs_fs_devices *find_fsid_changed(
+					struct btrfs_super_block *disk_super)
+{
+	struct btrfs_fs_devices *fs_devices;
+
+	/*
+	 * Handles the case where scanned device is part of an fs that had
+	 * multiple successful changes of FSID but curently device didn't
+	 * observe it. Meaning our fsid will be different than theirs. 6/b
+	 */
+	list_for_each_entry(fs_devices, &fs_uuids, fs_list) {
+		if (memcmp(fs_devices->metadata_uuid, fs_devices->fsid,
+			   BTRFS_FSID_SIZE) != 0 &&
+		    memcmp(fs_devices->metadata_uuid, disk_super->metadata_uuid,
+			   BTRFS_FSID_SIZE) == 0 &&
+		    memcmp(fs_devices->fsid, disk_super->fsid,
+			   BTRFS_FSID_SIZE) != 0) {
+			return fs_devices;
+		}
+	}
+
+	return NULL;
+}
 /*
  * Add new device to list of registered devices
  *
@@ -829,17 +867,20 @@ static noinline struct btrfs_device *device_list_add(const char *path,
 	bool fsid_change_in_progress = (btrfs_super_flags(disk_super) &
 					BTRFS_SUPER_FLAG_CHANGING_FSID_v2);
 
-	if (fsid_change_in_progress && !has_metadata_uuid) {
-               /*
-                * When we have an image which has FSID_CHANGE set
-                * it might belong to either a filesystem which has
-                * disks with completed fsid change or it might belong
-                * to fs with no uuid changes in effect, handle both.
-                */
-		fs_devices = find_fsid_inprogress(disk_super);
-		if (!fs_devices)
-			fs_devices = find_fsid(disk_super->fsid, NULL);
-
+	if (fsid_change_in_progress) {
+		if (!has_metadata_uuid) {
+			/*
+			 * When we have an image which has FSID_CHANGE set
+			 * it might belong to either a filesystem which has
+			 * disks with completed fsid change or it might belong
+			 * to fs with no uuid changes in effect, handle both.
+			 */
+			fs_devices = find_fsid_inprogress(disk_super);
+			if (!fs_devices)
+				fs_devices = find_fsid(disk_super->fsid, NULL);
+		} else {
+			fs_devices = find_fsid_changed(disk_super);
+		}
 	} else if (has_metadata_uuid) {
 		fs_devices = find_fsid(disk_super->fsid,
 				       disk_super->metadata_uuid);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/6] FSID change kernel support
  2018-10-11 15:03 [PATCH 0/6] FSID change kernel support Nikolay Borisov
                   ` (5 preceding siblings ...)
  2018-10-11 15:03 ` [PATCH 6/6] btrfs: Handle final split-brain possibility " Nikolay Borisov
@ 2018-10-19 14:18 ` David Sterba
  2018-10-19 14:31   ` Nikolay Borisov
  6 siblings, 1 reply; 11+ messages in thread
From: David Sterba @ 2018-10-19 14:18 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: linux-btrfs

On Thu, Oct 11, 2018 at 06:03:20PM +0300, Nikolay Borisov wrote:
> Here is the second posting of the fsid change support for the kernel. For 
> background information you can refer to v1 [0]. The main changes in this version
> are around the handling of possible split-brain scenarios. I've changed a bit 
> how the userspace code works and now the process is split among 2 transactions. 
> The first one flagging "we are about to change fsid" and once it's persisted on
> all disks a second one does the actual change. This of course is not enough 
> to guarantee full consistency so I had to extend the device scanning to 
> gracefully handle such cases. I believe I have covered everything but more 
> review will be appreciated. 

All the cases seem to be covered. Do you intend to add the design
document somewhere? The references in the code seem stale and puzzling.

> So patch 1 implements the basic functionality with no split-brain handling 
> whatsoever. Patch 2 is a minor cleanup. Patch 3 deals with a split-brain that 
> can occur if power loss happens during the initial transaction (the one setting
> the beginning flag). Patch 4 adds some information that is needed in the last 2 
> patches. Patch 5 handles failure between transaction 1 and transaction 2 and 
> finally patch 6 handles the case of power loss during transaction 1 but for an 
> fs which has already undergone at least one successful fsid change. More 
> details about the exact failure modes are in each respective patch.
> 
> One thing which could be improved but I ran out of ideas is the naming of the 
> ancillary functions - find_fsid_inprogress and find_fsid_changed. 

Hm, no better ideas so it can stay and be changed later.

> I've actually tested the split-brain handing code with specially crafted images. 
> They will be part of the user-space submissions and I believe I have full 
> coverage for that. 

Perfect, thanks.

Now the bad news from me :) There are several coding style and style
issues all over the patches so I'll list them here.

* BTRFS_SUPER_FLAG_CHANGING_FSID_v2 -- V2 with uppercase V, as other
  versioned symbols

* all references in changelogs or comments should refer to the super flag
  as CHANGING_FSID_v2, not FSID_CHANGING, not just CHANGING_FSID, nor
  FSID_CHANGE. This is because we want to be able to search for it and
  find all occurences

* error messages should stick to the preferred format,
  https://btrfs.wiki.kernel.org/index.php/Development_notes#Error_messages

* comments referring to UUID should use "UUID", there's a mixt of
  both ways added by the patches, error messages should use the
  uppercase form too

* find_fsid - type and name should be on one line, parameters on the
  next if they don't fit

* device_list_add - missing space before =

As these are not functional problems, I'll add the patchset to for-next
for testing.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/6] FSID change kernel support
  2018-10-19 14:18 ` [PATCH 0/6] FSID change kernel support David Sterba
@ 2018-10-19 14:31   ` Nikolay Borisov
  2018-10-19 15:50     ` David Sterba
  0 siblings, 1 reply; 11+ messages in thread
From: Nikolay Borisov @ 2018-10-19 14:31 UTC (permalink / raw)
  To: dsterba, linux-btrfs



On 19.10.2018 17:18, David Sterba wrote:
> On Thu, Oct 11, 2018 at 06:03:20PM +0300, Nikolay Borisov wrote:
>> Here is the second posting of the fsid change support for the kernel. For 
>> background information you can refer to v1 [0]. The main changes in this version
>> are around the handling of possible split-brain scenarios. I've changed a bit 
>> how the userspace code works and now the process is split among 2 transactions. 
>> The first one flagging "we are about to change fsid" and once it's persisted on
>> all disks a second one does the actual change. This of course is not enough 
>> to guarantee full consistency so I had to extend the device scanning to 
>> gracefully handle such cases. I believe I have covered everything but more 
>> review will be appreciated. 
> 
> All the cases seem to be covered. Do you intend to add the design
> document somewhere? The references in the code seem stale and puzzling.

I believe I have the critical portions of the design document (i.e the
handling of various cases) described in each of the 3 patches that add
split-brain handling. The doc could be added to the btrfs-dev-doc repo I
guess.

> 
>> So patch 1 implements the basic functionality with no split-brain handling 
>> whatsoever. Patch 2 is a minor cleanup. Patch 3 deals with a split-brain that 
>> can occur if power loss happens during the initial transaction (the one setting
>> the beginning flag). Patch 4 adds some information that is needed in the last 2 
>> patches. Patch 5 handles failure between transaction 1 and transaction 2 and 
>> finally patch 6 handles the case of power loss during transaction 1 but for an 
>> fs which has already undergone at least one successful fsid change. More 
>> details about the exact failure modes are in each respective patch.
>>
>> One thing which could be improved but I ran out of ideas is the naming of the 
>> ancillary functions - find_fsid_inprogress and find_fsid_changed. 
> 
> Hm, no better ideas so it can stay and be changed later.
> 
>> I've actually tested the split-brain handing code with specially crafted images. 
>> They will be part of the user-space submissions and I believe I have full 
>> coverage for that. 
> 
> Perfect, thanks.
> 
> Now the bad news from me :) There are several coding style and style
> issues all over the patches so I'll list them here.
> 
> * BTRFS_SUPER_FLAG_CHANGING_FSID_v2 -- V2 with uppercase V, as other
>   versioned symbols
> 
> * all references in changelogs or comments should refer to the super flag
>   as CHANGING_FSID_v2, not FSID_CHANGING, not just CHANGING_FSID, nor
>   FSID_CHANGE. This is because we want to be able to search for it and
>   find all occurences
> 
> * error messages should stick to the preferred format,
>   https://btrfs.wiki.kernel.org/index.php/Development_notes#Error_messages
> 
> * comments referring to UUID should use "UUID", there's a mixt of
>   both ways added by the patches, error messages should use the
>   uppercase form too
> 
> * find_fsid - type and name should be on one line, parameters on the
>   next if they don't fit
> 
> * device_list_add - missing space before =
> 
> As these are not functional problems, I'll add the patchset to for-next
> for testing.
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/6] FSID change kernel support
  2018-10-19 14:31   ` Nikolay Borisov
@ 2018-10-19 15:50     ` David Sterba
  0 siblings, 0 replies; 11+ messages in thread
From: David Sterba @ 2018-10-19 15:50 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: dsterba, linux-btrfs

On Fri, Oct 19, 2018 at 05:31:09PM +0300, Nikolay Borisov wrote:
> 
> 
> On 19.10.2018 17:18, David Sterba wrote:
> > On Thu, Oct 11, 2018 at 06:03:20PM +0300, Nikolay Borisov wrote:
> >> Here is the second posting of the fsid change support for the kernel. For 
> >> background information you can refer to v1 [0]. The main changes in this version
> >> are around the handling of possible split-brain scenarios. I've changed a bit 
> >> how the userspace code works and now the process is split among 2 transactions. 
> >> The first one flagging "we are about to change fsid" and once it's persisted on
> >> all disks a second one does the actual change. This of course is not enough 
> >> to guarantee full consistency so I had to extend the device scanning to 
> >> gracefully handle such cases. I believe I have covered everything but more 
> >> review will be appreciated. 
> > 
> > All the cases seem to be covered. Do you intend to add the design
> > document somewhere? The references in the code seem stale and puzzling.
> 
> I believe I have the critical portions of the design document (i.e the
> handling of various cases) described in each of the 3 patches that add
> split-brain handling. The doc could be added to the btrfs-dev-doc repo I
> guess.

Yeah, but what does 4/a mean in the comments? It's not referring to the
changelog where I'd look first. External documentation is fine but we
need some sort of reference in the code. The comments in 5/6 and 6/6 are
sufficient, just that the references could be more explicit. Possibly
the cases can be simplified and put into the .c as a bullet list.

The document is good for understanding, the nice-to-have part would be
some quick reference to check the logic. I'll give it another read to
see how much is really missing or not.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/6] btrfs: Introduce support for FSID change without metadata rewrite
  2018-10-30 14:43 [PATCH v3 " Nikolay Borisov
@ 2018-10-30 14:43 ` Nikolay Borisov
  0 siblings, 0 replies; 11+ messages in thread
From: Nikolay Borisov @ 2018-10-30 14:43 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

This field is going to be used when the user wants to change the UUID
of the filesystem without having to rewrite all metadata blocks. This
field adds another level of indirection such that when the FSID is
changed what really happens is the current UUID (the one with which the
fs was created) is copied to the 'metadata_uuid' field in the superblock
as well as a new incompat flag is set METADATA_UUID. When the kernel
detects this flag is set it knows that the superblock in fact has 2
UUIDs:

1. Is the UUID which is user-visible, currently known as FSID.
2. Metadata UUID - this is the UUID which is stamped into all on-disk
datastructures belonging to this file system.

When the new incompat flag is present device scaning checks whether
both fsid/metadata_uuid of the scanned device match to any of the
registed filesystems. When the flag is not set then both UUIDs are
equal and only the FSID is retained on disk, metadata_uuid is set only
in-memory during mount.

Additionally a new metadata_uuid field is also added to the fs_info
struct. It's initialised either with the FSID in case METADATA_UUID
incompat flag is not set or with the metdata_uuid of the superblock
otherwise.

This commit introduces the new fields as well as the new incompat flag
and switches all users of the fsid to the new logic.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/ctree.c                |  4 +--
 fs/btrfs/ctree.h                | 12 ++++---
 fs/btrfs/disk-io.c              | 32 ++++++++++++++----
 fs/btrfs/extent-tree.c          |  2 +-
 fs/btrfs/volumes.c              | 72 ++++++++++++++++++++++++++++++++---------
 fs/btrfs/volumes.h              |  1 +
 include/uapi/linux/btrfs.h      |  1 +
 include/uapi/linux/btrfs_tree.h |  1 +
 8 files changed, 97 insertions(+), 28 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 539901fb5165..75cd41bf12f7 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -224,7 +224,7 @@ int btrfs_copy_root(struct btrfs_trans_handle *trans,
 	else
 		btrfs_set_header_owner(cow, new_root_objectid);
 
-	write_extent_buffer_fsid(cow, fs_info->fsid);
+	write_extent_buffer_fsid(cow, fs_info->metadata_fsid);
 
 	WARN_ON(btrfs_header_generation(buf) > trans->transid);
 	if (new_root_objectid == BTRFS_TREE_RELOC_OBJECTID)
@@ -1050,7 +1050,7 @@ static noinline int __btrfs_cow_block(struct btrfs_trans_handle *trans,
 	else
 		btrfs_set_header_owner(cow, root->root_key.objectid);
 
-	write_extent_buffer_fsid(cow, fs_info->fsid);
+	write_extent_buffer_fsid(cow, fs_info->metadata_fsid);
 
 	ret = update_ref_for_cow(trans, root, buf, cow, &last_ref);
 	if (ret) {
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 68ca41dbbef3..501ada9ec7bd 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -197,7 +197,7 @@ struct btrfs_root_backup {
 struct btrfs_super_block {
 	u8 csum[BTRFS_CSUM_SIZE];
 	/* the first 4 fields must match struct btrfs_header */
-	u8 fsid[BTRFS_FSID_SIZE];    /* FS specific uuid */
+	u8 fsid[BTRFS_FSID_SIZE];    /* userfacing FS specific uuid */
 	__le64 bytenr; /* this block number */
 	__le64 flags;
 
@@ -234,8 +234,10 @@ struct btrfs_super_block {
 	__le64 cache_generation;
 	__le64 uuid_tree_generation;
 
+	u8 metadata_uuid[BTRFS_FSID_SIZE]; /* The uuid written into btree blocks */
+
 	/* future expansion */
-	__le64 reserved[30];
+	__le64 reserved[28];
 	u8 sys_chunk_array[BTRFS_SYSTEM_CHUNK_ARRAY_SIZE];
 	struct btrfs_root_backup super_roots[BTRFS_NUM_BACKUP_ROOTS];
 } __attribute__ ((__packed__));
@@ -265,7 +267,8 @@ struct btrfs_super_block {
 	 BTRFS_FEATURE_INCOMPAT_RAID56 |		\
 	 BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF |		\
 	 BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA |	\
-	 BTRFS_FEATURE_INCOMPAT_NO_HOLES)
+	 BTRFS_FEATURE_INCOMPAT_NO_HOLES	|	\
+	 BTRFS_FEATURE_INCOMPAT_METADATA_UUID)
 
 #define BTRFS_FEATURE_INCOMPAT_SAFE_SET			\
 	(BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF)
@@ -746,7 +749,8 @@ struct btrfs_delayed_root;
 #define BTRFS_FS_BALANCE_RUNNING		18
 
 struct btrfs_fs_info {
-	u8 fsid[BTRFS_FSID_SIZE];
+	u8 fsid[BTRFS_FSID_SIZE]; /* User-visible fs UUID */
+	u8 metadata_fsid[BTRFS_FSID_SIZE]; /* UUID written to btree blocks */
 	u8 chunk_tree_uuid[BTRFS_UUID_SIZE];
 	unsigned long flags;
 	struct btrfs_root *extent_root;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index b0ab41da91d1..b76b18388b93 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -551,7 +551,7 @@ static int csum_dirty_buffer(struct btrfs_fs_info *fs_info, struct page *page)
 	if (WARN_ON(!PageUptodate(page)))
 		return -EUCLEAN;
 
-	ASSERT(memcmp_extent_buffer(eb, fs_info->fsid,
+	ASSERT(memcmp_extent_buffer(eb, fs_info->metadata_fsid,
 			btrfs_header_fsid(), BTRFS_FSID_SIZE) == 0);
 
 	return csum_tree_block(fs_info, eb, 0);
@@ -566,7 +566,19 @@ static int check_tree_block_fsid(struct btrfs_fs_info *fs_info,
 
 	read_extent_buffer(eb, fsid, btrfs_header_fsid(), BTRFS_FSID_SIZE);
 	while (fs_devices) {
-		if (!memcmp(fsid, fs_devices->fsid, BTRFS_FSID_SIZE)) {
+		u8 *metadata_uuid;
+		/*
+		 * Checking the incompat flag is only valid for the current
+		 * fs. For seed devices it's forbidden to have their uuid
+		 * changed so reading ->fsid in this case is fine
+		 */
+		if (fs_devices == fs_info->fs_devices &&
+		    btrfs_fs_incompat(fs_info, METADATA_UUID))
+			metadata_uuid = fs_devices->metadata_uuid;
+		else
+			metadata_uuid = fs_devices->fsid;
+
+		if (!memcmp(fsid, metadata_uuid, BTRFS_FSID_SIZE)) {
 			ret = 0;
 			break;
 		}
@@ -2478,10 +2490,11 @@ static int validate_super(struct btrfs_fs_info *fs_info,
 		ret = -EINVAL;
 	}
 
-	if (memcmp(fs_info->fsid, sb->dev_item.fsid, BTRFS_FSID_SIZE) != 0) {
+	if (memcmp(fs_info->metadata_fsid, sb->dev_item.fsid,
+		   BTRFS_FSID_SIZE) != 0) {
 		btrfs_err(fs_info,
-			   "dev_item UUID does not match fsid: %pU != %pU",
-			   fs_info->fsid, sb->dev_item.fsid);
+			   "dev_item UUID does not match metadata fsid: %pU != %pU",
+			   fs_info->metadata_fsid, sb->dev_item.fsid);
 		ret = -EINVAL;
 	}
 
@@ -2822,6 +2835,12 @@ int open_ctree(struct super_block *sb,
 	brelse(bh);
 
 	memcpy(fs_info->fsid, fs_info->super_copy->fsid, BTRFS_FSID_SIZE);
+	if (btrfs_fs_incompat(fs_info, METADATA_UUID)) {
+		memcpy(fs_info->metadata_fsid,
+		       fs_info->super_copy->metadata_uuid, BTRFS_FSID_SIZE);
+	} else {
+		memcpy(fs_info->metadata_fsid, fs_info->fsid, BTRFS_FSID_SIZE);
+	}
 
 	ret = btrfs_validate_mount_super(fs_info);
 	if (ret) {
@@ -3760,7 +3779,8 @@ int write_all_supers(struct btrfs_fs_info *fs_info, int max_mirrors)
 		btrfs_set_stack_device_io_width(dev_item, dev->io_width);
 		btrfs_set_stack_device_sector_size(dev_item, dev->sector_size);
 		memcpy(dev_item->uuid, dev->uuid, BTRFS_UUID_SIZE);
-		memcpy(dev_item->fsid, dev->fs_devices->fsid, BTRFS_FSID_SIZE);
+		memcpy(dev_item->fsid, dev->fs_devices->metadata_uuid,
+		       BTRFS_FSID_SIZE);
 
 		flags = btrfs_super_flags(sb);
 		btrfs_set_super_flags(sb, flags | BTRFS_HEADER_FLAG_WRITTEN);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index a1febf155747..047fba06a8d8 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -8169,7 +8169,7 @@ btrfs_init_new_buffer(struct btrfs_trans_handle *trans, struct btrfs_root *root,
 	btrfs_set_header_generation(buf, trans->transid);
 	btrfs_set_header_backref_rev(buf, BTRFS_MIXED_BACKREF_REV);
 	btrfs_set_header_owner(buf, owner);
-	write_extent_buffer_fsid(buf, fs_info->fsid);
+	write_extent_buffer_fsid(buf, fs_info->metadata_fsid);
 	write_extent_buffer_chunk_tree_uuid(buf, fs_info->chunk_tree_uuid);
 	if (root->root_key.objectid == BTRFS_TREE_LOG_OBJECTID) {
 		buf->log_index = root->log_transid % 2;
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index f435d397019e..e3e12c94834f 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -238,13 +238,15 @@ struct list_head *btrfs_get_fs_uuids(void)
 
 /*
  * alloc_fs_devices - allocate struct btrfs_fs_devices
- * @fsid:	if not NULL, copy the uuid to fs_devices::fsid
+ * @fsid:		if not NULL, copy the uuid to fs_devices::fsid
+ * @metadata_fsid:	if not NULL, copy the uuid to fs_devices::metadata_fsid
  *
  * Return a pointer to a new struct btrfs_fs_devices on success, or ERR_PTR().
  * The returned struct is not linked onto any lists and can be destroyed with
  * kfree() right away.
  */
-static struct btrfs_fs_devices *alloc_fs_devices(const u8 *fsid)
+static struct btrfs_fs_devices *alloc_fs_devices(const u8 *fsid,
+						 const u8 *metadata_fsid)
 {
 	struct btrfs_fs_devices *fs_devs;
 
@@ -261,6 +263,11 @@ static struct btrfs_fs_devices *alloc_fs_devices(const u8 *fsid)
 	if (fsid)
 		memcpy(fs_devs->fsid, fsid, BTRFS_FSID_SIZE);
 
+	if (metadata_fsid)
+		memcpy(fs_devs->metadata_uuid, metadata_fsid, BTRFS_FSID_SIZE);
+	else if (fsid)
+		memcpy(fs_devs->metadata_uuid, fsid, BTRFS_FSID_SIZE);
+
 	return fs_devs;
 }
 
@@ -368,13 +375,24 @@ static struct btrfs_device *find_device(struct btrfs_fs_devices *fs_devices,
 	return NULL;
 }
 
-static noinline struct btrfs_fs_devices *find_fsid(u8 *fsid)
+static noinline struct btrfs_fs_devices *
+find_fsid(const u8 *fsid, const u8 *metadata_fsid)
 {
 	struct btrfs_fs_devices *fs_devices;
 
+	ASSERT(fsid);
+
 	list_for_each_entry(fs_devices, &fs_uuids, fs_list) {
-		if (memcmp(fsid, fs_devices->fsid, BTRFS_FSID_SIZE) == 0)
-			return fs_devices;
+		if (metadata_fsid) {
+			if (memcmp(fsid, fs_devices->fsid, BTRFS_FSID_SIZE) == 0
+			    && memcmp(metadata_fsid, fs_devices->metadata_uuid,
+				      BTRFS_FSID_SIZE) == 0)
+				return fs_devices;
+		} else {
+			if (memcmp(fsid, fs_devices->fsid,
+				   BTRFS_FSID_SIZE) == 0)
+				return fs_devices;
+		}
 	}
 	return NULL;
 }
@@ -709,6 +727,12 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
 	device->generation = btrfs_super_generation(disk_super);
 
 	if (btrfs_super_flags(disk_super) & BTRFS_SUPER_FLAG_SEEDING) {
+		if (btrfs_super_incompat_flags(disk_super) &
+		    BTRFS_FEATURE_INCOMPAT_METADATA_UUID) {
+			pr_err("BTRFS: Invalid seeding and uuid-changed device detected\n");
+			goto error_brelse;
+		}
+
 		clear_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state);
 		fs_devices->seeding = 1;
 	} else {
@@ -759,10 +783,21 @@ static noinline struct btrfs_device *device_list_add(const char *path,
 	struct rcu_string *name;
 	u64 found_transid = btrfs_super_generation(disk_super);
 	u64 devid = btrfs_stack_device_id(&disk_super->dev_item);
+	bool has_metadata_uuid = (btrfs_super_incompat_flags(disk_super) &
+		BTRFS_FEATURE_INCOMPAT_METADATA_UUID);
+
+	if (has_metadata_uuid)
+		fs_devices = find_fsid(disk_super->fsid, disk_super->metadata_uuid);
+	else
+		fs_devices = find_fsid(disk_super->fsid, NULL);
 
-	fs_devices = find_fsid(disk_super->fsid);
 	if (!fs_devices) {
-		fs_devices = alloc_fs_devices(disk_super->fsid);
+		if (has_metadata_uuid)
+			fs_devices = alloc_fs_devices(disk_super->fsid,
+						      disk_super->metadata_uuid);
+		else
+			fs_devices = alloc_fs_devices(disk_super->fsid, NULL);
+
 		if (IS_ERR(fs_devices))
 			return ERR_CAST(fs_devices);
 
@@ -884,7 +919,7 @@ static struct btrfs_fs_devices *clone_fs_devices(struct btrfs_fs_devices *orig)
 	struct btrfs_device *device;
 	struct btrfs_device *orig_dev;
 
-	fs_devices = alloc_fs_devices(orig->fsid);
+	fs_devices = alloc_fs_devices(orig->fsid, NULL);
 	if (IS_ERR(fs_devices))
 		return fs_devices;
 
@@ -1709,7 +1744,8 @@ static int btrfs_add_dev_item(struct btrfs_trans_handle *trans,
 	ptr = btrfs_device_uuid(dev_item);
 	write_extent_buffer(leaf, device->uuid, ptr, BTRFS_UUID_SIZE);
 	ptr = btrfs_device_fsid(dev_item);
-	write_extent_buffer(leaf, trans->fs_info->fsid, ptr, BTRFS_FSID_SIZE);
+	write_extent_buffer(leaf, trans->fs_info->metadata_fsid, ptr,
+			    BTRFS_FSID_SIZE);
 	btrfs_mark_buffer_dirty(leaf);
 
 	ret = 0;
@@ -2132,7 +2168,11 @@ static struct btrfs_device *btrfs_find_device_by_path(
 	disk_super = (struct btrfs_super_block *)bh->b_data;
 	devid = btrfs_stack_device_id(&disk_super->dev_item);
 	dev_uuid = disk_super->dev_item.uuid;
-	device = btrfs_find_device(fs_info, devid, dev_uuid, disk_super->fsid);
+	if (btrfs_fs_incompat(fs_info, METADATA_UUID))
+		device = btrfs_find_device(fs_info, devid, dev_uuid, disk_super->metadata_uuid);
+	else
+		device = btrfs_find_device(fs_info, devid, dev_uuid, disk_super->fsid);
+
 	brelse(bh);
 	if (!device)
 		device = ERR_PTR(-ENOENT);
@@ -2202,7 +2242,7 @@ static int btrfs_prepare_sprout(struct btrfs_fs_info *fs_info)
 	if (!fs_devices->seeding)
 		return -EINVAL;
 
-	seed_devices = alloc_fs_devices(NULL);
+	seed_devices = alloc_fs_devices(NULL, NULL);
 	if (IS_ERR(seed_devices))
 		return PTR_ERR(seed_devices);
 
@@ -2239,6 +2279,8 @@ static int btrfs_prepare_sprout(struct btrfs_fs_info *fs_info)
 
 	generate_random_uuid(fs_devices->fsid);
 	memcpy(fs_info->fsid, fs_devices->fsid, BTRFS_FSID_SIZE);
+	memcpy(fs_devices->metadata_uuid, fs_devices->fsid, BTRFS_FSID_SIZE);
+	memcpy(fs_info->metadata_fsid, fs_devices->fsid, BTRFS_FSID_SIZE);
 	memcpy(disk_super->fsid, fs_devices->fsid, BTRFS_FSID_SIZE);
 	mutex_unlock(&fs_devices->device_list_mutex);
 
@@ -6245,7 +6287,7 @@ struct btrfs_device *btrfs_find_device(struct btrfs_fs_info *fs_info, u64 devid,
 	cur_devices = fs_info->fs_devices;
 	while (cur_devices) {
 		if (!fsid ||
-		    !memcmp(cur_devices->fsid, fsid, BTRFS_FSID_SIZE)) {
+		    !memcmp(cur_devices->metadata_uuid, fsid, BTRFS_FSID_SIZE)) {
 			device = find_device(cur_devices, devid, uuid);
 			if (device)
 				return device;
@@ -6574,12 +6616,12 @@ static struct btrfs_fs_devices *open_seed_devices(struct btrfs_fs_info *fs_info,
 		fs_devices = fs_devices->seed;
 	}
 
-	fs_devices = find_fsid(fsid);
+	fs_devices = find_fsid(fsid, NULL);
 	if (!fs_devices) {
 		if (!btrfs_test_opt(fs_info, DEGRADED))
 			return ERR_PTR(-ENOENT);
 
-		fs_devices = alloc_fs_devices(fsid);
+		fs_devices = alloc_fs_devices(fsid, NULL);
 		if (IS_ERR(fs_devices))
 			return fs_devices;
 
@@ -6629,7 +6671,7 @@ static int read_one_dev(struct btrfs_fs_info *fs_info,
 	read_extent_buffer(leaf, fs_uuid, btrfs_device_fsid(dev_item),
 			   BTRFS_FSID_SIZE);
 
-	if (memcmp(fs_uuid, fs_info->fsid, BTRFS_FSID_SIZE)) {
+	if (memcmp(fs_uuid, fs_info->metadata_fsid, BTRFS_FSID_SIZE)) {
 		fs_devices = open_seed_devices(fs_info, fs_uuid);
 		if (IS_ERR(fs_devices))
 			return PTR_ERR(fs_devices);
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index aefce895e994..04860497b33c 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -210,6 +210,7 @@ BTRFS_DEVICE_GETSET_FUNCS(bytes_used);
 
 struct btrfs_fs_devices {
 	u8 fsid[BTRFS_FSID_SIZE]; /* FS specific uuid */
+	u8 metadata_uuid[BTRFS_FSID_SIZE];
 	struct list_head fs_list;
 
 	u64 num_devices;
diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index 5ca1d21fc4a7..e0763bc4158e 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -269,6 +269,7 @@ struct btrfs_ioctl_fs_info_args {
 #define BTRFS_FEATURE_INCOMPAT_RAID56		(1ULL << 7)
 #define BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA	(1ULL << 8)
 #define BTRFS_FEATURE_INCOMPAT_NO_HOLES		(1ULL << 9)
+#define BTRFS_FEATURE_INCOMPAT_METADATA_UUID	(1ULL << 10)
 
 struct btrfs_ioctl_feature_flags {
 	__u64 compat_flags;
diff --git a/include/uapi/linux/btrfs_tree.h b/include/uapi/linux/btrfs_tree.h
index aff1356c2bb8..e974f4bb5378 100644
--- a/include/uapi/linux/btrfs_tree.h
+++ b/include/uapi/linux/btrfs_tree.h
@@ -458,6 +458,7 @@ struct btrfs_free_space_header {
 #define BTRFS_SUPER_FLAG_METADUMP	(1ULL << 33)
 #define BTRFS_SUPER_FLAG_METADUMP_V2	(1ULL << 34)
 #define BTRFS_SUPER_FLAG_CHANGING_FSID	(1ULL << 35)
+#define BTRFS_SUPER_FLAG_CHANGING_FSID_V2 (1ULL << 36)
 
 
 /*
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2018-10-30 14:43 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-10-11 15:03 [PATCH 0/6] FSID change kernel support Nikolay Borisov
2018-10-11 15:03 ` [PATCH 1/6] btrfs: Introduce support for FSID change without metadata rewrite Nikolay Borisov
2018-10-11 15:03 ` [PATCH 2/6] btrfs: Remove fsid/metadata_fsid fields from btrfs_info Nikolay Borisov
2018-10-11 15:03 ` [PATCH 3/6] btrfs: Add handling for disk split-brain scenario during fsid change Nikolay Borisov
2018-10-11 15:03 ` [PATCH 4/6] btrfs: Introduce 2 more members to struct btrfs_fs_devices Nikolay Borisov
2018-10-11 15:03 ` [PATCH 5/6] btrfs: Handle one more split-brain scenario during fsid change Nikolay Borisov
2018-10-11 15:03 ` [PATCH 6/6] btrfs: Handle final split-brain possibility " Nikolay Borisov
2018-10-19 14:18 ` [PATCH 0/6] FSID change kernel support David Sterba
2018-10-19 14:31   ` Nikolay Borisov
2018-10-19 15:50     ` David Sterba
  -- strict thread matches above, loose matches on Subject: below --
2018-10-30 14:43 [PATCH v3 " Nikolay Borisov
2018-10-30 14:43 ` [PATCH 1/6] btrfs: Introduce support for FSID change without metadata rewrite Nikolay Borisov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).