[PATCH v3 0/7] Chunk level degradable check


* [PATCH v3 0/7] Chunk level degradable check
@ 2017-03-08  2:41 Qu Wenruo
  2017-03-08  2:41 ` [PATCH v3 1/7] btrfs: Introduce a function to check if all chunks a OK for degraded rw mount Qu Wenruo
                   ` (11 more replies)
  0 siblings, 12 replies; 18+ messages in thread
From: Qu Wenruo @ 2017-03-08  2:41 UTC (permalink / raw)
  To: linux-btrfs, anand.jain, kilobyte, demfloro

Btrfs currently uses num_tolerated_disk_barrier_failures to do a global
check for the number of tolerated missing devices.

Although this one-size-fits-all solution is quite safe, it's too strict
if data and metadata have different duplication levels.

For example, if one uses single data and RAID1 metadata on 2 disks, any
missing device will make the fs unable to be mounted degraded.

But in fact, sometimes all single chunks are on the remaining device,
and in that case we should allow the fs to be mounted degraded rw.

Such a case can be easily reproduced using the following script:
 # mkfs.btrfs -f -m raid1 -d single /dev/sdb /dev/sdc
 # wipefs -a /dev/sdc
 # mount /dev/sdb /mnt/btrfs -o degraded,rw

If using btrfs-debug-tree to check /dev/sdb, one should find that the
data chunk is only in sdb, so in fact degraded mount should be allowed.

This patchset introduces a new per-chunk degradable check for btrfs,
which allows the above case to succeed, and it's quite small anyway.
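
To make the difference concrete, here is a minimal userspace sketch (not
kernel code) comparing the old fs-wide check with a per-chunk check. The
struct, the function names and the example layout are hypothetical; the
layout assumes one single data chunk on sdb plus RAID1 metadata and system
chunks on both devices.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_STRIPES 4

/* Hypothetical, simplified stand-in for a btrfs chunk. */
struct chunk {
	int tolerated;			/* missing devices the profile survives */
	int num_stripes;
	int stripe_dev[MAX_STRIPES];	/* device index of each stripe */
};

/* Old behaviour: one fs-wide limit, the minimum over all chunks. */
bool global_check(const struct chunk *chunks, int nr_chunks,
		  const bool *missing, int nr_devs)
{
	int min_tolerated = nr_devs;
	int nr_missing = 0;
	int i;

	for (i = 0; i < nr_chunks; i++)
		if (chunks[i].tolerated < min_tolerated)
			min_tolerated = chunks[i].tolerated;
	for (i = 0; i < nr_devs; i++)
		if (missing[i])
			nr_missing++;
	return nr_missing <= min_tolerated;
}

/* New behaviour: every chunk is judged against its own profile. */
bool per_chunk_check(const struct chunk *chunks, int nr_chunks,
		     const bool *missing, int nr_devs)
{
	int i, j;

	for (i = 0; i < nr_chunks; i++) {
		int nr_missing = 0;

		for (j = 0; j < chunks[i].num_stripes; j++) {
			int dev = chunks[i].stripe_dev[j];

			assert(dev >= 0 && dev < nr_devs);
			if (missing[dev])
				nr_missing++;
		}
		if (nr_missing > chunks[i].tolerated)
			return false;
	}
	return true;
}

/* Assumed layout: device 0 plays sdb, device 1 plays the wiped sdc. */
const bool example_missing[2] = { false, true };
const struct chunk example_chunks[3] = {
	{ 0, 1, { 0 } },	/* single data chunk, only on device 0 */
	{ 1, 2, { 0, 1 } },	/* RAID1 metadata chunk */
	{ 1, 2, { 0, 1 } },	/* RAID1 system chunk */
};
```

With this layout, global_check() refuses the mount because the single
profile tolerates no missing device, while per_chunk_check() accepts it
because the only single chunk does not touch the missing device.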

It also enhances the kernel error message for missing devices, so that at
least the kernel log shows what made the mount fail, rather than the
meaningless "failed to read system chunk/chunk tree -5".

v2:
  Update after almost 2 years.
  Add the last patch to enhance the kernel output, so the user can know
  that it's missing devices that prevent btrfs from mounting.
v3:
  Remove one duplicated missing device output.
  Follow the advice from Anand Jain: do not add new members to btrfs_device,
  but use a new structure, extra_rw_degrade_errors, to record errors when
  sending down barriers to or waiting on a device.

Sorry Dmitrii Tcvetkov and Adam Borowski, I'm afraid I can't add your
tested-by tags in v3, as the 4th and 5th patches have changed quite a lot,
so you may need to retest the new patchset.
Sorry for the trouble.

Qu Wenruo (7):
  btrfs: Introduce a function to check if all chunks a OK for degraded
    rw mount
  btrfs: Do chunk level rw degrade check at mount time
  btrfs: Do chunk level degradation check for remount
  btrfs: Introduce extra_rw_degrade_errors parameter for
    btrfs_check_rw_degradable
  btrfs: Allow barrier_all_devices to do chunk level device check
  btrfs: Cleanup num_tolerated_disk_barrier_failures
  btrfs: Enhance missing device kernel message

 fs/btrfs/ctree.h   |   2 -
 fs/btrfs/disk-io.c |  87 ++++++------------------------
 fs/btrfs/disk-io.h |   2 -
 fs/btrfs/super.c   |   5 +-
 fs/btrfs/volumes.c | 156 ++++++++++++++++++++++++++++++++++++++++++++---------
 fs/btrfs/volumes.h |  37 +++++++++++++
 6 files changed, 188 insertions(+), 101 deletions(-)

-- 
2.12.0


* [PATCH v3 1/7] btrfs: Introduce a function to check if all chunks a OK for degraded rw mount
  2017-03-08  2:41 [PATCH v3 0/7] Chunk level degradable check Qu Wenruo
@ 2017-03-08  2:41 ` Qu Wenruo
  2017-03-08  2:41 ` [PATCH v3 2/7] btrfs: Do chunk level rw degrade check at mount time Qu Wenruo
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Qu Wenruo @ 2017-03-08  2:41 UTC (permalink / raw)
  To: linux-btrfs, anand.jain, kilobyte, demfloro

Introduce a new function, btrfs_check_rw_degradable(), to check if all
chunks in btrfs are OK for degraded rw mount.

It provides the new basis for accurate btrfs mount/remount and even
runtime degraded mount checks, replacing the old one-size-fits-all method.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
---
 fs/btrfs/volumes.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/volumes.h |  1 +
 2 files changed, 54 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 73d56eef5e60..3fb760cd5bad 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6765,6 +6765,59 @@ int btrfs_read_sys_array(struct btrfs_fs_info *fs_info)
 	return -EIO;
 }
 
+/*
+ * Check if all chunks in the fs is OK for read-write degraded mount
+ *
+ * Return true if the fs is OK to be mounted degraded read-write
+ * Return false if the fs is not OK to be mounted degraded
+ */
+bool btrfs_check_rw_degradable(struct btrfs_fs_info *fs_info)
+{
+	struct btrfs_mapping_tree *map_tree = &fs_info->mapping_tree;
+	struct extent_map *em;
+	u64 next_start = 0;
+	bool ret = true;
+
+	read_lock(&map_tree->map_tree.lock);
+	em = lookup_extent_mapping(&map_tree->map_tree, 0, (u64)-1);
+	/* No chunk at all? Return false anyway */
+	if (!em) {
+		ret = false;
+		goto out;
+	}
+	while (em) {
+		struct map_lookup *map;
+		int missing = 0;
+		int max_tolerated;
+		int i;
+
+		map = (struct map_lookup *) em->bdev;
+		max_tolerated =
+			btrfs_get_num_tolerated_disk_barrier_failures(
+					map->type);
+		for (i = 0; i < map->num_stripes; i++) {
+			if (map->stripes[i].dev->missing)
+				missing++;
+		}
+		if (missing > max_tolerated) {
+			ret = false;
+			btrfs_warn(fs_info,
+	"chunk %llu missing %d devices, max tolerance is %d for writeable mount",
+				   em->start, missing, max_tolerated);
+			free_extent_map(em);
+			goto out;
+		}
+		next_start = extent_map_end(em);
+		free_extent_map(em);
+
+		em = lookup_extent_mapping(&map_tree->map_tree, next_start,
+					   (u64)(-1) - next_start);
+	}
+out:
+	read_unlock(&map_tree->map_tree.lock);
+	return ret;
+}
+
 int btrfs_read_chunk_tree(struct btrfs_fs_info *fs_info)
 {
 	struct btrfs_root *root = fs_info->chunk_root;
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 59be81206dd7..db1b5ef479cf 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -538,4 +538,5 @@ struct list_head *btrfs_get_fs_uuids(void);
 void btrfs_set_fs_info_ptr(struct btrfs_fs_info *fs_info);
 void btrfs_reset_fs_info_ptr(struct btrfs_fs_info *fs_info);
 
+bool btrfs_check_rw_degradable(struct btrfs_fs_info *fs_info);
 #endif
-- 
2.12.0
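
The loop in btrfs_check_rw_degradable() walks every chunk by looking up the
first mapping in a range and then restarting the lookup at the end of the
mapping just handled. Below is a minimal userspace sketch of that iteration
pattern. The struct and helpers are hypothetical, and lookup_range() is
assumed to return the lowest range overlapping the query, which is a
simplification of what lookup_extent_mapping() does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a chunk mapping: covers [start, start + len). */
struct range {
	uint64_t start;
	uint64_t len;
};

static uint64_t range_end(const struct range *r)
{
	return r->start + r->len;
}

/*
 * Return the lowest range overlapping [start, start + len), or NULL.
 * "ranges" must be sorted by start and must not overlap each other.
 */
const struct range *lookup_range(const struct range *ranges, size_t nr,
				 uint64_t start, uint64_t len)
{
	size_t i;

	if (len == 0)
		return NULL;
	for (i = 0; i < nr; i++) {
		/* Entirely before the query: keep searching. */
		if (range_end(&ranges[i]) <= start)
			continue;
		/* Entirely after the query: nothing later can match. */
		if (ranges[i].start >= start && ranges[i].start - start >= len)
			return NULL;
		return &ranges[i];
	}
	return NULL;
}

/* Visit every range once; return how many were seen and their total size. */
size_t walk_all_ranges(const struct range *ranges, size_t nr,
		       uint64_t *total_len)
{
	const struct range *r;
	uint64_t next_start;
	size_t visited = 0;

	*total_len = 0;
	r = lookup_range(ranges, nr, 0, UINT64_MAX);
	while (r) {
		/* A zero-length range would make the walk loop forever. */
		assert(range_end(r) > r->start);
		visited++;
		*total_len += r->len;

		/* Resume right after the range just handled. */
		next_start = range_end(r);
		r = lookup_range(ranges, nr, next_start,
				 UINT64_MAX - next_start);
	}
	return visited;
}
```

Shrinking the length to the remaining address space on each lookup, as the
patch does with (u64)(-1) - next_start, keeps start + len from overflowing.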





* [PATCH v3 2/7] btrfs: Do chunk level rw degrade check at mount time
  2017-03-08  2:41 [PATCH v3 0/7] Chunk level degradable check Qu Wenruo
  2017-03-08  2:41 ` [PATCH v3 1/7] btrfs: Introduce a function to check if all chunks a OK for degraded rw mount Qu Wenruo
@ 2017-03-08  2:41 ` Qu Wenruo
  2017-03-08  2:41 ` [PATCH v3 3/7] btrfs: Do chunk level degradation check for remount Qu Wenruo
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Qu Wenruo @ 2017-03-08  2:41 UTC (permalink / raw)
  To: linux-btrfs, anand.jain, kilobyte, demfloro

Now use btrfs_check_rw_degradable() to do the mount time degradation check.

With this patch, we can now mount in the following case:
 # mkfs.btrfs -f -m raid1 -d single /dev/sdb /dev/sdc
 # wipefs -a /dev/sdc
 # mount /dev/sdb /mnt/btrfs -o degraded
 As the single data chunk is only in sdb, it's OK to mount as
 degraded, since missing one device is OK for RAID1.

But the mount still fails in the following case, as expected:
 # mkfs.btrfs -f -m raid1 -d single /dev/sdb /dev/sdc
 # wipefs -a /dev/sdb
 # mount /dev/sdc /mnt/btrfs -o degraded
 As the data chunk is only in sdb, it's not OK to mount it as
 degraded.

Reported-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Reported-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/disk-io.c | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 08b74daf35d0..3de89283d400 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3057,15 +3057,10 @@ int open_ctree(struct super_block *sb,
 		btrfs_err(fs_info, "failed to read block groups: %d", ret);
 		goto fail_sysfs;
 	}
-	fs_info->num_tolerated_disk_barrier_failures =
-		btrfs_calc_num_tolerated_disk_barrier_failures(fs_info);
-	if (fs_info->fs_devices->missing_devices >
-	     fs_info->num_tolerated_disk_barrier_failures &&
-	    !(sb->s_flags & MS_RDONLY)) {
+
+	if (!(sb->s_flags & MS_RDONLY) && !btrfs_check_rw_degradable(fs_info)) {
 		btrfs_warn(fs_info,
-"missing devices (%llu) exceeds the limit (%d), writeable mount is not allowed",
-			fs_info->fs_devices->missing_devices,
-			fs_info->num_tolerated_disk_barrier_failures);
+		"writeable mount is not allowed due to too many missing devices");
 		goto fail_sysfs;
 	}
 
-- 
2.12.0





* [PATCH v3 3/7] btrfs: Do chunk level degradation check for remount
  2017-03-08  2:41 [PATCH v3 0/7] Chunk level degradable check Qu Wenruo
  2017-03-08  2:41 ` [PATCH v3 1/7] btrfs: Introduce a function to check if all chunks a OK for degraded rw mount Qu Wenruo
  2017-03-08  2:41 ` [PATCH v3 2/7] btrfs: Do chunk level rw degrade check at mount time Qu Wenruo
@ 2017-03-08  2:41 ` Qu Wenruo
  2017-03-08  2:41 ` [PATCH v3 4/7] btrfs: Introduce extra_rw_degrade_errors parameter for btrfs_check_rw_degradable Qu Wenruo
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Qu Wenruo @ 2017-03-08  2:41 UTC (permalink / raw)
  To: linux-btrfs, anand.jain, kilobyte, demfloro

Just as with the mount time check, use btrfs_check_rw_degradable() to
check if we are OK to be remounted rw.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/super.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index da687dc79cce..1f5772501c92 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1784,9 +1784,8 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data)
 			goto restore;
 		}
 
-		if (fs_info->fs_devices->missing_devices >
-		     fs_info->num_tolerated_disk_barrier_failures &&
-		    !(*flags & MS_RDONLY)) {
+		if (!(*flags & MS_RDONLY) &&
+		    !btrfs_check_rw_degradable(fs_info)) {
 			btrfs_warn(fs_info,
 				"too many missing devices, writeable remount is not allowed");
 			ret = -EACCES;
-- 
2.12.0





* [PATCH v3 4/7] btrfs: Introduce extra_rw_degrade_errors parameter for btrfs_check_rw_degradable
  2017-03-08  2:41 [PATCH v3 0/7] Chunk level degradable check Qu Wenruo
                   ` (2 preceding siblings ...)
  2017-03-08  2:41 ` [PATCH v3 3/7] btrfs: Do chunk level degradation check for remount Qu Wenruo
@ 2017-03-08  2:41 ` Qu Wenruo
  2017-03-08  2:41 ` [PATCH v3 5/7] btrfs: Allow barrier_all_devices to do chunk level device check Qu Wenruo
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Qu Wenruo @ 2017-03-08  2:41 UTC (permalink / raw)
  To: linux-btrfs, anand.jain, kilobyte, demfloro

Introduce a new structure, extra_rw_degrade_errors, to record
devid<->error mapping.

This structure has an array to record runtime errors which affect
degraded mount, like failure to flush or wait on one device.

Also allow btrfs_check_rw_degradable() to accept such a structure as
another error source in addition to btrfs_device->missing.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/disk-io.c |  3 ++-
 fs/btrfs/super.c   |  2 +-
 fs/btrfs/volumes.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
 fs/btrfs/volumes.h | 36 ++++++++++++++++++++++++++++-
 4 files changed, 102 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 3de89283d400..658b8fab1d39 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3058,7 +3058,8 @@ int open_ctree(struct super_block *sb,
 		goto fail_sysfs;
 	}
 
-	if (!(sb->s_flags & MS_RDONLY) && !btrfs_check_rw_degradable(fs_info)) {
+	if (!(sb->s_flags & MS_RDONLY) &&
+	    !btrfs_check_rw_degradable(fs_info, NULL)) {
 		btrfs_warn(fs_info,
 		"writeable mount is not allowed due to too many missing devices");
 		goto fail_sysfs;
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 1f5772501c92..06bd9b332e18 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1785,7 +1785,7 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data)
 		}
 
 		if (!(*flags & MS_RDONLY) &&
-		    !btrfs_check_rw_degradable(fs_info)) {
+		    !btrfs_check_rw_degradable(fs_info, NULL)) {
 			btrfs_warn(fs_info,
 				"too many missing devices, writeable remount is not allowed");
 			ret = -EACCES;
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 3fb760cd5bad..f44f7f428848 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6765,13 +6765,72 @@ int btrfs_read_sys_array(struct btrfs_fs_info *fs_info)
 	return -EIO;
 }
 
+void record_extra_rw_degrade_error(struct extra_rw_degrade_errors *errors,
+				   u64 devid)
+{
+	int i;
+	bool inserted = false;
+
+	if (!errors)
+		return;
+
+	spin_lock(&errors->lock);
+	for (i = 0; i < errors->nr_devs; i++) {
+		struct rw_degrade_error *error = &errors->errors[i];
+
+		if (!error->initialized) {
+			error->devid = devid;
+			error->initialized = true;
+			error->err = true;
+			inserted = true;
+			break;
+		}
+		if (error->devid == devid) {
+			error->err = true;
+			inserted = true;
+			break;
+		}
+	}
+	spin_unlock(&errors->lock);
+	/*
+	 * We iterate all the error records but still found no empty slot
+	 * This means errors->nr_devs is not correct.
+	 */
+	WARN_ON(!inserted);
+}
+
+static bool device_has_rw_degrade_error(struct extra_rw_degrade_errors *errors,
+					u64 devid)
+{
+	int i;
+	bool ret = false;
+
+	if (!errors)
+		return ret;
+
+	spin_lock(&errors->lock);
+	for (i = 0; i < errors->nr_devs; i++) {
+		struct rw_degrade_error *error = &errors->errors[i];
+
+		if (!error->initialized)
+			break;
+		if (error->devid == devid) {
+			ret = true;
+			break;
+		}
+	}
+	spin_unlock(&errors->lock);
+	return ret;
+}
+
 /*
  * Check if all chunks in the fs is OK for read-write degraded mount
  *
  * Return true if the fs is OK to be mounted degraded read-write
  * Return false if the fs is not OK to be mounted degraded
  */
-bool btrfs_check_rw_degradable(struct btrfs_fs_info *fs_info)
+bool btrfs_check_rw_degradable(struct btrfs_fs_info *fs_info,
+			       struct extra_rw_degrade_errors *errors)
 {
 	struct btrfs_mapping_tree *map_tree = &fs_info->mapping_tree;
 	struct extent_map *em;
@@ -6796,7 +6855,10 @@ bool btrfs_check_rw_degradable(struct btrfs_fs_info *fs_info)
 			btrfs_get_num_tolerated_disk_barrier_failures(
 					map->type);
 		for (i = 0; i < map->num_stripes; i++) {
-			if (map->stripes[i].dev->missing)
+			struct btrfs_device *device = map->stripes[i].dev;
+
+			if (device->missing ||
+			    device_has_rw_degrade_error(errors, device->devid))
 				missing++;
 		}
 		if (missing > max_tolerated) {
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index db1b5ef479cf..67d7474e42a3 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -538,5 +538,39 @@ struct list_head *btrfs_get_fs_uuids(void);
 void btrfs_set_fs_info_ptr(struct btrfs_fs_info *fs_info);
 void btrfs_reset_fs_info_ptr(struct btrfs_fs_info *fs_info);
 
-bool btrfs_check_rw_degradable(struct btrfs_fs_info *fs_info);
+/*
+ * For btrfs_check_rw_degradable() to check extra error from
+ * barrier_all_devices()
+ */
+struct rw_degrade_error {
+	u64 devid;
+	bool initialized;
+	bool err;
+};
+
+struct extra_rw_degrade_errors {
+	int nr_devs;
+	spinlock_t lock;
+	struct rw_degrade_error errors[];
+};
+
+static inline struct extra_rw_degrade_errors *alloc_extra_rw_degrade_errors(
+		int nr_devs)
+{
+	struct extra_rw_degrade_errors *ret;
+
+	ret = kzalloc(sizeof(struct extra_rw_degrade_errors) + nr_devs *
+		      sizeof(struct rw_degrade_error), GFP_NOFS);
+	if (!ret)
+		return ret;
+	spin_lock_init(&ret->lock);
+	ret->nr_devs = nr_devs;
+	return ret;
+}
+
+void record_extra_rw_degrade_error(struct extra_rw_degrade_errors *errors,
+				   u64 devid);
+
+bool btrfs_check_rw_degradable(struct btrfs_fs_info *fs_info,
+			       struct extra_rw_degrade_errors *errors);
 #endif
-- 
2.12.0
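
The record/lookup pair added in this patch can be sketched in userspace
roughly as follows. This is only an illustration: the names are
hypothetical, calloc() stands in for kzalloc(), and the spinlock is left
out, so the sketch is not safe for concurrent use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

struct dev_error {
	uint64_t devid;
	bool initialized;
};

struct dev_errors {
	int nr_devs;
	struct dev_error errors[];	/* flexible array, one slot per device */
};

struct dev_errors *alloc_dev_errors(int nr_devs)
{
	struct dev_errors *ret;

	assert(nr_devs >= 0);
	ret = calloc(1, sizeof(*ret) +
			(size_t)nr_devs * sizeof(ret->errors[0]));
	if (ret)
		ret->nr_devs = nr_devs;
	return ret;
}

/* Record an error for devid; return false if nothing could be recorded. */
bool record_dev_error(struct dev_errors *errors, uint64_t devid)
{
	int i;

	if (!errors)
		return false;
	for (i = 0; i < errors->nr_devs; i++) {
		struct dev_error *e = &errors->errors[i];

		if (!e->initialized) {
			/* First free slot: claim it for this device. */
			e->devid = devid;
			e->initialized = true;
			return true;
		}
		if (e->devid == devid)
			return true;	/* already recorded */
	}
	return false;			/* table is full */
}

bool dev_has_error(const struct dev_errors *errors, uint64_t devid)
{
	int i;

	if (!errors)
		return false;
	for (i = 0; i < errors->nr_devs; i++) {
		const struct dev_error *e = &errors->errors[i];

		if (!e->initialized)
			break;		/* slots fill front to back */
		if (e->devid == devid)
			return true;
	}
	return false;
}
```

Because a device is recorded at most once, a device that fails in several
places still counts as one failed device in the per-chunk check, and a NULL
table simply means "no extra errors", which is how the mount and remount
callers use it.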





* [PATCH v3 5/7] btrfs: Allow barrier_all_devices to do chunk level device check
  2017-03-08  2:41 [PATCH v3 0/7] Chunk level degradable check Qu Wenruo
                   ` (3 preceding siblings ...)
  2017-03-08  2:41 ` [PATCH v3 4/7] btrfs: Introduce extra_rw_degrade_errors parameter for btrfs_check_rw_degradable Qu Wenruo
@ 2017-03-08  2:41 ` Qu Wenruo
  2017-03-08  2:41 ` [PATCH v3 6/7] btrfs: Cleanup num_tolerated_disk_barrier_failures Qu Wenruo
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Qu Wenruo @ 2017-03-08  2:41 UTC (permalink / raw)
  To: linux-btrfs, anand.jain, kilobyte, demfloro

The last user of num_tolerated_disk_barrier_failures is
barrier_all_devices().
But it can easily be changed to use the new per-chunk degradable check
framework.

Now barrier_all_devices() allocates an extra_rw_degrade_errors structure
and records every send/wait error from write_dev_flush() in it.
With these records, btrfs_check_rw_degradable() can check if the
fs is still OK when the fs is committed to disk.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/disk-io.c | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 658b8fab1d39..549045a3e15f 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3570,17 +3570,20 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
 {
 	struct list_head *head;
 	struct btrfs_device *dev;
-	int errors_send = 0;
-	int errors_wait = 0;
+	struct extra_rw_degrade_errors *errors;
 	int ret;
 
+	errors = alloc_extra_rw_degrade_errors(info->fs_devices->num_devices);
+	if (!errors)
+		return -ENOMEM;
+
 	/* send down all the barriers */
 	head = &info->fs_devices->devices;
 	list_for_each_entry_rcu(dev, head, dev_list) {
 		if (dev->missing)
 			continue;
 		if (!dev->bdev) {
-			errors_send++;
+			record_extra_rw_degrade_error(errors, dev->devid);
 			continue;
 		}
 		if (!dev->in_fs_metadata || !dev->writeable)
@@ -3588,7 +3591,7 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
 
 		ret = write_dev_flush(dev, 0);
 		if (ret)
-			errors_send++;
+			record_extra_rw_degrade_error(errors, dev->devid);
 	}
 
 	/* wait for all the barriers */
@@ -3596,7 +3599,7 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
 		if (dev->missing)
 			continue;
 		if (!dev->bdev) {
-			errors_wait++;
+			record_extra_rw_degrade_error(errors, dev->devid);
 			continue;
 		}
 		if (!dev->in_fs_metadata || !dev->writeable)
@@ -3604,11 +3607,13 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
 
 		ret = write_dev_flush(dev, 1);
 		if (ret)
-			errors_wait++;
+			record_extra_rw_degrade_error(errors, dev->devid);
 	}
-	if (errors_send > info->num_tolerated_disk_barrier_failures ||
-	    errors_wait > info->num_tolerated_disk_barrier_failures)
+	if (!btrfs_check_rw_degradable(info, errors)) {
+		kfree(errors);
 		return -EIO;
+	}
+	kfree(errors);
 	return 0;
 }
 
-- 
2.12.0
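
As a rough model of the new barrier_all_devices() flow, the userspace
sketch below marks a device as failed in either the send or the wait phase
and then runs a per-chunk check. All names are hypothetical and the flush
results are plain flags instead of real I/O.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MAX_DEVS 8
#define MAX_STRIPES 4

/* Hypothetical device state; the *_fails flags simulate flush errors. */
struct dev {
	bool missing;
	bool send_fails;
	bool wait_fails;
};

struct chunk {
	int tolerated;			/* failed devices the profile survives */
	int num_stripes;
	int stripe_dev[MAX_STRIPES];	/* device index of each stripe */
};

int barrier_all(const struct dev *devs, int nr_devs,
		const struct chunk *chunks, int nr_chunks)
{
	bool errored[MAX_DEVS] = { false };
	int i, j;

	assert(nr_devs <= MAX_DEVS);

	/* Phase 1: send the flush to every present device. */
	for (i = 0; i < nr_devs; i++)
		if (!devs[i].missing && devs[i].send_fails)
			errored[i] = true;

	/* Phase 2: wait for completion; a device is only marked once. */
	for (i = 0; i < nr_devs; i++)
		if (!devs[i].missing && devs[i].wait_fails)
			errored[i] = true;

	/* Per-chunk check: missing and errored devices both count. */
	for (i = 0; i < nr_chunks; i++) {
		int bad = 0;

		for (j = 0; j < chunks[i].num_stripes; j++) {
			int d = chunks[i].stripe_dev[j];

			assert(d >= 0 && d < nr_devs);
			if (devs[d].missing || errored[d])
				bad++;
		}
		if (bad > chunks[i].tolerated)
			return -EIO;
	}
	return 0;
}
```

In this model, a RAID1 chunk whose first device fails the send phase and
whose second device fails the wait phase returns -EIO, whereas comparing
errors_send and errors_wait separately against a tolerance of 1 would have
let that case pass.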





* [PATCH v3 6/7] btrfs: Cleanup num_tolerated_disk_barrier_failures
  2017-03-08  2:41 [PATCH v3 0/7] Chunk level degradable check Qu Wenruo
                   ` (4 preceding siblings ...)
  2017-03-08  2:41 ` [PATCH v3 5/7] btrfs: Allow barrier_all_devices to do chunk level device check Qu Wenruo
@ 2017-03-08  2:41 ` Qu Wenruo
  2017-03-08  2:41 ` [PATCH v3 7/7] btrfs: Enhance missing device kernel message Qu Wenruo
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Qu Wenruo @ 2017-03-08  2:41 UTC (permalink / raw)
  To: linux-btrfs, anand.jain, kilobyte, demfloro

As we now use the per-chunk degradable check, the global
num_tolerated_disk_barrier_failures is of no use.

So clean it up.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/ctree.h   |  2 --
 fs/btrfs/disk-io.c | 54 ------------------------------------------------------
 fs/btrfs/disk-io.h |  2 --
 fs/btrfs/volumes.c | 17 -----------------
 4 files changed, 75 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 29b7fc28c607..d688025c1ef0 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1062,8 +1062,6 @@ struct btrfs_fs_info {
 	/* next backup root to be overwritten */
 	int backup_root_index;
 
-	int num_tolerated_disk_barrier_failures;
-
 	/* device replace state */
 	struct btrfs_dev_replace dev_replace;
 
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 549045a3e15f..affd7aada057 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3646,60 +3646,6 @@ int btrfs_get_num_tolerated_disk_barrier_failures(u64 flags)
 	return min_tolerated;
 }
 
-int btrfs_calc_num_tolerated_disk_barrier_failures(
-	struct btrfs_fs_info *fs_info)
-{
-	struct btrfs_ioctl_space_info space;
-	struct btrfs_space_info *sinfo;
-	u64 types[] = {BTRFS_BLOCK_GROUP_DATA,
-		       BTRFS_BLOCK_GROUP_SYSTEM,
-		       BTRFS_BLOCK_GROUP_METADATA,
-		       BTRFS_BLOCK_GROUP_DATA | BTRFS_BLOCK_GROUP_METADATA};
-	int i;
-	int c;
-	int num_tolerated_disk_barrier_failures =
-		(int)fs_info->fs_devices->num_devices;
-
-	for (i = 0; i < ARRAY_SIZE(types); i++) {
-		struct btrfs_space_info *tmp;
-
-		sinfo = NULL;
-		rcu_read_lock();
-		list_for_each_entry_rcu(tmp, &fs_info->space_info, list) {
-			if (tmp->flags == types[i]) {
-				sinfo = tmp;
-				break;
-			}
-		}
-		rcu_read_unlock();
-
-		if (!sinfo)
-			continue;
-
-		down_read(&sinfo->groups_sem);
-		for (c = 0; c < BTRFS_NR_RAID_TYPES; c++) {
-			u64 flags;
-
-			if (list_empty(&sinfo->block_groups[c]))
-				continue;
-
-			btrfs_get_block_group_info(&sinfo->block_groups[c],
-						   &space);
-			if (space.total_bytes == 0 || space.used_bytes == 0)
-				continue;
-			flags = space.flags;
-
-			num_tolerated_disk_barrier_failures = min(
-				num_tolerated_disk_barrier_failures,
-				btrfs_get_num_tolerated_disk_barrier_failures(
-					flags));
-		}
-		up_read(&sinfo->groups_sem);
-	}
-
-	return num_tolerated_disk_barrier_failures;
-}
-
 int write_all_supers(struct btrfs_fs_info *fs_info, int max_mirrors)
 {
 	struct list_head *head;
diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
index 2e0ec29bfd69..4522d2f11909 100644
--- a/fs/btrfs/disk-io.h
+++ b/fs/btrfs/disk-io.h
@@ -142,8 +142,6 @@ struct btrfs_root *btrfs_create_tree(struct btrfs_trans_handle *trans,
 int btree_lock_page_hook(struct page *page, void *data,
 				void (*flush_fn)(void *));
 int btrfs_get_num_tolerated_disk_barrier_failures(u64 flags);
-int btrfs_calc_num_tolerated_disk_barrier_failures(
-	struct btrfs_fs_info *fs_info);
 int __init btrfs_end_io_wq_init(void);
 void btrfs_end_io_wq_exit(void);
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index f44f7f428848..765d213ac5ef 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1973,9 +1973,6 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path,
 		free_fs_devices(cur_devices);
 	}
 
-	fs_info->num_tolerated_disk_barrier_failures =
-		btrfs_calc_num_tolerated_disk_barrier_failures(fs_info);
-
 out:
 	mutex_unlock(&uuid_mutex);
 	return ret;
@@ -2474,8 +2471,6 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
 				   "sysfs: failed to create fsid for sprout");
 	}
 
-	fs_info->num_tolerated_disk_barrier_failures =
-		btrfs_calc_num_tolerated_disk_barrier_failures(fs_info);
 	ret = btrfs_commit_transaction(trans);
 
 	if (seeding_dev) {
@@ -3858,13 +3853,6 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
 			   bctl->meta.target, bctl->data.target);
 	}
 
-	if (bctl->sys.flags & BTRFS_BALANCE_ARGS_CONVERT) {
-		fs_info->num_tolerated_disk_barrier_failures = min(
-			btrfs_calc_num_tolerated_disk_barrier_failures(fs_info),
-			btrfs_get_num_tolerated_disk_barrier_failures(
-				bctl->sys.target));
-	}
-
 	ret = insert_balance_item(fs_info, bctl);
 	if (ret && ret != -EEXIST)
 		goto out;
@@ -3887,11 +3875,6 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
 	mutex_lock(&fs_info->balance_mutex);
 	atomic_dec(&fs_info->balance_running);
 
-	if (bctl->sys.flags & BTRFS_BALANCE_ARGS_CONVERT) {
-		fs_info->num_tolerated_disk_barrier_failures =
-			btrfs_calc_num_tolerated_disk_barrier_failures(fs_info);
-	}
-
 	if (bargs) {
 		memset(bargs, 0, sizeof(*bargs));
 		update_ioctl_balance_args(fs_info, 0, bargs);
-- 
2.12.0





* [PATCH v3 7/7] btrfs: Enhance missing device kernel message
  2017-03-08  2:41 [PATCH v3 0/7] Chunk level degradable check Qu Wenruo
                   ` (5 preceding siblings ...)
  2017-03-08  2:41 ` [PATCH v3 6/7] btrfs: Cleanup num_tolerated_disk_barrier_failures Qu Wenruo
@ 2017-03-08  2:41 ` Qu Wenruo
  2017-03-08  5:26   ` Andrei Borzenkov
  2017-03-08  6:47 ` [PATCH v3 0/7] Chunk level degradable check Adam Borowski
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 18+ messages in thread
From: Qu Wenruo @ 2017-03-08  2:41 UTC (permalink / raw)
  To: linux-btrfs, anand.jain, kilobyte, demfloro

For a missing device, btrfs will just refuse to mount with an almost
meaningless kernel message like:

 BTRFS info (device vdb6): disk space caching is enabled
 BTRFS info (device vdb6): has skinny extents
 BTRFS error (device vdb6): failed to read the system array: -5
 BTRFS error (device vdb6): open_ctree failed

This patch adds extra missing device output, changing the result to:

 BTRFS info (device vdb6): disk space caching is enabled
 BTRFS info (device vdb6): has skinny extents
 BTRFS warning (device vdb6): devid 2 uuid 80470722-cad2-4b90-b7c3-fee294552f1b is missing
 BTRFS error (device vdb6): failed to read the system array: -5
 BTRFS error (device vdb6): open_ctree failed

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
---
 fs/btrfs/volumes.c | 24 +++++++++++++++++-------
 fs/btrfs/volumes.h |  2 ++
 2 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 765d213ac5ef..f2c878d5f714 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6442,6 +6442,7 @@ static int read_one_chunk(struct btrfs_fs_info *fs_info, struct btrfs_key *key,
 		if (!map->stripes[i].dev &&
 		    !btrfs_test_opt(fs_info, DEGRADED)) {
 			free_extent_map(em);
+			btrfs_report_missing_device(fs_info, devid, uuid);
 			return -EIO;
 		}
 		if (!map->stripes[i].dev) {
@@ -6452,8 +6453,7 @@ static int read_one_chunk(struct btrfs_fs_info *fs_info, struct btrfs_key *key,
 				free_extent_map(em);
 				return -EIO;
 			}
-			btrfs_warn(fs_info, "devid %llu uuid %pU is missing",
-				   devid, uuid);
+			btrfs_report_missing_device(fs_info, devid, uuid);
 		}
 		map->stripes[i].dev->in_fs_metadata = 1;
 	}
@@ -6570,17 +6570,21 @@ static int read_one_dev(struct btrfs_fs_info *fs_info,
 
 	device = btrfs_find_device(fs_info, devid, dev_uuid, fs_uuid);
 	if (!device) {
-		if (!btrfs_test_opt(fs_info, DEGRADED))
+		if (!btrfs_test_opt(fs_info, DEGRADED)) {
+			btrfs_report_missing_device(fs_info, devid, dev_uuid);
 			return -EIO;
+		}
 
 		device = add_missing_dev(fs_devices, devid, dev_uuid);
 		if (!device)
 			return -ENOMEM;
-		btrfs_warn(fs_info, "devid %llu uuid %pU missing",
-				devid, dev_uuid);
+		btrfs_report_missing_device(fs_info, devid, dev_uuid);
 	} else {
-		if (!device->bdev && !btrfs_test_opt(fs_info, DEGRADED))
-			return -EIO;
+		if (!device->bdev) {
+			btrfs_report_missing_device(fs_info, devid, dev_uuid);
+			if (!btrfs_test_opt(fs_info, DEGRADED))
+				return -EIO;
+		}
 
 		if(!device->bdev && !device->missing) {
 			/*
@@ -6806,6 +6810,12 @@ static bool device_has_rw_degrade_error(struct extra_rw_degrade_errors *errors,
 	return ret;
 }
 
+void btrfs_report_missing_device(struct btrfs_fs_info *fs_info, u64 devid,
+				 u8 *uuid)
+{
+	btrfs_warn_rl(fs_info, "devid %llu uuid %pU is missing", devid, uuid);
+}
+
 /*
  * Check if all chunks in the fs is OK for read-write degraded mount
  *
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 67d7474e42a3..1f6ab55640da 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -573,4 +573,6 @@ void record_extra_rw_degrade_error(struct extra_rw_degrade_errors *errors,
 
 bool btrfs_check_rw_degradable(struct btrfs_fs_info *fs_info,
 			       struct extra_rw_degrade_errors *errors);
+void btrfs_report_missing_device(struct btrfs_fs_info *fs_info, u64 devid,
+				 u8 *uuid);
 #endif
-- 
2.12.0
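
For readers who want to reproduce the new message format outside the
kernel: userspace printf() has no %pU, so the sketch below formats the 16
UUID bytes by hand in the same 8-4-4-4-12 lowercase hex layout. The helper
names are hypothetical and only illustrate the message text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define UUID_SIZE	16
#define UUID_STR_LEN	37	/* 36 characters plus the trailing NUL */

/* Format 16 raw UUID bytes as xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx. */
void format_uuid(const uint8_t uuid[UUID_SIZE], char out[UUID_STR_LEN])
{
	snprintf(out, UUID_STR_LEN,
		 "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
		 "%02x%02x%02x%02x%02x%02x",
		 uuid[0], uuid[1], uuid[2], uuid[3],
		 uuid[4], uuid[5], uuid[6], uuid[7],
		 uuid[8], uuid[9], uuid[10], uuid[11],
		 uuid[12], uuid[13], uuid[14], uuid[15]);
}

/* Build the "is missing" message; returns the length snprintf() reports. */
int format_missing_device(char *buf, size_t len, unsigned long long devid,
			  const uint8_t uuid[UUID_SIZE])
{
	char uuid_str[UUID_STR_LEN];

	assert(buf && uuid);
	format_uuid(uuid, uuid_str);
	return snprintf(buf, len, "devid %llu uuid %s is missing",
			devid, uuid_str);
}
```

Routing every call site through one helper, as the patch does with
btrfs_report_missing_device(), keeps the wording identical wherever the
missing device is detected.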





* Re: [PATCH v3 7/7] btrfs: Enhance missing device kernel message
  2017-03-08  2:41 ` [PATCH v3 7/7] btrfs: Enhance missing device kernel message Qu Wenruo
@ 2017-03-08  5:26   ` Andrei Borzenkov
  2017-03-08  5:43     ` Qu Wenruo
  0 siblings, 1 reply; 18+ messages in thread
From: Andrei Borzenkov @ 2017-03-08  5:26 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs, anand.jain, kilobyte, demfloro

08.03.2017 05:41, Qu Wenruo wrote:
> For missing device, btrfs will just refuse to mount with almost
> meaningless kernel message like:
> 
>  BTRFS info (device vdb6): disk space caching is enabled
>  BTRFS info (device vdb6): has skinny extents
>  BTRFS error (device vdb6): failed to read the system array: -5
>  BTRFS error (device vdb6): open_ctree failed
> 
> This patch will add extra device missing output, making the result to:
> 
>  BTRFS info (device vdb6): disk space caching is enabled
>  BTRFS info (device vdb6): has skinny extents
>  BTRFS warning (device vdb6): devid 2 uuid 80470722-cad2-4b90-b7c3-fee294552f1b is missing
>  BTRFS error (device vdb6): failed to read the system array: -5

Unfortunately it is still unclear that the failure to mount is caused by
the missing device. As you explained (and it is the whole reason for this
patch series), we are still able to mount even with missing device(s). It
should print an error (not a warning) saying that some extents are not
accessible due to missing device(s).

>  BTRFS error (device vdb6): open_ctree failed
> 
> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
> Reviewed-by: Anand Jain <anand.jain@oracle.com>
> ---
>  fs/btrfs/volumes.c | 24 +++++++++++++++++-------
>  fs/btrfs/volumes.h |  2 ++
>  2 files changed, 19 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 765d213ac5ef..f2c878d5f714 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -6442,6 +6442,7 @@ static int read_one_chunk(struct btrfs_fs_info *fs_info, struct btrfs_key *key,
>  		if (!map->stripes[i].dev &&
>  		    !btrfs_test_opt(fs_info, DEGRADED)) {
>  			free_extent_map(em);
> +			btrfs_report_missing_device(fs_info, devid, uuid);
>  			return -EIO;
>  		}
>  		if (!map->stripes[i].dev) {
> @@ -6452,8 +6453,7 @@ static int read_one_chunk(struct btrfs_fs_info *fs_info, struct btrfs_key *key,
>  				free_extent_map(em);
>  				return -EIO;
>  			}
> -			btrfs_warn(fs_info, "devid %llu uuid %pU is missing",
> -				   devid, uuid);
> +			btrfs_report_missing_device(fs_info, devid, uuid);
>  		}
>  		map->stripes[i].dev->in_fs_metadata = 1;
>  	}
> @@ -6570,17 +6570,21 @@ static int read_one_dev(struct btrfs_fs_info *fs_info,
>  
>  	device = btrfs_find_device(fs_info, devid, dev_uuid, fs_uuid);
>  	if (!device) {
> -		if (!btrfs_test_opt(fs_info, DEGRADED))
> +		if (!btrfs_test_opt(fs_info, DEGRADED)) {
> +			btrfs_report_missing_device(fs_info, devid, dev_uuid);
>  			return -EIO;
> +		}
>  
>  		device = add_missing_dev(fs_devices, devid, dev_uuid);
>  		if (!device)
>  			return -ENOMEM;
> -		btrfs_warn(fs_info, "devid %llu uuid %pU missing",
> -				devid, dev_uuid);
> +		btrfs_report_missing_device(fs_info, devid, dev_uuid);
>  	} else {
> -		if (!device->bdev && !btrfs_test_opt(fs_info, DEGRADED))
> -			return -EIO;
> +		if (!device->bdev) {
> +			btrfs_report_missing_device(fs_info, devid, dev_uuid);
> +			if (!btrfs_test_opt(fs_info, DEGRADED))
> +				return -EIO;
> +		}
>  
>  		if(!device->bdev && !device->missing) {
>  			/*
> @@ -6806,6 +6810,12 @@ static bool device_has_rw_degrade_error(struct extra_rw_degrade_errors *errors,
>  	return ret;
>  }
>  
> +void btrfs_report_missing_device(struct btrfs_fs_info *fs_info, u64 devid,
> +				 u8 *uuid)
> +{
> +	btrfs_warn_rl(fs_info, "devid %llu uuid %pU is missing", devid, uuid);
> +}
> +
>  /*
>   * Check if all chunks in the fs is OK for read-write degraded mount
>   *
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index 67d7474e42a3..1f6ab55640da 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -573,4 +573,6 @@ void record_extra_rw_degrade_error(struct extra_rw_degrade_errors *errors,
>  
>  bool btrfs_check_rw_degradable(struct btrfs_fs_info *fs_info,
>  			       struct extra_rw_degrade_errors *errors);
> +void btrfs_report_missing_device(struct btrfs_fs_info *fs_info, u64 devid,
> +				 u8 *uuid);
>  #endif
> 



* Re: [PATCH v3 7/7] btrfs: Enhance missing device kernel message
  2017-03-08  5:26   ` Andrei Borzenkov
@ 2017-03-08  5:43     ` Qu Wenruo
  0 siblings, 0 replies; 18+ messages in thread
From: Qu Wenruo @ 2017-03-08  5:43 UTC (permalink / raw)
  To: Andrei Borzenkov, linux-btrfs, anand.jain, kilobyte, demfloro



At 03/08/2017 01:26 PM, Andrei Borzenkov wrote:
> 08.03.2017 05:41, Qu Wenruo wrote:
>> For missing device, btrfs will just refuse to mount with almost
>> meaningless kernel message like:
>>
>>  BTRFS info (device vdb6): disk space caching is enabled
>>  BTRFS info (device vdb6): has skinny extents
>>  BTRFS error (device vdb6): failed to read the system array: -5
>>  BTRFS error (device vdb6): open_ctree failed
>>
>> This patch will add extra device missing output, making the result to:
>>
>>  BTRFS info (device vdb6): disk space caching is enabled
>>  BTRFS info (device vdb6): has skinny extents
>>  BTRFS warning (device vdb6): devid 2 uuid 80470722-cad2-4b90-b7c3-fee294552f1b is missing
>>  BTRFS error (device vdb6): failed to read the system array: -5
>
> Unfortunately it is still unclear that failure to mount is caused by
> missing device. As you explained (and the whole reason of this patch
> series) we still are able to mount even with missing device(s). It
> should print error (not warning) telling that some extents are not
> accessible due to missing device(s)

If you specify -o degraded so that read_chunk_tree can progress further,
the kernel log gives the detailed reason:

  BTRFS info (device vdb7): allowing degraded mounts
  BTRFS info (device vdb7): disk space caching is enabled
  BTRFS info (device vdb7): has skinny extents
  BTRFS info (device vdb7): flagging fs with big metadata feature
  BTRFS warning (device vdb7): devid 1 uuid 21cdf493-9d75-4a78-a8ee-d7426521a9f1 is missing
  BTRFS warning (device vdb7): chunk 12582912 missing 1 devices, max tolerance is 0 for writeble mount
  BTRFS warning (device vdb7): writeable mount is not allowed due to too many missing devices
  BTRFS error (device vdb7): open_ctree failed

Otherwise, normal mount doesn't allow any missing device and will just 
return -EIO from btrfs_read_chunk_tree(), preventing us from going to 
btrfs_check_rw_degradable().
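
As an illustration of the per-chunk idea behind that check, here is a
minimal user-space sketch. It is not the kernel code: struct chunk_sketch
and fs_rw_degradable_sketch() are hypothetical names, and the per-chunk
tolerance is simply supplied by the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of one chunk; not the kernel's structures. */
struct chunk_sketch {
	unsigned long long start;	/* logical start of the chunk */
	int missing_devices;		/* stripes whose device is absent */
	int max_tolerated;		/* e.g. 0 for single, 1 for RAID1 */
};

/*
 * Return true if every chunk misses no more devices than its own profile
 * tolerates. On failure, *bad (if non-NULL) points at the first offender.
 */
bool fs_rw_degradable_sketch(const struct chunk_sketch *chunks, size_t nr,
			     const struct chunk_sketch **bad)
{
	assert(chunks != NULL || nr == 0);

	for (size_t i = 0; i < nr; i++) {
		if (chunks[i].missing_devices > chunks[i].max_tolerated) {
			if (bad)
				*bad = &chunks[i];
			return false;
		}
	}
	return true;
}
```

With a global tolerance, a single-profile chunk anywhere in the fs would
force max_tolerated to 0 for everything; checking chunk by chunk only
rejects the mount when a chunk that actually lost a device cannot afford it.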

I'll add more detailed output explaining why the mount fails in another
patchset.

Thanks,
Qu

>
>>  BTRFS error (device vdb6): open_ctree failed
>>
>> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
>> Reviewed-by: Anand Jain <anand.jain@oracle.com>
>> ---
>>  fs/btrfs/volumes.c | 24 +++++++++++++++++-------
>>  fs/btrfs/volumes.h |  2 ++
>>  2 files changed, 19 insertions(+), 7 deletions(-)
>>
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index 765d213ac5ef..f2c878d5f714 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -6442,6 +6442,7 @@ static int read_one_chunk(struct btrfs_fs_info *fs_info, struct btrfs_key *key,
>>  		if (!map->stripes[i].dev &&
>>  		    !btrfs_test_opt(fs_info, DEGRADED)) {
>>  			free_extent_map(em);
>> +			btrfs_report_missing_device(fs_info, devid, uuid);
>>  			return -EIO;
>>  		}
>>  		if (!map->stripes[i].dev) {
>> @@ -6452,8 +6453,7 @@ static int read_one_chunk(struct btrfs_fs_info *fs_info, struct btrfs_key *key,
>>  				free_extent_map(em);
>>  				return -EIO;
>>  			}
>> -			btrfs_warn(fs_info, "devid %llu uuid %pU is missing",
>> -				   devid, uuid);
>> +			btrfs_report_missing_device(fs_info, devid, uuid);
>>  		}
>>  		map->stripes[i].dev->in_fs_metadata = 1;
>>  	}
>> @@ -6570,17 +6570,21 @@ static int read_one_dev(struct btrfs_fs_info *fs_info,
>>
>>  	device = btrfs_find_device(fs_info, devid, dev_uuid, fs_uuid);
>>  	if (!device) {
>> -		if (!btrfs_test_opt(fs_info, DEGRADED))
>> +		if (!btrfs_test_opt(fs_info, DEGRADED)) {
>> +			btrfs_report_missing_device(fs_info, devid, dev_uuid);
>>  			return -EIO;
>> +		}
>>
>>  		device = add_missing_dev(fs_devices, devid, dev_uuid);
>>  		if (!device)
>>  			return -ENOMEM;
>> -		btrfs_warn(fs_info, "devid %llu uuid %pU missing",
>> -				devid, dev_uuid);
>> +		btrfs_report_missing_device(fs_info, devid, dev_uuid);
>>  	} else {
>> -		if (!device->bdev && !btrfs_test_opt(fs_info, DEGRADED))
>> -			return -EIO;
>> +		if (!device->bdev) {
>> +			btrfs_report_missing_device(fs_info, devid, dev_uuid);
>> +			if (!btrfs_test_opt(fs_info, DEGRADED))
>> +				return -EIO;
>> +		}
>>
>>  		if(!device->bdev && !device->missing) {
>>  			/*
>> @@ -6806,6 +6810,12 @@ static bool device_has_rw_degrade_error(struct extra_rw_degrade_errors *errors,
>>  	return ret;
>>  }
>>
>> +void btrfs_report_missing_device(struct btrfs_fs_info *fs_info, u64 devid,
>> +				 u8 *uuid)
>> +{
>> +	btrfs_warn_rl(fs_info, "devid %llu uuid %pU is missing", devid, uuid);
>> +}
>> +
>>  /*
>>   * Check if all chunks in the fs is OK for read-write degraded mount
>>   *
>> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
>> index 67d7474e42a3..1f6ab55640da 100644
>> --- a/fs/btrfs/volumes.h
>> +++ b/fs/btrfs/volumes.h
>> @@ -573,4 +573,6 @@ void record_extra_rw_degrade_error(struct extra_rw_degrade_errors *errors,
>>
>>  bool btrfs_check_rw_degradable(struct btrfs_fs_info *fs_info,
>>  			       struct extra_rw_degrade_errors *errors);
>> +void btrfs_report_missing_device(struct btrfs_fs_info *fs_info, u64 devid,
>> +				 u8 *uuid);
>>  #endif
>>
>
>
>




* Re: [PATCH v3 0/7] Chunk level degradable check
  2017-03-08  2:41 [PATCH v3 0/7] Chunk level degradable check Qu Wenruo
                   ` (6 preceding siblings ...)
  2017-03-08  2:41 ` [PATCH v3 7/7] btrfs: Enhance missing device kernel message Qu Wenruo
@ 2017-03-08  6:47 ` Adam Borowski
  2017-03-08  7:39   ` Qu Wenruo
  2017-03-08  8:00 ` Dmitrii Tcvetkov
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 18+ messages in thread
From: Adam Borowski @ 2017-03-08  6:47 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, anand.jain, demfloro

On Wed, Mar 08, 2017 at 10:41:17AM +0800, Qu Wenruo wrote:
> This patchset will introduce a new per-chunk degradable check for
> btrfs, allow above case to succeed, and it's quite small anyway.

> v3:
>   Remove one duplicated missing device output
>   Use the advice from Anand Jain, not to add new members in btrfs_device,
>   but use a new structure extra_rw_degrade_errors, to record error when
>   sending down/waiting device.
> 
> Sorry Dmitrii Tcvetkov and Adam Borowski, I'm afraid I can't add your
> tested-by tags in v3, as the 4th and 4th patches have quite a big change,
> so you may need to retest the new patchset.

Well, testing stuff is why we're here :)

Re-tested, as far as your patch goes all is well.


Looks like I found an unrelated bug, though, that messed with my testing.
And it looks like a nasty one: once "btrfs dev scan" sees a disk, it stores
its device and will then happily use it without verification even if it's
been pulled out and replaced by something else.  Lemme investigate that
later today.

-- 
⢀⣴⠾⠻⢶⣦⠀ Meow!
⣾⠁⢠⠒⠀⣿⡁
⢿⡄⠘⠷⠚⠋⠀ Collisions shmolisions, let's see them find a collision or second
⠈⠳⣄⠀⠀⠀⠀ preimage for double rot13!


* Re: [PATCH v3 0/7] Chunk level degradable check
  2017-03-08  6:47 ` [PATCH v3 0/7] Chunk level degradable check Adam Borowski
@ 2017-03-08  7:39   ` Qu Wenruo
  2017-03-08 18:40     ` Anand Jain
  2017-03-08 19:01     ` Anand Jain
  0 siblings, 2 replies; 18+ messages in thread
From: Qu Wenruo @ 2017-03-08  7:39 UTC (permalink / raw)
  To: Adam Borowski; +Cc: linux-btrfs, anand.jain, demfloro



At 03/08/2017 02:47 PM, Adam Borowski wrote:
> On Wed, Mar 08, 2017 at 10:41:17AM +0800, Qu Wenruo wrote:
>> This patchset will introduce a new per-chunk degradable check for
>> btrfs, allow above case to succeed, and it's quite small anyway.
>
>> v3:
>>   Remove one duplicated missing device output
>>   Use the advice from Anand Jain, not to add new members in btrfs_device,
>>   but use a new structure extra_rw_degrade_errors, to record error when
>>   sending down/waiting device.
>>
>> Sorry Dmitrii Tcvetkov and Adam Borowski, I'm afraid I can't add your
>> tested-by tags in v3, as the 4th and 4th patches have quite a big change,
>> so you may need to retest the new patchset.
>
> Well, testing stuff is why we're here :)
>
> Re-tested, as far as your patch goes all is well.
>
>
> Looks like I found an unrelated bug, though, that messed with my testing.
> And it looks like a nasty one: once "btrfs dev scan" sees a disk, it stores
> its device and will then happily use it without verification even if it's
> been pulled out and replaced by something else.  Lemme investigate that
> later today.

It would be nice if you could send detailed info to the mailing list.

AFAIK, btrfs checks the superblock for devid and device UUID, to
ensure that it is the device we're using.
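
A minimal sketch of that kind of identity check, assuming only a devid
and a 16-byte device UUID are compared. The struct and function names
are hypothetical; the real superblock carries many more fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define UUID_SIZE_SKETCH 16

/* Hypothetical identity record; not the on-disk superblock layout. */
struct dev_identity_sketch {
	unsigned long long devid;
	unsigned char uuid[UUID_SIZE_SKETCH];
};

/* True only when both the devid and the device UUID match. */
bool same_device_sketch(const struct dev_identity_sketch *expected,
			const struct dev_identity_sketch *found)
{
	assert(expected != NULL && found != NULL);

	return expected->devid == found->devid &&
	       memcmp(expected->uuid, found->uuid, UUID_SIZE_SKETCH) == 0;
}
```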

Thanks,
Qu




* Re: [PATCH v3 0/7] Chunk level degradable check
  2017-03-08  2:41 [PATCH v3 0/7] Chunk level degradable check Qu Wenruo
                   ` (7 preceding siblings ...)
  2017-03-08  6:47 ` [PATCH v3 0/7] Chunk level degradable check Adam Borowski
@ 2017-03-08  8:00 ` Dmitrii Tcvetkov
  2017-03-08 12:25 ` Austin S. Hemmelgarn
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Dmitrii Tcvetkov @ 2017-03-08  8:00 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

On Wed, 8 Mar 2017 10:41:17 +0800
Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
> This patchset will introduce a new per-chunk degradable check for
> btrfs, allow above case to succeed, and it's quite small anyway.

> v2:
>   Update after almost 2 years.
>   Add the last patch to enhance the kernel output, so user can know
>   it's missing devices prevent btrfs to mount.
> v3:
>   Remove one duplicated missing device output
>   Use the advice from Anand Jain, not to add new members in
> btrfs_device, but use a new structure extra_rw_degrade_errors, to
> record error when sending down/waiting device.

Tested raid1/raid10 cases for losing 1 or more devices: the behaviour of
the patchset with regard to allowing degraded mount is still correct.


* Re: [PATCH v3 0/7] Chunk level degradable check
  2017-03-08  2:41 [PATCH v3 0/7] Chunk level degradable check Qu Wenruo
                   ` (8 preceding siblings ...)
  2017-03-08  8:00 ` Dmitrii Tcvetkov
@ 2017-03-08 12:25 ` Austin S. Hemmelgarn
  2017-03-08 18:31 ` Anand Jain
  2017-03-08 21:08 ` Goffredo Baroncelli
  11 siblings, 0 replies; 18+ messages in thread
From: Austin S. Hemmelgarn @ 2017-03-08 12:25 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs, anand.jain, kilobyte, demfloro

On 2017-03-07 21:41, Qu Wenruo wrote:
> Btrfs currently uses num_tolerated_disk_barrier_failures to do global
> check for tolerated missing device.
>
> Although the one-size-fit-all solution is quite safe, it's too strict
> if data and metadata has different duplication level.
>
> For example, if one use Single data and RAID1 metadata for 2 disks, it
> means any missing device will make the fs unable to be degraded
> mounted.
>
> But in fact, some times all single chunks may be in the existing
> device and in that case, we should allow it to be rw degraded mounted.
>
> Such case can be easily reproduced using the following script:
>  # mkfs.btrfs -f -m raid1 -d sing /dev/sdb /dev/sdc
>  # wipefs -f /dev/sdc
>  # mount /dev/sdb -o degraded,rw
>
> If using btrfs-debug-tree to check /dev/sdb, one should find that the
> data chunk is only in sdb, so in fact it should allow degraded mount.
>
> This patchset will introduce a new per-chunk degradable check for
> btrfs, allow above case to succeed, and it's quite small anyway.
>
> And enhance kernel error message for missing device, at least kernel
> can know what's making mount failed, other than meaningless
> "failed to read system chunk/chunk tree -5".
>
> v2:
>   Update after almost 2 years.
>   Add the last patch to enhance the kernel output, so user can know
>   it's missing devices prevent btrfs to mount.
> v3:
>   Remove one duplicated missing device output
>   Use the advice from Anand Jain, not to add new members in btrfs_device,
>   but use a new structure extra_rw_degrade_errors, to record error when
>   sending down/waiting device.
>
> Sorry Dmitrii Tcvetkov and Adam Borowski, I'm afraid I can't add your
> tested-by tags in v3, as the 4th and 4th patches have quite a big change,
> so you may need to retest the new patchset.
> Sorry for the trouble.
>
> Qu Wenruo (7):
>   btrfs: Introduce a function to check if all chunks a OK for degraded
>     rw mount
>   btrfs: Do chunk level rw degrade check at mount time
>   btrfs: Do chunk level degradation check for remount
>   btrfs: Introduce extra_rw_degrade_errors parameter for
>     btrfs_check_rw_degradable
>   btrfs: Allow barrier_all_devices to do chunk level device check
>   btrfs: Cleanup num_tolerated_disk_barrier_failures
>   btrfs: Enhance missing device kernel message
>
>  fs/btrfs/ctree.h   |   2 -
>  fs/btrfs/disk-io.c |  87 ++++++------------------------
>  fs/btrfs/disk-io.h |   2 -
>  fs/btrfs/super.c   |   5 +-
>  fs/btrfs/volumes.c | 156 ++++++++++++++++++++++++++++++++++++++++++++---------
>  fs/btrfs/volumes.h |  37 +++++++++++++
>  6 files changed, 188 insertions(+), 101 deletions(-)
>
Everything appears to work as advertised here, so for the patchset as a
whole, you can add:

Tested-by: Austin S. Hemmelgarn <ahferroin7@gmail.com>

Also, I've added a couple of specific cases to my automated test scripts 
to make sure this keeps working, so going forwards we'll have some 
regression testing for this.


* Re: [PATCH v3 0/7] Chunk level degradable check
  2017-03-08  2:41 [PATCH v3 0/7] Chunk level degradable check Qu Wenruo
                   ` (9 preceding siblings ...)
  2017-03-08 12:25 ` Austin S. Hemmelgarn
@ 2017-03-08 18:31 ` Anand Jain
  2017-03-08 21:08 ` Goffredo Baroncelli
  11 siblings, 0 replies; 18+ messages in thread
From: Anand Jain @ 2017-03-08 18:31 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs, kilobyte, demfloro

> v3:
>   Remove one duplicated missing device output
>   Use the advice from Anand Jain, not to add new members in btrfs_device,
>   but use a new structure extra_rw_degrade_errors, to record error when
>   sending down/waiting device.

I suggested local variables because v2 had a theoretical bug, as assessed
before. The actual fix (with the variables staying in struct
btrfs_device) may not be in the scope of this patch, since btrfs as such
does not yet handle a device intermittently disappearing and reappearing.
So although I believe that in the long term the barrier failure should be
part of the per-device dev_stat, for now, to keep the original design
within the scope of this patch, a local stack variable will suffice.

Hope this clarifies better.

Thanks, Anand


* Re: [PATCH v3 0/7] Chunk level degradable check
  2017-03-08  7:39   ` Qu Wenruo
@ 2017-03-08 18:40     ` Anand Jain
  2017-03-08 19:01     ` Anand Jain
  1 sibling, 0 replies; 18+ messages in thread
From: Anand Jain @ 2017-03-08 18:40 UTC (permalink / raw)
  To: Qu Wenruo, Adam Borowski; +Cc: linux-btrfs, demfloro


>> Looks like I found an unrelated bug, though, that messed with my testing.
>> And it looks like a nasty one: once "btrfs dev scan" sees a disk, it
>> stores
>> its device and will then happily use it without verification even if it's
>> been pulled out and replaced by something else.  Lemme investigate that
>> later today.

   A missing device scanned after the FS is mounted would just appear
   as being used by the FS tree, but actually it's not. BTRFS does
   not yet handle the online reappearance of a device; that is, a
   device which was not found at mount/open-ctree time won't be part
   of the alloc_dev_list.

HTH
Anand

> It would be nice if you can send detailed info to mail list.
>
> AFAIK, btrfs will check its superblock for devid and device UUID, to
> ensure that's the device we're using.


* Re: [PATCH v3 0/7] Chunk level degradable check
  2017-03-08  7:39   ` Qu Wenruo
  2017-03-08 18:40     ` Anand Jain
@ 2017-03-08 19:01     ` Anand Jain
  1 sibling, 0 replies; 18+ messages in thread
From: Anand Jain @ 2017-03-08 19:01 UTC (permalink / raw)
  To: Qu Wenruo, Adam Borowski; +Cc: linux-btrfs, demfloro


>> Looks like I found an unrelated bug, though, that messed with my testing.
>> And it looks like a nasty one: once "btrfs dev scan" sees a disk, it
>> stores
>> its device and will then happily use it without verification even if it's
>> been pulled out and replaced by something else.  Lemme investigate that
>> later today.

    A missing device scanned after the FS is mounted would just appear
    as being used by the FS tree, but actually it's not. BTRFS does
    not yet handle the online reappearance of a device; that is, a
    device which was not found at mount/open-ctree time won't be part
    of the alloc_dev_list.

HTH
Anand

> It would be nice if you can send detailed info to mail list.
>
> AFAIK, btrfs will check its superblock for devid and device UUID, to
> ensure that's the device we're using.


* Re: [PATCH v3 0/7] Chunk level degradable check
  2017-03-08  2:41 [PATCH v3 0/7] Chunk level degradable check Qu Wenruo
                   ` (10 preceding siblings ...)
  2017-03-08 18:31 ` Anand Jain
@ 2017-03-08 21:08 ` Goffredo Baroncelli
  11 siblings, 0 replies; 18+ messages in thread
From: Goffredo Baroncelli @ 2017-03-08 21:08 UTC (permalink / raw)
  To: Qu Wenruo, anand.jain, kilobyte, demfloro; +Cc: linux-btrfs

Hi Qu,

I ran some tests (see the table below). Basically I created a hybrid btrfs filesystem with a "single" metadata profile and a single/raid1/raid10/raid5/raid6 data profile.

For each test case I removed a disk (which could be used by data and/or metadata), and then checked whether the filesystem was mountable.

The results were always the expected ones: when the number of disks was insufficient, btrfs refused to mount the filesystem. Otherwise the mount was fine, with the correct info in dmesg (except when I removed a disk holding the system chunk, of course). The hybrid profiles ensure that btrfs checks all the chunks.

The only exception was raid10. If I removed 2 disks, the filesystem was always unmountable. I tried all the combinations and none was successful. But I suppose that some could be OK, i.e. the ones where each removed disk is one half of a different raid1 pair. This could be a possible future improvement.

Finally a little suggestion: the message shown in dmesg is something like
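
To make the raid10 case concrete, here is a minimal sketch of the finer-grained check, assuming a layout where stripes 2k and 2k+1 of a chunk mirror each other. The function name and the boolean array are hypothetical, and this is not what the patchset implements: the chunk stays readable as long as every mirror pair keeps at least one device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * missing[i] is true when the device backing stripe i is absent.
 * Assumption: stripes 2k and 2k+1 hold the same data (a mirror pair).
 */
bool raid10_chunk_readable_sketch(const bool *missing, size_t nr_stripes)
{
	assert(nr_stripes % 2 == 0);

	for (size_t i = 0; i < nr_stripes; i += 2) {
		if (missing[i] && missing[i + 1])
			return false;	/* both copies of this pair are gone */
	}
	return true;
}
```

With 4 stripes, losing stripes 0 and 3 leaves one copy in each pair, while losing stripes 0 and 1 removes a whole pair.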

"chunk 20971520 missing 3 devices, max tolerance is 2 for writable mount"

I suggest 

"chunk 20971520 (RAID6/6 disks) missing 3 devices, max allowed 2 for a successful mount"

1) "max tolerance is 2" is not very clear to me (but I have to admit my English is terrible :-) )
2) the message seems to suggest that a non-writable mount could be allowed: is that the case?
3) add (if possible) the raid profile and the number of disks; this could help the diagnosis
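
A small sketch of how the suggested message could be formatted, in plain user-space C with snprintf(). The function name and parameters are hypothetical; in the kernel the values would come from the chunk map, and the profile string is assumed to be provided by the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format the proposed message into buf. profile is a string such as
 * "RAID6". Returns snprintf()'s result (length that would be written).
 */
int format_missing_msg_sketch(char *buf, size_t len,
			      unsigned long long chunk_start,
			      const char *profile, int nr_disks,
			      int missing, int max_allowed)
{
	assert(buf != NULL && profile != NULL);

	return snprintf(buf, len,
			"chunk %llu (%s/%d disks) missing %d devices, "
			"max allowed %d for a successful mount",
			chunk_start, profile, nr_disks, missing, max_allowed);
}
```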

Anyway thanks for your effort on the multi-disk support.

BR
G.Baroncelli


Test report:

Data    Metadata  Nr disk  Test                   ExRes  Res
------  --------  -------  ---------------------  -----  -------
SINGLE  SINGLE    4        Remove mdata disk      No     Yes
SINGLE  SINGLE    4        Remove unused disk     Yes    Yes
SINGLE  SINGLE    4        Remove data disk       No     Yes

RAID1   SINGLE    4        Remove mdata disk      No     Yes
RAID1   SINGLE    4        Remove data disk       Yes    Yes
RAID1   SINGLE    5        Remove unused disk     Yes    Yes
RAID1   SINGLE    5        Remove data disk (2)   No     Yes

RAID10  SINGLE    5        Remove mdata disk      No     Yes
RAID10  SINGLE    5        Remove data disk       Yes    Yes
RAID10  SINGLE    5        Remove data disk (2)   No     Yes (*)

RAID5   SINGLE    5        Remove mdata disk      No     Yes
RAID5   SINGLE    5        Remove data disk       Yes    Yes
RAID5   SINGLE    5        Remove data disk (2)   No     Yes

RAID6   SINGLE    5        Remove mdata disk      No     Yes
RAID6   SINGLE    5        Remove data disk       Yes    Yes
RAID6   SINGLE    5        Remove data disk (2)   Yes    Yes
RAID6   SINGLE    5        Remove data disk (3)   No     Yes


Note:
ExRes: Expected result   -> expected mount success
Res: Result              -> test result
Remove mdata disk        -> remove the disk which holds metadata
Remove data disk         -> remove the disk which holds data (and not metadata)
Remove data disk (2)     -> remove 2 disks which hold data (and not metadata)
Remove data disk (3)     -> remove 3 disks which hold data (and not metadata)
Remove unused disk       -> remove an unused disk




On 2017-03-08 03:41, Qu Wenruo wrote:
> Btrfs currently uses num_tolerated_disk_barrier_failures to do global
> check for tolerated missing device.
> 
> Although the one-size-fit-all solution is quite safe, it's too strict
> if data and metadata has different duplication level.
> 
> For example, if one use Single data and RAID1 metadata for 2 disks, it
> means any missing device will make the fs unable to be degraded
> mounted.
> 
> But in fact, some times all single chunks may be in the existing
> device and in that case, we should allow it to be rw degraded mounted.



> 
> Such case can be easily reproduced using the following script:
>  # mkfs.btrfs -f -m raid1 -d sing /dev/sdb /dev/sdc
>  # wipefs -f /dev/sdc
>  # mount /dev/sdb -o degraded,rw
> 
> If using btrfs-debug-tree to check /dev/sdb, one should find that the
> data chunk is only in sdb, so in fact it should allow degraded mount.
> 
> This patchset will introduce a new per-chunk degradable check for
> btrfs, allow above case to succeed, and it's quite small anyway.
> 
> And enhance kernel error message for missing device, at least kernel
> can know what's making mount failed, other than meaningless
> "failed to read system chunk/chunk tree -5".
> 
> v2:
>   Update after almost 2 years.
>   Add the last patch to enhance the kernel output, so user can know
>   it's missing devices prevent btrfs to mount.
> v3:
>   Remove one duplicated missing device output
>   Use the advice from Anand Jain, not to add new members in btrfs_device,
>   but use a new structure extra_rw_degrade_errors, to record error when
>   sending down/waiting device.
> 
> Sorry Dmitrii Tcvetkov and Adam Borowski, I'm afraid I can't add your
> tested-by tags in v3, as the 4th and 4th patches have quite a big change,
> so you may need to retest the new patchset.
> Sorry for the trouble.
> 
> Qu Wenruo (7):
>   btrfs: Introduce a function to check if all chunks a OK for degraded
>     rw mount
>   btrfs: Do chunk level rw degrade check at mount time
>   btrfs: Do chunk level degradation check for remount
>   btrfs: Introduce extra_rw_degrade_errors parameter for
>     btrfs_check_rw_degradable
>   btrfs: Allow barrier_all_devices to do chunk level device check
>   btrfs: Cleanup num_tolerated_disk_barrier_failures
>   btrfs: Enhance missing device kernel message
> 
>  fs/btrfs/ctree.h   |   2 -
>  fs/btrfs/disk-io.c |  87 ++++++------------------------
>  fs/btrfs/disk-io.h |   2 -
>  fs/btrfs/super.c   |   5 +-
>  fs/btrfs/volumes.c | 156 ++++++++++++++++++++++++++++++++++++++++++++---------
>  fs/btrfs/volumes.h |  37 +++++++++++++
>  6 files changed, 188 insertions(+), 101 deletions(-)
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


end of thread, other threads:[~2017-03-08 21:17 UTC | newest]

Thread overview: 18+ messages
2017-03-08  2:41 [PATCH v3 0/7] Chunk level degradable check Qu Wenruo
2017-03-08  2:41 ` [PATCH v3 1/7] btrfs: Introduce a function to check if all chunks a OK for degraded rw mount Qu Wenruo
2017-03-08  2:41 ` [PATCH v3 2/7] btrfs: Do chunk level rw degrade check at mount time Qu Wenruo
2017-03-08  2:41 ` [PATCH v3 3/7] btrfs: Do chunk level degradation check for remount Qu Wenruo
2017-03-08  2:41 ` [PATCH v3 4/7] btrfs: Introduce extra_rw_degrade_errors parameter for btrfs_check_rw_degradable Qu Wenruo
2017-03-08  2:41 ` [PATCH v3 5/7] btrfs: Allow barrier_all_devices to do chunk level device check Qu Wenruo
2017-03-08  2:41 ` [PATCH v3 6/7] btrfs: Cleanup num_tolerated_disk_barrier_failures Qu Wenruo
2017-03-08  2:41 ` [PATCH v3 7/7] btrfs: Enhance missing device kernel message Qu Wenruo
2017-03-08  5:26   ` Andrei Borzenkov
2017-03-08  5:43     ` Qu Wenruo
2017-03-08  6:47 ` [PATCH v3 0/7] Chunk level degradable check Adam Borowski
2017-03-08  7:39   ` Qu Wenruo
2017-03-08 18:40     ` Anand Jain
2017-03-08 19:01     ` Anand Jain
2017-03-08  8:00 ` Dmitrii Tcvetkov
2017-03-08 12:25 ` Austin S. Hemmelgarn
2017-03-08 18:31 ` Anand Jain
2017-03-08 21:08 ` Goffredo Baroncelli
