[PATCH 0/5] Btrfs: Per-chunk degradable check

linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/5] Btrfs: Per-chunk degradable check
@ 2015-09-21  2:10 Qu Wenruo
  2015-09-21  2:10 ` [PATCH 1/5] btrfs: Introduce a new function to check if all chunks a OK for degraded mount Qu Wenruo
                   ` (5 more replies)
  0 siblings, 6 replies; 17+ messages in thread
From: Qu Wenruo @ 2015-09-21  2:10 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Anand Jain

Btrfs currently uses num_tolerated_disk_barrier_failures to do global
check for tolerated missing device.

Although the one-size-fit-all solution is quite safe, it's too strict if
data and metadata has different duplication level.

For example, if one use Single data and RAID1 metadata for 2 disks, it
means any missing device will make the fs unable to be degraded mounted.

But in fact, some times all single chunks may be in the existing device
and in that case, we should allow it to be rw degraded mounted.

Such case can be easily reproduced using the following script:
 # mkfs.btrfs -f -m raid1 -d sing /dev/sdb /dev/sdc
 # wipefs -f /dev/sdc
 # mount /dev/sdb -o degraded,rw

If using btrfs-debug-tree to check /dev/sdb, one should find that the
data chunk is only in sdb, so in fact it should allow degraded mount.

This patchset will introduce a new per-chunk degradable check for btrfs,
allow above case to succeed, and it's quite small anyway.

Also, it provides the possibility for later enhancement, like
automatically add 'degraded' mount option if possible.

Cc: Anand Jain <anand.jain@oracle.com>

Qu Wenruo (5):
  btrfs: Introduce a new function to check if all chunks a OK for
    degraded     mount
  btrfs: Do per-chunk check for mount time check
  btrfs: Do per-chunk degraded check for remount
  btrfs: Allow barrier_all_devices to do per-chunk device check
  btrfs: Cleanup num_tolerated_disk_barrier_failures

 fs/btrfs/ctree.h   |  2 --
 fs/btrfs/disk-io.c | 87 ++++++++++--------------------------------------------
 fs/btrfs/disk-io.h |  2 --
 fs/btrfs/super.c   | 11 ++++---
 fs/btrfs/volumes.c | 84 +++++++++++++++++++++++++++++++++++++++++-----------
 fs/btrfs/volumes.h |  5 ++++
 6 files changed, 94 insertions(+), 97 deletions(-)

-- 
2.5.2

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/5] btrfs: Introduce a new function to check if all chunks a OK for degraded mount
  2015-09-21  2:10 [PATCH 0/5] Btrfs: Per-chunk degradable check Qu Wenruo
@ 2015-09-21  2:10 ` Qu Wenruo
  2016-04-18  8:47   ` [PATCH] btrfs: fix btrfs_check_degradable() to free extent map Anand Jain
  2015-09-21  2:10 ` [PATCH 2/5] btrfs: Do per-chunk check for mount time check Qu Wenruo
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Qu Wenruo @ 2015-09-21  2:10 UTC (permalink / raw)
  To: linux-btrfs

Introduce a new function, btrfs_check_degradable(), to judge if all
chunks in btrfs is OK for degraded mount.

It provides the new basis for accurate btrfs mount/remount and even
runtime degraded mount check other than old one-size-fit-all method.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/volumes.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/volumes.h |  1 +
 2 files changed, 64 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 644e070..f1ef215 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6906,3 +6906,66 @@ void btrfs_reset_fs_info_ptr(struct btrfs_fs_info *fs_info)
 		fs_devices = fs_devices->seed;
 	}
 }
+
+/*
+ * Check if all chunks in the fs is OK for degraded mount
+ * Caller itself should do extra check if DEGRADED mount option is given
+ * for >0 return value.
+ *
+ * Return 0 if all chunks are OK.
+ * Return >0 if all chunks are degradable but not all OK.
+ * Return <0 if any chunk is not degradable or other bug.
+ */
+int btrfs_check_degradable(struct btrfs_fs_info *fs_info, unsigned flags)
+{
+	struct btrfs_mapping_tree *map_tree = &fs_info->mapping_tree;
+	struct extent_map *em;
+	u64 next_start = 0;
+	int ret = 0;
+
+	if (flags & MS_RDONLY)
+		return 0;
+
+	read_lock(&map_tree->map_tree.lock);
+	em = lookup_extent_mapping(&map_tree->map_tree, 0, (u64)(-1));
+	/* No any chunk? Should be a huge bug */
+	if (!em) {
+		ret = -ENOENT;
+		goto out;
+	}
+
+	while (em) {
+		struct map_lookup *map;
+		int missing = 0;
+		int max_tolerated;
+		int i;
+
+		map = (struct map_lookup *) em->bdev;
+		max_tolerated =
+			btrfs_get_num_tolerated_disk_barrier_failures(
+					map->type);
+		for (i = 0; i < map->num_stripes; i++) {
+			if (map->stripes[i].dev->missing)
+				missing++;
+		}
+		if (missing > max_tolerated) {
+			ret = -EIO;
+			btrfs_warn(fs_info,
+				   "missing devices(%d) exceeds the limit(%d), writebale mount is not allowed",
+				   missing, max_tolerated);
+			goto out;
+		} else if (missing)
+			ret = 1;
+		next_start = extent_map_end(em);
+
+		/*
+		 * Alwasy search range [next_start, (u64)-1) to find the next
+		 * chunk map
+		 */
+		em = lookup_extent_mapping(&map_tree->map_tree, next_start,
+					   (u64)(-1) - next_start);
+	}
+out:
+	read_unlock(&map_tree->map_tree.lock);
+	return ret;
+}
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 2ca784a..fe758df 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -547,5 +547,6 @@ static inline void unlock_chunks(struct btrfs_root *root)
 struct list_head *btrfs_get_fs_uuids(void);
 void btrfs_set_fs_info_ptr(struct btrfs_fs_info *fs_info);
 void btrfs_reset_fs_info_ptr(struct btrfs_fs_info *fs_info);
+int btrfs_check_degradable(struct btrfs_fs_info *fs_info, unsigned flags);
 
 #endif
-- 
2.5.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/5] btrfs: Do per-chunk check for mount time check
  2015-09-21  2:10 [PATCH 0/5] Btrfs: Per-chunk degradable check Qu Wenruo
  2015-09-21  2:10 ` [PATCH 1/5] btrfs: Introduce a new function to check if all chunks a OK for degraded mount Qu Wenruo
@ 2015-09-21  2:10 ` Qu Wenruo
  2015-09-25  7:05   ` Anand Jain
  2015-09-21  2:10 ` [PATCH 3/5] btrfs: Do per-chunk degraded check for remount Qu Wenruo
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Qu Wenruo @ 2015-09-21  2:10 UTC (permalink / raw)
  To: linux-btrfs

Now use the btrfs_check_degraded() to do mount time degraded check.

With this patch, now we can mount with the following case:
 # mkfs.btrfs -f -m raid1 -d single /dev/sdb /dev/sdc
 # wipefs -a /dev/sdc
 # mount /dev/sdb /mnt/btrfs -o degraded
 As the single data chunk is only in sdb, so it's OK to mount as
 degraded, as missing one device is OK for RAID1.

But still fail with the following case as expected:
 # mkfs.btrfs -f -m raid1 -d single /dev/sdb /dev/sdc
 # wipefs -a /dev/sdb
 # mount /dev/sdc /mnt/btrfs -o degraded
 As the data chunk is only in sdb, so it's not OK to mount it as
 degraded.

Reported-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Reported-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/disk-io.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 0b658d0..d64299f 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2858,6 +2858,16 @@ int open_ctree(struct super_block *sb,
 		goto fail_tree_roots;
 	}
 
+	ret = btrfs_check_degradable(fs_info, fs_info->sb->s_flags);
+	if (ret < 0) {
+		btrfs_error(fs_info, ret, "degraded writable mount failed");
+		goto fail_tree_roots;
+	} else if (ret > 0 && !btrfs_test_opt(chunk_root, DEGRADED)) {
+		btrfs_warn(fs_info,
+			"Some device missing, but still degraded mountable, please mount with -o degraded option");
+		ret = -EACCES;
+		goto fail_tree_roots;
+	}
 	/*
 	 * keep the device that is marked to be the target device for the
 	 * dev_replace procedure
@@ -2947,14 +2957,6 @@ retry_root_backup:
 	}
 	fs_info->num_tolerated_disk_barrier_failures =
 		btrfs_calc_num_tolerated_disk_barrier_failures(fs_info);
-	if (fs_info->fs_devices->missing_devices >
-	     fs_info->num_tolerated_disk_barrier_failures &&
-	    !(sb->s_flags & MS_RDONLY)) {
-		pr_warn("BTRFS: missing devices(%llu) exceeds the limit(%d), writeable mount is not allowed\n",
-			fs_info->fs_devices->missing_devices,
-			fs_info->num_tolerated_disk_barrier_failures);
-		goto fail_sysfs;
-	}
 
 	fs_info->cleaner_kthread = kthread_run(cleaner_kthread, tree_root,
 					       "btrfs-cleaner");
-- 
2.5.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 3/5] btrfs: Do per-chunk degraded check for remount
  2015-09-21  2:10 [PATCH 0/5] Btrfs: Per-chunk degradable check Qu Wenruo
  2015-09-21  2:10 ` [PATCH 1/5] btrfs: Introduce a new function to check if all chunks a OK for degraded mount Qu Wenruo
  2015-09-21  2:10 ` [PATCH 2/5] btrfs: Do per-chunk check for mount time check Qu Wenruo
@ 2015-09-21  2:10 ` Qu Wenruo
  2015-09-25  6:54   ` Anand Jain
  2015-09-25  8:24   ` [PATCH 1/1] " Anand Jain
  2015-09-21  2:10 ` [PATCH 4/5] btrfs: Allow barrier_all_devices to do per-chunk device check Qu Wenruo
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 17+ messages in thread
From: Qu Wenruo @ 2015-09-21  2:10 UTC (permalink / raw)
  To: linux-btrfs

Just the same for mount time check, use new btrfs_check_degraded() to do
per chunk check.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/super.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index c389c13..720c044 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1681,11 +1681,14 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data)
 			goto restore;
 		}
 
-		if (fs_info->fs_devices->missing_devices >
-		     fs_info->num_tolerated_disk_barrier_failures &&
-		    !(*flags & MS_RDONLY)) {
+		ret = btrfs_check_degradable(fs_info, *flags);
+		if (ret < 0) {
+			btrfs_error(fs_info, ret,
+				    "degraded writable remount failed");
+			goto restore;
+		} else if (ret > 0 && !btrfs_test_opt(root, DEGRADED)) {
 			btrfs_warn(fs_info,
-				"too many missing devices, writeable remount is not allowed");
+				"some device missing, but still degraded mountable, please remount with -o degraded option");
 			ret = -EACCES;
 			goto restore;
 		}
-- 
2.5.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 4/5] btrfs: Allow barrier_all_devices to do per-chunk device check
  2015-09-21  2:10 [PATCH 0/5] Btrfs: Per-chunk degradable check Qu Wenruo
                   ` (2 preceding siblings ...)
  2015-09-21  2:10 ` [PATCH 3/5] btrfs: Do per-chunk degraded check for remount Qu Wenruo
@ 2015-09-21  2:10 ` Qu Wenruo
  2015-10-30  8:32   ` Anand Jain
  2015-09-21  2:10 ` [PATCH 5/5] btrfs: Cleanup num_tolerated_disk_barrier_failures Qu Wenruo
  2015-11-05  0:57 ` [PATCH 0/5] Btrfs: Per-chunk degradable check Qu Wenruo
  5 siblings, 1 reply; 17+ messages in thread
From: Qu Wenruo @ 2015-09-21  2:10 UTC (permalink / raw)
  To: linux-btrfs

The last user of num_tolerated_disk_barrier_failures is
barrier_all_devices().
But it's can be easily changed to new per-chunk degradable check
framework.

Now btrfs_device will have two extra members, representing send/wait
error, set at write_dev_flush() time.
And then check it in a similar but more accurate behavior than old code.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/disk-io.c | 13 +++++--------
 fs/btrfs/volumes.c |  6 +++++-
 fs/btrfs/volumes.h |  4 ++++
 3 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index d64299f..7cd94e7 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3400,8 +3400,6 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
 {
 	struct list_head *head;
 	struct btrfs_device *dev;
-	int errors_send = 0;
-	int errors_wait = 0;
 	int ret;
 
 	/* send down all the barriers */
@@ -3410,7 +3408,7 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
 		if (dev->missing)
 			continue;
 		if (!dev->bdev) {
-			errors_send++;
+			dev->err_send = 1;
 			continue;
 		}
 		if (!dev->in_fs_metadata || !dev->writeable)
@@ -3418,7 +3416,7 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
 
 		ret = write_dev_flush(dev, 0);
 		if (ret)
-			errors_send++;
+			dev->err_send = 1;
 	}
 
 	/* wait for all the barriers */
@@ -3426,7 +3424,7 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
 		if (dev->missing)
 			continue;
 		if (!dev->bdev) {
-			errors_wait++;
+			dev->err_wait = 1;
 			continue;
 		}
 		if (!dev->in_fs_metadata || !dev->writeable)
@@ -3434,10 +3432,9 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
 
 		ret = write_dev_flush(dev, 1);
 		if (ret)
-			errors_wait++;
+			dev->err_wait = 1;
 	}
-	if (errors_send > info->num_tolerated_disk_barrier_failures ||
-	    errors_wait > info->num_tolerated_disk_barrier_failures)
+	if (btrfs_check_degradable(info, info->sb->s_flags) < 0)
 		return -EIO;
 	return 0;
 }
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index f1ef215..88266fa 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6945,8 +6945,12 @@ int btrfs_check_degradable(struct btrfs_fs_info *fs_info, unsigned flags)
 			btrfs_get_num_tolerated_disk_barrier_failures(
 					map->type);
 		for (i = 0; i < map->num_stripes; i++) {
-			if (map->stripes[i].dev->missing)
+			if (map->stripes[i].dev->missing ||
+			    map->stripes[i].dev->err_wait ||
+			    map->stripes[i].dev->err_send)
 				missing++;
+			map->stripes[i].dev->err_wait = 0;
+			map->stripes[i].dev->err_send = 0;
 		}
 		if (missing > max_tolerated) {
 			ret = -EIO;
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index fe758df..cd02556 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -76,6 +76,10 @@ struct btrfs_device {
 	int can_discard;
 	int is_tgtdev_for_dev_replace;
 
+	/* for barrier_all_devices() check */
+	int err_send;
+	int err_wait;
+
 #ifdef __BTRFS_NEED_DEVICE_DATA_ORDERED
 	seqcount_t data_seqcount;
 #endif
-- 
2.5.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 5/5] btrfs: Cleanup num_tolerated_disk_barrier_failures
  2015-09-21  2:10 [PATCH 0/5] Btrfs: Per-chunk degradable check Qu Wenruo
                   ` (3 preceding siblings ...)
  2015-09-21  2:10 ` [PATCH 4/5] btrfs: Allow barrier_all_devices to do per-chunk device check Qu Wenruo
@ 2015-09-21  2:10 ` Qu Wenruo
  2015-11-05  0:57 ` [PATCH 0/5] Btrfs: Per-chunk degradable check Qu Wenruo
  5 siblings, 0 replies; 17+ messages in thread
From: Qu Wenruo @ 2015-09-21  2:10 UTC (permalink / raw)
  To: linux-btrfs

As we use per-chunk degradable check, now the global
num_tolerated_disk_barrier_failures is of no use.

So cleanup it.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/ctree.h   |  2 --
 fs/btrfs/disk-io.c | 56 ------------------------------------------------------
 fs/btrfs/disk-io.h |  2 --
 fs/btrfs/volumes.c | 17 -----------------
 4 files changed, 77 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 938efe3..3a4d779 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1753,8 +1753,6 @@ struct btrfs_fs_info {
 	/* next backup root to be overwritten */
 	int backup_root_index;
 
-	int num_tolerated_disk_barrier_failures;
-
 	/* device replace state */
 	struct btrfs_dev_replace dev_replace;
 
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 7cd94e7..93304cb 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2955,8 +2955,6 @@ retry_root_backup:
 		printk(KERN_ERR "BTRFS: Failed to read block groups: %d\n", ret);
 		goto fail_sysfs;
 	}
-	fs_info->num_tolerated_disk_barrier_failures =
-		btrfs_calc_num_tolerated_disk_barrier_failures(fs_info);
 
 	fs_info->cleaner_kthread = kthread_run(cleaner_kthread, tree_root,
 					       "btrfs-cleaner");
@@ -3459,60 +3457,6 @@ int btrfs_get_num_tolerated_disk_barrier_failures(u64 flags)
 	return 0;
 }
 
-int btrfs_calc_num_tolerated_disk_barrier_failures(
-	struct btrfs_fs_info *fs_info)
-{
-	struct btrfs_ioctl_space_info space;
-	struct btrfs_space_info *sinfo;
-	u64 types[] = {BTRFS_BLOCK_GROUP_DATA,
-		       BTRFS_BLOCK_GROUP_SYSTEM,
-		       BTRFS_BLOCK_GROUP_METADATA,
-		       BTRFS_BLOCK_GROUP_DATA | BTRFS_BLOCK_GROUP_METADATA};
-	int i;
-	int c;
-	int num_tolerated_disk_barrier_failures =
-		(int)fs_info->fs_devices->num_devices;
-
-	for (i = 0; i < ARRAY_SIZE(types); i++) {
-		struct btrfs_space_info *tmp;
-
-		sinfo = NULL;
-		rcu_read_lock();
-		list_for_each_entry_rcu(tmp, &fs_info->space_info, list) {
-			if (tmp->flags == types[i]) {
-				sinfo = tmp;
-				break;
-			}
-		}
-		rcu_read_unlock();
-
-		if (!sinfo)
-			continue;
-
-		down_read(&sinfo->groups_sem);
-		for (c = 0; c < BTRFS_NR_RAID_TYPES; c++) {
-			u64 flags;
-
-			if (list_empty(&sinfo->block_groups[c]))
-				continue;
-
-			btrfs_get_block_group_info(&sinfo->block_groups[c],
-						   &space);
-			if (space.total_bytes == 0 || space.used_bytes == 0)
-				continue;
-			flags = space.flags;
-
-			num_tolerated_disk_barrier_failures = min(
-				num_tolerated_disk_barrier_failures,
-				btrfs_get_num_tolerated_disk_barrier_failures(
-					flags));
-		}
-		up_read(&sinfo->groups_sem);
-	}
-
-	return num_tolerated_disk_barrier_failures;
-}
-
 static int write_all_supers(struct btrfs_root *root, int max_mirrors)
 {
 	struct list_head *head;
diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
index bdfb479..487fd4a0 100644
--- a/fs/btrfs/disk-io.h
+++ b/fs/btrfs/disk-io.h
@@ -140,8 +140,6 @@ struct btrfs_root *btrfs_create_tree(struct btrfs_trans_handle *trans,
 int btree_lock_page_hook(struct page *page, void *data,
 				void (*flush_fn)(void *));
 int btrfs_get_num_tolerated_disk_barrier_failures(u64 flags);
-int btrfs_calc_num_tolerated_disk_barrier_failures(
-	struct btrfs_fs_info *fs_info);
 int __init btrfs_end_io_wq_init(void);
 void btrfs_end_io_wq_exit(void);
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 88266fa..6bda203 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1825,9 +1825,6 @@ int btrfs_rm_device(struct btrfs_root *root, char *device_path)
 		free_fs_devices(cur_devices);
 	}
 
-	root->fs_info->num_tolerated_disk_barrier_failures =
-		btrfs_calc_num_tolerated_disk_barrier_failures(root->fs_info);
-
 	/*
 	 * at this point, the device is zero sized.  We want to
 	 * remove it from the devices list and zero out the old super
@@ -2355,8 +2352,6 @@ int btrfs_init_new_device(struct btrfs_root *root, char *device_path)
 			pr_warn("BTRFS: sysfs: failed to create fsid for sprout\n");
 	}
 
-	root->fs_info->num_tolerated_disk_barrier_failures =
-		btrfs_calc_num_tolerated_disk_barrier_failures(root->fs_info);
 	ret = btrfs_commit_transaction(trans, root);
 
 	if (seeding_dev) {
@@ -3584,13 +3579,6 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
 		}
 	} while (read_seqretry(&fs_info->profiles_lock, seq));
 
-	if (bctl->sys.flags & BTRFS_BALANCE_ARGS_CONVERT) {
-		fs_info->num_tolerated_disk_barrier_failures = min(
-			btrfs_calc_num_tolerated_disk_barrier_failures(fs_info),
-			btrfs_get_num_tolerated_disk_barrier_failures(
-				bctl->sys.target));
-	}
-
 	ret = insert_balance_item(fs_info->tree_root, bctl);
 	if (ret && ret != -EEXIST)
 		goto out;
@@ -3613,11 +3601,6 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
 	mutex_lock(&fs_info->balance_mutex);
 	atomic_dec(&fs_info->balance_running);
 
-	if (bctl->sys.flags & BTRFS_BALANCE_ARGS_CONVERT) {
-		fs_info->num_tolerated_disk_barrier_failures =
-			btrfs_calc_num_tolerated_disk_barrier_failures(fs_info);
-	}
-
 	if (bargs) {
 		memset(bargs, 0, sizeof(*bargs));
 		update_ioctl_balance_args(fs_info, 0, bargs);
-- 
2.5.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/5] btrfs: Do per-chunk degraded check for remount
  2015-09-21  2:10 ` [PATCH 3/5] btrfs: Do per-chunk degraded check for remount Qu Wenruo
@ 2015-09-25  6:54   ` Anand Jain
  2015-09-25  8:08     ` Qu Wenruo
  2015-09-25  8:24   ` [PATCH 1/1] " Anand Jain
  1 sibling, 1 reply; 17+ messages in thread
From: Anand Jain @ 2015-09-25  6:54 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs



On 09/21/2015 10:10 AM, Qu Wenruo wrote:
> Just the same for mount time check, use new btrfs_check_degraded() to do
> per chunk check.
>
> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
> ---
>   fs/btrfs/super.c | 11 +++++++----
>   1 file changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index c389c13..720c044 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -1681,11 +1681,14 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data)
>   			goto restore;
>   		}
>
> -		if (fs_info->fs_devices->missing_devices >
> -		     fs_info->num_tolerated_disk_barrier_failures &&
> -		    !(*flags & MS_RDONLY)) {
> +		ret = btrfs_check_degradable(fs_info, *flags);
> +		if (ret < 0) {
> +			btrfs_error(fs_info, ret,
> +				    "degraded writable remount failed");

btrfs_erorr() which is an error handling routine, isn't appropriate 
here, mainly because as we are in the remount context, I am not sure if 
you meant to change the fs state to readonly (on error) in the remount 
context ? or Instead btrfs_err() which is an error reporting/logging 
would be appropriate.

btrfs_erorr() and btrfs_err() are way different in action but very easy 
have an oversight and use the wrong one. the below patch changed it..

    Btrfs: consolidate btrfs_error() to btrfs_std_error()

Thanks, Anand


> +			goto restore;
> +		} else if (ret > 0 && !btrfs_test_opt(root, DEGRADED)) {
>   			btrfs_warn(fs_info,
> -				"too many missing devices, writeable remount is not allowed");
> +				"some device missing, but still degraded mountable, please remount with -o degraded option");
>   			ret = -EACCES;
>   			goto restore;
>   		}
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/5] btrfs: Do per-chunk check for mount time check
  2015-09-21  2:10 ` [PATCH 2/5] btrfs: Do per-chunk check for mount time check Qu Wenruo
@ 2015-09-25  7:05   ` Anand Jain
  0 siblings, 0 replies; 17+ messages in thread
From: Anand Jain @ 2015-09-25  7:05 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs



On 09/21/2015 10:10 AM, Qu Wenruo wrote:
> Now use the btrfs_check_degraded() to do mount time degraded check.
>
> With this patch, now we can mount with the following case:
>   # mkfs.btrfs -f -m raid1 -d single /dev/sdb /dev/sdc
>   # wipefs -a /dev/sdc
>   # mount /dev/sdb /mnt/btrfs -o degraded
>   As the single data chunk is only in sdb, so it's OK to mount as
>   degraded, as missing one device is OK for RAID1.
>
> But still fail with the following case as expected:
>   # mkfs.btrfs -f -m raid1 -d single /dev/sdb /dev/sdc
>   # wipefs -a /dev/sdb
>   # mount /dev/sdc /mnt/btrfs -o degraded
>   As the data chunk is only in sdb, so it's not OK to mount it as
>   degraded.
>
> Reported-by: Zhao Lei <zhaolei@cn.fujitsu.com>
> Reported-by: Anand Jain <anand.jain@oracle.com>
> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
> ---
>   fs/btrfs/disk-io.c | 18 ++++++++++--------
>   1 file changed, 10 insertions(+), 8 deletions(-)
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 0b658d0..d64299f 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -2858,6 +2858,16 @@ int open_ctree(struct super_block *sb,
>   		goto fail_tree_roots;
>   	}
>
> +	ret = btrfs_check_degradable(fs_info, fs_info->sb->s_flags);
> +	if (ret < 0) {
> +		btrfs_error(fs_info, ret, "degraded writable mount failed");

> +		goto fail_tree_roots;

same here too. if at all we are failing the mount. there is no point in 
doing the error handling (btrfs_error()) instead just error reporting is 
better (btrfs_err()).

Thanks, Anand



> +	} else if (ret > 0 && !btrfs_test_opt(chunk_root, DEGRADED)) {
> +		btrfs_warn(fs_info,
> +			"Some device missing, but still degraded mountable, please mount with -o degraded option");
> +		ret = -EACCES;
> +		goto fail_tree_roots;
> +	}
>   	/*
>   	 * keep the device that is marked to be the target device for the
>   	 * dev_replace procedure
> @@ -2947,14 +2957,6 @@ retry_root_backup:
>   	}
>   	fs_info->num_tolerated_disk_barrier_failures =
>   		btrfs_calc_num_tolerated_disk_barrier_failures(fs_info);
> -	if (fs_info->fs_devices->missing_devices >
> -	     fs_info->num_tolerated_disk_barrier_failures &&
> -	    !(sb->s_flags & MS_RDONLY)) {
> -		pr_warn("BTRFS: missing devices(%llu) exceeds the limit(%d), writeable mount is not allowed\n",
> -			fs_info->fs_devices->missing_devices,
> -			fs_info->num_tolerated_disk_barrier_failures);
> -		goto fail_sysfs;
> -	}
>
>   	fs_info->cleaner_kthread = kthread_run(cleaner_kthread, tree_root,
>   					       "btrfs-cleaner");
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/5] btrfs: Do per-chunk degraded check for remount
  2015-09-25  6:54   ` Anand Jain
@ 2015-09-25  8:08     ` Qu Wenruo
  2015-09-25  8:30       ` Anand Jain
  0 siblings, 1 reply; 17+ messages in thread
From: Qu Wenruo @ 2015-09-25  8:08 UTC (permalink / raw)
  To: Anand Jain, linux-btrfs



Anand Jain wrote on 2015/09/25 14:54 +0800:
>
>
> On 09/21/2015 10:10 AM, Qu Wenruo wrote:
>> Just the same for mount time check, use new btrfs_check_degraded() to do
>> per chunk check.
>>
>> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
>> ---
>>   fs/btrfs/super.c | 11 +++++++----
>>   1 file changed, 7 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
>> index c389c13..720c044 100644
>> --- a/fs/btrfs/super.c
>> +++ b/fs/btrfs/super.c
>> @@ -1681,11 +1681,14 @@ static int btrfs_remount(struct super_block
>> *sb, int *flags, char *data)
>>               goto restore;
>>           }
>>
>> -        if (fs_info->fs_devices->missing_devices >
>> -             fs_info->num_tolerated_disk_barrier_failures &&
>> -            !(*flags & MS_RDONLY)) {
>> +        ret = btrfs_check_degradable(fs_info, *flags);
>> +        if (ret < 0) {
>> +            btrfs_error(fs_info, ret,
>> +                    "degraded writable remount failed");
>
> btrfs_erorr() which is an error handling routine, isn't appropriate
> here, mainly because as we are in the remount context, I am not sure if
> you meant to change the fs state to readonly (on error) in the remount
> context ? or Instead btrfs_err() which is an error reporting/logging
> would be appropriate.
>
> btrfs_erorr() and btrfs_err() are way different in action but very easy
> have an oversight and use the wrong one. the below patch changed it..
>
>     Btrfs: consolidate btrfs_error() to btrfs_std_error()
>
> Thanks, Anand

Thanks for pointing this out.

I was quite unsure about using btrfs_info/warn/error.

In this case, I just wan't to output a dmesg info to let user know 
exactly what caused the mount failed.
Original code output nothing but "failed to open chunk tree", which is 
quite confusing for end user.

I was planning to use btrfs_info, but at least this is really an error 
message, but only to info user the real cause.

Maybe btrfs_warn will be a better choice?

Thanks,
Qu

>
>
>> +            goto restore;
>> +        } else if (ret > 0 && !btrfs_test_opt(root, DEGRADED)) {
>>               btrfs_warn(fs_info,
>> -                "too many missing devices, writeable remount is not
>> allowed");
>> +                "some device missing, but still degraded mountable,
>> please remount with -o degraded option");
>>               ret = -EACCES;
>>               goto restore;
>>           }
>>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/1] btrfs: Do per-chunk degraded check for remount
  2015-09-21  2:10 ` [PATCH 3/5] btrfs: Do per-chunk degraded check for remount Qu Wenruo
  2015-09-25  6:54   ` Anand Jain
@ 2015-09-25  8:24   ` Anand Jain
  1 sibling, 0 replies; 17+ messages in thread
From: Anand Jain @ 2015-09-25  8:24 UTC (permalink / raw)
  To: linux-btrfs; +Cc: quwenruo

From: Qu Wenruo <quwenruo@cn.fujitsu.com>

Just the same for mount time check, use new btrfs_check_degraded() to do
per chunk check.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>

Btrfs: use btrfs_error instead of btrfs_err during remount

apply on top of the patch

[PATCH 1/1] Btrfs: consolidate btrfs_error() to btrfs_std_error()

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 fs/btrfs/super.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 181db38..16f1412 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1664,11 +1664,14 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data)
 			goto restore;
 		}
 
-		if (fs_info->fs_devices->missing_devices >
-		     fs_info->num_tolerated_disk_barrier_failures &&
-		    !(*flags & MS_RDONLY)) {
+		ret = btrfs_check_degradable(fs_info, *flags);
+		if (ret < 0) {
+			btrfs_err(fs_info,
+				"degraded writable remount failed %d", ret);
+			goto restore;
+		} else if (ret > 0 && !btrfs_test_opt(root, DEGRADED)) {
 			btrfs_warn(fs_info,
-				"too many missing devices, writeable remount is not allowed");
+				"some device missing, but still degraded mountable, please remount with -o degraded option");
 			ret = -EACCES;
 			goto restore;
 		}
-- 
2.4.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/5] btrfs: Do per-chunk degraded check for remount
  2015-09-25  8:08     ` Qu Wenruo
@ 2015-09-25  8:30       ` Anand Jain
  2015-09-25  8:34         ` Qu Wenruo
  0 siblings, 1 reply; 17+ messages in thread
From: Anand Jain @ 2015-09-25  8:30 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

Qu,

Strictly speaking IMO it should be reported to the user on the cli 
terminal, and no logging in required. since its not that easy to get 
that at this point, I am ok with logging it as error. Since we are 
failing the task(mount), error is better.

I have made that change this on top of the patch

   [PATCH 1/1] Btrfs: consolidate btrfs_error() to btrfs_std_error()

and sent them both.

Thanks, Anand


> Thanks for pointing this out.
>
> I was quite unsure about using btrfs_info/warn/error.
>
> In this case, I just wan't to output a dmesg info to let user know
> exactly what caused the mount failed.
> Original code output nothing but "failed to open chunk tree", which is
> quite confusing for end user.
>
> I was planning to use btrfs_info, but at least this is really an error
> message, but only to info user the real cause.
>
> Maybe btrfs_warn will be a better choice?
>
> Thanks,
> Qu



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/5] btrfs: Do per-chunk degraded check for remount
  2015-09-25  8:30       ` Anand Jain
@ 2015-09-25  8:34         ` Qu Wenruo
  0 siblings, 0 replies; 17+ messages in thread
From: Qu Wenruo @ 2015-09-25  8:34 UTC (permalink / raw)
  To: Anand Jain, linux-btrfs

Thanks Anand,

I'm OK with both new patches.

Thanks for the modification.
Qu

Anand Jain wrote on 2015/09/25 16:30 +0800:
> Qu,
>
> Strictly speaking IMO it should be reported to the user on the cli
> terminal, and no logging in required. since its not that easy to get
> that at this point, I am ok with logging it as error. Since we are
> failing the task(mount), error is better.
>
> I have made that change this on top of the patch
>
>    [PATCH 1/1] Btrfs: consolidate btrfs_error() to btrfs_std_error()
>
> and sent them both.
>
> Thanks, Anand
>
>
>> Thanks for pointing this out.
>>
>> I was quite unsure about using btrfs_info/warn/error.
>>
>> In this case, I just wan't to output a dmesg info to let user know
>> exactly what caused the mount failed.
>> Original code output nothing but "failed to open chunk tree", which is
>> quite confusing for end user.
>>
>> I was planning to use btrfs_info, but at least this is really an error
>> message, but only to info user the real cause.
>>
>> Maybe btrfs_warn will be a better choice?
>>
>> Thanks,
>> Qu
>
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/5] btrfs: Allow barrier_all_devices to do per-chunk device check
  2015-09-21  2:10 ` [PATCH 4/5] btrfs: Allow barrier_all_devices to do per-chunk device check Qu Wenruo
@ 2015-10-30  8:32   ` Anand Jain
  2015-10-30 11:41     ` Qu Wenruo
  0 siblings, 1 reply; 17+ messages in thread
From: Anand Jain @ 2015-10-30  8:32 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs



Qu,

  We shouldn't mark FS readonly when chunks are degradable.
  As below.

Thanks, Anand


diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 39a2d57..dbb2483 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3530,7 +3530,7 @@ static int write_all_supers(struct btrfs_root 
*root, int max_mirrors)

         if (do_barriers) {
                 ret = barrier_all_devices(root->fs_info);
-               if (ret) {
+               if (ret < 0) {
                         mutex_unlock(
 
&root->fs_info->fs_devices->device_list_mutex);
                         btrfs_std_error(root->fs_info, ret,




On 09/21/2015 10:10 AM, Qu Wenruo wrote:
> The last user of num_tolerated_disk_barrier_failures is
> barrier_all_devices().
> But it's can be easily changed to new per-chunk degradable check
> framework.
>
> Now btrfs_device will have two extra members, representing send/wait
> error, set at write_dev_flush() time.
> And then check it in a similar but more accurate behavior than old code.
>
> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
> ---
>   fs/btrfs/disk-io.c | 13 +++++--------
>   fs/btrfs/volumes.c |  6 +++++-
>   fs/btrfs/volumes.h |  4 ++++
>   3 files changed, 14 insertions(+), 9 deletions(-)
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index d64299f..7cd94e7 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -3400,8 +3400,6 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
>   {
>   	struct list_head *head;
>   	struct btrfs_device *dev;
> -	int errors_send = 0;
> -	int errors_wait = 0;
>   	int ret;
>
>   	/* send down all the barriers */
> @@ -3410,7 +3408,7 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
>   		if (dev->missing)
>   			continue;
>   		if (!dev->bdev) {
> -			errors_send++;
> +			dev->err_send = 1;
>   			continue;
>   		}
>   		if (!dev->in_fs_metadata || !dev->writeable)
> @@ -3418,7 +3416,7 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
>
>   		ret = write_dev_flush(dev, 0);
>   		if (ret)
> -			errors_send++;
> +			dev->err_send = 1;
>   	}
>
>   	/* wait for all the barriers */
> @@ -3426,7 +3424,7 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
>   		if (dev->missing)
>   			continue;
>   		if (!dev->bdev) {
> -			errors_wait++;
> +			dev->err_wait = 1;
>   			continue;
>   		}
>   		if (!dev->in_fs_metadata || !dev->writeable)
> @@ -3434,10 +3432,9 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
>
>   		ret = write_dev_flush(dev, 1);
>   		if (ret)
> -			errors_wait++;
> +			dev->err_wait = 1;
>   	}
> -	if (errors_send > info->num_tolerated_disk_barrier_failures ||
> -	    errors_wait > info->num_tolerated_disk_barrier_failures)
> +	if (btrfs_check_degradable(info, info->sb->s_flags) < 0)
>   		return -EIO;
>   	return 0;
>   }
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index f1ef215..88266fa 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -6945,8 +6945,12 @@ int btrfs_check_degradable(struct btrfs_fs_info *fs_info, unsigned flags)
>   			btrfs_get_num_tolerated_disk_barrier_failures(
>   					map->type);
>   		for (i = 0; i < map->num_stripes; i++) {
> -			if (map->stripes[i].dev->missing)
> +			if (map->stripes[i].dev->missing ||
> +			    map->stripes[i].dev->err_wait ||
> +			    map->stripes[i].dev->err_send)
>   				missing++;
> +			map->stripes[i].dev->err_wait = 0;
> +			map->stripes[i].dev->err_send = 0;
>   		}
>   		if (missing > max_tolerated) {
>   			ret = -EIO;
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index fe758df..cd02556 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -76,6 +76,10 @@ struct btrfs_device {
>   	int can_discard;
>   	int is_tgtdev_for_dev_replace;
>
> +	/* for barrier_all_devices() check */
> +	int err_send;
> +	int err_wait;
> +
>   #ifdef __BTRFS_NEED_DEVICE_DATA_ORDERED
>   	seqcount_t data_seqcount;
>   #endif
>

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/5] btrfs: Allow barrier_all_devices to do per-chunk device check
  2015-10-30  8:32   ` Anand Jain
@ 2015-10-30 11:41     ` Qu Wenruo
  2015-10-30 23:52       ` Anand Jain
  0 siblings, 1 reply; 17+ messages in thread
From: Qu Wenruo @ 2015-10-30 11:41 UTC (permalink / raw)
  To: Anand Jain, Qu Wenruo; +Cc: linux-btrfs



在 2015年10月30日 16:32, Anand Jain 写道:
>
>
> Qu,
>
>   We shouldn't mark FS readonly when chunks are degradable.
>   As below.
>
> Thanks, Anand
>
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 39a2d57..dbb2483 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -3530,7 +3530,7 @@ static int write_all_supers(struct btrfs_root
> *root, int max_mirrors)
>
>          if (do_barriers) {
>                  ret = barrier_all_devices(root->fs_info);
> -               if (ret) {
> +               if (ret < 0) {
>                          mutex_unlock(
>
> &root->fs_info->fs_devices->device_list_mutex);
>                          btrfs_std_error(root->fs_info, ret,
>
>

Sorry, I didn't got the point here.

There should be no difference between ret and ret < 0,
as barrier_all_devices() will only return -EIO or 0.

Or I missed something?

Thanks,
Qu

>
>
> On 09/21/2015 10:10 AM, Qu Wenruo wrote:
>> The last user of num_tolerated_disk_barrier_failures is
>> barrier_all_devices().
>> But it's can be easily changed to new per-chunk degradable check
>> framework.
>>
>> Now btrfs_device will have two extra members, representing send/wait
>> error, set at write_dev_flush() time.
>> And then check it in a similar but more accurate behavior than old code.
>>
>> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
>> ---
>>   fs/btrfs/disk-io.c | 13 +++++--------
>>   fs/btrfs/volumes.c |  6 +++++-
>>   fs/btrfs/volumes.h |  4 ++++
>>   3 files changed, 14 insertions(+), 9 deletions(-)
>>
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index d64299f..7cd94e7 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -3400,8 +3400,6 @@ static int barrier_all_devices(struct
>> btrfs_fs_info *info)
>>   {
>>       struct list_head *head;
>>       struct btrfs_device *dev;
>> -    int errors_send = 0;
>> -    int errors_wait = 0;
>>       int ret;
>>
>>       /* send down all the barriers */
>> @@ -3410,7 +3408,7 @@ static int barrier_all_devices(struct
>> btrfs_fs_info *info)
>>           if (dev->missing)
>>               continue;
>>           if (!dev->bdev) {
>> -            errors_send++;
>> +            dev->err_send = 1;
>>               continue;
>>           }
>>           if (!dev->in_fs_metadata || !dev->writeable)
>> @@ -3418,7 +3416,7 @@ static int barrier_all_devices(struct
>> btrfs_fs_info *info)
>>
>>           ret = write_dev_flush(dev, 0);
>>           if (ret)
>> -            errors_send++;
>> +            dev->err_send = 1;
>>       }
>>
>>       /* wait for all the barriers */
>> @@ -3426,7 +3424,7 @@ static int barrier_all_devices(struct
>> btrfs_fs_info *info)
>>           if (dev->missing)
>>               continue;
>>           if (!dev->bdev) {
>> -            errors_wait++;
>> +            dev->err_wait = 1;
>>               continue;
>>           }
>>           if (!dev->in_fs_metadata || !dev->writeable)
>> @@ -3434,10 +3432,9 @@ static int barrier_all_devices(struct
>> btrfs_fs_info *info)
>>
>>           ret = write_dev_flush(dev, 1);
>>           if (ret)
>> -            errors_wait++;
>> +            dev->err_wait = 1;
>>       }
>> -    if (errors_send > info->num_tolerated_disk_barrier_failures ||
>> -        errors_wait > info->num_tolerated_disk_barrier_failures)
>> +    if (btrfs_check_degradable(info, info->sb->s_flags) < 0)
>>           return -EIO;
>>       return 0;
>>   }
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index f1ef215..88266fa 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -6945,8 +6945,12 @@ int btrfs_check_degradable(struct btrfs_fs_info
>> *fs_info, unsigned flags)
>>               btrfs_get_num_tolerated_disk_barrier_failures(
>>                       map->type);
>>           for (i = 0; i < map->num_stripes; i++) {
>> -            if (map->stripes[i].dev->missing)
>> +            if (map->stripes[i].dev->missing ||
>> +                map->stripes[i].dev->err_wait ||
>> +                map->stripes[i].dev->err_send)
>>                   missing++;
>> +            map->stripes[i].dev->err_wait = 0;
>> +            map->stripes[i].dev->err_send = 0;
>>           }
>>           if (missing > max_tolerated) {
>>               ret = -EIO;
>> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
>> index fe758df..cd02556 100644
>> --- a/fs/btrfs/volumes.h
>> +++ b/fs/btrfs/volumes.h
>> @@ -76,6 +76,10 @@ struct btrfs_device {
>>       int can_discard;
>>       int is_tgtdev_for_dev_replace;
>>
>> +    /* for barrier_all_devices() check */
>> +    int err_send;
>> +    int err_wait;
>> +
>>   #ifdef __BTRFS_NEED_DEVICE_DATA_ORDERED
>>       seqcount_t data_seqcount;
>>   #endif
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/5] btrfs: Allow barrier_all_devices to do per-chunk device check
  2015-10-30 11:41     ` Qu Wenruo
@ 2015-10-30 23:52       ` Anand Jain
  0 siblings, 0 replies; 17+ messages in thread
From: Anand Jain @ 2015-10-30 23:52 UTC (permalink / raw)
  To: Qu Wenruo, Qu Wenruo; +Cc: linux-btrfs


On 10/30/2015 07:41 PM, Qu Wenruo wrote:
>
>
> 在 2015年10月30日 16:32, Anand Jain 写道:
>>
>>
>> Qu,
>>
>>   We shouldn't mark FS readonly when chunks are degradable.
>>   As below.
>>
>> Thanks, Anand
>>
>>
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index 39a2d57..dbb2483 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -3530,7 +3530,7 @@ static int write_all_supers(struct btrfs_root
>> *root, int max_mirrors)
>>
>>          if (do_barriers) {
>>                  ret = barrier_all_devices(root->fs_info);
>> -               if (ret) {
>> +               if (ret < 0) {
>>                          mutex_unlock(
>>
>> &root->fs_info->fs_devices->device_list_mutex);
>>                          btrfs_std_error(root->fs_info, ret,
>>
>>
>
> Sorry, I didn't got the point here.
>
> There should be no difference between ret and ret < 0,
> as barrier_all_devices() will only return -EIO or 0.

  oh sorry. you are right. I missed that point.

Thanks, Anand


> Thanks,
> Qu
>
>>
>>
>> On 09/21/2015 10:10 AM, Qu Wenruo wrote:
>>> The last user of num_tolerated_disk_barrier_failures is
>>> barrier_all_devices().
>>> But it's can be easily changed to new per-chunk degradable check
>>> framework.
>>>
>>> Now btrfs_device will have two extra members, representing send/wait
>>> error, set at write_dev_flush() time.
>>> And then check it in a similar but more accurate behavior than old code.
>>>
>>> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
>>> ---
>>>   fs/btrfs/disk-io.c | 13 +++++--------
>>>   fs/btrfs/volumes.c |  6 +++++-
>>>   fs/btrfs/volumes.h |  4 ++++
>>>   3 files changed, 14 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>>> index d64299f..7cd94e7 100644
>>> --- a/fs/btrfs/disk-io.c
>>> +++ b/fs/btrfs/disk-io.c
>>> @@ -3400,8 +3400,6 @@ static int barrier_all_devices(struct
>>> btrfs_fs_info *info)
>>>   {
>>>       struct list_head *head;
>>>       struct btrfs_device *dev;
>>> -    int errors_send = 0;
>>> -    int errors_wait = 0;
>>>       int ret;
>>>
>>>       /* send down all the barriers */
>>> @@ -3410,7 +3408,7 @@ static int barrier_all_devices(struct
>>> btrfs_fs_info *info)
>>>           if (dev->missing)
>>>               continue;
>>>           if (!dev->bdev) {
>>> -            errors_send++;
>>> +            dev->err_send = 1;
>>>               continue;
>>>           }
>>>           if (!dev->in_fs_metadata || !dev->writeable)
>>> @@ -3418,7 +3416,7 @@ static int barrier_all_devices(struct
>>> btrfs_fs_info *info)
>>>
>>>           ret = write_dev_flush(dev, 0);
>>>           if (ret)
>>> -            errors_send++;
>>> +            dev->err_send = 1;
>>>       }
>>>
>>>       /* wait for all the barriers */
>>> @@ -3426,7 +3424,7 @@ static int barrier_all_devices(struct
>>> btrfs_fs_info *info)
>>>           if (dev->missing)
>>>               continue;
>>>           if (!dev->bdev) {
>>> -            errors_wait++;
>>> +            dev->err_wait = 1;
>>>               continue;
>>>           }
>>>           if (!dev->in_fs_metadata || !dev->writeable)
>>> @@ -3434,10 +3432,9 @@ static int barrier_all_devices(struct
>>> btrfs_fs_info *info)
>>>
>>>           ret = write_dev_flush(dev, 1);
>>>           if (ret)
>>> -            errors_wait++;
>>> +            dev->err_wait = 1;
>>>       }
>>> -    if (errors_send > info->num_tolerated_disk_barrier_failures ||
>>> -        errors_wait > info->num_tolerated_disk_barrier_failures)
>>> +    if (btrfs_check_degradable(info, info->sb->s_flags) < 0)
>>>           return -EIO;
>>>       return 0;
>>>   }
>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>> index f1ef215..88266fa 100644
>>> --- a/fs/btrfs/volumes.c
>>> +++ b/fs/btrfs/volumes.c
>>> @@ -6945,8 +6945,12 @@ int btrfs_check_degradable(struct btrfs_fs_info
>>> *fs_info, unsigned flags)
>>>               btrfs_get_num_tolerated_disk_barrier_failures(
>>>                       map->type);
>>>           for (i = 0; i < map->num_stripes; i++) {
>>> -            if (map->stripes[i].dev->missing)
>>> +            if (map->stripes[i].dev->missing ||
>>> +                map->stripes[i].dev->err_wait ||
>>> +                map->stripes[i].dev->err_send)
>>>                   missing++;
>>> +            map->stripes[i].dev->err_wait = 0;
>>> +            map->stripes[i].dev->err_send = 0;
>>>           }
>>>           if (missing > max_tolerated) {
>>>               ret = -EIO;
>>> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
>>> index fe758df..cd02556 100644
>>> --- a/fs/btrfs/volumes.h
>>> +++ b/fs/btrfs/volumes.h
>>> @@ -76,6 +76,10 @@ struct btrfs_device {
>>>       int can_discard;
>>>       int is_tgtdev_for_dev_replace;
>>>
>>> +    /* for barrier_all_devices() check */
>>> +    int err_send;
>>> +    int err_wait;
>>> +
>>>   #ifdef __BTRFS_NEED_DEVICE_DATA_ORDERED
>>>       seqcount_t data_seqcount;
>>>   #endif
>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/5] Btrfs: Per-chunk degradable check
  2015-09-21  2:10 [PATCH 0/5] Btrfs: Per-chunk degradable check Qu Wenruo
                   ` (4 preceding siblings ...)
  2015-09-21  2:10 ` [PATCH 5/5] btrfs: Cleanup num_tolerated_disk_barrier_failures Qu Wenruo
@ 2015-11-05  0:57 ` Qu Wenruo
  5 siblings, 0 replies; 17+ messages in thread
From: Qu Wenruo @ 2015-11-05  0:57 UTC (permalink / raw)
  To: linux-btrfs, Chris Mason; +Cc: Anand Jain

Any new comment?

And to Chris, is it possible to pick these patchset for 4.4?

IMHO it's small enough (less than 100 lines) and didn't change degraded 
mount behavior much.

Thanks,
Qu

Qu Wenruo wrote on 2015/09/21 10:10 +0800:
> Btrfs currently uses num_tolerated_disk_barrier_failures to do global
> check for tolerated missing device.
>
> Although the one-size-fit-all solution is quite safe, it's too strict if
> data and metadata has different duplication level.
>
> For example, if one use Single data and RAID1 metadata for 2 disks, it
> means any missing device will make the fs unable to be degraded mounted.
>
> But in fact, some times all single chunks may be in the existing device
> and in that case, we should allow it to be rw degraded mounted.
>
> Such case can be easily reproduced using the following script:
>   # mkfs.btrfs -f -m raid1 -d sing /dev/sdb /dev/sdc
>   # wipefs -f /dev/sdc
>   # mount /dev/sdb -o degraded,rw
>
> If using btrfs-debug-tree to check /dev/sdb, one should find that the
> data chunk is only in sdb, so in fact it should allow degraded mount.
>
> This patchset will introduce a new per-chunk degradable check for btrfs,
> allow above case to succeed, and it's quite small anyway.
>
> Also, it provides the possibility for later enhancement, like
> automatically add 'degraded' mount option if possible.
>
> Cc: Anand Jain <anand.jain@oracle.com>
>
> Qu Wenruo (5):
>    btrfs: Introduce a new function to check if all chunks a OK for
>      degraded     mount
>    btrfs: Do per-chunk check for mount time check
>    btrfs: Do per-chunk degraded check for remount
>    btrfs: Allow barrier_all_devices to do per-chunk device check
>    btrfs: Cleanup num_tolerated_disk_barrier_failures
>
>   fs/btrfs/ctree.h   |  2 --
>   fs/btrfs/disk-io.c | 87 ++++++++++--------------------------------------------
>   fs/btrfs/disk-io.h |  2 --
>   fs/btrfs/super.c   | 11 ++++---
>   fs/btrfs/volumes.c | 84 +++++++++++++++++++++++++++++++++++++++++-----------
>   fs/btrfs/volumes.h |  5 ++++
>   6 files changed, 94 insertions(+), 97 deletions(-)
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH] btrfs: fix btrfs_check_degradable() to free extent map
  2015-09-21  2:10 ` [PATCH 1/5] btrfs: Introduce a new function to check if all chunks a OK for degraded mount Qu Wenruo
@ 2016-04-18  8:47   ` Anand Jain
  0 siblings, 0 replies; 17+ messages in thread
From: Anand Jain @ 2016-04-18  8:47 UTC (permalink / raw)
  To: quwenruo; +Cc: linux-btrfs

I am making the following changes...

--------------- 8< ----------------------
Free the extent map and realign the map_tree read_lock
to fix the following..

kernel: BUG btrfs_extent_map (Tainted: G    B          ): Objects remaining in btrfs_extent_map on __kmem_cache_shutdown()
kernel: -----------------------------------------------------------------------------
kernel: INFO: Slab 0xffffea0001e4ef80 objects=28 used=9 fp=0xffff8800793be1b0 flags=0x1fffe0000000080
kernel: CPU: 0 PID: 5397 Comm: modprobe Tainted: G    B           4.6.0-rc2asj+ #5
kernel: Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
kernel:  0000000000000086 00000000c6b0aa23 ffff880035f3bcf8 ffffffff8141b7e3
kernel:  ffffea0001e4ef80 ffff880036994a00 ffff880035f3bdd0 ffffffff81217310
kernel:  0000000000000020 ffff880035f3bde0 ffff880035f3bd90 656a624f00000000
kernel: Call Trace:
kernel:  [<ffffffff8141b7e3>] dump_stack+0x85/0xc2
kernel:  [<ffffffff81217310>] slab_err+0xb0/0xe0
kernel:  [<ffffffff8121c109>] ? __kmalloc+0x259/0x3a0
kernel:  [<ffffffff8121f34c>] ? __kmem_cache_shutdown+0x14c/0x380
kernel:  [<ffffffff8121f36b>] __kmem_cache_shutdown+0x16b/0x380
kernel:  [<ffffffff811de5fb>] kmem_cache_destroy+0x1fb/0x2b0
kernel:  [<ffffffffa055be55>] extent_map_exit+0x15/0x20 [btrfs]
kernel:  [<ffffffffa05d8012>] exit_btrfs_fs+0x27/0x1015 [btrfs]
kernel:  [<ffffffff811365e3>] SyS_delete_module+0x1b3/0x260
kernel:  [<ffffffff810036ba>] ? exit_to_usermode_loop+0x6a/0xc0
kernel:  [<ffffffff8100301b>] ? trace_hardirqs_on_thunk+0x1b/0x1d
kernel:  [<ffffffff816a023c>] entry_SYSCALL_64_fastpath+0x1f/0xbd
kernel: INFO: Object 0xffff8800793be240 @offset=576
kernel: INFO: Object 0xffff8800793be2d0 @offset=720
kernel: INFO: Object 0xffff8800793be3f0 @offset=1008
kernel: INFO: Object 0xffff8800793be480 @offset=1152
kernel: INFO: Object 0xffff8800793be7e0 @offset=2016
kernel: INFO: Object 0xffff8800793be870 @offset=2160
kernel: INFO: Object 0xffff8800793be900 @offset=2304
kernel: INFO: Object 0xffff8800793beab0 @offset=2736
kernel: INFO: Object 0xffff8800793bebd0 @offset=3024

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 fs/btrfs/volumes.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 91d094ae2a10..36398c9d34eb 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -7105,6 +7105,7 @@ int btrfs_check_degradable(struct btrfs_fs_info *fs_info, unsigned flags)
 
 	read_lock(&map_tree->map_tree.lock);
 	em = lookup_extent_mapping(&map_tree->map_tree, 0, (u64)(-1));
+	read_unlock(&map_tree->map_tree.lock);
 	/* No any chunk? Should be a huge bug */
 	if (!em) {
 		ret = -ENOENT;
@@ -7143,11 +7144,14 @@ int btrfs_check_degradable(struct btrfs_fs_info *fs_info, unsigned flags)
 		 * Alwasy search range [next_start, (u64)-1) to find the next
 		 * chunk map
 		 */
+		free_extent_map(em);
+		read_lock(&map_tree->map_tree.lock);
 		em = lookup_extent_mapping(&map_tree->map_tree, next_start,
 					   (u64)(-1) - next_start);
+		read_unlock(&map_tree->map_tree.lock);
 	}
 out:
-	read_unlock(&map_tree->map_tree.lock);
+	free_extent_map(em);
 	return ret;
 }
 
-- 
2.7.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2016-04-18  8:48 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-09-21  2:10 [PATCH 0/5] Btrfs: Per-chunk degradable check Qu Wenruo
2015-09-21  2:10 ` [PATCH 1/5] btrfs: Introduce a new function to check if all chunks a OK for degraded mount Qu Wenruo
2016-04-18  8:47   ` [PATCH] btrfs: fix btrfs_check_degradable() to free extent map Anand Jain
2015-09-21  2:10 ` [PATCH 2/5] btrfs: Do per-chunk check for mount time check Qu Wenruo
2015-09-25  7:05   ` Anand Jain
2015-09-21  2:10 ` [PATCH 3/5] btrfs: Do per-chunk degraded check for remount Qu Wenruo
2015-09-25  6:54   ` Anand Jain
2015-09-25  8:08     ` Qu Wenruo
2015-09-25  8:30       ` Anand Jain
2015-09-25  8:34         ` Qu Wenruo
2015-09-25  8:24   ` [PATCH 1/1] " Anand Jain
2015-09-21  2:10 ` [PATCH 4/5] btrfs: Allow barrier_all_devices to do per-chunk device check Qu Wenruo
2015-10-30  8:32   ` Anand Jain
2015-10-30 11:41     ` Qu Wenruo
2015-10-30 23:52       ` Anand Jain
2015-09-21  2:10 ` [PATCH 5/5] btrfs: Cleanup num_tolerated_disk_barrier_failures Qu Wenruo
2015-11-05  0:57 ` [PATCH 0/5] Btrfs: Per-chunk degradable check Qu Wenruo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).