* [PATCH 0/2] [RFC] btrfs: create degraded-RAID1 chunks
From: Anand Jain @ 2016-04-28 3:06 UTC (permalink / raw)
To: linux-btrfs; +Cc: dsterba, clm
From the comments that commit [1] deleted:
- /*
- * we add in the count of missing devices because we want
- * to make sure that any RAID levels on a degraded FS
- * continue to be honored.
- *
it appears to me that automatic reduced-chunk allocation
when RAID1 is degraded was not in the original design.
The commit also introduced unpleasant behavior, such as
automatically allocating single chunks when RAID1 is mounted
in degraded mode, which then hinders any further degraded
RAID1 mount.
For example:
mkfs.btrfs -f -d raid1 -m raid1 /dev/sdc /dev/sdd
modprobe -r btrfs && modprobe btrfs
mount -o degraded /dev/sdc /btrfs
btrfs fi df /btrfs
Data, RAID1: total=409.56MiB, used=64.00KiB
Data, single: total=416.00MiB, used=128.00KiB
System, RAID1: total=8.00MiB, used=16.00KiB
System, single: total=32.00MiB, used=0.00B
Metadata, RAID1: total=409.56MiB, used=112.00KiB
Metadata, single: total=256.00MiB, used=0.00B
GlobalReserve, single: total=16.00MiB, used=0.00B
mount -o degraded /dev/sdc /btrfs <-- fails
So I am proposing to revert this commit [1].
That leaves the original issue to fix, namely that chunk allocation
fails when RAID1 is degraded. The cause appears to be that the
devs_min attribute for RAID1 was set wrongly. Correcting it also
means it is time to fix the RAID1 fixmes in __btrfs_alloc_chunk();
patch [2] does that and is up for review.
Lightly tested; review comments are welcome. Thanks.
[1]
------------------------------
commit 95669976bd7d30ae265db938ecb46a6b7f8cb893
Btrfs: don't consider the missing device when allocating new chunks
The original code allocated new chunks based on the number of writable
devices plus missing devices, to make sure that any RAID levels on a
degraded FS continue to be honored, but it introduced a problem:
it stopped us from allocating new chunks. The steps to reproduce are
as follows:
# mkfs.btrfs -m raid1 -d raid1 -f <dev0> <dev1>
# mkfs.btrfs -f <dev1> //Removing <dev1> from the original fs
# mount -o degraded <dev0> <mnt>
# dd if=/dev/null of=<mnt>/tmpfile bs=1M
This is because we allocate new chunks only on the writable devices;
if we take the number of missing devices into account and want to
allocate new chunks with a higher RAID level, we will fail because we
do not have enough writable devices. Fix it by ignoring the number
of missing devices when allocating new chunks.
-----------------
[2]
[PATCH] btrfs: create degraded-RAID1 chunks
Anand Jain (2):
btrfs: create degraded-RAID1 chunks
revert: Btrfs: don't consider the missing device when allocating new
chunks
fs/btrfs/extent-tree.c | 17 +++++++++++++++--
fs/btrfs/volumes.c | 38 +++++++++++++++++++++++++++++++++-----
2 files changed, 48 insertions(+), 7 deletions(-)
--
2.7.0
* [PATCH 1/2] btrfs: create degraded-RAID1 chunks
From: Anand Jain @ 2016-04-28 3:06 UTC (permalink / raw)
To: linux-btrfs; +Cc: dsterba, clm
When RAID1 is degraded, new chunks should be created as
degraded-RAID1 chunks instead of single chunks.
The bug is that devs_min for raid1 was set wrongly: it should
be 1 instead of 2.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
fs/btrfs/volumes.c | 38 +++++++++++++++++++++++++++++++++-----
1 file changed, 33 insertions(+), 5 deletions(-)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index e2b54d546b7c..8b87ed6eb381 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -56,7 +56,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = {
.sub_stripes = 1,
.dev_stripes = 1,
.devs_max = 2,
- .devs_min = 2,
+ .devs_min = 1,
.tolerated_failures = 1,
.devs_increment = 2,
.ncopies = 2,
@@ -4513,6 +4513,7 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
int i;
int j;
int index;
+ int missing_dev = 0;
BUG_ON(!alloc_profile_is_valid(type, 0));
@@ -4627,14 +4628,35 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
sort(devices_info, ndevs, sizeof(struct btrfs_device_info),
btrfs_cmp_device_info, NULL);
- /* round down to number of usable stripes */
- ndevs -= ndevs % devs_increment;
- if (ndevs < devs_increment * sub_stripes || ndevs < devs_min) {
+ /*
+ * For raid1 and raid10 in degraded mode, ndevs (== devs_min) can be
+ * less than x = (devs_increment * sub_stripes).
+ * For the rest of the RAID profiles, x is 1.
+ */
+ if (ndevs >= (devs_increment * sub_stripes)) {
+ /* round down to number of usable stripes */
+ ndevs -= ndevs % devs_increment;
+ }
+
+ /*
+ * Fix devs_min for RAID10
+ */
+ if (ndevs < devs_min) {
+ /* todo: look for better error reporting */
ret = -ENOSPC;
goto error;
}
+ /*
+ * For RAID1 and RAID10, if there are not enough devices for the mirror
+ * or stripe, let the chunk be created in degraded mode.
+ * For the rest of the RAID profiles this is fine, as
+ * (devs_increment * sub_stripes) is 1.
+ */
+ if (ndevs < (devs_increment * sub_stripes))
+ missing_dev = (devs_increment * sub_stripes) - ndevs;
+
if (devs_max && ndevs > devs_max)
ndevs = devs_max;
/*
@@ -4645,11 +4667,17 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
num_stripes = ndevs * dev_stripes;
/*
- * this will have to be fixed for RAID1 and RAID10 over
+ * This will have to be fixed for RAID1 and RAID10 over
* more drives
*/
data_stripes = num_stripes / ncopies;
+ if (type & BTRFS_BLOCK_GROUP_RAID1) {
+ if (missing_dev) {
+ /* For RAID1, data_stripes is always 1 */
+ data_stripes = num_stripes;
+ }
+ }
if (type & BTRFS_BLOCK_GROUP_RAID5) {
raid_stripe_len = find_raid56_stripe_len(ndevs - 1,
btrfs_super_stripesize(info->super_copy));
--
2.7.0
* [PATCH 2/2] revert: Btrfs: don't consider the missing device when allocating new chunks
From: Anand Jain @ 2016-04-28 3:06 UTC (permalink / raw)
To: linux-btrfs; +Cc: dsterba, clm
This patch reverts the commit
95669976bd7d30ae265db938ecb46a6b7f8cb893
Btrfs: don't consider the missing device when allocating new chunks
The original code was correct, as the deleted comment explains:
/*
* we add in the count of missing devices because we want
* to make sure that any RAID levels on a degraded FS
* continue to be honored.
*/
The RFC patch "btrfs: create degraded-RAID1 chunks" takes care of
the original bug.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
fs/btrfs/extent-tree.c | 17 +++++++++++++++--
1 file changed, 15 insertions(+), 2 deletions(-)
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 53e12977bfd0..bf60da1020b7 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3968,12 +3968,19 @@ static u64 get_restripe_target(struct btrfs_fs_info *fs_info, u64 flags)
*/
static u64 btrfs_reduce_alloc_profile(struct btrfs_root *root, u64 flags)
{
- u64 num_devices = root->fs_info->fs_devices->rw_devices;
+ u64 num_devices;
u64 target;
u64 raid_type;
u64 allowed = 0;
/*
+ * we add in the count of missing devices because we want
+ * to make sure that any RAID levels on a degraded FS
+ * continue to be honored.
+ */
+ num_devices = root->fs_info->fs_devices->rw_devices +
+ root->fs_info->fs_devices->missing_devices;
+ /*
* see if restripe for this chunk_type is in progress, if so
* try to reduce to the target profile
*/
@@ -9146,7 +9153,13 @@ static u64 update_block_group_flags(struct btrfs_root *root, u64 flags)
if (stripped)
return extended_to_chunk(stripped);
- num_devices = root->fs_info->fs_devices->rw_devices;
+ /*
+ * we add in the count of missing devices because we want
+ * to make sure that any RAID levels on a degraded FS
+ * continue to be honored.
+ */
+ num_devices = root->fs_info->fs_devices->rw_devices +
+ root->fs_info->fs_devices->missing_devices;
stripped = BTRFS_BLOCK_GROUP_RAID0 |
BTRFS_BLOCK_GROUP_RAID5 | BTRFS_BLOCK_GROUP_RAID6 |
--
2.7.0
* Re: [PATCH 0/2] [RFC] btrfs: create degraded-RAID1 chunks
From: David Sterba @ 2016-04-29 16:37 UTC (permalink / raw)
To: Anand Jain; +Cc: linux-btrfs, clm
On Thu, Apr 28, 2016 at 11:06:18AM +0800, Anand Jain wrote:
> From the comments that commit[1] deleted
>
> - /*
> - * we add in the count of missing devices because we want
> - * to make sure that any RAID levels on a degraded FS
> - * continue to be honored.
> - *
>
> it appears to me that automatic reduced-chunk allocation
> when RAID1 is degraded was not in the original design.
>
> The commit also introduced unpleasant behavior, such as
> automatically allocating single chunks when RAID1 is mounted
> in degraded mode, which then hinders any further degraded
> RAID1 mount.
Agreed. As the automatic conversion cannot be turned off, it causes some
surprises. We've opposed such things in the past, so I'm for not doing
the 'single' allocations. Independently, I got feedback from a user who
liked the proposed change.
> That leaves the original issue to fix, namely that chunk allocation
> fails when RAID1 is degraded. The cause appears to be that the
> devs_min attribute for RAID1 was set wrongly. Correcting it also
> means it is time to fix the RAID1 fixmes in __btrfs_alloc_chunk();
> patch [2] does that and is up for review.
This means we'd allow full writes to a degraded raid1 filesystem. This
can bring surprises as well. The question is what to do if the device
pops out, some writes happen, and then is added.
One option is to set some bit in the degraded filesystem that degraded
writes happened. After that, mounting the whole filesystem would
recommend running scrub before dropping the bit. Forcing a read-only
mount here would be similar to read-only degraded mount, so I guess we'd
have to somehow deal with the missing writes.
I haven't thought about all details, the raid1 auto-repair can handle
corrupted data, I think missing metadata should be handled as well and
repaired.
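For illustration only, here is a minimal sketch of what such a marker
could look like, assuming a hypothetical BTRFS_FS_STATE_DEGRADED_WRITES
bit in fs_info->fs_state; the bit name, its value and the hook points
are assumptions for discussion, not existing btrfs code:

	#include "ctree.h"	/* struct btrfs_fs_info, btrfs_test_opt() */

	/* Placeholder value; a real patch would claim the next free
	 * BTRFS_FS_STATE_* bit in ctree.h. */
	#define BTRFS_FS_STATE_DEGRADED_WRITES	4

	/* Would be called when a chunk is allocated while mounted degraded. */
	static void btrfs_mark_degraded_writes(struct btrfs_fs_info *fs_info)
	{
		if (btrfs_test_opt(fs_info->tree_root, DEGRADED))
			set_bit(BTRFS_FS_STATE_DEGRADED_WRITES,
				&fs_info->fs_state);
	}

	/* Would be called at mount time once all devices are present again. */
	static void btrfs_check_degraded_writes(struct btrfs_fs_info *fs_info)
	{
		if (test_bit(BTRFS_FS_STATE_DEGRADED_WRITES, &fs_info->fs_state))
			btrfs_warn(fs_info,
				   "degraded writes detected, please run scrub");
	}

The bit would only be dropped after the recommended scrub (or balance)
completes successfully.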
* Re: [PATCH 1/2] btrfs: create degraded-RAID1 chunks
From: David Sterba @ 2016-04-29 16:42 UTC (permalink / raw)
To: Anand Jain; +Cc: linux-btrfs, clm
On Thu, Apr 28, 2016 at 11:06:19AM +0800, Anand Jain wrote:
> When RAID1 is degraded, new chunks should be created as
> degraded-RAID1 chunks instead of single chunks.
>
> The bug is that devs_min for raid1 was set wrongly: it should
> be 1 instead of 2.
>
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> ---
> fs/btrfs/volumes.c | 38 +++++++++++++++++++++++++++++++++-----
> 1 file changed, 33 insertions(+), 5 deletions(-)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index e2b54d546b7c..8b87ed6eb381 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -56,7 +56,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = {
> .sub_stripes = 1,
> .dev_stripes = 1,
> .devs_max = 2,
> - .devs_min = 2,
> + .devs_min = 1,
I think we should introduce another way to determine the lower limit
for degraded mounts. We need the proper raidX constraints and should
use the degraded limits only in the case of a degraded mount.
> .tolerated_failures = 1,
Which is exactly the tolerated_failures:
degraded_devs_min == devs_min - tolerated_failures
which works for all raid levels with redundancy.
> .devs_increment = 2,
> .ncopies = 2,
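A minimal sketch of how that degraded lower limit could be derived from
the existing btrfs_raid_attr fields; the helper name, its signature and
the 'degraded' parameter are illustrative assumptions, not an existing
helper:

	#include "volumes.h"	/* btrfs_raid_array, struct btrfs_raid_attr */

	/*
	 * Sketch: relax devs_min by tolerated_failures when the filesystem
	 * is mounted degraded, as suggested above.
	 */
	static int btrfs_devs_min_for_mount(int raid_index, bool degraded)
	{
		/* raid_index is e.g. BTRFS_RAID_RAID1 */
		const struct btrfs_raid_attr *attr = &btrfs_raid_array[raid_index];
		int devs_min = attr->devs_min;

		if (degraded)
			devs_min -= attr->tolerated_failures;
		if (devs_min < 1)
			devs_min = 1;

		return devs_min;
	}

__btrfs_alloc_chunk() would then compare ndevs against this value instead
of the raw devs_min whenever the DEGRADED mount option is set.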
* Re: [PATCH 0/2] [RFC] btrfs: create degraded-RAID1 chunks
From: Anand Jain @ 2016-05-02 4:12 UTC (permalink / raw)
To: dsterba; +Cc: linux-btrfs, clm
On 04/30/2016 12:37 AM, David Sterba wrote:
> On Thu, Apr 28, 2016 at 11:06:18AM +0800, Anand Jain wrote:
>> From the comments that commit[1] deleted
>>
>> - /*
>> - * we add in the count of missing devices because we want
>> - * to make sure that any RAID levels on a degraded FS
>> - * continue to be honored.
>> - *
>>
>> it appears to me that automatic reduced-chunk allocation
>> when RAID1 is degraded was not in the original design.
>>
>> The commit also introduced unpleasant behavior, such as
>> automatically allocating single chunks when RAID1 is mounted
>> in degraded mode, which then hinders any further degraded
>> RAID1 mount.
>
> Agreed. As the automatic conversion cannot be turned off, it causes some
> surprises. We've opposed such things in the past, so I'm for not doing
> the 'single' allocations. Independently, I got feedback from a user who
> liked the proposed change.
yes.
>> That leaves the original issue to fix, namely that chunk allocation
>> fails when RAID1 is degraded. The cause appears to be that the
>> devs_min attribute for RAID1 was set wrongly. Correcting it also
>> means it is time to fix the RAID1 fixmes in __btrfs_alloc_chunk();
>> patch [2] does that and is up for review.
>
> This means we'd allow full writes to a degraded raid1 filesystem. This
> can bring surprises as well. The question is what to do if the device
> pops out, some writes happen, and then is added.
> One option is to set some bit in the degraded filesystem that degraded
> writes happened.
> After that, mounting the whole filesystem would
> recommend running scrub before dropping the bit.
Right, some flag should tell us to fix the degraded chunks. Any
suggestion on naming? (As of now I am calling it the
degraded-chunk-write-flag.)
Also, as of now I think it is OK to fail the mount when it is found
that both of the RAID1 devices have the degraded-chunk-write-flag set
(a split-brain situation), so that the user can mount one of the
devices and freshly btrfs-device-add the other.
> Forcing a read-only
> mount here would be similar to read-only degraded mount, so I guess we'd
> have to somehow deal with the missing writes.
> I haven't thought about all details, the raid1 auto-repair can handle
> corrupted data, I think missing metadata should be handled as well and
> repaired.
I found that raid5 scrub nicely handles the missing writes. However,
RAID1 (and I guess raid10 as well) needs a balance. (I would like to
keep it as it is as of now.) IMO RAID1 should do what RAID5 is doing.
* Re: [PATCH 0/2] [RFC] btrfs: create degraded-RAID1 chunks
From: Duncan @ 2016-05-02 5:30 UTC (permalink / raw)
To: linux-btrfs
Anand Jain posted on Mon, 02 May 2016 12:12:31 +0800 as excerpted:
> On 04/30/2016 12:37 AM, David Sterba wrote:
>> On Thu, Apr 28, 2016 at 11:06:18AM +0800, Anand Jain wrote:
>>> From the comments that commit[1] deleted
>>>
>>> - /*
>>> - * we add in the count of missing devices because we want
>>> - * to make sure that any RAID levels on a degraded FS
>>> - * continue to be honored.
>>> - *
>>>
>>> it appears to me that automatic reduced-chunk allocation when RAID1 is
>>> degraded was not in the original design.
>>>
>>> The commit also introduced unpleasant behavior, such as automatically
>>> allocating single chunks when RAID1 is mounted in degraded mode, which
>>> then hinders any further degraded RAID1 mount.
>>
>> Agreed. As the automatic conversion cannot be turned off, it causes
>> some surprises. We've opposed such things in the past, so I'm for not
>> doing the 'single' allocations. Independently, I got feedback from a
>> user who liked the proposed change.
>
> yes.
Sounds good to this user too. =:^)
>>> That leaves the original issue to fix, namely that chunk allocation
>>> fails when RAID1 is degraded. The cause appears to be that the
>>> devs_min attribute for RAID1 was set wrongly. Correcting it also means
>>> it is time to fix the RAID1 fixmes in __btrfs_alloc_chunk();
>>> patch [2] does that and is up for review.
>>
>> This means we'd allow full writes to a degraded raid1 filesystem. This
>> can bring surprises as well. The question is what to do if the device
>> pops out, some writes happen, and then is added.
>
>> One option is to set some bit in the degraded filesystem that degraded
>> writes happened.
>
>> After that, mounting the whole filesystem would recommend running scrub
>> before dropping the bit.
>
> Right, some flag should tell us to fix the degraded chunks. Any
> suggestion on naming? (As of now I am calling it the
> degraded-chunk-write-flag.)
I almost replied earlier suggesting dirty-raid as the name... of course
that assumes my understanding is right; if not, it may be inappropriate.
> Also, as of now I think it is OK to fail the mount when it is found that
> both of the RAID1 devices have the degraded-chunk-write-flag set (a
> split-brain situation), so that the user can mount one of the devices
> and freshly btrfs-device-add the other.
That idea solves the problem I found in my own early testing, where
mounting each one separately and writing to it could produce
unpredictable results, particularly if the generations happened to match
up.
>> Forcing a read-only mount here would be similar to read-only degraded
>> mount, so I guess we'd have to somehow deal with the missing writes.
>
>> I haven't thought about all details, the raid1 auto-repair can handle
>> corrupted data, I think missing metadata should be handled as well and
>> repaired.
>
>
> I found that raid5 scrub nicely handles the missing writes. However,
> RAID1 (and I guess raid10 as well) needs a balance. (I would like to
> keep it as it is as of now.) IMO RAID1 should do what RAID5 is doing.
?? In my own btrfs raid1 experience, a full scrub fixed things for raid1
as well.
One angle of that experience was suspend to ram, with resume failing to
resume one of the devices because it took too long to come back up,
resulting in a crash soon after resume so I had to reboot and do a scrub
anyway. Ultimately I decided that resume to both devices simply wasn't
reliable enough to continue testing fate, when much of the time I had to
reboot AND do a scrub afterward anyway. So I quit doing suspend to RAM
at all, and simply started shutting down instead. (Back with spinning
rust I really hated to shut down and dump all that cache, but on the ssds
I run now, it's not a big issue either way, and shutdown/startup is fast
enough on systemd on ssd that it's not worth worrying about either, so...)
The other angle was continuing to run a defective and slowly failing ssd
for some time after I realized it was failing, just to see how both the
ssd and btrfs raid1 dealt with the problems. I *know* a decent amount of
those corruptions were metadata, due in part to scrub returning layers of
unverified errors that would after another run be detected and corrected
errors, some of which in turn would produce another layer of unverified
errors, until a repeated scrub eventually came up with no unverified
errors, at which point that run would correct the remaining errors and
further runs would return no errors at all.
So AFAIK, raid1 scrub handles the missing writes well too, as long as
scrub is run again whenever there are unverified errors, so that it can
detect and correct more on the next run, once the parent layer has been
fixed and made that possible.
Tho that was a couple kernel cycles ago now, so it's possible raid1 scrub
regressed since then.
Again, unless I'm misunderstanding what you guys are referring to...
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: [PATCH 1/2] btrfs: create degraded-RAID1 chunks
From: Anand Jain @ 2016-05-02 6:10 UTC (permalink / raw)
To: dsterba, linux-btrfs, clm
Thanks for the comments, more below.
On 04/30/2016 12:42 AM, David Sterba wrote:
> On Thu, Apr 28, 2016 at 11:06:19AM +0800, Anand Jain wrote:
>> When RAID1 is degraded, new chunks should be created as
>> degraded-RAID1 chunks instead of single chunks.
>>
>> The bug is that devs_min for raid1 was set wrongly: it should
>> be 1 instead of 2.
>>
>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>> ---
>> fs/btrfs/volumes.c | 38 +++++++++++++++++++++++++++++++++-----
>> 1 file changed, 33 insertions(+), 5 deletions(-)
>>
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index e2b54d546b7c..8b87ed6eb381 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -56,7 +56,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = {
>> .sub_stripes = 1,
>> .dev_stripes = 1,
>> .devs_max = 2,
>> - .devs_min = 2,
>> + .devs_min = 1,
>
> I think we should introduce another way to determine the lower limit
> for degraded mounts. We need the proper raidX constraints and should
> use the degraded limits only in the case of a degraded mount.
>
>> .tolerated_failures = 1,
>
> Which is exactly the tolerated_failures:
>
> degraded_devs_min == devs_min - tolerated_failures
That is, devs_min is actually healthy_devs_min.
> which works for all raid levels with redundancy.
But not for RAID5 and RAID6.
Here is a (simulation?) tool which gives some ready answers;
I have added devs_min - tolerated_failures to it:
https://github.com/asj/btrfs-raid-cal.git
I see the problem as this:
RAID5/6 devs_min values are in the context of a degraded volume.
RAID1/10 devs_min values are in the context of a healthy volume.
RAID5/6 is correct; we already have devs_max to know the number
of devices in a healthy volume. RAID1's devs_min is wrong, so
it ended up being the same as devs_max.
?
Thanks, Anand
>> .devs_increment = 2,
>> .ncopies = 2,
* Re: [PATCH 1/2] btrfs: create degraded-RAID1 chunks
From: Anand Jain @ 2016-05-10 11:00 UTC (permalink / raw)
To: dsterba, linux-btrfs, clm
>>> - .devs_min = 2,
>>> + .devs_min = 1,
>> I think we should introduce another way to determine the lower limit
>> for degraded mounts. We need the proper raidX constraints and should
>> use the degraded limits only in the case of a degraded mount.
>>> .tolerated_failures = 1,
>> Which is exactly the tolerated_failures:
>>
>> degraded_devs_min == devs_min - tolerated_failures
> That is, devs_min is actually healthy_devs_min.
>> which works for all raid levels with redundancy.
> But not for RAID5 and RAID6.
> Here is a (simulation?) tool which gives some ready answers;
> I have added devs_min - tolerated_failures to it:
>
> https://github.com/asj/btrfs-raid-cal.git
I have copied the state table from the above repo and modified it
to add the above equation.
[x1 = devs_increment * sub_stripes]
[ndevs' = ndevs - ndevs % devs_increment]
[num_stripes = ndevs' * dev_stripes]
[data_stripes = num_stripes / ncopies]
[missing_mirror_dev = x1 - ndevs']
[Y = devs_min - tolerated_failures]
                  R10  R1  DUP  R0  Sn  R5  R6
.sub_stripes    =   2,  1,   1,  1,  1,  1,  1
.dev_stripes    =   1,  1,   2,  1,  1,  1,  1
.devs_max       =   0,  2,   1,  0,  1,  0,  0
.devs_min       =   4,  1,   1,  2,  1,  2,  3
.tolerated_fails=   1,  1,   0,  0,  0,  1,  2
.devs_increment =   2,  2,   1,  1,  1,  1,  1
.ncopies        =   2,  2,   2,  1,  1,  2,  3
x1              =   4,  2,   1,  1,  1,  1,  1
Y               =   3,  0,   1,  2,  1,  1,  1

[ndevs = 9]
ndevs           =   9,  9,   9,  9,  9,  9,  9
ndevs'          =   8,  2,   1,  9,  9,  9,  9
num_stripes     =   8,  2,   2,  9,  1,  9,  9
data_stripes    =   4,  1,   1,  9,  1,  8,  7

[ndevs = tolerated_fails + devs_min]
ndevs           =   5,  2,   1,  2,  1,  3,  5
ndevs'          =   4,  2,   1,  2,  1,  3,  5
num_stripes     =   4,  2,   2,  2,  1,  3,  5
data_stripes    =   2,  1,   1,  2,  1,  1,  1

[ndevs = devs_min]
ndevs           =   3,  1,   1,  2,  1,  2,  3
ndevs'          =   2,  0,   1,  2,  1,  2,  3
num_stripes     =   -,  -,   2,  2,  1,  2,  3
data_stripes    =   -,  -,   1,  2,  1,  1,  1

[ndevs = devs_min, with RAID1 patch fix]
ndevs           =   3,  1,   1,  2,  1,  2,  3
ndevs'          =   3,  1,   1,  2,  1,  2,  3
num_stripes     =   3,  1,   2,  2,  1,  2,  3
data_stripes    =   ?,  1,   1,  2,  1,  1,  1
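For reference, a minimal standalone (userspace) C sketch that just
mirrors the arithmetic used to fill in the table above; the struct and
function names are made up for illustration and are not btrfs code:

	#include <stdio.h>

	/* Per-profile constants, same meaning as in btrfs_raid_array. */
	struct raid_attr {
		const char *name;
		int sub_stripes, dev_stripes, devs_min;
		int tolerated_failures, devs_increment, ncopies;
	};

	/* Apply the formulas above for a given ndevs. */
	static void calc(const struct raid_attr *a, int ndevs)
	{
		int x1 = a->devs_increment * a->sub_stripes;
		int usable = ndevs - ndevs % a->devs_increment;	/* ndevs' */
		int num_stripes = usable * a->dev_stripes;
		int data_stripes = num_stripes / a->ncopies;

		printf("%-3s ndevs'=%d num_stripes=%d data_stripes=%d (x1=%d)\n",
		       a->name, usable, num_stripes, data_stripes, x1);
	}

	int main(void)
	{
		const struct raid_attr raid1 = {
			.name = "R1", .sub_stripes = 1, .dev_stripes = 1,
			.devs_min = 1, .tolerated_failures = 1,
			.devs_increment = 2, .ncopies = 2,
		};

		calc(&raid1, 2);	/* healthy raid1 */
		calc(&raid1, 1);	/* degraded raid1: ndevs' rounds down to 0 */
		return 0;
	}

The degraded raid1 case shows the rounding problem the table marks with
'-': without the patch, ndevs' drops to 0 and no chunk can be allocated.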
> I see the problem as this:
> RAID5/6 devs_min values are in the context of a degraded volume.
> RAID1/10 devs_min values are in the context of a healthy volume.
>
> RAID5/6 is correct; we already have devs_max to know the number
> of devices in a healthy volume. RAID1's devs_min is wrong, so
> it ended up being the same as devs_max.
Thanks, Anand