linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 3 of 4] MD RAID10:  Improve redundancy for 'far' and 'offset' algorithms (part 2)
@ 2012-12-17 18:40 Jonathan Brassow
  2013-01-07  6:04 ` NeilBrown
  0 siblings, 1 reply; 3+ messages in thread
From: Jonathan Brassow @ 2012-12-17 18:40 UTC (permalink / raw)
  To: linux-raid; +Cc: neilb, agk, jbrassow

MD RAID10:  Improve redundancy for 'far' and 'offset' algorithms (part 2)

This patch addresses raid arrays that have a number of devices that cannot
be evenly divided by 'far_copies'.  (E.g. 5 devices, far_copies = 2)  This
case must be handled differently because it causes that last set to be of
a different size than the rest of the sets.  We must compute a new modulo
for this last set so that copied chunks are properly wrapped around.

Example use_far_sets=1, far_copies=2, near_copies=1, devices=5:
                "far" algorithm
        dev1 dev2 dev3 dev4 dev5
	==== ==== ==== ==== ====
	[ A   B ] [ C    D   E ]
        [ G   H ] [ I    J   K ]
                    ...
        [ B   A ] [ E    C   D ] --> nominal set of 2 and last set of 3
        [ H   G ] [ K    I   J ]     []'s show far/offset sets

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>

Index: linux-upstream/drivers/md/raid10.c
===================================================================
--- linux-upstream.orig/drivers/md/raid10.c
+++ linux-upstream/drivers/md/raid10.c
@@ -550,6 +550,13 @@ static void __raid10_find_phys(struct ge
 	sector_t stripe;
 	int dev;
 	int slot = 0;
+	int last_far_set_start, last_far_set_size;
+
+	last_far_set_start = (geo->raid_disks / geo->far_set_size) - 1;
+	last_far_set_start *= geo->far_set_size;
+
+	last_far_set_size = geo->far_set_size;
+	last_far_set_size += (geo->raid_disks % geo->far_set_size);
 
 	/* now calculate first sector/dev */
 	chunk = r10bio->sector >> geo->chunk_shift;
@@ -575,9 +582,16 @@ static void __raid10_find_phys(struct ge
 		for (f = 1; f < geo->far_copies; f++) {
 			set = d / geo->far_set_size;
 			d += geo->near_copies;
-			d %= geo->far_set_size;
-			d += geo->far_set_size * set;
 
+			if ((geo->raid_disks % geo->far_set_size) &&
+			    (d > last_far_set_start)) {
+				d -= last_far_set_start;
+				d %= last_far_set_size;
+				d += last_far_set_start;
+			} else {
+				d %= geo->far_set_size;
+				d += geo->far_set_size * set;
+			}
 			s += geo->stride;
 			r10bio->devs[slot].devnum = d;
 			r10bio->devs[slot].addr = s;
@@ -615,6 +629,18 @@ static sector_t raid10_find_virt(struct
 	struct geom *geo = &conf->geo;
 	int far_set_start = (dev / geo->far_set_size) * geo->far_set_size;
 	int far_set_size = geo->far_set_size;
+	int last_far_set_start;
+
+	if (geo->raid_disks % geo->far_set_size) {
+		last_far_set_start = (geo->raid_disks / geo->far_set_size) - 1;
+		last_far_set_start *= geo->far_set_size;
+
+		if (dev >= last_far_set_start) {
+			far_set_size = geo->far_set_size;
+			far_set_size += (geo->raid_disks % geo->far_set_size);
+			far_set_start = last_far_set_start;
+		}
+	}
 
 	offset = sector & geo->chunk_mask;
 	if (geo->far_offset) {



^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH 3 of 4] MD RAID10:  Improve redundancy for 'far' and 'offset' algorithms (part 2)
  2012-12-17 18:40 [PATCH 3 of 4] MD RAID10: Improve redundancy for 'far' and 'offset' algorithms (part 2) Jonathan Brassow
@ 2013-01-07  6:04 ` NeilBrown
  2013-01-07 20:56   ` Brassow Jonathan
  0 siblings, 1 reply; 3+ messages in thread
From: NeilBrown @ 2013-01-07  6:04 UTC (permalink / raw)
  To: Jonathan Brassow; +Cc: linux-raid, agk

[-- Attachment #1: Type: text/plain, Size: 2570 bytes --]


Hi Jon,
 I really like the way you have coded this - it makes for very neat code.

 I'm happy to take  it as it is, however I'll just note a couple of little
 issues that you could change if you like:

> --- linux-upstream.orig/drivers/md/raid10.c
> +++ linux-upstream/drivers/md/raid10.c
> @@ -550,6 +550,13 @@ static void __raid10_find_phys(struct ge
>  	sector_t stripe;
>  	int dev;
>  	int slot = 0;
> +	int last_far_set_start, last_far_set_size;
> +
> +	last_far_set_start = (geo->raid_disks / geo->far_set_size) - 1;
> +	last_far_set_start *= geo->far_set_size;
> +
> +	last_far_set_size = geo->far_set_size;
> +	last_far_set_size += (geo->raid_disks % geo->far_set_size);
>  
>  	/* now calculate first sector/dev */
>  	chunk = r10bio->sector >> geo->chunk_shift;
> @@ -575,9 +582,16 @@ static void __raid10_find_phys(struct ge
>  		for (f = 1; f < geo->far_copies; f++) {
>  			set = d / geo->far_set_size;
>  			d += geo->near_copies;
> -			d %= geo->far_set_size;
> -			d += geo->far_set_size * set;
>  
> +			if ((geo->raid_disks % geo->far_set_size) &&
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This test isn't needed.  If it is false, then last_far_set_size will be the
same as geo->far_set_size, so both branches will do the same thing.


> +			    (d > last_far_set_start)) {
> +				d -= last_far_set_start;
> +				d %= last_far_set_size;
> +				d += last_far_set_start;
> +			} else {
> +				d %= geo->far_set_size;
> +				d += geo->far_set_size * set;
> +			}
>  			s += geo->stride;
>  			r10bio->devs[slot].devnum = d;
>  			r10bio->devs[slot].addr = s;
> @@ -615,6 +629,18 @@ static sector_t raid10_find_virt(struct
>  	struct geom *geo = &conf->geo;
>  	int far_set_start = (dev / geo->far_set_size) * geo->far_set_size;
>  	int far_set_size = geo->far_set_size;
> +	int last_far_set_start;
> +
> +	if (geo->raid_disks % geo->far_set_size) {

Similarly I don't think this test is needed - just perform the following code
in any case.
Maybe you will think the code is more clear the way it is.  If so, I'm fine
to leave it as it is.

Thanks,
NeilBrown



> +		last_far_set_start = (geo->raid_disks / geo->far_set_size) - 1;
> +		last_far_set_start *= geo->far_set_size;
> +
> +		if (dev >= last_far_set_start) {
> +			far_set_size = geo->far_set_size;
> +			far_set_size += (geo->raid_disks % geo->far_set_size);
> +			far_set_start = last_far_set_start;
> +		}
> +	}
>  
>  	offset = sector & geo->chunk_mask;
>  	if (geo->far_offset) {
> 


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH 3 of 4] MD RAID10:  Improve redundancy for 'far' and 'offset' algorithms (part 2)
  2013-01-07  6:04 ` NeilBrown
@ 2013-01-07 20:56   ` Brassow Jonathan
  0 siblings, 0 replies; 3+ messages in thread
From: Brassow Jonathan @ 2013-01-07 20:56 UTC (permalink / raw)
  To: NeilBrown
  Cc: linux-raid@vger.kernel.org Raid, Alasdair Kergon,
	Jonathan Brassow

Neil,

In the normal case (I hope!) most people will choose to have a device count that is a multiple of 'copies'.  My hope was to avoid doing too much extra work just because we need to allow for the possibility of non-mulitples.  (In fact, I had considered encapsulating the assignment of 'last_far_set_start|size' in a conditional so it would normally be avoided, but didn't.)  Having the conditionals signals (to me) that we are treating the last set differently... because it is different in the non-multiple case.  Without the conditionals, we would treat the last set differently even in the proper 'multiple' case (i.e. the case where the last set is /not/ different from the others).

It is clearer to me the way it is, but it seems like a simple preference and nothing more.  I will happily change it if you like.

 brassow

(sorry for top post)

On Jan 7, 2013, at 12:04 AM, NeilBrown wrote:

> 
> Hi Jon,
> I really like the way you have coded this - it makes for very neat code.
> 
> I'm happy to take  it as it is, however I'll just note a couple of little
> issues that you could change if you like:
> 
>> --- linux-upstream.orig/drivers/md/raid10.c
>> +++ linux-upstream/drivers/md/raid10.c
>> @@ -550,6 +550,13 @@ static void __raid10_find_phys(struct ge
>> 	sector_t stripe;
>> 	int dev;
>> 	int slot = 0;
>> +	int last_far_set_start, last_far_set_size;
>> +
>> +	last_far_set_start = (geo->raid_disks / geo->far_set_size) - 1;
>> +	last_far_set_start *= geo->far_set_size;
>> +
>> +	last_far_set_size = geo->far_set_size;
>> +	last_far_set_size += (geo->raid_disks % geo->far_set_size);
>> 
>> 	/* now calculate first sector/dev */
>> 	chunk = r10bio->sector >> geo->chunk_shift;
>> @@ -575,9 +582,16 @@ static void __raid10_find_phys(struct ge
>> 		for (f = 1; f < geo->far_copies; f++) {
>> 			set = d / geo->far_set_size;
>> 			d += geo->near_copies;
>> -			d %= geo->far_set_size;
>> -			d += geo->far_set_size * set;
>> 
>> +			if ((geo->raid_disks % geo->far_set_size) &&
>                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> This test isn't needed.  If it is false, then last_far_set_size will be the
> same as geo->far_set_size, so both branches will do the same thing.
> 
> 
>> +			    (d > last_far_set_start)) {
>> +				d -= last_far_set_start;
>> +				d %= last_far_set_size;
>> +				d += last_far_set_start;
>> +			} else {
>> +				d %= geo->far_set_size;
>> +				d += geo->far_set_size * set;
>> +			}
>> 			s += geo->stride;
>> 			r10bio->devs[slot].devnum = d;
>> 			r10bio->devs[slot].addr = s;
>> @@ -615,6 +629,18 @@ static sector_t raid10_find_virt(struct
>> 	struct geom *geo = &conf->geo;
>> 	int far_set_start = (dev / geo->far_set_size) * geo->far_set_size;
>> 	int far_set_size = geo->far_set_size;
>> +	int last_far_set_start;
>> +
>> +	if (geo->raid_disks % geo->far_set_size) {
> 
> Similarly I don't think this test is needed - just perform the following code
> in any case.
> Maybe you will think the code is more clear the way it is.  If so, I'm fine
> to leave it as it is.
> 
> Thanks,
> NeilBrown
> 
> 
> 
>> +		last_far_set_start = (geo->raid_disks / geo->far_set_size) - 1;
>> +		last_far_set_start *= geo->far_set_size;
>> +
>> +		if (dev >= last_far_set_start) {
>> +			far_set_size = geo->far_set_size;
>> +			far_set_size += (geo->raid_disks % geo->far_set_size);
>> +			far_set_start = last_far_set_start;
>> +		}
>> +	}
>> 
>> 	offset = sector & geo->chunk_mask;
>> 	if (geo->far_offset) {
>> 
> 


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2013-01-07 20:56 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-12-17 18:40 [PATCH 3 of 4] MD RAID10: Improve redundancy for 'far' and 'offset' algorithms (part 2) Jonathan Brassow
2013-01-07  6:04 ` NeilBrown
2013-01-07 20:56   ` Brassow Jonathan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).