* [PATCH v2 1/5] md/raid10: optimize read_balance() for 'far offset' arrays
2011-06-15 2:01 [PATCH RESEND 0/5] md/raid10 changes Namhyung Kim
@ 2011-06-15 2:02 ` Namhyung Kim
2011-06-15 2:57 ` NeilBrown
2011-06-15 6:51 ` Keld Jørn Simonsen
2011-06-15 2:02 ` [PATCH 2/5] md/raid10: get rid of duplicated conditional expression Namhyung Kim
` (4 subsequent siblings)
5 siblings, 2 replies; 14+ messages in thread
From: Namhyung Kim @ 2011-06-15 2:02 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid
If @conf->far_offset > 0, there is only 1 stripe, so we can treat
the array the same as 'near' arrays.
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
---
drivers/md/raid10.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 6e846688962f..fc56bdd8c3fb 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -531,7 +531,7 @@ retry:
break;
/* for far > 1 always use the lowest address */
- if (conf->far_copies > 1)
+ if (conf->far_copies > 1 && conf->far_offset == 0)
new_distance = r10_bio->devs[slot].addr;
else
new_distance = abs(r10_bio->devs[slot].addr -
--
1.7.5.2
* Re: [PATCH v2 1/5] md/raid10: optimize read_balance() for 'far offset' arrays
2011-06-15 2:02 ` [PATCH v2 1/5] md/raid10: optimize read_balance() for 'far offset' arrays Namhyung Kim
@ 2011-06-15 2:57 ` NeilBrown
2011-06-15 6:51 ` Keld Jørn Simonsen
1 sibling, 0 replies; 14+ messages in thread
From: NeilBrown @ 2011-06-15 2:57 UTC (permalink / raw)
To: Namhyung Kim; +Cc: linux-raid
On Wed, 15 Jun 2011 11:02:00 +0900 Namhyung Kim <namhyung@gmail.com> wrote:
> If @conf->far_offset > 0, there is only 1 stripe, so we can treat
> the array the same as 'near' arrays.
>
> Signed-off-by: Namhyung Kim <namhyung@gmail.com>
> ---
> drivers/md/raid10.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index 6e846688962f..fc56bdd8c3fb 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -531,7 +531,7 @@ retry:
> break;
>
> /* for far > 1 always use the lowest address */
> - if (conf->far_copies > 1)
> + if (conf->far_copies > 1 && conf->far_offset == 0)
> new_distance = r10_bio->devs[slot].addr;
> else
> new_distance = abs(r10_bio->devs[slot].addr -
Hi,
I realise that I said before that this part was OK but having thought about
it some more I am not convinced.
Using the 'distance' from where the head previously was is a fairly poor
heuristic. In some cases it is the best we have, but it still isn't good.
With an 'offset' layout, like with a 'far' layout, sections of the array are
in a RAID0 layout and we know that reading from a RAID0 gives good speed by
algorithmically distributing the reads evenly over all the devices.
Setting new_distance to the 'addr' means that we use the algorithmic
approach to distribute reads. Setting it to the difference between addr and
head_position uses the heuristic approach.
I much prefer the algorithmic (RAID0) approach where it is possible. So I'm
not going to apply this patch. However, if you can demonstrate real speedups
in some realistic test I will reconsider my position.
Thanks,
NeilBrown
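To make the contrast concrete, here is a minimal user-space sketch of the two
selection policies being compared. It is not the kernel's read_balance();
pick_slot(), COPIES and the sample numbers are invented purely for
illustration. The "algorithmic" mode keys the choice off the request address
alone (lowest address wins, RAID0-style spreading), while the "heuristic" mode
keys it off the seek distance from the last known head position.

#include <stdio.h>
#include <stdlib.h>

#define COPIES 2

static int pick_slot(const long addr[], const long head_pos[], int algorithmic)
{
    long best_dist = -1;
    int slot, best = -1;

    for (slot = 0; slot < COPIES; slot++) {
        /* algorithmic: lowest address wins (RAID0-style spreading);
         * heuristic: shortest seek from the last head position wins */
        long d = algorithmic ? addr[slot]
                             : labs(addr[slot] - head_pos[slot]);
        if (best < 0 || d < best_dist) {
            best = slot;
            best_dist = d;
        }
    }
    return best;
}

int main(void)
{
    /* hypothetical addresses of the same data on two copies, plus
     * where each disk's head happens to be at the moment */
    long addr[COPIES]     = { 1000, 9000 };
    long head_pos[COPIES] = { 5000, 9010 };

    printf("algorithmic choice: slot %d\n", pick_slot(addr, head_pos, 1));
    printf("heuristic choice:   slot %d\n", pick_slot(addr, head_pos, 0));
    return 0;
}

With the sample numbers above the two policies pick different slots, which is
exactly the trade-off being discussed here.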
* Re: [PATCH v2 1/5] md/raid10: optimize read_balance() for 'far offset' arrays
2011-06-15 2:02 ` [PATCH v2 1/5] md/raid10: optimize read_balance() for 'far offset' arrays Namhyung Kim
2011-06-15 2:57 ` NeilBrown
@ 2011-06-15 6:51 ` Keld Jørn Simonsen
2011-06-15 12:25 ` Namhyung Kim
1 sibling, 1 reply; 14+ messages in thread
From: Keld Jørn Simonsen @ 2011-06-15 6:51 UTC (permalink / raw)
To: Namhyung Kim; +Cc: Neil Brown, linux-raid
On Wed, Jun 15, 2011 at 11:02:00AM +0900, Namhyung Kim wrote:
> If @conf->far_offset > 0, there is only 1 stripe, so we can treat
> the array the same as 'near' arrays.
does it also work with more than 2 copies - eg 3 copies?
I think the original code just takes the available data blocks with the
lowest address.
Best regards
keld
> Signed-off-by: Namhyung Kim <namhyung@gmail.com>
> ---
> drivers/md/raid10.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index 6e846688962f..fc56bdd8c3fb 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -531,7 +531,7 @@ retry:
> break;
>
> /* for far > 1 always use the lowest address */
> - if (conf->far_copies > 1)
> + if (conf->far_copies > 1 && conf->far_offset == 0)
> new_distance = r10_bio->devs[slot].addr;
> else
> new_distance = abs(r10_bio->devs[slot].addr -
> --
> 1.7.5.2
>
* Re: [PATCH v2 1/5] md/raid10: optimize read_balance() for 'far offset' arrays
2011-06-15 6:51 ` Keld Jørn Simonsen
@ 2011-06-15 12:25 ` Namhyung Kim
2011-06-15 14:35 ` Namhyung Kim
0 siblings, 1 reply; 14+ messages in thread
From: Namhyung Kim @ 2011-06-15 12:25 UTC (permalink / raw)
To: Keld Jørn Simonsen; +Cc: Neil Brown, linux-raid
2011-06-15 (Wed), 08:51 +0200, Keld Jørn Simonsen:
> On Wed, Jun 15, 2011 at 11:02:00AM +0900, Namhyung Kim wrote:
> > If @conf->far_offset > 0, there is only 1 stripe, so we can treat
> > the array the same as 'near' arrays.
>
> does it also work with more than 2 copies - eg 3 copies?
> I think the original code just takes the available data blocks with the
> lowest address.
>
Hi,
Let me clarify this: AFAIK, a 'far offset' array saves redundant data in
the diagonally adjacent chunk/disk, so it can roughly be thought of as a
'raid0' array with reduced size - just ignore the redundant chunks here. It
was my mistake to consider it a 'near' array. :(
Therefore, it makes more sense to distribute reads over the array based
on some criterion - here, the address of the starting sector - like RAID0
does. Now I see that exactly the same goes for the 'far copies' array, so
the original code is correct.
Thanks.
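As a rough illustration (a simplification, not the kernel's
raid10_find_phys()), the small stand-alone program below prints the 'offset'
geometry being described for 3 devices with 2 offset copies, assuming the
usual mdadm 'o2' scheme in which every stripe is written twice, the second
copy shifted by one device in the next chunk row. NDISKS, FC, ROWS and the
mapping formula are invented for this sketch only.

#include <stdio.h>

#define NDISKS 3
#define FC     2        /* offset copies per chunk */
#define ROWS   4        /* chunk rows to print */

int main(void)
{
    int grid[ROWS][NDISKS];
    int ch, copy, row, dev;

    /* place every logical chunk and its offset copy */
    for (ch = 0; ch < (ROWS / FC) * NDISKS; ch++) {
        for (copy = 0; copy < FC; copy++) {
            row = (ch / NDISKS) * FC + copy;
            dev = (ch % NDISKS + copy) % NDISKS;
            grid[row][dev] = ch;
        }
    }

    for (row = 0; row < ROWS; row++) {
        for (dev = 0; dev < NDISKS; dev++)
            printf(" A%d", grid[row][dev]);
        printf("\n");
    }
    return 0;   /* prints: A0 A1 A2 / A2 A0 A1 / A3 A4 A5 / A5 A3 A4 */
}

Ignoring the duplicated rows, the data chunks end up laid out exactly as
RAID0 would lay them out, which is the point made above.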
--
Regards,
Namhyung Kim
* Re: [PATCH v2 1/5] md/raid10: optimize read_balance() for 'far offset' arrays
2011-06-15 12:25 ` Namhyung Kim
@ 2011-06-15 14:35 ` Namhyung Kim
2011-06-15 23:56 ` NeilBrown
0 siblings, 1 reply; 14+ messages in thread
From: Namhyung Kim @ 2011-06-15 14:35 UTC (permalink / raw)
To: Keld Jørn Simonsen; +Cc: Neil Brown, linux-raid
Namhyung Kim <namhyung@gmail.com> writes:
> 2011-06-15 (Wed), 08:51 +0200, Keld Jørn Simonsen:
>> On Wed, Jun 15, 2011 at 11:02:00AM +0900, Namhyung Kim wrote:
>> > If @conf->far_offset > 0, there is only 1 stripe, so we can treat
>> > the array the same as 'near' arrays.
>>
>> does it also work with more than 2 copies - eg 3 copies?
>> I think the original code just takes the available data blocks with the
>> lowest address.
>>
>
> Hi,
>
> Let me clarify this: AFAIK, a 'far offset' array saves redundant data in
> the diagonally adjacent chunk/disk, so it can roughly be thought of as a
> 'raid0' array with reduced size - just ignore the redundant chunks here. It
> was my mistake to consider it a 'near' array. :(
>
I'm confused again. If fo > 0 && fc > 1 && nc > 1 then it turns out to be
a near array with reduced size, no? Does it still need to be treated
as RAID0?
> Therefore, it makes more sense to distribute reads over the array based
> on some criterion - here, the address of the starting sector - like RAID0
> does. Now I see that exactly the same goes for the 'far copies' array, so
> the original code is correct.
>
> Thanks.
* Re: [PATCH v2 1/5] md/raid10: optimize read_balance() for 'far offset' arrays
2011-06-15 14:35 ` Namhyung Kim
@ 2011-06-15 23:56 ` NeilBrown
0 siblings, 0 replies; 14+ messages in thread
From: NeilBrown @ 2011-06-15 23:56 UTC (permalink / raw)
To: Namhyung Kim; +Cc: Keld Jørn Simonsen, linux-raid
On Wed, 15 Jun 2011 23:35:53 +0900 Namhyung Kim <namhyung@gmail.com> wrote:
> Namhyung Kim <namhyung@gmail.com> writes:
> > 2011-06-15 (Wed), 08:51 +0200, Keld Jørn Simonsen:
> >> On Wed, Jun 15, 2011 at 11:02:00AM +0900, Namhyung Kim wrote:
> >> > If @conf->far_offset > 0, there is only 1 stripe, so we can treat
> >> > the array the same as 'near' arrays.
> >>
> >> does it also work with more than 2 copies - eg 3 copies?
> >> I think the original code just takes the available data blocks with the
> >> lowest address.
> >>
> >
> > Hi,
> >
> > Let me clarify this: AFAIK, a 'far offset' array saves redundant data in
> > the diagonally adjacent chunk/disk, so it can roughly be thought of as a
> > 'raid0' array with reduced size - just ignore the redundant chunks here. It
> > was my mistake to consider it a 'near' array. :(
> >
>
> I'm confused again. If fo > 0 && fc > 1 && nc > 1 then it turns out to be
> a near array with reduced size, no? Does it still need to be treated
> as RAID0?
This would be a mix of near and offset. I'm not at all sure what the "best"
read balancing approach would be. But as I don't think anyone would ever
actually use it, I don't think it really matters.
Thanks,
NeilBrown
>
>
> > Therefore, it makes more sense to distribute reads over the array based
> > on some criterion - here, the address of the starting sector - like RAID0
> > does. Now I see that exactly the same goes for the 'far copies' array, so
> > the original code is correct.
> >
> > Thanks.
* [PATCH 2/5] md/raid10: get rid of duplicated conditional expression
2011-06-15 2:01 [PATCH RESEND 0/5] md/raid10 changes Namhyung Kim
2011-06-15 2:02 ` [PATCH v2 1/5] md/raid10: optimize read_balance() for 'far offset' arrays Namhyung Kim
@ 2011-06-15 2:02 ` Namhyung Kim
2011-06-15 2:02 ` [PATCH 3/5] md/raid10: factor out common bio handling code Namhyung Kim
` (3 subsequent siblings)
5 siblings, 0 replies; 14+ messages in thread
From: Namhyung Kim @ 2011-06-15 2:02 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid
Variable 'first' is initialized to zero and updated to @rdev->raid_disk
only if it is non-negative. Thus the condition '>= first' always implies
'>= 0', so the latter is not needed.
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
---
drivers/md/raid10.c | 3 +--
1 files changed, 1 insertions(+), 2 deletions(-)
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index fc56bdd8c3fb..fcb86e86bc31 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1093,8 +1093,7 @@ static int raid10_add_disk(mddev_t *mddev, mdk_rdev_t *rdev)
if (rdev->raid_disk >= 0)
first = last = rdev->raid_disk;
- if (rdev->saved_raid_disk >= 0 &&
- rdev->saved_raid_disk >= first &&
+ if (rdev->saved_raid_disk >= first &&
conf->mirrors[rdev->saved_raid_disk].rdev == NULL)
mirror = rdev->saved_raid_disk;
else
--
1.7.5.2
* [PATCH 3/5] md/raid10: factor out common bio handling code
2011-06-15 2:01 [PATCH RESEND 0/5] md/raid10 changes Namhyung Kim
2011-06-15 2:02 ` [PATCH v2 1/5] md/raid10: optimize read_balance() for 'far offset' arrays Namhyung Kim
2011-06-15 2:02 ` [PATCH 2/5] md/raid10: get rid of duplicated conditional expression Namhyung Kim
@ 2011-06-15 2:02 ` Namhyung Kim
2011-06-15 2:02 ` [PATCH v2 4/5] md/raid10: share pages between read and write bio's during recovery Namhyung Kim
` (2 subsequent siblings)
5 siblings, 0 replies; 14+ messages in thread
From: Namhyung Kim @ 2011-06-15 2:02 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid
When a normal-write or sync-read/write bio completes, we need to
find the disk number the bio belongs to. Factor that common
code out into a separate function.
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
---
drivers/md/raid10.c | 44 +++++++++++++++++++++++---------------------
1 files changed, 23 insertions(+), 21 deletions(-)
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index fcb86e86bc31..a53779ffdf89 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -244,6 +244,23 @@ static inline void update_head_pos(int slot, r10bio_t *r10_bio)
r10_bio->devs[slot].addr + (r10_bio->sectors);
}
+/*
+ * Find the disk number which triggered given bio
+ */
+static int find_bio_disk(conf_t *conf, r10bio_t *r10_bio, struct bio *bio)
+{
+ int slot;
+
+ for (slot = 0; slot < conf->copies; slot++)
+ if (r10_bio->devs[slot].bio == bio)
+ break;
+
+ BUG_ON(slot == conf->copies);
+ update_head_pos(slot, r10_bio);
+
+ return r10_bio->devs[slot].devnum;
+}
+
static void raid10_end_read_request(struct bio *bio, int error)
{
int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
@@ -289,13 +306,10 @@ static void raid10_end_write_request(struct bio *bio, int error)
{
int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
r10bio_t *r10_bio = bio->bi_private;
- int slot, dev;
+ int dev;
conf_t *conf = r10_bio->mddev->private;
- for (slot = 0; slot < conf->copies; slot++)
- if (r10_bio->devs[slot].bio == bio)
- break;
- dev = r10_bio->devs[slot].devnum;
+ dev = find_bio_disk(conf, r10_bio, bio);
/*
* this branch is our 'one mirror IO has finished' event handler:
@@ -316,8 +330,6 @@ static void raid10_end_write_request(struct bio *bio, int error)
*/
set_bit(R10BIO_Uptodate, &r10_bio->state);
- update_head_pos(slot, r10_bio);
-
/*
*
* Let's see if all mirrored write operations have finished
@@ -1173,14 +1185,9 @@ static void end_sync_read(struct bio *bio, int error)
{
r10bio_t *r10_bio = bio->bi_private;
conf_t *conf = r10_bio->mddev->private;
- int i,d;
+ int d;
- for (i=0; i<conf->copies; i++)
- if (r10_bio->devs[i].bio == bio)
- break;
- BUG_ON(i == conf->copies);
- update_head_pos(i, r10_bio);
- d = r10_bio->devs[i].devnum;
+ d = find_bio_disk(conf, r10_bio, bio);
if (test_bit(BIO_UPTODATE, &bio->bi_flags))
set_bit(R10BIO_Uptodate, &r10_bio->state);
@@ -1211,18 +1218,13 @@ static void end_sync_write(struct bio *bio, int error)
r10bio_t *r10_bio = bio->bi_private;
mddev_t *mddev = r10_bio->mddev;
conf_t *conf = mddev->private;
- int i,d;
+ int d;
- for (i = 0; i < conf->copies; i++)
- if (r10_bio->devs[i].bio == bio)
- break;
- d = r10_bio->devs[i].devnum;
+ d = find_bio_disk(conf, r10_bio, bio);
if (!uptodate)
md_error(mddev, conf->mirrors[d].rdev);
- update_head_pos(i, r10_bio);
-
rdev_dec_pending(conf->mirrors[d].rdev, mddev);
while (atomic_dec_and_test(&r10_bio->remaining)) {
if (r10_bio->master_bio == NULL) {
--
1.7.5.2
* [PATCH v2 4/5] md/raid10: share pages between read and write bio's during recovery
2011-06-15 2:01 [PATCH RESEND 0/5] md/raid10 changes Namhyung Kim
` (2 preceding siblings ...)
2011-06-15 2:02 ` [PATCH 3/5] md/raid10: factor out common bio handling code Namhyung Kim
@ 2011-06-15 2:02 ` Namhyung Kim
2011-06-15 2:02 ` [PATCH 5/5] md/raid10: spread read for subordinate r10bios " Namhyung Kim
2011-06-15 3:09 ` [PATCH RESEND 0/5] md/raid10 changes NeilBrown
5 siblings, 0 replies; 14+ messages in thread
From: Namhyung Kim @ 2011-06-15 2:02 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid
When performing a recovery, only the first 2 slots in r10_bio are in use,
for read and write respectively. However, the pages in the write bio
are never actually used and are just swapped with the read bio's pages
when the read completes. Get rid of those unused pages and share the
read pages properly.
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
---
drivers/md/raid10.c | 24 +++++++++++++-----------
1 files changed, 13 insertions(+), 11 deletions(-)
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index a53779ffdf89..dea73bdb99b8 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -123,7 +123,15 @@ static void * r10buf_pool_alloc(gfp_t gfp_flags, void *data)
for (j = 0 ; j < nalloc; j++) {
bio = r10_bio->devs[j].bio;
for (i = 0; i < RESYNC_PAGES; i++) {
- page = alloc_page(gfp_flags);
+ if (j == 1 && !test_bit(MD_RECOVERY_SYNC,
+ &conf->mddev->recovery)) {
+ /* we can share bv_page's during recovery */
+ struct bio *rbio = r10_bio->devs[0].bio;
+ page = rbio->bi_io_vec[i].bv_page;
+ get_page(page);
+ } else {
+ page = alloc_page(gfp_flags);
+ }
if (unlikely(!page))
goto out_free_pages;
@@ -1360,20 +1368,14 @@ done:
static void recovery_request_write(mddev_t *mddev, r10bio_t *r10_bio)
{
conf_t *conf = mddev->private;
- int i, d;
- struct bio *bio, *wbio;
-
+ int d;
+ struct bio *wbio;
- /* move the pages across to the second bio
+ /*
+ * share the pages with the first bio
* and submit the write request
*/
- bio = r10_bio->devs[0].bio;
wbio = r10_bio->devs[1].bio;
- for (i=0; i < wbio->bi_vcnt; i++) {
- struct page *p = bio->bi_io_vec[i].bv_page;
- bio->bi_io_vec[i].bv_page = wbio->bi_io_vec[i].bv_page;
- wbio->bi_io_vec[i].bv_page = p;
- }
d = r10_bio->devs[1].devnum;
atomic_inc(&conf->mirrors[d].rdev->nr_pending);
--
1.7.5.2
* [PATCH 5/5] md/raid10: spread read for subordinate r10bios during recovery
2011-06-15 2:01 [PATCH RESEND 0/5] md/raid10 changes Namhyung Kim
` (3 preceding siblings ...)
2011-06-15 2:02 ` [PATCH v2 4/5] md/raid10: share pages between read and write bio's during recovery Namhyung Kim
@ 2011-06-15 2:02 ` Namhyung Kim
2011-06-15 3:09 ` NeilBrown
2011-06-15 3:09 ` [PATCH RESEND 0/5] md/raid10 changes NeilBrown
5 siblings, 1 reply; 14+ messages in thread
From: Namhyung Kim @ 2011-06-15 2:02 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid
In the current scheme, multiple read requests could be directed to
the first active disk during recovery if there are several disk
failures at the same time. Spreading those requests over other in-sync
disks might be helpful.
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
---
drivers/md/raid10.c | 10 +++++++---
1 files changed, 7 insertions(+), 3 deletions(-)
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index dea73bdb99b8..d0188e49f881 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1832,6 +1832,7 @@ static sector_t sync_request(mddev_t *mddev, sector_t sector_nr,
if (!test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) {
/* recovery... the complicated one */
int j, k;
+ int last_read = -1;
r10_bio = NULL;
for (i=0 ; i<conf->raid_disks; i++) {
@@ -1891,7 +1892,9 @@ static sector_t sync_request(mddev_t *mddev, sector_t sector_nr,
&sync_blocks, still_degraded);
for (j=0; j<conf->copies;j++) {
- int d = r10_bio->devs[j].devnum;
+ int c = (last_read + j + 1) % conf->copies;
+ int d = r10_bio->devs[c].devnum;
+
if (!conf->mirrors[d].rdev ||
!test_bit(In_sync, &conf->mirrors[d].rdev->flags))
continue;
@@ -1902,13 +1905,14 @@ static sector_t sync_request(mddev_t *mddev, sector_t sector_nr,
bio->bi_private = r10_bio;
bio->bi_end_io = end_sync_read;
bio->bi_rw = READ;
- bio->bi_sector = r10_bio->devs[j].addr +
+ bio->bi_sector = r10_bio->devs[c].addr +
conf->mirrors[d].rdev->data_offset;
bio->bi_bdev = conf->mirrors[d].rdev->bdev;
atomic_inc(&conf->mirrors[d].rdev->nr_pending);
atomic_inc(&r10_bio->remaining);
- /* and we write to 'i' */
+ last_read = c;
+ /* and we write to 'i' */
for (k=0; k<conf->copies; k++)
if (r10_bio->devs[k].devnum == i)
break;
--
1.7.5.2
* Re: [PATCH 5/5] md/raid10: spread read for subordinate r10bios during recovery
2011-06-15 2:02 ` [PATCH 5/5] md/raid10: spread read for subordinate r10bios " Namhyung Kim
@ 2011-06-15 3:09 ` NeilBrown
0 siblings, 0 replies; 14+ messages in thread
From: NeilBrown @ 2011-06-15 3:09 UTC (permalink / raw)
To: Namhyung Kim; +Cc: linux-raid
On Wed, 15 Jun 2011 11:02:04 +0900 Namhyung Kim <namhyung@gmail.com> wrote:
> In the current scheme, multiple read requests could be directed to
> the first active disk during recovery if there are several disk
> failures at the same time. Spreading those requests over other in-sync
> disks might be helpful.
I don't find this convincing either. Spreading requests over disks in a
haphazard way is not certain to improve anything and could easily cause
regressions.
For example, if I have an 'n3' array on 3 devices, this will read alternately
from the first 2 devices while rebuilding the last. This is simply a waste.
One disk would be enough to keep the rebuilding disk busy - the other should be
left for regular reads.
Again: if you can demonstrate a speed up in some configuration I'll be happy
to reconsider the patch.
Thanks,
NeilBrown
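To see the effect being objected to, here is a small user-space sketch of the
selection logic in the patch for a 3-copy layout where the third copy is being
rebuilt. The in_sync[] array and the whole setup are invented for
illustration; only the (last_read + j + 1) % copies expression comes from the
patch itself. The original code always reads from the first usable slot,
while the patched rotation alternates between the two surviving slots.

#include <stdio.h>

#define COPIES 3

int main(void)
{
    int in_sync[COPIES] = { 1, 1, 0 };  /* slot 2 is being rebuilt */
    int last_read = -1;
    int block, j;

    for (block = 0; block < 6; block++) {
        int original = -1, rotated = -1;

        for (j = 0; j < COPIES; j++) {
            int c = (last_read + j + 1) % COPIES;

            if (original < 0 && in_sync[j])
                original = j;   /* old behaviour: first in-sync slot */
            if (rotated < 0 && in_sync[c])
                rotated = c;    /* patched behaviour: rotate from last read */
        }
        last_read = rotated;
        printf("block %d: original reads slot %d, patched reads slot %d\n",
               block, original, rotated);
    }
    return 0;
}

With two surviving copies the patched loop bounces between slots 0 and 1,
which is exactly what the 'n3' example above objects to.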
>
> Signed-off-by: Namhyung Kim <namhyung@gmail.com>
> ---
> drivers/md/raid10.c | 10 +++++++---
> 1 files changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index dea73bdb99b8..d0188e49f881 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -1832,6 +1832,7 @@ static sector_t sync_request(mddev_t *mddev, sector_t sector_nr,
> if (!test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) {
> /* recovery... the complicated one */
> int j, k;
> + int last_read = -1;
> r10_bio = NULL;
>
> for (i=0 ; i<conf->raid_disks; i++) {
> @@ -1891,7 +1892,9 @@ static sector_t sync_request(mddev_t *mddev, sector_t sector_nr,
> &sync_blocks, still_degraded);
>
> for (j=0; j<conf->copies;j++) {
> - int d = r10_bio->devs[j].devnum;
> + int c = (last_read + j + 1) % conf->copies;
> + int d = r10_bio->devs[c].devnum;
> +
> if (!conf->mirrors[d].rdev ||
> !test_bit(In_sync, &conf->mirrors[d].rdev->flags))
> continue;
> @@ -1902,13 +1905,14 @@ static sector_t sync_request(mddev_t *mddev, sector_t sector_nr,
> bio->bi_private = r10_bio;
> bio->bi_end_io = end_sync_read;
> bio->bi_rw = READ;
> - bio->bi_sector = r10_bio->devs[j].addr +
> + bio->bi_sector = r10_bio->devs[c].addr +
> conf->mirrors[d].rdev->data_offset;
> bio->bi_bdev = conf->mirrors[d].rdev->bdev;
> atomic_inc(&conf->mirrors[d].rdev->nr_pending);
> atomic_inc(&r10_bio->remaining);
> - /* and we write to 'i' */
> + last_read = c;
>
> + /* and we write to 'i' */
> for (k=0; k<conf->copies; k++)
> if (r10_bio->devs[k].devnum == i)
> break;
* Re: [PATCH RESEND 0/5] md/raid10 changes
2011-06-15 2:01 [PATCH RESEND 0/5] md/raid10 changes Namhyung Kim
` (4 preceding siblings ...)
2011-06-15 2:02 ` [PATCH 5/5] md/raid10: spread read for subordinate r10bios " Namhyung Kim
@ 2011-06-15 3:09 ` NeilBrown
2011-06-15 3:33 ` Namhyung Kim
5 siblings, 1 reply; 14+ messages in thread
From: NeilBrown @ 2011-06-15 3:09 UTC (permalink / raw)
To: Namhyung Kim; +Cc: linux-raid
On Wed, 15 Jun 2011 11:01:59 +0900 Namhyung Kim <namhyung@gmail.com> wrote:
> Hello Neil,
>
> This is a resend of my previous raid10 patches. Some of them are revised
> according to your comments and some of them are not reviewed at all.
> Please take a look.
>
> Thanks.
>
>
> Namhyung Kim (5):
> md/raid10: optimize read_balance() for 'far offset' arrays
> md/raid10: get rid of duplicated conditional expression
> md/raid10: factor out common bio handling code
> md/raid10: share pages between read and write bio's during recovery
> md/raid10: spread read for subordinate r10bios during recovery
>
Thanks.
I've rejected the first and last, as explained separately, but the others look
good and I have added them to my tree. They will probably be submitted in
the next merge window.
Thanks,
NeilBrown
* Re: [PATCH RESEND 0/5] md/raid10 changes
2011-06-15 3:09 ` [PATCH RESEND 0/5] md/raid10 changes NeilBrown
@ 2011-06-15 3:33 ` Namhyung Kim
0 siblings, 0 replies; 14+ messages in thread
From: Namhyung Kim @ 2011-06-15 3:33 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid
2011-06-15 (Wed), 13:09 +1000, NeilBrown:
> On Wed, 15 Jun 2011 11:01:59 +0900 Namhyung Kim <namhyung@gmail.com> wrote:
>
> > Hello Neil,
> >
> > This is a resend of my previous raid10 patches. Some of them are revised
> > according to your comments and some of them are not reviewed at all.
> > Please take a look.
> >
> > Thanks.
> >
> >
> > Namhyung Kim (5):
> > md/raid10: optimize read_balance() for 'far offset' arrays
> > md/raid10: get rid of duplicated conditional expression
> > md/raid10: factor out common bio handling code
> > md/raid10: share pages between read and write bio's during recovery
> > md/raid10: spread read for subordinate r10bios during recovery
> >
>
> Thanks.
> I've reject the first and last as explained separately but the others look
> good and I have added them to my tree. They will probably be submitted in
> the next merge window.
>
> Thanks,
> NeilBrown
>
Hi,
Thanks for reviewing them. As I said before, I don't have any realistic
test environment for those patches. They've just come from code
inspection, and I (wrongly) thought they would be somewhat helpful,
sorry. :)
--
Regards,
Namhyung Kim