[PATCH 07/11] md: rewrite handle_stripe_dirtying6 in asynchronous way

linuxppc-dev.lists.ozlabs.org archive mirror

* [PATCH 07/11] md: rewrite handle_stripe_dirtying6 in asynchronous way
@ 2008-12-08 21:57 Yuri Tikhonov
  2009-01-15 21:51 ` Dan Williams
  0 siblings, 1 reply; 7+ messages in thread
From: Yuri Tikhonov @ 2008-12-08 21:57 UTC (permalink / raw)
  To: linux-raid; +Cc: linuxppc-dev, dan.j.williams, wd, dzu, yanok

Rewrite handle_stripe_dirtying6 function to work asynchronously.

Signed-off-by: Yuri Tikhonov <yur@emcraft.com>
Signed-off-by: Ilya Yanok <yanok@emcraft.com>
---
 drivers/md/raid5.c |  113 ++++++++++++++--------------------------------------
 1 files changed, 30 insertions(+), 83 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index e08ed4f..f0b47bd 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -2485,99 +2485,46 @@ static void handle_stripe_dirtying6(raid5_conf_t *conf,
 		struct stripe_head *sh,	struct stripe_head_state *s,
 		struct r6_state *r6s, int disks)
 {
-	int rcw = 0, must_compute = 0, pd_idx = sh->pd_idx, i;
+	int rcw = 0, pd_idx = sh->pd_idx, i;
 	int qd_idx = r6s->qd_idx;
+
+	set_bit(STRIPE_HANDLE, &sh->state);
 	for (i = disks; i--; ) {
 		struct r5dev *dev = &sh->dev[i];
-		/* Would I have to read this buffer for reconstruct_write */
-		if (!test_bit(R5_OVERWRITE, &dev->flags)
-		    && i != pd_idx && i != qd_idx
-		    && (!test_bit(R5_LOCKED, &dev->flags)
-			    ) &&
-		    !test_bit(R5_UPTODATE, &dev->flags)) {
-			if (test_bit(R5_Insync, &dev->flags)) rcw++;
-			else {
-				pr_debug("raid6: must_compute: "
-					"disk %d flags=%#lx\n", i, dev->flags);
-				must_compute++;
+		/* check if we haven't enough data */
+		if (!test_bit(R5_OVERWRITE, &dev->flags) &&
+		    i != pd_idx && i != qd_idx &&
+		    !test_bit(R5_LOCKED, &dev->flags) &&
+		    !(test_bit(R5_UPTODATE, &dev->flags) ||
+		      test_bit(R5_Wantcompute, &dev->flags))) {
+			rcw++;
+			if (!test_bit(R5_Insync, &dev->flags))
+				continue; /* it's a failed drive */
+
+			if (
+			  test_bit(STRIPE_PREREAD_ACTIVE, &sh->state)) {
+				pr_debug("Read_old stripe %llu "
+					"block %d for Reconstruct\n",
+				     (unsigned long long)sh->sector, i);
+				set_bit(R5_LOCKED, &dev->flags);
+				set_bit(R5_Wantread, &dev->flags);
+				s->locked++;
+			} else {
+				pr_debug("Request delayed stripe %llu "
+					"block %d for Reconstruct\n",
+				     (unsigned long long)sh->sector, i);
+				set_bit(STRIPE_DELAYED, &sh->state);
+				set_bit(STRIPE_HANDLE, &sh->state);
 			}
 		}
 	}
-	pr_debug("for sector %llu, rcw=%d, must_compute=%d\n",
-	       (unsigned long long)sh->sector, rcw, must_compute);
-	set_bit(STRIPE_HANDLE, &sh->state);
-
-	if (rcw > 0)
-		/* want reconstruct write, but need to get some data */
-		for (i = disks; i--; ) {
-			struct r5dev *dev = &sh->dev[i];
-			if (!test_bit(R5_OVERWRITE, &dev->flags)
-			    && !(s->failed == 0 && (i == pd_idx || i == qd_idx))
-			    && !test_bit(R5_LOCKED, &dev->flags) &&
-			    !test_bit(R5_UPTODATE, &dev->flags) &&
-			    test_bit(R5_Insync, &dev->flags)) {
-				if (
-				  test_bit(STRIPE_PREREAD_ACTIVE, &sh->state)) {
-					pr_debug("Read_old stripe %llu "
-						"block %d for Reconstruct\n",
-					     (unsigned long long)sh->sector, i);
-					set_bit(R5_LOCKED, &dev->flags);
-					set_bit(R5_Wantread, &dev->flags);
-					s->locked++;
-				} else {
-					pr_debug("Request delayed stripe %llu "
-						"block %d for Reconstruct\n",
-					     (unsigned long long)sh->sector, i);
-					set_bit(STRIPE_DELAYED, &sh->state);
-					set_bit(STRIPE_HANDLE, &sh->state);
-				}
-			}
-		}
 	/* now if nothing is locked, and if we have enough data, we can start a
 	 * write request
 	 */
-	if (s->locked == 0 && rcw == 0 &&
+	if ((s->req_compute || !test_bit(STRIPE_COMPUTE_RUN, &sh->state)) &&
+	    s->locked == 0 && rcw == 0 &&
 	    !test_bit(STRIPE_BIT_DELAY, &sh->state)) {
-		if (must_compute > 0) {
-			/* We have failed blocks and need to compute them */
-			switch (s->failed) {
-			case 0:
-				BUG();
-			case 1:
-				compute_block_1(sh, r6s->failed_num[0], 0);
-				break;
-			case 2:
-				compute_block_2(sh, r6s->failed_num[0],
-						r6s->failed_num[1]);
-				break;
-			default: /* This request should have been failed? */
-				BUG();
-			}
-		}
-
-		pr_debug("Computing parity for stripe %llu\n",
-			(unsigned long long)sh->sector);
-		compute_parity6(sh, RECONSTRUCT_WRITE);
-		/* now every locked buffer is ready to be written */
-		for (i = disks; i--; )
-			if (test_bit(R5_LOCKED, &sh->dev[i].flags)) {
-				pr_debug("Writing stripe %llu block %d\n",
-				       (unsigned long long)sh->sector, i);
-				s->locked++;
-				set_bit(R5_Wantwrite, &sh->dev[i].flags);
-			}
-		if (s->locked == disks)
-			if (!test_and_set_bit(STRIPE_FULL_WRITE, &sh->state))
-				atomic_inc(&conf->pending_full_writes);
-		/* after a RECONSTRUCT_WRITE, the stripe MUST be in-sync */
-		set_bit(STRIPE_INSYNC, &sh->state);
-
-		if (test_and_clear_bit(STRIPE_PREREAD_ACTIVE, &sh->state)) {
-			atomic_dec(&conf->preread_active_stripes);
-			if (atomic_read(&conf->preread_active_stripes) <
-			    IO_THRESHOLD)
-				md_wakeup_thread(conf->mddev->thread);
-		}
+		schedule_reconstruction(sh, s, 1, 0);
 	}
 }
 
-- 
1.5.6.1


* Re: [PATCH 07/11] md: rewrite handle_stripe_dirtying6 in asynchronous way
  2008-12-08 21:57 [PATCH 07/11] md: rewrite handle_stripe_dirtying6 in asynchronous way Yuri Tikhonov
@ 2009-01-15 21:51 ` Dan Williams
  2009-01-15 22:21   ` Dan Williams
  0 siblings, 1 reply; 7+ messages in thread
From: Dan Williams @ 2009-01-15 21:51 UTC (permalink / raw)
  To: Yuri Tikhonov; +Cc: linux-raid, linuxppc-dev, wd, dzu, yanok

On Mon, Dec 8, 2008 at 2:57 PM, Yuri Tikhonov <yur@emcraft.com> wrote:
> Rewrite handle_stripe_dirtying6 function to work asynchronously.
>
> Signed-off-by: Yuri Tikhonov <yur@emcraft.com>
> Signed-off-by: Ilya Yanok <yanok@emcraft.com>
> ---
>  drivers/md/raid5.c |  113 ++++++++++++++--------------------------------------
>  1 files changed, 30 insertions(+), 83 deletions(-)
>
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index e08ed4f..f0b47bd 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -2485,99 +2485,46 @@ static void handle_stripe_dirtying6(raid5_conf_t *conf,
>                struct stripe_head *sh, struct stripe_head_state *s,
>                struct r6_state *r6s, int disks)
>  {
> -       int rcw = 0, must_compute = 0, pd_idx = sh->pd_idx, i;
> +       int rcw = 0, pd_idx = sh->pd_idx, i;
>        int qd_idx = r6s->qd_idx;
> +
> +       set_bit(STRIPE_HANDLE, &sh->state);
>        for (i = disks; i--; ) {
>                struct r5dev *dev = &sh->dev[i];
> -               /* Would I have to read this buffer for reconstruct_write */
> -               if (!test_bit(R5_OVERWRITE, &dev->flags)
> -                   && i != pd_idx && i != qd_idx
> -                   && (!test_bit(R5_LOCKED, &dev->flags)
> -                           ) &&
> -                   !test_bit(R5_UPTODATE, &dev->flags)) {
> -                       if (test_bit(R5_Insync, &dev->flags)) rcw++;
> -                       else {
> -                               pr_debug("raid6: must_compute: "
> -                                       "disk %d flags=%#lx\n", i, dev->flags);
> -                               must_compute++;
> +               /* check if we haven't enough data */
> +               if (!test_bit(R5_OVERWRITE, &dev->flags) &&
> +                   i != pd_idx && i != qd_idx &&
> +                   !test_bit(R5_LOCKED, &dev->flags) &&
> +                   !(test_bit(R5_UPTODATE, &dev->flags) ||
> +                     test_bit(R5_Wantcompute, &dev->flags))) {
> +                       rcw++;
> +                       if (!test_bit(R5_Insync, &dev->flags))
> +                               continue; /* it's a failed drive */
> +
> +                       if (
> +                         test_bit(STRIPE_PREREAD_ACTIVE, &sh->state)) {
> +                               pr_debug("Read_old stripe %llu "
> +                                       "block %d for Reconstruct\n",
> +                                    (unsigned long long)sh->sector, i);
> +                               set_bit(R5_LOCKED, &dev->flags);
> +                               set_bit(R5_Wantread, &dev->flags);
> +                               s->locked++;
> +                       } else {
> +                               pr_debug("Request delayed stripe %llu "
> +                                       "block %d for Reconstruct\n",
> +                                    (unsigned long long)sh->sector, i);
> +                               set_bit(STRIPE_DELAYED, &sh->state);
> +                               set_bit(STRIPE_HANDLE, &sh->state);

What's the reasoning behind changing the logic here, i.e. removing
must_compute and such?  I'd feel more comfortable seeing copy and
paste where possible with cleanups separated out into their own patch.

--
Dan


* Re: [PATCH 07/11] md: rewrite handle_stripe_dirtying6 in asynchronous way
  2009-01-15 21:51 ` Dan Williams
@ 2009-01-15 22:21   ` Dan Williams
  2009-01-16  1:07     ` Cheng Renquan
  2009-01-16 14:24     ` Yuri Tikhonov
  0 siblings, 2 replies; 7+ messages in thread
From: Dan Williams @ 2009-01-15 22:21 UTC (permalink / raw)
  To: Yuri Tikhonov; +Cc: linux-raid, linuxppc-dev, wd, dzu, yanok

On Thu, Jan 15, 2009 at 2:51 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> On Mon, Dec 8, 2008 at 2:57 PM, Yuri Tikhonov <yur@emcraft.com> wrote:
> What's the reasoning behind changing the logic here, i.e. removing
> must_compute and such?  I'd feel more comfortable seeing copy and
> paste where possible with cleanups separated out into their own patch.
>

Ok, I now see why this change was made.  Please make this changelog
more descriptive than "Rewrite handle_stripe_dirtying6 function to
work asynchronously."

Regards,
Dan


* Re: [PATCH 07/11] md: rewrite handle_stripe_dirtying6 in asynchronous way
  2009-01-15 22:21   ` Dan Williams
@ 2009-01-16  1:07     ` Cheng Renquan
  2009-01-16 14:46       ` Re[2]: " Yuri Tikhonov
  2009-01-16 14:24     ` Yuri Tikhonov
  1 sibling, 1 reply; 7+ messages in thread
From: Cheng Renquan @ 2009-01-16  1:07 UTC (permalink / raw)
  To: Yuri Tikhonov, Dan Williams; +Cc: linux-raid, linuxppc-dev, wd, dzu, yanok

On Tue, Dec 9, 2008 at 5:57 AM, Yuri Tikhonov <yur@emcraft.com> wrote:
> Rewrite handle_stripe_dirtying6 function to work asynchronously.


On Fri, Jan 16, 2009 at 6:21 AM, Dan Williams <dan.j.williams@intel.com> wrote:
> On Thu, Jan 15, 2009 at 2:51 PM, Dan Williams <dan.j.williams@intel.com> wrote:
>> On Mon, Dec 8, 2008 at 2:57 PM, Yuri Tikhonov <yur@emcraft.com> wrote:
>> What's the reasoning behind changing the logic here, i.e. removing
>> must_compute and such?  I'd feel more comfortable seeing copy and
>> paste where possible with cleanups separated out into their own patch.

>
> Ok, I now see why this change was made.  Please make this changelog
> more descriptive than "Rewrite handle_stripe_dirtying6 function to
> work asynchronously."

Ack, could you please make the changelog more descriptive,
and/or add some of your benchmark results?

Thanks,

--
Cheng Renquan (程任全), Shenzhen, China


* Re[2]: [PATCH 07/11] md: rewrite handle_stripe_dirtying6 in asynchronous way
  2009-01-15 22:21   ` Dan Williams
  2009-01-16  1:07     ` Cheng Renquan
@ 2009-01-16 14:24     ` Yuri Tikhonov
  2009-01-16 18:39       ` Dan Williams
  1 sibling, 1 reply; 7+ messages in thread
From: Yuri Tikhonov @ 2009-01-16 14:24 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-raid, linuxppc-dev, wd, dzu, yanok

On Friday, January 16, 2009 you wrote:

> On Thu, Jan 15, 2009 at 2:51 PM, Dan Williams <dan.j.williams@intel.com> wrote:
>> On Mon, Dec 8, 2008 at 2:57 PM, Yuri Tikhonov <yur@emcraft.com> wrote:
>> What's the reasoning behind changing the logic here, i.e. removing
>> must_compute and such?  I'd feel more comfortable seeing copy and
>> paste where possible with cleanups separated out into their own patch.
>>

> Ok, I now see why this change was made.  Please make this changelog
> more descriptive than "Rewrite handle_stripe_dirtying6 function to
> work asynchronously."

 Sure, how about the following:

"

 md: rewrite handle_stripe_dirtying6 in asynchronous way

 Processing stripe dirtying in an asynchronous way requires some changes
to the handle_stripe_dirtying6() algorithm.

 In the synchronous implementation of the stripe dirtying we processed
dirtying of a degraded stripe (with partially changed strip(s) located
on the failed drive(s)) inside one handle_stripe_dirtying6() call:
- we computed the missing strips from the old parities, and thus got
the fully up-to-date stripe, then
- we did reconstruction using the new data to write.

 In the asynchronous case of handle_stripe_dirtying6() we don't
process anything right inside this function (since we are under the lock),
but only schedule the necessary operations with flags. Thus, if
handle_stripe_dirtying6() is performed on top of a degraded array
we should schedule the reconstruction operation when the failed strips
are marked (by a previously called fetch_block6()) as to be computed
(with the R5_Wantcompute flag), and all the other strips of the stripe
are UPTODATE. The schedule_reconstruction() function will set the
STRIPE_OP_POSTXOR flag [for new parity calculation], which is then
handled in raid_run_ops() after the STRIPE_OP_COMPUTE_BLK one [which
causes the missing data to be computed].

"

 Regards, Yuri

 --
 Yuri Tikhonov, Senior Software Engineer
 Emcraft Systems, www.emcraft.com


* Re[2]: [PATCH 07/11] md: rewrite handle_stripe_dirtying6 in asynchronous way
  2009-01-16  1:07     ` Cheng Renquan
@ 2009-01-16 14:46       ` Yuri Tikhonov
  0 siblings, 0 replies; 7+ messages in thread
From: Yuri Tikhonov @ 2009-01-16 14:46 UTC (permalink / raw)
  To: Cheng Renquan; +Cc: wd, dzu, linux-raid, linuxppc-dev, yanok, Dan Williams

Hello Cheng,

On Friday, January 16, 2009 you wrote:

> Ack, could you please make the changelog more descriptive,
> and/or add some of your benchmark results?

 Of course. We benchmarked using the Xdd tool as follows:

# xdd -op write -kbytes $kbytes -reqsize $reqsize -dio -passes 2 -verbose -target $target_device

 where

$kbytes = data disks * size of disk
$reqsize = data disks * chunk size
$target_device = /dev/md0

 This way we wrote the full array size, and thus achieved the
maximum performance.

 The test cases were RAID-6 built on top of 14 S-ATA drives
connected to 2 LSI cards (7+7) inserted into the 800 MHz Katmai board
(based on ppc440spe) equipped with 4GB of 800 MHz DRAM.

 Here are the results (Psw - write throughput with s/w RAID-6; Phw -
write throughput with the h/w accelerated RAID-6):

 PAGE_SIZE=4KB, chunk=64/128/256 KB
        Psw = 71/72/74 MBps
        Phw = 128/136/139 MBps

 PAGE_SIZE=16KB, chunk=256/512/1024 KB
        Psw = 81/81/82 MBps
        Phw = 205/244/239 MBps

 PAGE_SIZE=64KB, chunk=1024/2048/4096 KB
        Psw = 84/84/85 MBps
        Phw = 258/253/258 MBps

 PAGE_SIZE=256KB, chunk=4096/8192/16384 KB
        Psw = 81/83/83 MBps
        Phw = 288/275/274 MBps

 Regards, Yuri

 --
 Yuri Tikhonov, Senior Software Engineer
 Emcraft Systems, www.emcraft.com


* Re: Re[2]: [PATCH 07/11] md: rewrite handle_stripe_dirtying6 in asynchronous way
  2009-01-16 14:24     ` Yuri Tikhonov
@ 2009-01-16 18:39       ` Dan Williams
  0 siblings, 0 replies; 7+ messages in thread
From: Dan Williams @ 2009-01-16 18:39 UTC (permalink / raw)
  To: Yuri Tikhonov; +Cc: linux-raid, linuxppc-dev, wd, dzu, yanok

On Fri, Jan 16, 2009 at 7:24 AM, Yuri Tikhonov <yur@emcraft.com> wrote:
>> Ok, I now see why this change was made.  Please make this changelog
>> more descriptive than "Rewrite handle_stripe_dirtying6 function to
>> work asynchronously."
>
>  Sure, how about the following:
>
> "
>
>  md: rewrite handle_stripe_dirtying6 in asynchronous way
>
>  Processing stripe dirtying in an asynchronous way requires some changes
> to the handle_stripe_dirtying6() algorithm.
>
>  In the synchronous implementation of the stripe dirtying we processed
> dirtying of a degraded stripe (with partially changed strip(s) located
> on the failed drive(s)) inside one handle_stripe_dirtying6() call:
> - we computed the missing strips from the old parities, and thus got
> the fully up-to-date stripe, then
> - we did reconstruction using the new data to write.
>
>  In the asynchronous case of handle_stripe_dirtying6() we don't
> process anything right inside this function (since we are under the lock),
> but only schedule the necessary operations with flags. Thus, if
> handle_stripe_dirtying6() is performed on top of a degraded array
> we should schedule the reconstruction operation when the failed strips
> are marked (by a previously called fetch_block6()) as to be computed
> (with the R5_Wantcompute flag), and all the other strips of the stripe
> are UPTODATE. The schedule_reconstruction() function will set the
> STRIPE_OP_POSTXOR flag [for new parity calculation], which is then
> handled in raid_run_ops() after the STRIPE_OP_COMPUTE_BLK one [which
> causes the missing data to be computed].
>
> "

Excellent!

Thanks,
Dan
