* [PATCH 07/11] md: rewrite handle_stripe_dirtying6 in asynchronous way
From: Yuri Tikhonov @ 2008-12-08 21:57 UTC (permalink / raw)
To: linux-raid; +Cc: linuxppc-dev, dan.j.williams, wd, dzu, yanok
Rewrite handle_stripe_dirtying6 function to work asynchronously.
Signed-off-by: Yuri Tikhonov <yur@emcraft.com>
Signed-off-by: Ilya Yanok <yanok@emcraft.com>
---
drivers/md/raid5.c | 113 ++++++++++++++--------------------------------------
1 files changed, 30 insertions(+), 83 deletions(-)
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index e08ed4f..f0b47bd 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -2485,99 +2485,46 @@ static void handle_stripe_dirtying6(raid5_conf_t *conf,
struct stripe_head *sh, struct stripe_head_state *s,
struct r6_state *r6s, int disks)
{
- int rcw = 0, must_compute = 0, pd_idx = sh->pd_idx, i;
+ int rcw = 0, pd_idx = sh->pd_idx, i;
int qd_idx = r6s->qd_idx;
+
+ set_bit(STRIPE_HANDLE, &sh->state);
for (i = disks; i--; ) {
struct r5dev *dev = &sh->dev[i];
- /* Would I have to read this buffer for reconstruct_write */
- if (!test_bit(R5_OVERWRITE, &dev->flags)
- && i != pd_idx && i != qd_idx
- && (!test_bit(R5_LOCKED, &dev->flags)
- ) &&
- !test_bit(R5_UPTODATE, &dev->flags)) {
- if (test_bit(R5_Insync, &dev->flags)) rcw++;
- else {
- pr_debug("raid6: must_compute: "
- "disk %d flags=%#lx\n", i, dev->flags);
- must_compute++;
+ /* check if we haven't enough data */
+ if (!test_bit(R5_OVERWRITE, &dev->flags) &&
+ i != pd_idx && i != qd_idx &&
+ !test_bit(R5_LOCKED, &dev->flags) &&
+ !(test_bit(R5_UPTODATE, &dev->flags) ||
+ test_bit(R5_Wantcompute, &dev->flags))) {
+ rcw++;
+ if (!test_bit(R5_Insync, &dev->flags))
+ continue; /* it's a failed drive */
+
+ if (
+ test_bit(STRIPE_PREREAD_ACTIVE, &sh->state)) {
+ pr_debug("Read_old stripe %llu "
+ "block %d for Reconstruct\n",
+ (unsigned long long)sh->sector, i);
+ set_bit(R5_LOCKED, &dev->flags);
+ set_bit(R5_Wantread, &dev->flags);
+ s->locked++;
+ } else {
+ pr_debug("Request delayed stripe %llu "
+ "block %d for Reconstruct\n",
+ (unsigned long long)sh->sector, i);
+ set_bit(STRIPE_DELAYED, &sh->state);
+ set_bit(STRIPE_HANDLE, &sh->state);
}
}
}
- pr_debug("for sector %llu, rcw=%d, must_compute=%d\n",
- (unsigned long long)sh->sector, rcw, must_compute);
- set_bit(STRIPE_HANDLE, &sh->state);
-
- if (rcw > 0)
- /* want reconstruct write, but need to get some data */
- for (i = disks; i--; ) {
- struct r5dev *dev = &sh->dev[i];
- if (!test_bit(R5_OVERWRITE, &dev->flags)
- && !(s->failed == 0 && (i == pd_idx || i == qd_idx))
- && !test_bit(R5_LOCKED, &dev->flags) &&
- !test_bit(R5_UPTODATE, &dev->flags) &&
- test_bit(R5_Insync, &dev->flags)) {
- if (
- test_bit(STRIPE_PREREAD_ACTIVE, &sh->state)) {
- pr_debug("Read_old stripe %llu "
- "block %d for Reconstruct\n",
- (unsigned long long)sh->sector, i);
- set_bit(R5_LOCKED, &dev->flags);
- set_bit(R5_Wantread, &dev->flags);
- s->locked++;
- } else {
- pr_debug("Request delayed stripe %llu "
- "block %d for Reconstruct\n",
- (unsigned long long)sh->sector, i);
- set_bit(STRIPE_DELAYED, &sh->state);
- set_bit(STRIPE_HANDLE, &sh->state);
- }
- }
- }
/* now if nothing is locked, and if we have enough data, we can start a
* write request
*/
- if (s->locked == 0 && rcw == 0 &&
+ if ((s->req_compute || !test_bit(STRIPE_COMPUTE_RUN, &sh->state)) &&
+ s->locked == 0 && rcw == 0 &&
!test_bit(STRIPE_BIT_DELAY, &sh->state)) {
- if (must_compute > 0) {
- /* We have failed blocks and need to compute them */
- switch (s->failed) {
- case 0:
- BUG();
- case 1:
- compute_block_1(sh, r6s->failed_num[0], 0);
- break;
- case 2:
- compute_block_2(sh, r6s->failed_num[0],
- r6s->failed_num[1]);
- break;
- default: /* This request should have been failed? */
- BUG();
- }
- }
-
- pr_debug("Computing parity for stripe %llu\n",
- (unsigned long long)sh->sector);
- compute_parity6(sh, RECONSTRUCT_WRITE);
- /* now every locked buffer is ready to be written */
- for (i = disks; i--; )
- if (test_bit(R5_LOCKED, &sh->dev[i].flags)) {
- pr_debug("Writing stripe %llu block %d\n",
- (unsigned long long)sh->sector, i);
- s->locked++;
- set_bit(R5_Wantwrite, &sh->dev[i].flags);
- }
- if (s->locked == disks)
- if (!test_and_set_bit(STRIPE_FULL_WRITE, &sh->state))
- atomic_inc(&conf->pending_full_writes);
- /* after a RECONSTRUCT_WRITE, the stripe MUST be in-sync */
- set_bit(STRIPE_INSYNC, &sh->state);
-
- if (test_and_clear_bit(STRIPE_PREREAD_ACTIVE, &sh->state)) {
- atomic_dec(&conf->preread_active_stripes);
- if (atomic_read(&conf->preread_active_stripes) <
- IO_THRESHOLD)
- md_wakeup_thread(conf->mddev->thread);
- }
+ schedule_reconstruction(sh, s, 1, 0);
}
}
--
1.5.6.1
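For readers following the diff, the new per-device test in the loop can be modelled in userspace roughly as below. This is only a sketch under stated assumptions: the DEV_* bits, struct model_result and model_dirtying6() are hypothetical stand-ins for the R5_* flags and the kernel loop, not kernel symbols, and the pr_debug() calls and stripe state bits are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the R5_* bits kept in struct r5dev's flags. */
enum {
	DEV_OVERWRITE   = 1u << 0,
	DEV_LOCKED      = 1u << 1,
	DEV_UPTODATE    = 1u << 2,
	DEV_WANTCOMPUTE = 1u << 3,
	DEV_INSYNC      = 1u << 4,
	DEV_WANTREAD    = 1u << 5,
};

struct model_result {
	int rcw;	/* data blocks still missing for a reconstruct-write */
	int locked;	/* blocks newly locked for reading */
	bool delayed;	/* stripe was delayed instead of issuing reads */
};

/* Models the single loop of the patched handle_stripe_dirtying6(). */
static struct model_result model_dirtying6(unsigned *flags, int disks,
					   int pd_idx, int qd_idx,
					   bool preread_active)
{
	struct model_result r = { 0, 0, false };
	int i;

	assert(disks > 0);
	for (i = disks; i--; ) {
		unsigned f = flags[i];

		if (i == pd_idx || i == qd_idx)
			continue;	/* parity blocks are not read here */
		if (f & (DEV_OVERWRITE | DEV_LOCKED |
			 DEV_UPTODATE | DEV_WANTCOMPUTE))
			continue;	/* data is present or already on its way */

		r.rcw++;		/* we haven't enough data yet */
		if (!(f & DEV_INSYNC))
			continue;	/* failed drive: nothing to read */

		if (preread_active) {
			flags[i] |= DEV_LOCKED | DEV_WANTREAD;
			r.locked++;
		} else {
			r.delayed = true;
		}
	}
	return r;
}
```

Note that a block on a failed drive counts towards rcw until it carries the Wantcompute bit, so the reconstruction is not scheduled before the compute has been requested.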
* Re: [PATCH 07/11] md: rewrite handle_stripe_dirtying6 in asynchronous way
From: Dan Williams @ 2009-01-15 21:51 UTC (permalink / raw)
To: Yuri Tikhonov; +Cc: linux-raid, linuxppc-dev, wd, dzu, yanok
On Mon, Dec 8, 2008 at 2:57 PM, Yuri Tikhonov <yur@emcraft.com> wrote:
> Rewrite handle_stripe_dirtying6 function to work asynchronously.
>
> Signed-off-by: Yuri Tikhonov <yur@emcraft.com>
> Signed-off-by: Ilya Yanok <yanok@emcraft.com>
> ---
> drivers/md/raid5.c | 113 ++++++++++++++--------------------------------------
> 1 files changed, 30 insertions(+), 83 deletions(-)
>
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index e08ed4f..f0b47bd 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -2485,99 +2485,46 @@ static void handle_stripe_dirtying6(raid5_conf_t *conf,
> struct stripe_head *sh, struct stripe_head_state *s,
> struct r6_state *r6s, int disks)
> {
> - int rcw = 0, must_compute = 0, pd_idx = sh->pd_idx, i;
> + int rcw = 0, pd_idx = sh->pd_idx, i;
> int qd_idx = r6s->qd_idx;
> +
> + set_bit(STRIPE_HANDLE, &sh->state);
> for (i = disks; i--; ) {
> struct r5dev *dev = &sh->dev[i];
> - /* Would I have to read this buffer for reconstruct_write */
> - if (!test_bit(R5_OVERWRITE, &dev->flags)
> - && i != pd_idx && i != qd_idx
> - && (!test_bit(R5_LOCKED, &dev->flags)
> - ) &&
> - !test_bit(R5_UPTODATE, &dev->flags)) {
> - if (test_bit(R5_Insync, &dev->flags)) rcw++;
> - else {
> - pr_debug("raid6: must_compute: "
> - "disk %d flags=%#lx\n", i, dev->flags);
> - must_compute++;
> + /* check if we haven't enough data */
> + if (!test_bit(R5_OVERWRITE, &dev->flags) &&
> + i != pd_idx && i != qd_idx &&
> + !test_bit(R5_LOCKED, &dev->flags) &&
> + !(test_bit(R5_UPTODATE, &dev->flags) ||
> + test_bit(R5_Wantcompute, &dev->flags))) {
> + rcw++;
> + if (!test_bit(R5_Insync, &dev->flags))
> + continue; /* it's a failed drive */
> +
> + if (
> + test_bit(STRIPE_PREREAD_ACTIVE, &sh->state)) {
> + pr_debug("Read_old stripe %llu "
> + "block %d for Reconstruct\n",
> + (unsigned long long)sh->sector, i);
> + set_bit(R5_LOCKED, &dev->flags);
> + set_bit(R5_Wantread, &dev->flags);
> + s->locked++;
> + } else {
> + pr_debug("Request delayed stripe %llu "
> + "block %d for Reconstruct\n",
> + (unsigned long long)sh->sector, i);
> + set_bit(STRIPE_DELAYED, &sh->state);
> + set_bit(STRIPE_HANDLE, &sh->state);
What's the reasoning behind changing the logic here, i.e. removing
must_compute and such? I'd feel more comfortable seeing copy and
paste where possible with cleanups separated out into their own patch.
--
Dan
* Re: [PATCH 07/11] md: rewrite handle_stripe_dirtying6 in asynchronous way
From: Dan Williams @ 2009-01-15 22:21 UTC (permalink / raw)
To: Yuri Tikhonov; +Cc: linux-raid, linuxppc-dev, wd, dzu, yanok
On Thu, Jan 15, 2009 at 2:51 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> On Mon, Dec 8, 2008 at 2:57 PM, Yuri Tikhonov <yur@emcraft.com> wrote:
> What's the reasoning behind changing the logic here, i.e. removing
> must_compute and such? I'd feel more comfortable seeing copy and
> paste where possible with cleanups separated out into their own patch.
>
Ok, I now see why this change was made. Please make this changelog
more descriptive than "Rewrite handle_stripe_dirtying6 function to
work asynchronously."
Regards,
Dan
* Re: [PATCH 07/11] md: rewrite handle_stripe_dirtying6 in asynchronous way
From: Cheng Renquan @ 2009-01-16 1:07 UTC (permalink / raw)
To: Yuri Tikhonov, Dan Williams; +Cc: linux-raid, linuxppc-dev, wd, dzu, yanok
On Tue, Dec 9, 2008 at 5:57 AM, Yuri Tikhonov <yur@emcraft.com> wrote:
> Rewrite handle_stripe_dirtying6 function to work asynchronously.
On Fri, Jan 16, 2009 at 6:21 AM, Dan Williams <dan.j.williams@intel.com> wrote:
> On Thu, Jan 15, 2009 at 2:51 PM, Dan Williams <dan.j.williams@intel.com> wrote:
>> On Mon, Dec 8, 2008 at 2:57 PM, Yuri Tikhonov <yur@emcraft.com> wrote:
>> What's the reasoning behind changing the logic here, i.e. removing
>> must_compute and such? I'd feel more comfortable seeing copy and
>> paste where possible with cleanups separated out into their own patch.
>
> Ok, I now see why this change was made. Please make this changelog
> more descriptive than "Rewrite handle_stripe_dirtying6 function to
> work asynchronously."
Ack, could you please make the changelog more descriptive?
and/or add some of your benchmark results?
Thanks,
--
Cheng Renquan (程任全), Shenzhen, China
* Re[2]: [PATCH 07/11] md: rewrite handle_stripe_dirtying6 in asynchronous way
From: Yuri Tikhonov @ 2009-01-16 14:24 UTC (permalink / raw)
To: Dan Williams; +Cc: linux-raid, linuxppc-dev, wd, dzu, yanok
On Friday, January 16, 2009 you wrote:
> On Thu, Jan 15, 2009 at 2:51 PM, Dan Williams <dan.j.williams@intel.com> wrote:
>> On Mon, Dec 8, 2008 at 2:57 PM, Yuri Tikhonov <yur@emcraft.com> wrote:
>> What's the reasoning behind changing the logic here, i.e. removing
>> must_compute and such? I'd feel more comfortable seeing copy and
>> paste where possible with cleanups separated out into their own patch.
>>
> Ok, I now see why this change was made. Please make this changelog
> more descriptive than "Rewrite handle_stripe_dirtying6 function to
> work asynchronously."
Sure, how about the following:
"
md: rewrite handle_stripe_dirtying6 in asynchronous way
Processing stripe dirtying in asynchronous way requires some changes
to the handle_stripe_dirtying6() algorithm.
In the synchronous implementation of the stripe dirtying we processed
dirtying of a degraded stripe (with partially changed strip(s) located
on the failed drive(s)) inside one handle_stripe_dirtying6() call:
- we computed the missed strips from the old parities, and thus got
the fully up-to-date stripe, then
- we did reconstruction using the new data to write.
In the asynchronous case of handle_stripe_dirtying6() we don't
process anything right inside this function (since we are under the lock),
but only schedule the necessary operations with flags. Thus, if
handle_stripe_dirtying6() is performed on the top of a degraded array
we should schedule the reconstruction operation when the failed strips
are marked (by previously called fetch_block6()) as to be computed
(with the R5_Wantcompute flag), and all the other strips of the stripe
are UPTODATE. The schedule_reconstruction() function will set the
STRIPE_OP_POSTXOR flag [for new parity calculation], which is then
handled in raid_run_ops() after the STRIPE_OP_COMPUTE_BLK one [which
causes computing of the data missed].
"
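The scheduling rule described in the changelog above can be sketched as a small userspace model. It is illustrative only: struct gate_state, may_schedule_reconstruction(), the OP_* bits and model_run_ops() are hypothetical names, and the real raid_run_ops() chains the operations through async_tx rather than filling an array. The meaning given to req_compute in the comment is likewise an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the state the final 'if' in the patch tests. */
struct gate_state {
	bool req_compute;	/* s->req_compute (assumed: compute requested in this pass) */
	bool compute_run;	/* STRIPE_COMPUTE_RUN set in sh->state */
	bool bit_delay;		/* STRIPE_BIT_DELAY set in sh->state */
	int locked;		/* s->locked */
	int rcw;		/* data blocks still missing */
};

/* True when the patched function would call schedule_reconstruction(). */
static bool may_schedule_reconstruction(const struct gate_state *g)
{
	assert(g != NULL);
	return (g->req_compute || !g->compute_run) &&
	       g->locked == 0 && g->rcw == 0 && !g->bit_delay;
}

/* Hypothetical stand-ins for STRIPE_OP_COMPUTE_BLK and STRIPE_OP_POSTXOR. */
enum { OP_COMPUTE_BLK = 1u << 0, OP_POSTXOR = 1u << 1 };

/*
 * Records the order in which the requested operations are handled:
 * the compute of the missing data comes first, the new parity second.
 * Returns the number of operations recorded.
 */
static int model_run_ops(unsigned ops, unsigned order[2])
{
	int n = 0;

	assert(order != NULL);
	if (ops & OP_COMPUTE_BLK)
		order[n++] = OP_COMPUTE_BLK;
	if (ops & OP_POSTXOR)
		order[n++] = OP_POSTXOR;
	return n;
}
```

The point of the ordering is that the parity calculation must see the computed strips, so the compute has to be handled before the post-xor step.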
Regards, Yuri
--
Yuri Tikhonov, Senior Software Engineer
Emcraft Systems, www.emcraft.com
* Re[2]: [PATCH 07/11] md: rewrite handle_stripe_dirtying6 in asynchronous way
From: Yuri Tikhonov @ 2009-01-16 14:46 UTC (permalink / raw)
To: Cheng Renquan; +Cc: wd, dzu, linux-raid, linuxppc-dev, yanok, Dan Williams
Hello Cheng,
On Friday, January 16, 2009 you wrote:
> Ack, could you please make the changelog more descriptive?
> and/or add some of your benchmark results?
Of course. We did benchmarking using the Xdd tool as follows:
# xdd -op write -kbytes $kbytes -reqsize $reqsize -dio -passes 2 -verbose -target $target_device
where
$kbytes = data disks * size of disk
$reqsize = data disks * chunk size
$target_device = /dev/md0
This way we wrote the full array size, and thus achieved the
maximum performance.
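As a worked example of the two formulas, here is a minimal sketch in C. It assumes that all 14 drives are active RAID-6 members (so two strips per stripe hold P and Q), that sizes are expressed in KB, and the helper names are hypothetical. The per-disk size is not stated in this message, so it stays a parameter, and the unit xdd expects for -reqsize depends on its block size setting, which is not stated here either.

```c
#include <assert.h>

/* RAID-6 keeps two parity strips (P and Q) per stripe. */
static unsigned raid6_data_disks(unsigned total_disks)
{
	assert(total_disks > 2);
	return total_disks - 2;
}

/* $reqsize = data disks * chunk size: one full stripe of data, in KB. */
static unsigned long long full_stripe_kb(unsigned total_disks,
					 unsigned chunk_kb)
{
	return (unsigned long long)raid6_data_disks(total_disks) * chunk_kb;
}

/* $kbytes = data disks * size of disk: the usable array size, in KB. */
static unsigned long long array_kb(unsigned total_disks,
				   unsigned long long disk_kb)
{
	return raid6_data_disks(total_disks) * disk_kb;
}
```

With 14 drives and a 64 KB chunk, full_stripe_kb(14, 64) gives 12 * 64 = 768 KB per request, so each request covers exactly one full stripe and needs no old data to be read.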
The test case was a RAID-6 array built on top of 14 S-ATA drives
connected to 2 LSI cards (7+7) inserted into the 800 MHz Katmai board
(based on ppc440spe) equipped with 4GB of 800 MHz DRAM.
Here are the results (Psw - write throughput with s/w RAID-6; Phw -
write throughput with the h/w accelerated RAID-6):
PAGE_SIZE=4KB, chunk=64/128/256 KB
Psw = 71/72/74 MBps
Phw = 128/136/139 MBps
PAGE_SIZE=16KB, chunk=256/512/1024 KB
Psw = 81/81/82 MBps
Phw = 205/244/239 MBps
PAGE_SIZE=64KB, chunk=1024/2048/4096 KB
Psw = 84/84/85 MBps
Phw = 258/253/258 MBps
PAGE_SIZE=256KB, chunk=4096/8192/16384 KB
Psw = 81/83/83 MBps
Phw = 288/275/274 MBps
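To read the figures above as a speedup of h/w over s/w RAID-6, the ratio Phw/Psw can be computed per configuration. The sketch below only re-states the posted numbers; the array layout (rows follow the PAGE_SIZE order, columns the chunk order) and the function name are choices made for this illustration.

```c
#include <assert.h>

/* Posted throughput in MBps; rows: PAGE_SIZE 4/16/64/256 KB,
 * columns: the three chunk sizes listed for each PAGE_SIZE. */
static const double psw_mbps[4][3] = {
	{ 71, 72, 74 }, { 81, 81, 82 }, { 84, 84, 85 }, { 81, 83, 83 },
};
static const double phw_mbps[4][3] = {
	{ 128, 136, 139 }, { 205, 244, 239 },
	{ 258, 253, 258 }, { 288, 275, 274 },
};

/* Ratio of h/w accelerated to s/w write throughput for one test point. */
static double raid6_speedup(int page_row, int chunk_col)
{
	assert(page_row >= 0 && page_row < 4);
	assert(chunk_col >= 0 && chunk_col < 3);
	return phw_mbps[page_row][chunk_col] / psw_mbps[page_row][chunk_col];
}
```

For the smallest chunk in each row this gives about 1.8x with 4KB pages (128/71) and about 3.6x with 256KB pages (288/81).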
Regards, Yuri
--
Yuri Tikhonov, Senior Software Engineer
Emcraft Systems, www.emcraft.com
* Re: Re[2]: [PATCH 07/11] md: rewrite handle_stripe_dirtying6 in asynchronous way
From: Dan Williams @ 2009-01-16 18:39 UTC (permalink / raw)
To: Yuri Tikhonov; +Cc: linux-raid, linuxppc-dev, wd, dzu, yanok
On Fri, Jan 16, 2009 at 7:24 AM, Yuri Tikhonov <yur@emcraft.com> wrote:
>> Ok, I now see why this change was made. Please make this changelog
>> more descriptive than "Rewrite handle_stripe_dirtying6 function to
>> work asynchronously."
>
> Sure, how about the following:
>
> "
>
> md: rewrite handle_stripe_dirtying6 in asynchronous way
>
> Processing stripe dirtying in asynchronous way requires some changes
> to the handle_stripe_dirtying6() algorithm.
>
> In the synchronous implementation of the stripe dirtying we processed
> dirtying of a degraded stripe (with partially changed strip(s) located
> on the failed drive(s)) inside one handle_stripe_dirtying6() call:
> - we computed the missed strips from the old parities, and thus got
> the fully up-to-date stripe, then
> - we did reconstruction using the new data to write.
>
> In the asynchronous case of handle_stripe_dirtying6() we don't
> process anything right inside this function (since we are under the lock),
> but only schedule the necessary operations with flags. Thus, if
> handle_stripe_dirtying6() is performed on the top of a degraded array
> we should schedule the reconstruction operation when the failed strips
> are marked (by previously called fetch_block6()) as to be computed
> (with the R5_Wantcompute flag), and all the other strips of the stripe
> are UPTODATE. The schedule_reconstruction() function will set the
> STRIPE_OP_POSTXOR flag [for new parity calculation], which is then
> handled in raid_run_ops() after the STRIPE_OP_COMPUTE_BLK one [which
> causes computing of the data missed].
>
> "
Excellent!
Thanks,
Dan