From: NeilBrown <neilb@suse.de>
To: patrik@dsl.sk
Cc: linux-raid@vger.kernel.org
Subject: Re: Sequential writing to degraded RAID6 causing a lot of reading
Date: Tue, 20 May 2014 21:08:44 +1000
Message-ID: <20140520210844.78dafb14@notabene.brown>
In-Reply-To: <CAAOsTSkzZM7Pf8k=JU=6uWKUqsq7M44FLC=fM_GVB8KD+z7qqA@mail.gmail.com>
On Tue, 20 May 2014 12:07:11 +0200 Patrik Horník <patrik@dsl.sk> wrote:
> 2014-05-20 7:42 GMT+02:00 NeilBrown <neilb@suse.de>:
> > On Thu, 15 May 2014 09:50:49 +0200 Patrik Horník <patrik@dsl.sk> wrote:
> >
> >> OK, it seems that because of that my copy operations will not be
> >> finished yet by next week... :)
> >>
> >> BTW this time the layout is left-symmetric, but I guess the problem is in
> >> whole-stripe write detection with degraded RAID6.
> >>
> >> Patrik
> >>
> >> 2014-05-15 9:18 GMT+02:00 NeilBrown <neilb@suse.de>:
> >> > On Thu, 15 May 2014 09:04:27 +0200 Patrik Horník <patrik@dsl.sk> wrote:
> >> >
> >> >> Hello Neil,
> >> >>
> >> >> did you make some progress on this issue by any chance?
> >> >
> >> > No I haven't - sorry.
> >> > After 2 years, I guess I really should.
> >> >
> >> > I'll make another note for first thing next week.
> >
> > Can you try the following patch and let me know if it helps?
>
> I don't want to test it on a production system... But I have a
> degraded array which does not have production data on it so I will
> think about how to test it.
>
> > I definitely reduced the number of reads significantly, but my measurements
> > (of a very simple test case) didn't show much speed-up.
> >
>
> I did not look at the patch itself but according to your description
> it should eliminate the problem, should it not? What was your read /
> write ratio after the patch?
It depends a bit on which particular tests I ran and what other hacks were in
the kernel. I did get zero reads, but that was with some hacks that aren't
general enough to be used.
Provided the stripe_cache_size was reasonably large, I got a read:write ratio
somewhere between 1:100 and 1:10. When things were bad, it was often close to 1:1.
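A read:write figure like this can be reproduced on another array by sampling /proc/diskstats for a member device before and after a test write and comparing the sector counters. The following is only a minimal sketch under that assumption; it is not necessarily how the figures above were measured, and the names parse_diskstats_line, read_write_ratio and struct io_sectors are hypothetical helpers, not part of md or any existing tool.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Sector counters for one block device, as reported in /proc/diskstats. */
struct io_sectors {
	unsigned long long read;
	unsigned long long written;
};

/*
 * Parse one /proc/diskstats line. Returns 0 and fills *out if the line
 * describes the device called name, -1 otherwise.
 * Fields used: 3 = device name, 6 = sectors read, 10 = sectors written.
 */
int parse_diskstats_line(const char *line, const char *name,
			 struct io_sectors *out)
{
	unsigned int major, minor;
	char dev[32];
	unsigned long long rd_ios, rd_merges, rd_sec, rd_ms;
	unsigned long long wr_ios, wr_merges, wr_sec;

	assert(line && name && out);
	if (sscanf(line, "%u %u %31s %llu %llu %llu %llu %llu %llu %llu",
		   &major, &minor, dev, &rd_ios, &rd_merges, &rd_sec, &rd_ms,
		   &wr_ios, &wr_merges, &wr_sec) != 10)
		return -1;
	if (strcmp(dev, name) != 0)
		return -1;
	out->read = rd_sec;
	out->written = wr_sec;
	return 0;
}

/*
 * Sectors read per sector written between two samples.
 * Returns -1.0 if nothing was written in the interval.
 */
double read_write_ratio(const struct io_sectors *before,
			const struct io_sectors *after)
{
	unsigned long long wr = after->written - before->written;

	if (wr == 0)
		return -1.0;
	return (double)(after->read - before->read) / (double)wr;
}
```

A result near 0.01 to 0.1 would correspond to the 1:100 to 1:10 range mentioned above, and a result near 1.0 to the bad case.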
NeilBrown
>
> Thanks.
>
> Patrik
>
> > This is against current mainline. If you want it against another version and
> > it doesn't apply easily, just ask.
> >
> > Thanks,
> > NeilBrown
> >
> > From 98c411f93391be0dbda98d43835dd9e042faa78f Mon Sep 17 00:00:00 2001
> > From: NeilBrown <neilb@suse.de>
> > Date: Mon, 19 May 2014 11:16:49 +1000
> > Subject: [PATCH] md/raid56: Don't perform reads to support writes until stripe
> > is ready.
> > MIME-Version: 1.0
> > Content-Type: text/plain; charset=UTF-8
> > Content-Transfer-Encoding: 8bit
> >
> > If it is found that we need to pre-read some blocks before a write
> > can succeed, we normally set STRIPE_DELAYED and don't actually perform
> > the read until STRIPE_PREREAD_ACTIVE subsequently gets set.
> >
> > However for a degraded RAID6 we currently perform the reads as soon
> > as we see that a write is pending. This significantly hurts
> > throughput.
> >
> > So:
> > - when handle_stripe_dirtying finds a block that it wants on a device
> > that is failed, set STRIPE_DELAYED, instead of doing nothing, and
> > - when fetch_block detects that a read might be required to satisfy a
> > write, only perform the read if STRIPE_PREREAD_ACTIVE is set,
> > and if we would actually need to read something to complete the write.
> >
> > This also helps RAID5, though less often as RAID5 supports a
> > read-modify-write cycle. For RAID5 the read is performed too early
> > only if the write is not a full 4K aligned write (i.e. not an
> > R5_OVERWRITE).
> >
> > Also clean up a couple of horrible bits of formatting.
> >
> > Reported-by: Patrik Horník <patrik@dsl.sk>
> > Signed-off-by: NeilBrown <neilb@suse.de>
> >
> > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> > index 633e20a96b34..d67202bd9118 100644
> > --- a/drivers/md/raid5.c
> > +++ b/drivers/md/raid5.c
> > @@ -292,9 +292,12 @@ static void do_release_stripe(struct r5conf *conf, struct stripe_head *sh,
> > BUG_ON(atomic_read(&conf->active_stripes)==0);
> > if (test_bit(STRIPE_HANDLE, &sh->state)) {
> > if (test_bit(STRIPE_DELAYED, &sh->state) &&
> > - !test_bit(STRIPE_PREREAD_ACTIVE, &sh->state))
> > + !test_bit(STRIPE_PREREAD_ACTIVE, &sh->state)) {
> > list_add_tail(&sh->lru, &conf->delayed_list);
> > - else if (test_bit(STRIPE_BIT_DELAY, &sh->state) &&
> > + if (atomic_read(&conf->preread_active_stripes)
> > + < IO_THRESHOLD)
> > + md_wakeup_thread(conf->mddev->thread);
> > + } else if (test_bit(STRIPE_BIT_DELAY, &sh->state) &&
> > sh->bm_seq - conf->seq_write > 0)
> > list_add_tail(&sh->lru, &conf->bitmap_list);
> > else {
> > @@ -2908,8 +2911,11 @@ static int fetch_block(struct stripe_head *sh, struct stripe_head_state *s,
> > (s->failed >= 1 && fdev[0]->toread) ||
> > (s->failed >= 2 && fdev[1]->toread) ||
> > (sh->raid_conf->level <= 5 && s->failed && fdev[0]->towrite &&
> > + (!test_bit(R5_Insync, &dev->flags) || test_bit(STRIPE_PREREAD_ACTIVE, &sh->state)) &&
> > !test_bit(R5_OVERWRITE, &fdev[0]->flags)) ||
> > - (sh->raid_conf->level == 6 && s->failed && s->to_write))) {
> > + (sh->raid_conf->level == 6 && s->failed && s->to_write &&
> > + s->to_write < sh->raid_conf->raid_disks - 2 &&
> > + (!test_bit(R5_Insync, &dev->flags) || test_bit(STRIPE_PREREAD_ACTIVE, &sh->state))))) {
> > /* we would like to get this block, possibly by computing it,
> > * otherwise read it if the backing disk is insync
> > */
> > @@ -3115,7 +3121,8 @@ static void handle_stripe_dirtying(struct r5conf *conf,
> > !test_bit(R5_LOCKED, &dev->flags) &&
> > !(test_bit(R5_UPTODATE, &dev->flags) ||
> > test_bit(R5_Wantcompute, &dev->flags))) {
> > - if (test_bit(R5_Insync, &dev->flags)) rcw++;
> > + if (test_bit(R5_Insync, &dev->flags))
> > + rcw++;
> > else
> > rcw += 2*disks;
> > }
> > @@ -3136,10 +3143,10 @@ static void handle_stripe_dirtying(struct r5conf *conf,
> > !(test_bit(R5_UPTODATE, &dev->flags) ||
> > test_bit(R5_Wantcompute, &dev->flags)) &&
> > test_bit(R5_Insync, &dev->flags)) {
> > - if (
> > - test_bit(STRIPE_PREREAD_ACTIVE, &sh->state)) {
> > - pr_debug("Read_old block "
> > - "%d for r-m-w\n", i);
> > + if (test_bit(STRIPE_PREREAD_ACTIVE,
> > + &sh->state)) {
> > + pr_debug("Read_old block %d for r-m-w\n",
> > + i);
> > set_bit(R5_LOCKED, &dev->flags);
> > set_bit(R5_Wantread, &dev->flags);
> > s->locked++;
> > @@ -3162,10 +3169,9 @@ static void handle_stripe_dirtying(struct r5conf *conf,
> > !(test_bit(R5_UPTODATE, &dev->flags) ||
> > test_bit(R5_Wantcompute, &dev->flags))) {
> > rcw++;
> > - if (!test_bit(R5_Insync, &dev->flags))
> > - continue; /* it's a failed drive */
> > - if (
> > - test_bit(STRIPE_PREREAD_ACTIVE, &sh->state)) {
> > + if (test_bit(R5_Insync, &dev->flags) &&
> > + test_bit(STRIPE_PREREAD_ACTIVE,
> > + &sh->state)) {
> > pr_debug("Read_old block "
> > "%d for Reconstruct\n", i);
> > set_bit(R5_LOCKED, &dev->flags);