From: "Patrik Horník" <patrik@dsl.sk>
To: NeilBrown <neilb@suse.de>
Cc: linux-raid@vger.kernel.org
Subject: Re: Sequential writing to degraded RAID6 causing a lot of reading
Date: Thu, 15 May 2014 09:04:27 +0200
Message-ID: <CAAOsTSmvGOwhmFW4PRfkmVnSApNKrsZfa6RAbjEhwQh-+ko1vQ@mail.gmail.com>
In-Reply-To: <20120528113145.1b8ac4ab@notabene.brown>
Hello Neil,
have you made any progress on this issue, by any chance?
I am hitting the same problem again on a degraded RAID 6 that is missing two
drives (Debian kernel 3.13.10-1, mdadm v3.2.5).
Thanks.
Patrik
2012-05-28 3:31 GMT+02:00 NeilBrown <neilb@suse.de>:
>
> On Thu, 24 May 2012 14:37:28 +0200 Patrik Horník <patrik@dsl.sk> wrote:
>
> > On Thu, May 24, 2012 at 6:48 AM, NeilBrown <neilb@suse.de> wrote:
>
> > > Firstly, degraded RAID6 with a left-symmetric layout is quite different from
> > > an optimal RAID5 because there are Q blocks sprinkled around and some D
> > > blocks missing. So there will always be more work to do.
> > >
> > > Degraded left-symmetric-6 is quite similar to optimal RAID5 as the same data
> > > is stored in the same place - so reading should be exactly the same.
> > > However writing is generally different and the code doesn't make any attempt
> > > to notice and optimise cases that happen to be similar to RAID5.
> >
> > Actually I have left-symmetric-6 without one of the "regular" drives,
> > not the one with only Qs on it, so it should be similar to degraded
> > RAID6 with a left-symmetric layout in this regard.
>
> Yes, it should - I had assumed wrongly ;-)
>
> >
> > > A particular issue is that while RAID5 does read-modify-write when updating a
> > > single block in an array with 5 or more devices (i.e. it reads the old data
> > > block and the parity block, subtracts the old from parity and adds the new,
> > > then writes both back), RAID6 does not. It always does a reconstruct-write,
> > > so on a 6-device RAID6 it will read the other 4 data blocks, compute P and Q,
> > > and write them out with the new data.
> > > If it did read-modify-write it might be able to get away with reading just P,
> > > Q, and the old data block - 3 reads instead of 4. However subtracting from
> > > the Q block is more complicated than subtracting from the P block and has not
> > > been implemented.
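
To illustrate the paragraph above, here is a minimal Python sketch of a
read-modify-write update of P and Q. It is not the md code. It assumes the
conventional RAID-6 arithmetic (GF(2^8) with polynomial 0x11d and generator 2),
and all function names are made up for the example. It shows why Q is the
harder case: P only needs XOR, while Q needs the data delta multiplied by a
per-block coefficient.

```python
from functools import reduce


def xor_blocks(*blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo 0x11d (assumed RAID-6 polynomial)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def coefficient(index):
    """Return 2**index in GF(2^8), the Q coefficient of data block `index`."""
    value = 1
    for _ in range(index):
        value = gf_mul(value, 2)
    return value


def scale_block(block, coeff):
    """Multiply every byte of a block by coeff in GF(2^8)."""
    return bytes(gf_mul(coeff, byte) for byte in block)


def reconstruct_write(data_blocks):
    """Compute P and Q from all data blocks of a stripe."""
    p = xor_blocks(*data_blocks)
    q = bytes(len(data_blocks[0]))
    for index, block in enumerate(data_blocks):
        q = xor_blocks(q, scale_block(block, coefficient(index)))
    return p, q


def read_modify_write(old_p, old_q, old_data, new_data, index):
    """Update P and Q using only the old P, old Q and old data block."""
    delta = xor_blocks(old_data, new_data)
    new_p = xor_blocks(old_p, delta)
    new_q = xor_blocks(old_q, scale_block(delta, coefficient(index)))
    return new_p, new_q


def reads_for_single_block_update(data_blocks_per_stripe):
    """Reads needed per strategy when one data block of a stripe changes."""
    return {
        "reconstruct_write": data_blocks_per_stripe - 1,  # all other data blocks
        "read_modify_write": 3,  # old data block, P and Q
    }
```

Both paths produce the same P and Q, so the difference lies only in how many
blocks have to be read first.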
> >
> > OK, I did not know that. In my case I have an 8-drive RAID6 degraded to
> > 7 drives, so it would be a plus to have it implemented the RAID5 way.
> > But anyway I was thinking the whole-stripe detection should work in
> > this case.
> >
> > > But that might not be the issue you are hitting - it simply shows that RAID6
> > > is different from RAID5 in important but non-obvious ways.
> > >
> > > Yes, RAID5 and RAID6 do try to detect whole-stripe write and write them out
> > > without reading. This is not always possible though.
> > > Maybe if you told us how many devices were in your arrays (which may be
> > > important to understand exactly what is happening), what the chunk size is, and
> > > exactly what command you use to write "lots of data". That might help
> > > understand what is happening.
> >
> > The RAID5 is 5 drives, the RAID6 arrays are 7 of 8 drives, chunk size
> > is 64K. I am using the command dd if=/dev/zero of=file bs=X count=Y; it
> > behaves the same for bs between 64K and 1 MB. Actually the internal read
> > speed from every drive is slightly higher than the write speed, by about
> > 10%. The ratio between write speed to the array and write speed to
> > an individual drive is about 5.5 - 5.7.
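
As a rough check on those numbers, the arithmetic can be sketched as below.
This is only an idealized model: it assumes every stripe is written whole, so
that each member receives one chunk per stripe while the array accepts one
chunk per data slot. The function names are made up for the example.

```python
def full_stripe_bytes(total_devices, chunk_bytes, parity_devices=2):
    """Amount of user data held by one full stripe."""
    return (total_devices - parity_devices) * chunk_bytes


def ideal_array_to_drive_write_ratio(total_devices, parity_devices=2):
    """Array write speed divided by per-drive write speed for full-stripe writes.

    Each member drive gets one chunk per stripe (data, P or Q), while the
    array as a whole accepts (total_devices - parity_devices) chunks of data.
    """
    return float(total_devices - parity_devices)


# 8-device RAID6 with 64K chunks, as described in the message above.
stripe = full_stripe_bytes(8, 64 * 1024)
ratio = ideal_array_to_drive_write_ratio(8)
print(f"full stripe: {stripe // 1024} KiB, ideal ratio: {ratio}")
```

Under that assumption an 8-device RAID6 with 64K chunks has a 384 KiB full
stripe and an ideal ratio of 6, so the reported 5.5 - 5.7 is close to what
full-stripe writing alone would give.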
>
> I cannot really picture how the read speed can be higher than the write
> speed. The spindle doesn't speed up for reads and slow down for writes does
> it? But that's not really relevant.
>
> A 'dd' with large block size should be a good test. I just did a simple
> experiment. With a 4-drive non-degraded RAID6 I get about a 1:100 ratio for
> reads to writes for an extended write to the filesystem.
> If I fail one device it becomes 1:1. Something certainly seems wrong there.
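
The message does not say how the read-to-write ratio was measured. One way to
reproduce such a figure, sketched here as an assumption rather than as the
method actually used, is to snapshot /proc/diskstats for the member devices
before and after the write and compare the sector counters. The device names
in the usage note are hypothetical.

```python
def sector_totals(diskstats_text, devices):
    """Return {device: (sectors_read, sectors_written)} from /proc/diskstats text.

    In each line, field 3 is the device name, field 6 the number of sectors
    read and field 10 the number of sectors written (1-based field numbers).
    """
    totals = {}
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) < 10 or fields[2] not in devices:
            continue
        totals[fields[2]] = (int(fields[5]), int(fields[9]))
    return totals


def read_write_ratio(before, after):
    """Sectors read divided by sectors written between two snapshots."""
    sectors_read = sum(after[dev][0] - before[dev][0] for dev in after)
    sectors_written = sum(after[dev][1] - before[dev][1] for dev in after)
    if sectors_written == 0:
        return float("inf")
    return sectors_read / sectors_written
```

Usage would be to read /proc/diskstats into a string, call sector_totals with
the member names (for example {"sdb", "sdc", "sdd"}), run the dd, take a second
snapshot, and pass both results to read_write_ratio. A value near 0.01
corresponds to the 1:100 case and a value near 1 to the 1:1 case.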
>
> RAID5 behaves more as you would expect - many more writes than reads.
>
> I've made a note to look into this when I get a chance.
>
> Thanks for the report.
>
> NeilBrown
Thread overview: 11+ messages
2012-05-23 19:01 Sequential writing to degraded RAID6 causing a lot of reading Patrik Horník
2012-05-24 4:48 ` NeilBrown
2012-05-24 12:37 ` Patrik Horník
2012-05-25 16:07 ` Patrik Horník
2012-05-28 1:31 ` NeilBrown
2014-05-15 7:04 ` Patrik Horník [this message]
2014-05-15 7:18 ` NeilBrown
2014-05-15 7:50 ` Patrik Horník
2014-05-20 5:42 ` NeilBrown
2014-05-20 10:07 ` Patrik Horník
2014-05-20 11:08 ` NeilBrown