From: NeilBrown <neilb@suse.de>
To: patrik@dsl.sk
Cc: linux-raid@vger.kernel.org
Subject: Re: Sequential writing to degraded RAID6 causing a lot of reading
Date: Mon, 28 May 2012 11:31:45 +1000
Message-ID: <20120528113145.1b8ac4ab@notabene.brown>
In-Reply-To: <CAAOsTSm47hy6YF-5fodB+2G6n1E=+Ces_PWsSo7F6GKEF_R1iA@mail.gmail.com>

On Thu, 24 May 2012 14:37:28 +0200 Patrik Horník <patrik@dsl.sk> wrote:

> On Thu, May 24, 2012 at 6:48 AM, NeilBrown <neilb@suse.de> wrote:

> > Firstly, degraded RAID6 with a left-symmetric layout is quite different from
> > an optimal RAID5 because there are Q blocks sprinkled around and some D
> > blocks missing.  So there will always be more work to do.
> >
> > Degraded left-symmetric-6 is quite similar to optimal RAID5 as the same data
> > is stored in the same place - so reading should be exactly the same.
> > However, writing is generally different and the code doesn't make any attempt
> > to notice and optimise cases that happen to be similar to RAID5.
> 
> Actually I have a left-symmetric-6 array missing one of the "regular"
> drives, not the one with only Qs on it, so it should be similar to a
> degraded RAID6 with a left-symmetric layout in this regard.

Yes, it should - I had assumed wrongly ;-)

> 
> > A particular issue is that while RAID5 does read-modify-write when updating a
> > single block in an array with 5 or more devices (i.e. it reads the old data
> > block and the parity block, subtracts the old from parity and adds the new,
> > then writes both back), RAID6 does not. It always does a reconstruct-write,
> > so on a 6-device RAID6 it will read the other 4 data blocks, compute P and Q,
> > and write them out with the new data.
> > If it did read-modify-write it might be able to get away with reading just P,
> > Q, and the old data block - 3 reads instead of 4.  However subtracting from
> > the Q block is more complicated than subtracting from the P block and has not
> > been implemented.
> 
> OK, I did not know that. In my case I have an 8-drive RAID6 degraded to
> 7 drives, so it would be a plus to have it implemented the RAID5 way.
> But anyway, I was thinking the whole-stripe detection should work in
> this case.
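For a single-chunk update on an intact 8-drive stripe (6 data chunks) that
would be 5 reads for a reconstruct-write against 3 for a hypothetical
read-modify-write.  Here is a minimal sketch of the parity arithmetic only -
not the md code, with chunk "values" reduced to single bytes - showing that
folding the delta into P and Q gives the same result as recomputing them
from scratch.  The 0x11d polynomial matches the one the raid6 code uses;
everything else is purely illustrative:

def gf_mul(a, b):
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1 (0x11d)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11d
        b >>= 1
    return r

def g_pow(j):
    """g^j for the generator g = 2."""
    r = 1
    for _ in range(j):
        r = gf_mul(r, 2)
    return r

def compute_pq(data):
    """P is the plain XOR of the data; Q is the sum of g^j * D_j."""
    p = q = 0
    for j, d in enumerate(data):
        p ^= d
        q ^= gf_mul(g_pow(j), d)
    return p, q

data = [0x11, 0x22, 0x33, 0x44, 0x55, 0x66]  # 6 data chunks, intact stripe
p, q = compute_pq(data)

i, new = 2, 0xAB                             # rewrite one data chunk

# Reconstruct-write: read the 5 other data chunks, recompute P and Q.
rcw_reads = len(data) - 1
rcw = data[:]
rcw[i] = new
p_rcw, q_rcw = compute_pq(rcw)

# Read-modify-write: read the old data chunk, P and Q (3 reads), then fold
# the delta in.  The Q update needs a GF(2^8) multiply, which is the part
# that has not been implemented.
rmw_reads = 3
delta = data[i] ^ new
p_rmw = p ^ delta
q_rmw = q ^ gf_mul(g_pow(i), delta)

assert (p_rcw, q_rcw) == (p_rmw, q_rmw)
print("reconstruct-write reads:", rcw_reads, " rmw reads:", rmw_reads)
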
> 
> > But that might not be the issue you are hitting - it simply shows that RAID6
> > is different from RAID5 in important but non-obvious ways.
> >
> > Yes, RAID5 and RAID6 do try to detect whole-stripe writes and write them out
> > without reading.  This is not always possible though.
> > Maybe you could tell us how many devices are in your arrays (which may be
> > important for understanding exactly what is happening), what the chunk size
> > is, and exactly what command you use to write "lots of data".  That might
> > help us understand what is happening.
> 
> The RAID5 is 5 drives; the RAID6 arrays are 7 of 8 drives; the chunk size
> is 64K. I am using the command dd if=/dev/zero of=file bs=X count=Y; it
> behaves the same for bs between 64K and 1 MB. Actually the internal read
> speed of every drive is slightly higher than its write speed, by about
> 10%. The ratio between the write speed to the array and the write speed
> to an individual drive is about 5.5 - 5.7.

I cannot really picture how the read speed can be higher than the write
speed.  The spindle doesn't speed up for reads and slow down for writes, does
it?  But that's not really relevant.

A 'dd' with large block size should be a good test.  I just did a simple
experiment.  With a 4-drive non-degraded RAID6 I get about a 1:100 ratio for
reads to writes for an extended write to the filesystem.
If I fail one device it becomes 1:1.  Something certainly seems wrong there.
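For the record, the ratio is just the change in the "sectors read" and
"sectors written" columns of /proc/diskstats for the member disks across
the run.  Something like this rough sketch works - the member names below
are placeholders for whatever the array actually contains:

import time

MEMBERS = {"sdb", "sdc", "sdd", "sde"}       # substitute the real md members

def sectors(devs):
    """Sum (sectors read, sectors written) over the given devices."""
    rd = wr = 0
    with open("/proc/diskstats") as f:
        for line in f:
            fields = line.split()
            if fields[2] in devs:
                rd += int(fields[5])         # sectors read
                wr += int(fields[9])         # sectors written
    return rd, wr

r0, w0 = sectors(MEMBERS)
time.sleep(30)    # run dd if=/dev/zero of=file bs=1M count=... meanwhile
r1, w1 = sectors(MEMBERS)
print("member reads:writes =", r1 - r0, ":", w1 - w0, "(sectors)")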

RAID5 behaves more as you would expect - many more writes than reads.

I've made a note to look into this when I get a chance.

Thanks for the report.

NeilBrown
