linux-raid.vger.kernel.org archive mirror
From: NeilBrown <neilb@suse.de>
To: Chris Worley <worleys@gmail.com>
Cc: linuxraid <linux-raid@vger.kernel.org>
Subject: Re: RAID50, despite chunk setting, does everything in 4KB blocks
Date: Tue, 20 Dec 2011 11:08:06 +1100
Message-ID: <20111220110806.221173c6@notabene.brown>
In-Reply-To: <CANWz5fg9e9A_FZDY28m38_c+EZ=NdjCBcqYhbW+8Qyyc1=BRxg@mail.gmail.com>

On Mon, 19 Dec 2011 16:56:16 -0700 Chris Worley <worleys@gmail.com> wrote:

> On Mon, Dec 19, 2011 at 4:24 PM, NeilBrown <neilb@suse.de> wrote:
> > On Mon, 19 Dec 2011 15:43:13 -0700 Chris Worley <worleys@gmail.com> wrote:
> >
> >> It doesn't really matter what chunk sizes I set, but, for example, I
> >> create three RAID5's of 5 drives each with a chunk size of 32K, and
> >> create a RAID0 comprised of the three RAID5's with a chunk size of
> >> 64K:
> >>
> >> md0 : active raid0 md27[2] md26[1] md25[0]
> >>       1885098048 blocks super 1.2 64k chunks
> >>
> >> If I write to one of the RAID5's, using:
> >>
> >> # dd of=/dev/md27  if=/dev/zero bs=1024k oflag=direct
> >>
> >> ... then "iostat -dmx 2" shows the drives being written to in 32K
> >> chunks (avgrq-sz=64), as you'd expect.
> >>
> >> But, writing to the RAID0 that's striping the RAID5's, shows
> >> everything being written in 4KB chunks (iostat shows avgrq-sz=8) to
> >> the RAID0 as well as to the RAID5's.
> >
> > When writing to a RAID5 it *always* submits requests to the lower layers in
> > PAGE sized units.  This makes it much easier to keep parity and data aligned.
> >
> > The queue on the underlying device should sort the requests and group them
> > together, and your evidence suggests that it does.
> >
> > When writing to the RAID5 through a RAID0 it will only see 64K at a time but
> > that shouldn't make any difference to its behaviour and shouldn't change
> > the way the requests finally get to the device.
> >
> > So I have no idea why you see a difference.
> >
> > I suspect lots of block-layer tracing, lots of staring at code, and lots
> > of head scratching would be needed to understand what is really going on.
> 
> Note that "max_segments" for the raid0 = 1, and max_segment_size =
> 4096, which tells Linux that the md can only take a single 4KB page
> per IO request.

Ah, of course.  RAID5 sets a merge_bvec_fn so that there is some chance that
read requests can bypass the stripe cache.
As RAID0 doesn't honour the merge_bvec_fn of its member devices (maybe it
should), it limits the max request size to 1 page.

RAID10 sets a merge_bvec_fn too, so a RAID0 layered over RAID10 will likewise
be sending it requests in 1-page pieces.
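
A rough sketch of how the limits quoted above line up with the iostat
numbers.  It assumes the queue limits are exposed under
/sys/block/<dev>/queue/ and that max_segments * max_segment_size is a fair
upper bound on one request (other limits such as max_sectors_kb may be
tighter).  The helper names max_request_sectors and show_limits are made up
for illustration.

```shell
#!/bin/sh
# Upper bound on one request, in 512-byte sectors (the unit iostat uses for
# avgrq-sz), from max_segments ($1) and max_segment_size in bytes ($2).
max_request_sectors() {
    echo $(( $1 * $2 / 512 ))
}

# Print the two limits for one block device, e.g. "show_limits md0".
# Assumption: the device exists and exposes these sysfs files.
show_limits() {
    q="/sys/block/$1/queue"
    segs=$(cat "$q/max_segments")
    size=$(cat "$q/max_segment_size")
    echo "$1: max_segments=$segs max_segment_size=$size ->" \
         "$(max_request_sectors "$segs" "$size") sectors"
}

# Values reported in this thread for the RAID0: 1 segment of 4096 bytes.
max_request_sectors 1 4096
```

With the values from this thread that gives 8 sectors, i.e. one 4KB page,
which matches the avgrq-sz=8 seen on the RAID0.  Running show_limits on md0
and on md25..md27 would show which layer imposes the limit.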

> 
> The scheduler shouldn't be involved in the transaction between the
> RAID0 and RAID5, as neither uses the scheduler, so it shouldn't merge
> there, but it also shouldn't be fragmenting.
> 
> Not having the RAID0 send the larger chunks to the RAID5's may cause
> more fragmentation than the drive's scheduler will be able to
> re-merge.

How hard can it be to merge a few (thousand) requests??? :-)

NeilBrown



