From: NeilBrown <neilb@suse.de>
To: Chris Worley <worleys@gmail.com>
Cc: linuxraid <linux-raid@vger.kernel.org>
Subject: Re: RAID50, despite chunk setting, does everything in 4KB blocks
Date: Tue, 20 Dec 2011 11:08:06 +1100
Message-ID: <20111220110806.221173c6@notabene.brown>
In-Reply-To: <CANWz5fg9e9A_FZDY28m38_c+EZ=NdjCBcqYhbW+8Qyyc1=BRxg@mail.gmail.com>

On Mon, 19 Dec 2011 16:56:16 -0700 Chris Worley <worleys@gmail.com> wrote:

> On Mon, Dec 19, 2011 at 4:24 PM, NeilBrown <neilb@suse.de> wrote:
> > On Mon, 19 Dec 2011 15:43:13 -0700 Chris Worley <worleys@gmail.com> wrote:
> >
> >> It doesn't really matter what chunk sizes I set, but, for example, I
> >> create three RAID5's of 5 drives each with a chunk size of 32K, and
> >> create a RAID0 comprised of the three RAID5's with a chunk size of
> >> 64K:
> >>
> >> md0 : active raid0 md27[2] md26[1] md25[0]
> >>       1885098048 blocks super 1.2 64k chunks
> >>
> >> If I write to one of the RAID5's, using:
> >>
> >> # dd of=/dev/md27  if=/dev/zero bs=1024k oflag=direct
> >>
> >> ... then "iostat -dmx 2" shows the drives being written to in 32K
> >> chunks (avgrq-sz=64), as you'd expect.
> >>
> >> But, writing to the RAID0 that's striping the RAID5's, shows
> >> everything being written in 4KB chunks (iostat shows avgrq-sz=8) to
> >> the RAID0 as well as to the RAID5's.
> >
> > When writing to a RAID5 it *always* submits requests to the lower layers in
> > PAGE sized units.  This makes it much easier to keep parity and data aligned.
> >
> > The queue on the underlying device should sort the requests and group them
> > together and your evidence suggests that it does.
> >
> > When writing to the RAID5 through a RAID0 it will only see 64K at a time but
> > that shouldn't make any difference to its behaviour and shouldn't change
> > the way the requests finally get to the device.
> >
> > So I have no idea why you see a difference.
> >
> > I suspect lots of block-layer tracing, and lots of staring at code and lots
> > of head scratching would be needed to understand what is really going on.
> 
> Note that "max_segments" for the raid0 = 1, and max_segment_size =
> 4096, which tells Linux that the md can only take a single 4KB page
> per IO request.
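
The limits quoted above are exposed under the block device's sysfs queue directory. A minimal sketch for reading them, assuming the array names from this thread (md0, md25, md26, md27); the helper names are hypothetical, and the product of the two limits is only the upper bound implied by these two values (other limits such as max_sectors_kb can cap requests further):

```python
from pathlib import Path


def read_queue_limit(device, name, sysfs_root="/sys/block"):
    """Return an integer queue limit for a block device, or None if unreadable.

    `device` is a name such as "md0"; `name` is a file under queue/,
    e.g. "max_segments" or "max_segment_size".
    """
    path = Path(sysfs_root) / device / "queue" / name
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        # Device absent, file missing, or contents not a plain integer.
        return None


def max_request_bytes(max_segments, max_segment_size):
    """Upper bound on one request implied by these two limits alone."""
    return max_segments * max_segment_size


if __name__ == "__main__":
    # Device names are taken from the thread; adjust for your own arrays.
    for dev in ("md0", "md25", "md26", "md27"):
        segs = read_queue_limit(dev, "max_segments")
        seg_size = read_queue_limit(dev, "max_segment_size")
        if segs is None or seg_size is None:
            print(f"{dev}: limits not readable")
        else:
            print(f"{dev}: max_segments={segs} max_segment_size={seg_size} "
                  f"-> at most {max_request_bytes(segs, seg_size)} bytes per request")
```

With max_segments = 1 and max_segment_size = 4096, as reported for the RAID0, the bound works out to a single 4096-byte page per request.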

Ah, of course.  RAID5 sets a merge_bvec_fn so that there is some chance that
read requests can bypass the cache.
As RAID0 doesn't honour the merge_bvec_fn (maybe it should) it sets the max
request size to 1 page.

RAID10 sets a merge_bvec_fn too so RAID0 will be sending it requests in
1-page pieces.
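
To put numbers on that, here is a rough sketch of the arithmetic, assuming a 4096-byte page (matching the max_segment_size reported above) and the 512-byte sector unit that iostat uses for avgrq-sz. The function names are made up for illustration:

```python
SECTOR_BYTES = 512   # iostat reports avgrq-sz in 512-byte sectors
PAGE_BYTES = 4096    # assumed page size, matching max_segment_size above


def avgrq_sz_to_kib(avgrq_sz):
    """Convert an iostat avgrq-sz value (in sectors) to KiB."""
    return avgrq_sz * SECTOR_BYTES / 1024


def requests_needed(total_bytes, max_request_bytes):
    """Number of requests needed when each carries at most max_request_bytes."""
    return -(-total_bytes // max_request_bytes)  # ceiling division


print(avgrq_sz_to_kib(64))                       # 32.0 (direct RAID5 write)
print(avgrq_sz_to_kib(8))                        # 4.0  (write through RAID0)
print(requests_needed(64 * 1024, PAGE_BYTES))    # 16  per 64K RAID0 chunk
print(requests_needed(1024 * 1024, PAGE_BYTES))  # 256 per 1024k dd write
```

So each 64K RAID0 chunk reaches the RAID5 below it as 16 separate one-page requests, which is consistent with the avgrq-sz=8 seen in iostat.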

> 
> The scheduler shouldn't be involved in the transaction between the
> RAID0 and RAID5, as neither uses the scheduler, so it shouldn't merge
> there, but it also shouldn't be fragmenting.
> 
> Not having the RAID0 send the larger chunks to the RAID5's may cause
> more fragmentation than the drive's scheduler will be able to
> re-merge.

How hard can it be to merge a few (thousand) requests??? :-)

NeilBrown


Thread overview:
2011-12-19 22:43 RAID50, despite chunk setting, does everything in 4KB blocks Chris Worley
2011-12-19 23:24 ` NeilBrown
2011-12-19 23:56   ` Chris Worley
2011-12-20  0:08     ` NeilBrown [this message]
