From: "Keld Jørn Simonsen" <keld@dkuug.dk>
To: Neil Brown <neilb@suse.de>
Cc: linux-raid@vger.kernel.org
Subject: Re: raid10 layouts and performance Re: md man page
Date: Wed, 9 Jul 2008 00:44:22 +0200
Message-ID: <20080708224422.GD7443@rap.rap.dk>
In-Reply-To: <18546.42692.577082.770926@notabene.brown>
On Tue, Jul 08, 2008 at 09:29:08AM +1000, Neil Brown wrote:
>
> (Adding linux-raid - I hope that's OK Keld?)
Yeah, that is fine:-)
> On Wednesday July 2, keld@dkuug.dk wrote:
> >
> > When 'offset' replicas are chosen, the multiple copies of a given chunk
> > are laid out on consecutive drives and at consecutive offsets.
> > Effectively each stripe is duplicated and the copies are offset by one
> > device. This should give similar read characteristics to 'far' if a
> > suitably large chunk size is used, but without as much seeking for
> > writes.
> >
> > A number of benchmarks have shown that the 'offset' layout does not
> > have read characteristics similar to those of the 'far' layout. A
> > number of benchmarks have also shown that seeking is similar in the
> > 'far' and 'offset' layouts. So I suggest removing the last sentence.
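To make the layout difference concrete for the list, here is a small
model of where the two copies of each chunk land on a 2-drive array.
This is my own simplified sketch in Python, not kernel code; ZONE stands
for the start of the far copies, roughly half the device, and 1000 is
just a placeholder:

  # Simplified model of raid10 copy placement on 2 drives (my sketch,
  # not kernel code).  A "row" is a chunk-sized slot on a disk; ZONE is
  # where the far copies start, roughly half the device (placeholder).
  ZONE = 1000

  def copies(chunk, layout):
      """Return [(disk, row), (disk, row)] for both copies of a chunk."""
      if layout == "n2":              # near: copies side by side, same row
          return [(0, chunk), (1, chunk)]
      disk, row = chunk % 2, chunk // 2
      if layout == "f2":              # far: second copy in the far zone,
          return [(disk, row),        # shifted by one device
                  ((disk + 1) % 2, ZONE + row)]
      if layout == "o2":              # offset: each stripe repeated on the
          return [(disk, 2 * row),    # next row, shifted by one device
                  ((disk + 1) % 2, 2 * row + 1)]

  for c in range(4):
      print(c, [(l, copies(c, l)) for l in ("n2", "f2", "o2")])

Running it shows why 'far' reads like a RAID0 (the first copies form a
plain stripe) while 'offset' keeps both copies of a stripe in adjacent
rows.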
>
> If I have done any such benchmarks, it was too long ago to remember,
> so I decided to do some simple tests and graph them. I like graphs
> and I like this one so I've decided to share it.
I like graphs too! May I use your graph on the wiki?
> The X axis is chunk size, ranging from 4k to 4096k - it is
> logarithmic.
> The Y axis is throughput in MB/s measured by 'dd' to the raw device -
> average of 5 runs.
> This was with a 2-drive raid with each of the possible layouts: n2, f2,
> o2.
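For concreteness, here is roughly how I understand such a raw-device
read test - a sketch of my own in Python, not Neil's actual script;
/dev/md0 and the sizes are placeholders, and one should drop caches
first:

  # Sketch of a sequential read test on the raw array (my guess at the
  # method, not Neil's script).  Run as root; drop caches first:
  #     echo 3 > /proc/sys/vm/drop_caches
  import os, time

  DEV = "/dev/md0"                # placeholder array device
  MB = 1024 * 1024
  TOTAL = 1024 * MB               # read 1 GiB sequentially

  fd = os.open(DEV, os.O_RDONLY)
  t0 = time.time()
  done = 0
  while done < TOTAL:
      buf = os.read(fd, 4 * MB)
      if not buf:                 # hit end of device
          break
      done += len(buf)
  os.close(fd)
  print("%.1f MB/s" % (done / MB / (time.time() - t0)))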
>
> f2-read is strikingly faster than anything else. It is clearly
> reading from both drives as once, as you would expect it to.
> f2-write is slower then anything else (except at 4K chunk size, which is
> an extreme case).
Yes, in your test. Is this done with dd on the raw array?
My tests indicate that writing is almost the same for raid10,n2 and
raid10,f2 when using the ext3 fs. I think the elevator comes into play
here, and I think that is important: you do not use an array without a
fs on top of it, and for the user it is the combined performance of the
raid and the fs that matters. The raw array is not that interesting.
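For concreteness, a minimal sketch of the filesystem-level write test I
mean - placeholder path, my own illustration rather than the scripts
actually used:

  # Sketch of a filesystem-level write test (placeholder path; my own
  # illustration).  Writing through ext3 lets the elevator and the fs
  # allocator merge and reorder writes before they hit the array.
  import os, time

  PATH = "/mnt/md0/testfile"      # placeholder: ext3 mounted on the array
  MB = 1024 * 1024
  buf = b"\0" * (4 * MB)

  fd = os.open(PATH, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
  t0 = time.time()
  for _ in range(256):            # 1 GiB total
      os.write(fd, buf)
  os.fsync(fd)                    # include the final flush in the timing
  os.close(fd)
  print("%.1f MB/s" % (1024 / (time.time() - t0)))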
> o2-read is fairly steady for most of the chunk sizes, but peaks up at
> 2M and only drops a little at 4M. This seems to suggest that it is
> around 2M that the time to seek over a chunk drops well below the time
> to read one chunk. Possibly at smaller chunk sizes it is cheaper to
> just read through the skipped sectors than to seek over them. Maybe
> the cylinder size is about 2Meg - there is no real gain from the
> offset layout until you can seek over whole cylinders.
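As a back-of-envelope check on that 2M figure (with assumed drive
numbers of my own, not measurements): skipping a chunk costs about one
short seek, while reading through it costs chunk_size divided by the
sequential throughput, so:

  # Back-of-envelope crossover (assumed drive figures, not measured).
  seek_s = 0.008                  # assumed short seek + rotational latency
  mb_per_s = 70.0                 # assumed sequential throughput
  print("crossover ~%.1f MB" % (seek_s * mb_per_s))   # ~0.6 MB

Seeking only drops *well* below reading at a few multiples of that
crossover, which lands in the 1-2 MB range and fits the peak around 2M.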
> So the sentence:
> This should give similar read characteristics to 'far' if a
> suitably large chunk size is used
> seems somewhat justified if the chunksize used is 2M.
Your graph indicates that raid10,o2 reads something like 20 % below
raid10,f2 in the best case, and about 30 % below in the worst case. To
me that is not "similar"; it is better described as a performance 20-30 %
under that of raid10,f2.
> It might be interesting to implement non-power-of-2 chunksizes and try
> a range of sizes between 1M and 4M to see what the graph looks like...
> maybe we could find the actual cylinder size.
>
> o2-write is very close to n2-write and is measurably (8%-14%) higher
> than f2-write. This seems to support the sentence
> but without as much seeking for writes.
>
> It is not that there are fewer seeks, but that the seeks are shorter.
This is most likely compensated for by the elevator, as described above.
> So while I don't want to just remove that last sentence, I agree that
> it could be improved, possibly by giving a ball-park figure for what a
> "suitably large chunk size" is. Also the second half could be
> "but without the long seeks being required for sequential writes".
>
> It would probably be good to do some measurements with random IO as
> well to see how they compare.
>
> Anyone else have some measurements they would like to share?
There are more than a handful of measurements on the wiki at
http://linux-raid.osdl.org/index.php/Performance
including some tests of random IO.
> Thanks for your suggestions.
You are welcome!
In my quest for updated documentation for linux raid, I find that the
mdadm documentation is also very outdated. The mdadm man pages that
Google returns, and that the Wikipedia article on mdadm links to, do
not include any info on raid10!
Is there a page that we could reference, which has the current mdadm man
page? And which is maintained?
I note that our raid wiki is now number 3 on Google. That is a lot
better than the number 121 it held about half a year ago :-)