* raid10 offset distance
From: Phillip Susi @ 2013-12-27 2:37 UTC (permalink / raw)
To: Linux RAID
md(4) hints that you can get sequential read performance near that of
the far layout with the offset layout and a large chunk size. That
sounded nice and I thought that chunk size was just the number of
stripes to write before the backup copy. It turns out that "chunk
size" is really just the stripe factor, and so using a very large
stripe factor hurts performance since even a relatively large IO
doesn't span disks.
Is there a way to keep a sane stripe factor, but have the offset copy
stored further away to reduce seeking during sequential reads?
* Re: raid10 offset distance
From: Peter Grandi @ 2013-12-27 13:08 UTC (permalink / raw)
To: Linux RAID
[ ... ]
> md(4) hints that you can get sequential read performance near
> that of the far layout with the offset layout and a large
> chunk size.
Tradeoffs of course apply.
> That sounded nice and I thought that chunk size was just the
> number of stripes to write before the backup copy. It turns
> out that "chunk size" is really just the stripe factor,
That's a rather peculiar choice of terminology, quite unlike
that used by the MD documentation. In particular, in MD RAID1
there is in general no "backup copy"; all replicas of a given
page have the same status. 'man 4 md' correctly uses the
term "multiple copies of a given chunk".
Perhaps only with '--write-mostly' could one think of a chunk
on the indicated device as being a "backup copy".
> and so using a very large stripe factor hurts performance
> since even a relatively large IO doesn't span disks.
"performance" is a long word and perhaps should be used more
rarely... Large chunks may result in a lower *transfer rate* of
short non-streaming non-threaded IO, but benefit other types of
IO. I personally prefer smaller chunks in most cases, but I am
aware of the tradeoffs involved.
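To make the chunk-size tradeoff concrete, here is a minimal Python sketch
that counts how many member devices a single contiguous read can be
spread across. It is a simplified model, not MD's code: it assumes one
copy of each chunk is read and that consecutive chunks land on
consecutive devices, so the result is an upper bound. The function name
and the sample sizes are illustrative only.

```python
def devices_touched(offset_bytes, length_bytes, chunk_bytes, ndev):
    """Upper bound on the member devices one contiguous read is spread across."""
    if length_bytes <= 0:
        return 0
    first_chunk = offset_bytes // chunk_bytes
    last_chunk = (offset_bytes + length_bytes - 1) // chunk_bytes
    chunks = last_chunk - first_chunk + 1
    # A read cannot use more devices than exist, nor more than it has chunks.
    return min(ndev, chunks)


KiB = 1024
MiB = 1024 * KiB

# A 1 MiB read on a hypothetical 4-device array, for three chunk sizes.
for chunk in (64 * KiB, 512 * KiB, 16 * MiB):
    n = devices_touched(0, 1 * MiB, chunk, 4)
    print(f"{chunk // KiB} KiB chunk: {n} device(s)")
```

With a chunk much larger than the IO, the whole read falls inside one
chunk and so on one device, which is the effect described in the
original question.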
> Is there a way to keep a sane stripe factor, but have the
> offset copy stored further away to reduce seeking during
> sequential reads?
It is not clear to me what you are imagining here; with all
layouts the copies of a chunk are on different MD member devices,
so "further away" sounds rather strange to me.
The difference between the layouts is the offset at which the
different copies are stored on each MD member device ('n': same
offset, 'o2': one chunk further on, 'f2': half a disk further on).
This makes a difference with
rotating disk devices that have very different transfer rates on
outer and inner tracks.
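As a rough illustration of those offsets, the following Python sketch
maps a logical chunk number to the (device, row) position of each of its
copies for the three layouts. It is a simplified model reconstructed from
the layout diagrams in the MD documentation and the Wikipedia section
cited below, not the kernel's actual code; the function name, the
`dev_chunks` parameter and its default value are hypothetical.

```python
def copy_locations(chunk, ndev, layout, copies=2, dev_chunks=1000):
    """Return [(device, chunk_row_on_device), ...] for each copy of a chunk.

    layout is 'n' (near), 'o' (offset) or 'f' (far). dev_chunks is the
    assumed number of chunk-sized rows on each member device.
    """
    if layout == "n":
        # Copies are laid out consecutively, wrapping onto the next row.
        positions = [chunk * copies + c for c in range(copies)]
        return [(p % ndev, p // ndev) for p in positions]
    stripe, dev = divmod(chunk, ndev)
    if layout == "o":
        # Each stripe is repeated on the next row, rotated by one device.
        return [((dev + c) % ndev, stripe * copies + c) for c in range(copies)]
    if layout == "f":
        # Each copy lives in its own section of the device, rotated by one.
        section = dev_chunks // copies
        return [((dev + c) % ndev, stripe + c * section) for c in range(copies)]
    raise ValueError("layout must be 'n', 'o' or 'f'")


# Where the two copies of logical chunk 5 land on a 4-device array.
for layout in "nof":
    print(layout, copy_locations(5, 4, layout))
```

In this model the second copy of a chunk sits on the same row for 'n'
(with an even number of devices), one row later for 'o2', and half a
device later for 'f2', always on a different member device.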
Whatever it is, there is no miraculous RAID10 layout that will
give both awesome read and write transfer rates or IOPS; each
layout comes with pretty sharp tradeoffs between the various
aspects of the performance envelope.
There is an interesting section here:
http://en.wikipedia.org/wiki/Non-standard_RAID_levels#Linux_MD_RAID_10
with layout diagrams that might help you figure out what is
possible and what the tradeoffs are.