From: Duncan <1i5t5.duncan@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: "layout" of a six drive raid10
Date: Tue, 9 Feb 2016 01:42:40 +0000 (UTC)
Message-ID: <pan$b862$fa78f2fd$bea6373b$99d690b8@cox.net>
In-Reply-To: 1E2010FD-CBFD-44BD-B5DB-9ECD5C009391@bueechi.net
boli posted on Mon, 08 Feb 2016 23:19:52 +0100 as excerpted:
> Hi
>
> I'm trying to figure out what a six drive btrfs raid10 would look like.
> It could mean that stripes are split over two raid1 sets of three
> devices each. The sentence "Every stripe is split across to exactly 2
> RAID-1 sets" would lead me to believe this.
>
> However, earlier it says for raid0 that "stripe[s are] split across as
> many devices as possible". Which for six drives would be: stripes are
> split over three raid1 sets of two devices each.
>
> Can anyone enlighten me as to which is correct?
Hugo's correct, and this is pretty much a restatement of what he said.
Sometimes I find that reading the same thing in different words helps me
better understand the concept, and this post is made with that in mind.
At present, btrfs has only two-way mirroring, not N-way. So any raid
level that includes mirroring will have exactly two copies, no matter the
number of devices. (FWIW, N-way-mirroring is on the roadmap, but who
knows when it'll come, and like raid56 mode, it will likely take some
time to stabilize even once it does.)
What that means for a six-device raid1 or raid10 is still exactly two
copies of everything, with raid1 simply being three independent chunk
pairs (two copies each), and raid10 being two copies of a
three-device-wide stripe.
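To picture it (a simplified sketch of my own, with sda..sdf standing in
for the six devices and A/B marking the two copies of one allocation):

  raid1,  one chunk:   sda: copy A    sdb: copy B   (sdc..sdf idle this round)
  raid10, one chunk:   sda+sdb+sdc: stripe, copy A
                       sdd+sde+sdf: same stripe, copy B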
> Reason I'm asking is that I'm deciding on a suitable raid level for a
> new DIY NAS box. I'd rather not use btrfs raid6 (for now).
Agreed, and I think that's a wise choice. =:^) I'd still be a bit
cautious of btrfs raid56, as I don't think it's quite at the level of
stability of the other btrfs raid types just yet. I expect to be much
more comfortable recommending it in another couple of kernel cycles.
> The first
> alternative I thought of was raid10. Later I learned how btrfs raid1
> works and figured it might be better suited for my use case: Striping
> the data over multiple raid1 sets doesn't really help, as transfer
> from/to my box will be limited by gigabit ethernet anyway, and a single
> drive can saturate that.
>
> Thoughts on this would also be appreciated.
Agreed, again. =:^)
Tho I'd consider benchmarking or testing, as I'm not sure btrfs raid1 on
spinning rust will in practice fully saturate gigabit Ethernet,
particularly as the filesystem gets fragmented. (COW filesystems such as
btrfs tend to fragment much more than non-COW ones, unless you're using
something like the autodefrag mount option from the get-go, as I do here;
tho in that case, striping won't necessarily help a lot either.)
If you're concerned about getting the last bit of performance possible,
I'd say raid10, tho over gigabit Ethernet the difference isn't likely to
be much.
OTOH, if you're more concerned about ease of maintenance, replacing
devices, etc., I believe raid1 is a bit less complex, both in code terms
(where less code complexity means less chance of bugs) and in
administration, at least conceptually, tho in practice the administration
is going to be very nearly the same as well.
So I'd tend to lean toward raid1 for a use case with thruput limited to
gigabit Ethernet speeds, even on spinning rust. There may be a bit of a
speed difference vs raid10, but I doubt it'll be much given the gigabit
thruput limit, and I'd consider the lower complexity of raid1 to offset
it.
> As a bonus I was wondering how btrfs raid1 is laid out in general, in
> particular with even and odd numbers of drives. A pair is trivial. For
> three drives I imagine a "ring setup", with each drive sharing half of
> its data with another drive. But how is it with four drives – are they
> organized as two pairs, or four-way, or …
For raid1, allocation is done in pairs, with each copy going to the
device with the most space left, except that both copies can't be on a
single device, even if, for instance, you have a 3 TB device and the rest
are 1 TB or smaller. In that case, one copy of each pair goes on the 3 TB
device, and the other on whichever device has the most space left among
the others.
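If it helps, here's a toy simulation of that rule (my own Python sketch,
not btrfs code; device names and sizes are made up):

def allocate_raid1_chunk(free, chunk=1):
    """free: dict mapping device -> GiB unallocated.
    Returns the (dev_a, dev_b) pair holding the two copies of one
    chunk, or None once no legal pair remains."""
    # Devices with room for a chunk, most free space first.
    candidates = sorted((d for d in free if free[d] >= chunk),
                        key=lambda d: free[d], reverse=True)
    if len(candidates) < 2:   # both copies can't share one device
        return None
    pair = candidates[:2]
    for d in pair:
        free[d] -= chunk
    return tuple(pair)

# One 3 TB device plus three 1 TB devices (sizes in GiB):
free = {"sda": 3000, "sdb": 1000, "sdc": 1000, "sdd": 1000}
while allocate_raid1_chunk(free) is not None:
    pass
print(free)  # all four drain to zero: the 3 TB device takes one copy
             # of every pair until the free-space figures equalize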
On a filesystem with all equal-sized devices, that tends to result in
round-robin allocation, tho of course with an odd number of devices,
there will always be at least one device with either more or less
allocation, by a one-chunk margin. (It can be noted that metadata chunks
are smaller than data chunks. Hugo gave the nominal 1 GiB data chunk size
and 256 MiB metadata chunk size, but at the 100-GiB-plus-per-device scale
chunks can be larger, up to a 10 GiB data chunk, and of course smaller on
very small devices, so the 1-GiB-data/256-MiB-metadata values are indeed
only nominal. Still, they give you some idea of the relative sizes.)
So a btrfs raid1 on four equally sized devices will indeed result in two
pairs, but simply because of the most-space-available allocation rule,
not because it's forced into fixed pairs. And with unequally sized
devices, the device with the most space will always get one of the two
copies, until its free space equalizes with that of at least one other
device.
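(Feeding the toy simulation above four equal devices demonstrates the
same thing: the first chunk-pair lands on the two devices with the most
free space, ties broken arbitrarily, the next pair on the remaining two,
and so on, with the pairing emerging from the most-space rule rather than
being configured anywhere.)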
Btrfs raid10 works similarly for copy allocation, but stripe allocation
works exactly the opposite way, prioritizing stripe width. So with an
even number of equally sized devices, each stripe will be half the number
of devices wide, with the second copy on the other half. With an odd
number of devices, one will be left out of each allocation, but the one
left out changes each time: the device skipped in the previous round now
has more space available than the others, so it's allocated first for one
of the copies, leaving a different device out of this round.

With unequally sized devices, allocation will still always be to an even
number of devices, and always to at least four at once, again favoring
the devices with the most space available. Stripes will always be half
the allocated width, with the second copy of the stripe on the other
half. So each allocation uses all the devices with space left if there's
an even number of them, or all but one if there's an odd number, since
both copies can't be on the same device; the odd device out can't be used
that round, tho it will be the next, with a different device left out
instead.
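And the same sort of toy simulation for the raid10 rule (again my own
Python sketch, not btrfs code, and the grouping of devices into the two
stripe copies is simplified):

def allocate_raid10_chunk(free, chunk=1):
    """free: dict mapping device -> GiB unallocated.
    Returns (copy_a_devs, copy_b_devs), the two halves of one
    allocation, or None once fewer than four devices qualify."""
    candidates = sorted((d for d in free if free[d] >= chunk),
                        key=lambda d: free[d], reverse=True)
    width = len(candidates) // 2 * 2   # round down to an even count
    if width < 4:
        return None
    chosen = candidates[:width]
    for d in chosen:
        free[d] -= chunk
    # First half is one stripe copy, second half the other.
    return chosen[:width // 2], chosen[width // 2:]

# Five equal devices: one sits out each round, a different one each time.
free = {d: 1000 for d in ("sda", "sdb", "sdc", "sdd", "sde")}
for _ in range(3):
    print(allocate_raid10_chunk(free))
# (['sda', 'sdb'], ['sdc', 'sdd'])   sde left out
# (['sde', 'sda'], ['sdb', 'sdc'])   sdd left out
# (['sdd', 'sde'], ['sda', 'sdb'])   sdc left out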
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman