linux-btrfs.vger.kernel.org archive mirror
From: "Austin S. Hemmelgarn" <ahferroin7@gmail.com>
To: linux-btrfs@vger.kernel.org
Subject: Re: Recommended why to use btrfs for production?
Date: Thu, 9 Jun 2016 07:38:43 -0400
Message-ID: <aa93e9e0-eade-085c-c955-2f7e433d92f8@gmail.com>
In-Reply-To: <pan$f0f75$1ba12fd1$de89e173$5dda67a@cox.net>

On 2016-06-09 02:16, Duncan wrote:
> Austin S. Hemmelgarn posted on Fri, 03 Jun 2016 10:21:12 -0400 as
> excerpted:
>
>> As far as BTRFS raid10 mode in general, there are a few things that are
>> important to remember about it:
>> 1. It stores exactly two copies of everything, any extra disks just add
>> to the stripe length on each copy.
>
> I'll add one more, potentially very important, related to this one:
>
> Btrfs raid mode (any of them) works in relation to individual chunks,
> *NOT* individual devices.
>
> What that means for btrfs raid10 in combination with the above exactly
> two copies rule, is that it works rather differently than a standard
> raid10, which can tolerate loss of two devices as long as they're from
> the same mirror set, as the other mirror set will then still be whole.
> Because with btrfs raid10 the mirror sets are dynamic per-chunk, loss of
> a second device close to assures loss of data, because the very likely
> true assumption is that both mirror sets will be affected for some
> chunks, but not others.
Actually, that's not _quite_ the case.  Assuming that you have an even 
number of devices, BTRFS raid10 will currently always span all the 
available devices with two striped copies of the data (if there's an odd 
number, it spans one less than the total, and rotates which one gets 
left out of each chunk).  This means that as long as all the devices are 
the same size and you have stripes that are the full width of the 
array (you can end up with shorter ones if you have run in degraded mode 
or expanded the array), your probability of data loss per-chunk goes 
down as you add more devices (because the probability of a two device 
failure affecting both copies of a stripe in a given chunk decreases), 
but goes up as you add more chunks (because you then have to apply that 
probability for each individual chunk).  Once you've lost one disk, the 
probability that losing another will compromise a specific chunk is:
1/(N - 1)
Where N is the total number of devices.
The probability that it will compromise _any_ chunk, treating each 
chunk as independent, is:
1 - (1 - 1/(N - 1))^C
Where C is the total number of chunks.
BTRFS raid1 mode actually has the exact same probabilities, but they 
apply even if you have an odd number of disks.
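
As a rough sketch of the arithmetic above (not anything taken from the 
BTRFS code), the two probabilities can be computed as follows.  The 
function names are made up for illustration, and the any-chunk figure 
assumes that which devices mirror each other is independent from one 
chunk to the next, which gives 1 - (1 - 1/(N - 1))^C and so rises as 
chunks are added:

```python
from fractions import Fraction


def p_chunk_lost(n_devices: int) -> Fraction:
    """Chance that a second device failure compromises one specific chunk.

    With one device already gone, the chunk is lost only if the second
    failure hits that device's mirror partner: 1 / (N - 1).
    """
    if n_devices < 2:
        raise ValueError("need at least two devices")
    return Fraction(1, n_devices - 1)


def p_any_chunk_lost(n_devices: int, n_chunks: int) -> Fraction:
    """Chance that at least one of C chunks is compromised.

    ASSUMPTION: the mirror pairing of each chunk is independent of the
    others, so the chance that every chunk survives is (1 - p) ** C.
    """
    p = p_chunk_lost(n_devices)
    return 1 - (1 - p) ** n_chunks


if __name__ == "__main__":
    # Per-chunk risk falls as devices are added; any-chunk risk rises
    # as chunks are added.
    for n in (4, 6, 8):
        for c in (1, 10, 100):
            print(n, c, float(p_chunk_lost(n)), float(p_any_chunk_lost(n, c)))
```

Exact fractions are used only to keep the small cases easy to check by 
hand; floats would do just as well for real array sizes.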
>
> By using a layered approach, btrfs raid1 on top (for its error correction
> from the other copy feature) of a pair of mdraid0s, you force one of the
> btrfs raid1 copies to each of the mdraid0s, thus making allocation more
> deterministic than btrfs raid10, and can thus again tolerate loss of two
> devices, as long as they're from the same underlying mdraid0.
>
> (Traditionally, raid1 on top of raid0 is called raid01, and is
> discouraged compared to raid10, raid0 on top of raid1, because device
> failure and replacement with the latter triggers a much more localized
> rebuild than the former, across the pair of devices in the raid1 when
> it's closest to the physical devices, across the whole array, one raid0
> to the other, when the raid1 is on top.  However, btrfs raid1's data
> integrity and error repair from the good mirror feature is generally
> considered to be useful enough to be worth the rebuild-inefficiency of
> the raid01 design.)
>
> So in regard to failure tolerance, btrfs raid10 is far closer to
> traditional raid5, loss of a single device is tolerated, loss of a second
> before a repair is complete generally means data loss -- there's not the
> chance of it being on the same mirror set to save you that traditional
> raid10 has.
>
> Similarly, btrfs raid10 doesn't have the cleanly separate pair of mirrors
> on raid0 arrays that traditional raid10 does, thus doesn't have the fault
> tolerance of losing say the connection or power to one entire device
> bank, as long as it's all one mirror set, that traditional raid10 has.
>
> And again, doing the layered thing with btrfs raid1 on top and mdraid0
> (or whatever else) underneath gets that back for you, if you set it up
> that way, of course.
And will get you better performance than just BTRFS most of the time too.
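
For comparison, here is a minimal sketch of the two-device failure case 
for the layered setup (BTRFS raid1 on top of two mdraid0 sets).  It 
assumes the two raid0 sets hold the same number of devices and that the 
second failure is equally likely to hit any surviving device; the 
function name is hypothetical:

```python
from fractions import Fraction


def p_survive_second_loss_layered(n_devices: int) -> Fraction:
    """Chance that BTRFS raid1 over two equal mdraid0 sets survives a
    second device failure.

    After the first failure one raid0 set is already broken.  The array
    survives only if the second failed device is in that same set,
    leaving the other raid0 (and so one full raid1 copy) intact.
    ASSUMPTION: every surviving device is equally likely to fail next.
    """
    if n_devices < 4 or n_devices % 2:
        raise ValueError("expects an even number of devices, at least 4")
    remaining_in_broken_set = n_devices // 2 - 1
    return Fraction(remaining_in_broken_set, n_devices - 1)


if __name__ == "__main__":
    for n in (4, 6, 8, 12):
        print(n, p_survive_second_loss_layered(n))
```

Unlike the BTRFS raid10 case, this figure does not depend on the number 
of chunks, because the device-to-copy mapping is fixed by the mdraid0 
layout.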

