Re: I need to P. are we almost there yet?

linux-btrfs.vger.kernel.org archive mirror

From: Austin S Hemmelgarn <ahferroin7@gmail.com>
To: Brendan Hide <brendan@swiftspirit.co.za>,
	ashford@whisperpc.com, Phillip Susi <psusi@ubuntu.com>
Cc: Jose Manuel Perez Bethencourt <jmperezbeth@gmail.com>,
	Chris Murphy <lists@colorremedies.com>,
	"sys.syphus" <syssyphus@gmail.com>,
	Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: I need to P. are we almost there yet?
Date: Fri, 02 Jan 2015 14:41:10 -0500
Message-ID: <54A6F456.6090701@gmail.com>
In-Reply-To: <54A6D951.9080706@swiftspirit.co.za>

On 2015-01-02 12:45, Brendan Hide wrote:
> On 2015/01/02 15:42, Austin S Hemmelgarn wrote:
>> On 2014-12-31 12:27, ashford@whisperpc.com wrote:
>>> I see this as a CRITICAL design flaw.  The reason for calling it
>>> CRITICAL is that System Administrators have been trained for >20 years
>>> that RAID-10 can usually handle a dual-disk failure, but the BTRFS
>>> implementation has effectively ZERO chance of doing so.
>> No, some rather simple math
> That's the problem. The math isn't as simple as you'd expect:
>
> The example below is probably a pathological case - but here goes. Let's
> say in this 4-disk example that chunks are striped as d1,d2,d1,d2 where
> d1 is the first bit of data and d2 is the second:
> Chunk 1 might be striped across disks A,B,C,D as d1,d2,d1,d2
> Chunk 2 might be striped across disks B,C,A,D as d3,d4,d3,d4
> Chunk 3 might be striped across disks D,A,C,B as d5,d6,d5,d6
> Chunk 4 might be striped across disks A,C,B,D as d7,d8,d7,d8
> Chunk 5 might be striped across disks A,C,D,B as d9,d10,d9,d10
>
> Lose any two disks and you have a one-in-three chance on *each* chunk to
> have lost that chunk. With traditional four-disk RAID10 you have the same
> one-in-three chance of losing the array entirely. With btrfs, the more
> data you have stored, the closer the chance of losing *some* data in a
> 2-disk failure gets to 100%.
>
> In the above example, losing A and B means you lose d3, d6, and d7
> (which ends up being 60% of all chunks).
> Losing A and C means you lose d1 (20% of all chunks).
> Losing A and D means you lose d9 (20% of all chunks).
> Losing B and C means you lose d10 (20% of all chunks).
> Losing B and D means you lose d2 (20% of all chunks).
> Losing C and D means you lose d4, d5, AND d8 (60% of all chunks).
>
> The above skewed example has an average of one third (about 33%) of all
> chunks failed. As you add more data and randomise the allocation, this
> will converge on one third, BUT the chance of losing *some* data is
> already 100% here: every one of the six two-disk failures loses at least
> one chunk.
>
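The case-by-case figures in the quoted example can be checked mechanically. The sketch below encodes the five chunks exactly as listed (disk order plus the stripe each disk holds), treats a chunk as lost when every copy of one of its stripes sits on a failed disk, and tallies all six two-disk failures. The names LAYOUT, chunks_lost and results are illustrative only, not btrfs code.

```python
from fractions import Fraction
from itertools import combinations

# The five chunks from the quoted example: the disks in stripe order and the
# data stripe each disk holds (d1,d2,d1,d2 means two mirrored stripes).
LAYOUT = [
    ("ABCD", ("d1", "d2", "d1", "d2")),
    ("BCAD", ("d3", "d4", "d3", "d4")),
    ("DACB", ("d5", "d6", "d5", "d6")),
    ("ACBD", ("d7", "d8", "d7", "d8")),
    ("ACDB", ("d9", "d10", "d9", "d10")),
]


def chunks_lost(failed):
    """Return (number of chunks lost, stripes lost) when `failed` disks die."""
    count, lost = 0, []
    for disks, stripes in LAYOUT:
        holders = {}  # stripe -> set of disks holding a copy of it
        for disk, stripe in zip(disks, stripes):
            holders.setdefault(stripe, set()).add(disk)
        # A stripe is gone when every disk holding a copy has failed.
        dead = [s for s, h in holders.items() if h <= failed]
        if dead:
            count += 1
            lost.extend(dead)
    return count, lost


results = {}
for pair in combinations("ABCD", 2):
    count, lost = chunks_lost(set(pair))
    results["".join(pair)] = (Fraction(count, len(LAYOUT)), lost)

for pair, (frac, lost) in results.items():
    print(f"lose {pair[0]}+{pair[1]}: {', '.join(lost)} ({float(frac):.0%} of chunks)")

# Mean fraction of chunks lost over the six equally likely two-disk failures.
average = sum(frac for frac, _ in results.values()) / len(results)
print(f"average over all six failures: {float(average):.1%}")
```

Running it gives three lost chunks for the A+B and C+D failures and one for each of the other four pairs, so the average over the six cases is exactly one third.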
OK, I forgot about the randomization effect that chunk allocation and
freeing have.  We really should slap a *BIG* warning label on that (and
ideally find a better allocation scheme so that raid10 behaves more
reliably).
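The effect of that randomization can be sketched with a small model. It is an illustrative assumption, not the real btrfs allocator (whose device choices depend on things like free space per device): each four-disk raid10 chunk gets its two mirror pairs by shuffling the disks uniformly and independently. Under that model the four disks split into one of three equally likely pairings, and a fixed two-disk failure destroys a chunk only when the failed disks form one of its mirror pairs, so each chunk is lost with probability 1/3 and the chance of losing at least one of n chunks is 1 - (2/3)^n. All function names below are hypothetical.

```python
import random
from fractions import Fraction

DISKS = "ABCD"


def random_chunk(rng):
    """Model one raid10 chunk: shuffle the disks, then positions 0/2 mirror one
    stripe and positions 1/3 mirror the other, as in the quoted layout.
    Uniform shuffling is an illustrative assumption, not btrfs behaviour."""
    order = rng.sample(DISKS, len(DISKS))
    return (frozenset(order[0::2]), frozenset(order[1::2]))


def chunk_lost(chunk, failed):
    # Lost when both copies of either stripe are on failed disks.
    return any(mirror <= failed for mirror in chunk)


def p_some_loss_exact(n_chunks):
    """Exact P(at least one of n chunks lost) for a fixed two-disk failure."""
    # Each chunk is lost with probability 1/3, independently of the others.
    return 1 - Fraction(2, 3) ** n_chunks


def simulate(n_chunks, trials, seed=0):
    """Monte Carlo estimate: (fraction of chunks lost, fraction of runs losing data)."""
    rng = random.Random(seed)
    failed = frozenset("AB")  # by symmetry any fixed pair behaves the same
    chunks_lost = runs_with_loss = 0
    for _ in range(trials):
        lost = sum(chunk_lost(random_chunk(rng), failed) for _ in range(n_chunks))
        chunks_lost += lost
        runs_with_loss += lost > 0
    return chunks_lost / (trials * n_chunks), runs_with_loss / trials


for n in (1, 5, 20, 100):
    per_chunk, some = simulate(n, trials=2000)
    exact = float(p_some_loss_exact(n))
    print(f"{n:>3} chunks: {per_chunk:.1%} of chunks lost, "
          f"P(some loss) {some:.1%} (exact {exact:.1%})")
```

With five chunks the exact figure is already 1 - 32/243, about 86.8%, and by twenty chunks it exceeds 99.9%, while the fraction of chunks lost stays near one third however much data is stored.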

As an aside, I've found that a BTRFS raid1 set on top of two LVM/MD RAID0
sets is actually faster than a BTRFS raid10 set with the same number of
disks (how much faster is workload-dependent), and it provides better
guarantees than a BTRFS raid10 set: any failure confined to one of the
RAID0 sets loses no data.
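The guarantee side of that comparison can be made concrete by classifying every two-disk failure on a four-disk array as losing none, some, or all of the data. The disk-to-set assignment ({A,B} and {C,D}), the fixed mirror pairs for classic RAID10, and the random per-chunk pairing used for btrfs raid10 are all hypothetical choices for illustration, not measurements, and the sketch says nothing about performance.

```python
import random
from itertools import combinations

DISKS = "ABCD"
# Every possible two-disk failure on a four-disk array.
PAIRS = [frozenset(p) for p in combinations(DISKS, 2)]


def classic_raid10(failed):
    """Classic RAID10 with fixed mirror pairs {A,B} and {C,D} (assumed layout)."""
    mirrors = (frozenset("AB"), frozenset("CD"))
    return "all" if any(m <= failed for m in mirrors) else "none"


def raid1_over_raid0(failed):
    """btrfs raid1 across two RAID0 sets {A,B} and {C,D} (assumed layout)."""
    raid0_sets = (frozenset("AB"), frozenset("CD"))
    # A RAID0 set is gone if any member fails; data is lost only if both are gone.
    return "all" if all(s & failed for s in raid0_sets) else "none"


def btrfs_raid10(failed, n_chunks=1000, seed=0):
    """btrfs raid10 modelled with a random mirror pairing per chunk (assumption)."""
    rng = random.Random(seed)
    lost = 0
    for _ in range(n_chunks):
        order = rng.sample(DISKS, len(DISKS))
        mirrors = (frozenset(order[0::2]), frozenset(order[1::2]))
        lost += any(m <= failed for m in mirrors)
    if lost == 0:
        return "none"
    return "all" if lost == n_chunks else "some"


LAYOUTS = {
    "classic RAID10": classic_raid10,
    "raid1 over 2x RAID0": raid1_over_raid0,
    "btrfs raid10 (model)": btrfs_raid10,
}

for name, layout in LAYOUTS.items():
    outcomes = [layout(pair) for pair in PAIRS]
    counts = {k: outcomes.count(k) for k in ("none", "some", "all")}
    print(f"{name:<22} {counts}")
```

Under these assumptions the first two layouts are all-or-nothing: classic RAID10 survives four of the six two-disk failures and raid1 over RAID0 survives the two that stay inside one set, while the btrfs raid10 model loses part of its data in every case.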


Thread overview: 29+ messages
2014-12-29 18:56 I need to P. are we almost there yet? sys.syphus
2014-12-29 19:00 ` sys.syphus
2014-12-29 19:04   ` Hugo Mills
2014-12-29 20:25     ` sys.syphus
2014-12-29 21:50       ` Hugo Mills
2014-12-29 21:16   ` Chris Murphy
2014-12-30  0:20     ` ashford
     [not found]       ` <CALBWd85UsSih24RhwpmDeMjuMWCKj9dGeuZes5POj6qEFkiz2w@mail.gmail.com>
2014-12-30 17:09         ` Fwd: " Jose Manuel Perez Bethencourt
2014-12-30 21:44       ` Phillip Susi
2014-12-30 23:17         ` ashford
2014-12-31  2:45           ` Phillip Susi
2014-12-31 17:27             ` ashford
2014-12-31 23:38               ` Phillip Susi
2015-01-01  1:26               ` Chris Samuel
2015-01-01 20:12                 ` Roger Binns
2015-01-02  3:47                   ` Duncan
2015-01-02 13:42               ` Austin S Hemmelgarn
2015-01-02 17:45                 ` Brendan Hide
2015-01-02 19:41                   ` Austin S Hemmelgarn [this message]
2014-12-29 21:13 ` Chris Murphy
2015-01-03 11:34 ` Bob Marley
2015-01-03 13:11   ` Duncan
2015-01-03 18:53     ` Bob Marley
2015-01-03 19:03       ` sys.syphus
2015-01-03 18:55     ` sys.syphus
2015-01-04  3:22       ` Duncan
2015-01-04  3:54         ` Hugo Mills
2015-01-03 21:58     ` Roman Mamedov
2015-01-04  3:24       ` Duncan
