linux-btrfs.vger.kernel.org archive mirror
From: Qu Wenruo <quwenruo@cn.fujitsu.com>
To: Jan Vales <jan@jvales.net>, Hugo Mills <hugo@carfax.org.uk>,
	Christoph Anton Mitterer <calestyo@scientia.net>,
	<linux-btrfs@vger.kernel.org>
Subject: Re: RAID56 status?
Date: Mon, 23 Jan 2017 09:34:10 +0800	[thread overview]
Message-ID: <35d49fbd-102c-3e54-4e40-5f71fb88d2da@cn.fujitsu.com> (raw)
In-Reply-To: <3e9fe326-7638-bc78-4fe8-f6f6851769d7@jvales.net>



At 01/23/2017 08:25 AM, Jan Vales wrote:
> On 01/22/2017 11:39 PM, Hugo Mills wrote:
>> On Sun, Jan 22, 2017 at 11:35:49PM +0100, Christoph Anton Mitterer wrote:
>>> On Sun, 2017-01-22 at 22:22 +0100, Jan Vales wrote:
>>>> Therefore my question: what's the status of raid5/6 in btrfs?
>>>> Is it somehow "production"-ready by now?
>>> AFAIK, what's on the - apparently already no longer updated -
>>> https://btrfs.wiki.kernel.org/index.php/Status still applies, and
>>> RAID56 is not yet usable for anything near production.
>>
>>    It's still all valid. Nothing's changed.
>>
>>    How would you like it to be updated? "Nope, still broken"?
>>
>>    Hugo.
>>
>>

I'd like to update the wiki to "More and more RAID5/6 bugs are found" :)

OK, no kidding. We have indeed exposed several new bugs, and reports
have existed on the mailing list for a while.

Some examples are:

1) RAID5/6 scrub will repair data while corrupting parity
    Quite ironic: the repair just turns one corruption into
    another.

2) RAID5/6 scrub can report false alerts on csum error

3) Dev-replace cancel sometimes can cause kernel panic.
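
To make bug 1 concrete, here is the underlying parity math in a toy Python sketch (generic RAID5 XOR parity, not btrfs code; all names are made up for illustration). If scrub recomputes parity from already-corrupted data, the original data becomes unrecoverable:

```python
# Toy RAID5 parity math: parity is the XOR of the data blocks, so a lost
# block can be rebuilt from the rest. But a scrub that recomputes parity
# from corrupted data just bakes the corruption in.

def xor_blocks(*blocks):
    """XOR equal-length byte blocks together."""
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for i, byte in enumerate(b):
            out[i] ^= byte
    return bytes(out)

d1, d2 = b"AAAA", b"BBBB"
parity = xor_blocks(d1, d2)

# Normal repair: rebuild d1 from d2 and parity.
assert xor_blocks(d2, parity) == d1

# Bug-1 scenario: d1 is silently corrupted and scrub recomputes parity
# from the bad data; the original d1 can no longer be rebuilt.
bad_d1 = b"XXXX"
bad_parity = xor_blocks(bad_d1, d2)
assert xor_blocks(d2, bad_parity) != d1
```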

And I wouldn't be surprised at all if we find more.

So if you really want to use RAID5/6, please use software RAID, then
build a single-device btrfs on top of it.
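
A minimal sketch of that setup, assuming three spare disks at the hypothetical paths /dev/sdb, /dev/sdc and /dev/sdd (adjust for your hardware; all commands need root):

```shell
# 1) Build the RAID5 out of md, the long-proven software RAID layer:
mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sdb /dev/sdc /dev/sdd

# 2) Put a plain single-device btrfs on top of the md array:
mkfs.btrfs --data single /dev/md0
mount /dev/md0 /mnt
```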

I'm seriously considering re-implementing btrfs RAID5/6 on top of
device mapper, which is tried and true.

>
> As the changelog stops at 4.7 the wiki seemed a little dead - "still
> broken as of $(date)" or something like that would be nice ^.^
>
> Also some more exact documentation/definition of btrfs' raid-levels
> would be cool, as they seem to mismatch traditional raid-levels - or at
> least I as an ignorant user fail to understand them...

man mkfs.btrfs has quite a good table of the btrfs profiles.

>
> Correct me if I'm wrong...
> * It seems, raid1(btrfs) is actually raid10, as there are no more than 2
> copies of data, regardless of the count of devices.

Somewhat right, although the stripe size of RAID10 is 64K while RAID1's
is the chunk size (normally 1G for data), and the large stripe size for
RAID1 makes it meaningless to call it RAID0.

> ** Is there a way to duplicate data n-times?

The only supported n-way duplication is 3 copies, using RAID6 on 3
devices, and I don't consider it safe compared to RAID1.

> ** If there are only 3 devices and the wrong device dies... is it dead?

For RAID1/10/5/6, theoretically it's still alive.
RAID5/6 of course has no problem with a single missing device.

For RAID1 there are always 2 mirrors, and the mirrors are always located
on different devices, so no matter which mirror dies, btrfs can still
read the data.

But in practice? It's btrfs, you know, right?

> * What's the difference between raid1(btrfs) and raid10(btrfs)?

RAID1: Pure mirror, no striping
           Disk 1                |           Disk 2
----------------------------------------------------------------
  Data Data Data Data Data       | Data Data Data Data Data
  \                      /
      Full one chunk

Chunks are always allocated on the device with the most unallocated
space, so you can think of it as extent-level RAID1 with chunk-level RAID0.
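
That allocation policy can be sketched in a few lines of Python (a toy model of the behaviour described above, not real btrfs code; device names are made up):

```python
# Toy model of btrfs RAID1 chunk allocation: each new chunk gets two
# copies, placed on the two devices that currently have the most
# unallocated space.

def allocate_raid1_chunk(free_space, chunk_size):
    """Pick the two devices with the most free space; allocate one copy on each."""
    picks = sorted(free_space, key=free_space.get, reverse=True)[:2]
    for dev in picks:
        free_space[dev] -= chunk_size
    return picks

devs = {"sda": 10, "sdb": 8, "sdc": 6}   # unallocated space in GiB
assert allocate_raid1_chunk(devs, 1) == ["sda", "sdb"]
# After enough allocations the third disk gets used too, which is why a
# 3-device RAID1 can lose any one disk and still keep one copy of everything.
```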

RAID10: RAID1 first, then RAID0
         IIRC RAID0 stripe size is 64K

Disk 1 | Data 1 (64K) Data 4 (64K)
Disk 2 | Data 1 (64K) Data 4 (64K)
---------------------------------------
Disk 3 | Data 2 (64K)
Disk 4 | Data 2 (64K)
---------------------------------------
Disk 5 | Data 3 (64K)
Disk 6 | Data 3 (64K)


> ** After reading like 5 different wiki pages, I understood that there
> are differences ... but not what they are or how they affect me :/

Chunk-level striping won't have any obvious performance advantage, while
64K-level striping does.

> * What's the difference between raid0(btrfs) and "normal" multi-device
> operation, which seems like a traditional raid0 to me?

What's "normal" or traditional RAID0?
Doesn't it use all devices for striping? Or just 2?



Btrfs RAID0 always uses a 64K stripe size (not only RAID0, but also
RAID10/5/6).

Btrfs chunk allocation also provides chunk-size-level striping, which is
1G for data (assuming your fs is larger than 10G) or 256M for metadata.

But that striping size won't provide anything useful, so you can just
forget about the chunk-level thing.

Apart from that, btrfs RAID should match normal RAID quite closely.

Thanks,
Qu

>
> Maybe rename/alias raid-levels that do not match traditional
> raid-levels, so one cannot expect some behavior that is not there.
> The extreme example is imho raid1(btrfs) vs raid1.
> I would expect that if I have 5 btrfs-raid1 devices, 4 may die and btrfs
> should be able to fully recover, which, if I understand correctly, by
> far does not hold.
> If you named that raid-level say "george" ... I would need to consult
> the docs and I obviously would not expect any behavior. :)
>
> regards,
> Jan Vales
> --
> I only read plaintext emails.
>



Thread overview: 14+ messages
2017-01-22 21:22 RAID56 status? Jan Vales
2017-01-22 22:35 ` Christoph Anton Mitterer
2017-01-22 22:39   ` Hugo Mills
2017-01-22 22:48     ` Waxhead
2017-01-22 22:56     ` Christoph Anton Mitterer
2017-01-23  0:25     ` Jan Vales
2017-01-23  1:34       ` Qu Wenruo [this message]
     [not found]         ` <CALqCWCXNoKqAJR=7c4wzOvVjSBxNRMsUYMvkfRMcVk14dkp27Q@mail.gmail.com>
2017-01-23  5:24           ` Qu Wenruo
2017-01-23 17:53             ` Christoph Anton Mitterer
2017-01-23 23:18               ` Chris Mason
2017-01-23 23:31                 ` Christoph Anton Mitterer
2017-01-24 14:36                   ` Niccolò Belli
2017-01-23  6:57       ` Brendan Hide
2017-01-23  8:34         ` Janos Toth F.
