From: "Austin S. Hemmelgarn" <ahferroin7@gmail.com>
To: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 0/4] 3- and 4- copy RAID1
Date: Wed, 18 Jul 2018 08:50:26 -0400
Message-ID: <6901a05c-d71b-bf2e-b66f-69b02aee527d@gmail.com>
In-Reply-To: <pan$98af2$39851f53$ef9fd141$bfd51f5b@cox.net>

On 2018-07-18 03:20, Duncan wrote:
> Goffredo Baroncelli posted on Wed, 18 Jul 2018 07:59:52 +0200 as
> excerpted:
> 
>> On 07/17/2018 11:12 PM, Duncan wrote:
>>> Goffredo Baroncelli posted on Mon, 16 Jul 2018 20:29:46 +0200 as
>>> excerpted:
>>>
>>>> On 07/15/2018 04:37 PM, waxhead wrote:
>>>
>>>> Striping and mirroring/pairing are orthogonal properties; mirror and
>>>> parity are mutually exclusive.
>>>
>>> I can't agree.  I don't know whether you meant that in the global
>>> sense,
>>> or purely in the btrfs context (which I suspect), but either way I
>>> can't agree.
>>>
>>> In the pure btrfs context, while striping and mirroring/pairing are
>>> orthogonal today, Hugo's whole point was that btrfs is theoretically
>>> flexible enough to allow both together and the feature may at some
>>> point be added, so it makes sense to have a layout notation format
>>> flexible enough to allow it as well.
>>
>> When I say orthogonal, I mean that these can be combined, i.e. you can
>> have:
>> - striping (RAID0)
>> - parity  (?)
>> - striping + parity  (e.g. RAID5/6)
>> - mirroring  (RAID1)
>> - mirroring + striping  (RAID10)
>>
>> However, you can't have mirroring+parity; this means that a notation
>> with both 'C' (= number of copies) and 'P' (= number of parities) is
>> too verbose.
> 
> Yes, you can have mirroring+parity, conceptually it's simply raid5/6 on
> top of mirroring or mirroring on top of raid5/6, much as raid10 is
> conceptually just raid0 on top of raid1, and raid01 is conceptually raid1
> on top of raid0.
> 
> While it's not possible today on (pure) btrfs (it's possible today with
> md/dm-raid or hardware-raid handling one layer), it's theoretically
> possible both for btrfs and in general, and it could be added to btrfs in
> the future, so a notation with the flexibility to allow parity and
> mirroring together does make sense, and having just that sort of
> flexibility is exactly why Hugo made the notation proposal he did.
> 
> Though a sensible use-case for mirroring+parity is a different question.
> I can see a case being made for it if one layer is hardware/firmware
> raid, but I'm not entirely sure what the use-case for pure-btrfs raid16
> or 61 (or 15 or 51) might be, where pure mirroring or pure parity
> wouldn't arguably be at least as good a match to the use-case.  Perhaps
> one of
> the other experts in such things here might help with that.
> 
>>>> Question #2: historically RAID10 requires 4 disks. However, I am
>>>> wondering whether the stripe could be done on a different number of
>>>> disks: what about RAID1+striping on 3 (or 5) disks? The key of
>>>> striping is that every 64k, the data are stored on a different
>>>> disk....
>>>
>>> As someone else pointed out, md/lvm-raid10 already work like this.
>>> What btrfs calls raid10 is somewhat different, but btrfs raid1 pretty
>>> much works this way except with huge (gig size) chunks.
>>
>> As implemented in BTRFS, raid1 doesn't have striping.
> 
> The argument is that because there are only two copies, on multi-device
> btrfs raid1 with 4+ devices of equal size, chunk allocations tend to
> alternate device pairs, so it's effectively striped at the macro level,
> with the 1 GiB device-level chunks effectively being huge individual
> device strips of 1 GiB.

Actually, it also behaves like LVM and MD RAID10 for any number of
devices greater than 2, though the exact placement may diverge because
of BTRFS's concept of different chunk types.  In LVM and MD RAID10, each
block is stored as two copies, and which disks they end up on depends
on the block number modulo the number of disks (so, for 3 disks A, B,
and C, block 0 is on A and B, block 1 is on C and A, and block 2 is on B
and C, with subsequent blocks following the same pattern).  In an
idealized model of BTRFS with only one chunk type, you get exactly the
same behavior (because BTRFS allocates chunks based on disk utilization,
and prefers lower-numbered disks to higher ones in the event of a tie).
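
To make the comparison concrete, here is a small Python sketch of both
placement rules.  It is a toy model under stated assumptions, not how
MD, LVM, or BTRFS actually implement allocation: the function names are
made up for illustration, disks are numbered from 0, all disks are the
same size, there is only one chunk type, and the RAID10 rule is written
as "copies of block n go to disks 2n and 2n+1, modulo the disk count",
which is one way to express the pattern in the example above.

```python
def raid10_two_copy(block, ndisks):
    """Hypothetical helper: disks holding the two copies of `block`.

    Assumes copies are laid out consecutively across the disks and
    wrap around, so block n uses disks (2n) % N and (2n + 1) % N.
    """
    first = (2 * block) % ndisks
    return (first, (first + 1) % ndisks)


def idealized_btrfs_raid1(nchunks, ndisks, disk_size):
    """Hypothetical helper: idealized two-copy chunk allocation.

    Assumes each chunk goes to the two disks with the most free space,
    with ties broken in favor of the lower-numbered disk, and that
    every chunk consumes one unit of space on each disk it uses.
    """
    free = [disk_size] * ndisks
    layout = []
    for _ in range(nchunks):
        # Most free space first; lower disk number wins a tie.
        picked = sorted(range(ndisks), key=lambda d: (-free[d], d))[:2]
        for d in picked:
            free[d] -= 1
        layout.append(tuple(picked))
    return layout
```

With 3 disks (0, 1, 2 standing in for A, B, C), both functions give the
pairs (0, 1), (2, 0), (1, 2) for the first three blocks/chunks, i.e. the
A/B, C/A, B/C pattern described above.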
> 
> At 1 GiB strip size it doesn't have the typical performance advantage of
> striping, but conceptually, it's equivalent to raid10 with huge 1 GiB
> strips/chunks.
>

