From: George Mitchell <george@chinilu.com>
To: Duncan <1i5t5.duncan@cox.net>, linux-btrfs@vger.kernel.org
Subject: Re: How does btrfs handle bad blocks in raid1?
Date: Fri, 10 Jan 2014 07:46:29 -0800
Message-ID: <52D015D5.4050909@chinilu.com>
In-Reply-To: <pan$d99db$e3c147a3$da4732f1$5ec5ee5@cox.net>

On 01/10/2014 07:27 AM, Duncan wrote:
> George Eleftheriou posted on Thu, 09 Jan 2014 17:49:48 +0100 as excerpted:
>
>> I'm really looking forward to the day that typing:
>>
>> mkfs.btrfs -d raid10 -m raid10  /dev/sd[abcd]
>>
>> will do exactly what it is expected to do. A true RAID10, resilient to
>> a 2-disk failure. Simple and beautiful.
>>
>> We're almost there...
> I see the further discussion, but three comments:
>
> 1) (As should be obvious by now, but as the saying goes...)
>
> I want N-way-mirroring so bad I can taste it!  =:^)
>
> 2) Assuming a guaranteed 2-device-drop safe 3(+)-way-mirroring
> possibility, the above mkfs.btrfs would by the same assumption of
> necessity be a bit more complicated than that (and would require six
> devices of the same size for simplest conceptual formulation, not the
> four shown above).
>
> Because at that point, a distinction between these two possibilities for
> a 6-device raid10 would need to be made:
>
> * Two-way raid1/mirror on the devices, three-way raid0/stripe on top.
>
> This is the current default and only choice, as discussed elsewhere in
> the subthread.  The three-way-stripe is 3X fast (ideal, probably more
> like 2X fast in practice, allowing for overhead), while the 2-way-mirror
> provides guaranteed 1-device-drop safety, with a possibility to lose two
> devices and recover, or not, depending on which two they are.
>
> For maximum backward compatibility with what we have now, since it /is/
> what we have now, that's likely what you'd still get with this:
>
> mkfs.btrfs -d raid10 -m raid10 /dev/sd[abcdef]
>
> ... but it'd only guarantee single-device-drop safety.
>
> The alternative, which I want so bad I can taste it, would be:
>
> * Three-way raid1/mirror on the devices, two-way raid0/stripe on top.
>
> That would sacrifice the 3X speed, reducing it to 2X (ideal, probably 1.5X
> in practice due to overhead), but the 3-way-mirror would provide *BOTH*
> guaranteed 2-device-drop safety, *AND* guaranteed checksummed 3-way
> individual-btrfs-node integrity-checked mirroring, such that should any
> two of the three mirrors fail checksum, there'd still be that third copy.
>
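
The difference between the two six-device layouts described above can be
checked by brute force.  The following is a minimal Python sketch, assuming
a simplified model in which the devices are split into fixed mirror groups
and the array survives as long as every group keeps at least one working
device.  The function names are made up for illustration, and the
fixed-group model is an assumption for the sake of the arithmetic, not a
description of how btrfs actually allocates chunks.

```python
from itertools import combinations


def mirror_groups(n_devices, copies):
    """Split device indices 0..n-1 into consecutive mirror groups.

    Assumption: each group of `copies` devices holds identical data,
    and the stripe runs across the groups.
    """
    if n_devices % copies != 0:
        raise ValueError("device count must be a multiple of the copy count")
    return [set(range(i, i + copies)) for i in range(0, n_devices, copies)]


def survives(groups, failed):
    """True if every mirror group still has at least one working device."""
    return all(group - failed for group in groups)


def survivable_fraction(n_devices, copies, n_failed):
    """Count how many n_failed-device failure combinations are survivable."""
    groups = mirror_groups(n_devices, copies)
    cases = [set(c) for c in combinations(range(n_devices), n_failed)]
    ok = sum(survives(groups, failed) for failed in cases)
    return ok, len(cases)


# Compare 2-way mirror / 3-way stripe with 3-way mirror / 2-way stripe.
for copies in (2, 3):
    stripes = 6 // copies
    for n_failed in (1, 2):
        ok, total = survivable_fraction(6, copies, n_failed)
        print(f"{copies}-way mirror, {stripes}-way stripe, "
              f"{n_failed} failed: {ok}/{total} survivable")
```

Under that model, 12 of the 15 possible two-device failures are survivable
with 2-way mirroring (the three fatal cases are the ones that take out both
halves of one mirror pair), while all 15 are survivable with 3-way
mirroring.
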
> What would the mkfs.btrfs command look like for that?  I've no insight on
> exactly how they plan to implement it, but here's one possible idea:
>
> mkfs.btrfs -d raid10.3 -m raid10.3 /dev/sd[abcdef]
>
> The ".3" bit would indicate three-way-mirroring instead of the default 2-
> way-mirroring.  It has the advantage of relative brevity, but isn't
> entirely intuitive.
>
> Another possibility would be a more explicit two-component mode-spec,
> like this:
>
> mkfs.btrfs -d mirror3 (-d) raid10, -m mirror3 (-m) raid10 /dev/sd[abcdef]
>
> (Whether the second -d/-m specifier was required to be there, optional,
> or could not be there, would depend on how they set up the parser.
> Another option would be a no-space comma separator: -d mirror3,raid10
> -m mirror3,raid10 .)
>
> This is more verbose but MUCH clearer, and as such I believe would be
> preferred to the dot-format, since after all, mkfs isn't something most
> people do a lot of, so clarity should be preferred to brevity.  And I'd
> predict the no-space-comma-separator, since that format's least
> complicated in terms of shell parsing, and is already familiar from usage
> in fstab, among other places.
>
> Oh, that would taste SOOO good! =:^)
>
> 3) Just for clarity in case anyone were to get mixed up, those devices
> can be partitions (or for that matter, mdraids or whatever) too.  They
> don't have to be actual whole physical devices.  So /dev/sd[abcdef]5 ,
> for instance, would work too.  That's actually what I'm already doing
> here, although obviously not with the n-way-mirroring I so want, as it's
> not available yet.
>
> (This comment specifically included since the fact that multi-device
> btrfs could be on partition-devices wasn't clear to at least one list
> poster, not that long ago.  So just to make it explicitly clear to
> anybody stumbling on this post from Google or whatever...)
>
Duncan, you are describing exactly the sort of ROBUST RAID product I
would like to see btrfs become.  In this world of ridiculously
inexpensive hard drives, I don't think we should ever have to risk ending
up in a degraded state, certainly not for long, and ideally never.  We
should never end up in a panic to change out a drive, only to face
additional panic as to whether a rebuild is going to succeed or fall on
its face.  Those days should be over forever, barring, of course, a
direct nuclear hit.  - George
