From: Duncan <1i5t5.duncan@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: How does btrfs handle bad blocks in raid1?
Date: Fri, 10 Jan 2014 15:27:58 +0000 (UTC)
Message-ID: <pan$d99db$e3c147a3$da4732f1$5ec5ee5@cox.net>
In-Reply-To: CAPpBBdNNHB3igMK67j5FpZxg7f6tmxGZk8x1OSwKd8GhTqsj=w@mail.gmail.com

George Eleftheriou posted on Thu, 09 Jan 2014 17:49:48 +0100 as excerpted:

> I'm really looking forward to the day that typing:
> 
> mkfs.btrfs -d raid10 -m raid10  /dev/sd[abcd]
> 
> will do exactly what is expected to do. A true RAID10 resilient in 2
> disks' failure. Simple and beautiful.
> 
> We're almost there...

I see the further discussion, but three comments:

1) (As should be obvious by now, but as the saying goes...)

I want N-way-mirroring so bad I can taste it!  =:^)

2) Assuming a guaranteed 2-device-drop-safe 3(+)-way-mirroring 
possibility, the above mkfs.btrfs command would of necessity be a bit 
more complicated than that (and in its simplest conceptual formulation 
would require six devices of the same size, not the four shown above).

Because at that point, a distinction between these two possibilities for 
a 6-device raid10 would need to be made:

* Two-way raid1/mirror on the devices, three-way raid0/stripe on top.

This is the current default and only choice, as discussed elsewhere in 
the subthread.  The three-way-stripe is ideally 3X as fast as a single 
device (probably more like 2X in practice, allowing for overhead), while 
the 2-way-mirror provides guaranteed 1-device-drop safety.  Losing two 
devices may or may not be recoverable, depending on which two they are.

For maximum backward compatibility with what we have now, since it /is/ 
what we have now, that's likely what you'd still get with this:

mkfs.btrfs -d raid10 -m raid10 /dev/sd[abcdef]

... but it'd only guarantee single-device-drop safety.

The alternative, which I want so bad I can taste it, would be:

* Three-way raid1/mirror on the devices, two-way raid0/stripe on top.

That would sacrifice the 3X speed, reducing it to 2X (ideal, probably 
1.5X in practice due to overhead), but the 3-way-mirror would provide 
*BOTH* guaranteed 2-device-drop safety, *AND* guaranteed checksummed 
3-way individual-btrfs-node integrity-checked mirroring, such that should 
any two of the three mirrors fail checksum, there'd still be that third 
copy.
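
To make the difference between the two layouts concrete, here's a minimal 
sketch that brute-forces which device losses each one survives.  It 
assumes the simplest conceptual formulation, with the six devices split 
into fixed, consecutive mirror groups.  That's a simplification for 
illustration, not a description of how btrfs actually allocates chunks, 
and the function names are made up for the example.

```python
from itertools import combinations


def mirror_groups(devices, copies):
    """Split devices into consecutive mirror groups of `copies` devices each.

    Assumption: fixed groups, as in the simplest conceptual raid10 model.
    """
    if len(devices) % copies:
        raise ValueError("device count must be a multiple of the copy count")
    return [set(devices[i:i + copies]) for i in range(0, len(devices), copies)]


def survives(groups, failed):
    """Data survives if every mirror group keeps at least one working device."""
    return all(group - set(failed) for group in groups)


def guaranteed_drop_safety(devices, copies):
    """Largest k such that every possible loss of k devices is survivable."""
    groups = mirror_groups(devices, copies)
    k = 0
    while k < len(devices) and all(
        survives(groups, failed) for failed in combinations(devices, k + 1)
    ):
        k += 1
    return k


devices = ["sda", "sdb", "sdc", "sdd", "sde", "sdf"]
for copies in (2, 3):
    groups = mirror_groups(devices, copies)
    pairs = list(combinations(devices, 2))
    ok = sum(survives(groups, pair) for pair in pairs)
    print(
        f"{copies}-way mirror, {len(groups)}-way stripe: "
        f"guaranteed {guaranteed_drop_safety(devices, copies)}-device-drop safe, "
        f"{ok}/{len(pairs)} two-device losses survivable"
    )
```

Under that model the 2-way-mirror layout is only guaranteed to survive 
one lost device, with 12 of the 15 possible two-device losses being 
recoverable, while the 3-way-mirror layout survives all 15.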

What would the mkfs.btrfs command look like for that?  I've no insight on 
exactly how they plan to implement it, but here's one possible idea:

mkfs.btrfs -d raid10.3 -m raid10.3 /dev/sd[abcdef]

The ".3" bit would indicate three-way-mirroring instead of the default 2-
way-mirroring.  It has the advantage of relative brevity, but isn't 
entirely intuitive.

Another possibility would be a more explicit two-component mode-spec, 
like this:

mkfs.btrfs -d mirror3 (-d) raid10, -m mirror3 (-m) raid10 /dev/sd[abcdef]

(Whether the second -d/-m specifier was required to be there, optional, 
or could not be there, would depend on how they set up the parser.  
Another option would be a no-space comma separator: -d mirror3,raid10 
-m mirror3,raid10 .)

This is more verbose but MUCH clearer, and as such I believe would be 
preferred to the dot-format, since after all, mkfs isn't something most 
people do a lot of, so clarity should be preferred to brevity.  And I'd 
predict the no-space-comma-separator, since that format's least 
complicated in terms of shell parsing, and is already familiar from usage 
in fstab, among other places.
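
For what it's worth, either spelling would be easy enough to parse.  
Here's a rough sketch of a parser that accepts both the dot format and 
the no-space comma format.  To be clear, both syntaxes are only the 
speculation above, not anything mkfs.btrfs is known to accept, and 
parse_profile is a hypothetical helper, not btrfs-progs code.

```python
import re


def parse_profile(spec, default_copies=2):
    """Parse a HYPOTHETICAL profile spec into (profile, mirror_copies).

    Accepts the two speculative spellings from this post:
      "raid10.3"        dot format
      "mirror3,raid10"  no-space comma format
    A plain "raid10" falls back to default_copies.
    """
    # Dot format: profile name, a dot, then the mirror count.
    match = re.fullmatch(r"([a-z]+\d*)\.(\d+)", spec)
    if match:
        return match.group(1), int(match.group(2))

    # Comma format: an optional mirrorN component plus one profile name.
    profile, copies = None, default_copies
    for part in spec.split(","):
        match = re.fullmatch(r"mirror(\d+)", part)
        if match:
            copies = int(match.group(1))
        elif part and profile is None:
            profile = part
        else:
            raise ValueError(f"cannot parse profile spec: {spec!r}")
    if profile is None:
        raise ValueError(f"no profile named in spec: {spec!r}")
    return profile, copies


for spec in ("raid10", "raid10.3", "mirror3,raid10"):
    print(spec, "->", parse_profile(spec))
```

The comma form needs no special quoting in the shell, which is the same 
property that makes it convenient in fstab option lists.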

Oh, that would taste SOOO good! =:^)

3) Just for clarity in case anyone were to get mixed up, those devices 
can be partitions (or for that matter, mdraids or whatever) too.  They 
don't have to be actual whole physical devices.  So /dev/sd[abcdef]5 , 
for instance, would work too.  That's actually what I'm already doing 
here, altho obviously not with the n-way-mirroring I so want, as it's not 
available yet.
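
As a quick illustration of what that bracket pattern selects, here's a 
small sketch using Python's fnmatch module, which implements the same 
Unix shell-style wildcards.  The list of device nodes is invented for 
the example, and unlike the shell, this matches against the list rather 
than against what actually exists under /dev.

```python
from fnmatch import fnmatchcase

# Hypothetical device nodes: whole disks plus partitions 1 and 5 on each.
nodes = [f"/dev/sd{disk}{part}" for disk in "abcdefg" for part in ("", "1", "5")]

# The same bracket pattern as in the text: partition 5 on disks a through f.
pattern = "/dev/sd[abcdef]5"
members = [node for node in nodes if fnmatchcase(node, pattern)]
print(members)
```

Only the six partition-5 nodes on sda through sdf match.  The whole-disk 
nodes, the partition-1 nodes, and anything on sdg are left out.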

(This comment specifically included since the fact that multi-device 
btrfs could be on partition-devices wasn't clear to at least one list 
poster, not that long ago.  So just to make it explicitly clear to 
anybody stumbling on this post from google or whatever...)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

