From: Robert White <rwhite@pobox.com>
To: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Cc: Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: mkfs.btrfs limits "odd" [and maybe a "failed" phantom device?]
Date: Fri, 12 Dec 2014 14:28:06 -0800
Message-ID: <548B6BF6.2060306@pobox.com>
In-Reply-To: <20141212164544.GB25614@hungrycats.org>

On 12/12/2014 08:45 AM, Zygo Blaxell wrote:
> On Thu, Dec 11, 2014 at 10:01:06PM -0800, Robert White wrote:
>> So RAID5 with three media M is
>>
>> M    MM   MMM
>> D1   D2   P(a)
>> D3   P(b) D4
>> P(c) D5   D6
>
> RAID5 with two media is well defined, and looks like this:
>
> M    MM
> D1   P(a)
> P(b) D2
> D3   P(c)

Like I said in the other fork of this thread... I see (now) that the 
math works, but I can find no trace of anyone ever having implemented 
a parity RAID level (anything above RAID1) with an arity of less than 
three, outside btrfs and its associated materials.

It's like talking about a two-wheeled tricycle. 8-)

I would _genuinely_ like to see any third party discussion of this. It 
just isn't done (probably because, as you've shown, it's just a really 
complicated and CPU intensive way to end up with a simple mirror). I 
spent several hours looking. I can see the math works, and I understand 
what you are doing (as I said at some length in the grandparent message) 
but it "just isn't done".

The reason I use the tricycle example is that, while most people know 
this instinctively, few are aware of the fact that going from two wheels 
to three-or-more wheels reverses the steering paradigm. On a bike you 
push-left lean-left and go-left. On the higher arity vehicles (including 
a bike with a side-car added) you push-right go-left (you lean left too, 
but that's just to keep from nosing over 8-). I find that quite apt in 
the whole RAID1 vs RAID5 discussion since the former is about copying 
one-or-more times and the latter is about starting with a theoretically 
zeroed buffer and doing reversible checksumming into it.
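
To make the "zeroed buffer plus reversible checksumming" idea concrete, 
here is a minimal Python sketch, assuming plain byte-wise XOR parity (the 
usual RAID5 construction). The name xor_parity is made up for 
illustration and is not anything from btrfs. With only one data block per 
stripe the parity comes out byte-identical to the data, which is the 
two-media case collapsing into a mirror.

```python
def xor_parity(blocks):
    """XOR equal-length blocks into a zeroed buffer and return the result."""
    if not blocks:
        raise ValueError("need at least one block")
    size = len(blocks[0])
    parity = bytearray(size)          # the theoretically zeroed buffer
    for block in blocks:
        if len(block) != size:
            raise ValueError("blocks must be the same length")
        for i, byte in enumerate(block):
            parity[i] ^= byte         # XOR is its own inverse
    return bytes(parity)


# Three media: two data blocks and one parity block per stripe.
d1, d2 = b"\x0f\xf0\xaa", b"\x33\x55\xff"
p_a = xor_parity([d1, d2])
recovered_d2 = xor_parity([d1, p_a])  # drop D2, rebuild it from the rest

# Two media: one data block per stripe, so the parity is a copy of the data.
p_two = xor_parity([d1])
```

Here recovered_d2 equals d2 and p_two equals d1, so the same routine 
serves both arities. It just does a lot of XORing against zeros to get 
there in the two-media case.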

I doubt that I will be the last person to be confused by BTRFS' 
implementation of a two-wheeled tricycle.

You're going to get a lot of mail over the years. 8-)


MEANWHILE

the system really needs to be able to explicitly express and support the 
"missing" media paradigm.

  M     x    MMM
  D1    .    P(a)
  D3    .    D4
  P(c)  .    D6

The correct logic here to "remove" (e.g. "replace with nothing" instead 
of "delete") a media just doesn't seem to exist. And it's already 
painfully missing in the RAID1 situation.
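
For reference, a degraded read over that layout only needs the surviving 
blocks in each row. The following is a sketch under stated assumptions: 
XOR parity, and a made-up in-memory representation where None marks the 
absent media. read_stripe and xor_blocks are hypothetical names, not 
btrfs functions.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def read_stripe(stripe):
    """Return the stripe with a single None slot rebuilt from the others.

    Each stripe holds the data blocks plus their XOR parity, one per media,
    so any one absent block is the XOR of everything still present.
    """
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        raise IOError("more than one media missing; single parity cannot recover")
    if not missing:
        return list(stripe)
    present = [block for block in stripe if block is not None]
    rebuilt = list(stripe)
    rebuilt[missing[0]] = xor_blocks(present)
    return rebuilt


# First row of the diagram: D1 on M, nothing on MM, P(a) on MMM.
d1, d2 = b"\x01\x02", b"\x10\x20"
p_a = xor_blocks([d1, d2])
row = read_stripe([d1, None, p_a])
```

In the second row of the diagram the absent block is the parity P(b) 
rather than data, and the same XOR over D3 and D4 regenerates it. The 
point is that the slot stays in the stripe; only its contents are absent.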

If I have a system with N SATA ports, and I have connected N drives, and 
device M is starting to fail... I need to be able to disconnect M and 
then connect M(new). Possibly with a non-trivial amount of time in 
there. For all RAID levels greater than zero this is a natural operation 
in a degraded mode. And for a nearly full filesystem the shrink 
operation that is btrfs device delete would not work. And for any 
nontrivially occupied filesystem it would be way slow, and need to be 
reversed for another way-slow interval.

So I need to be able to "replace" a drive with a "nothing" so that the 
number of active media becomes N-1 but the arity remains N.

mdadm has the "missing" keyword. The Device Mapper has the "zero" 
target. As near as I can tell, btrfs has got nothing in this functional slot.

Imagine, if you will, a block device that is the anti-/dev/null. All 
operations on this block device return EFAULT. Let's call it 
/dev/nothing. And let's say I have a /dev/sdc that has to come out 
immediately (and all my stuff is RAID1/5/6). The operational chain would be:

btrfs replace start /dev/sdc /dev/nothing /
(time passes, physical device is removed and replaced)
btrfs replace start /dev/nothing /dev/sdc /

Now that's good-ish, but really the first replace is pernicious. The 
internal state for the filesystem should just be able to record that 
device id 3 (assuming /dev/sda is devid 1, /dev/sdb is devid 2, etc. for 
this example) is just gone. The replace-with-nothing becomes more-or-less 
instant.

The first replace is also pernicious if it's the second media failure on 
a fully RAID6 array, since that would be trying to put the same kernel 
level device in the array twice.

The restore operation, the replace of the nothing with the something, 
remains fully elaborate.

The "nothing" devices need to show up in the device id tables for a 
running array in their geographically correct positions and all that.
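
As a rough illustration of what I mean by a first-class "nothing" slot, 
here is a small Python model. Everything in it (DeviceTable, 
mark_missing, restore) is hypothetical and invented for this message; it 
is not how btrfs stores its device items. It only shows the bookkeeping: 
the devid keeps its position, the arity stays at N, and the active count 
drops to N-1.

```python
class DeviceTable:
    """Hypothetical model: devid -> device path, or None for a "nothing" slot."""

    def __init__(self, paths):
        # devids start at 1, matching the /dev/sda == devid 1 example.
        self.slots = {devid: path for devid, path in enumerate(paths, start=1)}

    @property
    def arity(self):
        return len(self.slots)            # unchanged when a media goes missing

    @property
    def active(self):
        return sum(1 for path in self.slots.values() if path is not None)

    def mark_missing(self, devid):
        """Replace-with-nothing: only a state change, no data is moved."""
        if self.slots[devid] is None:
            raise ValueError(f"devid {devid} is already missing")
        self.slots[devid] = None

    def restore(self, devid, path):
        """Replace-nothing-with-something: the slot is refilled in place."""
        if self.slots[devid] is not None:
            raise ValueError(f"devid {devid} is not missing")
        self.slots[devid] = path          # a real rebuild would run here

    def describe(self):
        return [f"devid {d}: {p or 'missing'}" for d, p in sorted(self.slots.items())]


table = DeviceTable(["/dev/sda", "/dev/sdb", "/dev/sdc"])
table.mark_missing(3)
degraded = (table.arity, table.active)
listing = table.describe()
table.restore(3, "/dev/sdc")
```

After mark_missing(3) the model reports an arity of 3 with 2 active 
media, and the listing still shows devid 3 in its place, marked missing. 
Because each missing slot is its own entry, a second failure on a RAID6 
array would not require naming one placeholder device twice.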

Without this "missing" status as a first-class part of the system, 
dealing with failures and communicating about those failures with the 
operator will become vexatious.


[The use of "device delete" and "device add" as changes in arity and 
size, and their inapplicability to cases where a failure is being dealt 
with absent a change of arity, could be clearer in the documentation.]

