* Re: Raid 1 recovery
From: Andrei Borzenkov @ 2017-01-19 10:46 UTC
To: Duncan; +Cc: Btrfs BTRFS
On Thu, Jan 19, 2017 at 10:15 AM, Duncan <1i5t5.duncan@cox.net> wrote:
> Chris Murphy posted on Wed, 18 Jan 2017 14:30:28 -0700 as excerpted:
>
>> On Wed, Jan 18, 2017 at 2:07 PM, Jon <jmoroney@hawaii.edu> wrote:
>>> So, I had a raid 1 btrfs system setup on my laptop. Recently I upgraded
>>> the drives and wanted to get my data back. I figured I could just plug
>>> in one drive, but I found that the volume simply would not mount. I
>>> tried the other drive alone and got the same thing. Plugging in both at
>>> the same time and the volume mounted without issue.
>>
>> Requires mount option degraded.
>>
>> If this is a boot volume, this is difficult because the current udev
>> rule prevents a mount attempt so long as all devices for a Btrfs volume
>> aren't present.
>
> OK, so I've known about this from the list for some time, but what is the
> status with regard to udev/systemd (has a bug/issue been filed, results,
> link?), and what are the alternatives, both for upstream, and for a dev,
> either trying to be proactive, or currently facing a refusal to boot due
> to the issue?
>
The only viable solution is to introduce a separate object that
represents the btrfs "storage" (the multi-device volume) and exports
information about whether it can be mounted. Then it is possible to
start a timer and decide to mount degraded after the timer expires.
Compare with Linux MD, which is also assembled by udev and does exactly that.
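For reference, mdadm already ships a pair of systemd template units that
implement exactly this timer-and-fallback pattern; from memory they look
roughly like the sketch below (unit names and exact contents vary by
distribution and mdadm version):

    # mdadm-last-resort@.timer -- armed when an incomplete array shows up
    [Unit]
    Description=Timer to wait for more drives before activating degraded array
    DefaultDependencies=no
    Conflicts=sys-devices-virtual-block-%i.device

    [Timer]
    OnActiveSec=30

    # mdadm-last-resort@.service -- fired by the timer, starts the array degraded
    [Unit]
    Description=Activate md array even though degraded
    DefaultDependencies=no
    Conflicts=sys-devices-virtual-block-%i.device

    [Service]
    Type=oneshot
    ExecStart=/sbin/mdadm --run /dev/%i

The Conflicts= lines are what is meant to cancel the last-resort path
once the fully assembled array's device unit appears.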
Where this object is best implemented, I do not know. It could be on the
btrfs side (exporting a pseudo-device similar to /dev/mdX for Linux MD),
or in user space, as a special service that waits for device assembly.
Abstracting it at the btrfs level is cleaner and does not depend on a
specific user space (i.e. udev/systemd). On the other hand, ZFS
ultimately shares the same problem, so a common solution for
multi-device filesystem handling would be good.
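A minimal sketch of the user-space variant, assuming a hypothetical
btrfs-wait-or-degraded helper (the name and the 30-second default are
made up for illustration): "btrfs device ready" exits 0 once the kernel
has seen every member device of the filesystem on the given device, so
something along these lines would do:

    #!/bin/sh
    # btrfs-wait-or-degraded (hypothetical): wait up to TIMEOUT seconds for all
    # member devices of the filesystem on DEV to appear, then fall back to a
    # degraded mount.
    DEV=$1
    MNT=$2
    TIMEOUT=${3:-30}

    i=0
    while [ "$i" -lt "$TIMEOUT" ]; do
        # "btrfs device ready" exits 0 once the kernel has seen every member device
        if btrfs device ready "$DEV" >/dev/null 2>&1; then
            exec mount "$DEV" "$MNT"
        fi
        sleep 1
        i=$((i + 1))
    done

    # timeout expired: at least one device is still missing, mount degraded
    exec mount -o degraded "$DEV" "$MNT"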
> IOW, I run btrfs raid1 on /, set up before the udev rule existed, I
> believe, as it worked back then. I've known about the issue but haven't
> followed it closely enough to know what I should do if faced with a dead
> device, or what the current status is on alternatives to fix the problem
> longer term, either locally (does simply disabling the rule work?) or
> upstream, and what the alternatives might be, along with the reasons this
> is still shipping instead. And I'm asking in order to try to remedy that. =:^)
>
* Re: Raid 1 recovery
From: Chris Murphy @ 2017-01-19 19:18 UTC
To: Duncan; +Cc: Btrfs BTRFS
On Thu, Jan 19, 2017 at 12:15 AM, Duncan <1i5t5.duncan@cox.net> wrote:
> Chris Murphy posted on Wed, 18 Jan 2017 14:30:28 -0700 as excerpted:
>
>> On Wed, Jan 18, 2017 at 2:07 PM, Jon <jmoroney@hawaii.edu> wrote:
>>> So, I had a raid 1 btrfs system setup on my laptop. Recently I upgraded
>>> the drives and wanted to get my data back. I figured I could just plug
>>> in one drive, but I found that the volume simply would not mount. I
>>> tried the other drive alone and got the same thing. Plugging in both at
>>> the same time and the volume mounted without issue.
>>
>> Requires mount option degraded.
>>
>> If this is a boot volume, this is difficult because the current udev
>> rule prevents a mount attempt so long as all devices for a Btrfs volume
>> aren't present.
>
> OK, so I've known about this from the list for some time, but what is the
> status with regard to udev/systemd (has a bug/issue been filed, results,
> link?), and what are the alternatives, both for upstream, and for a dev,
> either trying to be proactive, or currently facing a refusal to boot due
> to the issue?
If the udev rule isn't there, any multiple-device setup risks a mount
failure whenever one member device is late to the party. If the udev
rule is removed and rootflags=degraded is set, then whenever a device is
late there's always a degraded boot, and the drive that was late to the
party ends up out of sync. And we have no fast resync like mdadm's
write-intent bitmaps, so as soon as the volume is made whole again,
avoiding corruption requires a complete volume scrub, initiated manually.
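Concretely, the manual path looks roughly like this (device names, UUID
and mount points are placeholders):

    # One-off degraded boot: add rootflags=degraded to the kernel command line,
    # e.g. by editing the entry at the boot loader prompt:
    linux ... root=UUID=<uuid-of-the-volume> rootflags=degraded

    # For a non-root volume, mount a surviving member directly:
    mount -o degraded /dev/sdb1 /mnt

    # Once the missing device is back (or has been replaced), resync the stale
    # copies by scrubbing the whole volume; -B stays in the foreground, -d
    # prints per-device statistics:
    btrfs scrub start -Bd /mnt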
Now maybe the udev rule could be made smarter, I don't really know. For
a multiple-device volume you'd want a rule that waits for, say, 30
seconds or a minute or whatever a sane value is. That way normal
operation is only delayed a bit, to make sure all member drives are
available at the same time so that the mount command (without the
degraded option) works, and the only failure case is when there is in
fact a bad drive. Someone willing to take the risk could use such a udev
rule along with rootflags=degraded, but that is asking for trouble.
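For context, the rule in question is systemd's 64-btrfs.rules, which
from memory is roughly the following; it marks each btrfs member device
as not ready for systemd until the kernel reports the filesystem as
complete, so no mount is attempted on an incomplete volume:

    SUBSYSTEM!="block", GOTO="btrfs_end"
    ACTION=="remove", GOTO="btrfs_end"
    ENV{ID_FS_TYPE}!="btrfs", GOTO="btrfs_end"

    # let the kernel know about this btrfs filesystem, and check if it is complete
    IMPORT{builtin}="btrfs ready $devnode"

    # mark the device as not ready to be used by the system
    ENV{ID_BTRFS_READY}=="0", ENV{SYSTEMD_READY}="0"

    LABEL="btrfs_end"

Bolting a timeout onto that is awkward, since udev rules by themselves
have no clean way to say "give up waiting after N seconds"; that is
essentially why the timer/service idea keeps coming up.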
What's really needed is a daemon or other service that manages the pool
status, including handling degradedness and resyncs automatically.
--
Chris Murphy