From: Duncan <1i5t5.duncan@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: How to handle a RAID5 arrawy with a failing drive? -> raid5 mostly works, just no rebuilds
Date: Thu, 20 Mar 2014 07:37:05 +0000 (UTC)
Message-ID: <pan$cb2d8$ca7492f$8d528793$b5010601@cox.net>
In-Reply-To: 20140319154031.GP6143@merlins.org
Marc MERLIN posted on Wed, 19 Mar 2014 08:40:31 -0700 as excerpted:
> That's the thing though. If the bad device hadn't been forcibly removed,
> and apparently the only way to do this was to unmount, make the device
> node disappear, and remount in degraded mode, it looked to me like btrfs
> was still considering that the drive was part of the array and trying to
> write to it.
> After adding a drive, I couldn't quite tell if it was striping over 11
> drives or 10, but it felt that at least at times, it was striping over
> 11 drives with write failures on the missing drive.
> I can't prove it, but I'm thinking the new data I was writing was being
> striped in degraded mode.
FWIW, there are at least two problems here: one a bug (or perhaps more
accurately, an as-yet-incomplete feature) unrelated to btrfs raid5/6
mode, the other the incomplete raid5/6 support. Both are known issues,
however.
The incomplete raid5/6 is discussed well enough elsewhere including in
this thread as a whole, which leaves the other issue.
The other issue, not specifically raid5/6 mode related, is that
currently, in-kernel btrfs is basically oblivious to disappearing drives,
thus explaining some of the more complex bits of the behavior you
described. Yes, the kernel has the device data and other layers know
when a device goes missing, but it's basically a case of the right hand
not knowing what the left hand is doing -- once set up on a set of
devices, in-kernel btrfs basically doesn't do anything with the device
information available to it, at least in terms of removing a device from
its listing when it goes missing. (It does seem to transparently handle
a missing btrfs component device reappearing, arguably /too/
transparently!)
Basically all btrfs does is log errors when a component device
disappears. It doesn't do anything with the disappeared device, and
really doesn't "know" it has disappeared at all, until an unmount and
(possibly degraded) remount, at which point it re-enumerates the devices
and again knows what's actually there... until a device disappears again.
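One way to watch those logged errors accumulate without an unmount is to read the per-device error counters. What follows is only a rough sketch, assuming the `[device].counter value` line format that `btrfs device stats` prints; the helper names and the sample text are made up for illustration and are not taken from the array discussed in this thread.

```python
import re

# Assumed line format of `btrfs device stats <mountpoint>` output, e.g.:
#   [/dev/sdb1].write_io_errs   12
STAT_LINE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)\s*$")


def parse_device_stats(text):
    """Return {device: {counter: value}} parsed from device-stats style text."""
    stats = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line.strip())
        if match:
            stats.setdefault(match["dev"], {})[match["counter"]] = int(match["value"])
    return stats


def devices_with_errors(stats):
    """List devices that have at least one non-zero counter, sorted by name."""
    return sorted(dev for dev, counters in stats.items() if any(counters.values()))


# Hypothetical sample text, invented for this example.
SAMPLE = """\
[/dev/sdb1].write_io_errs   0
[/dev/sdb1].read_io_errs    0
[/dev/sdc1].write_io_errs   1873
[/dev/sdc1].read_io_errs    4
"""

if __name__ == "__main__":
    print(devices_with_errors(parse_device_stats(SAMPLE)))
```

A steadily climbing write error count on one device would fit the "still trying to write to a drive that is gone" behavior described above, though the counters alone don't prove it.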
There are actually patches being worked on to fix that situation as we
speak, and it's possible they're actually in btrfs-next already. (I've
seen the patches and discussion go by on the list but haven't tracked
them to the extent that I know current status, other than that they're
not in mainline yet.)
Meanwhile, counter-intuitively, btrfs-userspace is sometimes more aware
of current device status than btrfs-kernel is ATM, since parts of
userspace actually either get current status from the kernel, or trigger
a rescan in order to get it. But even after a rescan updates what
userspace knows and thus what the kernel as a whole knows, btrfs-kernel
still doesn't actually use that new information available to it in the
same kernel that btrfs-userspace used to get it from!
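To cross-check the two views by hand, one could compare the devices that `btrfs filesystem show` lists against the device nodes that actually exist. The sketch below assumes the usual `devid N size ... used ... path /dev/...` line format; the function names are hypothetical, and the `exists` parameter is there only so the check can be exercised without real hardware.

```python
import os
import re

# Assumed device line format in `btrfs filesystem show` output, e.g.:
#   devid    3 size 465.76GiB used 465.76GiB path /dev/sdd1
DEVID_LINE = re.compile(r"^\s*devid\s+(?P<devid>\d+)\s.*\bpath\s+(?P<path>\S+)\s*$")


def listed_devices(show_text):
    """Return {devid: path} for every device line found in the text."""
    devices = {}
    for line in show_text.splitlines():
        match = DEVID_LINE.match(line)
        if match:
            devices[int(match["devid"])] = match["path"]
    return devices


def vanished_devices(show_text, exists=os.path.exists):
    """Return {devid: path} for listed devices whose device node is gone."""
    return {
        devid: path
        for devid, path in listed_devices(show_text).items()
        if not exists(path)
    }


# Hypothetical sample text, invented for this example.
SAMPLE = """\
Label: 'pool'  uuid: 00000000-0000-0000-0000-000000000000
\tTotal devices 3 FS bytes used 1.00TiB
\tdevid    1 size 465.76GiB used 400.00GiB path /dev/sdb1
\tdevid    2 size 465.76GiB used 400.00GiB path /dev/sdc1
\tdevid    3 size 465.76GiB used 400.00GiB path /dev/sdd1
"""

if __name__ == "__main__":
    # Pretend /dev/sdc1 has dropped off the bus.
    print(vanished_devices(SAMPLE, exists=lambda p: p != "/dev/sdc1"))
```

Any device reported this way is one that the filesystem still lists but the rest of the system no longer provides, which is exactly the mismatch described above.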
Knowing that rather counterintuitive "little" inconsistency, that isn't
actually so little, goes quite a way toward explaining what otherwise
looks like illogical btrfs behavior -- how could kernel-btrfs not know
the status of its own devices?
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman