From: Duncan <1i5t5.duncan@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: How to handle a RAID5 arrawy with a failing drive? -> raid5 mostly works, just no rebuilds
Date: Thu, 20 Mar 2014 07:37:05 +0000 (UTC)
Message-ID: <pan$cb2d8$ca7492f$8d528793$b5010601@cox.net>
In-Reply-To: 20140319154031.GP6143@merlins.org

Marc MERLIN posted on Wed, 19 Mar 2014 08:40:31 -0700 as excerpted:

> That's the thing though. If the bad device hadn't been forcibly removed,
> and apparently the only way to do this was to unmount, make the device
> node disappear, and remount in degraded mode, it looked to me like btrfs
> was still considering that the drive was part of the array and trying to
> write to it.
> After adding a drive, I couldn't quite tell if it was striping over 11
> drives or 10, but it felt that at least at times, it was striping over
> 11 drives with write failures on the missing drive.
> I can't prove it, but I'm thinking the new data I was writing was being
> striped in degraded mode.

FWIW, there are at least two problems here, one a bug (or perhaps it'd more 
accurately be described as an as yet incomplete feature) unrelated to 
btrfs raid5/6 mode, the other the incomplete raid5/6 support.  Both are 
known issues, however.

The incomplete raid5/6 is discussed well enough elsewhere including in 
this thread as a whole, which leaves the other issue.

The other issue, not specifically raid5/6 mode related, is that 
currently, in-kernel btrfs is basically oblivious to disappearing drives, 
thus explaining some of the more complex bits of the behavior you 
described.  Yes, the kernel has the device data and other layers know 
when a device goes missing, but it's basically a case of the right hand 
not knowing what the left hand is doing -- once set up on a set of 
devices, in-kernel btrfs basically doesn't do anything with the device 
information available to it, at least in terms of removing a device from 
its listing when it goes missing.  (It does seem to transparently handle 
a missing btrfs component device reappearing, arguably /too/ 
transparently!)

Basically all btrfs does is log errors when a component device 
disappears.  It doesn't do anything with the disappeared device, and 
really doesn't "know" it has disappeared at all, until an unmount and 
(possibly degraded) remount, at which point it re-enumerates the devices 
and again knows what's actually there... until a device disappears again.
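
To make that concrete, here is a toy Python model of the behavior just 
described.  It is only a sketch under stated assumptions: ToyArray, 
mount() and write_stripe() are names invented for illustration and do not 
correspond to any real btrfs kernel structure or interface.  The single 
point it tries to show is that membership is refreshed only at mount 
time, so a write after a device vanishes still targets the old device set.

```python
# Toy model only: every name here is hypothetical and invented for
# illustration. It simplifies the behavior described in this message and
# is not actual btrfs kernel logic.


class ToyArray:
    """Toy stand-in for a mounted multi-device filesystem."""

    def __init__(self, expected_devices):
        self.expected = set(expected_devices)  # devices recorded in the filesystem
        self.members = set()                   # devices it believes are usable
        self.error_log = []                    # logged write failures

    def mount(self, present, degraded=False):
        """Enumerate devices; the only point where membership is refreshed."""
        found = self.expected & set(present)
        if found != self.expected and not degraded:
            raise RuntimeError("missing devices; mount with degraded=True")
        self.members = found

    def write_stripe(self, present):
        """Stripe over every believed member, logging errors for absent ones."""
        targets = sorted(self.members)
        for dev in targets:
            if dev not in present:
                self.error_log.append(f"write error on {dev}")
        return targets


present = {"sda", "sdb", "sdc"}
array = ToyArray(present)
array.mount(present)

present.discard("sdc")                 # the drive drops off the bus
stale = array.write_stripe(present)    # still striped over all three devices
array.mount(present, degraded=True)    # stands in for unmount + degraded remount
fresh = array.write_stripe(present)    # now striped over the two survivors
```

In this model the first write after the drive vanishes still targets 
three devices and only logs an error, while the write after the degraded 
remount targets the two survivors.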

There are actually patches being worked on to fix that situation as we 
speak, and it's possible they're actually in btrfs-next already.  (I've 
seen the patches and discussion go by on the list but haven't tracked 
them to the extent that I know current status, other than that they're 
not in mainline yet.)

Meanwhile, counter-intuitively, btrfs-userspace is sometimes more aware 
of current device status than btrfs-kernel is ATM, since parts of 
userspace actually either get current status from the kernel, or trigger 
a rescan in order to get it.  But even after a rescan updates what 
userspace knows and thus what the kernel as a whole knows, btrfs-kernel 
still doesn't actually use that new information available to it in the 
same kernel that btrfs-userspace used to get it from!

Knowing that rather counterintuitive "little" inconsistency, that isn't 
actually so little, goes quite a way toward explaining what otherwise 
looks like illogical btrfs behavior -- how could kernel-btrfs not know 
the status of its own devices?

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


