From: Marc MERLIN <marc@merlins.org>
To: Chris Murphy <lists@colorremedies.com>
Cc: Btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: How to handle a RAID5 array with a failing drive? -> raid5 mostly works, just no rebuilds
Date: Sun, 23 Mar 2014 12:22:02 -0700
Message-ID: <20140323192202.GC3732@merlins.org>
In-Reply-To: <EBE67530-BDC7-487C-826D-A0E65259BCBA@colorremedies.com>
On Wed, Mar 19, 2014 at 10:53:33AM -0600, Chris Murphy wrote:
>
> On Mar 19, 2014, at 9:40 AM, Marc MERLIN <marc@merlins.org> wrote:
> >
> > After adding a drive, I couldn't quite tell if it was striping over 11
> > drives or 10, but it felt that at least at times, it was striping over 11
> > drives with write failures on the missing drive.
> > I can't prove it, but I'm thinking the new data I was writing was being
> > striped in degraded mode.
>
> Well it does sound fragile after all to add a drive to a degraded array, especially when it's not expressly treating the faulty drive as faulty. I think iotop will show what block devices are being written to. And in a VM it's easy (albeit rudimentary) with sparse files, as you can see them grow.
>
> >
> > Yes, although it's limited, you apparently only lose new data that was added
> > after you went into degraded mode and only if you add another drive where
> > you write more data.
> > In real life this shouldn't be too common, even if it is indeed a bug.
>
> It's entirely plausible a drive power/data cable becomes loose, runs for hours degraded before the wayward device is reseated. It'll be common enough. It's definitely not OK for all of that data in the interim to vanish just because the volume has resumed from degraded to normal. Two states of data, normal vs degraded, is scary. It sounds like totally silent data loss. So yeah if it's reproducible it's worthy of a separate bug.
I just got around to filing that bug:
https://bugzilla.kernel.org/show_bug.cgi?id=72811
In other news, I was able to:
1) remove a drive
2) mount degraded
3) add a new drive
4) rebalance (that took 2 days with little data, 4 deadlocks and reboots
though)
5) remove the missing drive from the filesystem
6) remount the array without -o degraded
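
The six steps above map roughly onto the commands below. This is a dry-run
sketch that only prints the commands; the function name, device names and
mount point are hypothetical placeholders and are not taken from this thread.
Step 1 (pulling the drive) is physical and has no command.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands for steps 2-6, does not run them.
# degraded_replace_plan and all arguments are hypothetical placeholders.
degraded_replace_plan() {
    surviving_dev=$1   # any remaining member of the array
    new_dev=$2         # the replacement drive
    mnt=$3             # where the filesystem is mounted

    # Step 2: mount with the failed drive absent
    echo "mount -o degraded $surviving_dev $mnt"
    # Step 3: add the replacement drive
    echo "btrfs device add $new_dev $mnt"
    # Step 4: rebalance across the drives that are present
    echo "btrfs balance start $mnt"
    # Step 5: drop the absent drive from the filesystem
    echo "btrfs device delete missing $mnt"
    # Step 6: remount without -o degraded
    echo "umount $mnt"
    echo "mount $surviving_dev $mnt"
}

degraded_replace_plan /dev/sdb1 /dev/sdm1 /mnt/btrfs_pool
```

Printing rather than executing keeps the sketch safe to read through on a
machine that has a real array attached.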
Now, I'm testing:
1) add a new drive
2) remove a working drive
3) the rebalance triggered by #2 should rebuild onto the new drive automatically
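
The add-then-remove test above could look roughly like this. Again a dry-run
sketch that only prints commands; swap_drive_plan and the device names are
hypothetical placeholders. Deleting a present device makes btrfs move its data
onto the remaining drives, which is the rebuild being tested here.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands, does not run them.
# swap_drive_plan and all arguments are hypothetical placeholders.
swap_drive_plan() {
    new_dev=$1   # drive being added
    old_dev=$2   # working drive being removed
    mnt=$3       # where the filesystem is mounted

    # 1) add the new drive to the mounted filesystem
    echo "btrfs device add $new_dev $mnt"
    # 2) remove the working drive; its data is relocated to the others
    echo "btrfs device delete $old_dev $mnt"
    # 3) check which devices the filesystem now spans
    echo "btrfs filesystem show"
}

swap_drive_plan /dev/sdm1 /dev/sdc1 /mnt/btrfs_pool
```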
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/