From: Duncan <1i5t5.duncan@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: corrupted RAID1: unsuccessful recovery / help needed
Date: Mon, 26 Oct 2015 08:31:42 +0000 (UTC)
Message-ID: <pan$deb9e$70dd599$c247d7fb$410817e7@cox.net>
In-Reply-To: 562DC606.3070602@lukas-pirl.de
Lukas Pirl posted on Mon, 26 Oct 2015 19:19:50 +1300 as excerpted:
> TL;DR: RAID1 does not recover, I guess the interesting part in the stack
> trace is: [elided, I'm not a dev so it's little help to me]
>
> I'd appreciate some help for repairing a corrupted RAID1.
>
> Setup:
> * Linux 4.2.0-12, Btrfs v3.17,
> `btrfs fi show`:
> uuid: 5be372f5-5492-4f4b-b641-c14f4ad8ae23
> Total devices 6 FS bytes used 2.87TiB
> devid 1 size 931.51GiB used 636.00GiB path /dev/mapper/[...]
> devid 2 size 931.51GiB used 634.03GiB path /dev/mapper/
> devid 3 size 1.82TiB used 1.53TiB path /dev/mapper/
> devid 4 size 1.82TiB used 1.53TiB path /dev/mapper/
> devid 6 size 1.82TiB used 1.05TiB path /dev/mapper/
> *** Some devices missing
> * disks are dm-crypted
FWIW... Older btrfs userspace such as your v3.17 is "OK" for normal
runtime use, assuming you don't need any newer features, because in
normal runtime it's the kernel code doing the real work; userspace for
the most part simply makes the appropriate kernel calls to do that work.
But, once you get into a recovery situation like the one you're in now,
current userspace becomes much more important, as the various things
you'll do to attempt recovery rely far more on userspace code directly
accessing the filesystem, and it's only the newest userspace code that
has the latest fixes.
So for a recovery situation, the newest userspace release (4.2.2 at
present) as well as a recent kernel is recommended, and depending on the
problem, you may at times need to run integration or apply patches on top
of that.
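To make that concrete, here's a rough sketch of checking what you're
running, and of building a current btrfs-progs from upstream if your
distro's package lags behind (the tag and paths are illustrative;
adjust to whatever release is newest when you read this):

```shell
# Check what you're currently running:
uname -r          # kernel version; for this problem, 4.3-rc or newer helps
btrfs --version   # btrfs-progs version; 4.2.2 is current as of this writing

# If the distro package is old, build current btrfs-progs from upstream
# (tag shown is an example; pick the latest release):
git clone git://git.kernel.org/pub/scm/linux/kernel/git/kdave/btrfs-progs.git
cd btrfs-progs
git checkout v4.2.2
./autogen.sh && ./configure && make
```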
> What happened:
> * devid 5 started to die (slowly)
> * added a new disk (devid 6) and tried `btrfs device delete`
> * failed with kernel crashes (guess:) due to heavy IO errors
> * removed devid 5 from /dev (deactivated in dm-crypt)
> * tried `btrfs balance`
> * interrupted multiple times due to kernel crashes
> (probably due to semi-corrupted file system?)
> * file system did not mount anymore after a required hard-reset
> * no successful recovery so far:
> if not read-only, kernel IO blocks eventually (hard-reset required)
> * tried:
> * `-o degraded`
> -> IO freeze, kernel log: http://pastebin.com/Rzrp7XeL
> * `-o degraded,recovery`
> -> IO freeze, kernel log: http://pastebin.com/VemHfnuS
> * `-o degraded,recovery,ro`
> -> file system accessible, system stable
> * going rw again does not fix the problem
>
> I did not btrfs-zero-log so far because my oops did not look very
> similar to the one in the Wiki and I did not want to risk to make
> recovery harder.
General note about btrfs and btrfs raid. Given that btrfs itself remains
a "stabilizing, but not yet fully mature and stable filesystem", while
btrfs raid will often let you recover from a bad device, sometimes that
recovery is in the form of letting you mount ro, so you can access the
data and copy it elsewhere, before blowing away the filesystem and
starting over.
Back to the problem at hand. Current btrfs has a known limitation when
operating in degraded mode: a btrfs raid may be write-mountable only
once, degraded, after which it can only be mounted read-only. This is
because, under certain circumstances in degraded mode, btrfs will fall
back from its normal raid mode to single-mode chunk allocation for new
writes. Once there are single-mode chunks on the filesystem, btrfs
mount isn't currently smart enough to check whether all chunks are
actually available on the present devices; it simply jumps to the
conclusion that there are single-mode chunks on the missing device(s)
as well, and from then on refuses to mount writable, in order to
prevent further damage to the filesystem and to preserve the ability
to mount at least read-only, so what isn't damaged can be copied off.
There's a patch in the pipeline for this problem that checks individual
chunks instead of leaping to conclusions based on the mere presence of
single-mode chunks on a degraded filesystem with missing devices. If
that's your only problem (which the backtraces might reveal, though as
a non-dev btrfs user I can't tell), the patches should let you mount
writable.
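If you can get the filesystem mounted degraded and read-only, you can
see for yourself whether that single-mode fallback actually happened
(device path and mountpoint below are placeholders):

```shell
# Mount read-only in degraded mode, then inspect chunk allocation:
mount -o degraded,recovery,ro /dev/mapper/SOMEDEV /mnt

# On a filesystem created as raid1, any "single" lines in this output
# indicate the degraded-mode fallback allocation described above:
btrfs filesystem df /mnt
```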
But that patch isn't in kernel 4.2. You'll need at least kernel 4.3-rc,
and possibly btrfs integration, or to cherrypick the patches onto 4.2.
Meanwhile, in keeping with the sysadmin's rule of backups: if you
valued the data more than the time and resources necessary for a
backup, then by definition you have a backup available; otherwise, by
definition, you valued the data less than the time and resources
necessary to back it up.
Therefore, no worries. Regardless of the fate of the data, you saved
what your actions declared most valuable to you: either the data, or
the hassle and resource cost of the backup you didn't do. As such, if
you don't have a backup (or if you do but it's outdated), the data at
risk of loss is by definition of very limited value.
That said, it appears you don't even have to worry about loss of that
very limited value data, since mounting degraded,recovery,ro gives you
stable access to it, and you can use the opportunity provided to copy it
elsewhere, at least to the extent that the data we already know is of
limited value is even worth the hassle of doing that.
Which is exactly what I'd do. Actually, I've had to resort to btrfs
restore[1] a couple times when the filesystem wouldn't mount at all, so
the fact that you can mount it degraded,recovery,ro, already puts you
ahead of the game. =:^)
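For reference, btrfs restore operates on the unmounted devices and
copies files out onto a different, healthy filesystem. A minimal
invocation looks roughly like this (the paths are placeholders; check
btrfs-restore(8) for the exact options your version supports):

```shell
# Dry-run first, to see what would be restored:
btrfs restore -v -D /dev/mapper/SOMEDEV /tmp/ignored

# Then restore for real, onto another healthy filesystem:
# -v: verbose, -i: ignore errors and continue where possible
btrfs restore -v -i /dev/mapper/SOMEDEV /mnt/recovery-target
```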
So yeah, first thing, since you have the opportunity, unless your backups
are sufficiently current that it's not worth the trouble, copy off the
data while you can.
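Concretely, the copy-off might look like this (device path and target
are placeholders; the target must of course live on a different,
healthy filesystem):

```shell
mount -o degraded,recovery,ro /dev/mapper/SOMEDEV /mnt
mkdir -p /srv/rescue
# -a preserves ownership/permissions/timestamps; -H keeps hardlinks,
# -A and -X keep ACLs and xattrs if you use them:
rsync -aHAX --progress /mnt/ /srv/rescue/
```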
Then, unless you wish to keep the filesystem around in case the devs want
to use it to improve btrfs' recovery system, I'd just blow it away and
start over, restoring the data from backup once you have a fresh
filesystem to restore to. That's the simplest and fastest way to a fully
working system once again, and what I did here after using btrfs restore
to recover the delta between current and my backups.
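Once everything worth keeping is safely elsewhere, recreating the
filesystem is quick (device names below are placeholders, and -f
forces mkfs over the old filesystem, so triple-check the device list
before running it):

```shell
mkfs.btrfs -f -d raid1 -m raid1 \
    /dev/mapper/DEV1 /dev/mapper/DEV2 /dev/mapper/DEV3
mount /dev/mapper/DEV1 /mnt
# ...then restore the data from wherever you copied it.
```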
---
[1] Btrfs restore: Yes, I have backups, but I don't always keep them
current. To the extent that I risk losing the difference between current
and backup, my actions obviously define that difference as not worth the
hassle cost of more frequent backups vs. the risk. But while my actions
define that delta data as of relatively low value, it's not of /no/
value, and to the extent btrfs restore allows me to recover it, I
appreciate that I can do so, avoiding the loss of the delta between my
backup and what was current. Of course, that lowers the risk of loss
even further, letting me put off updating the backups even longer if I
wanted, but I haven't actually done so.
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
Thread overview: 8+ messages
2015-10-26 6:19 corrupted RAID1: unsuccessful recovery / help needed Lukas Pirl
2015-10-26 8:31 ` Duncan [this message]
2015-10-29 21:43 ` Lukas Pirl
2015-10-30 9:40 ` Duncan
2015-10-30 10:58 ` Duncan
2015-10-30 11:25 ` Hugo Mills
2015-10-30 15:03 ` Austin S Hemmelgarn
2015-11-08 2:59 ` Lukas Pirl