Re: Unrecoverable fs corruption?

From: Duncan <1i5t5.duncan@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: Unrecoverable fs corruption?
Date: Sun, 3 Jan 2016 15:53:56 +0000 (UTC)
Message-ID: <pan$11cfd$9e78008a$67e09cd4$7017ff54@cox.net>
In-Reply-To: CAPmG0jb1HF6Mnfi3ObqUfPHs4RNLCDvsnuncvX-=LGZazDHnYg@mail.gmail.com

Henk Slager posted on Sat, 02 Jan 2016 22:19:18 +0100 as excerpted:

> If you think btrfs raid (I/O)fault handling etc is not good enough yet,
> instead of raid1, you might consider 2x single (dup for metadata), with
> 1 the main/master fs and the other one the slave fs, created by send |
> receive (incremental). If you scrub both on regular basis, email or so
> the error cases, you can act if something is wrong.
> And every now and then do a brute-force diff to verify that contents of
> both filesystems (snapshots) are still the same.

Given the OP's situation, that he was running btrfs in raid1 mode, and 
that a third device of similar capacity is simply out of the question due 
to cost at this point, this approach, possibly generalized, is what I'd 
recommend as well.
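
As a rough illustration of the quoted suggestion, here is a minimal 
sketch of the incremental send | receive step and the brute-force 
comparison.  The function names and every path passed to them are 
hypothetical; the only btrfs commands used are btrfs send -p and 
btrfs receive, and the snapshots involved are assumed to be read-only 
(created with btrfs subvolume snapshot -r).

```shell
# Sketch only: function names and all paths are hypothetical.

# Send the difference between two read-only snapshots to the backup fs.
# $1 = previous snapshot (already present on both filesystems)
# $2 = new snapshot on the main filesystem
# $3 = directory on the backup filesystem that receives snapshots
send_incremental() {
    btrfs send -p "$1" "$2" | btrfs receive "$3"
}

# Brute-force check: succeed only if both trees contain the same file
# names with the same contents.  Ownership, modes and timestamps are
# not compared.
trees_match() {
    diff -r "$1" "$2" >/dev/null 2>&1
}
```

A periodic job could call trees_match on the matching snapshot on each 
filesystem and send a notification when it fails, alongside the regular 
scrub of both.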

RAID-1 is not a backup.  And I'd strongly recommend that a backup take 
priority over a raid1 if there's simply not enough money for more 
devices.  There are simply too many ways a raid1 can go wrong when there's 
no actual backup, including fat-fingering a deletion[1].

Now if the device capacity is sufficiently large, I'd actually recommend 
partitioning both devices up with two identically sized partitions on 
each.  Then the first partition on each can be made into a raid1 forming 
the working copy, while the second partition on each can be a separate 
raid1 that's the backup.  That way, there's both a backup and raid1 
protection.  That's actually what I'm doing here, pretty much.[2]

Of course, the partitioned raid1 working and backup solution does require 
that the data actually fit in half the space of a single device, and it 
may not, in which case this isn't an option.
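
The size constraint is simple arithmetic, sketched below.  The function 
name is hypothetical, both arguments must be in the same unit, and the 
check ignores filesystem metadata overhead, so treat a narrow pass as a 
fail.

```shell
# Hypothetical helper: does the data fit the two-partitions-per-device
# layout, where each partition is half of one device?
# $1 = size of the data, $2 = capacity of a single device (same unit)
fits_split_raid1() {
    half=$(( $2 / 2 ))
    [ "$1" -le "$half" ]
}
```

For example, with two 1000 GB devices each partition is 500 GB, so 
fits_split_raid1 400 1000 succeeds while fits_split_raid1 600 1000 fails.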

Which would bring us back to a working copy on one device and its backup 
on the other.

But I'd actually consider making the backup not btrfs.  What I use 
here for my second backups is the old reiserfs I was using before btrfs.  
That way, if it's a btrfs bug that takes out the one copy, you don't have 
to worry about the same btrfs bug taking out the backup when you try to 
fall back to it.  That double failure may not be particularly likely, and 
a non-btrfs backup does rule out using btrfs send/receive to update it, 
but it significantly eases my mind when I'm in recovery mode, knowing my 
backup isn't subject to whatever btrfs bug put me in recovery mode in the 
first place.

(In the partitioned raid case, I'd consider making the backup mdraid1, 
with whatever filesystem on top, since other than btrfs and zfs, 
filesystems basically don't do raid so it must be implemented below 
them.  Or don't raid the backup and simply make a primary backup on one 
device and a secondary backup on the other.)
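
A hedged sketch of what the mdraid1 variant could look like.  It only 
prints the commands, since both are destructive.  The function name, the 
array name /dev/md0 and the choice of ext4 are assumptions made for 
illustration; the text above says only "whatever filesystem on top".

```shell
# Sketch only: prints the commands instead of running them.
# $1, $2 = the second partition on each device (hypothetical names).
print_backup_array_cmds() {
    # Mirror the two backup partitions with mdraid level 1.
    echo "mdadm --create /dev/md0 --level=1 --raid-devices=2 $1 $2"
    # Put a conventional filesystem on top of the array (ext4 assumed).
    echo "mkfs.ext4 /dev/md0"
}
```

Calling print_backup_array_cmds /dev/sdX2 /dev/sdY2 shows the two 
commands for review before anything is run by hand.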

---
[1] Fat-fingering a deletion:  My own brown-bag "I became an admin that 
day" case came while debugging a script that I was, unfortunately, 
running as root.  It did an rm -rf $somevar/*, with $somevar assigned 
earlier, but either the somevar in the assignment or the somevar in the 
rm line was typoed, so the var ended up empty and the command ended up as 
rm -rf /*. ...

I was *SO* glad I had a backup, not just a raid1, that day!

Needless to say, I also learned the lesson, the hard way, that either you 
don't debug your scripts as root, or if you are going to do so, you 
comment out rm lines and replace them with ls the first time through!  Or 
do a confirm-prompt with the command line printed first, and then 
copy/paste the confirmed version to the operational line, so there's no 
chance of typoing something different from the confirmed version.
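
One way to sketch those safeguards in shell.  The function name and the 
DRY_RUN variable are hypothetical, not taken from the original script.  
The guard refuses an empty or root target, the default mode lists with 
ls instead of deleting, and the ${target:?} expansion makes the shell 
abort if the variable is unset or empty at the point of the rm.

```shell
# Hypothetical helper: clear the contents of a directory, guarding
# against the empty-variable typo described above.
# $1 = directory to clear; DRY_RUN=1 (the default) only lists.
clean_dir() {
    target=$1
    # An empty or root target is exactly the rm -rf /* failure mode.
    if [ -z "$target" ] || [ "$target" = "/" ]; then
        echo "clean_dir: refusing to operate on '$target'" >&2
        return 1
    fi
    # First pass: show what would be removed instead of removing it.
    if [ "${DRY_RUN:-1}" = "1" ]; then
        ls -d -- "$target"/*
        return 0
    fi
    # ${target:?} aborts if the variable is unset or empty here.
    rm -rf -- "${target:?}"/*
}
```

Running clean_dir "$somevar" first and DRY_RUN=0 clean_dir "$somevar" 
only after checking the listing follows the "ls first, rm second" 
routine.  Adding set -u at the top of a script would additionally make 
the shell stop on any unset variable.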

[2] Dual raid1 working and backup copies on a pair of partitioned 
devices:  My setup is actually somewhat more complex than that, 
but the details are not apropos to this discussion.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
