linux-btrfs.vger.kernel.org archive mirror
From: Duncan <1i5t5.duncan@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: Trying to rescue my data :(
Date: Sun, 26 Jun 2016 02:30:45 +0000 (UTC)
Message-ID: <pan$e8a2$15460093$3bb59e30$70ebaf89@cox.net>
In-Reply-To: b578050f-1425-134b-9655-7ad36dd4b7ab@crc.id.au

Steven Haigh posted on Sun, 26 Jun 2016 02:39:23 +1000 as excerpted:

> In every case, it was a flurry of csum error messages, then instant
> death.

This is very possibly a known btrfs bug, one that occurs even on raid1, 
where a later scrub repairs all the csum errors.  In theory btrfs raid1 
should simply read from the mirrored copy if the first copy fails its 
checksum (assuming the second one passes, of course), and it seems to do 
this just fine when there's only an occasional csum error.  But if it hits 
too many at once, it *does* unfortunately crash, despite the second copy 
being available and perfectly good, as later demonstrated by scrub fixing 
the bad copy from the good one.
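To make the expected raid1 behavior concrete, here's a toy Python model of 
a checksummed read that falls back to the second copy.  It's a sketch 
under stated assumptions, not btrfs code: zlib.crc32 stands in for btrfs's 
default crc32c checksum, blocks are plain byte strings in lists, and the 
function name raid1_read is hypothetical.  It models only the fallback, 
not the too-many-errors crash.

```python
import zlib


def checksum(data: bytes) -> int:
    # Stand-in for btrfs's crc32c; zlib.crc32 is plain CRC-32 (assumption).
    return zlib.crc32(data)


def raid1_read(mirrors, csums, block):
    """Return (data, copy_index) for the first copy of `block` that verifies.

    mirrors: list of copies, each copy a list of byte-string blocks
    csums:   expected checksum for each block, shared by all copies
    """
    for copy_index, copy in enumerate(mirrors):
        data = copy[block]
        if checksum(data) == csums[block]:
            return data, copy_index  # first good copy wins
        # checksum mismatch: fall through and try the next mirror
    raise OSError(f"block {block}: no copy passed its checksum")


# Usage: mirror 0 has a torn block 1, mirror 1 is intact.
good = [b"alpha", b"beta"]
csums = [checksum(b) for b in good]
mirrors = [[b"alpha", b"bXta"], list(good)]
print(raid1_read(mirrors, csums, 0))  # (b'alpha', 0)
print(raid1_read(mirrors, csums, 1))  # (b'beta', 1)
```

A read only falls through to the second copy when the first fails its 
checksum, so an intact first copy is returned without touching the mirror.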

I'm used to dealing with that here any time I have a bad shutdown (and 
I'm running live-git KDE, which currently has a bug that crashes the 
system if I let it idle and shut off the monitors, so I've been getting 
crash shutdowns, and having to deal with this, unfortunately often 
recently).  Fortunately I keep my root, with all the system executables 
etc., mounted read-only by default, so it's not affected and I can 
/almost/ boot normally after such a crash.  The problem is /var/log and 
/home (the parts of /var that need to be writable are symlinked into 
/home/var, so / can stay read-only).  Something in the normal after-crash 
boot triggers enough csum errors there that I often crash again.

So I have to boot to emergency mode and mount the filesystems in question 
manually, so nothing tries to access them until I've run scrub and fixed 
the csum errors.  Scrub itself doesn't trigger the crash, thankfully.  The 
csum errors come from partial writes on one mirror, where the matching 
write on the other mirror was either never made or completed properly.  
Once scrub has repaired them all, I can exit emergency mode and complete 
the normal boot (to the default multi-user target).  Since scrub has fixed 
all the csum errors, the boot no longer crashes from too many of them, 
and I'm back in business.
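The scrub-then-boot procedure works because scrub checks every copy of 
every block rather than stopping at the first good one, so it can repair a 
mirror that ordinary reads would never look at.  Here's a toy Python model 
of that repair pass.  It's a sketch, not btrfs code: zlib.crc32 stands in 
for btrfs's default crc32c, and the function name scrub is hypothetical.  
On a real system the manual steps would presumably look something like 
mounting the filesystem, running `btrfs scrub start -B <mountpoint>`, then 
`systemctl default` to leave emergency mode; treat those as illustrative, 
not necessarily the exact commands used here.

```python
import zlib


def checksum(data: bytes) -> int:
    # Stand-in for btrfs's crc32c; zlib.crc32 is plain CRC-32 (assumption).
    return zlib.crc32(data)


def scrub(mirrors, csums):
    """Check every copy of every block and rewrite bad copies from a good one.

    Returns (repaired, unrecoverable). Mutates the mirror lists in place.
    """
    repaired = unrecoverable = 0
    for block, expected in enumerate(csums):
        good = [copy[block] for copy in mirrors if checksum(copy[block]) == expected]
        if not good:
            unrecoverable += 1  # every copy is bad; nothing to repair from
            continue
        for copy in mirrors:
            if checksum(copy[block]) != expected:
                copy[block] = good[0]  # overwrite the bad copy with a good one
                repaired += 1
    return repaired, unrecoverable


# Usage: a crash left a torn write on one mirror only.
blocks = [b"log line 1", b"log line 2", b"log line 3"]
csums = [checksum(b) for b in blocks]
mirror_a, mirror_b = list(blocks), list(blocks)
mirror_a[2] = b"log li\x00\x00\x00\x00"
print(scrub([mirror_a, mirror_b], csums))  # (1, 0)
print(mirror_a == mirror_b == blocks)      # True
```

After the pass, a second scrub finds nothing left to fix, which mirrors 
why the subsequent normal boot no longer hits a flood of csum errors.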


Tho I believe at least the csum bug that affects me may only trigger if 
compression is (or perhaps was at some point in the past) enabled.  Since 
I run compress=lzo everywhere, that would certainly apply to me.  It would 
also explain why the bug has been around for quite some time, since 
presumably the devs don't run with compression on often enough for this 
to have become a personal itch they needed to scratch, so it has remained 
untraced and unfixed.

So if you weren't using the compress option, your bug is probably 
different, but either way, the whole thing about too many csum errors at 
once triggering a system crash sure does sound familiar, here.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


