Re: mounting failed any file on my filesystem

From: Duncan <1i5t5.duncan@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: mounting failed any file on my filesystem
Date: Thu, 29 Dec 2016 22:31:29 +0000 (UTC)
Message-ID: <pan$623d2$cd2f67c9$e174a628$49552d7a@cox.net>
In-Reply-To: 1707774.BW2KGqTm4d@dibsi

Jan Koester posted on Thu, 29 Dec 2016 20:05:35 +0100 as excerpted:

> Hi,
> 
> I have a problem with my filesystem.  My system crashed, and I did a
> hard reset; after that my filesystem was broken.  I have already tried
> to repair it, without success, as you can see in the log.  It seems one
> corrupted block brings the complete filesystem down.
> 
> Does anybody have an idea what happened to my filesystem?
> 
> dmesg when opening a file:
> [29450.404327] WARNING: CPU: 5 PID: 16161 at
> /build/linux-lIgGMF/linux-4.8.11/fs/btrfs/extent-tree.c:6945
> __btrfs_free_extent.isra.71+0x8e2/0xd60 [btrfs]

First, a disclaimer.  I'm a btrfs user and list regular, not a dev.  As 
such I don't really read call traces much beyond checking the kernel 
version, and I don't do code.  You will likely get a more authoritative 
reply from someone who does, and it should take precedence, but in the 
meantime I can try to deal with the preliminaries.

Kernel 4.8.11, good.  But you ran btrfs check below, and we don't have 
the version of your btrfs-progs userspace.  Please report that too.
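A minimal sketch for collecting the versions asked for above, assuming a POSIX shell; `report_versions` is a hypothetical helper name, not part of btrfs-progs:

```shell
#!/bin/sh
# Hypothetical helper: print the kernel and btrfs-progs versions for a report.
report_versions() {
    echo "kernel: $(uname -r)"
    # 'btrfs --version' prints the userspace version, e.g. "btrfs-progs v4.x".
    if command -v btrfs >/dev/null 2>&1; then
        btrfs --version
    else
        echo "btrfs-progs: not installed"
    fi
}
```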

> btrfs output:
> root@dibsi:/home/jan# btrfs check /dev/disk/by-uuid/73d4dc77-6ff3-412f-9b0a-0d11458faf32

Note that btrfs check is read-only by default.  It reports what it 
thinks are errors, but won't attempt to fix them unless you add options 
(such as --repair) telling it to do so.  This is by design and is very 
important, as attempting to repair problems that it doesn't properly 
understand could make them worse instead of better.  So even though the 
command above only reports what it sees as problems without attempting 
to fix them, you did the right thing by running check without --repair 
first and posting the results here, so an expert can look at them and 
tell you whether to try --repair, or what else to try instead.
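To keep a copy of the (still read-only) check output for posting, something like the following sketch should do.  `check_and_log` and the default log name `btrfs-check.log` are made up for illustration, and it deliberately never passes --repair:

```shell
#!/bin/sh
# Hypothetical helper: run btrfs check in its default read-only mode and save
# everything it prints, so the log can be posted to the list.
# No --repair here, so check is not asked to write to the device.
check_and_log() {
    dev=$1
    log=${2:-btrfs-check.log}
    # 2>&1 so that messages printed on stderr also land in the log.
    btrfs check "$dev" 2>&1 | tee "$log"
}
```

Usage, with the device from your report (as root): `check_and_log /dev/disk/by-uuid/73d4dc77-6ff3-412f-9b0a-0d11458faf32`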

> Checking filesystem on /dev/disk/by-uuid/73d4dc77-6ff3-412f-9b0a-0d11458faf32
> UUID: 73d4dc77-6ff3-412f-9b0a-0d11458faf32
> checking extents
> parent transid verify failed on 2280458502144 wanted 861168 found 860380
> parent transid verify failed on 2280458502144 wanted 861168 found 860380
> checksum verify failed on 2280458502144 found FC3DF84D wanted 2164EB93
> checksum verify failed on 2280458502144 found FC3DF84D wanted 2164EB93
> bytenr mismatch, want=2280458502144, have=15938383240448
[...]
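The check output repeats each complaint, and mail quoting can wrap it; btrfs itself prints each message on one line.  As I understand it, transid is the btrfs generation (transaction) number, so these lines say the tree block at that address carries an older generation than its parent expects.  A hypothetical awk filter, `summarize_transid`, can collapse a saved log to one line per block.  Treat the "behind" figure as a rough indicator only, not a diagnosis:

```shell
#!/bin/sh
# Hypothetical helper: collapse repeated "parent transid verify failed" lines
# from saved btrfs check output into one line per block, with the gap between
# the generation the parent wanted and the one actually found.
summarize_transid() {
    awk '/parent transid verify failed on/ {
        # Message shape: ... on <bytenr> wanted <gen> found <gen>
        for (i = 1; i < NF; i++) {
            if ($i == "on")     bytenr = $(i + 1)
            if ($i == "wanted") wanted = $(i + 1)
            if ($i == "found")  found  = $(i + 1)
        }
        key = bytenr " " wanted " " found
        if (!(key in seen)) {      # print each distinct complaint once
            seen[key] = 1
            printf "block %s: wanted gen %s, found gen %s (%d behind)\n",
                   bytenr, wanted, found, wanted - found
        }
    }' "$@"
}
```

It reads the files named as arguments, or stdin if none, e.g. `summarize_transid btrfs-check.log`.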

Some other information that we normally ask for includes the output from 
a few other btrfs commands.

It's unclear from your report whether the filesystem will mount at all.  
The subject says mount failed, but it also mentions any file on the 
filesystem, which seems to imply that you can mount it, and that 
accessing any file after mounting crashes the system with the trace you 
posted.  So I'm not sure whether you can actually mount the filesystem.

If you can't mount the filesystem, at least try to post the output from...

btrfs filesystem show

If you can mount the filesystem, then the much more detailed...

btrfs filesystem usage

... if your btrfs-progs is new enough, or...

btrfs filesystem df

... if btrfs-progs is too old to have the usage command.
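The commands above can be wrapped in a small sketch.  `gather_info` is a hypothetical name, and falling back to df whenever usage fails (not only when btrfs-progs is too old) is a simplifying assumption:

```shell
#!/bin/sh
# Hypothetical helper: collect the filesystem info usually requested on the
# list.  Pass the mountpoint if the filesystem mounts; omit it if it doesn't.
gather_info() {
    # 'show' works whether or not the filesystem is mounted.
    btrfs filesystem show
    if [ $# -ge 1 ]; then
        mnt=$1
        # 'usage' needs newer btrfs-progs; fall back to 'df' if it fails.
        btrfs filesystem usage "$mnt" || btrfs filesystem df "$mnt"
    fi
}
```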

Also, if it's not clear from the output of the commands above (usage by 
itself, or show plus df, should answer most of the below, but show alone 
only provides some of the information), tell us a bit more about the 
filesystem in question:

Single device (like traditional filesystems) or multiple devices?  If 
multiple devices, what raid levels, if you know them, or did you just go 
with the defaults?  If single device, again, defaults, or did you specify 
single or dup, particularly for metadata?

Also, how big was the filesystem and how close to full?  And was it on 
ssd, spinning rust, or on top of something virtual (like a VM image 
existing as a file on the host, or lvm, or mdraid, etc.)?

Meanwhile, if you can mount, the first thing I'd try is btrfs scrub 
(unless you were running btrfs raid56 mode, which makes things far more 
complex, as it's not stable yet and isn't recommended except for testing 
with data you can afford to lose).  Often, a scrub can fix much of the 
damage from a crash if you were running raid1 mode (the multi-device 
metadata default), raid10, or dup (the single-device metadata default, 
except on ssd).  Those modes keep a second checksummed copy, which will 
often be correct, that scrub can use to fix the bad copy.  In single mode 
(the default for data) or raid0 mode, scrub will detect damage but be 
unable to fix it, as there's no second copy available to fix the first.

Because the default for single-device btrfs is dup metadata and single 
data, in that case scrub should fix most or all of the metadata, allowing 
you to access small files (roughly anything under a couple KiB) and 
larger files that weren't themselves damaged, but you may still have 
damage in some files of any significant size.
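The rule of thumb above, expressed as a tiny lookup.  `scrub_can_repair` is hypothetical; it assumes profile names as `btrfs filesystem df` prints them (for example "DUP" or "single") and lowercases them first.  raid5/raid6 deliberately come back as unknown, per the raid56 caveat:

```shell
#!/bin/sh
# Hypothetical helper: can scrub repair (not just detect) damage for a given
# block-group profile?  Only profiles with a second checksummed copy qualify.
scrub_can_repair() {
    profile=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case $profile in
        raid1|raid10|dup) echo yes ;;      # second copy to repair from
        single|raid0)     echo no ;;       # damage detected, not fixed
        *)                echo unknown ;;  # includes raid5/raid6
    esac
}
```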

But scrub can only run if you can mount the filesystem.  If you can't, 
then you have to try other things in order to get it mountable first.  
Many of these other things tend to be much more complex and risky, so if 
you can mount at all, try scrub first and see how much it helps.  Here, 
nearly all my btrfs filesystems are dual-device raid1, and (assuming I 
can mount the affected filesystem, which I usually can) I now run scrub 
first thing after a crash, as a preventative measure, even without 
knowing whether the filesystem was damaged or not.
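Running scrub in the foreground after a crash might look like this sketch.  The helper name is hypothetical; `-B` keeps scrub in the foreground and prints statistics when it finishes, and `-d` splits those statistics per device, which is handy on multi-device raid1:

```shell
#!/bin/sh
# Hypothetical helper: scrub a mounted btrfs and wait for it to finish.
scrub_after_crash() {
    mnt=$1
    # -B: stay in the foreground and print statistics at the end.
    # -d: print those statistics separately for each device.
    btrfs scrub start -B -d "$mnt"
}
```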

If the filesystem won't mount, then the recommendation is /likely/ to be 
trying the usebackuproot mount option (which replaced the older recovery 
mount option; your kernel is new enough for usebackuproot), which tries 
some older tree roots if the newest one is damaged.  You may have to 
combine that option with ro (read-only), which of course prevents running 
scrub or the like while mounted, but may at least give you access to the 
data so you can freshen up your backups.  However, usebackuproot will by 
definition sacrifice the last seconds of writes before the crash, and 
while I'd probably try this option on my own system without asking, I'm 
not comfortable recommending it to others, so I'd suggest waiting for 
one of the higher experts to confirm before trying it yourself.
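For reference only, and subject to the caution above, the read-only usebackuproot mount would look roughly like this.  `mount_backuproot` is a hypothetical helper; it must run as root and assumes kernel 4.6 or newer (older kernels used the recovery option instead):

```shell
#!/bin/sh
# Hypothetical helper: read-only mount that lets the kernel fall back to
# older tree roots if the newest one is damaged.  Wait for expert
# confirmation before using this on a damaged filesystem.
mount_backuproot() {
    dev=$1
    mnt=$2
    mkdir -p "$mnt"
    # ro: read-only, so scrub is unavailable, but files can be copied off.
    mount -o ro,usebackuproot "$dev" "$mnt"
}
```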

Beyond usebackuproot, you get into riskier repair attempts that may 
instead do further damage if they don't work.  This is where btrfs check 
--repair lives, along with some other check options, btrfs rescue, etc.  
Unless an expert specifically tells you otherwise after looking at the 
filesystem info, these are risky enough that, if at all possible, you 
want to freshen your backups before you try them.

That's where btrfs restore comes in, as it lets you attempt to restore 
files from an unmountable filesystem without actually writing to that 
filesystem, thus not risking further damage in the process.  Of course, 
that means you need some place to put the files it restores.  In simple 
mode you just run btrfs restore with command-line parameters telling it 
which device to restore from and where to put the restored files (plus 
some options telling it whether to try restoring metadata like file 
ownership, permissions, dates, etc.), and it just works.
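A minimal sketch of the simple restore mode, assuming the destination directory lives on a different, healthy filesystem with enough free space; `restore_to` is a hypothetical helper.  `-i` keeps going past errors, `-m` restores owner, mode and times, `-S` restores symlinks, and `-v` lists files as they are restored.  Running once with `-D` (dry run) first shows what would be restored without writing anything:

```shell
#!/bin/sh
# Hypothetical helper: copy files off an unmountable btrfs with btrfs restore.
# restore only reads the source device; all output goes under $dest.
restore_to() {
    dev=$1
    dest=$2
    mkdir -p "$dest"
    # -i ignore errors, -m restore metadata (owner/mode/times),
    # -S restore symlinks, -v verbose.
    btrfs restore -i -m -S -v "$dev" "$dest"
}
```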

However, should btrfs restore's simple mode fail, there are more 
advanced modes to try, still without risking further damage to the 
filesystem in question, but that gets complex enough that it needs its 
own post... if you come to that.  There's a page on the wiki with some 
instructions, but they may not be current, and it's a complex enough 
operation that most people need help beyond what's on the wiki (and in 
the btrfs-restore manpage) anyway.  But here's the link so you can take 
a look at what the general operation looks like:

https://btrfs.wiki.kernel.org/index.php/Restore

Meanwhile, it's a bit late now, but in general, btrfs is still 
considered under heavy development: stabilizing, but not yet fully 
stable and mature.  Any sysadmin worth the label will tell you that by 
not having backups of some data, you are defining that data as not worth 
the time, trouble and resources of making those backups.  In effect it's 
throw-away data, because if it /were/ worth backing up, by definition 
you'd /have/ those backups.  That holds even for normal, stable and 
mature filesystems; with btrfs still stabilizing, backups are even 
/more/ strongly recommended.  So is keeping them current within the 
window of data you're willing to lose if you lose the primary copy, and 
keeping them practically usable (not, for instance, behind a slow net 
link that would take over a week to download in order to restore, one 
real case that was posted).  If you're doing that, then losing a 
filesystem isn't going to be a big stress, and you can skip the really 
complex and risky stuff (unless you're doing it simply to learn how) and 
just restore from backup, as that will be simpler.  If not, then you 
should really reexamine whether btrfs is the right filesystem choice for 
you, because it /isn't/ yet fully stable and mature, and chances are 
you'd be better off with a more stable and mature filesystem, where not 
having current backups at hand is less of a risk.  (Although, as I said, 
any sysadmin worth the name will tell you that not having backups 
literally defines the data as throw-away, because in the real world 
"things happen", and there are too many such things possible to behave 
otherwise.)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
