From: Duncan <1i5t5.duncan@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: BTRFS Raid5/6 Recovery Problem after accidentially hibernation
Date: Thu, 13 Nov 2014 08:49:12 +0000 (UTC)
Message-ID: <pan$c311$b2fd7185$d23e610e$c84330fe@cox.net>
In-Reply-To: 54639860.7030206@automatix.de
Juergen Sauer posted on Wed, 12 Nov 2014 18:26:56 +0100 as excerpted:
> Current Status:
> # root@pc6:~# btrfs fi show /dev/sda1
> # parent transid verify failed on 209362944 wanted 293924 found 293922
> # parent transid verify failed on 209362944 wanted 293924 found 293922
> What does "parent transid verify failed on 209362944 wanted 293924 found
> 293922" really mean?
The wanted 293924 found 293922 bit is simple enough, once you understand
what it's doing.
The basis of it is this: Every full btrfs commit, all the way up to the
root node, is called a generation. These are sequential, counting
upward, with transaction ID or transid being another term meaning exactly
the same thing as generation number. You'll see the two terms used
interchangeably in btrfs documentation and log output.
So what that's saying is that it wanted transid/generation 293924, but
what it found was from two commit transactions earlier, 293922.
I believe the 209362944 is the bytenr, basically the address in the
filesystem.
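
For anyone wanting to pull those numbers out of a log mechanically, here is a minimal sketch. It assumes the message is worded exactly as quoted above (other btrfs versions may phrase it differently), and the function and field names are mine, not anything from btrfs-progs.

```python
import re

# Matches the wording quoted in this thread. This is an assumption about
# the message format, not a guarantee for every btrfs version.
TRANSID_RE = re.compile(
    r"parent transid verify failed on (?P<bytenr>\d+) "
    r"wanted (?P<wanted>\d+) found (?P<found>\d+)"
)


def parse_transid_error(line: str) -> dict:
    """Return bytenr, wanted and found generations, and how far behind 'found' is."""
    match = TRANSID_RE.search(line)
    if match is None:
        raise ValueError("not a 'parent transid verify failed' line")
    fields = {name: int(value) for name, value in match.groupdict().items()}
    # Positive value: the block on disk is this many generations too old.
    fields["generations_behind"] = fields["wanted"] - fields["found"]
    return fields


if __name__ == "__main__":
    msg = "parent transid verify failed on 209362944 wanted 293924 found 293922"
    print(parse_transid_error(msg))
```

For the line from the original report, generations_behind comes out as 2, matching the "two commit transactions earlier" reading above.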
You can use btrfs-show-super to see the current root generation as
stored in the superblocks, which gives you some idea how far back this
damage appears to be taking you. With commits every 30 seconds by
default if there's data being written, going back two commits, as the
above seems to indicate, isn't bad. If it's going back a few thousand,
things begin to look much worse.
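
As a rough worked example of that arithmetic, the sketch below turns a generation gap into an approximate window of write activity. It assumes the 30-second default commit interval and continuous writing; commits can also happen sooner, and no commits happen while the filesystem is idle, so treat the result as a ballpark figure only. The function name is hypothetical.

```python
DEFAULT_COMMIT_INTERVAL_S = 30  # default interval mentioned above


def stale_window_seconds(wanted: int, found: int,
                         commit_interval_s: int = DEFAULT_COMMIT_INTERVAL_S) -> int:
    """Rough estimate of how much write activity separates two generations."""
    if found > wanted:
        raise ValueError("found generation is newer than the wanted one")
    return (wanted - found) * commit_interval_s


if __name__ == "__main__":
    # Numbers from the error message in this thread.
    print(stale_window_seconds(293924, 293922), "seconds of writes, very roughly")
```

With the numbers from this thread, that is about a minute of writes. A gap of a few thousand generations would be many hours.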
The Restore wiki page might be of some help at this point. It discusses
find-root (now btrfs-find-root I believe) as well, which can be used to
see what's going on with these root generations and, if the current one
is corrupt, to find a usable earlier generation. Note that while the
general idea remains, the way the tools work and what they output has
changed a bit from when it was written.
https://btrfs.wiki.kernel.org/index.php/Restore
As covered on that page, what restore does is let you attempt to retrieve
files off an unmounted filesystem that's too damaged to mount read-only
and copy them off properly. The files are restored to another (mounted)
filesystem, which does mean you need enough room on this other filesystem
in order to store them. If the current root is corrupted beyond help,
you can go back in time to previous roots, trying to find a good one,
using the output from find-root.
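
To make the "go back to a previous root" step concrete, here is a small sketch that only assembles a btrfs restore command line; it does not run anything. It assumes the -t <bytenr> option (point restore at a specific tree root, as the wiki page does) and the -D dry-run option as documented for btrfs restore; check the manpage of your btrfs-progs version before relying on either. The helper name, the destination path and the bytenr value are placeholders of mine.

```python
import shlex
from typing import List, Optional


def restore_argv(device: str, dest: str,
                 root_bytenr: Optional[int] = None,
                 dry_run: bool = False) -> List[str]:
    """Build (but do not execute) a 'btrfs restore' command line."""
    argv = ["btrfs", "restore"]
    if dry_run:
        # -D: only report what would be restored, write nothing.
        argv.append("-D")
    if root_bytenr is not None:
        # -t: use the tree root at this bytenr, e.g. one reported by find-root.
        argv += ["-t", str(root_bytenr)]
    return argv + [device, dest]


if __name__ == "__main__":
    # /dev/sda1 is from the report; destination and bytenr are placeholders.
    cmd = restore_argv("/dev/sda1", "/mnt/recovery",
                       root_bytenr=123456789, dry_run=True)
    print(shlex.join(cmd))
```

Doing a dry run against each candidate root first, and a real run only against the best-looking one, keeps the destination filesystem from filling up with partial attempts.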
As NOT covered or not covered well on that page, partly because it's
outdated and some of this didn't exist when it was written...
1) Restore's -D|--dry-run option can be very helpful, giving you some
idea of how many files it looks like it can restore when pointed at a
particular root.
2) The -l option, list tree roots, is also very helpful. This is the way
you actually see those filesystem trees that the wiki page discusses,
saying pick the latest tree root with all or most filesystem trees
available -- this is how you see what's available! (This part was quite
confusing to me when I was trying to use the page, until I discovered the
-l option. Then it all made sense!)
3) Be aware, when the files are restored, you do get your file data, but
they're written to the new location using the user (root) you're doing
the recovery as, using standard umask permissions. So you lose ownership
and permissions metadata on all restored files, and will need to go thru
and change it back to what you want/need, as necessary.
(Luckily when I did it, I had a backup; it just wasn't current. So I was
able to quickly hack up a script that used a for loop to match all the
restored filenames against those in the backup, and if they matched, do a
chown and a chmod using the backup file as a reference. So the only ones
I had to correct manually were the new ones that weren't in the backup.)
4) Similarly, when I did my restore, no symlinks were restored at all. I
had to recreate them manually, and I do make fair use of symlinks, so I
had a fair number to recreate manually, and a somewhat broken setup until
I got all the ones I normally use recreated.
5) A different user reported that when he used restore some time ago, he
lost his really small files, too. He believed restore only restored files
that actually had extent data, while btrfs folds really small files (a
few KiB max, the exact size limit depends on certain variables set at
filesystem creation) directly into the metadata so they don't actually
have data extents. However, I didn't see any of that on my restore, so
either it has been fixed, or perhaps one of my mount options (autodefrag
or compress=lzo, most likely) saved me, or perhaps I simply didn't notice
them, tho I doubt that as I think many of the smallest files would have
been config files and I customize heavily, and didn't notice missing
settings afterward.
6) On large directories, the restore will think it got caught in a loop.
Back a few kernel and userspace cycles ago when I did my restore, it
would just bail out at that point, and I had to redo the same restore
command several times, writing to the same location (without overwrites),
in order to get all files. When I stopped seeing that error and it
wouldn't restore any more files, I knew I had what I was going to get.
However, I think the behavior has changed now, and it simply asks
whether it should continue instead of bailing out without asking.
There's supposed to be an option to tell it to just keep continuing,
instead of asking every few minutes, as well, tho I've not actually used
it so I don't know how it ends up working in practice.
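
The fix-up described in points 3 and 4 could be sketched roughly as follows. This is not the script I actually used (that was a quick shell hack), just an illustration in Python under the same assumption: an older backup tree exists with the same relative layout as the restored tree. The function name and the returned counters are hypothetical.

```python
import os
import stat


def fix_up_from_backup(backup_root: str, restored_root: str) -> dict:
    """Copy owner/group/mode from backup entries onto same-named restored
    entries, and recreate symlinks present in the backup but missing from
    the restored tree."""
    counts = {"metadata_fixed": 0, "symlinks_created": 0, "not_in_restore": 0}
    for dirpath, dirnames, filenames in os.walk(backup_root):
        rel_dir = os.path.relpath(dirpath, backup_root)
        for name in dirnames + filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.normpath(os.path.join(restored_root, rel_dir, name))
            info = os.lstat(src)  # lstat: look at the link, not its target
            if stat.S_ISLNK(info.st_mode):
                # Recreate the link only if nothing is there yet and the
                # parent directory was actually restored.
                if not os.path.lexists(dst) and os.path.isdir(os.path.dirname(dst)):
                    os.symlink(os.readlink(src), dst)
                    counts["symlinks_created"] += 1
                continue
            if not os.path.lexists(dst):
                counts["not_in_restore"] += 1
                continue
            if os.path.islink(dst):
                continue  # never chown/chmod through an unexpected symlink
            # chown first: changing ownership can clear setuid/setgid bits.
            os.chown(dst, info.st_uid, info.st_gid)
            os.chmod(dst, stat.S_IMODE(info.st_mode))
            counts["metadata_fixed"] += 1
    return counts
```

Run it as root so the chown calls can succeed. Files that exist only in the restored tree (anything newer than the backup) are left untouched and still need correcting by hand, as do any symlinks created after the backup was taken.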
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman