From: Matthew Dawson <matthew@mjdsystems.ca>
To: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Cc: Kai Krakow <hurikhan77+btrfs@gmail.com>, linux-btrfs@vger.kernel.org
Subject: Re: Help recovering filesystem (if possible)
Date: Wed, 24 Nov 2021 00:11:46 -0500
Message-ID: <4306866.vuYhMxLoTh@ring00>
In-Reply-To: <20211124044343.GF17148@hungrycats.org>
On Tuesday, November 23, 2021 11:43:43 P.M. EST Zygo Blaxell wrote:
> On Thu, Nov 18, 2021 at 11:42:05PM -0500, Matthew Dawson wrote:
> > On Thursday, November 18, 2021 4:09:15 P.M. EST Zygo Blaxell wrote:
> > > On Wed, Nov 17, 2021 at 09:57:40PM -0500, Matthew Dawson wrote:
> > > > On Monday, November 15, 2021 5:46:43 A.M. EST Kai Krakow wrote:
> > > > > On Mon, Nov 15, 2021 at 02:55 Matthew Dawson <matthew@mjdsystems.ca> wrote:
> > > > > > I recently upgraded one of my machines to the 5.15.2 kernel. On the
> > > > > > first reboot, I had a kernel fault during initialization (I didn't
> > > > > > get to capture the printed stack trace, but I'm 99% sure it did not
> > > > > > have BTRFS-related calls). I then rebooted the machine back to a 5.14
> > > > > > kernel, but the bcache (writeback) cache was corrupted. I then
> > > > > > force-started the underlying disks, but now my BTRFS filesystem will
> > > > > > no longer mount. I realize there may be missing/corrupted data, but I
> > > > > > would ideally like to get any data I can off the disks.
> > > > >
> > > > > I had a similar issue lately where the system didn't reboot cleanly
> > > > > (there's some issue in the BIOS or with the SSD firmware where it
> > > > > would disconnect the SSD from SATA a few seconds after boot, forcing
> > > > > bcache into detaching dirty caches).
> > > > >
> > > > > Since you are seeing transaction IDs lagging behind expectations, I
> > > > > think you've lost dirty writeback data from bcache. To fix this in the
> > > > > future, you should use bcache only in writearound or writethrough mode.
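
Noting this here for my own reference: I believe the cache mode can be
changed at runtime through sysfs, roughly like the following (bcache0 here
stands in for whichever bcache device applies):

  # switch an existing bcache backing device from writeback to writethrough
  echo writethrough > /sys/block/bcache0/bcache/cache_mode

  # confirm the change; the active mode is shown in [brackets]
  cat /sys/block/bcache0/bcache/cache_mode
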
> > > >
> > > > Considering I started the bcache devices without the cache, I don't
> > > > doubt I've lost writeback data, and I'm sure there will be issues. At
> > > > this point I'm just in data recovery, trying to get what I can.
> > >
> > > The word "issues" is not adequate to describe the catastrophic damage
> > > to metadata that occurs if the contents of a writeback cache are lost.
> > >
> > > If writeback failure happens to only one btrfs device's cache, you
> > > can recover with btrfs raid1 self-healing using intact copies stored
> > > on working devices. If it happens on multiple btrfs devices at once
> > > (e.g. due to misconfiguration of bcache with more than one btrfs device
> > > per pool or more than one bcache pool per SSD, or due to a kernel bug
> > > that affects all bcache instances at once, or a firmware bug that
> > > affects each SSD device the same way during a crash) then recovery
> > > isn't possible.
> > >
> > > Writeback cache failures are _bad_, falling between "many thousands of
> > > bad sectors" and "total disk failure" in terms of difficulty of recovery.
> > >
> > > > Hopefully someone has a different idea? I am posting here because I
> > > > feel any further progress is going to require more dangerous options,
> > > > and those usually say to ask the mailing list first.
> > >
> > > Your best option would be to get the caches running again, at least in
> > > read-only mode. It's not a good option, but all your other options
> > > depend on having access to as many cached dirty pages as possible.
> > > If all you have is the backing devices, then now is the time to scrape
> > > what you can from the drives with 'btrfs restore' then make use of your
> > > backups.
> >
> > At this point I think I'm stuck with just the backing devices (with GBs
> > of lost dirty data on the cache). And I'm primarily in data recovery,
> > trying to get whatever good data I can to help supplement the backed-up
> > data.
>
> I don't use words like "catastrophic" casually. Recovery typically
> isn't possible with the backing disks after a writeback cache failure.
>
> The writeback cache algorithm will prefer to keep the most critical
> metadata in cache, while writing out-of-date metadata pages out to the
> backing devices. This process effectively wipes btrfs metadata off
> the backing disks as the cache fills up, and puts it back as the cache
> flushes out. If a large dirty cache dies, it can leave nothing behind.
>
> > As mentioned in my first email though, btrfs restore fails with the
> > following error message:
> > # btrfs restore -l /dev/dm-2
> > parent transid verify failed on 132806584614912 wanted 3240123 found 3240119
> > parent transid verify failed on 132806584614912 wanted 3240123 found 3240119
> > parent transid verify failed on 132806584614912 wanted 3240123 found 3240119
> > parent transid verify failed on 132806584614912 wanted 3240123 found 3240119
> > Ignoring transid failure
> > Couldn't setup extent tree
> > Couldn't setup device tree
> > Could not open root, trying backup super
> > warning, device 6 is missing
> > warning, device 13 is missing
> > warning, device 12 is missing
> > warning, device 11 is missing
> > warning, device 7 is missing
> > warning, device 9 is missing
> > warning, device 14 is missing
> > bytenr mismatch, want=136920576753664, have=0
> > ERROR: cannot read chunk root
> > Could not open root, trying backup super
> > warning, device 6 is missing
> > warning, device 13 is missing
> > warning, device 12 is missing
> > warning, device 11 is missing
> > warning, device 7 is missing
> > warning, device 9 is missing
> > warning, device 14 is missing
> > bytenr mismatch, want=136920576753664, have=0
> > ERROR: cannot read chunk root
> > Could not open root, trying backup super
> > This is with all devices present and reported to the kernel. I was
> > looking for help to try to move beyond these errors and get whatever may
> > still be available.
>
> The general btrfs recovery process is:
>
> 1. Restore device and chunk trees. Without these, btrfs
> can't translate logical to physical block addresses, or even
> recognize its own devices, so you get "device is missing" errors.
> The above log shows that device and chunk tree data is now in the
> cache--or at least, not on the backing disks. 'btrfs rescue
> chunk-recover' may locate an older copy of this data by brute
> force search of the disk, if an older copy still exists.
>
> 2. Find subvol roots to read data. 'btrfs-find-root' will
> do a brute-force search of the disks to locate subvol roots,
> which you can pass to 'btrfs restore -l' to try to read files.
> Normally this produces hundreds of candidates and you'll have
> to try each one. If you have an old snapshot (one that predates
> the last full cache flush, and no balance, device shrink, device
> remove, defrag, or dedupe operation has occurred since) then you
> might be able to read its entire tree. Subvols that were modified
> recently will be unusable, as they will be missing many or all of
> their pages (those pages will be in the cache, not on the backing disks).
>
> 3. Verify the data you get back. The csum tree is no longer
> usable, so you'll have no way to know if any data that you get
> from the filesystem is correct or garbage. This is true even if
> you are reading from an old snapshot, as the csum tree is global
> to all subvols and will be modified (and moved into the cache)
> by any write to the filesystem.
>
> In the logs above we see that you have missing pages in extent, chunk,
> and device trees. In a writeback cache setup, new versions of these
> trees will be written to the cache, while the old versions are partially
> or completely erased on the backing devices in the process of flushing
> out previous dirty pages. This pattern will repeat for subvol and csum
> trees, leaving you with severely damaged or unusable metadata on the
> backing disks as long as there are dirty pages in cache.
>
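
That ordering is really helpful. For my own notes, the first attempts will
probably look roughly like the following, run against snapshots/copies of
the devices rather than the disks themselves, since chunk-recover writes to
the filesystem (here /dev/dm-2 is one of my devices, <bytenr> is whatever
btrfs-find-root reports, and /mnt/recovery and /mnt/backup are just
placeholder paths):

  # 1. brute-force search the device for an older chunk root
  btrfs rescue chunk-recover -v /dev/dm-2

  # 2. list candidate tree roots found by a brute-force metadata scan
  btrfs-find-root /dev/dm-2

  # 3. for each candidate root: dry-run first, then copy out what's readable
  btrfs restore -D -i -t <bytenr> /dev/dm-2 /mnt/recovery/
  btrfs restore -i -x -m -t <bytenr> /dev/dm-2 /mnt/recovery/

  # 4. since the csum tree is gone, at least flag files that differ from the
  #    last backup for manual inspection
  diff -rq /mnt/backup /mnt/recovery
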
> > If further recovery is impossible that's fine I'll wipe and start over,
> > but I rather try some risky things to get what I can before I do so.
>
> I wouldn't say it's impossible in theory, but in practice it is a level
> of effort comparable to unshredding a phone book--after someone has
> grabbed a handful of the shredded paper and burned it.
>
> High-risk interventions like 'check --repair --init-extent-tree' are
> likely to have no effect in the best case (they'll give up due to lack
> of usable metadata), and will destroy even more data in the worst case
> (they'll try modifying the filesystem and overwrite some of the surviving
> data). They depend on having intact device and subvol trees to work,
> so if you can't get those back, there's no need to try anything else.
>
> In theory, if you can infer the file structure from the contents of the
> files, you might be able to guess some of the missing metadata. e.g. the
> logical-to-physical translation in the device tree only provides about
> 16 bits of an extent byte address, so you could theoretically build
> a tool which tries all 65536 most likely disk locations for a block
> until it finds a plausible content match for a file, and use that tool
> to reconstruct the device tree. It might even be possible to automate
> this using fragments of the csum tree (assuming the relevant parts of
> the csum tree exist on the backing devices and not only in the cache).
> This is only the theory--practical tools to do this kind of recovery
> don't yet exist.

Thanks for the suggestions! I'll give them a try over the next little while
(I'm getting some extra storage first, then I'll try using device mapper's
snapshot target to avoid destroying what's there).
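
For the snapshot layer I'm picturing something like this, per backing
device (device names and the COW file size below are placeholders, not my
real layout):

  # sparse file to hold copy-on-write changes, attached to a loop device
  truncate -s 200G /mnt/scratch/cow-sdb.img
  losetup -f --show /mnt/scratch/cow-sdb.img   # prints e.g. /dev/loop0

  # mark the real backing device read-only, then overlay a writable snapshot
  blockdev --setro /dev/sdb
  dmsetup create sdb-snap --table \
    "0 $(blockdev --getsz /dev/sdb) snapshot /dev/sdb /dev/loop0 P 8"

  # point the recovery tools at /dev/mapper/sdb-snap instead of /dev/sdb
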
I also might try writing a recovery tool for the bcache cache, doing something
similar to the dm snapshot system.
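
Before writing anything of my own, I'll probably start by just dumping what
the superblocks still claim, to see how much the cache device remembers
about the set (bcache-tools ships a helper for this; device names are again
placeholders):

  # inspect the superblock on the failed caching SSD
  bcache-super-show /dev/nvme0n1p2

  # and on a backing device, to compare cset UUIDs and state
  bcache-super-show /dev/sdb
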
Thanks for the pointers!
--
Matthew