Re: Help recovering filesystem (if possible)

public inbox for linux-btrfs@vger.kernel.org

From: Matthew Dawson <matthew@mjdsystems.ca>
To: Kai Krakow <hurikhan77+btrfs@gmail.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: Help recovering filesystem (if possible)
Date: Wed, 17 Nov 2021 21:57:40 -0500
Message-ID: <3321185.LZWGnKmheA@cwmtaff>
In-Reply-To: <CAMthOuMxvff2d0THhKWCpErQFumrJA9vmNqS6vtBNDwUwf3j-w@mail.gmail.com>

On Monday, November 15, 2021 5:46:43 A.M. EST Kai Krakow wrote:
> On Mon, 15 Nov 2021 at 02:55, Matthew Dawson <matthew@mjdsystems.ca> wrote:
> > I recently upgraded one of my machines to the 5.15.2 kernel.  On the first
> > reboot, I had a kernel fault during initialization (I didn't get to
> > capture the printed stack trace, but I'm 99% sure it did not have BTRFS
> > related calls).  I then rebooted the machine back into a 5.14 kernel, but
> > the BCache (writeback) cache was corrupted.  I then force-started the
> > underlying disks, but now my BTRFS filesystem will no longer mount.  I
> > realize there may be missing/corrupted data, but ideally I would like to
> > get any data I can off the disks.
> 
> I had a similar issue lately where the system didn't reboot cleanly
> (there's some issue in the BIOS or with the SSD firmware where it
> would disconnect the SSD from SATA a few seconds after boot, forcing
> bcache into detaching dirty caches).
> 
> Since you are seeing transaction IDs lagging behind expectations, I
> think you've lost dirty writeback data from bcache. To avoid this in the
> future, you should use bcache only in writearound or writethrough
> mode.
Since I started the bcache devices without their cache, I have no doubt I've 
lost writeback data, and I expect there will be issues.  At this point I'm 
just doing data recovery, trying to get what I can.

> 
> > This system involves ten 8TB disks; some are set up as BCache -> LUKS -> BTRFS,
> > others as LUKS -> BTRFS.
> 
> Not LUKS here, and all my btrfs pool members are attached to a single
> SSD as caching frontend.
> 
> > When I try to mount the filesystem, I get the following in dmesg:
> > [117632.798339] BTRFS info (device dm-0): flagging fs with big metadata feature
> > [117632.798344] BTRFS info (device dm-0): disk space caching is enabled
> > [117632.798346] BTRFS info (device dm-0): has skinny extents
> > [117632.873186] BTRFS error (device dm-0): parent transid verify failed on 132806584614912 wanted 3240123 found 3240119
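For reference, the "wanted"/"found" pair in the last line is the generation the parent node expects versus the generation actually stored in the block it points to, so here the block on disk is 4 transactions older than expected (3240123 - 3240119), which fits Kai's reading that writes were lost. A small Python sketch that pulls these numbers out of a dmesg dump, assuming the message format shown above; the regex, `TransidMismatch`, and `parse_transid_errors` are illustrative names, not part of btrfs-progs:

```python
import re
from typing import NamedTuple

# Assumes the kernel message format quoted above, e.g.
# "BTRFS error (device dm-0): parent transid verify failed on <bytenr> wanted <N> found <M>"
TRANSID_RE = re.compile(
    r"BTRFS error \(device (?P<dev>[^)]+)\): parent transid verify failed on "
    r"(?P<bytenr>\d+) wanted (?P<wanted>\d+) found (?P<found>\d+)"
)


class TransidMismatch(NamedTuple):
    device: str
    bytenr: int  # logical address of the tree block that failed verification
    wanted: int  # generation recorded in the parent node
    found: int   # generation actually stored in the block

    @property
    def lag(self) -> int:
        # How many transactions older the on-disk block is than expected.
        return self.wanted - self.found


def parse_transid_errors(log: str) -> list[TransidMismatch]:
    """Extract every 'parent transid verify failed' message from a log dump."""
    return [
        TransidMismatch(m["dev"], int(m["bytenr"]), int(m["wanted"]), int(m["found"]))
        for m in TRANSID_RE.finditer(log)
    ]
```

Feeding it the output of `dmesg` gives one entry per failed block; a consistently small lag suggests only the last few transactions were lost.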
> 
> I had luck with the following steps:
> 
> * ensure that all members are attached to bcache as they should be
> * ensure bcache is running in writearound mode for each member
> * ensure that btrfs did scan for all members
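The second check can be scripted. This is a sketch that assumes the usual bcache sysfs layout, where /sys/block/bcacheN/bcache/cache_mode lists every mode with the active one in brackets (e.g. `writethrough [writeback] writearound none`); `current_cache_mode` and `report_cache_modes` are hypothetical helper names:

```python
from pathlib import Path


def current_cache_mode(cache_mode_text: str) -> str:
    """Return the active mode, i.e. the token wrapped in [brackets]."""
    for token in cache_mode_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active mode marked in {cache_mode_text!r}")


def report_cache_modes(sysfs_block: Path = Path("/sys/block")) -> dict[str, str]:
    """Map each bcache device (e.g. "bcache0") to its active cache mode."""
    modes = {}
    for mode_file in sorted(sysfs_block.glob("bcache*/bcache/cache_mode")):
        # .../bcache0/bcache/cache_mode -> "bcache0"
        device = mode_file.parent.parent.name
        modes[device] = current_cache_mode(mode_file.read_text())
    return modes
```

Any device whose entry is not "writearound" still needs switching before going further. The `sysfs_block` parameter exists only so the function can be pointed at a copy of the tree for testing.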
> 
> Next, I ran `btrfs check` on each member disk; eventually one turned out
> to contain the needed disk structures and showed only a few errors.
> 
> I was then able to mount btrfs through that device node; open_ctree
> didn't fail this time. I don't remember whether I used "usebackuproot" for
> mount or a similar switch for "btrfs check".
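The "check every member, then mount the least damaged one" step could look roughly like this. It is only a sketch: it runs `btrfs check --readonly` (never `--repair`) against each member of an unmounted filesystem and ranks the members by a crude count of lines mentioning "error" or "failed". btrfs check has no machine-readable summary and its wording varies between btrfs-progs versions, so the scoring is a heuristic assumption, and `rank_members` and the other names are hypothetical:

```python
import subprocess
from typing import Callable, Iterable


def run_btrfs_check(device: str) -> str:
    # Read-only check; the filesystem must not be mounted.
    proc = subprocess.run(
        ["btrfs", "check", "--readonly", device],
        capture_output=True, text=True, check=False,
    )
    return proc.stdout + proc.stderr


def count_error_lines(output: str) -> int:
    """Heuristic score: lines that mention an error or a failed verification."""
    count = 0
    for line in output.splitlines():
        lowered = line.lower()
        if "no error found" in lowered:
            continue  # the success summary also contains the word "error"
        if "error" in lowered or "failed" in lowered:
            count += 1
    return count


def rank_members(devices: Iterable[str],
                 runner: Callable[[str], str] = run_btrfs_check) -> list[tuple[int, str]]:
    """Return (score, device) pairs, least damaged member first."""
    return sorted((count_error_lines(runner(dev)), dev) for dev in devices)
```

The `runner` parameter lets the ranking logic be exercised without real disks; in practice it would be called with the /dev/mapper paths of the LUKS members.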
> 
> I then ran `btrfs scrub` which fixed the broken metadata. Luckily, I
> had only metadata corruption on the disks which had dirty writeback
> cleared, and metadata runs in RAID-1 mode for me.
> 
> "btrfs check" then didn't find any errors. Reboot worked fine.
Thanks for the suggestion.  Unfortunately, all my disks report basically the 
same errors, so I wasn't able to recover my system this way.

> 
> [...]
> 
> > Is there any hope in recovering this data?  Or should I give up on it at
> > this point and reformat?  Most of the data is backed up (or are backups
> > themselves), but I'd like to get what I can.
> 
> Well, I'm doing daily backups with borg - to a different technology
> (no btrfs, no bcache, different system). I don't think backing up
> btrfs to btrfs is a brilliant idea, especially not when both are
> mounted to the same system.
I'm not quite that redundant, but the backups of things I really care about 
go to an off-site system.  Still, restoring from a backup can be painful 
compared to (hopefully) just getting the data out.  Also, the local backups on 
the system would be nice to have, for historical purposes.

> 
> You may try my steps above. If you've found a member device which
> shows fewer errors, you COULD try to repair it if mount still fails
> (or try one of the recovery mount options). But you may want to ask
> the experts again here.
I did try, thanks.  Unfortunately as noted above it wasn't helpful.

Hopefully someone has a different idea?  I am posting here because I feel any 
further progress will require more dangerous options, and those usually say to 
ask the mailing list first.

> 
> Depending on how much dirty writeback you've lost in bcache, chances
> may be good that one of the members has enough metadata to
> successfully mount or repair the filesystem. Or at least, it's a good
> start for "btrfs restore" then.
> 
> What do we learn from this?
> 
> * probably do not use bcache in writeback mode if you can avoid it
> * switch bcache to writearound mode before kernel upgrades, wait for
> writeback to finish
> * success mounting btrfs may depend a lot on which member device you
> actually mount
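The second lesson (switch to writearound, then wait for writeback to finish) might be scripted per device like this. The sketch assumes bcache's sysfs `cache_mode` and `state` attributes under /sys/block/bcacheN/bcache, and assumes, as the advice above implies, that already-dirty data keeps being flushed after the mode switch until `state` reads `clean`; `switch_to_writearound` is a hypothetical helper name:

```python
import time
from pathlib import Path


def switch_to_writearound(bcache_dir: Path, timeout: float = 3600.0,
                          poll: float = 5.0) -> bool:
    """Put one bcache device in writearound mode and wait for it to drain.

    bcache_dir is the device's sysfs directory, e.g. Path("/sys/block/bcache0/bcache").
    Returns True once `state` reports no dirty data, False on timeout.
    """
    # From now on, writes bypass the cache and go straight to the backing device.
    (bcache_dir / "cache_mode").write_text("writearound\n")
    # Assumption: the writeback thread keeps flushing existing dirty data.
    deadline = time.monotonic() + timeout
    while True:
        state = (bcache_dir / "state").read_text().strip()
        if state in ("clean", "no cache"):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

Only once every member returns True would it be reasonably safe to reboot into the new kernel.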

Thanks,
-- 
Matthew


Thread overview: 8+ messages
2021-11-15  1:52 Help recovering filesystem (if possible) Matthew Dawson
2021-11-15 10:46 ` Kai Krakow
2021-11-18  2:57   ` Matthew Dawson [this message]
2021-11-18 21:09     ` Zygo Blaxell
2021-11-19  4:42       ` Matthew Dawson
2021-11-24  4:43         ` Zygo Blaxell
2021-11-24  5:11           ` Matthew Dawson
  -- strict thread matches above, loose matches on Subject: below --
2021-11-15  1:23 Matthew Dawson
