From: Matthew Dawson <matthew@mjdsystems.ca>
To: Kai Krakow <hurikhan77+btrfs@gmail.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: Help recovering filesystem (if possible)
Date: Wed, 17 Nov 2021 21:57:40 -0500
Message-ID: <3321185.LZWGnKmheA@cwmtaff>
In-Reply-To: <CAMthOuMxvff2d0THhKWCpErQFumrJA9vmNqS6vtBNDwUwf3j-w@mail.gmail.com>

On Monday, November 15, 2021 5:46:43 A.M. EST Kai Krakow wrote:
> On Mon, 15 Nov 2021 at 02:55, Matthew Dawson <matthew@mjdsystems.ca> wrote:
> > I recently upgraded one of my machines to the 5.15.2 kernel.  On the first
> > reboot, I had a kernel fault during the initialization (I didn't get to
> > capture the printed stack trace, but I'm 99% sure it did not have BTRFS
> > related calls).  I then rebooted the machine back to a 5.14 kernel, but
> > the BCache (writeback) cache was corrupted.  I then force started the
> > underlying disks, but now my BTRFS filesystem will no longer mount.  I
> > realize there may be missing/corrupted data, but I would like to ideally
> > get any data I can off the disks.
> 
> I had a similar issue lately where the system didn't reboot cleanly
> (there's some issue in the BIOS or with the SSD firmware where it
> would disconnect the SSD from SATA a few seconds after boot, forcing
> bcache into detaching dirty caches).
> 
> Since you are seeing transaction IDs lagging behind expectations, I
> think you've lost dirty writeback data from bcache. To avoid this in the
> future, you should use bcache only in writearound or writethrough
> mode.
Considering I started the bcache devices without the cache, I have no doubt
I've lost writeback data and that there will be issues.  At this point I'm
just doing data recovery, trying to get what I can.

> 
> > This system involves ten 8TB disks; some are doing BCache -> LUKS -> BTRFS,
> > some are doing LUKS -> BTRFS.
> 
> Not LUKS here, and all my btrfs pool members are attached to a single
> SSD as caching frontend.
> 
> > When I try to mount the filesystem, I get the following in dmesg:
> > [117632.798339] BTRFS info (device dm-0): flagging fs with big metadata feature
> > [117632.798344] BTRFS info (device dm-0): disk space caching is enabled
> > [117632.798346] BTRFS info (device dm-0): has skinny extents
> > [117632.873186] BTRFS error (device dm-0): parent transid verify failed on 132806584614912 wanted 3240123 found 3240119
> 
> I had luck with the following steps:
> 
> * ensure that all members are attached to bcache as they should be
> * ensure bcache is running in writearound mode for each member
> * ensure that btrfs did scan for all members
> 
> Next, I ran `btrfs check` on each member disk; eventually one
> contained the needed disk structures and showed only a few errors.
> 
> I was then able to mount btrfs through that device node, open ctree
> didn't fail this time. I don't remember if I used "usebackuproot" for
> mount or a similar switch for "btrfs check".
> 
> I then ran `btrfs scrub` which fixed the broken metadata. Luckily, I
> had only metadata corruption on the disks which had dirty writeback
> cleared, and metadata runs in RAID-1 mode for me.
> 
> "btrfs check" then didn't find any errors. Reboot worked fine.
Thanks for the suggestion.  Unfortunately, all my disks report basically the 
same errors, so I wasn't able to recover my system this way.
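
For anyone following the same procedure, the per-member check could be
sketched roughly as below.  This is only an illustration under assumptions:
the device paths are made up, the helper names check_commands and
least_damaged are hypothetical, and how an error count is derived from the
`btrfs check` output is left to the caller.  The commands are built but not
executed, and `--readonly` keeps the check from writing to the disks.

```python
def check_commands(devices):
    """Build one read-only `btrfs check` invocation per member device."""
    return [["btrfs", "check", "--readonly", dev] for dev in devices]


def least_damaged(error_counts):
    """Return the device path with the fewest reported errors.

    error_counts maps device path -> error count.  How that count is
    obtained from the tool's output is an assumption left to the caller.
    """
    return min(error_counts, key=error_counts.get)


# Hypothetical member devices, for illustration only.
members = ["/dev/mapper/member0", "/dev/mapper/member1"]
for cmd in check_commands(members):
    print(" ".join(cmd))

# Hypothetical error counts gathered by hand from each run.
print(least_damaged({"/dev/mapper/member0": 12, "/dev/mapper/member1": 3}))
```

The device reported by least_damaged would be the one to try mounting
first.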

> 
> [...]
> 
> > Is there any hope in recovering this data?  Or should I give up on it at
> > this point and reformat?  Most of the data is backed up (or are backups
> > themselves), but I'd like to get what I can.
> 
> Well, I'm doing daily backups with borg - to a different technology
> (no btrfs, no bcache, different system). I don't think backing up
> btrfs to btrfs is a brilliant idea, especially not when both are
> mounted to the same system.
I'm not quite that redundant, but the backups of things I really care about
are actually on an off-site system.  But accessing data through a backup can
be painful compared to hopefully just getting it out.  Also, the local
backups on the system would be nice to have, for historical purposes.

> 
> You may try my steps above. If you've found a member device which
> shows fewer errors, you COULD try to repair it if mount still fails
> (or try one of the recovery mount options). But you may want to ask
> the experts again here.
I did try, thanks.  Unfortunately as noted above it wasn't helpful.

Hopefully someone has a different idea?  I am posting here because I suspect
any further progress will require more dangerous options, and those usually
say to ask the mailing list first.

> 
> Depending on how much dirty writeback you've lost in bcache, chances
> may be good that one of the members has enough metadata to
> successfully mount or repair the filesystem. Or at least, it's a good
> start for "btrfs restore" then.
> 
> What do we learn from this?
> 
> * probably do not use bcache in writeback mode if you can avoid it
> * switch bcache to writearound mode before kernel upgrades, wait for
> writeback to finish
> * success mounting btrfs may depend a lot on which member device you
> actually mount
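
The pre-upgrade check in the second point could be sketched as follows.
This assumes the bcache sysfs layout where
/sys/block/bcache0/bcache/cache_mode lists every mode with the active one in
square brackets, and /sys/block/bcache0/bcache/state reports "clean" once no
dirty data remains.  The helper names and the bcache0 device are
hypothetical, and the sketch only parses text that has already been read
from those files.

```python
def active_cache_mode(cache_mode_text):
    """Return the bracketed entry of a cache_mode line.

    Example: 'writethrough [writeback] writearound none' -> 'writeback'.
    """
    for word in cache_mode_text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


def safe_to_upgrade(cache_mode_text, state_text):
    """True if the device is not in writeback mode and reports a clean state."""
    mode = active_cache_mode(cache_mode_text)
    not_writeback = mode in ("writethrough", "writearound", "none")
    return not_writeback and state_text.strip() == "clean"


# Example contents as they might be read from sysfs (assumed values).
print(active_cache_mode("writethrough [writeback] writearound none"))
print(safe_to_upgrade("writethrough [writeback] writearound none", "dirty\n"))
print(safe_to_upgrade("writethrough writeback [writearound] none", "clean\n"))
```

Switching the mode itself is done by writing the mode name, for example
"writearound", into the same cache_mode file as root, then waiting until
state reads "clean".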

Thanks,
-- 
Matthew


