public inbox for linux-btrfs@vger.kernel.org
* Finding long-term data corruption
@ 2021-11-06 17:24 Alex Lieflander
  2021-11-07  7:28 ` Andrei Borzenkov
From: Alex Lieflander @ 2021-11-06 17:24 UTC (permalink / raw)
  To: linux-btrfs

Hello,

All of my files and data were exposed to an unknown amount of corruption, and I’d like to know how much I can recover and/or whether I can detect the extent of the damage. The steps that led me here are a bit complicated but (I think) relevant to the problem, so I’ve detailed them below.

I use BTRFS for most of my filesystems, and my system recently died. While investigating the issue, I found out that corruption had been detected months earlier (after an unclean shutdown) on one of them. Corruption was detected on another a few weeks later for unknown reasons. The number of detected corruptions continued to grow to about 160 and 30, respectively, before things began to noticeably malfunction.

During this time I’d been `btrfs sub snap -r`-ing and `btrfs send -p`-ing both filesystems to a third BTRFS filesystem as a backup method, with no errors except some warnings about the “capabilities” of particular files being “set multiple times”. I reformatted my backup drive a few weeks ago for unrelated reasons (after corruption had been detected, unbeknownst to me). Since then I have continued to regularly “back up” in this way.
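A rough sketch of the snapshot-and-send routine described above, for readers following along. The paths and snapshot names are hypothetical placeholders and do not come from the original message; the function only prints the two commands for one incremental round rather than running them.

```shell
#!/bin/sh
# Hypothetical layout (assumption, not from the original message).
SRC=${SRC:-/mnt/data}                  # sub-volume being backed up
SNAPDIR=${SNAPDIR:-/mnt/data/.snapshots}  # where read-only snapshots live
DEST=${DEST:-/mnt/backup}              # mount point of the backup filesystem

# Print the commands for one incremental backup round.
# $1 = name of the previous (parent) snapshot, $2 = name of the new snapshot.
backup_plan() {
    parent=$1
    new=$2
    # Take a read-only snapshot of the source sub-volume.
    echo "btrfs subvolume snapshot -r $SRC $SNAPDIR/$new"
    # Send only the difference against the parent snapshot to the backup drive.
    echo "btrfs send -p $SNAPDIR/$parent $SNAPDIR/$new | btrfs receive $DEST"
}
```

Calling `backup_plan 2021-11-01 2021-11-06` prints the snapshot command followed by the send/receive pipeline. The parent snapshot must exist on both the source and the backup filesystem for the incremental send to work.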

Once I noticed the corruption (which `btrfs scrub` couldn’t fix) I tried increasingly aggressive actions until both original filesystems were destroyed and unrecoverable. After that I reformatted and “sent” the corresponding sub-volumes back to their original drives (with the newly reformatted filesystems). Now scrub detects no errors on any of the filesystems, but btrfs-send can’t incrementally send on one of the filesystems. The parent I’m using is the one that I sent from the backup drive. On closer inspection, the received sub-volume has a few subtle permission changes from the sent one. These sub-volumes have always been read-only and I don’t think I ever modified them.

With the situation now described, I have a few questions that I’m hoping to find the answer to:

1. Can corrupt data propagate through sent sub-volumes?

2. Can this corruption damage earlier, intact, sub-volumes?

3. Does sub-volume sending include the checksums? Would a clean scrub report on the receiving filesystem be an actual indication of uncorrupted data?

4. Is there a way that I could detect what data/files are currently corrupted? How so?

5. What might cause a sent sub-volume (with no parent) to differ between two filesystems? Is that a sign of corruption?

6. Is using sub-volumes in the way that I described appropriate for use as a backup solution?
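One generic way to approach question 4, offered as a sketch and not as an answer from the list: read every file in full and note which reads fail. This assumes that Btrfs fails a read with an I/O error when a data block does not match its stored checksum, and that the files were not written with checksumming disabled. The function name is made up for illustration.

```shell
#!/bin/sh
# List regular files under directory $1 that cannot be read in full.
# Assumption: on Btrfs, a data checksum mismatch surfaces as a failed read.
list_unreadable() {
    # -xdev keeps find from crossing into other mounted filesystems.
    find "$1" -xdev -type f -exec sh -c '
        for f in "$@"; do
            # Discard the data; only the success or failure of the read matters.
            cat -- "$f" > /dev/null 2>&1 || printf "%s\n" "$f"
        done
    ' sh {} +
}
```

For example, `list_unreadable /mnt/restored > unreadable.txt` (hypothetical mount point) writes one path per line for each file that failed to read. A pass like this only checks data against the checksums currently on disk, so it presumably cannot flag data that was already wrong when it was written to the new filesystem.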

Thank you for your work on this interesting and extremely useful filesystem, and for reading this far!

Regards,
Alex Lieflander


Thread overview: 5+ messages
2021-11-06 17:24 Finding long-term data corruption Alex Lieflander
2021-11-07  7:28 ` Andrei Borzenkov
2021-11-09 18:37   ` Alex Lieflander
2021-11-13  2:48     ` Fwd: " Alex Lieflander
2021-11-13 23:54       ` Zygo Blaxell
