public inbox for linux-btrfs@vger.kernel.org
From: Andrei Borzenkov <arvidjaar@gmail.com>
To: Alex Lieflander <atlief@icloud.com>, linux-btrfs@vger.kernel.org
Subject: Re: Finding long-term data corruption
Date: Sun, 7 Nov 2021 10:28:39 +0300
Message-ID: <b341cf51-f747-71b9-e762-89bf6dbb7be2@gmail.com>
In-Reply-To: <C85EE7D2-FC47-4A0E-B7A8-9285CF46C3FC@icloud.com>

On 06.11.2021 20:24, Alex Lieflander wrote:
> Hello,
> 
> All of my files and data were exposed to an unknown amount of corruption, and I’d like to know how much I can recover and/or whether I can detect the extent of the damage. The steps that led me here are a bit complicated but (I think) relevant to the problem, so I’ve detailed them below.
> 
> I use BTRFS for most of my filesystems, and my system recently died. While investigating the issue, I found out that corruption had been detected months earlier (after an unclean shutdown) on one of them. Corruption was detected on another a few weeks later for unknown reasons. The number of detected corruptions continued to grow to about 160 and 30, respectively, before things began to noticeably malfunction.
> 
> During this time I’d been `btrfs sub snap -r`-ing and `btrfs send -p`-ing both to the third BTRFS filesystem as a backup method, with no errors except some warnings about the “capabilities” of particular files being “set multiple times”. I reformatted my backup drive a few weeks ago for unrelated reasons (after corruption was detected, unbeknownst to me). Since then I continued to regularly “backup” in this way.
> 
> Once I noticed the corruption (that `btrfs scrub` couldn’t fix) I tried increasingly aggressive actions until both original filesystems were destroyed and unrecoverable. After that I reformatted and “sent” the corresponding sub-volumes back to their original drives (with the newly reformatted filesystems). Now scrub detects no errors on any of the filesystems, but btrfs-send can’t incrementally send on one of the filesystems. The parent I’m using is the one that I sent from the backup drive. On closer inspection, the received sub-volume has a few subtle permission changes from the sent one. These sub-volumes have always been read-only and I don’t think I ever modified them.
> 

That is most likely the result of a stale received UUID on the source side.

https://lore.kernel.org/linux-btrfs/CAL3q7H5y6z7rRu-ZsZe_WXeHteWx1edZi+L3-UL0aa0oYg+qQA@mail.gmail.com/
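
One way to look for this is to read the "Received UUID" and "Flags"
fields that `btrfs subvolume show` prints for each subvolume. The sketch
below is only an illustration: the helper names are made up, and it
assumes the usual "Key: value" layout of that command's output. A
writable subvolume that still carries a received UUID is one case that
may deserve a closer look.

```python
import subprocess


def parse_subvolume_show(text):
    """Collect the 'Key: value' lines of `btrfs subvolume show` into a dict.

    Assumption: fields are printed one per line as 'Key: value'.
    Lines without a colon (such as the leading path) are skipped.
    """
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key and key not in info:
            info[key] = value.strip()
    return info


def has_received_uuid(info):
    # An unset received UUID is assumed to be shown as "-".
    return info.get("Received UUID", "-") not in ("", "-")


def is_read_only(info):
    return "readonly" in info.get("Flags", "")


def show_subvolume(path):
    # Hypothetical convenience wrapper; needs root and btrfs-progs.
    result = subprocess.run(
        ["btrfs", "subvolume", "show", path],
        check=True, capture_output=True, text=True,
    )
    return parse_subvolume_show(result.stdout)
```

For example, `info = show_subvolume("/mnt/snapshots/root")` (a
hypothetical path) followed by
`has_received_uuid(info) and not is_read_only(info)` would flag a
writable subvolume that still has a received UUID.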

> With the situation now described, I have a few questions that I’m hoping to find the answer to:
> 
> 1. Can corrupt data propagate through sent sub-volumes?
> 

You did not really explain what kind of corruption it was or how you
detected it in the first place. If you mean corruption detected by
scrub, it should not propagate: btrfs should either have used the good
copy (with a redundant profile) or failed the btrfs send (if the data
was unreadable).

> 2. Can this corruption damage earlier, intact, sub-volumes?
> 

Again - what corruption? Physical media errors may happen anywhere.
With RAID5 or RAID6 profiles, errors may affect unrelated data under
some conditions.

> 3. Does sub-volume sending include the checksums? Would a clean scrub report on the receiving filesystem be an actual indication of uncorrupted data?

As far as I know, the send stream does not include any checksums. btrfs
receive is logical: it creates and writes files from user space, so
scrub results on the receive side have no relation to the content or
state of the filesystem on the send side.

> 
> 4. Is there a way that I could detect what data/files are currently corrupted? How so?
> 

For the third time - explain what kind of corruption you are talking
about. If the corruption cannot be detected by btrfs, you need to use
data- or application-specific methods to verify data integrity.
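
As one generic example of such a method, a checksum manifest taken
while the data is known to be good can be compared against a later one.
The sketch below is only an illustration: the function names are made
up, SHA-256 is an arbitrary choice, and it only shows that a file
changed, not whether the change was intended.

```python
import hashlib
import os


def file_sha256(path, chunk_size=1 << 20):
    """Hash one file in chunks so large files do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each regular file under root (by relative path) to its SHA-256."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks and special files; only regular files are hashed.
            if os.path.isfile(full) and not os.path.islink(full):
                manifest[os.path.relpath(full, root)] = file_sha256(full)
    return manifest


def changed_files(old, new):
    """Paths present in both manifests whose checksum differs."""
    return sorted(p for p in old if p in new and old[p] != new[p])
```

Running `build_manifest()` on a snapshot before sending it and again on
the received copy gives an end-to-end comparison that does not depend
on scrub results on either side.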

> 5. What might cause a sent sub-volume (with no parent) to differ between two filesystems? Is that a sign of corruption?
> 

You need to show much more detail before this can be answered. A full
send is expected to produce the same content. If you have evidence that
this is not the case, provide logs, command output, or any other facts
that show how you determined that, starting with the actual send/receive
invocations you were using.
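
To illustrate the kind of evidence that would help, the sketch below
walks two mounted subvolumes and lists paths whose type, permission
bits, ownership or size differ. It is only an illustration with made-up
function names; it does not compare file contents, extended attributes
or capabilities.

```python
import os
import stat


def describe(path):
    """Return (file type, permission bits, uid, gid, size) without following symlinks."""
    st = os.lstat(path)
    size = st.st_size if stat.S_ISREG(st.st_mode) else None
    return (stat.S_IFMT(st.st_mode), stat.S_IMODE(st.st_mode),
            st.st_uid, st.st_gid, size)


def walk_metadata(root):
    """Map every entry below root (by relative path) to its metadata tuple."""
    entries = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            entries[os.path.relpath(full, root)] = describe(full)
    return entries


def metadata_differences(left_root, right_root):
    """Report paths that exist on one side only or whose metadata differs."""
    left = walk_metadata(left_root)
    right = walk_metadata(right_root)
    return {
        "only_left": sorted(set(left) - set(right)),
        "only_right": sorted(set(right) - set(left)),
        "changed": sorted(p for p in left.keys() & right.keys()
                          if left[p] != right[p]),
    }
```

The output of something like this, together with the exact send and
receive commands, is much easier to reason about than a description of
"subtle permission changes".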

> 6. Is using sub-volumes in the way that I described appropriate for use as a backup solution?
> 

You did not really describe much. If you refer to

"I’d been `btrfs sub snap -r`-ing and `btrfs send -p`-ing both to the
third BTRFS filesystem as a backup method"

yes, it is. Do not forget the received UUID pitfall, and never ever
(and I mean really *EVER*) change any subvolume from read-only to
read-write as part of a restore from backup. Always create a clone
(writable subvolume) of the read-only snapshot and use it as the
recovered content.
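
A minimal sketch of that restore step, assuming the received read-only
snapshot and the restore target live on the same filesystem. The paths
and function names are hypothetical; the only btrfs command used is
`btrfs subvolume snapshot <source> <dest>`, which creates a writable
snapshot when -r is not given.

```python
import subprocess


def restore_command(readonly_snapshot, restore_target):
    """Build the command that clones a read-only snapshot into a writable one."""
    # No -r flag here: the new snapshot is writable, the source stays read-only.
    return ["btrfs", "subvolume", "snapshot", readonly_snapshot, restore_target]


def restore(readonly_snapshot, restore_target, dry_run=True):
    """Print the command by default; run it only when dry_run is False."""
    command = restore_command(readonly_snapshot, restore_target)
    if dry_run:
        print(" ".join(command))
    else:
        # Needs root and btrfs-progs.
        subprocess.run(command, check=True)
    return command
```

For example, `restore("/mnt/pool/backup/root.ro", "/mnt/pool/root")`
prints the command without touching anything. The received snapshot
itself is left read-only, so it can still serve as a parent for later
incremental sends.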


> Thank you for your work on this interesting and extremely useful filesystem, and for reading this far!
> 
> Regards,
> Alex Lieflander
> 


Thread overview: 5+ messages
2021-11-06 17:24 Finding long-term data corruption Alex Lieflander
2021-11-07  7:28 ` Andrei Borzenkov [this message]
2021-11-09 18:37   ` Alex Lieflander
2021-11-13  2:48     ` Fwd: " Alex Lieflander
2021-11-13 23:54       ` Zygo Blaxell
