Linux Btrfs filesystem development
From: "Agustín DallʼAlba" <agustin@dallalba.com.ar>
To: unlisted-recipients:; (no To-header on input)
Cc: linux-btrfs@vger.kernel.org
Subject: Re: raid10 corruption while removing failing disk
Date: Mon, 31 Aug 2020 17:05:18 -0300
Message-ID: <ae92575c87858511b17a15734c0ebdba01eb0840.camel@dallalba.com.ar>
In-Reply-To: <CAJCQCtSdJVw5o2hJ3OyE6-nvM2xpx=nRHLVNSgf9ydD2O--vMQ@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 3765 bytes --]

[Resent because the message was too long for the list]

On Tue, 2020-08-11 at 13:17 -0600, Chris Murphy wrote:
> > > My advice is to mount ro, backup (or two copies for important info),
> > > and start with a new Btrfs file system and restore. It's not worth
> > > repairing.
> > Sigh, I was expecting I'd have to do this. At least no data was lost,
> > and the system still functions even though it's read-only. Do you think
> > check --repair is not worth trying? Everything of value is already
> > backed up, but restoring it would take many hours of work.
> 
> Metadata, RAID10: total=9.00GiB, used=7.57GiB
> 
> Ballpark 8 hours for --repair given metadata size and spinning drives.
> It'll add some time adding --init-extent-tree which... is decently
> likely to be needed here. So the gotcha is, see if --repair works, and
> it fixes some stuff but still needs extent tree repaired anyway. Now
> you have to do that and it could be another 8 hours. Or do you go with
> the heavy hammer right away to save time and do both at once? But the
> heavy hammer is riskier.
> 
> Whether repair or start over, you need to have the backup plus 2x for
> important stuff. To do the repair you need to be prepared for the
> possibility things get worse. I'll argue strongly that it's a bug if
> things get worse (i.e. now you can't mount ro at all) but as a risk
> assessment, it has to be considered.

So, I've finally managed to get someone to add a disk to this system
and ran a btrfs check --repair. It failed almost immediately with:

Starting repair.
Opening filesystem to check...
Checking filesystem on /dev/disk/by-label/Susanita
UUID: 4d3acf20-d408-49ab-b0a6-182396a9f27c
[1/7] checking root items
checksum verify failed on 10919566688256 found 0000006E wanted 00000066
checksum verify failed on 10919566688256 found 0000006E wanted 00000066
bad tree block 10919566688256, bytenr mismatch, want=10919566688256, have=17196831625821864417
ERROR: failed to repair root items: Input/output error

so I ran btrfs check --init-extent-tree, and it's still running after
24 hours. It seems to have processed 2 GiB of... something:

[2/7] checking extents                         (0:04:22 elapsed, 434185 items checked)
ref mismatch on [331916251136 4096] extent item 0, found 1
data backref 331916251136 parent 10915911958528 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 331916251136 parent 10915911958528 owner 0 offset 0 found 1 wanted 0 back 0x557cdf7560f0
backpointer mismatch on [331916251136 4096]
adding new data backref on 331916251136 parent 10915911958528 owner 0 offset 0 found 1
Repaired extent references for 331916251136

[24 hours later]

[2/7] checking extents                         (23:47:26 elapsed, 434185 items checked)
ref mismatch on [334605303808 188416] extent item 0, found 2
data backref 334605303808 parent 10915986505728 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 334605303808 parent 10915986505728 owner 0 offset 0 found 1 wanted 0 back 0x557ce0ac16c0
data backref 334605303808 root 10455 owner 219090 offset 921600 num_refs 0 not found in extent tree
incorrect local backref count on 334605303808 root 10455 owner 219090 offset 921600 found 1 wanted 0 back 0x557d14faebc0
backpointer mismatch on [334605303808 188416]
adding new data backref on 334605303808 parent 10915986505728 owner 0 offset 0 found 1
adding new data backref on 334605303808 root 10455 owner 219090 offset 921600 found 1
Repaired extent references for 334605303808

But now I've got no idea whether it's doing something useful or whether
I'd better ^C it and give up on this filesystem. I've attached the log
of the ongoing repair and of a read-only check I ran immediately before.
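
For anyone wanting to gauge progress from logs like these, here is a
minimal Python sketch. It assumes the line formats are exactly those
shown in the excerpts above; the function names parse_progress and
repaired_extents are hypothetical helpers, not part of btrfs-progs.

```python
import re

# Assumed format, taken from the excerpts in this message:
# [2/7] checking extents   (23:47:26 elapsed, 434185 items checked)
PROGRESS_RE = re.compile(
    r"\[(\d+)/(\d+)\] (.+?)\s+\((\d+):(\d{2}):(\d{2}) elapsed, (\d+) items checked\)"
)

REPAIRED_PREFIX = "Repaired extent references for "


def parse_progress(line):
    """Return (step, description, elapsed_seconds, items) or None."""
    m = PROGRESS_RE.search(line)
    if m is None:
        return None
    step, _total, desc, hours, minutes, seconds, items = m.groups()
    elapsed = int(hours) * 3600 + int(minutes) * 60 + int(seconds)
    return int(step), desc, elapsed, int(items)


def repaired_extents(lines):
    """Collect the extent addresses from 'Repaired extent references' lines."""
    result = []
    for line in lines:
        line = line.strip()
        if line.startswith(REPAIRED_PREFIX):
            result.append(int(line[len(REPAIRED_PREFIX):]))
    return result
```

Comparing the first and last progress lines shows whether the item
counter moved, and the difference between the first and last repaired
extent addresses gives a rough idea of the range covered. On the two
excerpts above the item count is 434185 in both, while the repaired
extent addresses are about 2.5 GiB apart.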

Cheers.

[-- Attachment #2: btrfs-check-3.xz --]
[-- Type: application/x-xz, Size: 23472 bytes --]

[-- Attachment #3: btrfs-init-extent-tree-3-truncated.xz --]
[-- Type: application/x-xz, Size: 40036 bytes --]
