From: "Agustín DallʼAlba" <agustin@dallalba.com.ar>
To: unlisted-recipients:; (no To-header on input)
Cc: linux-btrfs@vger.kernel.org
Subject: Re: raid10 corruption while removing failing disk
Date: Mon, 31 Aug 2020 17:05:18 -0300
Message-ID: <ae92575c87858511b17a15734c0ebdba01eb0840.camel@dallalba.com.ar>
In-Reply-To: <CAJCQCtSdJVw5o2hJ3OyE6-nvM2xpx=nRHLVNSgf9ydD2O--vMQ@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 3765 bytes --]
[Resent because the message was too long for the list]
On Tue, 2020-08-11 at 13:17 -0600, Chris Murphy wrote:
> > > My advice is to mount ro, backup (or two copies for important info),
> > > and start with a new Btrfs file system and restore. It's not worth
> > > repairing.
> > Sigh, I was expecting I'd have to do this. At least no data was lost,
> > and the system still functions even though it's read-only. Do you think
> > check --repair is not worth trying? Everything of value is already
> > backed up, but restoring it would take many hours of work.
>
> Metadata, RAID10: total=9.00GiB, used=7.57GiB
>
> Ballpark 8 hours for --repair given metadata size and spinning drives.
> It'll add some time adding --init-extent-tree which... is decently
> likely to be needed here. So the gotcha is, see if --repair works, and
> it fixes some stuff but still needs extent tree repaired anyway. Now
> you have to do that and it could be another 8 hours. Or do you go with
> the heavy hammer right away to save time and do both at once? But the
> heavy hammer is riskier.
>
> Whether repair or start over, you need to have the backup plus 2x for
> important stuff. To do the repair you need to be prepared for the
> possibility things get worse. I'll argue strongly that it's a bug if
> things get worse (i.e. now you can't mount ro at all) but as a risk
> assessment, it has to be considered.
So, I finally managed to get someone to add a disk to this system,
and I ran btrfs check --repair. It failed almost immediately with:
Starting repair.
Opening filesystem to check...
Checking filesystem on /dev/disk/by-label/Susanita
UUID: 4d3acf20-d408-49ab-b0a6-182396a9f27c
[1/7] checking root items
checksum verify failed on 10919566688256 found 0000006E wanted 00000066
checksum verify failed on 10919566688256 found 0000006E wanted 00000066
bad tree block 10919566688256, bytenr mismatch, want=10919566688256, have=17196831625821864417
ERROR: failed to repair root items: Input/output error
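For anyone reading along, the two complaints above come from two separate checks on one metadata block: the stored checksum does not match the block contents, and the block's own header records a different logical address (bytenr) than the one it was read from. Below is a minimal Python sketch of those two checks, assuming the default crc32c checksum type, little-endian header fields laid out as in struct btrfs_header (csum[32], fsid[16], then bytenr), and a block already read into memory. Mapping the logical address 10919566688256 to a device offset is not shown, and `verify_tree_block` is a hypothetical helper, not a btrfs-progs function.

```python
import struct

# Assumed struct btrfs_header layout: csum[32], fsid[16], bytenr (le64), ...
CSUM_SIZE = 32                # checksum area at the start of every tree block
BYTENR_OFFSET = CSUM_SIZE + 16  # bytenr follows csum[32] and fsid[16]


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def verify_tree_block(block: bytes, expected_bytenr: int) -> list:
    """Hypothetical helper: return a list of problems found in one tree block."""
    problems = []
    # crc32c is stored little-endian in the first 4 bytes of csum[32]
    # and covers everything after the checksum area.
    wanted = struct.unpack_from("<I", block, 0)[0]
    found = crc32c(block[CSUM_SIZE:])
    if found != wanted:
        problems.append(f"checksum verify failed: found {found:08X} wanted {wanted:08X}")
    # The header records the logical address the block was written to.
    have = struct.unpack_from("<Q", block, BYTENR_OFFSET)[0]
    if have != expected_bytenr:
        problems.append(f"bytenr mismatch, want={expected_bytenr}, have={have}")
    return problems
```

A block that fails both checks, as this one did, is consistent with the read returning data that was never a valid copy of that tree block, though the sketch alone cannot say why.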
So I ran btrfs check --init-extent-tree, and it's still running after
24 hours. It seems to have processed 2 GiB of... something:
[2/7] checking extents (0:04:22 elapsed, 434185 items checked)
ref mismatch on [331916251136 4096] extent item 0, found 1
data backref 331916251136 parent 10915911958528 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 331916251136 parent 10915911958528 owner 0 offset 0 found 1 wanted 0 back 0x557cdf7560f0
backpointer mismatch on [331916251136 4096]
adding new data backref on 331916251136 parent 10915911958528 owner 0 offset 0 found 1
Repaired extent references for 331916251136
[24 hours later]
[2/7] checking extents (23:47:26 elapsed, 434185 items checked)
ref mismatch on [334605303808 188416] extent item 0, found 2
data backref 334605303808 parent 10915986505728 owner 0 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 334605303808 parent 10915986505728 owner 0 offset 0 found 1 wanted 0 back 0x557ce0ac16c0
data backref 334605303808 root 10455 owner 219090 offset 921600 num_refs 0 not found in extent tree
incorrect local backref count on 334605303808 root 10455 owner 219090 offset 921600 found 1 wanted 0 back 0x557d14faebc0
backpointer mismatch on [334605303808 188416]
adding new data backref on 334605303808 parent 10915986505728 owner 0 offset 0 found 1
adding new data backref on 334605303808 root 10455 owner 219090 offset 921600 found 1
Repaired extent references for 334605303808
But now I have no idea whether it's doing something useful or whether I'd
better ^C it and give up on this filesystem. I've attached logs of the ongoing repair and of a read-only check I ran immediately before it.
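One rough way to judge whether a long-running check is still making progress is to summarize its log: the last progress line (elapsed time and items checked) and how many "Repaired extent references" lines have appeared so far. The sketch below assumes only the two line formats visible in the excerpts above; it scans every match on a line in case a captured log contains several carriage-return-separated progress updates per line. `summarize_check_log` is a hypothetical helper, not part of btrfs-progs.

```python
import re

# Formats taken from the btrfs check output quoted in this message.
PROGRESS_RE = re.compile(
    r"\[\d/7\] [^(]*\((\d+):(\d\d):(\d\d) elapsed, (\d+) items checked\)"
)
REPAIRED_RE = re.compile(r"Repaired extent references for (\d+)")


def summarize_check_log(lines):
    """Hypothetical helper: summarize progress and extent repairs in a check log."""
    last_progress = None  # (elapsed seconds, items checked) from the latest progress line
    repairs = 0
    last_repaired = None  # bytenr from the most recent repair line
    for line in lines:
        # Take every progress match on the line; the last one wins.
        for m in PROGRESS_RE.finditer(line):
            h, mi, s, items = (int(g) for g in m.groups())
            last_progress = (h * 3600 + mi * 60 + s, items)
        for m in REPAIRED_RE.finditer(line):
            repairs += 1
            last_repaired = int(m.group(1))
    return {
        "last_progress": last_progress,
        "repairs": repairs,
        "last_repaired": last_repaired,
    }
```

For the compressed attachment, something like `with lzma.open("btrfs-init-extent-tree-3-truncated.xz", "rt") as f: print(summarize_check_log(f))` would do. In the two excerpts above, the items-checked counter stays at 434185 between 0:04:22 and 23:47:26 while repairs keep being reported, which is exactly the pattern such a summary would surface.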
Cheers.
[-- Attachment #2: btrfs-check-3.xz --]
[-- Type: application/x-xz, Size: 23472 bytes --]
[-- Attachment #3: btrfs-init-extent-tree-3-truncated.xz --]
[-- Type: application/x-xz, Size: 40036 bytes --]