corrupt leaf, unexpected item end, unmountable

linux-btrfs.vger.kernel.org archive mirror

* corrupt leaf, unexpected item end, unmountable
@ 2021-02-18  2:41 Daniel Dawson
  2021-02-18 23:57 ` Chris Murphy
  0 siblings, 1 reply; 6+ messages in thread
From: Daniel Dawson @ 2021-02-18  2:41 UTC
  To: linux-btrfs

I was attempting to replace the drives in an array with RAID6 profile.
The first replacement was seemingly successful (and there was a scrub
afterward, with no errors). However, about 0.6% into the second
replacement (sdc), something went wrong, and it went read-only (I should
have copied the log of that somehow). Now it refuses to mount, and a
(readonly) check cannot get started.

# mount -o ro,degraded /dev/sda3 /mnt
mount: /mnt: can't read superblock on /dev/sda3.
# btrfs rescue super-recover /dev/sda3
All supers are valid, no need to recover

For this, dmesg shows:

[  202.675384] BTRFS info (device sdc3): allowing degraded mounts
[  202.675387] BTRFS info (device sdc3): disk space caching is enabled
[  202.675389] BTRFS info (device sdc3): has skinny extents
[  202.676302] BTRFS warning (device sdc3): devid 3 uuid
911a642e-0a4c-4483-9a1f-cde7b87c5519 is missing
[  202.676601] BTRFS warning (device sdc3): devid 3 uuid
911a642e-0a4c-4483-9a1f-cde7b87c5519 is missing
[  202.985528] BTRFS info (device sdc3): bdev /dev/sdb3 errs: wr 0, rd
0, flush 0, corrupt 26, gen 0
[  202.985533] BTRFS info (device sdc3): bdev /dev/sdd3 errs: wr 0, rd
0, flush 0, corrupt 98, gen 0
[  203.278131] BTRFS info (device sdc3): start tree-log replay
[  203.454496] BTRFS critical (device sdc3): corrupt leaf: root=7
block=371567214592 slot=0, unexpected item end, have 16315 expect 16283
[  203.454499] BTRFS error (device sdc3): block=371567214592 read time
tree block corruption detected
[  203.454634] BTRFS critical (device sdc3): corrupt leaf: root=7
block=371567214592 slot=0, unexpected item end, have 16315 expect 16283
[  203.454636] BTRFS error (device sdc3): block=371567214592 read time
tree block corruption detected
[  203.455794] BTRFS critical (device sdc3): corrupt leaf: root=7
block=371567214592 slot=0, unexpected item end, have 16315 expect 16283
[  203.455796] BTRFS error (device sdc3): block=371567214592 read time
tree block corruption detected
[  203.455820] BTRFS: error (device sdc3) in __btrfs_free_extent:3105:
errno=-5 IO failure
[  203.455823] BTRFS: error (device sdc3) in
btrfs_run_delayed_refs:2208: errno=-5 IO failure
[  203.455833] BTRFS: error (device sdc3) in btrfs_replay_log:2287:
errno=-5 IO failure (Failed to recover log tree)
[  203.747758] BTRFS error (device sdc3): open_ctree failed

I've looked for, but can't find, any bad blocks on the devices. Also, if
it adds any info...

# btrfs check --readonly /dev/sda3
Opening filesystem to check...
warning, device 3 is missing
checksum verify failed on 371587727360 found 000000FF wanted 00000049
checksum verify failed on 371587727360 found 00000005 wanted 00000010
checksum verify failed on 371587727360 found 00000005 wanted 00000010
bad tree block 371587727360, bytenr mismatch, want=371587727360,
have=1076190010624
ERROR: could not setup extent tree
ERROR: cannot open file system

Note: I'm running this off of System Rescue 7.01, which has earlier
versions of things than what the machine in question has installed (the
latter being Linux 5.10.16, with btrfs-progs v5.10.1).

# uname -a
Linux sysrescue 5.4.78-1-lts #1 SMP Wed, 18 Nov 2020 19:51:49 +0000
x86_64 GNU/Linux
# btrfs --version
btrfs-progs v5.4.1
# btrfs filesystem show
Label: 'vroot2020'  uuid: 5214d903-783a-4d14-ac78-046da5ac1db7
        Total devices 4 FS bytes used 65.98GiB
        devid    0 size 457.64GiB used 39.53GiB path /dev/sdc3
        devid    1 size 457.64GiB used 39.56GiB path /dev/sda3
        devid    2 size 457.64GiB used 39.56GiB path /dev/sdb3
        devid    4 size 457.64GiB used 39.53GiB path /dev/sdd3

-- 
Skype: fllthdcrb   PGP public key: 0xF7B4422A
PGP fingerprint: 5BBD 5080 FEB0 EF7F 142F  8173 D572 B791 F7B4 422A

* Re: corrupt leaf, unexpected item end, unmountable
  2021-02-18  2:41 corrupt leaf, unexpected item end, unmountable Daniel Dawson
@ 2021-02-18 23:57 ` Chris Murphy
  2021-02-19  1:10   ` Daniel Dawson
  0 siblings, 1 reply; 6+ messages in thread
From: Chris Murphy @ 2021-02-18 23:57 UTC
  To: Daniel Dawson; +Cc: Btrfs BTRFS

On Wed, Feb 17, 2021 at 7:43 PM Daniel Dawson <danielcdawson@gmail.com> wrote:
>
> I was attempting to replace the drives in an array with RAID6 profile.

metadata raid6 as well?

What replacement command(s) are you using?


> The first replacement was seemingly successful (and there was a scrub
> afterward, with no errors). However, about 0.6% into the second
> replacement (sdc), something went wrong, and it went read-only (I should
> have copied the log of that somehow). Now it refuses to mount, and a
> (readonly) check cannot get started.
>
>
> # mount -o ro,degraded /dev/sda3 /mnt
> mount: /mnt: can't read superblock on /dev/sda3.
> # btrfs rescue super-recover /dev/sda3
> All supers are valid, no need to recover
>
>
> For this, dmesg shows:
>
> [  202.675384] BTRFS info (device sdc3): allowing degraded mounts
> [  202.675387] BTRFS info (device sdc3): disk space caching is enabled
> [  202.675389] BTRFS info (device sdc3): has skinny extents
> [  202.676302] BTRFS warning (device sdc3): devid 3 uuid
> 911a642e-0a4c-4483-9a1f-cde7b87c5519 is missing
> [  202.676601] BTRFS warning (device sdc3): devid 3 uuid
> 911a642e-0a4c-4483-9a1f-cde7b87c5519 is missing

What device is devid 3?


> [  202.985528] BTRFS info (device sdc3): bdev /dev/sdb3 errs: wr 0, rd
> 0, flush 0, corrupt 26, gen 0
> [  202.985533] BTRFS info (device sdc3): bdev /dev/sdd3 errs: wr 0, rd
> 0, flush 0, corrupt 98, gen 0
> [  203.278131] BTRFS info (device sdc3): start tree-log replay
> [  203.454496] BTRFS critical (device sdc3): corrupt leaf: root=7
> block=371567214592 slot=0, unexpected item end, have 16315 expect 16283
> [  203.454499] BTRFS error (device sdc3): block=371567214592 read time
> tree block corruption detected
> [  203.454634] BTRFS critical (device sdc3): corrupt leaf: root=7
> block=371567214592 slot=0, unexpected item end, have 16315 expect 16283
> [  203.454636] BTRFS error (device sdc3): block=371567214592 read time
> tree block corruption detected
> [  203.455794] BTRFS critical (device sdc3): corrupt leaf: root=7
> block=371567214592 slot=0, unexpected item end, have 16315 expect 16283

16315=0x3fbb, 16283=0x3f9b, 16315^16283 = 32 or 0x20

11111110111011
11111110011011
        ^

Do a RAM test for as long as you can tolerate it, or it finds the
defect. Sometimes they show up quickly, other times days.
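
The single-bit difference shown above can be checked mechanically. A minimal Python sketch, assuming only the two values from the "have 16315 expect 16283" log line; the function name is illustrative and not part of any btrfs tool:

```python
def single_bit_flip(have: int, expect: int):
    """Return the 0-based index of the only differing bit, or None."""
    diff = have ^ expect
    # A value with exactly one bit set satisfies diff & (diff - 1) == 0.
    if diff == 0 or diff & (diff - 1):
        return None
    return diff.bit_length() - 1


# Values taken from the "unexpected item end" message in dmesg.
bit = single_bit_flip(16315, 16283)
print(f"xor=0x{16315 ^ 16283:x} bit={bit}")
```

A single differing bit is suggestive of a memory fault, not proof of one, which is why the RAM test is still needed.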


> [  203.455796] BTRFS error (device sdc3): block=371567214592 read time
> tree block corruption detected
> [  203.455820] BTRFS: error (device sdc3) in __btrfs_free_extent:3105:
> errno=-5 IO failure
> [  203.455823] BTRFS: error (device sdc3) in
> btrfs_run_delayed_refs:2208: errno=-5 IO failure
> [  203.455833] BTRFS: error (device sdc3) in btrfs_replay_log:2287:
> errno=-5 IO failure (Failed to recover log tree)
> [  203.747758] BTRFS error (device sdc3): open_ctree failed
>
>
> I've looked for, but can't find, any bad blocks on the devices. Also, if
> it adds any info...
>
> # btrfs check --readonly /dev/sda3
> Opening filesystem to check...
> warning, device 3 is missing
> checksum verify failed on 371587727360 found 000000FF wanted 00000049
> checksum verify failed on 371587727360 found 00000005 wanted 00000010
> checksum verify failed on 371587727360 found 00000005 wanted 00000010
> bad tree block 371587727360, bytenr mismatch, want=371587727360,
> have=1076190010624
> ERROR: could not setup extent tree
> ERROR: cannot open file system
>
>
> Note: I'm running this off of System Rescue 7.01, which has earlier
> versions of things than what the machine in question has installed (the
> latter being Linux 5.10.16, with btrfs-progs v5.10.1).
>
> # uname -a
> Linux sysrescue 5.4.78-1-lts #1 SMP Wed, 18 Nov 2020 19:51:49 +0000
> x86_64 GNU/Linux
> # btrfs --version
> btrfs-progs v5.4.1
> # btrfs filesystem show
> Label: 'vroot2020'  uuid: 5214d903-783a-4d14-ac78-046da5ac1db7
>         Total devices 4 FS bytes used 65.98GiB
>         devid    0 size 457.64GiB used 39.53GiB path /dev/sdc3
>         devid    1 size 457.64GiB used 39.56GiB path /dev/sda3
>         devid    2 size 457.64GiB used 39.56GiB path /dev/sdb3
>         devid    4 size 457.64GiB used 39.53GiB path /dev/sdd3


This is confusing. devid 3 is claimed to be missing, but fi show isn't
showing any missing devices. If none of sd[abcd] are devid 3, then
what dev node is devid 3 and where is it?

But yeah you're probably best off not trying to fix this file system
until the memory is sorted out.


-- 
Chris Murphy

* Re: corrupt leaf, unexpected item end, unmountable
  2021-02-18 23:57 ` Chris Murphy
@ 2021-02-19  1:10   ` Daniel Dawson
  2021-02-19  5:03     ` Chris Murphy
  0 siblings, 1 reply; 6+ messages in thread
From: Daniel Dawson @ 2021-02-19  1:10 UTC
  To: Btrfs BTRFS

On 2/18/21 3:57 PM, Chris Murphy wrote:
> metadata raid6 as well?

Yes.

> What replacement command(s) are you using?

For this drive, it was "btrfs replace start -r 3 /dev/sda3 /"

> What device is devid 3?

It would normally be sdc3. I'll address the confusion below.

> 16315=0x3fbb, 16283=0x3f9b, 16315^16283 = 32 or 0x20
> 11111110111011
> 11111110011011
>         ^
>
> Do a RAM test for as long as you can tolerate it, or it finds the
> defect. Sometimes they show up quickly, other times days.

I didn't think of a flipped bit. Thanks.

>>         devid    0 size 457.64GiB used 39.53GiB path /dev/sdc3
>>         devid    1 size 457.64GiB used 39.56GiB path /dev/sda3
>>         devid    2 size 457.64GiB used 39.56GiB path /dev/sdb3
>>         devid    4 size 457.64GiB used 39.53GiB path /dev/sdd3
>
> This is confusing. devid 3 is claimed to be missing, but fi show isn't
> showing any missing devices. If none of sd[abcd] are devid 3, then
> what dev node is devid 3 and where is it?

It looks to me like btrfs is temporarily assigning devid 0 to the new
device being used as a replacement. That is what I observed before; once
the replace operation was complete, it went back to the normal number.
Since the replacement didn't finish this time, sdc3 is still devid 0.

> But yeah you're probably best off not trying to fix this file system
> until the memory is sorted out.

Right. I'll get on that soon and see if anything pops up. Thanks for the
help so far.


* Re: corrupt leaf, unexpected item end, unmountable
  2021-02-19  1:10   ` Daniel Dawson
@ 2021-02-19  5:03     ` Chris Murphy
  2021-02-19 15:02       ` Daniel Dawson
  2021-02-19 15:03       ` Daniel Dawson
  0 siblings, 2 replies; 6+ messages in thread
From: Chris Murphy @ 2021-02-19  5:03 UTC
  To: Daniel Dawson; +Cc: Btrfs BTRFS

On Thu, Feb 18, 2021 at 6:12 PM Daniel Dawson <danielcdawson@gmail.com> wrote:
>
> On 2/18/21 3:57 PM, Chris Murphy wrote:
> > metadata raid6 as well?
>
> Yes.

Once everything else is figured out, you should consider converting
metadata to raid1c3.

https://lore.kernel.org/linux-btrfs/20200627032414.GX10769@hungrycats.org/

> > What replacement command(s) are you using?
>
> For this drive, it was "btrfs replace start -r 3 /dev/sda3 /"

OK replace is good.

> > Do a RAM test for as long as you can tolerate it, or it finds the
> > defect. Sometimes they show up quickly, other times days.
> I didn't think of a flipped bit. Thanks.
> >>         devid    0 size 457.64GiB used 39.53GiB path /dev/sdc3
> >>         devid    1 size 457.64GiB used 39.56GiB path /dev/sda3
> >>         devid    2 size 457.64GiB used 39.56GiB path /dev/sdb3
> >>         devid    4 size 457.64GiB used 39.53GiB path /dev/sdd3
> >
> > This is confusing. devid 3 is claimed to be missing, but fi show isn't
> > showing any missing devices. If none of sd[abcd] are devid 3, then
> > what dev node is devid 3 and where is it?
> It looks to me like btrfs is temporarily assigning devid 0 to the new
> device being used as a replacement. That is what I observed before; once
> the replace operation was complete, it went back to the normal number.
> Since the replacement didn't finish this time, sdc3 is still devid 0.

The new replacement is devid 0 during the replacement. The drive being
replaced keeps its devid until the end, and then there's a switch,
that device is removed, and the signature on the old drive is wiped.
Sooo.... something is still wrong with the above because there's no
devid 3, there's kernel and btrfs check messages saying devid 3 is
missing.
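
The devid bookkeeping described here can be read straight off the 'btrfs filesystem show' output quoted earlier. A minimal Python sketch; the helper name is hypothetical, the sample text is copied from this thread, and the expected devid range 1..4 is an assumption (devids are not guaranteed to be contiguous):

```python
FI_SHOW = """\
        devid    0 size 457.64GiB used 39.53GiB path /dev/sdc3
        devid    1 size 457.64GiB used 39.56GiB path /dev/sda3
        devid    2 size 457.64GiB used 39.56GiB path /dev/sdb3
        devid    4 size 457.64GiB used 39.53GiB path /dev/sdd3
"""


def parse_devids(text):
    """Map devid -> device path from 'btrfs filesystem show' device lines."""
    devs = {}
    for line in text.splitlines():
        fields = line.split()
        # Expected layout: devid N size S used U path P
        if len(fields) >= 8 and fields[0] == "devid" and fields[6] == "path":
            devs[int(fields[1])] = fields[7]
    return devs


devs = parse_devids(FI_SHOW)
replace_target = devs.get(0)  # devid 0 is the in-progress replace target
# Assumption: the array originally used devids 1..4 with no gaps.
missing = sorted(set(range(1, 5)) - set(devs))
print(replace_target, missing)
```

On the quoted output this reports /dev/sdc3 as the replace target and devid 3 as absent, which matches the kernel's "devid 3 ... is missing" warning.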

It doesn't seem likely that /dev/sdc3 is devid 3 because it can't be
both missing and be the mounted dev node.

> [  202.676601] BTRFS warning (device sdc3): devid 3 uuid 911a642e-0a4c-4483-9a1f-cde7b87c5519 is missing

Try a reboot, and use blkid to check you've got all devices + 1 (the
new one that failed replacement). Verify all supers as well with
'btrfs rescue super-recover -v' and that it all correlates with 'btrfs
filesystem show' as well.

What should be true is the replace will resume upon being normally
mounted. But for that to happen, all the drives + 1 must be available.

If a tree log is damaged and prevents mount, then you need to make a
calculation. You can try to mount with ro,nologreplay and freshen
backups for anything you'd rather not lose - just in case things get
worse. And then you can zero the log and see if that'll let you
normally mount the device (i.e. rw and not degraded). But some of it
will depend on what's wrong.
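
The order of operations in the paragraph above can be written down as a dry-run plan. A minimal Python sketch that only prints the commands and executes nothing; the function name, the device and mount point arguments, and the explicit umount step are assumptions added for illustration, and copying the backups between the first two steps is left to the reader:

```python
import shlex


def recovery_plan(device, mountpoint):
    """Return the steps as argv lists, in order; nothing is executed."""
    return [
        # 1. Read-only mount that skips log replay, so backups can be refreshed.
        ["mount", "-o", "ro,nologreplay", device, mountpoint],
        # Assumed step: unmount again once the backups are done.
        ["umount", mountpoint],
        # 2. Discard the tree log (destructive: the logged writes are lost).
        ["btrfs", "rescue", "zero-log", device],
        # 3. Attempt a normal mount (rw, not degraded).
        ["mount", device, mountpoint],
    ]


for argv in recovery_plan("/dev/sda3", "/mnt"):
    print(shlex.join(argv))
```

Keeping the zero-log step after the backup step reflects the point of the advice: the log is only thrown away once nothing more can be lost.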

-- 
Chris Murphy

* Re: corrupt leaf, unexpected item end, unmountable
  2021-02-19  5:03     ` Chris Murphy
@ 2021-02-19 15:02       ` Daniel Dawson
  2021-02-19 15:03       ` Daniel Dawson
  1 sibling, 0 replies; 6+ messages in thread
From: Daniel Dawson @ 2021-02-19 15:02 UTC
  To: Btrfs BTRFS

On 2/18/21 9:03 PM, Chris Murphy wrote:
> Once everything else is figured out, you should consider converting
> metadata to raid1c3.

Got it.

> The new replacement is devid 0 during the replacement. The drive being
> replaced keeps its devid until the end, and then there's a switch,
> that device is removed, and the signature on the old drive is wiped.
> Sooo.... something is still wrong with the above because there's no
> devid 3, there's kernel and btrfs check messages saying devid 3 is
> missing.
>
> It doesn't seem likely that /dev/sdc3 is devid 3 because it can't be
> both missing and be the mounted dev node.

It seems I was unclear. I removed the old drive prior to the
replacement, hence degraded mode.

A while ago, I imaged the drives, to see what I could do without risk
(on another machine). Turns out I was able to mount the filesystem using
-o ro,nologreplay,degraded and copy almost all files. A small number
were unreadable/un-stat-able. Fortunately nothing critical, though the
OS may well be unusable.

(Also, in case you were wondering, memory testing has revealed no errors
so far.)

> If a tree log is damaged and prevents mount, then you need to make a
> calculation. You can try to mount with ro,nologreplay and freshen
> backups for anything you'd rather not lose - just in case things get
> worse. And then you can zero the log and see if that'll let you
> normally mount the device (i.e. rw and not degraded). But some of it
> will depend on what's wrong.

That doesn't work. It gives the same errors as when I tried to run
check, but repeated once each for extent tree and device tree. It just
can't get past this problem.

At this point, I think it's best to just reinstall with a fresh
filesystem, and not make the same mistakes. Thanks for the help, once again.


