Unable to mount degraded RAID5

linux-btrfs.vger.kernel.org archive mirror

* Unable to mount degraded RAID5
@ 2016-07-04 18:09 Tomáš Hrdina
  2016-07-04 18:41 ` Chris Murphy
  0 siblings, 1 reply; 25+ messages in thread
From: Tomáš Hrdina @ 2016-07-04 18:09 UTC (permalink / raw)
  To: linux-btrfs

Hello,
One of the 3 disks in my RAID5 failed. Since then, the filesystem will not mount.
Any help on what to try next would be appreciated.


sudo btrfs version
btrfs-progs v4.6.1
-- I installed 4.6.1 just now; the rescue attempt below was run with 4.4.

uname -a
Linux uncik-srv 4.4.0-24-generic #43-Ubuntu SMP Wed Jun 8 19:27:37 UTC
2016 x86_64 x86_64 x86_64 GNU/Linux



sudo mount -t btrfs -o ro,recovery /dev/sdc /shares
mount: wrong fs type, bad option, bad superblock on /dev/sdc,
       missing codepage or helper program, or other error

       In some cases useful info is found in syslog - try
       dmesg | tail or so.

sudo btrfs rescue chunk-recover /dev/sdc
Scanning: 2517981163520 in dev0, 3166618750976 in dev1Segmentation fault
(core dumped)

sudo btrfs filesystem show /dev/sda
warning, device 3 is missing
checksum verify failed on 12678831570944 found 3DC57E3E wanted 771D2379
checksum verify failed on 12678831570944 found 3DC57E3E wanted 771D2379
bytenr mismatch, want=12678831570944, have=10160133442474442752
Couldn't read chunk tree
Label: none  uuid: 2dab74bb-fc73-4c47-a413-a55840f6f71e
        Total devices 3 FS bytes used 3.80TiB
        devid    1 size 3.64TiB used 1.92TiB path /dev/sdb
        devid    2 size 3.64TiB used 1.92TiB path /dev/sda
        *** Some devices missing

sudo btrfs restore /dev/sda /mnt
warning, device 3 is missing
checksum verify failed on 12678831570944 found 3DC57E3E wanted 771D2379
checksum verify failed on 12678831570944 found 3DC57E3E wanted 771D2379
bytenr mismatch, want=12678831570944, have=10160133442474442752
Couldn't read chunk tree
Could not open root, trying backup super
warning, device 3 is missing
warning, device 1 is missing
bytenr mismatch, want=12678831570944, have=0
Couldn't read chunk tree
Could not open root, trying backup super
warning, device 3 is missing
warning, device 1 is missing
bytenr mismatch, want=12678831570944, have=0
Couldn't read chunk tree
Could not open root, trying backup super

sudo btrfs check /dev/sda
warning, device 3 is missing
checksum verify failed on 12678831570944 found 3DC57E3E wanted 771D2379
checksum verify failed on 12678831570944 found 3DC57E3E wanted 771D2379
bytenr mismatch, want=12678831570944, have=10160133442474442752
Couldn't read chunk tree
Couldn't open file system


Log
http://sebsauvage.net/paste/?236c3f6f238dbf26#4kM3tx+CjlA8ke9yMH+gD/QDsnjNnBK2i5Do4CXwD04=

Thank you
Tomas

---
This message was checked for viruses by Avast Antivirus.
https://www.avast.com/antivirus


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Unable to mount degraded RAID5
  2016-07-04 18:09 Unable to mount degraded RAID5 Tomáš Hrdina
@ 2016-07-04 18:41 ` Chris Murphy
       [not found]   ` <95f58623-95a4-b5d2-fa3a-bfb957840a31@gmail.com>
  0 siblings, 1 reply; 25+ messages in thread
From: Chris Murphy @ 2016-07-04 18:41 UTC (permalink / raw)
  To: Tomáš Hrdina; +Cc: Btrfs BTRFS

On Mon, Jul 4, 2016 at 12:09 PM, Tomáš Hrdina <thomas.rkh@gmail.com> wrote:

> sudo mount -t btrfs -o ro,recovery /dev/sdc /shares
> mount: wrong fs type, bad option, bad superblock on /dev/sdc,
>        missing codepage or helper program, or other error
>
>        In some cases useful info is found in syslog - try
>        dmesg | tail or so.

It needs -o degraded, not recovery.

Do not use 'btrfs replace' on raid5 right now, it seems to be
unreliable. If you do not have a backup of this raid5 I highly
recommend that you mount -o ro,degraded and make a backup now before
you do anything else to the file system. Degraded raid56 is really
fragile on Btrfs, and still broadly considered experimental (or at
least has enough caveats and gotchas that it's really just for expert
usage).
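
A minimal sketch of the "mount read-only, degraded, then back up" step
described above. The device, mount point and backup destination are
placeholders, and the use of rsync is an assumption (any copy tool would
do). By default the function only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only: /dev/sda, /shares and /mnt/backup below are placeholders.
# DRY_RUN=1 (the default here) prints the commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

backup_degraded() {
    dev="$1"; mnt="$2"; dest="$3"
    # Read-only plus degraded: nothing is written to the damaged array.
    run mount -t btrfs -o ro,degraded "$dev" "$mnt" || return 1
    # Copy the data off before attempting any repair.
    run rsync -aHAX "$mnt"/ "$dest"/
}

# Example (prints the two commands while DRY_RUN=1):
#   backup_degraded /dev/sda /shares /mnt/backup
```

Set DRY_RUN=0 only after checking the printed commands against your own
device names.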

> sudo btrfs rescue chunk-recover /dev/sdc
> Scanning: 2517981163520 in dev0, 3166618750976 in dev1Segmentation fault
> (core dumped)

This is not a good idea. Avoid randomly trying things, especially
things that have absolutely nothing to do with your problem.

>
> sudo btrfs filesystem show /dev/sda
> warning, device 3 is missing
> checksum verify failed on 12678831570944 found 3DC57E3E wanted 771D2379
> checksum verify failed on 12678831570944 found 3DC57E3E wanted 771D2379
> bytenr mismatch, want=12678831570944, have=10160133442474442752
> Couldn't read chunk tree
> Label: none  uuid: 2dab74bb-fc73-4c47-a413-a55840f6f71e
>         Total devices 3 FS bytes used 3.80TiB
>         devid    1 size 3.64TiB used 1.92TiB path /dev/sdb
>         devid    2 size 3.64TiB used 1.92TiB path /dev/sda
>         *** Some devices missing
>
> sudo btrfs restore /dev/sda /mnt
> warning, device 3 is missing
> checksum verify failed on 12678831570944 found 3DC57E3E wanted 771D2379
> checksum verify failed on 12678831570944 found 3DC57E3E wanted 771D2379
> bytenr mismatch, want=12678831570944, have=10160133442474442752
> Couldn't read chunk tree
> Could not open root, trying backup super
> warning, device 3 is missing
> warning, device 1 is missing

It's concerning that this says device 1 is missing when 'btrfs fi
show' clearly shows it as not missing. There's a bug here somewhere,
either show is wrong, or restore is wrong. That restore sees two
missing devices means it probably can't reconstruct from parity, and
there will be csum errors. Hopefully that's all that's going on right
now.

-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 25+ messages in thread

[parent not found: <95f58623-95a4-b5d2-fa3a-bfb957840a31@gmail.com>]

* Re: Unable to mount degraded RAID5
       [not found]   ` <95f58623-95a4-b5d2-fa3a-bfb957840a31@gmail.com>
@ 2016-07-04 19:01     ` Chris Murphy
  2016-07-04 19:11       ` Tomáš Hrdina
  0 siblings, 1 reply; 25+ messages in thread
From: Chris Murphy @ 2016-07-04 19:01 UTC (permalink / raw)
  To: Tomáš Hrdina, Btrfs BTRFS

On Mon, Jul 4, 2016 at 12:54 PM, Tomáš Hrdina <thomas.rkh@gmail.com> wrote:
> Degraded gives same result:
>
> sudo mount -t btrfs -o ro,degraded /dev/sda /shares
> mount: wrong fs type, bad option, bad superblock on /dev/sda,
>        missing codepage or helper program, or other error
>
>        In some cases useful info is found in syslog - try
>        dmesg | tail or so.
>

That's bad. What are the kernel messages for the mount attempt?

Do the following and report back all the results.

# btrfs dev scan
# btrfs fi show
# blkid
# btrfs check /dev/sda


Also, make sure you reply to the btrfs list and not just to me
personally. And don't top-post, or it'll annoy some people on the
list.
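
One way to gather the requested output in a single file for the list is
sketched below. The helper names and the log path are made up for
illustration; the commands themselves are the ones listed above, plus
dmesg for the kernel messages. All of them only read from the devices.

```shell
#!/bin/sh
# log_cmd LOGFILE CMD [ARGS...]: append the command line, its combined
# output and its exit status to LOGFILE.
log_cmd() {
    log="$1"; shift
    {
        printf '### %s\n' "$*"
        "$@" 2>&1
        printf '### exit status: %s\n\n' "$?"
    } >> "$log"
}

# collect_btrfs_diagnostics LOGFILE DEVICE (hypothetical helper name)
collect_btrfs_diagnostics() {
    log="$1"; dev="$2"
    log_cmd "$log" btrfs device scan
    log_cmd "$log" btrfs filesystem show
    log_cmd "$log" blkid
    log_cmd "$log" btrfs check "$dev"
    log_cmd "$log" dmesg
}

# Example, run as root:
#   collect_btrfs_diagnostics /tmp/btrfs-report.txt /dev/sda
```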


-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Unable to mount degraded RAID5
  2016-07-04 19:01     ` Chris Murphy
@ 2016-07-04 19:11       ` Tomáš Hrdina
  2016-07-04 20:43         ` Chris Murphy
  0 siblings, 1 reply; 25+ messages in thread
From: Tomáš Hrdina @ 2016-07-04 19:11 UTC (permalink / raw)
  To: Chris Murphy, Btrfs BTRFS

Result from dmesg:
http://sebsauvage.net/paste/?4e8e95b5eafbf675#ybToBzZ/WAoRjjugeH6N2YXZKEBlswaNI/J41GBmFYU=

sudo btrfs dev scan
Scanning for Btrfs filesystems

sudo btrfs fi show
warning, device 3 is missing
checksum verify failed on 12678831570944 found 3DC57E3E wanted 771D2379
checksum verify failed on 12678831570944 found 3DC57E3E wanted 771D2379
bytenr mismatch, want=12678831570944, have=10160133442474442752
Couldn't read chunk tree
Label: none  uuid: 2dab74bb-fc73-4c47-a413-a55840f6f71e
        Total devices 3 FS bytes used 3.80TiB
        devid    1 size 3.64TiB used 1.92TiB path /dev/sdb
        devid    2 size 3.64TiB used 1.92TiB path /dev/sda
        *** Some devices missing

sudo blkid
/dev/sda: UUID="2dab74bb-fc73-4c47-a413-a55840f6f71e"
UUID_SUB="7262d027-202a-4b29-aaf8-0bb8cf107d4d" TYPE="btrfs"
/dev/sdb: UUID="2dab74bb-fc73-4c47-a413-a55840f6f71e"
UUID_SUB="afa4cff8-d037-472e-9eb1-070d331b6a20" TYPE="btrfs"
/dev/sdc1: UUID="6731963f-5bf9-458c-a44d-6f0bbea38357" TYPE="ext2"
PARTUUID="ccb00f7e-01"
/dev/sdc5: UUID="522XeT-jWNr-O0oG-HeHV-akQu-AR3P-fzeUAy"
TYPE="LVM2_member" PARTUUID="ccb00f7e-05"
/dev/sdd1: LABEL="Verbatim HDD" UUID="70947DE1947DAA6C" TYPE="ntfs"
PARTUUID="58334162-01"
/dev/mapper/uncik--srv--vg-root:
UUID="15b50e38-d4cc-454c-ba0f-80adbb4cd4e1" TYPE="ext4"
/dev/mapper/uncik--srv--vg-swap_1:
UUID="c9db5981-acc3-43bb-a209-a213b80cc9cb" TYPE="swap"

sudo btrfs check /dev/sda
warning, device 3 is missing
checksum verify failed on 12678831570944 found 3DC57E3E wanted 771D2379
checksum verify failed on 12678831570944 found 3DC57E3E wanted 771D2379
bytenr mismatch, want=12678831570944, have=10160133442474442752
Couldn't read chunk tree
Couldn't open file system
 
Thank you
Tomas



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Unable to mount degraded RAID5
  2016-07-04 19:11       ` Tomáš Hrdina
@ 2016-07-04 20:43         ` Chris Murphy
  2016-07-04 21:10           ` Tomáš Hrdina
  2016-07-05  3:48           ` Andrei Borzenkov
  0 siblings, 2 replies; 25+ messages in thread
From: Chris Murphy @ 2016-07-04 20:43 UTC (permalink / raw)
  To: Tomáš Hrdina; +Cc: Chris Murphy, Btrfs BTRFS

On Mon, Jul 4, 2016 at 1:11 PM, Tomáš Hrdina <thomas.rkh@gmail.com> wrote:
> Result from dmesg:
> http://sebsauvage.net/paste/?4e8e95b5eafbf675#ybToBzZ/WAoRjjugeH6N2YXZKEBlswaNI/J41GBmFYU=

[10849.041749] BTRFS info (device sda): allowing degraded mounts
[10849.041754] BTRFS info (device sda): disk space caching is enabled
[10849.041756] BTRFS: has skinny extents
[10849.090553] BTRFS error (device sda): bad tree block start
10160120763642806272 12678831570944
[10849.090676] BTRFS error (device sda): bad tree block start
10160120763642806272 12678831570944
[10849.090700] BTRFS: failed to read chunk tree on sda
[10849.100153] BTRFS: open_ctree failed


Try 'mount -o ro,degraded,recovery'.


>
> sudo btrfs check /dev/sda
> warning, device 3 is missing
> checksum verify failed on 12678831570944 found 3DC57E3E wanted 771D2379
> checksum verify failed on 12678831570944 found 3DC57E3E wanted 771D2379
> bytenr mismatch, want=12678831570944, have=10160133442474442752
> Couldn't read chunk tree
> Couldn't open file system

Want and have are way far apart. If the mount command above still
fails then I'd like to see:

# btrfs-show-super -fa /dev/sda
# btrfs-show-super -fa /dev/sdb

Pretty much look for any discrepancies in generation, root and
chunk_root addresses, both in the main part of the super as well as in
the backups.
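
A rough sketch of that comparison, assuming the output of each
btrfs-show-super run has been saved to a text file and that the fields
appear as "name value" pairs at the start of a line. The function names
are hypothetical. Only the primary generation, root and chunk_root fields
are compared here; the backup roots would need a similar pass.

```shell
#!/bin/sh
# Print generation, root and chunk_root from saved btrfs-show-super output.
super_fields() {
    awk '$1 == "generation" || $1 == "root" || $1 == "chunk_root" {
             print $1 "=" $2
         }' "$1"
}

# Show differences between the fields of two saved outputs.
compare_supers() {
    fa="$(mktemp)"; fb="$(mktemp)"
    super_fields "$1" > "$fa"
    super_fields "$2" > "$fb"
    if diff "$fa" "$fb"; then
        echo "no discrepancies"
    fi
    rm -f "$fa" "$fb"
}

# Example:
#   btrfs-show-super -fa /dev/sda > sda.txt
#   btrfs-show-super -fa /dev/sdb > sdb.txt
#   compare_supers sda.txt sdb.txt
```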

# btrfs-find-root /dev/sda

Maybe it's possible to use a different tree to get it mounted. I don't
know what happened, but a single failing device should neither break
checksums nor take away the ability to mount the proper tree; and for
sure one of the backups should work.
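
If btrfs-find-root does get far enough to list candidates, it reports
them in lines of the form "Well block <bytenr>(gen: <N> level: <N>) seems
good". The sketch below, with a hypothetical function name, pulls those
lines out of saved output and sorts them newest generation first, so the
most recent alternative tree root is at the top.

```shell
#!/bin/sh
# Print "generation bytenr level" for each candidate in saved
# btrfs-find-root output, highest generation first.
list_candidate_roots() {
    sed -n 's/^Well block \([0-9][0-9]*\)(gen: \([0-9][0-9]*\) level: \([0-9][0-9]*\)).*/\2 \1 \3/p' "$1" |
        sort -k1,1nr -k2,2nr
}

# Example:
#   btrfs-find-root /dev/sda > find-root.txt 2>&1
#   list_candidate_roots find-root.txt
```

A bytenr from that list could then be tried with a read-only tool, for
example 'btrfs restore -t <bytenr>', though whether that helps depends on
the chunk tree being readable at all.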

Have you done a scrub on this file system and do you know if anything
was fixed or if it always found no problem?



-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Unable to mount degraded RAID5
  2016-07-04 20:43         ` Chris Murphy
@ 2016-07-04 21:10           ` Tomáš Hrdina
  2016-07-04 22:42             ` Chris Murphy
  2016-07-05  3:48           ` Andrei Borzenkov
  1 sibling, 1 reply; 25+ messages in thread
From: Tomáš Hrdina @ 2016-07-04 21:10 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

One disk showed reallocated sectors in SMART, so I ran an extended SMART
test and it passed. Then I ran a scrub and it found nothing; everything
was OK. After that, another extended SMART test started (it is scheduled
weekly), and I think the disk went offline sometime during it.

The problem may be that another disk has the SMART attribute Reported
Uncorrect at 1.


sudo mount -o ro,degraded,recovery /dev/sda /shares
mount: wrong fs type, bad option, bad superblock on /dev/sda,
       missing codepage or helper program, or other error

       In some cases useful info is found in syslog - try
       dmesg | tail or so.
 
sudo btrfs-show-super -fa /dev/sda and sdb
http://sebsauvage.net/paste/?39c73a3440b2e903#WZnUJXNFPNz/fFuOK3QquVeOWQUopcCl0JabtuYMWew=

sudo btrfs-find-root /dev/sda
warning, device 3 is missing
Couldn't read chunk tree
ERROR: open ctree failed

Thank you
Tomáš



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Unable to mount degraded RAID5
  2016-07-04 21:10           ` Tomáš Hrdina
@ 2016-07-04 22:42             ` Chris Murphy
  2016-07-04 22:59               ` Chris Murphy
  2016-07-05  7:12               ` Tomáš Hrdina
  0 siblings, 2 replies; 25+ messages in thread
From: Chris Murphy @ 2016-07-04 22:42 UTC (permalink / raw)
  To: Tomáš Hrdina; +Cc: Chris Murphy, Btrfs BTRFS

On Mon, Jul 4, 2016 at 3:10 PM, Tomáš Hrdina <thomas.rkh@gmail.com> wrote:

> http://sebsauvage.net/paste/?39c73a3440b2e903#WZnUJXNFPNz/fFuOK3QquVeOWQUopcCl0JabtuYMWew=

Both backup 0 and 1 have bad information for backup_fs_root.

backup_fs_root: 0 gen: 0 level: 0

Presumably it automatically tries backup 2 or 3 even though they have
older generations but I'm not sure.
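
To spot zeroed slots like that without reading the whole dump, something
along these lines might do. It assumes the saved btrfs-show-super -f
output marks each slot with a "backup N:" line followed by
"backup_fs_root: <bytenr> gen: <N> level: <N>"; that layout and the
function name are assumptions.

```shell
#!/bin/sh
# Report backup root slots whose backup_fs_root bytenr is 0.
zeroed_backup_fs_roots() {
    awk '
        # Remember which "backup N:" slot we are in.
        $1 == "backup" { slot = $2; sub(/:$/, "", slot) }
        # Flag the slot if its fs root address is zero.
        $1 == "backup_fs_root:" && $2 == "0" {
            print "backup " slot ": backup_fs_root is 0"
        }
    ' "$1"
}

# Example:
#   btrfs-show-super -fa /dev/sda > sda.txt
#   zeroed_backup_fs_roots sda.txt
```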

> sudo btrfs-find-root /dev/sda
> warning, device 3 is missing
> Couldn't read chunk tree
> ERROR: open ctree failed

I'm gonna guess the system chunk is bad or damaged somehow and
therefore there's no way to get to the chunk tree. What do you get
for:

# btrfs-debug-tree -d /dev/sda

-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Unable to mount degraded RAID5
  2016-07-04 22:42             ` Chris Murphy
@ 2016-07-04 22:59               ` Chris Murphy
  2016-07-05  7:12               ` Tomáš Hrdina
  1 sibling, 0 replies; 25+ messages in thread
From: Chris Murphy @ 2016-07-04 22:59 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Tomáš Hrdina, Btrfs BTRFS

I just tried btrfs rescue chunk-recover (btrfs-progs 4.6) on new
Btrfs, 3x raid5 with 1 dev missing. I get:

[root@f24s ~]# btrfs rescue chunk-recover /dev/VG/2
Scanning: DONE in dev0, DONE in dev1
open with broken chunk error
Chunk tree recovery failed

So I don't think rescue chunk-recover can work degraded. At least,
it's not working now, and if it isn't meant to work it probably should
fail before it does the scanning, which takes a long time. I filed a
bug:
https://bugzilla.kernel.org/show_bug.cgi?id=121471

But in my case things still continue to work with btrfs-find-root and
degraded mount works OK, so offhand I don't think the rescue
chunk-recover made things worse.

-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Unable to mount degraded RAID5
  2016-07-04 22:42             ` Chris Murphy
  2016-07-04 22:59               ` Chris Murphy
@ 2016-07-05  7:12               ` Tomáš Hrdina
  1 sibling, 0 replies; 25+ messages in thread
From: Tomáš Hrdina @ 2016-07-05  7:12 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

sudo btrfs-debug-tree -d /dev/sdc
btrfs-progs v4.6.1
warning, device 3 is missing
checksum verify failed on 12678831570944 found 3DC57E3E wanted 771D2379
checksum verify failed on 12678831570944 found 3DC57E3E wanted 771D2379
bytenr mismatch, want=12678831570944, have=10160133442474442752
Couldn't read chunk tree
ERROR: unable to open /dev/sdc
 

Thank you
Tomas



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Unable to mount degraded RAID5
  2016-07-04 20:43         ` Chris Murphy
  2016-07-04 21:10           ` Tomáš Hrdina
@ 2016-07-05  3:48           ` Andrei Borzenkov
  2016-07-05 15:13             ` Chris Murphy
  1 sibling, 1 reply; 25+ messages in thread
From: Andrei Borzenkov @ 2016-07-05  3:48 UTC (permalink / raw)
  To: Chris Murphy, Tomáš Hrdina; +Cc: Btrfs BTRFS

04.07.2016 23:43, Chris Murphy wrote:
> 
> Have you done a scrub on this file system and do you know if anything
> was fixed or if it always found no problem?
> 

Scrub on a degraded RAID5 cannot fix anything by definition, because even
if scrub finds discrepancies, it does not have enough data to
reconstruct them. I would actually avoid it - the worst that can happen
is that it attempts to replace the remaining data with something faked.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Unable to mount degraded RAID5
  2016-07-05  3:48           ` Andrei Borzenkov
@ 2016-07-05 15:13             ` Chris Murphy
  2016-07-05 18:40               ` Tomáš Hrdina
  0 siblings, 1 reply; 25+ messages in thread
From: Chris Murphy @ 2016-07-05 15:13 UTC (permalink / raw)
  To: Andrei Borzenkov; +Cc: Chris Murphy, Tomáš Hrdina, Btrfs BTRFS

On Mon, Jul 4, 2016 at 9:48 PM, Andrei Borzenkov <arvidjaar@gmail.com> wrote:
> 04.07.2016 23:43, Chris Murphy wrote:
>>
>> Have you done a scrub on this file system and do you know if anything
>> was fixed or if it always found no problem?
>>
>
> scrub on degraded RAID5 cannot fix anything by definition,

Right. In this case, he can't mount, so he can't do a scrub. My
concise question could be confusing in another situation as suggesting
he should do a scrub now, but I was asking if he had ever done a
scrub. I was wondering if maybe he's run into this scrub problem where
a data strip is wrong but gets fixed from good parity and is then
promptly overwritten with wrongly computed parity. That leads to this
same kind of checksum errors when degraded because the wrong parity
results in wrong reconstruction of data.
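
A toy illustration of why wrong parity turns into wrong data once a
device is missing. This is plain single-byte XOR arithmetic for a
two-data-strip stripe, not the actual Btrfs code path, and the values are
invented.

```shell
#!/bin/sh
# Two data strips and their XOR parity (values are arbitrary examples).
d1=170                               # 0xAA, strip on the surviving device
d2=85                                # 0x55, strip on the missing device
good_parity=$(( d1 ^ d2 ))           # 255
bad_parity=$(( good_parity ^ 16 ))   # 239, parity with one bit miscomputed

# Rebuild the missing strip from the surviving strip and the parity.
reconstruct() {
    echo $(( $1 ^ $2 ))
}

echo "original d2:               $d2"
echo "rebuilt from good parity:  $(reconstruct "$d1" "$good_parity")"
echo "rebuilt from wrong parity: $(reconstruct "$d1" "$bad_parity")"
```

With good parity the rebuilt strip is 85 again. With the wrong parity it
comes back as 69, which would then fail its checksum.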

But that's not the case here it seems. So, how is it this healthy,
functioning raid5 totally implodes like this with checksum errors just
because of a single device degraded? There are no device read errors
or link resets in the kernel messages. It seems to be a weakness of
the chunk tree again, which at least Qu has mentioned before.

> because even
> if scrub finds discrepancies, it does not have enough data to
> reconstruct them. I would actually avoid it - the worst that can happen
> is that it attempts to replace the remaining data with something faked.

At the moment I would like all of the debugging tools to have a flag
to force ignoring checksum checks. Right now they fail on checksum
mismatch. Instead I'd rather see the output ignoring checksum
mismatches, but somehow indicate suspicious information because of a
checksum mismatch.

-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Unable to mount degraded RAID5
  2016-07-05 15:13             ` Chris Murphy
@ 2016-07-05 18:40               ` Tomáš Hrdina
  2016-07-05 23:19                 ` Chris Murphy
  0 siblings, 1 reply; 25+ messages in thread
From: Tomáš Hrdina @ 2016-07-05 18:40 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

I don't know if it is a good idea, but the disk that disconnected is
connected again. Maybe it could help get the data into the right state,
so that the other two disks could be mounted alone. But I don't know
whether it would stay connected long enough for any work, or whether it
would make things even worse.

Thank you
Tomas


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Unable to mount degraded RAID5
  2016-07-05 18:40               ` Tomáš Hrdina
@ 2016-07-05 23:19                 ` Chris Murphy
  2016-07-06  8:07                   ` Tomáš Hrdina
  0 siblings, 1 reply; 25+ messages in thread
From: Chris Murphy @ 2016-07-05 23:19 UTC (permalink / raw)
  To: Tomáš Hrdina; +Cc: Chris Murphy, Btrfs BTRFS

On Tue, Jul 5, 2016 at 12:40 PM, Tomáš Hrdina <thomas.rkh@gmail.com> wrote:
> I don't know if it is a good idea, but the disk that disconnected is
> connected again. Maybe it could help get the data into the right state,
> so that the other two disks could be mounted alone. But I don't know
> whether it would stay connected long enough for any work, or whether it
> would make things even worse.

I'd stick to the read only commands:

btrfs check
btrfs-debug-tree -d
btrfs-find-root

Also, I'm surprised I didn't ask (seeing as I'm on a rampage about
this these days)


smartctl -l scterc /dev/sdX   ## for each drive
smartctl -a /dev/sdX  ## for each drive
cat /sys/block/sdX/device/timeout   ## for each drive

We should find out if there are bad sectors, and if they can even be
properly corrected by the Btrfs self-healing mechanism. The normal
default on Linux prevents this with consumer drives.
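
The usual reasoning behind that remark, as far as I understand it: if the
drive's internal error recovery can run longer than the kernel's command
timeout, the kernel resets the link before the drive reports a read
error, so Btrfs never learns which sector to rewrite. The sketch below
compares the two numbers. The function name is hypothetical; the ERC
value is in units of 100 ms, as printed by 'smartctl -l scterc' (e.g. 70
for 7.0 seconds), or the word "Disabled".

```shell
#!/bin/sh
# erc_verdict ERC TIMEOUT
#   ERC:     SCT ERC read value in deciseconds, or "Disabled"
#   TIMEOUT: seconds, from /sys/block/sdX/device/timeout
erc_verdict() {
    erc="$1"; timeout="$2"
    if [ "$erc" = "Disabled" ]; then
        echo "risky: no ERC limit, drive may outlast the ${timeout}s kernel timeout"
    elif [ "$erc" -lt $(( timeout * 10 )) ]; then
        echo "ok: drive gives up after $(( erc / 10 ))s, before the ${timeout}s kernel timeout"
    else
        echo "risky: ERC of $(( erc / 10 ))s is not shorter than the ${timeout}s kernel timeout"
    fi
}

erc_verdict Disabled 30
erc_verdict 70 30
```

For a drive reporting "Disabled", the commonly suggested remedies are to
enable ERC with 'smartctl -l scterc,70,70 /dev/sdX' if the drive supports
it, or otherwise to raise the kernel timeout in
/sys/block/sdX/device/timeout.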



-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Unable to mount degraded RAID5
  2016-07-05 23:19                 ` Chris Murphy
@ 2016-07-06  8:07                   ` Tomáš Hrdina
  2016-07-06 16:08                     ` Chris Murphy
  0 siblings, 1 reply; 25+ messages in thread
From: Tomáš Hrdina @ 2016-07-06  8:07 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

Now with 3 disks:

sudo btrfs check /dev/sda
parent transid verify failed on 7008807157760 wanted 70175 found 70133
parent transid verify failed on 7008807157760 wanted 70175 found 70133
checksum verify failed on 7008807157760 found F192848C wanted 1571393A
checksum verify failed on 7008807157760 found F192848C wanted 1571393A
bytenr mismatch, want=7008807157760, have=65536
Checking filesystem on /dev/sda
UUID: 2dab74bb-fc73-4c47-a413-a55840f6f71e
checking extents
parent transid verify failed on 7009468874752 wanted 70180 found 70133
parent transid verify failed on 7009468874752 wanted 70180 found 70133
checksum verify failed on 7009468874752 found 2B10421A wanted CFF3FFAC
checksum verify failed on 7009468874752 found 2B10421A wanted CFF3FFAC
bytenr mismatch, want=7009468874752, have=65536
parent transid verify failed on 7008859045888 wanted 70175 found 70133
parent transid verify failed on 7008859045888 wanted 70175 found 70133
checksum verify failed on 7008859045888 found 7313A127 wanted 97F01C91
checksum verify failed on 7008859045888 found 7313A127 wanted 97F01C91
bytenr mismatch, want=7008859045888, have=65536
parent transid verify failed on 7008899547136 wanted 70175 found 70133
parent transid verify failed on 7008899547136 wanted 70175 found 70133
checksum verify failed on 7008899547136 found 2B6F9045 wanted CF8C2DF3
parent transid verify failed on 7008899547136 wanted 70175 found 70133
Ignoring transid failure
leaf parent key incorrect 7008899547136
bad block 7008899547136
Errors found in extent allocation tree or chunk allocation
parent transid verify failed on 7009074167808 wanted 70175 found 70133
parent transid verify failed on 7009074167808 wanted 70175 found 70133
checksum verify failed on 7009074167808 found FDA6D1F0 wanted 19456C46
checksum verify failed on 7009074167808 found FDA6D1F0 wanted 19456C46
bytenr mismatch, want=7009074167808, have=65536


sudo btrfs-debug-tree -d /dev/sdc
http://sebsauvage.net/paste/?d690b2c9d130008d#cni3fnKUZ7Y/oaXm+nsOw0afoWDFXNl26eC+vbJmcRA=
 

sudo btrfs-find-root /dev/sdc
parent transid verify failed on 7008807157760 wanted 70175 found 70133
parent transid verify failed on 7008807157760 wanted 70175 found 70133
Superblock thinks the generation is 70182
Superblock thinks the level is 1
Found tree root at 6062830010368 gen 70182 level 1
Well block 6062434418688(gen: 70181 level: 1) seems good, but
generation/level doesn't match, want gen: 70182 level: 1
Well block 6062497202176(gen: 69186 level: 0) seems good, but
generation/level doesn't match, want gen: 70182 level: 1
Well block 6062470332416(gen: 69186 level: 0) seems good, but
generation/level doesn't match, want gen: 70182 level: 1


sudo smartctl -l scterc /dev/sda
smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-24-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

SCT Error Recovery Control:
           Read: Disabled
          Write: Disabled


sudo smartctl -l scterc /dev/sdb
smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-24-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

SCT Error Recovery Control:
           Read:     70 (7.0 seconds)
          Write:     70 (7.0 seconds)


sudo smartctl -l scterc /dev/sdc
smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-24-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

SCT Error Recovery Control:
           Read: Disabled
          Write: Disabled


sudo smartctl -a /dev/sdx
http://sebsauvage.net/paste/?aab1d282ceb1e1cf#auxFRkK5GCW8j1gR7mwgzR1z92Qn9oqtc6EEC2C6sEE=


cat /sys/block/sda/device/timeout
30


cat /sys/block/sdb/device/timeout
30


cat /sys/block/sdc/device/timeout
30

Thank you
Tomas



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Unable to mount degraded RAID5
  2016-07-06  8:07                   ` Tomáš Hrdina
@ 2016-07-06 16:08                     ` Chris Murphy
  2016-07-06 17:50                       ` Tomáš Hrdina
  0 siblings, 1 reply; 25+ messages in thread
From: Chris Murphy @ 2016-07-06 16:08 UTC (permalink / raw)
  To: Tomáš Hrdina; +Cc: Chris Murphy, Btrfs BTRFS

On Wed, Jul 6, 2016 at 2:07 AM, Tomáš Hrdina <thomas.rkh@gmail.com> wrote:
> Now with 3 disks:
>
> sudo btrfs check /dev/sda
> parent transid verify failed on 7008807157760 wanted 70175 found 70133
> parent transid verify failed on 7008807157760 wanted 70175 found 70133
> checksum verify failed on 7008807157760 found F192848C wanted 1571393A
> checksum verify failed on 7008807157760 found F192848C wanted 1571393A
> bytenr mismatch, want=7008807157760, have=65536
> Checking filesystem on /dev/sda
> UUID: 2dab74bb-fc73-4c47-a413-a55840f6f71e
> checking extents
> parent transid verify failed on 7009468874752 wanted 70180 found 70133
> parent transid verify failed on 7009468874752 wanted 70180 found 70133
> checksum verify failed on 7009468874752 found 2B10421A wanted CFF3FFAC
> checksum verify failed on 7009468874752 found 2B10421A wanted CFF3FFAC
> bytenr mismatch, want=7009468874752, have=65536
> parent transid verify failed on 7008859045888 wanted 70175 found 70133
> parent transid verify failed on 7008859045888 wanted 70175 found 70133
> checksum verify failed on 7008859045888 found 7313A127 wanted 97F01C91
> checksum verify failed on 7008859045888 found 7313A127 wanted 97F01C91
> bytenr mismatch, want=7008859045888, have=65536
> parent transid verify failed on 7008899547136 wanted 70175 found 70133
> parent transid verify failed on 7008899547136 wanted 70175 found 70133
> checksum verify failed on 7008899547136 found 2B6F9045 wanted CF8C2DF3
> parent transid verify failed on 7008899547136 wanted 70175 found 70133
> Ignoring transid failure
> leaf parent key incorrect 7008899547136
> bad block 7008899547136
> Errors found in extent allocation tree or chunk allocation
> parent transid verify failed on 7009074167808 wanted 70175 found 70133
> parent transid verify failed on 7009074167808 wanted 70175 found 70133
> checksum verify failed on 7009074167808 found FDA6D1F0 wanted 19456C46
> checksum verify failed on 7009074167808 found FDA6D1F0 wanted 19456C46
> bytenr mismatch, want=7009074167808, have=65536

Ok much better than before, these all seem sane with a limited number
of problems. Maybe --repair can fix it, but don't do that yet.

> sudo btrfs-debug-tree -d /dev/sdc
> http://sebsauvage.net/paste/?d690b2c9d130008d#cni3fnKUZ7Y/oaXm+nsOw0afoWDFXNl26eC+vbJmcRA=

OK good, so now it finds the chunk tree OK. This is good news. I would
try to mount it ro first, if you need to make or refresh a backup. So
in order:

mount -o ro
mount -o ro,recovery

If those don't work, let's see what the user and kernel errors are.

>
>
> sudo btrfs-find-root /dev/sdc
> parent transid verify failed on 7008807157760 wanted 70175 found 70133
> parent transid verify failed on 7008807157760 wanted 70175 found 70133
> Superblock thinks the generation is 70182
> Superblock thinks the level is 1
> Found tree root at 6062830010368 gen 70182 level 1
> Well block 6062434418688(gen: 70181 level: 1) seems good, but
> generation/level doesn't match, want gen: 70182 level: 1
> Well block 6062497202176(gen: 69186 level: 0) seems good, but
> generation/level doesn't match, want gen: 70182 level: 1
> Well block 6062470332416(gen: 69186 level: 0) seems good, but
> generation/level doesn't match, want gen: 70182 level: 1

This is also a good sign that you can probably get btrfs restore to
work and point it to one of these older tree roots, if mount won't
work.

>
>
> sudo smartctl -l scterc /dev/sda
> smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-24-generic] (local build)
> Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
>
> SCT Error Recovery Control:
>            Read: Disabled
>           Write: Disabled
>
>
> sudo smartctl -l scterc /dev/sdb
> smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-24-generic] (local build)
> Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
>
> SCT Error Recovery Control:
>            Read:     70 (7.0 seconds)
>           Write:     70 (7.0 seconds)
>
>
> sudo smartctl -l scterc /dev/sdc
> smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-24-generic] (local build)
> Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
>
> SCT Error Recovery Control:
>            Read: Disabled
>           Write: Disabled

There's good news and bad news. The good news is all the drives
support SCT ERC. The bad news is two of the drives have the wrong
setting for raid1+, including raid5. Issue:

smartctl -l scterc,70,70 /dev/sdX   #for each drive

This is not a persistent setting. The drive being powered off (maybe
even reset) will revert the setting to drive default. Some people use
a udev rule to set this during startup. I think it can also be done
with a systemd unit. You'd want to specify the drives by id, wwn if
available, so that it's always consistent across boots.

The point of this setting is to force the drive to give up on errors
quickly, allowing Btrfs in this case to be informed of the exact
problem (media error and what sector) so that Btrfs can reconstruct
the data from parity and then fix the bad sector(s). In your current
configuration the fixup can't happen, so problems start to accumulate.
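
A minimal sketch of applying that setting to several drives at once. It
only wraps the smartctl command given above; the function name
set_scterc and the ERC_DECISECONDS and DRY_RUN variables are
illustrative assumptions, not part of smartctl. Because the setting
does not survive a power cycle, something like this would have to be
re-run at every boot.

```shell
#!/bin/sh
# Sketch: apply the SCT ERC setting from this thread to a list of drives.
# ERC_DECISECONDS and DRY_RUN are hypothetical names used for illustration.
ERC_DECISECONDS="${ERC_DECISECONDS:-70}"   # 70 deciseconds = 7.0 seconds

set_scterc() {
    for dev in "$@"; do
        cmd="smartctl -l scterc,${ERC_DECISECONDS},${ERC_DECISECONDS} ${dev}"
        if [ -n "${DRY_RUN:-}" ]; then
            echo "$cmd"                    # only show what would be run
        else
            $cmd || echo "failed to set SCT ERC on ${dev}" >&2
        fi
    done
}

# Example (as root); prefer stable /dev/disk/by-id/ names over sdX,
# since sdX letters can change between boots:
#   set_scterc /dev/disk/by-id/<id-of-drive-1> /dev/disk/by-id/<id-of-drive-2>
```

Running it with DRY_RUN=1 first prints the commands without touching
the drives.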

> sudo smartctl -a /dev/sdx
> http://sebsauvage.net/paste/?aab1d282ceb1e1cf#auxFRkK5GCW8j1gR7mwgzR1z92Qn9oqtc6EEC2C6sEE=

sudo smartctl -a /dev/sda
=== START OF INFORMATION SECTION ===
Model Family:     Seagate NAS HDD
Device Model:     ST4000VN000-1H4168
Serial Number:    Z302YVSZ
  5 Reallocated_Sector_Ct   0x0033   089   089   010    Pre-fail  Always       -       14648

That's too many reallocated sectors. The good news is none are
pending. But for a NAS drive I think this is too high, get it replaced
under warranty. It certainly means that the unrecoverable read spec
for this particular drive is being busted so they should replace it
without question. It's possible this value is high by a factor of 8 if
they're counting 512 byte logical sectors, where the actual physical
sector is 4096 bytes. So it might not be as big of a problem as it
seems, but it's still busted the spec.
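
To put a number on that factor of 8, here is the arithmetic as a small
sketch. Whether this drive really counts 512 byte logical sectors is an
assumption; the raw value is the one from the smartctl output above.

```shell
raw_count=14648                          # Reallocated_Sector_Ct raw value for /dev/sda
logical_per_physical=$((4096 / 512))     # 8 logical sectors per physical sector
echo $((raw_count / logical_per_physical))   # prints 1831
```

So in the best case that is still on the order of 1800 physical sectors
reallocated.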

sudo smartctl -a /dev/sdb
=== START OF INFORMATION SECTION ===
Model Family:     Seagate NAS HDD
Device Model:     ST4000VN000-2AH166
Serial Number:    WDH00SM8
LU WWN Device Id: 5 000c50 09bbd3af2

Error 1 occurred at disk power-on lifetime: 453 hours (18 days + 21 hours)
  When the command that caused the error occurred, the device was
  active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

This drive logged an explicit read error (UNC) at 453 power-on hours.
Btrfs probably fixed it at the time; if you have logs going back that
far, you'd likely see a fixup for this same sector LBA value.

/dev/sdc looks OK.

What's interesting looking at all smartctl outputs is that all three
are NAS models of Seagate but *two* of them do not have SCT ERC
enabled by default. That is very eyebrow raising as it relates to the
potential spread of misconfigurations of RAID.

Device Model: ST4000VN000-1H4168
Device Model: ST4000VN000-2AH166  ## this one has SCT ERC set to 70 deciseconds
Device Model: ST4000VN000-1H4168

Seems like a bad idea for a NAS drive to default to SCT ERC disabled,
I would expect the overwhelming use case for NAS drives will be raid1,
5, or 6, all of which need SCT ERC enabled. Very weird choice by
Seagate in my opinion.

Anyway, you should enable this on the other two drives. That way there
are fast error recoveries. If it turns out Btrfs can't reconstruct
something upon error, we can deal with that later. The main thing is
you want to get this raid5 as healthy as possible before the
previously failed device fails again, or gets replaced.

-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Unable to mount degraded RAID5
  2016-07-06 16:08                     ` Chris Murphy
@ 2016-07-06 17:50                       ` Tomáš Hrdina
  2016-07-06 18:12                         ` Chris Murphy
  0 siblings, 1 reply; 25+ messages in thread
From: Tomáš Hrdina @ 2016-07-06 17:50 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

sudo mount -o ro /dev/sdc /shares
mount: wrong fs type, bad option, bad superblock on /dev/sdc,
       missing codepage or helper program, or other error

       In some cases useful info is found in syslog - try
       dmesg | tail or so.

sudo mount -o ro,recovery /dev/sdc /shares
mount: wrong fs type, bad option, bad superblock on /dev/sdc,
       missing codepage or helper program, or other error

       In some cases useful info is found in syslog - try
       dmesg | tail or so.

dmesg
http://sebsauvage.net/paste/?04d1162dc44d7e55#uY0kIaX66o7Kh+TZAGK2T+CKdRk2jorIWM3w5gfXp8I=

Do you want to see any other logs?

For all 3 disks:
sudo smartctl -l scterc,70,70 /dev/sdx
smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-24-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

SCT Error Recovery Control set to:
           Read:     70 (7.0 seconds)
          Write:     70 (7.0 seconds)

Thank you
Tomas

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Unable to mount degraded RAID5
  2016-07-06 17:50                       ` Tomáš Hrdina
@ 2016-07-06 18:12                         ` Chris Murphy
  2016-07-09 17:30                           ` Tomáš Hrdina
  0 siblings, 1 reply; 25+ messages in thread
From: Chris Murphy @ 2016-07-06 18:12 UTC (permalink / raw)
  To: Tomáš Hrdina; +Cc: Chris Murphy, Btrfs BTRFS

On Wed, Jul 6, 2016 at 11:50 AM, Tomáš Hrdina <thomas.rkh@gmail.com> wrote:
> sudo mount -o ro /dev/sdc /shares
> mount: wrong fs type, bad option, bad superblock on /dev/sdc,
>        missing codepage or helper program, or other error
>
>        In some cases useful info is found in syslog - try
>        dmesg | tail or so.
>
>
> sudo mount -o ro,recovery /dev/sdc /shares
> mount: wrong fs type, bad option, bad superblock on /dev/sdc,
>        missing codepage or helper program, or other error
>
>        In some cases useful info is found in syslog - try
>        dmesg | tail or so.

[ 275.688919] BTRFS error (device sda): parent transid verify failed on 7008533413888 wanted 70175 found 70132

Looks like the generation is too far back for backup roots.

Just for grins, now that all drives are present, what do you get for

# btrfs rescue super-recover -v /dev/sda

Next I suggest btrfs-image -c9 -t4 and optionally -s to sanitize file
names. And also btrfs-debug-tree (this time no -d) redirected to a
file. These two files can be big, about the size of the used amount of
metadata chunks. These go in the cloud at some point, reference them
in a bugzilla.kernel.org bug report by URL. Expect it to be months
before a dev looks at it.

So now what you want to try to do is use restore.
https://btrfs.wiki.kernel.org/index.php/Restore

You can use the information from btrfs-find-root to give restore a -t
value to try. For example:

>Found tree root at 6062830010368 gen 70182 level 1
>Well block 6062434418688(gen: 70181 level: 1) seems good, but
>generation/level doesn't match, want gen: 70182 level: 1
>Well block 6062497202176(gen: 69186 level: 0) seems good, but
>generation/level doesn't match, want gen: 70182 level: 1
>Well block 6062470332416(gen: 69186 level: 0) seems good, but
>generation/level doesn't match, want gen: 70182 level: 1

btrfs restore -t 6062830010368 -v -i /dev/sda <pathtowhereyouwantdatatogo>

If that fails totally you can try the next bytenr, for the -t value,
6062434418688. And then the next. Each value down is going backward in
time, so it implies some data loss.
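
A rough sketch of collecting those -t candidates from btrfs-find-root
output instead of copying them by hand. The function name
list_root_bytenrs is made up for illustration, and the patterns assume
the output format shown in this thread; other btrfs-progs versions may
print something different.

```shell
#!/bin/sh
# Sketch: print candidate tree-root bytenrs, one per line, in the order
# btrfs-find-root reported them. Reads btrfs-find-root output on stdin.
list_root_bytenrs() {
    sed -n \
        -e 's/^Found tree root at \([0-9][0-9]*\) .*/\1/p' \
        -e 's/^Well block \([0-9][0-9]*\)(gen.*/\1/p'
}

# Example: print the restore commands to try, newest root first.
# The target path is a placeholder.
#   sudo btrfs-find-root /dev/sdc 2>&1 | list_root_bytenrs |
#   while read -r bytenr; do
#       echo "btrfs restore -t $bytenr -v -i /dev/sda <pathtowhereyouwantdatatogo>"
#   done
```

The example only echoes the commands, so each restore attempt can still
be reviewed and run by hand.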

This is not the end. It's just that it's the safest since no changes
to the fs have happened. If you set up some kind of overlay you can be
more aggressive like going right for btrfs check --repair and seeing
if it can fix things, but without the overlay it's possible to totally
break the fs such that even restore won't work.

Once you pretty much have everything important off the volume, you can
get more aggressive with trying to fix it. OR just blow it away and
start over. But I think it's valid to gather as much information about
the file system and try to fix it because the autopsy is the main way
to make Btrfs better.

-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Unable to mount degraded RAID5
  2016-07-06 18:12                         ` Chris Murphy
@ 2016-07-09 17:30                           ` Tomáš Hrdina
  2016-07-09 18:33                             ` Chris Murphy
  0 siblings, 1 reply; 25+ messages in thread
From: Tomáš Hrdina @ 2016-07-09 17:30 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

 sudo btrfs rescue super-recover -v /dev/sda
All Devices:
        Device: id = 1, name = /dev/sdc
        Device: id = 2, name = /dev/sdb
        Device: id = 3, name = /dev/sda

Before Recovering:
        [All good supers]:
                device name = /dev/sdc
                superblock bytenr = 65536

                device name = /dev/sdc
                superblock bytenr = 67108864

                device name = /dev/sdc
                superblock bytenr = 274877906944

                device name = /dev/sdb
                superblock bytenr = 65536

                device name = /dev/sdb
                superblock bytenr = 67108864

                device name = /dev/sdb
                superblock bytenr = 274877906944

                device name = /dev/sda
                superblock bytenr = 65536

                device name = /dev/sda
                superblock bytenr = 67108864

                device name = /dev/sda
                superblock bytenr = 274877906944

        [All bad supers]:

All supers are valid, no need to recover


I hope I did it right:


sudo btrfs-image -c9 -t4 /dev/sda /mnt/btrfs-image
parent transid verify failed on 7008807157760 wanted 70175 found 70133
parent transid verify failed on 7008807157760 wanted 70175 found 70133
checksum verify failed on 7008807157760 found F192848C wanted 1571393A
checksum verify failed on 7008807157760 found F192848C wanted 1571393A
bytenr mismatch, want=7008807157760, have=65536
parent transid verify failed on 7009074167808 wanted 70175 found 70133
parent transid verify failed on 7009074167808 wanted 70175 found 70133
checksum verify failed on 7009074167808 found FDA6D1F0 wanted 19456C46
checksum verify failed on 7009074167808 found FDA6D1F0 wanted 19456C46
bytenr mismatch, want=7009074167808, have=65536
Error going to next leaf -5
create failed (Success)


sudo btrfs-debug-tree /dev/sda > /mnt/btrfs-debug-tree 2> /mnt/btrfs-debug-tree-err

The btrfs-debug-tree file is 10 MB.
btrfs-debug-tree-err:
http://sebsauvage.net/paste/?12cf2fb771b93bdd#Ajv5gPoxDKjaWExcJnMZLVhcU5wVw77abeZ4tIGTazU=


I used btrfs restore and everything except the newest files was restored.
I can get those files again from the internet, so now it is safe to make
changes to the filesystem and try to repair it.
In the end, I will create a new fs, but I can try to repair it first and
hopefully gather some helpful information.

Thank you for the help.

Tomas

------------------------------------------------------------------------

 *From:* Chris Murphy
 *Sent:*  Wednesday, July 06, 2016 8:12PM
 *To:* Tomáš Hrdina
*Cc:* Chris Murphy, Btrfs Btrfs
 *Subject:* Re: Unable to mount degraded RAID5

btrfs rescue super-recover -v /dev/sda



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Unable to mount degraded RAID5
  2016-07-09 17:30                           ` Tomáš Hrdina
@ 2016-07-09 18:33                             ` Chris Murphy
  2016-07-10  7:01                               ` Tomáš Hrdina
  0 siblings, 1 reply; 25+ messages in thread
From: Chris Murphy @ 2016-07-09 18:33 UTC (permalink / raw)
  To: Tomáš Hrdina; +Cc: Chris Murphy, Btrfs BTRFS

On Sat, Jul 9, 2016 at 11:30 AM, Tomáš Hrdina <thomas.rkh@gmail.com> wrote:
>  sudo btrfs rescue super-recover -v /dev/sda
> All Devices:
>         Device: id = 1, name = /dev/sdc
>         Device: id = 2, name = /dev/sdb
>         Device: id = 3, name = /dev/sda
>
> Before Recovering:
>         [All good supers]:
>                 device name = /dev/sdc
>                 superblock bytenr = 65536
>
>                 device name = /dev/sdc
>                 superblock bytenr = 67108864
>
>                 device name = /dev/sdc
>                 superblock bytenr = 274877906944
>
>                 device name = /dev/sdb
>                 superblock bytenr = 65536
>
>                 device name = /dev/sdb
>                 superblock bytenr = 67108864
>
>                 device name = /dev/sdb
>                 superblock bytenr = 274877906944
>
>                 device name = /dev/sda
>                 superblock bytenr = 65536
>
>                 device name = /dev/sda
>                 superblock bytenr = 67108864
>
>                 device name = /dev/sda
>                 superblock bytenr = 274877906944
>
>         [All bad supers]:
>
> All supers are valid, no need to recover
>
>
> I hope I did it right:
>
>
> sudo btrfs-image -c9 -t4 /dev/sda /mnt/btrfs-image
> parent transid verify failed on 7008807157760 wanted 70175 found 70133
> parent transid verify failed on 7008807157760 wanted 70175 found 70133
> checksum verify failed on 7008807157760 found F192848C wanted 1571393A
> checksum verify failed on 7008807157760 found F192848C wanted 1571393A
> bytenr mismatch, want=7008807157760, have=65536
> parent transid verify failed on 7009074167808 wanted 70175 found 70133
> parent transid verify failed on 7009074167808 wanted 70175 found 70133
> checksum verify failed on 7009074167808 found FDA6D1F0 wanted 19456C46
> checksum verify failed on 7009074167808 found FDA6D1F0 wanted 19456C46
> bytenr mismatch, want=7009074167808, have=65536
> Error going to next leaf -5
> create failed (Success)

I've always found that to be a confusing message at the end. If the
image you've got is in the realm of the total used amount of metadata
block groups (what you'd see with 'fi df' or 'fi usage'), then it's
probably OK.

>
>
> sudo btrfs-debug-tree /dev/sda > /mnt/btrfs-debug-tree 2>
> /mnt/btrfs-debug-tree-err
>
> btrfs-debug-tree file have 10MB
> btrfs-debug-tree-err
> http://sebsauvage.net/paste/?12cf2fb771b93bdd#Ajv5gPoxDKjaWExcJnMZLVhcU5wVw77abeZ4tIGTazU=

Huh so not very useful compared to with -d, it must be tripping up on
something. I guess you could try btrfs-progs 4.6.1 and see if you get
any different results, at the least, btrfs-debug-tree shouldn't crash.

> I used btrfs restore and everything except the newest files was restored.
> I can get those files again from the internet, so now it is safe to make
> changes to the filesystem and try to repair it.
> In the end, I will create a new fs, but I can try to repair it first and
> hopefully gather some helpful information.
>
> Thank you for help...

So now it's an open question what to try and in what order, and I'm
afraid I'm only making educated guesses. Ideally you'd set up some
kind of overlay so that you can try different sequences in a way
that's non-destructive to the original, but just make sure whatever
overlay method you use obscures the original file system from the
kernel or you run into the problem of having the same volume UUID
exposed more than once, which presently can corrupt both copies.

I would try them in the following order where you try to mount after
each and only try the next one if mount fails.

btrfs check --repair
btrfs check -r <tree root bytenr> --repair

Use the tree root bytenr you used for mostly successful recovery for
'btrfs restore -t' but if that doesn't work then I'd try all the tree
roots that btrfs-find-root reports.

btrfs check --repair --init-csum-tree
btrfs check --repair --init-extent-tree

Those aren't really related to your problem at all, it's just throwing
spaghetti at a wall. Likewise the following two:

btrfs rescue chunk-recover
btrfs rescue zero-log

In particular there's nothing about your situation that suggests zero
log ought to fix anything. But stranger things have happened.
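
The "try to mount after each step" check could be scripted roughly like
this. It is only a sketch: the function name try_mounts and the
MOUNT_CMD override are assumptions for illustration, and the option
list simply combines the read-only variants mentioned in this thread.

```shell
#!/bin/sh
# Sketch: try read-only mount options in order, stop at the first success.
# MOUNT_CMD is a hypothetical override so the loop can be dry-run.
MOUNT_CMD="${MOUNT_CMD:-mount}"

try_mounts() {
    dev="$1"
    target="$2"
    for opts in ro ro,recovery ro,degraded,recovery; do
        if $MOUNT_CMD -o "$opts" "$dev" "$target"; then
            echo "mounted with -o $opts"
            return 0
        fi
    done
    echo "all mount attempts failed; check dmesg" >&2
    return 1
}

# Example (as root), after each repair attempt:
#   try_mounts /dev/sdc /shares || dmesg | tail
```

Only move on to the next repair step when this returns non-zero.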

-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Unable to mount degraded RAID5
  2016-07-09 18:33                             ` Chris Murphy
@ 2016-07-10  7:01                               ` Tomáš Hrdina
  2016-07-10 20:08                                 ` Chris Murphy
  0 siblings, 1 reply; 25+ messages in thread
From: Tomáš Hrdina @ 2016-07-10  7:01 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

After every step, I tried to mount the fs with ro, then ro,recovery, then
ro,degraded,recovery. If that failed, I moved on to the next step.


sudo btrfs check --repair /dev/sdc
enabling repair mode
parent transid verify failed on 7008807157760 wanted 70175 found 70133
parent transid verify failed on 7008807157760 wanted 70175 found 70133
checksum verify failed on 7008807157760 found F192848C wanted 1571393A
checksum verify failed on 7008807157760 found F192848C wanted 1571393A
bytenr mismatch, want=7008807157760, have=65536
Checking filesystem on /dev/sdc
UUID: 2dab74bb-fc73-4c47-a413-a55840f6f71e
checking extents
parent transid verify failed on 7009468874752 wanted 70180 found 70133
parent transid verify failed on 7009468874752 wanted 70180 found 70133
checksum verify failed on 7009468874752 found 2B10421A wanted CFF3FFAC
checksum verify failed on 7009468874752 found 2B10421A wanted CFF3FFAC
bytenr mismatch, want=7009468874752, have=65536
parent transid verify failed on 7008859045888 wanted 70175 found 70133
parent transid verify failed on 7008859045888 wanted 70175 found 70133
checksum verify failed on 7008859045888 found 7313A127 wanted 97F01C91
checksum verify failed on 7008859045888 found 7313A127 wanted 97F01C91
bytenr mismatch, want=7008859045888, have=65536
parent transid verify failed on 7008899547136 wanted 70175 found 70133
parent transid verify failed on 7008899547136 wanted 70175 found 70133
checksum verify failed on 7008899547136 found 2B6F9045 wanted CF8C2DF3
parent transid verify failed on 7008899547136 wanted 70175 found 70133
Ignoring transid failure
leaf parent key incorrect 7008899547136
bad block 7008899547136
Errors found in extent allocation tree or chunk allocation
parent transid verify failed on 7009074167808 wanted 70175 found 70133
parent transid verify failed on 7009074167808 wanted 70175 found 70133
checksum verify failed on 7009074167808 found FDA6D1F0 wanted 19456C46
checksum verify failed on 7009074167808 found FDA6D1F0 wanted 19456C46
bytenr mismatch, want=7009074167808, have=65536


btrfs check -r <tree root bytenr> --repair

I didn't use any bytenr for recovery. Recovery worked without -t.

sudo btrfs-find-root /dev/sdc
parent transid verify failed on 7008807157760 wanted 70175 found 70133
parent transid verify failed on 7008807157760 wanted 70175 found 70133
Superblock thinks the generation is 70182
Superblock thinks the level is 1
Found tree root at 6062830010368 gen 70182 level 1
Well block 6062434418688(gen: 70181 level: 1) seems good, but
generation/level doesn't match, want gen: 70182 level: 1
Well block 6062497202176(gen: 69186 level: 0) seems good, but
generation/level doesn't match, want gen: 70182 level: 1
Well block 6062470332416(gen: 69186 level: 0) seems good, but
generation/level doesn't match, want gen: 70182 level: 1


sudo btrfs check -r 6062830010368 --repair /dev/sdc
enabling repair mode
parent transid verify failed on 7008807157760 wanted 70175 found 70133
parent transid verify failed on 7008807157760 wanted 70175 found 70133
checksum verify failed on 7008807157760 found F192848C wanted 1571393A
checksum verify failed on 7008807157760 found F192848C wanted 1571393A
bytenr mismatch, want=7008807157760, have=65536
Checking filesystem on /dev/sdc
UUID: 2dab74bb-fc73-4c47-a413-a55840f6f71e
checking extents
parent transid verify failed on 7009468874752 wanted 70180 found 70133
parent transid verify failed on 7009468874752 wanted 70180 found 70133
checksum verify failed on 7009468874752 found 2B10421A wanted CFF3FFAC
checksum verify failed on 7009468874752 found 2B10421A wanted CFF3FFAC
bytenr mismatch, want=7009468874752, have=65536
parent transid verify failed on 7008859045888 wanted 70175 found 70133
parent transid verify failed on 7008859045888 wanted 70175 found 70133
checksum verify failed on 7008859045888 found 7313A127 wanted 97F01C91
checksum verify failed on 7008859045888 found 7313A127 wanted 97F01C91
bytenr mismatch, want=7008859045888, have=65536
parent transid verify failed on 7008899547136 wanted 70175 found 70133
parent transid verify failed on 7008899547136 wanted 70175 found 70133
checksum verify failed on 7008899547136 found 2B6F9045 wanted CF8C2DF3
parent transid verify failed on 7008899547136 wanted 70175 found 70133
Ignoring transid failure
leaf parent key incorrect 7008899547136
bad block 7008899547136
Errors found in extent allocation tree or chunk allocation
parent transid verify failed on 7009074167808 wanted 70175 found 70133
parent transid verify failed on 7009074167808 wanted 70175 found 70133
checksum verify failed on 7009074167808 found FDA6D1F0 wanted 19456C46
checksum verify failed on 7009074167808 found FDA6D1F0 wanted 19456C46
bytenr mismatch, want=7009074167808, have=65536


sudo btrfs check -r 6062434418688 --repair /dev/sdc
enabling repair mode
parent transid verify failed on 6062434418688 wanted 70182 found 70181
parent transid verify failed on 6062434418688 wanted 70182 found 70181
checksum verify failed on 6062434418688 found F868085E wanted 1C8BB5E8
parent transid verify failed on 6062434418688 wanted 70182 found 70181
Ignoring transid failure
parent transid verify failed on 7008807157760 wanted 70175 found 70133
parent transid verify failed on 7008807157760 wanted 70175 found 70133
checksum verify failed on 7008807157760 found F192848C wanted 1571393A
checksum verify failed on 7008807157760 found F192848C wanted 1571393A
bytenr mismatch, want=7008807157760, have=65536
Checking filesystem on /dev/sdc
UUID: 2dab74bb-fc73-4c47-a413-a55840f6f71e
checking extents
parent transid verify failed on 6063240511488 wanted 70175 found 70132
parent transid verify failed on 6063240511488 wanted 70175 found 70132
checksum verify failed on 6063240511488 found E9831D76 wanted 0D60A0C0
checksum verify failed on 6063240511488 found E9831D76 wanted 0D60A0C0
bytenr mismatch, want=6063240511488, have=65536
parent transid verify failed on 7398301417472 wanted 70180 found 70134
parent transid verify failed on 7398301417472 wanted 70180 found 70134
checksum verify failed on 7398301417472 found B2FA98AB wanted 5619251D
checksum verify failed on 7398301417472 found B2FA98AB wanted 5619251D
bytenr mismatch, want=7398301417472, have=65536
parent transid verify failed on 7398262865920 wanted 70180 found 70133
parent transid verify failed on 7398262865920 wanted 70180 found 70133
checksum verify failed on 7398262865920 found 37B272E5 wanted D351CF53
parent transid verify failed on 7398262865920 wanted 70180 found 70133
Ignoring transid failure
leaf parent key incorrect 7398262865920
parent transid verify failed on 7398398099456 wanted 70180 found 70134
parent transid verify failed on 7398398099456 wanted 70180 found 70134
checksum verify failed on 7398398099456 found 1923B74F wanted FDC00AF9
checksum verify failed on 7398398099456 found 1923B74F wanted FDC00AF9
bytenr mismatch, want=7398398099456, have=65536
parent transid verify failed on 7398398099456 wanted 70180 found 70134
parent transid verify failed on 7398398099456 wanted 70180 found 70134
checksum verify failed on 7398398099456 found 1923B74F wanted FDC00AF9
checksum verify failed on 7398398099456 found 1923B74F wanted FDC00AF9
bytenr mismatch, want=7398398099456, have=65536
parent transid verify failed on 7009449263104 wanted 70180 found 70133
parent transid verify failed on 7009449263104 wanted 70180 found 70133
checksum verify failed on 7009449263104 found AD1A4120 wanted 49F9FC96
checksum verify failed on 7009449263104 found AD1A4120 wanted 49F9FC96
bytenr mismatch, want=7009449263104, have=65536
parent transid verify failed on 7398308003840 wanted 70180 found 70134
parent transid verify failed on 7398308003840 wanted 70180 found 70134
checksum verify failed on 7398308003840 found 9162951D wanted 758128AB
checksum verify failed on 7398308003840 found 9162951D wanted 758128AB
bytenr mismatch, want=7398308003840, have=65536
parent transid verify failed on 7009456766976 wanted 70180 found 70133
parent transid verify failed on 7009456766976 wanted 70180 found 70133
checksum verify failed on 7009456766976 found 0A20BD0C wanted EEC300BA
checksum verify failed on 7009456766976 found 0A20BD0C wanted EEC300BA
bytenr mismatch, want=7009456766976, have=65536
parent transid verify failed on 7398971736064 wanted 70180 found 70134
parent transid verify failed on 7398971736064 wanted 70180 found 70134
checksum verify failed on 7398971736064 found 39868CDB wanted DD65316D
checksum verify failed on 7398971736064 found 39868CDB wanted DD65316D
bytenr mismatch, want=7398971736064, have=65536
parent transid verify failed on 7398171967488 wanted 70180 found 70133
parent transid verify failed on 7398171967488 wanted 70180 found 70133
checksum verify failed on 7398171967488 found 372EF754 wanted D3CD4AE2
checksum verify failed on 7398171967488 found 372EF754 wanted D3CD4AE2
bytenr mismatch, want=7398171967488, have=65536
parent transid verify failed on 7009468596224 wanted 70180 found 70133
parent transid verify failed on 7009468596224 wanted 70180 found 70133
checksum verify failed on 7009468596224 found CE38C9D6 wanted 2ADB7460
parent transid verify failed on 7009468596224 wanted 70180 found 70133
Ignoring transid failure
leaf parent key incorrect 7009468596224
parent transid verify failed on 7398199115776 wanted 70180 found 70133
parent transid verify failed on 7398199115776 wanted 70180 found 70133
checksum verify failed on 7398199115776 found 90F857D8 wanted 741BEA6E
checksum verify failed on 7398199115776 found 90F857D8 wanted 741BEA6E
bytenr mismatch, want=7398199115776, have=65536
parent transid verify failed on 7398207799296 wanted 70180 found 70133
parent transid verify failed on 7398207799296 wanted 70180 found 70133
checksum verify failed on 7398207799296 found 99BAD070 wanted 7D596DC6
checksum verify failed on 7398207799296 found 99BAD070 wanted 7D596DC6
bytenr mismatch, want=7398207799296, have=65536
parent transid verify failed on 7009468874752 wanted 70180 found 70133
parent transid verify failed on 7009468874752 wanted 70180 found 70133
checksum verify failed on 7009468874752 found 2B10421A wanted CFF3FFAC
checksum verify failed on 7009468874752 found 2B10421A wanted CFF3FFAC
bytenr mismatch, want=7009468874752, have=65536
parent transid verify failed on 7008859045888 wanted 70175 found 70133
parent transid verify failed on 7008859045888 wanted 70175 found 70133
checksum verify failed on 7008859045888 found 7313A127 wanted 97F01C91
checksum verify failed on 7008859045888 found 7313A127 wanted 97F01C91
bytenr mismatch, want=7008859045888, have=65536
parent transid verify failed on 7008899547136 wanted 70175 found 70133
parent transid verify failed on 7008899547136 wanted 70175 found 70133
checksum verify failed on 7008899547136 found 2B6F9045 wanted CF8C2DF3
parent transid verify failed on 7008899547136 wanted 70175 found 70133
Ignoring transid failure
leaf parent key incorrect 7008899547136
bad block 7008899547136
Errors found in extent allocation tree or chunk allocation
parent transid verify failed on 7398682050560 wanted 70180 found 70134
parent transid verify failed on 7398682050560 wanted 70180 found 70134
checksum verify failed on 7398682050560 found 3B3B2ADC wanted DFD8976A
checksum verify failed on 7398682050560 found 3B3B2ADC wanted DFD8976A
bytenr mismatch, want=7398682050560, have=65536


sudo btrfs check -r 6062497202176 --repair /dev/sdc
enabling repair mode
parent transid verify failed on 6062497202176 wanted 70182 found 69186
parent transid verify failed on 6062497202176 wanted 70182 found 69186
checksum verify failed on 6062497202176 found 41994FE1 wanted A57AF257
checksum verify failed on 6062497202176 found 41994FE1 wanted A57AF257
bytenr mismatch, want=6062497202176, have=65536
Couldn't read tree root
Couldn't open file system


sudo btrfs check -r 6062470332416 --repair /dev/sdc
enabling repair mode
parent transid verify failed on 6062470332416 wanted 70182 found 69186
parent transid verify failed on 6062470332416 wanted 70182 found 69186
checksum verify failed on 6062470332416 found 46F2EC59 wanted A21151EF
checksum verify failed on 6062470332416 found 46F2EC59 wanted A21151EF
bytenr mismatch, want=6062470332416, have=65536
Couldn't read tree root
Couldn't open file system


sudo btrfs check --repair --init-csum-tree /dev/sdc
enabling repair mode
Creating a new CRC tree
parent transid verify failed on 7008807157760 wanted 70175 found 70133
parent transid verify failed on 7008807157760 wanted 70175 found 70133
checksum verify failed on 7008807157760 found F192848C wanted 1571393A
checksum verify failed on 7008807157760 found F192848C wanted 1571393A
bytenr mismatch, want=7008807157760, have=65536
Checking filesystem on /dev/sdc
UUID: 2dab74bb-fc73-4c47-a413-a55840f6f71e
Reinit crc root
parent transid verify failed on 7009074167808 wanted 70175 found 70133
parent transid verify failed on 7009074167808 wanted 70175 found 70133
checksum verify failed on 7009074167808 found FDA6D1F0 wanted 19456C46
checksum verify failed on 7009074167808 found FDA6D1F0 wanted 19456C46
bytenr mismatch, want=7009074167808, have=65536
parent transid verify failed on 7009074167808 wanted 70175 found 70133
parent transid verify failed on 7009074167808 wanted 70175 found 70133
checksum verify failed on 7009074167808 found FDA6D1F0 wanted 19456C46
checksum verify failed on 7009074167808 found FDA6D1F0 wanted 19456C46
bytenr mismatch, want=7009074167808, have=65536
Unable to find block group for 0
extent-tree.c:289: find_search_start: Assertion `1` failed.
btrfs(btrfs_reserve_extent+0x8f9)[0x45140a]
btrfs(btrfs_alloc_free_block+0x60)[0x451794]
btrfs[0x41d2d5]
btrfs(cmd_check+0xfe8)[0x42d0f5]
btrfs(main+0x155)[0x40a433]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7ffa55325830]
btrfs(_start+0x29)[0x40a029]


sudo btrfs check --repair --init-extent-tree /dev/sdc
enabling repair mode
Checking filesystem on /dev/sdc
UUID: 2dab74bb-fc73-4c47-a413-a55840f6f71e
Creating a new extent tree
Failed to find [6062434598912, 168, 16384]
btrfs unable to find ref byte nr 6062830010368 parent 0 root 1  owner 1 offset 0
Failed to find [6062434615296, 168, 16384]
btrfs unable to find ref byte nr 6062830174208 parent 0 root 1  owner 0 offset 1
parent transid verify failed on 7398212452352 wanted 70180 found 70133
parent transid verify failed on 7398212452352 wanted 70180 found 70133
checksum verify failed on 7398212452352 found B2C4F638 wanted 56274B8E
checksum verify failed on 7398212452352 found B2C4F638 wanted 56274B8E
bytenr mismatch, want=7398212452352, have=65536
Error reading data reloc tree
error resetting the pending balance
transaction.h:42: btrfs_start_transaction: Assertion `fs_info->running_transaction` failed.
btrfs[0x4468e6]
btrfs(close_ctree_fs_info+0x184)[0x448d59]
btrfs(cmd_check+0x3010)[0x42f11d]
btrfs(main+0x155)[0x40a433]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f34be581830]
btrfs(_start+0x29)[0x40a029]


I believe you meant rescue command:

sudo btrfs rescue chunk-recover -v /dev/sdc
All Devices:
        Device: id = 1, name = /dev/sdd
        Device: id = 3, name = /dev/sdb
        Device: id = 2, name = /dev/sdc

Scanning: 2808088981504 in dev0, 2759780548608 in dev1, 2978000535552 in dev2
scan chunk headers error
Chunk tree recovery aborted


sudo btrfs rescue zero-log /dev/sdc
parent transid verify failed on 7008807157760 wanted 70175 found 70133
parent transid verify failed on 7008807157760 wanted 70175 found 70133
checksum verify failed on 7008807157760 found F192848C wanted 1571393A
checksum verify failed on 7008807157760 found F192848C wanted 1571393A
bytenr mismatch, want=7008807157760, have=65536
Clearing log on /dev/sdc, previous log_root 0, level 0
parent transid verify failed on 7009074167808 wanted 70175 found 70133
parent transid verify failed on 7009074167808 wanted 70175 found 70133
checksum verify failed on 7009074167808 found FDA6D1F0 wanted 19456C46
checksum verify failed on 7009074167808 found FDA6D1F0 wanted 19456C46
bytenr mismatch, want=7009074167808, have=65536
parent transid verify failed on 7009074167808 wanted 70175 found 70133
parent transid verify failed on 7009074167808 wanted 70175 found 70133
checksum verify failed on 7009074167808 found FDA6D1F0 wanted 19456C46
checksum verify failed on 7009074167808 found FDA6D1F0 wanted 19456C46
bytenr mismatch, want=7009074167808, have=65536
Unable to find block group for 0
extent-tree.c:289: find_search_start: Assertion `1` failed.
btrfs(btrfs_reserve_extent+0x8f9)[0x45140a]
btrfs(btrfs_alloc_free_block+0x60)[0x451794]
btrfs(__btrfs_cow_block+0x1a7)[0x4406dc]
btrfs(btrfs_cow_block+0x102)[0x441161]
btrfs[0x446ce3]
btrfs(btrfs_commit_transaction+0xec)[0x448ac7]
btrfs[0x432d3d]
btrfs(handle_command_group+0x5d)[0x40a2d9]
btrfs(cmd_rescue+0x15)[0x432d71]
btrfs(main+0x155)[0x40a433]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7fb21969b830]
btrfs(_start+0x29)[0x40a029]


So far, no luck. I still can't mount.

Thank you
Tomas

------------------------------------------------------------------------

 *From:* Chris Murphy
 *Sent:*  Saturday, July 09, 2016 8:33PM
 *To:* Tomáš Hrdina
*Cc:* Chris Murphy, Btrfs Btrfs
 *Subject:* Re: Unable to mount degraded RAID5

btrfs check --repair



* Re: Unable to mount degraded RAID5
  2016-07-10  7:01                               ` Tomáš Hrdina
@ 2016-07-10 20:08                                 ` Chris Murphy
  2016-07-11 17:17                                   ` Tomáš Hrdina
  0 siblings, 1 reply; 25+ messages in thread
From: Chris Murphy @ 2016-07-10 20:08 UTC (permalink / raw)
  To: Tomáš Hrdina; +Cc: Chris Murphy, Btrfs BTRFS

On Sun, Jul 10, 2016 at 1:01 AM, Tomáš Hrdina <thomas.rkh@gmail.com> wrote:

> sudo btrfs check --repair /dev/sdc
> enabling repair mode
> parent transid verify failed on 7008807157760 wanted 70175 found 70133
> parent transid verify failed on 7008807157760 wanted 70175 found 70133
> checksum verify failed on 7008807157760 found F192848C wanted 1571393A
> checksum verify failed on 7008807157760 found F192848C wanted 1571393A
> bytenr mismatch, want=7008807157760, have=65536
> Checking filesystem on /dev/sdc
> UUID: 2dab74bb-fc73-4c47-a413-a55840f6f71e
> checking extents
> parent transid verify failed on 7009468874752 wanted 70180 found 70133
> parent transid verify failed on 7009468874752 wanted 70180 found 70133
> checksum verify failed on 7009468874752 found 2B10421A wanted CFF3FFAC
> checksum verify failed on 7009468874752 found 2B10421A wanted CFF3FFAC
> bytenr mismatch, want=7009468874752, have=65536
> parent transid verify failed on 7008859045888 wanted 70175 found 70133
> parent transid verify failed on 7008859045888 wanted 70175 found 70133
> checksum verify failed on 7008859045888 found 7313A127 wanted 97F01C91
> checksum verify failed on 7008859045888 found 7313A127 wanted 97F01C91
> bytenr mismatch, want=7008859045888, have=65536
> parent transid verify failed on 7008899547136 wanted 70175 found 70133
> parent transid verify failed on 7008899547136 wanted 70175 found 70133
> checksum verify failed on 7008899547136 found 2B6F9045 wanted CF8C2DF3
> parent transid verify failed on 7008899547136 wanted 70175 found 70133
> Ignoring transid failure
> leaf parent key incorrect 7008899547136
> bad block 7008899547136
> Errors found in extent allocation tree or chunk allocation
> parent transid verify failed on 7009074167808 wanted 70175 found 70133
> parent transid verify failed on 7009074167808 wanted 70175 found 70133
> checksum verify failed on 7009074167808 found FDA6D1F0 wanted 19456C46
> checksum verify failed on 7009074167808 found FDA6D1F0 wanted 19456C46
> bytenr mismatch, want=7009074167808, have=65536

OK, well, it was all a goose chase then. These are the same messages
as 4 days ago. The central problem appears to be checksum verification
failures on multiple blocks, which really doesn't make sense to me
because it should be able to reconstruct them from parity.
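
One way to see how many distinct blocks are affected, and how far behind each one is, is to condense the repeated messages into one line per block. A minimal sketch, assuming the check output was saved to a text file and keeps the exact wording shown in this thread; transid_summary is a hypothetical helper name:

```sh
# transid_summary: read saved `btrfs check` output on stdin and print one
# line per distinct block: "bytenr wanted found gap", with gap = wanted - found.
# Only unquoted "parent transid verify failed on ..." lines are considered.
transid_summary() {
  awk '$1 == "parent" && $2 == "transid" && $3 == "verify" && !seen[$6]++ {
    # $6 is printed as a string so large byte numbers are not reformatted
    printf "%s %d %d %d\n", $6, $8, $10, $8 - $10
  }'
}
```

Usage, with a hypothetical file name: transid_summary < check-output.txt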

How is it possible to have four root trees, all of which point to
different leaf/nodes, all of which have some kind of checksum failure?
None of them are good? And none of them can be reconstructed? Sounds
fishy.

You could try plugging each of those bytenrs into

btrfs-debug-tree -b <bytenr>

and see if it'll show you what leaf information is there that it
doesn't like. But if there's a csum mismatch, it may refuse to show
anything, rather than show it and say it's unreliable due to the csum
mismatch. If it refuses, you could plug each of those failed bytenrs
into

btrfs-map-logical -l <bytenr>

to get a device and physical offset, then read the entire leaf, compute
a new csum, and overwrite the current one. That way the block passes
the csum check, and you can see whether that was the only problem or
whether there's another brick wall later. Of course, if the csum was
correct and it's the metadata that's bad, honoring bad metadata as
valid might cause a bad fix, and then the whole thing implodes. But
you're pretty much there already, I'd say.
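
The lookup step can be sketched as a small filter that turns saved check output into the two inspection commands named above. This assumes the "checksum verify failed on" wording shown in this thread; failed_bytenrs and print_inspect_commands are hypothetical helper names, and the commands are only printed, never run:

```sh
# failed_bytenrs: print each distinct bytenr that appears in a
# "checksum verify failed on <bytenr> ..." line, in first-seen order.
failed_bytenrs() {
  awk '$1 == "checksum" && $2 == "verify" && $3 == "failed" && !seen[$5]++ { print $5 }'
}

# print_inspect_commands DEVICE: for every bytenr on stdin, print the
# debug-tree and map-logical invocations for DEVICE. Nothing is executed.
print_inspect_commands() {
  while read -r bytenr; do
    printf 'btrfs-debug-tree -b %s %s\n' "$bytenr" "$1"
    printf 'btrfs-map-logical -l %s %s\n' "$bytenr" "$1"
  done
}
```

Usage, with a hypothetical file name: failed_bytenrs < check-output.txt | print_inspect_commands /dev/sdc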

If I were to pick an address to start with, it'd be this one.

> leaf parent key incorrect 7008899547136
> bad block 7008899547136

But other than that, I'm out of ideas. It's completely reasonable to
just give up at this point.

-- 
Chris Murphy


* Re: Unable to mount degraded RAID5
  2016-07-10 20:08                                 ` Chris Murphy
@ 2016-07-11 17:17                                   ` Tomáš Hrdina
  2016-07-11 19:25                                     ` Chris Murphy
  0 siblings, 1 reply; 25+ messages in thread
From: Tomáš Hrdina @ 2016-07-11 17:17 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

sudo btrfs-debug-tree /dev/sdc
The output is 200000 lines long. I don't know what you use for sharing bigger files.


sudo btrfs-debug-tree -b 6062434418688 /dev/sdc
http://sebsauvage.net/paste/?fc156dee1d1deb3b#YpG/TA0H3I313jMuC4pgsdj++TcuDaFwWIBeuuOXfCA=
 
sudo btrfs-debug-tree -b 6062497202176 /dev/sdc
http://sebsauvage.net/paste/?86621abec9c239bd#kwTpZ7BZLcLw71yCfr3jHKZT08zsaXK3RgdFo7MFoFc=


sudo btrfs-debug-tree -b 6062470332416 /dev/sdc
http://sebsauvage.net/paste/?4ff40fa0b6b201c9#nFk7pT9MLj2w9egUJlgXdkmCkWyp1vSG0kADfq3J7eA=

That produced some results, but I don't know what to look for.


sudo btrfs-map-logical -l 7008899547136 /dev/sdc
parent transid verify failed on 7008807157760 wanted 70175 found 70133
parent transid verify failed on 7008807157760 wanted 70175 found 70133
checksum verify failed on 7008807157760 found F192848C wanted 1571393A
checksum verify failed on 7008807157760 found F192848C wanted 1571393A
bytenr mismatch, want=7008807157760, have=65536
mirror 1 logical 7008899547136 physical 735226609664 device /dev/sdb
mirror 2 logical 7008899547136 physical 3166748524544 device /dev/sdc


I also don't know what to do with this, or how to compute a new csum.
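
For the "get the entire leaf" step, the mirror lines above already carry the device and byte offset. A rough sketch that turns them into dd commands, under several assumptions: a 16384-byte nodesize (suggested by the 16384 in the "Failed to find" lines earlier, but not confirmed), GNU dd for iflag=skip_bytes, and hypothetical names for the helper and output files. Whether the reported offsets point at usable copies on a degraded RAID5 is not something this can establish. The commands are printed, not run:

```sh
# print_dd_commands NODESIZE: read `btrfs-map-logical` output on stdin and,
# for each "mirror N logical L physical P device D" line, print a dd command
# that would copy NODESIZE bytes starting at byte offset P of device D.
print_dd_commands() {
  awk -v size="$1" '$1 == "mirror" && $5 == "physical" && $7 == "device" {
    printf "dd if=%s of=leaf-mirror%s.bin bs=%s count=1 iflag=skip_bytes skip=%s\n", $8, $2, size, $6
  }'
}
```

Usage, with a hypothetical file name: print_dd_commands 16384 < map-logical.txt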

For me, it would be ok to give up and just start fresh.

Thank you
Tomas
------------------------------------------------------------------------

 *From:* Chris Murphy
 *Sent:*  Sunday, July 10, 2016 10:08PM
 *To:* Tomáš Hrdina
*Cc:* Chris Murphy, Btrfs Btrfs
 *Subject:* Re: Unable to mount degraded RAID5

btrfs-debug-tree -b



* Re: Unable to mount degraded RAID5
  2016-07-11 17:17                                   ` Tomáš Hrdina
@ 2016-07-11 19:25                                     ` Chris Murphy
  0 siblings, 0 replies; 25+ messages in thread
From: Chris Murphy @ 2016-07-11 19:25 UTC (permalink / raw)
  To: Tomáš Hrdina; +Cc: Chris Murphy, Btrfs BTRFS

On Mon, Jul 11, 2016 at 11:17 AM, Tomáš Hrdina <thomas.rkh@gmail.com> wrote:
> sudo btrfs-debug-tree /dev/sdc
> The output is 200000 lines long. I don't know what you use for sharing bigger files.
>
>
> sudo btrfs-debug-tree -b 6062434418688 /dev/sdc
> http://sebsauvage.net/paste/?fc156dee1d1deb3b#YpG/TA0H3I313jMuC4pgsdj++TcuDaFwWIBeuuOXfCA=
>
> sudo btrfs-debug-tree -b 6062497202176 /dev/sdc
> http://sebsauvage.net/paste/?86621abec9c239bd#kwTpZ7BZLcLw71yCfr3jHKZT08zsaXK3RgdFo7MFoFc=
>
>
> sudo btrfs-debug-tree -b 6062470332416 /dev/sdc
> http://sebsauvage.net/paste/?4ff40fa0b6b201c9#nFk7pT9MLj2w9egUJlgXdkmCkWyp1vSG0kADfq3J7eA=

None of these have anything useful in them; there's no tree root there.


>
> That produced some results, but I don't know what to look for.
>
>
> sudo btrfs-map-logical -l 7008899547136 /dev/sdc
> parent transid verify failed on 7008807157760 wanted 70175 found 70133
> parent transid verify failed on 7008807157760 wanted 70175 found 70133
> checksum verify failed on 7008807157760 found F192848C wanted 1571393A
> checksum verify failed on 7008807157760 found F192848C wanted 1571393A
> bytenr mismatch, want=7008807157760, have=65536
> mirror 1 logical 7008899547136 physical 735226609664 device /dev/sdb
> mirror 2 logical 7008899547136 physical 3166748524544 device /dev/sdc
>
>
> I also don't know what to do with this, or how to compute a new csum.

Right, that's pretty tricky to do manually.
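
For reference, the computation itself can be sketched in plain shell. This assumes, as far as I understand the btrfs on-disk format, that the first 32 bytes of a tree block hold the checksum field and that the stored value is a standard CRC-32C of the remaining bytes; neither point is confirmed in this thread. The function names are hypothetical, and the sketch only prints a value; it does not write anything back:

```sh
# crc32c_stream: CRC-32C (Castagnoli, reflected polynomial 0x82F63B78) of the
# bytes on stdin, printed as 8 upper-case hex digits. Bit-at-a-time, so slow,
# but adequate for a single small block. Runs in a subshell so its variables
# stay local.
crc32c_stream() (
  crc=$((0xFFFFFFFF))
  # od emits every input byte as an unsigned decimal number
  for byte in $(od -An -v -tu1); do
    crc=$((crc ^ byte))
    for bit in 1 2 3 4 5 6 7 8; do
      if [ $((crc & 1)) -eq 1 ]; then
        crc=$(( (crc >> 1) ^ 0x82F63B78 ))
      else
        crc=$((crc >> 1))
      fi
    done
  done
  printf '%08X\n' $((crc ^ 0xFFFFFFFF))
)

# tree_block_csum FILE: CRC-32C of everything after the first 32 bytes of
# FILE (assumed to be one tree block dumped from disk).
tree_block_csum() {
  tail -c +33 "$1" | crc32c_stream
}
```

Usage, with a hypothetical dump file: tree_block_csum leaf-mirror1.bin. The result could then be compared against the "found" and "wanted" values that btrfs check prints for that block.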



>
> For me, it would be ok to give up and just start fresh.

OK in that case do that.


-- 
Chris Murphy


[parent not found: <CAFDLS-CtnVDtD8d=Wtp0tVokKJ6pjptpX7MR862dThBJvSPC5g@mail.gmail.com>]

* Fwd: Unable to mount degraded RAID5
       [not found] <CAFDLS-CtnVDtD8d=Wtp0tVokKJ6pjptpX7MR862dThBJvSPC5g@mail.gmail.com>
@ 2016-07-06 17:12 ` Gonzalo Gomez-Arrue Azpiazu
  2016-07-06 18:19   ` Chris Murphy
  0 siblings, 1 reply; 25+ messages in thread
From: Gonzalo Gomez-Arrue Azpiazu @ 2016-07-06 17:12 UTC (permalink / raw)
  To: linux-btrfs

Hello,

I had a RAID5 with 3 disks and one failed; now the filesystem cannot be mounted.

None of the recommendations that I found seem to work. The situation
seems to be similar to this one:
http://www.spinics.net/lists/linux-btrfs/msg56825.html

Any suggestion on what to try next?

Thanks a lot in advance!

sudo btrfs version
btrfs-progs v4.4

uname -a
Linux ubuntu 4.4.0-21-generic #37-Ubuntu SMP Mon Apr 18 18:33:37 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

sudo btrfs fi show
warning, device 2 is missing
checksum verify failed on 2339175972864 found A781ADC2 wanted 43621074
checksum verify failed on 2339175972864 found A781ADC2 wanted 43621074
bytenr mismatch, want=2339175972864, have=65536
Couldn't read chunk root
Label: none  uuid: 495efbc6-2f62-4cd7-962b-7ae3d0e929f1
Total devices 3 FS bytes used 1.29TiB
devid    1 size 2.73TiB used 674.03GiB path /dev/sdc1
devid    3 size 2.73TiB used 674.03GiB path /dev/sdd1
*** Some devices missing

sudo mount -t btrfs -o ro,degraded,recovery /dev/sdc1 /btrfs
mount: wrong fs type, bad option, bad superblock on /dev/sdc1,
       missing codepage or helper program, or other error

       In some cases useful info is found in syslog - try
       dmesg | tail or so.

dmesg | tail
[ 2440.036368] BTRFS info (device sdd1): allowing degraded mounts
[ 2440.036383] BTRFS info (device sdd1): enabling auto recovery
[ 2440.036390] BTRFS info (device sdd1): disk space caching is enabled
[ 2440.037928] BTRFS warning (device sdd1): devid 2 uuid 0c7d7db2-6a27-4b19-937b-b6266ba81257 is missing
[ 2440.652085] BTRFS info (device sdd1): bdev (null) errs: wr 1413, rd 362, flush 471, corrupt 0, gen 0
[ 2441.359066] BTRFS error (device sdd1): bad tree block start 0 833766391808
[ 2441.359306] BTRFS error (device sdd1): bad tree block start 0 833766391808
[ 2441.359330] BTRFS: Failed to read block groups: -5
[ 2441.383793] BTRFS: open_ctree failed

sudo btrfs restore /dev/sdc1 /bkp
warning, device 2 is missing
checksum verify failed on 2339175972864 found A781ADC2 wanted 43621074
checksum verify failed on 2339175972864 found A781ADC2 wanted 43621074
bytenr mismatch, want=2339175972864, have=65536
Couldn't read chunk root
Could not open root, trying backup super
warning, device 2 is missing
warning, device 3 is missing
checksum verify failed on 2339175972864 found A781ADC2 wanted 43621074
checksum verify failed on 2339175972864 found A781ADC2 wanted 43621074
bytenr mismatch, want=2339175972864, have=65536
Couldn't read chunk root
Could not open root, trying backup super
warning, device 2 is missing
warning, device 3 is missing
checksum verify failed on 2339175972864 found A781ADC2 wanted 43621074
checksum verify failed on 2339175972864 found A781ADC2 wanted 43621074
bytenr mismatch, want=2339175972864, have=65536
Couldn't read chunk root
Could not open root, trying backup super

sudo btrfs-show-super -fa /dev/sdc1
http://sebsauvage.net/paste/?d79e9e9c385cf1a5#fNwoEj5o2aQ6T7nDl4vjrFqEJG0SHeVpmGknbbCVnd0=

sudo btrfs-find-root /dev/sdc1
warning, device 2 is missing
Couldn't read chunk root
Open ctree failed


* Re: Unable to mount degraded RAID5
  2016-07-06 17:12 ` Fwd: " Gonzalo Gomez-Arrue Azpiazu
@ 2016-07-06 18:19   ` Chris Murphy
  2016-07-07 12:24     ` Gonzalo Gomez-Arrue Azpiazu
  0 siblings, 1 reply; 25+ messages in thread
From: Chris Murphy @ 2016-07-06 18:19 UTC (permalink / raw)
  To: Gonzalo Gomez-Arrue Azpiazu; +Cc: Btrfs BTRFS

On Wed, Jul 6, 2016 at 11:12 AM, Gonzalo Gomez-Arrue Azpiazu
<ggomarr@gmail.com> wrote:
> Hello,
>
> I had a RAID5 with 3 disks and one failed; now the filesystem cannot be mounted.
>
> None of the recommendations that I found seem to work. The situation
> seems to be similar to this one:
> http://www.spinics.net/lists/linux-btrfs/msg56825.html
>
> Any suggestion on what to try next?

Basically if you are degraded *and* it runs into additional errors,
then it's broken because raid5 only protects against one device error.
The main problem is if it can't read the chunk root it's hard for any
tool to recover data because the chunk tree mapping is vital to
finding data.

What do you get for:
btrfs rescue super-recover -v /dev/sdc1

It's a problem with the chunk tree because all of your super blocks
point to the same chunk tree root so there isn't another one to try.

>sudo btrfs-find-root /dev/sdc1
>warning, device 2 is missing
>Couldn't read chunk root
>Open ctree failed

It's bad news. I'm not even sure 'btrfs restore' can help this case.

-- 
Chris Murphy


* Re: Unable to mount degraded RAID5
  2016-07-06 18:19   ` Chris Murphy
@ 2016-07-07 12:24     ` Gonzalo Gomez-Arrue Azpiazu
  0 siblings, 0 replies; 25+ messages in thread
From: Gonzalo Gomez-Arrue Azpiazu @ 2016-07-07 12:24 UTC (permalink / raw)
  To: linux-btrfs

Thanks a lot; your willingness to help out someone you do not know (and
who is obviously way out of his depth) is inspiring.

This is what it says:

btrfs rescue super-recover -v /dev/sdc1
All Devices:
Device: id = 3, name = /dev/sdd1
Device: id = 1, name = /dev/sdc1

Before Recovering:
[All good supers]:
device name = /dev/sdd1
superblock bytenr = 65536

device name = /dev/sdd1
superblock bytenr = 67108864

device name = /dev/sdd1
superblock bytenr = 274877906944

device name = /dev/sdc1
superblock bytenr = 65536

device name = /dev/sdc1
superblock bytenr = 67108864

device name = /dev/sdc1
superblock bytenr = 274877906944

[All bad supers]:

All supers are valid, no need to recover
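
The three superblock bytenr values listed for each device match the fixed offsets at which btrfs keeps its superblock copies: 64 KiB, 64 MiB, and 256 GiB. A quick arithmetic check (superblock_offsets is just an illustrative name):

```sh
# superblock_offsets: print the byte offsets of the three btrfs superblock
# copies, 64 KiB, 64 MiB and 256 GiB, one per line.
superblock_offsets() {
  echo $((64 * 1024))
  echo $((64 * 1024 * 1024))
  echo $((256 * 1024 * 1024 * 1024))
}
```

The printed values should line up with the "superblock bytenr" lines above.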

Any suggestion on what to do next?

(again, really appreciated - I hope to be able to give back the
support I am receiving at some point!)

On Wed, Jul 6, 2016 at 9:19 PM, Chris Murphy <lists@colorremedies.com> wrote:
> On Wed, Jul 6, 2016 at 11:12 AM, Gonzalo Gomez-Arrue Azpiazu
> <ggomarr@gmail.com> wrote:
>> Hello,
>>
>> I had a RAID5 with 3 disks and one failed; now the filesystem cannot be mounted.
>>
>> None of the recommendations that I found seem to work. The situation
>> seems to be similar to this one:
>> http://www.spinics.net/lists/linux-btrfs/msg56825.html
>>
>> Any suggestion on what to try next?
>
> Basically if you are degraded *and* it runs into additional errors,
> then it's broken because raid5 only protects against one device error.
> The main problem is if it can't read the chunk root it's hard for any
> tool to recover data because the chunk tree mapping is vital to
> finding data.
>
> What do you get for:
> btrfs rescue super-recover -v /dev/sdc1
>
> It's a problem with the chunk tree because all of your super blocks
> point to the same chunk tree root so there isn't another one to try.
>
>>sudo btrfs-find-root /dev/sdc1
>>warning, device 2 is missing
>>Couldn't read chunk root
>>Open ctree failed
>
> It's bad news. I'm not even sure 'btrfs restore' can help this case.
>
>
> --
> Chris Murphy


end of thread, other threads:[~2016-07-11 19:25 UTC | newest]

Thread overview: 25+ messages
2016-07-04 18:09 Unable to mount degraded RAID5 Tomáš Hrdina
2016-07-04 18:41 ` Chris Murphy
     [not found]   ` <95f58623-95a4-b5d2-fa3a-bfb957840a31@gmail.com>
2016-07-04 19:01     ` Chris Murphy
2016-07-04 19:11       ` Tomáš Hrdina
2016-07-04 20:43         ` Chris Murphy
2016-07-04 21:10           ` Tomáš Hrdina
2016-07-04 22:42             ` Chris Murphy
2016-07-04 22:59               ` Chris Murphy
2016-07-05  7:12               ` Tomáš Hrdina
2016-07-05  3:48           ` Andrei Borzenkov
2016-07-05 15:13             ` Chris Murphy
2016-07-05 18:40               ` Tomáš Hrdina
2016-07-05 23:19                 ` Chris Murphy
2016-07-06  8:07                   ` Tomáš Hrdina
2016-07-06 16:08                     ` Chris Murphy
2016-07-06 17:50                       ` Tomáš Hrdina
2016-07-06 18:12                         ` Chris Murphy
2016-07-09 17:30                           ` Tomáš Hrdina
2016-07-09 18:33                             ` Chris Murphy
2016-07-10  7:01                               ` Tomáš Hrdina
2016-07-10 20:08                                 ` Chris Murphy
2016-07-11 17:17                                   ` Tomáš Hrdina
2016-07-11 19:25                                     ` Chris Murphy
     [not found] <CAFDLS-CtnVDtD8d=Wtp0tVokKJ6pjptpX7MR862dThBJvSPC5g@mail.gmail.com>
2016-07-06 17:12 ` Fwd: " Gonzalo Gomez-Arrue Azpiazu
2016-07-06 18:19   ` Chris Murphy
2016-07-07 12:24     ` Gonzalo Gomez-Arrue Azpiazu
