Linux Btrfs filesystem development
* Self-destruct of btrfs RAID6 array
@ 2015-11-20  4:11 Paul Loewenstein
  2015-11-20  6:19 ` Duncan
  2015-11-20 13:29 ` Austin S Hemmelgarn
  0 siblings, 2 replies; 3+ messages in thread
From: Paul Loewenstein @ 2015-11-20  4:11 UTC (permalink / raw)
  To: linux-btrfs

I have just had an apparently catastrophic collapse of a large RAID6 
array.  I was hoping that the dual-redundancy of a RAID6 array would 
compensate for having no backup media large enough to back it up!

Any suggestions for repairing this array, at least to the point of 
mounting it read-only?  I am thinking of trying to mount it degraded 
with different devices missing, but I don't know if that will be an 
exercise in futility.

btrfs fi show still works!

Label: 'btrfsdata'  uuid: ccde0a00-e50b-4154-977f-ac591ab580a5
         Total devices 6 FS bytes used 9.62TiB
         devid   10 size 3.64TiB used 2.41TiB path /dev/sdg
         devid   11 size 3.64TiB used 2.41TiB path /dev/sda
         devid   12 size 3.64TiB used 2.41TiB path /dev/sdb
         devid   13 size 3.64TiB used 2.41TiB path /dev/sdc
         devid   14 size 3.64TiB used 2.41TiB path /dev/sdd
         devid   15 size 3.64TiB used 2.41TiB path /dev/sde

It collapsed spontaneously (I believe this was after it had successfully
mounted rw on boot, but I can't check for sure without looking at the
last file creation time).  After another reboot it won't mount at all.

btrfs check /dev/sda gives:

parent transid verify failed on 73440384909312 wanted 491976 found 485531
parent transid verify failed on 73440384909312 wanted 491976 found 485531
checksum verify failed on 73440384909312 found 26943E11 wanted 0FCB3E97
checksum verify failed on 73440384909312 found AAD98681 wanted EA004FE8
checksum verify failed on 73440384909312 found AAD98681 wanted EA004FE8
bytenr mismatch, want=73440384909312, have=274180945215488
Couldn't read chunk root
Couldn't open file system

Looking back in the journal (I shall now be setting up journal
monitoring), I found lots of errors, starting this September, only a few
weeks after converting from RAID1 to RAID6.
In the extracts below, blank lines precede reboots; in the first log the
blank line also marks the omission of over 30K entries!  The first log
must represent some software bug, because /dev/sdh is NOT a btrfs device!

LOG EXTRACTS, from while the filesystem was still mounted.  The journal
was grepped for btrfs, and the kernel boot line was added afterwards.
Note the different kernel version on the reboot after the upgrade.
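
For anyone who wants to check the device-scan lines in these extracts
mechanically rather than by eye, here is a minimal sketch in Python.  It
is only an illustration: the function name lagging_devices and the
regular expression are my own assumptions, not part of btrfs-progs, and
it expects the scan lines from a single boot (wrapped or unwrapped).

```python
import re

# Matches the kernel's device-scan message, e.g.
#   "BTRFS: device label btrfsdata devid 15 transid 485798 /dev/sde"
# \s+ also matches newlines, so lines wrapped by a mail client still match.
SCAN_RE = re.compile(r"devid\s+(\d+)\s+transid\s+(\d+)\s+(/dev/\S+)")


def lagging_devices(text):
    """Return {path: (devid, transid, behind)} for every device whose
    transid is lower than the newest transid seen in `text`.

    `text` should hold the scan lines from ONE boot; if a path appears
    more than once, the last occurrence wins.
    """
    seen = {}
    for m in SCAN_RE.finditer(text):
        devid, transid, path = int(m.group(1)), int(m.group(2)), m.group(3)
        seen[path] = (devid, transid)
    if not seen:
        return {}
    newest = max(transid for _, transid in seen.values())
    return {
        path: (devid, transid, newest - transid)
        for path, (devid, transid) in seen.items()
        if transid < newest
    }
```

Fed the six scan lines logged at Nov 15 15:23:33, this reports only
/dev/sde (devid 15), at transid 485798 against 492036 on the other five
devices, i.e. 6238 transactions behind.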

Aug 26 20:12:24 cambridge kernel: Linux version 4.1.5-100.fc21.x86_64 
(mockbuild@bkernel02.phx2.fedoraproject.org) (gcc version 4.9.2 20150212 
(Red Hat 4.9.2-6) (GCC) ) #1 SMP Tue Aug 11 00:24:23 UTC 2015
Aug 26 20:12:52 cambridge kernel: Btrfs loaded
Aug 26 20:12:52 cambridge kernel: BTRFS: device label btrfsdata devid 11 
transid 484422 /dev/sda
Aug 26 20:12:52 cambridge kernel: BTRFS: device label btrfsdata devid 15 
transid 484422 /dev/sde
Aug 26 20:12:52 cambridge kernel: BTRFS: device label btrfsdata devid 13 
transid 484422 /dev/sdc
Aug 26 20:12:52 cambridge kernel: BTRFS: device label btrfsdata devid 14 
transid 484422 /dev/sdd
Aug 26 20:12:52 cambridge kernel: BTRFS: device label btrfsdata devid 12 
transid 484422 /dev/sdb
Aug 26 20:12:52 cambridge kernel: BTRFS: device label btrfsdata devid 10 
transid 484422 /dev/sdg
Sep 13 16:11:34 cambridge kernel: BTRFS: bdev /dev/sdh errs: wr 0, rd 0, 
flush 1, corrupt 0, gen 0
Sep 13 16:11:34 cambridge kernel: BTRFS: lost page write due to I/O 
error on /dev/sdh
Sep 13 16:11:34 cambridge kernel: BTRFS: bdev /dev/sdh errs: wr 1, rd 0, 
flush 1, corrupt 0, gen 0
Sep 13 16:11:34 cambridge kernel: BTRFS: lost page write due to I/O 
error on /dev/sdh
Sep 13 16:11:34 cambridge kernel: BTRFS: bdev /dev/sdh errs: wr 2, rd 0, 
flush 1, corrupt 0, gen 0
Sep 13 16:11:34 cambridge kernel: BTRFS: lost page write due to I/O 
error on /dev/sdh

Nov 15 15:21:51 cambridge kernel: BTRFS: lost page write due to I/O 
error on /dev/sdh
Nov 15 15:21:51 cambridge kernel: BTRFS: bdev /dev/sdh errs: wr 18713, 
rd 0, flush 6238, corrupt 0, gen 0
Nov 15 15:21:51 cambridge kernel: BTRFS: lost page write due to I/O 
error on /dev/sdh
Nov 15 15:21:51 cambridge kernel: BTRFS: bdev /dev/sdh errs: wr 18714, 
rd 0, flush 6238, corrupt 0, gen 0

Nov 15 15:23:00 cambridge kernel: Linux version 4.1.12-101.fc21.x86_64 
(mockbuild@bkernel01.phx2.fedoraproject.org) (gcc version 4.9.2 20150212 
(Red Hat 4.9.2-6) (GCC) ) #1 SMP Wed Oct 28 15:18:44 UTC 2015
Nov 15 15:23:33 cambridge kernel: Btrfs loaded
Nov 15 15:23:33 cambridge kernel: BTRFS: device label btrfsdata devid 14 
transid 492036 /dev/sdd
Nov 15 15:23:33 cambridge kernel: BTRFS: device label btrfsdata devid 15 
transid 485798 /dev/sde
Nov 15 15:23:33 cambridge kernel: BTRFS: device label btrfsdata devid 11 
transid 492036 /dev/sda
Nov 15 15:23:33 cambridge kernel: BTRFS: device label btrfsdata devid 13 
transid 492036 /dev/sdc
Nov 15 15:23:33 cambridge kernel: BTRFS: device label btrfsdata devid 10 
transid 492036 /dev/sdg
Nov 15 15:23:33 cambridge kernel: BTRFS: device label btrfsdata devid 12 
transid 492036 /dev/sdb
Nov 15 15:23:33 cambridge kernel: BTRFS (device sdb): parent transid 
verify failed on 73440384909312 wanted 491976 found 485531
Nov 15 15:23:33 cambridge kernel: BTRFS (device sdb): parent transid 
verify failed on 73440384913408 wanted 491976 found 485531
Nov 15 15:23:33 cambridge kernel: BTRFS (device sdb): parent transid 
verify failed on 73440384917504 wanted 491976 found 485696
Nov 15 15:23:33 cambridge kernel: BTRFS (device sdb): parent transid 
verify failed on 73440384921600 wanted 491976 found 485696
Nov 15 15:23:33 cambridge kernel: BTRFS: bdev /dev/sde errs: wr 18711, 
rd 0, flush 6237, corrupt 0, gen 0
Nov 15 15:23:33 cambridge kernel: BTRFS (device sdb): bad tree block 
start 1121375725894905312 74200909787136
Nov 15 15:23:33 cambridge kernel: BTRFS (device sdb): bad tree block 
start 7250342666203184288 74200909791232
Nov 15 15:23:33 cambridge kernel: BTRFS (device sdb): parent transid 
verify failed on 73417618042880 wanted 488487 found 485439

Nov 15 20:37:14 cambridge kernel: BTRFS (device sdb): parent transid 
verify failed on 73440384917504 wanted 491976 found 485696
Nov 15 20:37:14 cambridge kernel: BTRFS (device sdb): parent transid 
verify failed on 73440384921600 wanted 491976 found 485696
Nov 15 20:39:01 cambridge kernel: BTRFS (device sdb): bad tree block 
start 8747312261073978676 74201584123904
Nov 15 20:39:02 cambridge kernel: BTRFS warning (device sdb): csum 
failed ino 1455165 off 1733865472 csum 3128256294 expected csum 3176585556
Nov 15 20:39:02 cambridge kernel: BTRFS warning (device sdb): csum 
failed ino 1455165 off 1733869568 csum 3953187115 expected csum 2827150008
Nov 15 20:39:02 cambridge kernel: BTRFS warning (device sdb): csum 
failed ino 1455165 off 1733873664 csum 2011708136 expected csum 1514290758
Nov 15 20:39:02 cambridge kernel: BTRFS warning (device sdb): csum 
failed ino 1455165 off 1733877760 csum 4227108651 expected csum 3929632885
Nov 15 20:39:02 cambridge kernel: BTRFS warning (device sdb): csum 
failed ino 1455165 off 1733881856 csum 667263525 expected csum 2167952522
Nov 15 20:39:02 cambridge kernel: BTRFS warning (device sdb): csum 
failed ino 1455165 off 1733885952 csum 1421670165 expected csum 2602382287
Nov 15 20:39:02 cambridge kernel: BTRFS warning (device sdb): csum 
failed ino 1455165 off 1733890048 csum 2320260888 expected csum 606775819
Nov 15 20:39:02 cambridge kernel: BTRFS warning (device sdb): csum 
failed ino 1455165 off 1733865472 csum 3128256294 expected csum 3176585556
Nov 15 20:39:02 cambridge kernel: BTRFS warning (device sdb): csum 
failed ino 1455165 off 1733894144 csum 2140326945 expected csum 2209619790
Nov 15 20:39:02 cambridge kernel: BTRFS warning (device sdb): csum 
failed ino 1455165 off 1733898240 csum 372680472 expected csum 3888049973

Nov 15 20:42:45 cambridge kernel: Linux version 4.1.12-101.fc21.x86_64 
(mockbuild@bkernel01.phx2.fedoraproject.org) (gcc version 4.9.2 20150212 
(Red Hat 4.9.2-6) (GCC) ) #1 SMP Wed Oct 28 15:18:44 UTC 2015
Nov 15 20:43:16 cambridge kernel: Btrfs loaded
Nov 15 20:43:16 cambridge kernel: BTRFS: device label btrfsdata devid 15 
transid 492120 /dev/sde
Nov 15 20:43:16 cambridge kernel: BTRFS: device label btrfsdata devid 14 
transid 492120 /dev/sdd
Nov 15 20:43:16 cambridge kernel: BTRFS: device label btrfsdata devid 13 
transid 492120 /dev/sdc
Nov 15 20:43:16 cambridge kernel: BTRFS: device label btrfsdata devid 12 
transid 492120 /dev/sdb
Nov 15 20:43:16 cambridge kernel: BTRFS: device label btrfsdata devid 11 
transid 492120 /dev/sda
Nov 15 20:43:16 cambridge kernel: BTRFS: device label btrfsdata devid 10 
transid 492120 /dev/sdg
Nov 15 20:43:16 cambridge kernel: BTRFS (device sdg): parent transid 
verify failed on 73440384909312 wanted 491976 found 485531
Nov 15 20:43:16 cambridge kernel: BTRFS (device sdg): parent transid 
verify failed on 73440384913408 wanted 491976 found 485531
Nov 15 20:43:16 cambridge kernel: BTRFS (device sdg): parent transid 
verify failed on 73440384917504 wanted 491976 found 485696
Nov 15 20:43:16 cambridge kernel: BTRFS (device sdg): parent transid 
verify failed on 73440384921600 wanted 491976 found 485696
Nov 15 20:43:16 cambridge kernel: BTRFS: bdev /dev/sde errs: wr 18711, 
rd 0, flush 6237, corrupt 0, gen 0
Nov 15 20:43:16 cambridge kernel: BTRFS (device sdg): bad tree block 
start 1121375725894905312 74200909787136
Nov 15 20:43:16 cambridge kernel: BTRFS (device sdg): bad tree block 
start 7250342666203184288 74200909791232
Nov 15 20:43:16 cambridge kernel: BTRFS (device sdg): parent transid 
verify failed on 73417618042880 wanted 488487 found 485439
Nov 15 20:43:16 cambridge kernel: BTRFS (device sdg): parent transid 
verify failed on 73417618042880 wanted 488487 found 485439
Nov 15 20:43:16 cambridge kernel: BTRFS (device sdg): parent transid 
verify failed on 73417618042880 wanted 488487 found 485439
Nov 15 20:43:16 cambridge kernel: BTRFS: Failed to read block groups: -5
Nov 15 20:43:16 cambridge kernel: BTRFS: open_ctree failed
Nov 15 20:49:14 cambridge kernel: BTRFS (device sdg): parent transid 
verify failed on 73440384909312 wanted 491976 found 485531
Nov 15 20:49:15 cambridge kernel: BTRFS (device sdg): parent transid 
verify failed on 73440384913408 wanted 491976 found 485531
Nov 15 20:49:15 cambridge kernel: BTRFS (device sdg): parent transid 
verify failed on 73440384917504 wanted 491976 found 485696
Nov 15 20:49:15 cambridge kernel: BTRFS (device sdg): parent transid 
verify failed on 73440384921600 wanted 491976 found 485696
Nov 15 20:49:15 cambridge kernel: BTRFS: bdev /dev/sde errs: wr 18711, 
rd 0, flush 6237, corrupt 0, gen 0
Nov 15 20:49:16 cambridge kernel: BTRFS (device sdg): bad tree block 
start 1121375725894905312 74200909787136
Nov 15 20:49:16 cambridge kernel: BTRFS (device sdg): bad tree block 
start 7250342666203184288 74200909791232
Nov 15 20:49:16 cambridge kernel: BTRFS (device sdg): parent transid 
verify failed on 73417618042880 wanted 488487 found 485439
Nov 15 20:49:16 cambridge kernel: BTRFS (device sdg): parent transid 
verify failed on 73417618042880 wanted 488487 found 485439
Nov 15 20:49:16 cambridge kernel: BTRFS (device sdg): parent transid 
verify failed on 73417618042880 wanted 488487 found 485439
Nov 15 20:49:16 cambridge kernel: BTRFS: Failed to read block groups: -5
Nov 15 20:49:16 cambridge kernel: BTRFS: open_ctree failed

