* Broken RAID6, segfault on chunk-recover
From: Abe @ 2016-01-04 13:32 UTC (permalink / raw)
To: linux-btrfs
Hello,
Could you please help in recovering this multi-device filesystem?
I got to the point where running chunk-recover looked promising,
but unfortunately the command segfaults consistently.
This was an 8-device Btrfs RAID6 behind an LSI HBA. One disk died suddenly
and is 100% unavailable. At that point I had evidence that userland was
receiving corrupted data when reading files (a few invalid bytes in
multi-gigabyte files). The system was rebooted. Since then I can't mount it
degraded or use any recovery command.
Memtest is OK.
What would you advise?
Is the segfault something you would like me to help debug before
going further?
----------------------------------
# uname -a
Linux horo 4.3.0-1-amd64 #1 SMP Debian 4.3.3-2 (2015-12-17) x86_64 GNU/Linux
----------------------------------
# ./btrfs version
btrfs-progs v4.3.1
----------------------------------
# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 119.2G 0 disk
├─sda1 8:1 0 1M 0 part
└─sda2 8:2 0 118.3G 0 part /
sdb 8:16 0 477G 0 disk
└─sdb1 8:17 0 477G 0 part
sdc 8:32 0 477G 0 disk
└─sdc1 8:33 0 477G 0 part
sdd 8:48 0 477G 0 disk
└─sdd1 8:49 0 477G 0 part
sde 8:64 0 477G 0 disk
└─sde1 8:65 0 477G 0 part
sdf 8:80 0 477G 0 disk
└─sdf1 8:81 0 477G 0 part
sdg 8:96 0 477G 0 disk
└─sdg1 8:97 0 477G 0 part
sdh 8:112 0 477G 0 disk
└─sdh1 8:113 0 477G 0 part
----------------------------------
# ./btrfs fi show
warning, device 6 is missing
checksum verify failed on 4415566151680 found F8A6E83A wanted EB7CA66C
checksum verify failed on 4415566151680 found F8A6E83A wanted EB7CA66C
bytenr mismatch, want=4415566151680, have=526521381424393794
Couldn't read chunk root
Label: 'hive' uuid: bec7b9a0-c56c-494e-8631-072d3f89c0c9
Total devices 8 FS bytes used 2.22TiB
devid 1 size 476.94GiB used 325.73GiB path /dev/sdf1
devid 2 size 476.94GiB used 325.73GiB path /dev/sdc1
devid 3 size 476.94GiB used 325.73GiB path /dev/sdd1
devid 4 size 476.94GiB used 325.73GiB path /dev/sde1
devid 5 size 476.94GiB used 325.73GiB path /dev/sdb1
devid 7 size 476.94GiB used 325.73GiB path /dev/sdh1
devid 8 size 476.94GiB used 325.73GiB path /dev/sdg1
*** Some devices missing
----------------------------------
# mount -o ro,degraded,recovery /dev/sdb1 /mnt/temp
mount: wrong fs type, bad option, bad superblock on /dev/sdb1,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so.
[169387.880114] BTRFS info (device sdb1): allowing degraded mounts
[169387.880125] BTRFS info (device sdb1): enabling auto recovery
[169387.880130] BTRFS info (device sdb1): disk space caching is enabled
[169387.880132] BTRFS: has skinny extents
[169387.890940] BTRFS: bdev (null) errs: wr 27, rd 1535, flush 9,
corrupt 0, gen 0
[169388.014701] BTRFS (device sdb1): bad tree block start
8621721664010832405 6766851391488
[169388.015117] BTRFS (device sdb1): bad tree block start
8621721664010832405 6766851391488
[169388.015148] BTRFS: Failed to read block groups: -5
[169388.042731] BTRFS: open_ctree failed
----------------------------------
# ./btrfs check --readonly /dev/sdb1
warning, device 6 is missing
checksum verify failed on 4415566151680 found F8A6E83A wanted EB7CA66C
checksum verify failed on 4415566151680 found F8A6E83A wanted EB7CA66C
bytenr mismatch, want=4415566151680, have=526521381424393794
Couldn't read chunk root
Couldn't open file system
# ./btrfs check --readonly --tree-root 4415566151680 /dev/sdb1
warning, device 6 is missing
checksum verify failed on 4415566151680 found F8A6E83A wanted EB7CA66C
checksum verify failed on 4415566151680 found F8A6E83A wanted EB7CA66C
bytenr mismatch, want=4415566151680, have=526521381424393794
Couldn't read chunk root
Couldn't open file system
# ./btrfs check --readonly --tree-root 526521381424393794 /dev/sdb1
warning, device 6 is missing
checksum verify failed on 4415566151680 found F8A6E83A wanted EB7CA66C
checksum verify failed on 4415566151680 found F8A6E83A wanted EB7CA66C
bytenr mismatch, want=4415566151680, have=526521381424393794
Couldn't read chunk root
Couldn't open file system
# ./btrfs check --repair --init-csum-tree /dev/sdb1
enabling repair mode
Creating a new CRC tree
warning, device 6 is missing
checksum verify failed on 4415566151680 found F8A6E83A wanted EB7CA66C
checksum verify failed on 4415566151680 found F8A6E83A wanted EB7CA66C
bytenr mismatch, want=4415566151680, have=526521381424393794
Couldn't read chunk root
Couldn't open file system
----------------------------------
# ./btrfs rescue super-recover -v /dev/sd[bcdefgh]1
[...]
All supers are valid, no need to recover
----------------------------------
# ./btrfs rescue chunk-recover -v /dev/sdb1
All Devices:
Device: id = 1, name = /dev/sdf1
Device: id = 7, name = /dev/sdh1
Device: id = 8, name = /dev/sdg1
Device: id = 4, name = /dev/sde1
Device: id = 2, name = /dev/sdc1
Device: id = 3, name = /dev/sdd1
Device: id = 5, name = /dev/sdb1
Scanning: 0 in dev0, 0 in dev1, 0 in dev2, 0 in dev3, 0 in dev4, 0 in
dev5, 0 in dev6
Segmentation fault
# ./btrfs rescue chunk-recover -v /dev/sdc1
All Devices:
Device: id = 1, name = /dev/sdf1
Device: id = 7, name = /dev/sdh1
Device: id = 8, name = /dev/sdg1
Device: id = 4, name = /dev/sde1
Device: id = 3, name = /dev/sdd1
Device: id = 5, name = /dev/sdb1
Device: id = 2, name = /dev/sdc1
Scanning: 4096 in dev0, 2052096 in dev1, 2183168 in dev2, 1146880 in
dev3, 0 in dev4, 0 in dev5, 0 in dev6
Segmentation fault
# ./btrfs rescue chunk-recover -v /dev/sdd1
All Devices:
Device: id = 1, name = /dev/sdf1
Device: id = 7, name = /dev/sdh1
Device: id = 8, name = /dev/sdg1
Device: id = 4, name = /dev/sde1
Device: id = 2, name = /dev/sdc1
Device: id = 5, name = /dev/sdb1
Device: id = 3, name = /dev/sdd1
Scanning: 0 in dev0, 0 in dev1, 0 in dev2, 0 in dev3, 0 in dev4, 0 in
dev5, 0 in dev6
Segmentation fault
[...]
----------------------------------
Best regards
* Re: Broken RAID6, segfault on chunk-recover
From: Abe @ 2016-01-06 22:51 UTC (permalink / raw)
To: linux-btrfs
Here is the backtrace.
Is there any chance chunk-recover can help repair this filesystem?
Thanks
btrfs-progs# git rev-parse HEAD
7c3394ed9ef2063a7256d4bc078a485b6f826bc5
btrfs-progs# gdb --args ./btrfs rescue chunk-recover -v /dev/sdf1
(gdb) r
Starting program: /root/btrfs-progs/btrfs rescue chunk-recover -v
/dev/sdf1
[Thread debugging using libthread_db enabled]
Using host libthread_db library
"/lib/x86_64-linux-gnu/libthread_db.so.1".
All Devices:
Device: id = 7, name = /dev/sdh1
Device: id = 8, name = /dev/sdg1
Device: id = 4, name = /dev/sde1
Device: id = 2, name = /dev/sdc1
Device: id = 3, name = /dev/sdd1
Device: id = 5, name = /dev/sdb1
Device: id = 1, name = /dev/sdf1
[New Thread 0x7ffff6f95700 (LWP 26524)]
[New Thread 0x7ffff6794700 (LWP 26525)]
[New Thread 0x7ffff5f93700 (LWP 26526)]
[New Thread 0x7ffff5792700 (LWP 26527)]
[New Thread 0x7ffff4f91700 (LWP 26528)]
[New Thread 0x7fffe7fff700 (LWP 26529)]
[New Thread 0x7fffe77fe700 (LWP 26530)]
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff5f93700 (LWP 26526)]
btrfs_new_block_group_record (leaf=leaf@entry=0x7fffec0008c0,
key=key@entry=0x7ffff5f92c30, slot=slot@entry=3) at cmds-check.c:5013
5013 rec->flags = btrfs_disk_block_group_flags(leaf, ptr);
(gdb) bt
#0 btrfs_new_block_group_record (leaf=leaf@entry=0x7fffec0008c0,
key=key@entry=0x7ffff5f92c30, slot=slot@entry=3) at cmds-check.c:5013
#1 0x00000000004309c6 in process_block_group_item (slot=3,
key=0x7ffff5f92c30, leaf=0x7fffec0008c0, bg_cache=0x7fffffffe3e8) at
chunk-recover.c:232
#2 extract_metadata_record (rc=rc@entry=0x7fffffffe3b0,
leaf=leaf@entry=0x7fffec0008c0) at chunk-recover.c:717
#3 0x0000000000431190 in scan_one_device (dev_scan_struct=0x695820) at
chunk-recover.c:807
#4 0x00007ffff7341284 in start_thread (arg=0x7ffff5f93700) at
pthread_create.c:333
#5 0x00007ffff707e74d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:109
(gdb) thread apply all bt
Thread 8 (Thread 0x7fffe77fe700 (LWP 26530)):
#0 clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:81
#1 0x00007ffff73411c0 in ?? () at pthread_create.c:237 from
/lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007fffe77fe700 in ?? ()
#3 0x0000000000000000 in ?? ()
Thread 7 (Thread 0x7fffe7fff700 (LWP 26529)):
#0 clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:81
#1 0x00007ffff73411c0 in ?? () at pthread_create.c:237 from
/lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007fffe7fff700 in ?? ()
#3 0x0000000000000000 in ?? ()
Thread 6 (Thread 0x7ffff4f91700 (LWP 26528)):
#0 0x00007ffff734a013 in pread64 () at
../sysdeps/unix/syscall-template.S:81
#1 0x0000000000430ea5 in pread64 (__offset=532480, __nbytes=<optimized
out>, __buf=0x7fffdc00093c, __fd=7) at
/usr/include/x86_64-linux-gnu/bits/unistd.h:117
#2 scan_one_device (dev_scan_struct=0x695860) at chunk-recover.c:776
#3 0x00007ffff7341284 in start_thread (arg=0x7ffff4f91700) at
pthread_create.c:333
#4 0x00007ffff707e74d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:109
Thread 5 (Thread 0x7ffff5792700 (LWP 26527)):
#0 0x00007ffff734a013 in pread64 () at
../sysdeps/unix/syscall-template.S:81
#1 0x0000000000430ea5 in pread64 (__offset=368640, __nbytes=<optimized
out>, __buf=0x7fffe000093c, __fd=6) at
/usr/include/x86_64-linux-gnu/bits/unistd.h:117
#2 scan_one_device (dev_scan_struct=0x695840) at chunk-recover.c:776
#3 0x00007ffff7341284 in start_thread (arg=0x7ffff5792700) at
pthread_create.c:333
#4 0x00007ffff707e74d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:109
Thread 4 (Thread 0x7ffff5f93700 (LWP 26526)):
#0 btrfs_new_block_group_record (leaf=leaf@entry=0x7fffec0008c0,
key=key@entry=0x7ffff5f92c30, slot=slot@entry=3) at cmds-check.c:5013
#1 0x00000000004309c6 in process_block_group_item (slot=3,
key=0x7ffff5f92c30, leaf=0x7fffec0008c0, bg_cache=0x7fffffffe3e8) at
chunk-recover.c:232
#2 extract_metadata_record (rc=rc@entry=0x7fffffffe3b0,
leaf=leaf@entry=0x7fffec0008c0) at chunk-recover.c:717
#3 0x0000000000431190 in scan_one_device (dev_scan_struct=0x695820) at
chunk-recover.c:807
#4 0x00007ffff7341284 in start_thread (arg=0x7ffff5f93700) at
pthread_create.c:333
#5 0x00007ffff707e74d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:109
Thread 3 (Thread 0x7ffff6794700 (LWP 26525)):
#0 0x00007ffff734a013 in pread64 () at
../sysdeps/unix/syscall-template.S:81
#1 0x0000000000430ea5 in pread64 (__offset=3493888, __nbytes=<optimized
out>, __buf=0x7fffe800093c, __fd=4)
at /usr/include/x86_64-linux-gnu/bits/unistd.h:117
#2 scan_one_device (dev_scan_struct=0x695800) at chunk-recover.c:776
#3 0x00007ffff7341284 in start_thread (arg=0x7ffff6794700) at
pthread_create.c:333
#4 0x00007ffff707e74d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:109
Thread 2 (Thread 0x7ffff6f95700 (LWP 26524)):
#0 0x00007ffff734a013 in pread64 () at
../sysdeps/unix/syscall-template.S:81
#1 0x0000000000430ea5 in pread64 (__offset=3751936, __nbytes=<optimized
out>, __buf=0x7ffff000093c, __fd=3)
at /usr/include/x86_64-linux-gnu/bits/unistd.h:117
#2 scan_one_device (dev_scan_struct=0x6957e0) at chunk-recover.c:776
#3 0x00007ffff7341284 in start_thread (arg=0x7ffff6f95700) at
pthread_create.c:333
#4 0x00007ffff707e74d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:109
Thread 1 (Thread 0x7ffff7fe08c0 (LWP 26520)):
#0 clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:81
#1 0x00007ffff73400fa in create_thread (pd=pd@entry=0x7fffe77fe700,
attr=attr@entry=0x7fffffffe0e0, stopped_start=<optimized out>,
stopped_start@entry=false, stackaddr=<optimized out>,
thread_ran=0x7fffffffe0df) at
../sysdeps/unix/sysv/linux/createthread.c:102
#2 0x00007ffff7341a08 in __pthread_create_2_1
(newthread=newthread@entry=0x695900, attr=attr@entry=0x0,
start_routine=start_routine@entry=0x430dd6 <scan_one_device>,
arg=arg@entry=0x6958a0) at pthread_create.c:677
#3 0x000000000043150e in scan_devices (rc=0x7fffffffe3b0) at
chunk-recover.c:876
#4 btrfs_recover_chunk_tree (path=path@entry=0x7fffffffe8ab
"/dev/sdf1", verbose=verbose@entry=1, yes=yes@entry=0) at
chunk-recover.c:2323
#5 0x000000000042f333 in cmd_rescue_chunk_recover (argc=<optimized
out>, argv=<optimized out>) at cmds-rescue.c:95
#6 0x0000000000409ca4 in handle_command_group (grp=grp@entry=0x68de20
<rescue_cmd_group>, argc=3, argv=<optimized out>) at btrfs.c:144
#7 0x000000000042f67e in cmd_rescue (argc=<optimized out>,
argv=<optimized out>) at cmds-rescue.c:220
#8 0x0000000000409dfe in main (argc=4, argv=0x7fffffffe650) at
btrfs.c:252
* Re: Broken RAID6, segfault on chunk-recover
From: Abe @ 2016-01-07 16:39 UTC (permalink / raw)
To: linux-btrfs
No wonder chunk-recover crashed: the superblocks are valid, but they all
contain a bogus chunk_root that exceeds total_bytes!
So all I need is to find where that btree root is on disk, right?
The problem is that almost every tool in btrfs-progs is unusable here,
because they all try to read the chunk root first. Do you know of any
alternative tool that could scan the devices and find where roots might be?
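For what it's worth, such a brute-force scan is conceptually simple. The sketch below is hypothetical (not an existing btrfs-progs tool): it walks a device in nodesize steps and records blocks whose embedded fsid matches this filesystem's. The header offsets (fsid at byte 32, following the 32-byte csum; logical bytenr at byte 48) assume the on-disk btrfs_header layout:

```python
import uuid

# Assumptions: each tree block starts with struct btrfs_header, i.e.
# a 32-byte csum, then the 16-byte fsid, then the 8-byte logical bytenr.
FSID = uuid.UUID("bec7b9a0-c56c-494e-8631-072d3f89c0c9").bytes
NODESIZE = 16384  # from the superblock dump below

def scan_for_tree_blocks(path):
    """Return (physical_offset, logical_bytenr) for candidate tree blocks."""
    hits = []
    with open(path, "rb") as f:
        offset = 0
        while True:
            block = f.read(NODESIZE)
            if len(block) < NODESIZE:
                break
            if block[32:48] == FSID:
                bytenr = int.from_bytes(block[48:56], "little")
                hits.append((offset, bytenr))
            offset += NODESIZE
    return hits
```

Candidates found this way would still need their generation and level checked before being trusted as roots.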
All superblock copies on all devices look like this:
# ./btrfs-show-super /dev/sdb1
superblock: bytenr=65536, device=/dev/sdb1
---------------------------------------------------------
csum 0x938855ef [match]
bytenr 65536
flags 0x1
( WRITTEN )
magic _BHRfS_M [match]
fsid bec7b9a0-c56c-494e-8631-072d3f89c0c9
label hive
generation 34245
root 6766859681792
sys_array_size 321
chunk_root_generation 32751
root_level 1
chunk_root 4415566151680 <--- Impossible Offset
chunk_root_level 1
log_root 0
log_root_transid 0
log_root_level 0
total_bytes 4096872997376 <---
bytes_used 2438972817408
sectorsize 4096
nodesize 16384
leafsize 16384
stripesize 4096
root_dir 6
num_devices 8
compat_flags 0x0
compat_ro_flags 0x0
incompat_flags 0x1e1
( MIXED_BACKREF |
BIG_METADATA |
EXTENDED_IREF |
RAID56 |
SKINNY_METADATA )
csum_type 0
csum_size 4
cache_generation 34245
uuid_tree_generation 34245
dev_item.uuid ea4a089b-c88c-4a6c-80e5-b2c93f0613f5
dev_item.fsid bec7b9a0-c56c-494e-8631-072d3f89c0c9 [match]
dev_item.type 0
dev_item.total_bytes 512109125120
dev_item.bytes_used 349754621952
dev_item.io_align 4096
dev_item.io_width 4096
dev_item.sector_size 4096
dev_item.devid 5
dev_item.dev_group 0
dev_item.seek_speed 0
dev_item.bandwidth 0
dev_item.generation 0
Best regards,
* Re: Broken RAID6, segfault on chunk-recover
From: Qu Wenruo @ 2016-01-08 1:42 UTC (permalink / raw)
To: Abe, linux-btrfs
Abe wrote on 2016/01/07 17:39 +0100:
> No wonder chunk-recover crashed: the superblocks are valid, but they all
> contain a bogus chunk_root that exceeds total_bytes!
> So all I need is to find where that btree root is on disk, right?
>
> The problem is that almost every tool in btrfs-progs is unusable here,
> because they all try to read the chunk root first. Do you know of any
> alternative tool that could scan the devices and find where roots might be?
First, you're wrong about total_bytes and the chunk_root bytenr.
The chunk_root bytenr is a btrfs *logical* address, and it is
completely OK for a logical bytenr to be beyond total_bytes,
so the problem is not here.
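A toy illustration of that point (made-up numbers, single stripe, not real btrfs code): the chunk tree maps logical ranges onto per-device physical offsets, so a logical address is not bounded by any single device's size.

```python
# Toy model of the btrfs chunk tree. Each entry maps a logical range
# onto a physical offset on one device. Numbers are invented.
CHUNKS = [
    # (logical_start, length, devid, physical_start)
    (4415555665920, 1 << 30, 5, 350_000_000_000),
]

def logical_to_physical(logical):
    for start, length, devid, phys in CHUNKS:
        if start <= logical < start + length:
            return devid, phys + (logical - start)
    raise LookupError("no chunk maps this logical address")
```

In this toy mapping, a chunk_root bytenr like 4415566151680 resolves fine even though it exceeds the total_bytes of any single device.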
The real reason chunk-recover segfaults would be that, at your
cmds-check.c:5013, the leaf->data could be empty.
But you didn't give enough info or check *leaf, so I can't determine that now.
To recover your RAID6, the possible chance lies in your superblocks.
Please run btrfs-show-super with -f -a to get the full backup roots output,
which may contain a good old chunk root.
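A rough way to sift those dumps (hypothetical Python helper; the field name `backup_chunk_root` is assumed from btrfs-show-super's backup-root output and may differ in your progs version):

```python
import re

def parse_backup_chunk_roots(show_super_text):
    """Collect distinct backup_chunk_root values from btrfs-show-super output."""
    vals = re.findall(r"backup_chunk_root:?\s+(\d+)", show_super_text)
    return sorted({int(v) for v in vals})
```

Feed it the concatenated `-f -a` output from every device; any value that differs from the bogus 4415566151680 would be worth trying, e.g. with `btrfs check --chunk-root <bytenr>`.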
Thanks,
Qu