linux-btrfs.vger.kernel.org archive mirror
* Broken RAID6, segfault on chunk-recover
From: Abe @ 2016-01-04 13:32 UTC (permalink / raw)
  To: linux-btrfs

Hello,

Could you please help me recover this multi-device filesystem?
I got to the point where running chunk-recover looks promising,
but unfortunately the command segfaults consistently.

This was an 8-device Btrfs RAID6 on an LSI HBA. One disk died suddenly
and is 100% unavailable. At that point, I had evidence that userland was
receiving corrupted data while reading files (a few invalid bytes in
multi-gigabyte files). The system was rebooted. Since then I can't mount
it degraded or use any recovery command.
Memtest is OK.

What would you advise?
Is the segfault something you would like me to help debug before
going further?


----------------------------------

# uname -a
Linux horo 4.3.0-1-amd64 #1 SMP Debian 4.3.3-2 (2015-12-17) x86_64 GNU/Linux

----------------------------------

# ./btrfs version
btrfs-progs v4.3.1

----------------------------------

# lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda      8:0    0 119.2G  0 disk
├─sda1   8:1    0     1M  0 part
└─sda2   8:2    0 118.3G  0 part /
sdb      8:16   0   477G  0 disk
└─sdb1   8:17   0   477G  0 part
sdc      8:32   0   477G  0 disk
└─sdc1   8:33   0   477G  0 part
sdd      8:48   0   477G  0 disk
└─sdd1   8:49   0   477G  0 part
sde      8:64   0   477G  0 disk
└─sde1   8:65   0   477G  0 part
sdf      8:80   0   477G  0 disk
└─sdf1   8:81   0   477G  0 part
sdg      8:96   0   477G  0 disk
└─sdg1   8:97   0   477G  0 part
sdh      8:112  0   477G  0 disk
└─sdh1   8:113  0   477G  0 part

----------------------------------

# ./btrfs fi show
warning, device 6 is missing
checksum verify failed on 4415566151680 found F8A6E83A wanted EB7CA66C
checksum verify failed on 4415566151680 found F8A6E83A wanted EB7CA66C
bytenr mismatch, want=4415566151680, have=526521381424393794
Couldn't read chunk root
Label: 'hive'  uuid: bec7b9a0-c56c-494e-8631-072d3f89c0c9
        Total devices 8 FS bytes used 2.22TiB
        devid    1 size 476.94GiB used 325.73GiB path /dev/sdf1
        devid    2 size 476.94GiB used 325.73GiB path /dev/sdc1
        devid    3 size 476.94GiB used 325.73GiB path /dev/sdd1
        devid    4 size 476.94GiB used 325.73GiB path /dev/sde1
        devid    5 size 476.94GiB used 325.73GiB path /dev/sdb1
        devid    7 size 476.94GiB used 325.73GiB path /dev/sdh1
        devid    8 size 476.94GiB used 325.73GiB path /dev/sdg1
        *** Some devices missing

----------------------------------

# mount -o ro,degraded,recovery /dev/sdb1 /mnt/temp
mount: wrong fs type, bad option, bad superblock on /dev/sdb1,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail or so.

[169387.880114] BTRFS info (device sdb1): allowing degraded mounts
[169387.880125] BTRFS info (device sdb1): enabling auto recovery
[169387.880130] BTRFS info (device sdb1): disk space caching is enabled
[169387.880132] BTRFS: has skinny extents
[169387.890940] BTRFS: bdev (null) errs: wr 27, rd 1535, flush 9, corrupt 0, gen 0
[169388.014701] BTRFS (device sdb1): bad tree block start 8621721664010832405 6766851391488
[169388.015117] BTRFS (device sdb1): bad tree block start 8621721664010832405 6766851391488
[169388.015148] BTRFS: Failed to read block groups: -5
[169388.042731] BTRFS: open_ctree failed

----------------------------------

# ./btrfs check --readonly /dev/sdb1
warning, device 6 is missing
checksum verify failed on 4415566151680 found F8A6E83A wanted EB7CA66C
checksum verify failed on 4415566151680 found F8A6E83A wanted EB7CA66C
bytenr mismatch, want=4415566151680, have=526521381424393794
Couldn't read chunk root
Couldn't open file system

# ./btrfs check --readonly --tree-root 4415566151680 /dev/sdb1
warning, device 6 is missing
checksum verify failed on 4415566151680 found F8A6E83A wanted EB7CA66C
checksum verify failed on 4415566151680 found F8A6E83A wanted EB7CA66C
bytenr mismatch, want=4415566151680, have=526521381424393794
Couldn't read chunk root
Couldn't open file system

# ./btrfs check --readonly --tree-root 526521381424393794 /dev/sdb1
warning, device 6 is missing
checksum verify failed on 4415566151680 found F8A6E83A wanted EB7CA66C
checksum verify failed on 4415566151680 found F8A6E83A wanted EB7CA66C
bytenr mismatch, want=4415566151680, have=526521381424393794
Couldn't read chunk root
Couldn't open file system

# ./btrfs check --repair --init-csum-tree /dev/sdb1
enabling repair mode
Creating a new CRC tree
warning, device 6 is missing
checksum verify failed on 4415566151680 found F8A6E83A wanted EB7CA66C
checksum verify failed on 4415566151680 found F8A6E83A wanted EB7CA66C
bytenr mismatch, want=4415566151680, have=526521381424393794
Couldn't read chunk root
Couldn't open file system

----------------------------------

# ./btrfs rescue super-recover -v /dev/sd[bcdefgh]1
[...]
All supers are valid, no need to recover

----------------------------------

# ./btrfs rescue chunk-recover -v /dev/sdb1
All Devices:
        Device: id = 1, name = /dev/sdf1
        Device: id = 7, name = /dev/sdh1
        Device: id = 8, name = /dev/sdg1
        Device: id = 4, name = /dev/sde1
        Device: id = 2, name = /dev/sdc1
        Device: id = 3, name = /dev/sdd1
        Device: id = 5, name = /dev/sdb1

Scanning: 0 in dev0, 0 in dev1, 0 in dev2, 0 in dev3, 0 in dev4, 0 in dev5, 0 in dev6
Segmentation fault

# ./btrfs rescue chunk-recover -v /dev/sdc1
All Devices:
        Device: id = 1, name = /dev/sdf1
        Device: id = 7, name = /dev/sdh1
        Device: id = 8, name = /dev/sdg1
        Device: id = 4, name = /dev/sde1
        Device: id = 3, name = /dev/sdd1
        Device: id = 5, name = /dev/sdb1
        Device: id = 2, name = /dev/sdc1

Scanning: 4096 in dev0, 2052096 in dev1, 2183168 in dev2, 1146880 in dev3, 0 in dev4, 0 in dev5, 0 in dev6
Segmentation fault

# ./btrfs rescue chunk-recover -v /dev/sdd1
All Devices:
        Device: id = 1, name = /dev/sdf1
        Device: id = 7, name = /dev/sdh1
        Device: id = 8, name = /dev/sdg1
        Device: id = 4, name = /dev/sde1
        Device: id = 2, name = /dev/sdc1
        Device: id = 5, name = /dev/sdb1
        Device: id = 3, name = /dev/sdd1

Scanning: 0 in dev0, 0 in dev1, 0 in dev2, 0 in dev3, 0 in dev4, 0 in dev5, 0 in dev6
Segmentation fault

[...]

----------------------------------


Best regards


* Re: Broken RAID6, segfault on chunk-recover
From: Abe @ 2016-01-06 22:51 UTC (permalink / raw)
  To: linux-btrfs

Here is the backtrace.
Any chance using chunk-recover will help repair this filesystem?
Thanks


btrfs-progs# git rev-parse HEAD
7c3394ed9ef2063a7256d4bc078a485b6f826bc5

btrfs-progs# gdb --args ./btrfs rescue chunk-recover -v /dev/sdf1

(gdb) r
Starting program: /root/btrfs-progs/btrfs rescue chunk-recover -v /dev/sdf1
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
All Devices:
         Device: id = 7, name = /dev/sdh1
         Device: id = 8, name = /dev/sdg1
         Device: id = 4, name = /dev/sde1
         Device: id = 2, name = /dev/sdc1
         Device: id = 3, name = /dev/sdd1
         Device: id = 5, name = /dev/sdb1
         Device: id = 1, name = /dev/sdf1

[New Thread 0x7ffff6f95700 (LWP 26524)]
[New Thread 0x7ffff6794700 (LWP 26525)]
[New Thread 0x7ffff5f93700 (LWP 26526)]
[New Thread 0x7ffff5792700 (LWP 26527)]
[New Thread 0x7ffff4f91700 (LWP 26528)]
[New Thread 0x7fffe7fff700 (LWP 26529)]
[New Thread 0x7fffe77fe700 (LWP 26530)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff5f93700 (LWP 26526)]
btrfs_new_block_group_record (leaf=leaf@entry=0x7fffec0008c0, key=key@entry=0x7ffff5f92c30, slot=slot@entry=3) at cmds-check.c:5013
5013            rec->flags = btrfs_disk_block_group_flags(leaf, ptr);


(gdb) bt

#0  btrfs_new_block_group_record (leaf=leaf@entry=0x7fffec0008c0, key=key@entry=0x7ffff5f92c30, slot=slot@entry=3) at cmds-check.c:5013
#1  0x00000000004309c6 in process_block_group_item (slot=3, key=0x7ffff5f92c30, leaf=0x7fffec0008c0, bg_cache=0x7fffffffe3e8) at chunk-recover.c:232
#2  extract_metadata_record (rc=rc@entry=0x7fffffffe3b0, leaf=leaf@entry=0x7fffec0008c0) at chunk-recover.c:717
#3  0x0000000000431190 in scan_one_device (dev_scan_struct=0x695820) at chunk-recover.c:807
#4  0x00007ffff7341284 in start_thread (arg=0x7ffff5f93700) at pthread_create.c:333
#5  0x00007ffff707e74d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109


(gdb) thread apply all bt

Thread 8 (Thread 0x7fffe77fe700 (LWP 26530)):
#0  clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:81
#1  0x00007ffff73411c0 in ?? () at pthread_create.c:237 from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007fffe77fe700 in ?? ()
#3  0x0000000000000000 in ?? ()

Thread 7 (Thread 0x7fffe7fff700 (LWP 26529)):
#0  clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:81
#1  0x00007ffff73411c0 in ?? () at pthread_create.c:237 from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007fffe7fff700 in ?? ()
#3  0x0000000000000000 in ?? ()

Thread 6 (Thread 0x7ffff4f91700 (LWP 26528)):
#0  0x00007ffff734a013 in pread64 () at ../sysdeps/unix/syscall-template.S:81
#1  0x0000000000430ea5 in pread64 (__offset=532480, __nbytes=<optimized out>, __buf=0x7fffdc00093c, __fd=7) at /usr/include/x86_64-linux-gnu/bits/unistd.h:117
#2  scan_one_device (dev_scan_struct=0x695860) at chunk-recover.c:776
#3  0x00007ffff7341284 in start_thread (arg=0x7ffff4f91700) at pthread_create.c:333
#4  0x00007ffff707e74d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 5 (Thread 0x7ffff5792700 (LWP 26527)):
#0  0x00007ffff734a013 in pread64 () at ../sysdeps/unix/syscall-template.S:81
#1  0x0000000000430ea5 in pread64 (__offset=368640, __nbytes=<optimized out>, __buf=0x7fffe000093c, __fd=6) at /usr/include/x86_64-linux-gnu/bits/unistd.h:117
#2  scan_one_device (dev_scan_struct=0x695840) at chunk-recover.c:776
#3  0x00007ffff7341284 in start_thread (arg=0x7ffff5792700) at pthread_create.c:333
#4  0x00007ffff707e74d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 4 (Thread 0x7ffff5f93700 (LWP 26526)):
#0  btrfs_new_block_group_record (leaf=leaf@entry=0x7fffec0008c0, key=key@entry=0x7ffff5f92c30, slot=slot@entry=3) at cmds-check.c:5013
#1  0x00000000004309c6 in process_block_group_item (slot=3, key=0x7ffff5f92c30, leaf=0x7fffec0008c0, bg_cache=0x7fffffffe3e8) at chunk-recover.c:232
#2  extract_metadata_record (rc=rc@entry=0x7fffffffe3b0, leaf=leaf@entry=0x7fffec0008c0) at chunk-recover.c:717
#3  0x0000000000431190 in scan_one_device (dev_scan_struct=0x695820) at chunk-recover.c:807
#4  0x00007ffff7341284 in start_thread (arg=0x7ffff5f93700) at pthread_create.c:333
#5  0x00007ffff707e74d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 3 (Thread 0x7ffff6794700 (LWP 26525)):
#0  0x00007ffff734a013 in pread64 () at ../sysdeps/unix/syscall-template.S:81
#1  0x0000000000430ea5 in pread64 (__offset=3493888, __nbytes=<optimized out>, __buf=0x7fffe800093c, __fd=4) at /usr/include/x86_64-linux-gnu/bits/unistd.h:117
#2  scan_one_device (dev_scan_struct=0x695800) at chunk-recover.c:776
#3  0x00007ffff7341284 in start_thread (arg=0x7ffff6794700) at pthread_create.c:333
#4  0x00007ffff707e74d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 2 (Thread 0x7ffff6f95700 (LWP 26524)):
#0  0x00007ffff734a013 in pread64 () at ../sysdeps/unix/syscall-template.S:81
#1  0x0000000000430ea5 in pread64 (__offset=3751936, __nbytes=<optimized out>, __buf=0x7ffff000093c, __fd=3) at /usr/include/x86_64-linux-gnu/bits/unistd.h:117
#2  scan_one_device (dev_scan_struct=0x6957e0) at chunk-recover.c:776
#3  0x00007ffff7341284 in start_thread (arg=0x7ffff6f95700) at pthread_create.c:333
#4  0x00007ffff707e74d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 1 (Thread 0x7ffff7fe08c0 (LWP 26520)):
#0  clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:81
#1  0x00007ffff73400fa in create_thread (pd=pd@entry=0x7fffe77fe700, attr=attr@entry=0x7fffffffe0e0, stopped_start=<optimized out>, stopped_start@entry=false, stackaddr=<optimized out>, thread_ran=0x7fffffffe0df) at ../sysdeps/unix/sysv/linux/createthread.c:102
#2  0x00007ffff7341a08 in __pthread_create_2_1 (newthread=newthread@entry=0x695900, attr=attr@entry=0x0, start_routine=start_routine@entry=0x430dd6 <scan_one_device>, arg=arg@entry=0x6958a0) at pthread_create.c:677
#3  0x000000000043150e in scan_devices (rc=0x7fffffffe3b0) at chunk-recover.c:876
#4  btrfs_recover_chunk_tree (path=path@entry=0x7fffffffe8ab "/dev/sdf1", verbose=verbose@entry=1, yes=yes@entry=0) at chunk-recover.c:2323
#5  0x000000000042f333 in cmd_rescue_chunk_recover (argc=<optimized out>, argv=<optimized out>) at cmds-rescue.c:95
#6  0x0000000000409ca4 in handle_command_group (grp=grp@entry=0x68de20 <rescue_cmd_group>, argc=3, argv=<optimized out>) at btrfs.c:144
#7  0x000000000042f67e in cmd_rescue (argc=<optimized out>, argv=<optimized out>) at cmds-rescue.c:220
#8  0x0000000000409dfe in main (argc=4, argv=0x7fffffffe650) at btrfs.c:252


* Re: Broken RAID6, segfault on chunk-recover
From: Abe @ 2016-01-07 16:39 UTC (permalink / raw)
  To: linux-btrfs

No wonder chunk-recover crashed: the superblocks are valid, but they all
contain a bogus chunk_root that exceeds total_bytes!
So all I need is to find where that btree root is on disk, right?

The problem is that almost every tool in btrfs-progs is unusable because
they try to read the chunk root first. Do you know of any alternative
tool that could scan the devices and find where the roots might be?


All superblock copies on all devices look like this:

# ./btrfs-show-super /dev/sdb1
superblock: bytenr=65536, device=/dev/sdb1
---------------------------------------------------------
csum			0x938855ef [match]
bytenr			65536
flags			0x1
			( WRITTEN )
magic			_BHRfS_M [match]
fsid			bec7b9a0-c56c-494e-8631-072d3f89c0c9
label			hive
generation		34245
root			6766859681792
sys_array_size		321
chunk_root_generation	32751
root_level		1
chunk_root		4415566151680  <--- Impossible Offset
chunk_root_level	1
log_root		0
log_root_transid	0
log_root_level		0
total_bytes		4096872997376  <---
bytes_used		2438972817408
sectorsize		4096
nodesize		16384
leafsize		16384
stripesize		4096
root_dir		6
num_devices		8
compat_flags		0x0
compat_ro_flags		0x0
incompat_flags		0x1e1
			( MIXED_BACKREF |
			  BIG_METADATA |
			  EXTENDED_IREF |
			  RAID56 |
			  SKINNY_METADATA )
csum_type		0
csum_size		4
cache_generation	34245
uuid_tree_generation	34245
dev_item.uuid		ea4a089b-c88c-4a6c-80e5-b2c93f0613f5
dev_item.fsid		bec7b9a0-c56c-494e-8631-072d3f89c0c9 [match]
dev_item.type		0
dev_item.total_bytes	512109125120
dev_item.bytes_used	349754621952
dev_item.io_align	4096
dev_item.io_width	4096
dev_item.sector_size	4096
dev_item.devid		5
dev_item.dev_group	0
dev_item.seek_speed	0
dev_item.bandwidth	0
dev_item.generation	0


Best regards,


* Re: Broken RAID6, segfault on chunk-recover
From: Qu Wenruo @ 2016-01-08  1:42 UTC (permalink / raw)
  To: Abe, linux-btrfs



Abe wrote on 2016/01/07 17:39 +0100:
> No wonder chunk-recover crashed: the superblocks are valid, but they all
> contain a bogus chunk_root that exceeds total_bytes!
> So all I need is to find where that btree root is on disk, right?
>
> The problem is that almost every tool in btrfs-progs is unusable because
> they try to read the chunk root first. Do you know of any alternative
> tool that could scan the devices and find where the roots might be?

First, you're wrong about the total_bytes and chunk_root bytenr.

The chunk_root bytenr is a btrfs *logical* bytenr.
It is completely OK for a logical bytenr to be beyond total_bytes.

So the problem is not here.
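
To make the logical-versus-physical point concrete, here is a minimal Python sketch of a toy chunk map. It is only an illustration: the `Chunk` class, the `resolve` helper, and the single chunk entry (its logical start, device, and physical offset) are made up for the example, and real RAID6 chunk mapping is considerably more involved. Only `total_bytes` and `chunk_root` are taken from the superblock dump in this thread.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

MIB = 1024 * 1024

@dataclass
class Chunk:
    """Toy chunk: maps one logical range onto one physical range."""
    logical_start: int
    length: int
    devid: int
    physical_start: int

def resolve(chunks: List[Chunk], logical: int) -> Optional[Tuple[int, int]]:
    """Return (devid, physical offset) for a logical address, or None."""
    for c in chunks:
        if c.logical_start <= logical < c.logical_start + c.length:
            return c.devid, c.physical_start + (logical - c.logical_start)
    return None

# Values from the superblock dump in this thread.
TOTAL_BYTES = 4096872997376
CHUNK_ROOT = 4415566151680

# Hypothetical chunk entry, invented for illustration only.
chunks = [Chunk(logical_start=CHUNK_ROOT - MIB, length=32 * MIB,
                devid=5, physical_start=1024 * MIB)]

# The logical address is larger than total_bytes, yet it still resolves
# to an ordinary physical offset on a device.
print(CHUNK_ROOT > TOTAL_BYTES)
print(resolve(chunks, CHUNK_ROOT))
```

The logical address space grows as chunks are allocated and freed over the life of the filesystem, so comparing a logical bytenr against total_bytes says nothing about whether it is valid.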


The real reason chunk-recover segfaults may be that, at your
cmds-check.c:5013, leaf->data is empty.

But you didn't give enough info or check *leaf, so I can't tell for now.
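
The hypothesis above is unconfirmed, but it points at a general hazard: a scanner that reads raw blocks from a damaged device has to validate item offsets before dereferencing them. The Python sketch below is not btrfs-progs code; `block_group_flags` is a hypothetical helper, and the layout constants (101-byte header with nritems at offset 96, 25-byte item headers, 24-byte block group item) reflect my understanding of the on-disk format rather than anything stated in this thread.

```python
import struct

HEADER_SIZE = 101       # on-disk tree block header (assumed layout)
NRITEMS_OFFSET = 96     # u32 item count inside the header
ITEM_SIZE = 25          # 17-byte key + u32 data offset + u32 data size
KEY_SIZE = 17
BLOCK_GROUP_ITEM = struct.Struct("<QQQ")  # used, chunk_objectid, flags

def block_group_flags(leaf: bytes, slot: int) -> int:
    """Read the flags of the block group item in `slot`, with bounds checks."""
    if len(leaf) < HEADER_SIZE:
        raise ValueError("buffer too small for a leaf header")
    (nritems,) = struct.unpack_from("<I", leaf, NRITEMS_OFFSET)
    if not 0 <= slot < nritems:
        raise ValueError("slot outside item count")
    item_pos = HEADER_SIZE + slot * ITEM_SIZE
    if item_pos + ITEM_SIZE > len(leaf):
        raise ValueError("item header outside leaf")
    # Item data offsets are relative to the end of the header.
    offset, size = struct.unpack_from("<II", leaf, item_pos + KEY_SIZE)
    start = HEADER_SIZE + offset
    if size < BLOCK_GROUP_ITEM.size or start + size > len(leaf):
        raise ValueError("item data outside leaf")
    _used, _chunk_objectid, flags = BLOCK_GROUP_ITEM.unpack_from(leaf, start)
    return flags
```

A check of this kind turns a corrupted item offset into a reportable error instead of an out-of-bounds read. Whether that is what happens at cmds-check.c:5013 would still need to be confirmed by inspecting *leaf in gdb.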

To recover your RAID6, the best chance lies in your superblock.

Please run btrfs-show-super with -f -a to get the full output,
including the backup roots, which may contain a good old chunk root.
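
Since the request applies to every surviving device, a small wrapper can gather all the dumps in one pass. This is a hedged sketch: `show_super_command` and `collect_super_dumps` are hypothetical helper names, and the tool path is an assumption (the thread runs the binary from a source checkout as `./btrfs-show-super`). The `-f -a` flags are the ones named above.

```python
import subprocess
from typing import Callable, Dict, Iterable, List

def show_super_command(device: str, tool: str = "btrfs-show-super") -> List[str]:
    """Build the command line for a full dump of all superblock copies."""
    return [tool, "-f", "-a", device]

def collect_super_dumps(devices: Iterable[str],
                        run: Callable = subprocess.run) -> Dict[str, str]:
    """Run the dump command on each device and return stdout per device."""
    dumps = {}
    for dev in devices:
        result = run(show_super_command(dev),
                     capture_output=True, text=True, check=False)
        dumps[dev] = result.stdout
    return dumps

if __name__ == "__main__":
    # Device list taken from the `btrfs fi show` output earlier in the thread.
    devices = ["/dev/sd%s1" % letter for letter in "bcdefgh"]
    for dev, text in collect_super_dumps(devices).items():
        print("=== %s ===" % dev)
        print(text)
```

The command only reads the superblocks, so it should be safe to run on the degraded array; the combined output can then be attached to a reply.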

Thanks,
Qu
