* Unable to rescue RAID5
From: Hiroshi Honda @ 2016-10-13 1:28 UTC
To: Btrfs List
I am using Btrfs RAID5 on 10x4TB disks.
About two weeks ago, one of the disks started taking a very long time on reads and writes, so I tried to replace it with a new disk using the replace command (btrfs replace ...), but that failed.
I then added a new disk to the array with the add command and tried to remove the bad disk with the delete command, but the deletion failed with an error message as well.
To get back to the original situation, I then removed the new disk from the array again.
After that the system kept printing an error message with a number that was counting up, so I rebooted the OS because the situation felt bad.
Since the reboot, I can no longer mount the array...
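For reference, the sequence of operations above was roughly the following; the device names and mount point here are placeholders, not the exact commands I ran:

# btrfs replace start /dev/sdBAD /dev/sdNEW /mnt/space1
# btrfs device add /dev/sdNEW /mnt/space1
# btrfs device delete /dev/sdBAD /mnt/space1
# btrfs device delete /dev/sdNEW /mnt/space1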
I copied each disk to an image file with dd, removed the real disks from the system, and attached the image files with losetup.
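Roughly like this for each disk (the source device and image path are just examples):

# dd if=/dev/sdX of=/images/disk1.img bs=1M conv=noerror,sync
# losetup -f --show /images/disk1.img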
I would appreciate any help with this.
# uname -a
Linux L176 4.7.4-1-ARCH #1 SMP PREEMPT Thu Sep 15 15:24:29 CEST 2016 x86_64 GNU/Linux
# ./btrfs version
btrfs-progs v4.8.1
# ./btrfs fi show
Label: 'space1' uuid: b27e1505-4203-464b-a9d5-96881ca550e0
Total devices 10 FS bytes used 14.47TiB
devid 1 size 3.64TiB used 1.61TiB path /dev/loop8
devid 2 size 3.64TiB used 1.61TiB path /dev/loop6
devid 3 size 3.64TiB used 1.61TiB path /dev/loop4
devid 4 size 3.64TiB used 1.61TiB path /dev/loop7
devid 5 size 3.64TiB used 1.61TiB path /dev/loop9
devid 6 size 3.64TiB used 1.61TiB path /dev/loop0
devid 7 size 3.64TiB used 1.61TiB path /dev/loop5
devid 8 size 3.64TiB used 1.61TiB path /dev/loop2
devid 9 size 3.64TiB used 1.61TiB path /dev/loop3
devid 10 size 3.64TiB used 1.61TiB path /dev/loop1
# mount -t btrfs -o ro,degraded,recovery /dev/loop3 /root/oo/
mount: wrong fs type, bad option, bad superblock on /dev/loop3,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so.
"-o ro,degraded", "-o ro,recovery" and "-o ro" are same result..
# ./btrfs restore /dev/loop3 /root/oo/
checksum verify failed on 16091506147328 found 1E4E4207 wanted 00000000
checksum verify failed on 16091506147328 found 1E4E4207 wanted 00000000
checksum verify failed on 16091506147328 found 58D16B1B wanted 64109AAB
checksum verify failed on 16091506147328 found 58D16B1B wanted 64109AAB
bytenr mismatch, want=16091506147328, have=3144945314663302585
ERROR: cannot read chunk root
Could not open root, trying backup super
warning, device 10 is missing
warning, device 8 is missing
warning, device 7 is missing
warning, device 6 is missing
warning, device 5 is missing
warning, device 4 is missing
warning, device 3 is missing
warning, device 2 is missing
warning, device 1 is missing
warning, device 2 is missing
warning, device 3 is missing
warning, device 4 is missing
warning, device 5 is missing
warning, device 6 is missing
warning, device 7 is missing
warning, device 8 is missing
warning, device 10 is missing
warning, device 1 is missing
checksum verify failed on 16091506147328 found 1E4E4207 wanted 00000000
checksum verify failed on 16091506147328 found 1E4E4207 wanted 00000000
checksum verify failed on 16091506147328 found 1E4E4207 wanted 00000000
bytenr mismatch, want=16091506147328, have=0
ERROR: cannot read chunk root
Could not open root, trying backup super
warning, device 10 is missing
warning, device 8 is missing
warning, device 7 is missing
warning, device 6 is missing
warning, device 5 is missing
warning, device 4 is missing
warning, device 3 is missing
warning, device 2 is missing
warning, device 1 is missing
warning, device 2 is missing
warning, device 3 is missing
warning, device 4 is missing
warning, device 5 is missing
warning, device 6 is missing
warning, device 7 is missing
warning, device 8 is missing
warning, device 10 is missing
warning, device 1 is missing
checksum verify failed on 16091506147328 found 1E4E4207 wanted 00000000
checksum verify failed on 16091506147328 found 1E4E4207 wanted 00000000
checksum verify failed on 16091506147328 found 1E4E4207 wanted 00000000
bytenr mismatch, want=16091506147328, have=0
ERROR: cannot read chunk root
Could not open root, trying backup super
# ./btrfs-image /dev/loop3 diskid9.img
checksum verify failed on 16091506147328 found 1E4E4207 wanted 00000000
checksum verify failed on 16091506147328 found 1E4E4207 wanted 00000000
checksum verify failed on 16091506147328 found 58D16B1B wanted 64109AAB
checksum verify failed on 16091506147328 found 58D16B1B wanted 64109AAB
bytenr mismatch, want=16091506147328, have=3144945314663302585
ERROR: cannot read chunk root
ERROR: open ctree failed
ERROR: create failed: Success
# ./btrfs check --readonly /dev/loop3
checksum verify failed on 16091506147328 found 1E4E4207 wanted 00000000
checksum verify failed on 16091506147328 found 1E4E4207 wanted 00000000
checksum verify failed on 16091506147328 found 58D16B1B wanted 64109AAB
checksum verify failed on 16091506147328 found 58D16B1B wanted 64109AAB
bytenr mismatch, want=16091506147328, have=3144945314663302585
ERROR: cannot read chunk root
ERROR: cannot open file system
# ./btrfs-show-super -f -a /dev/loop[0-9]
http://xero.incoming.jp/btrfshelp/showsuper.txt
(148kB)
# ./btrfs rescue super-recover -v /dev/loop3
http://xero.incoming.jp/btrfshelp/rescue_super-recover.txt
(3kB)
"./btrfs rescue super-recover" says /dev/loop3 is bad.
But, I suppose /dev/loop3 is sole correct disk. Because the generation is old than another.
I guess perhaps this bad disk was rejected when I did "btrfs delete..." command.
That's why this disk's generation is being stop.
If so, If I could use this information, I wonder rescue files from the array??
But I have no idea about the method...
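For reference, the generation values can be compared across all of the devices like this, based on the btrfs-show-super output linked above:

# for d in /dev/loop{0..9}; do echo "$d"; ./btrfs-show-super "$d" | grep ^generation; done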
Thank you
Hiroshi Honda
* Re: Unable to rescue RAID5
From: Duncan @ 2016-10-14 2:47 UTC
To: linux-btrfs
Hiroshi Honda posted on Thu, 13 Oct 2016 10:28:19 +0900 as excerpted:
> I am using Btrfs RAID5 on 10x4TB disks.
> About two weeks ago, one of the disks started taking a very long time on
> reads and writes, so I tried to replace it with a new disk using the
> replace command (btrfs replace ...), but that failed.
> I then added a new disk to the array with the add command and tried to
> remove the bad disk with the delete command, but the deletion failed
> with an error message as well.
> To get back to the original situation, I then removed the new disk from
> the array again.
> After that the system kept printing an error message with a number that
> was counting up, so I rebooted the OS because the situation felt bad.
> Since the reboot, I can no longer mount the array...
Btrfs parity-raid currently has serious known issues that aren't likely
to be corrected in the short term, as to some extent they're in the
design. The recommendation is thus not to use btrfs raid56 in
production, and if you're already on it, to get off it ASAP, because
while it runs fine in normal operation, failure and repair modes simply
can't be guaranteed to work as one might expect at this point.
There's more about that on the list if you're interested, and others
following raid56 mode more closely than I will likely reply with further
specific details, as well, but that's the basic raid56 mode status and
recommendation in practical terms.
As for rescuing files from the array, the proper answer is that btrfs as
a whole is still stabilizing, not fully stable and mature. The sysadmin's
rule is that if you don't have at least one backup, then by your failure
to back up you are defining the data as worth, at most, less than the
time, hassle and resources necessary to take that backup. That rule
applies even more strongly to the not yet fully stable and mature btrfs
than it does to properly stable and mature filesystems.
And of course btrfs raid56 mode as a feature is newer and less stabilized
than btrfs in general, so the rule that if you don't have a backup, you
really are defining the data as of throw-away value, applies even more to
anything on btrfs raid56.
So to rescue the data, restore it from backups to a preferably non-btrfs-
raid56 mode filesystem. Or if you don't have backups, don't worry about
it, because you effectively already defined the data as no more than
throw-away value by not having those backups in the first place.
That's the proper answer. In practice... all hope isn't yet lost.
There's some chance to rescue your data, but it'll take time and
patience, and likely a significant level of technical understanding along
with some help from btrfs experts that know more about the btrfs raid56
technical details than I do.
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: Unable to rescue RAID5
From: Hiroshi Honda @ 2016-10-14 10:11 UTC
To: Duncan; +Cc: linux-btrfs
> That's the proper answer. In practice... all hope isn't yet lost.
I understand the proper answer.
I'll take better care of backups in the future.
Is there any step or method I can try from this situation?
Thank you
Hiroshi Honda
* Re: Unable to rescue RAID5
From: Austin S. Hemmelgarn @ 2016-10-14 11:28 UTC
To: Hiroshi Honda, Duncan; +Cc: linux-btrfs
On 2016-10-14 06:11, Hiroshi Honda wrote:
>> That's the proper answer. In practice... all hope isn't yet lost.
> I understand the proper answer.
> I'll take better care of backups in the future.
>
> Is there any step or method I can try from this situation?
>
You should probably look at `btrfs restore`. I'm not sure how well it
works with the raid56 code, and you'll need somewhere to put the data
being restored, but that's probably your best option.
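Something along these lines, with the destination being wherever you have
enough free space (the path here is just a placeholder):

# btrfs restore -v -i /dev/loop3 /mnt/restore-target

If I remember the options correctly, restore can also be pointed at a
specific superblock mirror with -u, or at an alternative tree root found
with btrfs-find-root via -t, in case the default superblock is part of
the problem.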
* Re: Unable to rescue RAID5
From: Hiroshi Honda @ 2016-10-16 2:34 UTC
To: Austin S. Hemmelgarn; +Cc: linux-btrfs
> You should probably look at `btrfs restore`. I'm not sure how well it works with the raid56 code, and
> you'll need somewhere to put the data being restored, but that's probably your best option.
I tried that already, and it failed with "cannot read chunk root".
Please see the first article of this thread:
http://www.spinics.net/lists/linux-btrfs/msg59683.html
I also specified the -i option to ignore errors, but got the same result.
The 1st super mirror (67108864, i.e. the 64MiB copy) and the 2nd super
mirror (274877906944, i.e. the 256GiB copy) reported that all devices are
missing, as below, even though all of the devices are present...
> warning, device 10 is missing
> warning, device 8 is missing
> warning, device 7 is missing
> warning, device 6 is missing
> warning, device 5 is missing
> warning, device 4 is missing
> warning, device 3 is missing
> warning, device 2 is missing
> warning, device 1 is missing
> warning, device 2 is missing
> warning, device 3 is missing
> warning, device 4 is missing
> warning, device 5 is missing
> warning, device 6 is missing
> warning, device 7 is missing
> warning, device 8 is missing
> warning, device 10 is missing
> warning, device 1 is missing
How can I avoid those missing-device errors?
Would I be able to restore files if I could avoid them?
Do I need to change the source code, or is there a btrfs command or option for this?
Thank you
Hiroshi Honda