* Data recovery from a linear multi-disk btrfs file system
@ 2016-07-15 9:51 Matt
2016-07-15 12:10 ` Austin S. Hemmelgarn
From: Matt @ 2016-07-15 9:51 UTC
To: linux-btrfs
Hello
I glued together 6 disks in linear LVM fashion (no RAID) to obtain one large file system (see below). One of the 6 disks failed. What is the best way to recover from this?
Thanks to RAID1 of the metadata I can still access the data residing on the remaining 5 disks after mounting ro,force. What I would like to do now is to
1) Find out the names of all the files with missing data
2) Make the file system fully functional (rw) again.
To achieve 2) I wanted to move the data off the disk. This, however, turns out to be rather difficult.
- rsync does not provide an immediate time-out option in case of an IO error
- Even when I set the time-out for dd_rescue to a minimum, the transfer speed is still way too low to move the data (>15TB) off the file system.
Both methods are too slow to move the data off within a reasonable time frame.
Does anybody have a suggestion how to best recover from this? (Our backup is incomplete).
I am looking for either a tool to move the data off — something which gives up immediately in case of an IO error and logs the affected files (rough sketch below).
Alternatively, I am looking for a btrfs command like `btrfs device delete missing` for a non-RAID multi-disk btrfs filesystem.
Would some variant of "btrfs balance" do something helpful?
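For concreteness, the kind of tool I have in mind would behave roughly like this sketch (destination path and timeout are made up, untested):

    cd /work
    find . -type f -print0 | while IFS= read -r -d '' f; do
        mkdir -p "/mnt/dest/$(dirname "$f")"
        # give up after 10s on a stalled read instead of retrying for
        # minutes, and log the file rather than aborting the whole transfer
        timeout 10 cp --preserve=all "$f" "/mnt/dest/$f" \
            || echo "$f" >> /tmp/failed-files.log
    done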
Any help is appreciated!
Regards,
Matt
# btrfs fi show
Label: none uuid: d82fff2c-0232-47dd-a257-04c67141fc83
Total devices 6 FS bytes used 16.83TiB
devid 1 size 3.64TiB used 3.47TiB path /dev/sdc
devid 2 size 3.64TiB used 3.47TiB path /dev/sdd
devid 3 size 3.64TiB used 3.47TiB path /dev/sde
devid 4 size 3.64TiB used 3.47TiB path /dev/sdf
devid 5 size 1.82TiB used 1.82TiB path /dev/sdb
*** Some devices missing
# btrfs fi df /work
Data, RAID0: total=18.31TiB, used=16.80TiB
Data, single: total=8.00MiB, used=8.00MiB
System, RAID1: total=8.00MiB, used=896.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, RAID1: total=34.00GiB, used=30.18GiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=512.00MiB, used=0.00B
* Re: Data recovery from a linear multi-disk btrfs file system
2016-07-15 9:51 Data recovery from a linear multi-disk btrfs file system Matt
@ 2016-07-15 12:10 ` Austin S. Hemmelgarn
2016-07-15 18:45 ` Matt
From: Austin S. Hemmelgarn @ 2016-07-15 12:10 UTC
To: Matt, linux-btrfs
On 2016-07-15 05:51, Matt wrote:
> Hello
>
> I glued together 6 disks in linear LVM fashion (no RAID) to obtain one large file system (see below). One of the 6 disks failed. What is the best way to recover from this?
>
> Thanks to RAID1 of the metadata I can still access the data residing on the remaining 5 disks after mounting ro,force. What I would like to do now is to
>
> 1) Find out the names of all the files with missing data
> 2) Make the file system fully functional (rw) again.
>
> To achieve 2) I wanted to move the data off the disk. This, however, turns out to be rather difficult.
> - rsync does not provide an immediate time-out option in case of an IO error
> - Even when I set the time-out for dd_rescue to a minimum, the transfer speed is still way too low to move the data (>15TB) off the file system.
> Both methods are too slow to move the data off within a reasonable time frame.
>
> Does anybody have a suggestion how to best recover from this? (Our backup is incomplete).
> I am looking for either a tool to move the data off — something which gives up immediately in case of an IO error and logs the affected files.
> Alternatively, I am looking for a btrfs command like `btrfs device delete missing` for a non-RAID multi-disk btrfs filesystem.
> Would some variant of "btrfs balance" do something helpful?
>
> Any help is appreciated!
>
> Regards,
> Matt
>
> # btrfs fi show
> Label: none uuid: d82fff2c-0232-47dd-a257-04c67141fc83
> Total devices 6 FS bytes used 16.83TiB
> devid 1 size 3.64TiB used 3.47TiB path /dev/sdc
> devid 2 size 3.64TiB used 3.47TiB path /dev/sdd
> devid 3 size 3.64TiB used 3.47TiB path /dev/sde
> devid 4 size 3.64TiB used 3.47TiB path /dev/sdf
> devid 5 size 1.82TiB used 1.82TiB path /dev/sdb
> *** Some devices missing
>
>
> # btrfs fi df /work
> Data, RAID0: total=18.31TiB, used=16.80TiB
> Data, single: total=8.00MiB, used=8.00MiB
> System, RAID1: total=8.00MiB, used=896.00KiB
> System, single: total=4.00MiB, used=0.00B
> Metadata, RAID1: total=34.00GiB, used=30.18GiB
> Metadata, single: total=8.00MiB, used=0.00B
> GlobalReserve, single: total=512.00MiB, used=0.00B
The tool you want is `btrfs restore`. You'll need somewhere to put the
files from this too of course. That said, given that you had data in
raid0 mode, you're not likely to get much other than very small files
back out of this, and given other factors, you're not likely to get what
you would consider reasonable performance out of this either.
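Something along these lines should work as a starting point (a sketch,
untested; run it with the filesystem unmounted, and /mnt/recovery is
just an example destination with enough free space):

    # -i keeps going past errors instead of aborting, -v lists each file
    btrfs restore -i -v /dev/sdc /mnt/recovery

Check `btrfs restore --help` for the flags your version supports.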
Your best bet to get a working filesystem again would be to just
recreate it from scratch; there's not much else that can be done when
you've got a raid0 profile and have lost a disk.
* Re: Data recovery from a linear multi-disk btrfs file system
2016-07-15 12:10 ` Austin S. Hemmelgarn
@ 2016-07-15 18:45 ` Matt
2016-07-15 18:52 ` Austin S. Hemmelgarn
2016-07-20 22:19 ` Kai Krakow
From: Matt @ 2016-07-15 18:45 UTC
To: Austin S. Hemmelgarn; +Cc: linux-btrfs
> On 15 Jul 2016, at 14:10, Austin S. Hemmelgarn <ahferroin7@gmail.com> wrote:
>
> On 2016-07-15 05:51, Matt wrote:
>> Hello
>>
>> I glued together 6 disks in linear LVM fashion (no RAID) to obtain one large file system (see below). One of the 6 disks failed. What is the best way to recover from this?
>>
> The tool you want is `btrfs restore`. You'll need somewhere to put the files from this too of course. That said, given that you had data in raid0 mode, you're not likely to get much other than very small files back out of this, and given other factors, you're not likely to get what you would consider reasonable performance out of this either.
Thanks so much for pointing me towards btrfs-restore. I surely will give it a try. Note that the FS is not RAID0 but a linear (“JBOD”) configuration. This is why it somehow did not occur to me to try btrfs-restore. The good news is that in this configuration the files are *not* distributed across disks. We can read most of the files just fine. The failed disk was actually smaller than the other five, so we should be able to recover more than 5/6 of the data, shouldn’t we? My trouble is that the IO errors due to the missing disk cripple the transfer speed of both rsync and dd_rescue.
> Your best bet to get a working filesystem again would be to just recreate it from scratch, there's not much else that can be done when you've got a raid0 profile and have lost a disk.
This is what I plan to do if btrfs-restore turns out to be too slow and nobody on this list has any better idea. It will, however, require transferring >15TB across the Atlantic (this is where the “backup” resides). This can be tedious, which is why I would love to avoid it.
Matt
* Re: Data recovery from a linear multi-disk btrfs file system
2016-07-15 18:45 ` Matt
@ 2016-07-15 18:52 ` Austin S. Hemmelgarn
2016-07-20 20:20 ` Chris Murphy
2016-07-20 22:19 ` Kai Krakow
From: Austin S. Hemmelgarn @ 2016-07-15 18:52 UTC
To: Matt; +Cc: linux-btrfs
On 2016-07-15 14:45, Matt wrote:
>
>> On 15 Jul 2016, at 14:10, Austin S. Hemmelgarn <ahferroin7@gmail.com> wrote:
>>
>> On 2016-07-15 05:51, Matt wrote:
>>> Hello
>>>
>>> I glued together 6 disks in linear LVM fashion (no RAID) to obtain one large file system (see below). One of the 6 disks failed. What is the best way to recover from this?
>>>
>> The tool you want is `btrfs restore`. You'll need somewhere to put the files from this too of course. That said, given that you had data in raid0 mode, you're not likely to get much other than very small files back out of this, and given other factors, you're not likely to get what you would consider reasonable performance out of this either.
>
> Thanks so much for pointing me towards btrfs-restore. I surely will give it a try. Note that the FS is not RAID0 but a linear (“JBOD”) configuration. This is why it somehow did not occur to me to try btrfs-restore. The good news is that in this configuration the files are *not* distributed across disks. We can read most of the files just fine. The failed disk was actually smaller than the other five, so we should be able to recover more than 5/6 of the data, shouldn’t we? My trouble is that the IO errors due to the missing disk cripple the transfer speed of both rsync and dd_rescue.
Your own 'btrfs fi df' output clearly says that more than 99% of your
data chunks are in a RAID0 profile, hence my statement. Functionally,
this is similar to concatenating all the disks, but it gets better
performance and is a bit harder to recover data from. I hadn't noticed,
however, that the disks were different sizes, so you should be able to
recover a significant amount of data from it.
>
>> Your best bet to get a working filesystem again would be to just recreate it from scratch, there's not much else that can be done when you've got a raid0 profile and have lost a disk.
>
> This is what I plan to do if btrfs-restore turns out to be too slow and nobody on this list has any better idea. It will, however, require transferring >15TB across the Atlantic (this is where the “backup” resides). This can be tedious, which is why I would love to avoid it.
Entirely understandable.
* Re: Data recovery from a linear multi-disk btrfs file system
2016-07-15 18:52 ` Austin S. Hemmelgarn
@ 2016-07-20 20:20 ` Chris Murphy
From: Chris Murphy @ 2016-07-20 20:20 UTC
To: Austin S. Hemmelgarn; +Cc: Matt, Btrfs BTRFS
On Fri, Jul 15, 2016 at 12:52 PM, Austin S. Hemmelgarn
<ahferroin7@gmail.com> wrote:
> Your own 'btrfs fi df' output clearly says that more than 99% of your data
> chunks are in a RAID0 profile, hence my statement.
At some point in ancient Btrfs list history, there was a call to change
the mkfs default for multiple devices from the raid0 data profile to
single. But I just tried it and it's still raid0. That's pretty risky
for a default, as any file larger than 64KiB is going to end up with an
unrecoverable hole in its data, and it might be that the OP was
expecting the single profile when creating this file system and just
overlooked that it's raid0, not linear.
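For reference, the profiles can be pinned explicitly at mkfs time,
something like this (device names are placeholders):

    # explicit single data profile instead of the multi-device raid0 default
    mkfs.btrfs -d single -m raid1 /dev/sdb /dev/sdc /dev/sdd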
Even the single profile is risky. Some users might be prepared to lose
some data on one failed device. But the safest default would be the
raid1 profile (two copies only, should we one day get n-copies) for
both data and metadata when no profile is otherwise specified.
--
Chris Murphy
* Re: Data recovery from a linear multi-disk btrfs file system
2016-07-15 18:45 ` Matt
2016-07-15 18:52 ` Austin S. Hemmelgarn
@ 2016-07-20 22:19 ` Kai Krakow
2016-07-20 22:30 ` Kai Krakow
From: Kai Krakow @ 2016-07-20 22:19 UTC
To: linux-btrfs
On Fri, 15 Jul 2016 20:45:32 +0200, Matt <langelino@gmx.net> wrote:
> > On 15 Jul 2016, at 14:10, Austin S. Hemmelgarn
> > <ahferroin7@gmail.com> wrote:
> >
> > On 2016-07-15 05:51, Matt wrote:
> >> Hello
> >>
> >> I glued together 6 disks in linear LVM fashion (no RAID) to obtain
> >> one large file system (see below). One of the 6 disks failed. What
> >> is the best way to recover from this?
> > The tool you want is `btrfs restore`. You'll need somewhere to put
> > the files from this too of course. That said, given that you had
> > data in raid0 mode, you're not likely to get much other than very
> > small files back out of this, and given other factors, you're not
> > likely to get what you would consider reasonable performance out of
> > this either.
>
> Thanks so much for pointing me towards btrfs-restore. I surely will
> give it a try. Note that the FS is not RAID0 but a linear (“JBOD”)
> configuration. This is why it somehow did not occur to me to try
> btrfs-restore. The good news is that in this configuration the files
> are *not* distributed across disks. We can read most of the files
> just fine. The failed disk was actually smaller than the other five,
> so we should be able to recover more than 5/6 of the data,
> shouldn’t we? My trouble is that the IO errors due to the missing
> disk cripple the transfer speed of both rsync and dd_rescue.
>
> > Your best bet to get a working filesystem again would be to just
> > recreate it from scratch, there's not much else that can be done
> > when you've got a raid0 profile and have lost a disk.
>
> This is what I plan to do if btrfs-restore turns out to be
> too slow and nobody on this list has any better idea. It will,
> however, require transferring >15TB across the Atlantic (this is
> where the “backup” resides). This can be tedious, which is why I
> would love to avoid it.
Depending on the importance of the data it may be cheaper to transfer
the data physically on hard disks...
However, if your backup potentially includes a lot of duplicate blocks,
you may have a better experience using borgbackup to transfer the data
- it's a free, deduplicating and compressing backup tool. If your data
isn't already compressed and doesn't contain a lot of images, you may
end up with 8TB or less data to transfer. I'm using borg to compress a
300GB server down to a 50-60GB backup (and this already includes 4 weeks
worth of retention). My home machine compresses down to 1.2TB from
1.8TB of data with around 1 week of retention - though I have a lot of
non-duplicated binary data (images, videos, games).
When backing up across a long or slow network link, you may want to work
with a local cache of the backup - and you may want to work with
deduplication. My strategy is to use borgbackup to create backups
locally, then rsync the result to the remote location.
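As a sketch (repository path, compression choice and remote name are
only examples; check the borg docs for your version):

    # initialise a local repository once
    borg init --encryption=repokey /srv/borg/work.repo
    # take a deduplicated, compressed snapshot of the data
    borg create --compression zlib --stats \
        /srv/borg/work.repo::'work-{now}' /work
    # ship the deduplicated repository instead of the raw files
    rsync -a /srv/borg/work.repo/ remote:/backup/work.repo/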
--
Regards,
Kai
Replies to list-only preferred.
* Re: Data recovery from a linear multi-disk btrfs file system
2016-07-20 22:19 ` Kai Krakow
@ 2016-07-20 22:30 ` Kai Krakow
From: Kai Krakow @ 2016-07-20 22:30 UTC
To: linux-btrfs
On Thu, 21 Jul 2016 00:19:41 +0200, Kai Krakow <hurikhan77@gmail.com> wrote:
> On Fri, 15 Jul 2016 20:45:32 +0200, Matt <langelino@gmx.net> wrote:
>
> > > On 15 Jul 2016, at 14:10, Austin S. Hemmelgarn
> > > <ahferroin7@gmail.com> wrote:
> > >
> > > On 2016-07-15 05:51, Matt wrote:
> [...]
> > > The tool you want is `btrfs restore`. You'll need somewhere to
> > > put the files from this too of course. That said, given that you
> > > had data in raid0 mode, you're not likely to get much other than
> > > very small files back out of this, and given other factors,
> > > you're not likely to get what you would consider reasonable
> > > performance out of this either.
> >
> > Thanks so much for pointing me towards btrfs-restore. I surely will
> > give it a try. Note that the FS is not RAID0 but a linear (“JBOD”)
> > configuration. This is why it somehow did not occur to me to try
> > btrfs-restore. The good news is that in this configuration the files
> > are *not* distributed across disks. We can read most of the files
> > just fine. The failed disk was actually smaller than the other
> > five, so we should be able to recover more than 5/6 of the data,
> > shouldn’t we? My trouble is that the IO errors due to the missing
> > disk cripple the transfer speed of both rsync and dd_rescue.
> >
> > > Your best bet to get a working filesystem again would be to just
> > > recreate it from scratch, there's not much else that can be done
> > > when you've got a raid0 profile and have lost a disk.
> >
> > This is what I plan to do if btrfs-restore turns out to be
> > too slow and nobody on this list has any better idea. It will,
> > however, require transferring >15TB across the Atlantic (this is
> > where the “backup” resides). This can be tedious, which is why I
> > would love to avoid it.
>
> Depending on the importance of the data it may be cheaper to transfer
> the data physically on hard disks...
>
> However, if your backup potentially includes a lot of duplicate
> blocks, you may have a better experience using borgbackup to transfer
> the data
> - it's a free, deduplicating and compressing backup tool. If your data
> isn't already compressed and doesn't contain a lot of images, you may
> end up with 8TB or less data to transfer. I'm using borg to compress a
> 300GB server down to a 50-60GB backup (and this already includes 4
> weeks worth of retention). My home machine compresses down to 1.2TB
> from 1.8TB of data with around 1 week of retention - though I have a
> lot of non-duplicated binary data (images, videos, games).
>
> When backing up across a long or slow network link, you may want to
> work with a local cache of the backup - and you may want to work with
> deduplication. My strategy is to use borgbackup to create backups
> locally, then rsync the result to the remote location.
BTW: You should start transferring the backup to your local location in
parallel to recovering your local storage. Another option would be to
recover what's possible, take a borgbackup of it, then use borg to
back up the remote location into the same repository - thanks to its
deduplication it would only transfer blocks not known locally. You then
have the option to recover data easily from both copies in the
repository. This, however, will only work properly if your remote
backup has been built using rsync (so it has the same file structure
and is not some archive format) or is extracted temporarily at the
remote location. Extra tip: borg will recover from partial
transfers, so you could still parallelize both options. ;-) You just
cannot access a single backup repository concurrently.
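A sketch of that second option (host names and paths are invented; it
assumes borg is installed on the remote side and can reach the
repository over ssh):

    # locally: snapshot whatever btrfs restore salvaged
    borg create /srv/borg/work.repo::'recovered-{now}' /mnt/recovery
    # on the remote host: back up its copy into the same repository;
    # deduplication means only blocks missing locally get transferred
    borg create \
        ssh://backup@local.example.org/srv/borg/work.repo::'remote-{now}' \
        /backup/work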
--
Regards,
Kai
Replies to list-only preferred.
Thread overview: 7+ messages
2016-07-15 9:51 Data recovery from a linear multi-disk btrfs file system Matt
2016-07-15 12:10 ` Austin S. Hemmelgarn
2016-07-15 18:45 ` Matt
2016-07-15 18:52 ` Austin S. Hemmelgarn
2016-07-20 20:20 ` Chris Murphy
2016-07-20 22:19 ` Kai Krakow
2016-07-20 22:30 ` Kai Krakow