* Self-destruct of btrfs RAID6 array
@ 2015-11-20 4:11 Paul Loewenstein
2015-11-20 6:19 ` Duncan
2015-11-20 13:29 ` Austin S Hemmelgarn
0 siblings, 2 replies; 3+ messages in thread
From: Paul Loewenstein @ 2015-11-20 4:11 UTC (permalink / raw)
To: linux-btrfs
I have just had an apparently catastrophic collapse of a large RAID6
array. I was hoping that the dual-redundancy of a RAID6 array would
compensate for having no backup media large enough to back it up!
Any suggestions for repairing this array, at least to the point of
mounting it read-only? I am thinking of trying to mount it degraded
with different devices missing, but I don't know if that will be an
exercise in futility.
btrfs fi show still works!
Label: 'btrfsdata' uuid: ccde0a00-e50b-4154-977f-ac591ab580a5
Total devices 6 FS bytes used 9.62TiB
devid 10 size 3.64TiB used 2.41TiB path /dev/sdg
devid 11 size 3.64TiB used 2.41TiB path /dev/sda
devid 12 size 3.64TiB used 2.41TiB path /dev/sdb
devid 13 size 3.64TiB used 2.41TiB path /dev/sdc
devid 14 size 3.64TiB used 2.41TiB path /dev/sdd
devid 15 size 3.64TiB used 2.41TiB path /dev/sde
It spontaneously (I believe it was after it successfully mounted rw on
boot, but I can't check for sure without looking at the last file
creation time). After another reboot it won't mount at all.
btrfs check /dev/sda gives:
parent transid verify failed on 73440384909312 wanted 491976 found 485531
parent transid verify failed on 73440384909312 wanted 491976 found 485531
checksum verify failed on 73440384909312 found 26943E11 wanted 0FCB3E97
checksum verify failed on 73440384909312 found AAD98681 wanted EA004FE8
checksum verify failed on 73440384909312 found AAD98681 wanted EA004FE8
bytenr mismatch, want=73440384909312, have=274180945215488
Couldn't read chunk root
Couldn't open file system
Looking back in the journal (I shall now be setting up journal
monitoring), I found lots of errors, starting last September, only a few
weeks after converting from RAID1 to RAID6.
Blank lines precede reboots and for the first log indicate the omission
of over 30K entries! The first log must represent some software bug,
because /dev/sdh is NOT a btrfs device!
LOG EXTRACTS, while the filesystem was still mounted. Journal grepped
for btrfs, boot line added after. Note different kernel version on
reboot after upgrade.
Aug 26 20:12:24 cambridge kernel: Linux version 4.1.5-100.fc21.x86_64
(mockbuild@bkernel02.phx2.fedoraproject.org) (gcc version 4.9.2 20150212
(Red Hat 4.9.2-6) (GCC) ) #1 SMP Tue Aug 11 00:24:23 UTC 2015
Aug 26 20:12:52 cambridge kernel: Btrfs loaded
Aug 26 20:12:52 cambridge kernel: BTRFS: device label btrfsdata devid 11
transid 484422 /dev/sda
Aug 26 20:12:52 cambridge kernel: BTRFS: device label btrfsdata devid 15
transid 484422 /dev/sde
Aug 26 20:12:52 cambridge kernel: BTRFS: device label btrfsdata devid 13
transid 484422 /dev/sdc
Aug 26 20:12:52 cambridge kernel: BTRFS: device label btrfsdata devid 14
transid 484422 /dev/sdd
Aug 26 20:12:52 cambridge kernel: BTRFS: device label btrfsdata devid 12
transid 484422 /dev/sdb
Aug 26 20:12:52 cambridge kernel: BTRFS: device label btrfsdata devid 10
transid 484422 /dev/sdg
Sep 13 16:11:34 cambridge kernel: BTRFS: bdev /dev/sdh errs: wr 0, rd 0,
flush 1, corrupt 0, gen 0
Sep 13 16:11:34 cambridge kernel: BTRFS: lost page write due to I/O
error on /dev/sdh
Sep 13 16:11:34 cambridge kernel: BTRFS: bdev /dev/sdh errs: wr 1, rd 0,
flush 1, corrupt 0, gen 0
Sep 13 16:11:34 cambridge kernel: BTRFS: lost page write due to I/O
error on /dev/sdh
Sep 13 16:11:34 cambridge kernel: BTRFS: bdev /dev/sdh errs: wr 2, rd 0,
flush 1, corrupt 0, gen 0
Sep 13 16:11:34 cambridge kernel: BTRFS: lost page write due to I/O
error on /dev/sdh
Nov 15 15:21:51 cambridge kernel: BTRFS: lost page write due to I/O
error on /dev/sdh
Nov 15 15:21:51 cambridge kernel: BTRFS: bdev /dev/sdh errs: wr 18713,
rd 0, flush 6238, corrupt 0, gen 0
Nov 15 15:21:51 cambridge kernel: BTRFS: lost page write due to I/O
error on /dev/sdh
Nov 15 15:21:51 cambridge kernel: BTRFS: bdev /dev/sdh errs: wr 18714,
rd 0, flush 6238, corrupt 0, gen 0
Nov 15 15:23:00 cambridge kernel: Linux version 4.1.12-101.fc21.x86_64
(mockbuild@bkernel01.phx2.fedoraproject.org) (gcc version 4.9.2 20150212
(Red Hat 4.9.2-6) (GCC) ) #1 SMP Wed Oct 28 15:18:44 UTC 2015
Nov 15 15:23:33 cambridge kernel: Btrfs loaded
Nov 15 15:23:33 cambridge kernel: BTRFS: device label btrfsdata devid 14
transid 492036 /dev/sdd
Nov 15 15:23:33 cambridge kernel: BTRFS: device label btrfsdata devid 15
transid 485798 /dev/sde
Nov 15 15:23:33 cambridge kernel: BTRFS: device label btrfsdata devid 11
transid 492036 /dev/sda
Nov 15 15:23:33 cambridge kernel: BTRFS: device label btrfsdata devid 13
transid 492036 /dev/sdc
Nov 15 15:23:33 cambridge kernel: BTRFS: device label btrfsdata devid 10
transid 492036 /dev/sdg
Nov 15 15:23:33 cambridge kernel: BTRFS: device label btrfsdata devid 12
transid 492036 /dev/sdb
Nov 15 15:23:33 cambridge kernel: BTRFS (device sdb): parent transid
verify failed on 73440384909312 wanted 491976 found 485531
Nov 15 15:23:33 cambridge kernel: BTRFS (device sdb): parent transid
verify failed on 73440384913408 wanted 491976 found 485531
Nov 15 15:23:33 cambridge kernel: BTRFS (device sdb): parent transid
verify failed on 73440384917504 wanted 491976 found 485696
Nov 15 15:23:33 cambridge kernel: BTRFS (device sdb): parent transid
verify failed on 73440384921600 wanted 491976 found 485696
Nov 15 15:23:33 cambridge kernel: BTRFS: bdev /dev/sde errs: wr 18711,
rd 0, flush 6237, corrupt 0, gen 0
Nov 15 15:23:33 cambridge kernel: BTRFS (device sdb): bad tree block
start 1121375725894905312 74200909787136
Nov 15 15:23:33 cambridge kernel: BTRFS (device sdb): bad tree block
start 7250342666203184288 74200909791232
Nov 15 15:23:33 cambridge kernel: BTRFS (device sdb): parent transid
verify failed on 73417618042880 wanted 488487 found 485439
Nov 15 20:37:14 cambridge kernel: BTRFS (device sdb): parent transid
verify failed on 73440384917504 wanted 491976 found 485696
Nov 15 20:37:14 cambridge kernel: BTRFS (device sdb): parent transid
verify failed on 73440384921600 wanted 491976 found 485696
Nov 15 20:39:01 cambridge kernel: BTRFS (device sdb): bad tree block
start 8747312261073978676 74201584123904
Nov 15 20:39:02 cambridge kernel: BTRFS warning (device sdb): csum
failed ino 1455165 off 1733865472 csum 3128256294 expected csum 3176585556
Nov 15 20:39:02 cambridge kernel: BTRFS warning (device sdb): csum
failed ino 1455165 off 1733869568 csum 3953187115 expected csum 2827150008
Nov 15 20:39:02 cambridge kernel: BTRFS warning (device sdb): csum
failed ino 1455165 off 1733873664 csum 2011708136 expected csum 1514290758
Nov 15 20:39:02 cambridge kernel: BTRFS warning (device sdb): csum
failed ino 1455165 off 1733877760 csum 4227108651 expected csum 3929632885
Nov 15 20:39:02 cambridge kernel: BTRFS warning (device sdb): csum
failed ino 1455165 off 1733881856 csum 667263525 expected csum 2167952522
Nov 15 20:39:02 cambridge kernel: BTRFS warning (device sdb): csum
failed ino 1455165 off 1733885952 csum 1421670165 expected csum 2602382287
Nov 15 20:39:02 cambridge kernel: BTRFS warning (device sdb): csum
failed ino 1455165 off 1733890048 csum 2320260888 expected csum 606775819
Nov 15 20:39:02 cambridge kernel: BTRFS warning (device sdb): csum
failed ino 1455165 off 1733865472 csum 3128256294 expected csum 3176585556
Nov 15 20:39:02 cambridge kernel: BTRFS warning (device sdb): csum
failed ino 1455165 off 1733894144 csum 2140326945 expected csum 2209619790
Nov 15 20:39:02 cambridge kernel: BTRFS warning (device sdb): csum
failed ino 1455165 off 1733898240 csum 372680472 expected csum 3888049973
Nov 15 20:42:45 cambridge kernel: Linux version 4.1.12-101.fc21.x86_64
(mockbuild@bkernel01.phx2.fedoraproject.org) (gcc version 4.9.2 20150212
(Red Hat 4.9.2-6) (GCC) ) #1 SMP Wed Oct 28 15:18:44 UTC 2015
Nov 15 20:43:16 cambridge kernel: Btrfs loaded
Nov 15 20:43:16 cambridge kernel: BTRFS: device label btrfsdata devid 15
transid 492120 /dev/sde
Nov 15 20:43:16 cambridge kernel: BTRFS: device label btrfsdata devid 14
transid 492120 /dev/sdd
Nov 15 20:43:16 cambridge kernel: BTRFS: device label btrfsdata devid 13
transid 492120 /dev/sdc
Nov 15 20:43:16 cambridge kernel: BTRFS: device label btrfsdata devid 12
transid 492120 /dev/sdb
Nov 15 20:43:16 cambridge kernel: BTRFS: device label btrfsdata devid 11
transid 492120 /dev/sda
Nov 15 20:43:16 cambridge kernel: BTRFS: device label btrfsdata devid 10
transid 492120 /dev/sdg
Nov 15 20:43:16 cambridge kernel: BTRFS (device sdg): parent transid
verify failed on 73440384909312 wanted 491976 found 485531
Nov 15 20:43:16 cambridge kernel: BTRFS (device sdg): parent transid
verify failed on 73440384913408 wanted 491976 found 485531
Nov 15 20:43:16 cambridge kernel: BTRFS (device sdg): parent transid
verify failed on 73440384917504 wanted 491976 found 485696
Nov 15 20:43:16 cambridge kernel: BTRFS (device sdg): parent transid
verify failed on 73440384921600 wanted 491976 found 485696
Nov 15 20:43:16 cambridge kernel: BTRFS: bdev /dev/sde errs: wr 18711,
rd 0, flush 6237, corrupt 0, gen 0
Nov 15 20:43:16 cambridge kernel: BTRFS (device sdg): bad tree block
start 1121375725894905312 74200909787136
Nov 15 20:43:16 cambridge kernel: BTRFS (device sdg): bad tree block
start 7250342666203184288 74200909791232
Nov 15 20:43:16 cambridge kernel: BTRFS (device sdg): parent transid
verify failed on 73417618042880 wanted 488487 found 485439
Nov 15 20:43:16 cambridge kernel: BTRFS (device sdg): parent transid
verify failed on 73417618042880 wanted 488487 found 485439
Nov 15 20:43:16 cambridge kernel: BTRFS (device sdg): parent transid
verify failed on 73417618042880 wanted 488487 found 485439
Nov 15 20:43:16 cambridge kernel: BTRFS: Failed to read block groups: -5
Nov 15 20:43:16 cambridge kernel: BTRFS: open_ctree failed
Nov 15 20:49:14 cambridge kernel: BTRFS (device sdg): parent transid
verify failed on 73440384909312 wanted 491976 found 485531
Nov 15 20:49:15 cambridge kernel: BTRFS (device sdg): parent transid
verify failed on 73440384913408 wanted 491976 found 485531
Nov 15 20:49:15 cambridge kernel: BTRFS (device sdg): parent transid
verify failed on 73440384917504 wanted 491976 found 485696
Nov 15 20:49:15 cambridge kernel: BTRFS (device sdg): parent transid
verify failed on 73440384921600 wanted 491976 found 485696
Nov 15 20:49:15 cambridge kernel: BTRFS: bdev /dev/sde errs: wr 18711,
rd 0, flush 6237, corrupt 0, gen 0
Nov 15 20:49:16 cambridge kernel: BTRFS (device sdg): bad tree block
start 1121375725894905312 74200909787136
Nov 15 20:49:16 cambridge kernel: BTRFS (device sdg): bad tree block
start 7250342666203184288 74200909791232
Nov 15 20:49:16 cambridge kernel: BTRFS (device sdg): parent transid
verify failed on 73417618042880 wanted 488487 found 485439
Nov 15 20:49:16 cambridge kernel: BTRFS (device sdg): parent transid
verify failed on 73417618042880 wanted 488487 found 485439
Nov 15 20:49:16 cambridge kernel: BTRFS (device sdg): parent transid
verify failed on 73417618042880 wanted 488487 found 485439
Nov 15 20:49:16 cambridge kernel: BTRFS: Failed to read block groups: -5
Nov 15 20:49:16 cambridge kernel: BTRFS: open_ctree failed
* Re: Self-destruct of btrfs RAID6 array
From: Duncan @ 2015-11-20 6:19 UTC (permalink / raw)
To: linux-btrfs
Paul Loewenstein posted on Thu, 19 Nov 2015 20:11:14 -0800 as excerpted:
> I have just had an apparently catastrophic collapse of a large RAID6
> array. I was hoping that the dual-redundancy of a RAID6 array would
> compensate for having no backup media large enough to back it up!
Well...
First, while btrfs in general is "stabilizing" and is noticeably better
than it was a year ago, it remains "not yet fully stable or mature."
There's a sysadmin's rule of backups: if it's not backed up, you value
the data it contains less than the time/trouble/resources of making a
backup. Thus, should it fail, regardless of any loss of data, you've
saved what your actions defined as /really/ valuable, the time/trouble/
resources saved by not doing the backup, and should be happy that you
saved the really important stuff.
Because btrfs isn't yet fully stable, having backups is even more
important than it would be on a fully stable filesystem like xfs, ext*,
or reiserfs (my previous favorite and what I still use on spinning rust
and for backups), so that sysadmin's rule of backups applies double.
Of course some distros are choosing to deploy and support btrfs as if
it's already fully stable, and that's their risk and their business for
doing so, but by the same token, for that you'd get support from them,
not from the upstream list (here), where btrfs is still considered to be
"stabilizing, not yet fully stable".
Second, btrfs raid56 mode is much newer than btrfs in general, and isn't
yet close to even the "stabilizing, good enough provided you have good
backups or are using throw-away data" general level of btrfs. Nominal
code-completion was only kernel 3.19, and there were very significant
bugs with it and 4.0, into the early 4.1 cycle, tho by 4.1 release the
worst and known bugs were fixed. But as a btrfs user and list regular, I
and others have repeatedly recommended that people not consider btrfs
raid56 mode as "stabilizing-stable" as btrfs in general is, for at least
a year (five kernel cycles) after nominal code completion in 3.19, and
even then, people thinking about using btrfs raid56 should check the list
for recent bugs and think twice before deploying in anything but
throw-away-data test mode (data can be throw-away because it's backed
up). Of course that year mark would be kernel 4.4, which is currently in
development.
And as it happens, kernel 4.4 has been announced as a long-term-stable
series, so things look to be working out reasonably well for those
interested in first-opportunity-stablish btrfs raid56 deployment on it.
=:^)
Since we're obviously not at 4.4 release yet, and in fact you're
apparently running 4.1 stable series, that means btrfs raid56 mode must
still be considered less stable than btrfs as a whole, which as I said is
itself "still stabilizing, not fully stable and mature", so now we're at
double-the-already-doubled-strength, 4 times the normal strength, of the
sysadmin's backup rule.
So it's four-times self-evident that if you didn't have backups for data
on raid56 mode btrfs, by your actions you placed a *REALLY* low value on
that data! So losing it is /very/ trivial, at least compared to the time
and resources you can be happy you saved by not having a backup. =:^)
That said, there's still hope...
First, because btrfs raid56 mode /is/ so new and not yet stable, you
really need to be working with the absolute latest tools in order to
have the best chance at recovery. That means kernel 4.3 and btrfs-progs
4.3.1, if at all possible. You can use earlier, but it might mean losing
what's actually recoverable using the latest tools.
> Any suggestions for repairing this array, at least to the point of
> mounting it read-only? I am thinking of trying to mount it degraded
> with different devices missing, but I don't know if that will be an
> exercise in futility.
>
> btrfs fi show still works!
>
> Label: 'btrfsdata' uuid: ccde0a00-e50b-4154-977f-ac591ab580a5
> Total devices 6 FS bytes used 9.62TiB
> devid 10 size 3.64TiB used 2.41TiB path /dev/sdg
> devid 11 size 3.64TiB used 2.41TiB path /dev/sda
> devid 12 size 3.64TiB used 2.41TiB path /dev/sdb
> devid 13 size 3.64TiB used 2.41TiB path /dev/sdc
> devid 14 size 3.64TiB used 2.41TiB path /dev/sdd
> devid 15 size 3.64TiB used 2.41TiB path /dev/sde
>
> It spontaneously (I believe it was after it successfully mounted rw on
> boot, but I can't check for sure without looking at the last file
> creation time). After another reboot it won't mount at all.
You say mount, but there's no hint of the options you've tried.
If you've not yet read up on the user documentation on the wiki,
https://btrfs.wiki.kernel.org , I suggest you do so. There's a lot of
useful background information there, including discussion of mount
options and recovery.
What you will want to try here if you haven't already is a degraded,ro
mount, possibly with the recovery option as well (try it without first,
then with, if necessary).
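As a sketch of that order of attempts, the Python below only builds and
prints the mount command lines; it does not run them. /dev/sda is one of
the member devices from the fi show output above, while the mount point
/mnt/btrfsdata is a placeholder and not something named in this thread.

```python
def mount_attempts(device, mountpoint):
    """Return mount command lines, least invasive option set first."""
    # Order follows the advice above: degraded,ro alone, then with recovery.
    option_sets = ["degraded,ro", "degraded,ro,recovery"]
    return [
        ["mount", "-t", "btrfs", "-o", opts, device, mountpoint]
        for opts in option_sets
    ]


# Placeholder mount point; adjust to the local setup.
for argv in mount_attempts("/dev/sda", "/mnt/btrfsdata"):
    print(" ".join(argv))
```

Both attempts stay read-only, so neither should write to the damaged
filesystem.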
If you've not tried degraded writable yet, there's a possibility mounting
degraded, writable, will work. If it does, you want to do device
replaces/deletes to get undegraded as soon as possible, preferably with
as little other writing to the filesystem as possible. If new chunks
need to be allocated for further writes, they may be allocated in single
mode, and there's currently a bug which won't allow a degraded read-write
mount after that: btrfs sees the single-mode chunks on a degraded
filesystem and thinks there may be others on the missing devices, without
actually checking. As a result, you often get just one shot at a
writable mount to undegrade, and if that doesn't work, the filesystem is
often only read-only mountable after that. (This bug applies to all
redundant/parity raid modes, so to raid1 and raid10 as well, not just
raid56.)
If you /had/ tried degraded mounting, that bug may be why you're now
unable to mount again, writable, but degraded,ro, is likely to still
work. There's actually a patch for the bug, that makes btrfs check the
actual chunk allocation to see if all are accounted for on the existing
devices, allowing writable mounting if so, but it's definitely not in 4.1
or 4.2, tho I think it might have made 4.3. (If so it could possibly be
backported to stable-series 4.1 at least, but it's unlikely to be there
yet.)
If the various degraded,recovery,ro options don't work, the next thing to
try is btrfs restore. This works with an unmounted filesystem using the
userspace code, so a current btrfs-progs, preferably 4.3.0 or 4.3.1, is
recommended for the best chance at success.
What btrfs restore does is try to read the unmounted filesystem and
retrieve files from it, writing them to some other mounted filesystem
location. Newer btrfs restore versions have options to save ownership/
permissions and timestamp data, and rewrite symlinks as well, otherwise
the files are written as the executing user (root) using its umask.
There are options to write only selected parts of the filesystem, and/or
to restore specific snapshots (which are otherwise ignored), as well.
Obviously you'll need space at wherever you point restore at to write
whatever you intend to restore, but if you didn't have a current backup,
as people considering this option obviously didn't, this is basically
replacing the space you would have otherwise dedicated to backups, so
it's not too horrible.
With a bit of luck, restore will work without further trouble. If it
doesn't, there's more damage, but btrfs does keep a history of main
roots, and btrfs-find-root can be used to list them, with btrfs restore
able to take a root by its bytenr, using the -t option. Here's the wiki
page link with further instructions, tho last I looked it was a bit dated.
https://btrfs.wiki.kernel.org/index.php/Restore
A hint, in case it's not obvious from the wiki page, generation, and
transid/transaction-id, are the same thing. =:^)
Of course, also see the btrfs-restore manpage, which now actually lists
the wiki link for more info. As I said the wiki page was a bit dated
last I looked, so definitely check the manpage, and pay attention to the
newer options such as -l (list roots, useful with -t to see if that root
is a good restore candidate), -D (dry run), and -m and -S, metadata and
symlinks, without which files will be restored as the writing user (root)
using the present umask, with current timestamps, and no symlinks.
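The restore sequence described above can be sketched roughly as follows.
This is an illustration under assumptions, not a tested recovery
procedure: the target directory /mnt/rescue and the bytenr value are
hypothetical, and the script only prints command lines rather than
running them. The options used (-l, -t, -D, -m, -S) are the ones named
above, plus -v for verbose output.

```python
def find_root_command(device):
    """Command line that lists candidate tree roots on an unmounted device."""
    return ["btrfs-find-root", device]


def restore_commands(device, target, root_bytenr=None):
    """Build btrfs restore command lines: dry run first, then the real copy.

    root_bytenr is an optional tree root address taken from btrfs-find-root
    output. When given, it is passed with -t, and the roots reachable from
    it are listed first with -l.
    """
    root = [] if root_bytenr is None else ["-t", str(root_bytenr)]
    commands = []
    if root:
        commands.append(["btrfs", "restore", "-l", *root, device])
    commands.append(["btrfs", "restore", "-D", "-v", *root, device, target])
    commands.append(["btrfs", "restore", "-m", "-S", "-v", *root, device, target])
    return commands


# First try with the default root; /mnt/rescue is a placeholder target.
for argv in restore_commands("/dev/sda", "/mnt/rescue"):
    print(" ".join(argv))

# If that fails, look for older roots and retry with one of them.
print(" ".join(find_root_command("/dev/sda")))
for argv in restore_commands("/dev/sda", "/mnt/rescue", root_bytenr=123456789):  # hypothetical bytenr
    print(" ".join(argv))
```

Doing the -D dry run before the real copy shows what restore can reach
from a given root before anything is written to the target.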
If btrfs restore fails you, then getting a dev interested in the specific
errors you have and patches to fix them, is your only hope. But of
course, since you already saved what was most important to you, the time
and resources you would have otherwise spent to do the backup, and what
might be lost here is as explained above at most valued at 4X-trivial,
you can still be happy that you saved the really important stuff and any
loss really /is/ trivial.
(Seriously, when you compare the loss of a bit of data to what those
folks in France lost recently, or what those Syrian refugees are risking
and at times losing, their lives, or what the folks in 9/11 lost... in
perspective, losing a bit of data here really *is* trivial. The fact
that we're both here at all, along with the others on the list,
discussing this, makes us all pretty lucky, all things considered!
Sometimes it does help to step back and get some /real/ perspective! =:^)
> Looking back in the journal (I shall now be setting up journal
> monitoring), I found lots of errors, starting last September, only a few
> weeks after converting from RAID1 to RAID6.
> Blank lines precede reboots and for the first log indicate the omission
> of over 30K entries! The first log must represent some software bug,
> because /dev/sdh is NOT a btrfs device!
That very possibly indicates either a different device-detection order
and thus device letter assignment on boot, such that one of the other
devices appeared as /dev/sdh at that boot, or a device dropping out and
reappearing as sdh, instead of whatever letter it had previously. On
today's hardware, such device reordering isn't uncommon, thus the switch
to mounting by UUID or filesystem labels, for instance, as opposed to the
now somewhat unpredictable /dev/sdX device names, since the X can change!
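The logs in the original post seem consistent with that. In the
Nov 15 15:23:33 scan, devid 15 (/dev/sde) reports transid 485798 while
the other five devices report 492036, a gap of 6238, which happens to
match the last flush error count logged for /dev/sdh. That is
suggestive rather than proof. A minimal Python sketch for spotting such
a lagging device in journal text follows; the function name and regex
are illustrative, and it assumes the text covers a single boot's device
scan.

```python
import re

# Matches the kernel's device scan lines, even when the journal output is
# wrapped so that "transid ..." lands on the following line.
SCAN = re.compile(r"devid\s+(\d+)\s+transid\s+(\d+)\s+(/dev/\S+)")


def lagging_devices(log_text):
    """Return (devid, path, transid) for devices behind the newest transid."""
    seen = [(int(devid), path, int(transid))
            for devid, transid, path in SCAN.findall(log_text)]
    if not seen:
        return []
    newest = max(transid for _, _, transid in seen)
    return [entry for entry in seen if entry[2] < newest]


# Three of the scan lines from the Nov 15 15:23:33 boot in the original post.
sample = """\
Nov 15 15:23:33 cambridge kernel: BTRFS: device label btrfsdata devid 14
transid 492036 /dev/sdd
Nov 15 15:23:33 cambridge kernel: BTRFS: device label btrfsdata devid 15
transid 485798 /dev/sde
Nov 15 15:23:33 cambridge kernel: BTRFS: device label btrfsdata devid 11
transid 492036 /dev/sda
"""
print(lagging_devices(sample))  # [(15, '/dev/sde', 485798)]
```

Feeding it one boot at a time matters, since the transid naturally
differs between boots.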
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: Self-destruct of btrfs RAID6 array
From: Austin S Hemmelgarn @ 2015-11-20 13:29 UTC (permalink / raw)
To: Paul Loewenstein, linux-btrfs
On 2015-11-19 23:11, Paul Loewenstein wrote:
> I have just had an apparently catastrophic collapse of a large RAID6
> array. I was hoping that the dual-redundancy of a RAID6 array would
> compensate for having no backup media large enough to back it up!
Duncan already did a really good job of explaining this (and from what I
can tell, I'm pretty sure his analysis of what's going on is correct),
but I would like to add a couple of things.
First, RAID is not a backup, it's a way to minimize the need to restore
from backups in the event of hardware failure (or, in the case of BTRFS,
also a way to minimize the effects of data corruption).
Second, have you considered doing encrypted backups to a cloud storage
service? This is what I personally do, and it works really well for me.
Amazon S3 has pretty reasonable pricing, and there are multiple
options on Linux to allow accessing it like a filesystem. There are
many other options as well (in my case, I back up to both S3 and Dropbox,
but I also have small enough backups that I don't need to worry about
the 1T limit on Dropbox for non-business accounts).