From: Duncan <1i5t5.duncan@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: recover broken partition on external HDD
Date: Mon, 6 Aug 2018 22:40:33 +0000 (UTC)
Message-ID: <pan$b5a07$5993c1bb$cc37f579$529a3aaa@cox.net>
In-Reply-To: CAP-4mcAfmp-rCrr5_-TJL4HNz8tVi5kpsOo-8WkZNRLRPOgD=g@mail.gmail.com
Marijn Stollenga posted on Sat, 04 Aug 2018 12:14:44 +0200 as excerpted:
> Hello btrfs experts, I need your help trying to recover an external HDD.
> I accidentally created a ZFS partition on my external HDD, which of
> course screwed up the whole partition. I quickly unplugged it, and it
> being a 1TB drive I assume there is still data on it.
Just a user and list regular here, not a dev, so my help will be somewhat
limited, but as I've seen no other replies yet, perhaps it's better than
nothing...
> With some great help on IRC I searched for tags using grep and found
> many positions:
> https://paste.ee/p/xzL5x
>
> Now I would like to scan all these positions for their information and
> somehow piece it together, I know there is supposed to be a superblock
> around 256GB but I'm not sure where the partition started (the search
> was run from a manually created partition starting at 1MB).
There's a mention of the three superblock copies and their addresses in
the problem FAQ:
https://btrfs.wiki.kernel.org/index.php/Problem_FAQ#What_if_I_don.27t_have_wipefs_at_hand.3F
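Those are fixed offsets from the start of the *filesystem*, not
necessarily the disk: 64 KiB, 64 MiB and 256 GiB, with the magic string
"_BHRfS_M" at byte 0x40 into each superblock. Usefully, each copy also
stores its own filesystem-relative offset as a little-endian u64 at
byte 0x30, so any copy you find tells you where the filesystem started.
Here's a hedged, untested-by-me Python sketch of that check; /dev/sdX
is a hypothetical stand-in, and you'd want to run it against the whole
disk rather than a guessed partition:

    import struct

    DEVICE = "/dev/sdX"   # hypothetical; point it at the whole disk
    MAGIC = b"_BHRfS_M"   # btrfs superblock magic, at byte 0x40
    # The three superblock mirrors, relative to the filesystem start:
    MIRRORS = (0x10000, 0x4000000, 0x4000000000)  # 64KiB, 64MiB, 256GiB

    def probe(dev, disk_off):
        # Check for a superblock at disk_off; on a magic match, use the
        # self-describing bytenr field (byte 0x30) to compute where the
        # filesystem must have started on the disk.
        dev.seek(disk_off)
        block = dev.read(0x48)
        if len(block) < 0x48 or block[0x40:0x48] != MAGIC:
            return
        bytenr = struct.unpack_from("<Q", block, 0x30)[0]
        print("superblock at %#x, bytenr %#x -> fs start %#x"
              % (disk_off, bytenr, disk_off - bytenr))

    with open(DEVICE, "rb") as dev:
        # Try the mirror offsets for a filesystem starting at 0 and at
        # 1 MiB (you mentioned a partition created at the 1 MiB mark):
        for fs_start in (0, 0x100000):
            for m in MIRRORS:
                probe(dev, fs_start + m)

And since your grep run already gave you a list of magic positions,
feeding each hit minus 0x40 into probe() instead of the fixed guesses
should tell you which hits are real superblocks and where each one
thinks the filesystem begins.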
> In general I would be happy if someone can point me to a library that
> can do low level reading and piecing together of these pieces of meta
> information and see what is left.
There are multiple libraries in various states available, but being more
a sysadmin than a dev I'd consume them as dependencies of whatever app I
was installing that required them, so I've not followed the details.
However, here's a bit of what I found just now with a quick look:
The project ideas FAQ on the wiki has a (somewhat outdated) library
entry:
https://btrfs.wiki.kernel.org/index.php/Project_ideas#Provide_a_library_covering_.27btrfs.27_functionality
That provides further links to a couple of Python projects as well as a
Haskell lib.
But I added the "somewhat outdated" parenthetical due to libbtrfsutil
by Omar Sandoval, which appeared in btrfs-progs 4.16. So there's now an
official library. =:^) Tho not being a dev I've not the foggiest
whether it'll provide the functionality you're after.
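For illustration only, and definitely hedged: the Python bindings that
ship with it operate on a *mounted* filesystem, so they likely won't
help with raw on-disk recovery, but assuming a mount at /mnt a minimal
sketch looks like:

    import btrfsutil  # bindings shipped with btrfs-progs >= 4.16

    # These calls need a mounted filesystem, not a raw device.
    info = btrfsutil.subvolume_info("/mnt")  # subvolume containing /mnt
    print("id:", info.id, "generation:", info.generation)

    # Walk the subvolumes below the mount point (may need root):
    for path, subvol_id in btrfsutil.SubvolumeIterator("/mnt"):
        print(subvol_id, path)

So probably not the low-level piecing-together tool you're after, but
it's there.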
I also see a Rust lib mentioned on-list (Oct 2016):
https://gitlab.wellbehavedsoftware.com/well-behaved-software/rust-btrfs
> I know there is btrfs-check etc. but these need the superblock to be
> known. Also on another messed-up drive (I screwed up two btrfs drives
> in the same way at the same time) I was able to find the third
> superblock, but it seems in the end it pointed to other parts of the
> filesystem near the beginning of the drive which were broken.
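One technical note before the policy lecture below: btrfs check does
have a switch to pick which superblock copy to trust, and there's a
rescue subcommand that tries to rewrite a bad primary from the
surviving copies. Both need at least one intact copy and a device (or
correctly-located partition) to point at; /dev/sdX1 here is just a
stand-in:

    btrfs check --super 2 /dev/sdX1         # trust the third (256GiB) copy
    btrfs rescue super-recover -v /dev/sdX1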
OK, this may seem like rubbing salt in the wound ATM, but there's a
reason they did that back in the day before modern disinfectants: it
helped stop infection before it started. Likewise, the following policy
should help avoid the problem in the first place.
A sysadmin's first rule of data value and backups is that the real value
placed on data isn't defined by arbitrary claims, but rather by the
number and quality of backups those who control that data find it
worthwhile to make of it. If it's worth a lot, there will be multiple
backups, likely stored in multiple locations, some offsite in order to
avoid loss in the event of fire/flood/bombing/etc. Only data of trivial
value, worth less than the time/trouble/resources necessary to make
that backup, will have no backup at all.
(Of course, the age of a backup is simply a sub-case of the above,
since the data in question then is the delta between the last backup
and the current working state. By definition, as soon as that delta is
considered worth more than the time/trouble/resources necessary to
update the backup, an updated or full new backup will be made.)
(The second rule of backups is that it's not a backup until it has been
tested to actually be usable under conditions similar to those in which
the backup would actually be needed. In many cases that'll mean booting
to rescue media and ensuring they can access and restore the backup from
there using only the resources available from that rescue media. In
other cases it'll mean booting directly to the backup and ensuring that
normal operations can resume from there. Etc. And if it hasn't been
tested yet, it's not a backup, only a potential backup still in progress.)
So the above really shouldn't be a problem at all, because you either:
1) Defined the data as worth having a backup, in which case you can just
restore from it,
OR
2) Defined the data as of such limited value that it wasn't worth the
hassle/time/resources necessary for that backup, in which case you saved
what was of *real* value, that time/hassle/resources, before you ever
lost the data, and the data loss isn't a big deal because it, by
definition of not having a backup, can be of only trivial value not worth
the hassle.
There's no #3. The data was either defined as worth a backup by virtue
of having one, and can be restored from there, or it wasn't, but no big
deal because the time/trouble/resources that would have otherwise gone
into that backup was defined as more important, and was saved before the
data was ever lost in the first place.
Thus, while the loss of the data due to fat-fingering the placement of
that ZFS (a risk all sysadmins come to appreciate after a few events of
their own) might be a bit of a bother, it's not worth spending huge
amounts of time trying to recover, because the data was either worth
having a backup, in which case you simply recover from it, or it
wasn't, in which case it's not worth spending huge amounts of time
trying to recover, either.
Of course there's still the difference between the pre-disaster weighed
risk that something will go wrong and the post-disaster "it DID go
wrong, now how do I best get back to normal operation?" question. But
in the context of the backups rule above, resolving that question is
more a matter of whether it's most efficient to spend a little time
trying to recover the existing data with no guarantee of full success,
or to simply jump directly into the wipe and restore from known-good
(because tested!) backups, which might take more time, but has a (near)
100% chance of recovery to the point of the backup. (The slight chance
of failure to recover from tested backups is what multiple levels of
backups cover for, with the value of the data and the weighed risk
balanced against the value of the time/hassle/resources necessary to do
that one more level of backup.)
So while it might be worth a bit of time to quick-test recovery of the
damaged data, it very quickly becomes not worth the further hassle,
because either the data was already defined as not worth it due to not
having a backup, or restoring from that backup will be faster and less
hassle, with a far greater chance of success, than diving further into
the data recovery morass, with ever more limited chances of success.
Live by that sort of policy from now on, and the results of the next
failure, whether hardware, software, or wetware (another fat-fingering;
again, this comes from someone who has had enough of his own!), won't
be anything to write the list about, unless of course it's a btrfs bug
and, quite apart from worrying about your data, you're just trying to
get it fixed so it won't keep happening.
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman