From: Martin Steigerwald <martin@lichtvoll.de>
To: Qu Wenruo <quwenruo@cn.fujitsu.com>
Cc: linux-btrfs@vger.kernel.org, David Sterba <dsterba@suse.cz>
Subject: BTRFS wiki: page about recovery (was: Re: [PATCH 0/7] Allow btrfsck to reset csum of all tree blocks, AKA dangerous mode.)
Date: Thu, 05 Feb 2015 09:59:02 +0100
Message-ID: <1505422.Q2KAhjoLSB@merkaba>
In-Reply-To: <54D32D9D.2060306@cn.fujitsu.com>
On Thursday, 5 February 2015, 16:45:17, Qu Wenruo wrote:
> -------- Original Message --------
> Subject: Re: [PATCH 0/7] Allow btrfsck to reset csum of all tree blocks,
> AKA dangerous mode.
> From: Martin Steigerwald <martin@lichtvoll.de>
> To: Qu Wenruo <quwenruo@cn.fujitsu.com>
> Date: 2015-02-05 16:31
>
> > On Thursday, 5 February 2015, 09:35:26, Qu Wenruo wrote:
> >> -------- Original Message --------
> >> Subject: Re: [PATCH 0/7] Allow btrfsck to reset csum of all tree
> >> blocks, AKA dangerous mode.
> >> From: Martin Steigerwald <martin@lichtvoll.de>
> >> To: Qu Wenruo <quwenruo@cn.fujitsu.com>
> >> Date: 2015-02-04 17:16
> >>
> >>> On Wednesday, 4 February 2015, 15:16:44, Qu Wenruo wrote:
> >>>> Btrfs's metadata csum is a good mechanism, keeping bit errors away
> >>>> from the sensitive kernel. But such a mechanism can also be too
> >>>> sensitive, for example to a bit error in the csum bytes or in the
> >>>> low, all-zero bits of a nodeptr.
> >>>> It trades error tolerance for stability, which is reasonable for
> >>>> most cases since there are the DUP/RAID1/5/6/10 duplication levels.
> >>>>
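A minimal sketch of why a single flipped bit, whether in the csum bytes or in the block contents, makes a tree block fail verification. It assumes a toy layout in which the first 32 bytes of a block hold a little-endian CRC-32C of the rest. The names CSUM_SIZE, block_set_csum() and block_csum_ok() are made up for this illustration and are not the btrfs-progs functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CSUM_SIZE 32	/* toy assumption: csum field at the start of a block */

/* Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78. */
static uint32_t crc32c(const uint8_t *data, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	for (size_t i = 0; i < len; i++) {
		crc ^= data[i];
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/* Recompute the csum over everything after the csum field and store it. */
static void block_set_csum(uint8_t *block, size_t len)
{
	assert(len >= CSUM_SIZE);
	uint32_t crc = crc32c(block + CSUM_SIZE, len - CSUM_SIZE);

	memset(block, 0, CSUM_SIZE);
	for (int i = 0; i < 4; i++)	/* little-endian store */
		block[i] = (uint8_t)(crc >> (8 * i));
}

/* Return 1 if the stored csum matches the block contents, 0 otherwise. */
static int block_csum_ok(const uint8_t *block, size_t len)
{
	assert(len >= CSUM_SIZE);
	uint32_t crc = crc32c(block + CSUM_SIZE, len - CSUM_SIZE);

	for (int i = 0; i < 4; i++)
		if (block[i] != (uint8_t)(crc >> (8 * i)))
			return 0;
	return 1;
}
```

In this toy model, "resetting" a csum is just calling block_set_csum() again on whatever the block now contains, which is why it makes the block readable without proving the contents are correct.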
> >>>> But in some cases, whether for development purposes, or for a
> >>>> desperate user who can't tolerate losing all his/her inline data,
> >>>> or even for a crazy QA team hoping btrfs can survive heavy random
> >>>> bit bombing, some people want to get rid of the csum protection
> >>>> and face the crucial raw data no matter what disaster may happen.
> >>>>
> >>>> So, introduce the new '--dangerous' (or "destruction"/"debug" if
> >>>> you like) option for btrfsck to reset all csum of tree blocks.
> >>>
> >>> I often wondered about this: AFAIK if you get a csum error BTRFS
> >>> makes this an input/output error. For being able to access the data
> >>> in place, how about an "iwantmycorrupteddataback" mount option where
> >>> BTRFS just logs csum errors but allows one to access the files
> >>> nonetheless?
> >>
> >> The idea is good, but don't forget we have metadata (tree blocks) and
> >> data. For data, this is completely OK.
> >> But for metadata, this may be a disaster, just like the --dangerous
> >> option.
> >
> > Ah yes, so probably only do this for data, or have an extra option for
> > skipping csum on metadata for the really desperate, but then I'd
> > really force read-only to avoid corrupted metadata causing more damage.
> >
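The policy discussed above (tolerate bad data csums on request, tolerate bad metadata csums only with an extra option and only read-only) can be sketched as a small decision function. Everything here is hypothetical: neither the option names nor csum_read_result() exist in btrfs; this only restates the proposal in code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum block_kind { BLOCK_DATA, BLOCK_METADATA };

/* Hypothetical mount-time switches, named for this sketch only. */
struct read_policy {
	bool allow_bad_data_csum;	/* the "iwantmycorrupteddataback" idea */
	bool allow_bad_metadata_csum;	/* extra option for the really desperate */
	bool read_only;			/* filesystem is mounted read-only */
};

/*
 * Decide what a read should return after csum verification.
 * Returns 0 if the caller may hand the block out, -EIO otherwise.
 * *tolerated is set to true when a mismatch was accepted and should
 * only be logged.
 */
static int csum_read_result(const struct read_policy *p, enum block_kind kind,
			    bool csum_ok, bool *tolerated)
{
	assert(p != NULL && tolerated != NULL);
	*tolerated = false;

	if (csum_ok)
		return 0;

	if (kind == BLOCK_DATA && p->allow_bad_data_csum) {
		*tolerated = true;
		return 0;
	}

	/* Corrupted metadata is only tolerated when nothing can be written. */
	if (kind == BLOCK_METADATA && p->allow_bad_metadata_csum &&
	    p->read_only) {
		*tolerated = true;
		return 0;
	}

	return -EIO;
}
```

With all switches off, the function reduces to the current behaviour described earlier in the thread: any csum mismatch becomes an input/output error.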
> >>> This could even work together with remount. Maybe it would be good
> >>> not to allow writing to broken csum blocks, i.e. fail these with an
> >>> input/output error.
> >>
> >> Don't forget btrfs' COW writes.
> >> So writing to data shouldn't be a problem (if COW is enabled).
> >
> > Yes, but… it hides the corruption. Unless you have a snapshot, if an
> > application reads corrupted data and then writes it back, you have
> > no indication that the data was corrupted in the first place.
> >
> >>> This way, the csum would not be automatically fixed, *but* one is
> >>> able to access the broken data, *while* knowing it is broken.
> >>>
> >>> If that is possible already, I missed it.
> >>
> >> Much as you suggested, the data csum can be rebuilt in btrfsck with
> >> the --init-csum-tree option.
> >> However, not every user knows about this feature, and even fewer
> >> users know the right time to use it.
> >
> > I wonder about making a wiki page about recovery options with two parts:
> >
> > 1) Diagnosis. First find out what might be wrong.
> >
> > 2) Cure. Then decide which steps to try to recover.
>
> This seems really useful.
>
> But I'm a little afraid of giving the end user too much information:
> metadata vs. data, the difference between btrfsck and scrub, and tons
> of other things may confuse users.
> Moreover, these things should be done by btrfsck automatically...
Sure. The page should contain a disclaimer anyway, and I think it's good to
make it as easy as possible for the user. But for the early adopters, I
think it is really good to have some guidance available, with the caveat to
always ask here on the mailing list if unsure about the next step.
> Besides this, wiki pages about real-world btrfs recovery strategies are
> very helpful.
> Feel free to add one, although I'm not sure how to add pages to the btrfs
> wiki; maybe you need to contact Marc or David?
David, I requested a wiki account via the page and even wrote a (not quite
serious) 50-word biography in order to pass that form.
Thanks,
Martin
>
> Thanks,
> Qu
>
> > And of course an intro on the best practice of only working on a copy
> > of the copy for any in-place repair attempts.
> >
> > I'd be willing to make such a page, provided I get enough hints on
> > what to try when. I have some ideas myself, but I am not sure they
> > are accurate :)
> >
> > Thanks,
> > Martin
> >
> >> Thanks,
> >> Qu
> >>
> >>>> The csum resetting has the following features:
> >>>> 1) Top-down, level by level
> >>>> The csum resetting is done from the tree root down to level 1, and
> >>>> only when the csums of all nodes in a level have been reset and
> >>>> pass the read_tree_block() check does it continue to the next level.
> >>>> All bytenr values in nodeptrs are also re-aligned, so a bit error in
> >>>> the low 12 bits (4K sector size case) can be repaired without pain.
> >>>> With this behavior, an error in a nodeptr has a chance of not
> >>>> affecting its child.
> >>>>
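A minimal sketch of the re-alignment mentioned in 1): with a 4K sector size, a valid tree block address has its low 12 bits clear, so masking them off undoes any bit flip confined to those bits. The helper name align_bytenr() is made up for this illustration; the patch set may do this differently.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round a block address down to a multiple of sectorsize.
 * sectorsize must be a non-zero power of two (4096 in the 4K case).
 */
static uint64_t align_bytenr(uint64_t bytenr, uint32_t sectorsize)
{
	assert(sectorsize != 0 && (sectorsize & (sectorsize - 1)) == 0);
	return bytenr & ~((uint64_t)sectorsize - 1);
}
```

For example, a pointer that should read 0x1234000 but was corrupted to 0x1234004 is restored by align_bytenr(0x1234004, 4096). A flip in bit 12 or above still yields an aligned but wrong address, so this only helps for errors in the low bits.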
> >>>> 2) No copy-on-write
> >>>> COW means we need to have a valid extent tree; if the extent tree is
> >>>> corrupted, COW will only hit a BUG_ON and block us.
> >>>> So all r/w in this dangerous mode uses no-cow writes. That's why we
> >>>> export and slightly modify write_tree_block() to do no-cow tree
> >>>> block writes with a newly calculated csum.
> >>>> Since the write is not cowed, if it fails, it will also destroy the
> >>>> last hope for manual inspection.
> >>>>
> >>>> Qu Wenruo (7):
> >>>>   btrfs-progs: Add btrfs_(prev/next)_tree_block() to keep search
> >>>>     result in the same level of path->lowest_level.
> >>>>   btrfs-progs: Introduce btrfs_next_slot() function to iterate to
> >>>>     next slot in given level.
> >>>>   btrfs-progs: Allow btrfs_read_fs_root() to re-read the tree node.
> >>>>   btrfs-progs: Export write_tree_block() and allow it to do nocow
> >>>>     write.
> >>>>   btrfs-progs: Introduce new function reset_tree_block_csum() for
> >>>>     later tree block csum reset.
> >>>>   btrfs-progs: Introduce new function reset_(one_root/roots)_csum()
> >>>>     to reset one/all tree's csum in tree root.
> >>>>   btrfs-progs: Introduce "--dangerous" option to reset all tree
> >>>>     block csum.
> >>>>
> >>>>
> >>>>  cmds-check.c | 284 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
> >>>>  ctree.c      |  18 ++--
> >>>>  ctree.h      |  25 +++++-
> >>>>  disk-io.c    |  55 +++++++++---
> >>>>  disk-io.h    |   3 +
> >>>>  5 files changed, 359 insertions(+), 26 deletions(-)
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7