linux-btrfs.vger.kernel.org archive mirror
From: Martin Steigerwald <martin@lichtvoll.de>
To: Qu Wenruo <quwenruo@cn.fujitsu.com>
Cc: linux-btrfs@vger.kernel.org, David Sterba <dsterba@suse.cz>
Subject: BTRFS wiki: page about recovery (was: Re: [PATCH 0/7] Allow btrfsck to reset csum of all tree blocks, AKA dangerous mode.)
Date: Thu, 05 Feb 2015 09:59:02 +0100
Message-ID: <1505422.Q2KAhjoLSB@merkaba>
In-Reply-To: <54D32D9D.2060306@cn.fujitsu.com>

On Thursday, 5 February 2015, 16:45:17, Qu Wenruo wrote:
> -------- Original Message --------
> Subject: Re: [PATCH 0/7] Allow btrfsck to reset csum of all tree blocks,
> AKA dangerous mode.
> From: Martin Steigerwald <martin@lichtvoll.de>
> To: Qu Wenruo <quwenruo@cn.fujitsu.com>
> Date: 2015-02-05 16:31
> 
> > On Thursday, 5 February 2015, 09:35:26, Qu Wenruo wrote:
> >> -------- Original Message --------
> >> Subject: Re: [PATCH 0/7] Allow btrfsck to reset csum of all tree
> >> blocks, AKA dangerous mode.
> >> From: Martin Steigerwald <martin@lichtvoll.de>
> >> To: Qu Wenruo <quwenruo@cn.fujitsu.com>
> >> Date: 2015-02-04 17:16
> >> 
> >>> On Wednesday, 4 February 2015, 15:16:44, Qu Wenruo wrote:
> >>>> Btrfs's metadata csum is a good mechanism for keeping bit errors away
> >>>> from the sensitive kernel. But the mechanism can also be too
> >>>> sensitive, e.g. to a bit error in the csum bytes themselves, or in the
> >>>> low bits of a nodeptr that should all be zero.
> >>>> It trades error tolerance for stability, which is reasonable in most
> >>>> cases since DUP/RAID1/5/6/10 provide redundancy to fall back on.
> >>>> 
> >>>> But in some cases, whether for development purposes, for a desperate
> >>>> user who can't tolerate losing all his/her inline data, or even for a
> >>>> crazy QA team hoping btrfs can survive heavy random bit bombing, some
> >>>> people want to get rid of the csum protection and face the crucial raw
> >>>> data, no matter what disaster may happen.
> >>>> 
> >>>> So, introduce the new '--dangerous' (or "destruction"/"debug" if you
> >>>> like) option for btrfsck to reset the csum of all tree blocks.
> >>> 
> >>> I often wondered about this: AFAIK if you get a csum error, BTRFS
> >>> turns it into an input/output error. To be able to access the data in
> >>> place, how about an "iwantmycorrupteddataback" mount option where
> >>> BTRFS just logs csum errors but allows one to access the files
> >>> nonetheless.
> >> 
> >> The idea is good, but don't forget we have metadata (tree blocks) and
> >> data. For data, this is completely OK.
> >> But for metadata, this may be a disaster, just like the --dangerous
> >> option.
> > 
> > Ah yes, so probably only do this for data, or have an extra option for
> > skipping csum on metadata for the really desperate, but then I'd
> > really force read-only to avoid the corruption causing more damage.
> > 
> >>> This could even work together with remount. Maybe it would be good
> >>> not to allow writing to blocks with broken csums, i.e. fail these with
> >>> an input/output error.
> >> 
> >> Don't forget btrfs' COW write.
> >> So writing into data shouldn't be a problem (if COW is enabled).
> > 
> > Yes, but… it hides the corruption. Unless you have a snapshot, if an
> > application reads corrupted data and then writes it back, you have no
> > indication that the data was corrupted in the first place.
> > 
> >>> This way, the csum would not be automatically fixed, *but* one is able
> >>> to access the broken data, *while* knowing it is broken.
> >>> 
> >>> If that is possible already, I missed it.
> >> 
> >> Much as you considered, the data csum can be rebuilt by btrfsck with
> >> the --init-csum-tree option.
> >> However, not every user knows this feature, and even fewer users know
> >> the right time to use it.
> > 
> > I wonder about making a wiki page about recovery options with two
> > parts:
> > 
> > 1) Diagnosis. First find out what might be wrong.
> > 
> > 2) Cure. Then decide which steps to try to recover.
> 
> This seems really useful.
> 
> But I'm a little afraid of introducing too much info for end users:
> metadata/data, the difference between btrfsck and scrub, and tons of
> other things may confuse users.
> What's more, these things should be done by btrfsck automatically...

Sure. The page should contain a disclaimer anyway, and I think it's good to
keep it as easy as possible for the user. But also, for the early adopters,
I think it is really good to have some guidance available, with the caveat
to always ask here on the mailing list if unsure about the next step.

> Besides this, wiki pages about real-world btrfs recovery strategies are
> very helpful.
> Feel free to add them, although I'm not sure how to add pages to the btrfs
> wiki; maybe you need to contact Marc or David?

David, I requested a wiki account via the page and even wrote a (not quite
serious) 50-word biography in order to pass that form.

Thanks,
Martin

> 
> Thanks,
> Qu
> 
> > And of course an intro on the best practice of only working on a copy
> > of the copy for any in-place repair attempts.
> > 
> > I'd be willing to make such a page, provided I get enough hints on
> > what to try when. I have some ideas myself, but I am not sure they
> > are accurate :)
> > 
> > Thanks,
> > Martin
> > 
> >> Thanks,
> >> Qu
> >> 
> >>>> The csum resetting has the following features:
> >>>> 1) Top-down, level by level
> >>>> The csum resetting is done from the top of the tree down to level 1,
> >>>> and only when the csums of all nodes in a level have been reset and
> >>>> pass the read_tree_block() check does it continue to the next level.
> >>>> All bytenr values in nodeptrs are also re-aligned, so bit errors in
> >>>> the low 12 bits (4K sector size case) can be repaired without pain.
> >>>> With this behavior, an error in a nodeptr has a chance of not
> >>>> affecting its child.
> >>>> 
> >>>> 2) No copy-on-write
> >>>> COW needs a valid extent tree; if the extent tree is corrupted, COW
> >>>> will only hit a BUG_ON that blocks us.
> >>>> So all reads/writes in this dangerous mode use no-COW writes. That's
> >>>> why we export and slightly modify write_tree_block() to do no-COW
> >>>> tree block writes with the newly calculated csum.
> >>>> Since the write is not COWed, if it fails, it will also destroy the
> >>>> last hope for manual inspection.
> >>>> 
> >>>> Qu Wenruo (7):
> >>>>   btrfs-progs: Add btrfs_(prev/next)_tree_block() to keep search result
> >>>>     in the same level of path->lowest_level.
> >>>>   btrfs-progs: Introduce btrfs_next_slot() function to iterate to next
> >>>>     slot in given level.
> >>>>   btrfs-progs: Allow btrfs_read_fs_root() to re-read the tree node.
> >>>>   btrfs-progs: Export write_tree_block() and allow it to do nocow write.
> >>>>   btrfs-progs: Introduce new function reset_tree_block_csum() for later
> >>>>     tree block csum reset.
> >>>>   btrfs-progs: Introduce new function reset_(one_root/roots)_csum() to
> >>>>     reset one/all tree's csum in tree root.
> >>>>   btrfs-progs: Introduce "--dangerous" option to reset all tree block
> >>>>     csum.
> >>>>
> >>>>  cmds-check.c | 284 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
> >>>>  ctree.c      |  18 ++--
> >>>>  ctree.h      |  25 +++++-
> >>>>  disk-io.c    |  55 +++++++++---
> >>>>  disk-io.h    |   3 +
> >>>>  5 files changed, 359 insertions(+), 26 deletions(-)

-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7


Thread overview: 18+ messages
2015-02-04  7:16 [PATCH 0/7] Allow btrfsck to reset csum of all tree blocks, AKA dangerous mode Qu Wenruo
2015-02-04  7:16 ` [PATCH 1/7] btrfs-progs: Add btrfs_(prev/next)_tree_block() to keep search result in the same level of path->lowest_level Qu Wenruo
2015-02-04  7:16 ` [PATCH 2/7] btrfs-progs: Introduce btrfs_next_slot() function to iterate to next slot in given level Qu Wenruo
2015-02-04  7:16 ` [PATCH 3/7] btrfs-progs: Allow btrfs_read_fs_root() to re-read the tree node Qu Wenruo
2015-02-04  7:16 ` [PATCH 4/7] btrfs-progs: Export write_tree_block() and allow it to do nocow write Qu Wenruo
2015-02-04  7:16 ` [PATCH 4/5] btrfs-progs: Introduce new function reset_tree_block_csum() for later tree block csum reset Qu Wenruo
2015-02-04  7:16 ` [PATCH 5/5] btrfs-progs: Introduce new function reset_(one_root/roots)_csum() to reset one/all tree's csum in tree root Qu Wenruo
2015-02-04  7:16 ` [PATCH 5/7] btrfs-progs: Introduce new function reset_tree_block_csum() for later tree block csum reset Qu Wenruo
2015-02-04  7:16 ` [PATCH 6/7] btrfs-progs: Introduce new function reset_(one_root/roots)_csum() to reset one/all tree's csum in tree root Qu Wenruo
2015-02-04  7:16 ` [PATCH 7/7] btrfs-progs: Introduce "--dangerous" option to reset all tree block csum Qu Wenruo
2015-02-04  9:16 ` [PATCH 0/7] Allow btrfsck to reset csum of all tree blocks, AKA dangerous mode Martin Steigerwald
2015-02-04 10:07   ` Paul Jones
2015-02-05  1:43     ` Qu Wenruo
2015-02-05  1:35   ` Qu Wenruo
2015-02-05  8:31     ` Martin Steigerwald
2015-02-05  8:45       ` Qu Wenruo
2015-02-05  8:59         ` Martin Steigerwald [this message]
2015-04-22  5:55 ` Qu Wenruo
