From: Martin Steigerwald <martin@lichtvoll.de>
To: Qu Wenruo <quwenruo@cn.fujitsu.com>
Cc: linux-btrfs@vger.kernel.org, David Sterba <dsterba@suse.cz>
Subject: BTRFS wiki: page about recovery (was: Re: [PATCH 0/7] Allow btrfsck to reset csum of all tree blocks, AKA dangerous mode.)
Date: Thu, 05 Feb 2015 09:59:02 +0100
Message-ID: <1505422.Q2KAhjoLSB@merkaba>
In-Reply-To: <54D32D9D.2060306@cn.fujitsu.com>

On Thursday, 5 February 2015, 16:45:17, Qu Wenruo wrote:
> -------- Original Message --------
> Subject: Re: [PATCH 0/7] Allow btrfsck to reset csum of all tree blocks,
> AKA dangerous mode.
> From: Martin Steigerwald <martin@lichtvoll.de>
> To: Qu Wenruo <quwenruo@cn.fujitsu.com>
> Date: 2015-02-05 16:31
> 
> > On Thursday, 5 February 2015, 09:35:26, Qu Wenruo wrote:
> >> -------- Original Message --------
> >> Subject: Re: [PATCH 0/7] Allow btrfsck to reset csum of all tree
> >> blocks, AKA dangerous mode.
> >> From: Martin Steigerwald <martin@lichtvoll.de>
> >> To: Qu Wenruo <quwenruo@cn.fujitsu.com>
> >> Date: 2015-02-04 17:16
> >> 
> >>> On Wednesday, 4 February 2015, 15:16:44, Qu Wenruo wrote:
> >>>> Btrfs's metadata csum is a good mechanism that keeps bit errors away
> >>>> from the sensitive kernel. But this mechanism can also be too
> >>>> sensitive, for example to a bit error in the csum bytes or in the low
> >>>> all-zero bits of a nodeptr.
> >>>> It trades error tolerance for stability, which is reasonable for most
> >>>> cases since there are the DUP/RAID1/5/6/10 duplication levels.
> >>>> 
> >>>> But in some cases, whether for development purposes, or for a
> >>>> desperate user who can't tolerate losing all his/her inline data, or
> >>>> even for a crazy QA team hoping btrfs can survive heavy random bit
> >>>> bombing, some people want to get rid of the csum protection and face
> >>>> the crucial raw data no matter what disaster may happen.
> >>>> 
> >>>> So, introduce the new '--dangerous' (or "destruction"/"debug" if you
> >>>> like) option for btrfsck to reset all csums of tree blocks.
> >>> 
> >>> I often wondered about this: AFAIK if you get a csum error BTRFS makes
> >>> this an input/output error. For being able to access the data in place,
> >>> how about an "iwantmycorrupteddataback" mount option where BTRFS just
> >>> logs csum errors but allows one to access the files nonetheless.
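
A minimal sketch of the proposed behaviour, for data blocks only. This is not btrfs code: the function name and the `tolerate_csum_errors` flag are hypothetical, and `zlib.crc32` only stands in for the crc32c that btrfs uses. By default a mismatch becomes an input/output error; with the flag set it is logged and the data is returned anyway.

```python
import errno
import logging
import zlib

log = logging.getLogger("csum-sketch")


def read_data_block(data: bytes, stored_csum: int,
                    tolerate_csum_errors: bool = False) -> bytes:
    """Return data if its checksum matches, else fail or warn (hypothetical)."""
    computed = zlib.crc32(data)  # stand-in for crc32c
    if computed == stored_csum:
        return data
    if not tolerate_csum_errors:
        # Default behaviour described in the thread: an input/output error.
        raise OSError(errno.EIO, "checksum mismatch")
    # Proposed behaviour: log the mismatch but still hand the data back.
    log.warning("csum mismatch: stored %#010x, computed %#010x",
                stored_csum, computed)
    return data


block = b"file contents"
# A stale checksum (0 here) is tolerated only when explicitly requested.
recovered = read_data_block(block, 0, tolerate_csum_errors=True)
```

The caller still knows the data is suspect because the warning is logged; nothing in the sketch rewrites the stored checksum.
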
> >> 
> >> The idea is good, but don't forget we have metadata (tree blocks) and
> >> data. For data, this is completely OK.
> >> But for metadata, this may be a disaster, just like the --dangerous
> >> option.
> > 
> > Ah yes, so probably only do this for data, or have an extra option for
> > skipping csum on metadata for the really desperate, but then I'd really
> > force read-only to avoid the corruption causing more damage.
> > 
> >>> This could even work together with remount. Maybe it would be good
> >>> not to allow writing to broken csum blocks, i.e. fail these with
> >>> input/output error.
> >> 
> >> Don't forget btrfs' COW writes.
> >> So writing to data shouldn't be a problem (if COW is enabled).
> > 
> > Yes, but… it hides the corruption. Unless you have a snapshot, if an
> > application reads corrupted data and then writes it back, you have no
> > indication that the data was corrupted in the first place.
> > 
> >>> This way, the csum would not be automatically fixed, *but* one is able
> >>> to access the broken data, *while* knowing it is broken.
> >>> 
> >>> If that is possible already, I missed it.
> >> 
> >> Much as you suspected, data csums can be rebuilt in btrfsck with the
> >> --init-csum-tree option.
> >> However, not every user knows about this feature, and even fewer know
> >> the right time to use it.
> > 
> > I wonder about making a wiki page about recovery options with two parts:
> > 
> > 1) Diagnosis. First find out what might be wrong.
> > 
> > 2) Cure. Then decide which steps to try to recover.
> 
> This seems really useful.
> 
> But I'm a little afraid of introducing too much info for the end user:
> metadata/data, the difference between btrfsck and scrub, and tons of
> other things may confuse users.
> Moreover, these things should be done by btrfsck automatically...

Sure. The page should contain a disclaimer anyway, and I think it's good to
have it as easy as possible for the user. But also, for the early
adopters, I think it is really good to have some guidance available, with
the caveat to always ask here on the mailing list if unsure about the next
step.

> Besides this, wiki pages about real-world btrfs recovery strategies are
> very helpful.
> Feel free to add them, although I'm not sure how to add pages to the
> btrfs wiki; maybe you need to contact Marc or David?

David, I requested a wiki account via the page and even wrote a (not quite
serious) 50-word biography in order to pass that form.

Thanks,
Martin

> 
> Thanks,
> Qu
> 
> > And of course an intro on best practice to only work on a copy of the
> > copy for any in-place repair attempts.
> > 
> > I'd be willing to make such a page, provided I get enough hints on
> > what to try when. I have some ideas myself, but I am not sure they
> > are accurate :)
> > 
> > Thanks,
> > Martin
> > 
> >> Thanks,
> >> Qu
> >> 
> >>>> The csum resetting has the following features:
> >>>> 1) Top-down, level by level
> >>>> The csum resetting is done from the top of the tree down to level 1,
> >>>> and only when the csums of all nodes in this level are reset and can
> >>>> pass the read_tree_block() check will it continue to the next level.
> >>>> All bytenrs in nodeptrs will also be re-aligned, so a bit error in the
> >>>> low 12 bits (4K sector size case) can be repaired without pain.
> >>>> With this behavior, an error in a nodeptr has a chance of not
> >>>> affecting its child.
> >>>> 
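
The re-alignment in point 1 can be illustrated with a small sketch. This is not the patch's code: `realign_bytenr` is a hypothetical helper and the block address is made up. It assumes the 4K sector size from the quoted text, so a valid block address has its low 12 bits set to zero and clearing them undoes any bit flip in that range.

```python
SECTORSIZE = 4096  # assumption: 4K sector size, as in the quoted example


def realign_bytenr(bytenr: int, sectorsize: int = SECTORSIZE) -> int:
    """Clear the low bits of a node pointer so it is sector-aligned again."""
    if sectorsize <= 0 or sectorsize & (sectorsize - 1):
        raise ValueError("sectorsize must be a power of two")
    return bytenr & ~(sectorsize - 1)


good = 0x1D4C000           # a correctly aligned block address (made up)
damaged = good ^ (1 << 5)  # one flipped bit inside the low 12 bits
repaired = realign_bytenr(damaged)

# A flip at bit 12 or above yields another aligned address, so masking
# cannot detect or repair it.
undetected = realign_bytenr(good ^ (1 << 12))
```

Only errors below the sector-size boundary are recoverable this way; anything higher points at a different, equally well-aligned block.
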
> >>>> 2) No copy-on-write
> >>>> COW means we need to have a valid extent tree; if the extent tree is
> >>>> corrupted, COW will only be a BUG_ON blocking us.
> >>>> So all the r/w in this dangerous mode will use no-COW writes. That's
> >>>> why we export and slightly modify write_tree_block() to do no-COW
> >>>> tree block writes with the newly calculated csum.
> >>>> Since the write is not COWed, if it fails, it will also destroy the
> >>>> last hope for manual inspection.
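
The two properties above (level by level, no COW) can be sketched on a toy in-memory "disk". Everything here is an assumption for illustration: `TreeBlock`, `reset_csums_top_down` and the addresses are hypothetical, `zlib.crc32` stands in for crc32c, and the toy walks every level rather than stopping at level 1.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class TreeBlock:
    payload: bytes                                # stand-in for block contents
    csum: int = 0                                 # stored checksum, maybe stale
    children: list = field(default_factory=list)  # bytenrs of child blocks


def compute_csum(block: TreeBlock) -> int:
    return zlib.crc32(block.payload)  # stand-in; btrfs uses crc32c


def passes_check(block: TreeBlock) -> bool:
    return block.csum == compute_csum(block)


def reset_csums_top_down(disk: dict, root_bytenr: int) -> int:
    """Reset csums one level at a time; return the number of levels visited."""
    level = [root_bytenr]
    levels = 0
    while level:
        for bytenr in level:
            block = disk[bytenr]
            # No COW: the block is overwritten where it already lives.
            block.csum = compute_csum(block)
        # Descend only once every block in this level passes the check.
        # In this toy the check cannot fail after the rewrite; a real check
        # may fail for reasons other than the csum.
        if not all(passes_check(disk[b]) for b in level):
            raise RuntimeError("level still fails the check, not descending")
        level = [child for b in level for child in disk[b].children]
        levels += 1
    return levels


disk = {
    0x4000: TreeBlock(b"root", csum=0xDEAD, children=[0x8000, 0xC000]),
    0x8000: TreeBlock(b"leaf A", csum=0xBEEF),
    0xC000: TreeBlock(b"leaf B", csum=zlib.crc32(b"leaf B")),
}
levels = reset_csums_top_down(disk, 0x4000)
```

Because each block is rewritten in place, there is no older copy to fall back on if a write goes wrong, which is the risk the quoted text warns about.
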
> >>>> 
> >>>> Qu Wenruo (7):
> >>>>   btrfs-progs: Add btrfs_(prev/next)_tree_block() to keep search result
> >>>>     in the same level of path->lowest_level.
> >>>>   btrfs-progs: Introduce btrfs_next_slot() function to iterate to next
> >>>>     slot in given level.
> >>>>   btrfs-progs: Allow btrfs_read_fs_root() to re-read the tree node.
> >>>>   btrfs-progs: Export write_tree_block() and allow it to do nocow write.
> >>>>   btrfs-progs: Introduce new function reset_tree_block_csum() for later
> >>>>     tree block csum reset.
> >>>>   btrfs-progs: Introduce new function reset_(one_root/roots)_csum() to
> >>>>     reset one/all tree's csum in tree root.
> >>>>   btrfs-progs: Introduce "--dangerous" option to reset all tree block
> >>>>     csum.
> >>>> 
> >>>>  cmds-check.c | 284 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
> >>>>  ctree.c      |  18 ++--
> >>>>  ctree.h      |  25 +++++-
> >>>>  disk-io.c    |  55 +++++++++---
> >>>>  disk-io.h    |   3 +
> >>>>  5 files changed, 359 insertions(+), 26 deletions(-)

-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7
