From: Martin Steigerwald <martin@lichtvoll.de>
To: Qu Wenruo <quwenruo@cn.fujitsu.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 0/7] Allow btrfsck to reset csum of all tree blocks, AKA dangerous mode.
Date: Thu, 05 Feb 2015 09:31:36 +0100
Message-ID: <1898269.N3rJDqb6kZ@merkaba>
In-Reply-To: <54D2C8DE.1010006@cn.fujitsu.com>
On Thursday, 5 February 2015, 09:35:26, Qu Wenruo wrote:
> -------- Original Message --------
> Subject: Re: [PATCH 0/7] Allow btrfsck to reset csum of all tree blocks,
> AKA dangerous mode.
> From: Martin Steigerwald <martin@lichtvoll.de>
> To: Qu Wenruo <quwenruo@cn.fujitsu.com>
> Date: 2015-02-04 17:16
>
> > On Wednesday, 4 February 2015, 15:16:44, Qu Wenruo wrote:
> >> Btrfs's metadata csum is a good mechanism for keeping bit errors away
> >> from the sensitive kernel. But such a mechanism can also be too
> >> sensitive, e.g. to a bit error in the csum bytes themselves or in the
> >> low bits of a nodeptr, which should always be zero.
> >> It trades error tolerance for stability, which is reasonable for
> >> most cases since there are DUP/RAID1/5/6/10 duplication levels.
> >>
> >> But in some cases, whether for development purposes, a desperate user
> >> who can't tolerate losing all his/her inline data, or even a crazy QA
> >> team hoping btrfs can survive heavy random bit bombing, some people
> >> want to get rid of the csum protection and face the crucial raw data,
> >> no matter what disaster may happen.
> >>
> >> So, this series introduces the new '--dangerous' (or "destruction"/
> >> "debug" if you like) option for btrfsck to reset the csum of all tree
> >> blocks.
> >
> > I have often wondered about this: AFAIK, if you get a csum error, BTRFS
> > turns it into an input/output error. To be able to access the data in
> > place, how about an "iwantmycorrupteddataback" mount option where BTRFS
> > just logs csum errors but allows one to access the files nonetheless?
>
> The idea is good, but don't forget we have metadata (tree blocks) and
> data. For data, this is completely OK.
> But for metadata, this may be a disaster, just like the --dangerous
> option.
Ah yes, so probably only do this for data, or have an extra option for
skipping csum on metadata for the really desperate, but then I'd really
force read-only to keep the corruption from causing more damage.
> > This could even work together with remount. Maybe it would be good not
> > to allow writing to blocks with broken csums, i.e. to fail these with an
> > input/output error.
>
> Don't forget btrfs' COW write.
> So writing data shouldn't be a problem (if COW is enabled).
Yes, but… it hides the corruption. Unless you have a snapshot, if an
application reads corrupted data and then writes it back, you have no
indication that the data was corrupted in the first place.
> > This way, the csum would not be automatically fixed, *but* one is able
> > to access the broken data, *while* knowing it is broken.
> >
> > If that is possible already, I missed it.
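The read policy proposed above (log csum errors on data and return the data anyway, fail writes to blocks with broken csums, and allow bad metadata only through an extra option combined with a read-only mount) can be written down as a small decision function. This is a hypothetical C sketch of the proposal only: `struct csum_policy`, `on_csum_mismatch()` and the option fields are invented names, and btrfs has no such mount options.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical switches modelling the proposal; btrfs has no such options. */
struct csum_policy {
	bool allow_bad_data;     /* the proposed "iwantmycorrupteddataback" */
	bool allow_bad_metadata; /* extra option "for the really desperate" */
	bool read_only;          /* mounted (or remounted) read-only */
};

enum csum_io { CSUM_IO_READ, CSUM_IO_WRITE };

/*
 * Called when a block's stored csum does not match its contents.
 * Returns 0 to log the mismatch and hand the block back anyway,
 * or -EIO to fail the request.
 */
static int on_csum_mismatch(const struct csum_policy *p, bool is_metadata,
			    enum csum_io io, unsigned long long bytenr)
{
	assert(p != NULL);

	if (io == CSUM_IO_WRITE)
		return -EIO; /* never write into a block whose csum is broken */
	if (is_metadata && !(p->allow_bad_metadata && p->read_only))
		return -EIO; /* bad metadata only if explicitly allowed and read-only */
	if (!is_metadata && !p->allow_bad_data)
		return -EIO; /* default: a csum mismatch is an I/O error */

	fprintf(stderr, "csum mismatch at bytenr %llu (%s), returning it anyway\n",
		bytenr, is_metadata ? "metadata" : "data");
	return 0;
}
```

Keeping -EIO as the default means nothing changes unless the user opts in. The write rule follows the suggestion to fail writes to broken-csum blocks; as noted in the reply, with COW enabled a rewrite goes to a new location anyway, so a real design might relax it.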
>
> As you considered, the data csum can be rebuilt by btrfsck with the
> --init-csum-tree option,
> although not every user knows about this feature, and even fewer users
> know the right time to use it.
I wonder about making a wiki page about recovery options with two parts:
1) Diagnosis. First find out what might be wrong.
2) Cure. Then decide which steps to try to recover.
And of course an intro on the best practice of only working on a copy of
the copy for any in-place repair attempts.
I'd be willing to make such a page, provided I get enough hints on what to
try when. I have some ideas myself, but I am not sure they are accurate :)
Thanks,
Martin
>
> Thanks,
> Qu
>
> >> The csum resetting has the following features:
> >> 1) Top-down, level by level
> >> The csum resetting is done from the top of the tree down to level 1,
> >> and only when the csums of all nodes in the current level have been
> >> reset and pass the read_tree_block() check does it continue to the
> >> next level.
> >> All bytenrs in nodeptrs are also re-aligned, so a bit error in the low
> >> 12 bits (in the 4K sector size case) can be repaired painlessly as well.
> >> With this behavior, an error in a nodeptr has a chance of not affecting
> >> its child.
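A toy C sketch of the two ideas in 1): walk the tree one level at a time, only descending once the whole current level is readable, and round every nodeptr down to the sector size before following it. Everything here is hypothetical: `struct toy_block`, `toy_read()` (standing in for `read_tree_block()`) and `reset_csums_level_by_level()` model a tiny in-memory tree, not btrfs-progs data structures, and for simplicity the walk continues down to the leaves. The rounding can only undo flips in the low bits (the low 12 bits with 4K sectors); a flip in a higher bit still points somewhere wrong.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_BLOCKS 16
#define MAX_PTRS 4

/* Toy stand-in for a tree block: its own bytenr plus child pointers. */
struct toy_block {
	uint64_t bytenr;
	size_t nr_ptrs;          /* 0 for a leaf */
	uint64_t ptrs[MAX_PTRS]; /* child bytenrs (nodeptrs) */
	bool csum_reset;         /* set once this block has been "rewritten" */
};

struct toy_fs {
	struct toy_block blocks[MAX_BLOCKS];
	size_t nr_blocks;
	uint32_t sectorsize;
};

/* Tree blocks are sector-aligned, so the low log2(sectorsize) bits of a valid
 * bytenr are zero; rounding down undoes a flip in those bits only. */
static uint64_t realign_bytenr(uint64_t bytenr, uint32_t sectorsize)
{
	assert(sectorsize != 0 && (sectorsize & (sectorsize - 1)) == 0);
	return bytenr & ~((uint64_t)sectorsize - 1);
}

/* Stands in for read_tree_block(): NULL if nothing lives at that bytenr. */
static struct toy_block *toy_read(struct toy_fs *fs, uint64_t bytenr)
{
	for (size_t i = 0; i < fs->nr_blocks; i++)
		if (fs->blocks[i].bytenr == bytenr)
			return &fs->blocks[i];
	return NULL;
}

/* Walk top-down, one level at a time. Every block of the current level must
 * be readable before any block of it is modified or the next level is
 * touched. Returns false at the first level that cannot be completed. */
static bool reset_csums_level_by_level(struct toy_fs *fs, uint64_t root)
{
	uint64_t cur[MAX_BLOCKS], next[MAX_BLOCKS];
	size_t nr_cur = 1;

	cur[0] = root;
	while (nr_cur > 0) {
		size_t nr_next = 0;

		/* Pass 1: the whole level must be readable. */
		for (size_t i = 0; i < nr_cur; i++)
			if (!toy_read(fs, cur[i]))
				return false;

		/* Pass 2: realign nodeptrs, "reset" the csum, queue the children. */
		for (size_t i = 0; i < nr_cur; i++) {
			struct toy_block *b = toy_read(fs, cur[i]);

			for (size_t j = 0; j < b->nr_ptrs; j++) {
				b->ptrs[j] = realign_bytenr(b->ptrs[j], fs->sectorsize);
				if (nr_next == MAX_BLOCKS)
					return false;
				next[nr_next++] = b->ptrs[j];
			}
			b->csum_reset = true; /* real code would recompute and write the csum */
		}
		memcpy(cur, next, nr_next * sizeof(next[0]));
		nr_cur = nr_next;
	}
	return true;
}
```

With a 4096-byte sector size, a nodeptr of `8192 | 0x10` is rounded back to 8192 and the child is found, while 8192 with bit 20 flipped stays unresolvable and the walk stops at that level.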
> >>
> >> 2) No copy-on-write
> >> COW requires a valid extent tree; if the extent tree is corrupted,
> >> COW will only hit a BUG_ON that blocks us.
> >> So all writes in this dangerous mode are done without COW. That's
> >> why we export and slightly modify write_tree_block() to do a no-COW
> >> tree block write with a newly calculated csum.
> >> Since the write is not COWed, if it fails, it will also destroy the
> >> last hope for manual inspection.
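The no-COW rewrite in 2) can be sketched as: recompute the checksum from the block's current contents, store it in the csum field, and overwrite the block at its existing location with `pwrite()` instead of allocating a new location as a COW write would. This is a minimal sketch, not btrfs-progs code. It assumes the crc32c checksum, stored little-endian in the first 4 bytes of a 32-byte csum area at the start of the block and covering the rest of the block, and, as a simplification, that `phys_offset` is already a physical offset on a single device (a real tool must map logical bytenrs through the chunk tree). `reset_block_csum()`, `reset_and_write_block()` and the other names are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define CSUM_FIELD_SIZE 32 /* assumed size of the csum area at the start of a block */

/* Bitwise CRC-32C (Castagnoli): reflected polynomial 0x82F63B78,
 * initial value ~0, final inversion. */
static uint32_t crc32c(const uint8_t *data, size_t len)
{
	uint32_t crc = ~0u;

	for (size_t i = 0; i < len; i++) {
		crc ^= data[i];
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0x82F63B78u & -(crc & 1u));
	}
	return ~crc;
}

/* The checksum covers everything after the csum area. */
static uint32_t block_crc(const uint8_t *block, size_t nodesize)
{
	assert(nodesize > CSUM_FIELD_SIZE);
	return crc32c(block + CSUM_FIELD_SIZE, nodesize - CSUM_FIELD_SIZE);
}

/* Read-side check: the first 4 bytes hold the crc32c, little-endian. */
static bool block_csum_ok(const uint8_t *block, size_t nodesize)
{
	uint32_t crc = block_crc(block, nodesize);

	return block[0] == (uint8_t)crc && block[1] == (uint8_t)(crc >> 8) &&
	       block[2] == (uint8_t)(crc >> 16) && block[3] == (uint8_t)(crc >> 24);
}

/* "Reset" the csum: trust the current contents and store a fresh crc32c. */
static void reset_block_csum(uint8_t *block, size_t nodesize)
{
	uint32_t crc = block_crc(block, nodesize);

	for (int i = 0; i < 4; i++)
		block[i] = (uint8_t)(crc >> (8 * i));
}

/* No-COW write: overwrite the block where it already lives. This write keeps
 * no other copy, so a failed or torn write loses the old contents for good.
 * Returns 0 on success, -1 on error. */
static int reset_and_write_block(int fd, off_t phys_offset, uint8_t *block,
				 size_t nodesize)
{
	reset_block_csum(block, nodesize);
	ssize_t ret = pwrite(fd, block, nodesize, phys_offset);

	if (ret != (ssize_t)nodesize) {
		if (ret < 0)
			perror("pwrite");
		return -1;
	}
	return 0;
}
```

Because the checksum covers the stored bytes but not itself, a single flipped bit in either the payload or the stored csum makes `block_csum_ok()` fail, which is the "too sensitive" case from the cover letter; resetting simply declares the current contents correct.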
> >>
> >> Qu Wenruo (7):
> >>   btrfs-progs: Add btrfs_(prev/next)_tree_block() to keep search result
> >>     in the same level of path->lowest_level.
> >>   btrfs-progs: Introduce btrfs_next_slot() function to iterate to next
> >>     slot in given level.
> >>   btrfs-progs: Allow btrfs_read_fs_root() to re-read the tree node.
> >>   btrfs-progs: Export write_tree_block() and allow it to do nocow
> >>     write.
> >>   btrfs-progs: Introduce new function reset_tree_block_csum() for later
> >>     tree block csum reset.
> >>   btrfs-progs: Introduce new function reset_(one_root/roots)_csum() to
> >>     reset one/all tree's csum in tree root.
> >>   btrfs-progs: Introduce "--dangerous" option to reset all tree block
> >>     csum.
> >>
> >>  cmds-check.c | 284 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
> >>  ctree.c      |  18 ++--
> >>  ctree.h      |  25 +++++-
> >>  disk-io.c    |  55 +++++++++---
> >>  disk-io.h    |   3 +
> >>  5 files changed, 359 insertions(+), 26 deletions(-)
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7