From: Nikolay Borisov <nborisov@suse.com>
To: Qu Wenruo <wqu@suse.com>, linux-btrfs@vger.kernel.org
Cc: Leonard Lausen <leonard@lausen.nl>
Subject: Re: [PATCH v5.1 12/12] btrfs: Do mandatory tree block check before submitting bio
Date: Mon, 18 Feb 2019 11:26:24 +0200
Message-ID: <7a0248d2-754d-e352-796b-df41f9fbce53@suse.com>
In-Reply-To: <20190218052753.24138-13-wqu@suse.com>
On 18.02.19 at 7:27, Qu Wenruo wrote:
> There are at least 2 reports about memory bit flip sneaking into on-disk
> data.
>
> Currently we only have a relaxed check triggered at
> btrfs_mark_buffer_dirty() time. As it's not mandatory and only enabled for
> CONFIG_BTRFS_FS_CHECK_INTEGRITY builds, it doesn't help users
> detect such problems.
>
> This patch addresses the hole by triggering a comprehensive check on
> tree blocks before writing them back to disk.
>
> The design points are:
> - Timing of the check: Tree block write hook
> This timing is chosen to reduce the overhead.
> The comprehensive check should be as expensive as csum.
> Doing full check at btrfs_mark_buffer_dirty() is too expensive for end
> user.
>
> - Loose empty leaf check
> Originally for empty leaf, tree-checker will report error if it's not
> a tree root.
> The problem for such check at write time is:
> * False alert for tree root created in current transaction
> In that case, the commit root still needs to be written to disk.
> And since current root can differ from commit root, then it will
> cause false alert.
> This happens for log tree.
>
> * False alert for relocated tree block
> Relocated tree block can be written to disk due to memory pressure,
> in that case an empty csum tree root can be written to disk and
> cause false alert, since csum root node hasn't been updated.
>
> Although some more reliable empty leaf check is still kept as is.
> Namely essential trees (e.g. extent, chunk) should never be empty.
>
> The example error output will be something like:
> BTRFS critical (device dm-3): corrupt leaf: root=2 block=1350630375424 slot=68, bad key order, prev (10510212874240 169 0) current (1714119868416 169 0)
> BTRFS error (device dm-3): block=1350630375424 write time tree block corruption detected
> BTRFS: error (device dm-3) in btrfs_commit_transaction:2220: errno=-5 IO failure (Error while writing out transaction)
> BTRFS info (device dm-3): forced readonly
> BTRFS warning (device dm-3): Skipping commit of aborted transaction.
> BTRFS: error (device dm-3) in cleanup_transaction:1839: errno=-5 IO failure
> BTRFS info (device dm-3): delayed_refs has NO entry
>
> Reported-by: Leonard Lausen <leonard@lausen.nl>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
> fs/btrfs/disk-io.c | 10 ++++++++++
> fs/btrfs/tree-checker.c | 24 +++++++++++++++++++++---
> fs/btrfs/tree-checker.h | 8 ++++++++
> 3 files changed, 39 insertions(+), 3 deletions(-)
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 6052ab508f84..fff789f8db63 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -313,6 +313,16 @@ static int csum_tree_block(struct btrfs_fs_info *fs_info,
> return -EUCLEAN;
> }
> } else {
> + if (btrfs_header_level(buf))
> + err = btrfs_check_node(fs_info, buf);
> + else
> + err = btrfs_check_leaf_write(fs_info, buf);
> + if (err < 0) {
> + btrfs_err(fs_info,
> + "block=%llu write time tree block corruption detected",
> + buf->start);
> + return err;
> + }
This code should be moved into csum_dirty_buffer. There are pending
cleanups to csum_tree_block: the final if there will be removed and the
read and write code factored out into btree_readpage_end_io_hook and
csum_dirty_buffer respectively.
Eventually csum_tree_block's sole purpose should be to calculate the
checksum and nothing more.
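
A rough userspace sketch of that split, under stated assumptions. Everything
here is a hypothetical stand-in that only models the control flow: struct
mock_eb, mock_check_node(), mock_check_leaf_write() and the toy additive
checksum are not kernel code, and the real csum_dirty_buffer may look quite
different once the cleanups land.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#ifndef EUCLEAN
#define EUCLEAN 117	/* fallback for non-Linux builds */
#endif

/* Hypothetical stand-in for struct extent_buffer. */
struct mock_eb {
	int level;		/* 0 means leaf, > 0 means node */
	int corrupted;		/* set by the caller to simulate a bit flip */
	uint32_t csum;		/* stored checksum */
	uint8_t data[16];	/* payload covered by the checksum */
};

/* Stand-ins for btrfs_check_node() and btrfs_check_leaf_write(). */
static int mock_check_node(const struct mock_eb *eb)
{
	return eb->corrupted ? -EUCLEAN : 0;
}

static int mock_check_leaf_write(const struct mock_eb *eb)
{
	return eb->corrupted ? -EUCLEAN : 0;
}

/* Only computes a checksum (toy additive sum, not crc32c). */
static uint32_t mock_csum_tree_block(const struct mock_eb *eb)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i < sizeof(eb->data); i++)
		sum += eb->data[i];
	return sum;
}

/* Write path: validate first, then checksum and store the result. */
static int mock_csum_dirty_buffer(struct mock_eb *eb)
{
	int err;

	assert(eb != NULL);
	if (eb->level)
		err = mock_check_node(eb);
	else
		err = mock_check_leaf_write(eb);
	if (err < 0)
		return err;	/* refuse to checksum a corrupted block */

	eb->csum = mock_csum_tree_block(eb);
	return 0;
}
```

With this shape the checksum helper has no read/write branching left, and
the write time check lives next to the only caller that needs it.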
> write_extent_buffer(buf, result, 0, csum_size);
> }
>
> diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
> index a62e1e837a89..b8cdaf472031 100644
> --- a/fs/btrfs/tree-checker.c
> +++ b/fs/btrfs/tree-checker.c
> @@ -477,7 +477,7 @@ static int check_leaf_item(struct btrfs_fs_info *fs_info,
> }
>
> static int check_leaf(struct btrfs_fs_info *fs_info, struct extent_buffer *leaf,
> - bool check_item_data)
> + bool check_item_data, bool check_empty_leaf)
> {
> /* No valid key type is 0, so all key should be larger than this key */
> struct btrfs_key prev_key = {0, 0, 0};
> @@ -516,6 +516,18 @@ static int check_leaf(struct btrfs_fs_info *fs_info, struct extent_buffer *leaf,
> owner);
> return -EUCLEAN;
> }
> +
> + /*
> + * Skip empty leaf check, mostly for write time tree block
> + *
> + * Such skip mostly happens for tree block write time, as
> + * we can't use @owner as accurate owner indicator.
> + * Case like balance and new tree block created for commit root
> + * can break owner check easily.
> + */
> + if (!check_empty_leaf)
> + return 0;
> +
> key.objectid = owner;
> key.type = BTRFS_ROOT_ITEM_KEY;
> key.offset = (u64)-1;
> @@ -636,13 +648,19 @@ static int check_leaf(struct btrfs_fs_info *fs_info, struct extent_buffer *leaf,
> int btrfs_check_leaf_full(struct btrfs_fs_info *fs_info,
> struct extent_buffer *leaf)
> {
> - return check_leaf(fs_info, leaf, true);
> + return check_leaf(fs_info, leaf, true, true);
> }
>
> int btrfs_check_leaf_relaxed(struct btrfs_fs_info *fs_info,
> struct extent_buffer *leaf)
> {
> - return check_leaf(fs_info, leaf, false);
> + return check_leaf(fs_info, leaf, false, true);
> +}
> +
> +int btrfs_check_leaf_write(struct btrfs_fs_info *fs_info,
> + struct extent_buffer *leaf)
> +{
> + return check_leaf(fs_info, leaf, false, false);
> }
>
> int btrfs_check_node(struct btrfs_fs_info *fs_info, struct extent_buffer *node)
> diff --git a/fs/btrfs/tree-checker.h b/fs/btrfs/tree-checker.h
> index ff043275b784..6f8d1b627c53 100644
> --- a/fs/btrfs/tree-checker.h
> +++ b/fs/btrfs/tree-checker.h
> @@ -23,6 +23,14 @@ int btrfs_check_leaf_full(struct btrfs_fs_info *fs_info,
> */
> int btrfs_check_leaf_relaxed(struct btrfs_fs_info *fs_info,
> struct extent_buffer *leaf);
> +
> +/*
> + * Write time specific leaf checker.
> + * Don't check if the empty leaf belongs to a tree root. Mostly for balance
> + * and new tree created in current transaction.
> + */
> +int btrfs_check_leaf_write(struct btrfs_fs_info *fs_info,
> + struct extent_buffer *leaf);
> int btrfs_check_node(struct btrfs_fs_info *fs_info, struct extent_buffer *node);
>
> #endif
>
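
For reference, the effect of the new check_empty_leaf flag on the three
wrappers can be modelled roughly as below. This is a simplified userspace
sketch, not the patch itself: struct sketch_leaf and its two boolean fields
are hypothetical and replace the real owner and root lookups, and the key
order and item checks are elided.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#ifndef EUCLEAN
#define EUCLEAN 117	/* fallback for non-Linux builds */
#endif

/* Hypothetical stand-in for a leaf extent buffer. */
struct sketch_leaf {
	uint32_t nritems;	/* number of items in the leaf */
	bool essential_tree;	/* owner is e.g. the extent or chunk tree */
	bool is_tree_root;	/* leaf is the current root node of its tree */
};

static int sketch_check_leaf(const struct sketch_leaf *leaf,
			     bool check_item_data, bool check_empty_leaf)
{
	if (leaf->nritems == 0) {
		/* Essential trees must never be empty, at any time. */
		if (leaf->essential_tree)
			return -EUCLEAN;
		/* Write time: the owner lookup is unreliable, skip it. */
		if (!check_empty_leaf)
			return 0;
		/* Otherwise an empty leaf is only valid as a tree root. */
		return leaf->is_tree_root ? 0 : -EUCLEAN;
	}
	/* Key order and item checks are elided in this sketch. */
	(void)check_item_data;
	return 0;
}

static int sketch_check_leaf_full(const struct sketch_leaf *leaf)
{
	return sketch_check_leaf(leaf, true, true);
}

static int sketch_check_leaf_relaxed(const struct sketch_leaf *leaf)
{
	return sketch_check_leaf(leaf, false, true);
}

static int sketch_check_leaf_write(const struct sketch_leaf *leaf)
{
	return sketch_check_leaf(leaf, false, false);
}
```

Only the write time variant tolerates an empty non-root leaf, while an empty
leaf in an essential tree is rejected by all three.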