From: Nikolay Borisov <nborisov@suse.com>
To: Qu Wenruo <wqu@suse.com>, linux-btrfs@vger.kernel.org
Cc: Leonard Lausen <leonard@lausen.nl>
Subject: Re: [PATCH v5.1 12/12] btrfs: Do mandatory tree block check before submitting bio
Date: Mon, 18 Feb 2019 11:26:24 +0200
Message-ID: <7a0248d2-754d-e352-796b-df41f9fbce53@suse.com>
In-Reply-To: <20190218052753.24138-13-wqu@suse.com>



On 18.02.19 at 7:27, Qu Wenruo wrote:
> There are at least 2 reports about memory bit flip sneaking into on-disk
> data.
> 
> Currently we only have a relaxed check triggered at
> btrfs_mark_buffer_dirty() time, as it's not mandatory and only for
> CONFIG_BTRFS_FS_CHECK_INTEGRITY enabled build, it doesn't help user to
> detect such problem.
> 
> This patch will address the hole by triggering comprehensive check on
> tree blocks before writing it back to disk.
> 
> The design points are:
> - Timing of the check: Tree block write hook
>   This timing is chosen to reduce the overhead.
>   The comprehensive check should be as expensive as csum.
>   Doing full check at btrfs_mark_buffer_dirty() is too expensive for end
>   user.
> 
> - Loose empty leaf check
>   Originally for empty leaf, tree-checker will report error if it's not
>   a tree root.
>   The problem for such check at write time is:
>   * False alert for tree root created in current transaction
>     In that case, the commit root still needs to be written to disk.
>     And since current root can differ from commit root, then it will
>     cause false alert.
>     This happens for log tree.
> 
>   * False alert for relocated tree block
>     Relocated tree block can be written to disk due to memory pressure,
>     in that case an empty csum tree root can be written to disk and
>     cause false alert, since csum root node hasn't been updated.
> 
>   Although some more reliable empty leaf check is still kept as is.
>   Namely essential trees (e.g. extent, chunk) should never be empty.
> 
> The example error output will be something like:
>   BTRFS critical (device dm-3): corrupt leaf: root=2 block=1350630375424 slot=68, bad key order, prev (10510212874240 169 0) current (1714119868416 169 0)
>   BTRFS error (device dm-3): block=1350630375424 write time tree block corruption detected
>   BTRFS: error (device dm-3) in btrfs_commit_transaction:2220: errno=-5 IO failure (Error while writing out transaction)
>   BTRFS info (device dm-3): forced readonly
>   BTRFS warning (device dm-3): Skipping commit of aborted transaction.
>   BTRFS: error (device dm-3) in cleanup_transaction:1839: errno=-5 IO failure
>   BTRFS info (device dm-3): delayed_refs has NO entry
> 
> Reported-by: Leonard Lausen <leonard@lausen.nl>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/disk-io.c      | 10 ++++++++++
>  fs/btrfs/tree-checker.c | 24 +++++++++++++++++++++---
>  fs/btrfs/tree-checker.h |  8 ++++++++
>  3 files changed, 39 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 6052ab508f84..fff789f8db63 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -313,6 +313,16 @@ static int csum_tree_block(struct btrfs_fs_info *fs_info,
>  			return -EUCLEAN;
>  		}
>  	} else {
> +		if (btrfs_header_level(buf))
> +			err = btrfs_check_node(fs_info, buf);
> +		else
> +			err = btrfs_check_leaf_write(fs_info, buf);
> +		if (err < 0) {
> +			btrfs_err(fs_info,
> +			"block=%llu write time tree block corruption detected",
> +				  buf->start);
> +			return err;
> +		}

This code should be moved into csum_dirty_buffer. There are pending
cleanups to csum_tree_block that will remove the final if/else there and
factor the read-side and write-side code out into
btree_readpage_end_io_hook and csum_dirty_buffer respectively.

Eventually csum_tree_block's sole purpose should be to calculate the
checksum and nothing more.
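
To illustrate the split suggested above, here is a minimal userspace
sketch, not kernel code. Every toy_* name, the struct layout, the
checksum formula and the TOY_EUCLEAN value are assumptions made up for
the example; only the shape (validate in the write-side hook, keep the
checksum helper pure) reflects the suggestion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_EUCLEAN 117	/* assumed stand-in for the kernel's EUCLEAN */

/* Hypothetical, heavily simplified stand-in for struct extent_buffer. */
struct toy_block {
	uint64_t start;		/* logical address of the block */
	int level;		/* 0 = leaf, > 0 = node */
	uint32_t keys[8];	/* toy keys, expected strictly increasing */
	size_t nritems;		/* number of valid entries in keys[] */
	uint32_t csum;		/* stored checksum */
};

/* Toy structural check: report corruption when key order is broken. */
static int toy_check_block(const struct toy_block *b)
{
	for (size_t i = 1; i < b->nritems; i++)
		if (b->keys[i - 1] >= b->keys[i])
			return -TOY_EUCLEAN;
	return 0;
}

/* Sole purpose: calculate the checksum, no validation and no I/O. */
static uint32_t toy_csum_tree_block(const struct toy_block *b)
{
	uint32_t sum = (uint32_t)b->level;

	assert(b->nritems <= 8);
	for (size_t i = 0; i < b->nritems; i++)
		sum = sum * 31 + b->keys[i];
	return sum;
}

/* Write-side hook: validate first, checksum only if the block is sane. */
static int toy_csum_dirty_buffer(struct toy_block *b)
{
	int err = toy_check_block(b);

	if (err < 0)
		return err;	/* block is never checksummed or written */
	b->csum = toy_csum_tree_block(b);
	return 0;
}
```

With this shape a corrupted block fails in the write hook before any
checksum is stored, and the checksum helper can be reused unchanged on
the read side.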

>  		write_extent_buffer(buf, result, 0, csum_size);
>  	}
>  
> diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
> index a62e1e837a89..b8cdaf472031 100644
> --- a/fs/btrfs/tree-checker.c
> +++ b/fs/btrfs/tree-checker.c
> @@ -477,7 +477,7 @@ static int check_leaf_item(struct btrfs_fs_info *fs_info,
>  }
>  
>  static int check_leaf(struct btrfs_fs_info *fs_info, struct extent_buffer *leaf,
> -		      bool check_item_data)
> +		      bool check_item_data, bool check_empty_leaf)
>  {
>  	/* No valid key type is 0, so all key should be larger than this key */
>  	struct btrfs_key prev_key = {0, 0, 0};
> @@ -516,6 +516,18 @@ static int check_leaf(struct btrfs_fs_info *fs_info, struct extent_buffer *leaf,
>  				    owner);
>  			return -EUCLEAN;
>  		}
> +
> +		/*
> +		 * Skip empty leaf check, mostly for write time tree block
> +		 *
> +		 * Such skip mostly happens for tree block write time, as
> +		 * we can't use @owner as accurate owner indicator.
> +		 * Case like balance and new tree block created for commit root
> +		 * can break owner check easily.
> +		 */
> +		if (!check_empty_leaf)
> +			return 0;
> +
>  		key.objectid = owner;
>  		key.type = BTRFS_ROOT_ITEM_KEY;
>  		key.offset = (u64)-1;
> @@ -636,13 +648,19 @@ static int check_leaf(struct btrfs_fs_info *fs_info, struct extent_buffer *leaf,
>  int btrfs_check_leaf_full(struct btrfs_fs_info *fs_info,
>  			  struct extent_buffer *leaf)
>  {
> -	return check_leaf(fs_info, leaf, true);
> +	return check_leaf(fs_info, leaf, true, true);
>  }
>  
>  int btrfs_check_leaf_relaxed(struct btrfs_fs_info *fs_info,
>  			     struct extent_buffer *leaf)
>  {
> -	return check_leaf(fs_info, leaf, false);
> +	return check_leaf(fs_info, leaf, false, true);
> +}
> +
> +int btrfs_check_leaf_write(struct btrfs_fs_info *fs_info,
> +			   struct extent_buffer *leaf)
> +{
> +	return check_leaf(fs_info, leaf, false, false);
>  }
>  
>  int btrfs_check_node(struct btrfs_fs_info *fs_info, struct extent_buffer *node)
> diff --git a/fs/btrfs/tree-checker.h b/fs/btrfs/tree-checker.h
> index ff043275b784..6f8d1b627c53 100644
> --- a/fs/btrfs/tree-checker.h
> +++ b/fs/btrfs/tree-checker.h
> @@ -23,6 +23,14 @@ int btrfs_check_leaf_full(struct btrfs_fs_info *fs_info,
>   */
>  int btrfs_check_leaf_relaxed(struct btrfs_fs_info *fs_info,
>  			     struct extent_buffer *leaf);
> +
> +/*
> + * Write time specific leaf checker.
> + * Don't check if the empty leaf belongs to a tree root. Mostly for balance
> + * and new tree created in current transaction.
> + */
> +int btrfs_check_leaf_write(struct btrfs_fs_info *fs_info,
> +			   struct extent_buffer *leaf);
>  int btrfs_check_node(struct btrfs_fs_info *fs_info, struct extent_buffer *node);
>  
>  #endif
> 
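
For readers following the check_leaf() changes quoted above, the three
wrappers differ only in the two flags they pass. The sketch below is a
simplified userspace model under stated assumptions: the toy_* names,
the two boolean fields standing in for the owner test and the root
lookup, and the TOY_EUCLEAN value are all hypothetical, and the key
order and item checks are omitted.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_EUCLEAN 117	/* assumed stand-in for the kernel's EUCLEAN */

/* Hypothetical model of the facts the empty-leaf check depends on. */
struct toy_leaf {
	size_t nritems;			/* 0 means an empty leaf */
	bool owner_is_essential;	/* e.g. extent or chunk tree */
	bool is_current_tree_root;	/* stand-in for the root lookup */
};

static int toy_check_leaf(const struct toy_leaf *leaf, bool check_item_data,
			  bool check_empty_leaf)
{
	(void)check_item_data;	/* item data checks are not modelled */

	if (leaf->nritems == 0) {
		/* Essential trees must never be empty, at any time. */
		if (leaf->owner_is_essential)
			return -TOY_EUCLEAN;
		/* Write time: the owner is not reliable, skip the lookup. */
		if (!check_empty_leaf)
			return 0;
		/* Otherwise an empty leaf must be the root of its tree. */
		return leaf->is_current_tree_root ? 0 : -TOY_EUCLEAN;
	}
	return 0;	/* key order and item checks omitted */
}

static int toy_check_leaf_full(const struct toy_leaf *leaf)
{
	return toy_check_leaf(leaf, true, true);
}

static int toy_check_leaf_relaxed(const struct toy_leaf *leaf)
{
	return toy_check_leaf(leaf, false, true);
}

static int toy_check_leaf_write(const struct toy_leaf *leaf)
{
	return toy_check_leaf(leaf, false, false);
}
```

In this model an empty leaf that is not the current tree root is
rejected by the full and relaxed variants but accepted by the write
variant, while an empty leaf of an essential tree is rejected by all
three.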
