public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed
From: David Sterba <dsterba@suse.cz>
To: Josef Bacik <josef@toxicpanda.com>
Cc: Qu Wenruo <quwenruo.btrfs@gmx.com>,
	dsterba@suse.cz, David Sterba <dsterba@suse.com>,
	linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 05/20] btrfs: handle invalid root reference found in btrfs_find_root()
Date: Mon, 29 Jan 2024 16:55:46 +0100	[thread overview]
Message-ID: <20240129155546.GY31555@twin.jikos.cz> (raw)
In-Reply-To: <20240126141927.GA1612357@perftesting>

On Fri, Jan 26, 2024 at 09:19:27AM -0500, Josef Bacik wrote:
> On Fri, Jan 26, 2024 at 09:44:52AM +1030, Qu Wenruo wrote:
> > 
> > 
> > On 2024/1/26 02:58, David Sterba wrote:
> > > On Thu, Jan 25, 2024 at 02:33:53PM +1030, Qu Wenruo wrote:
> > > > 
> > > > 
> > > > On 2024/1/25 07:48, David Sterba wrote:
> > > > > The btrfs_find_root() looks up a root by a key, allowing to do an
> > > > > inexact search when key->offset is -1.  It's never expected to find such
> > > > > item, as it would break allowed the range of a root id.
> > > > > 
> > > > > Signed-off-by: David Sterba <dsterba@suse.com>
> > > > > ---
> > > > >    fs/btrfs/root-tree.c | 9 ++++++++-
> > > > >    1 file changed, 8 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/fs/btrfs/root-tree.c b/fs/btrfs/root-tree.c
> > > > > index ba7e2181ff4e..326cd0d03287 100644
> > > > > --- a/fs/btrfs/root-tree.c
> > > > > +++ b/fs/btrfs/root-tree.c
> > > > > @@ -82,7 +82,14 @@ int btrfs_find_root(struct btrfs_root *root, const struct btrfs_key *search_key,
> > > > >    		if (ret > 0)
> > > > >    			goto out;
> > > > >    	} else {
> > > > > -		BUG_ON(ret == 0);		/* Logical error */
> > > > > +		/*
> > > > > +		 * Key with offset -1 found, there would have to exist a root
> > > > > +		 * with such id, but this is out of the valid range.
> > > > > +		 */
> > > > > +		if (ret == 0) {
> > > > > +			ret = -EUCLEAN;
> > > > > +			goto out;
> > > > > +		}
> > > > 
> > > > Considering all root items would already be checked by tree-checker,
> > > > I'd prefer to add an extra "key.offset == (u64)-1" check, and convert
> > > > this to ASSERT(ret == 0), so that we won't need to bother the impossible
> > > > case (nor its error messages).
> > > 
> > > I did not think about tree-checker in this context and it actually does
> > > verify offsets of the item keys so it's already prevented, assuming such
> > > corrupted issue would come from the outside.
> > 
> > If the root item is fine, but it's some runtime memory bitflip, then I'd
> > say if hitting an ASSERT() is really the last time we need to consider.
> > 
> > Furthermore, we have further safenet at metadata write time, which would
> > definitely lead to a transaction abort.
> > 
> > > 
> > > Assertions are not very popular in release builds and distros turn them
> > > off. A real error handling prevents propagating an error further to the
> > > code, so this is for catching bugs that could happen after tree-checker
> > > lets it pass with valid data but something sets wrong value to offset.
> > > 
> > > The reasoning I'm using is to have full coverage of the error values as
> > > real handling with worst case throwing an EUCLEAN. Assertions are not
> > > popular in release builds and distros turn them off so it's effectively
> > > removing error handling and allowing silent errors.
> > > 
> > Yep, I know ASSERT() is not popular in release builds.
> > 
> > But in this case, if tree-checker didn't catch such corruption, but some
> > runtime memory biflip (well, no single bitflip can result to -1) or
> > memory corruption created such -1 key offset, and ASSERT() is ignoring it.
> > 
> > That would still be fine, as our final write time tree-checker would
> > catch and abort current transaction.
> > 
> > So yes, I want to use ASSERT() intentionally here, because we're still
> > fine even it's not properly caught.
> > 
> > And again back to the old discussion on EUCLEAN, I really want it to be
> > noisy enough immediately, not waiting for the caller layers up the call
> > chain to do their error handling.
> 
> We can have both.
> 
> This patch series is removing BUG_ON()'s, and this is the correct change for
> that.
> 
> If we need followups to harden the tree checker that's valuable work too, but
> that's a followup and doesn't mean this series needs changing.
> 
> With CONFIG_BTRFS_DEBUG on we're going to get a WARN_ON when this thing trips
> and we'll see it in fstests.  Thanks,

I'd propose something different than an ASSERT for handling the EUCLEAN
cases. To make it print a WARN_ON under debug and with a message (all
builds).

The first step is to handle all the errors so I'd like not to mix it
with series. There are about 255 EUCLEAN cases (and possibly more
missing), that's a lot so we need to have a common way to handle them
instead of randomly adding ASSERT(0), duplicating an error condition
in the assert or doing thinsg like WARN(IS_ENABLED(CONFIG_BTRFS_DEBUG).

  reply	other threads:[~2024-01-29 15:56 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-24 21:17 [PATCH 00/20] Error handling fixes David Sterba
2024-01-24 21:17 ` [PATCH 01/20] btrfs: handle directory and dentry mismatch in btrfs_may_delete() David Sterba
2024-01-24 21:17 ` [PATCH 02/20] btrfs: handle invalid range and start in merge_extent_mapping() David Sterba
2024-01-25  3:53   ` Qu Wenruo
2024-01-25 16:07     ` David Sterba
2024-01-24 21:17 ` [PATCH 03/20] btrfs: handle block group lookup error when it's being removed David Sterba
2024-01-25  3:58   ` Qu Wenruo
2024-01-25 16:12     ` David Sterba
2024-01-25 23:09       ` Qu Wenruo
2024-01-29 15:49         ` David Sterba
2024-01-24 21:18 ` [PATCH 04/20] btrfs: handle root deletion lookup error in btrfs_del_root() David Sterba
2024-01-25  4:01   ` Qu Wenruo
2024-01-25 16:19     ` David Sterba
2024-01-24 21:18 ` [PATCH 05/20] btrfs: handle invalid root reference found in btrfs_find_root() David Sterba
2024-01-25  4:03   ` Qu Wenruo
2024-01-25 16:28     ` David Sterba
2024-01-25 23:14       ` Qu Wenruo
2024-01-26 14:19         ` Josef Bacik
2024-01-29 15:55           ` David Sterba [this message]
2024-01-26 12:06   ` Anand Jain
2024-01-29 15:56     ` David Sterba
2024-01-24 21:18 ` [PATCH 06/20] btrfs: handle invalid root reference found in btrfs_init_root_free_objectid() David Sterba
2024-01-25  4:06   ` Qu Wenruo
2024-01-24 21:18 ` [PATCH 07/20] btrfs: handle chunk tree lookup error in btrfs_relocate_sys_chunks() David Sterba
2024-01-25  4:08   ` Qu Wenruo
2024-01-24 21:18 ` [PATCH 08/20] btrfs: handle invalid extent item reference found in check_committed_ref() David Sterba
2024-01-25  4:10   ` Qu Wenruo
2024-01-25 16:31     ` David Sterba
2024-01-24 21:18 ` [PATCH 09/20] btrfs: export: handle invalid inode or root reference in btrfs_get_parent() David Sterba
2024-01-24 21:18 ` [PATCH 10/20] btrfs: delayed-inode: drop pointless BUG_ON in __btrfs_remove_delayed_item() David Sterba
2024-01-24 21:18 ` [PATCH 11/20] btrfs: change BUG_ON to assertion when checking for delayed_node root David Sterba
2024-01-24 21:18 ` [PATCH 12/20] btrfs: defrag: change BUG_ON to assertion in btrfs_defrag_leaves() David Sterba
2024-01-24 21:18 ` [PATCH 13/20] btrfs: change BUG_ON to assertion in btrfs_read_roots() David Sterba
2024-01-24 21:18 ` [PATCH 14/20] btrfs: change BUG_ON to assertion when verifying lockdep class setup David Sterba
2024-01-24 21:18 ` [PATCH 15/20] btrfs: change BUG_ON to assertion when verifying root in btrfs_alloc_reserved_file_extent() David Sterba
2024-01-24 21:18 ` [PATCH 16/20] btrfs: change BUG_ON to assertion in reset_balance_state() David Sterba
2024-01-24 21:18 ` [PATCH 17/20] btrfs: unify handling of return values of btrfs_insert_empty_items() David Sterba
2024-01-24 21:18 ` [PATCH 18/20] btrfs: move transaction abort to the error site in btrfs_delete_free_space_tree() David Sterba
2024-01-24 21:18 ` [PATCH 19/20] btrfs: move transaction abort to the error site in btrfs_create_free_space_tree() David Sterba
2024-01-24 21:18 ` [PATCH 20/20] btrfs: move transaction abort to the error site btrfs_rebuild_free_space_tree() David Sterba
2024-01-24 22:14 ` [PATCH 00/20] Error handling fixes Josef Bacik
2024-01-25  4:21 ` Qu Wenruo
2024-01-25 16:34   ` David Sterba
2024-01-26 12:28 ` Anand Jain
2024-01-29 16:13   ` David Sterba
2024-01-29 23:33     ` Anand Jain

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240129155546.GY31555@twin.jikos.cz \
    --to=dsterba@suse.cz \
    --cc=dsterba@suse.com \
    --cc=josef@toxicpanda.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=quwenruo.btrfs@gmx.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox