public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Josef Bacik <josef@toxicpanda.com>
To: Boris Burkov <boris@bur.io>
Cc: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH 2/2] btrfs: add delayed ref self tests
Date: Fri, 15 Nov 2024 09:42:35 -0500	[thread overview]
Message-ID: <20241115144235.GA1623939@perftesting> (raw)
In-Reply-To: <20241114222353.GA3037257@zen.localdomain>

On Thu, Nov 14, 2024 at 02:23:53PM -0800, Boris Burkov wrote:
> On Thu, Nov 14, 2024 at 02:57:49PM -0500, Josef Bacik wrote:
> > The recent fix for a stupid mistake I made uncovered the fact that we
> > don't have adequate testing in the delayed refs code, as it took a
> > pretty extensive and long running stress test to uncover something that
> > a unit test would have uncovered right away.
> > 
> > Fix this by adding a delayed refs self test suite.  This will validate
> > that the btrfs_ref transformation does the correct thing, that we do the
> > correct thing when merging delayed refs, and that we get the delayed
> > refs in the order that we expect.  These are all crucial to how the
> > delayed refs operate.
> > 
> > I introduced various bugs (including the original bug) into the delayed
> > refs code to validate that these tests caught all of the shenanigans
> > that I could think of.
> 
> These tests look reasonably exhaustive for us handling expected
> situations in the expected way.
> 
> I think it's also a good practice to try to test various error paths
> that are part of the intended behavior of the function.
> 
> Is that something we could feasibly do at this point? (i.e.,
> would it generally BUG_ON and do we have the vocabulary to assert the
> expected failures?)
> 
> OTOH, I don't see any too interesting (not enomem) failures in add/select
> ref anyway, so that might apply more to testing an aspect of delayed
> refs that actually cares about the extent tree, for example.

Funnily enough I did this at first to make sure it caught the wrong ref type,
and forgot we have ASSERT()'s to validate that.  So we could do this, but we'd
have to go through and remove all the ASSERT()'s we put into place to keep us
from doing obviously wrong things.  ENOMEM and other related errors that are
valid is good things to test for, but unfortunately outside our abilities with
the self test.  Maybe kunit test has a way to do this, so perhaps that's
something to put on the list to do in the future instead of doing our own hand
rolled unit testing.

> 
> > 
> > Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> > ---
> >  fs/btrfs/Makefile                   |    2 +-
> >  fs/btrfs/delayed-ref.c              |   14 +-
> >  fs/btrfs/tests/btrfs-tests.c        |   18 +
> >  fs/btrfs/tests/btrfs-tests.h        |    6 +
> >  fs/btrfs/tests/delayed-refs-tests.c | 1012 +++++++++++++++++++++++++++
> >  5 files changed, 1048 insertions(+), 4 deletions(-)
> >  create mode 100644 fs/btrfs/tests/delayed-refs-tests.c
> > 
> > diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile
> > index 3cfc440c636c..2d5f0482678b 100644
> > --- a/fs/btrfs/Makefile
> > +++ b/fs/btrfs/Makefile
> > @@ -44,4 +44,4 @@ btrfs-$(CONFIG_BTRFS_FS_RUN_SANITY_TESTS) += tests/free-space-tests.o \
> >  	tests/extent-buffer-tests.o tests/btrfs-tests.o \
> >  	tests/extent-io-tests.o tests/inode-tests.o tests/qgroup-tests.o \
> >  	tests/free-space-tree-tests.o tests/extent-map-tests.o \
> > -	tests/raid-stripe-tree-tests.o
> > +	tests/raid-stripe-tree-tests.o tests/delayed-refs-tests.o
> > diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
> > index 63e0a7f660da..3ec0468d1d94 100644
> > --- a/fs/btrfs/delayed-ref.c
> > +++ b/fs/btrfs/delayed-ref.c
> > @@ -93,6 +93,9 @@ void btrfs_update_delayed_refs_rsv(struct btrfs_trans_handle *trans)
> >  	u64 num_bytes;
> >  	u64 reserved_bytes;
> >  
> > +	if (btrfs_is_testing(fs_info))
> > +		return;
> > +
> 
> Not new as of this patch or anything, but this is really gross. Is this
> really better than different modes and blocking off/replacing stuff with
> macros, for example? I guess that doesn't work with running the tests
> while loading the module.
> 
> Optimally, we'd have to refactor stuff to have a unit-testable core, but
> that is a much bigger lift that also leads to a bunch of bugs while we
> do it.

There's two things you'll see here, there's this style of "block this off for
testing", and then there's what I did in btrfs_alloc_dummy_fs_info() where I
stub out the fs_info->csum_size and ->csums_per_leaf.  Those tripped up some of
the random helpers, and I could have easily just slapped a "if
(btrfs_is_testing(fs_info)) return;" in there, but it's way more straightforward
to just stub out what we need for the test.

> 
> >  	num_bytes = btrfs_calc_delayed_ref_bytes(fs_info, trans->delayed_ref_updates);
> >  	num_bytes += btrfs_calc_delayed_ref_csum_bytes(fs_info,
> >  						       trans->delayed_ref_csum_deletions);
> > @@ -1261,6 +1264,7 @@ void btrfs_destroy_delayed_refs(struct btrfs_transaction *trans)
> >  {
> >  	struct btrfs_delayed_ref_root *delayed_refs = &trans->delayed_refs;
> >  	struct btrfs_fs_info *fs_info = trans->fs_info;
> > +	bool testing = btrfs_is_testing(fs_info);
> >  
> >  	spin_lock(&delayed_refs->lock);
> >  	while (true) {
> > @@ -1290,7 +1294,7 @@ void btrfs_destroy_delayed_refs(struct btrfs_transaction *trans)
> >  		spin_unlock(&delayed_refs->lock);
> >  		mutex_unlock(&head->mutex);
> >  
> > -		if (pin_bytes) {
> > +		if (!testing && pin_bytes) {
> >  			struct btrfs_block_group *bg;
> >  
> >  			bg = btrfs_lookup_block_group(fs_info, head->bytenr);
> > @@ -1322,12 +1326,16 @@ void btrfs_destroy_delayed_refs(struct btrfs_transaction *trans)
> >  			btrfs_error_unpin_extent_range(fs_info, head->bytenr,
> >  				head->bytenr + head->num_bytes - 1);
> >  		}
> > -		btrfs_cleanup_ref_head_accounting(fs_info, delayed_refs, head);
> > +		if (!testing)
> > +			btrfs_cleanup_ref_head_accounting(fs_info, delayed_refs,
> > +							  head);
> >  		btrfs_put_delayed_ref_head(head);
> >  		cond_resched();
> >  		spin_lock(&delayed_refs->lock);
> >  	}
> > -	btrfs_qgroup_destroy_extent_records(trans);
> > +
> > +	if (!testing)
> 
> What is the principle that decides whether something is appropriate for
> testing mode?

The principle is "do I care about this right now for the test I'm writing" ;).

The reality is that our codebase isn't well suited for unit testing.  I added
the self tests over a decade ago, we didn't have kunit or any of the
infrastructure/experience to do proper unit testing.  We could prioritize
removing the btrfs_is_testing() stuff, we know where it's all used, and we could
rework the code to be more testable now that we know where the sore spots are.
Unfortunately right now it's more a balance between getting things tested and
making it as painless as possible.  Thanks,

Josef

  reply	other threads:[~2024-11-15 14:42 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-11-14 19:57 [PATCH 0/2] btrfs: add a delayed ref self test Josef Bacik
2024-11-14 19:57 ` [PATCH 1/2] btrfs: move select_delayed_ref and export it Josef Bacik
2024-11-14 19:57 ` [PATCH 2/2] btrfs: add delayed ref self tests Josef Bacik
2024-11-14 22:23   ` Boris Burkov
2024-11-15 14:42     ` Josef Bacik [this message]
2024-11-15 18:33   ` David Sterba
2024-12-06 19:51     ` David Sterba
2024-12-09 14:01       ` Josef Bacik
2024-11-26 15:34   ` David Sterba
2024-11-14 22:24 ` [PATCH 0/2] btrfs: add a delayed ref self test Boris Burkov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20241115144235.GA1623939@perftesting \
    --to=josef@toxicpanda.com \
    --cc=boris@bur.io \
    --cc=kernel-team@fb.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox