From: "Darrick J. Wong" <djwong@kernel.org>
To: Christian Brauner <brauner@kernel.org>
Cc: Theodore Ts'o <tytso@mit.edu>,
Linux Filesystem Development List <linux-fsdevel@vger.kernel.org>,
linux-block@vger.kernel.org, fstests@vger.kernel.org
Subject: Re: Flaky test: generic/085
Date: Thu, 13 Jun 2024 10:17:09 -0700
Message-ID: <20240613171709.GA3855044@frogsfrogsfrogs>
In-Reply-To: <20240613-lackmantel-einsehen-90f0d727358d@brauner>
On Thu, Jun 13, 2024 at 11:55:37AM +0200, Christian Brauner wrote:
> On Wed, Jun 12, 2024 at 03:47:16PM GMT, Theodore Ts'o wrote:
> > On Wed, Jun 12, 2024 at 01:25:07PM +0200, Christian Brauner wrote:
> > > I tried to reproduce this with pmem yesterday and wasn't able to.
> > >
> > > What's the kernel config and test config that's used?
> > >
> >
> > The kernel config can be found here:
> >
> > https://github.com/tytso/xfstests-bld/blob/master/kernel-build/kernel-configs/config-6.1
> >
> > Drop it into .config in the build directory of any kernel sources
> > newer than 6.1, and then run "make olddefconfig". This is all
> > automated in the install-kconfig script which I use:
> >
> > https://github.com/tytso/xfstests-bld/blob/master/kernel-build/install-kconfig
> >
> > The VM has 4 CPUs and 26GiB of memory, and the kernel is booted with the
> > boot command line options "memmap=4G!9G memmap=9G!14G", which set up
> > fake /dev/pmem0 and /dev/pmem1 devices backed by RAM. This is my poor
> > engineer's way of testing DAX without needing access to expensive VMs
> > with pmem. :-)
> >
> > I'm assuming this is a timing-dependent bug which is easiest to
> > trigger on fast devices, so a ramdisk might also work. FWIW, I also
> > can see failures relatively frequently using the ext4/nojournal
> > configuration on an SSD-backed cloud block device (GCE's Persistent
> > Disk SSD product).
> >
> > As a result, if you grab my xfstests-bld repo from github and then
> > run "qemu-xfstests -c ext4/nojournal -C 20 generic/085", it should
> > also reproduce. See Documentation/kvm-quickstart.md for more details.
>
> Thanks, Ted! Ok, I think I figured it out.
>
> P1
> dm_suspend()
> -> bdev_freeze()
>    mutex_lock(&bdev->bd_fsfreeze_mutex);
>    atomic_inc_return(&bdev->bd_fsfreeze_count); // == 1
>    mutex_unlock(&bdev->bd_fsfreeze_mutex);
>
> P2                                               P3
> setup_bdev_super()
> bdev_file_open_by_dev();
> atomic_read(&bdev->bd_fsfreeze_count); // != 0
>
>                                                  bdev_thaw()
>                                                  mutex_lock(&bdev->bd_fsfreeze_mutex);
>                                                  atomic_dec_return(&bdev->bd_fsfreeze_count); // == 0
>                                                  mutex_unlock(&bdev->bd_fsfreeze_mutex);
>                                                  bd_holder_lock();
>                                                  // grab passive reference on sb via sb->s_count
>                                                  bd_holder_unlock();
>                                                  // Sleep to be woken when superblock ready or dead
> bdev_fput()
> bd_holder_lock()
> // yield bdev
> bd_holder_unlock()
>
> deactivate_locked_super()
> // notify that superblock is dead
>
>                                                  // get woken and see that superblock is dead; fail
>
> In words this basically means that P1 calls dm_suspend() which calls
> into bdev_freeze() before the block device has been claimed by the
> filesystem. This brings bdev->bd_fsfreeze_count to 1 and no call into
> fs_bdev_freeze() is required.
>
> Now P2 tries to mount that frozen block device. It claims it and checks
> bdev->bd_fsfreeze_count. As it's elevated, it aborts mounting while
> holding sb->s_umount the whole time, of course.
>
> In the meantime P3 calls dm_resume(). It sees that the block device is
> already claimed by a filesystem and calls into fs_bdev_thaw().
>
> It takes a passive reference and realizes that the filesystem isn't
> ready yet. So P3 puts itself to sleep to wait for the filesystem to
> become ready.
>
> P2 puts the last active reference to the filesystem and marks it as
> dying.
>
> Now P3 gets woken, sees that the filesystem is dying and
> get_bdev_super() fails.
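
Just to convince myself of the ordering, I sketched the interleaving as a
little userspace toy -- emphatically not kernel code, just pthreads with
made-up names (toy_thaw_race.c, freeze_count, sb_born/sb_dying) standing
in for bd_fsfreeze_count and the SB_BORN/SB_DYING wait -- and it ends up
in the "superblock already dead" leg exactly as you describe:

/* toy_thaw_race.c -- build with: cc -pthread toy_thaw_race.c */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static int freeze_count;        /* stands in for bdev->bd_fsfreeze_count */
static bool sb_born, sb_dying;  /* stand in for SB_BORN / SB_DYING */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;

/* P3: dm_resume() -> bdev_thaw() -> fs_bdev_thaw() */
static void *resume_thaw(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&lock);
	freeze_count--;                 /* bdev_thaw() drops the freeze count */
	/* "grab passive reference", then sleep until superblock ready or dead */
	while (!sb_born && !sb_dying)
		pthread_cond_wait(&cond, &lock);
	if (sb_dying)                   /* get_bdev_super() would return NULL here */
		printf("thaw: superblock already dead, nothing to thaw\n");
	pthread_mutex_unlock(&lock);
	return NULL;
}

int main(void)
{
	pthread_t p3;
	bool abort_mount;

	/* P1: dm_suspend() -> bdev_freeze() before any fs has claimed the bdev */
	pthread_mutex_lock(&lock);
	freeze_count++;
	pthread_mutex_unlock(&lock);

	/* P2: setup_bdev_super() claims the bdev and sees the elevated count */
	pthread_mutex_lock(&lock);
	abort_mount = (freeze_count != 0);
	pthread_mutex_unlock(&lock);

	/* P3: starts thawing and goes to sleep waiting for the superblock */
	pthread_create(&p3, NULL, resume_thaw, NULL);

	/* P2: aborts the mount; deactivate_locked_super() marks the sb dead */
	if (abort_mount) {
		pthread_mutex_lock(&lock);
		sb_dying = true;
		pthread_cond_broadcast(&cond);
		pthread_mutex_unlock(&lock);
	}

	pthread_join(p3, NULL);
	return 0;
}

So the thaw path finding a dead superblock really is a legitimate outcome
of this interleaving rather than a kernel bug, which is why dropping the
WARN_ON_ONCE() looks right to me.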
>
> So Darrick is correct about the fix but the reasoning is a bit
> different. :)
>
> Patch appended and on #vfs.fixes.
> From 35224b919d6778ca5dd11f76659ae849594bd2bf Mon Sep 17 00:00:00 2001
> From: Christian Brauner <brauner@kernel.org>
> Date: Thu, 13 Jun 2024 11:38:14 +0200
> Subject: [PATCH] fs: don't misleadingly warn during thaw operations
>
> The block device may have been frozen before it was claimed by a
> filesystem. Concurrently another process might try to mount that
> frozen block device and has temporarily claimed the block device for
> that purpose causing a concurrent fs_bdev_thaw() to end up here. The
> mounter is already about to abort mounting because they still saw an
> elevated bdev->bd_fsfreeze_count so get_bdev_super() will return
> NULL in that case.
>
> For example, P1 calls dm_suspend() which calls into bdev_freeze() before
> the block device has been claimed by the filesystem. This brings
> bdev->bd_fsfreeze_count to 1 and no call into fs_bdev_freeze() is
> required.
>
> Now P2 tries to mount that frozen block device. It claims it and checks
> bdev->bd_fsfreeze_count. As it's elevated it aborts mounting.
>
> In the meantime P3 calls dm_resume(). It sees that the block device is
> already claimed by a filesystem and calls into fs_bdev_thaw().
>
> It takes a passive reference and realizes that the filesystem isn't
> ready yet. So P3 puts itself to sleep to wait for the filesystem to
> become ready.
>
> P2 puts the last active reference to the filesystem and marks it as
> dying. Now P3 gets woken, sees that the filesystem is dying and
> get_bdev_super() fails.
Wow that's twisty. But it makes sense to me, so
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
--D
> Fixes: 49ef8832fb1a ("bdev: implement freeze and thaw holder operations")
> Cc: <stable@vger.kernel.org>
> Reported-by: Theodore Ts'o <tytso@mit.edu>
> Link: https://lore.kernel.org/r/20240611085210.GA1838544@mit.edu
> Signed-off-by: Christian Brauner <brauner@kernel.org>
> ---
> fs/super.c | 11 ++++++++++-
> 1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/fs/super.c b/fs/super.c
> index b72f1d288e95..095ba793e10c 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -1502,8 +1502,17 @@ static int fs_bdev_thaw(struct block_device *bdev)
>
> lockdep_assert_held(&bdev->bd_fsfreeze_mutex);
>
> + /*
> + * The block device may have been frozen before it was claimed by a
> + * filesystem. Concurrently another process might try to mount that
> + * frozen block device and has temporarily claimed the block device for
> + * that purpose causing a concurrent fs_bdev_thaw() to end up here. The
> + * mounter is already about to abort mounting because they still saw an
> + * elevated bdev->bd_fsfreeze_count so get_bdev_super() will return
> + * NULL in that case.
> + */
> sb = get_bdev_super(bdev);
> - if (WARN_ON_ONCE(!sb))
> + if (!sb)
> return -EINVAL;
>
> if (sb->s_op->thaw_super)
> --
> 2.43.0
>