public inbox for linux-btrfs@vger.kernel.org
From: David Sterba <dsterba@suse.cz>
To: Boris Burkov <boris@bur.io>
Cc: Chris Mason <clm@fb.com>, Josef Bacik <josef@toxicpanda.com>,
	David Sterba <dsterba@suse.com>,
	linux-btrfs@vger.kernel.org, linux-kernel@vger.kernel.org,
	kernel-team@fb.com
Subject: Re: [PATCH v2] btrfs: fix mount failure caused by race with umount
Date: Thu, 16 Jul 2020 20:51:10 +0200
Message-ID: <20200716185110.GB3703@twin.jikos.cz>
In-Reply-To: <20200710172304.139763-1-boris@bur.io>

On Fri, Jul 10, 2020 at 10:23:04AM -0700, Boris Burkov wrote:
> Here is the sequence laid out in greater detail:
> 
> CPU0                                                    CPU1
> down_write sb->s_umount
> btrfs_kill_super
>   kill_anon_super(sb)
>     generic_shutdown_super(sb);
>       shrink_dcache_for_umount(sb);
>       sync_filesystem(sb);
>       evict_inodes(sb); // SLOW
> 
>                                               btrfs_mount_root
>                                                 btrfs_scan_one_device
>                                                 fs_devices = device->fs_devices
>                                                 fs_info->fs_devices = fs_devices
>                                                 // fs_devices->opened makes this a no-op
>                                                 btrfs_open_devices(fs_devices, mode, fs_type)
>                                                 s = sget(fs_type, test, set, flags, fs_info);
>                                                   find sb in s_instances
>                                                   grab_super(sb);
>                                                     down_write(&s->s_umount); // blocks
> 
>       sop->put_super(sb)
>         // sb->fs_devices->opened == 2; no-op
>       spin_lock(&sb_lock);
>       hlist_del_init(&sb->s_instances);
>       spin_unlock(&sb_lock);
>       up_write(&sb->s_umount);
>                                                     return 0;
>                                                   retry lookup
>                                                   don't find sb in s_instances (deleted by CPU0)
>                                                   s = alloc_super
>                                                   return s;
>                                                 btrfs_fill_super(s, fs_devices, data)
>                                                   open_ctree // fs_devices total_rw_bytes improperly set!
>                                                     btrfs_read_chunk_tree
>                                                       read_one_dev // increment total_rw_bytes again!!
>                                                       super_total_bytes < fs_devices->total_rw_bytes // ERROR!!!

It seems weird that umount and mount can be mixed in such a way, but with
the VFS locks and structures it's valid, so the devices managed by btrfs
slipped through.

With the suggested fix, the bit BTRFS_DEV_STATE_IN_FS_METADATA becomes
quite important for the synchronization of the device-related data.
The semantics seem quite subtle and inconsistent with the other uses
of set_bit or clear_bit and of total_rw_bytes.

I'm thinking about setting IN_FS_METADATA unconditionally, as it is now,
but recalculating total_rw_bytes outside of read_one_dev, in
btrfs_read_chunk_tree. There it should not matter if the bit was set by
the unmounted or the mounted filesystem, as long as the locking rules
for updating fs_devices hold. For that we have uuid_mutex and
fs_devices::device_list_mutex; these are used elsewhere, so fixing it
using existing mechanisms is IMHO a better way than relying on subtle
undocumented semantics of the state bit.

Thread overview: 7+ messages
2020-07-10  0:44 [PATCH] btrfs: fix mount failure caused by race with umount Boris Burkov
2020-07-10  1:23 ` Josef Bacik
2020-07-10 17:23   ` [PATCH v2] " Boris Burkov
2020-07-10 17:51     ` Josef Bacik
2020-07-16 18:51     ` David Sterba [this message]
2020-07-16 20:29       ` [PATCH v3] " Boris Burkov
2020-07-20 16:32         ` David Sterba
