From: Su Yue <l@damenly.su>
To: Anand Jain <anand.jain@oracle.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH V2] btrfs: traverse seed devices if fs_devices::devices is empty in show_devname
Date: Wed, 18 Aug 2021 15:30:19 +0800 [thread overview]
Message-ID: <7dgjfbca.fsf@damenly.su> (raw)
In-Reply-To: <1b42b3aa-0363-36ad-df5c-4d9d86b8cc97@oracle.com>
On Wed 18 Aug 2021 at 14:48, Anand Jain <anand.jain@oracle.com>
wrote:
> On 18/08/2021 12:19, Su Yue wrote:
>> while running btrfs/238 in my test box, the following warning
>> occurs
>> in high chance:
>>
>> ------------[ cut here ]------------
>> WARNING: CPU: 3 PID: 481 at fs/btrfs/super.c:2509
>> btrfs_show_devname+0x104/0x1e8 [btrfs]
>> CPU: 2 PID: 1 Comm: systemd Tainted: G W O
>> 5.14.0-rc1-custom #72
>> Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
>> Call trace:
>> btrfs_show_devname+0x108/0x1b4 [btrfs]
>> show_mountinfo+0x234/0x2c4
>> m_show+0x28/0x34
>> seq_read_iter+0x12c/0x3c4
>> vfs_read+0x29c/0x2c8
>> ksys_read+0x80/0xec
>> __arm64_sys_read+0x28/0x34
>> invoke_syscall+0x50/0xf8
>> do_el0_svc+0x88/0x138
>> el0_svc+0x2c/0x8c
>> el0t_64_sync_handler+0x84/0xe4
>> el0t_64_sync+0x198/0x19c
>> ---[ end trace 3efd7e5950b8af05 ]---
>>
>
>> It's also reproducible by creating a sprout filesystem and
>> reading
>> /proc/self/mounts in parallel.
>
> ok. This explains.
>
>>
>> The warning is produced if btrfs_show_devname() can't find any
>> available
>> device in fs_info->fs_devices->devices which is protected by
>> RCU.
>
>
>> The warning is desirable to exercise there is at least one
>> device in the
>> mounted filesystem. However, it's not always true for a
>> sprouting fs.
>
>
> Right. When the code is running from line 2596 (including) until
> line
> 2607, there are chances that the fs_info->fs_devices->devices
> list is
> empty. Or those devices are moving to
> fs_info->fs_devices->seed_list.
>
>
> 2596 ret = btrfs_prepare_sprout(fs_info);
> 2597 if (ret) {
> 2598 btrfs_abort_transaction(trans,
> ret);
> 2599 goto error_trans;
> 2600 }
> 2601 }
> 2602
> 2603 device->fs_devices = fs_devices;
> 2604
> 2605 mutex_lock(&fs_devices->device_list_mutex);
> 2606 mutex_lock(&fs_info->chunk_mutex);
> 2607 list_add_rcu(&device->dev_list,
> &fs_devices->devices);
>
>
>>
>> While a new device is being added into fs to be sprouted, call
>> stack is:
>> btrfs_ioctl_add_dev
>> btrfs_init_new_device
>> btrfs_prepare_sprout
>> list_splice_init_rcu(&fs_devices->devices,
>> &seed_devices->devices,
>> synchronize_rcu);
>> list_add_rcu(&device->dev_list, &fs_devices->devices);
>>
>> Looking at btrfs_prepare_sprout(), every new RCU reader will
>> read a
>> empty fs_devices->devices once synchronize_rcu() is called.
>> After commit 4faf55b03823 ("btrfs: don't traverse into the seed
>> devices
>> in show_devname"), btrfs_show_devname() won't looking into
>> fs_devices->seed_list even there is no device in
>> fs_devices->devices.
>>
>> And since commit 88c14590cdd6 ("btrfs: use RCU in
>> btrfs_show_devname for
>> device list traversal"), btrfs_show_devname() only uses RCU
>> without mutex
>> lock taken for device list traversal. The function read an
>> empty
>> fs_devices->devices and found no device in the list then
>> triggers the
>> warning. The commit just enlarged the window that the fs device
>> list
>> could be empty. Even btrfs_show_devname() uses mutex_lock(),
>> there is a
>> tiny chance to read an empty devices list between
>> mutex_unlock() in
>> btrfs_prepare_sprout() and next mutex_lock() in
>> btrfs_init_new_device().
>>
>> So take device_list_mutex then traverse fs_devices->seed_list
>> to seek
>> for a seed device if no device was found in
>> fs_devices->devices.
>> Since a normal fs always has devices in fs_device->devices and
>> the
>> window is small enough, the mutex lock is not too heavy.
>>
>> Signed-off-by: Su Yue <l@damenly.su>
>>
>> ---
>> Changelog:
>> v2:
>> Try to traverse fs_devices->seed_list instead of removing
>> the
>> WARN_ON().
>> Change the subject.
>> Add description of fix.
>> ---
>> fs/btrfs/super.c | 41
>> ++++++++++++++++++++++++++++++++++++++---
>> 1 file changed, 38 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
>> index d07b18b2b250..31e723eb2ccf 100644
>> --- a/fs/btrfs/super.c
>> +++ b/fs/btrfs/super.c
>> @@ -2482,7 +2482,9 @@ static int btrfs_unfreeze(struct
>> super_block *sb)
>> static int btrfs_show_devname(struct seq_file *m, struct
>> dentry *root)
>> {
>> struct btrfs_fs_info *fs_info = btrfs_sb(root->d_sb);
>> + struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
>> struct btrfs_device *dev, *first_dev = NULL;
>> + struct btrfs_fs_devices *seed_devices;
>>
>> /*
>> * Lightweight locking of the devices. We should not need
>> @@ -2492,7 +2494,7 @@ static int btrfs_show_devname(struct
>> seq_file *m, struct dentry *root)
>> * least until the rcu_read_unlock.
>> */
>> rcu_read_lock();
>> - list_for_each_entry_rcu(dev,
>> &fs_info->fs_devices->devices, dev_list) {
>> + list_for_each_entry_rcu(dev, &fs_devices->devices,
>> dev_list) {
>> if (test_bit(BTRFS_DEV_STATE_MISSING,
>> &dev->dev_state))
>> continue;
>> if (!dev->name)
>> @@ -2503,9 +2505,42 @@ static int btrfs_show_devname(struct
>> seq_file *m, struct dentry *root)
>>
>> if (first_dev)
>> seq_escape(m, rcu_str_deref(first_dev->name), "
>> \t\n\\");
>> - else
>> - WARN_ON(1);
>> rcu_read_unlock();
>> +
>> + if (first_dev)
>> + return 0;
>> +
>> + /*
>> + * While the fs is sprouting, above fs_devices->devices
>> could be empty
>> + * if the RCU read happened in the window between when
>> + * fs_devices->devices was spliced into
>> seed_devices->devices in
>> + * btrfs_prepare_sprout() and new device is not added to
>> + * fs_devices->devices in btrfs_init_new_device().
>> + *
>> + * Take device_list_mutex to make sure seed_devices has
>> been added into
>> + * fs_devices->seed_list then we can traverse it.
>> + */
>> + mutex_lock(&fs_devices->device_list_mutex);
>
>
> possible fix:
> As the problem is from line 2596 to 2607 (above) can we move
> list_add_rcu(&device->dev_list, &fs_devices->devices);
> into btrfs_prepare_sprout() so that it shall reduce the racing
> window.
>
Seems a feasible fix in another way. If the new device is to be
added
by btrfs_prepare_sprout(), then almost everything (num_devices,
total
bytes) what is updated in the mutex block starting from line 2607
should
be moved into the mutex blcok btrfs_prepare_sprout() inside?
Then the code path looks more complex...
> And,
> We have learned that taking device_list_mutex in this thread
> will end
> up with a lockdep warning. We might need a new fs_info state
> to
> indicate that FS is sprouting.
>
Not confident in design :) I guess it's up to David.
Thanks.
--
Su
> Thanks, Anand
>
>> + list_for_each_entry(seed_devices, &fs_devices->seed_list,
>> seed_list) {
>> + list_for_each_entry(dev, &seed_devices->devices,
>> dev_list) {
>> + if (test_bit(BTRFS_DEV_STATE_MISSING,
>> &dev->dev_state))
>> + continue;
>> + if (!dev->name)
>> + continue;
>> + if (!first_dev || dev->devid < first_dev->devid)
>> + first_dev = dev;
>> + }
>> + }
>> +
>> + if (first_dev) {
>> + rcu_read_lock();
>> + seq_escape(m, rcu_str_deref(first_dev->name), "
>> \t\n\\");
>> + rcu_read_unlock();
>> + } else {
>> + WARN_ON(1);
>> + }
>> + mutex_unlock(&fs_devices->device_list_mutex);
>> +
>> return 0;
>> }
>>
>>
next prev parent reply other threads:[~2021-08-18 7:45 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-08-18 4:19 [PATCH V2] btrfs: traverse seed devices if fs_devices::devices is empty in show_devname Su Yue
2021-08-18 6:48 ` Anand Jain
2021-08-18 7:30 ` Su Yue [this message]
2021-08-18 22:31 ` David Sterba
2021-08-19 6:18 ` Su Yue
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=7dgjfbca.fsf@damenly.su \
--to=l@damenly.su \
--cc=anand.jain@oracle.com \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).