linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Su Yue <l@damenly.su>
To: Anand Jain <anand.jain@oracle.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH V2] btrfs: traverse seed devices if fs_devices::devices is empty in show_devname
Date: Wed, 18 Aug 2021 15:30:19 +0800	[thread overview]
Message-ID: <7dgjfbca.fsf@damenly.su> (raw)
In-Reply-To: <1b42b3aa-0363-36ad-df5c-4d9d86b8cc97@oracle.com>


On Wed 18 Aug 2021 at 14:48, Anand Jain <anand.jain@oracle.com> 
wrote:

> On 18/08/2021 12:19, Su Yue wrote:
>> while running btrfs/238 in my test box, the following warning 
>> occurs
>> in high chance:
>>
>> ------------[ cut here  ]------------
>> WARNING: CPU: 3 PID: 481 at fs/btrfs/super.c:2509 
>> btrfs_show_devname+0x104/0x1e8 [btrfs]
>> CPU: 2 PID: 1 Comm: systemd Tainted: G        W  O 
>> 5.14.0-rc1-custom #72
>> Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
>> Call trace:
>>    btrfs_show_devname+0x108/0x1b4 [btrfs]
>>    show_mountinfo+0x234/0x2c4
>>    m_show+0x28/0x34
>>    seq_read_iter+0x12c/0x3c4
>>    vfs_read+0x29c/0x2c8
>>    ksys_read+0x80/0xec
>>    __arm64_sys_read+0x28/0x34
>>    invoke_syscall+0x50/0xf8
>>    do_el0_svc+0x88/0x138
>>    el0_svc+0x2c/0x8c
>>    el0t_64_sync_handler+0x84/0xe4
>>    el0t_64_sync+0x198/0x19c
>> ---[ end trace 3efd7e5950b8af05  ]---
>>
>
>> It's also reproducible by creating a sprout filesystem and 
>> reading
>> /proc/self/mounts in parallel.
>
>   ok. This explains.
>
>>
>> The warning is produced if btrfs_show_devname() can't find any 
>> available
>> device in fs_info->fs_devices->devices which is protected by 
>> RCU.
>
>
>> The warning is desirable to exercise there is at least one 
>> device in the
>> mounted filesystem. However, it's not always true for a 
>> sprouting fs.
>
>
> Right. When the code is running from line 2596 (including) until 
> line
> 2607, there are chances that the fs_info->fs_devices->devices 
> list is
> empty. Or those devices are moving to 
> fs_info->fs_devices->seed_list.
>
>
> 2596                 ret = btrfs_prepare_sprout(fs_info);
> 2597                 if (ret) {
> 2598                         btrfs_abort_transaction(trans, 
> ret);
> 2599                         goto error_trans;
> 2600                 }
> 2601         }
> 2602
> 2603         device->fs_devices = fs_devices;
> 2604
> 2605         mutex_lock(&fs_devices->device_list_mutex);
> 2606         mutex_lock(&fs_info->chunk_mutex);
> 2607         list_add_rcu(&device->dev_list, 
> &fs_devices->devices);
>
>
>>
>> While a new device is being added into fs to be sprouted, call 
>> stack is:
>>   btrfs_ioctl_add_dev
>>    btrfs_init_new_device
>>      btrfs_prepare_sprout
>>        list_splice_init_rcu(&fs_devices->devices, 
>>        &seed_devices->devices,
>>        synchronize_rcu);
>>      list_add_rcu(&device->dev_list, &fs_devices->devices);
>>
>> Looking at btrfs_prepare_sprout(), every new RCU reader will 
>> read a
>> empty fs_devices->devices once synchronize_rcu() is called.
>> After commit 4faf55b03823 ("btrfs: don't traverse into the seed 
>> devices
>> in show_devname"), btrfs_show_devname() won't looking into
>> fs_devices->seed_list even there is no device in 
>> fs_devices->devices.
>>
>> And since commit 88c14590cdd6 ("btrfs: use RCU in 
>> btrfs_show_devname for
>> device list traversal"), btrfs_show_devname() only uses RCU 
>> without mutex
>> lock taken for device list traversal. The function read an 
>> empty
>> fs_devices->devices and found no device in the list then 
>> triggers the
>> warning. The commit just enlarged the window that the fs device 
>> list
>> could be empty. Even btrfs_show_devname() uses mutex_lock(), 
>> there is a
>> tiny chance to read an empty devices list between 
>> mutex_unlock() in
>> btrfs_prepare_sprout() and next mutex_lock() in 
>> btrfs_init_new_device().
>>
>> So take device_list_mutex then traverse fs_devices->seed_list 
>> to seek
>> for a seed device if no device was found in 
>> fs_devices->devices.
>> Since a normal fs always has devices in fs_device->devices and 
>> the
>> window is small enough, the mutex lock is not too heavy.
>>
>> Signed-off-by: Su Yue <l@damenly.su>
>>
>> ---
>> Changelog:
>> v2:
>>      Try to traverse fs_devices->seed_list instead of removing 
>>      the
>>        WARN_ON().
>>      Change the subject.
>>      Add description of fix.
>> ---
>>   fs/btrfs/super.c | 41 
>>   ++++++++++++++++++++++++++++++++++++++---
>>   1 file changed, 38 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
>> index d07b18b2b250..31e723eb2ccf 100644
>> --- a/fs/btrfs/super.c
>> +++ b/fs/btrfs/super.c
>> @@ -2482,7 +2482,9 @@ static int btrfs_unfreeze(struct 
>> super_block *sb)
>>   static int btrfs_show_devname(struct seq_file *m, struct 
>>   dentry *root)
>>   {
>>   	struct btrfs_fs_info *fs_info = btrfs_sb(root->d_sb);
>> +	struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
>>   	struct btrfs_device *dev, *first_dev = NULL;
>> +	struct btrfs_fs_devices *seed_devices;
>>
>>   	/*
>>   	 * Lightweight locking of the devices. We should not need
>> @@ -2492,7 +2494,7 @@ static int btrfs_show_devname(struct 
>> seq_file *m, struct dentry *root)
>>   	 * least until the rcu_read_unlock.
>>   	 */
>>   	rcu_read_lock();
>> -	list_for_each_entry_rcu(dev, 
>> &fs_info->fs_devices->devices, dev_list) {
>> +	list_for_each_entry_rcu(dev, &fs_devices->devices, 
>> dev_list) {
>>   		if (test_bit(BTRFS_DEV_STATE_MISSING, 
>>   &dev->dev_state))
>>   			continue;
>>   		if (!dev->name)
>> @@ -2503,9 +2505,42 @@ static int btrfs_show_devname(struct 
>> seq_file *m, struct dentry *root)
>>
>>   	if (first_dev)
>>   		seq_escape(m, rcu_str_deref(first_dev->name), " 
>>   \t\n\\");
>> -	else
>> -		WARN_ON(1);
>>   	rcu_read_unlock();
>> +
>> +	if (first_dev)
>> +		return 0;
>> +
>> +	/*
>> +	 * While the fs is sprouting, above fs_devices->devices 
>> could be empty
>> +	 * if the RCU read happened in the window between when
>> +	 * fs_devices->devices was spliced into 
>> seed_devices->devices in
>> +	 * btrfs_prepare_sprout() and new device is not added to
>> +	 * fs_devices->devices in btrfs_init_new_device().
>> +	 *
>> +	 * Take device_list_mutex to make sure seed_devices has 
>> been added into
>> +	 * fs_devices->seed_list then we can traverse it.
>> +	 */
>> +	mutex_lock(&fs_devices->device_list_mutex);
>
>
> possible fix:
>   As the problem is from line 2596 to 2607 (above) can we move
>      list_add_rcu(&device->dev_list, &fs_devices->devices);
>   into btrfs_prepare_sprout() so that it shall reduce the racing 
>   window.
>
Seems a feasible fix in another way. If the new device is to be 
added
by btrfs_prepare_sprout(), then almost everything (num_devices, 
total
bytes) what is updated in the mutex block starting from line 2607 
should
be moved into the mutex blcok btrfs_prepare_sprout() inside?
Then the code path looks more complex...

> And,
>   We have learned that taking device_list_mutex in this thread 
>   will end
>   up with a lockdep warning. We might need a new fs_info state 
>   to
>   indicate that FS is sprouting.
>
Not confident in design :) I guess it's up to David.

Thanks.

--
Su
> Thanks, Anand
>
>> +	list_for_each_entry(seed_devices, &fs_devices->seed_list, 
>> seed_list) {
>> +		list_for_each_entry(dev, &seed_devices->devices, 
>> dev_list) {
>> +			if (test_bit(BTRFS_DEV_STATE_MISSING, 
>> &dev->dev_state))
>> +				continue;
>> +			if (!dev->name)
>> +				continue;
>> +			if (!first_dev || dev->devid < first_dev->devid)
>> +				first_dev = dev;
>> +		}
>> +	}
>> +
>> +	if (first_dev) {
>> +		rcu_read_lock();
>> +		seq_escape(m, rcu_str_deref(first_dev->name), " 
>> \t\n\\");
>> +		rcu_read_unlock();
>> +	} else {
>> +		WARN_ON(1);
>> +	}
>> +	mutex_unlock(&fs_devices->device_list_mutex);
>> +
>>   	return 0;
>>   }
>>
>>

  reply	other threads:[~2021-08-18  7:45 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-18  4:19 [PATCH V2] btrfs: traverse seed devices if fs_devices::devices is empty in show_devname Su Yue
2021-08-18  6:48 ` Anand Jain
2021-08-18  7:30   ` Su Yue [this message]
2021-08-18 22:31   ` David Sterba
2021-08-19  6:18   ` Su Yue

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7dgjfbca.fsf@damenly.su \
    --to=l@damenly.su \
    --cc=anand.jain@oracle.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).