From: Qu Wenruo <quwenruo@cn.fujitsu.com>
To: Miao Xie <miaoxie@huawei.com>, Chris Mason <clm@fb.com>,
<dsterba@suse.cz>
Cc: <linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock.
Date: Wed, 21 Jan 2015 15:47:54 +0800 [thread overview]
Message-ID: <54BF59AA.9070408@cn.fujitsu.com> (raw)
In-Reply-To: <54BF4F62.3010805@huawei.com>
-------- Original Message --------
Subject: Re: [PATCH] btrfs: Don't call btrfs_start_transaction() on
frozen fs to avoid deadlock.
From: Miao Xie <miaoxie@huawei.com>
To: Qu Wenruo <quwenruo@cn.fujitsu.com>, Chris Mason <clm@fb.com>
Date: 2015年01月21日 15:04
> On Wed, 21 Jan 2015 11:53:34 +0800, Qu Wenruo wrote:
>>>> [snipped]
>>>> This will cause another problem, nobody can ensure there will be next
>>>> transaction and the change may
>>>> never to written into disk.
>>> First, the pending changes is mount option, that is in-memory data.
>>> Second, the same problem would happen after you freeze fs.
>> Pending changes are *not* only mount options. Feature change and label change
>> are also pending changes if using sysfs.
> My miss, I don't notice feature and label change by sysfs.
>
> But the implementation of feature and label change by sysfs is wrong, we can
> not change them without write permission.
>
>> Normal ioctl label changing is not affected.
>>
>> For freeze, it's not the same problem since the fs will be unfreeze sooner or
>> later and transaction will be initiated.
> You can not assume the operations of the users, they might freeze the fs and
> then shutdown the machine.
>
>>>> For example, if we change the features/label through sysfs, and then umount
>>>> the fs,
>>> It is different from pending change.
>> No, now features/label changing using sysfs both use pending changes to do the
>> commit.
>> See BTRFS_PENDING_COMMIT bit.
>> So freeze -> change features/label -> sync will still cause the deadlock in the
>> same way,
>> and you can try it yourself.
> As I said above, the implementation of sysfs feature and label change is wrong,
> it is better to separate them from the pending mount option change, make the
> sysfs feature and label change be done in the context of transaction after
> getting the write permission. If so, we needn't do anything special when sync
> the fs.
>
> In short, changing the sysfs feature and label change implementation and
> removing the unnecessary btrfs_start_transaction in sync_fs can fix the
> deadlock.
Your method will only fix the deadlock, but will introduce the risk like
pending inode_cache will never
be written to disk as I already explained. (If still using the
fs_info->pending_changes mechanism)
To ensure pending changes written to disk sync_fs() should start a
transaction if needed,
or there will be chance that no transaction can handle it.
But I don't see the necessity to pending current work(inode_cache,
feature/label changes) to next transaction.
To David:
I'm a little curious about why inode_cache needs to be delayed to next
transaction.
In btrfs_remount() we have s_umount mutex, and we synced the whole
filesystem already,
so there should be no running transaction and we can just set any mount
option into fs_info.
Or even in worst case, there is a racing window, we can still start a
transaction and do the commit,
a little overhead in such minor case won't impact the overall performance.
For sysfs change, I prefer attach or start transaction method, and for
mount option change, and
such sysfs tuning is also minor case for a filesystem.
What do you think about reverting the whole patchset and rework the
sysfs interface?
Thanks,
Qu
>
> Thanks
> Miao
>
>> Thanks,
>> Qu
>>
>>> If you want to change features/label, you should get write permission and make
>>> sure the fs is not be freezed because those are on-disk data. So the problem
>>> doesn't exist, or there is a bug.
>>>
>>> Thanks
>>> Miao
>>>
>>>> since there is no write, there is no running transaction and if we don't start a
>>>> new transaction,
>>>> it won't be flushed to disk.
>>>>
>>>> Thanks,
>>>> Qu
>>>>> the reason is:
>>>>> - Make the behavior of the fs be consistent(both freezed fs and unfreezed fs)
>>>>> - Data on the disk is right and integrated
>>>>>
>>>>>
>>>>> Thanks
>>>>> Miao
>>>> .
>>>>
>> .
>>
next prev parent reply other threads:[~2015-01-21 7:47 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-01-19 7:42 [PATCH] btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock Qu Wenruo
2015-01-19 14:06 ` David Sterba
2015-01-20 2:51 ` Qu Wenruo
2015-01-20 2:53 ` Qu Wenruo
2015-01-20 3:06 ` Miao Xie
2015-01-20 3:17 ` Qu Wenruo
2015-01-20 8:16 ` Miao Xie
2015-01-20 0:19 ` Miao Xie
2015-01-20 0:26 ` Qu Wenruo
2015-01-20 17:13 ` David Sterba
2015-01-21 0:58 ` Qu Wenruo
2015-01-21 1:05 ` Chris Mason
2015-01-21 1:09 ` Qu Wenruo
2015-01-21 1:10 ` Chris Mason
2015-01-21 3:10 ` Miao Xie
2015-01-21 3:15 ` Qu Wenruo
2015-01-21 3:26 ` Miao Xie
2015-01-21 3:53 ` Qu Wenruo
2015-01-21 7:04 ` Miao Xie
2015-01-21 7:47 ` Qu Wenruo [this message]
2015-01-21 8:46 ` Miao Xie
2015-01-23 17:39 ` David Sterba
2015-01-23 18:21 ` Chris Mason
2015-01-23 16:59 ` David Sterba
2015-01-26 0:31 ` Miao Xie
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=54BF59AA.9070408@cn.fujitsu.com \
--to=quwenruo@cn.fujitsu.com \
--cc=clm@fb.com \
--cc=dsterba@suse.cz \
--cc=linux-btrfs@vger.kernel.org \
--cc=miaoxie@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).