From: "Li, Hao" <lihao2018.fnst@cn.fujitsu.com>
To: Ira Weiny <ira.weiny@intel.com>, Yasunori Goto <y-goto@fujitsu.com>
Cc: Dave Chinner <david@fromorbit.com>,
"linux-xfs@vger.kernel.org" <linux-xfs@vger.kernel.org>,
"linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>
Subject: Re: Can we change the S_DAX flag immediately on XFS without dropping caches?
Date: Tue, 18 Aug 2020 17:16:25 +0800 [thread overview]
Message-ID: <ba98b77e-a806-048a-a0dc-ca585677daf3@cn.fujitsu.com> (raw)
In-Reply-To: <20200807170858.GU1573827@iweiny-DESK2.sc.intel.com>
On 2020/8/8 1:09, Ira Weiny wrote:
> On Fri, Jul 31, 2020 at 06:59:32PM +0900, Yasunori Goto wrote:
>> On 2020/07/30 8:21, Dave Chinner wrote:
>>> On Wed, Jul 29, 2020 at 11:23:21AM +0900, Yasunori Goto wrote:
>>>> Hi,
>>>>
>>>> On 2020/07/28 11:20, Dave Chinner wrote:
>>>>> On Tue, Jul 28, 2020 at 02:00:08AM +0000, Li, Hao wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I have noticed that we have to drop caches to make the changing of S_DAX
>>>>>> flag take effect after using chattr +x to turn on DAX for a existing
>>>>>> regular file. The related function is xfs_diflags_to_iflags, whose
>>>>>> second parameter determines whether we should set S_DAX immediately.
>>>>> Yup, as documented in Documentation/filesystems/dax.txt. Specifically:
>>>>>
>>>>> 6. When changing the S_DAX policy via toggling the persistent FS_XFLAG_DAX flag,
>>>>> the change in behaviour for existing regular files may not occur
>>>>> immediately. If the change must take effect immediately, the administrator
>>>>> needs to:
>>>>>
>>>>> a) stop the application so there are no active references to the data set
>>>>> the policy change will affect
>>>>>
>>>>> b) evict the data set from kernel caches so it will be re-instantiated when
>>>>> the application is restarted. This can be achieved by:
>>>>>
>>>>> i. drop-caches
>>>>> ii. a filesystem unmount and mount cycle
>>>>> iii. a system reboot
>>>>>
>>>>>> I can't figure out why we do this. Is this because the page caches in
>>>>>> address_space->i_pages are hard to deal with?
>>>>> Because of unfixable races in the page fault path that prevent
>>>>> changing the caching behaviour of the inode while concurrent access
>>>>> is possible. The only way to guarantee races can't happen is to
>>>>> cycle the inode out of cache.
>>>> I understand why the drop_cache operation is necessary. Thanks.
>>>>
>>>> BTW, even normal user becomes to able to change DAX flag for an inode,
>>>> drop_cache operation still requires root permission, right?
>>> Step back for a minute and explain why you want to be able to change
>>> the DAX mode of a file -as a user-.
>>
>> For example, there are 2 containers executed in a system, which is named as
>> container A and container B, and these host gives FS-DAX files to each
>> containers.
>> If the user of container A would like to change DAX-off for tuning, then he
>> will stop his application
>> and change DAX flag, but the flag may not be changed.
>>
>> Then he will "need" to ask host operator to execute drop_cache, and the
>> operator did it.
>> As a result, not only container A, but also container B get the impact of
>> drop_cache.
>>
>> Especially, if this is multi tenant container system, then I think this is
>> not acceptable.
>>
>> Probably, there are 2 problems I think.
>> 1) drop_cache requires root permission.
>> 2) drop_cache has too wide effect.
>>
>>>> So, if kernel have a feature for normal user can operate drop cache for "a
>>>> inode" with
>>>> its permission, I think it improve the above limitation, and
>>>> we would like to try to implement it recently.
>>> No, drop_caches is not going to be made available to users. That
>>> makes it s trivial system wide DoS vector.
>> The current drop_cache feature tries to drop ALL of cache (page cache and/or
>> slab cache).
>> Then, I agree that normal user should not drop all of them.
>>
>> But my intention was that drop cache of ONE file which is changed dax flag,
>> (and if possible, drop only the inode cache.)
>> Do you mean it will be still cause of weakness against DoS attack?
>> If so, I should give up to solve problem 1) at least.
> FWIW changing the on disk flag automatically flags the inode to be dropped as
> soon as all references are done.
>
> See:
>
> 2c567af418e3 fs: Introduce DCACHE_DONTCACHE
> dae2f8ed7992 fs: Lift XFS_IDONTCACHE to the VFS layer
Hi,
I find that DCACHE_DONTCACHE doesn't work well.
If DCACHE_REFERENCED is not set, dput() can drop the inode successfully as
soon as all references are gone. By contrast, if DCACHE_REFERENCED is set,
dput() only decreases the reference count of dentry and don't evict inode.
Example 1:
echo abcdefg > test.txt
echo 3 > /proc/sys/vm/drop_caches
xfs_io -c 'chattr +x' test.txt
In this example, we can say the DAX policy takes effects immediately as we
don't need to drop cache after chattr.
In this circumstance, DCACHE_REFERENCED is not set, and DCACHE_DONTCACHE
can drop the inode as expected.
Example 2:
echo abcdefg > test.txt
xfs_io -c 'chattr +x' test.txt
In this example, we must drop caches after chattr to make DAX policy
take effects. This is because DCACHE_REFERENCED is set, and fast_dput() will
return true, and then retain_dentry() have no chance to check DCACHE_DONTCACHE.
If this is the desired behavior, I can't understand the necessity
of DCACHE_DONTCACHE.
Regards,
Hao Li
>
> But from a users perspective you just don't know when that will happen. The
> system just can't guarantee it. The best the user can do is stop taking
> references to the file and close all references, and periodically check the
> state. But this will take a reference so... Kind of a catch-22 here... :-(
>
> Ira
>
>>
>> Thanks,
>>
>>> Cheers,
>>>
>>> Dave.
>> --
>> Yasunori Goto
>>
>
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org
prev parent reply other threads:[~2020-08-18 9:16 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-07-28 2:00 Can we change the S_DAX flag immediately on XFS without dropping caches? Li, Hao
2020-07-28 2:20 ` Dave Chinner
2020-07-29 2:23 ` Yasunori Goto
2020-07-29 16:10 ` Ira Weiny
2020-07-31 9:12 ` Li, Hao
2020-08-05 8:10 ` Li, Hao
2020-08-05 15:44 ` Darrick J. Wong
2020-08-07 16:57 ` Ira Weiny
2020-07-31 10:04 ` Yasunori Goto
2020-07-29 23:21 ` Dave Chinner
2020-07-31 9:15 ` Li, Hao
2020-07-31 9:59 ` Yasunori Goto
2020-08-07 17:09 ` Ira Weiny
2020-08-18 9:16 ` Li, Hao [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ba98b77e-a806-048a-a0dc-ca585677daf3@cn.fujitsu.com \
--to=lihao2018.fnst@cn.fujitsu.com \
--cc=david@fromorbit.com \
--cc=ira.weiny@intel.com \
--cc=linux-nvdimm@lists.01.org \
--cc=linux-xfs@vger.kernel.org \
--cc=y-goto@fujitsu.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox