From: Hao Xu <hao.xu@linux.dev>
To: Bernd Schubert <bernd.schubert@fastmail.fm>,
Jiachen Zhang <zhangjiachen.jaycee@bytedance.com>,
fuse-devel@lists.sourceforge.net
Cc: linux-fsdevel@vger.kernel.org, Wanpeng Li <wanpengli@tencent.com>,
cgxu519@mykernel.net, miklos@szeredi.hu
Subject: Re: [External] [fuse-devel] [PATCH 3/3] fuse: write back dirty pages before direct write in direct_io_relax mode
Date: Wed, 26 Jul 2023 00:57:23 +0800
Message-ID: <45da6206-8e34-a184-5ba4-d40be252cfd2@linux.dev>
In-Reply-To: <cb8c18e6-b5cb-e891-696f-b403012eacb7@fastmail.fm>
On 7/25/23 21:00, Bernd Schubert wrote:
>
>
> On 7/25/23 12:11, Hao Xu wrote:
>> On 7/21/23 19:56, Bernd Schubert wrote:
>>> On July 21, 2023 1:27:26 PM GMT+02:00, Hao Xu <hao.xu@linux.dev> wrote:
>>>> On 7/21/23 14:35, Jiachen Zhang wrote:
>>>>>
>>>>> On 2023/6/30 17:46, Hao Xu wrote:
>>>>>> From: Hao Xu <howeyxu@tencent.com>
>>>>>>
>>>>>> In direct_io_relax mode, there can be shared mmaped files and thus dirty
>>>>>> pages in its page cache. Therefore those dirty pages should be written
>>>>>> back to backend before direct write to avoid data loss.
>>>>>>
>>>>>> Signed-off-by: Hao Xu <howeyxu@tencent.com>
>>>>>> ---
>>>>>> fs/fuse/file.c | 7 +++++++
>>>>>> 1 file changed, 7 insertions(+)
>>>>>>
>>>>>> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
>>>>>> index 176f719f8fc8..7c9167c62bf6 100644
>>>>>> --- a/fs/fuse/file.c
>>>>>> +++ b/fs/fuse/file.c
>>>>>> @@ -1485,6 +1485,13 @@ ssize_t fuse_direct_io(struct fuse_io_priv *io, struct iov_iter *iter,
>>>>>>      if (!ia)
>>>>>>          return -ENOMEM;
>>>>>> +    if (fopen_direct_write && fc->direct_io_relax) {
>>>>>> +        res = filemap_write_and_wait_range(mapping, pos, pos + count - 1);
>>>>>> +        if (res) {
>>>>>> +            fuse_io_free(ia);
>>>>>> +            return res;
>>>>>> +        }
>>>>>> +    }
>>>>>>      if (!cuse && fuse_range_is_writeback(inode, idx_from, idx_to)) {
>>>>>>          if (!write)
>>>>>>              inode_lock(inode);
>>>>>
>>>>> Tested-by: Jiachen Zhang <zhangjiachen.jaycee@bytedance.com>
>>>>>
>>>>>
>>>>> Looks good to me.
>>>>>
>>>>> By the way, the behaviour would be a first FUSE_WRITE flushing the
>>>>> page cache, followed by a second FUSE_WRITE doing the direct IO.
>>>>> In the future, further optimization could be first write into the
>>>>> page cache and then flush the dirty page to the FUSE daemon.
>>>>>
>>>>
>>>> I think this makes sense, cannot think of any issue in it for now, so
>>>> I'll do that change and send next version, super thanks, Jiachen!
>>>>
>>>> Thanks,
>>>> Hao
>>>>
>>>>>
>>>>> Thanks,
>>>>> Jiachen
>>>>
>>>
>>> On my phone, sorry if mail formatting is not optimal.
>>> Do I understand it right? You want the DIO code path to copy into pages
>>> and then flush/invalidate these pages? That would punish DIO for
>>> the unlikely case that there are also dirty pages (a discouraged IO pattern).
>>
>> Hi Bernd,
>> I don't think I get what you said. Why is it a punishment, and why is
>> it a discouraged IO pattern?
>> When I first saw Jiachen's idea, I thought "that sounds like it
>> disobeys direct write semantics", because usually a direct write is
>> "flush dirty page -> invalidate page -> write data through to backend",
>> not "write data to page -> flush dirty page/(writeback data)".
>> The latter in the worst case writes data both to the page cache and to
>> the backend, while the former just writes to the backend and loads the
>> data into the page cache on a buffered read. But it seems there is no
>> "standard way" that says we should implement direct IO like that.
>
> Hi Hao,
>
> sorry for being brief last week, I was on vacation and reading/writing
> some mails on my phone.
>
> With 'punishment' I mean memory copies to the page cache - memory
> copies are expensive and DIO should avoid it.
>
> Right now your patch adds filemap_write_and_wait_range(), but we do
> not know whether it did any work (i.e. whether pages had to be flushed).
> So unless you find a way to get that information, the copy to the page
> cache would be unconditional: the overhead of a memory copy even if
> there are no dirty pages.
Ah, it looks like I did understand what you meant in my last reply. Yes,
as I said there:

[1] flush dirty pages --> invalidate pages --> write data to backend
    This is what the kernel does for direct writes right now. I call this
    policy "write-through", since it doesn't care much about the cache.
[2] write data to page cache --> flush dirty pages at a suitable time
    This is the "write-back" policy, used by buffered writes. Here, in this
    patch's case, we flush the pages synchronously, so it can still be
    called a direct write.

Admittedly, in the worst case the page is clean, and then [2] costs one more
memory copy than [1]. But as I pointed out, with [2] the page is already up
to date the next time a buffered read happens, so no I/O is needed, while
with [1] the data has to be loaded from the backend into the page cache.
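
To make the trade-off between [1] and [2] concrete, here is a toy
userspace model of a single page. It is only a sketch and not kernel
code: all names (page_state, cost, simulate, ...) are made up for
illustration, the "backend" is just a buffer, and it assumes whole-page
writes, so [2] never has to read the page in before overwriting it.

```c
#include <stdbool.h>
#include <string.h>

#define PAGE_SZ 16 /* tiny hypothetical page size, only for this model */

struct page_state {
    bool cached;            /* page is present in the (modelled) page cache */
    bool dirty;             /* cached copy is newer than the backend copy */
    char cache[PAGE_SZ];
    char backend[PAGE_SZ];
};

struct cost {
    int mem_copies;         /* extra copies of user data into the page cache */
    int backend_writes;     /* write requests sent to the backend */
    int backend_reads;      /* read requests sent to the backend */
    bool read_ok;           /* buffered read returned the newly written data */
};

enum policy { POLICY_WRITE_THROUGH, POLICY_VIA_CACHE };

/* [1] flush dirty page -> invalidate page -> write data to backend */
static void write_through(struct page_state *p, const char *buf, struct cost *c)
{
    if (p->cached && p->dirty) {
        memcpy(p->backend, p->cache, PAGE_SZ); /* models the flush I/O */
        c->backend_writes++;
        p->dirty = false;
    }
    p->cached = false;                         /* invalidate */
    memcpy(p->backend, buf, PAGE_SZ);          /* models the direct write I/O */
    c->backend_writes++;
}

/* [2] write data to page cache -> flush the page synchronously */
static void write_via_cache(struct page_state *p, const char *buf, struct cost *c)
{
    memcpy(p->cache, buf, PAGE_SZ);            /* the extra memory copy */
    c->mem_copies++;
    p->cached = true;
    p->dirty = true;
    memcpy(p->backend, p->cache, PAGE_SZ);     /* models the flush I/O */
    c->backend_writes++;
    p->dirty = false;
}

/* Buffered read: served from the cache, filled from the backend on a miss. */
static void buffered_read(struct page_state *p, char *out, struct cost *c)
{
    if (!p->cached) {
        memcpy(p->cache, p->backend, PAGE_SZ); /* models the read I/O */
        c->backend_reads++;
        p->cached = true;
    }
    memcpy(out, p->cache, PAGE_SZ);
}

/* One write followed by one buffered read, starting clean or dirty. */
struct cost simulate(enum policy pol, bool start_dirty)
{
    struct page_state p = {0};
    struct cost c = {0};
    char buf[PAGE_SZ], out[PAGE_SZ];

    memset(buf, 'N', PAGE_SZ);
    if (start_dirty) {      /* e.g. an earlier store through a shared mmap */
        memset(p.cache, 'O', PAGE_SZ);
        p.cached = true;
        p.dirty = true;
    }
    if (pol == POLICY_WRITE_THROUGH)
        write_through(&p, buf, &c);
    else
        write_via_cache(&p, buf, &c);
    buffered_read(&p, out, &c);
    c.read_ok = memcmp(out, buf, PAGE_SZ) == 0;
    return c;
}
```

In this model, simulate(POLICY_WRITE_THROUGH, false) costs one backend
write and one backend read with no extra copy, while
simulate(POLICY_VIA_CACHE, false) costs one copy and one backend write
with no read. Starting from a dirty page, [1] issues two backend writes
(the flush, then the direct write), which matches the two FUSE_WRITEs
Jiachen described. Real costs of course depend on request sizes and on
the daemon, which this sketch does not capture.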
>
> With 'discouraged' I mean mix of page cache and direct-io. Typically
> one should only do either of both (page cache or DIO), but not a mix
> of them. For example see your patch, it flushes the page cache, but
> without a lock - races are possible. Copying to the page cache might
> be a solution, but it has the overhead above.
As for the race, we hold the inode lock there; am I missing anything?
>
> Thanks,
> Bernd
I now think it's better to keep the same pattern as other filesystems,
which is [1], to avoid possible performance issues in the future.
Thanks, Bernd.
Hao