From: Hao Xu <hao.xu@linux.dev>
To: Jiachen Zhang <zhangjiachen.jaycee@bytedance.com>,
Bernd Schubert <bernd.schubert@fastmail.fm>,
fuse-devel@lists.sourceforge.net
Cc: linux-fsdevel@vger.kernel.org, Wanpeng Li <wanpengli@tencent.com>,
cgxu519@mykernel.net, miklos@szeredi.hu
Subject: Re: [fuse-devel] [PATCH 3/3] fuse: write back dirty pages before direct write in direct_io_relax mode
Date: Thu, 27 Jul 2023 18:48:40 +0800
Message-ID: <fcc8c890-e8c4-266b-74d0-437ded0eef5d@linux.dev>
In-Reply-To: <6856f435-a589-e044-881f-3a80aefa1174@bytedance.com>
On 7/26/23 19:07, Jiachen Zhang wrote:
>
>
> On 2023/7/26 00:57, Hao Xu wrote:
>>
>> On 7/25/23 21:00, Bernd Schubert wrote:
>>>
>>>
>>> On 7/25/23 12:11, Hao Xu wrote:
>>>> On 7/21/23 19:56, Bernd Schubert wrote:
>>>>> On July 21, 2023 1:27:26 PM GMT+02:00, Hao Xu <hao.xu@linux.dev>
>>>>> wrote:
>>>>>> On 7/21/23 14:35, Jiachen Zhang wrote:
>>>>>>>
>>>>>>> On 2023/6/30 17:46, Hao Xu wrote:
>>>>>>>> From: Hao Xu <howeyxu@tencent.com>
>>>>>>>>
>>>>>>>> In direct_io_relax mode, there can be shared mmapped files and
>>>>>>>> thus dirty pages in their page cache. Those dirty pages should
>>>>>>>> therefore be written back to the backend before a direct write
>>>>>>>> to avoid data loss.
>>>>>>>>
>>>>>>>> Signed-off-by: Hao Xu <howeyxu@tencent.com>
>>>>>>>> ---
>>>>>>>> fs/fuse/file.c | 7 +++++++
>>>>>>>> 1 file changed, 7 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
>>>>>>>> index 176f719f8fc8..7c9167c62bf6 100644
>>>>>>>> --- a/fs/fuse/file.c
>>>>>>>> +++ b/fs/fuse/file.c
>>>>>>>> @@ -1485,6 +1485,13 @@ ssize_t fuse_direct_io(struct fuse_io_priv *io, struct iov_iter *iter,
>>>>>>>>      if (!ia)
>>>>>>>>          return -ENOMEM;
>>>>>>>> +    if (fopen_direct_write && fc->direct_io_relax) {
>
>
> Hi,
>
> It seems this patchset flushes and invalidates the page cache before
> doing the direct-io writes, which avoids data loss caused by flushing
> stale data to the FUSE daemon. I tested it and it works well.
>
> But there is another side of the same problem we should consider.
> If a file is modified through its page cache (shared mmapped regions,
> or non-FOPEN_DIRECT_IO files), subsequent direct-io reads may
> bypass the new data in the dirty page cache and read stale data from
> the FUSE daemon. I think this is also a problem that should be fixed.
> It could be fixed by unconditionally calling
> filemap_write_and_wait_range() before a direct-io read.
Yes, I think this is true. I'll fix it in v2. Thanks, Jiachen.
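
To make the read-side problem concrete, here is a small userspace toy
model. It is not kernel code and not the v2 patch: the toy_* names, the
one-byte "pages" and the struct layout are all made up for illustration.
toy_write_and_wait_range() only stands in for
filemap_write_and_wait_range() and always returns 0, whereas the real
helper can return an error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGES 4                 /* hypothetical tiny file: 4 one-byte "pages" */

struct toy_file {
    char backend[TOY_PAGES];        /* data as the FUSE daemon would see it */
    char cache[TOY_PAGES];          /* page cache contents */
    bool cached[TOY_PAGES];         /* page is present in the cache */
    bool dirty[TOY_PAGES];          /* page modified but not written back */
};

/* Start with an empty cache and every backend byte set to `fill`. */
static void toy_init(struct toy_file *f, char fill)
{
    memset(f, 0, sizeof(*f));
    memset(f->backend, fill, sizeof(f->backend));
}

/* Stand-in for a store through a shared mmap: only the cache changes. */
static void toy_mmap_store(struct toy_file *f, size_t idx, char val)
{
    assert(idx < TOY_PAGES);
    f->cache[idx] = val;
    f->cached[idx] = true;
    f->dirty[idx] = true;
}

/* Stand-in for filemap_write_and_wait_range(): write back dirty pages
 * in the inclusive range [first, last]. Always returns 0 in this toy. */
static int toy_write_and_wait_range(struct toy_file *f, size_t first, size_t last)
{
    assert(first <= last && last < TOY_PAGES);
    for (size_t i = first; i <= last; i++) {
        if (f->dirty[i]) {
            f->backend[i] = f->cache[i];
            f->dirty[i] = false;
        }
    }
    return 0;
}

/* Direct read: always served from the backend, never from the cache.
 * With flush_first == false it can return stale data. */
static char toy_direct_read(struct toy_file *f, size_t idx, bool flush_first)
{
    assert(idx < TOY_PAGES);
    if (flush_first)
        toy_write_and_wait_range(f, idx, idx);
    return f->backend[idx];
}
```

In this model, after toy_init(&f, 'a') and toy_mmap_store(&f, 1, 'b'), a
direct read of page 1 without the flush still sees the old backend byte,
while a read with flush_first set sees the byte stored through the mapping.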
>
>
>>>>>>>> +        res = filemap_write_and_wait_range(mapping, pos, pos + count - 1);
>>>>>>>> +        if (res) {
>>>>>>>> +            fuse_io_free(ia);
>>>>>>>> +            return res;
>>>>>>>> +        }
>>>>>>>> +    }
>>>>>>>>      if (!cuse && fuse_range_is_writeback(inode, idx_from, idx_to)) {
>>>>>>>>          if (!write)
>>>>>>>>              inode_lock(inode);
>>>>>>>
>>>>>>> Tested-by: Jiachen Zhang <zhangjiachen.jaycee@bytedance.com>
>>>>>>>
>>>>>>>
>>>>>>> Looks good to me.
>>>>>>>
>>>>>>> By the way, the behaviour would be a first FUSE_WRITE flushing
>>>>>>> the page cache, followed by a second FUSE_WRITE doing the direct
>>>>>>> IO. In the future, a further optimization could be to first
>>>>>>> write into the page cache and then flush the dirty page to the
>>>>>>> FUSE daemon.
>>>>>>>
>>>>>>
>>>>>> I think this makes sense, and I cannot think of any issue with
>>>>>> it for now, so I'll make that change and send the next version.
>>>>>> Many thanks, Jiachen!
>>>>>>
>>>>>> Thanks,
>>>>>> Hao
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Jiachen
>>>>>>
>>>>>
>>>>> On my phone, sorry if mail formatting is not optimal.
>>>>> Do I understand it right? You want the DIO code path to copy into
>>>>> pages and then flush/invalidate these pages? That would punish
>>>>> DIO for the unlikely case that there are also dirty pages (a
>>>>> discouraged IO pattern).
>>>>
>>>> Hi Bernd,
>>>> I don't think I understand what you said: why is it a punishment,
>>>> and why is it a discouraged IO pattern?
>>>> When I first saw Jiachen's idea, I thought "that sounds like it
>>>> disobeys direct write semantics", because direct write is usually
>>>> "flush dirty page -> invalidate page -> write data through to backend",
>>>> not "write data to page -> flush dirty page (write back data)".
>>>> In the worst case the latter writes data to both the page cache and
>>>> the backend, while the former writes only to the backend and loads
>>>> the data into the page cache on a later buffered read. But there
>>>> seems to be no "standard way" that says direct IO must be
>>>> implemented like that.
>>>
>>> Hi Hao,
>>>
>>> sorry for being brief last week, I was on vacation and
>>> reading/writing some mails on my phone.
>>>
>>> By 'punishment' I mean memory copies to the page cache: memory
>>> copies are expensive and DIO should avoid them.
>>>
>>> Right now your patch adds filemap_write_and_wait_range(), but we do
>>> not know whether it did any work (i.e. whether pages had to be
>>> flushed). So unless you find a way to get that information, the copy
>>> to the page cache would be unconditional: the overhead of a memory
>>> copy even if there are no dirty pages.
>>
>>
>> Ah, it looks like I understood what you meant in my last reply. Yes,
>> just like what I said in that email:
>>
>> [1] flush dirty page --> invalidate page --> write data to backend
>>
>> This is what the kernel does for direct write right now. I call
>> this policy "write-through", since it doesn't care much about the cache.
>>
>> [2] write data to page cache --> flush dirty page at a suitable time
>>
>> This is the "write-back" policy, used by buffered write. In this
>> patch's case we flush pages synchronously, so it can still be
>> called direct write.
>>
>> Sure, in the worst case the page is clean, and then [2] costs one
>> extra memory copy compared with [1]. But as I pointed out, with [2]
>> the page is up to date the next time a buffered read happens, so no
>> I/O is needed, while with [1] the data has to be loaded from the
>> backend into the page cache.
>>
>
> Write-through, write-back and direct-io are also explained in the
> kernel documentation [*], in which write-through and write-back are
> cache modes. According to that document, pattern [2] is similar to
> the FUSE write-back mode, but pattern [1] is different from the
> FUSE write-through mode. The FUSE write-through mode follows 'write
> data to page cache --> flush dirty page synchronously' (let us call it
> pattern [3]), which keeps the clean cache in-core after flushing.
>
> To improve performance while keeping the direct-io semantics, my
> thought was that in the future we could fall back to pattern [3]
> if the target page is in-core, and otherwise keep the original
> direct-io pattern without reading whole pages from the FUSE daemon.
>
> [*] https://www.kernel.org/doc/Documentation/filesystems/fuse-io.txt
>
> Thanks,
> Jiachen
>
>>
>>>
>>> By 'discouraged' I mean mixing page cache and direct-io. Typically
>>> one should use only one of the two (page cache or DIO), not a mix
>>> of them. For example, your patch flushes the page cache, but
>>> without a lock, so races are possible. Copying to the page cache
>>> might be a solution, but it has the overhead described above.
>>
>>
>> As for the race, we hold the inode lock there. Am I missing anything?
>>
>>
>>>
>>> Thanks,
>>> Bernd
>>
>>
>> I now think it's better to keep the same pattern as other
>> filesystems, which is [1], to avoid possible performance issues in
>> the future. Thanks, Bernd.
>>
>>
>> Hao
>>
>>
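
As a rough illustration of the trade-off between patterns [1] and [3]
discussed above, the following userspace toy counts the work each pattern
does for a single one-byte "page". Everything here (struct toy_page,
struct toy_stats, the function names, the counters) is hypothetical and
only models the bookkeeping. It does not reflect how fs/fuse/file.c is
actually structured.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_page {
    char backend;        /* byte held by the (imaginary) FUSE daemon */
    char cache;          /* byte held in the page cache */
    bool cached;         /* page is in-core */
    bool dirty;          /* page has unwritten changes */
};

struct toy_stats {
    int cache_copies;    /* copies of user data into the page cache */
    int backend_writes;  /* writes sent to the backend */
    int backend_reads;   /* reads served by the backend */
};

/* Pattern [1]: flush dirty page -> invalidate page -> write to backend. */
static void toy_write_pattern1(struct toy_page *p, struct toy_stats *s, char val)
{
    assert(p != NULL && s != NULL);
    if (p->cached && p->dirty) {
        p->backend = p->cache;      /* first write: flush old dirty data */
        p->dirty = false;
        s->backend_writes++;
    }
    p->cached = false;              /* invalidate */
    p->backend = val;               /* the direct write itself */
    s->backend_writes++;
}

/* Pattern [3]: write to page cache -> flush dirty page synchronously. */
static void toy_write_pattern3(struct toy_page *p, struct toy_stats *s, char val)
{
    assert(p != NULL && s != NULL);
    p->cache = val;                 /* extra memory copy into the cache */
    p->cached = true;
    p->dirty = true;
    s->cache_copies++;
    p->backend = p->cache;          /* synchronous flush */
    p->dirty = false;
    s->backend_writes++;
}

/* Buffered read: served from the cache, loading from the backend on a miss. */
static char toy_buffered_read(struct toy_page *p, struct toy_stats *s)
{
    assert(p != NULL && s != NULL);
    if (!p->cached) {
        p->cache = p->backend;
        p->cached = true;
        s->backend_reads++;
    }
    return p->cache;
}
```

For a clean, uncached page followed by one buffered read, the toy reports
one backend write plus one backend read for pattern [1], and one cache
copy plus one backend write for pattern [3]. For a page that is already
dirty, pattern [1] issues two backend writes, matching the "first
FUSE_WRITE ... second FUSE_WRITE" behaviour Jiachen described.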