linux-fsdevel.vger.kernel.org archive mirror
From: Hao Xu <hao.xu@linux.dev>
To: Bernd Schubert <bernd.schubert@fastmail.fm>,
	Jiachen Zhang <zhangjiachen.jaycee@bytedance.com>,
	fuse-devel@lists.sourceforge.net
Cc: linux-fsdevel@vger.kernel.org, Wanpeng Li <wanpengli@tencent.com>,
	cgxu519@mykernel.net, miklos@szeredi.hu
Subject: Re: [External] [fuse-devel] [PATCH 3/3] fuse: write back dirty pages before direct write in direct_io_relax mode
Date: Wed, 26 Jul 2023 00:57:23 +0800
Message-ID: <45da6206-8e34-a184-5ba4-d40be252cfd2@linux.dev>
In-Reply-To: <cb8c18e6-b5cb-e891-696f-b403012eacb7@fastmail.fm>


On 7/25/23 21:00, Bernd Schubert wrote:
>
>
> On 7/25/23 12:11, Hao Xu wrote:
>> On 7/21/23 19:56, Bernd Schubert wrote:
>>> On July 21, 2023 1:27:26 PM GMT+02:00, Hao Xu <hao.xu@linux.dev> wrote:
>>>> On 7/21/23 14:35, Jiachen Zhang wrote:
>>>>>
>>>>> On 2023/6/30 17:46, Hao Xu wrote:
>>>>>> From: Hao Xu <howeyxu@tencent.com>
>>>>>>
>>>>>> In direct_io_relax mode, there can be shared mmaped files and thus dirty
>>>>>> pages in its page cache. Therefore those dirty pages should be written
>>>>>> back to backend before direct write to avoid data loss.
>>>>>>
>>>>>> Signed-off-by: Hao Xu <howeyxu@tencent.com>
>>>>>> ---
>>>>>>    fs/fuse/file.c | 7 +++++++
>>>>>>    1 file changed, 7 insertions(+)
>>>>>>
>>>>>> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
>>>>>> index 176f719f8fc8..7c9167c62bf6 100644
>>>>>> --- a/fs/fuse/file.c
>>>>>> +++ b/fs/fuse/file.c
>>>>>> @@ -1485,6 +1485,13 @@ ssize_t fuse_direct_io(struct fuse_io_priv *io, struct iov_iter *iter,
>>>>>>        if (!ia)
>>>>>>            return -ENOMEM;
>>>>>> +    if (fopen_direct_write && fc->direct_io_relax) {
>>>>>> +        res = filemap_write_and_wait_range(mapping, pos, pos + count - 1);
>>>>>> +        if (res) {
>>>>>> +            fuse_io_free(ia);
>>>>>> +            return res;
>>>>>> +        }
>>>>>> +    }
>>>>>>        if (!cuse && fuse_range_is_writeback(inode, idx_from, idx_to)) {
>>>>>>            if (!write)
>>>>>>                inode_lock(inode);
>>>>>
>>>>> Tested-by: Jiachen Zhang <zhangjiachen.jaycee@bytedance.com>
>>>>>
>>>>>
>>>>> Looks good to me.
>>>>>
>>>>> By the way, the behaviour would be a first FUSE_WRITE flushing the 
>>>>> page cache, followed by a second FUSE_WRITE doing the direct IO. 
>>>>> In the future, further optimization could be first write into the 
>>>>> page cache and then flush the dirty page to the FUSE daemon.
>>>>>
>>>>
>>>> I think this makes sense, cannot think of any issue in it for now, so
>>>> I'll do that change and send next version, super thanks, Jiachen!
>>>>
>>>> Thanks,
>>>> Hao
>>>>
>>>>>
>>>>> Thanks,
>>>>> Jiachen
>>>>
>>>
>>> On my phone, sorry if mail formatting is not optimal.
>>> Do I understand it right? You want the DIO code path to copy into pages
>>> and then flush/invalidate these pages? That would punish DIO for the
>>> unlikely case that there are also dirty pages (a discouraged IO pattern).
>>
>> Hi Bernd,
>> I don't think I follow. Why is it a punishment, and why is it a
>> discouraged IO pattern?
>> When I first saw Jiachen's idea, I thought "that sounds like it
>> disobeys direct write semantics", because usually direct write is
>> "flush dirty page -> invalidate page -> write data through to backend"
>> not "write data to page -> flush dirty page/(writeback data)".
>> The latter, in the worst case, writes data both to the page cache and
>> to the backend, while the former just writes to the backend and loads
>> the data into the page cache on a later buffered read. But it seems
>> there is no "standard way" that says we should implement direct IO
>> like that.
>
> Hi Hao,
>
> sorry for being brief last week, I was on vacation and reading/writing 
> some mails on my phone.
>
> With 'punishment' I mean memory copies to the page cache - memory 
> copies are expensive and DIO should avoid it.
>
> Right now your patch adds filemap_write_and_wait_range(), but we do
> not know if it did any work (i.e. if pages had to be flushed). So unless
> you find a way to get that information, the copy to the page cache would
> be unconditional: the overhead of a memory copy even if there are no
> dirty pages.


Ah, it looks like I understood what you meant in my last reply. Yes, as
I said in that email, there are two options:

[1] flush dirty page --> invalidate page --> write data to backend

    This is what the kernel does for direct writes right now. I call this
policy "write-through", since it doesn't care much about the cache.

[2] write data to page cache --> flush dirty page at a suitable time

    This is the "write-back" policy, used by buffered writes. In this
patch's case we would flush the pages synchronously, so it can still be
called a direct write.

Sure, in the worst case the page is clean, and then [2] costs one more
memory copy than [1]. But as I pointed out, with [2] the page is already
up to date the next time a buffered read happens, so no I/O is needed,
while with [1] the data has to be loaded from the backend into the page
cache.
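To make the trade-off between [1] and [2] concrete, here is a rough
userspace cost model. It is only a sketch under stated assumptions: the
write covers exactly one full page, a buffered read of the same page
follows, and all names (io_cost, policy1_write_then_read, and so on) are
made up for illustration and do not exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cost counters for one page-sized write plus one later
 * buffered read of the same page. Not kernel code. */
struct io_cost {
    int mem_copies;     /* copies of user data into the page cache */
    int backend_writes; /* write requests sent to the backend */
    int backend_reads;  /* read requests needed to fill the page cache */
};

/* [1] flush dirty page --> invalidate page --> write data to backend */
struct io_cost policy1_write_then_read(bool page_was_dirty)
{
    struct io_cost c = {0, 0, 0};

    if (page_was_dirty)
        c.backend_writes++; /* flush the old dirty page first */
    /* invalidate: the page is dropped from the cache, no I/O counted */
    c.backend_writes++;     /* direct write, no copy into the cache */
    c.backend_reads++;      /* later buffered read misses the cache */
    return c;
}

/* [2] write data to page cache --> flush dirty page synchronously */
struct io_cost policy2_write_then_read(bool page_was_dirty)
{
    struct io_cost c = {0, 0, 0};

    /* Assumption: a full-page write overwrites any old dirty data in
     * the cache, so the earlier state does not change the cost. */
    (void)page_was_dirty;
    c.mem_copies++;         /* copy user data into the page cache */
    c.backend_writes++;     /* synchronous flush of that page */
    /* later buffered read hits the cache: no backend read */
    return c;
}

/* Total number of requests that reach the backend. */
int total_backend_io(struct io_cost c)
{
    assert(c.backend_writes >= 0 && c.backend_reads >= 0);
    return c.backend_writes + c.backend_reads;
}
```

In this model a clean page costs [1] zero copies, one backend write and
one backend read, while [2] costs one copy, one backend write and no
backend read. If no buffered read ever follows, the extra copy in [2]
buys nothing, which is the overhead Bernd is pointing at.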


>
> With 'discouraged' I mean a mix of page cache and direct-io. Typically
> one should use only one of the two (page cache or DIO), not a mix
> of them. For example, see your patch: it flushes the page cache, but
> without a lock, so races are possible. Copying to the page cache might
> be a solution, but it has the overhead above.


As for the race, we hold the inode lock there. Am I missing anything?


>
> Thanks,
> Bernd


I now think it's good to keep the same pattern as other filesystems,
which is [1], to avoid possible performance issues in the future.
Thanks, Bernd.
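For reference, the ordering in [1] can be illustrated with a toy
userspace model of a single page. This is only a sketch of one way data
could be lost when an mmap-dirtied page is not flushed before a direct
write. The struct and the toy_* helpers are hypothetical and much simpler
than the real page cache.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_PAGE_SZ 8

/* One "page" of a file: backend copy plus an optional cached copy. */
struct toy_file {
    char backend[TOY_PAGE_SZ];
    char cache[TOY_PAGE_SZ];
    bool cached; /* a page is present in the toy page cache */
    bool dirty;  /* the cached page is newer than the backend */
};

/* A store through a shared mapping dirties the cached page. */
void toy_mmap_store(struct toy_file *f, const char *data)
{
    assert(f && data);
    memcpy(f->cache, data, TOY_PAGE_SZ);
    f->cached = true;
    f->dirty = true;
}

/* Writeback copies a dirty cached page to the backend. */
void toy_writeback(struct toy_file *f)
{
    if (f->cached && f->dirty) {
        memcpy(f->backend, f->cache, TOY_PAGE_SZ);
        f->dirty = false;
    }
}

/* Invalidate drops the cached page, but only if it is clean. */
void toy_invalidate(struct toy_file *f)
{
    if (!f->dirty)
        f->cached = false;
}

/* Direct write. With flush_first it follows [1]: flush, invalidate,
 * then write to the backend. Without it, a stale dirty page stays in
 * the cache and a later writeback overwrites the new data. */
void toy_direct_write(struct toy_file *f, const char *data, bool flush_first)
{
    if (flush_first) {
        toy_writeback(f);
        toy_invalidate(f);
    }
    memcpy(f->backend, data, TOY_PAGE_SZ);
}
```

In this model, calling toy_mmap_store(), then toy_direct_write() with
flush_first set to false, then toy_writeback() leaves the old mmap data
in the backend. With flush_first set to true the backend keeps the
direct-written data and the cache is empty afterwards.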


Hao




Thread overview: 28+ messages
2023-06-30  9:45 [PATCH v3 0/3] fuse: add a new fuse init flag to relax restrictions in no cache mode Hao Xu
2023-06-30  9:46 ` [PATCH 1/3] fuse: invalidate page cache pages before direct write Hao Xu
2023-06-30 10:32   ` Bernd Schubert
2023-07-21  3:34   ` [fuse-devel] " Jiachen Zhang
2023-06-30  9:46 ` [PATCH 2/3] fuse: add a new fuse init flag to relax restrictions in no cache mode Hao Xu
2023-06-30 10:35   ` Bernd Schubert
2023-06-30  9:46 ` [PATCH 3/3] fuse: write back dirty pages before direct write in direct_io_relax mode Hao Xu
2023-06-30 10:40   ` Bernd Schubert
2023-07-21  6:35   ` [External] [fuse-devel] " Jiachen Zhang
2023-07-21 11:27     ` Hao Xu
2023-07-21 11:56       ` Bernd Schubert
2023-07-25 10:11         ` Hao Xu
2023-07-25 13:00           ` Bernd Schubert
2023-07-25 16:57             ` Hao Xu [this message]
2023-07-25 17:59               ` Bernd Schubert
2023-07-27  9:42                 ` Hao Xu
2023-07-26 11:07               ` Jiachen Zhang
2023-07-26 13:15                 ` Bernd Schubert
2023-07-27  2:24                   ` Jiachen Zhang
2023-07-27 10:31                 ` Hao Xu
2023-07-28  2:57                   ` Jiachen Zhang
2023-07-27 10:48                 ` Hao Xu
2023-07-05 10:23 ` [RFC] [PATCH] fuse: DIO writes always use the same code path Bernd Schubert
2023-07-06 14:43   ` Christoph Hellwig
2023-07-07 13:36     ` Bernd Schubert
2023-07-17  8:03   ` Hao Xu
2023-07-17 21:17     ` Bernd Schubert
2023-07-20  7:32 ` [PATCH v3 0/3] fuse: add a new fuse init flag to relax restrictions in no cache mode Hao Xu
