linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Hao Xu <hao.xu@linux.dev>
To: Jiachen Zhang <zhangjiachen.jaycee@bytedance.com>,
	Bernd Schubert <bernd.schubert@fastmail.fm>,
	fuse-devel@lists.sourceforge.net
Cc: linux-fsdevel@vger.kernel.org, Wanpeng Li <wanpengli@tencent.com>,
	cgxu519@mykernel.net, miklos@szeredi.hu
Subject: Re: [fuse-devel] [PATCH 3/3] fuse: write back dirty pages before direct write in direct_io_relax mode
Date: Thu, 27 Jul 2023 18:48:40 +0800	[thread overview]
Message-ID: <fcc8c890-e8c4-266b-74d0-437ded0eef5d@linux.dev> (raw)
In-Reply-To: <6856f435-a589-e044-881f-3a80aefa1174@bytedance.com>


On 7/26/23 19:07, Jiachen Zhang wrote:
>
>
> On 2023/7/26 00:57, Hao Xu wrote:
>>
>> On 7/25/23 21:00, Bernd Schubert wrote:
>>>
>>>
>>> On 7/25/23 12:11, Hao Xu wrote:
>>>> On 7/21/23 19:56, Bernd Schubert wrote:
>>>>> On July 21, 2023 1:27:26 PM GMT+02:00, Hao Xu <hao.xu@linux.dev> 
>>>>> wrote:
>>>>>> On 7/21/23 14:35, Jiachen Zhang wrote:
>>>>>>>
>>>>>>> On 2023/6/30 17:46, Hao Xu wrote:
>>>>>>>> From: Hao Xu <howeyxu@tencent.com>
>>>>>>>>
>>>>>>>> In direct_io_relax mode, there can be shared mmaped files and 
>>>>>>>> thus dirty
>>>>>>>> pages in its page cache. Therefore those dirty pages should be 
>>>>>>>> written
>>>>>>>> back to backend before direct write to avoid data loss.
>>>>>>>>
>>>>>>>> Signed-off-by: Hao Xu <howeyxu@tencent.com>
>>>>>>>> ---
>>>>>>>>    fs/fuse/file.c | 7 +++++++
>>>>>>>>    1 file changed, 7 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
>>>>>>>> index 176f719f8fc8..7c9167c62bf6 100644
>>>>>>>> --- a/fs/fuse/file.c
>>>>>>>> +++ b/fs/fuse/file.c
>>>>>>>> @@ -1485,6 +1485,13 @@ ssize_t fuse_direct_io(struct 
>>>>>>>> fuse_io_priv *io, struct iov_iter *iter,
>>>>>>>>        if (!ia)
>>>>>>>>            return -ENOMEM;
>>>>>>>> +    if (fopen_direct_write && fc->direct_io_relax) {
>
>
> Hi,
>
> Seems this patchset flushes and invalidates the page cache before 
> doing the direct-io writes, which avoids data loss caused by flushing 
> staled data to FUSE daemon. And I tested it works well.
>
> But there is also another side of the same problem we should consider. 
> If a file is modified through its page cache (shared mmapped regions, 
> or non-FOPEN_DIRECT_IO files), the following direct-io reads may 
> bypass the new data in dirty page cache and read the staled data from 
> FUSE daemon. I think this is also a problem that should be fixed. It 
> could be fixed by uncondictionally calling 
> filemap_write_and_wait_range() before direct-io read.


Yea, I think this is true, I'll fix it in v2. Thanks Jiachen.


>
>
>>>>>>>> +        res = filemap_write_and_wait_range(mapping, pos, pos + 
>>>>>>>> count - 1);
>>>>>>>> +        if (res) {
>>>>>>>> +            fuse_io_free(ia);
>>>>>>>> +            return res;
>>>>>>>> +        }
>>>>>>>> +    }
>>>>>>>>        if (!cuse && fuse_range_is_writeback(inode, idx_from, 
>>>>>>>> idx_to)) {
>>>>>>>>            if (!write)
>>>>>>>>                inode_lock(inode);
>>>>>>>
>>>>>>> Tested-by: Jiachen Zhang <zhangjiachen.jaycee@bytedance.com>
>>>>>>>
>>>>>>>
>>>>>>> Looks good to me.
>>>>>>>
>>>>>>> By the way, the behaviour would be a first FUSE_WRITE flushing 
>>>>>>> the page cache, followed by a second FUSE_WRITE doing the direct 
>>>>>>> IO. In the future, further optimization could be first write 
>>>>>>> into the page cache and then flush the dirty page to the FUSE 
>>>>>>> daemon.
>>>>>>>
>>>>>>
>>>>>> I think this makes sense, cannot think of any issue in it for 
>>>>>> now, so
>>>>>> I'll do that change and send next version, super thanks, Jiachen!
>>>>>>
>>>>>> Thanks,
>>>>>> Hao
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Jiachen
>>>>>>
>>>>>
>>>>> On my phone, sorry if mail formatting is not optimal.
>>>>> Do I understand it right? You want DIO code path copy into pages 
>>>>> and then flush/invalidate these pages? That would be punish DIO 
>>>>> for for the unlikely case there are also dirty pages (discouraged 
>>>>> IO pattern).
>>>>
>>>> Hi Bernd,
>>>> I think I don't get what you said, why it is punishment and why 
>>>> it's discouraged IO pattern?
>>>> On my first eyes seeing Jiachen's idea, I was thinking "that sounds
>>>> disobeying direct write semantics" because usually direct write is
>>>> "flush dirty page -> invalidate page -> write data through to backend"
>>>> not "write data to page -> flush dirty page/(writeback data)"
>>>> The latter in worst case write data both to page cache and backend
>>>> while the former just write to backend and load it to the page cache
>>>> when buffered reading. But seems there is no such "standard way" which
>>>> says we should implement direct IO in that way.
>>>
>>> Hi Hao,
>>>
>>> sorry for being brief last week, I was on vacation and 
>>> reading/writing some mails on my phone.
>>>
>>> With 'punishment' I mean memory copies to the page cache - memory 
>>> copies are expensive and DIO should avoid it.
>>>
>>> Right now your patch adds filemap_write_and_wait_range(), but we do 
>>> not know if it did work (i.e. if pages had to be flushed). So unless 
>>> you find a way to get that information, copy to page cache would be 
>>> unconditionally - overhead of memory copy even if there are no dirty 
>>> pages.
>>
>>
>> Ah, looks I understood what you mean in my last email reply. Yes, 
>> just like what I said in last email:
>>
>> [1] flush dirty page --> invalidate page --> write data to backend
>>
>>     This is what we do for direct write right now in kernel, I call 
>> this policy "write-through", since it doesn't care much about the cache.
>>
>> [2] write data to page cache --> flush dirty page in suitable time
>>
>>     This is  "write-back" policy, used by buffered write. Here in 
>> this patch's case, we flush pages synchronously, so it still can be 
>> called direct-write.
>>
>> Surely, in the worst case, the page is clean, then [2] has one extra 
>> memory copy than [1]. But like what I pointed out, for [2], next time 
>> buffered
>>
>> read happens, the page is in latest state, so no I/O needed, while 
>> for [1], it has to load data from backend to page cache.
>>
>
> Write-through, write-back and direct-io are also exlained in the 
> kernel documentation [*], of which write-through and write-back are 
> cache modes. According to the document, the pattern [2] is similar to 
> the FUSE write-back mode, but the pattern [1] is different from the 
> FUSE write-through mode. The FUSE write-through mode obeys the 'write 
> data to page cache --> flush dirty page synchronously' (let us call it 
> pattern [3]), which keeps the clean cache in-core after flushing.
>
> To improve performance while keeping the direct-io semantics, my 
> thoughts was in the future, maybe we can fallback to the pattern [3] 
> if the target page is in-core, otherwise keep the original direct-io 
> pattern without reading from whole pages from FUSE daemon.
>
> [*] https://www.kernel.org/doc/Documentation/filesystems/fuse-io.txt
>
> Thanks,
> Jiachen
>
>>
>>>
>>> With 'discouraged' I mean mix of page cache and direct-io. Typically 
>>> one should only do either of both (page cache or DIO), but not a mix 
>>> of them. For example see your patch, it flushes the page cache, but 
>>> without a lock - races are possible. Copying to the page cache might 
>>> be a solution, but it has the overhead above.
>>
>>
>> For race, we held inode lock there, do I miss anything?
>>
>>
>>>
>>> Thanks,
>>> Bernd
>>
>>
>> I now think it's good to keep the pattern same as other filesystems 
>> which is [1] to avoid possible performance issues in the future, 
>> thanks Bernd.
>>
>>
>> Hao
>>
>>

  parent reply	other threads:[~2023-07-27 10:48 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-30  9:45 [PATCH v3 0/3] fuse: add a new fuse init flag to relax restrictions in no cache mode Hao Xu
2023-06-30  9:46 ` [PATCH 1/3] fuse: invalidate page cache pages before direct write Hao Xu
2023-06-30 10:32   ` Bernd Schubert
2023-07-21  3:34   ` [fuse-devel] " Jiachen Zhang
2023-06-30  9:46 ` [PATCH 2/3] fuse: add a new fuse init flag to relax restrictions in no cache mode Hao Xu
2023-06-30 10:35   ` Bernd Schubert
2023-06-30  9:46 ` [PATCH 3/3] fuse: write back dirty pages before direct write in direct_io_relax mode Hao Xu
2023-06-30 10:40   ` Bernd Schubert
2023-07-21  6:35   ` [External] [fuse-devel] " Jiachen Zhang
2023-07-21 11:27     ` Hao Xu
2023-07-21 11:56       ` Bernd Schubert
2023-07-25 10:11         ` Hao Xu
2023-07-25 13:00           ` Bernd Schubert
2023-07-25 16:57             ` Hao Xu
2023-07-25 17:59               ` Bernd Schubert
2023-07-27  9:42                 ` Hao Xu
2023-07-26 11:07               ` Jiachen Zhang
2023-07-26 13:15                 ` Bernd Schubert
2023-07-27  2:24                   ` Jiachen Zhang
2023-07-27 10:31                 ` Hao Xu
2023-07-28  2:57                   ` Jiachen Zhang
2023-07-27 10:48                 ` Hao Xu [this message]
2023-07-05 10:23 ` [RFC] [PATCH] fuse: DIO writes always use the same code path Bernd Schubert
2023-07-06 14:43   ` Christoph Hellwig
2023-07-07 13:36     ` Bernd Schubert
2023-07-17  8:03   ` Hao Xu
2023-07-17 21:17     ` Bernd Schubert
2023-07-20  7:32 ` [PATCH v3 0/3] fuse: add a new fuse init flag to relax restrictions in no cache mode Hao Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=fcc8c890-e8c4-266b-74d0-437ded0eef5d@linux.dev \
    --to=hao.xu@linux.dev \
    --cc=bernd.schubert@fastmail.fm \
    --cc=cgxu519@mykernel.net \
    --cc=fuse-devel@lists.sourceforge.net \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=miklos@szeredi.hu \
    --cc=wanpengli@tencent.com \
    --cc=zhangjiachen.jaycee@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).