Message-ID: <45da6206-8e34-a184-5ba4-d40be252cfd2@linux.dev>
Date: Wed, 26 Jul 2023 00:57:23 +0800
Subject: Re: [External] [fuse-devel] [PATCH 3/3] fuse: write back dirty pages before direct write in direct_io_relax mode
From: Hao Xu
To: Bernd Schubert, Jiachen Zhang, fuse-devel@lists.sourceforge.net
Cc: linux-fsdevel@vger.kernel.org, Wanpeng Li, cgxu519@mykernel.net, miklos@szeredi.hu
References: <20230630094602.230573-1-hao.xu@linux.dev> <20230630094602.230573-4-hao.xu@linux.dev> <2622afd7-228f-02f3-3b72-a1c826844126@linux.dev> <396A0BF4-DA68-46F8-9881-3801737225C6@fastmail.fm> <9b0a164d-3d0e-cc57-81b7-ae32bef4e9d7@linux.dev>

On 7/25/23 21:00, Bernd Schubert wrote:
>
> On 7/25/23 12:11, Hao Xu wrote:
>> On 7/21/23 19:56, Bernd Schubert wrote:
>>> On July 21, 2023 1:27:26 PM GMT+02:00, Hao Xu wrote:
>>>> On 7/21/23 14:35, Jiachen Zhang wrote:
>>>>>
>>>>> On 2023/6/30 17:46, Hao Xu wrote:
>>>>>> From: Hao Xu
>>>>>>
>>>>>> In direct_io_relax mode, there can be shared mmapped files and
>>>>>> thus dirty pages in the page cache. Those dirty pages should be
>>>>>> written back to the backend before a direct write, to avoid data
>>>>>> loss.
>>>>>>
>>>>>> Signed-off-by: Hao Xu
>>>>>> ---
>>>>>>  fs/fuse/file.c | 7 +++++++
>>>>>>  1 file changed, 7 insertions(+)
>>>>>>
>>>>>> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
>>>>>> index 176f719f8fc8..7c9167c62bf6 100644
>>>>>> --- a/fs/fuse/file.c
>>>>>> +++ b/fs/fuse/file.c
>>>>>> @@ -1485,6 +1485,13 @@ ssize_t fuse_direct_io(struct fuse_io_priv *io, struct iov_iter *iter,
>>>>>>      if (!ia)
>>>>>>          return -ENOMEM;
>>>>>> +    if (fopen_direct_write && fc->direct_io_relax) {
>>>>>> +        res = filemap_write_and_wait_range(mapping, pos, pos + count - 1);
>>>>>> +        if (res) {
>>>>>> +            fuse_io_free(ia);
>>>>>> +            return res;
>>>>>> +        }
>>>>>> +    }
>>>>>>      if (!cuse && fuse_range_is_writeback(inode, idx_from, idx_to)) {
>>>>>>          if (!write)
>>>>>>              inode_lock(inode);
>>>>>
>>>>> Tested-by: Jiachen Zhang
>>>>>
>>>>> Looks good to me.
>>>>>
>>>>> By the way, the behaviour would be a first FUSE_WRITE flushing the
>>>>> page cache, followed by a second FUSE_WRITE doing the direct IO.
>>>>> In the future, a further optimization could be to first write into
>>>>> the page cache and then flush the dirty pages to the FUSE daemon.
>>>>
>>>> I think this makes sense; I cannot think of any issue with it for
>>>> now, so I'll make that change and send the next version. Super
>>>> thanks, Jiachen!
>>>>
>>>> Thanks,
>>>> Hao
>>>>
>>>>> Thanks,
>>>>> Jiachen
>>>
>>> On my phone, sorry if the mail formatting is not optimal.
>>> Do I understand it right? You want the DIO code path to copy into
>>> pages and then flush/invalidate these pages? That would punish DIO
>>> for the unlikely case that there are also dirty pages (a discouraged
>>> IO pattern).
>>
>> Hi Bernd,
>> I think I don't get what you said: why is it a punishment, and why is
>> it a discouraged IO pattern?
>> On first seeing Jiachen's idea, I was thinking "that sounds like it
>> disobeys direct-write semantics", because usually a direct write is
>> "flush dirty pages -> invalidate pages -> write data through to the
>> backend", not "write data to pages -> flush dirty pages/(write back
>> the data)".
>> In the worst case the latter writes the data both to the page cache
>> and to the backend, while the former writes only to the backend and
>> loads it into the page cache on a later buffered read. But it seems
>> there is no "standard way" which says we must implement direct IO in
>> the former way.
>
> Hi Hao,
>
> sorry for being brief last week, I was on vacation and reading/writing
> some mails on my phone.
>
> With 'punishment' I mean memory copies to the page cache - memory
> copies are expensive and DIO should avoid them.
>
> Right now your patch adds filemap_write_and_wait_range(), but we do
> not know whether it did any work (i.e. whether pages actually had to
> be flushed). So unless you find a way to get that information, the
> copy to the page cache would be unconditional - the overhead of a
> memory copy even when there are no dirty pages.

Ah, it looks like I understood what you meant. Yes, just like I said in
my last email, there are two policies:

[1] flush dirty pages --> invalidate pages --> write data to the backend
    This is what the kernel does for direct writes right now. I call
    this policy "write-through", since it doesn't care much about the
    cache.

[2] write data to the page cache --> flush the dirty pages at a suitable
    time
    This is the "write-back" policy, used by buffered writes. In this
    patch's case we would flush the pages synchronously, so it can still
    be called a direct write. A sketch of this ordering follows below.
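To make policy [2] concrete, here is a minimal sketch of the ordering.
This is not the actual fs/fuse/file.c code: the function name is made
up, error handling is simplified, and the locking and permission
handling a real caller of generic_perform_write() needs are omitted.

#include <linux/fs.h>
#include <linux/pagemap.h>
#include <linux/uio.h>

/* Policy [2]: copy into the page cache first, then flush synchronously. */
static ssize_t write_back_then_flush(struct kiocb *iocb, struct iov_iter *from)
{
    struct address_space *mapping = iocb->ki_filp->f_mapping;
    loff_t pos = iocb->ki_pos;
    ssize_t copied;
    int err;

    /* The extra memory copy Bernd objects to: user data goes into the
     * page cache even when no page in the range was dirty. */
    copied = generic_perform_write(iocb, from);
    if (copied <= 0)
        return copied;

    /* Flushing synchronously keeps the semantics "direct": the data has
     * reached the backend before the write returns. */
    err = filemap_write_and_wait_range(mapping, pos, pos + copied - 1);
    return err ? err : copied;
}

The upside of this ordering is exactly the cache effect discussed next:
the pages are left clean and up to date after the write.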
Of course, in the worst case the pages are clean, and then [2] costs one
extra memory copy compared with [1]. But as I pointed out, with [2] the
next buffered read finds the pages already up to date, so no I/O is
needed, while with [1] the data has to be loaded from the backend into
the page cache again.

>
> With 'discouraged' I mean a mix of page cache and direct IO. Typically
> one should do only one of the two (page cache or DIO), but not a mix
> of them. For example, see your patch: it flushes the page cache, but
> without a lock - races are possible. Copying to the page cache might
> be a solution, but it has the overhead described above.

As for the race: we hold the inode lock there - am I missing anything?

>
> Thanks,
> Bernd

I now think it's best to keep the pattern the same as in other
filesystems, which is [1], to avoid possible performance issues in the
future (a sketch of that ordering follows below). Thanks, Bernd.

Hao
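For reference, a minimal sketch of pattern [1] under the inode lock
mentioned above. do_backend_write() is a hypothetical stand-in for the
path that sends the data to the FUSE daemon (a FUSE_WRITE request);
this illustrates the ordering, not the actual fuse_direct_io() code.

#include <linux/fs.h>
#include <linux/pagemap.h>
#include <linux/uio.h>

/* Hypothetical helper: sends the range to the backend (e.g. FUSE_WRITE). */
static ssize_t do_backend_write(struct kiocb *iocb, struct iov_iter *from);

/* Policy [1]: flush, invalidate, then write through to the backend. */
static ssize_t write_through_locked(struct kiocb *iocb, struct iov_iter *from)
{
    struct inode *inode = file_inode(iocb->ki_filp);
    struct address_space *mapping = inode->i_mapping;
    loff_t pos = iocb->ki_pos;
    size_t count = iov_iter_count(from);
    ssize_t res;

    /* inode_lock() excludes concurrent writers on the write(2) path, so
     * they cannot redirty the range between the flush and the write. */
    inode_lock(inode);

    /* Write back any dirty pages overlapping the DIO range ... */
    res = filemap_write_and_wait_range(mapping, pos, pos + count - 1);
    if (res)
        goto out;

    /* ... drop them so later buffered reads refetch from the backend ... */
    invalidate_inode_pages2_range(mapping, pos >> PAGE_SHIFT,
                                  (pos + count - 1) >> PAGE_SHIFT);

    /* ... and send the data straight to the backend. */
    res = do_backend_write(iocb, from);
out:
    inode_unlock(inode);
    return res;
}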