From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 147ACC001E0 for ; Thu, 27 Jul 2023 09:43:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232483AbjG0Jni (ORCPT ); Thu, 27 Jul 2023 05:43:38 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59510 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234050AbjG0Jn0 (ORCPT ); Thu, 27 Jul 2023 05:43:26 -0400 Received: from out-124.mta0.migadu.com (out-124.mta0.migadu.com [91.218.175.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BC6411739 for ; Thu, 27 Jul 2023 02:43:01 -0700 (PDT) Message-ID: DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1690450979; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=+Ux7MkE7mBBTHCndJ7zOsRytS4ZLarEC65+RLjXxT+w=; b=AEYOEuChlb66P5nt4dVwpwol8jWjB3m8aXfO29+rilywovbkDXBQ/etiVjSd/F4MnzrVxU 72J188GIud0T27x0z5DF2PuS2OjS8+xm09k81MO9lxCmFjdcuhZA8XeEkHEQ+zdYD1Wz54 HF6aQSBt4nPgOE4ngwF8agCTU2gWqf0= Date: Thu, 27 Jul 2023 17:42:52 +0800 MIME-Version: 1.0 Subject: Re: [External] [fuse-devel] [PATCH 3/3] fuse: write back dirty pages before direct write in direct_io_relax mode Content-Language: en-US To: Bernd Schubert , Jiachen Zhang , fuse-devel@lists.sourceforge.net Cc: linux-fsdevel@vger.kernel.org, Wanpeng Li , cgxu519@mykernel.net, miklos@szeredi.hu References: <20230630094602.230573-1-hao.xu@linux.dev> <20230630094602.230573-4-hao.xu@linux.dev> <2622afd7-228f-02f3-3b72-a1c826844126@linux.dev> <396A0BF4-DA68-46F8-9881-3801737225C6@fastmail.fm> <9b0a164d-3d0e-cc57-81b7-ae32bef4e9d7@linux.dev> <45da6206-8e34-a184-5ba4-d40be252cfd2@linux.dev> <02444c76-ea2e-3931-d49e-5cd0518711a7@fastmail.fm> X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Hao Xu In-Reply-To: <02444c76-ea2e-3931-d49e-5cd0518711a7@fastmail.fm> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Migadu-Flow: FLOW_OUT Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org On 7/26/23 01:59, Bernd Schubert wrote: > > > On 7/25/23 18:57, Hao Xu wrote: >> >> On 7/25/23 21:00, Bernd Schubert wrote: >>> >>> >>> On 7/25/23 12:11, Hao Xu wrote: >>>> On 7/21/23 19:56, Bernd Schubert wrote: >>>>> On July 21, 2023 1:27:26 PM GMT+02:00, Hao Xu >>>>> wrote: >>>>>> On 7/21/23 14:35, Jiachen Zhang wrote: >>>>>>> >>>>>>> On 2023/6/30 17:46, Hao Xu wrote: >>>>>>>> From: Hao Xu >>>>>>>> >>>>>>>> In direct_io_relax mode, there can be shared mmaped files and >>>>>>>> thus dirty >>>>>>>> pages in its page cache. Therefore those dirty pages should be >>>>>>>> written >>>>>>>> back to backend before direct write to avoid data loss. >>>>>>>> >>>>>>>> Signed-off-by: Hao Xu >>>>>>>> --- >>>>>>>>    fs/fuse/file.c | 7 +++++++ >>>>>>>>    1 file changed, 7 insertions(+) >>>>>>>> >>>>>>>> diff --git a/fs/fuse/file.c b/fs/fuse/file.c >>>>>>>> index 176f719f8fc8..7c9167c62bf6 100644 >>>>>>>> --- a/fs/fuse/file.c >>>>>>>> +++ b/fs/fuse/file.c >>>>>>>> @@ -1485,6 +1485,13 @@ ssize_t fuse_direct_io(struct >>>>>>>> fuse_io_priv *io, struct iov_iter *iter, >>>>>>>>        if (!ia) >>>>>>>>            return -ENOMEM; >>>>>>>> +    if (fopen_direct_write && fc->direct_io_relax) { >>>>>>>> +        res = filemap_write_and_wait_range(mapping, pos, pos + >>>>>>>> count - 1); >>>>>>>> +        if (res) { >>>>>>>> +            fuse_io_free(ia); >>>>>>>> +            return res; >>>>>>>> +        } >>>>>>>> +    } >>>>>>>>        if (!cuse && fuse_range_is_writeback(inode, idx_from, >>>>>>>> idx_to)) { >>>>>>>>            if (!write) >>>>>>>>                inode_lock(inode); >>>>>>> >>>>>>> Tested-by: Jiachen Zhang >>>>>>> >>>>>>> >>>>>>> Looks good to me. >>>>>>> >>>>>>> By the way, the behaviour would be a first FUSE_WRITE flushing >>>>>>> the page cache, followed by a second FUSE_WRITE doing the direct >>>>>>> IO. In the future, further optimization could be first write into >>>>>>> the page cache and then flush the dirty page to the FUSE daemon. >>>>>>> >>>>>> >>>>>> I think this makes sense, cannot think of any issue in it for now, so >>>>>> I'll do that change and send next version, super thanks, Jiachen! >>>>>> >>>>>> Thanks, >>>>>> Hao >>>>>> >>>>>>> >>>>>>> Thanks, >>>>>>> Jiachen >>>>>> >>>>> >>>>> On my phone, sorry if mail formatting is not optimal. >>>>> Do I understand it right? You want DIO code path copy into pages >>>>> and then flush/invalidate these pages? That would be punish DIO for >>>>> for the unlikely case there are also dirty pages (discouraged IO >>>>> pattern). >>>> >>>> Hi Bernd, >>>> I think I don't get what you said, why it is punishment and why it's >>>> discouraged IO pattern? >>>> On my first eyes seeing Jiachen's idea, I was thinking "that sounds >>>> disobeying direct write semantics" because usually direct write is >>>> "flush dirty page -> invalidate page -> write data through to backend" >>>> not "write data to page -> flush dirty page/(writeback data)" >>>> The latter in worst case write data both to page cache and backend >>>> while the former just write to backend and load it to the page cache >>>> when buffered reading. But seems there is no such "standard way" which >>>> says we should implement direct IO in that way. >>> >>> Hi Hao, >>> >>> sorry for being brief last week, I was on vacation and >>> reading/writing some mails on my phone. >>> >>> With 'punishment' I mean memory copies to the page cache - memory >>> copies are expensive and DIO should avoid it. >>> >>> Right now your patch adds filemap_write_and_wait_range(), but we do >>> not know if it did work (i.e. if pages had to be flushed). So unless >>> you find a way to get that information, copy to page cache would be >>> unconditionally - overhead of memory copy even if there are no dirty >>> pages. >> >> >> Ah, looks I understood what you mean in my last email reply. Yes, just >> like what I said in last email: >> >> [1] flush dirty page --> invalidate page --> write data to backend >> >>     This is what we do for direct write right now in kernel, I call >> this policy "write-through", since it doesn't care much about the cache. >> >> [2] write data to page cache --> flush dirty page in suitable time >> >>     This is  "write-back" policy, used by buffered write. Here in this >> patch's case, we flush pages synchronously, so it still can be called >> direct-write. >> >> Surely, in the worst case, the page is clean, then [2] has one extra >> memory copy than [1]. But like what I pointed out, for [2], next time >> buffered >> >> read happens, the page is in latest state, so no I/O needed, while for >> [1], it has to load data from backend to page cache. >> >> >>> >>> With 'discouraged' I mean mix of page cache and direct-io. Typically >>> one should only do either of both (page cache or DIO), but not a mix >>> of them. For example see your patch, it flushes the page cache, but >>> without a lock - races are possible. Copying to the page cache might >>> be a solution, but it has the overhead above. >> >> >> For race, we held inode lock there, do I miss anything? > > We hold inode lock in write path, but not in read path. That ensures > invalidate_inode_pages2_range() is not racing, but DIO read might race > with a page cache writing happening in parallel. I guess results are > then a bit unexpected anyway, although differently if we would hold the > lock. > This confused me a bit. Direct read should hold inode shared lock I believe, I checked it for fuse, neither in FOPEN_DIRECT_IO or not we don't hold shared inode lock. Any concern causes fuse doing so? Regards, Hao > >> >> >>> >>> Thanks, >>> Bernd >> >> >> I now think it's good to keep the pattern same as other filesystems >> which is [1] to avoid possible performance issues in the future, >> thanks Bernd. > > Thanks, I think we should keep fuse consistent with what other fs do. > > > Thanks, > Bernd