Date: Mon, 31 Oct 2022 18:08:53 +1100
From: Dave Chinner <david@fromorbit.com>
To: Matthew Wilcox
Cc: "Ritesh Harjani (IBM)", linux-xfs@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, Christoph Hellwig,
	"Darrick J. Wong", Aravinda Herle, David Howells
Subject: Re: [RFC 2/2] iomap: Support subpage size dirty tracking to improve write performance
Message-ID: <20221031070853.GL3600936@dread.disaster.area>
References: <886076cfa6f547d22765c522177d33cf621013d2.1666928993.git.ritesh.list@gmail.com>
 <20221028210422.GC3600936@dread.disaster.area>
X-Mailing-List: linux-xfs@vger.kernel.org

On Mon, Oct 31, 2022 at 03:43:24AM +0000, Matthew Wilcox wrote:
> On Sat, Oct 29, 2022 at 08:04:22AM +1100, Dave Chinner wrote:
> > As it is, we already have the capability for the mapping tree to
> > have multiple indexes pointing to the same folio - perhaps it's
> > time to start thinking about using filesystem blocks as the
> > mapping tree index rather than PAGE_SIZE chunks, so that the page
> > cache can then track dirty state on filesystem block boundaries
> > natively and this whole problem goes away. We have to solve this
> > sub-folio dirty tracking problem for multi-page folios anyway, so
> > it seems to me that we should solve the sub-page block size dirty
> > tracking problem the same way....
>
> That's an interesting proposal. From the page cache's point of
> view right now, there is only one dirty bit per folio, not per page.

Per folio, yes, but I thought we also had a dirty bit per index entry
in the mapping tree. Writeback code uses the PAGECACHE_TAG_DIRTY mark
to find the dirty folios efficiently (i.e. the write_cache_pages()
iterator), so it's not like this is something new. i.e. we already
have coherent, external dirty bit tracking mechanisms outside the
folio itself that filesystems use.

That's kinda what I'm getting at here - we already have coherent
dirty state tracking outside of the individual folios themselves.
Hence if we have to track sub-folio up-to-date state, sub-folio dirty
state and, potentially, sub-folio writeback state outside the folio
itself, why not do it by extending the existing coherent dirty state
tracking that is built into the mapping tree itself?

Folios + XArray have given us the ability to disconnect the size of
the cached item at any given index from the index granularity - why
not extend that down to sub-page folio granularity in addition to the
scaling up we've been doing for large (multi-page) folio mappings?

Then we don't need any sort of filesystem specific "add-on" that sits
alongside the mapping tree that tries to keep track of dirty state in
addition to the folio and the mapping tree tracking that already
exists...

> We have a number of people looking at the analogous problem for
> network filesystems right now. Dave Howells' netfs infrastructure
> is trying to solve the problem for everyone (and he's been looking
> at iomap as inspiration for what he's doing). I'm kind of hoping we
> end up with one unified solution that can be used for all
> filesystems that want sub-folio dirty tracking. His solution is a
> bit more complex than I really want to see, at least partially
> because he's trying to track dirtiness at byte granularity, no
> matter how much pain that causes to the server.

Byte range granularity is probably overkill for block based
filesystems - all we need is a couple of extra bits per block to be
stored in the mapping tree alongside the folio....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com