Date: Wed, 2 Nov 2022 02:03:11 -0700
From: Christoph Hellwig
To: Matthew Wilcox
Cc: Dave Chinner, "Ritesh Harjani (IBM)", linux-xfs@vger.kernel.org,
 linux-fsdevel@vger.kernel.org, Christoph Hellwig, "Darrick J. Wong",
 Aravinda Herle, David Howells
Subject: Re: [RFC 2/2] iomap: Support subpage size dirty tracking to
 improve write performance
References: <886076cfa6f547d22765c522177d33cf621013d2.1666928993.git.ritesh.list@gmail.com>
 <20221028210422.GC3600936@dread.disaster.area>

On Mon, Oct 31, 2022 at 03:43:24AM +0000, Matthew Wilcox wrote:
> I agree that bufferheads do bottom-up dirty tracking, but I don't think
> that what Ritesh is doing here is bottom-up dirty tracking.  Buffer
> heads expose an API to dirty a block, which necessarily goes bottom-up.
> There's no API here to dirty a block.  Instead there's an API to dirty
> a range of a folio, so we're still top-down; we're just keeping track
> of it in a more precise way.

Agreed.

> If there is any dirty region, the folio must be marked dirty (otherwise
> we'll never know that it needs to be written back).  The interesting
> question (as your paragraph below hints) is whether removing the dirty
> part of a folio from a file marks the folio clean.  I believe that's
> optional, but it's probably worth doing.

Also agreed.
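To make the top-down model concrete, here is a minimal toy sketch of a
range-based dirtying helper.  All names are made up for illustration
(this is not the RFC's actual interface), and it is plain userspace C
rather than kernel code:

/* Toy model: per-block dirty bits driven by a byte range. */
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define FOLIO_SIZE	(2u << 20)	/* assume a 2MB folio */
#define BLOCK_SIZE	1024u		/* assume a 1k fs block size */
#define NR_BLOCKS	(FOLIO_SIZE / BLOCK_SIZE)
#define LONG_BITS	(sizeof(unsigned long) * CHAR_BIT)

struct toy_folio {
	bool dirty;				/* folio-level dirty flag */
	unsigned long state[NR_BLOCKS / LONG_BITS];	/* per-block bits */
};

/*
 * Top-down: the caller says "this byte range of the folio is now
 * dirty" and we derive the affected blocks.  There is no bottom-up
 * "dirty this block" entry point.
 */
static void toy_set_range_dirty(struct toy_folio *folio,
		size_t off, size_t len)
{
	size_t first = off / BLOCK_SIZE;
	size_t last = (off + len - 1) / BLOCK_SIZE;

	for (size_t i = first; i <= last; i++)
		folio->state[i / LONG_BITS] |= 1UL << (i % LONG_BITS);
	/* The folio itself must still be marked dirty. */
	folio->dirty = true;
}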
> > What happens with direct extent manipulation like fallocate()
> > operations?  These invalidate the parts of the page cache over the
> > range we are punching, shifting, etc, without interacting directly
> > with iomap, so do we now have to ensure that the sub-folio dirty
> > regions are also invalidated correctly? i.e. do functions like
> > xfs_flush_unmap_range() need to become iomap infrastructure so that
> > they can update sub-folio dirty ranges correctly?
>
> I'm slightly confused by this question.  As I understand the various
> fallocate operations, they start by kicking out all the folios affected
> by the operation (generally from the start of the operation to EOF),
> so we'd writeback the (dirty part of) folios which are dirty, then
> invalidate the folios in cache.  I'm not sure there's going to be
> much difference.

Yes.  As far as I can tell all pagecache manipulation for the fallocate
operations is driven by the file system, and it is only done for the
punch/zero/move ranges.  The file system then goes through the normal
pagecache truncate helpers, rounded to the block size, which should do
the right thing through the ops.

> Yes.  This is also going to be a performance problem.  Marking a folio
> as dirty is no longer just setting the bit in struct folio and the
> xarray but also setting all the bits in iop->state.  Depending on the
> size of the folio, and the fs blocksize, this could be quite a lot of
> bits.  eg a 2MB folio with a 1k block size is 2048 bits (256 bytes,
> 6 cachelines (it dirties the spinlock in cacheline 0, then the bitmap
> occupies 3 full cachelines and 2 partial ones)).

We can always optimize by having a bit for the fairly common all-dirty
case and only track and look at the array if that is not the case (a
rough sketch of what I mean is at the end of this mail).

> filesystems right now.  Dave Howells' netfs infrastructure is trying
> to solve the problem for everyone (and he's been looking at iomap as
> inspiration for what he's doing).

Btw, I never understood why the network file systems don't just use
iomap.  There is nothing block specific in the core iomap code.
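For illustration, that all-dirty fast path could look roughly like the
toy below.  Again the names are made up, and this is compilable
userspace C, not the actual iomap code:

#include <stdbool.h>

struct toy_iop {
	bool all_dirty;			/* common case: whole folio dirty */
	unsigned long state[32];	/* per-block dirty bitmap */
};

/* Dirtying the whole folio touches one flag, not six cachelines. */
static void toy_dirty_all(struct toy_iop *iop)
{
	iop->all_dirty = true;
}

/* Writeback only walks the bitmap when the fast-path flag is clear. */
static bool toy_block_dirty(const struct toy_iop *iop, unsigned int block)
{
	unsigned int bits = 8 * sizeof(unsigned long);

	if (iop->all_dirty)
		return true;
	return iop->state[block / bits] & (1UL << (block % bits));
}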