From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1F9F71DD0D9; Thu, 24 Oct 2024 17:08:17 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729789698; cv=none; b=PdQgNjn48IRxgCmgtYLu9Ewbcxi6eV8px4H9hjYsTinIILHmXf7J7TDspz/a/T+LP40punpIsh9GsjAQdymfMwnJrl7KjosN8a0iltExcaZl2JGINSeGuEvc98BAAf1OnmKYEx34j39v912ZSHUpbX885Hn3Ejd1eQyDiWfYRIA= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729789698; c=relaxed/simple; bh=robSyRUhettsUoVKqWt+6PBW83WMsbRSY2lHQ051nNc=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=ZH7GJhf5r3oEydsqXxnRe6LVvC5+P3c8wXSC1B8LeUgYIOrWad1i5XLnU//3Anrpd0+C+2vAvKFH9qnSO10PCoa6NZqPnotOS/cKB2Ib6KEI9xqR2IgCK1TXfZCsCorT12oZKmWKkhKZK1AWD8ZXcfcwkJWg8QSEusZUNNOgucw= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=RcU/yTPT; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="RcU/yTPT" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 88173C4CEC7; Thu, 24 Oct 2024 17:08:17 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1729789697; bh=robSyRUhettsUoVKqWt+6PBW83WMsbRSY2lHQ051nNc=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=RcU/yTPTvPiOABD4F8pAft/nUFJFBOiugH8NzotcAKsGuJFIlCYbRnMxcoi+a8lo7 G/C0GrqApHCsQTi2T5F7APK+Ol8s5wtfa6wnBSj8qh+l3rBc2Iowt8f8OMrQJpjLTK pT6M9ORRy+kM1QiUIbvqCMZsEUonuMcEELbmkJrIddSOA6kS7G9fmMaYuWO3anv2CN P4k0xdCbkFCsTa7TKYwyhvabFstbbKPWCKXqb2Swnn3rrBm2W16kS/hnmTHM7m8Pwx W4Bjr8rBv92ugAzOyTxC2Lba2xxebDMvTVqLHMbGSzbhACh+lwP4BHrxLsl/Vte/LK /K2ESoxuapcQQ== Date: Thu, 24 Oct 2024 10:08:17 -0700 From: "Darrick J. Wong" To: Brian Foster Cc: linux-fsdevel@vger.kernel.org, linux-xfs@vger.kernel.org Subject: Re: [PATCH] iomap: elide zero range flush from partial eof zeroing Message-ID: <20241024170817.GK21853@frogsfrogsfrogs> References: <20241023143029.11275-1-bfoster@redhat.com> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20241023143029.11275-1-bfoster@redhat.com> On Wed, Oct 23, 2024 at 10:30:29AM -0400, Brian Foster wrote: > iomap zero range performs a pagecache flush upon seeing unwritten > extents with dirty pagecache in order to determine accurate > subranges that require direct zeroing. This is to support an > optimization where clean, unwritten ranges are skipped as they are > already zero on-disk. > > Certain use cases for zero range are more sensitive to flush latency > than others. The kernel test robot recently reported a regression in > the following stress-ng workload on XFS: > > stress-ng --timeout 60 --times --verify --metrics --no-rand-seed --metamix 64 > > This workload involves a series of small, strided, write extending > writes. On XFS, this produces a pattern of allocating post-eof > speculative preallocation, converting preallocation to unwritten on > zero range calls, dirtying pagecache over the converted mapping, and > then repeating the sequence again from the updated EOF. This > basically produces a sequence of pagecache flushes on the partial > EOF block zeroing use case of zero range. > > To mitigate this problem, special case the EOF block zeroing use > case to prefer zeroing over a pagecache flush when the EOF folio is > already dirty. This brings most of the performance back by avoiding > flushes on write and truncate extension operations, while preserving > the ability for iomap to flush and properly process larger ranges. > > Signed-off-by: Brian Foster > --- > > Hi iomap maintainers, > > This is an incremental optimization for the regression reported by the > test robot here[1]. I'm not totally convinced this is necessary as an > immediate fix, but the discussion on that thread was enough to suggest > it could be. I don't really love the factoring, but I had to play a bit > of whack-a-mole between fstests and stress-ng to restore performance and > still maintain behavior expectations for some of the tests. > > On a positive note, exploring this gave me what I think is a better idea > for dealing with zero range overall, so I'm working on a followup to > this that reworks it by splitting zero range across block alignment > boundaries (similar to how something like truncate page range works, for > example). This simplifies things by isolating the dirty range check to a > single folio on an unaligned start offset, which lets the _iter() call > do a skip or zero (i.e. no more flush_and_stale()), and then > unconditionally flush the aligned portion to end-of-range. The latter > flush should be a no-op for every use case I've seen so far, so this > might entirely avoid the need for anything more complex for zero range. > > In summary, I'm posting this as an optional and more "stable-worthy" > patch for reference and for the maintainers to consider as they like. I > think it's reasonable to include if we are concerned about this > particular stress-ng test and are Ok with it as a transient solution. > But if it were up to me, I'd probably sit on it for a bit to determine > if a more practical user/workload is affected by this, particularly > knowing that I'm trying to rework it. This could always be applied as a > stable fix if really needed, but I just don't think the slightly more > invasive rework is appropriate for -rc.. > > Thoughts, reviews, flames appreciated. > > Brian > > [1] https://lore.kernel.org/linux-xfs/202410141536.1167190b-oliver.sang@intel.com/ > > fs/iomap/buffered-io.c | 20 +++++++++++++++++--- > 1 file changed, 17 insertions(+), 3 deletions(-) > > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c > index aa587b2142e2..8fd25b14d120 100644 > --- a/fs/iomap/buffered-io.c > +++ b/fs/iomap/buffered-io.c > @@ -1372,6 +1372,7 @@ static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero, > loff_t pos = iter->pos; > loff_t length = iomap_length(iter); > loff_t written = 0; > + bool eof_zero = false; > > /* > * We must zero subranges of unwritten mappings that might be dirty in > @@ -1391,12 +1392,23 @@ static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero, > * triggers writeback time post-eof zeroing. > */ > if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN) { > - if (*range_dirty) { > + /* range is clean and already zeroed, nothing to do */ > + if (!*range_dirty) > + return length; > + > + /* flush for anything other than partial eof zeroing */ > + if (pos != i_size_read(iter->inode) || > + (pos % i_blocksize(iter->inode)) == 0) { > *range_dirty = false; > return iomap_zero_iter_flush_and_stale(iter); > } > - /* range is clean and already zeroed, nothing to do */ > - return length; > + /* > + * Special case partial EOF zeroing. Since we know the EOF > + * folio is dirty, prefer in-memory zeroing for it. This avoids > + * excessive flush latency on frequent file size extending > + * operations. > + */ > + eof_zero = true; > } > > do { > @@ -1415,6 +1427,8 @@ static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero, > offset = offset_in_folio(folio, pos); > if (bytes > folio_size(folio) - offset) > bytes = folio_size(folio) - offset; > + if (eof_zero && length > bytes) > + length = bytes; What does this do? I think this causes the loop to break after putting the folio that caches @pos? And then I guess we go around the loop in iomap_zero_range again if there were more bytes to zero after this folio? --D > > folio_zero_range(folio, offset, bytes); > folio_mark_accessed(folio); > -- > 2.46.2 > >