Date: Fri, 13 Feb 2026 08:24:57 -0800
From: "Darrick J. Wong"
To: "Nirjhar Roy (IBM)"
Cc: Brian Foster, linux-fsdevel@vger.kernel.org, linux-xfs@vger.kernel.org
Subject: Re: [PATCH v2 1/5] iomap, xfs: lift zero range hole mapping flush into xfs
Message-ID: <20260213162457.GG7712@frogsfrogsfrogs>
References: <20260129155028.141110-1-bfoster@redhat.com> <20260129155028.141110-2-bfoster@redhat.com>

On Fri, Feb 13, 2026 at 03:50:07PM +0530, Nirjhar Roy (IBM) wrote:
> On Thu, 2026-01-29 at 10:50 -0500, Brian Foster wrote:
> > iomap zero range has a wart in that it also flushes dirty pagecache
> > over hole mappings (rather than only unwritten mappings). This was
> > included to accommodate a quirk in XFS where COW fork preallocation
> > can exist over a hole in the data fork, and the associated range is
> > reported as a hole. This is because the range actually is a hole,
> > but XFS also has an optimization where if COW fork blocks exist for
> > a range being written to, those blocks are used regardless of
> > whether the data fork blocks are shared or not. For zeroing, COW
> > fork blocks over a data fork hole are only relevant if the range is
> > dirty in pagecache, otherwise the range is already considered
> > zeroed.
> >
> > The easiest way to deal with this corner case is to flush the
> > pagecache to trigger COW remapping into the data fork, and then
> > operate on the updated on-disk state. The problem is that ext4
> > cannot accommodate a flush from this context due to being a
> > transaction deadlock vector.
> >
> > Outside of the hole quirk, ext4 can avoid the flush for zero range
> > by using the recently introduced folio batch lookup mechanism for
> > unwritten mappings. Therefore, take the next logical step and lift
> > the hole handling logic into the XFS iomap_begin handler. iomap will
> > still flush on unwritten mappings without a folio batch, and XFS
> > will flush and retry mapping lookups in the case where it would
> > otherwise report a hole with dirty pagecache during a zero range.
> >
> > Note that this is intended to be a fairly straightforward lift and
> > otherwise not change behavior. Now that the flush exists within XFS,
> > follow on patches can further optimize it.
> >
> > Signed-off-by: Brian Foster
> > ---
> >  fs/iomap/buffered-io.c |  2 +-
> >  fs/xfs/xfs_iomap.c     | 25 ++++++++++++++++++++++---
> >  2 files changed, 23 insertions(+), 4 deletions(-)
> >
> > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > index 6beb876658c0..807384d72311 100644
> > --- a/fs/iomap/buffered-io.c
> > +++ b/fs/iomap/buffered-io.c
> > @@ -1620,7 +1620,7 @@ iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
> >  				 srcmap->type == IOMAP_UNWRITTEN)) {
> >  			s64 status;
> >
> > -			if (range_dirty) {
> > +			if (range_dirty && srcmap->type == IOMAP_UNWRITTEN) {
> >  				range_dirty = false;
> >  				status = iomap_zero_iter_flush_and_stale(&iter);
> >  			} else {
> > diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> > index 37a1b33e9045..896d0dd07613 100644
> > --- a/fs/xfs/xfs_iomap.c
> > +++ b/fs/xfs/xfs_iomap.c
> > @@ -1790,6 +1790,7 @@ xfs_buffered_write_iomap_begin(
> >  	if (error)
> >  		return error;
> >
> > +restart:
> >  	error = xfs_ilock_for_iomap(ip, flags, &lockmode);
> >  	if (error)
> >  		return error;
> > @@ -1817,9 +1818,27 @@ xfs_buffered_write_iomap_begin(
> >  	if (eof)
> >  		imap.br_startoff = end_fsb; /* fake hole until the end */
> >
> > -	/* We never need to allocate blocks for zeroing or unsharing a hole. */
> > -	if ((flags & (IOMAP_UNSHARE | IOMAP_ZERO)) &&
> > -	    imap.br_startoff > offset_fsb) {
> > +	/* We never need to allocate blocks for unsharing a hole. */
> > +	if ((flags & IOMAP_UNSHARE) && imap.br_startoff > offset_fsb) {
> > +		xfs_hole_to_iomap(ip, iomap, offset_fsb, imap.br_startoff);
> > +		goto out_unlock;
> > +	}
> > +
> > +	/*
> > +	 * We may need to zero over a hole in the data fork if it's fronted by
> > +	 * COW blocks and dirty pagecache. To make sure zeroing occurs, force
> > +	 * writeback to remap pending blocks and restart the lookup.
> > +	 */
> > +	if ((flags & IOMAP_ZERO) && imap.br_startoff > offset_fsb) {
> > +		if (filemap_range_needs_writeback(inode->i_mapping, offset,
> > +						  offset + count - 1)) {
> > +			xfs_iunlock(ip, lockmode);
>
> I am a bit new to this section of the code - so a naive question:
> Why do we need to unlock the inode here? Shouldn't the mappings be thread safe while the write/flush
> is going on?

Writeback takes XFS_ILOCK, which we currently hold here (possibly in
exclusive mode) so we must drop it to write(back) and wait.

--D

> --NR
> > +			error = filemap_write_and_wait_range(inode->i_mapping,
> > +					offset, offset + count - 1);
> > +			if (error)
> > +				return error;
> > +			goto restart;
> > +		}
> >  		xfs_hole_to_iomap(ip, iomap, offset_fsb, imap.br_startoff);
> >  		goto out_unlock;
> >  	}
> >
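
For readers new to this path, the unlock/flush/restart ordering discussed in the reply can be modelled with a small userspace sketch. This is only an illustration under stated assumptions: every name here (toy_inode, toy_ilock, toy_write_and_wait, toy_zero_lookup) is hypothetical and not a kernel API, the lock is a plain flag rather than a real rwsem, and writeback is reduced to "remap COW blocks into the data fork and clear the dirty state".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an inode; none of these names are kernel APIs. */
struct toy_inode {
	bool ilock_held;      /* models XFS_ILOCK being held */
	bool dirty_pagecache; /* dirty data over the range being zeroed */
	bool cow_blocks;      /* COW fork blocks fronting a data fork hole */
	bool data_mapped;     /* data fork has a real mapping for the range */
	int  flushes;         /* number of writeback passes performed */
};

enum toy_map { TOY_HOLE, TOY_MAPPED };

static void toy_ilock(struct toy_inode *ip)
{
	/* Non-recursive lock: taking it while held would deadlock for real. */
	assert(!ip->ilock_held);
	ip->ilock_held = true;
}

static void toy_iunlock(struct toy_inode *ip)
{
	assert(ip->ilock_held);
	ip->ilock_held = false;
}

/* Models writeback, which takes the inode lock itself to remap COW blocks. */
static void toy_write_and_wait(struct toy_inode *ip)
{
	toy_ilock(ip);
	if (ip->dirty_pagecache && ip->cow_blocks) {
		ip->cow_blocks = false;
		ip->data_mapped = true;
	}
	ip->dirty_pagecache = false;
	ip->flushes++;
	toy_iunlock(ip);
}

/* Models the zero range lookup: flush and retry instead of reporting a dirty hole. */
static enum toy_map toy_zero_lookup(struct toy_inode *ip)
{
	enum toy_map ret;

restart:
	toy_ilock(ip);
	if (!ip->data_mapped && ip->dirty_pagecache) {
		toy_iunlock(ip);        /* drop the lock before writeback needs it */
		toy_write_and_wait(ip);
		goto restart;           /* the mapping may have changed, look again */
	}
	ret = ip->data_mapped ? TOY_MAPPED : TOY_HOLE;
	toy_iunlock(ip);
	return ret;
}
```

In this model, removing the toy_iunlock() call before toy_write_and_wait() trips the assertion in toy_ilock(), which is the toy equivalent of the self-deadlock the reply describes. A clean hole is reported without any flush, while a dirty hole fronted by COW blocks comes back as mapped after one flush and one retry.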