From: <gregkh@linuxfoundation.org>
To: bfoster@redhat.com,catherine.hoang@oracle.com,cem@kernel.org,djwong@kernel.org,gregkh@linuxfoundation.org,xfs-stable@lists.linux.dev
Cc: <stable-commits@vger.kernel.org>
Subject: Patch "xfs: skip background cowblock trims on inodes open for write" has been added to the 6.6-stable tree
Date: Fri, 21 Feb 2025 16:23:29 +0100
Message-ID: <2025022129-donated-flagpole-a2e1@gregkh>
In-Reply-To: <20250205214025.72516-6-catherine.hoang@oracle.com>


This is a note to let you know that I've just added the patch titled

    xfs: skip background cowblock trims on inodes open for write

to the 6.6-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     xfs-skip-background-cowblock-trims-on-inodes-open-for-write.patch
and it can be found in the queue-6.6 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@vger.kernel.org> know about it.


From stable+bounces-113971-greg=kroah.com@vger.kernel.org Wed Feb  5 22:40:53 2025
From: Catherine Hoang <catherine.hoang@oracle.com>
Date: Wed,  5 Feb 2025 13:40:06 -0800
Subject: xfs: skip background cowblock trims on inodes open for write
To: stable@vger.kernel.org
Cc: xfs-stable@lists.linux.dev
Message-ID: <20250205214025.72516-6-catherine.hoang@oracle.com>

From: Brian Foster <bfoster@redhat.com>

commit 90a71daaf73f5d39bb0cbb3c7ab6af942fe6233e upstream.

The background blockgc scanner runs on a 5m interval by default and
trims preallocation (post-eof and cow fork) from inodes that are
otherwise idle. Idle effectively means that iolock can be acquired
without blocking and that the inode has no dirty pagecache or I/O in
flight.

This simple mechanism and heuristic has worked fairly well for
post-eof speculative preallocations. Support for reflink and COW
fork preallocations came sometime later and plugged into the same
mechanism, with similar heuristics. Some recent testing has shown
that COW fork preallocation may be notably more sensitive to blockgc
processing than post-eof preallocation, however.

For example, consider an 8GB reflinked file with a COW extent size
hint of 1MB. A worst case fully randomized overwrite of this file
results in ~8k extents of an average size of ~1MB. If the same
workload is interrupted a couple times for blockgc processing
(assuming the file goes idle), the resulting extent count explodes
to over 100k extents with an average size <100kB. This is
significantly worse than ideal and essentially defeats the COW
extent size hint mechanism.
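
The figures above can be checked with a little arithmetic. The following is
a minimal sketch, not part of the patch: the helper names are hypothetical,
and it assumes "8GB" and "1MB" mean 8 GiB and 1 MiB.

```c
#include <stdint.h>

/*
 * Hypothetical helpers for illustration only; they are not XFS code.
 * Assumption: sizes are binary units (GiB, MiB, KiB).
 */

/* Extent count if every extent is exactly the COW extent size hint. */
uint64_t ideal_extent_count(uint64_t file_bytes, uint64_t hint_bytes)
{
	return file_bytes / hint_bytes;
}

/* Average extent size in KiB when the file is split into nr_extents. */
uint64_t avg_extent_kib(uint64_t file_bytes, uint64_t nr_extents)
{
	return file_bytes / nr_extents / 1024;
}
```

With an 8 GiB file and a 1 MiB hint, ideal_extent_count() gives 8192
extents, matching the ~8k figure. At 100,000 extents, avg_extent_kib()
gives 83 KiB, consistent with the reported average below 100kB.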

While this particular test is instrumented, it reflects a fairly
reasonable pattern in practice where random I/Os might spread out
over a large period of time with varying periods of (in)activity.
For example, consider a cloned disk image file for a VM or container
with long uptime and variable and bursty usage. A background blockgc
scan that races and processes the image file when it happens to be
clean and idle can have a significant effect on the future
fragmentation level of the file, even when still in use.

To help combat this, update the heuristic to skip cowblocks inodes
that are currently opened for write access during non-sync blockgc
scans. This allows COW fork preallocations to persist for as long as
possible unless otherwise needed for functional purposes (i.e. a
sync scan), the file is idle and closed, or the inode is being
evicted from cache. While here, update the comments to help
distinguish performance oriented heuristics from the logic that
exists to maintain functional correctness.
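
The updated decision order can be modelled outside the kernel. This is a
simplified userspace sketch under stated assumptions, not the code in
fs/xfs/xfs_icache.c: the struct, its fields, and the function name are
hypothetical stand-ins for the checks that xfs_prep_free_cowblocks()
performs on the real inode.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of the inode state the heuristic consults. */
struct cow_inode_state {
	bool cow_fork_empty;     /* no COW fork blocks left to free */
	bool open_for_write;     /* the file is currently open for write */
	bool dirty_or_io_active; /* dirty pagecache, writeback, or dio in flight */
};

/* Returns true if a blockgc scan may trim this inode's COW fork blocks. */
bool may_trim_cowblocks(const struct cow_inode_state *st, bool sync_scan)
{
	/* Nothing to free; the real code also clears the cowblocks tag here. */
	if (st->cow_fork_empty)
		return false;

	/* Performance heuristic: background scans skip files open for write. */
	if (!sync_scan && st->open_for_write)
		return false;

	/* Correctness rule: never free blocks out from under pending I/O. */
	if (st->dirty_or_io_active)
		return false;

	return true;
}
```

In this model, a clean inode that is open for write is skipped by a
background scan but still processed by a sync scan, while a dirty inode is
skipped by both.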

Suggested-by: Darrick Wong <djwong@kernel.org>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/xfs/xfs_icache.c |   31 +++++++++++++++++++++++--------
 1 file changed, 23 insertions(+), 8 deletions(-)

--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -1234,14 +1234,17 @@ xfs_inode_clear_eofblocks_tag(
 }
 
 /*
- * Set ourselves up to free CoW blocks from this file.  If it's already clean
- * then we can bail out quickly, but otherwise we must back off if the file
- * is undergoing some kind of write.
+ * Prepare to free COW fork blocks from an inode.
  */
 static bool
 xfs_prep_free_cowblocks(
-	struct xfs_inode	*ip)
+	struct xfs_inode	*ip,
+	struct xfs_icwalk	*icw)
 {
+	bool			sync;
+
+	sync = icw && (icw->icw_flags & XFS_ICWALK_FLAG_SYNC);
+
 	/*
 	 * Just clear the tag if we have an empty cow fork or none at all. It's
 	 * possible the inode was fully unshared since it was originally tagged.
@@ -1253,9 +1256,21 @@ xfs_prep_free_cowblocks(
 	}
 
 	/*
-	 * If the mapping is dirty or under writeback we cannot touch the
-	 * CoW fork.  Leave it alone if we're in the midst of a directio.
+	 * A cowblocks trim of an inode can have a significant effect on
+	 * fragmentation even when a reasonable COW extent size hint is set.
+	 * Therefore, we prefer to not process cowblocks unless they are clean
+	 * and idle. We can never process a cowblocks inode that is dirty or has
+	 * in-flight I/O under any circumstances, because outstanding writeback
+	 * or dio expects targeted COW fork blocks exist through write
+	 * completion where they can be remapped into the data fork.
+	 *
+	 * Therefore, the heuristic used here is to never process inodes
+	 * currently opened for write from background (i.e. non-sync) scans. For
+	 * sync scans, use the pagecache/dio state of the inode to ensure we
+	 * never free COW fork blocks out from under pending I/O.
 	 */
+	if (!sync && inode_is_open_for_write(VFS_I(ip)))
+		return false;
 	if ((VFS_I(ip)->i_state & I_DIRTY_PAGES) ||
 	    mapping_tagged(VFS_I(ip)->i_mapping, PAGECACHE_TAG_DIRTY) ||
 	    mapping_tagged(VFS_I(ip)->i_mapping, PAGECACHE_TAG_WRITEBACK) ||
@@ -1291,7 +1306,7 @@ xfs_inode_free_cowblocks(
 	if (!xfs_iflags_test(ip, XFS_ICOWBLOCKS))
 		return 0;
 
-	if (!xfs_prep_free_cowblocks(ip))
+	if (!xfs_prep_free_cowblocks(ip, icw))
 		return 0;
 
 	if (!xfs_icwalk_match(ip, icw))
@@ -1320,7 +1335,7 @@ xfs_inode_free_cowblocks(
 	 * Check again, nobody else should be able to dirty blocks or change
 	 * the reflink iflag now that we have the first two locks held.
 	 */
-	if (xfs_prep_free_cowblocks(ip))
+	if (xfs_prep_free_cowblocks(ip, icw))
 		ret = xfs_reflink_cancel_cow_range(ip, 0, NULLFILEOFF, false);
 	return ret;
 }


Patches currently in stable-queue which might be from catherine.hoang@oracle.com are

queue-6.6/xfs-return-bool-from-xfs_attr3_leaf_add.patch
queue-6.6/xfs-fix-a-sloppy-memory-handling-bug-in-xfs_iroot_realloc.patch
queue-6.6/xfs-streamline-xfs_filestream_pick_ag.patch
queue-6.6/xfs-merge-xfs_attr_leaf_try_add-into-xfs_attr_leaf_addname.patch
queue-6.6/xfs-don-t-free-cowblocks-from-under-dirty-pagecache-on-unshare.patch
queue-6.6/xfs-pass-the-exact-range-to-initialize-to-xfs_initialize_perag.patch
queue-6.6/xfs-assert-a-valid-limit-in-xfs_rtfind_forw.patch
queue-6.6/xfs-don-t-use-__gfp_retry_mayfail-in-xfs_initialize_perag.patch
queue-6.6/xfs-use-try_cmpxchg-in-xlog_cil_insert_pcp_aggregate.patch
queue-6.6/xfs-don-t-ifdef-around-the-exact-minlen-allocations.patch
queue-6.6/xfs-reduce-unnecessary-searches-when-searching-for-the-best-extents.patch
queue-6.6/xfs-validate-inumber-in-xfs_iget.patch
queue-6.6/xfs-support-lowmode-allocations-in-xfs_bmap_exact_minlen_extent_alloc.patch
queue-6.6/xfs-skip-background-cowblock-trims-on-inodes-open-for-write.patch
queue-6.6/xfs-remove-empty-declartion-in-header-file.patch
queue-6.6/xfs-fold-xfs_bmap_alloc_userdata-into-xfs_bmapi_allocate.patch
queue-6.6/xfs-update-the-file-system-geometry-after-recoverying-superblock-buffers.patch
queue-6.6/xfs-call-xfs_bmap_exact_minlen_extent_alloc-from-xfs_bmap_btalloc.patch
queue-6.6/xfs-distinguish-extra-split-from-real-enospc-from-xfs_attr_node_try_addname.patch
queue-6.6/xfs-error-out-when-a-superblock-buffer-update-reduces-the-agcount.patch
queue-6.6/xfs-update-the-pag-for-the-last-ag-at-recovery-time.patch
queue-6.6/xfs-check-for-delayed-allocations-before-setting-extsize.patch
queue-6.6/xfs-fix-a-typo.patch
queue-6.6/xfs-distinguish-extra-split-from-real-enospc-from-xfs_attr3_leaf_split.patch
