public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <djwong@kernel.org>
To: Brian Foster <bfoster@redhat.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH v2] xfs: skip background cowblock trims on inodes open for write
Date: Tue, 17 Sep 2024 11:24:41 -0700	[thread overview]
Message-ID: <20240917182441.GH182194@frogsfrogsfrogs> (raw)
In-Reply-To: <20240903124713.23289-1-bfoster@redhat.com>

On Tue, Sep 03, 2024 at 08:47:13AM -0400, Brian Foster wrote:
> The background blockgc scanner runs on a 5m interval by default and
> trims preallocation (post-eof and cow fork) from inodes that are
> otherwise idle. Idle effectively means that iolock can be acquired
> without blocking and that the inode has no dirty pagecache or I/O in
> flight.
> 
> This simple mechanism and heuristic has worked fairly well for
> post-eof speculative preallocations. Support for reflink and COW
> fork preallocations came sometime later and plugged into the same
> mechanism, with similar heuristics. Some recent testing has shown
> that COW fork preallocation may be notably more sensitive to blockgc
> processing than post-eof preallocation, however.
> 
> For example, consider an 8GB reflinked file with a COW extent size
> hint of 1MB. A worst case fully randomized overwrite of this file
> results in ~8k extents of an average size of ~1MB. If the same
> workload is interrupted a couple times for blockgc processing
> (assuming the file goes idle), the resulting extent count explodes
> to over 100k extents with an average size <100kB. This is
> significantly worse than ideal and essentially defeats the COW
> extent size hint mechanism.
> 
> While this particular test is instrumented, it reflects a fairly
> reasonable pattern in practice where random I/Os might spread out
> over a large period of time with varying periods of (in)activity.
> For example, consider a cloned disk image file for a VM or container
> with long uptime and variable and bursty usage. A background blockgc
> scan that races and processes the image file when it happens to be
> clean and idle can have a significant effect on the future
> fragmentation level of the file, even when still in use.
> 
> To help combat this, update the heuristic to skip cowblocks inodes
> that are currently opened for write access during non-sync blockgc
> scans. This allows COW fork preallocations to persist for as long as
> possible unless otherwise needed for functional purposes (i.e. a
> sync scan), the file is idle and closed, or the inode is being
> evicted from cache. While here, update the comments to help
> distinguish performance oriented heuristics from the logic that
> exists to maintain functional correctness.
> 
> Suggested-by: Darrick Wong <djwong@kernel.org>
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
> 
> v2:
> - Reorder logic and update comments in xfs_prep_free_cowblocks().
> v1: https://lore.kernel.org/linux-xfs/20240214165231.84925-1-bfoster@redhat.com/
> 
>  fs/xfs/xfs_icache.c | 31 +++++++++++++++++++++++--------
>  1 file changed, 23 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index cf629302d48e..900a6277d931 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -1241,14 +1241,17 @@ xfs_inode_clear_eofblocks_tag(
>  }
>  
>  /*
> - * Set ourselves up to free CoW blocks from this file.  If it's already clean
> - * then we can bail out quickly, but otherwise we must back off if the file
> - * is undergoing some kind of write.
> + * Prepare to free COW fork blocks from an inode.
>   */
>  static bool
>  xfs_prep_free_cowblocks(
> -	struct xfs_inode	*ip)
> +	struct xfs_inode	*ip,
> +	struct xfs_icwalk	*icw)
>  {
> +	bool			sync;
> +
> +	sync = icw && (icw->icw_flags & XFS_ICWALK_FLAG_SYNC);
> +
>  	/*
>  	 * Just clear the tag if we have an empty cow fork or none at all. It's
>  	 * possible the inode was fully unshared since it was originally tagged.
> @@ -1260,9 +1263,21 @@ xfs_prep_free_cowblocks(
>  	}
>  
>  	/*
> -	 * If the mapping is dirty or under writeback we cannot touch the
> -	 * CoW fork.  Leave it alone if we're in the midst of a directio.
> +	 * A cowblocks trim of an inode can have a significant effect on
> +	 * fragmentation even when a reasonable COW extent size hint is set.
> +	 * Therefore, we prefer to not process cowblocks unless they are clean
> +	 * and idle. We can never process a cowblocks inode that is dirty or has
> +	 * in-flight I/O under any circumstances, because outstanding writeback
> +	 * or dio expects targeted COW fork blocks exist through write
> +	 * completion where they can be remapped into the data fork.
> +	 *
> +	 * Therefore, the heuristic used here is to never process inodes
> +	 * currently opened for write from background (i.e. non-sync) scans. For
> +	 * sync scans, use the pagecache/dio state of the inode to ensure we
> +	 * never free COW fork blocks out from under pending I/O.

Sounds good to me!
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

>  	 */
> +	if (!sync && inode_is_open_for_write(VFS_I(ip)))
> +		return false;
>  	if ((VFS_I(ip)->i_state & I_DIRTY_PAGES) ||
>  	    mapping_tagged(VFS_I(ip)->i_mapping, PAGECACHE_TAG_DIRTY) ||
>  	    mapping_tagged(VFS_I(ip)->i_mapping, PAGECACHE_TAG_WRITEBACK) ||
> @@ -1298,7 +1313,7 @@ xfs_inode_free_cowblocks(
>  	if (!xfs_iflags_test(ip, XFS_ICOWBLOCKS))
>  		return 0;
>  
> -	if (!xfs_prep_free_cowblocks(ip))
> +	if (!xfs_prep_free_cowblocks(ip, icw))
>  		return 0;
>  
>  	if (!xfs_icwalk_match(ip, icw))
> @@ -1327,7 +1342,7 @@ xfs_inode_free_cowblocks(
>  	 * Check again, nobody else should be able to dirty blocks or change
>  	 * the reflink iflag now that we have the first two locks held.
>  	 */
> -	if (xfs_prep_free_cowblocks(ip))
> +	if (xfs_prep_free_cowblocks(ip, icw))
>  		ret = xfs_reflink_cancel_cow_range(ip, 0, NULLFILEOFF, false);
>  	return ret;
>  }
> -- 
> 2.45.0
> 
> 

  parent reply	other threads:[~2024-09-17 18:24 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-09-03 12:47 [PATCH v2] xfs: skip background cowblock trims on inodes open for write Brian Foster
2024-09-06 11:40 ` [PATCH 2/1] xfs: don't free cowblocks from under dirty pagecache on unshare Brian Foster
2024-09-17 18:31   ` Darrick J. Wong
2024-09-18 12:22     ` Brian Foster
2024-09-17 18:24 ` Darrick J. Wong [this message]
2024-09-30 17:09 ` [PATCH v2] xfs: skip background cowblock trims on inodes open for write Darrick J. Wong
2024-10-11  7:42 ` Carlos Maiolino

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240917182441.GH182194@frogsfrogsfrogs \
    --to=djwong@kernel.org \
    --cc=bfoster@redhat.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox