From: "Darrick J. Wong" <djwong@kernel.org>
To: Brian Foster <bfoster@redhat.com>, Carlos Maiolino <cem@kernel.org>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH v2] xfs: skip background cowblock trims on inodes open for write
Date: Mon, 30 Sep 2024 10:09:14 -0700
Message-ID: <20240930170914.GU21853@frogsfrogsfrogs>
In-Reply-To: <20240903124713.23289-1-bfoster@redhat.com>

On Tue, Sep 03, 2024 at 08:47:13AM -0400, Brian Foster wrote:
> The background blockgc scanner runs on a 5m interval by default and
> trims preallocation (post-eof and cow fork) from inodes that are
> otherwise idle. Idle effectively means that iolock can be acquired
> without blocking and that the inode has no dirty pagecache or I/O in
> flight.

Hey, can we get this data corruption fix staged for 6.12-rc2, please?
Yesterday's fstests push contained the exerciser for this bug.

--D

> This simple mechanism and heuristic has worked fairly well for
> post-eof speculative preallocations. Support for reflink and COW
> fork preallocations came sometime later and plugged into the same
> mechanism, with similar heuristics. Some recent testing has shown
> that COW fork preallocation may be notably more sensitive to blockgc
> processing than post-eof preallocation, however.
>
> For example, consider an 8GB reflinked file with a COW extent size
> hint of 1MB. A worst case fully randomized overwrite of this file
> results in ~8k extents of an average size of ~1MB. If the same
> workload is interrupted a couple times for blockgc processing
> (assuming the file goes idle), the resulting extent count explodes
> to over 100k extents with an average size <100kB. This is
> significantly worse than ideal and essentially defeats the COW
> extent size hint mechanism.
>
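
As a rough sanity check of the arithmetic in the example above (this is not part of the patch), the following userspace sketch computes the ideal extent count and the average extent size. The helper names are hypothetical, and it assumes binary units (8GB = 8 GiB, 1MB = 1 MiB).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of extents if every extent is exactly
 * hint_bytes long, i.e. the ideal result of honoring the COW extent
 * size hint.
 */
static uint64_t extents_at_hint(uint64_t file_bytes, uint64_t hint_bytes)
{
	assert(hint_bytes != 0);
	return file_bytes / hint_bytes;
}

/*
 * Hypothetical helper: average extent size in KiB (rounded down) when
 * a file of file_bytes is split into nr_extents extents.
 */
static uint64_t avg_extent_kib(uint64_t file_bytes, uint64_t nr_extents)
{
	assert(nr_extents != 0);
	return file_bytes / nr_extents / 1024;
}
```

With an 8 GiB file and a 1 MiB hint, extents_at_hint() gives 8192 extents (the "~8k" figure), and avg_extent_kib() for 100,000 extents gives 83 KiB, which is consistent with the "<100kB" average quoted for the fragmented case.
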
> While this particular test is instrumented, it reflects a fairly
> reasonable pattern in practice where random I/Os might spread out
> over a large period of time with varying periods of (in)activity.
> For example, consider a cloned disk image file for a VM or container
> with long uptime and variable and bursty usage. A background blockgc
> scan that races and processes the image file when it happens to be
> clean and idle can have a significant effect on the future
> fragmentation level of the file, even when still in use.
>
> To help combat this, update the heuristic to skip cowblocks inodes
> that are currently opened for write access during non-sync blockgc
> scans. This allows COW fork preallocations to persist for as long as
> possible unless otherwise needed for functional purposes (i.e. a
> sync scan), the file is idle and closed, or the inode is being
> evicted from cache. While here, update the comments to help
> distinguish performance oriented heuristics from the logic that
> exists to maintain functional correctness.
>
> Suggested-by: Darrick Wong <djwong@kernel.org>
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
>
> v2:
> - Reorder logic and update comments in xfs_prep_free_cowblocks().
> v1: https://lore.kernel.org/linux-xfs/20240214165231.84925-1-bfoster@redhat.com/
>
> fs/xfs/xfs_icache.c | 31 +++++++++++++++++++++++--------
> 1 file changed, 23 insertions(+), 8 deletions(-)
>
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index cf629302d48e..900a6277d931 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -1241,14 +1241,17 @@ xfs_inode_clear_eofblocks_tag(
>  }
>
>  /*
> - * Set ourselves up to free CoW blocks from this file. If it's already clean
> - * then we can bail out quickly, but otherwise we must back off if the file
> - * is undergoing some kind of write.
> + * Prepare to free COW fork blocks from an inode.
>   */
>  static bool
>  xfs_prep_free_cowblocks(
> -	struct xfs_inode	*ip)
> +	struct xfs_inode	*ip,
> +	struct xfs_icwalk	*icw)
>  {
> +	bool			sync;
> +
> +	sync = icw && (icw->icw_flags & XFS_ICWALK_FLAG_SYNC);
> +
>  	/*
>  	 * Just clear the tag if we have an empty cow fork or none at all. It's
>  	 * possible the inode was fully unshared since it was originally tagged.
> @@ -1260,9 +1263,21 @@ xfs_prep_free_cowblocks(
>  	}
>
>  	/*
> -	 * If the mapping is dirty or under writeback we cannot touch the
> -	 * CoW fork. Leave it alone if we're in the midst of a directio.
> +	 * A cowblocks trim of an inode can have a significant effect on
> +	 * fragmentation even when a reasonable COW extent size hint is set.
> +	 * Therefore, we prefer to not process cowblocks unless they are clean
> +	 * and idle. We can never process a cowblocks inode that is dirty or has
> +	 * in-flight I/O under any circumstances, because outstanding writeback
> +	 * or dio expects targeted COW fork blocks exist through write
> +	 * completion where they can be remapped into the data fork.
> +	 *
> +	 * Therefore, the heuristic used here is to never process inodes
> +	 * currently opened for write from background (i.e. non-sync) scans. For
> +	 * sync scans, use the pagecache/dio state of the inode to ensure we
> +	 * never free COW fork blocks out from under pending I/O.
>  	 */
> +	if (!sync && inode_is_open_for_write(VFS_I(ip)))
> +		return false;
>  	if ((VFS_I(ip)->i_state & I_DIRTY_PAGES) ||
>  	    mapping_tagged(VFS_I(ip)->i_mapping, PAGECACHE_TAG_DIRTY) ||
>  	    mapping_tagged(VFS_I(ip)->i_mapping, PAGECACHE_TAG_WRITEBACK) ||
> @@ -1298,7 +1313,7 @@ xfs_inode_free_cowblocks(
>  	if (!xfs_iflags_test(ip, XFS_ICOWBLOCKS))
>  		return 0;
>
> -	if (!xfs_prep_free_cowblocks(ip))
> +	if (!xfs_prep_free_cowblocks(ip, icw))
>  		return 0;
>
>  	if (!xfs_icwalk_match(ip, icw))
> @@ -1327,7 +1342,7 @@ xfs_inode_free_cowblocks(
>  	 * Check again, nobody else should be able to dirty blocks or change
>  	 * the reflink iflag now that we have the first two locks held.
>  	 */
> -	if (xfs_prep_free_cowblocks(ip))
> +	if (xfs_prep_free_cowblocks(ip, icw))
>  		ret = xfs_reflink_cancel_cow_range(ip, 0, NULLFILEOFF, false);
>  	return ret;
>  }
> --
> 2.45.0
>
>
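
For anyone reading along, the ordering of checks in the updated xfs_prep_free_cowblocks() can be modeled in userspace roughly as below. This is only a sketch under stated assumptions, not the kernel code: struct model_inode, the verdict enum and model_prep_free_cowblocks() are hypothetical stand-ins, each kernel check is reduced to a boolean, and the direct I/O condition is taken from the comment text because the quoted hunk ends before that part of the expression.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the inode state the kernel function inspects. */
struct model_inode {
	bool has_cow_extents;    /* COW fork exists and is non-empty */
	bool open_for_write;     /* models inode_is_open_for_write() */
	bool dirty_or_writeback; /* models I_DIRTY_PAGES / mapping_tagged() checks */
	bool dio_in_flight;      /* models the in-flight direct I/O check (assumed) */
};

enum model_verdict {
	MODEL_CLEAR_TAG, /* nothing to trim, only the cowblocks tag is cleared */
	MODEL_SKIP,      /* leave the COW fork preallocation alone */
	MODEL_TRIM,      /* safe to cancel the COW fork blocks */
};

/* Mirrors the order of checks in the patched function. */
static enum model_verdict
model_prep_free_cowblocks(const struct model_inode *ip, bool sync)
{
	/* Empty or missing COW fork: nothing to free. */
	if (!ip->has_cow_extents)
		return MODEL_CLEAR_TAG;

	/* Performance heuristic: background scans skip files open for write. */
	if (!sync && ip->open_for_write)
		return MODEL_SKIP;

	/* Correctness: never trim out from under dirty data or pending I/O. */
	if (ip->dirty_or_writeback || ip->dio_in_flight)
		return MODEL_SKIP;

	return MODEL_TRIM;
}
```

The point of the model is that the open-for-write test only applies to background scans, while the dirty/in-flight test applies to every scan, so a sync scan can still trim a clean file that is open for write.
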