public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <djwong@kernel.org>
To: Brian Foster <bfoster@redhat.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 2/1] xfs: don't free cowblocks from under dirty pagecache on unshare
Date: Tue, 17 Sep 2024 11:31:42 -0700	[thread overview]
Message-ID: <20240917183142.GI182194@frogsfrogsfrogs> (raw)
In-Reply-To: <20240906114051.120743-1-bfoster@redhat.com>

On Fri, Sep 06, 2024 at 07:40:51AM -0400, Brian Foster wrote:
> fallocate unshare mode explicitly breaks extent sharing. When a
> command completes, it checks the data fork for any remaining shared
> extents to determine whether the reflink inode flag and COW fork
> preallocation can be removed. This logic doesn't consider in-core
> pagecache and I/O state, however, which means we can unsafely remove
> COW fork blocks that are still needed under certain conditions.
> 
> For example, consider the following command sequence:
> 
> xfs_io -fc "pwrite 0 1k" -c "reflink <file> 0 256k 1k" \
> 	-c "pwrite 0 32k" -c "funshare 0 1k" <file>
> 
> This allocates a data block at offset 0, shares it, and then
> overwrites it with a larger buffered write. The overwrite triggers
> COW fork preallocation, 32 blocks by default, which maps the entire
> 32k write to delalloc in the COW fork. All but the shared block at
> offset 0 remains hole mapped in the data fork. The unshare command
> redirties and flushes the folio at offset 0, removing the only
> shared extent from the inode. Since the inode no longer maps shared
> extents, unshare purges the COW fork before the remaining 28k may
> have written back.
> 
> This leaves dirty pagecache backed by holes, which writeback quietly
> skips, thus leaving clean, non-zeroed pagecache over holes in the
> file. To verify, fiemap shows holes in the first 32k of the file and
> reads return different data across a remount:
> 
> $ xfs_io -c "fiemap -v" <file>
> <file>:
>  EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
>    ...
>    1: [8..511]:        hole               504
>    ...
> $ xfs_io -c "pread -v 4k 8" <file>
> 00001000:  cd cd cd cd cd cd cd cd  ........
> $ umount <mnt>; mount <dev> <mnt>
> $ xfs_io -c "pread -v 4k 8" <file>
> 00001000:  00 00 00 00 00 00 00 00  ........
> 
> To avoid this problem, make unshare follow the same rules used for
> background cowblock scanning and never purge the COW fork for inodes
> with dirty pagecache or in-flight I/O.
> 
> Fixes: 46afb0628b ("xfs: only flush the unshared range in xfs_reflink_unshare")
> Signed-off-by: Brian Foster <bfoster@redhat.com>

Question: Does xfs_repair report orphaned cow staging blocks after this?
There's a longstanding bug that I've seen in the long soak xfs/286 VM
where we slowly leak cow fork blocks (~80 per ~1 billion fsxops over 7
days).

Anyhow this looks correct on its own so
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
> 
> Here's another COW issue I came across via some unshare testing. A quick
> hack to enable unshare in fsx uncovered it. I'll follow up with a proper
> patch for that.
> 
> I'm sending this as a 2/1 here just to reflect patch order in my local
> tree. Also note that I haven't explicitly tested the fixes commit, but a
> quick test to switch back to the old full flush behavior on latest
> master also makes the problem go away, so I suspect that's where the
> regression was introduced.
> 
> Brian
> 
>  fs/xfs/xfs_icache.c  |  8 +-------
>  fs/xfs/xfs_reflink.c |  3 +++
>  fs/xfs/xfs_reflink.h | 19 +++++++++++++++++++
>  3 files changed, 23 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index 900a6277d931..a1b34e6ccfe2 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -1278,13 +1278,7 @@ xfs_prep_free_cowblocks(
>  	 */
>  	if (!sync && inode_is_open_for_write(VFS_I(ip)))
>  		return false;
> -	if ((VFS_I(ip)->i_state & I_DIRTY_PAGES) ||
> -	    mapping_tagged(VFS_I(ip)->i_mapping, PAGECACHE_TAG_DIRTY) ||
> -	    mapping_tagged(VFS_I(ip)->i_mapping, PAGECACHE_TAG_WRITEBACK) ||
> -	    atomic_read(&VFS_I(ip)->i_dio_count))
> -		return false;
> -
> -	return true;
> +	return xfs_can_free_cowblocks(ip);
>  }
>  
>  /*
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index 6fde6ec8092f..5bf6682e701b 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -1595,6 +1595,9 @@ xfs_reflink_clear_inode_flag(
>  
>  	ASSERT(xfs_is_reflink_inode(ip));
>  
> +	if (!xfs_can_free_cowblocks(ip))
> +		return 0;
> +
>  	error = xfs_reflink_inode_has_shared_extents(*tpp, ip, &needs_flag);
>  	if (error || needs_flag)
>  		return error;
> diff --git a/fs/xfs/xfs_reflink.h b/fs/xfs/xfs_reflink.h
> index fb55e4ce49fa..4a58e4533671 100644
> --- a/fs/xfs/xfs_reflink.h
> +++ b/fs/xfs/xfs_reflink.h
> @@ -6,6 +6,25 @@
>  #ifndef __XFS_REFLINK_H
>  #define __XFS_REFLINK_H 1
>  
> +/*
> + * Check whether it is safe to free COW fork blocks from an inode. It is unsafe
> + * to do so when an inode has dirty cache or I/O in-flight, even if no shared
> + * extents exist in the data fork, because outstanding I/O may target blocks
> + * that were speculatively allocated to the COW fork.
> + */
> +static inline bool
> +xfs_can_free_cowblocks(struct xfs_inode *ip)
> +{
> +	struct inode *inode = VFS_I(ip);
> +
> +	if ((inode->i_state & I_DIRTY_PAGES) ||
> +	    mapping_tagged(inode->i_mapping, PAGECACHE_TAG_DIRTY) ||
> +	    mapping_tagged(inode->i_mapping, PAGECACHE_TAG_WRITEBACK) ||
> +	    atomic_read(&inode->i_dio_count))
> +		return false;
> +	return true;
> +}
> +
>  extern int xfs_reflink_trim_around_shared(struct xfs_inode *ip,
>  		struct xfs_bmbt_irec *irec, bool *shared);
>  int xfs_bmap_trim_cow(struct xfs_inode *ip, struct xfs_bmbt_irec *imap,
> -- 
> 2.45.0
> 
> 

  reply	other threads:[~2024-09-17 18:31 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-09-03 12:47 [PATCH v2] xfs: skip background cowblock trims on inodes open for write Brian Foster
2024-09-06 11:40 ` [PATCH 2/1] xfs: don't free cowblocks from under dirty pagecache on unshare Brian Foster
2024-09-17 18:31   ` Darrick J. Wong [this message]
2024-09-18 12:22     ` Brian Foster
2024-09-17 18:24 ` [PATCH v2] xfs: skip background cowblock trims on inodes open for write Darrick J. Wong
2024-09-30 17:09 ` Darrick J. Wong
2024-10-11  7:42 ` Carlos Maiolino

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240917183142.GI182194@frogsfrogsfrogs \
    --to=djwong@kernel.org \
    --cc=bfoster@redhat.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox