From: Brian Foster <bfoster@redhat.com>
To: linux-xfs@vger.kernel.org
Cc: Darrick Wong <djwong@kernel.org>
Subject: [PATCH 2/1] xfs: don't free cowblocks from under dirty pagecache on unshare
Date: Fri, 6 Sep 2024 07:40:51 -0400 [thread overview]
Message-ID: <20240906114051.120743-1-bfoster@redhat.com> (raw)
In-Reply-To: <20240903124713.23289-1-bfoster@redhat.com>
fallocate unshare mode explicitly breaks extent sharing. When a
command completes, it checks the data fork for any remaining shared
extents to determine whether the reflink inode flag and COW fork
preallocation can be removed. This logic doesn't consider in-core
pagecache and I/O state, however, which means we can unsafely remove
COW fork blocks that are still needed under certain conditions.
For example, consider the following command sequence:
xfs_io -fc "pwrite 0 1k" -c "reflink <file> 0 256k 1k" \
-c "pwrite 0 32k" -c "funshare 0 1k" <file>
This allocates a data block at offset 0, shares it, and then
overwrites it with a larger buffered write. The overwrite triggers
COW fork preallocation, 32 blocks by default, which maps the entire
32k write to delalloc in the COW fork. All but the shared block at
offset 0 remains hole mapped in the data fork. The unshare command
redirties and flushes the folio at offset 0, removing the only
shared extent from the inode. Since the inode no longer maps shared
extents, unshare purges the COW fork before the remaining 28k may
have written back.
This leaves dirty pagecache backed by holes, which writeback quietly
skips, thus leaving clean, non-zeroed pagecache over holes in the
file. To verify, fiemap shows holes in the first 32k of the file and
reads return different data across a remount:
$ xfs_io -c "fiemap -v" <file>
<file>:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
...
1: [8..511]: hole 504
...
$ xfs_io -c "pread -v 4k 8" <file>
00001000: cd cd cd cd cd cd cd cd ........
$ umount <mnt>; mount <dev> <mnt>
$ xfs_io -c "pread -v 4k 8" <file>
00001000: 00 00 00 00 00 00 00 00 ........
To avoid this problem, make unshare follow the same rules used for
background cowblock scanning and never purge the COW fork for inodes
with dirty pagecache or in-flight I/O.
Fixes: 46afb0628b ("xfs: only flush the unshared range in xfs_reflink_unshare")
Signed-off-by: Brian Foster <bfoster@redhat.com>
---
Here's another COW issue I came across via some unshare testing. A quick
hack to enable unshare in fsx uncovered it. I'll follow up with a proper
patch for that.
I'm sending this as a 2/1 here just to reflect patch order in my local
tree. Also note that I haven't explicitly tested the fixes commit, but a
quick test to switch back to the old full flush behavior on latest
master also makes the problem go away, so I suspect that's where the
regression was introduced.
Brian
fs/xfs/xfs_icache.c | 8 +-------
fs/xfs/xfs_reflink.c | 3 +++
fs/xfs/xfs_reflink.h | 19 +++++++++++++++++++
3 files changed, 23 insertions(+), 7 deletions(-)
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 900a6277d931..a1b34e6ccfe2 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -1278,13 +1278,7 @@ xfs_prep_free_cowblocks(
*/
if (!sync && inode_is_open_for_write(VFS_I(ip)))
return false;
- if ((VFS_I(ip)->i_state & I_DIRTY_PAGES) ||
- mapping_tagged(VFS_I(ip)->i_mapping, PAGECACHE_TAG_DIRTY) ||
- mapping_tagged(VFS_I(ip)->i_mapping, PAGECACHE_TAG_WRITEBACK) ||
- atomic_read(&VFS_I(ip)->i_dio_count))
- return false;
-
- return true;
+ return xfs_can_free_cowblocks(ip);
}
/*
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 6fde6ec8092f..5bf6682e701b 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -1595,6 +1595,9 @@ xfs_reflink_clear_inode_flag(
ASSERT(xfs_is_reflink_inode(ip));
+ if (!xfs_can_free_cowblocks(ip))
+ return 0;
+
error = xfs_reflink_inode_has_shared_extents(*tpp, ip, &needs_flag);
if (error || needs_flag)
return error;
diff --git a/fs/xfs/xfs_reflink.h b/fs/xfs/xfs_reflink.h
index fb55e4ce49fa..4a58e4533671 100644
--- a/fs/xfs/xfs_reflink.h
+++ b/fs/xfs/xfs_reflink.h
@@ -6,6 +6,25 @@
#ifndef __XFS_REFLINK_H
#define __XFS_REFLINK_H 1
+/*
+ * Check whether it is safe to free COW fork blocks from an inode. It is unsafe
+ * to do so when an inode has dirty cache or I/O in-flight, even if no shared
+ * extents exist in the data fork, because outstanding I/O may target blocks
+ * that were speculatively allocated to the COW fork.
+ */
+static inline bool
+xfs_can_free_cowblocks(struct xfs_inode *ip)
+{
+ struct inode *inode = VFS_I(ip);
+
+ if ((inode->i_state & I_DIRTY_PAGES) ||
+ mapping_tagged(inode->i_mapping, PAGECACHE_TAG_DIRTY) ||
+ mapping_tagged(inode->i_mapping, PAGECACHE_TAG_WRITEBACK) ||
+ atomic_read(&inode->i_dio_count))
+ return false;
+ return true;
+}
+
extern int xfs_reflink_trim_around_shared(struct xfs_inode *ip,
struct xfs_bmbt_irec *irec, bool *shared);
int xfs_bmap_trim_cow(struct xfs_inode *ip, struct xfs_bmbt_irec *imap,
--
2.45.0
next prev parent reply other threads:[~2024-09-06 11:39 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-09-03 12:47 [PATCH v2] xfs: skip background cowblock trims on inodes open for write Brian Foster
2024-09-06 11:40 ` Brian Foster [this message]
2024-09-17 18:31 ` [PATCH 2/1] xfs: don't free cowblocks from under dirty pagecache on unshare Darrick J. Wong
2024-09-18 12:22 ` Brian Foster
2024-09-17 18:24 ` [PATCH v2] xfs: skip background cowblock trims on inodes open for write Darrick J. Wong
2024-09-30 17:09 ` Darrick J. Wong
2024-10-11 7:42 ` Carlos Maiolino
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240906114051.120743-1-bfoster@redhat.com \
--to=bfoster@redhat.com \
--cc=djwong@kernel.org \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox