linux-fsdevel.vger.kernel.org archive mirror
* fix stale delalloc punching for COW I/O
@ 2024-08-27  5:09 Christoph Hellwig
  2024-08-27  5:09 ` [PATCH 01/10] iomap: handle a post-direct I/O invalidate race in iomap_write_delalloc_release Christoph Hellwig
                   ` (9 more replies)
  0 siblings, 10 replies; 36+ messages in thread
From: Christoph Hellwig @ 2024-08-27  5:09 UTC (permalink / raw)
  To: Chandan Babu R, Christian Brauner, Darrick J. Wong
  Cc: linux-xfs, linux-fsdevel

Hi all,

this is more fallout from the zoned XFS work, which stresses the XFS
COW I/O path very heavily.  The bug affects normal I/O to reflinked files
as well, but is very hard to hit there.

The main problem here is that we only punch out delalloc reservations
from the data fork, but COW I/O places delalloc extents into the COW
fork, which means that it won't get punched out for short writes.
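[Editorial note: before any fork is touched, a short-write punch boils down to a byte-range computation.  The sketch below is a minimal userspace model of that computation as done in iomap_file_buffered_write_punch_delalloc; the names are illustrative stand-ins, not the kernel API.]

```c
/* Userspace model of the punch-range computation in
 * iomap_file_buffered_write_punch_delalloc(); names are illustrative.
 * The unused reservation starts at the first block after the bytes
 * actually written and ends at the rounded-up end of the reservation. */
#include <assert.h>

typedef long long loff_m;

static loff_m round_up_m(loff_m x, loff_m bs)   { return ((x + bs - 1) / bs) * bs; }
static loff_m round_down_m(loff_m x, loff_m bs) { return (x / bs) * bs; }

/* Returns 1 and sets [*start, *end) if something must be punched. */
static int punch_range(loff_m pos, loff_m length, loff_m written,
		       loff_m blocksize, loff_m *start, loff_m *end)
{
	if (written)
		*start = round_up_m(pos + written, blocksize);
	else
		*start = round_down_m(pos, blocksize);
	*end = round_up_m(pos + length, blocksize);
	return *start < *end;	/* nothing to do for a full-length write */
}
```

A fully written extent yields an empty range and no punch; a completely failed write punches from the containing block onward.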

Diffstat:
 fs/iomap/buffered-io.c |  139 +++++++++++++++++++++++--------------------------
 fs/xfs/xfs_aops.c      |    4 -
 fs/xfs/xfs_bmap_util.c |   10 ++-
 fs/xfs/xfs_bmap_util.h |    2 
 fs/xfs/xfs_iomap.c     |   57 ++++++++------------
 include/linux/iomap.h  |   11 ++-
 6 files changed, 109 insertions(+), 114 deletions(-)

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 01/10] iomap: handle a post-direct I/O invalidate race in iomap_write_delalloc_release
  2024-08-27  5:09 fix stale delalloc punching for COW I/O Christoph Hellwig
@ 2024-08-27  5:09 ` Christoph Hellwig
  2024-08-27 16:14   ` Darrick J. Wong
  2024-08-27  5:09 ` [PATCH 02/10] iomap: improve shared block detection in iomap_unshare_iter Christoph Hellwig
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 36+ messages in thread
From: Christoph Hellwig @ 2024-08-27  5:09 UTC (permalink / raw)
  To: Chandan Babu R, Christian Brauner, Darrick J. Wong
  Cc: linux-xfs, linux-fsdevel

When direct I/O completion invalidates the page cache, it holds neither
the i_rwsem nor the invalidate_lock, so it can race with
iomap_write_delalloc_release.  If the search for the end of the region that
contains data returns the start offset, we hit such a race and just need to
look for the end of the newly created hole instead.
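[Editorial note: the fix amounts to classifying the pair of seek results inside the scan loop.  A userspace sketch of that per-iteration decision (illustrative names, not the kernel API; the seek results stand in for mapping_seek_hole_data):]

```c
/* Sketch of the decision made per iteration of the scan loop in
 * iomap_write_delalloc_release() after this patch; all names are
 * illustrative stand-ins for the kernel code. */
#include <assert.h>

enum scan_action {
	SCAN_DONE,	/* no more data, punch the remainder */
	SCAN_RETRY,	/* raced with post-DIO invalidation, re-seek */
	SCAN_DATA,	/* walk folios in [start_byte, data_end) */
};

static enum scan_action classify(long long start_byte, long long data_end,
				 long long scan_end_byte)
{
	if (start_byte < 0 || start_byte == scan_end_byte)
		return SCAN_DONE;
	/*
	 * Direct I/O completion may invalidate the cache without holding
	 * i_rwsem or invalidate_lock, so the data found by the first seek
	 * can be gone by the time we look for its end: data_end then
	 * equals start_byte and we must simply look for the next range
	 * instead of warning.
	 */
	if (data_end == start_byte)
		return SCAN_RETRY;
	return SCAN_DATA;
}
```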

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/buffered-io.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index f420c53d86acc5..69a931de1979b9 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1241,7 +1241,15 @@ static int iomap_write_delalloc_release(struct inode *inode,
 			error = data_end;
 			goto out_unlock;
 		}
-		WARN_ON_ONCE(data_end <= start_byte);
+
+		/*
+		 * If we race with post-direct I/O invalidation of the page cache,
+		 * there might be no data left at start_byte.
+		 */
+		if (data_end == start_byte)
+			continue;
+
+		WARN_ON_ONCE(data_end < start_byte);
 		WARN_ON_ONCE(data_end > scan_end_byte);
 
 		error = iomap_write_delalloc_scan(inode, &punch_start_byte,
-- 
2.43.0



* [PATCH 02/10] iomap: improve shared block detection in iomap_unshare_iter
  2024-08-27  5:09 fix stale delalloc punching for COW I/O Christoph Hellwig
  2024-08-27  5:09 ` [PATCH 01/10] iomap: handle a post-direct I/O invalidate race in iomap_write_delalloc_release Christoph Hellwig
@ 2024-08-27  5:09 ` Christoph Hellwig
  2024-08-27  5:44   ` Darrick J. Wong
  2024-08-27  5:09 ` [PATCH 03/10] iomap: pass flags to iomap_file_buffered_write_punch_delalloc Christoph Hellwig
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 36+ messages in thread
From: Christoph Hellwig @ 2024-08-27  5:09 UTC (permalink / raw)
  To: Chandan Babu R, Christian Brauner, Darrick J. Wong
  Cc: linux-xfs, linux-fsdevel

Currently iomap_unshare_iter relies on the IOMAP_F_SHARED flag to detect
blocks to unshare.  This is reasonable, but IOMAP_F_SHARED is also useful
for the file system to do internal bookkeeping for out of place writes.
XFS used to do that, until it was removed in commit 72a048c1056a
("xfs: only set IOMAP_F_SHARED when providing a srcmap to a write")
because unshare would incorrectly unshare such blocks.

Add an extra safeguard by checking the explicitly provided srcmap for
valid data instead of falling back to the iomap, as that easily catches
the case where we'd just copy from the same place we'd write to.  This
allows reinstating IOMAP_F_SHARED for all XFS writes that go to the
COW fork.
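[Editorial note: the resulting two-part check can be modeled in userspace as below; the struct names mirror the iomap ones but are simplified, illustrative stand-ins.]

```c
/* Userspace sketch of the unshare check after this patch.  A srcmap
 * whose type is HOLE means no separate source map was provided, so
 * there is nothing to copy from and nothing to unshare. */
#include <assert.h>

enum m_type { M_HOLE, M_UNWRITTEN, M_MAPPED, M_DELALLOC };
#define M_F_SHARED 0x1

struct m_map  { unsigned flags; enum m_type type; };
struct m_iter { struct m_map iomap; struct m_map srcmap; };

/* Unshare only when the block is flagged shared AND the explicitly
 * provided srcmap (not the iomap fallback) contains valid data. */
static int should_unshare(const struct m_iter *it)
{
	if (!(it->iomap.flags & M_F_SHARED))
		return 0;
	if (it->srcmap.type == M_HOLE || it->srcmap.type == M_UNWRITTEN)
		return 0;
	return 1;
}
```

This is what lets IOMAP_F_SHARED double as COW-fork bookkeeping: the flag alone no longer triggers an unshare copy.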

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/buffered-io.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 69a931de1979b9..737a005082e035 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1337,16 +1337,25 @@ EXPORT_SYMBOL_GPL(iomap_file_buffered_write_punch_delalloc);
 static loff_t iomap_unshare_iter(struct iomap_iter *iter)
 {
 	struct iomap *iomap = &iter->iomap;
-	const struct iomap *srcmap = iomap_iter_srcmap(iter);
 	loff_t pos = iter->pos;
 	loff_t length = iomap_length(iter);
 	loff_t written = 0;
 
-	/* don't bother with blocks that are not shared to start with */
+	/* Don't bother with blocks that are not shared to start with. */
 	if (!(iomap->flags & IOMAP_F_SHARED))
 		return length;
-	/* don't bother with holes or unwritten extents */
-	if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN)
+
+	/*
+	 * Don't bother with holes or unwritten extents.
+	 *
+	 * Note that we use srcmap directly instead of iomap_iter_srcmap as
+	 * unsharing requires providing a separate source map, and the presence
+	 * of one is a good indicator that unsharing is needed, unlike
+	 * IOMAP_F_SHARED which can be set for any data that goes into the COW
+	 * fork for XFS.
+	 */
+	if (iter->srcmap.type == IOMAP_HOLE ||
+	    iter->srcmap.type == IOMAP_UNWRITTEN)
 		return length;
 
 	do {
-- 
2.43.0



* [PATCH 03/10] iomap: pass flags to iomap_file_buffered_write_punch_delalloc
  2024-08-27  5:09 fix stale delalloc punching for COW I/O Christoph Hellwig
  2024-08-27  5:09 ` [PATCH 01/10] iomap: handle a post-direct I/O invalidate race in iomap_write_delalloc_release Christoph Hellwig
  2024-08-27  5:09 ` [PATCH 02/10] iomap: improve shared block detection in iomap_unshare_iter Christoph Hellwig
@ 2024-08-27  5:09 ` Christoph Hellwig
  2024-08-27 16:22   ` Darrick J. Wong
  2024-08-27  5:09 ` [PATCH 04/10] iomap: zeroing already holds invalidate_lock in iomap_file_buffered_write_punch_delalloc Christoph Hellwig
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 36+ messages in thread
From: Christoph Hellwig @ 2024-08-27  5:09 UTC (permalink / raw)
  To: Chandan Babu R, Christian Brauner, Darrick J. Wong
  Cc: linux-xfs, linux-fsdevel

To fix short write error handling, we'll need to figure out what operation
iomap_file_buffered_write_punch_delalloc is called for.  Pass the flags
argument on to it, and reorder the argument list to match that of
->iomap_end so that the compiler only has to add the new punch argument
to the end of it instead of reshuffling the registers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/buffered-io.c |  5 ++---
 fs/xfs/xfs_iomap.c     |  5 +++--
 include/linux/iomap.h  | 10 ++++++----
 3 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 737a005082e035..34de9f58794ad5 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -23,7 +23,6 @@
 
 #define IOEND_BATCH_SIZE	4096
 
-typedef int (*iomap_punch_t)(struct inode *inode, loff_t offset, loff_t length);
 /*
  * Structure allocated for each folio to track per-block uptodate, dirty state
  * and I/O completions.
@@ -1300,8 +1299,8 @@ static int iomap_write_delalloc_release(struct inode *inode,
  *         internal filesystem allocation lock
  */
 int iomap_file_buffered_write_punch_delalloc(struct inode *inode,
-		struct iomap *iomap, loff_t pos, loff_t length,
-		ssize_t written, iomap_punch_t punch)
+		loff_t pos, loff_t length, ssize_t written, unsigned flags,
+		struct iomap *iomap, iomap_punch_t punch)
 {
 	loff_t			start_byte;
 	loff_t			end_byte;
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 72c981e3dc9211..47b5c83588259e 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1231,8 +1231,9 @@ xfs_buffered_write_iomap_end(
 	struct xfs_mount	*mp = XFS_M(inode->i_sb);
 	int			error;
 
-	error = iomap_file_buffered_write_punch_delalloc(inode, iomap, offset,
-			length, written, &xfs_buffered_write_delalloc_punch);
+	error = iomap_file_buffered_write_punch_delalloc(inode, offset, length,
+			written, flags, iomap,
+			&xfs_buffered_write_delalloc_punch);
 	if (error && !xfs_is_shutdown(mp)) {
 		xfs_alert(mp, "%s: unable to clean up ino 0x%llx",
 			__func__, XFS_I(inode)->i_ino);
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 6fc1c858013d1e..83da37d64d1144 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -258,10 +258,6 @@ static inline const struct iomap *iomap_iter_srcmap(const struct iomap_iter *i)
 
 ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from,
 		const struct iomap_ops *ops);
-int iomap_file_buffered_write_punch_delalloc(struct inode *inode,
-		struct iomap *iomap, loff_t pos, loff_t length, ssize_t written,
-		int (*punch)(struct inode *inode, loff_t pos, loff_t length));
-
 int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops);
 void iomap_readahead(struct readahead_control *, const struct iomap_ops *ops);
 bool iomap_is_partially_uptodate(struct folio *, size_t from, size_t count);
@@ -277,6 +273,12 @@ int iomap_truncate_page(struct inode *inode, loff_t pos, bool *did_zero,
 		const struct iomap_ops *ops);
 vm_fault_t iomap_page_mkwrite(struct vm_fault *vmf,
 			const struct iomap_ops *ops);
+
+typedef int (*iomap_punch_t)(struct inode *inode, loff_t offset, loff_t length);
+int iomap_file_buffered_write_punch_delalloc(struct inode *inode, loff_t pos,
+		loff_t length, ssize_t written, unsigned flag,
+		struct iomap *iomap, iomap_punch_t punch);
+
 int iomap_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
 		u64 start, u64 len, const struct iomap_ops *ops);
 loff_t iomap_seek_hole(struct inode *inode, loff_t offset,
-- 
2.43.0



* [PATCH 04/10] iomap: zeroing already holds invalidate_lock in iomap_file_buffered_write_punch_delalloc
  2024-08-27  5:09 fix stale delalloc punching for COW I/O Christoph Hellwig
                   ` (2 preceding siblings ...)
  2024-08-27  5:09 ` [PATCH 03/10] iomap: pass flags to iomap_file_buffered_write_punch_delalloc Christoph Hellwig
@ 2024-08-27  5:09 ` Christoph Hellwig
  2024-08-27 16:28   ` Darrick J. Wong
  2024-08-27  5:09 ` [PATCH 05/10] iomap: pass the iomap to the punch callback Christoph Hellwig
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 36+ messages in thread
From: Christoph Hellwig @ 2024-08-27  5:09 UTC (permalink / raw)
  To: Chandan Babu R, Christian Brauner, Darrick J. Wong
  Cc: linux-xfs, linux-fsdevel

All callers of iomap_zero_range already hold invalidate_lock, so we can't
take it again in iomap_file_buffered_write_punch_delalloc.

Use the passed-in flags argument to detect whether we're called from a
zeroing operation, and don't take the lock again in that case.
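[Editorial note: the conditional-locking shape introduced here can be sketched in userspace with the lock modeled as a counter; all names are illustrative, not the kernel API.]

```c
/* Userspace model of the conditional locking added by this patch: a
 * zeroing caller already holds invalidate_lock, so only take it for
 * regular writes, and assert it is held otherwise. */
#include <assert.h>

#define M_ZERO 0x1	/* stand-in for IOMAP_ZERO */

struct m_mapping { int invalidate_lock_held; };

static void m_lock(struct m_mapping *m)   { m->invalidate_lock_held++; }
static void m_unlock(struct m_mapping *m) { m->invalidate_lock_held--; }

/* Returns the lock depth observed while the punch work runs. */
static int delalloc_release(struct m_mapping *m, unsigned flags)
{
	int depth;

	if (flags & M_ZERO)
		assert(m->invalidate_lock_held);  /* caller must hold it */
	else
		m_lock(m);

	depth = m->invalidate_lock_held;	/* ... punch work ... */

	if (!(flags & M_ZERO))
		m_unlock(m);
	return depth;
}
```

Either way the punch work runs with the lock held exactly once, and the lock's ownership stays with whoever took it.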

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/buffered-io.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 34de9f58794ad5..574ca413516443 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1198,8 +1198,8 @@ static int iomap_write_delalloc_scan(struct inode *inode,
  * require sprinkling this code with magic "+ 1" and "- 1" arithmetic and expose
  * the code to subtle off-by-one bugs....
  */
-static int iomap_write_delalloc_release(struct inode *inode,
-		loff_t start_byte, loff_t end_byte, iomap_punch_t punch)
+static int iomap_write_delalloc_release(struct inode *inode, loff_t start_byte,
+		loff_t end_byte, unsigned flags, iomap_punch_t punch)
 {
 	loff_t punch_start_byte = start_byte;
 	loff_t scan_end_byte = min(i_size_read(inode), end_byte);
@@ -1210,8 +1210,13 @@ static int iomap_write_delalloc_release(struct inode *inode,
 	 * folios and dirtying them via ->page_mkwrite whilst we walk the
 	 * cache and perform delalloc extent removal. Failing to do this can
 	 * leave dirty pages with no space reservation in the cache.
+	 *
+	 * For zeroing operations the callers already hold invalidate_lock.
 	 */
-	filemap_invalidate_lock(inode->i_mapping);
+	if (flags & IOMAP_ZERO)
+		rwsem_assert_held_write(&inode->i_mapping->invalidate_lock);
+	else
+		filemap_invalidate_lock(inode->i_mapping);
 	while (start_byte < scan_end_byte) {
 		loff_t		data_end;
 
@@ -1264,7 +1269,8 @@ static int iomap_write_delalloc_release(struct inode *inode,
 		error = punch(inode, punch_start_byte,
 				end_byte - punch_start_byte);
 out_unlock:
-	filemap_invalidate_unlock(inode->i_mapping);
+	if (!(flags & IOMAP_ZERO))
+		filemap_invalidate_unlock(inode->i_mapping);
 	return error;
 }
 
@@ -1328,7 +1334,7 @@ int iomap_file_buffered_write_punch_delalloc(struct inode *inode,
 	if (start_byte >= end_byte)
 		return 0;
 
-	return iomap_write_delalloc_release(inode, start_byte, end_byte,
+	return iomap_write_delalloc_release(inode, start_byte, end_byte, flags,
 					punch);
 }
 EXPORT_SYMBOL_GPL(iomap_file_buffered_write_punch_delalloc);
-- 
2.43.0



* [PATCH 05/10] iomap: pass the iomap to the punch callback
  2024-08-27  5:09 fix stale delalloc punching for COW I/O Christoph Hellwig
                   ` (3 preceding siblings ...)
  2024-08-27  5:09 ` [PATCH 04/10] iomap: zeroing already holds invalidate_lock in iomap_file_buffered_write_punch_delalloc Christoph Hellwig
@ 2024-08-27  5:09 ` Christoph Hellwig
  2024-08-27 16:28   ` Darrick J. Wong
  2024-08-27  5:09 ` [PATCH 06/10] iomap: remove the iomap_file_buffered_write_punch_delalloc return value Christoph Hellwig
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 36+ messages in thread
From: Christoph Hellwig @ 2024-08-27  5:09 UTC (permalink / raw)
  To: Chandan Babu R, Christian Brauner, Darrick J. Wong
  Cc: linux-xfs, linux-fsdevel

XFS will need to look at the flags in the iomap structure, so pass it
down all the way to the callback.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/buffered-io.c | 25 +++++++++++++------------
 fs/xfs/xfs_iomap.c     |  3 ++-
 include/linux/iomap.h  |  3 ++-
 3 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 574ca413516443..7950cbecb78c22 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1047,7 +1047,7 @@ EXPORT_SYMBOL_GPL(iomap_file_buffered_write);
 
 static int iomap_write_delalloc_ifs_punch(struct inode *inode,
 		struct folio *folio, loff_t start_byte, loff_t end_byte,
-		iomap_punch_t punch)
+		struct iomap *iomap, iomap_punch_t punch)
 {
 	unsigned int first_blk, last_blk, i;
 	loff_t last_byte;
@@ -1072,7 +1072,7 @@ static int iomap_write_delalloc_ifs_punch(struct inode *inode,
 	for (i = first_blk; i <= last_blk; i++) {
 		if (!ifs_block_is_dirty(folio, ifs, i)) {
 			ret = punch(inode, folio_pos(folio) + (i << blkbits),
-				    1 << blkbits);
+				    1 << blkbits, iomap);
 			if (ret)
 				return ret;
 		}
@@ -1084,7 +1084,7 @@ static int iomap_write_delalloc_ifs_punch(struct inode *inode,
 
 static int iomap_write_delalloc_punch(struct inode *inode, struct folio *folio,
 		loff_t *punch_start_byte, loff_t start_byte, loff_t end_byte,
-		iomap_punch_t punch)
+		struct iomap *iomap, iomap_punch_t punch)
 {
 	int ret = 0;
 
@@ -1094,14 +1094,14 @@ static int iomap_write_delalloc_punch(struct inode *inode, struct folio *folio,
 	/* if dirty, punch up to offset */
 	if (start_byte > *punch_start_byte) {
 		ret = punch(inode, *punch_start_byte,
-				start_byte - *punch_start_byte);
+				start_byte - *punch_start_byte, iomap);
 		if (ret)
 			return ret;
 	}
 
 	/* Punch non-dirty blocks within folio */
-	ret = iomap_write_delalloc_ifs_punch(inode, folio, start_byte,
-			end_byte, punch);
+	ret = iomap_write_delalloc_ifs_punch(inode, folio, start_byte, end_byte,
+			iomap, punch);
 	if (ret)
 		return ret;
 
@@ -1134,7 +1134,7 @@ static int iomap_write_delalloc_punch(struct inode *inode, struct folio *folio,
  */
 static int iomap_write_delalloc_scan(struct inode *inode,
 		loff_t *punch_start_byte, loff_t start_byte, loff_t end_byte,
-		iomap_punch_t punch)
+		struct iomap *iomap, iomap_punch_t punch)
 {
 	while (start_byte < end_byte) {
 		struct folio	*folio;
@@ -1150,7 +1150,7 @@ static int iomap_write_delalloc_scan(struct inode *inode,
 		}
 
 		ret = iomap_write_delalloc_punch(inode, folio, punch_start_byte,
-						 start_byte, end_byte, punch);
+				start_byte, end_byte, iomap, punch);
 		if (ret) {
 			folio_unlock(folio);
 			folio_put(folio);
@@ -1199,7 +1199,8 @@ static int iomap_write_delalloc_scan(struct inode *inode,
  * the code to subtle off-by-one bugs....
  */
 static int iomap_write_delalloc_release(struct inode *inode, loff_t start_byte,
-		loff_t end_byte, unsigned flags, iomap_punch_t punch)
+		loff_t end_byte, unsigned flags, struct iomap *iomap,
+		iomap_punch_t punch)
 {
 	loff_t punch_start_byte = start_byte;
 	loff_t scan_end_byte = min(i_size_read(inode), end_byte);
@@ -1257,7 +1258,7 @@ static int iomap_write_delalloc_release(struct inode *inode, loff_t start_byte,
 		WARN_ON_ONCE(data_end > scan_end_byte);
 
 		error = iomap_write_delalloc_scan(inode, &punch_start_byte,
-				start_byte, data_end, punch);
+				start_byte, data_end, iomap, punch);
 		if (error)
 			goto out_unlock;
 
@@ -1267,7 +1268,7 @@ static int iomap_write_delalloc_release(struct inode *inode, loff_t start_byte,
 
 	if (punch_start_byte < end_byte)
 		error = punch(inode, punch_start_byte,
-				end_byte - punch_start_byte);
+				end_byte - punch_start_byte, iomap);
 out_unlock:
 	if (!(flags & IOMAP_ZERO))
 		filemap_invalidate_unlock(inode->i_mapping);
@@ -1335,7 +1336,7 @@ int iomap_file_buffered_write_punch_delalloc(struct inode *inode,
 		return 0;
 
 	return iomap_write_delalloc_release(inode, start_byte, end_byte, flags,
-					punch);
+					iomap, punch);
 }
 EXPORT_SYMBOL_GPL(iomap_file_buffered_write_punch_delalloc);
 
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 47b5c83588259e..695e5bee776f94 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1212,7 +1212,8 @@ static int
 xfs_buffered_write_delalloc_punch(
 	struct inode		*inode,
 	loff_t			offset,
-	loff_t			length)
+	loff_t			length,
+	struct iomap		*iomap)
 {
 	xfs_bmap_punch_delalloc_range(XFS_I(inode), offset, offset + length);
 	return 0;
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 83da37d64d1144..a931190f6d858b 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -274,7 +274,8 @@ int iomap_truncate_page(struct inode *inode, loff_t pos, bool *did_zero,
 vm_fault_t iomap_page_mkwrite(struct vm_fault *vmf,
 			const struct iomap_ops *ops);
 
-typedef int (*iomap_punch_t)(struct inode *inode, loff_t offset, loff_t length);
+typedef int (*iomap_punch_t)(struct inode *inode, loff_t offset, loff_t length,
+		struct iomap *iomap);
 int iomap_file_buffered_write_punch_delalloc(struct inode *inode, loff_t pos,
 		loff_t length, ssize_t written, unsigned flag,
 		struct iomap *iomap, iomap_punch_t punch);
-- 
2.43.0



* [PATCH 06/10] iomap: remove the iomap_file_buffered_write_punch_delalloc return value
  2024-08-27  5:09 fix stale delalloc punching for COW I/O Christoph Hellwig
                   ` (4 preceding siblings ...)
  2024-08-27  5:09 ` [PATCH 05/10] iomap: pass the iomap to the punch callback Christoph Hellwig
@ 2024-08-27  5:09 ` Christoph Hellwig
  2024-08-27 16:36   ` Darrick J. Wong
  2024-08-27  5:09 ` [PATCH 07/10] xfs: support the COW fork in xfs_bmap_punch_delalloc_range Christoph Hellwig
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 36+ messages in thread
From: Christoph Hellwig @ 2024-08-27  5:09 UTC (permalink / raw)
  To: Chandan Babu R, Christian Brauner, Darrick J. Wong
  Cc: linux-xfs, linux-fsdevel

iomap_file_buffered_write_punch_delalloc can only return errors if either
the ->punch callback returned an error, or if someone changed the API of
mapping_seek_hole_data to return a negative error code other than
-ENXIO.

As the only instance of ->punch never returns an error, and such an error
would be fatal anyway, remove the entire error propagation and don't
return an error code from iomap_file_buffered_write_punch_delalloc.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/buffered-io.c | 82 +++++++++++++++---------------------------
 fs/xfs/xfs_iomap.c     | 17 ++-------
 include/linux/iomap.h  |  4 +--
 3 files changed, 33 insertions(+), 70 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 7950cbecb78c22..3d7e69a542518a 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1045,7 +1045,7 @@ iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *i,
 }
 EXPORT_SYMBOL_GPL(iomap_file_buffered_write);
 
-static int iomap_write_delalloc_ifs_punch(struct inode *inode,
+static void iomap_write_delalloc_ifs_punch(struct inode *inode,
 		struct folio *folio, loff_t start_byte, loff_t end_byte,
 		struct iomap *iomap, iomap_punch_t punch)
 {
@@ -1053,7 +1053,6 @@ static int iomap_write_delalloc_ifs_punch(struct inode *inode,
 	loff_t last_byte;
 	u8 blkbits = inode->i_blkbits;
 	struct iomap_folio_state *ifs;
-	int ret = 0;
 
 	/*
 	 * When we have per-block dirty tracking, there can be
@@ -1063,47 +1062,35 @@ static int iomap_write_delalloc_ifs_punch(struct inode *inode,
 	 */
 	ifs = folio->private;
 	if (!ifs)
-		return ret;
+		return;
 
 	last_byte = min_t(loff_t, end_byte - 1,
 			folio_pos(folio) + folio_size(folio) - 1);
 	first_blk = offset_in_folio(folio, start_byte) >> blkbits;
 	last_blk = offset_in_folio(folio, last_byte) >> blkbits;
 	for (i = first_blk; i <= last_blk; i++) {
-		if (!ifs_block_is_dirty(folio, ifs, i)) {
-			ret = punch(inode, folio_pos(folio) + (i << blkbits),
+		if (!ifs_block_is_dirty(folio, ifs, i))
+			punch(inode, folio_pos(folio) + (i << blkbits),
 				    1 << blkbits, iomap);
-			if (ret)
-				return ret;
-		}
 	}
-
-	return ret;
 }
 
-
-static int iomap_write_delalloc_punch(struct inode *inode, struct folio *folio,
+static void iomap_write_delalloc_punch(struct inode *inode, struct folio *folio,
 		loff_t *punch_start_byte, loff_t start_byte, loff_t end_byte,
 		struct iomap *iomap, iomap_punch_t punch)
 {
-	int ret = 0;
-
 	if (!folio_test_dirty(folio))
-		return ret;
+		return;
 
 	/* if dirty, punch up to offset */
 	if (start_byte > *punch_start_byte) {
-		ret = punch(inode, *punch_start_byte,
-				start_byte - *punch_start_byte, iomap);
-		if (ret)
-			return ret;
+		punch(inode, *punch_start_byte, start_byte - *punch_start_byte,
+				iomap);
 	}
 
 	/* Punch non-dirty blocks within folio */
-	ret = iomap_write_delalloc_ifs_punch(inode, folio, start_byte, end_byte,
+	iomap_write_delalloc_ifs_punch(inode, folio, start_byte, end_byte,
 			iomap, punch);
-	if (ret)
-		return ret;
 
 	/*
 	 * Make sure the next punch start is correctly bound to
@@ -1111,8 +1098,6 @@ static int iomap_write_delalloc_punch(struct inode *inode, struct folio *folio,
 	 */
 	*punch_start_byte = min_t(loff_t, end_byte,
 				folio_pos(folio) + folio_size(folio));
-
-	return ret;
 }
 
 /*
@@ -1132,13 +1117,12 @@ static int iomap_write_delalloc_punch(struct inode *inode, struct folio *folio,
  * This function uses [start_byte, end_byte) intervals (i.e. open ended) to
  * simplify range iterations.
  */
-static int iomap_write_delalloc_scan(struct inode *inode,
+static void iomap_write_delalloc_scan(struct inode *inode,
 		loff_t *punch_start_byte, loff_t start_byte, loff_t end_byte,
 		struct iomap *iomap, iomap_punch_t punch)
 {
 	while (start_byte < end_byte) {
 		struct folio	*folio;
-		int ret;
 
 		/* grab locked page */
 		folio = filemap_lock_folio(inode->i_mapping,
@@ -1149,20 +1133,14 @@ static int iomap_write_delalloc_scan(struct inode *inode,
 			continue;
 		}
 
-		ret = iomap_write_delalloc_punch(inode, folio, punch_start_byte,
+		iomap_write_delalloc_punch(inode, folio, punch_start_byte,
 				start_byte, end_byte, iomap, punch);
-		if (ret) {
-			folio_unlock(folio);
-			folio_put(folio);
-			return ret;
-		}
 
 		/* move offset to start of next folio in range */
 		start_byte = folio_next_index(folio) << PAGE_SHIFT;
 		folio_unlock(folio);
 		folio_put(folio);
 	}
-	return 0;
 }
 
 /*
@@ -1198,13 +1176,12 @@ static int iomap_write_delalloc_scan(struct inode *inode,
  * require sprinkling this code with magic "+ 1" and "- 1" arithmetic and expose
  * the code to subtle off-by-one bugs....
  */
-static int iomap_write_delalloc_release(struct inode *inode, loff_t start_byte,
+static void iomap_write_delalloc_release(struct inode *inode, loff_t start_byte,
 		loff_t end_byte, unsigned flags, struct iomap *iomap,
 		iomap_punch_t punch)
 {
 	loff_t punch_start_byte = start_byte;
 	loff_t scan_end_byte = min(i_size_read(inode), end_byte);
-	int error = 0;
 
 	/*
 	 * Lock the mapping to avoid races with page faults re-instantiating
@@ -1226,13 +1203,15 @@ static int iomap_write_delalloc_release(struct inode *inode, loff_t start_byte,
 		/*
 		 * If there is no more data to scan, all that is left is to
 		 * punch out the remaining range.
+		 *
+		 * Note that mapping_seek_hole_data is only supposed to return
+		 * either an offset or -ENXIO, so WARN on any other error as
+		 * that would be an API change without updating the callers.
 		 */
 		if (start_byte == -ENXIO || start_byte == scan_end_byte)
 			break;
-		if (start_byte < 0) {
-			error = start_byte;
+		if (WARN_ON_ONCE(start_byte < 0))
 			goto out_unlock;
-		}
 		WARN_ON_ONCE(start_byte < punch_start_byte);
 		WARN_ON_ONCE(start_byte > scan_end_byte);
 
@@ -1242,10 +1221,8 @@ static int iomap_write_delalloc_release(struct inode *inode, loff_t start_byte,
 		 */
 		data_end = mapping_seek_hole_data(inode->i_mapping, start_byte,
 				scan_end_byte, SEEK_HOLE);
-		if (data_end < 0) {
-			error = data_end;
+		if (WARN_ON_ONCE(data_end < 0))
 			goto out_unlock;
-		}
 
 		/*
 		 * If we race with post-direct I/O invalidation of the page cache,
@@ -1257,22 +1234,19 @@ static int iomap_write_delalloc_release(struct inode *inode, loff_t start_byte,
 		WARN_ON_ONCE(data_end < start_byte);
 		WARN_ON_ONCE(data_end > scan_end_byte);
 
-		error = iomap_write_delalloc_scan(inode, &punch_start_byte,
-				start_byte, data_end, iomap, punch);
-		if (error)
-			goto out_unlock;
+		iomap_write_delalloc_scan(inode, &punch_start_byte, start_byte,
+				data_end, iomap, punch);
 
 		/* The next data search starts at the end of this one. */
 		start_byte = data_end;
 	}
 
 	if (punch_start_byte < end_byte)
-		error = punch(inode, punch_start_byte,
-				end_byte - punch_start_byte, iomap);
+		punch(inode, punch_start_byte, end_byte - punch_start_byte,
+				iomap);
 out_unlock:
 	if (!(flags & IOMAP_ZERO))
 		filemap_invalidate_unlock(inode->i_mapping);
-	return error;
 }
 
 /*
@@ -1305,7 +1279,7 @@ static int iomap_write_delalloc_release(struct inode *inode, loff_t start_byte,
  *       ->punch
  *         internal filesystem allocation lock
  */
-int iomap_file_buffered_write_punch_delalloc(struct inode *inode,
+void iomap_file_buffered_write_punch_delalloc(struct inode *inode,
 		loff_t pos, loff_t length, ssize_t written, unsigned flags,
 		struct iomap *iomap, iomap_punch_t punch)
 {
@@ -1314,11 +1288,11 @@ int iomap_file_buffered_write_punch_delalloc(struct inode *inode,
 	unsigned int		blocksize = i_blocksize(inode);
 
 	if (iomap->type != IOMAP_DELALLOC)
-		return 0;
+		return;
 
 	/* If we didn't reserve the blocks, we're not allowed to punch them. */
 	if (!(iomap->flags & IOMAP_F_NEW))
-		return 0;
+		return;
 
 	/*
 	 * start_byte refers to the first unused block after a short write. If
@@ -1333,10 +1307,10 @@ int iomap_file_buffered_write_punch_delalloc(struct inode *inode,
 
 	/* Nothing to do if we've written the entire delalloc extent */
 	if (start_byte >= end_byte)
-		return 0;
+		return;
 
-	return iomap_write_delalloc_release(inode, start_byte, end_byte, flags,
-					iomap, punch);
+	iomap_write_delalloc_release(inode, start_byte, end_byte, flags, iomap,
+			punch);
 }
 EXPORT_SYMBOL_GPL(iomap_file_buffered_write_punch_delalloc);
 
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 695e5bee776f94..1e11f48814c0d0 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1208,7 +1208,7 @@ xfs_buffered_write_iomap_begin(
 	return error;
 }
 
-static int
+static void
 xfs_buffered_write_delalloc_punch(
 	struct inode		*inode,
 	loff_t			offset,
@@ -1216,7 +1216,6 @@ xfs_buffered_write_delalloc_punch(
 	struct iomap		*iomap)
 {
 	xfs_bmap_punch_delalloc_range(XFS_I(inode), offset, offset + length);
-	return 0;
 }
 
 static int
@@ -1228,18 +1227,8 @@ xfs_buffered_write_iomap_end(
 	unsigned		flags,
 	struct iomap		*iomap)
 {
-
-	struct xfs_mount	*mp = XFS_M(inode->i_sb);
-	int			error;
-
-	error = iomap_file_buffered_write_punch_delalloc(inode, offset, length,
-			written, flags, iomap,
-			&xfs_buffered_write_delalloc_punch);
-	if (error && !xfs_is_shutdown(mp)) {
-		xfs_alert(mp, "%s: unable to clean up ino 0x%llx",
-			__func__, XFS_I(inode)->i_ino);
-		return error;
-	}
+	iomap_file_buffered_write_punch_delalloc(inode, offset, length, written,
+			flags, iomap, &xfs_buffered_write_delalloc_punch);
 	return 0;
 }
 
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index a931190f6d858b..78a48af4d2c0a1 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -274,9 +274,9 @@ int iomap_truncate_page(struct inode *inode, loff_t pos, bool *did_zero,
 vm_fault_t iomap_page_mkwrite(struct vm_fault *vmf,
 			const struct iomap_ops *ops);
 
-typedef int (*iomap_punch_t)(struct inode *inode, loff_t offset, loff_t length,
+typedef void (*iomap_punch_t)(struct inode *inode, loff_t offset, loff_t length,
 		struct iomap *iomap);
-int iomap_file_buffered_write_punch_delalloc(struct inode *inode, loff_t pos,
+void iomap_file_buffered_write_punch_delalloc(struct inode *inode, loff_t pos,
 		loff_t length, ssize_t written, unsigned flag,
 		struct iomap *iomap, iomap_punch_t punch);
 
-- 
2.43.0



* [PATCH 07/10] xfs: support the COW fork in xfs_bmap_punch_delalloc_range
  2024-08-27  5:09 fix stale delalloc punching for COW I/O Christoph Hellwig
                   ` (5 preceding siblings ...)
  2024-08-27  5:09 ` [PATCH 06/10] iomap: remove the iomap_file_buffered_write_punch_delalloc return value Christoph Hellwig
@ 2024-08-27  5:09 ` Christoph Hellwig
  2024-08-27 16:37   ` Darrick J. Wong
  2024-08-27  5:09 ` [PATCH 08/10] xfs: share a bit more code in xfs_buffered_write_iomap_begin Christoph Hellwig
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 36+ messages in thread
From: Christoph Hellwig @ 2024-08-27  5:09 UTC (permalink / raw)
  To: Chandan Babu R, Christian Brauner, Darrick J. Wong
  Cc: linux-xfs, linux-fsdevel

xfs_buffered_write_iomap_begin can also create delalloc reservations
that need cleaning up.  Prepare for that by adding support for the COW
fork to xfs_bmap_punch_delalloc_range.
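[Editorial note: the fork-aware punch, including the cowblocks-tag clearing added in the diff below, can be modeled in userspace as follows; all names are illustrative stand-ins, and the forks are reduced to byte counters.]

```c
/* Userspace model of a whichfork-aware punch: delalloc bytes live in
 * either the data or the COW fork, and emptying the COW fork clears
 * the "has cowblocks" tag (mirroring xfs_inode_clear_cowblocks_tag). */
#include <assert.h>

enum m_fork { M_DATA_FORK, M_COW_FORK };

struct m_inode {
	long long fork_bytes[2];	/* delalloc bytes per fork */
	int cowblocks_tag;
};

static void punch_delalloc_range(struct m_inode *ip, enum m_fork whichfork,
				 long long nbytes)
{
	if (nbytes > ip->fork_bytes[whichfork])
		nbytes = ip->fork_bytes[whichfork];
	ip->fork_bytes[whichfork] -= nbytes;

	/* once the COW fork is empty there is nothing left to reclaim */
	if (whichfork == M_COW_FORK && !ip->fork_bytes[M_COW_FORK])
		ip->cowblocks_tag = 0;
}
```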

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_aops.c      |  4 ++--
 fs/xfs/xfs_bmap_util.c | 10 +++++++---
 fs/xfs/xfs_bmap_util.h |  2 +-
 fs/xfs/xfs_iomap.c     |  3 ++-
 4 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 6dead20338e24c..559a3a57709748 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -116,7 +116,7 @@ xfs_end_ioend(
 	if (unlikely(error)) {
 		if (ioend->io_flags & IOMAP_F_SHARED) {
 			xfs_reflink_cancel_cow_range(ip, offset, size, true);
-			xfs_bmap_punch_delalloc_range(ip, offset,
+			xfs_bmap_punch_delalloc_range(ip, XFS_DATA_FORK, offset,
 					offset + size);
 		}
 		goto done;
@@ -456,7 +456,7 @@ xfs_discard_folio(
 	 * byte of the next folio. Hence the end offset is only dependent on the
 	 * folio itself and not the start offset that is passed in.
 	 */
-	xfs_bmap_punch_delalloc_range(ip, pos,
+	xfs_bmap_punch_delalloc_range(ip, XFS_DATA_FORK, pos,
 				folio_pos(folio) + folio_size(folio));
 }
 
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index fe2e2c93097550..15c8f90f19a934 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -443,11 +443,12 @@ xfs_getbmap(
 void
 xfs_bmap_punch_delalloc_range(
 	struct xfs_inode	*ip,
+	int			whichfork,
 	xfs_off_t		start_byte,
 	xfs_off_t		end_byte)
 {
 	struct xfs_mount	*mp = ip->i_mount;
-	struct xfs_ifork	*ifp = &ip->i_df;
+	struct xfs_ifork	*ifp = xfs_ifork_ptr(ip, whichfork);
 	xfs_fileoff_t		start_fsb = XFS_B_TO_FSBT(mp, start_byte);
 	xfs_fileoff_t		end_fsb = XFS_B_TO_FSB(mp, end_byte);
 	struct xfs_bmbt_irec	got, del;
@@ -475,11 +476,14 @@ xfs_bmap_punch_delalloc_range(
 			continue;
 		}
 
-		xfs_bmap_del_extent_delay(ip, XFS_DATA_FORK, &icur, &got, &del);
+		xfs_bmap_del_extent_delay(ip, whichfork, &icur, &got, &del);
 		if (!xfs_iext_get_extent(ifp, &icur, &got))
 			break;
 	}
 
+	if (whichfork == XFS_COW_FORK && !ifp->if_bytes)
+		xfs_inode_clear_cowblocks_tag(ip);
+
 out_unlock:
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 }
@@ -590,7 +594,7 @@ xfs_free_eofblocks(
 	 */
 	if (ip->i_diflags & (XFS_DIFLAG_PREALLOC | XFS_DIFLAG_APPEND)) {
 		if (ip->i_delayed_blks) {
-			xfs_bmap_punch_delalloc_range(ip,
+			xfs_bmap_punch_delalloc_range(ip, XFS_DATA_FORK,
 				round_up(XFS_ISIZE(ip), mp->m_sb.sb_blocksize),
 				LLONG_MAX);
 		}
diff --git a/fs/xfs/xfs_bmap_util.h b/fs/xfs/xfs_bmap_util.h
index eb0895bfb9dae4..b29760d36e1ab1 100644
--- a/fs/xfs/xfs_bmap_util.h
+++ b/fs/xfs/xfs_bmap_util.h
@@ -30,7 +30,7 @@ xfs_bmap_rtalloc(struct xfs_bmalloca *ap)
 }
 #endif /* CONFIG_XFS_RT */
 
-void	xfs_bmap_punch_delalloc_range(struct xfs_inode *ip,
+void	xfs_bmap_punch_delalloc_range(struct xfs_inode *ip, int whichfork,
 		xfs_off_t start_byte, xfs_off_t end_byte);
 
 struct kgetbmap {
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 1e11f48814c0d0..24d69c8c168aeb 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1215,7 +1215,8 @@ xfs_buffered_write_delalloc_punch(
 	loff_t			length,
 	struct iomap		*iomap)
 {
-	xfs_bmap_punch_delalloc_range(XFS_I(inode), offset, offset + length);
+	xfs_bmap_punch_delalloc_range(XFS_I(inode), XFS_DATA_FORK, offset,
+			offset + length);
 }
 
 static int
-- 
2.43.0



* [PATCH 08/10] xfs: share a bit more code in xfs_buffered_write_iomap_begin
  2024-08-27  5:09 fix stale delalloc punching for COW I/O Christoph Hellwig
                   ` (6 preceding siblings ...)
  2024-08-27  5:09 ` [PATCH 07/10] xfs: support the COW fork in xfs_bmap_punch_delalloc_range Christoph Hellwig
@ 2024-08-27  5:09 ` Christoph Hellwig
  2024-08-27 16:38   ` Darrick J. Wong
  2024-08-27  5:09 ` [PATCH 09/10] xfs: set IOMAP_F_SHARED for all COW fork allocations Christoph Hellwig
  2024-08-27  5:09 ` [PATCH 10/10] xfs: punch delalloc extents from the COW fork for COW writes Christoph Hellwig
  9 siblings, 1 reply; 36+ messages in thread
From: Christoph Hellwig @ 2024-08-27  5:09 UTC (permalink / raw)
  To: Chandan Babu R, Christian Brauner, Darrick J. Wong
  Cc: linux-xfs, linux-fsdevel

Introduce a local iomap_flags variable so that the code allocating new
delalloc blocks in the data fork can fall through to the found_imap
label and reuse the code to unlock and fill the iomap.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_iomap.c | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 24d69c8c168aeb..e0dc6393686c01 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -975,6 +975,7 @@ xfs_buffered_write_iomap_begin(
 	int			allocfork = XFS_DATA_FORK;
 	int			error = 0;
 	unsigned int		lockmode = XFS_ILOCK_EXCL;
+	unsigned int		iomap_flags = 0;
 	u64			seq;
 
 	if (xfs_is_shutdown(mp))
@@ -1145,6 +1146,11 @@ xfs_buffered_write_iomap_begin(
 		}
 	}
 
+	/*
+	 * Flag newly allocated delalloc blocks with IOMAP_F_NEW so we punch
+	 * them out if the write happens to fail.
+	 */
+	iomap_flags |= IOMAP_F_NEW;
 	if (allocfork == XFS_COW_FORK) {
 		error = xfs_bmapi_reserve_delalloc(ip, allocfork, offset_fsb,
 				end_fsb - offset_fsb, prealloc_blocks, &cmap,
@@ -1162,19 +1168,11 @@ xfs_buffered_write_iomap_begin(
 	if (error)
 		goto out_unlock;
 
-	/*
-	 * Flag newly allocated delalloc blocks with IOMAP_F_NEW so we punch
-	 * them out if the write happens to fail.
-	 */
-	seq = xfs_iomap_inode_sequence(ip, IOMAP_F_NEW);
-	xfs_iunlock(ip, lockmode);
 	trace_xfs_iomap_alloc(ip, offset, count, allocfork, &imap);
-	return xfs_bmbt_to_iomap(ip, iomap, &imap, flags, IOMAP_F_NEW, seq);
-
 found_imap:
-	seq = xfs_iomap_inode_sequence(ip, 0);
+	seq = xfs_iomap_inode_sequence(ip, iomap_flags);
 	xfs_iunlock(ip, lockmode);
-	return xfs_bmbt_to_iomap(ip, iomap, &imap, flags, 0, seq);
+	return xfs_bmbt_to_iomap(ip, iomap, &imap, flags, iomap_flags, seq);
 
 convert_delay:
 	xfs_iunlock(ip, lockmode);
-- 
2.43.0



* [PATCH 09/10] xfs: set IOMAP_F_SHARED for all COW fork allocations
  2024-08-27  5:09 fix stale delalloc punching for COW I/O Christoph Hellwig
                   ` (7 preceding siblings ...)
  2024-08-27  5:09 ` [PATCH 08/10] xfs: share a bit more code in xfs_buffered_write_iomap_begin Christoph Hellwig
@ 2024-08-27  5:09 ` Christoph Hellwig
  2024-08-27 16:44   ` Darrick J. Wong
  2024-08-27  5:09 ` [PATCH 10/10] xfs: punch delalloc extents from the COW fork for COW writes Christoph Hellwig
  9 siblings, 1 reply; 36+ messages in thread
From: Christoph Hellwig @ 2024-08-27  5:09 UTC (permalink / raw)
  To: Chandan Babu R, Christian Brauner, Darrick J. Wong
  Cc: linux-xfs, linux-fsdevel

Change xfs_buffered_write_iomap_begin to always set IOMAP_F_SHARED for
COW fork allocations even if they don't overlap existing data fork
extents, which will allow the iomap_end callback to detect if it has to
punch stale delalloc blocks from the COW fork instead of the data fork.
It also means we sample the sequence counter for both the data and the
COW fork when writing to the COW fork, which ensures we properly
revalidate when only COW fork changes happen.

This is essentially a revert of commit 72a048c1056a ("xfs: only set
IOMAP_F_SHARED when providing a srcmap to a write"). This is fine because
the problem that the commit fixed has now been dealt with in iomap by
only looking at the actual srcmap and not the fallback to the write
iomap.

Note that the direct I/O path was never changed and has always set
IOMAP_F_SHARED for all COW fork allocations.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_iomap.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index e0dc6393686c01..22e9613a995f12 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1186,20 +1186,19 @@ xfs_buffered_write_iomap_begin(
 	return 0;
 
 found_cow:
-	seq = xfs_iomap_inode_sequence(ip, 0);
 	if (imap.br_startoff <= offset_fsb) {
-		error = xfs_bmbt_to_iomap(ip, srcmap, &imap, flags, 0, seq);
+		error = xfs_bmbt_to_iomap(ip, srcmap, &imap, flags, 0,
+				xfs_iomap_inode_sequence(ip, 0));
 		if (error)
 			goto out_unlock;
-		seq = xfs_iomap_inode_sequence(ip, IOMAP_F_SHARED);
-		xfs_iunlock(ip, lockmode);
-		return xfs_bmbt_to_iomap(ip, iomap, &cmap, flags,
-					 IOMAP_F_SHARED, seq);
+	} else {
+		xfs_trim_extent(&cmap, offset_fsb, imap.br_startoff - offset_fsb);
 	}
 
-	xfs_trim_extent(&cmap, offset_fsb, imap.br_startoff - offset_fsb);
+	iomap_flags = IOMAP_F_SHARED;
+	seq = xfs_iomap_inode_sequence(ip, iomap_flags);
 	xfs_iunlock(ip, lockmode);
-	return xfs_bmbt_to_iomap(ip, iomap, &cmap, flags, 0, seq);
+	return xfs_bmbt_to_iomap(ip, iomap, &cmap, flags, iomap_flags, seq);
 
 out_unlock:
 	xfs_iunlock(ip, lockmode);
-- 
2.43.0



* [PATCH 10/10] xfs: punch delalloc extents from the COW fork for COW writes
  2024-08-27  5:09 fix stale delalloc punching for COW I/O Christoph Hellwig
                   ` (8 preceding siblings ...)
  2024-08-27  5:09 ` [PATCH 09/10] xfs: set IOMAP_F_SHARED for all COW fork allocations Christoph Hellwig
@ 2024-08-27  5:09 ` Christoph Hellwig
  2024-08-27 16:44   ` Darrick J. Wong
  9 siblings, 1 reply; 36+ messages in thread
From: Christoph Hellwig @ 2024-08-27  5:09 UTC (permalink / raw)
  To: Chandan Babu R, Christian Brauner, Darrick J. Wong
  Cc: linux-xfs, linux-fsdevel

When ->iomap_end is called on a short write to the COW fork, it needs to
punch stale delalloc data from the COW fork and not the data fork.

Ensure that IOMAP_F_NEW is set for new COW fork allocations in
xfs_buffered_write_iomap_begin, and then use the IOMAP_F_SHARED flag
in xfs_buffered_write_delalloc_punch to decide which fork to punch.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_iomap.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 22e9613a995f12..4113e09cb836a8 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1195,7 +1195,7 @@ xfs_buffered_write_iomap_begin(
 		xfs_trim_extent(&cmap, offset_fsb, imap.br_startoff - offset_fsb);
 	}
 
-	iomap_flags = IOMAP_F_SHARED;
+	iomap_flags |= IOMAP_F_SHARED;
 	seq = xfs_iomap_inode_sequence(ip, iomap_flags);
 	xfs_iunlock(ip, lockmode);
 	return xfs_bmbt_to_iomap(ip, iomap, &cmap, flags, iomap_flags, seq);
@@ -1212,8 +1212,10 @@ xfs_buffered_write_delalloc_punch(
 	loff_t			length,
 	struct iomap		*iomap)
 {
-	xfs_bmap_punch_delalloc_range(XFS_I(inode), XFS_DATA_FORK, offset,
-			offset + length);
+	xfs_bmap_punch_delalloc_range(XFS_I(inode),
+			(iomap->flags & IOMAP_F_SHARED) ?
+				XFS_COW_FORK : XFS_DATA_FORK,
+			offset, offset + length);
 }
 
 static int
-- 
2.43.0



* Re: [PATCH 02/10] iomap: improve shared block detection in iomap_unshare_iter
  2024-08-27  5:09 ` [PATCH 02/10] iomap: improve shared block detection in iomap_unshare_iter Christoph Hellwig
@ 2024-08-27  5:44   ` Darrick J. Wong
  2024-08-27  5:47     ` Christoph Hellwig
  0 siblings, 1 reply; 36+ messages in thread
From: Darrick J. Wong @ 2024-08-27  5:44 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chandan Babu R, Christian Brauner, linux-xfs, linux-fsdevel

On Tue, Aug 27, 2024 at 07:09:49AM +0200, Christoph Hellwig wrote:
> Currently iomap_unshare_iter relies on the IOMAP_F_SHARED flag to detect
> blocks to unshare.  This is reasonable, but IOMAP_F_SHARED is also useful
> for the file system to do internal bookkeeping for out of place writes.
> XFS used to do that, until it got removed in commit 72a048c1056a
> ("xfs: only set IOMAP_F_SHARED when providing a srcmap to a write")
> because unshare would incorrectly unshare such blocks.
> 
> Add an extra safeguard by checking the explicitly provided srcmap instead
> of the fallback to the iomap for valid data, as that easily catches the
> case where we'd just copy from the same place we'd write to, allowing us
> to reinstate setting IOMAP_F_SHARED for all XFS writes that go to the
> COW fork.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/iomap/buffered-io.c | 17 +++++++++++++----
>  1 file changed, 13 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 69a931de1979b9..737a005082e035 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -1337,16 +1337,25 @@ EXPORT_SYMBOL_GPL(iomap_file_buffered_write_punch_delalloc);
>  static loff_t iomap_unshare_iter(struct iomap_iter *iter)
>  {
>  	struct iomap *iomap = &iter->iomap;
> -	const struct iomap *srcmap = iomap_iter_srcmap(iter);
>  	loff_t pos = iter->pos;
>  	loff_t length = iomap_length(iter);
>  	loff_t written = 0;
>  
> -	/* don't bother with blocks that are not shared to start with */
> +	/* Don't bother with blocks that are not shared to start with. */
>  	if (!(iomap->flags & IOMAP_F_SHARED))
>  		return length;
> -	/* don't bother with holes or unwritten extents */
> -	if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN)
> +
> +	/*
> +	 * Don't bother with holes or unwritten extents.
> +	 *
> +	 * Note that we use srcmap directly instead of iomap_iter_srcmap as
> +	 * unsharing requires providing a separate source map, and the presence
> +	 * of one is a good indicator that unsharing is needed, unlike
> +	 * IOMAP_F_SHARED which can be set for any data that goes into the COW

Maybe we should rename it then?

IOMAP_F_OUT_OF_PLACE_WRITE.

Yuck.

IOMAP_F_ELSEWHERE

Not much better.  Maybe just add a comment that "IOMAP_F_SHARED" really
just means an out of place write (aka srcmap is not just a hole).

--D

> +	 * fork for XFS.
> +	 */
> +	if (iter->srcmap.type == IOMAP_HOLE ||
> +	    iter->srcmap.type == IOMAP_UNWRITTEN)
>  		return length;
>  
>  	do {
> -- 
> 2.43.0
> 
> 


* Re: [PATCH 02/10] iomap: improve shared block detection in iomap_unshare_iter
  2024-08-27  5:44   ` Darrick J. Wong
@ 2024-08-27  5:47     ` Christoph Hellwig
  2024-08-27 16:21       ` Darrick J. Wong
  0 siblings, 1 reply; 36+ messages in thread
From: Christoph Hellwig @ 2024-08-27  5:47 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Chandan Babu R, Christian Brauner, linux-xfs,
	linux-fsdevel

On Mon, Aug 26, 2024 at 10:44:24PM -0700, Darrick J. Wong wrote:
> Maybe we should rename it then?
> 
> IOMAP_F_OUT_OF_PLACE_WRITE.
> 
> Yuck.
> 
> IOMAP_F_ELSEWHERE
> 
> Not much better.  Maybe just add a comment that "IOMAP_F_SHARED" really
> just means an out of place write (aka srcmap is not just a hole).

Heh.

For writes it usually means out of place write, but for reporting
it gets translated to the FIEMAP_EXTENT_SHARED flag or is used to
reject swapon.  And then there is black magic in DAX.




* Re: [PATCH 01/10] iomap: handle a post-direct I/O invalidate race in iomap_write_delalloc_release
  2024-08-27  5:09 ` [PATCH 01/10] iomap: handle a post-direct I/O invalidate race in iomap_write_delalloc_release Christoph Hellwig
@ 2024-08-27 16:14   ` Darrick J. Wong
  2024-08-28  4:48     ` Christoph Hellwig
  0 siblings, 1 reply; 36+ messages in thread
From: Darrick J. Wong @ 2024-08-27 16:14 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chandan Babu R, Christian Brauner, linux-xfs, linux-fsdevel

On Tue, Aug 27, 2024 at 07:09:48AM +0200, Christoph Hellwig wrote:
> When direct I/O completion invalidates the page cache it holds neither the
> i_rwsem nor the invalidate_lock so it can be racing with
> iomap_write_delalloc_release.  If the search for the end of the region that
> contains data returns the start offset we hit such a race and just need to
> look for the end of the newly created hole instead.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/iomap/buffered-io.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index f420c53d86acc5..69a931de1979b9 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -1241,7 +1241,15 @@ static int iomap_write_delalloc_release(struct inode *inode,
>  			error = data_end;
>  			goto out_unlock;
>  		}
> -		WARN_ON_ONCE(data_end <= start_byte);
> +
> +		/*
> +		 * If we race with post-direct I/O invalidation of the page cache,
> +		 * there might be no data left at start_byte.
> +		 */
> +		if (data_end == start_byte)
> +			continue;

Is there any chance that we could get stuck in a loop here?  I
think it's the case that if SEEK_HOLE returns data_end == start_byte,
then the next time through the loop, the SEEK_DATA will return something
that is > start_byte.  Unless someone is very rapidly writing and
punching the page cache?

Hmm but then if *xfs* is punching delalloc then we're we holding the
iolock so who else could be doing that?

If the answers are 'no' and 'nobody' then
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> +
> +		WARN_ON_ONCE(data_end < start_byte);
>  		WARN_ON_ONCE(data_end > scan_end_byte);
>  
>  		error = iomap_write_delalloc_scan(inode, &punch_start_byte,
> -- 
> 2.43.0
> 
> 


* Re: [PATCH 02/10] iomap: improve shared block detection in iomap_unshare_iter
  2024-08-27  5:47     ` Christoph Hellwig
@ 2024-08-27 16:21       ` Darrick J. Wong
  2024-08-28  4:49         ` Christoph Hellwig
  0 siblings, 1 reply; 36+ messages in thread
From: Darrick J. Wong @ 2024-08-27 16:21 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chandan Babu R, Christian Brauner, linux-xfs, linux-fsdevel

On Tue, Aug 27, 2024 at 07:47:57AM +0200, Christoph Hellwig wrote:
> On Mon, Aug 26, 2024 at 10:44:24PM -0700, Darrick J. Wong wrote:
> > Maybe we should rename it then?
> > 
> > IOMAP_F_OUT_OF_PLACE_WRITE.
> > 
> > Yuck.
> > 
> > IOMAP_F_ELSEWHERE
> > 
> > Not much better.  Maybe just add a comment that "IOMAP_F_SHARED" really
> > just means an out of place write (aka srcmap is not just a hole).
> 
> Heh.
> 
> For writes it usually means out of place write, but for reporting
> it gets translated to the FIEMAP_EXTENT_SHARED flag or is used to
> reject swapon.  And then there is black magic in DAX.

Hee hee.  Yeah, let's leave IOMAP_F_SHARED alone; an out of place write
can be detected by iter->srcmap.type != HOLE.

The patch looks fine so
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D


* Re: [PATCH 03/10] iomap: pass flags to iomap_file_buffered_write_punch_delalloc
  2024-08-27  5:09 ` [PATCH 03/10] iomap: pass flags to iomap_file_buffered_write_punch_delalloc Christoph Hellwig
@ 2024-08-27 16:22   ` Darrick J. Wong
  0 siblings, 0 replies; 36+ messages in thread
From: Darrick J. Wong @ 2024-08-27 16:22 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chandan Babu R, Christian Brauner, linux-xfs, linux-fsdevel

On Tue, Aug 27, 2024 at 07:09:50AM +0200, Christoph Hellwig wrote:
> To fix short write error handling, we'll need to figure out what operation
> iomap_file_buffered_write_punch_delalloc is called for.  Pass the flags
> argument on to it, and reorder the argument list to match that of
> ->iomap_end so that the compiler only has to add the new punch argument
> to the end of it instead of reshuffling the registers.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks good,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/iomap/buffered-io.c |  5 ++---
>  fs/xfs/xfs_iomap.c     |  5 +++--
>  include/linux/iomap.h  | 10 ++++++----
>  3 files changed, 11 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 737a005082e035..34de9f58794ad5 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -23,7 +23,6 @@
>  
>  #define IOEND_BATCH_SIZE	4096
>  
> -typedef int (*iomap_punch_t)(struct inode *inode, loff_t offset, loff_t length);
>  /*
>   * Structure allocated for each folio to track per-block uptodate, dirty state
>   * and I/O completions.
> @@ -1300,8 +1299,8 @@ static int iomap_write_delalloc_release(struct inode *inode,
>   *         internal filesystem allocation lock
>   */
>  int iomap_file_buffered_write_punch_delalloc(struct inode *inode,
> -		struct iomap *iomap, loff_t pos, loff_t length,
> -		ssize_t written, iomap_punch_t punch)
> +		loff_t pos, loff_t length, ssize_t written, unsigned flags,
> +		struct iomap *iomap, iomap_punch_t punch)
>  {
>  	loff_t			start_byte;
>  	loff_t			end_byte;
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 72c981e3dc9211..47b5c83588259e 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -1231,8 +1231,9 @@ xfs_buffered_write_iomap_end(
>  	struct xfs_mount	*mp = XFS_M(inode->i_sb);
>  	int			error;
>  
> -	error = iomap_file_buffered_write_punch_delalloc(inode, iomap, offset,
> -			length, written, &xfs_buffered_write_delalloc_punch);
> +	error = iomap_file_buffered_write_punch_delalloc(inode, offset, length,
> +			written, flags, iomap,
> +			&xfs_buffered_write_delalloc_punch);
>  	if (error && !xfs_is_shutdown(mp)) {
>  		xfs_alert(mp, "%s: unable to clean up ino 0x%llx",
>  			__func__, XFS_I(inode)->i_ino);
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index 6fc1c858013d1e..83da37d64d1144 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -258,10 +258,6 @@ static inline const struct iomap *iomap_iter_srcmap(const struct iomap_iter *i)
>  
>  ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from,
>  		const struct iomap_ops *ops);
> -int iomap_file_buffered_write_punch_delalloc(struct inode *inode,
> -		struct iomap *iomap, loff_t pos, loff_t length, ssize_t written,
> -		int (*punch)(struct inode *inode, loff_t pos, loff_t length));
> -
>  int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops);
>  void iomap_readahead(struct readahead_control *, const struct iomap_ops *ops);
>  bool iomap_is_partially_uptodate(struct folio *, size_t from, size_t count);
> @@ -277,6 +273,12 @@ int iomap_truncate_page(struct inode *inode, loff_t pos, bool *did_zero,
>  		const struct iomap_ops *ops);
>  vm_fault_t iomap_page_mkwrite(struct vm_fault *vmf,
>  			const struct iomap_ops *ops);
> +
> +typedef int (*iomap_punch_t)(struct inode *inode, loff_t offset, loff_t length);
> +int iomap_file_buffered_write_punch_delalloc(struct inode *inode, loff_t pos,
> +		loff_t length, ssize_t written, unsigned flag,
> +		struct iomap *iomap, iomap_punch_t punch);
> +
>  int iomap_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>  		u64 start, u64 len, const struct iomap_ops *ops);
>  loff_t iomap_seek_hole(struct inode *inode, loff_t offset,
> -- 
> 2.43.0
> 
> 


* Re: [PATCH 04/10] iomap: zeroing already holds invalidate_lock in iomap_file_buffered_write_punch_delalloc
  2024-08-27  5:09 ` [PATCH 04/10] iomap: zeroing already holds invalidate_lock in iomap_file_buffered_write_punch_delalloc Christoph Hellwig
@ 2024-08-27 16:28   ` Darrick J. Wong
  2024-08-28  4:51     ` Christoph Hellwig
  0 siblings, 1 reply; 36+ messages in thread
From: Darrick J. Wong @ 2024-08-27 16:28 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chandan Babu R, Christian Brauner, linux-xfs, linux-fsdevel

On Tue, Aug 27, 2024 at 07:09:51AM +0200, Christoph Hellwig wrote:
> All callers of iomap_zero_range already hold invalidate_lock, so we can't
> take it again in iomap_file_buffered_write_punch_delalloc.

What about the xfs_zero_range call in xfs_file_write_checks?  AFAICT we
don't hold the invalidate lock there.  Did I misread that?

Also, would nested takings of the invalidate lock cause a livelock?  Or
is this actually quite broken now?

--D

> Use the passed in flags argument to detect if we're called from a zeroing
> operation and don't take the lock again in this case.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/iomap/buffered-io.c | 16 +++++++++++-----
>  1 file changed, 11 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 34de9f58794ad5..574ca413516443 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -1198,8 +1198,8 @@ static int iomap_write_delalloc_scan(struct inode *inode,
>   * require sprinkling this code with magic "+ 1" and "- 1" arithmetic and expose
>   * the code to subtle off-by-one bugs....
>   */
> -static int iomap_write_delalloc_release(struct inode *inode,
> -		loff_t start_byte, loff_t end_byte, iomap_punch_t punch)
> +static int iomap_write_delalloc_release(struct inode *inode, loff_t start_byte,
> +		loff_t end_byte, unsigned flags, iomap_punch_t punch)
>  {
>  	loff_t punch_start_byte = start_byte;
>  	loff_t scan_end_byte = min(i_size_read(inode), end_byte);
> @@ -1210,8 +1210,13 @@ static int iomap_write_delalloc_release(struct inode *inode,
>  	 * folios and dirtying them via ->page_mkwrite whilst we walk the
>  	 * cache and perform delalloc extent removal. Failing to do this can
>  	 * leave dirty pages with no space reservation in the cache.
> +	 *
> +	 * For zeroing operations the callers already hold invalidate_lock.
>  	 */
> -	filemap_invalidate_lock(inode->i_mapping);
> +	if (flags & IOMAP_ZERO)
> +		rwsem_assert_held_write(&inode->i_mapping->invalidate_lock);
> +	else
> +		filemap_invalidate_lock(inode->i_mapping);
>  	while (start_byte < scan_end_byte) {
>  		loff_t		data_end;
>  
> @@ -1264,7 +1269,8 @@ static int iomap_write_delalloc_release(struct inode *inode,
>  		error = punch(inode, punch_start_byte,
>  				end_byte - punch_start_byte);
>  out_unlock:
> -	filemap_invalidate_unlock(inode->i_mapping);
> +	if (!(flags & IOMAP_ZERO))
> +		filemap_invalidate_unlock(inode->i_mapping);
>  	return error;
>  }
>  
> @@ -1328,7 +1334,7 @@ int iomap_file_buffered_write_punch_delalloc(struct inode *inode,
>  	if (start_byte >= end_byte)
>  		return 0;
>  
> -	return iomap_write_delalloc_release(inode, start_byte, end_byte,
> +	return iomap_write_delalloc_release(inode, start_byte, end_byte, flags,
>  					punch);
>  }
>  EXPORT_SYMBOL_GPL(iomap_file_buffered_write_punch_delalloc);
> -- 
> 2.43.0
> 
> 


* Re: [PATCH 05/10] iomap: pass the iomap to the punch callback
  2024-08-27  5:09 ` [PATCH 05/10] iomap: pass the iomap to the punch callback Christoph Hellwig
@ 2024-08-27 16:28   ` Darrick J. Wong
  0 siblings, 0 replies; 36+ messages in thread
From: Darrick J. Wong @ 2024-08-27 16:28 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chandan Babu R, Christian Brauner, linux-xfs, linux-fsdevel

On Tue, Aug 27, 2024 at 07:09:52AM +0200, Christoph Hellwig wrote:
> XFS will need to look at the flags in the iomap structure, so pass it
> down all the way to the callback.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/iomap/buffered-io.c | 25 +++++++++++++------------
>  fs/xfs/xfs_iomap.c     |  3 ++-
>  include/linux/iomap.h  |  3 ++-
>  3 files changed, 17 insertions(+), 14 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 574ca413516443..7950cbecb78c22 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -1047,7 +1047,7 @@ EXPORT_SYMBOL_GPL(iomap_file_buffered_write);
>  
>  static int iomap_write_delalloc_ifs_punch(struct inode *inode,
>  		struct folio *folio, loff_t start_byte, loff_t end_byte,
> -		iomap_punch_t punch)
> +		struct iomap *iomap, iomap_punch_t punch)
>  {
>  	unsigned int first_blk, last_blk, i;
>  	loff_t last_byte;
> @@ -1072,7 +1072,7 @@ static int iomap_write_delalloc_ifs_punch(struct inode *inode,
>  	for (i = first_blk; i <= last_blk; i++) {
>  		if (!ifs_block_is_dirty(folio, ifs, i)) {
>  			ret = punch(inode, folio_pos(folio) + (i << blkbits),
> -				    1 << blkbits);
> +				    1 << blkbits, iomap);
>  			if (ret)
>  				return ret;
>  		}
> @@ -1084,7 +1084,7 @@ static int iomap_write_delalloc_ifs_punch(struct inode *inode,
>  
>  static int iomap_write_delalloc_punch(struct inode *inode, struct folio *folio,
>  		loff_t *punch_start_byte, loff_t start_byte, loff_t end_byte,
> -		iomap_punch_t punch)
> +		struct iomap *iomap, iomap_punch_t punch)
>  {
>  	int ret = 0;
>  
> @@ -1094,14 +1094,14 @@ static int iomap_write_delalloc_punch(struct inode *inode, struct folio *folio,
>  	/* if dirty, punch up to offset */
>  	if (start_byte > *punch_start_byte) {
>  		ret = punch(inode, *punch_start_byte,
> -				start_byte - *punch_start_byte);
> +				start_byte - *punch_start_byte, iomap);
>  		if (ret)
>  			return ret;
>  	}
>  
>  	/* Punch non-dirty blocks within folio */
> -	ret = iomap_write_delalloc_ifs_punch(inode, folio, start_byte,
> -			end_byte, punch);
> +	ret = iomap_write_delalloc_ifs_punch(inode, folio, start_byte, end_byte,
> +			iomap, punch);
>  	if (ret)
>  		return ret;
>  
> @@ -1134,7 +1134,7 @@ static int iomap_write_delalloc_punch(struct inode *inode, struct folio *folio,
>   */
>  static int iomap_write_delalloc_scan(struct inode *inode,
>  		loff_t *punch_start_byte, loff_t start_byte, loff_t end_byte,
> -		iomap_punch_t punch)
> +		struct iomap *iomap, iomap_punch_t punch)
>  {
>  	while (start_byte < end_byte) {
>  		struct folio	*folio;
> @@ -1150,7 +1150,7 @@ static int iomap_write_delalloc_scan(struct inode *inode,
>  		}
>  
>  		ret = iomap_write_delalloc_punch(inode, folio, punch_start_byte,
> -						 start_byte, end_byte, punch);
> +				start_byte, end_byte, iomap, punch);
>  		if (ret) {
>  			folio_unlock(folio);
>  			folio_put(folio);
> @@ -1199,7 +1199,8 @@ static int iomap_write_delalloc_scan(struct inode *inode,
>   * the code to subtle off-by-one bugs....
>   */
>  static int iomap_write_delalloc_release(struct inode *inode, loff_t start_byte,
> -		loff_t end_byte, unsigned flags, iomap_punch_t punch)
> +		loff_t end_byte, unsigned flags, struct iomap *iomap,
> +		iomap_punch_t punch)
>  {
>  	loff_t punch_start_byte = start_byte;
>  	loff_t scan_end_byte = min(i_size_read(inode), end_byte);
> @@ -1257,7 +1258,7 @@ static int iomap_write_delalloc_release(struct inode *inode, loff_t start_byte,
>  		WARN_ON_ONCE(data_end > scan_end_byte);
>  
>  		error = iomap_write_delalloc_scan(inode, &punch_start_byte,
> -				start_byte, data_end, punch);
> +				start_byte, data_end, iomap, punch);
>  		if (error)
>  			goto out_unlock;
>  
> @@ -1267,7 +1268,7 @@ static int iomap_write_delalloc_release(struct inode *inode, loff_t start_byte,
>  
>  	if (punch_start_byte < end_byte)
>  		error = punch(inode, punch_start_byte,
> -				end_byte - punch_start_byte);
> +				end_byte - punch_start_byte, iomap);
>  out_unlock:
>  	if (!(flags & IOMAP_ZERO))
>  		filemap_invalidate_unlock(inode->i_mapping);
> @@ -1335,7 +1336,7 @@ int iomap_file_buffered_write_punch_delalloc(struct inode *inode,
>  		return 0;
>  
>  	return iomap_write_delalloc_release(inode, start_byte, end_byte, flags,
> -					punch);
> +					iomap, punch);
>  }
>  EXPORT_SYMBOL_GPL(iomap_file_buffered_write_punch_delalloc);
>  
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 47b5c83588259e..695e5bee776f94 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -1212,7 +1212,8 @@ static int
>  xfs_buffered_write_delalloc_punch(
>  	struct inode		*inode,
>  	loff_t			offset,
> -	loff_t			length)
> +	loff_t			length,
> +	struct iomap		*iomap)
>  {
>  	xfs_bmap_punch_delalloc_range(XFS_I(inode), offset, offset + length);
>  	return 0;
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index 83da37d64d1144..a931190f6d858b 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -274,7 +274,8 @@ int iomap_truncate_page(struct inode *inode, loff_t pos, bool *did_zero,
>  vm_fault_t iomap_page_mkwrite(struct vm_fault *vmf,
>  			const struct iomap_ops *ops);
>  
> -typedef int (*iomap_punch_t)(struct inode *inode, loff_t offset, loff_t length);
> +typedef int (*iomap_punch_t)(struct inode *inode, loff_t offset, loff_t length,
> +		struct iomap *iomap);
>  int iomap_file_buffered_write_punch_delalloc(struct inode *inode, loff_t pos,
>  		loff_t length, ssize_t written, unsigned flag,
>  		struct iomap *iomap, iomap_punch_t punch);
> -- 
> 2.43.0
> 
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 06/10] iomap: remove the iomap_file_buffered_write_punch_delalloc return value
  2024-08-27  5:09 ` [PATCH 06/10] iomap: remove the iomap_file_buffered_write_punch_delalloc return value Christoph Hellwig
@ 2024-08-27 16:36   ` Darrick J. Wong
  2024-08-28  4:52     ` Christoph Hellwig
  0 siblings, 1 reply; 36+ messages in thread
From: Darrick J. Wong @ 2024-08-27 16:36 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chandan Babu R, Christian Brauner, linux-xfs, linux-fsdevel

On Tue, Aug 27, 2024 at 07:09:53AM +0200, Christoph Hellwig wrote:
> iomap_file_buffered_write_punch_delalloc can only return errors if either
> the ->punch callback returned an error, or if someone changed the API of
> mapping_seek_hole_data to return a negative error code that is not
> -ENXIO.
> 
> As the only instance of ->punch never returns an error, and such an error
> would be fatal anyway, remove the entire error propagation and don't
> return an error code from iomap_file_buffered_write_punch_delalloc.

Not sure I like this one -- if the ->iomap_begin method returns some
weird error to iomap_seek_{data,hole}, then I think we'd at least want
to complain about that?

Though I guess we're punching delalloc mappings for a failed pagecache
write, so we've already got ourselves a juicy EIO to throw up to the
application so maybe it's fine not to bother with the error recovery
erroring out.  Right?

--D

> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/iomap/buffered-io.c | 82 +++++++++++++++---------------------------
>  fs/xfs/xfs_iomap.c     | 17 ++-------
>  include/linux/iomap.h  |  4 +--
>  3 files changed, 33 insertions(+), 70 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 7950cbecb78c22..3d7e69a542518a 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -1045,7 +1045,7 @@ iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *i,
>  }
>  EXPORT_SYMBOL_GPL(iomap_file_buffered_write);
>  
> -static int iomap_write_delalloc_ifs_punch(struct inode *inode,
> +static void iomap_write_delalloc_ifs_punch(struct inode *inode,
>  		struct folio *folio, loff_t start_byte, loff_t end_byte,
>  		struct iomap *iomap, iomap_punch_t punch)
>  {
> @@ -1053,7 +1053,6 @@ static int iomap_write_delalloc_ifs_punch(struct inode *inode,
>  	loff_t last_byte;
>  	u8 blkbits = inode->i_blkbits;
>  	struct iomap_folio_state *ifs;
> -	int ret = 0;
>  
>  	/*
>  	 * When we have per-block dirty tracking, there can be
> @@ -1063,47 +1062,35 @@ static int iomap_write_delalloc_ifs_punch(struct inode *inode,
>  	 */
>  	ifs = folio->private;
>  	if (!ifs)
> -		return ret;
> +		return;
>  
>  	last_byte = min_t(loff_t, end_byte - 1,
>  			folio_pos(folio) + folio_size(folio) - 1);
>  	first_blk = offset_in_folio(folio, start_byte) >> blkbits;
>  	last_blk = offset_in_folio(folio, last_byte) >> blkbits;
>  	for (i = first_blk; i <= last_blk; i++) {
> -		if (!ifs_block_is_dirty(folio, ifs, i)) {
> -			ret = punch(inode, folio_pos(folio) + (i << blkbits),
> +		if (!ifs_block_is_dirty(folio, ifs, i))
> +			punch(inode, folio_pos(folio) + (i << blkbits),
>  				    1 << blkbits, iomap);
> -			if (ret)
> -				return ret;
> -		}
>  	}
> -
> -	return ret;
>  }
>  
> -
> -static int iomap_write_delalloc_punch(struct inode *inode, struct folio *folio,
> +static void iomap_write_delalloc_punch(struct inode *inode, struct folio *folio,
>  		loff_t *punch_start_byte, loff_t start_byte, loff_t end_byte,
>  		struct iomap *iomap, iomap_punch_t punch)
>  {
> -	int ret = 0;
> -
>  	if (!folio_test_dirty(folio))
> -		return ret;
> +		return;
>  
>  	/* if dirty, punch up to offset */
>  	if (start_byte > *punch_start_byte) {
> -		ret = punch(inode, *punch_start_byte,
> -				start_byte - *punch_start_byte, iomap);
> -		if (ret)
> -			return ret;
> +		punch(inode, *punch_start_byte, start_byte - *punch_start_byte,
> +				iomap);
>  	}
>  
>  	/* Punch non-dirty blocks within folio */
> -	ret = iomap_write_delalloc_ifs_punch(inode, folio, start_byte, end_byte,
> +	iomap_write_delalloc_ifs_punch(inode, folio, start_byte, end_byte,
>  			iomap, punch);
> -	if (ret)
> -		return ret;
>  
>  	/*
>  	 * Make sure the next punch start is correctly bound to
> @@ -1111,8 +1098,6 @@ static int iomap_write_delalloc_punch(struct inode *inode, struct folio *folio,
>  	 */
>  	*punch_start_byte = min_t(loff_t, end_byte,
>  				folio_pos(folio) + folio_size(folio));
> -
> -	return ret;
>  }
>  
>  /*
> @@ -1132,13 +1117,12 @@ static int iomap_write_delalloc_punch(struct inode *inode, struct folio *folio,
>   * This function uses [start_byte, end_byte) intervals (i.e. open ended) to
>   * simplify range iterations.
>   */
> -static int iomap_write_delalloc_scan(struct inode *inode,
> +static void iomap_write_delalloc_scan(struct inode *inode,
>  		loff_t *punch_start_byte, loff_t start_byte, loff_t end_byte,
>  		struct iomap *iomap, iomap_punch_t punch)
>  {
>  	while (start_byte < end_byte) {
>  		struct folio	*folio;
> -		int ret;
>  
>  		/* grab locked page */
>  		folio = filemap_lock_folio(inode->i_mapping,
> @@ -1149,20 +1133,14 @@ static int iomap_write_delalloc_scan(struct inode *inode,
>  			continue;
>  		}
>  
> -		ret = iomap_write_delalloc_punch(inode, folio, punch_start_byte,
> +		iomap_write_delalloc_punch(inode, folio, punch_start_byte,
>  				start_byte, end_byte, iomap, punch);
> -		if (ret) {
> -			folio_unlock(folio);
> -			folio_put(folio);
> -			return ret;
> -		}
>  
>  		/* move offset to start of next folio in range */
>  		start_byte = folio_next_index(folio) << PAGE_SHIFT;
>  		folio_unlock(folio);
>  		folio_put(folio);
>  	}
> -	return 0;
>  }
>  
>  /*
> @@ -1198,13 +1176,12 @@ static int iomap_write_delalloc_scan(struct inode *inode,
>   * require sprinkling this code with magic "+ 1" and "- 1" arithmetic and expose
>   * the code to subtle off-by-one bugs....
>   */
> -static int iomap_write_delalloc_release(struct inode *inode, loff_t start_byte,
> +static void iomap_write_delalloc_release(struct inode *inode, loff_t start_byte,
>  		loff_t end_byte, unsigned flags, struct iomap *iomap,
>  		iomap_punch_t punch)
>  {
>  	loff_t punch_start_byte = start_byte;
>  	loff_t scan_end_byte = min(i_size_read(inode), end_byte);
> -	int error = 0;
>  
>  	/*
>  	 * Lock the mapping to avoid races with page faults re-instantiating
> @@ -1226,13 +1203,15 @@ static int iomap_write_delalloc_release(struct inode *inode, loff_t start_byte,
>  		/*
>  		 * If there is no more data to scan, all that is left is to
>  		 * punch out the remaining range.
> +		 *
> +		 * Note that mapping_seek_hole_data is only supposed to return
> +		 * either an offset or -ENXIO, so WARN on any other error as
> +		 * that would be an API change without updating the callers.
>  		 */
>  		if (start_byte == -ENXIO || start_byte == scan_end_byte)
>  			break;
> -		if (start_byte < 0) {
> -			error = start_byte;
> +		if (WARN_ON_ONCE(start_byte < 0))
>  			goto out_unlock;
> -		}
>  		WARN_ON_ONCE(start_byte < punch_start_byte);
>  		WARN_ON_ONCE(start_byte > scan_end_byte);
>  
> @@ -1242,10 +1221,8 @@ static int iomap_write_delalloc_release(struct inode *inode, loff_t start_byte,
>  		 */
>  		data_end = mapping_seek_hole_data(inode->i_mapping, start_byte,
>  				scan_end_byte, SEEK_HOLE);
> -		if (data_end < 0) {
> -			error = data_end;
> +		if (WARN_ON_ONCE(data_end < 0))
>  			goto out_unlock;
> -		}
>  
>  		/*
>  		 * If we race with post-direct I/O invalidation of the page cache,
> @@ -1257,22 +1234,19 @@ static int iomap_write_delalloc_release(struct inode *inode, loff_t start_byte,
>  		WARN_ON_ONCE(data_end < start_byte);
>  		WARN_ON_ONCE(data_end > scan_end_byte);
>  
> -		error = iomap_write_delalloc_scan(inode, &punch_start_byte,
> -				start_byte, data_end, iomap, punch);
> -		if (error)
> -			goto out_unlock;
> +		iomap_write_delalloc_scan(inode, &punch_start_byte, start_byte,
> +				data_end, iomap, punch);
>  
>  		/* The next data search starts at the end of this one. */
>  		start_byte = data_end;
>  	}
>  
>  	if (punch_start_byte < end_byte)
> -		error = punch(inode, punch_start_byte,
> -				end_byte - punch_start_byte, iomap);
> +		punch(inode, punch_start_byte, end_byte - punch_start_byte,
> +				iomap);
>  out_unlock:
>  	if (!(flags & IOMAP_ZERO))
>  		filemap_invalidate_unlock(inode->i_mapping);
> -	return error;
>  }
>  
>  /*
> @@ -1305,7 +1279,7 @@ static int iomap_write_delalloc_release(struct inode *inode, loff_t start_byte,
>   *       ->punch
>   *         internal filesystem allocation lock
>   */
> -int iomap_file_buffered_write_punch_delalloc(struct inode *inode,
> +void iomap_file_buffered_write_punch_delalloc(struct inode *inode,
>  		loff_t pos, loff_t length, ssize_t written, unsigned flags,
>  		struct iomap *iomap, iomap_punch_t punch)
>  {
> @@ -1314,11 +1288,11 @@ int iomap_file_buffered_write_punch_delalloc(struct inode *inode,
>  	unsigned int		blocksize = i_blocksize(inode);
>  
>  	if (iomap->type != IOMAP_DELALLOC)
> -		return 0;
> +		return;
>  
>  	/* If we didn't reserve the blocks, we're not allowed to punch them. */
>  	if (!(iomap->flags & IOMAP_F_NEW))
> -		return 0;
> +		return;
>  
>  	/*
>  	 * start_byte refers to the first unused block after a short write. If
> @@ -1333,10 +1307,10 @@ int iomap_file_buffered_write_punch_delalloc(struct inode *inode,
>  
>  	/* Nothing to do if we've written the entire delalloc extent */
>  	if (start_byte >= end_byte)
> -		return 0;
> +		return;
>  
> -	return iomap_write_delalloc_release(inode, start_byte, end_byte, flags,
> -					iomap, punch);
> +	iomap_write_delalloc_release(inode, start_byte, end_byte, flags, iomap,
> +			punch);
>  }
>  EXPORT_SYMBOL_GPL(iomap_file_buffered_write_punch_delalloc);
>  
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 695e5bee776f94..1e11f48814c0d0 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -1208,7 +1208,7 @@ xfs_buffered_write_iomap_begin(
>  	return error;
>  }
>  
> -static int
> +static void
>  xfs_buffered_write_delalloc_punch(
>  	struct inode		*inode,
>  	loff_t			offset,
> @@ -1216,7 +1216,6 @@ xfs_buffered_write_delalloc_punch(
>  	struct iomap		*iomap)
>  {
>  	xfs_bmap_punch_delalloc_range(XFS_I(inode), offset, offset + length);
> -	return 0;
>  }
>  
>  static int
> @@ -1228,18 +1227,8 @@ xfs_buffered_write_iomap_end(
>  	unsigned		flags,
>  	struct iomap		*iomap)
>  {
> -
> -	struct xfs_mount	*mp = XFS_M(inode->i_sb);
> -	int			error;
> -
> -	error = iomap_file_buffered_write_punch_delalloc(inode, offset, length,
> -			written, flags, iomap,
> -			&xfs_buffered_write_delalloc_punch);
> -	if (error && !xfs_is_shutdown(mp)) {
> -		xfs_alert(mp, "%s: unable to clean up ino 0x%llx",
> -			__func__, XFS_I(inode)->i_ino);
> -		return error;
> -	}
> +	iomap_file_buffered_write_punch_delalloc(inode, offset, length, written,
> +			flags, iomap, &xfs_buffered_write_delalloc_punch);
>  	return 0;
>  }
>  
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index a931190f6d858b..78a48af4d2c0a1 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -274,9 +274,9 @@ int iomap_truncate_page(struct inode *inode, loff_t pos, bool *did_zero,
>  vm_fault_t iomap_page_mkwrite(struct vm_fault *vmf,
>  			const struct iomap_ops *ops);
>  
> -typedef int (*iomap_punch_t)(struct inode *inode, loff_t offset, loff_t length,
> +typedef void (*iomap_punch_t)(struct inode *inode, loff_t offset, loff_t length,
>  		struct iomap *iomap);
> -int iomap_file_buffered_write_punch_delalloc(struct inode *inode, loff_t pos,
> +void iomap_file_buffered_write_punch_delalloc(struct inode *inode, loff_t pos,
>  		loff_t length, ssize_t written, unsigned flag,
>  		struct iomap *iomap, iomap_punch_t punch);
>  
> -- 
> 2.43.0
> 
> 


* Re: [PATCH 07/10] xfs: support the COW fork in xfs_bmap_punch_delalloc_range
  2024-08-27  5:09 ` [PATCH 07/10] xfs: support the COW fork in xfs_bmap_punch_delalloc_range Christoph Hellwig
@ 2024-08-27 16:37   ` Darrick J. Wong
  0 siblings, 0 replies; 36+ messages in thread
From: Darrick J. Wong @ 2024-08-27 16:37 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chandan Babu R, Christian Brauner, linux-xfs, linux-fsdevel

On Tue, Aug 27, 2024 at 07:09:54AM +0200, Christoph Hellwig wrote:
> xfs_buffered_write_iomap_begin can also create delalloc reservations
> that need cleaning up; prepare for that by adding support for the COW
> fork in xfs_bmap_punch_delalloc_range.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Seems fine to me.
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/xfs_aops.c      |  4 ++--
>  fs/xfs/xfs_bmap_util.c | 10 +++++++---
>  fs/xfs/xfs_bmap_util.h |  2 +-
>  fs/xfs/xfs_iomap.c     |  3 ++-
>  4 files changed, 12 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 6dead20338e24c..559a3a57709748 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -116,7 +116,7 @@ xfs_end_ioend(
>  	if (unlikely(error)) {
>  		if (ioend->io_flags & IOMAP_F_SHARED) {
>  			xfs_reflink_cancel_cow_range(ip, offset, size, true);
> -			xfs_bmap_punch_delalloc_range(ip, offset,
> +			xfs_bmap_punch_delalloc_range(ip, XFS_DATA_FORK, offset,
>  					offset + size);
>  		}
>  		goto done;
> @@ -456,7 +456,7 @@ xfs_discard_folio(
>  	 * byte of the next folio. Hence the end offset is only dependent on the
>  	 * folio itself and not the start offset that is passed in.
>  	 */
> -	xfs_bmap_punch_delalloc_range(ip, pos,
> +	xfs_bmap_punch_delalloc_range(ip, XFS_DATA_FORK, pos,
>  				folio_pos(folio) + folio_size(folio));
>  }
>  
> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> index fe2e2c93097550..15c8f90f19a934 100644
> --- a/fs/xfs/xfs_bmap_util.c
> +++ b/fs/xfs/xfs_bmap_util.c
> @@ -443,11 +443,12 @@ xfs_getbmap(
>  void
>  xfs_bmap_punch_delalloc_range(
>  	struct xfs_inode	*ip,
> +	int			whichfork,
>  	xfs_off_t		start_byte,
>  	xfs_off_t		end_byte)
>  {
>  	struct xfs_mount	*mp = ip->i_mount;
> -	struct xfs_ifork	*ifp = &ip->i_df;
> +	struct xfs_ifork	*ifp = xfs_ifork_ptr(ip, whichfork);
>  	xfs_fileoff_t		start_fsb = XFS_B_TO_FSBT(mp, start_byte);
>  	xfs_fileoff_t		end_fsb = XFS_B_TO_FSB(mp, end_byte);
>  	struct xfs_bmbt_irec	got, del;
> @@ -475,11 +476,14 @@ xfs_bmap_punch_delalloc_range(
>  			continue;
>  		}
>  
> -		xfs_bmap_del_extent_delay(ip, XFS_DATA_FORK, &icur, &got, &del);
> +		xfs_bmap_del_extent_delay(ip, whichfork, &icur, &got, &del);
>  		if (!xfs_iext_get_extent(ifp, &icur, &got))
>  			break;
>  	}
>  
> +	if (whichfork == XFS_COW_FORK && !ifp->if_bytes)
> +		xfs_inode_clear_cowblocks_tag(ip);
> +
>  out_unlock:
>  	xfs_iunlock(ip, XFS_ILOCK_EXCL);
>  }
> @@ -590,7 +594,7 @@ xfs_free_eofblocks(
>  	 */
>  	if (ip->i_diflags & (XFS_DIFLAG_PREALLOC | XFS_DIFLAG_APPEND)) {
>  		if (ip->i_delayed_blks) {
> -			xfs_bmap_punch_delalloc_range(ip,
> +			xfs_bmap_punch_delalloc_range(ip, XFS_DATA_FORK,
>  				round_up(XFS_ISIZE(ip), mp->m_sb.sb_blocksize),
>  				LLONG_MAX);
>  		}
> diff --git a/fs/xfs/xfs_bmap_util.h b/fs/xfs/xfs_bmap_util.h
> index eb0895bfb9dae4..b29760d36e1ab1 100644
> --- a/fs/xfs/xfs_bmap_util.h
> +++ b/fs/xfs/xfs_bmap_util.h
> @@ -30,7 +30,7 @@ xfs_bmap_rtalloc(struct xfs_bmalloca *ap)
>  }
>  #endif /* CONFIG_XFS_RT */
>  
> -void	xfs_bmap_punch_delalloc_range(struct xfs_inode *ip,
> +void	xfs_bmap_punch_delalloc_range(struct xfs_inode *ip, int whichfork,
>  		xfs_off_t start_byte, xfs_off_t end_byte);
>  
>  struct kgetbmap {
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 1e11f48814c0d0..24d69c8c168aeb 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -1215,7 +1215,8 @@ xfs_buffered_write_delalloc_punch(
>  	loff_t			length,
>  	struct iomap		*iomap)
>  {
> -	xfs_bmap_punch_delalloc_range(XFS_I(inode), offset, offset + length);
> +	xfs_bmap_punch_delalloc_range(XFS_I(inode), XFS_DATA_FORK, offset,
> +			offset + length);
>  }
>  
>  static int
> -- 
> 2.43.0
> 
> 


* Re: [PATCH 08/10] xfs: share a bit more code in xfs_buffered_write_iomap_begin
  2024-08-27  5:09 ` [PATCH 08/10] xfs: share a bit more code in xfs_buffered_write_iomap_begin Christoph Hellwig
@ 2024-08-27 16:38   ` Darrick J. Wong
  0 siblings, 0 replies; 36+ messages in thread
From: Darrick J. Wong @ 2024-08-27 16:38 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chandan Babu R, Christian Brauner, linux-xfs, linux-fsdevel

On Tue, Aug 27, 2024 at 07:09:55AM +0200, Christoph Hellwig wrote:
> Introduce a local iomap_flags variable so that the code allocating new
> delalloc blocks in the data fork can fall through to the found_imap
> label and reuse the code to unlock and fill the iomap.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

That looks pretty straightforward.
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/xfs_iomap.c | 18 ++++++++----------
>  1 file changed, 8 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 24d69c8c168aeb..e0dc6393686c01 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -975,6 +975,7 @@ xfs_buffered_write_iomap_begin(
>  	int			allocfork = XFS_DATA_FORK;
>  	int			error = 0;
>  	unsigned int		lockmode = XFS_ILOCK_EXCL;
> +	unsigned int		iomap_flags = 0;
>  	u64			seq;
>  
>  	if (xfs_is_shutdown(mp))
> @@ -1145,6 +1146,11 @@ xfs_buffered_write_iomap_begin(
>  		}
>  	}
>  
> +	/*
> +	 * Flag newly allocated delalloc blocks with IOMAP_F_NEW so we punch
> +	 * them out if the write happens to fail.
> +	 */
> +	iomap_flags |= IOMAP_F_NEW;
>  	if (allocfork == XFS_COW_FORK) {
>  		error = xfs_bmapi_reserve_delalloc(ip, allocfork, offset_fsb,
>  				end_fsb - offset_fsb, prealloc_blocks, &cmap,
> @@ -1162,19 +1168,11 @@ xfs_buffered_write_iomap_begin(
>  	if (error)
>  		goto out_unlock;
>  
> -	/*
> -	 * Flag newly allocated delalloc blocks with IOMAP_F_NEW so we punch
> -	 * them out if the write happens to fail.
> -	 */
> -	seq = xfs_iomap_inode_sequence(ip, IOMAP_F_NEW);
> -	xfs_iunlock(ip, lockmode);
>  	trace_xfs_iomap_alloc(ip, offset, count, allocfork, &imap);
> -	return xfs_bmbt_to_iomap(ip, iomap, &imap, flags, IOMAP_F_NEW, seq);
> -
>  found_imap:
> -	seq = xfs_iomap_inode_sequence(ip, 0);
> +	seq = xfs_iomap_inode_sequence(ip, iomap_flags);
>  	xfs_iunlock(ip, lockmode);
> -	return xfs_bmbt_to_iomap(ip, iomap, &imap, flags, 0, seq);
> +	return xfs_bmbt_to_iomap(ip, iomap, &imap, flags, iomap_flags, seq);
>  
>  convert_delay:
>  	xfs_iunlock(ip, lockmode);
> -- 
> 2.43.0
> 
> 


* Re: [PATCH 09/10] xfs: set IOMAP_F_SHARED for all COW fork allocations
  2024-08-27  5:09 ` [PATCH 09/10] xfs: set IOMAP_F_SHARED for all COW fork allocations Christoph Hellwig
@ 2024-08-27 16:44   ` Darrick J. Wong
  0 siblings, 0 replies; 36+ messages in thread
From: Darrick J. Wong @ 2024-08-27 16:44 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chandan Babu R, Christian Brauner, linux-xfs, linux-fsdevel

On Tue, Aug 27, 2024 at 07:09:56AM +0200, Christoph Hellwig wrote:
> Change xfs_buffered_write_iomap_begin to always set IOMAP_F_SHARED for
> COW fork allocations, even if they don't overlap existing data fork extents,
> which will allow the iomap_end callback to detect if it has to punch
> stale delalloc blocks from the COW fork instead of the data fork.  It
> also means we sample the sequence counter for both the data and the COW
> fork when writing to the COW fork, which ensures we properly revalidate
> when only COW fork changes happens.
> 
> This is essentially a revert of commit 72a048c1056a ("xfs: only set
> IOMAP_F_SHARED when providing a srcmap to a write"). This is fine because
> the problem that the commit fixed has now been dealt with in iomap by
> only looking at the actual srcmap and not the fallback to the write
> iomap.
> 
> Note that the direct I/O path was never changed and has always set
> IOMAP_F_SHARED for all COW fork allocations.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

That was fun to compare your revert vs. a patch from 3 years ago. $)

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/xfs_iomap.c | 15 +++++++--------
>  1 file changed, 7 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index e0dc6393686c01..22e9613a995f12 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -1186,20 +1186,19 @@ xfs_buffered_write_iomap_begin(
>  	return 0;
>  
>  found_cow:
> -	seq = xfs_iomap_inode_sequence(ip, 0);
>  	if (imap.br_startoff <= offset_fsb) {
> -		error = xfs_bmbt_to_iomap(ip, srcmap, &imap, flags, 0, seq);
> +		error = xfs_bmbt_to_iomap(ip, srcmap, &imap, flags, 0,
> +				xfs_iomap_inode_sequence(ip, 0));
>  		if (error)
>  			goto out_unlock;
> -		seq = xfs_iomap_inode_sequence(ip, IOMAP_F_SHARED);
> -		xfs_iunlock(ip, lockmode);
> -		return xfs_bmbt_to_iomap(ip, iomap, &cmap, flags,
> -					 IOMAP_F_SHARED, seq);
> +	} else {
> +		xfs_trim_extent(&cmap, offset_fsb, imap.br_startoff - offset_fsb);
>  	}
>  
> -	xfs_trim_extent(&cmap, offset_fsb, imap.br_startoff - offset_fsb);
> +	iomap_flags = IOMAP_F_SHARED;
> +	seq = xfs_iomap_inode_sequence(ip, iomap_flags);
>  	xfs_iunlock(ip, lockmode);
> -	return xfs_bmbt_to_iomap(ip, iomap, &cmap, flags, 0, seq);
> +	return xfs_bmbt_to_iomap(ip, iomap, &cmap, flags, iomap_flags, seq);
>  
>  out_unlock:
>  	xfs_iunlock(ip, lockmode);
> -- 
> 2.43.0
> 
> 


* Re: [PATCH 10/10] xfs: punch delalloc extents from the COW fork for COW writes
  2024-08-27  5:09 ` [PATCH 10/10] xfs: punch delalloc extents from the COW fork for COW writes Christoph Hellwig
@ 2024-08-27 16:44   ` Darrick J. Wong
  0 siblings, 0 replies; 36+ messages in thread
From: Darrick J. Wong @ 2024-08-27 16:44 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chandan Babu R, Christian Brauner, linux-xfs, linux-fsdevel

On Tue, Aug 27, 2024 at 07:09:57AM +0200, Christoph Hellwig wrote:
> When ->iomap_end is called on a short write to the COW fork it needs to
> punch stale delalloc data from the COW fork and not the data fork.
> 
> Ensure that IOMAP_F_NEW is set for new COW fork allocations in
> xfs_buffered_write_iomap_begin, and then use the IOMAP_F_SHARED flag
> in xfs_buffered_write_delalloc_punch to decide which fork to punch.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks fine,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/xfs_iomap.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 22e9613a995f12..4113e09cb836a8 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -1195,7 +1195,7 @@ xfs_buffered_write_iomap_begin(
>  		xfs_trim_extent(&cmap, offset_fsb, imap.br_startoff - offset_fsb);
>  	}
>  
> -	iomap_flags = IOMAP_F_SHARED;
> +	iomap_flags |= IOMAP_F_SHARED;
>  	seq = xfs_iomap_inode_sequence(ip, iomap_flags);
>  	xfs_iunlock(ip, lockmode);
>  	return xfs_bmbt_to_iomap(ip, iomap, &cmap, flags, iomap_flags, seq);
> @@ -1212,8 +1212,10 @@ xfs_buffered_write_delalloc_punch(
>  	loff_t			length,
>  	struct iomap		*iomap)
>  {
> -	xfs_bmap_punch_delalloc_range(XFS_I(inode), XFS_DATA_FORK, offset,
> -			offset + length);
> +	xfs_bmap_punch_delalloc_range(XFS_I(inode),
> +			(iomap->flags & IOMAP_F_SHARED) ?
> +				XFS_COW_FORK : XFS_DATA_FORK,
> +			offset, offset + length);
>  }
>  
>  static int
> -- 
> 2.43.0
> 
> 


* Re: [PATCH 01/10] iomap: handle a post-direct I/O invalidate race in iomap_write_delalloc_release
  2024-08-27 16:14   ` Darrick J. Wong
@ 2024-08-28  4:48     ` Christoph Hellwig
  2024-08-28 16:13       ` Darrick J. Wong
  0 siblings, 1 reply; 36+ messages in thread
From: Christoph Hellwig @ 2024-08-28  4:48 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Chandan Babu R, Christian Brauner, linux-xfs,
	linux-fsdevel

On Tue, Aug 27, 2024 at 09:14:16AM -0700, Darrick J. Wong wrote:
> Is there any chance that we could get stuck in a loop here?  I
> think it's the case that if SEEK_HOLE returns data_end == start_byte,
> then the next time through the loop, the SEEK_DATA will return something
> that is > start_byte.

Yes.

> Unless someone is very rapidly writing and
> punching the page cache?
> 
> > Hmm, but then if *xfs* is punching delalloc then we're holding the
> iolock so who else could be doing that?

Yes.  It's only the async direct I/O completions that punch without
the lock.



* Re: [PATCH 02/10] iomap: improve shared block detection in iomap_unshare_iter
  2024-08-27 16:21       ` Darrick J. Wong
@ 2024-08-28  4:49         ` Christoph Hellwig
  2024-08-28 16:17           ` Darrick J. Wong
  0 siblings, 1 reply; 36+ messages in thread
From: Christoph Hellwig @ 2024-08-28  4:49 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Chandan Babu R, Christian Brauner, linux-xfs,
	linux-fsdevel

On Tue, Aug 27, 2024 at 09:21:49AM -0700, Darrick J. Wong wrote:
> > For writes it usually means out of place write, but for reporting
> > it gets translated to the FIEMAP_EXTENT_SHARED flag or is used to
> > > reject swapon.  And then there is black magic in DAX.
> 
> Hee hee.  Yeah, let's leave IOMAP_F_SHARED alone; an out of place write
> can be detected by iter->srcmap.type != HOLE.

I can probably come up with a comment that includes the COWextsize hints
in the definition of out of place writes and we should be all fine..



* Re: [PATCH 04/10] iomap: zeroing already holds invalidate_lock in iomap_file_buffered_write_punch_delalloc
  2024-08-27 16:28   ` Darrick J. Wong
@ 2024-08-28  4:51     ` Christoph Hellwig
  0 siblings, 0 replies; 36+ messages in thread
From: Christoph Hellwig @ 2024-08-28  4:51 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Chandan Babu R, Christian Brauner, linux-xfs,
	linux-fsdevel

On Tue, Aug 27, 2024 at 09:28:04AM -0700, Darrick J. Wong wrote:
> On Tue, Aug 27, 2024 at 07:09:51AM +0200, Christoph Hellwig wrote:
> > All callers of iomap_zero_range already hold invalidate_lock, so we can't
> > take it again in iomap_file_buffered_write_punch_delalloc.
> 
> What about the xfs_zero_range call in xfs_file_write_checks?  AFAICT we
> don't hold the invalidate lock there.  Did I misread that?

No, I think you're right.  My testing just never managed to hit a short
zero while doing the write prep.

I guess I'll need to do something more complicated than the zero flag
then. I initially added a new flag just for that and then decided (wrongly,
as you pointed out) that I don't need it after all.

> Also, would nested takings of the invalidate lock cause a livelock?  Or
> is this actually quite broken now?

It is a cold, hard deadlock.



* Re: [PATCH 06/10] iomap: remove the iomap_file_buffered_write_punch_delalloc return value
  2024-08-27 16:36   ` Darrick J. Wong
@ 2024-08-28  4:52     ` Christoph Hellwig
  2024-08-28 16:18       ` Darrick J. Wong
  0 siblings, 1 reply; 36+ messages in thread
From: Christoph Hellwig @ 2024-08-28  4:52 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Chandan Babu R, Christian Brauner, linux-xfs,
	linux-fsdevel

On Tue, Aug 27, 2024 at 09:36:13AM -0700, Darrick J. Wong wrote:
> > 
> > As the only instance of ->punch never returns an error, and such an error
> > would be fatal anyway, remove the entire error propagation and don't
> > return an error code from iomap_file_buffered_write_punch_delalloc.
> 
> Not sure I like this one -- if the ->iomap_begin method returns some
> weird error to iomap_seek_{data,hole}, then I think we'd at least want
> to complain about that?

iomap_file_buffered_write_punch_delalloc never calls into
iomap_seek_{data,hole} and thus ->iomap_begin.  It just uses the lower
level mapping_seek_hole_data that checks dirty state, and that always
returns either an offset or -ENXIO, which is a special signal and not
an error.

> Though I guess we're punching delalloc mappings for a failed pagecache
> write, so we've already got ourselves a juicy EIO to throw up to the
> application so maybe it's fine not to bother with the error recovery
> erroring out.  Right?

But that is true as well.



* Re: [PATCH 01/10] iomap: handle a post-direct I/O invalidate race in iomap_write_delalloc_release
  2024-08-28  4:48     ` Christoph Hellwig
@ 2024-08-28 16:13       ` Darrick J. Wong
  2024-08-29  3:46         ` Christoph Hellwig
  0 siblings, 1 reply; 36+ messages in thread
From: Darrick J. Wong @ 2024-08-28 16:13 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chandan Babu R, Christian Brauner, linux-xfs, linux-fsdevel

On Wed, Aug 28, 2024 at 06:48:48AM +0200, Christoph Hellwig wrote:
> On Tue, Aug 27, 2024 at 09:14:16AM -0700, Darrick J. Wong wrote:
> > Is there any chance that we could get stuck in a loop here?  I
> > think it's the case that if SEEK_HOLE returns data_end == start_byte,
> > then the next time through the loop, the SEEK_DATA will return something
> > that is > start_byte.
> 
> Yes.
> 
> > Unless someone is very rapidly writing and
> > punching the page cache?
> > 
> > Hmm, but then if *xfs* is punching delalloc then we're holding the
> > iolock so who else could be doing that?
> 
> Yes.  It's only the async direct I/O completions that punch without
> the lock.

Heh, I forgot that I'd asked three questions.  Anyway I'm satisfied with
the answers I got, so

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

Though we might have to revisit this for filesystems that don't take
i_rwsem exclusively when writing -- is that a problem?  I guess if you
had two threads both writing and punching the pagecache they could get
into trouble, but that might be a case of "if it hurts don't do that".

--D



* Re: [PATCH 02/10] iomap: improve shared block detection in iomap_unshare_iter
  2024-08-28  4:49         ` Christoph Hellwig
@ 2024-08-28 16:17           ` Darrick J. Wong
  0 siblings, 0 replies; 36+ messages in thread
From: Darrick J. Wong @ 2024-08-28 16:17 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chandan Babu R, Christian Brauner, linux-xfs, linux-fsdevel

On Wed, Aug 28, 2024 at 06:49:29AM +0200, Christoph Hellwig wrote:
> On Tue, Aug 27, 2024 at 09:21:49AM -0700, Darrick J. Wong wrote:
> > > For writes it usually means out of place write, but for reporting
> > > it gets translated to the FIEMAP_EXTENT_SHARED flag or is used to
> > > reject swapon.  And then there is black magic in DAX.
> > 
> > Hee hee.  Yeah, let's leave IOMAP_F_SHARED alone; an out of place write
> > can be detected by iter->srcmap.type != HOLE.

out of place write => "OOP"

but that acronym is already taken

static inline bool iomap_write_oops(const struct iomap_iter *i)
{
	/* ... i wrote it again */
	return i->srcmap.type != HOLE;
}

<with apologies to the artist I just insulted>

jk

> I can probably come up with a comment that includes the COWextsize hints
> in the definition of out of place writes and we should be all fine..

Sounds good to me.

--D

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 06/10] iomap: remove the iomap_file_buffered_write_punch_delalloc return value
  2024-08-28  4:52     ` Christoph Hellwig
@ 2024-08-28 16:18       ` Darrick J. Wong
  0 siblings, 0 replies; 36+ messages in thread
From: Darrick J. Wong @ 2024-08-28 16:18 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chandan Babu R, Christian Brauner, linux-xfs, linux-fsdevel

On Wed, Aug 28, 2024 at 06:52:52AM +0200, Christoph Hellwig wrote:
> On Tue, Aug 27, 2024 at 09:36:13AM -0700, Darrick J. Wong wrote:
> > > 
> > > As the only instance of ->punch never returns an error, an such an error
> > > would be fatal anyway remove the entire error propagation and don't
> > > return an error code from iomap_file_buffered_write_punch_delalloc.
> > 
> > Not sure I like this one -- if the ->iomap_begin method returns some
> > weird error to iomap_seek_{data,hole}, then I think we'd at least want
> > to complain about that?
> 
> iomap_file_buffered_write_punch_delalloc never calls into
> iomap_seek_{data,hole} and thus ->iomap_begin.  It just uses the lower
> level mapping_seek_hole_data that checks dirty state, and that always
> returns either an offset or -ENXIO, which is a special signal and not
> an error.
> 
> > Though I guess we're punching delalloc mappings for a failed pagecache
> > write, so we've already got ourselves a juicy EIO to throw up to the
> > application so maybe it's fine not to bother with the error recovery
> > erroring out.  Right?
> 
> But that is true as well.

Ok, I'm satisfied,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 01/10] iomap: handle a post-direct I/O invalidate race in iomap_write_delalloc_release
  2024-08-28 16:13       ` Darrick J. Wong
@ 2024-08-29  3:46         ` Christoph Hellwig
  2024-08-29 14:22           ` Darrick J. Wong
  0 siblings, 1 reply; 36+ messages in thread
From: Christoph Hellwig @ 2024-08-29  3:46 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Chandan Babu R, Christian Brauner, linux-xfs,
	linux-fsdevel

On Wed, Aug 28, 2024 at 09:13:38AM -0700, Darrick J. Wong wrote:
> Though we might have to revisit this for filesystems that don't take
> i_rwsem exclusively when writing -- is that a problem?  I guess if you
> had two threads both writing and punching the pagecache they could get
> into trouble, but that might be a case of "if it hurts don't do that".

No i_rwsem for buffered writes?  You can't really do that without hell
breaking lose.  At least not without another exclusive lock.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 01/10] iomap: handle a post-direct I/O invalidate race in iomap_write_delalloc_release
  2024-08-29  3:46         ` Christoph Hellwig
@ 2024-08-29 14:22           ` Darrick J. Wong
  2024-08-30  3:42             ` Christoph Hellwig
  0 siblings, 1 reply; 36+ messages in thread
From: Darrick J. Wong @ 2024-08-29 14:22 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chandan Babu R, Christian Brauner, linux-xfs, linux-fsdevel

On Thu, Aug 29, 2024 at 05:46:26AM +0200, Christoph Hellwig wrote:
> On Wed, Aug 28, 2024 at 09:13:38AM -0700, Darrick J. Wong wrote:
> > Though we might have to revisit this for filesystems that don't take
> > i_rwsem exclusively when writing -- is that a problem?  I guess if you
> > had two threads both writing and punching the pagecache they could get
> > into trouble, but that might be a case of "if it hurts don't do that".
> 
> No i_rwsem for buffered writes?  You can't really do that without hell
> breaking lose.  At least not without another exclusive lock.

Well, i_rwsem in shared mode.  ISTR ext4 does inode_lock_shared and
serializes on the folio lock, at least for non extending writes.

--D

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 01/10] iomap: handle a post-direct I/O invalidate race in iomap_write_delalloc_release
  2024-08-29 14:22           ` Darrick J. Wong
@ 2024-08-30  3:42             ` Christoph Hellwig
  0 siblings, 0 replies; 36+ messages in thread
From: Christoph Hellwig @ 2024-08-30  3:42 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Chandan Babu R, Christian Brauner, linux-xfs,
	linux-fsdevel

On Thu, Aug 29, 2024 at 07:22:19AM -0700, Darrick J. Wong wrote:
> On Thu, Aug 29, 2024 at 05:46:26AM +0200, Christoph Hellwig wrote:
> > On Wed, Aug 28, 2024 at 09:13:38AM -0700, Darrick J. Wong wrote:
> > > Though we might have to revisit this for filesystems that don't take
> > > i_rwsem exclusively when writing -- is that a problem?  I guess if you
> > > had two threads both writing and punching the pagecache they could get
> > > into trouble, but that might be a case of "if it hurts don't do that".
> > 
> > No i_rwsem for buffered writes?  You can't really do that without hell
> > breaking lose.  At least not without another exclusive lock.
> 
> Well, i_rwsem in shared mode.  ISTR ext4 does inode_lock_shared and
> serializes on the folio lock, at least for non extending writes.

ext4 uses plain inode lock/unlock which is an exclusive i_rwsem.
Given that Posix requires the entire write to synchronized vs other
writes applications would break otherwise.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 09/10] xfs: set IOMAP_F_SHARED for all COW fork allocations
  2024-09-23 15:28 fix stale delalloc punching for COW I/O v3 Christoph Hellwig
@ 2024-09-23 15:28 ` Christoph Hellwig
  0 siblings, 0 replies; 36+ messages in thread
From: Christoph Hellwig @ 2024-09-23 15:28 UTC (permalink / raw)
  To: Chandan Babu R, Christian Brauner, Darrick J. Wong
  Cc: linux-xfs, linux-fsdevel

Change to always set xfs_buffered_write_iomap_begin for COW fork
allocations even if they don't overlap existing data fork extents,
which will allow the iomap_end callback to detect if it has to punch
stale delalloc blocks from the COW fork instead of the data fork.  It
also means we sample the sequence counter for both the data and the COW
fork when writing to the COW fork, which ensures we properly revalidate
when only COW fork changes happens.

This is essentially a revert of commit 72a048c1056a ("xfs: only set
IOMAP_F_SHARED when providing a srcmap to a write"). This is fine because
the problem that the commit fixed has now been dealt with in iomap by
only looking at the actual srcmap and not the fallback to the write
iomap.

Note that the direct I/O path was never changed and has always set
IOMAP_F_SHARED for all COW fork allocations.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_iomap.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index aff9e8305399ee..cc768f0139d365 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1186,20 +1186,19 @@ xfs_buffered_write_iomap_begin(
 	return 0;
 
 found_cow:
-	seq = xfs_iomap_inode_sequence(ip, 0);
 	if (imap.br_startoff <= offset_fsb) {
-		error = xfs_bmbt_to_iomap(ip, srcmap, &imap, flags, 0, seq);
+		error = xfs_bmbt_to_iomap(ip, srcmap, &imap, flags, 0,
+				xfs_iomap_inode_sequence(ip, 0));
 		if (error)
 			goto out_unlock;
-		seq = xfs_iomap_inode_sequence(ip, IOMAP_F_SHARED);
-		xfs_iunlock(ip, lockmode);
-		return xfs_bmbt_to_iomap(ip, iomap, &cmap, flags,
-					 IOMAP_F_SHARED, seq);
+	} else {
+		xfs_trim_extent(&cmap, offset_fsb, imap.br_startoff - offset_fsb);
 	}
 
-	xfs_trim_extent(&cmap, offset_fsb, imap.br_startoff - offset_fsb);
+	iomap_flags = IOMAP_F_SHARED;
+	seq = xfs_iomap_inode_sequence(ip, iomap_flags);
 	xfs_iunlock(ip, lockmode);
-	return xfs_bmbt_to_iomap(ip, iomap, &cmap, flags, 0, seq);
+	return xfs_bmbt_to_iomap(ip, iomap, &cmap, flags, iomap_flags, seq);
 
 out_unlock:
 	xfs_iunlock(ip, lockmode);
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 09/10] xfs: set IOMAP_F_SHARED for all COW fork allocations
  2024-09-24  7:40 fix stale delalloc punching for COW I/O v4 Christoph Hellwig
@ 2024-09-24  7:40 ` Christoph Hellwig
  0 siblings, 0 replies; 36+ messages in thread
From: Christoph Hellwig @ 2024-09-24  7:40 UTC (permalink / raw)
  To: Chandan Babu R, Carlos Maiolino, Christian Brauner,
	Darrick J. Wong
  Cc: linux-xfs, linux-fsdevel

Change to always set xfs_buffered_write_iomap_begin for COW fork
allocations even if they don't overlap existing data fork extents,
which will allow the iomap_end callback to detect if it has to punch
stale delalloc blocks from the COW fork instead of the data fork.  It
also means we sample the sequence counter for both the data and the COW
fork when writing to the COW fork, which ensures we properly revalidate
when only COW fork changes happens.

This is essentially a revert of commit 72a048c1056a ("xfs: only set
IOMAP_F_SHARED when providing a srcmap to a write"). This is fine because
the problem that the commit fixed has now been dealt with in iomap by
only looking at the actual srcmap and not the fallback to the write
iomap.

Note that the direct I/O path was never changed and has always set
IOMAP_F_SHARED for all COW fork allocations.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_iomap.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index aff9e8305399ee..cc768f0139d365 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1186,20 +1186,19 @@ xfs_buffered_write_iomap_begin(
 	return 0;
 
 found_cow:
-	seq = xfs_iomap_inode_sequence(ip, 0);
 	if (imap.br_startoff <= offset_fsb) {
-		error = xfs_bmbt_to_iomap(ip, srcmap, &imap, flags, 0, seq);
+		error = xfs_bmbt_to_iomap(ip, srcmap, &imap, flags, 0,
+				xfs_iomap_inode_sequence(ip, 0));
 		if (error)
 			goto out_unlock;
-		seq = xfs_iomap_inode_sequence(ip, IOMAP_F_SHARED);
-		xfs_iunlock(ip, lockmode);
-		return xfs_bmbt_to_iomap(ip, iomap, &cmap, flags,
-					 IOMAP_F_SHARED, seq);
+	} else {
+		xfs_trim_extent(&cmap, offset_fsb, imap.br_startoff - offset_fsb);
 	}
 
-	xfs_trim_extent(&cmap, offset_fsb, imap.br_startoff - offset_fsb);
+	iomap_flags = IOMAP_F_SHARED;
+	seq = xfs_iomap_inode_sequence(ip, iomap_flags);
 	xfs_iunlock(ip, lockmode);
-	return xfs_bmbt_to_iomap(ip, iomap, &cmap, flags, 0, seq);
+	return xfs_bmbt_to_iomap(ip, iomap, &cmap, flags, iomap_flags, seq);
 
 out_unlock:
 	xfs_iunlock(ip, lockmode);
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 09/10] xfs: set IOMAP_F_SHARED for all COW fork allocations
  2024-10-08  8:59 fix stale delalloc punching for COW I/O v5 Christoph Hellwig
@ 2024-10-08  8:59 ` Christoph Hellwig
  0 siblings, 0 replies; 36+ messages in thread
From: Christoph Hellwig @ 2024-10-08  8:59 UTC (permalink / raw)
  To: Carlos Maiolino, Christian Brauner, Darrick J. Wong
  Cc: linux-xfs, linux-fsdevel

Change to always set xfs_buffered_write_iomap_begin for COW fork
allocations even if they don't overlap existing data fork extents,
which will allow the iomap_end callback to detect if it has to punch
stale delalloc blocks from the COW fork instead of the data fork.  It
also means we sample the sequence counter for both the data and the COW
fork when writing to the COW fork, which ensures we properly revalidate
when only COW fork changes happens.

This is essentially a revert of commit 72a048c1056a ("xfs: only set
IOMAP_F_SHARED when providing a srcmap to a write"). This is fine because
the problem that the commit fixed has now been dealt with in iomap by
only looking at the actual srcmap and not the fallback to the write
iomap.

Note that the direct I/O path was never changed and has always set
IOMAP_F_SHARED for all COW fork allocations.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_iomap.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index ebd0c90c1b3d8e..0317bbfeeb38f3 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1186,20 +1186,20 @@ xfs_buffered_write_iomap_begin(
 	return 0;
 
 found_cow:
-	seq = xfs_iomap_inode_sequence(ip, 0);
 	if (imap.br_startoff <= offset_fsb) {
-		error = xfs_bmbt_to_iomap(ip, srcmap, &imap, flags, 0, seq);
+		error = xfs_bmbt_to_iomap(ip, srcmap, &imap, flags, 0,
+				xfs_iomap_inode_sequence(ip, 0));
 		if (error)
 			goto out_unlock;
-		seq = xfs_iomap_inode_sequence(ip, IOMAP_F_SHARED);
-		xfs_iunlock(ip, lockmode);
-		return xfs_bmbt_to_iomap(ip, iomap, &cmap, flags,
-					 IOMAP_F_SHARED, seq);
+	} else {
+		xfs_trim_extent(&cmap, offset_fsb,
+				imap.br_startoff - offset_fsb);
 	}
 
-	xfs_trim_extent(&cmap, offset_fsb, imap.br_startoff - offset_fsb);
+	iomap_flags = IOMAP_F_SHARED;
+	seq = xfs_iomap_inode_sequence(ip, iomap_flags);
 	xfs_iunlock(ip, lockmode);
-	return xfs_bmbt_to_iomap(ip, iomap, &cmap, flags, 0, seq);
+	return xfs_bmbt_to_iomap(ip, iomap, &cmap, flags, iomap_flags, seq);
 
 out_unlock:
 	xfs_iunlock(ip, lockmode);
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2024-10-08  9:00 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-27  5:09 fix stale delalloc punching for COW I/O Christoph Hellwig
2024-08-27  5:09 ` [PATCH 01/10] iomap: handle a post-direct I/O invalidate race in iomap_write_delalloc_release Christoph Hellwig
2024-08-27 16:14   ` Darrick J. Wong
2024-08-28  4:48     ` Christoph Hellwig
2024-08-28 16:13       ` Darrick J. Wong
2024-08-29  3:46         ` Christoph Hellwig
2024-08-29 14:22           ` Darrick J. Wong
2024-08-30  3:42             ` Christoph Hellwig
2024-08-27  5:09 ` [PATCH 02/10] iomap: improve shared block detection in iomap_unshare_iter Christoph Hellwig
2024-08-27  5:44   ` Darrick J. Wong
2024-08-27  5:47     ` Christoph Hellwig
2024-08-27 16:21       ` Darrick J. Wong
2024-08-28  4:49         ` Christoph Hellwig
2024-08-28 16:17           ` Darrick J. Wong
2024-08-27  5:09 ` [PATCH 03/10] iomap: pass flags to iomap_file_buffered_write_punch_delalloc Christoph Hellwig
2024-08-27 16:22   ` Darrick J. Wong
2024-08-27  5:09 ` [PATCH 04/10] iomap: zeroing already holds invalidate_lock in iomap_file_buffered_write_punch_delalloc Christoph Hellwig
2024-08-27 16:28   ` Darrick J. Wong
2024-08-28  4:51     ` Christoph Hellwig
2024-08-27  5:09 ` [PATCH 05/10] iomap: pass the iomap to the punch callback Christoph Hellwig
2024-08-27 16:28   ` Darrick J. Wong
2024-08-27  5:09 ` [PATCH 06/10] iomap: remove the iomap_file_buffered_write_punch_delalloc return value Christoph Hellwig
2024-08-27 16:36   ` Darrick J. Wong
2024-08-28  4:52     ` Christoph Hellwig
2024-08-28 16:18       ` Darrick J. Wong
2024-08-27  5:09 ` [PATCH 07/10] xfs: support the COW fork in xfs_bmap_punch_delalloc_range Christoph Hellwig
2024-08-27 16:37   ` Darrick J. Wong
2024-08-27  5:09 ` [PATCH 08/10] xfs: share a bit more code in xfs_buffered_write_iomap_begin Christoph Hellwig
2024-08-27 16:38   ` Darrick J. Wong
2024-08-27  5:09 ` [PATCH 09/10] xfs: set IOMAP_F_SHARED for all COW fork allocations Christoph Hellwig
2024-08-27 16:44   ` Darrick J. Wong
2024-08-27  5:09 ` [PATCH 10/10] xfs: punch delalloc extents from the COW fork for COW writes Christoph Hellwig
2024-08-27 16:44   ` Darrick J. Wong
  -- strict thread matches above, loose matches on Subject: below --
2024-09-23 15:28 fix stale delalloc punching for COW I/O v3 Christoph Hellwig
2024-09-23 15:28 ` [PATCH 09/10] xfs: set IOMAP_F_SHARED for all COW fork allocations Christoph Hellwig
2024-09-24  7:40 fix stale delalloc punching for COW I/O v4 Christoph Hellwig
2024-09-24  7:40 ` [PATCH 09/10] xfs: set IOMAP_F_SHARED for all COW fork allocations Christoph Hellwig
2024-10-08  8:59 fix stale delalloc punching for COW I/O v5 Christoph Hellwig
2024-10-08  8:59 ` [PATCH 09/10] xfs: set IOMAP_F_SHARED for all COW fork allocations Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).