post-EOF block handling revamp

public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed

* post-EOF block handling revamp
@ 2024-06-23  5:34 Christoph Hellwig
  2024-06-23  5:34 ` [PATCH 01/10] xfs: fix freeing speculative preallocations for preallocated files Christoph Hellwig
                   ` (9 more replies)
  0 siblings, 10 replies; 32+ messages in thread
From: Christoph Hellwig @ 2024-06-23  5:34 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: Darrick J. Wong, Dave Chinner, linux-xfs

Hi all,

this series reworks handling of post-EOF blocks, primarily in ->release.
This takes over the work originally started by Dave in:

    https://marc.info/?l=linux-xfs&m=154951612101291&w=2

and lingering in Darricks's tree:

    https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=reduce-eofblocks-gc-on-close

for years to ensure ->release doesn't too eagerly kill post-EOF block
speculative preallocation and then goes on to not let preallocation
on inodes with the append only flag set linger forever.

I'll also post a rebased version of Dave's patches from back then.

The first patch has already been sent standalone and as part of Darrick's
fixes series, but as the rest of the series depends on it I'm sending it
here again - third time's a charm.

Diffstat:
 xfs_bmap_util.c |   58 ++++++++++++++++++----------------
 xfs_bmap_util.h |    2 -
 xfs_file.c      |   73 +++++++++++++++++++++++++++++++++++++++++--
 xfs_icache.c    |    4 +-
 xfs_inode.c     |   94 +-------------------------------------------------------
 xfs_inode.h     |    5 +-
 6 files changed, 109 insertions(+), 127 deletions(-)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 01/10] xfs: fix freeing speculative preallocations for preallocated files
  2024-06-23  5:34 post-EOF block handling revamp Christoph Hellwig
@ 2024-06-23  5:34 ` Christoph Hellwig
  2024-06-24 15:30   ` Darrick J. Wong
  2024-06-23  5:34 ` [PATCH 02/10] xfs: remove the i_mode check in xfs_release Christoph Hellwig
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2024-06-23  5:34 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: Darrick J. Wong, Dave Chinner, linux-xfs

xfs_can_free_eofblocks returns false for files that have persistent
preallocations unless the force flag is passed and there are delayed
blocks.  This means it won't free delalloc reservations for files
with persistent preallocations unless the force flag is set, and it
will also free the persistent preallocations if the force flag is
set and the file happens to have delayed allocations.

Both of these are bad, so do away with the force flag and always
free post-EOF delayed allocations only for files with the
XFS_DIFLAG_PREALLOC flag set.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_bmap_util.c | 30 ++++++++++++++++++++++--------
 fs/xfs/xfs_bmap_util.h |  2 +-
 fs/xfs/xfs_icache.c    |  2 +-
 fs/xfs/xfs_inode.c     | 14 ++++----------
 4 files changed, 28 insertions(+), 20 deletions(-)

diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index ac2e77ebb54c73..a4d9fbc21b8343 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -486,13 +486,11 @@ xfs_bmap_punch_delalloc_range(
 
 /*
  * Test whether it is appropriate to check an inode for and free post EOF
- * blocks. The 'force' parameter determines whether we should also consider
- * regular files that are marked preallocated or append-only.
+ * blocks.
  */
 bool
 xfs_can_free_eofblocks(
-	struct xfs_inode	*ip,
-	bool			force)
+	struct xfs_inode	*ip)
 {
 	struct xfs_bmbt_irec	imap;
 	struct xfs_mount	*mp = ip->i_mount;
@@ -526,11 +524,11 @@ xfs_can_free_eofblocks(
 		return false;
 
 	/*
-	 * Do not free real preallocated or append-only files unless the file
-	 * has delalloc blocks and we are forced to remove them.
+	 * Only free real extents for inodes with persistent preallocations or
+	 * the append-only flag.
 	 */
 	if (ip->i_diflags & (XFS_DIFLAG_PREALLOC | XFS_DIFLAG_APPEND))
-		if (!force || ip->i_delayed_blks == 0)
+		if (ip->i_delayed_blks == 0)
 			return false;
 
 	/*
@@ -584,6 +582,22 @@ xfs_free_eofblocks(
 	/* Wait on dio to ensure i_size has settled. */
 	inode_dio_wait(VFS_I(ip));
 
+	/*
+	 * For preallocated files only free delayed allocations.
+	 *
+	 * Note that this means we also leave speculative preallocations in
+	 * place for preallocated files.
+	 */
+	if (ip->i_diflags & (XFS_DIFLAG_PREALLOC | XFS_DIFLAG_APPEND)) {
+		if (ip->i_delayed_blks) {
+			xfs_bmap_punch_delalloc_range(ip,
+				round_up(XFS_ISIZE(ip), mp->m_sb.sb_blocksize),
+				LLONG_MAX);
+		}
+		xfs_inode_clear_eofblocks_tag(ip);
+		return 0;
+	}
+
 	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, 0, 0, 0, &tp);
 	if (error) {
 		ASSERT(xfs_is_shutdown(mp));
@@ -891,7 +905,7 @@ xfs_prepare_shift(
 	 * Trim eofblocks to avoid shifting uninitialized post-eof preallocation
 	 * into the accessible region of the file.
 	 */
-	if (xfs_can_free_eofblocks(ip, true)) {
+	if (xfs_can_free_eofblocks(ip)) {
 		error = xfs_free_eofblocks(ip);
 		if (error)
 			return error;
diff --git a/fs/xfs/xfs_bmap_util.h b/fs/xfs/xfs_bmap_util.h
index 51f84d8ff372fa..eb0895bfb9dae4 100644
--- a/fs/xfs/xfs_bmap_util.h
+++ b/fs/xfs/xfs_bmap_util.h
@@ -63,7 +63,7 @@ int	xfs_insert_file_space(struct xfs_inode *, xfs_off_t offset,
 				xfs_off_t len);
 
 /* EOF block manipulation functions */
-bool	xfs_can_free_eofblocks(struct xfs_inode *ip, bool force);
+bool	xfs_can_free_eofblocks(struct xfs_inode *ip);
 int	xfs_free_eofblocks(struct xfs_inode *ip);
 
 int	xfs_swap_extents(struct xfs_inode *ip, struct xfs_inode *tip,
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 0953163a2d8492..9967334ea99f1a 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -1155,7 +1155,7 @@ xfs_inode_free_eofblocks(
 	}
 	*lockflags |= XFS_IOLOCK_EXCL;
 
-	if (xfs_can_free_eofblocks(ip, false))
+	if (xfs_can_free_eofblocks(ip))
 		return xfs_free_eofblocks(ip);
 
 	/* inode could be preallocated or append-only */
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index f36091e1e7f50b..38f946e3be2da3 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1595,7 +1595,7 @@ xfs_release(
 	if (!xfs_ilock_nowait(ip, XFS_IOLOCK_EXCL))
 		return 0;
 
-	if (xfs_can_free_eofblocks(ip, false)) {
+	if (xfs_can_free_eofblocks(ip)) {
 		/*
 		 * Check if the inode is being opened, written and closed
 		 * frequently and we have delayed allocation blocks outstanding
@@ -1856,15 +1856,13 @@ xfs_inode_needs_inactive(
 
 	/*
 	 * This file isn't being freed, so check if there are post-eof blocks
-	 * to free.  @force is true because we are evicting an inode from the
-	 * cache.  Post-eof blocks must be freed, lest we end up with broken
-	 * free space accounting.
+	 * to free.
 	 *
 	 * Note: don't bother with iolock here since lockdep complains about
 	 * acquiring it in reclaim context. We have the only reference to the
 	 * inode at this point anyways.
 	 */
-	return xfs_can_free_eofblocks(ip, true);
+	return xfs_can_free_eofblocks(ip);
 }
 
 /*
@@ -1947,15 +1945,11 @@ xfs_inactive(
 
 	if (VFS_I(ip)->i_nlink != 0) {
 		/*
-		 * force is true because we are evicting an inode from the
-		 * cache. Post-eof blocks must be freed, lest we end up with
-		 * broken free space accounting.
-		 *
 		 * Note: don't bother with iolock here since lockdep complains
 		 * about acquiring it in reclaim context. We have the only
 		 * reference to the inode at this point anyways.
 		 */
-		if (xfs_can_free_eofblocks(ip, true))
+		if (xfs_can_free_eofblocks(ip))
 			error = xfs_free_eofblocks(ip);
 
 		goto out;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 02/10] xfs: remove the i_mode check in xfs_release
  2024-06-23  5:34 post-EOF block handling revamp Christoph Hellwig
  2024-06-23  5:34 ` [PATCH 01/10] xfs: fix freeing speculative preallocations for preallocated files Christoph Hellwig
@ 2024-06-23  5:34 ` Christoph Hellwig
  2024-06-24 15:34   ` Darrick J. Wong
  2024-06-23  5:34 ` [PATCH 03/10] xfs: refactor f_op->release handling Christoph Hellwig
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2024-06-23  5:34 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: Darrick J. Wong, Dave Chinner, linux-xfs

xfs_release is only called from xfs_file_release, which is wired up as
the f_op->release handler for regular files only.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_inode.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 38f946e3be2da3..9a9340aebe9d8a 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1552,9 +1552,6 @@ xfs_release(
 	xfs_mount_t	*mp = ip->i_mount;
 	int		error = 0;
 
-	if (!S_ISREG(VFS_I(ip)->i_mode) || (VFS_I(ip)->i_mode == 0))
-		return 0;
-
 	/* If this is a read-only mount, don't do this (would generate I/O) */
 	if (xfs_is_readonly(mp))
 		return 0;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 03/10] xfs: refactor f_op->release handling
  2024-06-23  5:34 post-EOF block handling revamp Christoph Hellwig
  2024-06-23  5:34 ` [PATCH 01/10] xfs: fix freeing speculative preallocations for preallocated files Christoph Hellwig
  2024-06-23  5:34 ` [PATCH 02/10] xfs: remove the i_mode check in xfs_release Christoph Hellwig
@ 2024-06-23  5:34 ` Christoph Hellwig
  2024-06-24 15:35   ` Darrick J. Wong
  2024-06-23  5:34 ` [PATCH 04/10] xfs: don't bother returning errors from xfs_file_release Christoph Hellwig
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2024-06-23  5:34 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: Darrick J. Wong, Dave Chinner, linux-xfs

Currently f_op->release is split in not very obvious ways.  Fix that by
folding xfs_release into xfs_file_release.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_file.c  | 71 +++++++++++++++++++++++++++++++++++++++--
 fs/xfs/xfs_inode.c | 79 ----------------------------------------------
 fs/xfs/xfs_inode.h |  1 -
 3 files changed, 68 insertions(+), 83 deletions(-)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index b240ea5241dc9d..d39d0ea522d1c2 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1188,10 +1188,75 @@ xfs_dir_open(
 
 STATIC int
 xfs_file_release(
-	struct inode	*inode,
-	struct file	*filp)
+	struct inode		*inode,
+	struct file		*file)
 {
-	return xfs_release(XFS_I(inode));
+	struct xfs_inode	*ip = XFS_I(inode);
+	struct xfs_mount	*mp = ip->i_mount;
+	int			error;
+
+	/* If this is a read-only mount, don't generate I/O */
+	if (xfs_is_readonly(mp))
+		return 0;
+
+	/*
+	 * If we previously truncated this file and removed old data in the
+	 * process, we want to initiate "early" writeout on the last close.
+	 * This is an attempt to combat the notorious NULL files problem which
+	 * is particularly noticeable from a truncate down, buffered (re-)write
+	 * (delalloc), followed by a crash.  What we are effectively doing here
+	 * is significantly reducing the time window where we'd otherwise be
+	 * exposed to that problem.
+	 */
+	if (!xfs_is_shutdown(mp) &&
+	    xfs_iflags_test_and_clear(ip, XFS_ITRUNCATED)) {
+		xfs_iflags_clear(ip, XFS_IDIRTY_RELEASE);
+		if (ip->i_delayed_blks > 0) {
+			error = filemap_flush(inode->i_mapping);
+			if (error)
+				return error;
+		}
+	}
+
+	/*
+	 * XFS aggressively preallocates post-EOF space to generate contiguous
+	 * allocations for writers that append to the end of the file and we
+	 * try to free these when an open file context is released.
+	 *
+	 * There is no point in freeing blocks here for open but unlinked files
+	 * as they will be taken care of by the inactivation path soon.
+	 *
+	 * If we can't get the iolock just skip truncating the blocks past EOF
+	 * because we could deadlock with the mmap_lock otherwise. We'll get
+	 * another chance to drop them once the last reference to the inode is
+	 * dropped, so we'll never leak blocks permanently.
+	 */
+	if (inode->i_nlink && xfs_ilock_nowait(ip, XFS_IOLOCK_EXCL)) {
+		if (xfs_can_free_eofblocks(ip) &&
+		    !xfs_iflags_test(ip, XFS_IDIRTY_RELEASE)) {
+			/*
+			 * Check if the inode is being opened, written and
+			 * closed frequently and we have delayed allocation
+			 * blocks outstanding (e.g. streaming writes from the
+			 * NFS server), truncating the blocks past EOF will
+			 * cause fragmentation to occur.
+			 *
+			 * In this case don't do the truncation, but we have to
+			 * be careful how we detect this case. Blocks beyond EOF
+			 * show up as i_delayed_blks even when the inode is
+			 * clean, so we need to truncate them away first before
+			 * checking for a dirty release. Hence on the first
+			 * dirty close we will still remove the speculative
+			 * allocation, but after that we will leave it in place.
+			 */
+			error = xfs_free_eofblocks(ip);
+			if (!error && ip->i_delayed_blks)
+				xfs_iflags_set(ip, XFS_IDIRTY_RELEASE);
+		}
+		xfs_iunlock(ip, XFS_IOLOCK_EXCL);
+	}
+
+	return error;
 }
 
 STATIC int
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 9a9340aebe9d8a..fe4906a08665ee 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1545,85 +1545,6 @@ xfs_itruncate_extents_flags(
 	return error;
 }
 
-int
-xfs_release(
-	xfs_inode_t	*ip)
-{
-	xfs_mount_t	*mp = ip->i_mount;
-	int		error = 0;
-
-	/* If this is a read-only mount, don't do this (would generate I/O) */
-	if (xfs_is_readonly(mp))
-		return 0;
-
-	if (!xfs_is_shutdown(mp)) {
-		int truncated;
-
-		/*
-		 * If we previously truncated this file and removed old data
-		 * in the process, we want to initiate "early" writeout on
-		 * the last close.  This is an attempt to combat the notorious
-		 * NULL files problem which is particularly noticeable from a
-		 * truncate down, buffered (re-)write (delalloc), followed by
-		 * a crash.  What we are effectively doing here is
-		 * significantly reducing the time window where we'd otherwise
-		 * be exposed to that problem.
-		 */
-		truncated = xfs_iflags_test_and_clear(ip, XFS_ITRUNCATED);
-		if (truncated) {
-			xfs_iflags_clear(ip, XFS_IDIRTY_RELEASE);
-			if (ip->i_delayed_blks > 0) {
-				error = filemap_flush(VFS_I(ip)->i_mapping);
-				if (error)
-					return error;
-			}
-		}
-	}
-
-	if (VFS_I(ip)->i_nlink == 0)
-		return 0;
-
-	/*
-	 * If we can't get the iolock just skip truncating the blocks past EOF
-	 * because we could deadlock with the mmap_lock otherwise. We'll get
-	 * another chance to drop them once the last reference to the inode is
-	 * dropped, so we'll never leak blocks permanently.
-	 */
-	if (!xfs_ilock_nowait(ip, XFS_IOLOCK_EXCL))
-		return 0;
-
-	if (xfs_can_free_eofblocks(ip)) {
-		/*
-		 * Check if the inode is being opened, written and closed
-		 * frequently and we have delayed allocation blocks outstanding
-		 * (e.g. streaming writes from the NFS server), truncating the
-		 * blocks past EOF will cause fragmentation to occur.
-		 *
-		 * In this case don't do the truncation, but we have to be
-		 * careful how we detect this case. Blocks beyond EOF show up as
-		 * i_delayed_blks even when the inode is clean, so we need to
-		 * truncate them away first before checking for a dirty release.
-		 * Hence on the first dirty close we will still remove the
-		 * speculative allocation, but after that we will leave it in
-		 * place.
-		 */
-		if (xfs_iflags_test(ip, XFS_IDIRTY_RELEASE))
-			goto out_unlock;
-
-		error = xfs_free_eofblocks(ip);
-		if (error)
-			goto out_unlock;
-
-		/* delalloc blocks after truncation means it really is dirty */
-		if (ip->i_delayed_blks)
-			xfs_iflags_set(ip, XFS_IDIRTY_RELEASE);
-	}
-
-out_unlock:
-	xfs_iunlock(ip, XFS_IOLOCK_EXCL);
-	return error;
-}
-
 /*
  * Mark all the buffers attached to this directory stale.  In theory we should
  * never be freeing a directory with any blocks at all, but this covers the
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 292b90b5f2ac84..ae9851226f9913 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -513,7 +513,6 @@ enum layout_break_reason {
 #define XFS_INHERIT_GID(pip)	\
 	(xfs_has_grpid((pip)->i_mount) || (VFS_I(pip)->i_mode & S_ISGID))
 
-int		xfs_release(struct xfs_inode *ip);
 int		xfs_inactive(struct xfs_inode *ip);
 int		xfs_lookup(struct xfs_inode *dp, const struct xfs_name *name,
 			   struct xfs_inode **ipp, struct xfs_name *ci_name);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 04/10] xfs: don't bother returning errors from xfs_file_release
  2024-06-23  5:34 post-EOF block handling revamp Christoph Hellwig
                   ` (2 preceding siblings ...)
  2024-06-23  5:34 ` [PATCH 03/10] xfs: refactor f_op->release handling Christoph Hellwig
@ 2024-06-23  5:34 ` Christoph Hellwig
  2024-06-24 15:39   ` Darrick J. Wong
  2024-06-23  5:34 ` [PATCH 05/10] xfs: skip all of xfs_file_release when shut down Christoph Hellwig
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2024-06-23  5:34 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: Darrick J. Wong, Dave Chinner, linux-xfs

While ->release returns int, the only caller ignores the return value.
As we're only doing cleanup work there isn't much of a point in
return a value to start with, so just document the situation instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_file.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index d39d0ea522d1c2..7b91cbab80da55 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1186,6 +1186,10 @@ xfs_dir_open(
 	return error;
 }
 
+/*
+ * Don't bother propagating errors.  We're just doing cleanup, and the caller
+ * ignores the return value anyway.
+ */
 STATIC int
 xfs_file_release(
 	struct inode		*inode,
@@ -1193,7 +1197,6 @@ xfs_file_release(
 {
 	struct xfs_inode	*ip = XFS_I(inode);
 	struct xfs_mount	*mp = ip->i_mount;
-	int			error;
 
 	/* If this is a read-only mount, don't generate I/O */
 	if (xfs_is_readonly(mp))
@@ -1211,11 +1214,8 @@ xfs_file_release(
 	if (!xfs_is_shutdown(mp) &&
 	    xfs_iflags_test_and_clear(ip, XFS_ITRUNCATED)) {
 		xfs_iflags_clear(ip, XFS_IDIRTY_RELEASE);
-		if (ip->i_delayed_blks > 0) {
-			error = filemap_flush(inode->i_mapping);
-			if (error)
-				return error;
-		}
+		if (ip->i_delayed_blks > 0)
+			filemap_flush(inode->i_mapping);
 	}
 
 	/*
@@ -1249,14 +1249,14 @@ xfs_file_release(
 			 * dirty close we will still remove the speculative
 			 * allocation, but after that we will leave it in place.
 			 */
-			error = xfs_free_eofblocks(ip);
-			if (!error && ip->i_delayed_blks)
+			xfs_free_eofblocks(ip);
+			if (ip->i_delayed_blks)
 				xfs_iflags_set(ip, XFS_IDIRTY_RELEASE);
 		}
 		xfs_iunlock(ip, XFS_IOLOCK_EXCL);
 	}
 
-	return error;
+	return 0;
 }
 
 STATIC int
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 05/10] xfs: skip all of xfs_file_release when shut down
  2024-06-23  5:34 post-EOF block handling revamp Christoph Hellwig
                   ` (3 preceding siblings ...)
  2024-06-23  5:34 ` [PATCH 04/10] xfs: don't bother returning errors from xfs_file_release Christoph Hellwig
@ 2024-06-23  5:34 ` Christoph Hellwig
  2024-06-24 15:41   ` Darrick J. Wong
  2024-06-23  5:34 ` [PATCH 06/10] xfs: don't free post-EOF blocks on read close Christoph Hellwig
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2024-06-23  5:34 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: Darrick J. Wong, Dave Chinner, linux-xfs

There is no point in trying to free post-EOF blocks when the file system
is shutdown, as it will just error out ASAP.  Instead return instantly
when xfs_file_shutdown is called on a shut down file system.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_file.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 7b91cbab80da55..0380e0b1d9c6c7 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1198,8 +1198,11 @@ xfs_file_release(
 	struct xfs_inode	*ip = XFS_I(inode);
 	struct xfs_mount	*mp = ip->i_mount;
 
-	/* If this is a read-only mount, don't generate I/O */
-	if (xfs_is_readonly(mp))
+	/*
+	 * If this is a read-only mount or the file system has been shut down,
+	 * don't generate I/O.
+	 */
+	if (xfs_is_readonly(mp) || xfs_is_shutdown(mp))
 		return 0;
 
 	/*
@@ -1211,8 +1214,7 @@ xfs_file_release(
 	 * is significantly reducing the time window where we'd otherwise be
 	 * exposed to that problem.
 	 */
-	if (!xfs_is_shutdown(mp) &&
-	    xfs_iflags_test_and_clear(ip, XFS_ITRUNCATED)) {
+	if (xfs_iflags_test_and_clear(ip, XFS_ITRUNCATED)) {
 		xfs_iflags_clear(ip, XFS_IDIRTY_RELEASE);
 		if (ip->i_delayed_blks > 0)
 			filemap_flush(inode->i_mapping);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 06/10] xfs: don't free post-EOF blocks on read close
  2024-06-23  5:34 post-EOF block handling revamp Christoph Hellwig
                   ` (4 preceding siblings ...)
  2024-06-23  5:34 ` [PATCH 05/10] xfs: skip all of xfs_file_release when shut down Christoph Hellwig
@ 2024-06-23  5:34 ` Christoph Hellwig
  2024-06-24 15:43   ` Darrick J. Wong
  2024-06-23  5:34 ` [PATCH 07/10] xfs: only free posteof blocks on first close Christoph Hellwig
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2024-06-23  5:34 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: Darrick J. Wong, Dave Chinner, linux-xfs

From: Dave Chinner <dchinner@redhat.com>

When we have a workload that does open/read/close in parallel with other
allocation, the file becomes rapidly fragmented. This is due to close()
calling xfs_file_release() and removing the speculative preallocation
beyond EOF.

Add a check for a writable context to xfs_file_release to skip the
post-EOF block freeing (an the similarly pointless flushing on truncate
down).

Before:

Test 1: sync write fragmentation counts

/mnt/scratch/file.0: 919
/mnt/scratch/file.1: 916
/mnt/scratch/file.2: 919
/mnt/scratch/file.3: 920
/mnt/scratch/file.4: 920
/mnt/scratch/file.5: 921
/mnt/scratch/file.6: 916
/mnt/scratch/file.7: 918

After:

Test 1: sync write fragmentation counts

/mnt/scratch/file.0: 24
/mnt/scratch/file.1: 24
/mnt/scratch/file.2: 11
/mnt/scratch/file.3: 24
/mnt/scratch/file.4: 3
/mnt/scratch/file.5: 24
/mnt/scratch/file.6: 24
/mnt/scratch/file.7: 23

Signed-off-by: Dave Chinner <dchinner@redhat.com>
[darrick: wordsmithing, fix commit message]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
[hch: ported to the new ->release code structure]
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_file.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 0380e0b1d9c6c7..8d70171678fe24 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1228,12 +1228,18 @@ xfs_file_release(
 	 * There is no point in freeing blocks here for open but unlinked files
 	 * as they will be taken care of by the inactivation path soon.
 	 *
+	 * When releasing a read-only context, don't flush data or trim post-EOF
+	 * blocks.  This avoids open/read/close workloads from removing EOF
+	 * blocks that other writers depend upon to reduce fragmentation.
+	 *
 	 * If we can't get the iolock just skip truncating the blocks past EOF
 	 * because we could deadlock with the mmap_lock otherwise. We'll get
 	 * another chance to drop them once the last reference to the inode is
 	 * dropped, so we'll never leak blocks permanently.
 	 */
-	if (inode->i_nlink && xfs_ilock_nowait(ip, XFS_IOLOCK_EXCL)) {
+	if (inode->i_nlink &&
+	    (file->f_mode & FMODE_WRITE) &&
+	    xfs_ilock_nowait(ip, XFS_IOLOCK_EXCL)) {
 		if (xfs_can_free_eofblocks(ip) &&
 		    !xfs_iflags_test(ip, XFS_IDIRTY_RELEASE)) {
 			/*
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 07/10] xfs: only free posteof blocks on first close
  2024-06-23  5:34 post-EOF block handling revamp Christoph Hellwig
                   ` (5 preceding siblings ...)
  2024-06-23  5:34 ` [PATCH 06/10] xfs: don't free post-EOF blocks on read close Christoph Hellwig
@ 2024-06-23  5:34 ` Christoph Hellwig
  2024-06-24 15:46   ` Darrick J. Wong
  2024-06-23  5:34 ` [PATCH 08/10] xfs: check XFS_IDIRTY_RELEASE earlier in xfs_release_eofblocks Christoph Hellwig
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2024-06-23  5:34 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: Darrick J. Wong, Dave Chinner, linux-xfs

From: "Darrick J. Wong" <djwong@kernel.org>

Certain workloads fragment files on XFS very badly, such as a software
package that creates a number of threads, each of which repeatedly run
the sequence: open a file, perform a synchronous write, and close the
file, which defeats the speculative preallocation mechanism.  We work
around this problem by only deleting posteof blocks the /first/ time a
file is closed to preserve the behavior that unpacking a tarball lays
out files one after the other with no gaps.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
[hch: rebased, updated comment, renamed the flag]
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_file.c  | 32 +++++++++++---------------------
 fs/xfs/xfs_inode.h |  4 ++--
 2 files changed, 13 insertions(+), 23 deletions(-)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 8d70171678fe24..de52aceabebc27 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1215,15 +1215,21 @@ xfs_file_release(
 	 * exposed to that problem.
 	 */
 	if (xfs_iflags_test_and_clear(ip, XFS_ITRUNCATED)) {
-		xfs_iflags_clear(ip, XFS_IDIRTY_RELEASE);
+		xfs_iflags_clear(ip, XFS_EOFBLOCKS_RELEASED);
 		if (ip->i_delayed_blks > 0)
 			filemap_flush(inode->i_mapping);
 	}
 
 	/*
 	 * XFS aggressively preallocates post-EOF space to generate contiguous
-	 * allocations for writers that append to the end of the file and we
-	 * try to free these when an open file context is released.
+	 * allocations for writers that append to the end of the file.
+	 *
+	 * To support workloads that close and reopen the file frequently, these
+	 * preallocations usually persist after a close unless it is the first
+	 * close for the inode.  This is a tradeoff to generate tightly packed
+	 * data layouts for unpacking tarballs or similar archives that write
+	 * one file after another without going back to it while keeping the
+	 * preallocation for files that have recurring open/write/close cycles.
 	 *
 	 * There is no point in freeing blocks here for open but unlinked files
 	 * as they will be taken care of by the inactivation path soon.
@@ -1241,25 +1247,9 @@ xfs_file_release(
 	    (file->f_mode & FMODE_WRITE) &&
 	    xfs_ilock_nowait(ip, XFS_IOLOCK_EXCL)) {
 		if (xfs_can_free_eofblocks(ip) &&
-		    !xfs_iflags_test(ip, XFS_IDIRTY_RELEASE)) {
-			/*
-			 * Check if the inode is being opened, written and
-			 * closed frequently and we have delayed allocation
-			 * blocks outstanding (e.g. streaming writes from the
-			 * NFS server), truncating the blocks past EOF will
-			 * cause fragmentation to occur.
-			 *
-			 * In this case don't do the truncation, but we have to
-			 * be careful how we detect this case. Blocks beyond EOF
-			 * show up as i_delayed_blks even when the inode is
-			 * clean, so we need to truncate them away first before
-			 * checking for a dirty release. Hence on the first
-			 * dirty close we will still remove the speculative
-			 * allocation, but after that we will leave it in place.
-			 */
+		    !xfs_iflags_test(ip, XFS_EOFBLOCKS_RELEASED)) {
 			xfs_free_eofblocks(ip);
-			if (ip->i_delayed_blks)
-				xfs_iflags_set(ip, XFS_IDIRTY_RELEASE);
+			xfs_iflags_set(ip, XFS_EOFBLOCKS_RELEASED);
 		}
 		xfs_iunlock(ip, XFS_IOLOCK_EXCL);
 	}
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index ae9851226f9913..548a4f00bcae1b 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -336,7 +336,7 @@ static inline bool xfs_inode_has_bigrtalloc(struct xfs_inode *ip)
 #define XFS_INEW		(1 << 3) /* inode has just been allocated */
 #define XFS_IPRESERVE_DM_FIELDS	(1 << 4) /* has legacy DMAPI fields set */
 #define XFS_ITRUNCATED		(1 << 5) /* truncated down so flush-on-close */
-#define XFS_IDIRTY_RELEASE	(1 << 6) /* dirty release already seen */
+#define XFS_EOFBLOCKS_RELEASED	(1 << 6) /* eofblocks were freed in ->release */
 #define XFS_IFLUSHING		(1 << 7) /* inode is being flushed */
 #define __XFS_IPINNED_BIT	8	 /* wakeup key for zero pin count */
 #define XFS_IPINNED		(1 << __XFS_IPINNED_BIT)
@@ -383,7 +383,7 @@ static inline bool xfs_inode_has_bigrtalloc(struct xfs_inode *ip)
  */
 #define XFS_IRECLAIM_RESET_FLAGS	\
 	(XFS_IRECLAIMABLE | XFS_IRECLAIM | \
-	 XFS_IDIRTY_RELEASE | XFS_ITRUNCATED | XFS_NEED_INACTIVE | \
+	 XFS_EOFBLOCKS_RELEASED | XFS_ITRUNCATED | XFS_NEED_INACTIVE | \
 	 XFS_INACTIVATING | XFS_IQUOTAUNCHECKED)
 
 /*
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 08/10] xfs: check XFS_IDIRTY_RELEASE earlier in xfs_release_eofblocks
  2024-06-23  5:34 post-EOF block handling revamp Christoph Hellwig
                   ` (6 preceding siblings ...)
  2024-06-23  5:34 ` [PATCH 07/10] xfs: only free posteof blocks on first close Christoph Hellwig
@ 2024-06-23  5:34 ` Christoph Hellwig
  2024-06-24 15:50   ` Darrick J. Wong
  2024-06-23  5:34 ` [PATCH 09/10] xfs: simplify extent lookup in xfs_can_free_eofblocks Christoph Hellwig
  2024-06-23  5:34 ` [PATCH 10/10] xfs: reclaim speculative preallocations for append only files Christoph Hellwig
  9 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2024-06-23  5:34 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: Darrick J. Wong, Dave Chinner, linux-xfs

If the XFS_IDIRTY_RELEASE flag is set, we are not going to free
the eofblocks, so don't bother locking the inode or performing the
checks in xfs_can_free_eofblocks.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_file.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index de52aceabebc27..1903fa5568a37d 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1245,9 +1245,9 @@ xfs_file_release(
 	 */
 	if (inode->i_nlink &&
 	    (file->f_mode & FMODE_WRITE) &&
+	    !xfs_iflags_test(ip, XFS_EOFBLOCKS_RELEASED) &&
 	    xfs_ilock_nowait(ip, XFS_IOLOCK_EXCL)) {
-		if (xfs_can_free_eofblocks(ip) &&
-		    !xfs_iflags_test(ip, XFS_EOFBLOCKS_RELEASED)) {
+		if (xfs_can_free_eofblocks(ip)) {
 			xfs_free_eofblocks(ip);
 			xfs_iflags_set(ip, XFS_EOFBLOCKS_RELEASED);
 		}
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 09/10] xfs: simplify extent lookup in xfs_can_free_eofblocks
  2024-06-23  5:34 post-EOF block handling revamp Christoph Hellwig
                   ` (7 preceding siblings ...)
  2024-06-23  5:34 ` [PATCH 08/10] xfs: check XFS_IDIRTY_RELEASE earlier in xfs_release_eofblocks Christoph Hellwig
@ 2024-06-23  5:34 ` Christoph Hellwig
  2024-06-24 15:51   ` Darrick J. Wong
  2024-06-23  5:34 ` [PATCH 10/10] xfs: reclaim speculative preallocations for append only files Christoph Hellwig
  9 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2024-06-23  5:34 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: Darrick J. Wong, Dave Chinner, linux-xfs

xfs_can_free_eofblocks just cares if there is an extent beyond EOF.
Replace the call to xfs_bmapi_read with a xfs_iext_lookup_extent
as we've already checked that extents are read in earlier.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_bmap_util.c | 22 +++++++---------------
 1 file changed, 7 insertions(+), 15 deletions(-)

diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index a4d9fbc21b8343..52863b784b023f 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -492,12 +492,12 @@ bool
 xfs_can_free_eofblocks(
 	struct xfs_inode	*ip)
 {
-	struct xfs_bmbt_irec	imap;
 	struct xfs_mount	*mp = ip->i_mount;
+	bool			found_blocks = false;
 	xfs_fileoff_t		end_fsb;
 	xfs_fileoff_t		last_fsb;
-	int			nimaps = 1;
-	int			error;
+	struct xfs_bmbt_irec	imap;
+	struct xfs_iext_cursor	icur;
 
 	/*
 	 * Caller must either hold the exclusive io lock; or be inactivating
@@ -544,21 +544,13 @@ xfs_can_free_eofblocks(
 		return false;
 
 	/*
-	 * Look up the mapping for the first block past EOF.  If we can't find
-	 * it, there's nothing to free.
+	 * Check if there is an post-EOF extent to free.
 	 */
 	xfs_ilock(ip, XFS_ILOCK_SHARED);
-	error = xfs_bmapi_read(ip, end_fsb, last_fsb - end_fsb, &imap, &nimaps,
-			0);
+	if (xfs_iext_lookup_extent(ip, &ip->i_df, end_fsb, &icur, &imap))
+		found_blocks = true;
 	xfs_iunlock(ip, XFS_ILOCK_SHARED);
-	if (error || nimaps == 0)
-		return false;
-
-	/*
-	 * If there's a real mapping there or there are delayed allocation
-	 * reservations, then we have post-EOF blocks to try to free.
-	 */
-	return imap.br_startblock != HOLESTARTBLOCK || ip->i_delayed_blks;
+	return found_blocks;
 }
 
 /*
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 10/10] xfs: reclaim speculative preallocations for append only files
  2024-06-23  5:34 post-EOF block handling revamp Christoph Hellwig
                   ` (8 preceding siblings ...)
  2024-06-23  5:34 ` [PATCH 09/10] xfs: simplify extent lookup in xfs_can_free_eofblocks Christoph Hellwig
@ 2024-06-23  5:34 ` Christoph Hellwig
  2024-06-24 15:54   ` Darrick J. Wong
  9 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2024-06-23  5:34 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: Darrick J. Wong, Dave Chinner, linux-xfs

The XFS XFS_DIFLAG_APPEND maps to the VFS S_APPEND flag, which forbids
writes that don't append at the current EOF.

But the commit originally adding XFS_DIFLAG_APPEND support (commit
a23321e766d in xfs xfs-import repository) also checked it to skip
releasing speculative preallocations, which doesn't make any sense.

Another commit (dd9f438e3290 in the xfs-import repository) late extended
that flag to also report these speculation preallocations which should
not exist in getbmap.

Remove these checks as nothing XFS_DIFLAG_APPEND implies that
preallocations beyond EOF should exist, but explicitly check for
XFS_DIFLAG_APPEND in xfs_file_release to bypass the algorithm that
discard preallocations on the first close as append only file aren't
expected to be written to only once.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_bmap_util.c | 12 +++++-------
 fs/xfs/xfs_file.c      |  4 ++++
 fs/xfs/xfs_icache.c    |  2 +-
 3 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 52863b784b023f..aa924d7cd32abd 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -331,8 +331,7 @@ xfs_getbmap(
 		}
 
 		if (xfs_get_extsz_hint(ip) ||
-		    (ip->i_diflags &
-		     (XFS_DIFLAG_PREALLOC | XFS_DIFLAG_APPEND)))
+		    (ip->i_diflags & XFS_DIFLAG_PREALLOC))
 			max_len = mp->m_super->s_maxbytes;
 		else
 			max_len = XFS_ISIZE(ip);
@@ -524,12 +523,11 @@ xfs_can_free_eofblocks(
 		return false;
 
 	/*
-	 * Only free real extents for inodes with persistent preallocations or
-	 * the append-only flag.
+	 * Do not free real extents in preallocated files unless the file has
+	 * delalloc blocks and we are forced to remove them.
 	 */
-	if (ip->i_diflags & (XFS_DIFLAG_PREALLOC | XFS_DIFLAG_APPEND))
-		if (ip->i_delayed_blks == 0)
-			return false;
+	if ((ip->i_diflags & XFS_DIFLAG_PREALLOC) && !ip->i_delayed_blks)
+		return false;
 
 	/*
 	 * Do not try to free post-EOF blocks if EOF is beyond the end of the
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 1903fa5568a37d..b05822a70ea680 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1231,6 +1231,9 @@ xfs_file_release(
 	 * one file after another without going back to it while keeping the
 	 * preallocation for files that have recurring open/write/close cycles.
 	 *
+	 * This heuristic is skipped for inodes with the append-only flag as
+	 * that flags is rather pointless for inodes written oly once.
+	 *
 	 * There is no point in freeing blocks here for open but unlinked files
 	 * as they will be taken care of by the inactivation path soon.
 	 *
@@ -1245,6 +1248,7 @@ xfs_file_release(
 	 */
 	if (inode->i_nlink &&
 	    (file->f_mode & FMODE_WRITE) &&
+	    !(ip->i_diflags & XFS_DIFLAG_APPEND) &&
 	    !xfs_iflags_test(ip, XFS_EOFBLOCKS_RELEASED) &&
 	    xfs_ilock_nowait(ip, XFS_IOLOCK_EXCL)) {
 		if (xfs_can_free_eofblocks(ip)) {
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 9967334ea99f1a..0f07ec842b7023 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -1158,7 +1158,7 @@ xfs_inode_free_eofblocks(
 	if (xfs_can_free_eofblocks(ip))
 		return xfs_free_eofblocks(ip);
 
-	/* inode could be preallocated or append-only */
+	/* inode could be preallocated */
 	trace_xfs_inode_free_eofblocks_invalid(ip);
 	xfs_inode_clear_eofblocks_tag(ip);
 	return 0;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 01/10] xfs: fix freeing speculative preallocations for preallocated files
  2024-06-23  5:34 ` [PATCH 01/10] xfs: fix freeing speculative preallocations for preallocated files Christoph Hellwig
@ 2024-06-24 15:30   ` Darrick J. Wong
  0 siblings, 0 replies; 32+ messages in thread
From: Darrick J. Wong @ 2024-06-24 15:30 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chandan Babu R, Dave Chinner, linux-xfs

On Sun, Jun 23, 2024 at 07:34:46AM +0200, Christoph Hellwig wrote:
> xfs_can_free_eofblocks returns false for files that have persistent
> preallocations unless the force flag is passed and there are delayed
> blocks.  This means it won't free delalloc reservations for files
> with persistent preallocations unless the force flag is set, and it
> will also free the persistent preallocations if the force flag is
> set and the file happens to have delayed allocations.
> 
> Both of these are bad, so do away with the force flag and always
> free post-EOF delayed allocations only for files with the
> XFS_DIFLAG_PREALLOC flag set.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks fine
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/xfs_bmap_util.c | 30 ++++++++++++++++++++++--------
>  fs/xfs/xfs_bmap_util.h |  2 +-
>  fs/xfs/xfs_icache.c    |  2 +-
>  fs/xfs/xfs_inode.c     | 14 ++++----------
>  4 files changed, 28 insertions(+), 20 deletions(-)
> 
> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> index ac2e77ebb54c73..a4d9fbc21b8343 100644
> --- a/fs/xfs/xfs_bmap_util.c
> +++ b/fs/xfs/xfs_bmap_util.c
> @@ -486,13 +486,11 @@ xfs_bmap_punch_delalloc_range(
>  
>  /*
>   * Test whether it is appropriate to check an inode for and free post EOF
> - * blocks. The 'force' parameter determines whether we should also consider
> - * regular files that are marked preallocated or append-only.
> + * blocks.
>   */
>  bool
>  xfs_can_free_eofblocks(
> -	struct xfs_inode	*ip,
> -	bool			force)
> +	struct xfs_inode	*ip)
>  {
>  	struct xfs_bmbt_irec	imap;
>  	struct xfs_mount	*mp = ip->i_mount;
> @@ -526,11 +524,11 @@ xfs_can_free_eofblocks(
>  		return false;
>  
>  	/*
> -	 * Do not free real preallocated or append-only files unless the file
> -	 * has delalloc blocks and we are forced to remove them.
> +	 * Only free real extents for inodes with persistent preallocations or
> +	 * the append-only flag.
>  	 */
>  	if (ip->i_diflags & (XFS_DIFLAG_PREALLOC | XFS_DIFLAG_APPEND))
> -		if (!force || ip->i_delayed_blks == 0)
> +		if (ip->i_delayed_blks == 0)
>  			return false;
>  
>  	/*
> @@ -584,6 +582,22 @@ xfs_free_eofblocks(
>  	/* Wait on dio to ensure i_size has settled. */
>  	inode_dio_wait(VFS_I(ip));
>  
> +	/*
> +	 * For preallocated files only free delayed allocations.
> +	 *
> +	 * Note that this means we also leave speculative preallocations in
> +	 * place for preallocated files.
> +	 */
> +	if (ip->i_diflags & (XFS_DIFLAG_PREALLOC | XFS_DIFLAG_APPEND)) {
> +		if (ip->i_delayed_blks) {
> +			xfs_bmap_punch_delalloc_range(ip,
> +				round_up(XFS_ISIZE(ip), mp->m_sb.sb_blocksize),
> +				LLONG_MAX);
> +		}
> +		xfs_inode_clear_eofblocks_tag(ip);
> +		return 0;
> +	}
> +
>  	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, 0, 0, 0, &tp);
>  	if (error) {
>  		ASSERT(xfs_is_shutdown(mp));
> @@ -891,7 +905,7 @@ xfs_prepare_shift(
>  	 * Trim eofblocks to avoid shifting uninitialized post-eof preallocation
>  	 * into the accessible region of the file.
>  	 */
> -	if (xfs_can_free_eofblocks(ip, true)) {
> +	if (xfs_can_free_eofblocks(ip)) {
>  		error = xfs_free_eofblocks(ip);
>  		if (error)
>  			return error;
> diff --git a/fs/xfs/xfs_bmap_util.h b/fs/xfs/xfs_bmap_util.h
> index 51f84d8ff372fa..eb0895bfb9dae4 100644
> --- a/fs/xfs/xfs_bmap_util.h
> +++ b/fs/xfs/xfs_bmap_util.h
> @@ -63,7 +63,7 @@ int	xfs_insert_file_space(struct xfs_inode *, xfs_off_t offset,
>  				xfs_off_t len);
>  
>  /* EOF block manipulation functions */
> -bool	xfs_can_free_eofblocks(struct xfs_inode *ip, bool force);
> +bool	xfs_can_free_eofblocks(struct xfs_inode *ip);
>  int	xfs_free_eofblocks(struct xfs_inode *ip);
>  
>  int	xfs_swap_extents(struct xfs_inode *ip, struct xfs_inode *tip,
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index 0953163a2d8492..9967334ea99f1a 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -1155,7 +1155,7 @@ xfs_inode_free_eofblocks(
>  	}
>  	*lockflags |= XFS_IOLOCK_EXCL;
>  
> -	if (xfs_can_free_eofblocks(ip, false))
> +	if (xfs_can_free_eofblocks(ip))
>  		return xfs_free_eofblocks(ip);
>  
>  	/* inode could be preallocated or append-only */
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index f36091e1e7f50b..38f946e3be2da3 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -1595,7 +1595,7 @@ xfs_release(
>  	if (!xfs_ilock_nowait(ip, XFS_IOLOCK_EXCL))
>  		return 0;
>  
> -	if (xfs_can_free_eofblocks(ip, false)) {
> +	if (xfs_can_free_eofblocks(ip)) {
>  		/*
>  		 * Check if the inode is being opened, written and closed
>  		 * frequently and we have delayed allocation blocks outstanding
> @@ -1856,15 +1856,13 @@ xfs_inode_needs_inactive(
>  
>  	/*
>  	 * This file isn't being freed, so check if there are post-eof blocks
> -	 * to free.  @force is true because we are evicting an inode from the
> -	 * cache.  Post-eof blocks must be freed, lest we end up with broken
> -	 * free space accounting.
> +	 * to free.
>  	 *
>  	 * Note: don't bother with iolock here since lockdep complains about
>  	 * acquiring it in reclaim context. We have the only reference to the
>  	 * inode at this point anyways.
>  	 */
> -	return xfs_can_free_eofblocks(ip, true);
> +	return xfs_can_free_eofblocks(ip);
>  }
>  
>  /*
> @@ -1947,15 +1945,11 @@ xfs_inactive(
>  
>  	if (VFS_I(ip)->i_nlink != 0) {
>  		/*
> -		 * force is true because we are evicting an inode from the
> -		 * cache. Post-eof blocks must be freed, lest we end up with
> -		 * broken free space accounting.
> -		 *
>  		 * Note: don't bother with iolock here since lockdep complains
>  		 * about acquiring it in reclaim context. We have the only
>  		 * reference to the inode at this point anyways.
>  		 */
> -		if (xfs_can_free_eofblocks(ip, true))
> +		if (xfs_can_free_eofblocks(ip))
>  			error = xfs_free_eofblocks(ip);
>  
>  		goto out;
> -- 
> 2.43.0
> 
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 02/10] xfs: remove the i_mode check in xfs_release
  2024-06-23  5:34 ` [PATCH 02/10] xfs: remove the i_mode check in xfs_release Christoph Hellwig
@ 2024-06-24 15:34   ` Darrick J. Wong
  2024-06-24 15:50     ` Christoph Hellwig
  0 siblings, 1 reply; 32+ messages in thread
From: Darrick J. Wong @ 2024-06-24 15:34 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chandan Babu R, Dave Chinner, linux-xfs

On Sun, Jun 23, 2024 at 07:34:47AM +0200, Christoph Hellwig wrote:
> xfs_release is only called from xfs_file_release, which is wired up as
> the f_op->release handler for regular files only.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/xfs_inode.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index 38f946e3be2da3..9a9340aebe9d8a 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -1552,9 +1552,6 @@ xfs_release(
>  	xfs_mount_t	*mp = ip->i_mount;
>  	int		error = 0;
>  
> -	if (!S_ISREG(VFS_I(ip)->i_mode) || (VFS_I(ip)->i_mode == 0))

How would we encounter !i_mode regular files being released?

If an open file's link count is incorrectly low, it can't get freed
until after all the open file descriptors have been released, right?
Or is there some other vector for this?

I'm wondering if this ought to be:

	if (XFS_IS_CORRUPT(mp, !VFS_I(ip)->i_mode)) {
		xfs_inode_mark_sick(ip);
		return -EFSCORRUPTED;
	}

--D

> -		return 0;
> -
>  	/* If this is a read-only mount, don't do this (would generate I/O) */
>  	if (xfs_is_readonly(mp))
>  		return 0;
> -- 
> 2.43.0
> 
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 03/10] xfs: refactor f_op->release handling
  2024-06-23  5:34 ` [PATCH 03/10] xfs: refactor f_op->release handling Christoph Hellwig
@ 2024-06-24 15:35   ` Darrick J. Wong
  0 siblings, 0 replies; 32+ messages in thread
From: Darrick J. Wong @ 2024-06-24 15:35 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chandan Babu R, Dave Chinner, linux-xfs

On Sun, Jun 23, 2024 at 07:34:48AM +0200, Christoph Hellwig wrote:
> Currently f_op->release is split in not very obvious ways.  Fix that by
> folding xfs_release into xfs_file_release.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Pretty straightforward hoist, looks like
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/xfs_file.c  | 71 +++++++++++++++++++++++++++++++++++++++--
>  fs/xfs/xfs_inode.c | 79 ----------------------------------------------
>  fs/xfs/xfs_inode.h |  1 -
>  3 files changed, 68 insertions(+), 83 deletions(-)
> 
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index b240ea5241dc9d..d39d0ea522d1c2 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -1188,10 +1188,75 @@ xfs_dir_open(
>  
>  STATIC int
>  xfs_file_release(
> -	struct inode	*inode,
> -	struct file	*filp)
> +	struct inode		*inode,
> +	struct file		*file)
>  {
> -	return xfs_release(XFS_I(inode));
> +	struct xfs_inode	*ip = XFS_I(inode);
> +	struct xfs_mount	*mp = ip->i_mount;
> +	int			error;
> +
> +	/* If this is a read-only mount, don't generate I/O */
> +	if (xfs_is_readonly(mp))
> +		return 0;
> +
> +	/*
> +	 * If we previously truncated this file and removed old data in the
> +	 * process, we want to initiate "early" writeout on the last close.
> +	 * This is an attempt to combat the notorious NULL files problem which
> +	 * is particularly noticeable from a truncate down, buffered (re-)write
> +	 * (delalloc), followed by a crash.  What we are effectively doing here
> +	 * is significantly reducing the time window where we'd otherwise be
> +	 * exposed to that problem.
> +	 */
> +	if (!xfs_is_shutdown(mp) &&
> +	    xfs_iflags_test_and_clear(ip, XFS_ITRUNCATED)) {
> +		xfs_iflags_clear(ip, XFS_IDIRTY_RELEASE);
> +		if (ip->i_delayed_blks > 0) {
> +			error = filemap_flush(inode->i_mapping);
> +			if (error)
> +				return error;
> +		}
> +	}
> +
> +	/*
> +	 * XFS aggressively preallocates post-EOF space to generate contiguous
> +	 * allocations for writers that append to the end of the file and we
> +	 * try to free these when an open file context is released.
> +	 *
> +	 * There is no point in freeing blocks here for open but unlinked files
> +	 * as they will be taken care of by the inactivation path soon.
> +	 *
> +	 * If we can't get the iolock just skip truncating the blocks past EOF
> +	 * because we could deadlock with the mmap_lock otherwise. We'll get
> +	 * another chance to drop them once the last reference to the inode is
> +	 * dropped, so we'll never leak blocks permanently.
> +	 */
> +	if (inode->i_nlink && xfs_ilock_nowait(ip, XFS_IOLOCK_EXCL)) {
> +		if (xfs_can_free_eofblocks(ip) &&
> +		    !xfs_iflags_test(ip, XFS_IDIRTY_RELEASE)) {
> +			/*
> +			 * Check if the inode is being opened, written and
> +			 * closed frequently and we have delayed allocation
> +			 * blocks outstanding (e.g. streaming writes from the
> +			 * NFS server), truncating the blocks past EOF will
> +			 * cause fragmentation to occur.
> +			 *
> +			 * In this case don't do the truncation, but we have to
> +			 * be careful how we detect this case. Blocks beyond EOF
> +			 * show up as i_delayed_blks even when the inode is
> +			 * clean, so we need to truncate them away first before
> +			 * checking for a dirty release. Hence on the first
> +			 * dirty close we will still remove the speculative
> +			 * allocation, but after that we will leave it in place.
> +			 */
> +			error = xfs_free_eofblocks(ip);
> +			if (!error && ip->i_delayed_blks)
> +				xfs_iflags_set(ip, XFS_IDIRTY_RELEASE);
> +		}
> +		xfs_iunlock(ip, XFS_IOLOCK_EXCL);
> +	}
> +
> +	return error;
>  }
>  
>  STATIC int
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index 9a9340aebe9d8a..fe4906a08665ee 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -1545,85 +1545,6 @@ xfs_itruncate_extents_flags(
>  	return error;
>  }
>  
> -int
> -xfs_release(
> -	xfs_inode_t	*ip)
> -{
> -	xfs_mount_t	*mp = ip->i_mount;
> -	int		error = 0;
> -
> -	/* If this is a read-only mount, don't do this (would generate I/O) */
> -	if (xfs_is_readonly(mp))
> -		return 0;
> -
> -	if (!xfs_is_shutdown(mp)) {
> -		int truncated;
> -
> -		/*
> -		 * If we previously truncated this file and removed old data
> -		 * in the process, we want to initiate "early" writeout on
> -		 * the last close.  This is an attempt to combat the notorious
> -		 * NULL files problem which is particularly noticeable from a
> -		 * truncate down, buffered (re-)write (delalloc), followed by
> -		 * a crash.  What we are effectively doing here is
> -		 * significantly reducing the time window where we'd otherwise
> -		 * be exposed to that problem.
> -		 */
> -		truncated = xfs_iflags_test_and_clear(ip, XFS_ITRUNCATED);
> -		if (truncated) {
> -			xfs_iflags_clear(ip, XFS_IDIRTY_RELEASE);
> -			if (ip->i_delayed_blks > 0) {
> -				error = filemap_flush(VFS_I(ip)->i_mapping);
> -				if (error)
> -					return error;
> -			}
> -		}
> -	}
> -
> -	if (VFS_I(ip)->i_nlink == 0)
> -		return 0;
> -
> -	/*
> -	 * If we can't get the iolock just skip truncating the blocks past EOF
> -	 * because we could deadlock with the mmap_lock otherwise. We'll get
> -	 * another chance to drop them once the last reference to the inode is
> -	 * dropped, so we'll never leak blocks permanently.
> -	 */
> -	if (!xfs_ilock_nowait(ip, XFS_IOLOCK_EXCL))
> -		return 0;
> -
> -	if (xfs_can_free_eofblocks(ip)) {
> -		/*
> -		 * Check if the inode is being opened, written and closed
> -		 * frequently and we have delayed allocation blocks outstanding
> -		 * (e.g. streaming writes from the NFS server), truncating the
> -		 * blocks past EOF will cause fragmentation to occur.
> -		 *
> -		 * In this case don't do the truncation, but we have to be
> -		 * careful how we detect this case. Blocks beyond EOF show up as
> -		 * i_delayed_blks even when the inode is clean, so we need to
> -		 * truncate them away first before checking for a dirty release.
> -		 * Hence on the first dirty close we will still remove the
> -		 * speculative allocation, but after that we will leave it in
> -		 * place.
> -		 */
> -		if (xfs_iflags_test(ip, XFS_IDIRTY_RELEASE))
> -			goto out_unlock;
> -
> -		error = xfs_free_eofblocks(ip);
> -		if (error)
> -			goto out_unlock;
> -
> -		/* delalloc blocks after truncation means it really is dirty */
> -		if (ip->i_delayed_blks)
> -			xfs_iflags_set(ip, XFS_IDIRTY_RELEASE);
> -	}
> -
> -out_unlock:
> -	xfs_iunlock(ip, XFS_IOLOCK_EXCL);
> -	return error;
> -}
> -
>  /*
>   * Mark all the buffers attached to this directory stale.  In theory we should
>   * never be freeing a directory with any blocks at all, but this covers the
> diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
> index 292b90b5f2ac84..ae9851226f9913 100644
> --- a/fs/xfs/xfs_inode.h
> +++ b/fs/xfs/xfs_inode.h
> @@ -513,7 +513,6 @@ enum layout_break_reason {
>  #define XFS_INHERIT_GID(pip)	\
>  	(xfs_has_grpid((pip)->i_mount) || (VFS_I(pip)->i_mode & S_ISGID))
>  
> -int		xfs_release(struct xfs_inode *ip);
>  int		xfs_inactive(struct xfs_inode *ip);
>  int		xfs_lookup(struct xfs_inode *dp, const struct xfs_name *name,
>  			   struct xfs_inode **ipp, struct xfs_name *ci_name);
> -- 
> 2.43.0
> 
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 04/10] xfs: don't bother returning errors from xfs_file_release
  2024-06-23  5:34 ` [PATCH 04/10] xfs: don't bother returning errors from xfs_file_release Christoph Hellwig
@ 2024-06-24 15:39   ` Darrick J. Wong
  2024-06-24 15:51     ` Christoph Hellwig
  0 siblings, 1 reply; 32+ messages in thread
From: Darrick J. Wong @ 2024-06-24 15:39 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chandan Babu R, Dave Chinner, linux-xfs

On Sun, Jun 23, 2024 at 07:34:49AM +0200, Christoph Hellwig wrote:
> While ->release returns int, the only caller ignores the return value.
> As we're only doing cleanup work there isn't much of a point in
> return a value to start with, so just document the situation instead.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/xfs_file.c | 18 +++++++++---------
>  1 file changed, 9 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index d39d0ea522d1c2..7b91cbab80da55 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -1186,6 +1186,10 @@ xfs_dir_open(
>  	return error;
>  }
>  
> +/*
> + * Don't bother propagating errors.  We're just doing cleanup, and the caller
> + * ignores the return value anyway.

Shouldn't we drop the int return from the function declaration, then?

(Is that also a cleanup that's you're working on?)

--D

> + */
>  STATIC int
>  xfs_file_release(
>  	struct inode		*inode,
> @@ -1193,7 +1197,6 @@ xfs_file_release(
>  {
>  	struct xfs_inode	*ip = XFS_I(inode);
>  	struct xfs_mount	*mp = ip->i_mount;
> -	int			error;
>  
>  	/* If this is a read-only mount, don't generate I/O */
>  	if (xfs_is_readonly(mp))
> @@ -1211,11 +1214,8 @@ xfs_file_release(
>  	if (!xfs_is_shutdown(mp) &&
>  	    xfs_iflags_test_and_clear(ip, XFS_ITRUNCATED)) {
>  		xfs_iflags_clear(ip, XFS_IDIRTY_RELEASE);
> -		if (ip->i_delayed_blks > 0) {
> -			error = filemap_flush(inode->i_mapping);
> -			if (error)
> -				return error;
> -		}
> +		if (ip->i_delayed_blks > 0)
> +			filemap_flush(inode->i_mapping);
>  	}
>  
>  	/*
> @@ -1249,14 +1249,14 @@ xfs_file_release(
>  			 * dirty close we will still remove the speculative
>  			 * allocation, but after that we will leave it in place.
>  			 */
> -			error = xfs_free_eofblocks(ip);
> -			if (!error && ip->i_delayed_blks)
> +			xfs_free_eofblocks(ip);
> +			if (ip->i_delayed_blks)
>  				xfs_iflags_set(ip, XFS_IDIRTY_RELEASE);
>  		}
>  		xfs_iunlock(ip, XFS_IOLOCK_EXCL);
>  	}
>  
> -	return error;
> +	return 0;
>  }
>  
>  STATIC int
> -- 
> 2.43.0
> 
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 05/10] xfs: skip all of xfs_file_release when shut down
  2024-06-23  5:34 ` [PATCH 05/10] xfs: skip all of xfs_file_release when shut down Christoph Hellwig
@ 2024-06-24 15:41   ` Darrick J. Wong
  0 siblings, 0 replies; 32+ messages in thread
From: Darrick J. Wong @ 2024-06-24 15:41 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chandan Babu R, Dave Chinner, linux-xfs

On Sun, Jun 23, 2024 at 07:34:50AM +0200, Christoph Hellwig wrote:
> There is no point in trying to free post-EOF blocks when the file system
> is shutdown, as it will just error out ASAP.  Instead return instantly
> when xfs_file_shutdown is called on a shut down file system.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Makes sense,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/xfs_file.c | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 7b91cbab80da55..0380e0b1d9c6c7 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -1198,8 +1198,11 @@ xfs_file_release(
>  	struct xfs_inode	*ip = XFS_I(inode);
>  	struct xfs_mount	*mp = ip->i_mount;
>  
> -	/* If this is a read-only mount, don't generate I/O */
> -	if (xfs_is_readonly(mp))
> +	/*
> +	 * If this is a read-only mount or the file system has been shut down,
> +	 * don't generate I/O.
> +	 */
> +	if (xfs_is_readonly(mp) || xfs_is_shutdown(mp))
>  		return 0;
>  
>  	/*
> @@ -1211,8 +1214,7 @@ xfs_file_release(
>  	 * is significantly reducing the time window where we'd otherwise be
>  	 * exposed to that problem.
>  	 */
> -	if (!xfs_is_shutdown(mp) &&
> -	    xfs_iflags_test_and_clear(ip, XFS_ITRUNCATED)) {
> +	if (xfs_iflags_test_and_clear(ip, XFS_ITRUNCATED)) {
>  		xfs_iflags_clear(ip, XFS_IDIRTY_RELEASE);
>  		if (ip->i_delayed_blks > 0)
>  			filemap_flush(inode->i_mapping);
> -- 
> 2.43.0
> 
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 06/10] xfs: don't free post-EOF blocks on read close
  2024-06-23  5:34 ` [PATCH 06/10] xfs: don't free post-EOF blocks on read close Christoph Hellwig
@ 2024-06-24 15:43   ` Darrick J. Wong
  0 siblings, 0 replies; 32+ messages in thread
From: Darrick J. Wong @ 2024-06-24 15:43 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chandan Babu R, Dave Chinner, linux-xfs

On Sun, Jun 23, 2024 at 07:34:51AM +0200, Christoph Hellwig wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> When we have a workload that does open/read/close in parallel with other
> allocation, the file becomes rapidly fragmented. This is due to close()
> calling xfs_file_release() and removing the speculative preallocation
> beyond EOF.
> 
> Add a check for a writable context to xfs_file_release to skip the
> post-EOF block freeing (an the similarly pointless flushing on truncate
> down).
> 
> Before:
> 
> Test 1: sync write fragmentation counts
> 
> /mnt/scratch/file.0: 919
> /mnt/scratch/file.1: 916
> /mnt/scratch/file.2: 919
> /mnt/scratch/file.3: 920
> /mnt/scratch/file.4: 920
> /mnt/scratch/file.5: 921
> /mnt/scratch/file.6: 916
> /mnt/scratch/file.7: 918
> 
> After:
> 
> Test 1: sync write fragmentation counts
> 
> /mnt/scratch/file.0: 24
> /mnt/scratch/file.1: 24
> /mnt/scratch/file.2: 11
> /mnt/scratch/file.3: 24
> /mnt/scratch/file.4: 3
> /mnt/scratch/file.5: 24
> /mnt/scratch/file.6: 24
> /mnt/scratch/file.7: 23
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> [darrick: wordsmithing, fix commit message]
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> [hch: ported to the new ->release code structure]
> Signed-off-by: Christoph Hellwig <hch@lst.de>

I like how this has gotten much pared down from what's been lurking in
my tree for ages.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/xfs_file.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 0380e0b1d9c6c7..8d70171678fe24 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -1228,12 +1228,18 @@ xfs_file_release(
>  	 * There is no point in freeing blocks here for open but unlinked files
>  	 * as they will be taken care of by the inactivation path soon.
>  	 *
> +	 * When releasing a read-only context, don't flush data or trim post-EOF
> +	 * blocks.  This avoids open/read/close workloads from removing EOF
> +	 * blocks that other writers depend upon to reduce fragmentation.
> +	 *
>  	 * If we can't get the iolock just skip truncating the blocks past EOF
>  	 * because we could deadlock with the mmap_lock otherwise. We'll get
>  	 * another chance to drop them once the last reference to the inode is
>  	 * dropped, so we'll never leak blocks permanently.
>  	 */
> -	if (inode->i_nlink && xfs_ilock_nowait(ip, XFS_IOLOCK_EXCL)) {
> +	if (inode->i_nlink &&
> +	    (file->f_mode & FMODE_WRITE) &&
> +	    xfs_ilock_nowait(ip, XFS_IOLOCK_EXCL)) {
>  		if (xfs_can_free_eofblocks(ip) &&
>  		    !xfs_iflags_test(ip, XFS_IDIRTY_RELEASE)) {
>  			/*
> -- 
> 2.43.0
> 
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 07/10] xfs: only free posteof blocks on first close
  2024-06-23  5:34 ` [PATCH 07/10] xfs: only free posteof blocks on first close Christoph Hellwig
@ 2024-06-24 15:46   ` Darrick J. Wong
  2024-06-24 16:08     ` Christoph Hellwig
  0 siblings, 1 reply; 32+ messages in thread
From: Darrick J. Wong @ 2024-06-24 15:46 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chandan Babu R, Dave Chinner, linux-xfs

On Sun, Jun 23, 2024 at 07:34:52AM +0200, Christoph Hellwig wrote:
> From: "Darrick J. Wong" <djwong@kernel.org>
> 
> Certain workloads fragment files on XFS very badly, such as a software
> package that creates a number of threads, each of which repeatedly run
> the sequence: open a file, perform a synchronous write, and close the
> file, which defeats the speculative preallocation mechanism.  We work
> around this problem by only deleting posteof blocks the /first/ time a
> file is closed to preserve the behavior that unpacking a tarball lays
> out files one after the other with no gaps.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> [hch: rebased, updated comment, renamed the flag]
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Someone please review this?  The last person to try was Dave, five years
ago, and I do not know if he ever saw what it did to various workloads.

https://lore.kernel.org/linux-xfs/20190315034237.GL23020@dastard/

--D

> ---
>  fs/xfs/xfs_file.c  | 32 +++++++++++---------------------
>  fs/xfs/xfs_inode.h |  4 ++--
>  2 files changed, 13 insertions(+), 23 deletions(-)
> 
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 8d70171678fe24..de52aceabebc27 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -1215,15 +1215,21 @@ xfs_file_release(
>  	 * exposed to that problem.
>  	 */
>  	if (xfs_iflags_test_and_clear(ip, XFS_ITRUNCATED)) {
> -		xfs_iflags_clear(ip, XFS_IDIRTY_RELEASE);
> +		xfs_iflags_clear(ip, XFS_EOFBLOCKS_RELEASED);
>  		if (ip->i_delayed_blks > 0)
>  			filemap_flush(inode->i_mapping);
>  	}
>  
>  	/*
>  	 * XFS aggressively preallocates post-EOF space to generate contiguous
> -	 * allocations for writers that append to the end of the file and we
> -	 * try to free these when an open file context is released.
> +	 * allocations for writers that append to the end of the file.
> +	 *
> +	 * To support workloads that close and reopen the file frequently, these
> +	 * preallocations usually persist after a close unless it is the first
> +	 * close for the inode.  This is a tradeoff to generate tightly packed
> +	 * data layouts for unpacking tarballs or similar archives that write
> +	 * one file after another without going back to it while keeping the
> +	 * preallocation for files that have recurring open/write/close cycles.
>  	 *
>  	 * There is no point in freeing blocks here for open but unlinked files
>  	 * as they will be taken care of by the inactivation path soon.
> @@ -1241,25 +1247,9 @@ xfs_file_release(
>  	    (file->f_mode & FMODE_WRITE) &&
>  	    xfs_ilock_nowait(ip, XFS_IOLOCK_EXCL)) {
>  		if (xfs_can_free_eofblocks(ip) &&
> -		    !xfs_iflags_test(ip, XFS_IDIRTY_RELEASE)) {
> -			/*
> -			 * Check if the inode is being opened, written and
> -			 * closed frequently and we have delayed allocation
> -			 * blocks outstanding (e.g. streaming writes from the
> -			 * NFS server), truncating the blocks past EOF will
> -			 * cause fragmentation to occur.
> -			 *
> -			 * In this case don't do the truncation, but we have to
> -			 * be careful how we detect this case. Blocks beyond EOF
> -			 * show up as i_delayed_blks even when the inode is
> -			 * clean, so we need to truncate them away first before
> -			 * checking for a dirty release. Hence on the first
> -			 * dirty close we will still remove the speculative
> -			 * allocation, but after that we will leave it in place.
> -			 */
> +		    !xfs_iflags_test(ip, XFS_EOFBLOCKS_RELEASED)) {
>  			xfs_free_eofblocks(ip);
> -			if (ip->i_delayed_blks)
> -				xfs_iflags_set(ip, XFS_IDIRTY_RELEASE);
> +			xfs_iflags_set(ip, XFS_EOFBLOCKS_RELEASED);
>  		}
>  		xfs_iunlock(ip, XFS_IOLOCK_EXCL);
>  	}
> diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
> index ae9851226f9913..548a4f00bcae1b 100644
> --- a/fs/xfs/xfs_inode.h
> +++ b/fs/xfs/xfs_inode.h
> @@ -336,7 +336,7 @@ static inline bool xfs_inode_has_bigrtalloc(struct xfs_inode *ip)
>  #define XFS_INEW		(1 << 3) /* inode has just been allocated */
>  #define XFS_IPRESERVE_DM_FIELDS	(1 << 4) /* has legacy DMAPI fields set */
>  #define XFS_ITRUNCATED		(1 << 5) /* truncated down so flush-on-close */
> -#define XFS_IDIRTY_RELEASE	(1 << 6) /* dirty release already seen */
> +#define XFS_EOFBLOCKS_RELEASED	(1 << 6) /* eofblocks were freed in ->release */
>  #define XFS_IFLUSHING		(1 << 7) /* inode is being flushed */
>  #define __XFS_IPINNED_BIT	8	 /* wakeup key for zero pin count */
>  #define XFS_IPINNED		(1 << __XFS_IPINNED_BIT)
> @@ -383,7 +383,7 @@ static inline bool xfs_inode_has_bigrtalloc(struct xfs_inode *ip)
>   */
>  #define XFS_IRECLAIM_RESET_FLAGS	\
>  	(XFS_IRECLAIMABLE | XFS_IRECLAIM | \
> -	 XFS_IDIRTY_RELEASE | XFS_ITRUNCATED | XFS_NEED_INACTIVE | \
> +	 XFS_EOFBLOCKS_RELEASED | XFS_ITRUNCATED | XFS_NEED_INACTIVE | \
>  	 XFS_INACTIVATING | XFS_IQUOTAUNCHECKED)
>  
>  /*
> -- 
> 2.43.0
> 
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 02/10] xfs: remove the i_mode check in xfs_release
  2024-06-24 15:34   ` Darrick J. Wong
@ 2024-06-24 15:50     ` Christoph Hellwig
  2024-08-07 15:01       ` Darrick J. Wong
  0 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2024-06-24 15:50 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Chandan Babu R, Dave Chinner, linux-xfs

On Mon, Jun 24, 2024 at 08:34:59AM -0700, Darrick J. Wong wrote:
> > -	if (!S_ISREG(VFS_I(ip)->i_mode) || (VFS_I(ip)->i_mode == 0))
> 
> How would we encounter !i_mode regular files being released?

We can't.  If that code ever made any sense than in ancient pre-history
in IRIX.

> If an open file's link count is incorrectly low, it can't get freed
> until after all the open file descriptors have been released, right?
> Or is there some other vector for this?

No.

> I'm wondering if this ought to be:
> 
> 	if (XFS_IS_CORRUPT(mp, !VFS_I(ip)->i_mode)) {
> 		xfs_inode_mark_sick(ip);
> 		return -EFSCORRUPTED;
> 	}

I wouldn't even bother with that.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 08/10] xfs: check XFS_IDIRTY_RELEASE earlier in xfs_release_eofblocks
  2024-06-23  5:34 ` [PATCH 08/10] xfs: check XFS_IDIRTY_RELEASE earlier in xfs_release_eofblocks Christoph Hellwig
@ 2024-06-24 15:50   ` Darrick J. Wong
  2024-06-24 15:54     ` Christoph Hellwig
  0 siblings, 1 reply; 32+ messages in thread
From: Darrick J. Wong @ 2024-06-24 15:50 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chandan Babu R, Dave Chinner, linux-xfs

On Sun, Jun 23, 2024 at 07:34:53AM +0200, Christoph Hellwig wrote:
> If the XFS_IDIRTY_RELEASE flag is set, we are not going to free

         XFS_EOFBLOCKS_RELEASED ?

> the eofblocks, so don't bother locking the inode or performing the
> checks in xfs_can_free_eofblocks.

It'll still be the case that ->destroy_inode will have the chance to
delete the eofblocks if we don't do it here, correct?

If so, then
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/xfs_file.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index de52aceabebc27..1903fa5568a37d 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -1245,9 +1245,9 @@ xfs_file_release(
>  	 */
>  	if (inode->i_nlink &&
>  	    (file->f_mode & FMODE_WRITE) &&
> +	    !xfs_iflags_test(ip, XFS_EOFBLOCKS_RELEASED) &&
>  	    xfs_ilock_nowait(ip, XFS_IOLOCK_EXCL)) {
> -		if (xfs_can_free_eofblocks(ip) &&
> -		    !xfs_iflags_test(ip, XFS_EOFBLOCKS_RELEASED)) {
> +		if (xfs_can_free_eofblocks(ip)) {
>  			xfs_free_eofblocks(ip);
>  			xfs_iflags_set(ip, XFS_EOFBLOCKS_RELEASED);
>  		}
> -- 
> 2.43.0
> 
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 09/10] xfs: simplify extent lookup in xfs_can_free_eofblocks
  2024-06-23  5:34 ` [PATCH 09/10] xfs: simplify extent lookup in xfs_can_free_eofblocks Christoph Hellwig
@ 2024-06-24 15:51   ` Darrick J. Wong
  0 siblings, 0 replies; 32+ messages in thread
From: Darrick J. Wong @ 2024-06-24 15:51 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chandan Babu R, Dave Chinner, linux-xfs

On Sun, Jun 23, 2024 at 07:34:54AM +0200, Christoph Hellwig wrote:
> xfs_can_free_eofblocks just cares if there is an extent beyond EOF.
> Replace the call to xfs_bmapi_read with a xfs_iext_lookup_extent
> as we've already checked that extents are read in earlier.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

I guess that works :)
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/xfs_bmap_util.c | 22 +++++++---------------
>  1 file changed, 7 insertions(+), 15 deletions(-)
> 
> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> index a4d9fbc21b8343..52863b784b023f 100644
> --- a/fs/xfs/xfs_bmap_util.c
> +++ b/fs/xfs/xfs_bmap_util.c
> @@ -492,12 +492,12 @@ bool
>  xfs_can_free_eofblocks(
>  	struct xfs_inode	*ip)
>  {
> -	struct xfs_bmbt_irec	imap;
>  	struct xfs_mount	*mp = ip->i_mount;
> +	bool			found_blocks = false;
>  	xfs_fileoff_t		end_fsb;
>  	xfs_fileoff_t		last_fsb;
> -	int			nimaps = 1;
> -	int			error;
> +	struct xfs_bmbt_irec	imap;
> +	struct xfs_iext_cursor	icur;
>  
>  	/*
>  	 * Caller must either hold the exclusive io lock; or be inactivating
> @@ -544,21 +544,13 @@ xfs_can_free_eofblocks(
>  		return false;
>  
>  	/*
> -	 * Look up the mapping for the first block past EOF.  If we can't find
> -	 * it, there's nothing to free.
> +	 * Check if there is an post-EOF extent to free.
>  	 */
>  	xfs_ilock(ip, XFS_ILOCK_SHARED);
> -	error = xfs_bmapi_read(ip, end_fsb, last_fsb - end_fsb, &imap, &nimaps,
> -			0);
> +	if (xfs_iext_lookup_extent(ip, &ip->i_df, end_fsb, &icur, &imap))
> +		found_blocks = true;
>  	xfs_iunlock(ip, XFS_ILOCK_SHARED);
> -	if (error || nimaps == 0)
> -		return false;
> -
> -	/*
> -	 * If there's a real mapping there or there are delayed allocation
> -	 * reservations, then we have post-EOF blocks to try to free.
> -	 */
> -	return imap.br_startblock != HOLESTARTBLOCK || ip->i_delayed_blks;
> +	return found_blocks;
>  }
>  
>  /*
> -- 
> 2.43.0
> 
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 04/10] xfs: don't bother returning errors from xfs_file_release
  2024-06-24 15:39   ` Darrick J. Wong
@ 2024-06-24 15:51     ` Christoph Hellwig
  2024-08-07 14:59       ` Darrick J. Wong
  0 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2024-06-24 15:51 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Chandan Babu R, Dave Chinner, linux-xfs

On Mon, Jun 24, 2024 at 08:39:51AM -0700, Darrick J. Wong wrote:
> > +/*
> > + * Don't bother propagating errors.  We're just doing cleanup, and the caller
> > + * ignores the return value anyway.
> 
> Shouldn't we drop the int return from the function declaration, then?
> 
> (Is that also a cleanup that's you're working on?)

We can't drop it without changing the f_ops->release signature and
updates to the many instance of it.  That would still be worthwhile
project, but it's something for someone who is better at scripting
than me.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 08/10] xfs: check XFS_IDIRTY_RELEASE earlier in xfs_release_eofblocks
  2024-06-24 15:50   ` Darrick J. Wong
@ 2024-06-24 15:54     ` Christoph Hellwig
  0 siblings, 0 replies; 32+ messages in thread
From: Christoph Hellwig @ 2024-06-24 15:54 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Chandan Babu R, Dave Chinner, linux-xfs

On Mon, Jun 24, 2024 at 08:50:22AM -0700, Darrick J. Wong wrote:
> On Sun, Jun 23, 2024 at 07:34:53AM +0200, Christoph Hellwig wrote:
> > If the XFS_IDIRTY_RELEASE flag is set, we are not going to free
> 
>          XFS_EOFBLOCKS_RELEASED ?

Yes.

> > the eofblocks, so don't bother locking the inode or performing the
> > checks in xfs_can_free_eofblocks.
> 
> It'll still be the case that ->destroy_inode will have the chance to
> delete the eofblocks if we don't do it here, correct?

Yes.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 10/10] xfs: reclaim speculative preallocations for append only files
  2024-06-23  5:34 ` [PATCH 10/10] xfs: reclaim speculative preallocations for append only files Christoph Hellwig
@ 2024-06-24 15:54   ` Darrick J. Wong
  2024-06-24 16:07     ` Christoph Hellwig
  0 siblings, 1 reply; 32+ messages in thread
From: Darrick J. Wong @ 2024-06-24 15:54 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chandan Babu R, Dave Chinner, linux-xfs

On Sun, Jun 23, 2024 at 07:34:55AM +0200, Christoph Hellwig wrote:
> The XFS XFS_DIFLAG_APPEND maps to the VFS S_APPEND flag, which forbids
> writes that don't append at the current EOF.
> 
> But the commit originally adding XFS_DIFLAG_APPEND support (commit
> a23321e766d in xfs xfs-import repository) also checked it to skip
> releasing speculative preallocations, which doesn't make any sense.
> 
> Another commit (dd9f438e3290 in the xfs-import repository) late extended

                                                             later

> that flag to also report these speculation preallocations which should
> not exist in getbmap.
> 
> Remove these checks as nothing XFS_DIFLAG_APPEND implies that
> preallocations beyond EOF should exist, but explicitly check for
> XFS_DIFLAG_APPEND in xfs_file_release to bypass the algorithm that
> discard preallocations on the first close as append only file aren't

                                                           files

> expected to be written to only once.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/xfs_bmap_util.c | 12 +++++-------
>  fs/xfs/xfs_file.c      |  4 ++++
>  fs/xfs/xfs_icache.c    |  2 +-
>  3 files changed, 10 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> index 52863b784b023f..aa924d7cd32abd 100644
> --- a/fs/xfs/xfs_bmap_util.c
> +++ b/fs/xfs/xfs_bmap_util.c
> @@ -331,8 +331,7 @@ xfs_getbmap(
>  		}
>  
>  		if (xfs_get_extsz_hint(ip) ||
> -		    (ip->i_diflags &
> -		     (XFS_DIFLAG_PREALLOC | XFS_DIFLAG_APPEND)))
> +		    (ip->i_diflags & XFS_DIFLAG_PREALLOC))

The last time you tried to remove XFS_DIFLAG_APPEND from this test, I
noticed that there's some fstest that "fails" because the bmap output
for an append-only file now stops at isize instead of maxbytes.  Do you
see this same regression?

>  			max_len = mp->m_super->s_maxbytes;
>  		else
>  			max_len = XFS_ISIZE(ip);
> @@ -524,12 +523,11 @@ xfs_can_free_eofblocks(
>  		return false;
>  
>  	/*
> -	 * Only free real extents for inodes with persistent preallocations or
> -	 * the append-only flag.
> +	 * Do not free real extents in preallocated files unless the file has
> +	 * delalloc blocks and we are forced to remove them.
>  	 */
> -	if (ip->i_diflags & (XFS_DIFLAG_PREALLOC | XFS_DIFLAG_APPEND))
> -		if (ip->i_delayed_blks == 0)
> -			return false;
> +	if ((ip->i_diflags & XFS_DIFLAG_PREALLOC) && !ip->i_delayed_blks)
> +		return false;
>  
>  	/*
>  	 * Do not try to free post-EOF blocks if EOF is beyond the end of the
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 1903fa5568a37d..b05822a70ea680 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -1231,6 +1231,9 @@ xfs_file_release(
>  	 * one file after another without going back to it while keeping the
>  	 * preallocation for files that have recurring open/write/close cycles.
>  	 *
> +	 * This heuristic is skipped for inodes with the append-only flag as
> +	 * that flags is rather pointless for inodes written oly once.

                flag                                         only

--D

> +	 *
>  	 * There is no point in freeing blocks here for open but unlinked files
>  	 * as they will be taken care of by the inactivation path soon.
>  	 *
> @@ -1245,6 +1248,7 @@ xfs_file_release(
>  	 */
>  	if (inode->i_nlink &&
>  	    (file->f_mode & FMODE_WRITE) &&
> +	    !(ip->i_diflags & XFS_DIFLAG_APPEND) &&
>  	    !xfs_iflags_test(ip, XFS_EOFBLOCKS_RELEASED) &&
>  	    xfs_ilock_nowait(ip, XFS_IOLOCK_EXCL)) {
>  		if (xfs_can_free_eofblocks(ip)) {
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index 9967334ea99f1a..0f07ec842b7023 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -1158,7 +1158,7 @@ xfs_inode_free_eofblocks(
>  	if (xfs_can_free_eofblocks(ip))
>  		return xfs_free_eofblocks(ip);
>  
> -	/* inode could be preallocated or append-only */
> +	/* inode could be preallocated */
>  	trace_xfs_inode_free_eofblocks_invalid(ip);
>  	xfs_inode_clear_eofblocks_tag(ip);
>  	return 0;
> -- 
> 2.43.0
> 
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 10/10] xfs: reclaim speculative preallocations for append only files
  2024-06-24 15:54   ` Darrick J. Wong
@ 2024-06-24 16:07     ` Christoph Hellwig
  2024-06-24 17:06       ` Darrick J. Wong
  0 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2024-06-24 16:07 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Chandan Babu R, Dave Chinner, linux-xfs

On Mon, Jun 24, 2024 at 08:54:43AM -0700, Darrick J. Wong wrote:
> >  
> >  		if (xfs_get_extsz_hint(ip) ||
> > -		    (ip->i_diflags &
> > -		     (XFS_DIFLAG_PREALLOC | XFS_DIFLAG_APPEND)))
> > +		    (ip->i_diflags & XFS_DIFLAG_PREALLOC))
> 
> The last time you tried to remove XFS_DIFLAG_APPEND from this test, I
> noticed that there's some fstest that "fails" because the bmap output
> for an append-only file now stops at isize instead of maxbytes.  Do you
> see this same regression?

No.   What test did you see it with?  Any special mkfs or mount options?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 07/10] xfs: only free posteof blocks on first close
  2024-06-24 15:46   ` Darrick J. Wong
@ 2024-06-24 16:08     ` Christoph Hellwig
  2024-06-24 16:49       ` Darrick J. Wong
  0 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2024-06-24 16:08 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Chandan Babu R, Dave Chinner, linux-xfs

On Mon, Jun 24, 2024 at 08:46:21AM -0700, Darrick J. Wong wrote:
> On Sun, Jun 23, 2024 at 07:34:52AM +0200, Christoph Hellwig wrote:
> > From: "Darrick J. Wong" <djwong@kernel.org>
> > 
> > Certain workloads fragment files on XFS very badly, such as a software
> > package that creates a number of threads, each of which repeatedly run
> > the sequence: open a file, perform a synchronous write, and close the
> > file, which defeats the speculative preallocation mechanism.  We work
> > around this problem by only deleting posteof blocks the /first/ time a
> > file is closed to preserve the behavior that unpacking a tarball lays
> > out files one after the other with no gaps.
> > 
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > [hch: rebased, updated comment, renamed the flag]
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> 
> Someone please review this?  The last person to try was Dave, five years
> ago, and I do not know if he ever saw what it did to various workloads.
> 
> https://lore.kernel.org/linux-xfs/20190315034237.GL23020@dastard/

Well, the read-only check Dave suggested is in the previous patch,
and the tests he sent cover the relevant synthetic workloads.  What
else are you looking for?


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 07/10] xfs: only free posteof blocks on first close
  2024-06-24 16:08     ` Christoph Hellwig
@ 2024-06-24 16:49       ` Darrick J. Wong
  0 siblings, 0 replies; 32+ messages in thread
From: Darrick J. Wong @ 2024-06-24 16:49 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chandan Babu R, Dave Chinner, linux-xfs

On Mon, Jun 24, 2024 at 06:08:23PM +0200, Christoph Hellwig wrote:
> On Mon, Jun 24, 2024 at 08:46:21AM -0700, Darrick J. Wong wrote:
> > On Sun, Jun 23, 2024 at 07:34:52AM +0200, Christoph Hellwig wrote:
> > > From: "Darrick J. Wong" <djwong@kernel.org>
> > > 
> > > Certain workloads fragment files on XFS very badly, such as a software
> > > package that creates a number of threads, each of which repeatedly run
> > > the sequence: open a file, perform a synchronous write, and close the
> > > file, which defeats the speculative preallocation mechanism.  We work
> > > around this problem by only deleting posteof blocks the /first/ time a
> > > file is closed to preserve the behavior that unpacking a tarball lays
> > > out files one after the other with no gaps.
> > > 
> > > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > > [hch: rebased, updated comment, renamed the flag]
> > > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > 
> > Someone please review this?  The last person to try was Dave, five years
> > ago, and I do not know if he ever saw what it did to various workloads.
> > 
> > https://lore.kernel.org/linux-xfs/20190315034237.GL23020@dastard/
> 
> Well, the read-only check Dave suggested is in the previous patch,
> and the tests he sent cover the relevant synthetic workloads.  What
> else are you looking for?

Nothing -- it looks fine to me, but as it's authored by me, I can't
meaningfully slap an RVB tag on it, can I?

Eh I've done the rest of the series; let's try it anyway:
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 10/10] xfs: reclaim speculative preallocations for append only files
  2024-06-24 16:07     ` Christoph Hellwig
@ 2024-06-24 17:06       ` Darrick J. Wong
  2024-06-24 17:22         ` Christoph Hellwig
  0 siblings, 1 reply; 32+ messages in thread
From: Darrick J. Wong @ 2024-06-24 17:06 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chandan Babu R, Dave Chinner, linux-xfs

On Mon, Jun 24, 2024 at 06:07:35PM +0200, Christoph Hellwig wrote:
> On Mon, Jun 24, 2024 at 08:54:43AM -0700, Darrick J. Wong wrote:
> > >  
> > >  		if (xfs_get_extsz_hint(ip) ||
> > > -		    (ip->i_diflags &
> > > -		     (XFS_DIFLAG_PREALLOC | XFS_DIFLAG_APPEND)))
> > > +		    (ip->i_diflags & XFS_DIFLAG_PREALLOC))
> > 
> > The last time you tried to remove XFS_DIFLAG_APPEND from this test, I
> > noticed that there's some fstest that "fails" because the bmap output
> > for an append-only file now stops at isize instead of maxbytes.  Do you
> > see this same regression?
> 
> No.   What test did you see it with?  Any special mkfs or mount options?

IIRC /think/ it was xfs/009.  No particularly special mkfs/mount
options, though my memory of 10 days ago is a bit hazy now. :(

--D

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 10/10] xfs: reclaim speculative preallocations for append only files
  2024-06-24 17:06       ` Darrick J. Wong
@ 2024-06-24 17:22         ` Christoph Hellwig
  2024-06-24 18:44           ` Darrick J. Wong
  0 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2024-06-24 17:22 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Chandan Babu R, Dave Chinner, linux-xfs

On Mon, Jun 24, 2024 at 10:06:18AM -0700, Darrick J. Wong wrote:
> On Mon, Jun 24, 2024 at 06:07:35PM +0200, Christoph Hellwig wrote:
> > On Mon, Jun 24, 2024 at 08:54:43AM -0700, Darrick J. Wong wrote:
> > > >  
> > > >  		if (xfs_get_extsz_hint(ip) ||
> > > > -		    (ip->i_diflags &
> > > > -		     (XFS_DIFLAG_PREALLOC | XFS_DIFLAG_APPEND)))
> > > > +		    (ip->i_diflags & XFS_DIFLAG_PREALLOC))
> > > 
> > > The last time you tried to remove XFS_DIFLAG_APPEND from this test, I
> > > noticed that there's some fstest that "fails" because the bmap output
> > > for an append-only file now stops at isize instead of maxbytes.  Do you
> > > see this same regression?
> > 
> > No.   What test did you see it with?  Any special mkfs or mount options?
> 
> IIRC /think/ it was xfs/009.  No particularly special mkfs/mount
> options, though my memory of 10 days ago is a bit hazy now. :(

009 was the case when I also removed the XFS_DIFLAG_PREALLOC check
accidentally in the very first version.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 10/10] xfs: reclaim speculative preallocations for append only files
  2024-06-24 17:22         ` Christoph Hellwig
@ 2024-06-24 18:44           ` Darrick J. Wong
  0 siblings, 0 replies; 32+ messages in thread
From: Darrick J. Wong @ 2024-06-24 18:44 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chandan Babu R, Dave Chinner, linux-xfs

On Mon, Jun 24, 2024 at 07:22:53PM +0200, Christoph Hellwig wrote:
> On Mon, Jun 24, 2024 at 10:06:18AM -0700, Darrick J. Wong wrote:
> > On Mon, Jun 24, 2024 at 06:07:35PM +0200, Christoph Hellwig wrote:
> > > On Mon, Jun 24, 2024 at 08:54:43AM -0700, Darrick J. Wong wrote:
> > > > >  
> > > > >  		if (xfs_get_extsz_hint(ip) ||
> > > > > -		    (ip->i_diflags &
> > > > > -		     (XFS_DIFLAG_PREALLOC | XFS_DIFLAG_APPEND)))
> > > > > +		    (ip->i_diflags & XFS_DIFLAG_PREALLOC))
> > > > 
> > > > The last time you tried to remove XFS_DIFLAG_APPEND from this test, I
> > > > noticed that there's some fstest that "fails" because the bmap output
> > > > for an append-only file now stops at isize instead of maxbytes.  Do you
> > > > see this same regression?
> > > 
> > > No.   What test did you see it with?  Any special mkfs or mount options?
> > 
> > IIRC /think/ it was xfs/009.  No particularly special mkfs/mount
> > options, though my memory of 10 days ago is a bit hazy now. :(
> 
> 009 was the case when I also removed the XFS_DIFLAG_PREALLOC check
> accidentally in the very first version.

Aha!  /me sneezes for the 10,000th time today and says

With the typos fixed,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 04/10] xfs: don't bother returning errors from xfs_file_release
  2024-06-24 15:51     ` Christoph Hellwig
@ 2024-08-07 14:59       ` Darrick J. Wong
  0 siblings, 0 replies; 32+ messages in thread
From: Darrick J. Wong @ 2024-08-07 14:59 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chandan Babu R, Dave Chinner, linux-xfs

On Mon, Jun 24, 2024 at 05:51:24PM +0200, Christoph Hellwig wrote:
> On Mon, Jun 24, 2024 at 08:39:51AM -0700, Darrick J. Wong wrote:
> > > +/*
> > > + * Don't bother propagating errors.  We're just doing cleanup, and the caller
> > > + * ignores the return value anyway.
> > 
> > Shouldn't we drop the int return from the function declaration, then?
> > 
> > (Is that also a cleanup that's you're working on?)
> 
> We can't drop it without changing the f_ops->release signature and
> updates to the many instance of it.  That would still be worthwhile
> project, but it's something for someone who is better at scripting
> than me.

Yeahhhhh... at this point it's a giant treewide change that just doesn't
seem worth it.  Maybe save it for someone who wants to make a subtle but
important behavior change and needs a type signature change to stand in
as an idiot light for out of tree modules. :P

Anyway I think this is fine, let's merge this series
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 02/10] xfs: remove the i_mode check in xfs_release
  2024-06-24 15:50     ` Christoph Hellwig
@ 2024-08-07 15:01       ` Darrick J. Wong
  0 siblings, 0 replies; 32+ messages in thread
From: Darrick J. Wong @ 2024-08-07 15:01 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chandan Babu R, Dave Chinner, linux-xfs

On Mon, Jun 24, 2024 at 05:50:11PM +0200, Christoph Hellwig wrote:
> On Mon, Jun 24, 2024 at 08:34:59AM -0700, Darrick J. Wong wrote:
> > > -	if (!S_ISREG(VFS_I(ip)->i_mode) || (VFS_I(ip)->i_mode == 0))
> > 
> > How would we encounter !i_mode regular files being released?
> 
> We can't.  If that code ever made any sense than in ancient pre-history
> in IRIX.
> 
> > If an open file's link count is incorrectly low, it can't get freed
> > until after all the open file descriptors have been released, right?
> > Or is there some other vector for this?
> 
> No.
> 
> > I'm wondering if this ought to be:
> > 
> > 	if (XFS_IS_CORRUPT(mp, !VFS_I(ip)->i_mode)) {
> > 		xfs_inode_mark_sick(ip);
> > 		return -EFSCORRUPTED;
> > 	}
> 
> I wouldn't even bother with that.

Oh, right, because xfs_iget checks that for us and bails out before the
vfs can even get its hands on the file.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2024-08-07 15:01 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-23  5:34 post-EOF block handling revamp Christoph Hellwig
2024-06-23  5:34 ` [PATCH 01/10] xfs: fix freeing speculative preallocations for preallocated files Christoph Hellwig
2024-06-24 15:30   ` Darrick J. Wong
2024-06-23  5:34 ` [PATCH 02/10] xfs: remove the i_mode check in xfs_release Christoph Hellwig
2024-06-24 15:34   ` Darrick J. Wong
2024-06-24 15:50     ` Christoph Hellwig
2024-08-07 15:01       ` Darrick J. Wong
2024-06-23  5:34 ` [PATCH 03/10] xfs: refactor f_op->release handling Christoph Hellwig
2024-06-24 15:35   ` Darrick J. Wong
2024-06-23  5:34 ` [PATCH 04/10] xfs: don't bother returning errors from xfs_file_release Christoph Hellwig
2024-06-24 15:39   ` Darrick J. Wong
2024-06-24 15:51     ` Christoph Hellwig
2024-08-07 14:59       ` Darrick J. Wong
2024-06-23  5:34 ` [PATCH 05/10] xfs: skip all of xfs_file_release when shut down Christoph Hellwig
2024-06-24 15:41   ` Darrick J. Wong
2024-06-23  5:34 ` [PATCH 06/10] xfs: don't free post-EOF blocks on read close Christoph Hellwig
2024-06-24 15:43   ` Darrick J. Wong
2024-06-23  5:34 ` [PATCH 07/10] xfs: only free posteof blocks on first close Christoph Hellwig
2024-06-24 15:46   ` Darrick J. Wong
2024-06-24 16:08     ` Christoph Hellwig
2024-06-24 16:49       ` Darrick J. Wong
2024-06-23  5:34 ` [PATCH 08/10] xfs: check XFS_IDIRTY_RELEASE earlier in xfs_release_eofblocks Christoph Hellwig
2024-06-24 15:50   ` Darrick J. Wong
2024-06-24 15:54     ` Christoph Hellwig
2024-06-23  5:34 ` [PATCH 09/10] xfs: simplify extent lookup in xfs_can_free_eofblocks Christoph Hellwig
2024-06-24 15:51   ` Darrick J. Wong
2024-06-23  5:34 ` [PATCH 10/10] xfs: reclaim speculative preallocations for append only files Christoph Hellwig
2024-06-24 15:54   ` Darrick J. Wong
2024-06-24 16:07     ` Christoph Hellwig
2024-06-24 17:06       ` Darrick J. Wong
2024-06-24 17:22         ` Christoph Hellwig
2024-06-24 18:44           ` Darrick J. Wong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox