public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
* RFC: extended version of the xfs_bmapi_write retval fix
@ 2024-04-08 14:54 Christoph Hellwig
  2024-04-08 14:54 ` [PATCH 1/8] xfs: fix error returns from xfs_bmapi_write Christoph Hellwig
                   ` (7 more replies)
  0 siblings, 8 replies; 21+ messages in thread
From: Christoph Hellwig @ 2024-04-08 14:54 UTC (permalink / raw)
  To: Chandan Babu R, Darrick J. Wong; +Cc: Dave Chinner, open list:XFS FILESYSTEM

Hi all,

this series tries to take the xfs_bmapi_wite retval fix a little further,
but I'm not entirely happy with the result.

The first patch is the actual return value fix with the comments fixed
up as per the comment from Darrick, and a dead check removed I found
while developing the rest.

The following patches are random cleanups to get the code in shape for
the last patches.

The second to last patch fixes long-standing issues in
xfs_bmap_add_extent_delay_real in code paths that haven't been used much
or at all.

The last patch changes xfs_bmapi_write to not convert the entire
delalloc extent if it hits one.  This sounds simpler than it is, because
delalloc conversion has been designed to always convert the entire
extent since the initial delalloc commits.   I'm not really sure that
putting strain on this code now will do us much good, so I'll just leave
these patches out for comments for now and will look into how coding up
a loop of delalloc conversion calls in every place that could allocate
blocks in the data fork of a regular file and thus hit delalloc so we
can compare the approaches.

In the mean time I think that patch 1 is a candidate for 6.9 as it fixes
the fuzzer problem.

Diffstat:
 libxfs/xfs_attr_remote.c |    1 
 libxfs/xfs_bmap.c        |  120 ++++++++++++++++++++++++++++-------------------
 libxfs/xfs_da_btree.c    |   20 +------
 scrub/quota_repair.c     |    6 --
 scrub/rtbitmap_repair.c  |    2 
 xfs_bmap_util.c          |   31 +++++-------
 xfs_dquot.c              |    1 
 xfs_iomap.c              |    8 ---
 xfs_reflink.c            |   14 -----
 xfs_rtalloc.c            |    2 
 10 files changed, 93 insertions(+), 112 deletions(-)

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 1/8] xfs: fix error returns from xfs_bmapi_write
  2024-04-08 14:54 RFC: extended version of the xfs_bmapi_write retval fix Christoph Hellwig
@ 2024-04-08 14:54 ` Christoph Hellwig
  2024-04-09 23:19   ` Darrick J. Wong
  2024-04-08 14:54 ` [PATCH 2/8] xfs: remove the unusued tmp_logflags variable in xfs_bmapi_allocate Christoph Hellwig
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 21+ messages in thread
From: Christoph Hellwig @ 2024-04-08 14:54 UTC (permalink / raw)
  To: Chandan Babu R, Darrick J. Wong; +Cc: Dave Chinner, open list:XFS FILESYSTEM

xfs_bmapi_write can return 0 without actually returning a mapping in
mval in two different cases:

 1) when there is absolutely no space available to do an allocation
 2) when converting delalloc space, and the allocation is so small
    that it only covers parts of the delalloc extent before the
    range requested by the caller

Callers at best can handle one of these cases, but in many cases can't
cope with either one.  Switch xfs_bmapi_write to always return a
mapping or return an error code instead.  For case 1) above ENOSPC is
the obvious choice which is very much what the callers expect anyway.
For case 2) there is no really good error code, so pick a funky one
from the SysV streams portfolio.

This fixes the reproducer here:

    https://lore.kernel.org/linux-xfs/CAEJPjCvT3Uag-pMTYuigEjWZHn1sGMZ0GCjVVCv29tNHK76Cgg@mail.gmail.com0/

which uses reserved blocks to create file systems that are gravely
out of space and thus cause at least xfs_file_alloc_space to hang
and trigger the lack of ENOSPC handling in xfs_dquot_disk_alloc.

Note that this patch does not actually make any caller but
xfs_alloc_file_space deal intelligently with case 2) above.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/libxfs/xfs_attr_remote.c |  1 -
 fs/xfs/libxfs/xfs_bmap.c        | 46 ++++++++++++++++++++++++++-------
 fs/xfs/libxfs/xfs_da_btree.c    | 20 ++++----------
 fs/xfs/scrub/quota_repair.c     |  6 -----
 fs/xfs/scrub/rtbitmap_repair.c  |  2 --
 fs/xfs/xfs_bmap_util.c          | 31 +++++++++++-----------
 fs/xfs/xfs_dquot.c              |  1 -
 fs/xfs/xfs_iomap.c              |  8 ------
 fs/xfs/xfs_reflink.c            | 14 ----------
 fs/xfs/xfs_rtalloc.c            |  2 --
 10 files changed, 57 insertions(+), 74 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c
index ff04128287720a..41a02dcc2541b0 100644
--- a/fs/xfs/libxfs/xfs_attr_remote.c
+++ b/fs/xfs/libxfs/xfs_attr_remote.c
@@ -626,7 +626,6 @@ xfs_attr_rmtval_set_blk(
 	if (error)
 		return error;
 
-	ASSERT(nmap == 1);
 	ASSERT((map->br_startblock != DELAYSTARTBLOCK) &&
 	       (map->br_startblock != HOLESTARTBLOCK));
 
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 656c95a22f2e6d..14196d918865ba 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -4226,8 +4226,10 @@ xfs_bmapi_allocate(
 	} else {
 		error = xfs_bmap_alloc_userdata(bma);
 	}
-	if (error || bma->blkno == NULLFSBLOCK)
+	if (error)
 		return error;
+	if (bma->blkno == NULLFSBLOCK)
+		return -ENOSPC;
 
 	if (bma->flags & XFS_BMAPI_ZERO) {
 		error = xfs_zero_extent(bma->ip, bma->blkno, bma->length);
@@ -4406,6 +4408,15 @@ xfs_bmapi_finish(
  * extent state if necessary.  Details behaviour is controlled by the flags
  * parameter.  Only allocates blocks from a single allocation group, to avoid
  * locking problems.
+ *
+ * Returns 0 on success and places the extent mappings in mval.  nmaps is used
+ * as an input/output parameter where the caller specifies the maximum number
+ * of mappings that may be returned and xfs_bmapi_write passes back the number
+ * of mappings (including existing mappings) it found.
+ *
+ * Returns a negative error code on failure, including -ENOSPC when it could not
+ * allocate any blocks and -ENOSR when it did allocate blocks to convert a
+ * delalloc range, but those blocks were before the passed in range.
  */
 int
 xfs_bmapi_write(
@@ -4534,10 +4545,16 @@ xfs_bmapi_write(
 			ASSERT(len > 0);
 			ASSERT(bma.length > 0);
 			error = xfs_bmapi_allocate(&bma);
-			if (error)
+			if (error) {
+				/*
+				 * If we already allocated space in a previous
+				 * iteration return what we go so far when
+				 * running out of space.
+				 */
+				if (error == -ENOSPC && bma.nallocs)
+					break;
 				goto error0;
-			if (bma.blkno == NULLFSBLOCK)
-				break;
+			}
 
 			/*
 			 * If this is a CoW allocation, record the data in
@@ -4575,7 +4592,6 @@ xfs_bmapi_write(
 		if (!xfs_iext_next_extent(ifp, &bma.icur, &bma.got))
 			eof = true;
 	}
-	*nmap = n;
 
 	error = xfs_bmap_btree_to_extents(tp, ip, bma.cur, &bma.logflags,
 			whichfork);
@@ -4586,7 +4602,22 @@ xfs_bmapi_write(
 	       ifp->if_nextents > XFS_IFORK_MAXEXT(ip, whichfork));
 	xfs_bmapi_finish(&bma, whichfork, 0);
 	xfs_bmap_validate_ret(orig_bno, orig_len, orig_flags, orig_mval,
-		orig_nmap, *nmap);
+		orig_nmap, n);
+
+	/*
+	 * When converting delayed allocations, xfs_bmapi_allocate ignores
+	 * the passed in bno and always converts from the start of the found
+	 * delalloc extent.
+	 *
+	 * To avoid a successful return with *nmap set to 0, return the magic
+	 * -ENOSR error code for this particular case so that the caller can
+	 * handle it.
+	 */
+	if (!n) {
+		ASSERT(bma.nallocs >= *nmap);
+		return -ENOSR;
+	}
+	*nmap = n;
 	return 0;
 error0:
 	xfs_bmapi_finish(&bma, whichfork, error);
@@ -4693,9 +4724,6 @@ xfs_bmapi_convert_delalloc(
 	if (error)
 		goto out_finish;
 
-	error = -ENOSPC;
-	if (WARN_ON_ONCE(bma.blkno == NULLFSBLOCK))
-		goto out_finish;
 	if (WARN_ON_ONCE(!xfs_valid_startblock(ip, bma.got.br_startblock))) {
 		xfs_bmap_mark_sick(ip, whichfork);
 		error = -EFSCORRUPTED;
diff --git a/fs/xfs/libxfs/xfs_da_btree.c b/fs/xfs/libxfs/xfs_da_btree.c
index 718d071bb21ae3..276c710548b5a7 100644
--- a/fs/xfs/libxfs/xfs_da_btree.c
+++ b/fs/xfs/libxfs/xfs_da_btree.c
@@ -2167,8 +2167,8 @@ xfs_da_grow_inode_int(
 	struct xfs_inode	*dp = args->dp;
 	int			w = args->whichfork;
 	xfs_rfsblock_t		nblks = dp->i_nblocks;
-	struct xfs_bmbt_irec	map, *mapp;
-	int			nmap, error, got, i, mapi;
+	struct xfs_bmbt_irec	map, *mapp = &map;
+	int			nmap, error, got, i, mapi = 1;
 
 	/*
 	 * Find a spot in the file space to put the new block.
@@ -2184,14 +2184,7 @@ xfs_da_grow_inode_int(
 	error = xfs_bmapi_write(tp, dp, *bno, count,
 			xfs_bmapi_aflag(w)|XFS_BMAPI_METADATA|XFS_BMAPI_CONTIG,
 			args->total, &map, &nmap);
-	if (error)
-		return error;
-
-	ASSERT(nmap <= 1);
-	if (nmap == 1) {
-		mapp = &map;
-		mapi = 1;
-	} else if (nmap == 0 && count > 1) {
+	if (error == -ENOSPC && count > 1) {
 		xfs_fileoff_t		b;
 		int			c;
 
@@ -2209,16 +2202,13 @@ xfs_da_grow_inode_int(
 					args->total, &mapp[mapi], &nmap);
 			if (error)
 				goto out_free_map;
-			if (nmap < 1)
-				break;
 			mapi += nmap;
 			b = mapp[mapi - 1].br_startoff +
 			    mapp[mapi - 1].br_blockcount;
 		}
-	} else {
-		mapi = 0;
-		mapp = NULL;
 	}
+	if (error)
+		goto out_free_map;
 
 	/*
 	 * Count the blocks we got, make sure it matches the total.
diff --git a/fs/xfs/scrub/quota_repair.c b/fs/xfs/scrub/quota_repair.c
index 0bab4c30cb85ab..90cd1512bba961 100644
--- a/fs/xfs/scrub/quota_repair.c
+++ b/fs/xfs/scrub/quota_repair.c
@@ -77,8 +77,6 @@ xrep_quota_item_fill_bmap_hole(
 			irec, &nmaps);
 	if (error)
 		return error;
-	if (nmaps != 1)
-		return -ENOSPC;
 
 	dq->q_blkno = XFS_FSB_TO_DADDR(mp, irec->br_startblock);
 
@@ -444,10 +442,6 @@ xrep_quota_data_fork(
 					XFS_BMAPI_CONVERT, 0, &nrec, &nmap);
 			if (error)
 				goto out;
-			if (nmap != 1) {
-				error = -ENOSPC;
-				goto out;
-			}
 			ASSERT(nrec.br_startoff == irec.br_startoff);
 			ASSERT(nrec.br_blockcount == irec.br_blockcount);
 
diff --git a/fs/xfs/scrub/rtbitmap_repair.c b/fs/xfs/scrub/rtbitmap_repair.c
index 46f5d5f605c915..0fef98e9f83409 100644
--- a/fs/xfs/scrub/rtbitmap_repair.c
+++ b/fs/xfs/scrub/rtbitmap_repair.c
@@ -108,8 +108,6 @@ xrep_rtbitmap_data_mappings(
 				0, &map, &nmaps);
 		if (error)
 			return error;
-		if (nmaps != 1)
-			return -EFSCORRUPTED;
 
 		/* Commit new extent and all deferred work. */
 		error = xrep_defer_finish(sc);
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 19e11d1da66074..fbca94170cd386 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -721,33 +721,32 @@ xfs_alloc_file_space(
 		if (error)
 			goto error;
 
-		error = xfs_bmapi_write(tp, ip, startoffset_fsb,
-				allocatesize_fsb, XFS_BMAPI_PREALLOC, 0, imapp,
-				&nimaps);
-		if (error)
-			goto error;
-
-		ip->i_diflags |= XFS_DIFLAG_PREALLOC;
-		xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
-
-		error = xfs_trans_commit(tp);
-		xfs_iunlock(ip, XFS_ILOCK_EXCL);
-		if (error)
-			break;
-
 		/*
 		 * If the allocator cannot find a single free extent large
 		 * enough to cover the start block of the requested range,
-		 * xfs_bmapi_write will return 0 but leave *nimaps set to 0.
+		 * xfs_bmapi_write will return -ENOSR.
 		 *
 		 * In that case we simply need to keep looping with the same
 		 * startoffset_fsb so that one of the following allocations
 		 * will eventually reach the requested range.
 		 */
-		if (nimaps) {
+		error = xfs_bmapi_write(tp, ip, startoffset_fsb,
+				allocatesize_fsb, XFS_BMAPI_PREALLOC, 0, imapp,
+				&nimaps);
+		if (error) {
+			if (error != -ENOSR)
+				goto error;
+			error = 0;
+		} else {
 			startoffset_fsb += imapp->br_blockcount;
 			allocatesize_fsb -= imapp->br_blockcount;
 		}
+
+		ip->i_diflags |= XFS_DIFLAG_PREALLOC;
+		xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+
+		error = xfs_trans_commit(tp);
+		xfs_iunlock(ip, XFS_ILOCK_EXCL);
 	}
 
 	return error;
diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
index c98cb468c35780..0c9eb8fdeec082 100644
--- a/fs/xfs/xfs_dquot.c
+++ b/fs/xfs/xfs_dquot.c
@@ -357,7 +357,6 @@ xfs_dquot_disk_alloc(
 		goto err_cancel;
 
 	ASSERT(map.br_blockcount == XFS_DQUOT_CLUSTER_SIZE_FSB);
-	ASSERT(nmaps == 1);
 	ASSERT((map.br_startblock != DELAYSTARTBLOCK) &&
 	       (map.br_startblock != HOLESTARTBLOCK));
 
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 4087af7f3c9f3f..42155cedefb77d 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -321,14 +321,6 @@ xfs_iomap_write_direct(
 	if (error)
 		goto out_unlock;
 
-	/*
-	 * Copy any maps to caller's array and return any error.
-	 */
-	if (nimaps == 0) {
-		error = -ENOSPC;
-		goto out_unlock;
-	}
-
 	if (unlikely(!xfs_valid_startblock(ip, imap->br_startblock))) {
 		xfs_bmap_mark_sick(ip, XFS_DATA_FORK);
 		error = xfs_alert_fsblock_zero(ip, imap);
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 7da0e8f961d351..5ecb52a234becc 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -430,13 +430,6 @@ xfs_reflink_fill_cow_hole(
 	if (error)
 		return error;
 
-	/*
-	 * Allocation succeeded but the requested range was not even partially
-	 * satisfied?  Bail out!
-	 */
-	if (nimaps == 0)
-		return -ENOSPC;
-
 convert:
 	return xfs_reflink_convert_unwritten(ip, imap, cmap, convert_now);
 
@@ -499,13 +492,6 @@ xfs_reflink_fill_delalloc(
 		error = xfs_trans_commit(tp);
 		if (error)
 			return error;
-
-		/*
-		 * Allocation succeeded but the requested range was not even
-		 * partially satisfied?  Bail out!
-		 */
-		if (nimaps == 0)
-			return -ENOSPC;
 	} while (cmap->br_startoff + cmap->br_blockcount <= imap->br_startoff);
 
 	return xfs_reflink_convert_unwritten(ip, imap, cmap, convert_now);
diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index e66f9bd5de5cff..46ee8093f797a6 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -709,8 +709,6 @@ xfs_growfs_rt_alloc(
 		nmap = 1;
 		error = xfs_bmapi_write(tp, ip, oblocks, nblocks - oblocks,
 					XFS_BMAPI_METADATA, 0, &map, &nmap);
-		if (!error && nmap < 1)
-			error = -ENOSPC;
 		if (error)
 			goto out_trans_cancel;
 		/*
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 2/8] xfs: remove the unusued tmp_logflags variable in xfs_bmapi_allocate
  2024-04-08 14:54 RFC: extended version of the xfs_bmapi_write retval fix Christoph Hellwig
  2024-04-08 14:54 ` [PATCH 1/8] xfs: fix error returns from xfs_bmapi_write Christoph Hellwig
@ 2024-04-08 14:54 ` Christoph Hellwig
  2024-04-09 23:17   ` Darrick J. Wong
  2024-04-08 14:54 ` [PATCH 3/8] xfs: lifr a xfs_valid_startblock into xfs_bmapi_allocate Christoph Hellwig
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 21+ messages in thread
From: Christoph Hellwig @ 2024-04-08 14:54 UTC (permalink / raw)
  To: Chandan Babu R, Darrick J. Wong; +Cc: Dave Chinner, open list:XFS FILESYSTEM

tmp_logflags is initialized to 0 and then ORed into bma->logflags, which
isn't actually doing anything.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/libxfs/xfs_bmap.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 14196d918865ba..a847159302703a 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -4191,7 +4191,6 @@ xfs_bmapi_allocate(
 	struct xfs_mount	*mp = bma->ip->i_mount;
 	int			whichfork = xfs_bmapi_whichfork(bma->flags);
 	struct xfs_ifork	*ifp = xfs_ifork_ptr(bma->ip, whichfork);
-	int			tmp_logflags = 0;
 	int			error;
 
 	ASSERT(bma->length > 0);
@@ -4262,8 +4261,6 @@ xfs_bmapi_allocate(
 		error = xfs_bmap_add_extent_hole_real(bma->tp, bma->ip,
 				whichfork, &bma->icur, &bma->cur, &bma->got,
 				&bma->logflags, bma->flags);
-
-	bma->logflags |= tmp_logflags;
 	if (error)
 		return error;
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 3/8] xfs: lifr a xfs_valid_startblock into xfs_bmapi_allocate
  2024-04-08 14:54 RFC: extended version of the xfs_bmapi_write retval fix Christoph Hellwig
  2024-04-08 14:54 ` [PATCH 1/8] xfs: fix error returns from xfs_bmapi_write Christoph Hellwig
  2024-04-08 14:54 ` [PATCH 2/8] xfs: remove the unusued tmp_logflags variable in xfs_bmapi_allocate Christoph Hellwig
@ 2024-04-08 14:54 ` Christoph Hellwig
  2024-04-09 23:17   ` Darrick J. Wong
  2024-04-08 14:54 ` [PATCH 4/8] xfs: don't open code XFS_FILBLKS_MIN in xfs_bmapi_write Christoph Hellwig
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 21+ messages in thread
From: Christoph Hellwig @ 2024-04-08 14:54 UTC (permalink / raw)
  To: Chandan Babu R, Darrick J. Wong; +Cc: Dave Chinner, open list:XFS FILESYSTEM

xfs_bmapi_convert_delalloc has a xfs_valid_startblock check on the block
allocated by xfs_bmapi_allocate.  Lift it into xfs_bmapi_allocate as
we should assert the same for xfs_bmapi_write.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/libxfs/xfs_bmap.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index a847159302703a..3b5816de4af2a1 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -4230,6 +4230,11 @@ xfs_bmapi_allocate(
 	if (bma->blkno == NULLFSBLOCK)
 		return -ENOSPC;
 
+	if (WARN_ON_ONCE(!xfs_valid_startblock(bma->ip, bma->blkno))) {
+		xfs_bmap_mark_sick(bma->ip, whichfork);
+		return -EFSCORRUPTED;
+	}
+
 	if (bma->flags & XFS_BMAPI_ZERO) {
 		error = xfs_zero_extent(bma->ip, bma->blkno, bma->length);
 		if (error)
@@ -4721,12 +4726,6 @@ xfs_bmapi_convert_delalloc(
 	if (error)
 		goto out_finish;
 
-	if (WARN_ON_ONCE(!xfs_valid_startblock(ip, bma.got.br_startblock))) {
-		xfs_bmap_mark_sick(ip, whichfork);
-		error = -EFSCORRUPTED;
-		goto out_finish;
-	}
-
 	XFS_STATS_ADD(mp, xs_xstrat_bytes, XFS_FSB_TO_B(mp, bma.length));
 	XFS_STATS_INC(mp, xs_xstrat_quick);
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 4/8] xfs: don't open code XFS_FILBLKS_MIN in xfs_bmapi_write
  2024-04-08 14:54 RFC: extended version of the xfs_bmapi_write retval fix Christoph Hellwig
                   ` (2 preceding siblings ...)
  2024-04-08 14:54 ` [PATCH 3/8] xfs: lifr a xfs_valid_startblock into xfs_bmapi_allocate Christoph Hellwig
@ 2024-04-08 14:54 ` Christoph Hellwig
  2024-04-09 23:17   ` Darrick J. Wong
  2024-04-08 14:54 ` [PATCH 5/8] xfs: pass the actual offset and len to allocate to xfs_bmapi_allocate Christoph Hellwig
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 21+ messages in thread
From: Christoph Hellwig @ 2024-04-08 14:54 UTC (permalink / raw)
  To: Chandan Babu R, Darrick J. Wong; +Cc: Dave Chinner, open list:XFS FILESYSTEM

XFS_FILBLKS_MIN uses min_t and thus does the comparison using the correct
xfs_filblks_t type.  Use it in xfs_bmapi_write and slightly adjust the
comment document th potential pitfall to take account of this

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/libxfs/xfs_bmap.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 3b5816de4af2a1..f2e934c2fb423c 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -4537,14 +4537,11 @@ xfs_bmapi_write(
 			 * allocation length request (which can be 64 bits in
 			 * length) and the bma length request, which is
 			 * xfs_extlen_t and therefore 32 bits. Hence we have to
-			 * check for 32-bit overflows and handle them here.
+			 * be careful and do the min() using the larger type to
+			 * avoid overflows.
 			 */
-			if (len > (xfs_filblks_t)XFS_MAX_BMBT_EXTLEN)
-				bma.length = XFS_MAX_BMBT_EXTLEN;
-			else
-				bma.length = len;
+			bma.length = XFS_FILBLKS_MIN(len, XFS_MAX_BMBT_EXTLEN);
 
-			ASSERT(len > 0);
 			ASSERT(bma.length > 0);
 			error = xfs_bmapi_allocate(&bma);
 			if (error) {
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 5/8] xfs: pass the actual offset and len to allocate to xfs_bmapi_allocate
  2024-04-08 14:54 RFC: extended version of the xfs_bmapi_write retval fix Christoph Hellwig
                   ` (3 preceding siblings ...)
  2024-04-08 14:54 ` [PATCH 4/8] xfs: don't open code XFS_FILBLKS_MIN in xfs_bmapi_write Christoph Hellwig
@ 2024-04-08 14:54 ` Christoph Hellwig
  2024-04-09 23:20   ` Darrick J. Wong
  2024-04-08 14:54 ` [PATCH 6/8] xfs: remove the xfs_iext_peek_prev_extent call in xfs_bmapi_allocate Christoph Hellwig
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 21+ messages in thread
From: Christoph Hellwig @ 2024-04-08 14:54 UTC (permalink / raw)
  To: Chandan Babu R, Darrick J. Wong; +Cc: Dave Chinner, open list:XFS FILESYSTEM

xfs_bmapi_allocate currently overwrites offset and len when converting
delayed allocations, and duplicates the length cap done for non-delalloc
allocations.  Move all that logic into the callers to avoid duplication
and to make the calling conventions more obvious.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/libxfs/xfs_bmap.c | 32 ++++++++++++++++++--------------
 1 file changed, 18 insertions(+), 14 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index f2e934c2fb423c..aa182937de4641 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -4194,21 +4194,11 @@ xfs_bmapi_allocate(
 	int			error;
 
 	ASSERT(bma->length > 0);
+	ASSERT(bma->length <= XFS_MAX_BMBT_EXTLEN);
 
-	/*
-	 * For the wasdelay case, we could also just allocate the stuff asked
-	 * for in this bmap call but that wouldn't be as good.
-	 */
 	if (bma->wasdel) {
-		bma->length = (xfs_extlen_t)bma->got.br_blockcount;
-		bma->offset = bma->got.br_startoff;
 		if (!xfs_iext_peek_prev_extent(ifp, &bma->icur, &bma->prev))
 			bma->prev.br_startoff = NULLFILEOFF;
-	} else {
-		bma->length = XFS_FILBLKS_MIN(bma->length, XFS_MAX_BMBT_EXTLEN);
-		if (!bma->eof)
-			bma->length = XFS_FILBLKS_MIN(bma->length,
-					bma->got.br_startoff - bma->offset);
 	}
 
 	if (bma->flags & XFS_BMAPI_CONTIG)
@@ -4542,6 +4532,15 @@ xfs_bmapi_write(
 			 */
 			bma.length = XFS_FILBLKS_MIN(len, XFS_MAX_BMBT_EXTLEN);
 
+			if (wasdelay) {
+				bma.offset = bma.got.br_startoff;
+				bma.length = bma.got.br_blockcount;
+			} else {
+				if (!eof)
+					bma.length = XFS_FILBLKS_MIN(bma.length,
+						bma.got.br_startoff - bno);
+			}
+
 			ASSERT(bma.length > 0);
 			error = xfs_bmapi_allocate(&bma);
 			if (error) {
@@ -4694,11 +4693,16 @@ xfs_bmapi_convert_delalloc(
 	bma.tp = tp;
 	bma.ip = ip;
 	bma.wasdel = true;
-	bma.offset = bma.got.br_startoff;
-	bma.length = max_t(xfs_filblks_t, bma.got.br_blockcount,
-			XFS_MAX_BMBT_EXTLEN);
 	bma.minleft = xfs_bmapi_minleft(tp, ip, whichfork);
 
+	/*
+	 * Always allocate convert from the start of the delalloc extent even if
+	 * that is outside the passed in range to create large contiguous
+	 * extents on disk.
+	 */
+	bma.offset = bma.got.br_startoff;
+	bma.length = bma.got.br_blockcount;
+
 	/*
 	 * When we're converting the delalloc reservations backing dirty pages
 	 * in the page cache, we must be careful about how we create the new
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 6/8] xfs: remove the xfs_iext_peek_prev_extent call in xfs_bmapi_allocate
  2024-04-08 14:54 RFC: extended version of the xfs_bmapi_write retval fix Christoph Hellwig
                   ` (4 preceding siblings ...)
  2024-04-08 14:54 ` [PATCH 5/8] xfs: pass the actual offset and len to allocate to xfs_bmapi_allocate Christoph Hellwig
@ 2024-04-08 14:54 ` Christoph Hellwig
  2024-04-09 23:16   ` Darrick J. Wong
  2024-04-08 14:54 ` [PATCH 7/8] xfs: fix xfs_bmap_add_extent_delay_real for partial conversions Christoph Hellwig
  2024-04-08 14:54 ` [PATCH 8/8] xfs: do not allocate the entire delalloc extent in xfs_bmapi_write Christoph Hellwig
  7 siblings, 1 reply; 21+ messages in thread
From: Christoph Hellwig @ 2024-04-08 14:54 UTC (permalink / raw)
  To: Chandan Babu R, Darrick J. Wong; +Cc: Dave Chinner, open list:XFS FILESYSTEM

Both callers of xfs_bmapi_allocate already initialize bma->prev, don't
redo that in xfs_bmapi_allocate.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/libxfs/xfs_bmap.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index aa182937de4641..08e13ebdee1b53 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -4196,11 +4196,6 @@ xfs_bmapi_allocate(
 	ASSERT(bma->length > 0);
 	ASSERT(bma->length <= XFS_MAX_BMBT_EXTLEN);
 
-	if (bma->wasdel) {
-		if (!xfs_iext_peek_prev_extent(ifp, &bma->icur, &bma->prev))
-			bma->prev.br_startoff = NULLFILEOFF;
-	}
-
 	if (bma->flags & XFS_BMAPI_CONTIG)
 		bma->minlen = bma->length;
 	else
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 7/8] xfs: fix xfs_bmap_add_extent_delay_real for partial conversions
  2024-04-08 14:54 RFC: extended version of the xfs_bmapi_write retval fix Christoph Hellwig
                   ` (5 preceding siblings ...)
  2024-04-08 14:54 ` [PATCH 6/8] xfs: remove the xfs_iext_peek_prev_extent call in xfs_bmapi_allocate Christoph Hellwig
@ 2024-04-08 14:54 ` Christoph Hellwig
  2024-04-09 23:16   ` Darrick J. Wong
  2024-04-08 14:54 ` [PATCH 8/8] xfs: do not allocate the entire delalloc extent in xfs_bmapi_write Christoph Hellwig
  7 siblings, 1 reply; 21+ messages in thread
From: Christoph Hellwig @ 2024-04-08 14:54 UTC (permalink / raw)
  To: Chandan Babu R, Darrick J. Wong; +Cc: Dave Chinner, open list:XFS FILESYSTEM

xfs_bmap_add_extent_delay_real takes parts or all of a delalloc extent
and converts them to a real extent.  It is written to deal with any
potential overlap of the to be converted range with the delalloc extent,
but it turns out that currently only converting the entire extents, or a
part starting at the beginning is actually exercised, as the only caller
always tries to convert the entire delalloc extent, and either succeeds
or at least progresses partially from the start.

If it only converts a tiny part of a delalloc extent, the indirect block
calculation for the new delalloc extent (da_new) might be equivalent to that
of the existing delalloc extent (da_new).  If this extent conversion now
requires allocating an indirect block that gets accounted into da_new,
leading to the assert that da_new must be smaller or equal to da_new
unless we split the extent to trigger.

Except for the assert that case is actually handled by just trying to
allocate more space, as that already handled for the split case (which
currently can't be reached at all), so just reusing it should be fine.
Except that without dipping into the reserved block pool that would make
it a bit too easy to trigger a fs shutdown due to ENOSPC.  So in addition
to adjusting the assert, also dip into the reserved block pool.

Note that I could only reproduce the assert with a change to only convert
the actually asked range instead of the full delalloc extent from
xfs_bmapi_write.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/libxfs/xfs_bmap.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 08e13ebdee1b53..7700a48e013d5a 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -1586,6 +1586,7 @@ xfs_bmap_add_extent_delay_real(
 			if (error)
 				goto done;
 		}
+		ASSERT(da_new <= da_old);
 		break;
 
 	case BMAP_LEFT_FILLING | BMAP_RIGHT_FILLING | BMAP_LEFT_CONTIG:
@@ -1616,6 +1617,7 @@ xfs_bmap_add_extent_delay_real(
 			if (error)
 				goto done;
 		}
+		ASSERT(da_new <= da_old);
 		break;
 
 	case BMAP_LEFT_FILLING | BMAP_RIGHT_FILLING | BMAP_RIGHT_CONTIG:
@@ -1650,6 +1652,7 @@ xfs_bmap_add_extent_delay_real(
 			if (error)
 				goto done;
 		}
+		ASSERT(da_new <= da_old);
 		break;
 
 	case BMAP_LEFT_FILLING | BMAP_RIGHT_FILLING:
@@ -1684,6 +1687,7 @@ xfs_bmap_add_extent_delay_real(
 				goto done;
 			}
 		}
+		ASSERT(da_new <= da_old);
 		break;
 
 	case BMAP_LEFT_FILLING | BMAP_LEFT_CONTIG:
@@ -1722,6 +1726,7 @@ xfs_bmap_add_extent_delay_real(
 			if (error)
 				goto done;
 		}
+		ASSERT(da_new <= da_old);
 		break;
 
 	case BMAP_LEFT_FILLING:
@@ -1812,6 +1817,7 @@ xfs_bmap_add_extent_delay_real(
 		xfs_iext_update_extent(bma->ip, state, &bma->icur, &PREV);
 		xfs_iext_next(ifp, &bma->icur);
 		xfs_iext_update_extent(bma->ip, state, &bma->icur, &RIGHT);
+		ASSERT(da_new <= da_old);
 		break;
 
 	case BMAP_RIGHT_FILLING:
@@ -1861,6 +1867,7 @@ xfs_bmap_add_extent_delay_real(
 		PREV.br_blockcount = temp;
 		xfs_iext_insert(bma->ip, &bma->icur, &PREV, state);
 		xfs_iext_next(ifp, &bma->icur);
+		ASSERT(da_new <= da_old);
 		break;
 
 	case 0:
@@ -1983,11 +1990,9 @@ xfs_bmap_add_extent_delay_real(
 	}
 
 	/* adjust for changes in reserved delayed indirect blocks */
-	if (da_new != da_old) {
-		ASSERT(state == 0 || da_new < da_old);
+	if (da_new != da_old)
 		error = xfs_mod_fdblocks(mp, (int64_t)(da_old - da_new),
-				false);
-	}
+				da_new > da_old);
 
 	xfs_bmap_check_leaf_extents(bma->cur, bma->ip, whichfork);
 done:
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 8/8] xfs: do not allocate the entire delalloc extent in xfs_bmapi_write
  2024-04-08 14:54 RFC: extended version of the xfs_bmapi_write retval fix Christoph Hellwig
                   ` (6 preceding siblings ...)
  2024-04-08 14:54 ` [PATCH 7/8] xfs: fix xfs_bmap_add_extent_delay_real for partial conversions Christoph Hellwig
@ 2024-04-08 14:54 ` Christoph Hellwig
  2024-04-09 23:16   ` Darrick J. Wong
  7 siblings, 1 reply; 21+ messages in thread
From: Christoph Hellwig @ 2024-04-08 14:54 UTC (permalink / raw)
  To: Chandan Babu R, Darrick J. Wong; +Cc: Dave Chinner, open list:XFS FILESYSTEM

While trying to convert the entire delalloc extent is a good decision
for regular writeback as it leads to larger contigous on-disk extents,
but for other callers of xfs_bmapi_write is is rather questionable as
it forced them to loop creating new transactions just in case there
is no large enough contiguous extent to cover the whole delalloc
reservation.

Change xfs_bmapi_write to only allocate the passed in range instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/libxfs/xfs_bmap.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 7700a48e013d5a..748809b13113ab 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -4533,8 +4533,9 @@ xfs_bmapi_write(
 			bma.length = XFS_FILBLKS_MIN(len, XFS_MAX_BMBT_EXTLEN);
 
 			if (wasdelay) {
-				bma.offset = bma.got.br_startoff;
-				bma.length = bma.got.br_blockcount;
+				bma.length = XFS_FILBLKS_MIN(bma.length,
+					bma.got.br_blockcount -
+					(bno - bma.got.br_startoff));
 			} else {
 				if (!eof)
 					bma.length = XFS_FILBLKS_MIN(bma.length,
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 8/8] xfs: do not allocate the entire delalloc extent in xfs_bmapi_write
  2024-04-08 14:54 ` [PATCH 8/8] xfs: do not allocate the entire delalloc extent in xfs_bmapi_write Christoph Hellwig
@ 2024-04-09 23:16   ` Darrick J. Wong
  2024-04-10  4:07     ` Christoph Hellwig
  0 siblings, 1 reply; 21+ messages in thread
From: Darrick J. Wong @ 2024-04-09 23:16 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chandan Babu R, Dave Chinner, open list:XFS FILESYSTEM

On Mon, Apr 08, 2024 at 04:54:54PM +0200, Christoph Hellwig wrote:
> While trying to convert the entire delalloc extent is a good decision
> for regular writeback as it leads to larger contigous on-disk extents,
> but for other callers of xfs_bmapi_write is is rather questionable as
> it forced them to loop creating new transactions just in case there
> is no large enough contiguous extent to cover the whole delalloc
> reservation.
> 
> Change xfs_bmapi_write to only allocate the passed in range instead.

Looking at this... I guess xfs_map_blocks -> xfs_convert_blocks ->
xfs_bmapi_convert_delalloc -> xfs_bmapi_allocate is now how writeback
converts delalloc extents before scheduling writeout.  This is how the
mass-conversions of large da reservations got done before this series,
and that's still how it works, right?

Whereas xfs_bmapi_write is for targeted conversions only?

> Signed-off-by: Christoph Hellwig <hch@lst.de>

If yes and yes, then:
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/libxfs/xfs_bmap.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 7700a48e013d5a..748809b13113ab 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -4533,8 +4533,9 @@ xfs_bmapi_write(
>  			bma.length = XFS_FILBLKS_MIN(len, XFS_MAX_BMBT_EXTLEN);
>  
>  			if (wasdelay) {
> -				bma.offset = bma.got.br_startoff;
> -				bma.length = bma.got.br_blockcount;
> +				bma.length = XFS_FILBLKS_MIN(bma.length,
> +					bma.got.br_blockcount -
> +					(bno - bma.got.br_startoff));
>  			} else {
>  				if (!eof)
>  					bma.length = XFS_FILBLKS_MIN(bma.length,
> -- 
> 2.39.2
> 
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 7/8] xfs: fix xfs_bmap_add_extent_delay_real for partial conversions
  2024-04-08 14:54 ` [PATCH 7/8] xfs: fix xfs_bmap_add_extent_delay_real for partial conversions Christoph Hellwig
@ 2024-04-09 23:16   ` Darrick J. Wong
  2024-04-10  4:04     ` Christoph Hellwig
  0 siblings, 1 reply; 21+ messages in thread
From: Darrick J. Wong @ 2024-04-09 23:16 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chandan Babu R, Dave Chinner, open list:XFS FILESYSTEM

On Mon, Apr 08, 2024 at 04:54:53PM +0200, Christoph Hellwig wrote:
> xfs_bmap_add_extent_delay_real takes parts or all of a delalloc extent
> and converts them to a real extent.  It is written to deal with any
> potential overlap of the to be converted range with the delalloc extent,
> but it turns out that currently only converting the entire extents, or a
> part starting at the beginning is actually exercised, as the only caller
> always tries to convert the entire delalloc extent, and either succeeds
> or at least progresses partially from the start.
> 
> If it only converts a tiny part of a delalloc extent, the indirect block
> calculation for the new delalloc extent (da_new) might be equivalent to that
> of the existing delalloc extent (da_new).  If this extent conversion now

                                   da_old?

The code changes make sense to me otherwise.

--D

> requires allocating an indirect block that gets accounted into da_new,
> leading to the assert that da_new must be smaller or equal to da_new
> unless we split the extent to trigger.
> 
> Except for the assert that case is actually handled by just trying to
> allocate more space, as that already handled for the split case (which
> currently can't be reached at all), so just reusing it should be fine.
> Except that without dipping into the reserved block pool that would make
> it a bit too easy to trigger a fs shutdown due to ENOSPC.  So in addition
> to adjusting the assert, also dip into the reserved block pool.
> 
> Note that I could only reproduce the assert with a change to only convert
> the actually asked range instead of the full delalloc extent from
> xfs_bmapi_write.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/libxfs/xfs_bmap.c | 13 +++++++++----
>  1 file changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 08e13ebdee1b53..7700a48e013d5a 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -1586,6 +1586,7 @@ xfs_bmap_add_extent_delay_real(
>  			if (error)
>  				goto done;
>  		}
> +		ASSERT(da_new <= da_old);
>  		break;
>  
>  	case BMAP_LEFT_FILLING | BMAP_RIGHT_FILLING | BMAP_LEFT_CONTIG:
> @@ -1616,6 +1617,7 @@ xfs_bmap_add_extent_delay_real(
>  			if (error)
>  				goto done;
>  		}
> +		ASSERT(da_new <= da_old);
>  		break;
>  
>  	case BMAP_LEFT_FILLING | BMAP_RIGHT_FILLING | BMAP_RIGHT_CONTIG:
> @@ -1650,6 +1652,7 @@ xfs_bmap_add_extent_delay_real(
>  			if (error)
>  				goto done;
>  		}
> +		ASSERT(da_new <= da_old);
>  		break;
>  
>  	case BMAP_LEFT_FILLING | BMAP_RIGHT_FILLING:
> @@ -1684,6 +1687,7 @@ xfs_bmap_add_extent_delay_real(
>  				goto done;
>  			}
>  		}
> +		ASSERT(da_new <= da_old);
>  		break;
>  
>  	case BMAP_LEFT_FILLING | BMAP_LEFT_CONTIG:
> @@ -1722,6 +1726,7 @@ xfs_bmap_add_extent_delay_real(
>  			if (error)
>  				goto done;
>  		}
> +		ASSERT(da_new <= da_old);
>  		break;
>  
>  	case BMAP_LEFT_FILLING:
> @@ -1812,6 +1817,7 @@ xfs_bmap_add_extent_delay_real(
>  		xfs_iext_update_extent(bma->ip, state, &bma->icur, &PREV);
>  		xfs_iext_next(ifp, &bma->icur);
>  		xfs_iext_update_extent(bma->ip, state, &bma->icur, &RIGHT);
> +		ASSERT(da_new <= da_old);
>  		break;
>  
>  	case BMAP_RIGHT_FILLING:
> @@ -1861,6 +1867,7 @@ xfs_bmap_add_extent_delay_real(
>  		PREV.br_blockcount = temp;
>  		xfs_iext_insert(bma->ip, &bma->icur, &PREV, state);
>  		xfs_iext_next(ifp, &bma->icur);
> +		ASSERT(da_new <= da_old);
>  		break;
>  
>  	case 0:
> @@ -1983,11 +1990,9 @@ xfs_bmap_add_extent_delay_real(
>  	}
>  
>  	/* adjust for changes in reserved delayed indirect blocks */
> -	if (da_new != da_old) {
> -		ASSERT(state == 0 || da_new < da_old);
> +	if (da_new != da_old)
>  		error = xfs_mod_fdblocks(mp, (int64_t)(da_old - da_new),
> -				false);
> -	}
> +				da_new > da_old);
>  
>  	xfs_bmap_check_leaf_extents(bma->cur, bma->ip, whichfork);
>  done:
> -- 
> 2.39.2
> 
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 6/8] xfs: remove the xfs_iext_peek_prev_extent call in xfs_bmapi_allocate
  2024-04-08 14:54 ` [PATCH 6/8] xfs: remove the xfs_iext_peek_prev_extent call in xfs_bmapi_allocate Christoph Hellwig
@ 2024-04-09 23:16   ` Darrick J. Wong
  0 siblings, 0 replies; 21+ messages in thread
From: Darrick J. Wong @ 2024-04-09 23:16 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chandan Babu R, Dave Chinner, open list:XFS FILESYSTEM

On Mon, Apr 08, 2024 at 04:54:52PM +0200, Christoph Hellwig wrote:
> Both callers of xfs_bmapi_allocate already initialize bma->prev, don't
> redo that in xfs_bmapi_allocate.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks ok,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/libxfs/xfs_bmap.c | 5 -----
>  1 file changed, 5 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index aa182937de4641..08e13ebdee1b53 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -4196,11 +4196,6 @@ xfs_bmapi_allocate(
>  	ASSERT(bma->length > 0);
>  	ASSERT(bma->length <= XFS_MAX_BMBT_EXTLEN);
>  
> -	if (bma->wasdel) {
> -		if (!xfs_iext_peek_prev_extent(ifp, &bma->icur, &bma->prev))
> -			bma->prev.br_startoff = NULLFILEOFF;
> -	}
> -
>  	if (bma->flags & XFS_BMAPI_CONTIG)
>  		bma->minlen = bma->length;
>  	else
> -- 
> 2.39.2
> 
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/8] xfs: don't open code XFS_FILBLKS_MIN in xfs_bmapi_write
  2024-04-08 14:54 ` [PATCH 4/8] xfs: don't open code XFS_FILBLKS_MIN in xfs_bmapi_write Christoph Hellwig
@ 2024-04-09 23:17   ` Darrick J. Wong
  0 siblings, 0 replies; 21+ messages in thread
From: Darrick J. Wong @ 2024-04-09 23:17 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chandan Babu R, Dave Chinner, open list:XFS FILESYSTEM

On Mon, Apr 08, 2024 at 04:54:50PM +0200, Christoph Hellwig wrote:
> XFS_FILBLKS_MIN uses min_t and thus does the comparison using the correct
> xfs_filblks_t type.  Use it in xfs_bmapi_write and slightly adjust the
> comment document th potential pitfall to take account of this
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Pretty straightforward conversion,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/libxfs/xfs_bmap.c | 9 +++------
>  1 file changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 3b5816de4af2a1..f2e934c2fb423c 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -4537,14 +4537,11 @@ xfs_bmapi_write(
>  			 * allocation length request (which can be 64 bits in
>  			 * length) and the bma length request, which is
>  			 * xfs_extlen_t and therefore 32 bits. Hence we have to
> -			 * check for 32-bit overflows and handle them here.
> +			 * be careful and do the min() using the larger type to
> +			 * avoid overflows.
>  			 */
> -			if (len > (xfs_filblks_t)XFS_MAX_BMBT_EXTLEN)
> -				bma.length = XFS_MAX_BMBT_EXTLEN;
> -			else
> -				bma.length = len;
> +			bma.length = XFS_FILBLKS_MIN(len, XFS_MAX_BMBT_EXTLEN);
>  
> -			ASSERT(len > 0);
>  			ASSERT(bma.length > 0);
>  			error = xfs_bmapi_allocate(&bma);
>  			if (error) {
> -- 
> 2.39.2
> 
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/8] xfs: lifr a xfs_valid_startblock into xfs_bmapi_allocate
  2024-04-08 14:54 ` [PATCH 3/8] xfs: lifr a xfs_valid_startblock into xfs_bmapi_allocate Christoph Hellwig
@ 2024-04-09 23:17   ` Darrick J. Wong
  0 siblings, 0 replies; 21+ messages in thread
From: Darrick J. Wong @ 2024-04-09 23:17 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chandan Babu R, Dave Chinner, open list:XFS FILESYSTEM

On Mon, Apr 08, 2024 at 04:54:49PM +0200, Christoph Hellwig wrote:
> Subject: [PATCH 3/8] xfs: lifr a xfs_valid_startblock into xfs_bmapi_allocate

                            lift

> xfs_bmapi_convert_delalloc has a xfs_valid_startblock check on the block
> allocated by xfs_bmapi_allocate.  Lift it into xfs_bmapi_allocate as
> we should assert the same for xfs_bmapi_write.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

With the typo fixed,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/libxfs/xfs_bmap.c | 11 +++++------
>  1 file changed, 5 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index a847159302703a..3b5816de4af2a1 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -4230,6 +4230,11 @@ xfs_bmapi_allocate(
>  	if (bma->blkno == NULLFSBLOCK)
>  		return -ENOSPC;
>  
> +	if (WARN_ON_ONCE(!xfs_valid_startblock(bma->ip, bma->blkno))) {
> +		xfs_bmap_mark_sick(bma->ip, whichfork);
> +		return -EFSCORRUPTED;
> +	}
> +
>  	if (bma->flags & XFS_BMAPI_ZERO) {
>  		error = xfs_zero_extent(bma->ip, bma->blkno, bma->length);
>  		if (error)
> @@ -4721,12 +4726,6 @@ xfs_bmapi_convert_delalloc(
>  	if (error)
>  		goto out_finish;
>  
> -	if (WARN_ON_ONCE(!xfs_valid_startblock(ip, bma.got.br_startblock))) {
> -		xfs_bmap_mark_sick(ip, whichfork);
> -		error = -EFSCORRUPTED;
> -		goto out_finish;
> -	}
> -
>  	XFS_STATS_ADD(mp, xs_xstrat_bytes, XFS_FSB_TO_B(mp, bma.length));
>  	XFS_STATS_INC(mp, xs_xstrat_quick);
>  
> -- 
> 2.39.2
> 
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/8] xfs: remove the unusued tmp_logflags variable in xfs_bmapi_allocate
  2024-04-08 14:54 ` [PATCH 2/8] xfs: remove the unusued tmp_logflags variable in xfs_bmapi_allocate Christoph Hellwig
@ 2024-04-09 23:17   ` Darrick J. Wong
  0 siblings, 0 replies; 21+ messages in thread
From: Darrick J. Wong @ 2024-04-09 23:17 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chandan Babu R, Dave Chinner, open list:XFS FILESYSTEM

On Mon, Apr 08, 2024 at 04:54:48PM +0200, Christoph Hellwig wrote:
> tmp_logflags is initialized to 0 and then ORed into bma->logflags, which
> isn't actually doing anything.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks good,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/libxfs/xfs_bmap.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 14196d918865ba..a847159302703a 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -4191,7 +4191,6 @@ xfs_bmapi_allocate(
>  	struct xfs_mount	*mp = bma->ip->i_mount;
>  	int			whichfork = xfs_bmapi_whichfork(bma->flags);
>  	struct xfs_ifork	*ifp = xfs_ifork_ptr(bma->ip, whichfork);
> -	int			tmp_logflags = 0;
>  	int			error;
>  
>  	ASSERT(bma->length > 0);
> @@ -4262,8 +4261,6 @@ xfs_bmapi_allocate(
>  		error = xfs_bmap_add_extent_hole_real(bma->tp, bma->ip,
>  				whichfork, &bma->icur, &bma->cur, &bma->got,
>  				&bma->logflags, bma->flags);
> -
> -	bma->logflags |= tmp_logflags;
>  	if (error)
>  		return error;
>  
> -- 
> 2.39.2
> 
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/8] xfs: fix error returns from xfs_bmapi_write
  2024-04-08 14:54 ` [PATCH 1/8] xfs: fix error returns from xfs_bmapi_write Christoph Hellwig
@ 2024-04-09 23:19   ` Darrick J. Wong
  2024-04-10  4:02     ` Christoph Hellwig
  0 siblings, 1 reply; 21+ messages in thread
From: Darrick J. Wong @ 2024-04-09 23:19 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chandan Babu R, Dave Chinner, open list:XFS FILESYSTEM

On Mon, Apr 08, 2024 at 04:54:47PM +0200, Christoph Hellwig wrote:
> xfs_bmapi_write can return 0 without actually returning a mapping in
> mval in two different cases:
> 
>  1) when there is absolutely no space available to do an allocation
>  2) when converting delalloc space, and the allocation is so small
>     that it only covers parts of the delalloc extent before the
>     range requested by the caller
> 
> Callers at best can handle one of these cases, but in many cases can't
> cope with either one.  Switch xfs_bmapi_write to always return a
> mapping or return an error code instead.  For case 1) above ENOSPC is
> the obvious choice which is very much what the callers expect anyway.
> For case 2) there is no really good error code, so pick a funky one
> from the SysV streams portfolio.
> 
> This fixes the reproducer here:
> 
>     https://lore.kernel.org/linux-xfs/CAEJPjCvT3Uag-pMTYuigEjWZHn1sGMZ0GCjVVCv29tNHK76Cgg@mail.gmail.com0/
> 
> which uses reserved blocks to create file systems that are gravely
> out of space and thus cause at least xfs_file_alloc_space to hang
> and trigger the lack of ENOSPC handling in xfs_dquot_disk_alloc.
> 
> Note that this patch does not actually make any caller but
> xfs_alloc_file_space deal intelligently with case 2) above.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Hmm.  So I think the answer to my questions earlier is that with the
series applied, ENOSR should never leak out because bmapi_write calls
will always do the piece that the caller asked for, right?

If yes, then
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/libxfs/xfs_attr_remote.c |  1 -
>  fs/xfs/libxfs/xfs_bmap.c        | 46 ++++++++++++++++++++++++++-------
>  fs/xfs/libxfs/xfs_da_btree.c    | 20 ++++----------
>  fs/xfs/scrub/quota_repair.c     |  6 -----
>  fs/xfs/scrub/rtbitmap_repair.c  |  2 --
>  fs/xfs/xfs_bmap_util.c          | 31 +++++++++++-----------
>  fs/xfs/xfs_dquot.c              |  1 -
>  fs/xfs/xfs_iomap.c              |  8 ------
>  fs/xfs/xfs_reflink.c            | 14 ----------
>  fs/xfs/xfs_rtalloc.c            |  2 --
>  10 files changed, 57 insertions(+), 74 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c
> index ff04128287720a..41a02dcc2541b0 100644
> --- a/fs/xfs/libxfs/xfs_attr_remote.c
> +++ b/fs/xfs/libxfs/xfs_attr_remote.c
> @@ -626,7 +626,6 @@ xfs_attr_rmtval_set_blk(
>  	if (error)
>  		return error;
>  
> -	ASSERT(nmap == 1);
>  	ASSERT((map->br_startblock != DELAYSTARTBLOCK) &&
>  	       (map->br_startblock != HOLESTARTBLOCK));
>  
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 656c95a22f2e6d..14196d918865ba 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -4226,8 +4226,10 @@ xfs_bmapi_allocate(
>  	} else {
>  		error = xfs_bmap_alloc_userdata(bma);
>  	}
> -	if (error || bma->blkno == NULLFSBLOCK)
> +	if (error)
>  		return error;
> +	if (bma->blkno == NULLFSBLOCK)
> +		return -ENOSPC;
>  
>  	if (bma->flags & XFS_BMAPI_ZERO) {
>  		error = xfs_zero_extent(bma->ip, bma->blkno, bma->length);
> @@ -4406,6 +4408,15 @@ xfs_bmapi_finish(
>   * extent state if necessary.  Details behaviour is controlled by the flags
>   * parameter.  Only allocates blocks from a single allocation group, to avoid
>   * locking problems.
> + *
> + * Returns 0 on success and places the extent mappings in mval.  nmaps is used
> + * as an input/output parameter where the caller specifies the maximum number
> + * of mappings that may be returned and xfs_bmapi_write passes back the number
> + * of mappings (including existing mappings) it found.
> + *
> + * Returns a negative error code on failure, including -ENOSPC when it could not
> + * allocate any blocks and -ENOSR when it did allocate blocks to convert a
> + * delalloc range, but those blocks were before the passed in range.
>   */
>  int
>  xfs_bmapi_write(
> @@ -4534,10 +4545,16 @@ xfs_bmapi_write(
>  			ASSERT(len > 0);
>  			ASSERT(bma.length > 0);
>  			error = xfs_bmapi_allocate(&bma);
> -			if (error)
> +			if (error) {
> +				/*
> +				 * If we already allocated space in a previous
> +				 * iteration return what we go so far when
> +				 * running out of space.
> +				 */
> +				if (error == -ENOSPC && bma.nallocs)
> +					break;
>  				goto error0;
> -			if (bma.blkno == NULLFSBLOCK)
> -				break;
> +			}
>  
>  			/*
>  			 * If this is a CoW allocation, record the data in
> @@ -4575,7 +4592,6 @@ xfs_bmapi_write(
>  		if (!xfs_iext_next_extent(ifp, &bma.icur, &bma.got))
>  			eof = true;
>  	}
> -	*nmap = n;
>  
>  	error = xfs_bmap_btree_to_extents(tp, ip, bma.cur, &bma.logflags,
>  			whichfork);
> @@ -4586,7 +4602,22 @@ xfs_bmapi_write(
>  	       ifp->if_nextents > XFS_IFORK_MAXEXT(ip, whichfork));
>  	xfs_bmapi_finish(&bma, whichfork, 0);
>  	xfs_bmap_validate_ret(orig_bno, orig_len, orig_flags, orig_mval,
> -		orig_nmap, *nmap);
> +		orig_nmap, n);
> +
> +	/*
> +	 * When converting delayed allocations, xfs_bmapi_allocate ignores
> +	 * the passed in bno and always converts from the start of the found
> +	 * delalloc extent.
> +	 *
> +	 * To avoid a successful return with *nmap set to 0, return the magic
> +	 * -ENOSR error code for this particular case so that the caller can
> +	 * handle it.
> +	 */
> +	if (!n) {
> +		ASSERT(bma.nallocs >= *nmap);
> +		return -ENOSR;
> +	}
> +	*nmap = n;
>  	return 0;
>  error0:
>  	xfs_bmapi_finish(&bma, whichfork, error);
> @@ -4693,9 +4724,6 @@ xfs_bmapi_convert_delalloc(
>  	if (error)
>  		goto out_finish;
>  
> -	error = -ENOSPC;
> -	if (WARN_ON_ONCE(bma.blkno == NULLFSBLOCK))
> -		goto out_finish;
>  	if (WARN_ON_ONCE(!xfs_valid_startblock(ip, bma.got.br_startblock))) {
>  		xfs_bmap_mark_sick(ip, whichfork);
>  		error = -EFSCORRUPTED;
> diff --git a/fs/xfs/libxfs/xfs_da_btree.c b/fs/xfs/libxfs/xfs_da_btree.c
> index 718d071bb21ae3..276c710548b5a7 100644
> --- a/fs/xfs/libxfs/xfs_da_btree.c
> +++ b/fs/xfs/libxfs/xfs_da_btree.c
> @@ -2167,8 +2167,8 @@ xfs_da_grow_inode_int(
>  	struct xfs_inode	*dp = args->dp;
>  	int			w = args->whichfork;
>  	xfs_rfsblock_t		nblks = dp->i_nblocks;
> -	struct xfs_bmbt_irec	map, *mapp;
> -	int			nmap, error, got, i, mapi;
> +	struct xfs_bmbt_irec	map, *mapp = &map;
> +	int			nmap, error, got, i, mapi = 1;
>  
>  	/*
>  	 * Find a spot in the file space to put the new block.
> @@ -2184,14 +2184,7 @@ xfs_da_grow_inode_int(
>  	error = xfs_bmapi_write(tp, dp, *bno, count,
>  			xfs_bmapi_aflag(w)|XFS_BMAPI_METADATA|XFS_BMAPI_CONTIG,
>  			args->total, &map, &nmap);
> -	if (error)
> -		return error;
> -
> -	ASSERT(nmap <= 1);
> -	if (nmap == 1) {
> -		mapp = &map;
> -		mapi = 1;
> -	} else if (nmap == 0 && count > 1) {
> +	if (error == -ENOSPC && count > 1) {
>  		xfs_fileoff_t		b;
>  		int			c;
>  
> @@ -2209,16 +2202,13 @@ xfs_da_grow_inode_int(
>  					args->total, &mapp[mapi], &nmap);
>  			if (error)
>  				goto out_free_map;
> -			if (nmap < 1)
> -				break;
>  			mapi += nmap;
>  			b = mapp[mapi - 1].br_startoff +
>  			    mapp[mapi - 1].br_blockcount;
>  		}
> -	} else {
> -		mapi = 0;
> -		mapp = NULL;
>  	}
> +	if (error)
> +		goto out_free_map;
>  
>  	/*
>  	 * Count the blocks we got, make sure it matches the total.
> diff --git a/fs/xfs/scrub/quota_repair.c b/fs/xfs/scrub/quota_repair.c
> index 0bab4c30cb85ab..90cd1512bba961 100644
> --- a/fs/xfs/scrub/quota_repair.c
> +++ b/fs/xfs/scrub/quota_repair.c
> @@ -77,8 +77,6 @@ xrep_quota_item_fill_bmap_hole(
>  			irec, &nmaps);
>  	if (error)
>  		return error;
> -	if (nmaps != 1)
> -		return -ENOSPC;
>  
>  	dq->q_blkno = XFS_FSB_TO_DADDR(mp, irec->br_startblock);
>  
> @@ -444,10 +442,6 @@ xrep_quota_data_fork(
>  					XFS_BMAPI_CONVERT, 0, &nrec, &nmap);
>  			if (error)
>  				goto out;
> -			if (nmap != 1) {
> -				error = -ENOSPC;
> -				goto out;
> -			}
>  			ASSERT(nrec.br_startoff == irec.br_startoff);
>  			ASSERT(nrec.br_blockcount == irec.br_blockcount);
>  
> diff --git a/fs/xfs/scrub/rtbitmap_repair.c b/fs/xfs/scrub/rtbitmap_repair.c
> index 46f5d5f605c915..0fef98e9f83409 100644
> --- a/fs/xfs/scrub/rtbitmap_repair.c
> +++ b/fs/xfs/scrub/rtbitmap_repair.c
> @@ -108,8 +108,6 @@ xrep_rtbitmap_data_mappings(
>  				0, &map, &nmaps);
>  		if (error)
>  			return error;
> -		if (nmaps != 1)
> -			return -EFSCORRUPTED;
>  
>  		/* Commit new extent and all deferred work. */
>  		error = xrep_defer_finish(sc);
> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> index 19e11d1da66074..fbca94170cd386 100644
> --- a/fs/xfs/xfs_bmap_util.c
> +++ b/fs/xfs/xfs_bmap_util.c
> @@ -721,33 +721,32 @@ xfs_alloc_file_space(
>  		if (error)
>  			goto error;
>  
> -		error = xfs_bmapi_write(tp, ip, startoffset_fsb,
> -				allocatesize_fsb, XFS_BMAPI_PREALLOC, 0, imapp,
> -				&nimaps);
> -		if (error)
> -			goto error;
> -
> -		ip->i_diflags |= XFS_DIFLAG_PREALLOC;
> -		xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
> -
> -		error = xfs_trans_commit(tp);
> -		xfs_iunlock(ip, XFS_ILOCK_EXCL);
> -		if (error)
> -			break;
> -
>  		/*
>  		 * If the allocator cannot find a single free extent large
>  		 * enough to cover the start block of the requested range,
> -		 * xfs_bmapi_write will return 0 but leave *nimaps set to 0.
> +		 * xfs_bmapi_write will return -ENOSR.
>  		 *
>  		 * In that case we simply need to keep looping with the same
>  		 * startoffset_fsb so that one of the following allocations
>  		 * will eventually reach the requested range.
>  		 */
> -		if (nimaps) {
> +		error = xfs_bmapi_write(tp, ip, startoffset_fsb,
> +				allocatesize_fsb, XFS_BMAPI_PREALLOC, 0, imapp,
> +				&nimaps);
> +		if (error) {
> +			if (error != -ENOSR)
> +				goto error;
> +			error = 0;
> +		} else {
>  			startoffset_fsb += imapp->br_blockcount;
>  			allocatesize_fsb -= imapp->br_blockcount;
>  		}
> +
> +		ip->i_diflags |= XFS_DIFLAG_PREALLOC;
> +		xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
> +
> +		error = xfs_trans_commit(tp);
> +		xfs_iunlock(ip, XFS_ILOCK_EXCL);
>  	}
>  
>  	return error;
> diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
> index c98cb468c35780..0c9eb8fdeec082 100644
> --- a/fs/xfs/xfs_dquot.c
> +++ b/fs/xfs/xfs_dquot.c
> @@ -357,7 +357,6 @@ xfs_dquot_disk_alloc(
>  		goto err_cancel;
>  
>  	ASSERT(map.br_blockcount == XFS_DQUOT_CLUSTER_SIZE_FSB);
> -	ASSERT(nmaps == 1);
>  	ASSERT((map.br_startblock != DELAYSTARTBLOCK) &&
>  	       (map.br_startblock != HOLESTARTBLOCK));
>  
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 4087af7f3c9f3f..42155cedefb77d 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -321,14 +321,6 @@ xfs_iomap_write_direct(
>  	if (error)
>  		goto out_unlock;
>  
> -	/*
> -	 * Copy any maps to caller's array and return any error.
> -	 */
> -	if (nimaps == 0) {
> -		error = -ENOSPC;
> -		goto out_unlock;
> -	}
> -
>  	if (unlikely(!xfs_valid_startblock(ip, imap->br_startblock))) {
>  		xfs_bmap_mark_sick(ip, XFS_DATA_FORK);
>  		error = xfs_alert_fsblock_zero(ip, imap);
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index 7da0e8f961d351..5ecb52a234becc 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -430,13 +430,6 @@ xfs_reflink_fill_cow_hole(
>  	if (error)
>  		return error;
>  
> -	/*
> -	 * Allocation succeeded but the requested range was not even partially
> -	 * satisfied?  Bail out!
> -	 */
> -	if (nimaps == 0)
> -		return -ENOSPC;
> -
>  convert:
>  	return xfs_reflink_convert_unwritten(ip, imap, cmap, convert_now);
>  
> @@ -499,13 +492,6 @@ xfs_reflink_fill_delalloc(
>  		error = xfs_trans_commit(tp);
>  		if (error)
>  			return error;
> -
> -		/*
> -		 * Allocation succeeded but the requested range was not even
> -		 * partially satisfied?  Bail out!
> -		 */
> -		if (nimaps == 0)
> -			return -ENOSPC;
>  	} while (cmap->br_startoff + cmap->br_blockcount <= imap->br_startoff);
>  
>  	return xfs_reflink_convert_unwritten(ip, imap, cmap, convert_now);
> diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
> index e66f9bd5de5cff..46ee8093f797a6 100644
> --- a/fs/xfs/xfs_rtalloc.c
> +++ b/fs/xfs/xfs_rtalloc.c
> @@ -709,8 +709,6 @@ xfs_growfs_rt_alloc(
>  		nmap = 1;
>  		error = xfs_bmapi_write(tp, ip, oblocks, nblocks - oblocks,
>  					XFS_BMAPI_METADATA, 0, &map, &nmap);
> -		if (!error && nmap < 1)
> -			error = -ENOSPC;
>  		if (error)
>  			goto out_trans_cancel;
>  		/*
> -- 
> 2.39.2
> 
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 5/8] xfs: pass the actual offset and len to allocate to xfs_bmapi_allocate
  2024-04-08 14:54 ` [PATCH 5/8] xfs: pass the actual offset and len to allocate to xfs_bmapi_allocate Christoph Hellwig
@ 2024-04-09 23:20   ` Darrick J. Wong
  2024-04-10  4:03     ` Christoph Hellwig
  0 siblings, 1 reply; 21+ messages in thread
From: Darrick J. Wong @ 2024-04-09 23:20 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chandan Babu R, Dave Chinner, open list:XFS FILESYSTEM

On Mon, Apr 08, 2024 at 04:54:51PM +0200, Christoph Hellwig wrote:
> xfs_bmapi_allocate currently overwrites offset and len when converting
> delayed allocations, and duplicates the length cap done for non-delalloc
> allocations.  Move all that logic into the callers to avoid duplication
> and to make the calling conventions more obvious.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/libxfs/xfs_bmap.c | 32 ++++++++++++++++++--------------
>  1 file changed, 18 insertions(+), 14 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index f2e934c2fb423c..aa182937de4641 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -4194,21 +4194,11 @@ xfs_bmapi_allocate(
>  	int			error;
>  
>  	ASSERT(bma->length > 0);
> +	ASSERT(bma->length <= XFS_MAX_BMBT_EXTLEN);
>  
> -	/*
> -	 * For the wasdelay case, we could also just allocate the stuff asked
> -	 * for in this bmap call but that wouldn't be as good.
> -	 */
>  	if (bma->wasdel) {
> -		bma->length = (xfs_extlen_t)bma->got.br_blockcount;
> -		bma->offset = bma->got.br_startoff;
>  		if (!xfs_iext_peek_prev_extent(ifp, &bma->icur, &bma->prev))
>  			bma->prev.br_startoff = NULLFILEOFF;
> -	} else {
> -		bma->length = XFS_FILBLKS_MIN(bma->length, XFS_MAX_BMBT_EXTLEN);
> -		if (!bma->eof)
> -			bma->length = XFS_FILBLKS_MIN(bma->length,
> -					bma->got.br_startoff - bma->offset);
>  	}
>  
>  	if (bma->flags & XFS_BMAPI_CONTIG)
> @@ -4542,6 +4532,15 @@ xfs_bmapi_write(
>  			 */
>  			bma.length = XFS_FILBLKS_MIN(len, XFS_MAX_BMBT_EXTLEN);
>  
> +			if (wasdelay) {
> +				bma.offset = bma.got.br_startoff;
> +				bma.length = bma.got.br_blockcount;

This read funny since we'd previously set bma.{offset,length} above, but
I guess that preserves the "convert all the delalloc" behavior; and you
turn it off in patch 8, right?

--D

> +			} else {
> +				if (!eof)
> +					bma.length = XFS_FILBLKS_MIN(bma.length,
> +						bma.got.br_startoff - bno);
> +			}
> +
>  			ASSERT(bma.length > 0);
>  			error = xfs_bmapi_allocate(&bma);
>  			if (error) {
> @@ -4694,11 +4693,16 @@ xfs_bmapi_convert_delalloc(
>  	bma.tp = tp;
>  	bma.ip = ip;
>  	bma.wasdel = true;
> -	bma.offset = bma.got.br_startoff;
> -	bma.length = max_t(xfs_filblks_t, bma.got.br_blockcount,
> -			XFS_MAX_BMBT_EXTLEN);
>  	bma.minleft = xfs_bmapi_minleft(tp, ip, whichfork);
>  
> +	/*
> +	 * Always allocate convert from the start of the delalloc extent even if
> +	 * that is outside the passed in range to create large contiguous
> +	 * extents on disk.
> +	 */
> +	bma.offset = bma.got.br_startoff;
> +	bma.length = bma.got.br_blockcount;
> +
>  	/*
>  	 * When we're converting the delalloc reservations backing dirty pages
>  	 * in the page cache, we must be careful about how we create the new
> -- 
> 2.39.2
> 
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/8] xfs: fix error returns from xfs_bmapi_write
  2024-04-09 23:19   ` Darrick J. Wong
@ 2024-04-10  4:02     ` Christoph Hellwig
  0 siblings, 0 replies; 21+ messages in thread
From: Christoph Hellwig @ 2024-04-10  4:02 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Chandan Babu R, Dave Chinner,
	open list:XFS FILESYSTEM

On Tue, Apr 09, 2024 at 04:19:17PM -0700, Darrick J. Wong wrote:
> Hmm.  So I think the answer to my questions earlier is that with the
> series applied, ENOSR should never leak out because bmapi_write calls
> will always do the piece that the caller asked for, right?

Almost.  With the whole series ENOSR goes away again (but then again
I'm a bit skeptical if it opens new unknown cans of worms, see the
cover letter).

With just this patch it will leak to user space if we ever hit a case
where we completely mishandled this error before.  Which we've not
observed, but which isn't entirely impossible.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 5/8] xfs: pass the actual offset and len to allocate to xfs_bmapi_allocate
  2024-04-09 23:20   ` Darrick J. Wong
@ 2024-04-10  4:03     ` Christoph Hellwig
  0 siblings, 0 replies; 21+ messages in thread
From: Christoph Hellwig @ 2024-04-10  4:03 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Chandan Babu R, Dave Chinner,
	open list:XFS FILESYSTEM

On Tue, Apr 09, 2024 at 04:20:44PM -0700, Darrick J. Wong wrote:
> > @@ -4542,6 +4532,15 @@ xfs_bmapi_write(
> >  			 */
> >  			bma.length = XFS_FILBLKS_MIN(len, XFS_MAX_BMBT_EXTLEN);
> >  
> > +			if (wasdelay) {
> > +				bma.offset = bma.got.br_startoff;
> > +				bma.length = bma.got.br_blockcount;
> 
> This read funny since we'd previously set bma.{offset,length} above, but
> I guess that preserves the "convert all the delalloc" behavior; and you
> turn it off in patch 8, right?

Yes.  That being said I should just move the assignments into the other
branch here to make things more obvious.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 7/8] xfs: fix xfs_bmap_add_extent_delay_real for partial conversions
  2024-04-09 23:16   ` Darrick J. Wong
@ 2024-04-10  4:04     ` Christoph Hellwig
  0 siblings, 0 replies; 21+ messages in thread
From: Christoph Hellwig @ 2024-04-10  4:04 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Chandan Babu R, Dave Chinner,
	open list:XFS FILESYSTEM

On Tue, Apr 09, 2024 at 04:16:45PM -0700, Darrick J. Wong wrote:
> On Mon, Apr 08, 2024 at 04:54:53PM +0200, Christoph Hellwig wrote:
> > xfs_bmap_add_extent_delay_real takes parts or all of a delalloc extent
> > and converts them to a real extent.  It is written to deal with any
> > potential overlap of the to be converted range with the delalloc extent,
> > but it turns out that currently only converting the entire extents, or a
> > part starting at the beginning is actually exercised, as the only caller
> > always tries to convert the entire delalloc extent, and either succeeds
> > or at least progresses partially from the start.
> > 
> > If it only converts a tiny part of a delalloc extent, the indirect block
> > calculation for the new delalloc extent (da_new) might be equivalent to that
> > of the existing delalloc extent (da_new).  If this extent conversion now
> 
>                                    da_old?

Yes.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 8/8] xfs: do not allocate the entire delalloc extent in xfs_bmapi_write
  2024-04-09 23:16   ` Darrick J. Wong
@ 2024-04-10  4:07     ` Christoph Hellwig
  0 siblings, 0 replies; 21+ messages in thread
From: Christoph Hellwig @ 2024-04-10  4:07 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Chandan Babu R, Dave Chinner,
	open list:XFS FILESYSTEM

On Tue, Apr 09, 2024 at 04:16:01PM -0700, Darrick J. Wong wrote:
> On Mon, Apr 08, 2024 at 04:54:54PM +0200, Christoph Hellwig wrote:
> > While trying to convert the entire delalloc extent is a good decision
> > for regular writeback as it leads to larger contigous on-disk extents,
> > but for other callers of xfs_bmapi_write is is rather questionable as
> > it forced them to loop creating new transactions just in case there
> > is no large enough contiguous extent to cover the whole delalloc
> > reservation.
> > 
> > Change xfs_bmapi_write to only allocate the passed in range instead.
> 
> Looking at this... I guess xfs_map_blocks -> xfs_convert_blocks ->
> xfs_bmapi_convert_delalloc -> xfs_bmapi_allocate is now how writeback
> converts delalloc extents before scheduling writeout.  This is how the
> mass-conversions of large da reservations got done before this series,
> and that's still how it works, right?

Yes and yes.

> Whereas xfs_bmapi_write is for targeted conversions only?

targeted is one way to describe it, the other way to look at it is
that xfs_bmapi_write is used where the callers want to allocate
real (or unwritten) extents for a range, which just happens to
convert delalloc as a side-effect as those callers don't want to
deal with delalloc extents.

> 
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> 
> If yes and yes, then:
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>

So as mentioned in the cover letter I'm quite worried about the
new behavior we expose here as we always converted delalloc
extents from the start and tried to convert it to the end,
and this now changes that.  So while the changes looks quite
simple they expose a lot of previously untested code and behavior.

It's probably the right thing to do but quite risky, let me know
what you think about it.


^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2024-04-10  4:07 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-04-08 14:54 RFC: extended version of the xfs_bmapi_write retval fix Christoph Hellwig
2024-04-08 14:54 ` [PATCH 1/8] xfs: fix error returns from xfs_bmapi_write Christoph Hellwig
2024-04-09 23:19   ` Darrick J. Wong
2024-04-10  4:02     ` Christoph Hellwig
2024-04-08 14:54 ` [PATCH 2/8] xfs: remove the unusued tmp_logflags variable in xfs_bmapi_allocate Christoph Hellwig
2024-04-09 23:17   ` Darrick J. Wong
2024-04-08 14:54 ` [PATCH 3/8] xfs: lifr a xfs_valid_startblock into xfs_bmapi_allocate Christoph Hellwig
2024-04-09 23:17   ` Darrick J. Wong
2024-04-08 14:54 ` [PATCH 4/8] xfs: don't open code XFS_FILBLKS_MIN in xfs_bmapi_write Christoph Hellwig
2024-04-09 23:17   ` Darrick J. Wong
2024-04-08 14:54 ` [PATCH 5/8] xfs: pass the actual offset and len to allocate to xfs_bmapi_allocate Christoph Hellwig
2024-04-09 23:20   ` Darrick J. Wong
2024-04-10  4:03     ` Christoph Hellwig
2024-04-08 14:54 ` [PATCH 6/8] xfs: remove the xfs_iext_peek_prev_extent call in xfs_bmapi_allocate Christoph Hellwig
2024-04-09 23:16   ` Darrick J. Wong
2024-04-08 14:54 ` [PATCH 7/8] xfs: fix xfs_bmap_add_extent_delay_real for partial conversions Christoph Hellwig
2024-04-09 23:16   ` Darrick J. Wong
2024-04-10  4:04     ` Christoph Hellwig
2024-04-08 14:54 ` [PATCH 8/8] xfs: do not allocate the entire delalloc extent in xfs_bmapi_write Christoph Hellwig
2024-04-09 23:16   ` Darrick J. Wong
2024-04-10  4:07     ` Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox