linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 5.15 0/9] xfs stable candidate patches for 5.15.y (part 4)
@ 2022-08-19 18:14 Leah Rumancik
  2022-08-19 18:14 ` [PATCH 5.15 1/9] xfs: flush inodegc workqueue tasks before cancel Leah Rumancik
                   ` (9 more replies)
  0 siblings, 10 replies; 11+ messages in thread
From: Leah Rumancik @ 2022-08-19 18:14 UTC (permalink / raw)
  To: stable; +Cc: linux-xfs, amir73il, Leah Rumancik

Hello,

Here's another round of xfs backports for 5.15.y that have been
through testing and ACK'd.

Thanks,
Leah

Brian Foster (2):
  xfs: flush inodegc workqueue tasks before cancel
  xfs: fix soft lockup via spinning in filestream ag selection loop

Darrick J. Wong (6):
  xfs: reserve quota for dir expansion when linking/unlinking files
  xfs: reserve quota for target dir expansion when renaming files
  xfs: remove infinite loop when reserving free block pool
  xfs: always succeed at setting the reserve pool size
  xfs: fix overfilling of reserve pool
  xfs: reject crazy array sizes being fed to XFS_IOC_GETBMAP*

Eric Sandeen (1):
  xfs: revert "xfs: actually bump warning counts when we send warnings"

 fs/xfs/xfs_filestream.c  |  7 ++--
 fs/xfs/xfs_fsops.c       | 50 ++++++++++-------------
 fs/xfs/xfs_icache.c      | 22 ++--------
 fs/xfs/xfs_inode.c       | 79 ++++++++++++++++++++++--------------
 fs/xfs/xfs_ioctl.c       |  2 +-
 fs/xfs/xfs_trans.c       | 86 ++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_trans.h       |  3 ++
 fs/xfs/xfs_trans_dquot.c |  1 -
 8 files changed, 167 insertions(+), 83 deletions(-)

-- 
2.37.1.595.g718a3a8f04-goog


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 5.15 1/9] xfs: flush inodegc workqueue tasks before cancel
  2022-08-19 18:14 [PATCH 5.15 0/9] xfs stable candidate patches for 5.15.y (part 4) Leah Rumancik
@ 2022-08-19 18:14 ` Leah Rumancik
  2022-08-19 18:14 ` [PATCH 5.15 2/9] xfs: reserve quota for dir expansion when linking/unlinking files Leah Rumancik
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Leah Rumancik @ 2022-08-19 18:14 UTC (permalink / raw)
  To: stable
  Cc: linux-xfs, amir73il, Brian Foster, Darrick J . Wong, Dave Chinner,
	Leah Rumancik

From: Brian Foster <bfoster@redhat.com>

[ Upstream commit 6191cf3ad59fda5901160633fef8e41b064a5246 ]

The xfs_inodegc_stop() helper performs a high level flush of pending
work on the percpu queues and then runs a cancel_work_sync() on each
of the percpu work tasks to ensure all work has completed before
returning.  While cancel_work_sync() waits for wq tasks to complete,
it does not guarantee work tasks have started. This means that the
_stop() helper can queue and instantly cancel a wq task without
having completed the associated work. This can be observed by
tracepoint inspection of a simple "rm -f <file>; fsfreeze -f <mnt>"
test:

	xfs_destroy_inode: ... ino 0x83 ...
	xfs_inode_set_need_inactive: ... ino 0x83 ...
	xfs_inodegc_stop: ...
	...
	xfs_inodegc_start: ...
	xfs_inodegc_worker: ...
	xfs_inode_inactivating: ... ino 0x83 ...

The first few lines show that the inode is removed and need inactive
state set, but the inactivation work has not completed before the
inodegc mechanism stops. The inactivation doesn't actually occur
until the fs is unfrozen and the gc mechanism starts back up. Note
that this test requires fsfreeze to reproduce because xfs_freeze
indirectly invokes xfs_fs_statfs(), which calls xfs_inodegc_flush().

When this occurs, the workqueue try_to_grab_pending() logic first
tries to steal the pending bit, which does not succeed because the
bit has been set by queue_work_on(). Subsequently, it checks for
association of a pool workqueue from the work item under the pool
lock. This association is set at the point a work item is queued and
cleared when dequeued for processing. If the association exists, the
work item is removed from the queue and cancel_work_sync() returns
true. If the pwq association is cleared, the remove attempt assumes
the task is busy and retries (eventually returning false to the
caller after waiting for the work task to complete).

To avoid this race, we can flush each work item explicitly before
cancel. However, since the _queue_all() already schedules each
underlying work item, the workqueue level helpers are sufficient to
achieve the same ordering effect. E.g., the inodegc enabled flag
prevents scheduling any further work in the _stop() case. Use the
drain_workqueue() helper in this particular case to make the intent
a bit more self explanatory.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_icache.c | 22 ++++------------------
 1 file changed, 4 insertions(+), 18 deletions(-)

diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index f2210d927481..5e44d7bbd8fc 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -1872,28 +1872,20 @@ xfs_inodegc_worker(
 }
 
 /*
- * Force all currently queued inode inactivation work to run immediately, and
- * wait for the work to finish. Two pass - queue all the work first pass, wait
- * for it in a second pass.
+ * Force all currently queued inode inactivation work to run immediately and
+ * wait for the work to finish.
  */
 void
 xfs_inodegc_flush(
 	struct xfs_mount	*mp)
 {
-	struct xfs_inodegc	*gc;
-	int			cpu;
-
 	if (!xfs_is_inodegc_enabled(mp))
 		return;
 
 	trace_xfs_inodegc_flush(mp, __return_address);
 
 	xfs_inodegc_queue_all(mp);
-
-	for_each_online_cpu(cpu) {
-		gc = per_cpu_ptr(mp->m_inodegc, cpu);
-		flush_work(&gc->work);
-	}
+	flush_workqueue(mp->m_inodegc_wq);
 }
 
 /*
@@ -1904,18 +1896,12 @@ void
 xfs_inodegc_stop(
 	struct xfs_mount	*mp)
 {
-	struct xfs_inodegc	*gc;
-	int			cpu;
-
 	if (!xfs_clear_inodegc_enabled(mp))
 		return;
 
 	xfs_inodegc_queue_all(mp);
+	drain_workqueue(mp->m_inodegc_wq);
 
-	for_each_online_cpu(cpu) {
-		gc = per_cpu_ptr(mp->m_inodegc, cpu);
-		cancel_work_sync(&gc->work);
-	}
 	trace_xfs_inodegc_stop(mp, __return_address);
 }
 
-- 
2.37.1.595.g718a3a8f04-goog


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 5.15 2/9] xfs: reserve quota for dir expansion when linking/unlinking files
  2022-08-19 18:14 [PATCH 5.15 0/9] xfs stable candidate patches for 5.15.y (part 4) Leah Rumancik
  2022-08-19 18:14 ` [PATCH 5.15 1/9] xfs: flush inodegc workqueue tasks before cancel Leah Rumancik
@ 2022-08-19 18:14 ` Leah Rumancik
  2022-08-19 18:14 ` [PATCH 5.15 3/9] xfs: reserve quota for target dir expansion when renaming files Leah Rumancik
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Leah Rumancik @ 2022-08-19 18:14 UTC (permalink / raw)
  To: stable; +Cc: linux-xfs, amir73il, Darrick J. Wong, Dave Chinner, Leah Rumancik

From: "Darrick J. Wong" <djwong@kernel.org>

[ Upstream commit 871b9316e7a778ff97bdc34fdb2f2977f616651d ]

XFS does not reserve quota for directory expansion when linking or
unlinking children from a directory.  This means that we don't reject
the expansion with EDQUOT when we're at or near a hard limit, which
means that unprivileged userspace can use link()/unlink() to exceed
quota.

The fix for this is nuanced -- link operations don't always expand the
directory, and we allow a link to proceed with no space reservation if
we don't need to add a block to the directory to handle the addition.
Unlink operations generally do not expand the directory (you'd have to
free a block and then cause a btree split) and we can defer the
directory block freeing if there is no space reservation.

Moreover, there is a further bug in that we do not trigger the blockgc
workers to try to clear space when we're out of quota.

To fix both cases, create a new xfs_trans_alloc_dir function that
allocates the transaction, locks and joins the inodes, and reserves
quota for the directory.  If there isn't sufficient space or quota,
we'll switch the caller to reservationless mode.  This should prevent
quota usage overruns with the least restriction in functionality.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_inode.c | 46 +++++++++----------------
 fs/xfs/xfs_trans.c | 86 ++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_trans.h |  3 ++
 3 files changed, 106 insertions(+), 29 deletions(-)

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index c19f3ca605af..f4dec7f6c6d0 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1223,7 +1223,7 @@ xfs_link(
 {
 	xfs_mount_t		*mp = tdp->i_mount;
 	xfs_trans_t		*tp;
-	int			error;
+	int			error, nospace_error = 0;
 	int			resblks;
 
 	trace_xfs_link(tdp, target_name);
@@ -1242,19 +1242,11 @@ xfs_link(
 		goto std_return;
 
 	resblks = XFS_LINK_SPACE_RES(mp, target_name->len);
-	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_link, resblks, 0, 0, &tp);
-	if (error == -ENOSPC) {
-		resblks = 0;
-		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_link, 0, 0, 0, &tp);
-	}
+	error = xfs_trans_alloc_dir(tdp, &M_RES(mp)->tr_link, sip, &resblks,
+			&tp, &nospace_error);
 	if (error)
 		goto std_return;
 
-	xfs_lock_two_inodes(sip, XFS_ILOCK_EXCL, tdp, XFS_ILOCK_EXCL);
-
-	xfs_trans_ijoin(tp, sip, XFS_ILOCK_EXCL);
-	xfs_trans_ijoin(tp, tdp, XFS_ILOCK_EXCL);
-
 	error = xfs_iext_count_may_overflow(tdp, XFS_DATA_FORK,
 			XFS_IEXT_DIR_MANIP_CNT(mp));
 	if (error)
@@ -1312,6 +1304,8 @@ xfs_link(
  error_return:
 	xfs_trans_cancel(tp);
  std_return:
+	if (error == -ENOSPC && nospace_error)
+		error = nospace_error;
 	return error;
 }
 
@@ -2761,6 +2755,7 @@ xfs_remove(
 	xfs_mount_t		*mp = dp->i_mount;
 	xfs_trans_t             *tp = NULL;
 	int			is_dir = S_ISDIR(VFS_I(ip)->i_mode);
+	int			dontcare;
 	int                     error = 0;
 	uint			resblks;
 
@@ -2778,31 +2773,24 @@ xfs_remove(
 		goto std_return;
 
 	/*
-	 * We try to get the real space reservation first,
-	 * allowing for directory btree deletion(s) implying
-	 * possible bmap insert(s).  If we can't get the space
-	 * reservation then we use 0 instead, and avoid the bmap
-	 * btree insert(s) in the directory code by, if the bmap
-	 * insert tries to happen, instead trimming the LAST
-	 * block from the directory.
+	 * We try to get the real space reservation first, allowing for
+	 * directory btree deletion(s) implying possible bmap insert(s).  If we
+	 * can't get the space reservation then we use 0 instead, and avoid the
+	 * bmap btree insert(s) in the directory code by, if the bmap insert
+	 * tries to happen, instead trimming the LAST block from the directory.
+	 *
+	 * Ignore EDQUOT and ENOSPC being returned via nospace_error because
+	 * the directory code can handle a reservationless update and we don't
+	 * want to prevent a user from trying to free space by deleting things.
 	 */
 	resblks = XFS_REMOVE_SPACE_RES(mp);
-	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_remove, resblks, 0, 0, &tp);
-	if (error == -ENOSPC) {
-		resblks = 0;
-		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_remove, 0, 0, 0,
-				&tp);
-	}
+	error = xfs_trans_alloc_dir(dp, &M_RES(mp)->tr_remove, ip, &resblks,
+			&tp, &dontcare);
 	if (error) {
 		ASSERT(error != -ENOSPC);
 		goto std_return;
 	}
 
-	xfs_lock_two_inodes(dp, XFS_ILOCK_EXCL, ip, XFS_ILOCK_EXCL);
-
-	xfs_trans_ijoin(tp, dp, XFS_ILOCK_EXCL);
-	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
-
 	/*
 	 * If we're removing a directory perform some additional validation.
 	 */
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 67dec11e34c7..95c183072e7a 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -1201,3 +1201,89 @@ xfs_trans_alloc_ichange(
 	xfs_trans_cancel(tp);
 	return error;
 }
+
+/*
+ * Allocate an transaction, lock and join the directory and child inodes to it,
+ * and reserve quota for a directory update.  If there isn't sufficient space,
+ * @dblocks will be set to zero for a reservationless directory update and
+ * @nospace_error will be set to a negative errno describing the space
+ * constraint we hit.
+ *
+ * The caller must ensure that the on-disk dquots attached to this inode have
+ * already been allocated and initialized.  The ILOCKs will be dropped when the
+ * transaction is committed or cancelled.
+ */
+int
+xfs_trans_alloc_dir(
+	struct xfs_inode	*dp,
+	struct xfs_trans_res	*resv,
+	struct xfs_inode	*ip,
+	unsigned int		*dblocks,
+	struct xfs_trans	**tpp,
+	int			*nospace_error)
+{
+	struct xfs_trans	*tp;
+	struct xfs_mount	*mp = ip->i_mount;
+	unsigned int		resblks;
+	bool			retried = false;
+	int			error;
+
+retry:
+	*nospace_error = 0;
+	resblks = *dblocks;
+	error = xfs_trans_alloc(mp, resv, resblks, 0, 0, &tp);
+	if (error == -ENOSPC) {
+		*nospace_error = error;
+		resblks = 0;
+		error = xfs_trans_alloc(mp, resv, resblks, 0, 0, &tp);
+	}
+	if (error)
+		return error;
+
+	xfs_lock_two_inodes(dp, XFS_ILOCK_EXCL, ip, XFS_ILOCK_EXCL);
+
+	xfs_trans_ijoin(tp, dp, XFS_ILOCK_EXCL);
+	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
+
+	error = xfs_qm_dqattach_locked(dp, false);
+	if (error) {
+		/* Caller should have allocated the dquots! */
+		ASSERT(error != -ENOENT);
+		goto out_cancel;
+	}
+
+	error = xfs_qm_dqattach_locked(ip, false);
+	if (error) {
+		/* Caller should have allocated the dquots! */
+		ASSERT(error != -ENOENT);
+		goto out_cancel;
+	}
+
+	if (resblks == 0)
+		goto done;
+
+	error = xfs_trans_reserve_quota_nblks(tp, dp, resblks, 0, false);
+	if (error == -EDQUOT || error == -ENOSPC) {
+		if (!retried) {
+			xfs_trans_cancel(tp);
+			xfs_blockgc_free_quota(dp, 0);
+			retried = true;
+			goto retry;
+		}
+
+		*nospace_error = error;
+		resblks = 0;
+		error = 0;
+	}
+	if (error)
+		goto out_cancel;
+
+done:
+	*tpp = tp;
+	*dblocks = resblks;
+	return 0;
+
+out_cancel:
+	xfs_trans_cancel(tp);
+	return error;
+}
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index 50da47f23a07..faba74d4c702 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -265,6 +265,9 @@ int xfs_trans_alloc_icreate(struct xfs_mount *mp, struct xfs_trans_res *resv,
 int xfs_trans_alloc_ichange(struct xfs_inode *ip, struct xfs_dquot *udqp,
 		struct xfs_dquot *gdqp, struct xfs_dquot *pdqp, bool force,
 		struct xfs_trans **tpp);
+int xfs_trans_alloc_dir(struct xfs_inode *dp, struct xfs_trans_res *resv,
+		struct xfs_inode *ip, unsigned int *dblocks,
+		struct xfs_trans **tpp, int *nospace_error);
 
 static inline void
 xfs_trans_set_context(
-- 
2.37.1.595.g718a3a8f04-goog


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 5.15 3/9] xfs: reserve quota for target dir expansion when renaming files
  2022-08-19 18:14 [PATCH 5.15 0/9] xfs stable candidate patches for 5.15.y (part 4) Leah Rumancik
  2022-08-19 18:14 ` [PATCH 5.15 1/9] xfs: flush inodegc workqueue tasks before cancel Leah Rumancik
  2022-08-19 18:14 ` [PATCH 5.15 2/9] xfs: reserve quota for dir expansion when linking/unlinking files Leah Rumancik
@ 2022-08-19 18:14 ` Leah Rumancik
  2022-08-19 18:14 ` [PATCH 5.15 4/9] xfs: remove infinite loop when reserving free block pool Leah Rumancik
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Leah Rumancik @ 2022-08-19 18:14 UTC (permalink / raw)
  To: stable; +Cc: linux-xfs, amir73il, Darrick J. Wong, Dave Chinner, Leah Rumancik

From: "Darrick J. Wong" <djwong@kernel.org>

[ Upstream commit 41667260bc84db4dfe566e3f6ab6da5293d60d8d ]

XFS does not reserve quota for directory expansion when renaming
children into a directory.  This means that we don't reject the
expansion with EDQUOT when we're at or near a hard limit, which means
that unprivileged userspace can use rename() to exceed quota.

Rename operations don't always expand the target directory, and we allow
a rename to proceed with no space reservation if we don't need to add a
block to the target directory to handle the addition.  Moreover, the
unlink operation on the source directory generally does not expand the
directory (you'd have to free a block and then cause a btree split) and
it's probably of little consequence to leave the corner case that
renaming a file out of a directory can increase its size.

As with link and unlink, there is a further bug in that we do not
trigger the blockgc workers to try to clear space when we're out of
quota.

Because rename is its own special tricky animal, we'll patch xfs_rename
directly to reserve quota to the rename transaction.  We'll leave
cleaning up the rest of xfs_rename for the metadata directory tree
patchset.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_inode.c | 33 ++++++++++++++++++++++++++++++++-
 1 file changed, 32 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index f4dec7f6c6d0..fb7a97cdf99f 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -3103,7 +3103,8 @@ xfs_rename(
 	bool			new_parent = (src_dp != target_dp);
 	bool			src_is_directory = S_ISDIR(VFS_I(src_ip)->i_mode);
 	int			spaceres;
-	int			error;
+	bool			retried = false;
+	int			error, nospace_error = 0;
 
 	trace_xfs_rename(src_dp, target_dp, src_name, target_name);
 
@@ -3127,9 +3128,12 @@ xfs_rename(
 	xfs_sort_for_rename(src_dp, target_dp, src_ip, target_ip, wip,
 				inodes, &num_inodes);
 
+retry:
+	nospace_error = 0;
 	spaceres = XFS_RENAME_SPACE_RES(mp, target_name->len);
 	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_rename, spaceres, 0, 0, &tp);
 	if (error == -ENOSPC) {
+		nospace_error = error;
 		spaceres = 0;
 		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_rename, 0, 0, 0,
 				&tp);
@@ -3183,6 +3187,31 @@ xfs_rename(
 					target_dp, target_name, target_ip,
 					spaceres);
 
+	/*
+	 * Try to reserve quota to handle an expansion of the target directory.
+	 * We'll allow the rename to continue in reservationless mode if we hit
+	 * a space usage constraint.  If we trigger reservationless mode, save
+	 * the errno if there isn't any free space in the target directory.
+	 */
+	if (spaceres != 0) {
+		error = xfs_trans_reserve_quota_nblks(tp, target_dp, spaceres,
+				0, false);
+		if (error == -EDQUOT || error == -ENOSPC) {
+			if (!retried) {
+				xfs_trans_cancel(tp);
+				xfs_blockgc_free_quota(target_dp, 0);
+				retried = true;
+				goto retry;
+			}
+
+			nospace_error = error;
+			spaceres = 0;
+			error = 0;
+		}
+		if (error)
+			goto out_trans_cancel;
+	}
+
 	/*
 	 * Check for expected errors before we dirty the transaction
 	 * so we can return an error without a transaction abort.
@@ -3429,6 +3458,8 @@ xfs_rename(
 out_release_wip:
 	if (wip)
 		xfs_irele(wip);
+	if (error == -ENOSPC && nospace_error)
+		error = nospace_error;
 	return error;
 }
 
-- 
2.37.1.595.g718a3a8f04-goog


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 5.15 4/9] xfs: remove infinite loop when reserving free block pool
  2022-08-19 18:14 [PATCH 5.15 0/9] xfs stable candidate patches for 5.15.y (part 4) Leah Rumancik
                   ` (2 preceding siblings ...)
  2022-08-19 18:14 ` [PATCH 5.15 3/9] xfs: reserve quota for target dir expansion when renaming files Leah Rumancik
@ 2022-08-19 18:14 ` Leah Rumancik
  2022-08-19 18:14 ` [PATCH 5.15 5/9] xfs: always succeed at setting the reserve pool size Leah Rumancik
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Leah Rumancik @ 2022-08-19 18:14 UTC (permalink / raw)
  To: stable
  Cc: linux-xfs, amir73il, Darrick J. Wong, Brian Foster, Dave Chinner,
	Leah Rumancik

From: "Darrick J. Wong" <djwong@kernel.org>

[ Upstream commit 15f04fdc75aaaa1cccb0b8b3af1be290e118a7bc ]

Infinite loops in kernel code are scary.  Calls to xfs_reserve_blocks
should be rare (people should just use the defaults!) so we really don't
need to try so hard.  Simplify the logic here by removing the infinite
loop.

Cc: Brian Foster <bfoster@redhat.com>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_fsops.c | 50 +++++++++++++++++++---------------------------
 1 file changed, 20 insertions(+), 30 deletions(-)

diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 710e857bb825..3c6d9d6836ef 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -430,46 +430,36 @@ xfs_reserve_blocks(
 	 * If the request is larger than the current reservation, reserve the
 	 * blocks before we update the reserve counters. Sample m_fdblocks and
 	 * perform a partial reservation if the request exceeds free space.
+	 *
+	 * The code below estimates how many blocks it can request from
+	 * fdblocks to stash in the reserve pool.  This is a classic TOCTOU
+	 * race since fdblocks updates are not always coordinated via
+	 * m_sb_lock.
 	 */
-	error = -ENOSPC;
-	do {
-		free = percpu_counter_sum(&mp->m_fdblocks) -
+	free = percpu_counter_sum(&mp->m_fdblocks) -
 						xfs_fdblocks_unavailable(mp);
-		if (free <= 0)
-			break;
-
-		delta = request - mp->m_resblks;
-		lcounter = free - delta;
-		if (lcounter < 0)
-			/* We can't satisfy the request, just get what we can */
-			fdblks_delta = free;
-		else
-			fdblks_delta = delta;
-
+	delta = request - mp->m_resblks;
+	if (delta > 0 && free > 0) {
 		/*
 		 * We'll either succeed in getting space from the free block
-		 * count or we'll get an ENOSPC. If we get a ENOSPC, it means
-		 * things changed while we were calculating fdblks_delta and so
-		 * we should try again to see if there is anything left to
-		 * reserve.
-		 *
-		 * Don't set the reserved flag here - we don't want to reserve
-		 * the extra reserve blocks from the reserve.....
+		 * count or we'll get an ENOSPC.  Don't set the reserved flag
+		 * here - we don't want to reserve the extra reserve blocks
+		 * from the reserve.
 		 */
+		fdblks_delta = min(free, delta);
 		spin_unlock(&mp->m_sb_lock);
 		error = xfs_mod_fdblocks(mp, -fdblks_delta, 0);
 		spin_lock(&mp->m_sb_lock);
-	} while (error == -ENOSPC);
 
-	/*
-	 * Update the reserve counters if blocks have been successfully
-	 * allocated.
-	 */
-	if (!error && fdblks_delta) {
-		mp->m_resblks += fdblks_delta;
-		mp->m_resblks_avail += fdblks_delta;
+		/*
+		 * Update the reserve counters if blocks have been successfully
+		 * allocated.
+		 */
+		if (!error) {
+			mp->m_resblks += fdblks_delta;
+			mp->m_resblks_avail += fdblks_delta;
+		}
 	}
-
 out:
 	if (outval) {
 		outval->resblks = mp->m_resblks;
-- 
2.37.1.595.g718a3a8f04-goog


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 5.15 5/9] xfs: always succeed at setting the reserve pool size
  2022-08-19 18:14 [PATCH 5.15 0/9] xfs stable candidate patches for 5.15.y (part 4) Leah Rumancik
                   ` (3 preceding siblings ...)
  2022-08-19 18:14 ` [PATCH 5.15 4/9] xfs: remove infinite loop when reserving free block pool Leah Rumancik
@ 2022-08-19 18:14 ` Leah Rumancik
  2022-08-19 18:14 ` [PATCH 5.15 6/9] xfs: fix overfilling of reserve pool Leah Rumancik
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Leah Rumancik @ 2022-08-19 18:14 UTC (permalink / raw)
  To: stable; +Cc: linux-xfs, amir73il, Darrick J. Wong, Dave Chinner, Leah Rumancik

From: "Darrick J. Wong" <djwong@kernel.org>

[ Upstream commit 0baa2657dc4d79202148be79a3dc36c35f425060 ]

Nowadays, xfs_mod_fdblocks will always choose to fill the reserve pool
with freed blocks before adding to fdblocks.  Therefore, we can change
the behavior of xfs_reserve_blocks slightly -- setting the target size
of the pool should always succeed, since a deficiency will eventually
be made up as blocks get freed.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_fsops.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 3c6d9d6836ef..5c2bea1e12a8 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -434,11 +434,14 @@ xfs_reserve_blocks(
 	 * The code below estimates how many blocks it can request from
 	 * fdblocks to stash in the reserve pool.  This is a classic TOCTOU
 	 * race since fdblocks updates are not always coordinated via
-	 * m_sb_lock.
+	 * m_sb_lock.  Set the reserve size even if there's not enough free
+	 * space to fill it because mod_fdblocks will refill an undersized
+	 * reserve when it can.
 	 */
 	free = percpu_counter_sum(&mp->m_fdblocks) -
 						xfs_fdblocks_unavailable(mp);
 	delta = request - mp->m_resblks;
+	mp->m_resblks = request;
 	if (delta > 0 && free > 0) {
 		/*
 		 * We'll either succeed in getting space from the free block
@@ -455,10 +458,8 @@ xfs_reserve_blocks(
 		 * Update the reserve counters if blocks have been successfully
 		 * allocated.
 		 */
-		if (!error) {
-			mp->m_resblks += fdblks_delta;
+		if (!error)
 			mp->m_resblks_avail += fdblks_delta;
-		}
 	}
 out:
 	if (outval) {
-- 
2.37.1.595.g718a3a8f04-goog


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 5.15 6/9] xfs: fix overfilling of reserve pool
  2022-08-19 18:14 [PATCH 5.15 0/9] xfs stable candidate patches for 5.15.y (part 4) Leah Rumancik
                   ` (4 preceding siblings ...)
  2022-08-19 18:14 ` [PATCH 5.15 5/9] xfs: always succeed at setting the reserve pool size Leah Rumancik
@ 2022-08-19 18:14 ` Leah Rumancik
  2022-08-19 18:14 ` [PATCH 5.15 7/9] xfs: fix soft lockup via spinning in filestream ag selection loop Leah Rumancik
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Leah Rumancik @ 2022-08-19 18:14 UTC (permalink / raw)
  To: stable; +Cc: linux-xfs, amir73il, Darrick J. Wong, Dave Chinner, Leah Rumancik

From: "Darrick J. Wong" <djwong@kernel.org>

[ Upstream commit 82be38bcf8a2e056b4c99ce79a3827fa743df6ec ]

Due to cycling of m_sb_lock, it's possible for multiple callers of
xfs_reserve_blocks to race at changing the pool size, subtracting blocks
from fdblocks, and actually putting it in the pool.  The result of all
this is that we can overfill the reserve pool to hilarious levels.

xfs_mod_fdblocks, when called with a positive value, already knows how
to take freed blocks and either fill the reserve until it's full, or put
them in fdblocks.  Use that instead of setting m_resblks_avail directly.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_fsops.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 5c2bea1e12a8..5b5b68affe66 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -448,18 +448,17 @@ xfs_reserve_blocks(
 		 * count or we'll get an ENOSPC.  Don't set the reserved flag
 		 * here - we don't want to reserve the extra reserve blocks
 		 * from the reserve.
+		 *
+		 * The desired reserve size can change after we drop the lock.
+		 * Use mod_fdblocks to put the space into the reserve or into
+		 * fdblocks as appropriate.
 		 */
 		fdblks_delta = min(free, delta);
 		spin_unlock(&mp->m_sb_lock);
 		error = xfs_mod_fdblocks(mp, -fdblks_delta, 0);
-		spin_lock(&mp->m_sb_lock);
-
-		/*
-		 * Update the reserve counters if blocks have been successfully
-		 * allocated.
-		 */
 		if (!error)
-			mp->m_resblks_avail += fdblks_delta;
+			xfs_mod_fdblocks(mp, fdblks_delta, 0);
+		spin_lock(&mp->m_sb_lock);
 	}
 out:
 	if (outval) {
-- 
2.37.1.595.g718a3a8f04-goog


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 5.15 7/9] xfs: fix soft lockup via spinning in filestream ag selection loop
  2022-08-19 18:14 [PATCH 5.15 0/9] xfs stable candidate patches for 5.15.y (part 4) Leah Rumancik
                   ` (5 preceding siblings ...)
  2022-08-19 18:14 ` [PATCH 5.15 6/9] xfs: fix overfilling of reserve pool Leah Rumancik
@ 2022-08-19 18:14 ` Leah Rumancik
  2022-08-19 18:14 ` [PATCH 5.15 8/9] xfs: revert "xfs: actually bump warning counts when we send warnings" Leah Rumancik
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Leah Rumancik @ 2022-08-19 18:14 UTC (permalink / raw)
  To: stable
  Cc: linux-xfs, amir73il, Brian Foster, Darrick J . Wong,
	Christoph Hellwig, Dave Chinner, Leah Rumancik

From: Brian Foster <bfoster@redhat.com>

[ Upstream commit f650df7171b882dca737ddbbeb414100b31f16af ]

The filestream AG selection loop uses pagf data to aid in AG
selection, which depends on pagf initialization. If the in-core
structure is not initialized, the caller invokes the AGF read path
to do so and carries on. If another task enters the loop and finds
a pagf init already in progress, the AGF read returns -EAGAIN and
the task continues the loop. This does not increment the current ag
index, however, which means the task spins on the current AGF buffer
until unlocked.

If the AGF read I/O submitted by the initial task happens to be
delayed for whatever reason, this results in soft lockup warnings
via the spinning task. This is reproduced by xfs/170. To avoid this
problem, fix the AGF trylock failure path to properly iterate to the
next AG. If a task iterates all AGs without making progress, the
trylock behavior is dropped in favor of blocking locks and thus a
soft lockup is no longer possible.

Fixes: f48e2df8a877ca1c ("xfs: make xfs_*read_agf return EAGAIN to ALLOC_FLAG_TRYLOCK callers")
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_filestream.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_filestream.c b/fs/xfs/xfs_filestream.c
index 6a3ce0f6dc9e..be9bcf8a1f99 100644
--- a/fs/xfs/xfs_filestream.c
+++ b/fs/xfs/xfs_filestream.c
@@ -128,11 +128,12 @@ xfs_filestream_pick_ag(
 		if (!pag->pagf_init) {
 			err = xfs_alloc_pagf_init(mp, NULL, ag, trylock);
 			if (err) {
-				xfs_perag_put(pag);
-				if (err != -EAGAIN)
+				if (err != -EAGAIN) {
+					xfs_perag_put(pag);
 					return err;
+				}
 				/* Couldn't lock the AGF, skip this AG. */
-				continue;
+				goto next_ag;
 			}
 		}
 
-- 
2.37.1.595.g718a3a8f04-goog


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 5.15 8/9] xfs: revert "xfs: actually bump warning counts when we send warnings"
  2022-08-19 18:14 [PATCH 5.15 0/9] xfs stable candidate patches for 5.15.y (part 4) Leah Rumancik
                   ` (6 preceding siblings ...)
  2022-08-19 18:14 ` [PATCH 5.15 7/9] xfs: fix soft lockup via spinning in filestream ag selection loop Leah Rumancik
@ 2022-08-19 18:14 ` Leah Rumancik
  2022-08-19 18:14 ` [PATCH 5.15 9/9] xfs: reject crazy array sizes being fed to XFS_IOC_GETBMAP* Leah Rumancik
  2022-08-23  7:21 ` [PATCH 5.15 0/9] xfs stable candidate patches for 5.15.y (part 4) Greg KH
  9 siblings, 0 replies; 11+ messages in thread
From: Leah Rumancik @ 2022-08-19 18:14 UTC (permalink / raw)
  To: stable
  Cc: linux-xfs, amir73il, Eric Sandeen, Darrick J . Wong, Dave Chinner,
	Dave Chinner, Leah Rumancik

From: Eric Sandeen <sandeen@redhat.com>

[ Upstream commit bc37e4fb5cac2925b2e286b1f1d4fc2b519f7d92 ]

This reverts commit 4b8628d57b725b32616965e66975fcdebe008fe7.

XFS quota has had the concept of a "quota warning limit" since
the earliest Irix implementation, but a mechanism for incrementing
the warning counter was never implemented, as documented in the
xfs_quota(8) man page. We do know from the historical archive that
it was never incremented at runtime during quota reservation
operations.

With this commit, the warning counter quickly increments for every
allocation attempt after the user has crossed a quote soft
limit threshold, and this in turn transitions the user to hard
quota failures, rendering soft quota thresholds and timers useless.
This was reported as a regression by users.

Because the intended behavior of this warning counter has never been
understood or documented, and the result of this change is a regression
in soft quota functionality, revert this commit to make soft quota
limits and timers operable again.

Fixes: 4b8628d57b72 ("xfs: actually bump warning counts when we send warnings)
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_trans_dquot.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/fs/xfs/xfs_trans_dquot.c b/fs/xfs/xfs_trans_dquot.c
index 3872ce671411..955c457e585a 100644
--- a/fs/xfs/xfs_trans_dquot.c
+++ b/fs/xfs/xfs_trans_dquot.c
@@ -603,7 +603,6 @@ xfs_dqresv_check(
 			return QUOTA_NL_ISOFTLONGWARN;
 		}
 
-		res->warnings++;
 		return QUOTA_NL_ISOFTWARN;
 	}
 
-- 
2.37.1.595.g718a3a8f04-goog


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 5.15 9/9] xfs: reject crazy array sizes being fed to XFS_IOC_GETBMAP*
  2022-08-19 18:14 [PATCH 5.15 0/9] xfs stable candidate patches for 5.15.y (part 4) Leah Rumancik
                   ` (7 preceding siblings ...)
  2022-08-19 18:14 ` [PATCH 5.15 8/9] xfs: revert "xfs: actually bump warning counts when we send warnings" Leah Rumancik
@ 2022-08-19 18:14 ` Leah Rumancik
  2022-08-23  7:21 ` [PATCH 5.15 0/9] xfs stable candidate patches for 5.15.y (part 4) Greg KH
  9 siblings, 0 replies; 11+ messages in thread
From: Leah Rumancik @ 2022-08-19 18:14 UTC (permalink / raw)
  To: stable
  Cc: linux-xfs, amir73il, Darrick J. Wong, Allison Henderson,
	Catherine Hoang, Leah Rumancik

From: "Darrick J. Wong" <djwong@kernel.org>

[ Upstream commit 29d650f7e3ab55283b89c9f5883d0c256ce478b5 ]

Syzbot tripped over the following complaint from the kernel:

WARNING: CPU: 2 PID: 15402 at mm/util.c:597 kvmalloc_node+0x11e/0x125 mm/util.c:597

While trying to run XFS_IOC_GETBMAP against the following structure:

struct getbmap fubar = {
	.bmv_count	= 0x22dae649,
};

Obviously, this is a crazy huge value since the next thing that the
ioctl would do is allocate 37GB of memory.  This is enough to make
kvmalloc mad, but isn't large enough to trip the validation functions.
In other words, I'm fussing with checks that were **already sufficient**
because that's easier than dealing with 644 internal bug reports.  Yes,
that's right, six hundred and forty-four.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Catherine Hoang <catherine.hoang@oracle.com>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_ioctl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index fba52e75e98b..bcc3c18c8080 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1545,7 +1545,7 @@ xfs_ioc_getbmap(
 
 	if (bmx.bmv_count < 2)
 		return -EINVAL;
-	if (bmx.bmv_count > ULONG_MAX / recsize)
+	if (bmx.bmv_count >= INT_MAX / recsize)
 		return -ENOMEM;
 
 	buf = kvzalloc(bmx.bmv_count * sizeof(*buf), GFP_KERNEL);
-- 
2.37.1.595.g718a3a8f04-goog


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 5.15 0/9] xfs stable candidate patches for 5.15.y (part 4)
  2022-08-19 18:14 [PATCH 5.15 0/9] xfs stable candidate patches for 5.15.y (part 4) Leah Rumancik
                   ` (8 preceding siblings ...)
  2022-08-19 18:14 ` [PATCH 5.15 9/9] xfs: reject crazy array sizes being fed to XFS_IOC_GETBMAP* Leah Rumancik
@ 2022-08-23  7:21 ` Greg KH
  9 siblings, 0 replies; 11+ messages in thread
From: Greg KH @ 2022-08-23  7:21 UTC (permalink / raw)
  To: Leah Rumancik; +Cc: stable, linux-xfs, amir73il

On Fri, Aug 19, 2022 at 11:14:22AM -0700, Leah Rumancik wrote:
> Hello,
> 
> Here's another round of xfs backports for 5.15.y that have been
> through testing and ACK'd.

All now queued up, thanks.

greg k-h

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2022-08-23  7:22 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-08-19 18:14 [PATCH 5.15 0/9] xfs stable candidate patches for 5.15.y (part 4) Leah Rumancik
2022-08-19 18:14 ` [PATCH 5.15 1/9] xfs: flush inodegc workqueue tasks before cancel Leah Rumancik
2022-08-19 18:14 ` [PATCH 5.15 2/9] xfs: reserve quota for dir expansion when linking/unlinking files Leah Rumancik
2022-08-19 18:14 ` [PATCH 5.15 3/9] xfs: reserve quota for target dir expansion when renaming files Leah Rumancik
2022-08-19 18:14 ` [PATCH 5.15 4/9] xfs: remove infinite loop when reserving free block pool Leah Rumancik
2022-08-19 18:14 ` [PATCH 5.15 5/9] xfs: always succeed at setting the reserve pool size Leah Rumancik
2022-08-19 18:14 ` [PATCH 5.15 6/9] xfs: fix overfilling of reserve pool Leah Rumancik
2022-08-19 18:14 ` [PATCH 5.15 7/9] xfs: fix soft lockup via spinning in filestream ag selection loop Leah Rumancik
2022-08-19 18:14 ` [PATCH 5.15 8/9] xfs: revert "xfs: actually bump warning counts when we send warnings" Leah Rumancik
2022-08-19 18:14 ` [PATCH 5.15 9/9] xfs: reject crazy array sizes being fed to XFS_IOC_GETBMAP* Leah Rumancik
2022-08-23  7:21 ` [PATCH 5.15 0/9] xfs stable candidate patches for 5.15.y (part 4) Greg KH

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).