public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Chandan Babu R <chandan.babu@oracle.com>
To: djwong@kernel.org
Cc: chandan.babu@oracle.com, linux-xfs@vger.kernel.org,
	amir73il@gmail.com, leah.rumancik@gmail.com
Subject: [PATCH 5.4 CANDIDATE 17/17] xfs: force log and push AIL to clear pinned inodes when aborting mount
Date: Tue, 11 Apr 2023 09:05:14 +0530	[thread overview]
Message-ID: <20230411033514.58024-18-chandan.babu@oracle.com> (raw)
In-Reply-To: <20230411033514.58024-1-chandan.babu@oracle.com>

From: "Darrick J. Wong" <djwong@kernel.org>

commit d336f7ebc65007f5831e2297e6f3383ae8dbf8ed upstream.

[ Slightly modify fs/xfs/xfs_mount.c to resolve merge conflicts ]

If we allocate quota inodes in the process of mounting a filesystem but
then decide to abort the mount, it's possible that the quota inodes are
sitting around pinned by the log.  Now that inode reclaim relies on the
AIL to flush inodes, we have to force the log and push the AIL in
between releasing the quota inodes and kicking off reclaim to tear down
all the incore inodes.  Do this by extracting the bits we need from the
unmount path and reusing them.  As an added bonus, failed writes during
a failed mount will not retry forever now.

This was originally found during a fuzz test of metadata directories
(xfs/1546), but the actual symptom was that reclaim hung up on the quota
inodes.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
---
 fs/xfs/xfs_mount.c | 90 +++++++++++++++++++++++-----------------------
 1 file changed, 44 insertions(+), 46 deletions(-)

diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index 2860966af6c2..2277f21c4f14 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -674,6 +674,47 @@ xfs_check_summary_counts(
 	return xfs_initialize_perag_data(mp, mp->m_sb.sb_agcount);
 }
 
+/*
+ * Flush and reclaim dirty inodes in preparation for unmount. Inodes and
+ * internal inode structures can be sitting in the CIL and AIL at this point,
+ * so we need to unpin them, write them back and/or reclaim them before unmount
+ * can proceed.
+ *
+ * An inode cluster that has been freed can have its buffer still pinned in
+ * memory because the transaction is still sitting in a iclog. The stale inodes
+ * on that buffer will be pinned to the buffer until the transaction hits the
+ * disk and the callbacks run. Pushing the AIL will skip the stale inodes and
+ * may never see the pinned buffer, so nothing will push out the iclog and
+ * unpin the buffer.
+ *
+ * Hence we need to force the log to unpin everything first. However, log
+ * forces don't wait for the discards they issue to complete, so we have to
+ * explicitly wait for them to complete here as well.
+ *
+ * Then we can tell the world we are unmounting so that error handling knows
+ * that the filesystem is going away and we should error out anything that we
+ * have been retrying in the background.  This will prevent never-ending
+ * retries in AIL pushing from hanging the unmount.
+ *
+ * Finally, we can push the AIL to clean all the remaining dirty objects, then
+ * reclaim the remaining inodes that are still in memory at this point in time.
+ */
+static void
+xfs_unmount_flush_inodes(
+	struct xfs_mount	*mp)
+{
+	xfs_log_force(mp, XFS_LOG_SYNC);
+	xfs_extent_busy_wait_all(mp);
+	flush_workqueue(xfs_discard_wq);
+
+	mp->m_flags |= XFS_MOUNT_UNMOUNTING;
+
+	xfs_ail_push_all_sync(mp->m_ail);
+	cancel_delayed_work_sync(&mp->m_reclaim_work);
+	xfs_reclaim_inodes(mp, SYNC_WAIT);
+	xfs_health_unmount(mp);
+}
+
 /*
  * This function does the following on an initial mount of a file system:
  *	- reads the superblock from disk and init the mount struct
@@ -1047,7 +1088,7 @@ xfs_mountfs(
 	/* Clean out dquots that might be in memory after quotacheck. */
 	xfs_qm_unmount(mp);
 	/*
-	 * Cancel all delayed reclaim work and reclaim the inodes directly.
+	 * Flush all inode reclamation work and flush the log.
 	 * We have to do this /after/ rtunmount and qm_unmount because those
 	 * two will have scheduled delayed reclaim for the rt/quota inodes.
 	 *
@@ -1057,11 +1098,8 @@ xfs_mountfs(
 	 * qm_unmount_quotas and therefore rely on qm_unmount to release the
 	 * quota inodes.
 	 */
-	cancel_delayed_work_sync(&mp->m_reclaim_work);
-	xfs_reclaim_inodes(mp, SYNC_WAIT);
-	xfs_health_unmount(mp);
+	xfs_unmount_flush_inodes(mp);
  out_log_dealloc:
-	mp->m_flags |= XFS_MOUNT_UNMOUNTING;
 	xfs_log_mount_cancel(mp);
  out_fail_wait:
 	if (mp->m_logdev_targp && mp->m_logdev_targp != mp->m_ddev_targp)
@@ -1102,47 +1140,7 @@ xfs_unmountfs(
 	xfs_rtunmount_inodes(mp);
 	xfs_irele(mp->m_rootip);
 
-	/*
-	 * We can potentially deadlock here if we have an inode cluster
-	 * that has been freed has its buffer still pinned in memory because
-	 * the transaction is still sitting in a iclog. The stale inodes
-	 * on that buffer will have their flush locks held until the
-	 * transaction hits the disk and the callbacks run. the inode
-	 * flush takes the flush lock unconditionally and with nothing to
-	 * push out the iclog we will never get that unlocked. hence we
-	 * need to force the log first.
-	 */
-	xfs_log_force(mp, XFS_LOG_SYNC);
-
-	/*
-	 * Wait for all busy extents to be freed, including completion of
-	 * any discard operation.
-	 */
-	xfs_extent_busy_wait_all(mp);
-	flush_workqueue(xfs_discard_wq);
-
-	/*
-	 * We now need to tell the world we are unmounting. This will allow
-	 * us to detect that the filesystem is going away and we should error
-	 * out anything that we have been retrying in the background. This will
-	 * prevent neverending retries in AIL pushing from hanging the unmount.
-	 */
-	mp->m_flags |= XFS_MOUNT_UNMOUNTING;
-
-	/*
-	 * Flush all pending changes from the AIL.
-	 */
-	xfs_ail_push_all_sync(mp->m_ail);
-
-	/*
-	 * And reclaim all inodes.  At this point there should be no dirty
-	 * inodes and none should be pinned or locked, but use synchronous
-	 * reclaim just to be sure. We can stop background inode reclaim
-	 * here as well if it is still running.
-	 */
-	cancel_delayed_work_sync(&mp->m_reclaim_work);
-	xfs_reclaim_inodes(mp, SYNC_WAIT);
-	xfs_health_unmount(mp);
+	xfs_unmount_flush_inodes(mp);
 
 	xfs_qm_unmount(mp);
 
-- 
2.39.1


  parent reply	other threads:[~2023-04-11  3:37 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-11  3:34 [PATCH 5.4 CANDIDATE 00/17] xfs stable candidate patches for 5.4.y (from v5.11 & v5.12) Chandan Babu R
2023-04-11  3:34 ` [PATCH 5.4 CANDIDATE 01/17] xfs: show the proper user quota options Chandan Babu R
2023-04-11  3:34 ` [PATCH 5.4 CANDIDATE 02/17] xfs: merge the projid fields in struct xfs_icdinode Chandan Babu R
2023-04-11  3:35 ` [PATCH 5.4 CANDIDATE 03/17] xfs: ensure that the inode uid/gid match values match the icdinode ones Chandan Babu R
2023-04-11  3:35 ` [PATCH 5.4 CANDIDATE 04/17] xfs: remove the icdinode di_uid/di_gid members Chandan Babu R
2023-04-11  3:35 ` [PATCH 5.4 CANDIDATE 05/17] xfs: remove the kuid/kgid conversion wrappers Chandan Babu R
2023-04-11  3:35 ` [PATCH 5.4 CANDIDATE 06/17] xfs: add a new xfs_sb_version_has_v3inode helper Chandan Babu R
2023-04-11  3:35 ` [PATCH 5.4 CANDIDATE 07/17] xfs: only check the superblock version for dinode size calculation Chandan Babu R
2023-04-11  3:35 ` [PATCH 5.4 CANDIDATE 08/17] xfs: simplify di_flags2 inheritance in xfs_ialloc Chandan Babu R
2023-04-11  3:35 ` [PATCH 5.4 CANDIDATE 09/17] xfs: simplify a check in xfs_ioctl_setattr_check_cowextsize Chandan Babu R
2023-04-11  3:35 ` [PATCH 5.4 CANDIDATE 10/17] xfs: remove the di_version field from struct icdinode Chandan Babu R
2023-04-11  3:35 ` [PATCH 5.4 CANDIDATE 11/17] xfs: fix up non-directory creation in SGID directories Chandan Babu R
2023-04-11  3:35 ` [PATCH 5.4 CANDIDATE 12/17] xfs: set inode size after creating symlink Chandan Babu R
2023-04-11  3:35 ` [PATCH 5.4 CANDIDATE 13/17] xfs: report corruption only as a regular error Chandan Babu R
2023-04-11  3:35 ` [PATCH 5.4 CANDIDATE 14/17] xfs: shut down the filesystem if we screw up quota reservation Chandan Babu R
2023-04-11  3:35 ` [PATCH 5.4 CANDIDATE 15/17] xfs: consider shutdown in bmapbt cursor delete assert Chandan Babu R
2023-04-11  3:35 ` [PATCH 5.4 CANDIDATE 16/17] xfs: don't reuse busy extents on extent trim Chandan Babu R
2023-04-11  3:35 ` Chandan Babu R [this message]
2023-04-11 18:15 ` [PATCH 5.4 CANDIDATE 00/17] xfs stable candidate patches for 5.4.y (from v5.11 & v5.12) Darrick J. Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230411033514.58024-18-chandan.babu@oracle.com \
    --to=chandan.babu@oracle.com \
    --cc=amir73il@gmail.com \
    --cc=djwong@kernel.org \
    --cc=leah.rumancik@gmail.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox