Linux kernel -stable discussions
 help / color / mirror / Atom feed
From: Chandan Babu R <chandan.babu@oracle.com>
To: gregkh@linuxfoundation.org
Cc: sashal@kernel.org, mcgrof@kernel.org, linux-xfs@vger.kernel.org,
	stable@vger.kernel.org, djwong@kernel.org,
	chandan.babu@oracle.com, amir73il@gmail.com,
	leah.rumancik@gmail.com
Subject: [PATCH 5.4 17/17] xfs: force log and push AIL to clear pinned inodes when aborting mount
Date: Wed, 12 Apr 2023 09:56:24 +0530	[thread overview]
Message-ID: <20230412042624.600511-18-chandan.babu@oracle.com> (raw)
In-Reply-To: <20230412042624.600511-1-chandan.babu@oracle.com>

From: "Darrick J. Wong" <djwong@kernel.org>

commit d336f7ebc65007f5831e2297e6f3383ae8dbf8ed upstream.

[ Slightly modify fs/xfs/xfs_mount.c to resolve merge conflicts ]

If we allocate quota inodes in the process of mounting a filesystem but
then decide to abort the mount, it's possible that the quota inodes are
sitting around pinned by the log.  Now that inode reclaim relies on the
AIL to flush inodes, we have to force the log and push the AIL in
between releasing the quota inodes and kicking off reclaim to tear down
all the incore inodes.  Do this by extracting the bits we need from the
unmount path and reusing them.  As an added bonus, failed writes during
a failed mount will not retry forever now.

This was originally found during a fuzz test of metadata directories
(xfs/1546), but the actual symptom was that reclaim hung up on the quota
inodes.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_mount.c | 90 +++++++++++++++++++++++-----------------------
 1 file changed, 44 insertions(+), 46 deletions(-)

diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index 2860966af6c2..2277f21c4f14 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -674,6 +674,47 @@ xfs_check_summary_counts(
 	return xfs_initialize_perag_data(mp, mp->m_sb.sb_agcount);
 }
 
+/*
+ * Flush and reclaim dirty inodes in preparation for unmount. Inodes and
+ * internal inode structures can be sitting in the CIL and AIL at this point,
+ * so we need to unpin them, write them back and/or reclaim them before unmount
+ * can proceed.
+ *
+ * An inode cluster that has been freed can have its buffer still pinned in
+ * memory because the transaction is still sitting in a iclog. The stale inodes
+ * on that buffer will be pinned to the buffer until the transaction hits the
+ * disk and the callbacks run. Pushing the AIL will skip the stale inodes and
+ * may never see the pinned buffer, so nothing will push out the iclog and
+ * unpin the buffer.
+ *
+ * Hence we need to force the log to unpin everything first. However, log
+ * forces don't wait for the discards they issue to complete, so we have to
+ * explicitly wait for them to complete here as well.
+ *
+ * Then we can tell the world we are unmounting so that error handling knows
+ * that the filesystem is going away and we should error out anything that we
+ * have been retrying in the background.  This will prevent never-ending
+ * retries in AIL pushing from hanging the unmount.
+ *
+ * Finally, we can push the AIL to clean all the remaining dirty objects, then
+ * reclaim the remaining inodes that are still in memory at this point in time.
+ */
+static void
+xfs_unmount_flush_inodes(
+	struct xfs_mount	*mp)
+{
+	xfs_log_force(mp, XFS_LOG_SYNC);
+	xfs_extent_busy_wait_all(mp);
+	flush_workqueue(xfs_discard_wq);
+
+	mp->m_flags |= XFS_MOUNT_UNMOUNTING;
+
+	xfs_ail_push_all_sync(mp->m_ail);
+	cancel_delayed_work_sync(&mp->m_reclaim_work);
+	xfs_reclaim_inodes(mp, SYNC_WAIT);
+	xfs_health_unmount(mp);
+}
+
 /*
  * This function does the following on an initial mount of a file system:
  *	- reads the superblock from disk and init the mount struct
@@ -1047,7 +1088,7 @@ xfs_mountfs(
 	/* Clean out dquots that might be in memory after quotacheck. */
 	xfs_qm_unmount(mp);
 	/*
-	 * Cancel all delayed reclaim work and reclaim the inodes directly.
+	 * Flush all inode reclamation work and flush the log.
 	 * We have to do this /after/ rtunmount and qm_unmount because those
 	 * two will have scheduled delayed reclaim for the rt/quota inodes.
 	 *
@@ -1057,11 +1098,8 @@ xfs_mountfs(
 	 * qm_unmount_quotas and therefore rely on qm_unmount to release the
 	 * quota inodes.
 	 */
-	cancel_delayed_work_sync(&mp->m_reclaim_work);
-	xfs_reclaim_inodes(mp, SYNC_WAIT);
-	xfs_health_unmount(mp);
+	xfs_unmount_flush_inodes(mp);
  out_log_dealloc:
-	mp->m_flags |= XFS_MOUNT_UNMOUNTING;
 	xfs_log_mount_cancel(mp);
  out_fail_wait:
 	if (mp->m_logdev_targp && mp->m_logdev_targp != mp->m_ddev_targp)
@@ -1102,47 +1140,7 @@ xfs_unmountfs(
 	xfs_rtunmount_inodes(mp);
 	xfs_irele(mp->m_rootip);
 
-	/*
-	 * We can potentially deadlock here if we have an inode cluster
-	 * that has been freed has its buffer still pinned in memory because
-	 * the transaction is still sitting in a iclog. The stale inodes
-	 * on that buffer will have their flush locks held until the
-	 * transaction hits the disk and the callbacks run. the inode
-	 * flush takes the flush lock unconditionally and with nothing to
-	 * push out the iclog we will never get that unlocked. hence we
-	 * need to force the log first.
-	 */
-	xfs_log_force(mp, XFS_LOG_SYNC);
-
-	/*
-	 * Wait for all busy extents to be freed, including completion of
-	 * any discard operation.
-	 */
-	xfs_extent_busy_wait_all(mp);
-	flush_workqueue(xfs_discard_wq);
-
-	/*
-	 * We now need to tell the world we are unmounting. This will allow
-	 * us to detect that the filesystem is going away and we should error
-	 * out anything that we have been retrying in the background. This will
-	 * prevent neverending retries in AIL pushing from hanging the unmount.
-	 */
-	mp->m_flags |= XFS_MOUNT_UNMOUNTING;
-
-	/*
-	 * Flush all pending changes from the AIL.
-	 */
-	xfs_ail_push_all_sync(mp->m_ail);
-
-	/*
-	 * And reclaim all inodes.  At this point there should be no dirty
-	 * inodes and none should be pinned or locked, but use synchronous
-	 * reclaim just to be sure. We can stop background inode reclaim
-	 * here as well if it is still running.
-	 */
-	cancel_delayed_work_sync(&mp->m_reclaim_work);
-	xfs_reclaim_inodes(mp, SYNC_WAIT);
-	xfs_health_unmount(mp);
+	xfs_unmount_flush_inodes(mp);
 
 	xfs_qm_unmount(mp);
 
-- 
2.39.1


  parent reply	other threads:[~2023-04-12  4:29 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-12  4:26 [PATCH 5.4 00/17] xfs stable candidate patches for 5.4.y (from v5.11 & v5.12) Chandan Babu R
2023-04-12  4:26 ` [PATCH 5.4 01/17] xfs: show the proper user quota options Chandan Babu R
2023-04-12  4:26 ` [PATCH 5.4 02/17] xfs: merge the projid fields in struct xfs_icdinode Chandan Babu R
2023-04-12  4:26 ` [PATCH 5.4 03/17] xfs: ensure that the inode uid/gid match values match the icdinode ones Chandan Babu R
2023-04-12  4:26 ` [PATCH 5.4 04/17] xfs: remove the icdinode di_uid/di_gid members Chandan Babu R
2023-04-12  4:26 ` [PATCH 5.4 05/17] xfs: remove the kuid/kgid conversion wrappers Chandan Babu R
2023-04-12  4:26 ` [PATCH 5.4 06/17] xfs: add a new xfs_sb_version_has_v3inode helper Chandan Babu R
2023-04-12  4:26 ` [PATCH 5.4 07/17] xfs: only check the superblock version for dinode size calculation Chandan Babu R
2023-04-12  4:26 ` [PATCH 5.4 08/17] xfs: simplify di_flags2 inheritance in xfs_ialloc Chandan Babu R
2023-04-12  4:26 ` [PATCH 5.4 09/17] xfs: simplify a check in xfs_ioctl_setattr_check_cowextsize Chandan Babu R
2023-04-12  4:29   ` kernel test robot
2023-04-12  4:36     ` Chandan Babu R
2023-04-12  4:47       ` Philip Li
2023-04-12  4:50         ` Chandan Babu R
2023-04-12  5:08   ` [PATCH 5.4 V2] " Chandan Babu R
2023-04-12  4:26 ` [PATCH 5.4 10/17] xfs: remove the di_version field from struct icdinode Chandan Babu R
2023-04-12  4:26 ` [PATCH 5.4 11/17] xfs: fix up non-directory creation in SGID directories Chandan Babu R
2023-04-12  4:26 ` [PATCH 5.4 12/17] xfs: set inode size after creating symlink Chandan Babu R
2023-04-12  4:26 ` [PATCH 5.4 13/17] xfs: report corruption only as a regular error Chandan Babu R
2023-04-12  4:26 ` [PATCH 5.4 14/17] xfs: shut down the filesystem if we screw up quota reservation Chandan Babu R
2023-04-12  4:26 ` [PATCH 5.4 15/17] xfs: consider shutdown in bmapbt cursor delete assert Chandan Babu R
2023-04-12  4:26 ` [PATCH 5.4 16/17] xfs: don't reuse busy extents on extent trim Chandan Babu R
2023-04-12  4:26 ` Chandan Babu R [this message]
2023-04-18  9:49 ` [PATCH 5.4 00/17] xfs stable candidate patches for 5.4.y (from v5.11 & v5.12) Greg KH
2023-04-18 10:53   ` Greg KH
2023-04-18 12:39     ` Chandan Babu R

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230412042624.600511-18-chandan.babu@oracle.com \
    --to=chandan.babu@oracle.com \
    --cc=amir73il@gmail.com \
    --cc=djwong@kernel.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=leah.rumancik@gmail.com \
    --cc=linux-xfs@vger.kernel.org \
    --cc=mcgrof@kernel.org \
    --cc=sashal@kernel.org \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox