Archive-only list for patches
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	patches@lists.linux.dev,
	shrikanth hegde <sshegde@linux.vnet.ibm.com>,
	"Darrick J. Wong" <djwong@kernel.org>,
	Dave Chinner <dchinner@redhat.com>,
	Leah Rumancik <leah.rumancik@gmail.com>,
	Chandan Babu R <chandanbabu@kernel.org>,
	Ritesh Harjani <ritesh.list@gmail.com>
Subject: [PATCH 6.1 49/73] xfs: load uncached unlinked inodes into memory on demand
Date: Fri, 27 Sep 2024 14:24:00 +0200	[thread overview]
Message-ID: <20240927121721.888174191@linuxfoundation.org> (raw)
In-Reply-To: <20240927121719.897851549@linuxfoundation.org>

6.1-stable review patch.  If anyone has any objections, please let me know.

------------------

From: "Darrick J. Wong" <djwong@kernel.org>

[ Upstream commit 68b957f64fca1930164bfc6d6d379acdccd547d7 ]

shrikanth hegde reports that filesystems fail shortly after mount with
the following failure:

	WARNING: CPU: 56 PID: 12450 at fs/xfs/xfs_inode.c:1839 xfs_iunlink_lookup+0x58/0x80 [xfs]

This of course is the WARN_ON_ONCE in xfs_iunlink_lookup:

	ip = radix_tree_lookup(&pag->pag_ici_root, agino);
	if (WARN_ON_ONCE(!ip || !ip->i_ino)) { ... }

>From diagnostic data collected by the bug reporters, it would appear
that we cleanly mounted a filesystem that contained unlinked inodes.
Unlinked inodes are only processed as a final step of log recovery,
which means that clean mounts do not process the unlinked list at all.

Prior to the introduction of the incore unlinked lists, this wasn't a
problem because the unlink code would (very expensively) traverse the
entire ondisk metadata iunlink chain to keep things up to date.
However, the incore unlinked list code complains when it realizes that
it is out of sync with the ondisk metadata and shuts down the fs, which
is bad.

Ritesh proposed to solve this problem by unconditionally parsing the
unlinked lists at mount time, but this imposes a mount time cost for
every filesystem to catch something that should be very infrequent.
Instead, let's target the places where we can encounter a next_unlinked
pointer that refers to an inode that is not in cache, and load it into
cache.

Note: This patch does not address the problem of iget loading an inode
from the middle of the iunlink list and needing to set i_prev_unlinked
correctly.

Reported-by: shrikanth hegde <sshegde@linux.vnet.ibm.com>
Triaged-by: Ritesh Harjani <ritesh.list@gmail.com>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Acked-by: Chandan Babu R <chandanbabu@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/xfs/xfs_inode.c |   80 +++++++++++++++++++++++++++++++++++++++++++++++++----
 fs/xfs/xfs_trace.h |   25 ++++++++++++++++
 2 files changed, 100 insertions(+), 5 deletions(-)

--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1829,12 +1829,17 @@ xfs_iunlink_lookup(
 
 	rcu_read_lock();
 	ip = radix_tree_lookup(&pag->pag_ici_root, agino);
+	if (!ip) {
+		/* Caller can handle inode not being in memory. */
+		rcu_read_unlock();
+		return NULL;
+	}
 
 	/*
-	 * Inode not in memory or in RCU freeing limbo should not happen.
-	 * Warn about this and let the caller handle the failure.
+	 * Inode in RCU freeing limbo should not happen.  Warn about this and
+	 * let the caller handle the failure.
 	 */
-	if (WARN_ON_ONCE(!ip || !ip->i_ino)) {
+	if (WARN_ON_ONCE(!ip->i_ino)) {
 		rcu_read_unlock();
 		return NULL;
 	}
@@ -1843,7 +1848,10 @@ xfs_iunlink_lookup(
 	return ip;
 }
 
-/* Update the prev pointer of the next agino. */
+/*
+ * Update the prev pointer of the next agino.  Returns -ENOLINK if the inode
+ * is not in cache.
+ */
 static int
 xfs_iunlink_update_backref(
 	struct xfs_perag	*pag,
@@ -1858,7 +1866,8 @@ xfs_iunlink_update_backref(
 
 	ip = xfs_iunlink_lookup(pag, next_agino);
 	if (!ip)
-		return -EFSCORRUPTED;
+		return -ENOLINK;
+
 	ip->i_prev_unlinked = prev_agino;
 	return 0;
 }
@@ -1902,6 +1911,62 @@ xfs_iunlink_update_bucket(
 	return 0;
 }
 
+/*
+ * Load the inode @next_agino into the cache and set its prev_unlinked pointer
+ * to @prev_agino.  Caller must hold the AGI to synchronize with other changes
+ * to the unlinked list.
+ */
+STATIC int
+xfs_iunlink_reload_next(
+	struct xfs_trans	*tp,
+	struct xfs_buf		*agibp,
+	xfs_agino_t		prev_agino,
+	xfs_agino_t		next_agino)
+{
+	struct xfs_perag	*pag = agibp->b_pag;
+	struct xfs_mount	*mp = pag->pag_mount;
+	struct xfs_inode	*next_ip = NULL;
+	xfs_ino_t		ino;
+	int			error;
+
+	ASSERT(next_agino != NULLAGINO);
+
+#ifdef DEBUG
+	rcu_read_lock();
+	next_ip = radix_tree_lookup(&pag->pag_ici_root, next_agino);
+	ASSERT(next_ip == NULL);
+	rcu_read_unlock();
+#endif
+
+	xfs_info_ratelimited(mp,
+ "Found unrecovered unlinked inode 0x%x in AG 0x%x.  Initiating recovery.",
+			next_agino, pag->pag_agno);
+
+	/*
+	 * Use an untrusted lookup just to be cautious in case the AGI has been
+	 * corrupted and now points at a free inode.  That shouldn't happen,
+	 * but we'd rather shut down now since we're already running in a weird
+	 * situation.
+	 */
+	ino = XFS_AGINO_TO_INO(mp, pag->pag_agno, next_agino);
+	error = xfs_iget(mp, tp, ino, XFS_IGET_UNTRUSTED, 0, &next_ip);
+	if (error)
+		return error;
+
+	/* If this is not an unlinked inode, something is very wrong. */
+	if (VFS_I(next_ip)->i_nlink != 0) {
+		error = -EFSCORRUPTED;
+		goto rele;
+	}
+
+	next_ip->i_prev_unlinked = prev_agino;
+	trace_xfs_iunlink_reload_next(next_ip);
+rele:
+	ASSERT(!(VFS_I(next_ip)->i_state & I_DONTCACHE));
+	xfs_irele(next_ip);
+	return error;
+}
+
 static int
 xfs_iunlink_insert_inode(
 	struct xfs_trans	*tp,
@@ -1933,6 +1998,8 @@ xfs_iunlink_insert_inode(
 	 * inode.
 	 */
 	error = xfs_iunlink_update_backref(pag, agino, next_agino);
+	if (error == -ENOLINK)
+		error = xfs_iunlink_reload_next(tp, agibp, agino, next_agino);
 	if (error)
 		return error;
 
@@ -2027,6 +2094,9 @@ xfs_iunlink_remove_inode(
 	 */
 	error = xfs_iunlink_update_backref(pag, ip->i_prev_unlinked,
 			ip->i_next_unlinked);
+	if (error == -ENOLINK)
+		error = xfs_iunlink_reload_next(tp, agibp, ip->i_prev_unlinked,
+				ip->i_next_unlinked);
 	if (error)
 		return error;
 
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -3679,6 +3679,31 @@ TRACE_EVENT(xfs_iunlink_update_dinode,
 		  __entry->new_ptr)
 );
 
+TRACE_EVENT(xfs_iunlink_reload_next,
+	TP_PROTO(struct xfs_inode *ip),
+	TP_ARGS(ip),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_agnumber_t, agno)
+		__field(xfs_agino_t, agino)
+		__field(xfs_agino_t, prev_agino)
+		__field(xfs_agino_t, next_agino)
+	),
+	TP_fast_assign(
+		__entry->dev = ip->i_mount->m_super->s_dev;
+		__entry->agno = XFS_INO_TO_AGNO(ip->i_mount, ip->i_ino);
+		__entry->agino = XFS_INO_TO_AGINO(ip->i_mount, ip->i_ino);
+		__entry->prev_agino = ip->i_prev_unlinked;
+		__entry->next_agino = ip->i_next_unlinked;
+	),
+	TP_printk("dev %d:%d agno 0x%x agino 0x%x prev_unlinked 0x%x next_unlinked 0x%x",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->agno,
+		  __entry->agino,
+		  __entry->prev_agino,
+		  __entry->next_agino)
+);
+
 DECLARE_EVENT_CLASS(xfs_ag_inode_class,
 	TP_PROTO(struct xfs_inode *ip),
 	TP_ARGS(ip),



  parent reply	other threads:[~2024-09-27 12:32 UTC|newest]

Thread overview: 83+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-09-27 12:23 [PATCH 6.1 00/73] 6.1.112-rc1 review Greg Kroah-Hartman
2024-09-27 12:23 ` [PATCH 6.1 01/73] ASoC: SOF: mediatek: Add missing board compatible Greg Kroah-Hartman
2024-09-27 12:23 ` [PATCH 6.1 02/73] ASoC: allow module autoloading for table db1200_pids Greg Kroah-Hartman
2024-09-27 12:23 ` [PATCH 6.1 03/73] ASoC: allow module autoloading for table board_ids Greg Kroah-Hartman
2024-09-27 12:23 ` [PATCH 6.1 04/73] ALSA: hda/realtek - Fixed ALC256 headphone no sound Greg Kroah-Hartman
2024-09-27 12:23 ` [PATCH 6.1 05/73] ALSA: hda/realtek - FIxed ALC285 " Greg Kroah-Hartman
2024-09-27 12:23 ` [PATCH 6.1 06/73] scsi: lpfc: Fix overflow build issue Greg Kroah-Hartman
2024-09-27 12:23 ` [PATCH 6.1 07/73] pinctrl: at91: make it work with current gpiolib Greg Kroah-Hartman
2024-09-27 12:23 ` [PATCH 6.1 08/73] hwmon: (asus-ec-sensors) remove VRM temp X570-E GAMING Greg Kroah-Hartman
2024-09-27 12:23 ` [PATCH 6.1 09/73] microblaze: dont treat zero reserved memory regions as error Greg Kroah-Hartman
2024-09-27 12:23 ` [PATCH 6.1 10/73] net: ftgmac100: Ensure tx descriptor updates are visible Greg Kroah-Hartman
2024-09-27 12:23 ` [PATCH 6.1 11/73] LoongArch: Define ARCH_IRQ_INIT_FLAGS as IRQ_NOPROBE Greg Kroah-Hartman
2024-09-27 12:23 ` [PATCH 6.1 12/73] wifi: iwlwifi: lower message level for FW buffer destination Greg Kroah-Hartman
2024-09-27 12:23 ` [PATCH 6.1 13/73] wifi: iwlwifi: mvm: fix iwl_mvm_scan_fits() calculation Greg Kroah-Hartman
2024-09-27 12:23 ` [PATCH 6.1 14/73] wifi: iwlwifi: mvm: pause TCM when the firmware is stopped Greg Kroah-Hartman
2024-09-27 12:23 ` [PATCH 6.1 15/73] wifi: iwlwifi: mvm: dont wait for tx queues if firmware is dead Greg Kroah-Hartman
2024-09-27 12:23 ` [PATCH 6.1 16/73] wifi: mac80211: free skb on error path in ieee80211_beacon_get_ap() Greg Kroah-Hartman
2024-09-27 12:23 ` [PATCH 6.1 17/73] wifi: iwlwifi: clear trans->state earlier upon error Greg Kroah-Hartman
2024-09-27 12:23 ` [PATCH 6.1 18/73] can: mcp251xfd: mcp251xfd_ring_init(): check TX-coalescing configuration Greg Kroah-Hartman
2024-09-27 12:23 ` [PATCH 6.1 19/73] ASoC: Intel: soc-acpi-cht: Make Lenovo Yoga Tab 3 X90F DMI match less strict Greg Kroah-Hartman
2024-09-27 12:23 ` [PATCH 6.1 20/73] ASoC: intel: fix module autoloading Greg Kroah-Hartman
2024-09-27 12:23 ` [PATCH 6.1 21/73] ASoC: tda7419: " Greg Kroah-Hartman
2024-09-27 12:23 ` [PATCH 6.1 22/73] spi: spidev: Add an entry for elgin,jg10309-01 Greg Kroah-Hartman
2024-09-27 12:23 ` [PATCH 6.1 23/73] drm: komeda: Fix an issue related to normalized zpos Greg Kroah-Hartman
2024-09-27 12:23 ` [PATCH 6.1 24/73] spi: bcm63xx: Enable module autoloading Greg Kroah-Hartman
2024-09-27 12:23 ` [PATCH 6.1 25/73] smb: client: fix hang in wait_for_response() for negproto Greg Kroah-Hartman
2024-09-27 12:23 ` [PATCH 6.1 26/73] x86/hyperv: Set X86_FEATURE_TSC_KNOWN_FREQ when Hyper-V provides frequency Greg Kroah-Hartman
2024-09-27 12:23 ` [PATCH 6.1 27/73] tools: hv: rm .*.cmd when make clean Greg Kroah-Hartman
2024-09-27 12:23 ` [PATCH 6.1 28/73] block: Fix where bio IO priority gets set Greg Kroah-Hartman
2024-09-27 12:23 ` [PATCH 6.1 29/73] spi: spidev: Add missing spi_device_id for jg10309-01 Greg Kroah-Hartman
2024-09-27 12:23 ` [PATCH 6.1 30/73] ocfs2: add bounds checking to ocfs2_xattr_find_entry() Greg Kroah-Hartman
2024-09-27 12:23 ` [PATCH 6.1 31/73] ocfs2: strict bound check before memcmp in ocfs2_xattr_find_entry() Greg Kroah-Hartman
2024-09-27 12:23 ` [PATCH 6.1 32/73] xfs: dquot shrinker doesnt check for XFS_DQFLAG_FREEING Greg Kroah-Hartman
2024-09-27 12:23 ` [PATCH 6.1 33/73] xfs: Fix deadlock on xfs_inodegc_worker Greg Kroah-Hartman
2024-09-27 12:23 ` [PATCH 6.1 34/73] xfs: fix extent busy updating Greg Kroah-Hartman
2024-09-27 12:23 ` [PATCH 6.1 35/73] xfs: dont use BMBT btree split workers for IO completion Greg Kroah-Hartman
2024-09-27 12:23 ` [PATCH 6.1 36/73] xfs: fix low space alloc deadlock Greg Kroah-Hartman
2024-09-27 12:23 ` [PATCH 6.1 37/73] xfs: prefer free inodes at ENOSPC over chunk allocation Greg Kroah-Hartman
2024-09-27 12:23 ` [PATCH 6.1 38/73] xfs: block reservation too large for minleft allocation Greg Kroah-Hartman
2024-09-27 12:23 ` [PATCH 6.1 39/73] xfs: fix uninitialized variable access Greg Kroah-Hartman
2024-09-27 12:23 ` [PATCH 6.1 40/73] xfs: quotacheck failure can race with background inode inactivation Greg Kroah-Hartman
2024-09-27 12:23 ` [PATCH 6.1 41/73] xfs: fix BUG_ON in xfs_getbmap() Greg Kroah-Hartman
2024-09-27 12:23 ` [PATCH 6.1 42/73] xfs: buffer pins need to hold a buffer reference Greg Kroah-Hartman
2024-09-27 12:23 ` [PATCH 6.1 43/73] xfs: defered work could create precommits Greg Kroah-Hartman
2024-09-27 12:23 ` [PATCH 6.1 44/73] xfs: fix AGF vs inode cluster buffer deadlock Greg Kroah-Hartman
2024-09-27 12:23 ` [PATCH 6.1 45/73] xfs: collect errors from inodegc for unlinked inode recovery Greg Kroah-Hartman
2024-09-27 12:23 ` [PATCH 6.1 46/73] xfs: fix ag count overflow during growfs Greg Kroah-Hartman
2024-09-27 12:23 ` [PATCH 6.1 47/73] xfs: remove WARN when dquot cache insertion fails Greg Kroah-Hartman
2024-09-27 12:23 ` [PATCH 6.1 48/73] xfs: fix the calculation for "end" and "length" Greg Kroah-Hartman
2024-09-27 12:24 ` Greg Kroah-Hartman [this message]
2024-09-27 12:24 ` [PATCH 6.1 50/73] xfs: fix negative array access in xfs_getbmap Greg Kroah-Hartman
2024-09-27 12:24 ` [PATCH 6.1 51/73] xfs: fix unlink vs cluster buffer instantiation race Greg Kroah-Hartman
2024-09-27 12:24 ` [PATCH 6.1 52/73] xfs: correct calculation for agend and blockcount Greg Kroah-Hartman
2024-09-27 12:24 ` [PATCH 6.1 53/73] xfs: use i_prev_unlinked to distinguish inodes that are not on the unlinked list Greg Kroah-Hartman
2024-09-27 12:24 ` [PATCH 6.1 54/73] xfs: reload entire unlinked bucket lists Greg Kroah-Hartman
2024-09-27 12:24 ` [PATCH 6.1 55/73] xfs: make inode unlinked bucket recovery work with quotacheck Greg Kroah-Hartman
2024-09-27 12:24 ` [PATCH 6.1 56/73] xfs: fix reloading entire unlinked bucket lists Greg Kroah-Hartman
2024-09-27 12:24 ` [PATCH 6.1 57/73] xfs: set bnobt/cntbt numrecs correctly when formatting new AGs Greg Kroah-Hartman
2024-09-27 12:24 ` [PATCH 6.1 58/73] xfs: journal geometry is not properly bounds checked Greg Kroah-Hartman
2024-09-27 12:24 ` [PATCH 6.1 59/73] netfilter: nft_socket: make cgroupsv2 matching work with namespaces Greg Kroah-Hartman
2024-09-27 12:24 ` [PATCH 6.1 60/73] netfilter: nft_socket: Fix a NULL vs IS_ERR() bug in nft_socket_cgroup_subtree_level() Greg Kroah-Hartman
2024-09-27 12:24 ` [PATCH 6.1 61/73] netfilter: nft_set_pipapo: walk over current view on netlink dump Greg Kroah-Hartman
2024-09-27 12:24 ` [PATCH 6.1 62/73] netfilter: nf_tables: missing iterator type in lookup walk Greg Kroah-Hartman
2024-09-27 12:24 ` [PATCH 6.1 63/73] Revert "wifi: cfg80211: check wiphy mutex is held for wdev mutex" Greg Kroah-Hartman
2024-09-27 12:24 ` [PATCH 6.1 64/73] gpiolib: cdev: Ignore reconfiguration without direction Greg Kroah-Hartman
2024-09-27 12:24 ` [PATCH 6.1 65/73] gpio: prevent potential speculation leaks in gpio_device_get_desc() Greg Kroah-Hartman
2024-09-27 12:24 ` [PATCH 6.1 66/73] can: mcp251xfd: properly indent labels Greg Kroah-Hartman
2024-09-27 12:24 ` [PATCH 6.1 67/73] can: mcp251xfd: move mcp251xfd_timestamp_start()/stop() into mcp251xfd_chip_start/stop() Greg Kroah-Hartman
2024-09-27 12:24 ` [PATCH 6.1 68/73] selftests: mptcp: join: restrict fullmesh endp on 1st sf Greg Kroah-Hartman
2024-09-27 12:24 ` [PATCH 6.1 69/73] btrfs: calculate the right space for delayed refs when updating global reserve Greg Kroah-Hartman
2024-09-27 12:24 ` [PATCH 6.1 70/73] powercap: RAPL: fix invalid initialization for pl4_supported field Greg Kroah-Hartman
2024-09-27 12:24 ` [PATCH 6.1 71/73] x86/mm: Switch to new Intel CPU model defines Greg Kroah-Hartman
2024-09-27 12:24 ` [PATCH 6.1 72/73] USB: serial: pl2303: add device id for Macrosilicon MS3020 Greg Kroah-Hartman
2024-09-27 12:24 ` [PATCH 6.1 73/73] USB: usbtmc: prevent kernel-usb-infoleak Greg Kroah-Hartman
2024-09-27 15:17 ` [PATCH 6.1 00/73] 6.1.112-rc1 review Peter Schneider
2024-09-27 15:52 ` Allen
2024-09-27 18:35 ` Jon Hunter
2024-09-27 18:40 ` Florian Fainelli
2024-09-28 12:39 ` Naresh Kamboju
2024-09-28 17:13 ` Shuah Khan
2024-09-29  8:43 ` Ron Economos
2024-09-29 11:19 ` Muhammad Usama Anjum
2024-09-30  8:41 ` Pavel Machek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240927121721.888174191@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=chandanbabu@kernel.org \
    --cc=dchinner@redhat.com \
    --cc=djwong@kernel.org \
    --cc=leah.rumancik@gmail.com \
    --cc=patches@lists.linux.dev \
    --cc=ritesh.list@gmail.com \
    --cc=sshegde@linux.vnet.ibm.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox