From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, "Darrick J. Wong" <djwong@kernel.org>,
Dave Chinner <dchinner@redhat.com>,
Chandan Babu R <chandan.babu@oracle.com>,
Amir Goldstein <amir73il@gmail.com>
Subject: [PATCH 5.10 09/12] xfs: remove all COW fork extents when remounting readonly
Date: Thu, 30 Jun 2022 15:47:14 +0200 [thread overview]
Message-ID: <20220630133230.967036087@linuxfoundation.org> (raw)
In-Reply-To: <20220630133230.676254336@linuxfoundation.org>
From: "Darrick J. Wong" <djwong@kernel.org>
commit 089558bc7ba785c03815a49c89e28ad9b8de51f9 upstream.
[backport xfs_icwalk -> xfs_eofblocks for 5.10.y]
As part of multiple customer escalations due to file data corruption
after copy on write operations, I wrote some fstests that use fsstress
to hammer on COW to shake things loose. Regrettably, I caught some
filesystem shutdowns due to incorrect rmap operations with the following
loop:
mount <filesystem> # (0)
fsstress <run only readonly ops> & # (1)
while true; do
fsstress <run all ops>
mount -o remount,ro # (2)
fsstress <run only readonly ops>
mount -o remount,rw # (3)
done
When (2) happens, notice that (1) is still running. xfs_remount_ro will
call xfs_blockgc_stop to walk the inode cache to free all the COW
extents, but the blockgc mechanism races with (1)'s reader threads to
take IOLOCKs and loses, which means that it doesn't clean them all out.
Call such a file (A).
When (3) happens, xfs_remount_rw calls xfs_reflink_recover_cow, which
walks the ondisk refcount btree and frees any COW extent that it finds.
This function does not check the inode cache, which means that incore
COW forks of inode (A) is now inconsistent with the ondisk metadata. If
one of those former COW extents are allocated and mapped into another
file (B) and someone triggers a COW to the stale reservation in (A), A's
dirty data will be written into (B) and once that's done, those blocks
will be transferred to (A)'s data fork without bumping the refcount.
The results are catastrophic -- file (B) and the refcount btree are now
corrupt. Solve this race by forcing the xfs_blockgc_free_space to run
synchronously, which causes xfs_icwalk to return to inodes that were
skipped because the blockgc code couldn't take the IOLOCK. This is safe
to do here because the VFS has already prohibited new writer threads.
Fixes: 10ddf64e420f ("xfs: remove leftover CoW reservations when remounting ro")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
fs/xfs/xfs_super.c | 14 +++++++++++---
1 file changed, 11 insertions(+), 3 deletions(-)
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1695,7 +1695,10 @@ static int
xfs_remount_ro(
struct xfs_mount *mp)
{
- int error;
+ struct xfs_eofblocks eofb = {
+ .eof_flags = XFS_EOF_FLAGS_SYNC,
+ };
+ int error;
/*
* Cancel background eofb scanning so it cannot race with the final
@@ -1703,8 +1706,13 @@ xfs_remount_ro(
*/
xfs_stop_block_reaping(mp);
- /* Get rid of any leftover CoW reservations... */
- error = xfs_icache_free_cowblocks(mp, NULL);
+ /*
+ * Clear out all remaining COW staging extents and speculative post-EOF
+ * preallocations so that we don't leave inodes requiring inactivation
+ * cleanups during reclaim on a read-only mount. We must process every
+ * cached inode, so this requires a synchronous cache scan.
+ */
+ error = xfs_icache_free_cowblocks(mp, &eofb);
if (error) {
xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
return error;
next prev parent reply other threads:[~2022-06-30 14:08 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-06-30 13:47 [PATCH 5.10 00/12] 5.10.128-rc1 review Greg Kroah-Hartman
2022-06-30 13:47 ` [PATCH 5.10 01/12] MAINTAINERS: add Amir as xfs maintainer for 5.10.y Greg Kroah-Hartman
2022-06-30 13:47 ` [PATCH 5.10 02/12] drm: remove drm_fb_helper_modinit Greg Kroah-Hartman
2022-06-30 13:47 ` [PATCH 5.10 03/12] tick/nohz: unexport __init-annotated tick_nohz_full_setup() Greg Kroah-Hartman
2022-06-30 13:47 ` [PATCH 5.10 04/12] clocksource/drivers/ixp4xx: remove __init from ixp4xx_timer_setup() Greg Kroah-Hartman
2022-06-30 13:47 ` [PATCH 5.10 05/12] bcache: memset on stack variables in bch_btree_check() and bch_sectors_dirty_init() Greg Kroah-Hartman
2022-06-30 13:47 ` [PATCH 5.10 06/12] xfs: use kmem_cache_free() for kmem_cache objects Greg Kroah-Hartman
2022-06-30 13:47 ` [PATCH 5.10 07/12] xfs: punch out data fork delalloc blocks on COW writeback failure Greg Kroah-Hartman
2022-06-30 13:47 ` [PATCH 5.10 08/12] xfs: Fix the free logic of state in xfs_attr_node_hasname Greg Kroah-Hartman
2022-06-30 13:47 ` Greg Kroah-Hartman [this message]
2022-06-30 13:47 ` [PATCH 5.10 10/12] xfs: check sb_meta_uuid for dabuf buffer recovery Greg Kroah-Hartman
2022-06-30 13:47 ` [PATCH 5.10 11/12] powerpc/ftrace: Remove ftrace init tramp once kernel init is complete Greg Kroah-Hartman
2022-06-30 13:47 ` [PATCH 5.10 12/12] net: mscc: ocelot: allow unregistered IP multicast flooding Greg Kroah-Hartman
2022-06-30 17:09 ` [PATCH 5.10 00/12] 5.10.128-rc1 review Jon Hunter
2022-06-30 23:03 ` Florian Fainelli
2022-06-30 23:15 ` Shuah Khan
2022-07-01 0:57 ` Guenter Roeck
2022-07-01 3:16 ` Samuel Zou
2022-07-01 6:23 ` Naresh Kamboju
2022-07-01 10:34 ` Sudip Mukherjee
2022-07-01 18:47 ` Pavel Machek
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20220630133230.967036087@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=amir73il@gmail.com \
--cc=chandan.babu@oracle.com \
--cc=dchinner@redhat.com \
--cc=djwong@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).