From: "Darrick J. Wong" <djwong@kernel.org>
To: Christoph Hellwig <hch@lst.de>, Dave Chinner <dchinner@redhat.com>
Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
david@fromorbit.com, hch@infradead.org
Subject: [PATCH 15/14] fstest: regression test for writeback corruption bug
Date: Wed, 9 Nov 2022 10:22:50 -0800 [thread overview]
Message-ID: <Y2vv+tdWVEStVpaO@magnolia> (raw)
In-Reply-To: <166801774453.3992140.241667783932550826.stgit@magnolia>
From: Darrick J. Wong <djwong@kernel.org>
This is a regression test for a data corruption bug that existed in XFS'
copy on write code between 4.9 and 4.19. The root cause is a
concurrency bug wherein we would drop ILOCK_SHARED after querying the
CoW fork in xfs_map_cow and retake it before querying the data fork in
xfs_map_blocks. See the test description for a lot more details.
Cc: Wengang Wang <wen.gang.wang@oracle.com>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
tests/xfs/924 | 197 +++++++++++++++++++++++++++++++++++++++++++++++++++++
tests/xfs/924.out | 2 +
2 files changed, 199 insertions(+)
create mode 100755 tests/xfs/924
create mode 100644 tests/xfs/924.out
diff --git a/tests/xfs/924 b/tests/xfs/924
new file mode 100755
index 0000000000..486afefedd
--- /dev/null
+++ b/tests/xfs/924
@@ -0,0 +1,197 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2022 Oracle. All Rights Reserved.
+#
+# FS QA Test 924
+#
+# This is a regression test for a data corruption bug that existed in XFS' copy
+# on write code between 4.9 and 4.19. The root cause is a concurrency bug
+# wherein we would drop ILOCK_SHARED after querying the CoW fork in xfs_map_cow
+# and retake it before querying the data fork in xfs_map_blocks. If a second
+# thread changes the CoW fork mappings between the two calls, it's possible for
+# xfs_map_blocks to return a zero-block mapping, which results in writeback
+# being elided for that block. Elided writeback of dirty data results in
+# silent loss of writes.
+#
+# Worse yet, kernels from that era still used buffer heads, which means that an
+# elided writeback leaves the page clean but the bufferheads dirty. Due to a
+# naïve optimization in mark_buffer_dirty, the SetPageDirty call is elided if
+# the bufferhead is dirty, which means that a subsequent rewrite of the data
+# block will never result in the page being marked dirty, and all subsequent
+# writes are lost.
+#
+# It turns out that Christoph Hellwig unwittingly fixed the race in commit
+# 5c665e5b5af6 ("xfs: remove xfs_map_cow"), and no testcase was ever written.
+# Four years later, we hit it on a production 4.14 kernel. This testcase
+# relies on a debugging knob that introduces artificial delays into writeback.
+#
+# Before the race, the file blocks 0-1 are not shared and blocks 2-5 are
+# shared. There are no extents in CoW fork.
+#
+# Two threads race like this:
+#
+# Thread 1 (writeback block 0) | Thread 2 (write to block 2)
+# ---------------------------------|--------------------------------
+# |
+# 1. Check if block 0 in CoW fork |
+# from xfs_map_cow. |
+# |
+# 2. Block 0 not found in CoW |
+# fork; the block is considered |
+# not shared. |
+# |
+# 3. xfs_map_blocks looks up data |
+# fork to get a map covering |
+# block 0. |
+# |
+# 4. It gets a data fork mapping |
+# for block 0 with length 2. |
+# |
+# | 1. A buffered write to block 2 sees
+# | that it is a shared block and no
+# | extent covers block 2 in CoW fork.
+# |
+# | It creates a new CoW fork mapping.
+# | Due to the cowextsize, the new
+# | extent starts at block 0 with
+# | length 128.
+# |
+# |
+# 5. It lookup CoW fork again to |
+# trim the map (0, 2) to a |
+# shared block boundary. |
+# |
+# 5a. It finds (0, 128) in CoW fork|
+# 5b. It trims the data fork map |
+# from (0, 1) to (0, 0) (!!!) |
+# |
+# 6. The xfs_imap_valid call after |
+# the xfs_map_blocks call checks|
+# if the mapping (0, 0) covers |
+# block 0. The result is "NO". |
+# |
+# 7. Since block 0 has no physical |
+# block mapped, it's not added |
+# to the ioend. This is the |
+# first problem. |
+# |
+# 8. xfs_add_to_ioend usually |
+# clears the bufferhead dirty |
+# flag Because this is skipped,|
+# we leave the page clean with |
+# the associated buffer head(s) |
+# dirty (the second problem). |
+# Now the dirty state is |
+# inconsistent.
+#
+# On newer kernels, this is also a functionality test for the ifork sequence
+# counter because the writeback completions will change the data fork and force
+# revalidations of the wb mapping.
+#
+. ./common/preamble
+_begin_fstest auto quick clone
+
+# Import common functions.
+. ./common/reflink
+. ./common/inject
+
+# real QA test starts here
+
+# Modify as appropriate.
+_supported_fs xfs
+_fixed_by_kernel_commit 5c665e5b5af6 "xfs: remove xfs_map_cow"
+_require_error_injection
+_require_scratch_reflink
+_require_cp_reflink
+
+_scratch_mkfs >> $seqres.full
+_scratch_mount >> $seqres.full
+
+knob="$(_find_xfs_mountdev_errortag_knob $SCRATCH_DEV "wb_delay_ms")"
+test -w "$knob" || _notrun "Kernel does not have wb_delay_ms error injector"
+
+blksz=65536
+_require_congruent_file_oplen $SCRATCH_MNT $blksz
+
+# Make sure we have sufficient extent size to create speculative CoW
+# preallocations.
+$XFS_IO_PROG -c 'cowextsize 1m' $SCRATCH_MNT
+
+# Write out a file with the first two blocks unshared and the rest shared.
+_pwrite_byte 0x59 0 $((160 * blksz)) $SCRATCH_MNT/file >> $seqres.full
+_pwrite_byte 0x59 0 $((160 * blksz)) $SCRATCH_MNT/file.compare >> $seqres.full
+sync
+
+_cp_reflink $SCRATCH_MNT/file $SCRATCH_MNT/file.reflink
+
+_pwrite_byte 0x58 0 $((2 * blksz)) $SCRATCH_MNT/file >> $seqres.full
+_pwrite_byte 0x58 0 $((2 * blksz)) $SCRATCH_MNT/file.compare >> $seqres.full
+sync
+
+# Avoid creation of large folios on newer kernels by cycling the mount and
+# immediately writing to the page cache.
+_scratch_cycle_mount
+
+# Write the same data to file.compare as we're about to do to file. Do this
+# before slowing down writeback to avoid unnecessary delay.
+_pwrite_byte 0x57 0 $((2 * blksz)) $SCRATCH_MNT/file.compare >> $seqres.full
+_pwrite_byte 0x56 $((2 * blksz)) $((2 * blksz)) $SCRATCH_MNT/file.compare >> $seqres.full
+sync
+
+# Introduce a half-second wait to each writeback block mapping call. This
+# gives us a chance to race speculative cow prealloc with writeback.
+wb_delay=500
+echo $wb_delay > $knob
+curval="$(cat $knob)"
+test "$curval" -eq $wb_delay || echo "expected wb_delay_ms == $wb_delay"
+
+# Start thread 1 + writeback above
+$XFS_IO_PROG -c "pwrite -S 0x57 0 $((2 * blksz))" \
+ -c 'bmap -celpv' -c 'bmap -elpv' \
+ -c 'fsync' $SCRATCH_MNT/file >> $seqres.full &
+sleep 1
+
+# Start a sentry to look for evidence of the XFS_ERRORTAG_REPORT logging. If
+# we see that, we know we've forced writeback to revalidate a mapping. The
+# test has been successful, so turn off the delay.
+sentryfile=$TEST_DIR/$seq.sentry
+wait_for_errortag() {
+ while [ -e "$sentryfile" ]; do
+ if _check_dmesg_for XFS_ERRTAG_WB_DELAY_MS; then
+ echo 0 > $knob
+ break;
+ fi
+ sleep 1
+ done
+}
+touch $sentryfile
+wait_for_errortag &
+
+# Start thread 2 to create the cowextsize reservation
+$XFS_IO_PROG -c "pwrite -S 0x56 $((2 * blksz)) $((2 * blksz))" \
+ -c 'bmap -celpv' -c 'bmap -elpv' \
+ -c 'fsync' $SCRATCH_MNT/file >> $seqres.full
+rm -f $sentryfile
+
+_check_dmesg_for XFS_ERRTAG_WB_DELAY_MS
+saw_delay=$?
+
+# Flush everything to disk. If the bug manifests, then after the cycle,
+# file should have stale 0x58 in block 0 because we silently dropped a write.
+_scratch_cycle_mount
+
+if ! cmp -s $SCRATCH_MNT/file $SCRATCH_MNT/file.compare; then
+ echo file and file.compare do not match
+ $XFS_IO_PROG -c 'bmap -celpv' -c 'bmap -elpv' $SCRATCH_MNT/file >> $seqres.full
+ echo file.compare
+ od -tx1 -Ad -c $SCRATCH_MNT/file.compare
+ echo file
+ od -tx1 -Ad -c $SCRATCH_MNT/file
+elif [ $saw_delay -ne 0 ]; then
+ # The files matched, but nothing got logged about the revalidation?
+ echo "Expected to hear about XFS_ERRTAG_WB_DELAY_MS?"
+fi
+
+echo Silence is golden
+status=0
+exit
diff --git a/tests/xfs/924.out b/tests/xfs/924.out
new file mode 100644
index 0000000000..c6655da35a
--- /dev/null
+++ b/tests/xfs/924.out
@@ -0,0 +1,2 @@
+QA output created by 924
+Silence is golden
next prev parent reply other threads:[~2022-11-09 18:22 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-11-09 18:15 [PATCHSET RFCRAP v2 00/14] xfs, iomap: fix data corruption due to stale cached iomaps Darrick J. Wong
2022-11-09 18:15 ` [PATCH 01/14] xfs: write page faults in iomap are not buffered writes Darrick J. Wong
2022-11-09 18:15 ` [PATCH 02/14] xfs: punching delalloc extents on write failure is racy Darrick J. Wong
2022-11-09 18:16 ` [PATCH 03/14] xfs: use byte ranges for write cleanup ranges Darrick J. Wong
2022-11-09 18:16 ` [PATCH 04/14] xfs: buffered write failure should not truncate the page cache Darrick J. Wong
2022-11-15 8:32 ` Christoph Hellwig
2022-11-09 18:16 ` [PATCH 05/14] iomap: write iomap validity checks Darrick J. Wong
2022-11-09 18:16 ` [PATCH 06/14] xfs: use iomap_valid method to detect stale cached iomaps Darrick J. Wong
2022-11-09 18:16 ` [PATCH 07/14] xfs: drop write error injection is unfixable, remove it Darrick J. Wong
2022-11-09 18:16 ` [PATCH 08/14] iomap: pass iter to ->iomap_begin implementations Darrick J. Wong
2022-11-15 8:36 ` Christoph Hellwig
2022-11-09 18:16 ` [PATCH 09/14] iomap: pass iter to ->iomap_end implementations Darrick J. Wong
2022-11-09 18:16 ` [PATCH 10/14] iomap: pass a private pointer to iomap_file_buffered_write Darrick J. Wong
2022-11-15 8:37 ` Christoph Hellwig
2022-11-09 18:16 ` [PATCH 11/14] xfs: move the seq counters for buffered writes to a private struct Darrick J. Wong
2022-11-15 8:38 ` Christoph Hellwig
2022-11-09 18:16 ` [PATCH 12/14] xfs: validate COW fork sequence counters during buffered writes Darrick J. Wong
2022-11-09 18:16 ` [PATCH 13/14] xfs: add debug knob to slow down writeback for fun Darrick J. Wong
2022-11-09 18:17 ` [PATCH 14/14] xfs: add debug knob to slow down write " Darrick J. Wong
2022-11-09 18:22 ` Darrick J. Wong [this message]
2022-11-09 18:23 ` [PATCH 16/14] fstest: regression test for writes racing with reclaim writeback Darrick J. Wong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Y2vv+tdWVEStVpaO@magnolia \
--to=djwong@kernel.org \
--cc=david@fromorbit.com \
--cc=dchinner@redhat.com \
--cc=hch@infradead.org \
--cc=hch@lst.de \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox