From: "Darrick J. Wong" <djwong@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 9/9] xfs: drop write error injection is unfixable, remove it
Date: Thu, 17 Nov 2022 10:01:13 -0800 [thread overview]
Message-ID: <Y3Z26ZYSEK+rf0q4@magnolia> (raw)
In-Reply-To: <20221117055810.498014-10-david@fromorbit.com>
On Thu, Nov 17, 2022 at 04:58:10PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> With the changes to scan the page cache for dirty data to avoid data
> corruptions from partial write cleanup racing with other page cache
> operations, the drop writes error injection no longer works the same
> way it used to and causes xfs/196 to fail. This is because xfs/196
> writes to the file and populates the page cache before it turns on
> the error injection and starts failing -overwrites-.
>
> The result is that the original drop-writes code failed writes only
> -after- overwriting the data in the cache, followed by invalidates
> the cached data, then punching out the delalloc extent from under
> that data.
>
> On the surface, this looks fine. The problem is that page cache
> invalidation *doesn't guarantee that it removes anything from the
> page cache* and it doesn't change the dirty state of the folio. When
> block size == page size and we do page aligned IO (as xfs/196 does)
> everything happens to align perfectly and page cache invalidation
> removes the single page folios that span the written data. Hence the
> followup delalloc punch pass does not find cached data over that
> range and it can punch the extent out.
>
> IOWs, xfs/196 "works" for block size == page size with the new
> code. I say "works", because it actually only works for the case
> where IO is page aligned, and no data was read from disk before
> writes occur. Because the moment we actually read data first, the
> readahead code allocates multipage folios and suddenly the
> invalidate code goes back to zeroing subfolio ranges without
> changing dirty state.
>
> Hence, with multipage folios in play, block size == page size is
> functionally identical to block size < page size behaviour, and
> drop-writes is manifestly broken w.r.t to this case. Invalidation of
> a subfolio range doesn't result in the folio being removed from the
> cache, just the range gets zeroed. Hence after we've sequentially
> walked over a folio that we've dirtied (via write data) and then
> invalidated, we end up with a dirty folio full of zeroed data.
>
> And because the new code skips punching ranges that have dirty
> folios covering them, we end up leaving the delalloc range intact
> after failing all the writes. Hence failed writes now end up
> writing zeroes to disk in the cases where invalidation zeroes folios
> rather than removing them from cache.
>
> This is a fundamental change of behaviour that is needed to avoid
> the data corruption vectors that exist in the old write fail path,
> and it renders the drop-writes injection non-functional and
> unworkable as it stands.
>
> As it is, I think the error injection is also now unnecessary, as
> partial writes that need delalloc extent are going to be a lot more
> common with stale iomap detection in place. Hence this patch removes
> the drop-writes error injection completely. xfs/196 can remain for
> testing kernels that don't have this data corruption fix, but those
> that do will report:
>
> xfs/196 3s ... [not run] XFS error injection drop_writes unknown on this kernel.
Assuming you're planning to scuttle xfs/196 as well,
(I appreciate the cleanups in xfs_error.c too.)
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
--D
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
> fs/xfs/libxfs/xfs_errortag.h | 12 +++++-------
> fs/xfs/xfs_error.c | 27 ++++++++++++++++++++-------
> fs/xfs/xfs_iomap.c | 9 ---------
> 3 files changed, 25 insertions(+), 23 deletions(-)
>
> diff --git a/fs/xfs/libxfs/xfs_errortag.h b/fs/xfs/libxfs/xfs_errortag.h
> index 5362908164b0..580ccbd5aadc 100644
> --- a/fs/xfs/libxfs/xfs_errortag.h
> +++ b/fs/xfs/libxfs/xfs_errortag.h
> @@ -40,13 +40,12 @@
> #define XFS_ERRTAG_REFCOUNT_FINISH_ONE 25
> #define XFS_ERRTAG_BMAP_FINISH_ONE 26
> #define XFS_ERRTAG_AG_RESV_CRITICAL 27
> +
> /*
> - * DEBUG mode instrumentation to test and/or trigger delayed allocation
> - * block killing in the event of failed writes. When enabled, all
> - * buffered writes are silenty dropped and handled as if they failed.
> - * All delalloc blocks in the range of the write (including pre-existing
> - * delalloc blocks!) are tossed as part of the write failure error
> - * handling sequence.
> + * Drop-writes support removed because write error handling cannot trash
> + * pre-existing delalloc extents in any useful way anymore. We retain the
> + * definition so that we can reject it as an invalid value in
> + * xfs_errortag_valid().
> */
> #define XFS_ERRTAG_DROP_WRITES 28
> #define XFS_ERRTAG_LOG_BAD_CRC 29
> @@ -95,7 +94,6 @@
> #define XFS_RANDOM_REFCOUNT_FINISH_ONE 1
> #define XFS_RANDOM_BMAP_FINISH_ONE 1
> #define XFS_RANDOM_AG_RESV_CRITICAL 4
> -#define XFS_RANDOM_DROP_WRITES 1
> #define XFS_RANDOM_LOG_BAD_CRC 1
> #define XFS_RANDOM_LOG_ITEM_PIN 1
> #define XFS_RANDOM_BUF_LRU_REF 2
> diff --git a/fs/xfs/xfs_error.c b/fs/xfs/xfs_error.c
> index c6b2aabd6f18..dea3c0649d2f 100644
> --- a/fs/xfs/xfs_error.c
> +++ b/fs/xfs/xfs_error.c
> @@ -46,7 +46,7 @@ static unsigned int xfs_errortag_random_default[] = {
> XFS_RANDOM_REFCOUNT_FINISH_ONE,
> XFS_RANDOM_BMAP_FINISH_ONE,
> XFS_RANDOM_AG_RESV_CRITICAL,
> - XFS_RANDOM_DROP_WRITES,
> + 0, /* XFS_RANDOM_DROP_WRITES has been removed */
> XFS_RANDOM_LOG_BAD_CRC,
> XFS_RANDOM_LOG_ITEM_PIN,
> XFS_RANDOM_BUF_LRU_REF,
> @@ -162,7 +162,6 @@ XFS_ERRORTAG_ATTR_RW(refcount_continue_update, XFS_ERRTAG_REFCOUNT_CONTINUE_UPDA
> XFS_ERRORTAG_ATTR_RW(refcount_finish_one, XFS_ERRTAG_REFCOUNT_FINISH_ONE);
> XFS_ERRORTAG_ATTR_RW(bmap_finish_one, XFS_ERRTAG_BMAP_FINISH_ONE);
> XFS_ERRORTAG_ATTR_RW(ag_resv_critical, XFS_ERRTAG_AG_RESV_CRITICAL);
> -XFS_ERRORTAG_ATTR_RW(drop_writes, XFS_ERRTAG_DROP_WRITES);
> XFS_ERRORTAG_ATTR_RW(log_bad_crc, XFS_ERRTAG_LOG_BAD_CRC);
> XFS_ERRORTAG_ATTR_RW(log_item_pin, XFS_ERRTAG_LOG_ITEM_PIN);
> XFS_ERRORTAG_ATTR_RW(buf_lru_ref, XFS_ERRTAG_BUF_LRU_REF);
> @@ -206,7 +205,6 @@ static struct attribute *xfs_errortag_attrs[] = {
> XFS_ERRORTAG_ATTR_LIST(refcount_finish_one),
> XFS_ERRORTAG_ATTR_LIST(bmap_finish_one),
> XFS_ERRORTAG_ATTR_LIST(ag_resv_critical),
> - XFS_ERRORTAG_ATTR_LIST(drop_writes),
> XFS_ERRORTAG_ATTR_LIST(log_bad_crc),
> XFS_ERRORTAG_ATTR_LIST(log_item_pin),
> XFS_ERRORTAG_ATTR_LIST(buf_lru_ref),
> @@ -256,6 +254,19 @@ xfs_errortag_del(
> kmem_free(mp->m_errortag);
> }
>
> +static bool
> +xfs_errortag_valid(
> + unsigned int error_tag)
> +{
> + if (error_tag >= XFS_ERRTAG_MAX)
> + return false;
> +
> + /* Error out removed injection types */
> + if (error_tag == XFS_ERRTAG_DROP_WRITES)
> + return false;
> + return true;
> +}
> +
> bool
> xfs_errortag_test(
> struct xfs_mount *mp,
> @@ -277,7 +288,9 @@ xfs_errortag_test(
> if (!mp->m_errortag)
> return false;
>
> - ASSERT(error_tag < XFS_ERRTAG_MAX);
> + if (!xfs_errortag_valid(error_tag))
> + return false;
> +
> randfactor = mp->m_errortag[error_tag];
> if (!randfactor || prandom_u32_max(randfactor))
> return false;
> @@ -293,7 +306,7 @@ xfs_errortag_get(
> struct xfs_mount *mp,
> unsigned int error_tag)
> {
> - if (error_tag >= XFS_ERRTAG_MAX)
> + if (!xfs_errortag_valid(error_tag))
> return -EINVAL;
>
> return mp->m_errortag[error_tag];
> @@ -305,7 +318,7 @@ xfs_errortag_set(
> unsigned int error_tag,
> unsigned int tag_value)
> {
> - if (error_tag >= XFS_ERRTAG_MAX)
> + if (!xfs_errortag_valid(error_tag))
> return -EINVAL;
>
> mp->m_errortag[error_tag] = tag_value;
> @@ -319,7 +332,7 @@ xfs_errortag_add(
> {
> BUILD_BUG_ON(ARRAY_SIZE(xfs_errortag_random_default) != XFS_ERRTAG_MAX);
>
> - if (error_tag >= XFS_ERRTAG_MAX)
> + if (!xfs_errortag_valid(error_tag))
> return -EINVAL;
>
> return xfs_errortag_set(mp, error_tag,
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index a9b3a1bcc3fd..d294a00ca28b 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -1191,15 +1191,6 @@ xfs_buffered_write_iomap_end(
> struct xfs_mount *mp = XFS_M(inode->i_sb);
> int error;
>
> - /*
> - * Behave as if the write failed if drop writes is enabled. Set the NEW
> - * flag to force delalloc cleanup.
> - */
> - if (XFS_TEST_ERROR(false, mp, XFS_ERRTAG_DROP_WRITES)) {
> - iomap->flags |= IOMAP_F_NEW;
> - written = 0;
> - }
> -
> error = iomap_file_buffered_write_punch_delalloc(inode, iomap, offset,
> length, written, &xfs_buffered_write_delalloc_punch);
> if (error && !xfs_is_shutdown(mp)) {
> --
> 2.37.2
>
next prev parent reply other threads:[~2022-11-17 18:01 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-11-17 5:58 [PATCH 0/8 v3] xfs, iomap: fix data corrupton due to stale cached iomaps Dave Chinner
2022-11-17 5:58 ` [PATCH 1/9] xfs: write page faults in iomap are not buffered writes Dave Chinner
2022-11-17 5:58 ` [PATCH 2/9] xfs: punching delalloc extents on write failure is racy Dave Chinner
2022-11-17 17:50 ` Darrick J. Wong
2022-11-17 5:58 ` [PATCH 3/9] xfs: use byte ranges for write cleanup ranges Dave Chinner
2022-11-17 5:58 ` [PATCH 4/9] xfs,iomap: move delalloc punching to iomap Dave Chinner
2022-11-17 17:57 ` Darrick J. Wong
2022-11-17 5:58 ` [PATCH 5/9] iomap: buffered write failure should not truncate the page cache Dave Chinner
2022-11-17 18:36 ` Darrick J. Wong
2022-11-17 5:58 ` [PATCH 6/9] xfs: xfs_bmap_punch_delalloc_range() should take a byte range Dave Chinner
2022-11-17 18:37 ` Darrick J. Wong
2022-11-17 5:58 ` [PATCH 7/9] iomap: write iomap validity checks Dave Chinner
2022-11-17 18:40 ` Darrick J. Wong
2022-11-17 5:58 ` [PATCH 8/9] xfs: use iomap_valid method to detect stale cached iomaps Dave Chinner
2022-11-17 18:59 ` Darrick J. Wong
2022-11-17 21:36 ` Dave Chinner
2022-11-17 5:58 ` [PATCH 9/9] xfs: drop write error injection is unfixable, remove it Dave Chinner
2022-11-17 18:01 ` Darrick J. Wong [this message]
-- strict thread matches above, loose matches on Subject: below --
2022-11-23 5:58 [PATCH 0/9 V4] xfs, iomap: fix data corrupton due to stale cached iomaps Dave Chinner
2022-11-23 5:58 ` [PATCH 9/9] xfs: drop write error injection is unfixable, remove it Dave Chinner
2022-11-15 1:30 [PATCH v2 0/9] xfs, iomap: fix data corrupton due to stale cached iomaps Dave Chinner
2022-11-15 1:30 ` [PATCH 9/9] xfs: drop write error injection is unfixable, remove it Dave Chinner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Y3Z26ZYSEK+rf0q4@magnolia \
--to=djwong@kernel.org \
--cc=david@fromorbit.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).