linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <djwong@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 9/9] xfs: drop write error injection is unfixable, remove it
Date: Thu, 17 Nov 2022 10:01:13 -0800	[thread overview]
Message-ID: <Y3Z26ZYSEK+rf0q4@magnolia> (raw)
In-Reply-To: <20221117055810.498014-10-david@fromorbit.com>

On Thu, Nov 17, 2022 at 04:58:10PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> With the changes to scan the page cache for dirty data to avoid data
> corruptions from partial write cleanup racing with other page cache
> operations, the drop writes error injection no longer works the same
> way it used to and causes xfs/196 to fail. This is because xfs/196
> writes to the file and populates the page cache before it turns on
> the error injection and starts failing -overwrites-.
> 
> The result is that the original drop-writes code failed writes only
> -after- overwriting the data in the cache, followed by invalidates
> the cached data, then punching out the delalloc extent from under
> that data.
> 
> On the surface, this looks fine. The problem is that page cache
> invalidation *doesn't guarantee that it removes anything from the
> page cache* and it doesn't change the dirty state of the folio. When
> block size == page size and we do page aligned IO (as xfs/196 does)
> everything happens to align perfectly and page cache invalidation
> removes the single page folios that span the written data. Hence the
> followup delalloc punch pass does not find cached data over that
> range and it can punch the extent out.
> 
> IOWs, xfs/196 "works" for block size == page size with the new
> code. I say "works", because it actually only works for the case
> where IO is page aligned, and no data was read from disk before
> writes occur. Because the moment we actually read data first, the
> readahead code allocates multipage folios and suddenly the
> invalidate code goes back to zeroing subfolio ranges without
> changing dirty state.
> 
> Hence, with multipage folios in play, block size == page size is
> functionally identical to block size < page size behaviour, and
> drop-writes is manifestly broken w.r.t to this case. Invalidation of
> a subfolio range doesn't result in the folio being removed from the
> cache, just the range gets zeroed. Hence after we've sequentially
> walked over a folio that we've dirtied (via write data) and then
> invalidated, we end up with a dirty folio full of zeroed data.
> 
> And because the new code skips punching ranges that have dirty
> folios covering them, we end up leaving the delalloc range intact
> after failing all the writes. Hence failed writes now end up
> writing zeroes to disk in the cases where invalidation zeroes folios
> rather than removing them from cache.
> 
> This is a fundamental change of behaviour that is needed to avoid
> the data corruption vectors that exist in the old write fail path,
> and it renders the drop-writes injection non-functional and
> unworkable as it stands.
> 
> As it is, I think the error injection is also now unnecessary, as
> partial writes that need delalloc extent are going to be a lot more
> common with stale iomap detection in place. Hence this patch removes
> the drop-writes error injection completely. xfs/196 can remain for
> testing kernels that don't have this data corruption fix, but those
> that do will report:
> 
> xfs/196 3s ... [not run] XFS error injection drop_writes unknown on this kernel.

Assuming you're planning to scuttle xfs/196 as well,

(I appreciate the cleanups in xfs_error.c too.)

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_errortag.h | 12 +++++-------
>  fs/xfs/xfs_error.c           | 27 ++++++++++++++++++++-------
>  fs/xfs/xfs_iomap.c           |  9 ---------
>  3 files changed, 25 insertions(+), 23 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_errortag.h b/fs/xfs/libxfs/xfs_errortag.h
> index 5362908164b0..580ccbd5aadc 100644
> --- a/fs/xfs/libxfs/xfs_errortag.h
> +++ b/fs/xfs/libxfs/xfs_errortag.h
> @@ -40,13 +40,12 @@
>  #define XFS_ERRTAG_REFCOUNT_FINISH_ONE			25
>  #define XFS_ERRTAG_BMAP_FINISH_ONE			26
>  #define XFS_ERRTAG_AG_RESV_CRITICAL			27
> +
>  /*
> - * DEBUG mode instrumentation to test and/or trigger delayed allocation
> - * block killing in the event of failed writes. When enabled, all
> - * buffered writes are silenty dropped and handled as if they failed.
> - * All delalloc blocks in the range of the write (including pre-existing
> - * delalloc blocks!) are tossed as part of the write failure error
> - * handling sequence.
> + * Drop-writes support removed because write error handling cannot trash
> + * pre-existing delalloc extents in any useful way anymore. We retain the
> + * definition so that we can reject it as an invalid value in
> + * xfs_errortag_valid().
>   */
>  #define XFS_ERRTAG_DROP_WRITES				28
>  #define XFS_ERRTAG_LOG_BAD_CRC				29
> @@ -95,7 +94,6 @@
>  #define XFS_RANDOM_REFCOUNT_FINISH_ONE			1
>  #define XFS_RANDOM_BMAP_FINISH_ONE			1
>  #define XFS_RANDOM_AG_RESV_CRITICAL			4
> -#define XFS_RANDOM_DROP_WRITES				1
>  #define XFS_RANDOM_LOG_BAD_CRC				1
>  #define XFS_RANDOM_LOG_ITEM_PIN				1
>  #define XFS_RANDOM_BUF_LRU_REF				2
> diff --git a/fs/xfs/xfs_error.c b/fs/xfs/xfs_error.c
> index c6b2aabd6f18..dea3c0649d2f 100644
> --- a/fs/xfs/xfs_error.c
> +++ b/fs/xfs/xfs_error.c
> @@ -46,7 +46,7 @@ static unsigned int xfs_errortag_random_default[] = {
>  	XFS_RANDOM_REFCOUNT_FINISH_ONE,
>  	XFS_RANDOM_BMAP_FINISH_ONE,
>  	XFS_RANDOM_AG_RESV_CRITICAL,
> -	XFS_RANDOM_DROP_WRITES,
> +	0, /* XFS_RANDOM_DROP_WRITES has been removed */
>  	XFS_RANDOM_LOG_BAD_CRC,
>  	XFS_RANDOM_LOG_ITEM_PIN,
>  	XFS_RANDOM_BUF_LRU_REF,
> @@ -162,7 +162,6 @@ XFS_ERRORTAG_ATTR_RW(refcount_continue_update,	XFS_ERRTAG_REFCOUNT_CONTINUE_UPDA
>  XFS_ERRORTAG_ATTR_RW(refcount_finish_one,	XFS_ERRTAG_REFCOUNT_FINISH_ONE);
>  XFS_ERRORTAG_ATTR_RW(bmap_finish_one,	XFS_ERRTAG_BMAP_FINISH_ONE);
>  XFS_ERRORTAG_ATTR_RW(ag_resv_critical,	XFS_ERRTAG_AG_RESV_CRITICAL);
> -XFS_ERRORTAG_ATTR_RW(drop_writes,	XFS_ERRTAG_DROP_WRITES);
>  XFS_ERRORTAG_ATTR_RW(log_bad_crc,	XFS_ERRTAG_LOG_BAD_CRC);
>  XFS_ERRORTAG_ATTR_RW(log_item_pin,	XFS_ERRTAG_LOG_ITEM_PIN);
>  XFS_ERRORTAG_ATTR_RW(buf_lru_ref,	XFS_ERRTAG_BUF_LRU_REF);
> @@ -206,7 +205,6 @@ static struct attribute *xfs_errortag_attrs[] = {
>  	XFS_ERRORTAG_ATTR_LIST(refcount_finish_one),
>  	XFS_ERRORTAG_ATTR_LIST(bmap_finish_one),
>  	XFS_ERRORTAG_ATTR_LIST(ag_resv_critical),
> -	XFS_ERRORTAG_ATTR_LIST(drop_writes),
>  	XFS_ERRORTAG_ATTR_LIST(log_bad_crc),
>  	XFS_ERRORTAG_ATTR_LIST(log_item_pin),
>  	XFS_ERRORTAG_ATTR_LIST(buf_lru_ref),
> @@ -256,6 +254,19 @@ xfs_errortag_del(
>  	kmem_free(mp->m_errortag);
>  }
>  
> +static bool
> +xfs_errortag_valid(
> +	unsigned int		error_tag)
> +{
> +	if (error_tag >= XFS_ERRTAG_MAX)
> +		return false;
> +
> +	/* Error out removed injection types */
> +	if (error_tag == XFS_ERRTAG_DROP_WRITES)
> +		return false;
> +	return true;
> +}
> +
>  bool
>  xfs_errortag_test(
>  	struct xfs_mount	*mp,
> @@ -277,7 +288,9 @@ xfs_errortag_test(
>  	if (!mp->m_errortag)
>  		return false;
>  
> -	ASSERT(error_tag < XFS_ERRTAG_MAX);
> +	if (!xfs_errortag_valid(error_tag))
> +		return false;
> +
>  	randfactor = mp->m_errortag[error_tag];
>  	if (!randfactor || prandom_u32_max(randfactor))
>  		return false;
> @@ -293,7 +306,7 @@ xfs_errortag_get(
>  	struct xfs_mount	*mp,
>  	unsigned int		error_tag)
>  {
> -	if (error_tag >= XFS_ERRTAG_MAX)
> +	if (!xfs_errortag_valid(error_tag))
>  		return -EINVAL;
>  
>  	return mp->m_errortag[error_tag];
> @@ -305,7 +318,7 @@ xfs_errortag_set(
>  	unsigned int		error_tag,
>  	unsigned int		tag_value)
>  {
> -	if (error_tag >= XFS_ERRTAG_MAX)
> +	if (!xfs_errortag_valid(error_tag))
>  		return -EINVAL;
>  
>  	mp->m_errortag[error_tag] = tag_value;
> @@ -319,7 +332,7 @@ xfs_errortag_add(
>  {
>  	BUILD_BUG_ON(ARRAY_SIZE(xfs_errortag_random_default) != XFS_ERRTAG_MAX);
>  
> -	if (error_tag >= XFS_ERRTAG_MAX)
> +	if (!xfs_errortag_valid(error_tag))
>  		return -EINVAL;
>  
>  	return xfs_errortag_set(mp, error_tag,
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index a9b3a1bcc3fd..d294a00ca28b 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -1191,15 +1191,6 @@ xfs_buffered_write_iomap_end(
>  	struct xfs_mount	*mp = XFS_M(inode->i_sb);
>  	int			error;
>  
> -	/*
> -	 * Behave as if the write failed if drop writes is enabled. Set the NEW
> -	 * flag to force delalloc cleanup.
> -	 */
> -	if (XFS_TEST_ERROR(false, mp, XFS_ERRTAG_DROP_WRITES)) {
> -		iomap->flags |= IOMAP_F_NEW;
> -		written = 0;
> -	}
> -
>  	error = iomap_file_buffered_write_punch_delalloc(inode, iomap, offset,
>  			length, written, &xfs_buffered_write_delalloc_punch);
>  	if (error && !xfs_is_shutdown(mp)) {
> -- 
> 2.37.2
> 

  reply	other threads:[~2022-11-17 18:01 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-17  5:58 [PATCH 0/8 v3] xfs, iomap: fix data corrupton due to stale cached iomaps Dave Chinner
2022-11-17  5:58 ` [PATCH 1/9] xfs: write page faults in iomap are not buffered writes Dave Chinner
2022-11-17  5:58 ` [PATCH 2/9] xfs: punching delalloc extents on write failure is racy Dave Chinner
2022-11-17 17:50   ` Darrick J. Wong
2022-11-17  5:58 ` [PATCH 3/9] xfs: use byte ranges for write cleanup ranges Dave Chinner
2022-11-17  5:58 ` [PATCH 4/9] xfs,iomap: move delalloc punching to iomap Dave Chinner
2022-11-17 17:57   ` Darrick J. Wong
2022-11-17  5:58 ` [PATCH 5/9] iomap: buffered write failure should not truncate the page cache Dave Chinner
2022-11-17 18:36   ` Darrick J. Wong
2022-11-17  5:58 ` [PATCH 6/9] xfs: xfs_bmap_punch_delalloc_range() should take a byte range Dave Chinner
2022-11-17 18:37   ` Darrick J. Wong
2022-11-17  5:58 ` [PATCH 7/9] iomap: write iomap validity checks Dave Chinner
2022-11-17 18:40   ` Darrick J. Wong
2022-11-17  5:58 ` [PATCH 8/9] xfs: use iomap_valid method to detect stale cached iomaps Dave Chinner
2022-11-17 18:59   ` Darrick J. Wong
2022-11-17 21:36     ` Dave Chinner
2022-11-17  5:58 ` [PATCH 9/9] xfs: drop write error injection is unfixable, remove it Dave Chinner
2022-11-17 18:01   ` Darrick J. Wong [this message]
  -- strict thread matches above, loose matches on Subject: below --
2022-11-23  5:58 [PATCH 0/9 V4] xfs, iomap: fix data corrupton due to stale cached iomaps Dave Chinner
2022-11-23  5:58 ` [PATCH 9/9] xfs: drop write error injection is unfixable, remove it Dave Chinner
2022-11-15  1:30 [PATCH v2 0/9] xfs, iomap: fix data corrupton due to stale cached iomaps Dave Chinner
2022-11-15  1:30 ` [PATCH 9/9] xfs: drop write error injection is unfixable, remove it Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y3Z26ZYSEK+rf0q4@magnolia \
    --to=djwong@kernel.org \
    --cc=david@fromorbit.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).