Re: [PATCH 2/5] xfs: separate out inode buffer recovery a bit more

From: "Darrick J. Wong" <djwong@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 2/5] xfs: separate out inode buffer recovery a bit more
Date: Tue, 19 Mar 2024 11:40:01 -0700
Message-ID: <20240319184001.GC1927156@frogsfrogsfrogs>
In-Reply-To: <20240319021547.3483050-3-david@fromorbit.com>

On Tue, Mar 19, 2024 at 01:15:21PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> It really is a unique snowflake, so peel off from normal buffer
> recovery earlier and shuffle all the unique bits into the inode
> buffer recovery function.
> 
> Also, it looks like the handling of mismatched inode cluster buffer
> sizes is wrong - we have to write the recovered buffer -before- we
> mark it stale as we're not supposed to write stale buffers. I don't
> think we check that anywhere in the buffer IO path, but let's do it
> the right way anyway.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Looks good to me,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/xfs_buf_item_recover.c | 99 ++++++++++++++++++++++-------------
>  1 file changed, 63 insertions(+), 36 deletions(-)
> 
> diff --git a/fs/xfs/xfs_buf_item_recover.c b/fs/xfs/xfs_buf_item_recover.c
> index dba57ee6fa6d..f994a303ad0a 100644
> --- a/fs/xfs/xfs_buf_item_recover.c
> +++ b/fs/xfs/xfs_buf_item_recover.c
> @@ -229,7 +229,7 @@ xlog_recover_validate_buf_type(
>  	 * just avoid the verification stage for non-crc filesystems
>  	 */
>  	if (!xfs_has_crc(mp))
> -		return;
> +		return 0;
>  
>  	magic32 = be32_to_cpu(*(__be32 *)bp->b_addr);
>  	magic16 = be16_to_cpu(*(__be16*)bp->b_addr);
> @@ -407,7 +407,7 @@ xlog_recover_validate_buf_type(
>  	 * skipped.
>  	 */
>  	if (current_lsn == NULLCOMMITLSN)
> -		return 0;;
> +		return 0;
>  
>  	if (warnmsg) {
>  		xfs_warn(mp, warnmsg);
> @@ -567,18 +567,22 @@ xlog_recover_this_dquot_buffer(
>  }
>  
>  /*
> - * Perform recovery for a buffer full of inodes.  In these buffers, the only
> - * data which should be recovered is that which corresponds to the
> - * di_next_unlinked pointers in the on disk inode structures.  The rest of the
> - * data for the inodes is always logged through the inodes themselves rather
> - * than the inode buffer and is recovered in xlog_recover_inode_pass2().
> + * Perform recovery for a buffer full of inodes. We don't have inode cluster
> + * buffer specific LSNs, so we always recover inode buffers if they contain
> + * inodes.
> + *
> + * In these buffers, the only inode data which should be recovered is that which
> + * corresponds to the di_next_unlinked pointers in the on disk inode structures.
> + * The rest of the data for the inodes is always logged through the inodes
> + * themselves rather than the inode buffer and is recovered in
> + * xlog_recover_inode_pass2().
>   *
>   * The only time when buffers full of inodes are fully recovered is when the
> - * buffer is full of newly allocated inodes.  In this case the buffer will
> - * not be marked as an inode buffer and so will be sent to
> - * xlog_recover_do_reg_buffer() below during recovery.
> + * buffer is full of newly allocated inodes.  In this case the buffer will not
> + * be marked as an inode buffer and so xlog_recover_do_reg_buffer() will be used
> + * instead.
>   */
> -STATIC int
> +static int
>  xlog_recover_do_inode_buffer(
>  	struct xfs_mount		*mp,
>  	struct xlog_recover_item	*item,
> @@ -598,6 +602,13 @@ xlog_recover_do_inode_buffer(
>  
>  	trace_xfs_log_recover_buf_inode_buf(mp->m_log, buf_f);
>  
> +	/*
> +	 * If the magic number doesn't match, something has gone wrong. Don't
> +	 * recover the buffer.
> +	 */
> +	if (cpu_to_be16(XFS_DINODE_MAGIC) != *((__be16 *)bp->b_addr))
> +		return -EFSCORRUPTED;
> +
>  	/*
>  	 * Post recovery validation only works properly on CRC enabled
>  	 * filesystems.
> @@ -677,6 +688,31 @@ xlog_recover_do_inode_buffer(
>  
>  	}
>  
> +	/*
> +	 * Make sure that only inode buffers with good sizes remain valid after
> +	 * recovering this buffer item.
> +	 *
> +	 * The kernel moves inodes in buffers of 1 block or inode_cluster_size
> +	 * bytes, whichever is bigger.  The inode buffers in the log can be a
> +	 * different size if the log was generated by an older kernel using
> +	 * unclustered inode buffers or a newer kernel running with a different
> +	 * inode cluster size.  Regardless, if the inode buffer size isn't
> +	 * max(blocksize, inode_cluster_size) for *our* value of
> +	 * inode_cluster_size, then we need to keep the buffer out of the buffer
> +	 * cache so that the buffer won't overlap with future reads of those
> +	 * inodes.
> +	 *
> +	 * To achieve this, we write the buffer to recover the inodes then mark
> +	 * it stale so that it won't be found on overlapping buffer lookups and
> +	 * caller knows not to queue it for delayed write.
> +	 */
> +	if (BBTOB(bp->b_length) != M_IGEO(mp)->inode_cluster_size) {
> +		int error;
> +
> +		error = xfs_bwrite(bp);
> +		xfs_buf_stale(bp);
> +		return error;
> +	}
>  	return 0;
>  }
>  
> @@ -840,7 +876,6 @@ xlog_recover_get_buf_lsn(
>  	magic16 = be16_to_cpu(*(__be16 *)blk);
>  	switch (magic16) {
>  	case XFS_DQUOT_MAGIC:
> -	case XFS_DINODE_MAGIC:
>  		goto recover_immediately;
>  	default:
>  		break;
> @@ -910,6 +945,17 @@ xlog_recover_buf_commit_pass2(
>  	if (error)
>  		return error;
>  
> +	/*
> +	 * Inode buffer recovery is quite unique, so go our separate ways here
> +	 * to simplify the rest of the code.
> +	 */
> +	if (buf_f->blf_flags & XFS_BLF_INODE_BUF) {
> +		error = xlog_recover_do_inode_buffer(mp, item, bp, buf_f);
> +		if (error || (bp->b_flags & XBF_STALE))
> +			goto out_release;
> +		goto out_write;
> +	}
> +
>  	/*
>  	 * Recover the buffer only if we get an LSN from it and it's less than
>  	 * the lsn of the transaction we are replaying.
> @@ -946,9 +992,7 @@ xlog_recover_buf_commit_pass2(
>  		goto out_release;
>  	}
>  
> -	if (buf_f->blf_flags & XFS_BLF_INODE_BUF) {
> -		error = xlog_recover_do_inode_buffer(mp, item, bp, buf_f);
> -	} else if (buf_f->blf_flags &
> +	if (buf_f->blf_flags &
>  		  (XFS_BLF_UDQUOT_BUF|XFS_BLF_PDQUOT_BUF|XFS_BLF_GDQUOT_BUF)) {
>  		if (!xlog_recover_this_dquot_buffer(mp, log, item, bp, buf_f))
>  			goto out_release;
> @@ -965,28 +1009,11 @@ xlog_recover_buf_commit_pass2(
>  	/*
>  	 * Perform delayed write on the buffer.  Asynchronous writes will be
>  	 * slower when taking into account all the buffers to be flushed.
> -	 *
> -	 * Also make sure that only inode buffers with good sizes stay in
> -	 * the buffer cache.  The kernel moves inodes in buffers of 1 block
> -	 * or inode_cluster_size bytes, whichever is bigger.  The inode
> -	 * buffers in the log can be a different size if the log was generated
> -	 * by an older kernel using unclustered inode buffers or a newer kernel
> -	 * running with a different inode cluster size.  Regardless, if
> -	 * the inode buffer size isn't max(blocksize, inode_cluster_size)
> -	 * for *our* value of inode_cluster_size, then we need to keep
> -	 * the buffer out of the buffer cache so that the buffer won't
> -	 * overlap with future reads of those inodes.
>  	 */
> -	if (XFS_DINODE_MAGIC ==
> -	    be16_to_cpu(*((__be16 *)xfs_buf_offset(bp, 0))) &&
> -	    (BBTOB(bp->b_length) != M_IGEO(log->l_mp)->inode_cluster_size)) {
> -		xfs_buf_stale(bp);
> -		error = xfs_bwrite(bp);
> -	} else {
> -		ASSERT(bp->b_mount == mp);
> -		bp->b_flags |= _XBF_LOGRECOVERY;
> -		xfs_buf_delwri_queue(bp, buffer_list);
> -	}
> +out_write:
> +	ASSERT(bp->b_mount == mp);
> +	bp->b_flags |= _XBF_LOGRECOVERY;
> +	xfs_buf_delwri_queue(bp, buffer_list);
>  
>  out_release:
>  	xfs_buf_relse(bp);
> -- 
> 2.43.0
> 
>
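
The ordering fix described in the commit message can be seen in a small
user-space model. This is only a sketch: struct toy_buf, toy_bwrite(),
toy_stale(), toy_recover_inode_buffer() and TOY_EFSCORRUPTED are
hypothetical stand-ins, not the kernel's xfs_buf API. The toy write
rejects stale buffers to make the ordering visible, which the commit
message says the real buffer IO path probably does not check.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>	/* htons(): host order to big-endian 16-bit */

#define TOY_DINODE_MAGIC	0x494e	/* "IN", the XFS on-disk inode magic */
#define TOY_EFSCORRUPTED	117	/* placeholder errno for this sketch */

/* Hypothetical stand-in for a buffer; not struct xfs_buf. */
struct toy_buf {
	uint8_t	data[512];	/* on-disk contents, big-endian */
	size_t	length;		/* buffer size in bytes */
	bool	stale;		/* set once the buffer is invalidated */
	int	writes;		/* count of successful writes */
};

/* Assumption for illustration: writing a stale buffer is an error. */
static int toy_bwrite(struct toy_buf *bp)
{
	if (bp->stale)
		return -EIO;
	bp->writes++;
	return 0;
}

static void toy_stale(struct toy_buf *bp)
{
	bp->stale = true;
}

/*
 * Models the tail of inode buffer recovery in the patch: refuse buffers
 * without the inode magic, then for a mismatched size write first and
 * mark stale second.
 */
static int toy_recover_inode_buffer(struct toy_buf *bp, size_t cluster_size)
{
	uint16_t magic;

	/* Compare in on-disk (big-endian) byte order, as the patch does. */
	memcpy(&magic, bp->data, sizeof(magic));
	if (magic != htons(TOY_DINODE_MAGIC))
		return -TOY_EFSCORRUPTED;

	/* Replay of the logged regions is omitted from this sketch. */

	if (bp->length != cluster_size) {
		int error = toy_bwrite(bp);	/* write before stale */

		toy_stale(bp);
		return error;
	}
	return 0;
}
```

With the old order (stale, then write) the toy write fails with -EIO.
With the new order the mismatched buffer is written once and then left
stale, so the caller can see the stale flag and skip the delayed write
queue.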

Thread overview: 17+ messages
2024-03-19  2:15 [PATCH 0/5] xfs: fix discontiguous metadata block recovery Dave Chinner
2024-03-19  2:15 ` [PATCH 1/5] xfs: buffer log item type mismatches are corruption Dave Chinner
2024-03-19  7:23   ` Christoph Hellwig
2024-03-19 18:16   ` Darrick J. Wong
2024-03-19  2:15 ` [PATCH 2/5] xfs: separate out inode buffer recovery a bit more Dave Chinner
2024-03-19  7:26   ` Christoph Hellwig
2024-03-19 18:40   ` Darrick J. Wong [this message]
2024-03-19  2:15 ` [PATCH 3/5] xfs: recover dquot buffers unconditionally Dave Chinner
2024-03-19 18:49   ` Darrick J. Wong
2024-03-19 21:46   ` Christoph Hellwig
2024-03-19  2:15 ` [PATCH 4/5] xfs: detect partial buffer recovery operations Dave Chinner
2024-03-19 20:39   ` Darrick J. Wong
2024-03-19 22:54   ` Christoph Hellwig
2024-03-19 23:14     ` Darrick J. Wong
2024-03-19  2:15 ` [PATCH 5/5] xfs: consistently use struct xlog in buffer item recovery Dave Chinner
2024-03-19 20:40   ` Darrick J. Wong
2024-03-19 21:48   ` Christoph Hellwig