From: Dave Chinner <david@fromorbit.com>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 2/3] xfs: fix an inode lookup race in xchk_get_inode
Date: Tue, 15 Nov 2022 14:49:54 +1100 [thread overview]
Message-ID: <20221115034954.GX3600936@dread.disaster.area> (raw)
In-Reply-To: <166473482957.1084685.13676900912332698227.stgit@magnolia>
On Sun, Oct 02, 2022 at 11:20:29AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
>
> In commit d658e, we tried to improve the robustnes of xchk_get_inode in
> the face of EINVAL returns from iget by calling xfs_imap to see if the
> inobt itself thinks that the inode is allocated. Unfortunately, that
> commit didn't consider the possibility that the inode gets allocated
> after iget but before imap. In this case, the imap call will succeed,
> but we turn that into a corruption error and tell userspace the inode is
> corrupt.
>
> Avoid this false corruption report by grabbing the AGI header and
> retrying the iget before calling imap. If the iget succeeds, we can
> proceed with the usual scrub-by-handle code. Fix all the incorrect
> comments too, since unreadable/corrupt inodes no longer result in EINVAL
> returns.
>
> Fixes: d658e72b4a09 ("xfs: distinguish between corrupt inode and invalid inum in xfs_scrub_get_inode")
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
OK.
> ---
> fs/xfs/scrub/common.c | 203 ++++++++++++++++++++++++++++++++++++++++---------
> fs/xfs/scrub/common.h | 4 +
> fs/xfs/xfs_icache.c | 3 -
> fs/xfs/xfs_icache.h | 1
> 4 files changed, 172 insertions(+), 39 deletions(-)
>
>
> diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
> index 42a25488bd25..9a811c5fa8f7 100644
> --- a/fs/xfs/scrub/common.c
> +++ b/fs/xfs/scrub/common.c
> @@ -635,6 +635,14 @@ xchk_ag_init(
>
> /* Per-scrubber setup functions */
>
> +void
> +xchk_trans_cancel(
> + struct xfs_scrub *sc)
> +{
> + xfs_trans_cancel(sc->tp);
> + sc->tp = NULL;
> +}
> +
> /*
> * Grab an empty transaction so that we can re-grab locked buffers if
> * one of our btrees turns out to be cyclic.
> @@ -720,6 +728,80 @@ xchk_iget(
> return xfs_iget(sc->mp, sc->tp, inum, XFS_IGET_UNTRUSTED, 0, ipp);
> }
>
> +/*
> + * Try to grab an inode. If we can't, return the AGI buffer so that the caller
> + * can single-step the loading process to see where things went wrong.
"Try to grab an inode in a manner that avoids races with physical inode
allocation. If we can't, return the locked AGI buffer so that the
caller can single-step the loading process to see where things went
wrong."
> + *
> + * If the iget succeeds, return 0, a NULL AGI, and the inode.
> + * If the iget fails, return the error, the AGI, and a NULL inode. This can
"... the locked AGI, ...."
> + * include -EINVAL and -ENOENT for invalid inode numbers or inodes that are no
> + * longer allocated; or any other corruption or runtime error.
> + * If the AGI read fails, return the error, a NULL AGI, and NULL inode.
> + * If a fatal signal is pending, return -EINTR, a NULL AGI, and a NULL inode.
> + */
> +int
> +xchk_iget_agi(
> + struct xfs_scrub *sc,
> + xfs_ino_t inum,
> + struct xfs_buf **agi_bpp,
> + struct xfs_inode **ipp)
> +{
> + struct xfs_mount *mp = sc->mp;
> + struct xfs_trans *tp = sc->tp;
> + struct xfs_perag *pag;
> + int error;
> +
> +again:
> + *agi_bpp = NULL;
> + *ipp = NULL;
> + error = 0;
> +
> + if (xchk_should_terminate(sc, &error))
> + return error;
> +
> + pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, inum));
> + error = xfs_ialloc_read_agi(pag, tp, agi_bpp);
> + xfs_perag_put(pag);
> + if (error)
> + return error;
> +
> + error = xfs_iget(mp, tp, inum,
> + XFS_IGET_NOWAIT | XFS_IGET_UNTRUSTED, 0, ipp);
OK, IGET_NOWAIT is new....
> +#define XFS_IGET_NOWAIT 0x10 /* return EAGAIN instead of blocking */
But it doesn't prevent blocking. XFS_IGET_UNTRUSTED means we do a
inobt record lookup (btree buffer locking and IO that can block) as
well as reading the inode cluster from disk if it's not already in
cache. Hence this isn't what I'd call a "non-blocking" or "no wait"
operation.
AFAICT, what is needed here is avoiding the -retry mechanism- of the
lookup, not "non blocking/no wait" semantics. i.e. this seems
reasonable to get an -EAGAIN error instead of delaying and trying
again if we are using XFS_IGET_NORETRY semantics....
> + if (error == -EAGAIN) {
> + /*
> + * Cached inode awaiting inactivation. Drop the AGI buffer in
> + * case the inactivation worker is now waiting for it, and try
> + * the iget again.
> + */
That's not the only reason xfs_iget() could return -EAGAIN,
right? radix_tree_preload() failing can cause -EAGAIN to be returned
from xfs_iget_cache_miss(), as can an instantiation race inserting
the new inode into the radix tree. The cache hit path has a bunch
more cases, too. Perhaps:
/*
* The inode may be in core but temporarily
* unavailable and may require the AGI buffer before
* it can be returned. Drop the AGI buffer and retry
* the lookup.
*/
> + xfs_trans_brelse(tp, *agi_bpp);
> + delay(1);
> + goto again;
> + }
> + if (error == 0) {
> + /* We got the inode, so we can release the AGI. */
> + ASSERT(*ipp != NULL);
> + xfs_trans_brelse(tp, *agi_bpp);
> + *agi_bpp = NULL;
> + }
> +
> + return error;
That's better written as:
if (error)
return error;
/* We got the inode, so we can release the AGI. */
ASSERT(*ipp != NULL);
xfs_trans_brelse(tp, *agi_bpp);
*agi_bpp = NULL;
return 0;
Other than that, the code makes sense.
-Dave.
--
Dave Chinner
david@fromorbit.com
next prev parent reply other threads:[~2022-11-15 3:50 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-10-02 18:20 [PATCHSET v23.1 0/3] xfs: fix iget/irele usage in online fsck Darrick J. Wong
2022-10-02 18:20 ` [PATCH 1/3] xfs: manage inode DONTCACHE status at irele time Darrick J. Wong
2022-11-15 3:13 ` Dave Chinner
2022-11-15 3:34 ` Darrick J. Wong
2022-10-02 18:20 ` [PATCH 2/3] xfs: fix an inode lookup race in xchk_get_inode Darrick J. Wong
2022-11-15 3:49 ` Dave Chinner [this message]
2022-11-16 0:53 ` Darrick J. Wong
2022-10-02 18:20 ` [PATCH 3/3] xfs: retain the AGI when we can't iget an inode to scrub the core Darrick J. Wong
2022-11-15 4:08 ` Dave Chinner
2022-11-16 2:49 ` Darrick J. Wong
2022-11-17 1:15 ` Dave Chinner
2022-11-17 20:20 ` Darrick J. Wong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20221115034954.GX3600936@dread.disaster.area \
--to=david@fromorbit.com \
--cc=djwong@kernel.org \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox