Re: [PATCH 09/10] xfs: scrub/repair should update filesystem metadata health

public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed

From: Brian Foster <bfoster@redhat.com>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 09/10] xfs: scrub/repair should update filesystem metadata health
Date: Thu, 4 Apr 2019 07:50:11 -0400	[thread overview]
Message-ID: <20190404115009.GB37737@bfoster> (raw)
In-Reply-To: <155413867262.4966.18080677522827911800.stgit@magnolia>

On Mon, Apr 01, 2019 at 10:11:12AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Now that we have the ability to track sick metadata in-core, make scrub
> and repair update those health assessments after doing work.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/Makefile       |    1 
>  fs/xfs/scrub/health.c |  180 +++++++++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/scrub/health.h |   12 +++
>  fs/xfs/scrub/scrub.c  |    8 ++
>  fs/xfs/scrub/scrub.h  |   11 +++
>  5 files changed, 212 insertions(+)
>  create mode 100644 fs/xfs/scrub/health.c
>  create mode 100644 fs/xfs/scrub/health.h
> 
> 
...
> diff --git a/fs/xfs/scrub/health.c b/fs/xfs/scrub/health.c
> new file mode 100644
> index 000000000000..dd9986500801
> --- /dev/null
> +++ b/fs/xfs/scrub/health.c
> @@ -0,0 +1,180 @@
...
> +/* Update filesystem health assessments based on what we found and did. */
> +void
> +xchk_update_health(
> +	struct xfs_scrub	*sc,
> +	bool			already_fixed)
> +{
> +	/*
> +	 * If the scrubber finds errors, we mark sick whatever's mentioned in
> +	 * sick_mask, no matter whether this is a first scan or an evaluation
> +	 * of repair effectiveness.
> +	 *
> +	 * If there is no direct corruption and we're called after a repair,
> +	 * clear whatever's in heal_mask because that's what we fixed.
> +	 *
> +	 * Otherwise, there's no direct corruption and we didn't repair
> +	 * anything, so mark whatever's in sick_mask as healthy.
> +	 */
> +	if (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
> +		xchk_mark_sick(sc, sc->sick_mask);
> +	else if (already_fixed)
> +		xchk_mark_healthy(sc, sc->heal_mask);
> +	else
> +		xchk_mark_healthy(sc, sc->sick_mask);
> +}

Hmm, I think I follow what we're doing here but it's a bit confusing
without the additional context of where these bits will be set/cleared
at the lower scrub layers (or at least without an example). Some
questions on that below...

...
> diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
> index 1b2344d00525..b1519dfc5811 100644
> --- a/fs/xfs/scrub/scrub.c
> +++ b/fs/xfs/scrub/scrub.c
> @@ -40,6 +40,7 @@
>  #include "scrub/trace.h"
>  #include "scrub/btree.h"
>  #include "scrub/repair.h"
> +#include "scrub/health.h"
>  
>  /*
>   * Online Scrub and Repair
> @@ -468,6 +469,7 @@ xfs_scrub_metadata(
>  {
>  	struct xfs_scrub		sc;
>  	struct xfs_mount		*mp = ip->i_mount;
> +	unsigned int			heal_mask;
>  	bool				try_harder = false;
>  	bool				already_fixed = false;
>  	int				error = 0;
> @@ -488,6 +490,7 @@ xfs_scrub_metadata(
>  	error = xchk_validate_inputs(mp, sm);
>  	if (error)
>  		goto out;
> +	heal_mask = xchk_health_mask_for_scrub_type(sm->sm_type);
>  
>  	xchk_experimental_warning(mp);
>  
> @@ -499,6 +502,8 @@ xfs_scrub_metadata(
>  	sc.ops = &meta_scrub_ops[sm->sm_type];
>  	sc.try_harder = try_harder;
>  	sc.sa.agno = NULLAGNUMBER;
> +	sc.heal_mask = heal_mask;
> +	sc.sick_mask = xchk_health_mask_for_scrub_type(sm->sm_type);

Ok, so we initialize the heal/sick masks based on the scrub type that
was requested on the first pass through...

>  	error = sc.ops->setup(&sc, ip);
>  	if (error)
>  		goto out_teardown;
> @@ -519,6 +524,8 @@ xfs_scrub_metadata(
>  	} else if (error)
>  		goto out_teardown;
>  
> +	xchk_update_health(&sc, already_fixed);
> +

... then update the in-core fs health state based on the sick mask. Is
it possible for the scrub operation to set more sick mask bits based on
what it finds? More specifically, I'm wondering why the masks wouldn't
start as zero and toggle based on finding/fixing corruption(s). Or if
the sick mask value is essentially fixed, whether we need to store it in
the xfs_scrub context...

>  	if ((sc.sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR) && !already_fixed) {
>  		bool needs_fix;
>  
> @@ -551,6 +558,7 @@ xfs_scrub_metadata(
>  				xrep_failure(mp);
>  				goto out;
>  			}
> +			heal_mask = sc.heal_mask;

And if we end up doing a repair, we presumably can repair multiple
things and so we track that separately and persist the heal mask across
a potential retry. What about the case where we don't retry, but scrub
finds something and then immediately repairs it? Should we update the fs
state after both detecting and clearing the problem, or does that happen
elsewhere?

Also, if repair can potentially clear multiple bits, what's the
possibility of a repair clearing one failure and then failing on
another, causing the broader repair op to return an error or jump into
this retry? ISTM that it might be possible to skip clearing one fail
state bit so long as the original thing remained corrupted, but I feel
like I'm still missing some context on the bigger picture scrub
tracking...

Brian

>  			goto retry_op;
>  		}
>  	}
> diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
> index 22f754fba8e5..05f1ad242a35 100644
> --- a/fs/xfs/scrub/scrub.h
> +++ b/fs/xfs/scrub/scrub.h
> @@ -62,6 +62,17 @@ struct xfs_scrub {
>  	struct xfs_inode		*ip;
>  	void				*buf;
>  	uint				ilock_flags;
> +
> +	/* Metadata to be marked sick if scrub finds errors. */
> +	unsigned int			sick_mask;
> +
> +	/*
> +	 * Metadata to be marked healthy if repair fixes errors.  Some repair
> +	 * functions can fix multiple data structures at once, so we have to
> +	 * treat sick and heal masks separately.
> +	 */
> +	unsigned int			heal_mask;
> +
>  	bool				try_harder;
>  	bool				has_quotaofflock;
>  
>

next prev parent reply	other threads:[~2019-04-04 11:50 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-01 17:10 [PATCH 00/10] xfs: online health tracking support Darrick J. Wong
2019-04-01 17:10 ` [PATCH 01/10] xfs: track metadata health levels Darrick J. Wong
2019-04-02 13:22   ` Brian Foster
2019-04-02 13:30     ` Darrick J. Wong
2019-04-01 17:10 ` [PATCH 02/10] xfs: replace the BAD_SUMMARY mount flag with the equivalent health code Darrick J. Wong
2019-04-02 13:22   ` Brian Foster
2019-04-01 17:10 ` [PATCH 03/10] xfs: clear BAD_SUMMARY if unmounting an unhealthy filesystem Darrick J. Wong
2019-04-02 13:24   ` Brian Foster
2019-04-02 13:40     ` Darrick J. Wong
2019-04-02 13:53       ` Brian Foster
2019-04-02 18:16         ` Darrick J. Wong
2019-04-02 18:32           ` Brian Foster
2019-04-01 17:10 ` [PATCH 04/10] xfs: expand xfs_fsop_geom Darrick J. Wong
2019-04-02 17:34   ` Brian Foster
2019-04-02 21:53   ` Dave Chinner
2019-04-02 22:31     ` Darrick J. Wong
2019-04-01 17:10 ` [PATCH 05/10] xfs: add a new ioctl to describe allocation group geometry Darrick J. Wong
2019-04-02 17:34   ` Brian Foster
2019-04-02 21:35     ` Darrick J. Wong
2019-04-01 17:10 ` [PATCH 06/10] xfs: report fs and rt health via geometry structure Darrick J. Wong
2019-04-02 17:35   ` Brian Foster
2019-04-02 18:23     ` Darrick J. Wong
2019-04-02 23:34       ` Darrick J. Wong
2019-04-01 17:10 ` [PATCH 07/10] xfs: report AG health via AG geometry ioctl Darrick J. Wong
2019-04-03 14:30   ` Brian Foster
2019-04-03 16:11     ` Darrick J. Wong
2019-04-04 11:48       ` Brian Foster
2019-04-05 20:33         ` Darrick J. Wong
2019-04-08 11:34           ` Brian Foster
2019-04-09  3:25             ` Darrick J. Wong
2019-04-01 17:11 ` [PATCH 08/10] xfs: report inode health via bulkstat Darrick J. Wong
2019-04-01 17:11 ` [PATCH 09/10] xfs: scrub/repair should update filesystem metadata health Darrick J. Wong
2019-04-04 11:50   ` Brian Foster [this message]
2019-04-04 18:01     ` Darrick J. Wong
2019-04-05 13:07       ` Brian Foster
2019-04-05 20:54         ` Darrick J. Wong
2019-04-08 11:35           ` Brian Foster
2019-04-09  3:30             ` Darrick J. Wong
2019-04-01 17:11 ` [PATCH 10/10] xfs: update health status if we get a clean bill of health Darrick J. Wong
2019-04-04 11:51   ` Brian Foster
2019-04-04 15:48     ` Darrick J. Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190404115009.GB37737@bfoster \
    --to=bfoster@redhat.com \
    --cc=darrick.wong@oracle.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox