Re: [PATCH 3/8] xfs: clear BAD_SUMMARY if unmounting an unhealthy filesystem

From: Brian Foster <bfoster@redhat.com>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 3/8] xfs: clear BAD_SUMMARY if unmounting an unhealthy filesystem
Date: Thu, 11 Apr 2019 08:29:14 -0400
Message-ID: <20190411122910.GC2888@bfoster>
In-Reply-To: <155494714539.1090518.9582107800209578968.stgit@magnolia>

On Wed, Apr 10, 2019 at 06:45:45PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> If we know the filesystem metadata isn't healthy during unmount, we want
> to encourage the administrator to run xfs_repair right away.  We can't
> do this if BAD_SUMMARY will cause an unclean log unmount to force
> summary recalculation, so turn it off if the fs is bad.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/libxfs/xfs_health.h |    2 +
>  fs/xfs/xfs_health.c        |   74 ++++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/xfs_mount.c         |    2 +
>  fs/xfs/xfs_trace.h         |    3 ++
>  4 files changed, 81 insertions(+)
> 
> 
> diff --git a/fs/xfs/libxfs/xfs_health.h b/fs/xfs/libxfs/xfs_health.h
> index 30762a5d4862..a434b47f2aa0 100644
> --- a/fs/xfs/libxfs/xfs_health.h
> +++ b/fs/xfs/libxfs/xfs_health.h
> @@ -110,6 +110,8 @@ void xfs_inode_mark_healthy(struct xfs_inode *ip, unsigned int mask);
>  void xfs_inode_measure_sickness(struct xfs_inode *ip, unsigned int *sick,
>  		unsigned int *checked);
>  
> +void xfs_health_unmount(struct xfs_mount *mp);
> +
>  /* Now some helpers. */
>  
>  static inline bool
> diff --git a/fs/xfs/xfs_health.c b/fs/xfs/xfs_health.c
> index 941f33037e2f..21728228e08b 100644
> --- a/fs/xfs/xfs_health.c
> +++ b/fs/xfs/xfs_health.c
> @@ -19,6 +19,80 @@
>  #include "xfs_trace.h"
>  #include "xfs_health.h"
>  
> +/*
> + * Warn about metadata corruption that we detected but haven't fixed, and
> + * make sure we're not sitting on anything that would get in the way of
> + * recovery.
> + */
> +void
> +xfs_health_unmount(
> +	struct xfs_mount	*mp)
> +{
> +	struct xfs_perag	*pag;
> +	xfs_agnumber_t		agno;
> +	unsigned int		sick = 0;
> +	unsigned int		checked = 0;
> +	bool			warn = false;
> +
> +	if (XFS_FORCED_SHUTDOWN(mp))
> +		return;
> +
> +	/* Measure AG corruption levels. */
> +	for (agno = 0; agno < mp->m_sb.sb_agcount; agno++) {
> +		pag = xfs_perag_get(mp, agno);
> +		xfs_ag_measure_sickness(pag, &sick, &checked);
> +		if (sick) {
> +			trace_xfs_ag_unfixed_corruption(mp, agno, sick);
> +			warn = true;
> +		}
> +		xfs_perag_put(pag);
> +	}
> +
> +	/* Measure realtime volume corruption levels. */
> +	xfs_rt_measure_sickness(mp, &sick, &checked);
> +	if (sick) {
> +		trace_xfs_rt_unfixed_corruption(mp, sick);
> +		warn = true;
> +	}
> +
> +	/*
> +	 * Measure fs corruption and keep the sample around for the warning.
> +	 * See the note below for why we exempt FS_COUNTERS.
> +	 */
> +	xfs_fs_measure_sickness(mp, &sick, &checked);
> +	if (sick & ~XFS_SICK_FS_COUNTERS) {
> +		trace_xfs_fs_unfixed_corruption(mp, sick);
> +		warn = true;
> +	}
> +
> +	if (warn) {
> +		xfs_warn(mp,
> +"Uncorrected metadata errors detected; please run xfs_repair.");
> +
> +		/*
> +		 * We discovered uncorrected metadata problems at some point
> +		 * during this filesystem mount and have advised the
> +		 * administrator to run repair once the unmount completes.
> +		 *
> +		 * However, we must be careful -- when FSCOUNTERS are flagged
> +		 * unhealthy, the unmount procedure omits writing the clean
> +		 * unmount record to the log so that the next mount will run
> +		 * recovery and recompute the summary counters.  In other
> +		 * words, we leave a dirty log to get the counters fixed.
> +		 *
> +		 * Unfortunately, xfs_repair cannot recover dirty logs, so if
> +		 * there were filesystem problems, FSCOUNTERS was flagged, and
> +		 * the administrator takes our advice to run xfs_repair,
> +		 * they'll have to zap the log before repairing structures.
> +		 * We don't really want to encourage this, so we mark the
> +		 * FSCOUNTERS healthy so that a subsequent repair run won't see
> +		 * a dirty log.
> +		 */
> +		if (sick & XFS_SICK_FS_COUNTERS)
> +			xfs_fs_mark_healthy(mp, XFS_SICK_FS_COUNTERS);
> +	}
> +}
> +
>  /* Mark unhealthy per-fs metadata. */
>  void
>  xfs_fs_mark_sick(
> diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> index 14f454e09e6e..eff8b4c3eb3e 100644
> --- a/fs/xfs/xfs_mount.c
> +++ b/fs/xfs/xfs_mount.c
> @@ -1070,6 +1070,7 @@ xfs_mountfs(
>  	 */
>  	cancel_delayed_work_sync(&mp->m_reclaim_work);
>  	xfs_reclaim_inodes(mp, SYNC_WAIT);
> +	xfs_health_unmount(mp);
>   out_log_dealloc:
>  	mp->m_flags |= XFS_MOUNT_UNMOUNTING;
>  	xfs_log_mount_cancel(mp);
> @@ -1152,6 +1153,7 @@ xfs_unmountfs(
>  	 */
>  	cancel_delayed_work_sync(&mp->m_reclaim_work);
>  	xfs_reclaim_inodes(mp, SYNC_WAIT);
> +	xfs_health_unmount(mp);
>  
>  	xfs_qm_unmount(mp);
>  
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index f079841c7af6..2464ea351f83 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -3461,8 +3461,10 @@ DEFINE_EVENT(xfs_fs_corrupt_class, name,	\
>  	TP_ARGS(mp, flags))
>  DEFINE_FS_CORRUPT_EVENT(xfs_fs_mark_sick);
>  DEFINE_FS_CORRUPT_EVENT(xfs_fs_mark_healthy);
> +DEFINE_FS_CORRUPT_EVENT(xfs_fs_unfixed_corruption);
>  DEFINE_FS_CORRUPT_EVENT(xfs_rt_mark_sick);
>  DEFINE_FS_CORRUPT_EVENT(xfs_rt_mark_healthy);
> +DEFINE_FS_CORRUPT_EVENT(xfs_rt_unfixed_corruption);
>  
>  DECLARE_EVENT_CLASS(xfs_ag_corrupt_class,
>  	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, unsigned int flags),
> @@ -3488,6 +3490,7 @@ DEFINE_EVENT(xfs_ag_corrupt_class, name,	\
>  	TP_ARGS(mp, agno, flags))
>  DEFINE_AG_CORRUPT_EVENT(xfs_ag_mark_sick);
>  DEFINE_AG_CORRUPT_EVENT(xfs_ag_mark_healthy);
> +DEFINE_AG_CORRUPT_EVENT(xfs_ag_unfixed_corruption);
>  
>  DECLARE_EVENT_CLASS(xfs_inode_corrupt_class,
>  	TP_PROTO(struct xfs_inode *ip, unsigned int flags),
>
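
For readers following the unmount logic in the quoted patch, here is a minimal userspace sketch of the decision xfs_health_unmount() makes. It is not kernel code: model_health_unmount(), struct unmount_decision, SICK_FS_COUNTERS and SICK_FS_OTHER are hypothetical stand-ins invented for illustration, the bit values are arbitrary, and the forced-shutdown early return is left out.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-ins for the fs-level health flags.  The real
 * XFS_SICK_FS_COUNTERS is defined in xfs_health.h; these bit values
 * are assumptions made for this sketch only.
 */
#define SICK_FS_COUNTERS	(1u << 0)
#define SICK_FS_OTHER		(1u << 1)

/* Outcome of the unmount-time health check in this model. */
struct unmount_decision {
	bool		warn;		/* advise running xfs_repair? */
	unsigned int	fs_sick_after;	/* fs flags left set afterwards */
};

/*
 * Model of the quoted logic: warn if any AG, the realtime volume, or
 * any fs-level metadata other than the summary counters is sick.  Only
 * when warning do we clear the counters flag, so that unmount writes a
 * clean log and a later repair run does not have to zap it.
 */
static struct unmount_decision
model_health_unmount(
	bool		ag_sick,
	bool		rt_sick,
	unsigned int	fs_sick)
{
	struct unmount_decision	d = {
		.warn		= false,
		.fs_sick_after	= fs_sick,
	};

	if (ag_sick || rt_sick)
		d.warn = true;

	/* Bad counters alone do not trigger the warning. */
	if (fs_sick & ~SICK_FS_COUNTERS)
		d.warn = true;

	if (d.warn && (fs_sick & SICK_FS_COUNTERS))
		d.fs_sick_after &= ~SICK_FS_COUNTERS;

	return d;
}
```

In this model, a filesystem whose only problem is the summary counters keeps the flag set and gets no warning, so the log is left dirty and the next mount recomputes the counters. Once any other corruption is present, the warning fires and the counters flag is dropped in favor of a clean log for xfs_repair.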
