linux-xfs.vger.kernel.org archive mirror
From: Brian Foster <bfoster@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH v3 07/17] xfs: ratelimit unmount time per-buffer I/O error alert
Date: Fri, 1 May 2020 07:24:08 -0400
Message-ID: <20200501112408.GB40250@bfoster>
In-Reply-To: <20200430220743.GJ2040@dread.disaster.area>

On Fri, May 01, 2020 at 08:07:43AM +1000, Dave Chinner wrote:
> On Wed, Apr 29, 2020 at 01:21:43PM -0400, Brian Foster wrote:
> > At unmount time, XFS emits an alert for every in-core buffer that
> > might have undergone a write error. In practice this behavior is
> > probably reasonable given that the filesystem is likely short lived
> > once I/O errors begin to occur consistently. Under certain test or
> > otherwise expected error conditions, this can spam the logs and slow
> > down the unmount.
> > 
> > Now that we have a ratelimit mechanism specifically for buffer
> > alerts, reuse it for the per-buffer alerts in xfs_wait_buftarg().
> > Also lift the final repair message out of the loop so it always
> > prints and assert that the metadata error handling code has shut
> > down the fs.
> > 
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> >  fs/xfs/xfs_buf.c | 15 +++++++++++----
> >  1 file changed, 11 insertions(+), 4 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> > index 594d5e1df6f8..8f0f605de579 100644
> > --- a/fs/xfs/xfs_buf.c
> > +++ b/fs/xfs/xfs_buf.c
...
> > @@ -1685,17 +1686,23 @@ xfs_wait_buftarg(
> >  			bp = list_first_entry(&dispose, struct xfs_buf, b_lru);
> >  			list_del_init(&bp->b_lru);
> >  			if (bp->b_flags & XBF_WRITE_FAIL) {
> > -				xfs_alert(btp->bt_mount,
> > +				write_fail = true;
> > +				xfs_buf_alert_ratelimited(bp,
> > +					"XFS: Corruption Alert",
> >  "Corruption Alert: Buffer at daddr 0x%llx had permanent write failures!",
> >  					(long long)bp->b_bn);
> > -				xfs_alert(btp->bt_mount,
> > -"Please run xfs_repair to determine the extent of the problem.");
> >  			}
> >  			xfs_buf_rele(bp);
> >  		}
> >  		if (loop++ != 0)
> >  			delay(100);
> >  	}
> > +
> > +	if (write_fail) {
> > +		ASSERT(XFS_FORCED_SHUTDOWN(btp->bt_mount));
> 
> I think this is incorrect. A metadata write that is set to retry
> forever and is failing because of a bad sector or some other
> persistent device error will not shut down the filesystem, but still
> be reported here as a failure. Hence we can easily get here without
> a filesystem shutdown having occurred...
> 

I'm confused by your comment because I don't see how we get here to free
a dirty buffer without the filesystem having already shut down. AFAICT we're
going to spin (in freeze or unmount) until all outstanding buffers are
written back or converted to permanent errors, which shuts down the fs.
Hm?
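
To make the argument concrete, here is a toy userspace model of the unmount drain. It assumes (hypothetically) that every dirty buffer is retried until it succeeds, exhausts a retry budget (which shuts down the fs), or is failed because the fs is already shut down. None of these names (model_buf, model_write(), model_drain(), model_assert_after_drain()) exist in XFS, and max_passes exists only so the model terminates. Within this model the post-loop ASSERT() holds whenever the loop exits, and a retry-forever buffer keeps the loop spinning instead of reaching the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these types or functions exist in XFS. */
struct model_mount {
	bool shutdown;		/* stands in for XFS_FORCED_SHUTDOWN() */
};

struct model_buf {
	bool dirty;		/* still queued for writeback */
	bool write_fail;	/* stands in for XBF_WRITE_FAIL */
	int retries_left;	/* hypothetical retry budget; -1 = retry forever */
	bool media_ok;		/* whether a write attempt succeeds */
};

/* One writeback attempt for a dirty buffer. */
static void model_write(struct model_mount *mp, struct model_buf *bp)
{
	if (mp->shutdown) {
		/* Model assumption: once shut down, pending writes fail, no retry. */
		bp->dirty = false;
		return;
	}
	if (bp->media_ok) {
		/* Success clears the failure state (cf. patch 05/17). */
		bp->dirty = false;
		bp->write_fail = false;
		return;
	}
	bp->write_fail = true;
	if (bp->retries_left == 0) {
		/* Permanent error: the metadata error path shuts down the fs. */
		mp->shutdown = true;
		bp->dirty = false;
	} else if (bp->retries_left > 0) {
		bp->retries_left--;
	}
	/* retries_left == -1: stays dirty and is retried on the next pass. */
}

/*
 * Spin until no dirty buffers remain. Returns false if buffers are still
 * dirty after max_passes, i.e. the real loop would still be spinning.
 */
static bool model_drain(struct model_mount *mp, struct model_buf *bufs,
			size_t nbufs, int max_passes)
{
	for (int pass = 0; pass < max_passes; pass++) {
		bool busy = false;

		for (size_t i = 0; i < nbufs; i++) {
			if (!bufs[i].dirty)
				continue;
			model_write(mp, &bufs[i]);
			if (bufs[i].dirty)
				busy = true;
		}
		if (!busy)
			return true;
	}
	return false;
}

/* Mirrors the ASSERT() the patch adds after the loop in xfs_wait_buftarg(). */
static void model_assert_after_drain(const struct model_mount *mp,
				     const struct model_buf *bufs, size_t nbufs)
{
	for (size_t i = 0; i < nbufs; i++)
		assert(!bufs[i].write_fail || mp->shutdown);
}
```

In the retry-forever case (retries_left == -1 with failing media), the model never leaves model_drain(), so it never reaches the check at all, rather than reaching it without a shutdown.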

Note that I don't object to turning this into a direct
xfs_force_shutdown() call as a fallback, if that's what you're asking
for (which isn't totally clear to me either). Obviously my assumption is
that we're already shut down anyways, but I'd like to get on the same
page here first...
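
If the ASSERT() were turned into a direct xfs_force_shutdown() fallback as offered above, the tail of xfs_wait_buftarg() might reduce to something like this userspace sketch. Everything here is hypothetical: model_force_shutdown() stands in for xfs_force_shutdown(), and model_ratelimit_ok() is a simplified burst-only substitute for the ratelimiting done inside xfs_buf_alert_ratelimited(). It keeps the two behaviors from the patch description: per-buffer alerts are ratelimited, and the repair message prints once, outside the loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins; not kernel types or APIs. */
struct model_mount {
	bool shutdown;
	int shutdown_calls;	/* counts fallback shutdowns issued below */
};

/* Burst-only limiter; the kernel's struct ratelimit_state also has an interval. */
struct model_ratelimit {
	int burst;
	int printed;
	int suppressed;
};

static bool model_ratelimit_ok(struct model_ratelimit *rs)
{
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	rs->suppressed++;
	return false;
}

/* Stand-in for xfs_force_shutdown(). */
static void model_force_shutdown(struct model_mount *mp)
{
	mp->shutdown_calls++;
	mp->shutdown = true;
}

/*
 * Post-drain reporting: ratelimited per-buffer alerts, one repair hint,
 * and a shutdown fallback in place of ASSERT(). Returns alerts printed.
 */
static int model_report_write_failures(struct model_mount *mp,
				       const unsigned long long *daddrs,
				       const bool *write_fail, size_t n,
				       struct model_ratelimit *rs)
{
	bool any_fail = false;
	int printed = 0;

	for (size_t i = 0; i < n; i++) {
		if (!write_fail[i])
			continue;
		any_fail = true;
		if (model_ratelimit_ok(rs)) {
			printf("Corruption Alert: Buffer at daddr 0x%llx had permanent write failures!\n",
			       daddrs[i]);
			printed++;
		}
	}

	if (any_fail) {
		/* Fallback: only shut down if nothing else already did. */
		if (!mp->shutdown)
			model_force_shutdown(mp);
		assert(mp->shutdown);
		printf("Please run xfs_repair to determine the extent of the problem.\n");
	}
	return printed;
}
```

Checking the existing shutdown state first makes the fallback a no-op in the case expected here, while still covering any path where that assumption does not hold.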

Brian

> Cheers,
> 
> Dave.
> 
> 
> -- 
> Dave Chinner
> david@fromorbit.com
> 



Thread overview: 57+ messages
2020-04-29 17:21 [PATCH v3 00/17] xfs: flush related error handling cleanups Brian Foster
2020-04-29 17:21 ` [PATCH v3 01/17] xfs: refactor failed buffer resubmission into xfsaild Brian Foster
2020-04-30 17:26   ` Darrick J. Wong
2020-04-29 17:21 ` [PATCH v3 02/17] xfs: factor out buffer I/O failure code Brian Foster
2020-04-30 18:16   ` Darrick J. Wong
2020-05-01  7:43   ` Christoph Hellwig
2020-04-29 17:21 ` [PATCH v3 03/17] xfs: simplify inode flush error handling Brian Foster
2020-04-30 18:37   ` Darrick J. Wong
2020-05-01  9:17     ` Christoph Hellwig
2020-05-01 10:17       ` Christoph Hellwig
2020-05-01 17:43         ` Darrick J. Wong
2020-05-01 17:50           ` Christoph Hellwig
2020-05-01 11:22       ` Brian Foster
2020-04-29 17:21 ` [PATCH v3 04/17] xfs: remove unnecessary shutdown check from xfs_iflush() Brian Foster
2020-04-30 18:37   ` Darrick J. Wong
2020-04-29 17:21 ` [PATCH v3 05/17] xfs: reset buffer write failure state on successful completion Brian Foster
2020-04-30 18:41   ` Darrick J. Wong
2020-05-01  7:44   ` Christoph Hellwig
2020-04-29 17:21 ` [PATCH v3 06/17] xfs: refactor ratelimited buffer error messages into helper Brian Foster
2020-04-30 18:42   ` Darrick J. Wong
2020-05-01  7:44   ` Christoph Hellwig
2020-04-29 17:21 ` [PATCH v3 07/17] xfs: ratelimit unmount time per-buffer I/O error alert Brian Foster
2020-04-30 18:43   ` Darrick J. Wong
2020-04-30 22:07   ` Dave Chinner
2020-05-01 11:24     ` Brian Foster [this message]
2020-05-01  7:48   ` Christoph Hellwig
2020-04-29 17:21 ` [PATCH v3 08/17] xfs: fix duplicate verification from xfs_qm_dqflush() Brian Foster
2020-04-30 18:45   ` Darrick J. Wong
2020-05-01 11:24     ` Brian Foster
2020-04-29 17:21 ` [PATCH v3 09/17] xfs: abort consistently on dquot flush failure Brian Foster
2020-04-30 18:46   ` Darrick J. Wong
2020-04-29 17:21 ` [PATCH v3 10/17] xfs: acquire ->ail_lock from xfs_trans_ail_delete() Brian Foster
2020-04-30 18:52   ` Darrick J. Wong
2020-05-01 11:25     ` Brian Foster
2020-05-01  7:50   ` Christoph Hellwig
2020-04-29 17:21 ` [PATCH v3 11/17] xfs: use delete helper for items expected to be in AIL Brian Foster
2020-04-30 18:54   ` Darrick J. Wong
2020-05-01  7:56   ` Christoph Hellwig
2020-04-29 17:21 ` [PATCH v3 12/17] xfs: drop unused shutdown parameter from xfs_trans_ail_remove() Brian Foster
2020-04-30 18:56   ` Darrick J. Wong
2020-05-01  7:57   ` Christoph Hellwig
2020-04-29 17:21 ` [PATCH v3 13/17] xfs: combine xfs_trans_ail_[remove|delete]() Brian Foster
2020-04-30 18:58   ` Darrick J. Wong
2020-05-01  8:01     ` Christoph Hellwig
2020-05-01  8:00   ` Christoph Hellwig
2020-05-01 11:25     ` Brian Foster
2020-04-29 17:21 ` [PATCH v3 14/17] xfs: remove unused iflush stale parameter Brian Foster
2020-04-30 18:58   ` Darrick J. Wong
2020-04-29 17:21 ` [PATCH v3 15/17] xfs: random buffer write failure errortag Brian Foster
2020-04-30 18:59   ` Darrick J. Wong
2020-05-01  8:02   ` Christoph Hellwig
2020-04-29 17:21 ` [PATCH v3 16/17] xfs: remove unused shutdown types Brian Foster
2020-04-30 18:59   ` Darrick J. Wong
2020-04-29 17:21 ` [PATCH v3 17/17] xfs: remove unused iget_flags param from xfs_imap_to_bp() Brian Foster
2020-04-30 19:00   ` Darrick J. Wong
2020-05-01  8:03     ` Christoph Hellwig
2020-05-01 11:25     ` Brian Foster