public inbox for linux-xfs@vger.kernel.org
From: Dave Chinner <david@fromorbit.com>
To: Eric Sandeen <sandeen@sandeen.net>
Cc: xfs@oss.sgi.com
Subject: Re: [PATCH] xfs: Introduce permanent async buffer write IO failures
Date: Thu, 19 Feb 2015 10:52:20 +1100
Message-ID: <20150218235220.GQ4251@dastard>
In-Reply-To: <54E51CC7.8040709@sandeen.net>

On Wed, Feb 18, 2015 at 05:14:15PM -0600, Eric Sandeen wrote:
> On 2/18/15 4:32 PM, Dave Chinner wrote:
> >  	/*
> > -	 * If the write of the buffer was synchronous, we want to make
> > -	 * sure to return the error to the caller of xfs_bwrite().
> > +	 * Repeated failure on an async write.
> > +	 *
> > +	 * Now we need to classify the error and determine the correct action to
> > +	 * take. Different types of errors will require different processing,
> > +	 * but make the default classification "transient" so that we keep
> > +	 * retrying in these cases.  If this is the wrong thing to do, then we'll
> > +	 * get reports that the filesystem hung in the presence of that type of
> > +	 * error and we can take appropriate action to remedy the issue for that
> > +	 * type of error.
> >  	 */
> 
> So, I think this is the tricky part.
> 
> No errno has a universal meaning, and we don't know what kind of block device
> we're talking to.  One device's ENOSPC may be another's EIO, and either may or
> may not be permanent, maybe ENODEV *isn't* permanent.  (...is it always permanent?)

When a device is unplugged and then plugged back in it comes back as
a different device. So, AFAICT, if the device goes away then we'll
never be able to recover because the underlying block device never
comes back...

> My first feeble hack at this was counting consecutive errors, and
> hard failing after a set number.  Thinking about that later, it
> seems like something time-based might be better than
> io-count-based.

Possibly. IOs usually time out after 30s, so EIO is going to have to
be delayed at least for long enough for things like FC transport
reconnect periods (worst case is 240s, IIRC) regardless of the
number of IOs...
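
To illustrate the time-based idea, here is a rough userspace sketch,
not code from the patch. The structure and function names are
hypothetical: it remembers when the first failure in a run of
failures was seen and only reports a timeout once failures have
persisted for longer than a configured period, regardless of how many
IOs failed in between.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-buffer failure state; not an XFS structure. */
struct write_failure_state {
	bool	failing;	/* true while a run of failures is outstanding */
	time_t	first_failure;	/* time of the first failure in this run */
};

/*
 * Record a failed write seen at time 'now'. Returns true once failures
 * have persisted for more than 'timeout_secs' since the first failure
 * in the current run, no matter how many IOs have failed in between.
 */
static bool failure_timed_out(struct write_failure_state *st, time_t now,
			      time_t timeout_secs)
{
	if (!st->failing) {
		/* First failure of a new run: start the clock. */
		st->failing = true;
		st->first_failure = now;
		return false;
	}
	return now - st->first_failure > timeout_secs;
}

/* A successful write ends the current run of failures. */
static void write_succeeded(struct write_failure_state *st)
{
	st->failing = false;
}
```

With a 240s timeout, a failure at t=1000 starts the clock, a second
failure at t=1240 is still inside the window, and a failure at t=1241
is the first one reported as timed out.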

> Can we really simply switch on an error?  If nothing else, this might have
> to be configurable somehow, so that an admin can choose which errors for
> which device are desired to be "permanent."

Well, the switch is simple characterisation. What we do with that
error type can be much more complex, and that's why I haven't tried
to address those issues here. When we've sorted out what we need
and how we are going to configure the error handling, then we can
add it.
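
To show what "simple characterisation" means here, a minimal sketch
follows. It is not the code from the patch: the enum and function
names are made up for illustration, and treating ENODEV as permanent
is only the assumption discussed above (an unplugged device never
comes back as the same block device). Everything else falls through
to the "transient" default.

```c
#include <errno.h>

/* Hypothetical classification; names are not taken from the patch. */
enum buf_error_class {
	BUF_ERR_TRANSIENT,	/* keep retrying the write */
	BUF_ERR_PERMANENT,	/* give up and fail the buffer */
};

/*
 * Classify a positive errno from a repeatedly failing async buffer
 * write. The switch only characterises the error; what is done with
 * each class is decided elsewhere.
 */
static enum buf_error_class classify_buf_error(int error)
{
	switch (error) {
	case ENODEV:
		/* Assumption: the device is gone and will not return. */
		return BUF_ERR_PERMANENT;
	default:
		/* Default to transient so that we keep retrying. */
		return BUF_ERR_TRANSIENT;
	}
}
```

New error types would be added as extra cases once we know how they
should be handled.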

e.g. if we need configurable error handling, it needs to be
configurable for different error types, and it needs to be
configurable on a per-mount basis. And it needs to be configurable
at runtime, not just at mount time. That kind of leads to using
sysfs for this. e.g. for each error type we need to be able to
configure different behaviour:

$ cat /sys/fs/xfs/vda/meta_write_errors/enospc/type
[transient] permanent
$ cat /sys/fs/xfs/vda/meta_write_errors/enospc/perm_timeout_seconds
300
$ cat /sys/fs/xfs/vda/meta_write_errors/enospc/perm_max_retry_attempts
50
$ cat /sys/fs/xfs/vda/meta_write_errors/enospc/transient_fail_at_umount
1

And then have generic infrastructure to set it up and handle the
buffer errors according to the config?
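
As a sketch of how such a per-error config might drive the decision,
something like the following could work. Everything here is an
assumption for illustration: the structure simply mirrors the four
sysfs files above, the names are hypothetical, and the semantics (a
"permanent" error is retried until either limit is reached, a
"transient" error is retried forever unless we are unmounting and
transient_fail_at_umount is set) are one possible reading, not a
settled design.

```c
#include <stdbool.h>

enum err_type   { ERR_TRANSIENT, ERR_PERMANENT };
enum err_action { ACTION_RETRY, ACTION_FAIL };

/* Hypothetical in-memory mirror of one meta_write_errors/<errno>/ dir. */
struct meta_write_error_cfg {
	enum err_type	type;
	unsigned int	perm_timeout_seconds;
	unsigned int	perm_max_retry_attempts;
	bool		transient_fail_at_umount;
};

/*
 * Decide what to do with a failed metadata write, given how many times
 * it has been retried, how long it has been failing and whether the
 * filesystem is being unmounted.
 */
static enum err_action
decide_action(const struct meta_write_error_cfg *cfg, unsigned int retries,
	      unsigned int secs_failing, bool unmounting)
{
	if (cfg->type == ERR_TRANSIENT) {
		/* Assumed: only unmount can turn a transient error fatal. */
		if (unmounting && cfg->transient_fail_at_umount)
			return ACTION_FAIL;
		return ACTION_RETRY;
	}

	/* Assumed: permanent errors fail once either limit is reached. */
	if (retries >= cfg->perm_max_retry_attempts ||
	    secs_failing >= cfg->perm_timeout_seconds)
		return ACTION_FAIL;
	return ACTION_RETRY;
}
```

With the example values above (300 seconds, 50 attempts), a permanent
error would be retried until it has either failed 50 times or been
failing for 300 seconds, whichever comes first.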

> (I think that's accurately summing up irc-and-side-channel discussions) ;)

Pretty much.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs