* Re: [PATCH] xfs: Introduce permanent async buffer write IO failures
2015-02-18 22:32 [PATCH] xfs: Introduce permanent async buffer write IO failures Dave Chinner
@ 2015-02-18 23:14 ` Eric Sandeen
2015-02-18 23:52 ` Dave Chinner
2015-02-19 14:28 ` Brian Foster
2015-02-19 22:39 ` Eric Sandeen
2 siblings, 1 reply; 10+ messages in thread
From: Eric Sandeen @ 2015-02-18 23:14 UTC (permalink / raw)
To: Dave Chinner, xfs
On 2/18/15 4:32 PM, Dave Chinner wrote:
> /*
> - * If the write of the buffer was synchronous, we want to make
> - * sure to return the error to the caller of xfs_bwrite().
> + * Repeated failure on an async write.
> + *
> + * Now we need to classify the error and determine the correct action to
> + * take. Different types of errors will require different processing,
> + * but make the default classification "transient" so that we keep
> + * retrying in these cases. If this is the wrog thing to do, then we'll
> + * get reports that the filesystem hung in the presence of that type of
> + * error and we can take appropriate action to remedy the issue for that
> + * type of error.
> */
So, I think this is the tricky part.
No errno has a universal meaning, and we don't know what kind of block device
we're talking to. One device's ENOSPC may be another's EIO, and either may or
may not be permanent, maybe ENODEV *isn't* permanent. (...is it always permanent?)
My first feeble hack at this was counting consecutive errors, and hard failing
after a set number. Thinking about that later, it seems like something time-based
might be better than io-count-based.
Can we really simply switch on an error? If nothing else, this might have
to be configurable somehow, so that an admin can choose which errors for
which device are desired to be "permanent."
(I think that's accurately summing up irc-and-side-channel discussions) ;)
Thanks,
-Eric
> + switch (bp->b_error) {
> + case ENODEV:
> + /* permanent error, no recovery possible */
> + xfs_force_shutdown(mp, SHUTDOWN_META_IO_ERROR);
> + goto out_stale;
> + default:
> + /* do nothing, higher layers will retry */
> + break;
> + }
> +
> + xfs_buf_relse(bp);
> + return true;
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: [PATCH] xfs: Introduce permanent async buffer write IO failures
2015-02-18 23:14 ` Eric Sandeen
@ 2015-02-18 23:52 ` Dave Chinner
2015-02-19 19:04 ` Carlos Maiolino
0 siblings, 1 reply; 10+ messages in thread
From: Dave Chinner @ 2015-02-18 23:52 UTC (permalink / raw)
To: Eric Sandeen; +Cc: xfs
On Wed, Feb 18, 2015 at 05:14:15PM -0600, Eric Sandeen wrote:
> On 2/18/15 4:32 PM, Dave Chinner wrote:
> > /*
> > - * If the write of the buffer was synchronous, we want to make
> > - * sure to return the error to the caller of xfs_bwrite().
> > + * Repeated failure on an async write.
> > + *
> > + * Now we need to classify the error and determine the correct action to
> > + * take. Different types of errors will require different processing,
> > + * but make the default classification "transient" so that we keep
> > + * retrying in these cases. If this is the wrog thing to do, then we'll
> > + * get reports that the filesystem hung in the presence of that type of
> > + * error and we can take appropriate action to remedy the issue for that
> > + * type of error.
> > */
>
> So, I think this is the tricky part.
>
> No errno has a universal meaning, and we don't know what kind of block device
> we're talking to. One device's ENOSPC may be another's EIO, and either may or
> may not be permanent, maybe ENODEV *isn't* permanent. (...is it always permanent?)
When a device is unplugged and then plugged back in it comes back as
a different device. So, AFAICT, if the device goes away then we'll
never be able to recover because the underlying block device never
comes back...
> My first feeble hack at this was counting consecutive errors, and
> hard failing after a set number. Thinking about that later, it
> seems like something time-based might be better than
> io-count-based.
Possibly. IOs usually timeout after 30s, so EIO is going to have to
be delayed at least for long enough for things like FC transport
reconnect periods (worse case is 240s, IIRC) regardless of the
number of IOs...
> Can we really simply switch on an error? If nothing else, this might have
> to be configurable somehow, so that an admin can choose which errors for
> which device are desired to be "permanent."
Well, the switch is simple characterisation. What we do with that
error type can be much more complex, and that's why I haven't tried
to address those issues here. When we've sorted out what we need
and how we are going to configure the error handling, then we can
add it.
e.g. if we need configurable error handling, it needs to be
configurable for different error types, and it needs to be
configurable on a per-mount basis. And it needs to be configurable
at runtime, not just at mount time. That kind of leads to using
sysfs for this. e.g. for each error type we ned to handle different
behaviour for:
$ cat /sys/fs/xfs/vda/meta_write_errors/enospc/type
[transient] permanent
$ cat /sys/fs/xfs/vda/meta_write_errors/enospc/perm_timeout_seconds
300
$ cat /sys/fs/xfs/vda/meta_write_errors/enospc/perm_max_retry_attempts
50
$ cat /sys/fs/xfs/vda/meta_write_errors/enospc/transient_fail_at_umount
1
And then have generic infrastructure to set it up and handle the
buffer errors according to the config?
> (I think that's accurately summing up irc-and-side-channel discussions) ;)
Pretty much.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] xfs: Introduce permanent async buffer write IO failures
2015-02-18 23:52 ` Dave Chinner
@ 2015-02-19 19:04 ` Carlos Maiolino
2015-02-19 21:18 ` Dave Chinner
0 siblings, 1 reply; 10+ messages in thread
From: Carlos Maiolino @ 2015-02-19 19:04 UTC (permalink / raw)
To: Dave Chinner; +Cc: Eric Sandeen, xfs
>
> Well, the switch is simple characterisation. What we do with that
> error type can be much more complex, and that's why I haven't tried
> to address those issues here. When we've sorted out what we need
> and how we are going to configure the error handling, then we can
> add it.
>
> e.g. if we need configurable error handling, it needs to be
> configurable for different error types, and it needs to be
> configurable on a per-mount basis. And it needs to be configurable
> at runtime, not just at mount time. That kind of leads to using
> sysfs for this. e.g. for each error type we ned to handle different
> behaviour for:
>
> $ cat /sys/fs/xfs/vda/meta_write_errors/enospc/type
> [transient] permanent
> $ cat /sys/fs/xfs/vda/meta_write_errors/enospc/perm_timeout_seconds
> 300
> $ cat /sys/fs/xfs/vda/meta_write_errors/enospc/perm_max_retry_attempts
> 50
> $ cat /sys/fs/xfs/vda/meta_write_errors/enospc/transient_fail_at_umount
> 1
>
> And then have generic infrastructure to set it up and handle the
> buffer errors according to the config?
>
> > (I think that's accurately summing up irc-and-side-channel discussions) ;)
>
> Pretty much.
>
talking about possible configurable error handlers, what about leave this choice
of failure to the sysadmin? Instead a time or type based configuration what
about something that the administrator could just say "next IO to device X
should fail permanently"?
Anyway, I know it would not be automatic, but it adds some flexibility to the
current behavior.
Anyway, just my 2 cents.
--
Carlos
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] xfs: Introduce permanent async buffer write IO failures
2015-02-19 19:04 ` Carlos Maiolino
@ 2015-02-19 21:18 ` Dave Chinner
0 siblings, 0 replies; 10+ messages in thread
From: Dave Chinner @ 2015-02-19 21:18 UTC (permalink / raw)
To: Eric Sandeen, xfs
On Thu, Feb 19, 2015 at 05:04:19PM -0200, Carlos Maiolino wrote:
> >
> > Well, the switch is simple characterisation. What we do with that
> > error type can be much more complex, and that's why I haven't tried
> > to address those issues here. When we've sorted out what we need
> > and how we are going to configure the error handling, then we can
> > add it.
> >
> > e.g. if we need configurable error handling, it needs to be
> > configurable for different error types, and it needs to be
> > configurable on a per-mount basis. And it needs to be configurable
> > at runtime, not just at mount time. That kind of leads to using
> > sysfs for this. e.g. for each error type we ned to handle different
> > behaviour for:
> >
> > $ cat /sys/fs/xfs/vda/meta_write_errors/enospc/type
> > [transient] permanent
> > $ cat /sys/fs/xfs/vda/meta_write_errors/enospc/perm_timeout_seconds
> > 300
> > $ cat /sys/fs/xfs/vda/meta_write_errors/enospc/perm_max_retry_attempts
> > 50
> > $ cat /sys/fs/xfs/vda/meta_write_errors/enospc/transient_fail_at_umount
> > 1
> >
> > And then have generic infrastructure to set it up and handle the
> > buffer errors according to the config?
> >
> > > (I think that's accurately summing up irc-and-side-channel discussions) ;)
> >
> > Pretty much.
> >
>
> talking about possible configurable error handlers, what about leave this choice
> of failure to the sysadmin? Instead a time or type based configuration what
> about something that the administrator could just say "next IO to device X
> should fail permanently"?
How is this different to just shutting down the filesystem
immediately via 'xfs_io -x -c shutdown /path/to/mnt/pt' ?
Regardless of this, leave failures as transient, then when an
error condition occurs (say thinp device ENOSPC), this will error
out on the next IO that is retried:
# echo permanent > /sys/fs/xfs/vda/meta_write_errors/enospc/type
# echo 0 > /sys/fs/xfs/vda/meta_write_errors/enospc/perm_max_retry_attempts
Will make the next device ENOSPC IO error shut the filesystem down.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] xfs: Introduce permanent async buffer write IO failures
2015-02-18 22:32 [PATCH] xfs: Introduce permanent async buffer write IO failures Dave Chinner
2015-02-18 23:14 ` Eric Sandeen
@ 2015-02-19 14:28 ` Brian Foster
2015-02-19 21:34 ` Dave Chinner
2015-02-19 21:41 ` Eric Sandeen
2015-02-19 22:39 ` Eric Sandeen
2 siblings, 2 replies; 10+ messages in thread
From: Brian Foster @ 2015-02-19 14:28 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On Thu, Feb 19, 2015 at 09:32:20AM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> Currently we consider all async metadata buffer write failures as
> transient failures and retry the IO immediately and then again at a
> a later time. The issue with this is that we can hang forever is the
> error is not recoverable.
>
> An example of this is a device that has been pulled and so is
> returning ENODEV to all IO. Attempting to unmount the filesystem
> with the device in this state will result in a hang waiting for the
> async metadata IO to be flushed and written successfully. In this
> case, we need metadata writeback to error out and trigger a shutdown
> state so the unmount can complete.
>
> Put simply, we need to consider some errors as permanent and
> unrecoverable rather than transient.
>
> Hence add infrastructure that allows us to classify the async write
> errors and hence apply different policies to the different failures.
> In this patch, classify ENODEV as a permanent error but leave
> all the other types of error handling unchanged.
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
> fs/xfs/xfs_buf_item.c | 110 ++++++++++++++++++++++++++++++--------------------
> 1 file changed, 66 insertions(+), 44 deletions(-)
>
> diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
> index 507d96a..274678f 100644
> --- a/fs/xfs/xfs_buf_item.c
> +++ b/fs/xfs/xfs_buf_item.c
> @@ -1041,14 +1041,13 @@ xfs_buf_do_callbacks(
> }
>
> /*
> - * This is the iodone() function for buffers which have had callbacks
> - * attached to them by xfs_buf_attach_iodone(). It should remove each
> - * log item from the buffer's list and call the callback of each in turn.
> - * When done, the buffer's fsprivate field is set to NULL and the buffer
> - * is unlocked with a call to iodone().
> + * Process a write IO error on a buffer with active log items.
> + *
> + * Returns true if the buffer has been completed and released, false if callback
> + * processing still needs to be performed and the IO completed.
> */
> -void
> -xfs_buf_iodone_callbacks(
> +static bool
> +xfs_buf_iodone_callback_error(
> struct xfs_buf *bp)
> {
> struct xfs_log_item *lip = bp->b_fspriv;
> @@ -1056,19 +1055,12 @@ xfs_buf_iodone_callbacks(
> static ulong lasttime;
> static xfs_buftarg_t *lasttarg;
>
> - if (likely(!bp->b_error))
> - goto do_callbacks;
> -
> /*
> * If we've already decided to shutdown the filesystem because of
> * I/O errors, there's no point in giving this a retry.
> */
> - if (XFS_FORCED_SHUTDOWN(mp)) {
> - xfs_buf_stale(bp);
> - XFS_BUF_DONE(bp);
> - trace_xfs_buf_item_iodone(bp, _RET_IP_);
> - goto do_callbacks;
> - }
> + if (XFS_FORCED_SHUTDOWN(mp))
> + goto out_stale;
>
> if (bp->b_target != lasttarg ||
> time_after(jiffies, (lasttime + 5*HZ))) {
> @@ -1077,45 +1069,75 @@ xfs_buf_iodone_callbacks(
> }
> lasttarg = bp->b_target;
>
> + /* synchronous writes will have callers process the error */
> + if (!(bp->b_flags & XBF_ASYNC))
> + goto out_stale;
> +
So if we get an error we drop into this function. If it's a sync I/O,
jump to the state processing and let the callers handle it.
> + trace_xfs_buf_item_iodone_async(bp, _RET_IP_);
> + ASSERT(bp->b_iodone != NULL);
> + xfs_buf_ioerror(bp, 0);
> +
Otherwise it's async... we clear the error here.
> /*
> * If the write was asynchronous then no one will be looking for the
> - * error. Clear the error state and write the buffer out again.
> - *
> - * XXX: This helps against transient write errors, but we need to find
> - * a way to shut the filesystem down if the writes keep failing.
> - *
> - * In practice we'll shut the filesystem down soon as non-transient
> - * errors tend to affect the whole device and a failing log write
> - * will make us give up. But we really ought to do better here.
> + * error. If this is the first failure, clear the error state and write
> + * the buffer out again.
> */
> - if (XFS_BUF_ISASYNC(bp)) {
> - ASSERT(bp->b_iodone != NULL);
> -
> - trace_xfs_buf_item_iodone_async(bp, _RET_IP_);
> -
> - xfs_buf_ioerror(bp, 0); /* errno of 0 unsets the flag */
> -
> - if (!(bp->b_flags & (XBF_STALE|XBF_WRITE_FAIL))) {
> - bp->b_flags |= XBF_WRITE | XBF_ASYNC |
> - XBF_DONE | XBF_WRITE_FAIL;
> - xfs_buf_submit(bp);
> - } else {
> - xfs_buf_relse(bp);
> - }
> -
> - return;
> + if (!(bp->b_flags & (XBF_STALE|XBF_WRITE_FAIL))) {
> + bp->b_flags |= XBF_WRITE | XBF_ASYNC |
> + XBF_DONE | XBF_WRITE_FAIL;
> + xfs_buf_submit(bp);
> + return true;
... and retry once via the XBF_WRITE_FAIL flag.
> }
>
> /*
> - * If the write of the buffer was synchronous, we want to make
> - * sure to return the error to the caller of xfs_bwrite().
> + * Repeated failure on an async write.
> + *
> + * Now we need to classify the error and determine the correct action to
> + * take. Different types of errors will require different processing,
> + * but make the default classification "transient" so that we keep
> + * retrying in these cases. If this is the wrog thing to do, then we'll
> + * get reports that the filesystem hung in the presence of that type of
> + * error and we can take appropriate action to remedy the issue for that
> + * type of error.
> */
> + switch (bp->b_error) {
When is this ever true if we clear the error above?
> + case ENODEV:
> + /* permanent error, no recovery possible */
> + xfs_force_shutdown(mp, SHUTDOWN_META_IO_ERROR);
> + goto out_stale;
If we can't retry, do the error-specific processing. Is doing this after
a retry really the intended behavior? E.g., if ENODEV is permanent, the
retry seems pointless.
> + default:
> + /* do nothing, higher layers will retry */
> + break;
> + }
> +
> + xfs_buf_relse(bp);
> + return true;
> +
> +out_stale:
> xfs_buf_stale(bp);
> XFS_BUF_DONE(bp);
> -
> trace_xfs_buf_error_relse(bp, _RET_IP_);
> + return false;
> +}
> +
> +/*
> + * This is the iodone() function for buffers which have had callbacks attached
> + * to them by xfs_buf_attach_iodone(). We need to iterate the items on the
> + * callback list, mark the buffer as having no more callbacks and then push the
> + * buffer through IO completion processing.
> + */
> +void
> +xfs_buf_iodone_callbacks(
> + struct xfs_buf *bp)
> +{
> + /*
> + * If there is an error, process it. Some errors require us
> + * to run callbacks after failure processing is done so we
> + * detect that and take appropriate action.
> + */
80 columns?
Brian
> + if (bp->b_error && xfs_buf_iodone_callback_error(bp))
> + return;
>
> -do_callbacks:
> xfs_buf_do_callbacks(bp);
> bp->b_fspriv = NULL;
> bp->b_iodone = NULL;
> --
> 2.0.0
>
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: [PATCH] xfs: Introduce permanent async buffer write IO failures
2015-02-19 14:28 ` Brian Foster
@ 2015-02-19 21:34 ` Dave Chinner
2015-02-19 21:41 ` Eric Sandeen
1 sibling, 0 replies; 10+ messages in thread
From: Dave Chinner @ 2015-02-19 21:34 UTC (permalink / raw)
To: Brian Foster; +Cc: xfs
On Thu, Feb 19, 2015 at 09:28:42AM -0500, Brian Foster wrote:
> On Thu, Feb 19, 2015 at 09:32:20AM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> >
> > Currently we consider all async metadata buffer write failures as
> > transient failures and retry the IO immediately and then again at a
> > a later time. The issue with this is that we can hang forever is the
> > error is not recoverable.
> >
> > An example of this is a device that has been pulled and so is
> > returning ENODEV to all IO. Attempting to unmount the filesystem
> > with the device in this state will result in a hang waiting for the
> > async metadata IO to be flushed and written successfully. In this
> > case, we need metadata writeback to error out and trigger a shutdown
> > state so the unmount can complete.
> >
> > Put simply, we need to consider some errors as permanent and
> > unrecoverable rather than transient.
> >
> > Hence add infrastructure that allows us to classify the async write
> > errors and hence apply different policies to the different failures.
> > In this patch, classify ENODEV as a permanent error but leave
> > all the other types of error handling unchanged.
> >
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
....
> > + trace_xfs_buf_item_iodone_async(bp, _RET_IP_);
> > + ASSERT(bp->b_iodone != NULL);
> > + xfs_buf_ioerror(bp, 0);
> > +
>
> Otherwise it's async... we clear the error here.
Ah, well spotted, that shouldn't be there.
>
> > /*
> > * If the write was asynchronous then no one will be looking for the
> > - * error. Clear the error state and write the buffer out again.
> > - *
> > - * XXX: This helps against transient write errors, but we need to find
> > - * a way to shut the filesystem down if the writes keep failing.
> > - *
> > - * In practice we'll shut the filesystem down soon as non-transient
> > - * errors tend to affect the whole device and a failing log write
> > - * will make us give up. But we really ought to do better here.
> > + * error. If this is the first failure, clear the error state and write
> > + * the buffer out again.
> > */
> > - if (XFS_BUF_ISASYNC(bp)) {
> > - ASSERT(bp->b_iodone != NULL);
> > -
> > - trace_xfs_buf_item_iodone_async(bp, _RET_IP_);
> > -
> > - xfs_buf_ioerror(bp, 0); /* errno of 0 unsets the flag */
> > -
> > - if (!(bp->b_flags & (XBF_STALE|XBF_WRITE_FAIL))) {
> > - bp->b_flags |= XBF_WRITE | XBF_ASYNC |
> > - XBF_DONE | XBF_WRITE_FAIL;
> > - xfs_buf_submit(bp);
> > - } else {
> > - xfs_buf_relse(bp);
> > - }
> > -
> > - return;
> > + if (!(bp->b_flags & (XBF_STALE|XBF_WRITE_FAIL))) {
> > + bp->b_flags |= XBF_WRITE | XBF_ASYNC |
> > + XBF_DONE | XBF_WRITE_FAIL;
> > + xfs_buf_submit(bp);
> > + return true;
>
> ... and retry once via the XBF_WRITE_FAIL flag.
It should be in this branch.
>
> > }
> >
> > /*
> > - * If the write of the buffer was synchronous, we want to make
> > - * sure to return the error to the caller of xfs_bwrite().
> > + * Repeated failure on an async write.
> > + *
> > + * Now we need to classify the error and determine the correct action to
> > + * take. Different types of errors will require different processing,
> > + * but make the default classification "transient" so that we keep
> > + * retrying in these cases. If this is the wrog thing to do, then we'll
> > + * get reports that the filesystem hung in the presence of that type of
> > + * error and we can take appropriate action to remedy the issue for that
> > + * type of error.
> > */
> > + switch (bp->b_error) {
>
> When is this ever true if we clear the error above?
>
> > + case ENODEV:
> > + /* permanent error, no recovery possible */
> > + xfs_force_shutdown(mp, SHUTDOWN_META_IO_ERROR);
> > + goto out_stale;
>
> If we can't retry, do the error-specific processing. Is doing this after
> a retry really the intended behavior? E.g., if ENODEV is permanent, the
> retry seems pointless.
I'm leaving the "retry once" logic there on purpose - as a just in
case we had some other transient problem. It doesn't hurt to retry
the IO if there was an error, and if we get another error we can be
pretty certain it wasn't a transient problem.
> > + default:
> > + /* do nothing, higher layers will retry */
> > + break;
> > + }
> > +
bp->b_error should be cleared here....
> > + xfs_buf_relse(bp);
> > + return true;
> > +
> > +out_stale:
> > xfs_buf_stale(bp);
> > XFS_BUF_DONE(bp);
> > -
> > trace_xfs_buf_error_relse(bp, _RET_IP_);
> > + return false;
> > +}
> > +
> > +/*
> > + * This is the iodone() function for buffers which have had callbacks attached
> > + * to them by xfs_buf_attach_iodone(). We need to iterate the items on the
> > + * callback list, mark the buffer as having no more callbacks and then push the
> > + * buffer through IO completion processing.
> > + */
> > +void
> > +xfs_buf_iodone_callbacks(
> > + struct xfs_buf *bp)
> > +{
> > + /*
> > + * If there is an error, process it. Some errors require us
> > + * to run callbacks after failure processing is done so we
> > + * detect that and take appropriate action.
> > + */
>
> 80 columns?
*nod*
That's what I get for hand editing patches. :)
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: [PATCH] xfs: Introduce permanent async buffer write IO failures
2015-02-19 14:28 ` Brian Foster
2015-02-19 21:34 ` Dave Chinner
@ 2015-02-19 21:41 ` Eric Sandeen
2015-02-19 23:02 ` Dave Chinner
1 sibling, 1 reply; 10+ messages in thread
From: Eric Sandeen @ 2015-02-19 21:41 UTC (permalink / raw)
To: Brian Foster, Dave Chinner; +Cc: xfs
On 2/19/15 8:28 AM, Brian Foster wrote:
> On Thu, Feb 19, 2015 at 09:32:20AM +1100, Dave Chinner wrote:
...
>> + if (!(bp->b_flags & (XBF_STALE|XBF_WRITE_FAIL))) {
>> + bp->b_flags |= XBF_WRITE | XBF_ASYNC |
>> + XBF_DONE | XBF_WRITE_FAIL;
>> + xfs_buf_submit(bp);
>> + return true;
so, shouldn't
xfs_buf_ioerror(bp, 0);
go under this conditional to address Brian's point... ok, you just responded to that. :)
FWIW, this is re-setting XBF_ASYNC which must already be set, right?
-Eric
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: [PATCH] xfs: Introduce permanent async buffer write IO failures
2015-02-19 21:41 ` Eric Sandeen
@ 2015-02-19 23:02 ` Dave Chinner
0 siblings, 0 replies; 10+ messages in thread
From: Dave Chinner @ 2015-02-19 23:02 UTC (permalink / raw)
To: Eric Sandeen; +Cc: Brian Foster, xfs
On Thu, Feb 19, 2015 at 03:41:12PM -0600, Eric Sandeen wrote:
> On 2/19/15 8:28 AM, Brian Foster wrote:
> > On Thu, Feb 19, 2015 at 09:32:20AM +1100, Dave Chinner wrote:
>
> ...
>
> >> + if (!(bp->b_flags & (XBF_STALE|XBF_WRITE_FAIL))) {
> >> + bp->b_flags |= XBF_WRITE | XBF_ASYNC |
> >> + XBF_DONE | XBF_WRITE_FAIL;
> >> + xfs_buf_submit(bp);
> >> + return true;
>
> so, shouldn't
>
> xfs_buf_ioerror(bp, 0);
>
> go under this conditional to address Brian's point... ok, you just responded to that. :)
>
> FWIW, this is re-setting XBF_ASYNC which must already be set, right?
To be sure, to be sure.... ;)
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] xfs: Introduce permanent async buffer write IO failures
2015-02-18 22:32 [PATCH] xfs: Introduce permanent async buffer write IO failures Dave Chinner
2015-02-18 23:14 ` Eric Sandeen
2015-02-19 14:28 ` Brian Foster
@ 2015-02-19 22:39 ` Eric Sandeen
2 siblings, 0 replies; 10+ messages in thread
From: Eric Sandeen @ 2015-02-19 22:39 UTC (permalink / raw)
To: Dave Chinner, xfs
What about something along these lines (rough, but works) -
shut down the fs after enough time goes by with sequential errors
and no success, on the same buffer (based on your patch...)
One thing that still happens is a lot of error spew during the
retries, maybe we can make more use of XBF_WRITE_FAIL to only
print errors once?
Based-on-patch-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index 75ff5d5..13c47a1 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -181,6 +181,7 @@ typedef struct xfs_buf {
unsigned int b_page_count; /* size of page array */
unsigned int b_offset; /* page offset in first page */
int b_error; /* error code on I/O */
+ unsigned long b_first_error_time;
const struct xfs_buf_ops *b_ops;
#ifdef XFS_BUF_LOCK_TRACKING
diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index 507d96a..4751c5f 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -1041,14 +1041,13 @@ xfs_buf_do_callbacks(
}
/*
- * This is the iodone() function for buffers which have had callbacks
- * attached to them by xfs_buf_attach_iodone(). It should remove each
- * log item from the buffer's list and call the callback of each in turn.
- * When done, the buffer's fsprivate field is set to NULL and the buffer
- * is unlocked with a call to iodone().
+ * Process a write IO error on a buffer with active log items.
+ *
+ * Returns true if the buffer has been completed and released, false if callback
+ * processing still needs to be performed and the IO completed.
*/
-void
-xfs_buf_iodone_callbacks(
+static bool
+xfs_buf_iodone_callback_error(
struct xfs_buf *bp)
{
struct xfs_log_item *lip = bp->b_fspriv;
@@ -1056,19 +1055,12 @@ xfs_buf_iodone_callbacks(
static ulong lasttime;
static xfs_buftarg_t *lasttarg;
- if (likely(!bp->b_error))
- goto do_callbacks;
-
/*
* If we've already decided to shutdown the filesystem because of
* I/O errors, there's no point in giving this a retry.
*/
- if (XFS_FORCED_SHUTDOWN(mp)) {
- xfs_buf_stale(bp);
- XFS_BUF_DONE(bp);
- trace_xfs_buf_item_iodone(bp, _RET_IP_);
- goto do_callbacks;
- }
+ if (XFS_FORCED_SHUTDOWN(mp))
+ goto out_stale;
if (bp->b_target != lasttarg ||
time_after(jiffies, (lasttime + 5*HZ))) {
@@ -1077,45 +1069,70 @@ xfs_buf_iodone_callbacks(
}
lasttarg = bp->b_target;
+ /* synchronous writes will have callers process the error */
+ if (!(bp->b_flags & XBF_ASYNC))
+ goto out_stale;
+
+ trace_xfs_buf_item_iodone_async(bp, _RET_IP_);
+ ASSERT(bp->b_iodone != NULL);
+
/*
* If the write was asynchronous then no one will be looking for the
- * error. Clear the error state and write the buffer out again.
- *
- * XXX: This helps against transient write errors, but we need to find
- * a way to shut the filesystem down if the writes keep failing.
- *
- * In practice we'll shut the filesystem down soon as non-transient
- * errors tend to affect the whole device and a failing log write
- * will make us give up. But we really ought to do better here.
+ * error. If this is the first failure, clear the error state and write
+ * the buffer out again.
*/
- if (XFS_BUF_ISASYNC(bp)) {
- ASSERT(bp->b_iodone != NULL);
-
- trace_xfs_buf_item_iodone_async(bp, _RET_IP_);
-
- xfs_buf_ioerror(bp, 0); /* errno of 0 unsets the flag */
-
- if (!(bp->b_flags & (XBF_STALE|XBF_WRITE_FAIL))) {
- bp->b_flags |= XBF_WRITE | XBF_ASYNC |
- XBF_DONE | XBF_WRITE_FAIL;
- xfs_buf_submit(bp);
- } else {
- xfs_buf_relse(bp);
- }
-
- return;
+ if (!(bp->b_flags & (XBF_STALE|XBF_WRITE_FAIL))) {
+ if (!bp->b_first_error_time)
+ bp->b_first_error_time = get_seconds();
+ xfs_buf_ioerror(bp, 0);
+ bp->b_flags |= XBF_WRITE | XBF_ASYNC |
+ XBF_DONE | XBF_WRITE_FAIL;
+ xfs_buf_submit(bp);
+ return true;
}
/*
- * If the write of the buffer was synchronous, we want to make
- * sure to return the error to the caller of xfs_bwrite().
+ * Repeated failure on an async write.
+ *
+ * Let things retry for <tuneable handwave> 60s, then give up.
+ * XXX handle seconds wrap?
*/
+ if (get_seconds() - bp->b_first_error_time > 60) {
+ xfs_err(mp, "too many errors, giving up");
+ xfs_force_shutdown(mp, SHUTDOWN_META_IO_ERROR);
+ goto out_stale;
+ }
+
+ xfs_buf_relse(bp);
+ return true;
+
+out_stale:
xfs_buf_stale(bp);
XFS_BUF_DONE(bp);
-
trace_xfs_buf_error_relse(bp, _RET_IP_);
+ return false;
+}
+
+/*
+ * This is the iodone() function for buffers which have had callbacks attached
+ * to them by xfs_buf_attach_iodone(). We need to iterate the items on the
+ * callback list, mark the buffer as having no more callbacks and then push the
+ * buffer through IO completion processing.
+ */
+void
+xfs_buf_iodone_callbacks(
+ struct xfs_buf *bp)
+{
+ /*
+ * If there is an error, process it. Some errors require us
+ * to run callbacks after failure processing is done so we
+ * detect that and take appropriate action.
+ */
+ if (bp->b_error && xfs_buf_iodone_callback_error(bp))
+ return;
-do_callbacks:
+ /* zero out error timer if we're good */
+ bp->b_first_error_time = 0;
xfs_buf_do_callbacks(bp);
bp->b_fspriv = NULL;
bp->b_iodone = NULL;
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 10+ messages in thread