From: Carlos Maiolino <cmaiolino@redhat.com>
To: xfs@oss.sgi.com
Subject: Re: [PATCH 5/7] xfs: add configuration of error failure speed
Date: Wed, 4 May 2016 11:57:29 +0200 [thread overview]
Message-ID: <20160504095729.GB2855@redhat.com> (raw)
In-Reply-To: <20160503232419.GV26977@dastard>
On Wed, May 04, 2016 at 09:24:19AM +1000, Dave Chinner wrote:
> On Tue, May 03, 2016 at 07:55:38PM +0200, Carlos Maiolino wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> >
> > On reception of an error, we can fail immediately, perform some
> > bound amount of retries or retry indefinitely. The current behaviour
> > we have is to retry forever.
> >
> > However, we'd like the ability to choose what behaviour we have, and
> > that requires the ability to configure the behaviour through the new
> > sysfs interfaces.
> >
> > max_retries is used to set the number of retries to do until permanently fail,
> > the only special case is when it is set to -1, which, will cause the system to
> > retry indefinitely.
>
> This doesn't read particularly well, and its more a description of
> the mechanism (which shoul dbe in the code) rather than the reason
> for implementing it.. The original commit message talked about
> different failure characteristics, which still need to be mentioned
> here. e.g:
>
> We'd like to be able to configure errors to either fail
> immediately (fail fast), retry for some time before
> failing (fail slow) or retry forever (fail never). This is
> implemented by using specific values for the number of
> retries we allow.
Fair enough, I will re-work the patch descriptions in the next version.
> >
> > Add both a maximum retry count and a retry timeout so that we can
> > bound by time and/or physical IO attempts.
> >
> > Finally, plumb these into xfs_buf_iodone error processing so that
> > the error behaviour follows the selected configuration.
> >
> > Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by:
> > Carlos Maiolino <cmaiolino@redhat.com>
>
> You also need to document what changes you've made to the original
> patches. I can't tell what you've changed in each patch just by
> looking at them - this is the first one where I noticed something
> while reading it. i.e. the commit message should have:
>
> [ cmaiolino: removed fail-fast/slow/never and instead use retry count
> to determine behaviour ]
>
Ok
> > ---
> > fs/xfs/xfs_buf.h | 23 ++++++++++++++-
> > fs/xfs/xfs_buf_item.c | 13 +++++++--
> > fs/xfs/xfs_mount.h | 3 +-
> > fs/xfs/xfs_sysfs.c | 81 ++++++++++++++++++++++++++++++++++++++++++++++++---
> > 4 files changed, 112 insertions(+), 8 deletions(-)
> >
> > diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
> > index adef116..bd978f0 100644
> > --- a/fs/xfs/xfs_buf.h
> > +++ b/fs/xfs/xfs_buf.h
> > @@ -183,7 +183,28 @@ typedef struct xfs_buf {
> > unsigned int b_page_count; /* size of page array */
> > unsigned int b_offset; /* page offset in first page */
> > int b_error; /* error code on I/O */
> > - int b_last_error; /* previous async I/O error */
> > +
> > + /*
> > + * async write failure retry count. Initialised to zero on the first
> > + * failure, then when it exceeds the maximum configured without a
> > + * success the write is considered to be failed permanently and the
> > + * iodone handler will take appropriate action.
> > + *
> > + * For retry timeouts, we record the jiffie of the first failure. This
> > + * means that we can change the retry timeout and it on the next error
> > + * it will be checked against the newly configured timeout. This
> > + * prevents buffers getting stuck in retry loops with a really long
> > + * timeout.
> > + *
> > + * last_error is used to ensure that we are getting repeated errors, not
> > + * different errors. e.g. a block device might change ENOSPC to EIO when
> > + * a failure timeout occurs, so we want to re-initialise the error
> > + * retry behaviour appropriately when that happens.
> > + */
> > + int b_retries;
> > + unsigned long b_first_retry_time; /* in jiffies */
> > + int b_last_error;
> > +
> > const struct xfs_buf_ops *b_ops;
> >
> > #ifdef XFS_BUF_LOCK_TRACKING
> > diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
> > index 3a30a88..68ed2a0 100644
> > --- a/fs/xfs/xfs_buf_item.c
> > +++ b/fs/xfs/xfs_buf_item.c
> > @@ -1086,6 +1086,9 @@ xfs_buf_iodone_callback_error(
> > bp->b_flags |= (XBF_WRITE | XBF_ASYNC |
> > XBF_DONE | XBF_WRITE_FAIL);
> > bp->b_last_error = bp->b_error;
> > + bp->b_retries = 0;
> > + bp->b_first_retry_time = jiffies;
> > +
> > xfs_buf_ioerror(bp, 0);
> > xfs_buf_submit(bp);
> > return true;
> > @@ -1095,8 +1098,13 @@ xfs_buf_iodone_callback_error(
> > * Repeated failure on an async write. Take action according to the
> > * error configuration we have been set up to use.
> > */
> > - if (!cfg->max_retries)
> > - goto permanent_error;
> > + if (cfg->max_retries >= 0)
> > + if (++bp->b_retries > cfg->max_retries)
> > + goto permanent_error;
> > + if (cfg->retry_timeout)
> > + if (time_after(jiffies,
> > + cfg->retry_timeout + bp->b_first_retry_time))
> > + goto permanent_error;
>
> This is hard to read and the pattern is prone to future maintenance
> problems. i.e. nested single line if statements are prone to people
> making mistakes with indentation when adding more conditions to
> them.
>
> So:
>
> if (cfg->max_retries != XFS_ERR_RETRY_FOREVER &&
> ++bp->b_retries > cfg->max_retries)
> goto permanent_error;
> if (cfg->retry_timeout &&
> time_after(jiffies, cfg->retry_timeout + bp->b_first_retry_time))
> goto permanent_error;
>
Fair enough, will change this in the next revision.
Thanks for the review
> > /* still a transient error, higher layers will retry */
> > xfs_buf_ioerror(bp, 0);
> > @@ -1139,6 +1147,7 @@ xfs_buf_iodone_callbacks(
> > * retry state here in preparation for the next error that may occur.
> > */
> > bp->b_last_error = 0;
> > + bp->b_retries = 0;
> >
> > xfs_buf_do_callbacks(bp);
> > bp->b_fspriv = NULL;
> > diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
> > index 0c5a976..0382140 100644
> > --- a/fs/xfs/xfs_mount.h
> > +++ b/fs/xfs/xfs_mount.h
> > @@ -54,7 +54,8 @@ enum {
> >
> > struct xfs_error_cfg {
> > struct xfs_kobj kobj;
> > - int max_retries;
> > + int max_retries;
>
> #define XFS_ERR_RETRY_FOREVER (-1)
>
> > + unsigned long retry_timeout; /* in jiffies, 0 = no timeout */
> > };
> ....
> > +static ssize_t
> > +max_retries_store(
> > + struct kobject *kobject,
> > + const char *buf,
> > + size_t count)
> > +{
> > + struct xfs_error_cfg *cfg = to_error_cfg(kobject);
> > + int ret;
> > + int val;
> > +
> > + ret = kstrtoint(buf, 0, &val);
> > + if (ret)
> > + return ret;
> > +
> > + if (val < -1 || val > INT_MAX)
> > + return -EINVAL;
>
> Should special case XFS_ERR_RETRY_FOREVER here. i.e. range is
> 0-INT_MAX || XFS_ERR_RETRY_FOREVER.
>
> if ((val < -1 || val > INT_MAX) &&
> val != XFS_ERR_RETRY_FOREVER)
> return -EINVAL;
>
> --
> Dave Chinner
> david@fromorbit.com
--
Carlos
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
next prev parent reply other threads:[~2016-05-04 9:57 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-05-03 17:55 [PATCH 0/7] Configurable error behavior [V2] Carlos Maiolino
2016-05-03 17:55 ` [PATCH 1/7] xfs: configurable error behaviour via sysfs Carlos Maiolino
2016-05-03 17:55 ` [PATCH 2/7] xfs: introduce metadata IO error class Carlos Maiolino
2016-05-03 17:55 ` [PATCH 3/7] xfs: add configurable error support to metadata buffers Carlos Maiolino
2016-05-03 17:55 ` [PATCH 4/7] xfs: introduce table-based init for error behaviours Carlos Maiolino
2016-05-03 17:55 ` [PATCH 5/7] xfs: add configuration of error failure speed Carlos Maiolino
2016-05-03 23:24 ` Dave Chinner
2016-05-04 9:57 ` Carlos Maiolino [this message]
2016-05-03 17:55 ` [PATCH 6/7] xfs: add configuration handles for specific errors Carlos Maiolino
2016-05-03 17:55 ` [PATCH 7/7] xfs: shutdown filesystem at unmount in case of errors Carlos Maiolino
2016-05-03 22:59 ` [PATCH 0/7] Configurable error behavior [V2] Dave Chinner
2016-05-04 9:53 ` Carlos Maiolino
-- strict thread matches above, loose matches on Subject: below --
2016-05-04 15:43 [PATCH 0/7] Configurable error behavior [V3] Carlos Maiolino
2016-05-04 15:43 ` [PATCH 5/7] xfs: add configuration of error failure speed Carlos Maiolino
2016-05-05 14:10 ` Brian Foster
2016-05-06 0:04 ` Dave Chinner
2016-05-06 10:59 ` Carlos Maiolino
2016-05-10 12:16 [PATCH 0/7] Configurable error behavior [V4] Carlos Maiolino
2016-05-10 12:16 ` [PATCH 5/7] xfs: add configuration of error failure speed Carlos Maiolino
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160504095729.GB2855@redhat.com \
--to=cmaiolino@redhat.com \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox