All of lore.kernel.org
 help / color / mirror / Atom feed
From: Carlos Maiolino <cmaiolino@redhat.com>
To: xfs@oss.sgi.com
Subject: Re: [PATCH 5/7] xfs: add configuration of error failure speed
Date: Wed, 4 May 2016 11:57:29 +0200	[thread overview]
Message-ID: <20160504095729.GB2855@redhat.com> (raw)
In-Reply-To: <20160503232419.GV26977@dastard>

On Wed, May 04, 2016 at 09:24:19AM +1000, Dave Chinner wrote:
> On Tue, May 03, 2016 at 07:55:38PM +0200, Carlos Maiolino wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > On reception of an error, we can fail immediately, perform some
> > bound amount of retries or retry indefinitely. The current behaviour
> > we have is to retry forever.
> > 
> > However, we'd like the ability to choose what behaviour we have, and
> > that requires the ability to configure the behaviour through the new
> > sysfs interfaces.
> > 
> > max_retries is used to set the number of retries to do until permanently fail,
> > the only special case is when it is set to -1, which, will cause the system to
> > retry indefinitely.
> 
> This doesn't read particularly well, and its more a description of
> the mechanism (which shoul dbe in the code) rather than the reason
> for implementing it.. The original commit message talked about
> different failure characteristics, which still need to be mentioned
> here. e.g:
> 
> 	We'd like to be able to configure errors to either fail
> 	immediately (fail fast), retry for some time before
> 	failing (fail slow) or retry forever (fail never). This is
> 	implemented by using specific values for the number of
> 	retries we allow.

Fair enough, I will re-work the patch descriptions in the next version.


> > 
> > Add both a maximum retry count and a retry timeout so that we can
> > bound by time and/or physical IO attempts.
> > 
> > Finally, plumb these into xfs_buf_iodone error processing so that
> > the error behaviour follows the selected configuration.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by:
> > Carlos Maiolino <cmaiolino@redhat.com>
> 
> You also need to document what changes you've made to the original
> patches. I can't tell what you've changed in each patch just by
> looking at them - this is the first one where I noticed something
> while reading it. i.e. the commit message should have:
> 
> [ cmaiolino: removed fail-fast/slow/never and instead use retry count
> to determine behaviour ]
>

Ok
 
> > ---
> >  fs/xfs/xfs_buf.h      | 23 ++++++++++++++-
> >  fs/xfs/xfs_buf_item.c | 13 +++++++--
> >  fs/xfs/xfs_mount.h    |  3 +-
> >  fs/xfs/xfs_sysfs.c    | 81 ++++++++++++++++++++++++++++++++++++++++++++++++---
> >  4 files changed, 112 insertions(+), 8 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
> > index adef116..bd978f0 100644
> > --- a/fs/xfs/xfs_buf.h
> > +++ b/fs/xfs/xfs_buf.h
> > @@ -183,7 +183,28 @@ typedef struct xfs_buf {
> >  	unsigned int		b_page_count;	/* size of page array */
> >  	unsigned int		b_offset;	/* page offset in first page */
> >  	int			b_error;	/* error code on I/O */
> > -	int			b_last_error;	/* previous async I/O error */
> > +
> > +	/*
> > +	 * async write failure retry count. Initialised to zero on the first
> > +	 * failure, then when it exceeds the maximum configured without a
> > +	 * success the write is considered to be failed permanently and the
> > +	 * iodone handler will take appropriate action.
> > +	 *
> > +	 * For retry timeouts, we record the jiffie of the first failure. This
> > +	 * means that we can change the retry timeout and it on the next error
> > +	 * it will be checked against the newly configured timeout. This
> > +	 * prevents buffers getting stuck in retry loops with a really long
> > +	 * timeout.
> > +	 *
> > +	 * last_error is used to ensure that we are getting repeated errors, not
> > +	 * different errors. e.g. a block device might change ENOSPC to EIO when
> > +	 * a failure timeout occurs, so we want to re-initialise the error
> > +	 * retry behaviour appropriately when that happens.
> > +	 */
> > +	int			b_retries;
> > +	unsigned long		b_first_retry_time; /* in jiffies */
> > +	int			b_last_error;
> > +
> >  	const struct xfs_buf_ops	*b_ops;
> >  
> >  #ifdef XFS_BUF_LOCK_TRACKING
> > diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
> > index 3a30a88..68ed2a0 100644
> > --- a/fs/xfs/xfs_buf_item.c
> > +++ b/fs/xfs/xfs_buf_item.c
> > @@ -1086,6 +1086,9 @@ xfs_buf_iodone_callback_error(
> >  		bp->b_flags |= (XBF_WRITE | XBF_ASYNC |
> >  			        XBF_DONE | XBF_WRITE_FAIL);
> >  		bp->b_last_error = bp->b_error;
> > +		bp->b_retries = 0;
> > +		bp->b_first_retry_time = jiffies;
> > +
> >  		xfs_buf_ioerror(bp, 0);
> >  		xfs_buf_submit(bp);
> >  		return true;
> > @@ -1095,8 +1098,13 @@ xfs_buf_iodone_callback_error(
> >  	 * Repeated failure on an async write. Take action according to the
> >  	 * error configuration we have been set up to use.
> >  	 */
> > -	if (!cfg->max_retries)
> > -		goto permanent_error;
> > +	if (cfg->max_retries >= 0)
> > +		if (++bp->b_retries > cfg->max_retries)
> > +			goto permanent_error;
> > +	if (cfg->retry_timeout)
> > +		if (time_after(jiffies,
> > +			       cfg->retry_timeout + bp->b_first_retry_time))
> > +			goto permanent_error;
> 
> This is hard to read and the pattern is prone to future maintenance
> problems. i.e. nested single line if statements are prone to people
> making mistakes with indentation when adding more conditions to
> them.
> 
> So:
> 
> 	if (cfg->max_retries != XFS_ERR_RETRY_FOREVER &&
> 	    ++bp->b_retries > cfg->max_retries)
> 		goto permanent_error;
> 	if (cfg->retry_timeout &&
> 	    time_after(jiffies, cfg->retry_timeout + bp->b_first_retry_time))
> 		goto permanent_error;
>

Fair enough, will change this in the next revision.

Thanks for the review
 
> >  	/* still a transient error, higher layers will retry */
> >  	xfs_buf_ioerror(bp, 0);
> > @@ -1139,6 +1147,7 @@ xfs_buf_iodone_callbacks(
> >  	 * retry state here in preparation for the next error that may occur.
> >  	 */
> >  	bp->b_last_error = 0;
> > +	bp->b_retries = 0;
> >  
> >  	xfs_buf_do_callbacks(bp);
> >  	bp->b_fspriv = NULL;
> > diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
> > index 0c5a976..0382140 100644
> > --- a/fs/xfs/xfs_mount.h
> > +++ b/fs/xfs/xfs_mount.h
> > @@ -54,7 +54,8 @@ enum {
> >  
> >  struct xfs_error_cfg {
> >  	struct xfs_kobj	kobj;
> > -	int		max_retries;
> > +	int		max_retries;
> 
> #define XFS_ERR_RETRY_FOREVER	(-1)
> 
> > +	unsigned long	retry_timeout;	/* in jiffies, 0 = no timeout */
> >  };
> ....
> > +static ssize_t
> > +max_retries_store(
> > +	struct kobject	*kobject,
> > +	const char	*buf,
> > +	size_t		count)
> > +{
> > +	struct xfs_error_cfg *cfg = to_error_cfg(kobject);
> > +	int		ret;
> > +	int		val;
> > +
> > +	ret = kstrtoint(buf, 0, &val);
> > +	if (ret)
> > +		return ret;
> > +
> > +	if (val < -1 || val > INT_MAX)
> > +		return -EINVAL;
> 
> Should special case XFS_ERR_RETRY_FOREVER here. i.e. range is
> 0-INT_MAX || XFS_ERR_RETRY_FOREVER.
> 
> 	if ((val < -1 || val > INT_MAX) &&
> 	    val != XFS_ERR_RETRY_FOREVER)
> 		return -EINVAL;
> 
> -- 
> Dave Chinner
> david@fromorbit.com

-- 
Carlos

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

  reply	other threads:[~2016-05-04  9:57 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-05-03 17:55 [PATCH 0/7] Configurable error behavior [V2] Carlos Maiolino
2016-05-03 17:55 ` [PATCH 1/7] xfs: configurable error behaviour via sysfs Carlos Maiolino
2016-05-03 17:55 ` [PATCH 2/7] xfs: introduce metadata IO error class Carlos Maiolino
2016-05-03 17:55 ` [PATCH 3/7] xfs: add configurable error support to metadata buffers Carlos Maiolino
2016-05-03 17:55 ` [PATCH 4/7] xfs: introduce table-based init for error behaviours Carlos Maiolino
2016-05-03 17:55 ` [PATCH 5/7] xfs: add configuration of error failure speed Carlos Maiolino
2016-05-03 23:24   ` Dave Chinner
2016-05-04  9:57     ` Carlos Maiolino [this message]
2016-05-03 17:55 ` [PATCH 6/7] xfs: add configuration handles for specific errors Carlos Maiolino
2016-05-03 17:55 ` [PATCH 7/7] xfs: shutdown filesystem at unmount in case of errors Carlos Maiolino
2016-05-03 22:59 ` [PATCH 0/7] Configurable error behavior [V2] Dave Chinner
2016-05-04  9:53   ` Carlos Maiolino
  -- strict thread matches above, loose matches on Subject: below --
2016-05-04 15:43 [PATCH 0/7] Configurable error behavior [V3] Carlos Maiolino
2016-05-04 15:43 ` [PATCH 5/7] xfs: add configuration of error failure speed Carlos Maiolino
2016-05-05 14:10   ` Brian Foster
2016-05-06  0:04   ` Dave Chinner
2016-05-06 10:59     ` Carlos Maiolino
2016-05-10 12:16 [PATCH 0/7] Configurable error behavior [V4] Carlos Maiolino
2016-05-10 12:16 ` [PATCH 5/7] xfs: add configuration of error failure speed Carlos Maiolino

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160504095729.GB2855@redhat.com \
    --to=cmaiolino@redhat.com \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.