public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Brian Foster <bfoster@redhat.com>
To: Carlos Maiolino <cmaiolino@redhat.com>
Cc: xfs@oss.sgi.com
Subject: Re: [PATCH 5/7] xfs: add configuration of error failure speed
Date: Thu, 5 May 2016 10:10:56 -0400	[thread overview]
Message-ID: <20160505141055.GE1231@bfoster.bfoster> (raw)
In-Reply-To: <1462376600-8617-6-git-send-email-cmaiolino@redhat.com>

On Wed, May 04, 2016 at 05:43:18PM +0200, Carlos Maiolino wrote:
> On reception of an error, we can fail immediately, perform some
> bound amount of retries or retry indefinitely. The current behaviour
> we have is to retry forever.
> 
> However, we'd like the ability to choose how long the filesystem should try
> after an error, it can either fail immediately, retry a few times, or retry
> forever. This is implemented by using max_retries sysfs attribute, to hold the
> amount of times we allow the filesystem to retry after an error. Being -1 a
> special case where the filesystem will retry indefinitely.
> 
> Add both a maximum retry count and a retry timeout so that we can bound by
> time and/or physical IO attempts.
> 
> Finally, plumb these into xfs_buf_iodone error processing so that
> the error behaviour follows the selected configuration.
> 
> Changelog:
> 
> V3:
> 	- In xfs_buf_iodone_callback_error, use max_retries to decide how long
> 	  the filesystem should retry on errors instead of XFS_ERR_FAIL enums
> 	  and fail_speed
> 
> 	- Remove all code implementing fail_speed attribute from the original
> 	  patch
> 		-- failure_speed_show/store attributes function implementation
> 		-- max_retries_store() now accepts values from -1 up to INT_MAX
> 
> 	- retry_timeout_seconds_show() print fixed:
> 		-- jiffies_to_msecs() should be divided by MSEC_PER_SEC
> 		-- trailing whitespace removed
> 
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
> ---
>  fs/xfs/xfs_buf.h      | 23 ++++++++++++++-
>  fs/xfs/xfs_buf_item.c | 12 ++++++--
>  fs/xfs/xfs_mount.h    |  3 +-
>  fs/xfs/xfs_sysfs.c    | 81 ++++++++++++++++++++++++++++++++++++++++++++++++---
>  4 files changed, 111 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
> index adef116..bd978f0 100644
> --- a/fs/xfs/xfs_buf.h
> +++ b/fs/xfs/xfs_buf.h
> @@ -183,7 +183,28 @@ typedef struct xfs_buf {
>  	unsigned int		b_page_count;	/* size of page array */
>  	unsigned int		b_offset;	/* page offset in first page */
>  	int			b_error;	/* error code on I/O */
> -	int			b_last_error;	/* previous async I/O error */
> +
> +	/*
> +	 * async write failure retry count. Initialised to zero on the first
> +	 * failure, then when it exceeds the maximum configured without a
> +	 * success the write is considered to be failed permanently and the
> +	 * iodone handler will take appropriate action.
> +	 *
> +	 * For retry timeouts, we record the jiffie of the first failure. This
> +	 * means that we can change the retry timeout and it on the next error
> +	 * it will be checked against the newly configured timeout. This
> +	 * prevents buffers getting stuck in retry loops with a really long
> +	 * timeout.

I'd reword the last bit to something like the following:

"This means we can change the retry timeout for buffers already under
I/O and thus avoid getting stuck in retry loops with a really long
timeout."

> +	 *
> +	 * last_error is used to ensure that we are getting repeated errors, not
> +	 * different errors. e.g. a block device might change ENOSPC to EIO when
> +	 * a failure timeout occurs, so we want to re-initialise the error
> +	 * retry behaviour appropriately when that happens.
> +	 */
> +	int			b_retries;
> +	unsigned long		b_first_retry_time; /* in jiffies */
> +	int			b_last_error;
> +
>  	const struct xfs_buf_ops	*b_ops;
>  
>  #ifdef XFS_BUF_LOCK_TRACKING
> diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
> index 3a30a88..6fe6852 100644
> --- a/fs/xfs/xfs_buf_item.c
> +++ b/fs/xfs/xfs_buf_item.c
> @@ -1086,6 +1086,9 @@ xfs_buf_iodone_callback_error(
>  		bp->b_flags |= (XBF_WRITE | XBF_ASYNC |
>  			        XBF_DONE | XBF_WRITE_FAIL);
>  		bp->b_last_error = bp->b_error;
> +		bp->b_retries = 0;
> +		bp->b_first_retry_time = jiffies;
> +
>  		xfs_buf_ioerror(bp, 0);
>  		xfs_buf_submit(bp);
>  		return true;
> @@ -1095,8 +1098,12 @@ xfs_buf_iodone_callback_error(
>  	 * Repeated failure on an async write. Take action according to the
>  	 * error configuration we have been set up to use.
>  	 */
> -	if (!cfg->max_retries)
> -		goto permanent_error;
> +	if ((cfg->max_retries >= 0) &&
> +	    (++bp->b_retries > cfg->max_retries))
> +			goto permanent_error;
> +	if (cfg->retry_timeout &&
> +	    time_after(jiffies, cfg->retry_timeout + bp->b_first_retry_time))
> +			goto permanent_error;
>  
>  	/* still a transient error, higher layers will retry */
>  	xfs_buf_ioerror(bp, 0);
> @@ -1139,6 +1146,7 @@ xfs_buf_iodone_callbacks(
>  	 * retry state here in preparation for the next error that may occur.
>  	 */
>  	bp->b_last_error = 0;
> +	bp->b_retries = 0;
>  
>  	xfs_buf_do_callbacks(bp);
>  	bp->b_fspriv = NULL;
> diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
> index 0c5a976..0382140 100644
> --- a/fs/xfs/xfs_mount.h
> +++ b/fs/xfs/xfs_mount.h
> @@ -54,7 +54,8 @@ enum {
>  
>  struct xfs_error_cfg {
>  	struct xfs_kobj	kobj;
> -	int		max_retries;
> +	int		max_retries;	/* -1 = retry forever */
> +	unsigned long	retry_timeout;	/* in jiffies, 0 = no timeout */
>  };
>  
>  typedef struct xfs_mount {
> diff --git a/fs/xfs/xfs_sysfs.c b/fs/xfs/xfs_sysfs.c
> index 2a5b1cf..c881360 100644
> --- a/fs/xfs/xfs_sysfs.c
> +++ b/fs/xfs/xfs_sysfs.c
> @@ -374,10 +374,6 @@ struct kobj_type xfs_log_ktype = {
>   * and any other future type of IO (e.g. special inode or directory error
>   * handling) we care to support.
>   */
> -static struct attribute *xfs_error_attrs[] = {
> -	NULL,
> -};
> -
>  static inline struct xfs_error_cfg *
>  to_error_cfg(struct kobject *kobject)
>  {
> @@ -385,6 +381,79 @@ to_error_cfg(struct kobject *kobject)
>  	return container_of(kobj, struct xfs_error_cfg, kobj);
>  }
>  
> +static ssize_t
> +max_retries_show(
> +	struct kobject	*kobject,
> +	char		*buf)
> +{
> +	struct xfs_error_cfg *cfg = to_error_cfg(kobject);
> +
> +	return snprintf(buf, PAGE_SIZE, "%d\n", cfg->max_retries);
> +}
> +
> +static ssize_t
> +max_retries_store(
> +	struct kobject	*kobject,
> +	const char	*buf,
> +	size_t		count)
> +{
> +	struct xfs_error_cfg *cfg = to_error_cfg(kobject);
> +	int		ret;
> +	int		val;
> +
> +	ret = kstrtoint(buf, 0, &val);
> +	if (ret)
> +		return ret;
> +
> +	if (val < -1 || val > INT_MAX)
> +		return -EINVAL;
> +

Can an int be greater than INT_MAX? :)

The rest looks fine so you can add:

Reviewed-by: Brian Foster <bfoster@redhat.com>

> +	cfg->max_retries = val;
> +	return count;
> +}
> +XFS_SYSFS_ATTR_RW(max_retries);
> +
> +static ssize_t
> +retry_timeout_seconds_show(
> +	struct kobject	*kobject,
> +	char		*buf)
> +{
> +	struct xfs_error_cfg *cfg = to_error_cfg(kobject);
> +
> +	return snprintf(buf, PAGE_SIZE, "%ld\n",
> +			jiffies_to_msecs(cfg->retry_timeout) / MSEC_PER_SEC);
> +}
> +
> +static ssize_t
> +retry_timeout_seconds_store(
> +	struct kobject	*kobject,
> +	const char	*buf,
> +	size_t		count)
> +{
> +	struct xfs_error_cfg *cfg = to_error_cfg(kobject);
> +	int		ret;
> +	int		val;
> +
> +	ret = kstrtoint(buf, 0, &val);
> +	if (ret)
> +		return ret;
> +
> +	/* 1 day timeout maximum */
> +	if (val < 0 || val > 86400)
> +		return -EINVAL;
> +
> +	cfg->retry_timeout = msecs_to_jiffies(val * MSEC_PER_SEC);
> +	return count;
> +}
> +XFS_SYSFS_ATTR_RW(retry_timeout_seconds);
> +
> +static struct attribute *xfs_error_attrs[] = {
> +	ATTR_LIST(max_retries),
> +	ATTR_LIST(retry_timeout_seconds),
> +	NULL,
> +};
> +
> +
>  struct kobj_type xfs_error_cfg_ktype = {
>  	.release = xfs_sysfs_release,
>  	.sysfs_ops = &xfs_sysfs_ops,
> @@ -404,11 +473,13 @@ struct kobj_type xfs_error_ktype = {
>  struct xfs_error_init {
>  	char		*name;
>  	int		max_retries;
> +	int		retry_timeout;	/* in seconds */
>  };
>  
>  static const struct xfs_error_init xfs_error_meta_init[XFS_ERR_ERRNO_MAX] = {
>  	{ .name = "default",
>  	  .max_retries = -1,
> +	  .retry_timeout = 0,
>  	},
>  };
>  
> @@ -439,6 +510,8 @@ xfs_error_sysfs_init_class(
>  			goto out_error;
>  
>  		cfg->max_retries = init[i].max_retries;
> +		cfg->retry_timeout = msecs_to_jiffies(
> +					init[i].retry_timeout * MSEC_PER_SEC);
>  	}
>  	return 0;
>  
> -- 
> 2.4.11
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

  reply	other threads:[~2016-05-05 14:11 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-05-04 15:43 [PATCH 0/7] Configurable error behavior [V3] Carlos Maiolino
2016-05-04 15:43 ` [PATCH 1/7] xfs: configurable error behavior via sysfs Carlos Maiolino
2016-05-05 14:10   ` Brian Foster
2016-05-04 15:43 ` [PATCH 2/7] xfs: introduce metadata IO error class Carlos Maiolino
2016-05-05 14:10   ` Brian Foster
2016-05-04 15:43 ` [PATCH 3/7] xfs: add configurable error support to metadata buffers Carlos Maiolino
2016-05-05 14:10   ` Brian Foster
2016-05-04 15:43 ` [PATCH 4/7] xfs: introduce table-based init for error behaviors Carlos Maiolino
2016-05-05 14:10   ` Brian Foster
2016-05-04 15:43 ` [PATCH 5/7] xfs: add configuration of error failure speed Carlos Maiolino
2016-05-05 14:10   ` Brian Foster [this message]
2016-05-06  0:04   ` Dave Chinner
2016-05-06 10:59     ` Carlos Maiolino
2016-05-04 15:43 ` [PATCH 6/7] xfs: add configuration handlers for specific errors Carlos Maiolino
2016-05-05 14:11   ` Brian Foster
2016-05-05 23:57     ` Dave Chinner
2016-05-04 15:43 ` [PATCH 7/7] xfs: add "fail at unmount" error handling configuration Carlos Maiolino
2016-05-05 14:11 ` [PATCH 0/7] Configurable error behavior [V3] Brian Foster
2016-05-05 14:37   ` Carlos Maiolino
2016-05-05 15:18     ` Brian Foster
2016-05-05 23:49       ` Dave Chinner
  -- strict thread matches above, loose matches on Subject: below --
2016-05-10 12:16 [PATCH 0/7] Configurable error behavior [V4] Carlos Maiolino
2016-05-10 12:16 ` [PATCH 5/7] xfs: add configuration of error failure speed Carlos Maiolino
2016-05-03 17:55 [PATCH 0/7] Configurable error behavior [V2] Carlos Maiolino
2016-05-03 17:55 ` [PATCH 5/7] xfs: add configuration of error failure speed Carlos Maiolino
2016-05-03 23:24   ` Dave Chinner
2016-05-04  9:57     ` Carlos Maiolino

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160505141055.GE1231@bfoster.bfoster \
    --to=bfoster@redhat.com \
    --cc=cmaiolino@redhat.com \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox