From: Hannes Reinecke <hare@suse.de>
To: Coly Li <colyli@suse.de>, linux-bcache@vger.kernel.org
Cc: linux-block@vger.kernel.org, mlyle@lyle.org, tang.junhui@zte.com.cn
Subject: Re: [PATCH v1 07/10] bcache: set error_limit correctly
Date: Mon, 8 Jan 2018 08:26:52 +0100
Message-ID: <e89b64f2-ff9a-1603-4d0b-6e75283fa993@suse.de>
In-Reply-To: <20180103140325.63175-8-colyli@suse.de>
On 01/03/2018 03:03 PM, Coly Li wrote:
> Struct cache uses io_errors for two purposes:
> - Error decay: when the cache set's error_decay is set, io_errors is
> decayed a little at a time as further I/O completes, so old errors
> gradually count for less.
> - I/O error counter: to leave enough precision for error decay, the
> counter value is stored left-shifted by 20 bits (a.k.a. IO_ERROR_SHIFT).
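A minimal C sketch of the fixed-point encoding described above, assuming a plain (non-atomic) unsigned counter and a hypothetical decay step that scales the stored value by 127/128. The sketch_* names are illustrative only and are not bcache functions. It shows why the left shift matters: a decay step keeps a fractional remainder instead of truncating to zero.

```c
#include <stdint.h>

#define IO_ERROR_SHIFT 20 /* value quoted in the patch description */

/* Record one error in fixed-point form (1 error == 1 << IO_ERROR_SHIFT)
 * and return the real, unshifted error count. */
unsigned sketch_add_error(unsigned *io_errors)
{
	*io_errors += 1u << IO_ERROR_SHIFT;
	return *io_errors >> IO_ERROR_SHIFT;
}

/* Hypothetical decay step: scale the stored value by 127/128.
 * The factor is an assumption chosen for illustration. */
unsigned sketch_decay(unsigned value)
{
	return (unsigned)(((uint64_t)value * 127) / 128);
}
```

With the shift, one recorded error decays to 1040384 (about 0.99 of an error) and can keep shrinking gradually; an unshifted count of 1 truncates to 0 after a single decay step.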
>
> In function bch_count_io_errors(), if the I/O error counter reaches the cache
> set error limit, bch_cache_set_error() is called to retire the whole cache
> set. However, the current code checks the error limit incorrectly; see the
> following excerpt from bch_count_io_errors():
>
>  90     if (error) {
>  91             char buf[BDEVNAME_SIZE];
>  92             unsigned errors = atomic_add_return(1 << IO_ERROR_SHIFT,
>  93                                                 &ca->io_errors);
>  94             errors >>= IO_ERROR_SHIFT;
>  95
>  96             if (errors < ca->set->error_limit)
>  97                     pr_err("%s: IO error on %s, recovering",
>  98                            bdevname(ca->bdev, buf), m);
>  99             else
> 100                     bch_cache_set_error(ca->set,
> 101                                         "%s: too many IO errors %s",
> 102                                         bdevname(ca->bdev, buf), m);
> 103     }
>
> At line 94, errors is right-shifted by IO_ERROR_SHIFT bits, so it holds the
> real error count that is compared at line 96. But ca->set->error_limit is
> initialized with an amplified value in bch_cache_set_alloc():
> 1545 c->error_limit = 8 << IO_ERROR_SHIFT;
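The mismatch can be reproduced with a minimal C model of the quoted check (lines 94 and 96), assuming io_errors already includes the newly added error and replacing the atomic operation with a plain value. sketch_limit_reached() is a hypothetical name, not part of bcache.

```c
#include <stdbool.h>

#define IO_ERROR_SHIFT 20 /* value quoted in the patch description */

/* Model of lines 94-96: shift the stored counter down to a real error
 * count, then compare it with error_limit. Returns true when the cache
 * set would be retired via bch_cache_set_error(). */
bool sketch_limit_reached(unsigned io_errors, unsigned error_limit)
{
	unsigned errors = io_errors >> IO_ERROR_SHIFT;

	return errors >= error_limit;
}
```

Eight recorded errors give io_errors == 8u << IO_ERROR_SHIFT: against the amplified default (8 << IO_ERROR_SHIFT) the check returns false, while a limit of 8 returns true. Assuming a 32-bit unsigned, as on Linux, io_errors >> IO_ERROR_SHIFT can never exceed 4095, so under this model the amplified limit is not reached at all before the counter wraps.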
>
> This means that, by default, bch_count_io_errors() will not call
> bch_cache_set_error() to retire the problematic cache device until 8<<20
> errors have occurred. If the average request size is 64KB and every request
> fails, bcache won't handle the failed device until 512GB of data has been
> requested. That is far too large for an I/O error threshold, so I believe
> the correct error limit should be much smaller.
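The 512GB figure is the amplified limit multiplied by the request size. A small C helper, assuming (as the text implies) that every request to the failed device errors and all requests are 64KB; sketch_bytes_before_retire() is a hypothetical name.

```c
#include <stdint.h>

#define IO_ERROR_SHIFT 20 /* value quoted in the patch description */

/* Bytes that must be requested before error_limit errors are counted,
 * assuming every request fails and all requests have the same size. */
uint64_t sketch_bytes_before_retire(uint64_t error_limit,
				    uint64_t avg_request_bytes)
{
	return error_limit * avg_request_bytes;
}
```

With 8 << IO_ERROR_SHIFT errors at 64 KiB each this gives 2^39 bytes (512 GiB); with a limit of 8 it is only 512 KiB.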
>
> This patch sets the default cache set error limit to 8, so that
> bch_count_io_errors() calls bch_cache_set_error() to retire the whole cache
> set once the error counter reaches 8 (with the default limit). This patch
> also removes the bit shifting when the io_error_limit value is stored or
> shown via the sysfs interface.
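A minimal C model of the behavior described above, assuming error decay is disabled and the counter is a plain unsigned. sketch_store_limit(), sketch_show_limit() and sketch_first_retiring_error() are hypothetical stand-ins, not the real bcache sysfs handlers and not the patch itself.

```c
#define IO_ERROR_SHIFT 20 /* value quoted in the patch description */

/* Hypothetical sysfs store/show after the change: no shifting, so the
 * limit is kept in the same units as the real error count. */
unsigned sketch_store_limit(unsigned user_value) { return user_value; }
unsigned sketch_show_limit(unsigned error_limit) { return error_limit; }

/* Feed failing I/Os one at a time through the quoted check and return
 * the number of the error that would trigger bch_cache_set_error(),
 * or 0 if none does within max_errors. */
unsigned sketch_first_retiring_error(unsigned error_limit, unsigned max_errors)
{
	unsigned io_errors = 0;

	for (unsigned i = 1; i <= max_errors; i++) {
		io_errors += 1u << IO_ERROR_SHIFT;
		if ((io_errors >> IO_ERROR_SHIFT) >= error_limit)
			return i;
	}
	return 0;
}
```

With the unshifted default of 8, the eighth failing I/O triggers retirement; feeding the old amplified limit through the same loop never triggers it within 4000 errors.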
>
> Nowadays most SSDs handle internal flash failures automatically by
> remapping LBAs to other flash blocks. If an I/O error is observed by
> upper-layer code, it is a notable error, because it means the SSD could
> not remap the problematic LBA to an available flash block. This situation
> indicates that the whole SSD will fail very soon. Therefore setting 8 as
> the default I/O error limit makes sense; it is enough for most cache
> devices.
>
> Signed-off-by: Coly Li <colyli@suse.de>
> ---
> drivers/md/bcache/bcache.h | 1 +
> drivers/md/bcache/super.c | 2 +-
> drivers/md/bcache/sysfs.c | 4 ++--
> 3 files changed, 4 insertions(+), 3 deletions(-)
>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Cheers,
Hannes
--
Dr. Hannes Reinecke Teamlead Storage & Networking
hare@suse.de +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
Thread overview: 34+ messages
2018-01-03 14:03 [PATCH v1 00/10] cache device failure handling improvement Coly Li
2018-01-03 14:03 ` [PATCH v1 01/10] bcache: exit bch_writeback_thread() with proper task state Coly Li
2018-01-03 17:08 ` Michael Lyle
2018-01-05 17:05 ` Coly Li
2018-01-05 17:09 ` Michael Lyle
2018-01-08 7:09 ` Hannes Reinecke
2018-01-08 13:50 ` Coly Li
2018-01-03 14:03 ` [PATCH v1 02/10] bcache: set task properly in allocator_wait() Coly Li
2018-01-03 17:09 ` Michael Lyle
2018-01-05 17:11 ` Coly Li
2018-01-08 7:10 ` Hannes Reinecke
2018-01-03 14:03 ` [PATCH v1 03/10] bcache: reduce cache_set devices iteration by devices_max_used Coly Li
2018-01-03 17:11 ` Michael Lyle
2018-01-08 7:12 ` Hannes Reinecke
2018-01-03 14:03 ` [PATCH v1 04/10] bcache: fix cached_dev->count usage for bch_cache_set_error() Coly Li
2018-01-08 7:16 ` Hannes Reinecke
2018-01-03 14:03 ` [PATCH v1 05/10] bcache: stop dc->writeback_rate_update if cache set is stopping Coly Li
2018-01-08 7:22 ` Hannes Reinecke
2018-01-08 16:01 ` Coly Li
2018-01-03 14:03 ` [PATCH v1 06/10] bcache: stop dc->writeback_rate_update, dc->writeback_thread earlier Coly Li
2018-01-08 7:25 ` Hannes Reinecke
2018-01-03 14:03 ` [PATCH v1 07/10] bcache: set error_limit correctly Coly Li
2018-01-08 7:26 ` Hannes Reinecke [this message]
2018-01-03 14:03 ` [PATCH v1 08/10] bcache: fix misleading error message in bch_count_io_errors() Coly Li
2018-01-03 17:14 ` Michael Lyle
2018-01-08 7:27 ` Hannes Reinecke
2018-01-03 14:03 ` [PATCH v1 09/10] bcache: add io_disable to struct cache_set Coly Li
2018-01-08 7:30 ` Hannes Reinecke
2018-01-03 14:03 ` [PATCH v1 10/10] bcache: stop all attached bcache devices for a retired cache set Coly Li
2018-01-08 7:31 ` Hannes Reinecke
2018-01-03 17:07 ` [PATCH v1 00/10] cache device failure handling improvement Michael Lyle
2018-01-04 2:20 ` Coly Li
2018-01-04 17:46 ` Michael Lyle
2018-01-05 4:04 ` Coly Li