linux-bcache.vger.kernel.org archive mirror
* [PATCH v4 00/13] bcache: device failure handling improvement
@ 2018-01-28  1:56 Coly Li
  2018-01-28  1:56 ` [PATCH v4 01/13] bcache: set writeback_rate_update_seconds in range [1, 60] seconds Coly Li
                   ` (13 more replies)
  0 siblings, 14 replies; 17+ messages in thread
From: Coly Li @ 2018-01-28  1:56 UTC
  To: linux-bcache; +Cc: linux-block, Coly Li

Hi maintainers and folks,

This patch set tries to improve bcache device failure handling, covering
both cache device and backing device failures.

The basic idea for handling a failed cache device is:
- Unregister the cache set
- Detach all backing devices which are attached to this cache set
- Stop all the detached bcache devices (configurable)
- Stop all flash-only volumes on the cache set
I call the above process 'cache set retire'. The result of a cache set
retire is that the cache set and its bcache devices are all removed, and
subsequent I/O requests fail immediately to notify the upper layer or
user space code that the cache device has failed or been disconnected.
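
The retire steps above can be sketched as a small user-space model. This
is only an illustration under stated assumptions, not the code from the
patches: the model_* names, the enum and the struct layouts are all
hypothetical, and the per-device stop_on_fail flag merely stands in for
the "configurable" step in the list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; these are NOT the kernel's structures. */
enum model_dev_state { DEV_ATTACHED, DEV_DETACHED, DEV_STOPPED };

struct model_bdev {
	enum model_dev_state state;
	bool flash_only;   /* flash-only volume living on the cache set */
	bool stop_on_fail; /* assumed per-device switch for the configurable step */
};

struct model_cache_set {
	bool registered;
	struct model_bdev *devs;
	size_t nr_devs;
};

/*
 * Apply the four retire steps from the list. Steps 2-4 are applied per
 * device in a single loop here; the real ordering in the kernel may differ.
 */
static void model_cache_set_retire(struct model_cache_set *c)
{
	assert(c != NULL);

	c->registered = false;                  /* unregister the cache set */

	for (size_t i = 0; i < c->nr_devs; i++) {
		struct model_bdev *d = &c->devs[i];

		if (d->flash_only) {
			d->state = DEV_STOPPED; /* stop flash-only volume */
			continue;
		}
		d->state = DEV_DETACHED;        /* detach backing device */
		if (d->stop_on_fail)
			d->state = DEV_STOPPED; /* stop the detached device */
	}
}

/* Returns -1 if I/O to the device fails immediately, 0 otherwise. */
static int model_submit_io(const struct model_bdev *d)
{
	return d->state == DEV_STOPPED ? -1 : 0;
}
```

In this model a device with stop_on_fail cleared ends up detached but
still usable, while every stopped device rejects I/O right away.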

For a failed backing device, there are two kinds of failure to handle:
- If the device is disconnected and the kernel thread
  dc->status_update_thread finds it has been offline for
  BACKING_DEV_OFFLINE_TIMEOUT (5) seconds, the kernel thread sets
  dc->io_disable and calls bcache_device_stop() to stop and remove the
  bcache device from the system.
- If the device is alive but returns too many I/O errors, then once the
  number of errors exceeds dc->error_limit, bch_cached_dev_error() is
  called to set dc->io_disable and stop the bcache device. The broken
  backing device and its bcache device are then removed from the system.

The v4 patch set combines two v3 patches into one, and adds one more patch
that lets users explicitly avoid stopping the bcache devices attached to a
retiring cache set. This configurable option was suggested by
Nix <nix@esperi.org.uk>.

Some patches of this patch set are already in bcache-for-next and are no
longer included here. Most of the patches have been reviewed by Hannes
Reinecke and Junhui Tang. Several patches still need to be reviewed:
- [PATCH v4 05/13] bcache: stop dc->writeback_rate_update properly
- [PATCH v4 13/13] bcache: add stop_attached_devs_on_fail to struct
  cached_dev

Any comments, questions and reviews are warmly welcome. Thanks in advance.

Changelog:
v4: add per-cached_dev option stop_attached_devs_on_fail to avoid stopping
    bcache devices attached to a retiring cache set.
v3: fix detach issue found in v2 patch set.
v2: fix all problems found in v1 review.
    add patches to handle backing device failure.
    add one more patch to set writeback_rate_update_seconds range.
    include a patch from Junhui Tang.
v1: the initial version, only handles cache device failure.

Coly Li
---

Coly Li (12):
  bcache: set writeback_rate_update_seconds in range [1, 60] seconds
  bcache: properly set task state in bch_writeback_thread()
  bcache: fix cached_dev->count usage for bch_cache_set_error()
  bcache: quit dc->writeback_thread when BCACHE_DEV_DETACHING is set
  bcache: stop dc->writeback_rate_update properly
  bcache: set error_limit correctly
  bcache: add CACHE_SET_IO_DISABLE to struct cache_set flags
  bcache: stop all attached bcache devices for a retired cache set
  bcache: add backing_request_endio() for bi_end_io of attached backing
    device I/O
  bcache: add io_disable to struct cached_dev
  bcache: stop bcache device when backing device is offline
  bcache: add stop_attached_devs_on_fail to struct cached_dev

Tang Junhui (1):
  bcache: fix inaccurate io state for detached bcache devices

 drivers/md/bcache/alloc.c     |   5 +-
 drivers/md/bcache/bcache.h    |  38 ++++++++-
 drivers/md/bcache/btree.c     |  10 ++-
 drivers/md/bcache/io.c        |  16 +++-
 drivers/md/bcache/journal.c   |   4 +-
 drivers/md/bcache/request.c   | 187 +++++++++++++++++++++++++++++++++++-------
 drivers/md/bcache/super.c     | 181 ++++++++++++++++++++++++++++++++++++----
 drivers/md/bcache/sysfs.c     |  55 ++++++++++++-
 drivers/md/bcache/util.h      |   6 --
 drivers/md/bcache/writeback.c |  99 ++++++++++++++++++----
 drivers/md/bcache/writeback.h |   5 +-
 11 files changed, 522 insertions(+), 84 deletions(-)

-- 
2.15.1


Thread overview: 17+ messages
2018-01-28  1:56 [PATCH v4 00/13] bcache: device failure handling improvement Coly Li
2018-01-28  1:56 ` [PATCH v4 01/13] bcache: set writeback_rate_update_seconds in range [1, 60] seconds Coly Li
2018-01-28  1:56 ` [PATCH v4 02/13] bcache: properly set task state in bch_writeback_thread() Coly Li
2018-01-28  1:56 ` [PATCH v4 03/13] bcache: fix cached_dev->count usage for bch_cache_set_error() Coly Li
2018-01-28  1:56 ` [PATCH v4 04/13] bcache: quit dc->writeback_thread when BCACHE_DEV_DETACHING is set Coly Li
2018-01-28  1:56 ` [PATCH v4 05/13] bcache: stop dc->writeback_rate_update properly Coly Li
2018-01-28  1:56 ` [PATCH v4 06/13] bcache: set error_limit correctly Coly Li
2018-01-28  1:56 ` [PATCH v4 07/13] bcache: add CACHE_SET_IO_DISABLE to struct cache_set flags Coly Li
2018-01-28  1:56 ` [PATCH v4 08/13] bcache: stop all attached bcache devices for a retired cache set Coly Li
2018-01-28  1:56 ` [PATCH v4 09/13] bcache: fix inaccurate io state for detached bcache devices Coly Li
2018-01-28  1:56 ` [PATCH v4 10/13] bcache: add backing_request_endio() for bi_end_io of attached backing device I/O Coly Li
2018-01-28  1:56 ` [PATCH v4 11/13] bcache: add io_disable to struct cached_dev Coly Li
2018-01-28  1:56 ` [PATCH v4 12/13] bcache: stop bcache device when backing device is offline Coly Li
2018-01-28  1:56 ` [PATCH v4 13/13] bcache: add stop_when_cache_set_failed to struct cached_dev Coly Li
2018-02-01 21:52 ` [PATCH v4 00/13] bcache: device failure handling improvement Michael Lyle
2018-02-02  2:04   ` Coly Li
  -- strict thread matches above, loose matches on Subject: below --
2018-01-27 14:23 Coly Li
2018-01-27 14:23 ` [PATCH v4 04/13] bcache: quit dc->writeback_thread when BCACHE_DEV_DETACHING is set Coly Li
