public inbox for linux-bcache@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v1 00/10] cache device failure handling improvement
@ 2018-01-03 14:03 Coly Li
  2018-01-03 14:03 ` [PATCH v1 01/10] bcache: exit bch_writeback_thread() with proper task state Coly Li
                   ` (10 more replies)
  0 siblings, 11 replies; 36+ messages in thread
From: Coly Li @ 2018-01-03 14:03 UTC (permalink / raw)
  To: linux-bcache; +Cc: linux-block, mlyle, tang.junhui, Coly Li

Hi maintainers and folks,

This patch set tries to improve cache device failure handling. A basic
idea to handle failed cache device is,
- Unregister cache set
- Detach all backing devices attached to this cache set
- Stop all bcache devices linked to this cache set
The above process is named 'cache set retire' by me. The result of cache
set retire is, cache set and bcache devices are all removed, following
I/O requests will get failed immediately to notift upper layer or user
space coce that the cache device is failed or disconnected.

The first 8 patches of this patch set is to fix existing bugs in bcache,
the last 2 patches do the real improvement. Order of applying these patches
is important, if the last 2 patches are applied firstly, kernel panic or
process hang will be observed. Therefore I suggest to apply the first 8
fixes, then apply the last 2 patches.

The patch set is tested with writethrough, writeback, writearound mode,
read/write/readwrite workloads, so far it works as expected. IMHO the
cache set retire logic is complicated, I need your help to review the
patches, any question is warmly wlecome.

Coly Li (10):
  bcache: exit bch_writeback_thread() with proper task state
  bcache: set task properly in allocator_wait()
  bcache: reduce cache_set devices iteration by devices_max_used
  bcache: fix cached_dev->count usage for bch_cache_set_error()
  bcache: stop dc->writeback_rate_update if cache set is stopping
  bcache: stop dc->writeback_rate_update, dc->writeback_thread earlier
  bcache: set error_limit correctly
  bcache: fix misleading error message in bch_count_io_errors()
  bcache: add io_disable to struct cache_set
  bcache: stop all attached bcache devices for a retired cache set

 drivers/md/bcache/alloc.c     |  5 ++---
 drivers/md/bcache/bcache.h    | 19 +++++++++++++++-
 drivers/md/bcache/btree.c     |  8 ++++---
 drivers/md/bcache/io.c        | 15 ++++++++-----
 drivers/md/bcache/journal.c   |  4 ++--
 drivers/md/bcache/request.c   | 26 ++++++++++++++++------
 drivers/md/bcache/super.c     | 51 +++++++++++++++++++++++++++++++++++--------
 drivers/md/bcache/sysfs.c     |  8 +++++--
 drivers/md/bcache/util.h      |  6 -----
 drivers/md/bcache/writeback.c | 51 +++++++++++++++++++++++++++++++++----------
 drivers/md/bcache/writeback.h |  4 +---
 11 files changed, 144 insertions(+), 53 deletions(-)

Thanks in advance.

Coly Li

^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: [PATCH v1 06/10] bcache: stop dc->writeback_rate_update, dc->writeback_thread earlier
@ 2018-01-03 17:27 tang.junhui
  2018-01-04  8:56 ` Coly Li
  0 siblings, 1 reply; 36+ messages in thread
From: tang.junhui @ 2018-01-03 17:27 UTC (permalink / raw)
  To: colyli; +Cc: mlyle, linux-bcache, linux-block, tang.junhui

From: Tang Junhui <tang.junhui@zte.com.cn>

Hello Coly,
Thanks for your works!

Acctually stopping write-back thread and writeback_rate_update work in 
bcache_device_detach() has already done in:
https://github.com/mlyle/linux/commit/397d02e162b8ee11940a4e9f45e16fee0650d64e

Is it nessary to add "rate_update_canceled" to identify that
whether writeback_rate_update work are already stoped or not? 
I think it is ok to call cancel_delayed_work_sync(&dc->writeback_rate_update)
twice before the memory of writeback_rate_update being released (It will return
-ENOENT).

>Delayed worker dc->writeback_rate_update and kernel thread
>dc->writeback_thread reference cache set data structure in their routine,
>Therefor, before they are stopped, cache set should not be release. Other-
>wise, NULL pointer deference will be triggered.
>
>Currenly delayed worker dc->writeback_rate_update and kernel thread
>dc->writeback_thread are stopped in cached_dev_free(). When cache set is
>retiring by too many I/O errors, cached_dev_free() is called when refcount
>of bcache device's closure (disk.cl) reaches 0. In most of cases, last
>refcount of disk.cl is dropped in last line of cached_dev_detach_finish().
>But in cached_dev_detach_finish() before calling closure_put(&dc->disk.cl),
>bcache_device_detach() is called, and inside bcache_device_detach()
>refcount of cache_set->caching is dropped by closure_put(&d->c->caching).
>
>It is very probably this is the last refcount of this closure, so routine
>cache_set_flush() will be called (it is set in __cache_set_unregister()),
>and its parent closure cache_set->cl may also drop its last refcount and
>cache_set_free() is called too. In cache_set_free() the last refcount of
>cache_set->kobj is dropped and then bch_cache_set_release() is called. Now
>in bch_cache_set_release(), the memory of struct cache_set is freeed.
>
>bch_cache_set_release() is called before cached_dev_free(), then there is a
>time window after cache set memory freed and before dc->writeback_thread
>and dc->writeback_rate_update stopped, if one of them is scheduled to run,
>a NULL pointer deference will be triggered.
>
>This patch fixes the above problem by stopping dc->writeback_thread and
>dc->writeback_rate_update earlier in bcache_device_detach() before calling
>closure_put(&d->c->caching). Because cancel_delayed_work_sync() and
>kthread_stop() are synchronized operations, we can make sure cache set
>is available when the delayed work and kthread are stopping.
>
>Because cached_dev_free() can also be called by writing 1 to sysfs file
>/sys/block/bcache<N>/bcache/stop, this code path may not call
>bcache_device_detach() if d-c is NULL. So stopping dc->writeback_thread
>and dc->writeback_rate_update in cached_dev_free() is still necessary. In
>order to avoid stop them twice, dc->rate_update_canceled is added to
>indicate dc->writeback_rate_update is canceled, and dc->writeback_thread
>is set to NULL to indicate it is stopped.
>
>Signed-off-by: Coly Li <colyli@suse.de>
>---
> drivers/md/bcache/bcache.h    |  1 +
> drivers/md/bcache/super.c     | 21 +++++++++++++++++++--
> drivers/md/bcache/writeback.c |  1 +
> 3 files changed, 21 insertions(+), 2 deletions(-)
>
>diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
>index 83c569942bd0..395b87942a2f 100644
>--- a/drivers/md/bcache/bcache.h
>+++ b/drivers/md/bcache/bcache.h
>@@ -322,6 +322,7 @@ struct cached_dev {
> 
>     struct bch_ratelimit    writeback_rate;
>     struct delayed_work    writeback_rate_update;
>+    bool            rate_update_canceled;
> 
>     /*
>      * Internal to the writeback code, so read_dirty() can keep track of
>diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
>index 5401d2356aa3..8912be4165c5 100644
>--- a/drivers/md/bcache/super.c
>+++ b/drivers/md/bcache/super.c
>@@ -696,8 +696,20 @@ static void bcache_device_link(struct bcache_device *d, struct cache_set *c,
> 
> static void bcache_device_detach(struct bcache_device *d)
> {
>+    struct cached_dev *dc;
>+
>     lockdep_assert_held(&bch_register_lock);
> 
>+    dc = container_of(d, struct cached_dev, disk);
>+    if (!IS_ERR_OR_NULL(dc->writeback_thread)) {
>+        kthread_stop(dc->writeback_thread);
>+        dc->writeback_thread = NULL;
>+    }
>+    if (!dc->rate_update_canceled) {
>+        cancel_delayed_work_sync(&dc->writeback_rate_update);
>+        dc->rate_update_canceled = true;
>+    }
>+
>     if (test_bit(BCACHE_DEV_DETACHING, &d->flags)) {
>         struct uuid_entry *u = d->c->uuids + d->id;
> 
>@@ -1071,9 +1083,14 @@ static void cached_dev_free(struct closure *cl)
> {
>     struct cached_dev *dc = container_of(cl, struct cached_dev, disk.cl);
> 
>-    cancel_delayed_work_sync(&dc->writeback_rate_update);
>-    if (!IS_ERR_OR_NULL(dc->writeback_thread))
>+    if (!dc->rate_update_canceled) {
>+        cancel_delayed_work_sync(&dc->writeback_rate_update);
>+        dc->rate_update_canceled = true;
>+    }
>+    if (!IS_ERR_OR_NULL(dc->writeback_thread)) {
>         kthread_stop(dc->writeback_thread);
>+        dc->writeback_thread = NULL;
>+    }
>     if (dc->writeback_write_wq)
>         destroy_workqueue(dc->writeback_write_wq);
> 
>diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
>index 745d9b2a326f..ab2ac3d72393 100644
>--- a/drivers/md/bcache/writeback.c
>+++ b/drivers/md/bcache/writeback.c
>@@ -548,6 +548,7 @@ void bch_cached_dev_writeback_init(struct cached_dev *dc)
>     dc->writeback_rate_i_term_inverse = 10000;
> 
>     INIT_DELAYED_WORK(&dc->writeback_rate_update, update_writeback_rate);
>+    dc->rate_update_canceled = false;
> }
 

Thanks,
Tang Junhui

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2018-01-08 16:01 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-01-03 14:03 [PATCH v1 00/10] cache device failure handling improvement Coly Li
2018-01-03 14:03 ` [PATCH v1 01/10] bcache: exit bch_writeback_thread() with proper task state Coly Li
2018-01-03 17:08   ` Michael Lyle
2018-01-05 17:05     ` Coly Li
2018-01-05 17:09       ` Michael Lyle
2018-01-08  7:09   ` Hannes Reinecke
2018-01-08 13:50     ` Coly Li
2018-01-03 14:03 ` [PATCH v1 02/10] bcache: set task properly in allocator_wait() Coly Li
2018-01-03 17:09   ` Michael Lyle
2018-01-05 17:11     ` Coly Li
2018-01-08  7:10   ` Hannes Reinecke
2018-01-03 14:03 ` [PATCH v1 03/10] bcache: reduce cache_set devices iteration by devices_max_used Coly Li
2018-01-03 17:11   ` Michael Lyle
2018-01-08  7:12   ` Hannes Reinecke
2018-01-03 14:03 ` [PATCH v1 04/10] bcache: fix cached_dev->count usage for bch_cache_set_error() Coly Li
2018-01-08  7:16   ` Hannes Reinecke
2018-01-03 14:03 ` [PATCH v1 05/10] bcache: stop dc->writeback_rate_update if cache set is stopping Coly Li
2018-01-08  7:22   ` Hannes Reinecke
2018-01-08 16:01     ` Coly Li
2018-01-03 14:03 ` [PATCH v1 06/10] bcache: stop dc->writeback_rate_update, dc->writeback_thread earlier Coly Li
2018-01-08  7:25   ` Hannes Reinecke
2018-01-03 14:03 ` [PATCH v1 07/10] bcache: set error_limit correctly Coly Li
2018-01-08  7:26   ` Hannes Reinecke
2018-01-03 14:03 ` [PATCH v1 08/10] bcache: fix misleading error message in bch_count_io_errors() Coly Li
2018-01-03 17:14   ` Michael Lyle
2018-01-08  7:27   ` Hannes Reinecke
2018-01-03 14:03 ` [PATCH v1 09/10] bcache: add io_disable to struct cache_set Coly Li
2018-01-08  7:30   ` Hannes Reinecke
2018-01-03 14:03 ` [PATCH v1 10/10] bcache: stop all attached bcache devices for a retired cache set Coly Li
2018-01-08  7:31   ` Hannes Reinecke
2018-01-03 17:07 ` [PATCH v1 00/10] cache device failure handling improvement Michael Lyle
2018-01-04  2:20   ` Coly Li
2018-01-04 17:46     ` Michael Lyle
2018-01-05  4:04       ` Coly Li
  -- strict thread matches above, loose matches on Subject: below --
2018-01-03 17:27 [PATCH v1 06/10] bcache: stop dc->writeback_rate_update, dc->writeback_thread earlier tang.junhui
2018-01-04  8:56 ` Coly Li

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox