public inbox for linux-bcache@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v1 00/10] cache device failure handling improvement
@ 2018-01-03 14:03 Coly Li
  2018-01-03 14:03 ` [PATCH v1 01/10] bcache: exit bch_writeback_thread() with proper task state Coly Li
                   ` (10 more replies)
  0 siblings, 11 replies; 41+ messages in thread
From: Coly Li @ 2018-01-03 14:03 UTC (permalink / raw)
  To: linux-bcache; +Cc: linux-block, mlyle, tang.junhui, Coly Li

Hi maintainers and folks,

This patch set tries to improve cache device failure handling. A basic
idea to handle failed cache device is,
- Unregister cache set
- Detach all backing devices attached to this cache set
- Stop all bcache devices linked to this cache set
The above process is named 'cache set retire' by me. The result of cache
set retire is, cache set and bcache devices are all removed, following
I/O requests will get failed immediately to notift upper layer or user
space coce that the cache device is failed or disconnected.

The first 8 patches of this patch set is to fix existing bugs in bcache,
the last 2 patches do the real improvement. Order of applying these patches
is important, if the last 2 patches are applied firstly, kernel panic or
process hang will be observed. Therefore I suggest to apply the first 8
fixes, then apply the last 2 patches.

The patch set is tested with writethrough, writeback, writearound mode,
read/write/readwrite workloads, so far it works as expected. IMHO the
cache set retire logic is complicated, I need your help to review the
patches, any question is warmly wlecome.

Coly Li (10):
  bcache: exit bch_writeback_thread() with proper task state
  bcache: set task properly in allocator_wait()
  bcache: reduce cache_set devices iteration by devices_max_used
  bcache: fix cached_dev->count usage for bch_cache_set_error()
  bcache: stop dc->writeback_rate_update if cache set is stopping
  bcache: stop dc->writeback_rate_update, dc->writeback_thread earlier
  bcache: set error_limit correctly
  bcache: fix misleading error message in bch_count_io_errors()
  bcache: add io_disable to struct cache_set
  bcache: stop all attached bcache devices for a retired cache set

 drivers/md/bcache/alloc.c     |  5 ++---
 drivers/md/bcache/bcache.h    | 19 +++++++++++++++-
 drivers/md/bcache/btree.c     |  8 ++++---
 drivers/md/bcache/io.c        | 15 ++++++++-----
 drivers/md/bcache/journal.c   |  4 ++--
 drivers/md/bcache/request.c   | 26 ++++++++++++++++------
 drivers/md/bcache/super.c     | 51 +++++++++++++++++++++++++++++++++++--------
 drivers/md/bcache/sysfs.c     |  8 +++++--
 drivers/md/bcache/util.h      |  6 -----
 drivers/md/bcache/writeback.c | 51 +++++++++++++++++++++++++++++++++----------
 drivers/md/bcache/writeback.h |  4 +---
 11 files changed, 144 insertions(+), 53 deletions(-)

Thanks in advance.

Coly Li

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: [PATCH v1 05/10] bcache: stop dc->writeback_rate_update if cache set is stopping
@ 2018-01-04 12:41 tang.junhui
  2018-01-05  4:01 ` Coly Li
  0 siblings, 1 reply; 41+ messages in thread
From: tang.junhui @ 2018-01-04 12:41 UTC (permalink / raw)
  To: colyli; +Cc: mlyle, linux-bcache, linux-block, tang.junhui

From: Tang Junhui <tang.junhui@zte.com.cn>

Hello Coly,

>When cache set is stopping, calculating writeback rate is wast of time.
>This is the purpose of the first checking, to avoid unnecessary delay
>from bcache_flash_devs_sectors_dirty() inside __update_writeback_rate().

I thought twice, and found there is still a race.

work update_writeback_rate is runing:                                                                                
>@@ -91,6 +91,11 @@ static void update_writeback_rate(struct work_struct *work)
>     struct cached_dev *dc = container_of(to_delayed_work(work),
>                          struct cached_dev,
>                          writeback_rate_update);
>+    struct cache_set *c = dc->disk.c;
>+
>+    /* quit directly if cache set is stopping */
>+    if (test_bit(CACHE_SET_STOPPING, &c->flags))
>+        return;
> 
>     down_read(&dc->writeback_lock);
> 
>@@ -100,6 +105,10 @@ static void update_writeback_rate(struct work_struct *work)
> 
>     up_read(&dc->writeback_lock);
> 
>+    /* do not schedule delayed work if cache set is stopping */
>+    if (test_bit(CACHE_SET_STOPPING, &c->flags))
>+        return;
>+

and if cancel_delayed_wsrksync() runs at this moment, the code can not prevent the
work re-arming itself to workqueue. So maybe we need a locker to protect it.    

>     schedule_delayed_work(&dc->writeback_rate_update,
>                   dc->writeback_rate_update_seconds * HZ);
> }

Thanks,
Tang Junhui

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: Re: [PATCH v1 05/10] bcache: stop dc->writeback_rate_update if cache set is stopping
@ 2018-01-03 18:54 tang.junhui
  2018-01-04  9:05 ` Coly Li
  0 siblings, 1 reply; 41+ messages in thread
From: tang.junhui @ 2018-01-03 18:54 UTC (permalink / raw)
  To: colyli; +Cc: mlyle, linux-bcache, linux-block, tang.junhui

From: Tang Junhui <tang.junhui@zte.com.cn>

Hi Coly,

>It is about an implicit and interesting ordering, a simple patch with a
>lot detail behind. Let me explain why it's safe,
>
>- cancel_delayed_work_sync() is called in bcache_device_detach() when
>dc->count is 0. But cancel_delayed_work_sync() is called before
>closure_put(&d->c->caching) in bcache_device_detach().
>
>- After closure_put(&d->c->caching) called in bcache_device_detach(),
>refcount of c->caching becomes 0, and cache_set_flush() set in
>__cache_set_unregister() continue to execute.
>
>- See continue_at() and closure_put_after_sub(), after c->caching
>refoucnt reaches 0, and c->caching->fn gets called, closure_put(parent)
>will be called. Here the parent of c->caching is c->cl, so refcount of
>c->cl will reach 0, and closure_put_after_sub() will be called again for
>c->cl and call its cl->fn which is cache_set_free(). It is set in
>bch_cache_set_alloc().
>
>- So before closure_put(&d->c->caching) is called in
>bcache_device_detach(), cache_set_flush() in __cache_set_unregister()
>won't be executed and memory of cache_set won't be freed.
>
>Now back to delayed worker routine update_writeback_rate(), I check
>c->flags inside it, in this moment cancel_delayed_work_sync() is still
>waiting for update_writeback_rate() to exit, so refcount of c->caching
>is still hold and cache set memory will only be freed after
>cancel_delayed_work_sync() returns and refcount of c->caching dropped.
>
>And in cached_dev_free(), bcache_device_free() and
>kobject_put(&dc->disk.kobj) are called after cancel_delayed_work_sync()
>returns. Therefore when update_writeback_rate() is executing and
>accessing c->flgas inside, memory of cache_set, bcache_device and
>cached_dev are all there.

OK, I got your mean. It is safe run work update_writeback_rate when 
cancel_delayed_work_sync() is called, but it is not safe to re-arm itself
to workqueue again.

So, the first judgement  "if (test_bit(CACHE_SET_STOPPING, &c->flags))"
is not very nessary, we only need the second one to prevent the work 
from re-arming itself to workqueue.


Thanks,
Tang Junhui

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: [PATCH v1 05/10] bcache: stop dc->writeback_rate_update if cache set is stopping
@ 2018-01-03 16:47 tang.junhui
  2018-01-04  8:06 ` Coly Li
  0 siblings, 1 reply; 41+ messages in thread
From: tang.junhui @ 2018-01-03 16:47 UTC (permalink / raw)
  To: colyli; +Cc: mlyle, linux-bcache, linux-block, tang.junhui

From: Tang Junhui <tang.junhui@zte.com.cn>

Hello Coly,

>dc->writeback_rate_update is a special delayed worker, it re-arms itself
>to run after several seconds by,
>>>     schedule_delayed_work(&dc->writeback_rate_update,
>>>                   dc->writeback_rate_update_seconds * HZ);
>
>I check the workqueue code, it seems cancel_delayed_work_sync() does not
>prevent a delayed work to re-arm itself inside worker routine. And in my
>test, around 5 seconds after cancel_delayed_work_sync() called, a NULL
>pointer difference oops happens. 
I think I got how this issue would be occured:
1)work writeback_rate_update is running;
2)cancel_delayed_work_sync() is called, and waiting for work 
  writeback_rate_update to run over, and conntinue to release cche set 
  and cached device resources.
3)In the end of step 1, work writeback_rate_update re-armed again, and
  5s later, the work runs again.

>Cache set memory is freed within 2 seconds
>after __cache_set_unregister() called, so inside
>__update_writeback_rate() struct cache_set is referenced and causes the
>NULL pointer deference bug.
>
So, if it occured like I said above, it is not safty to judged by:
>    struct cache_set *c = dc->disk.c;
>    if (test_bit(CACHE_SET_STOPPING, &c->flags))
>        return;
because cache_set, even bcache device and cache device are all released.                                             
Is that right?

Thanks,
Tang Junhui

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: [PATCH v1 05/10] bcache: stop dc->writeback_rate_update if cache set is stopping
@ 2018-01-03 13:15 tang.junhui
  2018-01-04  3:32 ` Coly Li
  0 siblings, 1 reply; 41+ messages in thread
From: tang.junhui @ 2018-01-03 13:15 UTC (permalink / raw)
  To: colyli; +Cc: mlyle, linux-bcache, linux-block, tang.junhui

From: Tang Junhui <tang.junhui@zte.com.cn>

Hello Coly,

Thanks for this serials.

>struct delayed_work writeback_rate_update in struct cache_dev is a delayed
>worker to call function update_writeback_rate() in period (the interval is
>defined by dc->writeback_rate_update_seconds).
>
>When a metadate I/O error happens on cache device, bcache error handling
>routine bch_cache_set_error() will call bch_cache_set_unregister() to
>retire whole cache set. On the unregister code path, cached_dev_free()
>calls cancel_delayed_work_sync(&dc->writeback_rate_update) to stop this
>delayed work.
>
>dc->writeback_rate_update is a special delayed work from others in bcache.
>In its routine update_writeback_rate(), this delayed work is re-armed
>after a piece of time. That means when cancel_delayed_work_sync() returns,
>this delayed work can still be executed after several seconds defined by
>dc->writeback_rate_update_seconds.
>
>The problem is, after cancel_delayed_work_sync() returns, the cache set
>unregister code path will eventually release memory of struct cache set.
>Then the delayed work is scheduled to run, and inside its routine
>update_writeback_rate() that already released cache set NULL pointer will
>be accessed. Now a NULL pointer deference panic is triggered.
>
As I known that, after calling cancel_delayed_work_sync(), if the work queue
is running, cancel_delayed_work_sync() would return only AFTER the work queue
runs over, else if the work queue does not run yet, cancel_delayed_work_sync()
will cancel the work queue imediately, so I think it is safe in 
update_writeback_rate() to access the cache set resource. Point me out if I
am wrong.

>In order to avoid the above problem, this patch checks cache set flags in
>delayed work routine update_writeback_rate(). If flag CACHE_SET_STOPPING
>is set, this routine will quit without re-arm the delayed work. Then the
>NULL pointer deference panic won't happen after cache set is released.
>
>Signed-off-by: Coly Li <colyli@suse.de>
>---
> drivers/md/bcache/writeback.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
>diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
>index 0789a9e18337..745d9b2a326f 100644
>--- a/drivers/md/bcache/writeback.c
>+++ b/drivers/md/bcache/writeback.c
>@@ -91,6 +91,11 @@ static void update_writeback_rate(struct work_struct *work)
>     struct cached_dev *dc = container_of(to_delayed_work(work),
>                          struct cached_dev,
>                          writeback_rate_update);
>+    struct cache_set *c = dc->disk.c;
>+
>+    /* quit directly if cache set is stopping */
>+    if (test_bit(CACHE_SET_STOPPING, &c->flags))
>+        return;
> 
>     down_read(&dc->writeback_lock);
> 
>@@ -100,6 +105,10 @@ static void update_writeback_rate(struct work_struct *work)
> 
>     up_read(&dc->writeback_lock);
> 
>+    /* do not schedule delayed work if cache set is stopping */
>+    if (test_bit(CACHE_SET_STOPPING, &c->flags))
>+        return;
>+
>     schedule_delayed_work(&dc->writeback_rate_update,
>                   dc->writeback_rate_update_seconds * HZ);
> }
>-- 
>2.15.1                                                             

Thanks,
Tang Junhui

^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2018-01-08 16:01 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-01-03 14:03 [PATCH v1 00/10] cache device failure handling improvement Coly Li
2018-01-03 14:03 ` [PATCH v1 01/10] bcache: exit bch_writeback_thread() with proper task state Coly Li
2018-01-03 17:08   ` Michael Lyle
2018-01-05 17:05     ` Coly Li
2018-01-05 17:09       ` Michael Lyle
2018-01-08  7:09   ` Hannes Reinecke
2018-01-08 13:50     ` Coly Li
2018-01-03 14:03 ` [PATCH v1 02/10] bcache: set task properly in allocator_wait() Coly Li
2018-01-03 17:09   ` Michael Lyle
2018-01-05 17:11     ` Coly Li
2018-01-08  7:10   ` Hannes Reinecke
2018-01-03 14:03 ` [PATCH v1 03/10] bcache: reduce cache_set devices iteration by devices_max_used Coly Li
2018-01-03 17:11   ` Michael Lyle
2018-01-08  7:12   ` Hannes Reinecke
2018-01-03 14:03 ` [PATCH v1 04/10] bcache: fix cached_dev->count usage for bch_cache_set_error() Coly Li
2018-01-08  7:16   ` Hannes Reinecke
2018-01-03 14:03 ` [PATCH v1 05/10] bcache: stop dc->writeback_rate_update if cache set is stopping Coly Li
2018-01-08  7:22   ` Hannes Reinecke
2018-01-08 16:01     ` Coly Li
2018-01-03 14:03 ` [PATCH v1 06/10] bcache: stop dc->writeback_rate_update, dc->writeback_thread earlier Coly Li
2018-01-08  7:25   ` Hannes Reinecke
2018-01-03 14:03 ` [PATCH v1 07/10] bcache: set error_limit correctly Coly Li
2018-01-08  7:26   ` Hannes Reinecke
2018-01-03 14:03 ` [PATCH v1 08/10] bcache: fix misleading error message in bch_count_io_errors() Coly Li
2018-01-03 17:14   ` Michael Lyle
2018-01-08  7:27   ` Hannes Reinecke
2018-01-03 14:03 ` [PATCH v1 09/10] bcache: add io_disable to struct cache_set Coly Li
2018-01-08  7:30   ` Hannes Reinecke
2018-01-03 14:03 ` [PATCH v1 10/10] bcache: stop all attached bcache devices for a retired cache set Coly Li
2018-01-08  7:31   ` Hannes Reinecke
2018-01-03 17:07 ` [PATCH v1 00/10] cache device failure handling improvement Michael Lyle
2018-01-04  2:20   ` Coly Li
2018-01-04 17:46     ` Michael Lyle
2018-01-05  4:04       ` Coly Li
  -- strict thread matches above, loose matches on Subject: below --
2018-01-04 12:41 [PATCH v1 05/10] bcache: stop dc->writeback_rate_update if cache set is stopping tang.junhui
2018-01-05  4:01 ` Coly Li
2018-01-03 18:54 tang.junhui
2018-01-04  9:05 ` Coly Li
2018-01-03 16:47 tang.junhui
2018-01-04  8:06 ` Coly Li
2018-01-03 13:15 tang.junhui
2018-01-04  3:32 ` Coly Li

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox