linux-bcache.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 00/13] bcache: device failure handling improvement
@ 2018-01-28  1:56 Coly Li
  2018-01-28  1:56 ` [PATCH v4 01/13] bcache: set writeback_rate_update_seconds in range [1, 60] seconds Coly Li
                   ` (13 more replies)
  0 siblings, 14 replies; 23+ messages in thread
From: Coly Li @ 2018-01-28  1:56 UTC (permalink / raw)
  To: linux-bcache; +Cc: linux-block, Coly Li

Hi maintainers and folks,

This patch set tries to improve bcache device failure handling, includes
cache device and backing device failures.

The basic idea to handle failed cache device is,
- Unregister cache set
- Detach all backing devices which are attached to this cache set
- Stop all the detached bcache devices (configurable)
- Stop all flash only volume on the cache set
The above process is named 'cache set retire' by me. The result of cache
set retire is, cache set and bcache devices are all removed, following
I/O requests will get failed immediately to notift upper layer or user
space coce that the cache device is failed or disconnected.

For failed backing device, there are two kinds of failures to handle,
- If device is disconnected, and kernel thread dc->status_update_thread
  finds it is offline for BACKING_DEV_OFFLINE_TIMEOUT (5) seconds, the
  kernel thread will set dc->io_disable and call bcache_device_stop() to
  stop and remove the bcache device from system.
- If device is alive but returns too many I/O errors, after errors number
  exceeds dc->error_limit, call bch_cached_dev_error() to set
  dc->io_disable and stop bcache device. Then the broken backing device
  and its bcache device will be removed from system.

The v4 patch set combines two v3 patches into one, and adds one more patch
to permit users to explicitly avoid stopping attached bcache device from a
retiring cache set. This is a configurable option suggested by
Nix <nix@esperi.org.uk>.

Some patches of this patch set is already in bcache-for-next and not
included here anymore. Most of the patches are reviewed by Hannes Reinecke
and Junhui Tang. There are still severl patches need to be reviewed,
- [PATCH v4 05/13] bcache: stop dc->writeback_rate_update properly
- [PATCH v4 13/13] bcache: add stop_attached_devs_on_fail to struct
  cached_dev 

Any comment, question and review are warmly welcome. Thanks in advance.

Changelog:
v4: add per-cached_dev option stop_attached_devs_on_fail to avoid stopping
    attached bcache device from a retiring cache set.
v3: fix detach issue find in v2 patch set.
v2: fixes all problems found in v1 review.
    add patches to handle backing device failure.
    add one more patch to set writeback_rate_update_seconds range.
    include a patch from Junhui Tang.
v1: the initial version, only handles cache device failure.

Coly Li
---

Coly Li (12):
  bcache: set writeback_rate_update_seconds in range [1, 60] seconds
  bcache: properly set task state in bch_writeback_thread()
  bcache: fix cached_dev->count usage for bch_cache_set_error()
  bcache: quit dc->writeback_thread when BCACHE_DEV_DETACHING is set
  bcache: stop dc->writeback_rate_update properly
  bcache: set error_limit correctly
  bcache: add CACHE_SET_IO_DISABLE to struct cache_set flags
  bcache: stop all attached bcache devices for a retired cache set
  bcache: add backing_request_endio() for bi_end_io of attached backing
    device I/O
  bcache: add io_disable to struct cached_dev
  bcache: stop bcache device when backing device is offline
  bcache: add stop_attached_devs_on_fail to struct cached_dev

Tang Junhui (1):
  bcache: fix inaccurate io state for detached bcache devices

 drivers/md/bcache/alloc.c     |   5 +-
 drivers/md/bcache/bcache.h    |  38 ++++++++-
 drivers/md/bcache/btree.c     |  10 ++-
 drivers/md/bcache/io.c        |  16 +++-
 drivers/md/bcache/journal.c   |   4 +-
 drivers/md/bcache/request.c   | 187 +++++++++++++++++++++++++++++++++++-------
 drivers/md/bcache/super.c     | 181 ++++++++++++++++++++++++++++++++++++----
 drivers/md/bcache/sysfs.c     |  55 ++++++++++++-
 drivers/md/bcache/util.h      |   6 --
 drivers/md/bcache/writeback.c |  99 ++++++++++++++++++----
 drivers/md/bcache/writeback.h |   5 +-
 11 files changed, 522 insertions(+), 84 deletions(-)

-- 
2.15.1

^ permalink raw reply	[flat|nested] 23+ messages in thread
* Re: [PATCH v4 05/13] bcache: stop dc->writeback_rate_update properly
@ 2018-01-30  1:57 tang.junhui
  2018-01-30  2:20 ` Coly Li
  0 siblings, 1 reply; 23+ messages in thread
From: tang.junhui @ 2018-01-30  1:57 UTC (permalink / raw)
  To: colyli; +Cc: mlyle, linux-bcache, linux-block, tang.junhui

From: Tang Junhui <tang.junhui@zte.com.cn>

Hello Coly:

OK, I got your point now.
Thanks for your patience.

And there is a small issue I hope to be modified:
+#define BCACHE_DEV_WB_RUNNING        4
+#define BCACHE_DEV_RATE_DW_RUNNING    8
Would be OK just as:
+#define BCACHE_DEV_WB_RUNNING        3
+#define BCACHE_DEV_RATE_DW_RUNNING    4

Reviewed-by: Tang Junhui <tang.junhui@zte.com.cn>

                   
>On 29/01/2018 8:22 PM, tang.junhui@zte.com.cn wrote:
>> From: Tang Junhui <tang.junhui@zte.com.cn>
>> 
>> Hello Coly:
>> 
>> There are some differences,
>> Using variable of atomic_t type can not guarantee the atomicity of transaction.
>> for example:
>> A thread runs in update_writeback_rate()
>> update_writeback_rate(){
>>     ....
>> +    if (test_bit(BCACHE_DEV_WB_RUNNING, &dc->disk.flags)) {
>> +        schedule_delayed_work(&dc->writeback_rate_update,
>>                    dc->writeback_rate_update_seconds * HZ);
>> +    }
>> 
>> Then another thread executes in cached_dev_detach_finish():
>>     if (test_and_clear_bit(BCACHE_DEV_WB_RUNNING, &dc->disk.flags))
>>         cancel_writeback_rate_update_dwork(dc);
>> 
>> +
>> +    /*
>> +     * should check BCACHE_DEV_RATE_DW_RUNNING before calling
>> +     * cancel_delayed_work_sync().
>> +     */
>> +    clear_bit(BCACHE_DEV_RATE_DW_RUNNING, &dc->disk.flags);
>> +    /* paired with where BCACHE_DEV_RATE_DW_RUNNING is tested */
>> +    smp_mb();
>> 
>> Race still exists.
>>  
>
>Hi Junhui,
>
>Check super.c:cancel_writeback_rate_update_dwork(),
>BCACHE_DEV_RATE_DW_RUNNING is checked there.
>
>You may see in cached_dev_detach_finish() and update_writeback_rate(),
>the orders to check BCACHE_DEV_RATE_DW_RUNNING and BCACHE_DEV_WB_RUNNING
>are different.
>
>cached_dev_detach_finish()        update_writeback_rate()
>
>test_and_clear_bit            set_bit
>BCACHE_DEV_WB_RUNNING            BCACHE_DEV_RATE_DW_RUNNING
>
>(implicit smp_mb())            smp_mb()
>
>test_bit                test_bit
>BCACHE_DEV_RATE_DW_RUNNING        BCACHE_DEV_WB_RUNNING
>
>                    clear_bit()
>                    BCACHE_DEV_RATE_DW_RUNNING
>
>                    smp_mb()
>
>
>This two flags are accessed in reversed order in different locations,
>there is a smp_mb() between accessing two flags to serialize the access
>order.
>
>By the above reserve ordering accessing, it is sure that
>- in cached_dev_detach_finish(), before
>test_bit(BCACHE_DEV_RATE_DW_RUNNING) bit BCACHE_DEV_WB_RUNNING must be
>cleared already.
>- in update_writeback_rate(), before test_bit(BCACHE_DEV_WB_RUNNING),
>BCACHE_DEV_RATE_DW_RUNNING must be set already.
>
>Therefore in your example, if a thread is testing BCACHE_DEV_WB_RUNNING
>in update_writeback_rate(), it means BCACHE_DEV_RATE_DW_RUNNING must be
>set already. So in cancel_writeback_rate_update_dwork() another thread
>must wait until BCACHE_DEV_RATE_DW_RUNNING is cleared then
>cancel_delayed_work_sync() can be called. And in update_writeback_rate()
>the bit BCACHE_DEV_RATE_DW_RUNNING is cleared after
>schedule_delayed_work() returns, so the race is killed.
>
>A mutex lock indicates an implicit memory barrier, and in your
>suggestion up_read(&dc->writeback_lock) is after schedule_delayed_work()
>too. This is why I said they are almost same.
>
>Thanks.
>
>Coly Li
>
>>>
>>> On 29/01/2018 3:35 PM, tang.junhui@zte.com.cn wrote:
>>>> From: Tang Junhui <tang.junhui@zte.com.cn>
>>>>
>>>> Hello Coly:
>>>>
>>>> This patch is somewhat difficult for me,
>>>> I think we can resolve it in a simple way.
>>>>
>>>> We can take the schedule_delayed_work() under the protection of 
>>>> dc->writeback_lock, and judge if we need re-arm this work to queue.
>>>>
>>>> static void update_writeback_rate(struct work_struct *work)
>>>> {
>>>>     struct cached_dev *dc = container_of(to_delayed_work(work),
>>>>                          struct cached_dev,
>>>>                          writeback_rate_update);
>>>>
>>>>     down_read(&dc->writeback_lock);
>>>>
>>>>     if (atomic_read(&dc->has_dirty) &&
>>>>         dc->writeback_percent)
>>>>         __update_writeback_rate(dc);
>>>>
>>>> -    up_read(&dc->writeback_lock);
>>>> +    if (NEED_RE-AEMING)    
>>>>         schedule_delayed_work(&dc->writeback_rate_update,
>>>>                   dc->writeback_rate_update_seconds * HZ);
>>>> +    up_read(&dc->writeback_lock);
>>>> }
>>>>
>>>> In cached_dev_detach_finish() and cached_dev_free() we can set the no need
>>>> flag under the protection of dc->writeback_lock, for example:
>>>>
>>>> static void cached_dev_detach_finish(struct work_struct *w)
>>>> {
>>>>     ...
>>>> +    down_write(&dc->writeback_lock);
>>>> +    SET NO NEED RE-ARM FLAG
>>>> +    up_write(&dc->writeback_lock);
>>>>     cancel_delayed_work_sync(&dc->writeback_rate_update);
>>>> }
>>>>
>>>> I think this way is more simple and readable.
>>>>
>>>
>>> Hi Junhui,
>>>
>>> Your suggest is essentially almost same to my patch,
>>> - clear BCACHE_DEV_DETACHING bit acts as SET NO NEED RE-ARM FLAG.
>>> - cancel_writeback_rate_update_dwork acts as some kind of locking with a
>>> timeout.
>>>
>>> The difference is I don't use dc->writeback_lock, and replace it by
>>> BCACHE_DEV_RATE_DW_RUNNING.
>>>
>>> The reason is my following development. I plan to implement a real-time
>>> update stripe_sectors_dirty of bcache device and cache set, then
>>> bcache_flash_devs_sectors_dirty() can be very fast and bch_register_lock
>>> can be removed here. And then I also plan to remove reference of
>>> dc->writeback_lock in update_writeback_rate() because indeed it is
>>> unnecessary here (the patch is held by Mike's locking resort work).
>>>
>>> Since I plan to remove dc->writeback_lock from update_writeback_rate(),
>>> I don't want to reference dc->writeback in the delayed work.
>>>
>>> The basic idea behind your suggestion and this patch, is almost
>>> identical. The only difference might be the timeout in
>>> cancel_writeback_rate_update_dwork().
>>>
>>> Thanks.
>>>
>>> Coly Li
>> 
>> Thanks.
>> Tang Junhui
>> 

Thanks.
Tang Junhui

^ permalink raw reply	[flat|nested] 23+ messages in thread
* Re: [PATCH v4 05/13] bcache: stop dc->writeback_rate_update properly
@ 2018-01-29 12:22 tang.junhui
  2018-01-29 12:57 ` Coly Li
  0 siblings, 1 reply; 23+ messages in thread
From: tang.junhui @ 2018-01-29 12:22 UTC (permalink / raw)
  To: colyli; +Cc: mlyle, linux-bcache, linux-block, tang.junhui

From: Tang Junhui <tang.junhui@zte.com.cn>

Hello Coly:

There are some differences,
Using variable of atomic_t type can not guarantee the atomicity of transaction.
for example:
A thread runs in update_writeback_rate()
update_writeback_rate(){
	....
+	if (test_bit(BCACHE_DEV_WB_RUNNING, &dc->disk.flags)) {
+		schedule_delayed_work(&dc->writeback_rate_update,
 			      dc->writeback_rate_update_seconds * HZ);
+	}

Then another thread executes in cached_dev_detach_finish():
	if (test_and_clear_bit(BCACHE_DEV_WB_RUNNING, &dc->disk.flags))
		cancel_writeback_rate_update_dwork(dc);

+
+	/*
+	 * should check BCACHE_DEV_RATE_DW_RUNNING before calling
+	 * cancel_delayed_work_sync().
+	 */
+	clear_bit(BCACHE_DEV_RATE_DW_RUNNING, &dc->disk.flags);
+	/* paired with where BCACHE_DEV_RATE_DW_RUNNING is tested */
+	smp_mb();

Race still exists.
 
> 
> On 29/01/2018 3:35 PM, tang.junhui@zte.com.cn wrote:
> > From: Tang Junhui <tang.junhui@zte.com.cn>
> > 
> > Hello Coly:
> > 
> > This patch is somewhat difficult for me,
> > I think we can resolve it in a simple way.
> > 
> > We can take the schedule_delayed_work() under the protection of 
> > dc->writeback_lock, and judge if we need re-arm this work to queue.
> > 
> > static void update_writeback_rate(struct work_struct *work)
> > {
> >     struct cached_dev *dc = container_of(to_delayed_work(work),
> >                          struct cached_dev,
> >                          writeback_rate_update);
> > 
> >     down_read(&dc->writeback_lock);
> > 
> >     if (atomic_read(&dc->has_dirty) &&
> >         dc->writeback_percent)
> >         __update_writeback_rate(dc);
> > 
> > -    up_read(&dc->writeback_lock);
> > +    if (NEED_RE-AEMING)    
> >         schedule_delayed_work(&dc->writeback_rate_update,
> >                   dc->writeback_rate_update_seconds * HZ);
> > +    up_read(&dc->writeback_lock);
> > }
> > 
> > In cached_dev_detach_finish() and cached_dev_free() we can set the no need
> > flag under the protection of dc->writeback_lock, for example:
> > 
> > static void cached_dev_detach_finish(struct work_struct *w)
> > {
> >     ...
> > +    down_write(&dc->writeback_lock);
> > +    SET NO NEED RE-ARM FLAG
> > +    up_write(&dc->writeback_lock);
> >     cancel_delayed_work_sync(&dc->writeback_rate_update);
> > }
> > 
> > I think this way is more simple and readable.
> > 
> 
> Hi Junhui,
> 
> Your suggest is essentially almost same to my patch,
> - clear BCACHE_DEV_DETACHING bit acts as SET NO NEED RE-ARM FLAG.
> - cancel_writeback_rate_update_dwork acts as some kind of locking with a
> timeout.
> 
> The difference is I don't use dc->writeback_lock, and replace it by
> BCACHE_DEV_RATE_DW_RUNNING.
> 
> The reason is my following development. I plan to implement a real-time
> update stripe_sectors_dirty of bcache device and cache set, then
> bcache_flash_devs_sectors_dirty() can be very fast and bch_register_lock
> can be removed here. And then I also plan to remove reference of
> dc->writeback_lock in update_writeback_rate() because indeed it is
> unnecessary here (the patch is held by Mike's locking resort work).
> 
> Since I plan to remove dc->writeback_lock from update_writeback_rate(),
> I don't want to reference dc->writeback in the delayed work.
> 
> The basic idea behind your suggestion and this patch, is almost
> identical. The only difference might be the timeout in
> cancel_writeback_rate_update_dwork().
> 
> Thanks.
> 
> Coly Li

Thanks.
Tang Junhui

^ permalink raw reply	[flat|nested] 23+ messages in thread
* Re: [PATCH v4 05/13] bcache: stop dc->writeback_rate_update properly
@ 2018-01-29  7:35 tang.junhui
  2018-01-29  9:36 ` Coly Li
  0 siblings, 1 reply; 23+ messages in thread
From: tang.junhui @ 2018-01-29  7:35 UTC (permalink / raw)
  To: colyli; +Cc: mlyle, linux-bcache, linux-block, tang.junhui

From: Tang Junhui <tang.junhui@zte.com.cn>

Hello Coly:

This patch is somewhat difficult for me,
I think we can resolve it in a simple way.

We can take the schedule_delayed_work() under the protection of 
dc->writeback_lock, and judge if we need re-arm this work to queue.

static void update_writeback_rate(struct work_struct *work)
{
	struct cached_dev *dc = container_of(to_delayed_work(work),
					     struct cached_dev,
					     writeback_rate_update);

	down_read(&dc->writeback_lock);

	if (atomic_read(&dc->has_dirty) &&
	    dc->writeback_percent)
		__update_writeback_rate(dc);

-	up_read(&dc->writeback_lock);
+	if (NEED_RE-AEMING)	
		schedule_delayed_work(&dc->writeback_rate_update,
			      dc->writeback_rate_update_seconds * HZ);
+	up_read(&dc->writeback_lock);
}

In cached_dev_detach_finish() and cached_dev_free() we can set the no need
flag under the protection of dc->writeback_lock, for example:

static void cached_dev_detach_finish(struct work_struct *w)
{
	...
+	down_write(&dc->writeback_lock);
+	SET NO NEED RE-ARM FLAG
+	up_write(&dc->writeback_lock);
	cancel_delayed_work_sync(&dc->writeback_rate_update);
}

I think this way is more simple and readable.


> struct delayed_work writeback_rate_update in struct cache_dev is a delayed
> worker to call function update_writeback_rate() in period (the interval is
> defined by dc->writeback_rate_update_seconds).
> 
> When a metadate I/O error happens on cache device, bcache error handling
> routine bch_cache_set_error() will call bch_cache_set_unregister() to
> retire whole cache set. On the unregister code path, this delayed work is
> stopped by calling cancel_delayed_work_sync(&dc->writeback_rate_update).
> 
> dc->writeback_rate_update is a special delayed work from others in bcache.
> In its routine update_writeback_rate(), this delayed work is re-armed
> itself. That means when cancel_delayed_work_sync() returns, this delayed
> work can still be executed after several seconds defined by
> dc->writeback_rate_update_seconds.
> 
> The problem is, after cancel_delayed_work_sync() returns, the cache set
> unregister code path will continue and release memory of struct cache set.
> Then the delayed work is scheduled to run, __update_writeback_rate()
> will reference the already released cache_set memory, and trigger a NULL
> pointer deference fault.
> 
> This patch introduces two more bcache device flags,
> - BCACHE_DEV_WB_RUNNING
>   bit set:  bcache device is in writeback mode and running, it is OK for
>             dc->writeback_rate_update to re-arm itself.
>   bit clear:bcache device is trying to stop dc->writeback_rate_update,
>             this delayed work should not re-arm itself and quit.
> - BCACHE_DEV_RATE_DW_RUNNING
>   bit set:  routine update_writeback_rate() is executing.
>   bit clear: routine update_writeback_rate() quits.
> 
> This patch also adds a function cancel_writeback_rate_update_dwork() to
> wait for dc->writeback_rate_update quits before cancel it by calling
> cancel_delayed_work_sync(). In order to avoid a deadlock by unexpected
> quit dc->writeback_rate_update, after time_out seconds this function will
> give up and continue to call cancel_delayed_work_sync().
> 
> And here I explain how this patch stops self re-armed delayed work properly
> with the above stuffs.
> 
> update_writeback_rate() sets BCACHE_DEV_RATE_DW_RUNNING at its beginning
> and clears BCACHE_DEV_RATE_DW_RUNNING at its end. Before calling
> cancel_writeback_rate_update_dwork() clear flag BCACHE_DEV_WB_RUNNING.
> 
> Before calling cancel_delayed_work_sync() wait utill flag
> BCACHE_DEV_RATE_DW_RUNNING is clear. So when calling
> cancel_delayed_work_sync(), dc->writeback_rate_update must be already re-
> armed, or quite by seeing BCACHE_DEV_WB_RUNNING cleared. In both cases
> delayed work routine update_writeback_rate() won't be executed after
> cancel_delayed_work_sync() returns.
> 
> Inside update_writeback_rate() before calling schedule_delayed_work(), flag
> BCACHE_DEV_WB_RUNNING is checked before. If this flag is cleared, it means
> someone is about to stop the delayed work. Because flag
> BCACHE_DEV_RATE_DW_RUNNING is set already and cancel_delayed_work_sync()
> has to wait for this flag to be cleared, we don't need to worry about race
> condition here.
> 
> If update_writeback_rate() is scheduled to run after checking
> BCACHE_DEV_RATE_DW_RUNNING and before calling cancel_delayed_work_sync()
> in cancel_writeback_rate_update_dwork(), it is also safe. Because at this
> moment BCACHE_DEV_WB_RUNNING is cleared with memory barrier. As I mentioned
> previously, update_writeback_rate() will see BCACHE_DEV_WB_RUNNING is clear
> and quit immediately.
> 
> Because there are more dependences inside update_writeback_rate() to struct
> cache_set memory, dc->writeback_rate_update is not a simple self re-arm
> delayed work. After trying many different methods (e.g. hold dc->count, or
> use locks), this is the only way I can find which works to properly stop
> dc->writeback_rate_update delayed work.
> 
> Changelog:
> v2: Try to fix the race issue which is pointed out by Junhui.
> v1: The initial version for review
> 
> Signed-off-by: Coly Li <colyli@suse.de>
> Cc: Michael Lyle <mlyle@lyle.org>
> Cc: Hannes Reinecke <hare@suse.com>
> Cc: Junhui Tang <tang.junhui@zte.com.cn>
> ---
>  drivers/md/bcache/bcache.h    |  9 +++++----
>  drivers/md/bcache/super.c     | 39 +++++++++++++++++++++++++++++++++++----
>  drivers/md/bcache/sysfs.c     |  3 ++-
>  drivers/md/bcache/writeback.c | 29 ++++++++++++++++++++++++++++-
>  4 files changed, 70 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
> index 5e2d4e80198e..88d938c8d027 100644
> --- a/drivers/md/bcache/bcache.h
> +++ b/drivers/md/bcache/bcache.h
> @@ -258,10 +258,11 @@ struct bcache_device {
>      struct gendisk        *disk;
>  
>      unsigned long        flags;
> -#define BCACHE_DEV_CLOSING    0
> -#define BCACHE_DEV_DETACHING    1
> -#define BCACHE_DEV_UNLINK_DONE    2
> -
> +#define BCACHE_DEV_CLOSING        0
> +#define BCACHE_DEV_DETACHING        1
> +#define BCACHE_DEV_UNLINK_DONE        2
> +#define BCACHE_DEV_WB_RUNNING        4
> +#define BCACHE_DEV_RATE_DW_RUNNING    8
>      unsigned        nr_stripes;
>      unsigned        stripe_size;
>      atomic_t        *stripe_sectors_dirty;
> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> index d14e09cce2f6..6d888e8fea8c 100644
> --- a/drivers/md/bcache/super.c
> +++ b/drivers/md/bcache/super.c
> @@ -899,6 +899,32 @@ void bch_cached_dev_run(struct cached_dev *dc)
>          pr_debug("error creating sysfs link");
>  }
>  
> +/*
> + * If BCACHE_DEV_RATE_DW_RUNNING is set, it means routine of the delayed
> + * work dc->writeback_rate_update is running. Wait until the routine
> + * quits (BCACHE_DEV_RATE_DW_RUNNING is clear), then continue to
> + * cancel it. If BCACHE_DEV_RATE_DW_RUNNING is not clear after time_out
> + * seconds, give up waiting here and continue to cancel it too.
> + */
> +static void cancel_writeback_rate_update_dwork(struct cached_dev *dc)
> +{
> +    int time_out = WRITEBACK_RATE_UPDATE_SECS_MAX * HZ;
> +
> +    do {
> +        if (!test_bit(BCACHE_DEV_RATE_DW_RUNNING,
> +                  &dc->disk.flags))
> +            break;
> +        time_out--;
> +        schedule_timeout_interruptible(1);
> +    } while (time_out > 0);
> +
> +    if (time_out == 0)
> +        pr_warn("bcache: give up waiting for "
> +            "dc->writeback_write_update to quit");
> +
> +    cancel_delayed_work_sync(&dc->writeback_rate_update);
> +}
> +
>  static void cached_dev_detach_finish(struct work_struct *w)
>  {
>      struct cached_dev *dc = container_of(w, struct cached_dev, detach);
> @@ -911,7 +937,9 @@ static void cached_dev_detach_finish(struct work_struct *w)
>  
>      mutex_lock(&bch_register_lock);
>  
> -    cancel_delayed_work_sync(&dc->writeback_rate_update);
> +    if (test_and_clear_bit(BCACHE_DEV_WB_RUNNING, &dc->disk.flags))
> +        cancel_writeback_rate_update_dwork(dc);
> +
>      if (!IS_ERR_OR_NULL(dc->writeback_thread)) {
>          kthread_stop(dc->writeback_thread);
>          dc->writeback_thread = NULL;
> @@ -954,6 +982,7 @@ void bch_cached_dev_detach(struct cached_dev *dc)
>      closure_get(&dc->disk.cl);
>  
>      bch_writeback_queue(dc);
> +
>      cached_dev_put(dc);
>  }
>  
> @@ -1079,14 +1108,16 @@ static void cached_dev_free(struct closure *cl)
>  {
>      struct cached_dev *dc = container_of(cl, struct cached_dev, disk.cl);
>  
> -    cancel_delayed_work_sync(&dc->writeback_rate_update);
> +    mutex_lock(&bch_register_lock);
> +
> +    if (test_and_clear_bit(BCACHE_DEV_WB_RUNNING, &dc->disk.flags))
> +        cancel_writeback_rate_update_dwork(dc);
> +
>      if (!IS_ERR_OR_NULL(dc->writeback_thread))
>          kthread_stop(dc->writeback_thread);
>      if (dc->writeback_write_wq)
>          destroy_workqueue(dc->writeback_write_wq);
>  
> -    mutex_lock(&bch_register_lock);
> -
>      if (atomic_read(&dc->running))
>          bd_unlink_disk_holder(dc->bdev, dc->disk.disk);
>      bcache_device_free(&dc->disk);
> diff --git a/drivers/md/bcache/sysfs.c b/drivers/md/bcache/sysfs.c
> index a74a752c9e0f..b7166c504cdb 100644
> --- a/drivers/md/bcache/sysfs.c
> +++ b/drivers/md/bcache/sysfs.c
> @@ -304,7 +304,8 @@ STORE(bch_cached_dev)
>          bch_writeback_queue(dc);
>  
>      if (attr == &sysfs_writeback_percent)
> -        schedule_delayed_work(&dc->writeback_rate_update,
> +        if (!test_and_set_bit(BCACHE_DEV_WB_RUNNING, &dc->disk.flags))
> +            schedule_delayed_work(&dc->writeback_rate_update,
>                        dc->writeback_rate_update_seconds * HZ);
>  
>      mutex_unlock(&bch_register_lock);
> diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
> index 4dbeaaa575bf..8f98ef1038d3 100644
> --- a/drivers/md/bcache/writeback.c
> +++ b/drivers/md/bcache/writeback.c
> @@ -115,6 +115,21 @@ static void update_writeback_rate(struct work_struct *work)
>                           struct cached_dev,
>                           writeback_rate_update);
>  
> +    /*
> +     * should check BCACHE_DEV_RATE_DW_RUNNING before calling
> +     * cancel_delayed_work_sync().
> +     */
> +    set_bit(BCACHE_DEV_RATE_DW_RUNNING, &dc->disk.flags);
> +    /* paired with where BCACHE_DEV_RATE_DW_RUNNING is tested */
> +    smp_mb();
> +
> +    if (!test_bit(BCACHE_DEV_WB_RUNNING, &dc->disk.flags)) {
> +        clear_bit(BCACHE_DEV_RATE_DW_RUNNING, &dc->disk.flags);
> +        /* paired with where BCACHE_DEV_RATE_DW_RUNNING is tested */
> +        smp_mb();
> +        return;
> +    }
> +
>      down_read(&dc->writeback_lock);
>  
>      if (atomic_read(&dc->has_dirty) &&
> @@ -123,8 +138,18 @@ static void update_writeback_rate(struct work_struct *work)
>  
>      up_read(&dc->writeback_lock);
>  
> -    schedule_delayed_work(&dc->writeback_rate_update,
> +    if (test_bit(BCACHE_DEV_WB_RUNNING, &dc->disk.flags)) {
> +        schedule_delayed_work(&dc->writeback_rate_update,
>                    dc->writeback_rate_update_seconds * HZ);
> +    }
> +
> +    /*
> +     * should check BCACHE_DEV_RATE_DW_RUNNING before calling
> +     * cancel_delayed_work_sync().
> +     */
> +    clear_bit(BCACHE_DEV_RATE_DW_RUNNING, &dc->disk.flags);
> +    /* paired with where BCACHE_DEV_RATE_DW_RUNNING is tested */
> +    smp_mb();
>  }
>  
>  static unsigned writeback_delay(struct cached_dev *dc, unsigned sectors)
> @@ -675,6 +700,7 @@ void bch_cached_dev_writeback_init(struct cached_dev *dc)
>      dc->writeback_rate_p_term_inverse = 40;
>      dc->writeback_rate_i_term_inverse = 10000;
>  
> +    WARN_ON(test_and_clear_bit(BCACHE_DEV_WB_RUNNING, &dc->disk.flags));
>      INIT_DELAYED_WORK(&dc->writeback_rate_update, update_writeback_rate);
>  }
>  
> @@ -693,6 +719,7 @@ int bch_cached_dev_writeback_start(struct cached_dev *dc)
>          return PTR_ERR(dc->writeback_thread);
>      }
>  
> +    WARN_ON(test_and_set_bit(BCACHE_DEV_WB_RUNNING, &dc->disk.flags));
>      schedule_delayed_work(&dc->writeback_rate_update,
>                    dc->writeback_rate_update_seconds * HZ);
>  
> -- 
> 2.15.1

Thanks,
Tang Junhui

^ permalink raw reply	[flat|nested] 23+ messages in thread
* [PATCH v4 00/13] bcache: device failure handling improvement
@ 2018-01-27 14:23 Coly Li
  2018-01-27 14:23 ` [PATCH v4 05/13] bcache: stop dc->writeback_rate_update properly Coly Li
  0 siblings, 1 reply; 23+ messages in thread
From: Coly Li @ 2018-01-27 14:23 UTC (permalink / raw)
  To: linux-bcache; +Cc: linux-block, Coly Li

Hi maintainers and folks,

This patch set tries to improve bcache device failure handling, includes
cache device and backing device failures.

The basic idea to handle failed cache device is,
- Unregister cache set
- Detach all backing devices which are attached to this cache set
- Stop all the detached bcache devices (configurable)
- Stop all flash only volume on the cache set
The above process is named 'cache set retire' by me. The result of cache
set retire is, cache set and bcache devices are all removed, following
I/O requests will get failed immediately to notift upper layer or user
space coce that the cache device is failed or disconnected.

For failed backing device, there are two kinds of failures to handle,
- If device is disconnected, and kernel thread dc->status_update_thread
  finds it is offline for BACKING_DEV_OFFLINE_TIMEOUT (5) seconds, the
  kernel thread will set dc->io_disable and call bcache_device_stop() to
  stop and remove the bcache device from system.
- If device is alive but returns too many I/O errors, after errors number
  exceeds dc->error_limit, call bch_cached_dev_error() to set
  dc->io_disable and stop bcache device. Then the broken backing device
  and its bcache device will be removed from system.

The v4 patch set combines two v3 patches into one, and adds one more patch
to permit users to explicitly avoid stopping attached bcache device from a
retiring cache set. This is a configurable option suggested by
Nix <nix@esperi.org.uk>.

Some patches of this patch set is already in bcache-for-next and not
included here anymore. Most of the patches are reviewed by Hannes Reinecke
and Junhui Tang. There are still severl patches need to be reviewed,
- [PATCH v4 05/13] bcache: stop dc->writeback_rate_update properly
- [PATCH v4 13/13] bcache: add stop_attached_devs_on_fail to struct
  cached_dev 

Any comment, question and review are warmly welcome. Thanks in advance.

Changelog:
v4: add per-cached_dev option stop_attached_devs_on_fail to avoid stopping
    attached bcache device from a retiring cache set.
v3: fix detach issue find in v2 patch set.
v2: fixes all problems found in v1 review.
    add patches to handle backing device failure.
    add one more patch to set writeback_rate_update_seconds range.
    include a patch from Junhui Tang.
v1: the initial version, only handles cache device failure.

Coly Li
---

Coly Li (12):
  bcache: set writeback_rate_update_seconds in range [1, 60] seconds
  bcache: properly set task state in bch_writeback_thread()
  bcache: fix cached_dev->count usage for bch_cache_set_error()
  bcache: quit dc->writeback_thread when BCACHE_DEV_DETACHING is set
  bcache: stop dc->writeback_rate_update properly
  bcache: set error_limit correctly
  bcache: add CACHE_SET_IO_DISABLE to struct cache_set flags
  bcache: stop all attached bcache devices for a retired cache set
  bcache: add backing_request_endio() for bi_end_io of attached backing
    device I/O
  bcache: add io_disable to struct cached_dev
  bcache: stop bcache device when backing device is offline
  bcache: add stop_attached_devs_on_fail to struct cached_dev

Tang Junhui (1):
  bcache: fix inaccurate io state for detached bcache devices

 drivers/md/bcache/alloc.c     |   5 +-
 drivers/md/bcache/bcache.h    |  38 ++++++++-
 drivers/md/bcache/btree.c     |  10 ++-
 drivers/md/bcache/io.c        |  16 +++-
 drivers/md/bcache/journal.c   |   4 +-
 drivers/md/bcache/request.c   | 187 +++++++++++++++++++++++++++++++++++-------
 drivers/md/bcache/super.c     | 181 ++++++++++++++++++++++++++++++++++++----
 drivers/md/bcache/sysfs.c     |  55 ++++++++++++-
 drivers/md/bcache/util.h      |   6 --
 drivers/md/bcache/writeback.c |  99 ++++++++++++++++++----
 drivers/md/bcache/writeback.h |   5 +-
 11 files changed, 522 insertions(+), 84 deletions(-)

-- 
2.15.1

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2018-02-02  2:04 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-01-28  1:56 [PATCH v4 00/13] bcache: device failure handling improvement Coly Li
2018-01-28  1:56 ` [PATCH v4 01/13] bcache: set writeback_rate_update_seconds in range [1, 60] seconds Coly Li
2018-01-28  1:56 ` [PATCH v4 02/13] bcache: properly set task state in bch_writeback_thread() Coly Li
2018-01-28  1:56 ` [PATCH v4 03/13] bcache: fix cached_dev->count usage for bch_cache_set_error() Coly Li
2018-01-28  1:56 ` [PATCH v4 04/13] bcache: quit dc->writeback_thread when BCACHE_DEV_DETACHING is set Coly Li
2018-01-28  1:56 ` [PATCH v4 05/13] bcache: stop dc->writeback_rate_update properly Coly Li
2018-01-28  1:56 ` [PATCH v4 06/13] bcache: set error_limit correctly Coly Li
2018-01-28  1:56 ` [PATCH v4 07/13] bcache: add CACHE_SET_IO_DISABLE to struct cache_set flags Coly Li
2018-01-28  1:56 ` [PATCH v4 08/13] bcache: stop all attached bcache devices for a retired cache set Coly Li
2018-01-28  1:56 ` [PATCH v4 09/13] bcache: fix inaccurate io state for detached bcache devices Coly Li
2018-01-28  1:56 ` [PATCH v4 10/13] bcache: add backing_request_endio() for bi_end_io of attached backing device I/O Coly Li
2018-01-28  1:56 ` [PATCH v4 11/13] bcache: add io_disable to struct cached_dev Coly Li
2018-01-28  1:56 ` [PATCH v4 12/13] bcache: stop bcache device when backing device is offline Coly Li
2018-01-28  1:56 ` [PATCH v4 13/13] bcache: add stop_when_cache_set_failed to struct cached_dev Coly Li
2018-02-01 21:52 ` [PATCH v4 00/13] bcache: device failure handling improvement Michael Lyle
2018-02-02  2:04   ` Coly Li
  -- strict thread matches above, loose matches on Subject: below --
2018-01-30  1:57 [PATCH v4 05/13] bcache: stop dc->writeback_rate_update properly tang.junhui
2018-01-30  2:20 ` Coly Li
2018-01-29 12:22 tang.junhui
2018-01-29 12:57 ` Coly Li
2018-01-29  7:35 tang.junhui
2018-01-29  9:36 ` Coly Li
2018-01-27 14:23 [PATCH v4 00/13] bcache: device failure handling improvement Coly Li
2018-01-27 14:23 ` [PATCH v4 05/13] bcache: stop dc->writeback_rate_update properly Coly Li

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).