public inbox for linux-bcache@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v1 00/10] cache device failure handling improvement
@ 2018-01-03 14:03 Coly Li
  2018-01-03 14:03 ` [PATCH v1 01/10] bcache: exit bch_writeback_thread() with proper task state Coly Li
                   ` (10 more replies)
  0 siblings, 11 replies; 35+ messages in thread
From: Coly Li @ 2018-01-03 14:03 UTC (permalink / raw)
  To: linux-bcache; +Cc: linux-block, mlyle, tang.junhui, Coly Li

Hi maintainers and folks,

This patch set tries to improve cache device failure handling. A basic
idea to handle failed cache device is,
- Unregister cache set
- Detach all backing devices attached to this cache set
- Stop all bcache devices linked to this cache set
The above process is named 'cache set retire' by me. The result of cache
set retire is, cache set and bcache devices are all removed, following
I/O requests will get failed immediately to notift upper layer or user
space coce that the cache device is failed or disconnected.

The first 8 patches of this patch set is to fix existing bugs in bcache,
the last 2 patches do the real improvement. Order of applying these patches
is important, if the last 2 patches are applied firstly, kernel panic or
process hang will be observed. Therefore I suggest to apply the first 8
fixes, then apply the last 2 patches.

The patch set is tested with writethrough, writeback, writearound mode,
read/write/readwrite workloads, so far it works as expected. IMHO the
cache set retire logic is complicated, I need your help to review the
patches, any question is warmly wlecome.

Coly Li (10):
  bcache: exit bch_writeback_thread() with proper task state
  bcache: set task properly in allocator_wait()
  bcache: reduce cache_set devices iteration by devices_max_used
  bcache: fix cached_dev->count usage for bch_cache_set_error()
  bcache: stop dc->writeback_rate_update if cache set is stopping
  bcache: stop dc->writeback_rate_update, dc->writeback_thread earlier
  bcache: set error_limit correctly
  bcache: fix misleading error message in bch_count_io_errors()
  bcache: add io_disable to struct cache_set
  bcache: stop all attached bcache devices for a retired cache set

 drivers/md/bcache/alloc.c     |  5 ++---
 drivers/md/bcache/bcache.h    | 19 +++++++++++++++-
 drivers/md/bcache/btree.c     |  8 ++++---
 drivers/md/bcache/io.c        | 15 ++++++++-----
 drivers/md/bcache/journal.c   |  4 ++--
 drivers/md/bcache/request.c   | 26 ++++++++++++++++------
 drivers/md/bcache/super.c     | 51 +++++++++++++++++++++++++++++++++++--------
 drivers/md/bcache/sysfs.c     |  8 +++++--
 drivers/md/bcache/util.h      |  6 -----
 drivers/md/bcache/writeback.c | 51 +++++++++++++++++++++++++++++++++----------
 drivers/md/bcache/writeback.h |  4 +---
 11 files changed, 144 insertions(+), 53 deletions(-)

Thanks in advance.

Coly Li

^ permalink raw reply	[flat|nested] 35+ messages in thread
* Re: [PATCH v1 03/10] bcache: reduce cache_set devices iteration by devices_max_used
@ 2018-01-03 19:14 tang.junhui
  0 siblings, 0 replies; 35+ messages in thread
From: tang.junhui @ 2018-01-03 19:14 UTC (permalink / raw)
  To: colyli; +Cc: mlyle, linux-bcache, linux-block, tang.junhui

From: Tang Junhui <tang.junhui@zte.com.cn>

LGTM.

Reviewed-by: Tang Junhui <tang.junhui@zte.com.cn>

>Member devices of struct cache_set is used to reference all attached                              
>bcache devices to this cache set. If it is treated as array of pointers,                          
>size of devices[] is indicated by member nr_uuids of struct cache_set.                            
>                                                                                                  
>nr_uuids is calculated in drivers/md/super.c:bch_cache_set_alloc(),                               
>    bucket_bytes(c) / sizeof(struct uuid_entry)                                                   
>Bucket size is determined by user space tool "make-bcache", by default it                         
>is 1024 sectors (defined in bcache-tools/make-bcache.c:main()). So default                        
>nr_uuids value is 4096 from the above calculation.                                                
>                                                                                                  
>Every time when bcache code iterates bcache devices of a cache set, all                           
>the 4096 pointers are checked even only 1 bcache device is attached to the                        
>cache set, that's a wast of time and unncessary.                                                  
>                                                                                                  
>This patch adds a member devices_max_used to struct cache_set. Its value                          
>is 1 + the maximum used index of devices[] in a cache set. When iterating                         
>all valid bcache devices of a cache set, use c->devices_max_used in                               
>for-loop may reduce a lot of useless checking.                                                    
>                                                                                                  
>Personally, my motivation of this patch is not for performance, I use it                          
>in bcache debugging, which helps me to narrow down the scape to check                             
>valid bcached devices of a cache set.                                                             
>                                                                                                  
>Signed-off-by: Coly Li <colyli@suse.de>                                                           
>---                                                                                               
> drivers/md/bcache/bcache.h    | 1 +                                                              
> drivers/md/bcache/btree.c     | 2 +-                                                             
> drivers/md/bcache/super.c     | 9 ++++++---                                                      
> drivers/md/bcache/writeback.h | 2 +-                                                             
> 4 files changed, 9 insertions(+), 5 deletions(-)                                                 
>                                                                                                  
>diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h                              
>index 843877e017e1..83c569942bd0 100644                                                           
>--- a/drivers/md/bcache/bcache.h                                                                  
>+++ b/drivers/md/bcache/bcache.h                                                                  
>@@ -488,6 +488,7 @@ struct cache_set {                                                            
>     int            caches_loaded;                                                                
>                                                                                                  
>     struct bcache_device    **devices;                                                           
>+    unsigned        devices_max_used;                                                            
>     struct list_head    cached_devs;                                                             
>     uint64_t        cached_dev_sectors;                                                          
>     struct closure        caching;                                                               
>diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c                                
>index 81e8dc3dbe5e..bf0d7978bc3d 100644                                                           
>--- a/drivers/md/bcache/btree.c                                                                   
>+++ b/drivers/md/bcache/btree.c                                                                   
>@@ -1678,7 +1678,7 @@ static void bch_btree_gc_finish(struct cache_set *c)                        
>                                                                                                  
>     /* don't reclaim buckets to which writeback keys point */                                    
>     rcu_read_lock();                                                                             
>-    for (i = 0; i < c->nr_uuids; i++) {                                                          
>+    for (i = 0; i < c->devices_max_used; i++) {                                                  
>         struct bcache_device *d = c->devices[i];                                                 
>         struct cached_dev *dc;                                                                   
>         struct keybuf_key *w, *n;                                                                
>diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c                                
>index b4d28928dec5..064efd869017 100644                                                           
>--- a/drivers/md/bcache/super.c                                                                   
>+++ b/drivers/md/bcache/super.c                                                                   
>@@ -721,6 +721,9 @@ static void bcache_device_attach(struct bcache_device *d, struct cache_set *c,
>     d->c = c;                                                                                    
>     c->devices[id] = d;                                                                          
>                                                                                                  
>+    if (id >= c->devices_max_used)                                                               
>+        c->devices_max_used = id + 1;                                                            
>+                                                                                                 
>     closure_get(&c->caching);                                                                    
> }                                                                                                
>                                                                                                  
>@@ -1261,7 +1264,7 @@ static int flash_devs_run(struct cache_set *c)                              
>     struct uuid_entry *u;                                                                        
>                                                                                                  
>     for (u = c->uuids;                                                                           
>-         u < c->uuids + c->nr_uuids && !ret;                                                     
>+         u < c->uuids + c->devices_max_used && !ret;                                             
>          u++)                                                                                    
>         if (UUID_FLASH_ONLY(u))                                                                  
>             ret = flash_dev_run(c, u);                                                           
>@@ -1427,7 +1430,7 @@ static void __cache_set_unregister(struct closure *cl)                      
>                                                                                                  
>     mutex_lock(&bch_register_lock);                                                              
>                                                                                                  
>-    for (i = 0; i < c->nr_uuids; i++)                                                            
>+    for (i = 0; i < c->devices_max_used; i++)                                                    
>         if (c->devices[i]) {                                                                     
>             if (!UUID_FLASH_ONLY(&c->uuids[i]) &&                                                
>                 test_bit(CACHE_SET_UNREGISTERING, &c->flags)) {                                  
>@@ -1490,7 +1493,7 @@ struct cache_set *bch_cache_set_alloc(struct cache_sb *sb)                  
>     c->bucket_bits        = ilog2(sb->bucket_size);                                              
>     c->block_bits        = ilog2(sb->block_size);                                                
>     c->nr_uuids        = bucket_bytes(c) / sizeof(struct uuid_entry);                            
>-                                                                                                 
>+    c->devices_max_used    = 0;                                                                  
>     c->btree_pages        = bucket_pages(c);                                                     
>     if (c->btree_pages > BTREE_MAX_PAGES)                                                        
>         c->btree_pages = max_t(int, c->btree_pages / 4,                                          
>diff --git a/drivers/md/bcache/writeback.h b/drivers/md/bcache/writeback.h                        
>index a9e3ffb4b03c..1d284f3d0363 100644                                                           
>--- a/drivers/md/bcache/writeback.h                                                               
>+++ b/drivers/md/bcache/writeback.h                                                               
>@@ -21,7 +21,7 @@ static inline uint64_t  bcache_flash_devs_sectors_dirty(struct cache_set *c)    
>                                                                                                  
>     mutex_lock(&bch_register_lock);                                                              
>                                                                                                  
>-    for (i = 0; i < c->nr_uuids; i++) {                                                          
>+    for (i = 0; i < c->devices_max_used; i++) {                                                  
>         struct bcache_device *d = c->devices[i];                                                 
>                                                                                                  
>         if (!d || !UUID_FLASH_ONLY(&c->uuids[i]))                                                
>--                                                                                                
>2.15.1                                                                                            


Thanks,
Tang Junhui

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2018-01-08 16:01 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-01-03 14:03 [PATCH v1 00/10] cache device failure handling improvement Coly Li
2018-01-03 14:03 ` [PATCH v1 01/10] bcache: exit bch_writeback_thread() with proper task state Coly Li
2018-01-03 17:08   ` Michael Lyle
2018-01-05 17:05     ` Coly Li
2018-01-05 17:09       ` Michael Lyle
2018-01-08  7:09   ` Hannes Reinecke
2018-01-08 13:50     ` Coly Li
2018-01-03 14:03 ` [PATCH v1 02/10] bcache: set task properly in allocator_wait() Coly Li
2018-01-03 17:09   ` Michael Lyle
2018-01-05 17:11     ` Coly Li
2018-01-08  7:10   ` Hannes Reinecke
2018-01-03 14:03 ` [PATCH v1 03/10] bcache: reduce cache_set devices iteration by devices_max_used Coly Li
2018-01-03 17:11   ` Michael Lyle
2018-01-08  7:12   ` Hannes Reinecke
2018-01-03 14:03 ` [PATCH v1 04/10] bcache: fix cached_dev->count usage for bch_cache_set_error() Coly Li
2018-01-08  7:16   ` Hannes Reinecke
2018-01-03 14:03 ` [PATCH v1 05/10] bcache: stop dc->writeback_rate_update if cache set is stopping Coly Li
2018-01-08  7:22   ` Hannes Reinecke
2018-01-08 16:01     ` Coly Li
2018-01-03 14:03 ` [PATCH v1 06/10] bcache: stop dc->writeback_rate_update, dc->writeback_thread earlier Coly Li
2018-01-08  7:25   ` Hannes Reinecke
2018-01-03 14:03 ` [PATCH v1 07/10] bcache: set error_limit correctly Coly Li
2018-01-08  7:26   ` Hannes Reinecke
2018-01-03 14:03 ` [PATCH v1 08/10] bcache: fix misleading error message in bch_count_io_errors() Coly Li
2018-01-03 17:14   ` Michael Lyle
2018-01-08  7:27   ` Hannes Reinecke
2018-01-03 14:03 ` [PATCH v1 09/10] bcache: add io_disable to struct cache_set Coly Li
2018-01-08  7:30   ` Hannes Reinecke
2018-01-03 14:03 ` [PATCH v1 10/10] bcache: stop all attached bcache devices for a retired cache set Coly Li
2018-01-08  7:31   ` Hannes Reinecke
2018-01-03 17:07 ` [PATCH v1 00/10] cache device failure handling improvement Michael Lyle
2018-01-04  2:20   ` Coly Li
2018-01-04 17:46     ` Michael Lyle
2018-01-05  4:04       ` Coly Li
  -- strict thread matches above, loose matches on Subject: below --
2018-01-03 19:14 [PATCH v1 03/10] bcache: reduce cache_set devices iteration by devices_max_used tang.junhui

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox