public inbox for linux-bcache@vger.kernel.org
* [PATCH] bcache: btree.c: Fix GC thread exit in case of cache device failure and unregister
@ 2018-01-12 15:24 Pavel Vazharov
  2018-01-13  4:06 ` Coly Li
  0 siblings, 1 reply; 3+ messages in thread
From: Pavel Vazharov @ 2018-01-12 15:24 UTC
  To: mlyle, kent.overstreet; +Cc: linux-bcache, linux-kernel, Pavel Vazharov

The do-while loop in the GC thread function could spin forever after a
total failure of the caching device. I reproduced this 3 times by
simulating the disappearance of the caching device via
'echo 1 > /sys/block/<dev>/device/delete'. In that case btree_root
starts to return a result that is neither zero nor -EAGAIN, 'gc failed'
messages start to fill the kernel log, and the do-while becomes an
infinite loop that keeps a single CPU core at 100%.
There is already logic that unregisters the cache_set (or panics) on
I/O errors, so exit the loop here if the unregistering procedure has
already started.

Signed-off-by: Pavel Vazharov <freakpv@gmail.com>
---
 drivers/md/bcache/btree.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 81e8dc3..a672081 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -1748,8 +1748,12 @@ static void bch_btree_gc(struct cache_set *c)
 		closure_sync(&writes);
 		cond_resched();
 
-		if (ret && ret != -EAGAIN)
-			pr_warn("gc failed!");
+		if (ret && ret != -EAGAIN) {
+			if (test_bit(CACHE_SET_UNREGISTERING, &c->flags))
+				break;
+			else
+				pr_warn("gc failed!");
+		}
 	} while (ret);
 
 	bch_btree_gc_finish(c);
-- 
2.7.4


* Re: [PATCH] bcache: btree.c: Fix GC thread exit in case of cache device failure and unregister
  2018-01-12 15:24 [PATCH] bcache: btree.c: Fix GC thread exit in case of cache device failure and unregister Pavel Vazharov
@ 2018-01-13  4:06 ` Coly Li
  2018-01-13  4:43   ` Pavel Vazharov
  0 siblings, 1 reply; 3+ messages in thread
From: Coly Li @ 2018-01-13  4:06 UTC
  To: Pavel Vazharov, mlyle, kent.overstreet; +Cc: linux-bcache, linux-kernel

On 12/01/2018 11:24 PM, Pavel Vazharov wrote:
> The do-while loop in the GC thread function could spin forever after a
> total failure of the caching device. I reproduced this 3 times by
> simulating the disappearance of the caching device via
> 'echo 1 > /sys/block/<dev>/device/delete'. In that case btree_root
> starts to return a result that is neither zero nor -EAGAIN, 'gc failed'
> messages start to fill the kernel log, and the do-while becomes an
> infinite loop that keeps a single CPU core at 100%.
> There is already logic that unregisters the cache_set (or panics) on
> I/O errors, so exit the loop here if the unregistering procedure has
> already started.
> 
> Signed-off-by: Pavel Vazharov <freakpv@gmail.com>
> ---
>  drivers/md/bcache/btree.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
> index 81e8dc3..a672081 100644
> --- a/drivers/md/bcache/btree.c
> +++ b/drivers/md/bcache/btree.c
> @@ -1748,8 +1748,12 @@ static void bch_btree_gc(struct cache_set *c)
>  		closure_sync(&writes);
>  		cond_resched();
>  
> -		if (ret && ret != -EAGAIN)
> -			pr_warn("gc failed!");
> +		if (ret && ret != -EAGAIN) {
> +			if (test_bit(CACHE_SET_UNREGISTERING, &c->flags))
> +				break;
> +			else
> +				pr_warn("gc failed!");
> +		}
>  	} while (ret);
>  
>  	bch_btree_gc_finish(c);
> 

Hi Pavel,

I see the point here. But there are 2 code paths that call
cache_set_flush(): one from bch_cache_set_error(), and one from the
sysfs interface (echo 1 > /sys/fs/bcache/<UUID>/stop).

CACHE_SET_UNREGISTERING is set in the first code path, but the other
code path, from sysfs, does not set it. In that case the while-loop
above may never stop.

In my device failure patch set, I add an io_disable (in v2 it is the
CACHE_SET_IO_DISABLE flag) to disable all cache set I/O; maybe it can be
used as the condition to break the while-loop.
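
The difference between the two exit conditions can be sketched in user
space. This is only an illustration of the retry loop's shape, not kernel
code: the DEMO_* flag bits, struct demo_cache_set, demo_gc_pass() and
demo_gc() are hypothetical names made up for this sketch, and the
max_passes cap exists only so the demo terminates. It also assumes that an
I/O-disable bit would be set no matter which path started the shutdown;
the thread does not show how the v2 patch set actually does this.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical flag bits for this sketch; not the kernel's definitions. */
enum {
	DEMO_UNREGISTERING = 1u << 0,	/* assumed: set only by the error path */
	DEMO_IO_DISABLE    = 1u << 1,	/* assumed: set whenever cache I/O is shut off */
};

struct demo_cache_set {
	unsigned int flags;
};

/* Stand-in for one GC pass on a vanished device: always a hard error. */
static int demo_gc_pass(const struct demo_cache_set *c)
{
	(void)c;
	return -EIO;
}

/*
 * Retry loop shaped like the one in the patch. It gives up as soon as a
 * hard error is seen while any bit in exit_mask is set. Returns the
 * number of passes made; max_passes only bounds the demo.
 */
static int demo_gc(const struct demo_cache_set *c, unsigned int exit_mask,
		   int max_passes)
{
	int ret, passes = 0;

	assert(max_passes > 0);
	do {
		ret = demo_gc_pass(c);
		passes++;

		if (ret && ret != -EAGAIN) {
			if (c->flags & exit_mask)
				break;
			fprintf(stderr, "gc failed!\n");
		}
	} while (ret && passes < max_passes);

	return passes;
}
```

With only DEMO_IO_DISABLE set on the cache set, a loop that checks
DEMO_UNREGISTERING runs until the cap is reached, while a loop that checks
DEMO_IO_DISABLE exits after the first failed pass.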

Thanks for the hint, I will also try to fix it in my patch set. If you
don't mind, I would be glad to have your "Reviewed-by:" after I post the
v2 patch set.

Thanks.

-- 
Coly Li


* Re: [PATCH] bcache: btree.c: Fix GC thread exit in case of cache device failure and unregister
  2018-01-13  4:06 ` Coly Li
@ 2018-01-13  4:43   ` Pavel Vazharov
  0 siblings, 0 replies; 3+ messages in thread
From: Pavel Vazharov @ 2018-01-13  4:43 UTC
  To: Coly Li; +Cc: mlyle, kent.overstreet, linux-bcache, linux-kernel

On Sat, 13 Jan 2018 12:06:26 +0800
Coly Li <i@coly.li> wrote:

> Hi Pavel,
> 
> I see the point here. But there are 2 code paths that call
> cache_set_flush(): one from bch_cache_set_error(), and one from the
> sysfs interface (echo 1 > /sys/fs/bcache/<UUID>/stop).
> 
> CACHE_SET_UNREGISTERING is set in the first code path, but the other
> code path, from sysfs, does not set it. In that case the while-loop
> above may never stop.
> 
> In my device failure patch set, I add an io_disable (in v2 it is the
> CACHE_SET_IO_DISABLE flag) to disable all cache set I/O; maybe it can be
> used as the condition to break the while-loop.
> 
> Thanks for the hint, I will also try to fix it in my patch set. If you
> don't mind, I would be glad to have your "Reviewed-by:" after I post the
> v2 patch set.
> 
> Thanks.
> 
> -- 
> Coly Li

Hi Coly,

CACHE_SET_IO_DISABLE looks like a more general solution to the problem.
Thanks for the review invitation. I'll do my best.

-- 
Pavel Vazharov <freakpv@gmail.com>
