From: Coly Li <i@coly.li>
To: Pavel Vazharov <freakpv@gmail.com>,
mlyle@lyle.org, kent.overstreet@gmail.com
Cc: linux-bcache@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] bcache: btree.c: Fix GC thread exit in case of cache device failure and unregister
Date: Sat, 13 Jan 2018 12:06:26 +0800
Message-ID: <8bf2eafd-651e-ce0b-3a4c-aa10e292ce2f@coly.li>
In-Reply-To: <1515770690-18562-1-git-send-email-freakpv@gmail.com>
On 12/01/2018 11:24 PM, Pavel Vazharov wrote:
> There was a possibility of an infinite do-while loop inside the GC thread
> function in case of a total failure of the caching device. I was able to
> reproduce it 3 times by simulating the disappearance of the caching device
> via 'echo 1 > /sys/block/<dev>/device/delete'. In that case btree_root
> starts to return a result that is neither zero nor -EAGAIN, 'gc failed'
> messages start to fill the kernel log, and the do-while becomes an infinite
> loop occupying a single CPU core at 100%.
> There is already logic which unregisters the cache_set (or panics) in
> case of I/O errors, and thus we exit the loop here if the unregistering
> procedure has already started.
>
> Signed-off-by: Pavel Vazharov <freakpv@gmail.com>
> ---
> drivers/md/bcache/btree.c | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
> index 81e8dc3..a672081 100644
> --- a/drivers/md/bcache/btree.c
> +++ b/drivers/md/bcache/btree.c
> @@ -1748,8 +1748,12 @@ static void bch_btree_gc(struct cache_set *c)
> closure_sync(&writes);
> cond_resched();
>
> - if (ret && ret != -EAGAIN)
> - pr_warn("gc failed!");
> + if (ret && ret != -EAGAIN) {
> + if (test_bit(CACHE_SET_UNREGISTERING, &c->flags))
> + break;
> + else
> + pr_warn("gc failed!");
> + }
> } while (ret);
>
> bch_btree_gc_finish(c);
>
Hi Pavel,

I see the point here. But there are two code paths that call
cache_set_flush(): one is from bch_cache_set_error(), the other is from the
sysfs interface (echo 1 > /sys/fs/bcache/<UUID>/stop).

CACHE_SET_UNREGISTERING is set in the first code path, but the second code
path, from sysfs, does not set CACHE_SET_UNREGISTERING. In that case the
above while-loop may never be stopped.

In my cache device failure patch set, I add an io_disable flag (in v2 it is
the CACHE_SET_IO_DISABLE flag) to disable all cache set I/O; maybe it can be
checked as the condition to break out of the while-loop.

Thanks for the hint; I will also try to fix it in my patch set. If you
don't mind, I would be glad to have your "Reviewed-by:" after I post the v2
patch set.

Thanks.
--
Coly Li
Thread overview: 3+ messages
2018-01-12 15:24 [PATCH] bcache: btree.c: Fix GC thread exit in case of cache device failure and unregister Pavel Vazharov
2018-01-13 4:06 ` Coly Li [this message]
2018-01-13 4:43 ` Pavel Vazharov