public inbox for linux-block@vger.kernel.org
 help / color / mirror / Atom feed
From: Coly Li <colyli@suse.de>
To: Michael Lyle <mlyle@lyle.org>, linux-bcache@vger.kernel.org
Cc: linux-block@vger.kernel.org, Junhui Tang <tang.junhui@zte.com.cn>
Subject: Re: [PATCH v7 9/9] bcache: stop bcache device when backing device is offline
Date: Wed, 28 Feb 2018 12:54:06 +0800	[thread overview]
Message-ID: <3cbe4681-e9a8-0b1a-15f2-b75d7a0cdcf5@suse.de> (raw)
In-Reply-To: <f4eb2061-67d7-28a3-6187-6390ff974836@lyle.org>

On 28/02/2018 2:20 AM, Michael Lyle wrote:
> Hi Coly Li--
> 
> Just a couple of questions.
> 
> On 02/27/2018 08:55 AM, Coly Li wrote:
>> +#define BACKING_DEV_OFFLINE_TIMEOUT 5
> 

Hi Mike,

> I think you wanted this to be 30 (per commit message)-- was this turned
> down for testing or deliberate?
> 

Currently the correct timeout is 5 seconds. The 30 seconds was for the
condition when the offline backing device came back within 30 seconds,
which is not implemented yet. In general after 5 seconds, the offline
device will be deleted from system, if it comes back again the device
name might be changed (e.g. from /dev/sdc to /dev/sdd) with different
bdev index. I need to recognize the new coming drive is previous offline
one, and modify bcache internal data structures to link to the new
device name. This requires more details and effort, and I decided to
work on it later as a separate topic. Obviously I didn't update patch
commit log properly. Thanks for point out this :-)

>> +static int cached_dev_status_update(void *arg)
>> +{
>> +	struct cached_dev *dc = arg;
>> +	struct request_queue *q;
>> +	char buf[BDEVNAME_SIZE];
>> +
>> +	/*
>> +	 * If this delayed worker is stopping outside, directly quit here.
>> +	 * dc->io_disable might be set via sysfs interface, so check it
>> +	 * here too.
>> +	 */
>> +	while (!kthread_should_stop() && !dc->io_disable) {
>> +		q = bdev_get_queue(dc->bdev);
>> +		if (blk_queue_dying(q))
> 
> I am not sure-- is this the correct test to know if the bdev is offline?
>  It's very sparsely used outside of the core block system.  (Is there
> any scenario where the queue comes back?  Can the device be "offline"
> and still have a live queue?  Another approach might be to set
> io_disable when there's an extended period without a successful IO, and
> it might be possible to do this with atomics without a thread).
> 

I asked blocker layer developers Ming Lei, this is the methods suggested
 for now. And in my test, it works as expected.

Anyway, using a kernel thread to monitor device status is ugly, the
reason I have to do it this way is because there is no general notifier
for block layer device offline or failure. There is a bus notifier, but
for devices have no bus, it does not work well. So I am also thinking of
the possibility for a generic notifier for block device offline or
failure, if it can be done someday, we can just register a callback
routine, and not need an extra kernel thread.

> This approach can work if you're certain that blk_queue_dying is an
> appropriate test for all failure scenarios (I just don't know).

So far it works well, and I am collaborating with our partners to test
the patch set, no negative report so far.

Thanks.

Coly Li

  reply	other threads:[~2018-02-28  4:54 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-02-27 16:55 [PATCH v7 0/9] bcache: device failure handling improvement Coly Li
2018-02-27 16:55 ` [PATCH v7 1/9] bcache: fix cached_dev->count usage for bch_cache_set_error() Coly Li
2018-02-27 18:01   ` Michael Lyle
2018-02-27 16:55 ` [PATCH v7 2/9] bcache: quit dc->writeback_thread when BCACHE_DEV_DETACHING is set Coly Li
2018-02-27 16:55 ` [PATCH v7 3/9] bcache: stop dc->writeback_rate_update properly Coly Li
2018-02-27 18:53   ` Michael Lyle
2018-02-27 16:55 ` [PATCH v7 4/9] bcache: add CACHE_SET_IO_DISABLE to struct cache_set flags Coly Li
2018-02-27 18:07   ` Michael Lyle
2018-02-27 16:55 ` [PATCH v7 5/9] bcache: add stop_when_cache_set_failed option to backing device Coly Li
2018-02-27 16:55 ` [PATCH v7 6/9] bcache: fix inaccurate io state for detached bcache devices Coly Li
2018-02-27 16:55 ` [PATCH v7 7/9] bcache: add backing_request_endio() for bi_end_io of attached backing device I/O Coly Li
2018-03-14 18:30   ` Michael Lyle
2018-03-14 19:08   ` Michael Lyle
2018-02-27 16:55 ` [PATCH v7 8/9] bcache: add io_disable to struct cached_dev Coly Li
2018-03-14 18:35   ` Michael Lyle
2018-03-14 19:09   ` Michael Lyle
2018-02-27 16:55 ` [PATCH v7 9/9] bcache: stop bcache device when backing device is offline Coly Li
2018-02-27 18:20   ` Michael Lyle
2018-02-28  4:54     ` Coly Li [this message]
2018-02-27 18:29 ` [PATCH v7 0/9] bcache: device failure handling improvement Michael Lyle
2018-02-27 18:33   ` Michael Lyle
2018-02-27 19:27     ` Michael Lyle
2018-02-28  4:55       ` Coly Li

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3cbe4681-e9a8-0b1a-15f2-b75d7a0cdcf5@suse.de \
    --to=colyli@suse.de \
    --cc=linux-bcache@vger.kernel.org \
    --cc=linux-block@vger.kernel.org \
    --cc=mlyle@lyle.org \
    --cc=tang.junhui@zte.com.cn \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox