From: Coly Li <colyli@suse.de>
To: Pavel Goran <via-bcache@pvgoran.name>
Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org,
Nix <nix@esperi.org.uk>, Michael Lyle <mlyle@lyle.org>,
Junhui Tang <tang.junhui@zte.com.cn>,
Hannes Reinecke <hare@suse.com>
Subject: Re: [PATCH v4 13/13] bcache: add stop_when_cache_set_failed to struct cached_dev
Date: Sun, 28 Jan 2018 17:39:37 +0800 [thread overview]
Message-ID: <c63e4e9f-1eb2-f90d-b294-ed4342034597@suse.de> (raw)
In-Reply-To: <1693067263.20180128085551@pvgoran.name>
On 28/01/2018 1:55 PM, Pavel Goran wrote:
> Hello Coly,
>
> Sunday, January 28, 2018, 7:32:09 AM, you wrote:
>
>>>> Current bcache failure handling code will stop all attached bcache devices
>>>> when the cache set is broken or disconnected. This is desired behavior for
>>>> most of enterprise or cloud use cases, but maybe not for low end
>>>> configuration. Nix <nix@esperi.org.uk> points out, users may still want to
>>>> access the bcache device after cache device failed, for example on laptops.
>>>
>>> In the current state, this functionality is rather user-unfriendly.
>>>
>>> 1. The new "stop_when_cache_set_failed" option doesn't seem to be persistent.
>>> (At least, I did not see any explicit mechanism of saving/restoring it.) That
>>> is, users will have to set it on each system startup.
>>>
>>> 2. If the new option is set to zero, it will (supposedly) lead to data
>>> corruption/loss when cache set of a "dirty" bcache device is detached. The
>>> option that an average home user may want to switch shouldn't be a way to
>>> shoot oneself in the foot!
>>>> As a remedy, the option could be changed to have the states "always" and
>>> "auto" instead of "1" and "0", so that "auto" would still stop the bcache
>>> device if it (or the entire cache set) is "dirty". (Alternatively, it could be
>>> renamed to "force_stop_when_cache_set_failed" or
>>> "always_stop_when_cache_set_failed" with values "1" and "0", if string values
>>> are undesirable.)
>>>
>>> Also, the printed warning is somewhat misleading: it says "To disable this
>>> warning message, please set /sys/block/%s/bcache/stop_when_cache_set_failed to
>>> 1", whereas the suggested change would lead to behaviour change rather that to
>>> just disabling the warning.
>>>
>>> 3. If (2) is implemented, the default value for the option could be changed to
>>> "auto". (The rationale is that enterprise users who want it enabled are better
>>> prepared to tune their device settings on startup.) However, this is only
>>> important if (1) is not implemented.
>>>
>
>> Hi Pavel,
>
>> I don't like "auto" since its behavior is unpredictable, sometimes whole
>> things are gone, sometimes bcache device exists but slower, for large
>> scale deployed environment, it is confused and misleading.
>
Hi Pavel,
> In fact, during normal operations, this mode can be made rather predictable:
> the device is stopped if it's "writeback", the device isn't stopped if it's
> "writearound" or "writethrough". The only time it's unpredictable is when the
> device is being transitioned from "writeback" to other mode.
>
> However, I can see how this adds a certain informational inconsistency to the
> picture: an administrator won't be able to say "everything is good" just
> because all bcache devices work; he'll have to be aware that non-writeback
> devices get special handling.
>
You got what I meant. I just hope people can know how errors are handled
without extra document. In auto mode, print out kernel message to
indicate why bcache device is stopped or not might be useful to users to
understand what's going on.
>> After thinking for a while, I feel the "auto"/"always" options can make
>> both enterprise and home users get an agreement. I am not able to find
>> other better solution so far, thank you for the hint, brilliant!
>
>> Personally I intend to set "always" as the default option, because I
>> maintain bcache for enterprise workloads. The persistent option problem
>> pointed by you does make sense, I will think how to solve it later
>> (probably from user space tools).
>
> Providing persistency via user-space tools looks like a bad idea. User-space
> tools tend to lag behind kernel, so users will not see the new functionality
> in their distributions for several months or several years. Also, if you're
> talking about setting the option from user space on each startup, then
> additionally there is a problem of different incompatible init systems, which
> will further make things more complicated.
OK, I get your point, it makes sense. This is just a rough thought from
me, the motivation was to avoid on-disk format change.
>
> Is there a way to save the option somewhere in bcache superblock, like the
> cache mode is saved?
>
It is possible to add flag in bcache superblock, but it means on disk
format modification, this is something I try best to avoid. I need to
check the code before I have the answer of yes or no.
>> How do you think of this ?
>
> Actually, I'd love to hear thoughts from other users and developers. Three
> opinions is too few for this kind of discussion.
Sure I agree. Indeed the suggestion and feedback for you and Nix helps
me quite a lot :-) Firstly I try to avoid data corruption and provide
flexibility to enterprise and personal users, then we can improve it in
future if someone may provide better suggestion than yours :-)
Thanks.
Coly Li
next prev parent reply other threads:[~2018-01-28 9:39 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-01-27 14:23 [PATCH v4 00/13] bcache: device failure handling improvement Coly Li
2018-01-27 14:23 ` [PATCH v4 01/13] bcache: set writeback_rate_update_seconds in range [1, 60] seconds Coly Li
2018-02-01 21:44 ` Michael Lyle
2018-01-27 14:23 ` [PATCH v4 02/13] bcache: properly set task state in bch_writeback_thread() Coly Li
2018-02-01 21:45 ` Michael Lyle
2018-01-27 14:23 ` [PATCH v4 03/13] bcache: fix cached_dev->count usage for bch_cache_set_error() Coly Li
2018-01-27 14:23 ` [PATCH v4 04/13] bcache: quit dc->writeback_thread when BCACHE_DEV_DETACHING is set Coly Li
2018-01-27 14:23 ` [PATCH v4 05/13] bcache: stop dc->writeback_rate_update properly Coly Li
2018-01-27 14:23 ` [PATCH v4 06/13] bcache: set error_limit correctly Coly Li
2018-02-01 21:49 ` Michael Lyle
2018-01-27 14:24 ` [PATCH v4 07/13] bcache: add CACHE_SET_IO_DISABLE to struct cache_set flags Coly Li
2018-01-27 14:24 ` [PATCH v4 08/13] bcache: stop all attached bcache devices for a retired cache set Coly Li
2018-01-27 14:24 ` [PATCH v4 09/13] bcache: fix inaccurate io state for detached bcache devices Coly Li
2018-01-27 14:24 ` [PATCH v4 10/13] bcache: add backing_request_endio() for bi_end_io of attached backing device I/O Coly Li
2018-01-27 14:24 ` [PATCH v4 11/13] bcache: add io_disable to struct cached_dev Coly Li
2018-01-27 14:24 ` [PATCH v4 12/13] bcache: stop bcache device when backing device is offline Coly Li
2018-01-27 14:24 ` [PATCH v4 13/13] bcache: add stop_when_cache_set_failed to struct cached_dev Coly Li
2018-01-28 3:33 ` Pavel Goran
2018-01-28 4:32 ` Coly Li
2018-01-28 5:55 ` Re[2]: " Pavel Goran
2018-01-28 9:39 ` Coly Li [this message]
2018-01-29 12:57 ` Nix
2018-01-29 13:02 ` Coly Li
-- strict thread matches above, loose matches on Subject: below --
2018-01-28 1:56 [PATCH v4 00/13] bcache: device failure handling improvement Coly Li
2018-01-28 1:56 ` [PATCH v4 13/13] bcache: add stop_when_cache_set_failed to struct cached_dev Coly Li
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=c63e4e9f-1eb2-f90d-b294-ed4342034597@suse.de \
--to=colyli@suse.de \
--cc=hare@suse.com \
--cc=linux-bcache@vger.kernel.org \
--cc=linux-block@vger.kernel.org \
--cc=mlyle@lyle.org \
--cc=nix@esperi.org.uk \
--cc=tang.junhui@zte.com.cn \
--cc=via-bcache@pvgoran.name \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).