From: Ziyang Zhang <ZiyangZhang@linux.alibaba.com>
To: Ming Lei <ming.lei@redhat.com>
Cc: axboe@kernel.dk, xiaoguang.wang@linux.alibaba.com,
linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
joseph.qi@linux.alibaba.com
Subject: Re: [RFC PATCH 0/9] ublk_drv: add USER_RECOVERY support
Date: Mon, 29 Aug 2022 12:00:49 +0800 [thread overview]
Message-ID: <2c5def30-7116-7a0d-860d-74e1d36a91c6@linux.alibaba.com> (raw)
In-Reply-To: <Ywwfi7Dgi0JC2kQ/@T590>
On 2022/8/29 10:08, Ming Lei wrote:
> On Wed, Aug 24, 2022 at 01:47:35PM +0800, ZiyangZhang wrote:
>> ublk_drv is a driver simply passes all blk-mq rqs to ublksrv[1] in
>> userspace. For each ublk queue, there is one ubq_daemon(pthread).
>> All ubq_daemons share the same process which opens /dev/ublkcX.
>> The ubq_daemon code infinitely loops on io_uring_enter() to
>> send/receive io_uring cmds which pass information of blk-mq
>> rqs.
>>
>> Now, if one ubq_daemon(pthread) or the process crashes, ublk_drv
>> must abort the dying ubq, stop the device and release everything.
>> This is not a good choice in practice because users do not expect
>> aborted requests, I/O errors and a released device. They may want
>> a recovery machenism so that no requests are aborted and no I/O
>> error occurs. Anyway, users just want everything works as uaual.
>
> I understand the requirement is that both /dev/ublkbN and /dev/ublkcN
> won't be deleted & re-added from user viewpoint after user recovery,
> so the device context won't be lost.
Yes, after the 'process' is killed or crashed(such as segmentation fault)
both /dev/ublkb0 and /dev/ublkc0 is not deleted.
>
>>
>> This RFC patchset implements USER_RECOVERY support. If the process
>> crashes, we allow ublksrv to provide new process and ubq_daemons.
>> We do not support single ubq_daemon(pthread) recovery because a
>> pthread rarely crashes.
>>
>> Recovery feature is quite useful for products do not expect to
>> return any I/O error to frontend users.
>
> That looks one very ideal requirement. To be honest, no any block driver
> can guarantee that 100%, so it is just one soft requirement?
>
> Cause memory allocation may fail, network may be disconnected,
> re-creating pthread or process may fail too, ...
Yes, I know there are many other problem which may cause a failure.
The recovery mechanism only guarantees that rqs sent to ublksrv
before crash are not aborted. Instead, ublk_drv re-issues the request
itself and fio does not konw about it. Of course the backend must
tolerate a double-write/read.
>
>> In detail, we support
>> this scenario:
>> (1) The /dev/ublkc0 is opened by process 0;
>> (2) Fio is running on /dev/ublkb0 exposed by ublk_drv and all
>> rqs are handled by process 0.
>> (3) Process 0 suddenly crashes(e.g. segfault);
>> (4) Fio is still running and submit IOs(but these IOs cannot
>> complete now)
>> (5) User recovers with process 1 and attach it to /dev/ublkc0
>> (6) All rqs are handled by process 1 now and IOs can be
>> completed now.
>>
>> Note: The backend must tolerate double-write because we re-issue
>> a rq sent to the old(dying) process before. We allow users to
>> choose whether re-issue these rqs or not, please see patch 7 for
>> more detail.
>>
>> We provide a sample script here to simulate the above steps:
>>
>> ***************************script***************************
>> LOOPS=10
>>
>> __ublk_get_pid() {
>> pid=`./ublk list -n 0 | grep "pid" | awk '{print $7}'`
>> echo $pid
>> }
>>
>> ublk_recover_kill()
>> {
>> for CNT in `seq $LOOPS`; do
>> dmesg -C
>> pid=`__ublk_get_pid`
>> echo -e "*** kill $pid now ***"
>> kill -9 $pid
>> sleep 2
>> echo -e "*** recover now ***"
>> ./ublk recover -n 0
>
> The current behavior is that /dev/ublkb* is removed after device is
> aborted because ubq daemon is killed.
>
> What if 'ublk recover' command isn't sent? So the current behavior
> without recovery is changed? Or just changed with this feature enabled?
No, I do not change the default behavior. You can verify this by running
generic/002 and generic/003. These tests passes with either recovery enabled
or disabled.
(1) With recovery disabled, the monitor_work scheduled periodically or
STOP_DEV ctrl-cmd issued manually can cleanup everything and remove the
gendisk.
(2)With recovery enabled, the monitor_work is not scheduled anymore, see
patch 9. So after a crash,all resources are still in kernel.
Then, there are two options for a user:
(a) You don't want to recover it, so just send STOP_DEV ctrl-cmd. This will
schedule monitor_work once and cleanup everything. Please see patch 5.
(b) You want to recover it, so just send START_RECOVERY ctrl-cmd. Then you
HAVE TO start a new process and send END_RECOVERY.
Note: Actually I am thinking what if a user has sent START_RECOVERY but he fails
to start a new process. I have a rough idea: just abort all rqs after we unqiuesce
the request queue. But that is not included in this RFC patchset because I
want to make it simpler. Maybe we can consider it later?
>
> BTW, I do not mean the change isn't reasonable, but suggest to document
> the user visible change, so it can get reviewed from either user
> viewpoint and technical point.
>
>> sleep 4
>> done
>> }
>>
>> ublk_test()
>> {
>> dmesg -C
>> echo -e "*** add ublk device ***"
>> ./ublk add -t null -d 4 -i 1
>> sleep 2
>> echo -e "*** start fio ***"
>> fio --bs=4k \
>> --filename=/dev/ublkb0 \
>> --runtime=100s \
>> --rw=read &
>> sleep 4
>> ublk_recover_kill
>> wait
>> echo -e "*** delete ublk device ***"
>> ./ublk del -n 0
>> }
>>
>> for CNT in `seq 4`; do
>> modprobe -rv ublk_drv
>> modprobe ublk_drv
>> echo -e "************ round $CNT ************"
>> ublk_test
>> sleep 5
>> done
>> ***************************script***************************
>>
>> You may run it with our modified ublksrv[3] which supports
>> recovey feature. No I/O error occurs and you can verify it
>> by typing
>> $ perf-tools/bin/tpoint block:block_rq_error
>>
>> The basic idea of USER_RECOVERY is quite straightfoward:
>>
>> (1) release/free everything belongs to the dying process.
>>
>> Note: Since ublk_drv does save information about user process,
>> this work is important because we don't expect any resource
>> lekage. Particularly, ioucmds from the dying ubq_daemons
>> need to be completed(freed). Current ublk_drv code cannot satisfy
>> our need while considering USER_RECOVERY. So we refactor some code
>> shown in patch 1-5 to gracefully free all ioucmds.
>>
>> (2) init ublk queues including requeuing/aborting rqs.
>>
>> (3) allow new ubq_daemons issue FETCH_REQ.
>>
>> Here is steps to reocver:
>>
>> (1) For a user, after a process crash(how he detect a crash is not related
>> to this patchset), he sends START_USER_RECOVERY ctrl-cmd to
>
> I'd suggest to describe crash detector a bit at least, as one whole use case,
> crash detector should be the input of the use case of user recovery, which is
> usually one part of use case when modeling software requirement/design.
This patchset tries to answer only one question: After a process crash, how to
re-attach the device by another process. So I do not consider other questions
too much, such as:
(1) How to detect a crash?
(2) Is IO hang a crash? Should we kill the process?
(3) What if a blk-mq rq timeout? Does the process dies? Should we kill the process?
I think we can answer them after kernel-support of USER_RECOVERY is available.
For now I only try to directly kill the process in testcases and manually inject
a crash in handle_io_async().
>
> Such as, crash is detected after the ubq daemon pthread/process is crashed?
> Will you consider io hang in the daemon pthread/process? IMO, long-term,
> the crash detector utility should be part of ublksrv.
Yes, we should design a crash detector in ublksrv. For IO hang, my idea is that:
(1) the ublksrv_tgt code should handle it if user runs ublksrv directly.
(2) the backend should handle it if user only uses libublksrv and embeds it inside
the backend code.
>
> We don't implement ublk driver's IO timeout yet, but that implementation may be
> related with this recovery feature closely, such as, one simple approach is to
> kill ubq-daemon if we can't move on after retrying several times, then
> let userspace detect & recovery.
You mean the ublk_drv can kill the ubq_daemon? I have not consider this case yet...
BTW, I don't think we should put too much logic(IO hang, detector) in ublk_drv
because it should only pass-through rqs to userpsace. We should make ublk_drv simple.
Accept a new daemon and re-attach it to /dev/ublkb0 is what it can do I think.
Regards,
Zhang
next prev parent reply other threads:[~2022-08-29 4:01 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-08-24 5:47 [RFC PATCH 0/9] ublk_drv: add USER_RECOVERY support ZiyangZhang
2022-08-24 5:47 ` [RFC PATCH 1/9] ublk_drv: check 'current' instead of 'ubq_daemon' ZiyangZhang
2022-08-29 2:13 ` Ming Lei
2022-08-24 5:47 ` [RFC PATCH 2/9] ublk_drv: refactor ublk_cancel_queue() ZiyangZhang
2022-08-29 3:01 ` Ming Lei
2022-08-29 4:50 ` Ziyang Zhang
2022-08-24 5:47 ` [RFC PATCH 3/9] ublk_drv: add a helper to get ioucmd from pdu ZiyangZhang
2022-08-29 3:06 ` Ming Lei
2022-08-29 4:59 ` Ziyang Zhang
2022-08-24 5:47 ` [RFC PATCH 4/9] ublk_drv: refactor __ublk_rq_task_work() and aborting machenism ZiyangZhang
2022-08-29 5:40 ` Ming Lei
2022-08-29 6:13 ` Ziyang Zhang
2022-08-29 8:11 ` Ming Lei
2022-08-29 9:09 ` Ziyang Zhang
2022-08-29 10:02 ` Ming Lei
2022-08-24 5:47 ` [RFC PATCH 5/9] ublk_drv: refactor ublk_stop_dev() ZiyangZhang
2022-08-24 5:47 ` [RFC PATCH 6/9] ublk_drv: add pr_devel() to prepare for recovery feature ZiyangZhang
2022-08-24 5:47 ` [RFC PATCH 7/9] ublk_drv: define macros for recovery feature and check them ZiyangZhang
2022-08-24 5:47 ` [RFC PATCH 8/9] ublk_drv: add START_USER_RECOVERY and END_USER_RECOVERY support ZiyangZhang
2022-08-24 5:47 ` [RFC PATCH 9/9] ublk_drv: do not schedule monitor_work with recovery feature enabled ZiyangZhang
2022-08-29 2:08 ` [RFC PATCH 0/9] ublk_drv: add USER_RECOVERY support Ming Lei
2022-08-29 4:00 ` Ziyang Zhang [this message]
2022-08-29 5:56 ` Ming Lei
2022-08-29 7:29 ` Ziyang Zhang
2022-08-29 8:38 ` Ming Lei
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=2c5def30-7116-7a0d-860d-74e1d36a91c6@linux.alibaba.com \
--to=ziyangzhang@linux.alibaba.com \
--cc=axboe@kernel.dk \
--cc=joseph.qi@linux.alibaba.com \
--cc=linux-block@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=ming.lei@redhat.com \
--cc=xiaoguang.wang@linux.alibaba.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox