From: Ming Lei <ming.lei@redhat.com>
To: Ziyang Zhang <ZiyangZhang@linux.alibaba.com>
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
io-uring@vger.kernel.org, Jens Axboe <axboe@kernel.dk>,
Gabriel Krisman Bertazi <krisman@collabora.com>,
Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>,
ming.lei@redhat.com
Subject: Re: [PATCH V5 1/2] ublk_drv: add io_uring based userspace block driver
Date: Thu, 14 Jul 2022 18:48:45 +0800 [thread overview]
Message-ID: <Ys/0jTxQCEHdI560@T590> (raw)
In-Reply-To: <a4249561-84a0-a314-c377-b96d28b7b20b@linux.alibaba.com>
On Thu, Jul 14, 2022 at 06:20:38PM +0800, Ziyang Zhang wrote:
> On 2022/7/13 22:07, Ming Lei wrote:
> > This is the driver part of userspace block driver(ublk driver), the other
> > part is userspace daemon part(ublksrv)[1].
> >
> > The two parts communicate by io_uring's IORING_OP_URING_CMD with one
> > shared cmd buffer for storing io command, and the buffer is read only for
> > ublksrv, each io command is indexed by io request tag directly, and
> > is written by ublk driver.
> >
> > For example, when one READ io request is submitted to ublk block driver, ublk
> > driver stores the io command into cmd buffer first, then completes one
> > IORING_OP_URING_CMD for notifying ublksrv, and the URING_CMD is issued to
> > ublk driver beforehand by ublksrv for getting notification of any new io request,
> > and each URING_CMD is associated with one io request by tag.
> >
> > After ublksrv gets the io command, it translates and handles the ublk io
> > request, such as, for the ublk-loop target, ublksrv translates the request
> > into same request on another file or disk, like the kernel loop block
> > driver. In ublksrv's implementation, the io is still handled by io_uring,
> > and share same ring with IORING_OP_URING_CMD command. When the target io
> > request is done, the same IORING_OP_URING_CMD is issued to ublk driver for
> > both committing io request result and getting future notification of new
> > io request.
> >
> > Another thing done by ublk driver is to copy data between kernel io
> > request and ublksrv's io buffer:
> >
> > 1) before ubsrv handles WRITE request, copy the request's data into
> > ublksrv's userspace io buffer, so that ublksrv can handle the write
> > request
> >
> > 2) after ubsrv handles READ request, copy ublksrv's userspace io buffer
> > into this READ request, then ublk driver can complete the READ request
> >
> > Zero copy may be switched if mm is ready to support it.
> >
> > ublk driver doesn't handle any logic of the specific user space driver,
> > so it is small/simple enough.
> >
> > [1] ublksrv
> >
> > https://github.com/ming1/ubdsrv
> >
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
>
>
> Hi, Ming
>
> I find that a big change from v4 to v5 is the simplification of locks.
>
> In v5 you remove ubq->abort_lock, and I want to ask why it is OK to remove it?
Actually V4 and previous version dealt with the issue too complicated.
>
> If you have time, could you explain how ublk deals with potential race on:
> 1)queue_rq 2)ublk_abort_queue 3) ublk_ctrl_stop_dev 4) ublk_rq_task_work.
> (Lock in ublk really confuses me...)
One big change is the following code:
__ublk_rq_task_work():
bool task_exiting = current != ubq->ubq_daemon ||
(current->flags & PF_EXITING);
...
if (unlikely(task_exiting)) {
blk_mq_end_request(req, BLK_STS_IOERR);
mod_delayed_work(system_wq, &ub->monitor_work, 0);
return;
}
Abort is always started after PF_EXITING is set, but if PF_EXITING is
set, __ublk_rq_task_work fails the request immediately, then io->flags
won't be touched, then no race with abort. Also PF_EXITING is
per-task flag, can only be set before calling __ublk_rq_task_work(),
and setting it actually serialized with calling task work func.
In ublk_queue_rq(), we don't touch io->flags, so there isn't race
with abort.
Wrt. ublk_ctrl_stop_dev(), it isn't related with abort directly, and
if del_gendisk() waits for inflight IO, abort work will be started
for making forward progress. After del_gendisk() returns, there can't
be any inflight io, so it is safe to cancel other pending io command.
>
>
> [...]
>
> > +
> > +/*
> > + * __ublk_fail_req() may be called from abort context or ->ubq_daemon
> > + * context during exiting, so lock is required.
> > + *
> > + * Also aborting may not be started yet, keep in mind that one failed
> > + * request may be issued by block layer again.
> > + */
> > +static void __ublk_fail_req(struct ublk_io *io, struct request *req)
> > +{
> > + WARN_ON_ONCE(io->flags & UBLK_IO_FLAG_ACTIVE);
> > +
> > + if (!(io->flags & UBLK_IO_FLAG_ABORTED)) {
> > + io->flags |= UBLK_IO_FLAG_ABORTED;
> > + blk_mq_end_request(req, BLK_STS_IOERR);
> > + }
> > +}
> > +
>
> [...]
>
> > +
> > +/*
> > + * When ->ubq_daemon is exiting, either new request is ended immediately,
> > + * or any queued io command is drained, so it is safe to abort queue
> > + * lockless
> > + */
> > +static void ublk_abort_queue(struct ublk_device *ub, struct ublk_queue *ubq)
> > +{
> > + int i;
> > +
> > + if (!ublk_get_device(ub))
> > + return;
> > +
> > + for (i = 0; i < ubq->q_depth; i++) {
> > + struct ublk_io *io = &ubq->ios[i];
> > +
> > + if (!(io->flags & UBLK_IO_FLAG_ACTIVE)) {
> > + struct request *rq;
> > +
> > + /*
> > + * Either we fail the request or ublk_rq_task_work_fn
> > + * will do it
> > + */
> > + rq = blk_mq_tag_to_rq(ub->tag_set.tags[ubq->q_id], i);
> > + if (rq)
> > + __ublk_fail_req(io, rq);
> > + }
> > + }
> > + ublk_put_device(ub);
> > +}
> > +
>
>
> Another problem:
>
> 1) comment of __ublk_fail_req(): "so lock is required"
Yeah, now __ublk_fail_req is only called in abort context, and no race
with task work any more, so lock isn't needed.
>
> 2) comment of ublk_abort_queue(): "so it is safe to abort queue lockless"
This comment is updated in v5, and it is correct.
>
> 3) ublk_abort_queue() calls _ublk_fail_req() on all ubqs.
No, ublk_abort_queue() only aborts the passed ubq, so if one ubq daemon
is aborted, other ubqs can still handle IO during deleting disk.
Thanks,
Ming
next prev parent reply other threads:[~2022-07-14 10:49 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-07-13 14:07 [PATCH V5 0/2] ublk: add io_uring based userspace block driver Ming Lei
2022-07-13 14:07 ` [PATCH V5 1/2] ublk_drv: " Ming Lei
2022-07-14 10:20 ` Ziyang Zhang
2022-07-14 10:48 ` Ming Lei [this message]
2022-07-14 13:23 ` Ziyang Zhang
2022-07-15 2:04 ` Ming Lei
2022-07-15 6:07 ` Ziyang Zhang
2022-07-13 14:07 ` [PATCH V5 2/2] ublk_drv: support to complete io command via task_work_add Ming Lei
2022-07-13 20:25 ` [PATCH V5 0/2] ublk: add io_uring based userspace block driver Jens Axboe
2022-07-14 0:19 ` Ming Lei
2022-07-14 2:54 ` Jens Axboe
2022-07-14 2:59 ` Jens Axboe
2022-07-14 5:30 ` Ming Lei
2022-07-19 10:15 ` Geert Uytterhoeven
2022-07-14 2:54 ` Jens Axboe
2022-07-14 14:41 ` Gabriel Krisman Bertazi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Ys/0jTxQCEHdI560@T590 \
--to=ming.lei@redhat.com \
--cc=ZiyangZhang@linux.alibaba.com \
--cc=axboe@kernel.dk \
--cc=io-uring@vger.kernel.org \
--cc=krisman@collabora.com \
--cc=linux-block@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=xiaoguang.wang@linux.alibaba.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.