From: Ming Lei <ming.lei@redhat.com>
To: Ziyang Zhang <ZiyangZhang@linux.alibaba.com>
Cc: Gabriel Krisman Bertazi <krisman@collabora.com>,
Jens Axboe <axboe@kernel.dk>,
linux-block@vger.kernel.org,
Harris James R <james.r.harris@intel.com>,
linux-kernel@vger.kernel.org, io-uring@vger.kernel.org,
Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>,
Stefan Hajnoczi <stefanha@redhat.com>,
ming.lei@redhat.com
Subject: Re: [PATCH V3 1/1] ublk: add io_uring based userspace block driver
Date: Tue, 5 Jul 2022 16:12:19 +0800 [thread overview]
Message-ID: <YsPyY0qiERHeg/XK@T590> (raw)
In-Reply-To: <ebd6754e-57bf-88a7-df04-3f38864b0c52@linux.alibaba.com>
On Tue, Jul 05, 2022 at 12:16:07PM +0800, Ziyang Zhang wrote:
> On 2022/7/5 06:10, Gabriel Krisman Bertazi wrote:
> > Ming Lei <ming.lei@redhat.com> writes:
> >
> >> This is the driver part of userspace block driver(ublk driver), the other
> >> part is userspace daemon part(ublksrv)[1].
> >>
> >> The two parts communicate by io_uring's IORING_OP_URING_CMD with one
> >> shared cmd buffer for storing io command, and the buffer is read only for
> >> ublksrv, each io command is indexed by io request tag directly, and
> >> is written by ublk driver.
> >>
> >> For example, when one READ io request is submitted to ublk block driver, ublk
> >> driver stores the io command into cmd buffer first, then completes one
> >> IORING_OP_URING_CMD for notifying ublksrv, and the URING_CMD is issued to
> >> ublk driver beforehand by ublksrv for getting notification of any new io request,
> >> and each URING_CMD is associated with one io request by tag.
> >>
> >> After ublksrv gets the io command, it translates and handles the ublk io
> >> request, such as, for the ublk-loop target, ublksrv translates the request
> >> into same request on another file or disk, like the kernel loop block
> >> driver. In ublksrv's implementation, the io is still handled by io_uring,
> >> and share same ring with IORING_OP_URING_CMD command. When the target io
> >> request is done, the same IORING_OP_URING_CMD is issued to ublk driver for
> >> both committing io request result and getting future notification of new
> >> io request.
> >>
> >> Another thing done by ublk driver is to copy data between kernel io
> >> request and ublksrv's io buffer:
> >>
> >> 1) before ubsrv handles WRITE request, copy the request's data into
> >> ublksrv's userspace io buffer, so that ublksrv can handle the write
> >> request
> >>
> >> 2) after ubsrv handles READ request, copy ublksrv's userspace io buffer
> >> into this READ request, then ublk driver can complete the READ request
> >>
> >> Zero copy may be switched if mm is ready to support it.
> >>
> >> ublk driver doesn't handle any logic of the specific user space driver,
> >> so it should be small/simple enough.
> >>
> >> [1] ublksrv
> >>
> >> https://github.com/ming1/ubdsrv
> >>
> >> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> >
> > Hi Ming,
> >
> > A few comments inline:
> >
> >
> >> +#define UBLK_MINORS (1U << MINORBITS)
> >> +
> >> +struct ublk_rq_data {
> >> + struct callback_head work;
> >> +};
> >> +
> >> +/* io cmd is active: sqe cmd is received, and its cqe isn't done */
> >> +#define UBLK_IO_FLAG_ACTIVE 0x01
> >> +
> >> +/*
> >> + * FETCH io cmd is completed via cqe, and the io cmd is being handled by
> >> + * ublksrv, and not committed yet
> >> + */
> >> +#define UBLK_IO_FLAG_OWNED_BY_SRV 0x02
> >> +
> >
> > Minor nit: I wonder if the IO life cycle isn't better represented as a
> > state machine than flags:
> >
> > enum {
> > UBLK_IO_FREE,
> > UBLK_IO_QUEUED
> > UBLK_IO_OWNED_BY_SRV
> > UBLK_IO_COMPLETED,
> > UBLK_IO_ABORTED,
> > }
> >
> > Since currently, IO_FLAG_ACTIVE and IO_OWNED_BY_SRV should (almost) be
> > mutually exclusive.
> >
> >
> >> +
> >> +static int ublk_ctrl_stop_dev(struct ublk_device *ub)
> >> +{
> >> + ublk_stop_dev(ub);
> >> + cancel_work_sync(&ub->stop_work);
> >> + return 0;
> >> +}
> >> +
> >> +static inline bool ublk_queue_ready(struct ublk_queue *ubq)
> >> +{
> >> + return ubq->nr_io_ready == ubq->q_depth;
> >> +}
> >> +
> >> +/* device can only be started after all IOs are ready */
> >> +static void ublk_mark_io_ready(struct ublk_device *ub, struct ublk_queue *ubq)
> >> +{
> >> + mutex_lock(&ub->mutex);
> >> + ubq->nr_io_ready++;
> >
> > I think this is still problematic for the case where a FETCH_IO is sent
> > from a different thread than the one originally set in ubq_daemon
> > (i.e. a userspace bug). Since ubq_daemon is used to decide what task
> > context will do the data copy, If an IO_FETCH_RQ is sent to the same queue
> > from two threads, the data copy can happen in the context of the wrong
> > task. I'd suggest something like the check below at the beginning of
> > mark_io_ready and a similar on for IO_COMMIT_AND_FETCH_RQ
> >
> > mutex_lock(&ub->mutex);
> > if (ub->ubq_daemon && ub->ubq_daemon != current) {
> > mutex_unlock(&ub->mutex);
> > return -EINVAL;
> > }
> > ubq->nr_io_ready++;
> > ...
> >> + if (ublk_queue_ready(ubq)) {
> >> + ubq->ubq_daemon = current;
> >> + get_task_struct(ubq->ubq_daemon);
> >> + ub->nr_queues_ready++;
> >> + }
> >> + if (ub->nr_queues_ready == ub->dev_info.nr_hw_queues)
> >> + complete_all(&ub->completion);
> >> + mutex_unlock(&ub->mutex);
> >> +}
> >> +
> >> +static int ublk_ch_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags)
> >> +{
> >> + struct ublksrv_io_cmd *ub_cmd = (struct ublksrv_io_cmd *)cmd->cmd;
> >> + struct ublk_device *ub = cmd->file->private_data;
> >> + struct ublk_queue *ubq;
> >> + struct ublk_io *io;
> >> + u32 cmd_op = cmd->cmd_op;
> >> + unsigned tag = ub_cmd->tag;
> >> + int ret = -EINVAL;
> >> +
> >> + pr_devel("%s: receieved: cmd op %d queue %d tag %d result %d\n",
> >
> > ^^^
> > received
> >
> >
> >> + __func__, cmd->cmd_op, ub_cmd->q_id, tag,
> >> + ub_cmd->result);
> >> +
> >> + if (!(issue_flags & IO_URING_F_SQE128))
> >> + goto out;
> >> +
> >> + ubq = ublk_get_queue(ub, ub_cmd->q_id);
> >> + if (!ubq || ub_cmd->q_id != ubq->q_id)
> >
> > q_id is coming from userspace and is used to access an array inside
> > ublk_get_queue(). I think you need to ensure qid < ub->dev_info.nr_hw_queues
> > before calling ublk_get_queue() to protect from a kernel bad memory
> > access triggered by userspace.
> >
> >> + goto out;
> >> +
> >> + if (WARN_ON_ONCE(tag >= ubq->q_depth))
> >
> > Userspace shouldn't be able to easily trigger a WARN_ON.
> >
> >> + goto out;
> >> +
> >> + io = &ubq->ios[tag];
> >> +
> >> + /* there is pending io cmd, something must be wrong */
> >> + if (io->flags & UBLK_IO_FLAG_ACTIVE) {b
> >> + ret = -EBUSY;
> >> + goto out;
> >> + }
> >> +
> >> + switch (cmd_op) {
> >> + case UBLK_IO_FETCH_REQ:
> >> + /* UBLK_IO_FETCH_REQ is only allowed before queue is setup */
> >> + if (WARN_ON_ONCE(ublk_queue_ready(ubq))) {
> >
> > Likewise, this shouldn't trigger a WARN_ON, IMO.
> >
> >> + ret = -EBUSY;
> >> + goto out;
> >> + }
> >> + /*
> >> + * The io is being handled by server, so COMMIT_RQ is expected
> >> + * instead of FETCH_REQ
> >> + */
> >> + if (io->flags & UBLK_IO_FLAG_OWNED_BY_SRV)
> >> + goto out;
> >> + /* FETCH_RQ has to provide IO buffer */
> >> + if (!ub_cmd->addr)
> >> + goto out;
> >> + io->cmd = cmd;
> >> + io->flags |= UBLK_IO_FLAG_ACTIVE;
> >> + io->addr = ub_cmd->addr;
> >> +
> >> + ublk_mark_io_ready(ub, ubq);
> >> + break;
> >> + case UBLK_IO_COMMIT_AND_FETCH_REQ:
> >> + /* FETCH_RQ has to provide IO buffer */
> >> + if (!ub_cmd->addr)
> >> + goto out;
> >> + io->addr = ub_cmd->addr;
> >> + io->flags |= UBLK_IO_FLAG_ACTIVE;
> >> + fallthrough;
> >> + case UBLK_IO_COMMIT_REQ:
> >> + io->cmd = cmd;
> >> + if (!(io->flags & UBLK_IO_FLAG_OWNED_BY_SRV))
> >> + goto out;
> >> + ublk_commit_completion(ub, ub_cmd);
> >> +
> >> + /* COMMIT_REQ is supposed to not fetch req */
> >
> > I wonder if we could make it without IO_COMMIT_REQ. Is it useful to be
> > able to commit without fetching a new request?
>
> UBLK_IO_COMMIT_REQ is not necessary, IMO.
> In current version of ubd_drv.c I find UBLK_IO_COMMIT_REQ is sent by ublksrv
> after it gets one UBD_IO_RES_ABORT beacuse ubd_drv wants to abort IOs and let
> the ublk daemon exit.
>
> We can use UBLK_IO_COMMIT_AND_FETCH_REQ to replace UBLK_IO_COMMIT_REQ.
> The data flow could be:
>
> 1) UBLK_IO_COMMIT_AND_FETCH_REQ from ublksrv
>
> 2) ubd_drv receives IO's sqe with UBLK_IO_COMMIT_AND_FETCH_REQ
> and sets the IO's status to UBLK_IO_QUEUED
Not see where UBLK_IO_QUEUED is? :-)
>
> 3) ubd_drv wants to abort IOs so it just completes
> this IO's cqe(UBD_IO_RES_ABORT)
>
> I successfully removed UBLK_IO_COMMIT_REQ when developing libubd
> although I choose the earliest version of ubd_drv.c(v5.17-ubd-dev)
> which may be a buggy version.
You can verify it with latest ublksrv(libublksrv has been in master)
too by the following patch, then 'make test T=generic' can run
successfully.
diff --git a/lib/ublksrv.c b/lib/ublksrv.c
index c4bb2f4..aee71a0 100644
--- a/lib/ublksrv.c
+++ b/lib/ublksrv.c
@@ -150,7 +150,7 @@ static inline int ublksrv_queue_io_cmd(struct ublksrv_queue *q,
else
cmd_op = UBLK_IO_FETCH_REQ;
} else if (io->flags & UBLKSRV_NEED_COMMIT_RQ_COMP) {
- cmd_op = UBLK_IO_COMMIT_REQ;
+ cmd_op = UBLK_IO_COMMIT_AND_FETCH_REQ;
} else {
syslog(LOG_ERR, "io flags is zero, tag %d\n",
(int)cmd->tag);
ublk_cancel_queue() will cancel all these io commands.
Thanks,
Ming
next prev parent reply other threads:[~2022-07-05 8:12 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-06-28 16:08 [PATCH V3 0/1] ublk: add io_uring based userspace block driver Ming Lei
2022-06-28 16:08 ` [PATCH V3 1/1] " Ming Lei
2022-06-30 11:35 ` Ziyang Zhang
2022-06-30 12:33 ` Ming Lei
2022-07-01 2:47 ` Ziyang Zhang
2022-07-01 4:06 ` Ming Lei
2022-07-04 11:17 ` Sagi Grimberg
2022-07-04 12:34 ` Ming Lei
2022-07-04 14:00 ` Sagi Grimberg
2022-07-04 16:13 ` Gabriel Krisman Bertazi
2022-07-04 16:19 ` Sagi Grimberg
2022-07-05 0:43 ` Ming Lei
2022-07-04 22:10 ` Gabriel Krisman Bertazi
2022-07-05 4:16 ` Ziyang Zhang
2022-07-05 8:12 ` Ming Lei [this message]
2022-07-05 8:06 ` Ming Lei
2022-07-07 7:49 ` Ming Lei
2022-07-07 7:58 ` Ming Lei
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YsPyY0qiERHeg/XK@T590 \
--to=ming.lei@redhat.com \
--cc=ZiyangZhang@linux.alibaba.com \
--cc=axboe@kernel.dk \
--cc=io-uring@vger.kernel.org \
--cc=james.r.harris@intel.com \
--cc=krisman@collabora.com \
--cc=linux-block@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=stefanha@redhat.com \
--cc=xiaoguang.wang@linux.alibaba.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.