linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Ming Lei <ming.lei@redhat.com>
To: Caleb Sander Mateos <csander@purestorage.com>
Cc: Jens Axboe <axboe@kernel.dk>,
	linux-block@vger.kernel.org, Keith Busch <kbusch@kernel.org>,
	Uday Shankar <ushankar@purestorage.com>
Subject: Re: [PATCH 6/8] ublk: implement ->queue_rqs()
Date: Thu, 27 Mar 2025 17:15:02 +0800	[thread overview]
Message-ID: <Z-UXFu6q1apYYUBb@fedora> (raw)
In-Reply-To: <CADUfDZoBTkNypLXsVFMsZLSJGw3i7VxnQ=POodmYm=E4HPU=SA@mail.gmail.com>

On Wed, Mar 26, 2025 at 02:30:56PM -0700, Caleb Sander Mateos wrote:
> On Mon, Mar 24, 2025 at 6:49 AM Ming Lei <ming.lei@redhat.com> wrote:
> >
> > Implement ->queue_rqs() for improving perf in case of MQ.
> >
> > In this way, we just need to call io_uring_cmd_complete_in_task() once for
> > one batch, then both io_uring and ublk server can get exact batch from
> > client side.
> >
> > Follows IOPS improvement:
> >
> > - tests
> >
> >         tools/testing/selftests/ublk/kublk add -t null -q 2 [-z]
> >
> >         fio/t/io_uring -p0 /dev/ublkb0
> >
> > - results:
> >
> >         more than 10% IOPS boost observed
> >
> > Pass all ublk selftests, especially the io dispatch order test.
> >
> > Cc: Uday Shankar <ushankar@purestorage.com>
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> >  drivers/block/ublk_drv.c | 85 ++++++++++++++++++++++++++++++++++++----
> >  1 file changed, 77 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> > index 53a463681a41..86621fde7fde 100644
> > --- a/drivers/block/ublk_drv.c
> > +++ b/drivers/block/ublk_drv.c
> > @@ -83,6 +83,7 @@ struct ublk_rq_data {
> >  struct ublk_uring_cmd_pdu {
> >         struct ublk_queue *ubq;
> >         u16 tag;
> > +       struct rq_list list;
> >  };
> >
> >  /*
> > @@ -1258,6 +1259,32 @@ static void ublk_queue_cmd(struct ublk_queue *ubq, struct request *rq)
> >         io_uring_cmd_complete_in_task(io->cmd, ublk_rq_task_work_cb);
> >  }
> >
> > +static void ublk_cmd_list_tw_cb(struct io_uring_cmd *cmd,
> > +               unsigned int issue_flags)
> > +{
> > +       struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_cmd_pdu(cmd);
> > +       struct ublk_queue *ubq = pdu->ubq;
> > +       struct request *rq;
> > +
> > +       while ((rq = rq_list_pop(&pdu->list))) {
> > +               struct ublk_io *io = &ubq->ios[rq->tag];
> > +
> > +               ublk_rq_task_work_cb(io->cmd, issue_flags);
> 
> ublk_rq_task_work_cb() is duplicating the lookup of ubq, rq, and io.
> Could you factor out a helper that takes those values instead of cmd?
> 
> > +       }
> > +}
> > +
> > +static void ublk_queue_cmd_list(struct ublk_queue *ubq, struct rq_list *l)
> > +{
> > +       struct request *rq = l->head;
> > +       struct ublk_io *io = &ubq->ios[rq->tag];
> > +       struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_cmd_pdu(io->cmd);
> > +
> > +       pdu->ubq = ubq;
> 
> Why does pdu->ubq need to be set here but not in ublk_queue_cmd()? I
> would have thought it would already be set to ubq because pdu comes
> from a rq belonging to this ubq.
> 
> > +       pdu->list = *l;
> > +       rq_list_init(l);
> > +       io_uring_cmd_complete_in_task(io->cmd, ublk_cmd_list_tw_cb);
> 
> Could store io->cmd in a variable to avoid looking it up twice.
> 
> > +}
> > +
> >  static enum blk_eh_timer_return ublk_timeout(struct request *rq)
> >  {
> >         struct ublk_queue *ubq = rq->mq_hctx->driver_data;
> > @@ -1296,16 +1323,13 @@ static enum blk_eh_timer_return ublk_timeout(struct request *rq)
> >         return BLK_EH_RESET_TIMER;
> >  }
> >
> > -static blk_status_t ublk_queue_rq(struct blk_mq_hw_ctx *hctx,
> > -               const struct blk_mq_queue_data *bd)
> > +static blk_status_t ublk_prep_rq_batch(struct request *rq)
> 
> naming nit: why "batch"?

All were handled in my local version.

> 
> >  {
> > -       struct ublk_queue *ubq = hctx->driver_data;
> > -       struct request *rq = bd->rq;
> > +       struct ublk_queue *ubq = rq->mq_hctx->driver_data;
> >         blk_status_t res;
> >
> > -       if (unlikely(ubq->fail_io)) {
> > +       if (unlikely(ubq->fail_io))
> >                 return BLK_STS_TARGET;
> > -       }
> >
> >         /* fill iod to slot in io cmd buffer */
> >         res = ublk_setup_iod(ubq, rq);
> > @@ -1324,17 +1348,58 @@ static blk_status_t ublk_queue_rq(struct blk_mq_hw_ctx *hctx,
> >         if (ublk_nosrv_should_queue_io(ubq) && unlikely(ubq->force_abort))
> >                 return BLK_STS_IOERR;
> >
> > +       if (unlikely(ubq->canceling))
> > +               return BLK_STS_IOERR;
> 
> Why is ubq->cancelling treated differently for ->queue_rq() vs. ->queue_rqs()?

It is same, just ublk_queue_rqs() becomes simpler by letting ->queue_rqs()
to handle ubq->canceling.

Here it is really something which need to comment for ubq->canceling
handling, which has to be done after ->fail_io/->force_abort is dealt
with, otherwise IO hang is caused when removing disk.

That is one bug introduced in this patch.

> 
> > +
> > +       blk_mq_start_request(rq);
> > +       return BLK_STS_OK;
> > +}
> > +
> > +static blk_status_t ublk_queue_rq(struct blk_mq_hw_ctx *hctx,
> > +               const struct blk_mq_queue_data *bd)
> > +{
> > +       struct ublk_queue *ubq = hctx->driver_data;
> > +       struct request *rq = bd->rq;
> > +       blk_status_t res;
> > +
> >         if (unlikely(ubq->canceling)) {
> >                 __ublk_abort_rq(ubq, rq);
> >                 return BLK_STS_OK;
> >         }
> >
> > -       blk_mq_start_request(bd->rq);
> > -       ublk_queue_cmd(ubq, rq);
> > +       res = ublk_prep_rq_batch(rq);
> > +       if (res != BLK_STS_OK)
> > +               return res;
> >
> > +       ublk_queue_cmd(ubq, rq);
> >         return BLK_STS_OK;
> >  }
> >
> > +static void ublk_queue_rqs(struct rq_list *rqlist)
> > +{
> > +       struct rq_list requeue_list = { };
> > +       struct rq_list submit_list = { };
> > +       struct ublk_queue *ubq = NULL;
> > +       struct request *req;
> > +
> > +       while ((req = rq_list_pop(rqlist))) {
> > +               struct ublk_queue *this_q = req->mq_hctx->driver_data;
> > +
> > +               if (ubq && ubq != this_q && !rq_list_empty(&submit_list))
> > +                       ublk_queue_cmd_list(ubq, &submit_list);
> > +               ubq = this_q;
> 
> Probably could avoid the extra ->driver_data dereference on every rq
> by comparing the mq_hctx pointers instead. The ->driver_data
> dereference could be moved to the ublk_queue_cmd_list() calls.

Yes.

> 
> > +
> > +               if (ublk_prep_rq_batch(req) == BLK_STS_OK)
> > +                       rq_list_add_tail(&submit_list, req);
> > +               else
> > +                       rq_list_add_tail(&requeue_list, req);
> > +       }
> > +
> > +       if (ubq && !rq_list_empty(&submit_list))
> > +               ublk_queue_cmd_list(ubq, &submit_list);
> > +       *rqlist = requeue_list;
> > +}
> > +
> >  static int ublk_init_hctx(struct blk_mq_hw_ctx *hctx, void *driver_data,
> >                 unsigned int hctx_idx)
> >  {
> > @@ -1347,6 +1412,7 @@ static int ublk_init_hctx(struct blk_mq_hw_ctx *hctx, void *driver_data,
> >
> >  static const struct blk_mq_ops ublk_mq_ops = {
> >         .queue_rq       = ublk_queue_rq,
> > +       .queue_rqs      = ublk_queue_rqs,
> >         .init_hctx      = ublk_init_hctx,
> >         .timeout        = ublk_timeout,
> >  };
> > @@ -3147,6 +3213,9 @@ static int __init ublk_init(void)
> >         BUILD_BUG_ON((u64)UBLKSRV_IO_BUF_OFFSET +
> >                         UBLKSRV_IO_BUF_TOTAL_SIZE < UBLKSRV_IO_BUF_OFFSET);
> >
> > +       BUILD_BUG_ON(sizeof(struct ublk_uring_cmd_pdu) >
> > +                       sizeof_field(struct io_uring_cmd, pdu));
> 
> Looks like Uday also suggested this, but if you change
> ublk_get_uring_cmd_pdu() to use io_uring_cmd_to_pdu(), you get this
> check for free.

I have followed Uday's suggestion to patch ublk_get_uring_cmd_pdu(),
and will send out v2 soon.

Thanks,
Ming


  reply	other threads:[~2025-03-27  9:15 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-24 13:48 [PATCH 0/8] ublk: cleanup & improvement & zc follow-up Ming Lei
2025-03-24 13:48 ` [PATCH 1/8] ublk: remove two unused fields from 'struct ublk_queue' Ming Lei
2025-03-24 14:32   ` Caleb Sander Mateos
2025-03-24 13:48 ` [PATCH 2/8] ublk: add helper of ublk_need_map_io() Ming Lei
2025-03-24 15:01   ` Caleb Sander Mateos
2025-03-24 13:48 ` [PATCH 3/8] ublk: truncate io command result Ming Lei
2025-03-24 15:51   ` Caleb Sander Mateos
2025-03-25  0:50     ` Ming Lei
2025-03-24 13:48 ` [PATCH 4/8] ublk: add segment parameter Ming Lei
2025-03-24 22:26   ` Caleb Sander Mateos
2025-03-25  1:15     ` Ming Lei
2025-03-25 19:43       ` Caleb Sander Mateos
2025-03-26  2:17         ` Ming Lei
2025-03-26 16:43           ` Caleb Sander Mateos
2025-03-27  1:49             ` Ming Lei
2025-03-24 13:49 ` [PATCH 5/8] ublk: document zero copy feature Ming Lei
2025-03-25 15:26   ` Caleb Sander Mateos
2025-03-26  2:29     ` Ming Lei
2025-03-26 16:48       ` Caleb Sander Mateos
2025-03-24 13:49 ` [PATCH 6/8] ublk: implement ->queue_rqs() Ming Lei
2025-03-24 17:07   ` Uday Shankar
2025-03-25  1:02     ` Ming Lei
2025-03-26 21:30   ` Caleb Sander Mateos
2025-03-27  9:15     ` Ming Lei [this message]
2025-03-24 13:49 ` [PATCH 7/8] selftests: ublk: add more tests for covering MQ Ming Lei
2025-03-24 13:49 ` [PATCH 8/8] selftests: ublk: add test for checking zero copy related parameter Ming Lei

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z-UXFu6q1apYYUBb@fedora \
    --to=ming.lei@redhat.com \
    --cc=axboe@kernel.dk \
    --cc=csander@purestorage.com \
    --cc=kbusch@kernel.org \
    --cc=linux-block@vger.kernel.org \
    --cc=ushankar@purestorage.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).