From: "Michael S. Tsirkin" <mst@redhat.com>
To: Suwan Kim <suwan.kim027@gmail.com>
Cc: jasowang@redhat.com, stefanha@redhat.com, pbonzini@redhat.com,
mgurtovoy@nvidia.com, virtualization@lists.linux-foundation.org,
linux-block@vger.kernel.org, kernel test robot <lkp@intel.com>
Subject: Re: [PATCH v3 1/2] virtio-blk: support polling I/O
Date: Thu, 24 Mar 2022 10:32:02 -0400
Message-ID: <20220324103056-mutt-send-email-mst@kernel.org>
In-Reply-To: <20220324140450.33148-2-suwan.kim027@gmail.com>
On Thu, Mar 24, 2022 at 11:04:49PM +0900, Suwan Kim wrote:
> This patch supports polling I/O via virtio-blk driver. Polling
> feature is enabled by module parameter "num_poll_queues" and it
> sets dedicated polling queues for virtio-blk. This patch improves
> the polling I/O throughput and latency.
>
> The virtio-blk driver does not have a poll function or a poll
> queue, so it has been operating in interrupt-driven mode even when
> the polling function is called in the upper layer.
>
> virtio-blk polling is implemented upon 'batched completion' of block
> layer. virtblk_poll() queues completed request to io_comp_batch->req_list
> and later, virtblk_complete_batch() calls unmap function and ends
> the requests in batch.
>
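As an illustration of the two-step flow described above, here is a minimal
userspace sketch: a poll step that only collects finished requests into a
batch, and a later step that ends them all in one pass. It is not the driver
code. struct request, struct comp_batch, poll_collect() and complete_batch()
are hypothetical stand-ins, and setting a flag stands in for the unmap and
end-request work.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model of batched completion; not kernel code. */
struct request {
	int id;
	int completed;          /* set once the request has been ended */
	struct request *next;   /* link used while sitting in a batch */
};

struct comp_batch {
	struct request *head;   /* requests collected by the poll step */
};

/* Poll step: move every finished request from 'done' into the batch. */
static int poll_collect(struct request **done, size_t n, struct comp_batch *b)
{
	int found = 0;
	size_t i;

	assert(b != NULL);
	for (i = 0; i < n; i++) {
		if (!done[i])
			continue;
		done[i]->next = b->head;
		b->head = done[i];
		done[i] = NULL;
		found++;
	}
	return found;
}

/* Completion step: walk the batch once and end every request in it. */
static int complete_batch(struct comp_batch *b)
{
	int ended = 0;

	assert(b != NULL);
	while (b->head) {
		struct request *req = b->head;

		b->head = req->next;
		req->next = NULL;
		req->completed = 1;   /* stands in for unmap + end request */
		ended++;
	}
	return ended;
}
```

The point of the split is that the poll step stays short, while the per-request
cleanup is deferred and done for the whole batch at once.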
> virtio-blk reads the number of poll queues from module parameter
> "num_poll_queues". If VM sets queue parameter as below,
> ("num-queues=N" [QEMU property], "num_poll_queues=M" [module parameter])
> It allocates N virtqueues to virtio_blk->vqs[N] and it uses [0..(N-M-1)]
> as default queues and [(N-M)..(N-1)] as poll queues. Unlike the default
> queues, the poll queues have no callback function.
>
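To make the index ranges above concrete, here is a small userspace sketch of
the split (not driver code). struct vq_split, split_queues() and
is_poll_queue() are hypothetical names. The clamp to N - 1 mirrors the min_t()
call in init_vq() in this patch, so at least one default queue always remains.

```c
#include <assert.h>

/* Hypothetical userspace model of the virtqueue split; not driver code. */
struct vq_split {
	unsigned int nr_default;   /* queues [0 .. nr_default - 1] */
	unsigned int nr_poll;      /* queues [nr_default .. n - 1] */
};

/*
 * n: number of virtqueues (num-queues=N), m: requested num_poll_queues=M.
 * The poll count is clamped to n - 1 so one default queue always remains.
 */
static struct vq_split split_queues(unsigned int n, unsigned int m)
{
	struct vq_split s;

	assert(n >= 1);
	s.nr_poll = m < n - 1 ? m : n - 1;
	s.nr_default = n - s.nr_poll;
	return s;
}

/* Returns 1 if queue index i is a poll queue (no callback), else 0. */
static int is_poll_queue(struct vq_split s, unsigned int i)
{
	return i >= s.nr_default && i < s.nr_default + s.nr_poll;
}
```

With N=4 and M=2, split_queues(4, 2) yields queues 0-1 as default queues and
queues 2-3 as poll queues. A request for more poll queues than N - 1 is
silently reduced.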
> Regarding HW-SW queue mapping, the default queue mapping uses the
> existing method that considers the MSI irq vector. But the poll queue
> doesn't have an irq, so it uses the regular blk-mq cpu mapping.
>
> For verifying the improvement, I did Fio polling I/O performance test
> with io_uring engine with the options below.
> (io_uring, hipri, randread, direct=1, bs=512, iodepth=64 numjobs=N)
> I set 4 vcpu and 4 virtio-blk queues - 2 default queues and 2 poll
> queues for VM.
>
> As a result, IOPS and average latency improved about 10%.
>
> Test result:
>
> - Fio io_uring poll without virtio-blk poll support
> -- numjobs=1 : IOPS = 339K, avg latency = 188.33us
> -- numjobs=2 : IOPS = 367K, avg latency = 347.33us
> -- numjobs=4 : IOPS = 383K, avg latency = 682.06us
>
> - Fio io_uring poll with virtio-blk poll support
> -- numjobs=1 : IOPS = 380K, avg latency = 167.87us
> -- numjobs=2 : IOPS = 409K, avg latency = 312.6us
> -- numjobs=4 : IOPS = 413K, avg latency = 619.72us
>
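For reference, the "about 10%" figure can be checked against the numbers
above with two small helpers (hypothetical names, plain arithmetic). Worked
out per row, the IOPS gains are roughly 12.1%, 11.4% and 7.8% for numjobs=1,
2 and 4, and the average latency reductions are roughly 10.9%, 10.0% and 9.1%.

```c
#include <assert.h>

/* Relative gain in percent when a metric rises from 'before' to 'after'. */
static double pct_gain(double before, double after)
{
	assert(before > 0.0);
	return (after - before) / before * 100.0;
}

/* Relative reduction in percent when a metric drops from 'before' to 'after'. */
static double pct_drop(double before, double after)
{
	assert(before > 0.0);
	return (before - after) / before * 100.0;
}
```

For example, pct_gain(339.0, 380.0) covers the numjobs=1 IOPS row and
pct_drop(188.33, 167.87) the matching latency row.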
> Reported-by: kernel test robot <lkp@intel.com>
> Signed-off-by: Suwan Kim <suwan.kim027@gmail.com>
> ---
> drivers/block/virtio_blk.c | 101 +++++++++++++++++++++++++++++++++++--
> 1 file changed, 97 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index 8c415be86732..3d16f8b753e7 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -37,6 +37,10 @@ MODULE_PARM_DESC(num_request_queues,
>  		 "0 for no limit. "
>  		 "Values > nr_cpu_ids truncated to nr_cpu_ids.");
>
> +static unsigned int num_poll_queues;
> +module_param(num_poll_queues, uint, 0644);
> +MODULE_PARM_DESC(num_poll_queues, "The number of dedicated virtqueues for polling I/O");
> +
>  static int major;
>  static DEFINE_IDA(vd_index_ida);
>

Is there some way to make this work reasonably without needing to set
module parameters? I don't see any other devices with a num_poll_queues
parameter - how do they handle this?

> @@ -81,6 +85,7 @@ struct virtio_blk {
>
>  	/* num of vqs */
>  	int num_vqs;
> +	int io_queues[HCTX_MAX_TYPES];
>  	struct virtio_blk_vq *vqs;
>  };
>
> @@ -548,6 +553,7 @@ static int init_vq(struct virtio_blk *vblk)
>  	const char **names;
>  	struct virtqueue **vqs;
>  	unsigned short num_vqs;
> +	unsigned int num_poll_vqs;
>  	struct virtio_device *vdev = vblk->vdev;
>  	struct irq_affinity desc = { 0, };
>
> @@ -556,6 +562,7 @@ static int init_vq(struct virtio_blk *vblk)
>  				   &num_vqs);
>  	if (err)
>  		num_vqs = 1;
> +
>  	if (!err && !num_vqs) {
>  		dev_err(&vdev->dev, "MQ advertised but zero queues reported\n");
>  		return -EINVAL;
> @@ -565,6 +572,13 @@ static int init_vq(struct virtio_blk *vblk)
>  			min_not_zero(num_request_queues, nr_cpu_ids),
>  			num_vqs);
>
> +	num_poll_vqs = min_t(unsigned int, num_poll_queues, num_vqs - 1);
> +
> +	memset(vblk->io_queues, 0, sizeof(int) * HCTX_MAX_TYPES);
> +	vblk->io_queues[HCTX_TYPE_DEFAULT] = num_vqs - num_poll_vqs;
> +	vblk->io_queues[HCTX_TYPE_READ] = 0;
> +	vblk->io_queues[HCTX_TYPE_POLL] = num_poll_vqs;
> +
>  	vblk->vqs = kmalloc_array(num_vqs, sizeof(*vblk->vqs), GFP_KERNEL);
>  	if (!vblk->vqs)
>  		return -ENOMEM;
> @@ -578,8 +592,13 @@ static int init_vq(struct virtio_blk *vblk)
>  	}
>
>  	for (i = 0; i < num_vqs; i++) {
> -		callbacks[i] = virtblk_done;
> -		snprintf(vblk->vqs[i].name, VQ_NAME_LEN, "req.%d", i);
> +		if (i < num_vqs - num_poll_vqs) {
> +			callbacks[i] = virtblk_done;
> +			snprintf(vblk->vqs[i].name, VQ_NAME_LEN, "req.%d", i);
> +		} else {
> +			callbacks[i] = NULL;
> +			snprintf(vblk->vqs[i].name, VQ_NAME_LEN, "req_poll.%d", i);
> +		}
>  		names[i] = vblk->vqs[i].name;
>  	}
>
> @@ -728,16 +747,87 @@ static const struct attribute_group *virtblk_attr_groups[] = {
>  static int virtblk_map_queues(struct blk_mq_tag_set *set)
>  {
>  	struct virtio_blk *vblk = set->driver_data;
> +	int i, qoff;
> +
> +	for (i = 0, qoff = 0; i < set->nr_maps; i++) {
> +		struct blk_mq_queue_map *map = &set->map[i];
> +
> +		map->nr_queues = vblk->io_queues[i];
> +		map->queue_offset = qoff;
> +		qoff += map->nr_queues;
> +
> +		if (map->nr_queues == 0)
> +			continue;
> +
> +		/*
> +		 * Regular queues have interrupts and hence CPU affinity is
> +		 * defined by the core virtio code, but polling queues have
> +		 * no interrupts so we let the block layer assign CPU affinity.
> +		 */
> +		if (i == HCTX_TYPE_DEFAULT)
> +			blk_mq_virtio_map_queues(&set->map[i], vblk->vdev, 0);
> +		else
> +			blk_mq_map_queues(&set->map[i]);
> +	}
> +
> +	return 0;
> +}
> +
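The offset bookkeeping in the loop above can be modelled in userspace as
follows. This is only a sketch: struct queue_map and layout_maps() are
hypothetical stand-ins for the blk-mq structures, and the enum order
(default, read, poll) is assumed to match the HCTX_TYPE_* order used as
array indices in the patch.

```c
#include <assert.h>

/* Hypothetical stand-ins for the blk-mq map types, in the order the
 * patch indexes them (default, read, poll). */
enum { TYPE_DEFAULT, TYPE_READ, TYPE_POLL, MAX_TYPES };

struct queue_map {
	unsigned int nr_queues;
	unsigned int queue_offset;
};

/*
 * Fill 'maps' from per-type queue counts, mirroring the offset bookkeeping
 * in the quoted loop: each map starts where the previous one ended.
 * Returns the total number of queues covered.
 */
static unsigned int layout_maps(const unsigned int io_queues[MAX_TYPES],
				unsigned int nr_maps,
				struct queue_map maps[MAX_TYPES])
{
	unsigned int i, qoff = 0;

	assert(nr_maps <= MAX_TYPES);
	for (i = 0; i < nr_maps; i++) {
		maps[i].nr_queues = io_queues[i];
		maps[i].queue_offset = qoff;
		qoff += maps[i].nr_queues;
	}
	return qoff;
}
```

With counts { 2, 0, 2 } and three maps, the default map covers queues 0-1,
the read map is empty, and the poll map starts at offset 2 and covers
queues 2-3.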
> +static void virtblk_complete_batch(struct io_comp_batch *iob)
> +{
> +	struct request *req;
> +	struct virtblk_req *vbr;
>
> -	return blk_mq_virtio_map_queues(&set->map[HCTX_TYPE_DEFAULT],
> -					vblk->vdev, 0);
> +	rq_list_for_each(&iob->req_list, req) {
> +		vbr = blk_mq_rq_to_pdu(req);
> +		virtblk_unmap_data(req, vbr);
> +		virtblk_cleanup_cmd(req);
> +	}
> +	blk_mq_end_request_batch(iob);
> +}
> +
> +static int virtblk_poll(struct blk_mq_hw_ctx *hctx, struct io_comp_batch *iob)
> +{
> +	struct virtio_blk_vq *vq = hctx->driver_data;
> +	struct virtblk_req *vbr;
> +	unsigned long flags;
> +	unsigned int len;
> +	int found = 0;
> +
> +	spin_lock_irqsave(&vq->lock, flags);
> +
> +	while ((vbr = virtqueue_get_buf(vq->vq, &len)) != NULL) {
> +		struct request *req = blk_mq_rq_from_pdu(vbr);
> +
> +		found++;
> +		if (!blk_mq_add_to_batch(req, iob, vbr->status,
> +						virtblk_complete_batch))
> +			blk_mq_complete_request(req);
> +	}
> +
> +	spin_unlock_irqrestore(&vq->lock, flags);
> +
> +	return found;
> +}
> +
> +static int virtblk_init_hctx(struct blk_mq_hw_ctx *hctx, void *data,
> +			     unsigned int hctx_idx)
> +{
> +	struct virtio_blk *vblk = data;
> +	struct virtio_blk_vq *vq = &vblk->vqs[hctx_idx];
> +
> +	WARN_ON(vblk->tag_set.tags[hctx_idx] != hctx->tags);
> +	hctx->driver_data = vq;
> +	return 0;
>  }
>
>  static const struct blk_mq_ops virtio_mq_ops = {
>  	.queue_rq	= virtio_queue_rq,
>  	.commit_rqs	= virtio_commit_rqs,
> +	.init_hctx	= virtblk_init_hctx,
>  	.complete	= virtblk_request_done,
>  	.map_queues	= virtblk_map_queues,
> +	.poll		= virtblk_poll,
>  };
>
>  static unsigned int virtblk_queue_depth;
> @@ -816,6 +906,9 @@ static int virtblk_probe(struct virtio_device *vdev)
>  		sizeof(struct scatterlist) * VIRTIO_BLK_INLINE_SG_CNT;
>  	vblk->tag_set.driver_data = vblk;
>  	vblk->tag_set.nr_hw_queues = vblk->num_vqs;
> +	vblk->tag_set.nr_maps = 1;
> +	if (vblk->io_queues[HCTX_TYPE_POLL])
> +		vblk->tag_set.nr_maps = 3;
>
>  	err = blk_mq_alloc_tag_set(&vblk->tag_set);
>  	if (err)
> --
> 2.26.3