From: Christoph Hellwig <hch@infradead.org>
To: Suwan Kim <suwan.kim027@gmail.com>
Cc: mst@redhat.com, jasowang@redhat.com, stefanha@redhat.com,
pbonzini@redhat.com, mgurtovoy@nvidia.com,
dongli.zhang@oracle.com,
virtualization@lists.linux-foundation.org,
linux-block@vger.kernel.org
Subject: Re: [PATCH v4 1/2] virtio-blk: support polling I/O
Date: Tue, 5 Apr 2022 01:51:40 -0700
Message-ID: <YkwDHGutRLN51hbd@infradead.org>
In-Reply-To: <20220405053122.77626-2-suwan.kim027@gmail.com>
On Tue, Apr 05, 2022 at 02:31:21PM +0900, Suwan Kim wrote:
> This patch supports polling I/O via virtio-blk driver. Polling
> feature is enabled by module parameter "num_poll_queues" and it
> sets dedicated polling queues for virtio-blk. This patch improves
> the polling I/O throughput and latency.
>
> The virtio-blk driver does not have a poll function or a poll
> queue, so it has been operating in interrupt-driven mode even when
> the polling function is called in the upper layer.
>
> virtio-blk polling is implemented upon 'batched completion' of block
> layer. virtblk_poll() queues completed request to io_comp_batch->req_list
> and later, virtblk_complete_batch() calls unmap function and ends
> the requests in batch.
>
> virtio-blk reads the number of poll queues from module parameter
> "num_poll_queues". If VM sets queue parameter as below,
> ("num-queues=N" [QEMU property], "num_poll_queues=M" [module parameter])
> It allocates N virtqueues to virtio_blk->vqs[N] and it uses [0..(N-M-1)]
> as default queues and [(N-M)..(N-1)] as poll queues. Unlike the default
> queues, the poll queues have no callback function.
>
> Regarding HW-SW queue mapping, the default queue mapping uses the
> existing method that considers the MSI irq vector. But the poll queue
> doesn't have an irq, so it uses the regular blk-mq cpu mapping.
>
> For verifying the improvement, I did Fio polling I/O performance test
> with io_uring engine with the options below.
> (io_uring, hipri, randread, direct=1, bs=512, iodepth=64 numjobs=N)
> I set 4 vcpu and 4 virtio-blk queues - 2 default queues and 2 poll
> queues for VM.
>
> As a result, IOPS and average latency improved about 10%.
>
> Test result:
>
> - Fio io_uring poll without virtio-blk poll support
> -- numjobs=1 : IOPS = 339K, avg latency = 188.33us
> -- numjobs=2 : IOPS = 367K, avg latency = 347.33us
> -- numjobs=4 : IOPS = 383K, avg latency = 682.06us
>
> - Fio io_uring poll with virtio-blk poll support
> -- numjobs=1 : IOPS = 385K, avg latency = 165.94us
> -- numjobs=2 : IOPS = 408K, avg latency = 313.28us
> -- numjobs=4 : IOPS = 424K, avg latency = 613.05us
>
> Signed-off-by: Suwan Kim <suwan.kim027@gmail.com>
> ---
> drivers/block/virtio_blk.c | 112 +++++++++++++++++++++++++++++++++++--
> 1 file changed, 108 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index 8c415be86732..712579dcd3cc 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -37,6 +37,10 @@ MODULE_PARM_DESC(num_request_queues,
> "0 for no limit. "
> "Values > nr_cpu_ids truncated to nr_cpu_ids.");
>
> +static unsigned int poll_queues;
> +module_param(poll_queues, uint, 0644);
> +MODULE_PARM_DESC(poll_queues, "The number of dedicated virtqueues for polling I/O");
> +
> static int major;
> static DEFINE_IDA(vd_index_ida);
>
> @@ -81,6 +85,7 @@ struct virtio_blk {
>
> /* num of vqs */
> int num_vqs;
> + int io_queues[HCTX_MAX_TYPES];
> struct virtio_blk_vq *vqs;
> };
>
> @@ -548,6 +553,7 @@ static int init_vq(struct virtio_blk *vblk)
> const char **names;
> struct virtqueue **vqs;
> unsigned short num_vqs;
> + unsigned int num_poll_vqs;
> struct virtio_device *vdev = vblk->vdev;
> struct irq_affinity desc = { 0, };
>
> @@ -556,6 +562,7 @@ static int init_vq(struct virtio_blk *vblk)
> &num_vqs);
> if (err)
> num_vqs = 1;
> +
> if (!err && !num_vqs) {
> dev_err(&vdev->dev, "MQ advertised but zero queues reported\n");
> return -EINVAL;
> @@ -565,6 +572,18 @@ static int init_vq(struct virtio_blk *vblk)
> min_not_zero(num_request_queues, nr_cpu_ids),
> num_vqs);
>
> + num_poll_vqs = min_t(unsigned int, poll_queues, num_vqs - 1);
> +
> + memset(vblk->io_queues, 0, sizeof(int) * HCTX_MAX_TYPES);
> + vblk->io_queues[HCTX_TYPE_DEFAULT] = num_vqs - num_poll_vqs;
> + vblk->io_queues[HCTX_TYPE_READ] = 0;
> + vblk->io_queues[HCTX_TYPE_POLL] = num_poll_vqs;
> +
> + dev_info(&vdev->dev, "%d/%d/%d default/read/poll queues\n",
> + vblk->io_queues[HCTX_TYPE_DEFAULT],
> + vblk->io_queues[HCTX_TYPE_READ],
> + vblk->io_queues[HCTX_TYPE_POLL]);
> +
> vblk->vqs = kmalloc_array(num_vqs, sizeof(*vblk->vqs), GFP_KERNEL);
> if (!vblk->vqs)
> return -ENOMEM;
> @@ -578,8 +597,13 @@ static int init_vq(struct virtio_blk *vblk)
> }
>
> for (i = 0; i < num_vqs; i++) {
> + if (i < num_vqs - num_poll_vqs) {
> + callbacks[i] = virtblk_done;
> + snprintf(vblk->vqs[i].name, VQ_NAME_LEN, "req.%d", i);
> + } else {
> + callbacks[i] = NULL;
> + snprintf(vblk->vqs[i].name, VQ_NAME_LEN, "req_poll.%d", i);
> + }
> names[i] = vblk->vqs[i].name;
This would look a little cleaner with two loops:
	for (i = 0; i < num_vqs - num_poll_vqs; i++) {
		callbacks[i] = virtblk_done;
		snprintf(vblk->vqs[i].name, VQ_NAME_LEN, "req.%d", i);
		names[i] = vblk->vqs[i].name;
	}

	for (; i < num_vqs; i++) {
		callbacks[i] = NULL;
		snprintf(vblk->vqs[i].name, VQ_NAME_LEN, "req_poll.%d", i);
		names[i] = vblk->vqs[i].name;
	}
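To make the split easy to try outside the kernel, here is a minimal
user-space sketch of the same two-loop setup. The names struct vq_slot,
virtblk_done_stub and setup_vq_names are made up for illustration and
are not part of the driver; the callback type is only a stand-in for
the virtio one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define VQ_NAME_LEN 16

/* Hypothetical stand-in for the virtio callback type. */
typedef void (*vq_callback_t)(void);

/* Hypothetical stand-in for the per-queue state that holds the name. */
struct vq_slot {
	char name[VQ_NAME_LEN];
};

/* Placeholder for the interrupt completion handler. */
static void virtblk_done_stub(void)
{
}

/*
 * The first (num_vqs - num_poll_vqs) queues get a callback and a
 * "req.N" name; the remaining ones are poll queues with no callback
 * and a "req_poll.N" name.
 */
static void setup_vq_names(struct vq_slot *vqs, vq_callback_t *callbacks,
			   const char **names, unsigned int num_vqs,
			   unsigned int num_poll_vqs)
{
	unsigned int i;

	/* At least one interrupt-driven queue is assumed to remain. */
	assert(num_poll_vqs < num_vqs);

	for (i = 0; i < num_vqs - num_poll_vqs; i++) {
		callbacks[i] = virtblk_done_stub;
		snprintf(vqs[i].name, VQ_NAME_LEN, "req.%u", i);
		names[i] = vqs[i].name;
	}

	for (; i < num_vqs; i++) {
		callbacks[i] = NULL;
		snprintf(vqs[i].name, VQ_NAME_LEN, "req_poll.%u", i);
		names[i] = vqs[i].name;
	}
}
```

The second loop deliberately continues from the index where the first
one stopped, so the boundary is computed in exactly one place.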
> +
> + if (map->nr_queues == 0)
> + continue;
> +
> + /*
> + * Regular queues have interrupts and hence CPU affinity is
> + * defined by the core virtio code, but polling queues have
> + * no interrupts so we let the block layer assign CPU affinity.
> + */
> + if (i == HCTX_TYPE_DEFAULT)
I'd check for
	i != HCTX_TYPE_POLL
here instead to make the check a little more explicit and future-proof
for the potential addition of read queues (which would be a Linux-only
change without hypervisor or spec changes). In fact you might as well
add that support now, as doing it is completely trivial once a driver
supports multiple map types.
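As a rough illustration of why the != check scales to read queues,
here is a user-space model of the per-type loop. It does not call any
block layer code: struct queue_map_model, enum map_method and
model_map_queues are hypothetical names, and the offset accounting is
my assumption about how the queue ranges would be laid out.

```c
#include <assert.h>

/* Mirrors the blk-mq hctx type ordering. */
enum hctx_type {
	HCTX_TYPE_DEFAULT,
	HCTX_TYPE_READ,
	HCTX_TYPE_POLL,
	HCTX_MAX_TYPES,
};

/* Hypothetical tag for which mapping strategy a queue type would use. */
enum map_method {
	MAP_NONE,		/* type has no queues */
	MAP_IRQ_AFFINITY,	/* follow the interrupt affinity */
	MAP_BLK_MQ_DEFAULT,	/* let the block layer spread the CPUs */
};

struct queue_map_model {
	unsigned int nr_queues;
	unsigned int queue_offset;
	enum map_method method;
};

static void model_map_queues(const unsigned int io_queues[HCTX_MAX_TYPES],
			     struct queue_map_model maps[HCTX_MAX_TYPES])
{
	unsigned int qoff = 0;
	int i;

	/* The driver always keeps at least one default queue. */
	assert(io_queues[HCTX_TYPE_DEFAULT] > 0);

	for (i = 0; i < HCTX_MAX_TYPES; i++) {
		maps[i].nr_queues = io_queues[i];
		maps[i].queue_offset = qoff;
		maps[i].method = MAP_NONE;
		qoff += io_queues[i];

		if (maps[i].nr_queues == 0)
			continue;

		/*
		 * Every type except POLL has interrupts, so both DEFAULT
		 * and READ take the affinity-based path.
		 */
		if (i != HCTX_TYPE_POLL)
			maps[i].method = MAP_IRQ_AFFINITY;
		else
			maps[i].method = MAP_BLK_MQ_DEFAULT;
	}
}
```

With the == HCTX_TYPE_DEFAULT form, a later read queue type would
silently fall into the polling branch; with != HCTX_TYPE_POLL it picks
up the interrupt-based mapping without touching this check again.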
> +static void virtblk_complete_batch(struct io_comp_batch *iob)
> +{
> + struct request *req;
> + struct virtblk_req *vbr;
> +
> + rq_list_for_each(&iob->req_list, req) {
> + vbr = blk_mq_rq_to_pdu(req);
> + virtblk_unmap_data(req, vbr);
> + virtblk_cleanup_cmd(req);
vbr is only used once, so why not just:
	virtblk_unmap_data(req, blk_mq_rq_to_pdu(req));
?
Or even better, add a cleanup patch that removes the vbr argument from
virtblk_unmap_data, as it is not needed at all.
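For what the cleanup could look like in shape, here is a small
user-space model. All types and helpers below (request_model,
virtblk_req_model, rq_with_pdu, rq_to_pdu, unmap_data, complete_batch)
are hypothetical; the only assumption carried over from the kernel is
that the driver's per-request data can be derived from the request
pointer alone.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for struct request and struct virtblk_req. */
struct request_model {
	int tag;
};

struct virtblk_req_model {
	bool mapped;
};

/* The driver data lives directly behind the request in this model. */
struct rq_with_pdu {
	struct request_model rq;
	struct virtblk_req_model vbr;
};

/* Derive the per-request driver data from the request pointer. */
static struct virtblk_req_model *rq_to_pdu(struct request_model *rq)
{
	/* Valid because rq is the first member of struct rq_with_pdu. */
	return &((struct rq_with_pdu *)rq)->vbr;
}

/* Takes only the request; the second argument is no longer needed. */
static void unmap_data(struct request_model *req)
{
	struct virtblk_req_model *vbr = rq_to_pdu(req);

	assert(vbr->mapped);
	vbr->mapped = false;
}

/* The batch loop then needs no local vbr variable at all. */
static void complete_batch(struct rq_with_pdu *reqs, unsigned int n)
{
	unsigned int i;

	for (i = 0; i < n; i++)
		unmap_data(&reqs[i].rq);
}
```

Since the helper can always recover the driver data itself, passing it
in only adds a way for callers to hand over a mismatched pair.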