* [Qemu-devel] [PATCH v5 0/3] linux-aio: introduce submitting I/O as a batch
@ 2014-07-04 3:06 Ming Lei
2014-07-04 3:06 ` [Qemu-devel] [PATCH v5 1/3] block: introduce APIs for submitting IO " Ming Lei
` (2 more replies)
0 siblings, 3 replies; 5+ messages in thread
From: Ming Lei @ 2014-07-04 3:06 UTC
To: Peter Maydell, qemu-devel, Paolo Bonzini, Stefan Hajnoczi
Cc: Kevin Wolf, Fam Zheng, Michael S. Tsirkin
Hi,
Commit 580b6b2aa2 (dataplane: use the QEMU block layer for I/O)
introduces a ~40% throughput regression on virtio-blk dataplane, and
one of the causes is that batched I/O submission was removed.
This patchset reintroduces the mechanism at the block layer; at least
linux-aio can benefit from it.
With these patches, throughput on virtio-blk dataplane improves
considerably; see the data in the commit log of patch 3/3.
It should be possible to apply the batch mechanism to other devices
(such as virtio-scsi) too.
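For reference, the intended caller-side pattern looks roughly like the
sketch below. This is an illustration only, not code from this series;
struct req and submit_burst() are hypothetical.
#include "block/block.h"
/* Hypothetical request descriptor, for illustration only. */
struct req {
    int64_t sector;
    QEMUIOVector qiov;
    int nb_sectors;
    BlockDriverCompletionFunc *cb;
};
/* Wrap a burst of AIO requests in plug/unplug so linux-aio can turn
 * N io_submit() syscalls into a single one. */
static void submit_burst(BlockDriverState *bs, struct req *reqs, int nr)
{
    int i;
    bdrv_io_plug(bs);                    /* requests queue from here on */
    for (i = 0; i < nr; i++) {
        bdrv_aio_readv(bs, reqs[i].sector, &reqs[i].qiov,
                       reqs[i].nb_sectors, reqs[i].cb, &reqs[i]);
    }
    bdrv_io_unplug(bs);                  /* one io_submit() for the batch */
}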
TODO:
- support queuing I/O to multiple files for SCSI devices, which
needs some changes to linux-aio
V5:
- rebase on v2.1.0-rc0 of qemu.git/master
- block/linux-aio.c code style fix
- don't flush the io queue before flush, as pointed out by Paolo
V4:
- support other non-raw formats with under-optimized performance
- use reference counter for plug & unplug
- flush io queue before sending flush command
V3:
- only support submitting I/O as a batch for the raw format, as
pointed out by Kevin
V2:
- define return value of bdrv_io_unplug as void, suggested by Paolo
- avoid busy-wait for handling io_submit
V1:
- move the I/O queuing code into linux-aio.c, as suggested by Paolo
Thanks,
--
Ming Lei
* [Qemu-devel] [PATCH v5 1/3] block: introduce APIs for submitting IO as a batch
2014-07-04 3:06 [Qemu-devel] [PATCH v5 0/3] linux-aio: introduce submitting I/O as a batch Ming Lei
@ 2014-07-04 3:06 ` Ming Lei
2014-07-04 3:06 ` [Qemu-devel] [PATCH v5 2/3] linux-aio: implement io plug, unplug and flush io queue Ming Lei
2014-07-04 3:06 ` [Qemu-devel] [PATCH v5 3/3] dataplane: submit I/O as a batch Ming Lei
2 siblings, 0 replies; 5+ messages in thread
From: Ming Lei @ 2014-07-04 3:06 UTC
To: Peter Maydell, qemu-devel, Paolo Bonzini, Stefan Hajnoczi
Cc: Kevin Wolf, Ming Lei, Fam Zheng, Michael S. Tsirkin
This patch introduces three APIs so that the following patches
can queue I/O requests and submit them as a batch, improving
I/O performance.
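A sketch of the delegation these wrappers implement, for an assumed
topology (qcow2 over a raw POSIX file, where qcow2 defines none of
the new callbacks):
bdrv_io_plug(qcow2_bs)
    /* drv->bdrv_io_plug == NULL, but qcow2_bs->file is set */
    -> bdrv_io_plug(raw_bs)
        /* drv->bdrv_io_plug == raw_aio_plug, wired up in patch 2/3 */
        -> laio_io_plug(...)    /* linux-aio starts queuing */
If no driver in the chain implements a hook, the call degrades to a
no-op, so callers never need a capability check.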
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
block.c | 31 +++++++++++++++++++++++++++++++
include/block/block.h | 4 ++++
include/block/block_int.h | 5 +++++
3 files changed, 40 insertions(+)
diff --git a/block.c b/block.c
index f80e2b2..8800a6b 100644
--- a/block.c
+++ b/block.c
@@ -1905,6 +1905,7 @@ void bdrv_drain_all(void)
bool bs_busy;
aio_context_acquire(aio_context);
+ bdrv_flush_io_queue(bs);
bdrv_start_throttled_reqs(bs);
bs_busy = bdrv_requests_pending(bs);
bs_busy |= aio_poll(aio_context, bs_busy);
@@ -5782,3 +5783,33 @@ BlockDriverState *check_to_replace_node(const char *node_name, Error **errp)
return to_replace_bs;
}
+
+void bdrv_io_plug(BlockDriverState *bs)
+{
+ BlockDriver *drv = bs->drv;
+ if (drv && drv->bdrv_io_plug) {
+ drv->bdrv_io_plug(bs);
+ } else if (bs->file) {
+ bdrv_io_plug(bs->file);
+ }
+}
+
+void bdrv_io_unplug(BlockDriverState *bs)
+{
+ BlockDriver *drv = bs->drv;
+ if (drv && drv->bdrv_io_unplug) {
+ drv->bdrv_io_unplug(bs);
+ } else if (bs->file) {
+ bdrv_io_unplug(bs->file);
+ }
+}
+
+void bdrv_flush_io_queue(BlockDriverState *bs)
+{
+ BlockDriver *drv = bs->drv;
+ if (drv && drv->bdrv_flush_io_queue) {
+ drv->bdrv_flush_io_queue(bs);
+ } else if (bs->file) {
+ bdrv_flush_io_queue(bs->file);
+ }
+}
diff --git a/include/block/block.h b/include/block/block.h
index baecc26..32d3676 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -584,4 +584,8 @@ AioContext *bdrv_get_aio_context(BlockDriverState *bs);
*/
void bdrv_set_aio_context(BlockDriverState *bs, AioContext *new_context);
+void bdrv_io_plug(BlockDriverState *bs);
+void bdrv_io_unplug(BlockDriverState *bs);
+void bdrv_flush_io_queue(BlockDriverState *bs);
+
#endif
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 8f8e65e..f6c3bef 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -261,6 +261,11 @@ struct BlockDriver {
void (*bdrv_attach_aio_context)(BlockDriverState *bs,
AioContext *new_context);
+ /* I/O queue hooks for batched request submission */
+ void (*bdrv_io_plug)(BlockDriverState *bs);
+ void (*bdrv_io_unplug)(BlockDriverState *bs);
+ void (*bdrv_flush_io_queue)(BlockDriverState *bs);
+
QLIST_ENTRY(BlockDriver) list;
};
--
1.7.9.5
* [Qemu-devel] [PATCH v5 2/3] linux-aio: implement io plug, unplug and flush io queue
2014-07-04 3:06 [Qemu-devel] [PATCH v5 0/3] linux-aio: introduce submitting I/O as a batch Ming Lei
2014-07-04 3:06 ` [Qemu-devel] [PATCH v5 1/3] block: introduce APIs for submitting IO " Ming Lei
@ 2014-07-04 3:06 ` Ming Lei
2014-07-04 8:42 ` Stefan Hajnoczi
2014-07-04 3:06 ` [Qemu-devel] [PATCH v5 3/3] dataplane: submit I/O as a batch Ming Lei
2 siblings, 1 reply; 5+ messages in thread
From: Ming Lei @ 2014-07-04 3:06 UTC
To: Peter Maydell, qemu-devel, Paolo Bonzini, Stefan Hajnoczi
Cc: Kevin Wolf, Ming Lei, Fam Zheng, Michael S. Tsirkin
This patch implements the .bdrv_io_plug, .bdrv_io_unplug and
.bdrv_flush_io_queue callbacks for the linux-aio block driver,
so that submitting I/O as a batch is supported on linux-aio.
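The plugged field acts as a nesting counter; the following sketch (an
assumed call sequence, with s being the qemu_laio_state passed as
aio_ctx) illustrates the semantics:
laio_io_plug(bs, s);           /* plugged: 0 -> 1, laio_submit() now queues */
laio_io_plug(bs, s);           /* plugged: 1 -> 2, nested plug via refcount */
/* laio_submit() calls ioq_enqueue() here; a full queue (MAX_QUEUED_IO
 * entries) is submitted early even while plugged */
laio_io_unplug(bs, s, true);   /* plugged: 2 -> 1, nothing submitted yet */
laio_io_unplug(bs, s, true);   /* plugged: 1 -> 0, ioq_submit() drains queue */
laio_io_unplug(bs, s, false);  /* drain without touching the counter; this
                                  is what .bdrv_flush_io_queue maps to */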
Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
block/linux-aio.c | 93 +++++++++++++++++++++++++++++++++++++++++++++++++++--
block/raw-aio.h | 2 ++
block/raw-posix.c | 45 ++++++++++++++++++++++++++
3 files changed, 138 insertions(+), 2 deletions(-)
diff --git a/block/linux-aio.c b/block/linux-aio.c
index f0a2c08..9f1883e 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -25,6 +25,8 @@
*/
#define MAX_EVENTS 128
+#define MAX_QUEUED_IO 128
+
struct qemu_laiocb {
BlockDriverAIOCB common;
struct qemu_laio_state *ctx;
@@ -36,9 +38,19 @@ struct qemu_laiocb {
QLIST_ENTRY(qemu_laiocb) node;
};
+struct laio_queue {
+ struct iocb *iocbs[MAX_QUEUED_IO];
+ int plugged;
+ unsigned int size;
+ unsigned int idx;
+};
+
struct qemu_laio_state {
io_context_t ctx;
EventNotifier e;
+
+ /* I/O queue for batched submission */
+ struct laio_queue io_q;
};
static inline ssize_t io_event_ret(struct io_event *ev)
@@ -135,6 +147,77 @@ static const AIOCBInfo laio_aiocb_info = {
.cancel = laio_cancel,
};
+static void ioq_init(struct laio_queue *io_q)
+{
+ io_q->size = MAX_QUEUED_IO;
+ io_q->idx = 0;
+ io_q->plugged = 0;
+}
+
+static int ioq_submit(struct qemu_laio_state *s)
+{
+ int ret, i = 0;
+ int len = s->io_q.idx;
+
+ do {
+ ret = io_submit(s->ctx, len, s->io_q.iocbs);
+ } while (i++ < 3 && ret == -EAGAIN);
+
+ /* empty io queue */
+ s->io_q.idx = 0;
+
+ if (ret >= 0) {
+ return 0;
+ }
+
+ for (i = 0; i < len; i++) {
+ struct qemu_laiocb *laiocb =
+ container_of(s->io_q.iocbs[i], struct qemu_laiocb, iocb);
+
+ laiocb->ret = ret;
+ qemu_laio_process_completion(s, laiocb);
+ }
+ return ret;
+}
+
+static void ioq_enqueue(struct qemu_laio_state *s, struct iocb *iocb)
+{
+ unsigned int idx = s->io_q.idx;
+
+ s->io_q.iocbs[idx++] = iocb;
+ s->io_q.idx = idx;
+
+ /* submit immediately if queue is full */
+ if (idx == s->io_q.size) {
+ ioq_submit(s);
+ }
+}
+
+void laio_io_plug(BlockDriverState *bs, void *aio_ctx)
+{
+ struct qemu_laio_state *s = aio_ctx;
+
+ s->io_q.plugged++;
+}
+
+int laio_io_unplug(BlockDriverState *bs, void *aio_ctx, bool unplug)
+{
+ struct qemu_laio_state *s = aio_ctx;
+ int ret = 0;
+
+ assert(s->io_q.plugged > 0 || !unplug);
+
+ if (unplug && --s->io_q.plugged > 0) {
+ return 0;
+ }
+
+ if (s->io_q.idx > 0) {
+ ret = ioq_submit(s);
+ }
+
+ return ret;
+}
+
BlockDriverAIOCB *laio_submit(BlockDriverState *bs, void *aio_ctx, int fd,
int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
BlockDriverCompletionFunc *cb, void *opaque, int type)
@@ -168,8 +251,12 @@ BlockDriverAIOCB *laio_submit(BlockDriverState *bs, void *aio_ctx, int fd,
}
io_set_eventfd(&laiocb->iocb, event_notifier_get_fd(&s->e));
- if (io_submit(s->ctx, 1, &iocbs) < 0)
- goto out_free_aiocb;
+ if (!s->io_q.plugged) {
+ if (io_submit(s->ctx, 1, &iocbs) < 0)
+ goto out_free_aiocb;
+ } else {
+ ioq_enqueue(s, iocbs);
+ }
return &laiocb->common;
out_free_aiocb:
@@ -204,6 +291,8 @@ void *laio_init(void)
goto out_close_efd;
}
+ ioq_init(&s->io_q);
+
return s;
out_close_efd:
diff --git a/block/raw-aio.h b/block/raw-aio.h
index 8cf084e..e18c975 100644
--- a/block/raw-aio.h
+++ b/block/raw-aio.h
@@ -40,6 +40,8 @@ BlockDriverAIOCB *laio_submit(BlockDriverState *bs, void *aio_ctx, int fd,
BlockDriverCompletionFunc *cb, void *opaque, int type);
void laio_detach_aio_context(void *s, AioContext *old_context);
void laio_attach_aio_context(void *s, AioContext *new_context);
+void laio_io_plug(BlockDriverState *bs, void *aio_ctx);
+int laio_io_unplug(BlockDriverState *bs, void *aio_ctx, bool unplug);
#endif
#ifdef _WIN32
diff --git a/block/raw-posix.c b/block/raw-posix.c
index 825a0c8..808b2d8 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -1057,6 +1057,36 @@ static BlockDriverAIOCB *raw_aio_submit(BlockDriverState *bs,
cb, opaque, type);
}
+static void raw_aio_plug(BlockDriverState *bs)
+{
+#ifdef CONFIG_LINUX_AIO
+ BDRVRawState *s = bs->opaque;
+ if (s->use_aio) {
+ laio_io_plug(bs, s->aio_ctx);
+ }
+#endif
+}
+
+static void raw_aio_unplug(BlockDriverState *bs)
+{
+#ifdef CONFIG_LINUX_AIO
+ BDRVRawState *s = bs->opaque;
+ if (s->use_aio) {
+ laio_io_unplug(bs, s->aio_ctx, true);
+ }
+#endif
+}
+
+static void raw_aio_flush_io_queue(BlockDriverState *bs)
+{
+#ifdef CONFIG_LINUX_AIO
+ BDRVRawState *s = bs->opaque;
+ if (s->use_aio) {
+ laio_io_unplug(bs, s->aio_ctx, false);
+ }
+#endif
+}
+
static BlockDriverAIOCB *raw_aio_readv(BlockDriverState *bs,
int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
BlockDriverCompletionFunc *cb, void *opaque)
@@ -1528,6 +1558,9 @@ static BlockDriver bdrv_file = {
.bdrv_aio_flush = raw_aio_flush,
.bdrv_aio_discard = raw_aio_discard,
.bdrv_refresh_limits = raw_refresh_limits,
+ .bdrv_io_plug = raw_aio_plug,
+ .bdrv_io_unplug = raw_aio_unplug,
+ .bdrv_flush_io_queue = raw_aio_flush_io_queue,
.bdrv_truncate = raw_truncate,
.bdrv_getlength = raw_getlength,
@@ -1927,6 +1960,9 @@ static BlockDriver bdrv_host_device = {
.bdrv_aio_flush = raw_aio_flush,
.bdrv_aio_discard = hdev_aio_discard,
.bdrv_refresh_limits = raw_refresh_limits,
+ .bdrv_io_plug = raw_aio_plug,
+ .bdrv_io_unplug = raw_aio_unplug,
+ .bdrv_flush_io_queue = raw_aio_flush_io_queue,
.bdrv_truncate = raw_truncate,
.bdrv_getlength = raw_getlength,
@@ -2072,6 +2108,9 @@ static BlockDriver bdrv_host_floppy = {
.bdrv_aio_writev = raw_aio_writev,
.bdrv_aio_flush = raw_aio_flush,
.bdrv_refresh_limits = raw_refresh_limits,
+ .bdrv_io_plug = raw_aio_plug,
+ .bdrv_io_unplug = raw_aio_unplug,
+ .bdrv_flush_io_queue = raw_aio_flush_io_queue,
.bdrv_truncate = raw_truncate,
.bdrv_getlength = raw_getlength,
@@ -2200,6 +2239,9 @@ static BlockDriver bdrv_host_cdrom = {
.bdrv_aio_writev = raw_aio_writev,
.bdrv_aio_flush = raw_aio_flush,
.bdrv_refresh_limits = raw_refresh_limits,
+ .bdrv_io_plug = raw_aio_plug,
+ .bdrv_io_unplug = raw_aio_unplug,
+ .bdrv_flush_io_queue = raw_aio_flush_io_queue,
.bdrv_truncate = raw_truncate,
.bdrv_getlength = raw_getlength,
@@ -2334,6 +2376,9 @@ static BlockDriver bdrv_host_cdrom = {
.bdrv_aio_writev = raw_aio_writev,
.bdrv_aio_flush = raw_aio_flush,
.bdrv_refresh_limits = raw_refresh_limits,
+ .bdrv_io_plug = raw_aio_plug,
+ .bdrv_io_unplug = raw_aio_unplug,
+ .bdrv_flush_io_queue = raw_aio_flush_io_queue,
.bdrv_truncate = raw_truncate,
.bdrv_getlength = raw_getlength,
--
1.7.9.5
* [Qemu-devel] [PATCH v5 3/3] dataplane: submit I/O as a batch
2014-07-04 3:06 [Qemu-devel] [PATCH v5 0/3] linux-aio: introduce submitting I/O as a batch Ming Lei
2014-07-04 3:06 ` [Qemu-devel] [PATCH v5 1/3] block: introduce APIs for submitting IO " Ming Lei
2014-07-04 3:06 ` [Qemu-devel] [PATCH v5 2/3] linux-aio: implement io plug, unplug and flush io queue Ming Lei
@ 2014-07-04 3:06 ` Ming Lei
2 siblings, 0 replies; 5+ messages in thread
From: Ming Lei @ 2014-07-04 3:06 UTC
To: Peter Maydell, qemu-devel, Paolo Bonzini, Stefan Hajnoczi
Cc: Kevin Wolf, Ming Lei, Fam Zheng, Michael S. Tsirkin
Before commit 580b6b2aa2 (dataplane: use the QEMU block
layer for I/O), dataplane for virtio-blk submitted block
I/O as a batch.
Commit 580b6b2aa2 replaces the custom Linux AIO
implementation (including batched submission) with the
QEMU block layer, but it causes a ~40% throughput
regression on virtio-blk, and the removal of batched
submission is one of the causes.
This patch applies the newly introduced bdrv_io_plug() and
bdrv_io_unplug() interfaces so that the QEMU block layer
can submit I/O as a batch; in my test, the change improves
throughput by ~30% with 'aio=native'.
The following is my fio test script:
[global]
direct=1
size=4G
bsrange=4k-4k
timeout=40
numjobs=4
ioengine=libaio
iodepth=64
filename=/dev/vdc
group_reporting=1
[f]
rw=randread
Results on one of my small machines (host: x86_64, 2 cores, 4 threads; guest: 4 cores):
- qemu master: 65K IOPS
- qemu master with these patches: 92K IOPS
- 2.0.0 release (dataplane using custom Linux AIO): 104K IOPS
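For anyone reproducing the numbers, an invocation along the following
lines should exercise the dataplane path. This is a sketch only: the
backing device, memory size and IDs are placeholders, and x-data-plane
was the experimental virtio-blk dataplane switch in this era.
qemu-system-x86_64 -enable-kvm -m 1024 -smp 4 \
    -drive file=/dev/sdb,if=none,id=drive0,format=raw,cache=none,aio=native \
    -device virtio-blk-pci,drive=drive0,x-data-plane=on \
    ...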
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
hw/block/dataplane/virtio-blk.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index 6cd75f6..bed9f13 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -83,6 +83,7 @@ static void handle_notify(EventNotifier *e)
};
event_notifier_test_and_clear(&s->host_notifier);
+ bdrv_io_plug(s->blk->conf.bs);
for (;;) {
/* Disable guest->host notifies to avoid unnecessary vmexits */
vring_disable_notification(s->vdev, &s->vring);
@@ -116,6 +117,7 @@ static void handle_notify(EventNotifier *e)
break;
}
}
+ bdrv_io_unplug(s->blk->conf.bs);
}
/* Context: QEMU global mutex held */
--
1.7.9.5
* Re: [Qemu-devel] [PATCH v5 2/3] linux-aio: implement io plug, unplug and flush io queue
2014-07-04 3:06 ` [Qemu-devel] [PATCH v5 2/3] linux-aio: implement io plug, unplug and flush io queue Ming Lei
@ 2014-07-04 8:42 ` Stefan Hajnoczi
0 siblings, 0 replies; 5+ messages in thread
From: Stefan Hajnoczi @ 2014-07-04 8:42 UTC
To: Ming Lei
Cc: Kevin Wolf, Peter Maydell, Fam Zheng, Michael S. Tsirkin,
qemu-devel, Stefan Hajnoczi, Paolo Bonzini
On Fri, Jul 04, 2014 at 11:06:41AM +0800, Ming Lei wrote:
> +static int ioq_submit(struct qemu_laio_state *s)
> +{
> + int ret, i = 0;
> + int len = s->io_q.idx;
> +
> + do {
> + ret = io_submit(s->ctx, len, s->io_q.iocbs);
> + } while (i++ < 3 && ret == -EAGAIN);
> +
> + /* empty io queue */
> + s->io_q.idx = 0;
> +
> + if (ret >= 0) {
> + return 0;
> + }
> +
> + for (i = 0; i < len; i++) {
> + struct qemu_laiocb *laiocb =
> + container_of(s->io_q.iocbs[i], struct qemu_laiocb, iocb);
> +
> + laiocb->ret = ret;
> + qemu_laio_process_completion(s, laiocb);
> + }
> + return ret;
> +}
Please see my review of the previous revision. You didn't address my
comments.
Stefan
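The earlier review comments are not quoted in this thread. One
plausible concern with a loop of this shape -- speculative, and not
necessarily the point Stefan raised -- is that io_submit() can succeed
partially, returning a count smaller than len; the code above then
resets io_q.idx and returns 0, silently dropping the unsubmitted tail.
A sketch of one way to handle short submissions (not code from this
series; ioq_submit_all is a hypothetical name):
static int ioq_submit_all(struct qemu_laio_state *s)
{
    int done = 0;
    int len = s->io_q.idx;
    int ret = 0;
    while (done < len) {
        ret = io_submit(s->ctx, len - done, s->io_q.iocbs + done);
        if (ret == -EAGAIN) {
            continue;           /* kernel queue full: retry, or defer until
                                   some completions have been reaped */
        }
        if (ret < 0) {
            break;              /* hard error: fail the remaining iocbs via
                                   qemu_laio_process_completion(), as above */
        }
        done += ret;            /* short submission: advance past what the
                                   kernel accepted and retry the rest */
    }
    s->io_q.idx = 0;
    return ret < 0 ? ret : 0;
}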