* [Qemu-devel] [PATCH 1/3] Add bdrv_aio_multiwrite
2009-09-01 13:51 [Qemu-devel] [PATCH 0/3] block: Handle multiple write requests at once Kevin Wolf
@ 2009-09-01 13:51 ` Kevin Wolf
2009-09-01 13:51 ` [Qemu-devel] [PATCH 2/3] virtio-blk: Use bdrv_aio_multiwrite Kevin Wolf
` (2 subsequent siblings)
3 siblings, 0 replies; 13+ messages in thread
From: Kevin Wolf @ 2009-09-01 13:51 UTC (permalink / raw)
To: qemu-devel; +Cc: Kevin Wolf
Allows block drivers to handle multiple write requests at once. For drivers that
do not implement bdrv_aio_multiwrite, a trivial emulation is provided (it simply
calls bdrv_aio_writev for each request).
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
block.c | 37 +++++++++++++++++++++++++++++++++++++
block.h | 15 +++++++++++++++
block_int.h | 3 +++
3 files changed, 55 insertions(+), 0 deletions(-)
diff --git a/block.c b/block.c
index 033957d..c154d78 100644
--- a/block.c
+++ b/block.c
@@ -1354,6 +1354,43 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
return ret;
}
+/*
+ * Submit multiple AIO write requests at once.
+ *
+ * On success, the function returns 0 and all requests in the reqs array have
+ * been submitted. On error, it returns -1 and individual requests may or may
+ * not have been submitted, which means the callback will be invoked for some
+ * requests but not for others. The caller must check the error field of each
+ * BlockRequest to know which callbacks to wait for (if error != 0, no
+ * callback will be called for that request).
+ *
+ * The implementation may modify the contents of the reqs array, e.g. to merge
+ * requests. However, the fields opaque and error are left unmodified as they
+ * are used to signal failure for a single request to the caller.
+ */
+int bdrv_aio_multiwrite(BlockDriverState *bs, BlockRequest *reqs, int num_reqs)
+{
+ BlockDriver *drv = bs->drv;
+ BlockDriverAIOCB *acb;
+ int i;
+ int res = 0;
+
+ if (drv->bdrv_aio_multiwrite != NULL) {
+ return drv->bdrv_aio_multiwrite(bs, reqs, num_reqs);
+ }
+
+ for (i = 0; i < num_reqs; i++) {
+ acb = bdrv_aio_writev(bs, reqs[i].sector, reqs[i].qiov,
+ reqs[i].nb_sectors, reqs[i].cb, reqs[i].opaque);
+ if (acb == NULL) {
+ reqs[i].error = EIO;
+ res = -1;
+ }
+ }
+
+ return res;
+}
+
void bdrv_aio_cancel(BlockDriverAIOCB *acb)
{
acb->pool->cancel(acb);
diff --git a/block.h b/block.h
index 28bf357..ea69052 100644
--- a/block.h
+++ b/block.h
@@ -87,6 +87,21 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
BlockDriverCompletionFunc *cb, void *opaque);
void bdrv_aio_cancel(BlockDriverAIOCB *acb);
+typedef struct BlockRequest {
+ /* Fields to be filled by multiwrite caller */
+ int64_t sector;
+ int nb_sectors;
+ QEMUIOVector *qiov;
+ BlockDriverCompletionFunc *cb;
+ void *opaque;
+
+ /* Filled by multiwrite implementation */
+ int error;
+} BlockRequest;
+
+int bdrv_aio_multiwrite(BlockDriverState *bs, BlockRequest *reqs,
+ int num_reqs);
+
/* sg packet commands */
int bdrv_ioctl(BlockDriverState *bs, unsigned long int req, void *buf);
BlockDriverAIOCB *bdrv_aio_ioctl(BlockDriverState *bs,
diff --git a/block_int.h b/block_int.h
index 0902fd4..027b1d8 100644
--- a/block_int.h
+++ b/block_int.h
@@ -70,6 +70,9 @@ struct BlockDriver {
int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
BlockDriverCompletionFunc *cb, void *opaque);
+ int (*bdrv_aio_multiwrite)(BlockDriverState *bs, BlockRequest *reqs,
+ int num_reqs);
+
const char *protocol_name;
int (*bdrv_truncate)(BlockDriverState *bs, int64_t offset);
int64_t (*bdrv_getlength)(BlockDriverState *bs);
--
1.6.0.6
* [Qemu-devel] [PATCH 2/3] virtio-blk: Use bdrv_aio_multiwrite
2009-09-01 13:51 [Qemu-devel] [PATCH 0/3] block: Handle multiple write requests at once Kevin Wolf
2009-09-01 13:51 ` [Qemu-devel] [PATCH 1/3] Add bdrv_aio_multiwrite Kevin Wolf
@ 2009-09-01 13:51 ` Kevin Wolf
2009-09-01 13:51 ` [Qemu-devel] [PATCH 3/3] qcow2: Add bdrv_aio_multiwrite implementation Kevin Wolf
2009-09-01 15:52 ` [Qemu-devel] [PATCH 0/3] block: Handle multiple write requests at once Christoph Hellwig
3 siblings, 0 replies; 13+ messages in thread
From: Kevin Wolf @ 2009-09-01 13:51 UTC (permalink / raw)
To: qemu-devel; +Cc: Kevin Wolf
It is quite common for virtio-blk to submit more than one write request in a
row to the qemu block layer. Use bdrv_aio_multiwrite to allow block drivers to
optimize their handling of the requests.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
hw/virtio-blk.c | 50 ++++++++++++++++++++++++++++++++++++++++++--------
1 files changed, 42 insertions(+), 8 deletions(-)
diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
index c160246..5c88c12 100644
--- a/hw/virtio-blk.c
+++ b/hw/virtio-blk.c
@@ -252,15 +252,40 @@ static void virtio_blk_handle_scsi(VirtIOBlockReq *req)
}
#endif /* __linux__ */
-static void virtio_blk_handle_write(VirtIOBlockReq *req)
+static void do_multiwrite(BlockDriverState *bs, BlockRequest *blkreq,
+ int num_writes)
{
- BlockDriverAIOCB *acb;
+ int i, ret;
+ ret = bdrv_aio_multiwrite(bs, blkreq, num_writes);
+
+ if (ret != 0) {
+ for (i = 0; i < num_writes; i++) {
+ if (blkreq[i].error) {
+ virtio_blk_req_complete(blkreq[i].opaque, VIRTIO_BLK_S_IOERR);
+ }
+ }
+ }
+}
- acb = bdrv_aio_writev(req->dev->bs, req->out->sector, &req->qiov,
- req->qiov.size / 512, virtio_blk_rw_complete, req);
- if (!acb) {
- virtio_blk_req_complete(req, VIRTIO_BLK_S_IOERR);
+static void virtio_blk_handle_write(BlockRequest *blkreq, int *num_writes,
+ VirtIOBlockReq *req, BlockDriverState **old_bs)
+{
+ if (req->dev->bs != *old_bs || *num_writes == 32) {
+ if (*old_bs != NULL) {
+ do_multiwrite(*old_bs, blkreq, *num_writes);
+ }
+ *num_writes = 0;
+ *old_bs = req->dev->bs;
}
+
+ blkreq[*num_writes].sector = req->out->sector;
+ blkreq[*num_writes].nb_sectors = req->qiov.size / 512;
+ blkreq[*num_writes].qiov = &req->qiov;
+ blkreq[*num_writes].cb = virtio_blk_rw_complete;
+ blkreq[*num_writes].opaque = req;
+ blkreq[*num_writes].error = 0;
+
+ (*num_writes)++;
}
static void virtio_blk_handle_read(VirtIOBlockReq *req)
@@ -278,6 +303,9 @@ static void virtio_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
{
VirtIOBlock *s = to_virtio_blk(vdev);
VirtIOBlockReq *req;
+ BlockRequest blkreq[32];
+ int num_writes = 0;
+ BlockDriverState *old_bs = NULL;
while ((req = virtio_blk_get_request(s))) {
if (req->elem.out_num < 1 || req->elem.in_num < 1) {
@@ -299,13 +327,18 @@ static void virtio_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
} else if (req->out->type & VIRTIO_BLK_T_OUT) {
qemu_iovec_init_external(&req->qiov, &req->elem.out_sg[1],
req->elem.out_num - 1);
- virtio_blk_handle_write(req);
+ virtio_blk_handle_write(blkreq, &num_writes, req, &old_bs);
} else {
qemu_iovec_init_external(&req->qiov, &req->elem.in_sg[0],
req->elem.in_num - 1);
virtio_blk_handle_read(req);
}
}
+
+ if (num_writes > 0) {
+ do_multiwrite(old_bs, blkreq, num_writes);
+ }
+
/*
* FIXME: Want to check for completions before returning to guest mode,
* so cached reads and writes are reported as quickly as possible. But
@@ -324,7 +357,8 @@ static void virtio_blk_dma_restart_bh(void *opaque)
s->rq = NULL;
while (req) {
- virtio_blk_handle_write(req);
+ bdrv_aio_writev(req->dev->bs, req->out->sector, &req->qiov,
+ req->qiov.size / 512, virtio_blk_rw_complete, req);
req = req->next;
}
}
--
1.6.0.6
* [Qemu-devel] [PATCH 3/3] qcow2: Add bdrv_aio_multiwrite implementation
2009-09-01 13:51 [Qemu-devel] [PATCH 0/3] block: Handle multiple write requests at once Kevin Wolf
2009-09-01 13:51 ` [Qemu-devel] [PATCH 1/3] Add bdrv_aio_multiwrite Kevin Wolf
2009-09-01 13:51 ` [Qemu-devel] [PATCH 2/3] virtio-blk: Use bdrv_aio_multiwrite Kevin Wolf
@ 2009-09-01 13:51 ` Kevin Wolf
2009-09-01 15:52 ` [Qemu-devel] [PATCH 0/3] block: Handle multiple write requests at once Christoph Hellwig
3 siblings, 0 replies; 13+ messages in thread
From: Kevin Wolf @ 2009-09-01 13:51 UTC (permalink / raw)
To: qemu-devel; +Cc: Kevin Wolf
One performance problem of qcow2 during initial image growth is sequential
writes that are not cluster aligned. In this case, when a first request
requires allocating a new cluster but writes only to the first couple of
sectors in that cluster, the rest of the cluster is zeroed - just to be
overwritten by the second request that follows and fills up the cluster.
Let's try to merge sequential write requests to the same cluster, so we can
avoid writing the zero padding to the disk in the first place.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
block/qcow2.c | 134 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
cutils.c | 17 +++++++
qemu-common.h | 1 +
3 files changed, 152 insertions(+), 0 deletions(-)
diff --git a/block/qcow2.c b/block/qcow2.c
index 8579e01..9fc5cec 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -646,6 +646,139 @@ static BlockDriverAIOCB *qcow_aio_writev(BlockDriverState *bs,
return &acb->common;
}
+typedef struct QcowMultiwriteCB {
+ int error;
+ int num_requests;
+ int num_callbacks;
+ struct {
+ BlockDriverCompletionFunc *cb;
+ void *opaque;
+ QEMUIOVector *free_qiov;
+ } callbacks[];
+} QcowMultiwriteCB;
+
+static void qcow_multiwrite_user_cb(QcowMultiwriteCB *mcb)
+{
+ int i;
+
+ for (i = 0; i < mcb->num_callbacks; i++) {
+ mcb->callbacks[i].cb(mcb->callbacks[i].opaque, mcb->error);
+ qemu_free(mcb->callbacks[i].free_qiov);
+ }
+}
+
+static void qcow_multiwrite_cb(void *opaque, int ret)
+{
+ QcowMultiwriteCB *mcb = opaque;
+
+ if (ret < 0) {
+ mcb->error = ret;
+ qcow_multiwrite_user_cb(mcb);
+ }
+
+ mcb->num_requests--;
+ if (mcb->num_requests == 0) {
+ if (mcb->error == 0) {
+ qcow_multiwrite_user_cb(mcb);
+ }
+ qemu_free(mcb);
+ }
+}
+
+static int qcow_multiwrite_req_compare(const void *a, const void *b)
+{
+ return (((BlockRequest*) a)->sector - ((BlockRequest*) b)->sector);
+}
+
+static int qcow_aio_multiwrite(BlockDriverState *bs, BlockRequest *reqs,
+ int num_reqs)
+{
+ BDRVQcowState *s = bs->opaque;
+ QcowMultiwriteCB *mcb;
+ BlockDriverAIOCB *acb;
+ int i, outidx;
+
+ // Sort requests by start sector
+ qsort(reqs, num_reqs, sizeof(*reqs), &qcow_multiwrite_req_compare);
+
+ // Create QcowMultiwriteCB structure
+ mcb = qemu_mallocz(sizeof(*mcb) + num_reqs * sizeof(*mcb->callbacks));
+ mcb->num_requests = 0;
+ mcb->num_callbacks = num_reqs;
+
+ for (i = 0; i < num_reqs; i++) {
+ mcb->callbacks[i].cb = reqs[i].cb;
+ mcb->callbacks[i].opaque = reqs[i].opaque;
+ }
+
+ // Check if adjacent requests touch the same clusters. If so, combine them,
+ // filling up gaps with zero sectors.
+ outidx = 0;
+ for (i = 1; i < num_reqs; i++) {
+ int merge = 0;
+ uint64_t start_cluster = reqs[i].sector >> (s->cluster_bits - 9);
+ uint64_t end_cluster_prev =
+ (reqs[outidx].sector + reqs[outidx].nb_sectors - 1)
+ >> (s->cluster_bits - 9);
+
+ if (start_cluster == end_cluster_prev) {
+#ifdef DEBUG_MERGE
+ fprintf(stderr, "Possible merge: %lx -- %lx\n",
+ (reqs[outidx].sector + reqs[outidx].nb_sectors - 1),
+ reqs[i].sector);
+#endif
+ // TODO This is only handling exactly sequential writes. When we
+ // know that the cluster is unallocated, we could even fill in some
+ // zero padding and merge more requests.
+ if (reqs[i].sector == reqs[outidx].sector + reqs[outidx].nb_sectors) {
+#ifdef DEBUG_MERGE
+ fprintf(stderr, " Merging\n");
+#endif
+ merge = 1;
+ }
+ }
+
+ if (merge) {
+ reqs[outidx].nb_sectors += reqs[i].nb_sectors;
+ reqs[outidx].qiov =
+ qemu_iovec_concat(reqs[outidx].qiov, reqs[i].qiov);
+ mcb->callbacks[i].free_qiov = reqs[outidx].qiov;
+ } else {
+ outidx++;
+ reqs[outidx].sector = reqs[i].sector;
+ reqs[outidx].nb_sectors = reqs[i].nb_sectors;
+ reqs[outidx].qiov = reqs[i].qiov;
+ }
+ }
+
+ // Run the aio requests
+ for (i = 0; i < num_reqs; i++) {
+ acb = qcow_aio_writev(bs, reqs[i].sector, reqs[i].qiov,
+ reqs[i].nb_sectors, qcow_multiwrite_cb, mcb);
+
+ if (acb == NULL) {
+ // We can only fail the whole thing if no request has been
+ // submitted yet. Otherwise we'll wait for the submitted AIOs to
+ // complete and report the error in the callback.
+ if (mcb->num_requests == 0) {
+ reqs[i].error = EIO;
+ goto fail;
+ } else {
+ mcb->error = EIO;
+ break;
+ }
+ } else {
+ mcb->num_requests++;
+ }
+ }
+
+ return 0;
+
+fail:
+ free(mcb);
+ return -1;
+}
+
static void qcow_close(BlockDriverState *bs)
{
BDRVQcowState *s = bs->opaque;
@@ -1124,6 +1257,7 @@ static BlockDriver bdrv_qcow2 = {
.bdrv_aio_readv = qcow_aio_readv,
.bdrv_aio_writev = qcow_aio_writev,
.bdrv_write_compressed = qcow_write_compressed,
+ .bdrv_aio_multiwrite = qcow_aio_multiwrite,
.bdrv_snapshot_create = qcow2_snapshot_create,
.bdrv_snapshot_goto = qcow2_snapshot_goto,
diff --git a/cutils.c b/cutils.c
index bd9a019..e12013a 100644
--- a/cutils.c
+++ b/cutils.c
@@ -151,6 +151,23 @@ void qemu_iovec_add(QEMUIOVector *qiov, void *base, size_t len)
++qiov->niov;
}
+QEMUIOVector *qemu_iovec_concat(QEMUIOVector *a, QEMUIOVector *b)
+{
+ int i;
+ QEMUIOVector *qiov = qemu_malloc(sizeof(*qiov));
+
+ qemu_iovec_init(qiov, a->niov + b->niov);
+
+ for (i = 0; i < a->niov; i++) {
+ qemu_iovec_add(qiov, a->iov[i].iov_base, a->iov[i].iov_len);
+ }
+ for (i = 0; i < b->niov; i++) {
+ qemu_iovec_add(qiov, b->iov[i].iov_base, b->iov[i].iov_len);
+ }
+
+ return qiov;
+}
+
void qemu_iovec_destroy(QEMUIOVector *qiov)
{
assert(qiov->nalloc != -1);
diff --git a/qemu-common.h b/qemu-common.h
index 74ac88f..2871820 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -223,6 +223,7 @@ typedef struct QEMUIOVector {
void qemu_iovec_init(QEMUIOVector *qiov, int alloc_hint);
void qemu_iovec_init_external(QEMUIOVector *qiov, struct iovec *iov, int niov);
void qemu_iovec_add(QEMUIOVector *qiov, void *base, size_t len);
+QEMUIOVector *qemu_iovec_concat(QEMUIOVector *a, QEMUIOVector *b);
void qemu_iovec_destroy(QEMUIOVector *qiov);
void qemu_iovec_reset(QEMUIOVector *qiov);
void qemu_iovec_to_buffer(QEMUIOVector *qiov, void *buf);
--
1.6.0.6
* Re: [Qemu-devel] [PATCH 0/3] block: Handle multiple write requests at once
2009-09-01 13:51 [Qemu-devel] [PATCH 0/3] block: Handle multiple write requests at once Kevin Wolf
` (2 preceding siblings ...)
2009-09-01 13:51 ` [Qemu-devel] [PATCH 3/3] qcow2: Add bdrv_aio_multiwrite implementation Kevin Wolf
@ 2009-09-01 15:52 ` Christoph Hellwig
2009-09-01 16:08 ` Anthony Liguori
` (2 more replies)
3 siblings, 3 replies; 13+ messages in thread
From: Christoph Hellwig @ 2009-09-01 15:52 UTC (permalink / raw)
To: Kevin Wolf; +Cc: qemu-devel
On Tue, Sep 01, 2009 at 03:51:49PM +0200, Kevin Wolf wrote:
> virtio often issues multiple requests in a row, but each one independently. If
> the block drivers knew all of the requests, they could optimize the way they
> handle the requests. See the description of patch 3 for how qcow2 can use this
> to avoid unnecessary writes to the disk.
I think this interface is extremely awkward and the layering is wrong.
Everyone benefits from having one large request instead of multiple small
ones, so if we do get multiple sequential write requests we should
always merge them at a high level even before starting to issue AIO,
e.g. do it all in virtio-blk.
Of course using a sane filesystem in the guest would also fix it,
but the point of virtualization is at least partially to keep all
that old crap working nicely.
* Re: [Qemu-devel] [PATCH 0/3] block: Handle multiple write requests at once
2009-09-01 15:52 ` [Qemu-devel] [PATCH 0/3] block: Handle multiple write requests at once Christoph Hellwig
@ 2009-09-01 16:08 ` Anthony Liguori
2009-09-01 16:14 ` Christoph Hellwig
2009-09-01 17:00 ` Avi Kivity
2009-09-01 16:59 ` Avi Kivity
2009-09-02 7:27 ` Kevin Wolf
2 siblings, 2 replies; 13+ messages in thread
From: Anthony Liguori @ 2009-09-01 16:08 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Kevin Wolf, qemu-devel
Christoph Hellwig wrote:
> On Tue, Sep 01, 2009 at 03:51:49PM +0200, Kevin Wolf wrote:
>
>> virtio often issues multiple requests in a row, but each one independently. If
>> the block drivers knew all of the requests, they could optimize the way they
>> handle the requests. See the description of patch 3 for how qcow2 can use this
>> to avoid unnecessary writes to the disk.
>>
>
> I think this interface is extremely awkward and the layering is wrong.
> Everyone benefits from having one large request instead of multiple small
> ones, so if we do get multiple sequential write requests we should
> always merge them at a high level even before starting to issue AIO,
> e.g. do it all in virtio-blk.
>
Or introduce a helper layer that coalesces requests and lives outside
of the core block API.
I don't like the interface either although I'm not sure if the
alternative is better. A different way to do this would be to have a
queuing API. A device would queue requests and then kick the queue.
This is closer to how virtio-blk works.
Regards,
Anthony Liguori
* Re: [Qemu-devel] [PATCH 0/3] block: Handle multiple write requests at once
2009-09-01 16:08 ` Anthony Liguori
@ 2009-09-01 16:14 ` Christoph Hellwig
2009-09-01 17:00 ` Avi Kivity
1 sibling, 0 replies; 13+ messages in thread
From: Christoph Hellwig @ 2009-09-01 16:14 UTC (permalink / raw)
To: Anthony Liguori; +Cc: Kevin Wolf, Christoph Hellwig, qemu-devel
On Tue, Sep 01, 2009 at 11:08:10AM -0500, Anthony Liguori wrote:
> Or introduce a helper layer that coalesces requests and lives outside
> of the core block API.
I'm not even sure we need much of a helper. It's just a slightly
smarter way to build the qiov - instead of submitting the qiov once
we're done with one request, poke into the next one and add it to the
qiov if it fits; otherwise submit the request. Also submit it if no
more requests are pending.
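A rough sketch of that idea (illustrative only, not code from these patches):
fold the next guest request into a pending write while it stays exactly
sequential, and only then issue a single bdrv_aio_writev(). The PendingWrite
structure and try_merge() below are hypothetical; only qemu_iovec_add() is an
existing helper, and completing the individual guest requests after the merged
write finishes - the bookkeeping patch 3 solves with its callback array - is
left out.

    /* Hypothetical sketch - not part of the patches.  Fold req into the
     * pending write pw if it starts exactly where pw ends; otherwise the
     * caller must submit pw with bdrv_aio_writev() and start over.
     * Per-request completion handling is intentionally omitted. */
    typedef struct PendingWrite {
        int64_t sector;       /* start sector of the pending merged write */
        int nb_sectors;       /* sectors accumulated so far, 0 = none */
        QEMUIOVector qiov;    /* accumulated scatter/gather list */
    } PendingWrite;

    static int try_merge(PendingWrite *pw, VirtIOBlockReq *req)
    {
        int i;

        if (pw->nb_sectors != 0 &&
            req->out->sector != pw->sector + pw->nb_sectors) {
            return 0;   /* not sequential, does not fit */
        }
        if (pw->nb_sectors == 0) {
            pw->sector = req->out->sector;
        }
        for (i = 0; i < req->qiov.niov; i++) {
            qemu_iovec_add(&pw->qiov, req->qiov.iov[i].iov_base,
                           req->qiov.iov[i].iov_len);
        }
        pw->nb_sectors += req->qiov.size / 512;
        return 1;       /* folded into the pending write */
    }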
* Re: [Qemu-devel] [PATCH 0/3] block: Handle multiple write requests at once
2009-09-01 16:08 ` Anthony Liguori
2009-09-01 16:14 ` Christoph Hellwig
@ 2009-09-01 17:00 ` Avi Kivity
1 sibling, 0 replies; 13+ messages in thread
From: Avi Kivity @ 2009-09-01 17:00 UTC (permalink / raw)
To: Anthony Liguori; +Cc: Kevin Wolf, Christoph Hellwig, qemu-devel
On 09/01/2009 07:08 PM, Anthony Liguori wrote:
>
> I don't like the interface either although I'm not sure if the
> alternative is better. A different way to do this would be to have a
> queuing API. A device would queue requests and then kick the queue.
> This is closer to how virtio-blk works.
>
The other end would simply collect the requests in an array and call
multiwrite when the queue is kicked.
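A minimal sketch of what such a queue-and-kick interface could look like on top
of the proposed bdrv_aio_multiwrite (the BlockQueue type and function names are
hypothetical; nothing like this exists in the patches):

    /* Hypothetical queuing layer: a device queues requests and kicks once.
     * No overflow handling here - a real version would flush when the
     * array fills up, like virtio-blk's 32-entry batch in patch 2. */
    typedef struct BlockQueue {
        BlockDriverState *bs;
        BlockRequest reqs[32];
        int num_reqs;
    } BlockQueue;

    static void blk_queue_writev(BlockQueue *q, int64_t sector,
                                 QEMUIOVector *qiov, int nb_sectors,
                                 BlockDriverCompletionFunc *cb, void *opaque)
    {
        BlockRequest *r = &q->reqs[q->num_reqs++];

        r->sector     = sector;
        r->nb_sectors = nb_sectors;
        r->qiov       = qiov;
        r->cb         = cb;
        r->opaque     = opaque;
        r->error      = 0;
    }

    static int blk_queue_kick(BlockQueue *q)
    {
        int ret = bdrv_aio_multiwrite(q->bs, q->reqs, q->num_reqs);

        q->num_reqs = 0;
        return ret;
    }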
--
error compiling committee.c: too many arguments to function
* Re: [Qemu-devel] [PATCH 0/3] block: Handle multiple write requests at once
2009-09-01 15:52 ` [Qemu-devel] [PATCH 0/3] block: Handle multiple write requests at once Christoph Hellwig
2009-09-01 16:08 ` Anthony Liguori
@ 2009-09-01 16:59 ` Avi Kivity
2009-09-02 7:27 ` Kevin Wolf
2 siblings, 0 replies; 13+ messages in thread
From: Avi Kivity @ 2009-09-01 16:59 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Kevin Wolf, qemu-devel
On 09/01/2009 06:52 PM, Christoph Hellwig wrote:
> On Tue, Sep 01, 2009 at 03:51:49PM +0200, Kevin Wolf wrote:
>
>> virtio often issues multiple requests in a row, but each one independently. If
>> the block drivers knew all of the requests, they could optimize the way they
>> handle the requests. See the description of patch 3 for how qcow2 can use this
>> to avoid unnecessary writes to the disk.
>>
> I think this interface is extremely awkward and the layering is wrong.
>
Can you explain why?
> Everyone benefits from having one large request instead of multiple small
> ones, so if we do get multiple sequential write requests we should
> always merge them at a high level even before starting to issue AIO,
> e.g. do it all in virtio-blk.
>
I don't think we should merge requests for the guest. If it issued
multiple requests, that's what it wants.
I see two benefits to multiwrite:
- handle layout issues in qcow2 (so we can allocate for all the
requests, then submit them in parallel)
- submit multiple requests at once for linux-aio
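For the second point, a standalone libaio sketch (not QEMU code) of what
batching buys on the host side: a whole array of vectored writes enters the
kernel through a single io_submit() call.

    /* Standalone libaio illustration - not taken from these patches.
     * The kernel copies the control blocks during io_submit(), but the
     * data buffers must stay valid until io_getevents() reaps the
     * completions. */
    #include <libaio.h>
    #include <stdlib.h>
    #include <sys/uio.h>

    struct write_req {
        const struct iovec *iov;
        int iovcnt;
        long long offset;   /* byte offset, e.g. sector * 512 */
    };

    static int submit_batch(io_context_t ctx, int fd,
                            struct write_req *reqs, int num_reqs)
    {
        struct iocb *iocbs = calloc(num_reqs, sizeof(*iocbs));
        struct iocb **ptrs = calloc(num_reqs, sizeof(*ptrs));
        int i, ret;

        for (i = 0; i < num_reqs; i++) {
            io_prep_pwritev(&iocbs[i], fd, reqs[i].iov, reqs[i].iovcnt,
                            reqs[i].offset);
            ptrs[i] = &iocbs[i];
        }

        /* One system call queues the whole batch; the return value is
         * the number of requests accepted, or a negative error. */
        ret = io_submit(ctx, num_reqs, ptrs);

        free(ptrs);
        free(iocbs);
        return ret;
    }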
--
error compiling committee.c: too many arguments to function
* Re: [Qemu-devel] [PATCH 0/3] block: Handle multiple write requests at once
2009-09-01 15:52 ` [Qemu-devel] [PATCH 0/3] block: Handle multiple write requests at once Christoph Hellwig
2009-09-01 16:08 ` Anthony Liguori
2009-09-01 16:59 ` Avi Kivity
@ 2009-09-02 7:27 ` Kevin Wolf
2009-09-02 15:43 ` Christoph Hellwig
2 siblings, 1 reply; 13+ messages in thread
From: Kevin Wolf @ 2009-09-02 7:27 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: qemu-devel
Christoph Hellwig wrote:
> On Tue, Sep 01, 2009 at 03:51:49PM +0200, Kevin Wolf wrote:
>> virtio often issues multiple requests in a row, but each one independently. If
>> the block drivers knew all of the requests, they could optimize the way they
>> handle the requests. See the description of patch 3 for how qcow2 can use this
>> to avoid unnecessary writes to the disk.
>
> I think this interface is extremely awkward and the layering is wrong.
> Everyone benefits from having one large request instead of multiple small
> ones, so if we do get multiple sequential write requests we should
> always merge them at a high level even before starting to issue AIO,
> e.g. do it all in virtio-blk.
While I don't agree with everything you're saying, I think you do have a
point. When you're working on one specific problem, you're sometimes
blind to the big picture.
It's true that the functionality implemented so far could be useful for
all block formats. I'm not sure merging requests is the only useful thing
you could do with it, but for now it seems to be the only one. So the
general merging mechanism doesn't belong in the qcow2 block driver.
On the other hand, if you look at the TODO which says that we need to
merge requests that are not directly adjacent but touch the same cluster,
this is something the generic block layer cannot do - it doesn't even
know that things like clusters exist. We could have a block driver
function like bdrv_can_merge that takes two requests and checks whether
merging them is worthwhile. Does that make sense?
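To make that concrete, a hypothetical shape for such a hook and the generic
merge pass it would enable (neither exists in these patches; the loop mirrors
the one from patch 3, only exactly adjacent requests are concatenated, and the
free_qiov bookkeeping for the concatenated vectors is left out):

    /* Hypothetical member for struct BlockDriver: the format driver says
     * whether merging two sorted requests pays off (e.g. because they
     * touch the same qcow2 cluster). */
    int (*bdrv_can_merge)(BlockDriverState *bs,
                          BlockRequest *a, BlockRequest *b);

    /* Sketch of a generic merge pass in block.c over an array already
     * sorted by start sector.  Returns the new number of requests. */
    static int multiwrite_merge(BlockDriverState *bs,
                                BlockRequest *reqs, int num_reqs)
    {
        int i, outidx = 0;

        for (i = 1; i < num_reqs; i++) {
            int adjacent = reqs[i].sector ==
                reqs[outidx].sector + reqs[outidx].nb_sectors;

            if (adjacent && bs->drv->bdrv_can_merge &&
                bs->drv->bdrv_can_merge(bs, &reqs[outidx], &reqs[i])) {
                reqs[outidx].nb_sectors += reqs[i].nb_sectors;
                reqs[outidx].qiov =
                    qemu_iovec_concat(reqs[outidx].qiov, reqs[i].qiov);
            } else {
                outidx++;
                reqs[outidx].sector     = reqs[i].sector;
                reqs[outidx].nb_sectors = reqs[i].nb_sectors;
                reqs[outidx].qiov       = reqs[i].qiov;
            }
        }

        return outidx + 1;
    }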
Your suggestion to do it all in virtio-blk I don't like at all. We need
to do it in the most generic place, not in the earliest possible place.
Why shouldn't there be other devices that could profit from it? Not sure
about IDE and SCSI, but the Xen disk device should definitely be able to
make use of it. Let's not make the mistake of changing the code from
being too format specific to being too device specific. It belongs in
block.c.
> Of course using a sane filesystem in the guest would also fix it,
> but the point of virtualization is at least partially to keep all
> that old crap working nicely.
Not sure what your definition of "old crap" is, but ext3 seems to meet
it. I don't think it's irrelevant.
Kevin
* Re: [Qemu-devel] [PATCH 0/3] block: Handle multiple write requests at once
2009-09-02 7:27 ` Kevin Wolf
@ 2009-09-02 15:43 ` Christoph Hellwig
2009-09-02 15:50 ` Kevin Wolf
0 siblings, 1 reply; 13+ messages in thread
From: Christoph Hellwig @ 2009-09-02 15:43 UTC (permalink / raw)
To: Kevin Wolf; +Cc: Christoph Hellwig, qemu-devel
On Wed, Sep 02, 2009 at 09:27:11AM +0200, Kevin Wolf wrote:
> > Of course using a sane filesystem in the guest would also fix it,
> > but the point of virtualization is at least partially to keep all
> > that old crap working nicely.
>
> Not sure what your definition of "old crap" is, but ext3 seems to meet
> it. I don't think it's irrelevant.
Even ext3 should not do it, as the elevator would merge the requests.
Do you happen to run a kernel which accidentally turned the misnamed
and misguided SSD flag on for virtio? That would explain sending
lots of tiny I/Os. You can check this with:
cat /sys/block/vd*/queue/rotational
if a 0 turns up somewhere we have the problem.
* Re: [Qemu-devel] [PATCH 0/3] block: Handle multiple write requests at once
2009-09-02 15:43 ` Christoph Hellwig
@ 2009-09-02 15:50 ` Kevin Wolf
2009-09-02 17:26 ` Christoph Hellwig
0 siblings, 1 reply; 13+ messages in thread
From: Kevin Wolf @ 2009-09-02 15:50 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: qemu-devel
Christoph Hellwig wrote:
> On Wed, Sep 02, 2009 at 09:27:11AM +0200, Kevin Wolf wrote:
>>> Of course using a sane filesystem in the guest would also fix it,
>>> but the point of virtualization is at least partially to keep all
>>> that old crap working nicely.
>> Not sure what your definition of "old crap" is, but ext3 seems to meet
>> it. I don't think it's irrelevant.
>
> Even ext3 should not do it, as the elevator would merge the requests.
> Do you happen to run a kernel which accidentally turned the misnamed
> and misguided SSD flag on for virtio? That would explain sending
> lots of tiny I/Os. You can check this with:
>
> cat /sys/block/vd*/queue/rotational
>
> if a 0 turns up somewhere we have the problem.
It's a RHEL 5.3 installation, and the kernel doesn't seem to even
provide this file. However, I think I read that this is an issue of
relatively new kernels, so probably not?
What could matter, though, is that the installer uses LVM for the
standard layout which I'm using. Does this make a difference in that
respect?
Kevin
* Re: [Qemu-devel] [PATCH 0/3] block: Handle multiple write requests at once
2009-09-02 15:50 ` Kevin Wolf
@ 2009-09-02 17:26 ` Christoph Hellwig
0 siblings, 0 replies; 13+ messages in thread
From: Christoph Hellwig @ 2009-09-02 17:26 UTC (permalink / raw)
To: Kevin Wolf; +Cc: Christoph Hellwig, qemu-devel
On Wed, Sep 02, 2009 at 05:50:47PM +0200, Kevin Wolf wrote:
> It's a RHEL 5.3 installation, and the kernel doesn't seem to even
> provide this file. However, I think I read that this is an issue of
> relatively new kernels, so probably not?
Indeed.
> What could matter, though, is that the installer uses LVM for the
> standard layout which I'm using. Does this make a difference in that
> respect?
It should not matter. Then again it should not issue these I/O patterns
at all, but it does. Something is clearly wrong in those I/O patterns,
but given that they exist out there in the field, it seems like we'll have
to deal with it. I'll look over the patch a bit more carefully to see
if there is any obviously better way to handle it.