* [Qemu-devel] [PATCH for-1.2] qed: refuse unaligned zero writes with a backing file
@ 2012-08-28 13:04 Stefan Hajnoczi
2012-08-28 13:25 ` Paolo Bonzini
0 siblings, 1 reply; 7+ messages in thread
From: Stefan Hajnoczi @ 2012-08-28 13:04 UTC (permalink / raw)
To: qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini, Anthony Liguori, Stefan Hajnoczi
Zero writes have cluster granularity in QED. Therefore they can only be
used to zero entire clusters.
If the zero write request leaves sectors untouched, zeroing the entire
cluster would obscure the backing file. Instead return -ENOTSUP, which
is handled by block.c:bdrv_co_do_write_zeroes() and falls back to a
regular write.
The qemu-iotests 034 test cases covers this scenario.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
block/qed.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/block/qed.c b/block/qed.c
index a02dbfd..21cb239 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -1363,10 +1363,21 @@ static int coroutine_fn bdrv_qed_co_write_zeroes(BlockDriverState *bs,
int nb_sectors)
{
BlockDriverAIOCB *blockacb;
+ BDRVQEDState *s = bs->opaque;
QEDWriteZeroesCB cb = { .done = false };
QEMUIOVector qiov;
struct iovec iov;
+ /* Refuse if there are untouched backing file sectors */
+ if (bs->backing_hd) {
+ if (qed_offset_into_cluster(s, sector_num * BDRV_SECTOR_SIZE) != 0) {
+ return -ENOTSUP;
+ }
+ if (qed_offset_into_cluster(s, nb_sectors * BDRV_SECTOR_SIZE) != 0) {
+ return -ENOTSUP;
+ }
+ }
+
/* Zero writes start without an I/O buffer. If a buffer becomes necessary
* then it will be allocated during request processing.
*/
--
1.7.10.4
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [Qemu-devel] [PATCH for-1.2] qed: refuse unaligned zero writes with a backing file
2012-08-28 13:04 [Qemu-devel] [PATCH for-1.2] qed: refuse unaligned zero writes with a backing file Stefan Hajnoczi
@ 2012-08-28 13:25 ` Paolo Bonzini
2012-08-28 13:37 ` Stefan Hajnoczi
2012-08-28 13:38 ` Kevin Wolf
0 siblings, 2 replies; 7+ messages in thread
From: Paolo Bonzini @ 2012-08-28 13:25 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: Kevin Wolf, Anthony Liguori, qemu-devel
Il 28/08/2012 15:04, Stefan Hajnoczi ha scritto:
> Zero writes have cluster granularity in QED. Therefore they can only be
> used to zero entire clusters.
>
> If the zero write request leaves sectors untouched, zeroing the entire
> cluster would obscure the backing file. Instead return -ENOTSUP, which
> is handled by block.c:bdrv_co_do_write_zeroes() and falls back to a
> regular write.
>
> The qemu-iotests 034 test cases covers this scenario.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Makes sense since both streaming and copy-on-read will do cluster-aligned writes.
The "right fix" would not be much more complex though, something like this, right?
(untested).
diff --git a/block/qed.c b/block/qed.c
index a02dbfd..a885671 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -1133,8 +1133,14 @@ static void qed_aio_write_alloc(QEDAIOCB *acb, size_t len)
return;
}
+ if (qed_offset_into_cluster(s, acb->cur_pos) != 0 ||
+ qed_offset_into_cluster(s, acb->cur_pos + acb->cur_qiov.size) != 0) {
+ goto copy;
+ }
+
cb = qed_aio_write_zero_cluster;
} else {
+copy:
cb = qed_aio_write_prefill;
acb->cur_cluster = qed_alloc_clusters(s, acb->cur_nclusters);
}
Paolo
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [Qemu-devel] [PATCH for-1.2] qed: refuse unaligned zero writes with a backing file
2012-08-28 13:25 ` Paolo Bonzini
@ 2012-08-28 13:37 ` Stefan Hajnoczi
2012-08-28 14:23 ` Paolo Bonzini
2012-08-28 13:38 ` Kevin Wolf
1 sibling, 1 reply; 7+ messages in thread
From: Stefan Hajnoczi @ 2012-08-28 13:37 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: Kevin Wolf, Anthony Liguori, Stefan Hajnoczi, qemu-devel
On Tue, Aug 28, 2012 at 2:25 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Il 28/08/2012 15:04, Stefan Hajnoczi ha scritto:
>> Zero writes have cluster granularity in QED. Therefore they can only be
>> used to zero entire clusters.
>>
>> If the zero write request leaves sectors untouched, zeroing the entire
>> cluster would obscure the backing file. Instead return -ENOTSUP, which
>> is handled by block.c:bdrv_co_do_write_zeroes() and falls back to a
>> regular write.
>>
>> The qemu-iotests 034 test cases covers this scenario.
>
> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
>
> Makes sense since both streaming and copy-on-read will do cluster-aligned writes.
>
> The "right fix" would not be much more complex though, something like this, right?
> (untested).
Yes but it's more complicated. To do a really good job we should
slice off the first/last clusters if they are unaligned, handle them
like regular allocating writes, and handle the middle of the request
as a zero write.
I decided to do the simplest implementation since this scenario only
occurs in test cases, not real guests.
Stefan
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Qemu-devel] [PATCH for-1.2] qed: refuse unaligned zero writes with a backing file
2012-08-28 13:25 ` Paolo Bonzini
2012-08-28 13:37 ` Stefan Hajnoczi
@ 2012-08-28 13:38 ` Kevin Wolf
1 sibling, 0 replies; 7+ messages in thread
From: Kevin Wolf @ 2012-08-28 13:38 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: Anthony Liguori, Stefan Hajnoczi, qemu-devel
Am 28.08.2012 15:25, schrieb Paolo Bonzini:
> Il 28/08/2012 15:04, Stefan Hajnoczi ha scritto:
>> Zero writes have cluster granularity in QED. Therefore they can only be
>> used to zero entire clusters.
>>
>> If the zero write request leaves sectors untouched, zeroing the entire
>> cluster would obscure the backing file. Instead return -ENOTSUP, which
>> is handled by block.c:bdrv_co_do_write_zeroes() and falls back to a
>> regular write.
>>
>> The qemu-iotests 034 test cases covers this scenario.
>
> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Thanks, applied to the block branch.
> Makes sense since both streaming and copy-on-read will do cluster-aligned writes.
>
> The "right fix" would not be much more complex though, something like this, right?
> (untested).
I think Stefan's fix is the right one. It does the same thing as yours
and it's much simpler.
Kevin
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Qemu-devel] [PATCH for-1.2] qed: refuse unaligned zero writes with a backing file
2012-08-28 13:37 ` Stefan Hajnoczi
@ 2012-08-28 14:23 ` Paolo Bonzini
2012-08-29 7:53 ` Stefan Hajnoczi
0 siblings, 1 reply; 7+ messages in thread
From: Paolo Bonzini @ 2012-08-28 14:23 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: Kevin Wolf, Anthony Liguori, Stefan Hajnoczi, qemu-devel
[-- Attachment #1: Type: text/plain, Size: 901 bytes --]
Il 28/08/2012 15:37, Stefan Hajnoczi ha scritto:
>> > The "right fix" would not be much more complex though, something like this, right?
>> > (untested).
> Yes but it's more complicated. To do a really good job we should
> slice off the first/last clusters if they are unaligned, handle them
> like regular allocating writes, and handle the middle of the request
> as a zero write.
>
> I decided to do the simplest implementation since this scenario only
> occurs in test cases, not real guests.
Yes, I was curious because it reminded me of the patch I did to write
zeroes when I was playing with discard to avoid the large bounce buffer
in qed_aio_write_inplace. That patch takes care of processing clusters
one by one (though that means one L2 write for each and every cluster,
not just the first and last).
It probably causes a performance hit, but anyway I attach it for
completeness.
Paolo
[-- Attachment #2: wz.patch --]
[-- Type: text/x-patch, Size: 7644 bytes --]
>From 98f2978ae5d0f34dca0369fcc727d1e533c0e6b0 Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini@redhat.com>
Date: Thu, 8 Mar 2012 14:48:29 +0100
Subject: [PATCH 1/2] qed: make write-zeroes bounce buffer smaller than a
single cluster
Currently, a write-zeroes operation could allocates memory for the whole
I/O operation if it is not aligned. This is not necessary, because only
two unaligned clusters could be written.
This makes the write-zeroes operation proceed one cluster at a time,
even if all clusters are currently available and zero. This does cause
worse performance due to multiple L2 table writes. However, if zero-write
detection is enabled it means we're not interested in maximum performance.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
block/qed.c | 75 +++++++++++++++++++++++++++++++++++++------------------------
1 file modificato, 46 inserzioni(+), 29 rimozioni(-)
diff --git a/block/qed.c b/block/qed.c
index a02dbfd..bcea346 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -869,7 +869,9 @@ static void qed_aio_complete(QEDAIOCB *acb, int ret)
/* Free the buffer we may have allocated for zero writes */
if (acb->flags & QED_AIOCB_ZERO) {
qemu_vfree(acb->qiov->iov[0].iov_base);
- acb->qiov->iov[0].iov_base = NULL;
+ qemu_iovec_destroy(acb->qiov);
+ g_free(acb->qiov);
+ acb->qiov = NULL;
}
/* Arrange for a bh to invoke the completion function */
@@ -1096,6 +1098,34 @@ static void qed_aio_write_zero_cluster(void *opaque, int ret)
}
/**
+ * Calculate the I/O vector
+ *
+ * @acb: Write request
+ * @len: Length in bytes
+ */
+static void qed_prepare_qiov(QEDAIOCB *acb, size_t len)
+{
+ /* Calculate the I/O vector */
+ if (acb->flags & QED_AIOCB_ZERO) {
+ /* Allocate buffer for zero writes */
+ if (!acb->qiov) {
+ BDRVQEDState *s = acb_to_s(acb);
+ char *base;
+
+ acb->qiov = g_malloc(sizeof(QEMUIOVector));
+ base = qemu_blockalign(s->bs, s->header.cluster_size);
+ qemu_iovec_init(acb->qiov, 1);
+ qemu_iovec_add(acb->qiov, base, s->header.cluster_size);
+ memset(base, 0, s->header.cluster_size);
+ }
+ assert(len <= acb->qiov->size);
+ qemu_iovec_add(&acb->cur_qiov, acb->qiov->iov[0].iov_base, len);
+ } else {
+ qemu_iovec_concat(&acb->cur_qiov, acb->qiov, acb->qiov_offset, len);
+ }
+}
+
+/**
* Write new data cluster
*
* @acb: Write request
@@ -1124,21 +1154,20 @@ static void qed_aio_write_alloc(QEDAIOCB *acb, size_t len)
acb->cur_nclusters = qed_bytes_to_clusters(s,
qed_offset_into_cluster(s, acb->cur_pos) + len);
- qemu_iovec_concat(&acb->cur_qiov, acb->qiov, acb->qiov_offset, len);
if (acb->flags & QED_AIOCB_ZERO) {
/* Skip ahead if the clusters are already zero */
if (acb->find_cluster_ret == QED_CLUSTER_ZERO) {
- qed_aio_next_io(acb, 0);
- return;
+ cb = qed_aio_next_io;
+ } else {
+ cb = qed_aio_write_zero_cluster;
}
-
- cb = qed_aio_write_zero_cluster;
} else {
cb = qed_aio_write_prefill;
acb->cur_cluster = qed_alloc_clusters(s, acb->cur_nclusters);
}
+ qed_prepare_qiov(acb, len);
if (qed_should_set_need_check(s)) {
s->header.features |= QED_F_NEED_CHECK;
qed_write_header(s, cb, acb);
@@ -1158,19 +1187,8 @@ static void qed_aio_write_alloc(QEDAIOCB *acb, size_t len)
*/
static void qed_aio_write_inplace(QEDAIOCB *acb, uint64_t offset, size_t len)
{
- /* Allocate buffer for zero writes */
- if (acb->flags & QED_AIOCB_ZERO) {
- struct iovec *iov = acb->qiov->iov;
-
- if (!iov->iov_base) {
- iov->iov_base = qemu_blockalign(acb->common.bs, iov->iov_len);
- memset(iov->iov_base, 0, iov->iov_len);
- }
- }
-
- /* Calculate the I/O vector */
acb->cur_cluster = offset;
- qemu_iovec_concat(&acb->cur_qiov, acb->qiov, acb->qiov_offset, len);
+ qed_prepare_qiov(acb, len);
/* Do the actual write */
qed_aio_write_main(acb, 0);
@@ -1270,6 +1288,7 @@ static void qed_aio_next_io(void *opaque, int ret)
{
QEDAIOCB *acb = opaque;
BDRVQEDState *s = acb_to_s(acb);
+ uint64_t len;
QEDFindClusterFunc *io_fn = (acb->flags & QED_AIOCB_WRITE) ?
qed_aio_write_data : qed_aio_read_data;
@@ -1291,10 +1310,14 @@ static void qed_aio_next_io(void *opaque, int ret)
return;
}
+ /* Limit buffer size to one cluster when writing zeroes. */
+ len = acb->end_pos - acb->cur_pos;
+ if (acb->flags & QED_AIOCB_ZERO) {
+ len = MIN(len, s->header.cluster_size);
+ }
+
/* Find next cluster and start I/O */
- qed_find_cluster(s, &acb->request,
- acb->cur_pos, acb->end_pos - acb->cur_pos,
- io_fn, acb);
+ qed_find_cluster(s, &acb->request, acb->cur_pos, len, io_fn, acb);
}
static BlockDriverAIOCB *qed_aio_setup(BlockDriverState *bs,
@@ -1315,7 +1338,7 @@ static BlockDriverAIOCB *qed_aio_setup(BlockDriverState *bs,
acb->cur_pos = (uint64_t)sector_num * BDRV_SECTOR_SIZE;
acb->end_pos = acb->cur_pos + nb_sectors * BDRV_SECTOR_SIZE;
acb->request.l2_table = NULL;
- qemu_iovec_init(&acb->cur_qiov, qiov->niov);
+ qemu_iovec_init(&acb->cur_qiov, qiov ? qiov->niov : 1);
/* Start request */
qed_aio_next_io(acb, 0);
@@ -1364,17 +1387,11 @@ static int coroutine_fn bdrv_qed_co_write_zeroes(BlockDriverState *bs,
{
BlockDriverAIOCB *blockacb;
QEDWriteZeroesCB cb = { .done = false };
- QEMUIOVector qiov;
- struct iovec iov;
/* Zero writes start without an I/O buffer. If a buffer becomes necessary
* then it will be allocated during request processing.
*/
- iov.iov_base = NULL,
- iov.iov_len = nb_sectors * BDRV_SECTOR_SIZE,
-
- qemu_iovec_init_external(&qiov, &iov, 1);
- blockacb = qed_aio_setup(bs, sector_num, &qiov, nb_sectors,
+ blockacb = qed_aio_setup(bs, sector_num, NULL, nb_sectors,
qed_co_write_zeroes_cb, &cb,
QED_AIOCB_WRITE | QED_AIOCB_ZERO);
if (!blockacb) {
--
1.7.11.2
>From c74de5ae7650233574fb1572bc3b463864af8a3e Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini@redhat.com>
Date: Tue, 28 Aug 2012 16:09:23 +0200
Subject: [PATCH 2/2] qed: do copy-on-write for the first and last cluster in
a misaligned write-zero request
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
block/qed.c | 11 +++++++++++
1 file modificato, 11 inserzioni(+)
diff --git a/block/qed.c b/block/qed.c
index bcea346..0f8d06c 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -1160,9 +1160,20 @@ static void qed_aio_write_alloc(QEDAIOCB *acb, size_t len)
if (acb->find_cluster_ret == QED_CLUSTER_ZERO) {
cb = qed_aio_next_io;
} else {
+ /* For misaligned requests, do normal copy-on-write for the first
+ * and last cluster.
+ */
+ unsigned offset = qed_offset_into_cluster(s, acb->cur_pos);
+ if (offset != 0 || qed_offset_into_cluster(s, offset + len) != 0) {
+ acb->cur_nclusters = 1;
+ len = MIN(len, s->header.cluster_size - offset);
+ goto copy;
+ }
+
cb = qed_aio_write_zero_cluster;
}
} else {
+copy:
cb = qed_aio_write_prefill;
acb->cur_cluster = qed_alloc_clusters(s, acb->cur_nclusters);
}
--
1.7.11.2
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [Qemu-devel] [PATCH for-1.2] qed: refuse unaligned zero writes with a backing file
2012-08-28 14:23 ` Paolo Bonzini
@ 2012-08-29 7:53 ` Stefan Hajnoczi
2012-08-29 9:09 ` Paolo Bonzini
0 siblings, 1 reply; 7+ messages in thread
From: Stefan Hajnoczi @ 2012-08-29 7:53 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: Kevin Wolf, Stefan Hajnoczi, Anthony Liguori, qemu-devel
On Tue, Aug 28, 2012 at 04:23:31PM +0200, Paolo Bonzini wrote:
> Il 28/08/2012 15:37, Stefan Hajnoczi ha scritto:
> >> > The "right fix" would not be much more complex though, something like this, right?
> >> > (untested).
> > Yes but it's more complicated. To do a really good job we should
> > slice off the first/last clusters if they are unaligned, handle them
> > like regular allocating writes, and handle the middle of the request
> > as a zero write.
> >
> > I decided to do the simplest implementation since this scenario only
> > occurs in test cases, not real guests.
>
> Yes, I was curious because it reminded me of the patch I did to write
> zeroes when I was playing with discard to avoid the large bounce buffer
> in qed_aio_write_inplace. That patch takes care of processing clusters
> one by one (though that means one L2 write for each and every cluster,
> not just the first and last).
>
> It probably causes a performance hit, but anyway I attach it for
> completeness.
Thanks for sharing. I think this patch could be used as the basis for
something that handles the first and last clusters one-at-a-time and
does the middle clusters in a single L2 update.
I'm not going to implement that right now because I prefer the simple
solution unless this code path becomes more used.
Stefan
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Qemu-devel] [PATCH for-1.2] qed: refuse unaligned zero writes with a backing file
2012-08-29 7:53 ` Stefan Hajnoczi
@ 2012-08-29 9:09 ` Paolo Bonzini
0 siblings, 0 replies; 7+ messages in thread
From: Paolo Bonzini @ 2012-08-29 9:09 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: Kevin Wolf, Stefan Hajnoczi, Anthony Liguori, qemu-devel
Il 29/08/2012 09:53, Stefan Hajnoczi ha scritto:
> I think this patch could be used as the basis for
> something that handles the first and last clusters one-at-a-time and
> does the middle clusters in a single L2 update.
>
> I'm not going to implement that right now because I prefer the simple
> solution unless this code path becomes more used.
Agreed, thanks!
Paolo
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2012-08-29 9:09 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-08-28 13:04 [Qemu-devel] [PATCH for-1.2] qed: refuse unaligned zero writes with a backing file Stefan Hajnoczi
2012-08-28 13:25 ` Paolo Bonzini
2012-08-28 13:37 ` Stefan Hajnoczi
2012-08-28 14:23 ` Paolo Bonzini
2012-08-29 7:53 ` Stefan Hajnoczi
2012-08-29 9:09 ` Paolo Bonzini
2012-08-28 13:38 ` Kevin Wolf
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).