* [Qemu-devel] [PATCH] qcow2: Bring synchronous read/write back to life
@ 2009-10-08 13:02 Kevin Wolf
From: Kevin Wolf @ 2009-10-08 13:02 UTC
To: qemu-devel; +Cc: Kevin Wolf
When the synchronous read and write functions were dropped, they were replaced
by generic emulation functions. Unfortunately, these emulation functions don't
provide the same semantics as the original functions did.
The original bdrv_read would mean that we read some data synchronously and that
we won't be interrupted during this read. The latter assumption is no longer
true with the emulation function, which needs to use qemu_aio_poll and therefore
allows the callback of any other concurrent AIO request to be run during the
read. That in turn means that (meta)data read earlier could have changed and
be invalid now. qcow2 is not prepared to work this way, and it's just scary
how many places there are where other requests could run.
I'm not sure yet where exactly it breaks, but you'll see breakage with virtio
on qcow2 with a backing file. Providing synchronous functions again fixes the
problem for me.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
block/qcow2-cluster.c | 6 ++--
block/qcow2.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++-
block/qcow2.h | 3 ++
3 files changed, 55 insertions(+), 5 deletions(-)
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index e444e53..a7de820 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -306,8 +306,8 @@ void qcow2_encrypt_sectors(BDRVQcowState *s, int64_t sector_num,
}
-static int qcow_read(BlockDriverState *bs, int64_t sector_num,
- uint8_t *buf, int nb_sectors)
+int qcow2_read(BlockDriverState *bs, int64_t sector_num, uint8_t *buf,
+ int nb_sectors)
{
BDRVQcowState *s = bs->opaque;
int ret, index_in_cluster, n, n1;
@@ -358,7 +358,7 @@ static int copy_sectors(BlockDriverState *bs, uint64_t start_sect,
n = n_end - n_start;
if (n <= 0)
return 0;
- ret = qcow_read(bs, start_sect + n_start, s->cluster_data, n);
+ ret = qcow2_read(bs, start_sect + n_start, s->cluster_data, n);
if (ret < 0)
return ret;
if (s->crypt_method) {
diff --git a/block/qcow2.c b/block/qcow2.c
index a9e7682..52584ed 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -934,6 +934,51 @@ static int qcow_make_empty(BlockDriverState *bs)
return 0;
}
+static int qcow2_write(BlockDriverState *bs, int64_t sector_num,
+ const uint8_t *buf, int nb_sectors)
+{
+ BDRVQcowState *s = bs->opaque;
+ int ret, index_in_cluster, n;
+ uint64_t cluster_offset;
+ int n_end;
+ QCowL2Meta l2meta;
+
+ while (nb_sectors > 0) {
+ memset(&l2meta, 0, sizeof(l2meta));
+
+ index_in_cluster = sector_num & (s->cluster_sectors - 1);
+ n_end = index_in_cluster + nb_sectors;
+ if (s->crypt_method &&
+ n_end > QCOW_MAX_CRYPT_CLUSTERS * s->cluster_sectors)
+ n_end = QCOW_MAX_CRYPT_CLUSTERS * s->cluster_sectors;
+ cluster_offset = qcow2_alloc_cluster_offset(bs, sector_num << 9,
+ index_in_cluster,
+ n_end, &n, &l2meta);
+ if (!cluster_offset)
+ return -1;
+ if (s->crypt_method) {
+ qcow2_encrypt_sectors(s, sector_num, s->cluster_data, buf, n, 1,
+ &s->aes_encrypt_key);
+ ret = bdrv_pwrite(s->hd, cluster_offset + index_in_cluster * 512,
+ s->cluster_data, n * 512);
+ } else {
+ ret = bdrv_pwrite(s->hd, cluster_offset + index_in_cluster * 512, buf, n * 512);
+ }
+ if (ret != n * 512 || qcow2_alloc_cluster_link_l2(bs, cluster_offset, &l2meta) < 0) {
+ qcow2_free_any_clusters(bs, cluster_offset, l2meta.nb_clusters);
+ return -1;
+ }
+ nb_sectors -= n;
+ sector_num += n;
+ buf += n * 512;
+ if (l2meta.nb_clusters != 0) {
+ QLIST_REMOVE(&l2meta, next_in_flight);
+ }
+ }
+ s->cluster_cache_offset = -1; /* disable compressed cache */
+ return 0;
+}
+
/* XXX: put compressed sectors first, then all the cluster aligned
tables to avoid losing bytes in alignment */
static int qcow_write_compressed(BlockDriverState *bs, int64_t sector_num,
@@ -1121,8 +1166,10 @@ static BlockDriver bdrv_qcow2 = {
.bdrv_set_key = qcow_set_key,
.bdrv_make_empty = qcow_make_empty,
- .bdrv_aio_readv = qcow_aio_readv,
- .bdrv_aio_writev = qcow_aio_writev,
+ .bdrv_read = qcow2_read,
+ .bdrv_write = qcow2_write,
+ .bdrv_aio_readv = qcow_aio_readv,
+ .bdrv_aio_writev = qcow_aio_writev,
.bdrv_write_compressed = qcow_write_compressed,
.bdrv_snapshot_create = qcow2_snapshot_create,
diff --git a/block/qcow2.h b/block/qcow2.h
index 26ab5d9..d3f690a 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -202,6 +202,9 @@ uint64_t qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, uint64_t cluster_offset,
QCowL2Meta *m);
+int qcow2_read(BlockDriverState *bs, int64_t sector_num, uint8_t *buf,
+ int nb_sectors);
+
/* qcow2-snapshot.c functions */
int qcow2_snapshot_create(BlockDriverState *bs, QEMUSnapshotInfo *sn_info);
int qcow2_snapshot_goto(BlockDriverState *bs, const char *snapshot_id);
--
1.6.2.5
* Re: [Qemu-devel] [PATCH] qcow2: Bring synchronous read/write back to life
From: Anthony Liguori @ 2009-10-08 14:30 UTC
To: Kevin Wolf; +Cc: qemu-devel
Kevin Wolf wrote:
> When the synchronous read and write functions were dropped, they were replaced
> by generic emulation functions. Unfortunately, these emulation functions don't
> provide the same semantics as the original functions did.
>
> The original bdrv_read would mean that we read some data synchronously and that
> we won't be interrupted during this read. The latter assumption is no longer
> true with the emulation function which needs to use qemu_aio_poll and therefore
> allows the callback of any other concurrent AIO request to be run during the
> read.
Perhaps you could create a mechanism to freeze the qcow2 image by
queuing all completions within qcow2 until the image was unfrozen. This
would have the same effect as switching to synchronous read/write.
You may also have to queue new read/write requests...
Introducing sync read/write seems like a major step backwards to me.
Regards,
Anthony Liguori
* Re: [Qemu-devel] [PATCH] qcow2: Bring synchronous read/write back to life
From: Kevin Wolf @ 2009-10-08 14:47 UTC
To: Anthony Liguori; +Cc: qemu-devel
Am 08.10.2009 16:30, schrieb Anthony Liguori:
> Kevin Wolf wrote:
>> When the synchronous read and write functions were dropped, they were replaced
>> by generic emulation functions. Unfortunately, these emulation functions don't
>> provide the same semantics as the original functions did.
>>
>> The original bdrv_read would mean that we read some data synchronously and that
>> we won't be interrupted during this read. The latter assumption is no longer
>> true with the emulation function which needs to use qemu_aio_poll and therefore
>> allows the callback of any other concurrent AIO request to be run during the
>> read.
>
> Perhaps you could create a mechanism to freeze the qcow2 image by
> queuing all completions within qcow2 until the image was unfrozen. This
> would have the same effect as switching to synchronous read/write.
>
> You may also have to queue new read/write requests...
>
> Introducing sync read/write seems like a major step backwards to me.
Right, I was expecting your reaction. ;-) I even agree that it's not
nice to have the synchronous functions back. But removing them caused a
regression, so the removal should be reverted until it is done right.
I just want to make clear that we're talking about data corruption here.
This is not just something that we can care about when we are bored some
time in the future.
For stable, I think taking this patch is the only reasonable thing to
do. Any other solution would be way too invasive. For master we can
discuss other solutions. However, I really don't feel like hacking
around it with a quick fix and breaking other stuff, so this takes a bit
more time. For the meantime I would prefer committing this to master, too.
By the way, I don't think queuing things in qcow2 is the right thing to
do. It probably wouldn't even help if, say, VMDK implemented AIO and I
used a VMDK image as a backing file for qcow2. The real solution is to
fix the generic bdrv_read/write emulation. Now we just need to find the
best way to do this.
Kevin
* Re: [Qemu-devel] [PATCH] qcow2: Bring synchronous read/write back to life
From: Anthony Liguori @ 2009-10-08 15:01 UTC
To: Kevin Wolf; +Cc: qemu-devel
Kevin Wolf wrote:
> Am 08.10.2009 16:30, schrieb Anthony Liguori:
>
>> Kevin Wolf wrote:
>>
>>> When the synchronous read and write functions were dropped, they were replaced
>>> by generic emulation functions. Unfortunately, these emulation functions don't
>>> provide the same semantics as the original functions did.
>>>
>>> The original bdrv_read would mean that we read some data synchronously and that
>>> we won't be interrupted during this read. The latter assumption is no longer
>>> true with the emulation function which needs to use qemu_aio_poll and therefore
>>> allows the callback of any other concurrent AIO request to be run during the
>>> read.
>>>
>> Perhaps you could create a mechanism to freeze the qcow2 image by
>> queuing all completions within qcow2 until the image was unfrozen. This
>> would have the same effect as switching to synchronous read/write.
>>
>> You may also have to queue new read/write requests...
>>
>> Introducing sync read/write seems like a major step backwards to me.
>>
>
> Right, I was expecting your reaction. ;-) I do even agree that it's not
> nice to have the synchronous functions back. But removing them caused a
> regression, so the removal should be reverted until it is done right.
>
> I just want to make clear that we're talking about data corruption here.
> This is not just something that we can care about when we are bored some
> time in the future.
>
Yeah, okay. Can we do a more direct revert though so that it's clearer
in the commit log?
> By the way, I don't think queuing things in qcow2 is the right thing to
> do. It probably wouldn't even help if, say, VMDK implemented AIO and I
> used a VMDK image as a backing file for qcow2. The real solution is to
> fix the generic bdrv_read/write emulation. Now we just need to find the
> best way to do this.
>
Ideally, you could say qemu_aio_wait(aiocb) and it would only wait for
that particular request. I'm not convinced, though, that there aren't
dependency issues, since we can generate synthetic aiocbs.
Regards,
Anthony Liguori
* Re: [Qemu-devel] [PATCH] qcow2: Bring synchronous read/write back to life
From: Kevin Wolf @ 2009-10-08 15:23 UTC
To: Anthony Liguori; +Cc: qemu-devel
Am 08.10.2009 17:01, schrieb Anthony Liguori:
> Kevin Wolf wrote:
>> Am 08.10.2009 16:30, schrieb Anthony Liguori:
>>
>>> Kevin Wolf wrote:
>>>
>>>> When the synchronous read and write functions were dropped, they were replaced
>>>> by generic emulation functions. Unfortunately, these emulation functions don't
>>>> provide the same semantics as the original functions did.
>>>>
>>>> The original bdrv_read would mean that we read some data synchronously and that
>>>> we won't be interrupted during this read. The latter assumption is no longer
>>>> true with the emulation function which needs to use qemu_aio_poll and therefore
>>>> allows the callback of any other concurrent AIO request to be run during the
>>>> read.
>>>>
>>> Perhaps you could create a mechanism to freeze the qcow2 image by
>>> queuing all completions within qcow2 until the image was unfrozen. This
>>> would have the same effect as switching to synchronous read/write.
>>>
>>> You may also have to queue new read/write requests...
>>>
>>> Introducing sync read/write seems like a major step backwards to me.
>>>
>>
>> Right, I was expecting your reaction. ;-) I do even agree that it's not
>> nice to have the synchronous functions back. But removing them caused a
>> regression, so the removal should be reverted until it is done right.
>>
>> I just want to make clear that we're talking about data corruption here.
>> This is not just something that we can care about when we are bored some
>> time in the future.
>>
>
> Yeah, okay. Can we do a more direct revert though so that it's clearer
> in the commit log?
Hm, it's not a single commit to revert. The main part is that we need
qcow_write back which was removed in ade40677. The removal of the
synchronous functions from the BlockDriver must have happened longer ago.
And then I needed to put one or two changes on top to make it work with
the changes since it was dropped.
>> By the way, I don't think queuing things in qcow2 is the right thing to
>> do. It probably wouldn't even help if, say, VMDK implemented AIO and I
>> used a VMDK image as a backing file for qcow2. The real solution is to
>> fix the generic bdrv_read/write emulation. Now we just need to find the
>> best way to do this.
>
> Ideally, you could say qemu_aio_wait(aiocb) and it would only wait for
> that particular request. I'm not convinced, though, that there aren't
> dependency issues, since we can generate synthetic aiocbs.
You can't only wait for this single request but you also need to allow
waiting for all requests that were submitted while processing the
request you're waiting for. I think you need to create something like a
new context for all requests that are newly started.
Kevin
* Re: [Qemu-devel] [PATCH] qcow2: Bring synchronous read/write back to life
From: Mark McLoughlin @ 2009-10-08 18:47 UTC
To: Anthony Liguori; +Cc: Kevin Wolf, qemu-devel
On Thu, 2009-10-08 at 10:01 -0500, Anthony Liguori wrote:
> Kevin Wolf wrote:
> > Am 08.10.2009 16:30, schrieb Anthony Liguori:
> >
> >> Kevin Wolf wrote:
> >>
> >>> When the synchronous read and write functions were dropped, they were replaced
> >>> by generic emulation functions. Unfortunately, these emulation functions don't
> >>> provide the same semantics as the original functions did.
> >>>
> >>> The original bdrv_read would mean that we read some data synchronously and that
> >>> we won't be interrupted during this read. The latter assumption is no longer
> >>> true with the emulation function which needs to use qemu_aio_poll and therefore
> >>> allows the callback of any other concurrent AIO request to be run during the
> >>> read.
> >>>
> >> Perhaps you could create a mechanism to freeze the qcow2 image by
> >> queuing all completions within qcow2 until the image was unfrozen. This
> >> would have the same effect as switching to synchronous read/write.
> >>
> >> You may also have to queue new read/write requests...
> >>
> >> Introducing sync read/write seems like a major step backwards to me.
> >>
> >
> > Right, I was expecting your reaction. ;-) I do even agree that it's not
> > nice to have the synchronous functions back. But removing them caused a
> > regression, so the removal should be reverted until it is done right.
> >
> > I just want to make clear that we're talking about data corruption here.
> > This is not just something that we can care about when we are bored some
> > time in the future.
> >
>
> Yeah, okay. Can we do a more direct revert though so that it's clearer
> in the commit log?
FWIW, here's the Fedora 12 (qemu-kvm-0.11.0) report on this:
https://bugzilla.redhat.com/524734
Cheers,
Mark.
* Re: [Qemu-devel] [PATCH] qcow2: Bring synchronous read/write back to life
From: Jamie Lokier @ 2009-10-08 15:28 UTC
To: Kevin Wolf; +Cc: qemu-devel
Kevin Wolf wrote:
> The original bdrv_read would mean that we read some data
> synchronously and that we won't be interrupted during this read. The
> latter assumption is no longer true with the emulation function
> which needs to use qemu_aio_poll and therefore allows the callback
> of any other concurrent AIO request to be run during the read. That
> in turn means that (meta)data read earlier could have changed and be
> invalid now.
I'm not sure if I understand your description, specifically
"(meta)data read earlier could have changed and be invalid now".
Do you mean:
Async call into qcow2 #2
------------------------
issues a request with bdrv_read/write
Async call into qcow2 #1
------------------------
reads some metadata from memory (**)
does some calculations
issues a request with bdrv_read/write
the request completes
updates some metadata in memory
async call finished with result
the request completes
updates some metadata in memory
.... ERROR, memory isn't what it was at point (**)
Thanks,
-- Jamie
* Re: [Qemu-devel] [PATCH] qcow2: Bring synchronous read/write back to life
From: Kevin Wolf @ 2009-10-08 15:47 UTC
To: Jamie Lokier; +Cc: qemu-devel
Am 08.10.2009 17:28, schrieb Jamie Lokier:
> Kevin Wolf wrote:
>> The original bdrv_read would mean that we read some data
>> synchronously and that we won't be interrupted during this read. The
>> latter assumption is no longer true with the emulation function
>> which needs to use qemu_aio_poll and therefore allows the callback
>> of any other concurrent AIO request to be run during the read. That
>> in turn means that (meta)data read earlier could have changed and be
>> invalid now.
>
> I'm not sure if I understand your description, specifically
> "(meta)data read earlier could have changed and be invalid now".
>
> Do you mean:
No, it's not exactly what I meant. But you're right, your scenario could
happen, too. If global state is changed in an AIO callback called during
bdrv_read/write (e.g. a new L2 table is loaded into the cache), bad
things are almost guaranteed to happen.
What I was thinking of is:
>
> Async call into qcow2 #2
> ------------------------
>
> issues a request with bdrv_read/write
>
> Async call into qcow2 #1
> ------------------------
> reads some metadata from memory (**)
#1 reads some metadata from disk (from the image file)
> does some calculations
> issues a request with bdrv_read/write
>
> the request completes
> updates some metadata in memory
#2 updates the metadata in the file
> async call finished with result
>
> the request completes
> updates some metadata in memory
> .... ERROR, memory isn't what it was at point (**)
#1 still uses the old metadata which it had loaded into memory
(specifically those parts on the stack).
Also, I was thinking of #2 as a regular AIO write (no bdrv_write
involved), but again, your version could work as well. As I said, I don't
know yet which of them really happens.
Kevin