* [Qemu-devel] [PATCH 1/8] block: add bdrv_aio_stream
2011-04-27 13:27 [Qemu-devel] [RFC PATCH 0/8] QED image streaming Stefan Hajnoczi
@ 2011-04-27 13:27 ` Stefan Hajnoczi
2011-04-29 11:56 ` Kevin Wolf
2011-04-27 13:27 ` [Qemu-devel] [PATCH 2/8] qmp: Add QMP support for stream commands Stefan Hajnoczi
` (7 subsequent siblings)
8 siblings, 1 reply; 20+ messages in thread
From: Stefan Hajnoczi @ 2011-04-27 13:27 UTC (permalink / raw)
To: qemu-devel; +Cc: Kevin Wolf, Anthony Liguori
From: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
---
block.c | 32 ++++++++++++++++++++++++++++++++
block.h | 2 ++
block_int.h | 3 +++
3 files changed, 37 insertions(+), 0 deletions(-)
diff --git a/block.c b/block.c
index f731c7a..5e3476c 100644
--- a/block.c
+++ b/block.c
@@ -2248,6 +2248,38 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
return ret;
}
+/**
+ * Attempt to stream an image starting from sector_num.
+ *
+ * @sector_num - the first sector to start streaming from
+ * @cb - block completion callback
+ * @opaque - data to pass completion callback
+ *
+ * Returns NULL if the image format does not support streaming, the image is
+ * read-only, or no image is open.
+ *
+ * The intention of this function is for a user to execute it once with a
+ * sector_num of 0 and then upon receiving a completion callback, to remember
+ * the number of sectors "streamed", and then to call this function again with
+ * an offset adjusted by the number of sectors previously streamed.
+ *
+ * This allows a user to progressively stream in an image at a pace that makes
+ * sense. In general, this function tries to do the smallest amount of I/O
+ * possible to do some useful work.
+ *
+ * This function only really makes sense in combination with a block format
+ * that supports copy on read and has it enabled. If copy on read is not
+ * enabled, a block format driver may return NULL.
+ */
+BlockDriverAIOCB *bdrv_aio_stream(BlockDriverState *bs, int64_t sector_num,
+ BlockDriverCompletionFunc *cb, void *opaque)
+{
+ if (!bs->drv || bs->read_only || !bs->drv->bdrv_aio_stream) {
+ return NULL;
+ }
+
+ return bs->drv->bdrv_aio_stream(bs, sector_num, cb, opaque);
+}
typedef struct MultiwriteCB {
int error;
diff --git a/block.h b/block.h
index 52e9cad..fad828a 100644
--- a/block.h
+++ b/block.h
@@ -119,6 +119,8 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
BlockDriverCompletionFunc *cb, void *opaque);
BlockDriverAIOCB *bdrv_aio_flush(BlockDriverState *bs,
BlockDriverCompletionFunc *cb, void *opaque);
+BlockDriverAIOCB *bdrv_aio_stream(BlockDriverState *bs, int64_t sector_num,
+ BlockDriverCompletionFunc *cb, void *opaque);
void bdrv_aio_cancel(BlockDriverAIOCB *acb);
typedef struct BlockRequest {
diff --git a/block_int.h b/block_int.h
index 545ad11..0c125d0 100644
--- a/block_int.h
+++ b/block_int.h
@@ -73,6 +73,9 @@ struct BlockDriver {
BlockDriverCompletionFunc *cb, void *opaque);
BlockDriverAIOCB *(*bdrv_aio_flush)(BlockDriverState *bs,
BlockDriverCompletionFunc *cb, void *opaque);
+ BlockDriverAIOCB *(*bdrv_aio_stream)(BlockDriverState *bs,
+ int64_t sector_num,
+ BlockDriverCompletionFunc *cb, void *opaque);
int (*bdrv_discard)(BlockDriverState *bs, int64_t sector_num,
int nb_sectors);
--
1.7.4.4
* Re: [Qemu-devel] [PATCH 1/8] block: add bdrv_aio_stream
2011-04-27 13:27 ` [Qemu-devel] [PATCH 1/8] block: add bdrv_aio_stream Stefan Hajnoczi
@ 2011-04-29 11:56 ` Kevin Wolf
2011-05-06 13:21 ` Stefan Hajnoczi
0 siblings, 1 reply; 20+ messages in thread
From: Kevin Wolf @ 2011-04-29 11:56 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: Anthony Liguori, qemu-devel
On 27.04.2011 15:27, Stefan Hajnoczi wrote:
> From: Anthony Liguori <aliguori@us.ibm.com>
>
> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
> ---
> block.c | 32 ++++++++++++++++++++++++++++++++
> block.h | 2 ++
> block_int.h | 3 +++
> 3 files changed, 37 insertions(+), 0 deletions(-)
>
> diff --git a/block.c b/block.c
> index f731c7a..5e3476c 100644
> --- a/block.c
> +++ b/block.c
> @@ -2248,6 +2248,38 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
> return ret;
> }
>
> +/**
> + * Attempt to stream an image starting from sector_num.
> + *
> + * @sector_num - the first sector to start streaming from
> + * @cb - block completion callback
> + * @opaque - data to pass completion callback
> + *
> + * Returns NULL if the image format does not support streaming, the image is
> + * read-only, or no image is open.
> + *
> + * The intention of this function is for a user to execute it once with a
> + * sector_num of 0 and then upon receiving a completion callback, to remember
> + * the number of sectors "streamed", and then to call this function again with
> + * an offset adjusted by the number of sectors previously streamed.
> + *
> + * This allows a user to progressively stream in an image at a pace that makes
> + * sense. In general, this function tries to do the smallest amount of I/O
> + * possible to do some useful work.
> + *
> + * This function only really makes sense in combination with a block format
> + * that supports copy on read and has it enabled. If copy on read is not
> + * enabled, a block format driver may return NULL.
> + */
> +BlockDriverAIOCB *bdrv_aio_stream(BlockDriverState *bs, int64_t sector_num,
> + BlockDriverCompletionFunc *cb, void *opaque)
I think bdrv_aio_stream is a bad name for this. It only becomes image
streaming because the caller repeatedly calls this function. What the
function really does is copying some data from the backing file into the
overlay image.
I'm not sure how the caller would know how many sectors have been
copied. A BlockDriverCompletionFunc usually returns 0 on success, did
you change it here to use positive numbers for something else? At least
this must be documented somewhere, but I would suggest to add a
nb_sectors argument instead so that the caller decides how many sectors
to copy.
If you say that it only makes sense with copy on read, should one think
of it as a read that throws the read data away? I think considering it a
copy function makes more sense and is independent of copy on read.
Kevin
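To illustrate the convention being questioned here, the following is a minimal, self-contained sketch (not QEMU code) of a completion callback whose `ret` argument is overloaded: negative values are treated as `-errno`, and non-negative values as the number of sectors copied. `StreamProgress`, `stream_progress_cb` and `SECTOR_SIZE` are hypothetical names introduced only for this example; the offset arithmetic mirrors what `do_stream_cb()` in patch 2/8 does with `BDRV_SECTOR_SIZE`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE 512 /* assumption: stand-in for BDRV_SECTOR_SIZE */

/* Hypothetical caller-side state; not a QEMU structure. */
typedef struct StreamProgress {
    int64_t offset; /* bytes streamed so far */
    int error;      /* 0, or a positive errno value */
} StreamProgress;

/* Same shape as a block completion callback: (opaque, ret). */
static void stream_progress_cb(void *opaque, int ret)
{
    StreamProgress *p = opaque;

    assert(p != NULL);
    if (ret < 0) {
        /* Failure: ret carries -errno, progress is left unchanged. */
        p->error = -ret;
        return;
    }
    /* Success: ret carries the number of sectors that were copied. */
    p->offset += (int64_t)ret * SECTOR_SIZE;
}
```

Because this overloading is invisible in the callback's type, it would need to be documented wherever the function is declared, which is the point raised above.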
* Re: [Qemu-devel] [PATCH 1/8] block: add bdrv_aio_stream
2011-04-29 11:56 ` Kevin Wolf
@ 2011-05-06 13:21 ` Stefan Hajnoczi
2011-05-06 13:36 ` Kevin Wolf
0 siblings, 1 reply; 20+ messages in thread
From: Stefan Hajnoczi @ 2011-05-06 13:21 UTC (permalink / raw)
To: Kevin Wolf; +Cc: Anthony Liguori, Stefan Hajnoczi, qemu-devel
On Fri, Apr 29, 2011 at 12:56 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> On 27.04.2011 15:27, Stefan Hajnoczi wrote:
>> +/**
>> + * Attempt to stream an image starting from sector_num.
>> + *
>> + * @sector_num - the first sector to start streaming from
>> + * @cb - block completion callback
>> + * @opaque - data to pass completion callback
>> + *
>> + * Returns NULL if the image format does not support streaming, the image is
>> + * read-only, or no image is open.
>> + *
>> + * The intention of this function is for a user to execute it once with a
>> + * sector_num of 0 and then upon receiving a completion callback, to remember
>> + * the number of sectors "streamed", and then to call this function again with
>> + * an offset adjusted by the number of sectors previously streamed.
>> + *
>> + * This allows a user to progressively stream in an image at a pace that makes
>> + * sense. In general, this function tries to do the smallest amount of I/O
>> + * possible to do some useful work.
>> + *
>> + * This function only really makes sense in combination with a block format
>> + * that supports copy on read and has it enabled. If copy on read is not
>> + * enabled, a block format driver may return NULL.
>> + */
>> +BlockDriverAIOCB *bdrv_aio_stream(BlockDriverState *bs, int64_t sector_num,
>> + BlockDriverCompletionFunc *cb, void *opaque)
>
> I think bdrv_aio_stream is a bad name for this. It only becomes image
> streaming because the caller repeatedly calls this function. What the
> function really does is copying some data from the backing file into the
> overlay image.
That's true but bdrv_aio_copy_from_backing_file() is a bit long. The
special thing about this operation is that it takes a starting
sector_num but no length. The callback receives the nb_sectors. So
this operation isn't an ordinary [start, length) copy either so
bdrv_aio_stream() isn't that bad?
> I'm not sure how the caller would know how many sectors have been
> copied. A BlockDriverCompletionFunc usually returns 0 on success, did
> you change it here to use positive numbers for something else? At least
> this must be documented somewhere, but I would suggest to add a
> nb_sectors argument instead so that the caller decides how many sectors
> to copy.
Yes, I agree that a separate nb_sectors argument would be clearer.
> If you say that it only makes sense with copy on read, should one think
> of it as a read that throws the read data away? I think considering it a
> copy function makes more sense and is independent of copy on read.
I actually think the copy-on-read statement is an implementation
detail. I can imagine doing essentially the same behavior without
exposing copy on read to the user. But in QED streaming is based on
copy-on-read. Let's remove this comment.
Stefan
* Re: [Qemu-devel] [PATCH 1/8] block: add bdrv_aio_stream
2011-05-06 13:21 ` Stefan Hajnoczi
@ 2011-05-06 13:36 ` Kevin Wolf
2011-05-06 15:47 ` Stefan Hajnoczi
0 siblings, 1 reply; 20+ messages in thread
From: Kevin Wolf @ 2011-05-06 13:36 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: Anthony Liguori, Stefan Hajnoczi, qemu-devel
On 06.05.2011 15:21, Stefan Hajnoczi wrote:
> On Fri, Apr 29, 2011 at 12:56 PM, Kevin Wolf <kwolf@redhat.com> wrote:
>> On 27.04.2011 15:27, Stefan Hajnoczi wrote:
>>> +/**
>>> + * Attempt to stream an image starting from sector_num.
>>> + *
>>> + * @sector_num - the first sector to start streaming from
>>> + * @cb - block completion callback
>>> + * @opaque - data to pass completion callback
>>> + *
>>> + * Returns NULL if the image format does not support streaming, the image is
>>> + * read-only, or no image is open.
>>> + *
>>> + * The intention of this function is for a user to execute it once with a
>>> + * sector_num of 0 and then upon receiving a completion callback, to remember
>>> + * the number of sectors "streamed", and then to call this function again with
>>> + * an offset adjusted by the number of sectors previously streamed.
>>> + *
>>> + * This allows a user to progressively stream in an image at a pace that makes
>>> + * sense. In general, this function tries to do the smallest amount of I/O
>>> + * possible to do some useful work.
>>> + *
>>> + * This function only really makes sense in combination with a block format
>>> + * that supports copy on read and has it enabled. If copy on read is not
>>> + * enabled, a block format driver may return NULL.
>>> + */
>>> +BlockDriverAIOCB *bdrv_aio_stream(BlockDriverState *bs, int64_t sector_num,
>>> + BlockDriverCompletionFunc *cb, void *opaque)
>>
>> I think bdrv_aio_stream is a bad name for this. It only becomes image
>> streaming because the caller repeatedly calls this function. What the
>> function really does is copying some data from the backing file into the
>> overlay image.
>
> That's true but bdrv_aio_copy_from_backing_file() is a bit long.
bdrv_copy_backing() or something should be short enough and still
describe what it's really doing.
> The
> special thing about this operation is that it takes a starting
> sector_num but no length. The callback receives the nb_sectors. So
> this operation isn't an ordinary [start, length) copy either so
> bdrv_aio_stream() isn't that bad?
Well, you're going to introduce nb_sectors anyway, so it's not really
special any more.
> I actually think the copy-on-read statement is an implementation
> detail. I can imagine doing essentially the same behavior without
> exposing copy on read to the user. But in QED streaming is based on
> copy-on-read. Let's remove this comment.
Ok. Removing the comment and calling it something with "copy" in the
name should make clear what the intention is.
Kevin
* Re: [Qemu-devel] [PATCH 1/8] block: add bdrv_aio_stream
2011-05-06 13:36 ` Kevin Wolf
@ 2011-05-06 15:47 ` Stefan Hajnoczi
0 siblings, 0 replies; 20+ messages in thread
From: Stefan Hajnoczi @ 2011-05-06 15:47 UTC (permalink / raw)
To: Kevin Wolf; +Cc: Anthony Liguori, Stefan Hajnoczi, qemu-devel
On Fri, May 6, 2011 at 2:36 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> On 06.05.2011 15:21, Stefan Hajnoczi wrote:
>> On Fri, Apr 29, 2011 at 12:56 PM, Kevin Wolf <kwolf@redhat.com> wrote:
>>> On 27.04.2011 15:27, Stefan Hajnoczi wrote:
>>>> +/**
>>>> + * Attempt to stream an image starting from sector_num.
>>>> + *
>>>> + * @sector_num - the first sector to start streaming from
>>>> + * @cb - block completion callback
>>>> + * @opaque - data to pass completion callback
>>>> + *
>>>> + * Returns NULL if the image format does not support streaming, the image is
>>>> + * read-only, or no image is open.
>>>> + *
>>>> + * The intention of this function is for a user to execute it once with a
>>>> + * sector_num of 0 and then upon receiving a completion callback, to remember
>>>> + * the number of sectors "streamed", and then to call this function again with
>>>> + * an offset adjusted by the number of sectors previously streamed.
>>>> + *
>>>> + * This allows a user to progressively stream in an image at a pace that makes
>>>> + * sense. In general, this function tries to do the smallest amount of I/O
>>>> + * possible to do some useful work.
>>>> + *
>>>> + * This function only really makes sense in combination with a block format
>>>> + * that supports copy on read and has it enabled. If copy on read is not
>>>> + * enabled, a block format driver may return NULL.
>>>> + */
>>>> +BlockDriverAIOCB *bdrv_aio_stream(BlockDriverState *bs, int64_t sector_num,
>>>> + BlockDriverCompletionFunc *cb, void *opaque)
>>>
>>> I think bdrv_aio_stream is a bad name for this. It only becomes image
>>> streaming because the caller repeatedly calls this function. What the
>>> function really does is copying some data from the backing file into the
>>> overlay image.
>>
>> That's true but bdrv_aio_copy_from_backing_file() is a bit long.
>
> bdrv_copy_backing() or something should be short enough and still
> describe what it's really doing.
>
>> The
>> special thing about this operation is that it takes a starting
>> sector_num but no length. The callback receives the nb_sectors. So
>> this operation isn't an ordinary [start, length) copy either so
>> bdrv_aio_stream() isn't that bad?
>
> Well, you're going to introduce nb_sectors anyway, so it's not really
> special any more.
I guess you're right. At first I wasn't planning on passing nb_sectors
to this function since it's the blockdev.c streaming loop that drives
the streaming process - we may not need nb_sectors here. But I guess
this is like a read(2) function and the block driver can return short
reads if that is convenient due to cluster sizes or other image format
internals. By passing in nb_sectors we avoid streaming too much at
the end.
Stefan
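The read(2) analogy above can be sketched in a few lines of standalone C. This is only an illustration under stated assumptions, not the QED implementation: `copy_backing_step` and `stream_range` are hypothetical names, and the rule that a driver stops at the next cluster boundary is one plausible reason for a short result. The step function never copies more than `nb_sectors`, never crosses a cluster boundary, and never runs past the end of the image; the caller advances by whatever each step reports.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical driver-side step: returns how many sectors would be copied
 * for one request. The result may be shorter than nb_sectors.
 */
static int copy_backing_step(int64_t sector_num, int nb_sectors,
                             int cluster_sectors, int64_t total_sectors)
{
    int64_t n;

    assert(nb_sectors > 0 && cluster_sectors > 0);
    assert(sector_num >= 0 && sector_num < total_sectors);

    /* Sectors left in the cluster that contains sector_num. */
    n = cluster_sectors - (sector_num % cluster_sectors);

    /* Never exceed what the caller asked for ... */
    if (n > nb_sectors) {
        n = nb_sectors;
    }
    /* ... or what is left of the image. */
    if (n > total_sectors - sector_num) {
        n = total_sectors - sector_num;
    }
    return (int)n;
}

/*
 * Hypothetical caller loop: keeps issuing steps until the requested range
 * is covered, like retrying after a short read. Returns the step count.
 */
static int stream_range(int64_t sector_num, int nb_sectors,
                        int cluster_sectors, int64_t total_sectors)
{
    int steps = 0;

    assert(nb_sectors >= 0);
    assert(sector_num + nb_sectors <= total_sectors);

    while (nb_sectors > 0) {
        int done = copy_backing_step(sector_num, nb_sectors,
                                     cluster_sectors, total_sectors);
        sector_num += done;
        nb_sectors -= done;
        steps++;
    }
    return steps;
}
```

With 128-sector clusters, a request for 300 sectors starting at sector 100 would be served as 28 + 128 + 128 + 16 sectors, so the last step stops exactly at the requested end instead of streaming a whole extra cluster.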
* [Qemu-devel] [PATCH 2/8] qmp: Add QMP support for stream commands
2011-04-27 13:27 [Qemu-devel] [RFC PATCH 0/8] QED image streaming Stefan Hajnoczi
2011-04-27 13:27 ` [Qemu-devel] [PATCH 1/8] block: add bdrv_aio_stream Stefan Hajnoczi
@ 2011-04-27 13:27 ` Stefan Hajnoczi
2011-04-29 12:09 ` Kevin Wolf
2011-04-27 13:27 ` [Qemu-devel] [PATCH 3/8] qed: add support for Copy-on-Read Stefan Hajnoczi
` (6 subsequent siblings)
8 siblings, 1 reply; 20+ messages in thread
From: Stefan Hajnoczi @ 2011-04-27 13:27 UTC (permalink / raw)
To: qemu-devel; +Cc: Kevin Wolf, Anthony Liguori, Stefan Hajnoczi, Adam Litke
From: Anthony Liguori <aliguori@us.ibm.com>
For leaf images with copy on read semantics, the stream commands allow the user
to populate local blocks by manually streaming them from the backing image.
Once all blocks have been streamed, the dependency on the original backing
image can be removed. Therefore, stream commands can be used to implement
post-copy live block migration and rapid deployment.
The stream command can be used to stream a single sector, to start streaming
the entire device, and to cancel an active stream. It is easiest to allow the
stream command to manage streaming for the entire device but a management tool
could use single sector mode to throttle the I/O rate. When a single sector is
streamed, the command returns an offset that can be used for a subsequent call.
The command synopses are as follows:
stream
------
Stream data to a block device.
Arguments:
- all: Stream the entire device (json-bool, optional)
- stop: Stop streaming to the device (json-bool, optional)
- device: device name (json-string)
- offset: device offset in bytes (json-int, optional)
Return:
- device: The device name being streamed
- len: The size of the device (in bytes)
- offset: The ending offset of the completed I/O (in bytes)
Examples:
-> { "execute": "stream", "arguments": { "device": "virtio0", "offset": 0 } }
<- { "return": { "device": "virtio0", "len": 10737418240, "offset": 512 } }
-> { "execute": "stream", "arguments": { "all": true, "device": "virtio0" } }
<- { "return": {} }
-> { "execute": "stream", "arguments": { "stop": true, "device": "virtio0" } }
<- { "return": {} }
query-stream
------------
Show progress of ongoing stream operation
Return a json-array of all streams. If no stream is active then an empty array
will be returned. Each stream is a json-object with the following data:
- device: The device name being streamed
- len: The size of the device (in bytes)
- offset: The ending offset of the completed I/O (in bytes)
Example:
-> { "execute": "query-stream" }
<- { "return":[
{ "device": "virtio0", "len": 10737418240, "offset": 709632}
]
}
Signed-off-by: Adam Litke <agl@us.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
blockdev.c | 212 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
blockdev.h | 5 ++
hmp-commands.hx | 18 +++++
monitor.c | 20 +++++
qerror.c | 9 +++
qerror.h | 6 ++
qmp-commands.hx | 64 +++++++++++++++++
7 files changed, 334 insertions(+), 0 deletions(-)
diff --git a/blockdev.c b/blockdev.c
index 5429621..99c0726 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -16,6 +16,7 @@
#include "sysemu.h"
#include "hw/qdev.h"
#include "block_int.h"
+#include "qjson.h"
static QTAILQ_HEAD(drivelist, DriveInfo) drives = QTAILQ_HEAD_INITIALIZER(drives);
@@ -50,6 +51,144 @@ static const int if_max_devs[IF_COUNT] = {
[IF_SCSI] = 7,
};
+typedef struct StreamState {
+ MonitorCompletion *cb;
+ void *cb_opaque;
+ int64_t offset;
+ bool once;
+ bool cancel;
+ BlockDriverState *bs;
+ QEMUTimer *timer;
+ uint64_t stream_delay;
+} StreamState;
+
+static StreamState global_stream;
+static StreamState *active_stream;
+
+static QObject *stream_get_qobject(StreamState *s)
+{
+ const char *name = bdrv_get_device_name(s->bs);
+ int64_t len = bdrv_getlength(s->bs);
+
+ return qobject_from_jsonf("{ 'device': %s, 'offset': %" PRId64 ", "
+ "'len': %" PRId64 " }", name, s->offset, len);
+}
+
+static void do_stream_cb(void *opaque, int ret)
+{
+ StreamState *s = opaque;
+
+ if (ret < 0) {
+ qerror_report(QERR_STREAMING_ERROR, strerror(-ret));
+ goto out;
+ }
+
+ s->offset += ret * BDRV_SECTOR_SIZE;
+
+ if (!s->once) {
+ if (s->offset == bdrv_getlength(s->bs)) {
+ bdrv_change_backing_file(s->bs, NULL, NULL);
+ } else if (!s->cancel) {
+ qemu_mod_timer(s->timer,
+ qemu_get_clock_ns(rt_clock) + s->stream_delay);
+ return;
+ }
+ }
+
+out:
+ if (s->cb) {
+ s->cb(s->cb_opaque, stream_get_qobject(s));
+ }
+ qemu_del_timer(s->timer);
+ qemu_free_timer(s->timer);
+ active_stream = NULL;
+}
+
+/* We can't call bdrv_aio_stream() directly from the callback because that
+ * makes qemu_aio_flush() not complete until the streaming is completed.
+ * By delaying with a timer, we give qemu_aio_flush() a chance to complete.
+ */
+static void stream_next_iteration(void *opaque)
+{
+ StreamState *s = opaque;
+
+ bdrv_aio_stream(s->bs, s->offset / BDRV_SECTOR_SIZE, do_stream_cb, s);
+}
+
+static StreamState *stream_start(const char *device, int64_t offset, bool once,
+ MonitorCompletion cb, void *opaque)
+{
+ BlockDriverState *bs;
+ StreamState *s = &global_stream;
+ BlockDriverAIOCB *acb;
+
+ if (active_stream) {
+ qerror_report(QERR_DEVICE_IN_USE,
+ bdrv_get_device_name(active_stream->bs));
+ return NULL;
+ }
+
+ bs = bdrv_find(device);
+ if (!bs) {
+ qerror_report(QERR_DEVICE_NOT_FOUND, device);
+ return NULL;
+ }
+
+ if (offset % BDRV_SECTOR_SIZE) {
+ qerror_report(QERR_INVALID_PARAMETER_VALUE,
+ "offset", "a sector-aligned offset");
+ return NULL;
+ }
+
+ if (offset >= bdrv_getlength(bs)) {
+ qerror_report(QERR_INVALID_PARAMETER_VALUE,
+ "offset", "an offset less than device length");
+ return NULL;
+ }
+
+ memset(s, 0, sizeof(*s));
+ if (once) {
+ s->cb = cb;
+ s->cb_opaque = opaque;
+ s->once = true;
+ }
+ s->offset = offset;
+ s->bs = bs;
+ s->stream_delay = 0; /* FIXME make this configurable */
+ s->timer = qemu_new_timer_ns(rt_clock, stream_next_iteration, s);
+
+ acb = bdrv_aio_stream(bs, offset / BDRV_SECTOR_SIZE, do_stream_cb, s);
+ if (acb == NULL) {
+ qemu_free_timer(s->timer);
+ qerror_report(QERR_NOT_SUPPORTED);
+ return NULL;
+ }
+
+ active_stream = s;
+
+ return s;
+}
+
+static int stream_stop(const char *device, MonitorCompletion *cb, void *opaque)
+{
+ if (!active_stream) {
+ qerror_report(QERR_STREAMING_ERROR, strerror(ESRCH));
+ return -1;
+ }
+
+ /*
+ * In case we want to support simultaneous streams in the future,
+ * require a device name to be specified when stopping a stream.
+ */
+ if (strcmp(device, bdrv_get_device_name(active_stream->bs))) {
+ qerror_report(QERR_DEVICE_NOT_FOUND, device);
+ return -1;
+ }
+
+ active_stream->cancel = true;
+ return 0;
+}
+
/*
* We automatically delete the drive when a device using it gets
* unplugged. Questionable feature, but we can't just drop it.
@@ -647,6 +786,79 @@ out:
return ret;
}
+void monitor_print_stream(Monitor *mon, const QObject *data)
+{
+ QList *streams;
+
+ if (data == NULL) {
+ return;
+ }
+
+ streams = qobject_to_qlist(data);
+ if (streams && !qlist_empty(streams)) {
+ /* Only print a single stream until multi-stream support is added */
+ QDict *qdict = qobject_to_qdict(qlist_peek(streams));
+ monitor_printf(mon, "Streaming device %s: Completed %" PRId64 " of %"
+ PRId64 " bytes\n", qdict_get_str(qdict, "device"),
+ qdict_get_int(qdict, "offset"),
+ qdict_get_int(qdict, "len"));
+ } else {
+ monitor_printf(mon, "No active stream\n");
+ }
+}
+
+int do_stream_info(Monitor *mon, MonitorCompletion *cb, void *opaque)
+{
+ QList *streams = qlist_new();
+
+ if (active_stream) {
+ qlist_append_obj(streams, stream_get_qobject(active_stream));
+ }
+
+ cb(opaque, QOBJECT(streams));
+ return 0;
+}
+
+int do_stream(Monitor *mon, const QDict *params,
+ MonitorCompletion cb, void *opaque)
+{
+ int all = qdict_get_try_bool(params, "all", false);
+ int stop = qdict_get_try_bool(params, "stop", false);
+ const char *device = qdict_get_str(params, "device");
+ int64_t offset = 0;
+ StreamState *s;
+
+ if (all && stop) {
+ qerror_report(QERR_INVALID_PARAMETER, "stop' not allowed with 'all");
+ return -1;
+ }
+
+ if (stop) {
+ if (stream_stop(device, cb, opaque)) {
+ return -1;
+ }
+ } else if (all) {
+ s = stream_start(device, offset, false, NULL, NULL);
+ if (!s) {
+ return -1;
+ }
+ } else {
+ if (qdict_haskey(params, "offset")) {
+ offset = qdict_get_int(params, "offset");
+ }
+ s = stream_start(device, offset, true, cb, opaque);
+ if (!s) {
+ return -1;
+ }
+ /* This will complete asynchronously when the sector is streamed */
+ return 0;
+ }
+
+ /* Starting and stopping full device streams complete immediately */
+ cb(opaque, NULL);
+ return 0;
+}
+
static int eject_device(Monitor *mon, BlockDriverState *bs, int force)
{
if (!force) {
diff --git a/blockdev.h b/blockdev.h
index 2c9e780..c1c4dfd 100644
--- a/blockdev.h
+++ b/blockdev.h
@@ -12,6 +12,7 @@
#include "block.h"
#include "qemu-queue.h"
+#include "monitor.h"
void blockdev_mark_auto_del(BlockDriverState *bs);
void blockdev_auto_del(BlockDriverState *bs);
@@ -64,5 +65,9 @@ int do_change_block(Monitor *mon, const char *device,
int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data);
int do_snapshot_blkdev(Monitor *mon, const QDict *qdict, QObject **ret_data);
int do_block_resize(Monitor *mon, const QDict *qdict, QObject **ret_data);
+void monitor_print_stream(Monitor *mon, const QObject *data);
+int do_stream_info(Monitor *mon, MonitorCompletion *cb, void *opaque);
+int do_stream(Monitor *mon, const QDict *params,
+ MonitorCompletion cb, void *opaque);
#endif
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 834e6a8..1db477c 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -38,6 +38,22 @@ Commit changes to the disk images (if -snapshot is used) or backing files.
ETEXI
{
+ .name = "stream",
+ .args_type = "all:-a,stop:-s,device:B,offset:i?",
+ .params = "[-a] [-s] device [offset]",
+ .help = "Stream data to a block device",
+ .user_print = monitor_print_stream,
+ .mhandler.cmd_async = do_stream,
+ .flags = MONITOR_CMD_ASYNC,
+ },
+
+STEXI
+@item stream
+@findex stream
+Stream data to a block device.
+ETEXI
+
+ {
.name = "q|quit",
.args_type = "",
.params = "",
@@ -1352,6 +1368,8 @@ show device tree
show qdev device model list
@item info roms
show roms
+@item info stream
+show progress of ongoing stream operation
@end table
ETEXI
diff --git a/monitor.c b/monitor.c
index 5f3bc72..f325ede 100644
--- a/monitor.c
+++ b/monitor.c
@@ -3100,6 +3100,16 @@ static const mon_cmd_t info_cmds[] = {
.mhandler.info = do_info_trace_events,
},
#endif
+ {
+ .name = "stream",
+ .args_type = "",
+ .params = "",
+ .help = "show block streaming status",
+ .user_print = monitor_print_stream,
+ .mhandler.info_async = do_stream_info,
+ .flags = MONITOR_CMD_ASYNC,
+
+ },
{
.name = NULL,
},
@@ -3242,6 +3252,16 @@ static const mon_cmd_t qmp_query_cmds[] = {
.mhandler.info_async = do_info_balloon,
.flags = MONITOR_CMD_ASYNC,
},
+ {
+ .name = "stream",
+ .args_type = "",
+ .params = "",
+ .help = "show block streaming status",
+ .user_print = monitor_print_stream,
+ .mhandler.info_async = do_stream_info,
+ .flags = MONITOR_CMD_ASYNC,
+
+ },
{ /* NULL */ },
};
diff --git a/qerror.c b/qerror.c
index 4855604..5a4f0ba 100644
--- a/qerror.c
+++ b/qerror.c
@@ -157,6 +157,10 @@ static const QErrorStringTable qerror_table[] = {
.desc = "No '%(bus)' bus found for device '%(device)'",
},
{
+ .error_fmt = QERR_NOT_SUPPORTED,
+ .desc = "Operation is not supported",
+ },
+ {
.error_fmt = QERR_OPEN_FILE_FAILED,
.desc = "Could not open '%(filename)'",
},
@@ -209,6 +213,11 @@ static const QErrorStringTable qerror_table[] = {
.error_fmt = QERR_VNC_SERVER_FAILED,
.desc = "Could not start VNC server on %(target)",
},
+ {
+ .error_fmt = QERR_STREAMING_ERROR,
+ .desc = "An error occurred during streaming: %(msg)",
+ },
+
{}
};
diff --git a/qerror.h b/qerror.h
index df61d2c..cbe19cb 100644
--- a/qerror.h
+++ b/qerror.h
@@ -132,6 +132,9 @@ QError *qobject_to_qerror(const QObject *obj);
#define QERR_NO_BUS_FOR_DEVICE \
"{ 'class': 'NoBusForDevice', 'data': { 'device': %s, 'bus': %s } }"
+#define QERR_NOT_SUPPORTED \
+ "{ 'class': 'NotSupported', 'data': {} }"
+
#define QERR_OPEN_FILE_FAILED \
"{ 'class': 'OpenFileFailed', 'data': { 'filename': %s } }"
@@ -174,4 +177,7 @@ QError *qobject_to_qerror(const QObject *obj);
#define QERR_FEATURE_DISABLED \
"{ 'class': 'FeatureDisabled', 'data': { 'name': %s } }"
+#define QERR_STREAMING_ERROR \
+ "{ 'class': 'StreamingError', 'data': { 'msg': %s } }"
+
#endif /* QERROR_H */
diff --git a/qmp-commands.hx b/qmp-commands.hx
index fbd98ee..c2bccd6 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -858,6 +858,48 @@ Example:
EQMP
{
+ .name = "stream",
+ .args_type = "all:-a,stop:-s,device:B,offset:i?",
+ .params = "[-a] [-s] device [offset]",
+ .help = "Stream data to a block device",
+ .user_print = monitor_print_stream,
+ .mhandler.cmd_async = do_stream,
+ .flags = MONITOR_CMD_ASYNC,
+ },
+
+SQMP
+stream
+------
+
+Stream data to a block device.
+
+Arguments:
+
+- all: Stream the entire device (json-bool, optional)
+- stop: Stop streaming to the device (json-bool, optional)
+- device: device name (json-string)
+- offset: device offset in bytes (json-int, optional)
+
+Return:
+
+- device: The device name being streamed
+- len: The size of the device (in bytes)
+- offset: The ending offset of the completed I/O (in bytes)
+
+Examples:
+
+-> { "execute": "stream", "arguments": { "device": "virtio0", "offset": 0 } }
+<- { "return": { "device": "virtio0", "len": 10737418240, "offset": 512 } }
+
+-> { "execute": "stream", "arguments": { "all": true, "device": "virtio0" } }
+<- { "return": {} }
+
+-> { "execute": "stream", "arguments": { "stop": true, "device": "virtio0" } }
+<- { "return": {} }
+
+EQMP
+
+ {
.name = "qmp_capabilities",
.args_type = "",
.params = "",
@@ -1777,3 +1819,25 @@ Example:
EQMP
+SQMP
+query-stream
+------------
+
+Show progress of ongoing stream operation
+
+Return a json-array of all streams. If no stream is active then an empty array
+will be returned. Each stream is a json-object with the following data:
+
+- device: The device name being streamed
+- len: The size of the device (in bytes)
+- offset: The ending offset of the completed I/O (in bytes)
+
+Example:
+
+-> { "execute": "query-stream" }
+<- { "return":[
+ { "device": "virtio0", "len": 10737418240, "offset": 709632}
+ ]
+ }
+
+EQMP
--
1.7.4.4
* Re: [Qemu-devel] [PATCH 2/8] qmp: Add QMP support for stream commands
2011-04-27 13:27 ` [Qemu-devel] [PATCH 2/8] qmp: Add QMP support for stream commands Stefan Hajnoczi
@ 2011-04-29 12:09 ` Kevin Wolf
2011-05-06 13:23 ` Stefan Hajnoczi
0 siblings, 1 reply; 20+ messages in thread
From: Kevin Wolf @ 2011-04-29 12:09 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: Anthony Liguori, qemu-devel, Adam Litke
On 27.04.2011 15:27, Stefan Hajnoczi wrote:
> From: Anthony Liguori <aliguori@us.ibm.com>
>
> For leaf images with copy on read semantics, the stream commands allow the user
> to populate local blocks by manually streaming them from the backing image.
> Once all blocks have been streamed, the dependency on the original backing
> image can be removed. Therefore, stream commands can be used to implement
> post-copy live block migration and rapid deployment.
>
> The stream command can be used to stream a single sector, to start streaming
> the entire device, and to cancel an active stream. It is easiest to allow the
> stream command to manage streaming for the entire device but a management tool
> could use single sector mode to throttle the I/O rate. When a single sector is
> streamed, the command returns an offset that can be used for a subsequent call.
You mean literally single sectors? You're not interested in completing
the job in finite time, are you? ;-)
I would suggest adding a length argument for the all=false case, so that
management tools can choose more reasonable sizes.
Kevin
* Re: [Qemu-devel] [PATCH 2/8] qmp: Add QMP support for stream commands
2011-04-29 12:09 ` Kevin Wolf
@ 2011-05-06 13:23 ` Stefan Hajnoczi
0 siblings, 0 replies; 20+ messages in thread
From: Stefan Hajnoczi @ 2011-05-06 13:23 UTC (permalink / raw)
To: Kevin Wolf; +Cc: Anthony Liguori, Adam Litke, Stefan Hajnoczi, qemu-devel
On Fri, Apr 29, 2011 at 1:09 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> On 27.04.2011 15:27, Stefan Hajnoczi wrote:
>> From: Anthony Liguori <aliguori@us.ibm.com>
>>
>> For leaf images with copy on read semantics, the stream commands allow the user
>> to populate local blocks by manually streaming them from the backing image.
>> Once all blocks have been streamed, the dependency on the original backing
>> image can be removed. Therefore, stream commands can be used to implement
>> post-copy live block migration and rapid deployment.
>>
>> The stream command can be used to stream a single sector, to start streaming
>> the entire device, and to cancel an active stream. It is easiest to allow the
>> stream command to manage streaming for the entire device but a management tool
>> could use single sector mode to throttle the I/O rate. When a single sector is
>> streamed, the command returns an offset that can be used for a subsequent call.
>
> You mean literally single sectors? You're not interested in completing
> the job in finite time, are you? ;-)
>
> I would suggest adding a length argument for the all=false case, so that
> management tools can choose more reasonable sizes.
Discussion on libvir-list suggests the same thing. Let's take a
nb_sectors where 0=all.
Stefan
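
The chunked interface proposed above (an nb_sectors argument where 0 means "all") can be sketched from the caller's side. This is only a sketch under assumptions: stream_chunk() and stream_device() are hypothetical names, not QMP commands or QEMU functions, and the request is modelled as a pure function instead of an asynchronous call.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for one stream request: handles up to nb_sectors
 * starting at sector_num and returns how many sectors it covered.
 * nb_sectors == 0 means "everything up to the end of the device". */
static int64_t stream_chunk(int64_t total_sectors, int64_t sector_num,
                            int64_t nb_sectors)
{
    assert(sector_num >= 0 && sector_num <= total_sectors);

    int64_t remaining = total_sectors - sector_num;
    if (nb_sectors == 0 || nb_sectors > remaining) {
        return remaining;
    }
    return nb_sectors;
}

/* Drive the whole device in fixed-size chunks and return the number of
 * requests issued. A real management tool would rate-limit between
 * requests; that part is omitted here. */
static int stream_device(int64_t total_sectors, int64_t chunk_sectors)
{
    int64_t offset = 0;
    int requests = 0;

    while (offset < total_sectors) {
        offset += stream_chunk(total_sectors, offset, chunk_sectors);
        requests++;
    }
    return requests;
}
```

With a 1000-sector device, a chunk size of 256 takes four requests, while a chunk size of 0 finishes in one, which is the trade-off between throttling and request count that the thread is weighing.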
* [Qemu-devel] [PATCH 3/8] qed: add support for Copy-on-Read
2011-04-27 13:27 [Qemu-devel] [RFC PATCH 0/8] QED image streaming Stefan Hajnoczi
2011-04-27 13:27 ` [Qemu-devel] [PATCH 1/8] block: add bdrv_aio_stream Stefan Hajnoczi
2011-04-27 13:27 ` [Qemu-devel] [PATCH 2/8] qmp: Add QMP support for stream commands Stefan Hajnoczi
@ 2011-04-27 13:27 ` Stefan Hajnoczi
2011-04-27 14:29 ` Paolo Bonzini
2011-04-29 12:14 ` Kevin Wolf
2011-04-27 13:27 ` [Qemu-devel] [PATCH 4/8] qed: intelligent streaming implementation Stefan Hajnoczi
` (5 subsequent siblings)
8 siblings, 2 replies; 20+ messages in thread
From: Stefan Hajnoczi @ 2011-04-27 13:27 UTC (permalink / raw)
To: qemu-devel; +Cc: Kevin Wolf, Anthony Liguori
From: Anthony Liguori <aliguori@us.ibm.com>
When creating an image using qemu-img, just pass '-o copy_on_read' and then
whenever QED reads from a backing file, it will write the block to the QED
file after the read completes ensuring that you only fetch from the backing
device once.
This is very useful for streaming images over a slow connection.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
---
block/qed.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++++---
block/qed.h | 15 +++++++++++----
2 files changed, 59 insertions(+), 7 deletions(-)
diff --git a/block/qed.c b/block/qed.c
index c8c5930..7487683 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -448,7 +448,8 @@ static int bdrv_qed_flush(BlockDriverState *bs)
static int qed_create(const char *filename, uint32_t cluster_size,
uint64_t image_size, uint32_t table_size,
- const char *backing_file, const char *backing_fmt)
+ const char *backing_file, const char *backing_fmt,
+ bool copy_on_read)
{
QEDHeader header = {
.magic = QED_MAGIC,
@@ -490,6 +491,9 @@ static int qed_create(const char *filename, uint32_t cluster_size,
if (qed_fmt_is_raw(backing_fmt)) {
header.features |= QED_F_BACKING_FORMAT_NO_PROBE;
}
+ if (copy_on_read) {
+ header.compat_features |= QED_CF_COPY_ON_READ;
+ }
}
qed_header_cpu_to_le(&header, &le_header);
@@ -523,6 +527,7 @@ static int bdrv_qed_create(const char *filename, QEMUOptionParameter *options)
uint32_t table_size = QED_DEFAULT_TABLE_SIZE;
const char *backing_file = NULL;
const char *backing_fmt = NULL;
+ bool copy_on_read = false;
while (options && options->name) {
if (!strcmp(options->name, BLOCK_OPT_SIZE)) {
@@ -539,6 +544,10 @@ static int bdrv_qed_create(const char *filename, QEMUOptionParameter *options)
if (options->value.n) {
table_size = options->value.n;
}
+ } else if (!strcmp(options->name, "copy_on_read")) {
+ if (options->value.n) {
+ copy_on_read = true;
+ }
}
options++;
}
@@ -559,9 +568,14 @@ static int bdrv_qed_create(const char *filename, QEMUOptionParameter *options)
qed_max_image_size(cluster_size, table_size));
return -EINVAL;
}
+ if (copy_on_read && !backing_file) {
+ fprintf(stderr,
+ "QED only supports Copy-on-Read with a backing file\n");
+ return -EINVAL;
+ }
return qed_create(filename, cluster_size, image_size, table_size,
- backing_file, backing_fmt);
+ backing_file, backing_fmt, copy_on_read);
}
typedef struct {
@@ -1092,6 +1106,27 @@ static void qed_aio_write_data(void *opaque, int ret,
}
/**
+ * Copy on read callback
+ *
+ * Write data from backing file to QED that's been read if CoR is enabled.
+ */
+static void qed_copy_on_read_cb(void *opaque, int ret)
+{
+ QEDAIOCB *acb = opaque;
+ BDRVQEDState *s = acb_to_s(acb);
+ BlockDriverAIOCB *cor_acb;
+
+ cor_acb = bdrv_aio_writev(s->bs,
+ acb->cur_pos / BDRV_SECTOR_SIZE,
+ &acb->cur_qiov,
+ acb->cur_qiov.size / BDRV_SECTOR_SIZE,
+ qed_aio_next_io, acb);
+ if (!cor_acb) {
+ qed_aio_complete(acb, -EIO);
+ }
+}
+
+/**
* Read data cluster
*
* @opaque: Read request
@@ -1127,8 +1162,14 @@ static void qed_aio_read_data(void *opaque, int ret,
qed_aio_next_io(acb, 0);
return;
} else if (ret != QED_CLUSTER_FOUND) {
+ BlockDriverCompletionFunc *cb = qed_aio_next_io;
+
+ if (bs->backing_hd &&
+ (s->header.compat_features & QED_CF_COPY_ON_READ)) {
+ cb = qed_copy_on_read_cb;
+ }
qed_read_backing_file(s, acb->cur_pos, &acb->cur_qiov,
- qed_aio_next_io, acb);
+ cb, acb);
return;
}
@@ -1349,6 +1390,10 @@ static QEMUOptionParameter qed_create_options[] = {
.name = BLOCK_OPT_TABLE_SIZE,
.type = OPT_SIZE,
.help = "L1/L2 table size (in clusters)"
+ }, {
+ .name = "copy_on_read",
+ .type = OPT_FLAG,
+ .help = "Copy blocks from base image on read"
},
{ /* end of list */ }
};
diff --git a/block/qed.h b/block/qed.h
index 3e1ab84..845a80e 100644
--- a/block/qed.h
+++ b/block/qed.h
@@ -56,12 +56,19 @@ enum {
/* The backing file format must not be probed, treat as raw image */
QED_F_BACKING_FORMAT_NO_PROBE = 0x04,
- /* Feature bits must be used when the on-disk format changes */
- QED_FEATURE_MASK = QED_F_BACKING_FILE | /* supported feature bits */
+ /* Reads to the backing file should populate the image file */
+ QED_CF_COPY_ON_READ = 0x01,
+
+ /* Supported feature bits */
+ QED_FEATURE_MASK = QED_F_BACKING_FILE |
QED_F_NEED_CHECK |
QED_F_BACKING_FORMAT_NO_PROBE,
- QED_COMPAT_FEATURE_MASK = 0, /* supported compat feature bits */
- QED_AUTOCLEAR_FEATURE_MASK = 0, /* supported autoclear feature bits */
+
+ /* Supported compat feature bits */
+ QED_COMPAT_FEATURE_MASK = QED_CF_COPY_ON_READ,
+
+ /* Supported autoclear feature bits */
+ QED_AUTOCLEAR_FEATURE_MASK = 0,
/* Data is stored in groups of sectors called clusters. Cluster size must
* be large to avoid keeping too much metadata. I/O requests that have
--
1.7.4.4
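
The copy-on-read behaviour described in the commit message above can be modelled in a few lines. This is a simplified in-memory sketch, not the QED code path: CorImage, cor_read_cluster() and the tiny cluster geometry are all hypothetical, and the asynchronous callbacks of the patch are replaced by direct calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum { COR_CLUSTERS = 4, COR_CLUSTER_SIZE = 8 };

typedef struct {
    const unsigned char *backing;   /* read-only base image */
    unsigned char local[COR_CLUSTERS * COR_CLUSTER_SIZE];
    bool allocated[COR_CLUSTERS];   /* true once a cluster lives locally */
    bool copy_on_read;
    int backing_reads;              /* how often the base image was hit */
} CorImage;

static void cor_init(CorImage *img, const unsigned char *backing, bool cor)
{
    memset(img, 0, sizeof(*img));
    img->backing = backing;
    img->copy_on_read = cor;
}

/* Read one cluster. Unallocated clusters come from the backing image and,
 * if copy-on-read is enabled, are then stored locally. */
static void cor_read_cluster(CorImage *img, size_t index, unsigned char *buf)
{
    assert(index < (size_t)COR_CLUSTERS);
    size_t off = index * COR_CLUSTER_SIZE;

    if (img->allocated[index]) {
        memcpy(buf, img->local + off, COR_CLUSTER_SIZE);
        return;
    }

    memcpy(buf, img->backing + off, COR_CLUSTER_SIZE);
    img->backing_reads++;

    if (img->copy_on_read) {
        memcpy(img->local + off, buf, COR_CLUSTER_SIZE);
        img->allocated[index] = true;
    }
}

/* Read the same cluster three times and report the backing fetch count. */
static int cor_demo_backing_reads(bool cor)
{
    static const unsigned char base[COR_CLUSTERS * COR_CLUSTER_SIZE] = { 1, 2, 3 };
    unsigned char buf[COR_CLUSTER_SIZE];
    CorImage img;

    cor_init(&img, base, cor);
    for (int i = 0; i < 3; i++) {
        cor_read_cluster(&img, 1, buf);
    }
    return img.backing_reads;
}
```

With copy-on-read enabled the backing image is fetched once for the repeated read; without it, every read goes back to the backing image. The model has no concurrent guest writes, so it does not address the race between a copy-on-read write and a guest write to the same cluster.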
* Re: [Qemu-devel] [PATCH 3/8] qed: add support for Copy-on-Read
2011-04-27 13:27 ` [Qemu-devel] [PATCH 3/8] qed: add support for Copy-on-Read Stefan Hajnoczi
@ 2011-04-27 14:29 ` Paolo Bonzini
2011-04-29 12:14 ` Kevin Wolf
1 sibling, 0 replies; 20+ messages in thread
From: Paolo Bonzini @ 2011-04-27 14:29 UTC (permalink / raw)
To: qemu-devel; +Cc: Kevin Wolf, Anthony Liguori
On 04/27/2011 03:27 PM, Stefan Hajnoczi wrote:
> From: Anthony Liguori<aliguori@us.ibm.com>
>
> When creating an image using qemu-img, just pass '-o copy_on_read' and then
> whenever QED reads from a backing file, it will write the block to the QED
> file after the read completes ensuring that you only fetch from the backing
> device once.
>
> This is very useful for streaming images over a slow connection.
While having the default in the file is sane, it seems to me that it
should be overridable at runtime, too.
Paolo
* Re: [Qemu-devel] [PATCH 3/8] qed: add support for Copy-on-Read
2011-04-27 13:27 ` [Qemu-devel] [PATCH 3/8] qed: add support for Copy-on-Read Stefan Hajnoczi
2011-04-27 14:29 ` Paolo Bonzini
@ 2011-04-29 12:14 ` Kevin Wolf
2011-05-06 13:24 ` Stefan Hajnoczi
1 sibling, 1 reply; 20+ messages in thread
From: Kevin Wolf @ 2011-04-29 12:14 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: Anthony Liguori, qemu-devel
On 27.04.2011 15:27, Stefan Hajnoczi wrote:
> From: Anthony Liguori <aliguori@us.ibm.com>
>
> When creating an image using qemu-img, just pass '-o copy_on_read' and then
> whenever QED reads from a backing file, it will write the block to the QED
> file after the read completes ensuring that you only fetch from the backing
> device once.
As discussed previously, this is not the right primary interface.
Also, the patch still seems to be broken with respect to concurrent writes.
Kevin
* Re: [Qemu-devel] [PATCH 3/8] qed: add support for Copy-on-Read
2011-04-29 12:14 ` Kevin Wolf
@ 2011-05-06 13:24 ` Stefan Hajnoczi
0 siblings, 0 replies; 20+ messages in thread
From: Stefan Hajnoczi @ 2011-05-06 13:24 UTC (permalink / raw)
To: Kevin Wolf; +Cc: Anthony Liguori, Stefan Hajnoczi, qemu-devel
On Fri, Apr 29, 2011 at 1:14 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> On 27.04.2011 15:27, Stefan Hajnoczi wrote:
>> From: Anthony Liguori <aliguori@us.ibm.com>
>>
>> When creating an image using qemu-img, just pass '-o copy_on_read' and then
>> whenever QED reads from a backing file, it will write the block to the QED
>> file after the read completes ensuring that you only fetch from the backing
>> device once.
>
> As discussed previously, this is not the right primary interface.
>
> Also, the patch still seems to be broken with respect to concurrent writes.
Both good points, I really wanted to give a full RFC patch series to
show all the code that has been written today. But your points need
to be addressed for the next version.
Stefan
* [Qemu-devel] [PATCH 4/8] qed: intelligent streaming implementation
2011-04-27 13:27 [Qemu-devel] [RFC PATCH 0/8] QED image streaming Stefan Hajnoczi
` (2 preceding siblings ...)
2011-04-27 13:27 ` [Qemu-devel] [PATCH 3/8] qed: add support for Copy-on-Read Stefan Hajnoczi
@ 2011-04-27 13:27 ` Stefan Hajnoczi
2011-04-27 13:27 ` [Qemu-devel] [PATCH 5/8] qed: detect zero writes and skip them when to an unalloc cluster Stefan Hajnoczi
` (4 subsequent siblings)
8 siblings, 0 replies; 20+ messages in thread
From: Stefan Hajnoczi @ 2011-04-27 13:27 UTC (permalink / raw)
To: qemu-devel; +Cc: Kevin Wolf, Anthony Liguori
From: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
---
block/qed.c | 165 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
1 files changed, 158 insertions(+), 7 deletions(-)
diff --git a/block/qed.c b/block/qed.c
index 7487683..56150c3 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -1222,11 +1222,11 @@ static void qed_aio_next_io(void *opaque, int ret)
io_fn, acb);
}
-static BlockDriverAIOCB *qed_aio_setup(BlockDriverState *bs,
- int64_t sector_num,
- QEMUIOVector *qiov, int nb_sectors,
- BlockDriverCompletionFunc *cb,
- void *opaque, bool is_write)
+static QEDAIOCB *qed_aio_setup(BlockDriverState *bs,
+ int64_t sector_num,
+ QEMUIOVector *qiov, int nb_sectors,
+ BlockDriverCompletionFunc *cb,
+ void *opaque, bool is_write)
{
QEDAIOCB *acb = qemu_aio_get(&qed_aio_pool, bs, cb, opaque);
@@ -1242,8 +1242,22 @@ static BlockDriverAIOCB *qed_aio_setup(BlockDriverState *bs,
acb->request.l2_table = NULL;
qemu_iovec_init(&acb->cur_qiov, qiov->niov);
+ return acb;
+}
+
+static BlockDriverAIOCB *bdrv_qed_aio_setup(BlockDriverState *bs,
+ int64_t sector_num,
+ QEMUIOVector *qiov, int nb_sectors,
+ BlockDriverCompletionFunc *cb,
+ void *opaque, bool is_write)
+{
+ QEDAIOCB *acb;
+
+ acb = qed_aio_setup(bs, sector_num, qiov, nb_sectors,
+ cb, opaque, is_write);
/* Start request */
qed_aio_next_io(acb, 0);
+
return &acb->common;
}
@@ -1253,7 +1267,8 @@ static BlockDriverAIOCB *bdrv_qed_aio_readv(BlockDriverState *bs,
BlockDriverCompletionFunc *cb,
void *opaque)
{
- return qed_aio_setup(bs, sector_num, qiov, nb_sectors, cb, opaque, false);
+ return bdrv_qed_aio_setup(bs, sector_num, qiov, nb_sectors,
+ cb, opaque, false);
}
static BlockDriverAIOCB *bdrv_qed_aio_writev(BlockDriverState *bs,
@@ -1262,7 +1277,142 @@ static BlockDriverAIOCB *bdrv_qed_aio_writev(BlockDriverState *bs,
BlockDriverCompletionFunc *cb,
void *opaque)
{
- return qed_aio_setup(bs, sector_num, qiov, nb_sectors, cb, opaque, true);
+ return bdrv_qed_aio_setup(bs, sector_num, qiov, nb_sectors,
+ cb, opaque, true);
+}
+
+typedef struct QEDStreamData {
+ QEDAIOCB *acb;
+ uint64_t offset;
+ QEMUIOVector qiov;
+ void *buffer;
+ size_t len;
+ BlockDriverCompletionFunc *cb;
+ void *opaque;
+} QEDStreamData;
+
+static void qed_aio_stream_cb(void *opaque, int ret)
+{
+ QEDStreamData *stream_data = opaque;
+ QEDAIOCB *acb = stream_data->acb;
+
+ if (ret) {
+ ret = -EIO;
+ } else {
+ ret = (acb->end_pos - stream_data->offset) / BDRV_SECTOR_SIZE;
+ }
+
+ stream_data->cb(stream_data->opaque, ret);
+
+ qemu_iovec_destroy(&stream_data->qiov);
+ qemu_vfree(stream_data->buffer);
+ qemu_free(stream_data);
+}
+
+static void qed_stream_find_cluster_cb(void *opaque, int ret,
+ uint64_t offset, size_t len);
+
+/**
+ * Perform the next qed_find_cluster() from a BH
+ *
+ * This is necessary because we iterate over each cluster in turn.
+ * qed_find_cluster() may invoke its callback immediately without returning up
+ * the call stack, causing us to overflow the call stack. By starting each
+ * iteration from a BH we guarantee that a fresh stack is used each time.
+ */
+static void qed_stream_next_cluster_bh(void *opaque)
+{
+ QEDStreamData *stream_data = opaque;
+ QEDAIOCB *acb = stream_data->acb;
+ BDRVQEDState *s = acb_to_s(acb);
+
+ qemu_bh_delete(acb->bh);
+ acb->bh = NULL;
+
+ acb->cur_pos += s->header.cluster_size;
+ acb->end_pos += s->header.cluster_size;
+
+ qed_find_cluster(s, &acb->request, acb->cur_pos,
+ acb->end_pos - acb->cur_pos,
+ qed_stream_find_cluster_cb, stream_data);
+}
+
+/**
+ * Search for an unallocated cluster adjusting the current request until we
+ * can use it to read an unallocated cluster.
+ *
+ * Callback from qed_find_cluster().
+ */
+static void qed_stream_find_cluster_cb(void *opaque, int ret,
+ uint64_t offset, size_t len)
+{
+ QEDStreamData *stream_data = opaque;
+ QEDAIOCB *acb = stream_data->acb;
+ BDRVQEDState *s = acb_to_s(acb);
+
+ if (ret < 0) {
+ qed_aio_complete(acb, ret);
+ return;
+ }
+
+ if (ret == QED_CLUSTER_FOUND ||
+ ret == QED_CLUSTER_ZERO) {
+ /* proceed to next cluster */
+
+ if (acb->end_pos == s->header.image_size) {
+ qed_aio_complete(acb, 0);
+ return;
+ }
+
+ acb->bh = qemu_bh_new(qed_stream_next_cluster_bh, stream_data);
+ qemu_bh_schedule(acb->bh);
+ } else {
+ /* found a hole, kick off request */
+ qed_aio_next_io(acb, 0);
+ }
+}
+
+static BlockDriverAIOCB *bdrv_qed_aio_stream(BlockDriverState *bs,
+ int64_t sector_num,
+ BlockDriverCompletionFunc *cb,
+ void *opaque)
+{
+ BDRVQEDState *s = bs->opaque;
+ QEDStreamData *stream_data;
+ QEDAIOCB *acb;
+ uint32_t cluster_size = s->header.cluster_size;
+ uint64_t start_cluster;
+ QEMUIOVector *qiov;
+
+ if (!(s->header.compat_features & QED_CF_COPY_ON_READ)) {
+ return NULL;
+ }
+
+ stream_data = qemu_mallocz(sizeof(*stream_data));
+
+ stream_data->cb = cb;
+ stream_data->opaque = opaque;
+ stream_data->len = cluster_size;
+ stream_data->buffer = qemu_blockalign(s->bs, cluster_size);
+ stream_data->offset = sector_num * BDRV_SECTOR_SIZE;
+
+ start_cluster = qed_start_of_cluster(s, stream_data->offset);
+ sector_num = start_cluster / BDRV_SECTOR_SIZE;
+
+ qiov = &stream_data->qiov;
+ qemu_iovec_init(qiov, 1);
+ qemu_iovec_add(qiov, stream_data->buffer, cluster_size);
+
+ acb = qed_aio_setup(bs, sector_num, qiov,
+ cluster_size / BDRV_SECTOR_SIZE,
+ qed_aio_stream_cb, stream_data, false);
+ stream_data->acb = acb;
+
+ qed_find_cluster(s, &acb->request, acb->cur_pos,
+ acb->end_pos - acb->cur_pos,
+ qed_stream_find_cluster_cb, stream_data);
+
+ return &acb->common;
}
static BlockDriverAIOCB *bdrv_qed_aio_flush(BlockDriverState *bs,
@@ -1412,6 +1562,7 @@ static BlockDriver bdrv_qed = {
.bdrv_make_empty = bdrv_qed_make_empty,
.bdrv_aio_readv = bdrv_qed_aio_readv,
.bdrv_aio_writev = bdrv_qed_aio_writev,
+ .bdrv_aio_stream = bdrv_qed_aio_stream,
.bdrv_aio_flush = bdrv_qed_aio_flush,
.bdrv_truncate = bdrv_qed_truncate,
.bdrv_getlength = bdrv_qed_getlength,
--
1.7.4.4
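
The search performed by the patch above (skip clusters that are already allocated or zero, then copy the first hole) can be sketched with a plain loop over a bitmap. The names stream_step() and stream_all_demo() are hypothetical, and setting a flag stands in for the read from the backing file plus the copy-on-read write. The patch restarts each iteration from a bottom half to get a fresh stack; an iterative loop avoids the same recursion in this synchronous model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One streaming request starting at cluster `start`. Skips clusters that
 * are already present, fills at most one hole, and returns how many
 * clusters were consumed so the caller can advance its offset. */
static size_t stream_step(bool *allocated, size_t nclusters, size_t start,
                          size_t *copied)
{
    assert(start <= nclusters);
    size_t pos = start;

    while (pos < nclusters && allocated[pos]) {
        pos++;                      /* nothing to do for this cluster */
    }
    if (pos < nclusters) {
        allocated[pos] = true;      /* stands in for the actual copy */
        (*copied)++;
        pos++;
    }
    return pos - start;
}

/* Stream a six-cluster image with three holes; returns clusters copied. */
static size_t stream_all_demo(void)
{
    bool allocated[6] = { true, false, true, true, false, false };
    size_t offset = 0;
    size_t copied = 0;

    while (offset < 6) {
        offset += stream_step(allocated, 6, offset, &copied);
    }
    return copied;
}
```

The return value mirrors the idea in qed_aio_stream_cb() of reporting how far the request advanced, so the next call can resume from the adjusted offset.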
* [Qemu-devel] [PATCH 5/8] qed: detect zero writes and skip them when to an unalloc cluster
2011-04-27 13:27 [Qemu-devel] [RFC PATCH 0/8] QED image streaming Stefan Hajnoczi
` (3 preceding siblings ...)
2011-04-27 13:27 ` [Qemu-devel] [PATCH 4/8] qed: intelligent streaming implementation Stefan Hajnoczi
@ 2011-04-27 13:27 ` Stefan Hajnoczi
2011-04-27 13:27 ` [Qemu-devel] [PATCH 6/8] blockdev: Allow image files to auto-enable streaming Stefan Hajnoczi
` (3 subsequent siblings)
8 siblings, 0 replies; 20+ messages in thread
From: Stefan Hajnoczi @ 2011-04-27 13:27 UTC (permalink / raw)
To: qemu-devel; +Cc: Kevin Wolf, Anthony Liguori
From: Anthony Liguori <aliguori@us.ibm.com>
A value of 1 is used to indicate that a cluster contains all zeros. Update the
code to detect zero writes only when a flag is set on the AIOCB. For now, only
set the flag on copy-on-read based write requests to avoid polluting the
cache on write in the zero copy case.
After this patch, we can stream an image file from a backing file without
fully expanding the image.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
---
block/qed.c | 124 ++++++++++++++++++++++++++++++++++++++++++++++++++---------
block/qed.h | 1 +
2 files changed, 107 insertions(+), 18 deletions(-)
diff --git a/block/qed.c b/block/qed.c
index 56150c3..2c155d9 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -33,6 +33,13 @@ static AIOPool qed_aio_pool = {
.cancel = qed_aio_cancel,
};
+static BlockDriverAIOCB *qed_aio_writev_check(BlockDriverState *bs,
+ int64_t sector_num,
+ QEMUIOVector *qiov,
+ int nb_sectors,
+ BlockDriverCompletionFunc *cb,
+ void *opaque);
+
static int bdrv_qed_probe(const uint8_t *buf, int buf_size,
const char *filename)
{
@@ -871,9 +878,8 @@ static void qed_aio_write_l1_update(void *opaque, int ret)
/**
* Update L2 table with new cluster offsets and write them out
*/
-static void qed_aio_write_l2_update(void *opaque, int ret)
+static void qed_aio_write_l2_update(QEDAIOCB *acb, int ret, uint64_t offset)
{
- QEDAIOCB *acb = opaque;
BDRVQEDState *s = acb_to_s(acb);
bool need_alloc = acb->find_cluster_ret == QED_CLUSTER_L1;
int index;
@@ -889,7 +895,7 @@ static void qed_aio_write_l2_update(void *opaque, int ret)
index = qed_l2_index(s, acb->cur_pos);
qed_update_l2_table(s, acb->request.l2_table->table, index, acb->cur_nclusters,
- acb->cur_cluster);
+ offset);
if (need_alloc) {
/* Write out the whole new L2 table */
@@ -906,6 +912,51 @@ err:
qed_aio_complete(acb, ret);
}
+static void qed_aio_write_l2_update_cb(void *opaque, int ret)
+{
+ QEDAIOCB *acb = opaque;
+ qed_aio_write_l2_update(acb, ret, acb->cur_cluster);
+}
+
+/**
+ * Determine if we have a zero write to a block of clusters
+ *
+ * We validate that the write is aligned to a cluster boundary, and that it's
+ * a multiple of cluster size with all zeros.
+ */
+static bool qed_is_zero_write(QEDAIOCB *acb)
+{
+ BDRVQEDState *s = acb_to_s(acb);
+ int i;
+
+ if (!qed_offset_is_cluster_aligned(s, acb->cur_pos)) {
+ return false;
+ }
+
+ if (!qed_offset_is_cluster_aligned(s, acb->cur_qiov.size)) {
+ return false;
+ }
+
+ for (i = 0; i < acb->cur_qiov.niov; i++) {
+ struct iovec *iov = &acb->cur_qiov.iov[i];
+ uint64_t *v;
+ int j;
+
+ if ((iov->iov_len & 0x07)) {
+ return false;
+ }
+
+ v = iov->iov_base;
+ for (j = 0; j < iov->iov_len; j += sizeof(v[0])) {
+ if (v[j >> 3]) {
+ return false;
+ }
+ }
+ }
+
+ return true;
+}
+
/**
* Flush new data clusters before updating the L2 table
*
@@ -920,7 +971,7 @@ static void qed_aio_write_flush_before_l2_update(void *opaque, int ret)
QEDAIOCB *acb = opaque;
BDRVQEDState *s = acb_to_s(acb);
- if (!bdrv_aio_flush(s->bs->file, qed_aio_write_l2_update, opaque)) {
+ if (!bdrv_aio_flush(s->bs->file, qed_aio_write_l2_update_cb, opaque)) {
qed_aio_complete(acb, -EIO);
}
}
@@ -950,7 +1001,7 @@ static void qed_aio_write_main(void *opaque, int ret)
if (s->bs->backing_hd) {
next_fn = qed_aio_write_flush_before_l2_update;
} else {
- next_fn = qed_aio_write_l2_update;
+ next_fn = qed_aio_write_l2_update_cb;
}
}
@@ -1016,6 +1067,18 @@ static bool qed_should_set_need_check(BDRVQEDState *s)
return !(s->header.features & QED_F_NEED_CHECK);
}
+static void qed_aio_write_zero_cluster(void *opaque, int ret)
+{
+ QEDAIOCB *acb = opaque;
+
+ if (ret) {
+ qed_aio_complete(acb, ret);
+ return;
+ }
+
+ qed_aio_write_l2_update(acb, 0, 1);
+}
+
/**
* Write new data cluster
*
@@ -1027,6 +1090,7 @@ static bool qed_should_set_need_check(BDRVQEDState *s)
static void qed_aio_write_alloc(QEDAIOCB *acb, size_t len)
{
BDRVQEDState *s = acb_to_s(acb);
+ BlockDriverCompletionFunc *cb;
/* Freeze this request if another allocating write is in progress */
if (acb != QSIMPLEQ_FIRST(&s->allocating_write_reqs)) {
@@ -1041,11 +1105,18 @@ static void qed_aio_write_alloc(QEDAIOCB *acb, size_t len)
acb->cur_cluster = qed_alloc_clusters(s, acb->cur_nclusters);
qemu_iovec_copy(&acb->cur_qiov, acb->qiov, acb->qiov_offset, len);
+ cb = qed_aio_write_prefill;
+
+ /* Zero write detection */
+ if (acb->check_zero_write && qed_is_zero_write(acb)) {
+ cb = qed_aio_write_zero_cluster;
+ }
+
if (qed_should_set_need_check(s)) {
s->header.features |= QED_F_NEED_CHECK;
- qed_write_header(s, qed_aio_write_prefill, acb);
+ qed_write_header(s, cb, acb);
} else {
- qed_aio_write_prefill(acb, 0);
+ cb(acb, 0);
}
}
@@ -1116,11 +1187,11 @@ static void qed_copy_on_read_cb(void *opaque, int ret)
BDRVQEDState *s = acb_to_s(acb);
BlockDriverAIOCB *cor_acb;
- cor_acb = bdrv_aio_writev(s->bs,
- acb->cur_pos / BDRV_SECTOR_SIZE,
- &acb->cur_qiov,
- acb->cur_qiov.size / BDRV_SECTOR_SIZE,
- qed_aio_next_io, acb);
+ cor_acb = qed_aio_writev_check(s->bs,
+ acb->cur_pos / BDRV_SECTOR_SIZE,
+ &acb->cur_qiov,
+ acb->cur_qiov.size / BDRV_SECTOR_SIZE,
+ qed_aio_next_io, acb);
if (!cor_acb) {
qed_aio_complete(acb, -EIO);
}
@@ -1226,7 +1297,8 @@ static QEDAIOCB *qed_aio_setup(BlockDriverState *bs,
int64_t sector_num,
QEMUIOVector *qiov, int nb_sectors,
BlockDriverCompletionFunc *cb,
- void *opaque, bool is_write)
+ void *opaque, bool is_write,
+ bool check_zero_write)
{
QEDAIOCB *acb = qemu_aio_get(&qed_aio_pool, bs, cb, opaque);
@@ -1235,6 +1307,7 @@ static QEDAIOCB *qed_aio_setup(BlockDriverState *bs,
acb->is_write = is_write;
acb->finished = NULL;
+ acb->check_zero_write = check_zero_write;
acb->qiov = qiov;
acb->qiov_offset = 0;
acb->cur_pos = (uint64_t)sector_num * BDRV_SECTOR_SIZE;
@@ -1249,12 +1322,13 @@ static BlockDriverAIOCB *bdrv_qed_aio_setup(BlockDriverState *bs,
int64_t sector_num,
QEMUIOVector *qiov, int nb_sectors,
BlockDriverCompletionFunc *cb,
- void *opaque, bool is_write)
+ void *opaque, bool is_write,
+ bool check_zero_write)
{
QEDAIOCB *acb;
acb = qed_aio_setup(bs, sector_num, qiov, nb_sectors,
- cb, opaque, is_write);
+ cb, opaque, is_write, check_zero_write);
/* Start request */
qed_aio_next_io(acb, 0);
@@ -1268,7 +1342,7 @@ static BlockDriverAIOCB *bdrv_qed_aio_readv(BlockDriverState *bs,
void *opaque)
{
return bdrv_qed_aio_setup(bs, sector_num, qiov, nb_sectors,
- cb, opaque, false);
+ cb, opaque, false, false);
}
static BlockDriverAIOCB *bdrv_qed_aio_writev(BlockDriverState *bs,
@@ -1278,7 +1352,21 @@ static BlockDriverAIOCB *bdrv_qed_aio_writev(BlockDriverState *bs,
void *opaque)
{
return bdrv_qed_aio_setup(bs, sector_num, qiov, nb_sectors,
- cb, opaque, true);
+ cb, opaque, true, false);
+}
+
+/**
+ * Perform a write with a zero-check.
+ */
+static BlockDriverAIOCB *qed_aio_writev_check(BlockDriverState *bs,
+ int64_t sector_num,
+ QEMUIOVector *qiov,
+ int nb_sectors,
+ BlockDriverCompletionFunc *cb,
+ void *opaque)
+{
+ return bdrv_qed_aio_setup(bs, sector_num, qiov, nb_sectors,
+ cb, opaque, true, true);
}
typedef struct QEDStreamData {
@@ -1405,7 +1493,7 @@ static BlockDriverAIOCB *bdrv_qed_aio_stream(BlockDriverState *bs,
acb = qed_aio_setup(bs, sector_num, qiov,
cluster_size / BDRV_SECTOR_SIZE,
- qed_aio_stream_cb, stream_data, false);
+ qed_aio_stream_cb, stream_data, false, false);
stream_data->acb = acb;
qed_find_cluster(s, &acb->request, acb->cur_pos,
diff --git a/block/qed.h b/block/qed.h
index 845a80e..8e9e415 100644
--- a/block/qed.h
+++ b/block/qed.h
@@ -135,6 +135,7 @@ typedef struct QEDAIOCB {
bool is_write; /* false - read, true - write */
bool *finished; /* signal for cancel completion */
uint64_t end_pos; /* request end on block device, in bytes */
+ bool check_zero_write; /* true - check blocks for zero write */
/* User scatter-gather list */
QEMUIOVector *qiov;
--
1.7.4.4
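
The zero-write test in the patch above checks alignment first and then scans the data eight bytes at a time. A standalone sketch of the same idea over a flat buffer follows; buffer_is_zero64() and is_zero_cluster_write() are hypothetical names, a single buffer replaces the scatter-gather list, and rejecting empty writes is a choice of this sketch, not something taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* True if len is a multiple of 8 and every 64-bit word in buf is zero.
 * memcpy() is used for the loads so that unaligned buffers are safe. */
static bool buffer_is_zero64(const void *buf, size_t len)
{
    assert(buf != NULL);
    const unsigned char *p = buf;

    if (len % sizeof(uint64_t) != 0) {
        return false;
    }
    for (size_t i = 0; i < len; i += sizeof(uint64_t)) {
        uint64_t word;
        memcpy(&word, p + i, sizeof(word));
        if (word != 0) {
            return false;
        }
    }
    return true;
}

/* A write qualifies only if it starts on a cluster boundary, covers whole
 * clusters, and contains nothing but zeros. */
static bool is_zero_cluster_write(uint64_t pos, const void *buf, size_t len,
                                  uint32_t cluster_size)
{
    if (cluster_size == 0 || len == 0) {
        return false;
    }
    if (pos % cluster_size != 0 || len % cluster_size != 0) {
        return false;
    }
    return buffer_is_zero64(buf, len);
}

/* Helper for experimenting: one 64-byte cluster with a settable last byte. */
static bool zero_write_demo(uint64_t pos, unsigned char last_byte)
{
    unsigned char buf[64] = { 0 };

    buf[63] = last_byte;
    return is_zero_cluster_write(pos, buf, sizeof(buf), 64);
}
```

Writes that pass this test can be recorded as a zero cluster in the L2 table (the patch uses the offset value 1 for that) instead of allocating and writing a data cluster.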
* [Qemu-devel] [PATCH 6/8] blockdev: Allow image files to auto-enable streaming
2011-04-27 13:27 [Qemu-devel] [RFC PATCH 0/8] QED image streaming Stefan Hajnoczi
` (4 preceding siblings ...)
2011-04-27 13:27 ` [Qemu-devel] [PATCH 5/8] qed: detect zero writes and skip them when to an unalloc cluster Stefan Hajnoczi
@ 2011-04-27 13:27 ` Stefan Hajnoczi
2011-04-29 12:20 ` Kevin Wolf
2011-04-27 13:27 ` [Qemu-devel] [PATCH 7/8] qed: Add QED_CF_STREAM flag " Stefan Hajnoczi
` (2 subsequent siblings)
8 siblings, 1 reply; 20+ messages in thread
From: Stefan Hajnoczi @ 2011-04-27 13:27 UTC (permalink / raw)
To: qemu-devel; +Cc: Kevin Wolf, Anthony Liguori, Stefan Hajnoczi
Image files that have streaming enabled will automatically begin
streaming when opened.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
block.c | 5 +++++
block.h | 1 +
block_int.h | 1 +
blockdev.c | 9 +++++++++
4 files changed, 16 insertions(+), 0 deletions(-)
diff --git a/block.c b/block.c
index 5e3476c..68a97a3 100644
--- a/block.c
+++ b/block.c
@@ -1584,6 +1584,11 @@ const char *bdrv_get_device_name(BlockDriverState *bs)
return bs->device_name;
}
+int bdrv_stream_enabled(BlockDriverState *bs)
+{
+ return bs->stream;
+}
+
int bdrv_flush(BlockDriverState *bs)
{
if (bs->open_flags & BDRV_O_NO_FLUSH) {
diff --git a/block.h b/block.h
index fad828a..3357c50 100644
--- a/block.h
+++ b/block.h
@@ -189,6 +189,7 @@ int bdrv_is_removable(BlockDriverState *bs);
int bdrv_is_read_only(BlockDriverState *bs);
int bdrv_is_sg(BlockDriverState *bs);
int bdrv_enable_write_cache(BlockDriverState *bs);
+int bdrv_stream_enabled(BlockDriverState *bs);
int bdrv_is_inserted(BlockDriverState *bs);
int bdrv_media_changed(BlockDriverState *bs);
int bdrv_is_locked(BlockDriverState *bs);
diff --git a/block_int.h b/block_int.h
index 0c125d0..d0fe96c 100644
--- a/block_int.h
+++ b/block_int.h
@@ -155,6 +155,7 @@ struct BlockDriverState {
int encrypted; /* if true, the media is encrypted */
int valid_key; /* if true, a valid encryption key has been set */
int sg; /* if true, the device is a /dev/sg* */
+ int stream; /* if true, stream from the backing file */
/* event callback when inserting/removing */
void (*change_cb)(void *opaque, int reason);
void *change_opaque;
diff --git a/blockdev.c b/blockdev.c
index 99c0726..5d6cb2b 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -678,6 +678,15 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
goto err;
}
+ if (bdrv_stream_enabled(dinfo->bdrv)) {
+ const char *device_name = bdrv_get_device_name(dinfo->bdrv);
+
+ if (!stream_start(device_name, 0, false, NULL, NULL)) {
+ fprintf(stderr, "qemu: warning: stream_start failed for '%s'\n",
+ device_name);
+ }
+ }
+
if (bdrv_key_required(dinfo->bdrv))
autostart = 0;
return dinfo;
--
1.7.4.4
* Re: [Qemu-devel] [PATCH 6/8] blockdev: Allow image files to auto-enable streaming
2011-04-27 13:27 ` [Qemu-devel] [PATCH 6/8] blockdev: Allow image files to auto-enable streaming Stefan Hajnoczi
@ 2011-04-29 12:20 ` Kevin Wolf
0 siblings, 0 replies; 20+ messages in thread
From: Kevin Wolf @ 2011-04-29 12:20 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: Anthony Liguori, qemu-devel
On 27.04.2011 15:27, Stefan Hajnoczi wrote:
> Image files that have streaming enabled will automatically begin
> streaming when opened.
>
> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Hm... I wasn't really happy about images that do copy on read even if I
didn't tell qemu so on the command line. Now they can start to copy data
and use up my internet connection even without the guest really
accessing the data. This seems to be one step more that I find rather
questionable.
Anyway, same as for copy on read: While we can discuss _allowing_ it in
backing files, it's definitely not suitable as a primary interface.
Kevin
* [Qemu-devel] [PATCH 7/8] qed: Add QED_CF_STREAM flag to auto-enable streaming
2011-04-27 13:27 [Qemu-devel] [RFC PATCH 0/8] QED image streaming Stefan Hajnoczi
` (5 preceding siblings ...)
2011-04-27 13:27 ` [Qemu-devel] [PATCH 6/8] blockdev: Allow image files to auto-enable streaming Stefan Hajnoczi
@ 2011-04-27 13:27 ` Stefan Hajnoczi
2011-04-27 13:27 ` [Qemu-devel] [PATCH 8/8] qed: Add -o stream=on image creation option Stefan Hajnoczi
2011-04-27 13:41 ` [Qemu-devel] [RFC PATCH 0/8] QED image streaming Stefan Hajnoczi
8 siblings, 0 replies; 20+ messages in thread
From: Stefan Hajnoczi @ 2011-04-27 13:27 UTC (permalink / raw)
To: qemu-devel; +Cc: Kevin Wolf, Anthony Liguori, Stefan Hajnoczi
The QED_CF_STREAM flag can be set to automatically stream from the
backing file.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
block/qed.c | 5 +++++
block/qed.h | 6 +++++-
2 files changed, 10 insertions(+), 1 deletions(-)
diff --git a/block/qed.c b/block/qed.c
index 2c155d9..a61cee9 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -373,6 +373,11 @@ static int bdrv_qed_open(BlockDriverState *bs, int flags)
if (s->header.features & QED_F_BACKING_FORMAT_NO_PROBE) {
pstrcpy(bs->backing_format, sizeof(bs->backing_format), "raw");
}
+
+ if ((s->header.compat_features & QED_CF_STREAM) &&
+ !bdrv_is_read_only(bs->file)) {
+ bs->stream = 1;
+ }
}
/* Reset unknown autoclear feature bits. This is a backwards
diff --git a/block/qed.h b/block/qed.h
index 8e9e415..23a9bde 100644
--- a/block/qed.h
+++ b/block/qed.h
@@ -59,13 +59,17 @@ enum {
/* Reads to the backing file should populate the image file */
QED_CF_COPY_ON_READ = 0x01,
+ /* Stream until the backing image is no longer needed */
+ QED_CF_STREAM = 0x02,
+
/* Supported feature bits */
QED_FEATURE_MASK = QED_F_BACKING_FILE |
QED_F_NEED_CHECK |
QED_F_BACKING_FORMAT_NO_PROBE,
/* Supported compat feature bits */
- QED_COMPAT_FEATURE_MASK = QED_CF_COPY_ON_READ,
+ QED_COMPAT_FEATURE_MASK = QED_CF_COPY_ON_READ |
+ QED_CF_STREAM,
/* Supported autoclear feature bits */
QED_AUTOCLEAR_FEATURE_MASK = 0,
--
1.7.4.4
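
How the feature bit groups touched by the patch above are typically consumed when an image is opened can be sketched as follows. The DEMO_* names are hypothetical, and the two bit values not shown in this thread (backing file and need-check) are assumptions. The rule that unknown incompatible bits must block the open while unknown compat bits may be ignored reflects my understanding of the QED format and is not spelled out in this series.

```c
#include <stdbool.h>
#include <stdint.h>

enum {
    /* Incompatible feature bits (0x01 and 0x02 are assumed values) */
    DEMO_F_BACKING_FILE            = 0x01,
    DEMO_F_NEED_CHECK              = 0x02,
    DEMO_F_BACKING_FORMAT_NO_PROBE = 0x04,
    DEMO_FEATURE_MASK              = 0x07,

    /* Compat feature bits, values as in the patch */
    DEMO_CF_COPY_ON_READ           = 0x01,
    DEMO_CF_STREAM                 = 0x02,
};

/* An image with unknown incompatible feature bits cannot be opened safely. */
static bool demo_can_open(uint64_t features)
{
    return (features & ~(uint64_t)DEMO_FEATURE_MASK) == 0;
}

/* Mirrors the check in bdrv_qed_open(): auto-streaming is enabled only if
 * the stream bit is set and the image file is writable. */
static bool demo_should_autostream(uint64_t compat_features, bool read_only)
{
    return (compat_features & DEMO_CF_STREAM) != 0 && !read_only;
}
```

Placing the stream flag in the compat group means an older implementation that does not know the bit can still open the image; it simply will not start streaming on its own.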
* [Qemu-devel] [PATCH 8/8] qed: Add -o stream=on image creation option
2011-04-27 13:27 [Qemu-devel] [RFC PATCH 0/8] QED image streaming Stefan Hajnoczi
` (6 preceding siblings ...)
2011-04-27 13:27 ` [Qemu-devel] [PATCH 7/8] qed: Add QED_CF_STREAM flag " Stefan Hajnoczi
@ 2011-04-27 13:27 ` Stefan Hajnoczi
2011-04-27 13:41 ` [Qemu-devel] [RFC PATCH 0/8] QED image streaming Stefan Hajnoczi
8 siblings, 0 replies; 20+ messages in thread
From: Stefan Hajnoczi @ 2011-04-27 13:27 UTC (permalink / raw)
To: qemu-devel; +Cc: Kevin Wolf, Anthony Liguori, Stefan Hajnoczi
Create an image that automatically streams its backing file like this:
qemu-img create -f qed -o backing_file=master.raw,backing_fmt=raw,copy_on_read=on,stream=on stream.qed 60G
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
block/qed.c | 21 +++++++++++++++++++--
1 files changed, 19 insertions(+), 2 deletions(-)
diff --git a/block/qed.c b/block/qed.c
index a61cee9..d65abe7 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -461,7 +461,7 @@ static int bdrv_qed_flush(BlockDriverState *bs)
static int qed_create(const char *filename, uint32_t cluster_size,
uint64_t image_size, uint32_t table_size,
const char *backing_file, const char *backing_fmt,
- bool copy_on_read)
+ bool copy_on_read, bool stream)
{
QEDHeader header = {
.magic = QED_MAGIC,
@@ -506,6 +506,9 @@ static int qed_create(const char *filename, uint32_t cluster_size,
if (copy_on_read) {
header.compat_features |= QED_CF_COPY_ON_READ;
}
+ if (stream) {
+ header.compat_features |= QED_CF_STREAM;
+ }
}
qed_header_cpu_to_le(&header, &le_header);
@@ -540,6 +543,7 @@ static int bdrv_qed_create(const char *filename, QEMUOptionParameter *options)
const char *backing_file = NULL;
const char *backing_fmt = NULL;
bool copy_on_read = false;
+ bool stream = false;
while (options && options->name) {
if (!strcmp(options->name, BLOCK_OPT_SIZE)) {
@@ -560,6 +564,10 @@ static int bdrv_qed_create(const char *filename, QEMUOptionParameter *options)
if (options->value.n) {
copy_on_read = true;
}
+ } else if (!strcmp(options->name, "stream")) {
+ if (options->value.n) {
+ stream = true;
+ }
}
options++;
}
@@ -585,9 +593,14 @@ static int bdrv_qed_create(const char *filename, QEMUOptionParameter *options)
"QED only supports Copy-on-Read with a backing file\n");
return -EINVAL;
}
+ if (stream && !copy_on_read) {
+ fprintf(stderr,
+ "QED requires Copy-on-Read to be enabled for streaming\n");
+ return -EINVAL;
+ }
return qed_create(filename, cluster_size, image_size, table_size,
- backing_file, backing_fmt, copy_on_read);
+ backing_file, backing_fmt, copy_on_read, stream);
}
typedef struct {
@@ -1637,6 +1650,10 @@ static QEMUOptionParameter qed_create_options[] = {
.name = "copy_on_read",
.type = OPT_FLAG,
.help = "Copy blocks from base image on read"
+ }, {
+ .name = "stream",
+ .type = OPT_FLAG,
+ .help = "Start copying blocks from base image once opened"
},
{ /* end of list */ }
};
--
1.7.4.4
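For readers following the flag logic across patches 7/8 and 8/8, here is a minimal standalone sketch of the two checks. It is not code from the series. The helper names sketch_compat_features() and sketch_should_auto_stream() are hypothetical, and the block driver state is reduced to plain booleans. Only the two QED_CF_* values and the validation rules are taken from the diffs above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Compat feature bits, with the values this series defines in block/qed.h. */
enum {
    QED_CF_COPY_ON_READ = 0x01,
    QED_CF_STREAM       = 0x02,
};

/*
 * Hypothetical helper (not part of the patch): validate the creation
 * options and compute the compat_features word for a new image header.
 * Returns 0 on success or -EINVAL for an unsupported combination.
 */
static int sketch_compat_features(bool has_backing_file, bool copy_on_read,
                                  bool stream, uint32_t *compat_features)
{
    assert(compat_features != NULL);

    if (copy_on_read && !has_backing_file) {
        return -EINVAL;   /* Copy-on-Read needs a backing file */
    }
    if (stream && !copy_on_read) {
        return -EINVAL;   /* streaming needs Copy-on-Read */
    }

    *compat_features = 0;
    if (copy_on_read) {
        *compat_features |= QED_CF_COPY_ON_READ;
    }
    if (stream) {
        *compat_features |= QED_CF_STREAM;
    }
    return 0;
}

/*
 * Hypothetical helper (not part of the patch): the open-time decision from
 * patch 7/8. Streaming is auto-enabled only if the header carries
 * QED_CF_STREAM and the underlying file is writable.
 */
static bool sketch_should_auto_stream(uint32_t compat_features,
                                      bool file_read_only)
{
    return (compat_features & QED_CF_STREAM) != 0 && !file_read_only;
}
```

With these assumptions, stream=on without copy_on_read=on is rejected at creation time, and an image opened read-only never auto-starts streaming even when the flag is set in its header.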
* Re: [Qemu-devel] [RFC PATCH 0/8] QED image streaming
2011-04-27 13:27 [Qemu-devel] [RFC PATCH 0/8] QED image streaming Stefan Hajnoczi
` (7 preceding siblings ...)
2011-04-27 13:27 ` [Qemu-devel] [PATCH 8/8] qed: Add -o stream=on image creation option Stefan Hajnoczi
@ 2011-04-27 13:41 ` Stefan Hajnoczi
8 siblings, 0 replies; 20+ messages in thread
From: Stefan Hajnoczi @ 2011-04-27 13:41 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: Kevin Wolf, Anthony Liguori, qemu-devel, Adam Litke
On Wed, Apr 27, 2011 at 2:27 PM, Stefan Hajnoczi
<stefanha@linux.vnet.ibm.com> wrote:
> This patch series is structured as follows and is based on work that Adam
> Litke, Anthony Liguori, and I have done:
>
> [PATCH 1/8] block: add bdrv_aio_stream
>
> Introduce the .bdrv_aio_stream() BlockDriver interface.
>
> [PATCH 2/8] qmp: Add QMP support for stream commands
>
> Introduce monitor commands to start/stop image streaming as well as querying
> the state of image streaming.
>
> [PATCH 3/8] qed: add support for Copy-on-Read
> [PATCH 4/8] qed: intelligent streaming implementation
> [PATCH 5/8] qed: detect zero writes and skip them when to an unalloc
>
> Implement QED support for .bdrv_aio_stream().
>
> [PATCH 6/8] blockdev: Allow image files to auto-enable streaming
> [PATCH 7/8] qed: Add QED_CF_STREAM flag to auto-enable streaming
> [PATCH 8/8] qed: Add -o stream=on image creation option
>
> Introduce a flag that auto-starts image streaming when the image file is opened.
>
> TODO
> * Settle on monitor interfaces and libvirt interaction
> * Streaming background I/O throttling
> * Additional testing
Just wanted to point out that this series is Request For Comments.
The individual patch email subjects are missing "RFC", sorry.
I'm really interested in thoughts on the block layer and monitor
interfaces that are being introduced here.
Stefan