* [Qemu-devel] [RFC v2 00/15] QED image streaming
@ 2011-07-27 13:44 Stefan Hajnoczi
From: Stefan Hajnoczi @ 2011-07-27 13:44 UTC (permalink / raw)
To: qemu-devel; +Cc: Kevin Wolf, Anthony Liguori, Stefan Hajnoczi, Adam Litke
Overview
--------
This patch series adds image streaming support for QED image files. QMP/HMP
commands are added to perform image streaming at runtime. This interface is
already supported by libvirt.
The goal is to implement image streaming in a generic way for all image formats
that support backing files. In the meantime, I want to share the latest
QED-specific patch series.
Image streaming populates the file in the background while the guest is
running. This makes it possible to start the guest before its image file has
been fully provisioned.
Example use cases include:
* Providing small virtual appliances for download that can be launched
immediately but provision themselves in the background.
* Reducing guest provisioning time by creating local image files but backing
them with shared master images which will be streamed.
When image streaming is enabled, the unallocated regions of the image file are
populated with the data from the backing file. This occurs in the background
and the guest can perform regular I/O in the meantime. Once the entire backing
file has been streamed, the image no longer requires a backing file and will
drop its reference.
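For illustration only, the behaviour described above can be modelled with a toy in-memory image. This is a sketch and not QEMU code: ToyImage, toy_stream_step() and toy_guest_write() are hypothetical names, and the cluster count and size are arbitrary. Each step copies one unallocated cluster from the backing data; clusters the guest has already written are skipped, and the backing reference is dropped once nothing is left to copy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_NCLUSTERS    4
#define TOY_CLUSTER_SIZE 8

/* Hypothetical stand-in for an image file with an optional backing file */
typedef struct ToyImage {
    unsigned char data[TOY_NCLUSTERS][TOY_CLUSTER_SIZE];
    bool allocated[TOY_NCLUSTERS];
    const unsigned char (*backing)[TOY_CLUSTER_SIZE]; /* NULL once dropped */
} ToyImage;

/*
 * Copy at most one unallocated cluster from the backing data.
 * Returns true if a cluster was copied, false once the image is fully
 * populated and the backing reference has been dropped.
 */
static bool toy_stream_step(ToyImage *img)
{
    size_t i;

    if (!img->backing) {
        return false;
    }
    for (i = 0; i < TOY_NCLUSTERS; i++) {
        if (!img->allocated[i]) {
            memcpy(img->data[i], img->backing[i], TOY_CLUSTER_SIZE);
            img->allocated[i] = true;
            return true;
        }
    }
    img->backing = NULL; /* nothing left to copy: backing no longer needed */
    return false;
}

/* A guest write of a whole cluster allocates it, so streaming skips it */
static void toy_guest_write(ToyImage *img, size_t cluster, unsigned char byte)
{
    assert(cluster < TOY_NCLUSTERS);
    memset(img->data[cluster], byte, TOY_CLUSTER_SIZE);
    img->allocated[cluster] = true;
}
```

A caller would invoke toy_stream_step() repeatedly between guest requests until it returns false. The real implementation is asynchronous and works on QED clusters and tables; none of that is modelled here.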
Example invocation
------------------
$ # my_fedora.qed is a tiny file initially but will be streamed when the guest starts
$ ./qemu-img create -f qed -o backing_file=fedora-14.img my_fedora.qed
Formatting 'my_fedora.qed', fmt=qed size=10737418240 backing_file='fedora-14.img' cluster_size=0 table_size=0
$ # run the guest and stream fedora-14.img into my_fedora.qed
$ x86_64-softmmu/qemu-system-x86_64 -m 512 -enable-kvm -drive if=virtio,file=my_fedora.qed,cache=none,stream=on
Details on changes
------------------
Image streaming introduces a new bdrv_aio_copy_backing() interface. Block
drivers that implement this interface support streaming. This function scans
for an unallocated cluster and populates it with data from the backing file.
The details of populating the image file are actually best implemented as a
copy-on-read operation. Copy-on-read means that a read request will populate
the image file if it needs to fetch data from the backing file. The
copy-on-read feature can be used outside the context of streaming and this
patch series therefore introduces the -drive copy-on-read=on option for that
purpose.
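As a rough sketch of the copy-on-read idea (again a toy model, not the QED code path): CorImage, cor_read() and the backing_reads counter are hypothetical and exist only to make the effect visible. With copy-on-read enabled, the first read of an unallocated cluster goes to the backing data and populates the image, so a repeated read no longer touches the backing data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define COR_NCLUSTERS    4
#define COR_CLUSTER_SIZE 8

/* Hypothetical image model; field names are illustrative only */
typedef struct CorImage {
    unsigned char data[COR_NCLUSTERS][COR_CLUSTER_SIZE];
    bool allocated[COR_NCLUSTERS];
    const unsigned char (*backing)[COR_CLUSTER_SIZE]; /* may be NULL */
    bool copy_on_read;      /* models -drive copy-on-read=on|off */
    unsigned backing_reads; /* how often the backing data was accessed */
} CorImage;

/* Read one whole cluster into buf */
static void cor_read(CorImage *img, size_t cluster, unsigned char *buf)
{
    assert(cluster < COR_NCLUSTERS);

    if (img->allocated[cluster]) {
        memcpy(buf, img->data[cluster], COR_CLUSTER_SIZE);
        return;
    }
    if (!img->backing) {
        memset(buf, 0, COR_CLUSTER_SIZE); /* unallocated, no backing: zeroes */
        return;
    }

    /* Unallocated: the data has to come from the backing file */
    memcpy(buf, img->backing[cluster], COR_CLUSTER_SIZE);
    img->backing_reads++;

    if (img->copy_on_read) {
        /* Populate the image so that the next read stays local */
        memcpy(img->data[cluster], buf, COR_CLUSTER_SIZE);
        img->allocated[cluster] = true;
    }
}
```

With copy_on_read set, two reads of the same cluster increment backing_reads once; with it cleared, every read of an unallocated cluster goes back to the backing data.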
The new block_stream QMP/HMP command can be used to start streaming a block
device. QMP events are raised on completion and failure so that polling is not
required. Adam Litke <agl@us.ibm.com> has implemented the libvirt APIs for
image streaming:
http://www.redhat.com/archives/libvir-list/2011-July/msg01570.html
Patches 1-6
block: add -drive copy-on-read=on|off
qed: replace is_write with flags field
qed: extract qed_start_allocating_write()
qed: make qed_aio_write_alloc() reusable
qed: add support for copy-on-read
qed: avoid deadlock on emulated synchronous I/O
These patches add copy-on-read support and implement it for QED.
Patches 7-13
block: add bdrv_aio_copy_backing()
qmp: add block_stream command
qmp: add block_job_cancel command
qmp: add query-block-jobs command
qmp: add block_job_set_speed command
block: add -drive stream=on|off
qed: intelligent streaming implementation
These patches implement image streaming using copy-on-read.
Patch 14
trace: trace bdrv_aio_readv/writev error paths
Additional trace events to identify I/O errors.
Patch 15
tests: add image streaming QMP interface tests
A Python script that performs basic QMP tests of image streaming.
v2:
* Implement latest block_stream QMP/HMP API
* Split monitor command patches into separate commits
* Add rate-limiting
* Remove iteration interface where client drives streaming
v1:
* -drive copy-on-read=on|off,stream=on|off instead of image header bits
* Latest libvirt API compatibility
* Workaround and assert for synchronous I/O emulation deadlock
Anthony Liguori (3):
qed: add support for copy-on-read
block: add bdrv_aio_copy_backing()
qed: intelligent streaming implementation
Stefan Hajnoczi (12):
block: add -drive copy-on-read=on|off
qed: replace is_write with flags field
qed: extract qed_start_allocating_write()
qed: make qed_aio_write_alloc() reusable
qed: avoid deadlock on emulated synchronous I/O
qmp: add block_stream command
qmp: add block_job_cancel command
qmp: add query-block-jobs command
qmp: add block_job_set_speed command
block: add -drive stream=on|off
trace: trace bdrv_aio_readv/writev error paths
tests: add image streaming QMP interface tests
block.c | 66 +++++++++-
block.h | 6 +
block/qed.c | 363 ++++++++++++++++++++++++++++++++++++++++++++++--------
block/qed.h | 8 +-
block_int.h | 3 +
blockdev.c | 286 +++++++++++++++++++++++++++++++++++++++++++
blockdev.h | 8 ++
hmp-commands.hx | 51 ++++++++-
monitor.c | 19 +++
monitor.h | 1 +
qemu-config.c | 8 ++
qemu-options.hx | 13 ++-
qerror.c | 8 ++
qerror.h | 6 +
qmp-commands.hx | 171 ++++++++++++++++++++++++++
test-stream.py | 193 +++++++++++++++++++++++++++++
trace-events | 10 ++-
17 files changed, 1156 insertions(+), 64 deletions(-)
create mode 100644 test-stream.py
--
1.7.5.4
* [Qemu-devel] [PATCH 01/15] block: add -drive copy-on-read=on|off
From: Stefan Hajnoczi @ 2011-07-27 13:44 UTC (permalink / raw)
To: qemu-devel; +Cc: Kevin Wolf, Anthony Liguori, Stefan Hajnoczi, Adam Litke
This patch adds the -drive copy-on-read=on|off command-line option:
copy-on-read=on|off
copy-on-read is "on" or "off" and controls whether sectors read from the
backing file are copied into the image file. Copy-on-read avoids accessing
the same backing file sectors repeatedly and is useful when the backing
file is over a slow network. By default copy-on-read is off.
The new BlockDriverState.copy_on_read field indicates whether
copy-on-read is enabled. Block drivers can use this as a hint to copy
sectors read from the backing file into the image file. The point of
copy-on-read is to avoid accessing the backing file again in the future.
Block drivers that do not honor the copy-on-read hint simply read data
without populating the image file.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
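Note (illustration, not part of the patch): the sketch below traces how the option ends up in the copy_on_read field. The TOY_* names are hypothetical. TOY_O_COPY_ON_READ uses the value from the block.h hunk in this patch, while the TOY_O_RDWR value and the mapping from readonly to the read-write flag are assumptions about code outside this diff. The point is that the hint is only honored when the image is opened read-write.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag names; only the 0x0400 value is taken from the patch */
#define TOY_O_RDWR         0x0002 /* assumed value, not shown in this patch */
#define TOY_O_COPY_ON_READ 0x0400

/* Models the drive_init() side: translate drive options into open flags */
static int toy_open_flags(bool readonly, bool copy_on_read_opt)
{
    int flags = readonly ? 0 : TOY_O_RDWR;

    if (copy_on_read_opt) {
        flags |= TOY_O_COPY_ON_READ;
    }
    return flags;
}

/* Models the bdrv_open_common() side: derive the per-device field */
static int toy_copy_on_read_enabled(int flags)
{
    int copy_on_read = 0;

    if (flags & TOY_O_RDWR) {
        copy_on_read = !!(flags & TOY_O_COPY_ON_READ);
    }
    assert(copy_on_read == 0 || copy_on_read == 1); /* !! normalises to 0/1 */
    return copy_on_read;
}
```

In this model copy-on-read=on combined with readonly=on yields 0, which matches the gating on BDRV_O_RDWR in the block.c hunk: a read-only image cannot be populated.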
block.c | 5 +++++
block.h | 1 +
block_int.h | 1 +
blockdev.c | 6 ++++++
hmp-commands.hx | 5 +++--
qemu-config.c | 4 ++++
qemu-options.hx | 10 +++++++++-
7 files changed, 29 insertions(+), 3 deletions(-)
diff --git a/block.c b/block.c
index 9549b9e..3d074af 100644
--- a/block.c
+++ b/block.c
@@ -430,6 +430,11 @@ static int bdrv_open_common(BlockDriverState *bs, const char *filename,
/* buffer_alignment defaulted to 512, drivers can change this value */
bs->buffer_alignment = 512;
+ bs->copy_on_read = 0;
+ if (flags & BDRV_O_RDWR) {
+ bs->copy_on_read = !!(flags & BDRV_O_COPY_ON_READ);
+ }
+
pstrcpy(bs->filename, sizeof(bs->filename), filename);
if (use_bdrv_whitelist && !bdrv_is_whitelisted(drv)) {
diff --git a/block.h b/block.h
index 59cc410..f6ffa93 100644
--- a/block.h
+++ b/block.h
@@ -34,6 +34,7 @@ typedef struct QEMUSnapshotInfo {
#define BDRV_O_NATIVE_AIO 0x0080 /* use native AIO instead of the thread pool */
#define BDRV_O_NO_BACKING 0x0100 /* don't open the backing file */
#define BDRV_O_NO_FLUSH 0x0200 /* disable flushing on this disk */
+#define BDRV_O_COPY_ON_READ 0x0400 /* copy read backing sectors into image */
#define BDRV_O_CACHE_MASK (BDRV_O_NOCACHE | BDRV_O_CACHE_WB | BDRV_O_NO_FLUSH)
diff --git a/block_int.h b/block_int.h
index efb6803..69dd5a7 100644
--- a/block_int.h
+++ b/block_int.h
@@ -154,6 +154,7 @@ struct BlockDriverState {
int encrypted; /* if true, the media is encrypted */
int valid_key; /* if true, a valid encryption key has been set */
int sg; /* if true, the device is a /dev/sg* */
+ int copy_on_read; /* if true, copy read backing sectors into image */
/* event callback when inserting/removing */
void (*change_cb)(void *opaque, int reason);
void *change_opaque;
diff --git a/blockdev.c b/blockdev.c
index 0b8d3a4..b337732 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -237,6 +237,7 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
const char *devaddr;
DriveInfo *dinfo;
int snapshot = 0;
+ int copy_on_read;
int ret;
translation = BIOS_ATA_TRANSLATION_AUTO;
@@ -253,6 +254,7 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
snapshot = qemu_opt_get_bool(opts, "snapshot", 0);
ro = qemu_opt_get_bool(opts, "readonly", 0);
+ copy_on_read = qemu_opt_get_bool(opts, "copy-on-read", 0);
file = qemu_opt_get(opts, "file");
serial = qemu_opt_get(opts, "serial");
@@ -517,6 +519,10 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
bdrv_flags |= (BDRV_O_SNAPSHOT|BDRV_O_CACHE_WB|BDRV_O_NO_FLUSH);
}
+ if (copy_on_read) {
+ bdrv_flags |= BDRV_O_COPY_ON_READ;
+ }
+
if (media == MEDIA_CDROM) {
/* CDROM is fine for any interface, don't check. */
ro = 1;
diff --git a/hmp-commands.hx b/hmp-commands.hx
index c857827..cbaa9a0 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -863,9 +863,10 @@ ETEXI
.args_type = "pci_addr:s,opts:s",
.params = "[[<domain>:]<bus>:]<slot>\n"
"[file=file][,if=type][,bus=n]\n"
- "[,unit=m][,media=d][index=i]\n"
+ "[,unit=m][,media=d][,index=i]\n"
"[,cyls=c,heads=h,secs=s[,trans=t]]\n"
- "[snapshot=on|off][,cache=on|off]",
+ "[,snapshot=on|off][,cache=on|off]\n"
+ "[,readonly=on|off][,copy-on-read=on|off]",
.help = "add drive to PCI storage controller",
.mhandler.cmd = drive_hot_add,
},
diff --git a/qemu-config.c b/qemu-config.c
index b2ec40b..2e5ee3c 100644
--- a/qemu-config.c
+++ b/qemu-config.c
@@ -84,6 +84,10 @@ static QemuOptsList qemu_drive_opts = {
.name = "readonly",
.type = QEMU_OPT_BOOL,
.help = "open drive file as read-only",
+ },{
+ .name = "copy-on-read",
+ .type = QEMU_OPT_BOOL,
+ .help = "copy read data from backing file into image file",
},
{ /* end of list */ }
},
diff --git a/qemu-options.hx b/qemu-options.hx
index 1d57f64..b7e52fe 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -135,7 +135,7 @@ DEF("drive", HAS_ARG, QEMU_OPTION_drive,
" [,cyls=c,heads=h,secs=s[,trans=t]][,snapshot=on|off]\n"
" [,cache=writethrough|writeback|none|unsafe][,format=f]\n"
" [,serial=s][,addr=A][,id=name][,aio=threads|native]\n"
- " [,readonly=on|off]\n"
+ " [,readonly=on|off][,copy-on-read=on|off]\n"
" use 'file' as a drive image\n", QEMU_ARCH_ALL)
STEXI
@item -drive @var{option}[,@var{option}[,@var{option}[,...]]]
@@ -183,6 +183,9 @@ host disk is full; report the error to the guest otherwise).
The default setting is @option{werror=enospc} and @option{rerror=report}.
@item readonly
Open drive @option{file} as read-only. Guest write attempts will fail.
+@item copy-on-read=@var{copy-on-read}
+@var{copy-on-read} is "on" or "off" and controls whether sectors read from
+the backing file are copied into the image file.
@end table
By default, writethrough caching is used for all block device. This means that
@@ -210,6 +213,11 @@ like your host losing power, the disk storage getting disconnected accidently,
etc. you're image will most probably be rendered unusable. When using
the @option{-snapshot} option, unsafe caching is always used.
+Copy-on-read avoids accessing the same backing file sectors repeatedly and is
+useful when the backing file is over a slow network. By default copy-on-read
+is off. Note that copy-on-read is a hint and may be ignored by block drivers
+which do not support it.
+
Instead of @option{-cdrom} you can use:
@example
qemu -drive file=file,index=2,media=cdrom
--
1.7.5.4
* [Qemu-devel] [PATCH 02/15] qed: replace is_write with flags field
From: Stefan Hajnoczi @ 2011-07-27 13:44 UTC (permalink / raw)
To: qemu-devel; +Cc: Kevin Wolf, Anthony Liguori, Stefan Hajnoczi, Adam Litke
Per-request attributes like read/write are currently implemented as bool
fields in the QEDAIOCB struct. This becomes unwieldy as the number of
attributes grows. For example, the qed_aio_setup() function would have
to take multiple bool arguments and at call sites it would be hard to
distinguish the meaning of each bool.
Instead use a flags field with bitmask constants. This will be used
when the copy-on-read and check-for-zeroes attributes are introduced.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
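Note (illustration, not part of the patch): a minimal sketch of why a flags field scales better than a list of bool arguments. The Toy* names are hypothetical; the two bit values mirror QED_AIOCB_WRITE from this patch and QED_AIOCB_COPY_ON_READ from a later patch in the series. The assertion encodes an assumption of this sketch only, namely that copy-on-read is never requested on a write.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-request attribute bits, ORed together by the caller */
enum {
    TOY_AIOCB_WRITE        = 0x0001,
    TOY_AIOCB_COPY_ON_READ = 0x0002,
};

typedef struct ToyAIOCB {
    int flags; /* TOY_AIOCB_* bits ORed together */
} ToyAIOCB;

/* One int parameter carries every attribute; call sites name the bits */
static void toy_aio_setup(ToyAIOCB *acb, int flags)
{
    /* Assumption of this sketch: copy-on-read only applies to reads */
    assert(!((flags & TOY_AIOCB_WRITE) && (flags & TOY_AIOCB_COPY_ON_READ)));
    acb->flags = flags;
}

static bool toy_is_write(const ToyAIOCB *acb)
{
    return (acb->flags & TOY_AIOCB_WRITE) != 0;
}

static bool toy_wants_copy_on_read(const ToyAIOCB *acb)
{
    return (acb->flags & TOY_AIOCB_COPY_ON_READ) != 0;
}
```

A call such as toy_aio_setup(&acb, TOY_AIOCB_COPY_ON_READ) documents itself at the call site, whereas a hypothetical setup(&acb, false, true) would not.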
block/qed.c | 15 ++++++++-------
block/qed.h | 6 +++++-
trace-events | 2 +-
3 files changed, 14 insertions(+), 9 deletions(-)
diff --git a/block/qed.c b/block/qed.c
index 3970379..565bbc1 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -1254,8 +1254,8 @@ static void qed_aio_next_io(void *opaque, int ret)
{
QEDAIOCB *acb = opaque;
BDRVQEDState *s = acb_to_s(acb);
- QEDFindClusterFunc *io_fn =
- acb->is_write ? qed_aio_write_data : qed_aio_read_data;
+ QEDFindClusterFunc *io_fn = (acb->flags & QED_AIOCB_WRITE) ?
+ qed_aio_write_data : qed_aio_read_data;
trace_qed_aio_next_io(s, acb, ret, acb->cur_pos + acb->cur_qiov.size);
@@ -1285,14 +1285,14 @@ static BlockDriverAIOCB *qed_aio_setup(BlockDriverState *bs,
int64_t sector_num,
QEMUIOVector *qiov, int nb_sectors,
BlockDriverCompletionFunc *cb,
- void *opaque, bool is_write)
+ void *opaque, int flags)
{
QEDAIOCB *acb = qemu_aio_get(&qed_aio_pool, bs, cb, opaque);
trace_qed_aio_setup(bs->opaque, acb, sector_num, nb_sectors,
- opaque, is_write);
+ opaque, flags);
- acb->is_write = is_write;
+ acb->flags = flags;
acb->finished = NULL;
acb->qiov = qiov;
acb->qiov_offset = 0;
@@ -1312,7 +1312,7 @@ static BlockDriverAIOCB *bdrv_qed_aio_readv(BlockDriverState *bs,
BlockDriverCompletionFunc *cb,
void *opaque)
{
- return qed_aio_setup(bs, sector_num, qiov, nb_sectors, cb, opaque, false);
+ return qed_aio_setup(bs, sector_num, qiov, nb_sectors, cb, opaque, 0);
}
static BlockDriverAIOCB *bdrv_qed_aio_writev(BlockDriverState *bs,
@@ -1321,7 +1321,8 @@ static BlockDriverAIOCB *bdrv_qed_aio_writev(BlockDriverState *bs,
BlockDriverCompletionFunc *cb,
void *opaque)
{
- return qed_aio_setup(bs, sector_num, qiov, nb_sectors, cb, opaque, true);
+ return qed_aio_setup(bs, sector_num, qiov, nb_sectors, cb,
+ opaque, QED_AIOCB_WRITE);
}
static BlockDriverAIOCB *bdrv_qed_aio_flush(BlockDriverState *bs,
diff --git a/block/qed.h b/block/qed.h
index 388fdb3..dbc00be 100644
--- a/block/qed.h
+++ b/block/qed.h
@@ -123,12 +123,16 @@ typedef struct QEDRequest {
CachedL2Table *l2_table;
} QEDRequest;
+enum {
+ QED_AIOCB_WRITE = 0x0001, /* read or write? */
+};
+
typedef struct QEDAIOCB {
BlockDriverAIOCB common;
QEMUBH *bh;
int bh_ret; /* final return status for completion bh */
QSIMPLEQ_ENTRY(QEDAIOCB) next; /* next request */
- bool is_write; /* false - read, true - write */
+ int flags; /* QED_AIOCB_* bits ORed together */
bool *finished; /* signal for cancel completion */
uint64_t end_pos; /* request end on block device, in bytes */
diff --git a/trace-events b/trace-events
index 713f042..73a8592 100644
--- a/trace-events
+++ b/trace-events
@@ -268,7 +268,7 @@ disable qed_need_check_timer_cb(void *s) "s %p"
disable qed_start_need_check_timer(void *s) "s %p"
disable qed_cancel_need_check_timer(void *s) "s %p"
disable qed_aio_complete(void *s, void *acb, int ret) "s %p acb %p ret %d"
-disable qed_aio_setup(void *s, void *acb, int64_t sector_num, int nb_sectors, void *opaque, int is_write) "s %p acb %p sector_num %"PRId64" nb_sectors %d opaque %p is_write %d"
+disable qed_aio_setup(void *s, void *acb, int64_t sector_num, int nb_sectors, void *opaque, int flags) "s %p acb %p sector_num %"PRId64" nb_sectors %d opaque %p flags %#x"
disable qed_aio_next_io(void *s, void *acb, int ret, uint64_t cur_pos) "s %p acb %p ret %d cur_pos %"PRIu64""
disable qed_aio_read_data(void *s, void *acb, int ret, uint64_t offset, size_t len) "s %p acb %p ret %d offset %"PRIu64" len %zu"
disable qed_aio_write_data(void *s, void *acb, int ret, uint64_t offset, size_t len) "s %p acb %p ret %d offset %"PRIu64" len %zu"
--
1.7.5.4
* [Qemu-devel] [PATCH 03/15] qed: extract qed_start_allocating_write()
From: Stefan Hajnoczi @ 2011-07-27 13:44 UTC (permalink / raw)
To: qemu-devel; +Cc: Kevin Wolf, Anthony Liguori, Stefan Hajnoczi, Adam Litke
Copy-on-read requests are a form of allocating write and will need to be
queued like other allocating writes. This patch extracts the request
queuing code for allocating writes so that it can be reused for
copy-on-read in a later patch.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
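Note (illustration, not part of the patch): the queuing rule can be sketched with a small array-based queue. ToyQueue, toy_start_allocating_write() and toy_complete_head() are hypothetical; the real code uses a QSIMPLEQ of QEDAIOCBs. A request proceeds only when it is at the head of the queue and the queue is not plugged. Otherwise the caller returns and the request is re-run later.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_QUEUE_MAX 8

/* Hypothetical queue of allocating requests, head at index 0 */
typedef struct ToyQueue {
    int reqs[TOY_QUEUE_MAX]; /* request ids */
    size_t len;
    bool plugged;
} ToyQueue;

/* Returns true if the request may proceed now, false if it was queued */
static bool toy_start_allocating_write(ToyQueue *q, int req)
{
    /* Append unless this request is already the head (it is being re-run) */
    if (q->len == 0 || q->reqs[0] != req) {
        assert(q->len < TOY_QUEUE_MAX);
        q->reqs[q->len++] = req;
    }
    return q->reqs[0] == req && !q->plugged;
}

/* Remove the finished head; returns the next request to re-run, or -1 */
static int toy_complete_head(ToyQueue *q)
{
    size_t i;

    assert(q->len > 0);
    for (i = 1; i < q->len; i++) {
        q->reqs[i - 1] = q->reqs[i];
    }
    q->len--;
    return q->len > 0 ? q->reqs[0] : -1;
}
```

When the head completes, the next request is activated and calls toy_start_allocating_write() again, this time finding itself at the head.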
block/qed.c | 32 ++++++++++++++++++++++++++------
1 files changed, 26 insertions(+), 6 deletions(-)
diff --git a/block/qed.c b/block/qed.c
index 565bbc1..cc193ad 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -1097,14 +1097,15 @@ static bool qed_should_set_need_check(BDRVQEDState *s)
}
/**
- * Write new data cluster
+ * Start an allocating write request or queue it
*
- * @acb: Write request
- * @len: Length in bytes
+ * @ret: true if request can proceed, false if queued
*
- * This path is taken when writing to previously unallocated clusters.
+ * If a request is queued this function returns false and the caller should
+ * return. When it becomes time for the request to proceed the qed_aio_next_io()
+ * function will be called.
*/
-static void qed_aio_write_alloc(QEDAIOCB *acb, size_t len)
+static bool qed_start_allocating_write(QEDAIOCB *acb)
{
BDRVQEDState *s = acb_to_s(acb);
@@ -1119,7 +1120,26 @@ static void qed_aio_write_alloc(QEDAIOCB *acb, size_t len)
}
if (acb != QSIMPLEQ_FIRST(&s->allocating_write_reqs) ||
s->allocating_write_reqs_plugged) {
- return; /* wait for existing request to finish */
+ return false;
+ }
+ return true;
+}
+
+/**
+ * Write new data cluster
+ *
+ * @acb: Write request
+ * @len: Length in bytes
+ *
+ * This path is taken when writing to previously unallocated clusters.
+ */
+static void qed_aio_write_alloc(QEDAIOCB *acb, size_t len)
+{
+ BDRVQEDState *s = acb_to_s(acb);
+ BlockDriverCompletionFunc *cb;
+
+ if (!qed_start_allocating_write(acb)) {
+ return;
}
acb->cur_nclusters = qed_bytes_to_clusters(s,
--
1.7.5.4
* [Qemu-devel] [PATCH 04/15] qed: make qed_aio_write_alloc() reusable
From: Stefan Hajnoczi @ 2011-07-27 13:44 UTC (permalink / raw)
To: qemu-devel; +Cc: Kevin Wolf, Anthony Liguori, Stefan Hajnoczi, Adam Litke
Copy-on-read requests will share the allocating write code path. This
requires making qed_aio_write_alloc() reusable outside of a write
request. This patch ensures that iovec setup is performed in a common
place before qed_aio_write_alloc() is called.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
block/qed.c | 53 +++++++++++++++--------------------------------------
1 files changed, 15 insertions(+), 38 deletions(-)
diff --git a/block/qed.c b/block/qed.c
index cc193ad..4f535aa 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -1133,19 +1133,18 @@ static bool qed_start_allocating_write(QEDAIOCB *acb)
*
* This path is taken when writing to previously unallocated clusters.
*/
-static void qed_aio_write_alloc(QEDAIOCB *acb, size_t len)
+static void qed_aio_write_alloc(QEDAIOCB *acb)
{
BDRVQEDState *s = acb_to_s(acb);
- BlockDriverCompletionFunc *cb;
if (!qed_start_allocating_write(acb)) {
- return;
+ qemu_iovec_reset(&acb->cur_qiov);
+ return; /* wait until current allocating write completes */
}
acb->cur_nclusters = qed_bytes_to_clusters(s,
- qed_offset_into_cluster(s, acb->cur_pos) + len);
+ qed_offset_into_cluster(s, acb->cur_pos) + acb->cur_qiov.size);
acb->cur_cluster = qed_alloc_clusters(s, acb->cur_nclusters);
- qemu_iovec_copy(&acb->cur_qiov, acb->qiov, acb->qiov_offset, len);
if (qed_should_set_need_check(s)) {
s->header.features |= QED_F_NEED_CHECK;
@@ -1156,25 +1155,6 @@ static void qed_aio_write_alloc(QEDAIOCB *acb, size_t len)
}
/**
- * Write data cluster in place
- *
- * @acb: Write request
- * @offset: Cluster offset in bytes
- * @len: Length in bytes
- *
- * This path is taken when writing to already allocated clusters.
- */
-static void qed_aio_write_inplace(QEDAIOCB *acb, uint64_t offset, size_t len)
-{
- /* Calculate the I/O vector */
- acb->cur_cluster = offset;
- qemu_iovec_copy(&acb->cur_qiov, acb->qiov, acb->qiov_offset, len);
-
- /* Do the actual write */
- qed_aio_write_main(acb, 0);
-}
-
-/**
* Write data cluster
*
* @opaque: Write request
@@ -1192,22 +1172,19 @@ static void qed_aio_write_data(void *opaque, int ret,
trace_qed_aio_write_data(acb_to_s(acb), acb, ret, offset, len);
- acb->find_cluster_ret = ret;
-
- switch (ret) {
- case QED_CLUSTER_FOUND:
- qed_aio_write_inplace(acb, offset, len);
- break;
+ if (ret < 0) {
+ qed_aio_complete(acb, ret);
+ return;
+ }
- case QED_CLUSTER_L2:
- case QED_CLUSTER_L1:
- case QED_CLUSTER_ZERO:
- qed_aio_write_alloc(acb, len);
- break;
+ acb->find_cluster_ret = ret;
+ qemu_iovec_copy(&acb->cur_qiov, acb->qiov, acb->qiov_offset, len);
- default:
- qed_aio_complete(acb, ret);
- break;
+ if (ret == QED_CLUSTER_FOUND) {
+ acb->cur_cluster = offset;
+ qed_aio_write_main(acb, 0);
+ } else {
+ qed_aio_write_alloc(acb);
}
}
--
1.7.5.4
* [Qemu-devel] [PATCH 05/15] qed: add support for copy-on-read
From: Stefan Hajnoczi @ 2011-07-27 13:44 UTC (permalink / raw)
To: qemu-devel; +Cc: Kevin Wolf, Anthony Liguori, Stefan Hajnoczi, Adam Litke
From: Anthony Liguori <aliguori@us.ibm.com>
This patch implements copy-on-read in QED. Once a read request reaches
the copy-on-read state it adds itself to the allocating write queue in
order to avoid race conditions with write requests.
If an allocating write request manages to sneak in before the
copy-on-read request, then the copy-on-read will notice that the cluster
has been allocated when qed_find_cluster() is re-run. This works
because only one allocating request is active at any time and when the
next request is activated it will re-run qed_find_cluster().
[Originally by Anthony. Stefan added allocating write queuing and
factored out the QED_CF_COPY_ON_READ header flag.]
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
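Note (illustration, not part of the patch): the race described in the commit message is resolved by looking up the cluster again when the queued request is activated. The sketch below models just that re-check. ToyQedState and the toy_* functions are hypothetical, and the lookup is a plain array access rather than the asynchronous qed_find_cluster().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NCLUSTERS 4

typedef enum {
    TOY_CLUSTER_FOUND,
    TOY_CLUSTER_UNALLOCATED,
} ToyFindResult;

/* Hypothetical driver state: allocation map plus a counter for the test */
typedef struct ToyQedState {
    bool allocated[TOY_NCLUSTERS];
    unsigned backing_copies; /* clusters populated by copy-on-read */
} ToyQedState;

static ToyFindResult toy_find_cluster(const ToyQedState *s, size_t cluster)
{
    assert(cluster < TOY_NCLUSTERS);
    return s->allocated[cluster] ? TOY_CLUSTER_FOUND : TOY_CLUSTER_UNALLOCATED;
}

/* Stand-in for a guest write that allocates a cluster */
static void toy_allocating_write(ToyQedState *s, size_t cluster)
{
    assert(cluster < TOY_NCLUSTERS);
    s->allocated[cluster] = true;
}

/* Run (or re-run) a copy-on-read request once it heads the queue */
static void toy_copy_on_read_run(ToyQedState *s, size_t cluster)
{
    if (toy_find_cluster(s, cluster) == TOY_CLUSTER_FOUND) {
        return; /* an earlier allocating write won: read the image directly */
    }
    s->allocated[cluster] = true; /* populate from the backing file */
    s->backing_copies++;
}
```

Because only one allocating request is active at a time, the lookup result seen by toy_copy_on_read_run() cannot be invalidated by a concurrent allocation in this model.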
block/qed.c | 35 +++++++++++++++++++++++++++++++++--
block/qed.h | 3 ++-
trace-events | 1 +
3 files changed, 36 insertions(+), 3 deletions(-)
diff --git a/block/qed.c b/block/qed.c
index 4f535aa..6ca57f2 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -1189,6 +1189,25 @@ static void qed_aio_write_data(void *opaque, int ret,
}
/**
+ * Copy on read callback
+ *
+ * Write data from backing file to QED that's been read if CoR is enabled.
+ */
+static void qed_copy_on_read_cb(void *opaque, int ret)
+{
+ QEDAIOCB *acb = opaque;
+
+ trace_qed_copy_on_read_cb(acb, ret);
+
+ if (ret < 0) {
+ qed_aio_complete(acb, ret);
+ return;
+ }
+
+ qed_aio_write_alloc(acb);
+}
+
+/**
* Read data cluster
*
* @opaque: Read request
@@ -1216,6 +1235,7 @@ static void qed_aio_read_data(void *opaque, int ret,
goto err;
}
+ acb->find_cluster_ret = ret;
qemu_iovec_copy(&acb->cur_qiov, acb->qiov, acb->qiov_offset, len);
/* Handle zero cluster and backing file reads */
@@ -1224,8 +1244,17 @@ static void qed_aio_read_data(void *opaque, int ret,
qed_aio_next_io(acb, 0);
return;
} else if (ret != QED_CLUSTER_FOUND) {
+ BlockDriverCompletionFunc *cb = qed_aio_next_io;
+
+ if (bs->backing_hd && (acb->flags & QED_AIOCB_COPY_ON_READ)) {
+ if (!qed_start_allocating_write(acb)) {
+ qemu_iovec_reset(&acb->cur_qiov);
+ return; /* wait for current allocating write to complete */
+ }
+ cb = qed_copy_on_read_cb;
+ }
qed_read_backing_file(s, acb->cur_pos, &acb->cur_qiov,
- qed_aio_next_io, acb);
+ cb, acb);
return;
}
@@ -1309,7 +1338,9 @@ static BlockDriverAIOCB *bdrv_qed_aio_readv(BlockDriverState *bs,
BlockDriverCompletionFunc *cb,
void *opaque)
{
- return qed_aio_setup(bs, sector_num, qiov, nb_sectors, cb, opaque, 0);
+ int flags = bs->copy_on_read ? QED_AIOCB_COPY_ON_READ : 0;
+
+ return qed_aio_setup(bs, sector_num, qiov, nb_sectors, cb, opaque, flags);
}
static BlockDriverAIOCB *bdrv_qed_aio_writev(BlockDriverState *bs,
diff --git a/block/qed.h b/block/qed.h
index dbc00be..16f4bd9 100644
--- a/block/qed.h
+++ b/block/qed.h
@@ -124,7 +124,8 @@ typedef struct QEDRequest {
} QEDRequest;
enum {
- QED_AIOCB_WRITE = 0x0001, /* read or write? */
+ QED_AIOCB_WRITE = 0x0001, /* read or write? */
+ QED_AIOCB_COPY_ON_READ = 0x0002,
};
typedef struct QEDAIOCB {
diff --git a/trace-events b/trace-events
index 73a8592..2c7c6dc 100644
--- a/trace-events
+++ b/trace-events
@@ -271,6 +271,7 @@ disable qed_aio_complete(void *s, void *acb, int ret) "s %p acb %p ret %d"
disable qed_aio_setup(void *s, void *acb, int64_t sector_num, int nb_sectors, void *opaque, int flags) "s %p acb %p sector_num %"PRId64" nb_sectors %d opaque %p flags %#x"
disable qed_aio_next_io(void *s, void *acb, int ret, uint64_t cur_pos) "s %p acb %p ret %d cur_pos %"PRIu64""
disable qed_aio_read_data(void *s, void *acb, int ret, uint64_t offset, size_t len) "s %p acb %p ret %d offset %"PRIu64" len %zu"
+disable qed_copy_on_read_cb(void *acb, int ret) "acb %p ret %d"
disable qed_aio_write_data(void *s, void *acb, int ret, uint64_t offset, size_t len) "s %p acb %p ret %d offset %"PRIu64" len %zu"
disable qed_aio_write_prefill(void *s, void *acb, uint64_t start, size_t len, uint64_t offset) "s %p acb %p start %"PRIu64" len %zu offset %"PRIu64""
disable qed_aio_write_postfill(void *s, void *acb, uint64_t start, size_t len, uint64_t offset) "s %p acb %p start %"PRIu64" len %zu offset %"PRIu64""
--
1.7.5.4
* [Qemu-devel] [PATCH 06/15] qed: avoid deadlock on emulated synchronous I/O
From: Stefan Hajnoczi @ 2011-07-27 13:44 UTC (permalink / raw)
To: qemu-devel; +Cc: Kevin Wolf, Anthony Liguori, Stefan Hajnoczi, Adam Litke
The block layer emulates synchronous bdrv_read()/bdrv_write() for
drivers that only provide the asynchronous interfaces. The emulation
issues an asynchronous request inside a new "async context" and waits
for that request to complete. If currently outstanding requests
complete during this time, their completion functions are not invoked
until the async context is popped again.
This can lead to deadlock if an allocating write is being processed when
synchronous I/O emulation starts. The emulated synchronous write will
be queued because an existing request is being processed. But the
existing request cannot complete until the async context is popped.
The result is that qemu_aio_wait() sits in a deadlock.
Address this problem in two ways:
1. Add an assertion so that we instantly know if this corner case is
hit. This saves us time by giving a clear failure indication.
2. Ignore the copy-on-read hint for emulated synchronous reads. This
allows us to do emulated synchronous reads without hitting the
deadlock.
Keep this as a separate commit instead of merging with previous QED
patches so it is easy to drop once coroutines are introduced and async
contexts are eliminated.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
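Note (illustration, not part of the patch): the second workaround boils down to one extra condition on the read path. The sketch below isolates that predicate. toy_should_copy_on_read() is a hypothetical helper, the flag value mirrors QED_AIOCB_COPY_ON_READ, and the context id is passed in as a parameter instead of being obtained from get_async_context_id().

```c
#include <assert.h>
#include <stdbool.h>

enum {
    TOY_AIOCB_COPY_ON_READ = 0x0002, /* mirrors QED_AIOCB_COPY_ON_READ */
};

/*
 * Decide whether a read that missed in the image should populate it.
 * A non-zero context id stands for emulated synchronous I/O, where
 * queuing behind an outstanding allocating request would deadlock.
 */
static bool toy_should_copy_on_read(bool has_backing, int flags,
                                    int async_context_id)
{
    assert(async_context_id >= 0); /* assumption: ids are non-negative */

    return has_backing &&
           (flags & TOY_AIOCB_COPY_ON_READ) &&
           async_context_id == 0;
}
```

An emulated synchronous read therefore degrades to a plain backing file read: it returns correct data but leaves the cluster unallocated for a later asynchronous read or the streaming job to populate.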
block/qed.c | 12 +++++++++++-
1 files changed, 11 insertions(+), 1 deletions(-)
diff --git a/block/qed.c b/block/qed.c
index 6ca57f2..ffdbc2d 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -1120,6 +1120,14 @@ static bool qed_start_allocating_write(QEDAIOCB *acb)
}
if (acb != QSIMPLEQ_FIRST(&s->allocating_write_reqs) ||
s->allocating_write_reqs_plugged) {
+ /* Queuing an emulated synchronous write causes deadlock since
+ * currently outstanding requests are not in the current async context
+ * and their completion will never be invoked. Once the block layer
+ * moves to truly asynchronous semantics this failure case will be
+ * eliminated.
+ */
+ assert(get_async_context_id() == 0);
+
return false;
}
return true;
@@ -1246,7 +1254,9 @@ static void qed_aio_read_data(void *opaque, int ret,
} else if (ret != QED_CLUSTER_FOUND) {
BlockDriverCompletionFunc *cb = qed_aio_next_io;
- if (bs->backing_hd && (acb->flags & QED_AIOCB_COPY_ON_READ)) {
+ /* See qed_start_allocating_write() for get_async_context_id() hack */
+ if (bs->backing_hd && (acb->flags & QED_AIOCB_COPY_ON_READ) &&
+ get_async_context_id() == 0) {
if (!qed_start_allocating_write(acb)) {
qemu_iovec_reset(&acb->cur_qiov);
return; /* wait for current allocating write to complete */
--
1.7.5.4
* [Qemu-devel] [PATCH 07/15] block: add bdrv_aio_copy_backing()
From: Stefan Hajnoczi @ 2011-07-27 13:44 UTC (permalink / raw)
To: qemu-devel; +Cc: Kevin Wolf, Anthony Liguori, Stefan Hajnoczi, Adam Litke
From: Anthony Liguori <aliguori@us.ibm.com>
Add the bdrv_aio_copy_backing() function to the BlockDriver interface.
This function copies unallocated sectors from the backing file and can
be used to implement image streaming.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
block.c | 37 +++++++++++++++++++++++++++++++++++++
block.h | 5 +++++
block_int.h | 2 ++
3 files changed, 44 insertions(+), 0 deletions(-)
diff --git a/block.c b/block.c
index 3d074af..8225758 100644
--- a/block.c
+++ b/block.c
@@ -2240,6 +2240,43 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
return ret;
}
+/**
+ * Attempt to copy unallocated sectors from backing file.
+ *
+ * @sector_num - the first sector to start from
+ * @cb - completion callback
+ * @opaque - data to pass completion callback
+ *
+ * Returns NULL if the image format does not support the operation, the image
+ * is read-only, or no image is open.
+ *
+ * The intention of this function is for a user to execute it once with a
+ * sector_num of 0 and then upon receiving a completion callback, to remember
+ * the number of sectors copied, and then to call this function again with
+ * an offset adjusted by the number of sectors previously copied.
+ *
+ * This allows a user to progressively stream in an image at a pace that makes
+ * sense. In general, this function tries to do the smallest amount of I/O
+ * possible to do some useful work.
+ *
+ * This function only really makes sense in combination with a block format
+ * that supports copy on read and has it enabled. If copy on read is not
+ * enabled, a block format driver may return NULL.
+ *
+ * If an I/O error occurs the completion callback is invoked with -errno in the
+ * nb_sectors argument.
+ */
+BlockDriverAIOCB *bdrv_aio_copy_backing(BlockDriverState *bs,
+ int64_t sector_num,
+ BlockDriverCopyBackingCB *cb,
+ void *opaque)
+{
+ if (!bs->drv || bs->read_only || !bs->drv->bdrv_aio_copy_backing) {
+ return NULL;
+ }
+
+ return bs->drv->bdrv_aio_copy_backing(bs, sector_num, cb, opaque);
+}
typedef struct MultiwriteCB {
int error;
diff --git a/block.h b/block.h
index f6ffa93..aee11e6 100644
--- a/block.h
+++ b/block.h
@@ -113,6 +113,7 @@ typedef struct BlockDriverAIOCB BlockDriverAIOCB;
typedef void BlockDriverCompletionFunc(void *opaque, int ret);
typedef void BlockDriverDirtyHandler(BlockDriverState *bs, int64_t sector,
int sector_num);
+typedef void BlockDriverCopyBackingCB(void *opaque, int nb_sectors);
BlockDriverAIOCB *bdrv_aio_readv(BlockDriverState *bs, int64_t sector_num,
QEMUIOVector *iov, int nb_sectors,
BlockDriverCompletionFunc *cb, void *opaque);
@@ -121,6 +122,10 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
BlockDriverCompletionFunc *cb, void *opaque);
BlockDriverAIOCB *bdrv_aio_flush(BlockDriverState *bs,
BlockDriverCompletionFunc *cb, void *opaque);
+BlockDriverAIOCB *bdrv_aio_copy_backing(BlockDriverState *bs,
+ int64_t sector_num,
+ BlockDriverCopyBackingCB *cb,
+ void *opaque);
void bdrv_aio_cancel(BlockDriverAIOCB *acb);
typedef struct BlockRequest {
diff --git a/block_int.h b/block_int.h
index 69dd5a7..8b10083 100644
--- a/block_int.h
+++ b/block_int.h
@@ -74,6 +74,8 @@ struct BlockDriver {
BlockDriverCompletionFunc *cb, void *opaque);
BlockDriverAIOCB *(*bdrv_aio_flush)(BlockDriverState *bs,
BlockDriverCompletionFunc *cb, void *opaque);
+ BlockDriverAIOCB *(*bdrv_aio_copy_backing)(BlockDriverState *bs,
+ int64_t sector_num, BlockDriverCopyBackingCB *cb, void *opaque);
int (*bdrv_discard)(BlockDriverState *bs, int64_t sector_num,
int nb_sectors);
--
1.7.5.4
* [Qemu-devel] [PATCH 08/15] qmp: add block_stream command
From: Stefan Hajnoczi @ 2011-07-27 13:44 UTC (permalink / raw)
To: qemu-devel; +Cc: Kevin Wolf, Anthony Liguori, Stefan Hajnoczi, Adam Litke
For leaf images with copy-on-read semantics, the stream command allows
the user to populate the image file by copying data from the backing
file while the guest is running. Once all blocks have been streamed,
the dependency on the original backing file is removed. Therefore,
stream commands can be used to implement post-copy live block migration
and rapid deployment.
The command synopsis is:
block_stream
------------
Copy data from a backing file into a block device.
The block streaming operation is performed in the background until the
entire backing file has been copied. This command returns immediately
once streaming has started. The status of ongoing block streaming
operations can be checked with query-block-jobs. The operation can be
stopped before it has completed using the block_job_cancel command.
If a base file is specified then sectors are not copied from that base
file and its backing chain. When streaming completes the image file
will have the base file as its backing file. This can be used to stream
a subset of the backing file chain instead of flattening the entire
image.
On successful completion the image file is updated to drop the backing
file.
Arguments:
- device: device name (json-string)
- base: common backing file (json-string, optional)
Errors:
DeviceInUse: streaming is already active on this device
DeviceNotFound: device name is invalid
NotSupported: image streaming is not supported by this device
Events:
On completion the BLOCK_JOB_COMPLETED event is raised with the following
fields:
- type: job type ("stream" for image streaming, json-string)
- device: device name (json-string)
- end: maximum progress value (json-int)
- position: current progress value (json-int)
- speed: rate limit, bytes per second (json-int)
- error: error message (json-string, only on error)
The completion event is raised both on success and on failure. On
success position is equal to end. On failure position and end can be
used to indicate at which point the operation failed.
On failure the error field contains a human-readable error message.
There are no semantics other than that streaming has failed and clients
should not try to interpret the error string.
Examples:
-> { "execute": "block_stream", "arguments": { "device": "virtio0" } }
<- { "return": {} }
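A client consuming BLOCK_JOB_COMPLETED might treat the event roughly as
follows. This is a minimal sketch under assumptions: the BlockJobCompleted
struct and both helper functions are hypothetical client-side names, and the
fields simply mirror the list above. Failure is detected only by the presence
of the error field, never by parsing its text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical client-side view of a decoded BLOCK_JOB_COMPLETED event. */
typedef struct BlockJobCompleted {
    const char *type;     /* "stream" for image streaming */
    const char *device;
    int64_t end;          /* maximum progress value */
    int64_t position;     /* progress value when the event was raised */
    int64_t speed;        /* rate limit, bytes per second */
    const char *error;    /* NULL unless the job failed */
} BlockJobCompleted;

/* The error string is opaque: only its presence is meaningful. */
static bool block_job_failed(const BlockJobCompleted *ev)
{
    assert(ev != NULL);
    return ev->error != NULL;
}

/* Progress units that were not processed; 0 when position equals end. */
static int64_t block_job_remaining(const BlockJobCompleted *ev)
{
    assert(ev != NULL && ev->position <= ev->end);
    return ev->end - ev->position;
}
```

On success block_job_remaining() is 0 because position equals end; on failure
it shows how much was left when the operation stopped.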
Signed-off-by: Adam Litke <agl@us.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
blockdev.c | 133 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
blockdev.h | 1 +
hmp-commands.hx | 14 ++++++
monitor.c | 3 +
monitor.h | 1 +
qerror.c | 8 +++
qerror.h | 6 +++
qmp-commands.hx | 64 ++++++++++++++++++++++++++
8 files changed, 230 insertions(+), 0 deletions(-)
diff --git a/blockdev.c b/blockdev.c
index b337732..cd5e49c 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -16,6 +16,7 @@
#include "sysemu.h"
#include "hw/qdev.h"
#include "block_int.h"
+#include "qjson.h"
static QTAILQ_HEAD(drivelist, DriveInfo) drives = QTAILQ_HEAD_INITIALIZER(drives);
@@ -50,6 +51,131 @@ static const int if_max_devs[IF_COUNT] = {
[IF_SCSI] = 7,
};
+typedef struct StreamState {
+ int64_t offset; /* current position in block device */
+ BlockDriverState *bs;
+ QEMUTimer *timer;
+ QLIST_ENTRY(StreamState) list;
+} StreamState;
+
+static QLIST_HEAD(, StreamState) block_streams =
+ QLIST_HEAD_INITIALIZER(block_streams);
+
+static QObject *stream_get_qobject(StreamState *s)
+{
+ const char *name = bdrv_get_device_name(s->bs);
+ int64_t len = bdrv_getlength(s->bs);
+
+ return qobject_from_jsonf("{ 'device': %s, 'type': 'stream', "
+ "'offset': %" PRId64 ", 'len': %" PRId64 ", "
+ "'speed': %" PRId64 " }",
+ name, s->offset, len, (int64_t)0);
+}
+
+static void stream_mon_event(StreamState *s, int ret)
+{
+ QObject *data = stream_get_qobject(s);
+
+ if (ret < 0) {
+ QDict *qdict = qobject_to_qdict(data);
+
+ qdict_put(qdict, "error", qstring_from_str(strerror(-ret)));
+ }
+
+ monitor_protocol_event(QEVENT_BLOCK_JOB_COMPLETED, data);
+ qobject_decref(data);
+}
+
+static void stream_free(StreamState *s)
+{
+ QLIST_REMOVE(s, list);
+
+ qemu_del_timer(s->timer);
+ qemu_free_timer(s->timer);
+ qemu_free(s);
+}
+
+static void stream_complete(StreamState *s, int ret)
+{
+ stream_mon_event(s, ret);
+ stream_free(s);
+}
+
+static void stream_cb(void *opaque, int nb_sectors)
+{
+ StreamState *s = opaque;
+
+ if (nb_sectors < 0) {
+ stream_complete(s, nb_sectors);
+ return;
+ }
+
+ s->offset += nb_sectors * BDRV_SECTOR_SIZE;
+
+ if (s->offset == bdrv_getlength(s->bs)) {
+ bdrv_change_backing_file(s->bs, NULL, NULL);
+ stream_complete(s, 0);
+ } else {
+ qemu_mod_timer(s->timer, qemu_get_clock_ns(rt_clock));
+ }
+}
+
+/* We can't call bdrv_aio_copy_backing() directly from the callback because
+ * that makes qemu_aio_flush() not complete until the streaming is completed.
+ * By delaying with a timer, we give qemu_aio_flush() a chance to complete.
+ */
+static void stream_next_iteration(void *opaque)
+{
+ StreamState *s = opaque;
+
+ bdrv_aio_copy_backing(s->bs, s->offset / BDRV_SECTOR_SIZE, stream_cb, s);
+}
+
+static StreamState *stream_find(const char *device)
+{
+ StreamState *s;
+
+ QLIST_FOREACH(s, &block_streams, list) {
+ if (strcmp(bdrv_get_device_name(s->bs), device) == 0) {
+ return s;
+ }
+ }
+ return NULL;
+}
+
+static StreamState *stream_start(const char *device)
+{
+ StreamState *s;
+ BlockDriverAIOCB *acb;
+ BlockDriverState *bs;
+
+ s = stream_find(device);
+ if (s) {
+ qerror_report(QERR_DEVICE_IN_USE, device);
+ return NULL;
+ }
+
+ bs = bdrv_find(device);
+ if (!bs) {
+ qerror_report(QERR_DEVICE_NOT_FOUND, device);
+ return NULL;
+ }
+
+ s = qemu_mallocz(sizeof(*s));
+ s->bs = bs;
+ s->timer = qemu_new_timer_ns(rt_clock, stream_next_iteration, s);
+ QLIST_INSERT_HEAD(&block_streams, s, list);
+
+ acb = bdrv_aio_copy_backing(s->bs, s->offset / BDRV_SECTOR_SIZE,
+ stream_cb, s);
+ if (acb == NULL) {
+ stream_free(s);
+ qerror_report(QERR_NOT_SUPPORTED);
+ return NULL;
+ }
+ return s;
+}
+
/*
* We automatically delete the drive when a device using it gets
* unplugged. Questionable feature, but we can't just drop it.
@@ -650,6 +776,13 @@ out:
return ret;
}
+int do_block_stream(Monitor *mon, const QDict *params, QObject **ret_data)
+{
+ const char *device = qdict_get_str(params, "device");
+
+ return stream_start(device) ? 0 : -1;
+}
+
static int eject_device(Monitor *mon, BlockDriverState *bs, int force)
{
if (!force) {
diff --git a/blockdev.h b/blockdev.h
index 3587786..f475aa8 100644
--- a/blockdev.h
+++ b/blockdev.h
@@ -65,5 +65,6 @@ int do_change_block(Monitor *mon, const char *device,
int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data);
int do_snapshot_blkdev(Monitor *mon, const QDict *qdict, QObject **ret_data);
int do_block_resize(Monitor *mon, const QDict *qdict, QObject **ret_data);
+int do_block_stream(Monitor *mon, const QDict *params, QObject **ret_data);
#endif
diff --git a/hmp-commands.hx b/hmp-commands.hx
index cbaa9a0..9bf1025 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -38,6 +38,20 @@ Commit changes to the disk images (if -snapshot is used) or backing files.
ETEXI
{
+ .name = "block_stream",
+ .args_type = "device:B",
+ .params = "device",
+ .help = "Copy data from a backing file into a block device",
+ .mhandler.cmd_new = do_block_stream,
+ },
+
+STEXI
+@item block_stream
+@findex block_stream
+Copy data from a backing file into a block device.
+ETEXI
+
+ {
.name = "q|quit",
.args_type = "",
.params = "",
diff --git a/monitor.c b/monitor.c
index 718935b..700b534 100644
--- a/monitor.c
+++ b/monitor.c
@@ -468,6 +468,9 @@ void monitor_protocol_event(MonitorEvent event, QObject *data)
case QEVENT_SPICE_DISCONNECTED:
event_name = "SPICE_DISCONNECTED";
break;
+ case QEVENT_BLOCK_JOB_COMPLETED:
+ event_name = "BLOCK_JOB_COMPLETED";
+ break;
default:
abort();
break;
diff --git a/monitor.h b/monitor.h
index 4f2d328..135c927 100644
--- a/monitor.h
+++ b/monitor.h
@@ -35,6 +35,7 @@ typedef enum MonitorEvent {
QEVENT_SPICE_CONNECTED,
QEVENT_SPICE_INITIALIZED,
QEVENT_SPICE_DISCONNECTED,
+ QEVENT_BLOCK_JOB_COMPLETED,
QEVENT_MAX,
} MonitorEvent;
diff --git a/qerror.c b/qerror.c
index 69c1bc9..c5bd197 100644
--- a/qerror.c
+++ b/qerror.c
@@ -162,6 +162,10 @@ static const QErrorStringTable qerror_table[] = {
.desc = "No '%(bus)' bus found for device '%(device)'",
},
{
+ .error_fmt = QERR_NOT_SUPPORTED,
+ .desc = "Operation is not supported",
+ },
+ {
.error_fmt = QERR_OPEN_FILE_FAILED,
.desc = "Could not open '%(filename)'",
},
@@ -230,6 +234,10 @@ static const QErrorStringTable qerror_table[] = {
.error_fmt = QERR_QGA_COMMAND_FAILED,
.desc = "Guest agent command failed, error was '%(message)'",
},
+ {
+ .error_fmt = QERR_STREAMING_ERROR,
+ .desc = "An error occurred during streaming: %(msg)",
+ },
{}
};
diff --git a/qerror.h b/qerror.h
index 8058456..ffe3190 100644
--- a/qerror.h
+++ b/qerror.h
@@ -139,6 +139,9 @@ QError *qobject_to_qerror(const QObject *obj);
#define QERR_NO_BUS_FOR_DEVICE \
"{ 'class': 'NoBusForDevice', 'data': { 'device': %s, 'bus': %s } }"
+#define QERR_NOT_SUPPORTED \
+ "{ 'class': 'NotSupported', 'data': {} }"
+
#define QERR_OPEN_FILE_FAILED \
"{ 'class': 'OpenFileFailed', 'data': { 'filename': %s } }"
@@ -193,4 +196,7 @@ QError *qobject_to_qerror(const QObject *obj);
#define QERR_QGA_COMMAND_FAILED \
"{ 'class': 'QgaCommandFailed', 'data': { 'message': %s } }"
+#define QERR_STREAMING_ERROR \
+ "{ 'class': 'StreamingError', 'data': { 'msg': %s } }"
+
#endif /* QERROR_H */
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 54e313c..80402c7 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -945,6 +945,70 @@ Example:
<- { "return": {} }
EQMP
+
+ {
+ .name = "block_stream",
+ .args_type = "device:B",
+ .params = "device",
+ .help = "Copy data from a backing file into a block device",
+ .mhandler.cmd_new = do_block_stream,
+ },
+
+SQMP
+
+Copy data from a backing file into a block device.
+
+The block streaming operation is performed in the background until the entire
+backing file has been copied. This command returns immediately once streaming
+has started. The status of ongoing block streaming operations can be checked
+with query-block-jobs. The operation can be stopped before it has completed
+using the block_job_cancel command.
+
+If a base file is specified then sectors are not copied from that base file and
+its backing chain. When streaming completes the image file will have the base
+file as its backing file. This can be used to stream a subset of the backing
+file chain instead of flattening the entire image.
+
+On successful completion the image file is updated to drop the backing file.
+
+Arguments:
+
+- device: device name (json-string)
+- base: common backing file (json-string, optional)
+
+Errors:
+
+DeviceInUse: streaming is already active on this device
+DeviceNotFound: device name is invalid
+NotSupported: image streaming is not supported by this device
+
+Events:
+
+On completion the BLOCK_JOB_COMPLETED event is raised with the following
+fields:
+
+- type: job type ("stream" for image streaming, json-string)
+- device: device name (json-string)
+- end: maximum progress value (json-int)
+- position: current progress value (json-int)
+- speed: rate limit, bytes per second (json-int)
+- error: error message (json-string, only on error)
+
+The completion event is raised both on success and on failure. On
+success position is equal to end. On failure position and end can be
+used to indicate at which point the operation failed.
+
+On failure the error field contains a human-readable error message. There are
+no semantics other than that streaming has failed and clients should not try
+to interpret the error string.
+
+Examples:
+
+-> { "execute": "block_stream", "arguments": { "device": "virtio0" } }
+<- { "return": {} }
+
+EQMP
+
{
.name = "qmp_capabilities",
.args_type = "",
--
1.7.5.4
* [Qemu-devel] [PATCH 09/15] qmp: add block_job_cancel command
From: Stefan Hajnoczi @ 2011-07-27 13:44 UTC (permalink / raw)
To: qemu-devel; +Cc: Kevin Wolf, Anthony Liguori, Stefan Hajnoczi, Adam Litke
Image streaming operations can be stopped using the block_job_cancel
command. In the future other types of background operations on block
devices can be cancelled using this command.
The command synopsis is:
block_job_cancel
----------------
Stop an active block streaming operation.
This command returns once the active block streaming operation has been
stopped. It is an error to call this command if no operation is in
progress.
The image file retains its backing file unless the streaming operation
happens to complete just as it is being cancelled.
A new block streaming operation can be started at a later time to finish
copying all data from the backing file.
Arguments:
- device: device name (json-string)
Errors:
DeviceNotActive: streaming is not active on this device
DeviceInUse: cancellation already in progress
Examples:
-> { "execute": "block_job_cancel", "arguments":
{ "device": "virtio0" } }
<- { "return": {} }
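The cancellation semantics above (the request is recorded, the command
completes only once the in-flight copy step has finished, and a second request
in the meantime is rejected) can be modelled in isolation. The sketch below is
a simplified stand-alone model with hypothetical names (Job,
job_request_cancel, job_step_done, count_cancel); it is not the blockdev.c
code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void CancelDone(void *opaque);

/* Hypothetical job model for illustration only. */
typedef struct Job {
    bool active;
    CancelDone *cancel_cb;   /* non-NULL while a cancel is pending */
    void *cancel_opaque;
    long position;
    long end;
} Job;

enum { CANCEL_OK = 0, CANCEL_NOT_ACTIVE = -1, CANCEL_IN_PROGRESS = -2 };

/* Record the cancel request; it takes effect when the current step ends. */
static int job_request_cancel(Job *j, CancelDone *cb, void *opaque)
{
    assert(cb != NULL);
    if (!j->active) {
        return CANCEL_NOT_ACTIVE;
    }
    if (j->cancel_cb) {
        return CANCEL_IN_PROGRESS;
    }
    j->cancel_cb = cb;
    j->cancel_opaque = opaque;
    return CANCEL_OK;
}

/* Called when one copy step finishes. Returns true if another step should
 * be scheduled, false if the job ended (completed or cancelled). */
static bool job_step_done(Job *j, long copied)
{
    j->position += copied;
    if (j->position < j->end && !j->cancel_cb) {
        return true;
    }
    j->active = false;
    if (j->cancel_cb) {
        CancelDone *cb = j->cancel_cb;

        j->cancel_cb = NULL;
        cb(j->cancel_opaque);  /* lets the pending cancel command return */
    }
    return false;
}

/* Example completion callback: counts how often it was invoked. */
static void count_cancel(void *opaque)
{
    ++*(int *)opaque;
}
```

In this model a cancel requested mid-job leaves position short of end, which
corresponds to the image retaining its backing file.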
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
blockdev.c | 34 ++++++++++++++++++++++++++++++++++
blockdev.h | 3 +++
hmp-commands.hx | 15 +++++++++++++++
qmp-commands.hx | 41 +++++++++++++++++++++++++++++++++++++++++
4 files changed, 93 insertions(+), 0 deletions(-)
diff --git a/blockdev.c b/blockdev.c
index cd5e49c..e9bc577 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -52,6 +52,8 @@ static const int if_max_devs[IF_COUNT] = {
};
typedef struct StreamState {
+ MonitorCompletion *cancel_cb;
+ void *cancel_opaque;
int64_t offset; /* current position in block device */
BlockDriverState *bs;
QEMUTimer *timer;
@@ -90,6 +92,10 @@ static void stream_free(StreamState *s)
{
QLIST_REMOVE(s, list);
+ if (s->cancel_cb) {
+ s->cancel_cb(s->cancel_opaque, NULL);
+ }
+
qemu_del_timer(s->timer);
qemu_free_timer(s->timer);
qemu_free(s);
@@ -115,6 +121,8 @@ static void stream_cb(void *opaque, int nb_sectors)
if (s->offset == bdrv_getlength(s->bs)) {
bdrv_change_backing_file(s->bs, NULL, NULL);
stream_complete(s, 0);
+ } else if (s->cancel_cb) {
+ stream_free(s);
} else {
qemu_mod_timer(s->timer, qemu_get_clock_ns(rt_clock));
}
@@ -176,6 +184,24 @@ static StreamState *stream_start(const char *device)
return s;
}
+static int stream_stop(const char *device, MonitorCompletion *cb, void *opaque)
+{
+ StreamState *s = stream_find(device);
+
+ if (!s) {
+ qerror_report(QERR_DEVICE_NOT_ACTIVE, device);
+ return -1;
+ }
+ if (s->cancel_cb) {
+ qerror_report(QERR_DEVICE_IN_USE, device);
+ return -1;
+ }
+
+ s->cancel_cb = cb;
+ s->cancel_opaque = opaque;
+ return 0;
+}
+
/*
* We automatically delete the drive when a device using it gets
* unplugged. Questionable feature, but we can't just drop it.
@@ -783,6 +809,14 @@ int do_block_stream(Monitor *mon, const QDict *params, QObject **ret_data)
return stream_start(device) ? 0 : -1;
}
+int do_block_job_cancel(Monitor *mon, const QDict *params,
+ MonitorCompletion cb, void *opaque)
+{
+ const char *device = qdict_get_str(params, "device");
+
+ return stream_stop(device, cb, opaque);
+}
+
static int eject_device(Monitor *mon, BlockDriverState *bs, int force)
{
if (!force) {
diff --git a/blockdev.h b/blockdev.h
index f475aa8..a3cde69 100644
--- a/blockdev.h
+++ b/blockdev.h
@@ -12,6 +12,7 @@
#include "block.h"
#include "qemu-queue.h"
+#include "monitor.h"
void blockdev_mark_auto_del(BlockDriverState *bs);
void blockdev_auto_del(BlockDriverState *bs);
@@ -66,5 +67,7 @@ int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data);
int do_snapshot_blkdev(Monitor *mon, const QDict *qdict, QObject **ret_data);
int do_block_resize(Monitor *mon, const QDict *qdict, QObject **ret_data);
int do_block_stream(Monitor *mon, const QDict *params, QObject **ret_data);
+int do_block_job_cancel(Monitor *mon, const QDict *params,
+ MonitorCompletion cb, void *opaque);
#endif
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 9bf1025..613eb76 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -52,6 +52,21 @@ Copy data from a backing file into a block device.
ETEXI
{
+ .name = "block_job_cancel",
+ .args_type = "device:B",
+ .params = "device",
+ .help = "Stop an active block streaming operation",
+ .mhandler.cmd_async = do_block_job_cancel,
+ .flags = MONITOR_CMD_ASYNC,
+ },
+
+STEXI
+@item block_job_cancel
+@findex block_job_cancel
+Stop an active block streaming operation.
+ETEXI
+
+ {
.name = "q|quit",
.args_type = "",
.params = "",
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 80402c7..5ab15a4 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -1010,6 +1010,47 @@ Examples:
EQMP
{
+ .name = "block_job_cancel",
+ .args_type = "device:B",
+ .params = "device",
+ .help = "Stop an active streaming operation on a block device",
+ .mhandler.cmd_async = do_block_job_cancel,
+ .flags = MONITOR_CMD_ASYNC,
+ },
+
+SQMP
+
+block_job_cancel
+----------------
+
+Stop an active block streaming operation.
+
+This command returns once the active block streaming operation has been
+stopped. It is an error to call this command if no operation is in progress.
+
+The image file retains its backing file unless the streaming operation happens
+to complete just as it is being cancelled.
+
+A new block streaming operation can be started at a later time to finish
+copying all data from the backing file.
+
+Arguments:
+
+- device: device name (json-string)
+
+Errors:
+
+DeviceNotActive: streaming is not active on this device
+DeviceInUse: cancellation already in progress
+
+Examples:
+
+-> { "execute": "block_job_cancel", "arguments": { "device": "virtio0" } }
+<- { "return": {} }
+
+EQMP
+
+ {
.name = "qmp_capabilities",
.args_type = "",
.params = "",
--
1.7.5.4
* [Qemu-devel] [PATCH 10/15] qmp: add query-block-jobs command
From: Stefan Hajnoczi @ 2011-07-27 13:44 UTC (permalink / raw)
To: qemu-devel; +Cc: Kevin Wolf, Anthony Liguori, Stefan Hajnoczi, Adam Litke
Active image streaming operations can be enumerated with the
query-block-jobs command. Each operation is listed along with its
total progress.
The command synopsis is:
query-block-jobs
----------------
Show progress of ongoing block device operations.
Return a json-array of all block device operations. If no operation is
active then return an empty array. Each operation is a json-object with
the following data:
- type: job type ("stream" for image streaming, json-string)
- device: device name (json-string)
- end: maximum progress value (json-int)
- position: current progress value (json-int)
- speed: rate limit, bytes per second (json-int)
Progress can be observed as position increases and it reaches end when
the operation completes. Position and end have undefined units but can
be used to calculate a percentage indicating the progress that has been
made.
Example:
-> { "execute": "query-block-jobs" }
<- { "return":[
{ "type": "stream", "device": "virtio0",
"end": 10737418240, "position": 709632,
"speed": 0 }
]
}
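Since position and end have undefined units, a client would normally reduce
them to a percentage. A minimal sketch, assuming a hypothetical helper named
block_job_percent() and treating an empty job (end equal to 0) as complete:

```c
#include <assert.h>
#include <stdint.h>

/* Percentage of the job that has been processed, from 0.0 to 100.0.
 * Hypothetical client helper; the units of position and end cancel out. */
static double block_job_percent(int64_t position, int64_t end)
{
    assert(position >= 0 && position <= end);
    if (end == 0) {
        return 100.0;  /* assumption: nothing to do counts as finished */
    }
    return 100.0 * (double)position / (double)end;
}
```

For the reply shown above, block_job_percent(709632, 10737418240) is roughly
0.0066 percent.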
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
blockdev.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
blockdev.h | 2 ++
hmp-commands.hx | 2 ++
monitor.c | 16 ++++++++++++++++
qmp-commands.hx | 31 +++++++++++++++++++++++++++++++
5 files changed, 97 insertions(+), 0 deletions(-)
diff --git a/blockdev.c b/blockdev.c
index e9bc577..422b43b 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -802,6 +802,52 @@ out:
return ret;
}
+static void monitor_print_block_stream(Monitor *mon, const QObject *data)
+{
+ QDict *stream;
+
+ assert(data);
+ stream = qobject_to_qdict(data);
+
+ monitor_printf(mon, "Streaming device %s: Completed %" PRId64 " of %"
+ PRId64 " bytes, speed limit %" PRId64 " bytes/s\n",
+ qdict_get_str(stream, "device"),
+ qdict_get_int(stream, "offset"),
+ qdict_get_int(stream, "len"),
+ (int64_t)0);
+}
+
+void monitor_print_block_jobs(Monitor *mon, const QObject *data)
+{
+ QList *streams;
+ QListEntry *entry;
+
+ assert(data);
+ streams = qobject_to_qlist(data);
+ assert(streams); /* we pass a list of stream objects to ourselves */
+
+ if (qlist_empty(streams)) {
+ monitor_printf(mon, "No active jobs\n");
+ return;
+ }
+
+ QLIST_FOREACH_ENTRY(streams, entry) {
+ monitor_print_block_stream(mon, entry->value);
+ }
+}
+
+void do_info_block_jobs(Monitor *mon, QObject **ret_data)
+{
+ QList *streams;
+ StreamState *s;
+
+ streams = qlist_new();
+ QLIST_FOREACH(s, &block_streams, list) {
+ qlist_append_obj(streams, stream_get_qobject(s));
+ }
+ *ret_data = QOBJECT(streams);
+}
+
int do_block_stream(Monitor *mon, const QDict *params, QObject **ret_data)
{
const char *device = qdict_get_str(params, "device");
diff --git a/blockdev.h b/blockdev.h
index a3cde69..0a32793 100644
--- a/blockdev.h
+++ b/blockdev.h
@@ -66,6 +66,8 @@ int do_change_block(Monitor *mon, const char *device,
int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data);
int do_snapshot_blkdev(Monitor *mon, const QDict *qdict, QObject **ret_data);
int do_block_resize(Monitor *mon, const QDict *qdict, QObject **ret_data);
+void monitor_print_block_jobs(Monitor *mon, const QObject *data);
+void do_info_block_jobs(Monitor *mon, QObject **ret_data);
int do_block_stream(Monitor *mon, const QDict *params, QObject **ret_data);
int do_block_job_cancel(Monitor *mon, const QDict *params,
MonitorCompletion cb, void *opaque);
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 613eb76..74a74d8 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1383,6 +1383,8 @@ show device tree
show qdev device model list
@item info roms
show roms
+@item info block-jobs
+show progress of background block device operations
@end table
ETEXI
diff --git a/monitor.c b/monitor.c
index 700b534..bc2f630 100644
--- a/monitor.c
+++ b/monitor.c
@@ -3141,6 +3141,14 @@ static const mon_cmd_t info_cmds[] = {
},
#endif
{
+ .name = "block-jobs",
+ .args_type = "",
+ .params = "",
+ .help = "show block job status",
+ .user_print = monitor_print_block_jobs,
+ .mhandler.info_new = do_info_block_jobs,
+ },
+ {
.name = NULL,
},
};
@@ -3282,6 +3290,14 @@ static const mon_cmd_t qmp_query_cmds[] = {
.mhandler.info_async = do_info_balloon,
.flags = MONITOR_CMD_ASYNC,
},
+ {
+ .name = "block-jobs",
+ .args_type = "",
+ .params = "",
+ .help = "show block job status",
+ .user_print = monitor_print_block_jobs,
+ .mhandler.info_new = do_info_block_jobs,
+ },
{ /* NULL */ },
};
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 5ab15a4..c3a72ad 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -1971,3 +1971,34 @@ Example:
EQMP
+SQMP
+query-block-jobs
+----------------
+
+Show progress of ongoing block device operations.
+
+Return a json-array of all block device operations. If no operation is
+active then return an empty array. Each operation is a json-object with the
+following data:
+
+- type: job type ("stream" for image streaming, json-string)
+- device: device name (json-string)
+- end: maximum progress value (json-int)
+- position: current progress value (json-int)
+- speed: rate limit, bytes per second (json-int)
+
+Progress can be observed as position increases and it reaches end when
+the operation completes. Position and end have undefined units but can be
+used to calculate a percentage indicating the progress that has been made.
+
+Example:
+
+-> { "execute": "query-block-jobs" }
+<- { "return":[
+ { "type": "stream", "device": "virtio0",
+ "end": 10737418240, "position": 709632,
+ "speed": 0 }
+ ]
+ }
+
+EQMP
--
1.7.5.4
* [Qemu-devel] [PATCH 11/15] qmp: add block_job_set_speed command
From: Stefan Hajnoczi @ 2011-07-27 13:44 UTC (permalink / raw)
To: qemu-devel; +Cc: Kevin Wolf, Anthony Liguori, Stefan Hajnoczi, Adam Litke
The block_job_set_speed command sets a throughput limit on an active
image streaming operation. This can be used to isolate the streaming
operation and control the amount of I/O bandwidth it consumes.
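The limit is enforced as a byte budget per 100 ms slice. The stand-alone
sketch below restates that arithmetic outside QEMU; Throttle,
bytes_per_slice() and throttle_next_ns() are hypothetical names chosen for
illustration, and the caller is assumed to supply the current time in
nanoseconds.

```c
#include <assert.h>
#include <stdint.h>

enum { SLICE_NS = 100000000 };  /* 100 ms slice, as in the patch */

/* Scale a bytes-per-second limit to one slice (integer division). */
static int64_t bytes_per_slice(int64_t bytes_per_sec)
{
    assert(bytes_per_sec >= 0 && bytes_per_sec <= INT64_MAX / SLICE_NS);
    return bytes_per_sec * SLICE_NS / 1000000000LL;
}

typedef struct Throttle {
    int64_t per_slice;        /* byte budget per slice, 0 disables throttling */
    int64_t slice_end_ns;     /* when the current slice finishes */
    int64_t slice_start_off;  /* offset when the current slice started */
} Throttle;

/* Return the time at which the next copy step should run. */
static int64_t throttle_next_ns(Throttle *t, int64_t now_ns, int64_t offset)
{
    int64_t next = now_ns;

    /* The previous slice has expired: start a new one at this offset. */
    if (now_ns >= t->slice_end_ns) {
        t->slice_end_ns = now_ns + SLICE_NS;
        t->slice_start_off = offset;
    }

    /* Budget used up: sleep until the slice ends and carry the excess. */
    if (t->per_slice && offset - t->slice_start_off >= t->per_slice) {
        next = t->slice_end_ns;
        t->slice_end_ns = next + SLICE_NS;
        t->slice_start_off += t->per_slice;
    }
    return next;
}
```

A limit of 1024 bytes per second gives a budget of 102 bytes per slice, so a
single 512-byte sector already exhausts the slice and the next step is
deferred to the slice boundary.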
The command synopsis is as follows:
block_job_set_speed
-------------------
Set maximum speed for a background block operation.
This is a per-block device command that can only be issued
when there is an active block job.
Throttling can be disabled by setting the speed to 0.
Arguments:
- device: device name (json-string)
- value: maximum speed, in bytes per second (json-int)
Errors:
DeviceNotActive: streaming is not active on this device
NotSupported: job type does not support speed setting
Example:
-> { "execute": "block_job_set_speed",
"arguments": { "device": "virtio0", "value": 1024 } }
<- { "return": {} }
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
blockdev.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
blockdev.h | 2 +
hmp-commands.hx | 14 ++++++++++++
qmp-commands.hx | 35 ++++++++++++++++++++++++++++++
4 files changed, 111 insertions(+), 3 deletions(-)
diff --git a/blockdev.c b/blockdev.c
index 422b43b..a044830 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -51,12 +51,20 @@ static const int if_max_devs[IF_COUNT] = {
[IF_SCSI] = 7,
};
+enum {
+ SLICE_TIME_NS = 100000000, /* 100 ms rate-limiting slice time */
+};
+
typedef struct StreamState {
MonitorCompletion *cancel_cb;
void *cancel_opaque;
int64_t offset; /* current position in block device */
BlockDriverState *bs;
QEMUTimer *timer;
+ int64_t bytes_per_sec; /* rate limit */
+ int64_t bytes_per_slice; /* rate limit scaled to slice */
+ int64_t slice_end_time; /* when this slice finishes */
+ int64_t slice_start_offset; /* offset when slice started */
QLIST_ENTRY(StreamState) list;
} StreamState;
@@ -71,7 +79,7 @@ static QObject *stream_get_qobject(StreamState *s)
return qobject_from_jsonf("{ 'device': %s, 'type': 'stream', "
"'offset': %" PRId64 ", 'len': %" PRId64 ", "
"'speed': %" PRId64 " }",
- name, s->offset, len, (int64_t)0);
+ name, s->offset, len, s->bytes_per_sec);
}
static void stream_mon_event(StreamState *s, int ret)
@@ -107,6 +115,27 @@ static void stream_complete(StreamState *s, int ret)
stream_free(s);
}
+static void stream_schedule_next_iteration(StreamState *s)
+{
+ int64_t next = qemu_get_clock_ns(rt_clock);
+
+ /* New slice */
+ if (next >= s->slice_end_time) {
+ s->slice_end_time = next + SLICE_TIME_NS;
+ s->slice_start_offset = s->offset;
+ }
+
+ /* Throttle */
+ if (s->bytes_per_slice &&
+ s->offset - s->slice_start_offset >= s->bytes_per_slice) {
+ next = s->slice_end_time;
+ s->slice_end_time = next + SLICE_TIME_NS;
+ s->slice_start_offset += s->bytes_per_slice;
+ }
+
+ qemu_mod_timer(s->timer, next);
+}
+
static void stream_cb(void *opaque, int nb_sectors)
{
StreamState *s = opaque;
@@ -124,7 +153,7 @@ static void stream_cb(void *opaque, int nb_sectors)
} else if (s->cancel_cb) {
stream_free(s);
} else {
- qemu_mod_timer(s->timer, qemu_get_clock_ns(rt_clock));
+ stream_schedule_next_iteration(s);
}
}
@@ -202,6 +231,20 @@ static int stream_stop(const char *device, MonitorCompletion *cb, void *opaque)
return 0;
}
+static int stream_set_speed(const char *device, int64_t bytes_per_sec)
+{
+ StreamState *s = stream_find(device);
+
+ if (!s) {
+ qerror_report(QERR_DEVICE_NOT_ACTIVE, device);
+ return -1;
+ }
+
+ s->bytes_per_sec = bytes_per_sec;
+ s->bytes_per_slice = bytes_per_sec * SLICE_TIME_NS / 1000000000LL;
+ return 0;
+}
+
/*
* We automatically delete the drive when a device using it gets
* unplugged. Questionable feature, but we can't just drop it.
@@ -814,7 +857,7 @@ static void monitor_print_block_stream(Monitor *mon, const QObject *data)
qdict_get_str(stream, "device"),
qdict_get_int(stream, "offset"),
qdict_get_int(stream, "len"),
- (int64_t)0);
+ qdict_get_int(stream, "speed"));
}
void monitor_print_block_jobs(Monitor *mon, const QObject *data)
@@ -863,6 +906,20 @@ int do_block_job_cancel(Monitor *mon, const QDict *params,
return stream_stop(device, cb, opaque);
}
+int do_block_job_set_speed(Monitor *mon, const QDict *params,
+ QObject **ret_data)
+{
+ const char *device = qdict_get_str(params, "device");
+ int64_t value;
+
+ value = qdict_get_int(params, "value");
+ if (value < 0) {
+ value = 0;
+ }
+
+ return stream_set_speed(device, value);
+}
+
static int eject_device(Monitor *mon, BlockDriverState *bs, int force)
{
if (!force) {
diff --git a/blockdev.h b/blockdev.h
index 0a32793..6f09597 100644
--- a/blockdev.h
+++ b/blockdev.h
@@ -71,5 +71,7 @@ void do_info_block_jobs(Monitor *mon, QObject **ret_data);
int do_block_stream(Monitor *mon, const QDict *params, QObject **ret_data);
int do_block_job_cancel(Monitor *mon, const QDict *params,
MonitorCompletion cb, void *opaque);
+int do_block_job_set_speed(Monitor *mon, const QDict *params,
+ QObject **ret_data);
#endif
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 74a74d8..2470c3f 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -67,6 +67,20 @@ Stop an active block streaming operation.
ETEXI
{
+ .name = "block_job_set_speed",
+ .args_type = "device:B,value:o",
+ .params = "device value",
+ .help = "Set the maximum speed for a background block operation",
+ .mhandler.cmd_new = do_block_job_set_speed,
+ },
+
+STEXI
+@item block_job_set_speed @var{device} @var{value}
+@findex block_job_set_speed
+Set the maximum speed for a background block operation.
+ETEXI
+
+ {
.name = "q|quit",
.args_type = "",
.params = "",
diff --git a/qmp-commands.hx b/qmp-commands.hx
index c3a72ad..c969909 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -1051,6 +1051,41 @@ Examples:
EQMP
{
+ .name = "block_job_set_speed",
+ .args_type = "device:B,value:o",
+ .params = "device value",
+ .help = "Set maximum speed for a background block operation",
+ .mhandler.cmd_new = do_block_job_set_speed,
+ },
+
+SQMP
+block_job_set_speed
+-------------------
+
+Set maximum speed for a background block operation.
+
+This is a per-block device command that can only be issued
+when there is an active block job.
+
+Throttling can be disabled by setting the speed to 0.
+
+Arguments:
+
+- device: device name (json-string)
+- value: maximum speed, in bytes per second (json-int)
+
+Errors:
+DeviceNotActive: streaming is not active on this device
+NotSupported: job type does not support speed setting
+
+Example:
+
+-> { "execute": "block_job_set_speed",
+ "arguments": { "device": "virtio0", "value": 1024 } }
+
+EQMP
+
+ {
.name = "qmp_capabilities",
.args_type = "",
.params = "",
--
1.7.5.4
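
The slice-based throttling added by this patch can be modelled outside QEMU. What follows is only a rough sketch in Python (the language of the series' test script), not the blockdev.c code: StreamModel and simulate() are hypothetical names, time is a plain integer instead of rt_clock, and each streaming iteration is assumed to complete instantly. Only the two branches in next_iteration_time() follow stream_schedule_next_iteration().

```python
SLICE_TIME_NS = 100000000   # 100 ms rate-limiting slice, as in the patch
NS_PER_SEC = 1000000000


class StreamModel(object):
    """Hypothetical stand-in for the rate-limit fields of StreamState."""

    def __init__(self, bytes_per_sec):
        self.offset = 0
        # Same scaling as stream_set_speed(): rate limit per slice
        self.bytes_per_slice = bytes_per_sec * SLICE_TIME_NS // NS_PER_SEC
        self.slice_end_time = 0
        self.slice_start_offset = 0

    def next_iteration_time(self, now):
        """Return the time at which the next iteration should run."""
        nxt = now
        # New slice: the previous one has expired, restart accounting
        if nxt >= self.slice_end_time:
            self.slice_end_time = nxt + SLICE_TIME_NS
            self.slice_start_offset = self.offset
        # Throttle: the quota for this slice is used up, so wait for the
        # end of the slice and credit one slice worth of bytes
        if (self.bytes_per_slice and
                self.offset - self.slice_start_offset >= self.bytes_per_slice):
            nxt = self.slice_end_time
            self.slice_end_time = nxt + SLICE_TIME_NS
            self.slice_start_offset += self.bytes_per_slice
        return nxt


def simulate(bytes_per_sec, chunk, total):
    """Copy total bytes in chunk-sized steps; return elapsed virtual ns."""
    s = StreamModel(bytes_per_sec)
    now = 0
    while s.offset < total:
        s.offset += chunk                  # one iteration completes
        now = s.next_iteration_time(now)   # when the next one may start
    return now
```

With a limit of 0 the throttle branch never fires and the model schedules every iteration immediately, which matches the "speed 0 disables throttling" wording in the QMP documentation above.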
* [Qemu-devel] [PATCH 12/15] block: add -drive stream=on|off
2011-07-27 13:44 [Qemu-devel] [RFC v2 00/15] QED image streaming Stefan Hajnoczi
` (10 preceding siblings ...)
2011-07-27 13:44 ` [Qemu-devel] [PATCH 11/15] qmp: add block_job_set_speed command Stefan Hajnoczi
@ 2011-07-27 13:44 ` Stefan Hajnoczi
2011-07-27 13:44 ` [Qemu-devel] [PATCH 13/15] qed: intelligent streaming implementation Stefan Hajnoczi
` (2 subsequent siblings)
14 siblings, 0 replies; 18+ messages in thread
From: Stefan Hajnoczi @ 2011-07-27 13:44 UTC (permalink / raw)
To: qemu-devel; +Cc: Kevin Wolf, Anthony Liguori, Stefan Hajnoczi, Adam Litke
This patch adds the -drive stream=on|off command-line option:
stream=on|off
stream is "on" or "off" and enables background copying of backing file
contents into the image file until the backing file is no longer
needed.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
blockdev.c | 12 +++++++++++-
hmp-commands.hx | 3 ++-
qemu-config.c | 4 ++++
qemu-options.hx | 5 ++++-
4 files changed, 21 insertions(+), 3 deletions(-)
diff --git a/blockdev.c b/blockdev.c
index a044830..20947e2 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -432,7 +432,7 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
const char *devaddr;
DriveInfo *dinfo;
int snapshot = 0;
- int copy_on_read;
+ int copy_on_read, stream;
int ret;
translation = BIOS_ATA_TRANSLATION_AUTO;
@@ -450,6 +450,7 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
snapshot = qemu_opt_get_bool(opts, "snapshot", 0);
ro = qemu_opt_get_bool(opts, "readonly", 0);
copy_on_read = qemu_opt_get_bool(opts, "copy-on-read", 0);
+ stream = qemu_opt_get_bool(opts, "stream", 0);
file = qemu_opt_get(opts, "file");
serial = qemu_opt_get(opts, "serial");
@@ -737,6 +738,15 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
goto err;
}
+ if (stream) {
+ const char *device_name = bdrv_get_device_name(dinfo->bdrv);
+
+ if (!stream_start(device_name)) {
+ fprintf(stderr, "qemu: warning: stream_start failed for '%s'\n",
+ device_name);
+ }
+ }
+
if (bdrv_key_required(dinfo->bdrv))
autostart = 0;
return dinfo;
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 2470c3f..b4dd8c08 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -909,7 +909,8 @@ ETEXI
"[,unit=m][,media=d][,index=i]\n"
"[,cyls=c,heads=h,secs=s[,trans=t]]\n"
"[,snapshot=on|off][,cache=on|off]\n"
- "[,readonly=on|off][,copy-on-read=on|off]",
+ "[,readonly=on|off][,copy-on-read=on|off]"
+ "[,stream=on|off]",
.help = "add drive to PCI storage controller",
.mhandler.cmd = drive_hot_add,
},
diff --git a/qemu-config.c b/qemu-config.c
index 2e5ee3c..f1b9045 100644
--- a/qemu-config.c
+++ b/qemu-config.c
@@ -88,6 +88,10 @@ static QemuOptsList qemu_drive_opts = {
.name = "copy-on-read",
.type = QEMU_OPT_BOOL,
.help = "copy read data from backing file into image file",
+ },{
+ .name = "stream",
+ .type = QEMU_OPT_BOOL,
+ .help = "copy backing file data into image file while guest runs",
},
{ /* end of list */ }
},
diff --git a/qemu-options.hx b/qemu-options.hx
index b7e52fe..d54dd86 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -135,7 +135,7 @@ DEF("drive", HAS_ARG, QEMU_OPTION_drive,
" [,cyls=c,heads=h,secs=s[,trans=t]][,snapshot=on|off]\n"
" [,cache=writethrough|writeback|none|unsafe][,format=f]\n"
" [,serial=s][,addr=A][,id=name][,aio=threads|native]\n"
- " [,readonly=on|off][,copy-on-read=on|off]\n"
+ " [,readonly=on|off][,copy-on-read=on|off][,stream=on|off]\n"
" use 'file' as a drive image\n", QEMU_ARCH_ALL)
STEXI
@item -drive @var{option}[,@var{option}[,@var{option}[,...]]]
@@ -186,6 +186,9 @@ Open drive @option{file} as read-only. Guest write attempts will fail.
@item copy-on-read=@var{copy-on-read}
@var{copy-on-read} is "on" or "off" and enables whether to copy read backing
file sectors into the image file.
+@item stream=@var{stream}
+@var{stream} is "on" or "off" and enables background copying of backing file
+contents into the image file until the backing file is no longer needed.
@end table
By default, writethrough caching is used for all block device. This means that
--
1.7.5.4
* [Qemu-devel] [PATCH 13/15] qed: intelligent streaming implementation
2011-07-27 13:44 [Qemu-devel] [RFC v2 00/15] QED image streaming Stefan Hajnoczi
` (11 preceding siblings ...)
2011-07-27 13:44 ` [Qemu-devel] [PATCH 12/15] block: add -drive stream=on|off Stefan Hajnoczi
@ 2011-07-27 13:44 ` Stefan Hajnoczi
2011-07-27 13:44 ` [Qemu-devel] [PATCH 14/15] trace: trace bdrv_aio_readv/writev error paths Stefan Hajnoczi
2011-07-27 13:44 ` [Qemu-devel] [PATCH 15/15] tests: add image streaming QMP interface tests Stefan Hajnoczi
14 siblings, 0 replies; 18+ messages in thread
From: Stefan Hajnoczi @ 2011-07-27 13:44 UTC (permalink / raw)
To: qemu-devel; +Cc: Kevin Wolf, Anthony Liguori, Adam Litke
From: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
---
block/qed.c | 248 +++++++++++++++++++++++++++++++++++++++++++++++++++++++----
block/qed.h | 3 +-
2 files changed, 234 insertions(+), 17 deletions(-)
diff --git a/block/qed.c b/block/qed.c
index ffdbc2d..f9f7c94 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -951,9 +951,8 @@ static void qed_aio_write_l1_update(void *opaque, int ret)
/**
* Update L2 table with new cluster offsets and write them out
*/
-static void qed_aio_write_l2_update(void *opaque, int ret)
+static void qed_aio_write_l2_update(QEDAIOCB *acb, int ret, uint64_t offset)
{
- QEDAIOCB *acb = opaque;
BDRVQEDState *s = acb_to_s(acb);
bool need_alloc = acb->find_cluster_ret == QED_CLUSTER_L1;
int index;
@@ -969,7 +968,7 @@ static void qed_aio_write_l2_update(void *opaque, int ret)
index = qed_l2_index(s, acb->cur_pos);
qed_update_l2_table(s, acb->request.l2_table->table, index, acb->cur_nclusters,
- acb->cur_cluster);
+ offset);
if (need_alloc) {
/* Write out the whole new L2 table */
@@ -986,6 +985,51 @@ err:
qed_aio_complete(acb, ret);
}
+static void qed_aio_write_l2_update_cb(void *opaque, int ret)
+{
+ QEDAIOCB *acb = opaque;
+ qed_aio_write_l2_update(acb, ret, acb->cur_cluster);
+}
+
+/**
+ * Determine if we have a zero write to a block of clusters
+ *
+ * We validate that the write is aligned to a cluster boundary, and that it's
+ * a multiple of cluster size with all zeros.
+ */
+static bool qed_is_zero_write(QEDAIOCB *acb)
+{
+ BDRVQEDState *s = acb_to_s(acb);
+ int i;
+
+ if (!qed_offset_is_cluster_aligned(s, acb->cur_pos)) {
+ return false;
+ }
+
+ if (!qed_offset_is_cluster_aligned(s, acb->cur_qiov.size)) {
+ return false;
+ }
+
+ for (i = 0; i < acb->cur_qiov.niov; i++) {
+ struct iovec *iov = &acb->cur_qiov.iov[i];
+ uint64_t *v;
+ int j;
+
+ if ((iov->iov_len & 0x07)) {
+ return false;
+ }
+
+ v = iov->iov_base;
+ for (j = 0; j < iov->iov_len; j += sizeof(v[0])) {
+ if (v[j >> 3]) {
+ return false;
+ }
+ }
+ }
+
+ return true;
+}
+
/**
* Flush new data clusters before updating the L2 table
*
@@ -1000,7 +1044,7 @@ static void qed_aio_write_flush_before_l2_update(void *opaque, int ret)
QEDAIOCB *acb = opaque;
BDRVQEDState *s = acb_to_s(acb);
- if (!bdrv_aio_flush(s->bs->file, qed_aio_write_l2_update, opaque)) {
+ if (!bdrv_aio_flush(s->bs->file, qed_aio_write_l2_update_cb, opaque)) {
qed_aio_complete(acb, -EIO);
}
}
@@ -1030,7 +1074,7 @@ static void qed_aio_write_main(void *opaque, int ret)
if (s->bs->backing_hd) {
next_fn = qed_aio_write_flush_before_l2_update;
} else {
- next_fn = qed_aio_write_l2_update;
+ next_fn = qed_aio_write_l2_update_cb;
}
}
@@ -1096,6 +1140,18 @@ static bool qed_should_set_need_check(BDRVQEDState *s)
return !(s->header.features & QED_F_NEED_CHECK);
}
+static void qed_aio_write_zero_cluster(void *opaque, int ret)
+{
+ QEDAIOCB *acb = opaque;
+
+ if (ret) {
+ qed_aio_complete(acb, ret);
+ return;
+ }
+
+ qed_aio_write_l2_update(acb, 0, 1);
+}
+
/**
* Start an allocating write request or queue it
*
@@ -1144,6 +1200,7 @@ static bool qed_start_allocating_write(QEDAIOCB *acb)
static void qed_aio_write_alloc(QEDAIOCB *acb)
{
BDRVQEDState *s = acb_to_s(acb);
+ BlockDriverCompletionFunc *cb;
if (!qed_start_allocating_write(acb)) {
qemu_iovec_reset(&acb->cur_qiov);
@@ -1154,11 +1211,18 @@ static void qed_aio_write_alloc(QEDAIOCB *acb)
qed_offset_into_cluster(s, acb->cur_pos) + acb->cur_qiov.size);
acb->cur_cluster = qed_alloc_clusters(s, acb->cur_nclusters);
+ cb = qed_aio_write_prefill;
+
+ /* Zero write detection */
+ if ((acb->flags & QED_AIOCB_CHECK_ZERO_WRITE) && qed_is_zero_write(acb)) {
+ cb = qed_aio_write_zero_cluster;
+ }
+
if (qed_should_set_need_check(s)) {
s->header.features |= QED_F_NEED_CHECK;
- qed_write_header(s, qed_aio_write_prefill, acb);
+ qed_write_header(s, cb, acb);
} else {
- qed_aio_write_prefill(acb, 0);
+ cb(acb, 0);
}
}
@@ -1317,11 +1381,11 @@ static void qed_aio_next_io(void *opaque, int ret)
io_fn, acb);
}
-static BlockDriverAIOCB *qed_aio_setup(BlockDriverState *bs,
- int64_t sector_num,
- QEMUIOVector *qiov, int nb_sectors,
- BlockDriverCompletionFunc *cb,
- void *opaque, int flags)
+static QEDAIOCB *qed_aio_setup(BlockDriverState *bs,
+ int64_t sector_num,
+ QEMUIOVector *qiov, int nb_sectors,
+ BlockDriverCompletionFunc *cb,
+ void *opaque, int flags)
{
QEDAIOCB *acb = qemu_aio_get(&qed_aio_pool, bs, cb, opaque);
@@ -1337,8 +1401,22 @@ static BlockDriverAIOCB *qed_aio_setup(BlockDriverState *bs,
acb->request.l2_table = NULL;
qemu_iovec_init(&acb->cur_qiov, qiov->niov);
+ return acb;
+}
+
+static BlockDriverAIOCB *bdrv_qed_aio_setup(BlockDriverState *bs,
+ int64_t sector_num,
+ QEMUIOVector *qiov, int nb_sectors,
+ BlockDriverCompletionFunc *cb,
+ void *opaque, int flags)
+{
+ QEDAIOCB *acb;
+
+ acb = qed_aio_setup(bs, sector_num, qiov, nb_sectors,
+ cb, opaque, flags);
/* Start request */
qed_aio_next_io(acb, 0);
+
return &acb->common;
}
@@ -1348,9 +1426,15 @@ static BlockDriverAIOCB *bdrv_qed_aio_readv(BlockDriverState *bs,
BlockDriverCompletionFunc *cb,
void *opaque)
{
- int flags = bs->copy_on_read ? QED_AIOCB_COPY_ON_READ : 0;
+ /* Don't bloat image file in copy-on-read, use zero detection */
+ int flags = QED_AIOCB_CHECK_ZERO_WRITE;
+
+ if (bs->copy_on_read) {
+ flags |= QED_AIOCB_COPY_ON_READ;
+ }
- return qed_aio_setup(bs, sector_num, qiov, nb_sectors, cb, opaque, flags);
+ return bdrv_qed_aio_setup(bs, sector_num, qiov, nb_sectors, cb,
+ opaque, flags);
}
static BlockDriverAIOCB *bdrv_qed_aio_writev(BlockDriverState *bs,
@@ -1359,8 +1443,139 @@ static BlockDriverAIOCB *bdrv_qed_aio_writev(BlockDriverState *bs,
BlockDriverCompletionFunc *cb,
void *opaque)
{
- return qed_aio_setup(bs, sector_num, qiov, nb_sectors, cb,
- opaque, QED_AIOCB_WRITE);
+ return bdrv_qed_aio_setup(bs, sector_num, qiov, nb_sectors, cb,
+ opaque, QED_AIOCB_WRITE);
+}
+
+typedef struct QEDCopyBackingData {
+ QEDAIOCB *acb;
+ uint64_t offset;
+ QEMUIOVector qiov;
+ void *buffer;
+ size_t len;
+ BlockDriverCompletionFunc *cb;
+ void *opaque;
+} QEDCopyBackingData;
+
+static void qed_aio_copy_backing_cb(void *opaque, int ret)
+{
+ QEDCopyBackingData *copy_backing_data = opaque;
+ QEDAIOCB *acb = copy_backing_data->acb;
+
+ if (ret) {
+ ret = -EIO;
+ } else {
+ ret = (acb->end_pos - copy_backing_data->offset) / BDRV_SECTOR_SIZE;
+ }
+
+ copy_backing_data->cb(copy_backing_data->opaque, ret);
+
+ qemu_iovec_destroy(&copy_backing_data->qiov);
+ qemu_vfree(copy_backing_data->buffer);
+ qemu_free(copy_backing_data);
+}
+
+static void qed_copy_backing_find_cluster_cb(void *opaque, int ret,
+ uint64_t offset, size_t len);
+
+/**
+ * Perform the next qed_find_cluster() from a BH
+ *
+ * This is necessary because we iterate over each cluster in turn.
+ * qed_find_cluster() may invoke its callback immediately without returning up
+ * the call stack, causing us to overflow the call stack. By starting each
+ * iteration from a BH we guarantee that a fresh stack is used each time.
+ */
+static void qed_copy_backing_next_cluster_bh(void *opaque)
+{
+ QEDCopyBackingData *copy_backing_data = opaque;
+ QEDAIOCB *acb = copy_backing_data->acb;
+ BDRVQEDState *s = acb_to_s(acb);
+
+ qemu_bh_delete(acb->bh);
+ acb->bh = NULL;
+
+ acb->cur_pos += s->header.cluster_size;
+ acb->end_pos += s->header.cluster_size;
+
+ qed_find_cluster(s, &acb->request, acb->cur_pos,
+ acb->end_pos - acb->cur_pos,
+ qed_copy_backing_find_cluster_cb, copy_backing_data);
+}
+
+/**
+ * Search for an unallocated cluster adjusting the current request until we
+ * can use it to read an unallocated cluster.
+ *
+ * Callback from qed_find_cluster().
+ */
+static void qed_copy_backing_find_cluster_cb(void *opaque, int ret,
+ uint64_t offset, size_t len)
+{
+ QEDCopyBackingData *copy_backing_data = opaque;
+ QEDAIOCB *acb = copy_backing_data->acb;
+ BDRVQEDState *s = acb_to_s(acb);
+
+ if (ret < 0) {
+ qed_aio_complete(acb, ret);
+ return;
+ }
+
+ if (ret == QED_CLUSTER_FOUND ||
+ ret == QED_CLUSTER_ZERO) {
+ /* proceed to next cluster */
+
+ if (acb->end_pos == s->header.image_size) {
+ qed_aio_complete(acb, 0);
+ return;
+ }
+
+ acb->bh = qemu_bh_new(qed_copy_backing_next_cluster_bh,
+ copy_backing_data);
+ qemu_bh_schedule(acb->bh);
+ } else {
+ /* found a hole, kick off request */
+ qed_aio_next_io(acb, 0);
+ }
+}
+
+static BlockDriverAIOCB *bdrv_qed_aio_copy_backing(BlockDriverState *bs,
+ int64_t sector_num, BlockDriverCompletionFunc *cb, void *opaque)
+{
+ BDRVQEDState *s = bs->opaque;
+ QEDCopyBackingData *copy_backing_data;
+ QEDAIOCB *acb;
+ uint32_t cluster_size = s->header.cluster_size;
+ uint64_t start_cluster;
+ QEMUIOVector *qiov;
+
+ copy_backing_data = qemu_mallocz(sizeof(*copy_backing_data));
+
+ copy_backing_data->cb = cb;
+ copy_backing_data->opaque = opaque;
+ copy_backing_data->len = cluster_size;
+ copy_backing_data->buffer = qemu_blockalign(s->bs, cluster_size);
+ copy_backing_data->offset = sector_num * BDRV_SECTOR_SIZE;
+
+ start_cluster = qed_start_of_cluster(s, copy_backing_data->offset);
+ sector_num = start_cluster / BDRV_SECTOR_SIZE;
+
+ qiov = &copy_backing_data->qiov;
+ qemu_iovec_init(qiov, 1);
+ qemu_iovec_add(qiov, copy_backing_data->buffer, cluster_size);
+
+ acb = qed_aio_setup(bs, sector_num, qiov,
+ cluster_size / BDRV_SECTOR_SIZE,
+ qed_aio_copy_backing_cb, copy_backing_data,
+ QED_AIOCB_CHECK_ZERO_WRITE |
+ QED_AIOCB_COPY_ON_READ);
+ copy_backing_data->acb = acb;
+
+ qed_find_cluster(s, &acb->request, acb->cur_pos,
+ acb->end_pos - acb->cur_pos,
+ qed_copy_backing_find_cluster_cb, copy_backing_data);
+
+ return &acb->common;
}
static BlockDriverAIOCB *bdrv_qed_aio_flush(BlockDriverState *bs,
@@ -1527,6 +1742,7 @@ static BlockDriver bdrv_qed = {
.bdrv_make_empty = bdrv_qed_make_empty,
.bdrv_aio_readv = bdrv_qed_aio_readv,
.bdrv_aio_writev = bdrv_qed_aio_writev,
+ .bdrv_aio_copy_backing = bdrv_qed_aio_copy_backing,
.bdrv_aio_flush = bdrv_qed_aio_flush,
.bdrv_truncate = bdrv_qed_truncate,
.bdrv_getlength = bdrv_qed_getlength,
diff --git a/block/qed.h b/block/qed.h
index 16f4bd9..48c65f7 100644
--- a/block/qed.h
+++ b/block/qed.h
@@ -124,8 +124,9 @@ typedef struct QEDRequest {
} QEDRequest;
enum {
- QED_AIOCB_WRITE = 0x0001, /* read or write? */
+ QED_AIOCB_WRITE = 0x0001, /* read or write? */
QED_AIOCB_COPY_ON_READ = 0x0002,
+ QED_AIOCB_CHECK_ZERO_WRITE = 0x0004, /* detect zeroes? */
};
typedef struct QEDAIOCB {
--
1.7.5.4
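
The zero-write check in qed_is_zero_write() can be illustrated in isolation. This is a minimal Python sketch under stated assumptions, not the QED code: is_zero_write() and its parameters are made-up names, the list of buffers stands in for the iovecs of cur_qiov, and cluster_size stands in for s->header.cluster_size. Like the C version, it rejects buffers whose length is not a multiple of 8, because the C loop scans 64-bit words.

```python
def is_zero_write(pos, buffers, cluster_size):
    """Return True if the write covers whole clusters and is all zeros.

    pos          -- byte offset of the write in the image
    buffers      -- list of bytes-like objects (stand-in for the iovecs)
    cluster_size -- cluster size in bytes
    """
    size = sum(len(buf) for buf in buffers)

    # The write must start on a cluster boundary ...
    if pos % cluster_size != 0:
        return False
    # ... and cover a whole number of clusters
    if size % cluster_size != 0:
        return False

    for buf in buffers:
        # The C code reads uint64_t words, so odd-sized iovecs are rejected
        if len(buf) % 8 != 0:
            return False
        # Any non-zero byte means real data has to be written
        if any(bytearray(buf)):
            return False

    return True
```

When the check succeeds, the patch skips the data write and only updates the L2 table, so streaming zeroed regions of the backing file does not grow the image file.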
* [Qemu-devel] [PATCH 14/15] trace: trace bdrv_aio_readv/writev error paths
2011-07-27 13:44 [Qemu-devel] [RFC v2 00/15] QED image streaming Stefan Hajnoczi
` (12 preceding siblings ...)
2011-07-27 13:44 ` [Qemu-devel] [PATCH 13/15] qed: intelligent streaming implementation Stefan Hajnoczi
@ 2011-07-27 13:44 ` Stefan Hajnoczi
2011-07-27 13:44 ` [Qemu-devel] [PATCH 15/15] tests: add image streaming QMP interface tests Stefan Hajnoczi
14 siblings, 0 replies; 18+ messages in thread
From: Stefan Hajnoczi @ 2011-07-27 13:44 UTC (permalink / raw)
To: qemu-devel; +Cc: Kevin Wolf, Anthony Liguori, Stefan Hajnoczi, Adam Litke
It is useful to understand why an I/O request failed. Add trace
events for the error paths in bdrv_aio_readv() and bdrv_aio_writev().
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
block.c | 24 +++++++++++++++++++-----
trace-events | 7 +++++++
2 files changed, 26 insertions(+), 5 deletions(-)
diff --git a/block.c b/block.c
index 8225758..6cd7742 100644
--- a/block.c
+++ b/block.c
@@ -2148,10 +2148,14 @@ BlockDriverAIOCB *bdrv_aio_readv(BlockDriverState *bs, int64_t sector_num,
trace_bdrv_aio_readv(bs, sector_num, nb_sectors, opaque);
- if (!drv)
+ if (!drv) {
+ trace_bdrv_aio_readv_null_drv(bs, sector_num, nb_sectors, opaque);
return NULL;
- if (bdrv_check_request(bs, sector_num, nb_sectors))
+ }
+ if (bdrv_check_request(bs, sector_num, nb_sectors)) {
+ trace_bdrv_aio_readv_out_of_range(bs, sector_num, nb_sectors, opaque);
return NULL;
+ }
ret = drv->bdrv_aio_readv(bs, sector_num, qiov, nb_sectors,
cb, opaque);
@@ -2160,6 +2164,8 @@ BlockDriverAIOCB *bdrv_aio_readv(BlockDriverState *bs, int64_t sector_num,
/* Update stats even though technically transfer has not happened. */
bs->rd_bytes += (unsigned) nb_sectors * BDRV_SECTOR_SIZE;
bs->rd_ops ++;
+ } else {
+ trace_bdrv_aio_readv_failed(bs, sector_num, nb_sectors, opaque);
}
return ret;
@@ -2211,12 +2217,18 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
trace_bdrv_aio_writev(bs, sector_num, nb_sectors, opaque);
- if (!drv)
+ if (!drv) {
+ trace_bdrv_aio_writev_null_drv(bs, sector_num, nb_sectors, opaque);
return NULL;
- if (bs->read_only)
+ }
+ if (bs->read_only) {
+ trace_bdrv_aio_writev_read_only(bs, sector_num, nb_sectors, opaque);
return NULL;
- if (bdrv_check_request(bs, sector_num, nb_sectors))
+ }
+ if (bdrv_check_request(bs, sector_num, nb_sectors)) {
+ trace_bdrv_aio_writev_out_of_range(bs, sector_num, nb_sectors, opaque);
return NULL;
+ }
if (bs->dirty_bitmap) {
blk_cb_data = blk_dirty_cb_alloc(bs, sector_num, nb_sectors, cb,
@@ -2235,6 +2247,8 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
if (bs->wr_highest_sector < sector_num + nb_sectors - 1) {
bs->wr_highest_sector = sector_num + nb_sectors - 1;
}
+ } else {
+ trace_bdrv_aio_writev_failed(bs, sector_num, nb_sectors, opaque);
}
return ret;
diff --git a/trace-events b/trace-events
index 2c7c6dc..3a8b2d0 100644
--- a/trace-events
+++ b/trace-events
@@ -64,7 +64,14 @@ disable bdrv_aio_multiwrite_earlyfail(void *mcb) "mcb %p"
disable bdrv_aio_multiwrite_latefail(void *mcb, int i) "mcb %p i %d"
disable bdrv_aio_flush(void *bs, void *opaque) "bs %p opaque %p"
disable bdrv_aio_readv(void *bs, int64_t sector_num, int nb_sectors, void *opaque) "bs %p sector_num %"PRId64" nb_sectors %d opaque %p"
+disable bdrv_aio_readv_null_drv(void *bs, int64_t sector_num, int nb_sectors, void *opaque) "bs %p sector_num %"PRId64" nb_sectors %d opaque %p"
+disable bdrv_aio_readv_out_of_range(void *bs, int64_t sector_num, int nb_sectors, void *opaque) "bs %p sector_num %"PRId64" nb_sectors %d opaque %p"
+disable bdrv_aio_readv_failed(void *bs, int64_t sector_num, int nb_sectors, void *opaque) "bs %p sector_num %"PRId64" nb_sectors %d opaque %p"
disable bdrv_aio_writev(void *bs, int64_t sector_num, int nb_sectors, void *opaque) "bs %p sector_num %"PRId64" nb_sectors %d opaque %p"
+disable bdrv_aio_writev_null_drv(void *bs, int64_t sector_num, int nb_sectors, void *opaque) "bs %p sector_num %"PRId64" nb_sectors %d opaque %p"
+disable bdrv_aio_writev_read_only(void *bs, int64_t sector_num, int nb_sectors, void *opaque) "bs %p sector_num %"PRId64" nb_sectors %d opaque %p"
+disable bdrv_aio_writev_out_of_range(void *bs, int64_t sector_num, int nb_sectors, void *opaque) "bs %p sector_num %"PRId64" nb_sectors %d opaque %p"
+disable bdrv_aio_writev_failed(void *bs, int64_t sector_num, int nb_sectors, void *opaque) "bs %p sector_num %"PRId64" nb_sectors %d opaque %p"
disable bdrv_set_locked(void *bs, int locked) "bs %p locked %d"
# hw/virtio-blk.c
--
1.7.5.4
* [Qemu-devel] [PATCH 15/15] tests: add image streaming QMP interface tests
2011-07-27 13:44 [Qemu-devel] [RFC v2 00/15] QED image streaming Stefan Hajnoczi
` (13 preceding siblings ...)
2011-07-27 13:44 ` [Qemu-devel] [PATCH 14/15] trace: trace bdrv_aio_readv/writev error paths Stefan Hajnoczi
@ 2011-07-27 13:44 ` Stefan Hajnoczi
14 siblings, 0 replies; 18+ messages in thread
From: Stefan Hajnoczi @ 2011-07-27 13:44 UTC (permalink / raw)
To: qemu-devel; +Cc: Kevin Wolf, Anthony Liguori, Stefan Hajnoczi, Adam Litke
The test-stream.py script performs several automated tests of the image
streaming QMP interface, including exercising both the incremental and
background streaming modes.
This should probably be ported to KVM-Autotest rather than reinventing
the wheel.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
test-stream.py | 193 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 193 insertions(+), 0 deletions(-)
create mode 100644 test-stream.py
diff --git a/test-stream.py b/test-stream.py
new file mode 100644
index 0000000..1a63306
--- /dev/null
+++ b/test-stream.py
@@ -0,0 +1,193 @@
+#!/usr/bin/env python
+import unittest
+import subprocess
+import re
+import os
+import sys; sys.path.append('QMP/')
+import qmp
+
+def qemu_img(*args):
+    devnull = open('/dev/null', 'r+')
+    return subprocess.call(['./qemu-img'] + list(args), stdin=devnull, stdout=devnull)
+
+class VM(object):
+    def __init__(self):
+        self._monitor_path = '/tmp/qemu-mon.%d' % os.getpid()
+        self._qemu_log_path = '/tmp/qemu-log.%d' % os.getpid()
+        self._args = ['x86_64-softmmu/qemu-system-x86_64',
+                      '-chardev', 'socket,id=mon,path=' + self._monitor_path,
+                      '-mon', 'chardev=mon,mode=control',
+                      '-nographic']
+        self._num_drives = 0
+
+    def add_drive(self, path, opts=''):
+        options = ['if=virtio',
+                   'cache=none',
+                   'file=%s' % path,
+                   'id=drive%d' % self._num_drives]
+        if opts:
+            options.append(opts)
+
+        self._args.append('-drive')
+        self._args.append(','.join(options))
+        self._num_drives += 1
+        return self
+
+    def launch(self):
+        devnull = open('/dev/null', 'rb')
+        qemulog = open(self._qemu_log_path, 'wb')
+        self._qmp = qmp.QEMUMonitorProtocol(self._monitor_path, server=True)
+        self._popen = subprocess.Popen(self._args, stdin=devnull, stdout=qemulog,
+                                       stderr=subprocess.STDOUT)
+        self._qmp.accept()
+
+    def shutdown(self):
+        self._qmp.cmd('quit')
+        self._popen.wait()
+        os.remove(self._monitor_path)
+        #os.remove(self._qemu_log_path)
+
+    def qmp(self, cmd, **args):
+        return self._qmp.cmd(cmd, args=args)
+
+    def get_qmp_events(self, wait=False):
+        events = self._qmp.get_events(wait=wait)
+        self._qmp.clear_events()
+        return events
+
+index_re = re.compile(r'([^\[]+)\[([^\]]+)\]')
+
+class QMPTestCase(unittest.TestCase):
+    def dictpath(self, d, path):
+        """Traverse a path in a nested dict"""
+        for component in path.split('/'):
+            m = index_re.match(component)
+            if m:
+                component, idx = m.groups()
+                idx = int(idx)
+
+            if not isinstance(d, dict) or component not in d:
+                self.fail('failed path traversal for "%s" in "%s"' % (path, str(d)))
+            d = d[component]
+
+            if m:
+                if not isinstance(d, list):
+                    self.fail('path component "%s" in "%s" is not a list in "%s"' % (component, path, str(d)))
+                try:
+                    d = d[idx]
+                except IndexError:
+                    self.fail('invalid index "%s" in path "%s" in "%s"' % (idx, path, str(d)))
+        return d
+
+    def assert_qmp(self, d, path, value):
+        result = self.dictpath(d, path)
+        self.assertEqual(result, value, 'values not equal "%s" and "%s"' % (str(result), str(value)))
+
+    def assert_no_active_streams(self):
+        result = self.vm.qmp('query-block-jobs')
+        self.assert_qmp(result, 'return', [])
+
+class TestSingleDrive(QMPTestCase):
+    image_len = 1 * 1024 * 1024 # MB
+
+    def setUp(self):
+        qemu_img('create', 'backing.img', str(TestSingleDrive.image_len))
+        qemu_img('create', '-f', 'qed', '-o', 'backing_file=backing.img', 'test.qed')
+        self.vm = VM().add_drive('test.qed', 'copy-on-read=on')
+        self.vm.launch()
+
+    def tearDown(self):
+        self.vm.shutdown()
+        os.remove('test.qed')
+        os.remove('backing.img')
+
+    def test_stream(self):
+        self.assert_no_active_streams()
+
+        result = self.vm.qmp('block_stream', device='drive0')
+        self.assert_qmp(result, 'return', {})
+
+        completed = False
+        while not completed:
+            for event in self.vm.get_qmp_events(wait=True):
+                if event['event'] == 'BLOCK_JOB_COMPLETED':
+                    self.assert_qmp(event, 'data/type', 'stream')
+                    self.assert_qmp(event, 'data/device', 'drive0')
+                    self.assert_qmp(event, 'data/offset', self.image_len)
+                    self.assert_qmp(event, 'data/len', self.image_len)
+                    completed = True
+
+        self.assert_no_active_streams()
+
+    def test_device_not_found(self):
+        result = self.vm.qmp('block_stream', device='nonexistent')
+        self.assert_qmp(result, 'error/class', 'DeviceNotFound')
+
+class TestStreamStop(QMPTestCase):
+    image_len = 8 * 1024 * 1024 * 1024 # GB
+
+    def setUp(self):
+        qemu_img('create', 'backing.img', str(TestStreamStop.image_len))
+        qemu_img('create', '-f', 'qed', '-o', 'backing_file=backing.img', 'test.qed')
+        self.vm = VM().add_drive('test.qed', 'copy-on-read=on')
+        self.vm.launch()
+
+    def tearDown(self):
+        self.vm.shutdown()
+        os.remove('test.qed')
+        os.remove('backing.img')
+
+    def test_stream_stop(self):
+        import time
+
+        self.assert_no_active_streams()
+
+        result = self.vm.qmp('block_stream', device='drive0')
+        self.assert_qmp(result, 'return', {})
+
+        time.sleep(1)
+        events = self.vm.get_qmp_events(wait=False)
+        self.assertEqual(events, [], 'unexpected QMP event: %s' % events)
+
+        result = self.vm.qmp('block_job_cancel', device='drive0')
+        self.assert_qmp(result, 'return', {})
+
+        self.assert_no_active_streams()
+
+class TestSetSpeed(QMPTestCase):
+    image_len = 80 * 1024 * 1024 # MB
+
+    def setUp(self):
+        qemu_img('create', 'backing.img', str(TestSetSpeed.image_len))
+        qemu_img('create', '-f', 'qed', '-o', 'backing_file=backing.img', 'test.qed')
+        self.vm = VM().add_drive('test.qed', 'copy-on-read=on')
+        self.vm.launch()
+
+    def tearDown(self):
+        self.vm.shutdown()
+        os.remove('test.qed')
+        os.remove('backing.img')
+
+    # This doesn't print or verify anything, only use it via "test-stream.py TestSetSpeed"
+    def test_stream_set_speed(self):
+        self.assert_no_active_streams()
+
+        result = self.vm.qmp('block_stream', device='drive0')
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('block_job_set_speed', device='drive0', value=8 * 1024 * 1024)
+
+        completed = False
+        while not completed:
+            for event in self.vm.get_qmp_events(wait=True):
+                if event['event'] == 'BLOCK_JOB_COMPLETED':
+                    self.assert_qmp(event, 'data/type', 'stream')
+                    self.assert_qmp(event, 'data/device', 'drive0')
+                    self.assert_qmp(event, 'data/offset', self.image_len)
+                    self.assert_qmp(event, 'data/len', self.image_len)
+                    completed = True
+
+        self.assert_no_active_streams()
+
+if __name__ == '__main__':
+    unittest.main()
--
1.7.5.4
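
The path syntax accepted by dictpath() in the test script is easiest to see on a sample reply. The sketch below is a standalone variant without the unittest failure reporting, so a bad path raises KeyError or IndexError instead of failing a test. The reply dictionary is made up for illustration; its field names are the ones emitted by stream_get_qobject() in patch 11.

```python
import re

# Same pattern as in test-stream.py: "name[index]"
index_re = re.compile(r'([^\[]+)\[([^\]]+)\]')


def dictpath(d, path):
    """Resolve 'a/b[0]/c' style paths in nested dicts and lists."""
    for component in path.split('/'):
        m = index_re.match(component)
        if m:
            # 'return[0]' splits into key 'return' and list index 0
            component, idx = m.groups()
            d = d[component][int(idx)]
        else:
            d = d[component]
    return d


# Made-up query-block-jobs reply, for illustration only
reply = {'return': [{'device': 'drive0', 'type': 'stream',
                     'offset': 524288, 'len': 1048576, 'speed': 0}]}
```

Here dictpath(reply, 'return[0]/device') walks into the first list element under 'return' and then looks up 'device'; the tests in the patch use the same syntax for event fields such as 'data/offset'.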
* Re: [Qemu-devel] [PATCH 08/15] qmp: add block_stream command
2011-07-27 13:44 ` [Qemu-devel] [PATCH 08/15] qmp: add block_stream command Stefan Hajnoczi
@ 2011-07-28 15:53 ` Marcelo Tosatti
2011-07-28 15:57 ` Stefan Hajnoczi
0 siblings, 1 reply; 18+ messages in thread
From: Marcelo Tosatti @ 2011-07-28 15:53 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: Kevin Wolf, Anthony Liguori, qemu-devel, Adam Litke
On Wed, Jul 27, 2011 at 02:44:48PM +0100, Stefan Hajnoczi wrote:
> For leaf images with copy-on-read semantics, the stream command allows
> the user to populate the image file by copying data from the backing
> file while the guest is running. Once all blocks have been streamed,
> the dependency on the original backing file is removed. Therefore,
> stream commands can be used to implement post-copy live block migration
> and rapid deployment.
>
> The command synopsis is:
>
> block_stream
> ------------
>
> Copy data from a backing file into a block device.
>
> The block streaming operation is performed in the background until the
> entire backing file has been copied. This command returns immediately
> once streaming has started. The status of ongoing block streaming
> operations can be checked with query-block-jobs. The operation can be
> stopped before it has completed using the block_job_cancel command.
>
> If a base file is specified then sectors are not copied from that base
> file and its backing chain. When streaming completes the image file
> will have the base file as its backing file. This can be used to stream
> a subset of the backing file chain instead of flattening the entire
> image.
>
> On successful completion the image file is updated to drop the backing
> file.
>
> Arguments:
>
> - device: device name (json-string)
> - base: common backing file (json-string, optional)
>
> Errors:
>
> DeviceInUse: streaming is already active on this device
> DeviceNotFound: device name is invalid
> NotSupported: image streaming is not supported by this device
>
> Events:
>
> On completion the BLOCK_JOB_COMPLETED event is raised with the following
> fields:
>
> - type: job type ("stream" for image streaming, json-string)
> - device: device name (json-string)
> - len: maximum progress value (json-int)
> - offset: current progress value (json-int)
> - speed: rate limit, bytes per second (json-int)
> - error: error message (json-string, only on error)
>
> The completion event is raised both on success and on failure. On
> success offset is equal to len. On failure offset and len can be
> used to indicate at which point the operation failed.
>
> On failure the error field contains a human-readable error message.
> There are no semantics other than that streaming has failed and clients
> should not try to interpret the error string.
>
> Examples:
>
> -> { "execute": "block_stream", "arguments": { "device": "virtio0" } }
> <- { "return": {} }
>
> Signed-off-by: Adam Litke <agl@us.ibm.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
> ---
> blockdev.c | 133 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> blockdev.h | 1 +
> hmp-commands.hx | 14 ++++++
> monitor.c | 3 +
> monitor.h | 1 +
> qerror.c | 8 +++
> qerror.h | 6 +++
> qmp-commands.hx | 64 ++++++++++++++++++++++++++
> 8 files changed, 230 insertions(+), 0 deletions(-)
>
> diff --git a/blockdev.c b/blockdev.c
> index b337732..cd5e49c 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -16,6 +16,7 @@
> #include "sysemu.h"
> #include "hw/qdev.h"
> #include "block_int.h"
> +#include "qjson.h"
>
> static QTAILQ_HEAD(drivelist, DriveInfo) drives = QTAILQ_HEAD_INITIALIZER(drives);
>
> @@ -50,6 +51,131 @@ static const int if_max_devs[IF_COUNT] = {
> [IF_SCSI] = 7,
> };
>
> +typedef struct StreamState {
> + int64_t offset; /* current position in block device */
> + BlockDriverState *bs;
> + QEMUTimer *timer;
> + QLIST_ENTRY(StreamState) list;
> +} StreamState;
> +
> +static QLIST_HEAD(, StreamState) block_streams =
> + QLIST_HEAD_INITIALIZER(block_streams);
> +
> +static QObject *stream_get_qobject(StreamState *s)
> +{
> + const char *name = bdrv_get_device_name(s->bs);
> + int64_t len = bdrv_getlength(s->bs);
> +
> + return qobject_from_jsonf("{ 'device': %s, 'type': 'stream', "
> + "'offset': %" PRId64 ", 'len': %" PRId64 ", "
> + "'speed': %" PRId64 " }",
> + name, s->offset, len, (int64_t)0);
> +}
> +
> +static void stream_mon_event(StreamState *s, int ret)
> +{
> + QObject *data = stream_get_qobject(s);
> +
> + if (ret < 0) {
> + QDict *qdict = qobject_to_qdict(data);
> +
> + qdict_put(qdict, "error", qstring_from_str(strerror(-ret)));
> + }
> +
> + monitor_protocol_event(QEVENT_BLOCK_JOB_COMPLETED, data);
> + qobject_decref(data);
> +}
> +
> +static void stream_free(StreamState *s)
> +{
> + QLIST_REMOVE(s, list);
> +
> + qemu_del_timer(s->timer);
> + qemu_free_timer(s->timer);
> + qemu_free(s);
> +}
> +
> +static void stream_complete(StreamState *s, int ret)
> +{
> + stream_mon_event(s, ret);
> + stream_free(s);
> +}
> +
> +static void stream_cb(void *opaque, int nb_sectors)
> +{
> + StreamState *s = opaque;
> +
> + if (nb_sectors < 0) {
> + stream_complete(s, nb_sectors);
> + return;
> + }
> +
> + s->offset += nb_sectors * BDRV_SECTOR_SIZE;
> +
> + if (s->offset == bdrv_getlength(s->bs)) {
> + bdrv_change_backing_file(s->bs, NULL, NULL);
> + stream_complete(s, 0);
> + } else {
> + qemu_mod_timer(s->timer, qemu_get_clock_ns(rt_clock));
> + }
> +}
> +
> +/* We can't call bdrv_aio_copy_backing() directly from the callback because that
> + * makes qemu_aio_flush() not complete until the streaming is completed.
> + * By delaying with a timer, we give qemu_aio_flush() a chance to complete.
> + */
> +static void stream_next_iteration(void *opaque)
> +{
> + StreamState *s = opaque;
> +
> + bdrv_aio_copy_backing(s->bs, s->offset / BDRV_SECTOR_SIZE, stream_cb, s);
> +}
The plan is to replace format specific code with the generic
implementation in the future?
> +
> +static StreamState *stream_find(const char *device)
> +{
> + StreamState *s;
> +
> + QLIST_FOREACH(s, &block_streams, list) {
> + if (strcmp(bdrv_get_device_name(s->bs), device) == 0) {
> + return s;
> + }
> + }
> + return NULL;
> +}
> +
> +static StreamState *stream_start(const char *device)
> +{
> + StreamState *s;
> + BlockDriverAIOCB *acb;
> + BlockDriverState *bs;
> +
> + s = stream_find(device);
> + if (s) {
> + qerror_report(QERR_DEVICE_IN_USE, device);
> + return NULL;
> + }
> +
> + bs = bdrv_find(device);
> + if (!bs) {
> + qerror_report(QERR_DEVICE_NOT_FOUND, device);
> + return NULL;
> + }
> +
> + s = qemu_mallocz(sizeof(*s));
> + s->bs = bs;
> + s->timer = qemu_new_timer_ns(rt_clock, stream_next_iteration, s);
> + QLIST_INSERT_HEAD(&block_streams, s, list);
Should increase refcount with drive_get_ref().
* Re: [Qemu-devel] [PATCH 08/15] qmp: add block_stream command
2011-07-28 15:53 ` Marcelo Tosatti
@ 2011-07-28 15:57 ` Stefan Hajnoczi
0 siblings, 0 replies; 18+ messages in thread
From: Stefan Hajnoczi @ 2011-07-28 15:57 UTC (permalink / raw)
To: Marcelo Tosatti
Cc: Kevin Wolf, Anthony Liguori, Adam Litke, Stefan Hajnoczi,
qemu-devel
On Thu, Jul 28, 2011 at 4:53 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> On Wed, Jul 27, 2011 at 02:44:48PM +0100, Stefan Hajnoczi wrote:
>> For leaf images with copy-on-read semantics, the stream command allows
>> the user to populate the image file by copying data from the backing
>> file while the guest is running. Once all blocks have been streamed,
>> the dependency on the original backing file is removed. Therefore,
>> stream commands can be used to implement post-copy live block migration
>> and rapid deployment.
>>
>> The command synopsis is:
>>
>> block_stream
>> ------------
>>
>> Copy data from a backing file into a block device.
>>
>> The block streaming operation is performed in the background until the
>> entire backing file has been copied. This command returns immediately
>> once streaming has started. The status of ongoing block streaming
>> operations can be checked with query-block-jobs. The operation can be
>> stopped before it has completed using the block_job_cancel command.
>>
>> If a base file is specified then sectors are not copied from that base
>> file and its backing chain. When streaming completes the image file
>> will have the base file as its backing file. This can be used to stream
>> a subset of the backing file chain instead of flattening the entire
>> image.
>>
>> On successful completion the image file is updated to drop the backing
>> file.
>>
>> Arguments:
>>
>> - device: device name (json-string)
>> - base: common backing file (json-string, optional)
>>
>> Errors:
>>
>> DeviceInUse: streaming is already active on this device
>> DeviceNotFound: device name is invalid
>> NotSupported: image streaming is not supported by this device
>>
>> Events:
>>
>> On completion the BLOCK_JOB_COMPLETED event is raised with the following
>> fields:
>>
>> - type: job type ("stream" for image streaming, json-string)
>> - device: device name (json-string)
>> - len: maximum progress value (json-int)
>> - offset: current progress value (json-int)
>> - speed: rate limit, bytes per second (json-int)
>> - error: error message (json-string, only on error)
>>
>> The completion event is raised both on success and on failure. On
>> success offset is equal to len. On failure offset and len can be
>> used to indicate at which point the operation failed.
>>
>> On failure the error field contains a human-readable error message.
>> There are no semantics other than that streaming has failed and clients
>> should not try to interpret the error string.
>>
>> Examples:
>>
>> -> { "execute": "block_stream", "arguments": { "device": "virtio0" } }
>> <- { "return": {} }
>>
>> Signed-off-by: Adam Litke <agl@us.ibm.com>
>> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
>> ---
>> blockdev.c | 133 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> blockdev.h | 1 +
>> hmp-commands.hx | 14 ++++++
>> monitor.c | 3 +
>> monitor.h | 1 +
>> qerror.c | 8 +++
>> qerror.h | 6 +++
>> qmp-commands.hx | 64 ++++++++++++++++++++++++++
>> 8 files changed, 230 insertions(+), 0 deletions(-)
>>
>> diff --git a/blockdev.c b/blockdev.c
>> index b337732..cd5e49c 100644
>> --- a/blockdev.c
>> +++ b/blockdev.c
>> @@ -16,6 +16,7 @@
>> #include "sysemu.h"
>> #include "hw/qdev.h"
>> #include "block_int.h"
>> +#include "qjson.h"
>>
>> static QTAILQ_HEAD(drivelist, DriveInfo) drives = QTAILQ_HEAD_INITIALIZER(drives);
>>
>> @@ -50,6 +51,131 @@ static const int if_max_devs[IF_COUNT] = {
>> [IF_SCSI] = 7,
>> };
>>
>> +typedef struct StreamState {
>> + int64_t offset; /* current position in block device */
>> + BlockDriverState *bs;
>> + QEMUTimer *timer;
>> + QLIST_ENTRY(StreamState) list;
>> +} StreamState;
>> +
>> +static QLIST_HEAD(, StreamState) block_streams =
>> + QLIST_HEAD_INITIALIZER(block_streams);
>> +
>> +static QObject *stream_get_qobject(StreamState *s)
>> +{
>> + const char *name = bdrv_get_device_name(s->bs);
>> + int64_t len = bdrv_getlength(s->bs);
>> +
>> + return qobject_from_jsonf("{ 'device': %s, 'type': 'stream', "
>> + "'offset': %" PRId64 ", 'len': %" PRId64 ", "
>> + "'speed': %" PRId64 " }",
>> + name, s->offset, len, (int64_t)0);
>> +}
>> +
>> +static void stream_mon_event(StreamState *s, int ret)
>> +{
>> + QObject *data = stream_get_qobject(s);
>> +
>> + if (ret < 0) {
>> + QDict *qdict = qobject_to_qdict(data);
>> +
>> + qdict_put(qdict, "error", qstring_from_str(strerror(-ret)));
>> + }
>> +
>> + monitor_protocol_event(QEVENT_BLOCK_JOB_COMPLETED, data);
>> + qobject_decref(data);
>> +}
>> +
>> +static void stream_free(StreamState *s)
>> +{
>> + QLIST_REMOVE(s, list);
>> +
>> + qemu_del_timer(s->timer);
>> + qemu_free_timer(s->timer);
>> + qemu_free(s);
>> +}
>> +
>> +static void stream_complete(StreamState *s, int ret)
>> +{
>> + stream_mon_event(s, ret);
>> + stream_free(s);
>> +}
>> +
>> +static void stream_cb(void *opaque, int nb_sectors)
>> +{
>> + StreamState *s = opaque;
>> +
>> + if (nb_sectors < 0) {
>> + stream_complete(s, nb_sectors);
>> + return;
>> + }
>> +
>> + s->offset += nb_sectors * BDRV_SECTOR_SIZE;
>> +
>> + if (s->offset == bdrv_getlength(s->bs)) {
>> + bdrv_change_backing_file(s->bs, NULL, NULL);
>> + stream_complete(s, 0);
>> + } else {
>> + qemu_mod_timer(s->timer, qemu_get_clock_ns(rt_clock));
>> + }
>> +}
>> +
>> +/* We can't call bdrv_aio_copy_backing() directly from the callback because that
>> + * makes qemu_aio_flush() not complete until the streaming is completed.
>> + * By delaying with a timer, we give qemu_aio_flush() a chance to complete.
>> + */
>> +static void stream_next_iteration(void *opaque)
>> +{
>> + StreamState *s = opaque;
>> +
>> + bdrv_aio_copy_backing(s->bs, s->offset / BDRV_SECTOR_SIZE, stream_cb, s);
>> +}
>
> The plan is to replace format specific code with the generic
> implementation in the future?
I think Kevin has said this will not be merged into qemu.git. But I
am sharing it as a reference implementation against which the libvirt
API works.
Next I will send out a stub implementation of the QMP/HMP block_stream
APIs without any of the block driver-specific functionality. It may
make sense to merge this into QEMU just to nail down the QMP/HMP
interface, even if it does not do anything yet (we have errors for
ENOTSUP).
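
To make the interface side concrete, here is a rough client-side sketch in Python of what talking to such a stub might look like. It is not part of the series: `build_block_stream` and `classify_reply` are hypothetical helper names, the error classes are the three listed in the command documentation, and the reply shape (`{"return": {}}` on success, `{"error": {"class": ..., "desc": ...}}` on failure) is assumed from the usual QMP convention.

```python
import json

# Error classes listed in the block_stream documentation of this patch.
KNOWN_ERRORS = {
    'DeviceInUse': 'streaming is already active on this device',
    'DeviceNotFound': 'device name is invalid',
    'NotSupported': 'image streaming is not supported by this device',
}


def build_block_stream(device, base=None):
    """Return the QMP command dict for block_stream."""
    arguments = {'device': device}
    if base is not None:
        arguments['base'] = base  # optional common backing file
    return {'execute': 'block_stream', 'arguments': arguments}


def classify_reply(reply):
    """Map a QMP reply dict to (started, reason).

    Assumes the usual QMP shape: {"return": {}} on success and
    {"error": {"class": ..., "desc": ...}} on failure.
    """
    if 'return' in reply:
        return True, None
    err = reply.get('error', {})
    cls = err.get('class', 'Unknown')
    return False, KNOWN_ERRORS.get(cls, err.get('desc', cls))


if __name__ == '__main__':
    print(json.dumps(build_block_stream('virtio0'), sort_keys=True))
    print(classify_reply({'error': {'class': 'NotSupported',
                                    'desc': 'Not supported'}}))
```

With a stub that always answers NotSupported, a management tool could still exercise the whole command path and fall back gracefully.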
>> +
>> +static StreamState *stream_find(const char *device)
>> +{
>> + StreamState *s;
>> +
>> + QLIST_FOREACH(s, &block_streams, list) {
>> + if (strcmp(bdrv_get_device_name(s->bs), device) == 0) {
>> + return s;
>> + }
>> + }
>> + return NULL;
>> +}
>> +
>> +static StreamState *stream_start(const char *device)
>> +{
>> + StreamState *s;
>> + BlockDriverAIOCB *acb;
>> + BlockDriverState *bs;
>> +
>> + s = stream_find(device);
>> + if (s) {
>> + qerror_report(QERR_DEVICE_IN_USE, device);
>> + return NULL;
>> + }
>> +
>> + bs = bdrv_find(device);
>> + if (!bs) {
>> + qerror_report(QERR_DEVICE_NOT_FOUND, device);
>> + return NULL;
>> + }
>> +
>> + s = qemu_mallocz(sizeof(*s));
>> + s->bs = bs;
>> + s->timer = qemu_new_timer_ns(rt_clock, stream_next_iteration, s);
>> + QLIST_INSERT_HEAD(&block_streams, s, list);
>
> Should increase refcount with drive_get_ref().
Yes thanks.
Stefan