* [Qemu-devel] [PATCH v2 0/2] add internal backup job and write-threshold filter drivers
@ 2017-08-15 8:18 Manos Pitsidianakis
2017-08-15 8:18 ` [Qemu-devel] [PATCH v2 1/2] block: use internal filter node in backup Manos Pitsidianakis
2017-08-15 8:18 ` [Qemu-devel] [PATCH v2 2/2] block: add filter driver to block/write-threshold.c Manos Pitsidianakis
0 siblings, 2 replies; 8+ messages in thread
From: Manos Pitsidianakis @ 2017-08-15 8:18 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-block, Alberto Garcia, Stefan Hajnoczi, Kevin Wolf
This is part of my GSOC project, which is refactoring hardcoded block layer
features into filter drivers. Block filter drivers are inserted into the graph
only when a feature is needed. This makes the block layer more modular and
reuses the block driver abstraction that is already present.
Before write notifiers currently have two users:
block/backup.c uses before write notifiers to intercept write requests. This
can be refactored to use the filter driver interface by injecting an implicit
filter node to intercept the write requests and call backup_do_cow().
block/write-threshold.c checks that write requests do not pass a user set
offset and issue an event when they do. A new write-threshold driver can
perform the same function and be added by the user when
block-{insert,remove}-node are introduced. It is not trivial to convert the
existing interface (block-set-write-threshold) to using the filter driver.
v2:
add motivation in commit messages
dropped hunk that removed before-write notifiers
Manos Pitsidianakis (2):
block: use internal filter node in backup
block: add filter driver to block/write-threshold.c
block.c | 89 ++++++++++++--
block/backup.c | 207 +++++++++++++++++++++++++++----
block/mirror.c | 7 +-
block/qapi.c | 2 +-
block/write-threshold.c | 264 +++++++++++++++++++++++++++++++++++-----
blockdev.c | 2 +-
include/block/block.h | 8 +-
include/block/write-threshold.h | 22 ++--
qapi/block-core.json | 19 ++-
tests/qemu-iotests/141.out | 2 +-
tests/test-write-threshold.c | 40 +++---
11 files changed, 557 insertions(+), 105 deletions(-)
--
2.11.0
^ permalink raw reply [flat|nested] 8+ messages in thread
* [Qemu-devel] [PATCH v2 1/2] block: use internal filter node in backup
2017-08-15 8:18 [Qemu-devel] [PATCH v2 0/2] add internal backup job and write-threshold filter drivers Manos Pitsidianakis
@ 2017-08-15 8:18 ` Manos Pitsidianakis
2017-08-16 13:25 ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2017-10-02 12:45 ` [Qemu-devel] " Kevin Wolf
2017-08-15 8:18 ` [Qemu-devel] [PATCH v2 2/2] block: add filter driver to block/write-threshold.c Manos Pitsidianakis
1 sibling, 2 replies; 8+ messages in thread
From: Manos Pitsidianakis @ 2017-08-15 8:18 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-block, Alberto Garcia, Stefan Hajnoczi, Kevin Wolf
block/backup.c currently uses before write notifiers on the targeted
node. We can create a filter node instead to intercept write requests
for the backup job on the BDS level, instead of the BlockBackend level.
This is part of deprecating before write notifiers, which are hard coded
into the block layer. Block filter drivers are inserted into the graph
only when a feature is needed. This makes the block layer more modular
and reuses the block driver abstraction that is already present.
Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr>
---
block.c | 89 +++++++++++++++++--
block/backup.c | 207 ++++++++++++++++++++++++++++++++++++++++-----
block/mirror.c | 7 +-
blockdev.c | 2 +-
include/block/block.h | 8 +-
tests/qemu-iotests/141.out | 2 +-
6 files changed, 276 insertions(+), 39 deletions(-)
diff --git a/block.c b/block.c
index 2de1c29eb3..81bd51b670 100644
--- a/block.c
+++ b/block.c
@@ -2088,6 +2088,38 @@ static void bdrv_parent_cb_resize(BlockDriverState *bs)
}
/*
+ * Sets the file link of a BDS. A new reference is created; callers
+ * which don't need their own reference any more must call bdrv_unref().
+ */
+void bdrv_set_file(BlockDriverState *bs, BlockDriverState *file_bs,
+ Error **errp)
+{
+ if (file_bs) {
+ bdrv_ref(file_bs);
+ }
+
+ if (bs->file) {
+ bdrv_unref_child(bs, bs->file);
+ }
+
+ if (!file_bs) {
+ bs->file = NULL;
+ goto out;
+ }
+
+ bs->file = bdrv_attach_child(bs, file_bs, "file", &child_file,
+ errp);
+ if (!bs->file) {
+ bdrv_unref(file_bs);
+ }
+
+ bdrv_refresh_filename(bs);
+
+out:
+ bdrv_refresh_limits(bs, NULL);
+}
+
+/*
* Sets the backing file link of a BDS. A new reference is created; callers
* which don't need their own reference any more must call bdrv_unref().
*/
@@ -2355,12 +2387,12 @@ static BlockDriverState *bdrv_append_temp_snapshot(BlockDriverState *bs,
goto out;
}
- /* bdrv_append() consumes a strong reference to bs_snapshot
+ /* bdrv_append_backing() consumes a strong reference to bs_snapshot
* (i.e. it will call bdrv_unref() on it) even on error, so in
* order to be able to return one, we have to increase
* bs_snapshot's refcount here */
bdrv_ref(bs_snapshot);
- bdrv_append(bs_snapshot, bs, &local_err);
+ bdrv_append_backing(bs_snapshot, bs, &local_err);
if (local_err) {
error_propagate(errp, local_err);
bs_snapshot = NULL;
@@ -3142,7 +3174,7 @@ static bool should_update_child(BdrvChild *c, BlockDriverState *to)
return false;
}
- if (c->role == &child_backing) {
+ if (c->role == &child_backing || c->role == &child_file) {
/* If @from is a backing file of @to, ignore the child to avoid
* creating a loop. We only want to change the pointer of other
* parents. */
@@ -3213,6 +3245,45 @@ out:
}
/*
+ * Add new bs node at the top of a BDS chain while the chain is
+ * live, while keeping required fields on the top layer.
+ *
+ * This will modify the BlockDriverState fields, and swap contents
+ * between bs_new and bs_top. Both bs_new and bs_top are modified.
+ *
+ * bs_new must not be attached to a BlockBackend.
+ *
+ * bdrv_append_file() takes ownership of a bs_new reference and unrefs it
+ * because that's what the callers commonly need. bs_new will be referenced by
+ * the old parents of bs_top after bdrv_append_file() returns. If the caller
+ * needs to keep a reference of its own, it must call bdrv_ref().
+ */
+void bdrv_append_file(BlockDriverState *bs_new, BlockDriverState *bs_top,
+ Error **errp)
+{
+ Error *local_err = NULL;
+
+ bdrv_ref(bs_top);
+ bdrv_set_file(bs_new, bs_top, &local_err);
+ if (local_err) {
+ error_propagate(errp, local_err);
+ bdrv_set_file(bs_new, NULL, &error_abort);
+ goto out;
+ }
+ bdrv_replace_node(bs_top, bs_new, &local_err);
+ if (local_err) {
+ error_propagate(errp, local_err);
+ goto out;
+ }
+
+ /* bs_new is now referenced by its new parents, we don't need the
+ * additional reference any more. */
+out:
+ bdrv_unref(bs_top);
+ bdrv_unref(bs_new);
+}
+
+/*
* Add new bs contents at the top of an image chain while the chain is
* live, while keeping required fields on the top layer.
*
@@ -3223,13 +3294,13 @@ out:
*
* This function does not create any image files.
*
- * bdrv_append() takes ownership of a bs_new reference and unrefs it because
- * that's what the callers commonly need. bs_new will be referenced by the old
- * parents of bs_top after bdrv_append() returns. If the caller needs to keep a
- * reference of its own, it must call bdrv_ref().
+ * bdrv_append_backing() takes ownership of a bs_new reference and unrefs it
+ * because that's what the callers commonly need. bs_new will be referenced by
+ * the old parents of bs_top after bdrv_append_backing() returns. If the caller
+ * needs to keep a reference of its own, it must call bdrv_ref().
*/
-void bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top,
- Error **errp)
+void bdrv_append_backing(BlockDriverState *bs_new, BlockDriverState *bs_top,
+ Error **errp)
{
Error *local_err = NULL;
diff --git a/block/backup.c b/block/backup.c
index 504a089150..0828d522b6 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -43,7 +43,7 @@ typedef struct BackupBlockJob {
unsigned long *done_bitmap;
int64_t cluster_size;
bool compress;
- NotifierWithReturn before_write;
+ BlockDriverState *filter;
QLIST_HEAD(, CowRequest) inflight_reqs;
} BackupBlockJob;
@@ -174,20 +174,6 @@ out:
return ret;
}
-static int coroutine_fn backup_before_write_notify(
- NotifierWithReturn *notifier,
- void *opaque)
-{
- BackupBlockJob *job = container_of(notifier, BackupBlockJob, before_write);
- BdrvTrackedRequest *req = opaque;
-
- assert(req->bs == blk_bs(job->common.blk));
- assert(QEMU_IS_ALIGNED(req->offset, BDRV_SECTOR_SIZE));
- assert(QEMU_IS_ALIGNED(req->bytes, BDRV_SECTOR_SIZE));
-
- return backup_do_cow(job, req->offset, req->bytes, NULL, true);
-}
-
static void backup_set_speed(BlockJob *job, int64_t speed, Error **errp)
{
BackupBlockJob *s = container_of(job, BackupBlockJob, common);
@@ -202,7 +188,8 @@ static void backup_set_speed(BlockJob *job, int64_t speed, Error **errp)
static void backup_cleanup_sync_bitmap(BackupBlockJob *job, int ret)
{
BdrvDirtyBitmap *bm;
- BlockDriverState *bs = blk_bs(job->common.blk);
+ BlockDriverState *bs = child_bs(blk_bs(job->common.blk));
+ assert(bs);
if (ret < 0 || block_job_is_cancelled(&job->common)) {
/* Merge the successor back into the parent, delete nothing. */
@@ -234,9 +221,31 @@ static void backup_abort(BlockJob *job)
static void backup_clean(BlockJob *job)
{
BackupBlockJob *s = container_of(job, BackupBlockJob, common);
+ BlockDriverState *bs = child_bs(s->filter),
+ *filter = s->filter;
+
assert(s->target);
blk_unref(s->target);
s->target = NULL;
+
+ /* make sure nothing goes away while removing filter */
+ bdrv_ref(filter);
+ bdrv_ref(bs);
+ bdrv_drained_begin(bs);
+
+ block_job_remove_all_bdrv(job);
+ bdrv_child_try_set_perm(filter->file, 0, BLK_PERM_ALL,
+ &error_abort);
+ bdrv_replace_node(filter, bs, &error_abort);
+
+ blk_remove_bs(job->blk);
+ blk_set_perm(job->blk, 0, BLK_PERM_ALL, &error_abort);
+ blk_insert_bs(job->blk, filter, &error_abort);
+
+ bdrv_drained_end(bs);
+ bdrv_unref(filter);
+ bdrv_unref(bs);
+ s->filter = NULL;
}
static void backup_attached_aio_context(BlockJob *job, AioContext *aio_context)
@@ -421,6 +430,18 @@ out:
return ret;
}
+static void backup_top_enable(BackupBlockJob *job)
+{
+ BlockDriverState *bs = job->filter;
+ bs->opaque = job;
+}
+
+static void backup_top_disable(BackupBlockJob *job)
+{
+ BlockDriverState *bs = job->filter;
+ bs->opaque = NULL;
+}
+
static void coroutine_fn backup_run(void *opaque)
{
BackupBlockJob *job = opaque;
@@ -435,13 +456,12 @@ static void coroutine_fn backup_run(void *opaque)
job->done_bitmap = bitmap_new(DIV_ROUND_UP(job->common.len,
job->cluster_size));
- job->before_write.notify = backup_before_write_notify;
- bdrv_add_before_write_notifier(bs, &job->before_write);
+ backup_top_enable(job);
if (job->sync_mode == MIRROR_SYNC_MODE_NONE) {
while (!block_job_is_cancelled(&job->common)) {
- /* Yield until the job is cancelled. We just let our before_write
- * notify callback service CoW requests. */
+ /* Yield until the job is cancelled. We just let our backup_top
+ * filter service CoW requests. */
block_job_yield(&job->common);
}
} else if (job->sync_mode == MIRROR_SYNC_MODE_INCREMENTAL) {
@@ -507,8 +527,7 @@ static void coroutine_fn backup_run(void *opaque)
}
}
}
-
- notifier_with_return_remove(&job->before_write);
+ backup_top_disable(job);
/* wait until pending backup_do_cow() calls have completed */
qemu_co_rwlock_wrlock(&job->flush_rwlock);
@@ -532,6 +551,124 @@ static const BlockJobDriver backup_job_driver = {
.drain = backup_drain,
};
+static int coroutine_fn backup_top_co_preadv(BlockDriverState *bs,
+ uint64_t offset,
+ uint64_t bytes,
+ QEMUIOVector *qiov, int flags)
+{
+ return bdrv_co_preadv(bs->file, offset, bytes, qiov, flags);
+}
+
+static int coroutine_fn backup_top_co_pwritev(BlockDriverState *bs,
+ uint64_t offset,
+ uint64_t bytes,
+ QEMUIOVector *qiov, int flags)
+{
+ int ret = 0;
+ BackupBlockJob *job = bs->opaque;
+ if (job) {
+ assert(bs == blk_bs(job->common.blk));
+ assert(QEMU_IS_ALIGNED(offset, BDRV_SECTOR_SIZE));
+ assert(QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE));
+ ret = backup_do_cow(job, offset, bytes, NULL, true);
+ }
+
+ return ret < 0 ? ret : bdrv_co_pwritev(bs->file, offset, bytes,
+ qiov, flags);
+}
+
+static int coroutine_fn backup_top_co_pwrite_zeroes(BlockDriverState *bs,
+ int64_t offset,
+ int bytes,
+ BdrvRequestFlags flags)
+{
+ int ret = 0;
+ BackupBlockJob *job = bs->opaque;
+ if (job) {
+ assert(bs == blk_bs(job->common.blk));
+ assert(QEMU_IS_ALIGNED(offset, BDRV_SECTOR_SIZE));
+ assert(QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE));
+ ret = backup_do_cow(job, offset, bytes, NULL, true);
+ }
+
+ return ret < 0 ? ret : bdrv_co_pwrite_zeroes(bs->file, offset, bytes,
+ flags);
+}
+
+static int coroutine_fn backup_top_co_pdiscard(BlockDriverState *bs,
+ int64_t offset, int bytes)
+{
+ int ret = 0;
+ BackupBlockJob *job = bs->opaque;
+ if (job) {
+ assert(bs == blk_bs(job->common.blk));
+ assert(QEMU_IS_ALIGNED(offset, BDRV_SECTOR_SIZE));
+ assert(QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE));
+ ret = backup_do_cow(job, offset, bytes, NULL, true);
+ }
+
+ return ret < 0 ? ret : bdrv_co_pdiscard(bs->file->bs, offset, bytes);
+}
+
+static int backup_top_co_flush(BlockDriverState *bs)
+{
+ return bdrv_co_flush(bs->file->bs);
+}
+
+static int backup_top_open(BlockDriverState *bs, QDict *options,
+ int flags, Error **errp)
+{
+ return 0;
+}
+
+static void backup_top_close(BlockDriverState *bs)
+{
+}
+
+static int64_t backup_top_getlength(BlockDriverState *bs)
+{
+ return bs->file ? bdrv_getlength(bs->file->bs) : 0;
+}
+
+static bool backup_recurse_is_first_non_filter(BlockDriverState *bs,
+ BlockDriverState *candidate)
+{
+ return bdrv_recurse_is_first_non_filter(bs->file->bs, candidate);
+}
+
+static int64_t coroutine_fn backup_co_get_block_status(BlockDriverState *bs,
+ int64_t sector_num,
+ int nb_sectors,
+ int *pnum,
+ BlockDriverState **file)
+{
+ assert(bs->file && bs->file->bs);
+ *pnum = nb_sectors;
+ *file = bs->file->bs;
+ return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID |
+ (sector_num << BDRV_SECTOR_BITS);
+}
+static BlockDriver backup_top = {
+ .format_name = "backup-top",
+ .instance_size = sizeof(BackupBlockJob *),
+
+ .bdrv_open = backup_top_open,
+ .bdrv_close = backup_top_close,
+
+ .bdrv_co_flush = backup_top_co_flush,
+ .bdrv_co_preadv = backup_top_co_preadv,
+ .bdrv_co_pwritev = backup_top_co_pwritev,
+ .bdrv_co_pwrite_zeroes = backup_top_co_pwrite_zeroes,
+ .bdrv_co_pdiscard = backup_top_co_pdiscard,
+
+ .bdrv_getlength = backup_top_getlength,
+ .bdrv_child_perm = bdrv_filter_default_perms,
+ .bdrv_recurse_is_first_non_filter = backup_recurse_is_first_non_filter,
+ .bdrv_co_get_block_status = backup_co_get_block_status,
+
+ .is_filter = true,
+};
+
BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
BlockDriverState *target, int64_t speed,
MirrorSyncMode sync_mode, BdrvDirtyBitmap *sync_bitmap,
@@ -545,6 +682,7 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
int64_t len;
BlockDriverInfo bdi;
BackupBlockJob *job = NULL;
+ BlockDriverState *filter = NULL;
int ret;
assert(bs);
@@ -606,17 +744,33 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
bdrv_get_device_name(bs));
goto error;
}
+ /* Setup before write filter */
+ filter =
+ bdrv_new_open_driver(&backup_top,
+ NULL, bdrv_get_flags(bs), NULL, &error_abort);
+ filter->implicit = true;
+ filter->total_sectors = bs->total_sectors;
+ bdrv_set_aio_context(filter, bdrv_get_aio_context(bs));
+
+ /* Insert before write notifier in the BDS chain */
+ bdrv_ref(filter);
+ bdrv_drained_begin(bs);
+ bdrv_append_file(filter, bs, &error_abort);
+ bdrv_drained_end(bs);
/* job->common.len is fixed, so we can't allow resize */
- job = block_job_create(job_id, &backup_job_driver, bs,
+ job = block_job_create(job_id, &backup_job_driver, filter,
BLK_PERM_CONSISTENT_READ,
BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE |
BLK_PERM_WRITE_UNCHANGED | BLK_PERM_GRAPH_MOD,
speed, creation_flags, cb, opaque, errp);
+ bdrv_unref(filter);
if (!job) {
goto error;
}
+ job->filter = filter;
+
/* The target must match the source in size, so no resize here either */
job->target = blk_new(BLK_PERM_WRITE,
BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE |
@@ -675,6 +829,13 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
if (job) {
backup_clean(&job->common);
block_job_early_fail(&job->common);
+ } else {
+ /* don't leak filter if job creation failed */
+ if (filter) {
+ bdrv_child_try_set_perm(filter->file, 0, BLK_PERM_ALL,
+ &error_abort);
+ bdrv_replace_node(filter, bs, &error_abort);
+ }
}
return NULL;
diff --git a/block/mirror.c b/block/mirror.c
index e1a160e6ea..3044472fd4 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1174,11 +1174,12 @@ static void mirror_start_job(const char *job_id, BlockDriverState *bs,
mirror_top_bs->total_sectors = bs->total_sectors;
bdrv_set_aio_context(mirror_top_bs, bdrv_get_aio_context(bs));
- /* bdrv_append takes ownership of the mirror_top_bs reference, need to keep
- * it alive until block_job_create() succeeds even if bs has no parent. */
+ /* bdrv_append_backing() takes ownership of the mirror_top_bs reference,
+ * need to keep it alive until block_job_create() succeeds even if bs has
+ * no parent. */
bdrv_ref(mirror_top_bs);
bdrv_drained_begin(bs);
- bdrv_append(mirror_top_bs, bs, &local_err);
+ bdrv_append_backing(mirror_top_bs, bs, &local_err);
bdrv_drained_end(bs);
if (local_err) {
diff --git a/blockdev.c b/blockdev.c
index 5c11c245b0..8e2fc6e64c 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1788,7 +1788,7 @@ static void external_snapshot_prepare(BlkActionState *common,
* can fail, so we need to do it in .prepare; undoing it for abort is
* always possible. */
bdrv_ref(state->new_bs);
- bdrv_append(state->new_bs, state->old_bs, &local_err);
+ bdrv_append_backing(state->new_bs, state->old_bs, &local_err);
if (local_err) {
error_propagate(errp, local_err);
return;
diff --git a/include/block/block.h b/include/block/block.h
index d1f03cb48b..744b50e734 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -244,8 +244,10 @@ int bdrv_create(BlockDriver *drv, const char* filename,
QemuOpts *opts, Error **errp);
int bdrv_create_file(const char *filename, QemuOpts *opts, Error **errp);
BlockDriverState *bdrv_new(void);
-void bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top,
- Error **errp);
+void bdrv_append_file(BlockDriverState *bs_new, BlockDriverState *bs_top,
+ Error **errp);
+void bdrv_append_backing(BlockDriverState *bs_new, BlockDriverState *bs_top,
+ Error **errp);
void bdrv_replace_node(BlockDriverState *from, BlockDriverState *to,
Error **errp);
@@ -256,6 +258,8 @@ BdrvChild *bdrv_open_child(const char *filename,
BlockDriverState* parent,
const BdrvChildRole *child_role,
bool allow_none, Error **errp);
+void bdrv_set_file(BlockDriverState *bs, BlockDriverState *file_bs,
+ Error **errp);
void bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd,
Error **errp);
int bdrv_open_backing_file(BlockDriverState *bs, QDict *parent_options,
diff --git a/tests/qemu-iotests/141.out b/tests/qemu-iotests/141.out
index 82e763b68d..cc653c317a 100644
--- a/tests/qemu-iotests/141.out
+++ b/tests/qemu-iotests/141.out
@@ -9,7 +9,7 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576 backing_file=TEST_DIR/m.
{"return": {}}
Formatting 'TEST_DIR/o.IMGFMT', fmt=IMGFMT size=1048576 backing_file=TEST_DIR/t.IMGFMT backing_fmt=IMGFMT
{"return": {}}
-{"error": {"class": "GenericError", "desc": "Node drv0 is in use"}}
+{"error": {"class": "GenericError", "desc": "Block device drv0 is in use"}}
{"return": {}}
{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "job0", "len": 1048576, "offset": 0, "speed": 0, "type": "backup"}}
{"return": {}}
--
2.11.0
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [Qemu-devel] [PATCH v2 2/2] block: add filter driver to block/write-threshold.c
2017-08-15 8:18 [Qemu-devel] [PATCH v2 0/2] add internal backup job and write-threshold filter drivers Manos Pitsidianakis
2017-08-15 8:18 ` [Qemu-devel] [PATCH v2 1/2] block: use internal filter node in backup Manos Pitsidianakis
@ 2017-08-15 8:18 ` Manos Pitsidianakis
2017-10-02 12:52 ` Kevin Wolf
1 sibling, 1 reply; 8+ messages in thread
From: Manos Pitsidianakis @ 2017-08-15 8:18 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-block, Alberto Garcia, Stefan Hajnoczi, Kevin Wolf
With runtime insertion and removal of filters, write-threshold.c can
provide more flexible deliveries of BLOCK_WRITE_THRESHOLD events. After
the event trigger, the filter nodes are no longer useful and must be
removed.
The existing write-threshold cannot be easily converted to using the
filter driver, so it is not affected.
This is part of deprecating before write notifiers, which are hard coded
into the block layer. Block filter drivers are inserted into the graph
only when a feature is needed. This makes the block layer more modular
and reuses the block driver abstraction that is already present.
Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr>
---
block/qapi.c | 2 +-
block/write-threshold.c | 264 +++++++++++++++++++++++++++++++++++-----
include/block/write-threshold.h | 22 ++--
qapi/block-core.json | 19 ++-
tests/test-write-threshold.c | 40 +++---
5 files changed, 281 insertions(+), 66 deletions(-)
diff --git a/block/qapi.c b/block/qapi.c
index 2be44a6758..fe6cf2eae5 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -122,7 +122,7 @@ BlockDeviceInfo *bdrv_block_device_info(BlockBackend *blk,
info->group = g_strdup(throttle_group_get_name(tgm));
}
- info->write_threshold = bdrv_write_threshold_get(bs);
+ info->write_threshold = bdrv_write_threshold_get_legacy(bs);
bs0 = bs;
p_image_info = &info->image;
diff --git a/block/write-threshold.c b/block/write-threshold.c
index 0bd1a01c86..4a67188ea3 100644
--- a/block/write-threshold.c
+++ b/block/write-threshold.c
@@ -2,9 +2,11 @@
* QEMU System Emulator block write threshold notification
*
* Copyright Red Hat, Inc. 2014
+ * Copyright 2017 Manos Pitsidianakis
*
* Authors:
* Francesco Romani <fromani@redhat.com>
+ * Manos Pitsidianakis <el13635@mail.ntua.gr>
*
* This work is licensed under the terms of the GNU LGPL, version 2 or later.
* See the COPYING.LIB file in the top-level directory.
@@ -19,46 +21,35 @@
#include "qmp-commands.h"
-uint64_t bdrv_write_threshold_get(const BlockDriverState *bs)
+uint64_t bdrv_write_threshold_get_legacy(const BlockDriverState *bs)
{
return bs->write_threshold_offset;
}
-bool bdrv_write_threshold_is_set(const BlockDriverState *bs)
+bool bdrv_write_threshold_is_set_legacy(const BlockDriverState *bs)
{
return bs->write_threshold_offset > 0;
}
-static void write_threshold_disable(BlockDriverState *bs)
+static void write_threshold_disable_legacy(BlockDriverState *bs)
{
- if (bdrv_write_threshold_is_set(bs)) {
+ if (bdrv_write_threshold_is_set_legacy(bs)) {
notifier_with_return_remove(&bs->write_threshold_notifier);
bs->write_threshold_offset = 0;
}
}
-uint64_t bdrv_write_threshold_exceeded(const BlockDriverState *bs,
- const BdrvTrackedRequest *req)
-{
- if (bdrv_write_threshold_is_set(bs)) {
- if (req->offset > bs->write_threshold_offset) {
- return (req->offset - bs->write_threshold_offset) + req->bytes;
- }
- if ((req->offset + req->bytes) > bs->write_threshold_offset) {
- return (req->offset + req->bytes) - bs->write_threshold_offset;
- }
- }
- return 0;
-}
-
static int coroutine_fn before_write_notify(NotifierWithReturn *notifier,
void *opaque)
{
BdrvTrackedRequest *req = opaque;
BlockDriverState *bs = req->bs;
uint64_t amount = 0;
+ uint64_t threshold = bdrv_write_threshold_get_legacy(bs);
+ uint64_t offset = req->offset;
+ uint64_t bytes = req->bytes;
- amount = bdrv_write_threshold_exceeded(bs, req);
+ amount = bdrv_write_threshold_exceeded(threshold, offset, bytes);
if (amount > 0) {
qapi_event_send_block_write_threshold(
bs->node_name,
@@ -67,7 +58,7 @@ static int coroutine_fn before_write_notify(NotifierWithReturn *notifier,
&error_abort);
/* autodisable to avoid flooding the monitor */
- write_threshold_disable(bs);
+ write_threshold_disable_legacy(bs);
}
return 0; /* should always let other notifiers run */
@@ -79,25 +70,26 @@ static void write_threshold_register_notifier(BlockDriverState *bs)
bdrv_add_before_write_notifier(bs, &bs->write_threshold_notifier);
}
-static void write_threshold_update(BlockDriverState *bs,
- int64_t threshold_bytes)
+static void write_threshold_update_legacy(BlockDriverState *bs,
+ int64_t threshold_bytes)
{
bs->write_threshold_offset = threshold_bytes;
}
-void bdrv_write_threshold_set(BlockDriverState *bs, uint64_t threshold_bytes)
+void bdrv_write_threshold_set_legacy(BlockDriverState *bs,
+ uint64_t threshold_bytes)
{
- if (bdrv_write_threshold_is_set(bs)) {
+ if (bdrv_write_threshold_is_set_legacy(bs)) {
if (threshold_bytes > 0) {
- write_threshold_update(bs, threshold_bytes);
+ write_threshold_update_legacy(bs, threshold_bytes);
} else {
- write_threshold_disable(bs);
+ write_threshold_disable_legacy(bs);
}
} else {
if (threshold_bytes > 0) {
/* avoid multiple registration */
write_threshold_register_notifier(bs);
- write_threshold_update(bs, threshold_bytes);
+ write_threshold_update_legacy(bs, threshold_bytes);
}
/* discard bogus disable request */
}
@@ -119,7 +111,223 @@ void qmp_block_set_write_threshold(const char *node_name,
aio_context = bdrv_get_aio_context(bs);
aio_context_acquire(aio_context);
- bdrv_write_threshold_set(bs, threshold_bytes);
+ bdrv_write_threshold_set_legacy(bs, threshold_bytes);
aio_context_release(aio_context);
}
+
+
+/* The write-threshold filter drivers delivers a one-time BLOCK_WRITE_THRESHOLD
+ * event when a passing write request exceeds the configured write threshold
+ * offset of the filter.
+ *
+ * This is useful to transparently resize thin-provisioned drives without
+ * the guest OS noticing.
+ */
+
+#define QEMU_OPT_WRITE_THRESHOLD "write-threshold"
+static BlockDriver write_threshold;
+static QemuOptsList write_threshold_opts = {
+ .name = "write-threshold",
+ .head = QTAILQ_HEAD_INITIALIZER(write_threshold_opts.head),
+ .desc = {
+ {
+ .name = QEMU_OPT_WRITE_THRESHOLD,
+ .type = QEMU_OPT_NUMBER,
+ .help = "configured threshold for the block device, bytes. Use 0"
+ "to disable the threshold",
+ },
+ { /* end of list */ }
+ },
+};
+
+static bool bdrv_write_threshold_is_set(const BlockDriverState *bs)
+{
+ uint64_t threshold = *(uint64_t *)bs->opaque;
+ return threshold > 0;
+}
+
+static void bdrv_write_threshold_disable(BlockDriverState *bs)
+{
+ uint64_t *threshold = (uint64_t *)bs->opaque;
+ if (bdrv_write_threshold_is_set(bs)) {
+ *threshold = 0;
+ }
+}
+
+uint64_t bdrv_write_threshold_exceeded(uint64_t threshold, uint64_t offset,
+ uint64_t bytes)
+{
+ if (threshold) {
+ if (offset > threshold) {
+ return (offset - threshold) + bytes;
+ }
+ if ((offset + bytes) > threshold) {
+ return (offset + bytes) - threshold;
+ }
+ }
+ return 0;
+}
+
+
+static void bdrv_write_threshold_update(BlockDriverState *bs,
+ int64_t threshold_bytes)
+{
+ uint64_t *threshold = (uint64_t *)bs->opaque;
+ *threshold = threshold_bytes;
+}
+
+static void bdrv_write_threshold_check_amount(BlockDriverState *bs,
+ uint64_t offset,
+ uint64_t bytes)
+{
+ uint64_t threshold = *(uint64_t *)bs->opaque;
+ uint64_t amount = 0;
+
+ amount = bdrv_write_threshold_exceeded(threshold, offset, bytes);
+ if (amount > 0) {
+ qapi_event_send_block_write_threshold(child_bs(bs)->node_name,
+ amount,
+ threshold,
+ &error_abort);
+ /* autodisable to avoid flooding the monitor */
+ bdrv_write_threshold_disable(bs);
+ }
+}
+
+/* Filter driver methods */
+
+static int coroutine_fn write_threshold_co_preadv(BlockDriverState *bs,
+ uint64_t offset,
+ uint64_t bytes,
+ QEMUIOVector *qiov,
+ int flags)
+{
+ return bdrv_co_preadv(bs->file, offset, bytes, qiov, flags);
+}
+
+static int coroutine_fn write_threshold_co_pwritev(BlockDriverState *bs,
+ uint64_t offset,
+ uint64_t bytes,
+ QEMUIOVector *qiov,
+ int flags)
+{
+ bdrv_write_threshold_check_amount(bs, offset, bytes);
+ return bdrv_co_pwritev(bs->file, offset, bytes, qiov, flags);
+}
+
+static int coroutine_fn write_threshold_co_pwrite_zeroes(
+ BlockDriverState *bs,
+ int64_t offset,
+ int bytes,
+ BdrvRequestFlags flags)
+{
+ bdrv_write_threshold_check_amount(bs, offset, bytes);
+ return bdrv_co_pwrite_zeroes(bs->file, offset, bytes, flags);
+}
+
+static int coroutine_fn write_threshold_co_pdiscard(BlockDriverState *bs,
+ int64_t offset, int bytes)
+{
+ bdrv_write_threshold_check_amount(bs, offset, bytes);
+ return bdrv_co_pdiscard(bs->file->bs, offset, bytes);
+}
+
+
+static int64_t write_threshold_getlength(BlockDriverState *bs)
+{
+ return bdrv_getlength(bs->file->bs);
+}
+
+static int write_threshold_open(BlockDriverState *bs, QDict *options,
+ int flags, Error **errp)
+{
+ Error *local_err = NULL;
+ int ret = 0;
+ QemuOpts *opts = NULL;
+ uint64_t threshold = 0;
+
+ bs->file = bdrv_open_child(NULL, options, "file", bs, &child_file,
+ false, errp);
+ if (!bs->file) {
+ return -EINVAL;
+ }
+
+ bs->supported_write_flags = bs->file->bs->supported_write_flags;
+ bs->supported_zero_flags = bs->file->bs->supported_zero_flags;
+
+ opts = qemu_opts_create(&write_threshold_opts, NULL, 0, &error_abort);
+
+ qemu_opts_absorb_qdict(opts, options, &local_err);
+ if (local_err) {
+ error_propagate(errp, local_err);
+ ret = -EINVAL;
+ goto ret;
+ }
+
+ threshold = qemu_opt_get_number(opts, QEMU_OPT_WRITE_THRESHOLD, 0);
+ bdrv_write_threshold_update(bs, threshold);
+
+ret:
+ qemu_opts_del(opts);
+ return ret;
+}
+
+static void write_threshold_close(BlockDriverState *bs)
+{
+}
+
+static int write_threshold_co_flush(BlockDriverState *bs)
+{
+ return bdrv_co_flush(bs->file->bs);
+}
+
+static int64_t coroutine_fn write_threshold_co_get_block_status(
+ BlockDriverState *bs,
+ int64_t sector_num,
+ int nb_sectors,
+ int *pnum,
+ BlockDriverState **file)
+{
+ assert(child_bs(bs));
+ *pnum = nb_sectors;
+ *file = child_bs(bs);
+ return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID |
+ (sector_num << BDRV_SECTOR_BITS);
+}
+
+static bool write_threshold_recurse_is_first_non_filter(
+ BlockDriverState *bs,
+ BlockDriverState *candidate)
+{
+ return bdrv_recurse_is_first_non_filter(bs->file->bs, candidate);
+}
+
+static BlockDriver write_threshold = {
+ .format_name = "write-threshold",
+ .instance_size = sizeof(uint64_t),
+
+ .bdrv_open = write_threshold_open,
+ .bdrv_close = write_threshold_close,
+
+ .bdrv_co_flush = write_threshold_co_flush,
+ .bdrv_co_preadv = write_threshold_co_preadv,
+ .bdrv_co_pwritev = write_threshold_co_pwritev,
+ .bdrv_co_pwrite_zeroes = write_threshold_co_pwrite_zeroes,
+ .bdrv_co_pdiscard = write_threshold_co_pdiscard,
+
+ .bdrv_getlength = write_threshold_getlength,
+ .bdrv_child_perm = bdrv_filter_default_perms,
+ .bdrv_co_get_block_status = write_threshold_co_get_block_status,
+ .bdrv_recurse_is_first_non_filter =
+ write_threshold_recurse_is_first_non_filter,
+
+ .is_filter = true,
+};
+
+static void bdrv_write_threshold_init(void)
+{
+ bdrv_register(&write_threshold);
+}
+
+block_init(bdrv_write_threshold_init);
diff --git a/include/block/write-threshold.h b/include/block/write-threshold.h
index 234d2193e0..5cf378564d 100644
--- a/include/block/write-threshold.h
+++ b/include/block/write-threshold.h
@@ -15,7 +15,7 @@
#include "qemu-common.h"
/*
- * bdrv_write_threshold_set:
+ * bdrv_write_threshold_set_legacy:
*
* Set the write threshold for block devices, in bytes.
* Notify when a write exceeds the threshold, meaning the device
@@ -24,22 +24,25 @@
*
* Use threshold_bytes == 0 to disable.
*/
-void bdrv_write_threshold_set(BlockDriverState *bs, uint64_t threshold_bytes);
+void bdrv_write_threshold_set_legacy(BlockDriverState *bs,
+ uint64_t threshold_bytes);
+
/*
- * bdrv_write_threshold_get
+ * bdrv_write_threshold_get_legacy
*
* Get the configured write threshold, in bytes.
* Zero means no threshold configured.
+ *
*/
-uint64_t bdrv_write_threshold_get(const BlockDriverState *bs);
+uint64_t bdrv_write_threshold_get_legacy(const BlockDriverState *bs);
/*
- * bdrv_write_threshold_is_set
+ * bdrv_write_threshold_is_set_legacy
*
* Tell if a write threshold is set for a given BDS.
*/
-bool bdrv_write_threshold_is_set(const BlockDriverState *bs);
+bool bdrv_write_threshold_is_set_legacy(const BlockDriverState *bs);
/*
* bdrv_write_threshold_exceeded
@@ -51,11 +54,10 @@ bool bdrv_write_threshold_is_set(const BlockDriverState *bs);
* NOTE: here we assume the following holds for each request this code
* deals with:
*
- * assert((req->offset + req->bytes) <= UINT64_MAX)
+ * assert((offset + bytes) <= UINT64_MAX)
*
* Please not there is *not* an actual C assert().
*/
-uint64_t bdrv_write_threshold_exceeded(const BlockDriverState *bs,
- const BdrvTrackedRequest *req);
-
+uint64_t bdrv_write_threshold_exceeded(uint64_t threshold, uint64_t offset,
+ uint64_t bytes);
#endif
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 12fd749a94..4d6ba1baef 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2232,7 +2232,8 @@
'host_device', 'http', 'https', 'iscsi', 'luks', 'nbd', 'nfs',
'null-aio', 'null-co', 'parallels', 'qcow', 'qcow2', 'qed',
'quorum', 'raw', 'rbd', 'replication', 'sheepdog', 'ssh',
- 'throttle', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat', 'vxhs' ] }
+ 'throttle', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat', 'vxhs',
+ 'write-threshold'] }
##
# @BlockdevOptionsFile:
@@ -3113,6 +3114,21 @@
'file' : 'BlockdevRef',
'*limits' : 'ThrottleLimits'
} }
+
+##
+# @BlockdevOptionsWriteThreshold:
+#
+# Driver specific block device options for the write-threshold driver
+#
+# @file: reference to or definition of the data source block device
+# @write-threshold: threshold in bytes
+# Since: 2.11
+##
+{ 'struct': 'BlockdevOptionsWriteThreshold',
+ 'data': { '*file' : 'BlockdevRef',
+ 'write-threshold' : 'int'
+ } }
+
##
# @BlockdevOptions:
#
@@ -3175,6 +3191,7 @@
'sheepdog': 'BlockdevOptionsSheepdog',
'ssh': 'BlockdevOptionsSsh',
'throttle': 'BlockdevOptionsThrottle',
+ 'write-threshold': 'BlockdevOptionsWriteThreshold',
'vdi': 'BlockdevOptionsGenericFormat',
'vhdx': 'BlockdevOptionsGenericFormat',
'vmdk': 'BlockdevOptionsGenericCOWFormat',
diff --git a/tests/test-write-threshold.c b/tests/test-write-threshold.c
index 97ca12f710..1c802d2de4 100644
--- a/tests/test-write-threshold.c
+++ b/tests/test-write-threshold.c
@@ -17,9 +17,9 @@ static void test_threshold_not_set_on_init(void)
BlockDriverState bs;
memset(&bs, 0, sizeof(bs));
- g_assert(!bdrv_write_threshold_is_set(&bs));
+ g_assert(!bdrv_write_threshold_is_set_legacy(&bs));
- res = bdrv_write_threshold_get(&bs);
+ res = bdrv_write_threshold_get_legacy(&bs);
g_assert_cmpint(res, ==, 0);
}
@@ -30,11 +30,11 @@ static void test_threshold_set_get(void)
BlockDriverState bs;
memset(&bs, 0, sizeof(bs));
- bdrv_write_threshold_set(&bs, threshold);
+ bdrv_write_threshold_set_legacy(&bs, threshold);
- g_assert(bdrv_write_threshold_is_set(&bs));
+ g_assert(bdrv_write_threshold_is_set_legacy(&bs));
- res = bdrv_write_threshold_get(&bs);
+ res = bdrv_write_threshold_get_legacy(&bs);
g_assert_cmpint(res, ==, threshold);
}
@@ -46,9 +46,9 @@ static void test_threshold_multi_set_get(void)
BlockDriverState bs;
memset(&bs, 0, sizeof(bs));
- bdrv_write_threshold_set(&bs, threshold1);
- bdrv_write_threshold_set(&bs, threshold2);
- res = bdrv_write_threshold_get(&bs);
+ bdrv_write_threshold_set_legacy(&bs, threshold1);
+ bdrv_write_threshold_set_legacy(&bs, threshold2);
+ res = bdrv_write_threshold_get_legacy(&bs);
g_assert_cmpint(res, ==, threshold2);
}
@@ -56,16 +56,10 @@ static void test_threshold_not_trigger(void)
{
uint64_t amount = 0;
uint64_t threshold = 4 * 1024 * 1024;
- BlockDriverState bs;
- BdrvTrackedRequest req;
+ uint64_t offset = 1024;
+ uint64_t bytes = 1024;
- memset(&bs, 0, sizeof(bs));
- memset(&req, 0, sizeof(req));
- req.offset = 1024;
- req.bytes = 1024;
-
- bdrv_write_threshold_set(&bs, threshold);
- amount = bdrv_write_threshold_exceeded(&bs, &req);
+ amount = bdrv_write_threshold_exceeded(threshold, offset, bytes);
g_assert_cmpuint(amount, ==, 0);
}
@@ -74,16 +68,10 @@ static void test_threshold_trigger(void)
{
uint64_t amount = 0;
uint64_t threshold = 4 * 1024 * 1024;
- BlockDriverState bs;
- BdrvTrackedRequest req;
+ uint64_t offset = (4 * 1024 * 1024) - 1024;
+ uint64_t bytes = 2 * 1024;
- memset(&bs, 0, sizeof(bs));
- memset(&req, 0, sizeof(req));
- req.offset = (4 * 1024 * 1024) - 1024;
- req.bytes = 2 * 1024;
-
- bdrv_write_threshold_set(&bs, threshold);
- amount = bdrv_write_threshold_exceeded(&bs, &req);
+ amount = bdrv_write_threshold_exceeded(threshold, offset, bytes);
g_assert_cmpuint(amount, >=, 1024);
}
--
2.11.0
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [Qemu-devel] [Qemu-block] [PATCH v2 1/2] block: use internal filter node in backup
2017-08-15 8:18 ` [Qemu-devel] [PATCH v2 1/2] block: use internal filter node in backup Manos Pitsidianakis
@ 2017-08-16 13:25 ` Stefan Hajnoczi
2017-08-16 20:11 ` Manos Pitsidianakis
2017-10-02 12:45 ` [Qemu-devel] " Kevin Wolf
1 sibling, 1 reply; 8+ messages in thread
From: Stefan Hajnoczi @ 2017-08-16 13:25 UTC (permalink / raw)
To: Manos Pitsidianakis; +Cc: qemu-devel, Kevin Wolf, Stefan Hajnoczi, qemu-block
[-- Attachment #1: Type: text/plain, Size: 24266 bytes --]
On Tue, Aug 15, 2017 at 11:18:53AM +0300, Manos Pitsidianakis wrote:
> block/backup.c currently uses before write notifiers on the targeted
> node. We can create a filter node instead to intercept write requests
> for the backup job on the BDS level, instead of the BlockBackend level.
>
> This is part of deprecating before write notifiers, which are hard coded
> into the block layer. Block filter drivers are inserted into the graph
> only when a feature is needed. This makes the block layer more modular
> and reuses the block driver abstraction that is already present.
>
> Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr>
> ---
> block.c | 89 +++++++++++++++++--
> block/backup.c | 207 ++++++++++++++++++++++++++++++++++++++++-----
> block/mirror.c | 7 +-
> blockdev.c | 2 +-
> include/block/block.h | 8 +-
> tests/qemu-iotests/141.out | 2 +-
> 6 files changed, 276 insertions(+), 39 deletions(-)
>
> diff --git a/block.c b/block.c
> index 2de1c29eb3..81bd51b670 100644
> --- a/block.c
> +++ b/block.c
> @@ -2088,6 +2088,38 @@ static void bdrv_parent_cb_resize(BlockDriverState *bs)
> }
>
> /*
> + * Sets the file link of a BDS. A new reference is created; callers
> + * which don't need their own reference any more must call bdrv_unref().
> + */
> +void bdrv_set_file(BlockDriverState *bs, BlockDriverState *file_bs,
> + Error **errp)
> +{
> + if (file_bs) {
> + bdrv_ref(file_bs);
> + }
> +
> + if (bs->file) {
> + bdrv_unref_child(bs, bs->file);
> + }
> +
> + if (!file_bs) {
> + bs->file = NULL;
> + goto out;
> + }
> +
> + bs->file = bdrv_attach_child(bs, file_bs, "file", &child_file,
> + errp);
> + if (!bs->file) {
> + bdrv_unref(file_bs);
> + }
> +
> + bdrv_refresh_filename(bs);
> +
> +out:
> + bdrv_refresh_limits(bs, NULL);
> +}
> +
> +/*
> * Sets the backing file link of a BDS. A new reference is created; callers
> * which don't need their own reference any more must call bdrv_unref().
> */
> @@ -2355,12 +2387,12 @@ static BlockDriverState *bdrv_append_temp_snapshot(BlockDriverState *bs,
> goto out;
> }
>
> - /* bdrv_append() consumes a strong reference to bs_snapshot
> + /* bdrv_append_backing() consumes a strong reference to bs_snapshot
> * (i.e. it will call bdrv_unref() on it) even on error, so in
> * order to be able to return one, we have to increase
> * bs_snapshot's refcount here */
> bdrv_ref(bs_snapshot);
> - bdrv_append(bs_snapshot, bs, &local_err);
> + bdrv_append_backing(bs_snapshot, bs, &local_err);
> if (local_err) {
> error_propagate(errp, local_err);
> bs_snapshot = NULL;
> @@ -3142,7 +3174,7 @@ static bool should_update_child(BdrvChild *c, BlockDriverState *to)
> return false;
> }
>
> - if (c->role == &child_backing) {
> + if (c->role == &child_backing || c->role == &child_file) {
> /* If @from is a backing file of @to, ignore the child to avoid
> * creating a loop. We only want to change the pointer of other
> * parents. */
This may have unwanted side-effects. I think you're using is so that
bdrv_set_file() + bdrv_replace_node() does not create a loop in the
graph. That is okay if there is only one parent with child_file. If
there are multiple parents with that role then it's not clear to me that
they should all be skipped.
> @@ -3213,6 +3245,45 @@ out:
> }
>
> /*
> + * Add new bs node at the top of a BDS chain while the chain is
> + * live, while keeping required fields on the top layer.
> + *
> + * This will modify the BlockDriverState fields, and swap contents
> + * between bs_new and bs_top. Both bs_new and bs_top are modified.
> + *
> + * bs_new must not be attached to a BlockBackend.
> + *
> + * bdrv_append_file() takes ownership of a bs_new reference and unrefs it
> + * because that's what the callers commonly need. bs_new will be referenced by
> + * the old parents of bs_top after bdrv_append_file() returns. If the caller
> + * needs to keep a reference of its own, it must call bdrv_ref().
> + */
> +void bdrv_append_file(BlockDriverState *bs_new, BlockDriverState *bs_top,
> + Error **errp)
> +{
> + Error *local_err = NULL;
> +
> + bdrv_ref(bs_top);
> + bdrv_set_file(bs_new, bs_top, &local_err);
bdrv_set_file() takes its own reference so there's no need to call
bdrv_ref(bs_top).
But it would be more consistent with existing functions for
bdrv_set_file() *not* to take a new reference. If you make that change
then this bdrv_ref() is correct.
> + if (local_err) {
> + error_propagate(errp, local_err);
> + bdrv_set_file(bs_new, NULL, &error_abort);
If bdrv_set_file(bs_new, bs_top) failed, why is it necessary to set
bs_new->file to NULL?
bdrv_set_file() should guarantee that bs_new->file is either bs_top (on
success) or the old value (on failure). Then the caller does not need
to fix up bs_new->file on failure.
> + goto out;
> + }
> + bdrv_replace_node(bs_top, bs_new, &local_err);
> + if (local_err) {
> + error_propagate(errp, local_err);
> + goto out;
> + }
> +
> + /* bs_new is now referenced by its new parents, we don't need the
> + * additional reference any more. */
> +out:
> + bdrv_unref(bs_top);
> + bdrv_unref(bs_new);
> +}
> +
> +/*
> * Add new bs contents at the top of an image chain while the chain is
> * live, while keeping required fields on the top layer.
> *
Plase introduce the bdrv_set_file() and bdrv_append_file() APIs in a
separate patch from the backup block job changes.
> @@ -3223,13 +3294,13 @@ out:
> *
> * This function does not create any image files.
> *
> - * bdrv_append() takes ownership of a bs_new reference and unrefs it because
> - * that's what the callers commonly need. bs_new will be referenced by the old
> - * parents of bs_top after bdrv_append() returns. If the caller needs to keep a
> - * reference of its own, it must call bdrv_ref().
> + * bdrv_append_backing() takes ownership of a bs_new reference and unrefs it
> + * because that's what the callers commonly need. bs_new will be referenced by
> + * the old parents of bs_top after bdrv_append_backing() returns. If the caller
> + * needs to keep a reference of its own, it must call bdrv_ref().
> */
> -void bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top,
> - Error **errp)
> +void bdrv_append_backing(BlockDriverState *bs_new, BlockDriverState *bs_top,
> + Error **errp)
> {
> Error *local_err = NULL;
>
Please change bdrv_append() to bdrv_append_backing() in a separate
patch.
> diff --git a/block/backup.c b/block/backup.c
> index 504a089150..0828d522b6 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -43,7 +43,7 @@ typedef struct BackupBlockJob {
> unsigned long *done_bitmap;
> int64_t cluster_size;
> bool compress;
> - NotifierWithReturn before_write;
> + BlockDriverState *filter;
> QLIST_HEAD(, CowRequest) inflight_reqs;
> } BackupBlockJob;
>
> @@ -174,20 +174,6 @@ out:
> return ret;
> }
>
> -static int coroutine_fn backup_before_write_notify(
> - NotifierWithReturn *notifier,
> - void *opaque)
> -{
> - BackupBlockJob *job = container_of(notifier, BackupBlockJob, before_write);
> - BdrvTrackedRequest *req = opaque;
> -
> - assert(req->bs == blk_bs(job->common.blk));
> - assert(QEMU_IS_ALIGNED(req->offset, BDRV_SECTOR_SIZE));
> - assert(QEMU_IS_ALIGNED(req->bytes, BDRV_SECTOR_SIZE));
> -
> - return backup_do_cow(job, req->offset, req->bytes, NULL, true);
> -}
> -
> static void backup_set_speed(BlockJob *job, int64_t speed, Error **errp)
> {
> BackupBlockJob *s = container_of(job, BackupBlockJob, common);
> @@ -202,7 +188,8 @@ static void backup_set_speed(BlockJob *job, int64_t speed, Error **errp)
> static void backup_cleanup_sync_bitmap(BackupBlockJob *job, int ret)
> {
> BdrvDirtyBitmap *bm;
> - BlockDriverState *bs = blk_bs(job->common.blk);
> + BlockDriverState *bs = child_bs(blk_bs(job->common.blk));
> + assert(bs);
>
> if (ret < 0 || block_job_is_cancelled(&job->common)) {
> /* Merge the successor back into the parent, delete nothing. */
> @@ -234,9 +221,31 @@ static void backup_abort(BlockJob *job)
> static void backup_clean(BlockJob *job)
> {
> BackupBlockJob *s = container_of(job, BackupBlockJob, common);
> + BlockDriverState *bs = child_bs(s->filter),
> + *filter = s->filter;
Please do not put multiple variable declarations with initializers in a
single statement. It's easier to read like this:
BlockDriverState *filter = s->filter;
BlockDriverState *bs = child_bs(filter);
> +
> assert(s->target);
> blk_unref(s->target);
> s->target = NULL;
> +
> + /* make sure nothing goes away while removing filter */
> + bdrv_ref(filter);
> + bdrv_ref(bs);
> + bdrv_drained_begin(bs);
> +
> + block_job_remove_all_bdrv(job);
> + bdrv_child_try_set_perm(filter->file, 0, BLK_PERM_ALL,
> + &error_abort);
> + bdrv_replace_node(filter, bs, &error_abort);
> +
> + blk_remove_bs(job->blk);
> + blk_set_perm(job->blk, 0, BLK_PERM_ALL, &error_abort);
> + blk_insert_bs(job->blk, filter, &error_abort);
> +
> + bdrv_drained_end(bs);
> + bdrv_unref(filter);
> + bdrv_unref(bs);
> + s->filter = NULL;
> }
>
> static void backup_attached_aio_context(BlockJob *job, AioContext *aio_context)
> @@ -421,6 +430,18 @@ out:
> return ret;
> }
>
> +static void backup_top_enable(BackupBlockJob *job)
> +{
> + BlockDriverState *bs = job->filter;
> + bs->opaque = job;
> +}
> +
> +static void backup_top_disable(BackupBlockJob *job)
> +{
> + BlockDriverState *bs = job->filter;
> + bs->opaque = NULL;
> +}
bs->opaque is a pointer that is managed by block.c. Drivers are not
allowed to modify this pointer. You declared BlockDriver.instance_size
= sizeof(BackupBlockJob *) but didn't use it.
I guess this should be:
static void backup_top_enable(BackupBlockJob *job)
{
BackupBlockJob **jobp = job->bs->opaque;
*jobp = job;
}
static void backup_top_disable(BackupBlockJob *job)
{
BackupBlockJob **jobp = job->bs->opaque;
*jobp = NULL;
}
...
static int coroutine_fn backup_top_co_pwritev(BlockDriverState *bs,
uint64_t offset,
uint64_t bytes,
QEMUIOVector *qiov, int flags)
{
int ret = 0;
BackupBlockJob *job = *(BackupBlockJob **)bs->opaque;
if (job) {
Alternatively filter->job can be used but that may be a legacy field
that will be removed eventually.
> +
> static void coroutine_fn backup_run(void *opaque)
> {
> BackupBlockJob *job = opaque;
> @@ -435,13 +456,12 @@ static void coroutine_fn backup_run(void *opaque)
> job->done_bitmap = bitmap_new(DIV_ROUND_UP(job->common.len,
> job->cluster_size));
>
> - job->before_write.notify = backup_before_write_notify;
> - bdrv_add_before_write_notifier(bs, &job->before_write);
> + backup_top_enable(job);
>
> if (job->sync_mode == MIRROR_SYNC_MODE_NONE) {
> while (!block_job_is_cancelled(&job->common)) {
> - /* Yield until the job is cancelled. We just let our before_write
> - * notify callback service CoW requests. */
> + /* Yield until the job is cancelled. We just let our backup_top
> + * filter service CoW requests. */
> block_job_yield(&job->common);
> }
> } else if (job->sync_mode == MIRROR_SYNC_MODE_INCREMENTAL) {
> @@ -507,8 +527,7 @@ static void coroutine_fn backup_run(void *opaque)
> }
> }
> }
> -
> - notifier_with_return_remove(&job->before_write);
> + backup_top_disable(job);
>
> /* wait until pending backup_do_cow() calls have completed */
> qemu_co_rwlock_wrlock(&job->flush_rwlock);
> @@ -532,6 +551,124 @@ static const BlockJobDriver backup_job_driver = {
> .drain = backup_drain,
> };
>
> +static int coroutine_fn backup_top_co_preadv(BlockDriverState *bs,
> + uint64_t offset,
> + uint64_t bytes,
> + QEMUIOVector *qiov, int flags)
> +{
> + return bdrv_co_preadv(bs->file, offset, bytes, qiov, flags);
> +}
> +
> +static int coroutine_fn backup_top_co_pwritev(BlockDriverState *bs,
> + uint64_t offset,
> + uint64_t bytes,
> + QEMUIOVector *qiov, int flags)
> +{
> + int ret = 0;
> + BackupBlockJob *job = bs->opaque;
> + if (job) {
> + assert(bs == blk_bs(job->common.blk));
> + assert(QEMU_IS_ALIGNED(offset, BDRV_SECTOR_SIZE));
> + assert(QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE));
> + ret = backup_do_cow(job, offset, bytes, NULL, true);
> + }
> +
> + return ret < 0 ? ret : bdrv_co_pwritev(bs->file, offset, bytes,
> + qiov, flags);
> +}
> +
> +static int coroutine_fn backup_top_co_pwrite_zeroes(BlockDriverState *bs,
> + int64_t offset,
> + int bytes,
> + BdrvRequestFlags flags)
> +{
> + int ret = 0;
> + BackupBlockJob *job = bs->opaque;
> + if (job) {
> + assert(bs == blk_bs(job->common.blk));
> + assert(QEMU_IS_ALIGNED(offset, BDRV_SECTOR_SIZE));
> + assert(QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE));
> + ret = backup_do_cow(job, offset, bytes, NULL, true);
> + }
> +
> + return ret < 0 ? ret : bdrv_co_pwrite_zeroes(bs->file, offset, bytes,
> + flags);
> +}
> +
> +static int coroutine_fn backup_top_co_pdiscard(BlockDriverState *bs,
> + int64_t offset, int bytes)
> +{
> + int ret = 0;
> + BackupBlockJob *job = bs->opaque;
> + if (job) {
> + assert(bs == blk_bs(job->common.blk));
> + assert(QEMU_IS_ALIGNED(offset, BDRV_SECTOR_SIZE));
> + assert(QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE));
> + ret = backup_do_cow(job, offset, bytes, NULL, true);
> + }
> +
> + return ret < 0 ? ret : bdrv_co_pdiscard(bs->file->bs, offset, bytes);
> +}
> +
> +static int backup_top_co_flush(BlockDriverState *bs)
> +{
> + return bdrv_co_flush(bs->file->bs);
> +}
> +
> +static int backup_top_open(BlockDriverState *bs, QDict *options,
> + int flags, Error **errp)
> +{
> + return 0;
> +}
> +
> +static void backup_top_close(BlockDriverState *bs)
> +{
> +}
> +
> +static int64_t backup_top_getlength(BlockDriverState *bs)
> +{
> + return bs->file ? bdrv_getlength(bs->file->bs) : 0;
> +}
> +
> +static bool backup_recurse_is_first_non_filter(BlockDriverState *bs,
> + BlockDriverState *candidate)
> +{
> + return bdrv_recurse_is_first_non_filter(bs->file->bs, candidate);
> +}
> +
> +static int64_t coroutine_fn backup_co_get_block_status(BlockDriverState *bs,
> + int64_t sector_num,
> + int nb_sectors,
> + int *pnum,
> + BlockDriverState **file)
> +{
> + assert(bs->file && bs->file->bs);
> + *pnum = nb_sectors;
> + *file = bs->file->bs;
> + return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID |
> + (sector_num << BDRV_SECTOR_BITS);
> +}
> +static BlockDriver backup_top = {
> + .format_name = "backup-top",
> + .instance_size = sizeof(BackupBlockJob *),
> +
> + .bdrv_open = backup_top_open,
.bdrv_open may be NULL. There's no need to define this function.
> + .bdrv_close = backup_top_close,
> +
> + .bdrv_co_flush = backup_top_co_flush,
> + .bdrv_co_preadv = backup_top_co_preadv,
> + .bdrv_co_pwritev = backup_top_co_pwritev,
> + .bdrv_co_pwrite_zeroes = backup_top_co_pwrite_zeroes,
> + .bdrv_co_pdiscard = backup_top_co_pdiscard,
> +
> + .bdrv_getlength = backup_top_getlength,
> + .bdrv_child_perm = bdrv_filter_default_perms,
> + .bdrv_recurse_is_first_non_filter = backup_recurse_is_first_non_filter,
I think this is dead code since .is_filter = true. It will not be
called.
> + .bdrv_co_get_block_status = backup_co_get_block_status,
> +
> + .is_filter = true,
> +};
> +
> BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
> BlockDriverState *target, int64_t speed,
> MirrorSyncMode sync_mode, BdrvDirtyBitmap *sync_bitmap,
> @@ -545,6 +682,7 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
> int64_t len;
> BlockDriverInfo bdi;
> BackupBlockJob *job = NULL;
> + BlockDriverState *filter = NULL;
> int ret;
>
> assert(bs);
> @@ -606,17 +744,33 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
> bdrv_get_device_name(bs));
> goto error;
> }
> + /* Setup before write filter */
> + filter =
> + bdrv_new_open_driver(&backup_top,
> + NULL, bdrv_get_flags(bs), NULL, &error_abort);
> + filter->implicit = true;
> + filter->total_sectors = bs->total_sectors;
> + bdrv_set_aio_context(filter, bdrv_get_aio_context(bs));
> +
> + /* Insert before write notifier in the BDS chain */
> + bdrv_ref(filter);
> + bdrv_drained_begin(bs);
> + bdrv_append_file(filter, bs, &error_abort);
> + bdrv_drained_end(bs);
>
> /* job->common.len is fixed, so we can't allow resize */
> - job = block_job_create(job_id, &backup_job_driver, bs,
> + job = block_job_create(job_id, &backup_job_driver, filter,
> BLK_PERM_CONSISTENT_READ,
> BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE |
> BLK_PERM_WRITE_UNCHANGED | BLK_PERM_GRAPH_MOD,
> speed, creation_flags, cb, opaque, errp);
> + bdrv_unref(filter);
> if (!job) {
> goto error;
> }
>
> + job->filter = filter;
Is it okay to use filter here after bdrv_unref()?
> +
> /* The target must match the source in size, so no resize here either */
> job->target = blk_new(BLK_PERM_WRITE,
> BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE |
> @@ -675,6 +829,13 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
> if (job) {
> backup_clean(&job->common);
> block_job_early_fail(&job->common);
> + } else {
> + /* don't leak filter if job creation failed */
> + if (filter) {
> + bdrv_child_try_set_perm(filter->file, 0, BLK_PERM_ALL,
> + &error_abort);
> + bdrv_replace_node(filter, bs, &error_abort);
> + }
> }
>
> return NULL;
> diff --git a/block/mirror.c b/block/mirror.c
> index e1a160e6ea..3044472fd4 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -1174,11 +1174,12 @@ static void mirror_start_job(const char *job_id, BlockDriverState *bs,
> mirror_top_bs->total_sectors = bs->total_sectors;
> bdrv_set_aio_context(mirror_top_bs, bdrv_get_aio_context(bs));
>
> - /* bdrv_append takes ownership of the mirror_top_bs reference, need to keep
> - * it alive until block_job_create() succeeds even if bs has no parent. */
> + /* bdrv_append_backing() takes ownership of the mirror_top_bs reference,
> + * need to keep it alive until block_job_create() succeeds even if bs has
> + * no parent. */
> bdrv_ref(mirror_top_bs);
> bdrv_drained_begin(bs);
> - bdrv_append(mirror_top_bs, bs, &local_err);
> + bdrv_append_backing(mirror_top_bs, bs, &local_err);
> bdrv_drained_end(bs);
>
> if (local_err) {
> diff --git a/blockdev.c b/blockdev.c
> index 5c11c245b0..8e2fc6e64c 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -1788,7 +1788,7 @@ static void external_snapshot_prepare(BlkActionState *common,
> * can fail, so we need to do it in .prepare; undoing it for abort is
> * always possible. */
> bdrv_ref(state->new_bs);
> - bdrv_append(state->new_bs, state->old_bs, &local_err);
> + bdrv_append_backing(state->new_bs, state->old_bs, &local_err);
> if (local_err) {
> error_propagate(errp, local_err);
> return;
> diff --git a/include/block/block.h b/include/block/block.h
> index d1f03cb48b..744b50e734 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -244,8 +244,10 @@ int bdrv_create(BlockDriver *drv, const char* filename,
> QemuOpts *opts, Error **errp);
> int bdrv_create_file(const char *filename, QemuOpts *opts, Error **errp);
> BlockDriverState *bdrv_new(void);
> -void bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top,
> - Error **errp);
> +void bdrv_append_file(BlockDriverState *bs_new, BlockDriverState *bs_top,
> + Error **errp);
> +void bdrv_append_backing(BlockDriverState *bs_new, BlockDriverState *bs_top,
> + Error **errp);
> void bdrv_replace_node(BlockDriverState *from, BlockDriverState *to,
> Error **errp);
>
> @@ -256,6 +258,8 @@ BdrvChild *bdrv_open_child(const char *filename,
> BlockDriverState* parent,
> const BdrvChildRole *child_role,
> bool allow_none, Error **errp);
> +void bdrv_set_file(BlockDriverState *bs, BlockDriverState *file_bs,
> + Error **errp);
> void bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd,
> Error **errp);
> int bdrv_open_backing_file(BlockDriverState *bs, QDict *parent_options,
> diff --git a/tests/qemu-iotests/141.out b/tests/qemu-iotests/141.out
> index 82e763b68d..cc653c317a 100644
> --- a/tests/qemu-iotests/141.out
> +++ b/tests/qemu-iotests/141.out
> @@ -9,7 +9,7 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576 backing_file=TEST_DIR/m.
> {"return": {}}
> Formatting 'TEST_DIR/o.IMGFMT', fmt=IMGFMT size=1048576 backing_file=TEST_DIR/t.IMGFMT backing_fmt=IMGFMT
> {"return": {}}
> -{"error": {"class": "GenericError", "desc": "Node drv0 is in use"}}
> +{"error": {"class": "GenericError", "desc": "Block device drv0 is in use"}}
> {"return": {}}
> {"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "job0", "len": 1048576, "offset": 0, "speed": 0, "type": "backup"}}
> {"return": {}}
> --
> 2.11.0
>
>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Qemu-devel] [Qemu-block] [PATCH v2 1/2] block: use internal filter node in backup
2017-08-16 13:25 ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
@ 2017-08-16 20:11 ` Manos Pitsidianakis
2017-08-17 14:06 ` Stefan Hajnoczi
0 siblings, 1 reply; 8+ messages in thread
From: Manos Pitsidianakis @ 2017-08-16 20:11 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: qemu-devel, Kevin Wolf, Stefan Hajnoczi, qemu-block,
Alberto Garcia
[-- Attachment #1: Type: text/plain, Size: 25793 bytes --]
On Wed, Aug 16, 2017 at 02:25:44PM +0100, Stefan Hajnoczi wrote:
>On Tue, Aug 15, 2017 at 11:18:53AM +0300, Manos Pitsidianakis wrote:
>> block/backup.c currently uses before write notifiers on the targeted
>> node. We can create a filter node instead to intercept write requests
>> for the backup job on the BDS level, instead of the BlockBackend level.
>>
>> This is part of deprecating before write notifiers, which are hard coded
>> into the block layer. Block filter drivers are inserted into the graph
>> only when a feature is needed. This makes the block layer more modular
>> and reuses the block driver abstraction that is already present.
>>
>> Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr>
>> ---
>> block.c | 89 +++++++++++++++++--
>> block/backup.c | 207 ++++++++++++++++++++++++++++++++++++++++-----
>> block/mirror.c | 7 +-
>> blockdev.c | 2 +-
>> include/block/block.h | 8 +-
>> tests/qemu-iotests/141.out | 2 +-
>> 6 files changed, 276 insertions(+), 39 deletions(-)
>>
>> diff --git a/block.c b/block.c
>> index 2de1c29eb3..81bd51b670 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -2088,6 +2088,38 @@ static void bdrv_parent_cb_resize(BlockDriverState *bs)
>> }
>>
>> /*
>> + * Sets the file link of a BDS. A new reference is created; callers
>> + * which don't need their own reference any more must call bdrv_unref().
>> + */
>> +void bdrv_set_file(BlockDriverState *bs, BlockDriverState *file_bs,
>> + Error **errp)
>> +{
>> + if (file_bs) {
>> + bdrv_ref(file_bs);
>> + }
>> +
>> + if (bs->file) {
>> + bdrv_unref_child(bs, bs->file);
>> + }
>> +
>> + if (!file_bs) {
>> + bs->file = NULL;
>> + goto out;
>> + }
>> +
>> + bs->file = bdrv_attach_child(bs, file_bs, "file", &child_file,
>> + errp);
>> + if (!bs->file) {
>> + bdrv_unref(file_bs);
>> + }
>> +
>> + bdrv_refresh_filename(bs);
>> +
>> +out:
>> + bdrv_refresh_limits(bs, NULL);
>> +}
>> +
>> +/*
>> * Sets the backing file link of a BDS. A new reference is created; callers
>> * which don't need their own reference any more must call bdrv_unref().
>> */
>> @@ -2355,12 +2387,12 @@ static BlockDriverState *bdrv_append_temp_snapshot(BlockDriverState *bs,
>> goto out;
>> }
>>
>> - /* bdrv_append() consumes a strong reference to bs_snapshot
>> + /* bdrv_append_backing() consumes a strong reference to bs_snapshot
>> * (i.e. it will call bdrv_unref() on it) even on error, so in
>> * order to be able to return one, we have to increase
>> * bs_snapshot's refcount here */
>> bdrv_ref(bs_snapshot);
>> - bdrv_append(bs_snapshot, bs, &local_err);
>> + bdrv_append_backing(bs_snapshot, bs, &local_err);
>> if (local_err) {
>> error_propagate(errp, local_err);
>> bs_snapshot = NULL;
>> @@ -3142,7 +3174,7 @@ static bool should_update_child(BdrvChild *c, BlockDriverState *to)
>> return false;
>> }
>>
>> - if (c->role == &child_backing) {
>> + if (c->role == &child_backing || c->role == &child_file) {
>> /* If @from is a backing file of @to, ignore the child to avoid
>> * creating a loop. We only want to change the pointer of other
>> * parents. */
>
>This may have unwanted side-effects. I think you're using is so that
>bdrv_set_file() + bdrv_replace_node() does not create a loop in the
>graph. That is okay if there is only one parent with child_file. If
>there are multiple parents with that role then it's not clear to me that
>they should all be skipped.
I am afraid I don't understand what you're saying. What is the
difference with the child_backing scenario here? In both cases we
should update all from->parents children unless they also happen to be a
child of `to`. If there are multiple parents with child_file, they are
not skipped except for the ones where `to` is the parent.
>
>> @@ -3213,6 +3245,45 @@ out:
>> }
>>
>> /*
>> + * Add new bs node at the top of a BDS chain while the chain is
>> + * live, while keeping required fields on the top layer.
>> + *
>> + * This will modify the BlockDriverState fields, and swap contents
>> + * between bs_new and bs_top. Both bs_new and bs_top are modified.
>> + *
>> + * bs_new must not be attached to a BlockBackend.
>> + *
>> + * bdrv_append_file() takes ownership of a bs_new reference and unrefs it
>> + * because that's what the callers commonly need. bs_new will be referenced by
>> + * the old parents of bs_top after bdrv_append_file() returns. If the caller
>> + * needs to keep a reference of its own, it must call bdrv_ref().
>> + */
>> +void bdrv_append_file(BlockDriverState *bs_new, BlockDriverState *bs_top,
>> + Error **errp)
>> +{
>> + Error *local_err = NULL;
>> +
>> + bdrv_ref(bs_top);
>> + bdrv_set_file(bs_new, bs_top, &local_err);
>
>bdrv_set_file() takes its own reference so there's no need to call
>bdrv_ref(bs_top).
>
>But it would be more consistent with existing functions for
>bdrv_set_file() *not* to take a new reference. If you make that change
>then this bdrv_ref() is correct.
>
>> + if (local_err) {
>> + error_propagate(errp, local_err);
>> + bdrv_set_file(bs_new, NULL, &error_abort);
>
>If bdrv_set_file(bs_new, bs_top) failed, why is it necessary to set
>bs_new->file to NULL?
>
>bdrv_set_file() should guarantee that bs_new->file is either bs_top (on
>success) or the old value (on failure). Then the caller does not need
>to fix up bs_new->file on failure.
I think the error paths got mixed here, bs_new->file should be set to
NULL on bdrv_replace_node() failure.
>
>> + goto out;
>> + }
>> + bdrv_replace_node(bs_top, bs_new, &local_err);
>> + if (local_err) {
>> + error_propagate(errp, local_err);
>> + goto out;
>> + }
>> +
>> + /* bs_new is now referenced by its new parents, we don't need the
>> + * additional reference any more. */
>> +out:
>> + bdrv_unref(bs_top);
>> + bdrv_unref(bs_new);
>> +}
>> +
>> +/*
>> * Add new bs contents at the top of an image chain while the chain is
>> * live, while keeping required fields on the top layer.
>> *
>
>Plase introduce the bdrv_set_file() and bdrv_append_file() APIs in a
>separate patch from the backup block job changes.
>
>> @@ -3223,13 +3294,13 @@ out:
>> *
>> * This function does not create any image files.
>> *
>> - * bdrv_append() takes ownership of a bs_new reference and unrefs it because
>> - * that's what the callers commonly need. bs_new will be referenced by the old
>> - * parents of bs_top after bdrv_append() returns. If the caller needs to keep a
>> - * reference of its own, it must call bdrv_ref().
>> + * bdrv_append_backing() takes ownership of a bs_new reference and unrefs it
>> + * because that's what the callers commonly need. bs_new will be referenced by
>> + * the old parents of bs_top after bdrv_append_backing() returns. If the caller
>> + * needs to keep a reference of its own, it must call bdrv_ref().
>> */
>> -void bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top,
>> - Error **errp)
>> +void bdrv_append_backing(BlockDriverState *bs_new, BlockDriverState *bs_top,
>> + Error **errp)
>> {
>> Error *local_err = NULL;
>>
>
>Please change bdrv_append() to bdrv_append_backing() in a separate
>patch.
>
>> diff --git a/block/backup.c b/block/backup.c
>> index 504a089150..0828d522b6 100644
>> --- a/block/backup.c
>> +++ b/block/backup.c
>> @@ -43,7 +43,7 @@ typedef struct BackupBlockJob {
>> unsigned long *done_bitmap;
>> int64_t cluster_size;
>> bool compress;
>> - NotifierWithReturn before_write;
>> + BlockDriverState *filter;
>> QLIST_HEAD(, CowRequest) inflight_reqs;
>> } BackupBlockJob;
>>
>> @@ -174,20 +174,6 @@ out:
>> return ret;
>> }
>>
>> -static int coroutine_fn backup_before_write_notify(
>> - NotifierWithReturn *notifier,
>> - void *opaque)
>> -{
>> - BackupBlockJob *job = container_of(notifier, BackupBlockJob, before_write);
>> - BdrvTrackedRequest *req = opaque;
>> -
>> - assert(req->bs == blk_bs(job->common.blk));
>> - assert(QEMU_IS_ALIGNED(req->offset, BDRV_SECTOR_SIZE));
>> - assert(QEMU_IS_ALIGNED(req->bytes, BDRV_SECTOR_SIZE));
>> -
>> - return backup_do_cow(job, req->offset, req->bytes, NULL, true);
>> -}
>> -
>> static void backup_set_speed(BlockJob *job, int64_t speed, Error **errp)
>> {
>> BackupBlockJob *s = container_of(job, BackupBlockJob, common);
>> @@ -202,7 +188,8 @@ static void backup_set_speed(BlockJob *job, int64_t speed, Error **errp)
>> static void backup_cleanup_sync_bitmap(BackupBlockJob *job, int ret)
>> {
>> BdrvDirtyBitmap *bm;
>> - BlockDriverState *bs = blk_bs(job->common.blk);
>> + BlockDriverState *bs = child_bs(blk_bs(job->common.blk));
>> + assert(bs);
>>
>> if (ret < 0 || block_job_is_cancelled(&job->common)) {
>> /* Merge the successor back into the parent, delete nothing. */
>> @@ -234,9 +221,31 @@ static void backup_abort(BlockJob *job)
>> static void backup_clean(BlockJob *job)
>> {
>> BackupBlockJob *s = container_of(job, BackupBlockJob, common);
>> + BlockDriverState *bs = child_bs(s->filter),
>> + *filter = s->filter;
>
>Please do not put multiple variable declarations with initializers in a
>single statement. It's easier to read like this:
>
> BlockDriverState *filter = s->filter;
> BlockDriverState *bs = child_bs(filter);
>
>> +
>> assert(s->target);
>> blk_unref(s->target);
>> s->target = NULL;
>> +
>> + /* make sure nothing goes away while removing filter */
>> + bdrv_ref(filter);
>> + bdrv_ref(bs);
>> + bdrv_drained_begin(bs);
>> +
>> + block_job_remove_all_bdrv(job);
>> + bdrv_child_try_set_perm(filter->file, 0, BLK_PERM_ALL,
>> + &error_abort);
>> + bdrv_replace_node(filter, bs, &error_abort);
>> +
>> + blk_remove_bs(job->blk);
>> + blk_set_perm(job->blk, 0, BLK_PERM_ALL, &error_abort);
>> + blk_insert_bs(job->blk, filter, &error_abort);
>> +
>> + bdrv_drained_end(bs);
>> + bdrv_unref(filter);
>> + bdrv_unref(bs);
>> + s->filter = NULL;
>> }
>>
>> static void backup_attached_aio_context(BlockJob *job, AioContext *aio_context)
>> @@ -421,6 +430,18 @@ out:
>> return ret;
>> }
>>
>> +static void backup_top_enable(BackupBlockJob *job)
>> +{
>> + BlockDriverState *bs = job->filter;
>> + bs->opaque = job;
>> +}
>> +
>> +static void backup_top_disable(BackupBlockJob *job)
>> +{
>> + BlockDriverState *bs = job->filter;
>> + bs->opaque = NULL;
>> +}
>
>bs->opaque is a pointer that is managed by block.c. Drivers are not
>allowed to modify this pointer. You declared BlockDriver.instance_size
>= sizeof(BackupBlockJob *) but didn't use it.
Doh, thanks.
>
>I guess this should be:
>
> static void backup_top_enable(BackupBlockJob *job)
> {
> BackupBlockJob **jobp = job->bs->opaque;
> *jobp = job;
> }
>
> static void backup_top_disable(BackupBlockJob *job)
> {
> BackupBlockJob **jobp = job->bs->opaque;
> *jobp = NULL;
> }
>
> ...
>
> static int coroutine_fn backup_top_co_pwritev(BlockDriverState *bs,
> uint64_t offset,
> uint64_t bytes,
> QEMUIOVector *qiov, int flags)
> {
> int ret = 0;
> BackupBlockJob *job = *(BackupBlockJob **)bs->opaque;
> if (job) {
>
>Alternatively filter->job can be used but that may be a legacy field
>that will be removed eventually.
>
>> +
>> static void coroutine_fn backup_run(void *opaque)
>> {
>> BackupBlockJob *job = opaque;
>> @@ -435,13 +456,12 @@ static void coroutine_fn backup_run(void *opaque)
>> job->done_bitmap = bitmap_new(DIV_ROUND_UP(job->common.len,
>> job->cluster_size));
>>
>> - job->before_write.notify = backup_before_write_notify;
>> - bdrv_add_before_write_notifier(bs, &job->before_write);
>> + backup_top_enable(job);
>>
>> if (job->sync_mode == MIRROR_SYNC_MODE_NONE) {
>> while (!block_job_is_cancelled(&job->common)) {
>> - /* Yield until the job is cancelled. We just let our before_write
>> - * notify callback service CoW requests. */
>> + /* Yield until the job is cancelled. We just let our backup_top
>> + * filter service CoW requests. */
>> block_job_yield(&job->common);
>> }
>> } else if (job->sync_mode == MIRROR_SYNC_MODE_INCREMENTAL) {
>> @@ -507,8 +527,7 @@ static void coroutine_fn backup_run(void *opaque)
>> }
>> }
>> }
>> -
>> - notifier_with_return_remove(&job->before_write);
>> + backup_top_disable(job);
>>
>> /* wait until pending backup_do_cow() calls have completed */
>> qemu_co_rwlock_wrlock(&job->flush_rwlock);
>> @@ -532,6 +551,124 @@ static const BlockJobDriver backup_job_driver = {
>> .drain = backup_drain,
>> };
>>
>> +static int coroutine_fn backup_top_co_preadv(BlockDriverState *bs,
>> + uint64_t offset,
>> + uint64_t bytes,
>> + QEMUIOVector *qiov, int flags)
>> +{
>> + return bdrv_co_preadv(bs->file, offset, bytes, qiov, flags);
>> +}
>> +
>> +static int coroutine_fn backup_top_co_pwritev(BlockDriverState *bs,
>> + uint64_t offset,
>> + uint64_t bytes,
>> + QEMUIOVector *qiov, int flags)
>> +{
>> + int ret = 0;
>> + BackupBlockJob *job = bs->opaque;
>> + if (job) {
>> + assert(bs == blk_bs(job->common.blk));
>> + assert(QEMU_IS_ALIGNED(offset, BDRV_SECTOR_SIZE));
>> + assert(QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE));
>> + ret = backup_do_cow(job, offset, bytes, NULL, true);
>> + }
>> +
>> + return ret < 0 ? ret : bdrv_co_pwritev(bs->file, offset, bytes,
>> + qiov, flags);
>> +}
>> +
>> +static int coroutine_fn backup_top_co_pwrite_zeroes(BlockDriverState *bs,
>> + int64_t offset,
>> + int bytes,
>> + BdrvRequestFlags flags)
>> +{
>> + int ret = 0;
>> + BackupBlockJob *job = bs->opaque;
>> + if (job) {
>> + assert(bs == blk_bs(job->common.blk));
>> + assert(QEMU_IS_ALIGNED(offset, BDRV_SECTOR_SIZE));
>> + assert(QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE));
>> + ret = backup_do_cow(job, offset, bytes, NULL, true);
>> + }
>> +
>> + return ret < 0 ? ret : bdrv_co_pwrite_zeroes(bs->file, offset, bytes,
>> + flags);
>> +}
>> +
>> +static int coroutine_fn backup_top_co_pdiscard(BlockDriverState *bs,
>> + int64_t offset, int bytes)
>> +{
>> + int ret = 0;
>> + BackupBlockJob *job = bs->opaque;
>> + if (job) {
>> + assert(bs == blk_bs(job->common.blk));
>> + assert(QEMU_IS_ALIGNED(offset, BDRV_SECTOR_SIZE));
>> + assert(QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE));
>> + ret = backup_do_cow(job, offset, bytes, NULL, true);
>> + }
>> +
>> + return ret < 0 ? ret : bdrv_co_pdiscard(bs->file->bs, offset, bytes);
>> +}
>> +
>> +static int backup_top_co_flush(BlockDriverState *bs)
>> +{
>> + return bdrv_co_flush(bs->file->bs);
>> +}
>> +
>> +static int backup_top_open(BlockDriverState *bs, QDict *options,
>> + int flags, Error **errp)
>> +{
>> + return 0;
>> +}
>> +
>> +static void backup_top_close(BlockDriverState *bs)
>> +{
>> +}
>> +
>> +static int64_t backup_top_getlength(BlockDriverState *bs)
>> +{
>> + return bs->file ? bdrv_getlength(bs->file->bs) : 0;
>> +}
>> +
>> +static bool backup_recurse_is_first_non_filter(BlockDriverState *bs,
>> + BlockDriverState *candidate)
>> +{
>> + return bdrv_recurse_is_first_non_filter(bs->file->bs, candidate);
>> +}
>> +
>> +static int64_t coroutine_fn backup_co_get_block_status(BlockDriverState *bs,
>> + int64_t sector_num,
>> + int nb_sectors,
>> + int *pnum,
>> + BlockDriverState **file)
>> +{
>> + assert(bs->file && bs->file->bs);
>> + *pnum = nb_sectors;
>> + *file = bs->file->bs;
>> + return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID |
>> + (sector_num << BDRV_SECTOR_BITS);
>> +}
>> +static BlockDriver backup_top = {
>> + .format_name = "backup-top",
>> + .instance_size = sizeof(BackupBlockJob *),
>> +
>> + .bdrv_open = backup_top_open,
>
>.bdrv_open may be NULL. There's no need to define this function.
>
>> + .bdrv_close = backup_top_close,
>> +
>> + .bdrv_co_flush = backup_top_co_flush,
>> + .bdrv_co_preadv = backup_top_co_preadv,
>> + .bdrv_co_pwritev = backup_top_co_pwritev,
>> + .bdrv_co_pwrite_zeroes = backup_top_co_pwrite_zeroes,
>> + .bdrv_co_pdiscard = backup_top_co_pdiscard,
>> +
>> + .bdrv_getlength = backup_top_getlength,
>> + .bdrv_child_perm = bdrv_filter_default_perms,
>> + .bdrv_recurse_is_first_non_filter = backup_recurse_is_first_non_filter,
>
>I think this is dead code since .is_filter = true. It will not be
>called.
bdrv_recurse_is_first_non_filter() is only called if is_filter is true
and the driver implements it to allow recursion to its children.
>
>> + .bdrv_co_get_block_status = backup_co_get_block_status,
>> +
>> + .is_filter = true,
>> +};
>> +
>> BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
>> BlockDriverState *target, int64_t speed,
>> MirrorSyncMode sync_mode, BdrvDirtyBitmap *sync_bitmap,
>> @@ -545,6 +682,7 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
>> int64_t len;
>> BlockDriverInfo bdi;
>> BackupBlockJob *job = NULL;
>> + BlockDriverState *filter = NULL;
>> int ret;
>>
>> assert(bs);
>> @@ -606,17 +744,33 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
>> bdrv_get_device_name(bs));
>> goto error;
>> }
>> + /* Setup before write filter */
>> + filter =
>> + bdrv_new_open_driver(&backup_top,
>> + NULL, bdrv_get_flags(bs), NULL, &error_abort);
>> + filter->implicit = true;
>> + filter->total_sectors = bs->total_sectors;
>> + bdrv_set_aio_context(filter, bdrv_get_aio_context(bs));
>> +
>> + /* Insert before write notifier in the BDS chain */
>> + bdrv_ref(filter);
>> + bdrv_drained_begin(bs);
>> + bdrv_append_file(filter, bs, &error_abort);
>> + bdrv_drained_end(bs);
>>
>> /* job->common.len is fixed, so we can't allow resize */
>> - job = block_job_create(job_id, &backup_job_driver, bs,
>> + job = block_job_create(job_id, &backup_job_driver, filter,
>> BLK_PERM_CONSISTENT_READ,
>> BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE |
>> BLK_PERM_WRITE_UNCHANGED | BLK_PERM_GRAPH_MOD,
>> speed, creation_flags, cb, opaque, errp);
>> + bdrv_unref(filter);
>> if (!job) {
>> goto error;
>> }
>>
>> + job->filter = filter;
>
>Is it okay to use filter here after bdrv_unref()?
Yes, this is only to make sure it won't get freed if bs doesn't have a
parent. It is a ref/unref pair around bdrv_append_file and
block_job_create, and filter will have refcnt >= 1 after the unref. I
will add a comment for clarification.
Thanks for the comments!
>
>> +
>> /* The target must match the source in size, so no resize here either */
>> job->target = blk_new(BLK_PERM_WRITE,
>> BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE |
>> @@ -675,6 +829,13 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
>> if (job) {
>> backup_clean(&job->common);
>> block_job_early_fail(&job->common);
>> + } else {
>> + /* don't leak filter if job creation failed */
>> + if (filter) {
>> + bdrv_child_try_set_perm(filter->file, 0, BLK_PERM_ALL,
>> + &error_abort);
>> + bdrv_replace_node(filter, bs, &error_abort);
>> + }
>> }
>>
>> return NULL;
>> diff --git a/block/mirror.c b/block/mirror.c
>> index e1a160e6ea..3044472fd4 100644
>> --- a/block/mirror.c
>> +++ b/block/mirror.c
>> @@ -1174,11 +1174,12 @@ static void mirror_start_job(const char *job_id, BlockDriverState *bs,
>> mirror_top_bs->total_sectors = bs->total_sectors;
>> bdrv_set_aio_context(mirror_top_bs, bdrv_get_aio_context(bs));
>>
>> - /* bdrv_append takes ownership of the mirror_top_bs reference, need to keep
>> - * it alive until block_job_create() succeeds even if bs has no parent. */
>> + /* bdrv_append_backing() takes ownership of the mirror_top_bs reference,
>> + * need to keep it alive until block_job_create() succeeds even if bs has
>> + * no parent. */
>> bdrv_ref(mirror_top_bs);
>> bdrv_drained_begin(bs);
>> - bdrv_append(mirror_top_bs, bs, &local_err);
>> + bdrv_append_backing(mirror_top_bs, bs, &local_err);
>> bdrv_drained_end(bs);
>>
>> if (local_err) {
>> diff --git a/blockdev.c b/blockdev.c
>> index 5c11c245b0..8e2fc6e64c 100644
>> --- a/blockdev.c
>> +++ b/blockdev.c
>> @@ -1788,7 +1788,7 @@ static void external_snapshot_prepare(BlkActionState *common,
>> * can fail, so we need to do it in .prepare; undoing it for abort is
>> * always possible. */
>> bdrv_ref(state->new_bs);
>> - bdrv_append(state->new_bs, state->old_bs, &local_err);
>> + bdrv_append_backing(state->new_bs, state->old_bs, &local_err);
>> if (local_err) {
>> error_propagate(errp, local_err);
>> return;
>> diff --git a/include/block/block.h b/include/block/block.h
>> index d1f03cb48b..744b50e734 100644
>> --- a/include/block/block.h
>> +++ b/include/block/block.h
>> @@ -244,8 +244,10 @@ int bdrv_create(BlockDriver *drv, const char* filename,
>> QemuOpts *opts, Error **errp);
>> int bdrv_create_file(const char *filename, QemuOpts *opts, Error **errp);
>> BlockDriverState *bdrv_new(void);
>> -void bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top,
>> - Error **errp);
>> +void bdrv_append_file(BlockDriverState *bs_new, BlockDriverState *bs_top,
>> + Error **errp);
>> +void bdrv_append_backing(BlockDriverState *bs_new, BlockDriverState *bs_top,
>> + Error **errp);
>> void bdrv_replace_node(BlockDriverState *from, BlockDriverState *to,
>> Error **errp);
>>
>> @@ -256,6 +258,8 @@ BdrvChild *bdrv_open_child(const char *filename,
>> BlockDriverState* parent,
>> const BdrvChildRole *child_role,
>> bool allow_none, Error **errp);
>> +void bdrv_set_file(BlockDriverState *bs, BlockDriverState *file_bs,
>> + Error **errp);
>> void bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd,
>> Error **errp);
>> int bdrv_open_backing_file(BlockDriverState *bs, QDict *parent_options,
>> diff --git a/tests/qemu-iotests/141.out b/tests/qemu-iotests/141.out
>> index 82e763b68d..cc653c317a 100644
>> --- a/tests/qemu-iotests/141.out
>> +++ b/tests/qemu-iotests/141.out
>> @@ -9,7 +9,7 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576 backing_file=TEST_DIR/m.
>> {"return": {}}
>> Formatting 'TEST_DIR/o.IMGFMT', fmt=IMGFMT size=1048576 backing_file=TEST_DIR/t.IMGFMT backing_fmt=IMGFMT
>> {"return": {}}
>> -{"error": {"class": "GenericError", "desc": "Node drv0 is in use"}}
>> +{"error": {"class": "GenericError", "desc": "Block device drv0 is in use"}}
>> {"return": {}}
>> {"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "job0", "len": 1048576, "offset": 0, "speed": 0, "type": "backup"}}
>> {"return": {}}
>> --
>> 2.11.0
>>
>>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Qemu-devel] [Qemu-block] [PATCH v2 1/2] block: use internal filter node in backup
2017-08-16 20:11 ` Manos Pitsidianakis
@ 2017-08-17 14:06 ` Stefan Hajnoczi
0 siblings, 0 replies; 8+ messages in thread
From: Stefan Hajnoczi @ 2017-08-17 14:06 UTC (permalink / raw)
To: Manos Pitsidianakis, Stefan Hajnoczi, qemu-devel, Kevin Wolf,
qemu-block, Alberto Garcia
[-- Attachment #1: Type: text/plain, Size: 2752 bytes --]
On Wed, Aug 16, 2017 at 11:11:44PM +0300, Manos Pitsidianakis wrote:
> On Wed, Aug 16, 2017 at 02:25:44PM +0100, Stefan Hajnoczi wrote:
> > On Tue, Aug 15, 2017 at 11:18:53AM +0300, Manos Pitsidianakis wrote:
> > > @@ -3142,7 +3174,7 @@ static bool should_update_child(BdrvChild *c, BlockDriverState *to)
> > > return false;
> > > }
> > >
> > > - if (c->role == &child_backing) {
> > > + if (c->role == &child_backing || c->role == &child_file) {
> > > /* If @from is a backing file of @to, ignore the child to avoid
> > > * creating a loop. We only want to change the pointer of other
> > > * parents. */
> >
> > This may have unwanted side-effects. I think you're using is so that
> > bdrv_set_file() + bdrv_replace_node() does not create a loop in the
> > graph. That is okay if there is only one parent with child_file. If
> > there are multiple parents with that role then it's not clear to me that
> > they should all be skipped.
>
> I am afraid I don't understand what you're saying. What is the difference
> with the child_backing scenario here? In both cases we should update all
> from->parents children unless they also happen to be a child of `to`. If
> there are multiple parents with child_file, they are not skipped except for
> the ones where `to` is the parent.
Okay, I'll have to look at this again. Thanks!
> > > +static BlockDriver backup_top = {
> > > + .format_name = "backup-top",
> > > + .instance_size = sizeof(BackupBlockJob *),
> > > +
> > > + .bdrv_open = backup_top_open,
> >
> > .bdrv_open may be NULL. There's no need to define this function.
> >
> > > + .bdrv_close = backup_top_close,
> > > +
> > > + .bdrv_co_flush = backup_top_co_flush,
> > > + .bdrv_co_preadv = backup_top_co_preadv,
> > > + .bdrv_co_pwritev = backup_top_co_pwritev,
> > > + .bdrv_co_pwrite_zeroes = backup_top_co_pwrite_zeroes,
> > > + .bdrv_co_pdiscard = backup_top_co_pdiscard,
> > > +
> > > + .bdrv_getlength = backup_top_getlength,
> > > + .bdrv_child_perm = bdrv_filter_default_perms,
> > > + .bdrv_recurse_is_first_non_filter = backup_recurse_is_first_non_filter,
> >
> > I think this is dead code since .is_filter = true. It will not be
> > called.
>
> bdrv_recurse_is_first_non_filter() is only called if is_filter is true and
> the driver implements it to allow recursion to its children.
You are right, I misread the code.
Stefan
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Qemu-devel] [PATCH v2 1/2] block: use internal filter node in backup
2017-08-15 8:18 ` [Qemu-devel] [PATCH v2 1/2] block: use internal filter node in backup Manos Pitsidianakis
2017-08-16 13:25 ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
@ 2017-10-02 12:45 ` Kevin Wolf
1 sibling, 0 replies; 8+ messages in thread
From: Kevin Wolf @ 2017-10-02 12:45 UTC (permalink / raw)
To: Manos Pitsidianakis
Cc: qemu-devel, qemu-block, Alberto Garcia, Stefan Hajnoczi
Am 15.08.2017 um 10:18 hat Manos Pitsidianakis geschrieben:
> block/backup.c currently uses before write notifiers on the targeted
> node. We can create a filter node instead to intercept write requests
> for the backup job on the BDS level, instead of the BlockBackend level.
>
> This is part of deprecating before write notifiers, which are hard coded
> into the block layer. Block filter drivers are inserted into the graph
> only when a feature is needed. This makes the block layer more modular
> and reuses the block driver abstraction that is already present.
>
> Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr>
You make a lot of changes to add bdrv_set_file()/bdrv_append_file()
(which should be separate patches, by the way), but if we look at the
only actual user of the functions:
> @@ -606,17 +744,33 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
> bdrv_get_device_name(bs));
> goto error;
> }
> + /* Setup before write filter */
> + filter =
> + bdrv_new_open_driver(&backup_top,
> + NULL, bdrv_get_flags(bs), NULL, &error_abort);
> + filter->implicit = true;
> + filter->total_sectors = bs->total_sectors;
> + bdrv_set_aio_context(filter, bdrv_get_aio_context(bs));
> +
> + /* Insert before write notifier in the BDS chain */
> + bdrv_ref(filter);
> + bdrv_drained_begin(bs);
> + bdrv_append_file(filter, bs, &error_abort);
> + bdrv_drained_end(bs);
Is there a reason why you can't simply do:
filter->file = bdrv_attach_child(filter, bs, "file", &child_file, &local_err);
bdrv_replace_node(filter->file, filter, &error_abort);
Effectively these two lines are all that your new functions boil down to
because only this one special case is ever used.
Did you consider that using filter->file for the backup filter rather
than filter->backing like in mirror and commit mean that the backing
file chain is broken? Are we sure that all functions that operate on the
backing file chain can already skip filters?
For example, bdrv_co_get_block_status_above() doesn't seem to do the
right thing yet, it would consider the filter as the base file where the
chain ends.
Attaching as filter->backing feels a bit safer to me.
> @@ -532,6 +551,124 @@ static const BlockJobDriver backup_job_driver = {
> .drain = backup_drain,
> };
>
> +static int coroutine_fn backup_top_co_preadv(BlockDriverState *bs,
> + uint64_t offset,
> + uint64_t bytes,
> + QEMUIOVector *qiov, int flags)
> +{
> + return bdrv_co_preadv(bs->file, offset, bytes, qiov, flags);
> +}
> +
> +static int coroutine_fn backup_top_co_pwritev(BlockDriverState *bs,
> + uint64_t offset,
> + uint64_t bytes,
> + QEMUIOVector *qiov, int flags)
> +{
> + int ret = 0;
> + BackupBlockJob *job = bs->opaque;
> + if (job) {
> + assert(bs == blk_bs(job->common.blk));
> + assert(QEMU_IS_ALIGNED(offset, BDRV_SECTOR_SIZE));
> + assert(QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE));
> + ret = backup_do_cow(job, offset, bytes, NULL, true);
> + }
Is it worth making this a static inline function instead of having three
copies of the same code?
> + return ret < 0 ? ret : bdrv_co_pwritev(bs->file, offset, bytes,
> + qiov, flags);
> +}
> +
> +static int coroutine_fn backup_top_co_pwrite_zeroes(BlockDriverState *bs,
> + int64_t offset,
> + int bytes,
> + BdrvRequestFlags flags)
> +{
> + int ret = 0;
> + BackupBlockJob *job = bs->opaque;
> + if (job) {
> + assert(bs == blk_bs(job->common.blk));
> + assert(QEMU_IS_ALIGNED(offset, BDRV_SECTOR_SIZE));
> + assert(QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE));
> + ret = backup_do_cow(job, offset, bytes, NULL, true);
> + }
> +
> + return ret < 0 ? ret : bdrv_co_pwrite_zeroes(bs->file, offset, bytes,
> + flags);
> +}
> +
> +static int coroutine_fn backup_top_co_pdiscard(BlockDriverState *bs,
> + int64_t offset, int bytes)
> +{
> + int ret = 0;
> + BackupBlockJob *job = bs->opaque;
> + if (job) {
> + assert(bs == blk_bs(job->common.blk));
> + assert(QEMU_IS_ALIGNED(offset, BDRV_SECTOR_SIZE));
> + assert(QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE));
> + ret = backup_do_cow(job, offset, bytes, NULL, true);
> + }
> +
> + return ret < 0 ? ret : bdrv_co_pdiscard(bs->file->bs, offset, bytes);
> +}
> +
> +static int backup_top_co_flush(BlockDriverState *bs)
> +{
> + return bdrv_co_flush(bs->file->bs);
> +}
> +
> +static int backup_top_open(BlockDriverState *bs, QDict *options,
> + int flags, Error **errp)
> +{
> + return 0;
> +}
> +
> +static void backup_top_close(BlockDriverState *bs)
> +{
> +}
> +
> +static int64_t backup_top_getlength(BlockDriverState *bs)
> +{
> + return bs->file ? bdrv_getlength(bs->file->bs) : 0;
> +}
> +
> +static bool backup_recurse_is_first_non_filter(BlockDriverState *bs,
> + BlockDriverState *candidate)
> +{
> + return bdrv_recurse_is_first_non_filter(bs->file->bs, candidate);
> +}
> +
> +static int64_t coroutine_fn backup_co_get_block_status(BlockDriverState *bs,
> + int64_t sector_num,
> + int nb_sectors,
> + int *pnum,
> + BlockDriverState **file)
> +{
> + assert(bs->file && bs->file->bs);
> + *pnum = nb_sectors;
> + *file = bs->file->bs;
> + return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID |
> + (sector_num << BDRV_SECTOR_BITS);
> +}
This is just a duplicate of bdrv_co_get_block_status_from_file().
Kevin
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Qemu-devel] [PATCH v2 2/2] block: add filter driver to block/write-threshold.c
2017-08-15 8:18 ` [Qemu-devel] [PATCH v2 2/2] block: add filter driver to block/write-threshold.c Manos Pitsidianakis
@ 2017-10-02 12:52 ` Kevin Wolf
0 siblings, 0 replies; 8+ messages in thread
From: Kevin Wolf @ 2017-10-02 12:52 UTC (permalink / raw)
To: Manos Pitsidianakis
Cc: qemu-devel, qemu-block, Alberto Garcia, Stefan Hajnoczi
Am 15.08.2017 um 10:18 hat Manos Pitsidianakis geschrieben:
> With runtime insertion and removal of filters, write-threshold.c can
> provide more flexible deliveries of BLOCK_WRITE_THRESHOLD events. After
> the event trigger, the filter nodes are no longer useful and must be
> removed.
> The existing write-threshold cannot be easily converted to using the
> filter driver, so it is not affected.
>
> This is part of deprecating before write notifiers, which are hard coded
> into the block layer. Block filter drivers are inserted into the graph
> only when a feature is needed. This makes the block layer more modular
> and reuses the block driver abstraction that is already present.
>
> Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr>
> ---
> block/qapi.c | 2 +-
> block/write-threshold.c | 264 +++++++++++++++++++++++++++++++++++-----
> include/block/write-threshold.h | 22 ++--
> qapi/block-core.json | 19 ++-
> tests/test-write-threshold.c | 40 +++---
> 5 files changed, 281 insertions(+), 66 deletions(-)
>
> diff --git a/block/qapi.c b/block/qapi.c
> index 2be44a6758..fe6cf2eae5 100644
> --- a/block/qapi.c
> +++ b/block/qapi.c
> @@ -122,7 +122,7 @@ BlockDeviceInfo *bdrv_block_device_info(BlockBackend *blk,
> info->group = g_strdup(throttle_group_get_name(tgm));
> }
>
> - info->write_threshold = bdrv_write_threshold_get(bs);
> + info->write_threshold = bdrv_write_threshold_get_legacy(bs);
>
> bs0 = bs;
> p_image_info = &info->image;
> diff --git a/block/write-threshold.c b/block/write-threshold.c
> index 0bd1a01c86..4a67188ea3 100644
> --- a/block/write-threshold.c
> +++ b/block/write-threshold.c
> @@ -2,9 +2,11 @@
> * QEMU System Emulator block write threshold notification
> *
> * Copyright Red Hat, Inc. 2014
> + * Copyright 2017 Manos Pitsidianakis
> *
> * Authors:
> * Francesco Romani <fromani@redhat.com>
> + * Manos Pitsidianakis <el13635@mail.ntua.gr>
> *
> * This work is licensed under the terms of the GNU LGPL, version 2 or later.
> * See the COPYING.LIB file in the top-level directory.
> @@ -19,46 +21,35 @@
> #include "qmp-commands.h"
>
>
> -uint64_t bdrv_write_threshold_get(const BlockDriverState *bs)
> +uint64_t bdrv_write_threshold_get_legacy(const BlockDriverState *bs)
> {
> return bs->write_threshold_offset;
> }
>
> -bool bdrv_write_threshold_is_set(const BlockDriverState *bs)
> +bool bdrv_write_threshold_is_set_legacy(const BlockDriverState *bs)
> {
> return bs->write_threshold_offset > 0;
> }
>
> -static void write_threshold_disable(BlockDriverState *bs)
> +static void write_threshold_disable_legacy(BlockDriverState *bs)
> {
> - if (bdrv_write_threshold_is_set(bs)) {
> + if (bdrv_write_threshold_is_set_legacy(bs)) {
> notifier_with_return_remove(&bs->write_threshold_notifier);
> bs->write_threshold_offset = 0;
> }
> }
>
> -uint64_t bdrv_write_threshold_exceeded(const BlockDriverState *bs,
> - const BdrvTrackedRequest *req)
> -{
> - if (bdrv_write_threshold_is_set(bs)) {
> - if (req->offset > bs->write_threshold_offset) {
> - return (req->offset - bs->write_threshold_offset) + req->bytes;
> - }
> - if ((req->offset + req->bytes) > bs->write_threshold_offset) {
> - return (req->offset + req->bytes) - bs->write_threshold_offset;
> - }
> - }
> - return 0;
> -}
> -
> static int coroutine_fn before_write_notify(NotifierWithReturn *notifier,
> void *opaque)
> {
> BdrvTrackedRequest *req = opaque;
> BlockDriverState *bs = req->bs;
> uint64_t amount = 0;
> + uint64_t threshold = bdrv_write_threshold_get_legacy(bs);
> + uint64_t offset = req->offset;
> + uint64_t bytes = req->bytes;
>
> - amount = bdrv_write_threshold_exceeded(bs, req);
> + amount = bdrv_write_threshold_exceeded(threshold, offset, bytes);
> if (amount > 0) {
> qapi_event_send_block_write_threshold(
> bs->node_name,
> @@ -67,7 +58,7 @@ static int coroutine_fn before_write_notify(NotifierWithReturn *notifier,
> &error_abort);
>
> /* autodisable to avoid flooding the monitor */
> - write_threshold_disable(bs);
> + write_threshold_disable_legacy(bs);
> }
>
> return 0; /* should always let other notifiers run */
> @@ -79,25 +70,26 @@ static void write_threshold_register_notifier(BlockDriverState *bs)
> bdrv_add_before_write_notifier(bs, &bs->write_threshold_notifier);
> }
>
> -static void write_threshold_update(BlockDriverState *bs,
> - int64_t threshold_bytes)
> +static void write_threshold_update_legacy(BlockDriverState *bs,
> + int64_t threshold_bytes)
> {
> bs->write_threshold_offset = threshold_bytes;
> }
>
> -void bdrv_write_threshold_set(BlockDriverState *bs, uint64_t threshold_bytes)
> +void bdrv_write_threshold_set_legacy(BlockDriverState *bs,
> + uint64_t threshold_bytes)
> {
> - if (bdrv_write_threshold_is_set(bs)) {
> + if (bdrv_write_threshold_is_set_legacy(bs)) {
> if (threshold_bytes > 0) {
> - write_threshold_update(bs, threshold_bytes);
> + write_threshold_update_legacy(bs, threshold_bytes);
> } else {
> - write_threshold_disable(bs);
> + write_threshold_disable_legacy(bs);
> }
> } else {
> if (threshold_bytes > 0) {
> /* avoid multiple registration */
> write_threshold_register_notifier(bs);
> - write_threshold_update(bs, threshold_bytes);
> + write_threshold_update_legacy(bs, threshold_bytes);
> }
> /* discard bogus disable request */
> }
> @@ -119,7 +111,223 @@ void qmp_block_set_write_threshold(const char *node_name,
> aio_context = bdrv_get_aio_context(bs);
> aio_context_acquire(aio_context);
>
> - bdrv_write_threshold_set(bs, threshold_bytes);
> + bdrv_write_threshold_set_legacy(bs, threshold_bytes);
>
> aio_context_release(aio_context);
> }
> +
> +
> +/* The write-threshold filter drivers delivers a one-time BLOCK_WRITE_THRESHOLD
> + * event when a passing write request exceeds the configured write threshold
> + * offset of the filter.
> + *
> + * This is useful to transparently resize thin-provisioned drives without
> + * the guest OS noticing.
> + */
> +
> +#define QEMU_OPT_WRITE_THRESHOLD "write-threshold"
> +static BlockDriver write_threshold;
This forward declaration is unnecessary, the variable is never used
before the actual definition.
> +static QemuOptsList write_threshold_opts = {
> + .name = "write-threshold",
> + .head = QTAILQ_HEAD_INITIALIZER(write_threshold_opts.head),
> + .desc = {
> + {
> + .name = QEMU_OPT_WRITE_THRESHOLD,
> + .type = QEMU_OPT_NUMBER,
> + .help = "configured threshold for the block device, bytes. Use 0"
> + "to disable the threshold",
> + },
> + { /* end of list */ }
> + },
> +};
> +
> +static bool bdrv_write_threshold_is_set(const BlockDriverState *bs)
> +{
> + uint64_t threshold = *(uint64_t *)bs->opaque;
I think I would prefer using a struct for bs->opaque, even if it only
contains a single uint64_t for now.
> + return threshold > 0;
> +}
> +static int64_t coroutine_fn write_threshold_co_get_block_status(
> + BlockDriverState *bs,
> + int64_t sector_num,
> + int nb_sectors,
> + int *pnum,
> + BlockDriverState **file)
> +{
> + assert(child_bs(bs));
> + *pnum = nb_sectors;
> + *file = child_bs(bs);
> + return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID |
> + (sector_num << BDRV_SECTOR_BITS);
> +}
This is bdrv_co_get_block_status_from_file() again.
Kevin
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2017-10-02 12:53 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-08-15 8:18 [Qemu-devel] [PATCH v2 0/2] add internal backup job and write-threshold filter drivers Manos Pitsidianakis
2017-08-15 8:18 ` [Qemu-devel] [PATCH v2 1/2] block: use internal filter node in backup Manos Pitsidianakis
2017-08-16 13:25 ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2017-08-16 20:11 ` Manos Pitsidianakis
2017-08-17 14:06 ` Stefan Hajnoczi
2017-10-02 12:45 ` [Qemu-devel] " Kevin Wolf
2017-08-15 8:18 ` [Qemu-devel] [PATCH v2 2/2] block: add filter driver to block/write-threshold.c Manos Pitsidianakis
2017-10-02 12:52 ` Kevin Wolf
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).