qemu-devel.nongnu.org archive mirror
* [Qemu-devel] [PATCH 00/11] block: allow blockjobs to coexist with dataplane
@ 2014-10-01 17:01 Stefan Hajnoczi
  2014-10-01 17:01 ` [Qemu-devel] [PATCH 01/11] block: acquire AioContext in generic blockjob QMP commands Stefan Hajnoczi
                   ` (10 more replies)
  0 siblings, 11 replies; 28+ messages in thread
From: Stefan Hajnoczi @ 2014-10-01 17:01 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Paolo Bonzini, Fam Zheng, Stefan Hajnoczi, Max Reitz

Almost all the infrastructure is in place to make blockjobs safe for use with
dataplane:

  * Op blockers allow us to exclude commands that could conflict with a blockjob
    or dataplane.

  * AioContext acquire/release allows threads to temporarily access a
    BlockDriverState that is running in another thread.

This series introduces a few additional helpers:

  * block_job_defer_to_main_loop(), which allows blockjobs to run their
    completion code in the QEMU main loop.  This is necessary because some
    operations are not safe outside the QEMU global mutex.

  * bdrv_drain(), which can be used in limited cases to wait for in-flight
    requests on a single BlockDriverState to complete (as opposed to the
    global bdrv_drain_all() function).

The approach taken in this series is to convert the blockdev.c monitor
commands so they acquire the BlockDriverState's AioContext.  Normally only
one AioContext is involved at a time, but the mirror job's to_replace
argument can involve a second AioContext.
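
The acquire/release discipline used by the converted monitor commands can be
sketched as a toy model.  The types and functions below are hypothetical
stand-ins, not QEMU's real AioContext API (which is backed by a recursive
mutex and may guard a BDS running in an IOThread); the point is only that
every access to bs->job is bracketed by acquire/release:

```c
#include <assert.h>

/* Stand-ins for AioContext and BlockDriverState (not QEMU code). */
typedef struct { int lock_depth; } AioCtx;
typedef struct { AioCtx *ctx; int job_running; } BDS;

static void aio_context_acquire(AioCtx *c) { c->lock_depth++; }
static void aio_context_release(AioCtx *c)
{
    assert(c->lock_depth > 0);
    c->lock_depth--;
}

/* Pattern used throughout the series: look up, acquire, act, release. */
static int cancel_job(BDS *bs)
{
    aio_context_acquire(bs->ctx);
    int had_job = bs->job_running;
    bs->job_running = 0;          /* stands in for block_job_cancel(bs->job) */
    aio_context_release(bs->ctx);
    return had_job;
}
```

The lock must be released on every exit path, which is why the real patches
below replace early returns with goto-out cleanup.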

Then the block job code itself is converted to defer its completion code to
the main loop using block_job_defer_to_main_loop().

Example:

  $ qemu-system-x86_64 -enable-kvm -m 1024 \
        -drive if=none,id=drive0,file=test.img \
        -object iothread,id=iothread0 \
        -device virtio-blk-pci,drive=drive0,iothread=iothread0
  (qemu) drive_mirror drive0 test2.img

Stefan Hajnoczi (11):
  block: acquire AioContext in generic blockjob QMP commands
  blockdev: acquire AioContext in do_qmp_query_block_jobs_one()
  blockdev: acquire AioContext in blockdev_mark_auto_del()
  blockdev: add note that block_job_cb() must be thread-safe
  blockjob: add block_job_defer_to_main_loop()
  block: add bdrv_drain()
  block: let backup blockjob run in BDS AioContext
  block: let stream blockjob run in BDS AioContext
  block: let mirror blockjob run in BDS AioContext
  block: let commit blockjob run in BDS AioContext
  block: declare blockjobs and dataplane friends!

 block.c                         |  49 +++++++++--
 block/backup.c                  |  21 ++++-
 block/commit.c                  |  72 ++++++++++------
 block/mirror.c                  |  85 +++++++++++++------
 block/stream.c                  |  50 +++++++----
 blockdev.c                      | 179 +++++++++++++++++++++++++++++-----------
 blockjob.c                      |  36 ++++++++
 hw/block/dataplane/virtio-blk.c |   5 ++
 include/block/block.h           |   1 +
 include/block/blockjob.h        |  19 +++++
 10 files changed, 394 insertions(+), 123 deletions(-)

-- 
1.9.3

* [Qemu-devel] [PATCH 01/11] block: acquire AioContext in generic blockjob QMP commands
  2014-10-01 17:01 [Qemu-devel] [PATCH 00/11] block: allow blockjobs to coexist with dataplane Stefan Hajnoczi
@ 2014-10-01 17:01 ` Stefan Hajnoczi
  2014-10-01 18:27   ` Max Reitz
  2014-10-01 17:01 ` [Qemu-devel] [PATCH 02/11] blockdev: acquire AioContext in do_qmp_query_block_jobs_one() Stefan Hajnoczi
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 28+ messages in thread
From: Stefan Hajnoczi @ 2014-10-01 17:01 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Paolo Bonzini, Fam Zheng, Stefan Hajnoczi, Max Reitz

block-job-set-speed, block-job-cancel, block-job-pause,
block-job-resume, and block-job-complete must acquire the
BlockDriverState AioContext so that it is safe to access bs.

At the moment bs->job is always NULL when dataplane is active because op
blockers prevent blockjobs from starting.  Once the rest of the blockjob
API has been made aware of AioContext we can drop the op blocker.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 blockdev.c | 52 +++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 39 insertions(+), 13 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index ad43648..379a268 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2335,20 +2335,35 @@ void qmp_drive_mirror(const char *device, const char *target,
     }
 }
 
-static BlockJob *find_block_job(const char *device)
+/* Get the block job for a given device name and acquire its AioContext */
+static BlockJob *find_block_job(const char *device, AioContext **aio_context)
 {
     BlockDriverState *bs;
 
     bs = bdrv_find(device);
-    if (!bs || !bs->job) {
-        return NULL;
+    if (!bs) {
+        goto notfound;
+    }
+
+    *aio_context = bdrv_get_aio_context(bs);
+    aio_context_acquire(*aio_context);
+
+    if (!bs->job) {
+        aio_context_release(*aio_context);
+        goto notfound;
     }
+
     return bs->job;
+
+notfound:
+    *aio_context = NULL;
+    return NULL;
 }
 
 void qmp_block_job_set_speed(const char *device, int64_t speed, Error **errp)
 {
-    BlockJob *job = find_block_job(device);
+    AioContext *aio_context;
+    BlockJob *job = find_block_job(device, &aio_context);
 
     if (!job) {
         error_set(errp, QERR_BLOCK_JOB_NOT_ACTIVE, device);
@@ -2356,34 +2371,40 @@ void qmp_block_job_set_speed(const char *device, int64_t speed, Error **errp)
     }
 
     block_job_set_speed(job, speed, errp);
+    aio_context_release(aio_context);
 }
 
 void qmp_block_job_cancel(const char *device,
                           bool has_force, bool force, Error **errp)
 {
-    BlockJob *job = find_block_job(device);
-
-    if (!has_force) {
-        force = false;
-    }
+    AioContext *aio_context;
+    BlockJob *job = find_block_job(device, &aio_context);
 
     if (!job) {
         error_set(errp, QERR_BLOCK_JOB_NOT_ACTIVE, device);
         return;
     }
+
+    if (!has_force) {
+        force = false;
+    }
+
     if (job->paused && !force) {
         error_setg(errp, "The block job for device '%s' is currently paused",
                    device);
-        return;
+        goto out;
     }
 
     trace_qmp_block_job_cancel(job);
     block_job_cancel(job);
+out:
+    aio_context_release(aio_context);
 }
 
 void qmp_block_job_pause(const char *device, Error **errp)
 {
-    BlockJob *job = find_block_job(device);
+    AioContext *aio_context;
+    BlockJob *job = find_block_job(device, &aio_context);
 
     if (!job) {
         error_set(errp, QERR_BLOCK_JOB_NOT_ACTIVE, device);
@@ -2392,11 +2413,13 @@ void qmp_block_job_pause(const char *device, Error **errp)
 
     trace_qmp_block_job_pause(job);
     block_job_pause(job);
+    aio_context_release(aio_context);
 }
 
 void qmp_block_job_resume(const char *device, Error **errp)
 {
-    BlockJob *job = find_block_job(device);
+    AioContext *aio_context;
+    BlockJob *job = find_block_job(device, &aio_context);
 
     if (!job) {
         error_set(errp, QERR_BLOCK_JOB_NOT_ACTIVE, device);
@@ -2405,11 +2428,13 @@ void qmp_block_job_resume(const char *device, Error **errp)
 
     trace_qmp_block_job_resume(job);
     block_job_resume(job);
+    aio_context_release(aio_context);
 }
 
 void qmp_block_job_complete(const char *device, Error **errp)
 {
-    BlockJob *job = find_block_job(device);
+    AioContext *aio_context;
+    BlockJob *job = find_block_job(device, &aio_context);
 
     if (!job) {
         error_set(errp, QERR_BLOCK_JOB_NOT_ACTIVE, device);
@@ -2418,6 +2443,7 @@ void qmp_block_job_complete(const char *device, Error **errp)
 
     trace_qmp_block_job_complete(job);
     block_job_complete(job, errp);
+    aio_context_release(aio_context);
 }
 
 void qmp_change_backing_file(const char *device,
-- 
1.9.3

* [Qemu-devel] [PATCH 02/11] blockdev: acquire AioContext in do_qmp_query_block_jobs_one()
  2014-10-01 17:01 [Qemu-devel] [PATCH 00/11] block: allow blockjobs to coexist with dataplane Stefan Hajnoczi
  2014-10-01 17:01 ` [Qemu-devel] [PATCH 01/11] block: acquire AioContext in generic blockjob QMP commands Stefan Hajnoczi
@ 2014-10-01 17:01 ` Stefan Hajnoczi
  2014-10-01 18:32   ` Max Reitz
  2014-10-01 17:01 ` [Qemu-devel] [PATCH 03/11] blockdev: acquire AioContext in blockdev_mark_auto_del() Stefan Hajnoczi
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 28+ messages in thread
From: Stefan Hajnoczi @ 2014-10-01 17:01 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Paolo Bonzini, Fam Zheng, Stefan Hajnoczi, Max Reitz

Make sure that query-block-jobs acquires the BlockDriverState
AioContext so that the blockjob isn't running in another thread while we
access its state.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 blockdev.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 379a268..270425d 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2585,14 +2585,19 @@ fail:
 static void do_qmp_query_block_jobs_one(void *opaque, BlockDriverState *bs)
 {
     BlockJobInfoList **prev = opaque;
-    BlockJob *job = bs->job;
+    AioContext *aio_context;
+
+    aio_context = bdrv_get_aio_context(bs);
+    aio_context_acquire(aio_context);
 
-    if (job) {
+    if (bs->job) {
         BlockJobInfoList *elem = g_new0(BlockJobInfoList, 1);
         elem->value = block_job_query(bs->job);
         (*prev)->next = elem;
         *prev = elem;
     }
+
+    aio_context_release(aio_context);
 }
 
 BlockJobInfoList *qmp_query_block_jobs(Error **errp)
-- 
1.9.3

* [Qemu-devel] [PATCH 03/11] blockdev: acquire AioContext in blockdev_mark_auto_del()
  2014-10-01 17:01 [Qemu-devel] [PATCH 00/11] block: allow blockjobs to coexist with dataplane Stefan Hajnoczi
  2014-10-01 17:01 ` [Qemu-devel] [PATCH 01/11] block: acquire AioContext in generic blockjob QMP commands Stefan Hajnoczi
  2014-10-01 17:01 ` [Qemu-devel] [PATCH 02/11] blockdev: acquire AioContext in do_qmp_query_block_jobs_one() Stefan Hajnoczi
@ 2014-10-01 17:01 ` Stefan Hajnoczi
  2014-10-01 18:39   ` Max Reitz
  2014-10-01 17:01 ` [Qemu-devel] [PATCH 04/11] blockdev: add note that block_job_cb() must be thread-safe Stefan Hajnoczi
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 28+ messages in thread
From: Stefan Hajnoczi @ 2014-10-01 17:01 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Paolo Bonzini, Fam Zheng, Stefan Hajnoczi, Max Reitz

When an emulated storage controller is unrealized, it calls
blockdev_mark_auto_del().  This cancels any running block job, which
eventually releases its reference to the BDS so it can be freed.

Since the block job may be executing in another AioContext we must
acquire/release to ensure thread safety.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 blockdev.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/blockdev.c b/blockdev.c
index 270425d..5e38f34 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -89,14 +89,21 @@ static const int if_max_devs[IF_COUNT] = {
 void blockdev_mark_auto_del(BlockDriverState *bs)
 {
     DriveInfo *dinfo = drive_get_by_blockdev(bs);
+    AioContext *aio_context;
 
     if (dinfo && !dinfo->enable_auto_del) {
         return;
     }
 
+    aio_context = bdrv_get_aio_context(bs);
+    aio_context_acquire(aio_context);
+
     if (bs->job) {
         block_job_cancel(bs->job);
     }
+
+    aio_context_release(aio_context);
+
     if (dinfo) {
         dinfo->auto_del = 1;
     }
-- 
1.9.3

* [Qemu-devel] [PATCH 04/11] blockdev: add note that block_job_cb() must be thread-safe
  2014-10-01 17:01 [Qemu-devel] [PATCH 00/11] block: allow blockjobs to coexist with dataplane Stefan Hajnoczi
                   ` (2 preceding siblings ...)
  2014-10-01 17:01 ` [Qemu-devel] [PATCH 03/11] blockdev: acquire AioContext in blockdev_mark_auto_del() Stefan Hajnoczi
@ 2014-10-01 17:01 ` Stefan Hajnoczi
  2014-10-01 18:42   ` Max Reitz
  2014-10-01 17:01 ` [Qemu-devel] [PATCH 05/11] blockjob: add block_job_defer_to_main_loop() Stefan Hajnoczi
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 28+ messages in thread
From: Stefan Hajnoczi @ 2014-10-01 17:01 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Paolo Bonzini, Fam Zheng, Stefan Hajnoczi, Max Reitz

This function is correct, but we should document the constraint that
everything it does must be thread-safe.

Emitting QMP events and scheduling BHs are both thread-safe so nothing
needs to be done here.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 blockdev.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/blockdev.c b/blockdev.c
index 5e38f34..1c79352 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1889,6 +1889,11 @@ out:
 
 static void block_job_cb(void *opaque, int ret)
 {
+    /* Note that this function may be executed from another AioContext besides
+     * the QEMU main loop.  If you need to access anything that assumes the
+     * QEMU global mutex, use a BH or introduce a mutex.
+     */
+
     BlockDriverState *bs = opaque;
     const char *msg = NULL;
 
-- 
1.9.3

* [Qemu-devel] [PATCH 05/11] blockjob: add block_job_defer_to_main_loop()
  2014-10-01 17:01 [Qemu-devel] [PATCH 00/11] block: allow blockjobs to coexist with dataplane Stefan Hajnoczi
                   ` (3 preceding siblings ...)
  2014-10-01 17:01 ` [Qemu-devel] [PATCH 04/11] blockdev: add note that block_job_cb() must be thread-safe Stefan Hajnoczi
@ 2014-10-01 17:01 ` Stefan Hajnoczi
  2014-10-01 19:06   ` Max Reitz
  2014-10-01 17:01 ` [Qemu-devel] [PATCH 06/11] block: add bdrv_drain() Stefan Hajnoczi
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 28+ messages in thread
From: Stefan Hajnoczi @ 2014-10-01 17:01 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Paolo Bonzini, Fam Zheng, Stefan Hajnoczi, Max Reitz

Block jobs will run in the BlockDriverState's AioContext, which may not
always be the QEMU main loop.

There are some block layer APIs that are either not thread-safe or risk
lock ordering problems.  This includes bdrv_unref(), bdrv_close(), and
anything that calls bdrv_drain_all().

The block_job_defer_to_main_loop() API allows a block job to schedule a
function to run in the main loop with the BlockDriverState AioContext
held.

This function will be used to perform cleanup and backing chain
manipulations in block jobs.
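
The deferral mechanism amounts to handing a callback to the main loop instead
of running it in the job's thread.  A toy single-slot model (hypothetical
names, not QEMU's BH API — real QEMU uses qemu_bh_new()/qemu_bh_schedule()
and acquires the AioContext around the callback):

```c
#include <stddef.h>

/* Stand-in for BlockJobDeferToMainLoopFn (not QEMU code). */
typedef void DeferFn(void *opaque);

/* One pending deferred callback; real QEMU queues a BH on the main loop. */
static DeferFn *pending_fn;
static void *pending_opaque;

/* Called from the job (an IOThread in real QEMU): record, don't run. */
static void defer_to_main_loop(DeferFn *fn, void *opaque)
{
    pending_fn = fn;
    pending_opaque = opaque;
}

/* Called by the main loop: run and clear the deferred callback, mimicking
 * block_job_defer_to_main_loop_bh().  Returns 1 if a callback ran. */
static int main_loop_iteration(void)
{
    if (!pending_fn) {
        return 0;
    }
    DeferFn *fn = pending_fn;
    pending_fn = NULL;
    fn(pending_opaque);
    return 1;
}

/* Example completion callback: records the job's return code; in real
 * QEMU this is where bdrv_unref()/bdrv_close() would be safe to call. */
static int completed_ret = -1;
static void job_complete(void *opaque)
{
    completed_ret = *(int *)opaque;
}
```

The callback does not run where it was scheduled; it runs only when the main
loop next iterates, which is exactly the property the unsafe APIs need.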

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 blockjob.c               | 35 +++++++++++++++++++++++++++++++++++
 include/block/blockjob.h | 19 +++++++++++++++++++
 2 files changed, 54 insertions(+)

diff --git a/blockjob.c b/blockjob.c
index 0689fdd..24a64d8 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -313,3 +313,38 @@ BlockErrorAction block_job_error_action(BlockJob *job, BlockDriverState *bs,
     }
     return action;
 }
+
+typedef struct {
+    BlockJob *job;
+    QEMUBH *bh;
+    AioContext *aio_context;
+    BlockJobDeferToMainLoopFn *fn;
+    void *opaque;
+} BlockJobDeferToMainLoopData;
+
+static void block_job_defer_to_main_loop_bh(void *opaque)
+{
+    BlockJobDeferToMainLoopData *data = opaque;
+
+    qemu_bh_delete(data->bh);
+
+    aio_context_acquire(data->aio_context);
+    data->fn(data->job, data->opaque);
+    aio_context_release(data->aio_context);
+
+    g_free(data);
+}
+
+void block_job_defer_to_main_loop(BlockJob *job,
+                                  BlockJobDeferToMainLoopFn *fn,
+                                  void *opaque)
+{
+    BlockJobDeferToMainLoopData *data = g_malloc(sizeof(*data));
+    data->job = job;
+    data->bh = qemu_bh_new(block_job_defer_to_main_loop_bh, data);
+    data->aio_context = bdrv_get_aio_context(job->bs);
+    data->fn = fn;
+    data->opaque = opaque;
+
+    qemu_bh_schedule(data->bh);
+}
diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index 60aa835..5c13124 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -295,4 +295,23 @@ void block_job_iostatus_reset(BlockJob *job);
 BlockErrorAction block_job_error_action(BlockJob *job, BlockDriverState *bs,
                                         BlockdevOnError on_err,
                                         int is_read, int error);
+
+typedef void BlockJobDeferToMainLoopFn(BlockJob *job, void *opaque);
+
+/**
+ * block_job_defer_to_main_loop:
+ * @job: The job
+ * @fn: The function to run in the main loop
+ * @opaque: The opaque value that is passed to @fn
+ *
+ * Execute a given function in the main loop with the BlockDriverState
+ * AioContext acquired.  Block jobs must call bdrv_unref(), bdrv_close(), and
+ * anything that uses bdrv_drain_all() in the main loop.
+ *
+ * The @job AioContext is held while @fn executes.
+ */
+void block_job_defer_to_main_loop(BlockJob *job,
+                                  BlockJobDeferToMainLoopFn *fn,
+                                  void *opaque);
+
 #endif
-- 
1.9.3

* [Qemu-devel] [PATCH 06/11] block: add bdrv_drain()
  2014-10-01 17:01 [Qemu-devel] [PATCH 00/11] block: allow blockjobs to coexist with dataplane Stefan Hajnoczi
                   ` (4 preceding siblings ...)
  2014-10-01 17:01 ` [Qemu-devel] [PATCH 05/11] blockjob: add block_job_defer_to_main_loop() Stefan Hajnoczi
@ 2014-10-01 17:01 ` Stefan Hajnoczi
  2014-10-01 19:15   ` Max Reitz
  2014-10-01 17:01 ` [Qemu-devel] [PATCH 07/11] block: let backup blockjob run in BDS AioContext Stefan Hajnoczi
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 28+ messages in thread
From: Stefan Hajnoczi @ 2014-10-01 17:01 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Paolo Bonzini, Fam Zheng, Stefan Hajnoczi, Max Reitz

Now that op blockers are in use, we can ensure that no other sources are
generating I/O on a BlockDriverState.  Therefore it is possible to drain
requests for a single BDS.
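
The drain loop itself is just "poll until nothing is pending".  A toy model
with stand-in types (not QEMU code; the real bdrv_drain_one() flushes the
I/O queue, restarts throttled requests, and calls aio_poll()):

```c
/* Stand-in for a BlockDriverState with outstanding requests. */
typedef struct {
    int in_flight;   /* number of outstanding requests */
} FakeBDS;

/* One polling step: make progress on at most one request and report
 * whether the device was busy (mirrors the bool from bdrv_drain_one()). */
static int drain_one(FakeBDS *bs)
{
    if (bs->in_flight > 0) {
        bs->in_flight--;    /* in real QEMU, aio_poll() made progress */
        return 1;
    }
    return 0;
}

/* bdrv_drain(): keep polling until no requests remain.  This only
 * terminates if op blockers guarantee nobody submits new I/O meanwhile. */
static void drain(FakeBDS *bs)
{
    while (drain_one(bs)) {
        /* keep iterating */
    }
}
```

Unlike bdrv_drain_all(), this touches a single BDS and requires the caller
to already hold that BDS's AioContext.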

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block.c               | 36 +++++++++++++++++++++++++++++-------
 include/block/block.h |  1 +
 2 files changed, 30 insertions(+), 7 deletions(-)

diff --git a/block.c b/block.c
index c5a251c..48305c4 100644
--- a/block.c
+++ b/block.c
@@ -1918,6 +1918,34 @@ static bool bdrv_requests_pending(BlockDriverState *bs)
     return false;
 }
 
+static bool bdrv_drain_one(BlockDriverState *bs)
+{
+    bool bs_busy;
+
+    bdrv_flush_io_queue(bs);
+    bdrv_start_throttled_reqs(bs);
+    bs_busy = bdrv_requests_pending(bs);
+    bs_busy |= aio_poll(bdrv_get_aio_context(bs), bs_busy);
+    return bs_busy;
+}
+
+/*
+ * Wait for pending requests to complete on a single BlockDriverState subtree
+ *
+ * See the warning in bdrv_drain_all().  This function can only be called if
+ * you are sure nothing can generate I/O because you have op blockers
+ * installed.
+ *
+ * Note that unlike bdrv_drain_all(), the caller must hold the BlockDriverState
+ * AioContext.
+ */
+void bdrv_drain(BlockDriverState *bs)
+{
+    while (bdrv_drain_one(bs)) {
+        /* Keep iterating */
+    }
+}
+
 /*
  * Wait for pending requests to complete across all BlockDriverStates
  *
@@ -1941,16 +1969,10 @@ void bdrv_drain_all(void)
 
         QTAILQ_FOREACH(bs, &bdrv_states, device_list) {
             AioContext *aio_context = bdrv_get_aio_context(bs);
-            bool bs_busy;
 
             aio_context_acquire(aio_context);
-            bdrv_flush_io_queue(bs);
-            bdrv_start_throttled_reqs(bs);
-            bs_busy = bdrv_requests_pending(bs);
-            bs_busy |= aio_poll(aio_context, bs_busy);
+            busy |= bdrv_drain_one(bs);
             aio_context_release(aio_context);
-
-            busy |= bs_busy;
         }
     }
 }
diff --git a/include/block/block.h b/include/block/block.h
index 3318f0d..61df804 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -372,6 +372,7 @@ int bdrv_flush(BlockDriverState *bs);
 int coroutine_fn bdrv_co_flush(BlockDriverState *bs);
 int bdrv_flush_all(void);
 void bdrv_close_all(void);
+void bdrv_drain(BlockDriverState *bs);
 void bdrv_drain_all(void);
 
 int bdrv_discard(BlockDriverState *bs, int64_t sector_num, int nb_sectors);
-- 
1.9.3

* [Qemu-devel] [PATCH 07/11] block: let backup blockjob run in BDS AioContext
  2014-10-01 17:01 [Qemu-devel] [PATCH 00/11] block: allow blockjobs to coexist with dataplane Stefan Hajnoczi
                   ` (5 preceding siblings ...)
  2014-10-01 17:01 ` [Qemu-devel] [PATCH 06/11] block: add bdrv_drain() Stefan Hajnoczi
@ 2014-10-01 17:01 ` Stefan Hajnoczi
  2014-10-01 19:38   ` Max Reitz
  2014-10-01 17:01 ` [Qemu-devel] [PATCH 08/11] block: let stream " Stefan Hajnoczi
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 28+ messages in thread
From: Stefan Hajnoczi @ 2014-10-01 17:01 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Paolo Bonzini, Fam Zheng, Stefan Hajnoczi, Max Reitz

The backup block job must run in the BlockDriverState AioContext so that
it works with dataplane.

The basics of acquiring the AioContext are easy in blockdev.c.

The completion code in block/backup.c must call bdrv_unref() from the
main loop.  Use block_job_defer_to_main_loop() to achieve that.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/backup.c | 21 +++++++++++++++++++--
 blockdev.c     | 23 ++++++++++++++++-------
 2 files changed, 35 insertions(+), 9 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index d0b0225..9d015b5 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -227,9 +227,25 @@ static BlockErrorAction backup_error_action(BackupBlockJob *job,
     }
 }
 
+typedef struct {
+    int ret;
+} BackupCompleteData;
+
+static void backup_complete(BlockJob *job, void *opaque)
+{
+    BackupBlockJob *s = container_of(job, BackupBlockJob, common);
+    BackupCompleteData *data = opaque;
+
+    bdrv_unref(s->target);
+
+    block_job_completed(job, data->ret);
+    g_free(data);
+}
+
 static void coroutine_fn backup_run(void *opaque)
 {
     BackupBlockJob *job = opaque;
+    BackupCompleteData *data;
     BlockDriverState *bs = job->common.bs;
     BlockDriverState *target = job->target;
     BlockdevOnError on_target_error = job->on_target_error;
@@ -344,9 +360,10 @@ static void coroutine_fn backup_run(void *opaque)
     hbitmap_free(job->bitmap);
 
     bdrv_iostatus_disable(target);
-    bdrv_unref(target);
 
-    block_job_completed(&job->common, ret);
+    data = g_malloc(sizeof(*data));
+    data->ret = ret;
+    block_job_defer_to_main_loop(&job->common, backup_complete, data);
 }
 
 void backup_start(BlockDriverState *bs, BlockDriverState *target,
diff --git a/blockdev.c b/blockdev.c
index 1c79352..44f0cc7 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2068,6 +2068,7 @@ void qmp_drive_backup(const char *device, const char *target,
     BlockDriverState *bs;
     BlockDriverState *target_bs;
     BlockDriverState *source = NULL;
+    AioContext *aio_context;
     BlockDriver *drv = NULL;
     Error *local_err = NULL;
     int flags;
@@ -2093,9 +2094,12 @@ void qmp_drive_backup(const char *device, const char *target,
         return;
     }
 
+    aio_context = bdrv_get_aio_context(bs);
+    aio_context_acquire(aio_context);
+
     if (!bdrv_is_inserted(bs)) {
         error_set(errp, QERR_DEVICE_HAS_NO_MEDIUM, device);
-        return;
+        goto out;
     }
 
     if (!has_format) {
@@ -2105,12 +2109,12 @@ void qmp_drive_backup(const char *device, const char *target,
         drv = bdrv_find_format(format);
         if (!drv) {
             error_set(errp, QERR_INVALID_BLOCK_FORMAT, format);
-            return;
+            goto out;
         }
     }
 
     if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_BACKUP_SOURCE, errp)) {
-        return;
+        goto out;
     }
 
     flags = bs->open_flags | BDRV_O_RDWR;
@@ -2130,7 +2134,7 @@ void qmp_drive_backup(const char *device, const char *target,
     size = bdrv_getlength(bs);
     if (size < 0) {
         error_setg_errno(errp, -size, "bdrv_getlength failed");
-        return;
+        goto out;
     }
 
     if (mode != NEW_IMAGE_MODE_EXISTING) {
@@ -2147,23 +2151,28 @@ void qmp_drive_backup(const char *device, const char *target,
 
     if (local_err) {
         error_propagate(errp, local_err);
-        return;
+        goto out;
     }
 
     target_bs = NULL;
     ret = bdrv_open(&target_bs, target, NULL, NULL, flags, drv, &local_err);
     if (ret < 0) {
         error_propagate(errp, local_err);
-        return;
+        goto out;
     }
 
+    bdrv_set_aio_context(target_bs, aio_context);
+
     backup_start(bs, target_bs, speed, sync, on_source_error, on_target_error,
                  block_job_cb, bs, &local_err);
     if (local_err != NULL) {
         bdrv_unref(target_bs);
         error_propagate(errp, local_err);
-        return;
+        goto out;
     }
+
+out:
+    aio_context_release(aio_context);
 }
 
 BlockDeviceInfoList *qmp_query_named_block_nodes(Error **errp)
-- 
1.9.3

* [Qemu-devel] [PATCH 08/11] block: let stream blockjob run in BDS AioContext
  2014-10-01 17:01 [Qemu-devel] [PATCH 00/11] block: allow blockjobs to coexist with dataplane Stefan Hajnoczi
                   ` (6 preceding siblings ...)
  2014-10-01 17:01 ` [Qemu-devel] [PATCH 07/11] block: let backup blockjob run in BDS AioContext Stefan Hajnoczi
@ 2014-10-01 17:01 ` Stefan Hajnoczi
  2014-10-04 20:48   ` Max Reitz
  2014-10-01 17:01 ` [Qemu-devel] [PATCH 09/11] block: let mirror " Stefan Hajnoczi
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 28+ messages in thread
From: Stefan Hajnoczi @ 2014-10-01 17:01 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Paolo Bonzini, Fam Zheng, Stefan Hajnoczi, Max Reitz

The stream block job must run in the BlockDriverState AioContext so that
it works with dataplane.

The basics of acquiring the AioContext are easy in blockdev.c.

The tricky part is the completion code which drops part of the backing
file chain.  This must be done in the main loop where bdrv_unref() and
bdrv_close() are safe to call.  Use block_job_defer_to_main_loop() to
achieve that.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/stream.c | 50 ++++++++++++++++++++++++++++++++++++--------------
 blockdev.c     | 16 ++++++++++++----
 2 files changed, 48 insertions(+), 18 deletions(-)

diff --git a/block/stream.c b/block/stream.c
index cdea3e8..21bce12 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -79,9 +79,39 @@ static void close_unused_images(BlockDriverState *top, BlockDriverState *base,
     bdrv_refresh_limits(top, NULL);
 }
 
+typedef struct {
+    int ret;
+    bool reached_end;
+} StreamCompleteData;
+
+static void stream_complete(BlockJob *job, void *opaque)
+{
+    StreamBlockJob *s = container_of(job, StreamBlockJob, common);
+    StreamCompleteData *data = opaque;
+    BlockDriverState *base = s->base;
+
+    if (!block_job_is_cancelled(&s->common) && data->reached_end &&
+        data->ret == 0) {
+        const char *base_id = NULL, *base_fmt = NULL;
+        if (base) {
+            base_id = s->backing_file_str;
+            if (base->drv) {
+                base_fmt = base->drv->format_name;
+            }
+        }
+        data->ret = bdrv_change_backing_file(job->bs, base_id, base_fmt);
+        close_unused_images(job->bs, base, base_id);
+    }
+
+    g_free(s->backing_file_str);
+    block_job_completed(&s->common, data->ret);
+    g_free(data);
+}
+
 static void coroutine_fn stream_run(void *opaque)
 {
     StreamBlockJob *s = opaque;
+    StreamCompleteData *data;
     BlockDriverState *bs = s->common.bs;
     BlockDriverState *base = s->base;
     int64_t sector_num, end;
@@ -183,21 +213,13 @@ wait:
     /* Do not remove the backing file if an error was there but ignored.  */
     ret = error;
 
-    if (!block_job_is_cancelled(&s->common) && sector_num == end && ret == 0) {
-        const char *base_id = NULL, *base_fmt = NULL;
-        if (base) {
-            base_id = s->backing_file_str;
-            if (base->drv) {
-                base_fmt = base->drv->format_name;
-            }
-        }
-        ret = bdrv_change_backing_file(bs, base_id, base_fmt);
-        close_unused_images(bs, base, base_id);
-    }
-
     qemu_vfree(buf);
-    g_free(s->backing_file_str);
-    block_job_completed(&s->common, ret);
+
+    /* Modify backing chain and close BDSes in main loop */
+    data = g_malloc(sizeof(*data));
+    data->ret = ret;
+    data->reached_end = sector_num == end;
+    block_job_defer_to_main_loop(&s->common, stream_complete, data);
 }
 
 static void stream_set_speed(BlockJob *job, int64_t speed, Error **errp)
diff --git a/blockdev.c b/blockdev.c
index 44f0cc7..5473cd0 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1923,6 +1923,7 @@ void qmp_block_stream(const char *device,
 {
     BlockDriverState *bs;
     BlockDriverState *base_bs = NULL;
+    AioContext *aio_context;
     Error *local_err = NULL;
     const char *base_name = NULL;
 
@@ -1936,16 +1937,20 @@ void qmp_block_stream(const char *device,
         return;
     }
 
+    aio_context = bdrv_get_aio_context(bs);
+    aio_context_acquire(aio_context);
+
     if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_STREAM, errp)) {
-        return;
+        goto out;
     }
 
     if (has_base) {
         base_bs = bdrv_find_backing_image(bs, base);
         if (base_bs == NULL) {
             error_set(errp, QERR_BASE_NOT_FOUND, base);
-            return;
+            goto out;
         }
+        assert(bdrv_get_aio_context(base_bs) == aio_context);
         base_name = base;
     }
 
@@ -1954,7 +1959,7 @@ void qmp_block_stream(const char *device,
     if (base_bs == NULL && has_backing_file) {
         error_setg(errp, "backing file specified, but streaming the "
                          "entire chain");
-        return;
+        goto out;
     }
 
     /* backing_file string overrides base bs filename */
@@ -1964,10 +1969,13 @@ void qmp_block_stream(const char *device,
                  on_error, block_job_cb, bs, &local_err);
     if (local_err) {
         error_propagate(errp, local_err);
-        return;
+        goto out;
     }
 
     trace_qmp_block_stream(bs, bs->job);
+
+out:
+    aio_context_release(aio_context);
 }
 
 void qmp_block_commit(const char *device,
-- 
1.9.3

* [Qemu-devel] [PATCH 09/11] block: let mirror blockjob run in BDS AioContext
  2014-10-01 17:01 [Qemu-devel] [PATCH 00/11] block: allow blockjobs to coexist with dataplane Stefan Hajnoczi
                   ` (7 preceding siblings ...)
  2014-10-01 17:01 ` [Qemu-devel] [PATCH 08/11] block: let stream " Stefan Hajnoczi
@ 2014-10-01 17:01 ` Stefan Hajnoczi
  2014-10-04 21:07   ` Max Reitz
  2014-10-01 17:01 ` [Qemu-devel] [PATCH 10/11] block: let commit " Stefan Hajnoczi
  2014-10-01 17:01 ` [Qemu-devel] [PATCH 11/11] block: declare blockjobs and dataplane friends! Stefan Hajnoczi
  10 siblings, 1 reply; 28+ messages in thread
From: Stefan Hajnoczi @ 2014-10-01 17:01 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Paolo Bonzini, Fam Zheng, Stefan Hajnoczi, Max Reitz

The mirror block job must run in the BlockDriverState AioContext so that
it works with dataplane.

Acquire the AioContext in blockdev.c so starting the block job is safe.

Note that to_replace is treated separately from other BlockDriverStates
in that it does not need to be in the same AioContext.  Explicitly
acquire/release to_replace's AioContext when accessing it.

The completion code in block/mirror.c must perform BDS graph
manipulation and bdrv_reopen() from the main loop.  Use
block_job_defer_to_main_loop() to achieve that.

The bdrv_drain_all() call is not allowed outside the main loop since it
could lead to lock ordering problems.  Use bdrv_drain(bs) instead
because we have acquired the AioContext so nothing else can sneak in
I/O.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block.c        | 13 +++++++--
 block/mirror.c | 85 ++++++++++++++++++++++++++++++++++++++++------------------
 blockdev.c     | 38 ++++++++++++++++++--------
 3 files changed, 97 insertions(+), 39 deletions(-)

diff --git a/block.c b/block.c
index 48305c4..9966390 100644
--- a/block.c
+++ b/block.c
@@ -5942,13 +5942,19 @@ bool bdrv_is_first_non_filter(BlockDriverState *candidate)
 BlockDriverState *check_to_replace_node(const char *node_name, Error **errp)
 {
     BlockDriverState *to_replace_bs = bdrv_find_node(node_name);
+    AioContext *aio_context;
+
     if (!to_replace_bs) {
         error_setg(errp, "Node name '%s' not found", node_name);
         return NULL;
     }
 
+    aio_context = bdrv_get_aio_context(to_replace_bs);
+    aio_context_acquire(aio_context);
+
     if (bdrv_op_is_blocked(to_replace_bs, BLOCK_OP_TYPE_REPLACE, errp)) {
-        return NULL;
+        to_replace_bs = NULL;
+        goto out;
     }
 
     /* We don't want arbitrary node of the BDS chain to be replaced only the top
@@ -5958,9 +5964,12 @@ BlockDriverState *check_to_replace_node(const char *node_name, Error **errp)
      */
     if (!bdrv_is_first_non_filter(to_replace_bs)) {
         error_setg(errp, "Only top most non filter can be replaced");
-        return NULL;
+        to_replace_bs = NULL;
+        goto out;
     }
 
+out:
+    aio_context_release(aio_context);
     return to_replace_bs;
 }
 
diff --git a/block/mirror.c b/block/mirror.c
index 18b18e0..19b87d0 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -314,9 +314,56 @@ static void mirror_drain(MirrorBlockJob *s)
     }
 }
 
+typedef struct {
+    int ret;
+} MirrorExitData;
+
+static void mirror_exit(BlockJob *job, void *opaque)
+{
+    MirrorBlockJob *s = container_of(job, MirrorBlockJob, common);
+    MirrorExitData *data = opaque;
+    AioContext *replace_aio_context = NULL;
+
+    if (s->to_replace) {
+        replace_aio_context = bdrv_get_aio_context(s->to_replace);
+        aio_context_acquire(replace_aio_context);
+    }
+
+    if (s->should_complete && data->ret == 0) {
+        BlockDriverState *to_replace = s->common.bs;
+        if (s->to_replace) {
+            to_replace = s->to_replace;
+        }
+        if (bdrv_get_flags(s->target) != bdrv_get_flags(to_replace)) {
+            bdrv_reopen(s->target, bdrv_get_flags(to_replace), NULL);
+        }
+        bdrv_swap(s->target, to_replace);
+        if (s->common.driver->job_type == BLOCK_JOB_TYPE_COMMIT) {
+            /* drop the bs loop chain formed by the swap: break the loop then
+             * trigger the unref from the top one */
+            BlockDriverState *p = s->base->backing_hd;
+            bdrv_set_backing_hd(s->base, NULL);
+            bdrv_unref(p);
+        }
+    }
+    if (s->to_replace) {
+        bdrv_op_unblock_all(s->to_replace, s->replace_blocker);
+        error_free(s->replace_blocker);
+        bdrv_unref(s->to_replace);
+    }
+    if (replace_aio_context) {
+        aio_context_release(replace_aio_context);
+    }
+    g_free(s->replaces);
+    bdrv_unref(s->target);
+    block_job_completed(&s->common, data->ret);
+    g_free(data);
+}
+
 static void coroutine_fn mirror_run(void *opaque)
 {
     MirrorBlockJob *s = opaque;
+    MirrorExitData *data;
     BlockDriverState *bs = s->common.bs;
     int64_t sector_num, end, sectors_per_chunk, length;
     uint64_t last_pause_ns;
@@ -467,7 +514,7 @@ static void coroutine_fn mirror_run(void *opaque)
              * mirror_populate runs.
              */
             trace_mirror_before_drain(s, cnt);
-            bdrv_drain_all();
+            bdrv_drain(bs);
             cnt = bdrv_get_dirty_count(bs, s->dirty_bitmap);
         }
 
@@ -510,31 +557,10 @@ immediate_exit:
     g_free(s->in_flight_bitmap);
     bdrv_release_dirty_bitmap(bs, s->dirty_bitmap);
     bdrv_iostatus_disable(s->target);
-    if (s->should_complete && ret == 0) {
-        BlockDriverState *to_replace = s->common.bs;
-        if (s->to_replace) {
-            to_replace = s->to_replace;
-        }
-        if (bdrv_get_flags(s->target) != bdrv_get_flags(to_replace)) {
-            bdrv_reopen(s->target, bdrv_get_flags(to_replace), NULL);
-        }
-        bdrv_swap(s->target, to_replace);
-        if (s->common.driver->job_type == BLOCK_JOB_TYPE_COMMIT) {
-            /* drop the bs loop chain formed by the swap: break the loop then
-             * trigger the unref from the top one */
-            BlockDriverState *p = s->base->backing_hd;
-            bdrv_set_backing_hd(s->base, NULL);
-            bdrv_unref(p);
-        }
-    }
-    if (s->to_replace) {
-        bdrv_op_unblock_all(s->to_replace, s->replace_blocker);
-        error_free(s->replace_blocker);
-        bdrv_unref(s->to_replace);
-    }
-    g_free(s->replaces);
-    bdrv_unref(s->target);
-    block_job_completed(&s->common, ret);
+
+    data = g_malloc(sizeof(*data));
+    data->ret = ret;
+    block_job_defer_to_main_loop(&s->common, mirror_exit, data);
 }
 
 static void mirror_set_speed(BlockJob *job, int64_t speed, Error **errp)
@@ -573,16 +599,23 @@ static void mirror_complete(BlockJob *job, Error **errp)
 
     /* check the target bs is not blocked and block all operations on it */
     if (s->replaces) {
+        AioContext *replace_aio_context;
+
         s->to_replace = check_to_replace_node(s->replaces, &local_err);
         if (!s->to_replace) {
             error_propagate(errp, local_err);
             return;
         }
 
+        replace_aio_context = bdrv_get_aio_context(s->to_replace);
+        aio_context_acquire(replace_aio_context);
+
         error_setg(&s->replace_blocker,
                    "block device is in use by block-job-complete");
         bdrv_op_block_all(s->to_replace, s->replace_blocker);
         bdrv_ref(s->to_replace);
+
+        aio_context_release(replace_aio_context);
     }
 
     s->should_complete = true;
diff --git a/blockdev.c b/blockdev.c
index 5473cd0..5cf2058 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2205,6 +2205,7 @@ void qmp_drive_mirror(const char *device, const char *target,
 {
     BlockDriverState *bs;
     BlockDriverState *source, *target_bs;
+    AioContext *aio_context;
     BlockDriver *drv = NULL;
     Error *local_err = NULL;
     QDict *options = NULL;
@@ -2247,9 +2248,12 @@ void qmp_drive_mirror(const char *device, const char *target,
         return;
     }
 
+    aio_context = bdrv_get_aio_context(bs);
+    aio_context_acquire(aio_context);
+
     if (!bdrv_is_inserted(bs)) {
         error_set(errp, QERR_DEVICE_HAS_NO_MEDIUM, device);
-        return;
+        goto out;
     }
 
     if (!has_format) {
@@ -2259,12 +2263,12 @@ void qmp_drive_mirror(const char *device, const char *target,
         drv = bdrv_find_format(format);
         if (!drv) {
             error_set(errp, QERR_INVALID_BLOCK_FORMAT, format);
-            return;
+            goto out;
         }
     }
 
     if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_MIRROR, errp)) {
-        return;
+        goto out;
     }
 
     flags = bs->open_flags | BDRV_O_RDWR;
@@ -2279,29 +2283,36 @@ void qmp_drive_mirror(const char *device, const char *target,
     size = bdrv_getlength(bs);
     if (size < 0) {
         error_setg_errno(errp, -size, "bdrv_getlength failed");
-        return;
+        goto out;
     }
 
     if (has_replaces) {
         BlockDriverState *to_replace_bs;
+        AioContext *replace_aio_context;
+        int64_t replace_size;
 
         if (!has_node_name) {
             error_setg(errp, "a node-name must be provided when replacing a"
                              " named node of the graph");
-            return;
+            goto out;
         }
 
         to_replace_bs = check_to_replace_node(replaces, &local_err);
 
         if (!to_replace_bs) {
             error_propagate(errp, local_err);
-            return;
+            goto out;
         }
 
-        if (size != bdrv_getlength(to_replace_bs)) {
+        replace_aio_context = bdrv_get_aio_context(to_replace_bs);
+        aio_context_acquire(replace_aio_context);
+        replace_size = bdrv_getlength(to_replace_bs);
+        aio_context_release(replace_aio_context);
+
+        if (size != replace_size) {
             error_setg(errp, "cannot replace image with a mirror image of "
                              "different size");
-            return;
+            goto out;
         }
     }
 
@@ -2330,7 +2341,7 @@ void qmp_drive_mirror(const char *device, const char *target,
 
     if (local_err) {
         error_propagate(errp, local_err);
-        return;
+        goto out;
     }
 
     if (has_node_name) {
@@ -2346,9 +2357,11 @@ void qmp_drive_mirror(const char *device, const char *target,
                     flags | BDRV_O_NO_BACKING, drv, &local_err);
     if (ret < 0) {
         error_propagate(errp, local_err);
-        return;
+        goto out;
     }
 
+    bdrv_set_aio_context(target_bs, aio_context);
+
     /* pass the node name to replace to mirror start since it's loose coupling
      * and will allow to check whether the node still exist at mirror completion
      */
@@ -2360,8 +2373,11 @@ void qmp_drive_mirror(const char *device, const char *target,
     if (local_err != NULL) {
         bdrv_unref(target_bs);
         error_propagate(errp, local_err);
-        return;
+        goto out;
     }
+
+out:
+    aio_context_release(aio_context);
 }
 
 /* Get the block job for a given device name and acquire its AioContext */
-- 
1.9.3


* [Qemu-devel] [PATCH 10/11] block: let commit blockjob run in BDS AioContext
  2014-10-01 17:01 [Qemu-devel] [PATCH 00/11] block: allow blockjobs to coexist with dataplane Stefan Hajnoczi
                   ` (8 preceding siblings ...)
  2014-10-01 17:01 ` [Qemu-devel] [PATCH 09/11] block: let mirror " Stefan Hajnoczi
@ 2014-10-01 17:01 ` Stefan Hajnoczi
  2014-10-04 21:28   ` Max Reitz
  2014-10-01 17:01 ` [Qemu-devel] [PATCH 11/11] block: declare blockjobs and dataplane friends! Stefan Hajnoczi
  10 siblings, 1 reply; 28+ messages in thread
From: Stefan Hajnoczi @ 2014-10-01 17:01 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Paolo Bonzini, Fam Zheng, Stefan Hajnoczi, Max Reitz

The commit block job must run in the BlockDriverState AioContext so that
it works with dataplane.

Acquire the AioContext in blockdev.c so starting the block job is safe.
One detail here is that the bdrv_drain_all() must be moved inside the
aio_context_acquire() region so requests cannot sneak in between the
drain and acquire.

The completion code in block/commit.c must perform backing chain
manipulation and bdrv_reopen() from the main loop.  Use
block_job_defer_to_main_loop() to achieve that.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/commit.c | 72 +++++++++++++++++++++++++++++++++++++---------------------
 blockdev.c     | 29 +++++++++++++++--------
 2 files changed, 66 insertions(+), 35 deletions(-)

diff --git a/block/commit.c b/block/commit.c
index 91517d3..0fd05dc 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -60,17 +60,50 @@ static int coroutine_fn commit_populate(BlockDriverState *bs,
     return 0;
 }
 
-static void coroutine_fn commit_run(void *opaque)
+typedef struct {
+    int ret;
+} CommitCompleteData;
+
+static void commit_complete(BlockJob *job, void *opaque)
 {
-    CommitBlockJob *s = opaque;
+    CommitBlockJob *s = container_of(job, CommitBlockJob, common);
+    CommitCompleteData *data = opaque;
     BlockDriverState *active = s->active;
     BlockDriverState *top = s->top;
     BlockDriverState *base = s->base;
     BlockDriverState *overlay_bs;
+    int ret = data->ret;
+
+    if (!block_job_is_cancelled(&s->common) && ret == 0) {
+        /* success */
+        ret = bdrv_drop_intermediate(active, top, base, s->backing_file_str);
+    }
+
+    /* restore base open flags here if appropriate (e.g., change the base back
+     * to r/o). These reopens do not need to be atomic, since we won't abort
+     * even on failure here */
+    if (s->base_flags != bdrv_get_flags(base)) {
+        bdrv_reopen(base, s->base_flags, NULL);
+    }
+    overlay_bs = bdrv_find_overlay(active, top);
+    if (overlay_bs && s->orig_overlay_flags != bdrv_get_flags(overlay_bs)) {
+        bdrv_reopen(overlay_bs, s->orig_overlay_flags, NULL);
+    }
+    g_free(s->backing_file_str);
+    block_job_completed(&s->common, ret);
+    g_free(data);
+}
+
+static void coroutine_fn commit_run(void *opaque)
+{
+    CommitBlockJob *s = opaque;
+    CommitCompleteData *data;
+    BlockDriverState *top = s->top;
+    BlockDriverState *base = s->base;
     int64_t sector_num, end;
     int ret = 0;
     int n = 0;
-    void *buf;
+    void *buf = NULL;
     int bytes_written = 0;
     int64_t base_len;
 
@@ -78,18 +111,18 @@ static void coroutine_fn commit_run(void *opaque)
 
 
     if (s->common.len < 0) {
-        goto exit_restore_reopen;
+        goto out;
     }
 
     ret = base_len = bdrv_getlength(base);
     if (base_len < 0) {
-        goto exit_restore_reopen;
+        goto out;
     }
 
     if (base_len < s->common.len) {
         ret = bdrv_truncate(base, s->common.len);
         if (ret) {
-            goto exit_restore_reopen;
+            goto out;
         }
     }
 
@@ -128,7 +161,7 @@ wait:
             if (s->on_error == BLOCKDEV_ON_ERROR_STOP ||
                 s->on_error == BLOCKDEV_ON_ERROR_REPORT||
                 (s->on_error == BLOCKDEV_ON_ERROR_ENOSPC && ret == -ENOSPC)) {
-                goto exit_free_buf;
+                goto out;
             } else {
                 n = 0;
                 continue;
@@ -140,27 +173,14 @@ wait:
 
     ret = 0;
 
-    if (!block_job_is_cancelled(&s->common) && sector_num == end) {
-        /* success */
-        ret = bdrv_drop_intermediate(active, top, base, s->backing_file_str);
+out:
+    if (buf) {
+        qemu_vfree(buf);
     }
 
-exit_free_buf:
-    qemu_vfree(buf);
-
-exit_restore_reopen:
-    /* restore base open flags here if appropriate (e.g., change the base back
-     * to r/o). These reopens do not need to be atomic, since we won't abort
-     * even on failure here */
-    if (s->base_flags != bdrv_get_flags(base)) {
-        bdrv_reopen(base, s->base_flags, NULL);
-    }
-    overlay_bs = bdrv_find_overlay(active, top);
-    if (overlay_bs && s->orig_overlay_flags != bdrv_get_flags(overlay_bs)) {
-        bdrv_reopen(overlay_bs, s->orig_overlay_flags, NULL);
-    }
-    g_free(s->backing_file_str);
-    block_job_completed(&s->common, ret);
+    data = g_malloc(sizeof(*data));
+    data->ret = ret;
+    block_job_defer_to_main_loop(&s->common, commit_complete, data);
 }
 
 static void commit_set_speed(BlockJob *job, int64_t speed, Error **errp)
diff --git a/blockdev.c b/blockdev.c
index 5cf2058..d9333d7 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1987,6 +1987,7 @@ void qmp_block_commit(const char *device,
 {
     BlockDriverState *bs;
     BlockDriverState *base_bs, *top_bs;
+    AioContext *aio_context;
     Error *local_err = NULL;
     /* This will be part of the QMP command, if/when the
      * BlockdevOnError change for blkmirror makes it in
@@ -1997,9 +1998,6 @@ void qmp_block_commit(const char *device,
         speed = 0;
     }
 
-    /* drain all i/o before commits */
-    bdrv_drain_all();
-
     /* Important Note:
      *  libvirt relies on the DeviceNotFound error class in order to probe for
      *  live commit feature versions; for this to work, we must make sure to
@@ -2011,8 +2009,14 @@ void qmp_block_commit(const char *device,
         return;
     }
 
+    aio_context = bdrv_get_aio_context(bs);
+    aio_context_acquire(aio_context);
+
+    /* drain all i/o before commits */
+    bdrv_drain_all();
+
     if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_COMMIT, errp)) {
-        return;
+        goto out;
     }
 
     /* default top_bs is the active layer */
@@ -2026,9 +2030,11 @@ void qmp_block_commit(const char *device,
 
     if (top_bs == NULL) {
         error_setg(errp, "Top image file %s not found", top ? top : "NULL");
-        return;
+        goto out;
     }
 
+    assert(bdrv_get_aio_context(top_bs) == aio_context);
+
     if (has_base && base) {
         base_bs = bdrv_find_backing_image(top_bs, base);
     } else {
@@ -2037,20 +2043,22 @@ void qmp_block_commit(const char *device,
 
     if (base_bs == NULL) {
         error_set(errp, QERR_BASE_NOT_FOUND, base ? base : "NULL");
-        return;
+        goto out;
     }
 
+    assert(bdrv_get_aio_context(base_bs) == aio_context);
+
     /* Do not allow attempts to commit an image into itself */
     if (top_bs == base_bs) {
         error_setg(errp, "cannot commit an image into itself");
-        return;
+        goto out;
     }
 
     if (top_bs == bs) {
         if (has_backing_file) {
             error_setg(errp, "'backing-file' specified,"
                              " but 'top' is the active layer");
-            return;
+            goto out;
         }
         commit_active_start(bs, base_bs, speed, on_error, block_job_cb,
                             bs, &local_err);
@@ -2060,8 +2068,11 @@ void qmp_block_commit(const char *device,
     }
     if (local_err != NULL) {
         error_propagate(errp, local_err);
-        return;
+        goto out;
     }
+
+out:
+    aio_context_release(aio_context);
 }
 
 void qmp_drive_backup(const char *device, const char *target,
-- 
1.9.3


* [Qemu-devel] [PATCH 11/11] block: declare blockjobs and dataplane friends!
  2014-10-01 17:01 [Qemu-devel] [PATCH 00/11] block: allow blockjobs to coexist with dataplane Stefan Hajnoczi
                   ` (9 preceding siblings ...)
  2014-10-01 17:01 ` [Qemu-devel] [PATCH 10/11] block: let commit " Stefan Hajnoczi
@ 2014-10-01 17:01 ` Stefan Hajnoczi
  2014-10-04 21:30   ` Max Reitz
  10 siblings, 1 reply; 28+ messages in thread
From: Stefan Hajnoczi @ 2014-10-01 17:01 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Paolo Bonzini, Fam Zheng, Stefan Hajnoczi, Max Reitz

Now that blockjobs use AioContext they are safe for use with dataplane.
Unblock them!

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 blockjob.c                      | 1 +
 hw/block/dataplane/virtio-blk.c | 5 +++++
 2 files changed, 6 insertions(+)

diff --git a/blockjob.c b/blockjob.c
index 24a64d8..d0b753f 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -50,6 +50,7 @@ void *block_job_create(const BlockJobDriver *driver, BlockDriverState *bs,
     error_setg(&job->blocker, "block device is in use by block job: %s",
                BlockJobType_lookup[driver->job_type]);
     bdrv_op_block_all(bs, job->blocker);
+    bdrv_op_unblock(bs, BLOCK_OP_TYPE_DATAPLANE, job->blocker);
 
     job->driver        = driver;
     job->bs            = bs;
diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index 5458f9d..0603230 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -195,6 +195,11 @@ void virtio_blk_data_plane_create(VirtIODevice *vdev, VirtIOBlkConf *blk,
     bdrv_op_block_all(blk->conf.bs, s->blocker);
     bdrv_op_unblock(blk->conf.bs, BLOCK_OP_TYPE_RESIZE, s->blocker);
     bdrv_op_unblock(blk->conf.bs, BLOCK_OP_TYPE_DRIVE_DEL, s->blocker);
+    bdrv_op_unblock(blk->conf.bs, BLOCK_OP_TYPE_BACKUP_SOURCE, s->blocker);
+    bdrv_op_unblock(blk->conf.bs, BLOCK_OP_TYPE_COMMIT, s->blocker);
+    bdrv_op_unblock(blk->conf.bs, BLOCK_OP_TYPE_MIRROR, s->blocker);
+    bdrv_op_unblock(blk->conf.bs, BLOCK_OP_TYPE_STREAM, s->blocker);
+    bdrv_op_unblock(blk->conf.bs, BLOCK_OP_TYPE_REPLACE, s->blocker);
 
     *dataplane = s;
 }
-- 
1.9.3


* Re: [Qemu-devel] [PATCH 01/11] block: acquire AioContext in generic blockjob QMP commands
  2014-10-01 17:01 ` [Qemu-devel] [PATCH 01/11] block: acquire AioContext in generic blockjob QMP commands Stefan Hajnoczi
@ 2014-10-01 18:27   ` Max Reitz
  0 siblings, 0 replies; 28+ messages in thread
From: Max Reitz @ 2014-10-01 18:27 UTC (permalink / raw)
  To: Stefan Hajnoczi, qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini, Fam Zheng

On 01.10.2014 19:01, Stefan Hajnoczi wrote:
> block-job-set-speed, block-job-cancel, block-job-pause,
> block-job-resume, and block-job-complete must acquire the
> BlockDriverState AioContext so that it is safe to access bs.
>
> At the moment bs->job is always NULL when dataplane is active because op
> blockers prevent blockjobs from starting.  Once the rest of the blockjob
> API has been made aware of AioContext we can drop the op blocker.
>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>   blockdev.c | 52 +++++++++++++++++++++++++++++++++++++++-------------
>   1 file changed, 39 insertions(+), 13 deletions(-)

Reviewed-by: Max Reitz <mreitz@redhat.com>


* Re: [Qemu-devel] [PATCH 02/11] blockdev: acquire AioContext in do_qmp_query_block_jobs_one()
  2014-10-01 17:01 ` [Qemu-devel] [PATCH 02/11] blockdev: acquire AioContext in do_qmp_query_block_jobs_one() Stefan Hajnoczi
@ 2014-10-01 18:32   ` Max Reitz
  0 siblings, 0 replies; 28+ messages in thread
From: Max Reitz @ 2014-10-01 18:32 UTC (permalink / raw)
  To: Stefan Hajnoczi, qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini, Fam Zheng

On 01.10.2014 19:01, Stefan Hajnoczi wrote:
> Make sure that query-block-jobs acquires the BlockDriverState
> AioContext so that the blockjob isn't running in another thread while we
> access its state.
>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>   blockdev.c | 9 +++++++--
>   1 file changed, 7 insertions(+), 2 deletions(-)

This conflicts with patch 7 of Markus's BlockBackend series; but in 
itself, this patch is fine.

Reviewed-by: Max Reitz <mreitz@redhat.com>


* Re: [Qemu-devel] [PATCH 03/11] blockdev: acquire AioContext in blockdev_mark_auto_del()
  2014-10-01 17:01 ` [Qemu-devel] [PATCH 03/11] blockdev: acquire AioContext in blockdev_mark_auto_del() Stefan Hajnoczi
@ 2014-10-01 18:39   ` Max Reitz
  0 siblings, 0 replies; 28+ messages in thread
From: Max Reitz @ 2014-10-01 18:39 UTC (permalink / raw)
  To: Stefan Hajnoczi, qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini, Fam Zheng

On 01.10.2014 19:01, Stefan Hajnoczi wrote:
> When an emulated storage controller is unrealized it will call
> blockdev_mark_auto_del().  This will cancel any running block job (and
> that eventually releases its reference to the BDS so it can be freed).
>
> Since the block job may be executing in another AioContext we must
> acquire/release to ensure thread safety.
>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>   blockdev.c | 7 +++++++
>   1 file changed, 7 insertions(+)

Reviewed-by: Max Reitz <mreitz@redhat.com>


* Re: [Qemu-devel] [PATCH 04/11] blockdev: add note that block_job_cb() must be thread-safe
  2014-10-01 17:01 ` [Qemu-devel] [PATCH 04/11] blockdev: add note that block_job_cb() must be thread-safe Stefan Hajnoczi
@ 2014-10-01 18:42   ` Max Reitz
  0 siblings, 0 replies; 28+ messages in thread
From: Max Reitz @ 2014-10-01 18:42 UTC (permalink / raw)
  To: Stefan Hajnoczi, qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini, Fam Zheng

On 01.10.2014 19:01, Stefan Hajnoczi wrote:
> This function is correct but we should document the constraint that
> everything must be thread-safe.
>
> Emitting QMP events and scheduling BHs are both thread-safe so nothing
> needs to be done here.
>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>   blockdev.c | 5 +++++
>   1 file changed, 5 insertions(+)

Reviewed-by: Max Reitz <mreitz@redhat.com>


* Re: [Qemu-devel] [PATCH 05/11] blockjob: add block_job_defer_to_main_loop()
  2014-10-01 17:01 ` [Qemu-devel] [PATCH 05/11] blockjob: add block_job_defer_to_main_loop() Stefan Hajnoczi
@ 2014-10-01 19:06   ` Max Reitz
  2014-10-02 14:58     ` Stefan Hajnoczi
  0 siblings, 1 reply; 28+ messages in thread
From: Max Reitz @ 2014-10-01 19:06 UTC (permalink / raw)
  To: Stefan Hajnoczi, qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini, Fam Zheng

On 01.10.2014 19:01, Stefan Hajnoczi wrote:
> Block jobs will run in the BlockDriverState's AioContext, which may not
> always be the QEMU main loop.
>
> There are some block layer APIs that are either not thread-safe or risk
> lock ordering problems.  This includes bdrv_unref(), bdrv_close(), and
> anything that calls bdrv_drain_all().
>
> The block_job_defer_to_main_loop() API allows a block job to schedule a
> function to run in the main loop with the BlockDriverState AioContext
> held.
>
> This function will be used to perform cleanup and backing chain
> manipulations in block jobs.
>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>   blockjob.c               | 35 +++++++++++++++++++++++++++++++++++
>   include/block/blockjob.h | 19 +++++++++++++++++++
>   2 files changed, 54 insertions(+)
>
> diff --git a/blockjob.c b/blockjob.c
> index 0689fdd..24a64d8 100644
> --- a/blockjob.c
> +++ b/blockjob.c
> @@ -313,3 +313,38 @@ BlockErrorAction block_job_error_action(BlockJob *job, BlockDriverState *bs,
>       }
>       return action;
>   }
> +
> +typedef struct {
> +    BlockJob *job;
> +    QEMUBH *bh;
> +    AioContext *aio_context;
> +    BlockJobDeferToMainLoopFn *fn;
> +    void *opaque;
> +} BlockJobDeferToMainLoopData;
> +
> +static void block_job_defer_to_main_loop_bh(void *opaque)
> +{
> +    BlockJobDeferToMainLoopData *data = opaque;
> +
> +    qemu_bh_delete(data->bh);
> +
> +    aio_context_acquire(data->aio_context);
> +    data->fn(data->job, data->opaque);
> +    aio_context_release(data->aio_context);
> +
> +    g_free(data);
> +}
> +
> +void block_job_defer_to_main_loop(BlockJob *job,
> +                                  BlockJobDeferToMainLoopFn *fn,
> +                                  void *opaque)
> +{
> +    BlockJobDeferToMainLoopData *data = g_malloc(sizeof(*data));
> +    data->job = job;
> +    data->bh = qemu_bh_new(block_job_defer_to_main_loop_bh, data);
> +    data->aio_context = bdrv_get_aio_context(job->bs);
> +    data->fn = fn;
> +    data->opaque = opaque;
> +
> +    qemu_bh_schedule(data->bh);
> +}

I'm not so sure whether it'd be possible to change the BDS's AIO context 
(in another thread) after the call to bdrv_get_aio_context() in 
block_job_defer_to_main_loop() and before 
block_job_defer_to_main_loop_bh() is run. If so, this might create race 
conditions.

Other than that, this patch looks good.

Max


* Re: [Qemu-devel] [PATCH 06/11] block: add bdrv_drain()
  2014-10-01 17:01 ` [Qemu-devel] [PATCH 06/11] block: add bdrv_drain() Stefan Hajnoczi
@ 2014-10-01 19:15   ` Max Reitz
  0 siblings, 0 replies; 28+ messages in thread
From: Max Reitz @ 2014-10-01 19:15 UTC (permalink / raw)
  To: Stefan Hajnoczi, qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini, Fam Zheng

On 01.10.2014 19:01, Stefan Hajnoczi wrote:
> Now that op blockers are in use, we can ensure that no other sources are
> generating I/O on a BlockDriverState.  Therefore it is possible to drain
> requests for a single BDS.
>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>   block.c               | 36 +++++++++++++++++++++++++++++-------
>   include/block/block.h |  1 +
>   2 files changed, 30 insertions(+), 7 deletions(-)
>
> diff --git a/block.c b/block.c
> index c5a251c..48305c4 100644
> --- a/block.c
> +++ b/block.c
> @@ -1918,6 +1918,34 @@ static bool bdrv_requests_pending(BlockDriverState *bs)
>       return false;
>   }
>   
> +static bool bdrv_drain_one(BlockDriverState *bs)
> +{
> +    bool bs_busy;
> +
> +    bdrv_flush_io_queue(bs);
> +    bdrv_start_throttled_reqs(bs);
> +    bs_busy = bdrv_requests_pending(bs);
> +    bs_busy |= aio_poll(bdrv_get_aio_context(bs), bs_busy);
> +    return bs_busy;
> +}
> +
> +/*
> + * Wait for pending requests to complete on a single BlockDriverState subtree
> + *
> + * See the warning in bdrv_drain_all().  This function can only be called if
> + * you are sure nothing can generate I/O because you have op blockers
> + * installed.

Although that warning now sounds too harsh: "it is not possible to have 
a function to drain a single device's I/O queue." Apparently, under 
certain circumstances, it now *is* possible. ;-)

Reviewed-by: Max Reitz <mreitz@redhat.com>


* Re: [Qemu-devel] [PATCH 07/11] block: let backup blockjob run in BDS AioContext
  2014-10-01 17:01 ` [Qemu-devel] [PATCH 07/11] block: let backup blockjob run in BDS AioContext Stefan Hajnoczi
@ 2014-10-01 19:38   ` Max Reitz
  2014-10-02 15:08     ` Stefan Hajnoczi
  0 siblings, 1 reply; 28+ messages in thread
From: Max Reitz @ 2014-10-01 19:38 UTC (permalink / raw)
  To: Stefan Hajnoczi, qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini, Fam Zheng

On 01.10.2014 19:01, Stefan Hajnoczi wrote:
> The backup block job must run in the BlockDriverState AioContext so that
> it works with dataplane.
>
> The basics of acquiring the AioContext are easy in blockdev.c.
>
> The completion code in block/backup.c must call bdrv_unref() from the
> main loop.  Use block_job_defer_to_main_loop() to achieve that.
>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>   block/backup.c | 21 +++++++++++++++++++--
>   blockdev.c     | 23 ++++++++++++++++-------
>   2 files changed, 35 insertions(+), 9 deletions(-)

Hm, so, if I run a backup blockjob from one thread (which is not the 
main loop and not the BDS's AIO context) and try to cancel it or set its 
speed from another thread (e.g. the main loop), won't the blockjob hold 
the BDS's AIO context lock as long as it runs, making it impossible to 
interfere?

Max


* Re: [Qemu-devel] [PATCH 05/11] blockjob: add block_job_defer_to_main_loop()
  2014-10-01 19:06   ` Max Reitz
@ 2014-10-02 14:58     ` Stefan Hajnoczi
  0 siblings, 0 replies; 28+ messages in thread
From: Stefan Hajnoczi @ 2014-10-02 14:58 UTC (permalink / raw)
  To: Max Reitz
  Cc: Kevin Wolf, Paolo Bonzini, Fam Zheng, qemu-devel, Stefan Hajnoczi


On Wed, Oct 01, 2014 at 09:06:36PM +0200, Max Reitz wrote:
> On 01.10.2014 19:01, Stefan Hajnoczi wrote:
> >+typedef struct {
> >+    BlockJob *job;
> >+    QEMUBH *bh;
> >+    AioContext *aio_context;
> >+    BlockJobDeferToMainLoopFn *fn;
> >+    void *opaque;
> >+} BlockJobDeferToMainLoopData;
> >+
> >+static void block_job_defer_to_main_loop_bh(void *opaque)
> >+{
> >+    BlockJobDeferToMainLoopData *data = opaque;
> >+
> >+    qemu_bh_delete(data->bh);
> >+
> >+    aio_context_acquire(data->aio_context);
> >+    data->fn(data->job, data->opaque);
> >+    aio_context_release(data->aio_context);
> >+
> >+    g_free(data);
> >+}
> >+
> >+void block_job_defer_to_main_loop(BlockJob *job,
> >+                                  BlockJobDeferToMainLoopFn *fn,
> >+                                  void *opaque)
> >+{
> >+    BlockJobDeferToMainLoopData *data = g_malloc(sizeof(*data));
> >+    data->job = job;
> >+    data->bh = qemu_bh_new(block_job_defer_to_main_loop_bh, data);
> >+    data->aio_context = bdrv_get_aio_context(job->bs);
> >+    data->fn = fn;
> >+    data->opaque = opaque;
> >+
> >+    qemu_bh_schedule(data->bh);
> >+}
> 
> I'm not so sure whether it'd be possible to change the BDS's AIO context (in
> another thread) after the call to bdrv_get_aio_context() in
> block_job_defer_to_main_loop() and before block_job_defer_to_main_loop_bh()
> is run. If so, this might create race conditions.
> 
> Other than that, this patch looks good.

Yes, I think you are correct.

The solution is a little tricky; we need to hold both AioContexts:

  static void block_job_defer_to_main_loop_bh(void *opaque)
  {
      BlockJobDeferToMainLoopData *data = opaque;
      AioContext *aio_context;

      qemu_bh_delete(data->bh);

      /* Prevent race with block_job_defer_to_main_loop() */
      aio_context_acquire(data->aio_context);

      /* Fetch BDS AioContext again, in case it has changed */
      aio_context = bdrv_get_aio_context(data->job->bs);
      aio_context_acquire(aio_context);

      data->fn(data->job, data->opaque);

      aio_context_release(aio_context);

      aio_context_release(data->aio_context);

      g_free(data);
  }

Tada!  Because this executes in the main loop it is safe to do this - no lock
ordering problems.  And because aio_context_acquire() is recursive we can lock
twice without problems.

Stefan



* Re: [Qemu-devel] [PATCH 07/11] block: let backup blockjob run in BDS AioContext
  2014-10-01 19:38   ` Max Reitz
@ 2014-10-02 15:08     ` Stefan Hajnoczi
  2014-10-04 19:54       ` Max Reitz
  0 siblings, 1 reply; 28+ messages in thread
From: Stefan Hajnoczi @ 2014-10-02 15:08 UTC (permalink / raw)
  To: Max Reitz
  Cc: Kevin Wolf, Paolo Bonzini, Fam Zheng, qemu-devel, Stefan Hajnoczi


On Wed, Oct 01, 2014 at 09:38:39PM +0200, Max Reitz wrote:
> On 01.10.2014 19:01, Stefan Hajnoczi wrote:
> >The backup block job must run in the BlockDriverState AioContext so that
> >it works with dataplane.
> >
> >The basics of acquiring the AioContext are easy in blockdev.c.
> >
> >The completion code in block/backup.c must call bdrv_unref() from the
> >main loop.  Use block_job_defer_to_main_loop() to achieve that.
> >
> >Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> >---
> >  block/backup.c | 21 +++++++++++++++++++--
> >  blockdev.c     | 23 ++++++++++++++++-------
> >  2 files changed, 35 insertions(+), 9 deletions(-)
> 
> Hm, so, if I run a backup blockjob from one thread (which is not the main
> loop and not the BDS's AIO context)

That cannot happen.  The blockjob always runs with the BDS AioContext
acquired - either because we explicitly did so in the main loop or
because the dataplane IOThread acquires it.

> and try to cancel it or set its speed
> from another thread (e.g. the main loop), won't the blockjob hold the BDS's
> AIO context lock as long as it runs, making it impossible to interfere?

No, because of how the AioContext's RFifoLock works.

It has a "contention callback" that kicks the aio_poll() loop.  This
allows the QEMU monitor to acquire the lock whenever an IOThread is
blocked in poll(2).  It uses an eventfd to force the IOThread out of
poll(2) and this happens automatically in aio_context_acquire().

See aio_context_new() and aio_rfifolock_cb().
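The kick mechanism can be sketched with a pipe standing in for the eventfd; this is only an illustration of the idea, not QEMU's actual aio_notify() code:

```c
#include <poll.h>
#include <unistd.h>

/* Sketch: a thread blocked in poll() on notify_fd[0] is woken as soon
 * as another thread writes to notify_fd[1]. This is the same idea the
 * RFifoLock contention callback uses (with an eventfd) to force an
 * IOThread out of poll(2) so the lock can be handed over. */
int kick_wakes_poll(void)
{
    int notify_fd[2];
    char byte = 0;
    struct pollfd pfd;
    int n;

    if (pipe(notify_fd)) return -1;

    /* The "contention callback": kick the fd before blocking. */
    if (write(notify_fd[1], &byte, 1) != 1) return -1;

    pfd.fd = notify_fd[0];
    pfd.events = POLLIN;

    /* Would block forever (-1 timeout) without the kick above. */
    n = poll(&pfd, 1, -1);

    close(notify_fd[0]);
    close(notify_fd[1]);
    return n == 1 ? 0 : -1;
}
```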

Stefan



* Re: [Qemu-devel] [PATCH 07/11] block: let backup blockjob run in BDS AioContext
  2014-10-02 15:08     ` Stefan Hajnoczi
@ 2014-10-04 19:54       ` Max Reitz
  0 siblings, 0 replies; 28+ messages in thread
From: Max Reitz @ 2014-10-04 19:54 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Paolo Bonzini, Fam Zheng, qemu-devel, Stefan Hajnoczi

On 02.10.2014 17:08, Stefan Hajnoczi wrote:
> On Wed, Oct 01, 2014 at 09:38:39PM +0200, Max Reitz wrote:
>> On 01.10.2014 19:01, Stefan Hajnoczi wrote:
>>> The backup block job must run in the BlockDriverState AioContext so that
>>> it works with dataplane.
>>>
>>> The basics of acquiring the AioContext are easy in blockdev.c.
>>>
>>> The completion code in block/backup.c must call bdrv_unref() from the
>>> main loop.  Use block_job_defer_to_main_loop() to achieve that.
>>>
>>> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>>> ---
>>>   block/backup.c | 21 +++++++++++++++++++--
>>>   blockdev.c     | 23 ++++++++++++++++-------
>>>   2 files changed, 35 insertions(+), 9 deletions(-)
>> Hm, so, if I run a backup blockjob from one thread (which is not the main
>> loop and not the BDS's AIO context)
> That cannot happen.  The blockjob always runs with the BDS AioContext
> acquired - either because we explicitly did so in the main loop or
> because the dataplane IOThread acquires it.
>
>> and try to cancel it or set its speed
>> from another thread (e.g. the main loop), won't the blockjob hold the BDS's
>> AIO context lock as long as it runs, making it impossible to interfere?
> No, because of how the AioContext's RFifoLock works.
>
> It has a "contention callback" that kicks the aio_poll() loop.  This
> allows the QEMU monitor to acquire the lock whenever an IOThread is
> blocked in poll(2).  It uses an eventfd to force the IOThread out of
> poll(2) and this happens automatically in aio_context_acquire().
>
> See aio_context_new() and aio_rfifolock_cb().

Ah, okay, thank you for clarifying that.

Reviewed-by: Max Reitz <mreitz@redhat.com>


* Re: [Qemu-devel] [PATCH 08/11] block: let stream blockjob run in BDS AioContext
  2014-10-01 17:01 ` [Qemu-devel] [PATCH 08/11] block: let stream " Stefan Hajnoczi
@ 2014-10-04 20:48   ` Max Reitz
  0 siblings, 0 replies; 28+ messages in thread
From: Max Reitz @ 2014-10-04 20:48 UTC (permalink / raw)
  To: Stefan Hajnoczi, qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini, Fam Zheng

On 01.10.2014 19:01, Stefan Hajnoczi wrote:
> The stream block job must run in the BlockDriverState AioContext so that
> it works with dataplane.
>
> The basics of acquiring the AioContext are easy in blockdev.c.
>
> The tricky part is the completion code which drops part of the backing
> file chain.  This must be done in the main loop where bdrv_unref() and
> bdrv_close() are safe to call.  Use block_job_defer_to_main_loop() to
> achieve that.
>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>   block/stream.c | 50 ++++++++++++++++++++++++++++++++++++--------------
>   blockdev.c     | 16 ++++++++++++----
>   2 files changed, 48 insertions(+), 18 deletions(-)

Reviewed-by: Max Reitz <mreitz@redhat.com>


* Re: [Qemu-devel] [PATCH 09/11] block: let mirror blockjob run in BDS AioContext
  2014-10-01 17:01 ` [Qemu-devel] [PATCH 09/11] block: let mirror " Stefan Hajnoczi
@ 2014-10-04 21:07   ` Max Reitz
  0 siblings, 0 replies; 28+ messages in thread
From: Max Reitz @ 2014-10-04 21:07 UTC (permalink / raw)
  To: Stefan Hajnoczi, qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini, Fam Zheng

On 01.10.2014 19:01, Stefan Hajnoczi wrote:
> The mirror block job must run in the BlockDriverState AioContext so that
> it works with dataplane.
>
> Acquire the AioContext in blockdev.c so starting the block job is safe.
>
> Note that to_replace is treated separately from other BlockDriverStates
> in that it does not need to be in the same AioContext.  Explicitly
> acquire/release to_replace's AioContext when accessing it.
>
> The completion code in block/mirror.c must perform BDS graph
> manipulation and bdrv_reopen() from the main loop.  Use
> block_job_defer_to_main_loop() to achieve that.
>
> The bdrv_drain_all() call is not allowed outside the main loop since it
> could lead to lock ordering problems.  Use bdrv_drain(bs) instead
> because we have acquired the AioContext so nothing else can sneak in
> I/O.
>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>   block.c        | 13 +++++++--
>   block/mirror.c | 85 ++++++++++++++++++++++++++++++++++++++++------------------
>   blockdev.c     | 38 ++++++++++++++++++--------
>   3 files changed, 97 insertions(+), 39 deletions(-)

Reviewed-by: Max Reitz <mreitz@redhat.com>


* Re: [Qemu-devel] [PATCH 10/11] block: let commit blockjob run in BDS AioContext
  2014-10-01 17:01 ` [Qemu-devel] [PATCH 10/11] block: let commit " Stefan Hajnoczi
@ 2014-10-04 21:28   ` Max Reitz
  2014-10-06  9:30     ` Stefan Hajnoczi
  0 siblings, 1 reply; 28+ messages in thread
From: Max Reitz @ 2014-10-04 21:28 UTC (permalink / raw)
  To: Stefan Hajnoczi, qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini, Fam Zheng

On 01.10.2014 19:01, Stefan Hajnoczi wrote:
> The commit block job must run in the BlockDriverState AioContext so that
> it works with dataplane.
>
> Acquire the AioContext in blockdev.c so starting the block job is safe.
> One detail here is that the bdrv_drain_all() must be moved inside the
> aio_context_acquire() region so requests cannot sneak in between the
> drain and acquire.

Hm, I see the intent, but in patch 5 you said bdrv_drain_all() should 
never be called outside of the main loop (at least that's how it 
appeared to me). Wouldn't it be enough to use bdrv_drain() on the source 
BDS, like in patch 9?

> The completion code in block/commit.c must perform backing chain
> manipulation and bdrv_reopen() from the main loop.  Use
> block_job_defer_to_main_loop() to achieve that.
>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>   block/commit.c | 72 +++++++++++++++++++++++++++++++++++++---------------------
>   blockdev.c     | 29 +++++++++++++++--------
>   2 files changed, 66 insertions(+), 35 deletions(-)
>
> diff --git a/block/commit.c b/block/commit.c
> index 91517d3..0fd05dc 100644
> --- a/block/commit.c
> +++ b/block/commit.c
> @@ -60,17 +60,50 @@ static int coroutine_fn commit_populate(BlockDriverState *bs,
>       return 0;
>   }
>   
> -static void coroutine_fn commit_run(void *opaque)
> +typedef struct {
> +    int ret;
> +} CommitCompleteData;
> +
> +static void commit_complete(BlockJob *job, void *opaque)
>   {
> -    CommitBlockJob *s = opaque;
> +    CommitBlockJob *s = container_of(job, CommitBlockJob, common);
> +    CommitCompleteData *data = opaque;
>       BlockDriverState *active = s->active;
>       BlockDriverState *top = s->top;
>       BlockDriverState *base = s->base;
>       BlockDriverState *overlay_bs;
> +    int ret = data->ret;
> +
> +    if (!block_job_is_cancelled(&s->common) && ret == 0) {
> +        /* success */
> +        ret = bdrv_drop_intermediate(active, top, base, s->backing_file_str);
> +    }
> +
> +    /* restore base open flags here if appropriate (e.g., change the base back
> +     * to r/o). These reopens do not need to be atomic, since we won't abort
> +     * even on failure here */
> +    if (s->base_flags != bdrv_get_flags(base)) {
> +        bdrv_reopen(base, s->base_flags, NULL);
> +    }
> +    overlay_bs = bdrv_find_overlay(active, top);
> +    if (overlay_bs && s->orig_overlay_flags != bdrv_get_flags(overlay_bs)) {
> +        bdrv_reopen(overlay_bs, s->orig_overlay_flags, NULL);
> +    }
> +    g_free(s->backing_file_str);
> +    block_job_completed(&s->common, ret);
> +    g_free(data);
> +}
> +
> +static void coroutine_fn commit_run(void *opaque)
> +{
> +    CommitBlockJob *s = opaque;
> +    CommitCompleteData *data;
> +    BlockDriverState *top = s->top;
> +    BlockDriverState *base = s->base;
>       int64_t sector_num, end;
>       int ret = 0;
>       int n = 0;
> -    void *buf;
> +    void *buf = NULL;
>       int bytes_written = 0;
>       int64_t base_len;
>   
> @@ -78,18 +111,18 @@ static void coroutine_fn commit_run(void *opaque)
>   
>   
>       if (s->common.len < 0) {
> -        goto exit_restore_reopen;
> +        goto out;
>       }
>   
>       ret = base_len = bdrv_getlength(base);
>       if (base_len < 0) {
> -        goto exit_restore_reopen;
> +        goto out;
>       }
>   
>       if (base_len < s->common.len) {
>           ret = bdrv_truncate(base, s->common.len);
>           if (ret) {
> -            goto exit_restore_reopen;
> +            goto out;
>           }
>       }
>   
> @@ -128,7 +161,7 @@ wait:
>               if (s->on_error == BLOCKDEV_ON_ERROR_STOP ||
>                   s->on_error == BLOCKDEV_ON_ERROR_REPORT||
>                   (s->on_error == BLOCKDEV_ON_ERROR_ENOSPC && ret == -ENOSPC)) {
> -                goto exit_free_buf;
> +                goto out;
>               } else {
>                   n = 0;
>                   continue;
> @@ -140,27 +173,14 @@ wait:
>   
>       ret = 0;
>   
> -    if (!block_job_is_cancelled(&s->common) && sector_num == end) {
> -        /* success */
> -        ret = bdrv_drop_intermediate(active, top, base, s->backing_file_str);
> +out:
> +    if (buf) {
> +        qemu_vfree(buf);
>       }

Is this new condition really necessary? However, it won't hurt, so:

Reviewed-by: Max Reitz <mreitz@redhat.com>

A general question regarding the assertions here and in patch 8: I tried 
to break them, but I couldn't find a way. The way I tried was by 
creating two devices in different threads with just one qcow2 behind 
each of them, and then trying to attach one of those qcow2 BDS to the 
other as a backing file. I couldn't find out how, but I guess this is 
something we might want to support in the future. Can we actually be 
sure that all of the BDS in one tree are always running in the same AIO 
context? Are we already enforcing this?

And furthermore, basically all the calls to acquire an AIO context are 
of the form "aio_context = bdrv_get_aio_context(bs); 
aio_context_acquire(aio_context);". It is *extremely* unlikely, if 
possible at all, but wouldn't it be possible to change the BDS's AIO 
context from another thread after the first function returned and before 
the lock is acquired? If that is really the case, I think we should have 
some atomic bdrv_acquire_aio_context() function.

Max


* Re: [Qemu-devel] [PATCH 11/11] block: declare blockjobs and dataplane friends!
  2014-10-01 17:01 ` [Qemu-devel] [PATCH 11/11] block: declare blockjobs and dataplane friends! Stefan Hajnoczi
@ 2014-10-04 21:30   ` Max Reitz
  0 siblings, 0 replies; 28+ messages in thread
From: Max Reitz @ 2014-10-04 21:30 UTC (permalink / raw)
  To: Stefan Hajnoczi, qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini, Fam Zheng

On 01.10.2014 19:01, Stefan Hajnoczi wrote:
> Now that blockjobs use AioContext they are safe for use with dataplane.
> Unblock them!
>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>   blockjob.c                      | 1 +
>   hw/block/dataplane/virtio-blk.c | 5 +++++
>   2 files changed, 6 insertions(+)

Reviewed-by: Max Reitz <mreitz@redhat.com>


* Re: [Qemu-devel] [PATCH 10/11] block: let commit blockjob run in BDS AioContext
  2014-10-04 21:28   ` Max Reitz
@ 2014-10-06  9:30     ` Stefan Hajnoczi
  2014-10-07 15:18       ` Max Reitz
  0 siblings, 1 reply; 28+ messages in thread
From: Stefan Hajnoczi @ 2014-10-06  9:30 UTC (permalink / raw)
  To: Max Reitz; +Cc: Kevin Wolf, Paolo Bonzini, Fam Zheng, qemu-devel


On Sat, Oct 04, 2014 at 11:28:22PM +0200, Max Reitz wrote:
> On 01.10.2014 19:01, Stefan Hajnoczi wrote:
> >The commit block job must run in the BlockDriverState AioContext so that
> >it works with dataplane.
> >
> >Acquire the AioContext in blockdev.c so starting the block job is safe.
> >One detail here is that the bdrv_drain_all() must be moved inside the
> >aio_context_acquire() region so requests cannot sneak in between the
> >drain and acquire.
> 
> Hm, I see the intent, but in patch 5 you said bdrv_drain_all() should never
> be called outside of the main loop (at least that's how it appeared to me).
> Wouldn't it be enough to use bdrv_drain() on the source BDS, like in patch
> 9?

There is no contradiction here because qmp_block_commit() is invoked by
the QEMU monitor from the main loop.

The problem with bdrv_drain_all() is that it acquires AioContexts.  If
called outside the main loop without taking special care, it could
result in lock ordering problems (e.g. two threads trying to acquire all
AioContexts at the same time while already holding their respective
contexts).

qmp_block_commit() is just traditional QEMU global mutex code so it is
allowed to call bdrv_drain_all().
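For illustration, the classic way to avoid such lock ordering problems is to always take multiple locks in a fixed global order, for example by address. This is a sketch with plain pthreads and made-up helper names; QEMU instead sidesteps the issue by restricting bdrv_drain_all() to the main loop:

```c
#include <pthread.h>

/* Acquire two mutexes in a fixed global order (by address) so that two
 * threads grabbing the same pair can never deadlock, regardless of the
 * order in which each caller names them. */
void acquire_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    if (a > b) {
        pthread_mutex_t *tmp = a;
        a = b;
        b = tmp;
    }
    pthread_mutex_lock(a);
    pthread_mutex_lock(b);
}

void release_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    pthread_mutex_unlock(a);
    pthread_mutex_unlock(b);
}

/* Returns 0 if taking the pair in both naming orders succeeds. */
int lock_order_demo(void)
{
    pthread_mutex_t m1 = PTHREAD_MUTEX_INITIALIZER;
    pthread_mutex_t m2 = PTHREAD_MUTEX_INITIALIZER;

    acquire_pair(&m1, &m2);
    release_pair(&m1, &m2);
    acquire_pair(&m2, &m1);    /* same pair, reversed naming order */
    release_pair(&m2, &m1);
    return 0;
}
```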

> >@@ -140,27 +173,14 @@ wait:
> >      ret = 0;
> >-    if (!block_job_is_cancelled(&s->common) && sector_num == end) {
> >-        /* success */
> >-        ret = bdrv_drop_intermediate(active, top, base, s->backing_file_str);
> >+out:
> >+    if (buf) {
> >+        qemu_vfree(buf);
> >      }
> 
> Is this new condition really necessary? However, it won't hurt, so:

This was a mistake.  Since commit
94c8ff3a01d9bd1005f066a0ee3fe43c842a43b7 ("w32: Make qemu_vfree() accept
NULL like the POSIX implementation") it is no longer necessary to check
for NULL pointers.
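This mirrors free() in standard C, where passing NULL is a defined no-op; a trivial self-contained illustration of why such a guard is redundant:

```c
#include <stdlib.h>

/* free(NULL) is defined as a no-op by the C standard, so no NULL
 * guard is needed before freeing a possibly-unallocated buffer. */
int free_null_is_safe(void)
{
    void *buf = NULL;
    free(buf);          /* no-op, no crash */
    buf = malloc(16);
    free(buf);
    return 0;
}
```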

You can't teach an old dog new tricks :).

Thanks, will fix in the next revision!

> Reviewed-by: Max Reitz <mreitz@redhat.com>
> 
> A general question regarding the assertions here and in patch 8: I tried to
> break them, but I couldn't find a way. The way I tried was by creating two
> devices in different threads with just one qcow2 behind each of them, and
> then trying to attach one of those qcow2 BDS to the other as a backing file.
> I couldn't find out how, but I guess this is something we might want to
> support in the future. Can we actually be sure that all of the BDS in one
> tree are always running in the same AIO context? Are we already enforcing
> this?

bdrv_set_aio_context() is recursive so it also sets all the child nodes.
That is the only mechanism to ensure AioContext is consistent across
nodes.
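The recursive propagation can be sketched with a toy node chain (a hypothetical struct, not the real BlockDriverState):

```c
#include <stddef.h>

/* Toy stand-in for a BDS chain: each node has an AioContext id and an
 * optional backing node. */
typedef struct ToyBDS {
    int ctx;
    struct ToyBDS *backing;
} ToyBDS;

/* Like bdrv_set_aio_context(): assign the context to the node and
 * recurse into its children so the whole chain stays consistent. */
void toy_set_ctx(ToyBDS *bs, int ctx)
{
    if (!bs) {
        return;
    }
    bs->ctx = ctx;
    toy_set_ctx(bs->backing, ctx);
}

/* Build top -> mid -> base, move the top node to context 7, and check
 * that the whole backing chain followed. */
int toy_chain_consistent(void)
{
    ToyBDS base = { 0, NULL };
    ToyBDS mid  = { 0, &base };
    ToyBDS top  = { 0, &mid };

    toy_set_ctx(&top, 7);
    return top.ctx == 7 && mid.ctx == 7 && base.ctx == 7;
}
```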

When the BDS graph is manipulated (e.g. attaching new roots, swapping
nodes, etc) we don't perform checks today.

Markus has asked that I add the appropriate assertions so errors are
caught early.  I haven't done that yet but it's a good next step.

As far as I'm aware, these patches don't introduce cases where we would
make the AioContext in the graph inconsistent.  So I see the AioContext
consistency assertions as a separate patch series (which I will work on
next...hopefully not to discover horrible problems!).

> And furthermore, basically all the calls to acquire an AIO context are of
> the form "aio_context = bdrv_get_aio_context(bs);
> aio_context_acquire(aio_context);". It is *extremely* unlikely if possible
> at all, but wouldn't it be possible to change the BDS's AIO context from
> another thread after the first function returned and before the lock is
> acquired? If that is really the case, I think we should have some atomic
> bdrv_acquire_aio_context() function.

No, because only the main loop calls bdrv_set_aio_context().  At the
moment the case you mentioned cannot happen.

Ultimately, we should move away from "this only works in the main loop"
constraints.  In order to provide atomic BDS AioContext acquire we need
a global root that is thread-safe.  That doesn't exist today -
bdrv_states is protected by the QEMU global mutex only.

I thought about adding the infrastructure in this patch series but it is
not necessary yet and would make the series more complicated.

The idea is:

 * Add bdrv_states_lock to protect the global list of BlockDriverStates
 * Integrate bdrv_ref()/bdrv_unref() as well as bdrv_get_aio_context()
   so they are atomic and protected by the bdrv_states_lock

So bdrv_find() and other functions that access bdrv_states become the
entry points to acquiring BlockDriverStates in a thread-safe fashion.
bdrv_unref() will need rethinking too to prevent races between freeing a
BDS and bdrv_find().
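A rough sketch of that idea, using plain pthreads and made-up Toy* names in place of the real BDS types (bdrv_states_lock is the name proposed above; everything else is hypothetical):

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

typedef struct ToyState {
    const char *name;
    int refcnt;
    struct ToyState *next;
} ToyState;

static ToyState *toy_states;                       /* global list head */
static pthread_mutex_t toy_states_lock = PTHREAD_MUTEX_INITIALIZER;

void toy_register(ToyState *s)
{
    pthread_mutex_lock(&toy_states_lock);
    s->next = toy_states;
    toy_states = s;
    pthread_mutex_unlock(&toy_states_lock);
}

/* Atomically look a node up by name and take a reference under the
 * list lock, so the caller owns the node even if another thread drops
 * its own reference concurrently. */
ToyState *toy_find_and_ref(const char *name)
{
    ToyState *s;

    pthread_mutex_lock(&toy_states_lock);
    for (s = toy_states; s; s = s->next) {
        if (!strcmp(s->name, name)) {
            s->refcnt++;                           /* ref under the lock */
            break;
        }
    }
    pthread_mutex_unlock(&toy_states_lock);
    return s;
}

int toy_demo(void)
{
    static ToyState drive = { "drive0", 1, NULL };
    ToyState *found;

    toy_register(&drive);
    found = toy_find_and_ref("drive0");
    return found == &drive && found->refcnt == 2;
}
```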

Can you think of a place where we need this today?  I haven't found one
yet but would like one to develop the code against.

Stefan



* Re: [Qemu-devel] [PATCH 10/11] block: let commit blockjob run in BDS AioContext
  2014-10-06  9:30     ` Stefan Hajnoczi
@ 2014-10-07 15:18       ` Max Reitz
  0 siblings, 0 replies; 28+ messages in thread
From: Max Reitz @ 2014-10-07 15:18 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Kevin Wolf, Paolo Bonzini, Fam Zheng, qemu-devel

On 06.10.2014 11:30, Stefan Hajnoczi wrote:
> On Sat, Oct 04, 2014 at 11:28:22PM +0200, Max Reitz wrote:
>> On 01.10.2014 19:01, Stefan Hajnoczi wrote:
>>> The commit block job must run in the BlockDriverState AioContext so that
>>> it works with dataplane.
>>>
>>> Acquire the AioContext in blockdev.c so starting the block job is safe.
>>> One detail here is that the bdrv_drain_all() must be moved inside the
>>> aio_context_acquire() region so requests cannot sneak in between the
>>> drain and acquire.
>> Hm, I see the intent, but in patch 5 you said bdrv_drain_all() should never
>> be called outside of the main loop (at least that's how it appeared to me).
>> Wouldn't it be enough to use bdrv_drain() on the source BDS, like in patch
>> 9?
> There is no contradiction here because qmp_block_commit() is invoked by
> the QEMU monitor from the main loop.
>
> The problem with bdrv_drain_all() is that it acquires AioContexts.  If
> called outside the main loop without taking special care, it could
> result in lock ordering problems (e.g. two threads trying to acquire all
> AioContexts at the same time while already holding their respective
> contexts).
>
> qmp_block_commit() is just traditional QEMU global mutex code so it is
> allowed to call bdrv_drain_all().

Hm, okay then.

>>> @@ -140,27 +173,14 @@ wait:
>>>       ret = 0;
>>> -    if (!block_job_is_cancelled(&s->common) && sector_num == end) {
>>> -        /* success */
>>> -        ret = bdrv_drop_intermediate(active, top, base, s->backing_file_str);
>>> +out:
>>> +    if (buf) {
>>> +        qemu_vfree(buf);
>>>       }
>> Is this new condition really necessary? However, it won't hurt, so:
> This was a mistake.  Since commit
> 94c8ff3a01d9bd1005f066a0ee3fe43c842a43b7 ("w32: Make qemu_vfree() accept
> NULL like the POSIX implementation") it is no longer necessary to check
> for NULL pointers.
>
> You can't teach an old dog new tricks :).
>
> Thanks, will fix in the next revision!
>
>> Reviewed-by: Max Reitz <mreitz@redhat.com>
>>
>> A general question regarding the assertions here and in patch 8: I tried to
>> break them, but I couldn't find a way. The way I tried was by creating two
>> devices in different threads with just one qcow2 behind each of them, and
>> then trying to attach one of those qcow2 BDS to the other as a backing file.
>> I couldn't find out how, but I guess this is something we might want to
>> support in the future. Can we actually be sure that all of the BDS in one
>> tree are always running in the same AIO context? Are we already enforcing
>> this?
> bdrv_set_aio_context() is recursive so it also sets all the child nodes.
> That is the only mechanism to ensure AioContext is consistent across
> nodes.
>
> When the BDS graph is manipulated (e.g. attaching new roots, swapping
> nodes, etc) we don't perform checks today.
>
> Markus has asked that I add the appropriate assertions so errors are
> caught early.  I haven't done that yet but it's a good next step.

Okay, seems good to me. It's not possible to break them now, and if it 
ever becomes possible, the assertions will at least catch it.

> As far as I'm aware, these patches don't introduce cases where we would
> make the AioContext in the graph inconsistent.  So I see the AioContext
> consistency assertions as a separate patch series (which I will work on
> next...hopefully not to discover horrible problems!).
>
>> And furthermore, basically all the calls to acquire an AIO context are of
>> the form "aio_context = bdrv_get_aio_context(bs);
>> aio_context_acquire(aio_context);". It is *extremely* unlikely if possible
>> at all, but wouldn't it be possible to change the BDS's AIO context from
>> another thread after the first function returned and before the lock is
>> acquired? If that is really the case, I think we should have some atomic
>> bdrv_acquire_aio_context() function.
> No, because only the main loop calls bdrv_set_aio_context().  At the
> moment the case you mentioned cannot happen.
>
> Ultimately, we should move away from "this only works in the main loop"
> constraints.  In order to provide atomic BDS AioContext acquire we need
> a global root that is thread-safe.  That doesn't exist today -
> bdrv_states is protected by the QEMU global mutex only.
>
> I thought about adding the infrastructure in this patch series but it is
> not necessary yet and would make the series more complicated.
>
> The idea is:
>
>   * Add bdrv_states_lock to protect the global list of BlockDriverStates
>   * Integrate bdrv_ref()/bdrv_unref() as well as bdrv_get_aio_context()
>     so they are atomic and protected by the bdrv_states_lock
>
> So bdrv_find() and other functions that access bdrv_states become the
> entry points to acquiring BlockDriverStates in a thread-safe fashion.
> bdrv_unref() will need rethinking too to prevent races between freeing a
> BDS and bdrv_find().
>
> Can you think of a place where we need this today?  I haven't found one
> yet but would like one to develop the code against.

No, I can't think of anything, as long as QMP commands always arrive 
through the main loop.

Thank you for your explanations!

Max


end of thread, other threads:[~2014-10-07 15:18 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-10-01 17:01 [Qemu-devel] [PATCH 00/11] block: allow blockjobs to coexist with dataplane Stefan Hajnoczi
2014-10-01 17:01 ` [Qemu-devel] [PATCH 01/11] block: acquire AioContext in generic blockjob QMP commands Stefan Hajnoczi
2014-10-01 18:27   ` Max Reitz
2014-10-01 17:01 ` [Qemu-devel] [PATCH 02/11] blockdev: acquire AioContext in do_qmp_query_block_jobs_one() Stefan Hajnoczi
2014-10-01 18:32   ` Max Reitz
2014-10-01 17:01 ` [Qemu-devel] [PATCH 03/11] blockdev: acquire AioContext in blockdev_mark_auto_del() Stefan Hajnoczi
2014-10-01 18:39   ` Max Reitz
2014-10-01 17:01 ` [Qemu-devel] [PATCH 04/11] blockdev: add note that block_job_cb() must be thread-safe Stefan Hajnoczi
2014-10-01 18:42   ` Max Reitz
2014-10-01 17:01 ` [Qemu-devel] [PATCH 05/11] blockjob: add block_job_defer_to_main_loop() Stefan Hajnoczi
2014-10-01 19:06   ` Max Reitz
2014-10-02 14:58     ` Stefan Hajnoczi
2014-10-01 17:01 ` [Qemu-devel] [PATCH 06/11] block: add bdrv_drain() Stefan Hajnoczi
2014-10-01 19:15   ` Max Reitz
2014-10-01 17:01 ` [Qemu-devel] [PATCH 07/11] block: let backup blockjob run in BDS AioContext Stefan Hajnoczi
2014-10-01 19:38   ` Max Reitz
2014-10-02 15:08     ` Stefan Hajnoczi
2014-10-04 19:54       ` Max Reitz
2014-10-01 17:01 ` [Qemu-devel] [PATCH 08/11] block: let stream " Stefan Hajnoczi
2014-10-04 20:48   ` Max Reitz
2014-10-01 17:01 ` [Qemu-devel] [PATCH 09/11] block: let mirror " Stefan Hajnoczi
2014-10-04 21:07   ` Max Reitz
2014-10-01 17:01 ` [Qemu-devel] [PATCH 10/11] block: let commit " Stefan Hajnoczi
2014-10-04 21:28   ` Max Reitz
2014-10-06  9:30     ` Stefan Hajnoczi
2014-10-07 15:18       ` Max Reitz
2014-10-01 17:01 ` [Qemu-devel] [PATCH 11/11] block: declare blockjobs and dataplane friends! Stefan Hajnoczi
2014-10-04 21:30   ` Max Reitz
