* [Qemu-devel] [PATCH v3 0/6] jobs: fix transactional race condition
@ 2016-11-02 17:50 John Snow
  2016-11-02 17:50 ` [Qemu-devel] [PATCH v3 1/6] blockjob: fix dead pointer in txn list John Snow
                   ` (6 more replies)
  0 siblings, 7 replies; 22+ messages in thread
From: John Snow @ 2016-11-02 17:50 UTC
  To: qemu-block
  Cc: kwolf, vsementsov, stefanha, pbonzini, jcody, qemu-devel,
	John Snow

There are a few problems with transactional job completion right now.

First, if jobs complete so quickly that they finish before the remaining
jobs get a chance to join the transaction, completion can leave the
transaction in an inconsistent state: the QLIST can become corrupted and
the transactional jobs can complete in batches or phases instead of all
together.

Second, if two or more jobs defer to the main loop at roughly the same
time, it's possible for one job's cleanup to directly invoke the other
job's cleanup from within the same thread, leading to a situation that
will deadlock the entire transaction.
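
To make the second mode concrete, here is a toy, self-contained model
(not QEMU code; the names are illustrative) of the mutual wait:

    #include <stdbool.h>
    #include <stdio.h>

    /* Job A's cleanup synchronously invokes job B's cleanup; B then
     * waits on A, which can never make progress again. */
    struct job {
        const char *name;
        bool cleaning;
        struct job *peer;
    };

    static void cleanup(struct job *job)
    {
        job->cleaning = true;
        printf("%s: cleaning up\n", job->name);
        if (job->peer->cleaning) {
            /* In the real code, this wait never returns. */
            printf("%s: waiting on %s -> deadlock\n",
                   job->name, job->peer->name);
            return;
        }
        cleanup(job->peer);   /* directly invoke the peer's cleanup */
    }

    int main(void)
    {
        struct job a = { "job A" }, b = { "job B" };
        a.peer = &b;
        b.peer = &a;
        cleanup(&a);   /* A enters B's cleanup; B then waits on A */
        return 0;
    }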

Thanks to Vladimir for pointing out these modes of failure.

===
v3:
===

- Rebase to origin/master, requisite patches now upstream.

===
v2:
===

- Correct Vladimir's email (Sorry!)
- Add test as a variant of an existing test [Vladimir]

________________________________________________________________________________

For convenience, this branch is available at:
https://github.com/jnsnow/qemu.git branch job-fix-race-condition
https://github.com/jnsnow/qemu/tree/job-fix-race-condition

This version is tagged job-fix-race-condition-v3:
https://github.com/jnsnow/qemu/releases/tag/job-fix-race-condition-v3

John Snow (5):
  blockjob: add .clean property
  blockjob: add .start field
  blockjob: add block_job_start
  blockjob: refactor backup_start as backup_job_create
  iotests: add transactional failure race test

Vladimir Sementsov-Ogievskiy (1):
  blockjob: fix dead pointer in txn list

 block/backup.c               | 63 ++++++++++++++++++---------------
 block/commit.c               |  4 +--
 block/mirror.c               |  5 +--
 block/replication.c          | 12 ++++---
 block/stream.c               |  4 +--
 blockdev.c                   | 83 ++++++++++++++++++++++++++++----------------
 blockjob.c                   | 55 ++++++++++++++++++++++-------
 include/block/block_int.h    | 23 ++++++------
 include/block/blockjob.h     |  9 +++++
 include/block/blockjob_int.h | 11 ++++++
 tests/qemu-iotests/124       | 53 ++++++++++++++++++----------
 tests/qemu-iotests/124.out   |  4 +--
 tests/test-blockjob-txn.c    | 12 +++----
 13 files changed, 221 insertions(+), 117 deletions(-)

-- 
2.7.4

* [Qemu-devel] [PATCH v3 1/6] blockjob: fix dead pointer in txn list
  2016-11-02 17:50 [Qemu-devel] [PATCH v3 0/6] jobs: fix transactional race condition John Snow
@ 2016-11-02 17:50 ` John Snow
  2016-11-08  2:47   ` Jeff Cody
  2016-11-02 17:50 ` [Qemu-devel] [PATCH v3 2/6] blockjob: add .clean property John Snow
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 22+ messages in thread
From: John Snow @ 2016-11-02 17:50 UTC
  To: qemu-block
  Cc: kwolf, vsementsov, stefanha, pbonzini, jcody, qemu-devel,
	John Snow

From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

Though this case is not expected to be reached under normal
circumstances, if we do not gracefully deconstruct the transaction
QLIST, we may wind up with stale pointers in the list.

The rest of this series attempts to address the underlying issues,
but this should fix list inconsistencies.
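
For illustration, a minimal, self-contained sketch of the hazard (this
is not QEMU code; it uses the BSD <sys/queue.h> LIST macros, which
QEMU's QLIST mirrors):

    #include <sys/queue.h>
    #include <stdio.h>
    #include <stdlib.h>

    struct job {
        int id;
        LIST_ENTRY(job) txn_list;      /* like BlockJob's txn_list link */
    };

    LIST_HEAD(txn_head, job);

    int main(void)
    {
        struct txn_head txn = LIST_HEAD_INITIALIZER(txn);
        struct job *j, *next;

        for (int i = 0; i < 3; i++) {
            j = malloc(sizeof(*j));
            j->id = i;
            LIST_INSERT_HEAD(&txn, j, txn_list);
        }

        /* Complete job 1: unlink it first (the QLIST_REMOVE added by
         * this patch), then free it. Freeing without unlinking would
         * leave a stale pointer for the next traversal. */
        for (j = LIST_FIRST(&txn); j != NULL; j = next) {
            next = LIST_NEXT(j, txn_list);
            if (j->id == 1) {
                LIST_REMOVE(j, txn_list);
                free(j);
            }
        }

        LIST_FOREACH(j, &txn, txn_list) {
            printf("job %d still in txn\n", j->id);
        }
        return 0;
    }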

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Tested-by: John Snow <jsnow@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
[Rewrote commit message. --js]
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
---
 blockjob.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/blockjob.c b/blockjob.c
index 4aa14a4..4d0ef53 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -256,6 +256,7 @@ static void block_job_completed_single(BlockJob *job)
     }
 
     if (job->txn) {
+        QLIST_REMOVE(job, txn_list);
         block_job_txn_unref(job->txn);
     }
     block_job_unref(job);
-- 
2.7.4

* [Qemu-devel] [PATCH v3 2/6] blockjob: add .clean property
  2016-11-02 17:50 [Qemu-devel] [PATCH v3 0/6] jobs: fix transactional race condition John Snow
  2016-11-02 17:50 ` [Qemu-devel] [PATCH v3 1/6] blockjob: fix dead pointer in txn list John Snow
@ 2016-11-02 17:50 ` John Snow
  2016-11-08  2:51   ` Jeff Cody
  2016-11-02 17:50 ` [Qemu-devel] [PATCH v3 3/6] blockjob: add .start field John Snow
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 22+ messages in thread
From: John Snow @ 2016-11-02 17:50 UTC
  To: qemu-block
  Cc: kwolf, vsementsov, stefanha, pbonzini, jcody, qemu-devel,
	John Snow

Cleaning up after we have deferred to the main thread but before the
transaction has converged can be dangerous and result in deadlocks
if the job cleanup invokes any BH polling loops.

A job may attempt to begin cleaning up, but may induce another job to
enter its cleanup routine. The second job, part of our same transaction,
will block waiting for the first job to finish, so neither job may now
make progress.

To rectify this, allow jobs to register a cleanup operation that will
always run, regardless of whether the job was in a transaction and
regardless of whether the transaction job group completed successfully.

Move sensitive cleanup to this callback instead; it is guaranteed to run
only after the transaction has converged, which removes the sensitive
timing constraints from said cleanup.

Furthermore, in future patches these cleanup operations will be performed
regardless of whether or not we actually started the job. Therefore,
cleanup callbacks should essentially confine themselves to undoing create
operations, e.g. setup actions taken in what is now backup_start.
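
A minimal, self-contained sketch of the intended ordering (not QEMU
code; names are illustrative): whichever of .commit()/.abort() runs,
.clean() always runs afterwards.

    #include <stdio.h>

    struct job;

    struct job_driver {
        void (*commit)(struct job *job);
        void (*abort)(struct job *job);
        void (*clean)(struct job *job);
    };

    struct job {
        const struct job_driver *driver;
        int ret;
    };

    static void backup_commit(struct job *job) { puts("commit"); }
    static void backup_abort(struct job *job)  { puts("abort"); }
    static void backup_clean(struct job *job)  { puts("clean: drop target"); }

    static const struct job_driver driver = {
        .commit = backup_commit,
        .abort  = backup_abort,
        .clean  = backup_clean,
    };

    /* Shaped like block_job_completed_single() with this patch. */
    static void completed_single(struct job *job)
    {
        if (job->ret == 0) {
            if (job->driver->commit) {
                job->driver->commit(job);
            }
        } else {
            if (job->driver->abort) {
                job->driver->abort(job);
            }
        }
        if (job->driver->clean) {      /* unconditional */
            job->driver->clean(job);
        }
    }

    int main(void)
    {
        struct job ok = { &driver, 0 }, failed = { &driver, -5 };
        completed_single(&ok);         /* commit, then clean */
        completed_single(&failed);     /* abort, then clean */
        return 0;
    }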

Reported-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: John Snow <jsnow@redhat.com>
---
 block/backup.c               | 15 ++++++++++-----
 blockjob.c                   |  3 +++
 include/block/blockjob_int.h |  8 ++++++++
 3 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 7b5d8a3..734a24c 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -242,6 +242,14 @@ static void backup_abort(BlockJob *job)
     }
 }
 
+static void backup_clean(BlockJob *job)
+{
+    BackupBlockJob *s = container_of(job, BackupBlockJob, common);
+    assert(s->target);
+    blk_unref(s->target);
+    s->target = NULL;
+}
+
 static void backup_attached_aio_context(BlockJob *job, AioContext *aio_context)
 {
     BackupBlockJob *s = container_of(job, BackupBlockJob, common);
@@ -321,6 +329,7 @@ static const BlockJobDriver backup_job_driver = {
     .set_speed              = backup_set_speed,
     .commit                 = backup_commit,
     .abort                  = backup_abort,
+    .clean                  = backup_clean,
     .attached_aio_context   = backup_attached_aio_context,
     .drain                  = backup_drain,
 };
@@ -343,12 +352,8 @@ typedef struct {
 
 static void backup_complete(BlockJob *job, void *opaque)
 {
-    BackupBlockJob *s = container_of(job, BackupBlockJob, common);
     BackupCompleteData *data = opaque;
 
-    blk_unref(s->target);
-    s->target = NULL;
-
     block_job_completed(job, data->ret);
     g_free(data);
 }
@@ -658,7 +663,7 @@ void backup_start(const char *job_id, BlockDriverState *bs,
         bdrv_reclaim_dirty_bitmap(bs, sync_bitmap, NULL);
     }
     if (job) {
-        blk_unref(job->target);
+        backup_clean(&job->common);
         block_job_unref(&job->common);
     }
 }
diff --git a/blockjob.c b/blockjob.c
index 4d0ef53..e3c458c 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -241,6 +241,9 @@ static void block_job_completed_single(BlockJob *job)
             job->driver->abort(job);
         }
     }
+    if (job->driver->clean) {
+        job->driver->clean(job);
+    }
 
     if (job->cb) {
         job->cb(job->opaque, job->ret);
diff --git a/include/block/blockjob_int.h b/include/block/blockjob_int.h
index 40275e4..60d91a0 100644
--- a/include/block/blockjob_int.h
+++ b/include/block/blockjob_int.h
@@ -74,6 +74,14 @@ struct BlockJobDriver {
     void (*abort)(BlockJob *job);
 
     /**
+     * If the callback is not NULL, it will be invoked after a call to either
+     * .commit() or .abort(). Regardless of which callback is invoked after
+     * completion, .clean() will always be called, even if the job does not
+     * belong to a transaction group.
+     */
+    void (*clean)(BlockJob *job);
+
+    /**
      * If the callback is not NULL, it will be invoked when the job transitions
      * into the paused state.  Paused jobs must not perform any asynchronous
      * I/O or event loop activity.  This callback is used to quiesce jobs.
-- 
2.7.4

* [Qemu-devel] [PATCH v3 3/6] blockjob: add .start field
  2016-11-02 17:50 [Qemu-devel] [PATCH v3 0/6] jobs: fix transactional race condition John Snow
  2016-11-02 17:50 ` [Qemu-devel] [PATCH v3 1/6] blockjob: fix dead pointer in txn list John Snow
  2016-11-02 17:50 ` [Qemu-devel] [PATCH v3 2/6] blockjob: add .clean property John Snow
@ 2016-11-02 17:50 ` John Snow
  2016-11-08  2:58   ` Jeff Cody
  2016-11-02 17:50 ` [Qemu-devel] [PATCH v3 4/6] blockjob: add block_job_start John Snow
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 22+ messages in thread
From: John Snow @ 2016-11-02 17:50 UTC
  To: qemu-block
  Cc: kwolf, vsementsov, stefanha, pbonzini, jcody, qemu-devel,
	John Snow

Add an explicit start field to specify the entrypoint. We already own
the coroutine itself and manage its lifetime; let's take control of its
creation, too.

This will allow us to delay creation of the actual coroutine until we
know we'll actually start a BlockJob in block_job_start. This avoids
the sticky question of how to "un-create" a Coroutine that hasn't been
started yet.
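
A rough, self-contained sketch of the pattern (not QEMU code): with the
entrypoint stored in the driver, no execution context needs to exist
until an explicit start.

    #include <stdio.h>
    #include <stddef.h>

    struct job;
    typedef void coroutine_entry(struct job *job);

    struct job_driver {
        coroutine_entry *start;        /* like BlockJobDriver.start */
    };

    struct job {
        const struct job_driver *driver;
        coroutine_entry *co;           /* stand-in for the Coroutine */
    };

    static void backup_run(struct job *job)
    {
        puts("backup_run entered");
    }

    static const struct job_driver backup_driver = {
        .start = backup_run,
    };

    int main(void)
    {
        /* Creation: no coroutine yet, so there is nothing to
         * "un-create" if we decide never to start the job. */
        struct job job = { .driver = &backup_driver, .co = NULL };

        /* Start: create the context from driver->start and enter it. */
        job.co = job.driver->start;
        job.co(&job);
        return 0;
    }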

Signed-off-by: John Snow <jsnow@redhat.com>
---
 block/backup.c               | 25 +++++++++++++------------
 block/commit.c               |  3 ++-
 block/mirror.c               |  4 +++-
 block/stream.c               |  3 ++-
 include/block/blockjob_int.h |  3 +++
 5 files changed, 23 insertions(+), 15 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 734a24c..4ed4494 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -323,17 +323,6 @@ static void backup_drain(BlockJob *job)
     }
 }
 
-static const BlockJobDriver backup_job_driver = {
-    .instance_size          = sizeof(BackupBlockJob),
-    .job_type               = BLOCK_JOB_TYPE_BACKUP,
-    .set_speed              = backup_set_speed,
-    .commit                 = backup_commit,
-    .abort                  = backup_abort,
-    .clean                  = backup_clean,
-    .attached_aio_context   = backup_attached_aio_context,
-    .drain                  = backup_drain,
-};
-
 static BlockErrorAction backup_error_action(BackupBlockJob *job,
                                             bool read, int error)
 {
@@ -542,6 +531,18 @@ static void coroutine_fn backup_run(void *opaque)
     block_job_defer_to_main_loop(&job->common, backup_complete, data);
 }
 
+static const BlockJobDriver backup_job_driver = {
+    .instance_size          = sizeof(BackupBlockJob),
+    .job_type               = BLOCK_JOB_TYPE_BACKUP,
+    .start                  = backup_run,
+    .set_speed              = backup_set_speed,
+    .commit                 = backup_commit,
+    .abort                  = backup_abort,
+    .clean                  = backup_clean,
+    .attached_aio_context   = backup_attached_aio_context,
+    .drain                  = backup_drain,
+};
+
 void backup_start(const char *job_id, BlockDriverState *bs,
                   BlockDriverState *target, int64_t speed,
                   MirrorSyncMode sync_mode, BdrvDirtyBitmap *sync_bitmap,
@@ -653,7 +654,7 @@ void backup_start(const char *job_id, BlockDriverState *bs,
 
     block_job_add_bdrv(&job->common, target);
     job->common.len = len;
-    job->common.co = qemu_coroutine_create(backup_run, job);
+    job->common.co = qemu_coroutine_create(job->common.driver->start, job);
     block_job_txn_add_job(txn, &job->common);
     qemu_coroutine_enter(job->common.co);
     return;
diff --git a/block/commit.c b/block/commit.c
index e1eda89..20d27e2 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -205,6 +205,7 @@ static const BlockJobDriver commit_job_driver = {
     .instance_size = sizeof(CommitBlockJob),
     .job_type      = BLOCK_JOB_TYPE_COMMIT,
     .set_speed     = commit_set_speed,
+    .start         = commit_run,
 };
 
 void commit_start(const char *job_id, BlockDriverState *bs,
@@ -288,7 +289,7 @@ void commit_start(const char *job_id, BlockDriverState *bs,
     s->backing_file_str = g_strdup(backing_file_str);
 
     s->on_error = on_error;
-    s->common.co = qemu_coroutine_create(commit_run, s);
+    s->common.co = qemu_coroutine_create(s->common.driver->start, s);
 
     trace_commit_start(bs, base, top, s, s->common.co);
     qemu_coroutine_enter(s->common.co);
diff --git a/block/mirror.c b/block/mirror.c
index b2c1fb8..659e09c 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -920,6 +920,7 @@ static const BlockJobDriver mirror_job_driver = {
     .instance_size          = sizeof(MirrorBlockJob),
     .job_type               = BLOCK_JOB_TYPE_MIRROR,
     .set_speed              = mirror_set_speed,
+    .start                  = mirror_run,
     .complete               = mirror_complete,
     .pause                  = mirror_pause,
     .attached_aio_context   = mirror_attached_aio_context,
@@ -930,6 +931,7 @@ static const BlockJobDriver commit_active_job_driver = {
     .instance_size          = sizeof(MirrorBlockJob),
     .job_type               = BLOCK_JOB_TYPE_COMMIT,
     .set_speed              = mirror_set_speed,
+    .start                  = mirror_run,
     .complete               = mirror_complete,
     .pause                  = mirror_pause,
     .attached_aio_context   = mirror_attached_aio_context,
@@ -1007,7 +1009,7 @@ static void mirror_start_job(const char *job_id, BlockDriverState *bs,
         }
     }
 
-    s->common.co = qemu_coroutine_create(mirror_run, s);
+    s->common.co = qemu_coroutine_create(s->common.driver->start, s);
     trace_mirror_start(bs, s, s->common.co, opaque);
     qemu_coroutine_enter(s->common.co);
 }
diff --git a/block/stream.c b/block/stream.c
index b05856b..92309ff 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -218,6 +218,7 @@ static const BlockJobDriver stream_job_driver = {
     .instance_size = sizeof(StreamBlockJob),
     .job_type      = BLOCK_JOB_TYPE_STREAM,
     .set_speed     = stream_set_speed,
+    .start         = stream_run,
 };
 
 void stream_start(const char *job_id, BlockDriverState *bs,
@@ -254,7 +255,7 @@ void stream_start(const char *job_id, BlockDriverState *bs,
     s->bs_flags = orig_bs_flags;
 
     s->on_error = on_error;
-    s->common.co = qemu_coroutine_create(stream_run, s);
+    s->common.co = qemu_coroutine_create(s->common.driver->start, s);
     trace_stream_start(bs, base, s, s->common.co);
     qemu_coroutine_enter(s->common.co);
 }
diff --git a/include/block/blockjob_int.h b/include/block/blockjob_int.h
index 60d91a0..8223822 100644
--- a/include/block/blockjob_int.h
+++ b/include/block/blockjob_int.h
@@ -47,6 +47,9 @@ struct BlockJobDriver {
     /** Optional callback for job types that need to forward I/O status reset */
     void (*iostatus_reset)(BlockJob *job);
 
+    /** Mandatory: Entrypoint for the Coroutine. */
+    CoroutineEntry *start;
+
     /**
      * Optional callback for job types whose completion must be triggered
      * manually.
-- 
2.7.4

* [Qemu-devel] [PATCH v3 4/6] blockjob: add block_job_start
  2016-11-02 17:50 [Qemu-devel] [PATCH v3 0/6] jobs: fix transactional race condition John Snow
                   ` (2 preceding siblings ...)
  2016-11-02 17:50 ` [Qemu-devel] [PATCH v3 3/6] blockjob: add .start field John Snow
@ 2016-11-02 17:50 ` John Snow
  2016-11-03 12:17   ` Kevin Wolf
  2016-11-02 17:50 ` [Qemu-devel] [PATCH v3 5/6] blockjob: refactor backup_start as backup_job_create John Snow
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 22+ messages in thread
From: John Snow @ 2016-11-02 17:50 UTC
  To: qemu-block
  Cc: kwolf, vsementsov, stefanha, pbonzini, jcody, qemu-devel,
	John Snow

Instead of automatically starting jobs at creation time via backup_start
et al, we'd like to return a job object pointer that can be started
manually at a later point in time.

For now, add the block_job_start mechanism and start the jobs
automatically as we have been doing, with conversions job-by-job coming
in later patches.

Of note: cancellation of unstarted jobs will perform all the normal
cleanup as if the job had started, particularly abort and clean. The
only difference is that we will not emit any events, because the job
never actually started.
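
As a hedged, self-contained sketch (not QEMU code), the lifecycle looks
roughly like this: created paused and not busy, started explicitly, and
completing without events if cancelled before start.

    #include <stdbool.h>
    #include <stdio.h>
    #include <errno.h>

    struct job {
        bool busy, paused, started, cancelled;
        int ret;
    };

    static struct job job_create(void)
    {
        /* Born paused and not busy, as in block_job_create(). */
        return (struct job){ .busy = false, .paused = true };
    }

    static void job_completed(struct job *job, int ret)
    {
        job->ret = ret;
        /* abort/clean callbacks would run here in either case... */
        if (job->started) {
            puts(job->cancelled ? "event: cancelled" : "event: completed");
        }
        /* ...but an unstarted job completes without emitting events. */
    }

    static void job_start(struct job *job)
    {
        job->paused = false;
        job->busy = true;
        job->started = true;           /* stands in for job->co != NULL */
        job_completed(job, 0);         /* pretend the entrypoint ran */
    }

    static void job_cancel(struct job *job)
    {
        job->cancelled = true;
        if (!job->started) {
            job_completed(job, -ECANCELED);
        }
    }

    int main(void)
    {
        struct job a = job_create(), b = job_create();
        job_start(&a);     /* prints "event: completed" */
        job_cancel(&b);    /* never started: no event is emitted */
        return 0;
    }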

Signed-off-by: John Snow <jsnow@redhat.com>
---
 block/backup.c            |  3 +--
 block/commit.c            |  3 +--
 block/mirror.c            |  3 +--
 block/stream.c            |  3 +--
 blockjob.c                | 51 ++++++++++++++++++++++++++++++++++++-----------
 include/block/blockjob.h  |  9 +++++++++
 tests/test-blockjob-txn.c | 12 +++++------
 7 files changed, 58 insertions(+), 26 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 4ed4494..ae1b99a 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -654,9 +654,8 @@ void backup_start(const char *job_id, BlockDriverState *bs,
 
     block_job_add_bdrv(&job->common, target);
     job->common.len = len;
-    job->common.co = qemu_coroutine_create(job->common.driver->start, job);
     block_job_txn_add_job(txn, &job->common);
-    qemu_coroutine_enter(job->common.co);
+    block_job_start(&job->common);
     return;
 
  error:
diff --git a/block/commit.c b/block/commit.c
index 20d27e2..5b7c454 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -289,10 +289,9 @@ void commit_start(const char *job_id, BlockDriverState *bs,
     s->backing_file_str = g_strdup(backing_file_str);
 
     s->on_error = on_error;
-    s->common.co = qemu_coroutine_create(s->common.driver->start, s);
 
     trace_commit_start(bs, base, top, s, s->common.co);
-    qemu_coroutine_enter(s->common.co);
+    block_job_start(&s->common);
 }
 
 
diff --git a/block/mirror.c b/block/mirror.c
index 659e09c..c078d45 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1009,9 +1009,8 @@ static void mirror_start_job(const char *job_id, BlockDriverState *bs,
         }
     }
 
-    s->common.co = qemu_coroutine_create(s->common.driver->start, s);
     trace_mirror_start(bs, s, s->common.co, opaque);
-    qemu_coroutine_enter(s->common.co);
+    block_job_start(&s->common);
 }
 
 void mirror_start(const char *job_id, BlockDriverState *bs,
diff --git a/block/stream.c b/block/stream.c
index 92309ff..2de8d38 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -255,7 +255,6 @@ void stream_start(const char *job_id, BlockDriverState *bs,
     s->bs_flags = orig_bs_flags;
 
     s->on_error = on_error;
-    s->common.co = qemu_coroutine_create(s->common.driver->start, s);
     trace_stream_start(bs, base, s, s->common.co);
-    qemu_coroutine_enter(s->common.co);
+    block_job_start(&s->common);
 }
diff --git a/blockjob.c b/blockjob.c
index e3c458c..16c5159 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -174,7 +174,8 @@ void *block_job_create(const char *job_id, const BlockJobDriver *driver,
     job->blk           = blk;
     job->cb            = cb;
     job->opaque        = opaque;
-    job->busy          = true;
+    job->busy          = false;
+    job->paused        = true;
     job->refcnt        = 1;
     bs->job = job;
 
@@ -202,6 +203,21 @@ bool block_job_is_internal(BlockJob *job)
     return (job->id == NULL);
 }
 
+static bool block_job_started(BlockJob *job)
+{
+    return job->co;
+}
+
+void block_job_start(BlockJob *job)
+{
+    assert(job && !block_job_started(job) && job->paused &&
+           !job->busy && job->driver->start);
+    job->paused = false;
+    job->busy = true;
+    job->co = qemu_coroutine_create(job->driver->start, job);
+    qemu_coroutine_enter(job->co);
+}
+
 void block_job_ref(BlockJob *job)
 {
     ++job->refcnt;
@@ -248,14 +264,18 @@ static void block_job_completed_single(BlockJob *job)
     if (job->cb) {
         job->cb(job->opaque, job->ret);
     }
-    if (block_job_is_cancelled(job)) {
-        block_job_event_cancelled(job);
-    } else {
-        const char *msg = NULL;
-        if (job->ret < 0) {
-            msg = strerror(-job->ret);
+
+    /* Emit events only if we actually started */
+    if (block_job_started(job)) {
+        if (block_job_is_cancelled(job)) {
+            block_job_event_cancelled(job);
+        } else {
+            const char *msg = NULL;
+            if (job->ret < 0) {
+                msg = strerror(-job->ret);
+            }
+            block_job_event_completed(job, msg);
         }
-        block_job_event_completed(job, msg);
     }
 
     if (job->txn) {
@@ -363,7 +383,8 @@ void block_job_complete(BlockJob *job, Error **errp)
 {
     /* Should not be reachable via external interface for internal jobs */
     assert(job->id);
-    if (job->pause_count || job->cancelled || !job->driver->complete) {
+    if (job->pause_count || job->cancelled ||
+        !block_job_started(job) || !job->driver->complete) {
         error_setg(errp, "The active block job '%s' cannot be completed",
                    job->id);
         return;
@@ -395,6 +416,8 @@ bool block_job_user_paused(BlockJob *job)
 
 void coroutine_fn block_job_pause_point(BlockJob *job)
 {
+    assert(job && block_job_started(job));
+
     if (!block_job_should_pause(job)) {
         return;
     }
@@ -446,9 +469,13 @@ void block_job_enter(BlockJob *job)
 
 void block_job_cancel(BlockJob *job)
 {
-    job->cancelled = true;
-    block_job_iostatus_reset(job);
-    block_job_enter(job);
+    if (block_job_started(job)) {
+        job->cancelled = true;
+        block_job_iostatus_reset(job);
+        block_job_enter(job);
+    } else {
+        block_job_completed(job, -ECANCELED);
+    }
 }
 
 bool block_job_is_cancelled(BlockJob *job)
diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index 356cacf..1acb256 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -189,6 +189,15 @@ void block_job_add_bdrv(BlockJob *job, BlockDriverState *bs);
 void block_job_set_speed(BlockJob *job, int64_t speed, Error **errp);
 
 /**
+ * block_job_start:
+ * @job: A job that has not yet been started.
+ *
+ * Begins execution of a block job.
+ * Takes ownership of one reference to the job object.
+ */
+void block_job_start(BlockJob *job);
+
+/**
  * block_job_cancel:
  * @job: The job to be canceled.
  *
diff --git a/tests/test-blockjob-txn.c b/tests/test-blockjob-txn.c
index f9afc3b..b132e39 100644
--- a/tests/test-blockjob-txn.c
+++ b/tests/test-blockjob-txn.c
@@ -24,10 +24,6 @@ typedef struct {
     int *result;
 } TestBlockJob;
 
-static const BlockJobDriver test_block_job_driver = {
-    .instance_size = sizeof(TestBlockJob),
-};
-
 static void test_block_job_complete(BlockJob *job, void *opaque)
 {
     BlockDriverState *bs = blk_bs(job->blk);
@@ -77,6 +73,11 @@ static void test_block_job_cb(void *opaque, int ret)
     g_free(data);
 }
 
+static const BlockJobDriver test_block_job_driver = {
+    .instance_size = sizeof(TestBlockJob),
+    .start = test_block_job_run,
+};
+
 /* Create a block job that completes with a given return code after a given
  * number of event loop iterations.  The return code is stored in the given
  * result pointer.
@@ -104,10 +105,9 @@ static BlockJob *test_block_job_start(unsigned int iterations,
     s->use_timer = use_timer;
     s->rc = rc;
     s->result = result;
-    s->common.co = qemu_coroutine_create(test_block_job_run, s);
     data->job = s;
     data->result = result;
-    qemu_coroutine_enter(s->common.co);
+    block_job_start(&s->common);
     return &s->common;
 }
 
-- 
2.7.4

* [Qemu-devel] [PATCH v3 5/6] blockjob: refactor backup_start as backup_job_create
  2016-11-02 17:50 [Qemu-devel] [PATCH v3 0/6] jobs: fix transactional race condition John Snow
                   ` (3 preceding siblings ...)
  2016-11-02 17:50 ` [Qemu-devel] [PATCH v3 4/6] blockjob: add block_job_start John Snow
@ 2016-11-02 17:50 ` John Snow
  2016-11-03 13:17   ` Kevin Wolf
  2016-11-08  3:14   ` Jeff Cody
  2016-11-02 17:50 ` [Qemu-devel] [PATCH v3 6/6] iotests: add transactional failure race test John Snow
  2016-11-03 13:21 ` [Qemu-devel] [PATCH v3 0/6] jobs: fix transactional race condition Kevin Wolf
  6 siblings, 2 replies; 22+ messages in thread
From: John Snow @ 2016-11-02 17:50 UTC
  To: qemu-block
  Cc: kwolf, vsementsov, stefanha, pbonzini, jcody, qemu-devel,
	John Snow

Refactor backup_start as backup_job_create, which only creates the job,
but does not automatically start it. The old interface, 'backup_start',
is not kept, in order to limit the number of nearly-identical interfaces
that would have to be edited to keep up with QAPI changes in the future.

Callers that wish to synchronously start the backup job can
instead just call block_job_start immediately after calling
backup_job_create.

Transactions are updated to use the new interface, calling block_job_start
only during the .commit phase, which helps prevent race conditions where
jobs may finish before we even finish building the transaction. This may
happen, for instance, during empty block backup jobs.
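
A simplified, self-contained sketch of the two-phase idea (not QEMU
code): .prepare only creates, and nothing is entered until every action
has prepared, so no job can finish while the transaction is still being
built.

    #include <stdbool.h>
    #include <stdio.h>

    struct job {
        const char *name;
    };

    static void block_job_start(struct job *job)
    {
        printf("start %s\n", job->name);
    }

    struct action {
        struct job job;
        bool prepared;
    };

    int main(void)
    {
        struct action actions[] = { { { "backup0" } }, { { "backup1" } } };
        const int n = 2;
        bool all_prepared = true;

        /* .prepare phase: create the jobs, but do not run them. */
        for (int i = 0; i < n; i++) {
            actions[i].prepared = true;   /* pretend creation succeeded */
            all_prepared = all_prepared && actions[i].prepared;
        }

        /* .commit phase: only now are the jobs entered, so even an
         * instantly-finishing (e.g. empty) job cannot complete before
         * its siblings have joined the transaction. */
        if (all_prepared) {
            for (int i = 0; i < n; i++) {
                block_job_start(&actions[i].job);
            }
        }
        return 0;
    }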

Reported-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: John Snow <jsnow@redhat.com>
---
 block/backup.c            | 26 ++++++++-------
 block/replication.c       | 12 ++++---
 blockdev.c                | 83 ++++++++++++++++++++++++++++++-----------------
 include/block/block_int.h | 23 ++++++-------
 4 files changed, 87 insertions(+), 57 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index ae1b99a..ea38733 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -543,7 +543,7 @@ static const BlockJobDriver backup_job_driver = {
     .drain                  = backup_drain,
 };
 
-void backup_start(const char *job_id, BlockDriverState *bs,
+BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
                   BlockDriverState *target, int64_t speed,
                   MirrorSyncMode sync_mode, BdrvDirtyBitmap *sync_bitmap,
                   bool compress,
@@ -563,52 +563,52 @@ void backup_start(const char *job_id, BlockDriverState *bs,
 
     if (bs == target) {
         error_setg(errp, "Source and target cannot be the same");
-        return;
+        return NULL;
     }
 
     if (!bdrv_is_inserted(bs)) {
         error_setg(errp, "Device is not inserted: %s",
                    bdrv_get_device_name(bs));
-        return;
+        return NULL;
     }
 
     if (!bdrv_is_inserted(target)) {
         error_setg(errp, "Device is not inserted: %s",
                    bdrv_get_device_name(target));
-        return;
+        return NULL;
     }
 
     if (compress && target->drv->bdrv_co_pwritev_compressed == NULL) {
         error_setg(errp, "Compression is not supported for this drive %s",
                    bdrv_get_device_name(target));
-        return;
+        return NULL;
     }
 
     if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_BACKUP_SOURCE, errp)) {
-        return;
+        return NULL;
     }
 
     if (bdrv_op_is_blocked(target, BLOCK_OP_TYPE_BACKUP_TARGET, errp)) {
-        return;
+        return NULL;
     }
 
     if (sync_mode == MIRROR_SYNC_MODE_INCREMENTAL) {
         if (!sync_bitmap) {
             error_setg(errp, "must provide a valid bitmap name for "
                              "\"incremental\" sync mode");
-            return;
+            return NULL;
         }
 
         /* Create a new bitmap, and freeze/disable this one. */
         if (bdrv_dirty_bitmap_create_successor(bs, sync_bitmap, errp) < 0) {
-            return;
+            return NULL;
         }
     } else if (sync_bitmap) {
         error_setg(errp,
                    "a sync_bitmap was provided to backup_run, "
                    "but received an incompatible sync_mode (%s)",
                    MirrorSyncMode_lookup[sync_mode]);
-        return;
+        return NULL;
     }
 
     len = bdrv_getlength(bs);
@@ -655,8 +655,8 @@ void backup_start(const char *job_id, BlockDriverState *bs,
     block_job_add_bdrv(&job->common, target);
     job->common.len = len;
     block_job_txn_add_job(txn, &job->common);
-    block_job_start(&job->common);
-    return;
+
+    return &job->common;
 
  error:
     if (sync_bitmap) {
@@ -666,4 +666,6 @@ void backup_start(const char *job_id, BlockDriverState *bs,
         backup_clean(&job->common);
         block_job_unref(&job->common);
     }
+
+    return NULL;
 }
diff --git a/block/replication.c b/block/replication.c
index d5e2b0f..729dd12 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -421,6 +421,7 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
     int64_t active_length, hidden_length, disk_length;
     AioContext *aio_context;
     Error *local_err = NULL;
+    BlockJob *job;
 
     aio_context = bdrv_get_aio_context(bs);
     aio_context_acquire(aio_context);
@@ -508,17 +509,18 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
         bdrv_op_block_all(top_bs, s->blocker);
         bdrv_op_unblock(top_bs, BLOCK_OP_TYPE_DATAPLANE, s->blocker);
 
-        backup_start(NULL, s->secondary_disk->bs, s->hidden_disk->bs, 0,
-                     MIRROR_SYNC_MODE_NONE, NULL, false,
-                     BLOCKDEV_ON_ERROR_REPORT, BLOCKDEV_ON_ERROR_REPORT,
-                     BLOCK_JOB_INTERNAL, backup_job_completed, bs,
-                     NULL, &local_err);
+        job = backup_job_create(NULL, s->secondary_disk->bs, s->hidden_disk->bs,
+                                0, MIRROR_SYNC_MODE_NONE, NULL, false,
+                                BLOCKDEV_ON_ERROR_REPORT,
+                                BLOCKDEV_ON_ERROR_REPORT, BLOCK_JOB_INTERNAL,
+                                backup_job_completed, bs, NULL, &local_err);
         if (local_err) {
             error_propagate(errp, local_err);
             backup_job_cleanup(bs);
             aio_context_release(aio_context);
             return;
         }
+        block_job_start(job);
         break;
     default:
         aio_context_release(aio_context);
diff --git a/blockdev.c b/blockdev.c
index 102ca9f..871aa35 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1811,7 +1811,7 @@ typedef struct DriveBackupState {
     BlockJob *job;
 } DriveBackupState;
 
-static void do_drive_backup(DriveBackup *backup, BlockJobTxn *txn,
+static BlockJob *do_drive_backup(DriveBackup *backup, BlockJobTxn *txn,
                             Error **errp);
 
 static void drive_backup_prepare(BlkActionState *common, Error **errp)
@@ -1835,23 +1835,27 @@ static void drive_backup_prepare(BlkActionState *common, Error **errp)
     bdrv_drained_begin(bs);
     state->bs = bs;
 
-    do_drive_backup(backup, common->block_job_txn, &local_err);
+    state->job = do_drive_backup(backup, common->block_job_txn, &local_err);
     if (local_err) {
         error_propagate(errp, local_err);
         return;
     }
+}
 
-    state->job = state->bs->job;
+static void drive_backup_commit(BlkActionState *common)
+{
+    DriveBackupState *state = DO_UPCAST(DriveBackupState, common, common);
+    if (state->job) {
+        block_job_start(state->job);
+    }
 }
 
 static void drive_backup_abort(BlkActionState *common)
 {
     DriveBackupState *state = DO_UPCAST(DriveBackupState, common, common);
-    BlockDriverState *bs = state->bs;
 
-    /* Only cancel if it's the job we started */
-    if (bs && bs->job && bs->job == state->job) {
-        block_job_cancel_sync(bs->job);
+    if (state->job) {
+        block_job_cancel_sync(state->job);
     }
 }
 
@@ -1872,8 +1876,8 @@ typedef struct BlockdevBackupState {
     AioContext *aio_context;
 } BlockdevBackupState;
 
-static void do_blockdev_backup(BlockdevBackup *backup, BlockJobTxn *txn,
-                               Error **errp);
+static BlockJob *do_blockdev_backup(BlockdevBackup *backup, BlockJobTxn *txn,
+                                    Error **errp);
 
 static void blockdev_backup_prepare(BlkActionState *common, Error **errp)
 {
@@ -1906,23 +1910,27 @@ static void blockdev_backup_prepare(BlkActionState *common, Error **errp)
     state->bs = bs;
     bdrv_drained_begin(state->bs);
 
-    do_blockdev_backup(backup, common->block_job_txn, &local_err);
+    state->job = do_blockdev_backup(backup, common->block_job_txn, &local_err);
     if (local_err) {
         error_propagate(errp, local_err);
         return;
     }
+}
 
-    state->job = state->bs->job;
+static void blockdev_backup_commit(BlkActionState *common)
+{
+    BlockdevBackupState *state = DO_UPCAST(BlockdevBackupState, common, common);
+    if (state->job) {
+        block_job_start(state->job);
+    }
 }
 
 static void blockdev_backup_abort(BlkActionState *common)
 {
     BlockdevBackupState *state = DO_UPCAST(BlockdevBackupState, common, common);
-    BlockDriverState *bs = state->bs;
 
-    /* Only cancel if it's the job we started */
-    if (bs && bs->job && bs->job == state->job) {
-        block_job_cancel_sync(bs->job);
+    if (state->job) {
+        block_job_cancel_sync(state->job);
     }
 }
 
@@ -2072,12 +2080,14 @@ static const BlkActionOps actions[] = {
     [TRANSACTION_ACTION_KIND_DRIVE_BACKUP] = {
         .instance_size = sizeof(DriveBackupState),
         .prepare = drive_backup_prepare,
+        .commit = drive_backup_commit,
         .abort = drive_backup_abort,
         .clean = drive_backup_clean,
     },
     [TRANSACTION_ACTION_KIND_BLOCKDEV_BACKUP] = {
         .instance_size = sizeof(BlockdevBackupState),
         .prepare = blockdev_backup_prepare,
+        .commit = blockdev_backup_commit,
         .abort = blockdev_backup_abort,
         .clean = blockdev_backup_clean,
     },
@@ -3106,11 +3116,13 @@ out:
     aio_context_release(aio_context);
 }
 
-static void do_drive_backup(DriveBackup *backup, BlockJobTxn *txn, Error **errp)
+static BlockJob *do_drive_backup(DriveBackup *backup, BlockJobTxn *txn,
+                                 Error **errp)
 {
     BlockDriverState *bs;
     BlockDriverState *target_bs;
     BlockDriverState *source = NULL;
+    BlockJob *job = NULL;
     BdrvDirtyBitmap *bmap = NULL;
     AioContext *aio_context;
     QDict *options = NULL;
@@ -3139,7 +3151,7 @@ static void do_drive_backup(DriveBackup *backup, BlockJobTxn *txn, Error **errp)
 
     bs = qmp_get_root_bs(backup->device, errp);
     if (!bs) {
-        return;
+        return NULL;
     }
 
     aio_context = bdrv_get_aio_context(bs);
@@ -3213,10 +3225,10 @@ static void do_drive_backup(DriveBackup *backup, BlockJobTxn *txn, Error **errp)
         }
     }
 
-    backup_start(backup->job_id, bs, target_bs, backup->speed, backup->sync,
-                 bmap, backup->compress, backup->on_source_error,
-                 backup->on_target_error, BLOCK_JOB_DEFAULT,
-                 NULL, NULL, txn, &local_err);
+    job = backup_job_create(backup->job_id, bs, target_bs, backup->speed,
+                            backup->sync, bmap, backup->compress,
+                            backup->on_source_error, backup->on_target_error,
+                            BLOCK_JOB_DEFAULT, NULL, NULL, txn, &local_err);
     bdrv_unref(target_bs);
     if (local_err != NULL) {
         error_propagate(errp, local_err);
@@ -3225,11 +3237,17 @@ static void do_drive_backup(DriveBackup *backup, BlockJobTxn *txn, Error **errp)
 
 out:
     aio_context_release(aio_context);
+    return job;
 }
 
 void qmp_drive_backup(DriveBackup *arg, Error **errp)
 {
-    return do_drive_backup(arg, NULL, errp);
+
+    BlockJob *job;
+    job = do_drive_backup(arg, NULL, errp);
+    if (job) {
+        block_job_start(job);
+    }
 }
 
 BlockDeviceInfoList *qmp_query_named_block_nodes(Error **errp)
@@ -3237,12 +3255,14 @@ BlockDeviceInfoList *qmp_query_named_block_nodes(Error **errp)
     return bdrv_named_nodes_list(errp);
 }
 
-void do_blockdev_backup(BlockdevBackup *backup, BlockJobTxn *txn, Error **errp)
+BlockJob *do_blockdev_backup(BlockdevBackup *backup, BlockJobTxn *txn,
+                             Error **errp)
 {
     BlockDriverState *bs;
     BlockDriverState *target_bs;
     Error *local_err = NULL;
     AioContext *aio_context;
+    BlockJob *job = NULL;
 
     if (!backup->has_speed) {
         backup->speed = 0;
@@ -3262,7 +3282,7 @@ void do_blockdev_backup(BlockdevBackup *backup, BlockJobTxn *txn, Error **errp)
 
     bs = qmp_get_root_bs(backup->device, errp);
     if (!bs) {
-        return;
+        return NULL;
     }
 
     aio_context = bdrv_get_aio_context(bs);
@@ -3284,20 +3304,25 @@ void do_blockdev_backup(BlockdevBackup *backup, BlockJobTxn *txn, Error **errp)
             goto out;
         }
     }
-    backup_start(backup->job_id, bs, target_bs, backup->speed, backup->sync,
-                 NULL, backup->compress, backup->on_source_error,
-                 backup->on_target_error, BLOCK_JOB_DEFAULT,
-                 NULL, NULL, txn, &local_err);
+    job = backup_job_create(backup->job_id, bs, target_bs, backup->speed,
+                            backup->sync, NULL, backup->compress,
+                            backup->on_source_error, backup->on_target_error,
+                            BLOCK_JOB_DEFAULT, NULL, NULL, txn, &local_err);
     if (local_err != NULL) {
         error_propagate(errp, local_err);
     }
 out:
     aio_context_release(aio_context);
+    return job;
 }
 
 void qmp_blockdev_backup(BlockdevBackup *arg, Error **errp)
 {
-    do_blockdev_backup(arg, NULL, errp);
+    BlockJob *job;
+    job = do_blockdev_backup(arg, NULL, errp);
+    if (job) {
+        block_job_start(job);
+    }
 }
 
 /* Parameter check and block job starting for drive mirroring.
diff --git a/include/block/block_int.h b/include/block/block_int.h
index b02abbd..83a423c 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -748,7 +748,7 @@ void mirror_start(const char *job_id, BlockDriverState *bs,
                   bool unmap, Error **errp);
 
 /*
- * backup_start:
+ * backup_job_create:
  * @job_id: The id of the newly-created job, or %NULL to use the
  * device name of @bs.
  * @bs: Block device to operate on.
@@ -764,18 +764,19 @@ void mirror_start(const char *job_id, BlockDriverState *bs,
  * @opaque: Opaque pointer value passed to @cb.
  * @txn: Transaction that this job is part of (may be NULL).
  *
- * Start a backup operation on @bs.  Clusters in @bs are written to @target
+ * Create a backup operation on @bs.  Clusters in @bs are written to @target
  * until the job is cancelled or manually completed.
  */
-void backup_start(const char *job_id, BlockDriverState *bs,
-                  BlockDriverState *target, int64_t speed,
-                  MirrorSyncMode sync_mode, BdrvDirtyBitmap *sync_bitmap,
-                  bool compress,
-                  BlockdevOnError on_source_error,
-                  BlockdevOnError on_target_error,
-                  int creation_flags,
-                  BlockCompletionFunc *cb, void *opaque,
-                  BlockJobTxn *txn, Error **errp);
+BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
+                            BlockDriverState *target, int64_t speed,
+                            MirrorSyncMode sync_mode,
+                            BdrvDirtyBitmap *sync_bitmap,
+                            bool compress,
+                            BlockdevOnError on_source_error,
+                            BlockdevOnError on_target_error,
+                            int creation_flags,
+                            BlockCompletionFunc *cb, void *opaque,
+                            BlockJobTxn *txn, Error **errp);
 
 void hmp_drive_add_node(Monitor *mon, const char *optstr);
 
-- 
2.7.4

* [Qemu-devel] [PATCH v3 6/6] iotests: add transactional failure race test
  2016-11-02 17:50 [Qemu-devel] [PATCH v3 0/6] jobs: fix transactional race condition John Snow
                   ` (4 preceding siblings ...)
  2016-11-02 17:50 ` [Qemu-devel] [PATCH v3 5/6] blockjob: refactor backup_start as backup_job_create John Snow
@ 2016-11-02 17:50 ` John Snow
  2016-11-03 13:21 ` [Qemu-devel] [PATCH v3 0/6] jobs: fix transactional race condition Kevin Wolf
  6 siblings, 0 replies; 22+ messages in thread
From: John Snow @ 2016-11-02 17:50 UTC
  To: qemu-block
  Cc: kwolf, vsementsov, stefanha, pbonzini, jcody, qemu-devel,
	John Snow

Add a regression test for the case found by Vladimir.

Reported-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: John Snow <jsnow@redhat.com>
---
 tests/qemu-iotests/124     | 53 ++++++++++++++++++++++++++++++----------------
 tests/qemu-iotests/124.out |  4 ++--
 2 files changed, 37 insertions(+), 20 deletions(-)

diff --git a/tests/qemu-iotests/124 b/tests/qemu-iotests/124
index f06938e..d0d2c2b 100644
--- a/tests/qemu-iotests/124
+++ b/tests/qemu-iotests/124
@@ -395,19 +395,7 @@ class TestIncrementalBackup(TestIncrementalBackupBase):
         self.check_backups()
 
 
-    def test_transaction_failure(self):
-        '''Test: Verify backups made from a transaction that partially fails.
-
-        Add a second drive with its own unique pattern, and add a bitmap to each
-        drive. Use blkdebug to interfere with the backup on just one drive and
-        attempt to create a coherent incremental backup across both drives.
-
-        verify a failure in one but not both, then delete the failed stubs and
-        re-run the same transaction.
-
-        verify that both incrementals are created successfully.
-        '''
-
+    def do_transaction_failure_test(self, race=False):
         # Create a second drive, with pattern:
         drive1 = self.add_node('drive1')
         self.img_create(drive1['file'], drive1['fmt'])
@@ -451,9 +439,10 @@ class TestIncrementalBackup(TestIncrementalBackupBase):
         self.assertFalse(self.vm.get_qmp_events(wait=False))
 
         # Emulate some writes
-        self.hmp_io_writes(drive0['id'], (('0xab', 0, 512),
-                                          ('0xfe', '16M', '256k'),
-                                          ('0x64', '32736k', '64k')))
+        if not race:
+            self.hmp_io_writes(drive0['id'], (('0xab', 0, 512),
+                                              ('0xfe', '16M', '256k'),
+                                              ('0x64', '32736k', '64k')))
         self.hmp_io_writes(drive1['id'], (('0xba', 0, 512),
                                           ('0xef', '16M', '256k'),
                                           ('0x46', '32736k', '64k')))
@@ -463,7 +452,8 @@ class TestIncrementalBackup(TestIncrementalBackupBase):
         target1 = self.prepare_backup(dr1bm0)
 
         # Ask for a new incremental backup per-each drive,
-        # expecting drive1's backup to fail:
+        # expecting drive1's backup to fail. In the 'race' test,
+        # we expect drive1 to attempt to cancel the empty drive0 job.
         transaction = [
             transaction_drive_backup(drive0['id'], target0, sync='incremental',
                                      format=drive0['fmt'], mode='existing',
@@ -488,9 +478,15 @@ class TestIncrementalBackup(TestIncrementalBackupBase):
         self.assert_no_active_block_jobs()
 
         # Delete drive0's successful target and eliminate our record of the
-        # unsuccessful drive1 target. Then re-run the same transaction.
+        # unsuccessful drive1 target.
         dr0bm0.del_target()
         dr1bm0.del_target()
+        if race:
+            # Don't re-run the transaction, we only wanted to test the race.
+            self.vm.shutdown()
+            return
+
+        # Re-run the same transaction:
         target0 = self.prepare_backup(dr0bm0)
         target1 = self.prepare_backup(dr1bm0)
 
@@ -511,6 +507,27 @@ class TestIncrementalBackup(TestIncrementalBackupBase):
         self.vm.shutdown()
         self.check_backups()
 
+    def test_transaction_failure(self):
+        '''Test: Verify backups made from a transaction that partially fails.
+
+        Add a second drive with its own unique pattern, and add a bitmap to each
+        drive. Use blkdebug to interfere with the backup on just one drive and
+        attempt to create a coherent incremental backup across both drives.
+
+        verify a failure in one but not both, then delete the failed stubs and
+        re-run the same transaction.
+
+        verify that both incrementals are created successfully.
+        '''
+        self.do_transaction_failure_test()
+
+    def test_transaction_failure_race(self):
+        '''Test: Verify that transactions with jobs that have no data to
+        transfer do not cause race conditions in the cancellation of the entire
+        transaction job group.
+        '''
+        self.do_transaction_failure_test(race=True)
+
 
     def test_sync_dirty_bitmap_missing(self):
         self.assert_no_active_block_jobs()
diff --git a/tests/qemu-iotests/124.out b/tests/qemu-iotests/124.out
index 36376be..e56cae0 100644
--- a/tests/qemu-iotests/124.out
+++ b/tests/qemu-iotests/124.out
@@ -1,5 +1,5 @@
-..........
+...........
 ----------------------------------------------------------------------
-Ran 10 tests
+Ran 11 tests
 
 OK
-- 
2.7.4

* Re: [Qemu-devel] [PATCH v3 4/6] blockjob: add block_job_start
  2016-11-02 17:50 ` [Qemu-devel] [PATCH v3 4/6] blockjob: add block_job_start John Snow
@ 2016-11-03 12:17   ` Kevin Wolf
  2016-11-08  2:02     ` John Snow
  0 siblings, 1 reply; 22+ messages in thread
From: Kevin Wolf @ 2016-11-03 12:17 UTC
  To: John Snow; +Cc: qemu-block, vsementsov, stefanha, pbonzini, jcody, qemu-devel

On 02.11.2016 at 18:50, John Snow wrote:
> Instead of automatically starting jobs at creation time via backup_start
> et al, we'd like to return a job object pointer that can be started
> manually at a later point in time.
> 
> For now, add the block_job_start mechanism and start the jobs
> automatically as we have been doing, with conversions job-by-job coming
> in later patches.
> 
> Of note: cancellation of unstarted jobs will perform all the normal
> cleanup as if the job had started, particularly abort and clean. The
> only difference is that we will not emit any events, because the job
> never actually started.
> 
> Signed-off-by: John Snow <jsnow@redhat.com>

> diff --git a/block/commit.c b/block/commit.c
> index 20d27e2..5b7c454 100644
> --- a/block/commit.c
> +++ b/block/commit.c
> @@ -289,10 +289,9 @@ void commit_start(const char *job_id, BlockDriverState *bs,
>      s->backing_file_str = g_strdup(backing_file_str);
>  
>      s->on_error = on_error;
> -    s->common.co = qemu_coroutine_create(s->common.driver->start, s);
>  
>      trace_commit_start(bs, base, top, s, s->common.co);

s->common.co is now uninitialised and should probably be removed from
the tracepoint arguments. The same is true for mirror and stream.

> -    qemu_coroutine_enter(s->common.co);
> +    block_job_start(&s->common);
>  }

> diff --git a/blockjob.c b/blockjob.c
> index e3c458c..16c5159 100644
> --- a/blockjob.c
> +++ b/blockjob.c
> @@ -174,7 +174,8 @@ void *block_job_create(const char *job_id, const BlockJobDriver *driver,
>      job->blk           = blk;
>      job->cb            = cb;
>      job->opaque        = opaque;
> -    job->busy          = true;
> +    job->busy          = false;
> +    job->paused        = true;
>      job->refcnt        = 1;
>      bs->job = job;
>  
> @@ -202,6 +203,21 @@ bool block_job_is_internal(BlockJob *job)
>      return (job->id == NULL);
>  }
>  
> +static bool block_job_started(BlockJob *job)
> +{
> +    return job->co;
> +}
> +
> +void block_job_start(BlockJob *job)
> +{
> +    assert(job && !block_job_started(job) && job->paused &&
> +           !job->busy && job->driver->start);
> +    job->paused = false;
> +    job->busy = true;
> +    job->co = qemu_coroutine_create(job->driver->start, job);
> +    qemu_coroutine_enter(job->co);
> +}

We allow the user to pause a job while it's not started yet. You
classified this as "harmless". But if we accept this, can we really
unconditionally enter the coroutine even if the job has been paused?
Can't a user expect that a job remains in paused state when they
explicitly requested a pause and the job was already internally paused,
like in this case by block_job_create()?

The same probably also applies to the internal job pausing during
bdrv_drain_all_begin/end, though as you know there is a larger problem
with starting jobs under drain_all anyway. For now, we just need to keep
in mind that we can neither create nor start a job in such sections.

Kevin

* Re: [Qemu-devel] [PATCH v3 5/6] blockjob: refactor backup_start as backup_job_create
  2016-11-02 17:50 ` [Qemu-devel] [PATCH v3 5/6] blockjob: refactor backup_start as backup_job_create John Snow
@ 2016-11-03 13:17   ` Kevin Wolf
  2016-11-08  5:41     ` John Snow
  2016-11-08  3:14   ` Jeff Cody
  1 sibling, 1 reply; 22+ messages in thread
From: Kevin Wolf @ 2016-11-03 13:17 UTC
  To: John Snow; +Cc: qemu-block, vsementsov, stefanha, pbonzini, jcody, qemu-devel

On 02.11.2016 at 18:50, John Snow wrote:
> Refactor backup_start as backup_job_create, which only creates the job,
> but does not automatically start it. The old interface, 'backup_start',
> is not kept, in order to limit the number of nearly-identical interfaces
> that would have to be edited to keep up with QAPI changes in the future.
> 
> Callers that wish to synchronously start the backup job can
> instead just call block_job_start immediately after calling
> backup_job_create.
> 
> Transactions are updated to use the new interface, calling block_job_start
> only during the .commit phase, which helps prevent race conditions where
> jobs may finish before we even finish building the transaction. This may
> happen, for instance, during empty block backup jobs.
> 
> Reported-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> Signed-off-by: John Snow <jsnow@redhat.com>

> +static void drive_backup_commit(BlkActionState *common)
> +{
> +    DriveBackupState *state = DO_UPCAST(DriveBackupState, common, common);
> +    if (state->job) {
> +        block_job_start(state->job);
> +    }
>  }

How could state->job ever be NULL?

Same question for abort, and for blockdev_backup_commit/abort.

Kevin

* Re: [Qemu-devel] [PATCH v3 0/6] jobs: fix transactional race condition
  2016-11-02 17:50 [Qemu-devel] [PATCH v3 0/6] jobs: fix transactional race condition John Snow
                   ` (5 preceding siblings ...)
  2016-11-02 17:50 ` [Qemu-devel] [PATCH v3 6/6] iotests: add transactional failure race test John Snow
@ 2016-11-03 13:21 ` Kevin Wolf
  6 siblings, 0 replies; 22+ messages in thread
From: Kevin Wolf @ 2016-11-03 13:21 UTC
  To: John Snow; +Cc: qemu-block, vsementsov, stefanha, pbonzini, jcody, qemu-devel

On 02.11.2016 at 18:50, John Snow wrote:
> There are a few problems with transactional job completion right now.
> 
> First, if jobs complete so quickly that they finish before the remaining
> jobs get a chance to join the transaction, completion can leave the
> transaction in an inconsistent state: the QLIST can become corrupted and
> the transactional jobs can complete in batches or phases instead of all
> together.
> 
> Second, if two or more jobs defer to the main loop at roughly the same
> time, it's possible for one job's cleanup to directly invoke the other
> job's cleanup from within the same thread, leading to a situation that
> will deadlock the entire transaction.
> 
> Thanks to Vladimir for pointing out these modes of failure.

Patch 1-3 and 6:
Reviewed-by: Kevin Wolf <kwolf@redhat.com>

* Re: [Qemu-devel] [PATCH v3 4/6] blockjob: add block_job_start
  2016-11-03 12:17   ` Kevin Wolf
@ 2016-11-08  2:02     ` John Snow
  2016-11-08  2:05       ` Jeff Cody
  0 siblings, 1 reply; 22+ messages in thread
From: John Snow @ 2016-11-08  2:02 UTC
  To: Kevin Wolf; +Cc: vsementsov, qemu-block, jcody, qemu-devel, stefanha, pbonzini



On 11/03/2016 08:17 AM, Kevin Wolf wrote:
> On 02.11.2016 at 18:50, John Snow wrote:
>> Instead of automatically starting jobs at creation time via backup_start
>> et al, we'd like to return a job object pointer that can be started
>> manually at a later point in time.
>>
>> For now, add the block_job_start mechanism and start the jobs
>> automatically as we have been doing, with conversions job-by-job coming
>> in later patches.
>>
>> Of note: cancellation of unstarted jobs will perform all the normal
>> cleanup as if the job had started, particularly abort and clean. The
>> only difference is that we will not emit any events, because the job
>> never actually started.
>>
>> Signed-off-by: John Snow <jsnow@redhat.com>
>
>> diff --git a/block/commit.c b/block/commit.c
>> index 20d27e2..5b7c454 100644
>> --- a/block/commit.c
>> +++ b/block/commit.c
>> @@ -289,10 +289,9 @@ void commit_start(const char *job_id, BlockDriverState *bs,
>>      s->backing_file_str = g_strdup(backing_file_str);
>>
>>      s->on_error = on_error;
>> -    s->common.co = qemu_coroutine_create(s->common.driver->start, s);
>>
>>      trace_commit_start(bs, base, top, s, s->common.co);
>
> s->common.co is now uninitialised and should probably be removed from
> the tracepoint arguments. The same is true for mirror and stream.
>
>> -    qemu_coroutine_enter(s->common.co);
>> +    block_job_start(&s->common);
>>  }
>
>> diff --git a/blockjob.c b/blockjob.c
>> index e3c458c..16c5159 100644
>> --- a/blockjob.c
>> +++ b/blockjob.c
>> @@ -174,7 +174,8 @@ void *block_job_create(const char *job_id, const BlockJobDriver *driver,
>>      job->blk           = blk;
>>      job->cb            = cb;
>>      job->opaque        = opaque;
>> -    job->busy          = true;
>> +    job->busy          = false;
>> +    job->paused        = true;
>>      job->refcnt        = 1;
>>      bs->job = job;
>>
>> @@ -202,6 +203,21 @@ bool block_job_is_internal(BlockJob *job)
>>      return (job->id == NULL);
>>  }
>>
>> +static bool block_job_started(BlockJob *job)
>> +{
>> +    return job->co;
>> +}
>> +
>> +void block_job_start(BlockJob *job)
>> +{
>> +    assert(job && !block_job_started(job) && job->paused &&
>> +           !job->busy && job->driver->start);
>> +    job->paused = false;
>> +    job->busy = true;
>> +    job->co = qemu_coroutine_create(job->driver->start, job);
>> +    qemu_coroutine_enter(job->co);
>> +}
>
> We allow the user to pause a job while it's not started yet. You
> classified this as "harmless". But if we accept this, can we really
> unconditionally enter the coroutine even if the job has been paused?
> Can't a user expect that a job remains in paused state when they
> explicitly requested a pause and the job was already internally paused,
> like in this case by block_job_create()?
>

What will end up happening is that we'll enter the job, and then it'll 
pause immediately upon entrance. Is that a problem?
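
Roughly, as a hand-wavy, self-contained sketch (not the real code), the
entrypoint parks itself at its first pause point:

    #include <stdbool.h>
    #include <stdio.h>

    struct job {
        bool should_pause;
    };

    /* Stand-in for block_job_pause_point(): park until "resumed". */
    static void pause_point(struct job *job)
    {
        while (job->should_pause) {
            puts("parked: waiting for resume");
            job->should_pause = false;   /* pretend a resume arrived */
        }
    }

    static void job_run(struct job *job)
    {
        pause_point(job);                /* checked on entry... */
        puts("doing work");              /* ...before any real work */
    }

    int main(void)
    {
        /* The user paused the job before it was started. */
        struct job job = { .should_pause = true };
        job_run(&job);   /* enters, parks once, then proceeds */
        return 0;
    }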

If the jobs themselves are not checking their pause state fastidiously, 
it could be (but block/backup does -- after it creates a write notifier.)

Do we want a stronger guarantee here?

Naively I think it's OK as-is, but I could add a stronger boolean that
lets us know if it's okay to start or not, and we could delay the
actual creation and start until the 'resume' comes in, if you'd like.

I'd like to avoid the complexity if we can help it, but perhaps I'm not 
thinking carefully enough about the existing edge cases.

> The same probably also applies to the internal job pausing during
> bdrv_drain_all_begin/end, though as you know there is a larger problem
> with starting jobs under drain_all anyway. For now, we just need to keep
> in mind that we can neither create nor start a job in such sections.
>

Yeah, there are deeper problems there. As long as the existing critical 
sections don't allow us to create jobs (started or not) I think we're 
probably already OK.

> Kevin
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v3 4/6] blockjob: add block_job_start
  2016-11-08  2:02     ` John Snow
@ 2016-11-08  2:05       ` Jeff Cody
  2016-11-08  2:20         ` John Snow
  2016-11-08  9:16         ` Kevin Wolf
  0 siblings, 2 replies; 22+ messages in thread
From: Jeff Cody @ 2016-11-08  2:05 UTC (permalink / raw)
  To: John Snow
  Cc: Kevin Wolf, vsementsov, qemu-block, qemu-devel, stefanha,
	pbonzini

On Mon, Nov 07, 2016 at 09:02:14PM -0500, John Snow wrote:
> 
> 
> On 11/03/2016 08:17 AM, Kevin Wolf wrote:
> >On 02.11.2016 at 18:50, John Snow wrote:
> >>Instead of automatically starting jobs at creation time via backup_start
> >>et al, we'd like to return a job object pointer that can be started
> >>manually at a later point in time.
> >>
> >>For now, add the block_job_start mechanism and start the jobs
> >>automatically as we have been doing, with conversions job-by-job coming
> >>in later patches.
> >>
> >>Of note: cancellation of unstarted jobs will perform all the normal
> >>cleanup as if the job had started, particularly abort and clean. The
> >>only difference is that we will not emit any events, because the job
> >>never actually started.
> >>
> >>Signed-off-by: John Snow <jsnow@redhat.com>
> >
> >>diff --git a/block/commit.c b/block/commit.c
> >>index 20d27e2..5b7c454 100644
> >>--- a/block/commit.c
> >>+++ b/block/commit.c
> >>@@ -289,10 +289,9 @@ void commit_start(const char *job_id, BlockDriverState *bs,
> >>     s->backing_file_str = g_strdup(backing_file_str);
> >>
> >>     s->on_error = on_error;
> >>-    s->common.co = qemu_coroutine_create(s->common.driver->start, s);
> >>
> >>     trace_commit_start(bs, base, top, s, s->common.co);
> >
> >s->common.co is now uninitialised and should probably be removed from
> >the tracepoint arguments. The same is true for mirror and stream.
> >
> >>-    qemu_coroutine_enter(s->common.co);
> >>+    block_job_start(&s->common);
> >> }
> >
> >>diff --git a/blockjob.c b/blockjob.c
> >>index e3c458c..16c5159 100644
> >>--- a/blockjob.c
> >>+++ b/blockjob.c
> >>@@ -174,7 +174,8 @@ void *block_job_create(const char *job_id, const BlockJobDriver *driver,
> >>     job->blk           = blk;
> >>     job->cb            = cb;
> >>     job->opaque        = opaque;
> >>-    job->busy          = true;
> >>+    job->busy          = false;
> >>+    job->paused        = true;
> >>     job->refcnt        = 1;
> >>     bs->job = job;
> >>
> >>@@ -202,6 +203,21 @@ bool block_job_is_internal(BlockJob *job)
> >>     return (job->id == NULL);
> >> }
> >>
> >>+static bool block_job_started(BlockJob *job)
> >>+{
> >>+    return job->co;
> >>+}
> >>+
> >>+void block_job_start(BlockJob *job)
> >>+{
> >>+    assert(job && !block_job_started(job) && job->paused &&
> >>+           !job->busy && job->driver->start);
> >>+    job->paused = false;
> >>+    job->busy = true;
> >>+    job->co = qemu_coroutine_create(job->driver->start, job);
> >>+    qemu_coroutine_enter(job->co);
> >>+}
> >
> >We allow the user to pause a job while it's not started yet. You
> >classified this as "harmless". But if we accept this, can we really
> >unconditionally enter the coroutine even if the job has been paused?
> >Can't a user expect that a job remains in paused state when they
> >explicitly requested a pause and the job was already internally paused,
> >like in this case by block_job_create()?
> >
> 
> What will end up happening is that we'll enter the job, and then it'll pause
> immediately upon entrance. Is that a problem?
> 
> If the jobs themselves are not checking their pause state fastidiously, it
> could be (but block/backup does -- after it creates a write notifier.)
> 
> Do we want a stronger guarantee here?
> 
> Naively I think it's OK as-is, but I could add a stronger boolean that
> lets us know if it's okay to start or not, and we could delay the actual
> creation and start until the 'resume' comes in if you'd like.
> 
> I'd like to avoid the complexity if we can help it, but perhaps I'm not
> thinking carefully enough about the existing edge cases.
> 

Is there any reason we can't just use job->pause_count here?  When the job
is created, set job->paused = true, and job->pause_count = 1.  In the
block_job_start(), check the pause_count prior to qemu_coroutine_enter():

    void block_job_start(BlockJob *job)
    {
        assert(job && !block_job_started(job) && job->paused &&
              !job->busy && job->driver->start);
        job->co = qemu_coroutine_create(job->driver->start, job);
        job->paused = --job->pause_count > 0;
        if (!job->paused) {
            job->busy = true;
            qemu_coroutine_enter(job->co);
        }
    }
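
The idea being that block_job_create() takes the job's initial pause
reference and block_job_start() drops it: any pause request that arrived
in between keeps pause_count above zero, so the coroutine is created but
not entered, and (presumably) the eventual resume is what kicks it off.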


> >The same probably also applies to the internal job pausing during
> >bdrv_drain_all_begin/end, though as you know there is a larger problem
> >with starting jobs under drain_all anyway. For now, we just need to keep
> >in mind that we can neither create nor start a job in such sections.
> >
> 
> Yeah, there are deeper problems there. As long as the existing critical
> sections don't allow us to create jobs (started or not) I think we're
> probably already OK.
> 
> >Kevin
> >

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v3 4/6] blockjob: add block_job_start
  2016-11-08  2:05       ` Jeff Cody
@ 2016-11-08  2:20         ` John Snow
  2016-11-08  9:16         ` Kevin Wolf
  1 sibling, 0 replies; 22+ messages in thread
From: John Snow @ 2016-11-08  2:20 UTC (permalink / raw)
  To: Jeff Cody
  Cc: Kevin Wolf, vsementsov, qemu-block, qemu-devel, stefanha,
	pbonzini



On 11/07/2016 09:05 PM, Jeff Cody wrote:
> On Mon, Nov 07, 2016 at 09:02:14PM -0500, John Snow wrote:
>>
>>
>> On 11/03/2016 08:17 AM, Kevin Wolf wrote:
>>> On 02.11.2016 at 18:50, John Snow wrote:
>>>> Instead of automatically starting jobs at creation time via backup_start
>>>> et al, we'd like to return a job object pointer that can be started
>>>> manually at a later point in time.
>>>>
>>>> For now, add the block_job_start mechanism and start the jobs
>>>> automatically as we have been doing, with conversions job-by-job coming
>>>> in later patches.
>>>>
>>>> Of note: cancellation of unstarted jobs will perform all the normal
>>>> cleanup as if the job had started, particularly abort and clean. The
>>>> only difference is that we will not emit any events, because the job
>>>> never actually started.
>>>>
>>>> Signed-off-by: John Snow <jsnow@redhat.com>
>>>
>>>> diff --git a/block/commit.c b/block/commit.c
>>>> index 20d27e2..5b7c454 100644
>>>> --- a/block/commit.c
>>>> +++ b/block/commit.c
>>>> @@ -289,10 +289,9 @@ void commit_start(const char *job_id, BlockDriverState *bs,
>>>>     s->backing_file_str = g_strdup(backing_file_str);
>>>>
>>>>     s->on_error = on_error;
>>>> -    s->common.co = qemu_coroutine_create(s->common.driver->start, s);
>>>>
>>>>     trace_commit_start(bs, base, top, s, s->common.co);
>>>
>>> s->common.co is now uninitialised and should probably be removed from
>>> the tracepoint arguments. The same is true for mirror and stream.
>>>
>>>> -    qemu_coroutine_enter(s->common.co);
>>>> +    block_job_start(&s->common);
>>>> }
>>>
>>>> diff --git a/blockjob.c b/blockjob.c
>>>> index e3c458c..16c5159 100644
>>>> --- a/blockjob.c
>>>> +++ b/blockjob.c
>>>> @@ -174,7 +174,8 @@ void *block_job_create(const char *job_id, const BlockJobDriver *driver,
>>>>     job->blk           = blk;
>>>>     job->cb            = cb;
>>>>     job->opaque        = opaque;
>>>> -    job->busy          = true;
>>>> +    job->busy          = false;
>>>> +    job->paused        = true;
>>>>     job->refcnt        = 1;
>>>>     bs->job = job;
>>>>
>>>> @@ -202,6 +203,21 @@ bool block_job_is_internal(BlockJob *job)
>>>>     return (job->id == NULL);
>>>> }
>>>>
>>>> +static bool block_job_started(BlockJob *job)
>>>> +{
>>>> +    return job->co;
>>>> +}
>>>> +
>>>> +void block_job_start(BlockJob *job)
>>>> +{
>>>> +    assert(job && !block_job_started(job) && job->paused &&
>>>> +           !job->busy && job->driver->start);
>>>> +    job->paused = false;
>>>> +    job->busy = true;
>>>> +    job->co = qemu_coroutine_create(job->driver->start, job);
>>>> +    qemu_coroutine_enter(job->co);
>>>> +}
>>>
>>> We allow the user to pause a job while it's not started yet. You
>>> classified this as "harmless". But if we accept this, can we really
>>> unconditionally enter the coroutine even if the job has been paused?
>>> Can't a user expect that a job remains in paused state when they
>>> explicitly requested a pause and the job was already internally paused,
>>> like in this case by block_job_create()?
>>>
>>
>> What will end up happening is that we'll enter the job, and then it'll pause
>> immediately upon entrance. Is that a problem?
>>
>> If the jobs themselves are not checking their pause state fastidiously, it
>> could be (but block/backup does -- after it creates a write notifier.)
>>
>> Do we want a stronger guarantee here?
>>
>> Naively I think it's OK as-is, but I could add a stronger boolean that
>> lets us know if it's okay to start or not, and we could delay the actual
>> creation and start until the 'resume' comes in if you'd like.
>>
>> I'd like to avoid the complexity if we can help it, but perhaps I'm not
>> thinking carefully enough about the existing edge cases.
>>
>
> Is there any reason we can't just use job->pause_count here?  When the job
> is created, set job->paused = true, and job->pause_count = 1.  In the
> block_job_start(), check the pause_count prior to qemu_coroutine_enter():
>
>     void block_job_start(BlockJob *job)
>     {
>         assert(job && !block_job_started(job) && job->paused &&
>               !job->busy && job->driver->start);
>         job->co = qemu_coroutine_create(job->driver->start, job);
>         job->paused = --job->pause_count > 0;
>         if (!job->paused) {
>             job->busy = true;
>             qemu_coroutine_enter(job->co);
>         }
>     }
>

Solid point. Let's do it this way.
Thanks!

>
>>> The same probably also applies to the internal job pausing during
>>> bdrv_drain_all_begin/end, though as you know there is a larger problem
>>> with starting jobs under drain_all anyway. For now, we just need to keep
>>> in mind that we can neither create nor start a job in such sections.
>>>
>>
>> Yeah, there are deeper problems there. As long as the existing critical
>> sections don't allow us to create jobs (started or not) I think we're
>> probably already OK.
>>
>>> Kevin
>>>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v3 1/6] blockjob: fix dead pointer in txn list
  2016-11-02 17:50 ` [Qemu-devel] [PATCH v3 1/6] blockjob: fix dead pointer in txn list John Snow
@ 2016-11-08  2:47   ` Jeff Cody
  0 siblings, 0 replies; 22+ messages in thread
From: Jeff Cody @ 2016-11-08  2:47 UTC (permalink / raw)
  To: John Snow; +Cc: qemu-block, kwolf, vsementsov, stefanha, pbonzini, qemu-devel

On Wed, Nov 02, 2016 at 01:50:51PM -0400, John Snow wrote:
> From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> 
> Though it is not intended to be reached through normal circumstances,
> if we do not gracefully deconstruct the transaction QLIST, we may wind
> up with stale pointers in the list.
> 
> The rest of this series attempts to address the underlying issues,
> but this should fix list inconsistencies.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> Tested-by: John Snow <jsnow@redhat.com>
> Reviewed-by: John Snow <jsnow@redhat.com>
> [Rewrote commit message. --js]
> Signed-off-by: John Snow <jsnow@redhat.com>
> Reviewed-by: Eric Blake <eblake@redhat.com>
> Reviewed-by: Kevin Wolf <kwolf@redhat.com>
> 
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>  blockjob.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/blockjob.c b/blockjob.c
> index 4aa14a4..4d0ef53 100644
> --- a/blockjob.c
> +++ b/blockjob.c
> @@ -256,6 +256,7 @@ static void block_job_completed_single(BlockJob *job)
>      }
>  
>      if (job->txn) {
> +        QLIST_REMOVE(job, txn_list);
>          block_job_txn_unref(job->txn);
>      }
>      block_job_unref(job);
> -- 
> 2.7.4
>

Reviewed-by: Jeff Cody <jcody@redhat.com>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/6] blockjob: add .clean property
  2016-11-02 17:50 ` [Qemu-devel] [PATCH v3 2/6] blockjob: add .clean property John Snow
@ 2016-11-08  2:51   ` Jeff Cody
  0 siblings, 0 replies; 22+ messages in thread
From: Jeff Cody @ 2016-11-08  2:51 UTC (permalink / raw)
  To: John Snow; +Cc: qemu-block, kwolf, vsementsov, stefanha, pbonzini, qemu-devel

On Wed, Nov 02, 2016 at 01:50:52PM -0400, John Snow wrote:
> Cleaning up after we have deferred to the main thread but before the
> transaction has converged can be dangerous and result in deadlocks
> if the job cleanup invokes any BH polling loops.
> 
> A job may attempt to begin cleaning up, but may induce another job to
> enter its cleanup routine. The second job, part of our same transaction,
> will block waiting for the first job to finish, so neither job may now
> make progress.
> 
> To rectify this, allow jobs to register a cleanup operation that will
> always run regardless of if the job was in a transaction or not, and
> if the transaction job group completed successfully or not.
> 
> Move sensitive cleanup to this callback instead, which is guaranteed to
> run only after the transaction has converged; this removes the sensitive
> timing constraints from said cleanup.
> 
> Furthermore, in future patches these cleanup operations will be performed
> regardless of whether or not we actually started the job. Therefore,
> cleanup callbacks should essentially confine themselves to undoing create
> operations, e.g. setup actions taken in what is now backup_start.
> 
> Reported-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>  block/backup.c               | 15 ++++++++++-----
>  blockjob.c                   |  3 +++
>  include/block/blockjob_int.h |  8 ++++++++
>  3 files changed, 21 insertions(+), 5 deletions(-)
> 
> diff --git a/block/backup.c b/block/backup.c
> index 7b5d8a3..734a24c 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -242,6 +242,14 @@ static void backup_abort(BlockJob *job)
>      }
>  }
>  
> +static void backup_clean(BlockJob *job)
> +{
> +    BackupBlockJob *s = container_of(job, BackupBlockJob, common);
> +    assert(s->target);
> +    blk_unref(s->target);
> +    s->target = NULL;
> +}
> +
>  static void backup_attached_aio_context(BlockJob *job, AioContext *aio_context)
>  {
>      BackupBlockJob *s = container_of(job, BackupBlockJob, common);
> @@ -321,6 +329,7 @@ static const BlockJobDriver backup_job_driver = {
>      .set_speed              = backup_set_speed,
>      .commit                 = backup_commit,
>      .abort                  = backup_abort,
> +    .clean                  = backup_clean,
>      .attached_aio_context   = backup_attached_aio_context,
>      .drain                  = backup_drain,
>  };
> @@ -343,12 +352,8 @@ typedef struct {
>  
>  static void backup_complete(BlockJob *job, void *opaque)
>  {
> -    BackupBlockJob *s = container_of(job, BackupBlockJob, common);
>      BackupCompleteData *data = opaque;
>  
> -    blk_unref(s->target);
> -    s->target = NULL;
> -
>      block_job_completed(job, data->ret);
>      g_free(data);
>  }
> @@ -658,7 +663,7 @@ void backup_start(const char *job_id, BlockDriverState *bs,
>          bdrv_reclaim_dirty_bitmap(bs, sync_bitmap, NULL);
>      }
>      if (job) {
> -        blk_unref(job->target);
> +        backup_clean(&job->common);
>          block_job_unref(&job->common);
>      }
>  }
> diff --git a/blockjob.c b/blockjob.c
> index 4d0ef53..e3c458c 100644
> --- a/blockjob.c
> +++ b/blockjob.c
> @@ -241,6 +241,9 @@ static void block_job_completed_single(BlockJob *job)
>              job->driver->abort(job);
>          }
>      }
> +    if (job->driver->clean) {
> +        job->driver->clean(job);
> +    }
>  
>      if (job->cb) {
>          job->cb(job->opaque, job->ret);
> diff --git a/include/block/blockjob_int.h b/include/block/blockjob_int.h
> index 40275e4..60d91a0 100644
> --- a/include/block/blockjob_int.h
> +++ b/include/block/blockjob_int.h
> @@ -74,6 +74,14 @@ struct BlockJobDriver {
>      void (*abort)(BlockJob *job);
>  
>      /**
> +     * If the callback is not NULL, it will be invoked after a call to either
> +     * .commit() or .abort(). Regardless of which callback is invoked after
> +     * completion, .clean() will always be called, even if the job does not
> +     * belong to a transaction group.
> +     */
> +    void (*clean)(BlockJob *job);
> +
> +    /**
>       * If the callback is not NULL, it will be invoked when the job transitions
>       * into the paused state.  Paused jobs must not perform any asynchronous
>       * I/O or event loop activity.  This callback is used to quiesce jobs.
> -- 
> 2.7.4
> 

Reviewed-by: Jeff Cody <jcody@redhat.com>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v3 3/6] blockjob: add .start field
  2016-11-02 17:50 ` [Qemu-devel] [PATCH v3 3/6] blockjob: add .start field John Snow
@ 2016-11-08  2:58   ` Jeff Cody
  0 siblings, 0 replies; 22+ messages in thread
From: Jeff Cody @ 2016-11-08  2:58 UTC (permalink / raw)
  To: John Snow; +Cc: qemu-block, kwolf, vsementsov, stefanha, pbonzini, qemu-devel

On Wed, Nov 02, 2016 at 01:50:53PM -0400, John Snow wrote:
> Add an explicit start field to specify the entrypoint. We already have
> ownership of the coroutine itself AND manage the lifetime of the
> coroutine; let's take control of the creation of the coroutine, too.
> 
> This will allow us to delay creation of the actual coroutine until we
> know we'll actually start a BlockJob in block_job_start. This avoids
> the sticky question of how to "un-create" a Coroutine that hasn't been
> started yet.
> 
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>  block/backup.c               | 25 +++++++++++++------------
>  block/commit.c               |  3 ++-
>  block/mirror.c               |  4 +++-
>  block/stream.c               |  3 ++-
>  include/block/blockjob_int.h |  3 +++
>  5 files changed, 23 insertions(+), 15 deletions(-)
> 
> diff --git a/block/backup.c b/block/backup.c
> index 734a24c..4ed4494 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -323,17 +323,6 @@ static void backup_drain(BlockJob *job)
>      }
>  }
>  
> -static const BlockJobDriver backup_job_driver = {
> -    .instance_size          = sizeof(BackupBlockJob),
> -    .job_type               = BLOCK_JOB_TYPE_BACKUP,
> -    .set_speed              = backup_set_speed,
> -    .commit                 = backup_commit,
> -    .abort                  = backup_abort,
> -    .clean                  = backup_clean,
> -    .attached_aio_context   = backup_attached_aio_context,
> -    .drain                  = backup_drain,
> -};
> -
>  static BlockErrorAction backup_error_action(BackupBlockJob *job,
>                                              bool read, int error)
>  {
> @@ -542,6 +531,18 @@ static void coroutine_fn backup_run(void *opaque)
>      block_job_defer_to_main_loop(&job->common, backup_complete, data);
>  }
>  
> +static const BlockJobDriver backup_job_driver = {
> +    .instance_size          = sizeof(BackupBlockJob),
> +    .job_type               = BLOCK_JOB_TYPE_BACKUP,
> +    .start                  = backup_run,
> +    .set_speed              = backup_set_speed,
> +    .commit                 = backup_commit,
> +    .abort                  = backup_abort,
> +    .clean                  = backup_clean,
> +    .attached_aio_context   = backup_attached_aio_context,
> +    .drain                  = backup_drain,
> +};
> +

Some code movement here in addition to the .start addition, but to a better
place (I am guessing that is intentional).

>  void backup_start(const char *job_id, BlockDriverState *bs,
>                    BlockDriverState *target, int64_t speed,
>                    MirrorSyncMode sync_mode, BdrvDirtyBitmap *sync_bitmap,
> @@ -653,7 +654,7 @@ void backup_start(const char *job_id, BlockDriverState *bs,
>  
>      block_job_add_bdrv(&job->common, target);
>      job->common.len = len;
> -    job->common.co = qemu_coroutine_create(backup_run, job);
> +    job->common.co = qemu_coroutine_create(job->common.driver->start, job);
>      block_job_txn_add_job(txn, &job->common);
>      qemu_coroutine_enter(job->common.co);
>      return;
> diff --git a/block/commit.c b/block/commit.c
> index e1eda89..20d27e2 100644
> --- a/block/commit.c
> +++ b/block/commit.c
> @@ -205,6 +205,7 @@ static const BlockJobDriver commit_job_driver = {
>      .instance_size = sizeof(CommitBlockJob),
>      .job_type      = BLOCK_JOB_TYPE_COMMIT,
>      .set_speed     = commit_set_speed,
> +    .start         = commit_run,
>  };
>  
>  void commit_start(const char *job_id, BlockDriverState *bs,
> @@ -288,7 +289,7 @@ void commit_start(const char *job_id, BlockDriverState *bs,
>      s->backing_file_str = g_strdup(backing_file_str);
>  
>      s->on_error = on_error;
> -    s->common.co = qemu_coroutine_create(commit_run, s);
> +    s->common.co = qemu_coroutine_create(s->common.driver->start, s);
>  
>      trace_commit_start(bs, base, top, s, s->common.co);
>      qemu_coroutine_enter(s->common.co);
> diff --git a/block/mirror.c b/block/mirror.c
> index b2c1fb8..659e09c 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -920,6 +920,7 @@ static const BlockJobDriver mirror_job_driver = {
>      .instance_size          = sizeof(MirrorBlockJob),
>      .job_type               = BLOCK_JOB_TYPE_MIRROR,
>      .set_speed              = mirror_set_speed,
> +    .start                  = mirror_run,
>      .complete               = mirror_complete,
>      .pause                  = mirror_pause,
>      .attached_aio_context   = mirror_attached_aio_context,
> @@ -930,6 +931,7 @@ static const BlockJobDriver commit_active_job_driver = {
>      .instance_size          = sizeof(MirrorBlockJob),
>      .job_type               = BLOCK_JOB_TYPE_COMMIT,
>      .set_speed              = mirror_set_speed,
> +    .start                  = mirror_run,
>      .complete               = mirror_complete,
>      .pause                  = mirror_pause,
>      .attached_aio_context   = mirror_attached_aio_context,
> @@ -1007,7 +1009,7 @@ static void mirror_start_job(const char *job_id, BlockDriverState *bs,
>          }
>      }
>  
> -    s->common.co = qemu_coroutine_create(mirror_run, s);
> +    s->common.co = qemu_coroutine_create(s->common.driver->start, s);
>      trace_mirror_start(bs, s, s->common.co, opaque);
>      qemu_coroutine_enter(s->common.co);
>  }
> diff --git a/block/stream.c b/block/stream.c
> index b05856b..92309ff 100644
> --- a/block/stream.c
> +++ b/block/stream.c
> @@ -218,6 +218,7 @@ static const BlockJobDriver stream_job_driver = {
>      .instance_size = sizeof(StreamBlockJob),
>      .job_type      = BLOCK_JOB_TYPE_STREAM,
>      .set_speed     = stream_set_speed,
> +    .start         = stream_run,
>  };
>  
>  void stream_start(const char *job_id, BlockDriverState *bs,
> @@ -254,7 +255,7 @@ void stream_start(const char *job_id, BlockDriverState *bs,
>      s->bs_flags = orig_bs_flags;
>  
>      s->on_error = on_error;
> -    s->common.co = qemu_coroutine_create(stream_run, s);
> +    s->common.co = qemu_coroutine_create(s->common.driver->start, s);
>      trace_stream_start(bs, base, s, s->common.co);
>      qemu_coroutine_enter(s->common.co);
>  }
> diff --git a/include/block/blockjob_int.h b/include/block/blockjob_int.h
> index 60d91a0..8223822 100644
> --- a/include/block/blockjob_int.h
> +++ b/include/block/blockjob_int.h
> @@ -47,6 +47,9 @@ struct BlockJobDriver {
>      /** Optional callback for job types that need to forward I/O status reset */
>      void (*iostatus_reset)(BlockJob *job);
>  
> +    /** Mandatory: Entrypoint for the Coroutine. */
> +    CoroutineEntry *start;
> +
>      /**
>       * Optional callback for job types whose completion must be triggered
>       * manually.
> -- 
> 2.7.4
> 

Reviewed-by: Jeff Cody <jcody@redhat.com>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v3 5/6] blockjob: refactor backup_start as backup_job_create
  2016-11-02 17:50 ` [Qemu-devel] [PATCH v3 5/6] blockjob: refactor backup_start as backup_job_create John Snow
  2016-11-03 13:17   ` Kevin Wolf
@ 2016-11-08  3:14   ` Jeff Cody
  1 sibling, 0 replies; 22+ messages in thread
From: Jeff Cody @ 2016-11-08  3:14 UTC (permalink / raw)
  To: John Snow; +Cc: qemu-block, kwolf, vsementsov, stefanha, pbonzini, qemu-devel

On Wed, Nov 02, 2016 at 01:50:55PM -0400, John Snow wrote:
> Refactor backup_start as backup_job_create, which only creates the job,
> but does not automatically start it. The old interface, 'backup_start',
> is not kept in favor of limiting the number of nearly-identical interfaces
> that would have to be edited to keep up with QAPI changes in the future.
> 
> Callers that wish to synchronously start the backup_block_job can
> instead just call block_job_start immediately after calling
> backup_job_create.
> 
> Transactions are updated to use the new interface, calling block_job_start
> only during the .commit phase, which helps prevent race conditions where
> jobs may finish before we even finish building the transaction. This may
> happen, for instance, during empty block backup jobs.
> 
> Reported-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>  block/backup.c            | 26 ++++++++-------
>  block/replication.c       | 12 ++++---
>  blockdev.c                | 83 ++++++++++++++++++++++++++++++-----------------
>  include/block/block_int.h | 23 ++++++-------
>  4 files changed, 87 insertions(+), 57 deletions(-)
> 
> diff --git a/block/backup.c b/block/backup.c
> index ae1b99a..ea38733 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -543,7 +543,7 @@ static const BlockJobDriver backup_job_driver = {
>      .drain                  = backup_drain,
>  };
>  
> -void backup_start(const char *job_id, BlockDriverState *bs,
> +BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
>                    BlockDriverState *target, int64_t speed,
>                    MirrorSyncMode sync_mode, BdrvDirtyBitmap *sync_bitmap,
>                    bool compress,
> @@ -563,52 +563,52 @@ void backup_start(const char *job_id, BlockDriverState *bs,
>  
>      if (bs == target) {
>          error_setg(errp, "Source and target cannot be the same");
> -        return;
> +        return NULL;
>      }
>  
>      if (!bdrv_is_inserted(bs)) {
>          error_setg(errp, "Device is not inserted: %s",
>                     bdrv_get_device_name(bs));
> -        return;
> +        return NULL;
>      }
>  
>      if (!bdrv_is_inserted(target)) {
>          error_setg(errp, "Device is not inserted: %s",
>                     bdrv_get_device_name(target));
> -        return;
> +        return NULL;
>      }
>  
>      if (compress && target->drv->bdrv_co_pwritev_compressed == NULL) {
>          error_setg(errp, "Compression is not supported for this drive %s",
>                     bdrv_get_device_name(target));
> -        return;
> +        return NULL;
>      }
>  
>      if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_BACKUP_SOURCE, errp)) {
> -        return;
> +        return NULL;
>      }
>  
>      if (bdrv_op_is_blocked(target, BLOCK_OP_TYPE_BACKUP_TARGET, errp)) {
> -        return;
> +        return NULL;
>      }
>  
>      if (sync_mode == MIRROR_SYNC_MODE_INCREMENTAL) {
>          if (!sync_bitmap) {
>              error_setg(errp, "must provide a valid bitmap name for "
>                               "\"incremental\" sync mode");
> -            return;
> +            return NULL;
>          }
>  
>          /* Create a new bitmap, and freeze/disable this one. */
>          if (bdrv_dirty_bitmap_create_successor(bs, sync_bitmap, errp) < 0) {
> -            return;
> +            return NULL;
>          }
>      } else if (sync_bitmap) {
>          error_setg(errp,
>                     "a sync_bitmap was provided to backup_run, "
>                     "but received an incompatible sync_mode (%s)",
>                     MirrorSyncMode_lookup[sync_mode]);
> -        return;
> +        return NULL;
>      }
>  
>      len = bdrv_getlength(bs);
> @@ -655,8 +655,8 @@ void backup_start(const char *job_id, BlockDriverState *bs,
>      block_job_add_bdrv(&job->common, target);
>      job->common.len = len;
>      block_job_txn_add_job(txn, &job->common);
> -    block_job_start(&job->common);
> -    return;
> +
> +    return &job->common;
>  
>   error:
>      if (sync_bitmap) {
> @@ -666,4 +666,6 @@ void backup_start(const char *job_id, BlockDriverState *bs,
>          backup_clean(&job->common);
>          block_job_unref(&job->common);
>      }
> +
> +    return NULL;
>  }
> diff --git a/block/replication.c b/block/replication.c
> index d5e2b0f..729dd12 100644
> --- a/block/replication.c
> +++ b/block/replication.c
> @@ -421,6 +421,7 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
>      int64_t active_length, hidden_length, disk_length;
>      AioContext *aio_context;
>      Error *local_err = NULL;
> +    BlockJob *job;
>  
>      aio_context = bdrv_get_aio_context(bs);
>      aio_context_acquire(aio_context);
> @@ -508,17 +509,18 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
>          bdrv_op_block_all(top_bs, s->blocker);
>          bdrv_op_unblock(top_bs, BLOCK_OP_TYPE_DATAPLANE, s->blocker);
>  
> -        backup_start(NULL, s->secondary_disk->bs, s->hidden_disk->bs, 0,
> -                     MIRROR_SYNC_MODE_NONE, NULL, false,
> -                     BLOCKDEV_ON_ERROR_REPORT, BLOCKDEV_ON_ERROR_REPORT,
> -                     BLOCK_JOB_INTERNAL, backup_job_completed, bs,
> -                     NULL, &local_err);
> +        job = backup_job_create(NULL, s->secondary_disk->bs, s->hidden_disk->bs,
> +                                0, MIRROR_SYNC_MODE_NONE, NULL, false,
> +                                BLOCKDEV_ON_ERROR_REPORT,
> +                                BLOCKDEV_ON_ERROR_REPORT, BLOCK_JOB_INTERNAL,
> +                                backup_job_completed, bs, NULL, &local_err);
>          if (local_err) {
>              error_propagate(errp, local_err);
>              backup_job_cleanup(bs);
>              aio_context_release(aio_context);
>              return;
>          }
> +        block_job_start(job);
>          break;
>      default:
>          aio_context_release(aio_context);
> diff --git a/blockdev.c b/blockdev.c
> index 102ca9f..871aa35 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -1811,7 +1811,7 @@ typedef struct DriveBackupState {
>      BlockJob *job;
>  } DriveBackupState;
>  
> -static void do_drive_backup(DriveBackup *backup, BlockJobTxn *txn,
> +static BlockJob *do_drive_backup(DriveBackup *backup, BlockJobTxn *txn,
>                              Error **errp);

It'd be nice to someday remove this forward declaration and just move the
function up here.  But I won't ask you to do it :)

>  
>  static void drive_backup_prepare(BlkActionState *common, Error **errp)
> @@ -1835,23 +1835,27 @@ static void drive_backup_prepare(BlkActionState *common, Error **errp)
>      bdrv_drained_begin(bs);
>      state->bs = bs;
>  
> -    do_drive_backup(backup, common->block_job_txn, &local_err);
> +    state->job = do_drive_backup(backup, common->block_job_txn, &local_err);
>      if (local_err) {
>          error_propagate(errp, local_err);
>          return;
>      }
> +}
>  
> -    state->job = state->bs->job;
> +static void drive_backup_commit(BlkActionState *common)
> +{
> +    DriveBackupState *state = DO_UPCAST(DriveBackupState, common, common);
> +    if (state->job) {

Kevin has a point on all the state->job checks in the _commit/_abort
functions.  I think the only way that could happen is if the _prepare() was
not run for some reason.

Maybe just make them all asserts()?
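
i.e., something like (untested):

    static void drive_backup_commit(BlkActionState *common)
    {
        DriveBackupState *state = DO_UPCAST(DriveBackupState, common, common);

        assert(state->job);
        block_job_start(state->job);
    }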

With that change:

Reviewed-by: Jeff Cody <jcody@redhat.com>

> +        block_job_start(state->job);
> +    }
>  }
>  
>  static void drive_backup_abort(BlkActionState *common)
>  {
>      DriveBackupState *state = DO_UPCAST(DriveBackupState, common, common);
> -    BlockDriverState *bs = state->bs;
>  
> -    /* Only cancel if it's the job we started */
> -    if (bs && bs->job && bs->job == state->job) {
> -        block_job_cancel_sync(bs->job);
> +    if (state->job) {
> +        block_job_cancel_sync(state->job);
>      }
>  }
>  
> @@ -1872,8 +1876,8 @@ typedef struct BlockdevBackupState {
>      AioContext *aio_context;
>  } BlockdevBackupState;
>  
> -static void do_blockdev_backup(BlockdevBackup *backup, BlockJobTxn *txn,
> -                               Error **errp);
> +static BlockJob *do_blockdev_backup(BlockdevBackup *backup, BlockJobTxn *txn,
> +                                    Error **errp);
>  
>  static void blockdev_backup_prepare(BlkActionState *common, Error **errp)
>  {
> @@ -1906,23 +1910,27 @@ static void blockdev_backup_prepare(BlkActionState *common, Error **errp)
>      state->bs = bs;
>      bdrv_drained_begin(state->bs);
>  
> -    do_blockdev_backup(backup, common->block_job_txn, &local_err);
> +    state->job = do_blockdev_backup(backup, common->block_job_txn, &local_err);
>      if (local_err) {
>          error_propagate(errp, local_err);
>          return;
>      }
> +}
>  
> -    state->job = state->bs->job;
> +static void blockdev_backup_commit(BlkActionState *common)
> +{
> +    BlockdevBackupState *state = DO_UPCAST(BlockdevBackupState, common, common);
> +    if (state->job) {
> +        block_job_start(state->job);
> +    }
>  }
>  
>  static void blockdev_backup_abort(BlkActionState *common)
>  {
>      BlockdevBackupState *state = DO_UPCAST(BlockdevBackupState, common, common);
> -    BlockDriverState *bs = state->bs;
>  
> -    /* Only cancel if it's the job we started */
> -    if (bs && bs->job && bs->job == state->job) {
> -        block_job_cancel_sync(bs->job);
> +    if (state->job) {
> +        block_job_cancel_sync(state->job);
>      }
>  }
>  
> @@ -2072,12 +2080,14 @@ static const BlkActionOps actions[] = {
>      [TRANSACTION_ACTION_KIND_DRIVE_BACKUP] = {
>          .instance_size = sizeof(DriveBackupState),
>          .prepare = drive_backup_prepare,
> +        .commit = drive_backup_commit,
>          .abort = drive_backup_abort,
>          .clean = drive_backup_clean,
>      },
>      [TRANSACTION_ACTION_KIND_BLOCKDEV_BACKUP] = {
>          .instance_size = sizeof(BlockdevBackupState),
>          .prepare = blockdev_backup_prepare,
> +        .commit = blockdev_backup_commit,
>          .abort = blockdev_backup_abort,
>          .clean = blockdev_backup_clean,
>      },
> @@ -3106,11 +3116,13 @@ out:
>      aio_context_release(aio_context);
>  }
>  
> -static void do_drive_backup(DriveBackup *backup, BlockJobTxn *txn, Error **errp)
> +static BlockJob *do_drive_backup(DriveBackup *backup, BlockJobTxn *txn,
> +                                 Error **errp)
>  {
>      BlockDriverState *bs;
>      BlockDriverState *target_bs;
>      BlockDriverState *source = NULL;
> +    BlockJob *job = NULL;
>      BdrvDirtyBitmap *bmap = NULL;
>      AioContext *aio_context;
>      QDict *options = NULL;
> @@ -3139,7 +3151,7 @@ static void do_drive_backup(DriveBackup *backup, BlockJobTxn *txn, Error **errp)
>  
>      bs = qmp_get_root_bs(backup->device, errp);
>      if (!bs) {
> -        return;
> +        return NULL;
>      }
>  
>      aio_context = bdrv_get_aio_context(bs);
> @@ -3213,10 +3225,10 @@ static void do_drive_backup(DriveBackup *backup, BlockJobTxn *txn, Error **errp)
>          }
>      }
>  
> -    backup_start(backup->job_id, bs, target_bs, backup->speed, backup->sync,
> -                 bmap, backup->compress, backup->on_source_error,
> -                 backup->on_target_error, BLOCK_JOB_DEFAULT,
> -                 NULL, NULL, txn, &local_err);
> +    job = backup_job_create(backup->job_id, bs, target_bs, backup->speed,
> +                            backup->sync, bmap, backup->compress,
> +                            backup->on_source_error, backup->on_target_error,
> +                            BLOCK_JOB_DEFAULT, NULL, NULL, txn, &local_err);
>      bdrv_unref(target_bs);
>      if (local_err != NULL) {
>          error_propagate(errp, local_err);
> @@ -3225,11 +3237,17 @@ static void do_drive_backup(DriveBackup *backup, BlockJobTxn *txn, Error **errp)
>  
>  out:
>      aio_context_release(aio_context);
> +    return job;
>  }
>  
>  void qmp_drive_backup(DriveBackup *arg, Error **errp)
>  {
> -    return do_drive_backup(arg, NULL, errp);
> +
> +    BlockJob *job;
> +    job = do_drive_backup(arg, NULL, errp);
> +    if (job) {
> +        block_job_start(job);
> +    }
>  }
>  
>  BlockDeviceInfoList *qmp_query_named_block_nodes(Error **errp)
> @@ -3237,12 +3255,14 @@ BlockDeviceInfoList *qmp_query_named_block_nodes(Error **errp)
>      return bdrv_named_nodes_list(errp);
>  }
>  
> -void do_blockdev_backup(BlockdevBackup *backup, BlockJobTxn *txn, Error **errp)
> +BlockJob *do_blockdev_backup(BlockdevBackup *backup, BlockJobTxn *txn,
> +                             Error **errp)
>  {
>      BlockDriverState *bs;
>      BlockDriverState *target_bs;
>      Error *local_err = NULL;
>      AioContext *aio_context;
> +    BlockJob *job = NULL;
>  
>      if (!backup->has_speed) {
>          backup->speed = 0;
> @@ -3262,7 +3282,7 @@ void do_blockdev_backup(BlockdevBackup *backup, BlockJobTxn *txn, Error **errp)
>  
>      bs = qmp_get_root_bs(backup->device, errp);
>      if (!bs) {
> -        return;
> +        return NULL;
>      }
>  
>      aio_context = bdrv_get_aio_context(bs);
> @@ -3284,20 +3304,25 @@ void do_blockdev_backup(BlockdevBackup *backup, BlockJobTxn *txn, Error **errp)
>              goto out;
>          }
>      }
> -    backup_start(backup->job_id, bs, target_bs, backup->speed, backup->sync,
> -                 NULL, backup->compress, backup->on_source_error,
> -                 backup->on_target_error, BLOCK_JOB_DEFAULT,
> -                 NULL, NULL, txn, &local_err);
> +    job = backup_job_create(backup->job_id, bs, target_bs, backup->speed,
> +                            backup->sync, NULL, backup->compress,
> +                            backup->on_source_error, backup->on_target_error,
> +                            BLOCK_JOB_DEFAULT, NULL, NULL, txn, &local_err);
>      if (local_err != NULL) {
>          error_propagate(errp, local_err);
>      }
>  out:
>      aio_context_release(aio_context);
> +    return job;
>  }
>  
>  void qmp_blockdev_backup(BlockdevBackup *arg, Error **errp)
>  {
> -    do_blockdev_backup(arg, NULL, errp);
> +    BlockJob *job;
> +    job = do_blockdev_backup(arg, NULL, errp);
> +    if (job) {
> +        block_job_start(job);
> +    }
>  }
>  
>  /* Parameter check and block job starting for drive mirroring.
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index b02abbd..83a423c 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -748,7 +748,7 @@ void mirror_start(const char *job_id, BlockDriverState *bs,
>                    bool unmap, Error **errp);
>  
>  /*
> - * backup_start:
> + * backup_job_create:
>   * @job_id: The id of the newly-created job, or %NULL to use the
>   * device name of @bs.
>   * @bs: Block device to operate on.
> @@ -764,18 +764,19 @@ void mirror_start(const char *job_id, BlockDriverState *bs,
>   * @opaque: Opaque pointer value passed to @cb.
>   * @txn: Transaction that this job is part of (may be NULL).
>   *
> - * Start a backup operation on @bs.  Clusters in @bs are written to @target
> + * Create a backup operation on @bs.  Clusters in @bs are written to @target
>   * until the job is cancelled or manually completed.
>   */
> -void backup_start(const char *job_id, BlockDriverState *bs,
> -                  BlockDriverState *target, int64_t speed,
> -                  MirrorSyncMode sync_mode, BdrvDirtyBitmap *sync_bitmap,
> -                  bool compress,
> -                  BlockdevOnError on_source_error,
> -                  BlockdevOnError on_target_error,
> -                  int creation_flags,
> -                  BlockCompletionFunc *cb, void *opaque,
> -                  BlockJobTxn *txn, Error **errp);
> +BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
> +                            BlockDriverState *target, int64_t speed,
> +                            MirrorSyncMode sync_mode,
> +                            BdrvDirtyBitmap *sync_bitmap,
> +                            bool compress,
> +                            BlockdevOnError on_source_error,
> +                            BlockdevOnError on_target_error,
> +                            int creation_flags,
> +                            BlockCompletionFunc *cb, void *opaque,
> +                            BlockJobTxn *txn, Error **errp);
>  
>  void hmp_drive_add_node(Monitor *mon, const char *optstr);
>  
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v3 5/6] blockjob: refactor backup_start as backup_job_create
  2016-11-03 13:17   ` Kevin Wolf
@ 2016-11-08  5:41     ` John Snow
  2016-11-08  9:11       ` Kevin Wolf
  0 siblings, 1 reply; 22+ messages in thread
From: John Snow @ 2016-11-08  5:41 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: vsementsov, qemu-block, jcody, qemu-devel, stefanha, pbonzini



On 11/03/2016 09:17 AM, Kevin Wolf wrote:
> On 02.11.2016 at 18:50, John Snow wrote:
>> Refactor backup_start as backup_job_create, which only creates the job,
>> but does not automatically start it. The old interface, 'backup_start',
>> is not kept in favor of limiting the number of nearly-identical interfaces
>> that would have to be edited to keep up with QAPI changes in the future.
>>
>> Callers that wish to synchronously start the backup_block_job can
>> instead just call block_job_start immediately after calling
>> backup_job_create.
>>
>> Transactions are updated to use the new interface, calling block_job_start
>> only during the .commit phase, which helps prevent race conditions where
>> jobs may finish before we even finish building the transaction. This may
>> happen, for instance, during empty block backup jobs.
>>
>> Reported-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> Signed-off-by: John Snow <jsnow@redhat.com>
>
>> +static void drive_backup_commit(BlkActionState *common)
>> +{
>> +    DriveBackupState *state = DO_UPCAST(DriveBackupState, common, common);
>> +    if (state->job) {
>> +        block_job_start(state->job);
>> +    }
>>  }
>
> How could state->job ever be NULL?
>

Mechanical thinking. It can't. (I definitely didn't copy paste from the 
.abort routines. Definitely.)

> Same question for abort, and for blockdev_backup_commit/abort.
>

Abort ... we may not have created the job successfully. Abort gets 
called whether or not we made it to or through the matching .prepare.
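
(For reference, qmp_transaction's failure path just walks every action
state it queued before the failure -- roughly:

    /* simplified shape of blockdev.c's unwind path */
    delete_and_fail:
        QSIMPLEQ_FOREACH(state, &snap_bdrv_states, entry) {
            if (state->ops->abort) {
                state->ops->abort(state);
            }
        }

so .abort also runs for the very action whose .prepare just failed.)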

> Kevin
>

--js

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v3 5/6] blockjob: refactor backup_start as backup_job_create
  2016-11-08  5:41     ` John Snow
@ 2016-11-08  9:11       ` Kevin Wolf
  2016-11-08 15:24         ` John Snow
  0 siblings, 1 reply; 22+ messages in thread
From: Kevin Wolf @ 2016-11-08  9:11 UTC (permalink / raw)
  To: John Snow; +Cc: vsementsov, qemu-block, jcody, qemu-devel, stefanha, pbonzini

On 08.11.2016 at 06:41, John Snow wrote:
> On 11/03/2016 09:17 AM, Kevin Wolf wrote:
> >On 02.11.2016 at 18:50, John Snow wrote:
> >>Refactor backup_start as backup_job_create, which only creates the job,
> >>but does not automatically start it. The old interface, 'backup_start',
> >>is not kept in favor of limiting the number of nearly-identical interfaces
> >>that would have to be edited to keep up with QAPI changes in the future.
> >>
> >>Callers that wish to synchronously start the backup_block_job can
> >>instead just call block_job_start immediately after calling
> >>backup_job_create.
> >>
> >>Transactions are updated to use the new interface, calling block_job_start
> >>only during the .commit phase, which helps prevent race conditions where
> >>jobs may finish before we even finish building the transaction. This may
> >>happen, for instance, during empty block backup jobs.
> >>
> >>Reported-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> >>Signed-off-by: John Snow <jsnow@redhat.com>
> >
> >>+static void drive_backup_commit(BlkActionState *common)
> >>+{
> >>+    DriveBackupState *state = DO_UPCAST(DriveBackupState, common, common);
> >>+    if (state->job) {
> >>+        block_job_start(state->job);
> >>+    }
> >> }
> >
> >How could state->job ever be NULL?
> >
> 
> Mechanical thinking. It can't. (I definitely didn't copy paste from
> the .abort routines. Definitely.)
> 
> >Same question for abort, and for blockdev_backup_commit/abort.
> >
> 
> Abort ... we may not have created the job successfully. Abort gets
> called whether or not we made it to or through the matching
> .prepare.

Ah, yes, I always forget about this. It's so counterintuitive (and
bdrv_reopen() actually works differently, it only aborts entries that
have successfully been prepared).

Is there a good reason why qmp_transaction() works this way, especially
since we have a separate .clean function?

Kevin

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v3 4/6] blockjob: add block_job_start
  2016-11-08  2:05       ` Jeff Cody
  2016-11-08  2:20         ` John Snow
@ 2016-11-08  9:16         ` Kevin Wolf
  1 sibling, 0 replies; 22+ messages in thread
From: Kevin Wolf @ 2016-11-08  9:16 UTC (permalink / raw)
  To: Jeff Cody
  Cc: John Snow, vsementsov, qemu-block, qemu-devel, stefanha, pbonzini

On 08.11.2016 at 03:05, Jeff Cody wrote:
> On Mon, Nov 07, 2016 at 09:02:14PM -0500, John Snow wrote:
> > On 11/03/2016 08:17 AM, Kevin Wolf wrote:
> > >On 02.11.2016 at 18:50, John Snow wrote:
> > >>+void block_job_start(BlockJob *job)
> > >>+{
> > >>+    assert(job && !block_job_started(job) && job->paused &&
> > >>+           !job->busy && job->driver->start);
> > >>+    job->paused = false;
> > >>+    job->busy = true;
> > >>+    job->co = qemu_coroutine_create(job->driver->start, job);
> > >>+    qemu_coroutine_enter(job->co);
> > >>+}
> > >
> > >We allow the user to pause a job while it's not started yet. You
> > >classified this as "harmless". But if we accept this, can we really
> > >unconditionally enter the coroutine even if the job has been paused?
> > >Can't a user expect that a job remains in paused state when they
> > >explicitly requested a pause and the job was already internally paused,
> > >like in this case by block_job_create()?
> > >
> > 
> > What will end up happening is that we'll enter the job, and then it'll pause
> > immediately upon entrance. Is that a problem?
> > 
> > If the jobs themselves are not checking their pause state fastidiously, it
> > could be (but block/backup does -- after it creates a write notifier.)
> > 
> > Do we want a stronger guarantee here?
> > 
> > Naively I think it's OK as-is, but I could add a stronger boolean that
> > lets us know if it's okay to start or not, and we could delay the actual
> > creation and start until the 'resume' comes in if you'd like.
> > 
> > I'd like to avoid the complexity if we can help it, but perhaps I'm not
> > thinking carefully enough about the existing edge cases.
> > 
> 
> Is there any reason we can't just use job->pause_count here?  When the job
> is created, set job->paused = true, and job->pause_count = 1.  In the
> block_job_start(), check the pause_count prior to qemu_coroutine_enter():
> 
>     void block_job_start(BlockJob *job)
>     {
>         assert(job && !block_job_started(job) && job->paused &&
>               !job->busy && job->driver->start);
>         job->co = qemu_coroutine_create(job->driver->start, job);
>         job->paused = --job->pause_count > 0;
>         if (!job->paused) {
>             job->busy = true;
>             qemu_coroutine_enter(job->co);
>         }
>     }

Yes, something like this is what I had in mind.

> > >The same probably also applies to the internal job pausing during
> > >bdrv_drain_all_begin/end, though as you know there is a larger problem
> > >with starting jobs under drain_all anyway. For now, we just need to keep
> > >in mind that we can neither create nor start a job in such sections.
> > >
> > 
> > Yeah, there are deeper problems there. As long as the existing critical
> > sections don't allow us to create jobs (started or not) I think we're
> > probably already OK.

My point here was that we would like to get rid of that restriction
eventually, and if we add more and more things that depend on the
restriction, getting rid of it will only become harder.

But with the above code, I think this specific problem is solved.

Kevin

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v3 5/6] blockjob: refactor backup_start as backup_job_create
  2016-11-08  9:11       ` Kevin Wolf
@ 2016-11-08 15:24         ` John Snow
  2016-11-08 18:30           ` Jeff Cody
  0 siblings, 1 reply; 22+ messages in thread
From: John Snow @ 2016-11-08 15:24 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: vsementsov, qemu-block, jcody, qemu-devel, stefanha, pbonzini



On 11/08/2016 04:11 AM, Kevin Wolf wrote:
> On 08.11.2016 at 06:41, John Snow wrote:
>> On 11/03/2016 09:17 AM, Kevin Wolf wrote:
>>> On 02.11.2016 at 18:50, John Snow wrote:
>>>> Refactor backup_start as backup_job_create, which only creates the job,
>>>> but does not automatically start it. The old interface, 'backup_start',
>>>> is not kept in favor of limiting the number of nearly-identical interfaces
>>>> that would have to be edited to keep up with QAPI changes in the future.
>>>>
>>>> Callers that wish to synchronously start the backup_block_job can
>>>> instead just call block_job_start immediately after calling
>>>> backup_job_create.
>>>>
>>>> Transactions are updated to use the new interface, calling block_job_start
>>>> only during the .commit phase, which helps prevent race conditions where
>>>> jobs may finish before we even finish building the transaction. This may
>>>> happen, for instance, during empty block backup jobs.
>>>>
>>>> Reported-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>> Signed-off-by: John Snow <jsnow@redhat.com>
>>>
>>>> +static void drive_backup_commit(BlkActionState *common)
>>>> +{
>>>> +    DriveBackupState *state = DO_UPCAST(DriveBackupState, common, common);
>>>> +    if (state->job) {
>>>> +        block_job_start(state->job);
>>>> +    }
>>>> }
>>>
>>> How could state->job ever be NULL?
>>>
>>
>> Mechanical thinking. It can't. (I definitely didn't copy paste from
>> the .abort routines. Definitely.)
>>
>>> Same question for abort, and for blockdev_backup_commit/abort.
>>>
>>
>> Abort ... we may not have created the job successfully. Abort gets
>> called whether or not we made it to or through the matching
>> .prepare.
>
> Ah, yes, I always forget about this. It's so counterintuitive (and
> bdrv_reopen() actually works differently, it only aborts entries that
> have successfully been prepared).
>
> Is there a good reason why qmp_transaction() works this way, especially
> since we have a separate .clean function?
>
> Kevin
>

We just don't track which actions have succeeded or not, so we loop 
through all actions on each phase regardless.

I could add a little state enumeration (or boolean) to each action and I 
could adjust abort to only run on actions that either completed or 
failed, but in this case I think it still wouldn't change the text for 
.abort, because an action may fail before it got to creating the job, 
for instance.
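
(Hypothetically -- with, say, a new 'prepare_attempted' flag set on 
entry to each action's .prepare() -- abort would look something like 
this, and note the NULL check still can't go away:

    static void drive_backup_abort(BlkActionState *common)
    {
        DriveBackupState *state = DO_UPCAST(DriveBackupState, common, common);

        if (!common->prepare_attempted) {
            return; /* this action's .prepare() never ran */
        }
        /* .prepare() may still have failed before creating the job: */
        if (state->job) {
            block_job_cancel_sync(state->job);
        }
    }

so I'm not sure it buys us much.)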

Unless you'd propose undoing .prepare IN .prepare in failure cases, but 
why write abort code twice? I don't mind it living in .abort, personally.

--js

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v3 5/6] blockjob: refactor backup_start as backup_job_create
  2016-11-08 15:24         ` John Snow
@ 2016-11-08 18:30           ` Jeff Cody
  0 siblings, 0 replies; 22+ messages in thread
From: Jeff Cody @ 2016-11-08 18:30 UTC (permalink / raw)
  To: John Snow
  Cc: Kevin Wolf, vsementsov, qemu-block, qemu-devel, stefanha,
	pbonzini

On Tue, Nov 08, 2016 at 10:24:50AM -0500, John Snow wrote:
> 
> 
> On 11/08/2016 04:11 AM, Kevin Wolf wrote:
> >On 08.11.2016 at 06:41, John Snow wrote:
> >>On 11/03/2016 09:17 AM, Kevin Wolf wrote:
> >>>On 02.11.2016 at 18:50, John Snow wrote:
> >>>>Refactor backup_start as backup_job_create, which only creates the job,
> >>>>but does not automatically start it. The old interface, 'backup_start',
> >>>>is not kept in favor of limiting the number of nearly-identical interfaces
> >>>>that would have to be edited to keep up with QAPI changes in the future.
> >>>>
> >>>>Callers that wish to synchronously start the backup_block_job can
> >>>>instead just call block_job_start immediately after calling
> >>>>backup_job_create.
> >>>>
> >>>>Transactions are updated to use the new interface, calling block_job_start
> >>>>only during the .commit phase, which helps prevent race conditions where
> >>>>jobs may finish before we even finish building the transaction. This may
> >>>>happen, for instance, during empty block backup jobs.
> >>>>
> >>>>Reported-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> >>>>Signed-off-by: John Snow <jsnow@redhat.com>
> >>>
> >>>>+static void drive_backup_commit(BlkActionState *common)
> >>>>+{
> >>>>+    DriveBackupState *state = DO_UPCAST(DriveBackupState, common, common);
> >>>>+    if (state->job) {
> >>>>+        block_job_start(state->job);
> >>>>+    }
> >>>>}
> >>>
> >>>How could state->job ever be NULL?
> >>>
> >>
> >>Mechanical thinking. It can't. (I definitely didn't copy paste from
> >>the .abort routines. Definitely.)
> >>
> >>>Same question for abort, and for blockdev_backup_commit/abort.
> >>>
> >>
> >>Abort ... we may not have created the job successfully. Abort gets
> >>called whether or not we made it to or through the matching
> >>.prepare.
> >
> >Ah, yes, I always forget about this. It's so counterintuitive (and
> >bdrv_reopen() actually works differently, it only aborts entries that
> >have successfully been prepared).
> >
> >Is there a good reason why qmp_transaction() works this way, especially
> >since we have a separate .clean function?
> >
> >Kevin
> >
> 
> We just don't track which actions have succeeded or not, so we loop through
> all actions on each phase regardless.
> 
> I could add a little state enumeration (or boolean) to each action and I
> could adjust abort to only run on actions that either completed or failed,
> but in this case I think it still wouldn't change the text for .abort,
> because an action may fail before it got to creating the job, for instance.
> 

As far as this part goes, couldn't we just do it without any flags, by not
inserting the state into the snap_bdrv_states list unless it was successful
(assuming _prepare cleans up after itself on failure)?  E.g.:
 
-        QSIMPLEQ_INSERT_TAIL(&snap_bdrv_states, state, entry);
 
         state->ops->prepare(state, &local_err);
         if (local_err) {
             error_propagate(errp, local_err);
+            g_free(state);
             goto delete_and_fail;
         }
+        QSIMPLEQ_INSERT_TAIL(&snap_bdrv_states, state, entry);
     }
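
That way delete_and_fail (and thus .abort/.clean) would only ever see
fully-prepared entries.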

> Unless you'd propose undoing .prepare IN .prepare in failure cases, but why
> write abort code twice? I don't mind it living in .abort, personally.
>

Doing it the above way would indeed require prepare functions to clean up
after themselves on failure.

The bdrv_reopen() model does it this way, and I think it makes sense.  With
most APIs, on failure you wouldn't have a way of knowing what has or has not
been done, so it leaves everything in a clean state.  I think this is a good
model to follow.

It is also what most QEMU block interfaces currently do, iirc (.bdrv_open,
etc.) - if it fails, it is assumed that it frees all resources it allocated.

I guess it doesn't have to be done this way, and the complexity can just be
pushed into the _abort() function.  After all, with these transactional
models, there exists an abort function, which differentiates it from most
other APIs.  But the downside is that we have different ways of handling
essentially the same sort of transactional model in the block layer (between
bdrv_reopen and qmp_transaction), and it trips up reviewers / authors.  

(I don't think changing how qmp_transaction handles this is something that
needs to be handled in this series - but it would be nice in the future
sometime).

Jeff

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2016-11-08 18:30 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-11-02 17:50 [Qemu-devel] [PATCH v3 0/6] jobs: fix transactional race condition John Snow
2016-11-02 17:50 ` [Qemu-devel] [PATCH v3 1/6] blockjob: fix dead pointer in txn list John Snow
2016-11-08  2:47   ` Jeff Cody
2016-11-02 17:50 ` [Qemu-devel] [PATCH v3 2/6] blockjob: add .clean property John Snow
2016-11-08  2:51   ` Jeff Cody
2016-11-02 17:50 ` [Qemu-devel] [PATCH v3 3/6] blockjob: add .start field John Snow
2016-11-08  2:58   ` Jeff Cody
2016-11-02 17:50 ` [Qemu-devel] [PATCH v3 4/6] blockjob: add block_job_start John Snow
2016-11-03 12:17   ` Kevin Wolf
2016-11-08  2:02     ` John Snow
2016-11-08  2:05       ` Jeff Cody
2016-11-08  2:20         ` John Snow
2016-11-08  9:16         ` Kevin Wolf
2016-11-02 17:50 ` [Qemu-devel] [PATCH v3 5/6] blockjob: refactor backup_start as backup_job_create John Snow
2016-11-03 13:17   ` Kevin Wolf
2016-11-08  5:41     ` John Snow
2016-11-08  9:11       ` Kevin Wolf
2016-11-08 15:24         ` John Snow
2016-11-08 18:30           ` Jeff Cody
2016-11-08  3:14   ` Jeff Cody
2016-11-02 17:50 ` [Qemu-devel] [PATCH v3 6/6] iotests: add transactional failure race test John Snow
2016-11-03 13:21 ` [Qemu-devel] [PATCH v3 0/6] jobs: fix transactional race condition Kevin Wolf
