From: John Snow <jsnow@redhat.com>
To: Stefan Hajnoczi <stefanha@redhat.com>, qemu-devel@nongnu.org
Cc: Kevin Wolf <kwolf@redhat.com>, Jeff Cody <jcody@redhat.com>,
vsementsov@parallels.com, famz@redhat.com, mreitz@redhat.com
Subject: Re: [Qemu-devel] [PATCH 05/10] block: add block job transactions
Date: Mon, 29 Jun 2015 18:38:13 -0400 [thread overview]
Message-ID: <5591C8D5.6080005@redhat.com> (raw)
In-Reply-To: <1435234332-581-6-git-send-email-stefanha@redhat.com>
On 06/25/2015 08:12 AM, Stefan Hajnoczi wrote:
> Sometimes block jobs must execute as a transaction group. Finishing
> jobs wait until all other jobs are ready to complete successfully.
> Failure or cancellation of one job cancels the other jobs in the group.
>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
> blockjob.c | 160 ++++++++++++++++++++++++++++++++++++++++++++++
> include/block/block.h | 1 +
> include/block/block_int.h | 3 +-
> include/block/blockjob.h | 49 ++++++++++++++
> trace-events | 4 ++
> 5 files changed, 216 insertions(+), 1 deletion(-)
>
> diff --git a/blockjob.c b/blockjob.c
> index ec46fad..3c6f1d4 100644
> --- a/blockjob.c
> +++ b/blockjob.c
> @@ -400,3 +400,163 @@ void block_job_defer_to_main_loop(BlockJob *job,
>
> qemu_bh_schedule(data->bh);
> }
> +
> +/* Transactional group of block jobs */
> +struct BlockJobTxn {
> + /* Jobs may be in different AioContexts so protect all fields */
> + QemuMutex lock;
> +
> + /* Reference count for txn object */
> + unsigned int ref;
> +
> + /* Is this txn cancelling its jobs? */
> + bool aborting;
> +
> + /* Number of jobs still running */
> + unsigned int jobs_pending;
> +
> + /* List of jobs */
> + QLIST_HEAD(, BlockJob) jobs;
> +};
> +
> +BlockJobTxn *block_job_txn_new(void)
> +{
> + BlockJobTxn *txn = g_new(BlockJobTxn, 1);
> + qemu_mutex_init(&txn->lock);
> + txn->ref = 1; /* dropped by block_job_txn_begin() */
> + txn->aborting = false;
> + txn->jobs_pending = 0;
> + QLIST_INIT(&txn->jobs);
> + return txn;
> +}
> +
> +static void block_job_txn_unref(BlockJobTxn *txn)
> +{
> + qemu_mutex_lock(&txn->lock);
> +
> + if (--txn->ref > 0) {
> + qemu_mutex_unlock(&txn->lock);
> + return;
> + }
> +
> + qemu_mutex_unlock(&txn->lock);
> + qemu_mutex_destroy(&txn->lock);
> + g_free(txn);
> +}
> +
> +/* The purpose of this is to keep txn alive until all jobs have been added */
> +void block_job_txn_begin(BlockJobTxn *txn)
> +{
> + block_job_txn_unref(txn);
> +}
> +
Maybe it's not entirely clear to the caller that "begin" may in fact
actually delete the BlockJobTxn. Shall we update the caller's pointer to
NULL in this case as a hint? Passing a **txn will imply that we are
giving up our ownership of the object.
> +void block_job_txn_add_job(BlockJobTxn *txn, BlockJob *job)
> +{
> + if (!txn) {
> + return;
> + }
> +
> + assert(!job->txn);
> + job->txn = txn;
> +
> + qemu_mutex_lock(&txn->lock);
> + txn->ref++;
> + txn->jobs_pending++;
> + QLIST_INSERT_HEAD(&txn->jobs, job, txn_list);
> + qemu_mutex_unlock(&txn->lock);
> +}
> +
> +/* Cancel all other jobs in case of abort, wake all waiting jobs in case of
> + * successful completion. Runs from main loop.
> + */
> +static void block_job_txn_complete(BlockJob *job, void *opaque)
> +{
> + BlockJobTxn *txn = opaque;
> + BlockJob *other_job;
> + bool aborting = txn->aborting;
> +
> + qemu_mutex_lock(&txn->lock);
> + txn->ref++; /* keep txn alive until the end of this loop */
> +
> + QLIST_FOREACH(other_job, &txn->jobs, txn_list) {
> + AioContext *ctx;
> +
> + qemu_mutex_unlock(&txn->lock);
> + ctx = bdrv_get_aio_context(other_job->bs);
> + aio_context_acquire(ctx);
> +
> + /* Cancel all other jobs if aborting. Don't cancel our own failed job
> + * since cancellation throws away the error value.
> + */
> + if (aborting && other_job != job) {
> + block_job_cancel(other_job);
> + } else {
> + block_job_enter(other_job);
> + }
> +
> + aio_context_release(ctx);
> + qemu_mutex_lock(&txn->lock);
> + }
> +
> + qemu_mutex_unlock(&txn->lock);
> + block_job_txn_unref(txn);
> +}
> +
Maybe we can name this one along the lines of
block_job_txn_complete_impl or something clearly internal, so that we
can name the public interface simply "block_job_txn_complete."
Maybe I'm just bike shedding, but calling only a "prepare to X" function
without a matching "X" call in the exposed API seems odd.
I suppose it doesn't matter, because I can't think of anything nicer :)
> +void coroutine_fn block_job_txn_prepare_to_complete(BlockJobTxn *txn,
> + BlockJob *job,
> + int ret)
> +{
> + if (!txn) {
> + return;
> + }
> +
> + qemu_mutex_lock(&txn->lock);
> +
> + /* This function is entered in 3 cases:
> + *
> + * 1. Successful job completion - wait for other jobs
> + * 2. First failed/cancelled job in txn - cancel other jobs and wait
> + * 3. Subsequent cancelled jobs - finish immediately, don't wait
> + */
> + trace_block_job_txn_prepare_to_complete_entry(txn, job, ret,
> + block_job_is_cancelled(job),
> + txn->aborting,
> + txn->jobs_pending);
> +
> + if (txn->aborting) { /* Case 3 */
> + assert(block_job_is_cancelled(job));
> + goto out; /* already cancelled, don't yield */
> + }
> +
So the first failure forces all jobs not-yet-complete to cancel. Is
there any chance for a race condition of two jobs completing almost
simultaneously, where the first fails and the second completes, and the
2nd job makes it here before it gets canceled?
BOD: I really assume the answer is "no," but it's not immediately
evident to me.
> + if (ret != 0 || block_job_is_cancelled(job)) { /* Case 2 */
> +abort:
> + txn->aborting = true;
> + block_job_defer_to_main_loop(job, block_job_txn_complete, txn);
> + } else { /* Case 1 */
> + if (--txn->jobs_pending == 0) {
> + block_job_defer_to_main_loop(job, block_job_txn_complete, txn);
> + }
> + }
> +
> + /* Wait for block_job_txn_complete() */
> + do {
> + qemu_mutex_unlock(&txn->lock);
> + job->busy = false;
> + qemu_coroutine_yield();
> + job->busy = true;
> + qemu_mutex_lock(&txn->lock);
> +
> + if (block_job_is_cancelled(job) && !txn->aborting) {
> + goto abort; /* this job just got cancelled by the user */
> + }
> + } while (!txn->aborting && txn->jobs_pending > 0);
> +
> +out:
> + trace_block_job_txn_prepare_to_complete_return(txn, job, ret,
> + block_job_is_cancelled(job),
> + txn->aborting,
> + txn->jobs_pending);
> +
> + qemu_mutex_unlock(&txn->lock);
> + block_job_txn_unref(txn);
> +}
> diff --git a/include/block/block.h b/include/block/block.h
> index a4c505d..cb19c73 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -13,6 +13,7 @@
> typedef struct BlockDriver BlockDriver;
> typedef struct BlockJob BlockJob;
> typedef struct BdrvChildRole BdrvChildRole;
> +typedef struct BlockJobTxn BlockJobTxn;
>
> typedef struct BlockDriverInfo {
> /* in bytes, 0 if irrelevant */
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index ea3e7f0..812a18a 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -639,6 +639,7 @@ void mirror_start(BlockDriverState *bs, BlockDriverState *target,
> * @on_source_error: The action to take upon error reading from the source.
> * @on_target_error: The action to take upon error writing to the target.
> * @cb: Completion function for the job.
> + * @txn: Transaction that this job is part of (may be NULL).
> * @opaque: Opaque pointer value passed to @cb.
> *
> * Start a backup operation on @bs. Clusters in @bs are written to @target
> @@ -650,7 +651,7 @@ void backup_start(BlockDriverState *bs, BlockDriverState *target,
> BlockdevOnError on_source_error,
> BlockdevOnError on_target_error,
> BlockCompletionFunc *cb, void *opaque,
> - Error **errp);
> + BlockJobTxn *txn, Error **errp);
>
> void blk_dev_change_media_cb(BlockBackend *blk, bool load);
> bool blk_dev_has_removable_media(BlockBackend *blk);
> diff --git a/include/block/blockjob.h b/include/block/blockjob.h
> index 57d8ef1..ce57e98 100644
> --- a/include/block/blockjob.h
> +++ b/include/block/blockjob.h
> @@ -122,6 +122,10 @@ struct BlockJob {
>
> /** The opaque value that is passed to the completion function. */
> void *opaque;
> +
> + /** Non-NULL if this job is part of a transaction */
> + BlockJobTxn *txn;
> + QLIST_ENTRY(BlockJob) txn_list;
> };
>
> /**
> @@ -348,4 +352,49 @@ void block_job_defer_to_main_loop(BlockJob *job,
> BlockJobDeferToMainLoopFn *fn,
> void *opaque);
>
> +/**
> + * block_job_txn_new:
> + *
> + * Allocate and return a new block job transaction. Jobs can be added to the
> + * transaction using block_job_txn_add_job(). block_job_txn_begin() must be
> + * called when all jobs (if any) have been added.
> + *
> + * All jobs in the transaction either complete successfully or fail/cancel as a
> + * group. Jobs wait for each other before completing. Cancelling one job
> + * cancels all jobs in the transaction.
> + */
> +BlockJobTxn *block_job_txn_new(void);
> +
> +/**
> + * block_job_txn_add_job:
> + * @txn: The transaction
> + * @job: Job to add to the transaction
> + *
> + * Add @job to the transaction. The @job must not already be in a transaction.
> + * The block job driver must call block_job_txn_prepare_to_complete() before
> + * final cleanup and completion.
> + */
> +void block_job_txn_add_job(BlockJobTxn *txn, BlockJob *job);
> +
> +/**
> + * block_job_txn_begin:
> + * @txn: The transaction
> + *
> + * Call this to mark the end of adding jobs to the transaction. This must be
> + * called even if no jobs were added.
> + */
> +void block_job_txn_begin(BlockJobTxn *txn);
> +
> +/**
> + * block_job_txn_prepare_to_complete:
> + * @txn: The transaction
> + * @job: The block job
> + * @ret: Block job return value (0 for success, otherwise job failure)
> + *
> + * Wait for other jobs in the transaction to complete. If @ret is non-zero or
> + * @job is cancelled, all other jobs in the transaction will be cancelled.
> + */
> +void coroutine_fn block_job_txn_prepare_to_complete(BlockJobTxn *txn,
> + BlockJob *job, int ret);
> +
> #endif
> diff --git a/trace-events b/trace-events
> index 52b7efa..b6a43a0 100644
> --- a/trace-events
> +++ b/trace-events
> @@ -123,6 +123,10 @@ virtio_blk_data_plane_start(void *s) "dataplane %p"
> virtio_blk_data_plane_stop(void *s) "dataplane %p"
> virtio_blk_data_plane_process_request(void *s, unsigned int out_num, unsigned int in_num, unsigned int head) "dataplane %p out_num %u in_num %u head %u"
>
> +# blockjob.c
> +block_job_txn_prepare_to_complete_entry(void *txn, void *job, int ret, bool cancelled, bool aborting, unsigned int jobs_pending) "txn %p job %p ret %d cancelled %d aborting %d jobs_pending %u"
> +block_job_txn_prepare_to_complete_return(void *txn, void *job, int ret, bool cancelled, bool aborting, unsigned int jobs_pending) "txn %p job %p ret %d cancelled %d aborting %d jobs_pending %u"
> +
> # hw/virtio/dataplane/vring.c
> vring_setup(uint64_t physical, void *desc, void *avail, void *used) "vring physical %#"PRIx64" desc %p avail %p used %p"
>
>
Bike-shedding comments and a benefit-of-the-doubt aside;
Reviewed-by: John Snow <jsnow@redhat.com>
next prev parent reply other threads:[~2015-06-29 22:38 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-06-25 12:12 [Qemu-devel] [PATCH 00/10] block: incremental backup transactions using BlockJobTxn Stefan Hajnoczi
2015-06-25 12:12 ` [Qemu-devel] [PATCH 01/10] qapi: Add transaction support to block-dirty-bitmap operations Stefan Hajnoczi
2015-06-25 12:12 ` [Qemu-devel] [PATCH 02/10] iotests: add transactional incremental backup test Stefan Hajnoczi
2015-06-25 12:12 ` [Qemu-devel] [PATCH 03/10] block: rename BlkTransactionState and BdrvActionOps Stefan Hajnoczi
2015-06-25 12:12 ` [Qemu-devel] [PATCH 04/10] block: keep bitmap if incremental backup job is cancelled Stefan Hajnoczi
2015-06-26 6:00 ` Fam Zheng
2015-06-29 22:36 ` John Snow
2015-06-25 12:12 ` [Qemu-devel] [PATCH 05/10] block: add block job transactions Stefan Hajnoczi
2015-06-26 6:41 ` Fam Zheng
2015-06-29 22:38 ` John Snow [this message]
2015-06-30 16:20 ` Stefan Hajnoczi
2015-06-25 12:12 ` [Qemu-devel] [PATCH 06/10] blockdev: make BlockJobTxn available to qmp 'transaction' Stefan Hajnoczi
2015-06-26 6:42 ` Fam Zheng
2015-06-29 22:38 ` John Snow
2015-06-25 12:12 ` [Qemu-devel] [PATCH 07/10] block/backup: support block job transactions Stefan Hajnoczi
2015-06-26 6:44 ` Fam Zheng
2015-06-29 22:39 ` John Snow
2015-06-30 15:27 ` Stefan Hajnoczi
2015-06-30 15:52 ` John Snow
2015-07-01 8:45 ` Stefan Hajnoczi
2015-06-25 12:12 ` [Qemu-devel] [PATCH 08/10] iotests: 124 - transactional failure test Stefan Hajnoczi
2015-06-25 12:12 ` [Qemu-devel] [PATCH 09/10] qmp-commands.hx: Update the supported 'transaction' operations Stefan Hajnoczi
2015-06-25 12:12 ` [Qemu-devel] [PATCH 10/10] tests: add BlockJobTxn unit test Stefan Hajnoczi
2015-06-26 6:58 ` Fam Zheng
2015-06-29 15:08 ` [Qemu-devel] [PATCH 00/10] block: incremental backup transactions using BlockJobTxn Stefan Hajnoczi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5591C8D5.6080005@redhat.com \
--to=jsnow@redhat.com \
--cc=famz@redhat.com \
--cc=jcody@redhat.com \
--cc=kwolf@redhat.com \
--cc=mreitz@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=stefanha@redhat.com \
--cc=vsementsov@parallels.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.