From: Stefan Hajnoczi <stefanha@redhat.com>
To: John Snow <jsnow@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>,
famz@redhat.com, Jeff Cody <jcody@redhat.com>,
qemu-devel@nongnu.org, mreitz@redhat.com,
vsementsov@parallels.com
Subject: Re: [Qemu-devel] [PATCH 05/10] block: add block job transactions
Date: Tue, 30 Jun 2015 17:20:25 +0100 [thread overview]
Message-ID: <20150630162025.GG31899@stefanha-thinkpad.redhat.com> (raw)
In-Reply-To: <5591C8D5.6080005@redhat.com>
[-- Attachment #1: Type: text/plain, Size: 4721 bytes --]
On Mon, Jun 29, 2015 at 06:38:13PM -0400, John Snow wrote:
> On 06/25/2015 08:12 AM, Stefan Hajnoczi wrote:
> > +/* The purpose of this is to keep txn alive until all jobs have been added */
> > +void block_job_txn_begin(BlockJobTxn *txn)
> > +{
> > + block_job_txn_unref(txn);
> > +}
> > +
>
> Maybe it's not entirely clear to the caller that "begin" may in fact
> actually delete the BlockJobTxn. Shall we update the caller's pointer to
> NULL in this case as a hint? Passing a **txn will imply that we are
> giving up our ownership of the object.
Good idea.
> > +/* Cancel all other jobs in case of abort, wake all waiting jobs in case of
> > + * successful completion. Runs from main loop.
> > + */
> > +static void block_job_txn_complete(BlockJob *job, void *opaque)
> > +{
> > + BlockJobTxn *txn = opaque;
> > + BlockJob *other_job;
> > + bool aborting = txn->aborting;
> > +
> > + qemu_mutex_lock(&txn->lock);
> > + txn->ref++; /* keep txn alive until the end of this loop */
> > +
> > + QLIST_FOREACH(other_job, &txn->jobs, txn_list) {
> > + AioContext *ctx;
> > +
> > + qemu_mutex_unlock(&txn->lock);
> > + ctx = bdrv_get_aio_context(other_job->bs);
> > + aio_context_acquire(ctx);
> > +
> > + /* Cancel all other jobs if aborting. Don't cancel our own failed job
> > + * since cancellation throws away the error value.
> > + */
> > + if (aborting && other_job != job) {
> > + block_job_cancel(other_job);
> > + } else {
> > + block_job_enter(other_job);
> > + }
> > +
> > + aio_context_release(ctx);
> > + qemu_mutex_lock(&txn->lock);
> > + }
> > +
> > + qemu_mutex_unlock(&txn->lock);
> > + block_job_txn_unref(txn);
> > +}
> > +
>
> Maybe we can name this one along the lines of
> block_job_txn_complete_impl or something clearly internal, so that we
> can name the public interface simply "block_job_txn_complete."
>
> Maybe I'm just bike shedding, but calling only a "prepare to X" function
> without a matching "X" call in the exposed API seems odd.
>
> I suppose it doesn't matter, because I can't think of anything nicer :)
>
> > +void coroutine_fn block_job_txn_prepare_to_complete(BlockJobTxn *txn,
> > + BlockJob *job,
> > + int ret)
I'll rename the function to:
void coroutine_fn block_job_txn_job_done(BlockJobTxn *txn, BlockJob *job, int ret)
"complete" is already overloaded. block_job_completed() is called by
jobs to terminate. block_job_complete() is called by the monitor to
nudge a waiting job to finish. They are two different things. Maybe
using the term in BlockJobTxn makes things more confusing.
(It's possible to drop the txn argument since it can be fetched from
job->txn, but that makes boundary between BlockJob and BlockJobTxn
harder to understand.)
> > +{
> > + if (!txn) {
> > + return;
> > + }
> > +
> > + qemu_mutex_lock(&txn->lock);
> > +
> > + /* This function is entered in 3 cases:
> > + *
> > + * 1. Successful job completion - wait for other jobs
> > + * 2. First failed/cancelled job in txn - cancel other jobs and wait
> > + * 3. Subsequent cancelled jobs - finish immediately, don't wait
> > + */
> > + trace_block_job_txn_prepare_to_complete_entry(txn, job, ret,
> > + block_job_is_cancelled(job),
> > + txn->aborting,
> > + txn->jobs_pending);
> > +
> > + if (txn->aborting) { /* Case 3 */
> > + assert(block_job_is_cancelled(job));
> > + goto out; /* already cancelled, don't yield */
> > + }
> > +
>
> So the first failure forces all jobs not-yet-complete to cancel. Is
> there any chance for a race condition of two jobs completing almost
> simultaneously, where the first fails and the second completes, and the
> 2nd job makes it here before it gets canceled?
>
> BOD: I really assume the answer is "no," but it's not immediately
> evident to me.
Good catch! It is possible if the 2nd job is entered after the 1st job
yielded but before block_job_txn_complete() is called. There would have
to be a callback pending for the 2nd job.
It can be fixed by adding a new case for jobs that call
block_job_txn_prepare_to_complete() when block_job_txn_complete has been
scheduled. The job should simply wait for block_job_txn_complete().
I'll add a test case for this scenario.
[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]
next prev parent reply other threads:[~2015-06-30 16:20 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-06-25 12:12 [Qemu-devel] [PATCH 00/10] block: incremental backup transactions using BlockJobTxn Stefan Hajnoczi
2015-06-25 12:12 ` [Qemu-devel] [PATCH 01/10] qapi: Add transaction support to block-dirty-bitmap operations Stefan Hajnoczi
2015-06-25 12:12 ` [Qemu-devel] [PATCH 02/10] iotests: add transactional incremental backup test Stefan Hajnoczi
2015-06-25 12:12 ` [Qemu-devel] [PATCH 03/10] block: rename BlkTransactionState and BdrvActionOps Stefan Hajnoczi
2015-06-25 12:12 ` [Qemu-devel] [PATCH 04/10] block: keep bitmap if incremental backup job is cancelled Stefan Hajnoczi
2015-06-26 6:00 ` Fam Zheng
2015-06-29 22:36 ` John Snow
2015-06-25 12:12 ` [Qemu-devel] [PATCH 05/10] block: add block job transactions Stefan Hajnoczi
2015-06-26 6:41 ` Fam Zheng
2015-06-29 22:38 ` John Snow
2015-06-30 16:20 ` Stefan Hajnoczi [this message]
2015-06-25 12:12 ` [Qemu-devel] [PATCH 06/10] blockdev: make BlockJobTxn available to qmp 'transaction' Stefan Hajnoczi
2015-06-26 6:42 ` Fam Zheng
2015-06-29 22:38 ` John Snow
2015-06-25 12:12 ` [Qemu-devel] [PATCH 07/10] block/backup: support block job transactions Stefan Hajnoczi
2015-06-26 6:44 ` Fam Zheng
2015-06-29 22:39 ` John Snow
2015-06-30 15:27 ` Stefan Hajnoczi
2015-06-30 15:52 ` John Snow
2015-07-01 8:45 ` Stefan Hajnoczi
2015-06-25 12:12 ` [Qemu-devel] [PATCH 08/10] iotests: 124 - transactional failure test Stefan Hajnoczi
2015-06-25 12:12 ` [Qemu-devel] [PATCH 09/10] qmp-commands.hx: Update the supported 'transaction' operations Stefan Hajnoczi
2015-06-25 12:12 ` [Qemu-devel] [PATCH 10/10] tests: add BlockJobTxn unit test Stefan Hajnoczi
2015-06-26 6:58 ` Fam Zheng
2015-06-29 15:08 ` [Qemu-devel] [PATCH 00/10] block: incremental backup transactions using BlockJobTxn Stefan Hajnoczi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150630162025.GG31899@stefanha-thinkpad.redhat.com \
--to=stefanha@redhat.com \
--cc=famz@redhat.com \
--cc=jcody@redhat.com \
--cc=jsnow@redhat.com \
--cc=kwolf@redhat.com \
--cc=mreitz@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=vsementsov@parallels.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).