Re: [Qemu-devel] [Qemu-block] [PATCH v10 12/14] block: add transactional properties

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: John Snow <jsnow@redhat.com>
To: Kevin Wolf <kwolf@redhat.com>
Cc: vsementsov@virtuozzo.com, famz@redhat.com, qemu-block@nongnu.org,
	armbru@redhat.com, qemu-devel@nongnu.org,
	Stefan Hajnoczi <stefanha@redhat.com>
Subject: Re: [Qemu-devel] [Qemu-block] [PATCH v10 12/14] block: add transactional properties
Date: Fri, 6 Nov 2015 13:46:42 -0500	[thread overview]
Message-ID: <563CF592.3060302@redhat.com> (raw)
In-Reply-To: <20151106083219.GB4071@noname.redhat.com>

On 11/06/2015 03:32 AM, Kevin Wolf wrote:
> Am 05.11.2015 um 19:52 hat John Snow geschrieben:
>>
>>
>> On 11/05/2015 05:47 AM, Stefan Hajnoczi wrote:
>>> On Tue, Nov 03, 2015 at 12:27:19PM -0500, John Snow wrote:
>>>>
>>>>
>>>> On 11/03/2015 10:17 AM, Stefan Hajnoczi wrote:
>>>>> On Fri, Oct 23, 2015 at 07:56:50PM -0400, John Snow wrote:
>>>>>> @@ -1732,6 +1757,10 @@ static void block_dirty_bitmap_add_prepare(BlkActionState *common,
>>>>>>      BlockDirtyBitmapState *state = DO_UPCAST(BlockDirtyBitmapState,
>>>>>>                                               common, common);
>>>>>>  
>>>>>> +    if (action_check_cancel_mode(common, errp) < 0) {
>>>>>> +        return;
>>>>>> +    }
>>>>>> +
>>>>>>      action = common->action->block_dirty_bitmap_add;
>>>>>>      /* AIO context taken and released within qmp_block_dirty_bitmap_add */
>>>>>>      qmp_block_dirty_bitmap_add(action->node, action->name,
>>>>>> @@ -1767,6 +1796,10 @@ static void block_dirty_bitmap_clear_prepare(BlkActionState *common,
>>>>>>                                               common, common);
>>>>>>      BlockDirtyBitmap *action;
>>>>>>  
>>>>>> +    if (action_check_cancel_mode(common, errp) < 0) {
>>>>>> +        return;
>>>>>> +    }
>>>>>> +
>>>>>>      action = common->action->block_dirty_bitmap_clear;
>>>>>>      state->bitmap = block_dirty_bitmap_lookup(action->node,
>>>>>>                                                action->name,
>>>>>
>>>>> Why do the bitmap add/clear actions not support err-cancel=all?
>>>>>
>>>>> I understand why other block jobs don't support it, but it's not clear
>>>>> why these non-block job actions cannot.
>>>>>
>>>>
>>>> Because they don't have a callback to invoke if the rest of the job fails.
>>>>
>>>> I could create a BlockJob for them complete with a callback to invoke,
>>>> but basically it's just because there's no interface to unwind them, or
>>>> an interface to join them with the transaction.
>>>>
>>>> They're small, synchronous non-job actions. Which makes them weird.
>>>
>>> Funny, we've been looking at the same picture while seeing different
>>> things:
>>> https://en.wikipedia.org/wiki/Rabbit%E2%80%93duck_illusion
>>>
>>> I think I understand your idea: the transaction should include both
>>> immediate actions as well as block jobs.
>>>
>>> My mental model was different: immediate actions commit/abort along with
>>> the 'transaction' command.  Block jobs are separate and complete/cancel
>>> together in a group.
>>>
>>> In practice I think the two end up being similar because we won't be
>>> able to implement immediate action commit/abort together with
>>> long-running block jobs because the immediate actions rely on
>>> quiescing/pausing the guest for atomic commit/abort.
>>>
>>> So with your mental model the QMP client has to submit 2 'transaction'
>>> commands: 1 for the immediate actions, 1 for the block jobs.
>>>
>>> In my mental model the QMP client submits 1 command but the immediate
>>> actions and block jobs are two separate transaction scopes.  This means
>>> if the block jobs fail, the client needs to be aware of the immediate
>>> actions that have committed.  Because of this, it becomes just as much
>>> client effort as submitting two separate 'transaction' commands in your
>>> model.
>>>
>>> Can anyone see a practical difference?  I think I'm happy with John's
>>> model.
>>>
>>> Stefan
>>>
>>
>> We discussed this off-list, but for the sake of the archive:
>>
>> == How it is now ==
>>
>> Currently, transactions have two implicit phases: the first is the
>> synchronous phase. If this phase completes successfully, we consider the
>> transaction a success. The second phase is the asynchronous phase where
>> jobs launched by the synchronous phase run to completion.
>>
>> all synchronous commands must complete for the transaction to "succeed."
>> There are currently (pre-patch) no guarantees about asynchronous command
>> completion. As long as all synchronous actions complete, asynchronous
>> actions are free to succeed or fail individually.
> 
> You're overcomplicating this. All actions are currently synchronous and
> what you consider asynchronous transaction actions aren't actually part
> of the transaction at all. The action is "start block job X", not "run
> block job X".
> 

"OK," except the entire goal of the series was to allow the "aren't
actually part of the transaction at all" parts to live and die together
based on the transaction they were launched from. This really implies
they belong to a second asynchronous phase of the transaction.

Otherwise, why would totally unrelated jobs fail because a different one
did?

I realize this is an /extension/ of the existing semantics vs a "fix,"
and part of the confusion is how I and everyone else was looking at it
differently. How could this happen, though? Let's look at our
transaction documentation:

"Operations that are currently supported:" [...] "- drive-backup" [...]
"If there is any failure
performing any of the operations, all operations for the group are
abandoned."

Where do we say "Except drive-backup, though, because technically the
drive-backup action only starts the job, and we don't really consider
this to be part of the transaction" ? We've never actually defined this
behavior as part of our API as far as I can see.

Regardless: the net effect is still the same. We want jobs launched by
transactions to (optionally!) cancel themselves on failure. The
complications only arise because people want the exact semantic meaning
to be precise for the QMP API.

The concept is simple, the language to describe it has been muddy.

>> == My Model ==
>>
>> The current behavior is my "err-cancel = none" scenario: we offer no
>> guarantee about the success or failure of the transaction as a whole
>> after the synchronous portion has completed.
>>
>> What I was proposing is "err-cancel = all," which to me means that _ALL_
>> commands in this transaction are to succeed (synchronous or not) before
>> _any_ actions are irrevocably committed. This means that for a
>> hypothetical mixed synchronous-asynchronous transaction, that even after
>> the transaction succeeded (it passed the synchronous phase), if an
>> asynchronous action later fails, all actions both synchronous and non
>> are rolled-back -- a kind of retroactive failure of the transaction.
>> This is clearly not possible in all cases, so commands that cannot
>> support these semantics will refuse "err-cancel = all" during the
>> synchronous phase.
> 
> Is this possible in any case? You're losing transaction semantics the
> lastest when you drop the BQL that the monitor holds. At least atomicity
> and isolation aren't given any more.
> 

It might be possible in at least *some* cases. I currently don't even
attempt it, though. This is why some transaction actions just refuse
this parameter if it shows up.

It'd be pretty easy to undo a bitmap-add or a bitmap-clear, as long as I
gave these actions a conditional success callback.

The "undo" semantics of the jobs are somewhat different. I am not
suggesting we can teleport back in time to before we tried to do the
backup, but we can at least abort the backup and make it like you never
issued the command -- which is important for incremental backups.

The rollback behavior for each action needs to be spelled out in our
documentation ... as well as categorizing which actions complete before
qmp_transactions return and which may have lingering effects (like
drive-backup.)

> You can try to undo some parts of what you did later one, but if any
> involved BDS was used in the meantime by anything other than the block
> job, you don't have transactional behaviour any more.
> 
> And isn't the management tool perfectly capable of cleaning up all the
> block jobs without magic happening in qemu if one of them fails? Do we
> actually need atomic failure later on? And if so, do we need atomic
> failure only of block jobs started in the same transaction? Why?
> 

There's always a debate about who is responsible for cleaning things up
on failures: QEMU or the management tool? Luckily: it's an optional
parameter, so the tool can decide at run-time about what semantics it wants.

IMO: It's a little late in this series to question if we need this or
not, but I'll oblige.

The original objection from Stefan to the incremental backup
https://soundcloud.com/tothejazz/gyms-flappy-the-happy-sealtransaction
semantics was that if two incremental backup jobs launched by a
transaction had only partial success, the management tool would have to
take on extra state and possibly issue some corrective actions to QEMU
in order to undo the damage.

QEMU, however, could just unravel the failure fairly trivially and allow
the backup commands to maintain a simple binary success/failure state.
This was agreed to be the better, less complicated management scenario.

In the case that incremental backups partially complete, you'll have one
bitmap deleted and a different bitmap merged back onto the BDS. The
management tool can at this point absolutely not delete the one
incremental that got made and needs to leave it in place before
re-issuing the incremental backup. It's then responsible for either
squashing the two incremental backups it made, or otherwise managing the
disparity in the number of actual backup files.

For QEMU's trouble, we don't delete incremental backup data until after
the backup operation, so it's trivial to hold onto the data until we
know everything's OK.

We decided it was nicest for QEMU to just roll back all of the jobs in
this case. If the management tool disagrees, it can just roll with the
original semantics.

I still believe strongly that this is the right way to go. It's
flexible, allows for either party to manage the failure, the parameter
is completely non-ambiguous, provides for early failure if the expected
semantics are not possible, and the code is already here and mostly
reviewed... except for the API names.

--js

next prev parent reply	other threads:[~2015-11-06 18:46 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-10-23 23:56 [Qemu-devel] [PATCH v10 00/14] block: incremental backup transactions using BlockJobTxn John Snow
2015-10-23 23:56 ` [Qemu-devel] [PATCH v10 01/14] qapi: Add transaction support to block-dirty-bitmap operations John Snow
2015-10-23 23:56 ` [Qemu-devel] [PATCH v10 02/14] iotests: add transactional incremental backup test John Snow
2015-10-23 23:56 ` [Qemu-devel] [PATCH v10 03/14] block: rename BlkTransactionState and BdrvActionOps John Snow
2015-11-03 16:33   ` [Qemu-devel] [Qemu-block] " Jeff Cody
2015-11-03 17:32     ` John Snow
2015-10-23 23:56 ` [Qemu-devel] [PATCH v10 04/14] backup: Extract dirty bitmap handling as a separate function John Snow
2015-10-23 23:56 ` [Qemu-devel] [PATCH v10 05/14] blockjob: Introduce reference count and fix reference to job->bs John Snow
2015-10-23 23:56 ` [Qemu-devel] [PATCH v10 06/14] blockjob: Add .commit and .abort block job actions John Snow
2015-10-23 23:56 ` [Qemu-devel] [PATCH v10 07/14] blockjob: Add "completed" and "ret" in BlockJob John Snow
2015-10-23 23:56 ` [Qemu-devel] [PATCH v10 08/14] blockjob: Simplify block_job_finish_sync John Snow
2015-11-03 13:52   ` Stefan Hajnoczi
2015-10-23 23:56 ` [Qemu-devel] [PATCH v10 09/14] block: Add block job transactions John Snow
2015-10-23 23:56 ` [Qemu-devel] [PATCH v10 10/14] block/backup: Rely on commit/abort for cleanup John Snow
2015-10-23 23:56 ` [Qemu-devel] [PATCH v10 11/14] block: Add BlockJobTxn support to backup_run John Snow
2015-10-23 23:56 ` [Qemu-devel] [PATCH v10 12/14] block: add transactional properties John Snow
2015-11-03 15:17   ` Stefan Hajnoczi
2015-11-03 17:27     ` John Snow
2015-11-05 10:47       ` Stefan Hajnoczi
2015-11-05 18:52         ` John Snow
2015-11-05 19:35           ` Markus Armbruster
2015-11-05 19:43             ` John Snow
2015-11-06  8:32           ` [Qemu-devel] [Qemu-block] " Kevin Wolf
2015-11-06 16:36             ` Stefan Hajnoczi
2015-11-06 18:46             ` John Snow [this message]
2015-11-06 20:35               ` John Snow
2015-11-03 15:23   ` [Qemu-devel] " Eric Blake
2015-11-03 17:31     ` John Snow
2015-11-03 17:35       ` Eric Blake
2015-10-23 23:56 ` [Qemu-devel] [PATCH v10 13/14] iotests: 124 - transactional failure test John Snow
2015-10-23 23:56 ` [Qemu-devel] [PATCH v10 14/14] tests: add BlockJobTxn unit test John Snow
2015-11-02 21:06 ` [Qemu-devel] [PATCH v10 00/14] block: incremental backup transactions using BlockJobTxn John Snow
2015-11-03 15:22 ` Stefan Hajnoczi
2015-11-03 17:46   ` John Snow
2015-11-03 22:07     ` John Snow

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=563CF592.3060302@redhat.com \
    --to=jsnow@redhat.com \
    --cc=armbru@redhat.com \
    --cc=famz@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    --cc=vsementsov@virtuozzo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).