qemu-devel.nongnu.org archive mirror
From: Kevin Wolf <kwolf@redhat.com>
To: John Snow <jsnow@redhat.com>
Cc: qemu-block@nongnu.org, vsementsov@virtuozzo.com, famz@redhat.com,
	jcody@redhat.com, qemu-devel@nongnu.org, stefanha@redhat.com,
	Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v2 10/11] blockjob: refactor backup_start as backup_job_create
Date: Tue, 11 Oct 2016 11:35:53 +0200
Message-ID: <20161011093553.GA6334@noname.redhat.com>
In-Reply-To: <a1c818de-5ba8-825e-b7e6-f25e98befa13@redhat.com>

On 11.10.2016 at 00:51, John Snow wrote:
> >>Sadly for me, I realized this patch has a potential problem. When we
> >>were adding the bitmap operations, it became clear that the
> >>atomicity point was during .prepare, not .commit.
> >>
> >>e.g. the bitmap is cleared or created during prepare, and backup_run
> >>installs its Write Notifier at that point in time, too.
> >
> >Strictly speaking that's wrong then.
> >
> 
> I agree, though I do remember it coming up during the bitmap
> review process that the point-in-time spot is currently during
> prepare.
> 
> I do think that while it's at least a consistent model (the model
> where we do in fact commit during .prepare(), simply undo or
> revert during .abort(), and only clean up or remove the undo cache in
> .commit()), it certainly violates the principle of least surprise and
> is a little rude...

As long as we can reliably undo things in .abort (i.e. use operations
that can't fail) and keep the locks and the device drained, we should be
okay in terms of atomicity.
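
To make the undo-cache model concrete, here is a minimal C sketch of a bitmap-clear action that applies the change in .prepare, restores it in .abort using only an operation that can't fail (memcpy), and merely drops the undo cache in .commit. The `Bitmap` type and the `undo_clear_*` functions are hypothetical illustrations, not QEMU's dirty-bitmap API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a dirty bitmap; not QEMU's BdrvDirtyBitmap. */
typedef struct {
    size_t nbytes;
    unsigned char *bits;
} Bitmap;

/* Per-action state, including the undo cache filled in by prepare. */
typedef struct {
    Bitmap *bitmap;
    unsigned char *backup;   /* undo cache; NULL until prepare succeeds */
} ClearAction;

/* prepare: do the fallible work (allocating the backup) first, then apply
 * the change immediately while keeping enough state to undo it later. */
static bool undo_clear_prepare(ClearAction *a)
{
    a->backup = malloc(a->bitmap->nbytes);
    if (!a->backup) {
        return false;            /* nothing changed yet, nothing to undo */
    }
    memcpy(a->backup, a->bitmap->bits, a->bitmap->nbytes);
    memset(a->bitmap->bits, 0, a->bitmap->nbytes);
    return true;
}

/* abort: restore the original contents; memcpy cannot fail. */
static void undo_clear_abort(ClearAction *a)
{
    assert(a->backup != NULL);
    memcpy(a->bitmap->bits, a->backup, a->bitmap->nbytes);
    free(a->backup);
    a->backup = NULL;
}

/* commit: the change is already visible, so only drop the undo cache. */
static void undo_clear_commit(ClearAction *a)
{
    free(a->backup);
    a->backup = NULL;
}
```

Note that in this model the bitmap is already cleared between .prepare and .commit, which is only safe because nobody else can observe it while the locks are held.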

I think it's still nicer if we can enable things only in .commit, but
sometimes we have to use operations that could fail, so we have to do
them in .prepare.

The exact split between .prepare/.commit/.abort isn't visible on the
external interfaces as long as it's done correctly, so it doesn't
necessarily have to be the same for all commands.
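
The overall sequence, which stays atomic as long as the locks are held and the devices stay drained throughout, might look roughly like this. It is a hypothetical sketch: `Action` and `run_transaction` are invented names loosely modeled on the prepare/commit/abort split, and this version calls .abort only on actions whose .prepare succeeded, which is one possible convention rather than necessarily what QEMU's transaction code does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-action callbacks for the prepare/commit/abort model. */
typedef struct Action Action;
struct Action {
    bool (*prepare)(Action *self);   /* may fail */
    void (*commit)(Action *self);    /* must not fail */
    void (*abort)(Action *self);     /* must not fail */
    int state;                       /* demo only: 0 idle, 1 prepared,
                                        2 committed, 3 aborted */
};

/* Prepare every action in order. If one fails, abort the ones that were
 * already prepared (newest first) and report failure; otherwise commit
 * all of them. The caller is assumed to hold the relevant locks and keep
 * the devices drained for the whole sequence. */
static bool run_transaction(Action **actions, size_t n)
{
    size_t prepared = 0;

    for (; prepared < n; prepared++) {
        if (!actions[prepared]->prepare(actions[prepared])) {
            break;
        }
    }
    if (prepared < n) {
        while (prepared > 0) {
            prepared--;
            actions[prepared]->abort(actions[prepared]);
        }
        return false;
    }
    for (size_t i = 0; i < n; i++) {
        actions[i]->commit(actions[i]);
    }
    return true;
}

/* Demo callbacks that just record which phase ran. */
static bool demo_prepare_ok(Action *self) { self->state = 1; return true; }
static bool demo_prepare_fail(Action *self) { (void)self; return false; }
static void demo_commit(Action *self) { assert(self->state == 1); self->state = 2; }
static void demo_abort(Action *self) { assert(self->state == 1); self->state = 3; }
```

Because .commit and .abort are not allowed to fail, any fallible work has to live in .prepare, which is why the exact split can differ from command to command.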

> >The write notifier doesn't really hurt because it is never triggered
> >between prepare and commit (we're holding the lock) and it can just be
> >removed again.
> >
> >Clearing the bitmap is a bug because the caller could expect that the
> >bitmap is in its original state if the transaction fails. I doubt this
> >is a problem in practice, but we should fix it anyway.
> 
> We make a backup to undo the process if it fails. I only mention it
> to emphasize that the atomic point appears to be during prepare. In
> practice we hold the locks for the whole process, but... I think
> Paolo may be actively trying to change that.

Well, the whole .prepare/.commit or .prepare/.abort sequence is supposed
to be atomic, so it's really the same thing. Changing this would break
the transactional behaviour, so that's not possible anyway.

> >By the way, why did we allow adding a 'bitmap' option for DriveBackup
> >without adding it to BlockdevBackup at the same time?
> 
> I don't remember. I'm not sure anyone ever audited it to convince
> themselves it was a useful or safe thing to do. I believe at the
> time I was pushing for bitmaps in DriveBackup, Fam was still
> authoring the BlockdevBackup interface.

Hm, maybe that's why. I checked the commit dates of both (and there
BlockdevBackup was earlier), but I didn't check the development history.

Should we add it now or is it a bad idea?

> >>By changing BlockJobs to only run on commit, we've severed the
> >>atomicity point such that some actions will take effect during
> >>prepare, and others at commit.
> >>
> >>I still think it's the correct thing to do to delay the BlockJobs
> >>until the commit phase, so I will start auditing the code to see how
> >>hard it is to shift the atomicity point to commit instead. If it's
> >>possible to do that, I think from the POV of the managing
> >>application, having the atomicity point be
> >>
> >>Feel free to chime in with suggestions and counterpoints until then.
> >
> >I agree that jobs have to be started only at commit. There may be other
> >things that are currently happening in prepare that really should be
> >moved as well, but as long as moving one thing but not the other
> >doesn't break anything that was working, we can fix one thing at a time.
> >
> >Kevin
> >
> 
> Alright, let's give this a whirl.
> 
> We have 8 transaction actions:
> 
> drive_backup
> blockdev_backup
> block_dirty_bitmap_add
> block_dirty_bitmap_clear
> abort
> blockdev_snapshot
> blockdev_snapshot_sync
> blockdev_snapshot_internal_sync
> 
> Drive and Blockdev backup are already modified to behave
> point-in-time at time of .commit() by changing them to only begin
> running once the commit phase occurs.
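
A sketch of the create-in-prepare, start-in-commit pattern that this series moves the backup jobs to (a backup_job_create step plus a separate start step). `Job`, `BackupAction` and the `backup_*` functions below are hypothetical and only stand in for the real block job code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical job object; not QEMU's BlockJob. */
typedef struct {
    bool started;
} Job;

typedef struct {
    Job *job;
} BackupAction;

/* prepare: create the job (allocation and setup can fail) but do not
 * start it, so no data is copied before the transaction commits. */
static bool backup_prepare(BackupAction *a)
{
    a->job = calloc(1, sizeof(*a->job));
    return a->job != NULL;
}

/* commit: this is the point in time, so start the job now. */
static void backup_commit(BackupAction *a)
{
    assert(a->job && !a->job->started);
    a->job->started = true;
}

/* abort: the job never ran, so it can simply be discarded. */
static void backup_abort(BackupAction *a)
{
    assert(a->job == NULL || !a->job->started);
    free(a->job);
    a->job = NULL;
}
```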
> 
> Bitmap add and clear are trivial to rework; clear just moves the
> call to clear into commit, possibly with some action taken to prevent
> the bitmap from becoming used by some other process in the meantime.
> Add is easy to rework too: we can create it during prepare but reset
> it back to zero during commit if necessary.
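
The proposed rework of clear could look something like this: .prepare only reserves the bitmap (the `busy` flag is a hypothetical stand-in for whatever mechanism would keep other users away), .commit performs the infallible clear, and .abort just drops the reservation because the contents were never touched. All names here are illustrative, not QEMU's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical bitmap with a flag that blocks other users while a
 * transaction holds it; not QEMU's actual API. */
typedef struct {
    unsigned char bits[8];
    bool busy;
} Bitmap;

/* prepare: validate and reserve the bitmap, but leave its contents alone. */
static bool deferred_clear_prepare(Bitmap *bm)
{
    if (bm->busy) {
        return false;        /* already claimed by someone else */
    }
    bm->busy = true;
    return true;
}

/* commit: the actual (infallible) clear happens at the commit point. */
static void deferred_clear_commit(Bitmap *bm)
{
    assert(bm->busy);
    memset(bm->bits, 0, sizeof(bm->bits));
    bm->busy = false;
}

/* abort: contents were never touched, so just release the reservation. */
static void deferred_clear_abort(Bitmap *bm)
{
    assert(bm->busy);
    bm->busy = false;
}
```

Compared with the undo-cache approach, nothing has to be restored on failure, so .abort stays trivially infallible.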
> 
> Abort needs no changes.
> 
> blockdev_snapshot[_sync] actually appears to already be doing the
> right thing, by only installing the new top layer during commit,
> which makes this action inconsistent by current semantics, but
> requires no changes to move to the desired new semantics.

This doesn't sound too bad.

> That leaves only the internal snapshot to worry about, which does
> admittedly look like quite the yak to shave. It's a bit out of scope
> for me, but Kevin, do you think this is possible?
> 
> Looks like implementations are qcow2, rbd, and sheepdog. I imagine
> this would need to be split into prepare and commit semantics to
> accommodate this change... though we don't have any meaningful
> control over the rbd implementation.
> 
> Any thoughts? I could conceivably just change everything over to
> working primarily during .commit(), and just argue that the locks
> held for the transaction are sufficient to leave the internal
> snapshot alone "for now," ...

Leave them alone. We don't really support atomic internal snapshots. We
could make some heavy refactoring in order to split the BlockDriver
callbacks into prepare/commit/abort, but that's probably not worth the
effort and would make some code that already isn't tested much a lot
more complex.

If we ever decided to get serious about internal snapshots, we could
still do this. I kind of like internal snapshots, but I doubt it will
happen.

Kevin


Thread overview: 51+ messages
2016-09-30 22:00 [Qemu-devel] [PATCH v2 00/11] blockjobs: Fix transactional race condition John Snow
2016-09-30 22:00 ` [Qemu-devel] [PATCH v2 01/11] blockjob: fix dead pointer in txn list John Snow
2016-10-05 13:43   ` Kevin Wolf
2016-09-30 22:00 ` [Qemu-devel] [PATCH v2 02/11] blockjob: centralize QMP event emissions John Snow
2016-10-05 13:43   ` Kevin Wolf
2016-10-05 18:49     ` John Snow
2016-10-05 19:24       ` Eric Blake
2016-10-05 21:00         ` John Snow
2016-10-10 16:45           ` Kashyap Chamarthy
2016-10-10 18:36             ` John Snow
2016-10-10 19:28               ` Eric Blake
2016-10-11 13:32                 ` Kashyap Chamarthy
2016-10-06  7:44       ` Kevin Wolf
2016-10-06 16:57         ` John Snow
2016-10-06 18:16           ` Eric Blake
2016-10-06 18:19             ` John Snow
2016-10-11  9:50       ` Markus Armbruster
2016-09-30 22:00 ` [Qemu-devel] [PATCH v2 03/11] Blockjobs: Internalize user_pause logic John Snow
2016-10-04  0:57   ` Jeff Cody
2016-10-04  2:46     ` John Snow
2016-10-04 18:35     ` John Snow
2016-09-30 22:00 ` [Qemu-devel] [PATCH v2 04/11] blockjobs: Always use block_job_get_aio_context John Snow
2016-10-05 14:02   ` Kevin Wolf
2016-10-06 20:22     ` John Snow
2016-10-07  7:49       ` Paolo Bonzini
2016-10-13  0:49         ` John Snow
2016-10-13  9:03           ` Paolo Bonzini
2016-09-30 22:00 ` [Qemu-devel] [PATCH v2 05/11] blockjobs: split interface into public/private John Snow
2016-10-05 14:17   ` Kevin Wolf
2016-10-05 16:20     ` John Snow
2016-09-30 22:00 ` [Qemu-devel] [PATCH v2 06/11] blockjobs: fix documentation John Snow
2016-10-05 15:03   ` Kevin Wolf
2016-10-05 16:22     ` John Snow
2016-09-30 22:00 ` [Qemu-devel] [PATCH v2 07/11] blockjob: add .clean property John Snow
2016-10-12 11:11   ` Vladimir Sementsov-Ogievskiy
2016-09-30 22:00 ` [Qemu-devel] [PATCH v2 08/11] blockjob: add .start field John Snow
2016-09-30 22:00 ` [Qemu-devel] [PATCH v2 09/11] blockjob: add block_job_start John Snow
2016-10-05 15:17   ` Kevin Wolf
2016-10-06 22:44     ` John Snow
2016-10-17 18:00       ` John Snow
2016-09-30 22:00 ` [Qemu-devel] [PATCH v2 10/11] blockjob: refactor backup_start as backup_job_create John Snow
2016-10-07 18:39   ` John Snow
2016-10-10  8:57     ` Kevin Wolf
2016-10-10 22:51       ` John Snow
2016-10-11  8:56         ` Paolo Bonzini
2016-10-11  9:35         ` Kevin Wolf [this message]
2016-10-17  8:59           ` Fam Zheng
2016-09-30 22:00 ` [Qemu-devel] [PATCH v2 11/11] iotests: add transactional failure race test John Snow
2016-10-12 11:26   ` Vladimir Sementsov-Ogievskiy
2016-10-12 16:09     ` John Snow
2016-09-30 22:22 ` [Qemu-devel] [PATCH v2 00/11] blockjobs: Fix transactional race condition no-reply
