From: Kevin Wolf <kwolf@redhat.com>
To: John Snow <jsnow@redhat.com>
Cc: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
qemu-devel@nongnu.org, qemu-block@nongnu.org, famz@redhat.com,
armbru@redhat.com, mnestratov@virtuozzo.com, mreitz@redhat.com,
nshirokovskiy@virtuozzo.com, stefanha@redhat.com, den@openvz.org,
pbonzini@redhat.com
Subject: Re: [Qemu-devel] [PATCH for-2.12 0/4] qmp dirty bitmap API
Date: Tue, 12 Dec 2017 12:16:38 +0100
Message-ID: <20171212111638.GD3879@localhost.localdomain>
In-Reply-To: <f916db57-e4a9-1903-9957-d1afb0fdd495@redhat.com>
On 11.12.2017 at 19:40, John Snow wrote:
>
>
> On 12/11/2017 06:15 AM, Kevin Wolf wrote:
> > On 09.12.2017 at 01:57, John Snow wrote:
> >> Here's an idea of what this API might look like without revealing
> >> explicit merge/split primitives.
> >>
> >> A new bitmap property that lets us set retention:
> >>
> >> :: block-dirty-bitmap-set-retention bitmap=foo slices=10
> >>
> >> Or something similar, where the default property for all bitmaps is
> >> zero -- the current behavior: no copies retained.
> >>
> >> By setting it to a non-zero positive integer, the incremental backup
> >> mode will automatically save a disabled copy when possible.
> >
> > -EMAGIC
> >
> > Operations that create or delete user-visible objects should be
> > explicit, not automatic. You're trying to implement management layer
> > functionality in qemu here, but incomplete enough that the artifacts of
> > it are still visible externally. (A complete solution within qemu
> > wouldn't expose low-level concepts such as bitmaps on an external
> > interface, but you would expose something like checkpoints.)
> >
> > Usually it's not a good idea to have a design where qemu implements
> > enough to restrict management tools to whatever use case we had in mind,
> > but not enough to make the management tool's life substantially easier
> > (by not having to care about some low-level concepts).
> >
> >> "What happens if we exceed our retention?"
> >>
> >> (A) We push the last one out automatically, or
> >> (B) We fail the operation immediately.
> >>
> >> A is more convenient, but potentially unsafe if the management tool or
> >> user wasn't aware that was going to happen.
> >> B is more annoying, but definitely more safe as it means we cannot lose
> >> a bitmap accidentally.
> >
> > Both mean that the management layer has not only to deal with the
> > deletion of bitmaps as it wants to have them, but also to keep the
> > retention counter somewhere and predict what qemu is going to do to the
> > bitmaps and whether any corrective action needs to be taken.
> >
> > This is making things more complex rather than simpler.
> >
> >> I would argue for B with perhaps a force-cycle=true|false that defaults
> >> to false to let management tools say "Yes, go ahead, remove the old one"
> >> with additionally some return to let us know it happened:
> >>
> >> {"return": {
> >> "dropped-slices": [ {"bitmap0": 0}, ...]
> >> }}
> >>
> >> This would introduce some concept of bitmap slices into the mix as ID'd
> >> children of a bitmap. I would propose that these slices are numbered and
> >> monotonically increasing. "bitmap0" as an object starts with no slices,
> >> but every incremental backup creates slice 0, slice 1, slice 2, and so
> >> on. Even after we start deleting some, they stay ordered. These numbers
> >> then stand in for points in time.
> >>
> >> The counter can (must?) be reset and all slices forgotten when
> >> performing a full backup while providing a bitmap argument.
> >>
> >> "How can a user make use of the slices once they're made?"
> >>
> >> Let's consider something like mode=partial in contrast to
> >> mode=incremental, and an example where we have 6 prior slices:
> >> 0,1,2,3,4,5, (and, unnamed, the 'active' slice.)
> >>
> >> mode=partial bitmap=foo slice=4
> >>
> >> This would create a backup from slice 4 to the current time α. This
> >> includes all clusters from 4, 5, and the active bitmap.
> >>
> >> I don't think it is meaningful to define any end point that isn't the
> >> current time, so I've omitted that as a possibility.
> >
> > John, what are you doing here? This adds option after option, and even
> > an additional slice object, only complicating an easy thing more and more.
> > I'm not sure if that was your intention, but I feel I'm starting to
> > understand better how Linus's rants come about.
> >
> > Let me summarise what this means for the management layer:
> >
> > * The management layer has to manage bitmaps. It has direct control
> > over creation and deletion of bitmaps. So far so good.
> >
> > * It also has to manage slices in those bitmap objects, and these
> > slices are what contain the actual bitmaps. In order to identify a
> > bitmap in qemu, you need:
> >
> > a) the node name
> > b) the bitmap ID, and
> > c) the slice number
> >
> > The slice number is assigned by qemu and libvirt has to wait until
> > qemu tells it about the slice number of a newly created slice. If
> > libvirt doesn't receive the reply to the command that started the
> > block job, it needs to be able to query this information from qemu,
> > e.g. in query-block-jobs.
> >
> > * Slices are automatically created when you start a backup job with a
> > bitmap. It doesn't matter whether you even intend to do an incremental
> > backup against this point in time. qemu knows better.
> >
> > * In order to delete a slice that you don't need any more, you have to
> > create more slices (by doing more backups), but you don't get to
> > decide which one is dropped. qemu helpfully just drops the oldest one.
> > It doesn't matter if you want to keep an older one so you can do an
> > incremental backup for a longer timespan. Don't worry about your
> > backup strategy, qemu knows better.
> >
> > * Of course, just creating a new backup job doesn't mean that removing
> > the old slice works, even if you give the respective option. That's
> > what the 'dropped-slices' return is for. So once again wait for
> > whatever qemu did and reproduce it in the data structures of the
> > management tool. It's also more information that needs to be exposed
> > in query-block-jobs because libvirt might miss the return value.
> >
> > * Hmm... What happens if you start n backup block jobs, with n > slices?
> > Sounds like a great way to introduce subtle bugs in both qemu and the
> > management layer.
> >
> > Do you really think working with this API would be fun for libvirt?
> >
> >> "Does a partial backup create a new point in time?"
> >>
> >> If yes: This means that the next incremental backup must necessarily be
> >> based off of the last partial backup that was made. This seems a little
> >> inconvenient. This would mean that point in time α becomes "slice 6."
> >
> > Or based off any of the previous points in time, provided that qemu
> > didn't helpfully decide to delete it. Can't I still create a backup
> > starting from slice 4 then?
> >
> > Also, a more general question about incremental backup: How does it play
> > with snapshots? Shouldn't we expect that people sometimes use both
> > snapshots and backups? Can we restrict the backup job to considering
> > bitmaps only from a single node or should we be able to reference
> > bitmaps of a backing file as well?
> >
> >> If no: This means that we lose the point in time when we made the
> >> partial and we cannot chain off of the partial backup. It does mean that
> >> the next incremental backup will work as normally expected, however.
> >> This means that point in time α cannot again be referenced by the
> >> management client.
> >>
> >> This mirrors the dynamic between "incremental" and "differential" backups.
> >>
> >> ..hmmm..
> >>
> >> You know, incremental backups are just a special case of "partial" here
> >> where slice is the last recorded slice... Let's look at an API like this:
> >>
> >> mode=<incremental|differential> bitmap=<name> [slice=N]
> >>
> >> Incremental: We create a new slice if the bitmap has room for one.
> >> Differential: We don't create a new slice. The data in the active bitmap
> >> α does not get cleared after the bitmap operation.
> >>
> >> Slice:
> >> If not specified, assume we want only the active slice. This is the
> >> current behavior in QEMU 2.11.
> >> If specified, we create a temporary merge between bitmaps [N..α] and use
> >> that for the backup operation.
> >>
> >> "Can we delete slices?"
> >>
> >> Sure.
> >>
> >> :: block-dirty-bitmap-slice-delete bitmap=foo slice=4
> >>
> >> "Can we create a slice without making a bitmap?"
> >>
> >> It would be easy to do, but I'm not sure I see the utility. In using it,
> >> it means if you don't specify the slice manually for the next backup
> >> that you will necessarily be getting something not usable.
> >>
> >> but we COULD do it, it would just be banking the changes in the active
> >> bitmap into a new slice.
> >
> > Okay, with explicit management this is getting a little more reasonable
> > now. However, I don't understand what slices buy us then compared to
> > just separate bitmaps.
> >
> > Essentially, bitmaps form a second kind of backing chain. Backup always
> > wants to use the combined bitmaps of some subchain. I see two easy ways
> > to do this: Either pass an array of bitmaps to consider to the job, or
> > store the "backing link" in the bitmap so that we can just specify a
> > "base bitmap" like we usually do with normal backing files.
> >
> > The backup block job can optionally append a new bitmap to the chain
> > like external snapshots do for backing chains. Deleting a bitmap in the
> > chain is the merge operation, similar to a commit block job for backing
> > chains.
> >
> > We know these mechanisms very well because the block layer has been using
> > them for ages.
> >
> >>> I also have another idea:
> >>> implement a new object: point-in-time or checkpoint. They should have
> >>> names and a simple add/remove API.
> >>> And they will be backed by dirty bitmaps, so checkpoint deletion is
> >>> a bitmap merge (and deletion of one of them),
> >>> and checkpoint creation is disabling the active-checkpoint-bitmap and
> >>> starting a new active-checkpoint-bitmap.
> >>
> >> Yes, exactly! I think that's pretty similar to what I am thinking of
> >> with slices.
> >>
> >> This sounds a little safer to me in that we can examine an operation to
> >> see if it's sane or not.
> >
> > Exposing checkpoints is a reasonable high-level API. The important part
> > then is that you don't expose bitmaps + slices, but only checkpoints
> > without bitmaps. The bitmaps are an implementation detail.
> >
> >>> Then we can implement merging of several bitmaps (from one of the
> >>> checkpoints to the current moment) in
> >>> NBD meta-context-query handling.
> >>>
> >> Note:
> >>
> >> I should say that I've had discussions with Stefan in the past over
> >> things like differential mode and the feeling I got from him was that he
> >> felt that data should be copied from QEMU precisely *once*, viewing any
> >> subsequent copying of the same data as redundant and wasteful.
> >
> > That's a management layer decision. Apparently there are users who want
> > to copy from qemu multiple times, otherwise we wouldn't be talking about
> > slices and retention.
> >
> > Kevin
>
> Sorry.
*lol*
Though I do hope that my rant was at least somewhat constructive, if
only by making differently broken suggestions. ;-)
Kevin
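
A minimal sketch of the bitmap-chain idea quoted above, to make the
backing-chain analogy concrete. It is not an existing qemu interface: each
bitmap stores a "backing link" to the next older bitmap, a backup uses the
union of the subchain from the active bitmap down to a chosen base bitmap,
and deleting a bitmap merges it into its base, much like a commit job. The
names Bitmap, BitmapChain, append, dirty_since and delete are hypothetical,
and so are the choices to merge downwards and to refuse deleting the active
bitmap.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Bitmap:
    name: str
    dirty: Set[int] = field(default_factory=set)  # dirty cluster indices
    base: Optional[str] = None  # "backing link" to the next older bitmap


class BitmapChain:
    """Hypothetical model of bitmaps linked like a backing chain."""

    def __init__(self, first: str) -> None:
        self.bitmaps: Dict[str, Bitmap] = {first: Bitmap(first)}
        self.active = first

    def write(self, cluster: int) -> None:
        # Guest writes are only recorded in the active (top) bitmap.
        self.bitmaps[self.active].dirty.add(cluster)

    def append(self, name: str) -> None:
        # Like an external snapshot: freeze the current top, start a new one.
        if name in self.bitmaps:
            raise ValueError(f"bitmap {name!r} already exists")
        self.bitmaps[name] = Bitmap(name, base=self.active)
        self.active = name

    def subchain(self, base: str) -> List[Bitmap]:
        # Walk the backing links from the active bitmap down to 'base'.
        found: List[Bitmap] = []
        cur: Optional[str] = self.active
        while cur is not None:
            bm = self.bitmaps[cur]
            found.append(bm)
            if cur == base:
                return found
            cur = bm.base
        raise KeyError(base)

    def dirty_since(self, base: str) -> Set[int]:
        # What a backup with "base bitmap" = base would have to copy.
        result: Set[int] = set()
        for bm in self.subchain(base):
            result |= bm.dirty
        return result

    def delete(self, name: str) -> None:
        # Like commit: merge the bitmap into its base and relink its child.
        if name == self.active:
            raise ValueError("cannot delete the active bitmap in this sketch")
        victim = self.bitmaps.pop(name)
        child = next(bm for bm in self.bitmaps.values() if bm.base == name)
        if victim.base is not None:
            self.bitmaps[victim.base].dirty |= victim.dirty
        # Deleting the oldest bitmap simply forgets its point in time.
        child.base = victim.base
```

With bitmaps b0, b1 and b2 appended in that order, dirty_since("b1") covers
b1 and b2, and after delete("b1") the changes it recorded are still reachable
through dirty_since("b0"), while "b1" itself can no longer be named as a base.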