From: Eric Blake <eblake@redhat.com>
To: Peter Lieven <pl@kamp.de>, John Snow <jsnow@redhat.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	Christian Theune <ct@flyingcircus.io>
Subject: Re: [Qemu-devel] Qemu and Changed Block Tracking
Date: Wed, 22 Feb 2017 06:32:05 -0600
Message-ID: <c207e00d-25a6-45bb-a849-235d0cd2be0d@redhat.com>
In-Reply-To: <f0eb7ff1-47d7-42d0-c7b5-946d2224e986@kamp.de>

On 02/22/2017 02:45 AM, Peter Lieven wrote:
>> A bit outdated now, but:
>> http://wiki.qemu-project.org/Features/IncrementalBackup
>>
>> and also a summary I wrote not too far back (PDF):
>> https://drive.google.com/file/d/0B3CFr1TuHydWalVJaEdPaE5PbFE
>>
>> and I'm sure the Virtuozzo developers could chime in on this subject,
>> but basically we do have something similar in the works, as eblake says.
> 
> Hi John, Hi Erik,

It's Eric, but you're not the first to make that typo :)

> 
> thanks for your feedback. Are you both the ones primarily working on this topic?
> If there is anything to review or help needed, please let me know.
> 
> My 2 cents:
> One thing I had in mind, if there is no image fleecing available but the dirty bitmap
> can be fetched externally, would be a feature to put a write lock on a block device.

The whole idea is to use a dirty bitmap coupled with image fleecing,
where the fleecing point in time is taken during a window in which
guest I/O is quiescent, so that the fleecing point is stable.  We
already support write locks (guest quiescence) by using qga to do
fsfreeze.  You want the time that guest I/O is frozen to be as small as
possible (in particular, the Windows implementation of quiescence will
fail if you hold things frozen for more than a couple of seconds).

Right now, the qcow2 image format does not track write generations, and
I don't think we plan on adding that directly into qcow2.  However, you
can externally simulate write generations by keeping track of how many
image fleecing points you have created (each fleecing point is another
write generation).
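
A minimal sketch of that external bookkeeping, assuming a hypothetical
GenerationTracker class kept by the backup software (none of these
names are qemu or qcow2 APIs). Each fleecing point bumps a counter and
remembers which bitmap was frozen at that point:

```python
class GenerationTracker:
    """Hypothetical external bookkeeping; not part of qemu or qcow2."""

    def __init__(self):
        self.generation = 0
        # List of (generation, bitmap_name), one entry per fleecing point.
        self.fleece_points = []

    def record_fleece_point(self, bitmap_name):
        """Register a new fleecing point and return its write generation."""
        self.generation += 1
        self.fleece_points.append((self.generation, bitmap_name))
        return self.generation

    def bitmaps_since(self, generation):
        """Names of the bitmaps frozen after the given write generation."""
        return [name for gen, name in self.fleece_points if gen > generation]


tracker = GenerationTracker()
first = tracker.record_fleece_point("bitmap0")
second = tracker.record_fleece_point("bitmap1")
```

Under this model, everything written since generation N is the union
of the bitmaps returned by bitmaps_since(N).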


> In this case something like this via QMP (and external software) should work:
> ---8<---
>  gen =  write generation of last backup (or 0 for full backup)
>  do {
>      nextgen = fetch current write generation (via QMP)
>      dirtymap = send all blocks whose write generation is greater than 'gen' (via QMP)

No, we are NOT going to send dirty information via QMP.  Rather, we are
going to send it via NBD's extension NBD_CMD_BLOCK_STATUS.  The idea is
that a client connects and asks which qemu blocks are dirty, then uses
that information to read only the dirty blocks.
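
As a rough client-side illustration, assume the block status reply has
already been decoded into a list of (length, is_dirty) pairs covering
the export in order. That list is a simplified model rather than the
NBD wire format, and dirty_ranges and copy_dirty are hypothetical
helper names:

```python
def dirty_ranges(extents):
    """Turn ordered (length, is_dirty) extents into merged (offset, length) ranges."""
    offset = 0
    ranges = []
    for length, is_dirty in extents:
        if is_dirty:
            if ranges and ranges[-1][0] + ranges[-1][1] == offset:
                # Adjacent to the previous dirty range: extend it.
                prev_offset, prev_length = ranges[-1]
                ranges[-1] = (prev_offset, prev_length + length)
            else:
                ranges.append((offset, length))
        offset += length
    return ranges


def copy_dirty(image, backup, extents):
    """Copy only the dirty byte ranges from image into backup; return bytes copied."""
    copied = 0
    for offset, length in dirty_ranges(extents):
        backup[offset:offset + length] = image[offset:offset + length]
        copied += length
    return copied


image = bytes(range(16))   # stand-in for the data read over NBD
backup = bytearray(16)     # stand-in for the backup target
extents = [(4, False), (4, True), (2, True), (6, False)]
copied = copy_dirty(image, backup, extents)
```

Only the six bytes reported as dirty are read; the rest of the backup
target is left untouched.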

>      dirtycnt = 0
>      foreach block in dirtymap {
>                copy to backup via external software
>                dirtycnt++
>      }
>      gen = nextgen
>  } while (dirtycnt < X)         <--- to achieve this a throttling or similar might be needed
> 
> fsfreeze (optional)
> write lock (via QMP)
> backupgen = fetch current write generation (via QMP)
> dirtymap = send all blocks whose write generation is greater than 'gen' (via QMP)
> foreach block in dirtymap {
>                copy to backup via external software
> }
> unlock (via QMP)
> fsthaw (optional)
> --->8---

That is too long for the guest to be frozen.  Rather, the flow is more like:

1. set up bitmap0 to track all writes since last point in time
2. fsfreeze (optional)
3. transaction to pivot to new bitmap1 (effectively freezing bitmap0 as
   the point in time we are interested in)
4. fsthaw
5. connect via NBD with a request to view the data at the bitmap0 point
   in time - read the bitmap, then read the sectors that the bitmap says
   are dirty
6. clean up bitmap0 (qemu can finally delete any point-in-time sectors
   that were copied off due to any writes after the thaw)
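
The flow above can be modeled with a toy in-memory disk. This is only
a sketch under stated assumptions: FleecingDisk and incremental_backup
are hypothetical names, blocks are plain integers, and the freeze/thaw
steps are left out because the pivot here is a single in-process call:

```python
class FleecingDisk:
    """Toy in-memory model of the flow; not a qemu interface."""

    def __init__(self, nblocks):
        self.blocks = [0] * nblocks
        self.active = set()    # bitmap tracking writes since the last point in time
        self.frozen = None     # bitmap frozen at the fleecing point
        self.preserved = {}    # old contents copied off on writes after the pivot

    def guest_write(self, idx, value):
        # While a fleecing point exists, keep the point-in-time data readable.
        if self.frozen is not None and idx not in self.preserved:
            self.preserved[idx] = self.blocks[idx]
        self.blocks[idx] = value
        self.active.add(idx)

    def pivot(self):
        """Freeze the active bitmap and start tracking into a fresh one."""
        self.frozen, self.active = self.active, set()
        self.preserved = {}

    def read_point_in_time(self, idx):
        """Read a block as it was at the moment of the pivot."""
        return self.preserved.get(idx, self.blocks[idx])

    def cleanup(self):
        """Drop the frozen bitmap and any copied-off point-in-time data."""
        self.frozen = None
        self.preserved = {}


def incremental_backup(disk, backup):
    """Read the frozen bitmap, copy only the blocks it marks dirty, then clean up."""
    for idx in sorted(disk.frozen):
        backup[idx] = disk.read_point_in_time(idx)
    disk.cleanup()


disk = FleecingDisk(4)
disk.guest_write(1, 11)   # tracked in the first bitmap
disk.guest_write(3, 33)
disk.pivot()              # fleecing point: the first bitmap is frozen
disk.guest_write(1, 99)   # guest keeps writing after the pivot
backup = {}
incremental_backup(disk, backup)
```

The write that lands after the pivot does not leak into the backup; it
is tracked in the new bitmap and picked up by the next incremental run.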

> As far as I understand, CBT in VMware is not only a dirty bitmap, but also a write generation tracking for blocks (size 64kb or whatever)

Write generation is a matter of tracking which bitmaps and points in
time you fleeced from.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

