[Qemu-devel] Qemu and Changed Block Tracking

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* [Qemu-devel] Qemu and Changed Block Tracking
@ 2017-02-21 12:43 Peter Lieven
  2017-02-21 15:11 ` Eric Blake
  2017-02-21 21:13 ` John Snow
  0 siblings, 2 replies; 15+ messages in thread
From: Peter Lieven @ 2017-02-21 12:43 UTC (permalink / raw)
  To: qemu-devel@nongnu.org

Hi,


is there anyone ever thought about implementing something like VMware CBT in Qemu?


https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1020128


Thanks,
Peter

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] Qemu and Changed Block Tracking
  2017-02-21 12:43 [Qemu-devel] Qemu and Changed Block Tracking Peter Lieven
@ 2017-02-21 15:11 ` Eric Blake
  2017-02-21 21:13 ` John Snow
  1 sibling, 0 replies; 15+ messages in thread
From: Eric Blake @ 2017-02-21 15:11 UTC (permalink / raw)
  To: Peter Lieven, qemu-devel@nongnu.org

[-- Attachment #1: Type: text/plain, Size: 598 bytes --]

On 02/21/2017 06:43 AM, Peter Lieven wrote:
> Hi,
> 
> 
> is there anyone ever thought about implementing something like VMware
> CBT in Qemu?
> 
> 
> https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1020128

Yes; in fact, the work on persistent dirty bitmaps and on NBD
BLOCK_STATUS reporting is what we envision as the building blocks for an
upper layer software to be able to grab CBT information on which blocks
are dirty.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] Qemu and Changed Block Tracking
  2017-02-21 12:43 [Qemu-devel] Qemu and Changed Block Tracking Peter Lieven
  2017-02-21 15:11 ` Eric Blake
@ 2017-02-21 21:13 ` John Snow
  2017-02-22  8:45   ` Peter Lieven
  1 sibling, 1 reply; 15+ messages in thread
From: John Snow @ 2017-02-21 21:13 UTC (permalink / raw)
  To: Peter Lieven, qemu-devel@nongnu.org



On 02/21/2017 07:43 AM, Peter Lieven wrote:
> Hi,
> 
> 
> is there anyone ever thought about implementing something like VMware
> CBT in Qemu?
> 
> 
> https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1020128
> 
> 
> 
> Thanks,
> Peter
> 
> 

A bit outdated now, but:
http://wiki.qemu-project.org/Features/IncrementalBackup

and also a summary I wrote not too far back (PDF):
https://drive.google.com/file/d/0B3CFr1TuHydWalVJaEdPaE5PbFE

and I'm sure the Virtuozzo developers could chime in on this subject,
but basically we do have something similar in the works, as eblake says.

--js

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] Qemu and Changed Block Tracking
  2017-02-21 21:13 ` John Snow
@ 2017-02-22  8:45   ` Peter Lieven
  2017-02-22 12:32     ` Eric Blake
  2017-02-22 21:17     ` John Snow
  0 siblings, 2 replies; 15+ messages in thread
From: Peter Lieven @ 2017-02-22  8:45 UTC (permalink / raw)
  To: John Snow, qemu-devel@nongnu.org, Christian Theune


Am 21.02.2017 um 22:13 schrieb John Snow:
>
> On 02/21/2017 07:43 AM, Peter Lieven wrote:
>> Hi,
>>
>>
>> is there anyone ever thought about implementing something like VMware
>> CBT in Qemu?
>>
>>
>> https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1020128
>>
>>
>>
>> Thanks,
>> Peter
>>
>>
> A bit outdated now, but:
> http://wiki.qemu-project.org/Features/IncrementalBackup
>
> and also a summary I wrote not too far back (PDF):
> https://drive.google.com/file/d/0B3CFr1TuHydWalVJaEdPaE5PbFE
>
> and I'm sure the Virtuozzo developers could chime in on this subject,
> but basically we do have something similar in the works, as eblake says.

Hi John, Hi Erik,

thanks for your feedback. Are you both the ones working primary on this topic?
If there is anything to review or help needed, please let me know.

My 2 cents:
I thing I had in mind if there is no image fleecing available, but fetching the dirty bitmap
from external would be a feauture to put a write lock on a block device.
Write lock means, drain all pending writes and queue all further writes until unlock (as if they
were throttled to zero). This could help fetch consistent backups from storage device (thinking of iSCSI SAN) without
the help of the hypervisor to actually transfer data (no load in the frontend network or the host). What would further
be needed is a write generation for each block, not just only a dirty bitmap.

In this case something like this via QMP (and external software) should work:
---8<---
 gen =  write generation of last backup (or 0 for full backup)
 do {
     nextgen = fetch current write generation (via QMP)
     dirtymap = send all block whose write generation is greater than 'gen' (via QMP)
     dirtycnt = 0
     foreach block in dirtymap {
               copy to backup via external software
               dirtycnt++
     }
     gen = nextgen
 } while (dirtycnt < X)         <--- to achieve this a thorttling or similar might be needed

fsfreeze (optional)
write lock (via QMP)
backupgen = fetch current write generation (via QMP)
dirtymap = send all block whose write generation is greater than 'gen' (via QMP)
foreach block in dirtymap {
               copy to backup via external software
}
unlock (via QMP)
fsthaw (optional)
--->8---

As far as I understand CBT in VMware is not just only a dirty bitmap, but also a write generation tracking for blocks (size 64kb or whatever)

Peter

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] Qemu and Changed Block Tracking
  2017-02-22  8:45   ` Peter Lieven
@ 2017-02-22 12:32     ` Eric Blake
  2017-02-23 14:27       ` Peter Lieven
  2017-02-22 21:17     ` John Snow
  1 sibling, 1 reply; 15+ messages in thread
From: Eric Blake @ 2017-02-22 12:32 UTC (permalink / raw)
  To: Peter Lieven, John Snow, qemu-devel@nongnu.org, Christian Theune

[-- Attachment #1: Type: text/plain, Size: 3760 bytes --]

On 02/22/2017 02:45 AM, Peter Lieven wrote:
>> A bit outdated now, but:
>> http://wiki.qemu-project.org/Features/IncrementalBackup
>>
>> and also a summary I wrote not too far back (PDF):
>> https://drive.google.com/file/d/0B3CFr1TuHydWalVJaEdPaE5PbFE
>>
>> and I'm sure the Virtuozzo developers could chime in on this subject,
>> but basically we do have something similar in the works, as eblake says.
> 
> Hi John, Hi Erik,

It's Eric, but you're not the first to make that typo :)

> 
> thanks for your feedback. Are you both the ones working primary on this topic?
> If there is anything to review or help needed, please let me know.
> 
> My 2 cents:
> I thing I had in mind if there is no image fleecing available, but fetching the dirty bitmap
> from external would be a feauture to put a write lock on a block device.

The whole idea is to use a dirty bitmap coupled with image fleecing,
where the point-in-time of the image fleecing is done at a window where
the guest I/O is quiescent in order to get a stable fleecing point.  We
already support write locks (guest quiesence) using qga to do fsfreeze.
You want the time that guest I/O is frozen to be as small as possible
(in particular, the Windows implementation of quiescence will fail if
you hold things frozen for more than a couple of seconds).

Right now, the qcow2 image format does not track write generations, and
I don't think we plan on adding that directly into qcow2.  However, you
can externally simulate write generations by keeping track of how many
image fleecing points you have created (each fleecing point is another
write generation).


> In this case something like this via QMP (and external software) should work:
> ---8<---
>  gen =  write generation of last backup (or 0 for full backup)
>  do {
>      nextgen = fetch current write generation (via QMP)
>      dirtymap = send all block whose write generation is greater than 'gen' (via QMP)

No, we are NOT going to send dirty information via QMP.  Rather, we are
going to send it via NBD's extension NBD_CMD_BLOCK_STATUS.  The idea is
that a client connects and asks which qemu blocks are dirty, then uses
that information to read only the dirty blocks.

>      dirtycnt = 0
>      foreach block in dirtymap {
>                copy to backup via external software
>                dirtycnt++
>      }
>      gen = nextgen
>  } while (dirtycnt < X)         <--- to achieve this a thorttling or similar might be needed
> 
> fsfreeze (optional)
> write lock (via QMP)
> backupgen = fetch current write generation (via QMP)
> dirtymap = send all block whose write generation is greater than 'gen' (via QMP)
> foreach block in dirtymap {
>                copy to backup via external software
> }
> unlock (via QMP)
> fsthaw (optional)
> --->8---

That is too long for the guest to be frozen.  Rather, the flow is more like:

set up bitmap0 to track all writes since last point in time
fsfreeze (optional)
transaction to pivot to new bitmap1 (effectively freezing bitmap0 as the
point in time we are interested in)
fsthaw
connect via NBD with a request to view the data at the bitmap0 point in
time - read the bitmap, then read the sectors that the bitmap says are dirty
clean up bitmap0 (qemu can finally delete any point-in-time sectors that
were copied off due to any writes after the thaw)

> As far as I understand CBT in VMware is not just only a dirty bitmap, but also a write generation tracking for blocks (size 64kb or whatever)

Write generation is a matter of tracking which bitmaps and points in
time you fleeced from.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] Qemu and Changed Block Tracking
  2017-02-22  8:45   ` Peter Lieven
  2017-02-22 12:32     ` Eric Blake
@ 2017-02-22 21:17     ` John Snow
  2017-02-23 14:29       ` Peter Lieven
  1 sibling, 1 reply; 15+ messages in thread
From: John Snow @ 2017-02-22 21:17 UTC (permalink / raw)
  To: Peter Lieven, qemu-devel@nongnu.org, Christian Theune



On 02/22/2017 03:45 AM, Peter Lieven wrote:
> 
> Am 21.02.2017 um 22:13 schrieb John Snow:
>>
>> On 02/21/2017 07:43 AM, Peter Lieven wrote:
>>> Hi,
>>>
>>>
>>> is there anyone ever thought about implementing something like VMware
>>> CBT in Qemu?
>>>
>>>
>>> https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1020128
>>>
>>>
>>>
>>> Thanks,
>>> Peter
>>>
>>>
>> A bit outdated now, but:
>> http://wiki.qemu-project.org/Features/IncrementalBackup
>>
>> and also a summary I wrote not too far back (PDF):
>> https://drive.google.com/file/d/0B3CFr1TuHydWalVJaEdPaE5PbFE
>>
>> and I'm sure the Virtuozzo developers could chime in on this subject,
>> but basically we do have something similar in the works, as eblake says.
> 
> Hi John, Hi Erik,
> 
> thanks for your feedback. Are you both the ones working primary on this topic?
> If there is anything to review or help needed, please let me know.
> 

I've been working on incremental backups; Fam and I now co-maintain
block/dirty-bitmap.c.

Vladimir Sementsov-Ogievskiy has been working on bitmap persistence and
migration from Virtuozzo; as well as the NBD specification amendment to
allow us to fleece images with dirty bitmaps.

(Check the wiki and the whitepaper I linked!)

Eric has been guiding the review process for the NBD side of things.

> My 2 cents:
> I thing I had in mind if there is no image fleecing available, but fetching the dirty bitmap
> from external would be a feauture to put a write lock on a block device.
> Write lock means, drain all pending writes and queue all further writes until unlock (as if they
> were throttled to zero). This could help fetch consistent backups from storage device (thinking of iSCSI SAN) without
> the help of the hypervisor to actually transfer data (no load in the frontend network or the host). What would further
> be needed is a write generation for each block, not just only a dirty bitmap.
> 
> In this case something like this via QMP (and external software) should work:
> ---8<---
>  gen =  write generation of last backup (or 0 for full backup)
>  do {
>      nextgen = fetch current write generation (via QMP)

As Eric said, there's a lot of hostility to using QMP as a metadata
transmission protocol.

>      dirtymap = send all block whose write generation is greater than 'gen' (via QMP)
>      dirtycnt = 0
>      foreach block in dirtymap {
>                copy to backup via external software
>                dirtycnt++
>      }
>      gen = nextgen
>  } while (dirtycnt < X)         <--- to achieve this a thorttling or similar might be needed
> 
> fsfreeze (optional)
> write lock (via QMP)
> backupgen = fetch current write generation (via QMP)
> dirtymap = send all block whose write generation is greater than 'gen' (via QMP)
> foreach block in dirtymap {
>                copy to backup via external software
> }
> unlock (via QMP)
> fsthaw (optional)
> --->8---
> 
> As far as I understand CBT in VMware is not just only a dirty bitmap, but also a write generation tracking for blocks (size 64kb or whatever)
> 

I think at the moment I'm worried about getting the basic features out
the door, but I'm not opposed to adding fancier features if there's
justification or demand for them.

> Peter
> 

--js

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] Qemu and Changed Block Tracking
  2017-02-22 12:32     ` Eric Blake
@ 2017-02-23 14:27       ` Peter Lieven
  2017-02-24 21:31         ` John Snow
  0 siblings, 1 reply; 15+ messages in thread
From: Peter Lieven @ 2017-02-23 14:27 UTC (permalink / raw)
  To: Eric Blake, John Snow, qemu-devel@nongnu.org, Christian Theune

Am 22.02.2017 um 13:32 schrieb Eric Blake:
> On 02/22/2017 02:45 AM, Peter Lieven wrote:
>>> A bit outdated now, but:
>>> http://wiki.qemu-project.org/Features/IncrementalBackup
>>>
>>> and also a summary I wrote not too far back (PDF):
>>> https://drive.google.com/file/d/0B3CFr1TuHydWalVJaEdPaE5PbFE
>>>
>>> and I'm sure the Virtuozzo developers could chime in on this subject,
>>> but basically we do have something similar in the works, as eblake says.
>> Hi John, Hi Erik,
> It's Eric, but you're not the first to make that typo :)
>
>> thanks for your feedback. Are you both the ones working primary on this topic?
>> If there is anything to review or help needed, please let me know.
>>
>> My 2 cents:
>> I thing I had in mind if there is no image fleecing available, but fetching the dirty bitmap
>> from external would be a feauture to put a write lock on a block device.
> The whole idea is to use a dirty bitmap coupled with image fleecing,
> where the point-in-time of the image fleecing is done at a window where
> the guest I/O is quiescent in order to get a stable fleecing point.  We
> already support write locks (guest quiesence) using qga to do fsfreeze.
> You want the time that guest I/O is frozen to be as small as possible
> (in particular, the Windows implementation of quiescence will fail if
> you hold things frozen for more than a couple of seconds).
>
> Right now, the qcow2 image format does not track write generations, and
> I don't think we plan on adding that directly into qcow2.  However, you
> can externally simulate write generations by keeping track of how many
> image fleecing points you have created (each fleecing point is another
> write generation).
>
>
>> In this case something like this via QMP (and external software) should work:
>> ---8<---
>>   gen =  write generation of last backup (or 0 for full backup)
>>   do {
>>       nextgen = fetch current write generation (via QMP)
>>       dirtymap = send all block whose write generation is greater than 'gen' (via QMP)
> No, we are NOT going to send dirty information via QMP.  Rather, we are
> going to send it via NBD's extension NBD_CMD_BLOCK_STATUS.  The idea is
> that a client connects and asks which qemu blocks are dirty, then uses
> that information to read only the dirty blocks.

I understand, that for the case of local storage connecting via NBD to Qemu to grep a snapshot
might be a good idea, but consider that you have a NAS for your vServer images. May it be NFS,
iSCSI, CEPH or whatever. In an enterprise scenario I would generally except to have a NAS rather
than local storage.

When you are going to backup your vServer (full or incremental) you shuffle all the traffic through
Qemu and your Node running the vServer. In this case you run all the traffic over the wire twice.

NAS -> Node -> Qemu - > Backup Server

But the Backup Server could instead connect to the NAS directly avoiding load on the frontent LAN
and the Qemu Node.

I would like to find a nice solution for this scenario. If not in the first step it would maybe be good to
have this in mind when implementing a dirty block tracking.

Peter

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] Qemu and Changed Block Tracking
  2017-02-22 21:17     ` John Snow
@ 2017-02-23 14:29       ` Peter Lieven
  2017-02-23 19:34         ` John Snow
  0 siblings, 1 reply; 15+ messages in thread
From: Peter Lieven @ 2017-02-23 14:29 UTC (permalink / raw)
  To: John Snow, qemu-devel@nongnu.org, Christian Theune

Am 22.02.2017 um 22:17 schrieb John Snow:
>
> On 02/22/2017 03:45 AM, Peter Lieven wrote:
>> Am 21.02.2017 um 22:13 schrieb John Snow:
>>> On 02/21/2017 07:43 AM, Peter Lieven wrote:
>>>> Hi,
>>>>
>>>>
>>>> is there anyone ever thought about implementing something like VMware
>>>> CBT in Qemu?
>>>>
>>>>
>>>> https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1020128
>>>>
>>>>
>>>>
>>>> Thanks,
>>>> Peter
>>>>
>>>>
>>> A bit outdated now, but:
>>> http://wiki.qemu-project.org/Features/IncrementalBackup
>>>
>>> and also a summary I wrote not too far back (PDF):
>>> https://drive.google.com/file/d/0B3CFr1TuHydWalVJaEdPaE5PbFE
>>>
>>> and I'm sure the Virtuozzo developers could chime in on this subject,
>>> but basically we do have something similar in the works, as eblake says.
>> Hi John, Hi Erik,
>>
>> thanks for your feedback. Are you both the ones working primary on this topic?
>> If there is anything to review or help needed, please let me know.
>>
> I've been working on incremental backups; Fam and I now co-maintain
> block/dirty-bitmap.c.
>
> Vladimir Sementsov-Ogievskiy has been working on bitmap persistence and
> migration from Virtuozzo; as well as the NBD specification amendment to
> allow us to fleece images with dirty bitmaps.
>
> (Check the wiki and the whitepaper I linked!)
>
> Eric has been guiding the review process for the NBD side of things.
>
>> My 2 cents:
>> I thing I had in mind if there is no image fleecing available, but fetching the dirty bitmap
>> from external would be a feauture to put a write lock on a block device.
>> Write lock means, drain all pending writes and queue all further writes until unlock (as if they
>> were throttled to zero). This could help fetch consistent backups from storage device (thinking of iSCSI SAN) without
>> the help of the hypervisor to actually transfer data (no load in the frontend network or the host). What would further
>> be needed is a write generation for each block, not just only a dirty bitmap.
>>
>> In this case something like this via QMP (and external software) should work:
>> ---8<---
>>   gen =  write generation of last backup (or 0 for full backup)
>>   do {
>>       nextgen = fetch current write generation (via QMP)
> As Eric said, there's a lot of hostility to using QMP as a metadata
> transmission protocol.
>
>>       dirtymap = send all block whose write generation is greater than 'gen' (via QMP)
>>       dirtycnt = 0
>>       foreach block in dirtymap {
>>                 copy to backup via external software
>>                 dirtycnt++
>>       }
>>       gen = nextgen
>>   } while (dirtycnt < X)         <--- to achieve this a thorttling or similar might be needed
>>
>> fsfreeze (optional)
>> write lock (via QMP)
>> backupgen = fetch current write generation (via QMP)
>> dirtymap = send all block whose write generation is greater than 'gen' (via QMP)
>> foreach block in dirtymap {
>>                 copy to backup via external software
>> }
>> unlock (via QMP)
>> fsthaw (optional)
>> --->8---
>>
>> As far as I understand CBT in VMware is not just only a dirty bitmap, but also a write generation tracking for blocks (size 64kb or whatever)
>>
> I think at the moment I'm worried about getting the basic features out
> the door, but I'm not opposed to adding fancier features if there's
> justification or demand for them.

Sure, the basic features are most important. I was just thinking of the above scenario to interact with a NAS and have Qemu's "help"
to create incremental backups.

Peter

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] Qemu and Changed Block Tracking
  2017-02-23 14:29       ` Peter Lieven
@ 2017-02-23 19:34         ` John Snow
  2017-02-24  7:59           ` Peter Lieven
  0 siblings, 1 reply; 15+ messages in thread
From: John Snow @ 2017-02-23 19:34 UTC (permalink / raw)
  To: Peter Lieven, qemu-devel@nongnu.org, Christian Theune



On 02/23/2017 09:29 AM, Peter Lieven wrote:
> Am 22.02.2017 um 22:17 schrieb John Snow:
>>
>> On 02/22/2017 03:45 AM, Peter Lieven wrote:
>>> Am 21.02.2017 um 22:13 schrieb John Snow:
>>>> On 02/21/2017 07:43 AM, Peter Lieven wrote:
>>>>> Hi,
>>>>>
>>>>>
>>>>> is there anyone ever thought about implementing something like VMware
>>>>> CBT in Qemu?
>>>>>
>>>>>
>>>>> https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1020128
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Peter
>>>>>
>>>>>
>>>> A bit outdated now, but:
>>>> http://wiki.qemu-project.org/Features/IncrementalBackup
>>>>
>>>> and also a summary I wrote not too far back (PDF):
>>>> https://drive.google.com/file/d/0B3CFr1TuHydWalVJaEdPaE5PbFE
>>>>
>>>> and I'm sure the Virtuozzo developers could chime in on this subject,
>>>> but basically we do have something similar in the works, as eblake
>>>> says.
>>> Hi John, Hi Erik,
>>>
>>> thanks for your feedback. Are you both the ones working primary on
>>> this topic?
>>> If there is anything to review or help needed, please let me know.
>>>
>> I've been working on incremental backups; Fam and I now co-maintain
>> block/dirty-bitmap.c.
>>
>> Vladimir Sementsov-Ogievskiy has been working on bitmap persistence and
>> migration from Virtuozzo; as well as the NBD specification amendment to
>> allow us to fleece images with dirty bitmaps.
>>
>> (Check the wiki and the whitepaper I linked!)
>>
>> Eric has been guiding the review process for the NBD side of things.
>>
>>> My 2 cents:
>>> I thing I had in mind if there is no image fleecing available, but
>>> fetching the dirty bitmap
>>> from external would be a feauture to put a write lock on a block device.
>>> Write lock means, drain all pending writes and queue all further
>>> writes until unlock (as if they
>>> were throttled to zero). This could help fetch consistent backups
>>> from storage device (thinking of iSCSI SAN) without
>>> the help of the hypervisor to actually transfer data (no load in the
>>> frontend network or the host). What would further
>>> be needed is a write generation for each block, not just only a dirty
>>> bitmap.
>>>
>>> In this case something like this via QMP (and external software)
>>> should work:
>>> ---8<---
>>>   gen =  write generation of last backup (or 0 for full backup)
>>>   do {
>>>       nextgen = fetch current write generation (via QMP)
>> As Eric said, there's a lot of hostility to using QMP as a metadata
>> transmission protocol.
>>
>>>       dirtymap = send all block whose write generation is greater
>>> than 'gen' (via QMP)
>>>       dirtycnt = 0
>>>       foreach block in dirtymap {
>>>                 copy to backup via external software
>>>                 dirtycnt++
>>>       }
>>>       gen = nextgen
>>>   } while (dirtycnt < X)         <--- to achieve this a thorttling or
>>> similar might be needed
>>>
>>> fsfreeze (optional)
>>> write lock (via QMP)
>>> backupgen = fetch current write generation (via QMP)
>>> dirtymap = send all block whose write generation is greater than
>>> 'gen' (via QMP)
>>> foreach block in dirtymap {
>>>                 copy to backup via external software
>>> }
>>> unlock (via QMP)
>>> fsthaw (optional)
>>> --->8---
>>>
>>> As far as I understand CBT in VMware is not just only a dirty bitmap,
>>> but also a write generation tracking for blocks (size 64kb or whatever)
>>>
>> I think at the moment I'm worried about getting the basic features out
>> the door, but I'm not opposed to adding fancier features if there's
>> justification or demand for them.
> 
> Sure, the basic features are most important. I was just thinking of the
> above scenario to interact with a NAS and have Qemu's "help"
> to create incremental backups.
> 
> Peter

If you get the chance to read the white paper I linked to you, please
let me know which use cases we might not be able to cover that you feel
other programs might offer.

I can also make a point to CC you on future upstream discussions as they
happen.

Thanks,
--js

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] Qemu and Changed Block Tracking
  2017-02-23 19:34         ` John Snow
@ 2017-02-24  7:59           ` Peter Lieven
  0 siblings, 0 replies; 15+ messages in thread
From: Peter Lieven @ 2017-02-24  7:59 UTC (permalink / raw)
  To: John Snow, qemu-devel@nongnu.org, Christian Theune

Am 23.02.2017 um 20:34 schrieb John Snow:
>
> On 02/23/2017 09:29 AM, Peter Lieven wrote:
>> Am 22.02.2017 um 22:17 schrieb John Snow:
>>> On 02/22/2017 03:45 AM, Peter Lieven wrote:
>>>> Am 21.02.2017 um 22:13 schrieb John Snow:
>>>>> On 02/21/2017 07:43 AM, Peter Lieven wrote:
>>>>>> Hi,
>>>>>>
>>>>>>
>>>>>> is there anyone ever thought about implementing something like VMware
>>>>>> CBT in Qemu?
>>>>>>
>>>>>>
>>>>>> https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1020128
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Peter
>>>>>>
>>>>>>
>>>>> A bit outdated now, but:
>>>>> http://wiki.qemu-project.org/Features/IncrementalBackup
>>>>>
>>>>> and also a summary I wrote not too far back (PDF):
>>>>> https://drive.google.com/file/d/0B3CFr1TuHydWalVJaEdPaE5PbFE
>>>>>
>>>>> and I'm sure the Virtuozzo developers could chime in on this subject,
>>>>> but basically we do have something similar in the works, as eblake
>>>>> says.
>>>> Hi John, Hi Erik,
>>>>
>>>> thanks for your feedback. Are you both the ones working primary on
>>>> this topic?
>>>> If there is anything to review or help needed, please let me know.
>>>>
>>> I've been working on incremental backups; Fam and I now co-maintain
>>> block/dirty-bitmap.c.
>>>
>>> Vladimir Sementsov-Ogievskiy has been working on bitmap persistence and
>>> migration from Virtuozzo; as well as the NBD specification amendment to
>>> allow us to fleece images with dirty bitmaps.
>>>
>>> (Check the wiki and the whitepaper I linked!)
>>>
>>> Eric has been guiding the review process for the NBD side of things.
>>>
>>>> My 2 cents:
>>>> I thing I had in mind if there is no image fleecing available, but
>>>> fetching the dirty bitmap
>>>> from external would be a feauture to put a write lock on a block device.
>>>> Write lock means, drain all pending writes and queue all further
>>>> writes until unlock (as if they
>>>> were throttled to zero). This could help fetch consistent backups
>>>> from storage device (thinking of iSCSI SAN) without
>>>> the help of the hypervisor to actually transfer data (no load in the
>>>> frontend network or the host). What would further
>>>> be needed is a write generation for each block, not just only a dirty
>>>> bitmap.
>>>>
>>>> In this case something like this via QMP (and external software)
>>>> should work:
>>>> ---8<---
>>>>   gen =  write generation of last backup (or 0 for full backup)
>>>>   do {
>>>>       nextgen = fetch current write generation (via QMP)
>>> As Eric said, there's a lot of hostility to using QMP as a metadata
>>> transmission protocol.
>>>
>>>>       dirtymap = send all block whose write generation is greater
>>>> than 'gen' (via QMP)
>>>>       dirtycnt = 0
>>>>       foreach block in dirtymap {
>>>>                 copy to backup via external software
>>>>                 dirtycnt++
>>>>       }
>>>>       gen = nextgen
>>>>   } while (dirtycnt < X)         <--- to achieve this a thorttling or
>>>> similar might be needed
>>>>
>>>> fsfreeze (optional)
>>>> write lock (via QMP)
>>>> backupgen = fetch current write generation (via QMP)
>>>> dirtymap = send all block whose write generation is greater than
>>>> 'gen' (via QMP)
>>>> foreach block in dirtymap {
>>>>                 copy to backup via external software
>>>> }
>>>> unlock (via QMP)
>>>> fsthaw (optional)
>>>> --->8---
>>>>
>>>> As far as I understand CBT in VMware is not just only a dirty bitmap,
>>>> but also a write generation tracking for blocks (size 64kb or whatever)
>>>>
>>> I think at the moment I'm worried about getting the basic features out
>>> the door, but I'm not opposed to adding fancier features if there's
>>> justification or demand for them.
>> Sure, the basic features are most important. I was just thinking of the
>> above scenario to interact with a NAS and have Qemu's "help"
>> to create incremental backups.
>>
>> Peter
> If you get the chance to read the white paper I linked to you, please
> let me know which use cases we might not be able to cover that you feel
> other programs might offer.

Will do. Only have had a short glimpse yet.

>
> I can also make a point to CC you on future upstream discussions as they
> happen.
Yes, please.

Peter

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] Qemu and Changed Block Tracking
  2017-02-23 14:27       ` Peter Lieven
@ 2017-02-24 21:31         ` John Snow
  2017-02-24 21:44           ` Eric Blake
  0 siblings, 1 reply; 15+ messages in thread
From: John Snow @ 2017-02-24 21:31 UTC (permalink / raw)
  To: Peter Lieven, Eric Blake, qemu-devel@nongnu.org, Christian Theune



On 02/23/2017 09:27 AM, Peter Lieven wrote:
> Am 22.02.2017 um 13:32 schrieb Eric Blake:
>> On 02/22/2017 02:45 AM, Peter Lieven wrote:
>>>> A bit outdated now, but:
>>>> http://wiki.qemu-project.org/Features/IncrementalBackup
>>>>
>>>> and also a summary I wrote not too far back (PDF):
>>>> https://drive.google.com/file/d/0B3CFr1TuHydWalVJaEdPaE5PbFE
>>>>
>>>> and I'm sure the Virtuozzo developers could chime in on this subject,
>>>> but basically we do have something similar in the works, as eblake
>>>> says.
>>> Hi John, Hi Erik,
>> It's Eric, but you're not the first to make that typo :)
>>
>>> thanks for your feedback. Are you both the ones working primary on
>>> this topic?
>>> If there is anything to review or help needed, please let me know.
>>>
>>> My 2 cents:
>>> I thing I had in mind if there is no image fleecing available, but
>>> fetching the dirty bitmap
>>> from external would be a feauture to put a write lock on a block device.
>> The whole idea is to use a dirty bitmap coupled with image fleecing,
>> where the point-in-time of the image fleecing is done at a window where
>> the guest I/O is quiescent in order to get a stable fleecing point.  We
>> already support write locks (guest quiesence) using qga to do fsfreeze.
>> You want the time that guest I/O is frozen to be as small as possible
>> (in particular, the Windows implementation of quiescence will fail if
>> you hold things frozen for more than a couple of seconds).
>>
>> Right now, the qcow2 image format does not track write generations, and
>> I don't think we plan on adding that directly into qcow2.  However, you
>> can externally simulate write generations by keeping track of how many
>> image fleecing points you have created (each fleecing point is another
>> write generation).
>>
>>
>>> In this case something like this via QMP (and external software)
>>> should work:
>>> ---8<---
>>>   gen =  write generation of last backup (or 0 for full backup)
>>>   do {
>>>       nextgen = fetch current write generation (via QMP)
>>>       dirtymap = send all block whose write generation is greater
>>> than 'gen' (via QMP)
>> No, we are NOT going to send dirty information via QMP.  Rather, we are
>> going to send it via NBD's extension NBD_CMD_BLOCK_STATUS.  The idea is
>> that a client connects and asks which qemu blocks are dirty, then uses
>> that information to read only the dirty blocks.
> 
> I understand, that for the case of local storage connecting via NBD to
> Qemu to grep a snapshot
> might be a good idea, but consider that you have a NAS for your vServer
> images. May it be NFS,
> iSCSI, CEPH or whatever. In an enterprise scenario I would generally
> except to have a NAS rather
> than local storage.
> 
> When you are going to backup your vServer (full or incremental) you
> shuffle all the traffic through
> Qemu and your Node running the vServer. In this case you run all the
> traffic over the wire twice.
> 
> NAS -> Node -> Qemu - > Backup Server
> 
> But the Backup Server could instead connect to the NAS directly avoiding
> load on the frontent LAN
> and the Qemu Node.
> 

In a live backup I don't see how you will be removing QEMU from the data
transfer loop. QEMU is the only process that knows what the correct view
of the image is, and needs to facilitate.

It's not safe to copy the blocks directly without QEMU's mediation.

--js

> I would like to find a nice solution for this scenario. If not in the
> first step it would maybe be good to
> have this in mind when implementing a dirty block tracking.
> 
> Peter

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] Qemu and Changed Block Tracking
  2017-02-24 21:31         ` John Snow
@ 2017-02-24 21:44           ` Eric Blake
  2017-02-26 20:41             ` Peter Lieven
  2017-02-27 20:39             ` John Snow
  0 siblings, 2 replies; 15+ messages in thread
From: Eric Blake @ 2017-02-24 21:44 UTC (permalink / raw)
  To: John Snow, Peter Lieven, qemu-devel@nongnu.org, Christian Theune

[-- Attachment #1: Type: text/plain, Size: 1932 bytes --]

On 02/24/2017 03:31 PM, John Snow wrote:
>>
>> But the Backup Server could instead connect to the NAS directly avoiding
>> load on the frontent LAN
>> and the Qemu Node.
>>
> 
> In a live backup I don't see how you will be removing QEMU from the data
> transfer loop. QEMU is the only process that knows what the correct view
> of the image is, and needs to facilitate.
> 
> It's not safe to copy the blocks directly without QEMU's mediation.

Although we may already have enough tools in place to help achieve that:
create a temporary qcow2 wrapper around the primary image via external
snapshot, so that the primary image is now read-only in qemu; then use
whatever block-status mechanism (whether the NBD block status extension,
or directly reading from a persistent bitmap) to facilitate whatever
more efficient offline transfer of just the relevant portions of that
main file, then live block-commit to get qemu to start writing to the
file again.

In other words, any time your algorithm wants to cause an I/O freeze to
a particular file, the solution is to add a qcow2 external snapshot
followed by a live commit.

So tweaking the proposal a few mails ago:

fsfreeze (optional)
create qcow2 snapshot wrapper as a write lock (via QMP)
fsthaw - now with no risk of violating guest timing constraints
dirtymap = find all blocks that are dirty since last backup (via named
bitmap/NBD block status)
foreach block in dirtymap {
               copy to backup via external software
}
live commit image (via QMP)

The window where guest I/O is frozen is small (the freeze/snapshot
create/thaw steps can be done in less than a second), while the window
where you are extracting incremental backup data is longer (during that
time, guest I/O is happening into a wrapper qcow2 file).

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] Qemu and Changed Block Tracking
  2017-02-24 21:44           ` Eric Blake
@ 2017-02-26 20:41             ` Peter Lieven
  2017-02-27 16:56               ` Eric Blake
  2017-02-27 20:39             ` John Snow
  1 sibling, 1 reply; 15+ messages in thread
From: Peter Lieven @ 2017-02-26 20:41 UTC (permalink / raw)
  To: Eric Blake; +Cc: John Snow, qemu-devel@nongnu.org, Christian Theune


> Am 24.02.2017 um 22:44 schrieb Eric Blake <eblake@redhat.com>:
> 
> On 02/24/2017 03:31 PM, John Snow wrote:
>>> 
>>> But the Backup Server could instead connect to the NAS directly avoiding
>>> load on the frontent LAN
>>> and the Qemu Node.
>>> 
>> 
>> In a live backup I don't see how you will be removing QEMU from the data
>> transfer loop. QEMU is the only process that knows what the correct view
>> of the image is, and needs to facilitate.
>> 
>> It's not safe to copy the blocks directly without QEMU's mediation.
> 
> Although we may already have enough tools in place to help achieve that:
> create a temporary qcow2 wrapper around the primary image via external
> snapshot, so that the primary image is now read-only in qemu; then use
> whatever block-status mechanism (whether the NBD block status extension,
> or directly reading from a persistent bitmap) to facilitate whatever
> more efficient offline transfer of just the relevant portions of that
> main file, then live block-commit to get qemu to start writing to the
> file again.
> 
> In other words, any time your algorithm wants to cause an I/O freeze to
> a particular file, the solution is to add a qcow2 external snapshot
> followed by a live commit.
> 
> So tweaking the proposal a few mails ago:
> 
> fsfreeze (optional)
> create qcow2 snapshot wrapper as a write lock (via QMP)
> fsthaw - now with no risk of violating guest timing constraints
> dirtymap = find all blocks that are dirty since last backup (via named
> bitmap/NBD block status)
> foreach block in dirtymap {
>               copy to backup via external software
> }
> live commit image (via QMP)
> 
> The window where guest I/O is frozen is small (the freeze/snapshot
> create/thaw steps can be done in less than a second), while the window
> where you are extracting incremental backup data is longer (during that
> time, guest I/O is happening into a wrapper qcow2 file).

The live-snapshot/live-commit stuff could indeed help in my scenario. If I understand correctly this is
something that already works today, correct? If I have taken a live-snapshot, is live-migration and
stop/start of the VM still possible? What about live-migration and start/stop during live-commit?
I don’t talk about the dirty bitmap tracking I understand that persistence and live-migration support
is still in the works, I’m just interested in the snapshot/commit part.

Thanks
Peter

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] Qemu and Changed Block Tracking
  2017-02-26 20:41             ` Peter Lieven
@ 2017-02-27 16:56               ` Eric Blake
  0 siblings, 0 replies; 15+ messages in thread
From: Eric Blake @ 2017-02-27 16:56 UTC (permalink / raw)
  To: Peter Lieven; +Cc: John Snow, qemu-devel@nongnu.org, Christian Theune

[-- Attachment #1: Type: text/plain, Size: 1045 bytes --]

On 02/26/2017 02:41 PM, Peter Lieven wrote:
> The live-snapshot/live-commit stuff could indeed help in my scenario. If I understand correctly this is
> something that already works today, correct? If I have taken a live-snapshot, is live-migration and
> stop/start of the VM still possible? What about live-migration and start/stop during live-commit?

Yes, a guest can be started or stopped while migration and/or
live-commit are underway.  You probably have to keep the qemu process
around (stopping the guest but keeping qemu alive is different than
stopping qemu altogether), which is where the persistence factors into
it (once we have persistent bitmaps, then stopping qemu altogether
becomes possible).

> I don’t talk about the dirty bitmap tracking I understand that persistence and live-migration support
> is still in the works, I’m just interested in the snapshot/commit part.
> 
> Thanks
> Peter
> 

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] Qemu and Changed Block Tracking
  2017-02-24 21:44           ` Eric Blake
  2017-02-26 20:41             ` Peter Lieven
@ 2017-02-27 20:39             ` John Snow
  1 sibling, 0 replies; 15+ messages in thread
From: John Snow @ 2017-02-27 20:39 UTC (permalink / raw)
  To: Eric Blake, Peter Lieven, qemu-devel@nongnu.org, Christian Theune



On 02/24/2017 04:44 PM, Eric Blake wrote:
> On 02/24/2017 03:31 PM, John Snow wrote:
>>>
>>> But the Backup Server could instead connect to the NAS directly avoiding
>>> load on the frontent LAN
>>> and the Qemu Node.
>>>
>>
>> In a live backup I don't see how you will be removing QEMU from the data
>> transfer loop. QEMU is the only process that knows what the correct view
>> of the image is, and needs to facilitate.
>>
>> It's not safe to copy the blocks directly without QEMU's mediation.
> 
> Although we may already have enough tools in place to help achieve that:
> create a temporary qcow2 wrapper around the primary image via external
> snapshot, so that the primary image is now read-only in qemu; then use
> whatever block-status mechanism (whether the NBD block status extension,
> or directly reading from a persistent bitmap) to facilitate whatever
> more efficient offline transfer of just the relevant portions of that
> main file, then live block-commit to get qemu to start writing to the
> file again.
> 

Right, really good point. We can just turn the "live" backup into a
not-live one (kind of!) to work around the constraint.

In this case, creating the external snapshot should probably create a
"new" bitmap on the root, leaving the old one behind on the backing
file. This avoids spurious copies of data that hasn't changed in the
backing file, and makes clearing the bitmap on success easier for us.
Once the snapshots are re-merged, we can merge their respective bitmaps
again.

This can work in some scenarios, sure! We may have to be careful about
how exactly bitmaps fork when you create new external snapshots, but
that does seem workable and (possibly) the most performant, if that's a
concern.

--js

> In other words, any time your algorithm wants to cause an I/O freeze to
> a particular file, the solution is to add a qcow2 external snapshot
> followed by a live commit.
> 
> So tweaking the proposal a few mails ago:
> 
> fsfreeze (optional)
> create qcow2 snapshot wrapper as a write lock (via QMP)
> fsthaw - now with no risk of violating guest timing constraints
> dirtymap = find all blocks that are dirty since last backup (via named
> bitmap/NBD block status)
> foreach block in dirtymap {
>                copy to backup via external software
> }
> live commit image (via QMP)
> 
> The window where guest I/O is frozen is small (the freeze/snapshot
> create/thaw steps can be done in less than a second), while the window
> where you are extracting incremental backup data is longer (during that
> time, guest I/O is happening into a wrapper qcow2 file).
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2017-02-27 20:39 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-02-21 12:43 [Qemu-devel] Qemu and Changed Block Tracking Peter Lieven
2017-02-21 15:11 ` Eric Blake
2017-02-21 21:13 ` John Snow
2017-02-22  8:45   ` Peter Lieven
2017-02-22 12:32     ` Eric Blake
2017-02-23 14:27       ` Peter Lieven
2017-02-24 21:31         ` John Snow
2017-02-24 21:44           ` Eric Blake
2017-02-26 20:41             ` Peter Lieven
2017-02-27 16:56               ` Eric Blake
2017-02-27 20:39             ` John Snow
2017-02-22 21:17     ` John Snow
2017-02-23 14:29       ` Peter Lieven
2017-02-23 19:34         ` John Snow
2017-02-24  7:59           ` Peter Lieven

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).