qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v6 00/10] block: Incremental backup series
@ 2014-10-30  3:22 Fam Zheng
  0 siblings, 0 replies; 17+ messages in thread
From: Fam Zheng @ 2014-10-30  3:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Benoit Canet, Vladimir Sementsov-Ogievskij,
	Markus Armbruster, Max Reitz, John Snow, Stefan Hajnoczi, Jd,
	Paolo Bonzini, Luiz Capitulino

v6: Rebased v4 of the series on top of qemu.git. (skipping v5 since it was used
    by me as a private sending, for those who received it, the code is the same
    :)

This is the in memory part of the incremental backup feature.

With the added commands, we can create a bitmap on a block backend, from which
point of time all the writes are tracked by the bitmap, marking sectors as
dirty.  Later, we call drive-backup and pass the bitmap to it, to do an
incremental backup.

See the last patch which adds some tests for this use case.

Fam

Fam Zheng (10):
  qapi: Add optional field "name" to block dirty bitmap
  qmp: Add block-dirty-bitmap-add and block-dirty-bitmap-remove
  block: Introduce bdrv_dirty_bitmap_granularity()
  hbitmap: Add hbitmap_copy
  block: Add bdrv_copy_dirty_bitmap and bdrv_reset_dirty_bitmap
  qmp: Add block-dirty-bitmap-enable and block-dirty-bitmap-disable
  qmp: Add support of "dirty-bitmap" sync mode for drive-backup
  qapi: Add transaction support to
    block-dirty-bitmap-{add,enable,disable}
  qmp: Add dirty bitmap 'enabled' field in query-block
  qemu-iotests: Add tests for drive-backup sync=dirty-bitmap

 block-migration.c             |   2 +-
 block.c                       |  82 ++++++++++++++++-
 block/backup.c                |  54 +++++++++++-
 block/mirror.c                |   6 +-
 blockdev.c                    | 198 +++++++++++++++++++++++++++++++++++++++++-
 hmp.c                         |   4 +-
 include/block/block.h         |  15 +++-
 include/block/block_int.h     |   4 +
 include/qemu/hbitmap.h        |   8 ++
 qapi-schema.json              |   5 +-
 qapi/block-core.json          | 120 +++++++++++++++++++++++--
 qmp-commands.hx               |  66 +++++++++++++-
 tests/qemu-iotests/056        |  33 ++++++-
 tests/qemu-iotests/056.out    |   4 +-
 tests/qemu-iotests/iotests.py |   8 ++
 util/hbitmap.c                |  16 ++++
 16 files changed, 603 insertions(+), 22 deletions(-)

-- 
1.9.3

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v6 00/10] block: Incremental backup series
       [not found]         ` <546A88DD.10006@redhat.com>
@ 2014-11-18 10:54           ` Vladimir Sementsov-Ogievskiy
  2014-11-18 13:09             ` Vladimir Sementsov-Ogievskiy
  2014-11-20 10:34           ` [Qemu-devel] [PATCH v2] persistent dirty bitmap: add QDB file spec Vladimir Sementsov-Ogievskiy
  1 sibling, 1 reply; 17+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2014-11-18 10:54 UTC (permalink / raw)
  To: John Snow, Fam Zheng; +Cc: Denis V. Lunev, stefanha, qemu-devel


> (2) File Format
>
> Some standard file magic, which includes:
>
> - Some magic byte(s)
> - Dirty flag. Needed to tell if we can trust this data or not.
> - The size of the bitmap
> - The granularity of the bitmap
> - The offset to the first sector of bitmap data (Maybe? It can't hurt 
> if we give ourselves a sector's worth to write metadata within.)
> - Data starting at... PAGESIZE? 
- The name of the bitmap and also the size of this name

>
> (5) Partial Persistence
>
> We did not discuss only saving higher levels of the bitmap. What's the 
> primary benefit you're seeking? 
Hmm. It may be used for faster sync. Maybe, save some of bitmap levels 
on timer while vm is running and save the last level on shutdown?

CC qemu-devel - ok.

Best regards,
Vladimir

On 18.11.2014 02:46, John Snow wrote:
>
>
> On 11/13/2014 08:54 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Hi
>>
>> I'd just like to start working on persistent dirty bitmap. My thoughts
>> about it are the following:
>> - qemu -drive file=file,dirty_bitmap=file
>>      so,  bitmap will be loaded with drive open and saved with drive 
>> close.
>> - save only meaningful (the last) level of the bitmap, restore all
>> levels on bitmap loading
>> - bool parameter "persistent" for bdrv_create_dirty_bitmap and
>> BdrvDirtyBitmap
>> - internal dirty_bitmaps, saved in qcow2 file
>>
>> Best regards,
>> Vladimir
>
> I am thinking:
>
> (1) Command Lines
>
> If you enable dirty bitmaps and give it a file that doesn't exist, it 
> should error out on you.
>
> If you enable dirty bitmaps and give it a file that's blank, it 
> understands that it is to create a persistent bitmap file in this 
> location and it should enable persistence.
>
> If a bitmap file is given and it has valid magic, this should imply 
> persistence.
>
> I am hesitant to have it auto-create files that don't already exist in 
> case the files become large in size and a misconfiguration leads to 
> repeated creation of these files that get orphaned in random folders. 
> Perhaps we can add a create=auto flag or similar to allow this 
> behavior if wanted.
>
> (2) File Format
>
> Some standard file magic, which includes:
>
> - Some magic byte(s)
> - Dirty flag. Needed to tell if we can trust this data or not.
> - The size of the bitmap
> - The granularity of the bitmap
> - The offset to the first sector of bitmap data (Maybe? It can't hurt 
> if we give ourselves a sector's worth to write metadata within.)
> - Data starting at... PAGESIZE?
>
> (3) Data Integrity
>
> The dirty flag could work something like:
>
> - If, on first open, the file has the dirty flag set, we need to 
> discard the bitmap data because we can no longer trust it.
> - If the bitmap file is clean, proceed as normal, but take a lock 
> against any of the bitmap functions to prevent them from marking any 
> bits dirty.
> - On first write to a clean persistent bitmap, delay the write until 
> we can mark the bitmap as dirty first. This incurs a write penalty 
> when we try to use the bitmap at first...
> - Unlock the bitmap functions and allow them to mark blocks as needed.
> - At some point, based on a sync policy, re-commit the dirty 
> information to the file and mark the file as clean once more and 
> re-take the persistence lock.
>
> (4) Synchronization Policy
>
> - Sync after so many bits become dirty in the bitmap, either as an 
> absolute threshold or a density percentage?
> - Sync periodically on a fixed timer?
> - Sync periodically opportunistically when I/O utilization becomes 
> relatively low? (With some sort of starvation prevention timer?)
> - Sync only at shutdown?
>
> In discussing with Stefan, I think we rather liked the idea of a timer 
> that tries to re-commit the block data during lulls in the I/O.
>
> (5) Partial Persistence
>
> We did not discuss only saving higher levels of the bitmap. What's the 
> primary benefit you're seeking?
>
> (6) Inclusion as qcow2 Metadata
>
> And lastly, we did discuss the inclusion of the bitmap as qcow2 
> metadata, but decided it wasn't our principle target for the format to 
> allow generality to other file formats. We didn't really discuss the 
> idea of having it as an option or an extension, but I don't (off the 
> top of my head) have any reasonings against it, but I will likely not 
> work on it myself.
>
>
> You didn't CC qemu-devel on this (so I won't!), but perhaps we should 
> re-send out our ideas to the wider list for feedback before we proceed 
> any further. Maybe we can split the work if we agree upon a design.
>
> Thanks!
> --js
>
> P.S.: I'm still cleaning up Fam's first patchset based on Max's and 
> your feedback. Hope to have it out by the end of this week.
>
>> On 11.11.2014 18:59, John Snow wrote:
>>>
>>>
>>> On 11/10/2014 03:15 AM, Vladimir Sementsov-Ogievskiy wrote:
>>>> Hi Fam, hi Jorn.
>>>>
>>>> Jagane's project - http://wiki.qemu.org/Features/Livebackup
>>>>
>>>> In two words:
>>>> Normal delta - like in qemu, while backuping, we save all new 
>>>> writes to
>>>> separate virtual disk - delta. When backup is done, we can merge delta
>>>> back to original image.
>>>> Reverse delta - while backuping, we don't stop writing to original 
>>>> image
>>>> (and qemu works with it, not with delta), but before every write we 
>>>> copy
>>>> the corresponding block (if backup needs it) to separate virtual disk
>>>> (reverse delta). So, for backuping, if the current block is not
>>>> rewritten, we take it from original file, otherwise we take it from
>>>> reverse delta. The benefit is that, theoretically we'll have two times
>>>> less overhead I/Os then with "normal delta backup + merge", because of
>>>> part of (about 0.5) blocks which are rewritten while backuping will be
>>>> backuped before rewrite, and they will not go to delta then. Also, 
>>>> there
>>>> are should be methods to improve this coefficient, by choosing rules
>>>> about what blocks to backup first.
>>>>
>>>> Are someone work now on persistent (while vm is turned off) 
>>>> bitmaps? If
>>>> not, are there any recommendations?
>>>>
>>>> Best regards,
>>>> Vladimir
>>>>
>>>
>>> Hi Vladimir:
>>>
>>> I recently inherited Fam's backup series and I intended to start
>>> reviewing the comments this week. After I sift through that situation,
>>> the next step for me is persistent dirty bitmap support. I had been
>>> working on AHCI and IDE an awful lot lately, so it will take me a
>>> little bit to switch over.
>>>
>>> I don't have any existing notions of how I intend to implement it, so
>>> if you'd like to discuss any concerns you have about that subsystem,
>>> we can talk :)
>>>
>>> Thanks,
>>> --John
>>>
>>>> On 08.11.2014 10:19, Fam Zheng wrote:
>>>>> On Fri, 11/07 15:23, Vladimir Sementsov-Ogievskiy wrote:
>>>>>> Hi!
>>>>>>
>>>>>> Glad to see you working on backup series again. Some time ago I've
>>>>>> already
>>>>>> fixed v4 to be mergeable, but unfortunately didn't publish.
>>>>>> Actually I need (and I'm going to develop) incremental backup with
>>>>>> reverse
>>>>>> delta (like abandoned project
>>>>>> http://wiki.qemu.org/Features/Livebackup by
>>>>>> Jagane). And I've already fixed Jagane's project to be working with
>>>>>> current
>>>>>> Qemu. But the code is bad. Can you give me a peace of advice about
>>>>>> what
>>>>>> should I notice? I have a very superficial understanding of qemu 
>>>>>> block
>>>>>> devices, block jobs, etc.
>>>>> Hi Vladimir,
>>>>>
>>>>> Thank you for reviewing my series.
>>>>>
>>>>> I haven't seen Jagane's series, so could you pleased explain what
>>>>> "reverse
>>>>> delta" is?
>>>>>
>>>>> John Snow (CC'ed) will take my work on incremental backup so I think
>>>>> you can
>>>>> coordinate. I'll probably count on John to work on your comments too,
>>>>> in the
>>>>> future I'll follow the progress and review.
>>>>>
>>>>> Thanks,
>>>>> Fam
>>>>
>>>
>>
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v6 00/10] block: Incremental backup series
  2014-11-18 10:54           ` [Qemu-devel] [PATCH v6 00/10] block: Incremental backup series Vladimir Sementsov-Ogievskiy
@ 2014-11-18 13:09             ` Vladimir Sementsov-Ogievskiy
  2014-11-18 14:41               ` Vladimir Sementsov-Ogievskiy
  2014-11-18 16:08               ` John Snow
  0 siblings, 2 replies; 17+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2014-11-18 13:09 UTC (permalink / raw)
  To: John Snow, Fam Zheng; +Cc: Denis V. Lunev, stefanha, qemu-devel

> (3) Data Integrity
>
> The dirty flag could work something like:
>
> - If, on first open, the file has the dirty flag set, we need to 
> discard the bitmap data because we can no longer trust it.
> - If the bitmap file is clean, proceed as normal, but take a lock 
> against any of the bitmap functions to prevent them from marking any 
> bits dirty.
> - On first write to a clean persistent bitmap, delay the write until 
> we can mark the bitmap as dirty first. This incurs a write penalty 
> when we try to use the bitmap at first...
> - Unlock the bitmap functions and allow them to mark blocks as needed.
> - At some point, based on a sync policy, re-commit the dirty 
> information to the file and mark the file as clean once more and 
> re-take the persistence lock. 
Correct me if I'm wrong.

#Read bitmap:
read in blockdev_init, before any write to device, so no lock is needed.

#Set bits in bitmap:
if bitmap.dirty_flag:
    set bits
else:
    LOCK
    set bits
    set bitmap.dirty_flag
    set dirty_flag in bitmap file
    UNLOCK

#Sync:
if not bitmap.dirty_flag:
    skip sync
else:
    LOCK
    save one of bitmap levels (saving the last one is too long and not 
very good idea, because it is fast-updateing)
    unset dirty_flag in bitmap file
    unset bitmap.dirty_flag
    UNLOCK

#Last sync in bdrv_close:
Just save the last bitmap level and unset dirty_flag in bitmap file

Also.. I'm not quite sure about locking.. As I understand, co-routines 
in qemu are not running in parallel, is locking required? Or sync timer 
will not be co-routine based?

Best regards,
Vladimir

On 18.11.2014 13:54, Vladimir Sementsov-Ogievskiy wrote:
>
>> (2) File Format
>>
>> Some standard file magic, which includes:
>>
>> - Some magic byte(s)
>> - Dirty flag. Needed to tell if we can trust this data or not.
>> - The size of the bitmap
>> - The granularity of the bitmap
>> - The offset to the first sector of bitmap data (Maybe? It can't hurt 
>> if we give ourselves a sector's worth to write metadata within.)
>> - Data starting at... PAGESIZE? 
> - The name of the bitmap and also the size of this name
>
>>
>> (5) Partial Persistence
>>
>> We did not discuss only saving higher levels of the bitmap. What's 
>> the primary benefit you're seeking? 
> Hmm. It may be used for faster sync. Maybe, save some of bitmap levels 
> on timer while vm is running and save the last level on shutdown?
>
> CC qemu-devel - ok.
>
> Best regards,
> Vladimir
>
> On 18.11.2014 02:46, John Snow wrote:
>>
>>
>> On 11/13/2014 08:54 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> Hi
>>>
>>> I'd just like to start working on persistent dirty bitmap. My thoughts
>>> about it are the following:
>>> - qemu -drive file=file,dirty_bitmap=file
>>>      so,  bitmap will be loaded with drive open and saved with drive 
>>> close.
>>> - save only meaningful (the last) level of the bitmap, restore all
>>> levels on bitmap loading
>>> - bool parameter "persistent" for bdrv_create_dirty_bitmap and
>>> BdrvDirtyBitmap
>>> - internal dirty_bitmaps, saved in qcow2 file
>>>
>>> Best regards,
>>> Vladimir
>>
>> I am thinking:
>>
>> (1) Command Lines
>>
>> If you enable dirty bitmaps and give it a file that doesn't exist, it 
>> should error out on you.
>>
>> If you enable dirty bitmaps and give it a file that's blank, it 
>> understands that it is to create a persistent bitmap file in this 
>> location and it should enable persistence.
>>
>> If a bitmap file is given and it has valid magic, this should imply 
>> persistence.
>>
>> I am hesitant to have it auto-create files that don't already exist 
>> in case the files become large in size and a misconfiguration leads 
>> to repeated creation of these files that get orphaned in random 
>> folders. Perhaps we can add a create=auto flag or similar to allow 
>> this behavior if wanted.
>>
>> (2) File Format
>>
>> Some standard file magic, which includes:
>>
>> - Some magic byte(s)
>> - Dirty flag. Needed to tell if we can trust this data or not.
>> - The size of the bitmap
>> - The granularity of the bitmap
>> - The offset to the first sector of bitmap data (Maybe? It can't hurt 
>> if we give ourselves a sector's worth to write metadata within.)
>> - Data starting at... PAGESIZE?
>>
>> (3) Data Integrity
>>
>> The dirty flag could work something like:
>>
>> - If, on first open, the file has the dirty flag set, we need to 
>> discard the bitmap data because we can no longer trust it.
>> - If the bitmap file is clean, proceed as normal, but take a lock 
>> against any of the bitmap functions to prevent them from marking any 
>> bits dirty.
>> - On first write to a clean persistent bitmap, delay the write until 
>> we can mark the bitmap as dirty first. This incurs a write penalty 
>> when we try to use the bitmap at first...
>> - Unlock the bitmap functions and allow them to mark blocks as needed.
>> - At some point, based on a sync policy, re-commit the dirty 
>> information to the file and mark the file as clean once more and 
>> re-take the persistence lock.
>>
>> (4) Synchronization Policy
>>
>> - Sync after so many bits become dirty in the bitmap, either as an 
>> absolute threshold or a density percentage?
>> - Sync periodically on a fixed timer?
>> - Sync periodically opportunistically when I/O utilization becomes 
>> relatively low? (With some sort of starvation prevention timer?)
>> - Sync only at shutdown?
>>
>> In discussing with Stefan, I think we rather liked the idea of a 
>> timer that tries to re-commit the block data during lulls in the I/O.
>>
>> (5) Partial Persistence
>>
>> We did not discuss only saving higher levels of the bitmap. What's 
>> the primary benefit you're seeking?
>>
>> (6) Inclusion as qcow2 Metadata
>>
>> And lastly, we did discuss the inclusion of the bitmap as qcow2 
>> metadata, but decided it wasn't our principle target for the format 
>> to allow generality to other file formats. We didn't really discuss 
>> the idea of having it as an option or an extension, but I don't (off 
>> the top of my head) have any reasonings against it, but I will likely 
>> not work on it myself.
>>
>>
>> You didn't CC qemu-devel on this (so I won't!), but perhaps we should 
>> re-send out our ideas to the wider list for feedback before we 
>> proceed any further. Maybe we can split the work if we agree upon a 
>> design.
>>
>> Thanks!
>> --js
>>
>> P.S.: I'm still cleaning up Fam's first patchset based on Max's and 
>> your feedback. Hope to have it out by the end of this week.
>>
>>> On 11.11.2014 18:59, John Snow wrote:
>>>>
>>>>
>>>> On 11/10/2014 03:15 AM, Vladimir Sementsov-Ogievskiy wrote:
>>>>> Hi Fam, hi Jorn.
>>>>>
>>>>> Jagane's project - http://wiki.qemu.org/Features/Livebackup
>>>>>
>>>>> In two words:
>>>>> Normal delta - like in qemu, while backuping, we save all new 
>>>>> writes to
>>>>> separate virtual disk - delta. When backup is done, we can merge 
>>>>> delta
>>>>> back to original image.
>>>>> Reverse delta - while backuping, we don't stop writing to original 
>>>>> image
>>>>> (and qemu works with it, not with delta), but before every write 
>>>>> we copy
>>>>> the corresponding block (if backup needs it) to separate virtual disk
>>>>> (reverse delta). So, for backuping, if the current block is not
>>>>> rewritten, we take it from original file, otherwise we take it from
>>>>> reverse delta. The benefit is that, theoretically we'll have two 
>>>>> times
>>>>> less overhead I/Os then with "normal delta backup + merge", 
>>>>> because of
>>>>> part of (about 0.5) blocks which are rewritten while backuping 
>>>>> will be
>>>>> backuped before rewrite, and they will not go to delta then. Also, 
>>>>> there
>>>>> are should be methods to improve this coefficient, by choosing rules
>>>>> about what blocks to backup first.
>>>>>
>>>>> Are someone work now on persistent (while vm is turned off) 
>>>>> bitmaps? If
>>>>> not, are there any recommendations?
>>>>>
>>>>> Best regards,
>>>>> Vladimir
>>>>>
>>>>
>>>> Hi Vladimir:
>>>>
>>>> I recently inherited Fam's backup series and I intended to start
>>>> reviewing the comments this week. After I sift through that situation,
>>>> the next step for me is persistent dirty bitmap support. I had been
>>>> working on AHCI and IDE an awful lot lately, so it will take me a
>>>> little bit to switch over.
>>>>
>>>> I don't have any existing notions of how I intend to implement it, so
>>>> if you'd like to discuss any concerns you have about that subsystem,
>>>> we can talk :)
>>>>
>>>> Thanks,
>>>> --John
>>>>
>>>>> On 08.11.2014 10:19, Fam Zheng wrote:
>>>>>> On Fri, 11/07 15:23, Vladimir Sementsov-Ogievskiy wrote:
>>>>>>> Hi!
>>>>>>>
>>>>>>> Glad to see you working on backup series again. Some time ago I've
>>>>>>> already
>>>>>>> fixed v4 to be mergeable, but unfortunately didn't publish.
>>>>>>> Actually I need (and I'm going to develop) incremental backup with
>>>>>>> reverse
>>>>>>> delta (like abandoned project
>>>>>>> http://wiki.qemu.org/Features/Livebackup by
>>>>>>> Jagane). And I've already fixed Jagane's project to be working with
>>>>>>> current
>>>>>>> Qemu. But the code is bad. Can you give me a peace of advice about
>>>>>>> what
>>>>>>> should I notice? I have a very superficial understanding of qemu 
>>>>>>> block
>>>>>>> devices, block jobs, etc.
>>>>>> Hi Vladimir,
>>>>>>
>>>>>> Thank you for reviewing my series.
>>>>>>
>>>>>> I haven't seen Jagane's series, so could you pleased explain what
>>>>>> "reverse
>>>>>> delta" is?
>>>>>>
>>>>>> John Snow (CC'ed) will take my work on incremental backup so I think
>>>>>> you can
>>>>>> coordinate. I'll probably count on John to work on your comments 
>>>>>> too,
>>>>>> in the
>>>>>> future I'll follow the progress and review.
>>>>>>
>>>>>> Thanks,
>>>>>> Fam
>>>>>
>>>>
>>>
>>
>
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v6 00/10] block: Incremental backup series
  2014-11-18 13:09             ` Vladimir Sementsov-Ogievskiy
@ 2014-11-18 14:41               ` Vladimir Sementsov-Ogievskiy
  2014-11-18 16:08               ` John Snow
  1 sibling, 0 replies; 17+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2014-11-18 14:41 UTC (permalink / raw)
  To: John Snow, qemu-devel; +Cc: Fam Zheng, Denis V. Lunev, stefanha

Also, if we sync not the last level, bitmap.dirty_flag should be related 
to syncing level, not to the whole bitmap, to reduce sync overhead.
Or, if we implement difficult sync policy, there should be dirty flags 
for each bitmap level. Despite this, only one level is needed to be 
saved in the bitmap file.

PS: more ideas about file format - thanks to Denis V. Lunev 
<den@parallels.com>
1) Shouldn't we consider a possibility of storing several bitmaps in one 
file? Or one bitmap = one file?
2) Implement header extensions like in qcow2.

Best regards,
Vladimir

On 18.11.2014 16:09, Vladimir Sementsov-Ogievskiy wrote:
>> (3) Data Integrity
>>
>> The dirty flag could work something like:
>>
>> - If, on first open, the file has the dirty flag set, we need to 
>> discard the bitmap data because we can no longer trust it.
>> - If the bitmap file is clean, proceed as normal, but take a lock 
>> against any of the bitmap functions to prevent them from marking any 
>> bits dirty.
>> - On first write to a clean persistent bitmap, delay the write until 
>> we can mark the bitmap as dirty first. This incurs a write penalty 
>> when we try to use the bitmap at first...
>> - Unlock the bitmap functions and allow them to mark blocks as needed.
>> - At some point, based on a sync policy, re-commit the dirty 
>> information to the file and mark the file as clean once more and 
>> re-take the persistence lock. 
> Correct me if I'm wrong.
>
> #Read bitmap:
> read in blockdev_init, before any write to device, so no lock is needed.
>
> #Set bits in bitmap:
> if bitmap.dirty_flag:
>    set bits
> else:
>    LOCK
>    set bits
>    set bitmap.dirty_flag
>    set dirty_flag in bitmap file
>    UNLOCK
>
> #Sync:
> if not bitmap.dirty_flag:
>    skip sync
> else:
>    LOCK
>    save one of bitmap levels (saving the last one is too long and not 
> very good idea, because it is fast-updateing)
>    unset dirty_flag in bitmap file
>    unset bitmap.dirty_flag
>    UNLOCK
>
> #Last sync in bdrv_close:
> Just save the last bitmap level and unset dirty_flag in bitmap file
>
> Also.. I'm not quite sure about locking.. As I understand, co-routines 
> in qemu are not running in parallel, is locking required? Or sync 
> timer will not be co-routine based?
>
> Best regards,
> Vladimir
>
> On 18.11.2014 13:54, Vladimir Sementsov-Ogievskiy wrote:
>>
>>> (2) File Format
>>>
>>> Some standard file magic, which includes:
>>>
>>> - Some magic byte(s)
>>> - Dirty flag. Needed to tell if we can trust this data or not.
>>> - The size of the bitmap
>>> - The granularity of the bitmap
>>> - The offset to the first sector of bitmap data (Maybe? It can't 
>>> hurt if we give ourselves a sector's worth to write metadata within.)
>>> - Data starting at... PAGESIZE? 
>> - The name of the bitmap and also the size of this name
>>
>>>
>>> (5) Partial Persistence
>>>
>>> We did not discuss only saving higher levels of the bitmap. What's 
>>> the primary benefit you're seeking? 
>> Hmm. It may be used for faster sync. Maybe, save some of bitmap 
>> levels on timer while vm is running and save the last level on shutdown?
>>
>> CC qemu-devel - ok.
>>
>> Best regards,
>> Vladimir
>>
>> On 18.11.2014 02:46, John Snow wrote:
>>>
>>>
>>> On 11/13/2014 08:54 AM, Vladimir Sementsov-Ogievskiy wrote:
>>>> Hi
>>>>
>>>> I'd just like to start working on persistent dirty bitmap. My thoughts
>>>> about it are the following:
>>>> - qemu -drive file=file,dirty_bitmap=file
>>>>      so,  bitmap will be loaded with drive open and saved with 
>>>> drive close.
>>>> - save only meaningful (the last) level of the bitmap, restore all
>>>> levels on bitmap loading
>>>> - bool parameter "persistent" for bdrv_create_dirty_bitmap and
>>>> BdrvDirtyBitmap
>>>> - internal dirty_bitmaps, saved in qcow2 file
>>>>
>>>> Best regards,
>>>> Vladimir
>>>
>>> I am thinking:
>>>
>>> (1) Command Lines
>>>
>>> If you enable dirty bitmaps and give it a file that doesn't exist, 
>>> it should error out on you.
>>>
>>> If you enable dirty bitmaps and give it a file that's blank, it 
>>> understands that it is to create a persistent bitmap file in this 
>>> location and it should enable persistence.
>>>
>>> If a bitmap file is given and it has valid magic, this should imply 
>>> persistence.
>>>
>>> I am hesitant to have it auto-create files that don't already exist 
>>> in case the files become large in size and a misconfiguration leads 
>>> to repeated creation of these files that get orphaned in random 
>>> folders. Perhaps we can add a create=auto flag or similar to allow 
>>> this behavior if wanted.
>>>
>>> (2) File Format
>>>
>>> Some standard file magic, which includes:
>>>
>>> - Some magic byte(s)
>>> - Dirty flag. Needed to tell if we can trust this data or not.
>>> - The size of the bitmap
>>> - The granularity of the bitmap
>>> - The offset to the first sector of bitmap data (Maybe? It can't 
>>> hurt if we give ourselves a sector's worth to write metadata within.)
>>> - Data starting at... PAGESIZE?
>>>
>>> (3) Data Integrity
>>>
>>> The dirty flag could work something like:
>>>
>>> - If, on first open, the file has the dirty flag set, we need to 
>>> discard the bitmap data because we can no longer trust it.
>>> - If the bitmap file is clean, proceed as normal, but take a lock 
>>> against any of the bitmap functions to prevent them from marking any 
>>> bits dirty.
>>> - On first write to a clean persistent bitmap, delay the write until 
>>> we can mark the bitmap as dirty first. This incurs a write penalty 
>>> when we try to use the bitmap at first...
>>> - Unlock the bitmap functions and allow them to mark blocks as needed.
>>> - At some point, based on a sync policy, re-commit the dirty 
>>> information to the file and mark the file as clean once more and 
>>> re-take the persistence lock.
>>>
>>> (4) Synchronization Policy
>>>
>>> - Sync after so many bits become dirty in the bitmap, either as an 
>>> absolute threshold or a density percentage?
>>> - Sync periodically on a fixed timer?
>>> - Sync periodically opportunistically when I/O utilization becomes 
>>> relatively low? (With some sort of starvation prevention timer?)
>>> - Sync only at shutdown?
>>>
>>> In discussing with Stefan, I think we rather liked the idea of a 
>>> timer that tries to re-commit the block data during lulls in the I/O.
>>>
>>> (5) Partial Persistence
>>>
>>> We did not discuss only saving higher levels of the bitmap. What's 
>>> the primary benefit you're seeking?
>>>
>>> (6) Inclusion as qcow2 Metadata
>>>
>>> And lastly, we did discuss the inclusion of the bitmap as qcow2 
>>> metadata, but decided it wasn't our principle target for the format 
>>> to allow generality to other file formats. We didn't really discuss 
>>> the idea of having it as an option or an extension, but I don't (off 
>>> the top of my head) have any reasonings against it, but I will 
>>> likely not work on it myself.
>>>
>>>
>>> You didn't CC qemu-devel on this (so I won't!), but perhaps we 
>>> should re-send out our ideas to the wider list for feedback before 
>>> we proceed any further. Maybe we can split the work if we agree upon 
>>> a design.
>>>
>>> Thanks!
>>> --js
>>>
>>> P.S.: I'm still cleaning up Fam's first patchset based on Max's and 
>>> your feedback. Hope to have it out by the end of this week.
>>>
>>>> On 11.11.2014 18:59, John Snow wrote:
>>>>>
>>>>>
>>>>> On 11/10/2014 03:15 AM, Vladimir Sementsov-Ogievskiy wrote:
>>>>>> Hi Fam, hi Jorn.
>>>>>>
>>>>>> Jagane's project - http://wiki.qemu.org/Features/Livebackup
>>>>>>
>>>>>> In two words:
>>>>>> Normal delta - like in qemu, while backuping, we save all new 
>>>>>> writes to
>>>>>> separate virtual disk - delta. When backup is done, we can merge 
>>>>>> delta
>>>>>> back to original image.
>>>>>> Reverse delta - while backuping, we don't stop writing to 
>>>>>> original image
>>>>>> (and qemu works with it, not with delta), but before every write 
>>>>>> we copy
>>>>>> the corresponding block (if backup needs it) to separate virtual 
>>>>>> disk
>>>>>> (reverse delta). So, for backuping, if the current block is not
>>>>>> rewritten, we take it from original file, otherwise we take it from
>>>>>> reverse delta. The benefit is that, theoretically we'll have two 
>>>>>> times
>>>>>> less overhead I/Os then with "normal delta backup + merge", 
>>>>>> because of
>>>>>> part of (about 0.5) blocks which are rewritten while backuping 
>>>>>> will be
>>>>>> backuped before rewrite, and they will not go to delta then. 
>>>>>> Also, there
>>>>>> are should be methods to improve this coefficient, by choosing rules
>>>>>> about what blocks to backup first.
>>>>>>
>>>>>> Are someone work now on persistent (while vm is turned off) 
>>>>>> bitmaps? If
>>>>>> not, are there any recommendations?
>>>>>>
>>>>>> Best regards,
>>>>>> Vladimir
>>>>>>
>>>>>
>>>>> Hi Vladimir:
>>>>>
>>>>> I recently inherited Fam's backup series and I intended to start
>>>>> reviewing the comments this week. After I sift through that 
>>>>> situation,
>>>>> the next step for me is persistent dirty bitmap support. I had been
>>>>> working on AHCI and IDE an awful lot lately, so it will take me a
>>>>> little bit to switch over.
>>>>>
>>>>> I don't have any existing notions of how I intend to implement it, so
>>>>> if you'd like to discuss any concerns you have about that subsystem,
>>>>> we can talk :)
>>>>>
>>>>> Thanks,
>>>>> --John
>>>>>
>>>>>> On 08.11.2014 10:19, Fam Zheng wrote:
>>>>>>> On Fri, 11/07 15:23, Vladimir Sementsov-Ogievskiy wrote:
>>>>>>>> Hi!
>>>>>>>>
>>>>>>>> Glad to see you working on backup series again. Some time ago I've
>>>>>>>> already
>>>>>>>> fixed v4 to be mergeable, but unfortunately didn't publish.
>>>>>>>> Actually I need (and I'm going to develop) incremental backup with
>>>>>>>> reverse
>>>>>>>> delta (like abandoned project
>>>>>>>> http://wiki.qemu.org/Features/Livebackup by
>>>>>>>> Jagane). And I've already fixed Jagane's project to be working 
>>>>>>>> with
>>>>>>>> current
>>>>>>>> Qemu. But the code is bad. Can you give me a peace of advice about
>>>>>>>> what
>>>>>>>> should I notice? I have a very superficial understanding of 
>>>>>>>> qemu block
>>>>>>>> devices, block jobs, etc.
>>>>>>> Hi Vladimir,
>>>>>>>
>>>>>>> Thank you for reviewing my series.
>>>>>>>
>>>>>>> I haven't seen Jagane's series, so could you pleased explain what
>>>>>>> "reverse
>>>>>>> delta" is?
>>>>>>>
>>>>>>> John Snow (CC'ed) will take my work on incremental backup so I 
>>>>>>> think
>>>>>>> you can
>>>>>>> coordinate. I'll probably count on John to work on your comments 
>>>>>>> too,
>>>>>>> in the
>>>>>>> future I'll follow the progress and review.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Fam
>>>>>>
>>>>>
>>>>
>>>
>>
>>
>
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v6 00/10] block: Incremental backup series
  2014-11-18 13:09             ` Vladimir Sementsov-Ogievskiy
  2014-11-18 14:41               ` Vladimir Sementsov-Ogievskiy
@ 2014-11-18 16:08               ` John Snow
  2014-11-19  6:25                 ` Denis V. Lunev
  1 sibling, 1 reply; 17+ messages in thread
From: John Snow @ 2014-11-18 16:08 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, Fam Zheng
  Cc: Denis V. Lunev, stefanha, qemu-devel



On 11/18/2014 08:09 AM, Vladimir Sementsov-Ogievskiy wrote:
>> (3) Data Integrity
>>
>> The dirty flag could work something like:
>>
>> - If, on first open, the file has the dirty flag set, we need to
>> discard the bitmap data because we can no longer trust it.
>> - If the bitmap file is clean, proceed as normal, but take a lock
>> against any of the bitmap functions to prevent them from marking any
>> bits dirty.
>> - On first write to a clean persistent bitmap, delay the write until
>> we can mark the bitmap as dirty first. This incurs a write penalty
>> when we try to use the bitmap at first...
>> - Unlock the bitmap functions and allow them to mark blocks as needed.
>> - At some point, based on a sync policy, re-commit the dirty
>> information to the file and mark the file as clean once more and
>> re-take the persistence lock.
> Correct me if I'm wrong.
>
> #Read bitmap:
> read in blockdev_init, before any write to device, so no lock is needed.
>
> #Set bits in bitmap:
> if bitmap.dirty_flag:
>     set bits
> else:
>     LOCK
>     set bits
>     set bitmap.dirty_flag
>     set dirty_flag in bitmap file
>     UNLOCK
>
> #Sync:
> if not bitmap.dirty_flag:
>     skip sync
> else:
>     LOCK
>     save one of bitmap levels (saving the last one is too long and not
> very good idea, because it is fast-updateing)
>     unset dirty_flag in bitmap file
>     unset bitmap.dirty_flag
>     UNLOCK
>
> #Last sync in bdrv_close:
> Just save the last bitmap level and unset dirty_flag in bitmap file
>
> Also.. I'm not quite sure about locking.. As I understand, co-routines
> in qemu are not running in parallel, is locking required? Or sync timer
> will not be co-routine based?
>
> Best regards,
> Vladimir

Might be being too informal. I just meant a lock or barrier to prevent 
actual IO throughput until we can confirm the dirty flag has been 
adjusted to indicate that the persistent bitmap is now officially out of 
date. Nothing fancy.

Wasn't trying to imply that we needed threading protection, just 
"locking" the IO until we can configure the bitmap as we need it to be.

> On 18.11.2014 13:54, Vladimir Sementsov-Ogievskiy wrote:
>>
>>> (2) File Format
>>>
>>> Some standard file magic, which includes:
>>>
>>> - Some magic byte(s)
>>> - Dirty flag. Needed to tell if we can trust this data or not.
>>> - The size of the bitmap
>>> - The granularity of the bitmap
>>> - The offset to the first sector of bitmap data (Maybe? It can't hurt
>>> if we give ourselves a sector's worth to write metadata within.)
>>> - Data starting at... PAGESIZE?
>> - The name of the bitmap and also the size of this name
>>
>>>
>>> (5) Partial Persistence
>>>
>>> We did not discuss only saving higher levels of the bitmap. What's
>>> the primary benefit you're seeking?
>> Hmm. It may be used for faster sync. Maybe, save some of bitmap levels
>> on timer while vm is running and save the last level on shutdown?
>>
>> CC qemu-devel - ok.
>>
>> Best regards,
>> Vladimir
>>
>> On 18.11.2014 02:46, John Snow wrote:
>>>
>>>
>>> On 11/13/2014 08:54 AM, Vladimir Sementsov-Ogievskiy wrote:
>>>> Hi
>>>>
>>>> I'd just like to start working on persistent dirty bitmap. My thoughts
>>>> about it are the following:
>>>> - qemu -drive file=file,dirty_bitmap=file
>>>>      so,  bitmap will be loaded with drive open and saved with drive
>>>> close.
>>>> - save only meaningful (the last) level of the bitmap, restore all
>>>> levels on bitmap loading
>>>> - bool parameter "persistent" for bdrv_create_dirty_bitmap and
>>>> BdrvDirtyBitmap
>>>> - internal dirty_bitmaps, saved in qcow2 file
>>>>
>>>> Best regards,
>>>> Vladimir
>>>
>>> I am thinking:
>>>
>>> (1) Command Lines
>>>
>>> If you enable dirty bitmaps and give it a file that doesn't exist, it
>>> should error out on you.
>>>
>>> If you enable dirty bitmaps and give it a file that's blank, it
>>> understands that it is to create a persistent bitmap file in this
>>> location and it should enable persistence.
>>>
>>> If a bitmap file is given and it has valid magic, this should imply
>>> persistence.
>>>
>>> I am hesitant to have it auto-create files that don't already exist
>>> in case the files become large in size and a misconfiguration leads
>>> to repeated creation of these files that get orphaned in random
>>> folders. Perhaps we can add a create=auto flag or similar to allow
>>> this behavior if wanted.
>>>
>>> (2) File Format
>>>
>>> Some standard file magic, which includes:
>>>
>>> - Some magic byte(s)
>>> - Dirty flag. Needed to tell if we can trust this data or not.
>>> - The size of the bitmap
>>> - The granularity of the bitmap
>>> - The offset to the first sector of bitmap data (Maybe? It can't hurt
>>> if we give ourselves a sector's worth to write metadata within.)
>>> - Data starting at... PAGESIZE?
>>>
>>> (3) Data Integrity
>>>
>>> The dirty flag could work something like:
>>>
>>> - If, on first open, the file has the dirty flag set, we need to
>>> discard the bitmap data because we can no longer trust it.
>>> - If the bitmap file is clean, proceed as normal, but take a lock
>>> against any of the bitmap functions to prevent them from marking any
>>> bits dirty.
>>> - On first write to a clean persistent bitmap, delay the write until
>>> we can mark the bitmap as dirty first. This incurs a write penalty
>>> when we try to use the bitmap at first...
>>> - Unlock the bitmap functions and allow them to mark blocks as needed.
>>> - At some point, based on a sync policy, re-commit the dirty
>>> information to the file and mark the file as clean once more and
>>> re-take the persistence lock.
>>>
>>> (4) Synchronization Policy
>>>
>>> - Sync after so many bits become dirty in the bitmap, either as an
>>> absolute threshold or a density percentage?
>>> - Sync periodically on a fixed timer?
>>> - Sync periodically opportunistically when I/O utilization becomes
>>> relatively low? (With some sort of starvation prevention timer?)
>>> - Sync only at shutdown?
>>>
>>> In discussing with Stefan, I think we rather liked the idea of a
>>> timer that tries to re-commit the block data during lulls in the I/O.
>>>
>>> (5) Partial Persistence
>>>
>>> We did not discuss only saving higher levels of the bitmap. What's
>>> the primary benefit you're seeking?
>>>
>>> (6) Inclusion as qcow2 Metadata
>>>
>>> And lastly, we did discuss the inclusion of the bitmap as qcow2
>>> metadata, but decided it wasn't our principle target for the format
>>> to allow generality to other file formats. We didn't really discuss
>>> the idea of having it as an option or an extension, but I don't (off
>>> the top of my head) have any reasonings against it, but I will likely
>>> not work on it myself.
>>>
>>>
>>> You didn't CC qemu-devel on this (so I won't!), but perhaps we should
>>> re-send out our ideas to the wider list for feedback before we
>>> proceed any further. Maybe we can split the work if we agree upon a
>>> design.
>>>
>>> Thanks!
>>> --js
>>>
>>> P.S.: I'm still cleaning up Fam's first patchset based on Max's and
>>> your feedback. Hope to have it out by the end of this week.
>>>
>>>> On 11.11.2014 18:59, John Snow wrote:
>>>>>
>>>>>
>>>>> On 11/10/2014 03:15 AM, Vladimir Sementsov-Ogievskiy wrote:
>>>>>> Hi Fam, hi Jorn.
>>>>>>
>>>>>> Jagane's project - http://wiki.qemu.org/Features/Livebackup
>>>>>>
>>>>>> In two words:
>>>>>> Normal delta - like in qemu, while backuping, we save all new
>>>>>> writes to
>>>>>> separate virtual disk - delta. When backup is done, we can merge
>>>>>> delta
>>>>>> back to original image.
>>>>>> Reverse delta - while backuping, we don't stop writing to original
>>>>>> image
>>>>>> (and qemu works with it, not with delta), but before every write
>>>>>> we copy
>>>>>> the corresponding block (if backup needs it) to separate virtual disk
>>>>>> (reverse delta). So, for backuping, if the current block is not
>>>>>> rewritten, we take it from original file, otherwise we take it from
>>>>>> reverse delta. The benefit is that, theoretically we'll have two
>>>>>> times
>>>>>> less overhead I/Os then with "normal delta backup + merge",
>>>>>> because of
>>>>>> part of (about 0.5) blocks which are rewritten while backuping
>>>>>> will be
>>>>>> backuped before rewrite, and they will not go to delta then. Also,
>>>>>> there
>>>>>> are should be methods to improve this coefficient, by choosing rules
>>>>>> about what blocks to backup first.
>>>>>>
>>>>>> Are someone work now on persistent (while vm is turned off)
>>>>>> bitmaps? If
>>>>>> not, are there any recommendations?
>>>>>>
>>>>>> Best regards,
>>>>>> Vladimir
>>>>>>
>>>>>
>>>>> Hi Vladimir:
>>>>>
>>>>> I recently inherited Fam's backup series and I intended to start
>>>>> reviewing the comments this week. After I sift through that situation,
>>>>> the next step for me is persistent dirty bitmap support. I had been
>>>>> working on AHCI and IDE an awful lot lately, so it will take me a
>>>>> little bit to switch over.
>>>>>
>>>>> I don't have any existing notions of how I intend to implement it, so
>>>>> if you'd like to discuss any concerns you have about that subsystem,
>>>>> we can talk :)
>>>>>
>>>>> Thanks,
>>>>> --John
>>>>>
>>>>>> On 08.11.2014 10:19, Fam Zheng wrote:
>>>>>>> On Fri, 11/07 15:23, Vladimir Sementsov-Ogievskiy wrote:
>>>>>>>> Hi!
>>>>>>>>
>>>>>>>> Glad to see you working on backup series again. Some time ago I've
>>>>>>>> already
>>>>>>>> fixed v4 to be mergeable, but unfortunately didn't publish.
>>>>>>>> Actually I need (and I'm going to develop) incremental backup with
>>>>>>>> reverse
>>>>>>>> delta (like abandoned project
>>>>>>>> http://wiki.qemu.org/Features/Livebackup by
>>>>>>>> Jagane). And I've already fixed Jagane's project to be working with
>>>>>>>> current
>>>>>>>> Qemu. But the code is bad. Can you give me a peace of advice about
>>>>>>>> what
>>>>>>>> should I notice? I have a very superficial understanding of qemu
>>>>>>>> block
>>>>>>>> devices, block jobs, etc.
>>>>>>> Hi Vladimir,
>>>>>>>
>>>>>>> Thank you for reviewing my series.
>>>>>>>
>>>>>>> I haven't seen Jagane's series, so could you pleased explain what
>>>>>>> "reverse
>>>>>>> delta" is?
>>>>>>>
>>>>>>> John Snow (CC'ed) will take my work on incremental backup so I think
>>>>>>> you can
>>>>>>> coordinate. I'll probably count on John to work on your comments
>>>>>>> too,
>>>>>>> in the
>>>>>>> future I'll follow the progress and review.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Fam
>>>>>>
>>>>>
>>>>
>>>
>>
>>
>

-- 
—js

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v6 00/10] block: Incremental backup series
  2014-11-18 16:08               ` John Snow
@ 2014-11-19  6:25                 ` Denis V. Lunev
  0 siblings, 0 replies; 17+ messages in thread
From: Denis V. Lunev @ 2014-11-19  6:25 UTC (permalink / raw)
  To: John Snow, Vladimir Sementsov-Ogievskiy, Fam Zheng
  Cc: Denis V. Lunev, stefanha, qemu-devel

On 18/11/14 19:08, John Snow wrote:
>
>
> On 11/18/2014 08:09 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> (3) Data Integrity
>>>
>>> The dirty flag could work something like:
>>>
>>> - If, on first open, the file has the dirty flag set, we need to
>>> discard the bitmap data because we can no longer trust it.
>>> - If the bitmap file is clean, proceed as normal, but take a lock
>>> against any of the bitmap functions to prevent them from marking any
>>> bits dirty.
>>> - On first write to a clean persistent bitmap, delay the write until
>>> we can mark the bitmap as dirty first. This incurs a write penalty
>>> when we try to use the bitmap at first...
>>> - Unlock the bitmap functions and allow them to mark blocks as needed.
>>> - At some point, based on a sync policy, re-commit the dirty
>>> information to the file and mark the file as clean once more and
>>> re-take the persistence lock.
>> Correct me if I'm wrong.
>>

- first of all what we are protecting against? Any QEMU or kernel
crash leads to the disaster. You can not guarantee at all the
consistency between data written to the main file (disk) and data
written to bitmap except if you wait that dirty bitmap is really
updated. In this case you will have performance halved in comparison
with the host (each guest write means 2 IOPS instead of 1)

- this effectively means that we can not be protected against crash

- if we don't we could not care at all about bitmap updates when
QEMU is running. We should write it only on stop/suspend/etc.

This simplifies things a lot.

So, the procedure is simple
- stop main VM by using vm_pause
- open/create bitmap file
- set dirty
- unpause

That's all. If QEMU is running, bitmap is always dirty. The only 
exception is backup creation. New backup means that old bitmap is
synced with dirty clear, new one is created with dirty set and
backup starts working with old bitmap.

Once again. In all other cases we can not guarantee that we will
report _all_ changed blocks to backup software if crash happens
in the middle. One missed block in the bitmap and the entire backup
is blown up.

This also means that (4) aka sync policy is simple. Do this on
close.

>> #Read bitmap:
>> read in blockdev_init, before any write to device, so no lock is needed.
>>
>> #Set bits in bitmap:
>> if bitmap.dirty_flag:
>>     set bits
>> else:
>>     LOCK
>>     set bits
>>     set bitmap.dirty_flag
>>     set dirty_flag in bitmap file
>>     UNLOCK
>>
>> #Sync:
>> if not bitmap.dirty_flag:
>>     skip sync
>> else:
>>     LOCK
>>     save one of bitmap levels (saving the last one is too long and not
>> very good idea, because it is fast-updateing)
>>     unset dirty_flag in bitmap file
>>     unset bitmap.dirty_flag
>>     UNLOCK
>>
>> #Last sync in bdrv_close:
>> Just save the last bitmap level and unset dirty_flag in bitmap file
>>
>> Also.. I'm not quite sure about locking.. As I understand, co-routines
>> in qemu are not running in parallel, is locking required? Or sync timer
>> will not be co-routine based?
>>
>> Best regards,
>> Vladimir
>
> Might be being too informal. I just meant a lock or barrier to prevent
> actual IO throughput until we can confirm the dirty flag has been
> adjusted to indicate that the persistent bitmap is now officially out of
> date. Nothing fancy.
>
> Wasn't trying to imply that we needed threading protection, just
> "locking" the IO until we can configure the bitmap as we need it to be.
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH v2] persistent dirty bitmap: add QDB file spec.
       [not found]         ` <546A88DD.10006@redhat.com>
  2014-11-18 10:54           ` [Qemu-devel] [PATCH v6 00/10] block: Incremental backup series Vladimir Sementsov-Ogievskiy
@ 2014-11-20 10:34           ` Vladimir Sementsov-Ogievskiy
  2014-11-20 10:41             ` Vladimir Sementsov-Ogievskiy
  2014-11-21  0:24             ` Eric Blake
  1 sibling, 2 replies; 17+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2014-11-20 10:34 UTC (permalink / raw)
  To: qemu-devel, jsnow; +Cc: famz, den, stefanha

QDB file is for storing dirty bitmap. The specification is based on
qcow2 specification.

Saving several bitmaps is necessary when server shutdowns during
backup. In this case 2 tables for each disk are available. One
collected for a previous period and one active. Though this feature
is discussable.

Big endian format and Standard Cluster Descriptor are used to simplify
integration with qcow2, to support internal bitmaps for qcow2 in future.

The idea is that the same procedure writing the data to QDB file could
do the same for QCOW2. The only difference is cluster refcount table.
Should we use it here or not is still questionable.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
---
 docs/specs/qdb.txt | 132 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 132 insertions(+)
 create mode 100644 docs/specs/qdb.txt

diff --git a/docs/specs/qdb.txt b/docs/specs/qdb.txt
new file mode 100644
index 0000000..d570a69
--- /dev/null
+++ b/docs/specs/qdb.txt
@@ -0,0 +1,132 @@
+== General ==
+
+"QDB" means "Qemu Dirty Bitmaps". QDB file can store several dirty bitmaps.
+QDB file is organized in units of constant size, which are called clusters.
+
+All numbers in QDB are stored in Big Endian byte order.
+
+== Header ==
+
+The first cluster of a QDB image contains the file header:
+
+    Byte  0 -  3:   magic
+                    QDB magic string ("QDB\0")
+
+          4 -  7:   version
+                    Version number (valid value is 1)
+
+          8 - 11:   cluster_bits
+                    Number of bits that are used for addressing an offset
+                    within a cluster (1 << cluster_bits is the cluster size).
+                    Must not be less than 9 (i.e. 512 byte clusters).
+
+         12 - 15:   nb_bitmaps
+                    Number of bitmaps contained in the file
+
+         16 - 23:   bitmaps_offset
+                    Offset into the QDB file at which the bitmap table starts.
+                    Must be aligned to a cluster boundary.
+
+         24 - 27:   header_length
+                    Length of the header structure in bytes.
+
+Like in qcow2, directly after the image header, optional sections called header extensions can
+be stored. Each extension has a structure like the following:
+
+    Byte  0 -  3:   Header extension type:
+                        0x00000000 - End of the header extension area
+                        other      - Unknown header extension, can be safely
+                                     ignored
+
+          4 -  7:   Length of the header extension data
+
+          8 -  n:   Header extension data
+
+          n -  m:   Padding to round up the header extension size to the next
+                    multiple of 8.
+
+Unless stated otherwise, each header extension type shall appear at most once
+in the same image.
+
+== Cluster mapping ==
+
+QDB uses a ONE-level structure for the mapping of
+bitmaps to host clusters. It is called L1 table.
+
+The L1 table has a variable size (stored in the Bitmap table entry) and may
+use multiple clusters, however it must be contiguous in the QDB file.
+
+Given a offset into the bitmap, the offset into the QDB file can be
+obtained as follows:
+
+    offset = l1_table[offset / cluster_size] + (offset % cluster_size)
+
+L1 table entry:
+
+    Bit  0 -  61:   Cluster descriptor
+
+        62 -  63:   Reserved
+
+Standard Cluster Descriptor (the same as in qcow2):
+
+    Bit       0:    If set to 1, the cluster reads as all zeros. The host
+                    cluster offset can be used to describe a preallocation,
+                    but it won't be used for reading data from this cluster,
+                    nor is data read from the backing file if the cluster is
+                    unallocated.
+
+         1 -  8:    Reserved (set to 0)
+
+         9 - 55:    Bits 9-55 of host cluster offset. Must be aligned to a
+                    cluster boundary. If the offset is 0, the cluster is
+                    unallocated.
+
+        56 - 61:    Reserved (set to 0)
+
+If a cluster is unallocated, read requests shall read zero.
+
+== Bitmap table ==
+
+QDB supports storing of several bitmaps.
+
+A directory of all bitmaps is stored in the bitmap table, a contiguous area
+in the QDB file, whose starting offset and length are given by the header
+fields bitmaps_offset and nb_bitmaps. The entries of the bitmap table
+have variable length, depending on the length of name and extra data.
+
+Bitmap table entry:
+
+    Byte 0 -  7:    Offset into the QDB file at which the L1 table for the
+                    bitmap starts. Must be aligned to a cluster boundary.
+
+         8 - 11:    Number of entries in the L1 table of the bitmap
+
+        12 - 15:    Bitmap granularity
+                    As represented in HBitmap structure. Given a granularity of
+                    G, each bit in the bitmap will actually represent a group
+                    of 2^G bytes.
+
+        16 - 23:    Bitmap size
+                    The size of really stored data is
+                    (size + (1 << granularity) - 1) >> granularity
+
+             24:    Bitmap enabled flag (valid values are 1 and 0)
+
+             25:    File dirty flag (valid values are 1 and 0;
+                    if value is 1 the bitmap is inconsistent)
+
+        26 - 27:    Size of the bitmap name
+
+        36 - 39:    Size of extra data in the table entry (used for future
+                    extensions of the format)
+
+        variable:   Extra data for future extensions. Unknown fields must be
+                    ignored.
+
+        variable:   Name of the bitmap (not null terminated)
+
+        variable:   Padding to round up the bitmap table entry size to the
+                    next multiple of 8.
+
+The fields "size", "granularity", "enabled" and "name" are corresponding with
+the fields in struct BdrvDirtyBitmap.
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v2] persistent dirty bitmap: add QDB file spec.
  2014-11-20 10:34           ` [Qemu-devel] [PATCH v2] persistent dirty bitmap: add QDB file spec Vladimir Sementsov-Ogievskiy
@ 2014-11-20 10:41             ` Vladimir Sementsov-Ogievskiy
  2014-11-20 11:36               ` Stefan Hajnoczi
  2014-11-21  0:24             ` Eric Blake
  1 sibling, 1 reply; 17+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2014-11-20 10:41 UTC (permalink / raw)
  To: qemu-devel, jsnow; +Cc: famz, den, stefanha

Also, it may be better to make this as qcow2 extension. And bitmap will 
be saved in separate qcow2 file, which will contain only the bitmap(s) 
and no other data (no disk, no snapshots).

Best regards,
Vladimir

On 20.11.2014 13:34, Vladimir Sementsov-Ogievskiy wrote:
> QDB file is for storing dirty bitmap. The specification is based on
> qcow2 specification.
>
> Saving several bitmaps is necessary when server shutdowns during
> backup. In this case 2 tables for each disk are available. One
> collected for a previous period and one active. Though this feature
> is discussable.
>
> Big endian format and Standard Cluster Descriptor are used to simplify
> integration with qcow2, to support internal bitmaps for qcow2 in future.
>
> The idea is that the same procedure writing the data to QDB file could
> do the same for QCOW2. The only difference is cluster refcount table.
> Should we use it here or not is still questionable.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
> ---
>   docs/specs/qdb.txt | 132 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 132 insertions(+)
>   create mode 100644 docs/specs/qdb.txt
>
> diff --git a/docs/specs/qdb.txt b/docs/specs/qdb.txt
> new file mode 100644
> index 0000000..d570a69
> --- /dev/null
> +++ b/docs/specs/qdb.txt
> @@ -0,0 +1,132 @@
> +== General ==
> +
> +"QDB" means "Qemu Dirty Bitmaps". QDB file can store several dirty bitmaps.
> +QDB file is organized in units of constant size, which are called clusters.
> +
> +All numbers in QDB are stored in Big Endian byte order.
> +
> +== Header ==
> +
> +The first cluster of a QDB image contains the file header:
> +
> +    Byte  0 -  3:   magic
> +                    QDB magic string ("QDB\0")
> +
> +          4 -  7:   version
> +                    Version number (valid value is 1)
> +
> +          8 - 11:   cluster_bits
> +                    Number of bits that are used for addressing an offset
> +                    within a cluster (1 << cluster_bits is the cluster size).
> +                    Must not be less than 9 (i.e. 512 byte clusters).
> +
> +         12 - 15:   nb_bitmaps
> +                    Number of bitmaps contained in the file
> +
> +         16 - 23:   bitmaps_offset
> +                    Offset into the QDB file at which the bitmap table starts.
> +                    Must be aligned to a cluster boundary.
> +
> +         24 - 27:   header_length
> +                    Length of the header structure in bytes.
> +
> +Like in qcow2, directly after the image header, optional sections called header extensions can
> +be stored. Each extension has a structure like the following:
> +
> +    Byte  0 -  3:   Header extension type:
> +                        0x00000000 - End of the header extension area
> +                        other      - Unknown header extension, can be safely
> +                                     ignored
> +
> +          4 -  7:   Length of the header extension data
> +
> +          8 -  n:   Header extension data
> +
> +          n -  m:   Padding to round up the header extension size to the next
> +                    multiple of 8.
> +
> +Unless stated otherwise, each header extension type shall appear at most once
> +in the same image.
> +
> +== Cluster mapping ==
> +
> +QDB uses a ONE-level structure for the mapping of
> +bitmaps to host clusters. It is called L1 table.
> +
> +The L1 table has a variable size (stored in the Bitmap table entry) and may
> +use multiple clusters, however it must be contiguous in the QDB file.
> +
> +Given a offset into the bitmap, the offset into the QDB file can be
> +obtained as follows:
> +
> +    offset = l1_table[offset / cluster_size] + (offset % cluster_size)
> +
> +L1 table entry:
> +
> +    Bit  0 -  61:   Cluster descriptor
> +
> +        62 -  63:   Reserved
> +
> +Standard Cluster Descriptor (the same as in qcow2):
> +
> +    Bit       0:    If set to 1, the cluster reads as all zeros. The host
> +                    cluster offset can be used to describe a preallocation,
> +                    but it won't be used for reading data from this cluster,
> +                    nor is data read from the backing file if the cluster is
> +                    unallocated.
> +
> +         1 -  8:    Reserved (set to 0)
> +
> +         9 - 55:    Bits 9-55 of host cluster offset. Must be aligned to a
> +                    cluster boundary. If the offset is 0, the cluster is
> +                    unallocated.
> +
> +        56 - 61:    Reserved (set to 0)
> +
> +If a cluster is unallocated, read requests shall read zero.
> +
> +== Bitmap table ==
> +
> +QDB supports storing of several bitmaps.
> +
> +A directory of all bitmaps is stored in the bitmap table, a contiguous area
> +in the QDB file, whose starting offset and length are given by the header
> +fields bitmaps_offset and nb_bitmaps. The entries of the bitmap table
> +have variable length, depending on the length of name and extra data.
> +
> +Bitmap table entry:
> +
> +    Byte 0 -  7:    Offset into the QDB file at which the L1 table for the
> +                    bitmap starts. Must be aligned to a cluster boundary.
> +
> +         8 - 11:    Number of entries in the L1 table of the bitmap
> +
> +        12 - 15:    Bitmap granularity
> +                    As represented in HBitmap structure. Given a granularity of
> +                    G, each bit in the bitmap will actually represent a group
> +                    of 2^G bytes.
> +
> +        16 - 23:    Bitmap size
> +                    The size of really stored data is
> +                    (size + (1 << granularity) - 1) >> granularity
> +
> +             24:    Bitmap enabled flag (valid values are 1 and 0)
> +
> +             25:    File dirty flag (valid values are 1 and 0;
> +                    if value is 1 the bitmap is inconsistent)
> +
> +        26 - 27:    Size of the bitmap name
> +
> +        36 - 39:    Size of extra data in the table entry (used for future
> +                    extensions of the format)
> +
> +        variable:   Extra data for future extensions. Unknown fields must be
> +                    ignored.
> +
> +        variable:   Name of the bitmap (not null terminated)
> +
> +        variable:   Padding to round up the bitmap table entry size to the
> +                    next multiple of 8.
> +
> +The fields "size", "granularity", "enabled" and "name" are corresponding with
> +the fields in struct BdrvDirtyBitmap.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v2] persistent dirty bitmap: add QDB file spec.
  2014-11-20 10:41             ` Vladimir Sementsov-Ogievskiy
@ 2014-11-20 11:36               ` Stefan Hajnoczi
  2014-11-21 10:27                 ` Vladimir Sementsov-Ogievskiy
  2014-11-21 12:56                 ` Vladimir Sementsov-Ogievskiy
  0 siblings, 2 replies; 17+ messages in thread
From: Stefan Hajnoczi @ 2014-11-20 11:36 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy; +Cc: famz, jsnow, qemu-devel, den

[-- Attachment #1: Type: text/plain, Size: 1483 bytes --]

On Thu, Nov 20, 2014 at 01:41:14PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> Also, it may be better to make this as qcow2 extension. And bitmap will be
> saved in separate qcow2 file, which will contain only the bitmap(s) and no
> other data (no disk, no snapshots).

I think you are on to something with the idea of making the persistent
dirty bitmap itself a disk image.

That way drive-mirror and other commands can be used to live migrate the
dirty bitmap along with the guest's disks.  This allows both QEMU and
management tools to reuse existing code.

(We may need to allow multiple block jobs per BlockDriverState to make
this work but in theory that can be done.)

There is a constraint if we want to get live migration for free: The
bitmap contents must be accessible with bdrv_read() and
bdrv_get_block_status() to skip zero regions.

Putting the dirty bitmap into its own data structure in qcow2 and not
accessible as a BlockDriverState bdrv_read() means custom code must be
written to migrate the dirty bitmap.

So I suggest putting the bitmap contents into a disk image that can be
accessed as a BlockDriverState with bdrv_read().  The metadata (bitmap
name, granularity, etc) doesn't need to be stored in the image file
because management tools must be aware of it anyway.

The only thing besides the data that really needs to be stored is the
up-to-date flag to decide whether this dirty bitmap was synced cleanly.
A much simpler format would do for that.

Stefan

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v2] persistent dirty bitmap: add QDB file spec.
  2014-11-20 10:34           ` [Qemu-devel] [PATCH v2] persistent dirty bitmap: add QDB file spec Vladimir Sementsov-Ogievskiy
  2014-11-20 10:41             ` Vladimir Sementsov-Ogievskiy
@ 2014-11-21  0:24             ` Eric Blake
  1 sibling, 0 replies; 17+ messages in thread
From: Eric Blake @ 2014-11-21  0:24 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel, jsnow; +Cc: famz, den, stefanha

[-- Attachment #1: Type: text/plain, Size: 5256 bytes --]

On 11/20/2014 03:34 AM, Vladimir Sementsov-Ogievskiy wrote:
> QDB file is for storing dirty bitmap. The specification is based on
> qcow2 specification.
> 
> Saving several bitmaps is necessary when server shutdowns during
> backup. In this case 2 tables for each disk are available. One
> collected for a previous period and one active. Though this feature
> is discussable.
> 
> Big endian format and Standard Cluster Descriptor are used to simplify
> integration with qcow2, to support internal bitmaps for qcow2 in future.
> 
> The idea is that the same procedure writing the data to QDB file could
> do the same for QCOW2. The only difference is cluster refcount table.
> Should we use it here or not is still questionable.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
> ---
>  docs/specs/qdb.txt | 132 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 132 insertions(+)
>  create mode 100644 docs/specs/qdb.txt

No comment on whether the approach itself makes sense - just a
high-level review of this document in isolation.

> 
> diff --git a/docs/specs/qdb.txt b/docs/specs/qdb.txt
> new file mode 100644
> index 0000000..d570a69
> --- /dev/null
> +++ b/docs/specs/qdb.txt
> @@ -0,0 +1,132 @@
> +== General ==

Missing a copyright notice.  Yeah, you've got a lot of bad examples in
this directory (in docs/* in general), but there ARE a few of the newer
files that are starting to buck the trend and use a copyright/license blurb.

> +
> +"QDB" means "Qemu Dirty Bitmaps". QDB file can store several dirty bitmaps.
> +QDB file is organized in units of constant size, which are called clusters.
> +
> +All numbers in QDB are stored in Big Endian byte order.
> +
> +== Header ==
> +
> +The first cluster of a QDB image contains the file header:
> +
> +    Byte  0 -  3:   magic
> +                    QDB magic string ("QDB\0")
> +
> +          4 -  7:   version
> +                    Version number (valid value is 1)
> +
> +          8 - 11:   cluster_bits
> +                    Number of bits that are used for addressing an offset
> +                    within a cluster (1 << cluster_bits is the cluster size).
> +                    Must not be less than 9 (i.e. 512 byte clusters).

Is there a maximum?

> +
> +         12 - 15:   nb_bitmaps
> +                    Number of bitmaps contained in the file
> +
> +         16 - 23:   bitmaps_offset
> +                    Offset into the QDB file at which the bitmap table starts.
> +                    Must be aligned to a cluster boundary.
> +
> +         24 - 27:   header_length
> +                    Length of the header structure in bytes.

does that include the length of all extensions?  Should we enforce a
maximum header length of one cluster?

> +
> +Like in qcow2, directly after the image header, optional sections called header extensions can
> +be stored. Each extension has a structure like the following:
> +
> +    Byte  0 -  3:   Header extension type:
> +                        0x00000000 - End of the header extension area
> +                        other      - Unknown header extension, can be safely
> +                                     ignored
> +
> +          4 -  7:   Length of the header extension data
> +
> +          8 -  n:   Header extension data
> +
> +          n -  m:   Padding to round up the header extension size to the next
> +                    multiple of 8.
> +
> +Unless stated otherwise, each header extension type shall appear at most once
> +in the same image.

I like how qcow2 v3 has a header extension for listing the name of each
header extension, for nicer error messages.  Also, I think that
declaring all unknown extensions as ignorable may be dangerous, since
you lack a capability bitmask.  Maybe it would be wise to copy the qcow2
v3 capabilities (including flags for ignorable vs. mandatory support of
given features, where a client can sanely decide what to do if it does
not recognize a feature).

> +
> +        26 - 27:    Size of the bitmap name
> +
> +        36 - 39:    Size of extra data in the table entry (used for future
> +                    extensions of the format)
> +
> +        variable:   Extra data for future extensions. Unknown fields must be
> +                    ignored.

This block is width 0 if bytes 36-39 is 0?  How are extensions
identified?  Are they required to be done like overall file headers,
with an id, length, and then variable data, so that it is possible to
scan to the end of each unknown extension to see if the next extension
is known?  This is where capability bits in the overall header may make
more sense.

> +
> +        variable:   Name of the bitmap (not null terminated)

The length of this block is determined by bytes 26-27?

> +
> +        variable:   Padding to round up the bitmap table entry size to the
> +                    next multiple of 8.
> +
> +The fields "size", "granularity", "enabled" and "name" are corresponding with
> +the fields in struct BdrvDirtyBitmap.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 539 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v2] persistent dirty bitmap: add QDB file spec.
  2014-11-20 11:36               ` Stefan Hajnoczi
@ 2014-11-21 10:27                 ` Vladimir Sementsov-Ogievskiy
  2014-11-21 16:55                   ` Stefan Hajnoczi
  2014-11-21 12:56                 ` Vladimir Sementsov-Ogievskiy
  1 sibling, 1 reply; 17+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2014-11-21 10:27 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: famz, jsnow, qemu-devel, den

> There is a constraint if we want to get live migration for free: The
> bitmap contents must be accessible with bdrv_read() and
> bdrv_get_block_status() to skip zero regions.
Hm. I'm afraid, it still will not be free. If bitmap is active, it's 
actual version is in memory. To migrate bitmap file like a disk image, 
we should start syncing it with every write to corresponding disk, 
doubling number of io.

Moreover, we have normal dirty bitmaps, which have no name/file, do we 
migrate them? If, for example, the migration occurs when backup in 
progress? Active bitmaps should be migrated in the same way for 
persistent/named/normal bitmaps. I can't find in qemu source, is there 
bitmap migration?

Or you are saying about migrating disabled bitmaps? Hm. We should sync 
bitmap file on bitmap_disable. Disabled persistent bitmap is just a 
static file ~30mb, we can easily migrate it without common procedure 
with cow or something like this..

Best regards,
Vladimir

On 20.11.2014 14:36, Stefan Hajnoczi wrote:
> On Thu, Nov 20, 2014 at 01:41:14PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> Also, it may be better to make this as qcow2 extension. And bitmap will be
>> saved in separate qcow2 file, which will contain only the bitmap(s) and no
>> other data (no disk, no snapshots).
> I think you are on to something with the idea of making the persistent
> dirty bitmap itself a disk image.
>
> That way drive-mirror and other commands can be used to live migrate the
> dirty bitmap along with the guest's disks.  This allows both QEMU and
> management tools to reuse existing code.
>
> (We may need to allow multiple block jobs per BlockDriverState to make
> this work but in theory that can be done.)
>
> There is a constraint if we want to get live migration for free: The
> bitmap contents must be accessible with bdrv_read() and
> bdrv_get_block_status() to skip zero regions.
>
> Putting the dirty bitmap into its own data structure in qcow2 and not
> accessible as a BlockDriverState bdrv_read() means custom code must be
> written to migrate the dirty bitmap.
>
> So I suggest putting the bitmap contents into a disk image that can be
> accessed as a BlockDriverState with bdrv_read().  The metadata (bitmap
> name, granularity, etc) doesn't need to be stored in the image file
> because management tools must be aware of it anyway.
>
> The only thing besides the data that really needs to be stored is the
> up-to-date flag to decide whether this dirty bitmap was synced cleanly.
> A much simpler format would do for that.
>
> Stefan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v2] persistent dirty bitmap: add QDB file spec.
  2014-11-20 11:36               ` Stefan Hajnoczi
  2014-11-21 10:27                 ` Vladimir Sementsov-Ogievskiy
@ 2014-11-21 12:56                 ` Vladimir Sementsov-Ogievskiy
  1 sibling, 0 replies; 17+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2014-11-21 12:56 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: famz, jsnow, qemu-devel, den

> The metadata (bitmap
> name, granularity, etc) doesn't need to be stored in the image file
> because management tools must be aware of it anyway.
What tools do you mean? In my opinion dirty bitmap should exist as a 
separate object. If it exists, it should be loaded with it's drive image 
and it should be maintained by qemu (loaded and enabled as a 
BdrvDirtyBitmap). If we use qcow2 format for dirty bitmaps, we can store 
metadata using header extension..

Also snapshots may be used to store several bitmaps in case when server 
shutdowns during backup and we need to store both current active bitmap 
and it's snapshot used by backup.

Best regards,
Vladimir

On 20.11.2014 14:36, Stefan Hajnoczi wrote:
> On Thu, Nov 20, 2014 at 01:41:14PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> Also, it may be better to make this as qcow2 extension. And bitmap will be
>> saved in separate qcow2 file, which will contain only the bitmap(s) and no
>> other data (no disk, no snapshots).
> I think you are on to something with the idea of making the persistent
> dirty bitmap itself a disk image.
>
> That way drive-mirror and other commands can be used to live migrate the
> dirty bitmap along with the guest's disks.  This allows both QEMU and
> management tools to reuse existing code.
>
> (We may need to allow multiple block jobs per BlockDriverState to make
> this work but in theory that can be done.)
>
> There is a constraint if we want to get live migration for free: The
> bitmap contents must be accessible with bdrv_read() and
> bdrv_get_block_status() to skip zero regions.
>
> Putting the dirty bitmap into its own data structure in qcow2 and not
> accessible as a BlockDriverState bdrv_read() means custom code must be
> written to migrate the dirty bitmap.
>
> So I suggest putting the bitmap contents into a disk image that can be
> accessed as a BlockDriverState with bdrv_read().  The metadata (bitmap
> name, granularity, etc) doesn't need to be stored in the image file
> because management tools must be aware of it anyway.
>
> The only thing besides the data that really needs to be stored is the
> up-to-date flag to decide whether this dirty bitmap was synced cleanly.
> A much simpler format would do for that.
>
> Stefan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v2] persistent dirty bitmap: add QDB file spec.
  2014-11-21 10:27                 ` Vladimir Sementsov-Ogievskiy
@ 2014-11-21 16:55                   ` Stefan Hajnoczi
  2014-11-24  9:19                     ` Vladimir Sementsov-Ogievskiy
                                       ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Stefan Hajnoczi @ 2014-11-21 16:55 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy; +Cc: famz, jsnow, qemu-devel, den

[-- Attachment #1: Type: text/plain, Size: 1868 bytes --]

On Fri, Nov 21, 2014 at 01:27:40PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> >There is a constraint if we want to get live migration for free: The
> >bitmap contents must be accessible with bdrv_read() and
> >bdrv_get_block_status() to skip zero regions.
> Hm. I'm afraid, it still will not be free. If bitmap is active, it's actual
> version is in memory. To migrate bitmap file like a disk image, we should
> start syncing it with every write to corresponding disk, doubling number of
> io.

It would be possible to drive-mirror the persistent dirty bitmap and
then flush it like all drives when the guest vCPUs are paused for
migration.

After thinking more about it though, this approach places more I/O into
the critical guest downtime phase.  In other words, slow disk I/O could
lead to long guest downtimes while QEMU tries to write out the dirty
bitmap.

> Moreover, we have normal dirty bitmaps, which have no name/file, do we
> migrate them? If, for example, the migration occurs when backup in progress?
> Active bitmaps should be migrated in the same way for
> persistent/named/normal bitmaps. I can't find in qemu source, is there
> bitmap migration?

bs->dirty_bitmaps is not migrated, in fact none of BlockDriverState is
migrated.

QEMU only migrates emulated device state (e.g. the hardware registers
and associated state).  It does not emulate host state that the guest
cannot see like the dirty bitmap.

> Or you are saying about migrating disabled bitmaps? Hm. We should sync
> bitmap file on bitmap_disable. Disabled persistent bitmap is just a static
> file ~30mb, we can easily migrate it without common procedure with cow or
> something like this..

Active dirty bitmaps should migrate too.  I'm thinking now that the
appropriate thing is to add live migration of dirty bitmaps to QEMU
(regardless of whether they are active or not).

Stefan

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v2] persistent dirty bitmap: add QDB file spec.
  2014-11-21 16:55                   ` Stefan Hajnoczi
@ 2014-11-24  9:19                     ` Vladimir Sementsov-Ogievskiy
  2014-11-25 17:58                     ` Vladimir Sementsov-Ogievskiy
  2014-11-28 13:28                     ` Vladimir Sementsov-Ogievskiy
  2 siblings, 0 replies; 17+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2014-11-24  9:19 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: famz, jsnow, qemu-devel, den

> Active dirty bitmaps should migrate too.  I'm thinking now that the
> appropriate thing is to add live migration of dirty bitmaps to QEMU
> (regardless of whether they are active or not).
Only for persistent bitmaps, or for all named bitmaps? If for all named 
bitmaps, then this migration should not be connected with bitmap file 
and it's format.

Best regards,
Vladimir

On 21.11.2014 19:55, Stefan Hajnoczi wrote:
> On Fri, Nov 21, 2014 at 01:27:40PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>>> There is a constraint if we want to get live migration for free: The
>>> bitmap contents must be accessible with bdrv_read() and
>>> bdrv_get_block_status() to skip zero regions.
>> Hm. I'm afraid, it still will not be free. If bitmap is active, it's actual
>> version is in memory. To migrate bitmap file like a disk image, we should
>> start syncing it with every write to corresponding disk, doubling number of
>> io.
> It would be possible to drive-mirror the persistent dirty bitmap and
> then flush it like all drives when the guest vCPUs are paused for
> migration.
>
> After thinking more about it though, this approach places more I/O into
> the critical guest downtime phase.  In other words, slow disk I/O could
> lead to long guest downtimes while QEMU tries to write out the dirty
> bitmap.
>
>> Moreover, we have normal dirty bitmaps, which have no name/file, do we
>> migrate them? If, for example, the migration occurs when backup in progress?
>> Active bitmaps should be migrated in the same way for
>> persistent/named/normal bitmaps. I can't find in qemu source, is there
>> bitmap migration?
> bs->dirty_bitmaps is not migrated, in fact none of BlockDriverState is
> migrated.
>
> QEMU only migrates emulated device state (e.g. the hardware registers
> and associated state).  It does not emulate host state that the guest
> cannot see like the dirty bitmap.
>
>> Or you are saying about migrating disabled bitmaps? Hm. We should sync
>> bitmap file on bitmap_disable. Disabled persistent bitmap is just a static
>> file ~30mb, we can easily migrate it without common procedure with cow or
>> something like this..
> Active dirty bitmaps should migrate too.  I'm thinking now that the
> appropriate thing is to add live migration of dirty bitmaps to QEMU
> (regardless of whether they are active or not).
>
> Stefan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v2] persistent dirty bitmap: add QDB file spec.
  2014-11-21 16:55                   ` Stefan Hajnoczi
  2014-11-24  9:19                     ` Vladimir Sementsov-Ogievskiy
@ 2014-11-25 17:58                     ` Vladimir Sementsov-Ogievskiy
  2014-11-28 13:28                     ` Vladimir Sementsov-Ogievskiy
  2 siblings, 0 replies; 17+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2014-11-25 17:58 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: famz, jsnow, qemu-devel, den

> I'm thinking now that the
> appropriate thing is to add live migration of dirty bitmaps to QEMU
> (regardless of whether they are active or not).
Digging the code around, I've found this:

in mig_save_device_dirty which is actually an iteration of live block 
migration, after sending a sector we need to clear appropriate bit in 
migration dirty bitmap (bmds->dirty_bitmap). But we clear such bits in 
all bitmaps, associated with this device:

bdrv_reset_dirty(bmds->bs, sector, nr_sectors);

which is

void bdrv_reset_dirty(BlockDriverState *bs, int64_t cur_sector, int 
nr_sectors)
{
     BdrvDirtyBitmap *bitmap;
     QLIST_FOREACH(bitmap, &bs->dirty_bitmaps, list) {
         hbitmap_reset(bitmap->bitmap, cur_sector, nr_sectors);
     }
}

I don't know why is it so, but with such approach we cant talk about 
dirty bitmap migration. Actually, all other dirty bitmaps, not related 
to this migration are broken because of this.

It's a mistake or I don't understand the concept of several dirty 
bitmaps per device in qemu. I've thought that they are separate 
entities, which are maintained by qemu. And other subsystems like backup 
or migration can create for itself a bitmap and use it not touching 
other bitmaps.. Am I wrong?

Best regards,
Vladimir

On 21.11.2014 19:55, Stefan Hajnoczi wrote:
> On Fri, Nov 21, 2014 at 01:27:40PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>>> There is a constraint if we want to get live migration for free: The
>>> bitmap contents must be accessible with bdrv_read() and
>>> bdrv_get_block_status() to skip zero regions.
>> Hm. I'm afraid, it still will not be free. If bitmap is active, it's actual
>> version is in memory. To migrate bitmap file like a disk image, we should
>> start syncing it with every write to corresponding disk, doubling number of
>> io.
> It would be possible to drive-mirror the persistent dirty bitmap and
> then flush it like all drives when the guest vCPUs are paused for
> migration.
>
> After thinking more about it though, this approach places more I/O into
> the critical guest downtime phase.  In other words, slow disk I/O could
> lead to long guest downtimes while QEMU tries to write out the dirty
> bitmap.
>
>> Moreover, we have normal dirty bitmaps, which have no name/file, do we
>> migrate them? If, for example, the migration occurs when backup in progress?
>> Active bitmaps should be migrated in the same way for
>> persistent/named/normal bitmaps. I can't find in qemu source, is there
>> bitmap migration?
> bs->dirty_bitmaps is not migrated, in fact none of BlockDriverState is
> migrated.
>
> QEMU only migrates emulated device state (e.g. the hardware registers
> and associated state).  It does not emulate host state that the guest
> cannot see like the dirty bitmap.
>
>> Or you are saying about migrating disabled bitmaps? Hm. We should sync
>> bitmap file on bitmap_disable. Disabled persistent bitmap is just a static
>> file ~30mb, we can easily migrate it without common procedure with cow or
>> something like this..
> Active dirty bitmaps should migrate too.  I'm thinking now that the
> appropriate thing is to add live migration of dirty bitmaps to QEMU
> (regardless of whether they are active or not).
>
> Stefan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v2] persistent dirty bitmap: add QDB file spec.
  2014-11-21 16:55                   ` Stefan Hajnoczi
  2014-11-24  9:19                     ` Vladimir Sementsov-Ogievskiy
  2014-11-25 17:58                     ` Vladimir Sementsov-Ogievskiy
@ 2014-11-28 13:28                     ` Vladimir Sementsov-Ogievskiy
  2014-12-01 11:02                       ` Stefan Hajnoczi
  2 siblings, 1 reply; 17+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2014-11-28 13:28 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: famz, jsnow, qemu-devel, Denis V. Lunev

On 21.11.2014 19:55, Stefan Hajnoczi wrote:
> Active dirty bitmaps should migrate too.  I'm thinking now that the
> appropriate thing is to add live migration of dirty bitmaps to QEMU
> (regardless of whether they are active or not).
I think, we should migrate named dirty bitmaps, which are not used now. 
So if some external mechanism uses the bitmap (for example - backup) - 
we actually can't migrate this process, because we will need to restore 
the whole backup structure including a pointer to the bitmap, which is 
too hard and includes not only bitmap migration.
So, if named bitmap is enabled, but not used (only bdrv_aligned_pwritev 
writes to it) it can be migrated. For this I see the following solutions:

1) Just save all corresponding pieces of named bitmaps with every 
migrated block. The block size is 1mb, so the overhead for migrating 
additionally a bitmap with 64kb granularity would be 2b, and it would be 
256b for bitmap with 512b granularity. This approach needs additional 
fields in BlkMigBlock, for saving bitmaps pieces.

2) Add DIRTY flag to migrated block flags, to distinguish blocks, which 
became dirty while migrating. Save all the bitmaps separately, and also 
update them on block_load, when we receive block with DIRTY flag on. 
Some information will be lost, migrated dirty bitmaps may be "more 
dirty" then original ones. This approach needs additional field "bool 
dirty" in BlkMigBlock, and saving this flag in blk_send.

These solutions don't depend on "persistence" of dirty bitmaps or 
persistent bitmap file format.

Best regards,
Vladimir

On 21.11.2014 19:55, Stefan Hajnoczi wrote:
> On Fri, Nov 21, 2014 at 01:27:40PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>>> There is a constraint if we want to get live migration for free: The
>>> bitmap contents must be accessible with bdrv_read() and
>>> bdrv_get_block_status() to skip zero regions.
>> Hm. I'm afraid, it still will not be free. If bitmap is active, it's actual
>> version is in memory. To migrate bitmap file like a disk image, we should
>> start syncing it with every write to corresponding disk, doubling number of
>> io.
> It would be possible to drive-mirror the persistent dirty bitmap and
> then flush it like all drives when the guest vCPUs are paused for
> migration.
>
> After thinking more about it though, this approach places more I/O into
> the critical guest downtime phase.  In other words, slow disk I/O could
> lead to long guest downtimes while QEMU tries to write out the dirty
> bitmap.
>
>> Moreover, we have normal dirty bitmaps, which have no name/file, do we
>> migrate them? If, for example, the migration occurs when backup in progress?
>> Active bitmaps should be migrated in the same way for
>> persistent/named/normal bitmaps. I can't find in qemu source, is there
>> bitmap migration?
> bs->dirty_bitmaps is not migrated, in fact none of BlockDriverState is
> migrated.
>
> QEMU only migrates emulated device state (e.g. the hardware registers
> and associated state).  It does not emulate host state that the guest
> cannot see like the dirty bitmap.
>
>> Or you are saying about migrating disabled bitmaps? Hm. We should sync
>> bitmap file on bitmap_disable. Disabled persistent bitmap is just a static
>> file ~30mb, we can easily migrate it without common procedure with cow or
>> something like this..
> Active dirty bitmaps should migrate too.  I'm thinking now that the
> appropriate thing is to add live migration of dirty bitmaps to QEMU
> (regardless of whether they are active or not).
>
> Stefan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v2] persistent dirty bitmap: add QDB file spec.
  2014-11-28 13:28                     ` Vladimir Sementsov-Ogievskiy
@ 2014-12-01 11:02                       ` Stefan Hajnoczi
  0 siblings, 0 replies; 17+ messages in thread
From: Stefan Hajnoczi @ 2014-12-01 11:02 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy; +Cc: famz, jsnow, qemu-devel, Denis V. Lunev

[-- Attachment #1: Type: text/plain, Size: 2392 bytes --]

On Fri, Nov 28, 2014 at 04:28:57PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 21.11.2014 19:55, Stefan Hajnoczi wrote:
> >Active dirty bitmaps should migrate too.  I'm thinking now that the
> >appropriate thing is to add live migration of dirty bitmaps to QEMU
> >(regardless of whether they are active or not).
> I think, we should migrate named dirty bitmaps, which are not used now. So
> if some external mechanism uses the bitmap (for example - backup) - we
> actually can't migrate this process, because we will need to restore the
> whole backup structure including a pointer to the bitmap, which is too hard
> and includes not only bitmap migration.
> So, if named bitmap is enabled, but not used (only bdrv_aligned_pwritev
> writes to it) it can be migrated. For this I see the following solutions:
> 
> 1) Just save all corresponding pieces of named bitmaps with every migrated
> block. The block size is 1mb, so the overhead for migrating additionally a
> bitmap with 64kb granularity would be 2b, and it would be 256b for bitmap
> with 512b granularity. This approach needs additional fields in BlkMigBlock,
> for saving bitmaps pieces.

block-migration.c is not used for all live migration.  So it's important
not to tie dirty bitmap migration to block-migration.c, at least there
needs to be a way to skip actually copying disk contents in
block-migration.c.

(When there is shared storage that both source and destination hosts can
access then block-migration.c is not used.  Also, there is a newer
non-shared storage migration mechanism that is used instead of
block-migration.c which is not tied into the live migration data stream,
so block-migration.c is optional.)

> 2) Add DIRTY flag to migrated block flags, to distinguish blocks, which
> became dirty while migrating. Save all the bitmaps separately, and also
> update them on block_load, when we receive block with DIRTY flag on. Some
> information will be lost, migrated dirty bitmaps may be "more dirty" then
> original ones. This approach needs additional field "bool dirty" in
> BlkMigBlock, and saving this flag in blk_send.
> 
> These solutions don't depend on "persistence" of dirty bitmaps or persistent
> bitmap file format.

That's an important characteristic since we probably want to migrate
named dirty bitmaps, whether they are persistent or not.

Stefan

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2014-12-01 11:02 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <545CB9CE.9000302@parallels.com>
     [not found] ` <20141108071919.GB4940@fam-t430.nay.redhat.com>
     [not found]   ` <54607427.8040404@parallels.com>
     [not found]     ` <5462327C.5080704@redhat.com>
     [not found]       ` <5464B80E.6060201@parallels.com>
     [not found]         ` <546A88DD.10006@redhat.com>
2014-11-18 10:54           ` [Qemu-devel] [PATCH v6 00/10] block: Incremental backup series Vladimir Sementsov-Ogievskiy
2014-11-18 13:09             ` Vladimir Sementsov-Ogievskiy
2014-11-18 14:41               ` Vladimir Sementsov-Ogievskiy
2014-11-18 16:08               ` John Snow
2014-11-19  6:25                 ` Denis V. Lunev
2014-11-20 10:34           ` [Qemu-devel] [PATCH v2] persistent dirty bitmap: add QDB file spec Vladimir Sementsov-Ogievskiy
2014-11-20 10:41             ` Vladimir Sementsov-Ogievskiy
2014-11-20 11:36               ` Stefan Hajnoczi
2014-11-21 10:27                 ` Vladimir Sementsov-Ogievskiy
2014-11-21 16:55                   ` Stefan Hajnoczi
2014-11-24  9:19                     ` Vladimir Sementsov-Ogievskiy
2014-11-25 17:58                     ` Vladimir Sementsov-Ogievskiy
2014-11-28 13:28                     ` Vladimir Sementsov-Ogievskiy
2014-12-01 11:02                       ` Stefan Hajnoczi
2014-11-21 12:56                 ` Vladimir Sementsov-Ogievskiy
2014-11-21  0:24             ` Eric Blake
2014-10-30  3:22 [Qemu-devel] [PATCH v6 00/10] block: Incremental backup series Fam Zheng

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).