* Re: [Qemu-devel] [PATCH v6 00/10] block: Incremental backup series [not found] ` <546A88DD.10006@redhat.com> @ 2014-11-18 10:54 ` Vladimir Sementsov-Ogievskiy 2014-11-18 13:09 ` Vladimir Sementsov-Ogievskiy 2014-11-20 10:34 ` [Qemu-devel] [PATCH v2] persistent dirty bitmap: add QDB file spec Vladimir Sementsov-Ogievskiy 1 sibling, 1 reply; 16+ messages in thread From: Vladimir Sementsov-Ogievskiy @ 2014-11-18 10:54 UTC (permalink / raw) To: John Snow, Fam Zheng; +Cc: Denis V. Lunev, stefanha, qemu-devel > (2) File Format > > Some standard file magic, which includes: > > - Some magic byte(s) > - Dirty flag. Needed to tell if we can trust this data or not. > - The size of the bitmap > - The granularity of the bitmap > - The offset to the first sector of bitmap data (Maybe? It can't hurt > if we give ourselves a sector's worth to write metadata within.) > - Data starting at... PAGESIZE? - The name of the bitmap and also the size of this name > > (5) Partial Persistence > > We did not discuss only saving higher levels of the bitmap. What's the > primary benefit you're seeking? Hmm. It may be used for faster sync. Maybe, save some of bitmap levels on timer while vm is running and save the last level on shutdown? CC qemu-devel - ok. Best regards, Vladimir On 18.11.2014 02:46, John Snow wrote: > > > On 11/13/2014 08:54 AM, Vladimir Sementsov-Ogievskiy wrote: >> Hi >> >> I'd just like to start working on persistent dirty bitmap. My thoughts >> about it are the following: >> - qemu -drive file=file,dirty_bitmap=file >> so, bitmap will be loaded with drive open and saved with drive >> close. >> - save only meaningful (the last) level of the bitmap, restore all >> levels on bitmap loading >> - bool parameter "persistent" for bdrv_create_dirty_bitmap and >> BdrvDirtyBitmap >> - internal dirty_bitmaps, saved in qcow2 file >> >> Best regards, >> Vladimir > > I am thinking: > > (1) Command Lines > > If you enable dirty bitmaps and give it a file that doesn't exist, it > should error out on you. > > If you enable dirty bitmaps and give it a file that's blank, it > understands that it is to create a persistent bitmap file in this > location and it should enable persistence. > > If a bitmap file is given and it has valid magic, this should imply > persistence. > > I am hesitant to have it auto-create files that don't already exist in > case the files become large in size and a misconfiguration leads to > repeated creation of these files that get orphaned in random folders. > Perhaps we can add a create=auto flag or similar to allow this > behavior if wanted. > > (2) File Format > > Some standard file magic, which includes: > > - Some magic byte(s) > - Dirty flag. Needed to tell if we can trust this data or not. > - The size of the bitmap > - The granularity of the bitmap > - The offset to the first sector of bitmap data (Maybe? It can't hurt > if we give ourselves a sector's worth to write metadata within.) > - Data starting at... PAGESIZE? > > (3) Data Integrity > > The dirty flag could work something like: > > - If, on first open, the file has the dirty flag set, we need to > discard the bitmap data because we can no longer trust it. > - If the bitmap file is clean, proceed as normal, but take a lock > against any of the bitmap functions to prevent them from marking any > bits dirty. > - On first write to a clean persistent bitmap, delay the write until > we can mark the bitmap as dirty first. This incurs a write penalty > when we try to use the bitmap at first... > - Unlock the bitmap functions and allow them to mark blocks as needed. > - At some point, based on a sync policy, re-commit the dirty > information to the file and mark the file as clean once more and > re-take the persistence lock. > > (4) Synchronization Policy > > - Sync after so many bits become dirty in the bitmap, either as an > absolute threshold or a density percentage? > - Sync periodically on a fixed timer? > - Sync periodically opportunistically when I/O utilization becomes > relatively low? (With some sort of starvation prevention timer?) > - Sync only at shutdown? > > In discussing with Stefan, I think we rather liked the idea of a timer > that tries to re-commit the block data during lulls in the I/O. > > (5) Partial Persistence > > We did not discuss only saving higher levels of the bitmap. What's the > primary benefit you're seeking? > > (6) Inclusion as qcow2 Metadata > > And lastly, we did discuss the inclusion of the bitmap as qcow2 > metadata, but decided it wasn't our principle target for the format to > allow generality to other file formats. We didn't really discuss the > idea of having it as an option or an extension, but I don't (off the > top of my head) have any reasonings against it, but I will likely not > work on it myself. > > > You didn't CC qemu-devel on this (so I won't!), but perhaps we should > re-send out our ideas to the wider list for feedback before we proceed > any further. Maybe we can split the work if we agree upon a design. > > Thanks! > --js > > P.S.: I'm still cleaning up Fam's first patchset based on Max's and > your feedback. Hope to have it out by the end of this week. > >> On 11.11.2014 18:59, John Snow wrote: >>> >>> >>> On 11/10/2014 03:15 AM, Vladimir Sementsov-Ogievskiy wrote: >>>> Hi Fam, hi Jorn. >>>> >>>> Jagane's project - http://wiki.qemu.org/Features/Livebackup >>>> >>>> In two words: >>>> Normal delta - like in qemu, while backuping, we save all new >>>> writes to >>>> separate virtual disk - delta. When backup is done, we can merge delta >>>> back to original image. >>>> Reverse delta - while backuping, we don't stop writing to original >>>> image >>>> (and qemu works with it, not with delta), but before every write we >>>> copy >>>> the corresponding block (if backup needs it) to separate virtual disk >>>> (reverse delta). So, for backuping, if the current block is not >>>> rewritten, we take it from original file, otherwise we take it from >>>> reverse delta. The benefit is that, theoretically we'll have two times >>>> less overhead I/Os then with "normal delta backup + merge", because of >>>> part of (about 0.5) blocks which are rewritten while backuping will be >>>> backuped before rewrite, and they will not go to delta then. Also, >>>> there >>>> are should be methods to improve this coefficient, by choosing rules >>>> about what blocks to backup first. >>>> >>>> Are someone work now on persistent (while vm is turned off) >>>> bitmaps? If >>>> not, are there any recommendations? >>>> >>>> Best regards, >>>> Vladimir >>>> >>> >>> Hi Vladimir: >>> >>> I recently inherited Fam's backup series and I intended to start >>> reviewing the comments this week. After I sift through that situation, >>> the next step for me is persistent dirty bitmap support. I had been >>> working on AHCI and IDE an awful lot lately, so it will take me a >>> little bit to switch over. >>> >>> I don't have any existing notions of how I intend to implement it, so >>> if you'd like to discuss any concerns you have about that subsystem, >>> we can talk :) >>> >>> Thanks, >>> --John >>> >>>> On 08.11.2014 10:19, Fam Zheng wrote: >>>>> On Fri, 11/07 15:23, Vladimir Sementsov-Ogievskiy wrote: >>>>>> Hi! >>>>>> >>>>>> Glad to see you working on backup series again. Some time ago I've >>>>>> already >>>>>> fixed v4 to be mergeable, but unfortunately didn't publish. >>>>>> Actually I need (and I'm going to develop) incremental backup with >>>>>> reverse >>>>>> delta (like abandoned project >>>>>> http://wiki.qemu.org/Features/Livebackup by >>>>>> Jagane). And I've already fixed Jagane's project to be working with >>>>>> current >>>>>> Qemu. But the code is bad. Can you give me a peace of advice about >>>>>> what >>>>>> should I notice? I have a very superficial understanding of qemu >>>>>> block >>>>>> devices, block jobs, etc. >>>>> Hi Vladimir, >>>>> >>>>> Thank you for reviewing my series. >>>>> >>>>> I haven't seen Jagane's series, so could you pleased explain what >>>>> "reverse >>>>> delta" is? >>>>> >>>>> John Snow (CC'ed) will take my work on incremental backup so I think >>>>> you can >>>>> coordinate. I'll probably count on John to work on your comments too, >>>>> in the >>>>> future I'll follow the progress and review. >>>>> >>>>> Thanks, >>>>> Fam >>>> >>> >> > ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Qemu-devel] [PATCH v6 00/10] block: Incremental backup series 2014-11-18 10:54 ` [Qemu-devel] [PATCH v6 00/10] block: Incremental backup series Vladimir Sementsov-Ogievskiy @ 2014-11-18 13:09 ` Vladimir Sementsov-Ogievskiy 2014-11-18 14:41 ` Vladimir Sementsov-Ogievskiy 2014-11-18 16:08 ` John Snow 0 siblings, 2 replies; 16+ messages in thread From: Vladimir Sementsov-Ogievskiy @ 2014-11-18 13:09 UTC (permalink / raw) To: John Snow, Fam Zheng; +Cc: Denis V. Lunev, stefanha, qemu-devel > (3) Data Integrity > > The dirty flag could work something like: > > - If, on first open, the file has the dirty flag set, we need to > discard the bitmap data because we can no longer trust it. > - If the bitmap file is clean, proceed as normal, but take a lock > against any of the bitmap functions to prevent them from marking any > bits dirty. > - On first write to a clean persistent bitmap, delay the write until > we can mark the bitmap as dirty first. This incurs a write penalty > when we try to use the bitmap at first... > - Unlock the bitmap functions and allow them to mark blocks as needed. > - At some point, based on a sync policy, re-commit the dirty > information to the file and mark the file as clean once more and > re-take the persistence lock. Correct me if I'm wrong. #Read bitmap: read in blockdev_init, before any write to device, so no lock is needed. #Set bits in bitmap: if bitmap.dirty_flag: set bits else: LOCK set bits set bitmap.dirty_flag set dirty_flag in bitmap file UNLOCK #Sync: if not bitmap.dirty_flag: skip sync else: LOCK save one of bitmap levels (saving the last one is too long and not very good idea, because it is fast-updateing) unset dirty_flag in bitmap file unset bitmap.dirty_flag UNLOCK #Last sync in bdrv_close: Just save the last bitmap level and unset dirty_flag in bitmap file Also.. I'm not quite sure about locking.. As I understand, co-routines in qemu are not running in parallel, is locking required? Or sync timer will not be co-routine based? Best regards, Vladimir On 18.11.2014 13:54, Vladimir Sementsov-Ogievskiy wrote: > >> (2) File Format >> >> Some standard file magic, which includes: >> >> - Some magic byte(s) >> - Dirty flag. Needed to tell if we can trust this data or not. >> - The size of the bitmap >> - The granularity of the bitmap >> - The offset to the first sector of bitmap data (Maybe? It can't hurt >> if we give ourselves a sector's worth to write metadata within.) >> - Data starting at... PAGESIZE? > - The name of the bitmap and also the size of this name > >> >> (5) Partial Persistence >> >> We did not discuss only saving higher levels of the bitmap. What's >> the primary benefit you're seeking? > Hmm. It may be used for faster sync. Maybe, save some of bitmap levels > on timer while vm is running and save the last level on shutdown? > > CC qemu-devel - ok. > > Best regards, > Vladimir > > On 18.11.2014 02:46, John Snow wrote: >> >> >> On 11/13/2014 08:54 AM, Vladimir Sementsov-Ogievskiy wrote: >>> Hi >>> >>> I'd just like to start working on persistent dirty bitmap. My thoughts >>> about it are the following: >>> - qemu -drive file=file,dirty_bitmap=file >>> so, bitmap will be loaded with drive open and saved with drive >>> close. >>> - save only meaningful (the last) level of the bitmap, restore all >>> levels on bitmap loading >>> - bool parameter "persistent" for bdrv_create_dirty_bitmap and >>> BdrvDirtyBitmap >>> - internal dirty_bitmaps, saved in qcow2 file >>> >>> Best regards, >>> Vladimir >> >> I am thinking: >> >> (1) Command Lines >> >> If you enable dirty bitmaps and give it a file that doesn't exist, it >> should error out on you. >> >> If you enable dirty bitmaps and give it a file that's blank, it >> understands that it is to create a persistent bitmap file in this >> location and it should enable persistence. >> >> If a bitmap file is given and it has valid magic, this should imply >> persistence. >> >> I am hesitant to have it auto-create files that don't already exist >> in case the files become large in size and a misconfiguration leads >> to repeated creation of these files that get orphaned in random >> folders. Perhaps we can add a create=auto flag or similar to allow >> this behavior if wanted. >> >> (2) File Format >> >> Some standard file magic, which includes: >> >> - Some magic byte(s) >> - Dirty flag. Needed to tell if we can trust this data or not. >> - The size of the bitmap >> - The granularity of the bitmap >> - The offset to the first sector of bitmap data (Maybe? It can't hurt >> if we give ourselves a sector's worth to write metadata within.) >> - Data starting at... PAGESIZE? >> >> (3) Data Integrity >> >> The dirty flag could work something like: >> >> - If, on first open, the file has the dirty flag set, we need to >> discard the bitmap data because we can no longer trust it. >> - If the bitmap file is clean, proceed as normal, but take a lock >> against any of the bitmap functions to prevent them from marking any >> bits dirty. >> - On first write to a clean persistent bitmap, delay the write until >> we can mark the bitmap as dirty first. This incurs a write penalty >> when we try to use the bitmap at first... >> - Unlock the bitmap functions and allow them to mark blocks as needed. >> - At some point, based on a sync policy, re-commit the dirty >> information to the file and mark the file as clean once more and >> re-take the persistence lock. >> >> (4) Synchronization Policy >> >> - Sync after so many bits become dirty in the bitmap, either as an >> absolute threshold or a density percentage? >> - Sync periodically on a fixed timer? >> - Sync periodically opportunistically when I/O utilization becomes >> relatively low? (With some sort of starvation prevention timer?) >> - Sync only at shutdown? >> >> In discussing with Stefan, I think we rather liked the idea of a >> timer that tries to re-commit the block data during lulls in the I/O. >> >> (5) Partial Persistence >> >> We did not discuss only saving higher levels of the bitmap. What's >> the primary benefit you're seeking? >> >> (6) Inclusion as qcow2 Metadata >> >> And lastly, we did discuss the inclusion of the bitmap as qcow2 >> metadata, but decided it wasn't our principle target for the format >> to allow generality to other file formats. We didn't really discuss >> the idea of having it as an option or an extension, but I don't (off >> the top of my head) have any reasonings against it, but I will likely >> not work on it myself. >> >> >> You didn't CC qemu-devel on this (so I won't!), but perhaps we should >> re-send out our ideas to the wider list for feedback before we >> proceed any further. Maybe we can split the work if we agree upon a >> design. >> >> Thanks! >> --js >> >> P.S.: I'm still cleaning up Fam's first patchset based on Max's and >> your feedback. Hope to have it out by the end of this week. >> >>> On 11.11.2014 18:59, John Snow wrote: >>>> >>>> >>>> On 11/10/2014 03:15 AM, Vladimir Sementsov-Ogievskiy wrote: >>>>> Hi Fam, hi Jorn. >>>>> >>>>> Jagane's project - http://wiki.qemu.org/Features/Livebackup >>>>> >>>>> In two words: >>>>> Normal delta - like in qemu, while backuping, we save all new >>>>> writes to >>>>> separate virtual disk - delta. When backup is done, we can merge >>>>> delta >>>>> back to original image. >>>>> Reverse delta - while backuping, we don't stop writing to original >>>>> image >>>>> (and qemu works with it, not with delta), but before every write >>>>> we copy >>>>> the corresponding block (if backup needs it) to separate virtual disk >>>>> (reverse delta). So, for backuping, if the current block is not >>>>> rewritten, we take it from original file, otherwise we take it from >>>>> reverse delta. The benefit is that, theoretically we'll have two >>>>> times >>>>> less overhead I/Os then with "normal delta backup + merge", >>>>> because of >>>>> part of (about 0.5) blocks which are rewritten while backuping >>>>> will be >>>>> backuped before rewrite, and they will not go to delta then. Also, >>>>> there >>>>> are should be methods to improve this coefficient, by choosing rules >>>>> about what blocks to backup first. >>>>> >>>>> Are someone work now on persistent (while vm is turned off) >>>>> bitmaps? If >>>>> not, are there any recommendations? >>>>> >>>>> Best regards, >>>>> Vladimir >>>>> >>>> >>>> Hi Vladimir: >>>> >>>> I recently inherited Fam's backup series and I intended to start >>>> reviewing the comments this week. After I sift through that situation, >>>> the next step for me is persistent dirty bitmap support. I had been >>>> working on AHCI and IDE an awful lot lately, so it will take me a >>>> little bit to switch over. >>>> >>>> I don't have any existing notions of how I intend to implement it, so >>>> if you'd like to discuss any concerns you have about that subsystem, >>>> we can talk :) >>>> >>>> Thanks, >>>> --John >>>> >>>>> On 08.11.2014 10:19, Fam Zheng wrote: >>>>>> On Fri, 11/07 15:23, Vladimir Sementsov-Ogievskiy wrote: >>>>>>> Hi! >>>>>>> >>>>>>> Glad to see you working on backup series again. Some time ago I've >>>>>>> already >>>>>>> fixed v4 to be mergeable, but unfortunately didn't publish. >>>>>>> Actually I need (and I'm going to develop) incremental backup with >>>>>>> reverse >>>>>>> delta (like abandoned project >>>>>>> http://wiki.qemu.org/Features/Livebackup by >>>>>>> Jagane). And I've already fixed Jagane's project to be working with >>>>>>> current >>>>>>> Qemu. But the code is bad. Can you give me a peace of advice about >>>>>>> what >>>>>>> should I notice? I have a very superficial understanding of qemu >>>>>>> block >>>>>>> devices, block jobs, etc. >>>>>> Hi Vladimir, >>>>>> >>>>>> Thank you for reviewing my series. >>>>>> >>>>>> I haven't seen Jagane's series, so could you pleased explain what >>>>>> "reverse >>>>>> delta" is? >>>>>> >>>>>> John Snow (CC'ed) will take my work on incremental backup so I think >>>>>> you can >>>>>> coordinate. I'll probably count on John to work on your comments >>>>>> too, >>>>>> in the >>>>>> future I'll follow the progress and review. >>>>>> >>>>>> Thanks, >>>>>> Fam >>>>> >>>> >>> >> > > ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Qemu-devel] [PATCH v6 00/10] block: Incremental backup series 2014-11-18 13:09 ` Vladimir Sementsov-Ogievskiy @ 2014-11-18 14:41 ` Vladimir Sementsov-Ogievskiy 2014-11-18 16:08 ` John Snow 1 sibling, 0 replies; 16+ messages in thread From: Vladimir Sementsov-Ogievskiy @ 2014-11-18 14:41 UTC (permalink / raw) To: John Snow, qemu-devel; +Cc: Fam Zheng, Denis V. Lunev, stefanha Also, if we sync not the last level, bitmap.dirty_flag should be related to syncing level, not to the whole bitmap, to reduce sync overhead. Or, if we implement difficult sync policy, there should be dirty flags for each bitmap level. Despite this, only one level is needed to be saved in the bitmap file. PS: more ideas about file format - thanks to Denis V. Lunev <den@parallels.com> 1) Shouldn't we consider a possibility of storing several bitmaps in one file? Or one bitmap = one file? 2) Implement header extensions like in qcow2. Best regards, Vladimir On 18.11.2014 16:09, Vladimir Sementsov-Ogievskiy wrote: >> (3) Data Integrity >> >> The dirty flag could work something like: >> >> - If, on first open, the file has the dirty flag set, we need to >> discard the bitmap data because we can no longer trust it. >> - If the bitmap file is clean, proceed as normal, but take a lock >> against any of the bitmap functions to prevent them from marking any >> bits dirty. >> - On first write to a clean persistent bitmap, delay the write until >> we can mark the bitmap as dirty first. This incurs a write penalty >> when we try to use the bitmap at first... >> - Unlock the bitmap functions and allow them to mark blocks as needed. >> - At some point, based on a sync policy, re-commit the dirty >> information to the file and mark the file as clean once more and >> re-take the persistence lock. > Correct me if I'm wrong. > > #Read bitmap: > read in blockdev_init, before any write to device, so no lock is needed. > > #Set bits in bitmap: > if bitmap.dirty_flag: > set bits > else: > LOCK > set bits > set bitmap.dirty_flag > set dirty_flag in bitmap file > UNLOCK > > #Sync: > if not bitmap.dirty_flag: > skip sync > else: > LOCK > save one of bitmap levels (saving the last one is too long and not > very good idea, because it is fast-updateing) > unset dirty_flag in bitmap file > unset bitmap.dirty_flag > UNLOCK > > #Last sync in bdrv_close: > Just save the last bitmap level and unset dirty_flag in bitmap file > > Also.. I'm not quite sure about locking.. As I understand, co-routines > in qemu are not running in parallel, is locking required? Or sync > timer will not be co-routine based? > > Best regards, > Vladimir > > On 18.11.2014 13:54, Vladimir Sementsov-Ogievskiy wrote: >> >>> (2) File Format >>> >>> Some standard file magic, which includes: >>> >>> - Some magic byte(s) >>> - Dirty flag. Needed to tell if we can trust this data or not. >>> - The size of the bitmap >>> - The granularity of the bitmap >>> - The offset to the first sector of bitmap data (Maybe? It can't >>> hurt if we give ourselves a sector's worth to write metadata within.) >>> - Data starting at... PAGESIZE? >> - The name of the bitmap and also the size of this name >> >>> >>> (5) Partial Persistence >>> >>> We did not discuss only saving higher levels of the bitmap. What's >>> the primary benefit you're seeking? >> Hmm. It may be used for faster sync. Maybe, save some of bitmap >> levels on timer while vm is running and save the last level on shutdown? >> >> CC qemu-devel - ok. >> >> Best regards, >> Vladimir >> >> On 18.11.2014 02:46, John Snow wrote: >>> >>> >>> On 11/13/2014 08:54 AM, Vladimir Sementsov-Ogievskiy wrote: >>>> Hi >>>> >>>> I'd just like to start working on persistent dirty bitmap. My thoughts >>>> about it are the following: >>>> - qemu -drive file=file,dirty_bitmap=file >>>> so, bitmap will be loaded with drive open and saved with >>>> drive close. >>>> - save only meaningful (the last) level of the bitmap, restore all >>>> levels on bitmap loading >>>> - bool parameter "persistent" for bdrv_create_dirty_bitmap and >>>> BdrvDirtyBitmap >>>> - internal dirty_bitmaps, saved in qcow2 file >>>> >>>> Best regards, >>>> Vladimir >>> >>> I am thinking: >>> >>> (1) Command Lines >>> >>> If you enable dirty bitmaps and give it a file that doesn't exist, >>> it should error out on you. >>> >>> If you enable dirty bitmaps and give it a file that's blank, it >>> understands that it is to create a persistent bitmap file in this >>> location and it should enable persistence. >>> >>> If a bitmap file is given and it has valid magic, this should imply >>> persistence. >>> >>> I am hesitant to have it auto-create files that don't already exist >>> in case the files become large in size and a misconfiguration leads >>> to repeated creation of these files that get orphaned in random >>> folders. Perhaps we can add a create=auto flag or similar to allow >>> this behavior if wanted. >>> >>> (2) File Format >>> >>> Some standard file magic, which includes: >>> >>> - Some magic byte(s) >>> - Dirty flag. Needed to tell if we can trust this data or not. >>> - The size of the bitmap >>> - The granularity of the bitmap >>> - The offset to the first sector of bitmap data (Maybe? It can't >>> hurt if we give ourselves a sector's worth to write metadata within.) >>> - Data starting at... PAGESIZE? >>> >>> (3) Data Integrity >>> >>> The dirty flag could work something like: >>> >>> - If, on first open, the file has the dirty flag set, we need to >>> discard the bitmap data because we can no longer trust it. >>> - If the bitmap file is clean, proceed as normal, but take a lock >>> against any of the bitmap functions to prevent them from marking any >>> bits dirty. >>> - On first write to a clean persistent bitmap, delay the write until >>> we can mark the bitmap as dirty first. This incurs a write penalty >>> when we try to use the bitmap at first... >>> - Unlock the bitmap functions and allow them to mark blocks as needed. >>> - At some point, based on a sync policy, re-commit the dirty >>> information to the file and mark the file as clean once more and >>> re-take the persistence lock. >>> >>> (4) Synchronization Policy >>> >>> - Sync after so many bits become dirty in the bitmap, either as an >>> absolute threshold or a density percentage? >>> - Sync periodically on a fixed timer? >>> - Sync periodically opportunistically when I/O utilization becomes >>> relatively low? (With some sort of starvation prevention timer?) >>> - Sync only at shutdown? >>> >>> In discussing with Stefan, I think we rather liked the idea of a >>> timer that tries to re-commit the block data during lulls in the I/O. >>> >>> (5) Partial Persistence >>> >>> We did not discuss only saving higher levels of the bitmap. What's >>> the primary benefit you're seeking? >>> >>> (6) Inclusion as qcow2 Metadata >>> >>> And lastly, we did discuss the inclusion of the bitmap as qcow2 >>> metadata, but decided it wasn't our principle target for the format >>> to allow generality to other file formats. We didn't really discuss >>> the idea of having it as an option or an extension, but I don't (off >>> the top of my head) have any reasonings against it, but I will >>> likely not work on it myself. >>> >>> >>> You didn't CC qemu-devel on this (so I won't!), but perhaps we >>> should re-send out our ideas to the wider list for feedback before >>> we proceed any further. Maybe we can split the work if we agree upon >>> a design. >>> >>> Thanks! >>> --js >>> >>> P.S.: I'm still cleaning up Fam's first patchset based on Max's and >>> your feedback. Hope to have it out by the end of this week. >>> >>>> On 11.11.2014 18:59, John Snow wrote: >>>>> >>>>> >>>>> On 11/10/2014 03:15 AM, Vladimir Sementsov-Ogievskiy wrote: >>>>>> Hi Fam, hi Jorn. >>>>>> >>>>>> Jagane's project - http://wiki.qemu.org/Features/Livebackup >>>>>> >>>>>> In two words: >>>>>> Normal delta - like in qemu, while backuping, we save all new >>>>>> writes to >>>>>> separate virtual disk - delta. When backup is done, we can merge >>>>>> delta >>>>>> back to original image. >>>>>> Reverse delta - while backuping, we don't stop writing to >>>>>> original image >>>>>> (and qemu works with it, not with delta), but before every write >>>>>> we copy >>>>>> the corresponding block (if backup needs it) to separate virtual >>>>>> disk >>>>>> (reverse delta). So, for backuping, if the current block is not >>>>>> rewritten, we take it from original file, otherwise we take it from >>>>>> reverse delta. The benefit is that, theoretically we'll have two >>>>>> times >>>>>> less overhead I/Os then with "normal delta backup + merge", >>>>>> because of >>>>>> part of (about 0.5) blocks which are rewritten while backuping >>>>>> will be >>>>>> backuped before rewrite, and they will not go to delta then. >>>>>> Also, there >>>>>> are should be methods to improve this coefficient, by choosing rules >>>>>> about what blocks to backup first. >>>>>> >>>>>> Are someone work now on persistent (while vm is turned off) >>>>>> bitmaps? If >>>>>> not, are there any recommendations? >>>>>> >>>>>> Best regards, >>>>>> Vladimir >>>>>> >>>>> >>>>> Hi Vladimir: >>>>> >>>>> I recently inherited Fam's backup series and I intended to start >>>>> reviewing the comments this week. After I sift through that >>>>> situation, >>>>> the next step for me is persistent dirty bitmap support. I had been >>>>> working on AHCI and IDE an awful lot lately, so it will take me a >>>>> little bit to switch over. >>>>> >>>>> I don't have any existing notions of how I intend to implement it, so >>>>> if you'd like to discuss any concerns you have about that subsystem, >>>>> we can talk :) >>>>> >>>>> Thanks, >>>>> --John >>>>> >>>>>> On 08.11.2014 10:19, Fam Zheng wrote: >>>>>>> On Fri, 11/07 15:23, Vladimir Sementsov-Ogievskiy wrote: >>>>>>>> Hi! >>>>>>>> >>>>>>>> Glad to see you working on backup series again. Some time ago I've >>>>>>>> already >>>>>>>> fixed v4 to be mergeable, but unfortunately didn't publish. >>>>>>>> Actually I need (and I'm going to develop) incremental backup with >>>>>>>> reverse >>>>>>>> delta (like abandoned project >>>>>>>> http://wiki.qemu.org/Features/Livebackup by >>>>>>>> Jagane). And I've already fixed Jagane's project to be working >>>>>>>> with >>>>>>>> current >>>>>>>> Qemu. But the code is bad. Can you give me a peace of advice about >>>>>>>> what >>>>>>>> should I notice? I have a very superficial understanding of >>>>>>>> qemu block >>>>>>>> devices, block jobs, etc. >>>>>>> Hi Vladimir, >>>>>>> >>>>>>> Thank you for reviewing my series. >>>>>>> >>>>>>> I haven't seen Jagane's series, so could you pleased explain what >>>>>>> "reverse >>>>>>> delta" is? >>>>>>> >>>>>>> John Snow (CC'ed) will take my work on incremental backup so I >>>>>>> think >>>>>>> you can >>>>>>> coordinate. I'll probably count on John to work on your comments >>>>>>> too, >>>>>>> in the >>>>>>> future I'll follow the progress and review. >>>>>>> >>>>>>> Thanks, >>>>>>> Fam >>>>>> >>>>> >>>> >>> >> >> > > ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Qemu-devel] [PATCH v6 00/10] block: Incremental backup series 2014-11-18 13:09 ` Vladimir Sementsov-Ogievskiy 2014-11-18 14:41 ` Vladimir Sementsov-Ogievskiy @ 2014-11-18 16:08 ` John Snow 2014-11-19 6:25 ` Denis V. Lunev 1 sibling, 1 reply; 16+ messages in thread From: John Snow @ 2014-11-18 16:08 UTC (permalink / raw) To: Vladimir Sementsov-Ogievskiy, Fam Zheng Cc: Denis V. Lunev, stefanha, qemu-devel On 11/18/2014 08:09 AM, Vladimir Sementsov-Ogievskiy wrote: >> (3) Data Integrity >> >> The dirty flag could work something like: >> >> - If, on first open, the file has the dirty flag set, we need to >> discard the bitmap data because we can no longer trust it. >> - If the bitmap file is clean, proceed as normal, but take a lock >> against any of the bitmap functions to prevent them from marking any >> bits dirty. >> - On first write to a clean persistent bitmap, delay the write until >> we can mark the bitmap as dirty first. This incurs a write penalty >> when we try to use the bitmap at first... >> - Unlock the bitmap functions and allow them to mark blocks as needed. >> - At some point, based on a sync policy, re-commit the dirty >> information to the file and mark the file as clean once more and >> re-take the persistence lock. > Correct me if I'm wrong. > > #Read bitmap: > read in blockdev_init, before any write to device, so no lock is needed. > > #Set bits in bitmap: > if bitmap.dirty_flag: > set bits > else: > LOCK > set bits > set bitmap.dirty_flag > set dirty_flag in bitmap file > UNLOCK > > #Sync: > if not bitmap.dirty_flag: > skip sync > else: > LOCK > save one of bitmap levels (saving the last one is too long and not > very good idea, because it is fast-updateing) > unset dirty_flag in bitmap file > unset bitmap.dirty_flag > UNLOCK > > #Last sync in bdrv_close: > Just save the last bitmap level and unset dirty_flag in bitmap file > > Also.. I'm not quite sure about locking.. As I understand, co-routines > in qemu are not running in parallel, is locking required? Or sync timer > will not be co-routine based? > > Best regards, > Vladimir Might be being too informal. I just meant a lock or barrier to prevent actual IO throughput until we can confirm the dirty flag has been adjusted to indicate that the persistent bitmap is now officially out of date. Nothing fancy. Wasn't trying to imply that we needed threading protection, just "locking" the IO until we can configure the bitmap as we need it to be. > On 18.11.2014 13:54, Vladimir Sementsov-Ogievskiy wrote: >> >>> (2) File Format >>> >>> Some standard file magic, which includes: >>> >>> - Some magic byte(s) >>> - Dirty flag. Needed to tell if we can trust this data or not. >>> - The size of the bitmap >>> - The granularity of the bitmap >>> - The offset to the first sector of bitmap data (Maybe? It can't hurt >>> if we give ourselves a sector's worth to write metadata within.) >>> - Data starting at... PAGESIZE? >> - The name of the bitmap and also the size of this name >> >>> >>> (5) Partial Persistence >>> >>> We did not discuss only saving higher levels of the bitmap. What's >>> the primary benefit you're seeking? >> Hmm. It may be used for faster sync. Maybe, save some of bitmap levels >> on timer while vm is running and save the last level on shutdown? >> >> CC qemu-devel - ok. >> >> Best regards, >> Vladimir >> >> On 18.11.2014 02:46, John Snow wrote: >>> >>> >>> On 11/13/2014 08:54 AM, Vladimir Sementsov-Ogievskiy wrote: >>>> Hi >>>> >>>> I'd just like to start working on persistent dirty bitmap. My thoughts >>>> about it are the following: >>>> - qemu -drive file=file,dirty_bitmap=file >>>> so, bitmap will be loaded with drive open and saved with drive >>>> close. >>>> - save only meaningful (the last) level of the bitmap, restore all >>>> levels on bitmap loading >>>> - bool parameter "persistent" for bdrv_create_dirty_bitmap and >>>> BdrvDirtyBitmap >>>> - internal dirty_bitmaps, saved in qcow2 file >>>> >>>> Best regards, >>>> Vladimir >>> >>> I am thinking: >>> >>> (1) Command Lines >>> >>> If you enable dirty bitmaps and give it a file that doesn't exist, it >>> should error out on you. >>> >>> If you enable dirty bitmaps and give it a file that's blank, it >>> understands that it is to create a persistent bitmap file in this >>> location and it should enable persistence. >>> >>> If a bitmap file is given and it has valid magic, this should imply >>> persistence. >>> >>> I am hesitant to have it auto-create files that don't already exist >>> in case the files become large in size and a misconfiguration leads >>> to repeated creation of these files that get orphaned in random >>> folders. Perhaps we can add a create=auto flag or similar to allow >>> this behavior if wanted. >>> >>> (2) File Format >>> >>> Some standard file magic, which includes: >>> >>> - Some magic byte(s) >>> - Dirty flag. Needed to tell if we can trust this data or not. >>> - The size of the bitmap >>> - The granularity of the bitmap >>> - The offset to the first sector of bitmap data (Maybe? It can't hurt >>> if we give ourselves a sector's worth to write metadata within.) >>> - Data starting at... PAGESIZE? >>> >>> (3) Data Integrity >>> >>> The dirty flag could work something like: >>> >>> - If, on first open, the file has the dirty flag set, we need to >>> discard the bitmap data because we can no longer trust it. >>> - If the bitmap file is clean, proceed as normal, but take a lock >>> against any of the bitmap functions to prevent them from marking any >>> bits dirty. >>> - On first write to a clean persistent bitmap, delay the write until >>> we can mark the bitmap as dirty first. This incurs a write penalty >>> when we try to use the bitmap at first... >>> - Unlock the bitmap functions and allow them to mark blocks as needed. >>> - At some point, based on a sync policy, re-commit the dirty >>> information to the file and mark the file as clean once more and >>> re-take the persistence lock. >>> >>> (4) Synchronization Policy >>> >>> - Sync after so many bits become dirty in the bitmap, either as an >>> absolute threshold or a density percentage? >>> - Sync periodically on a fixed timer? >>> - Sync periodically opportunistically when I/O utilization becomes >>> relatively low? (With some sort of starvation prevention timer?) >>> - Sync only at shutdown? >>> >>> In discussing with Stefan, I think we rather liked the idea of a >>> timer that tries to re-commit the block data during lulls in the I/O. >>> >>> (5) Partial Persistence >>> >>> We did not discuss only saving higher levels of the bitmap. What's >>> the primary benefit you're seeking? >>> >>> (6) Inclusion as qcow2 Metadata >>> >>> And lastly, we did discuss the inclusion of the bitmap as qcow2 >>> metadata, but decided it wasn't our principle target for the format >>> to allow generality to other file formats. We didn't really discuss >>> the idea of having it as an option or an extension, but I don't (off >>> the top of my head) have any reasonings against it, but I will likely >>> not work on it myself. >>> >>> >>> You didn't CC qemu-devel on this (so I won't!), but perhaps we should >>> re-send out our ideas to the wider list for feedback before we >>> proceed any further. Maybe we can split the work if we agree upon a >>> design. >>> >>> Thanks! >>> --js >>> >>> P.S.: I'm still cleaning up Fam's first patchset based on Max's and >>> your feedback. Hope to have it out by the end of this week. >>> >>>> On 11.11.2014 18:59, John Snow wrote: >>>>> >>>>> >>>>> On 11/10/2014 03:15 AM, Vladimir Sementsov-Ogievskiy wrote: >>>>>> Hi Fam, hi Jorn. >>>>>> >>>>>> Jagane's project - http://wiki.qemu.org/Features/Livebackup >>>>>> >>>>>> In two words: >>>>>> Normal delta - like in qemu, while backuping, we save all new >>>>>> writes to >>>>>> separate virtual disk - delta. When backup is done, we can merge >>>>>> delta >>>>>> back to original image. >>>>>> Reverse delta - while backuping, we don't stop writing to original >>>>>> image >>>>>> (and qemu works with it, not with delta), but before every write >>>>>> we copy >>>>>> the corresponding block (if backup needs it) to separate virtual disk >>>>>> (reverse delta). So, for backuping, if the current block is not >>>>>> rewritten, we take it from original file, otherwise we take it from >>>>>> reverse delta. The benefit is that, theoretically we'll have two >>>>>> times >>>>>> less overhead I/Os then with "normal delta backup + merge", >>>>>> because of >>>>>> part of (about 0.5) blocks which are rewritten while backuping >>>>>> will be >>>>>> backuped before rewrite, and they will not go to delta then. Also, >>>>>> there >>>>>> are should be methods to improve this coefficient, by choosing rules >>>>>> about what blocks to backup first. >>>>>> >>>>>> Are someone work now on persistent (while vm is turned off) >>>>>> bitmaps? If >>>>>> not, are there any recommendations? >>>>>> >>>>>> Best regards, >>>>>> Vladimir >>>>>> >>>>> >>>>> Hi Vladimir: >>>>> >>>>> I recently inherited Fam's backup series and I intended to start >>>>> reviewing the comments this week. After I sift through that situation, >>>>> the next step for me is persistent dirty bitmap support. I had been >>>>> working on AHCI and IDE an awful lot lately, so it will take me a >>>>> little bit to switch over. >>>>> >>>>> I don't have any existing notions of how I intend to implement it, so >>>>> if you'd like to discuss any concerns you have about that subsystem, >>>>> we can talk :) >>>>> >>>>> Thanks, >>>>> --John >>>>> >>>>>> On 08.11.2014 10:19, Fam Zheng wrote: >>>>>>> On Fri, 11/07 15:23, Vladimir Sementsov-Ogievskiy wrote: >>>>>>>> Hi! >>>>>>>> >>>>>>>> Glad to see you working on backup series again. Some time ago I've >>>>>>>> already >>>>>>>> fixed v4 to be mergeable, but unfortunately didn't publish. >>>>>>>> Actually I need (and I'm going to develop) incremental backup with >>>>>>>> reverse >>>>>>>> delta (like abandoned project >>>>>>>> http://wiki.qemu.org/Features/Livebackup by >>>>>>>> Jagane). And I've already fixed Jagane's project to be working with >>>>>>>> current >>>>>>>> Qemu. But the code is bad. Can you give me a peace of advice about >>>>>>>> what >>>>>>>> should I notice? I have a very superficial understanding of qemu >>>>>>>> block >>>>>>>> devices, block jobs, etc. >>>>>>> Hi Vladimir, >>>>>>> >>>>>>> Thank you for reviewing my series. >>>>>>> >>>>>>> I haven't seen Jagane's series, so could you pleased explain what >>>>>>> "reverse >>>>>>> delta" is? >>>>>>> >>>>>>> John Snow (CC'ed) will take my work on incremental backup so I think >>>>>>> you can >>>>>>> coordinate. I'll probably count on John to work on your comments >>>>>>> too, >>>>>>> in the >>>>>>> future I'll follow the progress and review. >>>>>>> >>>>>>> Thanks, >>>>>>> Fam >>>>>> >>>>> >>>> >>> >> >> > -- —js ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Qemu-devel] [PATCH v6 00/10] block: Incremental backup series 2014-11-18 16:08 ` John Snow @ 2014-11-19 6:25 ` Denis V. Lunev 0 siblings, 0 replies; 16+ messages in thread From: Denis V. Lunev @ 2014-11-19 6:25 UTC (permalink / raw) To: John Snow, Vladimir Sementsov-Ogievskiy, Fam Zheng Cc: Denis V. Lunev, stefanha, qemu-devel On 18/11/14 19:08, John Snow wrote: > > > On 11/18/2014 08:09 AM, Vladimir Sementsov-Ogievskiy wrote: >>> (3) Data Integrity >>> >>> The dirty flag could work something like: >>> >>> - If, on first open, the file has the dirty flag set, we need to >>> discard the bitmap data because we can no longer trust it. >>> - If the bitmap file is clean, proceed as normal, but take a lock >>> against any of the bitmap functions to prevent them from marking any >>> bits dirty. >>> - On first write to a clean persistent bitmap, delay the write until >>> we can mark the bitmap as dirty first. This incurs a write penalty >>> when we try to use the bitmap at first... >>> - Unlock the bitmap functions and allow them to mark blocks as needed. >>> - At some point, based on a sync policy, re-commit the dirty >>> information to the file and mark the file as clean once more and >>> re-take the persistence lock. >> Correct me if I'm wrong. >> - first of all what we are protecting against? Any QEMU or kernel crash leads to the disaster. You can not guarantee at all the consistency between data written to the main file (disk) and data written to bitmap except if you wait that dirty bitmap is really updated. In this case you will have performance halved in comparison with the host (each guest write means 2 IOPS instead of 1) - this effectively means that we can not be protected against crash - if we don't we could not care at all about bitmap updates when QEMU is running. We should write it only on stop/suspend/etc. This simplifies things a lot. So, the procedure is simple - stop main VM by using vm_pause - open/create bitmap file - set dirty - unpause That's all. If QEMU is running, bitmap is always dirty. The only exception is backup creation. New backup means that old bitmap is synced with dirty clear, new one is created with dirty set and backup starts working with old bitmap. Once again. In all other cases we can not guarantee that we will report _all_ changed blocks to backup software if crash happens in the middle. One missed block in the bitmap and the entire backup is blown up. This also means that (4) aka sync policy is simple. Do this on close. >> #Read bitmap: >> read in blockdev_init, before any write to device, so no lock is needed. >> >> #Set bits in bitmap: >> if bitmap.dirty_flag: >> set bits >> else: >> LOCK >> set bits >> set bitmap.dirty_flag >> set dirty_flag in bitmap file >> UNLOCK >> >> #Sync: >> if not bitmap.dirty_flag: >> skip sync >> else: >> LOCK >> save one of bitmap levels (saving the last one is too long and not >> very good idea, because it is fast-updateing) >> unset dirty_flag in bitmap file >> unset bitmap.dirty_flag >> UNLOCK >> >> #Last sync in bdrv_close: >> Just save the last bitmap level and unset dirty_flag in bitmap file >> >> Also.. I'm not quite sure about locking.. As I understand, co-routines >> in qemu are not running in parallel, is locking required? Or sync timer >> will not be co-routine based? >> >> Best regards, >> Vladimir > > Might be being too informal. I just meant a lock or barrier to prevent > actual IO throughput until we can confirm the dirty flag has been > adjusted to indicate that the persistent bitmap is now officially out of > date. Nothing fancy. > > Wasn't trying to imply that we needed threading protection, just > "locking" the IO until we can configure the bitmap as we need it to be. > ^ permalink raw reply [flat|nested] 16+ messages in thread
* [Qemu-devel] [PATCH v2] persistent dirty bitmap: add QDB file spec. [not found] ` <546A88DD.10006@redhat.com> 2014-11-18 10:54 ` [Qemu-devel] [PATCH v6 00/10] block: Incremental backup series Vladimir Sementsov-Ogievskiy @ 2014-11-20 10:34 ` Vladimir Sementsov-Ogievskiy 2014-11-20 10:41 ` Vladimir Sementsov-Ogievskiy 2014-11-21 0:24 ` Eric Blake 1 sibling, 2 replies; 16+ messages in thread From: Vladimir Sementsov-Ogievskiy @ 2014-11-20 10:34 UTC (permalink / raw) To: qemu-devel, jsnow; +Cc: famz, den, stefanha QDB file is for storing dirty bitmap. The specification is based on qcow2 specification. Saving several bitmaps is necessary when server shutdowns during backup. In this case 2 tables for each disk are available. One collected for a previous period and one active. Though this feature is discussable. Big endian format and Standard Cluster Descriptor are used to simplify integration with qcow2, to support internal bitmaps for qcow2 in future. The idea is that the same procedure writing the data to QDB file could do the same for QCOW2. The only difference is cluster refcount table. Should we use it here or not is still questionable. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com> --- docs/specs/qdb.txt | 132 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 132 insertions(+) create mode 100644 docs/specs/qdb.txt diff --git a/docs/specs/qdb.txt b/docs/specs/qdb.txt new file mode 100644 index 0000000..d570a69 --- /dev/null +++ b/docs/specs/qdb.txt @@ -0,0 +1,132 @@ +== General == + +"QDB" means "Qemu Dirty Bitmaps". QDB file can store several dirty bitmaps. +QDB file is organized in units of constant size, which are called clusters. + +All numbers in QDB are stored in Big Endian byte order. + +== Header == + +The first cluster of a QDB image contains the file header: + + Byte 0 - 3: magic + QDB magic string ("QDB\0") + + 4 - 7: version + Version number (valid value is 1) + + 8 - 11: cluster_bits + Number of bits that are used for addressing an offset + within a cluster (1 << cluster_bits is the cluster size). + Must not be less than 9 (i.e. 512 byte clusters). + + 12 - 15: nb_bitmaps + Number of bitmaps contained in the file + + 16 - 23: bitmaps_offset + Offset into the QDB file at which the bitmap table starts. + Must be aligned to a cluster boundary. + + 24 - 27: header_length + Length of the header structure in bytes. + +Like in qcow2, directly after the image header, optional sections called header extensions can +be stored. Each extension has a structure like the following: + + Byte 0 - 3: Header extension type: + 0x00000000 - End of the header extension area + other - Unknown header extension, can be safely + ignored + + 4 - 7: Length of the header extension data + + 8 - n: Header extension data + + n - m: Padding to round up the header extension size to the next + multiple of 8. + +Unless stated otherwise, each header extension type shall appear at most once +in the same image. + +== Cluster mapping == + +QDB uses a ONE-level structure for the mapping of +bitmaps to host clusters. It is called L1 table. + +The L1 table has a variable size (stored in the Bitmap table entry) and may +use multiple clusters, however it must be contiguous in the QDB file. + +Given a offset into the bitmap, the offset into the QDB file can be +obtained as follows: + + offset = l1_table[offset / cluster_size] + (offset % cluster_size) + +L1 table entry: + + Bit 0 - 61: Cluster descriptor + + 62 - 63: Reserved + +Standard Cluster Descriptor (the same as in qcow2): + + Bit 0: If set to 1, the cluster reads as all zeros. The host + cluster offset can be used to describe a preallocation, + but it won't be used for reading data from this cluster, + nor is data read from the backing file if the cluster is + unallocated. + + 1 - 8: Reserved (set to 0) + + 9 - 55: Bits 9-55 of host cluster offset. Must be aligned to a + cluster boundary. If the offset is 0, the cluster is + unallocated. + + 56 - 61: Reserved (set to 0) + +If a cluster is unallocated, read requests shall read zero. + +== Bitmap table == + +QDB supports storing of several bitmaps. + +A directory of all bitmaps is stored in the bitmap table, a contiguous area +in the QDB file, whose starting offset and length are given by the header +fields bitmaps_offset and nb_bitmaps. The entries of the bitmap table +have variable length, depending on the length of name and extra data. + +Bitmap table entry: + + Byte 0 - 7: Offset into the QDB file at which the L1 table for the + bitmap starts. Must be aligned to a cluster boundary. + + 8 - 11: Number of entries in the L1 table of the bitmap + + 12 - 15: Bitmap granularity + As represented in HBitmap structure. Given a granularity of + G, each bit in the bitmap will actually represent a group + of 2^G bytes. + + 16 - 23: Bitmap size + The size of really stored data is + (size + (1 << granularity) - 1) >> granularity + + 24: Bitmap enabled flag (valid values are 1 and 0) + + 25: File dirty flag (valid values are 1 and 0; + if value is 1 the bitmap is inconsistent) + + 26 - 27: Size of the bitmap name + + 36 - 39: Size of extra data in the table entry (used for future + extensions of the format) + + variable: Extra data for future extensions. Unknown fields must be + ignored. + + variable: Name of the bitmap (not null terminated) + + variable: Padding to round up the bitmap table entry size to the + next multiple of 8. + +The fields "size", "granularity", "enabled" and "name" are corresponding with +the fields in struct BdrvDirtyBitmap. -- 1.9.1 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [Qemu-devel] [PATCH v2] persistent dirty bitmap: add QDB file spec. 2014-11-20 10:34 ` [Qemu-devel] [PATCH v2] persistent dirty bitmap: add QDB file spec Vladimir Sementsov-Ogievskiy @ 2014-11-20 10:41 ` Vladimir Sementsov-Ogievskiy 2014-11-20 11:36 ` Stefan Hajnoczi 2014-11-21 0:24 ` Eric Blake 1 sibling, 1 reply; 16+ messages in thread From: Vladimir Sementsov-Ogievskiy @ 2014-11-20 10:41 UTC (permalink / raw) To: qemu-devel, jsnow; +Cc: famz, den, stefanha Also, it may be better to make this as qcow2 extension. And bitmap will be saved in separate qcow2 file, which will contain only the bitmap(s) and no other data (no disk, no snapshots). Best regards, Vladimir On 20.11.2014 13:34, Vladimir Sementsov-Ogievskiy wrote: > QDB file is for storing dirty bitmap. The specification is based on > qcow2 specification. > > Saving several bitmaps is necessary when server shutdowns during > backup. In this case 2 tables for each disk are available. One > collected for a previous period and one active. Though this feature > is discussable. > > Big endian format and Standard Cluster Descriptor are used to simplify > integration with qcow2, to support internal bitmaps for qcow2 in future. > > The idea is that the same procedure writing the data to QDB file could > do the same for QCOW2. The only difference is cluster refcount table. > Should we use it here or not is still questionable. > > Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com> > --- > docs/specs/qdb.txt | 132 +++++++++++++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 132 insertions(+) > create mode 100644 docs/specs/qdb.txt > > diff --git a/docs/specs/qdb.txt b/docs/specs/qdb.txt > new file mode 100644 > index 0000000..d570a69 > --- /dev/null > +++ b/docs/specs/qdb.txt > @@ -0,0 +1,132 @@ > +== General == > + > +"QDB" means "Qemu Dirty Bitmaps". QDB file can store several dirty bitmaps. > +QDB file is organized in units of constant size, which are called clusters. > + > +All numbers in QDB are stored in Big Endian byte order. > + > +== Header == > + > +The first cluster of a QDB image contains the file header: > + > + Byte 0 - 3: magic > + QDB magic string ("QDB\0") > + > + 4 - 7: version > + Version number (valid value is 1) > + > + 8 - 11: cluster_bits > + Number of bits that are used for addressing an offset > + within a cluster (1 << cluster_bits is the cluster size). > + Must not be less than 9 (i.e. 512 byte clusters). > + > + 12 - 15: nb_bitmaps > + Number of bitmaps contained in the file > + > + 16 - 23: bitmaps_offset > + Offset into the QDB file at which the bitmap table starts. > + Must be aligned to a cluster boundary. > + > + 24 - 27: header_length > + Length of the header structure in bytes. > + > +Like in qcow2, directly after the image header, optional sections called header extensions can > +be stored. Each extension has a structure like the following: > + > + Byte 0 - 3: Header extension type: > + 0x00000000 - End of the header extension area > + other - Unknown header extension, can be safely > + ignored > + > + 4 - 7: Length of the header extension data > + > + 8 - n: Header extension data > + > + n - m: Padding to round up the header extension size to the next > + multiple of 8. > + > +Unless stated otherwise, each header extension type shall appear at most once > +in the same image. > + > +== Cluster mapping == > + > +QDB uses a ONE-level structure for the mapping of > +bitmaps to host clusters. It is called L1 table. > + > +The L1 table has a variable size (stored in the Bitmap table entry) and may > +use multiple clusters, however it must be contiguous in the QDB file. > + > +Given a offset into the bitmap, the offset into the QDB file can be > +obtained as follows: > + > + offset = l1_table[offset / cluster_size] + (offset % cluster_size) > + > +L1 table entry: > + > + Bit 0 - 61: Cluster descriptor > + > + 62 - 63: Reserved > + > +Standard Cluster Descriptor (the same as in qcow2): > + > + Bit 0: If set to 1, the cluster reads as all zeros. The host > + cluster offset can be used to describe a preallocation, > + but it won't be used for reading data from this cluster, > + nor is data read from the backing file if the cluster is > + unallocated. > + > + 1 - 8: Reserved (set to 0) > + > + 9 - 55: Bits 9-55 of host cluster offset. Must be aligned to a > + cluster boundary. If the offset is 0, the cluster is > + unallocated. > + > + 56 - 61: Reserved (set to 0) > + > +If a cluster is unallocated, read requests shall read zero. > + > +== Bitmap table == > + > +QDB supports storing of several bitmaps. > + > +A directory of all bitmaps is stored in the bitmap table, a contiguous area > +in the QDB file, whose starting offset and length are given by the header > +fields bitmaps_offset and nb_bitmaps. The entries of the bitmap table > +have variable length, depending on the length of name and extra data. > + > +Bitmap table entry: > + > + Byte 0 - 7: Offset into the QDB file at which the L1 table for the > + bitmap starts. Must be aligned to a cluster boundary. > + > + 8 - 11: Number of entries in the L1 table of the bitmap > + > + 12 - 15: Bitmap granularity > + As represented in HBitmap structure. Given a granularity of > + G, each bit in the bitmap will actually represent a group > + of 2^G bytes. > + > + 16 - 23: Bitmap size > + The size of really stored data is > + (size + (1 << granularity) - 1) >> granularity > + > + 24: Bitmap enabled flag (valid values are 1 and 0) > + > + 25: File dirty flag (valid values are 1 and 0; > + if value is 1 the bitmap is inconsistent) > + > + 26 - 27: Size of the bitmap name > + > + 36 - 39: Size of extra data in the table entry (used for future > + extensions of the format) > + > + variable: Extra data for future extensions. Unknown fields must be > + ignored. > + > + variable: Name of the bitmap (not null terminated) > + > + variable: Padding to round up the bitmap table entry size to the > + next multiple of 8. > + > +The fields "size", "granularity", "enabled" and "name" are corresponding with > +the fields in struct BdrvDirtyBitmap. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Qemu-devel] [PATCH v2] persistent dirty bitmap: add QDB file spec. 2014-11-20 10:41 ` Vladimir Sementsov-Ogievskiy @ 2014-11-20 11:36 ` Stefan Hajnoczi 2014-11-21 10:27 ` Vladimir Sementsov-Ogievskiy 2014-11-21 12:56 ` Vladimir Sementsov-Ogievskiy 0 siblings, 2 replies; 16+ messages in thread From: Stefan Hajnoczi @ 2014-11-20 11:36 UTC (permalink / raw) To: Vladimir Sementsov-Ogievskiy; +Cc: famz, jsnow, qemu-devel, den [-- Attachment #1: Type: text/plain, Size: 1483 bytes --] On Thu, Nov 20, 2014 at 01:41:14PM +0300, Vladimir Sementsov-Ogievskiy wrote: > Also, it may be better to make this as qcow2 extension. And bitmap will be > saved in separate qcow2 file, which will contain only the bitmap(s) and no > other data (no disk, no snapshots). I think you are on to something with the idea of making the persistent dirty bitmap itself a disk image. That way drive-mirror and other commands can be used to live migrate the dirty bitmap along with the guest's disks. This allows both QEMU and management tools to reuse existing code. (We may need to allow multiple block jobs per BlockDriverState to make this work but in theory that can be done.) There is a constraint if we want to get live migration for free: The bitmap contents must be accessible with bdrv_read() and bdrv_get_block_status() to skip zero regions. Putting the dirty bitmap into its own data structure in qcow2 and not accessible as a BlockDriverState bdrv_read() means custom code must be written to migrate the dirty bitmap. So I suggest putting the bitmap contents into a disk image that can be accessed as a BlockDriverState with bdrv_read(). The metadata (bitmap name, granularity, etc) doesn't need to be stored in the image file because management tools must be aware of it anyway. The only thing besides the data that really needs to be stored is the up-to-date flag to decide whether this dirty bitmap was synced cleanly. A much simpler format would do for that. Stefan [-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Qemu-devel] [PATCH v2] persistent dirty bitmap: add QDB file spec. 2014-11-20 11:36 ` Stefan Hajnoczi @ 2014-11-21 10:27 ` Vladimir Sementsov-Ogievskiy 2014-11-21 16:55 ` Stefan Hajnoczi 2014-11-21 12:56 ` Vladimir Sementsov-Ogievskiy 1 sibling, 1 reply; 16+ messages in thread From: Vladimir Sementsov-Ogievskiy @ 2014-11-21 10:27 UTC (permalink / raw) To: Stefan Hajnoczi; +Cc: famz, jsnow, qemu-devel, den > There is a constraint if we want to get live migration for free: The > bitmap contents must be accessible with bdrv_read() and > bdrv_get_block_status() to skip zero regions. Hm. I'm afraid, it still will not be free. If bitmap is active, it's actual version is in memory. To migrate bitmap file like a disk image, we should start syncing it with every write to corresponding disk, doubling number of io. Moreover, we have normal dirty bitmaps, which have no name/file, do we migrate them? If, for example, the migration occurs when backup in progress? Active bitmaps should be migrated in the same way for persistent/named/normal bitmaps. I can't find in qemu source, is there bitmap migration? Or you are saying about migrating disabled bitmaps? Hm. We should sync bitmap file on bitmap_disable. Disabled persistent bitmap is just a static file ~30mb, we can easily migrate it without common procedure with cow or something like this.. Best regards, Vladimir On 20.11.2014 14:36, Stefan Hajnoczi wrote: > On Thu, Nov 20, 2014 at 01:41:14PM +0300, Vladimir Sementsov-Ogievskiy wrote: >> Also, it may be better to make this as qcow2 extension. And bitmap will be >> saved in separate qcow2 file, which will contain only the bitmap(s) and no >> other data (no disk, no snapshots). > I think you are on to something with the idea of making the persistent > dirty bitmap itself a disk image. > > That way drive-mirror and other commands can be used to live migrate the > dirty bitmap along with the guest's disks. This allows both QEMU and > management tools to reuse existing code. > > (We may need to allow multiple block jobs per BlockDriverState to make > this work but in theory that can be done.) > > There is a constraint if we want to get live migration for free: The > bitmap contents must be accessible with bdrv_read() and > bdrv_get_block_status() to skip zero regions. > > Putting the dirty bitmap into its own data structure in qcow2 and not > accessible as a BlockDriverState bdrv_read() means custom code must be > written to migrate the dirty bitmap. > > So I suggest putting the bitmap contents into a disk image that can be > accessed as a BlockDriverState with bdrv_read(). The metadata (bitmap > name, granularity, etc) doesn't need to be stored in the image file > because management tools must be aware of it anyway. > > The only thing besides the data that really needs to be stored is the > up-to-date flag to decide whether this dirty bitmap was synced cleanly. > A much simpler format would do for that. > > Stefan ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Qemu-devel] [PATCH v2] persistent dirty bitmap: add QDB file spec. 2014-11-21 10:27 ` Vladimir Sementsov-Ogievskiy @ 2014-11-21 16:55 ` Stefan Hajnoczi 2014-11-24 9:19 ` Vladimir Sementsov-Ogievskiy ` (2 more replies) 0 siblings, 3 replies; 16+ messages in thread From: Stefan Hajnoczi @ 2014-11-21 16:55 UTC (permalink / raw) To: Vladimir Sementsov-Ogievskiy; +Cc: famz, jsnow, qemu-devel, den [-- Attachment #1: Type: text/plain, Size: 1868 bytes --] On Fri, Nov 21, 2014 at 01:27:40PM +0300, Vladimir Sementsov-Ogievskiy wrote: > >There is a constraint if we want to get live migration for free: The > >bitmap contents must be accessible with bdrv_read() and > >bdrv_get_block_status() to skip zero regions. > Hm. I'm afraid, it still will not be free. If bitmap is active, it's actual > version is in memory. To migrate bitmap file like a disk image, we should > start syncing it with every write to corresponding disk, doubling number of > io. It would be possible to drive-mirror the persistent dirty bitmap and then flush it like all drives when the guest vCPUs are paused for migration. After thinking more about it though, this approach places more I/O into the critical guest downtime phase. In other words, slow disk I/O could lead to long guest downtimes while QEMU tries to write out the dirty bitmap. > Moreover, we have normal dirty bitmaps, which have no name/file, do we > migrate them? If, for example, the migration occurs when backup in progress? > Active bitmaps should be migrated in the same way for > persistent/named/normal bitmaps. I can't find in qemu source, is there > bitmap migration? bs->dirty_bitmaps is not migrated, in fact none of BlockDriverState is migrated. QEMU only migrates emulated device state (e.g. the hardware registers and associated state). It does not emulate host state that the guest cannot see like the dirty bitmap. > Or you are saying about migrating disabled bitmaps? Hm. We should sync > bitmap file on bitmap_disable. Disabled persistent bitmap is just a static > file ~30mb, we can easily migrate it without common procedure with cow or > something like this.. Active dirty bitmaps should migrate too. I'm thinking now that the appropriate thing is to add live migration of dirty bitmaps to QEMU (regardless of whether they are active or not). Stefan [-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Qemu-devel] [PATCH v2] persistent dirty bitmap: add QDB file spec. 2014-11-21 16:55 ` Stefan Hajnoczi @ 2014-11-24 9:19 ` Vladimir Sementsov-Ogievskiy 2014-11-25 17:58 ` Vladimir Sementsov-Ogievskiy 2014-11-28 13:28 ` Vladimir Sementsov-Ogievskiy 2 siblings, 0 replies; 16+ messages in thread From: Vladimir Sementsov-Ogievskiy @ 2014-11-24 9:19 UTC (permalink / raw) To: Stefan Hajnoczi; +Cc: famz, jsnow, qemu-devel, den > Active dirty bitmaps should migrate too. I'm thinking now that the > appropriate thing is to add live migration of dirty bitmaps to QEMU > (regardless of whether they are active or not). Only for persistent bitmaps, or for all named bitmaps? If for all named bitmaps, then this migration should not be connected with bitmap file and it's format. Best regards, Vladimir On 21.11.2014 19:55, Stefan Hajnoczi wrote: > On Fri, Nov 21, 2014 at 01:27:40PM +0300, Vladimir Sementsov-Ogievskiy wrote: >>> There is a constraint if we want to get live migration for free: The >>> bitmap contents must be accessible with bdrv_read() and >>> bdrv_get_block_status() to skip zero regions. >> Hm. I'm afraid, it still will not be free. If bitmap is active, it's actual >> version is in memory. To migrate bitmap file like a disk image, we should >> start syncing it with every write to corresponding disk, doubling number of >> io. > It would be possible to drive-mirror the persistent dirty bitmap and > then flush it like all drives when the guest vCPUs are paused for > migration. > > After thinking more about it though, this approach places more I/O into > the critical guest downtime phase. In other words, slow disk I/O could > lead to long guest downtimes while QEMU tries to write out the dirty > bitmap. > >> Moreover, we have normal dirty bitmaps, which have no name/file, do we >> migrate them? If, for example, the migration occurs when backup in progress? >> Active bitmaps should be migrated in the same way for >> persistent/named/normal bitmaps. I can't find in qemu source, is there >> bitmap migration? > bs->dirty_bitmaps is not migrated, in fact none of BlockDriverState is > migrated. > > QEMU only migrates emulated device state (e.g. the hardware registers > and associated state). It does not emulate host state that the guest > cannot see like the dirty bitmap. > >> Or you are saying about migrating disabled bitmaps? Hm. We should sync >> bitmap file on bitmap_disable. Disabled persistent bitmap is just a static >> file ~30mb, we can easily migrate it without common procedure with cow or >> something like this.. > Active dirty bitmaps should migrate too. I'm thinking now that the > appropriate thing is to add live migration of dirty bitmaps to QEMU > (regardless of whether they are active or not). > > Stefan ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Qemu-devel] [PATCH v2] persistent dirty bitmap: add QDB file spec. 2014-11-21 16:55 ` Stefan Hajnoczi 2014-11-24 9:19 ` Vladimir Sementsov-Ogievskiy @ 2014-11-25 17:58 ` Vladimir Sementsov-Ogievskiy 2014-11-28 13:28 ` Vladimir Sementsov-Ogievskiy 2 siblings, 0 replies; 16+ messages in thread From: Vladimir Sementsov-Ogievskiy @ 2014-11-25 17:58 UTC (permalink / raw) To: Stefan Hajnoczi; +Cc: famz, jsnow, qemu-devel, den > I'm thinking now that the > appropriate thing is to add live migration of dirty bitmaps to QEMU > (regardless of whether they are active or not). Digging the code around, I've found this: in mig_save_device_dirty which is actually an iteration of live block migration, after sending a sector we need to clear appropriate bit in migration dirty bitmap (bmds->dirty_bitmap). But we clear such bits in all bitmaps, associated with this device: bdrv_reset_dirty(bmds->bs, sector, nr_sectors); which is void bdrv_reset_dirty(BlockDriverState *bs, int64_t cur_sector, int nr_sectors) { BdrvDirtyBitmap *bitmap; QLIST_FOREACH(bitmap, &bs->dirty_bitmaps, list) { hbitmap_reset(bitmap->bitmap, cur_sector, nr_sectors); } } I don't know why is it so, but with such approach we cant talk about dirty bitmap migration. Actually, all other dirty bitmaps, not related to this migration are broken because of this. It's a mistake or I don't understand the concept of several dirty bitmaps per device in qemu. I've thought that they are separate entities, which are maintained by qemu. And other subsystems like backup or migration can create for itself a bitmap and use it not touching other bitmaps.. Am I wrong? Best regards, Vladimir On 21.11.2014 19:55, Stefan Hajnoczi wrote: > On Fri, Nov 21, 2014 at 01:27:40PM +0300, Vladimir Sementsov-Ogievskiy wrote: >>> There is a constraint if we want to get live migration for free: The >>> bitmap contents must be accessible with bdrv_read() and >>> bdrv_get_block_status() to skip zero regions. >> Hm. I'm afraid, it still will not be free. If bitmap is active, it's actual >> version is in memory. To migrate bitmap file like a disk image, we should >> start syncing it with every write to corresponding disk, doubling number of >> io. > It would be possible to drive-mirror the persistent dirty bitmap and > then flush it like all drives when the guest vCPUs are paused for > migration. > > After thinking more about it though, this approach places more I/O into > the critical guest downtime phase. In other words, slow disk I/O could > lead to long guest downtimes while QEMU tries to write out the dirty > bitmap. > >> Moreover, we have normal dirty bitmaps, which have no name/file, do we >> migrate them? If, for example, the migration occurs when backup in progress? >> Active bitmaps should be migrated in the same way for >> persistent/named/normal bitmaps. I can't find in qemu source, is there >> bitmap migration? > bs->dirty_bitmaps is not migrated, in fact none of BlockDriverState is > migrated. > > QEMU only migrates emulated device state (e.g. the hardware registers > and associated state). It does not emulate host state that the guest > cannot see like the dirty bitmap. > >> Or you are saying about migrating disabled bitmaps? Hm. We should sync >> bitmap file on bitmap_disable. Disabled persistent bitmap is just a static >> file ~30mb, we can easily migrate it without common procedure with cow or >> something like this.. > Active dirty bitmaps should migrate too. I'm thinking now that the > appropriate thing is to add live migration of dirty bitmaps to QEMU > (regardless of whether they are active or not). > > Stefan ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Qemu-devel] [PATCH v2] persistent dirty bitmap: add QDB file spec. 2014-11-21 16:55 ` Stefan Hajnoczi 2014-11-24 9:19 ` Vladimir Sementsov-Ogievskiy 2014-11-25 17:58 ` Vladimir Sementsov-Ogievskiy @ 2014-11-28 13:28 ` Vladimir Sementsov-Ogievskiy 2014-12-01 11:02 ` Stefan Hajnoczi 2 siblings, 1 reply; 16+ messages in thread From: Vladimir Sementsov-Ogievskiy @ 2014-11-28 13:28 UTC (permalink / raw) To: Stefan Hajnoczi; +Cc: famz, jsnow, qemu-devel, Denis V. Lunev On 21.11.2014 19:55, Stefan Hajnoczi wrote: > Active dirty bitmaps should migrate too. I'm thinking now that the > appropriate thing is to add live migration of dirty bitmaps to QEMU > (regardless of whether they are active or not). I think, we should migrate named dirty bitmaps, which are not used now. So if some external mechanism uses the bitmap (for example - backup) - we actually can't migrate this process, because we will need to restore the whole backup structure including a pointer to the bitmap, which is too hard and includes not only bitmap migration. So, if named bitmap is enabled, but not used (only bdrv_aligned_pwritev writes to it) it can be migrated. For this I see the following solutions: 1) Just save all corresponding pieces of named bitmaps with every migrated block. The block size is 1mb, so the overhead for migrating additionally a bitmap with 64kb granularity would be 2b, and it would be 256b for bitmap with 512b granularity. This approach needs additional fields in BlkMigBlock, for saving bitmaps pieces. 2) Add DIRTY flag to migrated block flags, to distinguish blocks, which became dirty while migrating. Save all the bitmaps separately, and also update them on block_load, when we receive block with DIRTY flag on. Some information will be lost, migrated dirty bitmaps may be "more dirty" then original ones. This approach needs additional field "bool dirty" in BlkMigBlock, and saving this flag in blk_send. These solutions don't depend on "persistence" of dirty bitmaps or persistent bitmap file format. Best regards, Vladimir On 21.11.2014 19:55, Stefan Hajnoczi wrote: > On Fri, Nov 21, 2014 at 01:27:40PM +0300, Vladimir Sementsov-Ogievskiy wrote: >>> There is a constraint if we want to get live migration for free: The >>> bitmap contents must be accessible with bdrv_read() and >>> bdrv_get_block_status() to skip zero regions. >> Hm. I'm afraid, it still will not be free. If bitmap is active, it's actual >> version is in memory. To migrate bitmap file like a disk image, we should >> start syncing it with every write to corresponding disk, doubling number of >> io. > It would be possible to drive-mirror the persistent dirty bitmap and > then flush it like all drives when the guest vCPUs are paused for > migration. > > After thinking more about it though, this approach places more I/O into > the critical guest downtime phase. In other words, slow disk I/O could > lead to long guest downtimes while QEMU tries to write out the dirty > bitmap. > >> Moreover, we have normal dirty bitmaps, which have no name/file, do we >> migrate them? If, for example, the migration occurs when backup in progress? >> Active bitmaps should be migrated in the same way for >> persistent/named/normal bitmaps. I can't find in qemu source, is there >> bitmap migration? > bs->dirty_bitmaps is not migrated, in fact none of BlockDriverState is > migrated. > > QEMU only migrates emulated device state (e.g. the hardware registers > and associated state). It does not emulate host state that the guest > cannot see like the dirty bitmap. > >> Or you are saying about migrating disabled bitmaps? Hm. We should sync >> bitmap file on bitmap_disable. Disabled persistent bitmap is just a static >> file ~30mb, we can easily migrate it without common procedure with cow or >> something like this.. > Active dirty bitmaps should migrate too. I'm thinking now that the > appropriate thing is to add live migration of dirty bitmaps to QEMU > (regardless of whether they are active or not). > > Stefan ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Qemu-devel] [PATCH v2] persistent dirty bitmap: add QDB file spec. 2014-11-28 13:28 ` Vladimir Sementsov-Ogievskiy @ 2014-12-01 11:02 ` Stefan Hajnoczi 0 siblings, 0 replies; 16+ messages in thread From: Stefan Hajnoczi @ 2014-12-01 11:02 UTC (permalink / raw) To: Vladimir Sementsov-Ogievskiy; +Cc: famz, jsnow, qemu-devel, Denis V. Lunev [-- Attachment #1: Type: text/plain, Size: 2392 bytes --] On Fri, Nov 28, 2014 at 04:28:57PM +0300, Vladimir Sementsov-Ogievskiy wrote: > On 21.11.2014 19:55, Stefan Hajnoczi wrote: > >Active dirty bitmaps should migrate too. I'm thinking now that the > >appropriate thing is to add live migration of dirty bitmaps to QEMU > >(regardless of whether they are active or not). > I think, we should migrate named dirty bitmaps, which are not used now. So > if some external mechanism uses the bitmap (for example - backup) - we > actually can't migrate this process, because we will need to restore the > whole backup structure including a pointer to the bitmap, which is too hard > and includes not only bitmap migration. > So, if named bitmap is enabled, but not used (only bdrv_aligned_pwritev > writes to it) it can be migrated. For this I see the following solutions: > > 1) Just save all corresponding pieces of named bitmaps with every migrated > block. The block size is 1mb, so the overhead for migrating additionally a > bitmap with 64kb granularity would be 2b, and it would be 256b for bitmap > with 512b granularity. This approach needs additional fields in BlkMigBlock, > for saving bitmaps pieces. block-migration.c is not used for all live migration. So it's important not to tie dirty bitmap migration to block-migration.c, at least there needs to be a way to skip actually copying disk contents in block-migration.c. (When there is shared storage that both source and destination hosts can access then block-migration.c is not used. Also, there is a newer non-shared storage migration mechanism that is used instead of block-migration.c which is not tied into the live migration data stream, so block-migration.c is optional.) > 2) Add DIRTY flag to migrated block flags, to distinguish blocks, which > became dirty while migrating. Save all the bitmaps separately, and also > update them on block_load, when we receive block with DIRTY flag on. Some > information will be lost, migrated dirty bitmaps may be "more dirty" then > original ones. This approach needs additional field "bool dirty" in > BlkMigBlock, and saving this flag in blk_send. > > These solutions don't depend on "persistence" of dirty bitmaps or persistent > bitmap file format. That's an important characteristic since we probably want to migrate named dirty bitmaps, whether they are persistent or not. Stefan [-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Qemu-devel] [PATCH v2] persistent dirty bitmap: add QDB file spec. 2014-11-20 11:36 ` Stefan Hajnoczi 2014-11-21 10:27 ` Vladimir Sementsov-Ogievskiy @ 2014-11-21 12:56 ` Vladimir Sementsov-Ogievskiy 1 sibling, 0 replies; 16+ messages in thread From: Vladimir Sementsov-Ogievskiy @ 2014-11-21 12:56 UTC (permalink / raw) To: Stefan Hajnoczi; +Cc: famz, jsnow, qemu-devel, den > The metadata (bitmap > name, granularity, etc) doesn't need to be stored in the image file > because management tools must be aware of it anyway. What tools do you mean? In my opinion dirty bitmap should exist as a separate object. If it exists, it should be loaded with it's drive image and it should be maintained by qemu (loaded and enabled as a BdrvDirtyBitmap). If we use qcow2 format for dirty bitmaps, we can store metadata using header extension.. Also snapshots may be used to store several bitmaps in case when server shutdowns during backup and we need to store both current active bitmap and it's snapshot used by backup. Best regards, Vladimir On 20.11.2014 14:36, Stefan Hajnoczi wrote: > On Thu, Nov 20, 2014 at 01:41:14PM +0300, Vladimir Sementsov-Ogievskiy wrote: >> Also, it may be better to make this as qcow2 extension. And bitmap will be >> saved in separate qcow2 file, which will contain only the bitmap(s) and no >> other data (no disk, no snapshots). > I think you are on to something with the idea of making the persistent > dirty bitmap itself a disk image. > > That way drive-mirror and other commands can be used to live migrate the > dirty bitmap along with the guest's disks. This allows both QEMU and > management tools to reuse existing code. > > (We may need to allow multiple block jobs per BlockDriverState to make > this work but in theory that can be done.) > > There is a constraint if we want to get live migration for free: The > bitmap contents must be accessible with bdrv_read() and > bdrv_get_block_status() to skip zero regions. > > Putting the dirty bitmap into its own data structure in qcow2 and not > accessible as a BlockDriverState bdrv_read() means custom code must be > written to migrate the dirty bitmap. > > So I suggest putting the bitmap contents into a disk image that can be > accessed as a BlockDriverState with bdrv_read(). The metadata (bitmap > name, granularity, etc) doesn't need to be stored in the image file > because management tools must be aware of it anyway. > > The only thing besides the data that really needs to be stored is the > up-to-date flag to decide whether this dirty bitmap was synced cleanly. > A much simpler format would do for that. > > Stefan ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Qemu-devel] [PATCH v2] persistent dirty bitmap: add QDB file spec. 2014-11-20 10:34 ` [Qemu-devel] [PATCH v2] persistent dirty bitmap: add QDB file spec Vladimir Sementsov-Ogievskiy 2014-11-20 10:41 ` Vladimir Sementsov-Ogievskiy @ 2014-11-21 0:24 ` Eric Blake 1 sibling, 0 replies; 16+ messages in thread From: Eric Blake @ 2014-11-21 0:24 UTC (permalink / raw) To: Vladimir Sementsov-Ogievskiy, qemu-devel, jsnow; +Cc: famz, den, stefanha [-- Attachment #1: Type: text/plain, Size: 5256 bytes --] On 11/20/2014 03:34 AM, Vladimir Sementsov-Ogievskiy wrote: > QDB file is for storing dirty bitmap. The specification is based on > qcow2 specification. > > Saving several bitmaps is necessary when server shutdowns during > backup. In this case 2 tables for each disk are available. One > collected for a previous period and one active. Though this feature > is discussable. > > Big endian format and Standard Cluster Descriptor are used to simplify > integration with qcow2, to support internal bitmaps for qcow2 in future. > > The idea is that the same procedure writing the data to QDB file could > do the same for QCOW2. The only difference is cluster refcount table. > Should we use it here or not is still questionable. > > Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com> > --- > docs/specs/qdb.txt | 132 +++++++++++++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 132 insertions(+) > create mode 100644 docs/specs/qdb.txt No comment on whether the approach itself makes sense - just a high-level review of this document in isolation. > > diff --git a/docs/specs/qdb.txt b/docs/specs/qdb.txt > new file mode 100644 > index 0000000..d570a69 > --- /dev/null > +++ b/docs/specs/qdb.txt > @@ -0,0 +1,132 @@ > +== General == Missing a copyright notice. Yeah, you've got a lot of bad examples in this directory (in docs/* in general), but there ARE a few of the newer files that are starting to buck the trend and use a copyright/license blurb. > + > +"QDB" means "Qemu Dirty Bitmaps". QDB file can store several dirty bitmaps. > +QDB file is organized in units of constant size, which are called clusters. > + > +All numbers in QDB are stored in Big Endian byte order. > + > +== Header == > + > +The first cluster of a QDB image contains the file header: > + > + Byte 0 - 3: magic > + QDB magic string ("QDB\0") > + > + 4 - 7: version > + Version number (valid value is 1) > + > + 8 - 11: cluster_bits > + Number of bits that are used for addressing an offset > + within a cluster (1 << cluster_bits is the cluster size). > + Must not be less than 9 (i.e. 512 byte clusters). Is there a maximum? > + > + 12 - 15: nb_bitmaps > + Number of bitmaps contained in the file > + > + 16 - 23: bitmaps_offset > + Offset into the QDB file at which the bitmap table starts. > + Must be aligned to a cluster boundary. > + > + 24 - 27: header_length > + Length of the header structure in bytes. does that include the length of all extensions? Should we enforce a maximum header length of one cluster? > + > +Like in qcow2, directly after the image header, optional sections called header extensions can > +be stored. Each extension has a structure like the following: > + > + Byte 0 - 3: Header extension type: > + 0x00000000 - End of the header extension area > + other - Unknown header extension, can be safely > + ignored > + > + 4 - 7: Length of the header extension data > + > + 8 - n: Header extension data > + > + n - m: Padding to round up the header extension size to the next > + multiple of 8. > + > +Unless stated otherwise, each header extension type shall appear at most once > +in the same image. I like how qcow2 v3 has a header extension for listing the name of each header extension, for nicer error messages. Also, I think that declaring all unknown extensions as ignorable may be dangerous, since you lack a capability bitmask. Maybe it would be wise to copy the qcow2 v3 capabilities (including flags for ignorable vs. mandatory support of given features, where a client can sanely decide what to do if it does not recognize a feature). > + > + 26 - 27: Size of the bitmap name > + > + 36 - 39: Size of extra data in the table entry (used for future > + extensions of the format) > + > + variable: Extra data for future extensions. Unknown fields must be > + ignored. This block is width 0 if bytes 36-39 is 0? How are extensions identified? Are they required to be done like overall file headers, with an id, length, and then variable data, so that it is possible to scan to the end of each unknown extension to see if the next extension is known? This is where capability bits in the overall header may make more sense. > + > + variable: Name of the bitmap (not null terminated) The length of this block is determined by bytes 26-27? > + > + variable: Padding to round up the bitmap table entry size to the > + next multiple of 8. > + > +The fields "size", "granularity", "enabled" and "name" are corresponding with > +the fields in struct BdrvDirtyBitmap. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 539 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2014-12-01 11:02 UTC | newest] Thread overview: 16+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- [not found] <545CB9CE.9000302@parallels.com> [not found] ` <20141108071919.GB4940@fam-t430.nay.redhat.com> [not found] ` <54607427.8040404@parallels.com> [not found] ` <5462327C.5080704@redhat.com> [not found] ` <5464B80E.6060201@parallels.com> [not found] ` <546A88DD.10006@redhat.com> 2014-11-18 10:54 ` [Qemu-devel] [PATCH v6 00/10] block: Incremental backup series Vladimir Sementsov-Ogievskiy 2014-11-18 13:09 ` Vladimir Sementsov-Ogievskiy 2014-11-18 14:41 ` Vladimir Sementsov-Ogievskiy 2014-11-18 16:08 ` John Snow 2014-11-19 6:25 ` Denis V. Lunev 2014-11-20 10:34 ` [Qemu-devel] [PATCH v2] persistent dirty bitmap: add QDB file spec Vladimir Sementsov-Ogievskiy 2014-11-20 10:41 ` Vladimir Sementsov-Ogievskiy 2014-11-20 11:36 ` Stefan Hajnoczi 2014-11-21 10:27 ` Vladimir Sementsov-Ogievskiy 2014-11-21 16:55 ` Stefan Hajnoczi 2014-11-24 9:19 ` Vladimir Sementsov-Ogievskiy 2014-11-25 17:58 ` Vladimir Sementsov-Ogievskiy 2014-11-28 13:28 ` Vladimir Sementsov-Ogievskiy 2014-12-01 11:02 ` Stefan Hajnoczi 2014-11-21 12:56 ` Vladimir Sementsov-Ogievskiy 2014-11-21 0:24 ` Eric Blake
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).