From: Eric Blake
Date: Wed, 23 Jan 2019 13:09:41 -0600
Message-ID: <9aef3157-e49e-4b53-f0de-75593df06da9@redhat.com>
Subject: Re: [Qemu-devel] Incremental drive-backup with dirty bitmaps
To: Bharadwaj Rayala
Cc: qemu-devel@nongnu.org, Kashyap Chamarthy, Suman Swaroop,
 kchamart@redhat.com, John Snow, qemu-discuss@nongnu.org

On 1/23/19 12:08 PM, Bharadwaj Rayala wrote:
>>> Issues I face:
>>> 1. Does the drive-backup stall for the whole time the block job is in
>>> progress? This is a strict no for me. I did not find any documentation
>>> regarding it, but a PowerPoint presentation (from Kashyap) mentioning
>>> it. (Assuming yes!)
>>
>> The drive-backup is running in parallel to the guest.
>> I'm not sure what stalls you are seeing - but as qemu is doing all the
>> work, it DOES have to service both guest requests and the work to copy
>> out the backup; also, if you have known-inefficient lseek() situations,
>> there may be cases where qemu is doing a lousy job (there's work
>> underway on the list to improve qemu's caching of lseek() data).
>>
> Eric, I watched your KVM Forum video
> https://www.youtube.com/watch?v=zQK5ANionpU, which cleared up some
> things for me. Let's say you have a disk of size 10GB. I had assumed
> that, if drive-backup has copied up to the 2GB offset, wouldn't qemu
> have to stall writes coming from the guest between 2GB and 10GB? Unless
> qemu does some internal qcow snapshotting at the start of the backup
> job and commits at the end. But if I get it correctly from what you
> explained, qemu does not create a new qcow file; when a write comes
> from the guest to the live image, the old block is first written to the
> backup synchronously before writing the new data to the live qcow2
> file. This would not stall the writes, but it would slow down the
> writes of the guest, as an extra write to the target file on secondary
> storage (over NFS) has to happen first. If the old block write to NFS
> fails, does the backup fail with on-target-error applied appropriately,
> or does it stall the guest write?

You have various knobs to control what happens on write failures, both
on the source and on the destination (on-source-error and
on-target-error), as well as how synchronized the image will be
(MirrorCopyMode of background vs. write-blocking - but only since 3.0).
Between those knobs, you should be able to control whether a failure to
write to the backup image halts the guest or merely halts the job. But
yes, I/O issued by the guest to a cluster currently being serviced by
the backup code can result in longer write completion times from the
guest's perspective on those clusters.

>
>
>>> 2. Is the backup consistent?
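The error-policy knobs above appear directly as arguments of the QMP
drive-backup command. A minimal sketch, shown as a Python dict (the
device name and target path are hypothetical examples, not from the
thread):

```python
# Sketch of a QMP drive-backup command exercising the error-policy
# knobs discussed above. Device name and target path are hypothetical.
drive_backup = {
    "execute": "drive-backup",
    "arguments": {
        "device": "drive-1",                 # hypothetical device name
        "target": "/nfs/backup/full.qcow2",  # hypothetical target path
        "sync": "full",
        "format": "qcow2",
        # Policy when a read from the source image fails:
        "on-source-error": "report",
        # Policy when a write to the backup target (e.g. NFS) fails;
        # "stop" pauses the job rather than failing the guest write:
        "on-target-error": "stop",
    },
}
```

With "on-target-error": "stop", a transient NFS outage pauses the job
(resumable via block-job-resume) instead of propagating the failure.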
>>> Are the drive file-systems quiesced on backup?
>>> (Assuming no!)
>>
>> If you want the file systems quiesced on backup, then merely bracket
>> your transaction that kicks off the drive-backup inside guest-agent
>> commands that freeze and thaw the disk. So, consistency is not the
>> default (because it requires trusting the guest), but is possible.
>>
> OK. Method 2 below would not even be required if both the above issues
> can be solved.
>
>>> *I cannot do this because drive-backup does not allow the bitmap and
>>> the node that the bitmap is attached to, to be different. :( *
>>
>> It might, as long as the bitmap is found on the backing chain (I'm a
>> bit fuzzier on that case, but KNOW that for pull-mode backups, my
>> libvirt code is definitely relying on being able to access the bitmap
>> from the backing file of the BDS being exported over NBD).
>>
> Sorry, I don't get this. Let's say this was the drive-1 I had:
> A(raw) <-- B(qcow2). @suman (cc'ed) created a bitmap (bitmap1) on
> device:drive-1, then took a snapshot of it. At this point the chain
> would be something like A(raw) <-- B(qcow2 - snapshot) <-- C(qcow2 -
> live). Would the bitmap that was created on drive-1 still be attached
> to #nodeB, or would it be attached to #nodeC? Would it have all the
> dirty blocks from "bitmap-add to now", or only the dirty blocks from
> "bitmap-add to snapshot"? If the bitmap is now attached to the live
> drive-1 (i.e., nodeC) it would have all the dirty blocks, but then can
> I do a drive-backup(bitmap1, src=#nodeB)?

We are still exploring how external snapshots should interact with
bitmaps (the low-level building blocks may or may not already be present
in qemu 3.1, but libvirt certainly hasn't been coded to use them to
actually prove what works, as I'm still struggling to get the
incremental backups without external snapshot code into libvirt first).
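The freeze/thaw bracketing described earlier can be sketched as the
command sequence a client would issue: the first and last commands go to
the guest agent, the middle one to QMP. Device and target names here are
hypothetical:

```python
# Sketch of the freeze -> backup -> thaw bracketing for a quiesced
# backup. guest-fsfreeze-freeze/thaw go to the guest agent socket;
# the transaction goes to the QMP monitor. Names are hypothetical.
freeze = {"execute": "guest-fsfreeze-freeze"}
backup = {
    "execute": "transaction",
    "arguments": {
        "actions": [
            {
                "type": "drive-backup",
                "data": {
                    "device": "drive-1",                   # hypothetical
                    "target": "/backup/consistent.qcow2",  # hypothetical
                    "sync": "full",
                    "format": "qcow2",
                },
            },
        ],
    },
}
thaw = {"execute": "guest-fsfreeze-thaw"}

# Issue in this order; always thaw, even if the transaction fails.
sequence = [freeze, backup, thaw]
```

The freeze window only needs to cover kicking off the job, not its whole
runtime: once the backup job starts, the point in time is fixed.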
At the moment, when you create nodeC, the bitmap in nodeB effectively
becomes read-only (no more writes to nodeB, so the bitmap doesn't change
content). You can, at the time you create nodeC but before wiring it
into the chain using blockdev-add, also create another bitmap living in
nodeC, such that when you then perform the snapshot, writes to nodeC are
tracked in the new bitmap. To track all changes from the time that
bitmap1 was first created, you'd need to be able to merge the bits set
in bitmap1 of nodeB with the bits set in the bitmap in nodeC. Qemu does
not automatically move bitmaps from one image to another, so it really
does boil down to whether we have enough other mechanisms for merging
bitmaps from cross-image sources.

>
> If the bitmap stays attached to nodeB, it would have only the dirty
> blocks up to the point snapshot C is created. But this is a problem, as
> a backup workflow/program should not restrict users from creating other
> snapshots.

Not a problem if you also create a bitmap every time you take an
external snapshot, and then piece together bitmaps as needed to collect
all changes between the point in time of interest and the present.

> The backup workflow can take additional snapshots as done in method 2
> above if it wants, and then remove the snapshot once the backup job is
> done. I guess this problem would be there for the pull-based model as
> well. I am currently trying my workflow on a RHEV cluster, and I do not
> want my backup workflow to interfere with snapshots triggered from
> RHEV-M/oVirt.

"Incremental backup" means only the data that changed since the last
backup (which can either be done via a single bitmap or by treating all
external snapshot creation operations as a backup point in time);
"differential backup" is the more powerful term that means tracking
MULTIPLE points in time (in my libvirt code, by having a chain of
multiple bitmaps, and then piecing together the right set of bitmaps as
needed).
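Piecing bitmaps together across an external snapshot might look like the
following, using the qemu 4.0 command names (3.1 spelled them
x-block-dirty-bitmap-*); node and bitmap names (nodeB, nodeC, bitmap1,
bitmapC, scratch) are hypothetical:

```python
# Sketch: collect all changes since bitmap1 was created by merging the
# now read-only bitmap in nodeB with the live bitmap in nodeC into a
# scratch bitmap on nodeC. Qemu 4.0 command names; all node and bitmap
# names are hypothetical.
add_scratch = {
    "execute": "block-dirty-bitmap-add",
    "arguments": {"node": "nodeC", "name": "scratch"},
}
merge = {
    "execute": "block-dirty-bitmap-merge",
    "arguments": {
        "node": "nodeC",
        "target": "scratch",
        # Sources: the frozen bitmap left behind in nodeB, plus the
        # bitmap that has been tracking writes to nodeC:
        "bitmaps": [{"node": "nodeB", "name": "bitmap1"}, "bitmapC"],
    },
}
```

The scratch bitmap can then drive one backup covering the full span of
changes, and be removed afterwards.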
But yes, it sounds like you want differential backups, by piecing
together bitmaps over multiple points in time, and where you take care
to freeze one bitmap and create a new one at any point in time where you
want to be able to track changes since that point (whether kicking off a
backup job, or doing an external snapshot).

>>> To either fail the whole backup or succeed (when multiple drives are
>>> present), I can use completion-mode = grouped. But then I can't
>>> combine them, as it's not supported. I.e., do a
>>> Transaction{drive-backup(drive1), dirty-bitmap-add(drive1, bitmap1),
>>> drive-backup(drive2), dirty-bitmap-add(drive2, bitmap1),
>>> completion-mode=grouped}.
>>
>> What error message are you getting? I'm not surprised if
>> completion-mode=grouped isn't playing nicely with bitmaps in
>> transactions, although that should be something that we should fix.
>>
>
> The error says grouped completion-mode is not allowed with command
> 'dirty-bitmap-add'.
>

The other thing to consider is whether you really need
completion-mode=grouped, or whether you can instead use push-mode
backups with a temporary bitmap. But again, that won't help you prior to
qemu 3.1, where you don't have easy access to creating/merging bitmaps
on the fly. The approach I'm using in libvirt is that, since success of
qemu's push-mode backup destroys the old state of the bitmap, I instead
create a temporary bitmap, merge the real bitmap into the temporary
bitmap (in a transaction), then kick off the backup job. If the backup
job succeeds, I delete the temporary bitmap and all is well; if it
fails, I merge the temporary bitmap back into the real bitmap. At the
end of the day, by managing the bitmaps myself instead of letting qemu
auto-manage them, I did not have to rely on completion-mode=grouped in
order to get sane failure handling of push-mode backups across multiple
disks.
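The temporary-bitmap safety net just described can be sketched as the
following QMP commands (qemu 4.0 command names; the node, bitmap, and
file names are hypothetical). The "tmp" bitmap keeps a copy of bitmap1
so a failed job can be retried without losing dirty bits:

```python
# Sketch of the temporary-bitmap dance: copy the real bitmap into a
# temporary one in the same transaction that starts the backup job.
# Qemu 4.0 command names; all names here are hypothetical.
prepare = {
    "execute": "transaction",
    "arguments": {"actions": [
        {"type": "block-dirty-bitmap-add",
         "data": {"node": "drive-1", "name": "tmp"}},
        {"type": "block-dirty-bitmap-merge",
         "data": {"node": "drive-1", "target": "tmp",
                  "bitmaps": ["bitmap1"]}},
        {"type": "drive-backup",
         "data": {"device": "drive-1", "bitmap": "bitmap1",
                  "sync": "incremental",
                  "target": "/backup/incr.qcow2", "format": "qcow2"}},
    ]},
}
# On job success, qemu has already cleared bitmap1; drop the copy:
on_success = {"execute": "block-dirty-bitmap-remove",
              "arguments": {"node": "drive-1", "name": "tmp"}}
# On job failure, restore the saved bits into the real bitmap:
on_failure = {"execute": "block-dirty-bitmap-merge",
              "arguments": {"node": "drive-1", "target": "bitmap1",
                            "bitmaps": ["tmp"]}}
```

Because the client, not qemu, decides when "tmp" is merged back or
removed, one failing disk among many can be retried without grouped
completion semantics.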
(Well, truth be told, that's the part of the libvirt code that I did NOT
have working at KVM Forum, and where I still have not posted a working
demo to the libvirt list in the meantime - so far, I have only demo'd
pull-mode backups, and not push-mode, because I am still playing with
how libvirt will make push-mode work reliably.)

>>> 3. Is there a way pre-2.12 to achieve auto-finalise = false in a
>>> transaction? Can I somehow add a dummy block job that will only
>>> finish when I want to finalise the actual 2 disks' block jobs? My
>>> backup workflow needs to run on envs pre-2.12.
>>
>> Ouch - backups pre-2.12 have issues. If I had not read this paragraph,
>> my recommendation would be to stick to 3.1 and use pull-mode backups
>> (where you use NBD to learn which portions of the image were dirtied,
>> and pull those portions of the disk over NBD rather than qemu pushing
>> them); I even have a working demo of preliminary libvirt code driving
>> that, which I presented at last year's KVM Forum.
>>
>
> What do you mean by issues? Do you mean data/corruption bugs, or the
> lack of some nice functionality that we are talking about here?

Lack of functionality. In particular, the 4.0 commands
block-dirty-bitmap-{enable,merge,disable} (or their 3.1 counterparts
x-block-dirty-bitmap-*) are essential to the workflow of differential
backups (without being able to manage bitmaps yourself, you can only get
the weaker incremental backup, which means qemu itself is clearing the
bitmap out from under your feet on success, and you are having to worry
about completion-mode=grouped).

>
> Thanks a lot, Eric, for spending your time answering my queries. I
> don't know if you work with Kashyap Chamarthy, but your help and his
> blogs are lifesavers.
Yes, Kashyap is trying to build solutions on top of the building blocks
that I am working on, so we have collaborated several times on these
types of issues (he does a lot better at blog posts extracted from my
mailing-list brain dumps).

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org