qemu-devel.nongnu.org archive mirror
From: John Snow <jsnow@redhat.com>
To: "Fabian Grünbichler" <f.gruenbichler@proxmox.com>
Cc: Kevin Wolf <kwolf@redhat.com>, Fam Zheng <fam@euphon.net>,
	vsementsov@virtuozzo.com, qemu-block@nongnu.org,
	Juan Quintela <quintela@redhat.com>,
	Wen Congyang <wencongyang2@huawei.com>,
	Xie Changlong <xiechanglong.d@gmail.com>,
	qemu-devel@nongnu.org,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	Stefan Hajnoczi <stefanha@redhat.com>,
	Max Reitz <mreitz@redhat.com>,
	Markus Armbruster <armbru@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v4 00/18] bitmaps: introduce 'bitmap' sync mode
Date: Tue, 23 Jul 2019 12:58:10 -0400
Message-ID: <787d6506-d4bb-0904-5e58-521d6fc8313a@redhat.com>
In-Reply-To: <20190723094702.glmdyjm6rgelcwte@nora.maurer-it.com>



On 7/23/19 5:47 AM, Fabian Grünbichler wrote:
> On Mon, Jul 22, 2019 at 01:21:02PM -0400, John Snow wrote:
>>
>>
>> On 7/22/19 8:17 AM, Fabian Grünbichler wrote:
>>> On Tue, Jul 09, 2019 at 07:25:32PM -0400, John Snow wrote:
>>>> This series adds a new "BITMAP" sync mode that is meant to replace the
>>>> existing "INCREMENTAL" sync mode.
>>>>
>>>> This mode can have its behavior modified by issuing any of three bitmap sync
>>>> modes, passed as arguments to the job.
>>>>
>>>> The three bitmap sync modes are:
>>>> - ON-SUCCESS: This is an alias for the old incremental mode. The bitmap is
>>>>               conditionally synchronized based on the return code of the job
>>>>               upon completion.
>>>> - NEVER: This is, effectively, the differential backup mode. It never clears
>>>>          the bitmap, as the name suggests.
>>>> - ALWAYS: Here is the new, exciting thing. The bitmap is always synchronized,
>>>>           even on failure. On success, this is identical to incremental, but
>>>>           on failure it clears only the bits that were copied successfully.
>>>>           This can be used to "resume" incremental backups from later points
>>>>           in time.
>>>>
>>>> I wrote this series by accident on my way to implement incremental mode
>>>> for mirror, but this happened first -- the problem is that Mirror mode
>>>> uses its existing modes in a very particular way; and this was the best
>>>> way to add bitmap support into the mirror job properly.
>>>>
>>>> [...]
>>>>
>>>> Future work:
>>>> [..]
>>>>  - Add these modes to Mirror. (Done*, but needs tests.)
>>>
>>> are these mirror patches available somewhere for testing in combination
>>> with this series? your bitmaps branch does not seem to contain them ;)
>>>
>>> we've been experimenting with Ma Haocong's patch (v4 from February) to add
>>> "incremental"/differential sync to drive-mirror recently with positive
>>> results so far, and this sounds like it is another attempt at getting
>>> this properly integrated into Qemu.
>>>
>>
>> Not available quite yet; I added it in fairly hastily but haven't done
>> the testing I want to do yet, so I wouldn't feel comfortable sharing it
>> before I do my own due diligence on it. Give me a chance to polish it so
>> that the testing effort isn't wasted :)
> 
> fair enough, and no hurries :)
> 
>>
>> Can you share some of your use-cases for how you are using the
>> "incremental mirror" so far? It might be useful for the patch
>> justification if I can point to production use cases. (And good for
>> allocating time, too.)
> 
> it's basically the same use case that the original "incremental mirror"
> patch (series)[1] from two years ago had (no affiliation with the author
> though) - we have a guest disk replication feature for ZFS/zvols in a
> clustered hypervisor setting, and would like to re-use the already
> replicated disk state when live-migrating a VM. Qemu does not know
> anything about the replication, since it happens on the storage layer
> with zfs send/zfs receive. note that for VMs, we use zvols which are
> block devices backed by ZFS (or rather, ZFS datasets exposed as block
> devices), minus the file system part of regular ZFS datasets. from
> Qemu's PoV these (replicated) disks are just regular block devices (and not
> image-backed disks on a filesystem, or accessed via some special
> BlockDriver like Ceph's RBD images).
> 
> we currently support live migration
> 1) with disks on shared/distributed storage (easy enough)
> 2) with regular (non-replicated, local) disks (via nbd/drive-mirror)
> 3) with unused disks on the storage level (disks are not known to Qemu/the VM)
> 
> 1-3 can be mixed and matched arbitrarily in one guest, e.g. with one
> disk on a shared Ceph cluster, one disk that is not in use on an NFS
> share, and another disk on a local LVM-thin pool. 2) and 3) also allow
> switching the underlying storage on the fly, since they transfer the
> full disk (content) anyway.
> 
> we also support offline migration with shared, local, unused and/or
> replicated disks (all on the storage level with no involvement of Qemu).
> 
> as you can see there is a gap in the live-migration feature matrix: when
> replication is used, you either have to poweroff the VM to re-use the
> replication state (storage-only migration), or drop the replication
> state and do a full local-disk live-migration before re-creating the
> replication state from scratch (which is bad, since replication can have
> multiple target hosts, and re-establishing the whole disk can take a
> while if it's big).
> 
> our basic approach is (currently) the following:
> 
> 1) get disk info
> 2) Qemu: add dirty bitmaps for currently used, replicated disks
> 3) storage/ZFS: do a regular replication of all replicated disks (used AND unused)

I take it that the ZFS replication is not an ongoing process but
something that terminates, so you need QEMU to pick up the difference
that occurred during that time?

(I imagine the bitmap will pick up some writes that DO get
replicated, but copying some extra data is safe.)

> 4) storage: do a regular storage migration of all regular unused local disks
> 5a) Qemu: do a regular drive-mirror of all currently used, local disks
> 5b) Qemu: do an incremental drive-mirror for all currently used, replicated disks

That is, 5b) mirrors anything written since the replication started,
based on this timeline.

> 6) Qemu: wait for convergence of drive-mirror jobs
> 7) Qemu: do a regular live-migration of VM
> 8) Qemu: once converged and VM is suspended, complete drive-mirror jobs
> 9) Qemu: resume now fully migrated VM on target node
> 10) Qemu/storage: clean up on source node
> 
> 5b) with bitmaps from 2) is what is currently missing on the Qemu side,
> but seems easy enough to support (like I said, we are currently using Ma
> Haocong's patch for testing, but want to get this feature upstream one
> way or another instead of carrying our own version, which may become
> incompatible in the near future).
> 
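As a rough illustration of steps 2), 5a), 5b) and 8) of the timeline above, here is a Python sketch that only builds the QMP command payloads; it does not talk to a QEMU monitor. block-dirty-bitmap-add, drive-mirror and block-job-complete are existing QMP commands, but the device names, node names and NBD address are placeholders, and the "sync": "bitmap" / "bitmap" / "bitmap-mode" arguments on drive-mirror are an assumption modeled on this series' backup options, since the mirror patches have not been posted yet.

```python
import json


def qmp(command, arguments):
    """Serialize one QMP command; no socket I/O happens here."""
    return json.dumps({"execute": command, "arguments": arguments})


def migration_plan(replicated, local, nbd_uri, bitmap_name="migration-delta"):
    """Build QMP payloads for steps 2), 5a), 5b) and 8).

    replicated, local: lists of (device, node_name) pairs (placeholders).
    nbd_uri: NBD server on the target host, e.g. "nbd:192.0.2.10:60001".
    """
    cmds = []

    # 2) start tracking guest writes on every in-use replicated disk
    for _device, node in replicated:
        cmds.append(qmp("block-dirty-bitmap-add",
                        {"node": node, "name": bitmap_name}))

    # 5a) full mirror of plain local disks into the target's NBD exports
    for device, _node in local:
        cmds.append(qmp("drive-mirror", {
            "device": device,
            "target": f"{nbd_uri}:exportname={device}",
            "format": "raw",
            "mode": "existing",
            "sync": "full",
        }))

    # 5b) copy only the clusters dirtied since step 2).
    # ASSUMPTION: these bitmap arguments for mirror are not a released API.
    for device, _node in replicated:
        cmds.append(qmp("drive-mirror", {
            "device": device,
            "target": f"{nbd_uri}:exportname={device}",
            "format": "raw",
            "mode": "existing",
            "sync": "bitmap",
            "bitmap": bitmap_name,
            "bitmap-mode": "never",
        }))

    # 8) finish every mirror job, as the timeline describes
    for device, _node in replicated + local:
        cmds.append(qmp("block-job-complete", {"device": device}))

    return cmds
```

Each string would then be sent over the QMP socket, waiting for a BLOCK_JOB_READY event from every mirror job (step 6) before starting the live migration in step 7).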

It will look VERY similar. Switching should be easy; the only difference
will be:

sync=BITMAP instead of sync=INCREMENTAL, and
bitmap_mode=NEVER provided explicitly to match Ma Haocong's patch behavior.

You can alternatively use the other bitmap policies depending on what
you want:

NEVER leaves the bitmap alone entirely like Ma Haocong's patch does. It
reflects a kind of "differential backup" intent; changes accumulate in
the bitmap if it was enabled.

ON-SUCCESS will reset any bits copied out if the job completes
successfully (note that this includes mirror cancellation after sync as
well as a COMPLETE instruction that includes the pivot.)

ALWAYS will reset any bits successfully copied out, regardless of the
final state of the job. You can use this one to resume the mirror on
failures.
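The three bitmap policies can be summarized with a small Python model of which dirty bits survive a job. This is a simplified sketch, not QEMU's implementation: the bitmap is a set of cluster indices, and guest writes made while the job runs (which QEMU tracks separately and merges back afterwards) are ignored. The function name and parameters are hypothetical.

```python
def bitmap_after_job(bitmap, copied, succeeded, mode):
    """Return the dirty bits left behind by a bitmap-sync job.

    bitmap:    set of clusters that were dirty when the job started
    copied:    set of clusters the job wrote out successfully
    succeeded: True if the job finished successfully
    mode:      "never", "on-success" or "always"
    """
    if mode == "never":
        # differential intent: the bitmap is never cleared
        return set(bitmap)
    if mode == "on-success":
        # cleared only if the whole job succeeded, otherwise left intact
        return set(bitmap) - copied if succeeded else set(bitmap)
    if mode == "always":
        # always drop whatever was copied, so a retry resumes from here
        return set(bitmap) - copied
    raise ValueError(f"unknown bitmap mode: {mode}")
```

For a job that starts with clusters {1, 2, 3, 4} dirty and fails after copying {1, 2}, NEVER and ON-SUCCESS both leave all four bits set, while ALWAYS leaves only {3, 4}, so a retried mirror copies just those two clusters.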

You should be able to get the exact behavior you've already programmed
for, and maybe some new toys.

> 2) and 3) are obviously not atomic, so the bitmaps will contain some
> writes that have been replicated already on the block/storage layer
> below the VM, and those writes will be done a second time in step 5b).
> 
> we can work around this by adding another short downtime by
> freezing/suspending prior to 2) until after doing the ZFS snapshots at
> the start of 3), in case these duplicate writes turn out to be
> problematic after all. this downtime would be rather short, as the bulk
> of the replication work (actually transferring the latest delta) can
> happen after unfreezing/resuming the VM. so far we haven't encountered
> any problems in our (albeit limited) testing though, so if possible we
> would naturally like to avoid the additional downtime altogether ;)
> 
> looking forward to your patch(es) :)
> 
> 1: <CAKVPjOZ8Y8U2zHgo_06aozrdd9_Cq6txWrX5F4HnFefAUjimyQ@mail.gmail.com>
> and <20170504105444.8940-1-daniel.kucera@gmail.com>
> 
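To see why the non-atomic window between 2) and 3) only costs some redundant copying, here is a toy Python model of the timeline: a disk is a dict mapping cluster index to data, the ZFS snapshot is a copy of the source at one instant, and the incremental mirror copies every cluster recorded in the bitmap. All names are hypothetical, and the model ignores writes that race with the mirror job itself, which the real job has to handle.

```python
def replicate_then_mirror(initial, writes_before_snapshot, writes_after_snapshot):
    """Simulate steps 2), 3) and 5b) on a toy disk (not QEMU or ZFS code)."""
    source = dict(initial)
    bitmap = set()  # 2) dirty bitmap added: records every later guest write

    def guest_write(cluster, data):
        source[cluster] = data
        bitmap.add(cluster)

    # writes landing between 2) and the snapshot: replicated AND in the bitmap
    for cluster, data in writes_before_snapshot:
        guest_write(cluster, data)

    # 3) snapshot + zfs send/receive: target equals the source at this instant
    target = dict(source)

    # writes after the snapshot: only the bitmap knows about them
    for cluster, data in writes_after_snapshot:
        guest_write(cluster, data)

    # 5b) incremental mirror copies every dirty cluster; clusters that were
    # already replicated are simply overwritten with identical data
    for cluster in bitmap:
        target[cluster] = source[cluster]

    return source, target, bitmap


source, target, dirty = replicate_then_mirror(
    {0: "a", 1: "b", 2: "c"}, [(1, "B")], [(2, "C")])
print(source == target)  # True
print(sorted(dirty))     # [1, 2]: cluster 1 is sent a second time, harmlessly
```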

Thanks for the writeup! My goal is to have this in for 4.2 alongside all
of the other bitmap changes I've queued so far.

--js

