From: Hongyang Yang <yanghy@cn.fujitsu.com>
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>,
Walid Nouri <walid.nouri@gmail.com>
Cc: kwolf@redhat.com, eddie.dong@intel.com, qemu-devel@nongnu.org,
"Michael R. Hines" <mrhines@linux.vnet.ibm.com>,
stefanha@redhat.com, Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [Qemu-devel] Microcheckpointing: Memory-VCPU / Disk State consistency
Date: Fri, 12 Sep 2014 09:24:17 +0800
Message-ID: <54124B41.90508@cn.fujitsu.com>
In-Reply-To: <20140911174407.GP2353@work-vm>
On 09/12/2014 01:44 AM, Dr. David Alan Gilbert wrote:
> (I've cc'd in Fam, Stefan, and Kevin for Block stuff, and
> Yang and Eddie for Colo)
>
> * Walid Nouri (walid.nouri@gmail.com) wrote:
>> Hello Michael, Hello Paolo
>> I have "studied" the available documentation/information and tried to
>> get an idea of the QEMU live block operation possibilities.
>>
>> I think the MC protocol doesn't need synchronous block device replication
>> because primary and secondary VM are not synchronous. The state of the
>> primary is always ahead of the state of the secondary. When the primary is
>> in epoch(n) the secondary is in epoch(n-1).
>>
>> What MC needs is a block device agnostic, controlled and asynchronous
>> approach for replicating the contents of block devices and its state changes
>> to the secondary VM while the primary VM is running. Asynchronous block
>> transfer is important to allow maximum performance for the primary VM, while
>> keeping the secondary VM updated with state changes.
>>
>> The block device replication should be possible in two stages or modes.
>>
>> The first stage is the live copy of all block devices of the primary to the
>> secondary. This is necessary if the secondary doesn't have an existing
>> image which is in sync with the primary at the time MC has started. This is
>> not very convenient, but as far as I know there is currently no mechanism
>> for a persistent dirty bitmap in QEMU.
>>
>> The second stage (mode) is the replication of block device state changes
>> (modified blocks) to keep the image on the secondary in sync with the
>> primary. The mirrored blocks must be buffered in ram (block buffer) until
>> the complete Checkpoint (RAM, vCPU, device state) can be committed.
>>
>> For keeping the complete system state consistent on the secondary system
>> there must be a possibility for MC to commit/discard block device state
>> changes. In normal operation the mirrored block device state changes (block
>> buffer) are committed to disk when the complete checkpoint is committed. In
>> case of a crash of the primary system while transferring a checkpoint the
>> data in the block buffer corresponding to the failed Checkpoint must be
>> discarded.
>
> I think for COLO there's a requirement that the secondary can do reads/writes
> in parallel with the primary, and the secondary can discard those reads/writes
> - and that doesn't happen in MC (Yang or Eddie should be able to confirm that).
Exactly, COLO needs this functionality to ensure consistency.
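
To make the buffer/commit/discard idea quoted above concrete, here is a
minimal sketch. It is a toy model in Python, not QEMU code: the class name
BlockBuffer, its methods, and the dict standing in for the disk image are all
assumptions made for illustration. Writes of the in-flight checkpoint are
staged in RAM, become visible on the backing image only on commit, and are
dropped on discard.

```python
class BlockBuffer:
    """Toy model of a per-checkpoint block buffer (hypothetical, not QEMU)."""

    def __init__(self, disk):
        self.disk = disk      # committed state: sector number -> bytes
        self.pending = {}     # writes staged for the in-flight checkpoint

    def write(self, sector, data):
        # Stage the write in RAM; the backing image is not touched yet.
        self.pending[sector] = data

    def read(self, sector):
        # Reads see staged writes first, then fall back to committed data.
        if sector in self.pending:
            return self.pending[sector]
        return self.disk.get(sector)

    def commit(self):
        # Checkpoint completed: apply all staged writes to the image.
        self.disk.update(self.pending)
        self.pending.clear()

    def discard(self):
        # Checkpoint failed: drop staged writes, image stays unchanged.
        self.pending.clear()


disk = {0: b"old"}
buf = BlockBuffer(disk)

buf.write(0, b"new")
buf.write(1, b"extra")
buf.discard()                 # e.g. primary crashed mid-checkpoint
after_discard = dict(disk)

buf.write(0, b"new")
buf.commit()                  # checkpoint fully received
after_commit = dict(disk)
```

In this sketch a discard leaves the image exactly as it was at the last
committed checkpoint, which is the property both MC and COLO rely on.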
>
>> The storage architecture should be "shared nothing" so that no shared
>> storage is required and primary/secondary can have separate block device
>> images.
>
> MC/COLO with shared storage still needs some stuff like this; but it's subtly
> different. They still need to be able to buffer/release modifications
> to the shared storage; if any of this code can also be used in the
> shared-storage configurations it would be good.
Shared storage is more complicated; we don't support it currently.
>
>> I think this can be achieved by drive-mirror and a filter block driver.
>> Another approach could be to exploit the block migration functionality of
>> live migration with a filter block driver.
>>
>> The drive-mirror (and live migration) does not rely on shared storage and
>> allows live block device copy and incremental syncing.
>>
>> A block buffer can be implemented with a QEMU filter block driver. It should
>> sit at the same position as the Quorum driver in the block driver hierarchy.
>> With the block filter approach, MC will be transparent and block device
>> agnostic.
>>
>> The block buffer filter must have an interface which allows MC to control
>> the commits or discards of block device state changes. I have no idea where
>> to put such an interface so that it conforms to QEMU coding style.
>>
>>
>> I'm sure there are alternative and better approaches and I'm open for
>> any ideas
>>
>>
>> Walid
>>
>> On 17.08.2014 11:52, Paolo Bonzini wrote:
>>> On 11/08/2014 22:15, Michael R. Hines wrote:
>>>> Excellent question: QEMU does have a feature called "drive-mirror"
>>>> in block/mirror.c that was introduced a couple of years ago. I'm not
>>>> sure what the
>>>> adoption rate of the feature is, but I would start with that one.
>>>
>>> block/mirror.c is asynchronous, and there's no support for communicating
>>> checkpoints back to the master. However, the quorum disk driver could
>>> be what you need.
>>>
>>> There's also a series on the mailing list that lets quorum read only
>>> from the primary, so that quorum can still do replication and fault
>>> tolerance, but skip fault detection.
>>>
>>> Paolo
>>>
>>>> There is also a second fault tolerance implementation that works a
>>>> little differently called
>>>> "COLO" - you may have seen those emails on the list too, but their
>>>> method does not require a disk replication solution, if I recall correctly.
>>>
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> .
>
--
Thanks,
Yang.