From: Walid Nouri <walid.nouri@gmail.com>
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: kwolf@redhat.com, eddie.dong@intel.com, qemu-devel@nongnu.org,
"Michael R. Hines" <mrhines@linux.vnet.ibm.com>,
stefanha@redhat.com, Paolo Bonzini <pbonzini@redhat.com>,
yanghy@cn.fujitsu.com
Subject: Re: [Qemu-devel] Microcheckpointing: Memory-VCPU / Disk State consistency
Date: Fri, 12 Sep 2014 00:08:20 +0200
Message-ID: <54121D54.2040903@gmail.com>
In-Reply-To: <20140911174407.GP2353@work-vm>
On 11.09.2014 19:44, Dr. David Alan Gilbert wrote:
>> To keep the complete system state consistent on the secondary system,
>> there must be a way for MC to commit/discard block device state
>> changes. In normal operation the mirrored block device state changes (block
>> buffer) are committed to disk when the complete checkpoint is committed. In
>> case of a crash of the primary system while transferring a checkpoint, the
>> data in the block buffer corresponding to the failed checkpoint must be
>> discarded.
>
> I think for COLO there's a requirement that the secondary can do reads/writes
> in parallel with the primary, and the secondary can discard those reads/writes
> - and that doesn't happen in MC (Yang or Eddie should be able to confirm that).
>
>> The storage architecture should be "shared nothing" so that no shared
>> storage is required and primary/secondary can have separate block device
>> images.
I admit that my wording was unintentionally a bit ambiguous :)
I should have written that shared storage should not be mandatory.
I'm coming from an SMB environment, and (redundant) shared storage
systems are still uncommon in small companies :)
I was looking for a storage-agnostic approach that keeps the number of
system components as low as possible while still providing redundancy
and fault tolerance.
>
> MC/COLO with shared storage still needs some stuff like this; but it's subtly
> different. They still need to be able to buffer/release modifications
> to the shared storage; if any of this code can also be used in the
> shared-storage configurations it would be good.
The proposed approach, with a block filter and the commit/discard
protocol, should be storage-agnostic and will also work in a
shared-storage environment, but only with distinct images (because of
the protocol). If the primary and secondary share a common image on
shared storage, a different storage protocol must be used.
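The commit/discard behaviour for distinct images, as described in the quoted proposal, can be modelled in a few lines. This is a minimal sketch, not QEMU code: `SecondaryBlockBuffer` and its methods are hypothetical names, and tagging each mirrored write with its checkpoint epoch is an assumption, made because writes for the next checkpoint may arrive before the previous one has been committed.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SecondaryBlockBuffer:
    """Hypothetical model of the secondary's block filter (distinct images).

    Writes mirrored from the primary are held in RAM, tagged with the
    checkpoint epoch they belong to (assumption), and only reach the
    secondary's own image once that checkpoint has been committed.
    """
    image: dict[int, bytes]  # secondary's private image: block number -> data
    pending: dict[int, dict[int, bytes]] = field(default_factory=dict)

    def receive_write(self, epoch: int, block: int, data: bytes) -> None:
        # Buffer the mirrored write; a later write to the same block wins.
        self.pending.setdefault(epoch, {})[block] = data

    def commit(self, epoch: int) -> None:
        # The complete checkpoint for `epoch` has arrived: apply its writes.
        for block, data in sorted(self.pending.pop(epoch, {}).items()):
            self.image[block] = data

    def discard(self, epoch: int) -> None:
        # The primary failed while transferring `epoch`: drop its writes so the
        # image matches the last committed memory/VCPU state.
        self.pending.pop(epoch, None)
```

For example, after `commit(1)` followed by `discard(2)`, the image holds exactly the writes of epoch 1, matching the memory state the secondary resumes from.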
For a common image the protocol is not commit/discard but
commit/rollback. The primary still sends the block state changes
asynchronously. The secondary buffers them but does not apply them in
normal operation. When the next checkpoint is complete, the secondary
clears the buffer and forgets the old block state data.
If the primary fails, the secondary must roll back the common image
using the block state data corresponding to the current checkpoint.
Otherwise the state of the image and the rest of the system state on
the secondary will be out of sync.
When there is no block state data for the current checkpoint, there is
nothing for the secondary to do on the storage :)
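The commit/rollback variant for a common image can be sketched the same way. This is a hypothetical model, not QEMU code: `RollbackBuffer`, `receive_before_image`, `checkpoint_committed` and `fail_over` are illustrative names. It assumes that, for every write the primary makes to the common image, the secondary receives the block's previous contents (its before-image), since the new data alone would not allow a write to be undone.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RollbackBuffer:
    """Hypothetical model of the secondary's buffer for a shared common image.

    ASSUMPTION: for each write the primary makes to the common image, the
    secondary receives the block's previous contents (its before-image).
    """
    image: dict[int, bytes]  # the common image, also written by the primary
    undo: dict[int, bytes] = field(default_factory=dict)  # block -> before-image

    def receive_before_image(self, block: int, old_data: bytes) -> None:
        # Keep only the first before-image per block: it holds the contents
        # as of the last committed checkpoint.
        self.undo.setdefault(block, old_data)

    def checkpoint_committed(self) -> None:
        # The image is now consistent with the committed checkpoint, so the
        # old block state data can be forgotten.
        self.undo.clear()

    def fail_over(self) -> None:
        # Restore every block written since the last committed checkpoint.
        # With an empty buffer this does nothing. The secondary may only take
        # over once this loop has finished; a crash inside it leaves the
        # common image inconsistent.
        for block, old_data in sorted(self.undo.items()):
            self.image[block] = old_data
        self.undo.clear()
```

Because `fail_over` has to rewrite every buffered block before the secondary can go active, its running time grows with the amount of block state data for the current checkpoint.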
There is a small danger in this, though: if the secondary fails during
the rollback, the common image will be left in an inconsistent state.
I think this risk cannot be avoided when using a common image, but the
same unfortunate situation can also occur in other scenarios.
Sharing a common image with this protocol leads to a longer failover
time whenever block device state data exists for the current
checkpoint: the secondary must initiate the rollback and wait until all
blocks of the current checkpoint have been committed to the common
image before taking over the active role.
Walid