From: "Michael R. Hines" <mrhines@linux.vnet.ibm.com>
To: Walid Nouri <walid.nouri@gmail.com>,
qemu-devel@nongnu.org, michael@hinespot.com, hinesmr@cn.ibm.com
Subject: Re: [Qemu-devel] Microcheckpointing: Memory-VCPU / Disk State consistency
Date: Thu, 14 Aug 2014 06:28:02 +0800
Message-ID: <53EBE672.7050903@linux.vnet.ibm.com>
In-Reply-To: <53EB7026.805@gmail.com>

On 08/13/2014 10:03 PM, Walid Nouri wrote:
>
> While looking to find some ideas for approaches to replicating block
> devices I have read the paper about the Remus implementation. I think
> MC can take a similar approach for local disk.
>
I agree.
> Here are the main facts that I have understood:
>
> Local disk contents are viewed as internal state of the primary and
> secondary.
> In the explanation they describe that, to keep the disk semantics of
> the primary and to allow the primary to run speculatively, all disk
> state changes are written directly to the disk. In parallel, they are
> sent asynchronously to the secondary. The secondary keeps the pending
> write requests in two disk buffers: a speculation buffer and a
> write-out buffer.
>
> After the reception of the next checkpoint the secondary copies the
> speculation buffer to the write-out buffer, commits the checkpoint and
> applies the write-out buffer to its local disk.
>
> When the primary fails the secondary must wait until the write-out
> buffer has been completely written to disk before changing the
> execution mode to run as primary. In this case (failure of primary)
> the secondary discards pending disk writes in its speculation buffer.
> This protocol keeps the disk state consistent with the last checkpoint.
>
> Remus uses the Xen-specific blktap driver. As far as I know this can’t
> be used with QEMU (KVM).
>
> I must see how drive-mirror can be used for this kind of protocol.
>
That's all correct. Theoretically, we would do exactly the same thing:
drive-mirror on the source would write immediately to disk but follow
the same commit semantics on the destination as Remus does on Xen.
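
The buffering protocol quoted above can be modelled in a few lines. This is only a toy sketch of the secondary's side, assuming an in-memory dict stands in for the local disk. The class and method names (SecondaryDisk, receive_write, commit_checkpoint, failover) are hypothetical and are not part of Remus, QEMU or drive-mirror.

```python
from dataclasses import dataclass, field


@dataclass
class SecondaryDisk:
    """Toy model of the secondary's disk buffering (all names hypothetical)."""

    disk: dict = field(default_factory=dict)         # sector -> data on the local disk
    speculation: list = field(default_factory=list)  # writes since the last checkpoint
    write_out: list = field(default_factory=list)    # committed writes not yet on disk

    def receive_write(self, sector, data):
        # Writes arrive asynchronously from the primary and are only buffered.
        self.speculation.append((sector, data))

    def commit_checkpoint(self):
        # Checkpoint received: move the speculative writes into the
        # write-out buffer, then apply that buffer to the local disk.
        self.write_out.extend(self.speculation)
        self.speculation.clear()
        self.flush()

    def flush(self):
        # Apply committed writes in arrival order.
        for sector, data in self.write_out:
            self.disk[sector] = data
        self.write_out.clear()

    def failover(self):
        # Primary failed: finish writing out committed data, then discard
        # everything that belongs to the unfinished checkpoint.
        self.flush()
        self.speculation.clear()


secondary = SecondaryDisk()
secondary.receive_write(0, b"A")
secondary.commit_checkpoint()      # sector 0 is now committed
secondary.receive_write(0, b"B")   # speculative, never committed
secondary.receive_write(1, b"C")   # speculative, never committed
secondary.failover()
```

After the failover the model's disk holds only the write covered by the last checkpoint, which is the consistency property the protocol is meant to give.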
>
> I have taken a look at COLO.
>
> IMHO there are two points. Custom changes to the TCP stack are a no-go
> for proprietary operating systems like Windows. It makes COLO
> application-agnostic but not operating-system-agnostic. The other
> point is that with I/O-intensive workloads COLO will tend to behave
> like MC. This is my point of view, but I didn’t invest much time to
> understand everything in detail.
>
Actually, if I remember correctly, the TCP stack is only modified at the
hypervisor level: they intercept and translate TCP sequence numbers
"in-flight" to detect divergence of the source and destination, which is
not a big problem if the implementation is well done.

My hope for the future was that the two approaches could be used in a
"hybrid" manner. MC actually takes a much bigger performance hit for I/O
than COLO does because of its buffering requirements.

On the other hand, MC would perform better in a memory-intensive or
CPU-intensive situation, so maybe QEMU could "switch" between the two
mechanisms at different points in time when the resource bottleneck changes.
- Michael
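
The "switch" idea in the message above could look roughly like the following policy function. It is a sketch only: the Load fields, the margin value and the choose_mechanism name are assumptions made for illustration, and nothing like this exists in QEMU as far as this thread states.

```python
from dataclasses import dataclass


@dataclass
class Load:
    """Hypothetical per-interval measurements, both normalised to 0..1."""

    io_pressure: float      # how I/O-bound the guest currently is
    memory_pressure: float  # how memory/CPU-bound the guest currently is


def choose_mechanism(load, current="MC", margin=0.1):
    """Pick "COLO" or "MC" for the next interval.

    The margin adds hysteresis so the choice does not flap when the two
    pressures are close to each other.
    """
    if load.io_pressure > load.memory_pressure + margin:
        # I/O-bound: MC's output buffering is the expensive part.
        return "COLO"
    if load.memory_pressure > load.io_pressure + margin:
        # Memory- or CPU-bound: MC is expected to do better here.
        return "MC"
    return current  # no clear bottleneck, keep what is running


io_bound = choose_mechanism(Load(io_pressure=0.8, memory_pressure=0.1))
mem_bound = choose_mechanism(Load(io_pressure=0.1, memory_pressure=0.8))
unclear = choose_mechanism(Load(io_pressure=0.5, memory_pressure=0.45), current="COLO")
```

Keeping the current mechanism when neither pressure clearly dominates is a design choice of this sketch, on the assumption that switching mechanisms has a cost of its own.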