From: "Michael R. Hines" <mrhines@linux.vnet.ibm.com>
To: Walid Nouri <walid.nouri@gmail.com>,
	qemu-devel@nongnu.org, michael@hinespot.com, hinesmr@cn.ibm.com
Subject: Re: [Qemu-devel] Microcheckpointing: Memory-VCPU / Disk State consistency
Date: Thu, 14 Aug 2014 06:28:02 +0800
Message-ID: <53EBE672.7050903@linux.vnet.ibm.com>
In-Reply-To: <53EB7026.805@gmail.com>

On 08/13/2014 10:03 PM, Walid Nouri wrote:
>
> While looking to find some ideas for approaches to replicating block 
> devices I have read the paper about the Remus implementation. I think 
> MC can take a similar approach for local disk.
>

I agree.

> Here are the main facts that I have understood:
>
> Local disk contents are viewed as internal state of the primary and 
> secondary.
> In the explanation they describe that, to keep the disk semantics of 
> the primary and to allow the primary to run speculatively, all disk 
> state changes are written directly to the disk. In parallel, they are 
> sent asynchronously to the secondary. The secondary keeps the pending 
> write requests in two disk buffers: a speculation-disk-buffer and a 
> write-out-buffer.
>
> After the reception of the next checkpoint the secondary copies the 
> speculation buffer to the write out buffer, commits the checkpoint and 
> applies the write out buffer to its local disk.
>
> When the primary fails the secondary must wait until the write-out-buffer 
> has been completely written to disk before changing the 
> execution mode to run as primary. In this case (failure of primary) 
> the secondary discards pending disk writes in its speculation buffer. 
> This protocol keeps the disk state consistent with the last checkpoint.
>
> Remus uses the Xen-specific blktap driver. As far as I know this can’t 
> be used with QEMU (KVM).
>
> I must see how drive-mirror can be used for this kind of protocol.
>

That's all correct. Theoretically, we would do exactly the same thing: 
drive-mirror on the source would write immediately to disk but follow 
the same commit semantics on the destination as Xen.
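
To make the destination-side commit semantics concrete, here is a minimal
sketch of the two-buffer protocol described above. It is a toy in-memory
model, not Remus or QEMU code: the class name, the method names and the
(sector, data) write format are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical request format: (sector number, data).
Write = Tuple[int, bytes]


@dataclass
class SecondaryDisk:
    """Toy model of the secondary's disk handling between checkpoints."""

    disk: Dict[int, bytes] = field(default_factory=dict)
    speculation: List[Write] = field(default_factory=list)
    write_out: List[Write] = field(default_factory=list)

    def receive_write(self, sector: int, data: bytes) -> None:
        # Writes mirrored from the primary are only buffered here,
        # never applied to the local disk directly.
        self.speculation.append((sector, data))

    def commit_checkpoint(self) -> None:
        # A checkpoint arrived: the buffered writes now belong to
        # committed state, so move them to the write-out buffer and apply.
        self.write_out.extend(self.speculation)
        self.speculation.clear()
        self.flush()

    def flush(self) -> None:
        # Apply committed writes to the local disk in arrival order.
        for sector, data in self.write_out:
            self.disk[sector] = data
        self.write_out.clear()

    def failover(self) -> None:
        # Primary failed: finish the committed writes first, then drop
        # everything that arrived after the last checkpoint.
        self.flush()
        self.speculation.clear()
```

In this model a write only reaches the secondary's disk once
commit_checkpoint() has run, and failover() leaves the disk exactly as it
was at the last committed checkpoint.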

>
> I have taken a look at COLO.
>

> IMHO there are two points. Custom changes of the TCP stack are a no-go 
> for proprietary operating systems like Windows. It makes COLO 
> application-agnostic but not operating-system-agnostic. The other 
> point is that with I/O-intensive workloads COLO will tend to behave 
> like MC. This is my point of view but I didn’t invest much time to 
> understand everything in detail.
>

Actually, if I remember correctly, the TCP stack is only modified at the 
hypervisor level - they are intercepting and translating TCP sequence 
numbers "in-flight" to detect divergence of the source and destination - 
which is not a big problem if the implementation is well-done.

My hope in the future was that the two approaches could be used in a 
"Hybrid" manner - actually MC has much more of a performance hit for I/O 
than COLO does because of its buffering requirements.

On the other hand, MC would perform better in a memory-intensive or 
CPU-intensive situation - so maybe QEMU could "switch" between the two 
mechanisms at different points in time when the resource bottleneck changes.
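
As a rough illustration of the "switch" idea, a selection policy could
look something like the sketch below. Nothing like this exists in QEMU
as far as this mail is concerned: the function name, the two normalized
load inputs and the simple comparison are all hypothetical.

```python
def choose_mechanism(io_load: float, mem_cpu_load: float) -> str:
    """Pick a replication mechanism from two load estimates (0.0 to 1.0).

    Both inputs are hypothetical, normalized measures of where the
    guest's current bottleneck is.
    """
    # I/O-bound: MC's buffering costs the most here, so prefer COLO.
    if io_load > mem_cpu_load:
        return "colo"
    # Memory- or CPU-bound: MC is expected to perform better.
    return "mc"
```

A real policy would also need some hysteresis so the guest does not flap
between the two mechanisms when the loads are close.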

- Michael
