From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: "Michael R. Hines" <mrhines@linux.vnet.ibm.com>
Cc: Walid Nouri <walid.nouri@gmail.com>,
hinesmr@cn.ibm.com, qemu-devel@nongnu.org, michael@hinespot.com
Subject: Re: [Qemu-devel] Microcheckpointing: Memory-VCPU / Disk State consistency
Date: Thu, 14 Aug 2014 11:58:03 +0100
Message-ID: <20140814105802.GD2503@work-vm>
In-Reply-To: <53EBE672.7050903@linux.vnet.ibm.com>
cc'ing in a couple of the COLOers.
* Michael R. Hines (mrhines@linux.vnet.ibm.com) wrote:
> On 08/13/2014 10:03 PM, Walid Nouri wrote:
> >
> >While looking to find some ideas for approaches to replicating block
> >devices I have read the paper about the Remus implementation. I think MC
> >can take a similar approach for local disk.
> >
>
> I agree.
>
> >Here are the main facts that I have understood:
> >
> >Local disk contents are viewed as internal state of the primary and secondary.
> >They explain that, to keep the disk semantics of the primary and to allow
> >the primary to run speculatively, all disk state changes are written
> >directly to its disk and, in parallel, sent asynchronously to the
> >secondary. The secondary keeps the pending write requests in two disk
> >buffers: a speculation disk buffer and a write-out buffer.
> >
> >After the reception of the next checkpoint the secondary copies the
> >speculation buffer to the write out buffer, commits the checkpoint and
> >applies the write out buffer to its local disk.
> >
> >When the primary fails the secondary must wait until the write-out buffer
> >has been completely written to disk before changing the execution mode
> >to run as primary. In this case (failure of primary) the secondary
> >discards pending disk writes in its speculation buffer. This protocol
> >keeps the disk state consistent with the last checkpoint.
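
The buffering protocol described above can be sketched as a toy model. This is only an illustration under stated assumptions, not Remus or QEMU code: the class name SecondaryDiskBuffers, its method names, and the use of a dict as a stand-in for the local disk are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SecondaryDiskBuffers:
    """Toy model of the secondary's two disk buffers (hypothetical names)."""
    disk: dict = field(default_factory=dict)         # stand-in for the local disk
    speculation: list = field(default_factory=list)  # writes since the last checkpoint
    write_out: list = field(default_factory=list)    # writes of a committed checkpoint

    def receive_write(self, sector, data):
        # Writes arrive asynchronously from the primary and are only buffered.
        self.speculation.append((sector, data))

    def checkpoint(self):
        # On a checkpoint: move the speculation buffer into the write-out
        # buffer, then apply the write-out buffer to the local disk.
        self.write_out.extend(self.speculation)
        self.speculation.clear()
        self.flush_write_out()

    def flush_write_out(self):
        # Apply committed writes to the local disk in arrival order.
        for sector, data in self.write_out:
            self.disk[sector] = data
        self.write_out.clear()

    def failover(self):
        # Primary failed: finish writing committed data, then discard the
        # speculative writes so the disk matches the last checkpoint.
        self.flush_write_out()
        self.speculation.clear()
        return self.disk
```

In this model, writes received after the last checkpoint never reach the disk if failover() is called, which is the consistency property the description relies on.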
> >
> >Remus uses the Xen-specific blktap driver. As far as I know this can't be
> >used with QEMU (KVM).
> >
> >I must see how drive-mirror can be used for this kind of protocol.
> >
>
> That's all correct. Theoretically, we would do exactly the same thing:
> drive-mirror on the source would write immediately to disk but follow the
> same commit semantics on the destination as Xen.
>
> >
> >I have taken a look at COLO.
> >
>
> >IMHO there are two points. Custom changes of the TCP-Stack are a no-go for
> >proprietary operating systems like Windows. It makes COLO application
> >agnostic but not operating system agnostic. The other point is that with
> >I/O intensive workloads COLO will tend to behave like MC. This is my point
> >of view but I didn't invest much time to understand everything in detail.
> >
>
> Actually, if I remember correctly, the TCP stack is only modified at the
> hypervisor level - they are intercepting and translating TCP sequence
> numbers "in-flight" to detect divergence of the source and destination -
> which is not a big problem if the implementation is well-done.
The 2013 paper says:
'COLO modifies the guest OS's TCP/IP stack in order to make the behavior
more deterministic. '
but does say that an alternative might be to have a
' comparison function that operates transparently over re-assembled TCP streams'
> My hope in the future was that the two approaches could be used in a
> "Hybrid" manner - actually MC has much more of a performance hit for I/O
> than COLO does because of its buffering requirements.
>
> On the other hand, MC would perform better in a memory-intensive or
> CPU-intensive situation - so maybe QEMU could "switch" between the two
> mechanisms at different points in time when the resource bottleneck changes.
If the primary were to rate-limit the number of resynchronisations
(and send the secondary a message as soon as it knew a resync was needed) that
would get some of the way, but then the only difference from microcheckpointing
at that point is the secondary doing a wasteful copy and sending the packets across;
it seems it should be easy to disable those if it knew that a resync was going to
happen.
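
One possible reading of that idea, as a minimal sketch only: the primary tells the secondary about a needed resync immediately, defers the resync itself to honour a minimum interval, and the secondary skips its copy-and-send work while a resync is pending. The Primary and Secondary classes, their methods, and the min_interval parameter are hypothetical and do not correspond to existing COLO or MC code.

```python
class Secondary:
    """Toy secondary that skips output work while a resync is pending."""

    def __init__(self):
        self.resync_pending = False
        self.packets_sent = 0

    def notify_resync(self):
        # Message from the primary: a resync is coming.
        self.resync_pending = True

    def emit_packet(self):
        # Copying and sending a packet for comparison is wasted effort
        # if the state is about to be replaced by a resync.
        if self.resync_pending:
            return False
        self.packets_sent += 1
        return True

    def resync_done(self):
        self.resync_pending = False


class Primary:
    """Toy primary that rate-limits resynchronisations."""

    def __init__(self, min_interval, secondary):
        self.min_interval = min_interval  # minimum seconds between resyncs
        self.secondary = secondary
        self.last_resync = None

    def on_divergence(self, now):
        # Notify the secondary right away, then return the earliest time
        # at which the rate limit allows the resync to run.
        self.secondary.notify_resync()
        if self.last_resync is None:
            return now
        return max(now, self.last_resync + self.min_interval)

    def do_resync(self, now):
        self.last_resync = now
        self.secondary.resync_done()
```

Between the notification and the deferred resync, the secondary in this sketch does no comparison work at all, which is the saving suggested above.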
Dave
> - Michael
>
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK