qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: "Michael R. Hines" <mrhines@linux.vnet.ibm.com>
Cc: Walid Nouri <walid.nouri@gmail.com>,
	hinesmr@cn.ibm.com, qemu-devel@nongnu.org, michael@hinespot.com
Subject: Re: [Qemu-devel] Microcheckpointing: Memory-VCPU / Disk State consistency
Date: Thu, 14 Aug 2014 11:58:03 +0100	[thread overview]
Message-ID: <20140814105802.GD2503@work-vm> (raw)
In-Reply-To: <53EBE672.7050903@linux.vnet.ibm.com>

cc'ing in a couple of the COLOers.

* Michael R. Hines (mrhines@linux.vnet.ibm.com) wrote:
> On 08/13/2014 10:03 PM, Walid Nouri wrote:
> >
> >While looking to find some ideas for approaches to replicating block
> >devices I have read the paper about the Remus implementation. I think MC
> >can take a similar approach for local disk.
> >
> 
> I agree.
> 
> >Here are the main facts that I have understood:
> >
> >Local disk contents is viewed as internal state the primary and secondary.
> >In the explanation they describe that for keeping disc semantics of the
> >primary and to allow the primary to run speculatively all disc state
> >changes are directly written to the disk. In parrallel and asynchronously
> >send to the secondary. The secondary keeps the pending writing requests in
> >two disk buffers. A speculation-disk-buffer and a write-out-buffer.
> >
> >After the reception of the next checkpoint the secondary copies the
> >speculation buffer to the write out buffer, commits the checkpoint and
> >applies the write out buffer to its local disk.
> >
> >When the primary fails the secondary must wait until write-out-buffer has
> >been completely written to disk before before changing the execution mode
> >to run as primary. In this case (failure of primary) the secondary
> >discards pending disk writes in its speculation buffer. This protocol
> >keeps the disc state consistent with the last checkpoint.
> >
> >Remus uses the XEN specific blktap driver. As far as I know this can?t be
> >used with QEMU (KVM).
> >
> >I must see how drive-mirror can be used for this kind of protocol.
> >
> 
> That's all correct. Theoretically, we would do exactly the same thing:
> drive-mirror on the source would write immediately to disk but follow the
> same commit semantics on the destination as Xen.
> 
> >
> >I have taken a look at COLO.
> >
> 
> >IMHO there are two points. Custom changes of the TCP-Stack are a no-go for
> >proprietary operating systems like Windows. It makes COLO application
> >agnostic but not operating system agnostic. The other point is that with
> >I/O intensive workloads COLO will tend to behave like MC. This is my point
> >of view but i didn?t invest much time to understand everything in detail.
> >
> 
> Actually, if I remember correctly, the TCP stack is only modified at the
> hypervisor level - they are intercepting and translating TCP sequence
> numbers "in-flight" to detect divergence of the source and destination -
> which is not a big problem if the implementation is well-done.

The 2013 paper says:
   'COLO modifies the guest OS’s TCP/IP stack in order to make the behavior
    more deterministic. '
but does say that an alternative might be to have a
  ' comparison function that operates transparently over re-assembled TCP streams'

> My hope in the future was that the two approaches could be used in a
> "Hybrid" manner - actually MC has much more of a performance hit for I/O
> than COLO does because of its buffering requirements.
> 
> On the other hand, MC would perform better in a memory-intensive or
> CPU-intensive situation - so maybe QEMU could "switch" between the two
> mechanisms at different points in time when the resource bottleneck changes.

If the primary were to rate-limit the number of resynchronisations
(and send the secondary a message as soon as it knew a resync was needed) that
would get some of the way, but then the only difference from microcheckpointing
at that point is the secondary doing a wasteful copy and sending the packets across;
it seems it should be easy to disable those if it knew that a resync was going to
happen.

Dave

> - Michael
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

  reply	other threads:[~2014-08-14 10:58 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <53D8FF52.9000104@gmail.com>
     [not found] ` <1406820870.2680.3.camel@usa>
     [not found]   ` <53DBE726.4050102@gmail.com>
     [not found]     ` <1406947532.2680.11.camel@usa>
     [not found]       ` <53E0AA60.9030404@gmail.com>
     [not found]         ` <1407376929.21497.2.camel@usa>
     [not found]           ` <53E60F34.1070607@gmail.com>
     [not found]             ` <1407587152.24027.5.camel@usa>
2014-08-11 17:22               ` [Qemu-devel] Microcheckpointing: Memory-VCPU / Disk State consistency Walid Nouri
2014-08-11 20:15                 ` Michael R. Hines
2014-08-17  9:52                   ` Paolo Bonzini
2014-08-19  8:58                     ` Walid Nouri
2014-09-10 15:43                     ` Walid Nouri
2014-09-11  1:50                       ` Michael R. Hines
2014-09-12  1:34                         ` Hongyang Yang
2014-09-11  7:27                       ` Paolo Bonzini
2014-09-11 17:44                       ` Dr. David Alan Gilbert
2014-09-11 22:08                         ` Walid Nouri
2014-09-12  1:24                         ` Hongyang Yang
2014-09-12 11:07                         ` Stefan Hajnoczi
2014-09-17 20:53                           ` Walid Nouri
2014-09-18 13:56                             ` Stefan Hajnoczi
2014-09-23 16:36                               ` Walid Nouri
2014-09-24  8:47                                 ` Stefan Hajnoczi
2014-09-25 16:06                                   ` Walid Nouri
2014-08-11 20:15                 ` Michael R. Hines
2014-08-13 14:03                   ` Walid Nouri
2014-08-13 22:28                     ` Michael R. Hines
2014-08-14 10:58                       ` Dr. David Alan Gilbert [this message]
2014-08-14 17:23                         ` Michael R. Hines
2014-08-19  8:33                         ` Walid Nouri

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140814105802.GD2503@work-vm \
    --to=dgilbert@redhat.com \
    --cc=hinesmr@cn.ibm.com \
    --cc=michael@hinespot.com \
    --cc=mrhines@linux.vnet.ibm.com \
    --cc=qemu-devel@nongnu.org \
    --cc=walid.nouri@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).