Re: [Qemu-devel] Microcheckpointing: Memory-VCPU / Disk State consistency

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Walid Nouri <walid.nouri@gmail.com>
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: "michael@hinespot.com" <michael@hinespot.com>,
	"hinesmr@cn.ibm.com" <hinesmr@cn.ibm.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"Michael R. Hines" <mrhines@linux.vnet.ibm.com>
Subject: Re: [Qemu-devel] Microcheckpointing: Memory-VCPU / Disk State consistency
Date: Tue, 19 Aug 2014 10:33:45 +0200	[thread overview]
Message-ID: <059E3E5C-0FD8-4876-AFB7-617EBC52055C@gmail.com> (raw)
In-Reply-To: <20140814105802.GD2503@work-vm>

Hi,
I have tried to find more information on how to use drive-mirror besides what is available on the wiki. This was not very satisfactory...

This may sound naive but are there some code examples in "c" or any other language, documentation of any kind, blog entries (developer), presentation videos or any other source of information to get started?

Walid


> Am 14.08.2014 um 12:58 schrieb "Dr. David Alan Gilbert" <dgilbert@redhat.com>:
> 
> cc'ing in a couple of the COLOers.
> 
> * Michael R. Hines (mrhines@linux.vnet.ibm.com) wrote:
>>> On 08/13/2014 10:03 PM, Walid Nouri wrote:
>>> 
>>> While looking to find some ideas for approaches to replicating block
>>> devices I have read the paper about the Remus implementation. I think MC
>>> can take a similar approach for local disk.
>> 
>> I agree.
>> 
>>> Here are the main facts that I have understood:
>>> 
>>> Local disk contents is viewed as internal state the primary and secondary.
>>> In the explanation they describe that for keeping disc semantics of the
>>> primary and to allow the primary to run speculatively all disc state
>>> changes are directly written to the disk. In parrallel and asynchronously
>>> send to the secondary. The secondary keeps the pending writing requests in
>>> two disk buffers. A speculation-disk-buffer and a write-out-buffer.
>>> 
>>> After the reception of the next checkpoint the secondary copies the
>>> speculation buffer to the write out buffer, commits the checkpoint and
>>> applies the write out buffer to its local disk.
>>> 
>>> When the primary fails the secondary must wait until write-out-buffer has
>>> been completely written to disk before before changing the execution mode
>>> to run as primary. In this case (failure of primary) the secondary
>>> discards pending disk writes in its speculation buffer. This protocol
>>> keeps the disc state consistent with the last checkpoint.
>>> 
>>> Remus uses the XEN specific blktap driver. As far as I know this can?t be
>>> used with QEMU (KVM).
>>> 
>>> I must see how drive-mirror can be used for this kind of protocol.
>> 
>> That's all correct. Theoretically, we would do exactly the same thing:
>> drive-mirror on the source would write immediately to disk but follow the
>> same commit semantics on the destination as Xen.
>> 
>>> 
>>> I have taken a look at COLO.
>> 
>>> IMHO there are two points. Custom changes of the TCP-Stack are a no-go for
>>> proprietary operating systems like Windows. It makes COLO application
>>> agnostic but not operating system agnostic. The other point is that with
>>> I/O intensive workloads COLO will tend to behave like MC. This is my point
>>> of view but i didn?t invest much time to understand everything in detail.
>> 
>> Actually, if I remember correctly, the TCP stack is only modified at the
>> hypervisor level - they are intercepting and translating TCP sequence
>> numbers "in-flight" to detect divergence of the source and destination -
>> which is not a big problem if the implementation is well-done.
> 
> The 2013 paper says:
>   'COLO modifies the guest OS’s TCP/IP stack in order to make the behavior
>    more deterministic. '
> but does say that an alternative might be to have a
>  ' comparison function that operates transparently over re-assembled TCP streams'
> 
>> My hope in the future was that the two approaches could be used in a
>> "Hybrid" manner - actually MC has much more of a performance hit for I/O
>> than COLO does because of its buffering requirements.
>> 
>> On the other hand, MC would perform better in a memory-intensive or
>> CPU-intensive situation - so maybe QEMU could "switch" between the two
>> mechanisms at different points in time when the resource bottleneck changes.
> 
> If the primary were to rate-limit the number of resynchronisations
> (and send the secondary a message as soon as it knew a resync was needed) that
> would get some of the way, but then the only difference from microcheckpointing
> at that point is the secondary doing a wasteful copy and sending the packets across;
> it seems it should be easy to disable those if it knew that a resync was going to
> happen.
> 
> Dave
> 
>> - Michael
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

     prev parent reply	other threads:[~2014-08-19  8:34 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <53D8FF52.9000104@gmail.com>
     [not found] ` <1406820870.2680.3.camel@usa>
     [not found]   ` <53DBE726.4050102@gmail.com>
     [not found]     ` <1406947532.2680.11.camel@usa>
     [not found]       ` <53E0AA60.9030404@gmail.com>
     [not found]         ` <1407376929.21497.2.camel@usa>
     [not found]           ` <53E60F34.1070607@gmail.com>
     [not found]             ` <1407587152.24027.5.camel@usa>
2014-08-11 17:22               ` [Qemu-devel] Microcheckpointing: Memory-VCPU / Disk State consistency Walid Nouri
2014-08-11 20:15                 ` Michael R. Hines
2014-08-17  9:52                   ` Paolo Bonzini
2014-08-19  8:58                     ` Walid Nouri
2014-09-10 15:43                     ` Walid Nouri
2014-09-11  1:50                       ` Michael R. Hines
2014-09-12  1:34                         ` Hongyang Yang
2014-09-11  7:27                       ` Paolo Bonzini
2014-09-11 17:44                       ` Dr. David Alan Gilbert
2014-09-11 22:08                         ` Walid Nouri
2014-09-12  1:24                         ` Hongyang Yang
2014-09-12 11:07                         ` Stefan Hajnoczi
2014-09-17 20:53                           ` Walid Nouri
2014-09-18 13:56                             ` Stefan Hajnoczi
2014-09-23 16:36                               ` Walid Nouri
2014-09-24  8:47                                 ` Stefan Hajnoczi
2014-09-25 16:06                                   ` Walid Nouri
2014-08-11 20:15                 ` Michael R. Hines
2014-08-13 14:03                   ` Walid Nouri
2014-08-13 22:28                     ` Michael R. Hines
2014-08-14 10:58                       ` Dr. David Alan Gilbert
2014-08-14 17:23                         ` Michael R. Hines
2014-08-19  8:33                         ` Walid Nouri [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=059E3E5C-0FD8-4876-AFB7-617EBC52055C@gmail.com \
    --to=walid.nouri@gmail.com \
    --cc=dgilbert@redhat.com \
    --cc=hinesmr@cn.ibm.com \
    --cc=michael@hinespot.com \
    --cc=mrhines@linux.vnet.ibm.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).