Re: [Qemu-devel] [RFC PATCH v2 01/12] mc: add documentation for micro-checkpointing

From: "Michael R. Hines" <mrhines@linux.vnet.ibm.com>
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: SADEKJ@il.ibm.com, pbonzini@redhat.com, quintela@redhat.com,
	BIRAN@il.ibm.com, qemu-devel@nongnu.org, EREZH@il.ibm.com,
	owasserm@redhat.com, onom@us.ibm.com, hinesmr@cn.ibm.com,
	isaku.yamahata@gmail.com, gokul@us.ibm.com, dbulkow@gmail.com,
	junqing.wang@cs2c.com.cn, abali@us.ibm.com,
	lig.fnst@cn.fujitsu.com, "Michael R. Hines" <mrhines@us.ibm.com>
Subject: Re: [Qemu-devel] [RFC PATCH v2 01/12] mc: add documentation for micro-checkpointing
Date: Mon, 03 Mar 2014 14:08:47 +0800
Message-ID: <53141C6F.6050603@linux.vnet.ibm.com>
In-Reply-To: <20140221094433.GA2483@work-vm>

On 02/21/2014 05:44 PM, Dr. David Alan Gilbert wrote:
>> It's not clear to me how much (if any) of this control loop should
>> be in QEMU or in the management software, but I would definitely agree
>> that at a minimum the ability to detect the situation and remedy
>> it should be in QEMU. I'm not entirely convinced that the
>> ability to *decide* to remedy the situation should be in QEMU, though.
> The management software access is low frequency, high latency; it should
> be setting general parameters (max memory allowed, desired checkpoint
> frequency etc) but I don't see that we can use it to do anything on
> a sooner-than-a-few-seconds basis; so yes it can monitor things and
> tweak the knobs if it sees the host as a whole is getting tight on RAM
> etc - but we can't rely on it to throw on the brakes if this guest
> suddenly decides to take bucket loads of RAM; something has to react
> quickly in relation to previously set limits.

I agree - the flag I mentioned previously would do just that (probably
a multi-valued state rather than a boolean). Setting it would tell
QEMU which kind of sacrifice to make:

A flag of "0" might mean "Throttle the guest in an emergency"
A flag of "1" might mean "Throttling is not acceptable, just let the
  guest use the extra memory"
A flag of "2" might mean "Neither one is acceptable, fail now and inform
  the management software to restart somewhere else".

Or something to that effect.
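
To make that concrete, here is a rough sketch of how such a
three-valued state could be consulted inside QEMU. All of the names
below (MCPressurePolicy, MCAction, mc_react) are made up for
illustration and are not part of this patch series; the memory limit
is assumed to have been set earlier by the management software.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical policy values, matching the "0/1/2" flag described above. */
typedef enum MCPressurePolicy {
    MC_POLICY_THROTTLE = 0, /* throttle the guest in an emergency */
    MC_POLICY_GROW     = 1, /* throttling unacceptable, use the extra memory */
    MC_POLICY_FAIL     = 2, /* neither acceptable, fail and tell management */
} MCPressurePolicy;

/* Hypothetical reactions QEMU could take without asking management. */
typedef enum MCAction {
    MC_ACTION_NONE,
    MC_ACTION_THROTTLE_GUEST,
    MC_ACTION_ALLOW_GROWTH,
    MC_ACTION_FAIL_AND_NOTIFY,
} MCAction;

/*
 * Pick a reaction when checkpoint memory usage is compared against the
 * limit previously set by management. No decision is made here beyond
 * applying the preconfigured policy.
 */
MCAction mc_react(MCPressurePolicy policy, uint64_t used, uint64_t limit)
{
    if (used <= limit) {
        return MC_ACTION_NONE; /* still within the configured budget */
    }

    switch (policy) {
    case MC_POLICY_THROTTLE:
        return MC_ACTION_THROTTLE_GUEST;
    case MC_POLICY_GROW:
        return MC_ACTION_ALLOW_GROWTH;
    case MC_POLICY_FAIL:
        return MC_ACTION_FAIL_AND_NOTIFY;
    }

    assert(!"unknown MC pressure policy");
    return MC_ACTION_FAIL_AND_NOTIFY; /* safest fallback if asserts are off */
}
```

The point of the sketch is only that the fast path applies a policy
chosen in advance; management sets the policy and the limit at low
frequency, and QEMU reacts immediately.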

>>>> If you block the guest from being checkpointed,
>>>> then what happens if there is a failure during that extended period?
>>>> We will have saved memory at the expense of availability.
>>> If the active machine fails during this time then the secondary carries
>>> on from its last good snapshot in the knowledge that the active
>>> never finished the new snapshot and so never uncorked its previous packets.
>>>
>>> If the secondary machine fails during this time then the active drops
>>> its nascent snapshot and carries on.
>> Yes, that makes sense. Where would that policy go, though,
>> continuing the above concern?
> I think there has to be some input from the management layer for failover,
> because (as per my split-brain concerns) something has to make the decision
> about which of the source/destination is to take over, and I don't
> believe individual instances have that information.

Agreed - so the "ability" (as hinted at above) should be in QEMU,
but the decision to recover from the situation probably should not
be, where "recover" means that the VM is back in a fully running,
fully fault-tolerant protected state (potentially with the source VM
on a different machine than it was before).

>
>>>> Well, that's simple: If there is a failure of the source, the destination
>>>> will simply revert to the previous checkpoint using the same mode
>>>> of operation. The lost ACKs that you're curious about only
>>>> apply to the checkpoint that is in progress. Just because a
>>>> checkpoint is in progress does not mean that the previous checkpoint
>>>> is thrown away - it is already loaded into the destination's memory
>>>> and ready to be activated.
>>> I still don't see why, if the link between them fails, the destination
>>> doesn't fall back to its previous checkpoint, AND the source carries
>>> on running - I don't see how they can differentiate which of them has failed.
>> I think you're forgetting that the source I/O is buffered - it doesn't
>> matter that the source VM is still running. As long as its output is
>> buffered - it cannot have any non-fault-tolerant effect on the outside
>> world.
>>
>> In the future, if a technician accesses the machine or the network
>> is restored, the management software can terminate the stale
>> source virtual machine.
> I think going with my comment above; I'm working on the basis it's just
> as likely for the destination to fail as it is for the source to fail,
> and a destination failure shouldn't kill the source; and in the case
> of a destination failure the source is going to have to let its buffered
> I/Os start going again.

Yes, that's correct, but only after management software knows about
the failure. If we're on a tightly-coupled fast LAN, there's no reason
to believe that libvirt, for example, would be so slow that we cannot
wait a few extra (tens of?) milliseconds after destination failure to
choose a new destination and restart the previous checkpoint.
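
The reason waiting is safe is that the source's output stays corked
until a checkpoint is acknowledged. A minimal sketch of that idea,
assuming packets are tagged with the checkpoint epoch they were
produced in (MCOutputBuffer and the mc_* helpers are hypothetical
names, and only the epoch of each held packet is tracked, not its
payload):

```c
#include <assert.h>
#include <stddef.h>

#define MC_MAX_PENDING 64 /* arbitrary bound for this sketch */

typedef struct MCOutputBuffer {
    unsigned long epochs[MC_MAX_PENDING]; /* epoch of each held packet */
    size_t count;                         /* number of held packets */
    unsigned long current_epoch;          /* checkpoint being built now */
} MCOutputBuffer;

/* Hold a packet emitted by the guest. Returns 0, or -1 if the buffer is full. */
int mc_buffer_packet(MCOutputBuffer *b)
{
    assert(b != NULL);
    if (b->count == MC_MAX_PENDING) {
        return -1;
    }
    b->epochs[b->count++] = b->current_epoch;
    return 0;
}

/* A checkpoint was captured; later packets belong to the next epoch. */
void mc_begin_next_epoch(MCOutputBuffer *b)
{
    assert(b != NULL);
    b->current_epoch++;
}

/*
 * The destination acknowledged checkpoint 'acked': release (uncork) every
 * packet from that epoch or earlier and keep the rest, preserving order.
 * Returns the number of packets released.
 */
size_t mc_release_acked(MCOutputBuffer *b, unsigned long acked)
{
    size_t kept = 0, released = 0;

    assert(b != NULL);
    for (size_t i = 0; i < b->count; i++) {
        if (b->epochs[i] <= acked) {
            released++;
        } else {
            b->epochs[kept++] = b->epochs[i];
        }
    }
    b->count = kept;
    return released;
}
```

If the ACK never arrives, nothing from the unacknowledged epoch leaves
the source, so a few extra milliseconds of waiting only adds latency.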

But if management *is* too slow, which is not unlikely, then I think
we should just tell the source to migrate entirely and get out of that
environment.

Either way - this isn't something QEMU itself necessarily needs to
worry about - it just needs to know not to explode if the destination
fails and wait for instructions on what to do next.

Alternatively, if the administrator "prefers" restarting the fault-tolerance
instead of migration, we could have a QMP command that specifies
a "backup" destination (or even a "duplicate" destination) that QEMU
would automatically know about in the case of destination failure.

But I wouldn't implement something like that until at least a first version
was accepted by the community.
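
Purely as an illustration of the options above, wherever the policy
ends up living, the choice after a destination failure could be
sketched like this. Everything here is hypothetical (MCDestFailure,
MCDecision, mc_on_dest_failure, the wait budget); no such QMP command
or structure exists in this series.

```c
#include <assert.h>
#include <stddef.h>

typedef enum MCNextStep {
    MC_STEP_WAIT,         /* keep output buffered, wait for instructions */
    MC_STEP_RESTART_MC,   /* restart checkpointing to another destination */
    MC_STEP_MIGRATE_AWAY, /* give up on MC here and migrate entirely */
} MCNextStep;

typedef struct MCDestFailure {
    const char *mgmt_dest;   /* destination chosen by management, or NULL */
    const char *backup_dest; /* preconfigured "backup" destination, or NULL */
    unsigned waited_ms;      /* time elapsed since the failure was detected */
    unsigned max_wait_ms;    /* how long we are willing to wait */
} MCDestFailure;

typedef struct MCDecision {
    MCNextStep step;
    const char *dest; /* only meaningful for MC_STEP_RESTART_MC */
} MCDecision;

/*
 * Prefer an explicit instruction from management, then a preconfigured
 * backup destination; otherwise wait until the budget runs out and then
 * fall back to migrating away.
 */
MCDecision mc_on_dest_failure(const MCDestFailure *f)
{
    MCDecision d = { MC_STEP_WAIT, NULL };

    assert(f != NULL);
    if (f->mgmt_dest != NULL) {
        d.step = MC_STEP_RESTART_MC;
        d.dest = f->mgmt_dest;
    } else if (f->backup_dest != NULL) {
        d.step = MC_STEP_RESTART_MC;
        d.dest = f->backup_dest;
    } else if (f->waited_ms >= f->max_wait_ms) {
        d.step = MC_STEP_MIGRATE_AWAY;
    }
    return d;
}
```

In all three cases the source keeps running; the only question is who
supplies the next destination and how long we wait for it.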

- Michael