From: Andres Lagar Cavilla
Subject: Re: live saving of domU
Date: Wed, 10 May 2006 16:42:15 -0400
Message-ID: <44625027.3090804@cs.toronto.edu>
References: <44623B92.7050300@cs.toronto.edu> <446241C5.1030004@us.ibm.com>
In-Reply-To: <446241C5.1030004@us.ibm.com>
To: Anthony Liguori
Cc: xen-devel@lists.xensource.com

>> My understanding is that the guest only canonicalizes the store and
>> console mfn's and places them on the shared info frame which is
>> passed to the suspend hypercall. The rest of the canonicalizations
>> are done by dom0 user-space code (xc_linux_save).
>
> Sort of. When you pause a domain, it could be doing something like a
> PTE update in which case it has a PFN in a register (or on the stack
> somewhere). Part of the reason for having a suspend entry point in
> the kernel is to ensure that we're in a consistent state.

Does the guest kernel do anything beyond what's in __do_suspend in
reboot.c?

>> The guest never really shuts down: it issues the suspend hypercall
>> and waits for it to return. This could happen months later when the
>> domain is resumed :) The suspend hypercall executing in xen is the
>> one that pauses all vcpus and kills the domain.
>
> Actually, take a look at what HYPERVISOR_suspend is:
>
> It's just a shutdown op.

But it doesn't have to be. The hypercall could only pause the domain,
and let the user-space tools unpause (no 's' bit -> no domain/devices
teardown) when checkpointing is over. The guest kernel can't tell the
difference: it returns from the hypercall and life goes on, as long as
the devices are still there.
That's what I was referring to with:

>> Is it feasible to use a different hypercall that pauses the domain
>> but doesn't kill it, and once xc_linux_save is done checkpointing
>> have it issue a dom0_op that unpauses the domain?
>
> A domain is "killed" with a dom0_op of domain_destroy which is invoked
> by Xend. The problem with checkpointing is that once the 's' bit has
> been set on a domain, there's no way to unset that bit.

As I said a few lines up, let's not set the 's' bit for lightweight
checkpoints. This is likely to cause a lot of special casing for
xend/xenstore, right?

Andres
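
To make the two paths discussed above concrete, here is a minimal toy model in C. It is only a sketch of the state transitions under stated assumptions, not Xen code: every name in it (toy_domain, toy_suspend_shutdown, toy_suspend_pause_only, toy_unpause, toy_guest_can_continue) is hypothetical, and device teardown is modelled as happening immediately once the 's' bit is set, which simplifies what the tools would actually do.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical names, not Xen interfaces. */
struct toy_domain {
    bool paused;       /* vcpus are not being scheduled */
    bool shutdown_bit; /* models the 's' bit; never cleared once set */
    bool devices_up;   /* models whether the devices are still there */
};

/* Models suspend implemented as a shutdown op: the 's' bit is set and,
 * by assumption, the domain's devices are torn down as a consequence. */
static void toy_suspend_shutdown(struct toy_domain *d)
{
    d->paused = true;
    d->shutdown_bit = true;
    d->devices_up = false;
}

/* Models the proposed lightweight path: pause only, no 's' bit,
 * so nothing triggers a teardown. */
static void toy_suspend_pause_only(struct toy_domain *d)
{
    d->paused = true;
}

/* Models the tools unpausing the domain after checkpointing.
 * Returns 0 on success, -1 if the sticky 's' bit is set. */
static int toy_unpause(struct toy_domain *d)
{
    if (d->shutdown_bit)
        return -1;
    d->paused = false;
    return 0;
}

/* The guest can return from the hypercall and carry on only if it is
 * running again and its devices were never torn down. */
static bool toy_guest_can_continue(const struct toy_domain *d)
{
    return !d->paused && !d->shutdown_bit && d->devices_up;
}
```

In this model, a domain that goes through toy_suspend_pause_only() followed by toy_unpause() satisfies toy_guest_can_continue(), while one that goes through toy_suspend_shutdown() cannot be unpaused at all. The special casing that xend/xenstore might need is deliberately left out.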