From: Anthony Liguori
Subject: Re: live saving of domU
Date: Wed, 10 May 2006 16:05:50 -0500
Message-ID: <446255AE.1030105@us.ibm.com>
References: <44623B92.7050300@cs.toronto.edu> <446241C5.1030004@us.ibm.com> <44625027.3090804@cs.toronto.edu>
In-Reply-To: <44625027.3090804@cs.toronto.edu>
To: Andres Lagar Cavilla
Cc: xen-devel@lists.xensource.com
List-Id: xen-devel@lists.xenproject.org

Andres Lagar Cavilla wrote:
>>> My understanding is that the guest only canonicalizes the store and
>>> console mfn's and places them on the shared info frame which is
>>> passed to the suspend hypercall. The rest of the canonicalizations
>>> are done by dom0 user-space code (xc_linux_save).
>>
>>
>> Sort of. When you pause a domain, it could be doing something like a
>> PTE update in which case it has a PFN in a register (or on the stack
>> somewhere). Part of the reason for having a suspend entry point in
>> the kernel is to ensure that we're in a consistent state.
>
> Does the guest kernel do anything beyond what's in __do_suspend in
> reboot.c?

Nothing that isn't reachable from that function.

>>> The guest never really shuts down: it issues the suspend hypercall
>>> and waits for it to return. This could happen months later when the
>>> domain is resumed :) The suspend hypercall executing in xen is the
>>> one that pauses all vcpus and kills the domain.
>>
>> Actually, take a look at what HYPERVISOR_suspend is:
>>
>> It's just a shutdown op.
>
> But it doesn't have to be. The hypercall could only pause the domain,
> and let the user-space tools unpause (no 's' bit -> no domain/devices
> teardown) when checkpointing is over.
> The guest kernel can't tell the difference: it returns from the
> hypercall and life goes on, as long as the devices are still there.
> That's what I was referring to with:

It could, but you have a number of other problems to solve. How do you
signal to userspace that the domain is suspended? You could perhaps
introduce another VIRQ or extend the state. The __do_suspend path also
assumes that the devices are being cycled, so you would need Xend to
participate in this process, and how the devices interact would need
some careful thinking.

>>> Is it feasible to use a different hypercall that pauses the domain
>>> but doesn't kill it, and once xc_linux_save is done checkpointing
>>> have it issue a dom0_op that unpauses the domain?
>>
>> A domain is "killed" with a dom0_op of domain_destroy which is
>> invoked by Xend. The problem with checkpointing is that once the 's'
>> bit has been set on a domain, there's no way to unset that bit.
>
> As I said a few lines up, let's not set the 's' bit for lightweight
> checkpoints. This is likely to cause a lot of special casing for
> xend/xenstore, right?

Yeah, there are a lot of pieces of userspace code that would be
affected. I hope this isn't disparaging; I certainly think it's worth
the effort.

Regards,

Anthony Liguori

> Andres