From mboxrd@z Thu Jan 1 00:00:00 1970
From: Yoshiaki Tamura
Subject: Re: [PATCH 00 of 10] Teach xm save to checkpoint a
Date: Wed, 20 Dec 2006 19:01:18 +0900
Message-ID: <458909EE.5030705@lab.ntt.co.jp>
References: <20061216000428.GA5951@ventoux.cs.ubc.ca>
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252; format=flowed
In-Reply-To: <20061216000428.GA5951@ventoux.cs.ubc.ca>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Brendan Cully
Cc: xen-devel
List-Id: xen-devel@lists.xenproject.org

Brendan:

Hi, my name is Yoshi Tamura, working for NTT Labs in Japan.

I tried your patches, and I like the new feature for checkpointing a
running domain. I also tried your patches for live migration, but
xc_linux_restore() on the remote machine failed.

I tracked down the problem and fixed it by modifying __xen_checkpoint()
in machine_reboot.c. Take a look at the following patch.

As far as I have tested, it works for both xm save -c and
xm migrate --live.

Let me know if you have any comments or a better idea.

Regards,

Yoshi Tamura

Signed-off-by: Yoshi Tamura

diff -r 3bde632518a4 linux-2.6-xen-sparse/drivers/xen/core/machine_reboot.c
--- a/linux-2.6-xen-sparse/drivers/xen/core/machine_reboot.c	Thu Dec 14 23:05:42 2006 -0800
+++ b/linux-2.6-xen-sparse/drivers/xen/core/machine_reboot.c	Wed Dec 20 16:21:43 2006 +0900
@@ -171,8 +171,6 @@ int __xen_suspend(void)

 	pre_suspend();

-	gnttab_checkpoint();
-
 	/*
 	 * We'll stop somewhere inside this hypercall. When it returns,
 	 * we'll start resuming after the restore.
@@ -223,6 +221,8 @@ int __xen_checkpoint(void)

 	xenbus_lock();

+	gnttab_suspend();
+
 	preempt_disable();

 	mm_pin_all();
@@ -257,6 +257,8 @@ int __xen_checkpoint(void)
 	} else {
 		post_checkpoint();

+		gnttab_resume();
+
 		local_irq_enable();

 		xenbus_unlock();

Brendan Cully wrote:
> I think maybe I forgot to mention that I have successfully
> checkpointed domains and restored them from checkpoints (with
> file-system activity between checkpoints). It seems to work pretty
> well. I'll try to put together a demo of this next week.
>
> Regarding full device disconnection, my understanding is that guest
> domains are already prepared to deal with back-end driver crashes (by
> maintaining shadows of the ring etc), so a forced reconnect on resume
> should be able to recover even if there wasn't an orderly shutdown
> before the suspend. I thought when I looked over the code that the
> reconnect path did a paranoid forced disconnect first anyway (eg
> checking for existing event channels and resetting them).
>
> On the other hand, if checkpoints are taken more frequently than they
> are restored, it seems odd to be constantly detaching and reattaching
> back-ends in the parent.
>
> But if this is unsafe, it should be fairly easy to make the code do a
> full disconnect before suspend. It might be as easy as changing xm
> save to write 'suspend' to control/shutdown instead of 'checkpoint'.
>
> On Friday, 15 December 2006 at 08:07, Steven Hand wrote:
>>> I'm not too sure about the last couple of patches in this
>>> series. Because the checkpointing domain doesn't disconnect before
>>> calling suspend, it retains a few references to pages it doesn't
>>> own. These trigger a PT race detector in xc_linux_save, which causes
>>> it to abort. So the last couple of patches explicitly identify the
>>> references I've found so far (shared_info and some grant table shared
>>> pages) and simply zero those PTEs during save, since they'll be
>>> recreated on restore.
>>> Finding the grant table pages is a bit fragile -
>>> I walk the page table loaded in CR3 at the time of suspend looking for
>>> the virtual address I've stowed in the suspend record. I've only got
>>> code for two-level page tables at the moment, since I'm not convinced
>>> this is the right approach. Under what circumstances would a non-live
>>> save have an unsafe PTE race?
>> Pretty much any PT race in a non-live save/migrate is a bug; the
>> domain is (in theory) suspended at this point, and all of the
>> devices are disconnected. Since you've chosen not to 'disconnect'
>> the devices, you'll get random updates occurring to any shared
>> pages (shared via grants or directly shared with Xen).
>>
>>> Maybe it's fine to simply zero these ptes without checking them.
>> I'd think not.
>
> to clarify, the pages that have caused races in my experiments are
> always the same 5: shared_info and four grant table shared pages. The
> reason these don't cause races in plain save is simply that they are
> unmapped before suspend is called. Since I've adjusted the kernel to
> recreate these specific pages on restore (but not in the parent when
> checkpoint returns), my patches do just zero out the PTEs (simulating
> in the save code what had previously been done in the guest).
>
> Finding the guest grant table pages is a little annoying though. I
> ended up having the guest put the virtual address of its mapping into
> an unused field in the suspend record, then walking the page table to
> find the MFN. I was thinking it might be better to either get Xen to
> export a list of pages that the guest has references to, or to assume
> that any unowned MFNs in the page tables are either pages that will be
> recreated on restore anyway and just zero them out. In short, I wonder
> how often that PT race code has stopped a non-live save. If the answer
> is 'never', then zeroing out the PTEs might be fine.
> Especially since the original domain is still intact after the
> checkpoint.
>
> Thanks again for looking this over.
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
>
>

-- 
TAMURA, Yoshiaki
NTT Cyber Space Labs
OSS Computing Project
Kernel Group
E-mail: tamura.yoshiaki@lab.ntt.co.jp
TEL: (046)-859-2771
FAX: (046)-855-1152
Address: 1-1 Hikarinooka, Yokosuka
Kanagawa 239-0847 JAPAN
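
The ordering that the patch introduces in __xen_checkpoint() can be
illustrated with a small user-space model. This is only a sketch under
stated assumptions, not kernel code: struct guest_model and all of the
model_* functions are hypothetical names made up for illustration, and
the count of four pages is simply the number of grant table shared pages
reported in the quoted thread. The model assumes that gnttab_suspend()
unmaps the grant table shared pages and gnttab_resume() maps them again,
which is what the thread describes for the plain save path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the guest state (hypothetical, not a kernel structure). */
struct guest_model {
	bool gnttab_mapped;     /* grant table shared pages currently mapped? */
	int  foreign_refs_seen; /* unowned pages the save code would find */
};

/* Stand-in for gnttab_suspend(): drop the shared-page mappings. */
static void model_gnttab_suspend(struct guest_model *g)
{
	g->gnttab_mapped = false;
}

/* Stand-in for gnttab_resume(): re-establish the shared-page mappings. */
static void model_gnttab_resume(struct guest_model *g)
{
	g->gnttab_mapped = true;
}

/*
 * Stand-in for the point where the guest is stopped and the save code
 * scans its page tables. Four is the page count quoted in the thread.
 */
static void model_save_scan(struct guest_model *g)
{
	g->foreign_refs_seen = g->gnttab_mapped ? 4 : 0;
}

/* Ordering from the patch: unmap, checkpoint, then map again. */
int model_checkpoint(struct guest_model *g)
{
	assert(g != NULL);
	model_gnttab_suspend(g);
	model_save_scan(g);
	model_gnttab_resume(g);
	return g->foreign_refs_seen;
}

/* Ordering without the unmap: the save code still sees the mappings. */
int model_checkpoint_unmapped_skipped(struct guest_model *g)
{
	assert(g != NULL);
	model_save_scan(g);
	return g->foreign_refs_seen;
}
```

Starting from a guest with gnttab_mapped set, model_checkpoint() returns
0 and leaves the pages mapped again afterwards, while
model_checkpoint_unmapped_skipped() returns 4. In the model, that
difference is why the save code has no grant table references left to
trip over once the suspend/resume pair brackets the checkpoint.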