From mboxrd@z Thu Jan  1 00:00:00 1970
From: Yoshiaki Tamura <tamura.yoshiaki@lab.ntt.co.jp>
Subject: Re: [PATCH 00 of 10] Teach xm save to checkpoint a
Date: Wed, 20 Dec 2006 19:01:18 +0900
Message-ID: <458909EE.5030705@lab.ntt.co.jp>
References: <patchbomb.1166168316@ventoux.cs.ubc.ca>	<E1Gv86l-0006kd-00@mta1.cl.cam.ac.uk>
	<20061216000428.GA5951@ventoux.cs.ubc.ca>
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: quoted-printable
Return-path: <xen-devel-bounces@lists.xensource.com>
In-Reply-To: <20061216000428.GA5951@ventoux.cs.ubc.ca>
List-Unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xensource.com>
List-Help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-Subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Brendan Cully <brendan@cs.ubc.ca>
Cc: xen-devel <xen-devel@lists.xensource.com>
List-Id: xen-devel@lists.xenproject.org

Brendan:

Hi, my name is Yoshi Tamura, and I work for NTT Labs in Japan.
I tried your patches, and I like the new feature for checkpointing a running
domain.
I also tried your patches with live migration, but xc_linux_restore() failed
on the remote machine.
I tracked down the problem and fixed it by modifying __xen_checkpoint() in
machine_reboot.c. Take a look at the following patch.
As far as I have tested, it works for both xm save -c and xm migrate --live.
Let me know if you have any comments or a better idea.

Regards,

Yoshi Tamura


Signed-off-by: Yoshi Tamura <tamura.yoshiaki@lab.ntt.co.jp>

diff -r 3bde632518a4 linux-2.6-xen-sparse/drivers/xen/core/machine_reboot.c
--- a/linux-2.6-xen-sparse/drivers/xen/core/machine_reboot.c	Thu Dec 14 23:05:42 2006 -0800
+++ b/linux-2.6-xen-sparse/drivers/xen/core/machine_reboot.c	Wed Dec 20 16:21:43 2006 +0900
@@ -171,8 +171,6 @@ int __xen_suspend(void)

  	pre_suspend();

-	gnttab_checkpoint();
-
  	/*
  	 * We'll stop somewhere inside this hypercall. When it returns,
  	 * we'll start resuming after the restore.
@@ -223,6 +221,8 @@ int __xen_checkpoint(void)

  	xenbus_lock();

+	gnttab_suspend();
+
  	preempt_disable();

  	mm_pin_all();
@@ -257,6 +257,8 @@ int __xen_checkpoint(void)
  	} else {
  		post_checkpoint();

+		gnttab_resume();
+
  		local_irq_enable();

  		xenbus_unlock();
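
To make the intended ordering easier to see, here is a small user-space
sketch of the parent-side path after the patch. It is only a toy model, not
kernel code: the stubs reuse the function names from the hunks above but do
nothing except record that they were called, and elided_checkpoint_steps(),
toy_checkpoint_parent_path(), toy_step_index() and toy_print_trace() are
hypothetical names standing in for whatever the diff context does not show.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_STEPS 16

/* Recorded call order; each stub below appends its own name. */
static const char *trace[MAX_STEPS];
static int nsteps;

static void record(const char *name)
{
	assert(nsteps < MAX_STEPS);
	trace[nsteps++] = name;
}

/* Hypothetical stand-ins: same names as in the patch, no real behaviour. */
static void xenbus_lock(void)      { record("xenbus_lock"); }
static void gnttab_suspend(void)   { record("gnttab_suspend"); }
static void preempt_disable(void)  { record("preempt_disable"); }
static void mm_pin_all(void)       { record("mm_pin_all"); }
static void post_checkpoint(void)  { record("post_checkpoint"); }
static void gnttab_resume(void)    { record("gnttab_resume"); }
static void local_irq_enable(void) { record("local_irq_enable"); }
static void xenbus_unlock(void)    { record("xenbus_unlock"); }

/* Placeholder for everything the diff context does not show. */
static void elided_checkpoint_steps(void) { record("elided_checkpoint_steps"); }

/* Parent-side path of the toy model, in the order the patched hunks show. */
void toy_checkpoint_parent_path(void)
{
	nsteps = 0;
	xenbus_lock();
	gnttab_suspend();          /* added by the patch */
	preempt_disable();
	mm_pin_all();
	elided_checkpoint_steps();
	post_checkpoint();
	gnttab_resume();           /* added by the patch */
	local_irq_enable();
	xenbus_unlock();
}

/* Position of a step in the recorded trace, or -1 if it never ran. */
int toy_step_index(const char *name)
{
	for (int i = 0; i < nsteps; i++)
		if (strcmp(trace[i], name) == 0)
			return i;
	return -1;
}

/* Print the trace, one step per line. */
void toy_print_trace(void)
{
	for (int i = 0; i < nsteps; i++)
		printf("%d %s\n", i, trace[i]);
}
```

Calling toy_checkpoint_parent_path() and then toy_print_trace() lists the
steps in order. The point is only that gnttab_suspend() runs before the
checkpoint itself and gnttab_resume() runs right after post_checkpoint() in
the parent.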


Brendan Cully wrote:
> I think maybe I forgot to mention that I have successfully
> checkpointed domains and restored them from checkpoints (with
> file-system activity between checkpoints). It seems to work pretty
> well. I'll try to put together a demo of this next week.
>
> Regarding full device disconnection, my understanding is that guest
> domains are already prepared to deal with back-end driver crashes (by
> maintaining shadows of the ring etc), so a forced reconnect on resume
> should be able to recover even if there wasn't an orderly shutdown
> before the suspend. I thought when I looked over the code that the
> reconnect path did a paranoid forced disconnect first anyway (eg
> checking for existing event channels and resetting them).
>
> On the other hand, if checkpoints are taken more frequently than they
> are restored, it seems odd to be constantly detaching and reattaching
> back-ends in the parent.
>
> But if this is unsafe, it should be fairly easy to make the code do a
> full disconnect before suspend. It might be as easy as changing xm
> save to write 'suspend' to control/shutdown instead of 'checkpoint'.
>
> On Friday, 15 December 2006 at 08:07, Steven Hand wrote:
>>> I'm not too sure about the last couple of patches in this
>>> series. Because the checkpointing domain doesn't disconnect before
>>> calling suspend, it retains a few references to pages it doesn't
>>> own. These trigger a PT race detector in xc_linux_save, which causes
>>> it to abort. So the last couple of patches explicitly identify the
>>> references I've found so far (shared_info and some grant table shared
>>> pages) and simply zero those PTEs during save, since they'll be
>>> recreated on restore. Finding the grant table pages is a bit fragile -
>>> I walk the page table loaded in CR3 at the time of suspend looking for
>>> the virtual address I've stowed in the suspend record. I've only got
>>> code for two-level page tables at the moment, since I'm not convinced
>>> this is the right approach. Under what circumstances would a non-live
>>> save have an unsafe PTE race?
>> Pretty much any PT race in a non-live save/migrate is a bug; the
>> domain is (in theory) suspended at this point, and all of the
>> devices are disconnected. Since you've chosen not to 'disconnect'
>> the devices, you'll get random updates occurring to any shared
>> pages (shared via grants or directly shared with Xen).
>>
>>> Maybe it's fine to simply zero these ptes without checking them.
>> I'd think not.
>
> To clarify, the pages that have caused races in my experiments are
> always the same 5: shared_info and four grant table shared pages. The
> reason these don't cause races in plain save is simply that they are
> unmapped before suspend is called. Since I've adjusted the kernel to
> recreate these specific pages on restore (but not in the parent when
> checkpoint returns), my patches do just zero out the PTEs (simulating
> in the save code what had previously been done in the guest).
>
> Finding the guest grant table pages is a little annoying though. I
> ended up having the guest put the virtual address of its mapping into
> an unused field in the suspend record, then walking the page table to
> find the MFN. I was thinking it might be better to either get Xen to
> export a list of pages that the guest has references to, or to assume
> that any unowned MFNs in the page tables are pages that will be
> recreated on restore anyway and just zero them out. In short, I wonder
> how often that PT race code has stopped a non-live save. If the answer
> is 'never', then zeroing out the PTEs might be fine. Especially since
> the original domain is still intact after the checkpoint.
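
As an illustration of the kind of walk described above, here is a toy
sketch of a non-PAE two-level lookup that finds the PTE for a virtual
address, reports the MFN it holds, and zeroes the entry. This is not the
code from the patch series. It models machine memory as a small array, and
toy_mem, toy_lookup_pte() and toy_zap_mapping() are hypothetical names; the
10/10/12 address split and the present bit are the usual 32-bit x86 layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PT_ENTRIES  1024u   /* entries per table, non-PAE */
#define PTE_PRESENT 0x1u
#define PAGE_SHIFT  12
#define TOY_FRAMES  8u      /* size of the toy "machine memory" */

/* Toy machine memory: frame number -> one page worth of table entries. */
static uint32_t toy_mem[TOY_FRAMES][PT_ENTRIES];

/*
 * Walk the two-level table rooted at frame cr3_mfn and return a pointer to
 * the present PTE that maps va, or NULL if there is no such mapping.
 */
uint32_t *toy_lookup_pte(uint32_t cr3_mfn, uint32_t va)
{
	uint32_t pde, l1_mfn, *pte;

	if (cr3_mfn >= TOY_FRAMES)
		return NULL;

	pde = toy_mem[cr3_mfn][va >> 22];          /* top 10 bits: L2 index */
	if (!(pde & PTE_PRESENT))
		return NULL;

	l1_mfn = pde >> PAGE_SHIFT;
	if (l1_mfn >= TOY_FRAMES)
		return NULL;

	/* middle 10 bits: L1 index */
	pte = &toy_mem[l1_mfn][(va >> PAGE_SHIFT) & (PT_ENTRIES - 1)];
	return (*pte & PTE_PRESENT) ? pte : NULL;
}

/*
 * Zero the PTE that maps va and return the MFN it held, or UINT32_MAX if
 * va was not mapped.
 */
uint32_t toy_zap_mapping(uint32_t cr3_mfn, uint32_t va)
{
	uint32_t *pte = toy_lookup_pte(cr3_mfn, va);
	uint32_t mfn;

	if (pte == NULL)
		return UINT32_MAX;

	mfn = *pte >> PAGE_SHIFT;
	*pte = 0;
	return mfn;
}
```

A caller would fill in toy_mem with a page directory and a page table, then
pass the root frame and the stowed virtual address to toy_zap_mapping(). A
second call for the same address reports it as unmapped, since the entry
has been zeroed.
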
>
> Thanks again for looking this over.


-- 
TAMURA, Yoshiaki

NTT Cyber Space Labs
OSS Computing Project
Kernel Group
E-mail: tamura.yoshiaki@lab.ntt.co.jp
TEL: (046)-859-2771
FAX: (046)-855-1152
Address: 1-1 Hikarinooka, Yokosuka
	 Kanagawa 239-0847 JAPAN