From: Yoshiaki Tamura <tamura.yoshiaki@lab.ntt.co.jp>
To: Brendan Cully <brendan@cs.ubc.ca>
Cc: xen-devel <xen-devel@lists.xensource.com>
Subject: Re: [PATCH 00 of 10] Teach xm save to checkpoint a
Date: Wed, 20 Dec 2006 19:01:18 +0900
Message-ID: <458909EE.5030705@lab.ntt.co.jp>
In-Reply-To: <20061216000428.GA5951@ventoux.cs.ubc.ca>
Brendan:
Hi, my name is Yoshi Tamura, and I work for NTT Labs in Japan.
I tried your patches, and I like the new feature for checkpointing a running domain.
I also tried your patches for live migration, but xc_linux_restore() on the
remote machine failed.
I tracked down the problem and fixed it by modifying __xen_checkpoint() in
machine_reboot.c. Take a look at the following patch.
As far as I have tested, it works for both xm save -c and xm migrate --live.
Let me know if you have any comments or a better idea.
Regards,
Yoshi Tamura
Signed-off-by: Yoshi Tamura <tamura.yoshiaki@lab.ntt.co.jp>
diff -r 3bde632518a4 linux-2.6-xen-sparse/drivers/xen/core/machine_reboot.c
--- a/linux-2.6-xen-sparse/drivers/xen/core/machine_reboot.c Thu Dec 14 23:05:42 2006 -0800
+++ b/linux-2.6-xen-sparse/drivers/xen/core/machine_reboot.c Wed Dec 20 16:21:43 2006 +0900
@@ -171,8 +171,6 @@ int __xen_suspend(void)
pre_suspend();
- gnttab_checkpoint();
-
/*
* We'll stop somewhere inside this hypercall. When it returns,
* we'll start resuming after the restore.
@@ -223,6 +221,8 @@ int __xen_checkpoint(void)
xenbus_lock();
+ gnttab_suspend();
+
preempt_disable();
mm_pin_all();
@@ -257,6 +257,8 @@ int __xen_checkpoint(void)
} else {
post_checkpoint();
+ gnttab_resume();
+
local_irq_enable();
xenbus_unlock();
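
The call order that the two __xen_checkpoint() hunks above set up can be
sketched as a small standalone C fragment. This is only an illustration
under assumptions: the *_stub functions, step(), trace, and
checkpoint_sketch() are hypothetical names that record the call order and
do not touch real grant tables, and the branch taken when the domain comes
back from a restore is not visible in the hunk, so it appears here as a
placeholder step.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical trace buffer; it only records the order of the calls. */
static char trace[128];

static void step(const char *name)
{
    /* Need room for the name, the ';' separator, and the trailing NUL. */
    assert(strlen(trace) + strlen(name) + 2 <= sizeof(trace));
    strcat(trace, name);
    strcat(trace, ";");
}

/* Stubs standing in for the kernel functions named in the patch. */
static void gnttab_suspend_stub(void)  { step("gnttab_suspend"); }
static void preempt_disable_stub(void) { step("preempt_disable"); }
static void mm_pin_all_stub(void)      { step("mm_pin_all"); }
static void post_checkpoint_stub(void) { step("post_checkpoint"); }
static void gnttab_resume_stub(void)   { step("gnttab_resume"); }

/*
 * resumed_in_parent is an assumed flag: nonzero models the case where
 * the checkpoint returns in the still-running original domain.
 */
static const char *checkpoint_sketch(int resumed_in_parent)
{
    trace[0] = '\0';

    /* Added by the patch: unmap the grant table before pinning. */
    gnttab_suspend_stub();
    preempt_disable_stub();
    mm_pin_all_stub();

    step("suspend_hypercall");

    if (resumed_in_parent) {
        post_checkpoint_stub();
        /* Added by the patch: map the grant table again in the parent. */
        gnttab_resume_stub();
    } else {
        /* Not shown in the hunk; placeholder only. */
        step("restore_path");
    }
    return trace;
}
```

Calling checkpoint_sketch(1) returns the parent-side order as a
semicolon-separated string, which makes it easy to see that the grant
table is unmapped before the pages are pinned and mapped again only after
post_checkpoint().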
Brendan Cully wrote:
> I think maybe I forgot to mention that I have successfully
> checkpointed domains and restored them from checkpoints (with
> file-system activity between checkpoints). It seems to work pretty
> well. I'll try to put together a demo of this next week.
>
> Regarding full device disconnection, my understanding is that guest
> domains are already prepared to deal with back-end driver crashes (by
> maintaining shadows of the ring etc), so a forced reconnect on resume
> should be able to recover even if there wasn't an orderly shutdown
> before the suspend. I thought when I looked over the code that the
> reconnect path did a paranoid forced disconnect first anyway (eg
> checking for existing event channels and resetting them).
>
> On the other hand, if checkpoints are taken more frequently than they
> are restored, it seems odd to be constantly detaching and reattaching
> back-ends in the parent.
>
> But if this is unsafe, it should be fairly easy to make the code do a
> full disconnect before suspend. It might be as easy as changing xm
> save to write 'suspend' to control/shutdown instead of 'checkpoint'.
>
> On Friday, 15 December 2006 at 08:07, Steven Hand wrote:
>>> I'm not too sure about the last couple of patches in this
>>> series. Because the checkpointing domain doesn't disconnect before
>>> calling suspend, it retains a few references to pages it doesn't
>>> own. These trigger a PT race detector in xc_linux_save, which causes
>>> it to abort. So the last couple of patches explicitly identify the
>>> references I've found so far (shared_info and some grant table shared
>>> pages) and simply zero those PTEs during save, since they'll be
>>> recreated on restore. Finding the grant table pages is a bit fragile -
>>> I walk the page table loaded in CR3 at the time of suspend looking for
>>> the virtual address I've stowed in the suspend record. I've only got
>>> code for two-level page tables at the moment, since I'm not convinced
>>> this is the right approach. Under what circumstances would a non-live
>>> save have an unsafe PTE race?
>> Pretty much any PT race in a non-live save/migrate is a bug; the
>> domain is (in theory) suspended at this point, and all of the
>> devices are disconnected. Since you've chosen not to 'disconnect'
>> the devices, you'll get random updates occurring to any shared
>> pages (shared via grants or directly shared with Xen).
>>
>>> Maybe it's fine to simply zero these ptes without checking them.
>> I'd think not.
>
> To clarify, the pages that have caused races in my experiments are
> always the same 5: shared_info and four grant table shared pages. The
> reason these don't cause races in plain save is simply that they are
> unmapped before suspend is called. Since I've adjusted the kernel to
> recreate these specific pages on restore (but not in the parent when
> checkpoint returns), my patches do just zero out the PTEs (simulating
> in the save code what had previously been done in the guest).
>
> Finding the guest grant table pages is a little annoying though. I
> ended up having the guest put the virtual address of its mapping into
> an unused field in the suspend record, then walking the page table to
> find the MFN. I was thinking it might be better either to get Xen to
> export a list of pages that the guest has references to, or to assume
> that any unowned MFNs in the page tables are pages that will be
> recreated on restore anyway and just zero them out. In short, I wonder
> how often that PT race code has stopped a non-live save. If the answer
> is 'never', then zeroing out the PTEs might be fine. Especially since
> the original domain is still intact after the checkpoint.
>
> Thanks again for looking this over.
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
>
>
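
For readers following the quoted discussion, the "zero those PTEs during
save" idea can be illustrated with a toy C fragment. It is a sketch under
stated assumptions, not the code from the patch series: the PTE layout
(present bit in bit 0, frame number above a 12-bit shift), the TOY_*
macros, and the function names are all hypothetical, and the list of
"safe" MFNs stands for the shared_info and grant table frames mentioned
above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy layout, assumed for illustration only. */
#define TOY_PTE_PRESENT 0x1ULL
#define TOY_PAGE_SHIFT  12

static uint64_t toy_pte_mfn(uint64_t pte)
{
    return pte >> TOY_PAGE_SHIFT;
}

/*
 * Zero every present PTE whose frame number appears in safe_mfns.
 * Entries that are not present, or that point elsewhere, are left
 * untouched. Returns the number of entries zeroed.
 */
static size_t zero_known_foreign_ptes(uint64_t *ptes, size_t n_ptes,
                                      const uint64_t *safe_mfns,
                                      size_t n_safe)
{
    size_t zeroed = 0;

    assert(ptes != NULL);
    assert(safe_mfns != NULL || n_safe == 0);

    for (size_t i = 0; i < n_ptes; i++) {
        if (!(ptes[i] & TOY_PTE_PRESENT))
            continue;
        for (size_t j = 0; j < n_safe; j++) {
            if (toy_pte_mfn(ptes[i]) == safe_mfns[j]) {
                ptes[i] = 0;
                zeroed++;
                break;
            }
        }
    }
    return zeroed;
}
```

Matching against an explicit list corresponds to the first variant in the
quoted text (zero only the pages known to be recreated on restore). The
broader variant, zeroing any unowned MFN, would replace the inner loop
with an ownership check, which this sketch does not attempt.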
--
TAMURA, Yoshiaki
NTT Cyber Space Labs
OSS Computing Project
Kernel Group
E-mail: tamura.yoshiaki@lab.ntt.co.jp
TEL: (046)-859-2771
FAX: (046)-855-1152
Address: 1-1 Hikarinooka, Yokosuka
Kanagawa 239-0847 JAPAN