* Re: RE: Re: Re: when timer go back in dom0 save and restore or migrate, PV domain hung
From: James Song @ 2008-11-27 5:10 UTC
To: keir.fraser, kevin.tian, xen-devel
FYI
>>> "Tian, Kevin" <kevin.tian@intel.com> 08.11.27. 11:50 >>>
Sorry for the typo. I did mean domU instead of dom0. :-) The point here is that time_resume will sync to the new system time and wall clock at restore, and thus the PV guest should be able to continue... Xen system time is not wallclock time; it just counts up from power-on. As Keir points out, only its progress is used to drive internal jiffies.
--- Actually, save/restore and migration do not call time_resume; this function may only be called for power saving.
Then what do you mean by "system time stop" here? TOD at user level, or do you observe within the kernel that Xen system time never changes?
--- If you run the command "date" in user mode, you will find that its output does not change until an interval equal to the time delay (the amount the clock went back) has elapsed. You can still run applications that have little to do with time, such as vi, cd, etc., but if you run "ping x.x.x.x" you get only one line of response and it never continues.
Thanks
--James
From: James Song [mailto:jsong@novell.com]
Sent: Thursday, November 27, 2008 11:20 AM
To: keir.fraser@eu.citrix.com; Tian, Kevin; xen-devel@lists.xensource.com
Subject: Reply: Re: [Xen-devel] when timer go back in dom0 save and restore or migrate, PV domain hung
Hi,
Yes, there was an earlier patch that fixed a wc_sec/wc_nsec problem in xc_domain_restore.c, but it still missed something.
When dom0 is constructed or a PV domain is restored, the guest OS reads the local wc_sec from Xen as its base time; wc_sec is initialized from CMOS data. There are some cases in which wc_sec changes. One is setting dom0's system time back: this changes dom0's time and makes wc_sec smaller for both the guest OS and Xen. We can do a simple test: start a PV domain, then change dom0's time, and you will find that the system time of the guest OS stops. That is because you have changed wc_sec for both Xen and the guest OS.
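To make the dependency on wc_sec concrete: a PV guest derives its wall clock as the base published in shared_info (wc_sec/wc_nsec, guarded by wc_version, as in the version_update_begin()/version_update_end() calls in the patch below) plus Xen system time. The following is a minimal user-space sketch of that read, assuming a hypothetical mock_shared_info struct instead of the real Xen headers; real guest code also needs read barriers between the loads.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mock of the wall-clock fields in Xen's shared_info page. */
struct mock_shared_info {
    volatile uint32_t wc_version; /* odd while the hypervisor is updating */
    volatile uint32_t wc_sec;     /* wall-clock time at Xen system time 0 */
    volatile uint32_t wc_nsec;
};

/* Read wc_sec/wc_nsec consistently: retry while an update is in progress
 * (odd version) or if the version changed during the read. Real guest code
 * also needs read barriers around the field loads. */
static void read_wallclock_base(const struct mock_shared_info *s,
                                uint32_t *sec, uint32_t *nsec)
{
    uint32_t v;
    do {
        v = s->wc_version;
        *sec = s->wc_sec;
        *nsec = s->wc_nsec;
    } while ((v & 1) || v != s->wc_version);
}

/* Guest wall clock in ns = published base + Xen system time (ns since boot). */
static uint64_t guest_wallclock_ns(const struct mock_shared_info *s,
                                   uint64_t system_time_ns)
{
    uint32_t sec, nsec;
    read_wallclock_base(s, &sec, &nsec);
    return (uint64_t)sec * 1000000000ULL + nsec + system_time_ns;
}
```

Lowering wc_sec from 100 to 40 in this model moves the computed wall clock back by exactly 60 s at the same system time, which is the effect described above when dom0's clock is set back.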
This patch only considers the save/restore case. I am still not sure what the policy should be when dom0's system time goes back: what should the VMs do? So I have added this case to this patch.
By the way, Kevin, it is the guest OS that hangs, not dom0 ;-) and the duration of the hang is equal to the interval by which you set dom0's clock back, or by which the new machine you migrate to lags behind.
Thanks
-- James
>>> Keir Fraser <keir.fraser@eu.citrix.com> 2008-11-26 22:58 >>>
So what happens if someone changes wallclock using 'date'? That's basically what will appear to happen when s/r occurs.
-- Keir
On 26/11/08 14:32, "Tian, Kevin" <kevin.tian@intel.com> wrote:
hrtimer supports two timer bases: CLOCK_MONOTONIC and CLOCK_REALTIME. Per my reading, wall_to_monotonic is added only in the former case; in the latter, TOD is used directly. I did a quick search, and it looks like futex and ntp use CLOCK_REALTIME. There is also a vsyscall gate that can pass CLOCK_REALTIME from the caller.
Thanks,
Kevin
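The two hrtimer bases have user-space counterparts in the POSIX clocks. A minimal sketch, assuming a POSIX.1-2001 system where clock_nanosleep() is available: an absolute deadline armed on CLOCK_MONOTONIC is not moved by setting the wall clock, whereas the same deadline armed on CLOCK_REALTIME would shift with 'date' or settimeofday(). The helper names (ts_add_ms, sleep_until, monotonic_sleep_ms) are illustrative only, and this is a user-level analogue rather than the kernel hrtimer code.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

/* Add ms milliseconds to a timespec, normalising tv_nsec. */
static struct timespec ts_add_ms(struct timespec t, long ms)
{
    t.tv_sec += ms / 1000;
    t.tv_nsec += (ms % 1000) * 1000000L;
    if (t.tv_nsec >= 1000000000L) {
        t.tv_sec += 1;
        t.tv_nsec -= 1000000000L;
    }
    return t;
}

static int64_t ts_to_ns(struct timespec t)
{
    return (int64_t)t.tv_sec * 1000000000LL + t.tv_nsec;
}

/* Sleep until an absolute deadline on clock clk. clock_nanosleep() returns
 * the error number directly; on EINTR the same absolute deadline is reused. */
static int sleep_until(clockid_t clk, struct timespec deadline)
{
    int rc;
    do {
        rc = clock_nanosleep(clk, TIMER_ABSTIME, &deadline, NULL);
    } while (rc == EINTR);
    return rc;
}

/* Sleep ms milliseconds against an absolute CLOCK_MONOTONIC deadline and
 * return the elapsed monotonic time in ns, or -1 on error. */
static int64_t monotonic_sleep_ms(long ms)
{
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    if (sleep_until(CLOCK_MONOTONIC, ts_add_ms(start, ms)) != 0)
        return -1;
    clock_gettime(CLOCK_MONOTONIC, &end);
    return ts_to_ns(end) - ts_to_ns(start);
}
```

Passing CLOCK_REALTIME to sleep_until() instead gives the behaviour Kevin is worried about: if the wall clock is stepped back while the caller sleeps, the wakeup is delayed by the size of the step.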
From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
Sent: Wednesday, November 26, 2008 10:26 PM
To: Tian, Kevin; 'James Song'; xen-devel@lists.xensource.com
Subject: Re: [Xen-devel] when timer go back in dom0 save and restore or migrate, PV domain hung
hrtimers add wall_to_monotonic to xtime to get a timesource that doesn't (or shouldn't!) warp.
-- Keir
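Keir's point can be modelled with plain integers. This is a simplified sketch, assuming nanosecond scalars instead of the kernel's timespec pair and hypothetical names (clock_model, settime, tick): setting the wall clock shifts wall_to_monotonic by the opposite amount, so xtime + wall_to_monotonic does not warp, roughly as Linux's do_settimeofday() does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified model of the kernel's time bookkeeping, in ns. */
struct clock_model {
    int64_t xtime;             /* wall-clock (CLOCK_REALTIME) time */
    int64_t wall_to_monotonic; /* offset chosen so xtime + offset never warps */
};

static int64_t monotonic_now(const struct clock_model *c)
{
    return c->xtime + c->wall_to_monotonic;
}

/* Setting the wall clock: compensate wall_to_monotonic by the opposite delta. */
static void settime(struct clock_model *c, int64_t new_xtime)
{
    c->wall_to_monotonic += c->xtime - new_xtime;
    c->xtime = new_xtime;
}

/* Real time passing advances xtime, and therefore the monotonic view too. */
static void tick(struct clock_model *c, int64_t ns)
{
    c->xtime += ns;
}
```

Stepping the wall clock back from 1000 to 400 leaves monotonic_now() at 100; only subsequent tick() calls move it forward.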
On 26/11/08 14:20, "Tian, Kevin" <kevin.tian@intel.com> wrote:
How about hrtimers? One mode is CLOCK_REALTIME, which uses getnstimeofday() as the expiration base. Once system time is changed, either locally or on a new machine, that expiration can't be adjusted. But I'm not sure whether it still makes sense to try hrtimers in a guest.
Thanks
Kevin
From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
Sent: Wednesday, November 26, 2008 10:11 PM
To: Tian, Kevin; 'James Song'; xen-devel@lists.xensource.com
Subject: Re: [Xen-devel] when timer go back in dom0 save and restore or migrate, PV domain hung
The problem hasn't been fully explained, but I can say that PV guests expect system time to jump across s/r and deal with that. For example, Linux doesn't use Xen system time internally, but uses its progress to periodically update jiffies, which does not warp across s/r.
We have had problems corrupting wc_sec/wc_nsec in xc_domain_restore.c, but that was fixed some time ago.
-- Keir
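A sketch of the scheme Keir describes, in which the guest turns the progress of Xen system time into jiffies. It is loosely modelled on the processed_system_time bookkeeping of the Xen Linux timer code, but the struct, the function names and NS_PER_TICK (HZ = 250) are assumptions for illustration. It also shows why a backwards jump in system time that is not resynchronised would stall the tick until the new host's system time catches up.

```c
#include <assert.h>
#include <stdint.h>

#define NS_PER_TICK 4000000LL /* assumption: HZ = 250 */

/* Hypothetical, simplified model of a PV guest's tick accounting. */
struct guest_clock {
    uint64_t jiffies;              /* never moves backwards */
    int64_t processed_system_time; /* Xen system time already turned into ticks */
};

/* Timer interrupt: turn the progress of Xen system time into whole ticks. */
static void timer_interrupt(struct guest_clock *g, int64_t xen_system_time)
{
    int64_t delta = xen_system_time - g->processed_system_time;
    while (delta >= NS_PER_TICK) {
        delta -= NS_PER_TICK;
        g->processed_system_time += NS_PER_TICK;
        g->jiffies++;
    }
    /* A negative delta (system time went backwards without a resync) produces
     * no ticks at all until Xen system time catches up again. */
}

/* After restore/migration: adopt the new host's system time as the baseline
 * while leaving jiffies untouched. */
static void resume_resync(struct guest_clock *g, int64_t xen_system_time)
{
    g->processed_system_time = xen_system_time;
}
```

In this model resume_resync() plays the role the thread attributes to time_resume: jiffies keep their value and only the baseline moves, so ticking resumes immediately on the new host.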
On 26/11/08 14:00, "Tian, Kevin" <kevin.tian@intel.com> wrote:
This is not an s/r- or lm-specific issue. For example, system time can be changed even while a PV guest is running. Your patch only hacks the restore point once, and wc_sec can still be changed later when system time is changed on the fly again.
IIRC, a PV guest can catch up with wall clock changes in its timer interrupt, and time_resume will sync the internally processed system time with the new system time after restore. But I'm not sure whether that's enough. Actually, the more interesting issue is the uptime difference. For example, a timer whose expiration was calculated from the previous system time may wait almost indefinitely if the uptimes of the two boxes differ a lot. But I think such an issue should already have been considered, e.g. with some user-tool assistance. I think Keir can comment better here.
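The uptime concern is simple arithmetic. Assuming a one-shot timer whose absolute expiry was computed from the old host's Xen system time (uptime), the remaining wait on the new host is the old deadline minus the new host's uptime. The function below is a hypothetical illustration, not Xen or Linux code.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Hypothetical: remaining wait for a one-shot timer whose absolute expiry was
 * computed from the old host's system time (ns since that host booted) and
 * which is now compared against the new host's system time. */
static int64_t remaining_wait_ns(int64_t old_now, int64_t timeout,
                                 int64_t new_now)
{
    int64_t deadline = old_now + timeout;
    return deadline > new_now ? deadline - new_now : 0;
}
```

For example, a 1 s timeout armed at 10 days of uptime and restored on a host that has been up for 1 hour would not fire for 9 days, 23 hours and 1 second.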
BTW, do you happen to know what exactly dom0 hangs on? Some busy loop catching up time, or a long delay before some critical timer expires?
Thanks,
Kevin
From: xen-devel-bounces@lists.xensource.com [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of James Song
Sent: Tuesday, November 25, 2008 4:02 PM
To: xen-devel@lists.xensource.com
Subject: [Xen-devel] when timer go back in dom0 save and restore or migrate, PV domain hung
Hi,
I find that a PV domain hangs when we take these steps:
1. Save the PV domain.
2. Set the system time of the PV domain back.
3. Restore the PV domain.
or
1. Migrate a PV domain from Machine A to Machine B.
2. The system time of Machine B is behind that of Machine A.
The problem is that wc_sec changes when the system time is changed in dom0, or when restoring on a machine whose system time is behind; but when restoring, Xen doesn't restore the wc_sec of shared_info from xenstore and uses the native one instead. So the guest OS hangs.
This patch addresses this issue.
Thanks
-- Song Wei
diff -r a5ed0dbc829f tools/libxc/xc_domain_restore.c
--- a/tools/libxc/xc_domain_restore.c Tue Nov 18 14:34:14 2008 +0800
+++ b/tools/libxc/xc_domain_restore.c Fri Nov 21 17:34:15 2008 +0800
@@ -328,6 +328,16 @@
/* For info only */
nr_pfns = 0;
+ //jsong@novell.com, james song
+ memset(&domctl, 0, sizeof(domctl));
+ domctl.domain = dom;
+ domctl.cmd = XEN_DOMCTL_restoredomain;
+ frc = do_domctl(xc_handle, &domctl);
+ if ( frc != 0 )
+ {
+ ERROR("Unable to set flag of restore.");
+ goto out;
+ }
if ( read_exact(io_fd, &p2m_size, sizeof(unsigned long)) )
{
@@ -1120,6 +1130,8 @@
/* restore saved vcpu_info and arch specific info */
MEMCPY_FIELD(new_shared_info, old_shared_info, vcpu_info);
+ MEMCPY_FIELD(new_shared_info, old_shared_info, wc_nsec);
+ MEMCPY_FIELD(new_shared_info, old_shared_info, wc_sec);
MEMCPY_FIELD(new_shared_info, old_shared_info, arch);
/* clear any pending events and the selector */
diff -r a5ed0dbc829f xen/arch/x86/time.c
--- a/xen/arch/x86/time.c Tue Nov 18 14:34:14 2008 +0800
+++ b/xen/arch/x86/time.c Fri Nov 21 17:34:15 2008 +0800
@@ -689,7 +689,6 @@
wmb();
(*version)++;
}
-
void update_vcpu_system_time(struct vcpu *v)
{
struct cpu_time *t;
@@ -703,7 +702,6 @@
if ( u->tsc_timestamp == t->local_tsc_stamp )
return;
-
version_update_begin(&u->version);
u->tsc_timestamp = t->local_tsc_stamp;
@@ -713,14 +711,19 @@
version_update_end(&u->version);
}
-
void update_domain_wallclock_time(struct domain *d)
{
spin_lock(&wc_lock);
+ if(d->after_restore )
+ {
+ d->after_restore = 0;
+ goto out; //jsong@novell.com
+ }
version_update_begin(&shared_info(d, wc_version));
shared_info(d, wc_sec) = wc_sec + d->time_offset_seconds;
shared_info(d, wc_nsec) = wc_nsec;
version_update_end(&shared_info(d, wc_version));
+out:
spin_unlock(&wc_lock);
}
@@ -751,7 +754,6 @@
u64 x;
u32 y, _wc_sec, _wc_nsec;
struct domain *d;
-
x = (secs * 1000000000ULL) + (u64)nsecs - system_time_base;
y = do_div(x, 1000000000);
@@ -1050,7 +1052,6 @@
struct tm wallclock_time(void)
{
uint64_t seconds;
-
if ( !wc_sec )
return (struct tm) { 0 };
diff -r a5ed0dbc829f xen/common/domctl.c
--- a/xen/common/domctl.c Tue Nov 18 14:34:14 2008 +0800
+++ b/xen/common/domctl.c Fri Nov 21 17:34:15 2008 +0800
@@ -24,7 +24,6 @@
#include <asm/current.h>
#include <public/domctl.h>
#include <xsm/xsm.h>
-
extern long arch_do_domctl(
struct xen_domctl *op, XEN_GUEST_HANDLE(xen_domctl_t) u_domctl);
@@ -315,6 +314,16 @@
ret = 0;
}
break;
+ case XEN_DOMCTL_restoredomain:
+ {
+ struct domain *d;
+ if ( (d = rcu_lock_domain_by_id(op->domain)) == NULL )
+ break;
+
+ d->after_restore = 1;
+ rcu_unlock_domain(d);
+ break;
+ }
case XEN_DOMCTL_createdomain:
{
diff -r a5ed0dbc829f xen/include/public/domctl.h
--- a/xen/include/public/domctl.h Tue Nov 18 14:34:14 2008 +0800
+++ b/xen/include/public/domctl.h Fri Nov 21 17:34:15 2008 +0800
@@ -61,6 +61,7 @@
#define XEN_DOMCTL_destroydomain 2
#define XEN_DOMCTL_pausedomain 3
#define XEN_DOMCTL_unpausedomain 4
+#define XEN_DOMCTL_restoredomain 51
#define XEN_DOMCTL_resumedomain 27
#define XEN_DOMCTL_getdomaininfo 5
diff -r a5ed0dbc829f xen/include/xen/sched.h
--- a/xen/include/xen/sched.h Tue Nov 18 14:34:14 2008 +0800
+++ b/xen/include/xen/sched.h Fri Nov 21 17:34:15 2008 +0800
@@ -231,6 +231,7 @@
* cause a deadlock. Acquirers don't spin waiting; they preempt.
*/
spinlock_t hypercall_deadlock_mutex;
+ int after_restore; //jsong@novell.com
};
struct domain_setup_info
---------------------------------------------------------------------------------------------
Thanks
--Song wei
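Kevin's objection that the patch only hacks the restore point once can be seen in a small model of the patched update_domain_wallclock_time(). This is a hypothetical user-space sketch: dom_model is invented, wc_lock and time_offset_seconds are omitted, and it assumes that exactly one wall-clock update reaches the domain after the flag is set. The first update is skipped, preserving the restored wc_sec, but any later dom0 time change still overwrites it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the patched update_domain_wallclock_time(). */
struct dom_model {
    int after_restore; /* set once by the proposed XEN_DOMCTL_restoredomain */
    uint32_t wc_sec;   /* guest-visible shared_info wc_sec */
};

/* Push the host's wall-clock base to the domain, unless the one-shot
 * after_restore flag is set, in which case skip this single update. */
static void update_domain_wallclock(struct dom_model *d, uint32_t host_wc_sec)
{
    if (d->after_restore) {
        d->after_restore = 0;
        return;
    }
    d->wc_sec = host_wc_sec;
}
```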
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
* RE: Re: RE: Re: Re: [Xen-devel] when timer go back in dom0 save and restore or migrate, PV domain hung
From: Tian, Kevin @ 2008-11-27 5:37 UTC
To: 'James Song', keir.fraser@eu.citrix.com, xen-devel@lists.xensource.com
No, time_resume is definitely invoked. You should look at machine_reboot.c, which contains the whole path for s/r and lm.
"date" will change because, by default, the wall clock in the guest is synced to the real wall clock. Maybe independent_wallclock is something you want to start with; it is not taken into account at s/r for now.
Thanks,
Kevin