From mboxrd@z Thu Jan 1 00:00:00 1970
From: Keir Fraser
Subject: Re: when timer go back in dom0 save and restore or migrate, PV domain hung
Date: Wed, 26 Nov 2008 14:58:15 +0000
Message-ID:
References: <0A882F4D99BBF6449D58E61AAFD7EDD601E23B9B@pdsmsx502.ccr.corp.intel.com>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============1147687240=="
Return-path:
In-Reply-To: <0A882F4D99BBF6449D58E61AAFD7EDD601E23B9B@pdsmsx502.ccr.corp.intel.com>
List-Unsubscribe: ,
List-Post:
List-Help:
List-Subscribe: ,
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: "Tian, Kevin" , 'James Song' , "xen-devel@lists.xensource.com"
List-Id: xen-devel@lists.xenproject.org

> This message is in MIME format. Since your mail reader does not understand
this format, some or all of this message may not be legible.

--===============1147687240==
Content-type: multipart/alternative; boundary="B_3310556302_1637732"

--B_3310556302_1637732
Content-type: text/plain; charset="US-ASCII"
Content-transfer-encoding: 7bit

So what happens if someone changes wallclock using 'date'? That's basically
kind of what will appear to happen when s/r occurs.

 -- Keir

On 26/11/08 14:32, "Tian, Kevin" wrote:

> hrtimer supports two timer bases: CLOCK_MONOTONIC and CLOCK_REALTIME.
> wall_to_monotonic is only added in former case, and for latter instead TOD is
> used directly per my reading. I did a quick search, and it looks that futex
> and ntp are using CLOCK_REALTIME. Also there's one vsyscall gate which can
> pass CLOCK_REALTIME from caller too.
>
> Thanks,
> Kevin
>
>>
>> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
>> Sent: Wednesday, November 26, 2008 10:26 PM
>> To: Tian, Kevin; 'James Song'; xen-devel@lists.xensource.com
>> Subject: Re: [Xen-devel] when timer go back in dom0 save and restore or
>> migrate, PV domain hung
>>
>> hrtimers add wall_to_monotonic to xtime to get a timesource that doesn't (or
>> shouldn't!) warp.
>>
>>  -- Keir
>>
>> On 26/11/08 14:20, "Tian, Kevin" wrote:
>>
>>> how about hrtimers? one mode is CLOCK_REALTIME, which uses getnstimeofday
>>> as expiration. Once system time is changed either in local or new machine,
>>> that expiration can't be adjusted. but i'm not sure whether it still makes
>>> sense to try hrtimers in a guest.
>>>
>>> Thanks
>>> Kevin
>>>
>>>>
>>>> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
>>>> Sent: Wednesday, November 26, 2008 10:11 PM
>>>> To: Tian, Kevin; 'James Song'; xen-devel@lists.xensource.com
>>>> Subject: Re: [Xen-devel] when timer go back in dom0 save and restore or
>>>> migrate, PV domain hung
>>>>
>>>> The problem hasn't been fully explained, but I can say that PV guests
>>>> expect system time to jump across s/r and deal with that. For example,
>>>> Linux doesn't use Xen system time internally, but uses its progress to
>>>> periodically update jiffies, which does not warp across s/r.
>>>>
>>>> We have had problems corrupting wc_sec/wc_nsec in xc_domain_restore.c,
>>>> but that was fixed some time ago.
>>>>
>>>>  -- Keir
>>>>
>>>> On 26/11/08 14:00, "Tian, Kevin" wrote:
>>>>
>>>>> This is not a s/r or lm specific issue. For example, system time can be
>>>>> changed even when pv guest is running. Your patch only hacks restore
>>>>> point once, and wc_sec can still be changed later when system time is
>>>>> changed on-the-fly again.
>>>>>
>>>>> IIRC, pv guest can catch up wall clock change in timer interrupt, and
>>>>> time_resume will sync internal processed system time with new system
>>>>> time after restored. But I'm not sure whether it's enough. Actually the
>>>>> more interesting is the uptime difference. For example, timer with
>>>>> expiration calculated on previous system time may wait nearly infinite
>>>>> if uptime among two boxes vary a lot. But I think such issue should have
>>>>> been considered already, e.g. some user tool assistance. I think Keir
>>>>> can comment better here.
>>>>>
>>>>> BTW, do you happen to know what exactly dom0 hangs on? In some busy loop
>>>>> to catch up time, or long delay to some critical timer expiration?
>>>>>
>>>>> Thanks,
>>>>> Kevin
>>>>>
>>>>>>
>>>>>> From: xen-devel-bounces@lists.xensource.com
>>>>>> [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of James Song
>>>>>> Sent: Tuesday, November 25, 2008 4:02 PM
>>>>>> To: xen-devel@lists.xensource.com
>>>>>> Subject: [Xen-devel] when timer go back in dom0 save and restore or
>>>>>> migrate, PV domain hung
>>>>>>
>>>>>> Hi,
>>>>>>    I find the PV domain hangs when we take these steps:
>>>>>>       1, save PV domain
>>>>>>       2, change system time of PV domain back
>>>>>>       3, restore a PV domain
>>>>>>    or
>>>>>>       1, migrate a PV domain from Machine A to Machine B
>>>>>>       2, the system time of Machine B is slower than Machine A.
>>>>>>    The problem is that wc_sec will be changed when system time is
>>>>>>    changed in dom0, or on restore on a machine with slower system time,
>>>>>>    but when restoring, xen doesn't restore the wc_sec of share_info
>>>>>>    from xenstore and uses the native one. So the guest os will hang.
>>>>>>    This patch will work for this issue.
>>>>>>
>>>>>> Thanks
>>>>>> -- Song Wei
>>>>>>
>>>>>> diff -r a5ed0dbc829f tools/libxc/xc_domain_restore.c
>>>>>> --- a/tools/libxc/xc_domain_restore.c Tue Nov 18 14:34:14 2008 +0800
>>>>>> +++ b/tools/libxc/xc_domain_restore.c Fri Nov 21 17:34:15 2008 +0800
>>>>>> @@ -328,6 +328,16 @@
>>>>>>
>>>>>>      /* For info only */
>>>>>>      nr_pfns = 0;
>>>>>> +    //jsong@novell.com, james song
>>>>>> +    memset(&domctl, 0, sizeof(domctl));
>>>>>> +    domctl.domain = dom;
>>>>>> +    domctl.cmd    = XEN_DOMCTL_restoredomain;
>>>>>> +    frc = do_domctl(xc_handle, &domctl);
>>>>>> +    if ( frc != 0 )
>>>>>> +    {
>>>>>> +        ERROR("Unable to set flag of restore.");
>>>>>> +        goto out;
>>>>>> +    }
>>>>>>
>>>>>>      if ( read_exact(io_fd, &p2m_size, sizeof(unsigned long)) )
>>>>>>      {
>>>>>> @@ -1120,6 +1130,8 @@
>>>>>>
>>>>>>      /* restore saved vcpu_info and arch specific info */
>>>>>>      MEMCPY_FIELD(new_shared_info, old_shared_info, vcpu_info);
>>>>>> +    MEMCPY_FIELD(new_shared_info, old_shared_info, wc_nsec);
>>>>>> +    MEMCPY_FIELD(new_shared_info, old_shared_info, wc_sec);
>>>>>>      MEMCPY_FIELD(new_shared_info, old_shared_info, arch);
>>>>>>
>>>>>>      /* clear any pending events and the selector */
>>>>>> diff -r a5ed0dbc829f xen/arch/x86/time.c
>>>>>> --- a/xen/arch/x86/time.c Tue Nov 18 14:34:14 2008 +0800
>>>>>> +++ b/xen/arch/x86/time.c Fri Nov 21 17:34:15 2008 +0800
>>>>>> @@ -689,7 +689,6 @@
>>>>>>      wmb();
>>>>>>      (*version)++;
>>>>>>  }
>>>>>> -
>>>>>>  void update_vcpu_system_time(struct vcpu *v)
>>>>>>  {
>>>>>>      struct cpu_time *t;
>>>>>> @@ -703,7 +702,6 @@
>>>>>>
>>>>>>      if ( u->tsc_timestamp == t->local_tsc_stamp )
>>>>>>          return;
>>>>>> -
>>>>>>      version_update_begin(&u->version);
>>>>>>
>>>>>>      u->tsc_timestamp = t->local_tsc_stamp;
>>>>>> @@ -713,14 +711,19 @@
>>>>>>
>>>>>>      version_update_end(&u->version);
>>>>>>  }
>>>>>> -
>>>>>>  void update_domain_wallclock_time(struct domain *d)
>>>>>>  {
>>>>>>      spin_lock(&wc_lock);
>>>>>> +    if(d->after_restore )
>>>>>> +    {
>>>>>> +        d->after_restore = 0;
>>>>>> +        goto out; //jsong@novell.com
>>>>>> +    }
>>>>>>      version_update_begin(&shared_info(d, wc_version));
>>>>>>      shared_info(d, wc_sec) = wc_sec + d->time_offset_seconds;
>>>>>>      shared_info(d, wc_nsec) = wc_nsec;
>>>>>>      version_update_end(&shared_info(d, wc_version));
>>>>>> +out:
>>>>>>      spin_unlock(&wc_lock);
>>>>>>  }
>>>>>>
>>>>>> @@ -751,7 +754,6 @@
>>>>>>      u64 x;
>>>>>>      u32 y, _wc_sec, _wc_nsec;
>>>>>>      struct domain *d;
>>>>>> -
>>>>>>      x = (secs * 1000000000ULL) + (u64)nsecs - system_time_base;
>>>>>>      y = do_div(x, 1000000000);
>>>>>>
>>>>>> @@ -1050,7 +1052,6 @@
>>>>>>  struct tm wallclock_time(void)
>>>>>>  {
>>>>>>      uint64_t seconds;
>>>>>> -
>>>>>>      if ( !wc_sec )
>>>>>>          return (struct tm) { 0 };
>>>>>>
>>>>>> diff -r a5ed0dbc829f xen/common/domctl.c
>>>>>> --- a/xen/common/domctl.c Tue Nov 18 14:34:14 2008 +0800
>>>>>> +++ b/xen/common/domctl.c Fri Nov 21 17:34:15 2008 +0800
>>>>>> @@ -24,7 +24,6 @@
>>>>>>  #include <asm/current.h>
>>>>>>  #include <public/domctl.h>
>>>>>>  #include <xsm/xsm.h>
>>>>>> -
>>>>>>  extern long arch_do_domctl(
>>>>>>      struct xen_domctl *op, XEN_GUEST_HANDLE(xen_domctl_t) u_domctl);
>>>>>>
>>>>>> @@ -315,6 +314,16 @@
>>>>>>          ret = 0;
>>>>>>      }
>>>>>>      break;
>>>>>> +    case XEN_DOMCTL_restoredomain:
>>>>>> +    {
>>>>>> +        struct domain *d;
>>>>>> +        if ( (d = rcu_lock_domain_by_id(op->domain)) == NULL )
>>>>>> +            break;
>>>>>> +
>>>>>> +        d->after_restore = 1;
>>>>>> +        rcu_unlock_domain(d);
>>>>>> +        break;
>>>>>> +    }
>>>>>>
>>>>>>      case XEN_DOMCTL_createdomain:
>>>>>>      {
>>>>>> diff -r a5ed0dbc829f xen/include/public/domctl.h
>>>>>> --- a/xen/include/public/domctl.h Tue Nov 18 14:34:14 2008 +0800
>>>>>> +++ b/xen/include/public/domctl.h Fri Nov 21 17:34:15 2008 +0800
>>>>>> @@ -61,6 +61,7 @@
>>>>>>  #define XEN_DOMCTL_destroydomain      2
>>>>>>  #define XEN_DOMCTL_pausedomain        3
>>>>>>  #define XEN_DOMCTL_unpausedomain      4
>>>>>> +#define XEN_DOMCTL_restoredomain     51
>>>>>>  #define XEN_DOMCTL_resumedomain      27
>>>>>>
>>>>>>  #define XEN_DOMCTL_getdomaininfo      5
>>>>>> diff -r a5ed0dbc829f xen/include/xen/sched.h
>>>>>> --- a/xen/include/xen/sched.h Tue Nov 18 14:34:14 2008 +0800
>>>>>> +++ b/xen/include/xen/sched.h Fri Nov 21 17:34:15 2008 +0800
>>>>>> @@ -231,6 +231,7 @@
>>>>>>       * cause a deadlock. Acquirers don't spin waiting; they preempt.
>>>>>>       */
>>>>>>      spinlock_t hypercall_deadlock_mutex;
>>>>>> +    int after_restore; //jsong@novell.com
>>>>>>  };
>>>>>>
>>>>>>  struct domain_setup_info
>>>>>> -------------------------------------------------------------------------
>>>>>> --------------------
>>>>>> Thanks
>>>>>> --Song wei
>>>>>>
>>>>>
>>>>
>>>
>>
>

--B_3310556302_1637732--

--===============1147687240==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

--===============1147687240==--
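
The concern raised in the thread, that a timer whose expiry was computed
against the old wallclock may wait far too long once the wallclock steps
back, can be illustrated with a small model. This is only a sketch under
stated assumptions: it is not kernel or Xen code, and every name in it
(clock_model, model_step_wallclock, remaining_realtime_ns, and so on) is
hypothetical. It assumes realtime is a monotonic base plus an offset, and that
changing the wallclock (via 'date', or by restoring on a host whose clock is
behind) only moves that offset.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/*
 * Hypothetical model, not kernel or Xen code: realtime is derived from a
 * monotonic base plus an offset, and only the offset moves when the
 * wallclock is stepped.
 */
struct clock_model {
    int64_t monotonic_ns;    /* never warps */
    int64_t wall_offset_ns;  /* realtime = monotonic + wall_offset */
};

/* Current realtime (wallclock) reading of the model. */
int64_t model_realtime_ns(const struct clock_model *c)
{
    return c->monotonic_ns + c->wall_offset_ns;
}

/* Step the wallclock; a negative delta moves it backwards. */
void model_step_wallclock(struct clock_model *c, int64_t delta_ns)
{
    c->wall_offset_ns += delta_ns;
}

/* Time left before an absolute expiry on the monotonic base. */
int64_t remaining_monotonic_ns(const struct clock_model *c, int64_t expiry_ns)
{
    return expiry_ns - c->monotonic_ns;
}

/* Time left before an absolute expiry on the realtime base. */
int64_t remaining_realtime_ns(const struct clock_model *c, int64_t expiry_ns)
{
    return expiry_ns - model_realtime_ns(c);
}
```

With both timers armed 5 seconds ahead and the wallclock then stepped back by
one hour, remaining_monotonic_ns still reports 5 seconds, while
remaining_realtime_ns reports 3605 seconds. In this model, an expiry on the
monotonic base is unaffected by the step, and an absolute expiry on the
realtime base is pushed out by the size of the step.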