From: "James Song" <jsong@novell.com>
To: keir.fraser@eu.citrix.com, kevin.tian@intel.com,
	xen-devel@lists.xensource.com
Subject: Re: RE: Re: Re:  when timer go back in dom0 save and restore ormigrate, PV domain hung
Date: Wed, 26 Nov 2008 22:10:06 -0700
Message-ID: <492EC65F0200002000007EF2@lucius.provo.novell.com>



FYI

>>> "Tian, Kevin" <kevin.tian@intel.com> 08.11.27 11:50 >>>
Sorry for the typo; I did mean domU instead of dom0. :-) The point here
is that time_resume will sync to the new system time and wall clock at
restore, and thus a PV guest should be able to continue... Xen system
time is not wallclock time; it just counts up from power-on. As Keir
points out, only its progress is used to drive internal jiffies.
 --- Actually, save/restore or migration will not call time_resume; this
function may only be called for power saving.

Then what do you mean by "system time stops" here? TOD at user level, or
do you observe within the kernel that Xen system time never changes?
 --- If you run the command "date" in user mode, you will find that its
output does not change until an interval equal to the time delay has
passed. You can also run applications that have little to do with time,
such as vi, cd, etc., but if you run ping x.x.x.x you will see only one
line of response and it never continues.


 Thanks
 --James


  From: James Song [mailto:jsong@novell.com]
Sent: Thursday, November 27, 2008 11:20 AM
To: keir.fraser@eu.citrix.com; Tian, Kevin; xen-devel@lists.xensource.com
Subject: Reply: Re: [Xen-devel] when timer go back in dom0 save and
restore or migrate, PV domain hung


  
Hi,
    Yes, there was an earlier patch that fixed the wc_sec/wc_nsec problem
in xc_domain_restore.c, but it still missed something.
When dom0 is constructed or a PV domain is restored, the guest OS reads
the local wc_sec from Xen as its base time. wc_sec is initialized from
CMOS data. There are some cases in which wc_sec changes. One is setting
dom0's system time back, which changes dom0's time and makes wc_sec
smaller for both the guest OS and Xen. We can do a simple test: start a
PV domain, then set dom0's time back, and you will find that the guest
OS's system time stops. That is because you changed wc_sec for both Xen
and the guest OS.
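To make the mechanism concrete: the shared_info wc_sec/wc_nsec pair is the wallclock at Xen system time zero (the time.c hunk below computes x = secs * 1000000000 + nsecs - system_time_base), and update_domain_wallclock_time() wraps each write in version_update_begin()/version_update_end() on wc_version. The following is a minimal single-threaded sketch, assuming a hypothetical struct in place of the real shared_info layout, of how a guest could read that base consistently and add its own system time. A real guest would also need read barriers between the loads.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical stand-in for the wallclock fields in shared_info. */
struct wc_fields {
    volatile uint32_t wc_version; /* odd while Xen is updating */
    volatile uint32_t wc_sec;     /* wallclock seconds at system time 0 */
    volatile uint32_t wc_nsec;    /* wallclock nanoseconds at system time 0 */
};

/* Retry until we see an even version that did not change during the read.
 * A real guest would put rmb() between these loads. */
static uint64_t read_wc_base_ns(const struct wc_fields *s)
{
    uint32_t ver, sec, nsec;

    do {
        ver  = s->wc_version;
        sec  = s->wc_sec;
        nsec = s->wc_nsec;
    } while ((ver & 1) || ver != s->wc_version);

    assert(nsec < NSEC_PER_SEC);
    return (uint64_t)sec * NSEC_PER_SEC + nsec;
}

/* Guest wallclock (ns since the epoch) = wallclock base + Xen system time. */
static uint64_t guest_wallclock_ns(const struct wc_fields *s,
                                   uint64_t system_time_ns)
{
    return read_wc_base_ns(s) + system_time_ns;
}
```

In this model, setting dom0's clock back lowers wc_sec, so every wallclock value the guest derives drops by the same amount.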
    This patch only considers the save/restore case. I am still not sure
what the policy should be when dom0's system time goes back: what should
the VMs do? So I have not added that case to this patch.
   By the way, Kevin, it is the guest OS that hangs, not dom0 ;-) Also,
the hang lasts exactly as long as the interval by which you set dom0's
time back, or by which the new machine you migrate to is behind.
 Thanks
  -- James

>>> Keir Fraser <keir.fraser@eu.citrix.com> 08-11-26 22:58 >>>
So what happens if someone changes the wallclock using 'date'? That's
basically what will appear to happen when s/r occurs.

 -- Keir

On 26/11/08 14:32, "Tian, Kevin" <kevin.tian@intel.com> wrote:

hrtimer supports two timer bases: CLOCK_MONOTONIC and CLOCK_REALTIME.
wall_to_monotonic is only added in the former case; for the latter, TOD
is used directly, per my reading. I did a quick search, and it looks like
futex and ntp use CLOCK_REALTIME. There is also a vsyscall gate that can
pass CLOCK_REALTIME from the caller.

Thanks,
Kevin
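The difference between the two bases Kevin describes, together with Keir's remark below about wall_to_monotonic, can be illustrated with a toy model. The names are hypothetical and this is not kernel code: monotonic time is modeled as realtime plus wall_to_monotonic, and stepping the TOD clock (as 'date' would) moves wall_to_monotonic the opposite way. A deadline armed on the monotonic base is then unaffected by the step, while one armed on the realtime base is pushed out by the size of a backward step.

```c
#include <assert.h>
#include <stdint.h>

/* Toy clock state in nanoseconds (hypothetical, not kernel structures). */
struct toy_clock {
    int64_t realtime;          /* TOD, what getnstimeofday() would report */
    int64_t wall_to_monotonic; /* offset so that monotonic never warps */
};

static int64_t toy_monotonic(const struct toy_clock *c)
{
    return c->realtime + c->wall_to_monotonic;
}

/* Step the TOD clock by delta ns (negative = backwards), like 'date'.
 * wall_to_monotonic moves the opposite way, so monotonic is unchanged. */
static void toy_settime(struct toy_clock *c, int64_t delta)
{
    c->realtime += delta;
    c->wall_to_monotonic -= delta;
}

/* Time left before an absolute expiry on the CLOCK_REALTIME-like base. */
static int64_t left_realtime(const struct toy_clock *c, int64_t expires)
{
    int64_t left = expires - c->realtime;
    return left > 0 ? left : 0;
}

/* Time left before an absolute expiry on the CLOCK_MONOTONIC-like base. */
static int64_t left_monotonic(const struct toy_clock *c, int64_t expires)
{
    int64_t left = expires - toy_monotonic(c);
    return left > 0 ? left : 0;
}
```

In this model, a timer armed 10 ns ahead on the realtime base still has 110 ns to go after a 100 ns backward step, while the one armed on the monotonic base still has 10 ns.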

    
 
From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
Sent: Wednesday, November 26, 2008 10:26 PM
To: Tian, Kevin; 'James Song'; xen-devel@lists.xensource.com
Subject: Re: [Xen-devel] when timer go back in dom0 save and
restore or migrate, PV domain hung

 
hrtimers add wall_to_monotonic to xtime to get a timesource that
doesn't (or shouldn't!) warp.

 -- Keir

On 26/11/08 14:20, "Tian, Kevin" <kevin.tian@intel.com> wrote:

 
How about hrtimers? One mode is CLOCK_REALTIME, which uses
getnstimeofday for expiration. Once system time is changed, either
locally or on a new machine, that expiration can't be adjusted. But
I'm not sure whether it still makes sense to try hrtimers in a guest.

Thanks
Kevin

 
        
 
 
From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
Sent: Wednesday, November 26, 2008 10:11 PM
To: Tian, Kevin; 'James Song'; xen-devel@lists.xensource.com
Subject: Re: [Xen-devel] when timer go back in dom0 save and
restore or migrate, PV domain hung

 
The problem hasn't been fully explained, but I can say that PV guests
expect system time to jump across s/r and deal with that. For example,
Linux doesn't use Xen system time internally, but uses its progress to
periodically update jiffies, which does not warp across s/r.
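Keir's point, and the hang length James reports, can be illustrated with a toy model of a guest that advances jiffies only by the progress of Xen system time since the last value it processed. The names and the tick length are hypothetical; this is a simplified illustration, not the PV Linux code. If system time appears to go back and the guest does not re-base, jiffies stall until system time catches up with the last processed value; re-basing (what Kevin expects time_resume to do) removes the stall.

```c
#include <assert.h>
#include <stdint.h>

#define NS_PER_TICK 4000000LL /* hypothetical: HZ = 250 */

/* Hypothetical guest timekeeping state. */
struct toy_guest {
    int64_t  processed_system_time; /* system time already turned into ticks */
    uint64_t jiffies;
};

/* Timer interrupt: turn the progress of system time into whole ticks.
 * If system_time is behind processed_system_time (time went back and
 * the guest did not re-base), nothing happens until it catches up. */
static void toy_timer_tick(struct toy_guest *g, int64_t system_time)
{
    int64_t delta = system_time - g->processed_system_time;
    int64_t ticks;

    if (delta < NS_PER_TICK)
        return;
    ticks = delta / NS_PER_TICK;
    g->jiffies += (uint64_t)ticks;
    g->processed_system_time += ticks * NS_PER_TICK;
}

/* Re-base after restore so the next tick sees a small, non-negative delta. */
static void toy_time_resume(struct toy_guest *g, int64_t system_time)
{
    g->processed_system_time = system_time;
}
```

Because only the delta is consumed, jiffies themselves never go backwards in this model; the cost of a backward jump without re-basing is a stall of the same length.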

We have had problems corrupting wc_sec/wc_nsec in xc_domain_restore.c,
but that was fixed some time ago.

 -- Keir

On 26/11/08 14:00, "Tian, Kevin" <kevin.tian@intel.com> wrote:

 
 
This is not an s/r- or live-migration-specific issue. For example,
system time can be changed even while a PV guest is running. Your patch
only hacks the restore point once, and wc_sec can still be changed later
when system time is changed on the fly again.

IIRC, a PV guest can catch up with wall clock changes in its timer
interrupt, and time_resume will sync the internally processed system
time with the new system time after restore. But I'm not sure whether
that's enough. Actually, the more interesting issue is the uptime
difference. For example, a timer whose expiration was calculated from
the previous system time may wait almost indefinitely if uptime varies
a lot between the two boxes. But I think such an issue should already
have been considered, e.g. with some user tool assistance. I think Keir
can comment better here.

BTW, do you happen to know what exactly dom0 hangs on? Some busy loop
to catch up time, or a long delay before some critical timer expires?

Thanks,
Kevin

 
 
            
 
 
 
From: xen-devel-bounces@lists.xensource.com
[mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of James Song
Sent: Tuesday, November 25, 2008 4:02 PM
To: xen-devel@lists.xensource.com
Subject: [Xen-devel] when timer go back in dom0 save and
restore or migrate, PV domain hung

 
Hi,
   I find that a PV domain hangs when we take these steps:
         1. save a PV domain
         2. change the system time of the PV domain back
         3. restore the PV domain
   or
         1. migrate a PV domain from Machine A to Machine B
         2. the system time of Machine B is behind that of Machine A.
   The problem is that wc_sec changes when the system time is changed in
dom0, or when a domain is restored on a machine whose system time is
behind; but when restoring, Xen doesn't restore the wc_sec of shared_info
from xenstore and uses the native one instead. So the guest OS hangs.
This patch addresses this issue.

 Thanks
 -- Song Wei

diff -r a5ed0dbc829f tools/libxc/xc_domain_restore.c
--- a/tools/libxc/xc_domain_restore.c	Tue Nov 18 14:34:14 2008 +0800
+++ b/tools/libxc/xc_domain_restore.c	Fri Nov 21 17:34:15 2008 +0800
@@ -328,6 +328,16 @@
 
     /* For info only */
     nr_pfns = 0;
+    //jsong@novell.com, james song
+    memset(&domctl, 0, sizeof(domctl));
+    domctl.domain = dom;
+    domctl.cmd    = XEN_DOMCTL_restoredomain;
+    frc = do_domctl(xc_handle, &domctl);
+    if ( frc != 0 )
+    {
+        ERROR("Unable to set flag of restore.");
+        goto out;
+    }
 
     if ( read_exact(io_fd, &p2m_size, sizeof(unsigned long)) )
     {
@@ -1120,6 +1130,8 @@
 
     /* restore saved vcpu_info and arch specific info */
     MEMCPY_FIELD(new_shared_info, old_shared_info, vcpu_info);
+    MEMCPY_FIELD(new_shared_info, old_shared_info, wc_nsec);
+    MEMCPY_FIELD(new_shared_info, old_shared_info, wc_sec);
     MEMCPY_FIELD(new_shared_info, old_shared_info, arch);
 
     /* clear any pending events and the selector */
diff -r a5ed0dbc829f xen/arch/x86/time.c
--- a/xen/arch/x86/time.c	Tue Nov 18 14:34:14 2008 +0800
+++ b/xen/arch/x86/time.c	Fri Nov 21 17:34:15 2008 +0800
@@ -689,7 +689,6 @@
     wmb();
     (*version)++;
 }
-
 void update_vcpu_system_time(struct vcpu *v)
 {
     struct cpu_time *t;
@@ -703,7 +702,6 @@
 
     if ( u->tsc_timestamp == t->local_tsc_stamp )
         return;
-
     version_update_begin(&u->version);
 
     u->tsc_timestamp = t->local_tsc_stamp;
@@ -713,14 +711,19 @@
 
     version_update_end(&u->version);
 }
-
 void update_domain_wallclock_time(struct domain *d)
 {
     spin_lock(&wc_lock);
+    if ( d->after_restore )
+    {
+        d->after_restore = 0;
+        goto out;    //jsong@novell.com
+    }
     version_update_begin(&shared_info(d, wc_version));
     shared_info(d, wc_sec)  = wc_sec + d->time_offset_seconds;
     shared_info(d, wc_nsec) = wc_nsec;
     version_update_end(&shared_info(d, wc_version));
+out:
     spin_unlock(&wc_lock);
 }
 
@@ -751,7 +754,6 @@
     u64 x;
     u32 y, _wc_sec, _wc_nsec;
     struct domain *d;
-
     x = (secs * 1000000000ULL) + (u64)nsecs - system_time_base;
     y = do_div(x, 1000000000);
 
@@ -1050,7 +1052,6 @@
 struct tm wallclock_time(void)
 {
     uint64_t seconds;
-
     if ( !wc_sec )
         return (struct tm) { 0 };
 
diff -r a5ed0dbc829f xen/common/domctl.c
--- a/xen/common/domctl.c	Tue Nov 18 14:34:14 2008 +0800
+++ b/xen/common/domctl.c	Fri Nov 21 17:34:15 2008 +0800
@@ -24,7 +24,6 @@
 #include <asm/current.h>
 #include <public/domctl.h>
 #include <xsm/xsm.h>
-
 extern long arch_do_domctl(
     struct xen_domctl *op, XEN_GUEST_HANDLE(xen_domctl_t) u_domctl);
 
@@ -315,6 +314,16 @@
         ret = 0;
     }
     break;
+    case XEN_DOMCTL_restoredomain:
+    {
+        struct domain *d;
+        if ( (d = rcu_lock_domain_by_id(op->domain)) == NULL )
+            break;
+
+        d->after_restore = 1;
+        rcu_unlock_domain(d);
+        break;
+    }
 
     case XEN_DOMCTL_createdomain:
     {
diff -r a5ed0dbc829f xen/include/public/domctl.h
--- a/xen/include/public/domctl.h	Tue Nov 18 14:34:14 2008 +0800
+++ b/xen/include/public/domctl.h	Fri Nov 21 17:34:15 2008 +0800
@@ -61,6 +61,7 @@
 #define XEN_DOMCTL_destroydomain                  2
 #define XEN_DOMCTL_pausedomain                    3
 #define XEN_DOMCTL_unpausedomain                  4
+#define XEN_DOMCTL_restoredomain                 51
 #define XEN_DOMCTL_resumedomain                  27
 
 #define XEN_DOMCTL_getdomaininfo                  5
diff -r a5ed0dbc829f xen/include/xen/sched.h
--- a/xen/include/xen/sched.h	Tue Nov 18 14:34:14 2008 +0800
+++ b/xen/include/xen/sched.h	Fri Nov 21 17:34:15 2008 +0800
@@ -231,6 +231,7 @@
      * cause a deadlock. Acquirers don't spin waiting; they preempt.
      */
     spinlock_t hypercall_deadlock_mutex;
+    int after_restore;    //jsong@novell.com
 };
 
 struct domain_setup_info
---------------------------------------------------------------------------------------------
 Thanks
--Song Wei
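Kevin's objection above, that the patch only protects the restore point once, can be seen in a toy model of the patched update_domain_wallclock_time(). Field names mirror the patch, but this is a standalone illustration with the locking and version updates omitted, not Xen code: the first update after XEN_DOMCTL_restoredomain keeps the wc_sec copied from the saved image, and any later update (for example, after dom0's clock is changed again) overwrites it with the host's value.

```c
#include <assert.h>
#include <stdint.h>

/* Standalone model of the per-domain state touched by the patch. */
struct toy_domain {
    int      after_restore;        /* set by XEN_DOMCTL_restoredomain */
    uint32_t wc_sec;               /* guest-visible copy in shared_info */
    int32_t  time_offset_seconds;  /* per-domain offset added by Xen */
};

/* Mirrors the patched logic: skip exactly one update after restore. */
static void toy_update_wallclock(struct toy_domain *d, uint32_t host_wc_sec)
{
    if (d->after_restore) {
        d->after_restore = 0;      /* one-shot: cleared on first use */
        return;
    }
    d->wc_sec = host_wc_sec + (uint32_t)d->time_offset_seconds;
}
```

With a restored wc_sec of 5000 and a host value of 4000, the first update leaves 5000 in place and the second replaces it with 4000.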








_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Thread overview: 2+ messages
2008-11-27  5:10 James Song [this message]
2008-11-27  5:37 ` Re: RE: Re: Re: [Xen-devel] when timer go back in dom0 save and restore ormigrate, PV domain hung Tian, Kevin
