From: "James Song" <jsong@novell.com>
To: keir.fraser@eu.citrix.com, kevin.tian@intel.com,
	xen-devel@lists.xensource.com
Subject: RE: RE:  when timer go back in dom0 save and restore ormigrate,  PV domain hung
Date: Thu, 27 Nov 2008 03:21:44 -0700
Message-ID: <492F0F690200002000007F61@lucius.provo.novell.com>



Hi,
     OK, suppose there are two machines, A and B, and the system time of A is ahead of B's, so the wc_sec of A is also bigger than B's. When a PV domain on A migrates to B, we don't update that PV domain's wc_sec to match B's. Now look at the PV domain's kernel: xen_sched_clock() in arch/x86/xen/time.c and xen_clocksource_read() in arch/x86/kernel/time_32-xen.c.
  You will find the problem in its vcpu's state_entry_time: state_entry_time was initialized on machine A, so it is now bigger than "now" on machine B. As a result, there is no scheduling and no system-time update in the guest OS.
I don't know whether I have described it clearly.
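To make the failure mode concrete, here is a minimal sketch, assuming a sched_clock-style helper computes "time spent in the current runstate" as now minus state_entry_time. The function names below are hypothetical and are not taken from the kernel source; the point is only what happens when state_entry_time (recorded on machine A) is larger than "now" on machine B.

```c
#include <stdint.h>

/* Hypothetical: naive elapsed time since the vcpu entered its runstate.
 * Unsigned subtraction wraps to a huge value if now < state_entry_time. */
uint64_t runstate_elapsed_unsigned(uint64_t now, uint64_t state_entry_time)
{
    return now - state_entry_time;
}

/* Hypothetical: clamped variant. It avoids the wrap, but returns 0 until
 * "now" on the new host catches up with the old state_entry_time, so the
 * guest sees no forward progress in the meantime. */
uint64_t runstate_elapsed_clamped(uint64_t now, uint64_t state_entry_time)
{
    if (now < state_entry_time)
        return 0;
    return now - state_entry_time;
}

/* How long (ns) the clamped clock stays frozen after arriving on the new host. */
uint64_t catch_up_delay(uint64_t now, uint64_t state_entry_time)
{
    return state_entry_time > now ? state_entry_time - now : 0;
}
```

Either way the guest's notion of elapsed time is wrong after such a move: it either jumps by nearly 2^64 ns or stands still for catch_up_delay() nanoseconds.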

>>> "Tian, Kevin" <kevin.tian@intel.com> 08/11/27 PM 9:18 >>>
There's a clock_was_set() call for each settimeofday(). In the latest kernel, clock_was_set() adjusts the CLOCK_REALTIME queue accordingly, while in 2.6.18 it is defined as a no-op. That means the current domU would be unable to handle a wallclock change, but a newer kernel with pvops could.
---- Yes, it works for FV, but for a modified PV domain, maybe not.
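A minimal model of what "adjusting the CLOCK_REALTIME queue" means, assuming wall time = monotonic time + an offset and that timer hardware is programmed in monotonic time. The struct and function names are hypothetical and are not the kernel's hrtimer API.

```c
#include <stdint.h>

/* Hypothetical model of one CLOCK_REALTIME timer. */
struct rt_timer_model {
    int64_t wall_deadline_ns;  /* absolute CLOCK_REALTIME expiry */
    int64_t mono_expiry_ns;    /* expiry as programmed, in monotonic time */
};

/* Arm the timer using the realtime offset in force at arm time. */
void rt_timer_arm(struct rt_timer_model *t, int64_t wall_deadline_ns,
                  int64_t realtime_offset_ns)
{
    t->wall_deadline_ns = wall_deadline_ns;
    t->mono_expiry_ns = wall_deadline_ns - realtime_offset_ns;
}

/* What a clock_was_set()-style hook does in this model: re-derive the
 * monotonic expiry from the unchanged wall deadline and the new offset.
 * With a no-op hook this never runs and mono_expiry_ns stays stale. */
void rt_timer_clock_was_set(struct rt_timer_model *t,
                            int64_t new_realtime_offset_ns)
{
    t->mono_expiry_ns = t->wall_deadline_ns - new_realtime_offset_ns;
}
```

For example, arming at a wall deadline of 1000 with offset 400 programs expiry 600; if the wall clock is then stepped back by 100 (offset 300), the hook moves the programmed expiry to 700.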

 
For the issue reported in the original thread, I agree that James should dig into the hang and explain the exact reason first.
 
Thanks
Kevin

From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
Sent: Wednesday, November 26, 2008 10:58 PM
To: Tian, Kevin; 'James Song'; xen-devel@lists.xensource.com
Subject: Re: [Xen-devel] when timer go back in dom0 save and restore or migrate, PV domain hung


  
So what happens if someone changes the wallclock using 'date'? That's basically what will appear to happen when s/r occurs.

 -- Keir

On 26/11/08 14:32, "Tian, Kevin" <kevin.tian@intel.com> wrote:

hrtimer supports two timer bases: CLOCK_MONOTONIC and CLOCK_REALTIME. Per my reading, wall_to_monotonic is only added in the former case; for the latter, TOD is used directly. I did a quick search, and it looks like futex and ntp use CLOCK_REALTIME. There is also a vsyscall gate that can pass CLOCK_REALTIME from the caller.
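From user space, the same choice of base is visible in POSIX clock_nanosleep(): a caller can pass CLOCK_REALTIME with TIMER_ABSTIME, so the deadline is a wall-clock time, whereas a CLOCK_MONOTONIC deadline is unaffected by wall-clock steps. A minimal sketch; the helper names sleep_until() and deadline_after_ms() are made up for illustration.

```c
#define _POSIX_C_SOURCE 200112L
#include <errno.h>
#include <time.h>

/* Sleep until an absolute deadline on clock clk. With CLOCK_REALTIME a
 * backward wall-clock step during the sleep lengthens the wait; with
 * CLOCK_MONOTONIC it does not. Returns 0 or an error number. */
int sleep_until(clockid_t clk, const struct timespec *deadline)
{
    int rc;
    /* clock_nanosleep() returns the error number directly (not -1/errno).
     * Retrying on EINTR is safe because the deadline is absolute. */
    do {
        rc = clock_nanosleep(clk, TIMER_ABSTIME, deadline, NULL);
    } while (rc == EINTR);
    return rc;
}

/* Build an absolute deadline "ms" milliseconds from now on clock clk. */
int deadline_after_ms(clockid_t clk, long ms, struct timespec *out)
{
    if (clock_gettime(clk, out) != 0)
        return errno;
    out->tv_sec += ms / 1000;
    out->tv_nsec += (ms % 1000) * 1000000L;
    if (out->tv_nsec >= 1000000000L) {  /* normalize nanoseconds */
        out->tv_sec += 1;
        out->tv_nsec -= 1000000000L;
    }
    return 0;
}
```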

Thanks,
Kevin

    
 
From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
Sent: Wednesday, November 26, 2008 10:26 PM
To: Tian, Kevin; 'James Song'; xen-devel@lists.xensource.com
Subject: Re: [Xen-devel] when timer go back in dom0 save and restore or migrate, PV domain hung

 
hrtimers add wall_to_monotonic to xtime to get a timesource that doesn't (or shouldn't!) warp.
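A minimal model of that relationship, assuming monotonic = xtime + wall_to_monotonic and that settimeofday() moves wall_to_monotonic by the opposite of the wall-clock step. The struct and function names are hypothetical, not kernel code.

```c
#include <stdint.h>

/* Hypothetical timekeeping model; all values in nanoseconds. */
struct tk_model {
    int64_t xtime_ns;              /* wall-clock (CLOCK_REALTIME) time */
    int64_t wall_to_monotonic_ns;  /* added to xtime to get monotonic time */
};

int64_t tk_realtime(const struct tk_model *tk)
{
    return tk->xtime_ns;
}

int64_t tk_monotonic(const struct tk_model *tk)
{
    return tk->xtime_ns + tk->wall_to_monotonic_ns;
}

/* settimeofday(): step xtime and compensate wall_to_monotonic, so the
 * monotonic sum does not warp even though the wall clock jumps. */
void tk_settimeofday(struct tk_model *tk, int64_t new_wall_ns)
{
    int64_t delta = new_wall_ns - tk->xtime_ns;
    tk->xtime_ns = new_wall_ns;
    tk->wall_to_monotonic_ns -= delta;
}

/* Ordinary time passing advances both clocks equally. */
void tk_advance(struct tk_model *tk, int64_t elapsed_ns)
{
    tk->xtime_ns += elapsed_ns;
}
```

Setting the wall clock from 5000 back to 3000 with wall_to_monotonic = -4000 leaves monotonic time at 1000; only CLOCK_REALTIME users see the jump.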

 -- Keir

On 26/11/08 14:20, "Tian, Kevin" <kevin.tian@intel.com> wrote:

 
How about hrtimers? One mode is CLOCK_REALTIME, which uses getnstimeofday() for expiration. Once system time is changed, either locally or on a new machine, that expiration can't be adjusted. But I'm not sure whether it still makes sense to try hrtimers in a guest.
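One way to notice that such an expiration has gone stale is to compare how far CLOCK_REALTIME and CLOCK_MONOTONIC advanced between two samples: if they disagree by more than some jitter tolerance, the wall clock was stepped. A hedged sketch; the helper name and the tolerance parameter are assumptions for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: estimate the wall-clock step between two samples
 * (all values in ns). Differences within tolerance_ns, e.g. NTP slewing
 * or sampling jitter, are reported as 0. A negative result means the
 * wall clock was stepped backwards. */
int64_t wallclock_step_ns(int64_t prev_real, int64_t prev_mono,
                          int64_t now_real, int64_t now_mono,
                          int64_t tolerance_ns)
{
    int64_t step = (now_real - prev_real) - (now_mono - prev_mono);
    int64_t mag = step < 0 ? -step : step;
    return mag <= tolerance_ns ? 0 : step;
}
```

If realtime moved back by 300 while monotonic moved forward by 500, the helper reports a step of -800, i.e. any CLOCK_REALTIME expiry computed before the step now lies 800 further in the future.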

Thanks
Kevin

 
        
 
 
From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
Sent: Wednesday, November 26, 2008 10:11 PM
To: Tian, Kevin; 'James Song'; xen-devel@lists.xensource.com
Subject: Re: [Xen-devel] when timer go back in dom0 save and restore or migrate, PV domain hung

 
The problem hasn't been fully explained, but I can say that PV guests expect system time to jump across s/r and deal with that. For example, Linux doesn't use Xen system time internally, but uses its progress to periodically update jiffies, which do not warp across s/r.
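A rough model of that mechanism, assuming the guest timer interrupt converts whole ticks of elapsed Xen system time into jiffies and that a resume hook re-bases the "processed" point on the new host's system time (Kevin mentions time_resume doing this). The names and the NS_PER_TICK value (HZ=250) are assumptions, not the actual guest code.

```c
#include <stdint.h>

#define NS_PER_TICK 4000000LL  /* assumed HZ=250; illustrative only */

/* Hypothetical per-guest tick accounting state. */
struct tick_state {
    uint64_t jiffies;
    int64_t processed_system_time;  /* ns of system time already accounted */
};

/* Timer interrupt: account whole ticks of system-time progress. If system
 * time went backwards (delta < 0), nothing is accounted until it catches up. */
void timer_tick(struct tick_state *s, int64_t system_time_now)
{
    int64_t delta = system_time_now - s->processed_system_time;
    while (delta >= NS_PER_TICK) {
        delta -= NS_PER_TICK;
        s->processed_system_time += NS_PER_TICK;
        s->jiffies++;
    }
}

/* Resume hook: re-base on the new host's system time so jiffies continue
 * from where they were instead of stalling or jumping. */
void timer_resume(struct tick_state *s, int64_t system_time_now)
{
    s->processed_system_time = system_time_now;
}
```

Without the resume step, a guest moved to a host whose system time is far behind would account no ticks at all until that host's system time caught up, which is one way the symptoms James reports could arise.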

We have had problems corrupting wc_sec/wc_nsec in xc_domain_restore.c, but that was fixed some time ago.

 -- Keir

On 26/11/08 14:00, "Tian, Kevin" <kevin.tian@intel.com> wrote:

 
 
This is not an s/r- or live-migration-specific issue. For example, system time can be changed even while a PV guest is running. Your patch only hacks the restore point once, and wc_sec can still change later when system time is changed on the fly again.

IIRC, a PV guest can catch up with a wallclock change in its timer interrupt, and time_resume will sync the internally processed system time with the new system time after restore. But I'm not sure whether that's enough. Actually, the more interesting issue is the uptime difference. For example, a timer whose expiration was calculated from the previous system time may wait nearly forever if the uptimes of the two boxes differ a lot. But I think such an issue should have been considered already, e.g. with some user-tool assistance. I think Keir can comment better here.
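The uptime-difference case is simple arithmetic. A hedged sketch, assuming a timer expiry is stored as an absolute Xen system time (roughly host uptime in ns); both helpers are hypothetical. Rebasing keeps the remaining interval measured on the old host instead of comparing against the new host's much smaller system time.

```c
#include <stdint.h>

/* Hypothetical: remaining wait if the old absolute expiry is compared
 * directly against the new host's system time. */
int64_t naive_remaining_ns(int64_t expiry_old, int64_t now_new)
{
    return expiry_old - now_new;
}

/* Hypothetical: carry over the remaining interval from save time; an
 * already-expired timer is rebased to fire immediately. */
int64_t rebase_expiry_ns(int64_t expiry_old, int64_t now_old_at_save,
                         int64_t now_new_at_restore)
{
    int64_t remaining = expiry_old - now_old_at_save;
    if (remaining < 0)
        remaining = 0;
    return now_new_at_restore + remaining;
}
```

With a source host up 30 days, a timer due in 5 s and a target host up 1 hour, the naive wait is about 30 days, while the rebased expiry fires 5 s after restore.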

BTW, do you happen to know what exactly dom0 hangs on? Some busy loop catching up time, or a long delay until some critical timer expiration?

Thanks,
Kevin

 
 
            
 
 
 
From: xen-devel-bounces@lists.xensource.com [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of James Song
Sent: Tuesday, November 25, 2008 4:02 PM
To: xen-devel@lists.xensource.com
Subject: [Xen-devel] when timer go back in dom0 save and restore or migrate, PV domain hung

 
Hi,
   I find that a PV domain hangs when we take these steps:
         1. save a PV domain
         2. change the system time of the PV domain back
         3. restore the PV domain
        or
         1. migrate a PV domain from Machine A to Machine B
         2. the system time of Machine B is slower than Machine A
   The problem is that wc_sec changes when the system time is changed in dom0, or when restoring on a machine whose system time is slower. But when restoring, Xen doesn't restore the wc_sec of shared_info from xenstore; it uses the native one instead. So the guest OS will hang.
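For context, the wallclock fields live in shared_info and are published with a version counter: the writer makes the version odd, updates wc_sec/wc_nsec, then makes it even again, and a reader retries until it sees the same even version before and after. A minimal sketch of that protocol, modeled loosely on the version_update_begin/end pattern visible in the patch context; the struct, helper names and the use of C11 fences in place of Xen's wmb()/rmb() are assumptions.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical subset of the shared wallclock fields. */
struct wallclock_shared {
    volatile uint32_t wc_version;  /* odd while an update is in progress */
    volatile uint32_t wc_sec;      /* wall time at system time 0, seconds */
    volatile uint32_t wc_nsec;
};

/* Writer: version odd -> update fields -> version even. */
void wallclock_write(struct wallclock_shared *s, uint32_t sec, uint32_t nsec)
{
    s->wc_version = (s->wc_version + 1) | 1;
    atomic_thread_fence(memory_order_seq_cst);  /* stands in for wmb() */
    s->wc_sec = sec;
    s->wc_nsec = nsec;
    atomic_thread_fence(memory_order_seq_cst);
    s->wc_version = s->wc_version + 1;
}

/* Reader: retry until an even, unchanged version brackets the reads. */
void wallclock_read(const struct wallclock_shared *s,
                    uint32_t *sec, uint32_t *nsec)
{
    uint32_t v;
    do {
        v = s->wc_version;
        atomic_thread_fence(memory_order_seq_cst);  /* stands in for rmb() */
        *sec = s->wc_sec;
        *nsec = s->wc_nsec;
        atomic_thread_fence(memory_order_seq_cst);
    } while ((v & 1) || v != s->wc_version);
}

/* Guest wall time = wallclock base + Xen system time (ns). */
uint64_t guest_wall_ns(uint32_t wc_sec, uint32_t wc_nsec, uint64_t system_time_ns)
{
    return (uint64_t)wc_sec * 1000000000ULL + wc_nsec + system_time_ns;
}
```

Because the guest's wall time is derived as base plus system time, whichever wc_sec the restored shared_info carries directly shifts what the guest reports.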
This patch fixes this issue.

 Thanks
 -- Song Wei

diff -r a5ed0dbc829f tools/libxc/xc_domain_restore.c
--- a/tools/libxc/xc_domain_restore.c	Tue Nov 18 14:34:14 2008 +0800
+++ b/tools/libxc/xc_domain_restore.c	Fri Nov 21 17:34:15 2008 +0800
@@ -328,6 +328,16 @@
 
     /* For info only */
     nr_pfns = 0;
+    //jsong@novell.com, james song
+    memset(&domctl, 0, sizeof(domctl));
+    domctl.domain = dom;
+    domctl.cmd    = XEN_DOMCTL_restoredomain;
+    frc = do_domctl(xc_handle, &domctl);
+    if ( frc != 0 )
+    {
+        ERROR("Unable to set flag of restore.");
+        goto out;
+    }
 
     if ( read_exact(io_fd, &p2m_size, sizeof(unsigned long)) )
     {
@@ -1120,6 +1130,8 @@
 
     /* restore saved vcpu_info and arch specific info */
     MEMCPY_FIELD(new_shared_info, old_shared_info, vcpu_info);
+    MEMCPY_FIELD(new_shared_info, old_shared_info, wc_nsec);
+    MEMCPY_FIELD(new_shared_info, old_shared_info, wc_sec);
     MEMCPY_FIELD(new_shared_info, old_shared_info, arch);
 
     /* clear any pending events and the selector */
diff -r a5ed0dbc829f xen/arch/x86/time.c
--- a/xen/arch/x86/time.c	Tue Nov 18 14:34:14 2008 +0800
+++ b/xen/arch/x86/time.c	Fri Nov 21 17:34:15 2008 +0800
@@ -689,7 +689,6 @@
     wmb();
     (*version)++;
 }
-
 void update_vcpu_system_time(struct vcpu *v)
 {
     struct cpu_time *t;
@@ -703,7 +702,6 @@
 
     if ( u->tsc_timestamp == t->local_tsc_stamp )
         return;
-
     version_update_begin(&u->version);
 
     u->tsc_timestamp = t->local_tsc_stamp;
@@ -713,14 +711,19 @@
 
     version_update_end(&u->version);
 }
-
 void update_domain_wallclock_time(struct domain *d)
 {
     spin_lock(&wc_lock);
+    if(d->after_restore )
+    {
+        d->after_restore = 0;
+        goto out; //jsong@novell.com
+    }
     version_update_begin(&shared_info(d, wc_version));
     shared_info(d, wc_sec)  = wc_sec + d->time_offset_seconds;
     shared_info(d, wc_nsec) = wc_nsec;
     version_update_end(&shared_info(d, wc_version));
+out:
     spin_unlock(&wc_lock);
 }
 
@@ -751,7 +754,6 @@
     u64 x;
     u32 y, _wc_sec, _wc_nsec;
     struct domain *d;
-
     x = (secs * 1000000000ULL) + (u64)nsecs - system_time_base;
     y = do_div(x, 1000000000);
 
@@ -1050,7 +1052,6 @@
 struct tm wallclock_time(void)
 {
     uint64_t seconds;
-
     if ( !wc_sec )
         return (struct tm) { 0 };
 
diff -r a5ed0dbc829f xen/common/domctl.c
--- a/xen/common/domctl.c	Tue Nov 18 14:34:14 2008 +0800
+++ b/xen/common/domctl.c	Fri Nov 21 17:34:15 2008 +0800
@@ -24,7 +24,6 @@
 #include <asm/current.h>
 #include <public/domctl.h>
 #include <xsm/xsm.h>
-
 extern long arch_do_domctl(
     struct xen_domctl *op, XEN_GUEST_HANDLE(xen_domctl_t) u_domctl);
 
@@ -315,6 +314,16 @@
         ret = 0;
     }
     break;
+    case XEN_DOMCTL_restoredomain:
+    {
+        struct domain *d;
+        if ( (d = rcu_lock_domain_by_id(op->domain)) == NULL )
+            break;
+
+        d->after_restore = 1;
+        rcu_unlock_domain(d);
+        break;
+    }
 
     case XEN_DOMCTL_createdomain:
     {
diff -r a5ed0dbc829f xen/include/public/domctl.h
--- a/xen/include/public/domctl.h	Tue Nov 18 14:34:14 2008 +0800
+++ b/xen/include/public/domctl.h	Fri Nov 21 17:34:15 2008 +0800
@@ -61,6 +61,7 @@
 #define XEN_DOMCTL_destroydomain                  2
 #define XEN_DOMCTL_pausedomain                    3
 #define XEN_DOMCTL_unpausedomain                  4
+#define XEN_DOMCTL_restoredomain                 51
 #define XEN_DOMCTL_resumedomain                  27
 
 #define XEN_DOMCTL_getdomaininfo                  5
diff -r a5ed0dbc829f xen/include/xen/sched.h
--- a/xen/include/xen/sched.h	Tue Nov 18 14:34:14 2008 +0800
+++ b/xen/include/xen/sched.h	Fri Nov 21 17:34:15 2008 +0800
@@ -231,6 +231,7 @@
      * cause a deadlock.  Acquirers don't spin waiting; they preempt.
      */
     spinlock_t hypercall_deadlock_mutex;
+    int after_restore; //jsong@novell.com
 };
 
 struct domain_setup_info
---------------------------------------------------------------------------------------------
 Thanks
--Song Wei
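Stripped of the hypervisor plumbing, the one-shot flag the patch adds behaves roughly like the model below (all names are hypothetical, not Xen code). It also makes Kevin's objection easy to see: only the first wallclock update after restore is skipped, so any later update overwrites the restored value again.

```c
#include <stdint.h>

/* Hypothetical model of the per-domain state touched by the patch. */
struct dom_model {
    int after_restore;            /* set by the restore domctl */
    int64_t time_offset_seconds;  /* per-domain wallclock offset */
    uint32_t wc_sec;              /* value visible to the guest */
};

/* Models the XEN_DOMCTL_restoredomain handler: arm the one-shot flag. */
void model_restoredomain(struct dom_model *d)
{
    d->after_restore = 1;
}

/* Models update_domain_wallclock_time() with the patch applied. */
void model_update_wallclock(struct dom_model *d, uint32_t host_wc_sec)
{
    if (d->after_restore) {
        d->after_restore = 0;  /* one-shot: only this update is skipped */
        return;
    }
    d->wc_sec = (uint32_t)(host_wc_sec + d->time_offset_seconds);
}
```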







_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Thread overview: 3+ messages
2008-11-27 10:21 James Song [this message]
2008-11-27 10:51 ` RE: RE: when timer go back in dom0 save and restore ormigrate, PV domain hung Keir Fraser
2008-11-27 17:51   ` RE: RE: [Xen-devel] " Jeremy Fitzhardinge
