* How to get the checkpoint size in remus code?
@ 2012-10-01 14:48 José Eduardo França
From: José Eduardo França @ 2012-10-01 14:48 UTC (permalink / raw)
To: xen-devel
Hi,
I'm doing my master's research and need to adapt the Remus code. I want to
get the checkpoint size (memory + disk) for each period. Does anyone know
which function does this? I think some *fd* object's function in the Remus
code might only give the memory size.
Can anyone help me?
Thanks
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
* Re: How to get the checkpoint size in remus code?
From: lmingcsce @ 2012-10-01 15:30 UTC (permalink / raw)
To: José Eduardo França; +Cc: xen-devel
Remus takes advantage of the last iteration of live migration to collect the dirty memory and transfer it to the backup machine, so you can get the dirty-memory size from live migration's dirty bitmap.
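As a back-of-the-envelope illustration (my own sketch, not Xen API code): once you have the dirty-page count from the bitmap, the memory portion of a checkpoint is roughly that count times the guest page size.

```python
PAGE_SIZE = 4096  # x86 page size used by Xen PV/HVM guests


def checkpoint_memory_size(dirty_count):
    """Estimate the bytes of guest memory sent in one checkpoint epoch,
    given the number of dirty pages reported by the dirty bitmap."""
    return dirty_count * PAGE_SIZE
```

For example, 2500 dirty pages works out to about 10 MB of memory payload per epoch, before compression.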
* Re: How to get the checkpoint size in remus code?
From: José Eduardo França @ 2012-10-03 11:11 UTC (permalink / raw)
To: lmingcsce; +Cc: xen-devel
I thought Remus used xc_domain_save. Is this function also used by live
migration?
Furthermore, if Remus really takes over the last iteration of live
migration, I have two doubts:
1. Which function is it?
2. How do I get the disk I/O size for each period?
Thanks
* Re: How to get the checkpoint size in remus code?
From: Shriram Rajagopalan @ 2012-10-05 17:37 UTC (permalink / raw)
To: José Eduardo França; +Cc: lmingcsce, xen-devel
On Wed, Oct 3, 2012 at 4:11 AM, José Eduardo França <jefranca@gmail.com> wrote:
>
> I thought Remus used xc_domain_save. Is this function also used by live
> migration?
>
> Furthermore, if Remus really takes over the last iteration of live
> migration, I have two doubts:
>
> Which function is it?
There is no specific function; xc_domain_save is where everything
happens. It contains the infinite loop that keeps sending checkpoints
at a particular frequency.
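Schematically, that loop looks like the following (a hedged Python sketch with stand-in callbacks; the real code lives in C inside xc_domain_save and the remus Python driver, and is bounded here only for illustration):

```python
import time


def remus_checkpoint_loop(suspend, send_dirty_pages, resume,
                          interval_ms, epochs):
    """Schematic of the Remus checkpoint loop.

    suspend/send_dirty_pages/resume are stand-ins for the real
    callbacks; interval_ms is the checkpoint frequency. The real loop
    runs forever; `epochs` bounds it here for illustration.
    """
    log = []
    for _ in range(epochs):
        start = time.time()
        suspend()                    # pause the guest
        log.append("suspend")
        send_dirty_pages()           # ship the dirty bitmap's pages
        log.append("send")
        resume()                     # guest runs while the buffer commits
        log.append("resume")
        # sleep out the remainder of the checkpoint interval
        elapsed = (time.time() - start) * 1000
        if elapsed < interval_ms:
            time.sleep((interval_ms - elapsed) / 1000.0)
    return log
```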
> How do I get the disk I/O size for each period?
>
This depends on the disk backend. With blktap2 (unfortunately not
available in 3.* kernels), the tap-remus driver can give you the number
of disk blocks sent per checkpoint.
With DRBD, it takes a little hacking of the kernel module to return the
number of disk blocks sent with each checkpoint.
>> I'm doing my master's research and need to adapt the Remus code. I want
>> to get the checkpoint size (memory + disk) for each period. Does anyone
>> know which function does this? I think some fd object's function in the
>> Remus code might only give the memory size.
>>
You can get memory-checkpoint stats for each iteration, such as the
number of pages dirtied, the size of the data actually transmitted after
compression (including headers, etc.), and the time to checkpoint.
The attached patch (for xen-4.1.2) will print the memory-checkpoint
stats for each checkpoint in a format that is easy to parse.
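Going by the format string in print_remus_stats() in the attached patch (PHASE:EPOCH followed by alternating key:value fields), a hypothetical parser for those stderr lines could look like this; parse_remus_stats is my own helper name, not part of the patch:

```python
def parse_remus_stats(line):
    """Parse one stats line emitted by the patch's print_remus_stats(),
    e.g. 'REMUS:42:suspendAt:1319645942.123456:scall:100:...'.
    Returns a dict with 'phase', 'epoch', and one entry per key."""
    fields = line.strip().split(":")
    rec = {"phase": fields[0], "epoch": int(fields[1])}
    # remaining fields alternate key, value
    keys, vals = fields[2::2], fields[3::2]
    for k, v in zip(keys, vals):
        rec[k] = float(v) if "." in v else int(v)
    return rec
```

The separate "Total Data Sent= ..." line the patch also prints would need its own (trivial) handling.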
shriram
[-- Attachment #2: 04_stats_fix.patch --]
[-- Type: application/octet-stream, Size: 14937 bytes --]
diff -r a9fd6c05f469 tools/libxc/xc_compression.c
--- a/tools/libxc/xc_compression.c Thu Oct 13 08:53:03 2011 -0700
+++ b/tools/libxc/xc_compression.c Wed Oct 26 09:19:02 2011 -0700
@@ -451,8 +451,8 @@
void xc_compression_reset_pagebuf(xc_interface *xch, comp_ctx *ctx)
{
ctx->pfns_index = ctx->pfns_len = 0;
- /* fprintf(stderr, "pagetables=%u,cache_misses=%u,emptypages=%u\n", */
- /* ctx->pagetable_pages, ctx->cache_misses, ctx->emptypages); */
+ fprintf(stderr, "pagetables=%u,cache_misses=%u,emptypages=%u\n",
+ ctx->pagetable_pages, ctx->cache_misses, ctx->emptypages);
ctx->pagetable_pages = ctx->cache_misses = ctx->emptypages = 0;
}
diff -r a9fd6c05f469 tools/libxc/xc_domain_save.c
--- a/tools/libxc/xc_domain_save.c Thu Oct 13 08:53:03 2011 -0700
+++ b/tools/libxc/xc_domain_save.c Wed Oct 26 09:19:02 2011 -0700
@@ -54,11 +54,24 @@
struct domain_info_context dinfo;
};
+struct remus_stats {
+ struct timeval suspendTime, resumeTime, compressTime, memflushTime;
+ unsigned int commitDelay;
+ unsigned int iter_sent;
+ unsigned long compression;
+ unsigned int iter_start;
+ unsigned int iter_now;
+ unsigned long total_sent;
+ unsigned long p2m_size;
+ struct timeval suspendCall, resumeCall, shadowCall;
+};
+
/* buffer for output */
struct outbuf {
void* buf;
size_t size;
size_t pos;
+ unsigned long total_data_sent;
};
#define OUTBUF_SIZE (16384 * 1024)
@@ -219,6 +232,7 @@
rc = write(fd, ob->buf + cur, ob->pos - cur);
}
+ ob->total_data_sent += ob->pos;
ob->pos = 0;
return 0;
@@ -253,7 +267,7 @@
}
static int write_compressed(xc_interface *xch, void *comp_ctx, int dobuf,
- struct outbuf* ob, int fd)
+ struct outbuf* ob, int fd, unsigned long *data_sent)
{
int rc = 0;
int header = sizeof(int) + sizeof(unsigned long);
@@ -292,6 +306,7 @@
}
ob->pos += (size_t) compbuf_len;
+ *data_sent += compbuf_len;
if (!dobuf && outbuf_flush(xch, ob, fd) < 0)
{
ERROR("Error when writing compressed chunk");
@@ -459,6 +474,47 @@
return 0;
}
+static void print_remus_stats(xc_interface *xch, struct remus_stats *rstats)
+{
+ unsigned int suspendTime = 0, compressTime = 0, flushTime = 0;
+ unsigned int suspendCall = 0, resumeCall = 0, shadowCall = 0;
+
+ /* interval precision in microseconds. */
+ suspendCall = (unsigned int)(tv_delta(&rstats->suspendTime,
+ &rstats->suspendCall));
+ resumeCall = (unsigned int)(tv_delta(&rstats->resumeTime,
+ &rstats->resumeCall));
+ shadowCall = (unsigned int)(tv_delta(&rstats->shadowCall,
+ &rstats->suspendTime));
+ suspendTime = (unsigned int)(tv_delta(&rstats->resumeCall,
+ &rstats->suspendTime));
+
+ if (!rstats->compression)
+ {
+ flushTime = (unsigned int)(tv_delta(&rstats->memflushTime,
+ &rstats->resumeTime));
+ }
+ else
+ {
+ compressTime = (unsigned int)(tv_delta(&rstats->compressTime,
+ &rstats->resumeTime));
+ flushTime = (unsigned int)(tv_delta(&rstats->memflushTime,
+ &rstats->compressTime));
+ }
+
+ fprintf(stderr, "%s:%u:suspendAt:%lu.%06lu:scall:%u:rcall:%u:dcall:%u:"
+ "suspendFor:%u:ctime:%u:flush:%u:commit:%u:tosend:%u:comp:%lu\n",
+ rstats->iter_start?"REMUS":"INITIAL",
+ (rstats->iter_now - rstats->iter_start),
+ rstats->suspendTime.tv_sec, rstats->suspendTime.tv_usec,
+ suspendCall, resumeCall, shadowCall,
+ suspendTime, compressTime, flushTime,
+ rstats->commitDelay,
+ rstats->iter_sent, rstats->compression);
+ fprintf(stderr, "Total Data Sent= %.3f MB\n",
+ ((float)(rstats->total_sent/(1024.0 * 1024.0))));
+}
+
static int analysis_phase(xc_interface *xch, uint32_t domid, struct save_ctx *ctx,
xc_hypercall_buffer_t *arr, int runs)
@@ -995,6 +1051,7 @@
unsigned long *to_fix = NULL;
xc_shadow_op_stats_t stats;
+ struct remus_stats rstats;
unsigned long needed_to_fix = 0;
unsigned long total_sent = 0;
@@ -1036,6 +1093,12 @@
}
outbuf_init(xch, &ob_pagebuf, OUTBUF_SIZE);
+ memset(&rstats, 0, sizeof(struct remus_stats));
+ gettimeofday(&rstats.suspendTime, NULL);
+ gettimeofday(&rstats.resumeTime, NULL);
+ gettimeofday(&rstats.compressTime, NULL);
+ gettimeofday(&rstats.memflushTime, NULL);
+ gettimeofday(&rstats.compressTime, NULL);
/* If no explicit control parameters given, use defaults */
max_iters = max_iters ? : DEF_MAX_ITERS;
@@ -1072,6 +1135,7 @@
/* Get the size of the P2M table */
dinfo->p2m_size = xc_domain_maximum_gpfn(xch, dom) + 1;
+ rstats.p2m_size = dinfo->p2m_size;
if ( dinfo->p2m_size > ~XEN_DOMCTL_PFINFO_LTAB_MASK )
{
@@ -1273,9 +1337,10 @@
#undef ratewrite
#endif
#define ratewrite(fd, live, buf, len) ratewrite_buffer(xch, last_iter, ob, (fd), (live), (buf), (len))
-#define wrcompressed(fd) write_compressed(xch, compress_ctx, last_iter, ob, (fd))
+#define wrcompressed(fd) write_compressed(xch, compress_ctx, last_iter, ob, (fd), &(rstats.compression))
ob = &ob_pagebuf; /* Holds pfn_types, pages/compressed pages */
+ ob_pagebuf.total_data_sent = ob_tailbuf.total_data_sent = 0;
/* Now write out each data page, canonicalising page tables as we go... */
for ( ; ; )
{
@@ -1291,6 +1356,8 @@
iter++;
sent_this_iter = 0;
skip_this_iter = 0;
+ rstats.compression = 0;
+ rstats.iter_now = iter;
N = 0;
while ( N < dinfo->p2m_size )
@@ -1653,8 +1720,9 @@
xc_report_progress_step(xch, dinfo->p2m_size, dinfo->p2m_size);
total_sent += sent_this_iter;
+ rstats.iter_sent = sent_this_iter;
- if ( last_iter )
+ if ( 0 )
{
print_stats( xch, dom, sent_this_iter, &stats, 1);
@@ -1692,13 +1760,16 @@
{
DPRINTF("Start last iteration\n");
last_iter = 1;
+ rstats.iter_start = iter;
+ gettimeofday(&rstats.suspendCall, NULL);
if ( suspend_and_state(callbacks->suspend, callbacks->data,
xch, io_fd, dom, &info) )
{
ERROR("Domain appears not to have suspended");
goto out;
}
+ gettimeofday(&rstats.suspendTime, NULL);
DPRINTF("SUSPEND shinfo %08lx\n", info.shared_info_frame);
if ( (tmem_saved > 0) &&
@@ -1725,14 +1796,15 @@
goto out;
}
+ gettimeofday(&rstats.shadowCall, NULL);
sent_last_iter = sent_this_iter;
- print_stats(xch, dom, sent_this_iter, &stats, 1);
+ /* print_stats(xch, dom, sent_this_iter, &stats, 1); */
}
} /* end of infinite for loop */
- DPRINTF("All memory is saved\n");
+ /* DPRINTF("All memory is saved\n"); */
/* After last_iter, buffer the rest of pagebuf & tailbuf data into a
* separate output buffer and flush it after the compressed page chunks.
@@ -2112,8 +2184,10 @@
out:
completed = 1;
+ gettimeofday(&rstats.resumeCall, NULL);
if ( !rc && callbacks->postcopy )
callbacks->postcopy(callbacks->data);
+ gettimeofday(&rstats.resumeTime, NULL);
/* guest has been resumed. Now we can compress data
* at our own pace.
@@ -2135,6 +2209,7 @@
goto out;
}
}
+ gettimeofday(&rstats.compressTime, NULL);
/* Flush last write and discard cache for file. */
if ( outbuf_flush(xch, ob, io_fd) < 0 ) {
@@ -2143,27 +2218,36 @@
}
discard_file_cache(xch, io_fd, 1 /* flush */);
+ gettimeofday(&rstats.memflushTime, NULL);
/* Enable compression now, finally */
compressing = (flags & XCFLAGS_CHECKPOINT_COMPRESS);
/* checkpoint_cb can spend arbitrarily long in between rounds */
if (!rc && callbacks->checkpoint &&
- callbacks->checkpoint(callbacks->data) > 0)
+ (rstats.commitDelay = callbacks->checkpoint(callbacks->data)) > 0)
{
/* reset stats timer */
- print_stats(xch, dom, 0, &stats, 0);
+ /* print_stats(xch, dom, 0, &stats, 0); */
+ /* print_stats(xch, dom, 0, &time_stats, &shadow_stats, 0); */
+ /* subtract 1 from commitDelay since the python code adds 1 */
+ rstats.commitDelay -= 1;
+ rstats.total_sent += ob_pagebuf.total_data_sent;
+ rstats.total_sent += ob_tailbuf.total_data_sent;
+ print_remus_stats(xch, &rstats);
rc = 1;
/* last_iter = 1; */
+ gettimeofday(&rstats.suspendCall, NULL);
if ( suspend_and_state(callbacks->suspend, callbacks->data, xch,
io_fd, dom, &info) )
{
ERROR("Domain appears not to have suspended");
goto out;
}
- DPRINTF("SUSPEND shinfo %08lx\n", info.shared_info_frame);
- print_stats(xch, dom, 0, &stats, 1);
+ /* DPRINTF("SUSPEND shinfo %08lx\n", info.shared_info_frame); */
+ /* print_stats(xch, dom, 0, &stats, 1); */
+ gettimeofday(&rstats.suspendTime, NULL);
if ( xc_shadow_control(xch, dom,
XEN_DOMCTL_SHADOW_OP_CLEAN, HYPERCALL_BUFFER(to_send),
@@ -2172,6 +2256,7 @@
PERROR("Error flushing shadow PT");
}
+ gettimeofday(&rstats.shadowCall, NULL);
goto copypages;
}
diff -r a9fd6c05f469 tools/python/xen/lowlevel/checkpoint/checkpoint.c
--- a/tools/python/xen/lowlevel/checkpoint/checkpoint.c Thu Oct 13 08:53:03 2011 -0700
+++ b/tools/python/xen/lowlevel/checkpoint/checkpoint.c Wed Oct 26 09:19:02 2011 -0700
@@ -343,6 +343,7 @@
CheckpointObject* self = (CheckpointObject*)data;
PyObject* result;
+ long commitTime = 1;
if (checkpoint_postflush(&self->cps) < 0) {
fprintf(stderr, "%s\n", checkpoint_error(&self->cps));
@@ -359,12 +360,10 @@
if (!result)
return 0;
- if (result == Py_None || PyObject_IsTrue(result)) {
- Py_DECREF(result);
- return 1;
- }
+ if (result != Py_None)
+ commitTime = PyInt_AS_LONG(result);
Py_DECREF(result);
- return 0;
+ return (int)commitTime;
}
diff -r a9fd6c05f469 tools/python/xen/lowlevel/checkpoint/libcheckpoint.c
--- a/tools/python/xen/lowlevel/checkpoint/libcheckpoint.c Thu Oct 13 08:53:03 2011 -0700
+++ b/tools/python/xen/lowlevel/checkpoint/libcheckpoint.c Wed Oct 26 09:19:02 2011 -0700
@@ -205,13 +205,8 @@
/* suspend the domain. Returns 0 on failure, 1 on success */
int checkpoint_suspend(checkpoint_state* s)
{
- struct timeval tv;
int rc;
- gettimeofday(&tv, NULL);
- fprintf(stderr, "PROF: suspending at %lu.%06lu\n", (unsigned long)tv.tv_sec,
- (unsigned long)tv.tv_usec);
-
if (s->suspend_evtchn >= 0)
rc = evtchn_suspend(s);
else if (s->domtype == dt_hvm)
@@ -254,7 +249,6 @@
/* let guest execution resume */
int checkpoint_resume(checkpoint_state* s)
{
- struct timeval tv;
int rc;
if (xc_domain_resume(s->xch, s->domid, 1)) {
@@ -264,10 +258,6 @@
return -1;
}
- gettimeofday(&tv, NULL);
- fprintf(stderr, "PROF: resumed at %lu.%06lu\n", (unsigned long)tv.tv_sec,
- (unsigned long)tv.tv_usec);
-
if (s->domtype > dt_pv && resume_qemu(s) < 0)
return -1;
@@ -580,13 +570,13 @@
{
int rc = -1;
- fprintf(stderr, "issuing HVM suspend hypercall\n");
+ /* fprintf(stderr, "issuing HVM suspend hypercall\n"); */
rc = xc_domain_shutdown(s->xch, s->domid, SHUTDOWN_suspend);
if (rc < 0) {
s->errstr = "shutdown hypercall failed";
return -1;
}
- fprintf(stderr, "suspend hypercall returned %d\n", rc);
+ /* fprintf(stderr, "suspend hypercall returned %d\n", rc); */
if (check_shutdown(s) != 1)
return -1;
@@ -600,7 +590,7 @@
{
char path[128];
- fprintf(stderr, "pausing QEMU\n");
+ /* fprintf(stderr, "pausing QEMU\n"); */
sprintf(path, "/local/domain/0/device-model/%d/command", s->domid);
if (!xs_write(s->xsh, XBT_NULL, path, "save", 4)) {
@@ -635,7 +625,7 @@
static int resume_qemu(checkpoint_state *s)
{
char path[128];
- fprintf(stderr, "resuming QEMU\n");
+ /* fprintf(stderr, "resuming QEMU\n"); */
sprintf(path, "/local/domain/0/device-model/%d/command", s->domid);
if (!xs_write(s->xsh, XBT_NULL, path, "continue", 8)) {
diff -r a9fd6c05f469 tools/remus/remus
--- a/tools/remus/remus Thu Oct 13 08:53:03 2011 -0700
+++ b/tools/remus/remus Wed Oct 26 09:19:02 2011 -0700
@@ -78,6 +78,7 @@
def run(cfg):
closure = lambda: None
closure.cmd = None
+ closure.epoch = 0
def sigexception(signo, frame):
raise SignalException(signo)
@@ -142,6 +143,7 @@
if not cfg.timer:
# when not using a timer thread, sleep until now + interval
closure.starttime = time.time()
+ closure.epoch = closure.epoch + 1
if closure.cmd == 's':
die()
@@ -168,10 +170,11 @@
if closure.cmd == 'c':
die()
- print >> sys.stderr, "PROF: flushed memory at %0.6f" % (time.time())
-
+ # print >> sys.stderr, "PROF: flushed memory at %0.6f" % (time.time())
+ tmptime = time.time()
for buf in bufs:
buf.commit()
+ endtime = time.time()
if closure.cmd == 'c2':
die()
@@ -181,14 +184,13 @@
# getcommand()
if not cfg.timer:
- endtime = time.time()
elapsed = (endtime - closure.starttime) * 1000
if elapsed < cfg.interval:
time.sleep((cfg.interval - elapsed) / 1000.0)
- # False ends checkpointing
- return True
+ # 0 ends checkpointing
+ return int(((endtime - tmptime) * 1000) + 1)
if cfg.timer:
interval = cfg.interval
diff -r a9fd6c05f469 xen/common/hvm/save.c
--- a/xen/common/hvm/save.c Thu Oct 13 08:53:03 2011 -0700
+++ b/xen/common/hvm/save.c Wed Oct 26 09:19:02 2011 -0700
@@ -159,7 +159,7 @@
handler = hvm_sr_handlers[i].save;
if ( handler != NULL )
{
- gdprintk(XENLOG_INFO, "HVM save: %s\n", hvm_sr_handlers[i].name);
+ //gdprintk(XENLOG_INFO, "HVM save: %s\n", hvm_sr_handlers[i].name);
if ( handler(d, h) != 0 )
{
gdprintk(XENLOG_ERR,
* Fwd: How to get the checkpoint size in remus code?
From: José Eduardo França @ 2012-10-24 17:36 UTC (permalink / raw)
To: xen-devel
---------- Forwarded message ----------
From: José Eduardo França <jefranca@gmail.com>
Date: 2012/10/17
Subject: Re: [Xen-devel] How to get the checkpoint size in remus code?
To: rshriram@cs.ubc.ca
Hi Shriram,
Thank you for your reply. I'm sorry, I only saw your email yesterday, and
my English is bad.
OK, I saw your patch, but I need to explain my problem in more detail.
In my master's research I intend to implement a time-based dynamic
checkpoint that works this way: if the checkpoint size exceeds *Lmax*
(see the attached figure), I reduce the checkpoint interval; if the
checkpoint size stays under *Lmin*, I increase the checkpoint interval.
After that I will evaluate the performance.
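That adaptation rule could be sketched like this (my own hypothetical code, not part of Remus; Lmin/Lmax are checkpoint-size thresholds in bytes, the interval is in milliseconds, and the step and clamp values are arbitrary assumptions):

```python
def adapt_interval(interval_ms, ckpt_size, lmin, lmax,
                   step_ms=10, min_ms=25, max_ms=1000):
    """Shrink the checkpoint interval when a checkpoint exceeds lmax,
    grow it when the checkpoint stays under lmin, and clamp the result
    to [min_ms, max_ms]."""
    if ckpt_size > lmax:
        interval_ms -= step_ms      # checkpoints too big: checkpoint more often
    elif ckpt_size < lmin:
        interval_ms += step_ms      # checkpoints small: checkpoint less often
    return max(min_ms, min(max_ms, interval_ms))
```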
I have read the Remus code and saw that it controls the elapsed time here:
endtime = time.time()
elapsed = (endtime - closure.starttime) * 1000
if elapsed < cfg.interval:
time.sleep((cfg.interval - elapsed) / 1000.0)
Then I thought I could change the checkpoint interval near the code above,
but I suppose I need to get the checkpoint size first. Here, though, the
Remus code is Python, and I saw (or thought) that the checkpoint size is
obtained from *xc_shadow_op_stats_t *stats*, or better, from
*stats->dirty_count * PAGE_SIZE*.
Can I get the checkpoint size inside the Remus code? I think that would be
easier for what I intend to do.
Please help me; my time is running out.
Thanks, jefranca
PS: My English is terrible; I'm not a native speaker.
[-- Attachment #2: dynamic-checkpoint.png --]
[-- Type: image/png, Size: 50513 bytes --]
* Re: How to get the checkpoint size in remus code?
From: José Eduardo França @ 2012-10-24 17:49 UTC (permalink / raw)
To: rshriram; +Cc: xen-devel
[-- Attachment #1.1: Type: text/plain, Size: 3392 bytes --]
Hi guys,
I also see that I can get stats using struct xen_domctl.
Could I then put the code below anywhere and get the dirty_count? For
example:
DECLARE_DOMCTL;
uint32_t dirty_count = domctl.u.shadow_op.stats.dirty_count;
Can I do this?
Or is there persistent data that returns dirty_count or checkpoint_size?
If so, what is it?
Thanks