public inbox for kvm@vger.kernel.org
* migration with exec giving truncated images
@ 2007-08-02 21:02 Jim Paris
       [not found] ` <20070802210226.GA29753-lSbMZ+N7itA@public.gmane.org>
  0 siblings, 1 reply; 8+ messages in thread
From: Jim Paris @ 2007-08-02 21:02 UTC
  To: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

http://kvm.qumranet.com/kvmwiki/Migration suggests using
 
  stop
  migrate "exec:dd of=STATEFILE"

to save an image that can be loaded later.  I was having trouble
getting this to work (loading gave "Migration failed rc=233") and
discovered that not all of the data was being saved, probably because
of some buffering/pipe issues.  I ran the following commands:

  (qemu) stop
  (qemu) migrate "exec:dd of=/tmp/jr1"
  (qemu) migrate "exec:cat > /tmp/jr2" 
  (qemu) migrate "exec:dd bs=1 of=/tmp/jr3"

And the file sizes:

  $ ls -al /tmp/jr[123]
  -rw-r--r--  1 root root    86061424 2007-08-02 16:52 jr1
  -rw-r--r--  1 root root    86220963 2007-08-02 16:53 jr2
  -rw-r--r--  1 root root    86220963 2007-08-02 16:56 jr3

Sometimes the "cat" gives a filesize similar to "dd", depending on
image size.  Only "dd bs=1" appears to always give me all of the data.
Sometimes the truncated images work fine for resume, other times they
cause a "migration failed".

I haven't had a chance yet to dig very deep into the source to find the
cause, and I haven't checked whether this truncation also happens over TCP.

This was tested with kvm-28 modules and both kvm-28 and kvm-33
userspace.  Has anyone else seen this?

-jim

-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >>  http://get.splunk.com/


* Re: migration with exec giving truncated images
       [not found] ` <20070802210226.GA29753-lSbMZ+N7itA@public.gmane.org>
@ 2007-08-03 18:23   ` Jim Paris
       [not found]     ` <20070803182333.GA15267-lSbMZ+N7itA@public.gmane.org>
  0 siblings, 1 reply; 8+ messages in thread
From: Jim Paris @ 2007-08-03 18:23 UTC
  To: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

I wrote:
>   migrate "exec:dd of=STATEFILE"
> ... I was having trouble getting this to work (loading gave
> "Migration failed rc=233") and discovered that not all of the data
> was being saved, probably because of some buffering/pipe issues.

Actually, maybe that isn't the issue.  I guess the migration data
might be different but still valid in both cases.  The real place I'm
running into trouble is when trying to add migrate-to-file support
in libvirt.  I'm having problems that I can't reproduce outside of
libvirt (i.e. when running kvm manually).  For example, I'm getting this
segfault.  Any ideas?  It's almost as if migrate_write() is being
called after migrate_finish() ??

-jim

This is kvm-33:

Core was generated by `/usr/local/bin/qemu-system-x86_64 -M pc -m 256 -smp 1 -monitor pty -boot c -hda'.
Program terminated with signal 11, Segmentation fault.
#0  0x00002b1b0f786794 in memset () from /lib/libc.so.6
(gdb) bt full 8
#0  0x00002b1b0f786794 in memset () from /lib/libc.so.6
No symbol table info available.
#1  0x000000000047fd56 in kvm_get_dirty_pages_log_slot (slot=3, bitmap=0x0, offset=0, len=24)
    at /usr/local/src/kvm/kvm-33/qemu/qemu-kvm.c:1119
        r = 3
        i = 2620318256
        j = 0
        n = 3
        c = 3 '\003'
        page_number = 0
#2  0x000000000047fdf9 in kvm_update_dirty_pages_log () at /usr/local/src/kvm/kvm-33/qemu/qemu-kvm.c:1152
        r = 3
        len = 0
#3  0x00000000004162c5 in migrate_write (opaque=0x0) at /usr/local/src/kvm/kvm-33/qemu/migration.c:328
        s = (MigrationState *) 0x2b58010
#4  0x000000000040c45d in main_loop_wait (timeout=2) at /usr/local/src/kvm/kvm-33/qemu/vl.c:6236
        pioh = (IOHandlerRecord **) 0x1
        ioh = (IOHandlerRecord *) 0x2993f70
        rfds = {fds_bits = {0 <repeats 16 times>}}
        wfds = {fds_bits = {49152, 0 <repeats 15 times>}}
        xfds = {fds_bits = {0 <repeats 16 times>}}
        ret = 2
        nfds = 15
        tv = {tv_sec = 0, tv_usec = 0}
        pe = (PollingEntry *) 0x2993f70
#5  0x000000000047f1c5 in kvm_main_loop_wait (env=0x296b670, timeout=10) at /usr/local/src/kvm/kvm-33/qemu/qemu-kvm.c:604
No locals.
#6  0x000000000047f39f in kvm_main_loop_cpu (env=0x296b670) at /usr/local/src/kvm/kvm-33/qemu/qemu-kvm.c:667
No locals.
#7  0x000000000040c6aa in main_loop () at /usr/local/src/kvm/kvm-33/qemu/vl.c:6289
        ret = 3
        timeout = 0
        env = (CPUX86State *) 0x0
(More stack frames follow...)
(gdb) p *(MigrationState *) 0x2b58010
$1 = {fd = 14, throttle_count = 533900, bps = 13271376, updated_pages = 0, last_updated_pages = 11167, iteration = 1, 
  n_buffer = 9, l_buffer = 9, throttled = 0, has_error = 0x296b640, 
  buffer = "\020\204�000\001\000\000\000\000\000\004\016\a\021\vv\fv\rv\016v\022\004\023\000\027 \0334\0344\0354\0364��\021)\000�001\237\002\237\003\206\004\203\005\224\006$\a�t`\f\000\r\000\020\003\022�\023�\024@\025�\026$\027�030�\032\000\033\"\035\000��_\000\200\002�001\b\000\000\000a|G|\205|\b\004\000\000\000\000\000\000\000\000d\000\200\002�001\020\000�000�|G|�\020\006\005\v\006\005\005\000\000\000f\000\200\002�001\017\000�000�|G|�\020\006\005\n\005\005\005\000\001\017q\000\200\002�001\030\000�000\t}G|-}\030\006\b\020\b\b\b\000\000\000\\\000 \003X\002\b\000\000\000"..., addr = 0, timer = 0x2993f40, 
  opaque = 0x2993fc0, detach = 0, release = 0x4168c0 <cmd_release>, rapid_writes = 0}
(gdb) p *((MigrationState *) 0x2b58010)->has_error
$2 = 0


_______________________________________________
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


* Re: migration with exec giving truncated images
       [not found]     ` <20070803182333.GA15267-lSbMZ+N7itA@public.gmane.org>
@ 2007-08-07 18:28       ` Jim Paris
       [not found]         ` <20070807182826.GA30737-lSbMZ+N7itA@public.gmane.org>
  0 siblings, 1 reply; 8+ messages in thread
From: Jim Paris @ 2007-08-07 18:28 UTC
  To: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

I wrote:
> It's almost as if migrate_write() is being called after
> migrate_finish() ??

Yes, I'm definitely seeing this -- migrate_finish followed by
migrate_write, which causes a segfault (and explains my truncated
images).  Unfortunately I'm not at all familiar with this code, and
all the qemu I/O handler stuff is still confusing to me.  Does anyone
with some experience in this area have some time to help track this
down?

-jim


* Re: migration with exec giving truncated images
       [not found]         ` <20070807182826.GA30737-lSbMZ+N7itA@public.gmane.org>
@ 2007-08-08  9:14           ` Uri Lublin
  0 siblings, 0 replies; 8+ messages in thread
From: Uri Lublin @ 2007-08-08  9:14 UTC
  To: Jim Paris; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f


I've never encountered that problem.
I haven't used the "exec" migration protocol much, though,
and I haven't used libvirt much either.
I'll look into it too.

Thanks,
Uri.

Jim Paris wrote:
> I wrote:
>   
>> It's almost as if migrate_write() is being called after
>> migrate_finish() ??
>>     
>
> Yes, I'm definitely seeing this -- migrate_finish followed by
> migrate_write, which causes a segfault (and explains my truncated
> images).  Unfortunately I'm not at all familiar with this code, and
> all the qemu I/O handler stuff is still confusing to me.  Does anyone
> with some experience in this area have some time to help track this
> down?
>
> -jim
>


* Re: migration with exec giving truncated images
@ 2007-08-08 20:22 Jim Paris
       [not found] ` <1186604569626-git-send-email-jim-XrPbb/hENzg@public.gmane.org>
  0 siblings, 1 reply; 8+ messages in thread
From: Jim Paris @ 2007-08-08 20:22 UTC
  To: Uri Lublin, kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

I think I've (finally!) tracked it down.  See the attached patches.

The main problem is this: when using "-monitor pty", all incoming
commands are terminated with CRLF even though they were sent with just
LF, probably because of the pty layer somewhere.  When qemu's readline
gets CR and LF without calling readline_start() in between, it
executes the same command twice in a row, which meant that _two_
migrations were running concurrently.
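
To make the double execution concrete, here is a toy model in Python. It is
not qemu's readline code and not the attached patches. The class
`ToyReadline`, its `guard_double_submit` flag, and the helper `run` are
hypothetical names; the only behaviour taken from the description above is
that either CR or LF submits the current buffer, and that the buffer is not
reset until a fresh line is started.

```python
class ToyReadline:
    """Toy line editor: CR or LF submits whatever is in the buffer."""

    def __init__(self, on_command, guard_double_submit=False):
        self.on_command = on_command
        self.guard = guard_double_submit  # hypothetical fix, see below
        self.buf = ""
        self.submitted = False

    def start(self):
        """Begin a fresh line (stands in for readline_start())."""
        self.buf = ""
        self.submitted = False

    def feed(self, ch):
        if ch in ("\r", "\n"):
            if self.guard and self.submitted:
                return  # second terminator for the same line: ignore it
            self.submitted = True
            self.on_command(self.buf)
        else:
            if self.submitted:
                self.start()  # the next command begins a fresh line
            self.buf += ch


def run(data, guard):
    """Feed a string through the toy editor; return the executed commands."""
    executed = []
    rl = ToyReadline(executed.append, guard_double_submit=guard)
    for ch in data:
        rl.feed(ch)
    return executed


print(run("migrate x\r\n", guard=False))  # command runs once per terminator
print(run("migrate x\r\n", guard=True))   # command runs once per line
```

The guard shown here is just one possible way to collapse CRLF into a single
submission; the actual patches may take a different approach.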

-jim


* Re: migration with exec giving truncated images
       [not found] ` <1186604569626-git-send-email-jim-XrPbb/hENzg@public.gmane.org>
@ 2007-08-09 12:24   ` Uri Lublin
       [not found]     ` <46BB0760.80405-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 8+ messages in thread
From: Uri Lublin @ 2007-08-09 12:24 UTC
  To: Jim Paris; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f


Thanks for the patches.

There is still the mystery of different file sizes for different
migration-exec commands, even though all of the files are valid saved
images.  It seems to me that some unmodified pages are being marked as
dirty and are being saved twice (and later loaded twice).
I'm still chasing that.

Uri.

Jim Paris wrote:
> I think I've (finally!) tracked it down.  See the attached patches.
>
> The main problem is this: when using "-monitor pty", all incoming
> commands are terminated with CRLF even though they were sent with just
> LF, probably because of the pty layer somewhere.  When qemu's readline
> gets CR and LF without calling readline_start() in between, it
> executes the same command twice in a row, which meant that _two_
> migrations were running concurrently.
>
> -jim
>


* Re: migration with exec giving truncated images
       [not found]     ` <46BB0760.80405-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
@ 2007-08-14  3:56       ` Jim Paris
       [not found]         ` <20070814035659.GA10726-lSbMZ+N7itA@public.gmane.org>
  0 siblings, 1 reply; 8+ messages in thread
From: Jim Paris @ 2007-08-14  3:56 UTC
  To: Uri Lublin; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

Uri Lublin wrote:
> There is still the mystery of different file sizes for different 
> migration-exec commands, all files are "valid saved image".
> It seems to me that some unmodified pages are being marked as dirty, and 
> are being saved twice (and later loaded twice).
> I'm still chasing that.

Hi Uri,

I looked into this a bit more and it seems that a big piece of
migration.c is missing or broken.

migrate_write_buffer calls migrate_check_convergence, which returns
TRUE if the migration is "almost" complete (dirty pages < 50, or too
many iterations through memory).  At that point, it calls
migrate_finish -- which finishes writing the current page, but never
actually writes the remaining dirty pages (!)  Am I missing something?
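
The convergence test described here can be sketched roughly as follows. This
is a Python illustration, not the code in migration.c: the threshold of 50
comes from the description above, while `max_iterations`, the `>=`
comparison, and the helper `iterate_until_converged` are assumptions made
for the example.

```python
DIRTY_PAGE_THRESHOLD = 50  # from the description: "dirty pages < 50"


def check_convergence(dirty_pages, iteration, max_iterations):
    """True when few pages remain dirty or too many passes have been made.

    max_iterations is a hypothetical parameter; the real limit is not
    stated in this thread.
    """
    return dirty_pages < DIRTY_PAGE_THRESHOLD or iteration >= max_iterations


def iterate_until_converged(dirty_per_pass, max_iterations):
    """Return (iteration, pages_left) at the point convergence is declared.

    dirty_per_pass is a made-up sequence of dirty-page counts, one per pass.
    """
    for iteration, dirty in enumerate(dirty_per_pass, start=1):
        if check_convergence(dirty, iteration, max_iterations):
            return iteration, dirty
    raise ValueError("ran out of simulated passes before converging")


print(iterate_until_converged([11167, 900, 42], max_iterations=30))
print(iterate_until_converged([5000, 4000], max_iterations=2))
```

Either way, convergence is declared while some pages are still dirty, so
something after this check has to write them for the image to be complete.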

-jim




* Re: migration with exec giving truncated images
       [not found]         ` <20070814035659.GA10726-lSbMZ+N7itA@public.gmane.org>
@ 2007-08-14  4:49           ` Jim Paris
  0 siblings, 0 replies; 8+ messages in thread
From: Jim Paris @ 2007-08-14  4:49 UTC
  To: Uri Lublin; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

I wrote:
> I looked into this a bit more and it seems that a big piece of
> migration.c is missing or broken.
..
> Am I missing something?

Yes, I am.  Sorry, I missed the qemu_live_savevm_state call, which
saves the rest of the dirty pages, and explains why some migration
images are larger than others (ram_live_save doesn't compress
homogeneous pages like migrate_write does).
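
As a rough illustration of why the two save paths produce different sizes
for the same memory, here is a small Python sketch. The on-disk layout (a
one-byte flag followed by either a single fill byte or the raw page) is
invented for the example and is not the real migration format; `PAGE_SIZE`,
`encode_page`, and `image_size` are hypothetical names.

```python
PAGE_SIZE = 4096  # assumed page size for the example


def encode_page(page, compress_homogeneous):
    """Encode one page; optionally collapse a page of identical bytes."""
    if compress_homogeneous and page == page[:1] * len(page):
        return b"\x01" + page[:1]  # flag + the single repeated byte
    return b"\x00" + page          # flag + raw page contents


def image_size(pages, compress_homogeneous):
    """Total encoded size of a list of pages."""
    return sum(len(encode_page(p, compress_homogeneous)) for p in pages)


pages = [
    bytes(PAGE_SIZE),                       # all zeros: homogeneous
    b"\xff" * PAGE_SIZE,                    # all 0xff: homogeneous
    bytes(range(256)) * (PAGE_SIZE // 256), # mixed contents
]

print(image_size(pages, compress_homogeneous=True))
print(image_size(pages, compress_homogeneous=False))
```

Under this model, the more pages that end up being written by the
non-compressing path, the larger the resulting image, even though both
encodings describe the same memory.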

-jim
