* migration with exec giving truncated images
@ 2007-08-02 21:02 Jim Paris
[not found] ` <20070802210226.GA29753-lSbMZ+N7itA@public.gmane.org>
0 siblings, 1 reply; 8+ messages in thread
From: Jim Paris @ 2007-08-02 21:02 UTC (permalink / raw)
To: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
http://kvm.qumranet.com/kvmwiki/Migration suggests using
stop
migrate "exec:dd of=STATEFILE"
to save an image that can be loaded later. I was having trouble
getting this to work (loading gave "Migration failed rc=233") and
discovered that not all of the data was being saved, probably because
of some buffering/pipe issues. I ran the following commands:
(qemu) stop
(qemu) migrate "exec:dd of=/tmp/jr1"
(qemu) migrate "exec:cat > /tmp/jr2"
(qemu) migrate "exec:dd bs=1 of=/tmp/jr3"
And the file sizes:
$ ls -al /tmp/jr[123]
-rw-r--r-- 1 root root 86061424 2007-08-02 16:52 jr1
-rw-r--r-- 1 root root 86220963 2007-08-02 16:53 jr2
-rw-r--r-- 1 root root 86220963 2007-08-02 16:56 jr3
Sometimes the "cat" gives a filesize similar to "dd", depending on
image size. Only "dd bs=1" appears to always give me all of the data.
Sometimes the truncated images work fine for resume, other times they
cause a "migration failed".
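A quick way to sanity-check the buffering/pipe hypothesis is sketched below. It is only an illustration, not anything from qemu or kvm: the helper names (drain, pipe_total) and the chunk sizes are made up for this example. It pushes a fixed payload through an ordinary pipe in uneven pieces and counts what a reader receives for several read block sizes. If the writer really writes everything and closes the pipe, the reader's block size should not change the total, so a size difference like the one above would have to come from what the writer sends.

```python
import os
import threading


def drain(fd, block_size):
    """Read fd until EOF with reads of at most block_size bytes; return the total."""
    total = 0
    while True:
        chunk = os.read(fd, block_size)  # may return fewer bytes than requested
        if not chunk:                    # empty read: the writer closed its end
            return total
        total += len(chunk)


def pipe_total(payload, write_sizes, block_size):
    """Write payload into a pipe in uneven pieces; return how many bytes the reader saw."""
    r, w = os.pipe()

    def writer():
        pos = 0
        i = 0
        while pos < len(payload):
            n = write_sizes[i % len(write_sizes)]
            os.write(w, payload[pos:pos + n])  # small writes, so they complete in one call
            pos += n
            i += 1
        os.close(w)

    t = threading.Thread(target=writer)
    t.start()
    try:
        return drain(r, block_size)
    finally:
        t.join()
        os.close(r)


if __name__ == "__main__":
    payload = bytes(20000)
    for bs in (1, 512, 65536):
        # Each line prints the block size and the number of bytes received.
        print(bs, pipe_total(payload, (1, 7, 300, 4096), bs))
```

The same count for every block size is what one would expect from "dd", "dd bs=1" and "cat" reading the same stream.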
I haven't had a chance yet to dig too deep in the source to find the
cause. I haven't checked whether this truncation also happens over TCP.
This was tested with kvm-28 modules and both kvm-28 and kvm-33
userspace. Has anyone else seen this?
-jim
-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems? Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >> http://get.splunk.com/
^ permalink raw reply [flat|nested] 8+ messages in thread
[parent not found: <20070802210226.GA29753-lSbMZ+N7itA@public.gmane.org>]
* Re: migration with exec giving truncated images
[not found] ` <20070802210226.GA29753-lSbMZ+N7itA@public.gmane.org>
@ 2007-08-03 18:23 ` Jim Paris
[not found] ` <20070803182333.GA15267-lSbMZ+N7itA@public.gmane.org>
0 siblings, 1 reply; 8+ messages in thread
From: Jim Paris @ 2007-08-03 18:23 UTC (permalink / raw)
To: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
I wrote:
> migrate "exec:dd of=STATEFILE"
> ... I was having trouble getting this to work (loading gave
> "Migration failed rc=233") and discovered that not all of the data
> was being saved, probably because of some buffering/pipe issues.
Actually, maybe that isn't the issue. I guess the migration data
might be different but still valid in both cases.
The real place I'm running into trouble is when trying to add
migrate-to-file support in libvirt. I'm having problems that I can't
reproduce outside of libvirt (ie. running kvm manually). For example
I'm getting this segfault, any ideas? It's almost as if
migrate_write() is being called after migrate_finish() ??
-jim
This is kvm-33:
Core was generated by `/usr/local/bin/qemu-system-x86_64 -M pc -m 256 -smp 1 -monitor pty -boot c -hda'.
Program terminated with signal 11, Segmentation fault.
#0 0x00002b1b0f786794 in memset () from /lib/libc.so.6
(gdb) bt full 8
#0 0x00002b1b0f786794 in memset () from /lib/libc.so.6
No symbol table info available.
#1 0x000000000047fd56 in kvm_get_dirty_pages_log_slot (slot=3, bitmap=0x0, offset=0, len=24) at /usr/local/src/kvm/kvm-33/qemu/qemu-kvm.c:1119
        r = 3
        i = 2620318256
        j = 0
        n = 3
        c = 3 '\003'
        page_number = 0
#2 0x000000000047fdf9 in kvm_update_dirty_pages_log () at /usr/local/src/kvm/kvm-33/qemu/qemu-kvm.c:1152
        r = 3
        len = 0
#3 0x00000000004162c5 in migrate_write (opaque=0x0) at /usr/local/src/kvm/kvm-33/qemu/migration.c:328
        s = (MigrationState *) 0x2b58010
#4 0x000000000040c45d in main_loop_wait (timeout=2) at /usr/local/src/kvm/kvm-33/qemu/vl.c:6236
        pioh = (IOHandlerRecord **) 0x1
        ioh = (IOHandlerRecord *) 0x2993f70
        rfds = {fds_bits = {0 <repeats 16 times>}}
        wfds = {fds_bits = {49152, 0 <repeats 15 times>}}
        xfds = {fds_bits = {0 <repeats 16 times>}}
        ret = 2
        nfds = 15
        tv = {tv_sec = 0, tv_usec = 0}
        pe = (PollingEntry *) 0x2993f70
#5 0x000000000047f1c5 in kvm_main_loop_wait (env=0x296b670, timeout=10) at /usr/local/src/kvm/kvm-33/qemu/qemu-kvm.c:604
No locals.
#6 0x000000000047f39f in kvm_main_loop_cpu (env=0x296b670) at /usr/local/src/kvm/kvm-33/qemu/qemu-kvm.c:667
No locals.
#7 0x000000000040c6aa in main_loop () at /usr/local/src/kvm/kvm-33/qemu/vl.c:6289
        ret = 3
        timeout = 0
        env = (CPUX86State *) 0x0
(More stack frames follow...)
(gdb) p *(MigrationState *) 0x2b58010
$1 = {fd = 14, throttle_count = 533900, bps = 13271376, updated_pages = 0, last_updated_pages = 11167, iteration = 1, n_buffer = 9, l_buffer = 9, throttled = 0, has_error = 0x296b640, buffer = "\020\204�000\001\000\000\000\000\000\004\016\a\021\vv\fv\rv\016v\022\004\023\000\027 \0334\0344\0354\0364��\021)\000�001\237\002\237\003\206\004\203\005\224\006$\a�t`\f\000\r\000\020\003\022�\023�\024@\025�\026$\027�030�\032\000\033\"\035\000��_\000\200\002�001\b\000\000\000a|G|\205|\b\004\000\000\000\000\000\000\000\000d\000\200\002�001\020\000�000�|G|�\020\006\005\v\006\005\005\000\000\000f\000\200\002�001\017\000�000�|G|�\020\006\005\n\005\005\005\000\001\017q\000\200\002�001\030\000�000\t}G|-}\030\006\b\020\b\b\b\000\000\000\\\000 \003X\002\b\000\000\000"..., addr = 0, timer = 0x2993f40, opaque = 0x2993fc0, detach = 0, release = 0x4168c0 <cmd_release>, rapid_writes = 0}
(gdb) p *((MigrationState *) 0x2b58010)->has_error
$2 = 0
_______________________________________________
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel
^ permalink raw reply [flat|nested] 8+ messages in thread
[parent not found: <20070803182333.GA15267-lSbMZ+N7itA@public.gmane.org>]
* Re: migration with exec giving truncated images
[not found] ` <20070803182333.GA15267-lSbMZ+N7itA@public.gmane.org>
@ 2007-08-07 18:28 ` Jim Paris
[not found] ` <20070807182826.GA30737-lSbMZ+N7itA@public.gmane.org>
0 siblings, 1 reply; 8+ messages in thread
From: Jim Paris @ 2007-08-07 18:28 UTC (permalink / raw)
To: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
I wrote:
> It's almost as if migrate_write() is being called after
> migrate_finish() ??
Yes, I'm definitely seeing this -- migrate_finish followed by
migrate_write, which causes a segfault (and explains my truncated
images). Unfortunately I'm not at all familiar with this code, and
all the qemu I/O handler stuff is still confusing to me. Does anyone
with some experience in this area have some time to help track this
down?
-jim
^ permalink raw reply [flat|nested] 8+ messages in thread
[parent not found: <20070807182826.GA30737-lSbMZ+N7itA@public.gmane.org>]
* Re: migration with exec giving truncated images
[not found] ` <20070807182826.GA30737-lSbMZ+N7itA@public.gmane.org>
@ 2007-08-08 9:14 ` Uri Lublin
0 siblings, 0 replies; 8+ messages in thread
From: Uri Lublin @ 2007-08-08 9:14 UTC (permalink / raw)
To: Jim Paris; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
I've never encountered that problem.
I haven't used "exec" migration protocol too many times though.
I have not used libvirt too many times either.
I'll look into it too.
Thanks,
Uri.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: migration with exec giving truncated images
@ 2007-08-08 20:22 Jim Paris
[not found] ` <1186604569626-git-send-email-jim-XrPbb/hENzg@public.gmane.org>
0 siblings, 1 reply; 8+ messages in thread
From: Jim Paris @ 2007-08-08 20:22 UTC (permalink / raw)
To: Uri Lublin, kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
I think I've (finally!) tracked it down. See the attached patches.
The main problem is this: when using "-monitor pty", all incoming
commands are terminated with CRLF even though they were sent with just
LF, probably because of the pty layer somewhere. When qemu's readline
gets CR and LF without calling readline_start() in between, it
executes the same command twice in a row, which meant that _two_
migrations were running concurrently.
-jim
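The double execution described above can be reproduced with a toy line editor. This is a minimal sketch, not qemu's readline code and not the attached patches: the class ToyReadline, its reset_on_dispatch switch, and the helper commands_run are hypothetical names invented for illustration. The model assumes that both CR and LF dispatch whatever is in the line buffer, and that the buffer is only cleared by an explicit start() call (standing in for readline_start()).

```python
class ToyReadline:
    """Toy monitor line editor (hypothetical; only models the behaviour described above)."""

    def __init__(self, on_command, reset_on_dispatch):
        self.on_command = on_command
        self.reset_on_dispatch = reset_on_dispatch
        self.buf = ""

    def start(self):
        # Stand-in for readline_start(): begin a fresh, empty line.
        self.buf = ""

    def feed(self, ch):
        if ch in ("\r", "\n"):
            # Both CR and LF act as "enter" and dispatch the current buffer.
            if self.buf:
                self.on_command(self.buf)
            if self.reset_on_dispatch:
                self.start()  # one possible fix: clear the line right after dispatch
        else:
            self.buf += ch


def commands_run(data, reset_on_dispatch):
    """Feed data one character at a time; return the list of commands executed."""
    ran = []
    rl = ToyReadline(ran.append, reset_on_dispatch)
    rl.start()
    for ch in data:
        rl.feed(ch)
    return ran


if __name__ == "__main__":
    line = 'migrate "exec:cat > /tmp/jr2"'
    print(commands_run(line + "\n", False))    # LF only: the command runs once
    print(commands_run(line + "\r\n", False))  # CRLF, no reset: the command runs twice
    print(commands_run(line + "\r\n", True))   # CRLF, reset after dispatch: runs once
```

Clearing the buffer on dispatch is only one way to avoid the repeat; ignoring an LF that immediately follows a CR would be another.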
^ permalink raw reply [flat|nested] 8+ messages in thread
[parent not found: <1186604569626-git-send-email-jim-XrPbb/hENzg@public.gmane.org>]
* Re: migration with exec giving truncated images
[not found] ` <1186604569626-git-send-email-jim-XrPbb/hENzg@public.gmane.org>
@ 2007-08-09 12:24 ` Uri Lublin
[not found] ` <46BB0760.80405-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
0 siblings, 1 reply; 8+ messages in thread
From: Uri Lublin @ 2007-08-09 12:24 UTC (permalink / raw)
To: Jim Paris; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
Thanks for the patches.
There is still the mystery of different file sizes for different
migration-exec commands, all files are "valid saved image".
It seems to me that some unmodified pages are being marked as dirty, and
are being saved twice (and later loaded twice).
I'm still chasing that.
Uri.
^ permalink raw reply [flat|nested] 8+ messages in thread
[parent not found: <46BB0760.80405-atKUWr5tajBWk0Htik3J/w@public.gmane.org>]
* Re: migration with exec giving truncated images
[not found] ` <46BB0760.80405-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
@ 2007-08-14 3:56 ` Jim Paris
[not found] ` <20070814035659.GA10726-lSbMZ+N7itA@public.gmane.org>
0 siblings, 1 reply; 8+ messages in thread
From: Jim Paris @ 2007-08-14 3:56 UTC (permalink / raw)
To: Uri Lublin; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
Uri Lublin wrote:
> There is still the mystery of different file sizes for different
> migration-exec commands, all files are "valid saved image".
> It seems to me that some unmodified pages are being marked as dirty, and
> are being saved twice (and later loaded twice).
> I'm still chasing that.
Hi Uri,
I looked into this a bit more and it seems that a big piece of
migration.c is missing or broken. In migrate_write_buffer, it calls
migrate_check_convergence, which returns TRUE if the migration is
"almost" complete (dirty pages < 50, or too many iterations through
memory). At that point, it then calls migrate_finish -- which
finishes writing the current page, but never actually writes the
remaining 50 pages (!)
Am I missing something?
-jim
^ permalink raw reply [flat|nested] 8+ messages in thread
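The convergence rule described in this message can be sketched as a small simulation. This is not the code in migration.c: the function names, the iteration cap of 30, and the re-dirtying model (a fixed percentage of the pages just sent become dirty again) are assumptions made for illustration. Only the "fewer than 50 dirty pages" threshold comes from the message. The point of the sketch is that when the check fires there are still some dirty pages left over, and a final stage has to write them.

```python
DIRTY_PAGE_THRESHOLD = 50  # "dirty pages < 50", taken from the message above
MAX_ITERATIONS = 30        # hypothetical cap standing in for "too many iterations"


def check_convergence(dirty_pages, iteration):
    """Return True when the live phase should stop (toy version of the rule above)."""
    return dirty_pages < DIRTY_PAGE_THRESHOLD or iteration >= MAX_ITERATIONS


def live_phase(initial_dirty, redirty_percent):
    """Simulate the live passes over memory.

    Each pass sends every dirty page; afterwards redirty_percent of those pages
    are assumed to be dirty again. Returns a tuple of
    (passes run, pages sent during the live phase, pages left for a final stage).
    """
    dirty = initial_dirty
    iteration = 0
    sent_live = 0
    while not check_convergence(dirty, iteration):
        sent_live += dirty
        dirty = dirty * redirty_percent // 100
        iteration += 1
    return iteration, sent_live, dirty


if __name__ == "__main__":
    # A mostly idle guest converges after a few passes with a handful of pages left.
    print(live_phase(65536, 10))
    # A guest that re-dirties everything only stops because of the iteration cap.
    print(live_phase(65536, 100))
```

In both cases the third value is non-zero, which is the leftover that the question above is about.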
[parent not found: <20070814035659.GA10726-lSbMZ+N7itA@public.gmane.org>]
* Re: migration with exec giving truncated images
[not found] ` <20070814035659.GA10726-lSbMZ+N7itA@public.gmane.org>
@ 2007-08-14 4:49 ` Jim Paris
0 siblings, 0 replies; 8+ messages in thread
From: Jim Paris @ 2007-08-14 4:49 UTC (permalink / raw)
To: Uri Lublin; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
I wrote
> I looked into this a bit more and it seems that a big piece of
> migration.c is missing or broken.
..
> Am I missing something?
Yes, I am. Sorry, I missed the qemu_live_savevm_state call, which
saves the rest of the dirty pages, and explains why some migration
images are larger than others (ram_live_save doesn't compress
homogeneous pages like migrate_write does).
-jim
^ permalink raw reply [flat|nested] 8+ messages in thread
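The size difference explained in this message can be illustrated with a toy encoder. The record format below (a one-byte flag followed by either a single fill byte or the raw page) is invented for the example and is not qemu's on-disk format; encode_page and image_size are hypothetical names. The sketch assumes that pages written during the live phase get the homogeneous-page shortcut while pages written by the final stage are stored raw, as the message describes for migrate_write and ram_live_save.

```python
PAGE_SIZE = 4096


def encode_page(page, compress_homogeneous):
    """Return the bytes written for one page (hypothetical record format)."""
    if compress_homogeneous and page == page[:1] * len(page):
        return b"\x01" + page[:1]  # flag + the single repeated byte
    return b"\x00" + page          # flag + raw page contents


def image_size(pages, pages_in_live_phase):
    """Size of a saved image when the first pages_in_live_phase pages go through
    the compressing path and the remaining pages go through the raw path."""
    live = pages[:pages_in_live_phase]
    final = pages[pages_in_live_phase:]
    return (sum(len(encode_page(p, True)) for p in live)
            + sum(len(encode_page(p, False)) for p in final))


if __name__ == "__main__":
    zero_pages = [bytes(PAGE_SIZE)] * 10
    # Same guest memory, different split between the two paths, different file size.
    print(image_size(zero_pages, 10))
    print(image_size(zero_pages, 8))
```

Under these assumptions the image grows with the number of homogeneous pages that happen to be left for the final stage, so two saves of the same stopped guest can differ in size and still both be valid.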
Thread overview: 8+ messages
2007-08-02 21:02 migration with exec giving truncated images Jim Paris
[not found] ` <20070802210226.GA29753-lSbMZ+N7itA@public.gmane.org>
2007-08-03 18:23 ` Jim Paris
[not found] ` <20070803182333.GA15267-lSbMZ+N7itA@public.gmane.org>
2007-08-07 18:28 ` Jim Paris
[not found] ` <20070807182826.GA30737-lSbMZ+N7itA@public.gmane.org>
2007-08-08 9:14 ` Uri Lublin
-- strict thread matches above, loose matches on Subject: below --
2007-08-08 20:22 Jim Paris
[not found] ` <1186604569626-git-send-email-jim-XrPbb/hENzg@public.gmane.org>
2007-08-09 12:24 ` Uri Lublin
[not found] ` <46BB0760.80405-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-08-14 3:56 ` Jim Paris
[not found] ` <20070814035659.GA10726-lSbMZ+N7itA@public.gmane.org>
2007-08-14 4:49 ` Jim Paris