* RE: Live Migration Error
@ 2005-05-13 21:14 Ian Pratt
       [not found] ` <A95E2296287EAD4EB592B5DEEFCE0E9D1E3FFF@liverpoolst.ad.cl.cam.ac.uk>
  2005-05-16 18:55 ` Andres Lagar Cavilla
  0 siblings, 2 replies; 11+ messages in thread
From: Ian Pratt @ 2005-05-13 21:14 UTC
  To: Andrés Lagar Cavilla, Teemu Koponen; +Cc: xen-devel

 

> Teemu saves the day!!!
> I actually set the timeout to 100 for no particular reason 
> (originally it was 10, 20 didn't work either) Thanks Ian for 
> your suggestion as well

I'd be really surprised if increasing the timeout actually made a difference. Are you sure you're not just using the shadow mode fix that was checked in a couple of hours ago?

Best,
Ian

> Cheers!!
> Andres
> At 02:45 PM 5/13/2005, Teemu Koponen wrote:
> >On May 13, 2005, at 20:07, Andres Lagar Cavilla wrote:
> >
> >Andres,
> >
> >>I try to do a live migration in the same physical host, i.e. xm 
> >>migrate --live 'whatever' localhost It fails with 'Error: errors: 
> >>suspend, failed, Callbak timed out'.
> >>It seems like transfer of memory pages works until the point when the
> >>domain needs to be suspended to do the final transfer. Funny thing is
> >>it used to work before, gloriously, and I haven't made any
> >>software/hardware changes. At some point a xm save command failed with
> >>timeout, and from there on live migration fails with this message.
> >>Non-live migration works perfectly, also between different physical 
> >>hosts. save/restore also works flawlessly.
> >
> >I had similar timeout errors previously, when I was using a bit slower
> >servers. I overcame the problem by slightly increasing the timeout
> >value in controller.py. It seemed to provide a remedy.
> >
> >Teemu

* RE: Live Migration Error
@ 2005-05-16 20:14 Ian Pratt
  0 siblings, 0 replies; 11+ messages in thread
From: Ian Pratt @ 2005-05-16 20:14 UTC
  To: Jim Henderson, xen-devel; +Cc: Andres Lagar Cavilla


> In case you need more feedback, the shadow code fix seems to 
> have cleared up my (chronic) live migration problems under 
> 2.0-testing too.
> Thanks for your attention to this matter.

Thanks for the feedback. It was an evil little typo that took many hours
to hunt down.

Ian

* Live Migration Error
@ 2005-05-13 17:07 Andres Lagar Cavilla
  2005-05-13 18:45 ` Teemu Koponen
  0 siblings, 1 reply; 11+ messages in thread
From: Andres Lagar Cavilla @ 2005-05-13 17:07 UTC
  To: xen-devel

Hi,
I've been scanning the list and seen reports on problems with live 
migration. Thought I might add a bit more entropy.
I try to do a live migration in the same physical host, i.e. xm migrate 
--live 'whatever' localhost
It fails with 'Error: errors: suspend, failed, Callbak timed out'.
It seems like transfer of memory pages works until the point when the 
domain needs to be suspended to do the final transfer. Funny thing is it 
used to work before, gloriously, and I haven't made any 
software/hardware changes. At some point a xm save command failed with 
timeout, and from there on live migration fails with this message. 
Non-live migration works perfectly, also between different physical 
hosts. save/restore also works flawlessly.
For the record, I use nfsroot.
I attached xfrd.log. I can post some other stuff, just ask

Thanks a lot
Andres

xfrd.log:

(xfr.migrate 6 "(domain (id 6) (name AndresNfsDomain) (memory 511) 
(maxmem 524288) (state -b---) (cpu 1) (cpu_time 0.10838393) (up_time
27.1105668545) (start_time 1115999325.85) (console (status listening) 
(id 12) (domain 6) (local_port 12) (remote_port 1) (console_port 9606))
(devices (vif (idx 0) (vif 0) (mac 00:80:84:00:00:11) (vifname vif6.0) 
(evtchn 13 3) (index 0))) (config (vm (name AndresNfsDomain) (memory 512)
(image (linux (kernel /boot/vmlinuz-2.6.11-xenU) (ip 
192.168.70.45:192.168.70.106:192.168.70.254:255.255.255.0:virtuality:eth0:off) 
(root /dev/nfs)
(args 'nfsroot=192.168.70.106:/mnt/nfs2,rsize=32768,wsize=32768 4'))) 
(device (vif (mac 00:80:84:00:00:11))))))" localhost 8002 1 0)[DEBUG]
Conn_sxpr< err=0
[DEBUG] Conn_connect> addr=127.0.0.1:8002
[DEBUG] Conn_init> flags=1
[DEBUG] Conn_init> write stream...
[DEBUG] stream_init>mode=w flags=1 compress=0
[DEBUG] stream_init> unbuffer...
[DEBUG] stream_init< err=0
[DEBUG] Conn_init> read stream...
[DEBUG] stream_init>mode=r flags=1 compress=0
[DEBUG] stream_init> unbuffer...
[DEBUG] stream_init< err=0
[DEBUG] Conn_sxpr>
(xfr.err 0)[DEBUG] Conn_sxpr< err=0

[1115999352.965314] xc_linux_save start 6

xc_linux_save start 6
[1115999352.966265] Saving memory pages: iter 1   0%
Saving memory pages: iter 1   0%4344 [INF] XFRD> Xfr service for 
127.0.0.1:54931
[DEBUG] Conn_init> flags=1
[DEBUG] Conn_init> write stream...
[DEBUG] stream_init>mode=w flags=1 compress=0
[DEBUG] stream_init> unbuffer...
[DEBUG] stream_init< err=0
[DEBUG] Conn_init> read stream...
[DEBUG] stream_init>mode=r flags=1 compress=0
[DEBUG] stream_init> unbuffer...
[DEBUG] stream_init< err=0
[DEBUG] Conn_sxpr>
(xfr.hello 1 0)[DEBUG] Conn_sxpr< err=0
[DEBUG] Conn_sxpr>
(xfr.xfr 6)[DEBUG] Conn_sxpr< err=0
[1115999352.971066] xc_linux_restore start

xc_linux_restore start
[1115999352.991648] Created domain 7

Created domain 7
[1115999353.003196] Reloading memory pages:   0%
Reloading memory pages:   5%
  5%
 10%
 10%
 10%FNI 765 : [1000007e,1020] pte=00bec063, mfn=00000bec, pfn=ffffffff 
[mfn]=deadbeef
 15%
 15%
 20%
 20%
 25%
 25%
 30%
 30%
 35%
 35%
 40%
 40%
 45%
 45%
 50%
 50%
 55%
 55%
 60%
 60%
 65%
 65%
 70%
 70%
 75%
 75%
 80%
 80%
 85%
 85%
 90%
 90%
 95%
 95%
 1: sent 130824, skipped 243,
 1: sent 130824, skipped 243, delta 2629ms, dom0 100%, target 71%, sent 
1630Mb/s, dirtied 4Mb/s 321 pages
[1115999355.596112] Saving memory pages: iter 2   0%
 2: sent 320, skipped 0,  2   0%
 2: sent 320, skipped 0, delta 11ms, dom0 0%, target 100%, sent 953Mb/s, 
dirtied 35Mb/s 12 pages
[1115999355.607606] Saving memory pages: iter 3   0%
 3: sent 12, skipped 0, r 3   0%
 3: sent 12, skipped100%
100%[DEBUG] Conn_sxpr>
(xfr.err 22)[DEBUG] Conn_sxpr< err=0
Retry suspend domain (120)
#... This repeats 198 times in total ...#
Retry suspend domain (120)
Unable to suspend domain. (120)
Unable to suspend domain. (120)
Domain appears not to have suspended: 120
Domain appears not to have suspended: 120
4343 [WRN] XFRD> Transfer errors:
4343 [WRN] XFRD> state=XFR_STATE    err=1
4343 [INF] XFRD> Xfr service err=1
Error when reading from state file
Error when reading from state file
4344 [INF] XFRD> Xfr service err=1
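
The per-iteration summary lines in the log above can be pulled out with a small script to see how quickly the dirty set shrinks before the suspend step. The following is a minimal Python sketch, not part of Xen or xfrd: the function names `summarize_precopy` and `count_suspend_retries` and the field names are made up for illustration, and the regular expression assumes the line format seen in this particular log (including the mail client's line wraps).

```python
import re

# Matches one complete per-iteration summary line as it appears in this log, e.g.
#   "1: sent 130824, skipped 243, delta 2629ms, dom0 100%, target 71%, sent
#    1630Mb/s, dirtied 4Mb/s 321 pages"
# \s+ is used between fields so that a wrapped line still matches.
# The group names are illustrative, not official Xen terminology.
ITER_RE = re.compile(
    r"\b(?P<iteration>\d+):\s+sent\s+(?P<sent_pages>\d+),\s+"
    r"skipped\s+(?P<skipped>\d+),\s+"
    r"delta\s+(?P<delta_ms>\d+)ms,\s+"
    r"dom0\s+(?P<dom0_pct>\d+)%,\s+target\s+(?P<target_pct>\d+)%,\s+"
    r"sent\s+(?P<sent_mbps>\d+)Mb/s,\s+"
    r"dirtied\s+(?P<dirtied_mbps>\d+)Mb/s\s+"
    r"(?P<dirty_pages>\d+)\s+pages"
)


def summarize_precopy(log_text):
    """Return one dict of integers per complete iteration summary found."""
    return [
        {name: int(value) for name, value in match.groupdict().items()}
        for match in ITER_RE.finditer(log_text)
    ]


def count_suspend_retries(log_text):
    """Count the 'Retry suspend domain' lines present in the given text."""
    return sum(
        1
        for line in log_text.splitlines()
        if line.startswith("Retry suspend domain")
    )
```

Run on the log as posted, `summarize_precopy` finds two complete iterations, with 321 and then 12 dirty pages (the third summary line is cut off). That fits the description in the report: the page transfer converges and the failure is in the suspend step. `count_suspend_retries` is only meaningful on an unabridged log, since the repeated lines were elided here.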

* RE: Live Migration Error
@ 2005-05-13 16:36 Ian Pratt
       [not found] ` <A95E2296287EAD4EB592B5DEEFCE0E9D1E3FE9@liverpoolst.ad.cl.cam.ac.uk>
  0 siblings, 1 reply; 11+ messages in thread
From: Ian Pratt @ 2005-05-13 16:36 UTC
  To: Andres Lagar Cavilla, xen-devel


> I've been scanning the list and seen reports on problems with 
> live migration. Thought I might add a bit more entropy.
> I try to do a live migration in the same physical host, i.e. 
> xm migrate --live 'whatever' localhost It fails with 'Error: 
> errors: suspend, failed, Callbak timed out'.
> It seems like transfer of memory pages works until the point 
> when the domain needs to be suspended to do the final 
> transfer. Funny thing is it used to work before, gloriously, 
> and I haven't made any software/hardware changes. At some 
> point a xm save command failed with timeout, and from there 
> on live migration fails with this message. 
> Non-live migration works perfectly, also between different 
> physical hosts. save/restore also works flawlessly.
> For the record, I use nfsroot.
> I attached xfrd.log. I can post some other stuff, just ask

Please try 2.0-testing.bk

Ian


end of thread

Thread overview: 11+ messages
2005-05-13 21:14 Live Migration Error Ian Pratt
     [not found] ` <A95E2296287EAD4EB592B5DEEFCE0E9D1E3FFF@liverpoolst.ad.cl.cam.ac.uk>
2005-05-13 21:22   ` Andrés Lagar Cavilla
2005-05-16 18:55 ` Andres Lagar Cavilla
2005-05-16 20:09   ` Jim Henderson
  -- strict thread matches above, loose matches on Subject: below --
2005-05-16 20:14 Ian Pratt
2005-05-13 17:07 Andres Lagar Cavilla
2005-05-13 18:45 ` Teemu Koponen
2005-05-13 20:50   ` Andrés Lagar Cavilla
2005-05-13 21:12   ` Andrés Lagar Cavilla
2005-05-13 16:36 Ian Pratt
     [not found] ` <A95E2296287EAD4EB592B5DEEFCE0E9D1E3FE9@liverpoolst.ad.cl.cam.ac.uk>
2005-05-13 20:47   ` Andrés Lagar Cavilla
