From: Andreas Olsowski <andreas.olsowski@uni.leuphana.de>
To: xen-devel@lists.xensource.com
Subject: slow live migration / xc_restore on xen4 pvops
Date: Tue, 01 Jun 2010 23:17:31 +0200
Message-ID: <4C0578EB.2040800@uni.leuphana.de>
In-Reply-To: <2FD61F37AFF16D4DB46149330E4273C702FF9687@dcl-ex.dcml.docomolabs-usa.com>

Hi,

In preparation for our soon-to-arrive central storage array, I wanted to 
test live migration and Remus replication and stumbled upon a problem.
When migrating a test VM (512 MB RAM, idle) between my 3 servers, two of 
them are extremely slow in "receiving" the VM. There is little to no CPU 
utilization from xc_restore until shortly before the migration is complete.
The same goes for xm restore.
The xend.log contains:
[2010-06-01 21:16:27 5211] DEBUG (XendCheckpoint:286) 
restore:shadow=0x0, _static_max=0x20000000, _static_min=0x0,
[2010-06-01 21:16:27 5211] DEBUG (XendCheckpoint:305) [xc_restore]: 
/usr/lib/xen/bin/xc_restore 48 43 1 2 0 0 0 0
[2010-06-01 21:16:27 5211] INFO (XendCheckpoint:423) xc_domain_restore 
start: p2m_size = 20000
[2010-06-01 21:16:27 5211] INFO (XendCheckpoint:423) Reloading memory 
pages:   0%
[2010-06-01 21:20:57 5211] INFO (XendCheckpoint:423) ERROR Internal 
error: Error when reading batch size
[2010-06-01 21:20:57 5211] INFO (XendCheckpoint:423) ERROR Internal 
error: error when buffering batch, finishing

Receiving a VM via live migration does eventually finish; you can see the 
large gap in the timestamps above.
The VM is perfectly fine after that, it just takes way too long.
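
To make the stall visible without scrolling through the whole log, the 
relevant lines can be pulled straight out of xend.log; a minimal sketch, 
assuming the default log location /var/log/xen/xend.log:

grep -E 'xc_restore|Reloading memory pages|Error when reading batch size' \
    /var/log/xen/xend.log | tail -n 20

The timestamps on the matching lines show the roughly four-and-a-half 
minute gap between "Reloading memory pages: 0%" and the batch size error.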


First off, let me explain my server setup; detailed information on trying 
to narrow down the error follows.
I have 3 servers running Xen 4 with 2.6.31.13-pvops as the dom0 kernel; it 
is the current kernel from Jeremy's xen/master git branch.
The guests are running vanilla 2.6.32.11 kernels.

The 3 servers differ slightly in hardware: two are Dell PE 2950s and one 
is a Dell R710. The 2950s each have 2 quad-core Xeon CPUs (L5335 in one, 
L5410 in the other), the R710 has 2 quad-core Xeon E5520s.
All machines have 24 GB of RAM.

They are called "tarballerina" (E5520), "xenturio1" (L5335) and 
"xenturio2" (L5410).

Currently I use tarballerina for testing purposes, but I don't consider 
anything in my setup "stable".
xenturio1 has 27 guests running, xenturio2 has 25.
No guest does anything that would even put a dent into the systems' 
performance (LDAP servers, RADIUS, department webservers, etc.).

I created a test VM called "hatest" on my current central iSCSI storage; 
it idles around and has 2 VCPUs and 512 MB of RAM.
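
For reference, the hatest config looks roughly like this; the kernel, 
ramdisk and disk paths below are placeholders, not the exact values from 
my setup:

name    = "hatest"
memory  = 512
vcpus   = 2
kernel  = "/boot/vmlinuz-2.6.32.11"            # vanilla 2.6.32.11 domU kernel (placeholder)
ramdisk = "/boot/initrd-2.6.32.11.img"         # placeholder
disk    = [ 'phy:/dev/iscsi/hatest,xvda,w' ]   # volume on the central iSCSI storage (placeholder)
vif     = [ 'bridge=eth0' ]
root    = "/dev/xvda1 ro"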

First I tested xm save/restore:
tarballerina:~# time xm restore /var/saverestore-t.mem
real    0m13.227s
user    0m0.090s
sys     0m0.023s
xenturio1:~# time xm restore /var/saverestore-x1.mem
real    4m15.173s
user    0m0.138s
sys     0m0.029s
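
To rule out one-off effects, the save/restore cycle can be repeated in a 
loop; a rough sketch (using hatest here, substitute whatever domain the 
image was saved from):

for i in 1 2 3 4 5; do
    time xm save hatest /var/saverestore-x1.mem    # suspends and destroys the domain
    time xm restore /var/saverestore-x1.mem        # this is the step that stalls on the xenturios
    sleep 10                                       # let the guest settle before the next round
done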


When migrating to xenturio1 or 2, the migration takes 181 to 278 
seconds; when migrating to tarballerina it takes roughly 30 seconds:
tarballerina:~# time xm migrate --live hatest 10.0.1.98
real    3m57.971s
user    0m0.086s
sys     0m0.029s
xenturio1:~# time xm migrate --live hatest 10.0.1.100
real    0m43.588s
user    0m0.123s
sys     0m0.034s
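
The same can be done for migration by ping-ponging the guest between two 
hosts and timing each direction; a sketch run from tarballerina, assuming 
(as the timings above suggest) that 10.0.1.98 is one of the xenturio 
machines and 10.0.1.100 is tarballerina:

for i in 1 2 3; do
    time xm migrate --live hatest 10.0.1.98                    # tarballerina -> xenturio (slow)
    time ssh 10.0.1.98 xm migrate --live hatest 10.0.1.100     # xenturio -> tarballerina (fast)
done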


--- attempt at narrowing it down ---
My first guess was that, since tarballerina had almost no guests running 
that did anything, it could be an issue of memory usage by the tapdisk2 
processes (each dom0 has been mem-set to 4096M).
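
To check that guess directly, the tapdisk2 footprint and the free dom0 
memory can be inspected on the receiving host; a quick sketch:

ps -C tapdisk2 -o pid,rss,vsz,args    # resident/virtual size of every tapdisk2 process
free -m                               # overall free memory in the 4096M dom0
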
I then started almost all VMs that I have on tarballerina:
tarballerina:~# time xm save saverestore-t /var/saverestore-t.mem
real    0m2.884s
tarballerina:~# time xm restore /var/saverestore-t.mem
real    0m15.594s


I tried this several times; sometimes it took 30+ seconds.

Then I started 2 VMs that run load- and I/O-generating processes (stress, 
dd, openssl encryption, md5sum).
But this didn't affect xm restore performance, it was still quite fast:
tarballerina:~# time xm save saverestore-t /var/saverestore-t.mem
real    0m7.476s
user    0m0.101s
sys     0m0.022s
tarballerina:~# time xm restore /var/saverestore-t.mem
real    0m45.544s
user    0m0.094s
sys     0m0.022s

I tried several times again; restore took 17 to 45 seconds.

Then I tried migrating the test VM to tarballerina again. It was still 
fast, in spite of the several running VMs, including the load- and 
I/O-generating ones, which ate up almost all available RAM.
CPU times for xc_restore according to the target machine's "top":
tarballerina -> xenturio1: 0:05:xx, 2-4% CPU, about 40% near the end.
xenturio1 -> tarballerina: 0:04:xx, 4-8% CPU, about 54% near the end.
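
Instead of eyeballing top, xc_restore can be sampled on the receiving host 
for the whole duration of an incoming migration (the loop has to be started 
just before the migration begins); a small sketch:

while pgrep -x xc_restore > /dev/null; do
    ps -C xc_restore -o pid,pcpu,time,args    # cumulative CPU time of the restore process
    sleep 5
done

The corresponding wall-clock migration times: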

tarballerina:~# time xm migrate --live hatest 10.0.1.98
real    3m29.779s
user    0m0.102s
sys     0m0.017s
xenturio1:~# time xm migrate --live hatest 10.0.1.100
real    0m28.386s
user    0m0.154s
sys     0m0.032s


So my attempt at narrowing the problem down failed; it is neither the free 
memory of the dom0 nor the load, I/O or memory the other domUs utilize.
---end attempt---

More info (xm list, meminfo, a table with migration times, etc.) on my 
setup can be found here:
http://andiolsi.rz.uni-lueneburg.de/node/37

Someone else reported the same error in his logfile; this may or may not 
be related:
http://lists.xensource.com/archives/html/xen-users/2010-05/msg00318.html

Further information can be provided, should the need arise.

With best regards

---
Andreas Olsowski <andreas.olsowski@uni.leuphana.de>
Leuphana Universität Lüneburg
System- und Netzwerktechnik
Rechenzentrum, Geb 7, Raum 15
Scharnhorststr. 1
21335 Lüneburg

Tel: ++49 4131 / 6771309
