From: "Daniel P. Berrange" <berrange@redhat.com>
To: qemu-devel <qemu-devel@nongnu.org>
Cc: libvir-list@redhat.com
Subject: Re: [Qemu-devel] Long QEMU main loop pauses during migration (to file) under heavy load
Date: Fri, 11 Nov 2011 13:13:11 +0000
Message-ID: <20111111131310.GM8472@redhat.com>
In-Reply-To: <20111111130320.GK8472@redhat.com>
[-- Attachment #1: Type: text/plain, Size: 5040 bytes --]
On Fri, Nov 11, 2011 at 01:03:20PM +0000, Daniel P. Berrange wrote:
> Libvirt recently introduced a change to the way it does 'save to file'
> with QEMU. Historically QEMU has a 32MB/s I/O limit on migration by
> default. When saving to file, we didn't want any artificial limit,
> but rather to max out the underlying storage. So when doing save to
> file, we set a large bandwidth limit (INT64_MAX / (1024 * 1024)) so
> it is effectively unlimited.
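To make the size of that limit concrete, here is a small sketch of the arithmetic. It assumes the value is expressed in MiB/s and later multiplied back by 1024 * 1024 to get bytes/s; that reading of the division is an assumption, not something stated above.

```python
# Sketch only: the arithmetic behind the "effectively unlimited" limit.
# Assumption: the limit is expressed in MiB/s and converted back to bytes/s.
INT64_MAX = 2**63 - 1
MIB = 1024 * 1024

limit_mib_per_s = INT64_MAX // MIB           # the value libvirt is said to set
limit_bytes_per_s = limit_mib_per_s * MIB    # converted back to bytes/s

print(limit_mib_per_s)                       # 2**43 - 1
print(limit_bytes_per_s <= INT64_MAX)        # the conversion stays within int64
```

Dividing first means the later multiplication cannot overflow a signed 64-bit integer, which would explain why the limit is not simply INT64_MAX.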
>
> After doing this, we discovered that the QEMU monitor was becoming
> entirely blocked. It did not even return from the 'migrate' command
> until migration was complete despite the 'detach' flag being set.
> This was a bug in libvirt, because we passed a plain file descriptor
> which does not support EAGAIN. Thank you POSIX.
>
> Libvirt has another mode where it uses an I/O helper command to get
> O_DIRECT, and in this mode we pass a pipe() FD to QEMU. After ensuring
> that this pipe FD really does have O_NONBLOCK set, we still saw some
> odd behaviour.
>
> I'm not sure whether what I describe can necessarily be called a QEMU
> bug, but I wanted to raise it for discussion anyway....
>
> The sequence of steps is
>
> - libvirt sets qemu migration bandwidth to "unlimited"
> - libvirt opens a pipe() and sets O_NONBLOCK on the write end
> - libvirt spawns libvirt-iohelper giving it the target file
> on disk, and the read end of the pipe
> - libvirt does 'getfd migfile' monitor command to give QEMU
> the write end of the pipe
> - libvirt does 'migrate fd:migfile -d' to run migration
> - In parallel
> - QEMU is writing to the pipe (which is non-blocking)
> - libvirt-iohelper is reading from the pipe & writing to disk with O_DIRECT
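The non-blocking pipe behaviour the steps above rely on can be sketched in a few lines. This is a standalone illustration, not libvirt or QEMU code: the function name fill_nonblocking_pipe and the 4096-byte chunk size are made up for the example. It writes to a pipe that nobody drains and shows that, once the pipe buffer is full, write() fails with EAGAIN instead of blocking.

```python
import errno
import os


def fill_nonblocking_pipe(chunk_size=4096):
    """Write to an undrained non-blocking pipe until the kernel refuses.

    Returns (bytes_written, errno_from_the_failed_write).
    Hypothetical helper for illustration only.
    """
    read_fd, write_fd = os.pipe()
    os.set_blocking(write_fd, False)   # same effect as setting O_NONBLOCK
    chunk = b"\0" * chunk_size
    written = 0
    try:
        while True:
            try:
                written += os.write(write_fd, chunk)
            except BlockingIOError as exc:
                # Pipe buffer is full: the writer gets EAGAIN and can go
                # back to its main loop instead of stalling in write().
                return written, exc.errno
    finally:
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    written, err = fill_nonblocking_pipe()
    print(f"pipe accepted {written} bytes, then write() failed with "
          f"{errno.errorcode[err]}")
```

With a reader such as the I/O helper on the other end, how often the writer sees EAGAIN depends on how quickly the reader drains the pipe.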
I should have mentioned that the way I'm testing this is with
libvirt 0.9.7, with both QEMU 0.14 and QEMU GIT master, using
a guest with 2 GB of RAM:
$ virsh start l3
Domain l3 started
$ virsh dominfo l3
Id: 17
Name: l3
UUID: c7a3edbd-edaf-9455-926a-d65c16db1803
OS Type: hvm
State: running
CPU(s): 1
CPU time: 1.1s
Max memory: 2292000 kB
Used memory: 2292736 kB
Persistent: yes
Autostart: disable
Managed save: no
Security model: selinux
Security DOI: 0
Security label: system_u:system_r:unconfined_t:s0:c94,c700 (permissive)
To actually perform the save-to-file, I use the '--bypass-cache' flag
for libvirt, which ensures we pass a pipe to QEMU and run our I/O
helper for O_DIRECT, instead of giving QEMU a plain file directly:
$ virsh save --bypass-cache l3 l3.image
Domain l3 saved to l3.image
> - Most of the qemu_savevm_state_iterate() calls complete in 10-20 ms
>
> - Reasonably often a qemu_savevm_state_iterate() call takes 300-400 ms
>
> - Fairly rarely a qemu_savevm_state_iterate() call takes 10-20 *seconds*
I use the attached systemtap script to measure these timings, e.g.
run this before starting the migration to disk:
# stap qemu-mig.stp
Begin
0.000 Start
5.198 > Begin
5.220 < Begin 0.022
5.220 > Iterate
5.224 < Iterate 0.004
...snip..
6.299 > Iterate
6.314 < Iterate 0.015
6.314 > Iterate
6.319 < Iterate 0.005
6.409 > Iterate
8.139 < Iterate 1.730 <<< very slow iteration
8.152 > Iterate
13.078 < Iterate 4.926 <<< very slow iteration
13.963 > Iterate
14.248 < Iterate 0.285
14.441 > Iterate
14.448 < Iterate 0.007
...snip...
24.171 > Iterate
24.178 < Iterate 0.007
24.178 > Complete
24.588 < Complete 0.410
<Ctrl-C>
avg 79 = sum 8033 / count 101; min 3 max 4926
value |-------------------------------------------------- count
0 | 0
1 | 0
2 | 1
4 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 74
8 |@@@@@@@@@ 19
16 |@ 3
32 | 0
64 | 0
128 | 0
256 |@ 2
512 | 0
1024 | 1
2048 | 0
4096 | 1
8192 | 0
16384 | 0
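For anyone reading the summary above without systemtap to hand, here is a rough sketch of how the "avg / sum / count / min / max" line and the power-of-two histogram can be reproduced. It assumes each histogram row labelled N counts values in the range [N, 2N), which matches min 3 landing in row 2 and max 4926 landing in row 4096. The function names are made up for the example, and the sample list contains only the ten durations visible in the trace excerpt (in ms), not the full 101 samples.

```python
from collections import Counter


def log2_bucket(value):
    """Largest power of two <= value (0 for values below 1)."""
    if value < 1:
        return 0
    return 1 << (value.bit_length() - 1)


def summarize(deltas_ms):
    """Integer statistics plus a power-of-two histogram of the samples."""
    buckets = Counter(log2_bucket(d) for d in deltas_ms)
    return {
        "avg": sum(deltas_ms) // len(deltas_ms),   # integer average
        "sum": sum(deltas_ms),
        "count": len(deltas_ms),
        "min": min(deltas_ms),
        "max": max(deltas_ms),
        "buckets": dict(sorted(buckets.items())),
    }


# Durations (ms) of the calls shown in the excerpt, not the full data set.
sample = [22, 4, 15, 5, 1730, 4926, 285, 7, 7, 410]

if __name__ == "__main__":
    stats = summarize(sample)
    print("avg %(avg)d = sum %(sum)d / count %(count)d; "
          "min %(min)d max %(max)d" % stats)
    for bucket, count in stats["buckets"].items():
        print("%6d | %d" % (bucket, count))
```

The two slow iterations from the excerpt fall into the 1024 and 4096 rows, the same rows that each hold a single entry in the full histogram.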
Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
[-- Attachment #2: qemu-mig.stp --]
[-- Type: text/plain, Size: 2023 bytes --]
global then;
global deltas;
global start;

function print_ts(str) {
  now = gettimeofday_ns() / (1000*1000)
  delta = (now - start)
  printf("%3d.%03d %s\n",
         (delta / 1000), (delta % 1000), str);
}

probe begin {
  printf("Begin\n");
  then = 0;
  start = gettimeofday_ns() / (1000*1000);
  print_ts("Start");
}

probe process("/home/berrange/usr/qemu-git/bin/qemu-system-x86_64").function("qemu_savevm_state_begin") {
  then = gettimeofday_ns() / (1000*1000);
  print_ts("> Begin");
}

probe process("/home/berrange/usr/qemu-git/bin/qemu-system-x86_64").function("qemu_savevm_state_begin").return {
  now = gettimeofday_ns() / (1000*1000);
  if (then != 0) {
    delta = now - then;
    deltas <<< delta;
    print_ts(sprintf("< Begin %3d.%03d",
                     (delta / 1000), (delta % 1000)));
  }
}

probe process("/home/berrange/usr/qemu-git/bin/qemu-system-x86_64").function("qemu_savevm_state_iterate") {
  then = gettimeofday_ns() / (1000*1000);
  print_ts("> Iterate");
}

probe process("/home/berrange/usr/qemu-git/bin/qemu-system-x86_64").function("qemu_savevm_state_iterate").return {
  now = gettimeofday_ns() / (1000*1000);
  if (then != 0) {
    delta = now - then;
    deltas <<< delta;
    print_ts(sprintf("< Iterate %3d.%03d",
                     (delta / 1000), (delta % 1000)));
  }
}

probe process("/home/berrange/usr/qemu-git/bin/qemu-system-x86_64").function("qemu_savevm_state_complete") {
  then = gettimeofday_ns() / (1000*1000);
  print_ts("> Complete");
}

probe process("/home/berrange/usr/qemu-git/bin/qemu-system-x86_64").function("qemu_savevm_state_complete").return {
  now = gettimeofday_ns() / (1000*1000);
  if (then != 0) {
    delta = now - then;
    deltas <<< delta;
    print_ts(sprintf("< Complete %3d.%03d",
                     (delta / 1000), (delta % 1000)));
  }
}

probe end {
  printf("avg %d = sum %d / count %d; min %d max %d\n",
         @avg(deltas), @sum(deltas), @count(deltas), @min(deltas), @max(deltas));
  print(@hist_log(deltas));
}