From: "Daniel P. Berrange" <berrange@redhat.com>
To: qemu-devel <qemu-devel@nongnu.org>
Cc: libvir-list@redhat.com
Subject: Re: [Qemu-devel] Long QEMU main loop pauses during migration (to file) under heavy load
Date: Fri, 11 Nov 2011 13:13:11 +0000
Message-ID: <20111111131310.GM8472@redhat.com>
In-Reply-To: <20111111130320.GK8472@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 5040 bytes --]

On Fri, Nov 11, 2011 at 01:03:20PM +0000, Daniel P. Berrange wrote:
> Libvirt recently introduced a change to the way it does 'save to file'
> with QEMU. Historically QEMU has a 32MB/s I/O limit on migration by
> default. When saving to file, we didn't want any artificial limit,
> but rather to max out the underlying storage. So when doing save to
> file, we set a large bandwidth limit (INT64_MAX / (1024 * 1024)) so
> it is effectively unlimited.
> 
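
To make the "effectively unlimited" value concrete, here is a small sketch of the arithmetic. It assumes the limit is expressed in MiB/s and converted back to bytes/s by the consumer, which would explain the division; the function names are made up for illustration and are not libvirt or QEMU APIs.

```python
# Sketch of the bandwidth-limit arithmetic; names are hypothetical.
INT64_MAX = 2**63 - 1
MIB = 1024 * 1024


def max_bandwidth_mib():
    """Largest MiB/s value that still fits in int64 once converted to bytes/s."""
    return INT64_MAX // MIB


def to_bytes_per_sec(mib_per_sec):
    """Assumed conversion applied by whoever consumes the MiB/s value."""
    return mib_per_sec * MIB


if __name__ == "__main__":
    limit = max_bandwidth_mib()
    print(limit, to_bytes_per_sec(limit) <= INT64_MAX)
```

Dividing first means the value cannot overflow a signed 64-bit integer when it is multiplied back into bytes.
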
> After doing this, we discovered that the QEMU monitor was becoming
> entirely blocked. It did not even return from the 'migrate' command
> until migration was complete despite the 'detach' flag being set.
> This was a bug in libvirt, because we passed a plain file descriptor
> which does not support EAGAIN. Thank you POSIX.
> 
> Libvirt has another mode where it uses an I/O helper command to get
> O_DIRECT, and in this mode we pass a pipe() FD to QEMU. After ensuring
> that this pipe FD really does have O_NONBLOCK set, we still saw some
> odd behaviour.
> 
> I'm not sure whether what I describe can necessarily be called a QEMU
> bug, but I wanted to raise it for discussion anyway....
> 
> The sequence of steps is
> 
>   - libvirt sets qemu migration bandwidth to "unlimited"
>   - libvirt opens a pipe() and sets O_NONBLOCK on the write end
>   - libvirt spawns  libvirt-iohelper giving it the target file
>     on disk, and the read end of the pipe
>   - libvirt does 'getfd migfile' monitor command to give QEMU
>     the write end of the pipe
>   - libvirt does 'migrate fd:migfile -d' to run migration
>   - In parallel
>        - QEMU is writing to the pipe (which is non-blocking)
>        - libvirt-iohelper reading the pipe & writing to disk with O_DIRECT
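
The pipe behaviour that the steps above rely on can be sketched as follows. This is only an illustration in Python, not the libvirt or QEMU code: it creates a pipe, sets O_NONBLOCK on the write end, and writes with nobody reading until the kernel reports EAGAIN. The function name and chunk size are arbitrary choices for the example.

```python
import os


def fill_nonblocking_pipe(chunk_size=4096):
    """Write to a non-blocking pipe with no reader draining it.

    Returns (bytes_written, hit_eagain). Once the pipe buffer is full the
    write fails with EAGAIN (BlockingIOError in Python) instead of blocking.
    """
    read_fd, write_fd = os.pipe()
    os.set_blocking(write_fd, False)  # same effect as setting O_NONBLOCK
    chunk = b"\0" * chunk_size
    total = 0
    hit_eagain = False
    try:
        while True:
            try:
                total += os.write(write_fd, chunk)
            except BlockingIOError:
                hit_eagain = True  # buffer full: the writer must wait and retry
                break
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return total, hit_eagain


if __name__ == "__main__":
    print(fill_nonblocking_pipe())
```

A write to a regular file never fails this way, which is why the plain file descriptor case blocked while the pipe case can return control to the caller.
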


I should have mentioned that the way I'm testing this is with
libvirt 0.9.7, with both QEMU 0.14 and QEMU GIT master, using
a guest with 2 GB of RAM:


    $ virsh start l3
    Domain l3 started

    $ virsh dominfo l3
    Id:             17
    Name:           l3
    UUID:           c7a3edbd-edaf-9455-926a-d65c16db1803
    OS Type:        hvm
    State:          running
    CPU(s):         1
    CPU time:       1.1s
    Max memory:     2292000 kB
    Used memory:    2292736 kB
    Persistent:     yes
    Autostart:      disable
    Managed save:   no
    Security model: selinux
    Security DOI:   0
    Security label: system_u:system_r:unconfined_t:s0:c94,c700 (permissive)

To actually perform the save-to-file, I use the '--bypass-cache' flag
for libvirt, which ensures we pass a pipe to QEMU and run our I/O
helper for O_DIRECT, instead of directly giving QEMU a plain file:

     $ virsh save --bypass-cache l3 l3.image 
     Domain l3 saved to l3.image

>  - Most of the qemu_savevm_state_iterate() calls complete in 10-20 ms
> 
>  - Reasonably often a qemu_savevm_state_iterate() call takes 300-400 ms
> 
>  - Fairly rarely a qemu_savevm_state_iterate() call takes 10-20 *seconds*

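I measured these timings with the attached SystemTap script.

For example, run this before starting the migration to disk. In the trace,
the right-hand column is the duration of each call in seconds; the summary
line and histogram at the end are in milliseconds: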

# stap qemu-mig.stp
Begin
  0.000 Start
  5.198 > Begin
  5.220 < Begin   0.022
  5.220 > Iterate
  5.224 < Iterate   0.004

  ...snip..

  6.299 > Iterate
  6.314 < Iterate   0.015
  6.314 > Iterate
  6.319 < Iterate   0.005
  6.409 > Iterate
  8.139 < Iterate   1.730         <<< very slow iteration
  8.152 > Iterate
 13.078 < Iterate   4.926         <<< very slow iteration
 13.963 > Iterate
 14.248 < Iterate   0.285
 14.441 > Iterate
 14.448 < Iterate   0.007

 ...snip...

 24.171 > Iterate
 24.178 < Iterate   0.007
 24.178 > Complete
 24.588 < Complete   0.410

 <Ctrl-C>

avg 79 = sum 8033 / count 101; min 3 max 4926
value |-------------------------------------------------- count
    0 |                                                    0
    1 |                                                    0
    2 |                                                    1
    4 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@              74
    8 |@@@@@@@@@                                          19
   16 |@                                                   3
   32 |                                                    0
   64 |                                                    0
  128 |                                                    0
  256 |@                                                   2
  512 |                                                    0
 1024 |                                                    1
 2048 |                                                    0
 4096 |                                                    1
 8192 |                                                    0
16384 |                                                    0
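
For anyone reading the histogram above, here is a rough sketch of how the buckets and the average can be reproduced. It assumes each value is filed under the largest power of two not exceeding it, which matches the output (min 3 lands in bucket 2, max 4926 in bucket 4096), and that the average uses integer division. The function names are invented for the example.

```python
def log2_bucket(value_ms):
    """Bucket label for a duration: largest power of two <= value (0 for <= 0)."""
    if value_ms <= 0:
        return 0
    return 1 << (value_ms.bit_length() - 1)


def summarize(durations_ms):
    """Return (integer average, {bucket: count}) for a list of durations in ms."""
    buckets = {}
    for duration in durations_ms:
        bucket = log2_bucket(duration)
        buckets[bucket] = buckets.get(bucket, 0) + 1
    average = sum(durations_ms) // len(durations_ms)
    return average, buckets


if __name__ == "__main__":
    # A few durations taken from the trace above, in milliseconds.
    print(summarize([3, 285, 410, 1730, 4926]))
    # The reported summary line: sum 8033 / count 101.
    print(8033 // 101)
```

The long tail is what matters here: 95% of the calls fall in the 4-15 ms buckets, while the two calls in the 1024 and 4096 buckets account for most of the total time.
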

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

[-- Attachment #2: qemu-mig.stp --]
[-- Type: text/plain, Size: 2023 bytes --]



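# Entry timestamp (ms) of the most recently entered probed call
global then;
# Statistics aggregate of call durations (ms)
global deltas;

# Timestamp (ms) at which the script started
global start;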

function print_ts(str) {
  now = gettimeofday_ns() / (1000*1000)

  delta = (now - start)

  printf("%3d.%03d %s\n",
         (delta / 1000), (delta % 1000), str);
}


probe begin {
   printf("Begin\n");
   then = 0;
   start = gettimeofday_ns() / (1000*1000);
   print_ts("Start");
}


probe process("/home/berrange/usr/qemu-git/bin/qemu-system-x86_64").function("qemu_savevm_state_begin") {
   then = gettimeofday_ns() / (1000*1000);
   print_ts("> Begin");   
}

probe process("/home/berrange/usr/qemu-git/bin/qemu-system-x86_64").function("qemu_savevm_state_begin").return {
   now = gettimeofday_ns() / (1000*1000);
   if (then != 0) {
     delta = now - then;
     deltas <<< delta;
     print_ts(sprintf("< Begin %3d.%03d",
         (delta / 1000), (delta % 1000)));
   }
}

probe process("/home/berrange/usr/qemu-git/bin/qemu-system-x86_64").function("qemu_savevm_state_iterate") {
   then = gettimeofday_ns() / (1000*1000);
   print_ts("> Iterate");   
}

probe process("/home/berrange/usr/qemu-git/bin/qemu-system-x86_64").function("qemu_savevm_state_iterate").return {
   now = gettimeofday_ns() / (1000*1000);
   if (then != 0) {
     delta = now - then;
     deltas <<< delta;
     print_ts(sprintf("< Iterate %3d.%03d",
         (delta / 1000), (delta % 1000)));
   }
}

probe process("/home/berrange/usr/qemu-git/bin/qemu-system-x86_64").function("qemu_savevm_state_complete") {
   then = gettimeofday_ns() / (1000*1000);
   print_ts("> Complete");   
}

probe process("/home/berrange/usr/qemu-git/bin/qemu-system-x86_64").function("qemu_savevm_state_complete").return {
   now = gettimeofday_ns() / (1000*1000);
   if (then != 0) {
     delta = now - then;
     deltas <<< delta;
     print_ts(sprintf("< Complete %3d.%03d",
         (delta / 1000), (delta % 1000)));
   }
}


probe end {
   printf ("avg %d = sum %d / count %d; min %d max %d\n",
           @avg(deltas), @sum(deltas), @count(deltas), @min(deltas), @max(deltas));
   print (@hist_log(deltas));
}
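
The trace printed by the script above can also be post-processed to pick out the slow calls. The following is a minimal sketch that assumes the line layout shown in the sample output (timestamp, "<", phase name, duration in seconds); the function name and the one-second threshold are arbitrary.

```python
import re

# Matches return lines such as " 13.078 < Iterate   4.926"
RETURN_LINE = re.compile(r"^\s*(\d+\.\d+) < (\w+)\s+(\d+\.\d+)")


def slow_calls(trace_text, threshold_s=1.0):
    """Return (phase, finished_at, duration) for calls at or above the threshold."""
    slow = []
    for line in trace_text.splitlines():
        match = RETURN_LINE.match(line)
        if match is None:
            continue  # entry lines (">") and anything else are skipped
        finished_at = float(match.group(1))
        phase = match.group(2)
        duration = float(match.group(3))
        if duration >= threshold_s:
            slow.append((phase, finished_at, duration))
    return slow


SAMPLE = """\
  6.314 > Iterate
  6.319 < Iterate   0.005
  6.409 > Iterate
  8.139 < Iterate   1.730
  8.152 > Iterate
 13.078 < Iterate   4.926
 24.178 > Complete
 24.588 < Complete   0.410
"""

if __name__ == "__main__":
    for call in slow_calls(SAMPLE):
        print(call)
```
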
