From: "Daniel P. Berrange" <berrange@redhat.com>
To: qemu-devel@nongnu.org
Subject: [Qemu-devel] [PATCH] Ensure migrate_cancel does not block doing I/O
Date: Fri, 26 Aug 2011 11:59:28 +0100
Message-ID: <1314356368-26522-1-git-send-email-berrange@redhat.com>

From: "Daniel P. Berrange" <berrange@redhat.com>

There are two common cases where migrate_cancel is intended to be
used:

  1. When migration is not converging due to an overactive
     guest and insufficient network bandwidth
  2. When migration is stuck due to a network outage, waiting
     for the TCP transmit timeout to occur & return an I/O
     error for send()

In the second case, if you attempt to use 'migrate_cancel' it
will also get stuck. This can be seen by attempting to migrate
to a QEMU which has been SIGSTOP'd:

  $ ./x86_64-softmmu/qemu-system-x86_64 -cdrom ~/boot.iso -m 600 \
       -monitor stdio -vnc :2 -incoming tcp:localhost:9000
   QEMU 0.14.1 monitor - type 'help' for more information
   (qemu)
   <Ctrl-Z>
   [1]+  Stopped

And in another shell:

  $ ./x86_64-softmmu/qemu-system-x86_64 -cdrom ~/boot.iso -m 600 \
        -monitor stdio -vnc :1
   QEMU 0.14.1 monitor - type 'help' for more information
   (qemu) migrate -d tcp:localhost:9000
   (qemu) info migrate
   Migration status: active
   transferred ram: 416 kbytes
   remaining ram: 621624 kbytes
   total ram: 623040 kbytes
   (qemu) migrate_cancel

This last command will not return until the first QEMU is
resumed. Looking at the stack trace in GDB you see:

 #0  0x0000003a8320e4c2 in __libc_send (fd=10, buf=0x1bc7c70, n=19777, flags=0)
    at ../sysdeps/unix/sysv/linux/x86_64/send.c:28
 #1  0x000000000048fb1e in socket_write (s=<optimized out>, buf=<optimized out>, size=<optimized out>)
    at migration-tcp.c:39
 #2  0x000000000048eba4 in migrate_fd_put_buffer (opaque=0x1b76ad0, data=0x1bc7c70, size=19777)
    at migration.c:324
 #3  0x000000000048e442 in buffered_flush (s=0x1b76b90) at buffered_file.c:87
 #4  0x000000000048e4cf in buffered_close (opaque=0x1b76b90) at buffered_file.c:177
 #5  0x0000000000496d57 in qemu_fclose (f=0x1bbfc10) at savevm.c:479
 #6  0x000000000048f4ca in migrate_fd_cleanup (s=0x1b76ad0) at migration.c:291
 #7  0x000000000048f035 in do_migrate_cancel (mon=<optimized out>, qdict=<optimized out>,
    ret_data=<optimized out>) at migration.c:136
 [snip]
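
The send() in frame #0 blocks because the stopped peer never drains its
receive queue, so the kernel's socket buffers fill up. A minimal sketch of
that effect, using a local socket pair rather than QEMU's TCP connection
(demo_fill_until_full is a hypothetical name, and the sender is made
non-blocking only so the demo can return instead of hanging):

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical demo, not QEMU code: write into a socket pair whose peer
 * never reads. Returns the number of bytes the kernel accepted before it
 * refused more, or -1 on any other error.
 */
static long demo_fill_until_full(void)
{
    int sv[2];
    char chunk[4096];
    long total = 0;
    int full = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) {
        return -1;
    }

    /* Non-blocking sender: a blocking one would hang here forever. */
    int flags = fcntl(sv[0], F_GETFL, 0);
    if (flags == -1 || fcntl(sv[0], F_SETFL, flags | O_NONBLOCK) == -1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    memset(chunk, 0, sizeof(chunk));
    for (;;) {
        ssize_t n = send(sv[0], chunk, sizeof(chunk), 0);
        if (n < 0) {
            if (errno == EINTR) {
                continue;
            }
            /* EAGAIN/EWOULDBLOCK means the buffers are full. */
            full = (errno == EAGAIN || errno == EWOULDBLOCK);
            break;
        }
        if (n == 0) {
            break;
        }
        total += n;
    }

    close(sv[0]);
    close(sv[1]);
    return full ? total : -1;
}
```

With a blocking descriptor, as in the trace above, the same send() would
simply sleep at the point where this sketch sees EAGAIN.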

The migrate_fd_cleanup method is where the problem really starts.
Specifically it does:

    if (s->file) {
        DPRINTF("closing file\n");
        if (qemu_fclose(s->file) != 0) {
            ret = -1;
        }
        s->file = NULL;
    }

    if (s->fd != -1)
        close(s->fd);

It gets stuck in qemu_fclose() because that method (rightly) tries to
flush all outstanding buffers before closing. While this is desirable
when migration ends successfully, it is undesirable when we are
failing/cancelling migration.

It is hard to tell qemu_fclose() directly that it shouldn't flush buffers,
so the alternative is to ensure that this method fails quickly when it
attempts I/O. This is easily achieved by closing 's->fd' and resetting it
to -1 before calling qemu_fclose(): the flush then gets an immediate error
from send() on the invalid descriptor instead of blocking.
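
To illustrate the idea outside of QEMU, here is a rough standalone sketch.
The demo_stream type and the demo_flush/demo_cancel functions are
hypothetical stand-ins, not QEMU's real structures; the only assumption
carried over from the patch is that the flush path reads the descriptor
through the owner's field, so setting it to -1 is visible to the writer:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for a buffered migration stream. */
struct demo_stream {
    int *fd;        /* points at the owner's descriptor, like s->fd */
    char buf[64];   /* pending, not yet written data */
    size_t len;     /* number of pending bytes in buf */
};

/* Flush pending bytes; returns 0 on success or -errno on failure. */
static int demo_flush(struct demo_stream *st)
{
    size_t off = 0;

    while (off < st->len) {
        ssize_t n = write(*st->fd, st->buf + off, st->len - off);
        if (n < 0) {
            if (errno == EINTR) {
                continue;
            }
            return -errno;
        }
        off += (size_t)n;
    }
    st->len = 0;
    return 0;
}

/*
 * Cancel path: close the descriptor first, then flush. The flush now
 * fails at once with EBADF rather than waiting on a stalled peer.
 */
static int demo_cancel(struct demo_stream *st)
{
    assert(st != NULL && st->fd != NULL);

    if (*st->fd != -1) {
        close(*st->fd);
        *st->fd = -1;
    }
    return demo_flush(st);
}
```

The pending data is deliberately lost here, which is acceptable only
because the migration is being abandoned anyway.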

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
---
 migration.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/migration.c b/migration.c
index f5959b4..a432c3b 100644
--- a/migration.c
+++ b/migration.c
@@ -286,6 +286,13 @@ int migrate_fd_cleanup(FdMigrationState *s)
 
     qemu_set_fd_handler2(s->fd, NULL, NULL, NULL, NULL);
 
+    if ((s->state == MIG_STATE_ERROR ||
+         s->state == MIG_STATE_CANCELLED) &&
+        s->fd != -1) {
+        close(s->fd);
+        s->fd = -1;
+    }
+
     if (s->file) {
         DPRINTF("closing file\n");
         if (qemu_fclose(s->file) != 0) {
-- 
1.7.6

Thread overview: 4+ messages
2011-08-26 10:59 Daniel P. Berrange [this message]
2011-08-26 11:25 ` [Qemu-devel] [PATCH] Ensure migrate_cancel does not block doing I/O Daniel P. Berrange
2011-08-26 13:48   ` Paolo Bonzini
2012-06-01  5:04     ` Amos Kong
