All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Daniel P. Berrange" <berrange@redhat.com>
To: qemu-devel@nongnu.org
Subject: [Qemu-devel] [PATCH] Ensure migrate_cancel does not block doing I/O
Date: Fri, 26 Aug 2011 11:59:28 +0100	[thread overview]
Message-ID: <1314356368-26522-1-git-send-email-berrange@redhat.com> (raw)

From: "Daniel P. Berrange" <berrange@redhat.com>

There are two common cases where migrate_cancel is intended to be
used

  1. When migration is not converging due to an overactive
     guest and insufficient network bandwidth
  2. When migration is stuck due a network outage, waiting
     for the TCP transmit timeout to occurr & return an I/O
     error for send()

In the second case, if you attempt to use 'migrate_cancel' it
will also get stuck. This can be seen by attempting to migrate
to a QEMU which has been SIGSTOP'd

  $ ./x86_64-softmmu/qemu-system-x86_64 -cdrom ~/boot.iso -m 600 \
       -monitor stdio -vnc :2 -incoming tcp:localhost:9000
   QEMU 0.14.1 monitor - type 'help' for more information
   (qemu)
   <Ctrl-Z>
   [1]+  Stopped

And in another shell

  $ ./x86_64-softmmu/qemu-system-x86_64 -cdrom ~/boot.iso -m 600 \
        -monitor stdio -vnc :1
   QEMU 0.14.1 monitor - type 'help' for more information
   (qemu) migrate -d tcp:localhost:9000
   (qemu) info migrate
   Migration status: active
   transferred ram: 416 kbytes
   remaining ram: 621624 kbytes
   total ram: 623040 kbytes
   (qemu) migrate_cancel

This last command will never return, until the first QEMU is
resumed. Looking at the stack trace in GDB you see

 #0  0x0000003a8320e4c2 in __libc_send (fd=10, buf=0x1bc7c70, n=19777, flags=0)
    at ../sysdeps/unix/sysv/linux/x86_64/send.c:28
 #1  0x000000000048fb1e in socket_write (s=<optimized out>, buf=<optimized out>, size=<optimized out>)
    at migration-tcp.c:39
 #2  0x000000000048eba4 in migrate_fd_put_buffer (opaque=0x1b76ad0, data=0x1bc7c70, size=19777)
    at migration.c:324
 #3  0x000000000048e442 in buffered_flush (s=0x1b76b90) at buffered_file.c:87
 #4  0x000000000048e4cf in buffered_close (opaque=0x1b76b90) at buffered_file.c:177
 #5  0x0000000000496d57 in qemu_fclose (f=0x1bbfc10) at savevm.c:479
 #6  0x000000000048f4ca in migrate_fd_cleanup (s=0x1b76ad0) at migration.c:291
 #7  0x000000000048f035 in do_migrate_cancel (mon=<optimized out>, qdict=<optimized out>,
    ret_data=<optimized out>) at migration.c:136[snip]
 [snip]

The migration_fd_cleanup method is where the problem really starts.
Specifically it does

    if (s->file) {
        DPRINTF("closing file\n");
        if (qemu_fclose(s->file) != 0) {
            ret = -1;
        }
        s->file = NULL;
    }

    if (s->fd != -1)
        close(s->fd);

And gets stuck in the qemu_fclose() bit because that method (rightly) tries
to flush all outstanding buffers before closing. Unfortunately while this is
desirable when migration ends successfully, it is undesirable when we are
failing/cancelling migration.

It is hard to tell qemu_fclose() that it shouldn't flush buffers directly,
so the alternative is to ensure that this method fails quickly when it
attempts I/O. This is easily achieved, simply by closing 's->fd' before
calling qemu_fclose.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
---
 migration.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/migration.c b/migration.c
index f5959b4..a432c3b 100644
--- a/migration.c
+++ b/migration.c
@@ -286,6 +286,13 @@ int migrate_fd_cleanup(FdMigrationState *s)
 
     qemu_set_fd_handler2(s->fd, NULL, NULL, NULL, NULL);
 
+    if ((s->state == MIG_STATE_ERROR ||
+         s->state == MIG_STATE_CANCELLED) &&
+        s->fd != -1) {
+        close(s->fd);
+        s->fd = -1;
+    }
+
     if (s->file) {
         DPRINTF("closing file\n");
         if (qemu_fclose(s->file) != 0) {
-- 
1.7.6

             reply	other threads:[~2011-08-26 10:59 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-08-26 10:59 Daniel P. Berrange [this message]
2011-08-26 11:25 ` [Qemu-devel] [PATCH] Ensure migrate_cancel does not block doing I/O Daniel P. Berrange
2011-08-26 13:48   ` Paolo Bonzini
2012-06-01  5:04     ` Amos Kong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1314356368-26522-1-git-send-email-berrange@redhat.com \
    --to=berrange@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.