All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jan Kiszka <jan.kiszka@web.de>
To: Michael Tokarev <mjt@tls.msk.ru>
Cc: mtosatti@redhat.com, qemu-devel@nongnu.org,
	Luiz Capitulino <lcapitulino@redhat.com>
Subject: Re: [Qemu-devel] [PATCH] do not call monitor_resume() from migrate_fd_put_buffer() error path
Date: Fri, 05 Aug 2011 00:19:51 +0200	[thread overview]
Message-ID: <4E3B1B07.2000400@web.de> (raw)
In-Reply-To: <4E3AFA5F.4040708@msgid.tls.msk.ru>

[-- Attachment #1: Type: text/plain, Size: 4583 bytes --]

On 2011-08-04 22:00, Michael Tokarev wrote:
> 04.08.2011 23:52, Michael Tokarev пишет:
>> 04.08.2011 23:24, Luiz Capitulino пишет:
>>> On Wed,  3 Aug 2011 18:51:44 +0400
>>> Michael Tokarev <mjt@tls.msk.ru> wrote:
>>>
>>>> If we do, it results in double monitor_resume() (second being called
>>>> from migrate_fd_cleanup() anyway) and monitor suspend count becoming
>>>> negative.
>>>
>>> Are you hitting an specific issue or did you find this by code inspection?
>>
>> The specific case I'm talking about can trivially be triggered.
>> Run migrate "exec:nosuchfile" and your monitor will be stuck
>> forever.
>>
>>> IIRC, I asked Marcelo to add the monitor_resume() call in the fix for
>>> e447b1a603 because the monitor wasn't being resumed in some cases. Don't
>>> remember which though, do you Marcelo?
>>
>> And this (e447b1a603) is actually exactly the same thing: when the
>> child terminates before qemu completes sending stuff to it.
>>
>>> I see two possibilities here:
>>>
>>>  1. After e447b1a603 there was some change that made the monitor_resume() call
>>>     in migrate_fd_put_buffer() unnecessary
>>>
>>>  2. We're calling it in the wrong place
>>>
>>> Taking a quick look at the code I see that migrate_fd_cleanup() doesn't
>>> seem to be called when qemu_savevm_state_iterate() fails, for example.
>>
>> Yes indeed, but that will lead to a havoc in other places, like
>> leaving callback notifier registered even after migration gets
>> cancelled, or keeping the file open.
>>
>> Note that qemu_savevm_state_iterate() gets called from a callback
>> routine.
>>
>> What I observe here is that when the exec: process exits prematurely,
>> qemu will continue writing stuff to pipe up till migrate_fd_put_buffer()
>> hits error (EPIPE iirc), at which point we'll call mon->resume() twice.
>> The same happens in case of tcp migration when the other end closes
>> the connection.
>>
>> I don't know when qemu_savevm_state_iterate() may fail, or why
>> qemu_file_has_error() in there does not return true.
>>
>> Maybe the problem here is much more severe actually, -- that's why
>> I initially asked what it is all about, -- before offering the patch.
> 
> This looks like a race condition: which of the two places
> will detect the error first.
> 
> Sometimes I see this (-monitor stdio):
> 
> (qemu) migrate "exec:nothing"
> sh: nothing: not found
> 
> (qemu)
> 
> and after this, monitor is stuck as I described.  But sometimes
> I see this:
> 
> (qemu) migrate "exec:nothing"
> sh: nothing: not found
> (qemu)
> 
> And it continues working.  Note the extra newline.
> If the error gets detected sooner (like in case of
> exec:nothing, or even better, after removing
> /bin/sh so that system(3) will fail), the chance
> to have stuck monitor is much higher.  If the
> error gets detected later, it will survive most
> likely.  Or something like that anyway.

Resume on error in migrate_fd_put_buffer raced with migrate_fd_cleanup
triggered via migrate_fd_put_ready called from migrate_fd_connect.

This migration code is a horrible maze. Patch below tries to move
monitor_resume a bit out of this. Please check if it works for you, then
I'll send it out properly.

Thanks,
Jan

---

diff --git a/migration.c b/migration.c
index 2a15b98..756fa62 100644
--- a/migration.c
+++ b/migration.c
@@ -292,18 +292,17 @@ int migrate_fd_cleanup(FdMigrationState *s)
             ret = -1;
         }
         s->file = NULL;
+    } else {
+        if (s->mon) {
+            monitor_resume(s->mon);
+        }
     }
 
-    if (s->fd != -1)
+    if (s->fd != -1) {
         close(s->fd);
-
-    /* Don't resume monitor until we've flushed all of the buffers */
-    if (s->mon) {
-        monitor_resume(s->mon);
+        s->fd = -1;
     }
 
-    s->fd = -1;
-
     return ret;
 }
 
@@ -330,9 +329,6 @@ ssize_t migrate_fd_put_buffer(void *opaque, const void *data, size_t size)
     if (ret == -EAGAIN) {
         qemu_set_fd_handler2(s->fd, NULL, NULL, migrate_fd_put_notify, s);
     } else if (ret < 0) {
-        if (s->mon) {
-            monitor_resume(s->mon);
-        }
         s->state = MIG_STATE_ERROR;
         notifier_list_notify(&migration_state_notifiers, NULL);
     }
@@ -458,6 +454,9 @@ int migrate_fd_close(void *opaque)
 {
     FdMigrationState *s = opaque;
 
+    if (s->mon) {
+        monitor_resume(s->mon);
+    }
     qemu_set_fd_handler2(s->fd, NULL, NULL, NULL, NULL);
     return s->close(s);
 }


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

  reply	other threads:[~2011-08-04 22:20 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-08-03 14:51 [Qemu-devel] [PATCH] do not call monitor_resume() from migrate_fd_put_buffer() error path Michael Tokarev
2011-08-04 19:24 ` Luiz Capitulino
2011-08-04 19:52   ` Michael Tokarev
2011-08-04 20:00     ` Michael Tokarev
2011-08-04 22:19       ` Jan Kiszka [this message]
2011-08-05  6:51         ` Michael Tokarev
2011-08-05  7:11           ` Jan Kiszka
  -- strict thread matches above, loose matches on Subject: below --
2011-07-19 11:46 Michael Tokarev
2011-07-19 21:48 ` Jan Kiszka
2011-07-20 16:34   ` Marcelo Tosatti
2011-07-20 22:06     ` Jan Kiszka
2011-08-03  7:38       ` Michael Tokarev
2011-08-03 13:22         ` Jan Kiszka

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4E3B1B07.2000400@web.de \
    --to=jan.kiszka@web.de \
    --cc=lcapitulino@redhat.com \
    --cc=mjt@tls.msk.ru \
    --cc=mtosatti@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.