From: Michael Tokarev <mjt@tls.msk.ru>
To: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>,
mtosatti@redhat.com, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH] do not call monitor_resume() from migrate_fd_put_buffer() error path
Date: Fri, 05 Aug 2011 00:00:31 +0400 [thread overview]
Message-ID: <4E3AFA5F.4040708@msgid.tls.msk.ru> (raw)
In-Reply-To: <4E3AF860.2070005@msgid.tls.msk.ru>
04.08.2011 23:52, Michael Tokarev пишет:
> 04.08.2011 23:24, Luiz Capitulino пишет:
>> On Wed, 3 Aug 2011 18:51:44 +0400
>> Michael Tokarev <mjt@tls.msk.ru> wrote:
>>
>>> If we do, it results in double monitor_resume() (second being called
>>> from migrate_fd_cleanup() anyway) and monitor suspend count becoming
>>> negative.
>>
>> Are you hitting an specific issue or did you find this by code inspection?
>
> The specific case I'm talking about can trivially be triggered.
> Run migrate "exec:nosuchfile" and your monitor will be stuck
> forever.
>
>> IIRC, I asked Marcelo to add the monitor_resume() call in the fix for
>> e447b1a603 because the monitor wasn't being resumed in some cases. Don't
>> remember which though, do you Marcelo?
>
> And this (e447b1a603) is actually exactly the same thing: when the
> child terminates before qemu completes sending stuff to it.
>
>> I see two possibilities here:
>>
>> 1. After e447b1a603 there was some change that made the monitor_resume() call
>> in migrate_fd_put_buffer() unnecessary
>>
>> 2. We're calling it in the wrong place
>>
>> Taking a quick look at the code I see that migrate_fd_cleanup() doesn't
>> seem to be called when qemu_savevm_state_iterate() fails, for example.
>
> Yes indeed, but that will lead to a havoc in other places, like
> leaving callback notifier registered even after migration gets
> cancelled, or keeping the file open.
>
> Note that qemu_savevm_state_iterate() gets called from a callback
> routine.
>
> What I observe here is that when the exec: process exits prematurely,
> qemu will continue writing stuff to pipe up till migrate_fd_put_buffer()
> hits error (EPIPE iirc), at which point we'll call mon->resume() twice.
> The same happens in case of tcp migration when the other end closes
> the connection.
>
> I don't know when qemu_savevm_state_iterate() may fail, or why
> qemu_file_has_error() in there does not return true.
>
> Maybe the problem here is much more severe actually, -- that's why
> I initially asked what it is all about, -- before offering the patch.
This looks like a race condition: which of the two places
will detect the error first.
Sometimes I see this (-monitor stdio):
(qemu) migrate "exec:nothing"
sh: nothing: not found
(qemu)
and after this, monitor is stuck as I described. But sometimes
I see this:
(qemu) migrate "exec:nothing"
sh: nothing: not found
(qemu)
And it continues working. Note the extra newline.
If the error gets detected sooner (like in case of
exec:nothing, or even better, after removing
/bin/sh so that system(3) will fail), the chance
to have stuck monitor is much higher. If the
error gets detected later, it will survive most
likely. Or something like that anyway.
/mjt
next prev parent reply other threads:[~2011-08-04 20:00 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-08-03 14:51 [Qemu-devel] [PATCH] do not call monitor_resume() from migrate_fd_put_buffer() error path Michael Tokarev
2011-08-04 19:24 ` Luiz Capitulino
2011-08-04 19:52 ` Michael Tokarev
2011-08-04 20:00 ` Michael Tokarev [this message]
2011-08-04 22:19 ` Jan Kiszka
2011-08-05 6:51 ` Michael Tokarev
2011-08-05 7:11 ` Jan Kiszka
-- strict thread matches above, loose matches on Subject: below --
2011-07-19 11:46 Michael Tokarev
2011-07-19 21:48 ` Jan Kiszka
2011-07-20 16:34 ` Marcelo Tosatti
2011-07-20 22:06 ` Jan Kiszka
2011-08-03 7:38 ` Michael Tokarev
2011-08-03 13:22 ` Jan Kiszka
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4E3AFA5F.4040708@msgid.tls.msk.ru \
--to=mjt@tls.msk.ru \
--cc=jan.kiszka@siemens.com \
--cc=lcapitulino@redhat.com \
--cc=mtosatti@redhat.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.