From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([140.186.70.92]:60668) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Qp3xl-0006rr-JK for qemu-devel@nongnu.org; Thu, 04 Aug 2011 15:52:14 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Qp3xk-0001q7-Bt for qemu-devel@nongnu.org; Thu, 04 Aug 2011 15:52:13 -0400 Received: from isrv.corpit.ru ([86.62.121.231]:49280) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Qp3xk-0001iU-3E for qemu-devel@nongnu.org; Thu, 04 Aug 2011 15:52:12 -0400 Message-ID: <4E3AF860.2070005@msgid.tls.msk.ru> Date: Thu, 04 Aug 2011 23:52:00 +0400 From: Michael Tokarev MIME-Version: 1.0 References: <1312383104-9565-1-git-send-email-mjt@msgid.tls.msk.ru> <20110804162441.0045215c@doriath> In-Reply-To: <20110804162441.0045215c@doriath> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Subject: Re: [Qemu-devel] [PATCH] do not call monitor_resume() from migrate_fd_put_buffer() error path List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Luiz Capitulino Cc: Jan Kiszka , mtosatti@redhat.com, qemu-devel@nongnu.org 04.08.2011 23:24, Luiz Capitulino =D0=BF=D0=B8=D1=88=D0=B5=D1=82: > On Wed, 3 Aug 2011 18:51:44 +0400 > Michael Tokarev wrote: >=20 >> If we do, it results in double monitor_resume() (second being called >> from migrate_fd_cleanup() anyway) and monitor suspend count becoming >> negative. >=20 > Are you hitting an specific issue or did you find this by code inspecti= on? The specific case I'm talking about can trivially be triggered. Run migrate "exec:nosuchfile" and your monitor will be stuck forever. > IIRC, I asked Marcelo to add the monitor_resume() call in the fix for > e447b1a603 because the monitor wasn't being resumed in some cases. Don'= t > remember which though, do you Marcelo? And this (e447b1a603) is actually exactly the same thing: when the child terminates before qemu completes sending stuff to it. > I see two possibilities here: >=20 > 1. After e447b1a603 there was some change that made the monitor_resume= () call > in migrate_fd_put_buffer() unnecessary >=20 > 2. We're calling it in the wrong place >=20 > Taking a quick look at the code I see that migrate_fd_cleanup() doesn't > seem to be called when qemu_savevm_state_iterate() fails, for example. Yes indeed, but that will lead to a havoc in other places, like leaving callback notifier registered even after migration gets cancelled, or keeping the file open. Note that qemu_savevm_state_iterate() gets called from a callback routine. What I observe here is that when the exec: process exits prematurely, qemu will continue writing stuff to pipe up till migrate_fd_put_buffer() hits error (EPIPE iirc), at which point we'll call mon->resume() twice. The same happens in case of tcp migration when the other end closes the connection. I don't know when qemu_savevm_state_iterate() may fail, or why qemu_file_has_error() in there does not return true. Maybe the problem here is much more severe actually, -- that's why I initially asked what it is all about, -- before offering the patch. Thanks, /mjt >> Signed-Off-By: Michael Tokarev >> Reviewed-By: Jan Kiszka >> --- >> migration.c | 3 --- >> 1 files changed, 0 insertions(+), 3 deletions(-) >> >> diff --git a/migration.c b/migration.c >> index 2a15b98..7ca883f 100644 >> --- a/migration.c >> +++ b/migration.c >> @@ -330,9 +330,6 @@ ssize_t migrate_fd_put_buffer(void *opaque, const = void *data, size_t size) >> if (ret =3D=3D -EAGAIN) { >> qemu_set_fd_handler2(s->fd, NULL, NULL, migrate_fd_put_notify= , s); >> } else if (ret < 0) { >> - if (s->mon) { >> - monitor_resume(s->mon); >> - } >> s->state =3D MIG_STATE_ERROR; >> notifier_list_notify(&migration_state_notifiers, NULL); >> } >=20