From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: "Daniel P. Berrangé" <berrange@redhat.com>
Cc: "Thomas Huth" <thuth@redhat.com>,
	qemu-devel@nongnu.org, "Juan Quintela" <quintela@redhat.com>,
	"Lukáš Doktor" <ldoktor@redhat.com>,
	"Cornelia Huck" <cohuck@redhat.com>
Subject: Re: [Qemu-devel] [PATCH] io/channel-command: Delay the killing of the child after closing the pipe
Date: Tue, 13 Feb 2018 15:49:42 +0000
Message-ID: <20180213154942.GL2378@work-vm>
In-Reply-To: <20180213154512.GT573@redhat.com>

* Daniel P. Berrangé (berrange@redhat.com) wrote:
> On Tue, Feb 13, 2018 at 03:41:45PM +0000, Dr. David Alan Gilbert wrote:
> > * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > > On Tue, Feb 13, 2018 at 03:25:30PM +0000, Dr. David Alan Gilbert wrote:
> > > > * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > > > > On Tue, Feb 13, 2018 at 03:09:12PM +0000, Dr. David Alan Gilbert wrote:
> > > > > > * Thomas Huth (thuth@redhat.com) wrote:
> > > > > > > We are currently facing some migration failure on s390x when running
> > > > > > > certain avocado tests, e.g. when running the test
> > > > > > > type_specific.io-github-autotest-qemu.migrate.with_reboot.exec.gzip_exec.
> > > > > > > This test is using 'migrate -d "exec:nc localhost 5200"' for the migration.
> > > > > > > The problem is detected at the receiving side, where the migration stream
> > > > > > > apparently ends too early. However, the cause for the problem is the
> > > > > > > sending side: After writing the migration stream into the pipe to netcat,
> > > > > > > the source QEMU calls qio_channel_command_close() which closes the pipe
> > > > > > > and immediately (!) kills the child process afterwards. So if the
> > > > > > > sending netcat did not read the final bytes from the pipe yet, or
> > > > > > > if it did not manage to send out all its buffers yet, it is killed
> > > > > > > before the whole migration stream is passed to the destination side.
> > > > > > 
> > > > > > Thanks for tracking that down!
> > > > > > 
> > > > > > > To ease the situation at least a little bit, we should give the child
> > > > > > > process a few more time slices before we kill it with
> > > > > > > SIGTERM and then with SIGKILL. With this change, the avocado test now
> > > > > > > succeeds here in 10 out of 10 runs.
> > > > > > > 
> > > > > > > Signed-off-by: Thomas Huth <thuth@redhat.com>
> > > > > > > ---
> > > > > > >  io/channel-command.c | 6 +++---
> > > > > > >  1 file changed, 3 insertions(+), 3 deletions(-)
> > > > > > > 
> > > > > > > diff --git a/io/channel-command.c b/io/channel-command.c
> > > > > > > index 319c5ed..f64db3e 100644
> > > > > > > --- a/io/channel-command.c
> > > > > > > +++ b/io/channel-command.c
> > > > > > > @@ -177,11 +177,11 @@ static int qio_channel_command_abort(QIOChannelCommand *ioc,
> > > > > > >              return -1;
> > > > > > >          }
> > > > > > >      } else if (ret == 0) {
> > > > > > > -        if (step == 0) {
> > > > > > > +        if (step == 4) {
> > > > > > >              kill(ioc->pid, SIGTERM);
> > > > > > > -        } else if (step == 1) {
> > > > > > > +        } else if (step == 8) {
> > > > > > >              kill(ioc->pid, SIGKILL);
> > > > > > > -        } else {
> > > > > > > +        } else if (step >= 9) {
> > > > > > 
> > > > > > Hmm.  This seems pretty arbitrary; if I understand correctly you're
> > > > > > saying it'll get a SIGTERM after 4 (arbitrary) * 10ms (arbitrary).
> > > > > > 
> > > > > > Who is to say that's enough for a scp or gzip or the like?
> > > > > 
> > > > > We could conceivably implement the qio_channel_shutdown() operation
> > > > > for the QIOChannelCommand class. It would merely close the FD to the
> > > > > child process, but leave it running. That would give it time to read
> > > > > any data still in the pipe from QEMU IIUC.
> > > > 
> > > > Yeh that's better; although when would we call shutdown or close on it?
> > > 
> > > Doesn't QEMU already use shutdown() during the right part of migration,
> > > or is that only wrt post-copy?
> > 
> > We only use it for cancel and errors, not during the normal behaviour.
> 
> So we could do with shutdown() for the sake of post-copy anyway, but for
> normal behaviour maybe the right answer is for close() to just wait a
> really long time for the child app to exit? If we close the pipes, and
> then wait 5 seconds or more before giving up?

Yes, I'm happier with a much longer arbitrary value than a short
arbitrary value; but I do wonder if there's any real need to kill it.
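
A minimal sketch of the "close the pipe, then wait much longer" idea,
assuming a plain POSIX pipe fd and child pid rather than QEMU's actual
QIOChannelCommand code. The names close_and_reap() and wait_child_ms(),
the 10ms poll tick and the 1 second SIGTERM window are hypothetical,
chosen only for illustration:

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a QEMU function): poll for the child's exit
 * for roughly timeout_ms. Returns 1 if the child was reaped, 0 on
 * timeout, -1 on a waitpid() error.
 */
static int wait_child_ms(pid_t pid, int timeout_ms, int *status)
{
    const struct timespec tick = { 0, 10 * 1000 * 1000 }; /* 10ms */
    int waited;

    for (waited = 0; ; waited += 10) {
        pid_t ret = waitpid(pid, status, WNOHANG);

        if (ret == pid) {
            return 1;           /* child has exited and is reaped */
        }
        if (ret < 0 && errno != EINTR) {
            return -1;
        }
        if (waited >= timeout_ms) {
            return 0;           /* still running after the grace period */
        }
        nanosleep(&tick, NULL);
    }
}

/*
 * Hypothetical helper: close our end of the pipe so the child sees EOF,
 * give it grace_ms to drain and exit on its own, and only then escalate
 * to SIGTERM and finally SIGKILL. Returns 0 once the child is reaped,
 * -1 on error.
 */
static int close_and_reap(int fd, pid_t pid, int grace_ms, int *status)
{
    int ret;

    close(fd);
    ret = wait_child_ms(pid, grace_ms, status);
    if (ret == 0) {
        kill(pid, SIGTERM);
        ret = wait_child_ms(pid, 1000, status); /* assumed 1s window */
    }
    if (ret == 0) {
        pid_t r;

        kill(pid, SIGKILL);
        do {
            r = waitpid(pid, status, 0);
        } while (r < 0 && errno == EINTR);
        ret = (r == pid) ? 1 : -1;
    }
    return ret == 1 ? 0 : -1;
}
```

With something like close_and_reap(fd, pid, 5000, &status), a well-behaved
nc or gzip would normally exit by itself on EOF long before any signal is
sent; the kill path only remains as a backstop for a child that hangs.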

Dave

> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
