qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: "Daniel P. Berrangé" <berrange@redhat.com>
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Peter Maydell <peter.maydell@linaro.org>,
	Li Zhijian <lizhijian@cn.fujitsu.com>,
	Jason Wang <jasowang@redhat.com>,
	QEMU Developers <qemu-devel@nongnu.org>,
	Peter Xu <peterx@redhat.com>, Zhang Chen <zhangckid@gmail.com>,
	Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [Qemu-devel] test-filter-mirror hangs
Date: Thu, 24 Jan 2019 11:01:46 +0000	[thread overview]
Message-ID: <20190124110146.GC7953@redhat.com> (raw)
In-Reply-To: <20190124103023.GB7953@redhat.com>

On Thu, Jan 24, 2019 at 10:30:23AM +0000, Daniel P. Berrangé wrote:
> On Thu, Jan 24, 2019 at 10:11:55AM +0000, Daniel P. Berrangé wrote:
> > On Wed, Jan 23, 2019 at 07:53:46PM +0000, Dr. David Alan Gilbert wrote:
> > > Do you mean the:
> > > 
> > >     /* send a qmp command to guarantee that 'connected' is setting to true. */
> > >     qmp_discard_response(qts, "{ 'execute' : 'query-status'}");
> > > 
> > > why was that ever sufficient to know the socket was ready?
> > 
> > This doesn't make any sense to me.
> > 
> > There's the netdev socket, which has been passed in as a pre-opened socket
> > FD, so that's guaranteed connected.
> > 
> > There's the chardev server socket, to which we've just done a unix_connect()
> > call to establish a connection. If unix_connect() has succeeded, then at least
> > the socket is connected & ready for I/O from the test's side. This is a
> > reliable stream socket, so even if the test sends data on the socket right away
> > and QEMU isn't ready, it won't be lost. It'll be buffered and received by QEMU
> > as soon as QEMU starts to monitor for incoming data on the socket.
> > 
> > So I don't get what trying to wait for a "connected" state actually achieves.
> > It feels like a mistaken attempt to paper over some other unknown flaw that
> > just worked by some lucky side-effect.
> 
> Immediately after writing that, I see what's happened.
> 
> The  filter_redirector_receive_iov() method is triggered when QEMU reads
> from the -netdev socket (which we passed in as an FD and immediately
> write to).
> 
> This method will discard all data, however, if the chr_out -chardev is
> not in a connected state. So we do indeed have a race condition in this
> test suite.
> 
> In fact I'd say this filter-mirror object is racy by design even when
> run in normal usage, if your chardev is a server mode with "nowait" set,
> or is a client mode with "reconnect" set. It will simply discard data.
> 
> We can fix the test suite by using FD passing for the -chardev
> too, so we're guaranteed to be connected immediately.  It might be
> possible to remove "nowait" flag, but I'm not sure if that will cause
> problems with the qtest handshake as it might block QEMU at startup
> preventing qtest handshake from being performed.
> 
> If we care about the race in real QEMU execution, then we must either
> document that "nowait" or "reconnect" should never be used with
> filter-mirror, or perhaps can make use of "qemu_chr_wait_connected"
> to synchronize startup fo the filter-mirror object with the chardev
> initialization. That could fix the test suite too

Actually using qemu_chr_wait_connected would cause the test suite to
hang, and it wouldn't fix data loss in the case where the chardev
disconnected and then waited to connect again.

I think the core problem here is that the netdev code assumes that the
filters are always able to process packets. A proper solution would
involve the filters having a "bool ready" state and callback to notify
the netdev anytime this state changes.

The filter-mirror should *not* report ready until the chardev has been
opened.

The netdevs should then not read packets off the wire unless all the
regsitered filters are reporting that they are ready. If a filter then
transitions to not-ready, the netdev should again stop reading packets
off the wire & queue any that it might have had in flight, until the
filter becomes ready again.

Without this kind of setup the filters are inherantly racy in several
of the possible -chardev  configurations.

In that sense the flaky test has actually done us a favour showing that
the code is broken. It is not in fact the test that is broken, and though
we could workaround it in the test that doens't fix the root cause problem.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

  reply	other threads:[~2019-01-24 11:02 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-01-11 15:01 [Qemu-devel] test-filter-mirror hangs Peter Maydell
2019-01-11 16:15 ` Dr. David Alan Gilbert
2019-01-14 16:33   ` Zhang Chen
2019-01-17  9:46     ` Jason Wang
2019-01-21 18:56       ` Peter Maydell
2019-01-21 20:01         ` Dr. David Alan Gilbert
2019-01-22  9:06           ` Peter Maydell
2019-01-23  2:43         ` Jason Wang
2019-01-23 19:53           ` Dr. David Alan Gilbert
2019-01-24  4:01             ` Jason Wang
2019-01-24  9:11               ` Dr. David Alan Gilbert
2019-01-24  9:51                 ` Peter Xu
2019-01-25  3:55                   ` Jason Wang
2019-01-25  7:14                     ` Markus Armbruster
2019-01-25  3:45                 ` Jason Wang
2019-01-24  9:47               ` Markus Armbruster
2019-01-25  3:56                 ` Jason Wang
2019-01-25  7:12                   ` Markus Armbruster
2019-01-25  8:12                     ` Jason Wang
2019-01-25  8:44                       ` Markus Armbruster
2019-01-24 10:11             ` Daniel P. Berrangé
2019-01-24 10:30               ` Daniel P. Berrangé
2019-01-24 11:01                 ` Daniel P. Berrangé [this message]
2019-01-25  7:12                   ` Jason Wang
2019-01-25  7:00                 ` Jason Wang
2019-01-15 10:28   ` Peter Maydell

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190124110146.GC7953@redhat.com \
    --to=berrange@redhat.com \
    --cc=dgilbert@redhat.com \
    --cc=jasowang@redhat.com \
    --cc=lizhijian@cn.fujitsu.com \
    --cc=pbonzini@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=peterx@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=zhangckid@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).