From: "Daniel P. Berrangé" <berrange@redhat.com>
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Peter Maydell <peter.maydell@linaro.org>,
Li Zhijian <lizhijian@cn.fujitsu.com>,
Jason Wang <jasowang@redhat.com>,
QEMU Developers <qemu-devel@nongnu.org>,
Peter Xu <peterx@redhat.com>, Zhang Chen <zhangckid@gmail.com>,
Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [Qemu-devel] test-filter-mirror hangs
Date: Thu, 24 Jan 2019 10:30:23 +0000 [thread overview]
Message-ID: <20190124103023.GB7953@redhat.com> (raw)
In-Reply-To: <20190124101155.GA7953@redhat.com>
On Thu, Jan 24, 2019 at 10:11:55AM +0000, Daniel P. Berrangé wrote:
> On Wed, Jan 23, 2019 at 07:53:46PM +0000, Dr. David Alan Gilbert wrote:
> > * Jason Wang (jasowang@redhat.com) wrote:
> > >
> > > On 2019/1/22 上午2:56, Peter Maydell wrote:
> > > > On Thu, 17 Jan 2019 at 09:46, Jason Wang <jasowang@redhat.com> wrote:
> > > > >
> > > > > On 2019/1/15 上午12:33, Zhang Chen wrote:
> > > > > >
> > > > > > On Sat, Jan 12, 2019 at 12:15 AM Dr. David Alan Gilbert
> > > > > > <dgilbert@redhat.com <mailto:dgilbert@redhat.com>> wrote:
> > > > > >
> > > > > > * Peter Maydell (peter.maydell@linaro.org
> > > > > > <mailto:peter.maydell@linaro.org>) wrote:
> > > > > > > Recently I've noticed that test-filter-mirror has been hanging
> > > > > > > intermittently, typically when run on some other TCG architecture.
> > > > > > > In the instance I've just looked at, this was with s390x guest on
> > > > > > > x86-64 host, though I've also seen it on other host archs and
> > > > > > > perhaps with other guests.
> > > > > >
> > > > > > Watch out to see if you really do see it for other guests;
> > > > > > it carefully avoids using virtio-net to avoid vhost; but on s390x it
> > > > > > uses virtio-net-ccw - could that hit the vhost it was trying to avoid?
> > > > > >
> > > > > > > Below is a backtrace, though it seems to be pretty unhelpful.
> > > > > > > Anybody got any theories ? Does the mirror test rely on dirty
> > > > > > > memory bitmaps like the migration test (which also hangs
> > > > > > > occasionally with TCG due to some bug I'm sure we've investigated
> > > > > > > in the past) ?
> > > > > >
> > > > > > I don't think it relies on the CPU at all.
> > > > > > I have no idea about this currently, but Jason and I designed the
> > > > > > test case.
> > > > > > Add Jason: Have any comments about this ?
> > > > >
> > > > > I can't reproduce this locally with s390x-softmmu. It looks to me the
> > > > > test should be independent to any kinds of emulation. It should pass
> > > > > when mainloop work.
> > > > I've just seen a hang with ppc64 guest on s390x host, so it is
> > > > indeed not specific to s390x guest (and so not specific to
> > > > virtio-net either, since the ppc64 guest setup uses e1000).
> > > >
> > > > thanks
> > > > -- PMM
> > >
> > >
> > > Finally reproduced locally after hundreds (sometimes thousands) times of
> > > running.
> > >
> > > Bisection points to OOB monitor[1].
> > >
> > > It looks to me after OOB is used unconditionally we lose a barrier to make
> > > sure socket is connected before sending packets in test-filter-mirror.c. Is
> > > there any other similar and simple thing that we could do to kick the
> > > mainloop?
> >
> > Do you mean the:
> >
> > /* send a qmp command to guarantee that 'connected' is setting to true. */
> > qmp_discard_response(qts, "{ 'execute' : 'query-status'}");
> >
> > why was that ever sufficient to know the socket was ready?
>
> This doesn't make any sense to me.
>
> There's the netdev socket, which has been passed in as a pre-opened socket
> FD, so that's guaranteed connected.
>
> There's the chardev server socket, to which we've just done a unix_connect()
> call to establish a connection. If unix_connect() has succeeded, then at least
> the socket is connected & ready for I/O from the test's side. This is a
> reliable stream socket, so even if the test sends data on the socket right away
> and QEMU isn't ready, it won't be lost. It'll be buffered and received by QEMU
> as soon as QEMU starts to monitor for incoming data on the socket.
>
> So I don't get what trying to wait for a "connected" state actually achieves.
> It feels like a mistaken attempt to paper over some other unknown flaw that
> just worked by some lucky side-effect.
Immediately after writing that, I see what's happened.
The filter_redirector_receive_iov() method is triggered when QEMU reads
from the -netdev socket (which we passed in as an FD and immediately
write to).
This method will discard all data, however, if the chr_out -chardev is
not in a connected state. So we do indeed have a race condition in this
test suite.
In fact I'd say this filter-mirror object is racy by design even when
run in normal usage, if your chardev is a server mode with "nowait" set,
or is a client mode with "reconnect" set. It will simply discard data.
We can fix the test suite by using FD passing for the -chardev
too, so we're guaranteed to be connected immediately. It might be
possible to remove "nowait" flag, but I'm not sure if that will cause
problems with the qtest handshake as it might block QEMU at startup
preventing qtest handshake from being performed.
If we care about the race in real QEMU execution, then we must either
document that "nowait" or "reconnect" should never be used with
filter-mirror, or perhaps can make use of "qemu_chr_wait_connected"
to synchronize startup fo the filter-mirror object with the chardev
initialization. That could fix the test suite too
Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
next prev parent reply other threads:[~2019-01-24 10:31 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-01-11 15:01 [Qemu-devel] test-filter-mirror hangs Peter Maydell
2019-01-11 16:15 ` Dr. David Alan Gilbert
2019-01-14 16:33 ` Zhang Chen
2019-01-17 9:46 ` Jason Wang
2019-01-21 18:56 ` Peter Maydell
2019-01-21 20:01 ` Dr. David Alan Gilbert
2019-01-22 9:06 ` Peter Maydell
2019-01-23 2:43 ` Jason Wang
2019-01-23 19:53 ` Dr. David Alan Gilbert
2019-01-24 4:01 ` Jason Wang
2019-01-24 9:11 ` Dr. David Alan Gilbert
2019-01-24 9:51 ` Peter Xu
2019-01-25 3:55 ` Jason Wang
2019-01-25 7:14 ` Markus Armbruster
2019-01-25 3:45 ` Jason Wang
2019-01-24 9:47 ` Markus Armbruster
2019-01-25 3:56 ` Jason Wang
2019-01-25 7:12 ` Markus Armbruster
2019-01-25 8:12 ` Jason Wang
2019-01-25 8:44 ` Markus Armbruster
2019-01-24 10:11 ` Daniel P. Berrangé
2019-01-24 10:30 ` Daniel P. Berrangé [this message]
2019-01-24 11:01 ` Daniel P. Berrangé
2019-01-25 7:12 ` Jason Wang
2019-01-25 7:00 ` Jason Wang
2019-01-15 10:28 ` Peter Maydell
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190124103023.GB7953@redhat.com \
--to=berrange@redhat.com \
--cc=dgilbert@redhat.com \
--cc=jasowang@redhat.com \
--cc=lizhijian@cn.fujitsu.com \
--cc=pbonzini@redhat.com \
--cc=peter.maydell@linaro.org \
--cc=peterx@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=zhangckid@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).