From: Jason Wang <jasowang@redhat.com>
To: "Daniel P. Berrangé" <berrange@redhat.com>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Peter Maydell <peter.maydell@linaro.org>,
	Li Zhijian <lizhijian@cn.fujitsu.com>,
	QEMU Developers <qemu-devel@nongnu.org>,
	Peter Xu <peterx@redhat.com>, Zhang Chen <zhangckid@gmail.com>,
	Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [Qemu-devel] test-filter-mirror hangs
Date: Fri, 25 Jan 2019 15:12:51 +0800
Message-ID: <1ee4b9df-bd9c-0af8-e4a4-dbce44485f50@redhat.com>
In-Reply-To: <20190124110146.GC7953@redhat.com>


On 2019/1/24 7:01 PM, Daniel P. Berrangé wrote:
> On Thu, Jan 24, 2019 at 10:30:23AM +0000, Daniel P. Berrangé wrote:
>> On Thu, Jan 24, 2019 at 10:11:55AM +0000, Daniel P. Berrangé wrote:
>>> On Wed, Jan 23, 2019 at 07:53:46PM +0000, Dr. David Alan Gilbert wrote:
>>>> Do you mean the:
>>>>
>>>>      /* send a qmp command to guarantee that 'connected' is setting to true. */
>>>>      qmp_discard_response(qts, "{ 'execute' : 'query-status'}");
>>>>
>>>> why was that ever sufficient to know the socket was ready?
>>> This doesn't make any sense to me.
>>>
>>> There's the netdev socket, which has been passed in as a pre-opened socket
>>> FD, so that's guaranteed connected.
>>>
>>> There's the chardev server socket, to which we've just done a unix_connect()
>>> call to establish a connection. If unix_connect() has succeeded, then at least
>>> the socket is connected & ready for I/O from the test's side. This is a
>>> reliable stream socket, so even if the test sends data on the socket right away
>>> and QEMU isn't ready, it won't be lost. It'll be buffered and received by QEMU
>>> as soon as QEMU starts to monitor for incoming data on the socket.
>>>
>>> So I don't get what trying to wait for a "connected" state actually achieves.
>>> It feels like a mistaken attempt to paper over some other unknown flaw that
>>> just worked by some lucky side-effect.
>> Immediately after writing that, I see what's happened.
>>
>> The filter_redirector_receive_iov() method is triggered when QEMU reads
>> from the -netdev socket (which we passed in as an FD and immediately
>> write to).
>>
>> This method will discard all data, however, if the chr_out -chardev is
>> not in a connected state. So we do indeed have a race condition in this
>> test suite.
>>
>> In fact I'd say this filter-mirror object is racy by design even when
>> run in normal usage, if your chardev is a server mode with "nowait" set,
>> or is a client mode with "reconnect" set. It will simply discard data.
>>
>> We can fix the test suite by using FD passing for the -chardev
>> too, so we're guaranteed to be connected immediately.  It might be
>> possible to remove "nowait" flag, but I'm not sure if that will cause
>> problems with the qtest handshake as it might block QEMU at startup
>> preventing qtest handshake from being performed.
>>
>> If we care about the race in real QEMU execution, then we must either
>> document that "nowait" or "reconnect" should never be used with
>> filter-mirror, or perhaps can make use of "qemu_chr_wait_connected"
>> to synchronize startup of the filter-mirror object with the chardev
>> initialization. That could fix the test suite too.
> Actually using qemu_chr_wait_connected would cause the test suite to
> hang, and it wouldn't fix data loss in the case where the chardev
> disconnected and then waited to connect again.
>
> I think the core problem here is that the netdev code assumes that the
> filters are always able to process packets. A proper solution would
> involve the filters having a "bool ready" state and callback to notify
> the netdev anytime this state changes.
>
> The filter-mirror should *not* report ready until the chardev has been
> opened.
>
> The netdevs should then not read packets off the wire unless all the
> registered filters are reporting that they are ready.


Netdev should know nothing about filters. And there would still be a race
between iterating over all the filters and handling disconnection if we did this.


>   If a filter then
> transitions to not-ready, the netdev should again stop reading packets
> off the wire & queue any that it might have had in flight, until the
> filter becomes ready again.


I agree with queuing the packets in this case.
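
As a rough illustration of the queuing idea only: the sketch below keeps
packets in a small bounded queue while the filter's output is not ready and
flushes them, in order, once it becomes ready. SketchFilter, sketch_receive(),
sketch_set_ready() and the fixed queue limits are all hypothetical names and
choices made up for this example; they are not QEMU's net filter or chardev
API, and a real implementation would need to decide what to do when the
queue fills up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_QUEUED 8   /* arbitrary bound, chosen for the example */
#define SKETCH_MAX_PKT    64  /* arbitrary maximum packet size */

/* Hypothetical filter state; none of these names exist in QEMU. */
typedef struct {
    bool ready;      /* true once the output side is connected */
    char queue[SKETCH_MAX_QUEUED][SKETCH_MAX_PKT];
    size_t queue_len[SKETCH_MAX_QUEUED];
    size_t queued;   /* packets currently held back */
    size_t delivered;/* packets handed to the output so far */
} SketchFilter;

/* Stand-in for writing a packet to the output; it only counts. */
static void sketch_deliver(SketchFilter *f, const char *buf, size_t len)
{
    (void)buf;
    (void)len;
    f->delivered++;
}

/*
 * Accept one packet. If the output is ready it is delivered at once;
 * otherwise it is queued instead of being discarded. Returns false
 * only when the packet cannot be held (queue full or packet too big).
 */
static bool sketch_receive(SketchFilter *f, const char *buf, size_t len)
{
    assert(f != NULL && buf != NULL);
    if (f->ready) {
        sketch_deliver(f, buf, len);
        return true;
    }
    if (f->queued == SKETCH_MAX_QUEUED || len > SKETCH_MAX_PKT) {
        return false;
    }
    memcpy(f->queue[f->queued], buf, len);
    f->queue_len[f->queued] = len;
    f->queued++;
    return true;
}

/* Called when the output connects or disconnects. */
static void sketch_set_ready(SketchFilter *f, bool ready)
{
    assert(f != NULL);
    f->ready = ready;
    if (!ready) {
        return;
    }
    /* Flush everything that arrived while not ready, oldest first. */
    for (size_t i = 0; i < f->queued; i++) {
        sketch_deliver(f, f->queue[i], f->queue_len[i]);
    }
    f->queued = 0;
}
```

With a zero-initialized SketchFilter, a packet passed to sketch_receive()
before sketch_set_ready(&f, true) is held rather than dropped, and is
delivered as soon as the ready transition happens.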

Thanks


>
> Without this kind of setup the filters are inherently racy in several
> of the possible -chardev configurations.
>
> In that sense the flaky test has actually done us a favour showing that
> the code is broken. It is not in fact the test that is broken, and though
> we could work around it in the test, that doesn't fix the root cause problem.
>
> Regards,
> Daniel
