From: "Daniel P. Berrangé" <berrange@redhat.com>
To: qemu-devel@nongnu.org, Jason Wang <jasowang@redhat.com>
Cc: "Stefan Hajnoczi" <stefanha@redhat.com>,
"Philippe Mathieu-Daudé" <philmd@linaro.org>,
"Michael S . Tsirkin" <mst@redhat.com>,
"Laurent Vivier" <lvivier@redhat.com>,
"Thomas Huth" <thuth@redhat.com>,
"Marc-André Lureau" <marcandre.lureau@redhat.com>,
"Paolo Bonzini" <pbonzini@redhat.com>
Subject: Re: [PATCH 0/6] net: fix non-deterministic failures of the 'netdev-socket' qtest
Date: Tue, 9 Jan 2024 13:49:26 +0000
Message-ID: <ZZ1O5m2i_lz1fLMr@redhat.com>
In-Reply-To: <20240104162942.211458-1-berrange@redhat.com>
Hi Jason,
As the net/ maintainer, could you take a look at this series?
This failure has been causing pain for CI for quite a while.
If you're happy with it, I can include it in a pending pull
request of other misc patches I have.
On Thu, Jan 04, 2024 at 04:29:36PM +0000, Daniel P. Berrangé wrote:
> We've previously bumped up the timeouts in the netdev-socket qtest
> to supposedly fix non-deterministic failures; however, the failures
> are still hitting CI.
>
> A simple 'listen()' and 'connect()' pairing across 2 QEMU processes
> should be very quick to execute, even under high system load, so it
> was never likely that the test was failing due to timeouts being
> reached.
>
> The actual root cause was a race condition in the test design. It
> was spawning a QEMU with a 'server' netdev, and then spawning one
> with the 'client' netdev. There was insufficient synchronization,
> however, so it was possible for the 2nd QEMU process to attempt
> to 'connect()' before the 'listen()' call was made by the 1st QEMU.
>
> In the test scenarios that did not use the 'reconnect' flag, this
> would result in the client QEMU never getting into the expected
> state. The test code would thus loop on 'info network' until
> hitting the maximum wait time.
>
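To illustrate the race described above outside of QEMU, here is a minimal
sketch using plain TCP sockets on localhost. It is not the qtest code; the
helper name try_connect is made up for the example, and it assumes a
platform (such as Linux) where connecting to a bound but not yet listening
socket is refused immediately. A client that makes a single connect()
attempt, as in the scenarios without 'reconnect', fails if it runs before
the server's listen().

```python
import socket


def try_connect(port):
    """Make one connect() attempt with no retry (hypothetical helper)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
        try:
            client.connect(("127.0.0.1", port))
            return "connected"
        except ConnectionRefusedError:
            return "refused"


# The "server" has picked its port but has not called listen() yet.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
port = server.getsockname()[1]

early = try_connect(port)  # client wins the race: runs before listen()
server.listen(1)
late = try_connect(port)   # client started only after listen()
server.close()

print(early, late)
```

The second attempt succeeds without an accept() call because the kernel
completes the handshake for sockets in the listen backlog.
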
> This series reverts the increased timeouts, and fixes synchronization
> in the test scenarios. It also improves reporting of errors in the
> socket netdev backend so that 'info network' reports what actually
> went wrong rather than a useless generic 'connection error' string.
> This will help us diagnose any future CI problems, should they occur.
>
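As a rough sketch of the kind of synchronization meant here (not the actual
test code), the helper below polls a status source until it reports an
expected state, and includes the last status seen in the error if the
deadline passes. The name wait_for_state, the status strings and the
canned query functions are all assumptions for illustration; in the real
test the status would come from 'info network'.

```python
import time


def wait_for_state(query, expected, timeout=5.0, interval=0.05):
    """Poll query() until its result contains expected, or time out."""
    deadline = time.monotonic() + timeout
    last = None
    while time.monotonic() < deadline:
        last = query()
        if expected in last:
            return last
        time.sleep(interval)
    # Report what was actually seen, not just that the wait expired.
    raise TimeoutError(f"wanted {expected!r}, last status was {last!r}")


# Stand-in for querying the server: it reaches "listening" on the third poll.
states = iter(["starting", "starting", "listening"])
result = wait_for_state(lambda: next(states), "listening")

# Stand-in for a client that failed for good: the wait expires, but the
# error message carries the reason.
try:
    wait_for_state(lambda: "connection refused", "connected", timeout=0.2)
    failure = None
except TimeoutError as exc:
    failure = str(exc)

print(result)
print(failure)
```

Only once the server side has reported its listening state would the
client side be started.
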
> Daniel P. Berrangé (6):
> Revert "netdev: set timeout depending on loadavg"
> Revert "osdep: add getloadavg"
> Revert "tests/qtest/netdev-socket: Raise connection timeout to 120
> seconds"
> net: add explicit info about connecting/listening state
> net: handle QIOTask completion to report useful error message
> qtest: ensure netdev-socket tests have non-overlapping names
>
> include/qemu/osdep.h | 10 ---------
> meson.build | 1 -
> net/stream.c | 18 +++++++++++-----
> tests/qtest/netdev-socket.c | 42 +++++++------------------------------
> 4 files changed, 21 insertions(+), 50 deletions(-)
>
> --
> 2.43.0
>
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|