qemu-devel.nongnu.org archive mirror
* [PATCH 0/6] net: fix non-deterministic failures of the 'netdev-socket' qtest
@ 2024-01-04 16:29 Daniel P. Berrangé
From: Daniel P. Berrangé @ 2024-01-04 16:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: Stefan Hajnoczi, Philippe Mathieu-Daudé,
	Daniel P. Berrangé, Michael S . Tsirkin, Jason Wang,
	Laurent Vivier, Thomas Huth, Marc-André Lureau,
	Paolo Bonzini

We've previously bumped up the timeouts in the netdev-socket qtest
in an attempt to fix non-deterministic failures, but the failures
are still hitting CI.

A simple 'listen()' and 'connect()' pairing across 2 QEMU processes
should be very quick to execute, even under high system load, so it
was never likely that the test was failing due to timeouts being
reached.

The actual root cause was a race condition in the test design. It
was spawning a QEMU with a 'server' netdev, and then spawning one
with the 'client' netdev. There was insufficient synchronization,
however, so it was possible for the 2nd QEMU process to attempt
to 'connect()' before the 'listen()' call was made by the 1st QEMU.
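
To make the ordering problem concrete, here is a minimal standalone
sketch in Python. It is not the qtest code and does not involve QEMU;
the helper names (try_connect, demo_race) are made up for illustration.
It only shows that a TCP 'connect()' issued before the peer has called
'listen()' is refused, while the same call succeeds afterwards.

```python
import socket


def try_connect(port):
    """Return True if a TCP connect() to 127.0.0.1:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
        client.settimeout(2.0)
        try:
            client.connect(("127.0.0.1", port))
            return True
        except ConnectionRefusedError:
            # Nothing is listening on the port yet.
            return False


def demo_race():
    """Attempt a connect() before and after the server calls listen()."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Bind to an ephemeral port, but do not listen yet. This stands in
        # for the window in which the 1st process has not reached listen().
        server.bind(("127.0.0.1", 0))
        port = server.getsockname()[1]
        before = try_connect(port)

        # Once listen() has been called, the kernel accepts the handshake.
        server.listen(1)
        after = try_connect(port)
        return before, after
    finally:
        server.close()


if __name__ == "__main__":
    print(demo_race())
```

A client without any retry logic that lands in the 'before' window
simply stays failed, which is why the ordering has to be enforced by
the test rather than left to scheduling luck.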

In the test scenarios that did not use the 'reconnect' flag, this
would result in the client QEMU never getting into the expected
state. The test code would thus loop on 'info network' until
hitting the maximum wait time.
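
The effect on the test can be sketched as a generic poll-until-deadline
loop. This is a hypothetical Python illustration, not the actual test
helper: wait_for_state, the 'query' callback and the "connected" string
are assumptions standing in for the loop over 'info network'.

```python
import time


def wait_for_state(query, expected, timeout=5.0, interval=0.05):
    """Poll query() until its output contains `expected` or time runs out.

    Returns (reached, last_output). `query` is a placeholder for whatever
    fetches the current status text.
    """
    deadline = time.monotonic() + timeout
    while True:
        last = query()
        if expected in last:
            return True, last
        if time.monotonic() >= deadline:
            # The state was never reached; the caller only learns this
            # after the full timeout has elapsed.
            return False, last
        time.sleep(interval)


if __name__ == "__main__":
    # A client whose connect() was refused never changes state, so the
    # loop can only end by hitting the deadline.
    print(wait_for_state(lambda: "connection error", "connected", timeout=0.2))
```

When the polled state can never be reached, a larger timeout only
makes the failure take longer to show up; it does not change the
outcome.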

This series reverts the increased timeouts, and fixes synchronization
in the test scenarios. It also improves reporting of errors in the
socket netdev backend so that 'info network' reports what actually
went wrong rather than a useless generic 'connection error' string.
This will help us diagnose any future CI problems, should they occur.

Daniel P. Berrangé (6):
  Revert "netdev: set timeout depending on loadavg"
  Revert "osdep: add getloadavg"
  Revert "tests/qtest/netdev-socket: Raise connection timeout to 120
    seconds"
  net: add explicit info about connecting/listening state
  net: handle QIOTask completion to report useful error message
  qtest: ensure netdev-socket tests have non-overlapping names

 include/qemu/osdep.h        | 10 ---------
 meson.build                 |  1 -
 net/stream.c                | 18 +++++++++++-----
 tests/qtest/netdev-socket.c | 42 +++++++------------------------------
 4 files changed, 21 insertions(+), 50 deletions(-)

-- 
2.43.0




end of thread, other threads:[~2024-01-16  1:06 UTC | newest]

Thread overview: 13+ messages
2024-01-04 16:29 [PATCH 0/6] net: fix non-deterministic failures of the 'netdev-socket' qtest Daniel P. Berrangé
2024-01-04 16:29 ` [PATCH 1/6] Revert "netdev: set timeout depending on loadavg" Daniel P. Berrangé
2024-01-04 16:29 ` [PATCH 2/6] Revert "osdep: add getloadavg" Daniel P. Berrangé
2024-01-04 16:29 ` [PATCH 3/6] Revert "tests/qtest/netdev-socket: Raise connection timeout to 120 seconds" Daniel P. Berrangé
2024-01-04 16:29 ` [PATCH 4/6] net: add explicit info about connecting/listening state Daniel P. Berrangé
2024-01-04 16:29 ` [PATCH 5/6] net: handle QIOTask completion to report useful error message Daniel P. Berrangé
2024-01-04 16:29 ` [PATCH 6/6] qtest: ensure netdev-socket tests have non-overlapping names Daniel P. Berrangé
2024-01-04 17:47   ` Philippe Mathieu-Daudé
2024-01-04 16:45 ` [PATCH 0/6] net: fix non-deterministic failures of the 'netdev-socket' qtest Stefan Hajnoczi
2024-01-15  2:36   ` Jason Wang
2024-01-15 10:19     ` Peter Maydell
2024-01-16  1:05       ` Jason Wang
2024-01-09 13:49 ` Daniel P. Berrangé
