[PATCH 00/15] test/functional: improve functional test debugging & fix tuxrun

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 00/15] test/functional: improve functional test debugging & fix tuxrun
@ 2024-11-19 15:05 Daniel P. Berrangé
  2024-11-19 15:05 ` [PATCH 01/15] tests/functional: fix mips64el test to honour workdir Daniel P. Berrangé
                   ` (15 more replies)
  0 siblings, 16 replies; 36+ messages in thread
From: Daniel P. Berrangé @ 2024-11-19 15:05 UTC (permalink / raw)
  To: qemu-devel
  Cc: Thomas Huth, Ani Sinha, Alex Bennée,
	Philippe Mathieu-Daudé, Peter Maydell,
	Daniel P. Berrangé

This started out as a series to get rid of the many GBs of temp
files the functional tests leave behind. Then it expanded into
improving the functional test debugging by ensuring we preserve
the QEMU stdout/stderr log file created by the QEMUMachine class.
In the course of doing that I encountered some other minor points
worth fixing, and then got side tracked into looking at the tuxrun
hangs with aarch64be. Investigating the latter exposed some further
holes in the debugging story prompting yet more patches, as well as
a final solution for tuxrun. So this series does:

 * Purge all scratch files created by tests
 * Preserve the stdout/stderr log file
 * Capture debug log messages on QEMUMachine
 * Provide a QMP backdoor for debugging stuck QEMUs
 * Enhance console handling for partial line matches
 * Fix the tuxrun tests by eliminating sleeps

There's quite alot of code here, but at the same time it feels like
the kind of stuff that'll be valuable either in the 9.2 release, or
in the soon to exist 9.2 stable branch.

NB, with this series applied Thomas' tuxrun conversion to functional
testing survives 200 iterations on my machine, whereas it would
reliably hang in < 20, and often in < 10, before.

Daniel P. Berrangé (15):
  tests/functional: fix mips64el test to honour workdir
  tests/functional: automatically clean up scratch files after tests
  tests/functional: remove "AVOCADO" from env variable name
  tests/functional: remove todo wrt avocado.utils.wait_for
  tests/functional: remove leftover :avocado: tags
  tests/functional: remove obsolete reference to avocado bug
  tests/functional: remove comments talking about avocado
  tests/functional: honour self.workdir in ACPI bits tests
  tests/functional: put QEMUMachine logs in testcase log directory
  tests/functional: honour requested test VM name in QEMUMachine
  tests/functional: enable debug logging for QEMUMachine
  tests/functional: logs details of console interaction operations
  tests/functional: rewrite console handling to be bytewise
  tests/functional: remove time.sleep usage from tuxrun tests
  tests/functional: add a QMP backdoor for debugging stalled tests

 docs/devel/testing/functional.rst        | 16 ++++++
 tests/functional/qemu_test/cmd.py        | 65 ++++++++++++++++++------
 tests/functional/qemu_test/testcase.py   | 33 +++++++++---
 tests/functional/qemu_test/tuxruntest.py | 17 +++----
 tests/functional/test_acpi_bits.py       | 56 +++++++-------------
 tests/functional/test_arm_bpim2u.py      | 20 --------
 tests/functional/test_arm_orangepi.py    | 27 ----------
 tests/functional/test_m68k_nextcube.py   |  3 +-
 tests/functional/test_mips64el_malta.py  |  4 +-
 9 files changed, 119 insertions(+), 122 deletions(-)

-- 
2.46.0

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 01/15] tests/functional: fix mips64el test to honour workdir
  2024-11-19 15:05 [PATCH 00/15] test/functional: improve functional test debugging & fix tuxrun Daniel P. Berrangé
@ 2024-11-19 15:05 ` Daniel P. Berrangé
  2024-11-19 15:35   ` Alex Bennée
  2024-11-19 17:08   ` Philippe Mathieu-Daudé
  2024-11-19 15:05 ` [PATCH 02/15] tests/functional: automatically clean up scratch files after tests Daniel P. Berrangé
                   ` (14 subsequent siblings)
  15 siblings, 2 replies; 36+ messages in thread
From: Daniel P. Berrangé @ 2024-11-19 15:05 UTC (permalink / raw)
  To: qemu-devel
  Cc: Thomas Huth, Ani Sinha, Alex Bennée,
	Philippe Mathieu-Daudé, Peter Maydell,
	Daniel P. Berrangé

The missing directory separator resulted in the kernel file being
created 1 level higher than expected.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 tests/functional/test_mips64el_malta.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/functional/test_mips64el_malta.py b/tests/functional/test_mips64el_malta.py
index 6c6355b131..24ebcdb9c1 100755
--- a/tests/functional/test_mips64el_malta.py
+++ b/tests/functional/test_mips64el_malta.py
@@ -129,7 +129,7 @@ def do_test_i6400_framebuffer_logo(self, cpu_cores_count):
         screendump_path = os.path.join(self.workdir, 'screendump.pbm')
 
         kernel_path_gz = self.ASSET_KERNEL_4_7_0.fetch()
-        kernel_path = self.workdir + "vmlinux"
+        kernel_path = self.workdir + "/vmlinux"
         gzip_uncompress(kernel_path_gz, kernel_path)
 
         tuxlogo_path = self.ASSET_TUXLOGO.fetch()
-- 
2.46.0



^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 02/15] tests/functional: automatically clean up scratch files after tests
  2024-11-19 15:05 [PATCH 00/15] test/functional: improve functional test debugging & fix tuxrun Daniel P. Berrangé
  2024-11-19 15:05 ` [PATCH 01/15] tests/functional: fix mips64el test to honour workdir Daniel P. Berrangé
@ 2024-11-19 15:05 ` Daniel P. Berrangé
  2024-11-19 16:21   ` Alex Bennée
  2024-11-19 15:05 ` [PATCH 03/15] tests/functional: remove "AVOCADO" from env variable name Daniel P. Berrangé
                   ` (13 subsequent siblings)
  15 siblings, 1 reply; 36+ messages in thread
From: Daniel P. Berrangé @ 2024-11-19 15:05 UTC (permalink / raw)
  To: qemu-devel
  Cc: Thomas Huth, Ani Sinha, Alex Bennée,
	Philippe Mathieu-Daudé, Peter Maydell,
	Daniel P. Berrangé

The build/tests/functional subdirectories are consuming huge amounts
of disk space.

Split the location for scratch files into a 'scratch' sub-directory,
separate from log files, and delete it upon completion of each test.
The new env variable QEMU_TEST_KEEP_SCRATCH can be set to preserve
this scratch dir for debugging access if required.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 docs/devel/testing/functional.rst      |  6 ++++++
 tests/functional/qemu_test/testcase.py | 14 +++++++++-----
 2 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/docs/devel/testing/functional.rst b/docs/devel/testing/functional.rst
index bf6f1bb81e..6b5d0c5b98 100644
--- a/docs/devel/testing/functional.rst
+++ b/docs/devel/testing/functional.rst
@@ -65,6 +65,12 @@ to the QEMU binary that should be used for the test, for example::
   $ export QEMU_TEST_QEMU_BINARY=$PWD/qemu-system-x86_64
   $ python3 ../tests/functional/test_file.py
 
+The test framework will automatically purge any scratch files created during
+the tests. If needing to debug a failed test, it is possible to keep these
+files around on disk by setting ```QEMU_TEST_KEEP_SCRATCH=1``` as an env
+variable.  Any preserved files will be deleted the next time the test is run
+without this variable set.
+
 Overview
 --------
 
diff --git a/tests/functional/qemu_test/testcase.py b/tests/functional/qemu_test/testcase.py
index 411978b5ef..b9418e2ac0 100644
--- a/tests/functional/qemu_test/testcase.py
+++ b/tests/functional/qemu_test/testcase.py
@@ -13,8 +13,9 @@
 
 import logging
 import os
-import subprocess
 import pycotap
+import shutil
+import subprocess
 import sys
 import unittest
 import uuid
@@ -40,11 +41,12 @@ def setUp(self, bin_prefix):
         self.assertIsNotNone(self.qemu_bin, 'QEMU_TEST_QEMU_BINARY must be set')
         self.arch = self.qemu_bin.split('-')[-1]
 
-        self.workdir = os.path.join(BUILD_DIR, 'tests/functional', self.arch,
-                                    self.id())
+        self.outputdir = os.path.join(BUILD_DIR, 'tests', 'functional',
+                                      self.arch, self.id())
+        self.workdir = os.path.join(self.outputdir, 'scratch')
         os.makedirs(self.workdir, exist_ok=True)
 
-        self.logdir = self.workdir
+        self.logdir = self.outputdir
         self.log_filename = os.path.join(self.logdir, 'base.log')
         self.log = logging.getLogger('qemu-test')
         self.log.setLevel(logging.DEBUG)
@@ -56,6 +58,8 @@ def setUp(self, bin_prefix):
         self.log.addHandler(self._log_fh)
 
     def tearDown(self):
+        if "QEMU_TEST_KEEP_SCRATCH" not in os.environ:
+            shutil.rmtree(self.workdir)
         self.log.removeHandler(self._log_fh)
 
     def main():
@@ -108,7 +112,7 @@ def setUp(self):
 
         console_log = logging.getLogger('console')
         console_log.setLevel(logging.DEBUG)
-        self.console_log_name = os.path.join(self.workdir, 'console.log')
+        self.console_log_name = os.path.join(self.logdir, 'console.log')
         self._console_log_fh = logging.FileHandler(self.console_log_name,
                                                    mode='w')
         self._console_log_fh.setLevel(logging.DEBUG)
-- 
2.46.0



^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 03/15] tests/functional: remove "AVOCADO" from env variable name
  2024-11-19 15:05 [PATCH 00/15] test/functional: improve functional test debugging & fix tuxrun Daniel P. Berrangé
  2024-11-19 15:05 ` [PATCH 01/15] tests/functional: fix mips64el test to honour workdir Daniel P. Berrangé
  2024-11-19 15:05 ` [PATCH 02/15] tests/functional: automatically clean up scratch files after tests Daniel P. Berrangé
@ 2024-11-19 15:05 ` Daniel P. Berrangé
  2024-11-19 16:22   ` Alex Bennée
  2024-11-19 17:09   ` Philippe Mathieu-Daudé
  2024-11-19 15:05 ` [PATCH 04/15] tests/functional: remove todo wrt avocado.utils.wait_for Daniel P. Berrangé
                   ` (12 subsequent siblings)
  15 siblings, 2 replies; 36+ messages in thread
From: Daniel P. Berrangé @ 2024-11-19 15:05 UTC (permalink / raw)
  To: qemu-devel
  Cc: Thomas Huth, Ani Sinha, Alex Bennée,
	Philippe Mathieu-Daudé, Peter Maydell,
	Daniel P. Berrangé

This env variable is a debugging flag to save screendumps in the
mips64el malta tests.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 tests/functional/test_mips64el_malta.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/functional/test_mips64el_malta.py b/tests/functional/test_mips64el_malta.py
index 24ebcdb9c1..6d1195d362 100755
--- a/tests/functional/test_mips64el_malta.py
+++ b/tests/functional/test_mips64el_malta.py
@@ -159,7 +159,7 @@ def do_test_i6400_framebuffer_logo(self, cpu_cores_count):
         loc = np.where(result >= match_threshold)
         tuxlogo_count = 0
         h, w = tuxlogo_bgr.shape[:2]
-        debug_png = os.getenv('AVOCADO_CV2_SCREENDUMP_PNG_PATH')
+        debug_png = os.getenv('QEMU_TEST_CV2_SCREENDUMP_PNG_PATH')
         for tuxlogo_count, pt in enumerate(zip(*loc[::-1]), start=1):
             logger.debug('found Tux at position (x, y) = %s', pt)
             cv2.rectangle(screendump_bgr, pt,
-- 
2.46.0



^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 04/15] tests/functional: remove todo wrt avocado.utils.wait_for
  2024-11-19 15:05 [PATCH 00/15] test/functional: improve functional test debugging & fix tuxrun Daniel P. Berrangé
                   ` (2 preceding siblings ...)
  2024-11-19 15:05 ` [PATCH 03/15] tests/functional: remove "AVOCADO" from env variable name Daniel P. Berrangé
@ 2024-11-19 15:05 ` Daniel P. Berrangé
  2024-11-19 16:37   ` Alex Bennée
  2024-11-19 15:05 ` [PATCH 05/15] tests/functional: remove leftover :avocado: tags Daniel P. Berrangé
                   ` (11 subsequent siblings)
  15 siblings, 1 reply; 36+ messages in thread
From: Daniel P. Berrangé @ 2024-11-19 15:05 UTC (permalink / raw)
  To: qemu-devel
  Cc: Thomas Huth, Ani Sinha, Alex Bennée,
	Philippe Mathieu-Daudé, Peter Maydell,
	Daniel P. Berrangé

We're not using avocado anymore, so while the TODO item is still
relevant, suggesting use of avocado.utils is not.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 tests/functional/test_m68k_nextcube.py | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/tests/functional/test_m68k_nextcube.py b/tests/functional/test_m68k_nextcube.py
index 89385a134a..0124622c40 100755
--- a/tests/functional/test_m68k_nextcube.py
+++ b/tests/functional/test_m68k_nextcube.py
@@ -37,8 +37,7 @@ def check_bootrom_framebuffer(self, screenshot_path):
         self.vm.launch()
 
         self.log.info('VM launched, waiting for display')
-        # TODO: Use avocado.utils.wait.wait_for to catch the
-        #       'displaysurface_create 1120x832' trace-event.
+        # TODO: wait for the 'displaysurface_create 1120x832' trace-event.
         time.sleep(2)
 
         self.vm.cmd('human-monitor-command',
-- 
2.46.0



^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 05/15] tests/functional: remove leftover :avocado: tags
  2024-11-19 15:05 [PATCH 00/15] test/functional: improve functional test debugging & fix tuxrun Daniel P. Berrangé
                   ` (3 preceding siblings ...)
  2024-11-19 15:05 ` [PATCH 04/15] tests/functional: remove todo wrt avocado.utils.wait_for Daniel P. Berrangé
@ 2024-11-19 15:05 ` Daniel P. Berrangé
  2024-11-19 16:37   ` Alex Bennée
  2024-11-19 15:05 ` [PATCH 06/15] tests/functional: remove obsolete reference to avocado bug Daniel P. Berrangé
                   ` (10 subsequent siblings)
  15 siblings, 1 reply; 36+ messages in thread
From: Daniel P. Berrangé @ 2024-11-19 15:05 UTC (permalink / raw)
  To: qemu-devel
  Cc: Thomas Huth, Ani Sinha, Alex Bennée,
	Philippe Mathieu-Daudé, Peter Maydell,
	Daniel P. Berrangé

These tags are not honoured under the new functional test harness.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 tests/functional/test_arm_bpim2u.py   | 20 --------------------
 tests/functional/test_arm_orangepi.py | 27 ---------------------------
 2 files changed, 47 deletions(-)

diff --git a/tests/functional/test_arm_bpim2u.py b/tests/functional/test_arm_bpim2u.py
index 2f9fa145e3..35ea58d46c 100755
--- a/tests/functional/test_arm_bpim2u.py
+++ b/tests/functional/test_arm_bpim2u.py
@@ -37,11 +37,6 @@ class BananaPiMachine(LinuxKernelTest):
         '5b41b4e11423e562c6011640f9a7cd3bdd0a3d42b83430f7caa70a432e6cd82c')
 
     def test_arm_bpim2u(self):
-        """
-        :avocado: tags=arch:arm
-        :avocado: tags=machine:bpim2u
-        :avocado: tags=accel:tcg
-        """
         self.set_machine('bpim2u')
         deb_path = self.ASSET_DEB.fetch()
         kernel_path = self.extract_from_deb(deb_path,
@@ -64,11 +59,6 @@ def test_arm_bpim2u(self):
         os.remove(dtb_path)
 
     def test_arm_bpim2u_initrd(self):
-        """
-        :avocado: tags=arch:arm
-        :avocado: tags=accel:tcg
-        :avocado: tags=machine:bpim2u
-        """
         self.set_machine('bpim2u')
         deb_path = self.ASSET_DEB.fetch()
         kernel_path = self.extract_from_deb(deb_path,
@@ -105,11 +95,6 @@ def test_arm_bpim2u_initrd(self):
         os.remove(initrd_path)
 
     def test_arm_bpim2u_gmac(self):
-        """
-        :avocado: tags=arch:arm
-        :avocado: tags=machine:bpim2u
-        :avocado: tags=device:sd
-        """
         self.set_machine('bpim2u')
         self.require_netdev('user')
 
@@ -160,11 +145,6 @@ def test_arm_bpim2u_gmac(self):
 
     @skipUnless(os.getenv('QEMU_TEST_ALLOW_LARGE_STORAGE'), 'storage limited')
     def test_arm_bpim2u_openwrt_22_03_3(self):
-        """
-        :avocado: tags=arch:arm
-        :avocado: tags=machine:bpim2u
-        :avocado: tags=device:sd
-        """
         self.set_machine('bpim2u')
         # This test download a 8.9 MiB compressed image and expand it
         # to 127 MiB.
diff --git a/tests/functional/test_arm_orangepi.py b/tests/functional/test_arm_orangepi.py
index d2ed5fcc82..6d57223a03 100755
--- a/tests/functional/test_arm_orangepi.py
+++ b/tests/functional/test_arm_orangepi.py
@@ -49,11 +49,6 @@ class BananaPiMachine(LinuxKernelTest):
         '20d3e07dc057e15c12452620e90ecab2047f0f7940d9cba8182ebc795927177f')
 
     def test_arm_orangepi(self):
-        """
-        :avocado: tags=arch:arm
-        :avocado: tags=machine:orangepi-pc
-        :avocado: tags=accel:tcg
-        """
         self.set_machine('orangepi-pc')
         deb_path = self.ASSET_DEB.fetch()
         kernel_path = self.extract_from_deb(deb_path,
@@ -75,11 +70,6 @@ def test_arm_orangepi(self):
         os.remove(dtb_path)
 
     def test_arm_orangepi_initrd(self):
-        """
-        :avocado: tags=arch:arm
-        :avocado: tags=accel:tcg
-        :avocado: tags=machine:orangepi-pc
-        """
         self.set_machine('orangepi-pc')
         deb_path = self.ASSET_DEB.fetch()
         kernel_path = self.extract_from_deb(deb_path,
@@ -115,12 +105,6 @@ def test_arm_orangepi_initrd(self):
         os.remove(initrd_path)
 
     def test_arm_orangepi_sd(self):
-        """
-        :avocado: tags=arch:arm
-        :avocado: tags=accel:tcg
-        :avocado: tags=machine:orangepi-pc
-        :avocado: tags=device:sd
-        """
         self.set_machine('orangepi-pc')
         self.require_netdev('user')
         deb_path = self.ASSET_DEB.fetch()
@@ -167,11 +151,6 @@ def test_arm_orangepi_sd(self):
 
     @skipUnless(os.getenv('QEMU_TEST_ALLOW_LARGE_STORAGE'), 'storage limited')
     def test_arm_orangepi_armbian(self):
-        """
-        :avocado: tags=arch:arm
-        :avocado: tags=machine:orangepi-pc
-        :avocado: tags=device:sd
-        """
         self.set_machine('orangepi-pc')
         # This test download a 275 MiB compressed image and expand it
         # to 1036 MiB, but the underlying filesystem is 1552 MiB...
@@ -208,12 +187,6 @@ def test_arm_orangepi_armbian(self):
 
     @skipUnless(os.getenv('QEMU_TEST_ALLOW_LARGE_STORAGE'), 'storage limited')
     def test_arm_orangepi_uboot_netbsd9(self):
-        """
-        :avocado: tags=arch:arm
-        :avocado: tags=machine:orangepi-pc
-        :avocado: tags=device:sd
-        :avocado: tags=os:netbsd
-        """
         self.set_machine('orangepi-pc')
         # This test download a 304MB compressed image and expand it to 2GB
         deb_path = self.ASSET_UBOOT.fetch()
-- 
2.46.0



^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 06/15] tests/functional: remove obsolete reference to avocado bug
  2024-11-19 15:05 [PATCH 00/15] test/functional: improve functional test debugging & fix tuxrun Daniel P. Berrangé
                   ` (4 preceding siblings ...)
  2024-11-19 15:05 ` [PATCH 05/15] tests/functional: remove leftover :avocado: tags Daniel P. Berrangé
@ 2024-11-19 15:05 ` Daniel P. Berrangé
  2024-11-19 16:39   ` Alex Bennée
  2024-11-19 15:05 ` [PATCH 07/15] tests/functional: remove comments talking about avocado Daniel P. Berrangé
                   ` (9 subsequent siblings)
  15 siblings, 1 reply; 36+ messages in thread
From: Daniel P. Berrangé @ 2024-11-19 15:05 UTC (permalink / raw)
  To: qemu-devel
  Cc: Thomas Huth, Ani Sinha, Alex Bennée,
	Philippe Mathieu-Daudé, Peter Maydell,
	Daniel P. Berrangé

Historical bugs in avocado related to zstd support are not relevant to
the code now that it uses QEMU's native test harness.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 tests/functional/qemu_test/tuxruntest.py | 1 -
 1 file changed, 1 deletion(-)

diff --git a/tests/functional/qemu_test/tuxruntest.py b/tests/functional/qemu_test/tuxruntest.py
index f05aa96ad7..ed2b238c92 100644
--- a/tests/functional/qemu_test/tuxruntest.py
+++ b/tests/functional/qemu_test/tuxruntest.py
@@ -39,7 +39,6 @@ def setUp(self):
         super().setUp()
 
         # We need zstd for all the tuxrun tests
-        # See https://github.com/avocado-framework/avocado/issues/5609
         (has_zstd, msg) = has_cmd('zstd')
         if has_zstd is False:
             self.skipTest(msg)
-- 
2.46.0



^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 07/15] tests/functional: remove comments talking about avocado
  2024-11-19 15:05 [PATCH 00/15] test/functional: improve functional test debugging & fix tuxrun Daniel P. Berrangé
                   ` (5 preceding siblings ...)
  2024-11-19 15:05 ` [PATCH 06/15] tests/functional: remove obsolete reference to avocado bug Daniel P. Berrangé
@ 2024-11-19 15:05 ` Daniel P. Berrangé
  2024-11-19 17:32   ` Thomas Huth
  2024-11-19 15:05 ` [PATCH 08/15] tests/functional: honour self.workdir in ACPI bits tests Daniel P. Berrangé
                   ` (8 subsequent siblings)
  15 siblings, 1 reply; 36+ messages in thread
From: Daniel P. Berrangé @ 2024-11-19 15:05 UTC (permalink / raw)
  To: qemu-devel
  Cc: Thomas Huth, Ani Sinha, Alex Bennée,
	Philippe Mathieu-Daudé, Peter Maydell,
	Daniel P. Berrangé

The first comment is still relevant but should talk about our own test
harness instead. The second comment adds no value over reading the code
and can be removed.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 tests/functional/test_acpi_bits.py | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/tests/functional/test_acpi_bits.py b/tests/functional/test_acpi_bits.py
index ee40647d5b..6a7758c576 100755
--- a/tests/functional/test_acpi_bits.py
+++ b/tests/functional/test_acpi_bits.py
@@ -196,11 +196,11 @@ def copy_test_scripts(self):
         for filename in os.listdir(bits_test_dir):
             if os.path.isfile(os.path.join(bits_test_dir, filename)) and \
                filename.endswith('.py2'):
-                # all test scripts are named with extension .py2 so that
-                # avocado does not try to load them. These scripts are
-                # written for python 2.7 not python 3 and hence if avocado
-                # loaded them, it would complain about python 3 specific
-                # syntaxes.
+                # all test scripts are named with extension .py2 so that our
+                # test harness does not try to load them. These scripts are
+                # written for python 2.7 not python 3 and hence if our test
+                # harness loaded them, it would complain about python 3
+                # specific syntaxes.
                 newfilename = os.path.splitext(filename)[0] + '.py'
                 shutil.copy2(os.path.join(bits_test_dir, filename),
                              os.path.join(target_test_dir, newfilename))
@@ -399,8 +399,6 @@ def test_acpi_smbios_bits(self):
 
         # biosbits has been configured to run all the specified test suites
         # in batch mode and then automatically initiate a vm shutdown.
-        # Set timeout to BITS_TIMEOUT for SHUTDOWN event from bits VM at par
-        # with the avocado test timeout.
         self._vm.event_wait('SHUTDOWN', timeout=BITS_TIMEOUT)
         self._vm.wait(timeout=None)
         self.logger.debug("Checking console output ...")
-- 
2.46.0



^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 08/15] tests/functional: honour self.workdir in ACPI bits tests
  2024-11-19 15:05 [PATCH 00/15] test/functional: improve functional test debugging & fix tuxrun Daniel P. Berrangé
                   ` (6 preceding siblings ...)
  2024-11-19 15:05 ` [PATCH 07/15] tests/functional: remove comments talking about avocado Daniel P. Berrangé
@ 2024-11-19 15:05 ` Daniel P. Berrangé
  2024-11-19 17:09   ` Alex Bennée
  2024-11-19 15:05 ` [PATCH 09/15] tests/functional: put QEMUMachine logs in testcase log directory Daniel P. Berrangé
                   ` (7 subsequent siblings)
  15 siblings, 1 reply; 36+ messages in thread
From: Daniel P. Berrangé @ 2024-11-19 15:05 UTC (permalink / raw)
  To: qemu-devel
  Cc: Thomas Huth, Ani Sinha, Alex Bennée,
	Philippe Mathieu-Daudé, Peter Maydell,
	Daniel P. Berrangé

The ACPI bits test sets up its own private temporary directory into it
creates scratch files. This is justified by a suggestion that we need
to be able to preserve the scratch files. We have the ability to
preserve the scratch dir with our functional harness, so there's no
reason to diverge from standard practice in file placement.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 tests/functional/test_acpi_bits.py | 44 +++++++++---------------------
 1 file changed, 13 insertions(+), 31 deletions(-)

diff --git a/tests/functional/test_acpi_bits.py b/tests/functional/test_acpi_bits.py
index 6a7758c576..8774a79312 100755
--- a/tests/functional/test_acpi_bits.py
+++ b/tests/functional/test_acpi_bits.py
@@ -150,7 +150,6 @@ class AcpiBitsTest(QemuBaseTest): #pylint: disable=too-many-instance-attributes
     def __init__(self, *args, **kwargs):
         super().__init__(*args, **kwargs)
         self._vm = None
-        self._workDir = None
         self._baseDir = None
 
         self._debugcon_addr = '0x403'
@@ -169,7 +168,7 @@ def copy_bits_config(self):
         config_file = 'bits-cfg.txt'
         bits_config_dir = os.path.join(self._baseDir, 'acpi-bits',
                                        'bits-config')
-        target_config_dir = os.path.join(self._workDir,
+        target_config_dir = os.path.join(self.workdir,
                                          'bits-%d' %self.BITS_INTERNAL_VER,
                                          'boot')
         self.assertTrue(os.path.exists(bits_config_dir))
@@ -186,7 +185,7 @@ def copy_test_scripts(self):
 
         bits_test_dir = os.path.join(self._baseDir, 'acpi-bits',
                                      'bits-tests')
-        target_test_dir = os.path.join(self._workDir,
+        target_test_dir = os.path.join(self.workdir,
                                        'bits-%d' %self.BITS_INTERNAL_VER,
                                        'boot', 'python')
 
@@ -224,8 +223,8 @@ def fix_mkrescue(self, mkrescue):
             the directory where we have extracted our pre-built bits grub
             tarball.
         """
-        grub_x86_64_mods = os.path.join(self._workDir, 'grub-inst-x86_64-efi')
-        grub_i386_mods = os.path.join(self._workDir, 'grub-inst')
+        grub_x86_64_mods = os.path.join(self.workdir, 'grub-inst-x86_64-efi')
+        grub_i386_mods = os.path.join(self.workdir, 'grub-inst')
 
         self.assertTrue(os.path.exists(grub_x86_64_mods))
         self.assertTrue(os.path.exists(grub_i386_mods))
@@ -246,11 +245,11 @@ def generate_bits_iso(self):
         """ Uses grub-mkrescue to generate a fresh bits iso with the python
             test scripts
         """
-        bits_dir = os.path.join(self._workDir,
+        bits_dir = os.path.join(self.workdir,
                                 'bits-%d' %self.BITS_INTERNAL_VER)
-        iso_file = os.path.join(self._workDir,
+        iso_file = os.path.join(self.workdir,
                                 'bits-%d.iso' %self.BITS_INTERNAL_VER)
-        mkrescue_script = os.path.join(self._workDir,
+        mkrescue_script = os.path.join(self.workdir,
                                        'grub-inst-x86_64-efi', 'bin',
                                        'grub-mkrescue')
 
@@ -289,17 +288,7 @@ def setUp(self): # pylint: disable=arguments-differ
 
         self._baseDir = Path(__file__).parent
 
-        # workdir could also be avocado's own workdir in self.workdir.
-        # At present, I prefer to maintain my own temporary working
-        # directory. It gives us more control over the generated bits
-        # log files and also for debugging, we may chose not to remove
-        # this working directory so that the logs and iso can be
-        # inspected manually and archived if needed.
-        self._workDir = tempfile.mkdtemp(prefix='acpi-bits-',
-                                         suffix='.tmp')
-        self.logger.info('working dir: %s', self._workDir)
-
-        prebuiltDir = os.path.join(self._workDir, 'prebuilt')
+        prebuiltDir = os.path.join(self.workdir, 'prebuilt')
         if not os.path.isdir(prebuiltDir):
             os.mkdir(prebuiltDir, mode=0o775)
 
@@ -320,10 +309,10 @@ def setUp(self): # pylint: disable=arguments-differ
 
         # extract the bits software in the temp working directory
         with zipfile.ZipFile(bits_zip_file, 'r') as zref:
-            zref.extractall(self._workDir)
+            zref.extractall(self.workdir)
 
         with tarfile.open(grub_tar_file, 'r', encoding='utf-8') as tarball:
-            tarball.extractall(self._workDir)
+            tarball.extractall(self.workdir)
 
         self.copy_test_scripts()
         self.copy_bits_config()
@@ -333,7 +322,7 @@ def parse_log(self):
         """parse the log generated by running bits tests and
            check for failures.
         """
-        debugconf = os.path.join(self._workDir, self._debugcon_log)
+        debugconf = os.path.join(self.workdir, self._debugcon_log)
         log = ""
         with open(debugconf, 'r', encoding='utf-8') as filehandle:
             log = filehandle.read()
@@ -359,25 +348,18 @@ def tearDown(self):
         """
         if self._vm:
             self.assertFalse(not self._vm.is_running)
-        if not os.getenv('BITS_DEBUG') and self._workDir:
-            self.logger.info('removing the work directory %s', self._workDir)
-            shutil.rmtree(self._workDir)
-        else:
-            self.logger.info('not removing the work directory %s ' \
-                             'as BITS_DEBUG is ' \
-                             'passed in the environment', self._workDir)
         super().tearDown()
 
     def test_acpi_smbios_bits(self):
         """The main test case implementation."""
 
-        iso_file = os.path.join(self._workDir,
+        iso_file = os.path.join(self.workdir,
                                 'bits-%d.iso' %self.BITS_INTERNAL_VER)
 
         self.assertTrue(os.access(iso_file, os.R_OK))
 
         self._vm = QEMUBitsMachine(binary=self.qemu_bin,
-                                   base_temp_dir=self._workDir,
+                                   base_temp_dir=self.workdir,
                                    debugcon_log=self._debugcon_log,
                                    debugcon_addr=self._debugcon_addr)
 
-- 
2.46.0



^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 09/15] tests/functional: put QEMUMachine logs in testcase log directory
  2024-11-19 15:05 [PATCH 00/15] test/functional: improve functional test debugging & fix tuxrun Daniel P. Berrangé
                   ` (7 preceding siblings ...)
  2024-11-19 15:05 ` [PATCH 08/15] tests/functional: honour self.workdir in ACPI bits tests Daniel P. Berrangé
@ 2024-11-19 15:05 ` Daniel P. Berrangé
  2024-11-19 17:10   ` Alex Bennée
  2024-11-19 15:05 ` [PATCH 10/15] tests/functional: honour requested test VM name in QEMUMachine Daniel P. Berrangé
                   ` (6 subsequent siblings)
  15 siblings, 1 reply; 36+ messages in thread
From: Daniel P. Berrangé @ 2024-11-19 15:05 UTC (permalink / raw)
  To: qemu-devel
  Cc: Thomas Huth, Ani Sinha, Alex Bennée,
	Philippe Mathieu-Daudé, Peter Maydell,
	Daniel P. Berrangé

We are not passing the 'log_dir' parameter to QEMUMachine, so the
QEMU stdout/err logs are being placed in a temp directory and thus
deleted after execution. This makes them inaccessible as gitlab
CI artifacts.

Pass the testcase log directory path into QEMUMachine to make the
logs persistent.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 tests/functional/qemu_test/testcase.py | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/tests/functional/qemu_test/testcase.py b/tests/functional/qemu_test/testcase.py
index b9418e2ac0..ca13af244b 100644
--- a/tests/functional/qemu_test/testcase.py
+++ b/tests/functional/qemu_test/testcase.py
@@ -163,10 +163,11 @@ def require_device(self, devicename):
             self.skipTest('no support for device ' + devicename)
 
     def _new_vm(self, name, *args):
-        vm = QEMUMachine(self.qemu_bin, base_temp_dir=self.workdir)
+        vm = QEMUMachine(self.qemu_bin,
+                         base_temp_dir=self.workdir,
+                         log_dir=self.logdir)
         self.log.debug('QEMUMachine "%s" created', name)
         self.log.debug('QEMUMachine "%s" temp_dir: %s', name, vm.temp_dir)
-        self.log.debug('QEMUMachine "%s" log_dir: %s', name, vm.log_dir)
         if args:
             vm.add_args(*args)
         return vm
-- 
2.46.0



^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 10/15] tests/functional: honour requested test VM name in QEMUMachine
  2024-11-19 15:05 [PATCH 00/15] test/functional: improve functional test debugging & fix tuxrun Daniel P. Berrangé
                   ` (8 preceding siblings ...)
  2024-11-19 15:05 ` [PATCH 09/15] tests/functional: put QEMUMachine logs in testcase log directory Daniel P. Berrangé
@ 2024-11-19 15:05 ` Daniel P. Berrangé
  2024-11-19 17:40   ` Thomas Huth
  2024-11-19 15:05 ` [PATCH 11/15] tests/functional: enable debug logging for QEMUMachine Daniel P. Berrangé
                   ` (5 subsequent siblings)
  15 siblings, 1 reply; 36+ messages in thread
From: Daniel P. Berrangé @ 2024-11-19 15:05 UTC (permalink / raw)
  To: qemu-devel
  Cc: Thomas Huth, Ani Sinha, Alex Bennée,
	Philippe Mathieu-Daudé, Peter Maydell,
	Daniel P. Berrangé

The functional test case class is going to the trouble of passing
around a machine name, but then fails to give this QEMUMachine. As
a result, QEMUMachine will create a completely random name. Since
log file names match the machine name, this results in log files
accumulating over time.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 tests/functional/qemu_test/testcase.py | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/functional/qemu_test/testcase.py b/tests/functional/qemu_test/testcase.py
index ca13af244b..f9c9de1166 100644
--- a/tests/functional/qemu_test/testcase.py
+++ b/tests/functional/qemu_test/testcase.py
@@ -164,6 +164,7 @@ def require_device(self, devicename):
 
     def _new_vm(self, name, *args):
         vm = QEMUMachine(self.qemu_bin,
+                         name=name,
                          base_temp_dir=self.workdir,
                          log_dir=self.logdir)
         self.log.debug('QEMUMachine "%s" created', name)
-- 
2.46.0



^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 11/15] tests/functional: enable debug logging for QEMUMachine
  2024-11-19 15:05 [PATCH 00/15] test/functional: improve functional test debugging & fix tuxrun Daniel P. Berrangé
                   ` (9 preceding siblings ...)
  2024-11-19 15:05 ` [PATCH 10/15] tests/functional: honour requested test VM name in QEMUMachine Daniel P. Berrangé
@ 2024-11-19 15:05 ` Daniel P. Berrangé
  2024-11-21  6:52   ` Thomas Huth
  2024-11-19 15:05 ` [PATCH 12/15] tests/functional: logs details of console interaction operations Daniel P. Berrangé
                   ` (4 subsequent siblings)
  15 siblings, 1 reply; 36+ messages in thread
From: Daniel P. Berrangé @ 2024-11-19 15:05 UTC (permalink / raw)
  To: qemu-devel
  Cc: Thomas Huth, Ani Sinha, Alex Bennée,
	Philippe Mathieu-Daudé, Peter Maydell,
	Daniel P. Berrangé

Set the 'qemu.machine' logger to 'DEBUG' level, to ensure we see log
messages related to the QEMUMachine class. Most importantly this
ensures we capture the full QEMU command line args for instances we
spawn.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 tests/functional/qemu_test/testcase.py | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/tests/functional/qemu_test/testcase.py b/tests/functional/qemu_test/testcase.py
index f9c9de1166..e2a329c3e5 100644
--- a/tests/functional/qemu_test/testcase.py
+++ b/tests/functional/qemu_test/testcase.py
@@ -57,9 +57,15 @@ def setUp(self, bin_prefix):
         self._log_fh.setFormatter(fileFormatter)
         self.log.addHandler(self._log_fh)
 
+        # Capture QEMUMachine logging
+        self.machinelog = logging.getLogger('qemu.machine')
+        self.machinelog.setLevel(logging.DEBUG)
+        self.machinelog.addHandler(self._log_fh)
+
     def tearDown(self):
         if "QEMU_TEST_KEEP_SCRATCH" not in os.environ:
             shutil.rmtree(self.workdir)
+        self.machinelog.removeHandler(self._log_fh)
         self.log.removeHandler(self._log_fh)
 
     def main():
-- 
2.46.0



^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 12/15] tests/functional: logs details of console interaction operations
  2024-11-19 15:05 [PATCH 00/15] test/functional: improve functional test debugging & fix tuxrun Daniel P. Berrangé
                   ` (10 preceding siblings ...)
  2024-11-19 15:05 ` [PATCH 11/15] tests/functional: enable debug logging for QEMUMachine Daniel P. Berrangé
@ 2024-11-19 15:05 ` Daniel P. Berrangé
  2024-11-21  6:58   ` Thomas Huth
  2024-11-19 15:05 ` [PATCH 13/15] tests/functional: rewrite console handling to be bytewise Daniel P. Berrangé
                   ` (3 subsequent siblings)
  15 siblings, 1 reply; 36+ messages in thread
From: Daniel P. Berrangé @ 2024-11-19 15:05 UTC (permalink / raw)
  To: qemu-devel
  Cc: Thomas Huth, Ani Sinha, Alex Bennée,
	Philippe Mathieu-Daudé, Peter Maydell,
	Daniel P. Berrangé

When functional tests go wrong, it will often be related to the console
interaction wait state. By logging the messages that we're looking for,
and data we're about to be sending, it'll be easier to diagnose where
tests are getting stuck.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 tests/functional/qemu_test/cmd.py | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tests/functional/qemu_test/cmd.py b/tests/functional/qemu_test/cmd.py
index cbabb1ceed..76a48064cd 100644
--- a/tests/functional/qemu_test/cmd.py
+++ b/tests/functional/qemu_test/cmd.py
@@ -85,6 +85,8 @@ def _console_interaction(test, success_message, failure_message,
         vm = test.vm
     console = vm.console_file
     console_logger = logging.getLogger('console')
+    test.log.debug(f"Console interaction success:'{success_message}' " +
+                   f"failure:'{failure_message}' send:'{send_string}'")
     while True:
         if send_string:
             vm.console_socket.sendall(send_string.encode())
-- 
2.46.0



^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 13/15] tests/functional: rewrite console handling to be bytewise
  2024-11-19 15:05 [PATCH 00/15] test/functional: improve functional test debugging & fix tuxrun Daniel P. Berrangé
                   ` (11 preceding siblings ...)
  2024-11-19 15:05 ` [PATCH 12/15] tests/functional: logs details of console interaction operations Daniel P. Berrangé
@ 2024-11-19 15:05 ` Daniel P. Berrangé
  2024-11-19 17:11   ` Paolo Bonzini
  2024-11-19 15:05 ` [PATCH 14/15] tests/functional: remove time.sleep usage from tuxrun tests Daniel P. Berrangé
                   ` (2 subsequent siblings)
  15 siblings, 1 reply; 36+ messages in thread
From: Daniel P. Berrangé @ 2024-11-19 15:05 UTC (permalink / raw)
  To: qemu-devel
  Cc: Thomas Huth, Ani Sinha, Alex Bennée,
	Philippe Mathieu-Daudé, Peter Maydell,
	Daniel P. Berrangé, Cédric Le Goater

The console interaction that waits for predicted strings uses
readline(), and thus is only capable of waiting for strings
that are followed by a newline.

This is inconvenient when needing to match on some things,
particularly login prompts, or shell prompts, causing tests
to use time.sleep(...) instead, which is unreliable.

Switch to reading the console 1 byte at a time, comparing
against the success/failure messages until we see a match,
regardless of whether a newline is encountered.

The success/failure comparisons are done with the python bytes
type, rather than strings, to avoid the problem of needing to
decode partially received multibyte utf8 characters.

Heavily inspired by a patch proposed by Cédric, but written
again to work in bytes, rather than strings.

Co-developed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 tests/functional/qemu_test/cmd.py | 63 +++++++++++++++++++++++--------
 1 file changed, 48 insertions(+), 15 deletions(-)

diff --git a/tests/functional/qemu_test/cmd.py b/tests/functional/qemu_test/cmd.py
index 76a48064cd..91267a087f 100644
--- a/tests/functional/qemu_test/cmd.py
+++ b/tests/functional/qemu_test/cmd.py
@@ -78,15 +78,58 @@ def run_cmd(args):
 def is_readable_executable_file(path):
     return os.path.isfile(path) and os.access(path, os.R_OK | os.X_OK)
 
+def _console_readline(test, vm, success, failure):
+    msg = bytes([])
+    done = False
+    while True:
+        c = vm.console_socket.recv(1)
+        if c is None:
+            done = True
+            test.fail(
+                f"EOF in console, expected '{success}'")
+            break
+        msg += c
+
+        if success is None or success in msg:
+            done = True
+            break
+        if failure and failure in msg:
+            done = True
+            vm.console_socket.close()
+            test.fail(
+                f"'{failure}' found in console, expected '{success}'")
+
+        if c == b'\n':
+            break
+
+    console_logger = logging.getLogger('console')
+    try:
+        console_logger.debug(msg.decode().strip())
+    except:
+        console_logger.debug(msg)
+
+    return done
+
 def _console_interaction(test, success_message, failure_message,
                          send_string, keep_sending=False, vm=None):
     assert not keep_sending or send_string
     if vm is None:
         vm = test.vm
-    console = vm.console_file
-    console_logger = logging.getLogger('console')
+
     test.log.debug(f"Console interaction success:'{success_message}' " +
                    f"failure:'{failure_message}' send:'{send_string}'")
+
+    # We'll process console in bytes, to avoid having to
+    # deal with unicode decode errors from receiving
+    # partial utf8 byte sequences
+    success_message_b = None
+    if success_message is not None:
+        success_message_b = success_message.encode()
+
+    failure_message_b = None
+    if failure_message is not None:
+        failure_message_b = failure_message.encode()
+
     while True:
         if send_string:
             vm.console_socket.sendall(send_string.encode())
@@ -99,20 +142,10 @@ def _console_interaction(test, success_message, failure_message,
                 break
             continue
 
-        try:
-            msg = console.readline().decode().strip()
-        except UnicodeDecodeError:
-            msg = None
-        if not msg:
-            continue
-        console_logger.debug(msg)
-        if success_message is None or success_message in msg:
+        if _console_readline(test, vm,
+                             success_message_b,
+                             failure_message_b):
             break
-        if failure_message and failure_message in msg:
-            console.close()
-            fail = 'Failure message found in console: "%s". Expected: "%s"' % \
-                    (failure_message, success_message)
-            test.fail(fail)
 
 def interrupt_interactive_console_until_pattern(test, success_message,
                                                 failure_message=None,
-- 
2.46.0



^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 14/15] tests/functional: remove time.sleep usage from tuxrun tests
  2024-11-19 15:05 [PATCH 00/15] test/functional: improve functional test debugging & fix tuxrun Daniel P. Berrangé
                   ` (12 preceding siblings ...)
  2024-11-19 15:05 ` [PATCH 13/15] tests/functional: rewrite console handling to be bytewise Daniel P. Berrangé
@ 2024-11-19 15:05 ` Daniel P. Berrangé
  2024-11-19 15:05 ` [PATCH 15/15] tests/functional: add a QMP backdoor for debugging stalled tests Daniel P. Berrangé
  2024-11-19 17:59 ` [PATCH 00/15] test/functional: improve functional test debugging & fix tuxrun Cédric Le Goater
  15 siblings, 0 replies; 36+ messages in thread
From: Daniel P. Berrangé @ 2024-11-19 15:05 UTC (permalink / raw)
  To: qemu-devel
  Cc: Thomas Huth, Ani Sinha, Alex Bennée,
	Philippe Mathieu-Daudé, Peter Maydell,
	Daniel P. Berrangé

The tuxrun tests send a series of strings to the guest to login
and then run commands. Since we have been unable to match on
console output that isn't followed by a newline, the test used
many time.sleep() statements to pretend to synchronize with
the guest.

This has proved to be unreliable for the aarch64be instance of
the tuxrun tests, with the test often hanging. The hang is a
very subtle timing problem, and it is suspected that some
(otherwise apparently harmless) I/O error messages could be
resulting in full FIFO buffers, stalling interaction with
the guest.

With the newly rewritten console interaction able to match
strings that don't have a following newline, the tux run
tests can now match directly on the login prompt, and/or
shell PS1 prompt.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2689
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 tests/functional/qemu_test/tuxruntest.py | 16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/tests/functional/qemu_test/tuxruntest.py b/tests/functional/qemu_test/tuxruntest.py
index ed2b238c92..ab3b27da43 100644
--- a/tests/functional/qemu_test/tuxruntest.py
+++ b/tests/functional/qemu_test/tuxruntest.py
@@ -124,16 +124,12 @@ def run_tuxtest_tests(self, haltmsg):
         then do a few things on the console. Trigger a shutdown and
         wait to exit cleanly.
         """
-        self.wait_for_console_pattern("Welcome to TuxTest")
-        time.sleep(0.2)
-        exec_command(self, 'root')
-        time.sleep(0.2)
-        exec_command(self, 'cat /proc/interrupts')
-        time.sleep(0.1)
-        exec_command(self, 'cat /proc/self/maps')
-        time.sleep(0.1)
-        exec_command(self, 'uname -a')
-        time.sleep(0.1)
+        ps1='root@tuxtest:~#'
+        self.wait_for_console_pattern('tuxtest login:')
+        exec_command_and_wait_for_pattern(self, 'root', ps1)
+        exec_command_and_wait_for_pattern(self, 'cat /proc/interrupts', ps1)
+        exec_command_and_wait_for_pattern(self, 'cat /proc/self/maps', ps1)
+        exec_command_and_wait_for_pattern(self, 'uname -a', ps1)
         exec_command_and_wait_for_pattern(self, 'halt', haltmsg)
 
         # Wait for VM to shut down gracefully if it can
-- 
2.46.0



^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 15/15] tests/functional: add a QMP backdoor for debugging stalled tests
  2024-11-19 15:05 [PATCH 00/15] test/functional: improve functional test debugging & fix tuxrun Daniel P. Berrangé
                   ` (13 preceding siblings ...)
  2024-11-19 15:05 ` [PATCH 14/15] tests/functional: remove time.sleep usage from tuxrun tests Daniel P. Berrangé
@ 2024-11-19 15:05 ` Daniel P. Berrangé
  2024-11-21  7:01   ` Thomas Huth
  2024-11-19 17:59 ` [PATCH 00/15] test/functional: improve functional test debugging & fix tuxrun Cédric Le Goater
  15 siblings, 1 reply; 36+ messages in thread
From: Daniel P. Berrangé @ 2024-11-19 15:05 UTC (permalink / raw)
  To: qemu-devel
  Cc: Thomas Huth, Ani Sinha, Alex Bennée,
	Philippe Mathieu-Daudé, Peter Maydell,
	Daniel P. Berrangé

Support the QEMU_TEST_QMP_BACKDOOR=backdoor.sock env variable as a
way to get a QMP backdoor for debugging a stalled QEMU test. Most
typically this would be used if running the tests directly:

 $ QEMU_TEST_QMP_BACKDOOR=backdoor.sock \
   QEMU_TEST_QEMU_BINARY=./build/qemu-system-arm \
   PYTHONPATH=./python \
   ./tests/functional/test_arm_tuxrun.py

And then, when the test stalls, in a second shell run:

 $ ./scripts/qmp/qmp-shell backdoor.sock

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 docs/devel/testing/functional.rst      | 10 ++++++++++
 tests/functional/qemu_test/testcase.py |  7 +++++++
 2 files changed, 17 insertions(+)

diff --git a/docs/devel/testing/functional.rst b/docs/devel/testing/functional.rst
index 6b5d0c5b98..b8ad7b0bf7 100644
--- a/docs/devel/testing/functional.rst
+++ b/docs/devel/testing/functional.rst
@@ -176,6 +176,16 @@ primarily depend on the value of the ``qemu_bin`` class attribute.
 If it is not explicitly set by the test code, its default value will
 be the result the QEMU_TEST_QEMU_BINARY environment variable.
 
+Debugging hung QEMU
+^^^^^^^^^^^^^^^^^^^
+
+When test cases go wrong it may be helpful to debug a stalled QEMU
+process. While the QEMUMachine class owns the primary QMP monitor
+socket, it is possible to request a second QMP monitor be created
+by setting the ``QEMU_TEST_QMP_BACKDOOR`` env variable to refer
+to a UNIX socket name. The ``qmp-shell`` command can then be
+attached to the stalled QEMU to examine its live state.
+
 Attribute reference
 -------------------
 
diff --git a/tests/functional/qemu_test/testcase.py b/tests/functional/qemu_test/testcase.py
index e2a329c3e5..eb889a5bae 100644
--- a/tests/functional/qemu_test/testcase.py
+++ b/tests/functional/qemu_test/testcase.py
@@ -175,6 +175,13 @@ def _new_vm(self, name, *args):
                          log_dir=self.logdir)
         self.log.debug('QEMUMachine "%s" created', name)
         self.log.debug('QEMUMachine "%s" temp_dir: %s', name, vm.temp_dir)
+
+        if "QEMU_TEST_QMP_BACKDOOR" in os.environ:
+            path = os.environ["QEMU_TEST_QMP_BACKDOOR"]
+            vm.add_args("-chardev",
+                        f"socket,id=backdoor,path={path},server=on,wait=off",
+                        "-mon", "chardev=backdoor,mode=control")
+
         if args:
             vm.add_args(*args)
         return vm
-- 
2.46.0



^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH 01/15] tests/functional: fix mips64el test to honour workdir
  2024-11-19 15:05 ` [PATCH 01/15] tests/functional: fix mips64el test to honour workdir Daniel P. Berrangé
@ 2024-11-19 15:35   ` Alex Bennée
  2024-11-19 17:08   ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 36+ messages in thread
From: Alex Bennée @ 2024-11-19 15:35 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, Thomas Huth, Ani Sinha, Philippe Mathieu-Daudé,
	Peter Maydell

Daniel P. Berrangé <berrange@redhat.com> writes:

> The missing directory separator resulted in the kernel file being
> created 1 level higher than expected.
>
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 02/15] tests/functional: automatically clean up scratch files after tests
  2024-11-19 15:05 ` [PATCH 02/15] tests/functional: automatically clean up scratch files after tests Daniel P. Berrangé
@ 2024-11-19 16:21   ` Alex Bennée
  2024-11-19 17:28     ` Thomas Huth
  0 siblings, 1 reply; 36+ messages in thread
From: Alex Bennée @ 2024-11-19 16:21 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, Thomas Huth, Ani Sinha, Philippe Mathieu-Daudé,
	Peter Maydell

Daniel P. Berrangé <berrange@redhat.com> writes:

> The build/tests/functional subdirectories are consuming huge amounts
> of disk space.
>
> Split the location for scratch files into a 'scratch' sub-directory,
> separate from log files, and delete it upon completion of each test.
> The new env variable QEMU_TEST_KEEP_SCRATCH can be set to preserve
> this scratch dir for debugging access if required.
>
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
>  docs/devel/testing/functional.rst      |  6 ++++++
>  tests/functional/qemu_test/testcase.py | 14 +++++++++-----
>  2 files changed, 15 insertions(+), 5 deletions(-)
>
> diff --git a/docs/devel/testing/functional.rst b/docs/devel/testing/functional.rst
> index bf6f1bb81e..6b5d0c5b98 100644
> --- a/docs/devel/testing/functional.rst
> +++ b/docs/devel/testing/functional.rst
> @@ -65,6 +65,12 @@ to the QEMU binary that should be used for the test, for example::
>    $ export QEMU_TEST_QEMU_BINARY=$PWD/qemu-system-x86_64
>    $ python3 ../tests/functional/test_file.py
>  
> +The test framework will automatically purge any scratch files created during
> +the tests. If needing to debug a failed test, it is possible to keep these
> +files around on disk by setting ```QEMU_TEST_KEEP_SCRATCH=1``` as an env
> +variable.  Any preserved files will be deleted the next time the test is run
> +without this variable set.
> +
>  Overview
>  --------
>  
> diff --git a/tests/functional/qemu_test/testcase.py b/tests/functional/qemu_test/testcase.py
> index 411978b5ef..b9418e2ac0 100644
> --- a/tests/functional/qemu_test/testcase.py
> +++ b/tests/functional/qemu_test/testcase.py
> @@ -13,8 +13,9 @@
>  
>  import logging
>  import os
> -import subprocess
>  import pycotap
> +import shutil
> +import subprocess
>  import sys
>  import unittest
>  import uuid
> @@ -40,11 +41,12 @@ def setUp(self, bin_prefix):
>          self.assertIsNotNone(self.qemu_bin, 'QEMU_TEST_QEMU_BINARY must be set')
>          self.arch = self.qemu_bin.split('-')[-1]
>  
> -        self.workdir = os.path.join(BUILD_DIR, 'tests/functional', self.arch,
> -                                    self.id())
> +        self.outputdir = os.path.join(BUILD_DIR, 'tests', 'functional',
> +                                      self.arch, self.id())

I don't think you need save self.outputdir as only self.logdir is used
by the other functions.

> +        self.workdir = os.path.join(self.outputdir, 'scratch')
>          os.makedirs(self.workdir, exist_ok=True)
>  
> -        self.logdir = self.workdir
> +        self.logdir = self.outputdir
>          self.log_filename = os.path.join(self.logdir, 'base.log')
>          self.log = logging.getLogger('qemu-test')
>          self.log.setLevel(logging.DEBUG)
> @@ -56,6 +58,8 @@ def setUp(self, bin_prefix):
>          self.log.addHandler(self._log_fh)
>  
>      def tearDown(self):
> +        if "QEMU_TEST_KEEP_SCRATCH" not in os.environ:
> +            shutil.rmtree(self.workdir)
>          self.log.removeHandler(self._log_fh)
>  
>      def main():
> @@ -108,7 +112,7 @@ def setUp(self):
>  
>          console_log = logging.getLogger('console')
>          console_log.setLevel(logging.DEBUG)
> -        self.console_log_name = os.path.join(self.workdir, 'console.log')
> +        self.console_log_name = os.path.join(self.logdir, 'console.log')
>          self._console_log_fh = logging.FileHandler(self.console_log_name,
>                                                     mode='w')
>          self._console_log_fh.setLevel(logging.DEBUG)

Otherwise:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>


-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 03/15] tests/functional: remove "AVOCADO" from env variable name
  2024-11-19 15:05 ` [PATCH 03/15] tests/functional: remove "AVOCADO" from env variable name Daniel P. Berrangé
@ 2024-11-19 16:22   ` Alex Bennée
  2024-11-19 17:09   ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 36+ messages in thread
From: Alex Bennée @ 2024-11-19 16:22 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, Thomas Huth, Ani Sinha, Philippe Mathieu-Daudé,
	Peter Maydell

Daniel P. Berrangé <berrange@redhat.com> writes:

> This env variable is a debugging flag to save screendumps in the
> mips64el malta tests.
>
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 04/15] tests/functional: remove todo wrt avocado.utils.wait_for
  2024-11-19 15:05 ` [PATCH 04/15] tests/functional: remove todo wrt avocado.utils.wait_for Daniel P. Berrangé
@ 2024-11-19 16:37   ` Alex Bennée
  0 siblings, 0 replies; 36+ messages in thread
From: Alex Bennée @ 2024-11-19 16:37 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, Thomas Huth, Ani Sinha, Philippe Mathieu-Daudé,
	Peter Maydell

Daniel P. Berrangé <berrange@redhat.com> writes:

> We're not using avocado anymore, so while the TODO item is still
> relevant, suggesting use of avocado.utils is not.
>
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 05/15] tests/functional: remove leftover :avocado: tags
  2024-11-19 15:05 ` [PATCH 05/15] tests/functional: remove leftover :avocado: tags Daniel P. Berrangé
@ 2024-11-19 16:37   ` Alex Bennée
  0 siblings, 0 replies; 36+ messages in thread
From: Alex Bennée @ 2024-11-19 16:37 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, Thomas Huth, Ani Sinha, Philippe Mathieu-Daudé,
	Peter Maydell

Daniel P. Berrangé <berrange@redhat.com> writes:

> These tags are not honoured under the new functional test harness.
>
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 06/15] tests/functional: remove obsolete reference to avocado bug
  2024-11-19 15:05 ` [PATCH 06/15] tests/functional: remove obsolete reference to avocado bug Daniel P. Berrangé
@ 2024-11-19 16:39   ` Alex Bennée
  0 siblings, 0 replies; 36+ messages in thread
From: Alex Bennée @ 2024-11-19 16:39 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, Thomas Huth, Ani Sinha, Philippe Mathieu-Daudé,
	Peter Maydell

Daniel P. Berrangé <berrange@redhat.com> writes:

> Historical bugs in avocado related to zstd support are not relevant to
> the code now that it uses QEMU's native test harness.
>
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 01/15] tests/functional: fix mips64el test to honour workdir
  2024-11-19 15:05 ` [PATCH 01/15] tests/functional: fix mips64el test to honour workdir Daniel P. Berrangé
  2024-11-19 15:35   ` Alex Bennée
@ 2024-11-19 17:08   ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 36+ messages in thread
From: Philippe Mathieu-Daudé @ 2024-11-19 17:08 UTC (permalink / raw)
  To: Daniel P. Berrangé, qemu-devel
  Cc: Thomas Huth, Ani Sinha, Alex Bennée, Peter Maydell

On 19/11/24 16:05, Daniel P. Berrangé wrote:
> The missing directory separator resulted in the kernel file being
> created 1 level higher than expected.
> 
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
>   tests/functional/test_mips64el_malta.py | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tests/functional/test_mips64el_malta.py b/tests/functional/test_mips64el_malta.py
> index 6c6355b131..24ebcdb9c1 100755
> --- a/tests/functional/test_mips64el_malta.py
> +++ b/tests/functional/test_mips64el_malta.py
> @@ -129,7 +129,7 @@ def do_test_i6400_framebuffer_logo(self, cpu_cores_count):
>           screendump_path = os.path.join(self.workdir, 'screendump.pbm')
>   
>           kernel_path_gz = self.ASSET_KERNEL_4_7_0.fetch()
> -        kernel_path = self.workdir + "vmlinux"
> +        kernel_path = self.workdir + "/vmlinux"

Safer is to use os.path.join(), anyway:

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>

>           gzip_uncompress(kernel_path_gz, kernel_path)
>   
>           tuxlogo_path = self.ASSET_TUXLOGO.fetch()



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 03/15] tests/functional: remove "AVOCADO" from env variable name
  2024-11-19 15:05 ` [PATCH 03/15] tests/functional: remove "AVOCADO" from env variable name Daniel P. Berrangé
  2024-11-19 16:22   ` Alex Bennée
@ 2024-11-19 17:09   ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 36+ messages in thread
From: Philippe Mathieu-Daudé @ 2024-11-19 17:09 UTC (permalink / raw)
  To: Daniel P. Berrangé, qemu-devel
  Cc: Thomas Huth, Ani Sinha, Alex Bennée, Peter Maydell

On 19/11/24 16:05, Daniel P. Berrangé wrote:
> This env variable is a debugging flag to save screendumps in the
> mips64el malta tests.
> 
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
>   tests/functional/test_mips64el_malta.py | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>




^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 08/15] tests/functional: honour self.workdir in ACPI bits tests
  2024-11-19 15:05 ` [PATCH 08/15] tests/functional: honour self.workdir in ACPI bits tests Daniel P. Berrangé
@ 2024-11-19 17:09   ` Alex Bennée
  0 siblings, 0 replies; 36+ messages in thread
From: Alex Bennée @ 2024-11-19 17:09 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, Thomas Huth, Ani Sinha, Philippe Mathieu-Daudé,
	Peter Maydell

Daniel P. Berrangé <berrange@redhat.com> writes:

> The ACPI bits test sets up its own private temporary directory into it
> creates scratch files. This is justified by a suggestion that we need
> to be able to preserve the scratch files. We have the ability to
> preserve the scratch dir with our functional harness, so there's no
> reason to diverge from standard practice in file placement.
>
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 09/15] tests/functional: put QEMUMachine logs in testcase log directory
  2024-11-19 15:05 ` [PATCH 09/15] tests/functional: put QEMUMachine logs in testcase log directory Daniel P. Berrangé
@ 2024-11-19 17:10   ` Alex Bennée
  0 siblings, 0 replies; 36+ messages in thread
From: Alex Bennée @ 2024-11-19 17:10 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, Thomas Huth, Ani Sinha, Philippe Mathieu-Daudé,
	Peter Maydell

Daniel P. Berrangé <berrange@redhat.com> writes:

> We are not passing the 'log_dir' parameter to QEMUMachine, so the
> QEMU stdout/err logs are being placed in a temp directory and thus
> deleted after execution. This makes them inaccessible as gitlab
> CI artifacts.
>
> Pass the testcase log directory path into QEMUMachine to make the
> logs persistent.
>
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 13/15] tests/functional: rewrite console handling to be bytewise
  2024-11-19 15:05 ` [PATCH 13/15] tests/functional: rewrite console handling to be bytewise Daniel P. Berrangé
@ 2024-11-19 17:11   ` Paolo Bonzini
  2024-11-19 18:54     ` Daniel P. Berrangé
  0 siblings, 1 reply; 36+ messages in thread
From: Paolo Bonzini @ 2024-11-19 17:11 UTC (permalink / raw)
  To: Daniel P. Berrangé, qemu-devel
  Cc: Thomas Huth, Ani Sinha, Alex Bennée,
	Philippe Mathieu-Daudé, Peter Maydell, Cédric Le Goater

On 11/19/24 16:05, Daniel P. Berrangé wrote:
> The console interaction that waits for predicted strings uses
> readline(), and thus is only capable of waiting for strings
> that are followed by a newline.
> 
> This is inconvenient when needing to match on some things,
> particularly login prompts, or shell prompts, causing tests
> to use time.sleep(...) instead, which is unreliable.
> 
> Switch to reading the console 1 byte at a time, comparing
> against the success/failure messages until we see a match,
> regardless of whether a newline is encountered.
> 
> The success/failure comparisons are done with the python bytes
> type, rather than strings, to avoid the problem of needing to
> decode partially received multibyte utf8 characters.
> 
> Heavily inspired by a patch proposed by Cédric, but written
> again to work in bytes, rather than strings.
> 
> Co-developed-by: Cédric Le Goater <clg@redhat.com>
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
>   tests/functional/qemu_test/cmd.py | 63 +++++++++++++++++++++++--------
>   1 file changed, 48 insertions(+), 15 deletions(-)
> 
> diff --git a/tests/functional/qemu_test/cmd.py b/tests/functional/qemu_test/cmd.py
> index 76a48064cd..91267a087f 100644
> --- a/tests/functional/qemu_test/cmd.py
> +++ b/tests/functional/qemu_test/cmd.py
> @@ -78,15 +78,58 @@ def run_cmd(args):
>   def is_readable_executable_file(path):
>       return os.path.isfile(path) and os.access(path, os.R_OK | os.X_OK)
>   
> +def _console_readline(test, vm, success, failure):
> +    msg = bytes([])
> +    done = False
> +    while True:
> +        c = vm.console_socket.recv(1)
> +        if c is None:
> +            done = True
> +            test.fail(
> +                f"EOF in console, expected '{success}'")
> +            break
> +        msg += c
> +
> +        if success is None or success in msg:

As an optimization, you could use msg.endswith(success) and 
msg.endswith(failure), which would avoid the most blatant cases of 
O(n^2) behavior.

More important, I think "if success is None" should not be here, because 
it will exit after one char.  Instead...

> +            done = True
> +            break
> +        if failure and failure in msg:
> +            done = True
> +            vm.console_socket.close()
> +            test.fail(
> +                f"'{failure}' found in console, expected '{success}'")
> +
> +        if c == b'\n':

Here you can put

                done = success is None

Paolo

> +            break
> +
> +    console_logger = logging.getLogger('console')
> +    try:
> +        console_logger.debug(msg.decode().strip())
> +    except:
> +        console_logger.debug(msg)
> +
> +    return done
> +
>   def _console_interaction(test, success_message, failure_message,
>                            send_string, keep_sending=False, vm=None):
>       assert not keep_sending or send_string
>       if vm is None:
>           vm = test.vm
> -    console = vm.console_file
> -    console_logger = logging.getLogger('console')
> +
>       test.log.debug(f"Console interaction success:'{success_message}' " +
>                      f"failure:'{failure_message}' send:'{send_string}'")
> +
> +    # We'll process console in bytes, to avoid having to
> +    # deal with unicode decode errors from receiving
> +    # partial utf8 byte sequences
> +    success_message_b = None
> +    if success_message is not None:
> +        success_message_b = success_message.encode()
> +
> +    failure_message_b = None
> +    if failure_message is not None:
> +        failure_message_b = failure_message.encode()
> +
>       while True:
>           if send_string:
>               vm.console_socket.sendall(send_string.encode())
> @@ -99,20 +142,10 @@ def _console_interaction(test, success_message, failure_message,
>                   break
>               continue
>   
> -        try:
> -            msg = console.readline().decode().strip()
> -        except UnicodeDecodeError:
> -            msg = None
> -        if not msg:
> -            continue
> -        console_logger.debug(msg)
> -        if success_message is None or success_message in msg:
> +        if _console_readline(test, vm,
> +                             success_message_b,
> +                             failure_message_b):
>               break
> -        if failure_message and failure_message in msg:
> -            console.close()
> -            fail = 'Failure message found in console: "%s". Expected: "%s"' % \
> -                    (failure_message, success_message)
> -            test.fail(fail)
>   
>   def interrupt_interactive_console_until_pattern(test, success_message,
>                                                   failure_message=None,



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 02/15] tests/functional: automatically clean up scratch files after tests
  2024-11-19 16:21   ` Alex Bennée
@ 2024-11-19 17:28     ` Thomas Huth
  0 siblings, 0 replies; 36+ messages in thread
From: Thomas Huth @ 2024-11-19 17:28 UTC (permalink / raw)
  To: Alex Bennée, Daniel P. Berrangé
  Cc: qemu-devel, Ani Sinha, Philippe Mathieu-Daudé, Peter Maydell

On 19/11/2024 17.21, Alex Bennée wrote:
> Daniel P. Berrangé <berrange@redhat.com> writes:
> 
>> The build/tests/functional subdirectories are consuming huge amounts
>> of disk space.
>>
>> Split the location for scratch files into a 'scratch' sub-directory,
>> separate from log files, and delete it upon completion of each test.
>> The new env variable QEMU_TEST_KEEP_SCRATCH can be set to preserve
>> this scratch dir for debugging access if required.
>>
>> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
>> ---
>>   docs/devel/testing/functional.rst      |  6 ++++++
>>   tests/functional/qemu_test/testcase.py | 14 +++++++++-----
>>   2 files changed, 15 insertions(+), 5 deletions(-)
>>
>> diff --git a/docs/devel/testing/functional.rst b/docs/devel/testing/functional.rst
>> index bf6f1bb81e..6b5d0c5b98 100644
>> --- a/docs/devel/testing/functional.rst
>> +++ b/docs/devel/testing/functional.rst
>> @@ -65,6 +65,12 @@ to the QEMU binary that should be used for the test, for example::
>>     $ export QEMU_TEST_QEMU_BINARY=$PWD/qemu-system-x86_64
>>     $ python3 ../tests/functional/test_file.py
>>   
>> +The test framework will automatically purge any scratch files created during
>> +the tests. If needing to debug a failed test, it is possible to keep these
>> +files around on disk by setting ```QEMU_TEST_KEEP_SCRATCH=1``` as an env
>> +variable.  Any preserved files will be deleted the next time the test is run
>> +without this variable set.
>> +
>>   Overview
>>   --------
>>   
>> diff --git a/tests/functional/qemu_test/testcase.py b/tests/functional/qemu_test/testcase.py
>> index 411978b5ef..b9418e2ac0 100644
>> --- a/tests/functional/qemu_test/testcase.py
>> +++ b/tests/functional/qemu_test/testcase.py
>> @@ -13,8 +13,9 @@
>>   
>>   import logging
>>   import os
>> -import subprocess
>>   import pycotap
>> +import shutil
>> +import subprocess
>>   import sys
>>   import unittest
>>   import uuid
>> @@ -40,11 +41,12 @@ def setUp(self, bin_prefix):
>>           self.assertIsNotNone(self.qemu_bin, 'QEMU_TEST_QEMU_BINARY must be set')
>>           self.arch = self.qemu_bin.split('-')[-1]
>>   
>> -        self.workdir = os.path.join(BUILD_DIR, 'tests/functional', self.arch,
>> -                                    self.id())
>> +        self.outputdir = os.path.join(BUILD_DIR, 'tests', 'functional',
>> +                                      self.arch, self.id())
> 
> I don't think you need save self.outputdir as only self.logdir is used
> by the other functions.

I guess it might be useful to have this variable as a reference point in 
future patches, so I'm in favor of keeping it.

Reviewed-by: Thomas Huth <thuth@redhat.com>



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 07/15] tests/functional: remove comments talking about avocado
  2024-11-19 15:05 ` [PATCH 07/15] tests/functional: remove comments talking about avocado Daniel P. Berrangé
@ 2024-11-19 17:32   ` Thomas Huth
  0 siblings, 0 replies; 36+ messages in thread
From: Thomas Huth @ 2024-11-19 17:32 UTC (permalink / raw)
  To: Daniel P. Berrangé, qemu-devel
  Cc: Ani Sinha, Alex Bennée, Philippe Mathieu-Daudé,
	Peter Maydell

On 19/11/2024 16.05, Daniel P. Berrangé wrote:
> The first comment is still relevant but should talk about our own test
> harness instead. The second comment adds no value over reading the code
> and can be removed.
> 
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
>   tests/functional/test_acpi_bits.py | 12 +++++-------
>   1 file changed, 5 insertions(+), 7 deletions(-)
> 
> diff --git a/tests/functional/test_acpi_bits.py b/tests/functional/test_acpi_bits.py
> index ee40647d5b..6a7758c576 100755
> --- a/tests/functional/test_acpi_bits.py
> +++ b/tests/functional/test_acpi_bits.py
> @@ -196,11 +196,11 @@ def copy_test_scripts(self):
>           for filename in os.listdir(bits_test_dir):
>               if os.path.isfile(os.path.join(bits_test_dir, filename)) and \
>                  filename.endswith('.py2'):
> -                # all test scripts are named with extension .py2 so that
> -                # avocado does not try to load them. These scripts are
> -                # written for python 2.7 not python 3 and hence if avocado
> -                # loaded them, it would complain about python 3 specific
> -                # syntaxes.
> +                # all test scripts are named with extension .py2 so that our
> +                # test harness does not try to load them. These scripts are

Since our test framework does not try to scan for *.py files automatically 
(but the files have to be added via meson.build instead), I'd maybe rather say:

"all test scripts are named with extension .py2 so that they are not run by 
accident"

or something similar?

> +                # written for python 2.7 not python 3 and hence if our test
> +                # harness loaded them, it would complain about python 3
> +                # specific syntaxes.
>                   newfilename = os.path.splitext(filename)[0] + '.py'
>                   shutil.copy2(os.path.join(bits_test_dir, filename),
>                                os.path.join(target_test_dir, newfilename))
> @@ -399,8 +399,6 @@ def test_acpi_smbios_bits(self):
>   
>           # biosbits has been configured to run all the specified test suites
>           # in batch mode and then automatically initiate a vm shutdown.
> -        # Set timeout to BITS_TIMEOUT for SHUTDOWN event from bits VM at par
> -        # with the avocado test timeout.
>           self._vm.event_wait('SHUTDOWN', timeout=BITS_TIMEOUT)
>           self._vm.wait(timeout=None)
>           self.logger.debug("Checking console output ...")



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 10/15] tests/functional: honour requested test VM name in QEMUMachine
  2024-11-19 15:05 ` [PATCH 10/15] tests/functional: honour requested test VM name in QEMUMachine Daniel P. Berrangé
@ 2024-11-19 17:40   ` Thomas Huth
  0 siblings, 0 replies; 36+ messages in thread
From: Thomas Huth @ 2024-11-19 17:40 UTC (permalink / raw)
  To: Daniel P. Berrangé, qemu-devel
  Cc: Ani Sinha, Alex Bennée, Philippe Mathieu-Daudé,
	Peter Maydell

On 19/11/2024 16.05, Daniel P. Berrangé wrote:
> The functional test case class is going to the trouble of passing
> around a machine name, but then fails to give this QEMUMachine. As
> a result, QEMUMachine will create a completely random name. Since
> log file names match the machine name, this results in log files
> accumulating over time.
> 
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
>   tests/functional/qemu_test/testcase.py | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/tests/functional/qemu_test/testcase.py b/tests/functional/qemu_test/testcase.py
> index ca13af244b..f9c9de1166 100644
> --- a/tests/functional/qemu_test/testcase.py
> +++ b/tests/functional/qemu_test/testcase.py
> @@ -164,6 +164,7 @@ def require_device(self, devicename):
>   
>       def _new_vm(self, name, *args):
>           vm = QEMUMachine(self.qemu_bin,
> +                         name=name,
>                            base_temp_dir=self.workdir,
>                            log_dir=self.logdir)
>           self.log.debug('QEMUMachine "%s" created', name)

Reviewed-by: Thomas Huth <thuth@redhat.com>



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 00/15] test/functional: improve functional test debugging & fix tuxrun
  2024-11-19 15:05 [PATCH 00/15] test/functional: improve functional test debugging & fix tuxrun Daniel P. Berrangé
                   ` (14 preceding siblings ...)
  2024-11-19 15:05 ` [PATCH 15/15] tests/functional: add a QMP backdoor for debugging stalled tests Daniel P. Berrangé
@ 2024-11-19 17:59 ` Cédric Le Goater
  15 siblings, 0 replies; 36+ messages in thread
From: Cédric Le Goater @ 2024-11-19 17:59 UTC (permalink / raw)
  To: Daniel P. Berrangé, qemu-devel
  Cc: Thomas Huth, Ani Sinha, Alex Bennée,
	Philippe Mathieu-Daudé, Peter Maydell

On 11/19/24 16:05, Daniel P. Berrangé wrote:
> This started out as a series to get rid of the many GBs of temp
> files the functional tests leave behind. Then it expanded into
> improving the functional test debugging by ensuring we preserve
> the QEMU stdout/stderr log file created by the QEMUMachine class.
> In the course of doing that I encountered some other minor points
> worth fixing, and then got side tracked into looking at the tuxrun
> hangs with aarch64be. Investigating the latter exposed some further
> holes in the debugging story prompting yet more patches, as well as
> a final solution for tuxrun. So this series does:
> 
>   * Purge all scratch files created by tests
>   * Preserve the stdout/stderr log file
>   * Capture debug log messages on QEMUMachine
>   * Provide a QMP backdoor for debugging stuck QEMUs
>   * Enhance console handling for partial line matches
>   * Fix the tuxrun tests by eliminating sleeps
> 
> There's quite alot of code here, but at the same time it feels like
> the kind of stuff that'll be valuable either in the 9.2 release, or
> in the soon to exist 9.2 stable branch.

I gave the series a try on a RHEL9 host. I opened one new issue [1]
which seems related to QEMU modelling and not the test.

On top of that, I included the aspeed conversion patches from :

   https://lore.kernel.org/all/20241112130246.970281-1-clg@redhat.com/

Tested-by: Cédric Le Goater <clg@redhat.com>

Thanks,

C.

[1] https://gitlab.com/qemu-project/qemu/-/issues/2691


> NB, with this series applied Thomas' tuxrun conversion to functional
> testing survives 200 iterations on my machine, whereas it would
> reliably hang in < 20, and often in < 10, before.
> 
> Daniel P. Berrangé (15):
>    tests/functional: fix mips64el test to honour workdir
>    tests/functional: automatically clean up scratch files after tests
>    tests/functional: remove "AVOCADO" from env variable name
>    tests/functional: remove todo wrt avocado.utils.wait_for
>    tests/functional: remove leftover :avocado: tags
>    tests/functional: remove obsolete reference to avocado bug
>    tests/functional: remove comments talking about avocado
>    tests/functional: honour self.workdir in ACPI bits tests
>    tests/functional: put QEMUMachine logs in testcase log directory
>    tests/functional: honour requested test VM name in QEMUMachine
>    tests/functional: enable debug logging for QEMUMachine
>    tests/functional: logs details of console interaction operations
>    tests/functional: rewrite console handling to be bytewise
>    tests/functional: remove time.sleep usage from tuxrun tests
>    tests/functional: add a QMP backdoor for debugging stalled tests
> 
>   docs/devel/testing/functional.rst        | 16 ++++++
>   tests/functional/qemu_test/cmd.py        | 65 ++++++++++++++++++------
>   tests/functional/qemu_test/testcase.py   | 33 +++++++++---
>   tests/functional/qemu_test/tuxruntest.py | 17 +++----
>   tests/functional/test_acpi_bits.py       | 56 +++++++-------------
>   tests/functional/test_arm_bpim2u.py      | 20 --------
>   tests/functional/test_arm_orangepi.py    | 27 ----------
>   tests/functional/test_m68k_nextcube.py   |  3 +-
>   tests/functional/test_mips64el_malta.py  |  4 +-
>   9 files changed, 119 insertions(+), 122 deletions(-)
> 



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 13/15] tests/functional: rewrite console handling to be bytewise
  2024-11-19 17:11   ` Paolo Bonzini
@ 2024-11-19 18:54     ` Daniel P. Berrangé
  2024-11-19 19:26       ` Paolo Bonzini
  0 siblings, 1 reply; 36+ messages in thread
From: Daniel P. Berrangé @ 2024-11-19 18:54 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: qemu-devel, Thomas Huth, Ani Sinha, Alex Bennée,
	Philippe Mathieu-Daudé, Peter Maydell, Cédric Le Goater

On Tue, Nov 19, 2024 at 06:11:10PM +0100, Paolo Bonzini wrote:
> On 11/19/24 16:05, Daniel P. Berrangé wrote:
> > The console interaction that waits for predicted strings uses
> > readline(), and thus is only capable of waiting for strings
> > that are followed by a newline.
> > 
> > This is inconvenient when needing to match on some things,
> > particularly login prompts, or shell prompts, causing tests
> > to use time.sleep(...) instead, which is unreliable.
> > 
> > Switch to reading the console 1 byte at a time, comparing
> > against the success/failure messages until we see a match,
> > regardless of whether a newline is encountered.
> > 
> > The success/failure comparisons are done with the python bytes
> > type, rather than strings, to avoid the problem of needing to
> > decode partially received multibyte utf8 characters.
> > 
> > Heavily inspired by a patch proposed by Cédric, but written
> > again to work in bytes, rather than strings.
> > 
> > Co-developed-by: Cédric Le Goater <clg@redhat.com>
> > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> > ---
> >   tests/functional/qemu_test/cmd.py | 63 +++++++++++++++++++++++--------
> >   1 file changed, 48 insertions(+), 15 deletions(-)
> > 
> > diff --git a/tests/functional/qemu_test/cmd.py b/tests/functional/qemu_test/cmd.py
> > index 76a48064cd..91267a087f 100644
> > --- a/tests/functional/qemu_test/cmd.py
> > +++ b/tests/functional/qemu_test/cmd.py
> > @@ -78,15 +78,58 @@ def run_cmd(args):
> >   def is_readable_executable_file(path):
> >       return os.path.isfile(path) and os.access(path, os.R_OK | os.X_OK)
> > +def _console_readline(test, vm, success, failure):
> > +    msg = bytes([])
> > +    done = False
> > +    while True:
> > +        c = vm.console_socket.recv(1)
> > +        if c is None:
> > +            done = True
> > +            test.fail(
> > +                f"EOF in console, expected '{success}'")
> > +            break
> > +        msg += c
> > +
> > +        if success is None or success in msg:
> 
> As an optimization, you could use msg.endswith(success) and
> msg.endswith(failure), which would avoid the most blatant cases of O(n^2)
> behavior.
> 
> More important, I think "if success is None" should not be here, because it
> will exit after one char.  Instead...
> 
> > +            done = True
> > +            break
> > +        if failure and failure in msg:
> > +            done = True
> > +            vm.console_socket.close()
> > +            test.fail(
> > +                f"'{failure}' found in console, expected '{success}'")
> > +
> > +        if c == b'\n':
> 
> Here you can put
> 
>                done = success is None

Hmmm, this can only be a problem if "success" is None, and
"failure" is not None, and although the old code would
technically work in that case, I think it is actually an
unknown/invalid usage scenario.

If BOTH "success" and "failure" are None, this method won't
be called at all. It is valid for "failure" to be none, but
I don't think it makes semantic sense for "success" to also
be None, while have "failure" be non-None.

So I'm inclined to say we declare 'success' to be mandatory
and validate that in the caller. eg

 assert send_string is not None or success_message is not None


and then remove this "success is None" check from
_console_readline.

> 
> Paolo
> 
> > +            break
> > +
> > +    console_logger = logging.getLogger('console')
> > +    try:
> > +        console_logger.debug(msg.decode().strip())
> > +    except:
> > +        console_logger.debug(msg)
> > +
> > +    return done
> > +
> >   def _console_interaction(test, success_message, failure_message,
> >                            send_string, keep_sending=False, vm=None):
> >       assert not keep_sending or send_string
> >       if vm is None:
> >           vm = test.vm
> > -    console = vm.console_file
> > -    console_logger = logging.getLogger('console')
> > +
> >       test.log.debug(f"Console interaction success:'{success_message}' " +
> >                      f"failure:'{failure_message}' send:'{send_string}'")
> > +
> > +    # We'll process console in bytes, to avoid having to
> > +    # deal with unicode decode errors from receiving
> > +    # partial utf8 byte sequences
> > +    success_message_b = None
> > +    if success_message is not None:
> > +        success_message_b = success_message.encode()
> > +
> > +    failure_message_b = None
> > +    if failure_message is not None:
> > +        failure_message_b = failure_message.encode()
> > +
> >       while True:
> >           if send_string:
> >               vm.console_socket.sendall(send_string.encode())
> > @@ -99,20 +142,10 @@ def _console_interaction(test, success_message, failure_message,
> >                   break
> >               continue
> > -        try:
> > -            msg = console.readline().decode().strip()
> > -        except UnicodeDecodeError:
> > -            msg = None
> > -        if not msg:
> > -            continue
> > -        console_logger.debug(msg)
> > -        if success_message is None or success_message in msg:
> > +        if _console_readline(test, vm,
> > +                             success_message_b,
> > +                             failure_message_b):
> >               break
> > -        if failure_message and failure_message in msg:
> > -            console.close()
> > -            fail = 'Failure message found in console: "%s". Expected: "%s"' % \
> > -                    (failure_message, success_message)
> > -            test.fail(fail)
> >   def interrupt_interactive_console_until_pattern(test, success_message,
> >                                                   failure_message=None,
> 

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 13/15] tests/functional: rewrite console handling to be bytewise
  2024-11-19 18:54     ` Daniel P. Berrangé
@ 2024-11-19 19:26       ` Paolo Bonzini
  0 siblings, 0 replies; 36+ messages in thread
From: Paolo Bonzini @ 2024-11-19 19:26 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, Thomas Huth, Ani Sinha, Alex Bennée,
	Philippe Mathieu-Daudé, Peter Maydell, Cédric Le Goater

On 11/19/24 19:54, Daniel P. Berrangé wrote:
>>> +        if success is None or success in msg:
>>
>> As an optimization, you could use msg.endswith(success) and
>> msg.endswith(failure), which would avoid the most blatant cases of O(n^2)
>> behavior.
>>
>> More important, I think "if success is None" should not be here, because it
>> will exit after one char.  Instead...
>>
>>> +            done = True
>>> +            break
>>> +        if failure and failure in msg:
>>> +            done = True
>>> +            vm.console_socket.close()
>>> +            test.fail(
>>> +                f"'{failure}' found in console, expected '{success}'")
>>> +
>>> +        if c == b'\n':
>>
>> Here you can put
>>
>>                 done = success is None
> 
> Hmmm, this can only be a problem if "success" is None, and
> "failure" is not None, and although the old code would
> technically work in that case, I think it is actually an
> unknown/invalid usage scenario.
> 
> If BOTH "success" and "failure" are None, this method won't
> be called at all. It is valid for "failure" to be none, but
> I don't think it makes semantic sense for "success" to also
> be None, while have "failure" be non-None.

"Read a line and check that it doesn't contain a substring" seemed like 
a plausible thing to test for, but you're right it doesn't make much 
sense in this context.  It would make more sense if _console_readline 
returned the full line, at which point it would be a completely 
different function and probably not underscore-prefixed.

So yeah, this is okay:

>   assert send_string is not None or success_message is not None

whether done in the caller or in _console_readline itself, as you prefer.

Paolo



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 11/15] tests/functional: enable debug logging for QEMUMachine
  2024-11-19 15:05 ` [PATCH 11/15] tests/functional: enable debug logging for QEMUMachine Daniel P. Berrangé
@ 2024-11-21  6:52   ` Thomas Huth
  0 siblings, 0 replies; 36+ messages in thread
From: Thomas Huth @ 2024-11-21  6:52 UTC (permalink / raw)
  To: Daniel P. Berrangé, qemu-devel
  Cc: Ani Sinha, Alex Bennée, Philippe Mathieu-Daudé,
	Peter Maydell

On 19/11/2024 16.05, Daniel P. Berrangé wrote:
> Set the 'qemu.machine' logger to 'DEBUG' level, to ensure we see log
> messages related to the QEMUMachine class. Most importantly this
> ensures we capture the full QEMU command line args for instances we
> spawn.
> 
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
>   tests/functional/qemu_test/testcase.py | 6 ++++++
>   1 file changed, 6 insertions(+)
> 
> diff --git a/tests/functional/qemu_test/testcase.py b/tests/functional/qemu_test/testcase.py
> index f9c9de1166..e2a329c3e5 100644
> --- a/tests/functional/qemu_test/testcase.py
> +++ b/tests/functional/qemu_test/testcase.py
> @@ -57,9 +57,15 @@ def setUp(self, bin_prefix):
>           self._log_fh.setFormatter(fileFormatter)
>           self.log.addHandler(self._log_fh)
>   
> +        # Capture QEMUMachine logging
> +        self.machinelog = logging.getLogger('qemu.machine')
> +        self.machinelog.setLevel(logging.DEBUG)
> +        self.machinelog.addHandler(self._log_fh)
> +
>       def tearDown(self):
>           if "QEMU_TEST_KEEP_SCRATCH" not in os.environ:
>               shutil.rmtree(self.workdir)
> +        self.machinelog.removeHandler(self._log_fh)
>           self.log.removeHandler(self._log_fh)
>   
>       def main():

That looks better in the logs, indeed!

Tested-by: Thomas Huth <thuth@redhat.com>



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 12/15] tests/functional: logs details of console interaction operations
  2024-11-19 15:05 ` [PATCH 12/15] tests/functional: logs details of console interaction operations Daniel P. Berrangé
@ 2024-11-21  6:58   ` Thomas Huth
  0 siblings, 0 replies; 36+ messages in thread
From: Thomas Huth @ 2024-11-21  6:58 UTC (permalink / raw)
  To: Daniel P. Berrangé, qemu-devel
  Cc: Ani Sinha, Alex Bennée, Philippe Mathieu-Daudé,
	Peter Maydell

On 19/11/2024 16.05, Daniel P. Berrangé wrote:
> When functional tests go wrong, it will often be related to the console
> interaction wait state. By logging the messages that we're looking for,
> and data we're about to be sending, it'll be easier to diagnose where
> tests are getting stuck.
> 
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
>   tests/functional/qemu_test/cmd.py | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/tests/functional/qemu_test/cmd.py b/tests/functional/qemu_test/cmd.py
> index cbabb1ceed..76a48064cd 100644
> --- a/tests/functional/qemu_test/cmd.py
> +++ b/tests/functional/qemu_test/cmd.py
> @@ -85,6 +85,8 @@ def _console_interaction(test, success_message, failure_message,
>           vm = test.vm
>       console = vm.console_file
>       console_logger = logging.getLogger('console')
> +    test.log.debug(f"Console interaction success:'{success_message}' " +

I have to admit that I had to read the message twice to really understand 
it. Since the message starts with "Console interaction success", I thought 
that this message means that we successfully matched a string already.

So maybe rather change the text to read:

  Console interaction: success_msg=... failure_msg=...

?

  Thomas



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 15/15] tests/functional: add a QMP backdoor for debugging stalled tests
  2024-11-19 15:05 ` [PATCH 15/15] tests/functional: add a QMP backdoor for debugging stalled tests Daniel P. Berrangé
@ 2024-11-21  7:01   ` Thomas Huth
  0 siblings, 0 replies; 36+ messages in thread
From: Thomas Huth @ 2024-11-21  7:01 UTC (permalink / raw)
  To: Daniel P. Berrangé, qemu-devel
  Cc: Ani Sinha, Alex Bennée, Philippe Mathieu-Daudé,
	Peter Maydell

On 19/11/2024 16.05, Daniel P. Berrangé wrote:
> Support the QEMU_TEST_QMP_BACKDOOR=backdoor.sock env variable as a
> way to get a QMP backdoor for debugging a stalled QEMU test. Most
> typically this would be used if running the tests directly:
> 
>   $ QEMU_TEST_QMP_BACKDOOR=backdoor.sock \
>     QEMU_TEST_QEMU_BINARY=./build/qemu-system-arm \
>     PYTHONPATH=./python \
>     ./tests/functional/test_arm_tuxrun.py
> 
> And then, when the test stalls, in a second shell run:
> 
>   $ ./scripts/qmp/qmp-shell backdoor.sock
> 
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
>   docs/devel/testing/functional.rst      | 10 ++++++++++
>   tests/functional/qemu_test/testcase.py |  7 +++++++
>   2 files changed, 17 insertions(+)
> 
> diff --git a/docs/devel/testing/functional.rst b/docs/devel/testing/functional.rst
> index 6b5d0c5b98..b8ad7b0bf7 100644
> --- a/docs/devel/testing/functional.rst
> +++ b/docs/devel/testing/functional.rst
> @@ -176,6 +176,16 @@ primarily depend on the value of the ``qemu_bin`` class attribute.
>   If it is not explicitly set by the test code, its default value will
>   be the result the QEMU_TEST_QEMU_BINARY environment variable.
>   
> +Debugging hung QEMU
> +^^^^^^^^^^^^^^^^^^^
> +
> +When test cases go wrong it may be helpful to debug a stalled QEMU
> +process. While the QEMUMachine class owns the primary QMP monitor
> +socket, it is possible to request a second QMP monitor be created
> +by setting the ``QEMU_TEST_QMP_BACKDOOR`` env variable to refer
> +to a UNIX socket name. The ``qmp-shell`` command can then be
> +attached to the stalled QEMU to examine its live state.
> +
>   Attribute reference
>   -------------------
>   
> diff --git a/tests/functional/qemu_test/testcase.py b/tests/functional/qemu_test/testcase.py
> index e2a329c3e5..eb889a5bae 100644
> --- a/tests/functional/qemu_test/testcase.py
> +++ b/tests/functional/qemu_test/testcase.py
> @@ -175,6 +175,13 @@ def _new_vm(self, name, *args):
>                            log_dir=self.logdir)
>           self.log.debug('QEMUMachine "%s" created', name)
>           self.log.debug('QEMUMachine "%s" temp_dir: %s', name, vm.temp_dir)
> +
> +        if "QEMU_TEST_QMP_BACKDOOR" in os.environ:
> +            path = os.environ["QEMU_TEST_QMP_BACKDOOR"]

Maybe slightly nicer to chech os.environ only once (and specify the 
environment variable name only once):

         path = os.environ.get('QEMU_TEST_QMP_BACKDOOR')
         if path:
             ...

?

  Thomas

> +            vm.add_args("-chardev",
> +                        f"socket,id=backdoor,path={path},server=on,wait=off",
> +                        "-mon", "chardev=backdoor,mode=control")
> +
>           if args:
>               vm.add_args(*args)
>           return vm



^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2024-11-21  7:02 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-11-19 15:05 [PATCH 00/15] test/functional: improve functional test debugging & fix tuxrun Daniel P. Berrangé
2024-11-19 15:05 ` [PATCH 01/15] tests/functional: fix mips64el test to honour workdir Daniel P. Berrangé
2024-11-19 15:35   ` Alex Bennée
2024-11-19 17:08   ` Philippe Mathieu-Daudé
2024-11-19 15:05 ` [PATCH 02/15] tests/functional: automatically clean up scratch files after tests Daniel P. Berrangé
2024-11-19 16:21   ` Alex Bennée
2024-11-19 17:28     ` Thomas Huth
2024-11-19 15:05 ` [PATCH 03/15] tests/functional: remove "AVOCADO" from env variable name Daniel P. Berrangé
2024-11-19 16:22   ` Alex Bennée
2024-11-19 17:09   ` Philippe Mathieu-Daudé
2024-11-19 15:05 ` [PATCH 04/15] tests/functional: remove todo wrt avocado.utils.wait_for Daniel P. Berrangé
2024-11-19 16:37   ` Alex Bennée
2024-11-19 15:05 ` [PATCH 05/15] tests/functional: remove leftover :avocado: tags Daniel P. Berrangé
2024-11-19 16:37   ` Alex Bennée
2024-11-19 15:05 ` [PATCH 06/15] tests/functional: remove obsolete reference to avocado bug Daniel P. Berrangé
2024-11-19 16:39   ` Alex Bennée
2024-11-19 15:05 ` [PATCH 07/15] tests/functional: remove comments talking about avocado Daniel P. Berrangé
2024-11-19 17:32   ` Thomas Huth
2024-11-19 15:05 ` [PATCH 08/15] tests/functional: honour self.workdir in ACPI bits tests Daniel P. Berrangé
2024-11-19 17:09   ` Alex Bennée
2024-11-19 15:05 ` [PATCH 09/15] tests/functional: put QEMUMachine logs in testcase log directory Daniel P. Berrangé
2024-11-19 17:10   ` Alex Bennée
2024-11-19 15:05 ` [PATCH 10/15] tests/functional: honour requested test VM name in QEMUMachine Daniel P. Berrangé
2024-11-19 17:40   ` Thomas Huth
2024-11-19 15:05 ` [PATCH 11/15] tests/functional: enable debug logging for QEMUMachine Daniel P. Berrangé
2024-11-21  6:52   ` Thomas Huth
2024-11-19 15:05 ` [PATCH 12/15] tests/functional: logs details of console interaction operations Daniel P. Berrangé
2024-11-21  6:58   ` Thomas Huth
2024-11-19 15:05 ` [PATCH 13/15] tests/functional: rewrite console handling to be bytewise Daniel P. Berrangé
2024-11-19 17:11   ` Paolo Bonzini
2024-11-19 18:54     ` Daniel P. Berrangé
2024-11-19 19:26       ` Paolo Bonzini
2024-11-19 15:05 ` [PATCH 14/15] tests/functional: remove time.sleep usage from tuxrun tests Daniel P. Berrangé
2024-11-19 15:05 ` [PATCH 15/15] tests/functional: add a QMP backdoor for debugging stalled tests Daniel P. Berrangé
2024-11-21  7:01   ` Thomas Huth
2024-11-19 17:59 ` [PATCH 00/15] test/functional: improve functional test debugging & fix tuxrun Cédric Le Goater

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).