qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] tests/avocado: Fix console data loss
@ 2023-09-12 13:13 Nicholas Piggin
  2023-09-12 15:34 ` Daniel P. Berrangé
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Nicholas Piggin @ 2023-09-12 13:13 UTC (permalink / raw)
  To: John Snow, Cleber Rosa
  Cc: Nicholas Piggin, Beraldo Leal, Philippe Mathieu-Daudé,
	Wainer dos Santos Moschetta, qemu-devel, qemu-ppc

Occasionally some avocado tests will fail waiting for console line
despite the machine running correctly. Console data goes missing, as can
be seen in the console log. This is due to _console_interaction calling
makefile() on the console socket each time it is invoked, which must be
losing old buffer contents when going out of scope.

It is not enough to makefile() with buffered=0. That helps significantly
but data loss is still possible. My guess is that readline() has a line
buffer even when the file is in unbuffered mode, that can eat data.

Fix this by providing a console file that persists for the life of the
console.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---

For some reason, ppc_prep_40p.py:IbmPrep40pMachine.test_openbios_192m
was flakey for me due to this bug. I don't know why that in particular,
3 calls to wait_for_console_pattern probably helps.

I didn't pinpoint when the bug was introduced because the original
was probably not buggy because it was only run once at the end of the
test. At some point after it was moved to common code, something would
have started to call it more than once which is where potential for bug
is introduced.

Thanks,
Nick

 python/qemu/machine/machine.py         | 19 +++++++++++++++++++
 tests/avocado/avocado_qemu/__init__.py |  2 +-
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/python/qemu/machine/machine.py b/python/qemu/machine/machine.py
index c16a0b6fed..35d5a672db 100644
--- a/python/qemu/machine/machine.py
+++ b/python/qemu/machine/machine.py
@@ -191,6 +191,7 @@ def __init__(self,
             self.sock_dir, f"{self._name}.con"
         )
         self._console_socket: Optional[socket.socket] = None
+        self._console_file: Optional[socket.SocketIO] = None
         self._remove_files: List[str] = []
         self._user_killed = False
         self._quit_issued = False
@@ -509,6 +510,11 @@ def _early_cleanup(self) -> None:
         # If we keep the console socket open, we may deadlock waiting
         # for QEMU to exit, while QEMU is waiting for the socket to
         # become writable.
+        if self._console_file is not None:
+            LOG.debug("Closing console file")
+            self._console_file.close()
+            self._console_file = None
+
         if self._console_socket is not None:
             LOG.debug("Closing console socket")
             self._console_socket.close()
@@ -874,12 +880,25 @@ def console_socket(self) -> socket.socket:
         Returns a socket connected to the console
         """
         if self._console_socket is None:
+            LOG.debug("Opening console socket")
             self._console_socket = console_socket.ConsoleSocket(
                 self._console_address,
                 file=self._console_log_path,
                 drain=self._drain_console)
         return self._console_socket
 
+    @property
+    def console_file(self) -> socket.SocketIO:
+        """
+        Returns a file associated with the console socket
+        """
+        if self._console_file is None:
+            LOG.debug("Opening console file")
+            self._console_file = self.console_socket.makefile(mode='rb',
+                                                              buffering=0,
+                                                              encoding='utf-8')
+        return self._console_file
+
     @property
     def temp_dir(self) -> str:
         """
diff --git a/tests/avocado/avocado_qemu/__init__.py b/tests/avocado/avocado_qemu/__init__.py
index 33090903f1..0172a359b7 100644
--- a/tests/avocado/avocado_qemu/__init__.py
+++ b/tests/avocado/avocado_qemu/__init__.py
@@ -137,7 +137,7 @@ def _console_interaction(test, success_message, failure_message,
     assert not keep_sending or send_string
     if vm is None:
         vm = test.vm
-    console = vm.console_socket.makefile(mode='rb', encoding='utf-8')
+    console = vm.console_file
     console_logger = logging.getLogger('console')
     while True:
         if send_string:
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] tests/avocado: Fix console data loss
  2023-09-12 13:13 [PATCH] tests/avocado: Fix console data loss Nicholas Piggin
@ 2023-09-12 15:34 ` Daniel P. Berrangé
  2023-09-12 16:42 ` Philippe Mathieu-Daudé
  2023-09-13  8:51 ` Alex Bennée
  2 siblings, 0 replies; 6+ messages in thread
From: Daniel P. Berrangé @ 2023-09-12 15:34 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: John Snow, Cleber Rosa, Beraldo Leal, Philippe Mathieu-Daudé,
	Wainer dos Santos Moschetta, qemu-devel, qemu-ppc

On Tue, Sep 12, 2023 at 11:13:40PM +1000, Nicholas Piggin wrote:
> Occasionally some avocado tests will fail waiting for console line
> despite the machine running correctly. Console data goes missing, as can
> be seen in the console log. This is due to _console_interaction calling
> makefile() on the console socket each time it is invoked, which must be
> losing old buffer contents when going out of scope.
> 
> It is not enough to makefile() with buffered=0. That helps significantly
> but data loss is still possible. My guess is that readline() has a line
> buffer even when the file is in unbuffered mode, that can eat data.
> 
> Fix this by providing a console file that persists for the life of the
> console.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
> 
> For some reason, ppc_prep_40p.py:IbmPrep40pMachine.test_openbios_192m
> was flakey for me due to this bug. I don't know why that in particular,
> 3 calls to wait_for_console_pattern probably helps.
> 
> I didn't pinpoint when the bug was introduced because the original
> was probably not buggy because it was only run once at the end of the
> test. At some point after it was moved to common code, something would
> have started to call it more than once which is where potential for bug
> is introduced.

I'm suspecting this patch might also fix many other failures we
see in avocado, all with wait_for_console_pattern in their trace

https://gitlab.com/qemu-project/qemu/-/issues/1884

>  python/qemu/machine/machine.py         | 19 +++++++++++++++++++
>  tests/avocado/avocado_qemu/__init__.py |  2 +-
>  2 files changed, 20 insertions(+), 1 deletion(-)
> 
> diff --git a/python/qemu/machine/machine.py b/python/qemu/machine/machine.py
> index c16a0b6fed..35d5a672db 100644
> --- a/python/qemu/machine/machine.py
> +++ b/python/qemu/machine/machine.py
> @@ -191,6 +191,7 @@ def __init__(self,
>              self.sock_dir, f"{self._name}.con"
>          )
>          self._console_socket: Optional[socket.socket] = None
> +        self._console_file: Optional[socket.SocketIO] = None
>          self._remove_files: List[str] = []
>          self._user_killed = False
>          self._quit_issued = False
> @@ -509,6 +510,11 @@ def _early_cleanup(self) -> None:
>          # If we keep the console socket open, we may deadlock waiting
>          # for QEMU to exit, while QEMU is waiting for the socket to
>          # become writable.
> +        if self._console_file is not None:
> +            LOG.debug("Closing console file")
> +            self._console_file.close()
> +            self._console_file = None
> +
>          if self._console_socket is not None:
>              LOG.debug("Closing console socket")
>              self._console_socket.close()
> @@ -874,12 +880,25 @@ def console_socket(self) -> socket.socket:
>          Returns a socket connected to the console
>          """
>          if self._console_socket is None:
> +            LOG.debug("Opening console socket")
>              self._console_socket = console_socket.ConsoleSocket(
>                  self._console_address,
>                  file=self._console_log_path,
>                  drain=self._drain_console)
>          return self._console_socket
>  
> +    @property
> +    def console_file(self) -> socket.SocketIO:
> +        """
> +        Returns a file associated with the console socket
> +        """
> +        if self._console_file is None:
> +            LOG.debug("Opening console file")
> +            self._console_file = self.console_socket.makefile(mode='rb',
> +                                                              buffering=0,
> +                                                              encoding='utf-8')
> +        return self._console_file
> +
>      @property
>      def temp_dir(self) -> str:
>          """
> diff --git a/tests/avocado/avocado_qemu/__init__.py b/tests/avocado/avocado_qemu/__init__.py
> index 33090903f1..0172a359b7 100644
> --- a/tests/avocado/avocado_qemu/__init__.py
> +++ b/tests/avocado/avocado_qemu/__init__.py
> @@ -137,7 +137,7 @@ def _console_interaction(test, success_message, failure_message,
>      assert not keep_sending or send_string
>      if vm is None:
>          vm = test.vm
> -    console = vm.console_socket.makefile(mode='rb', encoding='utf-8')
> +    console = vm.console_file
>      console_logger = logging.getLogger('console')
>      while True:
>          if send_string:

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] tests/avocado: Fix console data loss
  2023-09-12 13:13 [PATCH] tests/avocado: Fix console data loss Nicholas Piggin
  2023-09-12 15:34 ` Daniel P. Berrangé
@ 2023-09-12 16:42 ` Philippe Mathieu-Daudé
  2023-09-13  8:51 ` Alex Bennée
  2 siblings, 0 replies; 6+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-09-12 16:42 UTC (permalink / raw)
  To: Nicholas Piggin, John Snow, Cleber Rosa, Alex Bennée
  Cc: Beraldo Leal, Wainer dos Santos Moschetta, qemu-devel, qemu-ppc

+Alex who also looked at that IIRC.

On 12/9/23 15:13, Nicholas Piggin wrote:
> Occasionally some avocado tests will fail waiting for console line
> despite the machine running correctly. Console data goes missing, as can
> be seen in the console log. This is due to _console_interaction calling
> makefile() on the console socket each time it is invoked, which must be
> losing old buffer contents when going out of scope.
> 
> It is not enough to makefile() with buffered=0. That helps significantly
> but data loss is still possible. My guess is that readline() has a line
> buffer even when the file is in unbuffered mode, that can eat data.
> 
> Fix this by providing a console file that persists for the life of the
> console.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
> 
> For some reason, ppc_prep_40p.py:IbmPrep40pMachine.test_openbios_192m
> was flakey for me due to this bug. I don't know why that in particular,
> 3 calls to wait_for_console_pattern probably helps.
> 
> I didn't pinpoint when the bug was introduced because the original
> was probably not buggy because it was only run once at the end of the
> test. At some point after it was moved to common code, something would
> have started to call it more than once which is where potential for bug
> is introduced.
> 
> Thanks,
> Nick
> 
>   python/qemu/machine/machine.py         | 19 +++++++++++++++++++
>   tests/avocado/avocado_qemu/__init__.py |  2 +-
>   2 files changed, 20 insertions(+), 1 deletion(-)
> 
> diff --git a/python/qemu/machine/machine.py b/python/qemu/machine/machine.py
> index c16a0b6fed..35d5a672db 100644
> --- a/python/qemu/machine/machine.py
> +++ b/python/qemu/machine/machine.py
> @@ -191,6 +191,7 @@ def __init__(self,
>               self.sock_dir, f"{self._name}.con"
>           )
>           self._console_socket: Optional[socket.socket] = None
> +        self._console_file: Optional[socket.SocketIO] = None
>           self._remove_files: List[str] = []
>           self._user_killed = False
>           self._quit_issued = False
> @@ -509,6 +510,11 @@ def _early_cleanup(self) -> None:
>           # If we keep the console socket open, we may deadlock waiting
>           # for QEMU to exit, while QEMU is waiting for the socket to
>           # become writable.
> +        if self._console_file is not None:
> +            LOG.debug("Closing console file")
> +            self._console_file.close()
> +            self._console_file = None
> +
>           if self._console_socket is not None:
>               LOG.debug("Closing console socket")
>               self._console_socket.close()
> @@ -874,12 +880,25 @@ def console_socket(self) -> socket.socket:
>           Returns a socket connected to the console
>           """
>           if self._console_socket is None:
> +            LOG.debug("Opening console socket")
>               self._console_socket = console_socket.ConsoleSocket(
>                   self._console_address,
>                   file=self._console_log_path,
>                   drain=self._drain_console)
>           return self._console_socket
>   
> +    @property
> +    def console_file(self) -> socket.SocketIO:
> +        """
> +        Returns a file associated with the console socket
> +        """
> +        if self._console_file is None:
> +            LOG.debug("Opening console file")
> +            self._console_file = self.console_socket.makefile(mode='rb',
> +                                                              buffering=0,
> +                                                              encoding='utf-8')
> +        return self._console_file
> +
>       @property
>       def temp_dir(self) -> str:
>           """
> diff --git a/tests/avocado/avocado_qemu/__init__.py b/tests/avocado/avocado_qemu/__init__.py
> index 33090903f1..0172a359b7 100644
> --- a/tests/avocado/avocado_qemu/__init__.py
> +++ b/tests/avocado/avocado_qemu/__init__.py
> @@ -137,7 +137,7 @@ def _console_interaction(test, success_message, failure_message,
>       assert not keep_sending or send_string
>       if vm is None:
>           vm = test.vm
> -    console = vm.console_socket.makefile(mode='rb', encoding='utf-8')
> +    console = vm.console_file
>       console_logger = logging.getLogger('console')
>       while True:
>           if send_string:



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] tests/avocado: Fix console data loss
  2023-09-12 13:13 [PATCH] tests/avocado: Fix console data loss Nicholas Piggin
  2023-09-12 15:34 ` Daniel P. Berrangé
  2023-09-12 16:42 ` Philippe Mathieu-Daudé
@ 2023-09-13  8:51 ` Alex Bennée
  2023-09-14 15:24   ` John Snow
  2023-09-15  0:07   ` Nicholas Piggin
  2 siblings, 2 replies; 6+ messages in thread
From: Alex Bennée @ 2023-09-13  8:51 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: John Snow, Cleber Rosa, Beraldo Leal, Philippe Mathieu-Daudé,
	Wainer dos Santos Moschetta, qemu-ppc, qemu-devel


Nicholas Piggin <npiggin@gmail.com> writes:

> Occasionally some avocado tests will fail waiting for console line
> despite the machine running correctly. Console data goes missing, as can
> be seen in the console log. This is due to _console_interaction calling
> makefile() on the console socket each time it is invoked, which must be
> losing old buffer contents when going out of scope.
>
> It is not enough to makefile() with buffered=0. That helps significantly
> but data loss is still possible. My guess is that readline() has a line
> buffer even when the file is in unbuffered mode, that can eat data.
>
> Fix this by providing a console file that persists for the life of the
> console.
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

Queued to testing/next, thanks.

> ---
>
> For some reason, ppc_prep_40p.py:IbmPrep40pMachine.test_openbios_192m
> was flakey for me due to this bug. I don't know why that in particular,
> 3 calls to wait_for_console_pattern probably helps.
>
> I didn't pinpoint when the bug was introduced because the original
> was probably not buggy because it was only run once at the end of the
> test. At some point after it was moved to common code, something would
> have started to call it more than once which is where potential for bug
> is introduced.

There is a sprawling mass somewhere between:

  - pythons buffering of IO
  - device models dropping chars when blocked
  - noisy tests with competing console output

that adds up to unreliable tests that rely on seeing certain patterns on
the console. 

>
> Thanks,
> Nick
>
>  python/qemu/machine/machine.py         | 19 +++++++++++++++++++
>  tests/avocado/avocado_qemu/__init__.py |  2 +-
>  2 files changed, 20 insertions(+), 1 deletion(-)
>
> diff --git a/python/qemu/machine/machine.py b/python/qemu/machine/machine.py
> index c16a0b6fed..35d5a672db 100644
> --- a/python/qemu/machine/machine.py
> +++ b/python/qemu/machine/machine.py
> @@ -191,6 +191,7 @@ def __init__(self,
>              self.sock_dir, f"{self._name}.con"
>          )
>          self._console_socket: Optional[socket.socket] = None
> +        self._console_file: Optional[socket.SocketIO] = None
>          self._remove_files: List[str] = []
>          self._user_killed = False
>          self._quit_issued = False
> @@ -509,6 +510,11 @@ def _early_cleanup(self) -> None:
>          # If we keep the console socket open, we may deadlock waiting
>          # for QEMU to exit, while QEMU is waiting for the socket to
>          # become writable.
> +        if self._console_file is not None:
> +            LOG.debug("Closing console file")
> +            self._console_file.close()
> +            self._console_file = None
> +
>          if self._console_socket is not None:
>              LOG.debug("Closing console socket")
>              self._console_socket.close()
> @@ -874,12 +880,25 @@ def console_socket(self) -> socket.socket:
>          Returns a socket connected to the console
>          """
>          if self._console_socket is None:
> +            LOG.debug("Opening console socket")
>              self._console_socket = console_socket.ConsoleSocket(
>                  self._console_address,
>                  file=self._console_log_path,
>                  drain=self._drain_console)
>          return self._console_socket
>  
> +    @property
> +    def console_file(self) -> socket.SocketIO:
> +        """
> +        Returns a file associated with the console socket
> +        """
> +        if self._console_file is None:
> +            LOG.debug("Opening console file")
> +            self._console_file = self.console_socket.makefile(mode='rb',
> +                                                              buffering=0,
> +                                                              encoding='utf-8')
> +        return self._console_file
> +
>      @property
>      def temp_dir(self) -> str:
>          """
> diff --git a/tests/avocado/avocado_qemu/__init__.py b/tests/avocado/avocado_qemu/__init__.py
> index 33090903f1..0172a359b7 100644
> --- a/tests/avocado/avocado_qemu/__init__.py
> +++ b/tests/avocado/avocado_qemu/__init__.py
> @@ -137,7 +137,7 @@ def _console_interaction(test, success_message, failure_message,
>      assert not keep_sending or send_string
>      if vm is None:
>          vm = test.vm
> -    console = vm.console_socket.makefile(mode='rb', encoding='utf-8')
> +    console = vm.console_file
>      console_logger = logging.getLogger('console')
>      while True:
>          if send_string:


-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] tests/avocado: Fix console data loss
  2023-09-13  8:51 ` Alex Bennée
@ 2023-09-14 15:24   ` John Snow
  2023-09-15  0:07   ` Nicholas Piggin
  1 sibling, 0 replies; 6+ messages in thread
From: John Snow @ 2023-09-14 15:24 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Nicholas Piggin, Cleber Rosa, Beraldo Leal,
	Philippe Mathieu-Daudé, Wainer dos Santos Moschetta,
	qemu-ppc, qemu-devel

On Wed, Sep 13, 2023 at 4:53 AM Alex Bennée <alex.bennee@linaro.org> wrote:
>
>
> Nicholas Piggin <npiggin@gmail.com> writes:
>
> > Occasionally some avocado tests will fail waiting for console line
> > despite the machine running correctly. Console data goes missing, as can
> > be seen in the console log. This is due to _console_interaction calling
> > makefile() on the console socket each time it is invoked, which must be
> > losing old buffer contents when going out of scope.
> >
> > It is not enough to makefile() with buffered=0. That helps significantly
> > but data loss is still possible. My guess is that readline() has a line
> > buffer even when the file is in unbuffered mode, that can eat data.
> >
> > Fix this by providing a console file that persists for the life of the
> > console.
> >
> > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>
> Queued to testing/next, thanks.

Sure, ACK.

I didn't realize that avocado was technically creating multiple
buffered files like this.

--js

>
> > ---
> >
> > For some reason, ppc_prep_40p.py:IbmPrep40pMachine.test_openbios_192m
> > was flakey for me due to this bug. I don't know why that in particular,
> > 3 calls to wait_for_console_pattern probably helps.
> >
> > I didn't pinpoint when the bug was introduced because the original
> > was probably not buggy because it was only run once at the end of the
> > test. At some point after it was moved to common code, something would
> > have started to call it more than once which is where potential for bug
> > is introduced.
>
> There is a sprawling mass somewhere between:
>
>   - pythons buffering of IO
>   - device models dropping chars when blocked
>   - noisy tests with competing console output
>
> that adds up to unreliable tests that rely on seeing certain patterns on
> the console.
>
> >
> > Thanks,
> > Nick
> >
> >  python/qemu/machine/machine.py         | 19 +++++++++++++++++++
> >  tests/avocado/avocado_qemu/__init__.py |  2 +-
> >  2 files changed, 20 insertions(+), 1 deletion(-)
> >
> > diff --git a/python/qemu/machine/machine.py b/python/qemu/machine/machine.py
> > index c16a0b6fed..35d5a672db 100644
> > --- a/python/qemu/machine/machine.py
> > +++ b/python/qemu/machine/machine.py
> > @@ -191,6 +191,7 @@ def __init__(self,
> >              self.sock_dir, f"{self._name}.con"
> >          )
> >          self._console_socket: Optional[socket.socket] = None
> > +        self._console_file: Optional[socket.SocketIO] = None
> >          self._remove_files: List[str] = []
> >          self._user_killed = False
> >          self._quit_issued = False
> > @@ -509,6 +510,11 @@ def _early_cleanup(self) -> None:
> >          # If we keep the console socket open, we may deadlock waiting
> >          # for QEMU to exit, while QEMU is waiting for the socket to
> >          # become writable.
> > +        if self._console_file is not None:
> > +            LOG.debug("Closing console file")
> > +            self._console_file.close()
> > +            self._console_file = None
> > +
> >          if self._console_socket is not None:
> >              LOG.debug("Closing console socket")
> >              self._console_socket.close()
> > @@ -874,12 +880,25 @@ def console_socket(self) -> socket.socket:
> >          Returns a socket connected to the console
> >          """
> >          if self._console_socket is None:
> > +            LOG.debug("Opening console socket")
> >              self._console_socket = console_socket.ConsoleSocket(
> >                  self._console_address,
> >                  file=self._console_log_path,
> >                  drain=self._drain_console)
> >          return self._console_socket
> >
> > +    @property
> > +    def console_file(self) -> socket.SocketIO:
> > +        """
> > +        Returns a file associated with the console socket
> > +        """
> > +        if self._console_file is None:
> > +            LOG.debug("Opening console file")
> > +            self._console_file = self.console_socket.makefile(mode='rb',
> > +                                                              buffering=0,
> > +                                                              encoding='utf-8')
> > +        return self._console_file
> > +
> >      @property
> >      def temp_dir(self) -> str:
> >          """
> > diff --git a/tests/avocado/avocado_qemu/__init__.py b/tests/avocado/avocado_qemu/__init__.py
> > index 33090903f1..0172a359b7 100644
> > --- a/tests/avocado/avocado_qemu/__init__.py
> > +++ b/tests/avocado/avocado_qemu/__init__.py
> > @@ -137,7 +137,7 @@ def _console_interaction(test, success_message, failure_message,
> >      assert not keep_sending or send_string
> >      if vm is None:
> >          vm = test.vm
> > -    console = vm.console_socket.makefile(mode='rb', encoding='utf-8')
> > +    console = vm.console_file
> >      console_logger = logging.getLogger('console')
> >      while True:
> >          if send_string:
>
>
> --
> Alex Bennée
> Virtualisation Tech Lead @ Linaro
>



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] tests/avocado: Fix console data loss
  2023-09-13  8:51 ` Alex Bennée
  2023-09-14 15:24   ` John Snow
@ 2023-09-15  0:07   ` Nicholas Piggin
  1 sibling, 0 replies; 6+ messages in thread
From: Nicholas Piggin @ 2023-09-15  0:07 UTC (permalink / raw)
  To: Alex Bennée
  Cc: John Snow, Cleber Rosa, Beraldo Leal, Philippe Mathieu-Daudé,
	Wainer dos Santos Moschetta, qemu-ppc, qemu-devel

On Wed Sep 13, 2023 at 6:51 PM AEST, Alex Bennée wrote:
>
> Nicholas Piggin <npiggin@gmail.com> writes:
>
> > Occasionally some avocado tests will fail waiting for console line
> > despite the machine running correctly. Console data goes missing, as can
> > be seen in the console log. This is due to _console_interaction calling
> > makefile() on the console socket each time it is invoked, which must be
> > losing old buffer contents when going out of scope.
> >
> > It is not enough to makefile() with buffered=0. That helps significantly
> > but data loss is still possible. My guess is that readline() has a line
> > buffer even when the file is in unbuffered mode, that can eat data.
> >
> > Fix this by providing a console file that persists for the life of the
> > console.
> >
> > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>
> Queued to testing/next, thanks.
>
> > ---
> >
> > For some reason, ppc_prep_40p.py:IbmPrep40pMachine.test_openbios_192m
> > was flakey for me due to this bug. I don't know why that in particular,
> > 3 calls to wait_for_console_pattern probably helps.
> >
> > I didn't pinpoint when the bug was introduced because the original
> > was probably not buggy because it was only run once at the end of the
> > test. At some point after it was moved to common code, something would
> > have started to call it more than once which is where potential for bug
> > is introduced.
>
> There is a sprawling mass somewhere between:
>
>   - pythons buffering of IO
>   - device models dropping chars when blocked
>   - noisy tests with competing console output
>
> that adds up to unreliable tests that rely on seeing certain patterns on
> the console. 

Yeah it's a tricky bug and a difficult stack to diagnose. I started to
look at 40p machine firmware console at first since it was happening on
there.

It's actually not too bad now, I was irritating it by putting delays in
various avocado console socket reading, which can trigger it easily (my
guess is due to delay allowing file buffer to pull in more data than is
consumed). With patch the only check-avocado failures I was getting was
some OS watchdog timeouts in their console print code caused by back
pressure.

Thanks,
Nick


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2023-09-15  0:08 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-09-12 13:13 [PATCH] tests/avocado: Fix console data loss Nicholas Piggin
2023-09-12 15:34 ` Daniel P. Berrangé
2023-09-12 16:42 ` Philippe Mathieu-Daudé
2023-09-13  8:51 ` Alex Bennée
2023-09-14 15:24   ` John Snow
2023-09-15  0:07   ` Nicholas Piggin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).