qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/3] tests/avocado: reverse_debugging drain console to prevent hang
@ 2023-11-16 11:53 Nicholas Piggin
  2023-11-16 11:53 ` [PATCH 2/3] chardev: report blocked write to chardev backend Nicholas Piggin
                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Nicholas Piggin @ 2023-11-16 11:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: Nicholas Piggin, Marc-André Lureau, Paolo Bonzini,
	Pavel Dovgalyuk, Cleber Rosa, Philippe Mathieu-Daudé,
	Wainer dos Santos Moschetta, Beraldo Leal, Ani Sinha,
	Daniel P . Berrangé, Thomas Huth, John Snow, qemu-ppc,
	Alex Bennée

Like replay_linux.py, reverse_debugging.py starts the vm with console
set but does not interact with it (e.g., with wait_for_console_pattern).
In this situation, the console should have a drainer attached so the
socket does not fill. replay_linux.py has a drainer, but it is missing
from reverse_debugging.py.

Per analysis in Link: this can cause the console socket/pipe to fill and
QEMU get stuck in qemu_chr_write_buffer, leading to strange test case
failures (ppc64 fails because it prints a lot to console in early bios).
Attaching a drainer prevents this.

Note, this commit does not fix bugs introduced by the commits referenced
in the first two Fixes: tags, but together those commits conspire to
irritate the problem and cause test case failure, which this commit
fixes.

Link: https://lore.kernel.org/qemu-devel/ZVT-bY9YOr69QTPX@redhat.com/
Fixes: 1d4796cd0083 ("python/machine: use socketpair() for console connections")
Fixes: 761a13b23946 ("tests/avocado: ppc64 reverse debugging tests for pseries and powernv")
Fixes: be52eca30978 ("tests/acceptance: add reverse debugging test")
Tested-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 tests/avocado/reverse_debugging.py | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/tests/avocado/reverse_debugging.py b/tests/avocado/reverse_debugging.py
index fc47874eda..128d85bc0e 100644
--- a/tests/avocado/reverse_debugging.py
+++ b/tests/avocado/reverse_debugging.py
@@ -12,6 +12,7 @@
 
 from avocado import skipIf
 from avocado_qemu import BUILD_DIR
+from avocado.utils import datadrainer
 from avocado.utils import gdb
 from avocado.utils import process
 from avocado.utils.network.ports import find_free_port
@@ -52,6 +53,10 @@ def run_vm(self, record, shift, args, replay_path, image_path, port):
         if args:
             vm.add_args(*args)
         vm.launch()
+        console_drainer = datadrainer.LineLogger(vm.console_socket.fileno(),
+                                    logger=self.log.getChild('console'),
+                                    stop_check=(lambda : not vm.is_running()))
+        console_drainer.start()
         return vm
 
     @staticmethod
-- 
2.42.0



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 2/3] chardev: report blocked write to chardev backend
  2023-11-16 11:53 [PATCH 1/3] tests/avocado: reverse_debugging drain console to prevent hang Nicholas Piggin
@ 2023-11-16 11:53 ` Nicholas Piggin
  2023-11-20 12:06   ` Marc-André Lureau
  2023-11-16 11:53 ` [PATCH 3/3] tests/avocado: Enable reverse_debugging.py tests in gitlab CI Nicholas Piggin
  2023-11-16 13:26 ` [PATCH 1/3] tests/avocado: reverse_debugging drain console to prevent hang Ani Sinha
  2 siblings, 1 reply; 21+ messages in thread
From: Nicholas Piggin @ 2023-11-16 11:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: Nicholas Piggin, Marc-André Lureau, Paolo Bonzini,
	Pavel Dovgalyuk, Cleber Rosa, Philippe Mathieu-Daudé,
	Wainer dos Santos Moschetta, Beraldo Leal, Ani Sinha,
	Daniel P . Berrangé, Thomas Huth, John Snow, qemu-ppc,
	Alex Bennée

If a chardev socket is not read, it will eventually fill and QEMU
can block attempting to write to it. A difficult bug in avocado
tests where the console socket was not being read from caused this
hang.

warn if a chardev write is blocked for 100ms.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
This is not necessary for the fix but it does trigger in the
failing avocado test without the previous patch applied. Maybe
it would be helpful?

Thanks,
Nick

 chardev/char.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/chardev/char.c b/chardev/char.c
index 996a024c7a..7c375e3cc4 100644
--- a/chardev/char.c
+++ b/chardev/char.c
@@ -114,6 +114,8 @@ static int qemu_chr_write_buffer(Chardev *s,
 {
     ChardevClass *cc = CHARDEV_GET_CLASS(s);
     int res = 0;
+    int nr_retries = 0;
+
     *offset = 0;
 
     qemu_mutex_lock(&s->chr_write_lock);
@@ -126,6 +128,10 @@ static int qemu_chr_write_buffer(Chardev *s,
             } else {
                 g_usleep(100);
             }
+            if (++nr_retries == 1000) { /* 100ms */
+                warn_report("Chardev '%s' write blocked for > 100ms, "
+                            "socket buffer full?", s->label);
+            }
             goto retry;
         }
 
-- 
2.42.0



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 3/3] tests/avocado: Enable reverse_debugging.py tests in gitlab CI
  2023-11-16 11:53 [PATCH 1/3] tests/avocado: reverse_debugging drain console to prevent hang Nicholas Piggin
  2023-11-16 11:53 ` [PATCH 2/3] chardev: report blocked write to chardev backend Nicholas Piggin
@ 2023-11-16 11:53 ` Nicholas Piggin
  2023-11-16 12:33   ` Thomas Huth
  2023-11-16 18:11   ` Thomas Huth
  2023-11-16 13:26 ` [PATCH 1/3] tests/avocado: reverse_debugging drain console to prevent hang Ani Sinha
  2 siblings, 2 replies; 21+ messages in thread
From: Nicholas Piggin @ 2023-11-16 11:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: Nicholas Piggin, Marc-André Lureau, Paolo Bonzini,
	Pavel Dovgalyuk, Cleber Rosa, Philippe Mathieu-Daudé,
	Wainer dos Santos Moschetta, Beraldo Leal, Ani Sinha,
	Daniel P . Berrangé, Thomas Huth, John Snow, qemu-ppc,
	Alex Bennée

Let's try enable reverse_debugging.py in gitlab CI.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
Maybe we could try this again at some point? The bug might have been
noticed sooner.

They only take a couple of seconds to run so should not take too much
overhead. But my gitlab CI pipeline doesn't run the avocado tests for
some reason, so I can't see if it's still causing problems.

Thanks,
Nick
---

 tests/avocado/reverse_debugging.py | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/tests/avocado/reverse_debugging.py b/tests/avocado/reverse_debugging.py
index 128d85bc0e..b1410e7a69 100644
--- a/tests/avocado/reverse_debugging.py
+++ b/tests/avocado/reverse_debugging.py
@@ -205,8 +205,6 @@ def get_pc(self, g):
         return self.get_reg_le(g, self.REG_PC) \
             + self.get_reg_le(g, self.REG_CS) * 0x10
 
-    # unidentified gitlab timeout problem
-    @skipIf(os.getenv('GITLAB_CI'), 'Running on GitLab')
     def test_x86_64_pc(self):
         """
         :avocado: tags=arch:x86_64
@@ -222,8 +220,6 @@ class ReverseDebugging_AArch64(ReverseDebugging):
 
     REG_PC = 32
 
-    # unidentified gitlab timeout problem
-    @skipIf(os.getenv('GITLAB_CI'), 'Running on GitLab')
     def test_aarch64_virt(self):
         """
         :avocado: tags=arch:aarch64
@@ -246,8 +242,6 @@ class ReverseDebugging_ppc64(ReverseDebugging):
 
     REG_PC = 0x40
 
-    # unidentified gitlab timeout problem
-    @skipIf(os.getenv('GITLAB_CI'), 'Running on GitLab')
     def test_ppc64_pseries(self):
         """
         :avocado: tags=arch:ppc64
@@ -259,7 +253,6 @@ def test_ppc64_pseries(self):
         self.endian_is_le = False
         self.reverse_debugging()
 
-    @skipIf(os.getenv('GITLAB_CI'), 'Running on GitLab')
     def test_ppc64_powernv(self):
         """
         :avocado: tags=arch:ppc64
-- 
2.42.0



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/3] tests/avocado: Enable reverse_debugging.py tests in gitlab CI
  2023-11-16 11:53 ` [PATCH 3/3] tests/avocado: Enable reverse_debugging.py tests in gitlab CI Nicholas Piggin
@ 2023-11-16 12:33   ` Thomas Huth
  2023-11-16 18:11   ` Thomas Huth
  1 sibling, 0 replies; 21+ messages in thread
From: Thomas Huth @ 2023-11-16 12:33 UTC (permalink / raw)
  To: Nicholas Piggin, qemu-devel
  Cc: Marc-André Lureau, Paolo Bonzini, Pavel Dovgalyuk,
	Cleber Rosa, Philippe Mathieu-Daudé,
	Wainer dos Santos Moschetta, Beraldo Leal, Ani Sinha,
	Daniel P . Berrangé, John Snow, qemu-ppc, Alex Bennée

On 16/11/2023 12.53, Nicholas Piggin wrote:
> Let's try enable reverse_debugging.py in gitlab CI.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
> Maybe we could try this again at some point? The bug might have been
> noticed sooner.
> 
> They only take a couple of seconds to run so should not take too much
> overhead. But my gitlab CI pipeline doesn't run the avocado tests for
> some reason, so I can't see if it's still causing problems.

Go to Settings -> CI/CD -> Variables in your gitlab project, and create a 
QEMU_CI_AVOCADO_TESTING variable there, then the jobs should show up next time.

It's mentioned in the documentation in docs/devel/ci-jobs.rst.inc ... but I 
have to admit that it's easy to miss.

  Thomas




^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/3] tests/avocado: reverse_debugging drain console to prevent hang
  2023-11-16 11:53 [PATCH 1/3] tests/avocado: reverse_debugging drain console to prevent hang Nicholas Piggin
  2023-11-16 11:53 ` [PATCH 2/3] chardev: report blocked write to chardev backend Nicholas Piggin
  2023-11-16 11:53 ` [PATCH 3/3] tests/avocado: Enable reverse_debugging.py tests in gitlab CI Nicholas Piggin
@ 2023-11-16 13:26 ` Ani Sinha
  2023-11-16 13:31   ` Ani Sinha
  2 siblings, 1 reply; 21+ messages in thread
From: Ani Sinha @ 2023-11-16 13:26 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: QEMU Developers, Marc-André Lureau, Paolo Bonzini,
	Pavel Dovgalyuk, Cleber Rosa, Philippe Mathieu-Daudé,
	Wainer dos Santos Moschetta, Beraldo Leal,
	"Daniel P . Berrangé", Thomas Huth, John Snow,
	qemu-ppc, Alex Bennée

[-- Attachment #1: Type: text/plain, Size: 2521 bytes --]



> On 16-Nov-2023, at 5:23 PM, Nicholas Piggin <npiggin@gmail.com> wrote:
> 
> Like replay_linux.py, reverse_debugging.py starts the vm with console
> set but does not interact with it (e.g., with wait_for_console_pattern).
> In this situation, the console should have a drainer attached so the
> socket does not fill. replay_linux.py has a drainer, but it is missing
> from reverse_debugging.py.
> 
> Per analysis in Link: this can cause the console socket/pipe to fill and
> QEMU get stuck in qemu_chr_write_buffer, leading to strange test case
> failures (ppc64 fails because it prints a lot to console in early bios).
> Attaching a drainer prevents this.
> 
> Note, this commit does not fix bugs introduced by the commits referenced
> in the first two Fixes: tags, but together those commits conspire to
> irritate the problem and cause test case failure, which this commit
> fixes.
> 
> Link: https://lore.kernel.org/qemu-devel/ZVT-bY9YOr69QTPX@redhat.com/
> Fixes: 1d4796cd0083 ("python/machine: use socketpair() for console connections")
> Fixes: 761a13b23946 ("tests/avocado: ppc64 reverse debugging tests for pseries and powernv")
> Fixes: be52eca30978 ("tests/acceptance: add reverse debugging test")
> Tested-by: Thomas Huth <thuth@redhat.com>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

Reviewed-by: Ani Sinha <anisinha@redhat.com <mailto:anisinha@redhat.com> <mailto:anisinha@redhat.com>>

> ---
> tests/avocado/reverse_debugging.py | 5 +++++
> 1 file changed, 5 insertions(+)
> 
> diff --git a/tests/avocado/reverse_debugging.py b/tests/avocado/reverse_debugging.py
> index fc47874eda..128d85bc0e 100644
> --- a/tests/avocado/reverse_debugging.py
> +++ b/tests/avocado/reverse_debugging.py
> @@ -12,6 +12,7 @@
> 
> from avocado import skipIf
> from avocado_qemu import BUILD_DIR
> +from avocado.utils import datadrainer
> from avocado.utils import gdb
> from avocado.utils import process
> from avocado.utils.network.ports import find_free_port
> @@ -52,6 +53,10 @@ def run_vm(self, record, shift, args, replay_path, image_path, port):
>        if args:
>            vm.add_args(*args)
>        vm.launch()
> +        console_drainer = datadrainer.LineLogger(vm.console_socket.fileno(),
> +                                    logger=self.log.getChild('console'),
> +                                    stop_check=(lambda : not vm.is_running()))
> +        console_drainer.start()
>        return vm
> 
>    @staticmethod
> -- 
> 2.42.0

[-- Attachment #2: Type: text/html, Size: 8055 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/3] tests/avocado: reverse_debugging drain console to prevent hang
  2023-11-16 13:26 ` [PATCH 1/3] tests/avocado: reverse_debugging drain console to prevent hang Ani Sinha
@ 2023-11-16 13:31   ` Ani Sinha
  2023-11-16 13:39     ` Ani Sinha
  0 siblings, 1 reply; 21+ messages in thread
From: Ani Sinha @ 2023-11-16 13:31 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: QEMU Developers, Marc-André Lureau, Paolo Bonzini,
	Pavel Dovgalyuk, Cleber Rosa, Philippe Mathieu-Daudé,
	Wainer dos Santos Moschetta, Beraldo Leal,
	"Daniel P . Berrangé", Thomas Huth, John Snow,
	qemu-ppc, Alex Bennée



> On 16-Nov-2023, at 6:56 PM, Ani Sinha <anisinha@redhat.com> wrote:
> 
> 
> 
>> On 16-Nov-2023, at 5:23 PM, Nicholas Piggin <npiggin@gmail.com> wrote:
>> 
>> Like replay_linux.py, reverse_debugging.py starts the vm with console
>> set but does not interact with it (e.g., with wait_for_console_pattern).
>> In this situation, the console should have a drainer attached so the
>> socket does not fill. replay_linux.py has a drainer, but it is missing
>> from reverse_debugging.py.
>> 
>> Per analysis in Link: this can cause the console socket/pipe to fill and
>> QEMU get stuck in qemu_chr_write_buffer, leading to strange test case
>> failures (ppc64 fails because it prints a lot to console in early bios).
>> Attaching a drainer prevents this.
>> 
>> Note, this commit does not fix bugs introduced by the commits referenced
>> in the first two Fixes: tags, but together those commits conspire to
>> irritate the problem and cause test case failure, which this commit
>> fixes.
>> 
>> Link: https://lore.kernel.org/qemu-devel/ZVT-bY9YOr69QTPX@redhat.com/
>> Fixes: 1d4796cd0083 ("python/machine: use socketpair() for console connections")
>> Fixes: 761a13b23946 ("tests/avocado: ppc64 reverse debugging tests for pseries and powernv")
>> Fixes: be52eca30978 ("tests/acceptance: add reverse debugging test")
>> Tested-by: Thomas Huth <thuth@redhat.com>
>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> 
> Reviewed-by: Ani Sinha <anisinha@redhat.com <mailto:anisinha@redhat.com>>

This bs started happening recently once I upgraded my OS. Sorry about that.

Reviewed-by: Ani Sinha <anisinha@redhat.com <mailto:anisinha@redhat.com>>


> 
>> ---
>> tests/avocado/reverse_debugging.py | 5 +++++
>> 1 file changed, 5 insertions(+)
>> 
>> diff --git a/tests/avocado/reverse_debugging.py b/tests/avocado/reverse_debugging.py
>> index fc47874eda..128d85bc0e 100644
>> --- a/tests/avocado/reverse_debugging.py
>> +++ b/tests/avocado/reverse_debugging.py
>> @@ -12,6 +12,7 @@
>> 
>> from avocado import skipIf
>> from avocado_qemu import BUILD_DIR
>> +from avocado.utils import datadrainer
>> from avocado.utils import gdb
>> from avocado.utils import process
>> from avocado.utils.network.ports import find_free_port
>> @@ -52,6 +53,10 @@ def run_vm(self, record, shift, args, replay_path, image_path, port):
>>        if args:
>>            vm.add_args(*args)
>>        vm.launch()
>> +        console_drainer = datadrainer.LineLogger(vm.console_socket.fileno(),
>> +                                    logger=self.log.getChild('console'),
>> +                                    stop_check=(lambda : not vm.is_running()))
>> +        console_drainer.start()
>>        return vm
>> 
>>    @staticmethod
>> -- 
>> 2.42.0




^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/3] tests/avocado: reverse_debugging drain console to prevent hang
  2023-11-16 13:31   ` Ani Sinha
@ 2023-11-16 13:39     ` Ani Sinha
  0 siblings, 0 replies; 21+ messages in thread
From: Ani Sinha @ 2023-11-16 13:39 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: QEMU Developers, Marc-André Lureau, Paolo Bonzini,
	Pavel Dovgalyuk, Cleber Rosa, Philippe Mathieu-Daudé,
	Wainer dos Santos Moschetta, Beraldo Leal,
	"Daniel P . Berrangé", Thomas Huth, John Snow,
	qemu-ppc, Alex Bennée



> On 16-Nov-2023, at 7:01 PM, Ani Sinha <anisinha@redhat.com> wrote:
> 
> 
> 
>> On 16-Nov-2023, at 6:56 PM, Ani Sinha <anisinha@redhat.com> wrote:
>> 
>> 
>> 
>>> On 16-Nov-2023, at 5:23 PM, Nicholas Piggin <npiggin@gmail.com> wrote:
>>> 
>>> Like replay_linux.py, reverse_debugging.py starts the vm with console
>>> set but does not interact with it (e.g., with wait_for_console_pattern).
>>> In this situation, the console should have a drainer attached so the
>>> socket does not fill. replay_linux.py has a drainer, but it is missing
>>> from reverse_debugging.py.
>>> 
>>> Per analysis in Link: this can cause the console socket/pipe to fill and
>>> QEMU get stuck in qemu_chr_write_buffer, leading to strange test case
>>> failures (ppc64 fails because it prints a lot to console in early bios).
>>> Attaching a drainer prevents this.
>>> 
>>> Note, this commit does not fix bugs introduced by the commits referenced
>>> in the first two Fixes: tags, but together those commits conspire to
>>> irritate the problem and cause test case failure, which this commit
>>> fixes.
>>> 
>>> Link: https://lore.kernel.org/qemu-devel/ZVT-bY9YOr69QTPX@redhat.com/
>>> Fixes: 1d4796cd0083 ("python/machine: use socketpair() for console connections")
>>> Fixes: 761a13b23946 ("tests/avocado: ppc64 reverse debugging tests for pseries and powernv")
>>> Fixes: be52eca30978 ("tests/acceptance: add reverse debugging test")
>>> Tested-by: Thomas Huth <thuth@redhat.com>
>>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>> 
>> Reviewed-by: Ani Sinha <anisinha@redhat.com <mailto:anisinha@redhat.com>>
> 
> This bs started happening recently once I upgraded my OS. Sorry about that.
> 
> Reviewed-by: Ani Sinha <anisinha@redhat.com <mailto:anisinha@redhat.com>>

Trying again	

Reviewed-by: Ani Sinha <anisinha@redhat.com>


> 
> 
>> 
>>> ---
>>> tests/avocado/reverse_debugging.py | 5 +++++
>>> 1 file changed, 5 insertions(+)
>>> 
>>> diff --git a/tests/avocado/reverse_debugging.py b/tests/avocado/reverse_debugging.py
>>> index fc47874eda..128d85bc0e 100644
>>> --- a/tests/avocado/reverse_debugging.py
>>> +++ b/tests/avocado/reverse_debugging.py
>>> @@ -12,6 +12,7 @@
>>> 
>>> from avocado import skipIf
>>> from avocado_qemu import BUILD_DIR
>>> +from avocado.utils import datadrainer
>>> from avocado.utils import gdb
>>> from avocado.utils import process
>>> from avocado.utils.network.ports import find_free_port
>>> @@ -52,6 +53,10 @@ def run_vm(self, record, shift, args, replay_path, image_path, port):
>>>       if args:
>>>           vm.add_args(*args)
>>>       vm.launch()
>>> +        console_drainer = datadrainer.LineLogger(vm.console_socket.fileno(),
>>> +                                    logger=self.log.getChild('console'),
>>> +                                    stop_check=(lambda : not vm.is_running()))
>>> +        console_drainer.start()
>>>       return vm
>>> 
>>>   @staticmethod
>>> -- 
>>> 2.42.0




^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/3] tests/avocado: Enable reverse_debugging.py tests in gitlab CI
  2023-11-16 11:53 ` [PATCH 3/3] tests/avocado: Enable reverse_debugging.py tests in gitlab CI Nicholas Piggin
  2023-11-16 12:33   ` Thomas Huth
@ 2023-11-16 18:11   ` Thomas Huth
  2023-11-17  7:35     ` Nicholas Piggin
  1 sibling, 1 reply; 21+ messages in thread
From: Thomas Huth @ 2023-11-16 18:11 UTC (permalink / raw)
  To: Nicholas Piggin, qemu-devel, Marc-André Lureau
  Cc: Paolo Bonzini, Pavel Dovgalyuk, Cleber Rosa,
	Philippe Mathieu-Daudé, Wainer dos Santos Moschetta,
	Beraldo Leal, Ani Sinha, Daniel P . Berrangé, John Snow,
	qemu-ppc, Alex Bennée

On 16/11/2023 12.53, Nicholas Piggin wrote:
> Let's try enable reverse_debugging.py in gitlab CI.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
> Maybe we could try this again at some point? The bug might have been
> noticed sooner.
> 
> They only take a couple of seconds to run so should not take too much
> overhead. But my gitlab CI pipeline doesn't run the avocado tests for
> some reason, so I can't see if it's still causing problems.
> 
> Thanks,
> Nick
> ---
> 
>   tests/avocado/reverse_debugging.py | 7 -------
>   1 file changed, 7 deletions(-)

FYI, I gave it a try, and it survived my CI run:

  https://gitlab.com/thuth/qemu/-/jobs/5552213972#L403

So I went ahead and put it (together with the first patch) in my current 
pull request, let's see how it goes...

(for the second patch, I hope our chardev maintainer Marc-André could 
comment on that first)

  Thomas



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/3] tests/avocado: Enable reverse_debugging.py tests in gitlab CI
  2023-11-16 18:11   ` Thomas Huth
@ 2023-11-17  7:35     ` Nicholas Piggin
  2023-11-21  8:56       ` Thomas Huth
  0 siblings, 1 reply; 21+ messages in thread
From: Nicholas Piggin @ 2023-11-17  7:35 UTC (permalink / raw)
  To: Thomas Huth, qemu-devel, Marc-André Lureau
  Cc: Paolo Bonzini, Pavel Dovgalyuk, Cleber Rosa,
	Philippe Mathieu-Daudé, Wainer dos Santos Moschetta,
	Beraldo Leal, Ani Sinha, Daniel P . Berrangé, John Snow,
	qemu-ppc, Alex Bennée

On Fri Nov 17, 2023 at 4:11 AM AEST, Thomas Huth wrote:
> On 16/11/2023 12.53, Nicholas Piggin wrote:
> > Let's try enable reverse_debugging.py in gitlab CI.
> > 
> > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> > ---
> > Maybe we could try this again at some point? The bug might have been
> > noticed sooner.
> > 
> > They only take a couple of seconds to run so should not take too much
> > overhead. But my gitlab CI pipeline doesn't run the avocado tests for
> > some reason, so I can't see if it's still causing problems.
> > 
> > Thanks,
> > Nick
> > ---
> > 
> >   tests/avocado/reverse_debugging.py | 7 -------
> >   1 file changed, 7 deletions(-)
>
> FYI, I gave it a try, and it survived my CI run:
>
>   https://gitlab.com/thuth/qemu/-/jobs/5552213972#L403
>
> So I went ahead and put it (together with the first patch) in my current 
> pull request, let's see how it goes...

Great, thank you.

> (for the second patch, I hope our chardev maintainer Marc-André could 
> comment on that first)

Yeah that would be good, no rush for that one. Maybe it's not desirable
for other reasons too.

Thanks,
Nick



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/3] chardev: report blocked write to chardev backend
  2023-11-16 11:53 ` [PATCH 2/3] chardev: report blocked write to chardev backend Nicholas Piggin
@ 2023-11-20 12:06   ` Marc-André Lureau
  2023-11-20 13:35     ` Nicholas Piggin
  0 siblings, 1 reply; 21+ messages in thread
From: Marc-André Lureau @ 2023-11-20 12:06 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: qemu-devel, Paolo Bonzini, Pavel Dovgalyuk, Cleber Rosa,
	Philippe Mathieu-Daudé, Wainer dos Santos Moschetta,
	Beraldo Leal, Ani Sinha, Daniel P . Berrangé, Thomas Huth,
	John Snow, qemu-ppc, Alex Bennée

Hi

On Thu, Nov 16, 2023 at 3:54 PM Nicholas Piggin <npiggin@gmail.com> wrote:
>
> If a chardev socket is not read, it will eventually fill and QEMU
> can block attempting to write to it. A difficult bug in avocado
> tests where the console socket was not being read from caused this
> hang.
>
> warn if a chardev write is blocked for 100ms.
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
> This is not necessary for the fix but it does trigger in the
> failing avocado test without the previous patch applied. Maybe
> it would be helpful?
>
> Thanks,
> Nick
>
>  chardev/char.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/chardev/char.c b/chardev/char.c
> index 996a024c7a..7c375e3cc4 100644
> --- a/chardev/char.c
> +++ b/chardev/char.c
> @@ -114,6 +114,8 @@ static int qemu_chr_write_buffer(Chardev *s,
>  {
>      ChardevClass *cc = CHARDEV_GET_CLASS(s);
>      int res = 0;
> +    int nr_retries = 0;
> +
>      *offset = 0;
>
>      qemu_mutex_lock(&s->chr_write_lock);
> @@ -126,6 +128,10 @@ static int qemu_chr_write_buffer(Chardev *s,
>              } else {
>                  g_usleep(100);
>              }
> +            if (++nr_retries == 1000) { /* 100ms */
> +                warn_report("Chardev '%s' write blocked for > 100ms, "
> +                            "socket buffer full?", s->label);
> +            }

That shouldn't happen, the frontend should poll and only write when it
can. What is the qemu command being used here?
I think this change can be worth for debugging though.

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/3] chardev: report blocked write to chardev backend
  2023-11-20 12:06   ` Marc-André Lureau
@ 2023-11-20 13:35     ` Nicholas Piggin
  2023-11-21  9:39       ` Marc-André Lureau
  0 siblings, 1 reply; 21+ messages in thread
From: Nicholas Piggin @ 2023-11-20 13:35 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: qemu-devel, Paolo Bonzini, Pavel Dovgalyuk, Cleber Rosa,
	Philippe Mathieu-Daudé, Wainer dos Santos Moschetta,
	Beraldo Leal, Ani Sinha, Daniel P . Berrangé, Thomas Huth,
	John Snow, qemu-ppc, Alex Bennée

On Mon Nov 20, 2023 at 10:06 PM AEST, Marc-André Lureau wrote:
> Hi
>
> On Thu, Nov 16, 2023 at 3:54 PM Nicholas Piggin <npiggin@gmail.com> wrote:
> >
> > If a chardev socket is not read, it will eventually fill and QEMU
> > can block attempting to write to it. A difficult bug in avocado
> > tests where the console socket was not being read from caused this
> > hang.
> >
> > warn if a chardev write is blocked for 100ms.
> >
> > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> > ---
> > This is not necessary for the fix but it does trigger in the
> > failing avocado test without the previous patch applied. Maybe
> > it would be helpful?
> >
> > Thanks,
> > Nick
> >
> >  chardev/char.c | 6 ++++++
> >  1 file changed, 6 insertions(+)
> >
> > diff --git a/chardev/char.c b/chardev/char.c
> > index 996a024c7a..7c375e3cc4 100644
> > --- a/chardev/char.c
> > +++ b/chardev/char.c
> > @@ -114,6 +114,8 @@ static int qemu_chr_write_buffer(Chardev *s,
> >  {
> >      ChardevClass *cc = CHARDEV_GET_CLASS(s);
> >      int res = 0;
> > +    int nr_retries = 0;
> > +
> >      *offset = 0;
> >
> >      qemu_mutex_lock(&s->chr_write_lock);
> > @@ -126,6 +128,10 @@ static int qemu_chr_write_buffer(Chardev *s,
> >              } else {
> >                  g_usleep(100);
> >              }
> > +            if (++nr_retries == 1000) { /* 100ms */
> > +                warn_report("Chardev '%s' write blocked for > 100ms, "
> > +                            "socket buffer full?", s->label);
> > +            }
>
> That shouldn't happen, the frontend should poll and only write when it
> can. What is the qemu command being used here?

You can follow it through the thread here

https://lore.kernel.org/qemu-devel/ZVT-bY9YOr69QTPX@redhat.com/

In short, a console device is attached to a socket pair and nothing
ever reads from it. It eventually fills, and writing to it fails
indefinitely here.

It can be reproduced with:

make check-avocado
AVOCADO_TESTS=tests/avocado/reverse_debugging.py:test_ppc64_pseries


> I think this change can be worth for debugging though.
>
> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>

Thanks,
Nick


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/3] tests/avocado: Enable reverse_debugging.py tests in gitlab CI
  2023-11-17  7:35     ` Nicholas Piggin
@ 2023-11-21  8:56       ` Thomas Huth
  2023-11-21  9:14         ` Daniel P. Berrangé
  0 siblings, 1 reply; 21+ messages in thread
From: Thomas Huth @ 2023-11-21  8:56 UTC (permalink / raw)
  To: Nicholas Piggin, qemu-devel, Stefan Hajnoczi
  Cc: Paolo Bonzini, Pavel Dovgalyuk, Cleber Rosa,
	Philippe Mathieu-Daudé, Wainer dos Santos Moschetta,
	Beraldo Leal, Ani Sinha, Daniel P . Berrangé, John Snow,
	qemu-ppc, Alex Bennée, Marc-André Lureau

On 17/11/2023 08.35, Nicholas Piggin wrote:
> On Fri Nov 17, 2023 at 4:11 AM AEST, Thomas Huth wrote:
>> On 16/11/2023 12.53, Nicholas Piggin wrote:
>>> Let's try enable reverse_debugging.py in gitlab CI.
>>>
>>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>>> ---
>>> Maybe we could try this again at some point? The bug might have been
>>> noticed sooner.
>>>
>>> They only take a couple of seconds to run so should not take too much
>>> overhead. But my gitlab CI pipeline doesn't run the avocado tests for
>>> some reason, so I can't see if it's still causing problems.
>>>
>>> Thanks,
>>> Nick
>>> ---
>>>
>>>    tests/avocado/reverse_debugging.py | 7 -------
>>>    1 file changed, 7 deletions(-)
>>
>> FYI, I gave it a try, and it survived my CI run:
>>
>>    https://gitlab.com/thuth/qemu/-/jobs/5552213972#L403
>>
>> So I went ahead and put it (together with the first patch) in my current
>> pull request, let's see how it goes...
> 
> Great, thank you.

... and here it's failing again (current master branch):

https://gitlab.com/thuth/qemu/-/jobs/5582657378#L404

According to the debug.log in the artifacts, it's failing here:

08:28:32 DEBUG| [    0.230392217,5] OPAL v7.0 starting...

08:28:32 DEBUG| [    0.230674939,7] initial console log level: memory 7, driver 5

08:28:32 DEBUG| [    0.231048494,6] CPU: P9 generation processor (max 4 threads/core)

08:28:32 DEBUG| [
08:28:32 DEBUG| [    0.231412547,7] CPU: Boot CPU PIR is 0x0000 PVR is 0x004e1202

08:28:32 DEBUG| [
08:28:32 ERROR|
08:28:32 ERROR| Reproduced traceback from: /builds/thuth/qemu/build/pyvenv/lib64/python3.8/site-packages/avocado/core/test.py:770
08:28:32 ERROR| Traceback (most recent call last):
08:28:32 ERROR|   File "/builds/thuth/qemu/build/tests/avocado/reverse_debugging.py", line 262, in test_ppc64_powernv
08:28:32 ERROR|     self.reverse_debugging()
08:28:32 ERROR|   File "/builds/thuth/qemu/build/tests/avocado/reverse_debugging.py", line 178, in reverse_debugging
08:28:32 ERROR|     g.cmd(b'c')
08:28:32 ERROR|   File "/builds/thuth/qemu/build/pyvenv/lib64/python3.8/site-packages/avocado/utils/gdb.py", line 783, in cmd
08:28:32 ERROR|     response_payload = self.decode(result)
08:28:32 ERROR|   File "/builds/thuth/qemu/build/pyvenv/lib64/python3.8/site-packages/avocado/utils/gdb.py", line 738, in decode
08:28:32 ERROR|     raise InvalidPacketError
08:28:32 ERROR| avocado.utils.gdb.InvalidPacketError
08:28:32 ERROR|
08:28:32 DEBUG| Local variables:
08:28:32 DEBUG|  -> self <class 'reverse_debugging.ReverseDebugging_ppc64'>: 79-tests/avocado/reverse_debugging.py:ReverseDebugging_ppc64.test_ppc64_powernv
08:28:32 DEBUG| Shutting down VM appliance; timeout=30
08:28:32 DEBUG| Attempting graceful termination
08:28:32 DEBUG| Closing console socket
08:28:32 DEBUG| Politely asking QEMU to terminate

So unless someone has a clue how to fix that, I guess it's
likely best to revert this enablement patch again...

  Thomas



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/3] tests/avocado: Enable reverse_debugging.py tests in gitlab CI
  2023-11-21  8:56       ` Thomas Huth
@ 2023-11-21  9:14         ` Daniel P. Berrangé
  2023-11-21  9:40           ` Thomas Huth
  0 siblings, 1 reply; 21+ messages in thread
From: Daniel P. Berrangé @ 2023-11-21  9:14 UTC (permalink / raw)
  To: Thomas Huth
  Cc: Nicholas Piggin, qemu-devel, Stefan Hajnoczi, Paolo Bonzini,
	Pavel Dovgalyuk, Cleber Rosa, Philippe Mathieu-Daudé,
	Wainer dos Santos Moschetta, Beraldo Leal, Ani Sinha, John Snow,
	qemu-ppc, Alex Bennée, Marc-André Lureau

On Tue, Nov 21, 2023 at 09:56:24AM +0100, Thomas Huth wrote:
> On 17/11/2023 08.35, Nicholas Piggin wrote:
> > On Fri Nov 17, 2023 at 4:11 AM AEST, Thomas Huth wrote:
> > > On 16/11/2023 12.53, Nicholas Piggin wrote:
> > > > Let's try enable reverse_debugging.py in gitlab CI.
> > > > 
> > > > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> > > > ---
> > > > Maybe we could try this again at some point? The bug might have been
> > > > noticed sooner.
> > > > 
> > > > They only take a couple of seconds to run so should not take too much
> > > > overhead. But my gitlab CI pipeline doesn't run the avocado tests for
> > > > some reason, so I can't see if it's still causing problems.
> > > > 
> > > > Thanks,
> > > > Nick
> > > > ---
> > > > 
> > > >    tests/avocado/reverse_debugging.py | 7 -------
> > > >    1 file changed, 7 deletions(-)
> > > 
> > > FYI, I gave it a try, and it survived my CI run:
> > > 
> > >    https://gitlab.com/thuth/qemu/-/jobs/5552213972#L403
> > > 
> > > So I went ahead and put it (together with the first patch) in my current
> > > pull request, let's see how it goes...
> > 
> > Great, thank you.
> 
> ... and here it's failing again (current master branch):
> 
> https://gitlab.com/thuth/qemu/-/jobs/5582657378#L404
> 
> According to the debug.log in the artifacts, it's failing here:
> 
> 08:28:32 DEBUG| [    0.230392217,5] OPAL v7.0 starting...
> 
> 08:28:32 DEBUG| [    0.230674939,7] initial console log level: memory 7, driver 5
> 
> 08:28:32 DEBUG| [    0.231048494,6] CPU: P9 generation processor (max 4 threads/core)
> 
> 08:28:32 DEBUG| [
> 08:28:32 DEBUG| [    0.231412547,7] CPU: Boot CPU PIR is 0x0000 PVR is 0x004e1202
> 
> 08:28:32 DEBUG| [
> 08:28:32 ERROR|
> 08:28:32 ERROR| Reproduced traceback from: /builds/thuth/qemu/build/pyvenv/lib64/python3.8/site-packages/avocado/core/test.py:770
> 08:28:32 ERROR| Traceback (most recent call last):
> 08:28:32 ERROR|   File "/builds/thuth/qemu/build/tests/avocado/reverse_debugging.py", line 262, in test_ppc64_powernv
> 08:28:32 ERROR|     self.reverse_debugging()
> 08:28:32 ERROR|   File "/builds/thuth/qemu/build/tests/avocado/reverse_debugging.py", line 178, in reverse_debugging
> 08:28:32 ERROR|     g.cmd(b'c')
> 08:28:32 ERROR|   File "/builds/thuth/qemu/build/pyvenv/lib64/python3.8/site-packages/avocado/utils/gdb.py", line 783, in cmd
> 08:28:32 ERROR|     response_payload = self.decode(result)
> 08:28:32 ERROR|   File "/builds/thuth/qemu/build/pyvenv/lib64/python3.8/site-packages/avocado/utils/gdb.py", line 738, in decode
> 08:28:32 ERROR|     raise InvalidPacketError
> 08:28:32 ERROR| avocado.utils.gdb.InvalidPacketError
> 08:28:32 ERROR|
> 08:28:32 DEBUG| Local variables:
> 08:28:32 DEBUG|  -> self <class 'reverse_debugging.ReverseDebugging_ppc64'>: 79-tests/avocado/reverse_debugging.py:ReverseDebugging_ppc64.test_ppc64_powernv
> 08:28:32 DEBUG| Shutting down VM appliance; timeout=30
> 08:28:32 DEBUG| Attempting graceful termination
> 08:28:32 DEBUG| Closing console socket
> 08:28:32 DEBUG| Politely asking QEMU to terminate
> 
> So unless someone has a clue how to fix that, I guess it's
> likely best to revert this enablement patch again...

A little further in the log we see

08:28:32 DEBUG| Politely asking QEMU to terminate
08:28:32 DEBUG| --> {
  "execute": "quit"
}
08:28:32 DEBUG| <-- {
  "timestamp": {
    "seconds": 1700555312,
    "microseconds": 86122
  },
  "event": "RESUME"
}
08:28:32 ERROR| Task.Reader: BrokenPipeError: [Errno 32] Broken pipe



With seeing a bad packet from GDB and seeing Broken pipe from QMP,
my impression is that the QEMU process is no longer present, most
likely it has SEGV'd I reckon.

IOW, I think we might well have a genuine bug here, not merely an
unreliable test suite.

None the less, unless someone can guess what the problem is, we'll
need to disable the test to get reliable CI.

A bug should be opened though with the CI logs.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/3] chardev: report blocked write to chardev backend
  2023-11-20 13:35     ` Nicholas Piggin
@ 2023-11-21  9:39       ` Marc-André Lureau
  2023-11-21  9:42         ` Daniel P. Berrangé
  2023-11-21  9:44         ` Thomas Huth
  0 siblings, 2 replies; 21+ messages in thread
From: Marc-André Lureau @ 2023-11-21  9:39 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: qemu-devel, Paolo Bonzini, Pavel Dovgalyuk, Cleber Rosa,
	Philippe Mathieu-Daudé, Wainer dos Santos Moschetta,
	Beraldo Leal, Ani Sinha, Daniel P . Berrangé, Thomas Huth,
	John Snow, qemu-ppc, Alex Bennée

Hi

On Mon, Nov 20, 2023 at 5:36 PM Nicholas Piggin <npiggin@gmail.com> wrote:
>
> On Mon Nov 20, 2023 at 10:06 PM AEST, Marc-André Lureau wrote:
> > Hi
> >
> > On Thu, Nov 16, 2023 at 3:54 PM Nicholas Piggin <npiggin@gmail.com> wrote:
> > >
> > > If a chardev socket is not read, it will eventually fill and QEMU
> > > can block attempting to write to it. A difficult bug in avocado
> > > tests where the console socket was not being read from caused this
> > > hang.
> > >
> > > warn if a chardev write is blocked for 100ms.
> > >
> > > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> > > ---
> > > This is not necessary for the fix but it does trigger in the
> > > failing avocado test without the previous patch applied. Maybe
> > > it would be helpful?
> > >
> > > Thanks,
> > > Nick
> > >
> > >  chardev/char.c | 6 ++++++
> > >  1 file changed, 6 insertions(+)
> > >
> > > diff --git a/chardev/char.c b/chardev/char.c
> > > index 996a024c7a..7c375e3cc4 100644
> > > --- a/chardev/char.c
> > > +++ b/chardev/char.c
> > > @@ -114,6 +114,8 @@ static int qemu_chr_write_buffer(Chardev *s,
> > >  {
> > >      ChardevClass *cc = CHARDEV_GET_CLASS(s);
> > >      int res = 0;
> > > +    int nr_retries = 0;
> > > +
> > >      *offset = 0;
> > >
> > >      qemu_mutex_lock(&s->chr_write_lock);
> > > @@ -126,6 +128,10 @@ static int qemu_chr_write_buffer(Chardev *s,
> > >              } else {
> > >                  g_usleep(100);
> > >              }
> > > +            if (++nr_retries == 1000) { /* 100ms */
> > > +                warn_report("Chardev '%s' write blocked for > 100ms, "
> > > +                            "socket buffer full?", s->label);
> > > +            }
> >
> > That shouldn't happen, the frontend should poll and only write when it
> > can. What is the qemu command being used here?
>
> You can follow it through the thread here
>
> https://lore.kernel.org/qemu-devel/ZVT-bY9YOr69QTPX@redhat.com/
>
> In short, a console device is attached to a socket pair and nothing
> ever reads from it. It eventually fills, and writing to it fails
> indefinitely here.
>
> It can be reproduced with:
>
> make check-avocado
> AVOCADO_TESTS=tests/avocado/reverse_debugging.py:test_ppc64_pseries
>
>

How reliably? I tried 10/10.

> > I think this change can be worth for debugging though.
> >
> > Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
>
> Thanks,
> Nick
>



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/3] tests/avocado: Enable reverse_debugging.py tests in gitlab CI
  2023-11-21  9:14         ` Daniel P. Berrangé
@ 2023-11-21  9:40           ` Thomas Huth
  0 siblings, 0 replies; 21+ messages in thread
From: Thomas Huth @ 2023-11-21  9:40 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Nicholas Piggin, qemu-devel, Stefan Hajnoczi, Paolo Bonzini,
	Pavel Dovgalyuk, Cleber Rosa, Philippe Mathieu-Daudé,
	Wainer dos Santos Moschetta, Beraldo Leal, Ani Sinha, John Snow,
	qemu-ppc, Alex Bennée, Marc-André Lureau

On 21/11/2023 10.14, Daniel P. Berrangé wrote:
> On Tue, Nov 21, 2023 at 09:56:24AM +0100, Thomas Huth wrote:
>> On 17/11/2023 08.35, Nicholas Piggin wrote:
>>> On Fri Nov 17, 2023 at 4:11 AM AEST, Thomas Huth wrote:
>>>> On 16/11/2023 12.53, Nicholas Piggin wrote:
>>>>> Let's try enable reverse_debugging.py in gitlab CI.
>>>>>
>>>>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>>>>> ---
>>>>> Maybe we could try this again at some point? The bug might have been
>>>>> noticed sooner.
>>>>>
>>>>> They only take a couple of seconds to run so should not take too much
>>>>> overhead. But my gitlab CI pipeline doesn't run the avocado tests for
>>>>> some reason, so I can't see if it's still causing problems.
...
>>>> FYI, I gave it a try, and it survived my CI run:
>>>>
>>>>     https://gitlab.com/thuth/qemu/-/jobs/5552213972#L403
>>>>
>>>> So I went ahead and put it (together with the first patch) in my current
>>>> pull request, let's see how it goes...
>>>
>>> Great, thank you.
>>
>> ... and here it's failing again (current master branch):
>>
>> https://gitlab.com/thuth/qemu/-/jobs/5582657378#L404
>>
>> According to the debug.log in the artifacts, it's failing here:
...
> 08:28:32 ERROR| Task.Reader: BrokenPipeError: [Errno 32] Broken pipe
> 
> With seeing a bad packet from GDB and seeing Broken pipe from QMP,
> my impression is that the QEMU process is no longer present, most
> likely it has SEGV'd I reckon.
> 
> IOW, I think we might well have a genuine bug here, not merely an
> unreliable test suite.
> 
> None the less, unless someone can guess what the problem is, we'll
> need to disable the test to get reliable CI.

I'll sent a patch to revert the commit.

> A bug should be opened though with the CI logs.

Done: https://gitlab.com/qemu-project/qemu/-/issues/1992

  Thomas




^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/3] chardev: report blocked write to chardev backend
  2023-11-21  9:39       ` Marc-André Lureau
@ 2023-11-21  9:42         ` Daniel P. Berrangé
  2023-11-21  9:44         ` Thomas Huth
  1 sibling, 0 replies; 21+ messages in thread
From: Daniel P. Berrangé @ 2023-11-21  9:42 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Nicholas Piggin, qemu-devel, Paolo Bonzini, Pavel Dovgalyuk,
	Cleber Rosa, Philippe Mathieu-Daudé,
	Wainer dos Santos Moschetta, Beraldo Leal, Ani Sinha, Thomas Huth,
	John Snow, qemu-ppc, Alex Bennée

On Tue, Nov 21, 2023 at 01:39:03PM +0400, Marc-André Lureau wrote:
> Hi
> 
> On Mon, Nov 20, 2023 at 5:36 PM Nicholas Piggin <npiggin@gmail.com> wrote:
> >
> > On Mon Nov 20, 2023 at 10:06 PM AEST, Marc-André Lureau wrote:
> > > Hi
> > >
> > > On Thu, Nov 16, 2023 at 3:54 PM Nicholas Piggin <npiggin@gmail.com> wrote:
> > > >
> > > > If a chardev socket is not read, it will eventually fill and QEMU
> > > > can block attempting to write to it. A difficult bug in avocado
> > > > tests where the console socket was not being read from caused this
> > > > hang.
> > > >
> > > > warn if a chardev write is blocked for 100ms.
> > > >
> > > > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> > > > ---
> > > > This is not necessary for the fix but it does trigger in the
> > > > failing avocado test without the previous patch applied. Maybe
> > > > it would be helpful?
> > > >
> > > > Thanks,
> > > > Nick
> > > >
> > > >  chardev/char.c | 6 ++++++
> > > >  1 file changed, 6 insertions(+)
> > > >
> > > > diff --git a/chardev/char.c b/chardev/char.c
> > > > index 996a024c7a..7c375e3cc4 100644
> > > > --- a/chardev/char.c
> > > > +++ b/chardev/char.c
> > > > @@ -114,6 +114,8 @@ static int qemu_chr_write_buffer(Chardev *s,
> > > >  {
> > > >      ChardevClass *cc = CHARDEV_GET_CLASS(s);
> > > >      int res = 0;
> > > > +    int nr_retries = 0;
> > > > +
> > > >      *offset = 0;
> > > >
> > > >      qemu_mutex_lock(&s->chr_write_lock);
> > > > @@ -126,6 +128,10 @@ static int qemu_chr_write_buffer(Chardev *s,
> > > >              } else {
> > > >                  g_usleep(100);
> > > >              }
> > > > +            if (++nr_retries == 1000) { /* 100ms */
> > > > +                warn_report("Chardev '%s' write blocked for > 100ms, "
> > > > +                            "socket buffer full?", s->label);
> > > > +            }
> > >
> > > That shouldn't happen, the frontend should poll and only write when it
> > > can. What is the qemu command being used here?
> >
> > You can follow it through the thread here
> >
> > https://lore.kernel.org/qemu-devel/ZVT-bY9YOr69QTPX@redhat.com/
> >
> > In short, a console device is attached to a socket pair and nothing
> > ever reads from it. It eventually fills, and writing to it fails
> > indefinitely here.
> >
> > It can be reproduced with:
> >
> > make check-avocado
> > AVOCADO_TESTS=tests/avocado/reverse_debugging.py:test_ppc64_pseries
> >
> >
> 
> How reliably? I tried 10/10.

It reproduced 100% reliably, but note git master is fixed now, so to
test you'll need to revert cd43f00524070c0267613acc98a153dba0e398d9

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/3] chardev: report blocked write to chardev backend
  2023-11-21  9:39       ` Marc-André Lureau
  2023-11-21  9:42         ` Daniel P. Berrangé
@ 2023-11-21  9:44         ` Thomas Huth
  2023-11-21 11:47           ` Marc-André Lureau
  1 sibling, 1 reply; 21+ messages in thread
From: Thomas Huth @ 2023-11-21  9:44 UTC (permalink / raw)
  To: Marc-André Lureau, Nicholas Piggin
  Cc: qemu-devel, Paolo Bonzini, Pavel Dovgalyuk, Cleber Rosa,
	Philippe Mathieu-Daudé, Wainer dos Santos Moschetta,
	Beraldo Leal, Ani Sinha, Daniel P . Berrangé, John Snow,
	qemu-ppc, Alex Bennée

On 21/11/2023 10.39, Marc-André Lureau wrote:
> Hi
> 
> On Mon, Nov 20, 2023 at 5:36 PM Nicholas Piggin <npiggin@gmail.com> wrote:
>>
>> On Mon Nov 20, 2023 at 10:06 PM AEST, Marc-André Lureau wrote:
>>> Hi
>>>
>>> On Thu, Nov 16, 2023 at 3:54 PM Nicholas Piggin <npiggin@gmail.com> wrote:
>>>>
>>>> If a chardev socket is not read, it will eventually fill and QEMU
>>>> can block attempting to write to it. A difficult bug in avocado
>>>> tests where the console socket was not being read from caused this
>>>> hang.
>>>>
>>>> warn if a chardev write is blocked for 100ms.
>>>>
>>>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>>>> ---
>>>> This is not necessary for the fix but it does trigger in the
>>>> failing avocado test without the previous patch applied. Maybe
>>>> it would be helpful?
>>>>
>>>> Thanks,
>>>> Nick
>>>>
>>>>   chardev/char.c | 6 ++++++
>>>>   1 file changed, 6 insertions(+)
>>>>
>>>> diff --git a/chardev/char.c b/chardev/char.c
>>>> index 996a024c7a..7c375e3cc4 100644
>>>> --- a/chardev/char.c
>>>> +++ b/chardev/char.c
>>>> @@ -114,6 +114,8 @@ static int qemu_chr_write_buffer(Chardev *s,
>>>>   {
>>>>       ChardevClass *cc = CHARDEV_GET_CLASS(s);
>>>>       int res = 0;
>>>> +    int nr_retries = 0;
>>>> +
>>>>       *offset = 0;
>>>>
>>>>       qemu_mutex_lock(&s->chr_write_lock);
>>>> @@ -126,6 +128,10 @@ static int qemu_chr_write_buffer(Chardev *s,
>>>>               } else {
>>>>                   g_usleep(100);
>>>>               }
>>>> +            if (++nr_retries == 1000) { /* 100ms */
>>>> +                warn_report("Chardev '%s' write blocked for > 100ms, "
>>>> +                            "socket buffer full?", s->label);
>>>> +            }
>>>
>>> That shouldn't happen, the frontend should poll and only write when it
>>> can. What is the qemu command being used here?
>>
>> You can follow it through the thread here
>>
>> https://lore.kernel.org/qemu-devel/ZVT-bY9YOr69QTPX@redhat.com/
>>
>> In short, a console device is attached to a socket pair and nothing
>> ever reads from it. It eventually fills, and writing to it fails
>> indefinitely here.
>>
>> It can be reproduced with:
>>
>> make check-avocado
>> AVOCADO_TESTS=tests/avocado/reverse_debugging.py:test_ppc64_pseries
>>
>>
> 
> How reliably? I tried 10/10.

It used to fail for me every time I tried - but the fix has already been 
merged yesterday (commit cd43f00524070c026), so if you updated today, you'll 
see the test passing again.

  Thomas




^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/3] chardev: report blocked write to chardev backend
  2023-11-21  9:44         ` Thomas Huth
@ 2023-11-21 11:47           ` Marc-André Lureau
  2023-11-22  9:55             ` Alex Bennée
  2023-11-22 10:38             ` Thomas Huth
  0 siblings, 2 replies; 21+ messages in thread
From: Marc-André Lureau @ 2023-11-21 11:47 UTC (permalink / raw)
  To: Thomas Huth
  Cc: Nicholas Piggin, qemu-devel, Paolo Bonzini, Pavel Dovgalyuk,
	Cleber Rosa, Philippe Mathieu-Daudé,
	Wainer dos Santos Moschetta, Beraldo Leal, Ani Sinha,
	Daniel P . Berrangé, John Snow, qemu-ppc, Alex Bennée

Hi

On Tue, Nov 21, 2023 at 1:45 PM Thomas Huth <thuth@redhat.com> wrote:
>
> On 21/11/2023 10.39, Marc-André Lureau wrote:
> > Hi
> >
> > On Mon, Nov 20, 2023 at 5:36 PM Nicholas Piggin <npiggin@gmail.com> wrote:
> >>
> >> On Mon Nov 20, 2023 at 10:06 PM AEST, Marc-André Lureau wrote:
> >>> Hi
> >>>
> >>> On Thu, Nov 16, 2023 at 3:54 PM Nicholas Piggin <npiggin@gmail.com> wrote:
> >>>>
> >>>> If a chardev socket is not read, it will eventually fill and QEMU
> >>>> can block attempting to write to it. A difficult bug in avocado
> >>>> tests where the console socket was not being read from caused this
> >>>> hang.
> >>>>
> >>>> warn if a chardev write is blocked for 100ms.
> >>>>
> >>>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> >>>> ---
> >>>> This is not necessary for the fix but it does trigger in the
> >>>> failing avocado test without the previous patch applied. Maybe
> >>>> it would be helpful?
> >>>>
> >>>> Thanks,
> >>>> Nick
> >>>>
> >>>>   chardev/char.c | 6 ++++++
> >>>>   1 file changed, 6 insertions(+)
> >>>>
> >>>> diff --git a/chardev/char.c b/chardev/char.c
> >>>> index 996a024c7a..7c375e3cc4 100644
> >>>> --- a/chardev/char.c
> >>>> +++ b/chardev/char.c
> >>>> @@ -114,6 +114,8 @@ static int qemu_chr_write_buffer(Chardev *s,
> >>>>   {
> >>>>       ChardevClass *cc = CHARDEV_GET_CLASS(s);
> >>>>       int res = 0;
> >>>> +    int nr_retries = 0;
> >>>> +
> >>>>       *offset = 0;
> >>>>
> >>>>       qemu_mutex_lock(&s->chr_write_lock);
> >>>> @@ -126,6 +128,10 @@ static int qemu_chr_write_buffer(Chardev *s,
> >>>>               } else {
> >>>>                   g_usleep(100);
> >>>>               }
> >>>> +            if (++nr_retries == 1000) { /* 100ms */
> >>>> +                warn_report("Chardev '%s' write blocked for > 100ms, "
> >>>> +                            "socket buffer full?", s->label);
> >>>> +            }
> >>>
> >>> That shouldn't happen, the frontend should poll and only write when it
> >>> can. What is the qemu command being used here?
> >>
> >> You can follow it through the thread here
> >>
> >> https://lore.kernel.org/qemu-devel/ZVT-bY9YOr69QTPX@redhat.com/
> >>
> >> In short, a console device is attached to a socket pair and nothing
> >> ever reads from it. It eventually fills, and writing to it fails
> >> indefinitely here.
> >>
> >> It can be reproduced with:
> >>
> >> make check-avocado
> >> AVOCADO_TESTS=tests/avocado/reverse_debugging.py:test_ppc64_pseries
> >>
> >>
> >
> > How reliably? I tried 10/10.
>
> It used to fail for me every time I tried - but the fix has already been
> merged yesterday (commit cd43f00524070c026), so if you updated today, you'll
> see the test passing again.

Ok so the "frontend" is spapr-vty and there:

void vty_putchars(SpaprVioDevice *sdev, uint8_t *buf, int len)
{
    SpaprVioVty *dev = VIO_SPAPR_VTY_DEVICE(sdev);

    /* XXX this blocks entire thread. Rewrite to use
     * qemu_chr_fe_write and background I/O callbacks */
    qemu_chr_fe_write_all(&dev->chardev, buf, len);
}

(grep "XXX this blocks", we have a lot...)

Can H_PUT_TERM_CHAR return the number of bytes written?

Is there a way to tell the guest the console is ready to accept more bytes?



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/3] chardev: report blocked write to chardev backend
  2023-11-21 11:47           ` Marc-André Lureau
@ 2023-11-22  9:55             ` Alex Bennée
  2023-11-22 10:38             ` Thomas Huth
  1 sibling, 0 replies; 21+ messages in thread
From: Alex Bennée @ 2023-11-22  9:55 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Thomas Huth, Nicholas Piggin, qemu-devel, Paolo Bonzini,
	Pavel Dovgalyuk, Cleber Rosa, Philippe Mathieu-Daudé,
	Wainer dos Santos Moschetta, Beraldo Leal, Ani Sinha,
	Daniel P . Berrangé, John Snow, qemu-ppc

Marc-André Lureau <marcandre.lureau@redhat.com> writes:

> Hi
>
> On Tue, Nov 21, 2023 at 1:45 PM Thomas Huth <thuth@redhat.com> wrote:
>>
>> On 21/11/2023 10.39, Marc-André Lureau wrote:
>> > Hi
>> >
>> > On Mon, Nov 20, 2023 at 5:36 PM Nicholas Piggin <npiggin@gmail.com> wrote:
>> >>
>> >> On Mon Nov 20, 2023 at 10:06 PM AEST, Marc-André Lureau wrote:
>> >>> Hi
>> >>>
>> >>> On Thu, Nov 16, 2023 at 3:54 PM Nicholas Piggin <npiggin@gmail.com> wrote:
>> >>>>
>> >>>> If a chardev socket is not read, it will eventually fill and QEMU
>> >>>> can block attempting to write to it. A difficult bug in avocado
>> >>>> tests where the console socket was not being read from caused this
>> >>>> hang.
>> >>>>
>> >>>> warn if a chardev write is blocked for 100ms.
>> >>>>
>> >>>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
<snip>
>> >>>> index 996a024c7a..7c375e3cc4 100644
>> >>>> --- a/chardev/char.c
>> >>>> +++ b/chardev/char.c
>> >>>> @@ -114,6 +114,8 @@ static int qemu_chr_write_buffer(Chardev *s,
>> >>>>   {
>> >>>>       ChardevClass *cc = CHARDEV_GET_CLASS(s);
>> >>>>       int res = 0;
>> >>>> +    int nr_retries = 0;
>> >>>> +
>> >>>>       *offset = 0;
>> >>>>
>> >>>>       qemu_mutex_lock(&s->chr_write_lock);
>> >>>> @@ -126,6 +128,10 @@ static int qemu_chr_write_buffer(Chardev *s,
>> >>>>               } else {
>> >>>>                   g_usleep(100);
>> >>>>               }
>> >>>> +            if (++nr_retries == 1000) { /* 100ms */
>> >>>> +                warn_report("Chardev '%s' write blocked for > 100ms, "
>> >>>> +                            "socket buffer full?", s->label);
>> >>>> +            }
>> >>>
>> >>> That shouldn't happen, the frontend should poll and only write when it
>> >>> can. What is the qemu command being used here?
<snip>
>
> Ok so the "frontend" is spapr-vty and there:
>
> void vty_putchars(SpaprVioDevice *sdev, uint8_t *buf, int len)
> {
>     SpaprVioVty *dev = VIO_SPAPR_VTY_DEVICE(sdev);
>
>     /* XXX this blocks entire thread. Rewrite to use
>      * qemu_chr_fe_write and background I/O callbacks */
>     qemu_chr_fe_write_all(&dev->chardev, buf, len);
> }
>
> (grep "XXX this blocks", we have a lot...)
>
> Can H_PUT_TERM_CHAR return the number of bytes written?
>
> Is there a way to tell the guest the console is ready to accept more bytes?

See also:

  Message-ID: <20231109192814.95977-1-philmd@linaro.org>
  Date: Thu,  9 Nov 2023 20:28:04 +0100
  Subject: [PATCH-for-8.2 v4 00/10] hw/char/pl011: Implement TX (async) FIFO to avoid blocking the main loop
  From: =?UTF-8?q?Philippe=20Mathieu-Daud=C3=A9?= <philmd@linaro.org>

Although it didn't make it into 8.2. I was hoping it would be a template
for fixing up the other cases.

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/3] chardev: report blocked write to chardev backend
  2023-11-21 11:47           ` Marc-André Lureau
  2023-11-22  9:55             ` Alex Bennée
@ 2023-11-22 10:38             ` Thomas Huth
  2023-11-22 10:42               ` Daniel P. Berrangé
  1 sibling, 1 reply; 21+ messages in thread
From: Thomas Huth @ 2023-11-22 10:38 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Nicholas Piggin, qemu-devel, Paolo Bonzini, Pavel Dovgalyuk,
	Cleber Rosa, Philippe Mathieu-Daudé,
	Wainer dos Santos Moschetta, Beraldo Leal, Ani Sinha,
	Daniel P . Berrangé, John Snow, qemu-ppc, Alex Bennée

On 21/11/2023 12.47, Marc-André Lureau wrote:
> Hi
> 
> On Tue, Nov 21, 2023 at 1:45 PM Thomas Huth <thuth@redhat.com> wrote:
>>
>> On 21/11/2023 10.39, Marc-André Lureau wrote:
>>> Hi
>>>
>>> On Mon, Nov 20, 2023 at 5:36 PM Nicholas Piggin <npiggin@gmail.com> wrote:
>>>>
>>>> On Mon Nov 20, 2023 at 10:06 PM AEST, Marc-André Lureau wrote:
>>>>> Hi
>>>>>
>>>>> On Thu, Nov 16, 2023 at 3:54 PM Nicholas Piggin <npiggin@gmail.com> wrote:
>>>>>>
>>>>>> If a chardev socket is not read, it will eventually fill and QEMU
>>>>>> can block attempting to write to it. A difficult bug in avocado
>>>>>> tests where the console socket was not being read from caused this
>>>>>> hang.
>>>>>>
>>>>>> warn if a chardev write is blocked for 100ms.
>>>>>>
>>>>>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>>>>>> ---
>>>>>> This is not necessary for the fix but it does trigger in the
>>>>>> failing avocado test without the previous patch applied. Maybe
>>>>>> it would be helpful?
>>>>>>
>>>>>> Thanks,
>>>>>> Nick
>>>>>>
>>>>>>    chardev/char.c | 6 ++++++
>>>>>>    1 file changed, 6 insertions(+)
>>>>>>
>>>>>> diff --git a/chardev/char.c b/chardev/char.c
>>>>>> index 996a024c7a..7c375e3cc4 100644
>>>>>> --- a/chardev/char.c
>>>>>> +++ b/chardev/char.c
>>>>>> @@ -114,6 +114,8 @@ static int qemu_chr_write_buffer(Chardev *s,
>>>>>>    {
>>>>>>        ChardevClass *cc = CHARDEV_GET_CLASS(s);
>>>>>>        int res = 0;
>>>>>> +    int nr_retries = 0;
>>>>>> +
>>>>>>        *offset = 0;
>>>>>>
>>>>>>        qemu_mutex_lock(&s->chr_write_lock);
>>>>>> @@ -126,6 +128,10 @@ static int qemu_chr_write_buffer(Chardev *s,
>>>>>>                } else {
>>>>>>                    g_usleep(100);
>>>>>>                }
>>>>>> +            if (++nr_retries == 1000) { /* 100ms */
>>>>>> +                warn_report("Chardev '%s' write blocked for > 100ms, "
>>>>>> +                            "socket buffer full?", s->label);
>>>>>> +            }
>>>>>
>>>>> That shouldn't happen, the frontend should poll and only write when it
>>>>> can. What is the qemu command being used here?
>>>>
>>>> You can follow it through the thread here
>>>>
>>>> https://lore.kernel.org/qemu-devel/ZVT-bY9YOr69QTPX@redhat.com/
>>>>
>>>> In short, a console device is attached to a socket pair and nothing
>>>> ever reads from it. It eventually fills, and writing to it fails
>>>> indefinitely here.
>>>>
>>>> It can be reproduced with:
>>>>
>>>> make check-avocado
>>>> AVOCADO_TESTS=tests/avocado/reverse_debugging.py:test_ppc64_pseries
>>>>
>>>>
>>>
>>> How reliably? I tried 10/10.
>>
>> It used to fail for me every time I tried - but the fix has already been
>> merged yesterday (commit cd43f00524070c026), so if you updated today, you'll
>> see the test passing again.
> 
> Ok so the "frontend" is spapr-vty and there:
> 
> void vty_putchars(SpaprVioDevice *sdev, uint8_t *buf, int len)
> {
>      SpaprVioVty *dev = VIO_SPAPR_VTY_DEVICE(sdev);
> 
>      /* XXX this blocks entire thread. Rewrite to use
>       * qemu_chr_fe_write and background I/O callbacks */
>      qemu_chr_fe_write_all(&dev->chardev, buf, len);
> }
> 
> (grep "XXX this blocks", we have a lot...)
> 
> Can H_PUT_TERM_CHAR return the number of bytes written?

You can find the definition of the hypercall in the LoPAPR spec:

  https://elinux.org/images/a/a4/LoPAPR_DRAFT_v11_24March2016.pdf

... and if I get it right, it does not have a way to tell the guest the 
amount of accepted characters. But it could return H_BUSY if it is not able 
to enqueue all characters at once. As far as I can see, this will make the 
guest spin until it can finally send out the characters... not sure whether 
that's so much better...?

  Thomas





^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/3] chardev: report blocked write to chardev backend
  2023-11-22 10:38             ` Thomas Huth
@ 2023-11-22 10:42               ` Daniel P. Berrangé
  0 siblings, 0 replies; 21+ messages in thread
From: Daniel P. Berrangé @ 2023-11-22 10:42 UTC (permalink / raw)
  To: Thomas Huth
  Cc: Marc-André Lureau, Nicholas Piggin, qemu-devel,
	Paolo Bonzini, Pavel Dovgalyuk, Cleber Rosa,
	Philippe Mathieu-Daudé, Wainer dos Santos Moschetta,
	Beraldo Leal, Ani Sinha, John Snow, qemu-ppc, Alex Bennée

On Wed, Nov 22, 2023 at 11:38:28AM +0100, Thomas Huth wrote:
> On 21/11/2023 12.47, Marc-André Lureau wrote:
> > Hi
> > 
> > On Tue, Nov 21, 2023 at 1:45 PM Thomas Huth <thuth@redhat.com> wrote:
> > > 
> > > On 21/11/2023 10.39, Marc-André Lureau wrote:
> > > > Hi
> > > > 
> > > > On Mon, Nov 20, 2023 at 5:36 PM Nicholas Piggin <npiggin@gmail.com> wrote:
> > > > > 
> > > > > On Mon Nov 20, 2023 at 10:06 PM AEST, Marc-André Lureau wrote:
> > > > > > Hi
> > > > > > 
> > > > > > On Thu, Nov 16, 2023 at 3:54 PM Nicholas Piggin <npiggin@gmail.com> wrote:
> > > > > > > 
> > > > > > > If a chardev socket is not read, it will eventually fill and QEMU
> > > > > > > can block attempting to write to it. A difficult bug in avocado
> > > > > > > tests where the console socket was not being read from caused this
> > > > > > > hang.
> > > > > > > 
> > > > > > > warn if a chardev write is blocked for 100ms.
> > > > > > > 
> > > > > > > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> > > > > > > ---
> > > > > > > This is not necessary for the fix but it does trigger in the
> > > > > > > failing avocado test without the previous patch applied. Maybe
> > > > > > > it would be helpful?
> > > > > > > 
> > > > > > > Thanks,
> > > > > > > Nick
> > > > > > > 
> > > > > > >    chardev/char.c | 6 ++++++
> > > > > > >    1 file changed, 6 insertions(+)
> > > > > > > 
> > > > > > > diff --git a/chardev/char.c b/chardev/char.c
> > > > > > > index 996a024c7a..7c375e3cc4 100644
> > > > > > > --- a/chardev/char.c
> > > > > > > +++ b/chardev/char.c
> > > > > > > @@ -114,6 +114,8 @@ static int qemu_chr_write_buffer(Chardev *s,
> > > > > > >    {
> > > > > > >        ChardevClass *cc = CHARDEV_GET_CLASS(s);
> > > > > > >        int res = 0;
> > > > > > > +    int nr_retries = 0;
> > > > > > > +
> > > > > > >        *offset = 0;
> > > > > > > 
> > > > > > >        qemu_mutex_lock(&s->chr_write_lock);
> > > > > > > @@ -126,6 +128,10 @@ static int qemu_chr_write_buffer(Chardev *s,
> > > > > > >                } else {
> > > > > > >                    g_usleep(100);
> > > > > > >                }
> > > > > > > +            if (++nr_retries == 1000) { /* 100ms */
> > > > > > > +                warn_report("Chardev '%s' write blocked for > 100ms, "
> > > > > > > +                            "socket buffer full?", s->label);
> > > > > > > +            }
> > > > > > 
> > > > > > That shouldn't happen, the frontend should poll and only write when it
> > > > > > can. What is the qemu command being used here?
> > > > > 
> > > > > You can follow it through the thread here
> > > > > 
> > > > > https://lore.kernel.org/qemu-devel/ZVT-bY9YOr69QTPX@redhat.com/
> > > > > 
> > > > > In short, a console device is attached to a socket pair and nothing
> > > > > ever reads from it. It eventually fills, and writing to it fails
> > > > > indefinitely here.
> > > > > 
> > > > > It can be reproduced with:
> > > > > 
> > > > > make check-avocado
> > > > > AVOCADO_TESTS=tests/avocado/reverse_debugging.py:test_ppc64_pseries
> > > > > 
> > > > > 
> > > > 
> > > > How reliably? I tried 10/10.
> > > 
> > > It used to fail for me every time I tried - but the fix has already been
> > > merged yesterday (commit cd43f00524070c026), so if you updated today, you'll
> > > see the test passing again.
> > 
> > Ok so the "frontend" is spapr-vty and there:
> > 
> > void vty_putchars(SpaprVioDevice *sdev, uint8_t *buf, int len)
> > {
> >      SpaprVioVty *dev = VIO_SPAPR_VTY_DEVICE(sdev);
> > 
> >      /* XXX this blocks entire thread. Rewrite to use
> >       * qemu_chr_fe_write and background I/O callbacks */
> >      qemu_chr_fe_write_all(&dev->chardev, buf, len);
> > }
> > 
> > (grep "XXX this blocks", we have a lot...)
> > 
> > Can H_PUT_TERM_CHAR return the number of bytes written?
> 
> You can find the definition of the hypercall in the LoPAPR spec:
> 
>  https://elinux.org/images/a/a4/LoPAPR_DRAFT_v11_24March2016.pdf
> 
> ... and if I get it right, it does not have a way to tell the guest the
> amount of accepted characters. But it could return H_BUSY if it is not able
> to enqueue all characters at once. As far as I can see, this will make the
> guest spin until it can finally send out the characters... not sure whether
> that's so much better...?

If the rest of the guest can get on with useful work that's better. If we
block in QEMU, the entire guest hardware emulation is blocked so nothing
can make progress if it exits from vCPU run context.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2023-11-22 10:43 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-11-16 11:53 [PATCH 1/3] tests/avocado: reverse_debugging drain console to prevent hang Nicholas Piggin
2023-11-16 11:53 ` [PATCH 2/3] chardev: report blocked write to chardev backend Nicholas Piggin
2023-11-20 12:06   ` Marc-André Lureau
2023-11-20 13:35     ` Nicholas Piggin
2023-11-21  9:39       ` Marc-André Lureau
2023-11-21  9:42         ` Daniel P. Berrangé
2023-11-21  9:44         ` Thomas Huth
2023-11-21 11:47           ` Marc-André Lureau
2023-11-22  9:55             ` Alex Bennée
2023-11-22 10:38             ` Thomas Huth
2023-11-22 10:42               ` Daniel P. Berrangé
2023-11-16 11:53 ` [PATCH 3/3] tests/avocado: Enable reverse_debugging.py tests in gitlab CI Nicholas Piggin
2023-11-16 12:33   ` Thomas Huth
2023-11-16 18:11   ` Thomas Huth
2023-11-17  7:35     ` Nicholas Piggin
2023-11-21  8:56       ` Thomas Huth
2023-11-21  9:14         ` Daniel P. Berrangé
2023-11-21  9:40           ` Thomas Huth
2023-11-16 13:26 ` [PATCH 1/3] tests/avocado: reverse_debugging drain console to prevent hang Ani Sinha
2023-11-16 13:31   ` Ani Sinha
2023-11-16 13:39     ` Ani Sinha

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).