All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Nicholas Piggin" <npiggin@gmail.com>
To: "John Snow" <jsnow@redhat.com>,
	"Daniel P. Berrangé" <berrange@redhat.com>
Cc: "Thomas Huth" <thuth@redhat.com>,
	"Ani Sinha" <anisinha@redhat.com>,
	"Wainer dos Santos Moschetta" <wainersm@redhat.com>,
	"Beraldo Leal" <bleal@redhat.com>,
	"Cleber Rosa" <crosa@redhat.com>,
	"Pavel Dovgalyuk" <pavel.dovgaluk@ispras.ru>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	qemu-ppc@nongnu.org, qemu-devel@nongnu.org,
	"Alex Bennée" <alex.bennee@linaro.org>,
	"Philippe Mathieu-Daudé" <philmd@linaro.org>
Subject: Re: [PATCH] tests/avocado/reverse_debugging: Disable the ppc64 tests by default
Date: Thu, 23 Nov 2023 12:04:24 +1000	[thread overview]
Message-ID: <CX5TMOWS9X9Y.366BHVH1CM2OQ@wheely> (raw)
In-Reply-To: <CAFn=p-aDO_fZOsiBMdHhn6GP3ZadCrUAN4=C6o4d95UVMo3vOA@mail.gmail.com>

On Tue Nov 21, 2023 at 5:18 AM AEST, John Snow wrote:
> On Wed, Nov 15, 2023 at 12:23 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
> >
> > On Wed, Nov 15, 2023 at 01:14:53PM +0000, Daniel P. Berrangé wrote:
> > > On Wed, Nov 15, 2023 at 07:23:01AM +0100, Thomas Huth wrote:
> > > > On 15/11/2023 02.15, Nicholas Piggin wrote:
> > > > > On Wed Nov 15, 2023 at 4:29 AM AEST, Thomas Huth wrote:
> > > > > > On 14/11/2023 17.37, Philippe Mathieu-Daudé wrote:
> > > > > > > On 14/11/23 17:31, Thomas Huth wrote:
> > > > > > > > The tests seem currently to be broken. Disable them by default
> > > > > > > > until someone fixes them.
> > > > > > > >
> > > > > > > > Signed-off-by: Thomas Huth <thuth@redhat.com>
> > > > > > > > ---
> > > > > > > >    tests/avocado/reverse_debugging.py | 7 ++++---
> > > > > > > >    1 file changed, 4 insertions(+), 3 deletions(-)
> > > > > > >
> > > > > > > Similarly, I suspect https://gitlab.com/qemu-project/qemu/-/issues/1961
> > > > > > > which has a fix ready:
> > > > > > > https://lore.kernel.org/qemu-devel/20231110170831.185001-1-richard.henderson@linaro.org/
> > > > > > >
> > > > > > > Maybe wait the fix gets in first?
> > > > > >
> > > > > > No, I applied Richard's patch, but the problem persists. Does this test
> > > > > > still work for you?
> > > > >
> > > > > I bisected it to 1d4796cd008373 ("python/machine: use socketpair() for
> > > > > console connections"),
> > > >
> > > > Maybe John (who wrote that commit) can help?
> > >
> > > I find it hard to believe this commit is a direct root cause of the
> > > problem since all it does is change the QEMU startup sequence so that
> > > instead of QEMU listening for a monitor connection, it is given a
> > > pre-opened monitor connection.
> > >
> > > At the very most that should affect the startup timing a little.
> > >
> > > I notice all the reverse debugging tests have a skip on gitlab
> > > with a comment:
> > >
> > >     # unidentified gitlab timeout problem
> > >
> > > this makes be suspicious that John's patch has merely made this
> > > (henceforth undiagnosed) timeout more likely to ocurr.
> >
> > After an absolutely horrendous hours long debugging session I think
> > I figured out the problem. The QEMU process is blocking in
> >
> >     qemu_chr_write_buffer
> >
> > spinning in the loop on EAGAIN.
> >
> > The Python  Machine() class has passed one of a pre-created socketpair
> > FDs for the serial port chardev. The guest is trying to write to this
> > and blocking.  Nothing in the Machine() class is reading from the
> > other end of the serial port console.
> >
> >
> > Before John's change, the serial port uses a chardev in server mode
> > and crucially  'wait=off', and the Machine() class never opened the
> > console socket unless the test case wanted to read from it.
> >
> > IOW, QEMU had a background job setting there waiting for a connection
> > that would never come.
> >
> > As a result when QEMU started executing the guest, all the serial port
> > writes get sent into to the void.
> >
> >
> > So John's patch has had a semantic change in behaviour, because the
> > console socket is permanently open, and thus socket buffers are liable
> > to fill up.
> >
> > As a demo I increased the socket buffers to 1MB and everything then
> > succeeded.
> >
> > @@ -357,6 +360,10 @@ def _pre_launch(self) -> None:
> >
> >          if self._console_set:
> >              self._cons_sock_pair = socket.socketpair()
> > +            self._cons_sock_pair[0].setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 1024*1024);
> > +            self._cons_sock_pair[0].setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 1024*1024);
> > +            self._cons_sock_pair[1].setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 1024*1024);
> > +            self._cons_sock_pair[1].setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 1024*1024);
> >              os.set_inheritable(self._cons_sock_pair[0].fileno(), True)
> >
> >          # NOTE: Make sure any opened resources are *definitely* freed in
> >
> >
> > The Machine class doesn't know if anything will ever use the console,
> > so as is the change is unsafe.
> >
> > The original goal of John's change was to guarantee we capture early
> > boot messages as some test need that.
> >
> > I think we need to be able to have a flag to say whether the caller needs
> > an "early console" facility, and only use the pre-opened FD passing for
> > that case. Tests we need early console will have to ask for that guarantee
> > explicitly.
>
> Tch. I see. Thank you for diagnosing this.
>
> From the machine.py perspective, you have to *opt in* to having a
> console, so I hadn't considered that a caller would enable the console
> and then ... not read from it. Surely that's a bug in the caller?
>
> If you don't intend to read from the console, you shouldn't call set_console().

Agree, hence the fix patch for the test case.

Although most tests wait for console, ones like this that control the
machine with gdb/qmp are rarer, so less examples to copy paste from.

>
> (The async rewrite I have been toying with on and off has a built-in
> drainer that writes to a log file that would probably remedy this, but
> the client tests should still be fixed, I think. Otherwise, do you

This sounds good because no matter the test, you rarely don't want to
log console output. Separating that from what the test does with
console would be nice.

> have any suggestions for how I might make this failure state more
> obvious/friendly? I wonder if on close of the machine.py object I
> could detect that the pipe is full and emit a warning about that.)

That's an idea. It wouldn't be foolproof (test could be waiting for
something else or failed for some other reason), but at least it could
give a suggestion (similar to my warning in the chardev code).

How would you do it? Maybe the simplest/portable way would be keep
a pipe write fd open in the harness and try write something to it
with O_NONBLOCK?

Thanks,
Nick


  reply	other threads:[~2023-11-23  2:05 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-14 16:31 [PATCH] tests/avocado/reverse_debugging: Disable the ppc64 tests by default Thomas Huth
2023-11-14 16:35 ` Philippe Mathieu-Daudé
2023-11-14 16:37 ` Philippe Mathieu-Daudé
2023-11-14 18:29   ` Thomas Huth
2023-11-15  1:15     ` Nicholas Piggin
2023-11-15  6:23       ` Thomas Huth
2023-11-15 13:14         ` Daniel P. Berrangé
2023-11-15 17:22           ` Daniel P. Berrangé
2023-11-16  1:15             ` Nicholas Piggin
2023-11-16  3:55               ` Ani Sinha
2023-11-16  7:14                 ` Nicholas Piggin
2023-11-16  8:55                   ` Daniel P. Berrangé
2023-11-16 11:17                     ` Ani Sinha
2023-11-16 11:31                       ` Daniel P. Berrangé
2023-11-16  7:09               ` Thomas Huth
2023-11-16  9:45                 ` Nicholas Piggin
2023-11-16  9:00               ` Daniel P. Berrangé
2023-11-16  3:50             ` Ani Sinha
2023-11-20 19:18             ` John Snow
2023-11-23  2:04               ` Nicholas Piggin [this message]
2023-11-23 10:52               ` Peter Maydell
2024-01-08 23:52                 ` John Snow

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CX5TMOWS9X9Y.366BHVH1CM2OQ@wheely \
    --to=npiggin@gmail.com \
    --cc=alex.bennee@linaro.org \
    --cc=anisinha@redhat.com \
    --cc=berrange@redhat.com \
    --cc=bleal@redhat.com \
    --cc=crosa@redhat.com \
    --cc=jsnow@redhat.com \
    --cc=pavel.dovgaluk@ispras.ru \
    --cc=pbonzini@redhat.com \
    --cc=philmd@linaro.org \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-ppc@nongnu.org \
    --cc=thuth@redhat.com \
    --cc=wainersm@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.