qemu-devel.nongnu.org archive mirror
From: "Alex Bennée" <alex.bennee@linaro.org>
To: Stefan Hajnoczi <stefanha@gmail.com>
Cc: qemu-devel@nongnu.org, stefanha@redhat.com,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Mark Cave-Ayland" <mark.cave-ayland@ilande.co.uk>,
	"Huacai Chen" <chenhuacai@kernel.org>,
	"Philippe Mathieu-Daudé" <philippe.mathieu-daude@linaro.org>,
	"John Snow" <jsnow@redhat.com>
Subject: Failure analysis (was Re: [PULL for 7.2 00/10] testing and doc updates)
Date: Wed, 16 Nov 2022 18:20:10 +0000
Message-ID: <87r0y29287.fsf@linaro.org>
In-Reply-To: <CAJSP0QVigz1nDq7JO2ABquzReGWkqY5dwXKrEaufw+FXnvsvkg@mail.gmail.com>


Stefan Hajnoczi <stefanha@gmail.com> writes:

> This pull request causes the following CI failure:
>
> https://gitlab.com/qemu-project/qemu/-/jobs/3328449477
>
> I haven't figured out the root cause of the failure. Maybe the pull
> request just exposes a latent failure. Please take a look and we can
> try again for -rc2.

OK, after a lot of digging I've come to the following conclusion:

  * the Fuloong 2E machine never enables the FIFO on the 16550 (s->fcr & UART_FCR_FE)
  * as a result if qemu_chr_fe_write(&s->chr, &s->tsr, 1) fails with -EAGAIN
    - a serial_watch_cb is queued
    - s->tsr_retry++
  * subsequent serial_ioport_write() calls overwrite s->thr while the retry is pending
  * the console output gets corrupted
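
As a rough illustration of the sequence above, here is a toy Python model
of a UART with no FIFO. It is not the code in QEMU's serial device: the
class, its method names and the `blocked` switch are made up for the
sketch, and the guest here never polls the line status register before
writing. It only shows how a single holding register loses bytes while a
retry is pending.

```python
import errno


class NoFifoUart:
    """Toy model of a 16550 with the FIFO disabled (hypothetical, not QEMU code)."""

    def __init__(self, backend_write):
        # backend_write(byte) returns 1 on success or -EAGAIN if it would block
        self.backend_write = backend_write
        self.thr = None       # holding register: one byte, nothing queued behind it
        self.tsr = None       # shift register: the byte currently being sent
        self.tsr_retry = 0
        self.sent = bytearray()

    def guest_write(self, byte):
        # A guest write always lands in THR, replacing whatever was there.
        self.thr = byte
        if self.tsr_retry == 0:
            self.xmit()

    def xmit(self):
        while True:
            if self.tsr is None:
                if self.thr is None:
                    return
                self.tsr, self.thr = self.thr, None
            if self.backend_write(self.tsr) == -errno.EAGAIN:
                # In the real device a watch callback would be queued here.
                self.tsr_retry += 1
                return
            self.sent.append(self.tsr)
            self.tsr = None
            self.tsr_retry = 0

    def watch_cb(self):
        # Called once the backend can accept data again.
        self.xmit()


def demo():
    blocked = {"on": False}

    def backend_write(byte):
        return -errno.EAGAIN if blocked["on"] else 1

    uart = NoFifoUart(backend_write)
    for ch in b"ab":
        uart.guest_write(ch)
    blocked["on"] = True          # the reader stops draining the socket
    for ch in b"cde":
        uart.guest_write(ch)      # 'c' waits in TSR, 'e' overwrites 'd' in THR
    blocked["on"] = False
    uart.watch_cb()               # retry succeeds, then the surviving THR byte goes out
    uart.guest_write(ord("f"))
    return bytes(uart.sent)
```

In this model demo() returns b"abcef": the 'd' written while the backend
was blocked never reaches the output.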

You can see the effect by comparing the serial write and xmit values:

  ➜  grep serial_write alex.log | cut -d ' ' -f 6 | xxd -r -p | head -n 10
  [    0.000000] Initializing cgroup subsys cpuset
  [    0.000000] Initializing cgroup subsys cpu
  [    0.000000] Initializing cgroup subsys cpuacct
  [    0.000000] Linux version 3.16.0-6-loongson-2e (debian-kernel@lists.debian.org) (gcc version 4.8.4 (Debian 4.8.4-1) ) #1 Debian 3.16.56-1+deb8u1 (2018-05-08)
  [    0.000000] memsize=256, highmemsize=0
  [    0.000000] CpuClock = 533080000
  [    0.000000] bootconsole [early0] enabled
  [    0.000000] CPU0 revision is: 00006302 (ICT Loongson-2)
  [    0.000000] FPU revision is: 00000501
  [    0.000000] Checking for the multiply/shift bug... no.
  ➜  grep serial_xmit alex.log | cut -d ' ' -f 2 | xxd -r -p | head -n 10
  [    0.000000] Initializing cgroup subsys cpuset
  [    0.000000] Initializing cgroup subsys cpu
  [    0.000000] Initializing cgroup subsys cpuacct
  [    0.000000] Linux version 3.16.0-6-loongson-2e (debian-kernel@lists.debian.org) (gcc version 4.8.4 (Debian 4.8.4-1) ) #1 Debian 33 0.000000] bootconsole [early0] enabled
  [    0.000000] CPU0 revision is: 00006302 (ICT Loongson-2)
  [    0.000000] FPU revision is: 00000501
  [    0.000000] Checking for the multiply/shift bug... no.
  [    0.000000] Checking for the daddiu bug... no.
  [    0.000000] Determined physical RAM map:
  [    0.000000]  memory: 000
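
The same comparison can be scripted so that it reports where the two
streams first differ. The sketch below mirrors the grep | cut | xxd
pipelines in Python. The sample trace lines are invented for the example
and only assume what the cut commands imply: a hex value in
space-separated field 6 of serial_write lines and in field 2 of
serial_xmit lines.

```python
def decode_trace(lines, event, field):
    """Rough equivalent of: grep EVENT | cut -d ' ' -f FIELD | xxd -r -p"""
    out = bytearray()
    for line in lines:
        if event not in line:
            continue
        parts = line.rstrip("\n").split(" ")   # like cut, split on single spaces
        if len(parts) < field:
            continue
        token = parts[field - 1]
        if token.lower().startswith("0x"):
            token = token[2:]
        if len(token) % 2:
            token = "0" + token                # bytes.fromhex needs whole bytes
        out += bytes.fromhex(token)
    return bytes(out)


def first_divergence(a, b):
    """Index of the first differing byte, or None if the streams are identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return None if len(a) == len(b) else min(len(a), len(b))


# Hypothetical trace lines, for illustration only.
sample = [
    "123@1.0:serial_write write addr 0x00 val 0x4f",
    "123@1.1:serial_xmit 0x4f",
    "123@1.2:serial_write write addr 0x00 val 0x4b",
    "123@1.3:serial_write write addr 0x00 val 0x21",
    "123@1.4:serial_xmit 0x21",
]
written = decode_trace(sample, "serial_write", 6)
sent = decode_trace(sample, "serial_xmit", 2)
offset = first_divergence(written, sent)
```

With the real log, printing written[offset - 40:offset + 40] and the same
slice of sent would show the text around the first dropped byte.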

As a result the check for the pattern fails:

        console_pattern = 'Kernel command line: %s' % kernel_command_line
        self.wait_for_console_pattern(console_pattern)

resulting in a timeout and a failed test.

In effect the configuration makes the output dependent on how fast the
avocado test can drain the socket as there is no buffering elsewhere in
the system. The changes in:

  Subject: [PULL 02/10] tests/avocado: improve behaviour waiting for login prompts

make this failure more likely to happen - I think because .peek() and
.readline() have different buffering behaviour. Options include:

  - enable the 16550 FIFO for the Loongson kernel (command line option?)
  - increase the buffering of the python socket.socket() code

I can get it to pass by shuffling the time.sleep() and a few other
checks around but that seems flaky at best.
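
One way the second option could look, as a sketch only: a background
thread that keeps reading the console socket into memory, so the sending
side rarely finds the socket full regardless of how slowly the test
consumes the text. SocketDrainer and wait_for are hypothetical names and
this is not how the avocado console helpers are written today.

```python
import socket
import threading


class SocketDrainer:
    """Continuously read a socket into an in-memory buffer (hypothetical helper)."""

    def __init__(self, sock):
        self._sock = sock
        self._buf = bytearray()
        self._eof = False
        self._cond = threading.Condition()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            try:
                data = self._sock.recv(4096)
            except OSError:
                data = b""                     # treat a closed socket as EOF
            with self._cond:
                if data:
                    self._buf += data
                else:
                    self._eof = True
                self._cond.notify_all()
            if not data:
                return

    def wait_for(self, pattern, timeout):
        """Return True once `pattern` (bytes) has been seen, False on timeout or EOF."""
        with self._cond:
            self._cond.wait_for(
                lambda: pattern in self._buf or self._eof, timeout)
            return pattern in self._buf


# Usage with a local socket pair standing in for the QEMU console socket.
writer, reader = socket.socketpair()
drainer = SocketDrainer(reader)
writer.sendall(b"booting\nKernel command line: console=ttyS0\n")
found = drainer.wait_for(b"Kernel command line:", 5.0)
writer.close()
```

The buffer here is unbounded, which moves the buffering into the test
process rather than fixing the lack of it on the device side.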

-- 
Alex Bennée


Thread overview: 14+ messages
2022-11-15 13:34 [PULL for 7.2 00/10] testing and doc updates Alex Bennée
2022-11-15 13:34 ` [PULL 01/10] Run docker probe only if docker or podman are available Alex Bennée
2022-11-15 13:34 ` [PULL 02/10] tests/avocado: improve behaviour waiting for login prompts Alex Bennée
2022-11-15 13:34 ` [PULL 03/10] tests/avocado/machine_aspeed.py: Reduce noise on the console for SDK tests Alex Bennée
2022-11-15 13:34 ` [PULL 04/10] tests/docker: allow user to override check target Alex Bennée
2022-11-15 13:34 ` [PULL 05/10] docs/devel: add a maintainers section to development process Alex Bennée
2022-11-15 13:34 ` [PULL 06/10] docs/devel: make language a little less code centric Alex Bennée
2022-11-15 13:34 ` [PULL 07/10] docs/devel: simplify the minimal checklist Alex Bennée
2022-11-15 13:34 ` [PULL 08/10] docs/devel: try and improve the language around patch review Alex Bennée
2022-11-15 13:34 ` [PULL 09/10] tests/avocado: Raise timeout for boot_linux.py:BootLinuxPPC64.test_pseries_tcg Alex Bennée
2022-11-15 13:34 ` [PULL 10/10] gitlab: integrate coverage report Alex Bennée
2022-11-15 23:53 ` [PULL for 7.2 00/10] testing and doc updates Stefan Hajnoczi
2022-11-16 18:20   ` Alex Bennée [this message]
2022-11-16 19:26     ` Failure analysis (was Re: [PULL for 7.2 00/10] testing and doc updates) Mark Cave-Ayland
