Re: Transient fail of iotests 215 and 197

From: Thomas Huth <thuth@redhat.com>
To: "Daniel P. Berrangé" <berrange@redhat.com>,
	qemu-devel@nongnu.org, qemu-block@nongnu.org
Cc: Kevin Wolf <kwolf@redhat.com>,
	Peter Maydell <peter.maydell@linaro.org>,
	Eric Blake <eblake@redhat.com>, Max Reitz <mreitz@redhat.com>
Subject: Re: Transient fail of iotests 215 and 197
Date: Tue, 27 Jul 2021 16:23:11 +0200
Message-ID: <31132251-1f39-e830-a0fd-63628529be53@redhat.com>
In-Reply-To: <YPhX1TakNJjH0RaA@redhat.com>

On 21/07/2021 19.22, Daniel P. Berrangé wrote:
> Peter caught the following transient fail on the staging tree:
> 
>    https://gitlab.com/qemu-project/qemu/-/jobs/1438817749
> 
> --- /builds/qemu-project/qemu/tests/qemu-iotests/197.out
> +++ 197.out.bad
> @@ -12,13 +12,12 @@
>   128 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>   read 0/0 bytes at offset 0
>   0 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> -read 2147483136/2147483136 bytes at offset 1024
> -2 GiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +./common.rc: Killed                  ( VALGRIND_QEMU="${VALGRIND_QEMU_IO}" _qemu_proc_exec "${VALGRIND_LOGFILE}" "$QEMU_IO_PROG" $QEMU_IO_ARGS "$@" )
>   read 1024/1024 bytes at offset 3221226496
>   1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>   qemu-io: can't open device TEST_DIR/t.wrap.qcow2: Can't use copy-on-read on read-only device
> -2 GiB (0x80010000) bytes     allocated at offset 0 bytes (0x0)
> -1023.938 MiB (0x3fff0000) bytes not allocated at offset 2 GiB (0x80010000)
> +2 GiB (0x80000000) bytes     allocated at offset 0 bytes (0x0)
> +1 GiB (0x40000000) bytes not allocated at offset 2 GiB (0x80000000)
>   64 KiB (0x10000) bytes     allocated at offset 3 GiB (0xc0000000)
>   1023.938 MiB (0x3fff0000) bytes not allocated at offset 3 GiB (0xc0010000)
>   No errors were found on the image.
> 
> 
> --- /builds/qemu-project/qemu/tests/qemu-iotests/215.out
> +++ 215.out.bad
> @@ -12,13 +12,12 @@
>   128 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>   read 0/0 bytes at offset 0
>   0 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> -read 2147483136/2147483136 bytes at offset 1024
> -2 GiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +./common.rc: Killed                  ( VALGRIND_QEMU="${VALGRIND_QEMU_IO}" _qemu_proc_exec "${VALGRIND_LOGFILE}" "$QEMU_IO_PROG" $QEMU_IO_ARGS "$@" )
>   read 1024/1024 bytes at offset 3221226496
>   1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>   qemu-io: can't open device TEST_DIR/t.wrap.qcow2: Block node is read-only
> -2 GiB (0x80010000) bytes     allocated at offset 0 bytes (0x0)
> -1023.938 MiB (0x3fff0000) bytes not allocated at offset 2 GiB (0x80010000)
> +2 GiB (0x80000000) bytes     allocated at offset 0 bytes (0x0)
> +1 GiB (0x40000000) bytes not allocated at offset 2 GiB (0x80000000)
>   64 KiB (0x10000) bytes     allocated at offset 3 GiB (0xc0000000)
>   1023.938 MiB (0x3fff0000) bytes not allocated at offset 3 GiB (0xc0010000)
>   No errors were found on the image.
> 
> 
> Looks like the process might have been killed off by the OS part way
> through.
> 
> Interestingly both test cases have a comment:
> 
>    #                                        Since a 2G read may exhaust
>    # memory on some machines (particularly 32-bit), we skip the test if
>    # that fails due to memory pressure.
> 
> 
> I'm wondering if the logic for handling this failure is flawed, as being
> killed by the OS for exhausting memory limits for the CI container looks
> like a plausible scenario to explain the failure.
> 
> The CI shared runners supposedly have 3.75 GB of RAM for the VM as a whole.
> If the tests are run in parallel this could still be an issue.
> 
> Maybe we need to skip these tests by default if they are known to require
> a significant amount of memory to run?

The tests are not in the "auto" group, so they are not run by default,
but I once added them to the build-tcg-disabled job since they were working
fine in the gitlab-CI.

If they are now dying because of out-of-memory issues, that means that
either they are using more memory now, or the containers have changed and
provide less free memory. Either way, it sounds like the tests are no longer
suited for the gitlab-CI, and since they are not in the "auto" group
anyway, I'd suggest simply disabling them in the build-tcg-disabled job
again. I'll send a patch...

  Thomas
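
For illustration only (this is not the patch mentioned above): the idea
raised in the quoted mail, skipping a test that is known to need a lot of
memory, could be sketched in bash roughly as follows. The helper names
mem_available_kib and have_enough_memory are hypothetical and do not exist
in common.rc. The sketch also assumes that /proc/meminfo reflects the
memory the test may actually use, which is often not true inside a
container with a cgroup memory limit.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the MemAvailable value (in KiB) from a
# meminfo-style file. Defaults to /proc/meminfo (Linux only).
mem_available_kib() {
    local meminfo="${1:-/proc/meminfo}"
    awk '/^MemAvailable:/ { print $2; exit }' "$meminfo"
}

# Hypothetical helper: succeed only if at least $1 KiB are reported as
# available. A missing MemAvailable field is treated as "not enough".
have_enough_memory() {
    local needed_kib="$1"
    local meminfo="${2:-/proc/meminfo}"
    local avail_kib
    avail_kib=$(mem_available_kib "$meminfo")
    [ -n "$avail_kib" ] && [ "$avail_kib" -ge "$needed_kib" ]
}
```

A test could then call something like
`have_enough_memory $((3 * 1024 * 1024)) || _notrun "not enough memory"`
before attempting the 2G read. The 3 GiB threshold here is an arbitrary
assumption, not a measured requirement.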
