From: John Snow <jsnow@redhat.com>
To: "Philippe Mathieu-Daudé" <philmd@redhat.com>,
"Peter Maydell" <peter.maydell@linaro.org>,
"QEMU Developers" <qemu-devel@nongnu.org>
Cc: "Alex Bennée" <alex.bennee@linaro.org>,
robert.foley@linaro.org, "Cleber Rosa" <crosa@redhat.com>
Subject: Re: tests/vm infrastructure fails to notice that QEMU dying is a failure
Date: Fri, 17 Jul 2020 18:24:18 -0400 [thread overview]
Message-ID: <89f309c2-9909-f306-2f2b-93fc1b8adc17@redhat.com> (raw)
In-Reply-To: <42c71a79-4911-dc47-d983-783d98e819c5@redhat.com>
On 7/17/20 9:25 AM, Philippe Mathieu-Daudé wrote:
> On 7/17/20 3:22 PM, Philippe Mathieu-Daudé wrote:
>> Cc'ing John & Cleber.
>>
>> On 7/17/20 3:08 PM, Peter Maydell wrote:
>>> If you run vm-build-openbsd, our makefile/scripting infrastructure
>>> seems to fail to notice that death of the QEMU process that's
>>> running the VM should be a failure, and ends up allowing make
>>> to return a success condition.
>>>
>>> I have a script which runs a VM build which basically does:
>>>
>>> #!/bin/sh -e
>>> make -C "build" "vm-build-openbsd" J=8 V=1
>>> echo "OK DONE openbsd"
>>>
>>> It just gave me this output (tail end of logfile). We're executing tests,
>>> and then the qemu-system-x86_64 that's running the OpenBSD VM gets
>>> a signal 9 (sigkill), for unclear reasons (oom killer??). The python
>>> scripting gets an exception, but doesn't exit with a failure status to
>>> make, which then thinks all is fine, exits success itself and allows
>>> the set -e script to proceed to print the "OK DONE" line...
The signal 9 is likely due to QEMU freezing on exit. The machine.py code
waits about 3 seconds after trying to shut QEMU down gracefully before
sending SIGKILL and cleaning up forcefully.
>>>
>>>
>>> PASS 30 qos-test /arm/imx25-pdk/imx.i2c/i2c-bus/pca9552/pca9552-tests/tx-rx
>>> PASS 31 qos-test /arm/imx25-pdk/imx.i2c/i2c-bus/pca9552/pca9552-tests/rx-autoinc
>>> PASS 32 qos-test /arm/imx25-pdk/imx.i2c/i2c-bus/ds1338/ds1338-tests/tx-rx
>>> DEBUG:QMP.qemu-26462:>>> {'execute': 'quit'}
>>> DEBUG:QMP.qemu-26462:<<< {'timestamp': {'seconds': 1594984057,
>>> 'microseconds': 485197}, 'event': 'NIC_RX_FILTER_CHANGED', 'data':
>>> {'path': '/machine/peripheral-anon/device[0]/virtio-backend'}}
>>> DEBUG:QMP.qemu-26462:<<< {'timestamp': {'seconds': 1594985855,
>>> 'microseconds': 169552}, 'event': 'RTC_CHANGE', 'data': {'offset': 0}}
>>> DEBUG:QMP.qemu-26462:<<< {'timestamp': {'seconds': 1594987655,
>>> 'microseconds': 169187}, 'event': 'RTC_CHANGE', 'data': {'offset': 0}}
>>> DEBUG:QMP.qemu-26462:<<< {'timestamp': {'seconds': 1594989456,
>>> 'microseconds': 88866}, 'event': 'RTC_CHANGE', 'data': {'offset': 0}}
>>> DEBUG:QMP.qemu-26462:<<< {'return': {}}
>>> WARNING:qemu.machine:qemu received signal 9; command:
>>> "qemu-system-x86_64 -display none -vga none -chardev
>>> socket,id=mon,path=/var/tmp/qemu-26462-monitor.sock -mon
>>> chardev=mon,mode=control -machine pc -chardev
>>> socket,id=console,path=/var/tmp/qemu-26462-console.sock,server,nowait
>>> -serial chardev:console -nodefaults -m 4G -cpu max -netdev
>>> user,id=vnet,hostfwd=:127.0.0.1:0-:22 -device
>>> virtio-net-pci,netdev=vnet -vnc 127.0.0.1:0,to=20 -smp 8 -enable-kvm
>>> -drive file=/home/peter.maydell/.cache/qemu-vm/images/openbsd.img,snapshot=on,if=none,id=drive0,cache=writeback
>>> -device virtio-blk,drive=drive0,bootindex=0 -drive
>>> file=/home/peter.maydell/qemu-openbsd/build/vm-test-yzwn6xdc.tmp/data-993a1.tar,if=none,id=data-993a1,cache=writeback,format=raw
>>> -device virtio-blk,drive=data-993a1,serial=data-993a1,bootindex=1"
>>> Error in atexit._run_exitfuncs:
>>> Traceback (most recent call last):
>>> File "/home/peter.maydell/qemu-openbsd/tests/vm/../../python/qemu/machine.py",
>>> line 436, in _do_shutdown
>>> self._soft_shutdown(has_quit, timeout)
>>> File "/home/peter.maydell/qemu-openbsd/tests/vm/../../python/qemu/machine.py",
>>> line 419, in _soft_shutdown
>>> self._popen.wait(timeout=timeout)
>>> File "/usr/lib/python3.6/subprocess.py", line 1469, in wait
>>> raise TimeoutExpired(self.args, timeout)
This is where it times out waiting for the process to exit of its own
volition. This is the "Inner" exception ...
>>> subprocess.TimeoutExpired: Command '['qemu-system-x86_64', '-display',
>>> 'none', '-vga', 'none', '-chardev',
>>> 'socket,id=mon,path=/var/tmp/qemu-26462-monitor.sock', '-mon',
>>> 'chardev=mon,mode=control', '-machine', 'pc', '-chardev',
>>> 'socket,id=console,path=/var/tmp/qemu-26462-console.sock,server,nowait',
>>> '-serial', 'chardev:console', '-nodefaults', '-m', '4G', '-cpu',
>>> 'max', '-netdev', 'user,id=vnet,hostfwd=:127.0.0.1:0-:22', '-device',
>>> 'virtio-net-pci,netdev=vnet', '-vnc', '127.0.0.1:0,to=20', '-smp',
>>> '8', '-enable-kvm', '-drive',
>>> 'file=/home/peter.maydell/.cache/qemu-vm/images/openbsd.img,snapshot=on,if=none,id=drive0,cache=writeback',
>>> '-device', 'virtio-blk,drive=drive0,bootindex=0', '-drive',
>>> 'file=/home/peter.maydell/qemu-openbsd/build/vm-test-yzwn6xdc.tmp/data-993a1.tar,if=none,id=data-993a1,cache=writeback,format=raw',
>>> '-device', 'virtio-blk,drive=data-993a1,serial=data-993a1,bootindex=1']'
>>> timed out after 3 seconds
>>>
>>> The above exception was the direct cause of the following exception:
>>>
>>> Traceback (most recent call last):
>>> File "/home/peter.maydell/qemu-openbsd/tests/vm/../../python/qemu/machine.py",
>>> line 466, in shutdown
>>> self._do_shutdown(has_quit, timeout=timeout)
>>> File "/home/peter.maydell/qemu-openbsd/tests/vm/../../python/qemu/machine.py",
>>> line 440, in _do_shutdown
>>> from exc
>>> qemu.machine.AbnormalShutdown: Could not perform graceful shutdown
And this is the "Outer" exception, the machine.py cleanup handler
letting you know that while it did perform cleanup, QEMU was not shut
down properly.
>>> make: Leaving directory '/home/peter.maydell/qemu-openbsd/build'
>>> OK DONE openbsd
>
> John it seems due to the addition of the 'finally' clause in
> 193bf1c061 ("python/machine.py: split shutdown into hard and
> soft flavors").
>
That's surprising. I'll look into this Monday, but the "Finally: "
should only perform some actions without suppressing the exception. You
can see here it DIDN'T suppress it.
...but maybe we don't return non-zero if an exception gets raised during
atexit()? Maybe that's what's happening here.
... ... Ah, guess what, that's exactly it. atexit suppresses exceptions.
You just never noticed this problem before because machine.py used to
suppress problems during shutdown instead of reporting them.
```
#!/usr/bin/env python3
import atexit
from qemu import machine
def handler():
raise machine.AbnormalShutdown("oh no")
atexit.register(handler)
```
Oops, this returns a zero code.
So:
- basevm.py should likely be converted to a context manager object just
like machine.py's QEMUMachine class was so cleanup can happen explicitly
when the VM object goes out of scope.
(Callers would use: "with baseVM() as foo:" pattern, or, where that's
not possible, call an explicit .shutdown().)
- The real problem, though: Why is QEMU hanging? It might need a longer
timeout, or it might be having problems with the console socket again.
(CC Robert Foley who has been working on the Console socket draining
problems. Maybe he has some insight here?)
--js
next prev parent reply other threads:[~2020-07-17 22:25 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-07-17 13:08 tests/vm infrastructure fails to notice that QEMU dying is a failure Peter Maydell
2020-07-17 13:22 ` Philippe Mathieu-Daudé
2020-07-17 13:25 ` Philippe Mathieu-Daudé
2020-07-17 22:24 ` John Snow [this message]
2020-07-18 9:30 ` Peter Maydell
2020-07-20 14:16 ` John Snow
2020-07-18 13:34 ` Robert Foley
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=89f309c2-9909-f306-2f2b-93fc1b8adc17@redhat.com \
--to=jsnow@redhat.com \
--cc=alex.bennee@linaro.org \
--cc=crosa@redhat.com \
--cc=peter.maydell@linaro.org \
--cc=philmd@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=robert.foley@linaro.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).