qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: John Snow <jsnow@redhat.com>
To: "Philippe Mathieu-Daudé" <philmd@redhat.com>,
	"Peter Maydell" <peter.maydell@linaro.org>,
	"QEMU Developers" <qemu-devel@nongnu.org>
Cc: "Alex Bennée" <alex.bennee@linaro.org>,
	robert.foley@linaro.org, "Cleber Rosa" <crosa@redhat.com>
Subject: Re: tests/vm infrastructure fails to notice that QEMU dying is a failure
Date: Fri, 17 Jul 2020 18:24:18 -0400	[thread overview]
Message-ID: <89f309c2-9909-f306-2f2b-93fc1b8adc17@redhat.com> (raw)
In-Reply-To: <42c71a79-4911-dc47-d983-783d98e819c5@redhat.com>

On 7/17/20 9:25 AM, Philippe Mathieu-Daudé wrote:
> On 7/17/20 3:22 PM, Philippe Mathieu-Daudé wrote:
>> Cc'ing John & Cleber.
>>
>> On 7/17/20 3:08 PM, Peter Maydell wrote:
>>> If you run vm-build-openbsd, our makefile/scripting infrastructure
>>> seems to fail to notice that death of the QEMU process that's
>>> running the VM should be a failure, and ends up allowing make
>>> to return a success condition.
>>>
>>> I have a script which runs a VM build which basically does:
>>>
>>>    #!/bin/sh -e
>>>    make -C "build" "vm-build-openbsd" J=8 V=1
>>>    echo "OK DONE openbsd"
>>>
>>> It just gave me this output (tail end of logfile). We're executing tests,
>>> and then the qemu-system-x86_64 that's running the OpenBSD VM gets
>>> a signal 9 (sigkill), for unclear reasons (oom killer??). The python
>>> scripting gets an exception, but doesn't exit with a failure status to
>>> make, which then thinks all is fine, exits success itself and allows
>>> the set -e script to proceed to print the "OK DONE" line...

The signal 9 is likely due to QEMU freezing on exit. The machine.py code 
waits about 3 seconds after trying to shut QEMU down gracefully before 
sending SIGKILL and cleaning up forcefully.

>>>
>>>
>>> PASS 30 qos-test /arm/imx25-pdk/imx.i2c/i2c-bus/pca9552/pca9552-tests/tx-rx
>>> PASS 31 qos-test /arm/imx25-pdk/imx.i2c/i2c-bus/pca9552/pca9552-tests/rx-autoinc
>>> PASS 32 qos-test /arm/imx25-pdk/imx.i2c/i2c-bus/ds1338/ds1338-tests/tx-rx
>>> DEBUG:QMP.qemu-26462:>>> {'execute': 'quit'}
>>> DEBUG:QMP.qemu-26462:<<< {'timestamp': {'seconds': 1594984057,
>>> 'microseconds': 485197}, 'event': 'NIC_RX_FILTER_CHANGED', 'data':
>>> {'path': '/machine/peripheral-anon/device[0]/virtio-backend'}}
>>> DEBUG:QMP.qemu-26462:<<< {'timestamp': {'seconds': 1594985855,
>>> 'microseconds': 169552}, 'event': 'RTC_CHANGE', 'data': {'offset': 0}}
>>> DEBUG:QMP.qemu-26462:<<< {'timestamp': {'seconds': 1594987655,
>>> 'microseconds': 169187}, 'event': 'RTC_CHANGE', 'data': {'offset': 0}}
>>> DEBUG:QMP.qemu-26462:<<< {'timestamp': {'seconds': 1594989456,
>>> 'microseconds': 88866}, 'event': 'RTC_CHANGE', 'data': {'offset': 0}}
>>> DEBUG:QMP.qemu-26462:<<< {'return': {}}
>>> WARNING:qemu.machine:qemu received signal 9; command:
>>> "qemu-system-x86_64 -display none -vga none -chardev
>>> socket,id=mon,path=/var/tmp/qemu-26462-monitor.sock -mon
>>> chardev=mon,mode=control -machine pc -chardev
>>> socket,id=console,path=/var/tmp/qemu-26462-console.sock,server,nowait
>>> -serial chardev:console -nodefaults -m 4G -cpu max -netdev
>>> user,id=vnet,hostfwd=:127.0.0.1:0-:22 -device
>>> virtio-net-pci,netdev=vnet -vnc 127.0.0.1:0,to=20 -smp 8 -enable-kvm
>>> -drive file=/home/peter.maydell/.cache/qemu-vm/images/openbsd.img,snapshot=on,if=none,id=drive0,cache=writeback
>>> -device virtio-blk,drive=drive0,bootindex=0 -drive
>>> file=/home/peter.maydell/qemu-openbsd/build/vm-test-yzwn6xdc.tmp/data-993a1.tar,if=none,id=data-993a1,cache=writeback,format=raw
>>> -device virtio-blk,drive=data-993a1,serial=data-993a1,bootindex=1"
>>> Error in atexit._run_exitfuncs:
>>> Traceback (most recent call last):
>>>    File "/home/peter.maydell/qemu-openbsd/tests/vm/../../python/qemu/machine.py",
>>> line 436, in _do_shutdown
>>>      self._soft_shutdown(has_quit, timeout)
>>>    File "/home/peter.maydell/qemu-openbsd/tests/vm/../../python/qemu/machine.py",
>>> line 419, in _soft_shutdown
>>>      self._popen.wait(timeout=timeout)
>>>    File "/usr/lib/python3.6/subprocess.py", line 1469, in wait
>>>      raise TimeoutExpired(self.args, timeout)

This is where it times out waiting for the process to exit of its own 
volition. This is the "Inner" exception ...

>>> subprocess.TimeoutExpired: Command '['qemu-system-x86_64', '-display',
>>> 'none', '-vga', 'none', '-chardev',
>>> 'socket,id=mon,path=/var/tmp/qemu-26462-monitor.sock', '-mon',
>>> 'chardev=mon,mode=control', '-machine', 'pc', '-chardev',
>>> 'socket,id=console,path=/var/tmp/qemu-26462-console.sock,server,nowait',
>>> '-serial', 'chardev:console', '-nodefaults', '-m', '4G', '-cpu',
>>> 'max', '-netdev', 'user,id=vnet,hostfwd=:127.0.0.1:0-:22', '-device',
>>> 'virtio-net-pci,netdev=vnet', '-vnc', '127.0.0.1:0,to=20', '-smp',
>>> '8', '-enable-kvm', '-drive',
>>> 'file=/home/peter.maydell/.cache/qemu-vm/images/openbsd.img,snapshot=on,if=none,id=drive0,cache=writeback',
>>> '-device', 'virtio-blk,drive=drive0,bootindex=0', '-drive',
>>> 'file=/home/peter.maydell/qemu-openbsd/build/vm-test-yzwn6xdc.tmp/data-993a1.tar,if=none,id=data-993a1,cache=writeback,format=raw',
>>> '-device', 'virtio-blk,drive=data-993a1,serial=data-993a1,bootindex=1']'
>>> timed out after 3 seconds
>>>
>>> The above exception was the direct cause of the following exception:
>>>
>>> Traceback (most recent call last):
>>>    File "/home/peter.maydell/qemu-openbsd/tests/vm/../../python/qemu/machine.py",
>>> line 466, in shutdown
>>>      self._do_shutdown(has_quit, timeout=timeout)
>>>    File "/home/peter.maydell/qemu-openbsd/tests/vm/../../python/qemu/machine.py",
>>> line 440, in _do_shutdown
>>>      from exc
>>> qemu.machine.AbnormalShutdown: Could not perform graceful shutdown

And this is the "Outer" exception, the machine.py cleanup handler 
letting you know that while it did perform cleanup, QEMU was not shut 
down properly.

>>> make: Leaving directory '/home/peter.maydell/qemu-openbsd/build'
>>> OK DONE openbsd
> 
> John it seems due to the addition of the 'finally' clause in
> 193bf1c061 ("python/machine.py: split shutdown into hard and
> soft flavors").
> 

That's surprising. I'll look into this Monday, but the "Finally: " 
should only perform some actions without suppressing the exception. You 
can see here it DIDN'T suppress it.

...but maybe we don't return non-zero if an exception gets raised during 
atexit()? Maybe that's what's happening here.

... ... Ah, guess what, that's exactly it. atexit suppresses exceptions. 
You just never noticed this problem before because machine.py used to 
suppress problems during shutdown instead of reporting them.

```
#!/usr/bin/env python3

import atexit
from qemu import machine

def handler():
     raise machine.AbnormalShutdown("oh no")

atexit.register(handler)
```

Oops, this returns a zero code.


So:

- basevm.py should likely be converted to a context manager object just 
like machine.py's QEMUMachine class was so cleanup can happen explicitly 
when the VM object goes out of scope.

(Callers would use: "with baseVM() as foo:" pattern, or, where that's 
not possible, call an explicit .shutdown().)

- The real problem, though: Why is QEMU hanging? It might need a longer 
timeout, or it might be having problems with the console socket again.

(CC Robert Foley who has been working on the Console socket draining 
problems. Maybe he has some insight here?)

--js



  reply	other threads:[~2020-07-17 22:25 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-17 13:08 tests/vm infrastructure fails to notice that QEMU dying is a failure Peter Maydell
2020-07-17 13:22 ` Philippe Mathieu-Daudé
2020-07-17 13:25   ` Philippe Mathieu-Daudé
2020-07-17 22:24     ` John Snow [this message]
2020-07-18  9:30       ` Peter Maydell
2020-07-20 14:16         ` John Snow
2020-07-18 13:34       ` Robert Foley

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=89f309c2-9909-f306-2f2b-93fc1b8adc17@redhat.com \
    --to=jsnow@redhat.com \
    --cc=alex.bennee@linaro.org \
    --cc=crosa@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=philmd@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=robert.foley@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).